30 | | Only by specifying {{{restart}}} as activation string, PALM is instructed to compute the remaining CPU time after each time step and to stop, if the run is not going to be completed and finished briefly before expiration of this time. Actually the stop takes place when the difference between the available job time (determined by the '''palmrun''' option {{{-t}}}) and the time used by the job so far becomes smaller than the time given by the runtime parameter [../runtime_parameters#termination_time_needed termination_time_needed]. The runtime parameter '''termination_time_needed''' can be used to inform PALM, how much time is required for copying the binary data for restart runs, as well as for other pre- or post-processing steps that are done within the job. Thus, as soon as the remaining job time is less than '''termination_time_needed''', PALM interrupts the time stepping and outputs the restart data to local file/folder [../iofiles#BINOUT BINOUT]. The [../initialization_parameters initialization parameters] are also added to that file. In a last step, PALM creates a flag file with local name [[CONTINUE_RUN]]. The presence of this file signals '''palmrun''' that a restart run needs to be generated and initiates and starts a respective job. |
| 30 | Only when {{{restart}}} is specified as an activation string does PALM compute the remaining CPU time after each time step and stop shortly before this time expires if the run will not be completed by then. More precisely, the stop takes place when the difference between the available job time (determined by the '''palmrun''' option {{{-t}}}) and the time used by the job so far becomes smaller than the time given by the runtime parameter [../runtime_parameters#termination_time_needed termination_time_needed]. The runtime parameter '''termination_time_needed''' tells PALM how much time is required for copying the binary data for restart runs, as well as for other pre- or post-processing steps that are done within the job. Thus, as soon as the remaining job time is less than '''termination_time_needed''', PALM interrupts the time stepping and writes the restart data to the local file/folder [../iofiles#BINOUT BINOUT]. The [../initialization_parameters initialization parameters] are also added to that file. In a last step, PALM creates a flag file with the local name {{{CONTINUE_RUN}}}. The presence of this file signals to '''palmrun''' that a restart run needs to be generated, and '''palmrun''' then creates and starts the respective job. |
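The stop criterion described above can be illustrated with a minimal sketch. It is not PALM's actual code (PALM itself is not written in Python); the function name and the use of seconds as the unit are assumptions made purely for illustration.

```python
def must_stop_for_restart(job_time_limit_s, time_used_s, termination_time_needed_s):
    """Return True if time stepping should be interrupted to write restart data.

    job_time_limit_s          -- available job time (the palmrun -t value), assumed in seconds
    time_used_s               -- time used by the job so far, assumed in seconds
    termination_time_needed_s -- value of the runtime parameter termination_time_needed
    """
    remaining = job_time_limit_s - time_used_s
    # Stop as soon as the remaining time drops below the reserved time.
    return remaining < termination_time_needed_s


# Hypothetical numbers: a one-hour job with 300 s reserved for copying restart data.
print(must_stop_for_restart(3600, 1000, 300))
print(must_stop_for_restart(3600, 3400, 300))
```

With these illustrative numbers, the first call still has 2600 s left and continues, while the second has only 200 s left and would trigger the restart output.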
36 | | must be active for the restart runs. Given that this statement only gets active if the option {{{-r}}} is given the value {{{d3r}}} and that the '''palmrun''' call for this restart run is produced automatically (thus not by the user), '''palmrun''' obviously has to replace {{{"d3#"}}} of the initial run with {{{"d3r"}}} within the call of this restart run. Actually, with restart runs all {{{"#"}}} characters within the strings given for the options {{{-r}}} , {{{-i}}} and {{{-o}}} are replaced by {{{"f"}}}.\\\\ |
37 | | For example, for the initial run the permanent file |
| 36 | must be active for the restart runs. Given that this statement only becomes active with option {{{-r "d3r"}}}, and that the '''palmrun''' call for this restart run is generated automatically (not by you), '''palmrun''' has to replace {{{"d3#"}}} of the initial run with {{{"d3r"}}} for the restart run. In fact, for restart runs all {{{"#"}}} characters within the arguments given for option {{{-r}}} are replaced by {{{"r"}}}. |
| 37 | |
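The replacement rule can be sketched in a few lines. This only mimics the described behaviour and is not taken from the '''palmrun''' script; the function name is hypothetical.

```python
def restart_activation_strings(initial_r_argument):
    """Return the -r argument for a restart run.

    Every "#" in the activation strings of the initial run is
    replaced by "r", as described in the text above.
    """
    return initial_r_argument.replace("#", "r")


# The initial run was called with -r "d3# restart" (illustrative value).
print(restart_activation_strings("d3# restart"))  # prints: d3r restart
```

Activation strings without a {{{"#"}}} character, such as {{{restart}}}, pass through unchanged.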
| 38 | This way, following the above palmrun example, the initial run will use the permanent file |
49 | | However, after the file produced by the previous job was read in by the model and after the local file BINOUT was produced at the end of the job, the restart job does not overwrite this permanent file ({{{.../abcde_d3d}}}) with the new data. Instead of that, it is examined whether already a permanent file with the name {{{.../abcde_d3d}}} exists when copying the output file (BINOUT) of '''palmrun'''. If this is the case, BINOUT is copied to the file {{{.../abcde_d3d.1}}}. Even if this file is already present, {{{.../abcde_d3d.2}}} is tried etc. For an input file the highest existing cycle of the respective permanent file is copied. In the example above this means: the initial run creates the permanent file {{{.../abcde_d3d}}}, the first restart run uses this file and creates {{{.../abcde_d3d.1}}}, the second restart run creates {{{.../abcde_d3d.2}}} etc. After completion of the job chain the user can still access all files created by the jobs. This makes it possible for the user for example to restart the model run of a certain job of the job chain again.\\\\ |
50 | | Therefore restart jobs can not only be started automatically through '''palmrun''', but also manually by the user. This is necessary e.g. whenever after the end of a job chain it is decided that the simulation must be continued further, because the phenomenon which should be examined did not reach the desired state yet. In such cases the '''palmrun''' options completely correspond to those of the initial call; simply the {{{"#"}}} characters in the arguments of options {{{-r}}}, {{{-i}}} and {{{-o}}} must be replaced by {{{"f"}}}.\\\\ |
| 51 | However, '''palmrun''' does not overwrite the restart data from the previous job with the new data that is output at the end of the current run. Instead, the local output file BINOUT is copied to a permanent file with a cycle number suffix, e.g. |
| 52 | {{{ |
| 53 | $restart_data_path/abcde/RESTART/abcde_d3d.001 |
| 54 | }}} |
| 55 | If a file with that cycle number already exists, the cycle number is incremented and {{{abcde_d3d.002}}} is created. |
| 56 | |
| 57 | Concerning the restart data input file, the highest existing cycle of the respective permanent file will be used. |
| 58 | |
| 59 | In the example given above, the initial run creates the permanent file {{{.../abcde_d3d.000}}}, the first restart run uses this file and creates {{{.../abcde_d3d.001}}}, the second restart run creates {{{.../abcde_d3d.002}}} etc. You can still access all files created by the runs after the job chain has finished. For example, this allows you to re-run the model starting from different positions of the job chain by manually calling '''palmrun''' with argument {{{d3r}}}. In that case you also need to remove all file cycles beyond the one you would like to start from, because the highest existing cycle is always used as input. |
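The cycle-number logic can be sketched as follows. This is an illustration under assumptions, not the '''palmrun''' implementation: the function names are hypothetical, the file names are passed in as a plain list instead of being read from a directory, and the three-digit zero-padded suffix is inferred from the examples above.

```python
import re


def existing_cycles(file_names, base_name):
    """Collect the cycle numbers of all files named <base_name>.<digits>."""
    pattern = re.compile(re.escape(base_name) + r"\.(\d+)")
    cycles = []
    for name in file_names:
        match = pattern.fullmatch(name)
        if match:
            cycles.append(int(match.group(1)))
    return cycles


def restart_input_file(file_names, base_name):
    """Input for a restart run: the highest existing cycle, or None if there is none."""
    cycles = existing_cycles(file_names, base_name)
    if not cycles:
        return None
    return f"{base_name}.{max(cycles):03d}"


def restart_output_file(file_names, base_name):
    """Output of a run: one cycle above the highest existing one, starting at 000."""
    cycles = existing_cycles(file_names, base_name)
    next_cycle = max(cycles) + 1 if cycles else 0
    return f"{base_name}.{next_cycle:03d}"


# State after the initial run and the first restart run.
files = ["abcde_d3d.000", "abcde_d3d.001"]
print(restart_input_file(files, "abcde_d3d"))
print(restart_output_file(files, "abcde_d3d"))
```

Because the input is always the highest cycle found, restarting from an earlier point in the chain only works after the later cycles have been removed, which is why the text asks you to delete them first.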
| 60 | |
| 61 | |