Changes between Version 21 and Version 22 of doc/app/runs


Timestamp:
Apr 20, 2021 3:12:53 PM (4 years ago)
Author:
raasch
Comment:

--

  • doc/app/runs

    v21 v22  
    2828Giving the activation string {{{restart}}} as argument of option {{{-a}}} is essential. Only in that case does the model write binary data for a restart run to the local file [../iofiles#BINOUT BINOUT] (when running on more than one core, BINOUT is a folder). The local output file is then saved to a permanent file as defined in the file connection statement for BINOUT. The last line of the example above shows that the file is only saved if the activation string {{{restart}}} has been set.
    2929
    30 Only by specifying {{{restart}}} as activation string, PALM is instructed to compute the remaining CPU time after each time step and to stop, if the run is not going to be completed and finished briefly before expiration of this time. Actually the stop takes place when the difference between the available job time (determined by the '''palmrun''' option {{{-t}}}) and the time used by the job so far becomes smaller than the time given by the runtime parameter [../runtime_parameters#termination_time_needed termination_time_needed]. The runtime parameter '''termination_time_needed''' can be used to inform PALM, how much time is required for copying the binary data for restart runs, as well as for other pre- or post-processing steps that are done within the job. Thus, as soon as the remaining job time is less than '''termination_time_needed''', PALM interrupts the time stepping and outputs the restart data to local file/folder [../iofiles#BINOUT BINOUT]. The [../initialization_parameters initialization parameters] are also added to that file. In a last step, PALM creates a flag file with local name [[CONTINUE_RUN]]. The presence of this file signals '''palmrun''' that a restart run needs to be generated and initiates and starts a respective job.
     30Only if {{{restart}}} is specified as activation string is PALM instructed to compute the remaining CPU time after each time step and to stop shortly before this time expires if the run cannot be completed within it. More precisely, the stop takes place when the difference between the available job time (determined by the '''palmrun''' option {{{-t}}}) and the time used by the job so far becomes smaller than the time given by the runtime parameter [../runtime_parameters#termination_time_needed termination_time_needed]. Use '''termination_time_needed''' to tell PALM how much time is required for copying the binary data for restart runs, as well as for other pre- or post-processing steps that are done within the job. Thus, as soon as the remaining job time is less than '''termination_time_needed''', PALM interrupts the time stepping and outputs the restart data to the local file/folder [../iofiles#BINOUT BINOUT]. The [../initialization_parameters initialization parameters] are also added to that file. As a last step, PALM creates a flag file with local name {{{CONTINUE_RUN}}}. The presence of this file signals to '''palmrun''' that a restart run needs to be generated, and '''palmrun''' then creates and starts a corresponding job.
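
The stop criterion described above can be sketched in a few lines of Python. This is only an illustration of the comparison, not PALM's actual code (PALM is not written in Python); the function name and the example values are hypothetical, and all times are assumed to be in seconds.

```python
def must_stop_for_restart(job_time_limit: float,
                          time_used: float,
                          termination_time_needed: float) -> bool:
    """Return True if time stepping should be interrupted.

    job_time_limit          -- available job time (palmrun option -t)
    time_used               -- time used by the job so far
    termination_time_needed -- time reserved for writing and copying restart data
    """
    remaining = job_time_limit - time_used
    # Stop as soon as the remaining job time drops below the reserved time.
    return remaining < termination_time_needed


# Hypothetical job: 3600 s available, 300 s reserved for termination.
print(must_stop_for_restart(3600.0, 3200.0, 300.0))  # 400 s remain
print(must_stop_for_restart(3600.0, 3350.0, 300.0))  # 250 s remain
```

With these assumed values, the first check leaves enough time to continue, while the second one would trigger the output of restart data.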
    3131
    3232Within PALM, the initial phase of a restart run requires different actions than that of an initial run. In case of a restart, PALM first needs to read the data written by the preceding run and also reads the initialization parameters from the same file. Therefore, these parameters do not need to be provided in the parameter file (local name [../iofiles#PARIN PARIN]). If they are provided anyway and their values differ from the respective values of the initial run, these settings are ignored. There is exactly one exception to this rule: the initialization parameter [../initialization_parameters#initializing_actions initializing_actions] determines whether the job is a restart run or an initial run. If '''initializing_actions''' = '' 'read_restart_data','' then it is a restart run, otherwise an initial run. The previous explanation makes it clear that the model needs two different parameter files (local name PARIN) for job chains. One is required for the initial run and contains all initialization parameters, and the other one is needed for restart runs. The latter only needs to contain the initialization parameter '''initializing_actions''' (other initialization parameters may appear in this file, but they will be ignored), which must be set to '' 'read_restart_data'.'' So you need to provide two different parameter files if you want to carry out restart runs. Since PALM always expects the parameters to be in the local file PARIN, regardless of whether it is an initial or a restart run, two different file connection statements must be given for that file in the file-connection file. One is active for the initial run only, the other one only for restart runs. The '''palmrun''' call for the initial run shown above activates the first of the two specified file connection statements for PARIN, because the activation string {{{d3#}}} given with the option {{{-r}}} coincides with the string in the third column of the file connection statement. 
Obviously, the next statement
     
    3434      PARIN in:tr     d3r   $base_data/$run_identifier/INPUT _p3dr*
    3535}}}
    36 must be active for the restart runs. Given that this statement only gets active if the option {{{-r}}} is given the value {{{d3r}}} and that the '''palmrun''' call for this restart run is produced automatically (thus not by the user), '''palmrun''' obviously has to replace {{{"d3#"}}} of the initial run with {{{"d3r"}}} within the call of this restart run. Actually, with restart runs all {{{"#"}}} characters within the strings given for the options {{{-r}}} , {{{-i}}} and {{{-o}}} are replaced by {{{"f"}}}.\\\\
    37 For example, for the initial run the permanent file
     36must be active for the restart runs. Given that this statement only becomes active with option {{{-r "d3r"}}}, and that the '''palmrun''' call for this restart run is generated automatically (not by you), '''palmrun''' needs to replace {{{"d3#"}}} of the initial run with {{{"d3r"}}} for the restart run. In fact, for restart runs all {{{"#"}}} characters within the argument given for option {{{-r}}} are replaced by {{{"r"}}}.
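
The replacement rule can be illustrated with a minimal Python sketch. It only mirrors the string substitution described above and is not taken from the '''palmrun''' script; the function name is hypothetical.

```python
def restart_activation_strings(initial_argument: str) -> str:
    """Derive the option argument of a restart run from that of the initial run.

    Every "#" in the argument is replaced by "r"; all other characters are kept.
    """
    return initial_argument.replace("#", "r")


print(restart_activation_strings("d3#"))  # d3r
```

Because every {{{"#"}}} is replaced, an argument that contains no {{{"#"}}} stays unchanged.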
     37
     38This way, following the above '''palmrun''' example, the initial run will use the permanent file
    3839{{{
    3940      ~/palm/current_version/JOBS/abcde/INPUT/abcde_p3d
    4041}}}
    41 and for restart runs the permanent file
     42while restart runs will use
    4243{{{
    4344      ~/palm/current_version/JOBS/abcde/INPUT/abcde_p3dr
    4445}}}
    45 is used. Only with restart runs the local file [../iofiles#BININ BININ] is made available as input file, because the appropriate file connection statement also contains the character string {{{"d3r"}}} in the third column. This is logical and necessary since in BININ the binary data, produced by the model of the preceding job of the chain, are expected and the initial run does not need these data. The permanent names of this input file (local name BININ) and the corresponding output file (local name [../iofiles#BINOUT BINOUT]) are identical and read
     46
     47The binary restart data (see [../iofiles#BININ BININ]) is provided as input file only in case of restart runs, because {{{"d3r"}}} appears as activation string in the respective file connection statement (see the above example). The permanent names of this input file (local name BININ) and the corresponding output file (local name [../iofiles#BINOUT BINOUT]) are identical and read
    4648{{{
    47       ~/palm/current_version/JOBS/abcde/RESTART/abcde_d3d
     49      $restart_data_path/abcde/RESTART/abcde_d3d
    4850}}}
    49 However, after the file produced by the previous job was read in by the model and after the local file BINOUT was produced at the end of the job, the restart job does not overwrite this permanent file ({{{.../abcde_d3d}}}) with the new data. Instead of that, it is examined whether already a permanent file with the name {{{.../abcde_d3d}}} exists when copying the output file (BINOUT) of '''palmrun'''. If this is the case, BINOUT is copied to the file {{{.../abcde_d3d.1}}}. Even if this file is already present, {{{.../abcde_d3d.2}}} is tried etc. For an input file the highest existing cycle of the respective permanent file is copied. In the example above this means: the initial run creates the permanent file {{{.../abcde_d3d}}}, the first restart run uses this file and creates {{{.../abcde_d3d.1}}}, the second restart run creates {{{.../abcde_d3d.2}}} etc. After completion of the job chain the user can still access all files created by the jobs. This makes it possible for the user for example to restart the model run of a certain job of the job chain again.\\\\
    50 Therefore restart jobs can not only be started automatically through '''palmrun''', but also manually by the user. This is necessary e.g. whenever after the end of a job chain it is decided that the simulation must be continued further, because the phenomenon which should be examined did not reach the desired state yet. In such cases the '''palmrun''' options completely correspond to those of the initial call; simply the {{{"#"}}} characters in the arguments of options {{{-r}}}, {{{-i}}} and {{{-o}}} must be replaced by {{{"f"}}}.\\\\
     51However, '''palmrun''' does not overwrite the restart data from the previous job with the new data that is output at the end of the current run. Instead, the local output file BINOUT is copied to a permanent file with a cycle number suffix, i.e.
     52{{{
     53      $restart_data_path/abcde/RESTART/abcde_d3d.001
     54}}}
     55If a file with that cycle number already exists, the cycle number is incremented and {{{abcde_d3d.002}}} is created instead.
     56
     57Concerning the restart data input file, the highest existing cycle of the respective permanent file will be used.
     58
     59In the example given above, the initial run creates the permanent file {{{.../abcde_d3d.000}}}, the first restart run uses this file and creates {{{.../abcde_d3d.001}}}, the second restart run creates {{{.../abcde_d3d.002}}} etc. You can still access all files created by the runs after the job chain has finished. For example, this allows you to re-run the model starting from a different position in the job chain by manually calling '''palmrun''' with argument {{{d3r}}}. In that case, you first need to remove all file cycles beyond the one you want to start from.
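
The cycle-number handling described above can be sketched as follows. This is a hedged illustration, not the logic of the '''palmrun''' script itself: the helper names are hypothetical, the three-digit suffix is taken from the examples above, and the functions work on a plain list of file names rather than on a real directory.

```python
import re


def existing_cycles(base: str, filenames: list[str]) -> list[int]:
    """Return the sorted cycle numbers of all files named '<base>.<nnn>'."""
    pattern = re.compile(re.escape(base) + r"\.(\d{3})")
    cycles = []
    for name in filenames:
        match = pattern.fullmatch(name)
        if match:
            cycles.append(int(match.group(1)))
    return sorted(cycles)


def input_file(base: str, filenames: list[str]):
    """Restart input: the file with the highest existing cycle, or None."""
    cycles = existing_cycles(base, filenames)
    return f"{base}.{cycles[-1]:03d}" if cycles else None


def next_output_file(base: str, filenames: list[str]) -> str:
    """Restart output: one cycle beyond the highest existing one (000 if none)."""
    cycles = existing_cycles(base, filenames)
    next_cycle = cycles[-1] + 1 if cycles else 0
    return f"{base}.{next_cycle:03d}"


files = ["abcde_d3d.000", "abcde_d3d.001"]
print(input_file("abcde_d3d", files))        # abcde_d3d.001
print(next_output_file("abcde_d3d", files))  # abcde_d3d.002
```

The sketch also shows why later cycles must be removed before re-running from an earlier position: as long as {{{abcde_d3d.001}}} exists, it is the highest cycle and would be picked as input.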
     60
     61
    5162
    5263= Handling of large (restart) files =