Changes between Version 14 and Version 15 of doc/app/runs


Timestamp:
Nov 21, 2018 5:29:29 PM
Author:
scharf
Comment:

--

  • doc/app/runs

    v14 v15  
    1313      %write_binary true restart
    1414      #
    15       PARIN in:job      d3#   $base_data/$fname/INPUT _p3d
    16       PARIN in:job      d3f   $base_data/$fname/INPUT _p3df
    17       BININ in:loc:lnpe d3f   $base_data/$fname/RESTART _d3d
     15      PARIN in:job      d3#   $base_data/$run_identifier/INPUT _p3d
     16      PARIN in:job      d3r   $base_data/$run_identifier/INPUT _p3dr
     17      BININ in:loc:lnpe d3r   $base_data/$run_identifier/RESTART _d3d
    1818      #
    19       BINOUT out:loc:lnpe  restart  $base_data/$fname/RESTART _d3d
     19      BINOUT out:loc:lnpe  restart  $base_data/$run_identifier/RESTART _d3d
    2020}}}
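Each connection statement above consists of blank-separated columns: the local file name, the file attributes (e.g. {{{in:job}}}), the activation string (third column), the catalog and the file suffix. The sketch below is a hypothetical Python helper, not part of '''mrun''', that splits one statement and builds its permanent file name. The naming rule "catalog + {{{/}}} + run identifier + suffix" and the value {{{~/palm/current_version/JOBS}}} for {{{$base_data}}} are assumptions inferred from the example paths on this page (e.g. {{{~/palm/current_version/JOBS/abcde/RESTART/abcde_d3d}}}).

```python
from dataclasses import dataclass
from string import Template


@dataclass
class Connection:
    local_name: str  # local file name seen by the model, e.g. BININ
    attributes: str  # e.g. "in:job" or "out:loc:lnpe"
    activation: str  # third column, matched against -r, e.g. "d3r"
    path: str        # catalog with $variables, e.g. "$base_data/$run_identifier/RESTART"
    suffix: str      # file suffix, e.g. "_d3d"


def parse_connection(line: str) -> Connection:
    """Split one file connection statement into its five columns."""
    local_name, attributes, activation, path, suffix = line.split()
    return Connection(local_name, attributes, activation, path, suffix)


def permanent_name(conn: Connection, variables: dict) -> str:
    """Expand $variables in the catalog and append <run_identifier><suffix>.

    Assumed naming rule, inferred from the example paths in the text.
    """
    catalog = Template(conn.path).substitute(variables)
    return f"{catalog}/{variables['run_identifier']}{conn.suffix}"


# Hypothetical values for $base_data and $run_identifier
variables = {"base_data": "~/palm/current_version/JOBS", "run_identifier": "abcde"}
bininfo = parse_connection("BININ in:loc:lnpe d3r $base_data/$run_identifier/RESTART _d3d")
print(permanent_name(bininfo, variables))
```

{{{string.Template}}} is used so that {{{$base_data}}} and {{{$run_identifier}}} are expanded as whole identifiers, without one variable name accidentally matching the prefix of another.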
    2121The '''mrun''' call for the initial run of the job chain must look as follows:
     
    2525The specification of the environment variable {{{write_binary}}}, which must be assigned the value {{{true}}}, is essential. Only then does the model write binary-coded data for a possible restart run to the local file [../iofiles#BINOUT BINOUT] at the end of the run (when running on more than one core, BINOUT is a directory). This output file/directory must then be stored as a permanent file/directory by an appropriate file connection statement (last line of the example above). Both instructions (the variable declaration and the connection statement) are carried out by '''mrun''' only if the character string {{{restart}}} is given with the option {{{-r}}} in the '''mrun''' call. Thus the example above can also be used if no restart runs are intended; in that case, simply omit the string {{{restart}}} from the {{{-r}}} option.\\\\
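The activation rule described above can be sketched in Python. This is a hypothetical illustration, not '''mrun''' code: it assumes that the option {{{-r}}} takes a blank-separated list of activation strings (for example {{{"d3# restart"}}}) and that a connection statement or variable declaration is active when its third column appears in that list.

```python
from typing import List


def active_statements(config: str, r_option: str) -> List[str]:
    """Return the first column of every active configuration line.

    Assumed rule: a line is active if its third column occurs in the
    blank-separated list of strings given with -r.
    """
    activation_strings = set(r_option.split())
    active = []
    for line in config.splitlines():
        columns = line.split()
        if len(columns) < 3 or columns[0].startswith("#"):
            continue  # skip comment lines and lines without an activation column
        if columns[2] in activation_strings:
            active.append(columns[0])
    return active


CONFIG = """\
%write_binary true restart
#
PARIN in:job      d3#   $base_data/$run_identifier/INPUT _p3d
PARIN in:job      d3r   $base_data/$run_identifier/INPUT _p3dr
BININ in:loc:lnpe d3r   $base_data/$run_identifier/RESTART _d3d
#
BINOUT out:loc:lnpe  restart  $base_data/$run_identifier/RESTART _d3d
"""

print(active_statements(CONFIG, "d3# restart"))  # → ['%write_binary', 'PARIN', 'BINOUT']
print(active_statements(CONFIG, "d3#"))          # → ['PARIN']
```

Without {{{restart}}} in the {{{-r}}} list, neither {{{%write_binary}}} nor the BINOUT statement is active, which is why the same configuration also serves runs without restarts.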
    2626Only if {{{write_binary=true}}} is specified does the model compute the remaining CPU time after each time step and, if the run cannot be completed within the available time, stop shortly before this time expires. More precisely, the model stops when the difference between the available job time (set by the '''mrun''' option {{{-t}}}) and the time used so far by the job becomes smaller than the time given by the model variable [../d3par#termination_time_needed termination_time_needed]. With '''termination_time_needed''' the user specifies how much time must be reserved for copying the binary data for restart runs, as well as for transferring result data etc. (as far as this is part of the job). Thus, as soon as the remaining job time is less than '''termination_time_needed''', the model stops the time-step loop and writes the data for a restart run to the local binary file/directory [../iofiles#BINOUT BINOUT]. The so-called [../inipar initialization parameters] are also written to this file. Finally, the model creates another file with the local name [[CONTINUE_RUN]]. The presence of this file signals to '''mrun''' that a restart run is required and triggers the start of an appropriate job.\\\\
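The stop criterion can be written out as a small simulation. The functions {{{run_time_steps}}} and {{{signal_restart}}} are hypothetical, and the fixed cost per time step is a simplification; the actual model measures the CPU time it has used so far.

```python
from pathlib import Path
from typing import Tuple


def run_time_steps(n_steps: int, step_cost: float, job_time: float,
                   termination_time_needed: float) -> Tuple[int, bool]:
    """Advance up to n_steps time steps; return (steps_done, restart_needed).

    After each step the remaining job time is checked. If the run is not yet
    complete and less than termination_time_needed remains, the loop stops.
    """
    time_used = 0.0
    for step in range(1, n_steps + 1):
        time_used += step_cost  # one (hypothetical) fixed-cost time step
        remaining = job_time - time_used
        if step < n_steps and remaining < termination_time_needed:
            # here the model would write its restart data to BINOUT
            return step, True
    return n_steps, False


def signal_restart(workdir: Path) -> None:
    """Create CONTINUE_RUN; per the text, its presence tells mrun to start a restart job."""
    (workdir / "CONTINUE_RUN").touch()


steps, restart_needed = run_time_steps(20, 10.0, 100.0, 25.0)
print(steps, restart_needed)  # → 8 True
```

With a job time of 100 s, 10 s per time step and 25 s reserved, the loop stops after step 8, when only 20 s of job time remain.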
    27 During the initial phase of a restart run, the model has to perform different actions than during the initial phase of an initial run. In this case the model must first read in the binary data written by the preceding run. It also reads the initialization parameters from this file, so these do not need to be specified in the parameter file (local name [../iofiles#PARIN PARIN]). If they are specified nevertheless and their values differ from those of the initial run, the new values are ignored. There is exactly one exception to this rule: the initialization parameter [../inipar#initializing_actions initializing_actions] determines whether the job is a restart run or an initial run. If '''initializing_actions''' = '' 'read_restart_data','' it is a restart run, otherwise an initial run. Consequently, the model needs two different parameter files (local name PARIN) for job chains. One is needed for the initial run and contains all initialization parameters set by the user; the other is needed for restart runs. The latter only contains the initialization parameter '''initializing_actions''' (other initialization parameters with values different from the initial run may also appear in this file, but they will be ignored), which must have the value '' 'read_restart_data'.'' Therefore users who want to run job chains must create two different parameter files. Since the model always expects the parameter file under the local name PARIN, two different file connection statements must be given for this file in the configuration file: one that is active only for the initial run, and one that is active only for restart runs. 
The '''mrun''' call for the initial run shown above activates the first of the two connection statements, because the character string {{{d3#}}} given with the option {{{-r}}} matches the character string in the third column of that statement. The next statement, in contrast, must be active
     27During the initial phase of a restart run, the model has to perform different actions than during the initial phase of an initial run. In this case the model must first read in the binary data written by the preceding run. It also reads the initialization parameters from this file, so these do not need to be specified in the parameter file (local name [../iofiles#PARIN PARIN]). If they are specified nevertheless and their values differ from those of the initial run, the new values are ignored. There is exactly one exception to this rule: the initialization parameter [../inipar#initializing_actions initializing_actions] determines whether the job is a restart run or an initial run. If '''initializing_actions''' = '' 'read_restart_data','' it is a restart run, otherwise an initial run. Consequently, the model needs two different parameter files (local name PARIN) for job chains. One is needed for the initial run and contains all initialization parameters set by the user; the other is needed for restart runs. The latter only contains the initialization parameter '''initializing_actions''' (other initialization parameters with values different from the initial run may also appear in this file, but they will be ignored), which must have the value '' 'read_restart_data'.'' Therefore users who want to run job chains must create two different parameter files. Since the model always expects the parameter file under the local name PARIN, two different file connection statements must be given for this file in the configuration file: one that is active only for the initial run, and one that is active only for restart runs. 
The '''mrun''' call for the initial run shown above activates the first of the two connection statements, because the character string {{{d3#}}} given with the option {{{-r}}} matches the character string in the third column of that statement. The next statement, in contrast, must be active
    2828{{{
    29       PARIN in:job d3f  $base_data/$fname/INPUT _p3df
     29      PARIN in:job d3r  $base_data/$run_identifier/INPUT _p3dr
    3030}}}
    31 for restart runs. Since this statement only becomes active if the option {{{-r}}} is given the value {{{d3f}}}, and since the '''mrun''' call for the restart run is generated automatically (not by the user), '''mrun''' has to replace the {{{"d3#"}}} of the initial run with {{{"d3f"}}} in the call of the restart run. In fact, for restart runs all {{{"#"}}} characters within the strings given for the options {{{-r}}}, {{{-i}}} and {{{-o}}} are replaced by {{{"f"}}}.\\\\
     31for restart runs. Since this statement only becomes active if the option {{{-r}}} is given the value {{{d3r}}}, and since the '''mrun''' call for the restart run is generated automatically (not by the user), '''mrun''' has to replace the {{{"d3#"}}} of the initial run with {{{"d3r"}}} in the call of the restart run. In fact, for restart runs all {{{"#"}}} characters within the strings given for the options {{{-r}}}, {{{-i}}} and {{{-o}}} are replaced by {{{"r"}}}.\\\\
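The substitution for restart runs amounts to a plain string replacement. The option dictionary, the example value {{{"3600"}}} for {{{-t}}} and the initial {{{-r}}} value {{{"d3# restart"}}} are hypothetical; only the rule itself ({{{#}}} becomes {{{r}}} in the {{{-r}}}, {{{-i}}} and {{{-o}}} strings) comes from the text above.

```python
from typing import Dict

# Options whose strings are rewritten for restart runs (from the text above).
REWRITTEN_OPTIONS = ("-r", "-i", "-o")


def restart_options(options: Dict[str, str]) -> Dict[str, str]:
    """Return the options for a restart run: every '#' becomes 'r' in -r, -i and -o."""
    return {
        opt: value.replace("#", "r") if opt in REWRITTEN_OPTIONS else value
        for opt, value in options.items()
    }


initial = {"-t": "3600", "-r": "d3# restart"}  # hypothetical initial-run options
restart = restart_options(initial)
print(restart["-r"])  # → d3r restart
```

With the restart value {{{"d3r restart"}}}, the {{{d3r}}} statements for PARIN and BININ become active, and BINOUT remains active through {{{restart}}}.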
    3232For example, for the initial run the permanent file
    3333{{{
     
    3636and for restart runs the permanent file
    3737{{{
    38       ~/palm/current_version/JOBS/abcde/INPUT/abcde_p3df
     38      ~/palm/current_version/JOBS/abcde/INPUT/abcde_p3dr
    3939}}}
    40 is used. The local file [../iofiles#BININ BININ] is provided as an input file only for restart runs, because its file connection statement also contains the character string {{{"d3f"}}} in the third column. This is necessary because BININ is expected to contain the binary data produced by the model in the preceding job of the chain, which the initial run does not need. The permanent names of this input file (local name BININ) and the corresponding output file (local name [../iofiles#BINOUT BINOUT]) are identical and read
     40is used. The local file [../iofiles#BININ BININ] is provided as an input file only for restart runs, because its file connection statement also contains the character string {{{"d3r"}}} in the third column. This is necessary because BININ is expected to contain the binary data produced by the model in the preceding job of the chain, which the initial run does not need. The permanent names of this input file (local name BININ) and the corresponding output file (local name [../iofiles#BINOUT BINOUT]) are identical and read
    4141{{{
    4242      ~/palm/current_version/JOBS/abcde/RESTART/abcde_d3d
     
    5454You can tell '''mrun''' to use {{{ln}}} instead of {{{cp}}} by giving the file attribute {{{ln}}} in the respective file connection statement, e.g.:
    5555{{{
    56 BININ   in:loc:lnpe  d3f       $base_data/$fname/RESTART  _d3d
    57 BINOUT  out:loc:lnpe restart   $base_data/$fname/RESTART  _d3d
     56BININ   in:loc:lnpe  d3r       $base_data/$run_identifier/RESTART  _d3d
     57BINOUT  out:loc:lnpe restart   $base_data/$run_identifier/RESTART  _d3d
    5858}}}
    5959However, a link with the name LINK_NAME pointing to a TARGET file can only be created if LINK_NAME is located on the same physical file system as the TARGET file. If the TARGET file and LINK_NAME are on different file systems, the TARGET file is copied instead (and the advantage of using the {{{ln}}} command is lost).
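The link-or-copy fallback can be sketched with Python's standard library. {{{link_or_copy}}} is a hypothetical helper, not how '''mrun''' implements the {{{ln}}} attribute: it tries a hard link with {{{os.link}}} and copies the file with {{{shutil.copy2}}} only when the operating system reports a cross-device link ({{{EXDEV}}}). It handles single files only, whereas BININ/BINOUT are directories when running on more than one core.

```python
import errno
import os
import shutil


def link_or_copy(target: str, link_name: str) -> str:
    """Hard-link link_name to target; copy instead across file systems.

    Returns "ln" if a link was created and "cp" if the file was copied.
    Any other error (e.g. missing target) is re-raised unchanged.
    """
    try:
        os.link(target, link_name)
        return "ln"
    except OSError as exc:
        if exc.errno != errno.EXDEV:  # fall back only for cross-device links
            raise
        shutil.copy2(target, link_name)  # copy data and metadata instead
        return "cp"
```

When both paths lie in the same directory, the result is normally {{{"ln"}}}; the copy branch is reached only when, for example, the RESTART catalog and the working directory of the job are on different file systems.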
     
    6969#
    7070# file connection statements for restart files
    71 BININ   in:loc:lnpe  d3f       $tmp_data_catalog/$fname/RESTART  _d3d
    72 BINOUT  out:loc:lnpe restart   $tmp_data_catalog/$fname/RESTART  _d3d
     71BININ   in:loc:lnpe  d3r       $tmp_data_catalog/$run_identifier/RESTART  _d3d
     72BINOUT  out:loc:lnpe restart   $tmp_data_catalog/$run_identifier/RESTART  _d3d
    7373}}}
    7474Such fast file systems generally do not allow files to be stored for long periods, so users have to take care of archiving the data themselves.