= Job chains / Restart runs =

Batch systems generally limit the CPU time that a job is allowed to request, e.g. to a maximum of 12 or 24 hours. If a simulation needs more time, it has to be split into several parts (jobs). The first job is called the ''initial'' run or job, the following ones are called ''restart'' runs or jobs. Together they form a so-called ''job chain''. A restart run requires as input the state of all flow variables as calculated in the final time step of the previous run. The previous run must therefore write these data to a so-called ''restart file'', which is a required input file for the next restart run.

{{{palmrun}}} allows you to automatically generate job chains and to handle the restart files. Of course, automatic generation does not work if you run PALM in interactive mode. 

A job started by '''[../../app/palmrun palmrun]''' is queued by the queuing system of the local or remote computer in a suitable job class that fulfills the requirements set via the {{{palmrun}}} options {{{-t}}}, {{{-m}}}, {{{-X}}}, and {{{-T}}}, which define the requested CPU time, memory, and number of cores. Each job class only permits jobs up to certain maximum requirements (e.g. the allowed CPU time or the maximum number of cores that can be used by the job). Some queuing systems automatically sort jobs into the respective job class, others require the class to be set explicitly. You can set the job class via {{{palmrun}}} option {{{-q}}}. The job classes are important for the scheduling process of the computer: jobs with small requirements usually start very quickly, whereas jobs with larger requirements must wait longer (sometimes several days). Be aware that the available job classes vary considerably between computing centers.
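
The automatic sorting into a job class can be pictured as picking the smallest class whose limits cover the request. The following shell sketch assumes a hypothetical table of three classes (names and limits are invented for illustration) and a hypothetical function {{{pick_class}}}; it does not reproduce the logic of any particular queuing system.
{{{
# Hypothetical job classes: name, max. CPU time (s), max. number of cores.
# Real class names and limits differ between computing centers.
JOB_CLASSES='small 3600 64
medium 43200 1024
large 86400 8192'

# Print the first (smallest) class that covers the requested CPU time and
# number of cores; return 1 if no class fits.
pick_class() {
    req_time=$1
    req_cores=$2
    while read -r name max_time max_cores; do
        if [ "$req_time" -le "$max_time" ] && [ "$req_cores" -le "$max_cores" ]; then
            echo "$name"
            return 0
        fi
    done <<EOF
$JOB_CLASSES
EOF
    return 1
}

pick_class 7200 128    # prints: medium
}}}
A request that exceeds all limits (here more than 86400 s) fits into no class; this is the situation that job chains are meant to resolve.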

Before starting a run, you have to estimate how much CPU time your complete simulation will need. The required time in seconds has to be given with '''palmrun''' option {{{-t}}}. Because the model uses a variable time step by default, the number of time steps to be carried out, and consequently the time required to finish the simulation, can often only be estimated roughly. It may therefore happen that more time is needed to finish the simulation than indicated by option {{{-t}}}. Normally, the job scheduler then terminates the job as soon as the available CPU time is consumed. In principle, you could avoid this problem by giving a very generous value for {{{-t}}}, but the maximum allowed CPU time is often limited by job class restrictions.
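
A rough estimate can be obtained by dividing the simulated time by an assumed mean time step and multiplying by the CPU time needed per step. In the sketch below, the function name {{{estimate_cpu_time}}} and all numbers are hypothetical; the mean time step and the cost per step have to be taken from experience, e.g. from a short test run.
{{{
# Hypothetical helper: rough CPU-time estimate (in s) for palmrun option -t.
# Arguments: simulated time (end_time, s), assumed mean time step (s),
# assumed CPU time per time step (s). Integer arithmetic only.
estimate_cpu_time() {
    end_time=$1
    mean_dt=$2
    cost_per_step=$3
    n_steps=$(( end_time / mean_dt ))
    echo $(( n_steps * cost_per_step ))
}

# 10 h simulated time, mean time step 2 s, 3 s CPU time per step:
estimate_cpu_time 36000 2 3    # prints: 54000 (= 15 h)
}}}
With a job class limit of 12 hours (43200 s), such a simulation would not fit into a single job.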

'''Restart runs''' are the method to circumvent these job class restrictions. During time stepping, PALM continuously checks how much time is left for the execution of the job. If the run cannot be finished before this time expires, PALM stops and outputs (nearly) all model variables (especially the 3d prognostic quantities) in binary format to a file (or folder) with local name [../iofiles#BINOUT BINOUT]. After the local output files have been saved, '''palmrun''' automatically generates a restart run. For this purpose, a new '''palmrun''' call is initiated automatically, i.e. '''palmrun''' recursively calls itself. The '''palmrun''' options of this call correspond to those of the initial call (apart from the modification of the activation strings described below). In the restart run, PALM first reads the binary data written by the previous run and then continues the simulation from the final state of that run. If the simulation still cannot be finished, another restart run is generated, and so on, until the time to be simulated (set via parameter {{{end_time}}}) is reached. This way a whole set of restart runs may be generated: a so-called job chain.
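
As a rough illustration of how long a job chain becomes, the number of jobs can be estimated by dividing the total required time by the time each job can actually use for time stepping, i.e. the time given with {{{-t}}} minus the time reserved for writing the restart data (see '''termination_time_needed''' below). The function {{{chain_length}}} is a hypothetical sketch that ignores the additional time restart runs need for reading the restart data.
{{{
# Hypothetical helper: estimated number of jobs in a chain.
# Arguments: total CPU time the simulation needs (s), value of palmrun
# option -t (s), time reserved at the end of each job for output (s).
chain_length() {
    total=$1
    job_time=$2
    reserve=$3
    usable=$(( job_time - reserve ))
    # integer ceiling division: (a + b - 1) / b
    echo $(( (total + usable - 1) / usable ))
}

# 15 h of CPU time, 12 h per job, 10 min reserved per job:
chain_length 54000 43200 600    # prints: 2 (one initial run, one restart run)
}}}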

Restart runs require certain entries in the file-connection file (see [source:palm/trunk/SCRIPTS/.palm.iofiles .palm.iofiles] and its [wiki:doc/app/palm_iofiles description]) and in the parameter file, which are explained in the following.

The following entries are important and are already contained in the default file-connection file: 
{{{
      PARIN in:tr     d3#   $base_data/$run_identifier/INPUT _p3d*
      PARIN in:tr     d3r   $base_data/$run_identifier/INPUT _p3dr*
      BININ in:lnpe   d3r   $restart_data_path/$run_identifier/RESTART _d3d*
      #
      BINOUT* out:lnpe  restart  $restart_data_path/$run_identifier/RESTART _d3d
}}}
The '''palmrun''' call for the initial run of the job chain reads:
{{{
      palmrun -c <configuration identifier> -d abcde -t 900 -r "d3# restart"
}}}
The environment variable {{{write_binary}}} must be assigned the value {{{true}}}. Only then does the model write binary data for a possible restart run to the local file [../iofiles#BINOUT BINOUT] at the end of the run (when running on more than one core, BINOUT is a directory). This output file/directory must then be stored as a permanent file/directory by an appropriate file connection statement (the BINOUT statement, last line of the file-connection example above). Both the setting of {{{write_binary}}} and this connection statement are only activated by '''palmrun''' if the character string {{{restart}}} is given with option {{{-r}}} in the '''palmrun''' call. Therefore, the example above can also be used if no restart runs are intended: in such cases, the string {{{restart}}} can simply be omitted from option {{{-r}}}.\\\\
Only if {{{write_binary=true}}} is the model instructed to compute the remaining CPU time after each time step and to stop shortly before this time expires if the run cannot be completed. More precisely, the model stops when the difference between the available job time (set by '''palmrun''' option {{{-t}}}) and the time used so far by the job becomes smaller than the time given by the runtime parameter [../runtime_parameters#termination_time_needed termination_time_needed]. This parameter should be set to the time needed for copying the binary data for restart runs, for transferring result data, etc. (as long as this is part of the job). Thus, as soon as the remaining job time is less than '''termination_time_needed''', the model stops time stepping and writes the data for a restart run to the local binary file/directory [../iofiles#BINOUT BINOUT]. The so-called [../initialization_parameters initialization parameters] are also written to this file. In a last step, the model produces another file with the local name [[CONTINUE_RUN]]. The presence of this file signals '''palmrun''' that a restart run is required, and '''palmrun''' then starts an appropriate job.\\\\
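
The stop criterion is a simple comparison. The shell function below only illustrates that comparison (PALM performs the check internally); the name {{{must_stop}}} and the value of 60 s for '''termination_time_needed''' are assumptions chosen for the example.
{{{
# Illustration of the stop criterion: returns success (0) if the remaining
# job time is smaller than termination_time_needed.
must_stop() {
    job_time=$1                   # time requested with palmrun option -t (s)
    time_used=$2                  # time used by the job so far (s)
    termination_time_needed=$3    # runtime parameter (s)
    [ $(( job_time - time_used )) -lt "$termination_time_needed" ]
}

# Job started with -t 900, hypothetical termination_time_needed = 60 s:
for used in 830 850; do
    if must_stop 900 "$used" 60; then
        echo "after $used s: stop, write BINOUT and CONTINUE_RUN"
    else
        echo "after $used s: continue time stepping"
    fi
done
}}}
After 830 s, 70 s remain and time stepping continues; after 850 s, only 50 s remain and the run is terminated.
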
During the initial phase of a restart run, different actions are required than during the initial phase of an initial run. The model must read the binary data written by the preceding run at the beginning of the run. Beyond that, it also reads the initialization parameters from this file. Therefore, these do not need to be given in the parameter file (local name [../iofiles#PARIN PARIN]). If they are given nevertheless with values that differ from those of the initial run, these values are ignored. There is exactly one exception to this rule: the initialization parameter [../initialization_parameters#initializing_actions initializing_actions] determines whether the job is a restart run or an initial run. If '''initializing_actions''' = '' 'read_restart_data','' it is a restart run, otherwise an initial run. Consequently, job chains require two different parameter files (local name PARIN). One is needed for the initial run and contains all initialization parameters set by the user; the other one is needed for restart runs. The latter only contains the initialization parameter '''initializing_actions''', which must have the value '' 'read_restart_data' '' (initialization parameters with values different from the initial run may also appear in this file, but they will be ignored). Therefore, users who want to run job chains must provide two different parameter files. Since the model always expects the parameter file under the local name PARIN, two different file connection statements must be given for this file in the file-connection file. One must be active only for the initial run, the other one only for restart runs.
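
Since only '''initializing_actions''' decides whether a run is a restart run, a minimal restart parameter file and a check for it can be sketched as follows. The function {{{run_type}}} is hypothetical and uses a simple text search rather than a real namelist parser; the namelist group name {{{&initialization_parameters}}} is assumed, and any further namelist groups (e.g. runtime parameters) are omitted here.
{{{
# Hypothetical helper: classify a parameter file (PARIN) as restart or
# initial run by searching for initializing_actions = 'read_restart_data'.
# A plain case-insensitive text search, not a Fortran namelist parser.
run_type() {
    if grep -qi "initializing_actions *= *'read_restart_data'" "$1"; then
        echo restart
    else
        echo initial
    fi
}

# Minimal restart parameter file (other namelist groups omitted):
example=$(mktemp)
cat > "$example" <<'EOF'
&initialization_parameters
    initializing_actions = 'read_restart_data',
/
EOF
run_type "$example"    # prints: restart
rm -f "$example"
}}}
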
The '''palmrun''' call for the initial run shown above activates the first of the two PARIN connection statements, because the character string {{{d3#}}} given with option {{{-r}}} coincides with the character string in the third column of this statement. For restart runs, the following statement must be active instead:
{{{
      PARIN in:tr     d3r   $base_data/$run_identifier/INPUT _p3dr*
}}}
This statement only becomes active if option {{{-r}}} is given the value {{{d3r}}}. Since the '''palmrun''' call for a restart run is produced automatically (not by the user), '''palmrun''' has to replace {{{"d3#"}}} of the initial run with {{{"d3r"}}} in the call of the restart run. In fact, for restart runs all {{{"#"}}} characters within the strings given for options {{{-r}}}, {{{-i}}} and {{{-o}}} are replaced by {{{"r"}}}.\\\\
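
The selection of connection statements by activation strings, together with the replacement of {{{"#"}}} by {{{"r"}}} for restart runs, can be sketched in shell. The functions {{{restart_strings}}} and {{{active_entries}}} are hypothetical illustrations of the rules described above, not code taken from '''palmrun'''; the example entries are those of the default file-connection file shown above.
{{{
# Hypothetical helper: activation strings for a restart run ('#' -> 'r').
restart_strings() {
    printf '%s\n' "$1" | tr '#' 'r'
}

# Hypothetical helper: read file connection statements from stdin and print
# those whose third column equals one of the given activation strings.
active_entries() {
    strings=$1
    while read -r local_name attributes activation rest; do
        for s in $strings; do
            if [ "$activation" = "$s" ]; then
                echo "$local_name $attributes $activation $rest"
            fi
        done
    done
}

ENTRIES='PARIN in:tr d3# $base_data/$run_identifier/INPUT _p3d*
PARIN in:tr d3r $base_data/$run_identifier/INPUT _p3dr*
BININ in:lnpe d3r $restart_data_path/$run_identifier/RESTART _d3d*
BINOUT* out:lnpe restart $restart_data_path/$run_identifier/RESTART _d3d'

initial='d3# restart'
# initial run: first PARIN statement and BINOUT
printf '%s\n' "$ENTRIES" | active_entries "$initial"
# restart run ("d3r restart"): second PARIN statement, BININ and BINOUT
printf '%s\n' "$ENTRIES" | active_entries "$(restart_strings "$initial")"
}}}
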
For example, for the initial run the permanent file
{{{
      ~/palm/current_version/JOBS/abcde/INPUT/abcde_p3d
}}}
and for restart runs the permanent file
{{{
      ~/palm/current_version/JOBS/abcde/INPUT/abcde_p3dr
}}}
is used. The local file [../iofiles#BININ BININ] is provided as input file only for restart runs, because its file connection statement also contains the character string {{{"d3r"}}} in the third column. This is logical and necessary, since BININ is expected to contain the binary data produced by the preceding job of the chain, which the initial run does not need. The permanent names of this input file (local name BININ) and of the corresponding output file (local name [../iofiles#BINOUT BINOUT]) are identical:
{{{
      ~/palm/current_version/JOBS/abcde/RESTART/abcde_d3d
}}}
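
These names follow from the file connection statements: the permanent file name is formed from the path in the fourth column, the run identifier, and the file extension in the fifth column. The hypothetical function {{{permanent_name}}} below sketches this composition, assuming that {{{base_data}}} and {{{restart_data_path}}} are both set to {{{~/palm/current_version/JOBS}}}; a trailing {{{*}}} in the extension column is simply dropped in this illustration.
{{{
# Hypothetical helper: permanent file name = path / (run identifier + extension).
# A trailing '*' in the extension column is removed for this illustration.
permanent_name() {
    path=$1
    run_identifier=$2
    extension=${3%\*}
    echo "$path/$run_identifier$extension"
}

base_data=~/palm/current_version/JOBS        # assumed setting
restart_data_path=$base_data                 # assumed setting
run_identifier=abcde

permanent_name "$base_data/$run_identifier/INPUT" "$run_identifier" '_p3d*'
permanent_name "$base_data/$run_identifier/INPUT" "$run_identifier" '_p3dr*'
permanent_name "$restart_data_path/$run_identifier/RESTART" "$run_identifier" '_d3d*'
}}}
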
However, a restart job does not overwrite the permanent file {{{.../abcde_d3d}}} with its new data after having read it and produced its own local file BINOUT at the end of the job. Instead, when '''palmrun''' copies the output file BINOUT, it checks whether a permanent file {{{.../abcde_d3d}}} already exists. If this is the case, BINOUT is copied to the file {{{.../abcde_d3d.1}}}. If this file is also present, {{{.../abcde_d3d.2}}} is tried, and so on. For an input file, the highest existing cycle of the respective permanent file is copied. In the example above this means: the initial run creates the permanent file {{{.../abcde_d3d}}}, the first restart run uses this file and creates {{{.../abcde_d3d.1}}}, the second restart run uses {{{.../abcde_d3d.1}}} and creates {{{.../abcde_d3d.2}}}, etc. After completion of the job chain, the user can still access all files created by the jobs. This allows the user, for example, to repeat the model run of a certain job of the job chain.\\\\
Restart jobs can therefore be started not only automatically by '''palmrun''', but also manually by the user. This is necessary, e.g., when it is decided after the end of a job chain that the simulation must be continued, because the phenomenon to be examined has not yet reached the desired state. In such cases, the '''palmrun''' options are identical to those of the initial call, except that the {{{"#"}}} characters in the arguments of options {{{-r}}}, {{{-i}}} and {{{-o}}} must be replaced by {{{"r"}}}.\\\\
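
The cycle numbering of the permanent restart files can be sketched with two hypothetical shell functions: {{{next_output_cycle}}} returns the name under which a new BINOUT would be stored, {{{latest_input_cycle}}} the highest existing cycle, which is provided as BININ. This illustrates the rules described above and is not '''palmrun's''' own implementation; it assumes that the cycle numbers have no gaps.
{{{
# Hypothetical helper: name under which a new output file (BINOUT) is stored.
# $1 is the base name, e.g. .../RESTART/abcde_d3d
next_output_cycle() {
    base=$1
    if [ ! -e "$base" ]; then
        echo "$base"
        return
    fi
    n=1
    while [ -e "$base.$n" ]; do
        n=$(( n + 1 ))
    done
    echo "$base.$n"
}

# Hypothetical helper: highest existing cycle, used as input file (BININ).
latest_input_cycle() {
    base=$1
    [ -e "$base" ] || return 1
    latest=$base
    n=1
    while [ -e "$base.$n" ]; do
        latest=$base.$n
        n=$(( n + 1 ))
    done
    echo "$latest"
}

dir=$(mktemp -d)
base=$dir/abcde_d3d
next_output_cycle "$base"     # initial run writes .../abcde_d3d
touch "$base"
latest_input_cycle "$base"    # first restart run reads .../abcde_d3d
next_output_cycle "$base"     # ... and writes .../abcde_d3d.1
rm -rf "$dir"
}}}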

= Handling of large (restart) files =

For very large files, copying data from and to '''palmrun's''' temporary working directory may take a long time. The CPU cores requested for the job are idle during that time, and copying may consume a significant amount of the job time. The time required for copying can be saved by using a file link instead of copying the data.
{{{
   cp large_local_file  large_permanent_file                                 # may take a long time
   ln existing_large_local_TARGET_file  LINK_NAME_to_large_local_file        # is done immediately, i.e. requires almost no time
}}}
You can tell '''palmrun''' to use {{{ln}}} instead of {{{cp}}} by giving the file attribute {{{ln}}} in the respective file connection statement, e.g.:
{{{
BININ   in:loc:lnpe  d3r       $base_data/$run_identifier/RESTART  _d3d
BINOUT  out:loc:lnpe restart   $base_data/$run_identifier/RESTART  _d3d
}}}
However, a link named LINK_NAME can only be created if it is located on the same physical file system as its TARGET file. If TARGET file and LINK_NAME are on different file systems, the TARGET file will be copied instead (and the advantage of using the {{{ln}}} command is lost).
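
The behaviour described above (link if possible, otherwise copy) corresponds to the following shell sketch for a single file. The function {{{link_or_copy}}} is hypothetical; it uses a hard link ({{{ln}}} without {{{-s}}}), which fails if TARGET and LINK_NAME are on different file systems, and then falls back to {{{cp}}}. Directories (e.g. BINOUT of a multi-core run) are not handled here.
{{{
# Hypothetical helper: provide TARGET under LINK_NAME by a hard link if
# possible (almost no time needed), otherwise by copying (may take long).
link_or_copy() {
    target=$1
    link_name=$2
    if ln "$target" "$link_name" 2>/dev/null; then
        echo "linked"
    else
        cp "$target" "$link_name" && echo "copied"
    fi
}

# Example with a small file in a temporary directory:
dir=$(mktemp -d)
echo "restart data" > "$dir/TARGET"
link_or_copy "$dir/TARGET" "$dir/LINK_NAME"
rm -rf "$dir"
}}}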

Most computing centers provide a file system for fast I/O, and this should be used for '''palmrun's''' temporary working directory, which can be set in the configuration file via the environment variable {{{tmp_user_catalog}}}. Since the large (TARGET) files should be on the same file system as the links in the working directory, the user should provide a directory on that file system for storing the large files. Corresponding settings in the configuration file could be (example for the Cray XC40 at HLRN):
{{{
#
# folder in which palmrun's temporary working directory is created (will be deleted after end of job)
%tmp_user_catalog    /gfs2/work/<replace by username>     lccrayh parallel
#
# folder in which large binary files shall be stored
%tmp_data_catalog    /gfs2/work/<replace by username>     lccrayh parallel
#
# file connection statements for restart files
BININ   in:loc:lnpe  d3r       $tmp_data_catalog/$run_identifier/RESTART  _d3d
BINOUT  out:loc:lnpe restart   $tmp_data_catalog/$run_identifier/RESTART  _d3d
}}}
Such fast file systems generally do not allow files to be stored for a longer time, so users have to take care of archiving the files themselves.
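
Whether the directories set via {{{tmp_user_catalog}}} and {{{tmp_data_catalog}}} are on the same file system (so that links can be used) can be checked, for example, by comparing their mount points as reported by {{{df -P}}}. The function {{{same_fs}}} is a hypothetical sketch; bind mounts and unusual {{{df}}} output (e.g. file system names containing blanks) are not handled, and the paths below are placeholders for the ones in your configuration file.
{{{
# Hypothetical helper: succeed if both directories are on the same mounted
# file system, judged by the mount point column of POSIX 'df -P' output.
same_fs() {
    mount1=$(df -P "$1" | awk 'NR == 2 { print $6 }')
    mount2=$(df -P "$2" | awk 'NR == 2 { print $6 }')
    [ -n "$mount1" ] && [ "$mount1" = "$mount2" ]
}

tmp_user_catalog=/tmp    # placeholder: replace by the configured path
tmp_data_catalog=/tmp    # placeholder: replace by the configured path
if same_fs "$tmp_user_catalog" "$tmp_data_catalog"; then
    echo "links possible"
else
    echo "large files would be copied"
fi
}}}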