Changes between Version 19 and Version 20 of doc/app/runs


Timestamp: Apr 13, 2021 2:37:41 PM
Author: raasch
Comment:

--

  • doc/app/runs

    v19 v20  
    1010Before starting a run, you have to estimate how much CPU time your complete simulation will need. The required time in seconds has to be given with '''palmrun''' option {{{-t}}}. Because the model uses a variable time step by default, the number of time steps to be carried out, and consequently the time required to finish the simulation, can often be estimated only roughly. It may therefore happen that the simulation needs more time than specified with option {{{-t}}}. In that case, the job scheduler will normally terminate the job as soon as the available CPU time is consumed. In principle, you can avoid this problem by setting a very generous value for {{{-t}}}, but the maximum allowed CPU time is often limited by job class restrictions.
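A rough value for {{{-t}}} can be obtained from the expected number of time steps times the average CPU time per step, plus a safety margin. The following shell sketch assumes that both numbers are known, e.g. from a short test run; the function name {{{estimate_cpu_time}}} and the margin factor are hypothetical and not part of '''palmrun'''.

{{{
# Hypothetical helper, not part of palmrun: estimate a value for "palmrun -t".
# Arguments: expected number of time steps, average CPU seconds per step,
#            safety factor (e.g. 1.2 for a 20 % margin).
estimate_cpu_time() {
    nsteps=$1
    sec_per_step=$2
    margin=$3
    # awk does the floating-point arithmetic; the result is rounded up
    # to whole seconds because palmrun expects the time in seconds
    awk -v n="$nsteps" -v s="$sec_per_step" -v m="$margin" \
        'BEGIN { t = n * s * m; it = int(t); if (it < t) it++; print it }'
}

# Example: 3000 steps at about 0.25 s per step with a 20 % margin
estimate_cpu_time 3000 0.25 1.2
}}}

With these numbers the sketch prints 900. Since the time step varies during the run, such an estimate can still turn out too low.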
    1111
    12 To avoid this problem '''palmrun''' offers the possibility of so-called '''restart runs'''. During the model run PALM continuously examines how much time is left for the execution of the job. If the run is not completed and finished shortly before expiration of this time, the model stops and writes the values of (nearly) all model variables (especially the 3d-prognostic quantities) in binary form to a file (local name [../iofiles#BINOUT BINOUT]). After copying the output files requested by the user, '''palmrun''' automatically starts a restart run. For this purpose a new '''palmrun''' call is set off automatically on the local computer of the user; '''palmrun''' thus calls itself. The options with this call correspond to a large extent to those which the user had selected with his initial call of '''palmrun'''. The model restarts and this time at the beginning it reads in the binary data written before and continues the run with them. If in this job the CPU time is not sufficient either, in order to terminate the run, at the end of the job another restart run is started, etc., until the time which shall be simulated by the model, is reached. Thus a set of restart runs can develop - a so-called job chain. The first run of this chain (model start at t=0) is called '''initial run'''.\\\\
    13 Working with restart runs and their generation through '''palmrun''' requires certain entries in the palmrun-configuration file and in the parameter file, which are described and explained in the following. The configuration file should contain the following entries:
     12'''Restart runs''' are the method to circumvent these job class restrictions. During time stepping, PALM continuously checks how much time is left for the execution of the job. If the run cannot be completed before this time expires, PALM stops and outputs (nearly) all model variables (especially the 3d-prognostic quantities) in binary format to a file (or folder) with local name [../iofiles#BINOUT BINOUT]. After the local output files have been saved, '''palmrun''' automatically generates a restart run. For this purpose, a new '''palmrun''' call is initiated automatically, i.e. '''palmrun''' recursively calls itself. The '''palmrun''' options of this call correspond to those of the initial call. PALM restarts, reads at the beginning the binary data written by the previous run, and continues the simulation from the final state of that run. If the simulation still cannot be finished, another restart run is generated, and so on, until the time to be simulated (as set via parameter {{{end_time}}}) is reached. This way, a whole set of restart runs may be generated: a so-called job chain.
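The job chain can be pictured as a loop that keeps starting runs until {{{end_time}}} is reached. The toy sketch below is hypothetical and not taken from '''palmrun''': it assumes that every job advances the simulated time by the same amount, which with a variable time step is only an approximation, and it merely counts the jobs in the chain.

{{{
# Toy model of a job chain (hypothetical, not palmrun code).
# Assumption: each job advances the simulated time by a fixed amount.
# Arguments: end_time (s of simulated time), simulated seconds per job.
count_jobs_in_chain() {
    end_time=$1
    per_job=$2
    # guard against an endless loop if a job would make no progress
    if [ "$per_job" -le 0 ]; then
        echo "per_job must be positive" >&2
        return 1
    fi
    simulated=0
    njobs=0
    # first pass = initial run, every further pass = one restart run
    while [ "$simulated" -lt "$end_time" ]; do
        njobs=$((njobs + 1))
        simulated=$((simulated + per_job))
    done
    echo "$njobs"
}

count_jobs_in_chain 7200 3000
}}}

For {{{end_time}}} = 7200 s and 3000 s of simulated time per job, the sketch prints 3, i.e. the initial run plus two restart runs.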
     13
     14Restart runs require certain entries in the file-connection file (see [source:palm/trunk/SCRIPTS/.palm.iofiles .palm.iofiles] and its [wiki:doc/app/palm_iofiles description]) and in the parameter file, which are described below.
     15
     16The following entries are important and are already contained in the default file-connection file:
    1417{{{
    15       %write_binary true restart
     18      PARIN in:tr     d3#   $base_data/$run_identifier/INPUT _p3d*
     19      PARIN in:tr     d3r   $base_data/$run_identifier/INPUT _p3dr*
     20      BININ in:lnpe   d3r   $restart_data_path/$run_identifier/RESTART _d3d*
    1621      #
    17       PARIN in:job      d3#   $base_data/$run_identifier/INPUT _p3d
    18       PARIN in:job      d3r   $base_data/$run_identifier/INPUT _p3dr
    19       BININ in:loc:lnpe d3r   $base_data/$run_identifier/RESTART _d3d
    20       #
    21       BINOUT out:loc:lnpe  restart  $base_data/$run_identifier/RESTART _d3d
     22      BINOUT* out:lnpe  restart  $restart_data_path/$run_identifier/RESTART _d3d
    2223}}}
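The activation strings decide which of these entries '''palmrun''' actually uses. The following sketch assumes, as the entries suggest, that the third whitespace-separated column holds the activation string and that an entry is active when this string is one of those given with option {{{-r}}}. The function {{{active_connections}}} is hypothetical, and the real '''palmrun''' logic may differ in detail.

{{{
# Hypothetical sketch, not palmrun's implementation: list the file-connection
# entries (local name and activation string) that are active for a given
# set of activation strings. Assumption: column 3 is the activation string.
active_connections() {
    activation_list=$1   # e.g. "d3# restart", as given with palmrun -r
    awk -v list="$activation_list" '
        BEGIN { n = split(list, want, " ") }
        $1 ~ /^#/ || NF < 3 { next }     # skip comment and incomplete lines
        { for (i = 1; i <= n; i++) if ($3 == want[i]) { print $1, $3; break } }
    '
}

# The entries from the example above (single quotes keep the $ variables literal)
entries='PARIN in:tr d3# $base_data/$run_identifier/INPUT _p3d*
PARIN in:tr d3r $base_data/$run_identifier/INPUT _p3dr*
BININ in:lnpe d3r $restart_data_path/$run_identifier/RESTART _d3d*
#
BINOUT* out:lnpe restart $restart_data_path/$run_identifier/RESTART _d3d'

printf '%s\n' "$entries" | active_connections "d3# restart"
}}}

With {{{"d3# restart"}}} the sketch selects the {{{PARIN}}} entry for {{{_p3d*}}} and the {{{BINOUT*}}} entry; without {{{restart}}} only {{{PARIN}}} remains. Calling it with {{{"d3r restart"}}} shows which entries a restart run would use, assuming its activation strings contain {{{d3r}}} in place of {{{d3#}}}.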
    23 The '''palmrun''' call for the initialization run of the job chain must look as follows:
     24The '''palmrun''' call for the initialization run of the job chain reads:
    2425{{{
    25       palmrun -h <any histname> -d abcde -t 900 -r “d3# restart”
     26      palmrun -c <any histname> -d abcde -t 900 -r "d3# restart"
    2627}}}
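The activation strings must reach '''palmrun''' as a single argument of option {{{-r}}}, so they have to be enclosed in plain ASCII double quotes. Typographic quotes (“ ”), as may result from copying text out of a formatted page, are not treated as quoting characters by the shell. The helper {{{count_args}}} below is hypothetical and only shows how the shell splits the arguments.

{{{
# Hypothetical helper: report how many arguments the shell passes to a command.
count_args() { echo "$#"; }

count_args -r "d3# restart"    # 2 arguments: -r and "d3# restart"
count_args -r d3# restart      # 3 arguments: the list is split in two
count_args -r “d3# restart”    # 3 arguments: typographic quotes do not quote
}}}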
    2728The specification of the environment variable {{{write_binary}}}, which must be assigned the value {{{true}}}, is essential. Only in this case does the model write binary data for a possible restart run to the local file [../iofiles#BINOUT BINOUT] at the end of the run (when running on more than one core, BINOUT is a directory). This output file/directory must then be saved permanently by an appropriate file connection statement (last line of the example above). Both instructions (variable declaration and connection statement) are carried out by '''palmrun''' only if the character string {{{restart}}} is given with option {{{-r}}} in the '''palmrun''' call. Thus, the example above can also be used if no restart runs are intended; in such cases, the character string {{{restart}}} can simply be omitted from option {{{-r}}}.\\\\
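Whether {{{restart}}} is needed can be decided before submitting the job. The sketch below makes this decision from an estimated CPU time and the limit of the job class; both inputs and the function {{{choose_activation}}} are hypothetical, and how the job class limit is obtained depends on the batch system.

{{{
# Hypothetical helper: choose the activation strings for "palmrun -r".
# Assumption: restart runs are only wanted when the estimated CPU time
# exceeds the maximum CPU time allowed by the job class.
choose_activation() {
    estimated=$1     # estimated CPU time of the whole simulation in s
    class_limit=$2   # maximum CPU time allowed by the job class in s
    if [ "$estimated" -gt "$class_limit" ]; then
        echo "d3# restart"
    else
        echo "d3#"
    fi
}

choose_activation 900 3600      # fits into one job: restart not needed
choose_activation 20000 3600    # too long: restart runs required
}}}

The output can be passed on as {{{-r "$(choose_activation 20000 3600)"}}}; when restart runs are intended, the value for {{{-t}}} then has to stay within the job class limit.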