Changes between Version 31 and Version 32 of doc/app/runs


Timestamp: Apr 11, 2024 6:31:44 AM
Author: raasch
= Job chains / Restart runs =

== Automatic Restarts ==

Automatic restarts require that a batch system is running on the respective computer. Batch systems generally limit the CPU time that a job is allowed to request, e.g. to a maximum of 12 or 24 hours. If a simulation needs more time, it has to be split into several parts (jobs). The first job is called the ''initial'' run or job, the others are called ''restart'' runs or jobs. Together they form a so-called ''job chain''. Restart runs require as input the state of all flow variables as calculated in the final time step of the previous run. The previous run therefore has to write these data to a so-called ''restart file'', which is a required input file for the next restart run.

'''[wiki:doc/app/palmrun palmrun]''' allows you to automatically generate job chains and to handle the restart files. Of course, automatic generation does not work if you run PALM in interactive mode.
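As a rough illustration of how long a job chain becomes, the following POSIX shell sketch estimates the number of jobs from the total required time and the batch job time limit. The function name {{{jobs_needed}}} and its arguments are hypothetical, and the estimate assumes that every job can use its full time limit minus a fixed reserve for writing the restart data; any other start-up or I/O overhead is ignored.

{{{#!sh
# Rough estimate of the number of jobs in a job chain (illustrative sketch,
# hypothetical helper, not part of palmrun).
# Assumption: each job can spend (JOB_TIME_LIMIT - RESERVE) seconds on time
# stepping, where RESERVE is the time kept free at the end of every job for
# writing the restart data; all other overhead is ignored.
#
# usage: jobs_needed TOTAL_TIME JOB_TIME_LIMIT RESERVE   (integer seconds)
jobs_needed() (
    usable=$(( $2 - $3 ))
    if [ "$usable" -le 0 ]; then
        echo "jobs_needed: RESERVE must be smaller than JOB_TIME_LIMIT" >&2
        exit 1
    fi
    # integer ceiling of TOTAL_TIME / usable
    echo $(( ($1 + usable - 1) / usable ))
)

# 100000 s of simulation with a 12 h job limit and a 15 min reserve:
jobs_needed 100000 43200 900    # prints 3
}}}

Two jobs would only provide 2 x 42300 s = 84600 s of time stepping, so a third job is needed.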
     
Only if {{{restart}}} is given as an activation string is PALM instructed to compute the remaining CPU time after each time step and to stop shortly before this time expires if the run cannot be completed within it. More precisely, the run stops when the difference between the available job time (set by the '''[wiki:doc/app/palmrun palmrun]''' option {{{-t}}}) and the time used by the job so far becomes smaller than the time given by the runtime parameter [https://docs.palm-model.org/latest/Reference/LES_Model/Namelists/#runtime_parameters--termination_time_needed termination_time_needed]. This parameter tells PALM how much time is required for copying the binary data for restart runs, as well as for any other pre- or post-processing steps done within the job. Thus, as soon as the remaining job time is less than '''termination_time_needed''', PALM interrupts the time stepping and writes the restart data to the local file/folder [../iofiles#BINOUT BINOUT]. The [https://docs.palm-model.org/latest/Reference/LES_Model/Namelists/#initialization-parameters initialization parameters] are also written to that file. In a last step, PALM creates a flag file with the local name {{{CONTINUE_RUN}}}. The presence of this file signals '''[wiki:doc/app/palmrun palmrun]''' that a restart run is required, and '''palmrun''' then initiates and starts the corresponding job.

Restarts at a given simulated time and at prescribed intervals (e.g. every 2 h of simulated time) can be controlled via the runtime parameters [https://docs.palm-model.org/latest/Reference/LES_Model/Namelists/#runtime_parameters--restart_time restart_time] and [https://docs.palm-model.org/latest/Reference/LES_Model/Namelists/#runtime_parameters--dt_restart dt_restart].

Within PALM, the initial phase of a restart run requires different actions than that of an initial run. In case of a restart, PALM first needs to read the data written by the preceding run, and it also reads the initialization parameters from the same file. Therefore, these parameters do not need to be provided in the parameter file (local name [../iofiles#PARIN PARIN]). If they are provided anyway and their values differ from those of the initial run, these settings are ignored. There is exactly one exception to this rule: the initialization parameter [https://docs.palm-model.org/latest/Reference/LES_Model/Namelists/#initialization_parameters--initializing_actions initializing_actions] determines whether the job is a restart run or an initial run. If '''initializing_actions''' = '' 'read_restart_data','' the job is a restart run, otherwise it is an initial run. Consequently, job chains require two different parameter files (local name PARIN): one for the initial run, which contains all initialization parameters, and one for the restart runs, which only needs to contain the initialization parameter '''initializing_actions''' set to '' 'read_restart_data'.'' Any other initialization parameters may appear in the restart file, but they will be ignored. Since PALM always expects the parameters in the local file PARIN, regardless of whether it is an initial or a restart run, two different file connection statements must be given for that file in the file-connection file: one is active for the initial run only, the other one only for restart runs.
The '''[wiki:doc/app/palmrun palmrun]''' call for the initial run shown above activates the first of the two specified file connection statements for PARIN, because the activation string {{{d3#}}} with the option {{{-r}}} coincides with the string in the third column of the file connection statement. Obviously the next statement
     
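The stop criterion, the restart schedule and the rule for distinguishing initial from restart runs described above can be illustrated with the following shell sketch. It is not how PALM or '''palmrun''' implement these rules. The function names ({{{must_stop}}}, {{{restart_times}}}, {{{run_type}}}) and the file names {{{example_p3d}}}/{{{example_p3dr}}} are hypothetical, times are integer seconds, and the schedule printed by {{{restart_times}}} is based on an assumed interpretation of '''restart_time''' and '''dt_restart''' (first restart at '''restart_time''', then every '''dt_restart'''); please check the linked parameter descriptions for the exact behavior.

{{{#!sh
# Illustrative sketch of the rules described above; NOT PALM's or palmrun's
# actual implementation. Function and file names are hypothetical.
# All times are integer seconds.

# must_stop JOB_TIME USED_TIME TERMINATION_TIME_NEEDED
# Exit status 0 once the remaining job time (JOB_TIME - USED_TIME) is
# smaller than termination_time_needed, i.e. when PALM would stop time
# stepping, write the restart data and create the flag file CONTINUE_RUN.
must_stop() {
    [ $(( $1 - $2 )) -lt "$3" ]
}

# restart_times RESTART_TIME DT_RESTART END_TIME
# Prints the simulated times of forced restarts, ASSUMING the first one
# happens at restart_time and further ones every dt_restart afterwards.
restart_times() (
    t=$1
    [ "$2" -gt 0 ] || exit 1            # avoid an endless loop
    while [ "$t" -lt "$3" ]; do
        echo "$t"
        t=$(( t + $2 ))
    done
)

# run_type PARAMETER_FILE
# Prints "restart" if the file sets initializing_actions = 'read_restart_data',
# otherwise "initial". Plain text matching only: a real namelist reader also
# handles double quotes, comments and upper-case names.
run_type() {
    if grep -Eq "initializing_actions[[:space:]]*=[[:space:]]*'read_restart_data'" "$1"
    then
        echo restart
    else
        echo initial
    fi
}

# Demo with two hypothetical parameter files (a real initial-run file would
# contain all initialization parameters, not only initializing_actions).
dir=$(mktemp -d)
printf "&initialization_parameters\n  initializing_actions = 'set_constant_profiles',\n/\n" > "$dir/example_p3d"
printf "&initialization_parameters\n  initializing_actions = 'read_restart_data',\n/\n" > "$dir/example_p3dr"
run_type "$dir/example_p3d"                    # prints: initial
run_type "$dir/example_p3dr"                   # prints: restart
must_stop 43200 42400 900 && echo "stop now"   # 800 s left < 900 s
restart_times 7200 7200 28800                  # 7200, 14400, 21600 (one per line)
rm -r "$dir"
}}}

Using the exit status of {{{must_stop}}} keeps the check composable in {{{if}}} statements and {{{&&}}} chains, which is the usual shell idiom for yes/no tests.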
Concerning the example given above, the initial run creates the permanent file {{{.../abcde_d3d.000}}}, the first restart run uses this file and creates {{{.../abcde_d3d.001}}}, the second restart run creates {{{.../abcde_d3d.002}}}, and so on. All files created by the runs remain accessible after the job chain has finished. This allows you, for example, to re-run the model starting from a different position in the job chain by manually calling '''[wiki:doc/app/palmrun palmrun]''' with the argument {{{d3r}}}. In this case, you also need to remove all file cycles beyond the one you want to start from.
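The clean-up step mentioned in the paragraph above could be scripted along the following lines. This is a hypothetical helper ({{{prune_cycles}}}), not part of '''palmrun'''; it assumes that all cycles of a file are stored in one directory and carry a three-digit suffix as in the example above ({{{.000}}}, {{{.001}}}, ...), and that a cycle may be either a file or a folder.

{{{#!sh
# Sketch (hypothetical helper): remove all file cycles beyond cycle KEEP
# before re-running the job chain from that cycle. Assumes cycles are named
# BASE.000, BASE.001, ... inside DIR (plain files or folders).
# Prints every path it deletes.
#
# usage: prune_cycles DIR BASE KEEP
prune_cycles() (
    dir=$1 base=$2 keep=$3
    for path in "$dir/$base".[0-9][0-9][0-9]; do
        [ -e "$path" ] || continue            # pattern matched nothing
        cycle=${path##*.}                     # e.g. "002"
        n=$(echo "$cycle" | sed 's/^0*//')    # drop leading zeros
        n=${n:-0}                             # "000" -> 0
        if [ "$n" -gt "$keep" ]; then
            rm -rf -- "$path"
            echo "removed $path"
        fi
    done
)

# Demo in a temporary directory: keep cycles 000 and 001
demo=$(mktemp -d)
touch "$demo/abcde_d3d.000" "$demo/abcde_d3d.001" "$demo/abcde_d3d.002"
mkdir "$demo/abcde_d3d.003"
prune_cycles "$demo" abcde_d3d 1   # removes abcde_d3d.002 and abcde_d3d.003
ls "$demo"
rm -r "$demo"
}}}

Leading zeros are stripped before the numeric comparison so that a suffix such as {{{010}}} is always read as the decimal number 10.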
== Manual Restarts ==

The main and obvious difference from automatic restarts is that manual restarts require you to initiate the restart runs manually via
{{{
palmrun ... -a "d3r restart"
}}}

= Handling of large (restart) files =