Changes between Version 6 and Version 7 of doc/app/runs


Timestamp:
Aug 2, 2013 8:59:12 AM (11 years ago)
Author:
maronga
Comment:

--

  • doc/app/runs

    v6 v7  
     2  2  = Initialization and restart runs =
     3  3
     4     A job started by '''[../../tec/mrun mrun]''' is queued by the queuing system of the remote computer into a suitable job class, according to its requested computing time, its memory requirement, and (on parallel computers) the number of processing elements it needs. Each job class admits only jobs up to certain maximum requirements (e.g. the job class {{{cdev}}} on the IBM Regatta "hanni" of the HLRN admits only jobs that request no more than 7200 seconds of computing time and use no more than 32 processing elements). The job classes are important for the scheduling process of the computer. Jobs with small requirements usually start quickly, while jobs with higher requirements must wait longer (sometimes several days).\\\\
        4  A job started by '''[../../app/jobcontrol mrun]''' is queued by the queuing system of the remote computer into a suitable job class, according to its requested computing time, its memory requirement, and (on parallel computers) the number of processing elements it needs. Each job class admits only jobs up to certain maximum requirements (e.g. the job class {{{cdev}}} on the IBM Regatta "hanni" of the HLRN admits only jobs that request no more than 7200 seconds of computing time and use no more than 32 processing elements). The job classes are important for the scheduling process of the computer. Jobs with small requirements usually start quickly, while jobs with higher requirements must wait longer (sometimes several days).\\\\
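The job class rule described above can be sketched in a few lines of Python. This is only an illustration, not how any real queuing system is implemented: the {{{JobClass}}} type, the function {{{select_job_class}}} and the second class {{{cbig}}} with its limits are assumptions made up for the example; only {{{cdev}}} with 7200 seconds and 32 processing elements comes from the text. The memory requirement is left out for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class JobClass:
    name: str
    max_cpu_seconds: int
    max_processing_elements: int


# "cdev" and its limits are taken from the text; "cbig" is a hypothetical
# larger class that exists only to make the example meaningful.
JOB_CLASSES: List[JobClass] = [
    JobClass("cdev", 7200, 32),
    JobClass("cbig", 86400, 512),
]


def select_job_class(cpu_seconds: int,
                     processing_elements: int,
                     classes: List[JobClass] = JOB_CLASSES) -> Optional[JobClass]:
    """Return the smallest class whose limits cover the request, or None."""
    suitable = [
        c for c in classes
        if cpu_seconds <= c.max_cpu_seconds
        and processing_elements <= c.max_processing_elements
    ]
    # Prefer the class with the tightest limits among those that fit.
    return min(suitable,
               key=lambda c: (c.max_cpu_seconds, c.max_processing_elements),
               default=None)
```

A request for 7200 seconds on 32 processing elements fits {{{cdev}}}; one more second pushes the job into the (hypothetical) larger class, where it would typically wait longer.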
     5  5  Before starting a model run, the user must estimate how much CPU time the model will need for the simulation. The required time in seconds has to be given with the '''mrun''' option {{{-t}}} and influences the job class into which the job is queued. Because the model usually uses a variable time step, the number of time steps to be executed, and consequently the time needed by the model, is not known in advance, so in many cases the estimate can only be very rough. The model may therefore need more time than given for the option {{{-t}}}, which normally causes the job to abort as soon as the available CPU time is consumed. In principle one could solve this problem by choosing a very generous value for {{{-t}}}, but the queued job may then have to wait longer for execution.\\\\
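One possible way to arrive at a rough value for {{{-t}}} is sketched below. This is not part of '''mrun''' or PALM: the function {{{estimate_cpu_seconds}}}, its parameters and the default safety factor are assumptions for illustration. It assumes a typical time step and a measured cost per time step are known, for example from a short test run.

```python
import math


def estimate_cpu_seconds(simulated_seconds: float,
                         typical_time_step: float,
                         seconds_per_step: float,
                         safety_factor: float = 1.2) -> int:
    """Rough CPU time estimate in seconds (hypothetical helper).

    simulated_seconds  -- model time to be simulated
    typical_time_step  -- guessed average model time step (it varies in practice)
    seconds_per_step   -- measured CPU cost of one time step
    safety_factor      -- margin for the unknown time step variation
    """
    if typical_time_step <= 0:
        raise ValueError("typical_time_step must be positive")
    steps = math.ceil(simulated_seconds / typical_time_step)
    return math.ceil(steps * seconds_per_step * safety_factor)
```

A larger safety factor lowers the risk of an abort but may move the job into a class with a longer waiting time, which is the trade-off described above.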
     6  6  To avoid this problem, '''mrun''' offers so-called '''restart runs'''. During the model run PALM continuously checks how much time is left for the execution of the job. If the run has not finished shortly before this time expires, the model stops and writes the values of (nearly) all model variables in binary form to a file (local name [../iofiles#BINOUT BINOUT]). After copying the output files requested by the user, '''mrun''' automatically starts a restart run. For this purpose a new '''mrun''' call is issued automatically on the user's local computer; '''mrun''' thus calls itself. The options of this call largely correspond to those the user selected in the initial call of '''mrun'''. The model restarts, this time reading the previously written binary data at the beginning, and continues the run from them. If the CPU time of this job is again insufficient to finish the run, another restart run is started at the end of the job, and so on, until the time to be simulated by the model is reached. In this way a series of restart runs can develop, a so-called job chain. The first run of this chain (model start at t=0) is called the '''initial run'''.\\\\
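The job chain described above can be illustrated with a small simulation. This is a simplified sketch, not the actual logic of '''mrun''' or PALM: the function {{{run_job_chain}}} is hypothetical, the CPU time budget of a job is expressed as a fixed number of time steps, and the {{{checkpoint}}} variable merely stands in for the binary restart data.

```python
from typing import List, Optional, Tuple


def run_job_chain(total_steps: int,
                  steps_per_job: int) -> List[Tuple[str, int, int]]:
    """Simulate a job chain; return (kind, first_step, last_step) per job."""
    if steps_per_job <= 0:
        raise ValueError("steps_per_job must be positive")

    chain: List[Tuple[str, int, int]] = []
    checkpoint: Optional[int] = None  # stands in for the restart data
    done = 0

    while done < total_steps:
        # The first job starts at t=0; later jobs resume from the checkpoint.
        kind = "initial run" if checkpoint is None else "restart run"
        start = 0 if checkpoint is None else checkpoint
        end = min(start + steps_per_job, total_steps)
        chain.append((kind, start, end))
        done = end
        if done < total_steps:
            # Budget exhausted before the end: save state for the next job.
            checkpoint = done

    return chain
```

With 10 steps to simulate and a budget of 4 steps per job, the chain consists of one initial run followed by two restart runs.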