Changes between Version 24 and Version 25 of doc/app/palmrun


Timestamp:
May 22, 2018 12:50:45 PM (7 years ago)
Author:
raasch
Comment:

--

  • doc/app/palmrun

    v24 v25  
    201201== How to run PALM in batch mode
    202202
    203 Large simulation set-ups usually cannot be run interactively, since the large amount of required resources (memory as well as cpu-time) are only provided through batch environments. {{{palmrun}}} supports two different ways to run PALM in batch mode. It creates a batch job (a file containing directives for a queuing-system plus commands to run PALM) which is either submitted to your local computer or to a remote computer. Running PALM in batch mode requires you to manually modify and extend your configuration file {{{.palm.config....}}}, and that a batch system (e.g. PBS or ...) is installed on the respective computer.
     203Large simulation set-ups usually cannot be run interactively, since the large amounts of resources they require (memory as well as CPU time) are only provided through batch environments. {{{palmrun}}} supports two different ways to run PALM in batch mode. In both cases it creates a batch job, i.e. a file containing directives for a queuing system plus commands to run PALM, which is then submitted either to your local computer or to a remote computer. Running PALM in batch mode requires that you manually modify and extend your configuration file {{{.palm.config....}}}, and that a batch system (e.g. PBS or ...) is installed on the respective computer.
    204204
    205205=== Running PALM in batch on a local computer
     
    221221The first option {{{-b}}} is required to tell {{{palmrun}}} to create a batch job running on the local computer.
    222222
    223 Before entering the above command, you need to add information to your configuration file. You may edit an existing file (.e.g. {{{.palm.config.default}}}) or create a new one (e.g. by copying the default file to e.g. {{{.palm.config.batch}}} and then editing the new file). In general, you can not use the same configuration file for running interactive jobs and batch jobs as well since different settings are required. Let's assume here that you have created a new file {{.palm.config.batch}}}. Edit this file and add those batch directives required by your batch system.  Keep in mind that there is a wide variety of batch systems and that many computer centers introduce their own special settings for these systems. Please read the documentation of your respective batch system carefully in order to figure out the required settings for your system (e.g. to run an MPI job on multiple cores). The following lines give a minimum example for the portable batch system (PBS).
     223Before entering the above command, you need to add information to your configuration file. You may edit an existing file (e.g. {{{.palm.config.default}}}) or create a new one (for instance by copying the default file to {{{.palm.config.batch}}} and then editing the new file). In general, you cannot use the same configuration file for both interactive jobs and batch jobs, since different settings are required. Let's assume here that you have created a new file {{{.palm.config.batch}}}. Edit this file and add the batch directives required by your batch system. Keep in mind that there is a wide variety of batch systems and that many computer centers introduce their own special settings for these systems. Please read the documentation of your batch system carefully in order to figure out the required settings for your system (e.g. to run an MPI job on multiple cores). The following lines give a minimal example for the Portable Batch System (PBS).
    224224{{{
    225225BD:#!/bin/bash
     
    279279After confirming the {{{palmrun}}} settings by entering {{{y}}}, the following information will be output to the terminal:
    280280{{{
    281 ................
    282 }}}
    283 The job is now queued and you have to wait until he is finished. A job protocol file with name ... will be put in ...
    284 
     281 >>> everything o.k. (y/n) ?  y
     282
     283 ***  batch-job will be created and submitted
     284
     285  *** creating executable and other sources
     286  *** nothing to compile for this run
     287  *** executable and other sources created
     288  *** input files have been copied
     289 
     290 *** submit the job (output of submit command, e.g. the job-id, may follow)
     291<<<submit message from batch system>>>
     292
     293 --> palmrun finished
     294
     295}}}
     296Before the batch job is finally submitted, {{{palmrun}}} creates a folder named {{{SOURCES_FOR_RUN_<run_identifier>}}} which is located in the {{{fast_io_catalog}}} and which contains various files required for the run (e.g. the PALM executable, PALM's source code and object files, copies of the configuration files, etc.). The messages {{{*** executable and other sources created}}} and {{{*** input files have been copied}}} tell you that this folder has been created. {{{*** nothing to compile for this run}}} means that no user interface needs to be compiled. After the job submission, the batch system usually prints a message ({{{<<<submit message from batch system>>>}}}) which tells you the batch system id under which you can find your job in the queuing system (e.g. if you would like to cancel it). The job is now queued and you have to wait until it is finished. The main task of the job is to execute the {{{palmrun}}} command that you entered once more, but now on the compute nodes of your system. A job protocol file with name {{{<host identifier>_<run identifier>}}} as given with {{{palmrun}}} options {{{-h}}} and {{{-d}}} (here it will be {{{batch_neutral}}}) will be put in the folder that you have set by variable {{{local_jobcatalog}}} in your configuration file ({{{.palm.config.batch}}}). Check the contents of this file carefully. Besides some additional information, it mainly contains the output of the {{{palmrun}}} command as you get it during interactive execution, e.g. information on where the output files have been copied.
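The naming rule for the job protocol file can be illustrated with a small shell sketch. It is not part of {{{palmrun}}}: the function name {{{job_protocol_path}}} is hypothetical, and the directory {{{$HOME/job_queue}}} is only a placeholder for whatever you have set as {{{local_jobcatalog}}} in your configuration file.

```shell
#!/bin/bash
# Hypothetical helper (not provided by palmrun): builds the expected location
# of the job protocol file from
#   $1  the folder set as local_jobcatalog in the configuration file
#   $2  the host identifier (palmrun option -h)
#   $3  the run identifier  (palmrun option -d)
job_protocol_path() {
    local jobcatalog="$1" host_id="$2" run_id="$3"
    # The protocol file is named <host identifier>_<run identifier>
    printf '%s/%s_%s\n' "$jobcatalog" "$host_id" "$run_id"
}

# Assumed placeholder value for local_jobcatalog; use the one from your
# own .palm.config.batch instead.
job_protocol_path "$HOME/job_queue" batch neutral
```

Once the job has finished, the printed path could be passed to a pager such as {{{less}}} to inspect the protocol.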
     297
     298Typically, batch systems allow you to run jobs only for a limited time, e.g. 12 hours. See chapter [wiki:doc/restarts job chains and restart jobs] on how you can use {{{palmrun}}} to create so-called job chains in order to carry out simulations that exceed the time limit for single jobs.
     299
     300 
     301=== Running PALM in batch on a remote computer
    285302
    286303