Changes between Version 44 and Version 45 of doc/app/palmrun


Timestamp: Nov 20, 2018 4:45:02 PM
Author: scharf
Comment: --

  • doc/app/palmrun

    v44 v45  
    210210**Note:** The first option {{{-b}}} is required to tell {{{palmrun}}} to create a batch job running on the local computer!
    211211
    212 Before entering the above command, you need to add information to your configuration file. You may edit an existing file (e.g. {{{.palm.config.default}}}) or create a new one (e.g. by copying the default file to {{{.palm.config.batch}}} and then editing the new file). In general, you can not use the same configuration file for running interactive jobs and batch jobs as well since different settings are required. Let's assume here that you have created a new file {{{.palm.config.batch}}}. Edit this file and add those batch directives required by your batch system. Keep in mind that there is a wide variety of batch systems and that many computer centers introduce their own special settings for these systems. Please read the documentation of your respective batch system carefully in order to figure out the required settings for your system (e.g. to run an MPI job on multiple cores). The following lines give a minimum example for the portable batch system (PBS).
    213 {{{
    214 BD:#!/bin/bash
    215 BD:#PBS -N {{job_id}}
    216 BD:#PBS -l walltime={{cpu_hours}}:{{cpu_minutes}}:{{cpu_seconds}}
    217 BD:#PBS -l nodes={{nodes}}:ppn={{tasks_per_node}}
    218 BD:#PBS -o {{job_protocol_file}}
    219 BD:#PBS -j oe
    220 BD:#PBS -q {{queue}}
    221 }}}
    222 Batch directive lines in the configuration file must start in the first column with string {{{BD:}}}, immediately followed by the directive of the respective batch system (the PBS directives must e.g. start with {{{#PBS}}} followed by a {{{blank}}}). Strings parenthesized by double curly brackets {{{ {{...}} }}} are variables used in {{{palmrun}}} and will be replaced by their respective values while {{{palmrun}}} creates the batch job file. A complete list of {{{palmrun}}} variables that can be used in batch directives is given in section [wiki:doc/app/batch_directives batch_directives].
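
The replacement of {{{ {{...}} }}} strings described above can be pictured with a small Python sketch. It is only an illustration and not how {{{palmrun}}} (a shell script) is implemented: the function name {{{expand_batch_directives}}} is hypothetical, and the values for {{{job_id}}} and {{{nodes}}} are made-up examples.

```python
import re

# Hypothetical helper, NOT part of palmrun: pick the "BD:" lines out of a
# configuration file, strip the prefix, and fill in {{name}} placeholders.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

def expand_batch_directives(config_lines, values):
    directives = []
    for line in config_lines:
        # Directive lines must start in the first column with "BD:".
        if not line.startswith("BD:"):
            continue
        body = line[len("BD:"):]
        # An unknown placeholder raises KeyError instead of passing silently.
        directives.append(
            PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), body)
        )
    return directives

# Illustrative input; the values are assumptions chosen for this example.
config = [
    "BD:#!/bin/bash",
    "BD:#PBS -N {{job_id}}",
    "BD:#PBS -l nodes={{nodes}}:ppn={{tasks_per_node}}",
    "%memory 1500",
]
values = {"job_id": "neutral.12345", "nodes": 4, "tasks_per_node": 12}

print("\n".join(expand_batch_directives(config, values)))
```

With these example values the sketch prints {{{#!/bin/bash}}}, {{{#PBS -N neutral.12345}}} and {{{#PBS -l nodes=4:ppn=12}}}; the {{{%memory}}} line is skipped because it is not a batch directive.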
    223 
    224 In addition to the batch directives, the configuration file requires further information to be set for using the batch system, which is done by adding / modifying variable assignments in the general form
    225 {{{
    226 %<variable name> <value>
    227 }}}
    228 where {{{<variable name>}}} is the name of the Unix environment variable in the {{{palmrun}}} script and {{{<value>}}} is the value to be assigned to this variable. Each assignment must start with a {{{%}}}. A minimum set of variables to be added / modified is:
    229 {{{
    230 # to be added
    231 %submit_command      /opt/moab/default/bin/msub -E
    232 %defaultqueue        small
    233 %memory              1500
    234 
    235 # to be modified
    236 %local_jobcatalog    /home/username/job_queue
    237 %fast_io_catalog     /gfs2/work
    238 %execute_command     aprun  -n {{mpi_tasks}}  -N {{tasks_per_node}}  ./palm
    239 }}}
    240 Given values are just examples! The automatic installer may have already included these variable settings as comment lines (starting with {{{#}}}). Then just remove the {{{#}}} and provide a proper value.
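
As a rough illustration of the {{{%<variable name> <value>}}} form, the following Python sketch collects such assignments into a dictionary. It is an assumption-laden sketch, not {{{palmrun}}}'s own parsing: the function name {{{parse_assignments}}} is hypothetical, and any details of the real file format beyond what is described above are not covered.

```python
# Hypothetical helper, NOT part of palmrun: read "%name value" assignments.
def parse_assignments(lines):
    settings = {}
    for line in lines:
        # Only lines starting with "%" are assignments; comment lines
        # (starting with "#") and batch directives ("BD:") are ignored.
        if not line.startswith("%"):
            continue
        parts = line[1:].split(None, 1)  # name, then the rest of the line
        if not parts:
            continue
        name = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        settings[name] = value
    return settings

# Example lines taken from the listing above.
example = [
    "# to be added",
    "%submit_command      /opt/moab/default/bin/msub -E",
    "%defaultqueue        small",
    "#%memory              1500",
    "%execute_command     aprun  -n {{mpi_tasks}}  -N {{tasks_per_node}}  ./palm",
]

for name, value in parse_assignments(example).items():
    print(f"{name} = {value}")
```

The commented-out {{{#%memory}}} line is skipped, which mirrors the advice above to remove the leading {{{#}}} to activate a setting.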
    241 
    242 The meaning of these variables is as follows:
    243  * {{{submit_command}}}: Batch system specific command to submit batch jobs plus options which may be required on your system. Please give the full path to the submit command. See your batch system documentation for any details.
    244  * {{{defaultqueue}}}: Name of the queue to be used if the {{{palmrun}}} option {{{-q}}} is omitted. See your batch system documentation for queue names available on your system.
    245  * {{{memory}}}: Memory in MByte requested by each core. If given, this value is used as the default in case that {{{palmrun}}} option {{{-m}}} has not been set.
    246  * {{{local_jobcatalog}}}: Name of the folder where your job protocol file is put after the batch job has been finished. Batch queuing systems usually create a protocol file for each batch job which contains relevant information about all activities performed within the job.
    247  * {{{fast_io_catalog}}}: Folder to be used by {{{palmrun}}}/PALM for temporary I/O files. Since PALM setups with a large number of grid points may create very large files, data should be written to a file system with very fast hard disks or SSDs in order to get good I/O performance. Computer centers typically provide such file systems and you should set your {{{fast_io_catalog}}} to such a file system.
    248  * {{{execute_command}}}: Command to execute PALM (i.e. the executable that has been created by the compiler). It depends on the MPI library and the operating system that is used. See your MPI documentation or information provided by your computing center. Strings {{{ {{mpi_tasks}} }}} and {{{ {{tasks_per_node}} }}} will be replaced by {{{palmrun}}} depending on {{{palmrun}}} options {{{-X}}} and {{{-T}}}.
    249 
    250 You can find more details in the [wiki:doc/app/palmconfig complete description of the configuration file].
     212Before entering the above command, you need to add information to your configuration file. You may edit an existing file (e.g. {{{.palm.config.default}}}) or create a new one (e.g. by copying the default file to {{{.palm.config.batch}}} and then editing the new file). In general, you cannot use the same configuration file for running interactive jobs and batch jobs as well since different settings are required. Let's assume here that you have created a new file {{{.palm.config.batch}}}. Edit this file and add those batch directives required by your batch system. You can find more details in the complete description of the [wiki:doc/app/palm_config configuration file].
    251213
    252214Now you may start your first batch job by entering
    253215{{{
    254216   palmrun  -b -r neutral -c batch -t 5400 -m 1500 -X 48 -T 12 -q medium -a "d3#"
    255217}}}
    256 Based on these arguments, the environment variables that have been described above will be set by {{{palmrun}}} to:
     218Based on these arguments, the environment variables that have been described [wiki:doc/app/palm_config here] will be set by {{{palmrun}}} to:
    257219 * {{{ {{job_id}} }}} = neutral.##### \\ where ##### is a five digit random number which is newly created for each job. The {{{job_id}}} is used for different purposes, e.g. it defines the name under which you can find the job in the queuing system.
    258  * {{{ {{cpu_hours}} }}} = 1 \\ calculated from option {{{-t}}}
    259  * {{{ {{cpu_minutes}} }}} = 30  \\ calculated from option {{{-t}}}
    260  * {{{ {{cpu_seconds}} }}} = 0 \\ calculated from option {{{-t}}}
     220 * {{{ {{cpu_hours}} }}} = 1, {{{ {{cpu_minutes}} }}} = 30  and {{{ {{cpu_seconds}} }}} = 0 \\ calculated from option {{{-t}}}
    261221 * {{{ {{mpi_tasks}} }}} = 48 \\ as given by option {{{-X}}}
    262222 * {{{ {{tasks_per_node}} }}} = 12 \\ as given by option {{{-T}}}
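
The values listed for {{{ {{cpu_hours}} }}}, {{{ {{cpu_minutes}} }}} and {{{ {{cpu_seconds}} }}} follow from splitting the {{{-t}}} value (5400 seconds) into hours, minutes and seconds. A minimal Python sketch of that arithmetic is given here; the function name {{{split_cpu_time}}} is hypothetical, and whether {{{palmrun}}} zero-pads the fields is not stated on this page, so the sketch does not pad them.

```python
# Hypothetical helper, NOT part of palmrun: split a CPU time given in
# seconds (palmrun option -t) into hours, minutes and seconds.
def split_cpu_time(total_seconds):
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return hours, minutes, seconds

cpu_hours, cpu_minutes, cpu_seconds = split_cpu_time(5400)

# Fill the three values into the walltime directive of the PBS example.
print(f"#PBS -l walltime={cpu_hours}:{cpu_minutes}:{cpu_seconds}")
```

For {{{-t 5400}}} this gives 1 hour, 30 minutes and 0 seconds, matching the values in the list.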
     
    285245Before the batch job is finally submitted, {{{palmrun}}} creates a folder named {{{SOURCES_FOR_RUN_<run_identifier>}}} which is located in the {{{fast_io_catalog}}} and which contains various files required for the run (e.g. the PALM executable, PALM's source code and object files, copies of the configuration files, etc.). Messages {{{*** executable and other sources created}}} and {{{*** input files have been copied}}} tell you that this folder has been created. {{{*** nothing to compile for this run}}} means that no user interface needs to be compiled. After the job submission, the batch system usually prompts a message ({{{<<<submit message from batch system>>>}}}) which tells you the batch system id under which you can find your job in the queuing system (e.g. if you like to cancel it). The job is now queued and you have to wait until it is finished. The main task of the job is to execute the {{{palmrun}}} command that you entered once more, but now on the compute nodes of your system. A job protocol file with name {{{<configuration identifier>_<run identifier>}}} as given with {{{palmrun}}} options {{{-c}}} and {{{-r}}} (here it will be {{{batch_neutral}}}) will be put in the folder that you have set by variable {{{local_jobcatalog}}} in your configuration file ({{{.palm.config.batch}}}). Check the contents of this file carefully. Besides some additional information, it mainly contains the output of the {{{palmrun}}} command as you get it during interactive execution, e.g. information about where the output files have been copied.
    286246
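
To make the naming rules in the paragraph above concrete, the following Python sketch builds the two paths from the example settings. It only restates the text: the function names are hypothetical, and the folder values {{{/home/username/job_queue}}} and {{{/gfs2/work}}} are the example values from the configuration listing, not defaults.

```python
import posixpath

# Hypothetical helpers, NOT part of palmrun; they only restate the naming
# rules described in the text above.
def job_protocol_path(local_jobcatalog, config_id, run_id):
    # Protocol file name: <configuration identifier>_<run identifier>
    return posixpath.join(local_jobcatalog, f"{config_id}_{run_id}")

def sources_folder(fast_io_catalog, run_id):
    # Folder name as given in the text: SOURCES_FOR_RUN_<run_identifier>
    return posixpath.join(fast_io_catalog, f"SOURCES_FOR_RUN_{run_id}")

# Values for "palmrun -b -r neutral -c batch ..." with the example settings.
print(job_protocol_path("/home/username/job_queue", "batch", "neutral"))
print(sources_folder("/gfs2/work", "neutral"))
```

With these example values the protocol file would be {{{/home/username/job_queue/batch_neutral}}}.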
    287 Typically, batch systems allow you to run jobs only for a limited time, e.g. 12 hours. See chapter [wiki:doc/restarts job chains and restart jobs] on how you can use {{{palmrun}}} to create so-called job chains in order to carry out simulations which exceed the time limit for single jobs.
     247Typically, batch systems allow you to run jobs only for a limited time, e.g. 12 hours. See chapter [wiki:doc/app/runs job chains and restart jobs] on how you can use {{{palmrun}}} to create so-called job chains in order to carry out simulations which exceed the time limit for single jobs.
    288248
    289249