Changes between Version 23 and Version 24 of doc/app/palmrun


Timestamp:
Apr 30, 2018 3:30:21 PM (7 years ago)
Author:
raasch
Comment:

--

  • doc/app/palmrun

    v23 v24  
    209209For running PALM in batch mode you need to include additional options in the {{{palmrun}}} command to specify the system resources requested by the job, and to modify your configuration file. A minimum set of additional {{{palmrun}}} options is
    210210{{{
    211    palmrun  .... -t <cputime>  -X <total number of cores>  -T <MPI tasks per node>  -q <queue>  -m <memory per core>
     211   palmrun  .... -b -h <host configuration> -t <cputime>  -X <total number of cores>  -T <MPI tasks per node>  -q <queue>  -m <memory per core>
    212212}}}
    213213where
     214 * {{{<host configuration>}}} is the configuration file containing your batch mode settings
    214215 * {{{<cputime>}}} is the maximum CPU time (wall clock time) in seconds requested by the batch job
    215216 * {{{<total number of cores>}}} is the total number of CPU cores (not CPUs!) that shall be used for your run
     
    218219 * {{{<memory per core>}}} is the memory in MByte requested by each core
    219220
    220 As a first step, add information to your configuration file. You may edit an existing file (e.g. {{{.palm.config.default}}}) or create a new one (e.g. by copying the default file to {{{.palm.config.batch}}} and then editing the new file). It is not necessarily required to create a new file, since you can use the same configuration file for running both interactive jobs and batch jobs. Let's assume here that you have created a new file {{{.palm.config.batch}}}. Edit this file and add the batch directives required by your batch system.  Keep in mind that there is a wide variety of batch systems and that many computer centers introduce their own special settings for these systems. Please read the documentation of your respective batch system carefully in order to figure out the required settings for your system (e.g. to run an MPI job on multiple cores). The following lines give a minimum example for the Portable Batch System (PBS).
     221The first option {{{-b}}} is required to tell {{{palmrun}}} to create a batch job running on the local computer.
     222
     223Before entering the above command, you need to add information to your configuration file. You may edit an existing file (e.g. {{{.palm.config.default}}}) or create a new one (e.g. by copying the default file to {{{.palm.config.batch}}} and then editing the new file). In general, you cannot use the same configuration file for both interactive jobs and batch jobs, since different settings are required. Let's assume here that you have created a new file {{{.palm.config.batch}}}. Edit this file and add the batch directives required by your batch system.  Keep in mind that there is a wide variety of batch systems and that many computer centers introduce their own special settings for these systems. Please read the documentation of your respective batch system carefully in order to figure out the required settings for your system (e.g. to run an MPI job on multiple cores). The following lines give a minimum example for the Portable Batch System (PBS).
    221224{{{
    222225BD:#!/bin/bash
     
    230233Batch directive lines in the configuration file must start in the first column with the string {{{BD:}}}, immediately followed by the directive of the respective batch system (PBS directives, for example, must start with {{{#PBS}}} followed by a {{{blank}}}). Strings enclosed in double curly brackets {{{ {{...}} }}} are variables used in {{{palmrun}}} and will be replaced by their respective values when {{{palmrun}}} creates the batch job file. A complete list of {{{palmrun}}} variables that can be used in batch directives is given in section [wiki:doc/app/batch_directives batch_directives].
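The rule for the {{{BD:}}} prefix can be illustrated with a short shell sketch. It is not taken from the {{{palmrun}}} source: the function name {{{extract_batch_directives}}} and the configuration fragment are hypothetical and only show which lines would count as batch directives under the rule stated above (prefix in the first column, directive immediately after it).

```shell
#!/bin/bash
# Hypothetical sketch, not the palmrun source: collect the batch directives
# from a configuration file.

extract_batch_directives() {
    # Print only lines that start in the first column with "BD:",
    # with that prefix removed. All other lines are ignored.
    sed -n 's/^BD://p' "$1"
}

# Made-up configuration fragment, for demonstration only.
config=$(mktemp)
cat > "$config" <<'EOF'
%defaultqueue        small
BD:#!/bin/bash
BD:#PBS -N {{job_id}}
 BD:#PBS -q {{queue}}
EOF

# The third directive is indented, so it does not start in the first
# column and is not printed.
extract_batch_directives "$config"
rm -f "$config"
```

In this sketch the {{{ {{...}} }}} variables are left untouched; replacing them with values is a separate step.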
    231234
    232 
     235In addition to the batch directives, the configuration file requires further information to be set for using the batch system, which is done by adding / modifying variable assignments in the general form
     236{{{
     237%<variable name> <value>
     238}}}
     239where {{{<variable name>}}} is the name of the Unix environment variable in the {{{palmrun}}} script and {{{<value>}}} is the value to be assigned to this variable. Each assignment must start with a {{{%}}}. A minimum set of variables to be added / modified is:
     240{{{
     241# to be added
     242%submit_command      /opt/moab/default/bin/msub -E
     243%defaultqueue        small
     244%memory              1500
     245
     246# to be modified
     247%local_jobcatalog    /home/username/job_queue
     248%fast_io_catalog     /gfs2/work
     249%execute_command     aprun  -n {{mpi_tasks}}  -N {{tasks_per_node}}  ./palm
     250}}}
     251Given values are just examples! The automatic installer may have already included these variable settings as comment lines (starting with {{{#}}}). Then just remove the {{{#}}} and provide a proper value.
     252
     253The meaning of these variables is as follows:
     254 * {{{submit_command}}}: Batch system specific command to submit batch jobs plus options which may be required on your system. Please give the full path to the submit command. See your batch system documentation for any details.
     255 * {{{defaultqueue}}}: Name of the queue to be used if the {{{palmrun}}} option {{{-q}}} is omitted. See your batch system documentation for queue names available on your system.
     256 * {{{memory}}}: Memory in MByte requested by each core. If given, this value is used as the default if the {{{palmrun}}} option {{{-m}}} has not been set.
     257 * {{{local_jobcatalog}}}: Name of the folder where your job protocol file is put after the batch job has finished. Batch queuing systems usually create a protocol file for each batch job which contains relevant information about all activities performed within the job.
     258 * {{{fast_io_catalog}}}: Folder to be used by {{{palmrun}}}/PALM for temporary I/O files. Since PALM setups with a large number of grid points may create very large files, data should be written to a file system with very fast hard discs or SSDs in order to get good I/O performance. Computer centers typically provide such file systems and you should set your {{{fast_io_catalog}}} to such a file system.
     259 * {{{execute_command}}}: Command to execute PALM (i.e. the executable that has been created by the compiler). It depends on the MPI library and the operating system that is used. See your MPI documentation or information provided by your computing center. Strings {{{ {{mpi_tasks}} }}} and {{{ {{tasks_per_node}} }}} will be replaced by {{{palmrun}}} depending on {{{palmrun}}} options {{{-X}}} and {{{-T}}}.
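As an illustration of how the placeholders in {{{execute_command}}} end up as concrete values, the following shell sketch substitutes them with {{{sed}}}. The function {{{expand_command}}} is hypothetical and is not how {{{palmrun}}} is necessarily implemented; it only mimics the replacement described above, using the example {{{aprun}}} command and the values {{{-X 48}}} and {{{-T 12}}}.

```shell
#!/bin/bash
# Hypothetical sketch, not the palmrun source: fill in the placeholders
# of an execute_command template.

expand_command() {
    # $1: command template, $2: value for {{mpi_tasks}} (option -X),
    # $3: value for {{tasks_per_node}} (option -T)
    printf '%s\n' "$1" |
        sed -e "s/{{mpi_tasks}}/$2/g" -e "s/{{tasks_per_node}}/$3/g"
}

expand_command 'aprun -n {{mpi_tasks}} -N {{tasks_per_node}} ./palm' 48 12
# prints: aprun -n 48 -N 12 ./palm
```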
     260
     261You can find more details in the [wiki:doc/app/palmconfig complete description of the configuration file].
     262
     263Now you may start your first batch job by entering
     264{{{
     265   palmrun  -b -d neutral -h batch -t 5400 -m 1500 -X 48 -T 12 -q medium -a "d3#"
     266}}}
     267Based on these arguments, the environment variables that have been described above will be set by {{{palmrun}}} to:
     268 * {{{ {{job_id}} }}} = neutral.##### \\ where ##### is a five digit random number which is newly created for each job. The {{{job_id}}} is used for different purposes, e.g. it defines the name under which you can find the job in the queuing system.
     269 * {{{ {{cpu_hours}} }}} = 1 \\ calculated from option {{{-t}}}
     270 * {{{ {{cpu_minutes}} }}} = 30  \\ calculated from option {{{-t}}}
     271 * {{{ {{cpu_seconds}} }}} = 0 \\ calculated from option {{{-t}}}
     272 * {{{ {{mpi_tasks}} }}} = 48 \\ as given by option {{{-X}}}
     273 * {{{ {{tasks_per_node}} }}} = 12 \\ as given by option {{{-T}}}
     274 * {{{ {{nodes}} }}} = 4 \\ calculated from {{{-X}}} / {{{-T}}}. If {{{-X}}} is not a multiple of {{{-T}}}, {{{nodes}}} is incremented by one, e.g. {{{-X 49 -T 12}}} gives {{{nodes = 5}}}.
     275 * {{{ {{queue}} }}} = medium \\ as given by option {{{-q}}}
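The arithmetic behind the time and node values listed above can be reproduced with a few lines of shell. This is only a sketch under the rules stated in the list (splitting {{{-t}}} into hours, minutes and seconds, and rounding {{{-X}}} / {{{-T}}} up to the next integer); the function names {{{split_cputime}}} and {{{count_nodes}}} are hypothetical and do not appear in {{{palmrun}}}.

```shell
#!/bin/bash
# Hypothetical sketch, not the palmrun source.

split_cputime() {
    # $1: CPU time in seconds (option -t); prints "hours minutes seconds"
    local t=$1
    echo "$(( t / 3600 )) $(( (t % 3600) / 60 )) $(( t % 60 ))"
}

count_nodes() {
    # $1: total number of cores (option -X), $2: MPI tasks per node (option -T)
    # Integer ceiling division: a partially filled node still counts as a node.
    local cores=$1 tasks_per_node=$2
    echo $(( (cores + tasks_per_node - 1) / tasks_per_node ))
}

split_cputime 5400   # prints: 1 30 0
count_nodes 48 12    # prints: 4
count_nodes 49 12    # prints: 5
```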
     276
     277When you enter the above command for the first time, {{{palmrun}}} will call the script {{{palmbuild}}} to re-compile the PALM code. The compiled code will be put into folder {{{$HOME/palm/current_version/MAKE_DEPOSITORY_batch}}}. Re-compilation is required since {{{palmrun}}} expects a separate make depository for each configuration file (because the configuration files may contain different compiler settings).
     278
     279After confirming the {{{palmrun}}} settings by entering {{{y}}}, the following information will be output to the terminal:
     280{{{
     281................
     282}}}
     283The job is now queued and you have to wait until it has finished. A job protocol file with name ... will be put in ...
    233284
    234285