Changes between Version 48 and Version 49 of doc/app/palmrun


Timestamp:
Nov 23, 2018 11:52:48 AM (6 years ago)
Author:
kanani
Comment:

--

  • doc/app/palmrun

    v48 v49  
    11= The PALM run script=
    22
    3 The main script to execute PALM is called {{{palmrun}}}. This chapter describes the actions / operations carried out by {{{palmrun}}} and gives a [#options complete list and description of its available options].
    4 
    5 
    6 PALM can be run in different modes:
    7 * [#interactive interactive mode] | PALM executes (almost) immediately within your terminal session after entering the {{{palmrun}}} command.
    8 * [#batch batch mode] | PALM job is submitted by {{{palmrun}}} to a queuing/batch system (e.g. PBS, ...), where it is scheduled for execution.
     3The main script to execute PALM is called {{{palmrun}}}. This chapter describes the actions carried out by {{{palmrun}}} and gives a complete list and description of its available [#options options].
     4
     5PALM can run in different modes:\\
     6 [#interactive Interactive mode]:: PALM executes (almost) immediately within your terminal session after entering the {{{palmrun}}} command.
     7 [#batch Batch mode]:: PALM job is submitted by {{{palmrun}}} to a queuing/batch system (e.g. PBS, SLURM, ...), where it is scheduled for execution.
    98
    109A batch system is a must-have on high-performance computers, and a nice-to-have for computers that are shared among a larger number of users.
    1110The handling of PALM differs between interactive and batch mode, and it varies slightly depending on whether the PALM job is submitted to the
    12 * [#batch_local local computer/host] | The computer that you are currently sitting at or are logged in via your terminal (ssh).
    13 * [#batch_remote remote computer/host] | Any computer with a batch system, that you have ssh access to, but are not logged in at the moment. The remote host becomes your local host as soon as you log in to the remote host via ssh.
     11 [#batch_local Local host]:: The computer that you are currently sitting at or are logged in to via your terminal (ssh).
     12 [#batch_remote Remote host]:: Any computer with a batch system that you have ssh access to but are not logged in to at the moment. The remote host becomes your local host as soon as you log in to it via ssh.
    1413
    1514== [=#interactive Interactive mode] ==
    16 You can follow the progress of the simulation on the terminal where a lot of informative messages will be output. You can also stop the simulation at any time by typing {{{Ctrl+C}}}.
    17 
    18 The following instructions assume, that the [wiki:doc/install/automatic automatic installer] has run without any problems. You should now be able to start the first PALM simulation yourself. Please enter
    19 {{{
    20    palmrun  -r example_cbl  -c default  -a "d3#"  -X4
    21 }}}
    22 
    23 After entering the {{{palmrun}}} command, some general settings will be listed on the terminal and the user is prompted for confirmation:
     15The following instructions assume that the [wiki:doc/install/automatic automatic installer] has installed PALM without any problems. You should now be able to start the first PALM simulation yourself. Please enter
     16{{{
     17   palmrun  -r example_cbl  -c default  -a "d3#"  -X 4
     18}}}
     19
     20You can follow the progress of the simulation on the terminal, where many informative messages will be output. You can also stop the simulation at any time by typing {{{Ctrl+C}}}.
     21Some general settings will be listed on the terminal and the user is prompted for confirmation:
    2422{{{
    2523*** palmrun  1.0 Rev: 3151 $
     
    3028
    3129  *** INFORMATIVE: additional source code directory
    32       "/home/raasch/palm/current_version/JOBS/example_cbl/USER_CODE"
     30      "/home/<local_username>/palm/current_version/JOBS/example_cbl/USER_CODE"
    3331      does not exist or is not a directory.
    3432      No source code will be used from this directory!
     
    3836| PALM code    Rev: 3209                                                 |
    3937|                                                                        |
    40 | called on:               bora                                          |
    41 | config. identifier:      imuk (execute on IP: 130.75.105.103)          |
     38| called on:               <local host name>                             |
     39| config. identifier:      imuk (execute on IP: 111.11.111.111)          |
    4240| running in:              interactive run mode                          |
    4341| number of cores:         4                                             |
    4442| tasks per node:          4 (number of nodes: 1)                        |
    4543|                                                                        |
    46 | cpp directives:          -cpp -D__parallel -DMPI_REAL=MPI_DOUBLE_PRECI |
    47 |                          SION -DMPI_2REAL=MPI_2DOUBLE_PRECISION -D__ff |
    48 |                          tw -D__netcdf                                 |
    49 | compiler options:        -fpe0 -O3 -xHost -fp-model source -ftz -no-pr |
    50 |                          ec-div -no-prec-sqrt -ip -I /muksoft/packages |
    51 |                          /fftw/3.3.4/include -L/muksoft/packages/fftw/ |
    52 |                          3.3.4/lib64 -lfftw3 -I /muksoft/packages/netc |
    53 |                          df/4_intel/include -L/muksoft/packages/netcdf |
    54 |                          /4_intel/lib -lnetcdf -lnetcdff               |
    55 | linker options:          -fpe0 -O3 -xHost -fp-model source -ftz -no-pr |
    56 |                          ec-div -no-prec-sqrt -ip -I /muksoft/packages |
    57 |                          /fftw/3.3.4/include -L/muksoft/packages/fftw/ |
    58 |                          3.3.4/lib64 -lfftw3 -I /muksoft/packages/netc |
    59 |                          df/4_intel/include -L/muksoft/packages/netcdf |
    60 |                          /4_intel/lib -lnetcdf -lnetcdff               |
     44| cpp directives:          -cpp -D__parallel ...                         |
     45| compiler options:        -fpe0 -O3 -xHost -fp-model source ...         |
     46| linker options:          -fpe0 -O3 -xHost -fp-model source ...         |
    6147|                                                                        |
    6248| run identifier:          example_cbl                                   |
     
    6652 >>> everything o.k. (y/n) ?
    6753}}}
    68 Listed settings are determined by the {{{palmrun}}} options and settings in the configuration file (here {{{.palm.config.default}}}). Entering {{{n}}} will abort {{{palmrun}}}. Entering {{{y}}} will finally start execution of PALM and a larger number of informative messages will appear on the terminal:
    69 {{{
    70  ***  PALMRUN will now continue to execute on this machine
     54
     55Listed settings are determined by the {{{palmrun}}} options and settings in the [wiki:doc/app/palm_config configuration file] (here {{{.palm.config.default}}}).\\
     56 Entering {{{n}}}:: aborts {{{palmrun}}}\\
     57 Entering {{{y}}}:: starts execution of PALM, and some more informative messages will appear on the terminal.\\
     58
     59{{{
     60***  PALMRUN will now continue to execute on this machine
    7161
    7262  *** creating executable and other sources for the local host
     
    9484  ----------------------------------------------------------------------------
    9585
    96   *** running on: bora bora bora bora
     86  *** running on: hostname hostname hostname hostname
    9787  *** execute command:
    9888      "mpiexec -machinefile hostfile -n 4 palm"
     
    118108  *** execution finished
    119109}}}
    120 In case that {{{palmrun}}} has proceeded to this point ({{{finished time stepping}}} and {{{execution finished}}}) without giving warning- or error-messages, the PALM simulation has finished successfully. The displayed progress bar ({{{xxxxx}}}}) allows you to estimate how long the run still needs to finish.
     110If {{{palmrun}}} has proceeded to this point ({{{finished time stepping}}} and {{{execution finished}}}) without giving warning or error messages, the PALM simulation has finished successfully. While the run is in progress, the displayed progress bar ({{{xxxxx}}}) allows you to estimate how much longer it will take.
    121111
    122112Subsequent messages give information about post processing and copying of output data:
     
    149139  ----------------------------------------------------------------------------
    150140  >>> OUTPUT: RUN_CONTROL  to
    151               /home/raasch/palm/current_version/JOBS/example_cbl/MONITORING/example_cbl_rc
     141              /home/<local_username>/palm/current_version/JOBS/example_cbl/MONITORING/example_cbl_rc
    152142
    153143  >>> OUTPUT: HEADER  to
    154               /home/raasch/palm/current_version/JOBS/example_cbl/MONITORING/example_cbl_header
     144              /home/<local_username>/palm/current_version/JOBS/example_cbl/MONITORING/example_cbl_header
    155145
    156146  >>> OUTPUT: CPU_MEASURES  to
    157               /home/raasch/palm/current_version/JOBS/example_cbl/MONITORING/example_cbl_cpu
     147              /home/<local_username>/palm/current_version/JOBS/example_cbl/MONITORING/example_cbl_cpu
    158148
    159149  >>> OUTPUT: DATA_1D_PR_NETCDF  to
    160               /home/raasch/palm/current_version/JOBS/example_cbl/OUTPUT/example_cbl_pr.nc
     150              /home/<local_username>/palm/current_version/JOBS/example_cbl/OUTPUT/example_cbl_pr.nc
    161151
    162152  >>> OUTPUT: DATA_1D_TS_NETCDF  to
    163               /home/raasch/palm/current_version/JOBS/example_cbl/OUTPUT/example_cbl_ts.nc
     153              /home/<local_username>/palm/current_version/JOBS/example_cbl/OUTPUT/example_cbl_ts.nc
    164154
    165155  >>> OUTPUT: DATA_2D_XY_NETCDF  to
    166               /home/raasch/palm/current_version/JOBS/example_cbl/OUTPUT/example_cbl_xy.nc
     156              /home/<local_username>/palm/current_version/JOBS/example_cbl/OUTPUT/example_cbl_xy.nc
    167157
    168158  >>> OUTPUT: DATA_2D_XZ_NETCDF  to
    169               /home/raasch/palm/current_version/JOBS/example_cbl/OUTPUT/example_cbl_xz.nc
     159              /home/<local_username>/palm/current_version/JOBS/example_cbl/OUTPUT/example_cbl_xz.nc
    170160
    171161  >>> OUTPUT: DATA_2D_XZ_AV_NETCDF  to
    172               /home/raasch/palm/current_version/JOBS/example_cbl/OUTPUT/example_cbl_xz_av.nc
     162              /home/<local_username>/palm/current_version/JOBS/example_cbl/OUTPUT/example_cbl_xz_av.nc
    173163
    174164  ----------------------------------------------------------------------------
     
    177167 --> palmrun finished
    178168}}}
    179 You should find the output files at their respective positions as listed in the terminal output. Most of PALM's output files are written in NetCDF format and are copied to subdirectory {{{OUTPUT}}}. Some general information files are written in ASCII format and are copied to folder {{{MONITORING}}}. Please see here (add link) for a complete list of different output data/files that PALM offers. Section ..... describes how to steer PALM's output (e.g. output quantities, output intervals, etc.).
    180 
    181 You are now at the point where you can define and run your own simulation set-up for the first time.
    182 
    183 == How to create a new simulation set-up
    184 
    185 First give your new set-up a name to be used as the run identifier, e.g. {{{neutral}}}. Create a new parameter file and set all parameters required for defining your set-up (number of grid points, grid spacing, etc.) . You may find it more convenient to use an existing parameter file and modify it, e.g. the one which has come with the automatic installation:
    186 {{{
    187    cd ~/palm/current_version
    188    mkdir -p JOBS/neutral/INPUT
    189    cp JOBS/example_cbl/INPUT/example_cbl_p3d JOBS/neutral/INPUT/neutral_p3d
    190 }}}
    191 Edit file {{{neutral_p3d}}} and add, delete, or change parameters. Run your new set-up with
    192 {{{
    193    palmrun -r neutral -c default -X4 -a "d3#"
    194 }}}
    195 If the run has finished successfully, results can be found in folders {{{JOBS/neutral/MONITORING}}} and {{{JOBS/neutral/OUTPUT}}}.
     169You should find the output files at their respective positions as listed in the terminal output. Most of PALM's output files are written in NetCDF format and are copied to subdirectory {{{OUTPUT}}}. Some general information files are written in ASCII format and are copied to folder {{{MONITORING}}}. All available output files of PALM are listed [wiki:doc/app/palm_iofiles here]. PALM offers several [wiki:doc/app/d3par#output namelist parameters] to steer the PALM output.
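As a quick check after the example run, the paths reported in the terminal output above can be tested for existence. The following is a minimal sketch and not part of PALM: the helper names {{{expected_outputs}}} and {{{check_outputs}}} are hypothetical, and the file list is simply copied from the {{{example_cbl}}} output shown above, so other set-ups may write a different set of files.

```shell
# Sketch only: file names are taken from the example_cbl terminal output;
# other set-ups may produce a different set of files.

# Print the output paths expected for a run, relative to the PALM base directory.
expected_outputs() {
    local run_id=$1
    local suffix
    # General information files (ASCII) go to MONITORING.
    for suffix in rc header cpu; do
        echo "JOBS/${run_id}/MONITORING/${run_id}_${suffix}"
    done
    # NetCDF files go to OUTPUT.
    for suffix in pr ts xy xz xz_av; do
        echo "JOBS/${run_id}/OUTPUT/${run_id}_${suffix}.nc"
    done
}

# Report which of the expected files exist below the given base directory.
check_outputs() {
    local base=$1
    local run_id=$2
    local path
    expected_outputs "$run_id" | while read -r path; do
        if [ -e "${base}/${path}" ]; then
            echo "found:   ${path}"
        else
            echo "missing: ${path}"
        fi
    done
}

# ~/palm/current_version is the installation directory used in this chapter.
check_outputs "$HOME/palm/current_version" example_cbl
```

Any line reported as {{{missing}}} is a hint to look at the terminal messages of the run again.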
     170
     171You are now at the point where you can [wiki:doc/app/palmrun_quickstart#create define and run your own simulation set-up] for the first time.
    196172
    197173== [=#batch Batch mode] ==
    198174
    199 Large simulation set-ups usually cannot be run interactively, since the large amount of required resources (memory as well as cpu-time) are only provided through batch environments. {{{palmrun}}} supports two different ways to run PALM in batch mode. In both cases it creates a batch job, i.e. a file containing directives for a queuing-system plus commands to run PALM, which is then either submitted to your local computer or to a remote computer. Running PALM in batch mode requires you to manually modify and extend your [wiki:doc/app/palm_config configuration file], and that a batch system (e.g. PBS, Slurm, ...) is installed on the respective computer.
     175Large simulation set-ups usually cannot be run interactively, since the large amount of required resources (memory as well as CPU time) is only provided through batch environments. {{{palmrun}}} supports two different ways to run PALM in batch mode. In both cases it creates a batch job, i.e. a file containing directives for a queuing system plus commands to run PALM, which is then submitted either to your local computer or to a remote computer. Running PALM in batch mode requires that you manually modify and extend your [wiki:doc/app/palm_config configuration file], and that a batch system (e.g. PBS, Slurm, ...) is installed on the respective computer.
    200176
    201177=== [=#batch_local Running PALM in batch on a local computer] ===
     
    203179The local computer is the one where the commands that you enter in a terminal session are executed. This might be your local PC/workstation, or a login-node of a cluster-system / computer center where you are logged in via ssh. Regardless of the computer, it is assumed that PALM has been successfully installed on that machine, either using the automatic installer or via manual installation.
    204180
    205 For running PALM in batch mode you need to include additional options in the {{{palmrun}}} command to specify the system resources requested by the job, and to modify your configuration file. A minimum set of additional {{{palmrun}}} options is
    206 {{{
    207    palmrun  ....-b -c <configuration identifier>  -t <cputime>  -X <total number of cores>  -T <MPI tasks per node>  -q <queue>
     181For running PALM in batch mode you need to include __additional__ options in the {{{palmrun}}} command to specify the system resources requested by the job, and to modify your configuration file. A minimum set of __additional__ {{{palmrun}}} options is
     182{{{
     183palmrun  ....-b -c <configuration identifier>  -m <memory> -t <cputime>
     184             -X <total number of cores>  -T <MPI tasks per node>  -q <queue>
    208185}}}
    209186
    210187**Note:** The first option {{{-b}}} is required to tell {{{palmrun}}} to create a batch job running on the local computer!
    211188
    212 Before entering the above command, you need to add information to your configuration file. You may edit an existing file (.e.g. {{{.palm.config.default}}}) or create a new one (e.g. by copying the default file to e.g. {{{.palm.config.batch}}} and then editing the new file). In general, you cannot use the same configuration file for running interactive jobs and batch jobs as well since different settings are required. Let's assume here that you have created a new file {{{.palm.config.batch}}}. Edit this file and add those batch directives required by your batch system. You can find more details in the complete description of the [wiki:doc/app/palm_config#Batchjobdirectives configuration file].
    213 
    214 Now you may start your first batch job by entering
    215 {{{
    216    palmrun  -b -r neutral -c batch -t 5400 -m 1500 -X 48 -T 12 -q medium -a "d3#"
    217 }}}
     189Before entering the above command, you need to add information to your configuration file. **Best practice** is to create a new file, for example by copying the default file to {{{.palm.config.batch}}} and then editing the new file. On a system that allows both batch and interactive mode in the same software environment, you may use one and the same configuration file to start {{{palmrun}}} in either mode. You can find more details in the complete description of the [wiki:doc/app/palm_config#Batchjobdirectives configuration file].\\
     190
    218191Based on these arguments, the environment variables that have been described [wiki:doc/app/palm_config here] will be set by {{{palmrun}}} to:
    219  * {{{ {{job_id}} }}} = neutral.##### \\ where ##### is a five digit random number which is newly created for each job. The {{{job_id}}} is used for different purposes, e.g. it defines the name under which you can find the job in the queuing system.
    220  * {{{ {{cpu_hours}} }}} = 1, {{{ {{cpu_minutes}} }}} = 30  and {{{ {{cpu_seconds}} }}} = 0 \\ calculated from option {{{-t}}}
    221  * {{{ {{mpi_tasks}} }}} = 48 \\ as given by option {{{-X}}}
    222  * {{{ {{tasks_per_node}} }}} = 12 \\ as given by option {{{-T}}}
    223  * {{{ {{nodes}} }}} = 4 \\ calculated from {{{-X}}} / {{{-T}}}. If {{{-X}}} is not a multiple of {{{-T}}}, {{{nodes}}} is incremented by one, e.g. {{{-X 49 -T 12}}} gives {{{nodes = 5}}}.
    224  * {{{ {{queue}} }}} = medium \\ as given by option {{{-q}}}
     192* {{{ {{run_id}} }}} = example_cbl.##### \\ where ##### is a five-digit random number which is newly created for each job. The {{{run_id}}} is used for different purposes, e.g. it defines the name under which you can find the job in the queuing system.
     193* {{{ {{cpu_hours}} }}} = 1, {{{ {{cpu_minutes}} }}} = 30  and {{{ {{cpu_seconds}} }}} = 0 \\ calculated from option {{{-t}}}
     194* {{{ {{mpi_tasks}} }}} = 48 \\ as given by option {{{-X}}}
     195* {{{ {{tasks_per_node}} }}} = 12 \\ as given by option {{{-T}}}
     196* {{{ {{nodes}} }}} = 4 \\ calculated from {{{-X}}} / {{{-T}}}. If {{{-X}}} is not a multiple of {{{-T}}}, {{{nodes}}} is incremented by one, e.g. {{{-X 49 -T 12}}} gives {{{nodes = 5}}}.
     197* {{{ {{queue}} }}} = medium \\ as given by option {{{-q}}}
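
The arithmetic behind the values in this list can be sketched in a few lines of shell. This is an illustration only and not the code used by {{{palmrun}}}: the function names are hypothetical, and the sketch assumes that {{{-t}}} is given in seconds, which is consistent with {{{cpu_hours}}} = 1 and {{{cpu_minutes}}} = 30 for a value of 5400.

```shell
# Sketch only: mirrors the arithmetic described in the list above.
# The function names are hypothetical and are not part of palmrun.

# Split a CPU time in seconds (option -t) into "hours minutes seconds".
split_cputime() {
    local total=$1
    echo "$(( total / 3600 )) $(( (total % 3600) / 60 )) $(( total % 60 ))"
}

# Number of nodes from total cores (-X) and MPI tasks per node (-T).
# Integer division rounded up, so a partly filled node still counts.
compute_nodes() {
    local cores=$1
    local tasks_per_node=$2
    echo $(( (cores + tasks_per_node - 1) / tasks_per_node ))
}

split_cputime 5400     # prints "1 30 0"
compute_nodes 48 12    # prints "4"
compute_nodes 49 12    # prints "5"
```

Rounding up in {{{compute_nodes}}} reproduces the rule stated above: 49 cores at 12 tasks per node need a fifth node for the remaining task.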
    225198
    226199When you enter the above command for the first time, {{{palmrun}}} will call the script {{{palmbuild}}} to re-compile the PALM code. The compiled code will be put into folder {{{$HOME/palm/current_version/MAKE_DEPOSITORY_batch}}}. Re-compilation is required since {{{palmrun}}} expects a separate make depository for each configuration file (because the configuration files may contain different compiler settings).
     
    243216
    244217}}}
    245 Before the batch job is finally submitted, {{{palmrun}}} creates a folder named {{{SOURCES_FOR_RUN_<run_identifier>}}} which is located in the {{{fast_io_catalog}}} and which contains various files required for the run (e.g. the PALM executable, PALM's source code and object files, copies of the configuration files, etc.). Messages {{{*** executable and other sources created}}} and {{{*** input files have been copied}}} tell you that this folder has been created. {{{*** nothing to compile for this run}}} means that no user interface needs to be compiled. After the job submission, the batch system usually prompts a message ({{{<<<submit message from batch system>>>}}}) which tells you the batch system id under which you can find your job in the queueing system (e.g. if you like to cancel it). The job is now queued and you have to wait until it is finished. The main task of the job is to execute the {{{palmrun}}} command again, that you have entered, but now on the compute nodes of your system. A job protocol file with name {{{<configuration identifier>_<run identifier>}}} as given with {{{palmrun}}} options {{{-c}}} and {{{-r}}} (here it will be {{{batch_neutral}}}) will be put in the folder that you have set by variable {{{
    246 local_jobcatalog
    247 }}} in your configuration file ({{{.palm.config.batch}}}). Check contents of this file carefully. Beside some additional information, it mainly contains the output of the {{{palmrun}}} command as you get it during interactive execution, e.g. information is given to where the output files have been copied.
     218Before the batch job is finally submitted, {{{palmrun}}} creates a folder named {{{SOURCES_FOR_RUN_<run_identifier>}}} which is located in the {{{fast_io_catalog}}} and which contains various files required for the run (e.g. the PALM executable, PALM's source code and object files, copies of the configuration files, etc.). Messages {{{*** executable and other sources created}}} and {{{*** input files have been copied}}} tell you that this folder has been created. {{{*** nothing to compile for this run}}} means that no user interface needs to be compiled. After the job submission, the batch system usually prints a message ({{{<<<submit message from batch system>>>}}}) which tells you the batch system id under which you can find your job in the queueing system (e.g. if you want to cancel it). The job is now queued and you have to wait until it is finished. The main task of the job is to execute the {{{palmrun}}} command that you entered once more, but now on the compute nodes of your system. A job protocol file with name {{{<configuration identifier>_<run identifier>}}} as given with {{{palmrun}}} options {{{-c}}} and {{{-r}}} (here it will be {{{batch_example_cbl}}}) will be put in the folder that you have set by variable {{{local_jobcatalog}}} in your configuration file ({{{.palm.config.batch}}}). Check the contents of this file carefully. Besides some additional information, it mainly contains the output of the {{{palmrun}}} command as you get it during interactive execution, e.g. information about where the output files have been copied.
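
The naming rule for the job protocol file can be written down as a small shell helper. This is only a sketch: the function name {{{protocol_path}}} is hypothetical, and {{{$HOME/job_queue}}} is an assumed example value for {{{local_jobcatalog}}}; use whatever folder is set in your own configuration file.

```shell
# Sketch only: builds the expected location of the job protocol file,
# <local_jobcatalog>/<configuration identifier>_<run identifier>.
# protocol_path is a hypothetical helper, not a PALM command.
protocol_path() {
    local jobcatalog=$1   # value of local_jobcatalog in the configuration file
    local config_id=$2    # as given with palmrun option -c
    local run_id=$3       # as given with palmrun option -r
    echo "${jobcatalog}/${config_id}_${run_id}"
}

# $HOME/job_queue is an assumed example value for local_jobcatalog.
protocol=$(protocol_path "$HOME/job_queue" batch example_cbl)

if [ -f "$protocol" ]; then
    # Show the end of the protocol, where the copied output files are listed.
    tail -n 40 "$protocol"
else
    echo "job protocol not (yet) available: $protocol"
fi
```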
    248219
    249220Typically, batch systems allow you to run jobs only for a limited time, e.g. 12 hours. See chapter [wiki:doc/app/runs job chains and restart jobs] on how you can use {{{palmrun}}} to create so-called job chains in order to carry out simulations which exceed the time limit for single jobs.
     
    260231Now, let's start with the configuration file settings for remote batch jobs. For this it would be convenient to create a new configuration file based on the one you already used locally, e.g. by
    261232{{{
    262    cp  .palm.config.batch  .palm.config.<remote configuration identifier>
    263 }}}
    264 where {{{<remote configuration identifier>}}} can be any string to identify your remote host. Edit this file as described [wiki:doc/app/palm_config#Additionaldirectivesforbatchjobsonremotehosts here].
     233   cp  .palm.config.batch  .palm.config.batch_remote
     234}}}
     235where {{{batch_remote}}} can be any string to identify your remote host. Edit this file as described [wiki:doc/app/palm_config#Additionaldirectivesforbatchjobsonremotehosts here].
    265236
    266237After setting up the configuration file and before calling {{{palmrun}}}, you need to call the {{{palmbuild}}} command to generate the PALM executable for the remote host:
    267238{{{
    268    palmbuild -c <remote configuration identifier>
    269 }}}
    270 Keep in mind that the configuration file {{{.palm.config.<remote configuration identifier>}}} requires correct settings valid for your remote computer (compiler name, compiler options, include and library paths, etc.). If you forgot to call {{{palmbuild}}}, {{{palmrun}}} will ask you to do this for you.
     239   palmbuild -c batch_remote
     240}}}
     241Keep in mind that the configuration file {{{.palm.config.batch_remote}}} requires correct settings valid for your remote computer (compiler name, compiler options, include and library paths, etc.). If you forget to call {{{palmbuild}}}, {{{palmrun}}} will offer to do this for you.
    271242
    272243If {{{palmbuild}}} succeeded, you can enter the {{{palmrun}}} command, like
    273244{{{
    274    palmrun -r neutral -c <remote configuration identifier> ......
     245   palmrun -r example_cbl -c batch_remote ......
    275246}}}
    276247After confirming the {{{palmrun}}} settings by entering {{{y}}}, similar information as for local batch jobs will be output to the terminal. {{{palmrun}}} finally terminates with the message {{{--> palmrun finished}}}. The batch job is now queued on the remote system. After the job has finished, the job protocol will be transferred back to your local computer and put into the folder defined by {{{local_jobcatalog}}}. If this file does not appear, e.g. because the transfer failed, you may find the protocol file on the remote host in the folder defined by {{{remote_jobcatalog}}}. As with batch jobs running on local computers, check the contents of this file carefully. Besides some additional information, it mainly contains the output of the {{{palmrun}}} command as you get it during interactive execution, and in particular it tells you where to find the output files on your local computer.