Version 37 (modified by scharf, 6 years ago)
This page is under construction!
palmrun technical description
Introduction
palmrun is the main script to execute PALM in interactive mode (running PALM in a terminal) or in batch mode. This chapter describes the actions / operations carried out by palmrun and gives a complete list and description of its available options. At the end of this chapter you will find detailed information for experienced PALM-users who migrate from the old mrun / mbuild scripts to the new scripts palmrun / palmbuild.
Mode of operation
Explanations will follow about the order of actions carried out by palmrun in case of interactive runs, batch runs, and batch runs on remote machines ...
palmrun options
There are two groups of options: user options, which you can specify yourself when calling palmrun manually from the terminal, and internal options, which are used in automatically created internal calls of palmrun. Internal calls of palmrun are those that are part of the batch job and those used for automatically starting restart jobs. Normally, you should never use the internal options. User options from a manual call are automatically added to the internal calls of palmrun.
The following gives complete lists of palmrun user and internal options. A --- in the second column means that the respective option has no argument.
palmrun user options
option | default value | meaning |
---|---|---|
-a | " " | activation string list |
-A | " " | project account number |
-b | --- | create a batch job |
-B | --- | Do not delete the temporary working directory |
-c | default | configuration identifier |
-C | --- | Tells that it is a palmrun call for a restart job that has been automatically created. This is an internal option but it can be used for manually generated restart runs, if the user likes to re-use the contents of the SOURCES_FOR_RUN... folder. |
-F | --- | Create a batch job file only, and do not submit it. |
-k | false | If set true, input files that have the ln attribute and that have been generated by a previous run within a job chain will be automatically deleted at the end of the run. |
-m | " " | memory in MByte to be requested in batch jobs per MPI task |
-M | " " | Makefile to compile the PALM code and utility programs. By default, the name of the makefile is Makefile, and it is expected to be in the folder that is given by variable source_path in the configuration file. |
-O | 1 | OpenMP threads to be started per MPI task. Environment variable OMP_NUM_THREADS will be set to this value |
-q | none | name of the job queue to which batch jobs will be submitted |
-r | test | Name of the run |
-s | " " | List of subroutines (Fortran file names) from the SVN repository (under .../trunk/SOURCES) that shall be compiled for this run. Compiled files will be used exclusively for the run and will not be put in the MAKE_DEPOSITORY. In case of -s LM, all files in the repository that have been modified by the user will be compiled. |
-t | " " | cpu time request for batch jobs in seconds |
-T | " " | MPI tasks per node request for batch jobs |
-v | --- | Suppresses parts of palmrun's terminal output and prevents palmrun queries |
-V | --- | Use existing SOURCES_FOR_RUN_... folder. Prevents palmrun from creating a new SOURCES_FOR_RUN_... folder. Use this option if you do not want the user interface files to be compiled again. |
-w | as -X | Number of parallel I/O streams to be opened by PALM. In the default case, all MPI processes write at the same time. This may cause file system problems in case of a very large number of cores. |
-W | " " | Name (id) of a previous job. Can be used as variable {{previous_job}} as part of job directives in the configuration file, in order to prevent the job from starting before the specified previous job has finished. The job name must be the one that has been assigned by the batch system. |
-x | --- | Causes palmrun to output extensive debug information for both interactive sessions and batch jobs. |
-X | 1 | Total number of cores to be used for the run. |
-y | --- | Use file appendix _O for local PALM-I/O files in case of uncoupled ocean runs, e.g. if the run is a precursor run and files shall later be used for coupled atmosphere-ocean runs. |
-Y | " " | In case of a coupled atmosphere-ocean run, the parameter tells PALM how many cores shall be assigned to the atmosphere and ocean model, respectively. For example, in case of -X 64 -Y "16 48", 16 cores are assigned to the atmosphere model, and 48 cores to the ocean model. |
-Z | --- | Do not call combine_plot_fields after PALM has finished. In that case, data output of 2d-cross sections or 3d-volumes that has been written by each core into a separate file will not be collected into one file. In order to process these files later, option -B should be set too. -Z might be required for very large jobs in order to reduce computational demands, because combine_plot_fields runs on one core only, so that all other cores would be idle. |
? | --- | Print a short list of available user options on the terminal. |
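As a rough illustration of how the user options from the table combine, the following Python sketch assembles a palmrun call as a shell-quoted string. The helper `build_palmrun_command` and its parameter names are hypothetical and not part of PALM; only the option letters (-r, -c, -a, -X, -T, -t, -q, -b) are taken from the table above.

```python
import shlex


def build_palmrun_command(run, config="default", activation="", cores=1,
                          tasks_per_node=None, cpu_time=None, queue=None,
                          batch=False):
    """Assemble a palmrun call from a subset of the documented user options.

    Hypothetical helper for illustration only; it does not run palmrun.
    """
    args = ["palmrun", "-r", run, "-c", config]
    if activation:
        args += ["-a", activation]          # activation string list
    args += ["-X", str(cores)]              # total number of cores
    if tasks_per_node is not None:
        args += ["-T", str(tasks_per_node)]  # MPI tasks per node
    if cpu_time is not None:
        args += ["-t", str(cpu_time)]       # CPU time request in seconds
    if queue is not None:
        args += ["-q", queue]               # job queue name
    if batch:
        args.append("-b")                   # create a batch job
    # shlex.join quotes arguments such as "d3#" so the shell sees them intact.
    return shlex.join(args)


command = build_palmrun_command("example_cbl", activation="d3#",
                                cores=4, tasks_per_node=4)
print(command)
```

The activation string is quoted because `#` would otherwise start a shell comment. Options that are not set are simply left out, so palmrun falls back to the defaults listed in the table.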
palmrun internal options
option | default value | meaning |
---|---|---|
-C | --- | Tells that it is a palmrun call for a restart job that has been automatically created |
-G | " " | Global revision number of the PALM code in trunk/SOURCES |
-i | | Five-digit random number that gives a run-id and that is used as part of the batch job name as well as the name of the temporary working directory and other files. A new random number is created for each call of palmrun (either a manual call by the user or an automatic call for generating a restart job), and is passed to the batch job internal call of palmrun via this option. |
-j | --- | Tells that palmrun is running within a batch job |
-R | " " | Return address. Tells the remote batch job to which IP-address the PALM output and the job protocol file have to be sent, and from which machine automatic restarts have to be generated. |
-u | " " | Username on the remote host as given in the configuration file by variable remote_username |
-U | " " | Username on the local host as given in the configuration file by variable local_username |
How does palmrun operate?
A detailed list of consecutive steps that are carried out will follow soon ...
mrun / mbuild to palmrun / palmbuild migration
Attention: Following text is for experienced PALM-users who switch from the old mrun / mbuild scripts to the new scripts palmrun / palmbuild. It is not up-to-date and will be removed at a later time.
Configuring and running PALM with palmbuild and palmrun
Changes compared to mrun/mbuild
- The new scripts will run on any kind of Linux / Unix system without requiring any adjustments. All settings are controlled via two configuration files.
- mbuild is replaced by palmbuild, and mrun is replaced by palmrun. The old script subjob is not used any more (submitting jobs is now part of palmrun).
- Setting the environment variable PALM_BIN in shell-profile files (e.g. .bashrc) is not required any more.
- The old configuration file .mrun.config has been split into two files .palm.config.<configuration_identifier> and .palm.iofiles, where <configuration_identifier> (short <ci>) is an arbitrary string that you can define.
"Configuration" means a setting for a specific computer with a specific compiler, compiler options, libraries, etc.
If you like to run PALM with different configurations, e.g. one with debug options switched on, and one with high optimization, you need to create separate files for each configuration, e.g. .palm.config.optimized and .palm.config.debug. This replaces the old block structure in .mrun.config. The configuration file to be used is defined by palmrun- or palmbuild-option -c, e.g. palmrun ... -c optimized will use .palm.config.optimized. You can find examples of .palm.config.<...> files in your PALM copy under .../trunk/SCRIPTS.
You will need only one file .palm.iofiles which contains the file connection statements to be used for all configurations.
The file attributes (second column in the file connection statements) have been partly changed. The second attribute, which was either loc, locopt or job, has been completely removed. Optional input files now require inopt as first attribute. Input files that are to be sent to the remote host require tr as second attribute (instead of job). fl and flpe must be changed to ln and lnpe, respectively.
For output files, a wildcard * can be given as file activation string in the third column. In such a case, existing local output files will always be copied to their permanent position. No warning will be given if they do not exist.
Wildcards (*) are allowed for local names of output files (e.g. BINOUT*) and file extensions of input files (e.g. _p3d*). Using wildcards, only one file connection statement is required, e.g. for nested runs which require different input files for each domain (_p3d, _p3d_N01, _p3d_N02, etc.) or which generate different output files (e.g. BINOUT, BINOUT_N01, etc.). The additional extensions that are identified from the existing files (e.g. _N01, _N02) will be automatically added to the local filename (in case of input files) or to the file extension (in case of output files).
The utility program interpret_config has been removed. The configuration files are now directly interpreted by the shell scripts.
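The wildcard handling for file connection statements described above can be pictured with a small Python sketch. It is not taken from palmrun (which is a shell script); the function `extra_extensions` and the sample name `_other` are hypothetical. It only shows how the additional extensions (e.g. _N01, _N02) could be identified from existing names under the assumption of a single trailing `*`.

```python
from fnmatch import fnmatchcase


def extra_extensions(pattern, existing_names):
    """Return the parts of the matching names that the trailing '*' stands for.

    Illustration only: supports a single '*' at the end of the pattern.
    """
    if not pattern.endswith("*") or "*" in pattern[:-1]:
        raise ValueError("this sketch only handles one trailing wildcard")
    stem = pattern[:-1]
    return [name[len(stem):]
            for name in sorted(existing_names)
            if fnmatchcase(name, pattern)]


# File extensions of input files for a nested run (names as in the text above,
# "_other" is a made-up entry that must not match).
input_extensions = ["_p3d", "_p3d_N01", "_p3d_N02", "_other"]
found_inputs = extra_extensions("_p3d*", input_extensions)

# Local names of output files.
output_names = ["BINOUT", "BINOUT_N01"]
found_outputs = extra_extensions("BINOUT*", output_names)

print(found_inputs, found_outputs)
```

The empty string in each result corresponds to the root domain file (`_p3d`, `BINOUT`), which carries no additional extension.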
- Only one call of palmbuild is required to compile both the utilities and the PALM source code (there is no option -u anymore). The compiled routines (object files and executables) are put into folder MAKE_DEPOSITORY_<configuration_identifier>, where <configuration_identifier> equals the string given with palmbuild-option -c.
- palmrun does not compile any more at the beginning of a batch job. The palm-executable for the batch-job (or for the interactive session) is created as part of the palmrun-call that you have manually entered at your terminal, and it is created before the batch-job is submitted. The executable is put into the folder SOURCES_FOR_RUN_<run_identifier>, where <run_identifier> is the string provided with palmrun-option -r. This folder is now put into the folder set with variable fast_io_catalog (see below for fast_io_catalog). If you do not use a user-interface, palmrun will not compile at all and will take the executable from folder MAKE_DEPOSITORY_<configuration_identifier> that has been generated with your last call of palmbuild. If palmrun cannot find the folder MAKE_DEPOSITORY_<configuration_identifier>, it will internally call palmbuild in order to generate it. If palmrun finds a folder SOURCES_FOR_RUN_<run_identifier> that has been generated by a previous call of palmrun, it will ask you if executables from that folder shall be used. This way, you can avoid re-compiling your user-interface with each call of palmrun. Automatically generated restart runs will always use executables from SOURCES_FOR_RUN_<run_identifier>.
You may have to remove folders SOURCES_FOR_RUN_... manually from time to time, because they are not deleted automatically at the end of a job (or the last job of a restart job chain).
- The option for giving the file activation strings is now -a "d3# ..." instead of -r "d3# ...".
- In case of automatic restart runs, hashes ("#") in the file activation strings are now replaced by character "r" instead of character "f".
- The .palm.config.<ci> file does not contain blocks any more. Several variable names have been changed (e.g. compiler_options instead of fopts) and new variables have been introduced (e.g. execute_command in order to give the command for starting the executable). Colons (:) for separating e.g. compiler options must not be used any more. Here is an example (with some lines truncated, as displayed by ....)
    #$Id$
    #column 1          column 2
    #name of variable  value of variable (~ must not be used, except for base_data)
    #------------------------------------------------------------------------------
    %base_data         ~/palm/current_version/JOBS
    %base_directory    $HOME/palm/current_version
    %source_path       $HOME/palm/current_version/trunk/SOURCE
    %user_source_path  $base_directory/JOBS/$fname/USER_CODE
    %fast_io_catalog   /localdata/your_linux_username
    #
    %local_ip          111.11.111.111
    %local_username    your_linux_username
    #
    %compiler_name     mpif90
    %compiler_name_ser ifort
    %cpp_options       -cpp -D__parallel -DMPI_REAL=MPI_DOUBLE_PRECISION -DMPI_2REAL=MPI_2DOUBLE_PRECISION -D__fftw -D__netcdf
    %make_options      -j 4
    %compiler_options  -openmp -fpe0 -O3 -xHost -fp-model source -ftz -fno-alias -ip -nbs -I /muksoft/packages/fftw/3.3.4/include -L/muksoft/....
    %linker_options    -openmp -fpe0 -O3 -xHost -fp-model source -ftz -fno-alias -ip -nbs -I /muksoft/packages/fftw/3.3.4/include -L/muksoft/....
    %hostfile          auto
    %execute_command   mpiexec -machinefile hostfile -n {{mpi_tasks}} ./palm
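To make the two-column layout of the configuration file more concrete, here is a minimal Python sketch that reads `%variable value` entries into a dictionary. It assumes that each entry sits on its own line in the actual file and that everything after the variable name is the value. The function `parse_palm_config` is hypothetical; the real interpretation is done inside the palmrun / palmbuild shell scripts and may differ, for example in how `$base_directory` or `$fname` are expanded, which this sketch does not attempt.

```python
def parse_palm_config(text):
    """Collect '%name value' entries; all other lines are skipped.

    Illustration only: no expansion of $variables or {{wildcards}}.
    """
    variables = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line.startswith("%"):
            continue  # comments ('#'), blank lines and other line types
        parts = line[1:].split(None, 1)
        if not parts:
            continue  # a lone '%' carries no variable name
        name = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        variables[name] = value
    return variables


# A few lines copied from the example configuration above.
sample = """\
#$Id$
#name of variable   value of variable
%compiler_name      mpif90
%make_options       -j 4
%hostfile           auto
%execute_command    mpiexec -machinefile hostfile -n {{mpi_tasks}} ./palm
"""

config = parse_palm_config(sample)
print(config["execute_command"])
```

Because no colons are used as separators any more, a value such as `-j 4` can be kept as one string and passed on unchanged.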
- Some further comments concerning specific variables:
- fast_io_catalog replaces the old variables tmp_user_catalog and tmp_data_catalog. It should be a folder on a file system with fast discs, as typically provided on large computer systems for temporary I/O, e.g. something like /work/.... The temporary working catalog created by palmrun will be in this folder, and your restart data should be put in this folder too. The default .palm.iofiles is using fast_io_catalog for the restart files.
- For cpp_options, you now have to give ALL switches required, especially -D__parallel to use the parallel version of PALM, which was implicitly set with mrun-option -K parallel before. The -K option has been removed.
- The compiler- and linker-options now require to give ALL include- and library-paths for the libraries that you intend to use (e.g. MPI, NetCDF, FFTW), if they are not automatically set by a module-environment (like e.g. on Cray-systems). Old variables like netcdf_inc or netcdf_lib have been removed from the configuration file.
- execute_command is required to define the command to execute PALM. It will depend on the MPI-library that you are using. The wildcard {{mpi_tasks}} will be replaced by the value provided with palmrun-option -X. A further wildcard that can be used is {{tasks_per_node}}, which will be replaced by the value provided with palmrun-option -T.
- The variable write_binary (formerly used to switch on the output of restart data) has been removed from the configuration file. Output of restart data is now switched on with the activation string "restart", i.e. palmrun ..... -a "... restart".
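The wildcard replacement in execute_command described above can be sketched in Python as a plain text substitution. The function `fill_placeholders` is hypothetical and only mimics the documented behaviour ({{mpi_tasks}} from option -X, {{tasks_per_node}} from option -T); how palmrun performs the replacement internally is not shown here. The aprun template is the one used in the remote-host example further below.

```python
import re


def fill_placeholders(template, values):
    """Replace every {{name}} in template by str(values[name]).

    Raises KeyError for wildcards without a value, so typos are not
    silently left in the command.
    """
    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value for wildcard {{{{{key}}}}}")
        return str(values[key])

    return re.sub(r"\{\{(\w+)\}\}", substitute, template)


# palmrun ... -X 4
local_command = fill_placeholders(
    "mpiexec -machinefile hostfile -n {{mpi_tasks}} ./palm",
    {"mpi_tasks": 4},
)

# palmrun ... -X 64 -T 16
remote_command = fill_placeholders(
    "aprun -n {{mpi_tasks}} -N {{tasks_per_node}} palm",
    {"mpi_tasks": 64, "tasks_per_node": 16},
)

print(local_command)
print(remote_command)
```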
- For running PALM on a remote host in batch, additional settings are required in the configuration file. The following is an example for using the Cray-XC40 of HLRN as a remote host:
    #column 1          column 2
    #name of variable  value of variable (~ must not be used)
    #----------------------------------------------------------------------------
    %base_data         ~/palm/current_version/JOBS
    %base_directory    $HOME/palm/current_version
    %source_path       $HOME/palm/current_version/trunk/SOURCE
    %user_source_path  $base_directory/JOBS/$fname/USER_CODE
    %fast_io_catalog   /gfs2/work/niksiraa
    %local_jobcatalog  /home/raasch/job_queue
    %remote_jobcatalog /home/h/niksiraa/job_queue
    #
    %local_ip          130.75.105.103
    %local_username    raasch
    %remote_ip         130.75.4.1
    %remote_username   niksiraa
    %remote_loginnode  hlogin1
    %ssh_key           id_rsa_hlrn
    %defaultqueue      mpp2testq
    %submit_command    /opt/moab/default/bin/msub -E
    #
    %compiler_name     ftn
    %compiler_name_ser ftn
    %cpp_options       -e Z -DMPI_REAL=MPI_DOUBLE_PRECISION -DMPI_2REAL=MPI_2DOUBLE_PRECISION -D__parallel -D__netcdf -D__netcdf4 -D__netcdf4_parallel -D__fftw
    %make_options      -j 4
    %compiler_options  -em -O3 -hnoomp -hnoacc -hfp3 -hdynamic
    %linker_options    -em -O3 -hnoomp -hnoacc -hfp3 -hdynamic -dynamic
    %execute_command   aprun -n {{mpi_tasks}} -N {{tasks_per_node}} palm
    %memory            2300
    %module_commands   module load fftw cray-hdf5-parallel cray-netcdf-hdf5parallel
    %login_init_cmd    module switch craype-ivybridge craype-haswell
    #
    # BATCH-directives to be used for batch jobs. If $-characters are required, hide them with \\\
    BD:#!/bin/bash
    BD:#PBS -A {{project_account}}
    BD:#PBS -N {{job_id}}
    BD:#PBS -l walltime={{cpu_hours}}:{{cpu_minutes}}:{{cpu_seconds}}
    BD:#PBS -l nodes={{nodes}}:ppn={{tasks_per_node}}
    BD:#PBS -o {{job_protocol_file}}
    BD:#PBS -j oe
    BD:#PBS -q {{queue}}
    #
    # BATCH-directives for batch jobs used to send back the jobfile from a remote to a local host
    BDT:#!/bin/bash
    BDT:#PBS -A {{project_account}}
    BDT:#PBS -N job_protocol_transfer
    BDT:#PBS -l walltime=00:30:00
    BDT:#PBS -l nodes=1:ppn=1
    BDT:#PBS -o {{job_transfer_protocol_file}}
    BDT:#PBS -j oe
    BDT:#PBS -q dataq
    #
    #----------------------------------------------------------------------------
    # INPUT-commands, executed before running PALM - lines must start with "IC:"
    #----------------------------------------------------------------------------
    IC:export ATP_ENABLED=1
    IC:export MPICH_GNI_BTE_MULTI_CHANNEL=disabled
    IC:ulimit -s unlimited
- Some additional settings are required here:
- fast_io_catalog is the one to be used on the remote host.
- IP-addresses and user names have to be given for the local AND the remote host. Usually, the remote host IP-address is the one for the login-node.
- remote_loginnode: on many of the large computer systems, the compute nodes do not allow for ssh- or scp-commands in order to transfer data to the local host or to start restart jobs. If remote_loginnode is set, palmrun tries to start these commands via the login-node. Attention: In most cases, the systems do not accept an IP-address. You have to give the mnemonic name of the login-node.
- ssh_key: here you can give the filename of a special ssh-key for using ssh / scp without password. The key must be in folder ~/.ssh. This is a special setting for the HLRN-system and should not be required on other systems.
- defaultqueue: if you do not set the queue via palmrun-option -q, this queue will be taken as the default queue. Unlike mrun, palmrun does not check for valid queue names any more.
- submit_command: command for submitting a job to a batch system
- module_commands: loading of necessary modules for running PALM
- login_init_cmd: commands to be carried out directly after login to the remote computer
- Lines starting with BD:: Here you have to give the batch directives that are required by your batch-system. palmrun will replace wildcards in the following way:
- {{project_account}} : To be used if you like to run the job under a specific account number. Is replaced by value provided with palmrun-option -A.
- {{job_id}} : The job's name. It will be formed by the run identifier provided with palmrun-option -r and a 5-digit random number, e.g. -r example_cbl may give example_cbl.12345.
- {{cpu_hours}}, {{cpu_minutes}}, {{cpu_seconds}} : Will be replaced based on the total CPU time in seconds provided with palmrun-option -t, e.g. -t 3666 will give {{cpu_hours}}=1, {{cpu_minutes}}=1, {{cpu_seconds}}=6.
- {{nodes}} : The number of nodes requested by the job. It will be replaced by the result of totalcores / ( noMPIt * noOpenMPt ), where totalcores is the total number of cores as requested with palmrun-option -X, noMPIt is the number of MPI-tasks to be started on each node, as given by palmrun-option -T, and noOpenMPt is the number of OpenMP-threads to be started per MPI-task, as given by palmrun-option -O.
- {{tasks_per_node}} : The number of MPI-tasks to be started on each node, as given by palmrun-option -T.
- {{jobfile}} : Name of the job protocol file. The filename for jobs running on a remote host is created from palmrun-options -c and -r, e.g. for palmrun -r example_cbl -c crayh ... the job protocol file name will be crayh_example_cbl. For jobs running on a local host, the name part from option -c will be omitted.
- {{queue}} : The name of the queue to which the job shall be submitted. Will be replaced by the value provided with palmrun-option -q, or, if -q is omitted, by the value of variable defaultqueue (see further above).
- {{previous_job}} : The name of a previous job as given by palmrun-option -W. Can be used to set job dependencies.
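The arithmetic behind some of the wildcards listed above can be written down in a few lines of Python. This is only a sketch under stated assumptions: the function names are hypothetical, rounding the node count up when the cores do not divide evenly is an assumption (the text only gives the quotient), and no zero-padding of the time fields is applied because the text does not mention any.

```python
def split_cpu_time(total_seconds):
    """Split the -t value into (cpu_hours, cpu_minutes, cpu_seconds)."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return hours, minutes, seconds


def node_count(total_cores, tasks_per_node, threads_per_task=1):
    """totalcores / (noMPIt * noOpenMPt), rounded up (rounding is assumed)."""
    cores_per_node = tasks_per_node * threads_per_task
    return -(-total_cores // cores_per_node)  # integer ceiling division


def job_protocol_name(run_identifier, configuration_identifier, remote):
    """Name of the job protocol file as described for {{jobfile}}."""
    if remote:
        return f"{configuration_identifier}_{run_identifier}"
    return run_identifier  # the -c part is omitted on a local host


cpu_fields = split_cpu_time(3666)                 # palmrun ... -t 3666
nodes = node_count(64, 16)                        # palmrun ... -X 64 -T 16
nodes_hybrid = node_count(64, 16, 2)              # ... additionally -O 2
protocol = job_protocol_name("example_cbl", "crayh", remote=True)

print(cpu_fields, nodes, nodes_hybrid, protocol)
```

With -X 64, -T 16 and the default of one OpenMP thread per task, four nodes are requested. Two threads per task halve that number, since each node then hosts 32 of the 64 cores.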
- Lines starting with BDT:: Here you have to give special batch directives for a small job that is required to send the job protocol file from the remote host back to your local host (meaning that these lines are only required if you are running batch jobs on a remote host). Since the job protocol file generated by the main job (which is started by palmrun) is not available before the end of the job, the main job has to start another small job at its end, whose only task is to send the job protocol back to the local host. The computing centers normally have special queues for this kind of small job, and you should request the job resources accordingly.