Version 31 (modified by raasch, 6 years ago)
Running PALM with palmrun
Introduction
PALM can be run in different modes:
- interactive mode | PALM executes (almost) immediately within your terminal session after entering the palmrun command.
- batch mode | PALM job is submitted by palmrun to a queuing/batch system (e.g. PBS, ...), where it is scheduled for execution.
A batch system is a must-have on high-performance computers, and a nice-to-have for computers that are shared among a larger number of users. The handling of PALM differs between interactive and batch mode, and it varies slightly depending on whether the PALM job is submitted to the
- local computer/host | The computer that you are currently sitting at, or that you are logged in to via your terminal (ssh).
- remote computer/host | Any computer with a batch system that you have ssh access to, but are not logged in to at the moment. The remote host becomes your local host as soon as you log in to it via ssh.
Interactive mode
You can follow the progress of the simulation on the terminal, where many informative messages will be output. You can also stop the simulation at any time by pressing CTRL+C.
The following instructions assume that the automatic installer has run without any problems. If the automatic installer has failed or cannot be used (e.g. on many supercomputer center systems), you need to adjust the settings in the configuration file manually. If the automatic installer has run without problems, please switch to your working directory and check whether the default configuration and parameter files have been generated:
cd ~/palm/current_version
ls -al
ls -al JOBS/example_cbl/INPUT
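If you prefer a scripted check over reading the ls output, the following minimal POSIX shell sketch reports whether the two files named in this section exist below a given working directory. The function name check_default_files is hypothetical and not part of PALM; the file locations are the ones described in the text.

```shell
# Hypothetical helper, not part of PALM: report whether the default
# configuration file and the example parameter file exist below the
# given PALM working directory (e.g. ~/palm/current_version).
check_default_files() {
    base_dir=$1
    status=0
    for f in "$base_dir/.palm.config.default" \
             "$base_dir/JOBS/example_cbl/INPUT/example_cbl_p3d"; do
        if [ -f "$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
            status=1
        fi
    done
    return $status
}

# Usage: check_default_files ~/palm/current_version
```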
You should see the default configuration file .palm.config.default. Furthermore, the parameter file JOBS/example_cbl/INPUT/example_cbl_p3d should also exist. This is a FORTRAN NAMELIST file that defines the simulation set-up and steers the PALM simulation. You should now be able to start your first PALM simulation yourself. Please enter
palmrun -d example_cbl -h default -a "d3#" -X4
example_cbl is the so-called run identifier and tells palmrun to use the NAMELIST file example_cbl_p3d from JOBS/example_cbl/INPUT. Together with the information in the default file configuration file ..../trunk/SCRIPTS/.palm.iofiles, it also determines the folders and names of the output files generated by PALM. Chapter PALM iofiles explains the format of this file and how you can modify or extend it. As a new user, you should not need to care about this file, because the default settings should do the job for you.
Option -h specifies the so-called host identifier. It tells palmrun which configuration file should be used. -h default means to use the configuration file .palm.config.default. The configuration file contains all the computer (host) specific settings, e.g. which compiler and compiler options should be used, the pathnames of libraries (e.g. NetCDF or MPI), or the name of the execution command (e.g. mpirun or mpiexec), as well as many other important settings. If the automatic installer worked correctly, it created this file for you with settings based on your responses during the installation process. You may create additional configuration files with different settings for other computers (hosts), or for the same computer, e.g. if you would like to compile and run PALM with debug compiler options (see chapter PALM configuration file).
Option -a is used for steering the handling of input and output files that are required / generated by PALM. Its argument is called the file activation string(s). The file configuration file ..../trunk/SCRIPTS/.palm.iofiles contains a complete list of PALM's I/O files, one line per file. PALM expects its input files in a temporary working directory that is created by each call of palmrun, and it also writes its output data to this temporary directory. The file configuration file tells palmrun where to find your input files and where to copy the output files to (this is necessary because the temporary working directory is automatically deleted before palmrun finishes). By default, all these files are located below $HOME/palm/current_version/JOBS/<run_identifier>, where <run_identifier> is the one given with option -d. The argument of option -a tells palmrun which of these files need to be copied. If the option is omitted, no I/O files will be copied at all. Argument "d3#" means that the parameter/NAMELIST file for steering PALM is provided as an input file. This is the minimum setting for option -a, because PALM cannot run without this parameter file. Multiple activation strings can be given. See chapter PALM iofiles for handling PALM I/O files.
Option -X specifies the number of cores on which the simulation runs. The argument should not be larger than the number of cores available on your computer (except in the case of hyperthreading), because oversubscribing the cores usually slows down the simulation significantly.
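A quick way to pick a safe value for -X is to compare it with what the system reports. The minimal sketch below uses nproc from GNU coreutils, which counts the processing units available to the current process (logical CPUs, so hyperthreads are included) and honors OMP_NUM_THREADS if that variable is set. The function name check_cores is hypothetical and not part of palmrun.

```shell
# Hypothetical helper, not part of palmrun: warn if the value intended for
# option -X exceeds the processing units reported by nproc (GNU coreutils).
check_cores() {
    requested=$1
    available=$(nproc)
    if [ "$requested" -gt "$available" ]; then
        echo "warning: -X $requested exceeds the $available processing units reported by nproc" >&2
        return 1
    fi
    return 0
}

# Usage: start the example run only if 4 cores are available
#   check_cores 4 && palmrun -d example_cbl -h default -a "d3#" -X4
```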
After entering the palmrun command, some general settings will be listed on the terminal and the user is prompted for confirmation:
*** palmrun 1.0 Rev: 3151 $ will be executed. Please wait ...
Reading the configuration file...
Reading the I/O files...
*** INFORMATIVE: additional source code directory "/home/raasch/palm/current_version/JOBS/example_cbl/USER_CODE" does not exist or is not a directory. No source code will be used from this directory!

#------------------------------------------------------------------------#
| palmrun 1.0 Rev: 3151 $                  Tue Aug 28 09:49:44 CEST 2018 |
| PALM code Rev: 3209                                                    |
|                                                                        |
| called on:               bora                                          |
| config. identifier:      imuk (execute on IP: 130.75.105.103)          |
| running in:              interactive run mode                          |
| number of cores:         4                                             |
| tasks per node:          4 (number of nodes: 1)                        |
|                                                                        |
| cpp directives:          -cpp -D__parallel -DMPI_REAL=MPI_DOUBLE_PRECI |
|                          SION -DMPI_2REAL=MPI_2DOUBLE_PRECISION -D__ff |
|                          tw -D__netcdf                                 |
| compiler options:        -fpe0 -O3 -xHost -fp-model source -ftz -no-pr |
|                          ec-div -no-prec-sqrt -ip -I /muksoft/packages |
|                          /fftw/3.3.4/include -L/muksoft/packages/fftw/ |
|                          3.3.4/lib64 -lfftw3 -I /muksoft/packages/netc |
|                          df/4_intel/include -L/muksoft/packages/netcdf |
|                          /4_intel/lib -lnetcdf -lnetcdff               |
| linker options:          -fpe0 -O3 -xHost -fp-model source -ftz -no-pr |
|                          ec-div -no-prec-sqrt -ip -I /muksoft/packages |
|                          /fftw/3.3.4/include -L/muksoft/packages/fftw/ |
|                          3.3.4/lib64 -lfftw3 -I /muksoft/packages/netc |
|                          df/4_intel/include -L/muksoft/packages/netcdf |
|                          /4_intel/lib -lnetcdf -lnetcdff               |
|                                                                        |
| run identifier:          example_cbl                                   |
| activation string list:  d3#                                           |
#------------------------------------------------------------------------#

>>> everything o.k. (y/n) ?
The listed settings are determined by the palmrun options and the settings in the configuration file (here .palm.config.default). Entering n will abort palmrun. Entering y will start the execution of PALM, and a large number of informative messages will appear on the terminal:
*** PALMRUN will now continue to execute on this machine
*** creating executable and other sources for the local host
*** nothing to compile for this run
*** executable and other sources created
*** changed to temporary directory: /localdata/......./example_cbl.23751
*** providing INPUT-files:
----------------------------------------------------------------------------
>>> INPUT: /home/....../palm/current_version/JOBS/example_cbl/INPUT/example_cbl_p3d to PARIN
*** INFORMATIVE: some optional INPUT-files are not present
----------------------------------------------------------------------------
*** all INPUT-files provided
*** execution of INPUT-commands:
----------------------------------------------------------------------------
>>> ulimit -s unlimited
----------------------------------------------------------------------------
*** execution starts in directory "/localdata/....../example_cbl.23751"
----------------------------------------------------------------------------
*** running on: bora bora bora bora
*** execute command: "mpiexec -machinefile hostfile -n 4 palm"
... reading environment parameters from ENVPAR --- finished
... reading NAMELIST parameters from PARIN --- finished
... creating virtual PE grids + MPI derived data types --- finished
... checking parameters --- finished
... allocating arrays --- finished
... initializing with constant profiles --- finished
... initializing statistics, boundary conditions, etc. --- finished
... creating initial disturbances --- finished
... calling pressure solver --- finished
... initializing surface layer --- finished
--- leaving init_3d_model
--- starting timestep-sequence
[XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX] 0.0 left
--- finished time-stepping
... calculating cpu statistics --- finished
----------------------------------------------------------------------------
*** execution finished
If palmrun has reached this point (finished time stepping and execution finished) without any warning or error messages, the PALM simulation has finished successfully. The progress bar (the row of X characters in square brackets) allows you to estimate how much longer the run needs to finish.
Subsequent messages give information about post-processing and the copying of output data:
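If you keep the terminal output in a log file, the success criterion above can be checked by a script. The minimal sketch below assumes the output was saved, e.g. with palmrun -d example_cbl -h default -a "d3#" -X4 2>&1 | tee example_cbl.log (the y/n prompt still has to be answered). The function name run_finished is hypothetical; it only looks for the two messages shown above and does not detect warnings, so the log still needs to be read.

```shell
# Hypothetical helper, not part of PALM: succeed only if a saved palmrun
# log contains both the end-of-time-stepping and the end-of-execution
# messages. Warnings are NOT detected by this check.
run_finished() {
    grep -qF -- '--- finished time-stepping' "$1" &&
        grep -qF -- '*** execution finished' "$1"
}

# Usage: run_finished example_cbl.log && echo "run completed"
```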
*** post-processing: now executing "mpiexec -machinefile hostfile -n 1 combine_plot_fields.x" ...
*** combine_plot_fields ***
uncoupled run
NetCDF output enabled
no XY-section data available
NetCDF output enabled
no XZ-section data available
no YZ-section data available
no 3D-data file available
*** execution of OUTPUT-commands:
----------------------------------------------------------------------------
>>> [[ -f LIST_PROFIL_1D ]] && cat LIST_PROFIL_1D >> LIST_PROFILE
>>> [[ -f LIST_PROFIL ]] && cat LIST_PROFIL >> LIST_PROFILE
>>> [[ -f PARTICLE_INFOS/_0000 ]] && cat PARTICLE_INFOS/* >> PARTICLE_INFO
----------------------------------------------------------------------------
*** saving OUTPUT-files:
----------------------------------------------------------------------------
>>> OUTPUT: RUN_CONTROL to /home/raasch/palm/current_version/JOBS/example_cbl/MONITORING/example_cbl_rc
>>> OUTPUT: HEADER to /home/raasch/palm/current_version/JOBS/example_cbl/MONITORING/example_cbl_header
>>> OUTPUT: CPU_MEASURES to /home/raasch/palm/current_version/JOBS/example_cbl/MONITORING/example_cbl_cpu
>>> OUTPUT: DATA_1D_PR_NETCDF to /home/raasch/palm/current_version/JOBS/example_cbl/OUTPUT/example_cbl_pr.nc
>>> OUTPUT: DATA_1D_TS_NETCDF to /home/raasch/palm/current_version/JOBS/example_cbl/OUTPUT/example_cbl_ts.nc
>>> OUTPUT: DATA_2D_XY_NETCDF to /home/raasch/palm/current_version/JOBS/example_cbl/OUTPUT/example_cbl_xy.nc
>>> OUTPUT: DATA_2D_XZ_NETCDF to /home/raasch/palm/current_version/JOBS/example_cbl/OUTPUT/example_cbl_xz.nc
>>> OUTPUT: DATA_2D_XZ_AV_NETCDF to /home/raasch/palm/current_version/JOBS/example_cbl/OUTPUT/example_cbl_xz_av.nc
----------------------------------------------------------------------------
*** all OUTPUT-files saved
--> palmrun finished
You should find the output files at the locations listed in the terminal output. Most of PALM's output files are written in NetCDF format and are copied to subdirectory OUTPUT. Some general information files are written in ASCII format and are copied to subdirectory MONITORING. Please see here (add link) for a complete list of the output data/files that PALM offers. Section ..... describes how to steer PALM's output (e.g. output quantities, output intervals, etc.).
You are now at the point where you can define and run your own simulation set-up for the first time.
How to create a new simulation set-up
First, give your new set-up a name to be used as the run identifier, e.g. neutral. Create a new parameter file and set all parameters required for defining your set-up (number of grid points, grid spacing, etc.). You may find it more convenient to copy an existing parameter file and modify it, e.g. the one that came with the automatic installation:
cd ~/palm/current_version
mkdir -p JOBS/neutral/INPUT
cp JOBS/example_cbl/INPUT/example_cbl_p3d JOBS/neutral/INPUT/neutral_p3d
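If you create set-ups repeatedly, the three commands above can be wrapped in a small function. This is a minimal sketch; the name create_setup and its argument order are hypothetical, and unlike a plain cp it refuses to overwrite an existing parameter file.

```shell
# Hypothetical helper, not part of PALM: copy the parameter file of an
# existing run identifier to a new run identifier.
# Usage: create_setup <template run identifier> <new run identifier> [base directory]
create_setup() {
    template=$1
    new_id=$2
    base=${3:-"$HOME/palm/current_version"}
    src="$base/JOBS/$template/INPUT/${template}_p3d"
    dst="$base/JOBS/$new_id/INPUT/${new_id}_p3d"
    if [ ! -f "$src" ]; then
        echo "template parameter file not found: $src" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        echo "refusing to overwrite existing file: $dst" >&2
        return 1
    fi
    mkdir -p "$base/JOBS/$new_id/INPUT" && cp "$src" "$dst"
}

# Usage: create_setup example_cbl neutral
```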
Edit file neutral_p3d and add, delete, or change parameters. Run your new set-up with
palmrun -d neutral -h default -X4 -a "d3#"
If the run has finished successfully, results can be found in folders JOBS/neutral/MONITORING and JOBS/neutral/OUTPUT.
Batch mode
Large simulation set-ups usually cannot be run interactively, since the large amount of resources they require (memory as well as CPU time) is only provided through batch environments. palmrun supports two different ways to run PALM in batch mode. In both cases, it creates a batch job, i.e. a file containing directives for a queuing system plus the commands to run PALM, which is then submitted either on your local computer or to a remote computer. Running PALM in batch mode requires that you manually modify and extend your configuration file .palm.config...., and that a batch system (e.g. PBS or ...) is installed on the respective computer.
Running PALM in batch on a local computer
The local computer is the one on which the commands that you enter in a terminal session are executed. This might be your local PC/workstation, or a login node of a cluster system / computer center where you are logged in via ssh. Regardless of the computer, it is assumed that PALM has been successfully installed on that machine, either using the automatic installer or via manual installation.
For running PALM in batch mode, you need to include additional options in the palmrun command that specify the system resources requested by the job, and you need to modify your configuration file. A minimum set of additional palmrun options is
palmrun ....-b -h <host configuration> -t <cputime> -X <total number of cores> -T <MPI tasks per node> -q <queue> -m <memory per core>
where
- <host configuration> is the configuration file containing your batch mode settings
- <cputime> is the maximum CPU time (wall clock time) in seconds requested by the batch job
- <total number of cores> is the total number of CPU cores (not CPUs!) that shall be used for your run
- <MPI tasks per node> is the number of MPI tasks to be started on one node of the computer system. Typically, <MPI tasks per node> is chosen as the total number of CPU cores available on one node, e.g. if a node has two CPUs with 12 cores each, then <MPI tasks per node> = 24.
- <queue> is the name of the batch job queue that you would like to use. See your batch system documentation for the available queues, and keep in mind that each queue usually has specific limits on the resources that can be requested.
- <memory per core> is the memory in MByte requested by each core
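The rule of thumb for <MPI tasks per node> (number of CPUs per node times cores per CPU) can be read off the output of lscpu on Linux. The minimal sketch below multiplies the "Socket(s)" and "Core(s) per socket" fields, so hyperthreads are not counted. The function name tasks_per_node is hypothetical; run it on a compute node, since login nodes may have different hardware.

```shell
# Hypothetical helper, not part of palmrun: read lscpu-style text on stdin
# and print sockets x cores per socket as a candidate value for option -T.
tasks_per_node() {
    awk -F: '
        { key = $1; sub(/^[ \t]+/, "", key); val = $2 + 0 }
        key == "Socket(s)"          { sockets = val }
        key == "Core(s) per socket" { cores = val }
        END { print sockets * cores }'
}

# Usage on a compute node: lscpu | tasks_per_node
```

For the example in the list above (two CPUs with 12 cores each), this yields 24.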
Option -b is required to tell palmrun to create a batch job running on the local computer.
Before entering the above command, you need to add information to your configuration file. You may edit an existing file (e.g. .palm.config.default) or create a new one (e.g. by copying the default file to .palm.config.batch and then editing the new file). In general, you cannot use the same configuration file for both interactive and batch jobs, since they require different settings. Let's assume here that you have created a new file .palm.config.batch. Edit this file and add the batch directives required by your batch system. Keep in mind that there is a wide variety of batch systems and that many computer centers introduce their own special settings for these systems. Please read the documentation of your batch system carefully in order to figure out the required settings for your system (e.g. to run an MPI job on multiple cores). The following lines give a minimum example for the Portable Batch System (PBS).
BD:#!/bin/bash
BD:#PBS -N {{job_id}}
BD:#PBS -l walltime={{cpu_hours}}:{{cpu_minutes}}:{{cpu_seconds}}
BD:#PBS -l nodes={{nodes}}:ppn={{tasks_per_node}}
BD:#PBS -o {{job_protocol_file}}
BD:#PBS -j oe
BD:#PBS -q {{queue}}
Batch directive lines in the configuration file must start in the first column with the string BD:, immediately followed by the directive of the respective batch system (e.g. PBS directives must start with #PBS followed by a blank). Strings enclosed in double curly brackets {{...}} are palmrun variables and will be replaced by their respective values when palmrun creates the batch job file. A complete list of palmrun variables that can be used in batch directives is given in section batch_directives.
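To illustrate the mechanism, the following minimal sketch extracts the BD: lines, strips the prefix, and replaces the placeholders of the PBS example with the values of shell variables of the same name. This is NOT palmrun's own implementation; the function name make_job_header is hypothetical, only the placeholders shown in the example are handled, and values must not contain "|" or "&" because of the sed syntax used.

```shell
# Hypothetical sketch, not palmrun's code: print the BD: lines of a
# configuration file without the "BD:" prefix, with {{...}} placeholders
# replaced by the values of the shell variables of the same name.
make_job_header() {
    grep '^BD:' "$1" | sed \
        -e 's|^BD:||' \
        -e "s|{{job_id}}|$job_id|g" \
        -e "s|{{cpu_hours}}|$cpu_hours|g" \
        -e "s|{{cpu_minutes}}|$cpu_minutes|g" \
        -e "s|{{cpu_seconds}}|$cpu_seconds|g" \
        -e "s|{{nodes}}|$nodes|g" \
        -e "s|{{tasks_per_node}}|$tasks_per_node|g" \
        -e "s|{{job_protocol_file}}|$job_protocol_file|g" \
        -e "s|{{queue}}|$queue|g"
}

# Usage (example values):
#   job_id=neutral.12345 cpu_hours=1 cpu_minutes=30 cpu_seconds=0 \
#   nodes=4 tasks_per_node=12 job_protocol_file=/tmp/batch_neutral queue=medium
#   make_job_header .palm.config.batch
```

Because grep matches "^BD:" literally, BDT: lines used for remote jobs are not picked up.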
In addition to the batch directives, the configuration file requires further information to be set for using the batch system, which is done by adding / modifying variable assignments in the general form
%<variable name> <value>
where <variable name> is the name of the Unix environment variable in the palmrun script and <value> is the value to be assigned to this variable. Each assignment must start with a %. A minimum set of variables to be added / modified is:
# to be added
%submit_command /opt/moab/default/bin/msub -E
%defaultqueue small
%memory 1500
# to be modified
%local_jobcatalog /home/username/job_queue
%fast_io_catalog /gfs2/work
%execute_command aprun -n {{mpi_tasks}} -N {{tasks_per_node}} ./palm
Given values are just examples! The automatic installer may have already included these variable settings as comment lines (starting with #). Then just remove the # and provide a proper value.
The meaning of these variables is as follows:
- submit_command: Batch system specific command to submit batch jobs plus options which may be required on your system. Please give the full path to the submit command. See your batch system documentation for any details.
- defaultqueue: Name of the queue to be used if the palmrun option -q is omitted. See your batch system documentation for queue names available on your system.
- memory: Memory in MByte requested by each core. If given, this value is used as the default in case that palmrun option -m has not been set.
- local_jobcatalog: Name of the folder where your job protocol file is put after the batch job has finished. Batch queuing systems usually create a protocol file for each batch job, which contains relevant information about all activities performed within the job.
- fast_io_catalog: Folder to be used by palmrun/PALM for temporary I/O files. Since PALM set-ups with a large number of grid points may create very large files, data should be written to a file system with very fast hard discs or SSDs in order to get good I/O performance. Computer centers typically provide such file systems, and you should set your fast_io_catalog to such a file system.
- execute_command: Command to execute PALM (i.e. the executable that has been created by the compiler). It depends on the MPI library and the operating system that is used. See your MPI documentation or information provided by your computing center. Strings {{mpi_tasks}} and {{tasks_per_node}} will be replaced by palmrun depending on palmrun options -X and -T.
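When checking which values a configuration file actually contains, a small lookup helper can be handy. The minimal sketch below prints the value of one %<variable name> <value> assignment; lines starting with # (including commented-out assignments) are ignored because their first field does not start with %. The function name get_config_value is hypothetical and this is not palmrun's own parser.

```shell
# Hypothetical helper, not palmrun's parser: print the value assigned to
# <name> in a configuration file using the "%<name> <value>" form.
# Usage: get_config_value <configuration file> <name>
get_config_value() {
    awk -v name="$2" '
        $1 == ("%" name) {
            sub(/^%[^ \t]+[ \t]+/, "")   # remove "%name" and the blanks after it
            print
            exit
        }' "$1"
}

# Usage: get_config_value .palm.config.batch execute_command
```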
You can find more details in the complete description of the configuration file.
Now you may start your first batch job by entering
palmrun -b -d neutral -h batch -t 5400 -m 1500 -X 48 -T 12 -q medium -a "d3#"
Based on these arguments, palmrun will set the batch directive variables described above to:
- {{job_id}} = neutral.#####
  where ##### is a five-digit random number which is newly created for each job. The job_id is used for different purposes, e.g. it defines the name under which you can find the job in the queuing system.
- {{cpu_hours}} = 1
  calculated from option -t
- {{cpu_minutes}} = 30
  calculated from option -t
- {{cpu_seconds}} = 0
  calculated from option -t
- {{mpi_tasks}} = 48
  as given by option -X
- {{tasks_per_node}} = 12
  as given by option -T
- {{nodes}} = 4
  calculated from -X / -T. If -X is not a multiple of -T, nodes is incremented by one, e.g. -X 49 -T 12 gives nodes = 5.
- {{queue}} = medium
  as given by option -q
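The arithmetic behind these values can be reproduced with a few lines of shell. The minimal sketch below splits -t (in seconds) into hours, minutes and seconds and computes the number of nodes as -X / -T rounded up. It is not palmrun's own code: the function name batch_values is hypothetical, and the random job_id suffix (10000 to 99999, drawn with awk) is an assumption, since this text does not say how palmrun generates it.

```shell
# Hypothetical sketch of the arithmetic listed above, not palmrun's code.
# Arguments: <run identifier> <-t in seconds> <-X> <-T>
# Sets job_id, cpu_hours, cpu_minutes, cpu_seconds and nodes, and prints them.
batch_values() {
    run_id=$1; t=$2; x=$3; tpn=$4
    cpu_hours=$(( t / 3600 ))
    cpu_minutes=$(( (t % 3600) / 60 ))
    cpu_seconds=$(( t % 60 ))
    # integer division rounded up: one extra node if -X is not a multiple of -T
    nodes=$(( (x + tpn - 1) / tpn ))
    # assumed five-digit random suffix in the range 10000..99999
    job_id="$run_id.$(awk 'BEGIN { srand(); print 10000 + int(rand() * 90000) }')"
    echo "job_id=$job_id cpu_hours=$cpu_hours cpu_minutes=$cpu_minutes" \
         "cpu_seconds=$cpu_seconds nodes=$nodes"
}
```

For the command above, batch_values neutral 5400 48 12 gives cpu_hours=1, cpu_minutes=30, cpu_seconds=0 and nodes=4; with -X 49 it gives nodes=5.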
When you enter the above command for the first time, palmrun will call the script palmbuild to re-compile the PALM code. The compiled code will be put into folder $HOME/palm/current_version/MAKE_DEPOSITORY_batch. Re-compilation is required since palmrun expects a separate make depository for each configuration file (because the configuration files may contain different compiler settings).
After you confirm the palmrun settings by entering y, the following information will be output to the terminal:
>>> everything o.k. (y/n) ? y
*** batch-job will be created and submitted
*** creating executable and other sources
*** nothing to compile for this run
*** executable and other sources created
*** input files have been copied
*** submit the job (output of submit command, e.g. the job-id, may follow)
<<<submit message from batch system>>>
--> palmrun finished
Before the batch job is finally submitted, palmrun creates a folder named SOURCES_FOR_RUN_<run_identifier>, located in the fast_io_catalog, which contains various files required for the run (e.g. the PALM executable, PALM's source code and object files, copies of the configuration files, etc.). The messages *** executable and other sources created and *** input files have been copied tell you that this folder has been created. *** nothing to compile for this run means that no user interface needs to be compiled. After the job submission, the batch system usually prints a message (<<<submit message from batch system>>>) which tells you the id under which you can find your job in the queuing system (e.g. if you would like to cancel it). The job is now queued and you have to wait until it is finished. The main task of the job is to execute the palmrun command that you have entered once more, but now on the compute nodes of your system. A job protocol file named <host identifier>_<run identifier>, as given with palmrun options -h and -d (here it will be batch_neutral), will be put into the folder that you have set with variable local_jobcatalog in your configuration file (.palm.config.batch). Check the contents of this file carefully. Besides some additional information, it mainly contains the output of the palmrun command as you get it during interactive execution, e.g. information about where the output files have been copied to.
Typically, batch systems allow jobs to run only for a limited time, e.g. 12 hours. See chapter job chains and restart jobs on how to use palmrun to create so-called job chains in order to carry out simulations that exceed the time limit for single jobs.
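Since the protocol file name is composed from the host and run identifiers, you can look it up directly once the job has finished. The minimal sketch below builds the path in your local_jobcatalog and fails with a message if the file is not there yet. The function name protocol_file is hypothetical and not part of palmrun.

```shell
# Hypothetical helper, not part of palmrun: print the path of the job
# protocol file <host identifier>_<run identifier> in local_jobcatalog,
# or fail with a message if the job has not delivered it yet.
# Arguments: <local_jobcatalog> <host identifier> <run identifier>
protocol_file() {
    f="$1/${2}_${3}"
    if [ -f "$f" ]; then
        echo "$f"
    else
        echo "no job protocol yet: $f" >&2
        return 1
    fi
}

# Usage (example values from this section):
#   p=$(protocol_file /home/username/job_queue batch neutral) && less "$p"
```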
Running PALM in batch on a remote computer
You can use the palmrun command on your local computer (e.g. your local PC or workstation) and let it submit a batch job to a remote computer at any place in the world. palmrun copies required input files from your local computer to the remote machine and transfers output files back to your local machine, depending on the settings in the .palm.iofiles file. The job protocol file will also be automatically copied back to your local computer.
If you would like to use this palmrun feature, you need additional/special settings in the configuration file. Furthermore, you need to pre-compile the PALM code for the remote machine using the palmbuild command. The automatic PALM installer cannot be used to install PALM on that machine, so you need to do most of the settings manually.
Furthermore, passwordless ssh/scp access is required from the local computer to the remote computer, as well as from the remote computer to the local computer. In remote mode, palmrun and palmbuild make heavy use of ssh and scp commands, and if you have not established passwordless access, you would need to enter your password several times before the batch job is finally submitted. Moreover, the job protocol file and any output files cannot be transferred back to your local computer, because there is no connection to the job that could be used to provide passwords for these transfers (and even if there were, your job may require your input during the night while you are sleeping).
Now, let's start with the configuration file settings for remote batch jobs. For this, it is convenient to create a new configuration file based on the one you already used locally, e.g. by
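One common way to set up such access uses the standard OpenSSH tools ssh-keygen and ssh-copy-id; afterwards, ssh with BatchMode=yes fails instead of prompting, which makes a convenient test. The minimal sketch below assumes OpenSSH is installed; the function names are hypothetical and not part of PALM, and the key is created without a passphrase, so check whether your computing center's policy allows this. Run setup_key_login on the local host for the remote host, and again on the remote host for the local host, since access is needed in both directions.

```shell
# Hypothetical helpers, not part of PALM; they only use standard OpenSSH
# commands. setup_key_login creates a key without passphrase (if none
# exists yet) and installs it for the given user@host.
setup_key_login() {
    mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
    if [ ! -f "$HOME/.ssh/id_ed25519" ]; then
        ssh-keygen -t ed25519 -N "" -f "$HOME/.ssh/id_ed25519" || return 1
    fi
    ssh-copy-id -i "$HOME/.ssh/id_ed25519.pub" "$1"
}

# Succeeds only if a login works without any password prompt:
# BatchMode=yes makes ssh fail instead of asking for a password.
check_passwordless_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$1" true
}

# Usage (example values):
#   setup_key_login your_username_on_the_remote_system@123.45.6.7
#   check_passwordless_ssh your_username_on_the_remote_system@123.45.6.7 && echo ok
```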
cp .palm.config.batch .palm.config.<remote host identifier>
where <remote host identifier> can be any string to identify your remote host. Edit this file and set at minimum the following additional variables:
%remote_jobcatalog /home/username/job_queue
%remote_ip 123.45.6.7
%remote_username your_username_on_the_remote_system
After the batch directives (lines that start with BD:), add another set of batch directives starting with BDT:. These are required to generate a small additional batch job that does nothing more than transfer the job protocol back to your local system. Since the job protocol file generated by the main job (which is started by palmrun) is not available before the end of that job, the main job has to start another small job at its end, whose only task is to send the job protocol back to the local host. Computing centers normally have special queues for this kind of small job, and you should request the job resources accordingly. Here is an example for a CRAY-XC40 system:
# BATCH-directives for batch jobs used to send back the jobfile from a remote to a local host
BDT:#!/bin/bash
BDT:#PBS -N job_protocol_transfer
BDT:#PBS -l walltime=00:30:00
BDT:#PBS -l nodes=1:ppn=1
BDT:#PBS -o {{job_transfer_protocol_file}}
BDT:#PBS -j oe
BDT:#PBS -q dataq
Only a few resources are requested (30 minutes of wall clock time and one core), and the job runs in the special queue dataq. You may need to adjust these settings for your batch system.
Additional settings for batch jobs on remote hosts can be found in the complete description of the configuration file.
After setting up the configuration file and before calling palmrun, you need to call the palmbuild command to generate the PALM executable for the remote host:
palmbuild -h <remote host identifier>
Keep in mind that the configuration file .palm.config.<remote host identifier> requires correct settings for your remote computer (compiler name, compiler options, include and library paths, etc.). If you forget to call palmbuild, palmrun will ask whether it should do this for you.
If palmbuild succeeded, you can enter the palmrun command, like
palmrun -d neutral -h <remote host identifier> ......
After you confirm the palmrun settings by entering y, similar information as for local batch jobs will be output to the terminal. palmrun finally terminates with the message --> palmrun finished. The batch job is now queued on the remote system. After the job has finished, the job protocol will be transferred back to your local computer and put into the folder defined by local_jobcatalog. If this file does not appear (e.g. because the transfer failed), you may find the protocol file on the remote host in the folder defined by remote_jobcatalog. As for batch jobs running on local computers, check the contents of this file carefully. Besides some additional information, it mainly contains the output of the palmrun command as you get it during interactive execution; in particular, it tells you where to find the output files on your local computer.
The configuration file .palm.iofiles offers special controls for copying INPUT/OUTPUT files, since large PALM set-ups (those using a large number of grid points) can produce extremely large output files, which would take a long time to transfer to your local system and whose sizes might exceed the capacity of your local discs. See chapter INPUT/OUTPUT files, which explains how to control the copying of INPUT/OUTPUT files.
Attention: The following text is for experienced PALM users who switch from the old mrun / mbuild scripts to the new scripts palmrun / palmbuild. It is not up to date and will be removed at a later time.
Configuring and running PALM with palmbuild and palmrun
Changes compared to mrun/build
- The new scripts will run on any kind of Linux / Unix system without requiring any adjustments. All settings are controlled via two configuration files.
- mbuild is replaced by palmbuild, and mrun is replaced by palmrun. The old script subjob is not used any more (submitting jobs is now part of palmrun).
- Setting the environment variable PALM_BIN in shell-profile files (e.g. .bashrc) is not required any more.
- The old configuration file .mrun.config has been split into two files .palm.config.<configuration_identifier> and .palm.iofiles, where <configuration_identifier> (short <ci>) is an arbitrary string that you can define.
"Configuration" means a setting for a specific computer with a specific compiler, compiler options, libraries, etc.
If you would like to run PALM with different configurations, e.g. one with debug options switched on and one with high optimization, you need to create a separate file for each configuration, e.g. .palm.config.optimized and .palm.config.debug. This replaces the old block structure in .mrun.config. The configuration file to be used is defined by the palmrun or palmbuild option -h, e.g. palmrun ... -h optimized will use .palm.config.optimized. You can find examples of .palm.config.<...> files in your PALM copy under .../trunk/SCRIPTS.
You will need only one file .palm.iofiles which contains the file connection statements to be used for all configurations.
The file attributes (second column in the file connection statements) have been partly changed. The second attribute, which was either loc, locopt or job, has been completely removed. Optional input files now require inopt as first attribute. Input files to be sent to the remote host require tr as second attribute (instead of job). fl and flpe must be changed to ln and lnpe, respectively.
For output files, a wildcard * can be given as file activation string in the third column. In such a case, existing local output files will always be copied to their permanent position. No warning will be given if they do not exist.
Wildcards (*) are allowed for local names of output files (e.g. BINOUT*) and file extensions of input files (e.g. _p3d*). Using wildcards, only one file connection statement is required, e.g. for nested runs which require different input files for each domain (_p3d, _p3d_N01, _p3d_N02, etc.) or which generate different output files (e.g. BINOUT, BINOUT_N01, etc.). The additional extensions that are identified from the existing files (e.g. _N01, _N02) will be automatically added to the local filename (in case of input files) or to the file extension (in case of output files).
The utility program interpret_config has been removed. The configuration files are now directly interpreted by the shell scripts.
- Only one call of palmbuild is required to compile both the utilities and the PALM source code (there is no option -u any more). The compiled routines (object files and executables) are put into folder MAKE_DEPOSITORY_<configuration_identifier>, where <configuration_identifier> equals the string given with palmbuild option -h.
- palmrun no longer compiles at the beginning of a batch job. The PALM executable for the batch job (or for the interactive session) is created as part of the palmrun call that you have manually entered at your terminal, and it is created before the batch job is submitted. The executable is put into the folder SOURCES_FOR_RUN_<run_identifier>, where <run_identifier> is the string provided with palmrun option -d. This folder is now located in the folder set with variable fast_io_catalog (see below for fast_io_catalog). If you do not use a user interface, palmrun will not compile at all and will take the executable from folder MAKE_DEPOSITORY_<configuration_identifier> that has been generated with your last call of palmbuild. If palmrun cannot find the folder MAKE_DEPOSITORY_<configuration_identifier>, it will internally call palmbuild in order to generate it. If palmrun finds a folder SOURCES_FOR_RUN_<run_identifier> that has been generated by a previous call of palmrun, it will ask you whether executables from that folder shall be used. This way, you can avoid re-compiling your user interface with each call of palmrun. Automatically generated restart runs will always use executables from SOURCES_FOR_RUN_<run_identifier>.
You may have to remove folders SOURCES_FOR_RUN_... manually from time to time, because they are not deleted automatically at the end of a job (or the last job of a restart job chain).
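The rules for selecting the executable can be summarized as a small decision function. This is a hypothetical bash sketch that only prints the action palmrun is described to take; the function name and its arguments are illustrative, the interactive question is replaced by a y/n argument, and the location of MAKE_DEPOSITORY_<configuration_identifier> inside base_directory is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the executable selection described above.
# It prints the action only; nothing is compiled or executed.
# Assumption: MAKE_DEPOSITORY_<ci> is located in base_directory.
choose_executable() {
  local ci=$1          # configuration identifier (palmrun -h)
  local run_id=$2      # run identifier (palmrun -d)
  local base_dir=$3    # base_directory from .palm.config.<ci>
  local fast_io=$4     # fast_io_catalog from .palm.config.<ci>
  local user_code=$5   # "yes" if a user interface is used
  local reuse=$6       # answer to the reuse question: "y" or "n"
  local make_dir="$base_dir/MAKE_DEPOSITORY_$ci"
  local run_dir="$fast_io/SOURCES_FOR_RUN_$run_id"

  if [ "$user_code" != yes ]; then
    # No user interface: take the executable from the last palmbuild call,
    # calling palmbuild first if that folder does not exist yet.
    if [ -d "$make_dir" ]; then
      echo "use $make_dir"
    else
      echo "palmbuild -h $ci, then use $make_dir"
    fi
  elif [ -d "$run_dir" ] && [ "$reuse" = y ]; then
    # Folder left by a previous palmrun call: reuse it if the user agrees.
    echo "use $run_dir"
  else
    # Compile the user interface before any batch job is submitted.
    echo "compile into $run_dir"
  fi
}
```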
- File activation strings are now given with option -a "d3# ..." instead of -r "d3# ...".
- In case of automatic restart runs, hashes ("#") in the file activation strings are now replaced by character "r" instead of character "f".
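The replacement rule for restart runs can be sketched in one line of bash. The function name restart_activation_strings is hypothetical and only mirrors the rule stated above; it is not taken from palmrun.

```bash
#!/usr/bin/env bash
# Hypothetical helper: for automatic restart runs, every "#" in the
# file activation strings is replaced by "r".
restart_activation_strings() {
  printf '%s\n' "$1" | tr '#' 'r'
}
```

Under this rule, an initial run started with -a "d3# restart" would be continued by restart runs using the activation strings "d3r restart".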
- The .palm.config.<ci> file no longer contains blocks. Several variable names have been changed (e.g. compiler_options instead of fopts) and new variables have been introduced (e.g. execute_command to give the command for starting the executable). Colons (:) must no longer be used to separate e.g. compiler options. Here is an example (some lines are truncated, as indicated by ....):
#$Id$
#column 1          column 2
#name of variable  value of variable (~ must not be used, except for base_data)
#------------------------------------------------------------------------------
%base_data          ~/palm/current_version/JOBS
%base_directory     $HOME/palm/current_version
%source_path        $HOME/palm/current_version/trunk/SOURCE
%user_source_path   $base_directory/JOBS/$fname/USER_CODE
%fast_io_catalog    /localdata/your_linux_username
#
%local_ip           111.11.111.111
%local_username     your_linux_username
#
%compiler_name      mpif90
%compiler_name_ser  ifort
%cpp_options        -cpp -D__parallel -DMPI_REAL=MPI_DOUBLE_PRECISION -DMPI_2REAL=MPI_2DOUBLE_PRECISION -D__fftw -D__netcdf
%make_options       -j 4
%compiler_options   -openmp -fpe0 -O3 -xHost -fp-model source -ftz -fno-alias -ip -nbs -I /muksoft/packages/fftw/3.3.4/include -L/muksoft/....
%linker_options     -openmp -fpe0 -O3 -xHost -fp-model source -ftz -fno-alias -ip -nbs -I /muksoft/packages/fftw/3.3.4/include -L/muksoft/....
%hostfile           auto
%execute_command    mpiexec -machinefile hostfile -n {{mpi_tasks}} ./palm
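Since the configuration file is now interpreted directly by the shell scripts, its "%name value" format is easy to read with plain bash. The following is a hypothetical sketch, not palmrun's parser: the function read_palm_config and the associative array config are illustrative names, values are stored verbatim (references such as $HOME are not expanded), and BD:/BDT:/IC: lines are simply skipped. It requires bash 4 or newer.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: read "%name value" lines of a .palm.config.<ci> file
# into an associative array. Comment lines ("#...") and all other lines
# are ignored. Values are stored verbatim, without variable expansion.
declare -A config

read_palm_config() {
  local file=$1 line name value
  while IFS= read -r line || [ -n "$line" ]; do
    [[ $line == %* ]] || continue               # only "%name value" lines
    line=${line#%}                              # drop the leading "%"
    name=${line%%[[:space:]]*}                  # first word: variable name
    value=${line#"$name"}                       # rest of the line
    value=${value#"${value%%[![:space:]]*}"}    # strip leading blanks
    config[$name]=$value
  done < "$file"
}
```

After read_palm_config .palm.config.default, a value such as ${config[execute_command]} would hold the command string including its {{...}} wildcards.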
- Some further comments concerning specific variables:
- fast_io_catalog replaces the old variables tmp_user_catalog and tmp_data_catalog. It should be a folder on a file system with fast discs, as typically provided on large computer systems for temporary I/O, e.g. something like /work/.... The temporary working catalog created by palmrun is placed in this folder, and your restart data should be put in this folder too. The default .palm.iofiles uses fast_io_catalog for the restart files.
- For cpp_options, you now have to give ALL required switches, especially -D__parallel to use the parallel version of PALM, which was previously set implicitly by mrun-option -K parallel. The -K option has been removed.
- The compiler and linker options must now include ALL include and library paths for the libraries that you intend to use (e.g. MPI, NetCDF, FFTW), unless they are set automatically by a module environment (as e.g. on Cray systems). Old variables like netcdf_inc or netcdf_lib have been removed from the configuration file.
- execute_command is required to define the command that executes PALM. It depends on the MPI library that you are using. The wildcard {{mpi_tasks}} is replaced by the value provided with palmrun-option -X. A further wildcard that can be used is {{tasks_per_node}}, which is replaced by the value provided with palmrun-option -T.
- The variable write_binary (formerly used to switch on the output of restart data) has been removed from the configuration file. Output of restart data is now switched on with the activation string "restart", i.e. palmrun ..... -a "... restart".
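The wildcard substitution in execute_command can be sketched with bash pattern replacement. The function expand_execute_command is hypothetical; it only shows the effect described above, with the values of palmrun-options -X and -T passed as arguments.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: replace the {{mpi_tasks}} and {{tasks_per_node}}
# wildcards in execute_command by the values of palmrun-options -X and -T.
expand_execute_command() {
  local cmd=$1 mpi_tasks=$2 tasks_per_node=$3
  local w1='{{mpi_tasks}}' w2='{{tasks_per_node}}'
  cmd=${cmd//"$w1"/$mpi_tasks}         # quoted pattern: match the wildcard literally
  cmd=${cmd//"$w2"/$tasks_per_node}
  printf '%s\n' "$cmd"
}
```

For the aprun command of the Cray example, -X 240 -T 24 would give "aprun -n 240 -N 24 palm".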
- For running PALM on a remote host in batch, additional settings are required in the configuration file. The following is an example for using the Cray-XC40 of HLRN as a remote host:
#column 1          column 2
#name of variable  value of variable (~ must not be used)
#----------------------------------------------------------------------------
%base_data          ~/palm/current_version/JOBS
%base_directory     $HOME/palm/current_version
%source_path        $HOME/palm/current_version/trunk/SOURCE
%user_source_path   $base_directory/JOBS/$fname/USER_CODE
%fast_io_catalog    /gfs2/work/niksiraa
%local_jobcatalog   /home/raasch/job_queue
%remote_jobcatalog  /home/h/niksiraa/job_queue
#
%local_ip           130.75.105.103
%local_username     raasch
%remote_ip          130.75.4.1
%remote_username    niksiraa
%remote_loginnode   hlogin1
%ssh_key            id_rsa_hlrn
%defaultqueue       mpp2testq
%submit_command     /opt/moab/default/bin/msub -E
#
%compiler_name      ftn
%compiler_name_ser  ftn
%cpp_options        -e Z -DMPI_REAL=MPI_DOUBLE_PRECISION -DMPI_2REAL=MPI_2DOUBLE_PRECISION -D__parallel -D__netcdf -D__netcdf4 -D__netcdf4_parallel -D__fftw
%make_options       -j 4
%compiler_options   -em -O3 -hnoomp -hnoacc -hfp3 -hdynamic
%linker_options     -em -O3 -hnoomp -hnoacc -hfp3 -hdynamic -dynamic
%execute_command    aprun -n {{mpi_tasks}} -N {{tasks_per_node}} palm
%memory             2300
%module_commands    module load fftw cray-hdf5-parallel cray-netcdf-hdf5parallel
%login_init_cmd     module switch craype-ivybridge craype-haswell
#
# BATCH-directives to be used for batch jobs. If $-characters are required, hide them with \\\
BD:#!/bin/bash
BD:#PBS -A {{project_account}}
BD:#PBS -N {{job_id}}
BD:#PBS -l walltime={{cpu_hours}}:{{cpu_minutes}}:{{cpu_seconds}}
BD:#PBS -l nodes={{nodes}}:ppn={{tasks_per_node}}
BD:#PBS -o {{job_protocol_file}}
BD:#PBS -j oe
BD:#PBS -q {{queue}}
#
# BATCH-directives for batch jobs used to send back the jobfile from a remote to a local host
BDT:#!/bin/bash
BDT:#PBS -A {{project_account}}
BDT:#PBS -N job_protocol_transfer
BDT:#PBS -l walltime=00:30:00
BDT:#PBS -l nodes=1:ppn=1
BDT:#PBS -o {{job_transfer_protocol_file}}
BDT:#PBS -j oe
BDT:#PBS -q dataq
#
#----------------------------------------------------------------------------
# INPUT-commands, executed before running PALM - lines must start with "IC:"
#----------------------------------------------------------------------------
IC:export ATP_ENABLED=1
IC:export MPICH_GNI_BTE_MULTI_CHANNEL=disabled
IC:ulimit -s unlimited
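The prefixed lines in such a configuration file (BD:, BDT:, IC:) can be separated from the %-variables with a few lines of bash. The function extract_prefixed_lines is hypothetical and only sketches the idea of collecting lines by prefix and stripping it, e.g. to assemble the batch-directive header of a job. How palmrun actually builds the job file, including the handling of \\\ for $-characters, is not shown.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: print all lines of a configuration file that start
# with the given prefix (e.g. "BD:", "BDT:" or "IC:"), with the prefix removed.
# Note that "BD:" does not match "BDT:" lines, since the third character differs.
extract_prefixed_lines() {
  local file=$1 prefix=$2 line
  while IFS= read -r line || [ -n "$line" ]; do
    if [[ $line == "$prefix"* ]]; then
      printf '%s\n' "${line#"$prefix"}"
    fi
  done < "$file"
}
```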
- Some additional settings are required here:
- fast_io_catalog must be a folder on the remote host.
- IP-addresses and user names have to be given for the local AND the remote host. Usually, the remote host IP-address is the one for the login-node.
- remote_loginnode: on many of the large computer systems, the compute nodes do not allow for ssh- or scp-commands in order to transfer data to the local host or to start restart jobs. If remote_loginnode is set, palmrun tries to start these commands via the login-node. Attention: In most cases, the systems do not accept an IP-address. You have to give the mnemonic name of the login-node.
- ssh_key: here you can give the filename of a special ssh-key for using ssh / scp without password. The key must be in folder ~/.ssh. This is a special setting for the HLRN-system and should not be required on other systems.
- defaultqueue: if you do not set the queue via palmrun-option -q, this queue is taken as the default queue. Unlike mrun, palmrun no longer checks for valid queue names.
- submit_command: command for submitting a job to a batch system
- module_commands: loading of necessary modules for running PALM
- login_init_cmd: commands to be carried out directly after login to the remote computer
- Lines starting with BD: Here you have to give the batch directives required by your batch system. palmrun replaces wildcards in the following way:
- {{project_account}} : To be used if you want to run the job under a specific account number. It is replaced by the value provided with palmrun-option -A.
- {{job_id}} : The job's name. It will be formed by the run identifier provided with palmrun-option -d and a 5-digit random number, e.g. -d example_cbl may give example_cbl.12345.
- {{cpu_hours}}, {{cpu_minutes}}, {{cpu_seconds}} : Replaced based on the total CPU time in seconds provided with palmrun-option -t, e.g. -t 3666 gives {{cpu_hours}}=1, {{cpu_minutes}}=1, {{cpu_seconds}}=6.
- {{nodes}} : The number of nodes requested by the job. It is replaced by the result of totalcores / ( noMPIt * noOpenMPt ), where totalcores is the total number of cores as requested with palmrun-option -X, noMPIt is the number of MPI tasks to be started on each node, as given by palmrun-option -T, and noOpenMPt is the number of OpenMP threads to be started per MPI task, as given by palmrun-option -O.
- {{tasks_per_node}} : The number of MPI tasks to be started on each node, as given by palmrun-option -T.
- {{job_protocol_file}} : Name of the job protocol file. The filename for jobs running on a remote host is created from palmrun-options -h and -d, e.g. for palmrun -d example_cbl -h crayh ... the job protocol file name will be crayh_example_cbl. For jobs running on a local host, the name part from option -h is omitted.
- {{queue}} : The name of the queue to which the job shall be submitted. Will be replaced by the value provided with palmrun-option -q, or, if -q is omitted, by the value of variable defaultqueue (see further above).
- {{previous_job}} : The name of a previous job as given by palmrun-option -W. Can be used to set job dependencies.
- Lines starting with BDT: Here you have to give special batch directives for a small job that sends the job protocol file from the remote host back to your local host (so these lines are only required if you run batch jobs on a remote host). Since the job protocol file generated by the main job (which is started by palmrun) is not available before the end of that job, the main job has to start another small job at its end, whose only task is to send the job protocol back to the local host. Computing centers normally provide special queues for this kind of small job, and you should request the job resources accordingly.
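The values that the BD: wildcards are described to receive can be reproduced with bash arithmetic. All function names below are hypothetical, and palmrun's actual formatting (e.g. zero-padding of the walltime fields or the random-number generator for the job name) may differ. The node count assumes that -X is a multiple of -T times -O; otherwise the integer division truncates.

```bash
#!/usr/bin/env bash
# Hypothetical sketches of values inserted for some BD: wildcards.

# {{cpu_hours}}, {{cpu_minutes}}, {{cpu_seconds}} from palmrun-option -t (seconds)
split_cpu_time() {
  local t=$1
  echo "$(( t / 3600 )) $(( t % 3600 / 60 )) $(( t % 60 ))"
}

# {{nodes}} = totalcores / ( noMPIt * noOpenMPt ), i.e. -X / ( -T * -O ).
# Assumes -X is a multiple of -T * -O; otherwise the result is truncated.
compute_nodes() {
  local total_cores=$1 tasks_per_node=$2 threads_per_task=${3:-1}
  echo $(( total_cores / ( tasks_per_node * threads_per_task ) ))
}

# {{job_id}}: run identifier (-d) plus a 5-digit random number
make_job_id() {
  printf '%s.%d\n' "$1" $(( 10000 + RANDOM % 90000 ))
}

# {{job_protocol_file}}: <host>_<run_id> on a remote host, <run_id> locally
job_protocol_file() {
  local run_id=$1 host=$2 remote=$3
  if [ "$remote" = yes ]; then
    echo "${host}_${run_id}"
  else
    echo "$run_id"
  fi
}
```

For example, split_cpu_time 3666 reproduces the 1, 1, 6 split given above, and compute_nodes 240 24 yields 10 nodes for 240 cores with 24 MPI tasks per node.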