= The PALM run script =

The main script to execute PALM is called {{{palmrun}}}. This chapter describes the actions carried out by {{{palmrun}}} and gives a complete list and description of its available [#options options]. PALM can run in different modes:\\
 [#interactive Interactive mode]:: PALM executes (almost) immediately within your terminal session after entering the {{{palmrun}}} command.
 [#batch Batch mode]:: The PALM job is submitted by {{{palmrun}}} to a queuing/batch system (e.g. PBS, SLURM, ...), where it is scheduled for execution. A batch system is a must-have on high-performance computers, and a nice-to-have for computers that are shared among a larger number of users.

The handling of PALM differs between interactive and batch mode, and it varies slightly depending on whether the PALM job is submitted to the
 [#batch_local Local host]:: The computer that you are currently sitting at or are logged in to via your terminal (ssh).
 [#batch_remote Remote host]:: Any computer with a batch system that you have ssh access to, but are not logged in to at the moment. The remote host becomes your local host as soon as you log in to it via ssh.

== [=#interactive Interactive mode] ==

The following instructions assume that the [wiki:doc/install/automatic automatic installer] has installed PALM without any problems. You should now be able to start the first PALM simulation yourself. Please enter
{{{
palmrun -r example_cbl -c default -a "d3#" -X 4
}}}
You can follow the progress of the simulation on the terminal, where a lot of informative messages will be output. You can also stop the simulation at any time by typing {{{Ctrl+C}}}. Some general settings will be listed on the terminal and the user is prompted for confirmation:
{{{
*** palmrun 1.0 Rev: 3151 $ will be executed. Please wait ...
    Reading the configuration file...
    Reading the I/O files...
*** INFORMATIVE: additional source code directory
    "/home//palm/current_version/JOBS/example_cbl/USER_CODE"
    does not exist or is not a directory.
    No source code will be used from this directory!

#------------------------------------------------------------------------#
| palmrun 1.0 Rev: 3151 $                  Tue Aug 28 09:49:44 CEST 2018 |
| PALM code Rev: 3209                                                    |
|                                                                        |
| called on:                                                             |
| config. identifier:      imuk (execute on IP: 111.11.111.111)          |
| running in:              interactive run mode                          |
| number of cores:         4                                             |
| tasks per node:          4 (number of nodes: 1)                        |
|                                                                        |
| cpp directives:          -cpp -D__parallel ...                         |
| compiler options:        -fpe0 -O3 -xHost -fp-model source ...         |
| linker options:          -fpe0 -O3 -xHost -fp-model source ...         |
|                                                                        |
| run identifier:          example_cbl                                   |
| activation string list:  d3#                                           |
#------------------------------------------------------------------------#

>>> everything o.k. (y/n) ?
}}}
The listed settings are determined by the {{{palmrun}}} options and the settings in the [wiki:doc/app/palm_config configuration file] (here {{{.palm.config.default}}}).\\
 Entering {{{n}}}:: aborts {{{palmrun}}}\\
 Entering {{{y}}}:: starts execution of PALM, and some more informative messages will appear on the terminal.\\
{{{
*** PALMRUN will now continue to execute on this machine
*** creating executable and other sources for the local host
*** nothing to compile for this run
*** executable and other sources created
*** changed to temporary directory: /localdata/......./example_cbl.23751
*** providing INPUT-files:
----------------------------------------------------------------------------
>>> INPUT: /home/....../palm/current_version/JOBS/example_cbl/INPUT/example_cbl_p3d to PARIN
*** INFORMATIVE: some optional INPUT-files are not present
----------------------------------------------------------------------------
*** all INPUT-files provided
*** execution of INPUT-commands:
----------------------------------------------------------------------------
>>> ulimit -s unlimited
----------------------------------------------------------------------------
*** execution starts in directory
    "/localdata/....../example_cbl.23751"
----------------------------------------------------------------------------
*** running on: hostname hostname hostname hostname
*** execute command:
    "mpiexec -machinefile hostfile -n 4 palm"
 ... reading environment parameters from ENVPAR
 --- finished
 ... reading NAMELIST parameters from PARIN
 --- finished
 ... creating virtual PE grids + MPI derived data types
 --- finished
 ... checking parameters
 --- finished
 ... allocating arrays
 --- finished
 ... initializing with constant profiles
 --- finished
 ... initializing statistics, boundary conditions, etc.
 --- finished
 ... creating initial disturbances
 --- finished
 ... calling pressure solver
 --- finished
 ... initializing surface layer
 --- finished
 --- leaving init_3d_model
 --- starting timestep-sequence
 [XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX] 0.0 left
 --- finished time-stepping
 ... calculating cpu statistics
 --- finished
----------------------------------------------------------------------------
*** execution finished
}}}
If {{{palmrun}}} has proceeded to this point ({{{finished time-stepping}}} and {{{execution finished}}}) without giving warning or error messages, the PALM simulation has finished successfully. The displayed progress bar ({{{XXXXX}}}) allows you to estimate how much longer the run needs to finish. Subsequent messages give information about post-processing and copying of output data:
{{{
*** post-processing: now executing "mpiexec -machinefile hostfile -n 1 combine_plot_fields.x" ...
*** combine_plot_fields ***
    uncoupled run
    NetCDF output enabled
    no XY-section data available
    NetCDF output enabled
    no XZ-section data available
    no YZ-section data available
    no 3D-data file available
*** execution of OUTPUT-commands:
----------------------------------------------------------------------------
>>> [[ -f LIST_PROFIL_1D ]] && cat LIST_PROFIL_1D >> LIST_PROFILE
>>> [[ -f LIST_PROFIL ]] && cat LIST_PROFIL >> LIST_PROFILE
>>> [[ -f PARTICLE_INFOS/_0000 ]] && cat PARTICLE_INFOS/* >> PARTICLE_INFO
----------------------------------------------------------------------------
*** saving OUTPUT-files:
----------------------------------------------------------------------------
>>> OUTPUT: RUN_CONTROL to /home//palm/current_version/JOBS/example_cbl/MONITORING/example_cbl_rc
>>> OUTPUT: HEADER to /home//palm/current_version/JOBS/example_cbl/MONITORING/example_cbl_header
>>> OUTPUT: CPU_MEASURES to /home//palm/current_version/JOBS/example_cbl/MONITORING/example_cbl_cpu
>>> OUTPUT: DATA_1D_PR_NETCDF to /home//palm/current_version/JOBS/example_cbl/OUTPUT/example_cbl_pr.nc
>>> OUTPUT: DATA_1D_TS_NETCDF to /home//palm/current_version/JOBS/example_cbl/OUTPUT/example_cbl_ts.nc
>>> OUTPUT: DATA_2D_XY_NETCDF to /home//palm/current_version/JOBS/example_cbl/OUTPUT/example_cbl_xy.nc
>>> OUTPUT: DATA_2D_XZ_NETCDF to /home//palm/current_version/JOBS/example_cbl/OUTPUT/example_cbl_xz.nc
>>> OUTPUT: DATA_2D_XZ_AV_NETCDF to /home//palm/current_version/JOBS/example_cbl/OUTPUT/example_cbl_xz_av.nc
----------------------------------------------------------------------------
*** all OUTPUT-files saved
--> palmrun finished
}}}
You should find the output files at their respective positions as listed in the terminal output. Most of PALM's output files are written in NetCDF format and are copied to subdirectory {{{OUTPUT}}}. Some general information files are written in ASCII format and are copied to folder {{{MONITORING}}}.
All available output files of PALM are listed [wiki:doc/app/palm_iofiles here]. PALM offers several [wiki:doc/app/d3par#output namelist parameters] to steer the PALM output.

You are now at the point where you can [wiki:doc/app/palmrun_quickstart#create define and run your own simulation set-up] for the first time.

== [=#batch Batch mode] ==

Large simulation set-ups usually cannot be run interactively, since the large amount of required resources (memory as well as CPU time) is only provided through batch environments. {{{palmrun}}} supports two different ways to run PALM in batch mode. In both cases it creates a batch job, i.e. a file containing directives for a queuing system plus commands to run PALM, which is then submitted either to your local computer or to a remote computer. Running PALM in batch mode requires that you manually modify and extend your [wiki:doc/app/palm_config configuration file], and that a batch system (e.g. PBS, Slurm, ...) is installed on the respective computer.

=== [=#batch_local Running PALM in batch on a local computer] ===

The local computer is the one where the commands that you enter in a terminal session are executed. This might be your local PC/workstation, or a login node of a cluster system / computer center where you are logged in via ssh. Regardless of the computer, it is assumed that PALM has been successfully installed on that machine, either using the automatic installer or via manual installation.

For running PALM in batch mode you need to include __additional__ options in the {{{palmrun}}} command to specify the system resources requested by the job, and to modify your configuration file. A minimum set of __additional__ {{{palmrun}}} options is
{{{
palmrun ....-b -c -m -t -X -T -q
}}}
**Note:** The first option {{{-b}}} is required to tell {{{palmrun}}} to create a batch job running on the local computer!

Before entering the above command, you need to add information to your configuration file.
**Best practice** is to create a new file, e.g. by copying the default file to {{{.palm.config.batch}}}, and then to edit the new file. On a system that allows both batch and interactive mode in the same software environment, you may use one and the same configuration file to start {{{palmrun}}} in either of the modes. You can find more details in the complete description of the [wiki:doc/app/palm_config#Batchjobdirectives configuration file].\\
Based on the {{{palmrun}}} arguments, environment variables (for a description of available variables see here: [wiki:doc/app/palm_config]) will be set by {{{palmrun}}} as described below. The following list assumes a {{{palmrun}}} call
{{{
palmrun .... -t 5400 -X 48 -T 12 -q medium
}}}
 * {{{ {{run_id}} }}} = example_cbl.##### \\ where ##### is a five-digit random number which is newly created for each job. The {{{run_id}}} is used for different purposes, e.g. it defines the name under which you can find the job in the queuing system.
 * {{{ {{cpu_hours}} }}} = 1, {{{ {{cpu_minutes}} }}} = 30 and {{{ {{cpu_seconds}} }}} = 0 \\ calculated from option {{{-t}}}
 * {{{ {{mpi_tasks}} }}} = 48 \\ as given by option {{{-X}}}
 * {{{ {{tasks_per_node}} }}} = 12 \\ as given by option {{{-T}}}
 * {{{ {{nodes}} }}} = 4 \\ calculated from {{{-X}}} / {{{-T}}}. If {{{-X}}} is not a multiple of {{{-T}}}, {{{nodes}}} is incremented by one, e.g. {{{-X 49 -T 12}}} gives {{{nodes = 5}}}.
 * {{{ {{queue}} }}} = medium \\ as given by option {{{-q}}}

When you enter the above command for the first time, {{{palmrun}}} will call the script {{{palmbuild}}} to re-compile the PALM code. The compiled code will be put into folder {{{$HOME/palm/current_version/MAKE_DEPOSITORY_batch}}}. Re-compilation is required since {{{palmrun}}} expects a separate make depository for each configuration file (because the configuration files may contain different compiler settings).
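
The derived values in the list above ({{{cpu_hours}}}, {{{cpu_minutes}}}, {{{cpu_seconds}}} and {{{nodes}}}) follow from integer arithmetic on the arguments of {{{-t}}}, {{{-X}}} and {{{-T}}}. The following is a minimal shell sketch of that arithmetic. It is not taken from the {{{palmrun}}} source code, and the function name {{{derive_resources}}} as well as the shell variable {{{cpu_time}}} are assumptions chosen for illustration.

```shell
#!/bin/sh
# Illustrative sketch only: reproduces the arithmetic described in the list
# above. It is NOT the code used inside palmrun.
# derive_resources is a hypothetical helper name.
derive_resources() {
    cpu_time=$1         # argument of -t (seconds)
    mpi_tasks=$2        # argument of -X
    tasks_per_node=$3   # argument of -T

    # Split the requested time into hours, minutes and seconds
    cpu_hours=$(( cpu_time / 3600 ))
    cpu_minutes=$(( (cpu_time % 3600) / 60 ))
    cpu_seconds=$(( cpu_time % 60 ))

    # Integer division rounded up: one extra node is needed
    # if -X is not a multiple of -T
    nodes=$(( (mpi_tasks + tasks_per_node - 1) / tasks_per_node ))

    echo "cpu_hours=$cpu_hours cpu_minutes=$cpu_minutes cpu_seconds=$cpu_seconds nodes=$nodes"
}

derive_resources 5400 48 12   # the call assumed in the list above
derive_resources 5400 49 12   # -X is not a multiple of -T
```

The first call reproduces the values of the list (1 hour, 30 minutes, 0 seconds, 4 nodes); the second call shows the rounding up to 5 nodes.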
After confirming the {{{palmrun}}} settings by entering {{{y}}}, the following information will be output to the terminal:
{{{
>>> everything o.k. (y/n) ? y
*** batch-job will be created and submitted
*** creating executable and other sources
*** nothing to compile for this run
*** executable and other sources created
*** input files have been copied
*** submit the job (output of submit command, e.g. the job-id, may follow)
<<>>
--> palmrun finished
}}}
Before the batch job is finally submitted, {{{palmrun}}} creates a folder named {{{SOURCES_FOR_RUN_}}} which is located in the {{{fast_io_catalog}}} and which contains various files required for the run (e.g. the PALM executable, PALM's source code and object files, copies of the configuration files, etc.). The messages {{{*** executable and other sources created}}} and {{{*** input files have been copied}}} tell you that this folder has been created. {{{*** nothing to compile for this run}}} means that no user interface needs to be compiled. After the job submission, the batch system usually prints a message ({{{<<>>}}}) which tells you the batch system id under which you can find your job in the queuing system (e.g. if you want to cancel it).

The job is now queued and you have to wait until it is finished. The main task of the job is to execute the {{{palmrun}}} command that you have entered once more, but now on the compute nodes of your system. A job protocol file with name {{{_}}} as given with {{{palmrun}}} options {{{-c}}} and {{{-r}}} (here it will be {{{batch_example_cbl}}}) will be put in the folder that you have set by variable {{{local_jobcatalog}}} in your configuration file ({{{.palm.config.batch}}}). Check the contents of this file carefully. Besides some additional information, it mainly contains the output of the {{{palmrun}}} command as you get it during interactive execution, e.g. information about where the output files have been copied.
Typically, batch systems allow you to run jobs only for a limited time, e.g. 12 hours. See chapter [wiki:doc/app/runs job chains and restart jobs] on how you can use {{{palmrun}}} to create so-called job chains in order to carry out simulations which exceed the time limit for single jobs.

=== [=#batch_remote Running PALM in batch on a remote computer] ===

You can use the {{{palmrun}}} command on your local computer (e.g. your local PC or workstation) and let it submit a batch job to a remote computer at any place in the world. {{{palmrun}}} copies required input files from your local computer to the remote machine and transfers output files back to your local machine, depending on the settings in the {{{.palm.iofiles}}} file. The job protocol file will also be automatically copied back to your local computer.

If you want to use this {{{palmrun}}} feature, you need additional/special settings in the configuration file. Furthermore, you need to pre-compile the PALM code for the remote machine using the {{{palmbuild}}} command. The automatic PALM installer cannot be used to install PALM on that machine. You need to do most of the settings manually.

Furthermore, [wiki:doc/install/passwordless passwordless ssh/scp access] is required from the local computer to the remote computer, as well as from the remote to the local computer. In remote mode, {{{palmrun}}} and {{{palmbuild}}} make heavy use of ssh and scp commands, and if you have not established passwordless access, you would need to enter your password several times before the batch job is finally submitted. Moreover, without passwordless access the job protocol file and any output files cannot be transferred back to your local computer, because there is no connection to the job which could be used to provide passwords for these transfers (and even if there were, your job might require your input at night while you are asleep).

Now, let's start with the configuration file settings for remote batch jobs.
For this, it is convenient to create a new configuration file based on the one you already used locally, e.g. by
{{{
cp .palm.config.batch .palm.config.batch_remote
}}}
where {{{batch_remote}}} can be any string to identify your remote host. Edit this file as described [wiki:doc/app/palm_config#Additionaldirectivesforbatchjobsonremotehosts here].

After setting up the configuration file and before calling {{{palmrun}}}, you need to call the {{{palmbuild}}} command to generate the PALM executable for the remote host:
{{{
palmbuild -c batch_remote
}}}
Keep in mind that the configuration file {{{.palm.config.batch_remote}}} requires correct settings valid for your remote computer (compiler name, compiler options, include and library paths, etc.). If you forgot to call {{{palmbuild}}}, {{{palmrun}}} will ask whether it should do this for you.

If {{{palmbuild}}} succeeded, you can enter the {{{palmrun}}} command, like
{{{
palmrun -r example_cbl -c batch_remote ......
}}}
After confirming the {{{palmrun}}} settings by entering {{{y}}}, similar information as for local batch jobs will be output to the terminal. {{{palmrun}}} finally terminates with the message {{{--> palmrun finished}}}.

The batch job is now queued on the remote system. After the job has finished, the job protocol will be transferred back to your local computer and put into the folder defined by {{{local_jobcatalog}}}. If this file does not appear, e.g. because the transfer failed, you may find the protocol file on the remote host in the folder defined by {{{remote_jobcatalog}}}. As with batch jobs running on local computers, check the contents of this file carefully. Besides some additional information, it mainly contains the output of the {{{palmrun}}} command as you get it during interactive execution, and in particular it tells you where to find the output files on your local computer.

**Note:** Large PALM setups (those using a large number of grid points) can produce extremely large output files, which would take a long time to transfer to your local system and which might exceed the capacity of your local disks. See chapter [wiki:doc/app/palm_iofiles I/O file connection configuration], which explains how to control the copying of INPUT/OUTPUT files.

== [=#options palmrun options] ==

There are two groups of options: the '''user options''' that you can specify yourself when manually calling {{{palmrun}}} from the terminal, and the '''internal options''' that are used for automatically created internal calls of {{{palmrun}}}. Internal calls of {{{palmrun}}} are those that are part of the batch job, and those used for automatically starting restart jobs. Normally, you should never use the internal options. User options from a manual call are automatically added to the internal calls of {{{palmrun}}}.

The following gives complete lists of {{{palmrun}}} user and internal options. A {{{---}}} in the second column means that the respective option has no argument.

=== palmrun user options ===

||='''option''' =||='''default value''' =||='''meaning''' =||
|-----------
||-a ||" " ||For steering the handling of input and output files as defined in the [wiki:doc/app/palm_iofiles file configuration file] {{{..../trunk/SCRIPTS/.palm.iofiles}}}. Argument {{{"d3#"}}} means that the parameter/NAMELIST file for steering PALM shall be provided as input file. This is the minimum setting for option {{{-a}}}, because PALM cannot run without this parameter file. ||
||-A ||" " ||Project account number. ||
||-b ||--- ||Create a batch job. ||
||-B ||--- ||Do not delete the temporary working directory. ||
||-c ||default ||Specifies the so-called configuration identifier. It tells {{{palmrun}}} which [wiki:doc/app/palm_config configuration file] should be used. {{{-c default}}} means to use the configuration file {{{.palm.config.default}}}. ||
||-C ||--- ||Tells that it is a {{{palmrun}}} call for a restart job that has been automatically created. This is an internal option, but it can be used for manually generated restart runs if the user wants to re-use the contents of the {{{SOURCES_FOR_RUN...}}} folder. ||
||-F ||--- ||Create a batch job file only, and do not submit it. ||
||-k ||false ||If set true, input files that have the {{{ln}}} attribute and that have been generated by a previous run within a job chain will be automatically deleted at the end of the run. ||
||-m ||" " ||Memory in MByte to be requested in batch jobs per MPI task. ||
||-M ||" " ||Makefile to compile the PALM code and utility programs. By default, the name of the makefile is {{{Makefile}}}, and it is expected to be in the folder that is given by variable {{{source_path}}} in the configuration file. ||
||-O ||1 ||OpenMP threads to be started per MPI task. Environment variable {{{OMP_NUM_THREADS}}} will be set to this value. ||
||-q ||none ||Name of the job queue to which batch jobs will be submitted. See your batch system documentation about available queues and keep in mind that usually each queue has special limits for requested resources. ||
||-r ||test ||The name of the run given by {{{-r}}} tells {{{palmrun}}} to use the NAMELIST file {{{_p3d}}} from {{{JOBS//INPUT}}}. It also determines folders and names of output files generated by PALM using information from the default file configuration file {{{..../trunk/SCRIPTS/.palm.iofiles}}}. Chapter [wiki:doc/app/palm_iofiles PALM iofiles] explains the format of this file and how you can modify or extend it. ||
||-s ||" " ||List of subroutines (Fortran file names) from the SVN repository (under {{{.../trunk/SOURCES}}}) that shall be compiled for this run. Compiled files will be exclusively used for the run and not be put in the MAKE_DEPOSITORY. In case of {{{-s LM}}}, all files in the repository that have been modified by the user will be compiled. ||
||-t ||" " ||Maximum CPU time (wall clock time) in seconds requested by the batch job. This option is ignored in interactive runs. ||
||-T ||" " ||Number of MPI tasks to be started on one node of the computer system. Typically, this number is chosen as the total number of CPU cores available on one node, e.g. 24 if a node has two CPUs with 12 cores each. ||
||-v ||--- ||Suppresses parts of {{{palmrun}}}'s terminal output and prevents {{{palmrun}}} queries. ||
||-V ||--- ||Use existing {{{SOURCES_FOR_RUN_...}}} folder. Prevents {{{palmrun}}} from creating a new {{{SOURCES_FOR_RUN_...}}} folder. Use this option if you do not want the user interface files to be compiled again. ||
||-w ||as -X||Number of parallel I/O streams to be opened by PALM. In the default case, all MPI processes write at the same time. This may cause file system problems in case of a very large number of cores. ||
||-W ||" " ||Name (id) of a previous job. Can be used as variable {{{ {{previous_job}} }}} as part of job directives in the configuration file, in order to prevent the job from starting before the specified previous job has finished. The job name must be the one that has been given by the batch system. ||
||-x ||--- ||Causes {{{palmrun}}} to output excessive debug information for both interactive sessions and batch jobs. ||
||-X ||1 ||Total number of cores (not CPUs!) to be used for the run. The argument should not be larger than the maximum number of cores available on your computer (except in case of hyperthreading), because that would usually slow down the performance significantly. ||
||-y ||--- ||Use file appendix {{{_O}}} for local PALM-I/O files in case of uncoupled ocean runs, e.g. if the run is a precursor run and files shall later be used for coupled atmosphere-ocean runs. ||
||-Y ||" " ||In case of a coupled atmosphere-ocean run, the parameter tells PALM how many cores shall be assigned to the atmosphere and ocean model, respectively. For example, in case of {{{-X 64 -Y "16 48"}}}, 16 cores are assigned to the atmosphere model, and 48 cores to the ocean model. ||
||-Z ||--- ||Do not call {{{combine_plot_fields}}} after PALM has finished. In that case, data output of 2d cross sections or 3d volumes that has been written by each core into a separate file will not be collected into one file. In order to process these files later, option {{{-B}}} should be set too. {{{-Z}}} might be required for very large jobs in order to reduce computational demands, because {{{combine_plot_fields}}} is running on one core only, so that all other cores will run idle. ||
||? ||--- ||Print a short list of available user options on the terminal. ||

=== palmrun internal options ===

||='''option''' =||='''default value''' =||='''meaning''' =||
|-----------
||-C ||--- ||Tells that it is a {{{palmrun}}} call for a restart job that has been automatically created. ||
||-G ||" " ||Global revision number of the PALM code in {{{trunk/SOURCES}}}. ||
||-i || ||Five-digit random number that gives a run-id and that is used as part of the batch job name as well as the name of the temporary working directory and other files. A new random number is created for each call of {{{palmrun}}} (either a manual call by the user or an automatic call for generating a restart job), and is passed to the batch job internal call of {{{palmrun}}} via this option. ||
||-j ||--- ||Tells that {{{palmrun}}} is running within a batch job. ||
||-R ||" " ||Return address. Tells the remote batch job to which IP address the PALM output and the job protocol file have to be sent, and from which machine automatic restarts have to be generated. ||
||-u ||" " ||Username on the remote host as given in the configuration file by variable {{{remote_username}}}. ||
||-U ||" " ||Username on the local host as given in the configuration file by variable {{{local_username}}}. ||
\\
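
To illustrate how the user options listed above combine, the following shell sketch assembles a {{{palmrun}}} call for a local batch job and only prints it instead of executing it. The helper name {{{build_palmrun_cmd}}}, the memory request of 2000 MByte and the use of a configuration file {{{.palm.config.batch}}} are assumptions for illustration; queue names and resource limits depend on your batch system.

```shell
#!/bin/sh
# Illustrative sketch only: prints a palmrun command line, does not run it.
# build_palmrun_cmd is a hypothetical helper, not part of PALM.
build_palmrun_cmd() {
    run=$1              # -r  run identifier
    config=$2           # -c  configuration identifier
    cores=$3            # -X  total number of cores
    tasks_per_node=$4   # -T  MPI tasks per node
    seconds=$5          # -t  requested time in seconds
    memory=$6           # -m  memory per MPI task in MByte (assumed value)
    queue=$7            # -q  job queue (site specific)

    # -a "d3#" provides the NAMELIST parameter file, -b requests a batch job
    printf '%s\n' "palmrun -r $run -c $config -a \"d3#\" -b -X $cores -T $tasks_per_node -t $seconds -m $memory -q $queue"
}

build_palmrun_cmd example_cbl batch 48 12 5400 2000 medium
```

Printing the command first makes it easy to check the requested resources before the job is actually submitted.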