Changes between Version 39 and Version 40 of doc/app/palmrun


Timestamp:
Nov 20, 2018 10:09:55 AM (6 years ago)
Author:
scharf
Comment:

--

== {{{mrun}}} / {{{mbuild}}} to {{{palmrun}}} / {{{palmbuild}}} migration

'''Attention: The following text is intended for experienced PALM users who are switching from the old {{{mrun / mbuild}}} scripts to the new scripts {{{palmrun / palmbuild}}}. It is not up to date and will be removed at a later time.'''

= Configuring and running PALM with {{{palmbuild}}} and {{{palmrun}}}


=== Changes compared to mrun/mbuild ===

* The new scripts will run on any kind of Linux / Unix system without requiring any adjustments. All settings are controlled via two configuration files.

* {{{mbuild}}} is replaced by {{{palmbuild}}}, and {{{mrun}}} is replaced by {{{palmrun}}}. The old script {{{subjob}}} is not used any more (submitting jobs is now part of {{{palmrun}}}).

* Setting the environment variable {{{PALM_BIN}}} in shell-profile files (e.g. {{{.bashrc}}}) is not required any more.

* The old configuration file {{{.mrun.config}}} has been split into two files, {{{.palm.config.<configuration_identifier>}}} and {{{.palm.iofiles}}}, where {{{<configuration_identifier>}}} (short: {{{<ci>}}}) is an arbitrary string that you can define. \\\\ "Configuration" means a setting for a specific computer with a specific compiler, compiler options, libraries, etc. \\ If you would like to run '''PALM''' with different configurations, e.g. one with debug options switched on and one with high optimization, you need to create a separate file for each configuration, e.g. {{{.palm.config.optimized}}} and {{{.palm.config.debug}}}. This replaces the old block structure of {{{.mrun.config}}}. The configuration file to be used is selected with the {{{palmrun}}} or {{{palmbuild}}} option {{{-c}}}, e.g. {{{palmrun ... -c optimized}}} will use {{{.palm.config.optimized}}}. You can find example {{{.palm.config.<...>}}} files in your '''PALM''' copy under {{{.../trunk/SCRIPTS}}}. \\\\ You need only one file {{{.palm.iofiles}}}, which contains the file connection statements to be used for all configurations. \\ The file attributes (second column of the file connection statements) have partly changed. The second attribute, which was either {{{loc}}}, {{{locopt}}} or {{{job}}}, has been removed completely. Optional input files now require {{{inopt}}} as the first attribute. Input files that are to be sent to the remote host require {{{tr}}} as the second attribute (instead of {{{job}}}). {{{fl}}} and {{{flpe}}} must be changed to {{{ln}}} and {{{lnpe}}}, respectively. \\\\ For output files, a wildcard {{{*}}} can be given as the file activation string in the third column. In that case, existing local output files will always be copied to their permanent position, and no warning is given if they do not exist. \\\\ Wildcards ({{{*}}}) are allowed for the local names of output files (e.g. {{{BINOUT*}}}) and for the file extensions of input files (e.g. {{{_p3d*}}}). Using wildcards, only one file connection statement is required, e.g. for nested runs that require different input files for each domain ({{{_p3d, _p3d_N01, _p3d_N02}}}, etc.) or that generate different output files (e.g. {{{BINOUT, BINOUT_N01}}}, etc.). The additional extensions identified from the existing files (e.g. {{{_N01, _N02}}}) are automatically appended to the local filename (for input files) or to the file extension (for output files). \\\\ The utility program {{{interpret_config}}} has been removed. The configuration files are now interpreted directly by the shell scripts.
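
The wildcard rule for file connection statements can be illustrated with a small Python sketch. It is a hypothetical illustration, not code taken from {{{palmrun}}} (which is a shell script): the function names, the local name {{{PARIN}}}, the extension {{{_binout}}}, and the simplification that a permanent file name is just the run identifier plus the file extension (the path column of {{{.palm.iofiles}}} is ignored) are assumptions made for this example. Only a single trailing {{{*}}} is handled. For input files the result maps local names to existing input files; for output files it maps local files to permanent names.

{{{#!python
# Hypothetical sketch (not palmrun code): expanding a trailing '*' wildcard
# in a file connection statement into one connection per existing file.


def expand_input(run_id, ext_pattern, local_name, existing_files):
    """Input files: wildcard in the file extension, e.g. '_p3d*'.

    Whatever the '*' matches in an existing file name (e.g. '_N01')
    is appended to the local name.
    """
    prefix = run_id + ext_pattern.rstrip("*")   # e.g. 'example_cbl_p3d'
    connections = {}
    for name in sorted(existing_files):
        if name.startswith(prefix):
            extra = name[len(prefix):]          # '' or e.g. '_N01'
            connections[local_name + extra] = name
    return connections


def expand_output(local_pattern, run_id, file_ext, existing_local_files):
    """Output files: wildcard in the local name, e.g. 'BINOUT*'.

    Whatever the '*' matches in an existing local file name (e.g. '_N01')
    is appended to the file extension of the permanent file.
    """
    base = local_pattern.rstrip("*")            # e.g. 'BINOUT'
    connections = {}
    for name in sorted(existing_local_files):
        if name.startswith(base):
            extra = name[len(base):]
            connections[name] = run_id + file_ext + extra
    return connections


if __name__ == "__main__":
    print(expand_input("example_cbl", "_p3d*", "PARIN",
                       ["example_cbl_p3d", "example_cbl_p3d_N01",
                        "example_cbl_p3d_N02"]))
    print(expand_output("BINOUT*", "example_cbl", "_binout",
                        ["BINOUT", "BINOUT_N01"]))
}}}

With these inputs, for example, the local name {{{PARIN_N01}}} is connected to {{{example_cbl_p3d_N01}}}, and the local output file {{{BINOUT_N01}}} is copied to {{{example_cbl_binout_N01}}}.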

* Only one call of {{{palmbuild}}} is required to compile both the utilities and the PALM source code (there is no option {{{-u}}} anymore). The compiled routines (object files and executables) are put into the folder {{{MAKE_DEPOSITORY_<configuration_identifier>}}}, where {{{<configuration_identifier>}}} is the string given with {{{palmbuild}}} option {{{-c}}}.

* {{{palmrun}}} no longer compiles at the beginning of a batch job. The PALM executable for the batch job (or for the interactive session) is created as part of the {{{palmrun}}} call that you enter manually at your terminal, before the batch job is submitted. The executable is put into the folder {{{SOURCES_FOR_RUN_<run_identifier>}}}, where {{{<run_identifier>}}} is the string provided with {{{palmrun}}} option {{{-r}}}. This folder is now created inside the folder set by the variable {{{fast_io_catalog}}} (see below). If you do not use a user interface, {{{palmrun}}} will not compile at all and will take the executable from the folder {{{MAKE_DEPOSITORY_<configuration_identifier>}}} generated by your last call of {{{palmbuild}}}. If {{{palmrun}}} cannot find the folder {{{MAKE_DEPOSITORY_<configuration_identifier>}}}, it will call {{{palmbuild}}} internally to generate it. If {{{palmrun}}} finds a folder {{{SOURCES_FOR_RUN_<run_identifier>}}} generated by a previous call of {{{palmrun}}}, it will ask you whether the executables from that folder should be used. This way, you can avoid recompiling your user interface with each call of {{{palmrun}}}. Automatically generated restart runs always use the executables from {{{SOURCES_FOR_RUN_<run_identifier>}}}. \\\\ You may have to remove {{{SOURCES_FOR_RUN_...}}} folders manually from time to time, because they are not deleted automatically at the end of a job (or of the last job of a restart job chain).
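
The decision about where the PALM executable comes from can be summarized in a hypothetical Python sketch. The function and parameter names are invented, the returned steps are descriptive labels rather than real command lines, and the order of the checks (in particular, that a confirmed {{{SOURCES_FOR_RUN_...}}} folder is reused before {{{MAKE_DEPOSITORY_...}}} is checked) is an assumption; the actual shell logic in {{{palmrun}}} may differ.

{{{#!python
# Hypothetical sketch (not palmrun code): where does the PALM executable
# for a run come from? Names and the order of checks are assumptions.


def plan_executable(ci, run_id, has_user_interface, existing_folders,
                    reuse_confirmed=False, restart_run=False):
    """Return the steps palmrun would take, as descriptive strings.

    ci:                 configuration identifier (option -c)
    run_id:             run identifier (palmrun option -r)
    has_user_interface: True if user code has to be compiled
    existing_folders:   names of folders that already exist
    reuse_confirmed:    the user's answer when asked about reusing an
                        existing SOURCES_FOR_RUN_<run_id> folder
    restart_run:        True for automatically generated restart runs
    """
    make_dir = "MAKE_DEPOSITORY_" + ci
    run_dir = "SOURCES_FOR_RUN_" + run_id

    # Restart runs always take the executable from SOURCES_FOR_RUN_<run_id>.
    if restart_run:
        return ["use executable from " + run_dir]

    # An existing SOURCES_FOR_RUN_<run_id> folder is only reused after the
    # user has confirmed it; this avoids recompiling the user interface.
    if has_user_interface and run_dir in existing_folders and reuse_confirmed:
        return ["use executable from " + run_dir]

    steps = []
    # A missing MAKE_DEPOSITORY_<ci> folder triggers an internal palmbuild call.
    if make_dir not in existing_folders:
        steps.append("call palmbuild -c " + ci)
    if has_user_interface:
        # The user interface is compiled into SOURCES_FOR_RUN_<run_id>.
        steps.append("compile user interface into " + run_dir)
    else:
        # Without a user interface, palmrun does not compile at all.
        steps.append("use executable from " + make_dir)
    return steps


if __name__ == "__main__":
    print(plan_executable("default", "example_cbl", False, set()))
}}}

For a first run without a user interface and without an existing {{{MAKE_DEPOSITORY_default}}} folder, the sketch returns two steps: the internal {{{palmbuild}}} call, followed by taking the executable from {{{MAKE_DEPOSITORY_default}}}.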

* The option for giving the file activation strings is now {{{-a "d3# ..."}}} instead of {{{-r "d3# ..."}}}.

* In automatic restart runs, hashes ("#") in the file activation strings are now replaced by the character "r" instead of "f".
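
The effect of option {{{-a}}} and of the restart rule on the activation strings can be shown with a minimal Python sketch. The helper name is hypothetical, and splitting the value of {{{-a}}} at blanks is an assumption based on the examples on this page. The sketch only models the documented character replacement, not how {{{palmrun}}} matches the strings against {{{.palm.iofiles}}}.

{{{#!python
# Hypothetical sketch (not palmrun code) of the activation strings used
# for the initial run and for automatically generated restart runs.


def activation_strings(option_a, restart_run):
    """Split the value of palmrun option -a into activation strings.

    In automatic restart runs every '#' is replaced by 'r'
    (mrun used 'f' instead).
    """
    strings = option_a.split()
    if restart_run:
        strings = [s.replace("#", "r") for s in strings]
    return strings


if __name__ == "__main__":
    # palmrun ... -a "d3# restart"
    print(activation_strings("d3# restart", restart_run=False))  # ['d3#', 'restart']
    print(activation_strings("d3# restart", restart_run=True))   # ['d3r', 'restart']
}}}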

* The {{{.palm.config.<ci>}}} file no longer contains blocks. Several variable names have changed (e.g. {{{compiler_options}}} instead of {{{fopts}}}), and new variables have been introduced (e.g. {{{execute_command}}}, which gives the command for starting the executable). Colons ({{{:}}}) must no longer be used to separate, e.g., compiler options. Here is an example (some lines are truncated, indicated by ....):
{{{
#$Id$
#column 1          column 2
#name of variable  value of variable (~ must not be used, except for base_data)
#------------------------------------------------------------------------------
%base_data         ~/palm/current_version/JOBS
%base_directory    $HOME/palm/current_version
%source_path       $HOME/palm/current_version/trunk/SOURCE
%user_source_path  $base_directory/JOBS/$fname/USER_CODE
%fast_io_catalog     /localdata/your_linux_username
#
%local_ip            111.11.111.111
%local_username      your_linux_username
#
%compiler_name       mpif90
%compiler_name_ser   ifort
%cpp_options         -cpp -D__parallel -DMPI_REAL=MPI_DOUBLE_PRECISION -DMPI_2REAL=MPI_2DOUBLE_PRECISION
                     -D__fftw -D__netcdf
%make_options        -j 4
%compiler_options    -openmp -fpe0 -O3 -xHost -fp-model source -ftz -fno-alias -ip -nbs
                     -I /muksoft/packages/fftw/3.3.4/include -L/muksoft/....
%linker_options      -openmp -fpe0 -O3 -xHost -fp-model source -ftz -fno-alias -ip -nbs
                     -I /muksoft/packages/fftw/3.3.4/include -L/muksoft/....
%hostfile            auto
%execute_command     mpiexec  -machinefile hostfile  -n {{mpi_tasks}}  ./palm
}}}
* Some further comments on specific variables: \\\\
 - {{{fast_io_catalog}}} replaces the old variables {{{tmp_user_catalog}}} and {{{tmp_data_catalog}}}. It should be a folder on a file system with fast disks, as typically provided for temporary I/O on large computer systems, e.g. something like {{{/work/...}}}. The temporary working catalog created by {{{palmrun}}} will be located in this folder, and your restart data should be stored in this folder, too. The default {{{.palm.iofiles}}} uses {{{fast_io_catalog}}} for the restart files.
 - For {{{cpp_options}}}, you now have to give ALL required switches, in particular {{{-D__parallel}}} for the parallel version of PALM, which was previously set implicitly by {{{mrun}}} option {{{-K parallel}}}. The {{{-K}}} option has been removed.
 - The compiler and linker options must now contain '''ALL''' include and library paths for the libraries that you intend to use (e.g. MPI, NetCDF, FFTW), unless they are set automatically by a module environment (as e.g. on Cray systems). Old variables like {{{netcdf_inc}}} or {{{netcdf_lib}}} have been removed from the configuration file.
 - {{{execute_command}}} is required to define the command that starts PALM. It depends on the MPI library that you are using. The wildcard {{{ {{mpi_tasks}} }}} will be replaced by the value provided with {{{palmrun}}} option {{{-X}}}. A further wildcard that can be used is {{{ {{tasks_per_node}} }}}, which will be replaced by the value provided with {{{palmrun}}} option {{{-T}}}.
 - The variable {{{write_binary}}} (formerly used to switch on the output of restart data) has been removed from the configuration file. Output of restart data is now switched on with the activation string {{{"restart"}}}, i.e. {{{palmrun ..... -a "...  restart"}}}.
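
How the configuration variables and the {{{execute_command}}} wildcards fit together can be illustrated with a hypothetical Python sketch; {{{palmrun}}} itself does this in shell. The function names are invented, and the sketch reads only single-line {{{%name value}}} entries: comment lines starting with {{{#}}} and indented continuation lines, like those in the truncated example above, are skipped.

{{{#!python
# Hypothetical sketch (not palmrun code): read '%name value' entries from a
# .palm.config.<ci> file and fill in the execute_command wildcards.


def read_palm_config(text):
    """Return a dict of variables defined on lines starting with '%'.

    Comment lines ('#') and all other lines are ignored in this sketch.
    """
    variables = {}
    for line in text.splitlines():
        if line.startswith("%"):
            parts = line[1:].split(None, 1)
            if parts:
                variables[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return variables


def expand_execute_command(template, mpi_tasks, tasks_per_node):
    """Replace {{mpi_tasks}} (palmrun -X) and {{tasks_per_node}} (palmrun -T)."""
    return (template.replace("{{mpi_tasks}}", str(mpi_tasks))
                    .replace("{{tasks_per_node}}", str(tasks_per_node)))


if __name__ == "__main__":
    config = """\
#name of variable  value of variable
%hostfile            auto
%execute_command     aprun  -n {{mpi_tasks}}  -N {{tasks_per_node}}  palm
"""
    variables = read_palm_config(config)
    # corresponds to: palmrun ... -X 96 -T 24
    print(expand_execute_command(variables["execute_command"], 96, 24))
}}}

Run as shown, the sketch prints {{{aprun  -n 96  -N 24  palm}}}.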

* For running PALM in batch mode on a remote host, additional settings are required in the configuration file. The following example uses the Cray XC40 of HLRN as the remote host:
{{{
#column 1          column 2
#name of variable  value of variable (~ must not be used, except for base_data)
#----------------------------------------------------------------------------
%base_data           ~/palm/current_version/JOBS
%base_directory      $HOME/palm/current_version
%source_path         $HOME/palm/current_version/trunk/SOURCE
%user_source_path    $base_directory/JOBS/$fname/USER_CODE
%fast_io_catalog     /gfs2/work/niksiraa
%local_jobcatalog    /home/raasch/job_queue
%remote_jobcatalog   /home/h/niksiraa/job_queue
#
%local_ip            130.75.105.103
%local_username      raasch
%remote_ip           130.75.4.1
%remote_username     niksiraa
%remote_loginnode    hlogin1
%ssh_key             id_rsa_hlrn
%defaultqueue        mpp2testq
%submit_command      /opt/moab/default/bin/msub -E
#
%compiler_name       ftn
%compiler_name_ser   ftn
%cpp_options         -e Z -DMPI_REAL=MPI_DOUBLE_PRECISION -DMPI_2REAL=MPI_2DOUBLE_PRECISION -D__parallel -D__netcdf
                     -D__netcdf4 -D__netcdf4_parallel -D__fftw
%make_options        -j 4
%compiler_options    -em -O3 -hnoomp -hnoacc -hfp3 -hdynamic
%linker_options      -em -O3 -hnoomp -hnoacc -hfp3 -hdynamic -dynamic
%execute_command     aprun  -n {{mpi_tasks}}  -N {{tasks_per_node}}  palm
%memory              2300
%module_commands     module load fftw cray-hdf5-parallel cray-netcdf-hdf5parallel
%login_init_cmd      module switch craype-ivybridge craype-haswell
#
# BATCH-directives to be used for batch jobs. If $-characters are required, hide them with \\\
BD:#!/bin/bash
BD:#PBS -A {{project_account}}
BD:#PBS -N {{job_id}}
BD:#PBS -l walltime={{cpu_hours}}:{{cpu_minutes}}:{{cpu_seconds}}
BD:#PBS -l nodes={{nodes}}:ppn={{tasks_per_node}}
BD:#PBS -o {{job_protocol_file}}
BD:#PBS -j oe
BD:#PBS -q {{queue}}
#
# BATCH-directives for batch jobs used to send back the jobfile from a remote to a local host
BDT:#!/bin/bash
BDT:#PBS -A {{project_account}}
BDT:#PBS -N job_protocol_transfer
BDT:#PBS -l walltime=00:30:00
BDT:#PBS -l nodes=1:ppn=1
BDT:#PBS -o {{job_transfer_protocol_file}}
BDT:#PBS -j oe
BDT:#PBS -q dataq
#
#----------------------------------------------------------------------------
# INPUT-commands, executed before running PALM - lines must start with "IC:"
#----------------------------------------------------------------------------
IC:export ATP_ENABLED=1
IC:export MPICH_GNI_BTE_MULTI_CHANNEL=disabled
IC:ulimit  -s unlimited
}}}
* Some additional settings are required here: \\\\
 - {{{fast_io_catalog}}} must be the folder to be used on the remote host.
 - IP addresses and user names have to be given for the local AND the remote host. Usually, the IP address of the remote host is the one of its login node.
 - {{{remote_loginnode}}}: on many large computer systems, the compute nodes do not allow {{{ssh}}} or {{{scp}}} commands, which are needed to transfer data to the local host or to start restart jobs. If {{{remote_loginnode}}} is set, {{{palmrun}}} tries to execute these commands via the login node. '''Attention:''' In most cases, the systems do not accept an IP address here. You have to give the mnemonic name of the login node.
 - {{{ssh_key}}}: here you can give the filename of a special ssh key for using ssh / scp without a password. The key must be in the folder {{{~/.ssh}}}. This is a special setting for the HLRN system and should not be required on other systems.
 - {{{defaultqueue}}}: if you do not set the queue via {{{palmrun}}} option {{{-q}}}, this queue will be used as the default queue. Unlike {{{mrun}}}, {{{palmrun}}} no longer checks for valid queue names.
 - {{{submit_command}}}: the command for submitting a job to the batch system.
 - {{{module_commands}}}: commands for loading the modules required for running PALM.
 - {{{login_init_cmd}}}: commands to be carried out directly after logging in to the remote computer.
 - Lines starting with {{{BD:}}}: here you have to give the batch directives required by your batch system. {{{palmrun}}} will replace the wildcards as follows:
   * {{{ {{project_account}} }}}: use this if you would like to run the job under a specific account number. It is replaced by the value provided with {{{palmrun}}} option {{{-A}}}.
   * {{{ {{job_id}} }}}: the job's name. It is formed from the run identifier provided with {{{palmrun}}} option {{{-r}}} and a 5-digit random number, e.g. {{{-r example_cbl}}} may give {{{example_cbl.12345}}}.
   * {{{ {{cpu_hours}}, {{cpu_minutes}}, {{cpu_seconds}} }}}: replaced based on the total CPU time in seconds provided with {{{palmrun}}} option {{{-t}}}, e.g. {{{-t 3666}}} gives {{{ {{cpu_hours}}=1, {{cpu_minutes}}=1, {{cpu_seconds}}=6 }}}.
   * {{{ {{nodes}} }}}: the number of nodes requested by the job. It is replaced by the result of {{{ totalcores / ( noMPIt * noOpenMPt ) }}}, where {{{totalcores}}} is the total number of cores as requested with {{{palmrun}}} option {{{-X}}}, {{{noMPIt}}} is the number of MPI tasks to be started on each node, as given by {{{palmrun}}} option {{{-T}}}, and {{{noOpenMPt}}} is the number of OpenMP threads to be started per MPI task, as given by {{{palmrun}}} option {{{-O}}}.
   * {{{ {{tasks_per_node}} }}}: the number of MPI tasks to be started on each node, as given by {{{palmrun}}} option {{{-T}}}.
   * {{{ {{job_protocol_file}} }}}: the name of the job protocol file. For jobs running on a remote host, the filename is created from {{{palmrun}}} options {{{-c}}} and {{{-r}}}, e.g. for {{{palmrun -r example_cbl -c crayh ...}}} the job protocol file name will be {{{crayh_example_cbl}}}. For jobs running on a local host, the name part from option {{{-c}}} is omitted.
   * {{{ {{queue}} }}}: the name of the queue to which the job is submitted. It is replaced by the value provided with {{{palmrun}}} option {{{-q}}}, or, if {{{-q}}} is omitted, by the value of the variable {{{defaultqueue}}} (see above).
   * {{{ {{previous_job}} }}}: the name of a previous job as given by {{{palmrun}}} option {{{-W}}}. It can be used to set job dependencies.
 - Lines starting with {{{BDT:}}}: here you have to give special batch directives for a small job that sends the job protocol file from the remote host back to your local host (these lines are therefore only required if you run batch jobs on a remote host). Since the job protocol file generated by the main job (which is started by {{{palmrun}}}) is not available before the job has ended, the main job has to start another small job at its end, whose only task is to send the job protocol back to the local host. Computing centers normally provide special queues for this kind of small job, and you should request the job resources accordingly.
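
The wildcard replacement for the {{{BD:}}} lines can be sketched in Python. This is a hypothetical illustration, not the {{{palmrun}}} implementation. The helper names are invented, and the wildcard names are taken from the HLRN example configuration above. The sketch also assumes that the number of nodes is computed with integer division (how {{{palmrun}}} treats a {{{-X}}} value that is not a multiple of {{{-T}}} times {{{-O}}} is not covered), that the random job-id suffix is drawn from 10000 to 99999, and that the time fields are not zero-padded, following the {{{-t 3666}}} example.

{{{#!python
# Hypothetical sketch (not palmrun code) of the BD: wildcard replacement.
import random


def split_cpu_time(seconds):
    """palmrun option -t (seconds) -> (cpu_hours, cpu_minutes, cpu_seconds)."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return hours, minutes, secs


def number_of_nodes(total_cores, tasks_per_node, threads_per_task=1):
    """totalcores / (noMPIt * noOpenMPt), i.e. options -X, -T and -O.

    Assumes total_cores is an exact multiple of the cores used per node.
    """
    return total_cores // (tasks_per_node * threads_per_task)


def job_id(run_id, rng=random):
    """Run identifier (-r) plus a 5-digit random number, e.g. example_cbl.12345."""
    return "%s.%d" % (run_id, rng.randint(10000, 99999))


def job_protocol_file(run_id, ci, remote):
    """<ci>_<run_id> for jobs on a remote host, <run_id> for the local host."""
    return "%s_%s" % (ci, run_id) if remote else run_id


def fill_batch_directives(config_lines, values):
    """Return the BD: lines with '{{name}}' replaced by values[name]."""
    directives = []
    for line in config_lines:
        if line.startswith("BD:"):          # BDT: lines do not match
            line = line[len("BD:"):]
            for name, value in values.items():
                line = line.replace("{{%s}}" % name, str(value))
            directives.append(line)
    return directives


if __name__ == "__main__":
    # palmrun -r example_cbl -c crayh -t 3666 -X 96 -T 24 -O 1 -q mpp2testq ...
    hours, minutes, secs = split_cpu_time(3666)
    values = {
        "job_id": job_id("example_cbl"),
        "cpu_hours": hours, "cpu_minutes": minutes, "cpu_seconds": secs,
        "nodes": number_of_nodes(96, 24, 1),
        "tasks_per_node": 24,
        "job_protocol_file": job_protocol_file("example_cbl", "crayh", remote=True),
        "queue": "mpp2testq",
    }
    lines = [
        "BD:#!/bin/bash",
        "BD:#PBS -N {{job_id}}",
        "BD:#PBS -l walltime={{cpu_hours}}:{{cpu_minutes}}:{{cpu_seconds}}",
        "BD:#PBS -l nodes={{nodes}}:ppn={{tasks_per_node}}",
        "BD:#PBS -o {{job_protocol_file}}",
        "BD:#PBS -q {{queue}}",
        "BDT:#PBS -q dataq",
    ]
    print("\n".join(fill_batch_directives(lines, values)))
}}}

With these values, the walltime directive becomes {{{#PBS -l walltime=1:1:6}}} and the nodes directive {{{#PBS -l nodes=4:ppn=24}}}; the {{{BDT:}}} line is not part of the output.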