Changes between Version 16 and Version 17 of doc/app/palmrun


Timestamp:
Dec 20, 2017 5:37:30 PM (7 years ago)
Author:
kanani
Comment:

--

  • doc/app/palmrun

    v16 v17  
    8282%compiler_options    -em -O3 -hnoomp -hnoacc -hfp3 -hdynamic
    8383%linker_options      -em -O3 -hnoomp -hnoacc -hfp3 -hdynamic -dynamic
    84 %execute_command     aprun  -n {{MPI_TASKS}}  -N {{TASKS_PER_NODE}}  palm
     84%execute_command     aprun  -n {{mpi_tasks}}  -N {{tasks_per_node}}  palm
    8585%memory              2300
    8686%module_commands     module load fftw cray-hdf5-parallel cray-netcdf-hdf5parallel
     
    8989# BATCH-directives to be used for batch jobs. If $-characters are required, hide them with \\\
    9090BD:#!/bin/bash
    91 BD:#PBS -A {{PROJECT_ACCOUNT}}
    92 BD:#PBS -N {{JOB_ID}}
    93 BD:#PBS -l walltime={{CPU_HOURS}}:{{CPU_MINUTES}}:{{CPU_SECONDS}}
    94 BD:#PBS -l nodes={{NODES}}:ppn={{TASKS_PER_NODE}}
    95 BD:#PBS -o {{JOBFILE}}
     91BD:#PBS -A {{project_account}}
     92BD:#PBS -N {{job_id}}
     93BD:#PBS -l walltime={{cpu_hours}}:{{cpu_minutes}}:{{cpu_seconds}}
     94BD:#PBS -l nodes={{nodes}}:ppn={{tasks_per_node}}
     95BD:#PBS -o {{job_protocol_file}}
    9696BD:#PBS -j oe
    97 BD:#PBS -q {{QUEUE}}
     97BD:#PBS -q {{queue}}
    9898#
    9999# BATCH-directives for batch jobs used to send back the jobfile from a remote to a local host
    100100BDT:#!/bin/bash
    101 BDT:#PBS -A {{PROJECT_ACCOUNT}}
     101BDT:#PBS -A {{project_account}}
    102102BDT:#PBS -N job_protocol_transfer
    103103BDT:#PBS -l walltime=00:30:00
    104104BDT:#PBS -l nodes=1:ppn=1
    105 BDT:#PBS -o {{JOB_TRANSFER_PROTOCOL_FILE}}
     105BDT:#PBS -o {{job_transfer_protocol_file}}
    106106BDT:#PBS -j oe
    107107BDT:#PBS -q dataq
     
    117117 - {{{fast_io_catalog}}} is the one to be used on the remote host.
    118118 - IP-addresses and user names have to be given for the local AND the remote host. Usually, the remote host IP-address is the one for the login-node.
    119  - {{{remote_loginnode}}}: on many of the large computer systems, the compute nodes do not allow for {{{ssh}}}- or {{{scp}}}-commands in order to transfer data to the local host or to start restart jobs. If {{{remote_loginnode}}} is set, {{{palmrun}}} tries to start these commands via the login-node. '''Attention:''' In most cases, the systems to not accept an IP-address. You have to give the mnemonic name of the login-node.
     119 - {{{remote_loginnode}}}: on many of the large computer systems, the compute nodes do not allow for {{{ssh}}}- or {{{scp}}}-commands in order to transfer data to the local host or to start restart jobs. If {{{remote_loginnode}}} is set, {{{palmrun}}} tries to start these commands via the login-node. '''Attention:''' In most cases, the systems do not accept an IP-address. You have to give the mnemonic name of the login-node.
    120120 - {{{ssh_key}}}: here you can give the filename of a special ssh-key for using ssh / scp without password. The key must be in folder {{{~/.ssh}}}. This is a special setting for the HLRN-system and should not be required on other systems.
    121121 - {{{default_queue}}}: if you do not set the queue via {{{palmrun}}}-option {{{-q}}}, this queue will be taken as the default queue. Unlike {{{mrun}}}, {{{palmrun}}} no longer checks for valid queue names.
    122  - {{{submit_command}}}: ...
    123  - {{{module_commands}}}: ...
    124  - {{{login_init_cmd}}}: ...
     122 - {{{submit_command}}}: command for submitting a job to a batch system
     123 - {{{module_commands}}}: loading of necessary modules for running PALM
     124 - {{{login_init_cmd}}}: commands to be carried out directly after login to the remote computer
    125125 - Lines starting with {{{BD:}}}: Here you have to give the batch directives that are required by your batch-system. {{{palmrun}}} will replace wildcards in the following way:
    126    * {{{ {{PROJECT_ACCOUNT}} }}}: To be used if you like to run the job under a specific account number. Is replace by value provided with {{{palmrun}}}-option {{{-A}}}.
    127    * {{{ {{JOB_ID}} }}}: The job's name. It will be formed by the run identifier provided with {{{palmrun}}}-option {{{-d}}} and a 5-digit random number, e.g. {{{-d example_cbl}}} may give {{{example_cbl.12345}}}.
    128    * {{{ {{CPU_HOURS}}, {{CPU_MINUTES}}, {{CPU_SECONDS}} }}}: Will be replaced based on the total CPU time in seconds provided with {{{palmrun}}}-option {{{-t}}}, .e.g. {{{-t 3666}}} will replace {{{ {{CPU_HOURS}}=1, {{CPU_MINUTES}}=1, {{CPU_SECONDS}=6 }}}.
    129    * {{{ {{NODES}} }}}: The number of nodes requested by the job. It will be replaced by the result of {{{ totalcores / ( noMPIt * noOpenMPt )}}}, where {{{totalcores}}} is the total number of cores as requested with {{{palmrun}}}-option {{{-X}}}, {{{noMPIt}}} is the number of MPI-tasks to be started on each node, as given my {{{palmrun}}}-option {{{-T}}}, and {{{noOpenMPT}}} is the number of OpenMP-threads to be started per MPI-task, as given by {{{palmrun}}}-option {{{-O}}}.
    130    * {{{ {{TASKS_PER_NODE}} }}}: The number of MPI-tasks to be started on each node, as given my {{{palmrun}}}-option {{{-T}}}.
    131    * {{{ {{JOBFILE}} }}}: Name of the job protocol file. The filename for jobs running on a remote host is created from {{{palmrun}}}-options {{{-h}}} and {{{-d}}}, e.g. for {{{palmrun -d example_cbl -h crayh ...}}} the job protocol file name will be {{{crayh_example_cbl}}}. For jobs running on a local host, the name part from option {{{-h}}} will be omitted.
    132    * {{{ {{QUEUE}} }}}: The name of the queue to which the job shall be submitted. Will be replaced by the value provided with {{{palmrun}}}-option {{{-q}}}, or, if {{{-q}}} is omitted, by the value of variable {{{defaultqueue}}} (see further above).
    133    * {{{ {{PREVIOUS_JOB}} }}}: The name of a previous job as given by {{{palmrun}}}-option {{{-W}}}. Can be used to set job dependencies.
     126   * {{{ {{project_account}} }}}: To be used if you would like to run the job under a specific account number. Is replaced by the value provided with {{{palmrun}}}-option {{{-A}}}.
     127   * {{{ {{job_id}} }}}: The job's name. It will be formed by the run identifier provided with {{{palmrun}}}-option {{{-d}}} and a 5-digit random number, e.g. {{{-d example_cbl}}} may give {{{example_cbl.12345}}}.
     128   * {{{ {{cpu_hours}}, {{cpu_minutes}}, {{cpu_seconds}} }}}: Will be replaced based on the total CPU time in seconds provided with {{{palmrun}}}-option {{{-t}}}, e.g. {{{-t 3666}}} gives {{{ {{cpu_hours}}=1, {{cpu_minutes}}=1, {{cpu_seconds}}=6 }}}.
     129   * {{{ {{nodes}} }}}: The number of nodes requested by the job. It will be replaced by the result of {{{ totalcores / ( noMPIt * noOpenMPt )}}}, where {{{totalcores}}} is the total number of cores as requested with {{{palmrun}}}-option {{{-X}}}, {{{noMPIt}}} is the number of MPI-tasks to be started on each node, as given by {{{palmrun}}}-option {{{-T}}}, and {{{noOpenMPt}}} is the number of OpenMP-threads to be started per MPI-task, as given by {{{palmrun}}}-option {{{-O}}}.
     130   * {{{ {{tasks_per_node}} }}}: The number of MPI-tasks to be started on each node, as given by {{{palmrun}}}-option {{{-T}}}.
     131   * {{{ {{job_protocol_file}} }}}: Name of the job protocol file. The filename for jobs running on a remote host is created from {{{palmrun}}}-options {{{-h}}} and {{{-d}}}, e.g. for {{{palmrun -d example_cbl -h crayh ...}}} the job protocol file name will be {{{crayh_example_cbl}}}. For jobs running on a local host, the name part from option {{{-h}}} will be omitted.
     132   * {{{ {{queue}} }}}: The name of the queue to which the job shall be submitted. Will be replaced by the value provided with {{{palmrun}}}-option {{{-q}}}, or, if {{{-q}}} is omitted, by the value of variable {{{default_queue}}} (see further above).
     133   * {{{ {{previous_job}} }}}: The name of a previous job as given by {{{palmrun}}}-option {{{-W}}}. Can be used to set job dependencies.
    134134 - Lines starting with {{{BDT:}}}: Here you have to give special batch directives for a small job that is required to send the job protocol file from the remote host back to your local host (meaning that these lines are only required if you are running batch jobs on a remote host). Since the job protocol file generated by the main job (which is started by {{{palmrun}}}) is not available before the end of the job, the main job has to start another small job at its end, whose only task is to send the job protocol back to the local host. The computing centers normally have special queues for these kinds of small jobs, and you should request the job resources accordingly.
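
The wildcard replacement described for the {{{BD:}}} lines can be illustrated with a short Python sketch. This is not the {{{palmrun}}} implementation (which is a shell script); it only mirrors the rules stated above. The function names, the account and queue values, and the use of integer division for the node count are assumptions made for illustration. Whether {{{palmrun}}} zero-pads the time fields is not stated here, so the sketch leaves them unpadded, as in the {{{-t 3666}}} example.

```python
import re

# Matches lower-case wildcards such as {{cpu_hours}} or {{queue}}.
WILDCARD = re.compile(r"\{\{(\w+)\}\}")


def split_cpu_time(total_seconds):
    """Split the CPU time in seconds (option -t) into hours, minutes, seconds."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return hours, minutes, seconds


def number_of_nodes(total_cores, tasks_per_node, openmp_threads=1):
    """totalcores / (noMPIt * noOpenMPt), as described for {{nodes}}.

    Assumption: the core count divides evenly, so integer division is used.
    """
    return total_cores // (tasks_per_node * openmp_threads)


def job_protocol_name(run_id, host=None):
    """Build the job protocol file name from options -d and (for remote jobs) -h."""
    return f"{host}_{run_id}" if host else run_id


def fill_wildcards(line, values):
    """Replace every known {{name}} in a directive line; leave unknown ones as-is."""
    return WILDCARD.sub(lambda m: str(values.get(m.group(1), m.group(0))), line)


hours, minutes, seconds = split_cpu_time(3666)
values = {
    "project_account": "abc12345",   # hypothetical account (option -A)
    "job_id": "example_cbl.12345",   # run identifier plus 5-digit random number
    "cpu_hours": hours,
    "cpu_minutes": minutes,
    "cpu_seconds": seconds,
    "nodes": number_of_nodes(total_cores=48, tasks_per_node=24),
    "tasks_per_node": 24,
    "job_protocol_file": job_protocol_name("example_cbl", host="crayh"),
    "queue": "testq",                # hypothetical queue name (option -q)
}

directives = [
    "#PBS -A {{project_account}}",
    "#PBS -N {{job_id}}",
    "#PBS -l walltime={{cpu_hours}}:{{cpu_minutes}}:{{cpu_seconds}}",
    "#PBS -l nodes={{nodes}}:ppn={{tasks_per_node}}",
    "#PBS -o {{job_protocol_file}}",
    "#PBS -q {{queue}}",
]

for line in directives:
    print(fill_wildcards(line, values))
```

Leaving unknown wildcards untouched is a design choice of this sketch: it makes a misspelled name visible in the generated directives instead of silently producing an empty field.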