Changes between Version 25 and Version 26 of doc/app/palmrun


Timestamp:
May 22, 2018 3:19:45 PM (6 years ago)
Author:
raasch
Comment:

--

  • doc/app/palmrun

    v25 v26  
    1616   palmrun  -d example_cbl  -h default  -a "d3#"  -X4
    1717}}}
    18 {{{example_cbl}}} is the so-called ''run identifier'' and tells {{{palmrun}}} to use the NAMELIST file {{{example_cbl_p3d}}} from {{{JOBS/example_cbl/INPUT}}}. It also determines folders and names of output files generated by PALM using information from the default file configuration file {{{..../trunk/SCRIPTS/.palm.iofiles}}}. Chapter .... explains the format of this file and how you can modify or extend it. As a new user, you should not need to care about this file because the default settings should do the job for you.
     18{{{example_cbl}}} is the so-called ''run identifier'' and tells {{{palmrun}}} to use the NAMELIST file {{{example_cbl_p3d}}} from {{{JOBS/example_cbl/INPUT}}}. It also determines folders and names of output files generated by PALM using information from the default file configuration file {{{..../trunk/SCRIPTS/.palm.iofiles}}}. Chapter [wiki:doc/palm_iofiles INPUT/OUTPUT files] explains the format of this file and how you can modify or extend it. As a new user, you should not need to care about this file because the default settings should do the job for you.
    1919
    2020Option {{{-h}}} specifies the so-called host identifier. It tells {{{palmrun}}} which configuration file should be used. {{{-h default}}} means to use the configuration file {{{.palm.config.default}}}. The configuration file contains all the computer (host) specific settings, e.g. which compiler and compiler options should be used, the pathnames of libraries (e.g. NetCDF or MPI), or the name of the execution command (e.g. {{{mpirun}}} or {{{mpiexec}}}), as well as many other important settings. If the automatic installer worked correctly, it created this file for you with settings based on your responses during the installation process. You may create additional configuration files with different settings for other computers (hosts), or for the same computer, e.g. if you like to compile and run PALM with debug compiler options (see "creating configuration files manually").
     
    296296Before the batch job is finally submitted, {{{palmrun}}} creates a folder named {{{SOURCES_FOR_RUN_<run_identifier>}}} which is located in the {{{fast_io_catalog}}} and which contains various files required for the run (e.g. the PALM executable, PALM's source code and object files, copies of the configuration files, etc.). Messages {{{*** executable and other sources created}}} and {{{*** input files have been copied}}} tell you that this folder has been created. {{{*** nothing to compile for this run}}} means that no user interface needs to be compiled. After the job submission, the batch system usually prints a message ({{{<<<submit message from batch system>>>}}}) which tells you the batch system id under which you can find your job in the queueing system (e.g. if you would like to cancel it). The job is now queued and you have to wait until it is finished. The main task of the job is to execute the {{{palmrun}}} command that you entered once more, but now on the compute nodes of your system. A job protocol file with name {{{<host identifier>_<run identifier>}}} as given with {{{palmrun}}} options {{{-h}}} and {{{-d}}} (here it will be {{{batch_neutral}}}) will be put in the folder that you have set by variable {{{local_jobcatalog}}} in your configuration file ({{{.palm.config.batch}}}). Check the contents of this file carefully. Besides some additional information, it mainly contains the output of the {{{palmrun}}} command as you get it during interactive execution, e.g. information about where the output files have been copied.
    297297
    298 Typically, batch systems allow you to run jobs only for limited time, e.g. 12 hours. See chapter [wiki:doc/restarts job chains and restart jobs] on how you can use {{{palmrun}}} to create so-called job chains in order to carry out simulations which exceed the time limit for single jobs.
     298Typically, batch systems allow you to run jobs only for a limited time, e.g. 12 hours. See chapter [wiki:doc/restarts job chains and restart jobs] on how you can use {{{palmrun}}} to create so-called job chains in order to carry out simulations which exceed the time limit for single jobs.
    299299
    300300 
    301301=== Running PALM in batch on a remote computer
     302
     303You can use the {{{palmrun}}} command on your local computer (e.g. your local PC or workstation) and let it submit a batch job to a remote computer anywhere in the world. {{{palmrun}}} copies required input files from your local computer to the remote machine and output files back to your local machine, depending on the settings in the {{{.palm.iofiles}}} file. The job protocol file will also be automatically copied back to your local computer.
     304
     305If you would like to use this {{{palmrun}}} feature, you need additional/special settings in the configuration file. Furthermore, you need to pre-compile the PALM-code for the remote machine using the {{{palmbuild}}} command. The automatic PALM installer cannot be used to install PALM on that machine, so you need to do most of the settings manually.
     306
     307Furthermore, passwordless ssh/scp access from the local computer to the remote computer, as well as from the remote to the local computer, is required. In remote mode, {{{palmrun}}} and {{{palmbuild}}} rely heavily on ssh and scp commands, and if you have not established passwordless access, you would need to enter your password several times before the batch job is finally submitted. Moreover, without passwordless access the job protocol file and any output files cannot be transferred back to your local computer, because the running job offers no connection through which you could provide passwords for these transfers (and even if it did, the job might ask for your input at night while you are asleep).
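
A minimal sketch of how passwordless access might be set up and verified, assuming OpenSSH is installed on both machines. The helper name `check_passwordless` is hypothetical and not part of PALM, and the user name and address in the comments are placeholders. The check relies on the OpenSSH option `BatchMode=yes`, which makes `ssh` fail instead of prompting for a password.

```shell
#!/bin/sh
# One-time setup, run on the local computer (placeholders, adjust to your site):
#   ssh-keygen -t ed25519
#   ssh-copy-id your_username_on_the_remote_system@123.45.6.7
# Repeat the same two steps on the remote computer, pointing back at the
# local computer, because access is required in both directions.

# Hypothetical helper: returns 0 if login to the given target works
# without any password prompt, non-zero otherwise.
check_passwordless() {
    # BatchMode=yes disables password prompts, so a missing key is an error.
    ssh -o BatchMode=yes -o ConnectTimeout=10 "$1" true
}
```

Calling `check_passwordless your_username_on_the_remote_system@123.45.6.7` on the local computer, and the corresponding call on the remote computer, should both succeed silently before you try a remote batch job.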
     308
     309Now, let's start with the configuration file settings for remote batch jobs. For this it would be convenient to create a new configuration file based on the one you already used locally, e.g. by
     310{{{
     311   cp  .palm.config.batch  .palm.config.<remote host identifier>
     312}}}
     313where {{{<remote host identifier>}}} can be any string to identify your remote host. Edit this file and set at minimum the following additional variables:
     314{{{
     315%remote_jobcatalog   /home/username/job_queue
     316%remote_ip           123.45.6.7
     317%remote_username     your_username_on_the_remote_system
     318}}}
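
As a quick sanity check, the following sketch reports which of the three settings are still missing from a configuration file. The function name `missing_remote_settings` is hypothetical and not part of PALM. It only assumes the line format shown above, i.e. a `%` sign, the variable name, whitespace, and a value.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every required remote setting
# that has no "%<name> <value>" line in the given configuration file.
missing_remote_settings() {
    config_file=$1
    for name in remote_jobcatalog remote_ip remote_username; do
        # A setting line starts with %<name>, then whitespace, then a value.
        if ! grep -Eq "^%${name}[[:space:]]+[^[:space:]]" "$config_file"; then
            echo "$name"
        fi
    done
}
```

For example, `missing_remote_settings .palm.config.<remote host identifier>` prints nothing once all three variables are set.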
     319After the batch directives (lines that start with {{{BD:}}}), add another set of batch directives starting with {{{BDT:}}}. They are required to generate a small additional batch job which does no more than transfer the job protocol back to your local system. Since the job protocol file generated by the main job (which is started by {{{palmrun}}}) is not available before the end of that job, the main job has to start another small job at its end, whose only task is to send the job protocol back to the local host. Computing centers normally have special queues for this kind of small job, and you should request the job resources accordingly. Here is an example for a CRAY-XC40 system:
     320{{{
     321# BATCH-directives for batch jobs used to send back the jobfile from a remote to a local host
     322BDT:#!/bin/bash
     323BDT:#PBS -N job_protocol_transfer
     324BDT:#PBS -l walltime=00:30:00
     325BDT:#PBS -l nodes=1:ppn=1
     326BDT:#PBS -o {{job_transfer_protocol_file}}
     327BDT:#PBS -j oe
     328BDT:#PBS -q dataq
     329}}}
     330Only a few resources are requested (e.g. 30 minutes of wall-clock time and one core), and the job runs in a special queue {{{dataq}}}. You may need to adjust these settings to match your batch system.
     331
     332Additional settings for batch jobs on remote hosts can be found in the [wiki:doc/app/palmconfig complete description of the configuration file].
     333
     334After setting up the configuration file and before calling {{{palmrun}}}, you need to call the {{{palmbuild}}} command to generate the PALM executable for the remote host:
     335{{{
     336   palmbuild -h <remote host identifier>
     337}}}
     338Keep in mind that the configuration file {{{.palm.config.<remote host identifier>}}} requires correct settings valid for your remote computer (compiler name, compiler options, include and library paths, etc.). If you forget to call {{{palmbuild}}}, {{{palmrun}}} will ask whether it should do this for you.
     339
     340Now you can call {{{palmrun}}} with
     341{{{
     342   palmrun -d neutral -h <remote host identifier> ......
     343}}}
     344After confirming the {{{palmrun}}} settings by entering {{{y}}}, similar information as for local batch jobs will be output to the terminal. After the job has finished, the job protocol will be transferred to your local computer and put into the folder defined by {{{local_jobcatalog}}}. If this file does not appear, e.g. because the transfer failed, you may find the protocol file on the remote host in the folder defined by {{{remote_jobcatalog}}}. As with batch jobs running on local computers, check the contents of this file carefully. Besides some additional information, it mainly contains the output of the {{{palmrun}}} command as you get it during interactive execution, and in particular it tells you where to find the output files on your local computer.
     345
     346The configuration file {{{.palm.iofiles}}} offers special controls for copying INPUT/OUTPUT files, since large PALM-setups (those using a large number of grid points) can produce extremely large output files which would take a long time to transfer to your local system and which might exceed the capacity of your local discs. See chapter [wiki:doc/palm_iofiles INPUT/OUTPUT files], which explains how to control copying of INPUT/OUTPUT files.
    302347
    303348