Changes between Version 24 and Version 25 of doc/app/palmrun
- Timestamp:
- May 22, 2018 12:50:45 PM (7 years ago)
Legend:
- Unmodified
- Added
- Removed
- Modified
-
doc/app/palmrun
v24 v25 201 201 == How to run PALM in batch mode 202 202 203 Large simulation set-ups usually cannot be run interactively, since the large amount of required resources (memory as well as cpu-time) are only provided through batch environments. {{{palmrun}}} supports two different ways to run PALM in batch mode. I t creates a batch job (a file containing directives for a queuing-system plus commands to run PALM) which iseither submitted to your local computer or to a remote computer. Running PALM in batch mode requires you to manually modify and extend your configuration file {{{.palm.config....}}}, and that a batch system (e.g. PBS or ...) is installed on the respective computer.203 Large simulation set-ups usually cannot be run interactively, since the large amount of required resources (memory as well as cpu-time) are only provided through batch environments. {{{palmrun}}} supports two different ways to run PALM in batch mode. In both cases it creates a batch job, i.e. a file containing directives for a queuing-system plus commands to run PALM, which is then either submitted to your local computer or to a remote computer. Running PALM in batch mode requires you to manually modify and extend your configuration file {{{.palm.config....}}}, and that a batch system (e.g. PBS or ...) is installed on the respective computer. 204 204 205 205 === Running PALM in batch on a local computer … … 221 221 The first option {{{-b}}} is required to tell {{{palmrun}}} to create a batch job running on the local computer. 222 222 223 Before entering the above command, you need to add information to your configuration file. You may edit an existing file (.e.g. {{{.palm.config.default}}}) or create a new one (e.g. by copying the default file to e.g. {{{.palm.config.batch}}} and then editing the new file). In general, you can not use the same configuration file for running interactive jobs and batch jobs as well since different settings are required. Let's assume here that you have created a new file {{ .palm.config.batch}}}. Edit this file and add those batch directives required by your batch system. Keep in mind that there is a wide variety of batch systems and that many computer centers introduce their own special settings for these systems. Please read the documentation of your respective batch system carefully in order to figure out the required settings for your system (e.g. to run an MPI job on multiple cores). The following lines give a minimum example for the portable batch system (PBS).223 Before entering the above command, you need to add information to your configuration file. You may edit an existing file (.e.g. {{{.palm.config.default}}}) or create a new one (e.g. by copying the default file to e.g. {{{.palm.config.batch}}} and then editing the new file). In general, you can not use the same configuration file for running interactive jobs and batch jobs as well since different settings are required. Let's assume here that you have created a new file {{{.palm.config.batch}}}. Edit this file and add those batch directives required by your batch system. Keep in mind that there is a wide variety of batch systems and that many computer centers introduce their own special settings for these systems. Please read the documentation of your respective batch system carefully in order to figure out the required settings for your system (e.g. to run an MPI job on multiple cores). The following lines give a minimum example for the portable batch system (PBS). 224 224 {{{ 225 225 BD:#!/bin/bash … … 279 279 After confirming the {{{palmrun}}} settings by entering {{{y}}}, following information will be output to the terminal: 280 280 {{{ 281 ................ 282 }}} 283 The job is now queued and you have to wait until he is finished. A job protocol file with name ... will be put in ... 284 281 >>> everything o.k. (y/n) ? y 282 283 *** batch-job will be created and submitted 284 285 *** creating executable and other sources 286 *** nothing to compile for this run 287 *** executable and other sources created 288 *** input files have been copied 289 290 *** submit the job (output of submit command, e.g. the job-id, may follow) 291 <<<submit message from batch system>>> 292 293 --> palmrun finished 294 295 }}} 296 Before the batch job is finally submitted, {{{palmrun}}} creates a folder named {{{SOURCES_FOR_RUN_<run_identifier>}}} which is located in the {{{fast_io_catalog}}} and which contains various files required for the run (e.g. the PALM executable, PALM's source code and object files, copies of the configuration files, etc.). Messages {{{*** executable and other sources created}}} and {{{*** input files have been copied}}} tell you that this folder has beeen created. {{{*** nothing to compile for this run}}} means that no user interface needs to be compiled. After the job submission, the batch system usually prompts a message ({{{<<<submit message from batch system>>>}}}) which tells you the batch system id under which you can find your job in the queueing system (e.g. if you like to cancel it). The job is now queued and you have to wait until it is finished. The main task of the job is to execute the {{{palmrun}}} command again, that you have entered, but now on the compute nodes of your system. A job protocol file with name {{{<host identifier>_<run identifier>}}} as given with {{{palmrun}}} options {{{-h}}} and {{{-d}}} (here it will be {{{batch_neutral}}}) will be put in the folder that you have set by variable {{{local_jobcatalog}}} in your configuration file ({{{.palm.config.batch}}}). Check contents of this file carefully. Beside some additional information, it mainly contains the output of the {{{palmrun}}} command as you get it during interactive execution, e.g. information is given to where the output files have been copied. 297 298 Typically, batch systems allow you to run jobs only for limited time, e.g. 12 hours. See chapter [wiki:doc/restarts job chains and restart jobs] on how you can use {{{palmrun}}} to create so-called job chains in order to carry out simulations which exceed the time limit for single jobs. 299 300 301 === Running PALM in batch on a remote computer 285 302 286 303