Changes between Version 9 and Version 10 of doc/install/advanced


Timestamp:
Sep 5, 2018 3:09:10 PM (7 years ago)
Author:
raasch
Comment:

--

  • doc/install/advanced

    v9 v10  
    77'''Before you start, please check if you have fulfilled all [wiki:doc/install installation requirements!]'''
    88
    9 === [=#package_installation]First step: Package installation ===
     9The [wiki:doc/install/automatic automatic installer] normally takes care of steps 1-5 described below. Failure of the automatic installation process is usually caused by inconsistencies in your software environment (e.g. mismatches between your compiler, NetCDF and MPI libraries), which will also cause the manual installation to fail. Nevertheless, some of the installation steps may have to be carried out manually. For example, if your system has a very strict firewall and does not allow downloads from our repository, you may carry out the download (second step below) on a different system and copy the {{{trunk}}} folder to your target system, before carrying on with the [wiki:doc/install/automatic automatic installer].
     10
     11Installation and configuration for batch jobs cannot be done by the [wiki:doc/install/automatic automatic installer] and requires manual work in any case, as described further below.
     12
     13== [=#package_installation]First step: Package installation ==
    1014
    1115The '''first installation step''' requires creating a set of directories on the local and, for the advanced method, on the remote host. These are:
     
    3236
    3337
    34 === [=#package_configuration]Package configuration ===
     38== [=#package_configuration]Third step: Package configuration ==
    3539
    3640Compilation and execution of PALM is mainly controlled by two shell scripts named [wiki:doc/app/palmbuild {{{palmbuild}}}] and [wiki:doc/app/palmrun {{{palmrun}}}] that are part of the download and reside in folder {{{.../trunk/SCRIPTS}}}. To use these scripts, you need to extend your {{{PATH}}} variable by adding a line
     
    5963
    6064
    61 === [=#package_configuration]Compiling the PALM sources ===
     65== [=#package_configuration]Fourth step: Compiling the PALM sources ==
    6266
    6367
     
    7377
    7478
    75 === [=#verification]Installation verification ===
     79== [=#verification]Fifth step: Installation verification ==
    7680
    77 As a last step, after the compilation has been finished, the PALM installation has to be verified. For this purpose a simple test run is carried out. This once again requires the '''mrun''' [[wiki:doc/app/configexample|configuration file]], as well as the [[wiki:doc/app/par|parameter file]]. The parameter file must be copied from the PALM working copy by
     81As a last step, after the compilation has been finished, the PALM installation has to be verified. For this purpose a simple test run needs to be started using the script {{{palmrun}}}. In addition to the configuration file, {{{palmrun}}} requires a [wiki:doc/app/par parameter file] as well. The parameter file for the test case is provided as part of the download and needs to be copied first:
    7882{{{
     83  cd ~/palm/current_version
    7984  mkdir -p JOBS/example_cbl/INPUT
    8085  cp trunk/INSTALL/example_cbl_p3d JOBS/example_cbl/INPUT/example_cbl_p3d
    8186}}}
    82 The test run can now be started by executing the command
     87Here, the string {{{example_cbl}}} acts as the so-called ''run identifier''.
     88The test run can now be started by entering
    8389{{{
    84   mrun -d example_cbl -h lccrayh -K parallel -X 8 -T 8 -t 500 -q mpp1testq -r "d3# pr#"
     90  palmrun -d example_cbl -h default -X4 -a "d3#"
    8591}}}
    86 This specific run will be carried out on 8 PEs and is allowed to use up to 500 seconds CPU time. After pressing <return>, the most important settings of the job are displayed at the terminal window and the user is prompted for o.k. ("{{{y}}}"). Next, a message of the queuing system like "''Request … Submitted to queue… by…''" should be displayed. Now the job is queued and either started immediately or at a later time, depending on the current workload of the remote host. Provided that it is executed immediately and that all things work as designed, the job protocol of this run will appear under the file name {{{~/job_queue/lccrayh_example}}} no more than a few minutes later. The content of this file should be carefully examined for any error messages.\\\\
    87 Beside the job protocol and according to the configuration file and arguments given for 'mrun' options {{{-d}}} and {{{-r}}}, further files should be found in the directories
     92See the [wiki:doc/app/palmrun palmrun description] for detailed explanations of the available options. This specific run will be carried out on 4 cores (if available on your system; otherwise you may need to adjust the {{{-X}}} option). The most important settings of this run are displayed in the terminal window and you are prompted to confirm with "{{{y}}}" in order to continue. Information about the progress of the simulation will be output to the terminal. After {{{palmrun}}} has finished, you should find some result files in folder {{{JOBS/example_cbl/MONITORING}}}. Please compare the contents of file
    8893{{{
    89   ~/palm/current_version/JOBS/example_cbl/MONITORING
     94  ~/palm/current_version/JOBS/example_cbl/MONITORING/example_cbl_rc
    9095}}}
    91 and
     96with those of the example result file that is provided under {{{trunk/INSTALL/example_cbl_rc}}}, e.g. by using the standard {{{diff}}} command
    9297{{{
    93   ~/palm/current_version/JOBS/example_cbl/OUTPUT
     98   cd ~/palm/current_version
     99   diff  JOBS/example_cbl/MONITORING/example_cbl_rc trunk/INSTALL/example_cbl_rc
    94100}}}
    95 Please compare the contents of file
    96 {{{
    97   ~/palm/current_version/JOBS/example_cbl/MONITORING/lccrayh_example_rc
    98 }}}
    99 with those of the example result file which can be found under {{{trunk/INSTALL/example_cbl_rc}}}, e.g. by using the standard {{{diff}}} command
    100 {{{
    101 diff  JOBS/example_cbl/MONITORING/lccrayh_example_cbl_rc trunk/INSTALL/example_cbl_rc
    102 }}}
    103 where it is assumed that your working directory is {{{~/palm/current_version}}}.\\\\
    104101'''You should not find any difference between these two files''', except for the run date and time displayed at the top of the file header. If the file contents are identical, the installation is successfully completed.\\\\
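Because the run date and time in the file header always differ, it can be convenient to skip the header lines before comparing. The following is only a minimal sketch: the function name {{{compare_rc}}} and the number of header lines to skip are assumptions that are not part of PALM, so check the header of your own file first and pass a suitable value.

```shell
# Hypothetical helper (not part of PALM): compare two run-control files
# while skipping the first N lines. N is the third argument and defaults
# to 0, which gives a plain diff of the complete files.
compare_rc() {
    skip="${3:-0}"
    start=$((skip + 1))
    tmp_a="$(mktemp)"
    tmp_b="$(mktemp)"
    # tail -n +K prints a file starting at line K
    tail -n "+$start" "$1" > "$tmp_a"
    tail -n "+$start" "$2" > "$tmp_b"
    status=0
    diff "$tmp_a" "$tmp_b" || status=$?
    rm -f "$tmp_a" "$tmp_b"
    # 0: no differences, 1: differences found, >1: diff reported an error
    return "$status"
}
```

For example, {{{compare_rc JOBS/example_cbl/MONITORING/example_cbl_rc trunk/INSTALL/example_cbl_rc 5}}} would ignore the first five lines of both files; the value 5 is purely illustrative.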
    105102
    106103
    107104
     105== Installation for running PALM in batch mode ==
     106
     107=== Installation for batch jobs on the local machine ===
     108
     109Running PALM in batch mode on your local computer requires that a batch system is running on the computer where you are logged in. You need to add appropriate batch directives to the configuration file, as well as settings for variables like {{{local_jobcatalog}}}, {{{defaultqueue}}}, {{{memory}}}, and {{{submit_command}}}. Settings for {{{module_commands}}} and {{{login_init_cmd}}} may be needed too. See the [wiki:doc/app/palm_config configuration file description] for further details. The installation process for batch mode is the same as described above, but {{{palmrun}}} requires additional options and the call may look like this
     110{{{
     111   palmrun -d example_cbl -h default -X4 -T4 -t200 -m1000 -a "d3#" -q testqueue -b
     112}}}
     113The {{{-b}}} option is essential to tell {{{palmrun}}} to generate and submit a batch job. Otherwise, it will try to execute PALM interactively in your terminal session. Again, result files for verifying the installation can be found in folder {{{JOBS/example_cbl/MONITORING}}} after the batch job has been executed. The protocol file of the batch job, which is typically created by every batch system, can be found in the folder that has been set by {{{local_jobcatalog}}} under the name {{{<configuration identifier>_<run identifier>}}}, which is {{{default_example_cbl}}} in the given example. Further information about running PALM in batch mode on local machines can be found in the [wiki:doc/app/palmrun_quickstart palmrun quickstart guide].
     114
     115
     116=== Batch jobs on a remote machine ===
     117
     118Follow the installation steps described above. In addition to the settings for a local batch job, installing PALM for running batch jobs on remote machines requires further entries in the configuration file. At least the variables {{{remote_ip}}}, {{{remote_username}}}, and {{{remote_jobcatalog}}} need to be set. For further information see the [wiki:doc/app/palmrun_quickstart palmrun quickstart guide] and the [wiki:doc/app/palmrun palmrun documentation]. Assuming a configuration file {{{.palm.config.remote_system}}}, compiling the PALM sources via
     119{{{
     120   palmbuild -h remote_system
     121}}}
     122copies the PALM sources by {{{scp}}} from your local computer to the remote system and invokes the remote compiler using {{{ssh}}}. The binaries will be put in folder {{{$HOME/palm/current_version/MAKE_DEPOSITORY_remote_system}}} on the remote system.
     123
     124For using {{{palmrun}}}, additional batch directives have to be added to the configuration file in order to transfer back the job protocol file (see the [wiki:doc/app/palmrun_quickstart palmrun quickstart guide] for further details). The {{{palmrun}}} command for generating the test run then reads
     125{{{
     126   palmrun -d example_cbl -h remote_system -X4 -T4 -t200 -m1000 -a "d3#" -q testqueue
     127}}}
     128{{{palmrun}}} transfers back the result file via {{{scp}}} and you should find it on your local system in folder {{{JOBS/example_cbl/MONITORING}}} under the name {{{remote_system_example_cbl_rc}}} after the job on the remote system has finished. The job protocol file will also be copied to the folder that has been set by {{{local_jobcatalog}}}.
     129
     130Using {{{palmbuild}}} and {{{palmrun}}} for installing and running PALM on remote machines requires passwordless login via {{{scp}}} and {{{ssh}}}, as described in the next section.
    108131
    109132=== Passwordless login via ssh ===
    110133
    111 All hosts (local as well as remote) are accessed via the secure shell (ssh). The user must establish passwordless login using the [[wiki:/doc/install/passwordless|private/public-key mechanism]] (HLRNIII users please see [[wiki:/doc/app/machine/hlrnIII|hints]]). '''To ensure proper function of mrun, passwordless login must be established in both directions, from the local to the remote host as well as from the remote to the local host! '''Test this by carrying out e.g. on the local host:
     134All hosts (local as well as remote) are accessed via the secure shell (ssh). The user must establish passwordless login using the [[wiki:/doc/install/passwordless|private/public-key mechanism]] (HLRNIII users please see [[wiki:/doc/app/machine/hlrnIII|hints]]). '''To ensure proper function of {{{palmbuild}}} and {{{palmrun}}}, passwordless login must be established in both directions, from the local to the remote host as well as from the remote to the local host!''' Test this by carrying out e.g. on the local host:
    112135{{{
    113136  ssh  <username on remote host>@<remote IP-address>
     
    117140  ssh  <username on local host>@<local IP-address>
    118141}}}
    119 In both cases you should not be prompted for a password. '''Before continuing the further installation process, this must be absolutely guaranteed! '''It must also be guaranteed for '''all''' other remote hosts, on which PALM shall run.\\\\
    120 Please note that on many remote hosts, passwordless login must also work '''within the remote host''', i.e. for ssh connections from the remote host to itself. Test this by executing on the remote host:
     142In both cases you should not be prompted for a password. '''Before starting with the installation process, this should be absolutely guaranteed! '''It must also be guaranteed for '''all''' other remote hosts, on which PALM shall run.\\\\
     143Please note that on many remote hosts, passwordless login must also work '''within the remote host''', i.e. for {{{ssh}}} connections from the remote host to itself (e.g. for connections from compute nodes to login nodes). Test this by executing on the remote host:
    121144{{{
    122145  ssh <username on remote host>@<remote IP-address>
     
    124147You should not be prompted for a password.\\\\
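The manual tests above can also be run non-interactively. The following is a minimal sketch, assuming an OpenSSH client: the option {{{BatchMode=yes}}} makes {{{ssh}}} fail instead of asking for a password, so the exit status tells you whether passwordless login works. The function name {{{check_passwordless}}} is hypothetical and not part of PALM.

```shell
# Hypothetical helper (not part of PALM): test passwordless login to one host.
# Argument: <username>@<IP-address>, as in the manual ssh tests.
check_passwordless() {
    # BatchMode=yes disables password prompts, so ssh returns a non-zero
    # exit status if key-based login is not set up.
    # ConnectTimeout=10 avoids waiting indefinitely for an unreachable host.
    if ssh -o BatchMode=yes -o ConnectTimeout=10 "$1" true; then
        echo "passwordless login to $1 works"
    else
        echo "passwordless login to $1 FAILED"
        return 1
    fi
}
```

Call it as {{{check_passwordless <username on remote host>@<remote IP-address>}}} on the local host, and with the local username and address on the remote host, to cover both directions.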
    125148
    126 === [=#other_machines]Configuration for other machines ===
    127 
    128 Starting from version 3.2a, beside the default hosts (HLRN, etc.), PALM can also be installed and run on other Linux-Cluster-, IBM-AIX, or NEC-SX-systems. To configure PALM for a non-default host only requires to add some lines to the configuration file {{{.mrun.config}}}.\\\\
    129 First, you have to define the host identifier (a string of arbitrary length) under which your local host shall be identified by adding a line
    130 {{{
    131   %host_identifier  <hostname>  <host identifier>
    132 }}}
    133 to the configuration file (best to do this in the section where the other default host identifiers are defined). Here {{{<hostname>}}} must be the name of your local host as provided by the unix-command "{{{hostname}}}". The first characters of {{{<host identifier>}}} have to be "{{{lc}}}", if your system is (part of) a linux-cluster, "{{{ibm}}}", or "{{{nec}}}" in case of an IBM-AIX- or NEC-SX-system, respectively. For example, if you want to install on a linux-cluster, the line may read as
    134 {{{
    135   %host_identifier  foo  lc_bar
    136 }}}
    137 In the second step, you have to give all informations neccessary to compile and run PALM on your local host by adding an additional section to the configuration file:
    138 {{{
    139   %remote_username   <1>      <host identifier> parallel
    140   %tmp_user_catalog  <2>      <host identifier> parallel
    141   %compiler_name     <3>      <host identifier> parallel
    142   %compiler_name_ser <4>      <host identifier> parallel
    143   %cpp_options       <5>      <host identifier> parallel
    144   %netcdf_inc        <6>      <host identifier> parallel
    145   %netcdf_lib        <7>      <host identifier> parallel
    146   %fopts             <8>      <host identifier> parallel
    147   %lopts             <9>      <host identifier> parallel
    148 }}}
    149 The section consists of four columns each separated by one or more blanks. The first column gives the name of the respective environment variable used by '''mrun''' and '''mbuild''', while the second column defines its value. The third column has to be the host identifier as defined above, and the last column in each line must contain the string "{{{parallel}}}". Otherwise, the respective line(s) will be interpreted as belonging to the setup for compiling and running a serial (non-parallel) version of PALM.\\\\
    150 All brackets have to be replaced by the appropriate settings for your local host:
    151 
    152   * {{{<1>}}} is the username on your LOCAL host
    153   * {{{<2>}}} is the temporary directory in which PALM runs will be carried out
    154   * {{{<3>}}} is the compiler name which generates parallel code
    155   * {{{<4>}}} is the compiler name for generating serial code
    156   * {{{<5>}}} are the preprocessor options to be invoked. In most of the cases, it will be neccessary to adjust the MPI data types to double precision by giving {{{-DMPI_REAL=MPI_DOUBLE_PRECISION -DMPI_2REAL=MPI_2DOUBLE_PRECISION}}}. To switch on the netCDF support, you also have to give {{{-D__netcdf}}} and {{{-D__netcdf4}}} (if you like to have netCDF4/HDF5 data format; this requires a netCDF4 library!).
    157   * {{{<6>}}} is the compiler option for specifying the include path to search for the netCDF module/include files
    158   * {{{<7>}}} are the linker options to search for the netCDF library
    159   * {{{<8>}}} are the general compiler options to be used. You should allways switch on double precision (e.g. {{{-r8}}}) and code optimization (e.g. {{{-O2}}}).
    160   * {{{<9>}}} are the linker options
    161   * {{{<host identifier>}}} is the host identifier as defined before
    162 
    163 A typical example may be:
    164 {{{
    165   %remote_username   raasch                                  lc_bar parallel
    166   %tmp_user_catalog  /tmp                                    lc_bar parallel
    167   %compiler_name     mpif90                                  lc_bar parallel
    168   %compiler_name_ser ifort                                   lc_bar parallel
    169   %cpp_options       -DMPI_REAL=MPI_DOUBLE_PRECISION:-DMPI_2REAL=MPI_2DOUBLE_PRECISION:-D__netcdf  lc_bar parallel
    170   %netcdf_inc        -I:/usr/local/netcdf/include            lc_bar parallel
    171   %netcdf_lib        -L/usr/local/netcdf/lib:-lnetcdf        lc_bar parallel
    172   %fopts             -axW:-cpp:-openmp:-r8:-nbs              lc_bar parallel
    173   %lopts             -axW:-cpp:-openmp:-r8:-nbs:-Vaxlib      lc_bar parallel
    174 }}}
    175 Currently (version 3.7a), depending on the MPI version which is running on your local host, the options for the execution command (which may be {{{mpirun}}} or {{{mpiexec}}}) may have to be adjusted manually in the '''mrun'''-script. A future version will allow to give the respective settings in the configuration file.\\\\
    176 If you have any problems with the PALM installation, the members of the PALM working group are pleased to help you.\\\\\\
    177149
    178150
    179 = [=#update]Installation of new / other versions, version update =
     151== [=#update]Installation of new / other revisions, revision update ==
    180152
    181 The PALM group announces code revisions by emails send to the PALM mailing list. If you like to be put on this list, just send an email to raasch@muk.uni-hannover.de. Details about new releases can be found in the [../tec/changelog PALM change log].\\\\
    182 Generally, there are two ways of installing new / other versions. You can install a version from the list of available PALM releases or you can update your current installation with the newest developer version of PALM.\\\\
    183 If you have previously checked out the most recent (at that time) PALM developer version by using
     153All code revisions are documented under [wiki:doc/tec/changelog]. The PALM group announces major code revisions via the PALM mailing list. You will automatically be added to the list when you create an account using the [[//trac/register|register form]].\\\\
     154Generally, there are two ways of installing new / other versions. You can install a version from the [wiki:doc/tec/releasenotes list of available PALM releases] or you can update your current installation with the newest developer revision of PALM.\\\\
     155If you have previously checked out the PALM developer revision by using
    184156{{{
    185157  svn checkout ...../palm/trunk trunk
    186158}}}
    187 you can easily make an update to the newest version by changing into the working directory {{{~/palm/current_version}}} and executing
     159you can easily make an update to the newest revision by
    188160{{{
     161  cd ~/palm/current_version
    189162  svn update trunk
    190163}}}
    191 This updates all files in the PALM working copy in subdirectory {{{trunk}}}. The update may fail due the '''subversion''' rules, if you have modified the contents of trunk. In case of any conflicts with the repository, please refer to the '''subversion''' documentation on how to remove them. In order to avoid such conflicts, modifications of the default PALM code should be omitted and be restricted to the user-interface only (see [../app/userint here]).\\\\
     164This updates all files in folder {{{trunk}}} (which is your working copy of the PALM repository). The update may fail due to the '''subversion''' rules if you have modified the contents of trunk. In case of any conflicts with the repository, please refer to the '''subversion''' documentation on how to remove them. In order to avoid such conflicts, modifications of the default PALM code should be avoided and restricted to the user-interface only (see [../app/userint here]), unless you are a PALM developer.\\\\
    192165Alternatively, you can install new or other releases in a different directory, e.g.
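To see in advance whether an update might run into conflicts, you can list local modifications with {{{svn status}}} before calling {{{svn update}}}. The following is a minimal sketch under that assumption; the function name {{{safe_update}}} is hypothetical and not part of PALM.

```shell
# Hypothetical helper (not part of PALM): update the working copy only if
# it contains no local modifications.
safe_update() {
    wc_dir="${1:-trunk}"
    # 'svn status -q' lists locally changed items and hides unversioned files
    changes="$(svn status -q "$wc_dir")"
    if [ -n "$changes" ]; then
        echo "local modifications found in $wc_dir:"
        echo "$changes"
        return 1
    fi
    svn update "$wc_dir"
}
```

Run {{{safe_update trunk}}} from {{{~/palm/current_version}}}. If modifications are listed, decide whether to keep or revert them before updating.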
    193166{{{
    194   mkdir ~/palm/release-3.1c
    195   cd ~/palm/release-3.1c
    196   svn checkout --username <your username> https://palm.muk.uni-hannover.de/svn/palm/tags/release-3.1c trunk
     167  mkdir ~/palm/release-4.0
     168  cd ~/palm/release-4.0
     169  svn checkout --username <your username> https://palm.muk.uni-hannover.de/svn/palm/tags/release-4.0 trunk
    197170}}}
    198 However, this would require to carry out again the complete installation process described above. So far, different versions of PALM cannot be used at the same time. The PALM releases from {{{palm/tags}}} never have to be updated with "{{{svn update}}}", since these releases are frozen! \\\\
    199 After updating the working copy, please check for any differences between your current configuration file ({{{.mrun.config}}}) and the default configuration files under {{{trunk/SCRIPTS/.mrun.config.<compiler>}}} and adjust your current file, if neccessary.\\\\
    200 The scripts and the pre-compiled code must then be updated via
     171However, this requires carrying out the complete installation process described above again. So far, different versions of PALM cannot be used at the same time. The PALM releases from {{{palm/tags}}} never have to be updated with "{{{svn update}}}", since these versions are frozen! \\\\
     172
     173The compiled PALM code and helper routines must then be updated via
    201174{{{
    202   mbuild -u -h lcmuk
    203   mbuild -u -h ibmh
    204   mbuild -h ibmh
     175   palmbuild -h default
    205176}}}
    206 or via
     177or for any other configuration files that you are using.\\\\
     178You can use '''subversion''' for code comparison between the different revisions. Also, modified code can be committed to the repository, but this is restricted to PALM developers.\\\\
     179
     180If you want to recompile PALM via {{{palmbuild}}} after you have modified the configuration file (e.g. if you changed compiler options or switched to other libraries), you need to apply the {{{touch}}} command to all source files in advance:
    207181{{{
    208   mbuild -u
    209   mbuild 
     182   touch trunk/SOURCE/*
    210183}}}
    211 on all remote hosts listed in the configuration file {{{.mrun.config}}}.\\\\
    212 You can use '''subversion''' for code comparison between the different versions. Also, modified code can be committed to the repository, but this is restricted to PALM developers.\\\\
     184because otherwise the {{{make}}} mechanism will not detect any source file that needs to be compiled. As an alternative, instead of ''touching'' the files, you may delete the {{{MAKE_DEPOSITORY}}} folder before calling {{{palmbuild}}}, but then the complete code will be re-compiled. 
    213185
    214 If you want to recompile PALM via {{{mbuild}}} after you have modified the configuration file {{{.mrun.config}}} (e.g. if you switch to a newer compiler or NetCDF version), you will have to perform the touch command on all source files:
    215 {{{
    216 touch trunk/SOURCE/* .
    217 }}}
    218 because otherwise the {{{make}}} mechanism will not be able to recompile the code.
    219 
    220 As a last step, a suitable test run should be carried out. It should be carefully examined whether and how the results created by the new version differ from those of the old version. Possible discrepancies which go beyond the ones announced in the [../tec/changelog PALM change log] should be communicated as soon as possible to the PALM group.
     186As a last step, a suitable test run should be carried out. It should be carefully examined whether and how the results created by the new revision differ from those of the old version. Possible discrepancies which go beyond the ones announced in the [wiki:doc/tec/changelog PALM change log] should be communicated as soon as possible via our [/newticket ticket system].