Changes between Version 17 and Version 18 of doc/app/palm_wd


Timestamp:
Jul 13, 2015 7:12:26 AM
Author:
maronga
Comment:

--

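Legend: lines with a number only in the v17 column were removed, lines with a number only in the v18 column were added, and lines with numbers in both columns are unmodified.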
  • doc/app/palm_wd

    v17 v18  
    55
    66== Configuration of the watchdog
    7 The watchdog consists of two scripts, palm_wd (watchdog client to be run on the local host) and palm_wdd (server to be located on each remote host to be monitored). Before running the watchdog, both client and server require system-specific configuration:
     7The watchdog consists of two scripts, palm_wd (watchdog client to be run on the local host) and palm_wdd (server to be located on each remote host to be monitored). Before running the watchdog, both client and server require system-specific configuration, which has to be provided in the configuration files {{{.wd.config}}} and {{{.wdd.config}}}:
    88
    9 1. in palm_wd, create one item for each remote host in the following three lists hostname, username and description, e.g.
     91. Make a copy of {{{trunk/SCRIPTS/palm_wd_files/.wd.config.default}}}, rename it to {{{.wd.config}}}, and move it to your local palm directory (e.g. {{{~/palm/current_version/}}}). Then edit the file (shown here for HLRN-III):
    1010{{{
    11 hostname     = ["hlogin.hlrn.de", "blogin.hlrn.de"]
    12 username     = ["nikname"       , "nikname"       ]
    13 description  = ["Hannover"      , "Berlin"        ]
     11[Hannover]
     12hostname=hlogin.hlrn.de
     13username=<replace_by_your_remote_username>
     14
     15[Berlin]
     16hostname=blogin.hlrn.de
     17username=<replace_by_your_remote_username>
     18
     19[Settings]
     20update_frequency=10
    1421}}}
    15    here hostname is the IP of the remote host (assuming that a passwordless login via ssh-key is available), username is the user name on the remote host, and description is an arbitrary string to identify the host.
     22For each remote host to be monitored, create a separate section named with a description of your choice (here "Hannover" and "Berlin"). {{{hostname}}} is the IP address or name of the remote host (assuming that passwordless login via ssh key is available), and {{{username}}} is the user name on the remote host. The automatic update frequency ({{{update_frequency}}} in section {{{[Settings]}}}) must be given in minutes.
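The client configuration above uses an INI-like layout: one bracketed section per remote host plus a {{{[Settings]}}} section, each holding {{{key=value}}} lines. To check an edited {{{.wd.config}}} before starting the watchdog, the following minimal sketch reads it with Python's standard {{{configparser}}} module. It assumes the file is configparser-compatible, which the example suggests but this page does not state; {{{parse_wd_config}}}, {{{RemoteHost}}} and {{{SAMPLE_WD_CONFIG}}} are hypothetical names and not part of palm_wd.

{{{#!python
from __future__ import annotations

import configparser
from dataclasses import dataclass

# Sketch only: assumes .wd.config can be read by Python's configparser.
# parse_wd_config, RemoteHost and SAMPLE_WD_CONFIG are hypothetical names.
SAMPLE_WD_CONFIG = """\
[Hannover]
hostname=hlogin.hlrn.de
username=<replace_by_your_remote_username>

[Berlin]
hostname=blogin.hlrn.de
username=<replace_by_your_remote_username>

[Settings]
update_frequency=10
"""


@dataclass
class RemoteHost:
    description: str  # section name chosen by the user, e.g. "Hannover"
    hostname: str     # IP address or name of the remote host
    username: str     # user name on the remote host


def parse_wd_config(text: str) -> tuple[list[RemoteHost], int]:
    """Return the monitored hosts and the update frequency in minutes."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    # Every section except [Settings] describes one remote host.
    hosts = [
        RemoteHost(name, parser[name]["hostname"], parser[name]["username"])
        for name in parser.sections()
        if name != "Settings"
    ]
    minutes = parser.getint("Settings", "update_frequency")
    return hosts, minutes


if __name__ == "__main__":
    hosts, minutes = parse_wd_config(SAMPLE_WD_CONFIG)
    for host in hosts:
        print(f"{host.description}: {host.username}@{host.hostname}")
    print(f"update frequency: {minutes} min")
}}}

Replacing {{{SAMPLE_WD_CONFIG}}} by the contents of your own file (e.g. {{{open(".wd.config").read()}}}) lists the hosts the client would monitor; a missing {{{hostname}}}, {{{username}}} or {{{[Settings]}}} entry raises an exception.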
    1623
    17    Additionally, the update_frequency can be adjusted:
     242. palm_wdd requires system-specific configuration. For each host to be monitored, make a copy of {{{trunk/SCRIPTS/palm_wd_files/.wdd.config.default}}} and edit it appropriately; on the remote host, the file must be named {{{.wdd.config}}}. For HLRN-III, {{{.wdd.config}}} reads:
    1825{{{
    19 update_frequency = 600*1000
     26[Settings]
     27readqueue="showq | egrep"
     28tmpdir="/gfs1/tmp/"
     29canceljob="canceljob"
     30checkjob="checkjob"
     31realname_grep="AName"
     32starttime="showstart"
     33starttime_grep="start in"
    2034}}}
     35As queuing systems differ between computing systems, it is not possible to provide detailed instructions on how to set this configuration. If you are struggling with the configuration, please feel free to create a [/newticket new ticket].
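Since the correct values depend on the local queuing system, it may help at least to check that an edited {{{.wdd.config}}} contains every entry of the HLRN-III example. The sketch below does this with Python's {{{configparser}}}; it assumes the file is configparser-compatible and that the double quotes only delimit the values. {{{parse_wdd_config}}}, {{{REQUIRED_KEYS}}} and {{{SAMPLE_WDD_CONFIG}}} are hypothetical names, and passing the check says nothing about whether the commands actually work on your host.

{{{#!python
from __future__ import annotations

import configparser

# Sketch only: assumes .wdd.config is readable by configparser and that the
# surrounding double quotes are not part of the values.
# Keys taken from the HLRN-III example of .wdd.config.
REQUIRED_KEYS = ("readqueue", "tmpdir", "canceljob", "checkjob",
                 "realname_grep", "starttime", "starttime_grep")

SAMPLE_WDD_CONFIG = """\
[Settings]
readqueue="showq | egrep"
tmpdir="/gfs1/tmp/"
canceljob="canceljob"
checkjob="checkjob"
realname_grep="AName"
starttime="showstart"
starttime_grep="start in"
"""


def parse_wdd_config(text: str) -> dict[str, str]:
    """Return the [Settings] entries without quotes; fail if any is missing."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    settings = parser["Settings"]
    missing = [key for key in REQUIRED_KEYS if key not in settings]
    if missing:
        raise ValueError("missing in [Settings]: " + ", ".join(missing))
    # configparser keeps quote characters, so strip them explicitly.
    return {key: settings[key].strip('"') for key in REQUIRED_KEYS}


if __name__ == "__main__":
    for key, value in parse_wdd_config(SAMPLE_WDD_CONFIG).items():
        print(f"{key:15} {value}")
}}}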
    2136
    22 2. in palm_wdd, system-specific configurations must be made. The default is configured to be used on the Cray-XC40 at HLRN-III and reads
    23 {{{
    24 cmd_readqueue      = "showq | egrep "
    25 cmd_tmpdir         = "/gfs1/tmp/"
    26 cmd_canceljob      = "canceljob"
    27 cmd_checkjob       = "checkjob"
    28 cmd_realname_grep  = "AName"
    29 cmd_starttime      = "showstart"
    30 cmd_starttime_grep = "start in"
    31 }}}
    32    For other hosts, the parameters above must be adjusted appropriately.
    33 
    34 3. Copy palm_wdd into the $HOME directory of each of the remote hosts, e.g. for HLRN-III:
     373. Now copy palm_wdd and the configuration file into the $HOME directory of each remote host, e.g. for HLRN-III:
    3538{{{
    3639scp palm_wdd nikname@hlogin.hlrn.de:
     40scp .wdd.config.hlrnIII nikname@hlogin.hlrn.de:.wdd.config
    3741scp palm_wdd nikname@blogin.hlrn.de:
    38 }}}
    39 
    40 4. Create database files for the watchdog in your working directory:
    41 {{{
    42 cp $PALM_BIN/palm_wd_files/.wd.olddata $HOME/palm/current_version
    43 cp $PALM_BIN/palm_wd_files/.wd.newdata $HOME/palm/current_version
     42scp .wdd.config.hlrnIII nikname@blogin.hlrn.de:.wdd.config
    4443}}}
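With more than a few remote hosts, the copy commands of step 3 can be generated rather than typed. The following Python sketch builds them from a user name, a host name and the name of the local per-host configuration file ({{{.wdd.config.hlrnIII}}} in the HLRN-III example). The function {{{copy_commands}}} is hypothetical, and the commands are only printed, not executed.

{{{#!python
from __future__ import annotations

import shlex


def copy_commands(username: str, hostname: str,
                  local_wdd_config: str) -> list[list[str]]:
    """Return the scp calls that install palm_wdd and its config in $HOME."""
    target = f"{username}@{hostname}:"  # empty remote path means $HOME
    return [
        ["scp", "palm_wdd", target],
        # The configuration is renamed to .wdd.config on the remote host.
        ["scp", local_wdd_config, target + ".wdd.config"],
    ]


if __name__ == "__main__":
    for host in ("hlogin.hlrn.de", "blogin.hlrn.de"):
        for cmd in copy_commands("nikname", host, ".wdd.config.hlrnIII"):
            # Print only; pass cmd to subprocess.run(cmd, check=True) to execute.
            print(shlex.join(cmd))
}}}

Run as is, it prints the four {{{scp}}} lines of step 3; for other hosts, adjust the user name, host names and configuration file names.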
    4544
     
    5554
    5655== Documentation
    57 The watchdog is to a large extent self-explanatory. The following features, however, require a short description.
     56The watchdog is to a large extent self-explanatory. The following features, however, might require a short description.
    5857
    5958=== Progress bar