Changes between Version 17 and Version 18 of doc/app/palm_wd
Timestamp: Jul 13, 2015 7:12:26 AM
doc/app/palm_wd
--- v17
+++ v18
@@ -5,41 +5,40 @@

 == Configuration of the watchdog
-The watchdog consists of two scripts, palm_wd (watchdog client to be run on the local host) and palm_wdd (server to be located on each remote host to be monitored). Before running the watchdog, both client and server require system-specific configuration:
+The watchdog consists of two scripts, palm_wd (watchdog client to be run on the local host) and palm_wdd (server to be located on each remote host to be monitored). Before running the watchdog, both client and server require system-specific configuration, which has to be provided in the configuration files {{{.wd.config}}} and {{{.wdd.config}}}:

-1. in palm_wd, create one item for each remote host in the following three lists hostname, username and description, e.g.
+1. Make a copy of {{{trunk/SCRIPTS/palm_wd_files/.wd.config.default}}}, rename it to {{{.wd.config}}}, and move it to your local palm directory (e.g. {{{~/palm/current_version/}}}), then edit the file (e.g. here for HLRN-III):
 {{{
-hostname = ["hlogin.hlrn.de", "blogin.hlrn.de"]
-username = ["nikname" , "nikname" ]
-description = ["Hannover" , "Berlin" ]
+[Hannover]
+hostname=hlogin.hlrn.de
+username=<replace_by_your_remote_username>
+
+[Berlin]
+hostname=blogin.hlrn.de
+username=<replace_by_your_remote_username>
+
+[Settings]
+update_frequency=10
 }}}
-here hostname is the IP of the remote host (assuming that a passwordless login via ssh-key is available), username is the user name on the remote host, and description is an arbitrary string to identify the host.
+For each remote host to be monitored, create a separate section with a description of your choice (here "Hannover" and "Berlin"). hostname is the IP/name of the remote host (assuming that a passwordless login via ssh-key is available), username is the user name on the remote host. The automatic update frequency must be given in minutes.

-Additionally, the update_frequency can be adjusted:
+2. palm_wdd requires system-specific configurations. Make a copy of {{{trunk/SCRIPTS/palm_wd_files/.wdd.config.default}}} for each host to be monitored, rename it to {{{.wdd.config}}}, and edit the files appropriately. For HLRN-III, {{{.wdd.config}}} reads:
 {{{
-update_frequency = 600*1000
+[Settings]
+readqueue="showq | egrep"
+tmpdir="/gfs1/tmp/"
+canceljob="canceljob"
+checkjob="checkjob"
+realname_grep="AName"
+starttime="showstart"
+starttime_grep="start in"
 }}}
+As the queuing system on different computing systems may vary, it is not possible to provide detailed instructions on how to set this configuration. In case you are struggling with the configuration, please feel free to create a [/newticket new ticket].

-2. in palm_wdd, system-specific configurations must be made. The default is configured to be used on the Cray-XC40 at HLRN-III and reads
-{{{
-cmd_readqueue = "showq | egrep "
-cmd_tmpdir = "/gfs1/tmp/"
-cmd_canceljob = "canceljob"
-cmd_checkjob = "checkjob"
-cmd_realname_grep = "AName"
-cmd_starttime = "showstart"
-cmd_starttime_grep = "start in"
-}}}
-For other hosts, the parameters above must be adjusted appropriately.
-
-3. Copy palm_wdd into the $HOME directory of each of the remote hosts, e.g. for HLRN-III:
+3. Now copy palm_wdd and the configuration files into the $HOME directory of each of the remote hosts, e.g. for HLRN-III:
 {{{
 scp palm_wdd nikname@hlogin.hlrn.de:
+scp .wdd.config.hlrnIII nikname@hlogin.hlrn.de:.wdd.config
 scp palm_wdd nikname@blogin.hlrn.de:
-}}}
-
-4. Create database files for the watchdog in your working directory:
-{{{
-cp $PALM_BIN/palm_wd_files/.wd.olddata $HOME/palm/current_version
-cp $PALM_BIN/palm_wd_files/.wd.newdata $HOME/palm/current_version
+scp .wdd.config.hlrnIII nikname@blogin.hlrn.de:.wdd.config
 }}}

@@ -55,5 +54,5 @@

 == Documentation
-The watchdog is to a large extent self-explanatory. The following features, however, require a short description.
+The watchdog is to a large extent self-explanatory. The following features, however, might require a short description.

 === Progress bar
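
Note on the new {{{.wd.config}}} layout: the file introduced in v18 is INI-style, with one section per remote host plus a [Settings] section. The following is a minimal sketch of how such a file could be read with Python's standard configparser module. It is not taken from palm_wd itself: the function name load_hosts, the sample username and the conversion to milliseconds are assumptions made for illustration (the v17 value 600*1000 suggests, but does not confirm, a millisecond timer).

```python
import configparser

# Sample text mirroring the v18 .wd.config layout; "nikname" is a placeholder user.
SAMPLE = """
[Hannover]
hostname=hlogin.hlrn.de
username=nikname

[Berlin]
hostname=blogin.hlrn.de
username=nikname

[Settings]
update_frequency=10
"""


def load_hosts(text):
    """Hypothetical helper: return (hosts, update_interval_ms) from INI-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)

    # Every section except [Settings] describes one remote host;
    # the section name serves as the host description.
    hosts = [
        {
            "description": name,
            "hostname": parser[name]["hostname"],
            "username": parser[name]["username"],
        }
        for name in parser.sections()
        if name != "Settings"
    ]

    # update_frequency is given in minutes; convert to milliseconds (assumed unit).
    minutes = parser.getint("Settings", "update_frequency")
    return hosts, minutes * 60 * 1000


hosts, interval_ms = load_hosts(SAMPLE)
```

With this layout, adding another host only requires a new section in the file. The quoted values in {{{.wdd.config}}} would need their surrounding quotes stripped if read the same way, since configparser keeps them as part of the value.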