Changes between Version 4 and Version 5 of doc/app/palm_wd


Timestamp:
Jul 7, 2015 1:19:49 PM (10 years ago)
Author:
maronga
Comment:

--

  • doc/app/palm_wd

    v4 v5  
    11= PALM watchdog
    2 From revision r1611, a batch job monitoring tool (watchdog) called '''palm_wd''' is available. It is based on Python 2.7 and Qt4.\\
     2From revision r1611, a batch job monitoring tool (watchdog) called '''palm_wd''' is available. It is based on Python 2.7 and Qt4.\\\\
     3
    34[[Image(palm_logo_wd.ico, border=0, center, nolink)]]
    45
    56== Configuration of the watchdog
     7The watchdog consists of two scripts, palm_wd (watchdog client to be run on the local host) and palm_wdd (server to be located on each remote host to be monitored). Before running the watchdog, both client and server require system-specific configurations:
    68
     91. In palm_wd, create one entry for each remote host in each of the three lists hostname, username and description, e.g.
     10{{{
     11hostname     = ["hlogin.hlrn.de", "blogin.hlrn.de"]
     12username     = ["nikname"       , "nikname"       ]
     13description  = ["Hannover"      , "Berlin"        ]
     14}}}
     15Here, hostname is the host name or IP address of the remote host (a passwordless login via ssh key is assumed to be available), username is the user name on the remote host, and description is an arbitrary string to identify the host.
     16
     17Additionally, the update_frequency can be adjusted:
     18{{{
     19update_frequency = 600*1000
     20}}}
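
The three lists in step 1 are paired by position, so they need the same number of entries. The following is a minimal Python sketch of such a consistency check, not code taken from palm_wd. The helper names `host_entries` and `minutes_to_update_frequency` are hypothetical, and reading update_frequency as milliseconds is an assumption based on the default of 600*1000 (Qt timers take their interval in milliseconds); the page itself does not state the unit.

```python
# Hypothetical helpers for checking a palm_wd configuration; not part of palm_wd.


def host_entries(hostname, username, description):
    """Pair the three configuration lists by position."""
    if not (len(hostname) == len(username) == len(description)):
        raise ValueError("hostname, username and description must have equal length")
    return list(zip(hostname, username, description))


def minutes_to_update_frequency(minutes):
    """Convert minutes to an update_frequency value, ASSUMING the unit is milliseconds."""
    return int(minutes * 60 * 1000)


hostname = ["hlogin.hlrn.de", "blogin.hlrn.de"]
username = ["nikname", "nikname"]
description = ["Hannover", "Berlin"]

entries = host_entries(hostname, username, description)
update_frequency = minutes_to_update_frequency(10)  # same value as 600*1000
```

If one list is shorter than the others, the check raises a ValueError instead of silently dropping a host.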
     21
     222. In palm_wdd, system-specific settings must be made. The default configuration is intended for the Cray-XC40 at HLRN-III and reads
     23{{{
     24cmd_readqueue      = "showq | egrep "
     25cmd_tmpdir         = "/gfs1/tmp/"
     26cmd_canceljob      = "canceljob"
     27cmd_checkjob       = "checkjob"
     28cmd_realname_grep  = "AName"
     29cmd_starttime      = "showstart"
     30cmd_starttime_grep = "start in"
     31}}}
     32For other hosts, the parameters above must be adjusted appropriately.
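
When adapting these settings to another host, it can help to check that the configured executables exist there at all. The sketch below is one possible check, not part of palm_wdd: the function `missing_commands` is hypothetical, it only looks at the first word of each setting, and it uses `shutil.which`, which requires Python 3.3 or later (the watchdog itself targets Python 2.7).

```python
import shutil


def missing_commands(commands):
    """Return the names of settings whose executable is not found on PATH."""
    missing = []
    for name, command in commands.items():
        parts = command.split()
        # Only the first word is treated as the executable, e.g. "showq"
        # in "showq | egrep ".
        executable = parts[0] if parts else ""
        if not executable or shutil.which(executable) is None:
            missing.append(name)
    return missing


# Default values from the configuration above.
commands = {
    "cmd_readqueue": "showq | egrep ",
    "cmd_canceljob": "canceljob",
    "cmd_checkjob": "checkjob",
    "cmd_starttime": "showstart",
}
print(missing_commands(commands))
```

Run on the remote host, this prints the settings that still need to be adjusted for that system's batch scheduler.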
     33
     343. Copy palm_wdd into the $HOME directory of each remote host, e.g. for HLRN-III:
     35{{{
     36scp palm_wdd nikname@hlogin.hlrn.de:
     37scp palm_wdd nikname@blogin.hlrn.de:
     38}}}
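
With more than a few hosts, the copy in step 3 can be driven from the same lists used in palm_wd. The following is a sketch under that assumption; `scp_commands` and `copy_to_hosts` are hypothetical helpers, not part of the watchdog. A destination of the form `user@host:` with nothing after the colon makes scp place the file in the remote home directory.

```python
import subprocess


def scp_commands(hostname, username, filename="palm_wdd"):
    """Build one scp argument list per remote host; the trailing ':' targets $HOME."""
    return [
        ["scp", filename, "{}@{}:".format(user, host)]
        for host, user in zip(hostname, username)
    ]


def copy_to_hosts(hostname, username, filename="palm_wdd"):
    """Run the scp commands one after another; raises if a copy fails."""
    for command in scp_commands(hostname, username, filename):
        subprocess.check_call(command)
```

Calling `copy_to_hosts(hostname, username)` requires the passwordless ssh login mentioned in step 1; `scp_commands` alone only builds the commands and can be inspected without network access.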
     39
     404. Create database files for the watchdog in your working directory:
     41{{{
     42cp $PALM_BIN/palm_wd_files/.wd.olddata $HOME/palm/current_version
     43cp $PALM_BIN/palm_wd_files/.wd.newdata $HOME/palm/current_version
     44}}}
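
The two cp commands in step 4 can also be expressed in Python, for example as part of a setup script. This is a minimal sketch; the function `create_database_files` is hypothetical, and the directory layout is taken from the commands above.

```python
import os
import shutil


def create_database_files(palm_bin, working_dir,
                          names=(".wd.olddata", ".wd.newdata")):
    """Copy the watchdog database files from palm_bin/palm_wd_files into working_dir."""
    source_dir = os.path.join(palm_bin, "palm_wd_files")
    copied = []
    for name in names:
        target = os.path.join(working_dir, name)
        shutil.copy(os.path.join(source_dir, name), target)
        copied.append(target)
    return copied


# Example call mirroring the cp commands above:
# create_database_files(os.environ["PALM_BIN"],
#                       os.path.expanduser("~/palm/current_version"))
```

The function returns the paths of the copied files and raises an error if a source file or the working directory does not exist.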
    745
    846== Running the watchdog
     47The watchdog can be started either by typing
     48{{{
     49palm_wd
     50}}}
     51into the shell, or via the mrungui (Start -> Start watchdog).
    952
    1053