Version 6 (modified by fricke, 11 years ago)
Hints for using the Cray-XC30 at HLRN
- Running remote jobs
- Fortran issues
- Output problem with combine_plot_fields
Running remote jobs
Starting from r1255, PALM allows full remote access to the Berlin complex of HLRNIII. Since the batch compute nodes do not permit ssh/scp (which mrun requires for several crucial tasks, e.g. the automatic submission of restart runs), the ssh/scp commands are executed on one of the login nodes (blogin1) as a workaround. blogin1 must therefore be a known host for ssh/scp. This requires the user to carry out the following three steps once:
- Log in to blogin and create a pair of private/public ssh keys (replace <hlrn-username> with your HLRN username):
ssh <hlrn-username>@blogin1.hlrn.de
ssh-keygen -t dsa
Enter <return> for any query, until the shell-prompt appears again.
- On blogin, add the public key to the authorized keys for accessing the system (run this in ~/.ssh, where ssh-keygen stores the key pair by default):
cat id_dsa.pub >> authorized_keys
- While still logged in on blogin, log in to blogin1:
ssh <hlrn-username>@blogin1
After the third step, the message
Warning: Permanently added 'blogin1,130.73.233.1' (RSA) to the list of known hosts.
should appear on the terminal.
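The second step can be made repeatable with a small helper that appends the public key only if it is not yet listed. This is a minimal sketch, not part of PALM or mrun: the function name authorize_key and its optional second argument (the ssh directory, defaulting to ~/.ssh) are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: append a public key to authorized_keys unless an
# identical line is already present.
# $1 = public key file, $2 = ssh directory (assumed default: ~/.ssh)
authorize_key() {
    local pubkey="$1"
    local ssh_dir="${2:-$HOME/.ssh}"
    local auth="$ssh_dir/authorized_keys"

    [ -f "$pubkey" ] || { echo "no such key file: $pubkey" >&2; return 1; }

    # ssh expects restrictive permissions on the directory and the file
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    touch "$auth" && chmod 600 "$auth"

    # -F fixed strings, -x whole-line match, -f read the pattern from the key file
    if ! grep -qxFf "$pubkey" "$auth"; then
        cat "$pubkey" >> "$auth"
    fi
}
```

On blogin this would be called as authorize_key ~/.ssh/id_dsa.pub. The first login to blogin1 (third step) still has to be done by hand so that the host is added to the list of known hosts.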
Fortran issues
The Cray Fortran compiler (ftn) on HLRNIII is known to be stricter about Fortran code style than other compilers. The issues observed so far at HLRNIII are listed below.
NAMELIST files
- It is no longer allowed to use a space character between the variable name of an array (e.g. mask_x_loop) and the bracket "(".
Example:
mask_x_loop (1,:) = 0., 500. ,50., (old)
mask_x_loop(1,:) = 0., 500. ,50., (new).
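Existing parameter files can be adjusted with a single sed substitution. The following is a sketch under the assumption that each array assignment starts at the beginning of a line (after optional indentation); the function name fix_namelist_spacing is hypothetical and not part of PALM.

```shell
#!/bin/bash
# Hypothetical filter: remove the whitespace between a variable name at the
# start of a line and the following "(".
# Reads NAMELIST text on stdin and writes the adjusted text to stdout.
fix_namelist_spacing() {
    sed -E 's/^([[:space:]]*[A-Za-z_][A-Za-z0-9_]*)[[:space:]]+\(/\1(/'
}
```

For example, fix_namelist_spacing < old_file > new_file leaves all lines without such a space unchanged. Check the result before replacing the original file.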
Conditional statements (IF-THEN-ELSE)
- It is no longer possible to use == or .EQ. for comparison of variables of type LOGICAL.
Example:
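IF ( variable == .TRUE. ) THEN is not supported. Use IF ( variable ) THEN (or IF ( .NOT. variable ) THEN) instead. Where two LOGICAL variables have to be compared with each other, standard Fortran provides the operators .EQV. and .NEQV. for this purpose.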
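To locate such comparisons in user code, a case-insensitive grep is usually enough. This is a sketch only: the function name find_logical_comparisons is hypothetical, and the pattern only finds comparisons where the literal .TRUE. or .FALSE. is on the right-hand side of == or .EQ.; comparisons between two LOGICAL variables are not detected.

```shell
#!/bin/bash
# Hypothetical helper: print file lines (with line numbers) that compare
# something against .TRUE. or .FALSE. using == or .EQ.
# $@ = Fortran source files to search
find_logical_comparisons() {
    grep -nEi '(==|\.eq\.)[[:space:]]*\.(true|false)\.' "$@"
}
```

Called with several files, grep prefixes each hit with the file name as well, so the output can be worked through file by file.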
Output problem with combine_plot_fields
This problem was solved in revision 1270.
The output of 2D or 3D data with PALM may cause the following error message in the job protocol:
*** post-processing: now executing "combine_plot_fields_parallel.x" ...
./mrun: line 3923: 30156: Memory fault
"/mrun: line 3923:" refers to the line where combine_plot_fields is called in the mrun-script (line number may vary with script version).
Each processor opens its own output file and writes 2D or 3D binary data into it; the routine combine_plot_fields then combines these files into a single file in netCDF format. The error occurs because combine_plot_fields is started on the Cray system management (MOM) nodes, where the stack size is limited to 8 Mbytes. This limit is exceeded, for example, if a cross-section has more than 1024 x 1024 grid points. The stack size should not be increased, because the system may otherwise crash (see the HLRN site for more information). Starting combine_plot_fields on the compute nodes requires aprun, which PALM does not yet use for this routine.
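The 1024 x 1024 threshold follows from simple arithmetic if each grid point is stored as an 8-byte value and 8 Mbytes is read as 8 x 1024 x 1024 bytes. Both are assumptions made here for illustration, and the function names section_bytes and exceeds_stack are hypothetical:

```shell
#!/bin/bash
# Bytes needed to hold one nx x ny cross-section.
# $3 = bytes per value, default 8 (assumption: 8-byte values)
section_bytes() {
    local nx="$1" ny="$2" bytes_per_value="${3:-8}"
    echo $(( nx * ny * bytes_per_value ))
}

# Returns success (status 0) if an nx x ny section exceeds the limit.
# $1 = nx, $2 = ny, $3 = stack limit in bytes
exceeds_stack() {
    [ "$(section_bytes "$1" "$2")" -gt "$3" ]
}

limit=$(( 8 * 1024 * 1024 ))   # assumed meaning of "8 Mbytes"
section_bytes 1024 1024        # prints 8388608, i.e. exactly the limit
```

A section of 1024 x 1024 points therefore just fits, while any larger section exceeds the limit. In bash, the current soft stack limit can be inspected with ulimit -s, which reports the value in units of 1024 bytes.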
For the moment, we recommend carrying out the following steps:
- When you start the job, keep the temporary directory by using the following option:
mrun ... -B
- After the job has finished, the executable file 'combine_plot_fields_<block>.x' has to be copied from trunk/SCRIPTS/ to the temporary directory. <block> is given in the .mrun.config in column five (and six), e.g. parallel. The location of the temporary directory is given by %tmp_user_catalog in the .mrun.config.
- Create a batch script which is using aprun to start the executable file, e.g. like this:
#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -q mpp1q
#PBS -l walltime=00:30:00
#PBS -l partition=berlin
cd <%tmp_user_catalog>
aprun -n 1 -N 1 ./combine_plot_fields_<block>.x
Attention: Use only the batch queues mpp1q or testq; the script may not work in other queues.
- After running the batch script, the following files should be available in the temporary directory (depending on the chosen output during the simulation): DATA_2D_XY_NETCDF, DATA_2D_XZ_NETCDF, DATA_2D_YZ_NETCDF, DATA_2D_XY_AV_NETCDF, DATA_2D_XZ_AV_NETCDF and DATA_2D_YZ_AV_NETCDF. You can copy these files to the standard output directory and you can rename them, e.g. DATA_2D_XY_NETCDF to <job_name>_xy.nc.
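The final copy-and-rename step can be sketched as a short shell function. Only the mapping DATA_2D_XY_NETCDF to <job_name>_xy.nc is taken from the text above; the names used for the other files (_xz, _yz and the _av variants) are assumed by analogy, and the function name collect_2d_output is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: copy the combined netCDF files from the temporary
# directory to an output directory, renaming them to <job_name>_<suffix>.nc.
# $1 = temporary directory, $2 = output directory, $3 = job name
collect_2d_output() {
    local tmp_dir="$1" out_dir="$2" job_name="$3"
    local section av src suffix

    mkdir -p "$out_dir"
    for section in XY XZ YZ; do
        for av in "" "_AV"; do
            src="$tmp_dir/DATA_2D_${section}${av}_NETCDF"
            [ -f "$src" ] || continue    # skip output that was not produced
            # lower-case the suffix: XY -> xy, XY_AV -> xy_av (assumed naming)
            suffix=$(printf '%s' "${section}${av}" | tr 'A-Z' 'a-z')
            cp "$src" "$out_dir/${job_name}_${suffix}.nc"
        done
    done
}
```

The files are copied rather than moved, so the temporary directory stays intact until the results have been checked.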
Attachments (1)
- UseParallelNetCDF.pdf (56.4 KB), added by raasch 11 years ago: parallel NetCDF I/O on Cray-XC30 at HLRNIII