
Changes between Version 13 and Version 14 of doc/tec/gpu

Timestamp:: Feb 8, 2016 2:33:27 PM
Author:: raasch
Comment:: --


doc/tec/gpu

-                      v13
+                      v14
 The session uses 2 Intel CPU cores for 10000 seconds. You will need two cores if you want to run PALM on both K40 boards.
 Tests can be done on host {{{inferno}}} only, using the PGI Fortran compiler. Required settings:
+Configuration file settings should be as follows:
 {{{
+module load pgi-compiler/2013-136
+%remote_username   <replace>                          lcmuk parallel pgigpu146
+#%modules           pgi/14.6:mvapich2/2.0-pgi-cuda    lcmuk parallel pgigpu146     # mvapich doesn't work so far
+%modules           pgi/14.6:openmpi/1.8.3-pgi-cuda    lcmuk parallel pgigpu146
+%tmp_user_catalog  /tmp                               lcmuk parallel pgigpu146
+%compiler_name     mpif90                             lcmuk parallel pgigpu146
+%compiler_name_ser pgf90                              lcmuk parallel pgigpu146
+%cpp_options       -Mpreprocess:-DMPI_REAL=MPI_DOUBLE_PRECISION:-DMPI_2REAL=MPI_2DOUBLE_PRECISION:-D__nopointer:-D__openacc:-D__cuda_fft:-D__lc  lcmuk parallel pgigpu146
+%mopts             -j:1                               lcmuk parallel pgigpu146
+%fopts             -acc:-ta=tesla,6.0,nocache,time:-Minfo=acc:-fastsse:-Mcuda=cuda6.0  lcmuk parallel pgigpu146
+%lopts             -acc:-ta=tesla,6.0,nocache,time:-Minfo=acc:-fastsse:-Mcuda=cuda6.0:-lcufft  lcmuk parallel pgigpu146
 }}}
+The {{{nocache}}} compiler switch is currently required; without it, performance drops significantly.
+It might be necessary to load the modules manually before calling mbuild or mrun:
+{{{
+module load pgi/14.6 openmpi/1.8.3-pgi-cuda
+}}}
+Furthermore, it is required to set the environment variable
+{{{
+export OMPI_COMM_WORLD_LOCAL_RANK=1
+}}}
+before calling mrun. Compiler version 14.10 gives a runtime error when {{{pres}}} is called for the first time in {{{init_3d_model}}}.
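The preparation steps above (loading the modules and setting {{{OMPI_COMM_WORLD_LOCAL_RANK}}}) could be bundled into a small shell helper. This is a minimal sketch only: the function name {{{prepare_palm_gpu_env}}} is hypothetical, and it assumes the environment-modules {{{module}}} command used on {{{inferno}}}.

{{{#!sh
# Hypothetical helper (not part of PALM): prepare the shell before calling
# mbuild or mrun for the MPI/GPU version built with PGI 14.6.
prepare_palm_gpu_env() {
    # Load the same modules as in the %modules line, if 'module' exists.
    if command -v module >/dev/null 2>&1; then
        module load pgi/14.6 openmpi/1.8.3-pgi-cuda || return 1
    else
        echo "warning: 'module' not available, load modules manually" >&2
    fi
    # Required before calling mrun (see above).
    export OMPI_COMM_WORLD_LOCAL_RANK=1
}

prepare_palm_gpu_env
}}}

Source the file (e.g. {{{. ./palm_gpu_env.sh}}}, a hypothetical file name) rather than executing it, so that the loaded modules and the exported variable persist in the shell from which mbuild or mrun is called.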
+A test parameter set:
+{{{
+/home/raasch/current_version/JOBS/acc_medium/INPUT/acc_medium_p3d
+}}}
+Here are some hints for running the single-GPU (no-MPI) version:\\
 Compiler settings are given in
 {{{
 …
 Reduction operations in {{{pres}}} and {{{flow_statistics}}} ported.
+r1747 \\
+Partial adjustments for the new surface layer scheme. This version is (in principle) instrumented to run on multiple GPUs.
 '''Results for 256x256x64 grid (time in micro-s per gridpoint and timestep):''' \\
 ||.1 ||1*Tesla, single-core (no MPI), pgi13.6             ||0.33342 ||r1221 ||
 …
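The benchmark values are normalized by grid size and number of timesteps, so the wall-clock time per timestep is the tabulated value multiplied by the number of gridpoints. A sketch of this conversion, assuming the gridpoint count is simply nx*ny*nz (ghost layers ignored); the function name {{{seconds_per_timestep}}} is hypothetical:

{{{#!sh
# Hypothetical helper: convert "micro-s per gridpoint and timestep" into
# wall-clock seconds per timestep.
# Usage: seconds_per_timestep NX NY NZ MICROSECONDS_PER_GRIDPOINT
seconds_per_timestep() {
    awk -v nx="$1" -v ny="$2" -v nz="$3" -v t="$4" \
        'BEGIN { printf "%.3f\n", nx * ny * nz * t * 1e-6 }'
}

# First table row (r1221): 256x256x64 grid, 0.33342 micro-s
seconds_per_timestep 256 256 64 0.33342   # prints 1.398
}}}

For the 4,194,304 gridpoints of the 256x256x64 grid this gives about 1.4 s per timestep; this is derived arithmetic, not a separately measured value.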
 The initialization time of the GPU (power up) can be avoided by running {{{/muksoft/packages/pgi/2013-136/linux86-64/13.6/bin/pgcudainit}}} in the background.
+For PGI compiler version 13.6, use {{{-ta=nocache}}} and set the environment variable {{{PGI_ACC_SYNCHRONOUS=1}}}. Otherwise, performance drops significantly (by a factor of two!).
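The two PGI 13.6 hints above (running {{{pgcudainit}}} in the background and setting {{{PGI_ACC_SYNCHRONOUS=1}}}) could be applied together in a session setup sketch like the following. The helper {{{start_pgcudainit}}} and the override variable {{{PGCUDAINIT}}} are hypothetical; the default path is the one given above. Note that {{{-ta=nocache}}} is a compiler option and still has to be passed to the compiler.

{{{#!sh
# Hypothetical setup for the single-GPU version built with PGI 13.6.
# PGCUDAINIT may be overridden; the default is the path given above.
PGCUDAINIT=${PGCUDAINIT:-/muksoft/packages/pgi/2013-136/linux86-64/13.6/bin/pgcudainit}

# Avoid the factor-of-two slowdown described above.
export PGI_ACC_SYNCHRONOUS=1

# Start pgcudainit in the background so the GPU power-up time is not
# charged to the run. Returns 1 if the program is missing.
start_pgcudainit() {
    if [ -x "$1" ]; then
        "$1" &
        echo "started $1 (PID $!)"
    else
        echo "not found or not executable: $1" >&2
        return 1
    fi
}

start_pgcudainit "$PGCUDAINIT" || true
}}}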
 '''Next steps:'''