Changes between Version 20 and Version 21 of doc/tec/gpu


Timestamp: Nov 21, 2018 5:10:59 PM
Author: scharf
Comment: --

Legend:

  (no prefix)  Unmodified
  -            Removed (v20)
  +            Added (v21)
  • doc/tec/gpu

Configuration file settings should be as follows:
{{{
- %remote_username   <replace>                          lcmuk parallel pgigpu146
- #%modules           pgi/14.6:mvapich2/2.0-pgi-cuda    lcmuk parallel pgigpu146     # mvapich doesn't work so far
- %modules           pgi/14.6:openmpi/1.8.3-pgi-cuda    lcmuk parallel pgigpu146
- %tmp_user_catalog  /tmp                               lcmuk parallel pgigpu146
- %compiler_name     mpif90                             lcmuk parallel pgigpu146
- %compiler_name_ser pgf90                              lcmuk parallel pgigpu146
- %cpp_options       -Mpreprocess:-DMPI_REAL=MPI_DOUBLE_PRECISION:-DMPI_2REAL=MPI_2DOUBLE_PRECISION:-D__nopointer:-D__openacc:-D__cuda_fft:-D__lc  lcmuk parallel pgigpu146
- %mopts             -j:1                               lcmuk parallel pgigpu146
- %fopts             -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-fastsse:-Mcuda=cuda6.0  lcmuk parallel pgigpu146
- %lopts             -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-fastsse:-Mcuda=cuda6.0:-lcufft  lcmuk parallel pgigpu146
+ %remote_username   <replace>
+ #%modules           pgi/14.6:mvapich2/2.0-pgi-cuda         # mvapich doesn't work so far
+ %modules           pgi/14.6:openmpi/1.8.3-pgi-cuda
+ %tmp_user_catalog  /tmp
+ %compiler_name     mpif90
+ %compiler_name_ser pgf90
+ %cpp_options       -Mpreprocess:-DMPI_REAL=MPI_DOUBLE_PRECISION:-DMPI_2REAL=MPI_2DOUBLE_PRECISION:-D__nopointer:-D__openacc:-D__cuda_fft:-D__lc
+ %mopts             -j:1
+ %fopts             -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-fastsse:-Mcuda=cuda6.0
+ %lopts             -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-fastsse:-Mcuda=cuda6.0:-lcufft
}}}
Please note the settings of the cpp directives ({{{-D__nopointer -D__openacc -D__cuda_fft}}}) and the CUDA library path in {{{lopts}}}.\\The {{{nocache}}} compiler switch '''is not required any more'''. (Earlier compiler versions, e.g. 13.6, suffered a significant loss of performance when this switch was omitted.) The {{{time}}} switch creates and outputs performance data at the end of a run. Very useful! \\ \\
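In these configuration files, a colon separates the individual compiler options within one value; the build script replaces the colons with blanks before passing the options to the compiler. A minimal bash sketch of that translation (illustrative only, not the actual build script):

```shell
# The %fopts value from the configuration above; options are ':'-separated.
fopts="-acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-fastsse:-Mcuda=cuda6.0"

# Replace every ':' with a blank to obtain the flags as handed to the compiler.
echo "${fopts//:/ }"
```

This is why options that themselves contain a colon cannot be written directly into these values.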

- It might be necessary to load the modules manually before calling mbuild or mrun:
+ It might be necessary to load the modules manually before calling palmbuild or palmrun:
{{{
module load pgi/14.6 openmpi/1.8.3-pgi-cuda
...
export PGI_ACC_NOSYNCQUEUE=1
}}}
- before calling mrun! The second one is '''absolutely required''' in case of using the CUDA-fft ({{{fft_method='system_specific' + -D__cuda_fft}}}). If it is not used, the pressure solver does not reduce the divergence! \\
+ before calling palmrun! The second one is '''absolutely required''' in case of using the CUDA-fft ({{{fft_method='system_specific' + -D__cuda_fft}}}). If it is not used, the pressure solver does not reduce the divergence! \\

Compiler version 14.10 gives a runtime error when pres is called for the first time in init_3d_model:
...
Please note that {{{loop_optimization = 'acc'}}} and {{{psolver = 'poisfft'}}} have to be set. {{{fft_method = 'system-specific'}}} is required to switch on the CUDA-fft. All other fft methods do not run on the GPU, i.e. they are extremely slow. \\ \\
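These three parameters are set in the run's parameter file; a minimal sketch of the relevant entries (the {{{&inipar}}} group name and file layout are assumed from the usual PALM setup of that era, not stated on this page):
{{{
&inipar
!-- run on the GPU: accelerator loop variants, Poisson solver with CUDA-fft
    loop_optimization = 'acc',
    psolver           = 'poisfft',
    fft_method        = 'system-specific',
/
}}}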

- mrun-command to run on two GPU-devices:
+ palmrun-command to run on two GPU-devices:
{{{
- mrun -d acc_medium -h lcmuk -K "parallel pgigpu146" -X2 -T2 -r "d3#"
+ palmrun -r acc_medium -c lcmuk -K "parallel pgigpu146" -X2 -T2 -a "d3#"
}}}
 \\

Runs on a single GPU without MPI (i.e. no domain decomposition) require this configuration:
{{{
- %compiler_name     pgf90                                                       lcmuk pgigpu146
- %compiler_name_ser pgf90                                                       lcmuk pgigpu146
- %cpp_options       -Mpreprocess:-D__nopointer:-D__openacc:-D__cuda_fft:-D__lc  lcmuk pgigpu146
- %fopts             -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-Mcray=pointer:-fastsse:-Mcuda=cuda6.0  lcmuk pgigpu146
- %lopts             -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-Mcray=pointer:-fastsse:-Mcuda=cuda6.0:-lcufft  lcmuk pgigpu146
+ %compiler_name     pgf90
+ %compiler_name_ser pgf90
+ %cpp_options       -Mpreprocess:-D__nopointer:-D__openacc:-D__cuda_fft:-D__lc
+ %fopts             -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-Mcray=pointer:-fastsse:-Mcuda=cuda6.0
+ %lopts             -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-Mcray=pointer:-fastsse:-Mcuda=cuda6.0:-lcufft

}}}
Run it with
{{{
- mrun -d acc_medium -K pgigpu146 -r "d3#"
+ palmrun -r acc_medium -K pgigpu146 -a "d3#"
}}}
 \\