Changes between Version 13 and Version 14 of doc/tec/gpu
- Timestamp: Feb 8, 2016 2:33:27 PM
doc/tec/gpu
@@ -18,8 +18,34 @@
 The session uses 2 Intel-CPU-cores for 10000 seconds. You will need two cores if you want to run PALM on both of the two K40 boards.
 
-Tests can be done on host {{{inferno}}} only, using the PGI-FORTRAN compiler. Required settings:
+Configuration file settings should be as follows:
 {{{
-module load pgi-compiler/2013-136
+%remote_username <replace> lcmuk parallel pgigpu146
+#%modules pgi/14.6:mvapich2/2.0-pgi-cuda lcmuk parallel pgigpu146 # mvapich doesn't work so far
+%modules pgi/14.6:openmpi/1.8.3-pgi-cuda lcmuk parallel pgigpu146
+%tmp_user_catalog /tmp lcmuk parallel pgigpu146
+%compiler_name mpif90 lcmuk parallel pgigpu146
+%compiler_name_ser pgf90 lcmuk parallel pgigpu146
+%cpp_options -Mpreprocess:-DMPI_REAL=MPI_DOUBLE_PRECISION:-DMPI_2REAL=MPI_2DOUBLE_PRECISION:-D__nopointer:-D__openacc:-D__cuda_fft:-D__lc lcmuk parallel pgigpu146
+%mopts -j:1 lcmuk parallel pgigpu146
+%fopts -acc:-ta=tesla,6.0,nocache,time:-Minfo=acc:-fastsse:-Mcuda=cuda6.0 lcmuk parallel pgigpu146
+%lopts -acc:-ta=tesla,6.0,nocache,time:-Minfo=acc:-fastsse:-Mcuda=cuda6.0:-lcufft lcmuk parallel pgigpu146
 }}}
+The {{{nocache}}} compiler switch is currently required. Otherwise there would be a significant loss of performance.
+It might be necessary to load the modules manually before calling mbuild or mrun:
+{{{
+module load pgi/14.6 openmpi/1.8.3-pgi-cuda
+}}}
+Furthermore, it is required to set the environment variable
+{{{
+export OMPI_COMM_WORLD_LOCAL_RANK=1
+}}}
+before calling mrun! Compiler version 14.10 gives a runtime error when pres is called for the first time in init_3d_model.
+
+A test parameter-set:
+{{{
+/home/raasch/current_version/JOBS/acc_medium/INPUT/acc_medium_p3d
+}}}
+
+Here are some hints for running the single-GPU (no-MPI) version:\\
 Compiler settings are given in
 {{{
@@ -55,4 +81,7 @@
 Reduction operations in {{{pres}}} and {{{flow_statistics}}} ported.
 
+r1747 \\
+Partial adjustments for new surface layer scheme. Version is (in principle) instrumented to run on multiple GPUs.
+
 '''Results for 256x256x64 grid (time in micro-s per gridpoint and timestep):''' \\
 ||.1 ||1*Tesla, single-core (no MPI), pgi13.6 ||0.33342 ||r1221 ||
@@ -61,5 +90,5 @@
 The initialization time of the GPU (power up) can be avoided by running {{{/muksoft/packages/pgi/2013-136/linux86-64/13.6/bin/pgcudainit}}} in background.
 
-For current PGI compiler version 13.6, use "-ta=nocache" and set environment variable {{{PGI_ACC_SYNCHRONOUS=1}}}. Otherwise, there will be a significant loss in performance (factor of two!).
+For PGI compiler version 13.6, use "-ta=nocache" and set environment variable {{{PGI_ACC_SYNCHRONOUS=1}}}. Otherwise, there will be a significant loss in performance (factor of two!).
 
 '''Next steps:'''