Changes between Version 20 and Version 21 of doc/tec/gpu
- Timestamp: Nov 21, 2018 5:10:59 PM
doc/tec/gpu
 Configuration file settings should be as followed:
 {{{
-%remote_username    <replace>   lcmuk parallel pgigpu146
-#%modules           pgi/14.6:mvapich2/2.0-pgi-cuda   lcmuk parallel pgigpu146   # mvapich doesn't work so far
-%modules            pgi/14.6:openmpi/1.8.3-pgi-cuda   lcmuk parallel pgigpu146
-%tmp_user_catalog   /tmp   lcmuk parallel pgigpu146
-%compiler_name      mpif90   lcmuk parallel pgigpu146
-%compiler_name_ser  pgf90   lcmuk parallel pgigpu146
-%cpp_options        -Mpreprocess:-DMPI_REAL=MPI_DOUBLE_PRECISION:-DMPI_2REAL=MPI_2DOUBLE_PRECISION:-D__nopointer:-D__openacc:-D__cuda_fft:-D__lc   lcmuk parallel pgigpu146
-%mopts              -j:1   lcmuk parallel pgigpu146
-%fopts              -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-fastsse:-Mcuda=cuda6.0   lcmuk parallel pgigpu146
-%lopts              -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-fastsse:-Mcuda=cuda6.0:-lcufft   lcmuk parallel pgigpu146
+%remote_username    <replace>
+#%modules           pgi/14.6:mvapich2/2.0-pgi-cuda   # mvapich doesn't work so far
+%modules            pgi/14.6:openmpi/1.8.3-pgi-cuda
+%tmp_user_catalog   /tmp
+%compiler_name      mpif90
+%compiler_name_ser  pgf90
+%cpp_options        -Mpreprocess:-DMPI_REAL=MPI_DOUBLE_PRECISION:-DMPI_2REAL=MPI_2DOUBLE_PRECISION:-D__nopointer:-D__openacc:-D__cuda_fft:-D__lc
+%mopts              -j:1
+%fopts              -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-fastsse:-Mcuda=cuda6.0
+%lopts              -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-fastsse:-Mcuda=cuda6.0:-lcufft
 }}}
 Please note settings of cpp-directives ({{{-D__nopointer -D__openacc -D__cuda_fft}}} + CUDA library path in {{{lopts}}}).\\The {{{nocache}}} compiler switch '''is not required any more'''. (Earlier compiler versions, e.g. 13.6 gave a significant loss of performance in case of omitting this switch). The {{{time}}}-switch creates and outputs performance data at the end of a run. Very useful!
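As a rough illustration of how the {{{%fopts}}} value above maps to compiler flags, the sketch below expands the colon-separated list and drops the {{{nocache}}} sub-option that the note says is no longer required. It assumes that colons in a configuration value stand in for blanks; the helper names {{{expand_opts}}} and {{{drop_nocache}}} are hypothetical and not part of PALM.

```shell
#!/bin/sh
# Value copied from the %fopts line of the configuration shown above.
fopts='-acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-fastsse:-Mcuda=cuda6.0'

# Hypothetical helper: turn a colon-separated option list into
# space-separated flags (assumption: colons stand in for blanks).
expand_opts() {
    printf '%s\n' "$1" | tr ':' ' '
}

# Hypothetical helper: remove the nocache sub-option from the list.
drop_nocache() {
    printf '%s\n' "$1" | sed 's/,nocache//'
}

flags=$(expand_opts "$fopts")
flags_no_cache=$(expand_opts "$(drop_nocache "$fopts")")

echo "with nocache:    $flags"
echo "without nocache: $flags_no_cache"
```

The {{{time}}} sub-option is kept in both variants, since it is the switch that produces the performance data at the end of a run.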
 \\ \\

-It might be necessary to load the modules manually before calling mbuild or mrun:
+It might be necessary to load the modules manually before calling palmbuild or palmrun:
 {{{
 module load pgi/14.6 openmpi/1.8.3-pgi-cuda
 …
 export PGI_ACC_NOSYNCQUEUE=1
 }}}
-before calling mrun! The second one is '''absolutely required''' in case of using the CUDA-fft ({{{fft_method='system_specific' + -D__cuda_fft}}}). If it is not used, the pressure solver does not reduce the divergence! \\
+before calling palmrun! The second one is '''absolutely required''' in case of using the CUDA-fft ({{{fft_method='system_specific' + -D__cuda_fft}}}). If it is not used, the pressure solver does not reduce the divergence! \\

 Compiler version 14.10 gives a runtime error when pres is called for the first time in init_3d_model:
 …
 Please note that {{{loop_optimization = 'acc'}}} and {{{psolver = 'poisfft'}}} have to be set. {{{fft_method = 'system-specific'}}} is required to switch on the CUDA-fft. All other fft-methods do not run on the GPU, i.e. they are extremely slow. \\ \\

-mrun-command to run on two GPU-devices:
+palmrun-command to run on two GPU-devices:
 {{{
-mrun -d acc_medium -h lcmuk -K "parallel pgigpu146" -X2 -T2 -r"d3#"
+palmrun -r acc_medium -c lcmuk -K "parallel pgigpu146" -X2 -T2 -a "d3#"
 }}}
 \\
 …
 Runs on a single GPU without MPI (i.e.
 no domain decomposition) require this configuration:
 {{{
-%compiler_name      pgf90   lcmuk pgigpu146
-%compiler_name_ser  pgf90   lcmuk pgigpu146
-%cpp_options        -Mpreprocess:-D__nopointer:-D__openacc:-D__cuda_fft:-D__lc   lcmuk pgigpu146
-%fopts              -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-Mcray=pointer:-fastsse:-Mcuda=cuda6.0   lcmuk pgigpu146
-%lopts              -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-Mcray=pointer:-fastsse:-Mcuda=cuda6.0:-lcufft   lcmuk pgigpu146
+%compiler_name      pgf90
+%compiler_name_ser  pgf90
+%cpp_options        -Mpreprocess:-D__nopointer:-D__openacc:-D__cuda_fft:-D__lc
+%fopts              -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-Mcray=pointer:-fastsse:-Mcuda=cuda6.0
+%lopts              -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-Mcray=pointer:-fastsse:-Mcuda=cuda6.0:-lcufft

 }}}
 Run it with
 {{{
-mrun -d acc_medium -K pgigpu146 -r"d3#"
+palmrun -r acc_medium -K pgigpu146 -a "d3#"
 }}}
 \\
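Because a missing environment setting makes the pressure solver silently fail to reduce the divergence, a small pre-flight check before calling palmrun may be helpful. The sketch below is only an illustration: the function name {{{check_gpu_env}}} is hypothetical, and it assumes that {{{PGI_ACC_NOSYNCQUEUE}}} is the variable the page describes as absolutely required for the CUDA-fft. The elided export lines of the page are not reproduced here.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of PALM): refuse to continue
# unless PGI_ACC_NOSYNCQUEUE is set to 1, as the page requires for the CUDA-fft.
check_gpu_env() {
    if [ "${PGI_ACC_NOSYNCQUEUE:-}" != "1" ]; then
        echo "PGI_ACC_NOSYNCQUEUE must be set to 1 when the CUDA-fft is used" >&2
        return 1
    fi
    return 0
}

export PGI_ACC_NOSYNCQUEUE=1

if check_gpu_env; then
    # Once the check passes, the real run command would go here, e.g.
    #   palmrun -r acc_medium -K pgigpu146 -a "d3#"
    echo "GPU environment looks ok"
fi
```

Keeping the check in a function makes it easy to call from a job script right after the {{{module load}}} step.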