Changes between Version 17 and Version 18 of doc/tec/gpu
Timestamp: Feb 9, 2016, 4:38:15 PM
doc/tec/gpu
{{{
%lopts -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-fastsse:-Mcuda=cuda6.0:-lcufft lcmuk parallel pgigpu146
}}}
The {{{nocache}}} compiler switch '''is not required any more''' (earlier compiler versions, e.g. 13.6, gave a significant loss of performance if this switch was omitted). The {{{time}}} switch creates and outputs performance data at the end of a run. Very useful! \\ \\
It might be necessary to load the modules manually before calling mbuild or mrun:
{{{
module load pgi/14.6 openmpi/1.8.3-pgi-cuda
}}}
Furthermore, it is required to set the environment variables
{{{
export OMPI_COMM_WORLD_LOCAL_RANK=1
export PGI_ACC_NOSYNCQUEUE=1
}}}
before calling mrun! The second one is '''absolutely required''' if the CUDA-fft is used ({{{fft_method='system_specific' + -D__cuda_fft}}}). If it is not set, the pressure solver does not reduce the divergence!

Compiler version 14.10 gives a runtime error when pres is called for the first time in init_3d_model:
{{{
cuEventRecord returned error 400: Invalid handle
}}}
I guess that this problem is also somehow connected with the usage of streams. I got the following information from Mat Colgrove (NVidia/PGI):
{{{
We were able to determine the issue with calling cuFFT (TPR#20579). In 14.4 we stopped using stream 0 as the default
stream for OpenACC since stream 0 has some special properties that made asynchronous behavior. The problem with that
if combined with a calling a CUDA code, which still uses stream 0, the streams and hence the data can get out of sync.
In 14.7, we'll change OpenACC to use stream 0 again if "-Mcuda" is used.

In the meantime, you can set the environment variable "PGI_ACC_NOSYNCQUEUE=1" to work around the issue.
}}}
A sketch illustrating this stream-synchronization issue is given at the end of this section.

A test parameter-set can be found here:
{{{
/home/raasch/current_version/JOBS/acc_medium/INPUT/acc_medium_p3d
}}}

…

* Routine {{{surface_layer_fluxes}}}: there are some loops (DO WHILE, DO without a specific loop counter, etc.) which cannot be vectorized
* Routine {{{swap_timelevel}}}: Why can't the compiler vectorize FORTRAN array assignments like {{{u = u_p}}}?
* Routine {{{timestep}}}: Is there a chance that the FORTRAN functions {{{MINLOC}}} and {{{MAXLOC}}}, which are used in routine {{{global_min_max}}}, are directly supported on the GPU? (A possible workaround is sketched at the end of this section.)

'''Things that still need to be ported:'''
* multigrid-solver
* cloud physics
* stuff related with non-cyclic BC
* random-number generator
* the complete LPM
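To make the stream problem described above more concrete, here is a minimal, purely illustrative Fortran/OpenACC sketch of the pattern that arises when OpenACC kernels and cuFFT work on the same device data. It is '''not''' the actual PALM code: the routine and array names, the hand-written cuFFT interface, and the explicit {{{!$acc wait}}} are assumptions for illustration only. The point is that the OpenACC work may run on a non-zero CUDA stream (with PGI 14.4 and later even the default queue does), while cuFFT without an explicitly set stream uses stream 0, so the two must be synchronized, either explicitly as below or by forcing OpenACC back onto stream 0 via {{{PGI_ACC_NOSYNCQUEUE=1}}}.
{{{
!
!-- Minimal, purely illustrative sketch (NOT the actual PALM code): an
!-- asynchronous OpenACC compute region followed by a cuFFT call on the same
!-- device data. The array "ar" is assumed to be present on the device
!-- (enclosing data region in the caller) and "plan" to be a valid cuFFT plan.
SUBROUTINE fft_stream_sketch( ar, plan, nx, ny )

   USE, INTRINSIC ::  ISO_C_BINDING

   IMPLICIT NONE

   INTERFACE
      INTEGER(C_INT) FUNCTION cufftExecZ2Z( plan, idata, odata, direction )  &
                              BIND( C, NAME='cufftExecZ2Z' )
         IMPORT ::  C_INT, C_DOUBLE_COMPLEX
         INTEGER(C_INT), VALUE                    ::  plan, direction
         COMPLEX(C_DOUBLE_COMPLEX), DIMENSION(*)  ::  idata, odata
      END FUNCTION cufftExecZ2Z
   END INTERFACE

   INTEGER(C_INT), PARAMETER ::  CUFFT_FORWARD = -1

   INTEGER(C_INT) ::  plan
   INTEGER        ::  i, ierr, j, nx, ny

   COMPLEX(C_DOUBLE_COMPLEX), DIMENSION(0:nx,0:ny) ::  ar

!
!-- OpenACC work on ar, placed on async queue 1, which the PGI runtime maps
!-- to a non-zero CUDA stream
   !$acc kernels present( ar ) async( 1 )
   DO  j = 0, ny
      DO  i = 0, nx
         ar(i,j) = ar(i,j) * 2.0_C_DOUBLE
      ENDDO
   ENDDO
   !$acc end kernels

!
!-- Make sure queue 1 has finished before cuFFT (stream 0) reads the data;
!-- otherwise the FFT may operate on stale values
   !$acc wait( 1 )

!
!-- Pass the device address of ar to cuFFT (in-place forward transform)
   !$acc host_data use_device( ar )
   ierr = cufftExecZ2Z( plan, ar, ar, CUFFT_FORWARD )
   !$acc end host_data

END SUBROUTINE fft_stream_sketch
}}}
In PALM the mismatch came from the compiler's default queue rather than from an explicit {{{async}}} clause; there the environment variable workaround quoted above (or, from 14.7 on, {{{-Mcuda}}} itself) is the simpler fix.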
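Regarding the {{{MINLOC}}}/{{{MAXLOC}}} question in the list of routine notes above: as long as these intrinsics are not supported inside accelerator regions, they can be emulated with plain loops and OpenACC reduction clauses. The following is only a rough, hypothetical sketch (routine and argument names are made up; this is not the PALM implementation of {{{global_min_max}}}): the first pass finds the maximum value with a {{{reduction(max:...)}}} clause, the second pass finds the smallest index at which it occurs.
{{{
!
!-- Rough, hypothetical sketch (not the PALM implementation of
!-- global_min_max): emulating MAXLOC with two OpenACC passes. The array w
!-- is assumed to be present on the device (enclosing data region).
SUBROUTINE global_max_sketch( w, n, wmax, imax )

   IMPLICIT NONE

   INTEGER               ::  i, imax, n
   REAL(8)               ::  wmax
   REAL(8), DIMENSION(n) ::  w

!
!-- Pass 1: maximum value via a reduction clause
   wmax = -HUGE( 1.0_8 )
   !$acc parallel loop present( w ) reduction( max: wmax )
   DO  i = 1, n
      wmax = MAX( wmax, w(i) )
   ENDDO

!
!-- Pass 2: smallest index at which the maximum occurs (this is what
!-- MAXLOC would return)
   imax = n + 1
   !$acc parallel loop present( w ) reduction( min: imax )
   DO  i = 1, n
      IF ( w(i) == wmax )  imax = MIN( imax, i )
   ENDDO

END SUBROUTINE global_max_sketch
}}}
The second pass costs an extra sweep over the data, so this is only attractive if it avoids copying the whole field back to the host.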