Changes between Version 17 and Version 18 of doc/tec/gpu


Timestamp:
Feb 9, 2016 4:38:15 PM
Author:
raasch
Comment:

--

  • doc/tec/gpu

    v17 v18  
    3131%lopts             -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-fastsse:-Mcuda=cuda6.0:-lcufft  lcmuk parallel pgigpu146
    3232}}}
    33 The {{{nocache}}} compiler switch is currently required. Otherwise there would be a significant loss of performance.
    34 It might be necessary to load the modules manually before calling mbuild or mrun:
     33The {{{nocache}}} compiler switch '''is not required any more'''. (Earlier compiler versions, e.g. 13.6, showed a significant loss of performance if this switch was omitted.) The {{{time}}} switch creates and outputs performance data at the end of a run. Very useful! \\ \\
     34 
     35It might be necessary to load the modules manually before calling mbuild or mrun:
    3536{{{
    3637module load pgi/14.6 openmpi/1.8.3-pgi-cuda
    3738}}}
    38 Furthermore, it is required to set the environment variable
     39Furthermore, it is required to set the environment variables
    3940{{{
    4041export OMPI_COMM_WORLD_LOCAL_RANK=1
     42export PGI_ACC_NOSYNCQUEUE=1
    4143}}}
    42 before calling mrun!  Compiler version 14.10 gives a runtime error when pres is called for the first time in init_3d_model.
    43  
    44 A test parameter-set:
      44before calling mrun! The second one is '''absolutely required''' when the CUDA-fft is used ({{{fft_method='system_specific' + -D__cuda_fft}}}). If it is not set, the pressure solver does not reduce the divergence!
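
For orientation, a minimal and purely hypothetical excerpt showing how the CUDA-fft could be selected in the parameter file; only the {{{fft_method}}} value is taken from the line above, the NAMELIST group name and layout are assumed, and {{{-D__cuda_fft}}} additionally has to be set as a preprocessor directive in the configuration file:
{{{
 &inipar  fft_method = 'system_specific'  /
}}}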
     45
      46Compiler version 14.10 gives a runtime error when {{{pres}}} is called for the first time in {{{init_3d_model}}}:
     47{{{
     48cuEventRecord returned error 400: Invalid handle
     49}}}
      50I guess that this problem is also somehow connected with the usage of streams. I got the following information from Mat Colgrove (NVIDIA/PGI):
     51{{{
     52We were able to determine the issue with calling cuFFT (TPR#20579). In 14.4 we stopped using stream 0 as the default
     53stream for OpenACC, since stream 0 has some special properties that made asynchronous behavior difficult. The problem with that is,
     54if combined with calling a CUDA code which still uses stream 0, the streams and hence the data can get out of sync.
     55In 14.7, we'll change OpenACC to use stream 0 again if "-Mcuda" is used.
     56
     57In the meantime, you can set the environment variable "PGI_ACC_NOSYNCQUEUE=1" to work around the issue.
     58}}}
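
To illustrate the situation described in the quote, a minimal sketch ('''not''' PALM code, all names hypothetical except the cuFFT routines): an OpenACC data region hands a device array to cuFFT via {{{host_data}}}. cuFFT executes on CUDA stream 0, while the OpenACC runtime of PGI 14.4-14.6 uses a different default stream, so without {{{PGI_ACC_NOSYNCQUEUE=1}}} the transform and the OpenACC data operations may run on different streams and get out of sync:
{{{
! Minimal sketch, not PALM code: an OpenACC data region handing a device
! array to a cuFFT call. cuFFT executes on CUDA stream 0, whereas the
! OpenACC runtime of PGI 14.4-14.6 uses a different default stream, so
! both have to be forced onto the same queue (PGI_ACC_NOSYNCQUEUE=1).
PROGRAM fft_stream_sketch

   USE cufft          ! cuFFT interface module shipped with the PGI compiler
   IMPLICIT NONE

   INTEGER, PARAMETER ::  nx = 64
   COMPLEX(8)         ::  a(nx)
   INTEGER            ::  plan, istat

   a = ( 1.0_8, 0.0_8 )
   istat = cufftPlan1d( plan, nx, CUFFT_Z2Z, 1 )

!$acc data copy( a )
!$acc host_data use_device( a )
   istat = cufftExecZ2Z( plan, a, a, CUFFT_FORWARD )   ! runs on stream 0
!$acc end host_data
!$acc end data

   istat = cufftDestroy( plan )

END PROGRAM fft_stream_sketch
}}}
The sketch would be compiled and linked with options like those given at the top of the page ({{{-acc -ta=tesla,... -Mcuda=cuda6.0 -lcufft}}}).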
     59
     60A test parameter-set can be found here:
    4561{{{
    4662/home/raasch/current_version/JOBS/acc_medium/INPUT/acc_medium_p3d
     
    112128* Routine {{{surface_layer_fluxes}}}: there are some loops (DO WHILE, DO loops without a specific loop counter, etc.) which cannot be vectorized (see the sketch below this list)
    113129* Routine {{{swap_timelevel}}}: Why can't the compiler vectorize FORTRAN vector assignments like {{{u = u_p}}}?
    114 * Routine {{{timestep}}}: Is there a chance that the FORTRAN functions {{{MINLOC}}} and {{{MAXLOC}}}, which are used in routine {{{global_min_max}}}, are directly supported on the GPU? 
     130* Routine {{{timestep}}}: Is there a chance that the FORTRAN functions {{{MINLOC}}} and {{{MAXLOC}}}, which are used in routine {{{global_min_max}}}, are directly supported on the GPU?
    115131
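For illustration, a minimal and purely hypothetical sketch (not taken from {{{surface_layer_fluxes}}}) of the loop type meant in the first item of the list above: the trip count of the inner DO WHILE depends on a convergence criterion and is therefore unknown at compile time, which prevents vectorization of that loop. The surrounding countable loop could still be parallelized.
{{{
! Hypothetical sketch of the loop type that cannot be vectorized:
! the DO WHILE iterates until convergence, so its trip count is
! unknown at compile time.
PROGRAM do_while_sketch

   IMPLICIT NONE

   INTEGER  ::  i
   REAL(8)  ::  f, f_old, res(100), z0(100)

   z0 = 0.1_8

   DO  i = 1, 100                                  ! countable, can be parallelized over i
      f_old = 0.0_8
      f     = 1.0_8
      DO WHILE ( ABS( f - f_old ) > 1.0E-5_8 )     ! iterate until converged
         f_old = f
         f     = 0.5_8 * ( f_old + z0(i) / f_old )
      ENDDO
      res(i) = f
   ENDDO

   PRINT*, 'res(1) = ', res(1)

END PROGRAM do_while_sketch
}}}
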
     132'''Things that still need to be ported:'''
     133* multigrid-solver
     134* cloud physics
     135* stuff related to non-cyclic BC
     136* random-number generator
     137* the complete LPM
     138
     139