Changes between Initial Version and Version 1 of doc/tec/gpu


Timestamp: Sep 27, 2012 9:17:48 AM
Author: raasch

== Porting the code to NVIDIA GPUs using the OpenACC programming model

Tests can only be run on host {{{inferno}}}, using the PGI Fortran compiler. Required settings:
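{{{
# license server (port@host) for the PGI compiler
export LM_LICENSE_FILE=27000@lizenzserv.rrzn.uni-hannover.de
# MPICH2 1.4.1p1, prepended so that its tools are found first
export PATH=/localdata/opt/mpich2/1.4.1p1/bin:$PATH
# Intel tools and locally installed software
export PATH=$PATH:/muksoft/packages/intel/bin:/muksoft/bin
# PGI 12.5 compilers and CUDA tools
export PATH=$PATH:/localdata/opt/pgi/linux86-64/12.5/bin:/usr/local/cuda/bin
}}}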
Compiler settings are given in
{{{
.../trunk/SCRIPTS/.mrun.config.imuk_gpu
}}}
Please note the settings of the cpp directives.\\
Test parameter set:
{{{
/home/raasch/current_version/JOBS/gputest/INPUT/gputest_p3d
}}}
Please note that {{{loop_optimization = 'acc'}}} has to be set. Test results are stored in the respective {{{MONITORING}}} directory.
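
As far as this page indicates, {{{loop_optimization}}} selects at run time which branch of the code is executed (compare the acc and vector branches in the results). The following C sketch illustrates such a run-time switch under simplifying assumptions: it is not PALM code, the names {{{update_vector}}}, {{{update_acc}}} and {{{select_update}}} are hypothetical, and each variant only adds {{{dt * tend}}} to a field.

{{{#!c
#include <stddef.h>
#include <string.h>

/* Common signature of the (hypothetical) loop variants:
   advance a field u of n points by dt * tend. */
typedef void (*update_fn)(double *u, const double *tend, double dt, size_t n);

/* "vector"-style variant: one long loop over all points on the host CPU. */
static void update_vector(double *u, const double *tend, double dt, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        u[i] += dt * tend[i];
}

/* "acc"-style variant: the same loop offloaded with OpenACC. A compiler
   without OpenACC support ignores the pragma and runs the loop on the host. */
static void update_acc(double *u, const double *tend, double dt, size_t n)
{
#pragma acc parallel loop copy(u[0:n]) copyin(tend[0:n])
    for (size_t i = 0; i < n; ++i)
        u[i] += dt * tend[i];
}

/* Maps a loop_optimization-like string to a variant. Returns NULL for
   unknown values so that the caller can stop with an error message. */
update_fn select_update(const char *loop_optimization)
{
    if (strcmp(loop_optimization, "vector") == 0)
        return update_vector;
    if (strcmp(loop_optimization, "acc") == 0)
        return update_acc;
    return NULL;
}
}}}

Since both variants share one signature, a test can call them on identical input and compare the results.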

'''Report on current activities:'''

r1015 \\
prognostic equations (partly: q and sa are missing), prandtl_fluxes, and diffusivities have been ported \\
statistics have not been ported at all \\
speedup seems to be similar to what has been reported by Klaus Ketelsen \\
measurements with the Intel compiler on {{{inferno}}} still have to be carried out
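
Porting a routine such as a prognostic equation with OpenACC typically means offloading the loops over the three grid indices and keeping the arrays on the GPU across kernels with a data region. A minimal C sketch of that pattern, assuming a simplified explicit vertical diffusion of a scalar with fixed bottom and top values; {{{diffuse_z}}} and {{{IDX}}} are hypothetical names and this is not the ported PALM code:

{{{#!c
#include <stddef.h>

/* Flattened index of point (i,j,k) with k (vertical) varying fastest,
   i.e. the memory layout of a Fortran array s(k,j,i). */
#define IDX(i, j, k, ny, nz) ((((size_t)(i) * (ny)) + (j)) * (nz) + (k))

/* Advances s by nsteps explicit steps of ds/dt = km * d2s/dz2 on an
   nx*ny*nz grid (stable only for km*dt/dz^2 <= 0.5). Values at k = 0 and
   k = nz-1 stay fixed; s_new is scratch space. The data region copies s to
   the device once and back once, instead of once per kernel and step. */
void diffuse_z(double *s, double *s_new, int nx, int ny, int nz,
               double km, double dt, double dz, int nsteps)
{
    size_t n = (size_t)nx * ny * nz;
    double c = km * dt / (dz * dz);

#pragma acc data copy(s[0:n]) create(s_new[0:n])
    {
        for (int step = 0; step < nsteps; ++step) {
            /* Kernel 1: new values at interior levels. */
#pragma acc parallel loop collapse(3) present(s[0:n], s_new[0:n])
            for (int i = 0; i < nx; ++i)
                for (int j = 0; j < ny; ++j)
                    for (int k = 1; k < nz - 1; ++k)
                        s_new[IDX(i, j, k, ny, nz)] = s[IDX(i, j, k, ny, nz)]
                            + c * (s[IDX(i, j, k + 1, ny, nz)]
                                   - 2.0 * s[IDX(i, j, k, ny, nz)]
                                   + s[IDX(i, j, k - 1, ny, nz)]);

            /* Kernel 2: copy the interior values back into s. */
#pragma acc parallel loop collapse(3) present(s[0:n], s_new[0:n])
            for (int i = 0; i < nx; ++i)
                for (int j = 0; j < ny; ++j)
                    for (int k = 1; k < nz - 1; ++k)
                        s[IDX(i, j, k, ny, nz)] = s_new[IDX(i, j, k, ny, nz)];
        }
    }
}
}}}

Without the enclosing data region, the arrays would be copied between host and device for every kernel launch.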

'''Results:''' \\
.6   pgf90 without any acc kernels \\
.31  last acc version \\
.32  ifort (on bora) using acc-branch \\
.34  ifort (on bora) using vector-branch \\\\

'''Next steps:'''

* porting the Poisson solver following Klaus' suggestions (there is still a bug in his latest version) and implementing a fast tridiagonal solver for the GPU
* creating a single-core version (without MPI, so that host-device transfer is minimized)
* testing PGI compiler version 12.6, porting flow_statistics if reduction is implemented, and checking the capabilities of parallel regions
* updating ghost boundaries only; overlapping of update/MPI and computation?
* overlapping communication
* porting the remaining parts (averaging, I/O, etc.)
* ...
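
For the fast tridiagonal solver, one common GPU approach is to solve the many independent tridiagonal systems of a problem (for example, one per horizontal wavenumber pair in an FFT-based Poisson solver) in parallel, each with the sequential Thomas algorithm. The C sketch below shows only this batched approach; the name {{{tridiag_batched}}} and the column-wise storage are assumptions, there is no pivoting (diagonally dominant systems are assumed), and alternatives such as cyclic reduction are not shown.

{{{#!c
#include <stddef.h>

/* Solves ncol independent tridiagonal systems of size nz with the Thomas
   algorithm. Row k of column m reads
       a[k]*x[k-1] + b[k]*x[k] + c[k]*x[k+1] = d[k],
   where a[0] and c[nz-1] are ignored. All arrays hold the columns one after
   another (k fastest). On return d holds the solutions; c is overwritten.
   No pivoting: the systems are assumed to be diagonally dominant. */
void tridiag_batched(const double *a, const double *b, double *c, double *d,
                     int ncol, int nz)
{
    size_t n = (size_t)ncol * nz;

    /* One column per parallel iteration; each column is solved sequentially. */
#pragma acc parallel loop copyin(a[0:n], b[0:n]) copy(c[0:n], d[0:n])
    for (int m = 0; m < ncol; ++m) {
        const double *am = a + (size_t)m * nz;
        const double *bm = b + (size_t)m * nz;
        double *cm = c + (size_t)m * nz;
        double *dm = d + (size_t)m * nz;

        /* Forward elimination: normalize rows and remove the sub-diagonal. */
        cm[0] = cm[0] / bm[0];
        dm[0] = dm[0] / bm[0];
#pragma acc loop seq
        for (int k = 1; k < nz; ++k) {
            double denom = bm[k] - am[k] * cm[k - 1];
            cm[k] = cm[k] / denom;
            dm[k] = (dm[k] - am[k] * dm[k - 1]) / denom;
        }

        /* Back substitution: x[nz-1] = d[nz-1], then upwards. */
#pragma acc loop seq
        for (int k = nz - 2; k >= 0; --k)
            dm[k] -= cm[k] * dm[k + 1];
    }
}
}}}

With few, long systems this gives little parallelism; it pays off when the number of independent columns is large.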