== Porting the code to NVIDIA GPU using the OpenACC programming model

Tests can currently be done only on host {{{inferno}}}, using the PGI Fortran compiler. Required environment settings:
{{{
export LM_LICENSE_FILE=27000@lizenzserv.rrzn.uni-hannover.de
export PATH=/localdata/opt/mpich2/1.4.1p1/bin:$PATH
export PATH=$PATH:/muksoft/packages/intel/bin:/muksoft/bin
export PATH=$PATH:/localdata/opt/pgi/linux86-64/12.5/bin:/usr/local/cuda/bin
}}}
Compiler settings are given in
{{{
.../trunk/SCRIPTS/.mrun.config.imuk_gpu
}}}
Please note the settings of the cpp directives.\\
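
For illustration only (the switch name {{{__openacc}}} and the subroutine are assumptions, not taken from the config file or the PALM source): GPU-specific code paths are typically guarded by cpp directives of this form, so that the same source still compiles without OpenACC:
{{{
!-- Sketch: a cpp-guarded device update; the guarded part is compiled
!-- only if -D__openacc is set among the cpp options
SUBROUTINE update_device_copies( u, v, w )

   IMPLICIT NONE
   REAL, DIMENSION(:,:,:), INTENT(INOUT) ::  u, v, w

#if defined( __openacc )
   !$acc update device( u, v, w )
#endif

END SUBROUTINE update_device_copies
}}}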
Test parameter set:
{{{
/home/raasch/current_version/JOBS/gputest/INPUT/gputest_p3d
}}}
Please note that {{{loop_optimization = 'acc'}}} has to be set. Results of the tests are stored in the respective {{{MONITORING}}} directory.
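
For orientation, a minimal sketch of the relevant line in the parameter file (the grid values are placeholders, not the actual gputest settings):
{{{
 &inipar nx = 63, ny = 63, nz = 64,
         dx = 2.0, dy = 2.0, dz = 2.0,
         loop_optimization = 'acc', /
}}}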

'''Report on current activities:'''

r1015 \\
prognostic equations (partly: q and sa are still missing), prandtl_fluxes, and diffusivities have been ported \\
statistics have not been ported at all \\
the speedup seems to be similar to what has been reported by Klaus Ketelsen \\
measurements with the Intel compiler on {{{inferno}}} still have to be carried out
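
As an illustration of the directive pattern used for such loops (a minimal sketch, not the actual PALM code: a simple Euler update of one velocity component is assumed, and an enclosing {{{!$acc data}}} region is assumed to have copied the arrays to the device):
{{{
!-- Sketch: an OpenACC kernels region over a prognostic-equation-like
!-- triple loop, using PALM's (k,j,i) array ordering and loop bounds
SUBROUTINE euler_step( nzb, nzt, nys, nyn, nxl, nxr, dt, tend, u )

   IMPLICIT NONE

   INTEGER, INTENT(IN) ::  nzb, nzt, nys, nyn, nxl, nxr
   REAL, INTENT(IN)    ::  dt
   REAL, INTENT(IN)    ::  tend(nzb:nzt,nys:nyn,nxl:nxr)
   REAL, INTENT(INOUT) ::  u(nzb:nzt,nys:nyn,nxl:nxr)

   INTEGER ::  i, j, k

!-- present() requires that an enclosing data region has already put
!-- the arrays on the device, so no transfer happens here
   !$acc kernels present( tend, u )
   DO  i = nxl, nxr
      DO  j = nys, nyn
         DO  k = nzb+1, nzt
            u(k,j,i) = u(k,j,i) + dt * tend(k,j,i)
         ENDDO
      ENDDO
   ENDDO
   !$acc end kernels

END SUBROUTINE euler_step
}}}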

'''Results''' (the numbers are the output file cycles in the respective {{{MONITORING}}} directory):
|| .6 || pgf90 without any acc kernels ||
|| .31 || last acc version ||
|| .32 || ifort (on bora) using acc-branch ||
|| .34 || ifort (on bora) using vector-branch ||

'''Next steps:'''

* porting the Poisson solver following Klaus' suggestions (there is still a bug in his last version) and implementing a fast tridiagonal solver for the GPU (see the first sketch after this list)
* creating a single-core version (without MPI), so that host-device transfers are minimized
* testing the PGI 12.6 compiler version; porting flow_statistics if reductions are implemented (see the second sketch after this list); checking the capability of parallel regions
* updating ghost boundaries only; can the update/MPI be overlapped with computation?
* overlapping communication and computation
* porting the remaining parts (averaging, I/O, etc.)
* ...
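
A minimal sketch of the fast tridiagonal solver idea from the first item (all names are hypothetical; this is the textbook Thomas algorithm, not Klaus' version and not PALM's tridia routine): the recursion is inherently sequential along k, but every (j,i) column is an independent system, so the horizontal loops can be parallelized on the GPU:
{{{
!-- Sketch: Thomas algorithm, parallel over the horizontal columns;
!-- a, b, c are the lower/main/upper diagonals (constant along k here),
!-- rhs holds the right-hand side and is overwritten by the solution
SUBROUTINE tridia_gpu( nzb, nzt, nys, nyn, nxl, nxr, a, b, c, rhs )

   IMPLICIT NONE

   INTEGER, INTENT(IN) ::  nzb, nzt, nys, nyn, nxl, nxr
   REAL, INTENT(IN)    ::  a(nzb+1:nzt), b(nzb+1:nzt), c(nzb+1:nzt)
   REAL, INTENT(INOUT) ::  rhs(nzb+1:nzt,nys:nyn,nxl:nxr)

   INTEGER ::  i, j, k
   REAL    ::  cp(nzb+1:nzt), dp(nzb+1:nzt)  ! per-column work arrays

!-- every (j,i) column is independent: parallelize the horizontal
!-- loops, keep the k recursion sequential within each column
   !$acc parallel loop collapse(2) private( k, cp, dp ) &
   !$acc present( a, b, c, rhs )
   DO  i = nxl, nxr
      DO  j = nys, nyn
!--      forward elimination
         cp(nzb+1) = c(nzb+1) / b(nzb+1)
         dp(nzb+1) = rhs(nzb+1,j,i) / b(nzb+1)
         DO  k = nzb+2, nzt
            cp(k) = c(k) / ( b(k) - a(k) * cp(k-1) )
            dp(k) = ( rhs(k,j,i) - a(k) * dp(k-1) ) / &
                    ( b(k) - a(k) * cp(k-1) )
         ENDDO
!--      back substitution
         rhs(nzt,j,i) = dp(nzt)
         DO  k = nzt-1, nzb+1, -1
            rhs(k,j,i) = dp(k) - cp(k) * rhs(k+1,j,i)
         ENDDO
      ENDDO
   ENDDO

END SUBROUTINE tridia_gpu
}}}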
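
And, for the reduction mentioned in the third item, a minimal self-contained sketch (made-up names), since porting flow_statistics hinges on the compiler supporting the reduction clause:
{{{
!-- Sketch: a sum reduction over a 3d field, the basic operation
!-- behind the horizontal averages in flow_statistics
PROGRAM reduction_sketch

   IMPLICIT NONE

   INTEGER, PARAMETER ::  n = 128
   INTEGER ::  i, j, k
   REAL    ::  s
   REAL    ::  var(n,n,n)

   var = 1.0
   s   = 0.0

!-- every thread accumulates into a private copy of s; the partial
!-- sums are combined when the region ends
   !$acc parallel loop collapse(3) reduction(+:s) copyin( var )
   DO  i = 1, n
      DO  j = 1, n
         DO  k = 1, n
            s = s + var(k,j,i)
         ENDDO
      ENDDO
   ENDDO

   PRINT*, 'sum = ', s   ! expected: n**3

END PROGRAM reduction_sketch
}}}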