== Porting the code to NVIDIA GPUs using the OpenACC programming model ==

Tests can be done on host {{{inferno}}} only, using the PGI Fortran compiler.

Required settings:
{{{
export LM_LICENSE_FILE=27000@lizenzserv.rrzn.uni-hannover.de
export PATH=/localdata/opt/mpich2/1.4.1p1/bin:$PATH
export PATH=$PATH:/muksoft/packages/intel/bin:/muksoft/bin
export PATH=$PATH:/localdata/opt/pgi/linux86-64/12.5/bin:/usr/local/cuda/bin
}}}

Compiler settings are given in
{{{
.../trunk/SCRIPTS/.mrun.config.imuk_gpu
}}}
Please note the settings of the cpp-directives.\\

Test parameter set:
{{{
/home/raasch/current_version/JOBS/gputest/INPUT/gputest_p3d
}}}
Please note that {{{loop_optimization = 'acc'}}} has to be set. Results of tests are stored in the respective {{{MONITORING}}} directory.

'''Report on current activities:'''

r1015 \\
prognostic equations (partly: q and sa are missing), prandtl_fluxes, and diffusivities have been ported \\
additional versions of the tendency subroutines have been created ({{{..._acc}}}) \\
statistics are not ported at all \\
speedup seems to be similar to what has been reported by Klaus Ketelsen \\
measurements with the Intel compiler on {{{inferno}}} still have to be carried out

'''Results:''' \\
.6 pgf90 without any acc kernels \\
.31 last acc version \\
.32 ifort (on bora) using acc-branch \\
.34 ifort (on bora) using vector-branch \\\\

'''Next steps:'''

 * porting the Poisson solver following Klaus' suggestions (there is still a bug in his last version), implementing a fast tridiagonal solver for the GPU
 * creating a single-core version (without using MPI, so that host-device transfer is minimized)
 * testing the PGI 12.6 compiler version, porting flow_statistics if reduction is implemented, checking the capability of parallel regions
 * update ghost boundaries only, overlapping of update/MPI and computation?
 * overlapping communication
 * porting of remaining things (averaging, I/O, etc.)
 * ...
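
For the tridiagonal solver mentioned under the next steps, the following is a minimal sketch in C with OpenACC directives. It is not code from the model itself (which is Fortran). It assumes that the work consists of many independent tridiagonal systems, one per horizontal grid column, and it runs a sequential Thomas algorithm for each system while parallelizing over the systems. The function name {{{solve_tridiagonal_batch}}}, the argument names, and the memory layout are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Solve ncol independent tridiagonal systems, each of size nz.
 * Assumed (hypothetical) layout: element k of system col is stored
 * at index col * nz + k.
 *   a    sub-diagonal   (entry k = 0 of each system is unused)
 *   b    main diagonal
 *   c    super-diagonal (entry k = nz - 1 of each system is unused)
 *   d    right-hand side on entry, solution on exit
 *   work scratch array of the same size as d
 * No pivoting is done, so the systems should be diagonally dominant.
 */
void solve_tridiagonal_batch(int ncol, int nz,
                             const double *a, const double *b,
                             const double *c, double *d, double *work)
{
    assert(ncol >= 1 && nz >= 1);
    size_t n = (size_t)ncol * (size_t)nz;

    /* One gang/vector lane per system; each system is solved serially. */
    #pragma acc parallel loop copyin(a[0:n], b[0:n], c[0:n]) copy(d[0:n]) create(work[0:n])
    for (int col = 0; col < ncol; ++col) {
        size_t o = (size_t)col * (size_t)nz;

        /* Forward elimination (Thomas algorithm). */
        double beta = b[o];
        d[o] = d[o] / beta;
        #pragma acc loop seq
        for (int k = 1; k < nz; ++k) {
            work[o + k] = c[o + k - 1] / beta;
            beta = b[o + k] - a[o + k] * work[o + k];
            d[o + k] = (d[o + k] - a[o + k] * d[o + k - 1]) / beta;
        }

        /* Back substitution. */
        #pragma acc loop seq
        for (int k = nz - 2; k >= 0; --k) {
            d[o + k] -= work[o + k + 1] * d[o + k + 1];
        }
    }
}
```

Without an OpenACC-enabled compiler the directives are ignored and the loop runs serially on the host, which is convenient for checking results against a reference. On a GPU, a layout in which the system index runs fastest would likely give better memory coalescing; that variation is left out here to keep the sketch short.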