== Porting the code to NVidia GPU using the OpenACC programming model ==

Tests can be done on host {{{inferno}}} only, using the PGI Fortran compiler. Required settings:
{{{
export LM_LICENSE_FILE=27000@lizenzserv.rrzn.uni-hannover.de
export PATH=/localdata/opt/mpich2/1.4.1p1/bin:$PATH
export PATH=$PATH:/muksoft/packages/intel/bin:/muksoft/bin
export PATH=$PATH:/localdata/opt/pgi/linux86-64/12.5/bin:/usr/local/cuda/bin
}}}
Compiler settings are given in
{{{
.../trunk/SCRIPTS/.mrun.config.imuk_gpu
}}}
Please note the settings of the cpp-directives.\\
Test parameter set:
{{{
/home/raasch/current_version/JOBS/gputest/INPUT/gputest_p3d
}}}
Please note that {{{loop_optimization = 'acc'}}} has to be set. Results of tests are stored in the respective {{{MONITORING}}} directory.

'''Report on current activities:'''

r1015 \\
prognostic equations (partly: q and sa are missing), prandtl_fluxes, and diffusivities have been ported \\
statistics are not ported at all \\
speedup seems to be similar to what has been reported by Klaus Ketelsen \\
measurements with the Intel compiler on {{{inferno}}} still have to be carried out

'''Results:''' \\
.6 pgf90 without any acc kernels \\
.31 last acc version \\
.32 ifort (on bora) using acc-branch \\
.34 ifort (on bora) using vector-branch \\\\

'''Next steps:'''

 * porting the Poisson solver following Klaus' suggestions (there is still a bug in his last version), implementing a fast tridiagonal solver for the GPU
 * creating a single-core version (without using MPI, so that host-device transfer is minimized)
 * testing the PGI 12.6 compiler version, porting flow_statistics if reduction is implemented, checking the capability of parallel regions
 * updating ghost boundaries only; overlapping of update/MPI and computation?
 * overlapping communication
 * porting of remaining things (averaging, I/O, etc.)
 * ...
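
'''Checking the required settings (sketch):''' \\
As a quick sanity check of the required settings listed at the top of this page, a small shell sketch along the following lines could be run on {{{inferno}}} before starting a test. The tool names ({{{pgf90}}}, {{{nvcc}}}, {{{mpif90}}}, {{{ifort}}}) are assumptions about which executables the listed {{{PATH}}} entries provide, and {{{check_tool}}} is a hypothetical helper that is not part of the PALM scripts.

```shell
#!/bin/sh
# Sketch only: report whether the tools expected from the PATH settings
# are visible. The tool list below is an assumption, adjust it as needed.

check_tool() {
    # Print "ok: <tool>" if <tool> is found in PATH, "missing: <tool>" otherwise.
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
    fi
}

# Assumed executables: PGI Fortran, CUDA compiler, MPICH2 wrapper, Intel Fortran.
for tool in pgf90 nvcc mpif90 ifort; do
    check_tool "$tool"
done

# The license server variable must be set for the PGI compiler to run.
if [ -n "${LM_LICENSE_FILE:-}" ]; then
    echo "ok: LM_LICENSE_FILE=$LM_LICENSE_FILE"
else
    echo "missing: LM_LICENSE_FILE"
fi
```

Any line starting with {{{missing:}}} suggests that the corresponding {{{export}}} has not been applied in the current shell. The script only reports and never aborts, so it can also be sourced from a login script.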