== Porting the code to NVIDIA GPU using the OpenACC programming model

Tests can currently be done only on host {{{inferno}}}, using the PGI Fortran compiler. Required environment settings:
{{{
export LM_LICENSE_FILE=27000@lizenzserv.rrzn.uni-hannover.de
export PATH=/localdata/opt/mpich2/1.4.1p1/bin:$PATH
export PATH=$PATH:/muksoft/packages/intel/bin:/muksoft/bin
export PATH=$PATH:/localdata/opt/pgi/linux86-64/12.5/bin:/usr/local/cuda/bin
}}}
Compiler settings are given in
{{{
.../trunk/SCRIPTS/.mrun.config.imuk_gpu
}}}
Please note the settings of the cpp directives.\\
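
For illustration only (the switch name {{{__openacc}}} and the subroutine are assumptions, not taken from the config file or the PALM source): GPU-specific code paths are typically guarded by cpp directives of this form, so that the same source still compiles without OpenACC:
{{{
!-- Sketch: a cpp-guarded device update; the guarded part is compiled
!-- only if -D__openacc is set among the cpp options
SUBROUTINE update_device_copies( u, v, w )

   IMPLICIT NONE
   REAL, DIMENSION(:,:,:), INTENT(INOUT) ::  u, v, w

#if defined( __openacc )
   !$acc update device( u, v, w )
#endif

END SUBROUTINE update_device_copies
}}}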
Test parameter set:
{{{
/home/raasch/current_version/JOBS/gputest/INPUT/gputest_p3d
}}}
Please note that {{{loop_optimization = 'acc'}}} has to be set. Results of the tests are stored in the respective {{{MONITORING}}} directory.
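
For orientation, a minimal sketch of the relevant line in the parameter file (the grid values are placeholders, not the actual gputest settings):
{{{
 &inipar nx = 63, ny = 63, nz = 64,
         dx = 2.0, dy = 2.0, dz = 2.0,
         loop_optimization = 'acc', /
}}}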

'''Report on current activities:'''

r1015 \\
prognostic equations (partly: q and sa are still missing), prandtl_fluxes, and diffusivities have been ported \\
statistics have not been ported at all \\
the speedup seems to be similar to what has been reported by Klaus Ketelsen \\
measurements with the Intel compiler on {{{inferno}}} still have to be carried out
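
As an illustration of the directive pattern used for such loops (a minimal sketch, not the actual PALM code: a simple Euler update of one velocity component is assumed, and an enclosing {{{!$acc data}}} region is assumed to have copied the arrays to the device):
{{{
!-- Sketch: an OpenACC kernels region over a prognostic-equation-like
!-- triple loop, using PALM's (k,j,i) array ordering and loop bounds
SUBROUTINE euler_step( nzb, nzt, nys, nyn, nxl, nxr, dt, tend, u )

   IMPLICIT NONE

   INTEGER, INTENT(IN) ::  nzb, nzt, nys, nyn, nxl, nxr
   REAL, INTENT(IN)    ::  dt
   REAL, INTENT(IN)    ::  tend(nzb:nzt,nys:nyn,nxl:nxr)
   REAL, INTENT(INOUT) ::  u(nzb:nzt,nys:nyn,nxl:nxr)

   INTEGER ::  i, j, k

!-- present() requires that an enclosing data region has already put
!-- the arrays on the device, so no transfer happens here
   !$acc kernels present( tend, u )
   DO  i = nxl, nxr
      DO  j = nys, nyn
         DO  k = nzb+1, nzt
            u(k,j,i) = u(k,j,i) + dt * tend(k,j,i)
         ENDDO
      ENDDO
   ENDDO
   !$acc end kernels

END SUBROUTINE euler_step
}}}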

'''Results''' (the numbers are the output file cycles in the respective {{{MONITORING}}} directory):
|| .6 || pgf90 without any acc kernels ||
|| .31 || last acc version ||
|| .32 || ifort (on bora) using acc-branch ||
|| .34 || ifort (on bora) using vector-branch ||

'''Next steps:'''

* porting the Poisson solver following Klaus' suggestions (there is still a bug in his last version) and implementing a fast tridiagonal solver for the GPU (see the first sketch after this list)
* creating a single-core version (without MPI), so that host-device transfers are minimized
* testing the PGI 12.6 compiler version; porting flow_statistics if reductions are implemented (see the second sketch after this list); checking the capability of parallel regions
* updating ghost boundaries only; can the update/MPI be overlapped with computation?
* overlapping communication and computation
* porting the remaining parts (averaging, I/O, etc.)
* ...
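
A minimal sketch of the fast tridiagonal solver idea from the first item (all names are hypothetical; this is the textbook Thomas algorithm, not Klaus' version and not PALM's tridia routine): the recursion is inherently sequential along k, but every (j,i) column is an independent system, so the horizontal loops can be parallelized on the GPU:
{{{
!-- Sketch: Thomas algorithm, parallel over the horizontal columns;
!-- a, b, c are the lower/main/upper diagonals (constant along k here),
!-- rhs holds the right-hand side and is overwritten by the solution
SUBROUTINE tridia_gpu( nzb, nzt, nys, nyn, nxl, nxr, a, b, c, rhs )

   IMPLICIT NONE

   INTEGER, INTENT(IN) ::  nzb, nzt, nys, nyn, nxl, nxr
   REAL, INTENT(IN)    ::  a(nzb+1:nzt), b(nzb+1:nzt), c(nzb+1:nzt)
   REAL, INTENT(INOUT) ::  rhs(nzb+1:nzt,nys:nyn,nxl:nxr)

   INTEGER ::  i, j, k
   REAL    ::  cp(nzb+1:nzt), dp(nzb+1:nzt)  ! per-column work arrays

!-- every (j,i) column is independent: parallelize the horizontal
!-- loops, keep the k recursion sequential within each column
   !$acc parallel loop collapse(2) private( k, cp, dp ) &
   !$acc present( a, b, c, rhs )
   DO  i = nxl, nxr
      DO  j = nys, nyn
!--      forward elimination
         cp(nzb+1) = c(nzb+1) / b(nzb+1)
         dp(nzb+1) = rhs(nzb+1,j,i) / b(nzb+1)
         DO  k = nzb+2, nzt
            cp(k) = c(k) / ( b(k) - a(k) * cp(k-1) )
            dp(k) = ( rhs(k,j,i) - a(k) * dp(k-1) ) / &
                    ( b(k) - a(k) * cp(k-1) )
         ENDDO
!--      back substitution
         rhs(nzt,j,i) = dp(nzt)
         DO  k = nzt-1, nzb+1, -1
            rhs(k,j,i) = dp(k) - cp(k) * rhs(k+1,j,i)
         ENDDO
      ENDDO
   ENDDO

END SUBROUTINE tridia_gpu
}}}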
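
And, for the reduction mentioned in the third item, a minimal self-contained sketch (made-up names), since porting flow_statistics hinges on the compiler supporting the reduction clause:
{{{
!-- Sketch: a sum reduction over a 3d field, the basic operation
!-- behind the horizontal averages in flow_statistics
PROGRAM reduction_sketch

   IMPLICIT NONE

   INTEGER, PARAMETER ::  n = 128
   INTEGER ::  i, j, k
   REAL    ::  s
   REAL    ::  var(n,n,n)

   var = 1.0
   s   = 0.0

!-- every thread accumulates into a private copy of s; the partial
!-- sums are combined when the region ends
   !$acc parallel loop collapse(3) reduction(+:s) copyin( var )
   DO  i = 1, n
      DO  j = 1, n
         DO  k = 1, n
            s = s + var(k,j,i)
         ENDDO
      ENDDO
   ENDDO

   PRINT*, 'sum = ', s   ! expected: n**3

END PROGRAM reduction_sketch
}}}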