Changes between Version 15 and Version 16 of doc/tec/gpu

Timestamp: Feb 8, 2016 4:58:43 PM
Author: raasch
Comment: --

doc/tec/gpu

-                      v15
+                      v16
 %cpp_options       -Mpreprocess:-DMPI_REAL=MPI_DOUBLE_PRECISION:-DMPI_2REAL=MPI_2DOUBLE_PRECISION:-D__nopointer:-D__openacc:-D__cuda_fft:-D__lc  lcmuk parallel pgigpu146
 %mopts             -j:1                               lcmuk parallel pgigpu146
 %fopts             -acc:-ta=tesla,6.0,nocache,time:-Minfo=acc:-fastsse:-Mcuda=cuda6.0  lcmuk parallel pgigpu146
 %lopts             -acc:-ta=tesla,6.0,nocache,time:-Minfo=acc:-fastsse:-Mcuda=cuda6.0:-lcufft  lcmuk parallel pgigpu146
+%fopts             -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-fastsse:-Mcuda=cuda6.0  lcmuk parallel pgigpu146
+%lopts             -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-fastsse:-Mcuda=cuda6.0:-lcufft  lcmuk parallel pgigpu146
 }}}
 The {{{nocache}}} compiler switch is currently required. Without it, performance drops significantly.
 …
 * In routine {{{advec_ws}}} I had to introduce another array {{{wall_flags_00}}} to hold the wall flags for bits 32-63. It seems that the OpenACC/PGI compiler can only handle 32-bit INTEGERs. Is that true?
 * Routine {{{fft_xy}}}: The clause {{{!$acc declare create( ar_tmp )}}} has not worked since compiler version 14.1. Instead, I had to use {{{!$acc data create( ar_tmp )}}} clauses. Why? Does this problem still exist in the current compiler version?
 * Routine {{{surface_layer_fluxes}}}: inlining of functions
+* Routine {{{surface_layer_fluxes}}}: some loops (DO WHILE, DO without an explicit loop counter, etc.) cannot be vectorized
 * Routine {{{swap_timelevel}}}: Why can the compiler not vectorize FORTRAN array assignments like {{{u = u_p}}}?
 * Routine {{{timestep}}}: Is there a chance that the FORTRAN functions {{{MINLOC}}} and {{{MAXLOC}}}, which are used in routine {{{global_min_max}}}, are directly supported on the GPU?
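
Regarding the {{{swap_timelevel}}} item in the list above: one possible workaround is to write the array assignment as an explicit loop nest that the compiler can map to a kernel. The following is only a minimal stand-alone sketch, not code taken from the model. The program name and the array sizes are hypothetical, and whether the PGI compiler actually generates an efficient kernel for it has not been checked here. Without OpenACC support the directives are treated as plain comments, so the sketch also builds with an ordinary Fortran compiler.

```fortran
! Hypothetical stand-alone sketch: only the names u and u_p come from the
! text above; the program name and the array dimensions are assumptions.
PROGRAM swap_sketch

   IMPLICIT NONE

   INTEGER, PARAMETER ::  nx = 2, ny = 3, nz = 4   ! assumed toy dimensions

   INTEGER ::  i, j, k

   REAL, DIMENSION(nz,ny,nx) ::  u, u_p

   u   = 0.0
   u_p = 1.0

!
!-- Explicit loop nest instead of the array assignment u = u_p.
!-- Compilers without OpenACC support ignore the !$acc lines.
   !$acc data copyin( u_p ) copyout( u )
   !$acc kernels
   !$acc loop independent collapse(3)
   DO  i = 1, nx
      DO  j = 1, ny
         DO  k = 1, nz
            u(k,j,i) = u_p(k,j,i)
         ENDDO
      ENDDO
   ENDDO
   !$acc end kernels
   !$acc end data

!
!-- Check on the host that every element has been copied
   PRINT*, 'copy complete: ', ALL( u == u_p )

END PROGRAM swap_sketch
```

In a real routine the arrays would presumably already reside on the device, so the {{{data}}} region of this sketch would be replaced by a {{{present}}} clause on the {{{kernels}}} directive.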