Changes between Version 15 and Version 16 of doc/tec/gpu
- Timestamp: Feb 8, 2016 4:58:43 PM
doc/tec/gpu
v15  v16
 28   28     %cpp_options -Mpreprocess:-DMPI_REAL=MPI_DOUBLE_PRECISION:-DMPI_2REAL=MPI_2DOUBLE_PRECISION:-D__nopointer:-D__openacc:-D__cuda_fft:-D__lc lcmuk parallel pgigpu146
 29   29     %mopts -j:1 lcmuk parallel pgigpu146
 30        - %fopts -acc:-ta=tesla,6.0,nocache,time:-Minfo=acc:-fastsse:-Mcuda=cuda6.0 lcmuk parallel pgigpu146
 31        - %lopts -acc:-ta=tesla,6.0,nocache,time:-Minfo=acc:-fastsse:-Mcuda=cuda6.0:-lcufft lcmuk parallel pgigpu146
      30   + %fopts -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-fastsse:-Mcuda=cuda6.0 lcmuk parallel pgigpu146
      31   + %lopts -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-fastsse:-Mcuda=cuda6.0:-lcufft lcmuk parallel pgigpu146
 32   32     }}}
 33   33     The {{{nocache}}} compiler switch is currently required. Otherwise there would be a significant loss of performance.
  …    …
110  110     * In routine {{{advec_ws}}} I had to introduce another array {{{wall_flags_00}}} to hold wall flags for bits 32-63. It seems that the OpenACC/PGI compiler can only handle single-precision (32-bit) INTEGER. Is that true?
111  111     * Routine {{{fft_xy}}}: The clause {{{!$acc declare create( ar_tmp )}}} does not work starting with compiler version 14.1. Instead, I had to use {{{!$acc data create( ar_tmp )}}} clauses. Why? Does this problem still exist for the current compiler version?
112        - * Routine {{{surface_layer_fluxes}}}: inlining of functions
     112   + * Routine {{{surface_layer_fluxes}}}: there are some loops (DO WHILE, DO without specific loop counter, etc.) which cannot be vectorized
113  113     * Routine {{{swap_timelevel}}}: Why can the compiler not vectorize FORTRAN vector assignments like {{{u = u_p}}}?
114  114     * Routine {{{timestep}}}: Is there a chance that the FORTRAN functions {{{MINLOC}}} and {{{MAXLOC}}}, which are used in routine {{{global_min_max}}}, are directly supported on the GPU?