Changes between Version 15 and Version 16 of doc/tec/gpu


Timestamp: Feb 8, 2016 4:58:43 PM
Author: raasch

  • doc/tec/gpu

    v15  v16
    28   28   %cpp_options       -Mpreprocess:-DMPI_REAL=MPI_DOUBLE_PRECISION:-DMPI_2REAL=MPI_2DOUBLE_PRECISION:-D__nopointer:-D__openacc:-D__cuda_fft:-D__lc  lcmuk parallel pgigpu146
    29   29   %mopts             -j:1                               lcmuk parallel pgigpu146
    30        %fopts             -acc:-ta=tesla,6.0,nocache,time:-Minfo=acc:-fastsse:-Mcuda=cuda6.0  lcmuk parallel pgigpu146
    31        %lopts             -acc:-ta=tesla,6.0,nocache,time:-Minfo=acc:-fastsse:-Mcuda=cuda6.0:-lcufft  lcmuk parallel pgigpu146
         30   %fopts             -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-fastsse:-Mcuda=cuda6.0  lcmuk parallel pgigpu146
         31   %lopts             -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-fastsse:-Mcuda=cuda6.0:-lcufft  lcmuk parallel pgigpu146
    32   32   }}}
    33   33   The {{{nocache}}} compiler switch is currently required. Otherwise there would be a significant loss of performance.
     
    110  110  * In routine {{{advec_ws}}} I had to introduce another array {{{wall_flags_00}}} to hold wall flags for bits 32-63. It seems that the OpenACC/PGI compiler can only handle 32-bit INTEGERs. Is that true?
    111  111  * Routine {{{fft_xy}}}: The clause {{{!$acc declare create( ar_tmp )}}} does not work starting with compiler version 14.1. Instead, I had to use {{{!$acc data create( ar_tmp )}}} clauses. Why? Does this problem still exist in the current compiler version?
    112       * Routine {{{surface_layer_fluxes}}}: inlining of functions
         112  * Routine {{{surface_layer_fluxes}}}: there are some loops (DO WHILE, DO without specific loop counter, etc.) which cannot be vectorized
    113  113  * Routine {{{swap_timelevel}}}: Why can the compiler not vectorize FORTRAN array assignments like {{{u = u_p}}}?
    114  114  * Routine {{{timestep}}}: Is there a chance that the FORTRAN functions {{{MINLOC}}} and {{{MAXLOC}}}, which are used in routine {{{global_min_max}}}, will be directly supported on the GPU?
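
The {{{advec_ws}}} item above describes keeping flag bits 32-63 in a second array because only 32-bit INTEGERs appeared to work on the device. A minimal sketch of that idea follows, using two scalar 32-bit words instead of arrays. The names {{{flag_lo}}}, {{{flag_hi}}}, {{{set_flag}}} and {{{flag_is_set}}} are hypothetical and are not taken from the actual sources; the real layout of {{{wall_flags_00}}} may well differ.

```fortran
! Sketch only: all names here are illustrative assumptions,
! not the actual wall-flag implementation.
PROGRAM split_wall_flags

   USE, INTRINSIC :: iso_fortran_env, ONLY: int32

   IMPLICIT NONE

   INTEGER(int32) ::  flag_lo = 0_int32   ! logical bits  0-31
   INTEGER(int32) ::  flag_hi = 0_int32   ! logical bits 32-63

   CALL set_flag( 5 )    ! lands in flag_lo
   CALL set_flag( 40 )   ! lands in flag_hi at position 40 - 32 = 8

   ! Self-check: abort if a bit is not where it is expected
   IF ( .NOT. flag_is_set( 5 ) )   ERROR STOP 'bit 5 should be set'
   IF ( .NOT. flag_is_set( 40 ) )  ERROR STOP 'bit 40 should be set'
   IF ( flag_is_set( 6 ) )         ERROR STOP 'bit 6 should be clear'
   IF ( flag_is_set( 41 ) )        ERROR STOP 'bit 41 should be clear'
   IF ( .NOT. BTEST( flag_hi, 8 ) )  ERROR STOP 'bit 40 maps to flag_hi bit 8'

   PRINT *, 'all flag checks passed'

CONTAINS

   ! Set logical bit 0..63, choosing the 32-bit word that holds it
   SUBROUTINE set_flag( bit )
      INTEGER, INTENT(IN) ::  bit
      IF ( bit < 32 )  THEN
         flag_lo = IBSET( flag_lo, bit )
      ELSE
         flag_hi = IBSET( flag_hi, bit - 32 )
      ENDIF
   END SUBROUTINE set_flag

   ! Test logical bit 0..63 in the matching 32-bit word
   LOGICAL FUNCTION flag_is_set( bit )
      INTEGER, INTENT(IN) ::  bit
      IF ( bit < 32 )  THEN
         flag_is_set = BTEST( flag_lo, bit )
      ELSE
         flag_is_set = BTEST( flag_hi, bit - 32 )
      ENDIF
   END FUNCTION flag_is_set

END PROGRAM split_wall_flags
```

The sketch uses only the standard intrinsics {{{IBSET}}} and {{{BTEST}}} on 32-bit words, so it does not depend on 64-bit INTEGER support. Whether such support is really missing in the compiler remains the open question stated in the list.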