Changes between Version 20 and Version 21 of doc/tec/gpu


Timestamp: Nov 21, 2018 5:10:59 PM
Author: scharf

  • doc/tec/gpu

 v20  v21
  23   23  Configuration file settings should be as follows:
  24   24  {{{
  25       %remote_username   <replace>                          lcmuk parallel pgigpu146
  26       #%modules           pgi/14.6:mvapich2/2.0-pgi-cuda    lcmuk parallel pgigpu146     # mvapich doesn't work so far
  27       %modules           pgi/14.6:openmpi/1.8.3-pgi-cuda    lcmuk parallel pgigpu146
  28       %tmp_user_catalog  /tmp                               lcmuk parallel pgigpu146
  29       %compiler_name     mpif90                             lcmuk parallel pgigpu146
  30       %compiler_name_ser pgf90                              lcmuk parallel pgigpu146
  31       %cpp_options       -Mpreprocess:-DMPI_REAL=MPI_DOUBLE_PRECISION:-DMPI_2REAL=MPI_2DOUBLE_PRECISION:-D__nopointer:-D__openacc:-D__cuda_fft:-D__lc  lcmuk parallel pgigpu146
  32       %mopts             -j:1                               lcmuk parallel pgigpu146
  33       %fopts             -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-fastsse:-Mcuda=cuda6.0  lcmuk parallel pgigpu146
  34       %lopts             -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-fastsse:-Mcuda=cuda6.0:-lcufft  lcmuk parallel pgigpu146
       25  %remote_username   <replace>
       26  #%modules           pgi/14.6:mvapich2/2.0-pgi-cuda         # mvapich doesn't work so far
       27  %modules           pgi/14.6:openmpi/1.8.3-pgi-cuda
       28  %tmp_user_catalog  /tmp
       29  %compiler_name     mpif90
       30  %compiler_name_ser pgf90
       31  %cpp_options       -Mpreprocess:-DMPI_REAL=MPI_DOUBLE_PRECISION:-DMPI_2REAL=MPI_2DOUBLE_PRECISION:-D__nopointer:-D__openacc:-D__cuda_fft:-D__lc
       32  %mopts             -j:1
       33  %fopts             -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-fastsse:-Mcuda=cuda6.0
       34  %lopts             -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-fastsse:-Mcuda=cuda6.0:-lcufft
  35   35  }}}
  36   36  Please note the settings of the cpp directives ({{{-D__nopointer -D__openacc -D__cuda_fft}}}) and the CUDA library path in {{{lopts}}}.\\The {{{nocache}}} compiler switch '''is no longer required'''. (Earlier compiler versions, e.g. 13.6, showed a significant loss of performance when this switch was omitted.) The {{{time}}} switch collects performance data and outputs it at the end of a run, which is very useful. \\ \\
  37   37
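The settings singled out in the note about cpp directives and {{{lopts}}} can be checked before building. The following is a minimal sketch, not part of PALM: the function name {{{check_gpu_config}}} is hypothetical, it assumes the {{{%key value}}} layout of the configuration file shown in v21, and it interprets "CUDA library path in lopts" as the {{{-lcufft}}} entry, which is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PALM): report GPU-related settings that are
# missing from a configuration file in the "%key value" layout of v21.
# Assumption: "CUDA library path in lopts" is checked as the -lcufft entry.

# check_gpu_config FILE
#   Prints one line per missing setting; returns 1 if anything is missing.
check_gpu_config() {
    local file="$1"
    local missing=0 flag cpp lopts
    # First active %cpp_options / %lopts line; lines starting with '#' are
    # skipped because the pattern is anchored at '%'.
    cpp=$(grep -m1 '^%cpp_options' "$file")
    lopts=$(grep -m1 '^%lopts' "$file")
    for flag in -D__nopointer -D__openacc -D__cuda_fft; do
        case "$cpp" in
            *"$flag"*) ;;                                   # present
            *) echo "cpp_options lacks $flag"; missing=1 ;;
        esac
    done
    case "$lopts" in
        *-lcufft*) ;;
        *) echo "lopts lacks -lcufft"; missing=1 ;;
    esac
    return "$missing"
}
```

Called as {{{check_gpu_config <config file>}}}, it prints one line per missing setting and returns a non-zero status, so it can guard a build script.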
  38       It might be necessary to load the modules manually before calling mbuild or mrun:
       38  It might be necessary to load the modules manually before calling palmbuild or palmrun:
  39   39  {{{
  40   40  module load pgi/14.6 openmpi/1.8.3-pgi-cuda
   …    …
  45   45  export PGI_ACC_NOSYNCQUEUE=1
  46   46  }}}
  47       before calling mrun! The second one is '''absolutely required''' when using the CUDA FFT ({{{fft_method='system-specific' + -D__cuda_fft}}}). Without it, the pressure solver does not reduce the divergence! \\
       47  before calling palmrun! The second one is '''absolutely required''' when using the CUDA FFT ({{{fft_method='system-specific' + -D__cuda_fft}}}). Without it, the pressure solver does not reduce the divergence! \\
  48   48
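Since forgetting {{{PGI_ACC_NOSYNCQUEUE=1}}} leaves the pressure solver unable to reduce the divergence, a guard in the job script can help. A minimal sketch, assuming that {{{PGI_ACC_NOSYNCQUEUE=1}}} is the setting meant by "the second one" and that {{{palmrun}}} is on the {{{PATH}}}; both function names and the {{{PALMRUN}}} override variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical job-script guard (not part of PALM). Assumes that
# PGI_ACC_NOSYNCQUEUE=1 is the setting the text calls "absolutely required"
# for the CUDA FFT (fft_method = 'system-specific' + -D__cuda_fft).

# require_nosyncqueue
#   Returns 0 if PGI_ACC_NOSYNCQUEUE is exactly "1", otherwise warns on
#   stderr and returns 1.
require_nosyncqueue() {
    if [ "${PGI_ACC_NOSYNCQUEUE:-}" = "1" ]; then
        return 0
    fi
    echo "error: export PGI_ACC_NOSYNCQUEUE=1 before using the CUDA FFT," \
         "otherwise the pressure solver does not reduce the divergence" >&2
    return 1
}

# run_gpu_job ARGS...
#   Calls palmrun with ARGS only if the guard passes. PALMRUN may name a
#   different command, e.g. PALMRUN=echo for a dry run.
run_gpu_job() {
    require_nosyncqueue || return 1
    "${PALMRUN:-palmrun}" "$@"
}
```

With {{{PALMRUN=echo}}} the wrapper only prints the arguments, which is a cheap way to test the job script without starting a run.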
  49   49  Compiler version 14.10 gives a runtime error when pres is called for the first time in init_3d_model:
   …    …
  68   68  Please note that {{{loop_optimization = 'acc'}}} and {{{psolver = 'poisfft'}}} have to be set. {{{fft_method = 'system-specific'}}} is required to switch on the CUDA FFT. None of the other FFT methods run on the GPU, so they are extremely slow. \\ \\
  69   69
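The three parameter values named in the note on {{{loop_optimization}}}, {{{psolver}}} and {{{fft_method}}} can be checked in a similar way. This sketch uses plain {{{grep}}} patterns rather than a real Fortran namelist parser, so it assumes each value is written as {{{name = 'value'}}} (single or double quotes); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of PALM) for the GPU-related values in a PALM
# parameter file. Simplification: plain grep patterns, no real Fortran namelist
# parsing; values are expected as name = 'value' or name = "value".

# has_setting FILE NAME VALUE
#   True if FILE assigns the quoted VALUE to NAME (case-insensitive match).
has_setting() {
    grep -Eiq "(^|[[:space:],])$2[[:space:]]*=[[:space:]]*['\"]$3['\"]" "$1"
}

# check_gpu_namelist FILE
#   Prints the assignments that are missing; returns 1 if any are missing.
check_gpu_namelist() {
    local file="$1" status=0
    has_setting "$file" loop_optimization acc ||
        { echo "missing: loop_optimization = 'acc'"; status=1; }
    has_setting "$file" psolver poisfft ||
        { echo "missing: psolver = 'poisfft'"; status=1; }
    has_setting "$file" fft_method system-specific ||
        { echo "missing: fft_method = 'system-specific'"; status=1; }
    return "$status"
}
```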
  70       mrun command to run on two GPU devices:
       70  palmrun command to run on two GPU devices:
  71   71  {{{
  72       mrun -d acc_medium -h lcmuk -K "parallel pgigpu146" -X2 -T2 -r "d3#"
       72  palmrun -r acc_medium -c lcmuk -K "parallel pgigpu146" -X2 -T2 -a "d3#"
  73   73  }}}
  74   74   \\
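The v21 change of the run command is not a plain rename: palmrun's {{{-r}}} takes the value that {{{-d}}} took under mrun (here {{{acc_medium}}}), while the old {{{-r}}} value ({{{"d3#"}}}) now goes to {{{-a}}}, so a simple text substitution of old commands would go wrong. The sketch below maps only the three options that change on this page ({{{-d}}} to {{{-r}}}, {{{-h}}} to {{{-c}}}, {{{-r}}} to {{{-a}}}) and copies all other arguments; {{{mrun_to_palmrun}}} is a hypothetical helper, not a PALM tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a PALM tool): translate the mrun options used on
# this page into the palmrun options of v21.
#   mrun -d <value>  ->  palmrun -r <value>   (e.g. acc_medium)
#   mrun -h <value>  ->  palmrun -c <value>   (e.g. lcmuk)
#   mrun -r <value>  ->  palmrun -a <value>   (e.g. "d3#")
# All other arguments (-K, -X2, -T2, ...) are copied unchanged. Options glued
# to their value (e.g. -dacc_medium) are not handled.

# mrun_to_palmrun ARGS...
#   Prints the resulting palmrun command, one argument per line.
mrun_to_palmrun() {
    local -a out=(palmrun)
    local opt
    while [ "$#" -gt 0 ]; do
        opt="$1"
        case "$opt" in
            -d|-h|-r)
                if [ "$#" -lt 2 ]; then
                    echo "mrun_to_palmrun: $opt needs a value" >&2
                    return 1
                fi
                case "$opt" in
                    -d) out+=(-r "$2") ;;
                    -h) out+=(-c "$2") ;;
                    -r) out+=(-a "$2") ;;
                esac
                shift 2
                ;;
            *)
                out+=("$opt")
                shift
                ;;
        esac
    done
    printf '%s\n' "${out[@]}"
}
```

For example, {{{mrun_to_palmrun -d acc_medium -K pgigpu146 -r "d3#"}}} prints the arguments of the single-GPU palmrun call given at the end of this page.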
   …    …
  76   76  Runs on a single GPU without MPI (i.e. no domain decomposition) require this configuration:
  77   77  {{{
  78       %compiler_name     pgf90                                                       lcmuk pgigpu146
  79       %compiler_name_ser pgf90                                                       lcmuk pgigpu146
  80       %cpp_options       -Mpreprocess:-D__nopointer:-D__openacc:-D__cuda_fft:-D__lc  lcmuk pgigpu146
  81       %fopts             -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-Mcray=pointer:-fastsse:-Mcuda=cuda6.0  lcmuk pgigpu146
  82       %lopts             -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-Mcray=pointer:-fastsse:-Mcuda=cuda6.0:-lcufft  lcmuk pgigpu146
       78  %compiler_name     pgf90
       79  %compiler_name_ser pgf90
       80  %cpp_options       -Mpreprocess:-D__nopointer:-D__openacc:-D__cuda_fft:-D__lc
       81  %fopts             -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-Mcray=pointer:-fastsse:-Mcuda=cuda6.0
       82  %lopts             -acc:-ta=tesla,6.0,nocache,time:-Minline:-Minfo=acc:-Mcray=pointer:-fastsse:-Mcuda=cuda6.0:-lcufft
  83   83
  84   84  }}}
  85   85  Run it with
  86   86  {{{
  87       mrun -d acc_medium -K pgigpu146 -r "d3#"
       87  palmrun -r acc_medium -K pgigpu146 -a "d3#"
  88   88  }}}
  89   89   \\