Changes between Version 13 and Version 14 of doc/tec/gpu


Timestamp:
Feb 8, 2016 2:33:27 PM
Author:
raasch
  • doc/tec/gpu

    v13 v14  
    1818The session uses 2 Intel-CPU-cores for 10000 seconds. You will need two cores if you want to run PALM on both of the two K40 boards.
    1919
    20 Tests can be done on host {{{inferno}}} only, using the PGI-FORTRAN compiler. Required settings:
     20Configuration file settings should be as follows:
    2121{{{
    22 module load pgi-compiler/2013-136
     22%remote_username   <replace>                          lcmuk parallel pgigpu146
     23#%modules           pgi/14.6:mvapich2/2.0-pgi-cuda    lcmuk parallel pgigpu146     # mvapich doesn't work so far
     24%modules           pgi/14.6:openmpi/1.8.3-pgi-cuda    lcmuk parallel pgigpu146
     25%tmp_user_catalog  /tmp                               lcmuk parallel pgigpu146
     26%compiler_name     mpif90                             lcmuk parallel pgigpu146
     27%compiler_name_ser pgf90                              lcmuk parallel pgigpu146
     28%cpp_options       -Mpreprocess:-DMPI_REAL=MPI_DOUBLE_PRECISION:-DMPI_2REAL=MPI_2DOUBLE_PRECISION:-D__nopointer:-D__openacc:-D__cuda_fft:-D__lc  lcmuk parallel pgigpu146
     29%mopts             -j:1                               lcmuk parallel pgigpu146
     30%fopts             -acc:-ta=tesla,6.0,nocache,time:-Minfo=acc:-fastsse:-Mcuda=cuda6.0  lcmuk parallel pgigpu146
     31%lopts             -acc:-ta=tesla,6.0,nocache,time:-Minfo=acc:-fastsse:-Mcuda=cuda6.0:-lcufft  lcmuk parallel pgigpu146
    2332}}}
     33The {{{nocache}}} compiler switch is currently required. Otherwise there would be a significant loss of performance.
     34It might be necessary to load the modules manually before calling mbuild or mrun:
     35{{{
     36module load pgi/14.6 openmpi/1.8.3-pgi-cuda
     37}}}
     38Furthermore, it is required to set the environment variable
     39{{{
     40export OMPI_COMM_WORLD_LOCAL_RANK=1
     41}}}
     42before calling mrun. Note that compiler version 14.10 gives a runtime error when {{{pres}}} is called for the first time in {{{init_3d_model}}}.
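The two manual steps above (loading the modules and setting the environment variable) can be combined in one small setup script. This is only a sketch: it assumes a bash shell in which the {{{module}}} command may or may not be defined, and it does not call mbuild or mrun itself.

```shell
#!/bin/bash
# Sketch: prepare the environment before calling mbuild or mrun.
# Module names and the variable value are copied from the settings above.

# Load compiler and MPI modules if the "module" command exists in this shell.
if command -v module >/dev/null 2>&1; then
    module load pgi/14.6 openmpi/1.8.3-pgi-cuda
else
    echo "module command not found; load pgi/14.6 and openmpi/1.8.3-pgi-cuda manually" >&2
fi

# Required before calling mrun.
export OMPI_COMM_WORLD_LOCAL_RANK=1

echo "OMPI_COMM_WORLD_LOCAL_RANK=${OMPI_COMM_WORLD_LOCAL_RANK}"
```

Source the file (instead of executing it) so that the loaded modules and the exported variable persist in the shell from which mrun is started.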
     43 
     44A test parameter-set:
     45{{{
     46/home/raasch/current_version/JOBS/acc_medium/INPUT/acc_medium_p3d
     47}}}
     48
     49Here are some hints for running the single-GPU (no-MPI) version:\\
    2450Compiler settings are given in
    2551{{{
     
    5581Reduction operations in {{{pres}}} and {{{flow_statistics}}} ported.
    5682
     83r1747 \\
     84Partial adjustments for the new surface layer scheme. The version is (in principle) instrumented to run on multiple GPUs.
     85
    5786'''Results for 256x256x64 grid (time in micro-s per gridpoint and timestep):''' \\
    5887||.1 ||1*Tesla, single-core (no MPI), pgi13.6             ||0.33342 ||r1221 ||
     
    6190The initialization time of the GPU (power up) can be avoided by running {{{/muksoft/packages/pgi/2013-136/linux86-64/13.6/bin/pgcudainit}}} in background.
    6291
    63 For current PGI compiler version 13.6, use "-ta=nocache" and set environment variable {{{PGI_ACC_SYNCHRONOUS=1}}}. Otherwise, there will be a significant loss in performance (factor of two!).
     92For PGI compiler version 13.6, use "-ta=nocache" and set environment variable {{{PGI_ACC_SYNCHRONOUS=1}}}. Otherwise, there will be a significant loss in performance (factor of two!).
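For runs with PGI 13.6, the two runtime hints above (GPU pre-initialization and synchronous mode) could be applied as sketched below. The variable name {{{PGCUDAINIT}}} is only a placeholder introduced here; the path is the one quoted above. The {{{nocache}}} option is a compile-time setting and is not covered by this sketch.

```shell
#!/bin/bash
# Sketch for PGI 13.6 runs: runtime settings only.
# PGCUDAINIT is a hypothetical helper variable; the path is taken from the text above.
PGCUDAINIT=/muksoft/packages/pgi/2013-136/linux86-64/13.6/bin/pgcudainit

# Avoids the performance loss reported above for compiler version 13.6.
export PGI_ACC_SYNCHRONOUS=1

# Keep the GPU initialized by running pgcudainit in the background, if it is installed.
if [ -x "$PGCUDAINIT" ]; then
    "$PGCUDAINIT" &
    echo "pgcudainit started in background (PID $!)"
else
    echo "pgcudainit not found at $PGCUDAINIT; skipping GPU pre-initialization" >&2
fi
```

As with any environment setup, the file has to be sourced for the exported variable to be visible to a subsequently started run.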
    6493
    6594'''Next steps:'''