Changes between Version 9 and Version 10 of doc/tec/gpu


Timestamp:
Sep 10, 2013 11:18:56 AM (11 years ago)
Author:
raasch

  • doc/tec/gpu

    v9 v10  
    4242In single-core mode, lateral boundary conditions run completely on the device. Most loops in {{{pres}}} are ported. Vertical boundary conditions ({{{boundary_conds}}}) are ported.
    4343
    44 '''Results for 512x512x64 grid (time in micro-s per gridpoint and timestep):''' \\
    45 ||.1 ||2*Tesla, quadcore, pgi             ||0.32606 ||
    46 ||.2 ||1*Tesla, single-core (no MPI), pgi ||0.42138 ||
    47 ||.3 ||quadcore, pgi   (acc)              ||0.78062 ||
    48 ||.4 ||quadcore, pgi   (vec)              ||0.64060 ||
    49 ||.5 ||quadcore ??     (cache)            ||0.67272 ||
    50 ||.6 ||quadcore ??     (cache)            ||0.79969 ||
    51 ||.7 ||quadcore ??     (vec)              ||0.77608 ||
    52 ||.8 ||quadcore ??     (acc)              ||1.00139 ||
     44r1221 \\
     45Reduction operations in {{{pres}}} and {{{flow_statistics}}} ported.
     46
     47'''Results for 256x256x64 grid (time in micro-s per gridpoint and timestep):''' \\
     48||.1 ||1*Tesla, single-core (no MPI), pgi13.6             ||0.33342 ||r1221 ||
     49||.2 ||single-core (no MPI), pgi13.6   (cache, Temperton) ||2.34144 ||r1221 ||
    5350
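The figures in the table above are wall-clock times normalized by the number of grid points and timesteps. As a minimal sketch of that arithmetic (the function names and the 140 s wall-clock value are illustrative assumptions, not taken from the model code or from a measured run):

```python
def time_per_gridpoint_and_timestep(wall_seconds, nx, ny, nz, n_timesteps):
    """Return wall-clock time in microseconds per grid point and timestep."""
    gridpoints = nx * ny * nz
    return wall_seconds * 1.0e6 / (gridpoints * n_timesteps)


def speedup(reference_time, accelerated_time):
    """Ratio of two normalized timings (larger means the second one is faster)."""
    return reference_time / accelerated_time


# Values quoted in the 256x256x64 table (r1221), in micro-s per gridpoint and timestep.
gpu_time = 0.33342   # 1*Tesla, single-core (no MPI), pgi13.6
cpu_time = 2.34144   # single-core (no MPI), pgi13.6 (cache, Temperton)

ratio = speedup(cpu_time, gpu_time)
print(f"GPU run is faster than the single-core run by a factor of {ratio:.2f}")

# Hypothetical input: a 140 s run with 100 timesteps on the 256x256x64 grid.
example = time_per_gridpoint_and_timestep(140.0, 256, 256, 64, 100)
print(f"{example:.5f} micro-s per gridpoint and timestep")
```

With the two table values, the single Tesla run comes out roughly seven times faster than the single-core CPU run.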
    5451The initialization time of the GPU (power up) can be avoided by running {{{/muksoft/packages/pgi/2013-136/linux86-64/13.6/bin/pgcudainit}}} in the background.
    5552
     53For the current PGI compiler version 13.6, use {{{-ta=nocache}}} and set the environment variable {{{PGI_ACC_SYNCHRONOUS=1}}}. Otherwise, performance drops significantly (by a factor of two!).
     54
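The two runtime settings above could be applied from a small launcher script. The following is a sketch only: the helper names and the commented-out {{{./run_model}}} command are hypothetical placeholders, while the {{{pgcudainit}}} path and the {{{PGI_ACC_SYNCHRONOUS}}} variable are the ones quoted above.

```python
import os
import subprocess

# Path as quoted in the text above; adjust for the local PGI installation.
PGCUDAINIT = "/muksoft/packages/pgi/2013-136/linux86-64/13.6/bin/pgcudainit"


def build_run_environment(base_env=None):
    """Return a copy of the environment with PGI_ACC_SYNCHRONOUS=1 set."""
    env = dict(os.environ if base_env is None else base_env)
    env["PGI_ACC_SYNCHRONOUS"] = "1"
    return env


def start_gpu_keepalive(path=PGCUDAINIT):
    """Start pgcudainit in the background.

    Returns the Popen handle, or None if the binary is not present
    on this machine (assumption: silently skipping is acceptable).
    """
    if not os.path.isfile(path):
        return None
    return subprocess.Popen(
        [path], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )


keepalive = start_gpu_keepalive()
run_env = build_run_environment()
# Hypothetical model start; the executable name is a placeholder:
# subprocess.run(["./run_model"], env=run_env, check=True)
```

Keeping the environment in a separate dictionary leaves the parent shell untouched, so the setting only affects the launched run.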
    5655'''Next steps:'''
    5756
    58 * testing the newest PGI 13.2 compiler version, porting of reduction operations ({{{timestep}}}, {{{flow_statistics}}}, divergence in {{{pres}}}), check the capability of parallel regions (can IF-constructs be removed from inner loops?)
     57* porting of MIN/MAXLOC operations in {{{timestep}}}, porting of {{{disturb_fields}}} (implement parallel random number generator)
     58* check the capability of parallel regions (can IF-constructs be removed from inner loops?)
    5959* for MPI mode update ghost boundaries only, overlapping of update/MPI-transfer and computation
    6060* overlapping communication in pressure solver (alltoall operations)
    61 * porting of remaining things (disturbances, calc_liquid_water_content, compute_vpt, averaging, I/O, etc.)
     61* porting of remaining things (calc_liquid_water_content, compute_vpt, averaging, I/O, etc.)
    6262* ...
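
For the MIN/MAXLOC item in the list above, one common way to express a location-aware maximum as a reduction is to reduce (value, index) pairs per chunk and then combine the partial results. The Python sketch below only illustrates that pattern with 0-based indices; it is not the model's Fortran/OpenACC code, and the function names and the chunking are assumptions.

```python
def maxloc_chunk(values, start, stop):
    """Sequential MAXLOC over values[start:stop]; returns (max value, index)."""
    best_index = start
    for i in range(start + 1, stop):
        if values[i] > values[best_index]:
            best_index = i
    return values[best_index], best_index


def combine(a, b):
    """Merge two (value, index) partial results.

    The larger value wins; on a tie the smaller index is kept, so the
    result is the first occurrence of the maximum.
    """
    if b[0] > a[0] or (b[0] == a[0] and b[1] < a[1]):
        return b
    return a


def maxloc(values, n_chunks=4):
    """MAXLOC as a two-stage reduction: per-chunk partials, then a combine step."""
    n = len(values)
    if n == 0:
        raise ValueError("maxloc of an empty sequence is undefined")
    chunk = -(-n // n_chunks)  # ceiling division
    # Each chunk is independent, so this stage could run in parallel.
    partials = [
        maxloc_chunk(values, start, min(start + chunk, n))
        for start in range(0, n, chunk)
    ]
    result = partials[0]
    for partial in partials[1:]:
        result = combine(result, partial)
    return result


value, index = maxloc([1.0, 5.0, 3.0, 5.0, 2.0], n_chunks=2)
print(value, index)
```

Because the combine step is associative and breaks ties by index, the result does not depend on how the data is split into chunks.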