Changes between Version 9 and Version 10 of doc/tec/gpu
- Timestamp: Sep 10, 2013 11:18:56 AM
doc/tec/gpu
 v9  v10
 42   42    In single-core mode, lateral boundary conditions completely run on device. Most loops in {{{pres}}} ported. Vertical boundary conditions ({{{boundary_conds}}}) ported.
 43   43
 44       - '''Results for 512x512x64 grid (time in micro-s per gridpoint and timestep):''' \\
 45       - ||.1 ||2*Tesla, quadcore, pgi ||0.32606 ||
 46       - ||.2 ||1*Tesla, single-core (no MPI), pgi ||0.42138 ||
 47       - ||.3 ||quadcore, pgi (acc) ||0.78062 ||
 48       - ||.4 ||quadcore, pgi (vec) ||0.64060 ||
 49       - ||.5 ||quadcore ?? (cache) ||0.67272 ||
 50       - ||.6 ||quadcore ?? (cache) ||0.79969 ||
 51       - ||.7 ||quadcore ?? (vec) ||0.77608 ||
 52       - ||.8 ||quadcore ?? (acc) ||1.00139 ||
      44  + r1221 \\
      45  + Reduction operations in {{{pres}}} and {{{flow_statistics ported}}}.
      46  +
      47  + '''Results for 256x256x64 grid (time in micro-s per gridpoint and timestep):''' \\
      48  + ||.1 ||1*Tesla, single-core (no MPI), pgi13.6 ||0.33342 ||r1221 ||
      49  + ||.2 ||single-core (no MPI), pgi13.6 (cache, Temperton) ||2.34144 ||r1221 ||
 53   50
 54   51    The initialization time of the GPU (power up) can be avoided by running {{{/muksoft/packages/pgi/2013-136/linux86-64/13.6/bin/pgcudainit}}} in background.
 55   52
      53  + For current PGI compiler version 13.6, use "-ta=nocache" and set environment variable {{{PGI_ACC_SYNCHRONOUS=1}}}. Otherwise, there will be a significant loss in performance (factor of two!).
      54  +
 56   55    '''Next steps:'''
 57   56
 58       - * testing the newest PGI 13.2 compiler version, porting of reduction operations ({{{timestep}}}, {{{flow_statistics}}}, divergence in {{{pres}}}), check the capability of parallel regions (can IF-constructs be removed from inner loops?)
      57  + * porting of MIN/MAXLOC operations in ({{{timestep}}}, porting of {{{disturb_fields}}} (implement parallel random number generator)
      58  + * check the capability of parallel regions (can IF-constructs be removed from inner loops?)
 59   59    * for MPI mode update ghost boundaries only, overlapping of update/MPI-transfer and computation
 60   60    * overlapping communication in pressure solver (alltoall operations)
 61       - * porting of remaining things ( disturbances,calc_liquid_water_content, compute_vpt, averaging, I/O, etc.)
      61  + * porting of remaining things (calc_liquid_water_content, compute_vpt, averaging, I/O, etc.)
 62   62    * ...
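
The timings in the 256x256x64 table are given per gridpoint and timestep, which makes them hard to compare at a glance. The following is a minimal Python sketch that converts them to wall-clock seconds per timestep and to a GPU-versus-CPU ratio. It assumes the metric is total run time divided by (number of gridpoints times number of timesteps); the function and constant names are illustrative only and do not appear in the wiki page or in the model code.

```python
# Timings copied from the 256x256x64 table (micro-s per gridpoint and timestep, r1221).
GRID = (256, 256, 64)
T_GPU = 0.33342  # 1*Tesla, single-core (no MPI), pgi13.6
T_CPU = 2.34144  # single-core (no MPI), pgi13.6 (cache, Temperton)


def seconds_per_timestep(t_us_per_point, grid):
    """Wall-clock seconds for one timestep on the given grid.

    Assumption: the tabulated value is total time / (gridpoints * timesteps).
    """
    nx, ny, nz = grid
    return t_us_per_point * 1e-6 * nx * ny * nz


def speedup(t_reference, t_accelerated):
    """Ratio of two per-gridpoint timings (values above 1 mean the second is faster)."""
    return t_reference / t_accelerated


if __name__ == "__main__":
    print(f"GPU: {seconds_per_timestep(T_GPU, GRID):.3f} s per timestep")
    print(f"CPU: {seconds_per_timestep(T_CPU, GRID):.3f} s per timestep")
    print(f"GPU vs. single CPU core: {speedup(T_CPU, T_GPU):.2f}x")
```

The ratio compares one Tesla card against a single CPU core only; the multi-core figures in the older 512x512x64 table were measured on a different grid and are not directly comparable.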