Changes between Version 16 and Version 17 of doc/tec/gpu


Timestamp:
Feb 9, 2016 12:25:29 PM (9 years ago)
Author:
raasch
Comment:

--

  • doc/tec/gpu

    v16 v17  
    8181Reduction operations in {{{pres}}} and {{{flow_statistics}}} ported.
    8282
    83 r1747 \\
     83r1749 \\
    8484Partial adjustments for new surface layer scheme. Version is (in principle) instrumented to run on multiple GPUs.
    8585
     
    104104'''work packages for the EuroHack:'''
    105105
    106 * getting the CUDA-aware MPI to run: for this, the routines {{{time_integration}}} and {{{exchange_horiz}}} in r1747 have to be replaced by the routines that I provided. Once the exchange of ghost points runs satisfactorily, the next step would be to make the {{{MPI_ALLTOALL}}} in {{{transpose.f90}}} CUDA-aware. This should be very easy. Just add (e.g.) {{{host_data use_device( f_inv, work )}}} clauses in front of the {{{MPI_ALLTOALL}}} calls and remove the existing {{{update host}}} and {{{data copyin}}} clauses. Also, the {{{update host}}} and {{{update device}}} clauses for array {{{ar}}} have to be removed in {{{poisfft}}}.
     106* getting the CUDA-aware MPI to run: for this, the routines {{{time_integration}}} and {{{exchange_horiz}}} in r1749 have to be replaced by the routines that I provided. Once the exchange of ghost points runs satisfactorily, the next step would be to make the {{{MPI_ALLTOALL}}} in {{{transpose.f90}}} CUDA-aware. This should be very easy. Just add (e.g.) {{{host_data use_device( f_inv, work )}}} clauses in front of the {{{MPI_ALLTOALL}}} calls and remove the existing {{{update host}}} and {{{data copyin}}} clauses. Also, the {{{update host}}} and {{{update device}}} clauses for array {{{ar}}} have to be removed in {{{poisfft}}}.
    107107* CUDA-fft has been implemented and successfully tested for the single-GPU (non-MPI) mode. It can be switched on using parameter {{{fft_method = 'system-specific'}}}. Additionally, the compiler switch {{{-D__cuda_fft}}} and the linker option {{{-lcufft}}} have to be set. For an unknown reason, this method does not work in MPI mode (the pressure solver does not reduce the divergence).
    108108* In general: do the existing clauses (e.g. {{{loop vector / loop gang}}}) give the best performance?
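
The {{{host_data use_device}}} step from the first work package could look roughly like the sketch below. It is not the code from {{{transpose.f90}}}: only the array names {{{f_inv}}} and {{{work}}} come from the text above, while the program wrapper, the array sizes, the communicator and the enclosing {{{data}}} region (which merely stands in for arrays that are already resident on the device) are assumptions made for illustration. A CUDA-aware MPI library is assumed as well. Without OpenACC enabled the directives are treated as comments and the exchange runs on the host.

```fortran
! Sketch only: f_inv and work are the array names used in the text;
! everything else (sizes, communicator, program wrapper) is hypothetical.
PROGRAM alltoall_cuda_aware_sketch

   USE mpi

   IMPLICIT NONE

   INTEGER, PARAMETER ::  n_per_pe = 4   ! hypothetical number of elements sent to each PE
   INTEGER ::  ierr, myid, numprocs
   REAL(KIND=8), DIMENSION(:), ALLOCATABLE ::  f_inv, work

   CALL MPI_INIT( ierr )
   CALL MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )
   CALL MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )

   ALLOCATE( f_inv(n_per_pe*numprocs), work(n_per_pe*numprocs) )
   f_inv = REAL( myid, KIND=8 )
   work  = 0.0_8

   ! Stand-in for the surrounding data region in which both arrays
   ! already live on the device.
   !$acc data copyin( f_inv ) copyout( work )

   ! Pass the device addresses of both buffers to MPI, so that a CUDA-aware
   ! MPI library can exchange the data without staging it through the host.
   ! No update host / update device is needed around the call.
   !$acc host_data use_device( f_inv, work )
   CALL MPI_ALLTOALL( f_inv, n_per_pe, MPI_DOUBLE_PRECISION,                  &
                      work,  n_per_pe, MPI_DOUBLE_PRECISION,                  &
                      MPI_COMM_WORLD, ierr )
   !$acc end host_data

   !$acc end data

   DEALLOCATE( f_inv, work )
   CALL MPI_FINALIZE( ierr )

END PROGRAM alltoall_cuda_aware_sketch
```

Inside the {{{host_data}}} region the names {{{f_inv}}} and {{{work}}} refer to the device copies, which is why the separate host/device updates mentioned above become unnecessary once the MPI library accepts device pointers.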