Changes between Version 16 and Version 17 of doc/tec/gpu
- Timestamp: Feb 9, 2016 12:25:29 PM
doc/tec/gpu
The change from v16 to v17 corrects the revision reference "r174 7" to "r1749" (wiki lines 83 and 106). The affected part of the page reads:

Reduction operations in {{{pres}}} and {{{flow_statistics}}} ported.

r1749 \\
Partial adjustments for the new surface layer scheme. The version is (in principle) instrumented to run on multiple GPUs.

…

'''work packages for the EuroHack:'''

 * getting the CUDA-aware MPI to run: for this, the routines {{{time_integration}}} and {{{exchange_horiz}}} in r1749 have to be replaced by the routines that I provided. Once the exchange of ghost points is running properly, the next step would be to make the {{{MPI_ALLTOALL}}} in {{{transpose.f90}}} CUDA-aware. This should be very easy: just add (e.g.) {{{host_data use_device( f_inv, work )}}} clauses in front of the {{{MPI_ALLTOALL}}} calls and remove the existing {{{update host}}} and {{{data copyin}}} clauses (a directive sketch follows below this list). Also, the {{{update host}}} and {{{update device}}} clauses for array {{{ar}}} have to be removed in {{{poisfft}}}.
 * CUDA-fft has been implemented and successfully tested in single-GPU (non-MPI) mode. It can be switched on with the parameter {{{fft_method = 'system-specific'}}}; additionally, the compiler switch {{{-D__cuda_fft}}} and the linker option {{{-lcufft}}} have to be set. For an unknown reason, this method does not work in MPI mode (the pressure solver does not reduce the divergence). The general cuFFT call pattern is sketched below this list.
 * In general: do the existing clauses (e.g. {{{loop vector}}} / {{{loop gang}}}) give the best performance? See the tuning sketch below this list for the kind of variants to benchmark.
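To make the {{{transpose.f90}}} change concrete, here is a minimal sketch of an {{{MPI_ALLTOALL}}} on OpenACC device data. It is not PALM's actual transpose routine; the argument names ({{{nelements}}}, {{{sendcount}}}, {{{comm}}}) and the REAL(8) kind are illustrative, and both arrays are assumed to already reside on the GPU from an enclosing {{{data}}} region. The {{{host_data use_device}}} block hands the device addresses of {{{f_inv}}} and {{{work}}} to the CUDA-aware MPI library, which is why the previous {{{update host}}} / {{{data copyin}}} steps can be dropped.

{{{
#!fortran
 SUBROUTINE transpose_alltoall( f_inv, work, nelements, sendcount, comm )

    USE mpi

    IMPLICIT NONE

    INTEGER, INTENT(IN) ::  nelements, sendcount, comm

    REAL(8), DIMENSION(nelements), INTENT(INOUT) ::  f_inv, work

    INTEGER ::  ierr

!
!-- f_inv and work are assumed to be present on the device (e.g. via an
!-- enclosing "!$acc data" region). host_data passes their device
!-- addresses to the CUDA-aware MPI library, so no host copies are needed.
!$acc host_data use_device( f_inv, work )
    CALL MPI_ALLTOALL( f_inv, sendcount, MPI_DOUBLE_PRECISION,                 &
                       work,  sendcount, MPI_DOUBLE_PRECISION,                 &
                       comm, ierr )
!$acc end host_data

 END SUBROUTINE transpose_alltoall
}}}

The same {{{host_data use_device}}} pattern is presumably what the provided {{{exchange_horiz}}} replacement applies to the ghost-point send/receive buffers.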
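For reference, the cuFFT-from-OpenACC call pattern behind a system-specific FFT looks roughly as follows. This is only a sketch assuming the PGI/NVIDIA {{{cufft}}} interface module (resolved at link time by {{{-lcufft}}}); it is not the code that {{{-D__cuda_fft}}} activates in {{{poisfft}}}, and the routine name, array name and sizes are invented for illustration.

{{{
#!fortran
 SUBROUTINE fft_forward_cuda( ar, nx, batch )

    USE cufft   ! PGI/NVIDIA cuFFT interface module; link with -lcufft

    IMPLICIT NONE

    INTEGER, INTENT(IN) ::  nx, batch

    COMPLEX(8), DIMENSION(nx,batch), INTENT(INOUT) ::  ar

    INTEGER ::  istat, plan

!
!-- Batched 1d double-complex plan
    istat = cufftPlan1d( plan, nx, CUFFT_Z2Z, batch )

!
!-- ar is assumed to be present on the device; host_data passes its device
!-- address to cuFFT for an in-place forward transform
!$acc host_data use_device( ar )
    istat = cufftExecZ2Z( plan, ar, ar, CUFFT_FORWARD )
!$acc end host_data

    istat = cufftDestroy( plan )

 END SUBROUTINE fft_forward_cuda
}}}

If a pattern like this works in single-GPU mode but the divergence is not reduced in the MPI case, one hedged guess to check first would be whether each rank transforms the data that the device-side transpositions actually produced, or a stale host copy.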
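Regarding the last question, the kind of variants worth benchmarking can be illustrated with a generic loop nest; this is not a loop taken from PALM, and the array names, bounds and the constant are placeholders.

{{{
#!fortran
 SUBROUTINE tune_loop_example( tend, u, nz, nx )

    IMPLICIT NONE

    INTEGER, INTENT(IN) ::  nz, nx

    REAL(8), DIMENSION(nz,nx), INTENT(IN)    ::  u
    REAL(8), DIMENSION(nz,nx), INTENT(INOUT) ::  tend

    INTEGER ::  i, k

!
!-- Variant 1: one gang per i, vector lanes over k (the current style)
!$acc parallel loop gang present( tend, u )
    DO  i = 1, nx
!$acc loop vector
       DO  k = 1, nz
          tend(k,i) = tend(k,i) + 0.5_8 * u(k,i)
       ENDDO
    ENDDO

!
!-- Variant 2 to benchmark against: collapse both loops and set the
!-- vector length explicitly
!$acc parallel loop collapse(2) vector_length(128) present( tend, u )
    DO  i = 1, nx
       DO  k = 1, nz
          tend(k,i) = tend(k,i) + 0.5_8 * u(k,i)
       ENDDO
    ENDDO

 END SUBROUTINE tune_loop_example
}}}

Other knobs worth trying are {{{num_gangs}}}, {{{worker}}} parallelism and different {{{vector_length}}} values; a profiler run (e.g. {{{nvprof}}}) shows which variant keeps the GPU occupied best.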