| 13 | |---------------- |
| 14 | {{{#!td style="vertical-align:top;width: 50px" |
| 15 | 09/03/13 |
| 16 | }}} |
| 17 | {{{#!td style="vertical-align:top;width: 50px" |
| 18 | SR |
| 19 | }}} |
| 20 | {{{#!td style="vertical-align:top;width: 75px" |
| 21 | r1111 |
| 22 | }}} |
| 23 | {{{#!td style="vertical-align:top" |
| 24 | 3.9 |
| 25 | }}} |
| 26 | {{{#!td style="vertical-align:top" |
| 27 | N, C, B |
| 28 | }}} |
| 29 | {{{#!td style="vertical-align:top" |
| 30 | '''New:'''\\ |
| 31 | GPU porting of {{{pres}}} and {{{swap_timelevel}}}. Further porting of {{{poisfft}}} (including the tridiagonal solver), which now runs completely on the GPU for serial and parallel runs; the only remaining host/device data transfer is before and after the MPI transpositions in parallel runs. The tridiagonal routines have been split into external subroutines (instead of using embedded routines with {{{CONTAINS}}}). There is no longer any distinction between parallel and non-parallel runs in {{{poisfft}}} and {{{tridia}}}; the respective preprocessor directives have been removed. The tridia routines have been moved to the end of file {{{poisfft.f90}}} because of a probable bug in the PGI compiler 12.5 (otherwise "invalid device function" is reported at runtime). Resorting from/to array {{{work}}} has been changed in the {{{transpose}}} routines. {{{work}}} now has 4 dimensions instead of 1. Adjustments of OpenACC directives. Output of accelerator board information. (cuda_fft_interfaces, fft_xy, flow_statistics, header, init_3d_model, palm, poisfft, pres, prognostic_equations, swap_timelevel, time_integration, transpose, .mrun.config.imuk_gpu) |
| 32 | |
| 33 | Optimization of {{{tridia}}} routines: constant elements and coefficients of array {{{tri}}} are stored in separate arrays {{{ddzuw}}} and {{{tric}}} and are calculated only once at the beginning. The last dimension of {{{tri}}} has been reduced from 5 to 2. Routine {{{poisfft_init}}} is now called internally from {{{poisfft}}}. (Makefile, Makefile_check, init_grid, init_3d_model, modules, palm, poisfft, poisfft_hybrid) |
| 34 | |
| 35 | CPU time per grid point and timestep is now output to the {{{CPU_MEASURES}}} file. (cpu_statistics, modules, time_integration) |
| 36 | |
| 37 | '''Changed:'''\\ |
| 38 | Array {{{diss}}} is allocated only if required. (init_3d_model) |
| 39 | |
| 40 | Pressure boundary condition "Neumann+inhomo" has been removed from the code. (check_parameters, header, poisfft, poisfft_hybrid, pres) |
| 41 | |
| 42 | '''Bugfixes:'''\\ |
| 43 | Missing dependency added for {{{cuda_fft_interfaces}}}. (Makefile) |
| 44 | |
| 45 | CUDA FFT plans adjusted for domain decomposition (previously they were always defined for the total domain). (fft_xy) |
| 46 | }}} |