| 13 | |---------------- |
| 14 | {{{#!td style="vertical-align:top;width: 50px" |
| 15 | 10/09/13 |
| 16 | }}} |
| 17 | {{{#!td style="vertical-align:top;width: 50px" |
| 18 | SR |
| 19 | }}} |
| 20 | {{{#!td style="vertical-align:top;width: 75px" |
| 21 | r1221 |
| 22 | }}} |
| 23 | {{{#!td style="vertical-align:top" |
| 24 | 3.9 |
| 25 | }}} |
| 26 | {{{#!td style="vertical-align:top" |
| 27 | N, C, B |
| 28 | }}} |
| 29 | {{{#!td style="vertical-align:top" |
| 30 | '''New:'''\\ |
| 31 | openACC porting of reduction operations. An accelerator version of {{{flow_statistics}}} with the modified loop structure k,i,j has been implemented. It is activated with the preprocessor flag {{{-D__openacc}}}. The separate accelerator version is required because, so far, the openACC standard only allows reduction operations on simple scalars. Since {{{flow_statistics}}} uses 1D-vectors along k, these had to be replaced by scalars, and k now has to be the outermost loop. Additional 3D-flag arrays have been introduced to replace the 2D-index arrays {{{nzb_s_inner}}} and {{{nzb_diff_s_inner}}} in routines {{{pres}}} and {{{flow_statistics}}}. The respective "global-sum" loops run from {{{k = nzb}}}. Within the loops, values at grid points below the surface (topography) are multiplied by zero and all others by one, using the flag array {{{rflags_invers}}}. This array is dimensioned (j,i,k) to allow for better cache usage in the loops of the accelerator version of {{{flow_statistics}}}. |
| 32 | (flow_statistics, init_grid, init_3d_model, modules, palm, pres, time_integration) |
| 33 | |
| 34 | |
| 35 | '''Changed:'''\\ |
| 36 | For PGI/openACC performance reasons (PGI compiler version 13.6, CUDA 5.0), the default compile options have been set to "{{{-ta=nocache}}}", which gives a speed-up of about 10-20%. For the same reason, the environment variable {{{PGI_ACC_SYNCHRONOUS}}} is set to 1 in the simple run script, which improves performance by about 80%. |
| 37 | (MAKE.inc.pgi.openacc, palm_simple_run) |
| 38 | |
| 39 | The type of the flag array {{{wall_flags_0}}}, used in the Wicker-Skamarock scheme for advection of the vertical wind component, has been changed to 32-bit {{{INTEGER}}}. An additional array {{{wall_flags_00}}} has been introduced to hold flag bits 32-63. This is required because the previously used {{{KIND = SELECTED_INT_KIND(11)}}} caused wrong results with openACC. |
| 40 | (advec_ws, init_grid, modules, palm) |
| 41 | |
| 42 | '''Bugfix:'''\\ |
| 43 | The dummy argument {{{tri}}} in the 1d-routines has been replaced by {{{tri_for_1d}}} because of a name conflict with the array {{{tri}}} in module {{{arrays_3d}}}. (tridia_solver) |
| 44 | }}} |