source: palm/trunk/SOURCE/transpose.f90 @ 1257

Last change on this file since 1257 was 1257, checked in by raasch, 10 years ago

New:
---

openACC porting of timestep calculation
(modules, timestep, time_integration)

Changed:
---

openACC loop directives and vector clauses removed (because they do not give any performance improvement with PGI
compiler versions > 13.6)
(advec_ws, buoyancy, coriolis, diffusion_e, diffusion_s, diffusion_u, diffusion_v, diffusion_w, diffusivities, exchange_horiz, fft_xy, pres, production_e, transpose, tridia_solver, wall_fluxes)

openACC loop independent clauses added
(boundary_conds, prandtl_fluxes, pres)

openACC declare create statements moved after FORTRAN declaration statement
(diffusion_u, diffusion_v, diffusion_w, fft_xy, poisfft, production_e, tridia_solver)

openACC end parallel replaced by end parallel loop
(flow_statistics, pres)

openACC "kernels do" replaced by "kernels loop"
(prandtl_fluxes)

output format for theta* changed to avoid printing '*' when the value exceeds the field width
(run_control)

Errors:
---

bugfix for calculation of the advective timestep (the old version may cause wrong timesteps in case of
vertically stretched grids)
Attention: standard run-control output has changed!
(timestep)
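
(For illustration only: the following sketches the generic CFL-type reasoning behind this fix, not the actual code in timestep.f90. With a vertically stretched grid the advective limit has to be evaluated with the local spacing dz(k); all names below are placeholders.)

    dt_advec = HUGE( 1.0 )
    DO  k = nzb+1, nzt
       dt_advec = MIN( dt_advec,                                &
                       dx    / MAX( ABS( u_max ),    1.0E-10 ), &
                       dy    / MAX( ABS( v_max ),    1.0E-10 ), &
                       dz(k) / MAX( ABS( w_max(k) ), 1.0E-10 ) )
    ENDDO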

  • Property svn:keywords set to Id
File size: 25.9 KB
[1216]1 SUBROUTINE resort_for_xy( f_in, f_inv )
[1]2
[1036]3!--------------------------------------------------------------------------------!
4! This file is part of PALM.
5!
6! PALM is free software: you can redistribute it and/or modify it under the terms
7! of the GNU General Public License as published by the Free Software Foundation,
8! either version 3 of the License, or (at your option) any later version.
9!
10! PALM is distributed in the hope that it will be useful, but WITHOUT ANY
11! WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
12! A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
13!
14! You should have received a copy of the GNU General Public License along with
15! PALM. If not, see <http://www.gnu.org/licenses/>.
16!
17! Copyright 1997-2012  Leibniz University Hannover
18!--------------------------------------------------------------------------------!
19!
[484]20! Current revisions:
[1]21! -----------------
[1257]22! openacc loop and loop vector clauses removed
[198]23!
24! Former revisions:
25! -----------------
26! $Id: transpose.f90 1257 2013-11-08 15:18:40Z raasch $
27!
[1217]28! 1216 2013-08-26 09:31:42Z raasch
29! re-sorting of the transposed / to be transposed arrays moved to separate
30! routines resort_for_...
31!
[1112]32! 1111 2013-03-08 23:54:10Z raasch
33! openACC directives added,
34! resorting data from/to work changed, work got 4 dimensions instead of 1
35!
[1107]36! 1106 2013-03-04 05:31:38Z raasch
37! preprocessor lines rearranged so that routines can also be used in serial
38! (non-parallel) mode
39!
[1093]40! 1092 2013-02-02 11:24:22Z raasch
41! unused variables removed
42!
[1037]43! 1036 2012-10-22 13:43:42Z raasch
44! code put under GPL (PALM 3.9)
45!
[1004]46! 1003 2012-09-14 14:35:53Z raasch
47! indices nxa, nya, etc. replaced by nx, ny, etc.
48!
[684]49! 683 2011-02-09 14:25:15Z raasch
50! openMP parallelization of transpositions for 2d-domain-decomposition
51!
[623]52! 622 2010-12-10 08:08:13Z raasch
53! optional barriers included in order to speed up collective operations
54!
[198]55! 164 2008-05-15 08:46:15Z raasch
[164]56! f_inv changed from subroutine argument to automatic array in order to do
57! re-ordering from f_in to f_inv in one step, one array work is needed instead
58! of work1 and work2
[1]59!
[198]60! February 2007
[3]61! RCS Log replaced by Id keyword, revision history cleaned up
62!
[1]63! Revision 1.2  2004/04/30 13:12:17  raasch
64! Switched from mpi_alltoallv to the simpler mpi_alltoall,
65! all former transpose-routine files collected in this file, enlarged
66! transposition arrays introduced
67!
68! Revision 1.1  2004/04/30 13:08:16  raasch
69! Initial revision (collection of former routines transpose_xy, transpose_xz,
70!                   transpose_yx, transpose_yz, transpose_zx, transpose_zy)
71!
72! Revision 1.1  1997/07/24 11:25:18  raasch
73! Initial revision
74!
[1216]75!------------------------------------------------------------------------------!
76! Description:
77! ------------
78! Resorting data for the transposition from x to y. The transposition itself
79! is carried out in transpose_xy
80!------------------------------------------------------------------------------!
81
82     USE indices
83     USE transpose_indices
84
85     IMPLICIT NONE
86
87     REAL ::  f_in(0:nx,nys_x:nyn_x,nzb_x:nzt_x)
88     REAL ::  f_inv(nys_x:nyn_x,nzb_x:nzt_x,0:nx)
89
90
91     INTEGER ::  i, j, k
92
[1]93!
[1216]94!-- Rearrange indices of the input array so that the data to be sent
95!-- by MPI is contiguous
96    !$OMP  PARALLEL PRIVATE ( i, j, k )
97    !$OMP  DO
98    !$acc kernels present( f_in, f_inv )
99     DO  i = 0, nx
100         DO  k = nzb_x, nzt_x
101             DO  j = nys_x, nyn_x
102                 f_inv(j,k,i) = f_in(i,j,k)
103             ENDDO
104         ENDDO
105     ENDDO
106     !$acc end kernels
107     !$OMP  END PARALLEL
108
109 END SUBROUTINE resort_for_xy
110
111
112 SUBROUTINE transpose_xy( f_inv, f_out )
113
114!------------------------------------------------------------------------------!
[1]115! Description:
116! ------------
117! Transposition of input array (f_in) from x to y. For the input array, all
118! elements along x reside on the same PE, while after transposition, all
119! elements along y reside on the same PE.
120!------------------------------------------------------------------------------!
121
122    USE cpulog
123    USE indices
124    USE interfaces
125    USE pegrid
126    USE transpose_indices
127
128    IMPLICIT NONE
129
[1111]130    INTEGER ::  i, j, k, l, ys
[1]131   
[1216]132    REAL ::  f_inv(nys_x:nyn_x,nzb_x:nzt_x,0:nx), f_out(0:ny,nxl_y:nxr_y,nzb_y:nzt_y)
[1]133
[1111]134    REAL, DIMENSION(nyn_x-nys_x+1,nzb_y:nzt_y,nxl_y:nxr_y,0:pdims(2)-1) ::  work
135
136
[1106]137    IF ( numprocs /= 1 )  THEN
138
139#if defined( __parallel )
[1]140!
[1106]141!--    Transpose array
142       CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
143       IF ( collective_wait )  CALL MPI_BARRIER( comm2d, ierr )
[1111]144       !$acc update host( f_inv )
145       CALL MPI_ALLTOALL( f_inv(nys_x,nzb_x,0),  sendrecvcount_xy, MPI_REAL, &
146                          work(1,nzb_y,nxl_y,0), sendrecvcount_xy, MPI_REAL, &
[1106]147                          comm1dy, ierr )
148       CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
[1]149
150!
[1106]151!--    Reorder transposed array
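!--    (the last dimension of work addresses the block received from PE l of
!--     communicator comm1dy)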
[1111]152!$OMP  PARALLEL PRIVATE ( i, j, k, l, ys )
[683]153!$OMP  DO
[1216]154       !$acc data copyin( work )
[1106]155       DO  l = 0, pdims(2) - 1
156          ys = 0 + l * ( nyn_x - nys_x + 1 )
[1111]157          !$acc kernels present( f_out, work )
[1106]158          DO  i = nxl_y, nxr_y
159             DO  k = nzb_y, nzt_y
160                DO  j = ys, ys + nyn_x - nys_x
[1111]161                   f_out(j,i,k) = work(j-ys+1,k,i,l)
[1106]162                ENDDO
[1]163             ENDDO
164          ENDDO
[1111]165          !$acc end kernels
[1]166       ENDDO
[1216]167       !$acc end data
[683]168!$OMP  END PARALLEL
[1]169#endif
170
[1106]171    ELSE
172
173!
174!--    Reorder transposed array
175!$OMP  PARALLEL PRIVATE ( i, j, k )
176!$OMP  DO
[1216]177       !$acc kernels present( f_inv, f_out )
[1106]178       DO  k = nzb_y, nzt_y
179          DO  i = nxl_y, nxr_y
180             DO  j = 0, ny
181                f_out(j,i,k) = f_inv(j,k,i)
182             ENDDO
183          ENDDO
184       ENDDO
[1111]185       !$acc end kernels
[1106]186!$OMP  END PARALLEL
187
188    ENDIF
189
[1]190 END SUBROUTINE transpose_xy
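!
!-- Note (illustration only): since r1216 the re-sorting and the transposition
!-- are separate routines, so a caller invokes them in sequence, e.g. for the
!-- x-to-y direction (ar, ar_inv, ar_tr are placeholder names, not the actual
!-- poisfft code):
!--    CALL resort_for_xy( ar, ar_inv )
!--    CALL transpose_xy( ar_inv, ar_tr )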
191
192
[1216]193 SUBROUTINE resort_for_xz( f_inv, f_out )
[1]194
195!------------------------------------------------------------------------------!
196! Description:
197! ------------
[1216]198! Resorting data after the transposition from x to z. The transposition itself
199! is carried out in transpose_xz
200!------------------------------------------------------------------------------!
201
202     USE indices
203     USE transpose_indices
204
205     IMPLICIT NONE
206
207     REAL ::  f_inv(nys:nyn,nxl:nxr,1:nz)
208     REAL ::  f_out(1:nz,nys:nyn,nxl:nxr)
209
210
211     INTEGER ::  i, j, k
212
213!
214!-- Rearrange indices of the input array so that the data to be sent
215!-- by MPI is contiguous.
216!-- In case of parallel fft/transposition, a scattered store is faster in
217!-- the backward direction.
218    !$OMP  PARALLEL PRIVATE ( i, j, k )
219    !$OMP  DO
220    !$acc kernels present( f_inv, f_out )
221     DO  k = 1, nz
222         DO  i = nxl, nxr
223             DO  j = nys, nyn
224                 f_out(k,j,i) = f_inv(j,i,k)
225             ENDDO
226         ENDDO
227     ENDDO
228     !$acc end kernels
229     !$OMP  END PARALLEL
230
231 END SUBROUTINE resort_for_xz
232
233
234 SUBROUTINE transpose_xz( f_in, f_inv )
235
236!------------------------------------------------------------------------------!
237! Description:
238! ------------
[1]239! Transposition of input array (f_in) from x to z. For the input array, all
240! elements along x reside on the same PE, while after transposition, all
241! elements along z reside on the same PE.
242!------------------------------------------------------------------------------!
243
244    USE cpulog
245    USE indices
246    USE interfaces
247    USE pegrid
248    USE transpose_indices
249
250    IMPLICIT NONE
251
[1111]252    INTEGER ::  i, j, k, l, xs
[1]253   
[1216]254    REAL ::  f_in(0:nx,nys_x:nyn_x,nzb_x:nzt_x), f_inv(nys:nyn,nxl:nxr,1:nz)
[1]255
[1111]256    REAL, DIMENSION(nys_x:nyn_x,nnx,nzb_x:nzt_x,0:pdims(1)-1) ::  work
[1]257
[1111]258
[1]259!
260!-- If the PE grid is one-dimensional along y, the array has only to be
261!-- reordered locally and therefore no transposition has to be done.
262    IF ( pdims(1) /= 1 )  THEN
[1106]263
264#if defined( __parallel )
[1]265!
266!--    Reorder input array for transposition
[1111]267!$OMP  PARALLEL PRIVATE ( i, j, k, l, xs )
[683]268!$OMP  DO
[1216]269       !$acc data copyout( work )
[1]270       DO  l = 0, pdims(1) - 1
271          xs = 0 + l * nnx
[1111]272          !$acc kernels present( f_in, work )
[1003]273          DO  k = nzb_x, nzt_x
[164]274             DO  i = xs, xs + nnx - 1
[1003]275                DO  j = nys_x, nyn_x
[1111]276                   work(j,i-xs+1,k,l) = f_in(i,j,k)
[1]277                ENDDO
278             ENDDO
279          ENDDO
[1111]280          !$acc end kernels
[1]281       ENDDO
[1216]282       !$acc end data
[683]283!$OMP  END PARALLEL
[1]284
285!
286!--    Transpose array
287       CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
[622]288       IF ( collective_wait )  CALL MPI_BARRIER( comm2d, ierr )
[1111]289       CALL MPI_ALLTOALL( work(nys_x,1,nzb_x,0), sendrecvcount_zx, MPI_REAL, &
290                          f_inv(nys,nxl,1),      sendrecvcount_zx, MPI_REAL, &
[1]291                          comm1dx, ierr )
[1111]292       !$acc update device( f_inv )
[1]293       CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
[1106]294#endif
295
[1]296    ELSE
[1106]297
[1]298!
299!--    Reorder the array so that the z index comes first
[683]300!$OMP  PARALLEL PRIVATE ( i, j, k )
301!$OMP  DO
[1216]302       !$acc kernels present( f_in, f_inv )
[1003]303       DO  i = nxl, nxr
304          DO  j = nys, nyn
305             DO  k = 1, nz
[164]306                f_inv(j,i,k) = f_in(i,j,k)
[1]307             ENDDO
308          ENDDO
309       ENDDO
[1111]310       !$acc end kernels
[683]311!$OMP  END PARALLEL
[1]312
[164]313    ENDIF
314
[1]315 END SUBROUTINE transpose_xz
316
317
[1216]318 SUBROUTINE resort_for_yx( f_inv, f_out )
[1]319
320!------------------------------------------------------------------------------!
321! Description:
322! ------------
[1216]323! Resorting data after the transposition from y to x. The transposition itself
324! is carried out in transpose_yx
325!------------------------------------------------------------------------------!
326
327     USE indices
328     USE transpose_indices
329
330     IMPLICIT NONE
331
332     REAL ::  f_inv(nys_x:nyn_x,nzb_x:nzt_x,0:nx)
333     REAL ::  f_out(0:nx,nys_x:nyn_x,nzb_x:nzt_x)
334
335
336     INTEGER ::  i, j, k
337
338!
339!-- Rearrange indices of the input array so that the data to be sent
340!-- by MPI is contiguous
341    !$OMP  PARALLEL PRIVATE ( i, j, k )
342    !$OMP  DO
343    !$acc kernels present( f_inv, f_out )
344     DO  i = 0, nx
345         DO  k = nzb_x, nzt_x
346             DO  j = nys_x, nyn_x
347                 f_out(i,j,k) = f_inv(j,k,i)
348             ENDDO
349         ENDDO
350     ENDDO
351     !$acc end kernels
352     !$OMP  END PARALLEL
353
354 END SUBROUTINE resort_for_yx
355
356
357 SUBROUTINE transpose_yx( f_in, f_inv )
358
359!------------------------------------------------------------------------------!
360! Description:
361! ------------
[1]362! Transposition of input array (f_in) from y to x. For the input array, all
363! elements along y reside on the same PE, while after transposition, all
364! elements along x reside on the same PE.
365!------------------------------------------------------------------------------!
366
367    USE cpulog
368    USE indices
369    USE interfaces
370    USE pegrid
371    USE transpose_indices
372
373    IMPLICIT NONE
374
[1111]375    INTEGER ::  i, j, k, l, ys
[1]376   
[1216]377    REAL ::  f_in(0:ny,nxl_y:nxr_y,nzb_y:nzt_y), f_inv(nys_x:nyn_x,nzb_x:nzt_x,0:nx)
[1]378
[1111]379    REAL, DIMENSION(nyn_x-nys_x+1,nzb_y:nzt_y,nxl_y:nxr_y,0:pdims(2)-1) ::  work
380
381
[1106]382    IF ( numprocs /= 1 )  THEN
383
[1]384#if defined( __parallel )
385!
[1106]386!--    Reorder input array for transposition
[1111]387!$OMP  PARALLEL PRIVATE ( i, j, k, l, ys )
[683]388!$OMP  DO
[1216]389       !$acc data copyout( work )
[1106]390       DO  l = 0, pdims(2) - 1
391          ys = 0 + l * ( nyn_x - nys_x + 1 )
[1111]392          !$acc kernels present( f_in, work )
[1106]393          DO  i = nxl_y, nxr_y
394             DO  k = nzb_y, nzt_y
395                DO  j = ys, ys + nyn_x - nys_x
[1111]396                   work(j-ys+1,k,i,l) = f_in(j,i,k)
[1106]397                ENDDO
398             ENDDO
399          ENDDO
[1111]400          !$acc end kernels
[1106]401       ENDDO
[1216]402       !$acc end data
[1106]403!$OMP  END PARALLEL
404
405!
406!--    Transpose array
407       CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
408       IF ( collective_wait )  CALL MPI_BARRIER( comm2d, ierr )
[1111]409       CALL MPI_ALLTOALL( work(1,nzb_y,nxl_y,0), sendrecvcount_xy, MPI_REAL, &
410                          f_inv(nys_x,nzb_x,0),  sendrecvcount_xy, MPI_REAL, &
[1106]411                          comm1dy, ierr )
[1111]412       !$acc update device( f_inv )
[1106]413       CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
414#endif
415
416    ELSE
417
418!
419!--    Reorder array f_in the same way as ALLTOALL did it
420!$OMP  PARALLEL PRIVATE ( i, j, k )
421!$OMP  DO
[1216]422       !$acc kernels present( f_in, f_inv )
[1003]423       DO  i = nxl_y, nxr_y
424          DO  k = nzb_y, nzt_y
[1106]425             DO  j = 0, ny
426                f_inv(j,k,i) = f_in(j,i,k)
[1]427             ENDDO
428          ENDDO
429       ENDDO
[1111]430       !$acc end kernels
[683]431!$OMP  END PARALLEL
[1]432
[1106]433    ENDIF
[1]434
435 END SUBROUTINE transpose_yx
436
437
[1216]438 SUBROUTINE transpose_yxd( f_in, f_out )
[1]439
440!------------------------------------------------------------------------------!
441! Description:
442! ------------
443! Transposition of input array (f_in) from y to x. For the input array, all
444! elements along y reside on the same PE, while after transposition, all
445! elements along x reside on the same PE.
446! This is a direct transposition for arrays with indices in regular order
447! (k,j,i) (cf. transpose_yx).
448!------------------------------------------------------------------------------!
449
450    USE cpulog
451    USE indices
452    USE interfaces
453    USE pegrid
454    USE transpose_indices
455
456    IMPLICIT NONE
457
[1092]458    INTEGER ::  i, j, k, l, m, xs
[1]459
[1003]460    REAL ::  f_in(1:nz,nys:nyn,nxl:nxr), f_inv(nxl:nxr,1:nz,nys:nyn), &
461             f_out(0:nx,nys_x:nyn_x,nzb_x:nzt_x),                     &
[164]462             work(nnx*nny*nnz)
[1]463
464#if defined( __parallel )
465
466!
467!-- Rearrange indices of the input array so that the data to be sent
468!-- by MPI is contiguous
[1003]469    DO  k = 1, nz
470       DO  j = nys, nyn
471          DO  i = nxl, nxr
[164]472             f_inv(i,k,j) = f_in(k,j,i)
[1]473          ENDDO
474       ENDDO
475    ENDDO
476
477!
478!-- Transpose array
479    CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
[622]480    IF ( collective_wait )  CALL MPI_BARRIER( comm2d, ierr )
[1]481    CALL MPI_ALLTOALL( f_inv(nxl,1,nys), sendrecvcount_xy, MPI_REAL, &
[164]482                       work(1),          sendrecvcount_xy, MPI_REAL, &
[1]483                       comm1dx, ierr )
484    CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
485
486!
487!-- Reorder transposed array
488    m = 0
489    DO  l = 0, pdims(1) - 1
490       xs = 0 + l * nnx
[1003]491       DO  j = nys_x, nyn_x
492          DO  k = 1, nz
[1]493             DO  i = xs, xs + nnx - 1
494                m = m + 1
[164]495                f_out(i,j,k) = work(m)
[1]496             ENDDO
497          ENDDO
498       ENDDO
499    ENDDO
500
501#endif
502
503 END SUBROUTINE transpose_yxd
504
505
[1216]506 SUBROUTINE resort_for_yz( f_in, f_inv )
[1]507
508!------------------------------------------------------------------------------!
509! Description:
510! ------------
[1216]511! Resorting data for the transposition from y to z. The transposition itself
512! is carried out in transpose_yz
513!------------------------------------------------------------------------------!
514
515     USE indices
516     USE transpose_indices
517
518     IMPLICIT NONE
519
520     REAL ::  f_in(0:ny,nxl_y:nxr_y,nzb_y:nzt_y)
521     REAL ::  f_inv(nxl_y:nxr_y,nzb_y:nzt_y,0:ny)
522
523
524     INTEGER ::  i, j, k
525
526!
527!-- Rearrange indices of the input array so that the data to be sent
528!-- by MPI is contiguous
529    !$OMP  PARALLEL PRIVATE ( i, j, k )
530    !$OMP  DO
531    !$acc kernels present( f_in, f_inv )
532     DO  j = 0, ny
533         DO  k = nzb_y, nzt_y
534             DO  i = nxl_y, nxr_y
535                 f_inv(i,k,j) = f_in(j,i,k)
536             ENDDO
537         ENDDO
538     ENDDO
539     !$acc end kernels
540     !$OMP  END PARALLEL
541
542 END SUBROUTINE resort_for_yz
543
544
545 SUBROUTINE transpose_yz( f_inv, f_out )
546
547!------------------------------------------------------------------------------!
548! Description:
549! ------------
[1]550! Transposition of input array (f_in) from y to z. For the input array, all
551! elements along y reside on the same PE, while after transposition, all
552! elements along z reside on the same PE.
553!------------------------------------------------------------------------------!
554
555    USE cpulog
556    USE indices
557    USE interfaces
558    USE pegrid
559    USE transpose_indices
560
561    IMPLICIT NONE
562
[1111]563    INTEGER ::  i, j, k, l, zs
[1]564   
[1216]565    REAL ::  f_inv(nxl_y:nxr_y,nzb_y:nzt_y,0:ny), f_out(nxl_z:nxr_z,nys_z:nyn_z,1:nz)
[1]566
[1111]567    REAL, DIMENSION(nxl_z:nxr_z,nzt_y-nzb_y+1,nys_z:nyn_z,0:pdims(1)-1) ::  work
568
569
[1]570!
571!-- If the PE grid is one-dimensional along y, only local reordering
572!-- of the data is necessary and no transposition has to be done.
573    IF ( pdims(1) == 1 )  THEN
[1106]574
[683]575!$OMP  PARALLEL PRIVATE ( i, j, k )
576!$OMP  DO
[1216]577       !$acc kernels present( f_inv, f_out )
[1003]578       DO  j = 0, ny
579          DO  k = nzb_y, nzt_y
580             DO  i = nxl_y, nxr_y
[164]581                f_out(i,j,k) = f_inv(i,k,j)
[1]582             ENDDO
583          ENDDO
584       ENDDO
[1111]585       !$acc end kernels
[683]586!$OMP  END PARALLEL
[1]587
[1106]588    ELSE
589
590#if defined( __parallel )
[1]591!
[1106]592!--    Transpose array
593       CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
594       IF ( collective_wait )  CALL MPI_BARRIER( comm2d, ierr )
[1111]595       !$acc update host( f_inv )
596       CALL MPI_ALLTOALL( f_inv(nxl_y,nzb_y,0),  sendrecvcount_yz, MPI_REAL, &
597                          work(nxl_z,1,nys_z,0), sendrecvcount_yz, MPI_REAL, &
[1106]598                          comm1dx, ierr )
599       CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
[1]600
601!
[1106]602!--    Reorder transposed array
[1111]603!$OMP  PARALLEL PRIVATE ( i, j, k, l, zs )
[683]604!$OMP  DO
[1216]605       !$acc data copyin( work )
[1106]606       DO  l = 0, pdims(1) - 1
607          zs = 1 + l * ( nzt_y - nzb_y + 1 )
[1216]608          !$acc kernels present( f_out )
[1106]609          DO  j = nys_z, nyn_z
610             DO  k = zs, zs + nzt_y - nzb_y
611                DO  i = nxl_z, nxr_z
[1111]612                   f_out(i,j,k) = work(i,k-zs+1,j,l)
[1106]613                ENDDO
[1]614             ENDDO
615          ENDDO
[1111]616          !$acc end kernels
[1]617       ENDDO
[1216]618       !$acc end data
[683]619!$OMP  END PARALLEL
[1]620#endif
621
[1106]622   ENDIF
623
[1]624 END SUBROUTINE transpose_yz
625
626
[1216]627 SUBROUTINE resort_for_zx( f_in, f_inv )
[1]628
629!------------------------------------------------------------------------------!
630! Description:
631! ------------
[1216]632! Resorting data for the transposition from z to x. The transposition itself
633! is carried out in transpose_zx
634!------------------------------------------------------------------------------!
635
636     USE indices
637     USE transpose_indices
638
639     IMPLICIT NONE
640
641     REAL ::  f_in(1:nz,nys:nyn,nxl:nxr)
642     REAL ::  f_inv(nys:nyn,nxl:nxr,1:nz)
643
644
645     INTEGER ::  i, j, k
646
647!
648!-- Rearrange indices of the input array so that the data to be sent
649!-- by MPI is contiguous
650    !$OMP  PARALLEL PRIVATE ( i, j, k )
651    !$OMP  DO
652    !$acc kernels present( f_in, f_inv )
653     DO  k = 1,nz
654         DO  i = nxl, nxr
655             DO  j = nys, nyn
656                 f_inv(j,i,k) = f_in(k,j,i)
657             ENDDO
658         ENDDO
659     ENDDO
660     !$acc end kernels
661     !$OMP  END PARALLEL
662
663 END SUBROUTINE resort_for_zx
664
665
666 SUBROUTINE transpose_zx( f_inv, f_out )
667
668!------------------------------------------------------------------------------!
669! Description:
670! ------------
[1]671! Transposition of input array (f_in) from z to x. For the input array, all
672! elements along z reside on the same PE, while after transposition, all
673! elements along x reside on the same PE.
674!------------------------------------------------------------------------------!
675
676    USE cpulog
677    USE indices
678    USE interfaces
679    USE pegrid
680    USE transpose_indices
681
682    IMPLICIT NONE
683
[1111]684    INTEGER ::  i, j, k, l, xs
[1]685   
[1216]686    REAL ::  f_inv(nys:nyn,nxl:nxr,1:nz), f_out(0:nx,nys_x:nyn_x,nzb_x:nzt_x)
[1]687
[1111]688    REAL, DIMENSION(nys_x:nyn_x,nnx,nzb_x:nzt_x,0:pdims(1)-1) ::  work
689
[1]690
691!
692!-- If the PE grid is one-dimensional along y, only local reordering
693!-- of the data is necessary and no transposition has to be done.
694    IF ( pdims(1) == 1 )  THEN
[1106]695
[683]696!$OMP  PARALLEL PRIVATE ( i, j, k )
697!$OMP  DO
[1216]698       !$acc kernels present( f_inv, f_out )
[1003]699       DO  k = 1, nz
700          DO  i = nxl, nxr
701             DO  j = nys, nyn
[164]702                f_out(i,j,k) = f_inv(j,i,k)
[1]703             ENDDO
704          ENDDO
705       ENDDO
[1111]706       !$acc end kernels
[683]707!$OMP  END PARALLEL
[1]708
[1106]709    ELSE
710
711#if defined( __parallel )
[1]712!
[1106]713!--    Transpose array
714       CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
715       IF ( collective_wait )  CALL MPI_BARRIER( comm2d, ierr )
[1111]716       !$acc update host( f_inv )
717       CALL MPI_ALLTOALL( f_inv(nys,nxl,1),      sendrecvcount_zx, MPI_REAL, &
718                          work(nys_x,1,nzb_x,0), sendrecvcount_zx, MPI_REAL, &
[1106]719                          comm1dx, ierr )
720       CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
[1]721
722!
[1106]723!--    Reorder transposed array
[1111]724!$OMP  PARALLEL PRIVATE ( i, j, k, l, xs )
[683]725!$OMP  DO
[1216]726       !$acc data copyin( work )
[1106]727       DO  l = 0, pdims(1) - 1
728          xs = 0 + l * nnx
[1216]729          !$acc kernels present( f_out )
[1106]730          DO  k = nzb_x, nzt_x
731             DO  i = xs, xs + nnx - 1
732                DO  j = nys_x, nyn_x
[1111]733                   f_out(i,j,k) = work(j,i-xs+1,k,l)
[1106]734                ENDDO
[1]735             ENDDO
736          ENDDO
[1111]737          !$acc end kernels
[1]738       ENDDO
[1216]739       !$acc end data
[683]740!$OMP  END PARALLEL
[1]741#endif
742
[1106]743    ENDIF
744
[1]745 END SUBROUTINE transpose_zx
746
747
[1216]748 SUBROUTINE resort_for_zy( f_inv, f_out )
[1]749
750!------------------------------------------------------------------------------!
751! Description:
752! ------------
[1216]753! Resorting data after the transposition from z to y. The transposition itself
754! is carried out in transpose_zy
755!------------------------------------------------------------------------------!
756
757     USE indices
758     USE transpose_indices
759
760     IMPLICIT NONE
761
762     REAL ::  f_inv(nxl_y:nxr_y,nzb_y:nzt_y,0:ny)
763     REAL ::  f_out(0:ny,nxl_y:nxr_y,nzb_y:nzt_y)
764
765
766     INTEGER ::  i, j, k
767
768!
769!-- Rearrange indices of the input array so that the data to be sent
770!-- by MPI is contiguous
771    !$OMP  PARALLEL PRIVATE ( i, j, k )
772    !$OMP  DO
773    !$acc kernels present( f_inv, f_out )
774     DO  k = nzb_y, nzt_y
775         DO  j = 0, ny
776             DO  i = nxl_y, nxr_y
777                 f_out(j,i,k) = f_inv(i,k,j)
778             ENDDO
779         ENDDO
780     ENDDO
781     !$acc end kernels
782     !$OMP  END PARALLEL
783
784 END SUBROUTINE resort_for_zy
785
786
787 SUBROUTINE transpose_zy( f_in, f_inv )
788
789!------------------------------------------------------------------------------!
790! Description:
791! ------------
[1]792! Transposition of input array (f_in) from z to y. For the input array, all
793! elements along z reside on the same PE, while after transposition, all
794! elements along y reside on the same PE.
795!------------------------------------------------------------------------------!
796
797    USE cpulog
798    USE indices
799    USE interfaces
800    USE pegrid
801    USE transpose_indices
802
803    IMPLICIT NONE
804
[1111]805    INTEGER ::  i, j, k, l, zs
[1]806   
[1216]807    REAL ::  f_in(nxl_z:nxr_z,nys_z:nyn_z,1:nz), f_inv(nxl_y:nxr_y,nzb_y:nzt_y,0:ny)
[1]808
[1111]809    REAL, DIMENSION(nxl_z:nxr_z,nzt_y-nzb_y+1,nys_z:nyn_z,0:pdims(1)-1) ::  work
810
811
[1]812!
813!-- If the PE grid is one-dimensional along y, the array has only to be
814!-- reordered locally and therefore no transposition has to be done.
815    IF ( pdims(1) /= 1 )  THEN
[1106]816
817#if defined( __parallel )
[1]818!
819!--    Reorder input array for transposition
[1111]820!$OMP  PARALLEL PRIVATE ( i, j, k, l, zs )
[683]821!$OMP  DO
[1216]822       !$acc data copyout( work )
[1]823       DO  l = 0, pdims(1) - 1
[1003]824          zs = 1 + l * ( nzt_y - nzb_y + 1 )
[1111]825          !$acc kernels present( f_in, work )
[1003]826          DO  j = nys_z, nyn_z
827             DO  k = zs, zs + nzt_y - nzb_y
828                DO  i = nxl_z, nxr_z
[1111]829                   work(i,k-zs+1,j,l) = f_in(i,j,k)
[1]830                ENDDO
831             ENDDO
832          ENDDO
[1111]833          !$acc end kernels
[1]834       ENDDO
[1216]835       !$acc end data
[683]836!$OMP  END PARALLEL
[1]837
838!
839!--    Transpose array
840       CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
[622]841       IF ( collective_wait )  CALL MPI_BARRIER( comm2d, ierr )
[1111]842       CALL MPI_ALLTOALL( work(nxl_z,1,nys_z,0), sendrecvcount_yz, MPI_REAL, &
843                          f_inv(nxl_y,nzb_y,0),  sendrecvcount_yz, MPI_REAL, &
[1]844                          comm1dx, ierr )
[1111]845       !$acc update device( f_inv )
[1]846       CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
[1106]847#endif
[1]848
849    ELSE
850!
[1106]851!--    Reorder the array in the same way as ALLTOALL did it
[683]852!$OMP  PARALLEL PRIVATE ( i, j, k )
853!$OMP  DO
[1216]854       !$acc kernels present( f_in, f_inv )
[1003]855       DO  k = nzb_y, nzt_y
856          DO  j = 0, ny
857             DO  i = nxl_y, nxr_y
[164]858                f_inv(i,k,j) = f_in(i,j,k)
859             ENDDO
860          ENDDO
861       ENDDO
[1111]862       !$acc end kernels
[683]863!$OMP  END PARALLEL
[1106]864
865    ENDIF
866
[1]867 END SUBROUTINE transpose_zy
868
869
[1216]870 SUBROUTINE transpose_zyd( f_in, f_out )
[1]871
872!------------------------------------------------------------------------------!
873! Description:
874! ------------
875! Transposition of input array (f_in) from z to y. For the input array, all
876! elements along z reside on the same PE, while after transposition, all
877! elements along y reside on the same PE.
878! This is a direct transposition for arrays with indices in regular order
879! (k,j,i) (cf. transpose_zy).
880!------------------------------------------------------------------------------!
881
882    USE cpulog
883    USE indices
884    USE interfaces
885    USE pegrid
886    USE transpose_indices
887
888    IMPLICIT NONE
889
890    INTEGER ::  i, j, k, l, m, ys
891   
[1003]892    REAL ::  f_in(1:nz,nys:nyn,nxl:nxr), f_inv(nys:nyn,nxl:nxr,1:nz), &
893             f_out(0:ny,nxl_yd:nxr_yd,nzb_yd:nzt_yd),                 &
[164]894             work(nnx*nny*nnz)
[1]895
896#if defined( __parallel )
897
898!
899!-- Rearrange indices of the input array so that the data to be sent
900!-- by MPI is contiguous
[1003]901    DO  i = nxl, nxr
902       DO  j = nys, nyn
903          DO  k = 1, nz
[164]904             f_inv(j,i,k) = f_in(k,j,i)
[1]905          ENDDO
906       ENDDO
907    ENDDO
908
909!
910!-- Move data to different array, because memory location of work1 is
911!-- needed further below (work1 = work2).
912!-- If the PE grid is one-dimensional along x, only local reordering
913!-- of the data is necessary and no transposition has to be done.
914    IF ( pdims(2) == 1 )  THEN
[1003]915       DO  k = 1, nz
916          DO  i = nxl, nxr
917             DO  j = nys, nyn
[164]918                f_out(j,i,k) = f_inv(j,i,k)
[1]919             ENDDO
920          ENDDO
921       ENDDO
922       RETURN
923    ENDIF
924
925!
926!-- Transpose array
927    CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
[622]928    IF ( collective_wait )  CALL MPI_BARRIER( comm2d, ierr )
[1]929    CALL MPI_ALLTOALL( f_inv(nys,nxl,1), sendrecvcount_zyd, MPI_REAL, &
[164]930                       work(1),          sendrecvcount_zyd, MPI_REAL, &
[1]931                       comm1dy, ierr )
932    CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
933
934!
935!-- Reorder transposed array
936    m = 0
937    DO  l = 0, pdims(2) - 1
938       ys = 0 + l * nny
[1003]939       DO  k = nzb_yd, nzt_yd
940          DO  i = nxl_yd, nxr_yd
[1]941             DO  j = ys, ys + nny - 1
942                m = m + 1
[164]943                f_out(j,i,k) = work(m)
[1]944             ENDDO
945          ENDDO
946       ENDDO
947    ENDDO
948
949#endif
950
951 END SUBROUTINE transpose_zyd