Home

Context Navigation

transpose.f90 @ 1111

Last change on this file since 1111 was 1111, checked in by raasch, 11 years ago

New:
---

GPU porting of pres, swap_timelevel. Adjustments of openACC directives.
Further porting of poisfft, which now runs completely on GPU without any
host/device data transfer for serial an parallel runs (but parallel runs
require data transfer before and after the MPI transpositions).
GPU-porting of tridiagonal solver:
tridiagonal routines split into extermal subroutines (instead using CONTAINS),
no distinction between parallel/non-parallel in poisfft and tridia any more,
tridia routines moved to end of file because of probable bug in PGI compiler
(otherwise "invalid device function" is indicated during runtime).
(cuda_fft_interfaces, fft_xy, flow_statistics, init_3d_model, palm, poisfft, pres, prognostic_equations, swap_timelevel, time_integration, transpose)
output of accelerator board information. (header)

optimization of tridia routines: constant elements and coefficients of tri are
stored in seperate arrays ddzuw and tric, last dimension of tri reduced from 5 to 2,
(init_grid, init_3d_model, modules, palm, poisfft)

poisfft_init is now called internally from poisfft,
(Makefile, Makefile_check, init_pegrid, poisfft, poisfft_hybrid)

CPU-time per grid point and timestep is output to CPU_MEASURES file
(cpu_statistics, modules, time_integration)

Changed:

resorting from/to array work changed, work now has 4 dimensions instead of 1 (transpose)
array diss allocated only if required (init_3d_model)

pressure boundary condition "Neumann+inhomo" removed from the code
(check_parameters, header, poisfft, poisfft_hybrid, pres)

Errors:

bugfix: dependency added for cuda_fft_interfaces (Makefile)
bugfix: CUDA fft plans adjusted for domain decomposition (before they always
used total domain) (fft_xy)

Property svn:keywords set to Id

File size: 23.8 KB

Line
1	SUBROUTINE transpose_xy( f_in, work, f_out )
2
3	!--------------------------------------------------------------------------------!
4	! This file is part of PALM.
5	!
6	! PALM is free software: you can redistribute it and/or modify it under the terms
7	! of the GNU General Public License as published by the Free Software Foundation,
8	! either version 3 of the License, or (at your option) any later version.
9	!
10	! PALM is distributed in the hope that it will be useful, but WITHOUT ANY
11	! WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
12	! A PARTICULAR PURPOSE. See the GNU General Public License for more details.
13	!
14	! You should have received a copy of the GNU General Public License along with
15	! PALM. If not, see <http://www.gnu.org/licenses/>.
16	!
17	! Copyright 1997-2012 Leibniz University Hannover
18	!--------------------------------------------------------------------------------!
19	!
20	! Current revisions:
21	! -----------------
22	! openACC directives added,
23	! resorting data from/to work changed, work got 4 dimensions instead of 1
24	!
25	! Former revisions:
26	! -----------------
27	! $Id: transpose.f90 1111 2013-03-08 23:54:10Z raasch $
28	!
29	! 1106 2013-03-04 05:31:38Z raasch
30	! preprocessor lines rearranged so that routines can also be used in serial
31	! (non-parallel) mode
32	!
33	! 1092 2013-02-02 11:24:22Z raasch
34	! unused variables removed
35	!
36	! 1036 2012-10-22 13:43:42Z raasch
37	! code put under GPL (PALM 3.9)
38	!
39	! 1003 2012-09-14 14:35:53Z raasch
40	! indices nxa, nya, etc. replaced by nx, ny, etc.
41	!
42	! 683 2011-02-09 14:25:15Z raasch
43	! openMP parallelization of transpositions for 2d-domain-decomposition
44	!
45	! 622 2010-12-10 08:08:13Z raasch
46	! optional barriers included in order to speed up collective operations
47	!
48	! 164 2008-05-15 08:46:15Z raasch
49	! f_inv changed from subroutine argument to automatic array in order to do
50	! re-ordering from f_in to f_inv in one step, one array work is needed instead
51	! of work1 and work2
52	!
53	! February 2007
54	! RCS Log replace by Id keyword, revision history cleaned up
55	!
56	! Revision 1.2 2004/04/30 13:12:17 raasch
57	! Switched from mpi_alltoallv to the simpler mpi_alltoall,
58	! all former transpose-routine files collected in this file, enlarged
59	! transposition arrays introduced
60	!
61	! Revision 1.1 2004/04/30 13:08:16 raasch
62	! Initial revision (collection of former routines transpose_xy, transpose_xz,
63	! transpose_yx, transpose_yz, transpose_zx, transpose_zy)
64	!
65	! Revision 1.1 1997/07/24 11:25:18 raasch
66	! Initial revision
67	!
68	!
69	! Description:
70	! ------------
71	! Transposition of input array (f_in) from x to y. For the input array, all
72	! elements along x reside on the same PE, while after transposition, all
73	! elements along y reside on the same PE.
74	!------------------------------------------------------------------------------!
75
76	USE cpulog
77	USE indices
78	USE interfaces
79	USE pegrid
80	USE transpose_indices
81
82	IMPLICIT NONE
83
84	INTEGER :: i, j, k, l, ys
85
86	REAL :: f_in(0:nx,nys_x:nyn_x,nzb_x:nzt_x), f_out(0:ny,nxl_y:nxr_y,nzb_y:nzt_y)
87
88	REAL, DIMENSION(nyn_x-nys_x+1,nzb_y:nzt_y,nxl_y:nxr_y,0:pdims(2)-1) :: work
89
90	!$acc declare create( f_inv )
91	REAL :: f_inv(nys_x:nyn_x,nzb_x:nzt_x,0:nx)
92
93
94	!
95	!-- Rearrange indices of input array in order to make data to be send
96	!-- by MPI contiguous
97	!$OMP PARALLEL PRIVATE ( i, j, k )
98	!$OMP DO
99	!$acc kernels present( f_in )
100	!$acc loop
101	DO i = 0, nx
102	DO k = nzb_x, nzt_x
103	!$acc loop vector( 32 )
104	DO j = nys_x, nyn_x
105	f_inv(j,k,i) = f_in(i,j,k)
106	ENDDO
107	ENDDO
108	ENDDO
109	!$acc end kernels
110	!$OMP END PARALLEL
111
112	IF ( numprocs /= 1 ) THEN
113
114	#if defined( __parallel )
115	!
116	!-- Transpose array
117	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
118	IF ( collective_wait ) CALL MPI_BARRIER( comm2d, ierr )
119	!$acc update host( f_inv )
120	CALL MPI_ALLTOALL( f_inv(nys_x,nzb_x,0), sendrecvcount_xy, MPI_REAL, &
121	work(1,nzb_y,nxl_y,0), sendrecvcount_xy, MPI_REAL, &
122	comm1dy, ierr )
123	!$acc update device( work )
124	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
125
126	!
127	!-- Reorder transposed array
128	!$OMP PARALLEL PRIVATE ( i, j, k, l, ys )
129	!$OMP DO
130	DO l = 0, pdims(2) - 1
131	ys = 0 + l * ( nyn_x - nys_x + 1 )
132	!$acc kernels present( f_out, work )
133	!$acc loop
134	DO i = nxl_y, nxr_y
135	DO k = nzb_y, nzt_y
136	!$acc loop vector( 32 )
137	DO j = ys, ys + nyn_x - nys_x
138	f_out(j,i,k) = work(j-ys+1,k,i,l)
139	ENDDO
140	ENDDO
141	ENDDO
142	!$acc end kernels
143	ENDDO
144	!$OMP END PARALLEL
145	#endif
146
147	ELSE
148
149	!
150	!-- Reorder transposed array
151	!$OMP PARALLEL PRIVATE ( i, j, k )
152	!$OMP DO
153	!$acc kernels present( f_out )
154	!$acc loop
155	DO k = nzb_y, nzt_y
156	DO i = nxl_y, nxr_y
157	!$acc loop vector( 32 )
158	DO j = 0, ny
159	f_out(j,i,k) = f_inv(j,k,i)
160	ENDDO
161	ENDDO
162	ENDDO
163	!$acc end kernels
164	!$OMP END PARALLEL
165
166	ENDIF
167
168	END SUBROUTINE transpose_xy
169
170
171	SUBROUTINE transpose_xz( f_in, work, f_out )
172
173	!------------------------------------------------------------------------------!
174	! Description:
175	! ------------
176	! Transposition of input array (f_in) from x to z. For the input array, all
177	! elements along x reside on the same PE, while after transposition, all
178	! elements along z reside on the same PE.
179	!------------------------------------------------------------------------------!
180
181	USE cpulog
182	USE indices
183	USE interfaces
184	USE pegrid
185	USE transpose_indices
186
187	IMPLICIT NONE
188
189	INTEGER :: i, j, k, l, xs
190
191	REAL :: f_in(0:nx,nys_x:nyn_x,nzb_x:nzt_x), f_out(1:nz,nys:nyn,nxl:nxr)
192
193	REAL, DIMENSION(nys_x:nyn_x,nnx,nzb_x:nzt_x,0:pdims(1)-1) :: work
194
195	!$acc declare create( f_inv )
196	REAL :: f_inv(nys:nyn,nxl:nxr,1:nz)
197
198
199	!
200	!-- If the PE grid is one-dimensional along y, the array has only to be
201	!-- reordered locally and therefore no transposition has to be done.
202	IF ( pdims(1) /= 1 ) THEN
203
204	#if defined( __parallel )
205	!
206	!-- Reorder input array for transposition
207	!$OMP PARALLEL PRIVATE ( i, j, k, l, xs )
208	!$OMP DO
209	DO l = 0, pdims(1) - 1
210	xs = 0 + l * nnx
211	!$acc kernels present( f_in, work )
212	!$acc loop
213	DO k = nzb_x, nzt_x
214	DO i = xs, xs + nnx - 1
215	!$acc loop vector( 32 )
216	DO j = nys_x, nyn_x
217	work(j,i-xs+1,k,l) = f_in(i,j,k)
218	ENDDO
219	ENDDO
220	ENDDO
221	!$acc end kernels
222	ENDDO
223	!$OMP END PARALLEL
224
225	!
226	!-- Transpose array
227	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
228	IF ( collective_wait ) CALL MPI_BARRIER( comm2d, ierr )
229	!$acc update host( work )
230	CALL MPI_ALLTOALL( work(nys_x,1,nzb_x,0), sendrecvcount_zx, MPI_REAL, &
231	f_inv(nys,nxl,1), sendrecvcount_zx, MPI_REAL, &
232	comm1dx, ierr )
233	!$acc update device( f_inv )
234	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
235
236	!
237	!-- Reorder transposed array in a way that the z index is in first position
238	!$OMP PARALLEL PRIVATE ( i, j, k )
239	!$OMP DO
240	!$acc kernels present( f_out )
241	!$acc loop
242	DO k = 1, nz
243	DO i = nxl, nxr
244	!$acc loop vector( 32 )
245	DO j = nys, nyn
246	f_out(k,j,i) = f_inv(j,i,k)
247	ENDDO
248	ENDDO
249	ENDDO
250	!$acc end kernels
251	!$OMP END PARALLEL
252	#endif
253
254	ELSE
255
256	!
257	!-- Reorder the array in a way that the z index is in first position
258	!$OMP PARALLEL PRIVATE ( i, j, k )
259	!$OMP DO
260	!$acc kernels present( f_in )
261	!$acc loop
262	DO i = nxl, nxr
263	DO j = nys, nyn
264	!$acc loop vector( 32 )
265	DO k = 1, nz
266	f_inv(j,i,k) = f_in(i,j,k)
267	ENDDO
268	ENDDO
269	ENDDO
270	!$acc end kernels
271	!$OMP END PARALLEL
272
273	!$OMP PARALLEL PRIVATE ( i, j, k )
274	!$OMP DO
275	!$acc kernels present( f_out )
276	!$acc loop
277	DO k = 1, nz
278	DO i = nxl, nxr
279	!$acc loop vector( 32 )
280	DO j = nys, nyn
281	f_out(k,j,i) = f_inv(j,i,k)
282	ENDDO
283	ENDDO
284	ENDDO
285	!$acc end kernels
286	!$OMP END PARALLEL
287
288	ENDIF
289
290	END SUBROUTINE transpose_xz
291
292
293	SUBROUTINE transpose_yx( f_in, work, f_out )
294
295	!------------------------------------------------------------------------------!
296	! Description:
297	! ------------
298	! Transposition of input array (f_in) from y to x. For the input array, all
299	! elements along y reside on the same PE, while after transposition, all
300	! elements along x reside on the same PE.
301	!------------------------------------------------------------------------------!
302
303	USE cpulog
304	USE indices
305	USE interfaces
306	USE pegrid
307	USE transpose_indices
308
309	IMPLICIT NONE
310
311	INTEGER :: i, j, k, l, ys
312
313	REAL :: f_in(0:ny,nxl_y:nxr_y,nzb_y:nzt_y), f_out(0:nx,nys_x:nyn_x,nzb_x:nzt_x)
314
315	REAL, DIMENSION(nyn_x-nys_x+1,nzb_y:nzt_y,nxl_y:nxr_y,0:pdims(2)-1) :: work
316
317	!$acc declare create( f_inv )
318	REAL :: f_inv(nys_x:nyn_x,nzb_x:nzt_x,0:nx)
319
320
321	IF ( numprocs /= 1 ) THEN
322
323	#if defined( __parallel )
324	!
325	!-- Reorder input array for transposition
326	!$OMP PARALLEL PRIVATE ( i, j, k, l, ys )
327	!$OMP DO
328	DO l = 0, pdims(2) - 1
329	ys = 0 + l * ( nyn_x - nys_x + 1 )
330	!$acc kernels present( f_in, work )
331	!$acc loop
332	DO i = nxl_y, nxr_y
333	DO k = nzb_y, nzt_y
334	!$acc loop vector( 32 )
335	DO j = ys, ys + nyn_x - nys_x
336	work(j-ys+1,k,i,l) = f_in(j,i,k)
337	ENDDO
338	ENDDO
339	ENDDO
340	!$acc end kernels
341	ENDDO
342	!$OMP END PARALLEL
343
344	!
345	!-- Transpose array
346	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
347	IF ( collective_wait ) CALL MPI_BARRIER( comm2d, ierr )
348	!$acc update host( work )
349	CALL MPI_ALLTOALL( work(1,nzb_y,nxl_y,0), sendrecvcount_xy, MPI_REAL, &
350	f_inv(nys_x,nzb_x,0), sendrecvcount_xy, MPI_REAL, &
351	comm1dy, ierr )
352	!$acc update device( f_inv )
353	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
354	#endif
355
356	ELSE
357
358	!
359	!-- Reorder array f_in the same way as ALLTOALL did it
360	!$OMP PARALLEL PRIVATE ( i, j, k )
361	!$OMP DO
362	!$acc kernels present( f_in )
363	!$acc loop
364	DO i = nxl_y, nxr_y
365	DO k = nzb_y, nzt_y
366	!$acc loop vector( 32 )
367	DO j = 0, ny
368	f_inv(j,k,i) = f_in(j,i,k)
369	ENDDO
370	ENDDO
371	ENDDO
372	!$acc end kernels
373	!$OMP END PARALLEL
374
375	ENDIF
376
377	!
378	!-- Reorder transposed array in a way that the x index is in first position
379	!$OMP PARALLEL PRIVATE ( i, j, k )
380	!$OMP DO
381	!$acc kernels present( f_out )
382	!$acc loop
383	DO i = 0, nx
384	DO k = nzb_x, nzt_x
385	!$acc loop vector( 32 )
386	DO j = nys_x, nyn_x
387	f_out(i,j,k) = f_inv(j,k,i)
388	ENDDO
389	ENDDO
390	ENDDO
391	!$acc end kernels
392	!$OMP END PARALLEL
393
394	END SUBROUTINE transpose_yx
395
396
397	SUBROUTINE transpose_yxd( f_in, work, f_out )
398
399	!------------------------------------------------------------------------------!
400	! Description:
401	! ------------
402	! Transposition of input array (f_in) from y to x. For the input array, all
403	! elements along y reside on the same PE, while after transposition, all
404	! elements along x reside on the same PE.
405	! This is a direct transposition for arrays with indices in regular order
406	! (k,j,i) (cf. transpose_yx).
407	!------------------------------------------------------------------------------!
408
409	USE cpulog
410	USE indices
411	USE interfaces
412	USE pegrid
413	USE transpose_indices
414
415	IMPLICIT NONE
416
417	INTEGER :: i, j, k, l, m, xs
418
419	REAL :: f_in(1:nz,nys:nyn,nxl:nxr), f_inv(nxl:nxr,1:nz,nys:nyn), &
420	f_out(0:nx,nys_x:nyn_x,nzb_x:nzt_x), &
421	work(nnxnnynnz)
422
423	#if defined( __parallel )
424
425	!
426	!-- Rearrange indices of input array in order to make data to be send
427	!-- by MPI contiguous
428	DO k = 1, nz
429	DO j = nys, nyn
430	DO i = nxl, nxr
431	f_inv(i,k,j) = f_in(k,j,i)
432	ENDDO
433	ENDDO
434	ENDDO
435
436	!
437	!-- Transpose array
438	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
439	IF ( collective_wait ) CALL MPI_BARRIER( comm2d, ierr )
440	CALL MPI_ALLTOALL( f_inv(nxl,1,nys), sendrecvcount_xy, MPI_REAL, &
441	work(1), sendrecvcount_xy, MPI_REAL, &
442	comm1dx, ierr )
443	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
444
445	!
446	!-- Reorder transposed array
447	m = 0
448	DO l = 0, pdims(1) - 1
449	xs = 0 + l * nnx
450	DO j = nys_x, nyn_x
451	DO k = 1, nz
452	DO i = xs, xs + nnx - 1
453	m = m + 1
454	f_out(i,j,k) = work(m)
455	ENDDO
456	ENDDO
457	ENDDO
458	ENDDO
459
460	#endif
461
462	END SUBROUTINE transpose_yxd
463
464
465	SUBROUTINE transpose_yz( f_in, work, f_out )
466
467	!------------------------------------------------------------------------------!
468	! Description:
469	! ------------
470	! Transposition of input array (f_in) from y to z. For the input array, all
471	! elements along y reside on the same PE, while after transposition, all
472	! elements along z reside on the same PE.
473	!------------------------------------------------------------------------------!
474
475	USE cpulog
476	USE indices
477	USE interfaces
478	USE pegrid
479	USE transpose_indices
480
481	IMPLICIT NONE
482
483	INTEGER :: i, j, k, l, zs
484
485	REAL :: f_in(0:ny,nxl_y:nxr_y,nzb_y:nzt_y), f_out(nxl_z:nxr_z,nys_z:nyn_z,1:nz)
486
487	REAL, DIMENSION(nxl_z:nxr_z,nzt_y-nzb_y+1,nys_z:nyn_z,0:pdims(1)-1) :: work
488
489	!$acc declare create( f_inv )
490	REAL :: f_inv(nxl_y:nxr_y,nzb_y:nzt_y,0:ny)
491
492
493	!
494	!-- Rearrange indices of input array in order to make data to be send
495	!-- by MPI contiguous
496	!$OMP PARALLEL PRIVATE ( i, j, k )
497	!$OMP DO
498	!$acc kernels present( f_in )
499	!$acc loop
500	DO j = 0, ny
501	DO k = nzb_y, nzt_y
502	!$acc loop vector( 32 )
503	DO i = nxl_y, nxr_y
504	f_inv(i,k,j) = f_in(j,i,k)
505	ENDDO
506	ENDDO
507	ENDDO
508	!$acc end kernels
509	!$OMP END PARALLEL
510
511	!
512	!-- Move data to different array, because memory location of work1 is
513	!-- needed further below (work1 = work2).
514	!-- If the PE grid is one-dimensional along y, only local reordering
515	!-- of the data is necessary and no transposition has to be done.
516	IF ( pdims(1) == 1 ) THEN
517
518	!$OMP PARALLEL PRIVATE ( i, j, k )
519	!$OMP DO
520	!$acc kernels present( f_out )
521	!$acc loop
522	DO j = 0, ny
523	DO k = nzb_y, nzt_y
524	!$acc loop vector( 32 )
525	DO i = nxl_y, nxr_y
526	f_out(i,j,k) = f_inv(i,k,j)
527	ENDDO
528	ENDDO
529	ENDDO
530	!$acc end kernels
531	!$OMP END PARALLEL
532
533	ELSE
534
535	#if defined( __parallel )
536	!
537	!-- Transpose array
538	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
539	IF ( collective_wait ) CALL MPI_BARRIER( comm2d, ierr )
540	!$acc update host( f_inv )
541	CALL MPI_ALLTOALL( f_inv(nxl_y,nzb_y,0), sendrecvcount_yz, MPI_REAL, &
542	work(nxl_z,1,nys_z,0), sendrecvcount_yz, MPI_REAL, &
543	comm1dx, ierr )
544	!$acc update device( work )
545	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
546
547	!
548	!-- Reorder transposed array
549	!$OMP PARALLEL PRIVATE ( i, j, k, l, zs )
550	!$OMP DO
551	DO l = 0, pdims(1) - 1
552	zs = 1 + l * ( nzt_y - nzb_y + 1 )
553	!$acc kernels present( f_out, work )
554	!$acc loop
555	DO j = nys_z, nyn_z
556	DO k = zs, zs + nzt_y - nzb_y
557	!$acc loop vector( 32 )
558	DO i = nxl_z, nxr_z
559	f_out(i,j,k) = work(i,k-zs+1,j,l)
560	ENDDO
561	ENDDO
562	ENDDO
563	!$acc end kernels
564	ENDDO
565	!$OMP END PARALLEL
566	#endif
567
568	ENDIF
569
570	END SUBROUTINE transpose_yz
571
572
573	SUBROUTINE transpose_zx( f_in, work, f_out )
574
575	!------------------------------------------------------------------------------!
576	! Description:
577	! ------------
578	! Transposition of input array (f_in) from z to x. For the input array, all
579	! elements along z reside on the same PE, while after transposition, all
580	! elements along x reside on the same PE.
581	!------------------------------------------------------------------------------!
582
583	USE cpulog
584	USE indices
585	USE interfaces
586	USE pegrid
587	USE transpose_indices
588
589	IMPLICIT NONE
590
591	INTEGER :: i, j, k, l, xs
592
593	REAL :: f_in(1:nz,nys:nyn,nxl:nxr), f_out(0:nx,nys_x:nyn_x,nzb_x:nzt_x)
594
595	REAL, DIMENSION(nys_x:nyn_x,nnx,nzb_x:nzt_x,0:pdims(1)-1) :: work
596
597	!$acc declare create( f_inv )
598	REAL :: f_inv(nys:nyn,nxl:nxr,1:nz)
599
600
601	!
602	!-- Rearrange indices of input array in order to make data to be send
603	!-- by MPI contiguous
604	!$OMP PARALLEL PRIVATE ( i, j, k )
605	!$OMP DO
606	!$acc kernels present( f_in )
607	!$acc loop
608	DO k = 1,nz
609	DO i = nxl, nxr
610	!$acc loop vector( 32 )
611	DO j = nys, nyn
612	f_inv(j,i,k) = f_in(k,j,i)
613	ENDDO
614	ENDDO
615	ENDDO
616	!$acc end kernels
617	!$OMP END PARALLEL
618
619	!
620	!-- Move data to different array, because memory location of work1 is
621	!-- needed further below (work1 = work2).
622	!-- If the PE grid is one-dimensional along y, only local reordering
623	!-- of the data is necessary and no transposition has to be done.
624	IF ( pdims(1) == 1 ) THEN
625
626	!$OMP PARALLEL PRIVATE ( i, j, k )
627	!$OMP DO
628	!$acc kernels present( f_out )
629	!$acc loop
630	DO k = 1, nz
631	DO i = nxl, nxr
632	!$acc loop vector( 32 )
633	DO j = nys, nyn
634	f_out(i,j,k) = f_inv(j,i,k)
635	ENDDO
636	ENDDO
637	ENDDO
638	!$acc end kernels
639	!$OMP END PARALLEL
640
641	ELSE
642
643	#if defined( __parallel )
644	!
645	!-- Transpose array
646	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
647	IF ( collective_wait ) CALL MPI_BARRIER( comm2d, ierr )
648	!$acc update host( f_inv )
649	CALL MPI_ALLTOALL( f_inv(nys,nxl,1), sendrecvcount_zx, MPI_REAL, &
650	work(nys_x,1,nzb_x,0), sendrecvcount_zx, MPI_REAL, &
651	comm1dx, ierr )
652	!$acc update device( work )
653	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
654
655	!
656	!-- Reorder transposed array
657	!$OMP PARALLEL PRIVATE ( i, j, k, l, xs )
658	!$OMP DO
659	DO l = 0, pdims(1) - 1
660	xs = 0 + l * nnx
661	!$acc kernels present( f_out, work )
662	!$acc loop
663	DO k = nzb_x, nzt_x
664	DO i = xs, xs + nnx - 1
665	!$acc loop vector( 32 )
666	DO j = nys_x, nyn_x
667	f_out(i,j,k) = work(j,i-xs+1,k,l)
668	ENDDO
669	ENDDO
670	ENDDO
671	!$acc end kernels
672	ENDDO
673	!$OMP END PARALLEL
674	#endif
675
676	ENDIF
677
678	END SUBROUTINE transpose_zx
679
680
681	SUBROUTINE transpose_zy( f_in, work, f_out )
682
683	!------------------------------------------------------------------------------!
684	! Description:
685	! ------------
686	! Transposition of input array (f_in) from z to y. For the input array, all
687	! elements along z reside on the same PE, while after transposition, all
688	! elements along y reside on the same PE.
689	!------------------------------------------------------------------------------!
690
691	USE cpulog
692	USE indices
693	USE interfaces
694	USE pegrid
695	USE transpose_indices
696
697	IMPLICIT NONE
698
699	INTEGER :: i, j, k, l, zs
700
701	REAL :: f_in(nxl_z:nxr_z,nys_z:nyn_z,1:nz), f_out(0:ny,nxl_y:nxr_y,nzb_y:nzt_y)
702
703	REAL, DIMENSION(nxl_z:nxr_z,nzt_y-nzb_y+1,nys_z:nyn_z,0:pdims(1)-1) :: work
704
705	!$acc declare create( f_inv )
706	REAL :: f_inv(nxl_y:nxr_y,nzb_y:nzt_y,0:ny)
707
708
709	!
710	!-- If the PE grid is one-dimensional along y, the array has only to be
711	!-- reordered locally and therefore no transposition has to be done.
712	IF ( pdims(1) /= 1 ) THEN
713
714	#if defined( __parallel )
715	!
716	!-- Reorder input array for transposition
717	!$OMP PARALLEL PRIVATE ( i, j, k, l, zs )
718	!$OMP DO
719	DO l = 0, pdims(1) - 1
720	zs = 1 + l * ( nzt_y - nzb_y + 1 )
721	!$acc kernels present( f_in, work )
722	!$acc loop
723	DO j = nys_z, nyn_z
724	DO k = zs, zs + nzt_y - nzb_y
725	!$acc loop vector( 32 )
726	DO i = nxl_z, nxr_z
727	work(i,k-zs+1,j,l) = f_in(i,j,k)
728	ENDDO
729	ENDDO
730	ENDDO
731	!$acc end kernels
732	ENDDO
733	!$OMP END PARALLEL
734
735	!
736	!-- Transpose array
737	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
738	IF ( collective_wait ) CALL MPI_BARRIER( comm2d, ierr )
739	!$acc update host( work )
740	CALL MPI_ALLTOALL( work(nxl_z,1,nys_z,0), sendrecvcount_yz, MPI_REAL, &
741	f_inv(nxl_y,nzb_y,0), sendrecvcount_yz, MPI_REAL, &
742	comm1dx, ierr )
743	!$acc update device( f_inv )
744	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
745	#endif
746
747	ELSE
748	!
749	!-- Reorder the array in the same way like ALLTOALL did it
750	!$OMP PARALLEL PRIVATE ( i, j, k )
751	!$OMP DO
752	!$acc kernels present( f_in )
753	!$acc loop
754	DO k = nzb_y, nzt_y
755	DO j = 0, ny
756	!$acc loop vector( 32 )
757	DO i = nxl_y, nxr_y
758	f_inv(i,k,j) = f_in(i,j,k)
759	ENDDO
760	ENDDO
761	ENDDO
762	!$acc end kernels
763	!$OMP END PARALLEL
764
765	ENDIF
766
767	!
768	!-- Reorder transposed array in a way that the y index is in first position
769	!$OMP PARALLEL PRIVATE ( i, j, k )
770	!$OMP DO
771	!$acc kernels present( f_out )
772	!$acc loop
773	DO k = nzb_y, nzt_y
774	DO i = nxl_y, nxr_y
775	!$acc loop vector( 32 )
776	DO j = 0, ny
777	f_out(j,i,k) = f_inv(i,k,j)
778	ENDDO
779	ENDDO
780	ENDDO
781	!$acc end kernels
782	!$OMP END PARALLEL
783
784	END SUBROUTINE transpose_zy
785
786
787	SUBROUTINE transpose_zyd( f_in, work, f_out )
788
789	!------------------------------------------------------------------------------!
790	! Description:
791	! ------------
792	! Transposition of input array (f_in) from z to y. For the input array, all
793	! elements along z reside on the same PE, while after transposition, all
794	! elements along y reside on the same PE.
795	! This is a direct transposition for arrays with indices in regular order
796	! (k,j,i) (cf. transpose_zy).
797	!------------------------------------------------------------------------------!
798
799	USE cpulog
800	USE indices
801	USE interfaces
802	USE pegrid
803	USE transpose_indices
804
805	IMPLICIT NONE
806
807	INTEGER :: i, j, k, l, m, ys
808
809	REAL :: f_in(1:nz,nys:nyn,nxl:nxr), f_inv(nys:nyn,nxl:nxr,1:nz), &
810	f_out(0:ny,nxl_yd:nxr_yd,nzb_yd:nzt_yd), &
811	work(nnxnnynnz)
812
813	#if defined( __parallel )
814
815	!
816	!-- Rearrange indices of input array in order to make data to be send
817	!-- by MPI contiguous
818	DO i = nxl, nxr
819	DO j = nys, nyn
820	DO k = 1, nz
821	f_inv(j,i,k) = f_in(k,j,i)
822	ENDDO
823	ENDDO
824	ENDDO
825
826	!
827	!-- Move data to different array, because memory location of work1 is
828	!-- needed further below (work1 = work2).
829	!-- If the PE grid is one-dimensional along x, only local reordering
830	!-- of the data is necessary and no transposition has to be done.
831	IF ( pdims(2) == 1 ) THEN
832	DO k = 1, nz
833	DO i = nxl, nxr
834	DO j = nys, nyn
835	f_out(j,i,k) = f_inv(j,i,k)
836	ENDDO
837	ENDDO
838	ENDDO
839	RETURN
840	ENDIF
841
842	!
843	!-- Transpose array
844	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
845	IF ( collective_wait ) CALL MPI_BARRIER( comm2d, ierr )
846	CALL MPI_ALLTOALL( f_inv(nys,nxl,1), sendrecvcount_zyd, MPI_REAL, &
847	work(1), sendrecvcount_zyd, MPI_REAL, &
848	comm1dy, ierr )
849	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
850
851	!
852	!-- Reorder transposed array
853	m = 0
854	DO l = 0, pdims(2) - 1
855	ys = 0 + l * nny
856	DO k = nzb_yd, nzt_yd
857	DO i = nxl_yd, nxr_yd
858	DO j = ys, ys + nny - 1
859	m = m + 1
860	f_out(j,i,k) = work(m)
861	ENDDO
862	ENDDO
863	ENDDO
864	ENDDO
865
866	#endif
867
868	END SUBROUTINE transpose_zyd

Note: See TracBrowser for help on using the repository browser.

Context Navigation

source: palm/trunk/SOURCE/transpose.f90 @ 1111

Download in other formats: