source: palm/trunk/SOURCE/transpose.f90 @ 1257

Last change on this file since 1257 was 1257, checked in by raasch, 10 years ago

New:
---

openACC porting of timestep calculation
(modules, timestep, time_integration)

Changed:
---

openACC loop directives and vector clauses removed (because they do not give any performance improvement with PGI
compiler versions > 13.6)
(advec_ws, buoyancy, coriolis, diffusion_e, diffusion_s, diffusion_u, diffusion_v, diffusion_w, diffusivities, exchange_horiz, fft_xy, pres, production_e, transpose, tridia_solver, wall_fluxes)

openACC loop independent clauses added
(boundary_conds, prandtl_fluxes, pres)

openACC declare create statements moved after FORTRAN declaration statement
(diffusion_u, diffusion_v, diffusion_w, fft_xy, poisfft, production_e, tridia_solver)

openACC end parallel replaced by end parallel loop
(flow_statistics, pres)

openACC "kernels do" replaced by "kernels loop"
(prandtl_fluxes)

output format for theta* changed to avoid printing '*' when the value exceeds the field width
(run_control)

Errors:
---

bugfix for calculation of the advective timestep (the old version may cause wrong timesteps in case of
vertically stretched grids)
Attention: standard run-control output has changed!
(timestep)
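
(For illustration only: the following sketches the generic CFL-type reasoning behind this fix, not the actual code in timestep.f90. With a vertically stretched grid the advective limit has to be evaluated with the local spacing dz(k); all names below are placeholders.)

    dt_advec = HUGE( 1.0 )
    DO  k = nzb+1, nzt
       dt_advec = MIN( dt_advec,                                &
                       dx    / MAX( ABS( u_max ),    1.0E-10 ), &
                       dy    / MAX( ABS( v_max ),    1.0E-10 ), &
                       dz(k) / MAX( ABS( w_max(k) ), 1.0E-10 ) )
    ENDDO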

  • Property svn:keywords set to Id
File size: 25.9 KB
[1216]1 SUBROUTINE resort_for_xy( f_in, f_inv )
[1]2
[1036]3!--------------------------------------------------------------------------------!
4! This file is part of PALM.
5!
6! PALM is free software: you can redistribute it and/or modify it under the terms
7! of the GNU General Public License as published by the Free Software Foundation,
8! either version 3 of the License, or (at your option) any later version.
9!
10! PALM is distributed in the hope that it will be useful, but WITHOUT ANY
11! WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
12! A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
13!
14! You should have received a copy of the GNU General Public License along with
15! PALM. If not, see <http://www.gnu.org/licenses/>.
16!
17! Copyright 1997-2012  Leibniz University Hannover
18!--------------------------------------------------------------------------------!
19!
[484]20! Current revisions:
[1]21! -----------------
[1257]22! openacc loop and loop vector clauses removed
[198]23!
24! Former revisions:
25! -----------------
26! $Id: transpose.f90 1257 2013-11-08 15:18:40Z raasch $
27!
[1217]28! 1216 2013-08-26 09:31:42Z raasch
29! re-sorting of the transposed / to be transposed arrays moved to separate
30! routines resort_for_...
31!
[1112]32! 1111 2013-03-08 23:54:10Z raasch
33! openACC directives added,
34! resorting data from/to work changed, work got 4 dimensions instead of 1
35!
[1107]36! 1106 2013-03-04 05:31:38Z raasch
37! preprocessor lines rearranged so that routines can also be used in serial
38! (non-parallel) mode
39!
[1093]40! 1092 2013-02-02 11:24:22Z raasch
41! unused variables removed
42!
[1037]43! 1036 2012-10-22 13:43:42Z raasch
44! code put under GPL (PALM 3.9)
45!
[1004]46! 1003 2012-09-14 14:35:53Z raasch
47! indices nxa, nya, etc. replaced by nx, ny, etc.
48!
[684]49! 683 2011-02-09 14:25:15Z raasch
50! openMP parallelization of transpositions for 2d-domain-decomposition
51!
[623]52! 622 2010-12-10 08:08:13Z raasch
53! optional barriers included in order to speed up collective operations
54!
[198]55! 164 2008-05-15 08:46:15Z raasch
[164]56! f_inv changed from subroutine argument to automatic array in order to do
57! re-ordering from f_in to f_inv in one step, one array work is needed instead
58! of work1 and work2
[1]59!
[198]60! February 2007
[3]61! RCS Log replaced by Id keyword, revision history cleaned up
62!
[1]63! Revision 1.2  2004/04/30 13:12:17  raasch
64! Switched from mpi_alltoallv to the simpler mpi_alltoall,
65! all former transpose-routine files collected in this file, enlarged
66! transposition arrays introduced
67!
68! Revision 1.1  2004/04/30 13:08:16  raasch
69! Initial revision (collection of former routines transpose_xy, transpose_xz,
70!                   transpose_yx, transpose_yz, transpose_zx, transpose_zy)
71!
72! Revision 1.1  1997/07/24 11:25:18  raasch
73! Initial revision
74!
[1216]75!------------------------------------------------------------------------------!
76! Description:
77! ------------
78! Resorting data for the transposition from x to y. The transposition itself
79! is carried out in transpose_xy
80!------------------------------------------------------------------------------!
81
82     USE indices
83     USE transpose_indices
84
85     IMPLICIT NONE
86
87     REAL ::  f_in(0:nx,nys_x:nyn_x,nzb_x:nzt_x)
88     REAL ::  f_inv(nys_x:nyn_x,nzb_x:nzt_x,0:nx)
89
90
91     INTEGER ::  i, j, k
92
[1]93!
[1216]94!-- Rearrange indices of the input array so that the data to be sent
95!-- by MPI is contiguous
96    !$OMP  PARALLEL PRIVATE ( i, j, k )
97    !$OMP  DO
98    !$acc kernels present( f_in, f_inv )
99     DO  i = 0, nx
100         DO  k = nzb_x, nzt_x
101             DO  j = nys_x, nyn_x
102                 f_inv(j,k,i) = f_in(i,j,k)
103             ENDDO
104         ENDDO
105     ENDDO
106     !$acc end kernels
107     !$OMP  END PARALLEL
108
109 END SUBROUTINE resort_for_xy
110
111
112 SUBROUTINE transpose_xy( f_inv, f_out )
113
114!------------------------------------------------------------------------------!
[1]115! Description:
116! ------------
117! Transposition of input array (f_in) from x to y. For the input array, all
118! elements along x reside on the same PE, while after transposition, all
119! elements along y reside on the same PE.
120!------------------------------------------------------------------------------!
121
122    USE cpulog
123    USE indices
124    USE interfaces
125    USE pegrid
126    USE transpose_indices
127
128    IMPLICIT NONE
129
[1111]130    INTEGER ::  i, j, k, l, ys
[1]131   
[1216]132    REAL ::  f_inv(nys_x:nyn_x,nzb_x:nzt_x,0:nx), f_out(0:ny,nxl_y:nxr_y,nzb_y:nzt_y)
[1]133
[1111]134    REAL, DIMENSION(nyn_x-nys_x+1,nzb_y:nzt_y,nxl_y:nxr_y,0:pdims(2)-1) ::  work
135
136
[1106]137    IF ( numprocs /= 1 )  THEN
138
139#if defined( __parallel )
[1]140!
[1106]141!--    Transpose array
142       CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
143       IF ( collective_wait )  CALL MPI_BARRIER( comm2d, ierr )
[1111]144       !$acc update host( f_inv )
145       CALL MPI_ALLTOALL( f_inv(nys_x,nzb_x,0),  sendrecvcount_xy, MPI_REAL, &
146                          work(1,nzb_y,nxl_y,0), sendrecvcount_xy, MPI_REAL, &
[1106]147                          comm1dy, ierr )
148       CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
[1]149
150!
[1106]151!--    Reorder transposed array
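!--    (the last dimension of work addresses the block received from PE l of
!--     communicator comm1dy)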
[1111]152!$OMP  PARALLEL PRIVATE ( i, j, k, l, ys )
[683]153!$OMP  DO
[1216]154       !$acc data copyin( work )
[1106]155       DO  l = 0, pdims(2) - 1
156          ys = 0 + l * ( nyn_x - nys_x + 1 )
[1111]157          !$acc kernels present( f_out, work )
[1106]158          DO  i = nxl_y, nxr_y
159             DO  k = nzb_y, nzt_y
160                DO  j = ys, ys + nyn_x - nys_x
[1111]161                   f_out(j,i,k) = work(j-ys+1,k,i,l)
[1106]162                ENDDO
[1]163             ENDDO
164          ENDDO
[1111]165          !$acc end kernels
[1]166       ENDDO
[1216]167       !$acc end data
[683]168!$OMP  END PARALLEL
[1]169#endif
170
[1106]171    ELSE
172
173!
174!--    Reorder transposed array
175!$OMP  PARALLEL PRIVATE ( i, j, k )
176!$OMP  DO
[1216]177       !$acc kernels present( f_inv, f_out )
[1106]178       DO  k = nzb_y, nzt_y
179          DO  i = nxl_y, nxr_y
180             DO  j = 0, ny
181                f_out(j,i,k) = f_inv(j,k,i)
182             ENDDO
183          ENDDO
184       ENDDO
[1111]185       !$acc end kernels
[1106]186!$OMP  END PARALLEL
187
188    ENDIF
189
[1]190 END SUBROUTINE transpose_xy
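!
!-- Note (illustration only): since r1216 the re-sorting and the transposition
!-- are separate routines, so a caller invokes them in sequence, e.g. for the
!-- x-to-y direction (ar, ar_inv, ar_tr are placeholder names, not the actual
!-- poisfft code):
!--    CALL resort_for_xy( ar, ar_inv )
!--    CALL transpose_xy( ar_inv, ar_tr )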
191
192
[1216]193 SUBROUTINE resort_for_xz( f_inv, f_out )
[1]194
195!------------------------------------------------------------------------------!
196! Description:
197! ------------
[1216]198! Resorting data after the transposition from x to z. The transposition itself
199! is carried out in transpose_xz
200!------------------------------------------------------------------------------!
201
202     USE indices
203     USE transpose_indices
204
205     IMPLICIT NONE
206
207     REAL ::  f_inv(nys:nyn,nxl:nxr,1:nz)
208     REAL ::  f_out(1:nz,nys:nyn,nxl:nxr)
209
210
211     INTEGER ::  i, j, k
212
213!
214!-- Rearrange indices of the input array so that the data to be sent
215!-- by MPI is contiguous.
216!-- In case of parallel fft/transposition, a scattered store is faster in
217!-- the backward direction.
218    !$OMP  PARALLEL PRIVATE ( i, j, k )
219    !$OMP  DO
220    !$acc kernels present( f_inv, f_out )
221     DO  k = 1, nz
222         DO  i = nxl, nxr
223             DO  j = nys, nyn
224                 f_out(k,j,i) = f_inv(j,i,k)
225             ENDDO
226         ENDDO
227     ENDDO
228     !$acc end kernels
229     !$OMP  END PARALLEL
230
231 END SUBROUTINE resort_for_xz
232
233
234 SUBROUTINE transpose_xz( f_in, f_inv )
235
236!------------------------------------------------------------------------------!
237! Description:
238! ------------
[1]239! Transposition of input array (f_in) from x to z. For the input array, all
240! elements along x reside on the same PE, while after transposition, all
241! elements along z reside on the same PE.
242!------------------------------------------------------------------------------!
243
244    USE cpulog
245    USE indices
246    USE interfaces
247    USE pegrid
248    USE transpose_indices
249
250    IMPLICIT NONE
251
[1111]252    INTEGER ::  i, j, k, l, xs
[1]253   
[1216]254    REAL ::  f_in(0:nx,nys_x:nyn_x,nzb_x:nzt_x), f_inv(nys:nyn,nxl:nxr,1:nz)
[1]255
[1111]256    REAL, DIMENSION(nys_x:nyn_x,nnx,nzb_x:nzt_x,0:pdims(1)-1) ::  work
[1]257
[1111]258
[1]259!
260!-- If the PE grid is one-dimensional along y, the array has only to be
261!-- reordered locally and therefore no transposition has to be done.
262    IF ( pdims(1) /= 1 )  THEN
[1106]263
264#if defined( __parallel )
[1]265!
266!--    Reorder input array for transposition
[1111]267!$OMP  PARALLEL PRIVATE ( i, j, k, l, xs )
[683]268!$OMP  DO
[1216]269       !$acc data copyout( work )
[1]270       DO  l = 0, pdims(1) - 1
271          xs = 0 + l * nnx
[1111]272          !$acc kernels present( f_in, work )
[1003]273          DO  k = nzb_x, nzt_x
[164]274             DO  i = xs, xs + nnx - 1
[1003]275                DO  j = nys_x, nyn_x
[1111]276                   work(j,i-xs+1,k,l) = f_in(i,j,k)
[1]277                ENDDO
278             ENDDO
279          ENDDO
[1111]280          !$acc end kernels
[1]281       ENDDO
[1216]282       !$acc end data
[683]283!$OMP  END PARALLEL
[1]284
285!
286!--    Transpose array
287       CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
[622]288       IF ( collective_wait )  CALL MPI_BARRIER( comm2d, ierr )
[1111]289       CALL MPI_ALLTOALL( work(nys_x,1,nzb_x,0), sendrecvcount_zx, MPI_REAL, &
290                          f_inv(nys,nxl,1),      sendrecvcount_zx, MPI_REAL, &
[1]291                          comm1dx, ierr )
[1111]292       !$acc update device( f_inv )
[1]293       CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
[1106]294#endif
295
[1]296    ELSE
[1106]297
[1]298!
299!--    Reorder the array so that the z index comes first
[683]300!$OMP  PARALLEL PRIVATE ( i, j, k )
301!$OMP  DO
[1216]302       !$acc kernels present( f_in, f_inv )
[1003]303       DO  i = nxl, nxr
304          DO  j = nys, nyn
305             DO  k = 1, nz
[164]306                f_inv(j,i,k) = f_in(i,j,k)
[1]307             ENDDO
308          ENDDO
309       ENDDO
[1111]310       !$acc end kernels
[683]311!$OMP  END PARALLEL
[1]312
[164]313    ENDIF
314
[1]315 END SUBROUTINE transpose_xz
316
317
[1216]318 SUBROUTINE resort_for_yx( f_inv, f_out )
[1]319
320!------------------------------------------------------------------------------!
321! Description:
322! ------------
[1216]323! Resorting data after the transposition from y to x. The transposition itself
324! is carried out in transpose_yx
325!------------------------------------------------------------------------------!
326
327     USE indices
328     USE transpose_indices
329
330     IMPLICIT NONE
331
332     REAL ::  f_inv(nys_x:nyn_x,nzb_x:nzt_x,0:nx)
333     REAL ::  f_out(0:nx,nys_x:nyn_x,nzb_x:nzt_x)
334
335
336     INTEGER ::  i, j, k
337
338!
339!-- Rearrange indices of the input array so that the data to be sent
340!-- by MPI is contiguous
341    !$OMP  PARALLEL PRIVATE ( i, j, k )
342    !$OMP  DO
343    !$acc kernels present( f_inv, f_out )
344     DO  i = 0, nx
345         DO  k = nzb_x, nzt_x
346             DO  j = nys_x, nyn_x
347                 f_out(i,j,k) = f_inv(j,k,i)
348             ENDDO
349         ENDDO
350     ENDDO
351     !$acc end kernels
352     !$OMP  END PARALLEL
353
354 END SUBROUTINE resort_for_yx
355
356
357 SUBROUTINE transpose_yx( f_in, f_inv )
358
359!------------------------------------------------------------------------------!
360! Description:
361! ------------
[1]362! Transposition of input array (f_in) from y to x. For the input array, all
363! elements along y reside on the same PE, while after transposition, all
364! elements along x reside on the same PE.
365!------------------------------------------------------------------------------!
366
367    USE cpulog
368    USE indices
369    USE interfaces
370    USE pegrid
371    USE transpose_indices
372
373    IMPLICIT NONE
374
[1111]375    INTEGER ::  i, j, k, l, ys
[1]376   
[1216]377    REAL ::  f_in(0:ny,nxl_y:nxr_y,nzb_y:nzt_y), f_inv(nys_x:nyn_x,nzb_x:nzt_x,0:nx)
[1]378
[1111]379    REAL, DIMENSION(nyn_x-nys_x+1,nzb_y:nzt_y,nxl_y:nxr_y,0:pdims(2)-1) ::  work
380
381
[1106]382    IF ( numprocs /= 1 )  THEN
383
[1]384#if defined( __parallel )
385!
[1106]386!--    Reorder input array for transposition
[1111]387!$OMP  PARALLEL PRIVATE ( i, j, k, l, ys )
[683]388!$OMP  DO
[1216]389       !$acc data copyout( work )
[1106]390       DO  l = 0, pdims(2) - 1
391          ys = 0 + l * ( nyn_x - nys_x + 1 )
[1111]392          !$acc kernels present( f_in, work )
[1106]393          DO  i = nxl_y, nxr_y
394             DO  k = nzb_y, nzt_y
395                DO  j = ys, ys + nyn_x - nys_x
[1111]396                   work(j-ys+1,k,i,l) = f_in(j,i,k)
[1106]397                ENDDO
398             ENDDO
399          ENDDO
[1111]400          !$acc end kernels
[1106]401       ENDDO
[1216]402       !$acc end data
[1106]403!$OMP  END PARALLEL
404
405!
406!--    Transpose array
407       CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
408       IF ( collective_wait )  CALL MPI_BARRIER( comm2d, ierr )
[1111]409       CALL MPI_ALLTOALL( work(1,nzb_y,nxl_y,0), sendrecvcount_xy, MPI_REAL, &
410                          f_inv(nys_x,nzb_x,0),  sendrecvcount_xy, MPI_REAL, &
[1106]411                          comm1dy, ierr )
[1111]412       !$acc update device( f_inv )
[1106]413       CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
414#endif
415
416    ELSE
417
418!
419!--    Reorder array f_in the same way as ALLTOALL did it
420!$OMP  PARALLEL PRIVATE ( i, j, k )
421!$OMP  DO
[1216]422       !$acc kernels present( f_in, f_inv )
[1003]423       DO  i = nxl_y, nxr_y
424          DO  k = nzb_y, nzt_y
[1106]425             DO  j = 0, ny
426                f_inv(j,k,i) = f_in(j,i,k)
[1]427             ENDDO
428          ENDDO
429       ENDDO
[1111]430       !$acc end kernels
[683]431!$OMP  END PARALLEL
[1]432
[1106]433    ENDIF
[1]434
435 END SUBROUTINE transpose_yx
436
437
[1216]438 SUBROUTINE transpose_yxd( f_in, f_out )
[1]439
440!------------------------------------------------------------------------------!
441! Description:
442! ------------
443! Transposition of input array (f_in) from y to x. For the input array, all
444! elements along y reside on the same PE, while after transposition, all
445! elements along x reside on the same PE.
446! This is a direct transposition for arrays with indices in regular order
447! (k,j,i) (cf. transpose_yx).
448!------------------------------------------------------------------------------!
449
450    USE cpulog
451    USE indices
452    USE interfaces
453    USE pegrid
454    USE transpose_indices
455
456    IMPLICIT NONE
457
[1092]458    INTEGER ::  i, j, k, l, m, xs
[1]459
[1003]460    REAL ::  f_in(1:nz,nys:nyn,nxl:nxr), f_inv(nxl:nxr,1:nz,nys:nyn), &
461             f_out(0:nx,nys_x:nyn_x,nzb_x:nzt_x),                     &
[164]462             work(nnx*nny*nnz)
[1]463
464#if defined( __parallel )
465
466!
467!-- Rearrange indices of the input array so that the data to be sent
468!-- by MPI is contiguous
[1003]469    DO  k = 1, nz
470       DO  j = nys, nyn
471          DO  i = nxl, nxr
[164]472             f_inv(i,k,j) = f_in(k,j,i)
[1]473          ENDDO
474       ENDDO
475    ENDDO
476
477!
478!-- Transpose array
479    CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
[622]480    IF ( collective_wait )  CALL MPI_BARRIER( comm2d, ierr )
[1]481    CALL MPI_ALLTOALL( f_inv(nxl,1,nys), sendrecvcount_xy, MPI_REAL, &
[164]482                       work(1),          sendrecvcount_xy, MPI_REAL, &
[1]483                       comm1dx, ierr )
484    CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
485
486!
487!-- Reorder transposed array
488    m = 0
489    DO  l = 0, pdims(1) - 1
490       xs = 0 + l * nnx
[1003]491       DO  j = nys_x, nyn_x
492          DO  k = 1, nz
[1]493             DO  i = xs, xs + nnx - 1
494                m = m + 1
[164]495                f_out(i,j,k) = work(m)
[1]496             ENDDO
497          ENDDO
498       ENDDO
499    ENDDO
500
501#endif
502
503 END SUBROUTINE transpose_yxd
504
505
[1216]506 SUBROUTINE resort_for_yz( f_in, f_inv )
[1]507
508!------------------------------------------------------------------------------!
509! Description:
510! ------------
[1216]511! Resorting data for the transposition from y to z. The transposition itself
512! is carried out in transpose_yz
513!------------------------------------------------------------------------------!
514
515     USE indices
516     USE transpose_indices
517
518     IMPLICIT NONE
519
520     REAL ::  f_in(0:ny,nxl_y:nxr_y,nzb_y:nzt_y)
521     REAL ::  f_inv(nxl_y:nxr_y,nzb_y:nzt_y,0:ny)
522
523
524     INTEGER ::  i, j, k
525
526!
527!-- Rearrange indices of the input array so that the data to be sent
528!-- by MPI is contiguous
529    !$OMP  PARALLEL PRIVATE ( i, j, k )
530    !$OMP  DO
531    !$acc kernels present( f_in, f_inv )
532     DO  j = 0, ny
533         DO  k = nzb_y, nzt_y
534             DO  i = nxl_y, nxr_y
535                 f_inv(i,k,j) = f_in(j,i,k)
536             ENDDO
537         ENDDO
538     ENDDO
539     !$acc end kernels
540     !$OMP  END PARALLEL
541
542 END SUBROUTINE resort_for_yz
543
544
545 SUBROUTINE transpose_yz( f_inv, f_out )
546
547!------------------------------------------------------------------------------!
548! Description:
549! ------------
[1]550! Transposition of input array (f_in) from y to z. For the input array, all
551! elements along y reside on the same PE, while after transposition, all
552! elements along z reside on the same PE.
553!------------------------------------------------------------------------------!
554
555    USE cpulog
556    USE indices
557    USE interfaces
558    USE pegrid
559    USE transpose_indices
560
561    IMPLICIT NONE
562
[1111]563    INTEGER ::  i, j, k, l, zs
[1]564   
[1216]565    REAL ::  f_inv(nxl_y:nxr_y,nzb_y:nzt_y,0:ny), f_out(nxl_z:nxr_z,nys_z:nyn_z,1:nz)
[1]566
[1111]567    REAL, DIMENSION(nxl_z:nxr_z,nzt_y-nzb_y+1,nys_z:nyn_z,0:pdims(1)-1) ::  work
568
569
[1]570!
571!-- If the PE grid is one-dimensional along y, only local reordering
572!-- of the data is necessary and no transposition has to be done.
573    IF ( pdims(1) == 1 )  THEN
[1106]574
[683]575!$OMP  PARALLEL PRIVATE ( i, j, k )
576!$OMP  DO
[1216]577       !$acc kernels present( f_inv, f_out )
[1003]578       DO  j = 0, ny
579          DO  k = nzb_y, nzt_y
580             DO  i = nxl_y, nxr_y
[164]581                f_out(i,j,k) = f_inv(i,k,j)
[1]582             ENDDO
583          ENDDO
584       ENDDO
[1111]585       !$acc end kernels
[683]586!$OMP  END PARALLEL
[1]587
[1106]588    ELSE
589
590#if defined( __parallel )
[1]591!
[1106]592!--    Transpose array
593       CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
594       IF ( collective_wait )  CALL MPI_BARRIER( comm2d, ierr )
[1111]595       !$acc update host( f_inv )
596       CALL MPI_ALLTOALL( f_inv(nxl_y,nzb_y,0),  sendrecvcount_yz, MPI_REAL, &
597                          work(nxl_z,1,nys_z,0), sendrecvcount_yz, MPI_REAL, &
[1106]598                          comm1dx, ierr )
599       CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
[1]600
601!
[1106]602!--    Reorder transposed array
[1111]603!$OMP  PARALLEL PRIVATE ( i, j, k, l, zs )
[683]604!$OMP  DO
[1216]605       !$acc data copyin( work )
[1106]606       DO  l = 0, pdims(1) - 1
607          zs = 1 + l * ( nzt_y - nzb_y + 1 )
[1216]608          !$acc kernels present( f_out )
[1106]609          DO  j = nys_z, nyn_z
610             DO  k = zs, zs + nzt_y - nzb_y
611                DO  i = nxl_z, nxr_z
[1111]612                   f_out(i,j,k) = work(i,k-zs+1,j,l)
[1106]613                ENDDO
[1]614             ENDDO
615          ENDDO
[1111]616          !$acc end kernels
[1]617       ENDDO
[1216]618       !$acc end data
[683]619!$OMP  END PARALLEL
[1]620#endif
621
[1106]622   ENDIF
623
[1]624 END SUBROUTINE transpose_yz
625
626
[1216]627 SUBROUTINE resort_for_zx( f_in, f_inv )
[1]628
629!------------------------------------------------------------------------------!
630! Description:
631! ------------
[1216]632! Resorting data for the transposition from z to x. The transposition itself
633! is carried out in transpose_zx
634!------------------------------------------------------------------------------!
635
636     USE indices
637     USE transpose_indices
638
639     IMPLICIT NONE
640
641     REAL ::  f_in(1:nz,nys:nyn,nxl:nxr)
642     REAL ::  f_inv(nys:nyn,nxl:nxr,1:nz)
643
644
645     INTEGER ::  i, j, k
646
647!
648!-- Rearrange indices of the input array so that the data to be sent
649!-- by MPI is contiguous
650    !$OMP  PARALLEL PRIVATE ( i, j, k )
651    !$OMP  DO
652    !$acc kernels present( f_in, f_inv )
653     DO  k = 1,nz
654         DO  i = nxl, nxr
655             DO  j = nys, nyn
656                 f_inv(j,i,k) = f_in(k,j,i)
657             ENDDO
658         ENDDO
659     ENDDO
660     !$acc end kernels
661     !$OMP  END PARALLEL
662
663 END SUBROUTINE resort_for_zx
664
665
666 SUBROUTINE transpose_zx( f_inv, f_out )
667
668!------------------------------------------------------------------------------!
669! Description:
670! ------------
[1]671! Transposition of input array (f_in) from z to x. For the input array, all
672! elements along z reside on the same PE, while after transposition, all
673! elements along x reside on the same PE.
674!------------------------------------------------------------------------------!
675
676    USE cpulog
677    USE indices
678    USE interfaces
679    USE pegrid
680    USE transpose_indices
681
682    IMPLICIT NONE
683
[1111]684    INTEGER ::  i, j, k, l, xs
[1]685   
[1216]686    REAL ::  f_inv(nys:nyn,nxl:nxr,1:nz), f_out(0:nx,nys_x:nyn_x,nzb_x:nzt_x)
[1]687
[1111]688    REAL, DIMENSION(nys_x:nyn_x,nnx,nzb_x:nzt_x,0:pdims(1)-1) ::  work
689
[1]690
691!
692!-- If the PE grid is one-dimensional along y, only local reordering
693!-- of the data is necessary and no transposition has to be done.
694    IF ( pdims(1) == 1 )  THEN
[1106]695
[683]696!$OMP  PARALLEL PRIVATE ( i, j, k )
697!$OMP  DO
[1216]698       !$acc kernels present( f_inv, f_out )
[1003]699       DO  k = 1, nz
700          DO  i = nxl, nxr
701             DO  j = nys, nyn
[164]702                f_out(i,j,k) = f_inv(j,i,k)
[1]703             ENDDO
704          ENDDO
705       ENDDO
[1111]706       !$acc end kernels
[683]707!$OMP  END PARALLEL
[1]708
[1106]709    ELSE
710
711#if defined( __parallel )
[1]712!
[1106]713!--    Transpose array
714       CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
715       IF ( collective_wait )  CALL MPI_BARRIER( comm2d, ierr )
[1111]716       !$acc update host( f_inv )
717       CALL MPI_ALLTOALL( f_inv(nys,nxl,1),      sendrecvcount_zx, MPI_REAL, &
718                          work(nys_x,1,nzb_x,0), sendrecvcount_zx, MPI_REAL, &
[1106]719                          comm1dx, ierr )
720       CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
[1]721
722!
[1106]723!--    Reorder transposed array
[1111]724!$OMP  PARALLEL PRIVATE ( i, j, k, l, xs )
[683]725!$OMP  DO
[1216]726       !$acc data copyin( work )
[1106]727       DO  l = 0, pdims(1) - 1
728          xs = 0 + l * nnx
[1216]729          !$acc kernels present( f_out )
[1106]730          DO  k = nzb_x, nzt_x
731             DO  i = xs, xs + nnx - 1
732                DO  j = nys_x, nyn_x
[1111]733                   f_out(i,j,k) = work(j,i-xs+1,k,l)
[1106]734                ENDDO
[1]735             ENDDO
736          ENDDO
[1111]737          !$acc end kernels
[1]738       ENDDO
[1216]739       !$acc end data
[683]740!$OMP  END PARALLEL
[1]741#endif
742
[1106]743    ENDIF
744
[1]745 END SUBROUTINE transpose_zx
746
747
[1216]748 SUBROUTINE resort_for_zy( f_inv, f_out )
[1]749
750!------------------------------------------------------------------------------!
751! Description:
752! ------------
[1216]753! Resorting data after the transposition from z to y. The transposition itself
754! is carried out in transpose_zy
755!------------------------------------------------------------------------------!
756
757     USE indices
758     USE transpose_indices
759
760     IMPLICIT NONE
761
762     REAL ::  f_inv(nxl_y:nxr_y,nzb_y:nzt_y,0:ny)
763     REAL ::  f_out(0:ny,nxl_y:nxr_y,nzb_y:nzt_y)
764
765
766     INTEGER ::  i, j, k
767
768!
769!-- Rearrange indices of the input array so that the data to be sent
770!-- by MPI is contiguous
771    !$OMP  PARALLEL PRIVATE ( i, j, k )
772    !$OMP  DO
773    !$acc kernels present( f_inv, f_out )
774     DO  k = nzb_y, nzt_y
775         DO  j = 0, ny
776             DO  i = nxl_y, nxr_y
777                 f_out(j,i,k) = f_inv(i,k,j)
778             ENDDO
779         ENDDO
780     ENDDO
781     !$acc end kernels
782     !$OMP  END PARALLEL
783
784 END SUBROUTINE resort_for_zy
785
786
787 SUBROUTINE transpose_zy( f_in, f_inv )
788
789!------------------------------------------------------------------------------!
790! Description:
791! ------------
[1]792! Transposition of input array (f_in) from z to y. For the input array, all
793! elements along z reside on the same PE, while after transposition, all
794! elements along y reside on the same PE.
795!------------------------------------------------------------------------------!
796
797    USE cpulog
798    USE indices
799    USE interfaces
800    USE pegrid
801    USE transpose_indices
802
803    IMPLICIT NONE
804
[1111]805    INTEGER ::  i, j, k, l, zs
[1]806   
[1216]807    REAL ::  f_in(nxl_z:nxr_z,nys_z:nyn_z,1:nz), f_inv(nxl_y:nxr_y,nzb_y:nzt_y,0:ny)
[1]808
[1111]809    REAL, DIMENSION(nxl_z:nxr_z,nzt_y-nzb_y+1,nys_z:nyn_z,0:pdims(1)-1) ::  work
810
811
[1]812!
813!-- If the PE grid is one-dimensional along y, the array has only to be
814!-- reordered locally and therefore no transposition has to be done.
815    IF ( pdims(1) /= 1 )  THEN
[1106]816
817#if defined( __parallel )
[1]818!
819!--    Reorder input array for transposition
[1111]820!$OMP  PARALLEL PRIVATE ( i, j, k, l, zs )
[683]821!$OMP  DO
[1216]822       !$acc data copyout( work )
[1]823       DO  l = 0, pdims(1) - 1
[1003]824          zs = 1 + l * ( nzt_y - nzb_y + 1 )
[1111]825          !$acc kernels present( f_in, work )
[1003]826          DO  j = nys_z, nyn_z
827             DO  k = zs, zs + nzt_y - nzb_y
828                DO  i = nxl_z, nxr_z
[1111]829                   work(i,k-zs+1,j,l) = f_in(i,j,k)
[1]830                ENDDO
831             ENDDO
832          ENDDO
[1111]833          !$acc end kernels
[1]834       ENDDO
[1216]835       !$acc end data
[683]836!$OMP  END PARALLEL
[1]837
838!
839!--    Transpose array
840       CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
[622]841       IF ( collective_wait )  CALL MPI_BARRIER( comm2d, ierr )
[1111]842       CALL MPI_ALLTOALL( work(nxl_z,1,nys_z,0), sendrecvcount_yz, MPI_REAL, &
843                          f_inv(nxl_y,nzb_y,0),  sendrecvcount_yz, MPI_REAL, &
[1]844                          comm1dx, ierr )
[1111]845       !$acc update device( f_inv )
[1]846       CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
[1106]847#endif
[1]848
849    ELSE
850!
[1106]851!--    Reorder the array in the same way as ALLTOALL did it
[683]852!$OMP  PARALLEL PRIVATE ( i, j, k )
853!$OMP  DO
[1216]854       !$acc kernels present( f_in, f_inv )
[1003]855       DO  k = nzb_y, nzt_y
856          DO  j = 0, ny
857             DO  i = nxl_y, nxr_y
[164]858                f_inv(i,k,j) = f_in(i,j,k)
859             ENDDO
860          ENDDO
861       ENDDO
[1111]862       !$acc end kernels
[683]863!$OMP  END PARALLEL
[1106]864
865    ENDIF
866
[1]867 END SUBROUTINE transpose_zy
868
869
[1216]870 SUBROUTINE transpose_zyd( f_in, f_out )
[1]871
872!------------------------------------------------------------------------------!
873! Description:
874! ------------
875! Transposition of input array (f_in) from z to y. For the input array, all
876! elements along z reside on the same PE, while after transposition, all
877! elements along y reside on the same PE.
878! This is a direct transposition for arrays with indices in regular order
879! (k,j,i) (cf. transpose_zy).
880!------------------------------------------------------------------------------!
881
882    USE cpulog
883    USE indices
884    USE interfaces
885    USE pegrid
886    USE transpose_indices
887
888    IMPLICIT NONE
889
890    INTEGER ::  i, j, k, l, m, ys
891   
[1003]892    REAL ::  f_in(1:nz,nys:nyn,nxl:nxr), f_inv(nys:nyn,nxl:nxr,1:nz), &
893             f_out(0:ny,nxl_yd:nxr_yd,nzb_yd:nzt_yd),                 &
[164]894             work(nnx*nny*nnz)
[1]895
896#if defined( __parallel )
897
898!
899!-- Rearrange indices of the input array so that the data to be sent
900!-- by MPI is contiguous
[1003]901    DO  i = nxl, nxr
902       DO  j = nys, nyn
903          DO  k = 1, nz
[164]904             f_inv(j,i,k) = f_in(k,j,i)
[1]905          ENDDO
906       ENDDO
907    ENDDO
908
909!
910!-- Move data to different array, because memory location of work1 is
911!-- needed further below (work1 = work2).
912!-- If the PE grid is one-dimensional along x, only local reordering
913!-- of the data is necessary and no transposition has to be done.
914    IF ( pdims(2) == 1 )  THEN
[1003]915       DO  k = 1, nz
916          DO  i = nxl, nxr
917             DO  j = nys, nyn
[164]918                f_out(j,i,k) = f_inv(j,i,k)
[1]919             ENDDO
920          ENDDO
921       ENDDO
922       RETURN
923    ENDIF
924
925!
926!-- Transpose array
927    CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
[622]928    IF ( collective_wait )  CALL MPI_BARRIER( comm2d, ierr )
[1]929    CALL MPI_ALLTOALL( f_inv(nys,nxl,1), sendrecvcount_zyd, MPI_REAL, &
[164]930                       work(1),          sendrecvcount_zyd, MPI_REAL, &
[1]931                       comm1dy, ierr )
932    CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
933
934!
935!-- Reorder transposed array
936    m = 0
937    DO  l = 0, pdims(2) - 1
938       ys = 0 + l * nny
[1003]939       DO  k = nzb_yd, nzt_yd
940          DO  i = nxl_yd, nxr_yd
[1]941             DO  j = ys, ys + nny - 1
942                m = m + 1
[164]943                f_out(j,i,k) = work(m)
[1]944             ENDDO
945          ENDDO
946       ENDDO
947    ENDDO
948
949#endif
950
951 END SUBROUTINE transpose_zyd