Home

Context Navigation

poisfft_hybrid.f90 @ 1111

Last change on this file since 1111 was 1111, checked in by raasch, 11 years ago

New:
---

GPU porting of pres, swap_timelevel. Adjustments of openACC directives.
Further porting of poisfft, which now runs completely on GPU without any
host/device data transfer for serial an parallel runs (but parallel runs
require data transfer before and after the MPI transpositions).
GPU-porting of tridiagonal solver:
tridiagonal routines split into extermal subroutines (instead using CONTAINS),
no distinction between parallel/non-parallel in poisfft and tridia any more,
tridia routines moved to end of file because of probable bug in PGI compiler
(otherwise "invalid device function" is indicated during runtime).
(cuda_fft_interfaces, fft_xy, flow_statistics, init_3d_model, palm, poisfft, pres, prognostic_equations, swap_timelevel, time_integration, transpose)
output of accelerator board information. (header)

optimization of tridia routines: constant elements and coefficients of tri are
stored in seperate arrays ddzuw and tric, last dimension of tri reduced from 5 to 2,
(init_grid, init_3d_model, modules, palm, poisfft)

poisfft_init is now called internally from poisfft,
(Makefile, Makefile_check, init_pegrid, poisfft, poisfft_hybrid)

CPU-time per grid point and timestep is output to CPU_MEASURES file
(cpu_statistics, modules, time_integration)

Changed:

resorting from/to array work changed, work now has 4 dimensions instead of 1 (transpose)
array diss allocated only if required (init_3d_model)

pressure boundary condition "Neumann+inhomo" removed from the code
(check_parameters, header, poisfft, poisfft_hybrid, pres)

Errors:

bugfix: dependency added for cuda_fft_interfaces (Makefile)
bugfix: CUDA fft plans adjusted for domain decomposition (before they always
used total domain) (fft_xy)

Property svn:keywords set to Id

File size: 33.3 KB

Line
1	MODULE poisfft_hybrid_mod
2
3	!--------------------------------------------------------------------------------!
4	! This file is part of PALM.
5	!
6	! PALM is free software: you can redistribute it and/or modify it under the terms
7	! of the GNU General Public License as published by the Free Software Foundation,
8	! either version 3 of the License, or (at your option) any later version.
9	!
10	! PALM is distributed in the hope that it will be useful, but WITHOUT ANY
11	! WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
12	! A PARTICULAR PURPOSE. See the GNU General Public License for more details.
13	!
14	! You should have received a copy of the GNU General Public License along with
15	! PALM. If not, see <http://www.gnu.org/licenses/>.
16	!
17	! Copyright 1997-2012 Leibniz University Hannover
18	!--------------------------------------------------------------------------------!
19	!
20	! Current revisions:
21	! -----------------
22	! poisfft_hybrid_ini is now called internally from poisfft_hybrid,
23	! ibc_p_b = 2 removed
24	!
25	! Former revisions:
26	! -----------------
27	! $Id: poisfft_hybrid.f90 1111 2013-03-08 23:54:10Z raasch $
28	!
29	! 1106 2013-03-04 05:31:38Z raasch
30	! calls of fft_x, fft_y replaced by fft_x_1d, fft_y_1d
31	!
32	! 1036 2012-10-22 13:43:42Z raasch
33	! code put under GPL (PALM 3.9)
34	!
35	! 1013 2012-09-21 07:03:55Z raasch
36	! FLOAT type conversion replaced by REAL
37	!
38	! 809 2012-01-30 13:32:58Z maronga
39	! Bugfix: replaced .AND. and .NOT. with && and ! in the preprocessor directives
40	!
41	! 807 2012-01-25 11:53:51Z maronga
42	! New cpp directive "__check" implemented which is used by check_namelist_files
43	! (most of the code is unneeded by check_namelist_files).
44	!
45	! 667 2010-12-23 12:06:00Z suehring/gryschka
46	! ddzu replaced by ddzu_pres due to changes in zu(0)
47	!
48	! 415 2009-12-15 10:26:23Z raasch
49	! Dimension of array stat in cascade change to prevent type problems with___
50	! mpi2 libraries
51	!
52	! 274 2009-03-26 15:11:21Z heinze
53	! Output of messages replaced by message handling routine.
54	!
55	! Feb. 2007
56	! RCS Log replace by Id keyword, revision history cleaned up
57	!
58	! Revision 1.11 2004/04/30 12:43:14 raasch
59	! Renaming of fft routines, additional argument in calls of fft_y_m
60	!
61	! Revision 1.2 2002/12/19 16:08:31 raasch
62	! Preprocessor directive KKMP introduced (OMP does NOT work),
63	! array tri will be a shared array in OpenMP loop, to get better cache
64	! utilization, the i index (x-direction) will be executed in stride
65	! "istride" as outer loop and in a shorter inner loop,
66	! overlapping of computation and communication realized by new routine
67	! poisfft_hybrid_nodes, name of old routine poisfft_hybrid changed to
68	! poisfft_hybrid_omp, STOP statement replaced by call of subroutine local_stop
69	!
70	!
71	! Description:
72	! ------------
73	! Solution of the Poisson equation with a 2D spectral method.
74	! Hybrid version for parallel computers using a 1D domain decomposition,
75	! realized with MPI, along x and parallelization with OPEN-MP along y
76	! (routine poisfft_hybrid_omp). In a second version (poisfft_hybrid_nodes),
77	! optimization is realized by overlapping of computation and communication
78	! and by simultaneously executing as many communication calls as switches
79	! per logical partition (LPAR) are available. This version comes into
80	! effect if more than one node is used and if the environment variable
81	! tasks_per_node is set in a way that it can be devided by switch_per_lpar
82	! without any rest.
83	!
84	! WARNING: In case of OpenMP, there are problems with allocating large
85	! arrays in parallel regions.
86	!
87	! Copyright Klaus Ketelsen / Siegfried Raasch May 2002
88	!------------------------------------------------------------------------------!
89
90	USE fft_xy
91	USE indices
92	USE pegrid
93
94	IMPLICIT NONE
95
96	INTEGER, PARAMETER :: switch_per_lpar = 2
97
98	INTEGER, SAVE :: nxl_a, nxr_a, & ! total x dimension
99	nxl_p, nxr_p, & ! partial x dimension
100	nys_a, nyn_a, & ! total y dimension
101	nys_p, nyn_p, & ! partial y dimension
102
103	npe_s, & ! total number of PEs for solver
104	nwords, & ! number of points to be exchanged
105	! with MPI_ALLTOALL
106	n_omp_threads ! number of OpenMP threads
107
108	!
109	!-- Variables for multi node version (cluster version) using routine
110	!-- poisfft_hybrid_nodes
111	INTEGER, SAVE :: comm_nodes, & ! communicater nodes
112	comm_node_all, & ! communicater all PEs node version
113	comm_tasks, & ! communicater tasks
114	me, me_node, me_task,& ! identity of this PE
115	nodes, & ! number of nodes
116	tasks_per_logical_node = -1 ! default no cluster
117
118	LOGICAL, SAVE :: poisfft_initialized = .FALSE.
119
120
121	PRIVATE
122
123
124	#if ! defined ( __check )
125	PUBLIC poisfft_hybrid, poisfft_hybrid_ini
126
127
128	!
129	!-- Public interfaces
130	INTERFACE poisfft_hybrid_ini
131	MODULE PROCEDURE poisfft_hybrid_ini
132	END INTERFACE poisfft_hybrid_ini
133
134	INTERFACE poisfft_hybrid
135	MODULE PROCEDURE poisfft_hybrid
136	END INTERFACE poisfft_hybrid
137
138	!
139	!-- Private interfaces
140	INTERFACE poisfft_hybrid_omp
141	MODULE PROCEDURE poisfft_hybrid_omp
142	END INTERFACE poisfft_hybrid_omp
143
144	INTERFACE poisfft_hybrid_omp_vec
145	MODULE PROCEDURE poisfft_hybrid_omp_vec
146	END INTERFACE poisfft_hybrid_omp_vec
147
148	INTERFACE poisfft_hybrid_nodes
149	MODULE PROCEDURE poisfft_hybrid_nodes
150	END INTERFACE poisfft_hybrid_nodes
151
152	INTERFACE tridia_hybrid
153	MODULE PROCEDURE tridia_hybrid
154	END INTERFACE tridia_hybrid
155
156	INTERFACE cascade
157	MODULE PROCEDURE cascade
158	END INTERFACE cascade
159	#else
160	PUBLIC poisfft_hybrid_ini
161
162	!
163	!-- Public interfaces
164	INTERFACE poisfft_hybrid_ini
165	MODULE PROCEDURE poisfft_hybrid_ini
166	END INTERFACE poisfft_hybrid_ini
167	#endif
168
169	CONTAINS
170
171
172	SUBROUTINE poisfft_hybrid_ini
173
174	USE control_parameters
175	USE pegrid
176
177	IMPLICIT NONE
178
179	CHARACTER(LEN=8) :: cdummy
180	INTEGER :: idummy, istat
181	INTEGER, DIMENSION(2) :: coords, dims
182
183	LOGICAL, DIMENSION(2) :: period = .false., re_dims
184
185
186	!
187	!-- Set the internal index values for the hybrid solver
188	#if defined( __parallel )
189	npe_s = pdims(1)
190	#else
191	npe_s = 1
192	#endif
193	nxl_a = 0
194	nxr_a = nx
195	nxl_p = 0
196	nxr_p = ( ( nx+1 ) / npe_s ) - 1
197	nys_a = nys
198	nyn_a = nyn
199	nys_p = 0
200	nyn_p = ( ( ny+1 ) / npe_s ) - 1
201
202	nwords = ( nxr_p-nxl_p+1 ) * nz * ( nyn_p-nys_p+1 )
203
204	#if defined( __KKMP ) && ! defined ( __check )
205	CALL LOCAL_GETENV( 'OMP_NUM_THREADS', 15, cdummy, idummy )
206	READ ( cdummy, '(I8)' ) n_omp_threads
207	IF ( n_omp_threads > 1 ) THEN
208	WRITE( message_string, * ) 'Number of OpenMP threads = ', &
209	n_omp_threads
210	CALL message( 'poisfft_hybrid_ini', 'PA0280', 0, 0, 0, 6, 0 )
211	ENDIF
212	#else
213	n_omp_threads = 1
214	#endif
215	!
216	!-- Initialize the one-dimensional FFT routines
217	CALL fft_init
218
219	!
220	!-- Setup for multi node version (poisfft_hybrid_nodes)
221	IF ( n_omp_threads == 1 .AND. &
222	( host(1:4) == 'ibmh' .OR. host(1:4) == 'ibmb' ) ) THEN
223
224	IF ( tasks_per_node /= -9999 ) THEN
225	!
226	!-- Multi node version requires that the available number of
227	!-- switches per logical partition must be an integral divisor
228	!-- of the chosen number of tasks per node
229	IF ( MOD( tasks_per_node, switch_per_lpar ) == 0 ) THEN
230	!
231	!-- Set the switch which decides about usage of the multi node
232	!-- version
233	IF ( tasks_per_node / switch_per_lpar > 1 .AND. &
234	numprocs > tasks_per_node ) THEN
235	tasks_per_logical_node = tasks_per_node / switch_per_lpar
236	ENDIF
237
238	IF ( tasks_per_logical_node > -1 ) THEN
239
240	WRITE( message_string, * ) 'running optimized ', &
241	'multinode version', &
242	'&switch_per_lpar = ', &
243	switch_per_lpar, &
244	'&tasks_per_lpar = ', &
245	tasks_per_node, &
246	'tasks_per_logical_node = ', &
247	tasks_per_logical_node
248	CALL message( 'poisfft_hybrid_ini', 'PA0281', 0, 0, 0, 6, 0 )
249
250	ENDIF
251
252	ENDIF
253	ENDIF
254	ENDIF
255
256	!
257	!-- Determine sub-topologies for multi node version
258	IF ( tasks_per_logical_node >= 2 ) THEN
259
260	#if defined( __parallel ) && ! defined ( __check )
261	nodes = ( numprocs + tasks_per_logical_node - 1 ) / &
262	tasks_per_logical_node
263	dims(1) = nodes
264	dims(2) = tasks_per_logical_node
265
266	CALL MPI_CART_CREATE( comm2d, 2, dims, period, .FALSE., &
267	comm_node_all, istat )
268	CALL MPI_COMM_RANK( comm_node_all, me, istat )
269
270	re_dims(1) = .TRUE.
271	re_dims(2) = .FALSE.
272	CALL MPI_CART_SUB( comm_node_all, re_dims, comm_nodes, istat )
273	CALL MPI_COMM_RANK( comm_nodes, me_node, istat )
274
275	re_dims(1) = .FALSE.
276	re_dims(2) = .TRUE.
277	CALL MPI_CART_SUB( comm_node_all, re_dims, comm_tasks, istat )
278	CALL MPI_COMM_RANK( comm_tasks, me_task, istat )
279
280	! write(0,*) 'who am i',myid,me,me_node,me_task,nodes,&
281	! tasks_per_logical_node
282	#elif ! defined( __parallel )
283	message_string = 'parallel environment (MPI) required'
284	CALL message( 'poisfft_hybrid_ini', 'PA0282', 1, 2, 0, 6, 0 )
285	#endif
286	ENDIF
287
288	poisfft_initialized = .TRUE.
289
290	END SUBROUTINE poisfft_hybrid_ini
291
292	#if ! defined ( __check )
293	SUBROUTINE poisfft_hybrid( ar )
294
295	USE control_parameters
296	USE interfaces
297
298	IMPLICIT NONE
299
300	REAL, DIMENSION(1:nz,nys:nyn,nxl:nxr) :: ar
301
302	IF ( .NOT. poisfft_initialized ) CALL poisfft_hybrid_ini
303
304	IF ( host(1:3) == 'nec' ) THEN
305	CALL poisfft_hybrid_omp_vec( ar )
306	ELSE
307	IF ( tasks_per_logical_node == -1 ) THEN
308	CALL poisfft_hybrid_omp( ar )
309	ELSE
310	CALL poisfft_hybrid_nodes( ar )
311	ENDIF
312	ENDIF
313
314	END SUBROUTINE poisfft_hybrid
315
316
317	SUBROUTINE poisfft_hybrid_omp ( ar )
318
319	USE cpulog
320	USE interfaces
321
322	IMPLICIT NONE
323
324	INTEGER, PARAMETER :: istride = 4 ! stride of i loop
325	INTEGER :: i, ii, ir, iei, iouter, istat, j, jj, k, m, n, jthread
326
327	REAL, DIMENSION(1:nz,nys:nyn,nxl:nxr) :: ar
328
329	REAL, DIMENSION(0:nx) :: fftx_ar
330	REAL, DIMENSION(0:ny,istride) :: ffty_ar
331
332	REAL, DIMENSION(0:nx,nz) :: tri_ar
333
334	REAL, DIMENSION(nxl_p:nxr_p,nz,nys_p:nyn_p,npe_s) :: work1, work2
335	#if defined( __KKMP )
336	INTEGER :: omp_get_thread_num
337	REAL, DIMENSION(:,:,:,:), ALLOCATABLE :: tri
338	ALLOCATE( tri(5,0:nx,0:nz-1,n_omp_threads ) )
339	#else
340	REAL, DIMENSION(5,0:nx,0:nz-1,1) :: tri
341	#endif
342
343
344	CALL cpu_log( log_point_s(30), 'poisfft_hybrid_omp', 'start' )
345
346	CALL cpu_log( log_point_s(7), 'fft_y_1d', 'start' )
347
348	!$OMP PARALLEL PRIVATE (i,iouter,ii,ir,iei,j,k,m,n,ffty_ar)
349	!$OMP DO
350	!
351	!-- Store grid points to be transformed on a 1d-array, do the fft
352	!-- and sample the results on a 4d-array
353	DO iouter = nxl_p, nxr_p, istride ! stride loop, better cache
354	iei = MIN( iouter+istride-1, nxr_p )
355	DO k = 1, nz
356
357	DO i = iouter, iei
358	ii = nxl + i
359	ir = i - iouter + 1
360
361	DO j = nys_a, nyn_a
362	ffty_ar(j,ir) = ar(k,j,ii)
363	ENDDO
364
365	CALL fft_y_1d( ffty_ar(:,ir), 'forward' )
366	ENDDO
367
368	m = nys_a
369	DO n = 1, npe_s
370	DO j = nys_p, nyn_p
371	DO i = iouter, iei
372	ir = i - iouter + 1
373	work1(i,k,j,n) = ffty_ar(m,ir)
374	ENDDO
375	m = m+1
376	ENDDO
377	ENDDO
378
379	ENDDO
380	ENDDO
381	!$OMP END PARALLEL
382
383	CALL cpu_log( log_point_s(7), 'fft_y_1d', 'pause' )
384
385	#if defined( __parallel )
386	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
387
388	CALL MPI_ALLTOALL( work1(nxl_p,1,nys_p,1), nwords, MPI_REAL, &
389	work2(nxl_p,1,nys_p,1), nwords, MPI_REAL, &
390	comm2d, istat )
391
392	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
393	#else
394	work2 = work1
395	#endif
396
397	CALL cpu_log( log_point_s(33), 'fft_x_1d + tridia', 'start' )
398
399	#if defined( __KKMP )
400	!$OMP PARALLEL PRIVATE (i,j,jj,k,m,n,fftx_ar,tri_ar,jthread)
401	!$OMP DO
402	DO j = nys_p, nyn_p
403	jthread = omp_get_thread_num() + 1
404	#else
405	DO j = nys_p, nyn_p
406	jthread = 1
407	#endif
408	DO k = 1, nz
409
410	m = nxl_a
411	DO n = 1, npe_s
412	DO i = nxl_p, nxr_p
413	fftx_ar(m) = work2(i,k,j,n)
414	m = m+1
415	ENDDO
416	ENDDO
417
418	CALL fft_x_1d( fftx_ar, 'forward' )
419
420	DO i = nxl_a, nxr_a
421	tri_ar(i,k) = fftx_ar(i)
422	ENDDO
423
424	ENDDO
425
426	jj = myid * (nyn_p-nys_p+1) + j
427	CALL tridia_hybrid( jj, tri_ar, tri(:,:,:,jthread))
428
429	DO k = 1, nz
430	DO i = nxl_a, nxr_a
431	fftx_ar(i) = tri_ar (i,k)
432	ENDDO
433
434	CALL fft_x_1d( fftx_ar, 'backward' )
435
436	m = nxl_a
437	DO n = 1, npe_s
438	DO i = nxl_p, nxr_p
439	work2(i,k,j,n) = fftx_ar(m)
440	m = m+1
441	ENDDO
442	ENDDO
443
444	ENDDO
445	ENDDO
446	#if defined( __KKMP )
447	!$OMP END PARALLEL
448	#endif
449
450	CALL cpu_log( log_point_s(33), 'fft_x_1d + tridia', 'stop' )
451
452	#if defined( __parallel )
453	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
454	nwords = (nxr_p-nxl_p+1) * nz * (nyn_p-nys_p+1)
455
456	CALL MPI_ALLTOALL( work2(nxl_p,1,nys_p,1), nwords, MPI_REAL, &
457	work1(nxl_p,1,nys_p,1), nwords, MPI_REAL, &
458	comm2d, istat )
459
460	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
461	#else
462	work1 = work2
463	#endif
464
465	CALL cpu_log( log_point_s(7), 'fft_y_1d', 'continue' )
466
467	!$OMP PARALLEL PRIVATE (i,iouter,ii,ir,iei,j,k,m,n,ffty_ar)
468	!$OMP DO
469	DO iouter = nxl_p, nxr_p, istride
470	iei = MIN( iouter+istride-1, nxr_p )
471	DO k = 1, nz
472
473	m = nys_a
474	DO n = 1, npe_s
475	DO j = nys_p, nyn_p
476	DO i = iouter, iei
477	ir = i - iouter + 1
478	ffty_ar(m,ir) = work1 (i,k,j,n)
479	ENDDO
480	m = m+1
481	ENDDO
482	ENDDO
483
484	DO i = iouter, iei
485	ii = nxl + i
486	ir = i - iouter + 1
487	CALL fft_y_1d( ffty_ar(:,ir), 'backward' )
488
489	DO j = nys_a, nyn_a
490	ar(k,j,ii) = ffty_ar(j,ir)
491	ENDDO
492	ENDDO
493
494	ENDDO
495	ENDDO
496	!$OMP END PARALLEL
497
498	CALL cpu_log( log_point_s(7), 'fft_y_1d', 'stop' )
499
500	CALL cpu_log( log_point_s(30), 'poisfft_hybrid_omp', 'stop' )
501
502	#if defined( __KKMP )
503	DEALLOCATE( tri )
504	#endif
505
506	END SUBROUTINE poisfft_hybrid_omp
507
508
509	SUBROUTINE poisfft_hybrid_omp_vec ( ar )
510
511	USE cpulog
512	USE interfaces
513
514	IMPLICIT NONE
515
516	INTEGER, PARAMETER :: istride = 4 ! stride of i loop
517	INTEGER :: i, ii, ir, iei, iouter, istat, j, jj, k, m, n, jthread
518
519	REAL, DIMENSION(0:nx,nz) :: tri_ar
520
521	REAL, DIMENSION(1:nz,nys:nyn,nxl:nxr) :: ar
522
523	REAL, DIMENSION(0:ny+3,nz,nxl_p:nxr_p) :: ffty_ar3
524	REAL, DIMENSION(0:nx+3,nz,nys_p:nyn_p) :: fftx_ar3
525
526	REAL, DIMENSION(nxl_p:nxr_p,nz,nys_p:nyn_p,npe_s) :: work1, work2
527	#if defined( __KKMP )
528	INTEGER :: omp_get_thread_num
529	REAL, DIMENSION(:,:,:,:), ALLOCATABLE :: tri
530	ALLOCATE( tri(5,0:nx,0:nz-1,n_omp_threads ) )
531	#else
532	REAL, DIMENSION(5,0:nx,0:nz-1,1) :: tri
533	#endif
534
535
536	CALL cpu_log( log_point_s(30), 'poisfft_hybrid_vec', 'start' )
537
538	CALL cpu_log( log_point_s(7), 'fft_y_m', 'start' )
539
540	!$OMP PARALLEL PRIVATE (i,j,k,m,n)
541	!$OMP DO
542	!
543	!-- Store grid points to be transformed on a 1d-array, do the fft
544	!-- and sample the results on a 4d-array
545	DO i = nxl_p, nxr_p
546
547	DO j = nys_a, nyn_a
548	DO k = 1, nz
549	ffty_ar3(j,k,i) = ar(k,j,i+nxl)
550	ENDDO
551	ENDDO
552
553	CALL fft_y_m( ffty_ar3(:,:,i), ny+3, 'forward' )
554	ENDDO
555
556	!$OMP DO
557	DO k = 1, nz
558	m = nys_a
559	DO n = 1, npe_s
560	DO j = nys_p, nyn_p
561	DO i = nxl_p, nxr_p
562	work1(i,k,j,n) = ffty_ar3(m,k,i)
563	ENDDO
564	m = m+1
565	ENDDO
566	ENDDO
567	ENDDO
568	!$OMP END PARALLEL
569
570	CALL cpu_log( log_point_s(7), 'fft_y_m', 'pause' )
571
572	#if defined( __parallel )
573	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
574	CALL MPI_ALLTOALL( work1(nxl_p,1,nys_p,1), nwords, MPI_REAL, &
575	work2(nxl_p,1,nys_p,1), nwords, MPI_REAL, &
576	comm2d, istat )
577	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
578	#else
579	work2 = work1
580	#endif
581
582	CALL cpu_log( log_point_s(33), 'fft_x_m + tridia', 'start' )
583
584	#if defined( __KKMP )
585	!$OMP PARALLEL PRIVATE (i,j,jj,k,m,n,tri_ar,jthread)
586	!$OMP DO
587	DO j = nys_p, nyn_p
588	jthread = omp_get_thread_num() + 1
589	#else
590	DO j = nys_p, nyn_p
591	jthread = 1
592	#endif
593	DO k = 1, nz
594
595	m = nxl_a
596	DO n = 1, npe_s
597	DO i = nxl_p, nxr_p
598	fftx_ar3(m,k,j) = work2(i,k,j,n)
599	m = m+1
600	ENDDO
601	ENDDO
602	ENDDO
603
604	CALL fft_x_m( fftx_ar3(:,:,j), 'forward' )
605
606	DO k = 1, nz
607	DO i = nxl_a, nxr_a
608	tri_ar(i,k) = fftx_ar3(i,k,j)
609	ENDDO
610	ENDDO
611
612	jj = myid * (nyn_p-nys_p+1) + j
613	CALL tridia_hybrid( jj, tri_ar, tri(:,:,:,jthread))
614
615	DO k = 1, nz
616	DO i = nxl_a, nxr_a
617	fftx_ar3(i,k,j) = tri_ar (i,k)
618	ENDDO
619	ENDDO
620
621	CALL fft_x_m( fftx_ar3(:,:,j), 'backward' )
622
623	DO k = 1, nz
624	m = nxl_a
625	DO n = 1, npe_s
626	DO i = nxl_p, nxr_p
627	work2(i,k,j,n) = fftx_ar3(m,k,j)
628	m = m+1
629	ENDDO
630	ENDDO
631	ENDDO
632
633	ENDDO
634	#if defined( __KKMP )
635	!$OMP END PARALLEL
636	#endif
637
638	CALL cpu_log( log_point_s(33), 'fft_x_m + tridia', 'stop' )
639
640	#if defined( __parallel )
641	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
642	nwords = (nxr_p-nxl_p+1) * nz * (nyn_p-nys_p+1)
643	CALL MPI_ALLTOALL( work2(nxl_p,1,nys_p,1), nwords, MPI_REAL, &
644	work1(nxl_p,1,nys_p,1), nwords, MPI_REAL, &
645	comm2d, istat )
646	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
647	#else
648	work1 = work2
649	#endif
650
651	CALL cpu_log( log_point_s(7), 'fft_y_m', 'continue' )
652
653	!$OMP PARALLEL PRIVATE (i,iouter,ii,ir,iei,j,k,m,n)
654	!$OMP DO
655	DO k = 1, nz
656	m = nys_a
657	DO n = 1, npe_s
658	DO j = nys_p, nyn_p
659	DO i = nxl_p, nxr_p
660	ffty_ar3(m,k,i) = work1(i,k,j,n)
661	ENDDO
662	m = m+1
663	ENDDO
664	ENDDO
665	ENDDO
666
667	!$OMP DO
668	DO i = nxl_p, nxr_p
669	CALL fft_y_m( ffty_ar3(:,:,i), ny+3, 'backward' )
670	DO j = nys_a, nyn_a
671	DO k = 1, nz
672	ar(k,j,i+nxl) = ffty_ar3(j,k,i)
673	ENDDO
674	ENDDO
675	ENDDO
676	!$OMP END PARALLEL
677
678	CALL cpu_log( log_point_s(7), 'fft_y_m', 'stop' )
679
680	CALL cpu_log( log_point_s(30), 'poisfft_hybrid_vec', 'stop' )
681
682	#if defined( __KKMP )
683	DEALLOCATE( tri )
684	#endif
685
686	END SUBROUTINE poisfft_hybrid_omp_vec
687
688
689	SUBROUTINE poisfft_hybrid_nodes ( ar )
690
691	USE cpulog
692	USE interfaces
693
694	IMPLICIT NONE
695
696	INTEGER, PARAMETER :: istride = 4 ! stride of i loop
697	INTEGER :: i, iei, ii, iouter, ir, istat, j, jj, k, m, &
698	n, nn, nt, nw1, nw2
699
700	REAL, DIMENSION(1:nz,nys:nyn,nxl:nxr) :: ar
701
702	REAL, DIMENSION(0:nx) :: fftx_ar
703	REAL, DIMENSION(0:ny,istride) :: ffty_ar
704
705	REAL, DIMENSION(0:nx,nz) :: tri_ar
706
707	REAL, DIMENSION(nxl_p:nxr_p,nz,tasks_per_logical_node, &
708	nodes,nys_p:nyn_p) :: work1,work2
709	REAL, DIMENSION(5,0:nx,0:nz-1) :: tri
710
711
712	CALL cpu_log( log_point_s(30), 'poisfft_hybrid_nodes', 'start' )
713
714	CALL cpu_log( log_point_s(7), 'fft_y_1d', 'start' )
715
716	!
717	!-- Store grid points to be transformed on a 1d-array, do the fft
718	!-- and sample the results on a 4d-array
719	DO iouter = nxl_p, nxr_p, istride ! stride loop, better cache
720	iei = MIN( iouter+istride-1, nxr_p )
721	DO k = 1, nz
722
723	DO i = iouter, iei
724	ii = nxl + i
725	ir = i - iouter + 1
726
727	DO j = nys_a, nyn_a
728	ffty_ar(j,ir) = ar(k,j,ii)
729	ENDDO
730
731	CALL fft_y_1d( ffty_ar(:,ir), 'forward' )
732	ENDDO
733
734	m = nys_a
735	DO nn = 1, nodes
736	DO nt = 1, tasks_per_logical_node
737	DO j = nys_p, nyn_p
738	DO i = iouter, iei
739	ir = i - iouter + 1
740	work1(i,k,nt,nn,j) = ffty_ar(m,ir)
741	ENDDO
742	m = m+1
743	ENDDO
744	ENDDO
745	ENDDO
746
747	ENDDO
748	ENDDO
749
750	CALL cpu_log( log_point_s(7), 'fft_y_1d', 'pause' )
751
752	CALL cpu_log( log_point_s(32), 'alltoall_task', 'start' )
753	nw1 = SIZE( work1, 1 ) * SIZE( work1, 2 )
754	DO nn = 1, nodes
755	DO j = nys_p, nyn_p
756	#if defined( __parallel )
757	CALL MPI_ALLTOALL( work1(nxl_p,1,1,nn,j), nw1, MPI_REAL, &
758	work2(nxl_p,1,1,nn,j), nw1, MPI_REAL, &
759	comm_tasks, istat )
760	#endif
761	ENDDO
762	ENDDO
763	CALL cpu_log( log_point_s(32), 'alltoall_task', 'stop' )
764
765
766	DO j = nys_p, nyn_p
767
768	CALL cascade( 1, j, nys_p, nyn_p )
769	nw2 = nw1 * SIZE( work1, 3 )
770	CALL cpu_log( log_point_s(37), 'alltoall_node', 'start' )
771	#if defined( __parallel )
772	CALL MPI_ALLTOALL( work2(nxl_p,1,1,1,j), nw2, MPI_REAL, &
773	work1(nxl_p,1,1,1,j), nw2, MPI_REAL, &
774	comm_nodes, istat )
775	#endif
776	CALL cpu_log( log_point_s(37), 'alltoall_node', 'pause' )
777	CALL cascade( 2, j, nys_p, nyn_p )
778
779	CALL cpu_log( log_point_s(33), 'fft_x_1d + tridia', 'start' )
780	DO k = 1, nz
781
782	m = nxl_a
783	DO nn = 1, nodes
784	DO nt = 1, tasks_per_logical_node
785	DO i = nxl_p, nxr_p
786	fftx_ar(m) = work1(i,k,nt,nn,j)
787	m = m+1
788	ENDDO
789	ENDDO
790	ENDDO
791
792	CALL fft_x_1d( fftx_ar, 'forward' )
793
794	DO i = nxl_a, nxr_a
795	tri_ar(i,k) = fftx_ar(i)
796	ENDDO
797
798	ENDDO
799
800	jj = myid * (nyn_p-nys_p+1) + j
801	CALL tridia_hybrid( jj, tri_ar, tri(:,:,:) )
802
803	DO k = 1, nz
804	DO i = nxl_a, nxr_a
805	fftx_ar(i) = tri_ar(i,k)
806	ENDDO
807
808	CALL fft_x_1d( fftx_ar, 'backward' )
809
810	m = nxl_a
811	DO nn = 1, nodes
812	DO nt = 1, tasks_per_logical_node
813	DO i = nxl_p, nxr_p
814	work1(i,k,nt,nn,j) = fftx_ar(m)
815	m = m+1
816	ENDDO
817	ENDDO
818	ENDDO
819	ENDDO
820
821	CALL cpu_log( log_point_s(33), 'fft_x_1d + tridia', 'stop' )
822	nw2 = nw1 * SIZE( work1, 3 )
823	CALL cpu_log( log_point_s(37), 'alltoall_node', 'continue' )
824	#if defined( __parallel )
825	CALL MPI_ALLTOALL( work1(nxl_p,1,1,1,j), nw2, MPI_REAL, &
826	work2(nxl_p,1,1,1,j), nw2, MPI_REAL, &
827	comm_nodes, istat )
828	#endif
829	CALL cpu_log( log_point_s(37), 'alltoall_node', 'stop' )
830
831	ENDDO
832
833	CALL cpu_log( log_point_s(32), 'alltoall_task', 'start' )
834	DO nn = 1, nodes
835	DO j = nys_p, nyn_p
836	#if defined( __parallel )
837	CALL MPI_ALLTOALL( work2(nxl_p,1,1,nn,j), nw1, MPI_REAL, &
838	work1(nxl_p,1,1,nn,j), nw1, MPI_REAL, &
839	comm_tasks, istat )
840	#endif
841	ENDDO
842	ENDDO
843	CALL cpu_log( log_point_s(32), 'alltoall_task', 'stop' )
844
845	CALL cpu_log( log_point_s(7), 'fft_y_1d', 'continue' )
846
847	DO iouter = nxl_p, nxr_p, istride
848	iei = MIN( iouter+istride-1, nxr_p )
849	DO k = 1, nz
850
851	m = nys_a
852	DO nn = 1, nodes
853	DO nt = 1, tasks_per_logical_node
854	DO j = nys_p, nyn_p
855	DO i = iouter, iei
856	ir = i - iouter + 1
857	ffty_ar(m,ir) = work1(i,k,nt,nn,j)
858	ENDDO
859	m = m+1
860	ENDDO
861	ENDDO
862	ENDDO
863
864	DO i = iouter, iei
865	ii = nxl + i
866	ir = i - iouter + 1
867	CALL fft_y_1d( ffty_ar(:,ir), 'backward' )
868
869	DO j = nys_a, nyn_a
870	ar(k,j,ii) = ffty_ar(j,ir)
871	ENDDO
872	ENDDO
873
874	ENDDO
875	ENDDO
876
877	CALL cpu_log( log_point_s(7), 'fft_y_1d', 'stop' )
878
879	CALL cpu_log( log_point_s(30), 'poisfft_hybrid_nodes', 'stop' )
880
881	END SUBROUTINE poisfft_hybrid_nodes
882
883
884
885	SUBROUTINE tridia_hybrid( j, ar, tri )
886
887	USE arrays_3d
888	USE control_parameters
889	USE grid_variables
890
891	IMPLICIT NONE
892
893	INTEGER :: i, j, k, nnyh
894	REAL, DIMENSION(0:nx,nz) :: ar
895	REAL, DIMENSION(0:nx,0:nz-1) :: ar1
896	REAL, DIMENSION(5,0:nx,0:nz-1) :: tri
897
898	nnyh = (ny+1) / 2
899
900	tri = 0.0
901	!
902	!-- Define constant elements of the tridiagonal matrix.
903	DO k = 0, nz-1
904	DO i = 0,nx
905	tri(2,i,k) = ddzu_pres(k+1) * ddzw(k+1)
906	tri(3,i,k) = ddzu_pres(k+2) * ddzw(k+1)
907	ENDDO
908	ENDDO
909
910	IF ( j <= nnyh ) THEN
911	CALL maketri_hybrid( j )
912	ELSE
913	CALL maketri_hybrid( ny+1-j)
914	ENDIF
915	CALL zerleg_hybrid
916	CALL substi_hybrid( ar, tri )
917
918	CONTAINS
919
920	SUBROUTINE maketri_hybrid( j )
921
922	!----------------------------------------------------------------------!
923	! maketri !
924	! !
925	! computes the i- and j-dependent component of the matrix !
926	!----------------------------------------------------------------------!
927
928	USE constants
929
930	IMPLICIT NONE
931
932	INTEGER :: i, j, k, nnxh
933	REAL :: a, c
934
935	REAL, DIMENSION(0:nx) :: l
936
937
938	nnxh = (nx+1) / 2
939	!
940	!-- Provide the tridiagonal matrix for solution of the Poisson equation
941	!-- in Fourier space. The coefficients are computed following the method
942	!-- of Schmidt et al. (DFVLR-Mitteilung 84-15) --> departs from Stephan
943	!-- Siano's original version.
944	DO i = 0,nx
945	IF ( i >= 0 .AND. i < nnxh ) THEN
946	l(i) = 2.0 * ( 1.0 - COS( ( 2.0 * pi * i ) / &
947	REAL( nx+1 ) ) ) / ( dx * dx ) + &
948	2.0 * ( 1.0 - COS( ( 2.0 * pi * j ) / &
949	REAL( ny+1 ) ) ) / ( dy * dy )
950	ELSEIF ( i == nnxh ) THEN
951	l(i) = 2.0 * ( 1.0 - COS( ( 2.0 * pi * ( nx+1-i ) ) / &
952	REAL( nx+1 ) ) ) / ( dx * dx ) + &
953	2.0 * ( 1.0 - COS( ( 2.0 * pi * j ) / &
954	REAL(ny+1) ) ) / ( dy * dy )
955	ELSE
956	l(i) = 2.0 * ( 1.0 - COS( ( 2.0 * pi * ( nx+1-i ) ) / &
957	REAL( nx+1 ) ) ) / ( dx * dx ) + &
958	2.0 * ( 1.0 - COS( ( 2.0 * pi * j ) / &
959	REAL( ny+1 ) ) ) / ( dy * dy )
960	ENDIF
961	ENDDO
962
963	DO k = 0,nz-1
964	DO i = 0, nx
965	a = -1.0 * ddzu_pres(k+2) * ddzw(k+1)
966	c = -1.0 * ddzu_pres(k+1) * ddzw(k+1)
967	tri(1,i,k) = a + c - l(i)
968	ENDDO
969	ENDDO
970	IF ( ibc_p_b == 1 ) THEN
971	DO i = 0,nx
972	tri(1,i,0) = tri(1,i,0) + tri(2,i,0)
973	ENDDO
974	ENDIF
975	IF ( ibc_p_t == 1 ) THEN
976	DO i = 0,nx
977	tri(1,i,nz-1) = tri(1,i,nz-1) + tri(3,i,nz-1)
978	ENDDO
979	ENDIF
980
981	END SUBROUTINE maketri_hybrid
982
983
984	SUBROUTINE zerleg_hybrid
985
986	!----------------------------------------------------------------------!
987	! zerleg !
988	! !
989	! Splitting of the tridiagonal matrix (Thomas algorithm) !
990	!----------------------------------------------------------------------!
991
992	USE indices
993
994	IMPLICIT NONE
995
996	INTEGER :: i, k
997
998	!
999	!-- Splitting
1000	DO i = 0, nx
1001	tri(4,i,0) = tri(1,i,0)
1002	ENDDO
1003	DO k = 1, nz-1
1004	DO i = 0,nx
1005	tri(5,i,k) = tri(2,i,k) / tri(4,i,k-1)
1006	tri(4,i,k) = tri(1,i,k) - tri(3,i,k-1) * tri(5,i,k)
1007	ENDDO
1008	ENDDO
1009
1010	END SUBROUTINE zerleg_hybrid
1011
1012	SUBROUTINE substi_hybrid( ar, tri )
1013
1014	!----------------------------------------------------------------------!
1015	! substi !
1016	! !
1017	! Substitution (Forward and Backward) (Thomas algorithm) !
1018	!----------------------------------------------------------------------!
1019
1020	IMPLICIT NONE
1021
1022	INTEGER :: i, j, k
1023	REAL, DIMENSION(0:nx,nz) :: ar
1024	REAL, DIMENSION(0:nx,0:nz-1) :: ar1
1025	REAL, DIMENSION(5,0:nx,0:nz-1) :: tri
1026
1027	!
1028	!-- Forward substitution
1029	DO i = 0, nx
1030	ar1(i,0) = ar(i,1)
1031	ENDDO
1032	DO k = 1, nz - 1
1033	DO i = 0,nx
1034	ar1(i,k) = ar(i,k+1) - tri(5,i,k) * ar1(i,k-1)
1035	ENDDO
1036	ENDDO
1037
1038	!
1039	!-- Backward substitution
1040	DO i = 0,nx
1041	ar(i,nz) = ar1(i,nz-1) / tri(4,i,nz-1)
1042	ENDDO
1043	DO k = nz-2, 0, -1
1044	DO i = 0,nx
1045	ar(i,k+1) = ( ar1(i,k) - tri(3,i,k) * ar(i,k+2) ) &
1046	/ tri(4,i,k)
1047	ENDDO
1048	ENDDO
1049
1050	END SUBROUTINE substi_hybrid
1051
1052	END SUBROUTINE tridia_hybrid
1053
1054
1055	SUBROUTINE cascade( loca, j, nys_p, nyn_p )
1056
1057	USE cpulog
1058
1059	IMPLICIT NONE
1060
1061	INTEGER :: ier, j, loca, nyn_p, nys_p, req, reqa(1)
1062	INTEGER, SAVE :: tag = 10
1063	#if defined( __parallel )
1064	INTEGER, DIMENSION(MPI_STATUS_SIZE) :: stat
1065	INTEGER, DIMENSION(MPI_STATUS_SIZE,1) :: stata
1066	#endif
1067
1068	REAL :: buf, buf1
1069
1070
1071	buf = 1.0
1072	buf1 = 1.1
1073	IF ( me_node == 0 ) THEN ! first node only
1074
1075	SELECT CASE ( loca )
1076
1077	CASE ( 1 ) ! before alltoall
1078
1079	IF( me_task > 0 ) THEN ! first task does not wait
1080	#if defined( __parallel )
1081	CALL MPI_SENDRECV( buf, 1, MPI_REAL, me_task-1, 0, &
1082	buf1, 1, MPI_REAL, me_task-1, 0, &
1083	comm_tasks, stat, ierr )
1084	#endif
1085	ELSEIF ( j > nys_p ) THEN
1086	req = 0
1087	tag = MOD( tag-10, 10 ) + 10
1088	#if defined( __parallel )
1089	CALL MPI_IRECV( buf, 1, MPI_REAL, tasks_per_logical_node-1,&
1090	tag, comm_tasks, req, ierr )
1091	reqa = req
1092	CALL MPI_WAITALL( 1, reqa, stata, ierr )
1093	#endif
1094	ENDIF
1095
1096	CASE ( 2 ) ! after alltoall
1097
1098	IF ( me_task < tasks_per_logical_node-1 ) THEN ! last task
1099	#if defined( __parallel )
1100	CALL MPI_SENDRECV( buf, 1, MPI_REAL, me_task+1, 0, &
1101	buf1, 1, MPI_REAL, me_task+1, 0, &
1102	comm_tasks, stat, ierr)
1103	#endif
1104	ELSEIF ( j < nyn_p ) THEN
1105	req = 0
1106	tag = MOD( tag-10, 10 ) + 10
1107	#if defined( __parallel )
1108	CALL MPI_ISEND( buf, 1, MPI_REAL, 0, tag, comm_tasks, req, &
1109	ierr )
1110	#endif
1111	ENDIF
1112
1113	END SELECT
1114
1115	ENDIF
1116
1117	END SUBROUTINE cascade
1118	#endif
1119	END MODULE poisfft_hybrid_mod

Note: See TracBrowser for help on using the repository browser.

Context Navigation

source: palm/trunk/SOURCE/poisfft_hybrid.f90 @ 1111

Download in other formats: