Home

Context Navigation

poisfft.f90 @ 1106

Last change on this file since 1106 was 1106, checked in by raasch, 11 years ago

New:
---

Porting of FFT-solver for serial runs to GPU using CUDA FFT,
preprocessor lines in transpose routines rearranged, so that routines can also
be used in serial (non-parallel) mode,
transpositions also carried out in serial mode, routines fftx, fftxp replaced
by calls of fft_x, fft_x replaced by fft_x_1d in the 1D-decomposition routines
(Makefile, Makefile_check, fft_xy, poisfft, poisfft_hybrid, transpose, new: cuda_fft_interfaces)

--stdin argument for mpiexec on lckyuh, -y and -Y settings output to header (mrun)

Changed:

Module array_kind renamed precision_kind
(check_open, data_output_3d, fft_xy, modules, user_data_output_3d)

some format changes for coupled atmosphere-ocean runs (header)
small changes in code formatting (microphysics, prognostic_equations)

Errors:

bugfix: default value (0) assigned to coupling_start_time (modules)
bugfix: initial time for preruns of coupled runs is output as -coupling_start_time (data_output_profiles)

Property svn:keywords set to Id

File size: 45.2 KB

Line
1	MODULE poisfft_mod
2
3	!--------------------------------------------------------------------------------!
4	! This file is part of PALM.
5	!
6	! PALM is free software: you can redistribute it and/or modify it under the terms
7	! of the GNU General Public License as published by the Free Software Foundation,
8	! either version 3 of the License, or (at your option) any later version.
9	!
10	! PALM is distributed in the hope that it will be useful, but WITHOUT ANY
11	! WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
12	! A PARTICULAR PURPOSE. See the GNU General Public License for more details.
13	!
14	! You should have received a copy of the GNU General Public License along with
15	! PALM. If not, see <http://www.gnu.org/licenses/>.
16	!
17	! Copyright 1997-2012 Leibniz University Hannover
18	!--------------------------------------------------------------------------------!
19	!
20	! Current revisions:
21	! -----------------
22	! routines fftx, ffty, fftxp, fftyp removed, calls replaced by fft_x, fft_y,
23	! in the 1D-decomposition routines fft_x, ffty are replaced by fft_x_1d,
24	! fft_y_1d
25	!
26	! Former revisions:
27	! -----------------
28	! $Id: poisfft.f90 1106 2013-03-04 05:31:38Z raasch $
29	!
30	! 1103 2013-02-20 02:15:53Z raasch
31	! tri, ar, and ar1 arguments in tridia-routines (2d) are removed because they
32	! sometimes cause segmentation faults with intel 12.1 compiler
33	!
34	! 1092 2013-02-02 11:24:22Z raasch
35	! unused variables removed
36	!
37	! 1036 2012-10-22 13:43:42Z raasch
38	! code put under GPL (PALM 3.9)
39	!
40	! 2012-09-21 07:03:55Z raasch
41	! FLOAT type conversion replaced by REAL
42	!
43	! 1003 2012-09-14 14:35:53Z raasch
44	! indices nxa, nya, etc. replaced by nx, ny, etc.
45	!
46	! 940 2012-07-09 14:31:00Z raasch
47	! special handling of tri-array as an argument in tridia_1dd routines switched
48	! off because it caused segmentation faults with intel 12.1 compiler
49	!
50	! 877 2012-04-03 11:21:44Z suehring
51	! Bugfix: Avoid divisions by zero in case of using a 'neumann' bc for the
52	! pressure at the top of the model domain.
53	!
54	! 809 2012-01-30 13:32:58Z maronga
55	! Bugfix: replaced .AND. and .NOT. with && and ! in the preprocessor directives
56	!
57	! 807 2012-01-25 11:53:51Z maronga
58	! New cpp directive "__check" implemented which is used by check_namelist_files
59	! (most of the code is unneeded by check_namelist_files).
60	!
61	! 763 2011-10-06 09:32:09Z suehring
62	! Comment added concerning the last change.
63	!
64	! 761 2011-10-05 17:58:52Z suehring
65	! Bugfix: Avoid divisions by zero in case of using a 'neumann' bc for the
66	! pressure at the top of the model domain.
67	!
68	! 696 2011-03-18 07:03:49Z raasch
69	! work_fftx removed from PRIVATE clauses in fftx_tr_xy and tr_yx_fftx
70	!
71	! 683 2011-02-09 14:25:15Z raasch
72	! openMP parallelization for 2d-domain-decomposition
73	!
74	! 667 2010-12-23 12:06:00Z suehring/gryschka
75	! ddzu replaced by ddzu_pres due to changes in zu(0)
76	!
77	! 622 2010-12-10 08:08:13Z raasch
78	! optional barriers included in order to speed up collective operations
79	!
80	! 377 2009-09-04 11:09:00Z raasch
81	! __lcmuk changed to __lc to avoid problems with Intel compiler on sgi-ice
82	!
83	! 164 2008-05-15 08:46:15Z raasch
84	! Arguments removed from transpose routines
85	!
86	! 128 2007-10-26 13:11:14Z raasch
87	! Bugfix: wavenumber calculation for even nx in routines maketri
88	!
89	! 85 2007-05-11 09:35:14Z raasch
90	! Bugfix: work_fft*_vec removed from some PRIVATE-declarations
91	!
92	! 76 2007-03-29 00:58:32Z raasch
93	! Tridiagonal coefficients adjusted for Neumann boundary conditions both at
94	! the bottom and the top.
95	!
96	! RCS Log replace by Id keyword, revision history cleaned up
97	!
98	! Revision 1.24 2006/08/04 15:00:24 raasch
99	! Default setting of the thread number tn in case of not using OpenMP
100	!
101	! Revision 1.23 2006/02/23 12:48:38 raasch
102	! Additional compiler directive in routine tridia_1dd for preventing loop
103	! exchange on NEC-SX6
104	!
105	! Revision 1.20 2004/04/30 12:38:09 raasch
106	! Parts of former poisfft_hybrid moved to this subroutine,
107	! former subroutine changed to a module, renaming of FFT-subroutines and
108	! -module, FFTs completely substituted by calls of fft_x and fft_y,
109	! NAG fft used in the non-parallel case completely removed, l in maketri
110	! is now a 1d-array, variables passed by modules instead of using parameter
111	! lists, enlarged transposition arrays introduced
112	!
113	! Revision 1.1 1997/07/24 11:24:14 raasch
114	! Initial revision
115	!
116	!
117	! Description:
118	! ------------
119	! See below.
120	!------------------------------------------------------------------------------!
121
122	!--------------------------------------------------------------------------!
123	! poisfft !
124	! !
125	! Original version: Stephan Siano (pois3d) !
126	! !
127	! Institute of Meteorology and Climatology, University of Hannover !
128	! Germany !
129	! !
130	! Version as of July 23,1996 !
131	! !
132	! !
133	! Version for parallel computers: Siegfried Raasch !
134	! !
135	! Version as of July 03,1997 !
136	! !
137	! Solves the Poisson equation with a 2D spectral method !
138	! d^2 p / dx^2 + d^2 p / dy^2 + d^2 p / dz^2 = s !
139	! !
140	! Input: !
141	! real ar contains in the (nnx,nny,nnz) elements, !
142	! starting from the element (1,nys,nxl), the !
143	! values for s !
144	! real work Temporary array !
145	! !
146	! Output: !
147	! real ar contains the solution for p !
148	!--------------------------------------------------------------------------!
149
150	USE fft_xy
151	USE indices
152	USE transpose_indices
153
154	IMPLICIT NONE
155
156	PRIVATE
157
158	#if ! defined ( __check )
159	PUBLIC poisfft, poisfft_init
160
161	INTERFACE poisfft
162	MODULE PROCEDURE poisfft
163	END INTERFACE poisfft
164
165	INTERFACE poisfft_init
166	MODULE PROCEDURE poisfft_init
167	END INTERFACE poisfft_init
168	#else
169	PUBLIC poisfft_init
170
171	INTERFACE poisfft_init
172	MODULE PROCEDURE poisfft_init
173	END INTERFACE poisfft_init
174	#endif
175
176	CONTAINS
177
178	SUBROUTINE poisfft_init
179
180	CALL fft_init
181
182	END SUBROUTINE poisfft_init
183
184	#if ! defined ( __check )
185	SUBROUTINE poisfft( ar, work )
186
187	USE cpulog
188	USE interfaces
189	USE pegrid
190
191	IMPLICIT NONE
192
193	REAL, DIMENSION(1:nz,nys:nyn,nxl:nxr) :: ar, work
194
195
196	CALL cpu_log( log_point_s(3), 'poisfft', 'start' )
197
198	!
199	!-- Two-dimensional Fourier Transformation in x- and y-direction.
200	#if defined( __parallel )
201	IF ( pdims(2) == 1 ) THEN
202
203	!
204	!-- 1d-domain-decomposition along x:
205	!-- FFT along y and transposition y --> x
206	CALL ffty_tr_yx( ar, work, ar )
207
208	!
209	!-- FFT along x, solving the tridiagonal system and backward FFT
210	CALL fftx_tri_fftx( ar )
211
212	!
213	!-- Transposition x --> y and backward FFT along y
214	CALL tr_xy_ffty( ar, work, ar )
215
216	ELSEIF ( pdims(1) == 1 ) THEN
217
218	!
219	!-- 1d-domain-decomposition along y:
220	!-- FFT along x and transposition x --> y
221	CALL fftx_tr_xy( ar, work, ar )
222
223	!
224	!-- FFT along y, solving the tridiagonal system and backward FFT
225	CALL ffty_tri_ffty( ar )
226
227	!
228	!-- Transposition y --> x and backward FFT along x
229	CALL tr_yx_fftx( ar, work, ar )
230
231	ELSE
232
233	!
234	!-- 2d-domain-decomposition
235	!-- Transposition z --> x
236	CALL cpu_log( log_point_s(5), 'transpo forward', 'start' )
237	CALL transpose_zx( ar, work, ar )
238	CALL cpu_log( log_point_s(5), 'transpo forward', 'pause' )
239
240	CALL cpu_log( log_point_s(4), 'fft_x', 'start' )
241	CALL fft_x( ar, 'forward' )
242	CALL cpu_log( log_point_s(4), 'fft_x', 'pause' )
243
244	!
245	!-- Transposition x --> y
246	CALL cpu_log( log_point_s(5), 'transpo forward', 'continue' )
247	CALL transpose_xy( ar, work, ar )
248	CALL cpu_log( log_point_s(5), 'transpo forward', 'pause' )
249
250	CALL cpu_log( log_point_s(7), 'fft_y', 'start' )
251	CALL fft_y( ar, 'forward' )
252	CALL cpu_log( log_point_s(7), 'fft_y', 'pause' )
253
254	!
255	!-- Transposition y --> z
256	CALL cpu_log( log_point_s(5), 'transpo forward', 'continue' )
257	CALL transpose_yz( ar, work, ar )
258	CALL cpu_log( log_point_s(5), 'transpo forward', 'stop' )
259
260	!
261	!-- Solve the tridiagonal equation system along z
262	CALL cpu_log( log_point_s(6), 'tridia', 'start' )
263	CALL tridia( ar )
264	CALL cpu_log( log_point_s(6), 'tridia', 'stop' )
265
266	!
267	!-- Inverse Fourier Transformation
268	!-- Transposition z --> y
269	CALL cpu_log( log_point_s(8), 'transpo invers', 'start' )
270	CALL transpose_zy( ar, work, ar )
271	CALL cpu_log( log_point_s(8), 'transpo invers', 'pause' )
272
273	CALL cpu_log( log_point_s(7), 'fft_y', 'continue' )
274	CALL fft_y( ar, 'backward' )
275	CALL cpu_log( log_point_s(7), 'fft_y', 'stop' )
276
277	!
278	!-- Transposition y --> x
279	CALL cpu_log( log_point_s(8), 'transpo invers', 'continue' )
280	CALL transpose_yx( ar, work, ar )
281	CALL cpu_log( log_point_s(8), 'transpo invers', 'pause' )
282
283	CALL cpu_log( log_point_s(4), 'fft_x', 'continue' )
284	CALL fft_x( ar, 'backward' )
285	CALL cpu_log( log_point_s(4), 'fft_x', 'stop' )
286
287	!
288	!-- Transposition x --> z
289	CALL cpu_log( log_point_s(8), 'transpo invers', 'continue' )
290	CALL transpose_xz( ar, work, ar )
291	CALL cpu_log( log_point_s(8), 'transpo invers', 'stop' )
292
293	ENDIF
294
295	#else
296
297	!
298	!-- Two-dimensional Fourier Transformation along x- and y-direction.
299	CALL cpu_log( log_point_s(5), 'transpo forward', 'start' )
300	!$acc data copyin( ar, work )
301	CALL transpose_zx( ar, work, ar )
302	!$acc update host( ar )
303	CALL cpu_log( log_point_s(5), 'transpo forward', 'pause' )
304
305
306	CALL cpu_log( log_point_s(4), 'fft_x', 'start' )
307	CALL fft_x( ar, 'forward' )
308	CALL cpu_log( log_point_s(4), 'fft_x', 'pause' )
309
310	CALL cpu_log( log_point_s(5), 'transpo forward', 'continue' )
311	CALL transpose_xy( ar, work, ar )
312	CALL cpu_log( log_point_s(5), 'transpo forward', 'pause' )
313
314	CALL cpu_log( log_point_s(7), 'fft_y', 'start' )
315	CALL fft_y( ar, 'forward' )
316	CALL cpu_log( log_point_s(7), 'fft_y', 'pause' )
317
318	!
319	!-- Solve the tridiagonal equation system along z
320	CALL cpu_log( log_point_s(5), 'transpo forward', 'continue' )
321	CALL transpose_yz( ar, work, ar )
322	CALL cpu_log( log_point_s(5), 'transpo forward', 'stop' )
323
324	CALL cpu_log( log_point_s(6), 'tridia', 'start' )
325	CALL tridia( ar )
326	CALL cpu_log( log_point_s(6), 'tridia', 'stop' )
327
328	CALL cpu_log( log_point_s(8), 'transpo invers', 'start' )
329	CALL transpose_zy( ar, work, ar )
330	CALL cpu_log( log_point_s(8), 'transpo invers', 'pause' )
331
332	!
333	!-- Inverse Fourier Transformation.
334	CALL cpu_log( log_point_s(7), 'fft_y', 'continue' )
335	CALL fft_y( ar, 'backward' )
336	CALL cpu_log( log_point_s(7), 'fft_y', 'stop' )
337
338	CALL cpu_log( log_point_s(8), 'transpo invers', 'continue' )
339	CALL transpose_yx( ar, work, ar )
340	CALL cpu_log( log_point_s(8), 'transpo invers', 'pause' )
341
342	CALL cpu_log( log_point_s(4), 'fft_x', 'continue' )
343	CALL fft_x( ar, 'backward' )
344	CALL cpu_log( log_point_s(4), 'fft_x', 'stop' )
345
346	CALL cpu_log( log_point_s(8), 'transpo invers', 'continue' )
347	CALL transpose_xz( ar, work, ar )
348	CALL cpu_log( log_point_s(8), 'transpo invers', 'stop' )
349
350	!$acc end data
351
352	#endif
353
354	CALL cpu_log( log_point_s(3), 'poisfft', 'stop' )
355
356	END SUBROUTINE poisfft
357
358
359
360	SUBROUTINE tridia( ar )
361
362	!------------------------------------------------------------------------------!
363	! solves the linear system of equations:
364	!
365	! -(4 pi^2(i^2/(dx^2nnx^2)+j^2/(dy^2nny^2))+
366	! 1/(dzu(k)dzw(k))+1/(dzu(k-1)dzw(k)))*p(i,j,k)+
367	! 1/(dzu(k)dzw(k))p(i,j,k+1)+1/(dzu(k-1)dzw(k))p(i,j,k-1)=d(i,j,k)
368	!
369	! by using the Thomas algorithm
370	!------------------------------------------------------------------------------!
371
372	USE arrays_3d
373
374	IMPLICIT NONE
375
376	INTEGER :: i, j, k, nnyh
377
378	REAL, DIMENSION(nxl_z:nxr_z,0:nz-1) :: ar1
379	REAL, DIMENSION(5,nxl_z:nxr_z,0:nz-1) :: tri
380
381	REAL :: ar(nxl_z:nxr_z,nys_z:nyn_z,1:nz)
382
383
384	nnyh = (ny+1) / 2
385
386	!
387	!-- Define constant elements of the tridiagonal matrix.
388	!$OMP PARALLEL PRIVATE ( k, i )
389	!$OMP DO
390	DO k = 0, nz-1
391	DO i = nxl_z, nxr_z
392	tri(2,i,k) = ddzu_pres(k+1) * ddzw(k+1)
393	tri(3,i,k) = ddzu_pres(k+2) * ddzw(k+1)
394	ENDDO
395	ENDDO
396	!$OMP END PARALLEL
397
398	#if defined( __parallel )
399	!
400	!-- Repeat for all y-levels.
401	!$OMP PARALLEL FIRSTPRIVATE( tri ) PRIVATE ( ar1, j )
402	!$OMP DO
403	DO j = nys_z, nyn_z
404	IF ( j <= nnyh ) THEN
405	CALL maketri( j )
406	ELSE
407	CALL maketri( ny+1-j )
408	ENDIF
409	CALL split
410	CALL substi( j )
411	ENDDO
412	!$OMP END PARALLEL
413	#else
414	!
415	!-- First y-level.
416	CALL maketri( nys_z )
417	CALL split
418	CALL substi( 0 )
419
420	!
421	!-- Further y-levels.
422	DO j = 1, nnyh - 1
423	CALL maketri( j )
424	CALL split
425	CALL substi( j )
426	CALL substi( ny+1-j )
427	ENDDO
428	CALL maketri( nnyh )
429	CALL split
430	CALL substi( nnyh+nys )
431	#endif
432
433	CONTAINS
434
435	SUBROUTINE maketri( j )
436
437	!------------------------------------------------------------------------------!
438	! Computes the i- and j-dependent component of the matrix
439	!------------------------------------------------------------------------------!
440
441	USE arrays_3d
442	USE constants
443	USE control_parameters
444	USE grid_variables
445
446	IMPLICIT NONE
447
448	INTEGER :: i, j, k, nnxh
449	REAL :: a, c
450	REAL :: ll(nxl_z:nxr_z)
451
452
453	nnxh = ( nx + 1 ) / 2
454
455	!
456	!-- Provide the tridiagonal matrix for solution of the Poisson equation in
457	!-- Fourier space. The coefficients are computed following the method of
458	!-- Schmidt et al. (DFVLR-Mitteilung 84-15), which departs from Stephan
459	!-- Siano's original version by discretizing the Poisson equation,
460	!-- before it is Fourier-transformed
461	#if defined( __parallel )
462	DO i = nxl_z, nxr_z
463	IF ( i >= 0 .AND. i <= nnxh ) THEN
464	ll(i) = 2.0 * ( 1.0 - COS( ( 2.0 * pi * i ) / &
465	REAL( nx+1 ) ) ) / ( dx * dx ) + &
466	2.0 * ( 1.0 - COS( ( 2.0 * pi * j ) / &
467	REAL( ny+1 ) ) ) / ( dy * dy )
468	ELSE
469	ll(i) = 2.0 * ( 1.0 - COS( ( 2.0 * pi * ( nx+1-i ) ) / &
470	REAL( nx+1 ) ) ) / ( dx * dx ) + &
471	2.0 * ( 1.0 - COS( ( 2.0 * pi * j ) / &
472	REAL( ny+1 ) ) ) / ( dy * dy )
473	ENDIF
474	DO k = 0,nz-1
475	a = -1.0 * ddzu_pres(k+2) * ddzw(k+1)
476	c = -1.0 * ddzu_pres(k+1) * ddzw(k+1)
477	tri(1,i,k) = a + c - ll(i)
478	ENDDO
479	ENDDO
480	#else
481	DO i = 0, nnxh
482	ll(i) = 2.0 * ( 1.0 - COS( ( 2.0 * pi * i ) / REAL( nx+1 ) ) ) / &
483	( dx * dx ) + &
484	2.0 * ( 1.0 - COS( ( 2.0 * pi * j ) / REAL( ny+1 ) ) ) / &
485	( dy * dy )
486	DO k = 0, nz-1
487	a = -1.0 * ddzu_pres(k+2) * ddzw(k+1)
488	c = -1.0 * ddzu_pres(k+1) * ddzw(k+1)
489	tri(1,i,k) = a + c - ll(i)
490	IF ( i >= 1 .and. i < nnxh ) THEN
491	tri(1,nx+1-i,k) = tri(1,i,k)
492	ENDIF
493	ENDDO
494	ENDDO
495	#endif
496	IF ( ibc_p_b == 1 .OR. ibc_p_b == 2 ) THEN
497	DO i = nxl_z, nxr_z
498	tri(1,i,0) = tri(1,i,0) + tri(2,i,0)
499	ENDDO
500	ENDIF
501	IF ( ibc_p_t == 1 ) THEN
502	DO i = nxl_z, nxr_z
503	tri(1,i,nz-1) = tri(1,i,nz-1) + tri(3,i,nz-1)
504	ENDDO
505	ENDIF
506
507	END SUBROUTINE maketri
508
509
510	SUBROUTINE substi( j )
511
512	!------------------------------------------------------------------------------!
513	! Substitution (Forward and Backward) (Thomas algorithm)
514	!------------------------------------------------------------------------------!
515
516	USE control_parameters
517
518	IMPLICIT NONE
519
520	INTEGER :: i, j, k
521
522	!
523	!-- Forward substitution.
524	DO i = nxl_z, nxr_z
525	ar1(i,0) = ar(i,j,1)
526	ENDDO
527	DO k = 1, nz - 1
528	DO i = nxl_z, nxr_z
529	ar1(i,k) = ar(i,j,k+1) - tri(5,i,k) * ar1(i,k-1)
530	ENDDO
531	ENDDO
532
533	!
534	!-- Backward substitution
535	!-- Note, the 1.0E-20 in the denominator is due to avoid divisions
536	!-- by zero appearing if the pressure bc is set to neumann at the top of
537	!-- the model domain.
538	DO i = nxl_z, nxr_z
539	ar(i,j,nz) = ar1(i,nz-1) / ( tri(4,i,nz-1) + 1.0E-20 )
540	ENDDO
541	DO k = nz-2, 0, -1
542	DO i = nxl_z, nxr_z
543	ar(i,j,k+1) = ( ar1(i,k) - tri(3,i,k) * ar(i,j,k+2) ) &
544	/ tri(4,i,k)
545	ENDDO
546	ENDDO
547
548	!
549	!-- Indices i=0, j=0 correspond to horizontally averaged pressure.
550	!-- The respective values of ar should be zero at all k-levels if
551	!-- acceleration of horizontally averaged vertical velocity is zero.
552	IF ( ibc_p_b == 1 .AND. ibc_p_t == 1 ) THEN
553	IF ( j == 0 .AND. nxl_z == 0 ) THEN
554	DO k = 1, nz
555	ar(nxl_z,j,k) = 0.0
556	ENDDO
557	ENDIF
558	ENDIF
559
560	END SUBROUTINE substi
561
562
563	SUBROUTINE split
564
565	!------------------------------------------------------------------------------!
566	! Splitting of the tridiagonal matrix (Thomas algorithm)
567	!------------------------------------------------------------------------------!
568
569	IMPLICIT NONE
570
571	INTEGER :: i, k
572
573	!
574	!-- Splitting.
575	DO i = nxl_z, nxr_z
576	tri(4,i,0) = tri(1,i,0)
577	ENDDO
578	DO k = 1, nz-1
579	DO i = nxl_z, nxr_z
580	tri(5,i,k) = tri(2,i,k) / tri(4,i,k-1)
581	tri(4,i,k) = tri(1,i,k) - tri(3,i,k-1) * tri(5,i,k)
582	ENDDO
583	ENDDO
584
585	END SUBROUTINE split
586
587	END SUBROUTINE tridia
588
589
590	#if defined( __parallel )
591	SUBROUTINE ffty_tr_yx( f_in, work, f_out )
592
593	!------------------------------------------------------------------------------!
594	! Fourier-transformation along y with subsequent transposition y --> x for
595	! a 1d-decomposition along x
596	!
597	! ATTENTION: The performance of this routine is much faster on the NEC-SX6,
598	! if the first index of work_ffty_vec is odd. Otherwise
599	! memory bank conflicts may occur (especially if the index is a
600	! multiple of 128). That's why work_ffty_vec is dimensioned as
601	! 0:ny+1.
602	! Of course, this will not work if users are using an odd number
603	! of gridpoints along y.
604	!------------------------------------------------------------------------------!
605
606	USE control_parameters
607	USE cpulog
608	USE indices
609	USE interfaces
610	USE pegrid
611	USE transpose_indices
612
613	IMPLICIT NONE
614
615	INTEGER :: i, iend, iouter, ir, j, k
616	INTEGER, PARAMETER :: stridex = 4
617
618	REAL, DIMENSION(0:ny,stridex) :: work_ffty
619	#if defined( __nec )
620	REAL, DIMENSION(0:ny+1,1:nz,nxl:nxr) :: work_ffty_vec
621	#endif
622	REAL, DIMENSION(1:nz,0:ny,nxl:nxr) :: f_in
623	REAL, DIMENSION(nnx,1:nz,nys_x:nyn_x,pdims(1)) :: f_out
624	REAL, DIMENSION(nxl:nxr,1:nz,0:ny) :: work
625
626	!
627	!-- Carry out the FFT along y, where all data are present due to the
628	!-- 1d-decomposition along x. Resort the data in a way that x becomes
629	!-- the first index.
630	CALL cpu_log( log_point_s(7), 'fft_y_1d', 'start' )
631
632	IF ( host(1:3) == 'nec' ) THEN
633	#if defined( __nec )
634	!
635	!-- Code optimized for vector processors
636	!$OMP PARALLEL PRIVATE ( i, j, k )
637	!$OMP DO
638	DO i = nxl, nxr
639
640	DO j = 0, ny
641	DO k = 1, nz
642	work_ffty_vec(j,k,i) = f_in(k,j,i)
643	ENDDO
644	ENDDO
645
646	CALL fft_y_m( work_ffty_vec(:,:,i), ny+1, 'forward' )
647
648	ENDDO
649
650	!$OMP DO
651	DO k = 1, nz
652	DO j = 0, ny
653	DO i = nxl, nxr
654	work(i,k,j) = work_ffty_vec(j,k,i)
655	ENDDO
656	ENDDO
657	ENDDO
658	!$OMP END PARALLEL
659	#endif
660
661	ELSE
662
663	!
664	!-- Cache optimized code.
665	!-- The i-(x-)direction is split into a strided outer loop and an inner
666	!-- loop for better cache performance
667	!$OMP PARALLEL PRIVATE (i,iend,iouter,ir,j,k,work_ffty)
668	!$OMP DO
669	DO iouter = nxl, nxr, stridex
670
671	iend = MIN( iouter+stridex-1, nxr ) ! Upper bound for inner i loop
672
673	DO k = 1, nz
674
675	DO i = iouter, iend
676
677	ir = i-iouter+1 ! counter within a stride
678	DO j = 0, ny
679	work_ffty(j,ir) = f_in(k,j,i)
680	ENDDO
681	!
682	!-- FFT along y
683	CALL fft_y_1d( work_ffty(:,ir), 'forward' )
684
685	ENDDO
686
687	!
688	!-- Resort
689	DO j = 0, ny
690	DO i = iouter, iend
691	work(i,k,j) = work_ffty(j,i-iouter+1)
692	ENDDO
693	ENDDO
694
695	ENDDO
696
697	ENDDO
698	!$OMP END PARALLEL
699
700	ENDIF
701	CALL cpu_log( log_point_s(7), 'fft_y_1d', 'pause' )
702
703	!
704	!-- Transpose array
705	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
706	IF ( collective_wait ) CALL MPI_BARRIER( comm2d, ierr )
707	CALL MPI_ALLTOALL( work(nxl,1,0), sendrecvcount_xy, MPI_REAL, &
708	f_out(1,1,nys_x,1), sendrecvcount_xy, MPI_REAL, &
709	comm1dx, ierr )
710	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
711
712	END SUBROUTINE ffty_tr_yx
713
714
715	SUBROUTINE tr_xy_ffty( f_in, work, f_out )
716
717	!------------------------------------------------------------------------------!
718	! Transposition x --> y with a subsequent backward Fourier transformation for
719	! a 1d-decomposition along x
720	!------------------------------------------------------------------------------!
721
722	USE control_parameters
723	USE cpulog
724	USE indices
725	USE interfaces
726	USE pegrid
727	USE transpose_indices
728
729	IMPLICIT NONE
730
731	INTEGER :: i, iend, iouter, ir, j, k
732	INTEGER, PARAMETER :: stridex = 4
733
734	REAL, DIMENSION(0:ny,stridex) :: work_ffty
735	#if defined( __nec )
736	REAL, DIMENSION(0:ny+1,1:nz,nxl:nxr) :: work_ffty_vec
737	#endif
738	REAL, DIMENSION(nnx,1:nz,nys_x:nyn_x,pdims(1)) :: f_in
739	REAL, DIMENSION(1:nz,0:ny,nxl:nxr) :: f_out
740	REAL, DIMENSION(nxl:nxr,1:nz,0:ny) :: work
741
742	!
743	!-- Transpose array
744	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
745	IF ( collective_wait ) CALL MPI_BARRIER( comm2d, ierr )
746	CALL MPI_ALLTOALL( f_in(1,1,nys_x,1), sendrecvcount_xy, MPI_REAL, &
747	work(nxl,1,0), sendrecvcount_xy, MPI_REAL, &
748	comm1dx, ierr )
749	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
750
751	!
752	!-- Resort the data in a way that y becomes the first index and carry out the
753	!-- backward fft along y.
754	CALL cpu_log( log_point_s(7), 'fft_y_1d', 'continue' )
755
756	IF ( host(1:3) == 'nec' ) THEN
757	#if defined( __nec )
758	!
759	!-- Code optimized for vector processors
760	!$OMP PARALLEL PRIVATE ( i, j, k )
761	!$OMP DO
762	DO k = 1, nz
763	DO j = 0, ny
764	DO i = nxl, nxr
765	work_ffty_vec(j,k,i) = work(i,k,j)
766	ENDDO
767	ENDDO
768	ENDDO
769
770	!$OMP DO
771	DO i = nxl, nxr
772
773	CALL fft_y_m( work_ffty_vec(:,:,i), ny+1, 'backward' )
774
775	DO j = 0, ny
776	DO k = 1, nz
777	f_out(k,j,i) = work_ffty_vec(j,k,i)
778	ENDDO
779	ENDDO
780
781	ENDDO
782	!$OMP END PARALLEL
783	#endif
784
785	ELSE
786
787	!
788	!-- Cache optimized code.
789	!-- The i-(x-)direction is split into a strided outer loop and an inner
790	!-- loop for better cache performance
791	!$OMP PARALLEL PRIVATE ( i, iend, iouter, ir, j, k, work_ffty )
792	!$OMP DO
793	DO iouter = nxl, nxr, stridex
794
795	iend = MIN( iouter+stridex-1, nxr ) ! Upper bound for inner i loop
796
797	DO k = 1, nz
798	!
799	!-- Resort
800	DO j = 0, ny
801	DO i = iouter, iend
802	work_ffty(j,i-iouter+1) = work(i,k,j)
803	ENDDO
804	ENDDO
805
806	DO i = iouter, iend
807
808	!
809	!-- FFT along y
810	ir = i-iouter+1 ! counter within a stride
811	CALL fft_y_1d( work_ffty(:,ir), 'backward' )
812
813	DO j = 0, ny
814	f_out(k,j,i) = work_ffty(j,ir)
815	ENDDO
816	ENDDO
817
818	ENDDO
819
820	ENDDO
821	!$OMP END PARALLEL
822
823	ENDIF
824
825	CALL cpu_log( log_point_s(7), 'fft_y_1d', 'stop' )
826
827	END SUBROUTINE tr_xy_ffty
828
829
830	SUBROUTINE fftx_tri_fftx( ar )
831
832	!------------------------------------------------------------------------------!
833	! FFT along x, solution of the tridiagonal system and backward FFT for
834	! a 1d-decomposition along x
835	!
836	! WARNING: this subroutine may still not work for hybrid parallelization
837	! with OpenMP (for possible necessary changes see the original
838	! routine poisfft_hybrid, developed by Klaus Ketelsen, May 2002)
839	!------------------------------------------------------------------------------!
840
841	USE control_parameters
842	USE cpulog
843	USE grid_variables
844	USE indices
845	USE interfaces
846	USE pegrid
847	USE transpose_indices
848
849	IMPLICIT NONE
850
851	INTEGER :: i, j, k, m, n, omp_get_thread_num, tn
852
853	REAL, DIMENSION(0:nx) :: work_fftx
854	REAL, DIMENSION(0:nx,1:nz) :: work_trix
855	REAL, DIMENSION(nnx,1:nz,nys_x:nyn_x,pdims(1)) :: ar
856	REAL, DIMENSION(:,:,:,:), ALLOCATABLE :: tri
857
858
859	CALL cpu_log( log_point_s(33), 'fft_x_1d + tridia', 'start' )
860
861	ALLOCATE( tri(5,0:nx,0:nz-1,0:threads_per_task-1) )
862
863	tn = 0 ! Default thread number in case of one thread
864	!$OMP PARALLEL DO PRIVATE ( i, j, k, m, n, tn, work_fftx, work_trix )
865	DO j = nys_x, nyn_x
866
867	!$ tn = omp_get_thread_num()
868
869	IF ( host(1:3) == 'nec' ) THEN
870	!
871	!-- Code optimized for vector processors
872	DO k = 1, nz
873
874	m = 0
875	DO n = 1, pdims(1)
876	DO i = 1, nnx
877	work_trix(m,k) = ar(i,k,j,n)
878	m = m + 1
879	ENDDO
880	ENDDO
881
882	ENDDO
883
884	CALL fft_x_m( work_trix, 'forward' )
885
886	ELSE
887	!
888	!-- Cache optimized code
889	DO k = 1, nz
890
891	m = 0
892	DO n = 1, pdims(1)
893	DO i = 1, nnx
894	work_fftx(m) = ar(i,k,j,n)
895	m = m + 1
896	ENDDO
897	ENDDO
898
899	CALL fft_x_1d( work_fftx, 'forward' )
900
901	DO i = 0, nx
902	work_trix(i,k) = work_fftx(i)
903	ENDDO
904
905	ENDDO
906
907	ENDIF
908
909	!
910	!-- Solve the linear equation system
911	CALL tridia_1dd( ddx2, ddy2, nx, ny, j, work_trix, tri(:,:,:,tn) )
912
913	IF ( host(1:3) == 'nec' ) THEN
914	!
915	!-- Code optimized for vector processors
916	CALL fft_x_m( work_trix, 'backward' )
917
918	DO k = 1, nz
919
920	m = 0
921	DO n = 1, pdims(1)
922	DO i = 1, nnx
923	ar(i,k,j,n) = work_trix(m,k)
924	m = m + 1
925	ENDDO
926	ENDDO
927
928	ENDDO
929
930	ELSE
931	!
932	!-- Cache optimized code
933	DO k = 1, nz
934
935	DO i = 0, nx
936	work_fftx(i) = work_trix(i,k)
937	ENDDO
938
939	CALL fft_x_1d( work_fftx, 'backward' )
940
941	m = 0
942	DO n = 1, pdims(1)
943	DO i = 1, nnx
944	ar(i,k,j,n) = work_fftx(m)
945	m = m + 1
946	ENDDO
947	ENDDO
948
949	ENDDO
950
951	ENDIF
952
953	ENDDO
954
955	DEALLOCATE( tri )
956
957	CALL cpu_log( log_point_s(33), 'fft_x_1d + tridia', 'stop' )
958
959	END SUBROUTINE fftx_tri_fftx
960
961
962	SUBROUTINE fftx_tr_xy( f_in, work, f_out )
963
964	!------------------------------------------------------------------------------!
965	! Fourier-transformation along x with subsequent transposition x --> y for
966	! a 1d-decomposition along y
967	!
968	! ATTENTION: The NEC-branch of this routine may significantly profit from
969	! further optimizations. So far, performance is much worse than
970	! for routine ffty_tr_yx (more than three times slower).
971	!------------------------------------------------------------------------------!
972
973	USE control_parameters
974	USE cpulog
975	USE indices
976	USE interfaces
977	USE pegrid
978	USE transpose_indices
979
980	IMPLICIT NONE
981
982	INTEGER :: i, j, k
983
984	REAL, DIMENSION(0:nx,1:nz,nys:nyn) :: work_fftx
985	REAL, DIMENSION(1:nz,nys:nyn,0:nx) :: f_in
986	REAL, DIMENSION(nny,1:nz,nxl_y:nxr_y,pdims(2)) :: f_out
987	REAL, DIMENSION(nys:nyn,1:nz,0:nx) :: work
988
989	!
990	!-- Carry out the FFT along x, where all data are present due to the
991	!-- 1d-decomposition along y. Resort the data in a way that y becomes
992	!-- the first index.
993	CALL cpu_log( log_point_s(4), 'fft_x_1d', 'start' )
994
995	IF ( host(1:3) == 'nec' ) THEN
996	!
997	!-- Code for vector processors
998	!$OMP PARALLEL PRIVATE ( i, j, k )
999	!$OMP DO
1000	DO i = 0, nx
1001
1002	DO j = nys, nyn
1003	DO k = 1, nz
1004	work_fftx(i,k,j) = f_in(k,j,i)
1005	ENDDO
1006	ENDDO
1007
1008	ENDDO
1009
1010	!$OMP DO
1011	DO j = nys, nyn
1012
1013	CALL fft_x_m( work_fftx(:,:,j), 'forward' )
1014
1015	DO k = 1, nz
1016	DO i = 0, nx
1017	work(j,k,i) = work_fftx(i,k,j)
1018	ENDDO
1019	ENDDO
1020
1021	ENDDO
1022	!$OMP END PARALLEL
1023
1024	ELSE
1025
1026	!
1027	!-- Cache optimized code (there might be still a potential for better
1028	!-- optimization).
1029	!$OMP PARALLEL PRIVATE (i,j,k)
1030	!$OMP DO
1031	DO i = 0, nx
1032
1033	DO j = nys, nyn
1034	DO k = 1, nz
1035	work_fftx(i,k,j) = f_in(k,j,i)
1036	ENDDO
1037	ENDDO
1038
1039	ENDDO
1040
1041	!$OMP DO
1042	DO j = nys, nyn
1043	DO k = 1, nz
1044
1045	CALL fft_x_1d( work_fftx(0:nx,k,j), 'forward' )
1046
1047	DO i = 0, nx
1048	work(j,k,i) = work_fftx(i,k,j)
1049	ENDDO
1050	ENDDO
1051
1052	ENDDO
1053	!$OMP END PARALLEL
1054
1055	ENDIF
1056	CALL cpu_log( log_point_s(4), 'fft_x_1d', 'pause' )
1057
1058	!
1059	!-- Transpose array
1060	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
1061	IF ( collective_wait ) CALL MPI_BARRIER( comm2d, ierr )
1062	CALL MPI_ALLTOALL( work(nys,1,0), sendrecvcount_xy, MPI_REAL, &
1063	f_out(1,1,nxl_y,1), sendrecvcount_xy, MPI_REAL, &
1064	comm1dy, ierr )
1065	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
1066
1067	END SUBROUTINE fftx_tr_xy
1068
1069
1070	SUBROUTINE tr_yx_fftx( f_in, work, f_out )
1071
1072	!------------------------------------------------------------------------------!
1073	! Transposition y --> x with a subsequent backward Fourier transformation for
1074	! a 1d-decomposition along x
1075	!------------------------------------------------------------------------------!
1076
1077	USE control_parameters
1078	USE cpulog
1079	USE indices
1080	USE interfaces
1081	USE pegrid
1082	USE transpose_indices
1083
1084	IMPLICIT NONE
1085
1086	INTEGER :: i, j, k
1087
1088	REAL, DIMENSION(0:nx,1:nz,nys:nyn) :: work_fftx
1089	REAL, DIMENSION(nny,1:nz,nxl_y:nxr_y,pdims(2)) :: f_in
1090	REAL, DIMENSION(1:nz,nys:nyn,0:nx) :: f_out
1091	REAL, DIMENSION(nys:nyn,1:nz,0:nx) :: work
1092
1093	!
1094	!-- Transpose array
1095	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'start' )
1096	IF ( collective_wait ) CALL MPI_BARRIER( comm2d, ierr )
1097	CALL MPI_ALLTOALL( f_in(1,1,nxl_y,1), sendrecvcount_xy, MPI_REAL, &
1098	work(nys,1,0), sendrecvcount_xy, MPI_REAL, &
1099	comm1dy, ierr )
1100	CALL cpu_log( log_point_s(32), 'mpi_alltoall', 'stop' )
1101
1102	!
1103	!-- Carry out the FFT along x, where all data are present due to the
1104	!-- 1d-decomposition along y. Resort the data in a way that y becomes
1105	!-- the first index.
1106	CALL cpu_log( log_point_s(4), 'fft_x_1d', 'continue' )
1107
1108	IF ( host(1:3) == 'nec' ) THEN
1109	!
1110	!-- Code optimized for vector processors
1111	!$OMP PARALLEL PRIVATE ( i, j, k )
1112	!$OMP DO
1113	DO j = nys, nyn
1114
1115	DO k = 1, nz
1116	DO i = 0, nx
1117	work_fftx(i,k,j) = work(j,k,i)
1118	ENDDO
1119	ENDDO
1120
1121	CALL fft_x_m( work_fftx(:,:,j), 'backward' )
1122
1123	ENDDO
1124
1125	!$OMP DO
1126	DO i = 0, nx
1127	DO j = nys, nyn
1128	DO k = 1, nz
1129	f_out(k,j,i) = work_fftx(i,k,j)
1130	ENDDO
1131	ENDDO
1132	ENDDO
1133	!$OMP END PARALLEL
1134
1135	ELSE
1136
1137	!
1138	!-- Cache optimized code (there might be still a potential for better
1139	!-- optimization).
1140	!$OMP PARALLEL PRIVATE (i,j,k)
1141	!$OMP DO
1142	DO j = nys, nyn
1143	DO k = 1, nz
1144
1145	DO i = 0, nx
1146	work_fftx(i,k,j) = work(j,k,i)
1147	ENDDO
1148
1149	CALL fft_x_1d( work_fftx(0:nx,k,j), 'backward' )
1150
1151	ENDDO
1152	ENDDO
1153
1154	!$OMP DO
1155	DO i = 0, nx
1156	DO j = nys, nyn
1157	DO k = 1, nz
1158	f_out(k,j,i) = work_fftx(i,k,j)
1159	ENDDO
1160	ENDDO
1161	ENDDO
1162	!$OMP END PARALLEL
1163
1164	ENDIF
1165	CALL cpu_log( log_point_s(4), 'fft_x_1d', 'stop' )
1166
1167	END SUBROUTINE tr_yx_fftx
1168
1169
1170	SUBROUTINE ffty_tri_ffty( ar )
1171
1172	!------------------------------------------------------------------------------!
1173	! FFT along y, solution of the tridiagonal system and backward FFT for
1174	! a 1d-decomposition along y
1175	!
1176	! WARNING: this subroutine may still not work for hybrid parallelization
1177	! with OpenMP (for possible necessary changes see the original
1178	! routine poisfft_hybrid, developed by Klaus Ketelsen, May 2002)
1179	!------------------------------------------------------------------------------!
1180
1181	USE control_parameters
1182	USE cpulog
1183	USE grid_variables
1184	USE indices
1185	USE interfaces
1186	USE pegrid
1187	USE transpose_indices
1188
1189	IMPLICIT NONE
1190
1191	INTEGER :: i, j, k, m, n, omp_get_thread_num, tn
1192
1193	REAL, DIMENSION(0:ny) :: work_ffty
1194	REAL, DIMENSION(0:ny,1:nz) :: work_triy
1195	REAL, DIMENSION(nny,1:nz,nxl_y:nxr_y,pdims(2)) :: ar
1196	REAL, DIMENSION(:,:,:,:), ALLOCATABLE :: tri
1197
1198
1199	CALL cpu_log( log_point_s(39), 'fft_y_1d + tridia', 'start' )
1200
1201	ALLOCATE( tri(5,0:ny,0:nz-1,0:threads_per_task-1) )
1202
1203	tn = 0 ! Default thread number in case of one thread
1204	!$OMP PARALLEL DO PRIVATE ( i, j, k, m, n, tn, work_ffty, work_triy )
1205	DO i = nxl_y, nxr_y
1206
1207	!$ tn = omp_get_thread_num()
1208
1209	IF ( host(1:3) == 'nec' ) THEN
1210	!
1211	!-- Code optimized for vector processors
1212	DO k = 1, nz
1213
1214	m = 0
1215	DO n = 1, pdims(2)
1216	DO j = 1, nny
1217	work_triy(m,k) = ar(j,k,i,n)
1218	m = m + 1
1219	ENDDO
1220	ENDDO
1221
1222	ENDDO
1223
1224	CALL fft_y_m( work_triy, ny, 'forward' )
1225
1226	ELSE
1227	!
1228	!-- Cache optimized code
1229	DO k = 1, nz
1230
1231	m = 0
1232	DO n = 1, pdims(2)
1233	DO j = 1, nny
1234	work_ffty(m) = ar(j,k,i,n)
1235	m = m + 1
1236	ENDDO
1237	ENDDO
1238
1239	CALL fft_y_1d( work_ffty, 'forward' )
1240
1241	DO j = 0, ny
1242	work_triy(j,k) = work_ffty(j)
1243	ENDDO
1244
1245	ENDDO
1246
1247	ENDIF
1248
1249	!
1250	!-- Solve the linear equation system
1251	CALL tridia_1dd( ddy2, ddx2, ny, nx, i, work_triy, tri(:,:,:,tn) )
1252
1253	IF ( host(1:3) == 'nec' ) THEN
1254	!
1255	!-- Code optimized for vector processors
1256	CALL fft_y_m( work_triy, ny, 'backward' )
1257
1258	DO k = 1, nz
1259
1260	m = 0
1261	DO n = 1, pdims(2)
1262	DO j = 1, nny
1263	ar(j,k,i,n) = work_triy(m,k)
1264	m = m + 1
1265	ENDDO
1266	ENDDO
1267
1268	ENDDO
1269
1270	ELSE
1271	!
1272	!-- Cache optimized code
1273	DO k = 1, nz
1274
1275	DO j = 0, ny
1276	work_ffty(j) = work_triy(j,k)
1277	ENDDO
1278
1279	CALL fft_y_1d( work_ffty, 'backward' )
1280
1281	m = 0
1282	DO n = 1, pdims(2)
1283	DO j = 1, nny
1284	ar(j,k,i,n) = work_ffty(m)
1285	m = m + 1
1286	ENDDO
1287	ENDDO
1288
1289	ENDDO
1290
1291	ENDIF
1292
1293	ENDDO
1294
1295	DEALLOCATE( tri )
1296
1297	CALL cpu_log( log_point_s(39), 'fft_y_1d + tridia', 'stop' )
1298
1299	END SUBROUTINE ffty_tri_ffty
1300
1301
1302	SUBROUTINE tridia_1dd( ddx2, ddy2, nx, ny, j, ar, tri )
1303
1304	!------------------------------------------------------------------------------!
1305	! Solves the linear system of equations for a 1d-decomposition along x (see
1306	! tridia)
1307	!
1308	! Attention: when using the intel compilers older than 12.0, array tri must
1309	! be passed as an argument to the contained subroutines. Otherwise
1310	! addres faults will occur. This feature can be activated with
1311	! cpp-switch __intel11
1312	! On NEC, tri should not be passed (except for routine substi_1dd)
1313	! because this causes very bad performance.
1314	!------------------------------------------------------------------------------!
1315
1316	USE arrays_3d
1317	USE control_parameters
1318
1319	USE pegrid
1320
1321	IMPLICIT NONE
1322
1323	INTEGER :: i, j, k, nnyh, nx, ny, omp_get_thread_num, tn
1324
1325	REAL :: ddx2, ddy2
1326
1327	REAL, DIMENSION(0:nx,1:nz) :: ar
1328	REAL, DIMENSION(5,0:nx,0:nz-1) :: tri
1329
1330
1331	nnyh = ( ny + 1 ) / 2
1332
1333	!
1334	!-- Define constant elements of the tridiagonal matrix.
1335	!-- The compiler on SX6 does loop exchange. If 0:nx is a high power of 2,
1336	!-- the exchanged loops create bank conflicts. The following directive
1337	!-- prohibits loop exchange and the loops perform much better.
1338	! tn = omp_get_thread_num()
1339	! WRITE( 120+tn, * ) '+++ id=',myid,' nx=',nx,' thread=', omp_get_thread_num()
1340	! CALL local_flush( 120+tn )
1341	!CDIR NOLOOPCHG
1342	DO k = 0, nz-1
1343	DO i = 0,nx
1344	tri(2,i,k) = ddzu_pres(k+1) * ddzw(k+1)
1345	tri(3,i,k) = ddzu_pres(k+2) * ddzw(k+1)
1346	ENDDO
1347	ENDDO
1348	! WRITE( 120+tn, * ) '+++ id=',myid,' end of first tridia loop thread=', omp_get_thread_num()
1349	! CALL local_flush( 120+tn )
1350
1351	IF ( j <= nnyh ) THEN
1352	#if defined( __intel11 )
1353	CALL maketri_1dd( j, tri )
1354	#else
1355	CALL maketri_1dd( j )
1356	#endif
1357	ELSE
1358	#if defined( __intel11 )
1359	CALL maketri_1dd( ny+1-j, tri )
1360	#else
1361	CALL maketri_1dd( ny+1-j )
1362	#endif
1363	ENDIF
1364	#if defined( __intel11 )
1365	CALL split_1dd( tri )
1366	#else
1367	CALL split_1dd
1368	#endif
1369	CALL substi_1dd( ar, tri )
1370
1371	CONTAINS
1372
1373	#if defined( __intel11 )
1374	SUBROUTINE maketri_1dd( j, tri )
1375	#else
1376	SUBROUTINE maketri_1dd( j )
1377	#endif
1378
1379	!------------------------------------------------------------------------------!
1380	! computes the i- and j-dependent component of the matrix
1381	!------------------------------------------------------------------------------!
1382
1383	USE constants
1384
1385	IMPLICIT NONE
1386
1387	INTEGER :: i, j, k, nnxh
1388	REAL :: a, c
1389
1390	REAL, DIMENSION(0:nx) :: l
1391
1392	#if defined( __intel11 )
1393	REAL, DIMENSION(5,0:nx,0:nz-1) :: tri
1394	#endif
1395
1396
1397	nnxh = ( nx + 1 ) / 2
1398	!
1399	!-- Provide the tridiagonal matrix for solution of the Poisson equation in
1400	!-- Fourier space. The coefficients are computed following the method of
1401	!-- Schmidt et al. (DFVLR-Mitteilung 84-15), which departs from Stephan
1402	!-- Siano's original version by discretizing the Poisson equation,
1403	!-- before it is Fourier-transformed
1404	DO i = 0, nx
1405	IF ( i >= 0 .AND. i <= nnxh ) THEN
1406	l(i) = 2.0 * ( 1.0 - COS( ( 2.0 * pi * i ) / &
1407	REAL( nx+1 ) ) ) * ddx2 + &
1408	2.0 * ( 1.0 - COS( ( 2.0 * pi * j ) / &
1409	REAL( ny+1 ) ) ) * ddy2
1410	ELSE
1411	l(i) = 2.0 * ( 1.0 - COS( ( 2.0 * pi * ( nx+1-i ) ) / &
1412	REAL( nx+1 ) ) ) * ddx2 + &
1413	2.0 * ( 1.0 - COS( ( 2.0 * pi * j ) / &
1414	REAL( ny+1 ) ) ) * ddy2
1415	ENDIF
1416	ENDDO
1417
1418	DO k = 0, nz-1
1419	DO i = 0, nx
1420	a = -1.0 * ddzu_pres(k+2) * ddzw(k+1)
1421	c = -1.0 * ddzu_pres(k+1) * ddzw(k+1)
1422	tri(1,i,k) = a + c - l(i)
1423	ENDDO
1424	ENDDO
1425	IF ( ibc_p_b == 1 .OR. ibc_p_b == 2 ) THEN
1426	DO i = 0, nx
1427	tri(1,i,0) = tri(1,i,0) + tri(2,i,0)
1428	ENDDO
1429	ENDIF
1430	IF ( ibc_p_t == 1 ) THEN
1431	DO i = 0, nx
1432	tri(1,i,nz-1) = tri(1,i,nz-1) + tri(3,i,nz-1)
1433	ENDDO
1434	ENDIF
1435
1436	END SUBROUTINE maketri_1dd
1437
1438
1439	#if defined( __intel11 )
1440	SUBROUTINE split_1dd( tri )
1441	#else
1442	SUBROUTINE split_1dd
1443	#endif
1444
1445	!------------------------------------------------------------------------------!
1446	! Splitting of the tridiagonal matrix (Thomas algorithm)
1447	!------------------------------------------------------------------------------!
1448
1449	IMPLICIT NONE
1450
1451	INTEGER :: i, k
1452
1453	#if defined( __intel11 )
1454	REAL, DIMENSION(5,0:nx,0:nz-1) :: tri
1455	#endif
1456
1457
1458	!
1459	!-- Splitting
1460	DO i = 0, nx
1461	tri(4,i,0) = tri(1,i,0)
1462	ENDDO
1463	DO k = 1, nz-1
1464	DO i = 0, nx
1465	tri(5,i,k) = tri(2,i,k) / tri(4,i,k-1)
1466	tri(4,i,k) = tri(1,i,k) - tri(3,i,k-1) * tri(5,i,k)
1467	ENDDO
1468	ENDDO
1469
1470	END SUBROUTINE split_1dd
1471
1472
1473	SUBROUTINE substi_1dd( ar, tri )
1474
1475	!------------------------------------------------------------------------------!
1476	! Substitution (Forward and Backward) (Thomas algorithm)
1477	!------------------------------------------------------------------------------!
1478
1479	IMPLICIT NONE
1480
1481	INTEGER :: i, k
1482
1483	REAL, DIMENSION(0:nx,nz) :: ar
1484	REAL, DIMENSION(0:nx,0:nz-1) :: ar1
1485	REAL, DIMENSION(5,0:nx,0:nz-1) :: tri
1486
1487	!
1488	!-- Forward substitution
1489	DO i = 0, nx
1490	ar1(i,0) = ar(i,1)
1491	ENDDO
1492	DO k = 1, nz-1
1493	DO i = 0, nx
1494	ar1(i,k) = ar(i,k+1) - tri(5,i,k) * ar1(i,k-1)
1495	ENDDO
1496	ENDDO
1497
1498	!
1499	!-- Backward substitution
1500	!-- Note, the add of 1.0E-20 in the denominator is due to avoid divisions
1501	!-- by zero appearing if the pressure bc is set to neumann at the top of
1502	!-- the model domain.
1503	DO i = 0, nx
1504	ar(i,nz) = ar1(i,nz-1) / ( tri(4,i,nz-1) + 1.0E-20 )
1505	ENDDO
1506	DO k = nz-2, 0, -1
1507	DO i = 0, nx
1508	ar(i,k+1) = ( ar1(i,k) - tri(3,i,k) * ar(i,k+2) ) &
1509	/ tri(4,i,k)
1510	ENDDO
1511	ENDDO
1512
1513	!
1514	!-- Indices i=0, j=0 correspond to horizontally averaged pressure.
1515	!-- The respective values of ar should be zero at all k-levels if
1516	!-- acceleration of horizontally averaged vertical velocity is zero.
1517	IF ( ibc_p_b == 1 .AND. ibc_p_t == 1 ) THEN
1518	IF ( j == 0 ) THEN
1519	DO k = 1, nz
1520	ar(0,k) = 0.0
1521	ENDDO
1522	ENDIF
1523	ENDIF
1524
1525	END SUBROUTINE substi_1dd
1526
1527	END SUBROUTINE tridia_1dd
1528
1529	#endif
1530	#endif
1531	END MODULE poisfft_mod

Note: See TracBrowser for help on using the repository browser.

Context Navigation

source: palm/trunk/SOURCE/poisfft.f90 @ 1106

Download in other formats: