source: palm/trunk/SOURCE/pres.f90 @ 1221

Last change on this file since 1221 was 1221, checked in by raasch, 11 years ago

New:


openACC porting of reduction operations
additional 3D-flag arrays for replacing the 2D-index arrays nzb_s_inner and nzb_diff_s_inner
(flow_statistics, init_grid, init_3d_model, modules, palm, pres, time_integration)

Changed:


for PGI/openACC performance reasons set default compile options for openACC to "-ta=nocache",
and set environment variable PGI_ACC_SYNCHRONOUS=1
(MAKE.inc.pgi.openacc, palm_simple_run)

wall_flags_0 changed to 32bit INTEGER, additional array wall_flags_00 introduced to hold
bits 32-63
(advec_ws, init_grid, modules, palm)

Errors:


dummy argument tri in 1d-routines replaced by tri_for_1d because of name
conflict with array tri in module arrays_3d
(tridia_solver)

  • Property svn:keywords set to Id
File size: 24.0 KB
1 SUBROUTINE pres
2
3!--------------------------------------------------------------------------------!
4! This file is part of PALM.
5!
6! PALM is free software: you can redistribute it and/or modify it under the terms
7! of the GNU General Public License as published by the Free Software Foundation,
8! either version 3 of the License, or (at your option) any later version.
9!
10! PALM is distributed in the hope that it will be useful, but WITHOUT ANY
11! WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
12! A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
13!
14! You should have received a copy of the GNU General Public License along with
15! PALM. If not, see <http://www.gnu.org/licenses/>.
16!
17! Copyright 1997-2012  Leibniz University Hannover
18!--------------------------------------------------------------------------------!
19!
20! Current revisions:
21! ------------------
22! openACC porting of reduction operations; loops for calculating d now use
23! the rflags_s_inner multiplicative flag instead of the nzb_s_inner loop index
24!
25! Former revisions:
26! -----------------
27! $Id: pres.f90 1221 2013-09-10 08:59:13Z raasch $
28!
29! 1212 2013-08-15 08:46:27Z raasch
30! call of poisfft_hybrid removed
31!
32! 1117 2013-03-27 11:15:36Z suehring
33! Bugfix in OpenMP parallelization.
34!
35! 1113 2013-03-10 02:48:14Z raasch
36! GPU-porting of several loops, some loops rearranged
37!
38! 1111 2013-03-08 23:54:10Z
39! openACC statements added,
40! ibc_p_b = 2 removed
41!
42! 1092 2013-02-02 11:24:22Z raasch
43! unused variables removed
44!
45! 1036 2012-10-22 13:43:42Z raasch
46! code put under GPL (PALM 3.9)
47!
48! 1003 2012-09-14 14:35:53Z raasch
49! adjustment of array tend for cases with unequal subdomain sizes removed
50!
51! 778 2011-11-07 14:18:25Z fricke
52! New allocation of tend when multigrid is used and the collected field on PE0
53! has more grid points than the subdomain of a PE.
54!
55! 719 2011-04-06 13:05:23Z gryschka
56! Bugfix in volume flow control for double cyclic boundary conditions
57!
58! 709 2011-03-30 09:31:40Z raasch
59! formatting adjustments
60!
61! 707 2011-03-29 11:39:40Z raasch
62! Calculation of weighted average of p is now handled in the same way
63! regardless of the number of ghost layers (advection scheme),
64! multigrid and sor method are using p_loc for iterative advancements of
65! pressure,
66! localsum calculation modified for proper OpenMP reduction,
67! bc_lr/ns replaced by bc_lr/ns_cyc
68!
69! 693 2011-03-08 09:..:..Z raasch
70! bugfix: weighting coefficient added to ibm branch
71!
72! 680 2011-02-04 23:16:06Z gryschka
73! bugfix: collective_wait
74!
75! 675 2011-01-19 10:56:55Z suehring
76! Removed bugfix while copying tend.
77!
78! 673 2011-01-18 16:19:48Z suehring
79! Weighting coefficients added for right computation of the pressure during
80! Runge-Kutta substeps.
81!
82! 667 2010-12-23 12:06:00Z suehring/gryschka
83! New allocation of tend when ws-scheme and multigrid are used. This is done
84! for performance reasons of the data_exchange. The same is done with p after
85! poismg is called.
86! nxl-1, nxr+1, nys-1, nyn+1 replaced by nxlg, nxrg, nysg, nyng when no
87! multigrid is used. Calls of exchange_horiz are modified.
88! bugfix: After pressure correction no volume flow correction in case of
89! non-cyclic boundary conditions
90! (has to be done only before pressure correction)
91! Call of SOR routine is referenced with ddzu_pres.
92!
93! 622 2010-12-10 08:08:13Z raasch
94! optional barriers included in order to speed up collective operations
95!
96! 151 2008-03-07 13:42:18Z raasch
97! Bugfix in volume flow control for non-cyclic boundary conditions
98!
99! 106 2007-08-16 14:30:26Z raasch
100! Volume flow conservation added for the remaining three outflow boundaries
101!
102! 85 2007-05-11 09:35:14Z raasch
103! Division through dt_3d replaced by multiplication of the inverse.
104! For performance optimisation, this is done in the loop calculating the
105! divergence instead of using a separate loop.
106!
107! 75 2007-03-22 09:54:05Z raasch
108! Volume flow control for non-cyclic boundary conditions added (currently only
109! for the north boundary!!), 2nd+3rd argument removed from exchange horiz,
110! mean vertical velocity is removed in case of Neumann boundary conditions
111! both at the bottom and the top
112!
113! RCS Log replace by Id keyword, revision history cleaned up
114!
115! Revision 1.25  2006/04/26 13:26:12  raasch
116! OpenMP optimization (+localsum, threadsum)
117!
118! Revision 1.1  1997/07/24 11:24:44  raasch
119! Initial revision
120!
121!
122! Description:
123! ------------
124! Compute the divergence of the provisional velocity field. Solve the Poisson
125! equation for the perturbation pressure. Compute the final velocities using
126! this perturbation pressure. Compute the remaining divergence.
127!------------------------------------------------------------------------------!
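!
!-- A minimal sketch of the projection step assumed by the code below: with u*
!-- denoting the provisional velocity field delivered by the prognostic
!-- equations, the divergence is scaled such that the Poisson equation
!--
!--    nabla^2 p' = ( div u* ) / ( dt_3d * weight_pres )
!--
!-- is solved for the perturbation pressure p', and the velocities are then
!-- corrected via
!--
!--    u = u* - dt_3d * weight_pres * grad p'
!--
!-- so that the corrected field is (discretely) divergence free.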
128
129    USE arrays_3d
130    USE constants
131    USE control_parameters
132    USE cpulog
133    USE grid_variables
134    USE indices
135    USE interfaces
136    USE pegrid
137    USE poisfft_mod
138    USE statistics
139
140    IMPLICIT NONE
141
142    INTEGER ::  i, j, k
143
144    REAL    ::  ddt_3d, localsum, threadsum, d_weight_pres
145
146    REAL, DIMENSION(1:2) ::  volume_flow_l, volume_flow_offset
147    REAL, DIMENSION(1:nzt) ::  w_l, w_l_l
148
149
150    CALL cpu_log( log_point(8), 'pres', 'start' )
151
152
153    ddt_3d = 1.0 / dt_3d
154    d_weight_pres = 1.0 / weight_pres(intermediate_timestep_count)
155
156!
157!-- Multigrid method expects array d to have one ghost layer.
158!--
159    IF ( psolver == 'multigrid' )  THEN
160     
161       DEALLOCATE( d )
162       ALLOCATE( d(nzb:nzt+1,nys-1:nyn+1,nxl-1:nxr+1) ) 
163
164!
165!--    Since p is later used to hold the weighted average of the substeps, it
166!--    cannot be used in the iterative solver. Therefore, its initial value is
167!--    stored on p_loc, which is then iteratively advanced in every substep.
168       IF ( intermediate_timestep_count == 1 )  THEN
169          DO  i = nxl-1, nxr+1
170             DO  j = nys-1, nyn+1
171                DO  k = nzb, nzt+1
172                   p_loc(k,j,i) = p(k,j,i)
173                ENDDO
174             ENDDO
175          ENDDO
176       ENDIF
177       
178    ELSEIF ( psolver == 'sor'  .AND.  intermediate_timestep_count == 1 )  THEN
179
180!
181!--    Since p is later used to hold the weighted average of the substeps, it
182!--    cannot be used in the iterative solver. Therefore, its initial value is
183!--    stored on p_loc, which is then iteratively advanced in every substep.
184       p_loc = p
185
186    ENDIF
187
188!
189!-- Conserve the volume flow at the outflow in case of non-cyclic lateral
190!-- boundary conditions
191!-- WARNING: so far, this conservation does not work at the left/south
192!--          boundary if the topography at the inflow differs from that at the
193!--          outflow! For this case, volume_flow_area needs adjustment!
194!
195!-- Left/right
196    IF ( conserve_volume_flow  .AND.  ( outflow_l .OR. outflow_r ) )  THEN
197
198       volume_flow(1)   = 0.0
199       volume_flow_l(1) = 0.0
200
201       IF ( outflow_l )  THEN
202          i = 0
203       ELSEIF ( outflow_r )  THEN
204          i = nx+1
205       ENDIF
206
207       DO  j = nys, nyn
208!
209!--       Sum up the volume flow through the left/right boundary
210          DO  k = nzb_2d(j,i)+1, nzt
211             volume_flow_l(1) = volume_flow_l(1) + u(k,j,i) * dzw(k)
212          ENDDO
213       ENDDO
214
215#if defined( __parallel )   
216       IF ( collective_wait )  CALL MPI_BARRIER( comm1dy, ierr )
217       CALL MPI_ALLREDUCE( volume_flow_l(1), volume_flow(1), 1, MPI_REAL, &
218                           MPI_SUM, comm1dy, ierr )   
219#else
220       volume_flow = volume_flow_l 
221#endif
222       volume_flow_offset(1) = ( volume_flow_initial(1) - volume_flow(1) ) &
223                               / volume_flow_area(1)
224
225       DO  j = nysg, nyng
226          DO  k = nzb_2d(j,i)+1, nzt
227             u(k,j,i) = u(k,j,i) + volume_flow_offset(1)
228          ENDDO
229       ENDDO
230
231    ENDIF
232
233!
234!-- South/north
235    IF ( conserve_volume_flow  .AND.  ( outflow_n .OR. outflow_s ) )  THEN
236
237       volume_flow(2)   = 0.0
238       volume_flow_l(2) = 0.0
239
240       IF ( outflow_s )  THEN
241          j = 0
242       ELSEIF ( outflow_n )  THEN
243          j = ny+1
244       ENDIF
245
246       DO  i = nxl, nxr
247!
248!--       Sum up the volume flow through the south/north boundary
249          DO  k = nzb_2d(j,i)+1, nzt
250             volume_flow_l(2) = volume_flow_l(2) + v(k,j,i) * dzw(k)
251          ENDDO
252       ENDDO
253
254#if defined( __parallel )   
255       IF ( collective_wait )  CALL MPI_BARRIER( comm1dx, ierr )
256       CALL MPI_ALLREDUCE( volume_flow_l(2), volume_flow(2), 1, MPI_REAL, &
257                           MPI_SUM, comm1dx, ierr )   
258#else
259       volume_flow = volume_flow_l 
260#endif
261       volume_flow_offset(2) = ( volume_flow_initial(2) - volume_flow(2) )    &
262                               / volume_flow_area(2)
263
264       DO  i = nxlg, nxrg
265          DO  k = nzb_v_inner(j,i)+1, nzt
266             v(k,j,i) = v(k,j,i) + volume_flow_offset(2)
267          ENDDO
268       ENDDO
269
270    ENDIF
271
272!
273!-- Remove mean vertical velocity
274    IF ( ibc_p_b == 1  .AND.  ibc_p_t == 1 )  THEN
275       IF ( simulated_time > 0.0 )  THEN ! otherwise nzb_w_inner not yet known
276          w_l = 0.0;  w_l_l = 0.0
277          DO  i = nxl, nxr
278             DO  j = nys, nyn
279                DO  k = nzb_w_inner(j,i)+1, nzt
280                   w_l_l(k) = w_l_l(k) + w(k,j,i)
281                ENDDO
282             ENDDO
283          ENDDO
284#if defined( __parallel )   
285          IF ( collective_wait )  CALL MPI_BARRIER( comm2d, ierr )
286          CALL MPI_ALLREDUCE( w_l_l(1), w_l(1), nzt, MPI_REAL, MPI_SUM, &
287                              comm2d, ierr )
288#else
289          w_l = w_l_l 
290#endif
291          DO  k = 1, nzt
292             w_l(k) = w_l(k) / ngp_2dh_outer(k,0)
293          ENDDO
294          DO  i = nxlg, nxrg
295             DO  j = nysg, nyng
296                DO  k = nzb_w_inner(j,i)+1, nzt
297                   w(k,j,i) = w(k,j,i) - w_l(k)
298                ENDDO
299             ENDDO
300          ENDDO
301       ENDIF
302    ENDIF
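!
!-- Note: with Neumann conditions for the perturbation pressure at both the
!-- bottom and the top boundary (ibc_p_b = 1 and ibc_p_t = 1), the pressure
!-- correction presumably cannot remove a spurious horizontal-mean vertical
!-- velocity, which is why it is subtracted explicitly above (cf. revision 75
!-- in the header).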
303
304!
305!-- Compute the divergence of the provisional velocity field.
306    CALL cpu_log( log_point_s(1), 'divergence', 'start' )
307
308    IF ( psolver == 'multigrid' )  THEN
309       !$OMP PARALLEL DO SCHEDULE( STATIC )
310       DO  i = nxl-1, nxr+1
311          DO  j = nys-1, nyn+1
312             DO  k = nzb, nzt+1
313                d(k,j,i) = 0.0
314             ENDDO
315          ENDDO
316       ENDDO
317    ELSE
318       !$OMP PARALLEL DO SCHEDULE( STATIC )
319       !$acc kernels present( d )
320       !$acc loop
321       DO  i = nxl, nxr
322          DO  j = nys, nyn
323             !$acc loop vector(32)
324             DO  k = nzb+1, nzt
325                d(k,j,i) = 0.0
326             ENDDO
327          ENDDO
328       ENDDO
329       !$acc end kernels
330    ENDIF
331
332    localsum  = 0.0
333    threadsum = 0.0
334
335#if defined( __ibm )
336    !$OMP PARALLEL PRIVATE (i,j,k) FIRSTPRIVATE(threadsum) REDUCTION(+:localsum)
337    !$OMP DO SCHEDULE( STATIC )
338    DO  i = nxl, nxr
339       DO  j = nys, nyn
340          DO  k = nzb_s_inner(j,i)+1, nzt
341             d(k,j,i) = ( ( u(k,j,i+1) - u(k,j,i) ) * ddx + &
342                          ( v(k,j+1,i) - v(k,j,i) ) * ddy + &
343                          ( w(k,j,i) - w(k-1,j,i) ) * ddzw(k) ) * ddt_3d      &
344                                                                * d_weight_pres 
345          ENDDO
346!
347!--       Compute possible PE-sum of divergences for flow_statistics
348          DO  k = nzb_s_inner(j,i)+1, nzt
349             threadsum = threadsum + ABS( d(k,j,i) )
350          ENDDO
351
352       ENDDO
353    ENDDO
354
355    localsum = localsum + threadsum * dt_3d * &
356                          weight_pres(intermediate_timestep_count)
357
358    !$OMP END PARALLEL
359#else
360
361    !$OMP PARALLEL PRIVATE (i,j,k)
362    !$OMP DO SCHEDULE( STATIC )
363    !$acc kernels present( d, ddzw, rflags_s_inner, u, v, w )
364    !$acc loop collapse( 3 )
365    DO  i = nxl, nxr
366       DO  j = nys, nyn
367          DO  k = 1, nzt
368             d(k,j,i) = ( ( u(k,j,i+1) - u(k,j,i) ) * ddx +             &
369                        ( v(k,j+1,i) - v(k,j,i) ) * ddy +               &
370                        ( w(k,j,i) - w(k-1,j,i) ) * ddzw(k) ) * ddt_3d  &
371                        * d_weight_pres * rflags_s_inner(k,j,i)
372          ENDDO
373       ENDDO
374    ENDDO
375    !$acc end kernels
376    !$OMP END PARALLEL
377
378!
379!-- Compute possible PE-sum of divergences for flow_statistics
380    !$OMP PARALLEL PRIVATE (i,j,k) FIRSTPRIVATE(threadsum) REDUCTION(+:localsum)
381    !$OMP DO SCHEDULE( STATIC )
382    !$acc parallel loop collapse(3) present( d ) reduction(+:threadsum)
383    DO  i = nxl, nxr
384       DO  j = nys, nyn
385          DO  k = nzb+1, nzt
386             threadsum = threadsum + ABS( d(k,j,i) )
387          ENDDO
388       ENDDO
389    ENDDO
390    !$acc end parallel
391    localsum = localsum + threadsum * dt_3d * &
392                          weight_pres(intermediate_timestep_count)
393    !$OMP END PARALLEL
394#endif
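!
!-- In the prognostic interior d now holds div(u*) / ( dt_3d * weight_pres ),
!-- with zeros below the topography, so the solver result can later be applied
!-- with the factor dt_3d * weight_pres; localsum has already been rescaled by
!-- this factor and therefore contains the unweighted divergence sum.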
395
396!
397!-- For completeness, set the divergence sum of all statistic regions to that
398!-- of the total domain
399    sums_divold_l(0:statistic_regions) = localsum
400
401    CALL cpu_log( log_point_s(1), 'divergence', 'stop' )
402
403!
404!-- Compute the pressure perturbation solving the Poisson equation
405    IF ( psolver(1:7) == 'poisfft' )  THEN
406
407!
408!--    Solve Poisson equation via FFT and solution of tridiagonal matrices
409       IF ( psolver == 'poisfft' )  THEN
410
411          CALL poisfft( d, tend )
412
413       ENDIF
414
415!
416!--    Store computed perturbation pressure and set boundary condition in
417!--    z-direction
418       !$OMP PARALLEL DO
419       !$acc kernels present( d, tend )
420       !$acc loop
421       DO  i = nxl, nxr
422          DO  j = nys, nyn
423             !$acc loop vector( 32 )
424             DO  k = nzb+1, nzt
425                tend(k,j,i) = d(k,j,i)
426             ENDDO
427          ENDDO
428       ENDDO
429       !$acc end kernels
430
431!
432!--    Bottom boundary:
433!--    This condition is only required for internal output. The pressure
434!--    gradient (dp(nzb+1)-dp(nzb))/dz is not used anywhere else.
435       IF ( ibc_p_b == 1 )  THEN
436!
437!--       Neumann (dp/dz = 0)
438          !$OMP PARALLEL DO
439          !$acc kernels present( nzb_s_inner, tend )
440          DO  i = nxlg, nxrg
441             DO  j = nysg, nyng
442                tend(nzb_s_inner(j,i),j,i) = tend(nzb_s_inner(j,i)+1,j,i)
443             ENDDO
444          ENDDO
445          !$acc end kernels
446
447       ELSE
448!
449!--       Dirichlet
450          !$OMP PARALLEL DO
451          !$acc kernels present( tend )
452          DO  i = nxlg, nxrg
453             DO  j = nysg, nyng
454                tend(nzb_s_inner(j,i),j,i) = 0.0
455             ENDDO
456          ENDDO
457          !$acc end kernels
458
459       ENDIF
460
461!
462!--    Top boundary
463       IF ( ibc_p_t == 1 )  THEN
464!
465!--       Neumann
466          !$OMP PARALLEL DO
467          !$acc kernels present( tend )
468          DO  i = nxlg, nxrg
469             DO  j = nysg, nyng
470                tend(nzt+1,j,i) = tend(nzt,j,i)
471             ENDDO
472          ENDDO
473          !$acc end kernels
474
475       ELSE
476!
477!--       Dirichlet
478          !$OMP PARALLEL DO
479          !$acc kernels present( tend )
480          DO  i = nxlg, nxrg
481             DO  j = nysg, nyng
482                tend(nzt+1,j,i) = 0.0
483             ENDDO
484          ENDDO
485          !$acc end kernels
486
487       ENDIF
488
489!
490!--    Exchange boundaries for p
491       IF ( numprocs == 1 )  THEN    ! workaround for single-core GPU runs
492          on_device = .TRUE.         ! to be removed after complete porting
493       ELSE                          ! of ghost point exchange
494          !$acc update host( tend )
495       ENDIF
496       CALL exchange_horiz( tend, nbgp )
497       IF ( numprocs == 1 )  THEN    ! workaround for single-core GPU runs
498          on_device = .FALSE.        ! to be removed after complete porting
499       ELSE                          ! of ghost point exchange
500          !$acc update device( tend )
501       ENDIF
502     
503    ELSEIF ( psolver == 'sor' )  THEN
504
505!
506!--    Solve Poisson equation for perturbation pressure using SOR-Red/Black
507!--    scheme
508       CALL sor( d, ddzu_pres, ddzw, p_loc )
509       tend = p_loc
510
511    ELSEIF ( psolver == 'multigrid' )  THEN
512
513!
514!--    Solve Poisson equation for perturbation pressure using Multigrid scheme,
515!--    array tend is used to store the residuals, logical exchange_mg is used
516!--    to distinguish data exchange in multigrid ( 1 ghost point ) from the normal grid
517!--    ( nbgp ghost points ).
518
519!--    If the number of grid points of the gathered grid, which is collected
520!--    on PE0, is larger than the number of grid points of a PE, then array
521!--    tend will be enlarged.
522       IF ( gathered_size > subdomain_size )  THEN
523          DEALLOCATE( tend )
524          ALLOCATE( tend(nzb:nzt_mg(mg_switch_to_pe0_level)+1,nys_mg(          &
525                    mg_switch_to_pe0_level)-1:nyn_mg(mg_switch_to_pe0_level)+1,&
526                    nxl_mg(mg_switch_to_pe0_level)-1:nxr_mg(                   &
527                    mg_switch_to_pe0_level)+1) )
528       ENDIF
529
530       CALL poismg( tend )
531
532       IF ( gathered_size > subdomain_size )  THEN
533          DEALLOCATE( tend )
534          ALLOCATE( tend(nzb:nzt+1,nysg:nyng,nxlg:nxrg) )
535       ENDIF
536
537!
538!--    Restore perturbation pressure on tend because this array is used
539!--    further below to correct the velocity fields
540       DO  i = nxl-1, nxr+1
541          DO  j = nys-1, nyn+1
542             DO  k = nzb, nzt+1
543                tend(k,j,i) = p_loc(k,j,i)
544             ENDDO
545          ENDDO
546       ENDDO
547
548    ENDIF
549
550!
551!-- Store perturbation pressure on array p, used for pressure data output.
552!-- Ghost layers are added in the output routines (except sor-method: see below)
553    IF ( intermediate_timestep_count == 1 )  THEN
554       !$OMP PARALLEL PRIVATE (i,j,k)
555       !$OMP DO
556       !$acc kernels present( p, tend, weight_substep )
557       !$acc loop
558       DO  i = nxl-1, nxr+1
559          DO  j = nys-1, nyn+1
560             !$acc loop vector( 32 )
561             DO  k = nzb, nzt+1
562                p(k,j,i) = tend(k,j,i) * &
563                           weight_substep(intermediate_timestep_count)
564             ENDDO
565          ENDDO
566       ENDDO
567       !$acc end kernels
568       !$OMP END PARALLEL
569
570    ELSE 
571       !$OMP PARALLEL PRIVATE (i,j,k)
572       !$OMP DO
573       !$acc kernels present( p, tend, weight_substep )
574       !$acc loop
575       DO  i = nxl-1, nxr+1
576          DO  j = nys-1, nyn+1
577             !$acc loop vector( 32 )
578             DO  k = nzb, nzt+1
579                p(k,j,i) = p(k,j,i) + tend(k,j,i) * &
580                           weight_substep(intermediate_timestep_count)
581             ENDDO
582          ENDDO
583       ENDDO
584       !$acc end kernels
585       !$OMP END PARALLEL
586
587    ENDIF
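!
!-- Over the Runge-Kutta substeps p thus accumulates the sum of
!-- weight_substep(count) * p'(count), i.e. a weighted average of the substep
!-- perturbation pressures (cf. revision 673 in the header).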
588       
589!
590!-- SOR-method needs ghost layers for the next timestep
591    IF ( psolver == 'sor' )  CALL exchange_horiz( p, nbgp )
592
593!
594!-- Correction of the provisional velocities with the current perturbation
595!-- pressure just computed
596    IF ( conserve_volume_flow  .AND.  ( bc_lr_cyc .OR. bc_ns_cyc ) )  THEN
597       volume_flow_l(1) = 0.0
598       volume_flow_l(2) = 0.0
599    ENDIF
600
601    !$OMP PARALLEL PRIVATE (i,j,k)
602    !$OMP DO
603    !$acc kernels present( ddzu, nzb_u_inner, nzb_v_inner, nzb_w_inner, tend, u, v, w, weight_pres )
604    !$acc loop
605    DO  i = nxl, nxr   
606       DO  j = nys, nyn
607          !$acc loop vector( 32 )
608          DO  k = 1, nzt
609             IF ( k > nzb_w_inner(j,i) )  THEN
610                w(k,j,i) = w(k,j,i) - dt_3d *                                 &
611                           ( tend(k+1,j,i) - tend(k,j,i) ) * ddzu(k+1) *      &
612                           weight_pres(intermediate_timestep_count)
613             ENDIF
614          ENDDO
615          !$acc loop vector( 32 )
616          DO  k = 1, nzt
617             IF ( k > nzb_u_inner(j,i) )  THEN
618                u(k,j,i) = u(k,j,i) - dt_3d *                                 &
619                           ( tend(k,j,i) - tend(k,j,i-1) ) * ddx *            &
620                           weight_pres(intermediate_timestep_count)
621             ENDIF
622          ENDDO
623          !$acc loop vector( 32 )
624          DO  k = 1, nzt
625             IF ( k > nzb_v_inner(j,i) )  THEN
626                v(k,j,i) = v(k,j,i) - dt_3d *                                 &
627                           ( tend(k,j,i) - tend(k,j-1,i) ) * ddy *            &
628                           weight_pres(intermediate_timestep_count)
629             ENDIF
630          ENDDO                                                         
631
632       ENDDO
633    ENDDO
634    !$acc end kernels
635    !$OMP END PARALLEL
636
637!
638!-- Sum up the volume flow through the right and north boundary
639    IF ( conserve_volume_flow  .AND.  bc_lr_cyc  .AND.  bc_ns_cyc  .AND.  &
640         nxr == nx )  THEN
641
642       !$OMP PARALLEL PRIVATE (j,k)
643       !$OMP DO
644       DO  j = nys, nyn
645          !$OMP CRITICAL
646          DO  k = nzb_2d(j,nx) + 1, nzt
647             volume_flow_l(1) = volume_flow_l(1) + u(k,j,nx) * dzw(k)
648          ENDDO
649          !$OMP END CRITICAL
650       ENDDO
651       !$OMP END PARALLEL
652
653    ENDIF
654
655    IF ( conserve_volume_flow  .AND.  bc_ns_cyc  .AND.  bc_lr_cyc  .AND.  &
656         nyn == ny )  THEN
657
658       !$OMP PARALLEL PRIVATE (i,k)
659       !$OMP DO
660       DO  i = nxl, nxr
661          !$OMP CRITICAL
662          DO  k = nzb_2d(ny,i) + 1, nzt
663             volume_flow_l(2) = volume_flow_l(2) + v(k,ny,i) * dzw(k)
664           ENDDO
665          !$OMP END CRITICAL
666       ENDDO
667       !$OMP END PARALLEL
668
669    ENDIF
670   
671!
672!-- Conserve the volume flow
673    IF ( conserve_volume_flow  .AND.  ( bc_lr_cyc  .AND.  bc_ns_cyc ) )  THEN
674
675#if defined( __parallel )   
676       IF ( collective_wait )  CALL MPI_BARRIER( comm2d, ierr )
677       CALL MPI_ALLREDUCE( volume_flow_l(1), volume_flow(1), 2, MPI_REAL, &
678                           MPI_SUM, comm2d, ierr ) 
679#else
680       volume_flow = volume_flow_l 
681#endif   
682
683       volume_flow_offset = ( volume_flow_initial - volume_flow ) / &
684                            volume_flow_area
685
686       !$OMP PARALLEL PRIVATE (i,j,k)
687       !$OMP DO
688       DO  i = nxl, nxr
689          DO  j = nys, nyn
690             DO  k = nzb_u_inner(j,i) + 1, nzt
691                u(k,j,i) = u(k,j,i) + volume_flow_offset(1)
692             ENDDO
693             DO k = nzb_v_inner(j,i) + 1, nzt
694                v(k,j,i) = v(k,j,i) + volume_flow_offset(2)
695             ENDDO
696          ENDDO
697       ENDDO
698
699       !$OMP END PARALLEL
700
701    ENDIF
702
703!
704!-- Exchange of boundaries for the velocities
705    IF ( numprocs == 1 )  THEN    ! workaround for single-core GPU runs
706       on_device = .TRUE.         ! to be removed after complete porting
707    ELSE                          ! of ghost point exchange
708       !$acc update host( u, v, w )
709    ENDIF
710    CALL exchange_horiz( u, nbgp )
711    CALL exchange_horiz( v, nbgp )
712    CALL exchange_horiz( w, nbgp )
713    IF ( numprocs == 1 )  THEN    ! workaround for single-core GPU runs
714       on_device = .FALSE.        ! to be removed after complete porting
715    ELSE                          ! of ghost point exchange
716       !$acc update device( u, v, w )
717    ENDIF
718
719!
720!-- Compute the divergence of the corrected velocity field,
721!-- a possible PE-sum is computed in flow_statistics
722    CALL cpu_log( log_point_s(1), 'divergence', 'start' )
723    sums_divnew_l = 0.0
724
725!
726!-- d must be reset to zero because it can contain nonzero values below the
727!-- topography
728    IF ( topography /= 'flat' )  d = 0.0
729
730    localsum  = 0.0
731    threadsum = 0.0
732
733    !$OMP PARALLEL PRIVATE (i,j,k) FIRSTPRIVATE(threadsum) REDUCTION(+:localsum)
734    !$OMP DO SCHEDULE( STATIC )
735#if defined( __ibm )
736    DO  i = nxl, nxr
737       DO  j = nys, nyn
738          DO  k = nzb_s_inner(j,i)+1, nzt
739             d(k,j,i) = ( u(k,j,i+1) - u(k,j,i) ) * ddx + &
740                        ( v(k,j+1,i) - v(k,j,i) ) * ddy + &
741                        ( w(k,j,i) - w(k-1,j,i) ) * ddzw(k)
742          ENDDO
743          DO  k = nzb+1, nzt
744             threadsum = threadsum + ABS( d(k,j,i) )
745          ENDDO
746       ENDDO
747    ENDDO
748#else
749    !$acc kernels present( d, ddzw, rflags_s_inner, u, v, w )
750    !$acc loop collapse( 3 )
751    DO  i = nxl, nxr
752       DO  j = nys, nyn
753          DO  k = 1, nzt
754                d(k,j,i) = ( ( u(k,j,i+1) - u(k,j,i) ) * ddx +   &
755                             ( v(k,j+1,i) - v(k,j,i) ) * ddy +   &
756                             ( w(k,j,i) - w(k-1,j,i) ) * ddzw(k) &
757                           ) * rflags_s_inner(k,j,i)
758          ENDDO
759       ENDDO
760    ENDDO
761    !$acc end kernels
762!
763!-- Compute possible PE-sum of divergences for flow_statistics
764    !$acc parallel loop collapse(3) present( d ) reduction(+:threadsum)
765    DO  i = nxl, nxr
766       DO  j = nys, nyn
767          DO  k = nzb+1, nzt
768             threadsum = threadsum + ABS( d(k,j,i) )
769          ENDDO
770       ENDDO
771    ENDDO
772    !$acc end parallel
773#endif
774
775    localsum = localsum + threadsum
776    !$OMP END PARALLEL
777
778!
779!-- For completeness, set the divergence sum of all statistic regions to that
780!-- of the total domain
781    sums_divnew_l(0:statistic_regions) = localsum
782
783    CALL cpu_log( log_point_s(1), 'divergence', 'stop' )
784
785    CALL cpu_log( log_point(8), 'pres', 'stop' )
786
787
788 END SUBROUTINE pres
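
For orientation, a heavily simplified, hypothetical sketch of the calling context: pres appears to be driven from time_integration (cf. the file list in the header) once per Runge-Kutta substep, after the provisional velocities have been computed. The loop below is illustrative only; apart from pres and intermediate_timestep_count, the names are placeholders rather than verbatim PALM code.

    ! hypothetical sketch of the substep loop in time_integration
    DO  intermediate_timestep_count = 1, intermediate_timestep_count_max
       CALL compute_prognostic_equations   ! placeholder for the prognostic step
       CALL pres                           ! projection step defined above
    ENDDO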