source: palm/trunk/SOURCE/pres.f90 @ 1257

Last change on this file since 1257 was 1257, checked in by raasch, 8 years ago

New:
---

openACC porting of timestep calculation
(modules, timestep, time_integration)

Changed:


openACC loop directives and vector clauses removed (because they do not give any performance improvement with PGI
compiler versions > 13.6)
(advec_ws, buoyancy, coriolis, diffusion_e, diffusion_s, diffusion_u, diffusion_v, diffusion_w, diffusivities, exchange_horiz, fft_xy, pres, production_e, transpose, tridia_solver, wall_fluxes)

openACC loop independent clauses added
(boundary_conds, prandtl_fluxes, pres)

openACC declare create statements moved after FORTRAN declaration statement
(diffusion_u, diffusion_v, diffusion_w, fft_xy, poisfft, production_e, tridia_solver)

openACC end parallel replaced by end parallel loop
(flow_statistics, pres)

openACC "kernels do" replaced by "kernels loop"
(prandtl_fluxes)

output format for theta* changed to avoid output of *
(run_control)

Errors:


bugfix for calculation of advective timestep (old version may cause wrong timesteps in case of
vertixcally stretched grids)
Attention: standard run-control output has changed!
(timestep)

  • Property svn:keywords set to Id
File size: 24.2 KB
Line 
1 SUBROUTINE pres
2
3!--------------------------------------------------------------------------------!
4! This file is part of PALM.
5!
6! PALM is free software: you can redistribute it and/or modify it under the terms
7! of the GNU General Public License as published by the Free Software Foundation,
8! either version 3 of the License, or (at your option) any later version.
9!
10! PALM is distributed in the hope that it will be useful, but WITHOUT ANY
11! WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
12! A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
13!
14! You should have received a copy of the GNU General Public License along with
15! PALM. If not, see <http://www.gnu.org/licenses/>.
16!
17! Copyright 1997-2012  Leibniz University Hannover
18!--------------------------------------------------------------------------------!
19!
20! Current revisions:
21! ------------------
22! openacc loop and loop vector clauses removed, independent clauses added,
23! end parallel replaced by end parallel loop
24!
25! Former revisions:
26! -----------------
27! $Id: pres.f90 1257 2013-11-08 15:18:40Z raasch $
28!
29! 1221 2013-09-10 08:59:13Z raasch
30! openACC porting of reduction operations, loops for calculating d are
31! using the rflags_s_inner multiply flag instead of the nzb_s_inner loop index
32!
33! 1212 2013-08-15 08:46:27Z raasch
34! call of poisfft_hybrid removed
35!
36! 1117 2013-03-27 11:15:36Z suehring
37! Bugfix in OpenMP parallelization.
38!
39! 1113 2013-03-10 02:48:14Z raasch
40! GPU-porting of several loops, some loops rearranged
41!
42! 1111 2013-03-08 23:54:10Z
43! openACC statements added,
44! ibc_p_b = 2 removed
45!
46! 1092 2013-02-02 11:24:22Z raasch
47! unused variables removed
48!
49! 1036 2012-10-22 13:43:42Z raasch
50! code put under GPL (PALM 3.9)
51!
52! 1003 2012-09-14 14:35:53Z raasch
53! adjustment of array tend for cases with unequal subdomain sizes removed
54!
55! 778 2011-11-07 14:18:25Z fricke
56! New allocation of tend when multigrid is used and the collected field on PE0
57! has more grid points than the subdomain of an PE.
58!
59! 719 2011-04-06 13:05:23Z gryschka
60! Bugfix in volume flow control for double cyclic boundary conditions
61!
62! 709 2011-03-30 09:31:40Z raasch
63! formatting adjustments
64!
65! 707 2011-03-29 11:39:40Z raasch
66! Calculation of weighted average of p is now handled in the same way
67! regardless of the number of ghost layers (advection scheme),
68! multigrid and sor method are using p_loc for iterative advancements of
69! pressure,
70! localsum calculation modified for proper OpenMP reduction,
71! bc_lr/ns replaced by bc_lr/ns_cyc
72!
73! 693 2011-03-08 09:..:..Z raasch
74! bugfix: weighting coefficient added to ibm branch
75!
76! 680 2011-02-04 23:16:06Z gryschka
77! bugfix: collective_wait
78!
79! 675 2011-01-19 10:56:55Z suehring
80! Removed bugfix while copying tend.
81!
82! 673 2011-01-18 16:19:48Z suehring
83! Weighting coefficients added for right computation of the pressure during
84! Runge-Kutta substeps.
85!
86! 667 2010-12-23 12:06:00Z suehring/gryschka
87! New allocation of tend when ws-scheme and multigrid is used. This is due to
88! reasons of perforance of the data_exchange. The same is done with p after
89! poismg is called.
90! nxl-1, nxr+1, nys-1, nyn+1 replaced by nxlg, nxrg, nysg, nyng when no
91! multigrid is used. Calls of exchange_horiz are modified.
92! bugfix: After pressure correction no volume flow correction in case of
93! non-cyclic boundary conditions
94! (has to be done only before pressure correction)
95! Call of SOR routine is referenced with ddzu_pres.
96!
97! 622 2010-12-10 08:08:13Z raasch
98! optional barriers included in order to speed up collective operations
99!
100! 151 2008-03-07 13:42:18Z raasch
101! Bugfix in volume flow control for non-cyclic boundary conditions
102!
103! 106 2007-08-16 14:30:26Z raasch
104! Volume flow conservation added for the remaining three outflow boundaries
105!
106! 85 2007-05-11 09:35:14Z raasch
107! Division through dt_3d replaced by multiplication of the inverse.
108! For performance optimisation, this is done in the loop calculating the
109! divergence instead of using a seperate loop.
110!
111! 75 2007-03-22 09:54:05Z raasch
112! Volume flow control for non-cyclic boundary conditions added (currently only
113! for the north boundary!!), 2nd+3rd argument removed from exchange horiz,
114! mean vertical velocity is removed in case of Neumann boundary conditions
115! both at the bottom and the top
116!
117! RCS Log replace by Id keyword, revision history cleaned up
118!
119! Revision 1.25  2006/04/26 13:26:12  raasch
120! OpenMP optimization (+localsum, threadsum)
121!
122! Revision 1.1  1997/07/24 11:24:44  raasch
123! Initial revision
124!
125!
126! Description:
127! ------------
128! Compute the divergence of the provisional velocity field. Solve the Poisson
129! equation for the perturbation pressure. Compute the final velocities using
130! this perturbation pressure. Compute the remaining divergence.
131!------------------------------------------------------------------------------!
132
133    USE arrays_3d
134    USE constants
135    USE control_parameters
136    USE cpulog
137    USE grid_variables
138    USE indices
139    USE interfaces
140    USE pegrid
141    USE poisfft_mod
142    USE statistics
143
144    IMPLICIT NONE
145
146    INTEGER ::  i, j, k
147
148    REAL    ::  ddt_3d, localsum, threadsum, d_weight_pres
149
150    REAL, DIMENSION(1:2) ::  volume_flow_l, volume_flow_offset
151    REAL, DIMENSION(1:nzt) ::  w_l, w_l_l
152
153
154    CALL cpu_log( log_point(8), 'pres', 'start' )
155
156
157    ddt_3d = 1.0 / dt_3d
158    d_weight_pres = 1.0 / weight_pres(intermediate_timestep_count)
159
160!
161!-- Multigrid method expects array d to have one ghost layer.
162!--
163    IF ( psolver == 'multigrid' )  THEN
164     
165       DEALLOCATE( d )
166       ALLOCATE( d(nzb:nzt+1,nys-1:nyn+1,nxl-1:nxr+1) ) 
167
168!
169!--    Since p is later used to hold the weighted average of the substeps, it
170!--    cannot be used in the iterative solver. Therefore, its initial value is
171!--    stored on p_loc, which is then iteratively advanced in every substep.
172       IF ( intermediate_timestep_count == 1 )  THEN
173          DO  i = nxl-1, nxr+1
174             DO  j = nys-1, nyn+1
175                DO  k = nzb, nzt+1
176                   p_loc(k,j,i) = p(k,j,i)
177                ENDDO
178             ENDDO
179          ENDDO
180       ENDIF
181       
182    ELSEIF ( psolver == 'sor'  .AND.  intermediate_timestep_count == 1 )  THEN
183
184!
185!--    Since p is later used to hold the weighted average of the substeps, it
186!--    cannot be used in the iterative solver. Therefore, its initial value is
187!--    stored on p_loc, which is then iteratively advanced in every substep.
188       p_loc = p
189
190    ENDIF
191
192!
193!-- Conserve the volume flow at the outflow in case of non-cyclic lateral
194!-- boundary conditions
195!-- WARNING: so far, this conservation does not work at the left/south
196!--          boundary if the topography at the inflow differs from that at the
197!--          outflow! For this case, volume_flow_area needs adjustment!
198!
199!-- Left/right
200    IF ( conserve_volume_flow  .AND.  ( outflow_l .OR. outflow_r ) )  THEN
201
202       volume_flow(1)   = 0.0
203       volume_flow_l(1) = 0.0
204
205       IF ( outflow_l )  THEN
206          i = 0
207       ELSEIF ( outflow_r )  THEN
208          i = nx+1
209       ENDIF
210
211       DO  j = nys, nyn
212!
213!--       Sum up the volume flow through the south/north boundary
214          DO  k = nzb_2d(j,i)+1, nzt
215             volume_flow_l(1) = volume_flow_l(1) + u(k,j,i) * dzw(k)
216          ENDDO
217       ENDDO
218
219#if defined( __parallel )   
220       IF ( collective_wait )  CALL MPI_BARRIER( comm1dy, ierr )
221       CALL MPI_ALLREDUCE( volume_flow_l(1), volume_flow(1), 1, MPI_REAL, &
222                           MPI_SUM, comm1dy, ierr )   
223#else
224       volume_flow = volume_flow_l 
225#endif
226       volume_flow_offset(1) = ( volume_flow_initial(1) - volume_flow(1) ) &
227                               / volume_flow_area(1)
228
229       DO  j = nysg, nyng
230          DO  k = nzb_2d(j,i)+1, nzt
231             u(k,j,i) = u(k,j,i) + volume_flow_offset(1)
232          ENDDO
233       ENDDO
234
235    ENDIF
236
237!
238!-- South/north
239    IF ( conserve_volume_flow  .AND.  ( outflow_n .OR. outflow_s ) )  THEN
240
241       volume_flow(2)   = 0.0
242       volume_flow_l(2) = 0.0
243
244       IF ( outflow_s )  THEN
245          j = 0
246       ELSEIF ( outflow_n )  THEN
247          j = ny+1
248       ENDIF
249
250       DO  i = nxl, nxr
251!
252!--       Sum up the volume flow through the south/north boundary
253          DO  k = nzb_2d(j,i)+1, nzt
254             volume_flow_l(2) = volume_flow_l(2) + v(k,j,i) * dzw(k)
255          ENDDO
256       ENDDO
257
258#if defined( __parallel )   
259       IF ( collective_wait )  CALL MPI_BARRIER( comm1dx, ierr )
260       CALL MPI_ALLREDUCE( volume_flow_l(2), volume_flow(2), 1, MPI_REAL, &
261                           MPI_SUM, comm1dx, ierr )   
262#else
263       volume_flow = volume_flow_l 
264#endif
265       volume_flow_offset(2) = ( volume_flow_initial(2) - volume_flow(2) )    &
266                               / volume_flow_area(2)
267
268       DO  i = nxlg, nxrg
269          DO  k = nzb_v_inner(j,i)+1, nzt
270             v(k,j,i) = v(k,j,i) + volume_flow_offset(2)
271          ENDDO
272       ENDDO
273
274    ENDIF
275
276!
277!-- Remove mean vertical velocity
278    IF ( ibc_p_b == 1  .AND.  ibc_p_t == 1 )  THEN
279       IF ( simulated_time > 0.0 )  THEN ! otherwise nzb_w_inner not yet known
280          w_l = 0.0;  w_l_l = 0.0
281          DO  i = nxl, nxr
282             DO  j = nys, nyn
283                DO  k = nzb_w_inner(j,i)+1, nzt
284                   w_l_l(k) = w_l_l(k) + w(k,j,i)
285                ENDDO
286             ENDDO
287          ENDDO
288#if defined( __parallel )   
289          IF ( collective_wait )  CALL MPI_BARRIER( comm2d, ierr )
290          CALL MPI_ALLREDUCE( w_l_l(1), w_l(1), nzt, MPI_REAL, MPI_SUM, &
291                              comm2d, ierr )
292#else
293          w_l = w_l_l 
294#endif
295          DO  k = 1, nzt
296             w_l(k) = w_l(k) / ngp_2dh_outer(k,0)
297          ENDDO
298          DO  i = nxlg, nxrg
299             DO  j = nysg, nyng
300                DO  k = nzb_w_inner(j,i)+1, nzt
301                   w(k,j,i) = w(k,j,i) - w_l(k)
302                ENDDO
303             ENDDO
304          ENDDO
305       ENDIF
306    ENDIF
307
308!
309!-- Compute the divergence of the provisional velocity field.
310    CALL cpu_log( log_point_s(1), 'divergence', 'start' )
311
312    IF ( psolver == 'multigrid' )  THEN
313       !$OMP PARALLEL DO SCHEDULE( STATIC )
314       DO  i = nxl-1, nxr+1
315          DO  j = nys-1, nyn+1
316             DO  k = nzb, nzt+1
317                d(k,j,i) = 0.0
318             ENDDO
319          ENDDO
320       ENDDO
321    ELSE
322       !$OMP PARALLEL DO SCHEDULE( STATIC )
323       !$acc kernels present( d )
324       DO  i = nxl, nxr
325          DO  j = nys, nyn
326             DO  k = nzb+1, nzt
327                d(k,j,i) = 0.0
328             ENDDO
329          ENDDO
330       ENDDO
331       !$acc end kernels
332    ENDIF
333
334    localsum  = 0.0
335    threadsum = 0.0
336
337#if defined( __ibm )
338    !$OMP PARALLEL PRIVATE (i,j,k) FIRSTPRIVATE(threadsum) REDUCTION(+:localsum)
339    !$OMP DO SCHEDULE( STATIC )
340    DO  i = nxl, nxr
341       DO  j = nys, nyn
342          DO  k = nzb_s_inner(j,i)+1, nzt
343             d(k,j,i) = ( ( u(k,j,i+1) - u(k,j,i) ) * ddx + &
344                          ( v(k,j+1,i) - v(k,j,i) ) * ddy + &
345                          ( w(k,j,i) - w(k-1,j,i) ) * ddzw(k) ) * ddt_3d      &
346                                                                * d_weight_pres
347          ENDDO
348!
349!--       Compute possible PE-sum of divergences for flow_statistics
350          DO  k = nzb_s_inner(j,i)+1, nzt
351             threadsum = threadsum + ABS( d(k,j,i) )
352          ENDDO
353
354       ENDDO
355    ENDDO
356
357    localsum = localsum + threadsum * dt_3d * &
358                          weight_pres(intermediate_timestep_count)
359
360    !$OMP END PARALLEL
361#else
362
363    !$OMP PARALLEL PRIVATE (i,j,k)
364    !$OMP DO SCHEDULE( STATIC )
365    !$acc kernels present( d, ddzw, rflags_s_inner, u, v, w )
366    !$acc loop collapse( 3 )
367    DO  i = nxl, nxr
368       DO  j = nys, nyn
369          DO  k = 1, nzt
370             d(k,j,i) = ( ( u(k,j,i+1) - u(k,j,i) ) * ddx +             &
371                        ( v(k,j+1,i) - v(k,j,i) ) * ddy +               &
372                        ( w(k,j,i) - w(k-1,j,i) ) * ddzw(k) ) * ddt_3d  &
373                        * d_weight_pres * rflags_s_inner(k,j,i)
374          ENDDO
375       ENDDO
376    ENDDO
377    !$acc end kernels
378    !$OMP END PARALLEL
379
380!
381!-- Compute possible PE-sum of divergences for flow_statistics
382    !$OMP PARALLEL PRIVATE (i,j,k) FIRSTPRIVATE(threadsum) REDUCTION(+:localsum)
383    !$OMP DO SCHEDULE( STATIC )
384    !$acc parallel loop collapse(3) present( d ) reduction(+:threadsum)
385    DO  i = nxl, nxr
386       DO  j = nys, nyn
387          DO  k = nzb+1, nzt
388             threadsum = threadsum + ABS( d(k,j,i) )
389          ENDDO
390       ENDDO
391    ENDDO
392    !$acc end parallel loop
393    localsum = localsum + threadsum * dt_3d * &
394                          weight_pres(intermediate_timestep_count)
395    !$OMP END PARALLEL
396#endif
397
398!
399!-- For completeness, set the divergence sum of all statistic regions to those
400!-- of the total domain
401    sums_divold_l(0:statistic_regions) = localsum
402
403    CALL cpu_log( log_point_s(1), 'divergence', 'stop' )
404
405!
406!-- Compute the pressure perturbation solving the Poisson equation
407    IF ( psolver(1:7) == 'poisfft' )  THEN
408
409!
410!--    Solve Poisson equation via FFT and solution of tridiagonal matrices
411       IF ( psolver == 'poisfft' )  THEN
412
413          CALL poisfft( d, tend )
414
415       ENDIF
416
417!
418!--    Store computed perturbation pressure and set boundary condition in
419!--    z-direction
420       !$OMP PARALLEL DO
421       !$acc kernels present( d, tend )
422       DO  i = nxl, nxr
423          DO  j = nys, nyn
424             DO  k = nzb+1, nzt
425                tend(k,j,i) = d(k,j,i)
426             ENDDO
427          ENDDO
428       ENDDO
429       !$acc end kernels
430
431!
432!--    Bottom boundary:
433!--    This condition is only required for internal output. The pressure
434!--    gradient (dp(nzb+1)-dp(nzb))/dz is not used anywhere else.
435       IF ( ibc_p_b == 1 )  THEN
436!
437!--       Neumann (dp/dz = 0)
438          !$OMP PARALLEL DO
439          !$acc kernels present( nzb_s_inner, tend )
440          DO  i = nxlg, nxrg
441             DO  j = nysg, nyng
442                tend(nzb_s_inner(j,i),j,i) = tend(nzb_s_inner(j,i)+1,j,i)
443             ENDDO
444          ENDDO
445          !$acc end kernels
446
447       ELSE
448!
449!--       Dirichlet
450          !$OMP PARALLEL DO
451          !$acc kernels present( tend )
452          DO  i = nxlg, nxrg
453             DO  j = nysg, nyng
454                tend(nzb_s_inner(j,i),j,i) = 0.0
455             ENDDO
456          ENDDO
457          !$acc end kernels
458
459       ENDIF
460
461!
462!--    Top boundary
463       IF ( ibc_p_t == 1 )  THEN
464!
465!--       Neumann
466          !$OMP PARALLEL DO
467          !$acc kernels present( tend )
468          DO  i = nxlg, nxrg
469             DO  j = nysg, nyng
470                tend(nzt+1,j,i) = tend(nzt,j,i)
471             ENDDO
472          ENDDO
473          !$acc end kernels
474
475       ELSE
476!
477!--       Dirichlet
478          !$OMP PARALLEL DO
479          !$acc kernels present( tend )
480          DO  i = nxlg, nxrg
481             DO  j = nysg, nyng
482                tend(nzt+1,j,i) = 0.0
483             ENDDO
484          ENDDO
485          !$acc end kernels
486
487       ENDIF
488
489!
490!--    Exchange boundaries for p
491       IF ( numprocs == 1 )  THEN    ! workaround for single-core GPU runs
492          on_device = .TRUE.         ! to be removed after complete porting
493       ELSE                          ! of ghost point exchange
494          !$acc update host( tend )
495       ENDIF
496       CALL exchange_horiz( tend, nbgp )
497       IF ( numprocs == 1 )  THEN    ! workaround for single-core GPU runs
498          on_device = .FALSE.        ! to be removed after complete porting
499       ELSE                          ! of ghost point exchange
500          !$acc update device( tend )
501       ENDIF
502     
503    ELSEIF ( psolver == 'sor' )  THEN
504
505!
506!--    Solve Poisson equation for perturbation pressure using SOR-Red/Black
507!--    scheme
508       CALL sor( d, ddzu_pres, ddzw, p_loc )
509       tend = p_loc
510
511    ELSEIF ( psolver == 'multigrid' )  THEN
512
513!
514!--    Solve Poisson equation for perturbation pressure using Multigrid scheme,
515!--    array tend is used to store the residuals, logical exchange_mg is used
516!--    to discern data exchange in multigrid ( 1 ghostpoint ) and normal grid
517!--    ( nbgp ghost points ).
518
519!--    If the number of grid points of the gathered grid, which is collected
520!--    on PE0, is larger than the number of grid points of an PE, than array
521!--    tend will be enlarged.
522       IF ( gathered_size > subdomain_size )  THEN
523          DEALLOCATE( tend )
524          ALLOCATE( tend(nzb:nzt_mg(mg_switch_to_pe0_level)+1,nys_mg(          &
525                    mg_switch_to_pe0_level)-1:nyn_mg(mg_switch_to_pe0_level)+1,&
526                    nxl_mg(mg_switch_to_pe0_level)-1:nxr_mg(                   &
527                    mg_switch_to_pe0_level)+1) )
528       ENDIF
529
530       CALL poismg( tend )
531
532       IF ( gathered_size > subdomain_size )  THEN
533          DEALLOCATE( tend )
534          ALLOCATE( tend(nzb:nzt+1,nysg:nyng,nxlg:nxrg) )
535       ENDIF
536
537!
538!--    Restore perturbation pressure on tend because this array is used
539!--    further below to correct the velocity fields
540       DO  i = nxl-1, nxr+1
541          DO  j = nys-1, nyn+1
542             DO  k = nzb, nzt+1
543                tend(k,j,i) = p_loc(k,j,i)
544             ENDDO
545          ENDDO
546       ENDDO
547
548    ENDIF
549
550!
551!-- Store perturbation pressure on array p, used for pressure data output.
552!-- Ghost layers are added in the output routines (except sor-method: see below)
553    IF ( intermediate_timestep_count == 1 )  THEN
554       !$OMP PARALLEL PRIVATE (i,j,k)
555       !$OMP DO
556       !$acc kernels present( p, tend, weight_substep )
557       !$acc loop independent
558       DO  i = nxl-1, nxr+1
559          !$acc loop independent
560          DO  j = nys-1, nyn+1
561             !$acc loop independent
562             DO  k = nzb, nzt+1
563                p(k,j,i) = tend(k,j,i) * &
564                           weight_substep(intermediate_timestep_count)
565             ENDDO
566          ENDDO
567       ENDDO
568       !$acc end kernels
569       !$OMP END PARALLEL
570
571    ELSE 
572       !$OMP PARALLEL PRIVATE (i,j,k)
573       !$OMP DO
574       !$acc kernels present( p, tend, weight_substep )
575       !$acc loop independent
576       DO  i = nxl-1, nxr+1
577          !$acc loop independent
578          DO  j = nys-1, nyn+1
579             !$acc loop independent
580             DO  k = nzb, nzt+1
581                p(k,j,i) = p(k,j,i) + tend(k,j,i) * &
582                           weight_substep(intermediate_timestep_count)
583             ENDDO
584          ENDDO
585       ENDDO
586       !$acc end kernels
587       !$OMP END PARALLEL
588
589    ENDIF
590       
591!
592!-- SOR-method needs ghost layers for the next timestep
593    IF ( psolver == 'sor' )  CALL exchange_horiz( p, nbgp )
594
595!
596!-- Correction of the provisional velocities with the current perturbation
597!-- pressure just computed
598    IF ( conserve_volume_flow  .AND.  ( bc_lr_cyc .OR. bc_ns_cyc ) )  THEN
599       volume_flow_l(1) = 0.0
600       volume_flow_l(2) = 0.0
601    ENDIF
602
603    !$OMP PARALLEL PRIVATE (i,j,k)
604    !$OMP DO
605    !$acc kernels present( ddzu, nzb_u_inner, nzb_v_inner, nzb_w_inner, tend, u, v, w, weight_pres )
606    !$acc loop independent
607    DO  i = nxl, nxr   
608       !$acc loop independent
609       DO  j = nys, nyn
610          !$acc loop independent
611          DO  k = 1, nzt
612             IF ( k > nzb_w_inner(j,i) )  THEN
613                w(k,j,i) = w(k,j,i) - dt_3d *                                 &
614                           ( tend(k+1,j,i) - tend(k,j,i) ) * ddzu(k+1) *      &
615                           weight_pres(intermediate_timestep_count)
616             ENDIF
617          ENDDO
618          !$acc loop independent
619          DO  k = 1, nzt
620             IF ( k > nzb_u_inner(j,i) )  THEN
621                u(k,j,i) = u(k,j,i) - dt_3d *                                 &
622                           ( tend(k,j,i) - tend(k,j,i-1) ) * ddx *            &
623                           weight_pres(intermediate_timestep_count)
624             ENDIF
625          ENDDO
626          !$acc loop independent
627          DO  k = 1, nzt
628             IF ( k > nzb_v_inner(j,i) )  THEN
629                v(k,j,i) = v(k,j,i) - dt_3d *                                 &
630                           ( tend(k,j,i) - tend(k,j-1,i) ) * ddy *            &
631                           weight_pres(intermediate_timestep_count)
632             ENDIF
633          ENDDO                                                         
634
635       ENDDO
636    ENDDO
637    !$acc end kernels
638    !$OMP END PARALLEL
639
640!
641!-- Sum up the volume flow through the right and north boundary
642    IF ( conserve_volume_flow  .AND.  bc_lr_cyc  .AND.  bc_ns_cyc  .AND.  &
643         nxr == nx )  THEN
644
645       !$OMP PARALLEL PRIVATE (j,k)
646       !$OMP DO
647       DO  j = nys, nyn
648          !$OMP CRITICAL
649          DO  k = nzb_2d(j,nx) + 1, nzt
650             volume_flow_l(1) = volume_flow_l(1) + u(k,j,nx) * dzw(k)
651          ENDDO
652          !$OMP END CRITICAL
653       ENDDO
654       !$OMP END PARALLEL
655
656    ENDIF
657
658    IF ( conserve_volume_flow  .AND.  bc_ns_cyc  .AND.  bc_lr_cyc  .AND.  &
659         nyn == ny )  THEN
660
661       !$OMP PARALLEL PRIVATE (i,k)
662       !$OMP DO
663       DO  i = nxl, nxr
664          !$OMP CRITICAL
665          DO  k = nzb_2d(ny,i) + 1, nzt
666             volume_flow_l(2) = volume_flow_l(2) + v(k,ny,i) * dzw(k)
667           ENDDO
668          !$OMP END CRITICAL
669       ENDDO
670       !$OMP END PARALLEL
671
672    ENDIF
673   
674!
675!-- Conserve the volume flow
676    IF ( conserve_volume_flow  .AND.  ( bc_lr_cyc  .AND.  bc_ns_cyc ) )  THEN
677
678#if defined( __parallel )   
679       IF ( collective_wait )  CALL MPI_BARRIER( comm2d, ierr )
680       CALL MPI_ALLREDUCE( volume_flow_l(1), volume_flow(1), 2, MPI_REAL, &
681                           MPI_SUM, comm2d, ierr ) 
682#else
683       volume_flow = volume_flow_l 
684#endif   
685
686       volume_flow_offset = ( volume_flow_initial - volume_flow ) / &
687                            volume_flow_area
688
689       !$OMP PARALLEL PRIVATE (i,j,k)
690       !$OMP DO
691       DO  i = nxl, nxr
692          DO  j = nys, nyn
693             DO  k = nzb_u_inner(j,i) + 1, nzt
694                u(k,j,i) = u(k,j,i) + volume_flow_offset(1)
695             ENDDO
696             DO k = nzb_v_inner(j,i) + 1, nzt
697                v(k,j,i) = v(k,j,i) + volume_flow_offset(2)
698             ENDDO
699          ENDDO
700       ENDDO
701
702       !$OMP END PARALLEL
703
704    ENDIF
705
706!
707!-- Exchange of boundaries for the velocities
708    IF ( numprocs == 1 )  THEN    ! workaround for single-core GPU runs
709       on_device = .TRUE.         ! to be removed after complete porting
710    ELSE                          ! of ghost point exchange
711       !$acc update host( u, v, w )
712    ENDIF
713    CALL exchange_horiz( u, nbgp )
714    CALL exchange_horiz( v, nbgp )
715    CALL exchange_horiz( w, nbgp )
716    IF ( numprocs == 1 )  THEN    ! workaround for single-core GPU runs
717       on_device = .FALSE.        ! to be removed after complete porting
718    ELSE                          ! of ghost point exchange
719       !$acc update device( u, v, w )
720    ENDIF
721
722!
723!-- Compute the divergence of the corrected velocity field,
724!-- a possible PE-sum is computed in flow_statistics
725    CALL cpu_log( log_point_s(1), 'divergence', 'start' )
726    sums_divnew_l = 0.0
727
728!
729!-- d must be reset to zero because it can contain nonzero values below the
730!-- topography
731    IF ( topography /= 'flat' )  d = 0.0
732
733    localsum  = 0.0
734    threadsum = 0.0
735
736    !$OMP PARALLEL PRIVATE (i,j,k) FIRSTPRIVATE(threadsum) REDUCTION(+:localsum)
737    !$OMP DO SCHEDULE( STATIC )
738#if defined( __ibm )
739    DO  i = nxl, nxr
740       DO  j = nys, nyn
741          DO  k = nzb_s_inner(j,i)+1, nzt
742             d(k,j,i) = ( u(k,j,i+1) - u(k,j,i) ) * ddx + &
743                        ( v(k,j+1,i) - v(k,j,i) ) * ddy + &
744                        ( w(k,j,i) - w(k-1,j,i) ) * ddzw(k)
745          ENDDO
746          DO  k = nzb+1, nzt
747             threadsum = threadsum + ABS( d(k,j,i) )
748          ENDDO
749       ENDDO
750    ENDDO
751#else
752    !$acc kernels present( d, ddzw, rflags_s_inner, u, v, w )
753    !$acc loop collapse( 3 )
754    DO  i = nxl, nxr
755       DO  j = nys, nyn
756          DO  k = 1, nzt
757                d(k,j,i) = ( ( u(k,j,i+1) - u(k,j,i) ) * ddx +   &
758                             ( v(k,j+1,i) - v(k,j,i) ) * ddy +   &
759                             ( w(k,j,i) - w(k-1,j,i) ) * ddzw(k) &
760                           ) * rflags_s_inner(k,j,i)
761          ENDDO
762       ENDDO
763    ENDDO
764    !$acc end kernels
765!
766!-- Compute possible PE-sum of divergences for flow_statistics
767    !$acc parallel loop collapse(3) present( d ) reduction(+:threadsum)
768    DO  i = nxl, nxr
769       DO  j = nys, nyn
770          DO  k = nzb+1, nzt
771             threadsum = threadsum + ABS( d(k,j,i) )
772          ENDDO
773       ENDDO
774    ENDDO
775    !$acc end parallel loop
776#endif
777
778    localsum = localsum + threadsum
779    !$OMP END PARALLEL
780
781!
782!-- For completeness, set the divergence sum of all statistic regions to those
783!-- of the total domain
784    sums_divnew_l(0:statistic_regions) = localsum
785
786    CALL cpu_log( log_point_s(1), 'divergence', 'stop' )
787
788    CALL cpu_log( log_point(8), 'pres', 'stop' )
789
790
791 END SUBROUTINE pres
Note: See TracBrowser for help on using the repository browser.