source: palm/trunk/SOURCE/palm.f90 @ 1111

Last change on this file since 1111 was 1111, checked in by raasch, 8 years ago

New:
---

GPU porting of pres, swap_timelevel. Adjustments of OpenACC directives.
Further porting of poisfft, which now runs completely on the GPU without any
host/device data transfer for serial and parallel runs (but parallel runs
require data transfer before and after the MPI transpositions).
GPU porting of tridiagonal solver:
tridiagonal routines split into external subroutines (instead of using CONTAINS),
no distinction between parallel/non-parallel in poisfft and tridia any more,
tridia routines moved to end of file because of a probable bug in the PGI compiler
(otherwise "invalid device function" is reported at runtime).
(cuda_fft_interfaces, fft_xy, flow_statistics, init_3d_model, palm, poisfft, pres, prognostic_equations, swap_timelevel, time_integration, transpose)
output of accelerator board information (header)
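
The note on parallel runs (data transfer before and after the MPI transpositions) can be illustrated with a minimal sketch. It is not taken from poisfft or transpose: the program name, the array f, and the use of TRANSPOSE as a host-side stand-in for the MPI transposition are all assumptions made for illustration. Without an OpenACC compiler the directives are treated as comments and the program runs on the host only.

```fortran
 PROGRAM acc_update_sketch

    IMPLICIT NONE

    INTEGER, PARAMETER   ::  n = 4
    INTEGER              ::  i, j
    REAL, DIMENSION(n,n) ::  f, work

    DO  j = 1, n
       DO  i = 1, n
          f(i,j) = REAL( i + 10 * j )
       ENDDO
    ENDDO

!
!-- f stays resident on the device for the whole data region
    !$acc data copy( f )

!
!-- Device-side computation (hypothetical stand-in for the FFT along one direction)
    !$acc kernels
    DO  j = 1, n
       DO  i = 1, n
          f(i,j) = 2.0 * f(i,j)
       ENDDO
    ENDDO
    !$acc end kernels

!
!-- Refresh the host copy before the host-side step ...
    !$acc update host( f )
    work = TRANSPOSE( f )   ! hypothetical stand-in for the MPI transposition
    f    = work
!
!-- ... and send the result back to the device afterwards
    !$acc update device( f )

    !$acc end data

    PRINT*, 'f(1,2) = ', f(1,2), '  f(2,1) = ', f(2,1)

 END PROGRAM acc_update_sketch
```

The point of the pattern is that only the data needed by the host-side communication crosses the host/device boundary, while everything else remains on the device.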

optimization of tridia routines: constant elements and coefficients of tri are
stored in separate arrays ddzuw and tric, last dimension of tri reduced from 5 to 2
(init_grid, init_3d_model, modules, palm, poisfft)
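
As a rough illustration of why constant coefficients are worth storing separately, here is a generic Thomas-algorithm sketch. It is not the tridia code from poisfft: the program name, the arrays a, b, c, cp, denom, rp and the test matrix are hypothetical. The first part depends on the matrix only and could be computed once, the second part depends on the right-hand side and is repeated for every solve.

```fortran
 PROGRAM tridia_sketch

    IMPLICIT NONE

    INTEGER, PARAMETER ::  n = 5
    INTEGER            ::  k
    REAL, DIMENSION(n) ::  a, b, c, cp, denom, r, rp, x

!
!-- Hypothetical test matrix: -1 on the sub- and superdiagonal, 2 on the
!-- main diagonal (a(1) and c(n) are not part of the system)
    a = -1.0
    b =  2.0
    c = -1.0

!
!-- Part 1: depends on the matrix only, so it can be computed once and stored
    denom(1) = b(1)
    cp(1)    = c(1) / denom(1)
    DO  k = 2, n
       denom(k) = b(k) - a(k) * cp(k-1)
       cp(k)    = c(k) / denom(k)
    ENDDO

!
!-- Part 2: depends on the right-hand side and is repeated for every solve.
!-- This right-hand side belongs to the solution x = (1, 2, 3, 4, 5).
    r    = 0.0
    r(n) = 6.0

    rp(1) = r(1) / denom(1)
    DO  k = 2, n
       rp(k) = ( r(k) - a(k) * rp(k-1) ) / denom(k)
    ENDDO

!
!-- Back substitution
    x(n) = rp(n)
    DO  k = n-1, 1, -1
       x(k) = rp(k) - cp(k) * x(k+1)
    ENDDO

    PRINT*, 'x = ', x

 END PROGRAM tridia_sketch
```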

poisfft_init is now called internally from poisfft
(Makefile, Makefile_check, init_pegrid, poisfft, poisfft_hybrid)

CPU-time per grid point and timestep is output to CPU_MEASURES file
(cpu_statistics, modules, time_integration)
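
The quantity can be sketched as total CPU time divided by the number of grid points times the number of timesteps. The exact definition used in cpu_statistics (which time is taken, which grid points are counted) is not shown here, so the formula, the variable names and all numbers below are assumptions.

```fortran
 PROGRAM cpu_per_gridpoint_sketch

    IMPLICIT NONE

    INTEGER, PARAMETER ::  dp = KIND( 1.0D0 )
    INTEGER            ::  nx, ny, nz, timesteps
    REAL(dp)           ::  cpu_per_gp_and_step, ngp, total_cpu_time

!
!-- Hypothetical run: 256 x 256 x 64 grid points, 1000 timesteps, 1200 s
    nx             = 256
    ny             = 256
    nz             = 64
    timesteps      = 1000
    total_cpu_time = 1200.0_dp

!
!-- The number of grid points is computed in double precision to avoid
!-- integer overflow for large grids
    ngp = REAL( nx, dp ) * REAL( ny, dp ) * REAL( nz, dp )

    cpu_per_gp_and_step = total_cpu_time / ( ngp * REAL( timesteps, dp ) )

    WRITE (*,'(A,ES12.5,A)') 'CPU time per grid point and timestep: ', &
                             cpu_per_gp_and_step, ' s'

 END PROGRAM cpu_per_gridpoint_sketch
```

A normalized value of this kind makes runs with different grid sizes and run lengths comparable.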

Changed:
-------

resorting from/to array work changed, work now has 4 dimensions instead of 1 (transpose)
array diss allocated only if required (init_3d_model)

pressure boundary condition "Neumann+inhomo" removed from the code
(check_parameters, header, poisfft, poisfft_hybrid, pres)

Errors:
------

bugfix: dependency added for cuda_fft_interfaces (Makefile)
bugfix: CUDA fft plans adjusted for domain decomposition (previously they always
used the total domain) (fft_xy)

  • Property svn:keywords set to Id
File size: 10.0 KB
 PROGRAM palm

!--------------------------------------------------------------------------------!
! This file is part of PALM.
!
! PALM is free software: you can redistribute it and/or modify it under the terms
! of the GNU General Public License as published by the Free Software Foundation,
! either version 3 of the License, or (at your option) any later version.
!
! PALM is distributed in the hope that it will be useful, but WITHOUT ANY
! WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
! A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
!
! You should have received a copy of the GNU General Public License along with
! PALM. If not, see <http://www.gnu.org/licenses/>.
!
! Copyright 1997-2012  Leibniz University Hannover
!--------------------------------------------------------------------------------!
!
! Current revisions:
! -----------------
! OpenACC statements updated
!
! Former revisions:
! -----------------
! $Id: palm.f90 1111 2013-03-08 23:54:10Z raasch $
!
! 1092 2013-02-02 11:24:22Z raasch
! unused variables removed
!
! 1036 2012-10-22 13:43:42Z raasch
! code put under GPL (PALM 3.9)
!
! 1015 2012-09-27 09:23:24Z raasch
! Version number changed from 3.8 to 3.8a.
! OpenACC statements added + code changes required for GPU optimization
!
! 849 2012-03-15 10:35:09Z raasch
! write_particles renamed lpm_write_restart_file
!
! 759 2011-09-15 13:58:31Z raasch
! Splitting of parallel I/O, cpu measurement for write_3d_binary and opening
! of unit 14 moved to here
!
! 495 2010-03-02 00:40:15Z raasch
! Particle data for restart runs are only written if write_binary=.T..
!
! 215 2008-11-18 09:54:31Z raasch
! Initialization of coupled runs modified for MPI-1 and moved to external
! subroutine init_coupling
!
! 197 2008-09-16 15:29:03Z raasch
! Workaround for getting information about the coupling mode
!
! 108 2007-08-24 15:10:38Z letzel
! Get coupling mode from environment variable, change location of debug output
!
! 75 2007-03-22 09:54:05Z raasch
! __vtk directives removed, write_particles is called only in case of particle
! advection switched on, open unit 9 for debug output,
! setting of palm version moved from modules to here
!
! RCS Log replaced by Id keyword, revision history cleaned up
!
! Revision 1.10  2006/08/04 14:53:12  raasch
! Distribution of run description header removed, call of header moved behind
! init_3d_model
!
! Revision 1.2  2001/01/25 07:15:06  raasch
! Program name changed to PALM, module test_variables removed.
! Initialization of dvrp logging as well as exit of dvrp moved to new
! subroutines init_dvrp_logging and close_dvrp (file init_dvrp.f90)
!
! Revision 1.1  1997/07/24 11:23:35  raasch
! Initial revision
!
!
! Description:
! ------------
! Large-Eddy Simulation (LES) model for the convective boundary layer,
! optimized for use on parallel machines (implementation realized using the
! Message Passing Interface (MPI)). The model can also be run on vector machines
! (less well optimized) and workstations. Versions for the different types of
! machines are controlled via cpp-directives.
! Model runs are only feasible using the ksh-script mrun.
!------------------------------------------------------------------------------!


    USE arrays_3d
    USE constants
    USE control_parameters
    USE cpulog
    USE dvrp_variables
    USE grid_variables
    USE indices
    USE interfaces
    USE model_1d
    USE particle_attributes
    USE pegrid
    USE spectrum
    USE statistics

#if defined( __openacc )
    USE OPENACC
#endif

    IMPLICIT NONE

!
!-- Local variables
    CHARACTER (LEN=9) ::  time_to_string
    INTEGER           ::  i
#if defined( __openacc )
    REAL, DIMENSION(100) ::  acc_dum
#endif

    version = 'PALM 3.9'

#if defined( __parallel )
!
!-- MPI initialisation. comm2d is set preliminarily, because
!-- it will be defined in init_pegrid but is already used before in cpu_log.
    CALL MPI_INIT( ierr )
    CALL MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )
    CALL MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
    comm_palm = MPI_COMM_WORLD
    comm2d    = MPI_COMM_WORLD

!
!-- Initialize PE topology in case of coupled runs
    CALL init_coupling
#endif

#if defined( __openacc )
!
!-- Get the number of accelerator boards per node and assign the MPI processes
!-- to these boards
    PRINT*, '*** ACC_DEVICE_NVIDIA = ', ACC_DEVICE_NVIDIA
    num_acc_per_node  = ACC_GET_NUM_DEVICES( ACC_DEVICE_NVIDIA )
    IF ( numprocs == 1  .AND.  num_acc_per_node > 0 )  num_acc_per_node = 1
    PRINT*, '*** myid = ', myid, ' num_acc_per_node = ', num_acc_per_node
    acc_rank = MOD( myid, num_acc_per_node )
!    STOP '****'
    CALL ACC_SET_DEVICE_NUM ( acc_rank, ACC_DEVICE_NVIDIA )
!
!-- Test output (to be removed later)
    WRITE (*,'(A,I4,A,I3,A,I3,A,I3)') '*** Connect MPI-Task ', myid,' to CPU ',&
                                      acc_rank, ' Devices: ', num_acc_per_node,&
                                      ' connected to:',                        &
                                      ACC_GET_DEVICE_NUM( ACC_DEVICE_NVIDIA )
#endif

!
!-- Ensure that OpenACC first attaches the GPU devices by copying a dummy data
!-- region
    !$acc data copyin( acc_dum )

!
!-- Initialize measuring of the CPU-time remaining to the run
    CALL local_tremain_ini

!
!-- Start of total CPU time measuring.
    CALL cpu_log( log_point(1), 'total', 'start' )
    CALL cpu_log( log_point(2), 'initialisation', 'start' )

!
!-- Open a file for debug output
    WRITE (myid_char,'(''_'',I4.4)')  myid
    OPEN( 9, FILE='DEBUG'//TRIM( coupling_char )//myid_char, FORM='FORMATTED' )

!
!-- Initialize dvrp logging. Also, one PE may be split from the global
!-- communicator for doing the dvrp output. In that case, the number of
!-- PEs available for PALM is reduced by one and communicator comm_palm
!-- is changed accordingly.
#if defined( __parallel )
    CALL MPI_COMM_RANK( comm_palm, myid, ierr )
!
!-- TEST OUTPUT (TO BE REMOVED)
    WRITE(9,*) '*** coupling_mode = "', TRIM( coupling_mode ), '"'
    CALL LOCAL_FLUSH( 9 )
    IF ( TRIM( coupling_mode ) /= 'uncoupled' )  THEN
       PRINT*, '*** PE', myid, ' Global target PE:', target_id, &
               TRIM( coupling_mode )
    ENDIF
#endif

    CALL init_dvrp_logging

!
!-- Read control parameters from NAMELIST files and read environment-variables
    CALL parin

!
!-- Determine processor topology and local array indices
    CALL init_pegrid

!
!-- Generate grid parameters
    CALL init_grid

!
!-- Check control parameters and deduce further quantities
    CALL check_parameters


!
!-- Initialize all necessary variables
    CALL init_3d_model

!
!-- Output of program header
    IF ( myid == 0 )  CALL header

    CALL cpu_log( log_point(2), 'initialisation', 'stop' )

!
!-- Set start time in format hh:mm:ss
    simulated_time_chr = time_to_string( simulated_time )

!
!-- If required, output of initial arrays
    IF ( do2d_at_begin )  THEN
       CALL data_output_2d( 'xy', 0 )
       CALL data_output_2d( 'xz', 0 )
       CALL data_output_2d( 'yz', 0 )
    ENDIF
    IF ( do3d_at_begin )  THEN
       CALL data_output_3d( 0 )
    ENDIF

!
!-- Declare and initialize variables in the accelerator memory with their
!-- host values
    !$acc  data copyin( d, diss, e, e_p, kh, km, pt, pt_p, q, ql, tend, te_m, tpt_m, tu_m, tv_m, tw_m, u, u_p, v, vpt, v_p, w, w_p )          &
    !$acc       copyin( tric, ddzu, ddzw, dd2zu, l_grid, l_wall, ptdf_x, ptdf_y, pt_init, rdf, rdf_sc, ug, vg, zu, zw )   &
    !$acc       copyin( hom, qs, qsws, qswst, rif, rif_wall, shf, ts, tswst, us, usws, uswst, vsws, vswst, z0, z0h )      &
    !$acc       copyin( fxm, fxp, fym, fyp, fwxm, fwxp, fwym, fwyp, nzb_diff_s_inner, nzb_diff_s_outer, nzb_diff_u )       &
    !$acc       copyin( nzb_diff_v, nzb_s_inner, nzb_s_outer, nzb_u_inner )    &
    !$acc       copyin( nzb_u_outer, nzb_v_inner, nzb_v_outer, nzb_w_inner )   &
    !$acc       copyin( nzb_w_outer, wall_heatflux, wall_e_x, wall_e_y, wall_u, wall_v, wall_w_x, wall_w_y, wall_flags_0 )
!
!-- Integration of the model equations using the timestep scheme
    CALL time_integration

!
!-- If required, write binary data for restart runs
    IF ( write_binary(1:4) == 'true' )  THEN

       CALL cpu_log( log_point(22), 'write_3d_binary', 'start' )

       CALL check_open( 14 )

       DO  i = 0, io_blocks-1
          IF ( i == io_group )  THEN
!
!--          Write flow field data
             CALL write_3d_binary
          ENDIF
#if defined( __parallel )
          CALL MPI_BARRIER( comm2d, ierr )
#endif
       ENDDO

       CALL cpu_log( log_point(22), 'write_3d_binary', 'stop' )

!
!--    If required, write particle data
       IF ( particle_advection )  CALL lpm_write_restart_file
    ENDIF

!
!-- If required, repeat output of header including the required CPU-time
    IF ( myid == 0 )  CALL header
!
!-- If required, final user-defined actions, and
!-- last actions on the open files and close files. Unit 14 was opened
!-- in write_3d_binary but it is closed here, to allow writing on this
!-- unit in routine user_last_actions.
    CALL cpu_log( log_point(4), 'last actions', 'start' )
    DO  i = 0, io_blocks-1
       IF ( i == io_group )  THEN
          CALL user_last_actions
          IF ( write_binary(1:4) == 'true' )  CALL close_file( 14 )
       ENDIF
#if defined( __parallel )
       CALL MPI_BARRIER( comm2d, ierr )
#endif
    ENDDO
    CALL close_file( 0 )
    CALL close_dvrp
    CALL cpu_log( log_point(4), 'last actions', 'stop' )

#if defined( __mpi2 )
!
!-- Test exchange via intercommunicator in case of a MPI-2 coupling
    IF ( coupling_mode == 'atmosphere_to_ocean' )  THEN
       i = 12345 + myid
       CALL MPI_SEND( i, 1, MPI_INTEGER, myid, 11, comm_inter, ierr )
    ELSEIF ( coupling_mode == 'ocean_to_atmosphere' )  THEN
       CALL MPI_RECV( i, 1, MPI_INTEGER, myid, 11, comm_inter, status, ierr )
       PRINT*, '### myid: ', myid, '   received from atmosphere:  i = ', i
    ENDIF
#endif

!
!-- Close the OpenACC data regions (the main data region first, then the
!-- dummy data region)
    !$acc end data
    !$acc end data

!
!-- Take final CPU-time for CPU-time analysis
    CALL cpu_log( log_point(1), 'total', 'stop' )
    CALL cpu_statistics

#if defined( __parallel )
    CALL MPI_FINALIZE( ierr )
#endif

 END PROGRAM palm
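
The assignment of MPI processes to accelerator boards in the listing above (acc_rank = MOD( myid, num_acc_per_node )) is a round-robin mapping. The following stand-alone sketch prints that mapping for hypothetical values of numprocs and num_acc_per_node, without calling MPI or OpenACC. It assumes that the ranks on one node are numbered consecutively. The guard against a node without devices is an addition for this sketch and is not part of the listing.

```fortran
 PROGRAM device_mapping_sketch

    IMPLICIT NONE

    INTEGER ::  acc_rank, myid, num_acc_per_node, numprocs

!
!-- Hypothetical setup: 8 MPI processes, 2 accelerator boards per node
    numprocs         = 8
    num_acc_per_node = 2

!
!-- MOD with a zero divisor is undefined, so stop if no board is available
!-- (guard added for this sketch only)
    IF ( num_acc_per_node < 1 )  STOP 'no accelerator board found'

    DO  myid = 0, numprocs-1
       acc_rank = MOD( myid, num_acc_per_node )
       WRITE (*,'(A,I4,A,I3)') 'MPI task ', myid, ' -> device ', acc_rank
    ENDDO

 END PROGRAM device_mapping_sketch
```

With these values, even-numbered tasks map to device 0 and odd-numbered tasks to device 1.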