source: palm/trunk/SOURCE/cpu_statistics.f90 @ 1111

Last change on this file since 1111 was 1111, checked in by raasch, 11 years ago

New:
---

GPU porting of pres, swap_timelevel. Adjustments of OpenACC directives.
Further porting of poisfft, which now runs completely on the GPU without any
host/device data transfer for serial and parallel runs (but parallel runs
require data transfer before and after the MPI transpositions).
GPU porting of the tridiagonal solver:
tridiagonal routines split into external subroutines (instead of using CONTAINS),
no distinction between parallel/non-parallel in poisfft and tridia any more,
tridia routines moved to the end of the file because of a probable bug in the PGI
compiler (otherwise "invalid device function" is reported at runtime).
(cuda_fft_interfaces, fft_xy, flow_statistics, init_3d_model, palm, poisfft, pres, prognostic_equations, swap_timelevel, time_integration, transpose)
output of accelerator board information (header)

optimization of tridia routines: constant elements and coefficients of tri are
stored in separate arrays ddzuw and tric, last dimension of tri reduced from 5 to 2,
(init_grid, init_3d_model, modules, palm, poisfft)

poisfft_init is now called internally from poisfft,
(Makefile, Makefile_check, init_pegrid, poisfft, poisfft_hybrid)

CPU-time per grid point and timestep is output to CPU_MEASURES file
(cpu_statistics, modules, time_integration)
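
For reference, this new output boils down to the following calculation (it mirrors
the code in cpu_statistics shown below; log_point(1)%sum holds the total CPU time
of the run, and the result is given in microseconds per grid point and timestep):

   IF ( nr_timesteps_this_run /= 0 )  THEN
!
!--   total CPU time divided by the number of grid points and timesteps
      average_cputime = log_point(1)%sum / REAL( (nx+1) * (ny+1) * nz ) / &
                        REAL( nr_timesteps_this_run ) * 1E6   ! in micro-sec
   ELSE
      average_cputime = -1.0   ! no timesteps yet, value flagged as invalid
   ENDIF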

Changed:


resorting from/to array work changed, work now has 4 dimensions instead of 1 (transpose)
array diss allocated only if required (init_3d_model)

pressure boundary condition "Neumann+inhomo" removed from the code
(check_parameters, header, poisfft, poisfft_hybrid, pres)

Errors:


bugfix: dependency added for cuda_fft_interfaces (Makefile)
bugfix: CUDA FFT plans adjusted for domain decomposition (before, they always
used the total domain) (fft_xy)
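
A minimal sketch of the idea behind this fix, assuming a Fortran wrapper around
cuFFT's cufftPlan1d (the wrapper name CUFFT_PLAN1D, the plan handle plan_xf and
the subdomain bounds nys, nyn, nzb_x, nzt_x are assumptions for illustration,
not quoted from fft_xy):

   ! hypothetical sketch: the batch count of the 1d x-FFT plan is taken from
   ! the subdomain extent of the transposed array handled by this PE, no longer
   ! from the total (nx+1)*(ny+1)*nz domain
   CALL CUFFT_PLAN1D( plan_xf, nx+1, CUFFT_D2Z, (nyn-nys+1) * (nzt_x-nzb_x+1) )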

  • Property svn:keywords set to Id
File size: 11.6 KB
 SUBROUTINE cpu_statistics

!--------------------------------------------------------------------------------!
! This file is part of PALM.
!
! PALM is free software: you can redistribute it and/or modify it under the terms
! of the GNU General Public License as published by the Free Software Foundation,
! either version 3 of the License, or (at your option) any later version.
!
! PALM is distributed in the hope that it will be useful, but WITHOUT ANY
! WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR
! A PARTICULAR PURPOSE.  See the GNU General Public License for more details.
!
! You should have received a copy of the GNU General Public License along with
! PALM. If not, see <http://www.gnu.org/licenses/>.
!
! Copyright 1997-2012  Leibniz University Hannover
!--------------------------------------------------------------------------------!
!
! Current revisions:
! -----------------
! output of grid point numbers and average CPU time per grid point and timestep
!
! Former revisions:
! -----------------
! $Id: cpu_statistics.f90 1111 2013-03-08 23:54:10Z raasch $
!
! 1092 2013-02-02 11:24:22Z raasch
! unused variables removed
!
! 1036 2012-10-22 13:43:42Z raasch
! code put under GPL (PALM 3.9)
!
! 1015 2012-09-27 09:23:24Z raasch
! output of accelerator board information
!
! 683 2011-02-09 14:25:15Z raasch
! output of handling of ghostpoint exchange
!
! 622 2010-12-10 08:08:13Z raasch
! output of handling of collective operations
!
! 222 2009-01-12 16:04:16Z letzel
! Bugfix for nonparallel execution
!
! 197 2008-09-16 15:29:03Z raasch
! Format adjustments in order to allow CPU# > 999,
! data are collected from PE0 in an ordered sequence which seems to avoid
! hanging of processes on SGI-ICE
!
! 82 2007-04-16 15:40:52Z raasch
! Preprocessor directives for old systems removed
!
! RCS Log replace by Id keyword, revision history cleaned up
!
! Revision 1.13  2006/04/26 12:10:51  raasch
! Output of number of threads per task, max = min in case of 1 PE
!
! Revision 1.1  1997/07/24 11:11:11  raasch
! Initial revision
!
!
! Description:
! ------------
! Analysis and output of the cpu-times measured. All PE results are collected
! on PE0 in order to calculate the mean cpu-time over all PEs and other
! statistics. The output is sorted according to the amount of cpu-time consumed
! and output on PE0.
!------------------------------------------------------------------------------!

    USE control_parameters
    USE cpulog
    USE indices,  ONLY: nx, ny, nz
    USE pegrid

    IMPLICIT NONE

    INTEGER    ::  i, ii(1), iii, sender
    REAL       ::  average_cputime
    REAL, SAVE ::  norm = 1.0
    REAL, DIMENSION(:),   ALLOCATABLE ::  pe_max, pe_min, pe_rms, sum
    REAL, DIMENSION(:,:), ALLOCATABLE ::  pe_log_points


!
!-- Compute cpu-times in seconds
    log_point%mtime  = log_point%mtime  / norm
    log_point%sum    = log_point%sum    / norm
    log_point%vector = log_point%vector / norm
    WHERE ( log_point%counts /= 0 )
       log_point%mean = log_point%sum / log_point%counts
    END WHERE


!
!-- Collect cpu-times from all PEs and calculate statistics
    IF ( myid == 0 )  THEN
!
!--    Allocate and initialize temporary arrays needed for statistics
       ALLOCATE( pe_max( SIZE( log_point ) ), pe_min( SIZE( log_point ) ), &
                 pe_rms( SIZE( log_point ) ),                              &
                 pe_log_points( SIZE( log_point ), 0:numprocs-1 ) )
       pe_min = log_point%sum
       pe_max = log_point%sum    ! need to be set in case of 1 PE
       pe_rms = 0.0

#if defined( __parallel )
!
!--    Receive data from all PEs
       DO  i = 1, numprocs-1
          CALL MPI_RECV( pe_max(1), SIZE( log_point ), MPI_REAL, &
                         i, i, comm2d, status, ierr )
          sender = status(MPI_SOURCE)
          pe_log_points(:,sender) = pe_max
       ENDDO
       pe_log_points(:,0) = log_point%sum   ! Results from PE0
!
!--    Calculate mean of all PEs, store it on log_point%sum
!--    and find minimum and maximum
       DO  iii = 1, SIZE( log_point )
          DO  i = 1, numprocs-1
             log_point(iii)%sum = log_point(iii)%sum + pe_log_points(iii,i)
             pe_min(iii) = MIN( pe_min(iii), pe_log_points(iii,i) )
             pe_max(iii) = MAX( pe_max(iii), pe_log_points(iii,i) )
          ENDDO
          log_point(iii)%sum = log_point(iii)%sum / numprocs
!
!--       Calculate rms
          DO  i = 0, numprocs-1
             pe_rms(iii) = pe_rms(iii) + ( &
                                 pe_log_points(iii,i) - log_point(iii)%sum &
                                         )**2
          ENDDO
          pe_rms(iii) = SQRT( pe_rms(iii) / numprocs )
       ENDDO
    ELSE
!
!--    Send data to PE0 (pe_max is used as temporary storage to send
!--    the data in order to avoid sending the data type log)
       ALLOCATE( pe_max( SIZE( log_point ) ) )
       pe_max = log_point%sum
       CALL MPI_SEND( pe_max(1), SIZE( log_point ), MPI_REAL, 0, myid, comm2d, &
                      ierr )
#endif

    ENDIF

!
!-- Write cpu-times
    IF ( myid == 0 )  THEN
!
!--    Re-store sums
       ALLOCATE( sum( SIZE( log_point ) ) )
       WHERE ( log_point%counts /= 0 )
          sum = log_point%sum
       ELSEWHERE
          sum = -1.0
       ENDWHERE

!
!--    Get total time in order to calculate CPU-time per gridpoint and timestep
       IF ( nr_timesteps_this_run /= 0 )  THEN
          average_cputime = log_point(1)%sum / REAL( (nx+1) * (ny+1) * nz ) / &
                            REAL( nr_timesteps_this_run ) * 1E6  ! in micro-sec
       ELSE
          average_cputime = -1.0
       ENDIF

!
!--    Write cpu-times sorted by size
       CALL check_open( 18 )
#if defined( __parallel )
       WRITE ( 18, 100 )  TRIM( run_description_header ),                          &
                          numprocs * threads_per_task, pdims(1), pdims(2),         &
                          threads_per_task, nx+1, ny+1, nz, nr_timesteps_this_run, &
                          average_cputime

       IF ( num_acc_per_node /= 0 )  WRITE ( 18, 108 )  num_acc_per_node
       WRITE ( 18, 110 )
#else
       WRITE ( 18, 100 )  TRIM( run_description_header ),                          &
                          numprocs * threads_per_task, 1, 1,                       &
                          threads_per_task, nx+1, ny+1, nz, nr_timesteps_this_run, &
                          average_cputime

       IF ( num_acc_per_node /= 0 )  WRITE ( 18, 109 )  num_acc_per_node
       WRITE ( 18, 110 )
#endif
       DO
          ii = MAXLOC( sum )
          i = ii(1)
          IF ( sum(i) /= -1.0 )  THEN
             WRITE ( 18, 102 ) &
                log_point(i)%place, log_point(i)%sum,                &
                log_point(i)%sum / log_point(1)%sum * 100.0,         &
                log_point(i)%counts, pe_min(i), pe_max(i), pe_rms(i)
             sum(i) = -1.0
          ELSE
             EXIT
          ENDIF
       ENDDO
    ENDIF


!
!-- The same procedure again for the individual measurements.
!
!-- Compute cpu-times in seconds
    log_point_s%mtime  = log_point_s%mtime  / norm
    log_point_s%sum    = log_point_s%sum    / norm
    log_point_s%vector = log_point_s%vector / norm
    WHERE ( log_point_s%counts /= 0 )
       log_point_s%mean = log_point_s%sum / log_point_s%counts
    END WHERE

!
!-- Collect cpu-times from all PEs and calculate statistics
#if defined( __parallel )
!
!-- Set barrier in order to avoid that PE0 receives log_point_s-data
!-- while still busy with receiving log_point-data (see above)
    CALL MPI_BARRIER( comm2d, ierr )
#endif
    IF ( myid == 0 )  THEN
!
!--    Initialize temporary arrays needed for statistics
       pe_min = log_point_s%sum
       pe_max = log_point_s%sum    ! need to be set in case of 1 PE
       pe_rms = 0.0

#if defined( __parallel )
!
!--    Receive data from all PEs
       DO  i = 1, numprocs-1
          CALL MPI_RECV( pe_max(1), SIZE( log_point ), MPI_REAL, &
                         MPI_ANY_SOURCE, MPI_ANY_TAG, comm2d, status, ierr )
          sender = status(MPI_SOURCE)
          pe_log_points(:,sender) = pe_max
       ENDDO
       pe_log_points(:,0) = log_point_s%sum   ! Results from PE0
!
!--    Calculate mean of all PEs, store it on log_point_s%sum
!--    and find minimum and maximum
       DO  iii = 1, SIZE( log_point )
          DO  i = 1, numprocs-1
             log_point_s(iii)%sum = log_point_s(iii)%sum + pe_log_points(iii,i)
             pe_min(iii) = MIN( pe_min(iii), pe_log_points(iii,i) )
             pe_max(iii) = MAX( pe_max(iii), pe_log_points(iii,i) )
          ENDDO
          log_point_s(iii)%sum = log_point_s(iii)%sum / numprocs
!
!--       Calculate rms
          DO  i = 0, numprocs-1
             pe_rms(iii) = pe_rms(iii) + ( &
                                 pe_log_points(iii,i) - log_point_s(iii)%sum &
                                         )**2
          ENDDO
          pe_rms(iii) = SQRT( pe_rms(iii) / numprocs )
       ENDDO
    ELSE
!
!--    Send data to PE0 (pe_max is used as temporary storage to send
!--    the data in order to avoid sending the data type log)
       pe_max = log_point_s%sum
       CALL MPI_SEND( pe_max(1), SIZE( log_point ), MPI_REAL, 0, 0, comm2d, &
                      ierr )
#endif

    ENDIF

!
!-- Write cpu-times
    IF ( myid == 0 )  THEN
!
!--    Re-store sums
       WHERE ( log_point_s%counts /= 0 )
          sum = log_point_s%sum
       ELSEWHERE
          sum = -1.0
       ENDWHERE

!
!--    Write cpu-times sorted by size
       WRITE ( 18, 101 )
       DO
          ii = MAXLOC( sum )
          i = ii(1)
          IF ( sum(i) /= -1.0 )  THEN
             WRITE ( 18, 102 ) &
                log_point_s(i)%place, log_point_s(i)%sum, &
                log_point_s(i)%sum / log_point(1)%sum * 100.0, &
                log_point_s(i)%counts, pe_min(i), pe_max(i), pe_rms(i)
             sum(i) = -1.0
          ELSE
             EXIT
          ENDIF
       ENDDO

!
!--    Output of handling of MPI operations
       IF ( collective_wait )  THEN
          WRITE ( 18, 103 )
       ELSE
          WRITE ( 18, 104 )
       ENDIF
       IF ( synchronous_exchange )  THEN
          WRITE ( 18, 105 )
       ELSE
          WRITE ( 18, 106 )
       ENDIF

!
!--    Empty lines in order to create a gap to the results of the model
!--    continuation runs
       WRITE ( 18, 107 )

!
!--    Unit 18 is not needed anymore
       CALL close_file( 18 )

    ENDIF


100 FORMAT (A/11('-')//'CPU measures for ',I5,' PEs (',I5,'(x) * ',I5,'(y', &
            &') tasks *',I5,' threads):'//                                  &
            'gridpoints (x/y/z): ',20X,I5,' * ',I5,' * ',I5/                &
            'nr of timesteps: ',22X,I6/                                     &
            'cpu time per grid point and timestep: ',5X,F8.5,' * 10**-6 s')

101 FORMAT (/'special measures:'/ &
            &'-----------------------------------------------------------', &
            &'--------------------')

102 FORMAT (A20,2X,F9.3,2X,F7.2,1X,I7,3(1X,F9.3))
103 FORMAT (/'Barriers are set in front of collective operations')
104 FORMAT (/'No barriers are set in front of collective operations')
105 FORMAT (/'Exchange of ghostpoints via MPI_SENDRCV')
106 FORMAT (/'Exchange of ghostpoints via MPI_ISEND/MPI_IRECV')
107 FORMAT (//)
108 FORMAT ('Accelerator boards per node: ',14X,I2)
109 FORMAT ('Accelerator boards: ',23X,I2)
110 FORMAT ('----------------------------------------------------------',   &
            &'------------'//&
            &'place:                        mean        counts      min  ', &
            &'     max       rms'/ &
            &'                           sec.      %                sec. ', &
            &'     sec.      sec.'/  &
            &'-----------------------------------------------------------', &
            &'-------------------')

 END SUBROUTINE cpu_statistics