Timestamp:
Mar 8, 2013 11:54:10 PM (11 years ago)
Author:
raasch
Message:

New:
---

GPU porting of pres and swap_timelevel. Adjustments of OpenACC directives.
Further porting of poisfft, which now runs completely on the GPU without any
host/device data transfer for serial and parallel runs (but parallel runs
still require data transfer before and after the MPI transpositions).
GPU porting of the tridiagonal solver:
tridiagonal routines split into external subroutines (instead of using CONTAINS),
no distinction between parallel/non-parallel in poisfft and tridia any more,
tridia routines moved to the end of the file because of a probable bug in the
PGI compiler (otherwise "invalid device function" is reported at runtime).
(cuda_fft_interfaces, fft_xy, flow_statistics, init_3d_model, palm, poisfft, pres, prognostic_equations, swap_timelevel, time_integration, transpose)
Output of accelerator board information (header)

optimization of tridia routines: constant elements and coefficients of tri are
stored in separate arrays ddzuw and tric, last dimension of tri reduced from 5 to 2
(init_grid, init_3d_model, modules, palm, poisfft)
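
A minimal sketch of the idea behind this optimization, under one plausible reading of the message: the off-diagonal elements of the tridiagonal system depend only on the vertical index and can be shared by all columns, while the main diagonal differs per column. This is not PALM's tridia code. The names lower, upper, diag and solve_columns are hypothetical, and how they map onto ddzuw, tric and tri is an assumption.

```fortran
MODULE tridia_sketch
   IMPLICIT NONE
   INTEGER, PARAMETER :: dp = SELECTED_REAL_KIND( 12 )
CONTAINS
!
!-- Thomas algorithm for many independent columns c of the system
!--    lower(k)*x(k-1,c) + diag(k,c)*x(k,c) + upper(k)*x(k+1,c) = rhs(k,c)
!-- lower(1) and the last element of upper are never referenced.
   SUBROUTINE solve_columns( lower, upper, diag, rhs, x )
      REAL(dp), INTENT(IN)  :: lower(:), upper(:)    ! constant, depend on k only
      REAL(dp), INTENT(IN)  :: diag(:,:), rhs(:,:)   ! (k, column)
      REAL(dp), INTENT(OUT) :: x(:,:)
      REAL(dp) :: piv(SIZE( diag, 1 )), y(SIZE( diag, 1 ))
      REAL(dp) :: m
      INTEGER  :: c, k, n

      n = SIZE( diag, 1 )
      DO  c = 1, SIZE( diag, 2 )
!
!--      Forward elimination
         piv(1) = diag(1,c)
         y(1)   = rhs(1,c)
         DO  k = 2, n
            m      = lower(k) / piv(k-1)
            piv(k) = diag(k,c) - m * upper(k-1)
            y(k)   = rhs(k,c)  - m * y(k-1)
         ENDDO
!
!--      Back substitution
         x(n,c) = y(n) / piv(n)
         DO  k = n-1, 1, -1
            x(k,c) = ( y(k) - upper(k) * x(k+1,c) ) / piv(k)
         ENDDO
      ENDDO
   END SUBROUTINE solve_columns
END MODULE tridia_sketch
```

Storing the shared off-diagonals once, instead of once per column, is what would allow the per-column array to shrink, in the spirit of the reduction of the last dimension of tri from 5 to 2.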

poisfft_init is now called internally from poisfft,
(Makefile, Makefile_check, init_pegrid, poisfft, poisfft_hybrid)

CPU-time per grid point and timestep is output to CPU_MEASURES file
(cpu_statistics, modules, time_integration)
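
The arithmetic behind such a measure can be sketched as follows. This is an illustration, not the code in cpu_statistics: the function name, the grid arguments and the choice of microseconds as the unit are assumptions. Only the counter name nr_timesteps_this_run is taken from this changeset.

```fortran
MODULE cpu_per_gridpoint_sketch
   IMPLICIT NONE
   INTEGER, PARAMETER :: dp = SELECTED_REAL_KIND( 12 )
CONTAINS
!
!-- Average CPU time per grid point and timestep in microseconds.
!-- total_cpu_seconds, n_x, n_y and n_z are hypothetical arguments.
   FUNCTION cpu_per_gridpoint_and_step( total_cpu_seconds, n_x, n_y, n_z,      &
                                        nr_timesteps_this_run )  RESULT( micro_s )
      REAL(dp), INTENT(IN) :: total_cpu_seconds
      INTEGER,  INTENT(IN) :: n_x, n_y, n_z, nr_timesteps_this_run
      REAL(dp) :: micro_s
!
!--   Convert to REAL before multiplying to avoid integer overflow on
!--   large grids
      micro_s = total_cpu_seconds                                              &
                / ( REAL( n_x, dp ) * REAL( n_y, dp ) * REAL( n_z, dp ) )      &
                / REAL( nr_timesteps_this_run, dp ) * 1.0E6_dp
   END FUNCTION cpu_per_gridpoint_and_step
END MODULE cpu_per_gridpoint_sketch
```

For example, 120 s of CPU time on a 100 x 100 x 50 grid over 400 timesteps gives 0.6 microseconds per grid point and timestep. Normalizing in this way makes runs with different grid sizes and run lengths comparable.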

Changed:
-------

resorting from/to array work changed, work now has 4 dimensions instead of 1 (transpose)
array diss allocated only if required (init_3d_model)

pressure boundary condition "Neumann+inhomo" removed from the code
(check_parameters, header, poisfft, poisfft_hybrid, pres)

Errors:
------

bugfix: dependency added for cuda_fft_interfaces (Makefile)
bugfix: CUDA FFT plans adjusted for domain decomposition (previously they
always used the total domain) (fft_xy)

File:
1 edited

  • palm/trunk/SOURCE/time_integration.f90

--- palm/trunk/SOURCE/time_integration.f90  (revision 1093)
+++ palm/trunk/SOURCE/time_integration.f90  (revision 1111)
@@ -20,5 +20,6 @@
 ! Current revisions:
 ! ------------------
-!
+! +internal timestep counter for cpu statistics added,
+! openACC directives updated
 !
 ! Former revisions:
@@ -238,4 +239,5 @@
 !--       Exchange of ghost points (lateral boundary conditions)
           CALL cpu_log( log_point(26), 'exchange-horiz-progn', 'start' )
+          !$acc update host( e_p, pt_p, u_p, v_p, w_p )
           CALL exchange_horiz( u_p, nbgp )
           CALL exchange_horiz( v_p, nbgp )
@@ -272,4 +274,5 @@
 !
 !--       Swap the time levels in preparation for the next time step.
+          !$acc update device( e_p, pt_p, u_p, v_p, w_p )
           CALL swap_timelevel
 
@@ -298,4 +301,5 @@
              time_disturb = time_disturb + dt_3d
              IF ( time_disturb >= dt_disturb )  THEN
+                !$acc update host( u, v )
                 IF ( hom(nzb+5,1,pr_palm,0) < disturbance_energy_limit )  THEN
                    CALL disturb_field( nzb_u_inner, tend, u )
@@ -310,4 +314,5 @@
                    dist_range = 0
                 ENDIF
+                !$acc update device( u, v )
                 time_disturb = time_disturb - dt_disturb
              ENDIF
@@ -321,9 +326,4 @@
              CALL pres
           ENDIF
-!
-!--       Update device memory for calculating diffusion quantities and for next
-!--       timestep
-          !$acc update device( e, pt, u, v, w )
-          !$acc update device( q )  if ( allocated( q ) )
 
 !
@@ -351,7 +351,4 @@
                 CALL prandtl_fluxes
                 CALL cpu_log( log_point(19), 'prandtl_fluxes', 'stop' )
-!
-!++             Statistics still require updates on host
-                !$acc update host( qs, qsws, rif, shf, ts )
              ENDIF
 
@@ -369,7 +366,4 @@
              ENDIF
              CALL cpu_log( log_point(17), 'diffusivities', 'stop' )
-!
-!++          Statistics still require update of diffusivities on host
-             !$acc update host( kh, km )
 
           ENDIF
@@ -379,4 +373,5 @@
 !
 !--    Increase simulation time and output times
+       nr_timesteps_this_run      = nr_timesteps_this_run + 1
        current_timestep_number    = current_timestep_number + 1
        simulated_time             = simulated_time   + dt_3d
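
The directives added in this diff bracket routines that still run on the host (exchange_horiz, disturb_field) with an update host before the call and an update device after it. The following stand-alone sketch shows that pattern on a small array. The module, the routines host_only_shift and step, and the array a are hypothetical and not part of PALM. Without an OpenACC compiler the directives are plain comments and the code runs on the host with the same result.

```fortran
MODULE acc_update_sketch
   IMPLICIT NONE
CONTAINS
!
!-- Stand-in for a routine that has not been ported and works on host memory
   SUBROUTINE host_only_shift( a )
      REAL, INTENT(INOUT) :: a(:)
      a = a + 1.0
   END SUBROUTINE host_only_shift

   SUBROUTINE step( a )
      REAL, INTENT(INOUT) :: a(:)
      INTEGER :: i

      !$acc data copy( a )
!
!--   Device computation
      !$acc parallel loop present( a )
      DO  i = 1, SIZE( a )
         a(i) = 2.0 * a(i)
      ENDDO
!
!--   Bring the host copy up to date, run the host routine, then send
!--   the modified values back to the device
      !$acc update host( a )
      CALL host_only_shift( a )
      !$acc update device( a )
!
!--   Device computation continues with the values set on the host
      !$acc parallel loop present( a )
      DO  i = 1, SIZE( a )
         a(i) = a(i) * a(i)
      ENDDO
      !$acc end data
   END SUBROUTINE step
END MODULE acc_update_sketch
```

Calling step with a = (/ 1.0, 2.0, 3.0 /) yields (/ 9.0, 25.0, 49.0 /). Leaving out either update directive on a GPU build would let the host routine or the second loop work on stale data, which is why each host-side call in the diff is bracketed in this way.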