Timestamp:
Mar 8, 2013 11:54:10 PM
Author:
raasch
Message:

New:
---

GPU porting of pres, swap_timelevel. Adjustments of OpenACC directives.
Further porting of poisfft, which now runs completely on GPU without any
host/device data transfer for serial and parallel runs (but parallel runs
require data transfer before and after the MPI transpositions).
GPU-porting of tridiagonal solver:
tridiagonal routines split into external subroutines (instead of using CONTAINS),
no distinction between parallel/non-parallel in poisfft and tridia any more,
tridia routines moved to end of file because of probable bug in PGI compiler
(otherwise "invalid device function" is indicated during runtime).
(cuda_fft_interfaces, fft_xy, flow_statistics, init_3d_model, palm, poisfft, pres, prognostic_equations, swap_timelevel, time_integration, transpose)
output of accelerator board information. (header)

optimization of tridia routines: constant elements and coefficients of tri are
stored in separate arrays ddzuw and tric, last dimension of tri reduced from 5 to 2
(init_grid, init_3d_model, modules, palm, poisfft)

poisfft_init is now called internally from poisfft,
(Makefile, Makefile_check, init_pegrid, poisfft, poisfft_hybrid)

CPU-time per grid point and timestep is output to CPU_MEASURES file
(cpu_statistics, modules, time_integration)

Changed:
---

resorting from/to array work changed, work now has 4 dimensions instead of 1 (transpose)
array diss allocated only if required (init_3d_model)

pressure boundary condition "Neumann+inhomo" removed from the code
(check_parameters, header, poisfft, poisfft_hybrid, pres)

Errors:
---

bugfix: dependency added for cuda_fft_interfaces (Makefile)
bugfix: CUDA fft plans adjusted for domain decomposition (before they always
used total domain) (fft_xy)

File:
1 edited

  • palm/trunk/SOURCE/header.f90

--- palm/trunk/SOURCE/header.f90 (r1109)
+++ palm/trunk/SOURCE/header.f90 (r1111)
@@ -20,5 +20,6 @@
 ! Current revisions:
 ! -----------------
-!
+! output of accelerator board information
+! ibc_p_b = 2 removed
 !
 ! Former revisions:
@@ -291,4 +292,5 @@
                           threads_per_task, pdims(1), pdims(2), TRIM( char1 )
     ENDIF
+    IF ( num_acc_per_node /= 0 )  WRITE ( io, 117 )  num_acc_per_node
     IF ( ( host(1:3) == 'ibm'  .OR.  host(1:3) == 'nec'  .OR.    &
            host(1:2) == 'lc'   .OR.  host(1:3) == 'dec' )  .AND. &
@@ -305,4 +307,6 @@
        WRITE ( io, 108 )  maximum_parallel_io_streams
     ENDIF
+#else
+    IF ( num_acc_per_node /= 0 )  WRITE ( io, 120 )  num_acc_per_node
 #endif
     WRITE ( io, 99 )
@@ -593,6 +597,4 @@
     ELSEIF ( ibc_p_b == 1 )  THEN
        runten = 'p(0)     = p(1)   |'
-    ELSE
-       runten = 'p(0)     = p(1) +R|'
     ENDIF
     IF ( ibc_p_t == 0 )  THEN
@@ -1613,4 +1615,5 @@
             37X,'independent precursor runs'/             &
             37X,42('-'))
+117 FORMAT (' Accelerator boards / node:  ',I2)
 #endif
 110 FORMAT (/' Numerical Schemes:'/ &
@@ -1627,4 +1630,5 @@
             '     translation velocity = ',A/ &
             '     distance advected ',A,':  ',F8.3,' km(x)  ',F8.3,' km(y)')
+120 FORMAT (' Accelerator boards: ',8X,I2)
 122 FORMAT (' --> Time differencing scheme: ',A)
 123 FORMAT (' --> Rayleigh-Damping active, starts ',A,' z = ',F8.2,' m'/ &
@@ -1680,5 +1684,5 @@
              ' CPU-time used:       ',F9.3,' s     per timestep:               ', &
                '  ',F9.3,' s'/                                                    &
-             '                                   per second of simulated tim',    &
+             '                                      per second of simulated tim', &
                'e: ',F9.3,' s')
 207 FORMAT ( ' Coupling start time: ',F9.3,' s')