Timestamp:
Mar 8, 2013 11:54:10 PM
Author:
raasch
Message:

New:
---

GPU porting of pres, swap_timelevel. Adjustments of OpenACC directives.
Further porting of poisfft, which now runs completely on the GPU without any
host/device data transfer for serial and parallel runs (but parallel runs
require data transfer before and after the MPI transpositions).
GPU porting of the tridiagonal solver:
tridiagonal routines split into external subroutines (instead of using CONTAINS),
no distinction between parallel/non-parallel in poisfft and tridia any more,
tridia routines moved to the end of the file because of a probable bug in the
PGI compiler (otherwise "invalid device function" is reported at runtime).
(cuda_fft_interfaces, fft_xy, flow_statistics, init_3d_model, palm, poisfft, pres, prognostic_equations, swap_timelevel, time_integration, transpose)
output of accelerator board information. (header)

optimization of tridia routines: constant elements and coefficients of tri are
stored in separate arrays ddzuw and tric, last dimension of tri reduced from 5 to 2
(init_grid, init_3d_model, modules, palm, poisfft)
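The idea behind this optimization can be illustrated with a minimal, generic sketch of the Thomas algorithm. It is not the PALM tridia code: the names off_diag, shift, cp and dp, the sizes, and all values are hypothetical. It only shows that matrix elements which are identical for every system can be stored once, so that per-system storage is needed only for what actually varies.

```fortran
PROGRAM tridia_sketch

   IMPLICIT NONE

   INTEGER, PARAMETER ::  n = 4       ! hypothetical number of unknowns
   INTEGER, PARAMETER ::  nsys = 2    ! hypothetical number of systems

   INTEGER ::  j, k
   REAL    ::  denom, diag

   REAL, DIMENSION(n)    ::  cp, dp, off_diag, rhs, x
   REAL, DIMENSION(nsys) ::  shift

!
!-- Constant off-diagonal elements: identical for every system, so they are
!-- set up once and stored separately (hypothetical values)
   off_diag = -1.0
!
!-- Per-system contribution to the main diagonal (hypothetical values)
   shift = (/ 0.0, 1.0 /)

   DO  j = 1, nsys

      diag = 2.0 + shift(j)
!
!--   Right-hand side chosen so that the exact solution is x(k) = 1
      rhs    = shift(j)
      rhs(1) = 1.0 + shift(j)
      rhs(n) = 1.0 + shift(j)
!
!--   Forward elimination
      cp(1) = off_diag(1) / diag
      dp(1) = rhs(1) / diag
      DO  k = 2, n
         denom = diag - off_diag(k) * cp(k-1)
         cp(k) = off_diag(k) / denom
         dp(k) = ( rhs(k) - off_diag(k) * dp(k-1) ) / denom
      ENDDO
!
!--   Back substitution
      x(n) = dp(n)
      DO  k = n-1, 1, -1
         x(k) = dp(k) - cp(k) * x(k+1)
      ENDDO

      WRITE ( *, '(A,I2,A,4F8.4)' )  'system', j, ':  x =', x

   ENDDO

END PROGRAM tridia_sketch
```

Both systems are constructed so that every component of x should come out close to 1.0, which gives a quick sanity check of the elimination and back substitution.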

poisfft_init is now called internally from poisfft,
(Makefile, Makefile_check, init_pegrid, poisfft, poisfft_hybrid)

CPU-time per grid point and timestep is output to CPU_MEASURES file
(cpu_statistics, modules, time_integration)
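As a rough illustration of this measure, the sketch below normalizes a total CPU time by the number of grid points and timesteps. Only nr_timesteps_this_run is a name from this changeset; the other variable names, the grid-point count (nx+1)*(ny+1)*nz, the unit, and all values are assumptions made for the example and may differ from what cpu_statistics actually does.

```fortran
PROGRAM cpu_per_gridpoint_sketch

   IMPLICIT NONE

   INTEGER ::  nr_timesteps_this_run   ! counter name taken from modules.f90
   INTEGER ::  nx, ny, nz              ! hypothetical grid dimensions
   REAL    ::  cpu_per_point_and_step  ! hypothetical result variable
   REAL    ::  total_cpu_time          ! hypothetical measured CPU time (s)

!
!-- Hypothetical example values
   nx = 63;  ny = 63;  nz = 64
   nr_timesteps_this_run = 1000
   total_cpu_time        = 512.0

!
!-- Normalize the CPU time by the number of grid points and timesteps
!-- (grid-point count (nx+1)*(ny+1)*nz is an assumption of this sketch)
   cpu_per_point_and_step = total_cpu_time /                                   &
                            ( REAL( ( nx + 1 ) * ( ny + 1 ) * nz ) *           &
                              REAL( nr_timesteps_this_run ) )

   WRITE ( *, '(A,ES12.4,A)' )  'CPU time per grid point and timestep: ',      &
                                cpu_per_point_and_step, ' s'

END PROGRAM cpu_per_gridpoint_sketch
```

Since nr_timesteps_this_run is initialized to 0 in modules.f90, a real implementation would presumably need to make sure the counter is non-zero before dividing.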

Changed:
-------

resorting from/to array work changed, work now has 4 dimensions instead of 1 (transpose)
array diss allocated only if required (init_3d_model)

pressure boundary condition "Neumann+inhomo" removed from the code
(check_parameters, header, poisfft, poisfft_hybrid, pres)

Errors:
------

bugfix: dependency added for cuda_fft_interfaces (Makefile)
bugfix: CUDA fft plans adjusted for domain decomposition (previously they
always used the total domain) (fft_xy)

File:
1 edited

  • palm/trunk/SOURCE/modules.f90

--- palm/trunk/SOURCE/modules.f90 (r1107)
+++ palm/trunk/SOURCE/modules.f90 (r1111)
@@ -20,5 +20,5 @@
 ! Current revisions:
 ! ------------------
-!
+! +tric, nr_timesteps_this_run
 !
 ! Former revisions:
@@ -407,12 +407,12 @@
 
     REAL, DIMENSION(:,:), ALLOCATABLE ::                                       &
-          c_u, c_v, c_w, diss_s_e, diss_s_nr, diss_s_pt, diss_s_q, diss_s_qr,  &
-          diss_s_sa, diss_s_u, diss_s_v, diss_s_w, dzu_mg, dzw_mg, flux_s_e,   &
-          flux_s_nr, flux_s_pt, flux_s_q, flux_s_qr, flux_s_sa, flux_s_u,      &
-          flux_s_v, flux_s_w, f1_mg, f2_mg, f3_mg, mean_inflow_profiles, nrs,  &
-          nrsws, nrswst, pt_slope_ref, qs, qsws, qswst, qswst_remote, qrs,     &
-          qrsws, qrswst, rif, saswsb, saswst, shf, total_2d_a, total_2d_o, ts, &
-          tswst, us, usws, uswst, vsws, vswst, z0, z0h
-
+          c_u, c_v, c_w, diss_s_e, diss_s_nr, diss_s_pt, diss_s_q,             &
+          diss_s_qr, diss_s_sa, diss_s_u, diss_s_v, diss_s_w, dzu_mg, dzw_mg,  &
+          flux_s_e, flux_s_nr, flux_s_pt, flux_s_q, flux_s_qr, flux_s_sa,      &
+          flux_s_u, flux_s_v, flux_s_w, f1_mg, f2_mg, f3_mg,                   &
+          mean_inflow_profiles, nrs, nrsws, nrswst, pt_slope_ref, qs, qsws,    &
+          qswst, qswst_remote, qrs, qrsws, qrswst, rif, saswsb, saswst, shf,  &
+          total_2d_a, total_2d_o, ts, tswst, us, usws, uswst, vsws, vswst, z0, &
+          z0h
 
     REAL, DIMENSION(:,:,:), ALLOCATABLE ::                                     &
@@ -422,6 +422,6 @@
           flux_l_qr, flux_l_sa, flux_l_u, flux_l_v, flux_l_w, kh, km, lad_s,   &
           lad_u, lad_v, lad_w, lai, l_wall, p_loc, sec, sls, tend, tend_pt,    &
-          tend_nr, tend_q, tend_qr, u_m_l, u_m_n, u_m_r, u_m_s, v_m_l, v_m_n,  &
-          v_m_r, v_m_s, w_m_l, w_m_n, w_m_r, w_m_s
+          tend_nr, tend_q, tend_qr, tric, u_m_l, u_m_n, u_m_r, u_m_s, v_m_l,   &
+          v_m_n, v_m_r, v_m_s, w_m_l, w_m_n, w_m_r, w_m_s
 
 
@@ -684,6 +684,6 @@
                 maximum_parallel_io_streams = -1, max_pr_user = 0, &
                 mgcycles = 0, mg_cycles = -1, mg_switch_to_pe0_level = 0, mid, &
-                netcdf_data_format = 2, ngsrb = 2, nsor = 20, &
-                nsor_ini = 100, n_sor, normalizing_region = 0, &
+                netcdf_data_format = 2, ngsrb = 2, nr_timesteps_this_run = 0, &
+                nsor = 20, nsor_ini = 100, n_sor, normalizing_region = 0, &
                 nz_do3d = -9999, pch_index = 0, prt_time_count = 0, &
                 recycling_plane, runnr = 0, &