Changeset 1111 for palm/trunk/SCRIPTS

Mar 8, 2013 11:54:10 PM (10 years ago)


GPU porting of pres, swap_timelevel. Adjustments of OpenACC directives.
Further porting of poisfft, which now runs completely on GPU without any
host/device data transfer for serial and parallel runs (but parallel runs
require data transfer before and after the MPI transpositions).
GPU-porting of tridiagonal solver:
tridiagonal routines split into external subroutines (instead of using CONTAINS),
no distinction between parallel/non-parallel in poisfft and tridia any more,
tridia routines moved to end of file because of probable bug in PGI compiler
(otherwise "invalid device function" is indicated during runtime).
(cuda_fft_interfaces, fft_xy, flow_statistics, init_3d_model, palm, poisfft, pres, prognostic_equations, swap_timelevel, time_integration, transpose)
output of accelerator board information. (header)

optimization of tridia routines: constant elements and coefficients of tri are
stored in separate arrays ddzuw and tric, last dimension of tri reduced from 5 to 2,
(init_grid, init_3d_model, modules, palm, poisfft)
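
The idea behind this optimization, keeping the parts of a tridiagonal solve that never change apart from the parts that do, can be illustrated with a generic Thomas algorithm. This is only a sketch under stated assumptions: it is not the PALM tridia code, the contents of ddzuw and tric are not shown in this changeset, and the function names `factorize` and `solve` are hypothetical.

```python
def factorize(a, b, c):
    """Precompute everything that does not depend on the right-hand side.

    Row i of the system reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i];
    a[0] and c[-1] are unused. (Generic illustration, not PALM's layout.)
    """
    n = len(b)
    mult = [0.0] * n    # elimination multipliers
    pivot = [0.0] * n   # diagonal after forward elimination
    pivot[0] = b[0]
    for i in range(1, n):
        mult[i] = a[i] / pivot[i - 1]
        pivot[i] = b[i] - mult[i] * c[i - 1]
    return mult, pivot


def solve(mult, pivot, c, d):
    """Solve for one right-hand side d, reusing the stored factors."""
    n = len(d)
    y = [0.0] * n
    y[0] = d[0]
    for i in range(1, n):               # forward sweep on the RHS only
        y[i] = d[i] - mult[i] * y[i - 1]
    x = [0.0] * n
    x[n - 1] = y[n - 1] / pivot[n - 1]
    for i in range(n - 2, -1, -1):      # back substitution
        x[i] = (y[i] - c[i] * x[i + 1]) / pivot[i]
    return x


# Factor once, then reuse the factors for as many right-hand sides as needed.
a = [0.0, -1.0, -1.0]
b = [2.0, 2.0, 2.0]
c = [-1.0, -1.0, 0.0]
mult, pivot = factorize(a, b, c)
x = solve(mult, pivot, c, [0.0, 0.0, 4.0])
```

Because `factorize` runs only once, each later call to `solve` touches just the right-hand side and the stored factors.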

poisfft_init is now called internally from poisfft,
(Makefile, Makefile_check, init_pegrid, poisfft, poisfft_hybrid)

CPU-time per grid point and timestep is output to CPU_MEASURES file
(cpu_statistics, modules, time_integration)


resorting from/to array work changed, work now has 4 dimensions instead of 1 (transpose)
array diss allocated only if required (init_3d_model)

pressure boundary condition "Neumann+inhomo" removed from the code
(check_parameters, header, poisfft, poisfft_hybrid, pres)


bugfix: dependency added for cuda_fft_interfaces (Makefile)
bugfix: CUDA fft plans adjusted for domain decomposition (before they always
used total domain) (fft_xy)

1 edited


  • palm/trunk/SCRIPTS/.mrun.config.imuk_gpu

    • Property svn:keywords set to Id
    r1016 r1111
        1     2  #column 1          column 2                                   column 3
        2     3  #name of variable  value of variable (~ must not be used)     scope
        8     9  %add_source_path   $base_directory/USER_CODE/$fname
        9    10  %depository_path   $base_directory/MAKE_DEPOSITORY
       10        #%use_makefile      true
       11        #
       12        # Enter your own host below by adding another line containing in the second
       13        # column your hostname (as provided by the unix command "hostname") and in the
       14        # third column the host identifier. Depending on your operating system, the
       15        # first characters of the host identifier should be "lc" (Linux cluster), "ibm"
       16        # (IBM-AIX), or "nec" (NEC-SX), respectively.
       18    12  %host_identifier   inferno      lcmuk
       20        # version 27/09/2012
       21        #
             14  # pure MPI version
       22    15  %remote_username   <replace by your IMUK username>               lcmuk parallel pgi
       23    16  %tmp_user_catalog  /localdata                                    lcmuk parallel pgi
       29    22  %lopts             -Mcray=pointer:-fastsse:-r8                   lcmuk parallel pgi
             24  # pure MPI version with debug options
             25  %remote_username   <replace by your IMUK username>               lcmuk parallel pgidbg
             26  %tmp_user_catalog  /localdata                                    lcmuk parallel pgidbg
             27  %compiler_name     mpif90                                        lcmuk parallel pgidbg
             28  %compiler_name_ser pgf90                                         lcmuk parallel pgidbg
             29  %cpp_options       -Mpreprocess:-DMPI_REAL=MPI_DOUBLE_PRECISION:-DMPI_2REAL=MPI_2DOUBLE_PRECISION:-D__nopointer   lcmuk parallel pgidbg
             30  %mopts             -j:4                                          lcmuk parallel pgidbg
             31  %fopts             -Mcray=pointer:-O0:-C:-g:-Mbounds:-Mchkstk:-traceback:-r8   lcmuk parallel pgidbg
             32  %lopts             -Mcray=pointer:-O0:-C:-g:-Mbounds:-Mchkstk:-traceback:-r8   lcmuk parallel pgidbg
             34  # pure GPU version
             35  %remote_username   <replace by your IMUK username>                       lcmuk pgigpu
             36  %tmp_user_catalog  /localdata                                            lcmuk pgigpu
             37  %compiler_name     pgf90                                                 lcmuk pgigpu
             38  %compiler_name_ser pgf90                                                 lcmuk pgigpu
             39  %cpp_options       -Mpreprocess:-D__nopointer:-D__openacc:-D__cuda_fft   lcmuk pgigpu
             40  %mopts             -j:4                                                  lcmuk pgigpu
             41  %fopts             -acc:-ta=nvidia,4.1:-Minfo=acc:-Mcray=pointer:-fastsse:-r8:-Mcuda    lcmuk pgigpu
             42  %lopts             -acc:-ta=nvidia,4.1:-Minfo=acc:-Mcray=pointer:-fastsse:-r8:-Mcuda:-L/localdata/opt/pgi/linux86-64/2012/cuda/4.1/lib64:-lcufft    lcmuk pgigpu
             44  # MPI+GPU
       31    45  %remote_username   <replace by your IMUK username>               lcmuk parallel pgigpu
       32    46  %tmp_user_catalog  /localdata                                    lcmuk parallel pgigpu
       33    47  %compiler_name     mpif90                                        lcmuk parallel pgigpu
       34    48  %compiler_name_ser pgf90                                         lcmuk parallel pgigpu
       35        %cpp_options       -Mpreprocess:-DMPI_REAL=MPI_DOUBLE_PRECISION:-DMPI_2REAL=MPI_2DOUBLE_PRECISION:-D__nopointer:-D__openacc   lcmuk parallel pgigpu
             49  %cpp_options       -Mpreprocess:-DMPI_REAL=MPI_DOUBLE_PRECISION:-DMPI_2REAL=MPI_2DOUBLE_PRECISION:-D__nopointer:-D__openacc:-D__cuda_fft   lcmuk parallel pgigpu
       36    50  %mopts             -j:4                                          lcmuk parallel pgigpu
       37        %fopts             -acc:-ta=nvidia,4.1:-Minfo=acc:-Mcray=pointer:-fastsse:-r8        lcmuk parallel pgigpu
       38        %lopts             -acc:-ta=nvidia,4.1:-Minfo=acc:-Mcray=pointer:-fastsse:-r8        lcmuk parallel pgigpu
             51  %fopts             -acc:-ta=nvidia,4.1:-Minfo=acc:-Mcray=pointer:-fastsse:-r8:-Mcuda    lcmuk parallel pgigpu
             52  %lopts             -acc:-ta=nvidia,4.1:-Minfo=acc:-Mcray=pointer:-fastsse:-r8:-Mcuda:-L/localdata/opt/pgi/linux86-64/2012/cuda/4.1/lib64:-lcufft   lcmuk parallel pgigpu
       40    54  %write_binary                true                             restart
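
The three-column layout named in the file's own header comment (variable name, value, scope) can be read with a few lines of code. The sketch below is illustrative only: `parse_config_line` is a hypothetical helper, not part of mrun, and it assumes that columns are separated by whitespace and that colons inside the value stand in for blanks between individual options, as the `%fopts` and `%lopts` entries suggest.

```python
def parse_config_line(line):
    """Split one '%name value scope' line of the configuration file.

    Returns None for comment lines (leading '#') and blank lines.
    Assumption: the value column contains no blanks, so the placeholder
    '<replace by your IMUK username>' must be filled in before parsing.
    """
    stripped = line.strip()
    if not stripped.startswith("%"):
        return None
    fields = stripped[1:].split(None, 2)   # name, value, remaining scope text
    name = fields[0]
    value = fields[1] if len(fields) > 1 else ""
    scope = fields[2].split() if len(fields) > 2 else []
    return {
        "name": name,
        "options": value.split(":"),       # colon-joined options, one per entry
        "scope": scope,                    # e.g. ['lcmuk', 'parallel', 'pgigpu']
    }


entry = parse_config_line(
    "%fopts   -acc:-ta=nvidia,4.1:-Mcuda    lcmuk parallel pgigpu"
)
```

Here `entry["scope"]` holds the scope keywords after the value column, so the "pure GPU" entries (`lcmuk pgigpu`) and the "MPI+GPU" entries (`lcmuk parallel pgigpu`) can be told apart by the presence of `parallel`.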