= [=#gpu]Executing PALM on GPUs =
[[TracNav(doc/install/toc)]]

PALM was first ported to GPUs in the course of its participation in the [[https://www.spec.org/accel/|SPEC-ACCEL]] benchmark. This initial GPU-capable PALM version is r1257. A full GPU port was carried out in 2016, and the source code is available [[source:palm/branches/GPU_porting_eurohack|here]]. A detailed discussion of the porting process and its outcome can be found in [[http://link.springer.com/chapter/10.1007%2F978-3-319-46079-6_35|Knoop et al. (2016)]].

To compile PALM for an NVIDIA Tesla GPU, for example with the PGI 16.5 compiler, CUDA 7.5, and OpenMPI 1.10.3, use the following settings:

- Preprocessor options: {{{-Mpreprocess -DMPI_REAL=MPI_DOUBLE_PRECISION -DMPI_2REAL=MPI_2DOUBLE_PRECISION -D__parallel -D__nopointer -D__lc -D__cuda_fft}}}
- Compiler flags: {{{-fastsse -acc -ta=tesla -Minfo=acc -Mcuda}}}
- Linker flags: {{{-fastsse -acc -ta=tesla -Minfo=acc -Mcuda -lcufft}}}

Be sure to compile and link with the {{{mpif90}}} wrapper of your MPI installation so that the MPI libraries are linked correctly.
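
As an illustration, the three option sets listed above could be combined on the command line roughly as follows. This is a minimal sketch under stated assumptions, not PALM's actual build procedure: the file names {{{example.f90}}} and {{{palm_gpu}}} are placeholders, the {{{DRYRUN}}} switch is a convenience invented for this example, and {{{--showme:command}}} assumes that {{{mpif90}}} is the Open MPI wrapper. By default the script only prints the commands it would run.

{{{#!sh
#!/bin/sh
# Sketch only: option sets copied from the list above.
# File names and the DRYRUN switch are hypothetical.

CPP_OPTS="-Mpreprocess -DMPI_REAL=MPI_DOUBLE_PRECISION -DMPI_2REAL=MPI_2DOUBLE_PRECISION -D__parallel -D__nopointer -D__lc -D__cuda_fft"
COMPILER_OPTS="-fastsse -acc -ta=tesla -Minfo=acc -Mcuda"
LINKER_OPTS="-fastsse -acc -ta=tesla -Minfo=acc -Mcuda -lcufft"

# Dry run by default: every command is prefixed with "echo".
# Call the script with DRYRUN set to an empty value to really execute.
RUN=${DRYRUN-echo}

# Ask the Open MPI wrapper which underlying compiler it invokes
# (expected to be the PGI Fortran compiler in this setup).
$RUN mpif90 --showme:command

# Compile one source file (placeholder name) with preprocessor and compiler options.
$RUN mpif90 $CPP_OPTS $COMPILER_OPTS -c example.f90 -o example.o

# Link through the same wrapper so that the MPI libraries and cuFFT are found.
$RUN mpif90 example.o $LINKER_OPTS -o palm_gpu
}}}

Saved under a hypothetical name such as {{{build_sketch.sh}}}, the script prints the three commands when started with {{{sh build_sketch.sh}}}, and executes them when started with {{{DRYRUN= sh build_sketch.sh}}}. Note that the linker line repeats the compiler flags and adds {{{-lcufft}}}, matching the settings listed above.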