4.5.1
NetCDF data output
The standard data output of
PALM is NetCDF (Network
Common Data Form) in 64-bit offset format. NetCDF is an
interface to a library of data access functions for
storing and retrieving data in the form of arrays. NetCDF is an
abstraction that supports a view of data as a collection of
self-describing, portable
objects that can be accessed through a simple interface (portable means
that NetCDF data files can be read on any machine regardless of where
they
have been created). Array values may be accessed directly, without
knowing details of how the data are stored. Auxiliary information about
the data, such as what units are used, may be stored with the data.
Generic utilities and application programs can access NetCDF datasets
(files) and transform, combine, analyze, or display specified fields of
the data, e.g. the contents of a NetCDF dataset can be viewed using the
command ncdump
(see further below).
Many (public domain) graphics packages have built-in interfaces to read
NetCDF datasets (e.g. ferret
or NCL).
The complete NetCDF documentation
is available from the NetCDF
homepage. The NetCDF tutorial for FORTRAN90 can also be found
on our
web server.
The general output format of PALM data is determined by the runtime parameter data_output_format (by default, data_output_format = 'netcdf'). For historical reasons, some alternative formats can be selected (see data_output_format).
The accuracy of the NetCDF output data can be set with parameter netcdf_precision.
By default, data have single (4 byte) precision. Runtime-parameter netcdf_data_format
can be used to choose between the different NetCDF file formats
(classic, 64-bit offset, NetCDF4/HDF5). The 64-bit offset format allows
creating large files (file size only limited by the underlying file
system), but each output variable (array) is still limited to 2GB. In
NetCDF4 format, there is no limit for the size of variables, and it
also allows parallel I/O into one output file. However, some (graphic)
software still does not support NetCDF4 format.
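To judge whether a planned output stays within the per-variable limit mentioned above, the array size can be estimated beforehand. The following is a minimal sketch, not part of PALM: the function names are hypothetical, "2GB" is read as 2 GiB, and the limit is applied to the complete array (all time levels), as the text states.

```python
# Rough size estimate for one PALM output variable.
# All names below are illustrative and not part of PALM.

BYTES_PER_VALUE = {"single": 4, "double": 8}  # single (4 byte) precision is the default
LIMIT_64BIT_OFFSET = 2 * 1024**3              # "2GB" from the text, read as 2 GiB (assumption)


def variable_size_bytes(nx, ny, nz, n_times, precision="single"):
    """Bytes needed for an array with shape (n_times, nz, ny, nx)."""
    return nx * ny * nz * n_times * BYTES_PER_VALUE[precision]


def fits_64bit_offset(size_bytes):
    """True if the array stays below the per-variable limit assumed above."""
    return size_bytes < LIMIT_64BIT_OFFSET


# xy cross sections of the example discussed later: 41 x 41 points, 2 levels, 4 times
small = variable_size_bytes(41, 41, 2, 4)
# a hypothetical large 3d volume output: 1024 x 1024 x 512 points, 10 times
large = variable_size_bytes(1024, 1024, 512, 10)

print(small, fits_64bit_offset(small))
print(large, fits_64bit_offset(large))
```

If the estimate exceeds the limit, reducing the output (e.g. with nz_do3d) or switching to the NetCDF4 format via netcdf_data_format are the options named in the text.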
PALM allows the output of various data (e.g. cross sections, vertical profiles, timeseries, etc.) into different files. The following table gives an overview of the different kinds of NetCDF output data offered by PALM. Besides the local names of the files, the table also lists the minimum parameter settings necessary to switch on the output, as well as the parameters used to control the output.
kind of data | local filename | parameter settings necessary to switch on output | further parameters for output control |
vertical profiles | DATA_1D_PR_NETCDF | data_output_pr, dt_data_output (or dt_dopr) | averaging_interval (or averaging_interval_pr), data_output_format, dt_averaging_input, dt_averaging_input_pr, skip_time_data_output (or skip_time_dopr), statistic_regions |
timeseries | DATA_1D_TS_NETCDF | dt_dots | data_output_format, statistic_regions |
spectra | DATA_1D_SP_NETCDF | comp_spectra_level, data_output_sp, dt_data_output (or dt_dosp), spectra_direction | averaging_interval (or averaging_interval_sp), data_output_format, dt_averaging_input_pr, skip_time_data_output (or skip_time_dosp) |
2d cross section (xy) | DATA_2D_XY_NETCDF | data_output (or data_output_user), dt_data_output (or dt_do2d_xy), section_xy | data_output_format, data_output_2d_on_each_pe, do2d_at_begin, skip_time_data_output (or skip_time_do2d_xy) |
2d cross section (xy), time-averaged | DATA_2D_XY_AV_NETCDF | data_output (or data_output_user), dt_data_output (or dt_data_output_av or dt_do2d_xy), section_xy | averaging_interval, dt_averaging_input, data_output_format, data_output_2d_on_each_pe, do2d_at_begin, skip_time_data_output (or skip_time_data_output_av, or skip_time_do2d_xy) |
2d cross section (xz) | DATA_2D_XZ_NETCDF | data_output (or data_output_user), dt_data_output (or dt_do2d_xz), section_xz | data_output_format, data_output_2d_on_each_pe, do2d_at_begin, skip_time_data_output (or skip_time_do2d_xz) |
2d cross section (xz), time-averaged | DATA_2D_XZ_AV_NETCDF | data_output (or data_output_user), dt_data_output (or dt_data_output_av or dt_do2d_xz), section_xz | averaging_interval, dt_averaging_input, data_output_format, data_output_2d_on_each_pe, do2d_at_begin, skip_time_data_output (or skip_time_data_output_av, or skip_time_do2d_xz) |
2d cross section (yz) | DATA_2D_YZ_NETCDF | data_output (or data_output_user), dt_data_output (or dt_do2d_yz), section_yz | data_output_format, data_output_2d_on_each_pe, do2d_at_begin, skip_time_data_output (or skip_time_do2d_yz) |
2d cross section (yz), time-averaged | DATA_2D_YZ_AV_NETCDF | data_output (or data_output_user), dt_data_output (or dt_data_output_av or dt_do2d_yz), section_yz | averaging_interval, dt_averaging_input, data_output_format, data_output_2d_on_each_pe, do2d_at_begin, skip_time_data_output (or skip_time_data_output_av, or skip_time_do2d_yz) |
3d volume | DATA_3D_NETCDF | data_output (or data_output_user), dt_data_output (or dt_do3d) | data_output_format, do3d_at_begin, nz_do3d, skip_time_data_output (or skip_time_do3d) |
3d volume, time-averaged | DATA_3D_AV_NETCDF | data_output (or data_output_user), dt_data_output (or dt_data_output_av or dt_do3d) | averaging_interval, dt_averaging_input, data_output_format, do3d_at_begin, nz_do3d, skip_time_data_output (or skip_time_data_output_av, or skip_time_do3d) |
particle timeseries | DATA_1D_PTS_NETCDF | dt_data_output (or dt_dopts) | |
particle attributes | DATA_PRT_NETCDF | dt_write_particle_data | |
Creating, contents and
post-processing of a PALM NetCDF file
This section describes, step by step, the creation, storage, and post-processing of PALM NetCDF datasets, using the output of 2d horizontal (xy) cross sections as an example. The parameter settings described below are those of the example parameter file (see chapter 4.4.1), so this parameter file can be used to retrace the following explanations.
- Output of xy cross sections requires setting at least three parameters: first, the temporal interval of the output (run parameter dt_data_output or dt_do2d_xy); second, the names of the quantities for which cross section output is wanted (data_output); and third, the position (height level given as gridpoint index) of the cross sections (section_xy). The string '_xy' must be appended to the name strings assigned to data_output in either case. Output times cannot be defined directly but only via the output time interval, starting from the beginning of the initial 3d run (t=0, but no cross sections are written at the time t=0; for exceptions see do2d_at_begin). As an exception, the first output time can be set independently with parameter skip_time_data_output (or skip_time_do2d_xy).
Very important:
If
no values have been assigned to data_output , dt_data_output (or dt_do2d_xy), and section_xy,
or
if the values given for dt_data_output
(or dt_do2d_xy) or skip_time_data_output
(or skip_time_do2d_xy)
are
larger than the simulated time (see end_time),
then there will be no output!
For
output of time-averaged data, the string '_av' has to be
additionally appended to the respective name string (see data_output).
-
Instantaneous data are
output in NetCDF
format
into the
local file DATA_2D_XY_NETCDF.
This file must be linked with a permanent file by
using a file connection statement in the mrun
configuration
file (see e.g. chapter
3.2). At the end of the run the local file is copied to this
file. Such a statement can look like this:
DATA_2D_XY_NETCDF out:loc:tr xy# ~/$fname/OUTPUT/$fname _xy nc
If the respective mrun call is like
mrun -d test -r "xy#" ...
then the local file DATA_2D_XY_NETCDF is copied to the permanent file ~/test/OUTPUT/test/test_xy.nc. Note, however, that the character string 'xy#' activating the file connection statement (see third column of the statement) must be given in the mrun call as argument of the option -r (and/or -o). If this is forgotten, the model still writes the data to the local file, but the local file is not copied to the permanent file, and the data are therefore not available to the user after the run has finished.
The
last (6th) column of the file connection statement, which defines the
additional file suffix, should be the string 'nc', because many
application programs expect NetCDF files to have the file extension '.nc'. (This
additional suffix given in the 6th column is always put at the very end
of the filename, even
in case of cycle numbers.)
Time averaged data are
output into local file DATA_2D_XY_AV_NETCDF
which requires an additional file connection statement
DATA_2D_XY_AV_NETCDF out:loc:tr xy# ~/$fname/OUTPUT/$fname _xy_av nc
- Using netcdf_data_format > 2 or data_output_2d_on_each_pe = .F. generates a single NetCDF file containing data from all processors. However, with parallel runs and data_output_2d_on_each_pe = .T., each PE outputs the data of its subdomain not directly to the NetCDF file but to a separate file named PLOT2D_XY_<processor-Id>, where <processor-Id> is a four-digit number (e.g. PLOT2D_XY_0000). These files have FORTRAN binary format. After PALM has finished, their content is merged into the final local destination file DATA_2D_XY_NETCDF by the program combine_plot_fields. This is done by adding the following output command to the configuration file:
OC:[[ $( echo $localhost | cut -c1-3 ) = imbh ]] && combine_plot_fields.x
Using
this call, possibly existing
files of the other cross sections (xz, yz) and of 3d volume data are
also merged to their respective NetCDF files. The tool writes
informative messages about the actions accomplished into the job
protocol, even if no files were found (i.e. the output command
may remain in the configuration file, even if no appropriate files
are created during the simulation).
- The contents of a NetCDF dataset can easily be inspected with the tool ncdump (which is part of the NetCDF software). It can be used to display the dimension (coordinate) names and lengths; variable names, types, and shapes; attribute names and values; and, optionally, the values of data for all variables or selected variables in a NetCDF dataset. The file content (without displaying the gridpoint data of the quantities) can be displayed with the command
ncdump -c <filename>
Usage of the ncdump command requires that the path to the NetCDF software is set appropriately. On the IMUK-Linux-cluster this path is set by default. On the HLRN-IBM-Regatta, the user has to execute the command
module load netcdf
Please refer to the system documentation or the system administrator on how to set up the correct NetCDF path on the respective host.
An example of how to interpret the ncdump output is given further below.
- There
are several application programs which can be used for graphical
display of NetCDF datasets. One of the easiest ways to display the PALM
data is the ferret
graphical user interface (GUI). On the IMUK-Linux-cluster, this can be
called by
executing the command
ferret -gui
ferret is also
available at HLRN.
Another possible tool is ncview, which is also available at HLRN (see the HLRN documentation). Besides these general tools, the PALM group will develop a graphical interface based on NCL (NCAR Command Language). This interface will be specially designed to display PALM data. Detailed documentation will be linked here as soon as it is available.
- One of the most flexible ways of postprocessing NetCDF data is to read these data into a FORTRAN program. The example program shows how to read 2d or 3d NetCDF datasets created by PALM. Compiling this program requires that the NetCDF library is installed (if necessary, please ask your system administrator). Some compilation instructions are given in the header of the example program.
- By default, each PALM job creates its own NetCDF files. If permanent files with the respective filenames already exist, new files with higher cycle numbers are created. However, in case of a job chain, it is possible to extend the NetCDF datasets created by the initial run with data from the restart run(s). As a result, data of all output times of the complete job chain are contained in one file, and the number of data files to be handled by the user may be reduced significantly.
To extend a NetCDF dataset (created by a previous run of a job chain) with data from the current run, this dataset must be provided as an INPUT file. This may be difficult if PALM is running on a remote host, because typically the output data files from the previous run have already been transferred by mrun to the local workstation with a file connection statement like
DATA_2D_XY_NETCDF out:loc:tr xy#:xyf ~/palm/current_version/JOBS/$fname/OUTPUT _xy nc
and thus they are no longer available on the remote host.
A
workaround for solving this problem is to create an additional copy of
the output file on the remote machine by adding the file connection
statement
DATA_2D_XY_NETCDF out:loc xy#:xyf ~/palm/current_version/JOBS/$fname/OUTPUT _xy nc
This
additional copy can then be accessed from a restart job as an input
file using the file connection statement
DATA_2D_XY_NETCDF in:locopt xyf ~/palm/current_version/JOBS/$fname/OUTPUT _xy nc
Here the file attribute locopt (2nd column) guarantees that the job continues if a permanent file does not exist (e.g. in case of an initial run). Otherwise, the job would be aborted.
Although the dataset created by the last run of a job
chain
will contain data from all selected time levels of the complete job
chain, the main disadvantage of this workaround is that the datasets
created by the remaining jobs (with lower cycle numbers) still exist
and may consume large disc space. They have to be deleted "by hand" by
the user on the local machine as well as on the remote
machine because they only contain redundant data from the
earlier
time levels which are already contained in the dataset created
by
the last job of the job chain.
Note:
Extension
of PALM NetCDF datasets of 2d horizontal cross sections requires that
parameters data_output
and section_xy
for the restart runs are set identical to the initial run. In case of a
value mismatch between initial and restart runs, a warning is issued in
the job protocol file and the dataset will contain only data from those
timelevels calculated within the restart run.
Similar
restrictions apply for all other PALM NetCDF datasets (i.e. profiles,
vertical cross sections, volume data, etc.).
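The way the columns of an output file connection statement combine into a permanent file name can be sketched as follows. This is only an illustration reconstructed from the example in the text (mrun -d test -r "xy#"), not mrun's actual implementation: the function name is hypothetical, cycle numbers are ignored, and splitting the activation column at ':' is an assumption.

```python
def permanent_file(statement, fname, activated):
    """Return the permanent file path for an output statement, or None if inactive.

    statement: the six columns of a file connection statement:
               local file, attributes, activation strings, path, suffix, extension
    fname:     value given to mrun with option -d (replaces $fname)
    activated: strings given to mrun with option -r and/or -o
    """
    local, attributes, activation, path, suffix, extension = statement.split()
    # The statement only takes effect if one of its activation strings was given.
    if not set(activation.split(":")) & set(activated):
        return None  # the local file is written but not copied
    directory = path.replace("$fname", fname)
    return f"{directory}/{fname}{suffix}.{extension}"


statement = "DATA_2D_XY_NETCDF out:loc:tr xy# ~/$fname/OUTPUT/$fname _xy nc"
print(permanent_file(statement, "test", {"xy#"}))  # activated by -r "xy#"
print(permanent_file(statement, "test", set()))    # activation string forgotten
```

The second call illustrates the pitfall described above: without the activation string the statement is ignored and no permanent file is produced.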
Example
of a PALM NetCDF dataset
The
NetCDF dataset described here contains data of instantaneous horizontal
cross sections and has been created using the settings of the example
parameter file (see chapter
4.4.1),
i.e. it contains section data of the w-velocity-component and of the
potential temperature for vertical grid levels with index k = 2 and k = 10,
selected by the respective parameter settings data_output = 'w_xy', 'pt_xy', and section_xy = 2, 10. Output has been
created after every 900 s (dt_data_output
= 900.0).
Because of end_time
= 3600.0,
the file contains data of 4 time levels (t = 900, 1800, 2700, 3600 s).
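The number of time levels follows directly from the output interval and the simulated time. The sketch below computes the nominal output times; the function name is hypothetical, and it assumes that no output is written at t = 0 and that a skip time simply shifts the first output to skip_time + dt_output.

```python
def nominal_output_times(dt_output, end_time, skip_time=0.0):
    """Nominal output times in seconds: skip_time + n * dt_output for n = 1, 2, ...

    Assumptions (not taken from the PALM source code): nothing is written at
    t = 0, and the last output is the last nominal time not exceeding end_time.
    """
    times = []
    n = 1
    while skip_time + n * dt_output <= end_time:
        times.append(skip_time + n * dt_output)
        n += 1
    return times


# Settings of the example: dt_data_output = 900.0, end_time = 3600.0
print(nominal_output_times(900.0, 3600.0))   # four time levels
# An interval larger than the simulated time yields no output at all
print(nominal_output_times(7200.0, 3600.0))
```

The times actually stored in the file can differ slightly from these nominal values, since output can only take place at a model time step.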
Supposing that the name of the NetCDF dataset is example_xy.nc, an analysis of the file contents using the command
ncdump -c example_xy.nc
will create the following output. The original ncdump output is displayed using fixed spacing; additional explanations (each introduced by '!') are given in italics.
netcdf example_xy {                            ! filename
dimensions:                                    ! 41 gridpoints along x and y, 4 timelevels
    time = UNLIMITED ; // (4 currently)        ! unlimited means that additional time levels can be added (e.g. by
                                               ! restart jobs)
    zu_xy = 2 ;                                ! vertical dimension (2, because two cross sections are selected);
    zw_xy = 2 ;                                ! there are two different vertical dimensions zu and zw because due
    zu1_xy = 1 ;                               ! to the staggered grid the z-levels of variables are those of the
    x = 41 ;                                   ! u- or the w-component of the velocity
    y = 41 ;
variables:                                     ! precision, dimensions, and units of the variables
    double time(time) ;                        ! the variables containing the time levels and grid point coordinates
        time:units = "seconds" ;               ! have the same names as the respective dimensions
    double zu_xy(zu_xy) ;
        zu_xy:units = "meters" ;
    double zw_xy(zw_xy) ;
        zw_xy:units = "meters" ;
    double zu1_xy(zu1_xy) ;
        zu1_xy:units = "meters" ;
    double ind_z_xy(zu_xy) ;
        ind_z_xy:units = "gridpoints" ;
    double x(x) ;
        x:units = "meters" ;
    double y(y) ;
        y:units = "meters" ;
    float w_xy(time, zw_xy, y, x) ;            ! array of the vertical velocity; it has 4 dimensions: x and y,
        w_xy:long_name = "w_xy" ;              ! because it is a horizontal cross section, zw_xy, which defines
        w_xy:units = "m/s" ;                   ! the vertical levels of the sections, and time, for the time levels
    float pt_xy(time, zu_xy, y, x) ;           ! array of the potential temperature, which is defined on the u-grid
        pt_xy:long_name = "pt_xy" ;
        pt_xy:units = "K" ;
// global attributes:
        :Conventions = "COARDS" ;
        :title = "PALM 3.0 run: example.00 host: ibmh 13-04-06 15:12:43" ;   ! PALM run-identifier
        :VAR_LIST = ";w_xy;pt_xy;" ;           ! the list of output quantities contained in this dataset;
                                               ! this global attribute can be used by FORTRAN programs to identify
                                               ! and read the quantities contained in the file
data:
 time = 905.3, 1808.98, 2711.98, 3603.59 ;     ! values of the four time levels
 zu_xy = 75, 475 ;                             ! heights of the two selected cross sections (u-grid)
 zw_xy = 100, 500 ;
 zu1_xy = 25 ;
 x = 0, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700,   ! x-coordinates of the gridpoints
    750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350,
    1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950,
    2000 ;
 y = 0, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700,
    750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350,
    1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950,
    2000 ;
}
If the option -c is omitted in the ncdump call, the complete grid point data of all quantities are also written to the terminal.
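The coordinate values in the listing can be reproduced from the grid indices. The sketch below assumes an equidistant grid with 50 m spacing in all directions and the level definitions zu(k) = (k - 0.5) * dz and zw(k) = k * dz; both are inferred from the numbers in the listing, not taken from the PALM source code.

```python
DX = 50.0  # horizontal grid spacing in m (assumption, inferred from the x values)
DZ = 50.0  # vertical grid spacing in m (assumption, inferred from zu_xy and zw_xy)


def zu(k, dz=DZ):
    """Height of level k on the u-grid (scalars and horizontal velocity)."""
    return (k - 0.5) * dz


def zw(k, dz=DZ):
    """Height of level k on the w-grid (vertical velocity)."""
    return k * dz


section_xy = [2, 10]                # grid indices selected in the example
x = [i * DX for i in range(41)]     # 41 gridpoints along x

print([zu(k) for k in section_xy])  # compare with zu_xy in the listing
print([zw(k) for k in section_xy])  # compare with zw_xy in the listing
print(x[0], x[-1], len(x))
```

This shows why pt_xy and w_xy refer to different heights although both were requested for the same indices k = 2 and k = 10.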
The example program shows how to read
this 2d
horizontal cross section dataset from a FORTRAN program (see above).
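The global attribute VAR_LIST shown in the listing is easy to evaluate in any language. As a minimal sketch (the function name is hypothetical, and the attribute string is assumed to have been read from the file already, e.g. with a NetCDF library), the list of quantities can be extracted like this:

```python
def parse_var_list(var_list):
    """Split a VAR_LIST string such as ';w_xy;pt_xy;' into variable names.

    Empty entries caused by the leading and trailing ';' are dropped.
    """
    return [name.strip() for name in var_list.split(";") if name.strip()]


names = parse_var_list(";w_xy;pt_xy;")
print(names)

# A reading program could then loop over the names and request each array in turn.
for name in names:
    print("would read variable:", name)
```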
Last
change: $Id: chapter_4.5.1.html 493 2010-03-01 08:30:24Z fricke $