4.5.1 NetCDF data output 

The standard data output format of PALM is NetCDF (Network Common Data Form) in 64-bit offset format. NetCDF is an interface to a library of data access functions for storing and retrieving data in the form of arrays. It supports a view of data as a collection of self-describing, portable objects that can be accessed through a simple interface (portable means that NetCDF files can be read on any machine, regardless of where they were created). Array values can be accessed directly, without knowledge of how the data are stored. Auxiliary information about the data, such as the units used, can be stored together with the data. Generic utilities and application programs can access NetCDF datasets (files) and transform, combine, analyze, or display specified fields of the data; for example, the contents of a NetCDF dataset can be viewed with the command ncdump (see further below). Many (public-domain) graphics packages have built-in interfaces for reading NetCDF datasets (e.g. ferret or NCL). The complete NetCDF documentation is available from the NetCDF homepage. The NetCDF tutorial for FORTRAN90 can also be found on our web server.

The general output format of PALM data is determined by the runtime parameter data_output_format (default: data_output_format = 'netcdf'). For historical reasons, some alternative formats can be selected (see data_output_format). The precision of the NetCDF output data can be set with parameter netcdf_precision; by default, data have single (4-byte) precision. The runtime parameter netcdf_data_format can be used to choose between the different NetCDF file formats (classic, 64-bit offset, NetCDF4/HDF5). The 64-bit offset format allows creating large files (the file size is limited only by the underlying file system), but each output variable (array) is still limited to 2GB. In NetCDF4 format, there is no limit on the size of variables, and parallel I/O into a single output file is possible. However, some (graphics) software still does not support the NetCDF4 format.
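You can estimate whether a single output array approaches the per-variable limit from its dimension lengths and the selected precision. The following Python sketch is illustrative only. The function names and the 'single'/'double' keys are hypothetical and are not the values accepted by netcdf_precision. File header overhead is ignored, and the 2GB figure above is interpreted as 2**31 bytes, which is an assumption.

```python
from math import prod

# Bytes per value: single precision (PALM default) = 4, double precision = 8.
# The key names are hypothetical, not netcdf_precision values.
BYTES_PER_VALUE = {"single": 4, "double": 8}

# Per-variable limit taken from the "2GB" figure in the text,
# interpreted here (assumption) as 2**31 bytes.
DEFAULT_LIMIT = 2**31


def array_size_bytes(dims, precision="single"):
    """Size in bytes of one output array with the given dimension lengths."""
    return prod(dims) * BYTES_PER_VALUE[precision]


def fits_limit(dims, precision="single", limit=DEFAULT_LIMIT):
    """True if the whole array stays below the assumed per-variable limit."""
    return array_size_bytes(dims, precision) < limit


# w_xy(time, zw_xy, y, x) of the example dataset below: 4 x 2 x 41 x 41 values
print(array_size_bytes([4, 2, 41, 41]))          # 53792
# a hypothetical 3d field: 100 time levels on a 512 x 512 x 128 grid
print(array_size_bytes([100, 128, 512, 512]))    # 13421772800
print(fits_limit([100, 128, 512, 512]))          # False
```

Doubling the precision doubles the size, so switching to 8-byte output brings large 3d arrays to the limit twice as fast.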

PALM allows the output of various data (e.g. cross sections, vertical profiles, timeseries) into different files. The following table gives an overview of the different kinds of NetCDF output data offered by PALM. Besides the local file names, the table lists the minimum parameter settings necessary to switch on the output, as well as the parameters that can be used to control the output.

Each entry gives the kind of data, followed by the local file name, the parameter settings necessary to switch on output, and further parameters for output control.

vertical profiles
    local file name:  DATA_1D_PR_NETCDF
    switch on output: data_output_pr, dt_data_output (or dt_dopr)
    output control:   averaging_interval (or averaging_interval_pr), data_output_format, dt_averaging_input, dt_averaging_input_pr, skip_time_data_output (or skip_time_dopr), statistic_regions

timeseries
    local file name:  DATA_1D_TS_NETCDF
    switch on output: dt_dots
    output control:   data_output_format, statistic_regions

spectra
    local file name:  DATA_1D_SP_NETCDF
    switch on output: comp_spectra_level, data_output_sp, dt_data_output (or dt_dosp), spectra_direction
    output control:   averaging_interval (or averaging_interval_sp), data_output_format, dt_averaging_input_pr, skip_time_data_output (or skip_time_dosp)

2d cross section (xy)
    local file name:  DATA_2D_XY_NETCDF
    switch on output: data_output (or data_output_user), dt_data_output (or dt_do2d_xy), section_xy
    output control:   data_output_format, data_output_2d_on_each_pe, do2d_at_begin, skip_time_data_output (or skip_time_do2d_xy)

2d cross section (xy), time-averaged
    local file name:  DATA_2D_XY_AV_NETCDF
    switch on output: data_output (or data_output_user), dt_data_output (or dt_data_output_av or dt_do2d_xy), section_xy
    output control:   averaging_interval, dt_averaging_input, data_output_format, data_output_2d_on_each_pe, do2d_at_begin, skip_time_data_output (or skip_time_data_output_av or skip_time_do2d_xy)

2d cross section (xz)
    local file name:  DATA_2D_XZ_NETCDF
    switch on output: data_output (or data_output_user), dt_data_output (or dt_do2d_xz), section_xz
    output control:   data_output_format, data_output_2d_on_each_pe, do2d_at_begin, skip_time_data_output (or skip_time_do2d_xz)

2d cross section (xz), time-averaged
    local file name:  DATA_2D_XZ_AV_NETCDF
    switch on output: data_output (or data_output_user), dt_data_output (or dt_data_output_av or dt_do2d_xz), section_xz
    output control:   averaging_interval, dt_averaging_input, data_output_format, data_output_2d_on_each_pe, do2d_at_begin, skip_time_data_output (or skip_time_data_output_av or skip_time_do2d_xz)

2d cross section (yz)
    local file name:  DATA_2D_YZ_NETCDF
    switch on output: data_output (or data_output_user), dt_data_output (or dt_do2d_yz), section_yz
    output control:   data_output_format, data_output_2d_on_each_pe, do2d_at_begin, skip_time_data_output (or skip_time_do2d_yz)

2d cross section (yz), time-averaged
    local file name:  DATA_2D_YZ_AV_NETCDF
    switch on output: data_output (or data_output_user), dt_data_output (or dt_data_output_av or dt_do2d_yz), section_yz
    output control:   averaging_interval, dt_averaging_input, data_output_format, data_output_2d_on_each_pe, do2d_at_begin, skip_time_data_output (or skip_time_data_output_av or skip_time_do2d_yz)

3d volume
    local file name:  DATA_3D_NETCDF
    switch on output: data_output (or data_output_user), dt_data_output (or dt_do3d)
    output control:   data_output_format, do3d_at_begin, nz_do3d, skip_time_data_output (or skip_time_do3d)

3d volume, time-averaged
    local file name:  DATA_3D_AV_NETCDF
    switch on output: data_output (or data_output_user), dt_data_output (or dt_data_output_av or dt_do3d)
    output control:   averaging_interval, dt_averaging_input, data_output_format, do3d_at_begin, nz_do3d, skip_time_data_output (or skip_time_data_output_av or skip_time_do3d)

particle timeseries
    local file name:  DATA_1D_PTS_NETCDF
    switch on output: dt_data_output (or dt_dopts)

particle attributes
    local file name:  DATA_PRT_NETCDF
    switch on output: dt_write_particle_data
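Two rules can be sketched in a few lines of Python: how output times follow from an output interval, and how data_output names map to the local file names in the table. This is a hypothetical illustration, not PALM code, and the function names are invented. Output times are assumed to be due at t = skip + k·dt (k = 1, 2, ...) up to end_time, where dt stands for dt_data_output and skip for skip_time_data_output. The flag at_begin stands for do2d_at_begin/do3d_at_begin. The '_xz'/'_yz' suffixes are assumed by analogy with '_xy'. Names without a section suffix are assumed to denote 3d volume data.

```python
def output_times(dt, end_time, skip=0.0, at_begin=False):
    """Nominal output times of one kind of output (sketch).

    Assumes outputs are due at t = skip + k*dt for k = 1, 2, ... as long as
    t <= end_time; t = 0 is only included if at_begin is True.
    """
    if dt <= 0:
        raise ValueError("output interval must be positive")
    times = [0.0] if at_begin else []
    k = 1
    while skip + k * dt <= end_time:
        times.append(skip + k * dt)
        k += 1
    return times


def local_file(name):
    """Local NetCDF file receiving a data_output entry (sketch).

    Names ending in '_xy', '_xz' or '_yz' (optionally followed by '_av')
    go to the 2d cross-section files; any other name is assumed here to
    denote 3d volume data.
    """
    averaged = name.endswith("_av")
    base = name[: -len("_av")] if averaged else name
    av = "_AV" if averaged else ""
    for section in ("xy", "xz", "yz"):
        if base.endswith("_" + section):
            return f"DATA_2D_{section.upper()}{av}_NETCDF"
    return f"DATA_3D{av}_NETCDF"


# dt_data_output = 900.0 and end_time = 3600.0, as in the example below
print(output_times(900.0, 3600.0))    # [900.0, 1800.0, 2700.0, 3600.0]
print(output_times(4000.0, 3600.0))   # [] (interval larger than end_time)
for name in ["w_xy", "pt_xy", "pt_xy_av", "u_xz", "u"]:
    print(name, "->", local_file(name))
```

The second call mirrors the warning in step 1 below: if the interval exceeds the simulated time, nothing is written.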


Creation, contents, and post-processing of a PALM NetCDF file

This section describes, step by step, the creation, storage, and post-processing of PALM NetCDF datasets, using the output of 2d horizontal (xy) cross sections as an example. The parameter settings described below are those of the example parameter file (see chapter 4.4.1), so this parameter file can be used to retrace the following explanations.

  1. Output of xy cross sections requires setting at least three parameters: first, the output time interval (run parameter dt_data_output or dt_do2d_xy); second, the names of the quantities for which cross sections are to be output (data_output); and third, the positions of the cross sections, given as grid point indices of the height levels (section_xy). In every case, the string '_xy' must be appended to the names assigned to data_output. Output times cannot be defined directly, only via the output time interval, counted from the beginning of the initial 3d run (t = 0). No cross sections are written at t = 0 itself (for exceptions, see do2d_at_begin). As an exception, the first output time can be controlled independently with parameter skip_time_data_output (or skip_time_do2d_xy).

    Very important:
    If no values have been assigned to data_output, dt_data_output (or dt_do2d_xy), and section_xy, or if the values given for dt_data_output (or dt_do2d_xy) or skip_time_data_output (or skip_time_do2d_xy) are larger than the simulated time (see end_time), there will be no output!

    For output of time-averaged data, the string '_av' must additionally be appended to the respective name (e.g. 'pt_xy_av'; see data_output).


  2. Instantaneous data are output in NetCDF format into the local file DATA_2D_XY_NETCDF. This file must be linked with a permanent file by a file connection statement in the mrun configuration file (see e.g. chapter 3.2). At the end of the run, the local file is copied to this permanent file. Such a statement can look like this:

       DATA_2D_XY_NETCDF out:loc:tr  xy#  ~/$fname/OUTPUT/$fname  _xy nc

    If the respective mrun call is like

       mrun -d test -r "xy#" ...

    then the local file DATA_2D_XY_NETCDF is copied to the permanent file ~/test/OUTPUT/test/test_xy.nc. Note that the character string 'xy#', which activates the file connection statement (third column of the statement), must be given in the mrun call as an argument of option -r (and/or -o). If this is forgotten, the model still writes the data to the local file, but this file is not copied to the permanent file. The data are therefore not available to the user after the run has finished.

    The last (6th) column of the file connection statement, which defines the additional file suffix, should be the string 'nc', because many application programs expect NetCDF files to have the file extension '.nc'. (This additional suffix given in the 6th column is always put at the very end of the filename, even when cycle numbers are added.)

    Time-averaged data are output into the local file DATA_2D_XY_AV_NETCDF, which requires an additional file connection statement:

       DATA_2D_XY_AV_NETCDF out:loc:tr  xy#  ~/$fname/OUTPUT/$fname  _xy_av nc

  3. Using netcdf_data_format > 2 or data_output_2d_on_each_pe = .F. generates a single NetCDF file containing the data of all processors. However, in parallel runs with data_output_2d_on_each_pe = .T., each PE does not write the data of its subdomain directly to the NetCDF file. Instead, it writes them to a separate file named PLOT2D_XY_<processor-Id>, where <processor-Id> is a four-digit number (e.g. PLOT2D_XY_0000). These files are in FORTRAN binary format. After PALM has finished, their contents are merged into the final local destination file DATA_2D_XY_NETCDF by the program combine_plot_fields. This is done by adding the following output command to the configuration file (the test in front of combine_plot_fields.x restricts its execution to certain hosts):

       OC:[[ $( echo $localhost | cut -c1-3 ) = ibm ]] && combine_plot_fields.x

    Using this call, any existing files of the other cross sections (xz, yz) and of 3d volume data are also merged into their respective NetCDF files. The tool writes informative messages about its actions into the job protocol, even if no files were found. The output command may therefore remain in the configuration file even if no such files are created during the simulation.


  4. The contents of a NetCDF dataset can easily be analyzed with the tool ncdump (part of the NetCDF software). It displays the dimension (coordinate) names and lengths; variable names, types, and shapes; attribute names and values; and, optionally, the data values of all or selected variables of a NetCDF dataset. The file contents, without the grid point data of the quantities, can be displayed with the command

       ncdump -c <filename>

    Using the ncdump command requires that the path to the NetCDF software is set appropriately. On the IMUK-Linux-cluster, this path is set by default; on the HLRN-IBM-Regatta, the user has to execute the command

       module load netcdf

    Please refer to the system documentation or ask the system administrator how to set up the correct NetCDF path on the respective host.

    An example of how to interpret the ncdump output is given further below.

  5. There are several application programs which can be used for graphical display of NetCDF datasets. One of the easiest ways to display the PALM data is the ferret graphical user interface (GUI). On the IMUK-Linux-cluster, this can be called by executing the command

       ferret -gui

    ferret is also available at HLRN. Another possible tool is ncview, which is also available at HLRN (see the HLRN documentation). Besides these general tools, the PALM group will develop a graphical interface based on NCL (NCAR Command Language), specially designed to display PALM data. Detailed documentation will be linked here as soon as it is available.


  6. One of the most flexible ways of post-processing NetCDF data is to read them into a FORTRAN program. The example program shows how to read 2d or 3d NetCDF datasets created by PALM. Compiling this program requires that the NetCDF library is installed (if necessary, please ask your system administrator). Some compilation instructions are given in the header of the example program.


  7. By default, each PALM job creates its own NetCDF files. If permanent files with the respective names already exist, new files with higher cycle numbers are created. However, in a job chain it is possible to extend the NetCDF datasets created by the initial run with data from the restart run(s). As a result, the data of all output times of the complete job chain are contained in one file, which may significantly reduce the number of data files the user has to handle.
    Extending a NetCDF dataset created by a previous run of a job chain with data from the current run requires that this dataset is provided as an INPUT file. This may be difficult if PALM is running on a remote host. Typically, mrun has already transferred the output data files from the previous run to the local workstation with a file connection statement like

       DATA_2D_XY_NETCDF  out:loc:tr  xy#:xyf  ~/palm/current_version/JOBS/$fname/OUTPUT  _xy  nc

    and they are thus no longer available on the remote host.
    A workaround for solving this problem is to create an additional copy of the output file on the remote machine by adding the file connection statement

       DATA_2D_XY_NETCDF  out:loc  xy#:xyf  ~/palm/current_version/JOBS/$fname/OUTPUT  _xy  nc

    This additional copy can then be accessed from a restart job as an input file using the file connection statement

       DATA_2D_XY_NETCDF  in:locopt  xyf  ~/palm/current_version/JOBS/$fname/OUTPUT  _xy  nc

    Here the file attribute locopt (2nd column) ensures that the job continues if the permanent file does not exist (e.g. in case of an initial run). Without it, the job would be aborted.
    The dataset created by the last run of a job chain contains data from all selected time levels of the complete job chain. The main disadvantage of this workaround is that the datasets created by the earlier jobs (with lower cycle numbers) still exist and may consume a lot of disk space. They only contain redundant data from earlier time levels, which are already included in the dataset created by the last job of the chain. The user therefore has to delete them "by hand", on the local machine as well as on the remote machine.

    Note:
    Extending PALM NetCDF datasets of 2d horizontal cross sections requires that the parameters data_output and section_xy of the restart runs are set identically to those of the initial run. If the values of the initial and restart runs do not match, a warning is issued in the job protocol file. The dataset will then contain only data from the time levels calculated within the restart run.
    Similar restrictions apply for all other PALM NetCDF datasets (i.e. profiles, vertical cross sections, volume data, etc.).
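The rules for the file connection statements in steps 2 and 7 can be illustrated with a short Python sketch. It covers the six columns, activation via -r, the permanent file name, the locopt behaviour, and the restart note. This is not mrun's implementation, and all function names are hypothetical. The sketch makes four further assumptions:

* The -r argument is a blank-separated list of activation strings.
* An activation column such as 'xy#:xyf' means that either string activates the statement.
* A cycle number is inserted as '.<n>' before the final suffix.
* Input statements without locopt abort the job when the file is missing.

```python
import posixpath


def parse_statement(line):
    """Split a file connection statement into its six columns (sketch)."""
    local, attrs, activation, path, suffix, ext = line.split()[:6]
    return {
        "local": local,                        # local file name used by PALM
        "attributes": attrs.split(":"),        # e.g. ['out', 'loc', 'tr']
        "activation": activation.split(":"),   # e.g. ['xy#', 'xyf']
        "path": path,                          # directory of the permanent file
        "suffix": suffix,                      # e.g. '_xy'
        "ext": ext,                            # e.g. 'nc'
    }


def is_active(stmt, r_option):
    """True if one activation string of the statement appears in the -r argument."""
    given = r_option.split()
    return any(s in given for s in stmt["activation"])


def permanent_file(stmt, fname, cycle=None):
    """<path>/<fname><suffix>[.<cycle>].<ext>; the '.<cycle>' form is assumed."""
    directory = stmt["path"].replace("$fname", fname)
    name = fname + stmt["suffix"]
    if cycle is not None:
        name += f".{cycle}"
    return posixpath.join(directory, f"{name}.{stmt['ext']}")


def provide_input(stmt, file_exists):
    """What happens to an input statement whose permanent file may be missing."""
    if file_exists:
        return "copy"
    if "locopt" in stmt["attributes"]:
        return "skip"   # e.g. initial run of a job chain: the job continues
    raise FileNotFoundError(f"{stmt['local']}: permanent file missing, job aborted")


def extend_times(previous, restart):
    """Time axis after a restart run, following the note in step 7.

    previous/restart: dicts with 'data_output', 'section_xy' and 'times'.
    """
    if (previous["data_output"] == restart["data_output"]
            and previous["section_xy"] == restart["section_xy"]):
        return previous["times"] + restart["times"]
    print("warning: settings differ from previous run, keeping restart data only")
    return list(restart["times"])


xy_out = parse_statement(
    "DATA_2D_XY_NETCDF out:loc:tr xy# ~/$fname/OUTPUT/$fname _xy nc")
print(is_active(xy_out, "xy#"), permanent_file(xy_out, "test"))
# True ~/test/OUTPUT/test/test_xy.nc
```

Using posixpath keeps the '/' separators of the configuration file regardless of the machine running the sketch.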
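The merge that combine_plot_fields performs in step 3 can be pictured with the following conceptual Python sketch. It does not read the FORTRAN binary files, and the tile representation (start indices plus a block of values) and the function names are hypothetical. It only shows the naming of the per-PE files and how subdomain blocks are placed into one global array with grid indices 0..nx and 0..ny.

```python
def pe_file_names(n_pe, prefix="PLOT2D_XY_"):
    """Per-PE file names: prefix followed by the four-digit processor id."""
    return [f"{prefix}{pe:04d}" for pe in range(n_pe)]


def merge_subdomains(tiles, nx, ny):
    """Assemble a global 2d field from subdomain tiles (conceptual sketch).

    tiles: iterable of (i_start, j_start, block), where block[j][i] holds the
    values of one subdomain; grid indices run from 0 to nx and 0 to ny.
    Returns a nested list field[j][i] of shape (ny + 1) x (nx + 1).
    """
    field = [[None] * (nx + 1) for _ in range(ny + 1)]
    for i0, j0, block in tiles:
        for j, row in enumerate(block):
            for i, value in enumerate(row):
                field[j0 + j][i0 + i] = value
    missing = sum(v is None for row in field for v in row)
    if missing:
        raise ValueError(f"{missing} grid points not covered by any subdomain")
    return field


print(pe_file_names(4))
# ['PLOT2D_XY_0000', 'PLOT2D_XY_0001', 'PLOT2D_XY_0002', 'PLOT2D_XY_0003']

# Four hypothetical subdomains of 2 x 2 points covering a 4 x 4 grid (nx = ny = 3)
tiles = [(i0, j0, [[10 * (j0 + j) + (i0 + i) for i in range(2)] for j in range(2)])
         for j0 in (0, 2) for i0 in (0, 2)]
field = merge_subdomains(tiles, nx=3, ny=3)
print(field[3])   # [30, 31, 32, 33]
```

The coverage check catches a decomposition in which some grid points were never written by any PE.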

 
Example of a PALM NetCDF dataset

The NetCDF dataset described here contains data of instantaneous horizontal cross sections and was created using the settings of the example parameter file (see chapter 4.4.1). It contains section data of the w-velocity component and of the potential temperature for the vertical grid levels with index k = 2 and k = 10, selected by the parameter settings data_output = 'w_xy', 'pt_xy', and section_xy = 2, 10. Output was written every 900 s (dt_data_output = 900.0). Because end_time = 3600.0, the file contains data of 4 time levels (t = 900, 1800, 2700, 3600 s).
Assuming that the name of the NetCDF dataset is example_xy.nc, an analysis of the file contents using the command

   ncdump -c example_xy.nc

will create the following output. Explanations that are not part of the original ncdump output have been added as comments starting with '!'.

netcdf example_xy {                        ! filename
dimensions:                                ! 41 gridpoints along x and y, 4 timelevels
    time = UNLIMITED ; // (4 currently)    ! unlimited means that additional time levels can be added (e.g. by
                                           ! restart jobs)
    zu_xy = 2 ;                            ! vertical dimension (2, because two cross sections are selected);
    zw_xy = 2 ;                            ! there are two different vertical dimensions zu and zw because, due
    zu1_xy = 1 ;                           ! to the staggered grid, the z-levels of a variable are those of
    x = 41 ;                               ! either the u- or the w-component of the velocity
    y = 41 ;
variables:                                 ! precision, dimensions, and units of the variables
    double time(time) ;                    ! the variables containing the time levels and grid point coordinates
        time:units = "seconds" ;           ! have the same names as the respective dimensions
    double zu_xy(zu_xy) ;
        zu_xy:units = "meters" ;
    double zw_xy(zw_xy) ;
        zw_xy:units = "meters" ;
    double zu1_xy(zu1_xy) ;
        zu1_xy:units = "meters" ;
    double ind_z_xy(zu_xy) ;
        ind_z_xy:units = "gridpoints" ;
    double x(x) ;
        x:units = "meters" ;
    double y(y) ;
        y:units = "meters" ;
    float w_xy(time, zw_xy, y, x) ;        ! array of the vertical velocity; it has 4 dimensions: x and y,
        w_xy:long_name = "w_xy" ;          ! because it is a horizontal cross section, zw_xy, which defines
        w_xy:units = "m/s" ;               ! the vertical levels of the sections, and time, for the time levels
    float pt_xy(time, zu_xy, y, x) ;       ! array of the potential temperature, which is defined on the u-grid
        pt_xy:long_name = "pt_xy" ;
        pt_xy:units = "K" ;

// global attributes:
        :Conventions = "COARDS" ;
        :title = "PALM   3.0  run: example.00  host: ibmh  13-04-06 15:12:43" ;  ! PALM run-identifier
        :VAR_LIST = ";w_xy;pt_xy;" ;       ! the list of output quantities contained in this dataset;
                                           ! this global attribute can be used by FORTRAN programs to identify
                                           ! and read the quantities contained in the file
data:


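 time = 905.3, 1808.98, 2711.98, 3603.59 ; ! values of the four time levels; slightly later than the
                                           ! nominal output times (900, 1800, 2700, 3600 s) because output
                                           ! is done at the end of the time step that passes these times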

 zu_xy = 75, 475 ;                         ! heights of the two selected cross sections (u-grid)

 zw_xy = 100, 500 ;

 zu1_xy = 25 ;

 x = 0, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700,   ! x-coordinates of the gridpoints
    750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350,
    1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950,
    2000 ;

 y = 0, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700,
    750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 1250, 1300, 1350,
    1400, 1450, 1500, 1550, 1600, 1650, 1700, 1750, 1800, 1850, 1900, 1950,
    2000 ;
}
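The coordinate values in this listing are consistent with two properties of the grid. First, the grid spacing is a uniform 50 m in all directions. Second, the vertical grid is staggered: the u-levels (used for scalars such as pt) lie half a grid spacing below the w-levels of the same index. The Python sketch below reproduces the listed heights under exactly these assumptions: uniform spacing, no vertical grid stretching, zu(k) = (k - 0.5)·dz and zw(k) = k·dz. These relations are inferred from the numbers above, not taken from the PALM source, and the function names are hypothetical.

```python
DZ = 50.0   # vertical grid spacing in m, inferred from the example output
DX = 50.0   # horizontal grid spacing in m, inferred from the x/y values


def zu(k, dz=DZ):
    """Height of u-grid level k (scalars such as pt share this grid)."""
    return (k - 0.5) * dz


def zw(k, dz=DZ):
    """Height of w-grid level k."""
    return k * dz


section_xy = [2, 10]
print([zu(k) for k in section_xy])   # [75.0, 475.0]  -> zu_xy
print([zw(k) for k in section_xy])   # [100.0, 500.0] -> zw_xy
print(zu(1))                         # 25.0           -> zu1_xy
print(40 * DX)                       # 2000.0         -> last of the 41 x values
```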

If the option -c is omitted in the ncdump call, the complete grid point data of all quantities are also written to the terminal.
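Since the header printed by ncdump -c is plain text, a script can extract the dimension lengths and the quantity list from the global attribute VAR_LIST. The following Python sketch uses only the standard library and is a hypothetical helper, not part of PALM or NetCDF. It assumes the header layout shown above ("name = length ;" lines between "dimensions:" and "variables:"). Trailing comments such as the '!' explanations above are ignored because only the start of each line is matched.

```python
import re

DIM_RE = re.compile(r"(\w+) = (UNLIMITED|\d+) ;(?: // \((\d+) currently\))?")


def parse_dimensions(cdl):
    """Return {name: (length, is_unlimited)} from the 'dimensions:' section."""
    dims, in_dims = {}, False
    for line in cdl.splitlines():
        text = line.strip()
        if text == "dimensions:":
            in_dims = True
        elif text == "variables:":
            break
        elif in_dims:
            m = DIM_RE.match(text)
            if m:
                name, size, current = m.groups()
                if size == "UNLIMITED":
                    dims[name] = (int(current or 0), True)
                else:
                    dims[name] = (int(size), False)
    return dims


def parse_var_list(cdl):
    """Return the quantities named in the global attribute VAR_LIST."""
    m = re.search(r':VAR_LIST = "([^"]*)"', cdl)
    return [v for v in m.group(1).split(";") if v] if m else []


header = """dimensions:
    time = UNLIMITED ; // (4 currently)
    x = 41 ;
variables:
        :VAR_LIST = ";w_xy;pt_xy;" ;
"""
print(parse_dimensions(header))   # {'time': (4, True), 'x': (41, False)}
print(parse_var_list(header))     # ['w_xy', 'pt_xy']
```

In practice, the header text could be captured from Python with subprocess.run(["ncdump", "-c", "example_xy.nc"], capture_output=True, text=True).stdout, provided ncdump is on the path as described in step 4.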

The example program shows how to read this 2d horizontal cross section dataset from a FORTRAN program (see above).


Last change: $Id: chapter_4.5.1.html 493 2010-03-01 08:30:24Z basit $