This document provides a basic account of how Iris represents data from the Met Office Unified Model (UM), in terms of the metadata elements found in the PP and Fieldsfile formats.
For simplicity, we describe this mostly in terms of loading PP data into Iris (i.e. into cubes). However, most of the details are identical for Fieldsfiles, and are relevant to saving in these formats as well as loading.
Notes:
For details of Iris terms (cubes, coordinates, attributes), refer to Iris data structures.
For details of CF conventions, see http://cfconventions.org/.
The basics of Iris loading are explained at Loading Iris cubes. Loading as it specifically applies to PP and Fieldsfile data can be summarised as follows:
Note
This document covers the essential features of the UM data loading process. The complete details are implemented as follows:
The rest of this document describes various independent sections of related metadata items.
(unrotated) : coordinates longitude, latitude
(rotated pole) : coordinates grid_latitude, grid_longitude
Details
At present, only latitude-longitude projections are supported (both normal and rotated). In these cases, LBCODE is typically 1 or 101 (though, in fact, cross-sections with latitude and longitude axes are also supported).
For an ordinary latitude-longitude grid, the cubes have coordinates called ‘longitude’ and ‘latitude’:
- These are mapped to the appropriate data dimensions.
- They have units of ‘degrees’.
- They have a coordinate system of type iris.coord_systems.GeogCS.
- The coordinate points are normally set to the regular sequence BZX/Y + BDX/Y * (1 .. LBNPT/LBROW). The exception is when BDX/BDY is zero: the values are then taken from the extra data vector X/Y, if present.
- If X/Y_LOWER_BOUNDS extra data is available, this appears as bounds values of the horizontal coordinates.
For rotated latitude-longitude coordinates (as for LBCODE=101), the horizontal coordinates differ only slightly:
- The names are ‘grid_latitude’ and ‘grid_longitude’.
- The coord_system is an iris.coord_systems.RotatedGeogCS, created with a pole defined by BPLAT, BPLON.
>>> import iris
>>> import numpy as np
>>>
>>> # Load a PP field.
... fname = iris.sample_data_path('air_temp.pp')
>>> fields_iter = iris.fileformats.pp.load(fname)
>>> field = next(fields_iter)
>>>
>>> # Show grid details and first 5 longitude values.
>>> print(' '.join(str(_) for _ in (field.lbcode, field.lbnpt, field.bzx,
... field.bdx)))
1 96 -3.75 3.75
>>> print(field.bzx + field.bdx * np.arange(1, 6))
[ 0. 3.75 7.5 11.25 15. ]
>>>
>>> # Show Iris equivalent information.
... cube = iris.load_cube(fname)
>>> print(cube.coord('longitude').points[:5])
[ 0. 3.75 7.5 11.25 15. ]
Note
In Iris (as in CF) there is no special distinction between “regular” and “irregular” coordinates. Thus, on saving, X and Y extra data sections are written only if the actual values are unevenly spaced.
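The saving rule in this note amounts to a regularity test on the coordinate points. A minimal sketch of such a test follows. The helper name `is_regular` and its tolerance are hypothetical, chosen for illustration, and this is not the check that Iris itself performs.

```python
def is_regular(points, rel_tol=1e-5):
    """Return True if the points are evenly spaced (hypothetical helper)."""
    if len(points) < 3:
        # Fewer than three points cannot be unevenly spaced.
        return True
    steps = [b - a for a, b in zip(points, points[1:])]
    first = steps[0]
    # Allow a small tolerance relative to the step size, for float rounding.
    tolerance = abs(first) * rel_tol
    return all(abs(step - first) <= tolerance for step in steps)


regular = [0.0, 3.75, 7.5, 11.25, 15.0]
irregular = [0.0, 3.75, 8.0, 11.25, 15.0]
print(is_regular(regular))    # evenly spaced: no X extra data needed
print(is_regular(irregular))  # unevenly spaced: X extra data would be written
```

Under this reading, only the second set of points would need an X extra data section on saving.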
Details
This information is normally encoded in the cube standard_name property. Iris identifies the stash section and item codes from LBUSER4 and the model code in LBUSER7, and compares these against a list of phenomenon types with known CF translations. If the stashcode is recognised, it then defines the appropriate standard_name and units properties of the cube (i.e. iris.cube.Cube.standard_name and iris.cube.Cube.units).
Where any parts of the stash information are outside the valid range, Iris will instead attempt to interpret LBFC, for which a set of known translations is also stored. This is often the case for Fieldsfiles, where LBUSER4 is frequently left as 0.
In all cases, Iris also constructs a STASH item to identify the phenomenon, which is stored as a cube attribute named STASH. This preserves the original STASH coding (as standard name translation is not always one-to-one), and can be used when no standard_name translation is identified (for example, to load only certain stashcodes with a constraint – see example at Load constraint examples).
>>> # Show PPfield phenomenon details.
>>> print(field.lbuser[3])
16203
>>> print(field.lbuser[6])
1
>>>
>>>
>>> # Show Iris equivalents.
>>> print(cube.standard_name)
air_temperature
>>> print(cube.units)
K
>>> print(cube.attributes['STASH'])
m01s16i203
Note
On saving data, no attempt is made to translate a cube standard_name into a STASH code, but any attached ‘STASH’ attribute will be stored into the LBUSER4 and LBUSER7 elements.
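The relationship between the header values and the STASH attribute shown in the example can be sketched in plain Python. This assumes the usual UM packing, in which LBUSER4 holds section * 1000 + item. The function name `stash_string` is hypothetical, and this is not how Iris builds its own STASH object.

```python
def stash_string(lbuser4, lbuser7):
    """Format header values as an 'mXXsXXiXXX' string (illustrative helper)."""
    # Assumption: LBUSER4 packs the section and item as section * 1000 + item.
    section, item = divmod(lbuser4, 1000)
    model = lbuser7
    return "m{:02d}s{:02d}i{:03d}".format(model, section, item)


print(stash_string(16203, 1))  # m01s16i203
```

A string of this form can be used to select data by stashcode, for example with `iris.AttributeConstraint(STASH='m01s16i203')`.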
for height levels : coordinate height
for pressure levels : coordinate pressure
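for hybrid height levels : coordinates model_level_number, sigma, level_height, altitude
for hybrid pressure levels : coordinates model_level_number, sigma, level_pressure, air_pressure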
Details
Several vertical coordinate forms are supported, according to different values of LBVC. The commonest ones are height levels, pressure levels and hybrid height levels.
In all these cases, vertical coordinates are created, with points and bounds values taken from the appropriate header elements. In the raw cubes, each vertical coordinate holds just a single value, but a set of loaded fields will usually contain several different values. The subsequent merge operation then converts these into multiple-valued coordinates, and creates a new vertical data dimension (i.e. a “Z” axis) which they map onto.
For hybrid height levels, three basic vertical coordinates are created: ‘model_level_number’, ‘sigma’ and ‘level_height’.
In this case a HybridHeightFactory is also created, which references the ‘level_height’ and ‘sigma’ coordinates. Following raw cube merging, an extra load stage occurs in which the attached HybridHeightFactory is called to manufacture a new ‘altitude’ coordinate.
To make the altitude coordinate, there must be an orography field present in the load sources. This is a surface altitude reference field, identified (by stashcode) during the main loading operation, and recorded for later use in the hybrid height calculation. If it is absent, a warning message is printed, and no altitude coordinate is produced.
Note that on merging hybrid height data into a cube, only the ‘model_level_number’ coordinate becomes a dimension coordinate. The other vertical coordinates remain as auxiliary coordinates, because they may be (variously) multidimensional or non-monotonic.
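The altitude calculation follows the CF hybrid height formula, altitude = level_height + sigma * surface_altitude. The sketch below applies it with plain Python lists to show why the result is multidimensional. The function name and input values are invented for illustration, and Iris performs the equivalent calculation through the HybridHeightFactory rather than with code like this.

```python
def hybrid_altitude(level_height, sigma, orography):
    """Return altitude[z][y][x] = level_height[z] + sigma[z] * orography[y][x]."""
    return [
        [[height + sig * surface for surface in row] for row in orography]
        for height, sig in zip(level_height, sigma)
    ]


# Invented values: two model levels over a 2 x 2 orography field (metres).
level_height = [20.0, 1000.0]
sigma = [1.0, 0.5]
orography = [[0.0, 100.0], [200.0, 400.0]]

altitude = hybrid_altitude(level_height, sigma, orography)
print(altitude[0])  # lowest level follows the terrain closely
print(altitude[1])  # higher level is less influenced by the terrain
```

The one-dimensional ‘level_height’ and ‘sigma’ values combine with the two-dimensional orography to give a three-dimensional result, which is why ‘altitude’ cannot be a dimension coordinate.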
See an example printout of a hybrid height cube, here:
Notice that this contains all of the above coordinates – ‘model_level_number’, ‘sigma’, ‘level_height’ and the derived ‘altitude’.
Note
Hybrid pressure levels can also be handled (for LBVC=9). Without going into details, the mechanism is very similar to that for hybrid height: it produces basic coordinates ‘model_level_number’, ‘sigma’ and ‘level_pressure’, and a manufactured 3D ‘air_pressure’ coordinate.
UM Field elements
Details
In Iris (as in CF) times and time intervals are both expressed as simple numbers, following the approach of the UDUNITS project. These values are stored as cube coordinates, where the scaling and calendar information is contained in the units property.
The units.calendar property of time coordinates is set from the lowest decimal digit of LBTIM, known as LBTIM.IC. Note that the non-Gregorian calendars (e.g. the 360-day ‘model’ calendar) are defined in CF, not in UDUNITS.
There are a number of different time encoding methods used in UM data, but the important distinctions are controlled by the next-to-lowest decimal digit of LBTIM, known as “LBTIM.IB”. The most common cases are as follows:
Note that, in those more complex cases where the input defines all three of the ‘time’, ‘forecast_reference_time’ and ‘forecast_period’ values, any or all of these may become dimensions of the resulting data cube. This will depend on the values actually present in the source fields for each of the elements.
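The decomposition of LBTIM into its decimal parts can be sketched as follows. The function name `split_lbtim` is hypothetical, and the sample value 622 is the LBTIM of the test field printed in the statistics example of this document. IA is taken here as everything above the two lowest digits, which is an assumption for values of 1000 or more.

```python
def split_lbtim(lbtim):
    """Split LBTIM into its (IA, IB, IC) parts (illustrative helper)."""
    # IA is everything above the two lowest decimal digits.
    ia, remainder = divmod(lbtim, 100)
    # IB is the next-to-lowest digit, IC the lowest digit.
    ib, ic = divmod(remainder, 10)
    return ia, ib, ic


ia, ib, ic = split_lbtim(622)
print(ia, ib, ic)  # 6 2 2
```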
See an example printout of a forecast data cube, here:
Notice that this example contains all of the above coordinates – ‘time’, ‘forecast_period’ and ‘forecast_reference_time’. In this case the data are forecasts, so ‘time’ is a dimension, ‘forecast_period’ varies with time and ‘forecast_reference_time’ is a constant.
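For a simple forecast at a single validity time, the three coordinates are related by time = forecast_reference_time + forecast_period. The sketch below illustrates this with the standard library only. The dates and lead times are invented, and it does not use Iris coordinates or units.

```python
from datetime import datetime, timedelta

# Invented values: one forecast run with three lead times.
forecast_reference_time = datetime(2020, 1, 1, 0)
forecast_periods = [timedelta(hours=h) for h in (6, 12, 18)]

# Each validity time is the reference time plus the forecast period.
times = [forecast_reference_time + period for period in forecast_periods]

for time, period in zip(times, forecast_periods):
    print(time.isoformat(), period.total_seconds() / 3600.0)
```

Here the reference time is constant while the time and the forecast period vary together, matching the pattern described for the example cube.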
Details
Where a field contains statistically processed data, Iris will add an appropriate iris.coords.CellMethod to the cube, representing the aggregation operation which was performed.
This is implemented for certain binary flag bits within the LBPROC element value. For example:
Cube has a cell_method of the form CellMethod(‘mean’, ‘time’).
Cube has a cell_method of the form CellMethod(‘minimum’, ‘time’).
Cube has a cell_method of the form CellMethod(‘maximum’, ‘time’).
In all these cases, if the field LBTIM is also set to denote a time aggregate field (i.e. “LBTIM.IB=2”, see Time information above), then the part of LBTIM above its two lowest decimal digits, known as “LBTIM.IA”, may also be non-zero, in which case it indicates the aggregation time interval in hours. The cell method’s intervals attribute is then set to this many hours. For example, LBTIM=622 gives LBTIM.IA=6 and an interval of ‘6 hour’.
>>> # Show stats metadata in a test PP field.
... fname = iris.sample_data_path('pre-industrial.pp')
>>> eg_field = next(iris.fileformats.pp.load(fname))
>>> print(eg_field.lbtim)
622
>>> print(eg_field.lbproc)
128
>>>
>>> # Print out the Iris equivalent information.
>>> print(iris.load_cube(fname).cell_methods)
(CellMethod(method='mean', coord_names=('time',), intervals=('6 hour',), comments=()),)
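The flag test and interval rule can be sketched in plain Python. The flag value 128 for a time mean matches the example above; the values 4096 and 8192 for minimum and maximum are assumptions not stated in this document, and the function name is hypothetical. This is an illustration only, not the Iris load rules themselves.

```python
# Flag bits mapped to cell method names. 128 -> 'mean' matches the example
# above; 4096 and 8192 are assumed values for minimum and maximum.
LBPROC_METHODS = {128: "mean", 4096: "minimum", 8192: "maximum"}


def time_cell_methods(lbproc, lbtim):
    """Return (method, coord, interval) tuples for the flags set in lbproc."""
    ia = lbtim // 100          # LBTIM.IA: aggregation interval in hours
    ib = (lbtim // 10) % 10    # LBTIM.IB: time encoding type
    # An interval is recorded only for time aggregate fields with non-zero IA.
    interval = "{} hour".format(ia) if ib == 2 and ia != 0 else None
    return [
        (method, "time", interval)
        for flag, method in sorted(LBPROC_METHODS.items())
        if lbproc & flag
    ]


print(time_cell_methods(128, 622))  # [('mean', 'time', '6 hour')]
```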
If non-zero, this is interpreted as an ensemble number. This produces a cube scalar coordinate named ‘realization’ (as defined in the CF conventions).
If LBUSER5 is non-zero, it is interpreted as a ‘pseudo_level’ number. This produces a cube scalar coordinate named ‘pseudo_level’. In the UM documentation, LBUSER5 is also sometimes referred to as LBPLEV.