In the section Reducing a Cube we saw how to extract a subset of a cube in order to reduce either its dimensionality or its resolution. Instead of simply extracting a sub-region of the data, we can produce statistical functions of the data values across a particular dimension, such as a ‘mean over time’ or ‘minimum over latitude’.
For instance, suppose we have a cube:
>>> import iris
>>> filename = iris.sample_data_path('uk_hires.pp')
>>> cube = iris.load_cube(filename, 'air_potential_temperature')
>>> print(cube)
air_potential_temperature / (K)     (time: 3; model_level_number: 7; grid_latitude: 204; grid_longitude: 187)
     Dimension coordinates:
          time                        x            -            -            -
          model_level_number          -            x            -            -
          grid_latitude               -            -            x            -
          grid_longitude              -            -            -            x
     Auxiliary coordinates:
          forecast_period             x            -            -            -
          level_height                -            x            -            -
          sigma                       -            x            -            -
          surface_altitude            -            -            x            x
     Derived coordinates:
          altitude                    -            x            x            x
     Scalar coordinates:
          forecast_reference_time: 2009-11-19 04:00:00
     Attributes:
          STASH: m01s00i004
          source: Data from Met Office Unified Model 7.03
In this case we have a four-dimensional cube; to collapse the vertical (z) dimension down to a single value by taking the mean, we can pass the coordinate name and an aggregation definition to the Cube.collapsed() method:
>>> import iris.analysis
>>> vertical_mean = cube.collapsed('model_level_number', iris.analysis.MEAN)
>>> print(vertical_mean)
air_potential_temperature / (K)     (time: 3; grid_latitude: 204; grid_longitude: 187)
     Dimension coordinates:
          time                        x            -            -
          grid_latitude               -            x            -
          grid_longitude              -            -            x
     Auxiliary coordinates:
          forecast_period             x            -            -
          surface_altitude            -            x            x
     Derived coordinates:
          altitude                    -            x            x
     Scalar coordinates:
          forecast_reference_time: 2009-11-19 04:00:00
          level_height: 696.667 m, bound=(0.0, 1393.33) m
          model_level_number: 10, bound=(1, 19)
          sigma: 0.92293, bound=(0.84586, 1.0)
     Attributes:
          STASH: m01s00i004
          history: Mean of air_potential_temperature over model_level_number
          source: Data from Met Office Unified Model 7.03
     Cell methods:
          mean: model_level_number
Similarly, other analysis operators such as MAX, MIN and STD_DEV can be used instead of MEAN; see iris.analysis for a full list of currently supported operators.
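For instance, a minimal sketch of the same collapse using the standard-deviation aggregator instead of the mean (the variable name vertical_std_dev is illustrative):
>>> vertical_std_dev = cube.collapsed('model_level_number', iris.analysis.STD_DEV)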
Some operators support additional keywords to the Cube.collapsed() method. For example, iris.analysis.MEAN supports a weights keyword, which can be combined with iris.analysis.cartography.area_weights() to calculate an area-weighted average.
Let’s use the same data as was loaded in the previous example. Since grid_latitude and grid_longitude were both point coordinates, we must guess bound positions for them in order to calculate the area of the grid boxes:
import iris.analysis.cartography

# Estimate bounds for the point coordinates so that grid-box areas can be calculated.
cube.coord('grid_latitude').guess_bounds()
cube.coord('grid_longitude').guess_bounds()

# Compute an array of area weights, one per data point in the cube.
grid_areas = iris.analysis.cartography.area_weights(cube)
These areas can now be passed to the collapsed method as weights:
>>> new_cube = cube.collapsed(['grid_longitude', 'grid_latitude'], iris.analysis.MEAN, weights=grid_areas)
>>> print(new_cube)
air_potential_temperature / (K)     (time: 3; model_level_number: 7)
     Dimension coordinates:
          time                        x            -
          model_level_number          -            x
     Auxiliary coordinates:
          forecast_period             x            -
          level_height                -            x
          sigma                       -            x
     Derived coordinates:
          altitude                    -            x
     Scalar coordinates:
          forecast_reference_time: 2009-11-19 04:00:00
          grid_latitude: 1.51455 degrees, bound=(0.1443, 2.8848) degrees
          grid_longitude: 358.749 degrees, bound=(357.494, 360.005) degrees
          surface_altitude: 399.625 m, bound=(-14.0, 813.25) m
     Attributes:
          STASH: m01s00i004
          history: Mean of air_potential_temperature over grid_longitude, grid_latitude
          source: Data from Met Office Unified Model 7.03
     Cell methods:
          mean: grid_longitude, grid_latitude
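As an aside, guess_bounds() raises an error if a coordinate already has bounds, so when re-running code like the above (for example in an interactive session) it can be worth guarding the calls. A minimal sketch:
# Only guess bounds where they are missing; calling guess_bounds() on a
# coordinate that already has bounds raises an error.
for coord_name in ('grid_latitude', 'grid_longitude'):
    coord = cube.coord(coord_name)
    if not coord.has_bounds():
        coord.guess_bounds()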
Instead of completely collapsing a dimension, other methods can be applied to reduce or filter the number of data points along a particular dimension.
The Cube.aggregated_by operation combines data for all points with the same value of a given coordinate. To do this, you need a coordinate whose points take on only a limited set of different values; the number of these then determines the size of the reduced dimension. The iris.coord_categorisation module can be used to make such ‘categorical’ coordinates out of ordinary ones; the most common use is to aggregate data over regular time intervals, such as by calendar month or day of the week.
For example, let’s create two new coordinates on the cube to represent the climatological seasons and the season year respectively:
import iris
import iris.coord_categorisation
filename = iris.sample_data_path('ostia_monthly.nc')
cube = iris.load_cube(filename, 'surface_temperature')
iris.coord_categorisation.add_season(cube, 'time', name='clim_season')
iris.coord_categorisation.add_season_year(cube, 'time', name='season_year')
Note
The ‘season year’ is not the same as the calendar year number, because (e.g.) the months Dec 2011, Jan 2012 and Feb 2012 all belong to the same ‘djf’ season, which is given season year 2012. See iris.coord_categorisation.add_season_year().
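As a further aside, the other helpers in iris.coord_categorisation follow the same calling pattern. A sketch only; these coordinates are not added to the cube used in the rest of this example:
# Further categorical coordinates can be created in the same way, e.g.:
iris.coord_categorisation.add_month(cube, 'time', name='month')
iris.coord_categorisation.add_weekday(cube, 'time', name='weekday')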
Printing the cube now shows that two extra coordinates exist on it:
>>> print(cube)
surface_temperature / (K)           (time: 54; latitude: 18; longitude: 432)
     Dimension coordinates:
          time                        x            -            -
          latitude                    -            x            -
          longitude                   -            -            x
     Auxiliary coordinates:
          clim_season                 x            -            -
          forecast_reference_time     x            -            -
          season_year                 x            -            -
     Scalar coordinates:
          forecast_period: 0 hours
     Attributes:
          Conventions: CF-1.5
          STASH: m01s00i024
          history: Mean of surface_temperature aggregated over month, year
     Cell methods:
          mean: month, year
These two coordinates can now be used to aggregate by season and season year:
>>> annual_seasonal_mean = cube.aggregated_by(
... ['clim_season', 'season_year'],
... iris.analysis.MEAN)
>>> print(repr(annual_seasonal_mean))
<iris 'Cube' of surface_temperature / (K) (*ANONYMOUS*: 19; latitude: 18; longitude: 432)>
The primary change is that the cube’s data has been reduced in the ‘time’ dimension by aggregation (taking means, in this case). This has collected together all data points with the same values of season and season year. The results are now indexed by the 19 different possible values of season and season year in a new, reduced ‘time’ dimension.
We can see this by printing the first 10 values of season and year from the original cube. These points are individual months, so adjacent ones often fall within the same season:
>>> print(list(zip(
...     cube.coord('clim_season')[:10].points,
...     cube.coord('season_year')[:10].points)))
[('mam', 2006), ('mam', 2006), ('jja', 2006), ('jja', 2006), ('jja', 2006), ('son', 2006), ('son', 2006), ('son', 2006), ('djf', 2007), ('djf', 2007)]
Compare this with the first 10 values of the new cube’s coordinates, where all the points now have distinct season and year values:
>>> print(list(zip(
...     annual_seasonal_mean.coord('clim_season')[:10].points,
...     annual_seasonal_mean.coord('season_year')[:10].points)))
[('mam', 2006), ('jja', 2006), ('son', 2006), ('djf', 2007), ('mam', 2007), ('jja', 2007), ('son', 2007), ('djf', 2008), ('mam', 2008), ('jja', 2008)]
Because the original data started in April 2006, we have some incomplete seasons (e.g. there were only two months of data for ‘mam-2006’). We can fix this by removing all of the resulting ‘times’ that do not cover a full three-month period; since the time coordinate of this cube is measured in hours, a full season is judged here as a time bound spanning more than 3 × 28 days, i.e. 3 * 28 * 24.0 hours:
>>> def spans_three_months(time_cell):
...     # Bounds are in hours; a full season spans more than 3 * 28 days.
...     return (time_cell.bound[1] - time_cell.bound[0]) > 3 * 28 * 24.0
...
>>> three_months_bound = iris.Constraint(time=spans_three_months)
>>> full_season_means = annual_seasonal_mean.extract(three_months_bound)
>>> full_season_means
<iris 'Cube' of surface_temperature / (K) (*ANONYMOUS*: 17; latitude: 18; longitude: 432)>
The final result now represents the seasonal mean temperature for 17 seasons from jja-2006 to jja-2010:
>>> print(list(zip(
...     full_season_means.coord('clim_season').points,
...     full_season_means.coord('season_year').points)))
[('jja', 2006), ('son', 2006), ('djf', 2007), ('mam', 2007), ('jja', 2007), ('son', 2007), ('djf', 2008), ('mam', 2008), ('jja', 2008), ('son', 2008), ('djf', 2009), ('mam', 2009), ('jja', 2009), ('son', 2009), ('djf', 2010), ('mam', 2010), ('jja', 2010)]
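Finally, as a usage sketch, a single season can be picked out of these results with an ordinary constraint on the categorical coordinate (the variable name winter_means is illustrative):
>>> winter_means = full_season_means.extract(iris.Constraint(clim_season='djf'))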