===============
Cube statistics
===============

Collapsing entire data dimensions
---------------------------------

.. testsetup::

    import iris
    filename = iris.sample_data_path('uk_hires.pp')
    cube = iris.load_cube(filename, 'air_potential_temperature')

    import iris.analysis.cartography
    cube.coord('grid_latitude').guess_bounds()
    cube.coord('grid_longitude').guess_bounds()
    grid_areas = iris.analysis.cartography.area_weights(cube)

In the section :doc:`reducing_a_cube` we saw how to extract a subset of a cube
in order to reduce either its dimensionality or its resolution. Instead of
downsampling the data, a similar goal can be achieved by applying statistical
operations over *all* of the data. Suppose we have a cube:

>>> import iris
>>> filename = iris.sample_data_path('uk_hires.pp')
>>> cube = iris.load_cube(filename, 'air_potential_temperature')
>>> print cube
air_potential_temperature          (time: 3; model_level_number: 7; grid_latitude: 204; grid_longitude: 187)
     Dimension coordinates:
          time                          x                      -                 -                    -
          model_level_number            -                      x                 -                    -
          grid_latitude                 -                      -                 x                    -
          grid_longitude                -                      -                 -                    x
     Auxiliary coordinates:
          forecast_period               x                      -                 -                    -
          level_height                  -                      x                 -                    -
          sigma                         -                      x                 -                    -
          surface_altitude              -                      -                 x                    x
     Derived coordinates:
          altitude                      -                      x                 x                    x
     Scalar coordinates:
          forecast_reference_time: 349612.0 hours since 1970-01-01 00:00:00
     Attributes:
          STASH: m01s00i004
          source: Data from Met Office Unified Model 7.03

In this case we have a 4 dimensional cube. To collapse the vertical (z)
dimension down to a single value by averaging, we can pass the coordinate
name and an aggregation definition to the
:meth:`Cube.collapsed() <iris.cube.Cube.collapsed>` method:

>>> import iris.analysis
>>> vertical_mean = cube.collapsed('model_level_number', iris.analysis.MEAN)
>>> print vertical_mean
air_potential_temperature          (time: 3; grid_latitude: 204; grid_longitude: 187)
     Dimension coordinates:
          time                          x                 -                    -
          grid_latitude                 -                 x                    -
          grid_longitude                -                 -                    x
     Auxiliary coordinates:
          forecast_period               x                 -                    -
          surface_altitude              -                 x                    x
     Derived coordinates:
          altitude                      -                 x                    x
     Scalar coordinates:
          forecast_reference_time: 349612.0 hours since 1970-01-01 00:00:00
          level_height: 696.667 m, bound=(0.0, 1393.33) m
          model_level_number: 10, bound=(1, 19)
          sigma: 0.92293, bound=(0.84586, 1.0)
     Attributes:
          STASH: m01s00i004
          history: Mean of air_potential_temperature over model_level_number
          source: Data from Met Office Unified Model 7.03
     Cell methods:
          mean: model_level_number

Similarly, other analysis operators such as ``MAX``, ``MIN`` and ``STD_DEV``
can be used instead of ``MEAN``; see :mod:`iris.analysis` for a full list of
the currently supported operators.


Area averaging
^^^^^^^^^^^^^^

Some operators support additional keywords to the ``cube.collapsed`` method.
For example, :func:`iris.analysis.MEAN` supports a weights keyword which can
be combined with :func:`iris.analysis.cartography.area_weights` to calculate
an area average.

Let's use the same data as was loaded in the previous example. Since
``grid_latitude`` and ``grid_longitude`` were both point coordinates we must
guess bound positions for them in order to calculate the area of the grid
boxes::

    import iris.analysis.cartography
    cube.coord('grid_latitude').guess_bounds()
    cube.coord('grid_longitude').guess_bounds()
    grid_areas = iris.analysis.cartography.area_weights(cube)
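The weights returned by :func:`iris.analysis.cartography.area_weights` are an
ordinary NumPy array of cell areas, broadcast to the shape of the cube, with
one weight per data point. As a quick, optional sanity check of our own (not
part of the original recipe)::

    # area_weights returns one cell-area weight per data point, so the
    # array's shape must match the cube's shape exactly.
    assert grid_areas.shape == cube.shape
    print(grid_areas.shape)   # (3, 7, 204, 187) for this sample cube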
These areas can now be passed to the ``collapsed`` method as weights:

.. doctest::

    >>> new_cube = cube.collapsed(['grid_longitude', 'grid_latitude'], iris.analysis.MEAN, weights=grid_areas)
    >>> print new_cube
    air_potential_temperature          (time: 3; model_level_number: 7)
         Dimension coordinates:
              time                          x                      -
              model_level_number            -                      x
         Auxiliary coordinates:
              forecast_period               x                      -
              level_height                  -                      x
              sigma                         -                      x
         Derived coordinates:
              altitude                      -                      x
         Scalar coordinates:
              forecast_reference_time: 349612.0 hours since 1970-01-01 00:00:00
              grid_latitude: 1.51455 degrees, bound=(0.1443, 2.8848) degrees
              grid_longitude: 358.749 degrees, bound=(357.494, 360.005) degrees
              surface_altitude: 399.625 m, bound=(-14.0, 813.25) m
         Attributes:
              STASH: m01s00i004
              history: Mean of air_potential_temperature over grid_longitude, grid_latitude
              source: Data from Met Office Unified Model 7.03
         Cell methods:
              mean: grid_longitude, grid_latitude


Partially collapsing data dimensions
------------------------------------

Instead of completely collapsing a dimension, other methods can be applied to
reduce or filter the number of data points of a particular dimension.


Aggregation of grouped data
^^^^^^^^^^^^^^^^^^^^^^^^^^^

An aggregation on a *group* of coordinate values can be achieved with
:meth:`Cube.aggregated_by <iris.cube.Cube.aggregated_by>`, which can be
combined with the :mod:`iris.coord_categorisation` module to group the
coordinate in the first place.

First, let's create two coordinates on a cube which represent the
climatological seasons and the season year respectively::

    import iris
    import iris.coord_categorisation

    filename = iris.sample_data_path('ostia_monthly.nc')
    cube = iris.load_cube(filename, 'surface_temperature')

    iris.coord_categorisation.add_season(cube, 'time', name='clim_season')
    iris.coord_categorisation.add_season_year(cube, 'time', name='season_year')

.. testsetup:: aggregation

    import iris
    filename = iris.sample_data_path('ostia_monthly.nc')
    cube = iris.load_cube(filename, 'surface_temperature')

    import iris.coord_categorisation
    iris.coord_categorisation.add_season(cube, 'time', name='clim_season')
    iris.coord_categorisation.add_season_year(cube, 'time', name='season_year')

    annual_seasonal_mean = cube.aggregated_by(['clim_season', 'season_year'], iris.analysis.MEAN)

Printing this cube now shows that two extra coordinates exist on the cube:

.. doctest:: aggregation

    >>> print cube
    surface_temperature                (time: 54; latitude: 18; longitude: 432)
         Dimension coordinates:
              time                          x             -               -
              latitude                      -             x               -
              longitude                     -             -               x
         Auxiliary coordinates:
              clim_season                   x             -               -
              forecast_reference_time      x             -               -
              season_year                   x             -               -
         Scalar coordinates:
              forecast_period: 0 hours
         Attributes:
              Conventions: CF-1.5
              STASH: m01s00i024
              history: Mean of surface_temperature aggregated over month, year
         Cell methods:
              mean: month, year

These two coordinates can now be used as *groups* over which to do an
aggregation:

.. doctest:: aggregation

    >>> annual_seasonal_mean = cube.aggregated_by(['clim_season', 'season_year'], iris.analysis.MEAN)
    >>> print repr(annual_seasonal_mean)
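Conceptually, the aggregation collects together all of the time indices that
share a ``(clim_season, season_year)`` pair and applies the aggregator (here a
mean) to each group. The grouping itself can be illustrated with plain Python
over the coordinate points (purely an illustration, not how Iris implements
it)::

    import itertools

    seasons = cube.coord('clim_season').points
    years = cube.coord('season_year').points

    # Consecutive times sharing a (season, year) key form one group; this
    # relies on the data being time-ordered, as it is here.
    key = lambda i: (seasons[i], years[i])
    for group_key, indices in itertools.groupby(range(len(seasons)), key):
        print('%s: %s' % (group_key, list(indices)))
    # ('mam', 2006): [0, 1]
    # ('jja', 2006): [2, 3, 4]
    # ...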
The primary change in the cube is that the cube's data has shrunk on the t
axis as a result of the meaning aggregation: all repeating combinations of
season (djf etc.) and year have been collapsed into a single position on the
t axis. We can see this by printing the first 10 values of the original
coordinates:

.. doctest:: aggregation

    >>> print cube.coord('clim_season')[:10].points
    ['mam' 'mam' 'jja' 'jja' 'jja' 'son' 'son' 'son' 'djf' 'djf']
    >>> print cube.coord('season_year')[:10].points
    [2006 2006 2006 2006 2006 2006 2006 2006 2007 2007]

And then comparing with the first 10 values of the new cube's coordinates:

.. doctest:: aggregation

    >>> print annual_seasonal_mean.coord('clim_season')[:10].points
    ['mam' 'jja' 'son' 'djf' 'mam' 'jja' 'son' 'djf' 'mam' 'jja']
    >>> print annual_seasonal_mean.coord('season_year')[:10].points
    [2006 2006 2006 2007 2007 2007 2007 2008 2008 2008]

Because the original data started in April 2006 we have some incomplete
seasons (e.g. there were only two months worth of data for ``mam 2006``). We
can fix this by removing all of the resultant times which do not cover a full
three month period (n.b. 3 months = 3 * 30 * 24 = 2160 hours):

.. doctest:: aggregation

    >>> spans_three_months = lambda time: (time.bound[1] - time.bound[0]) == 2160
    >>> three_months_bound = iris.Constraint(time=spans_three_months)
    >>> print annual_seasonal_mean.extract(three_months_bound)
    surface_temperature                (*ANONYMOUS*: 3; latitude: 18; longitude: 432)
         Dimension coordinates:
              latitude                      -                 x               -
              longitude                     -                 -               x
         Auxiliary coordinates:
              clim_season                   x                 -               -
              forecast_reference_time      x                 -               -
              season_year                   x                 -               -
              time                          x                 -               -
         Scalar coordinates:
              forecast_period: 0 hours
         Attributes:
              Conventions: CF-1.5
              STASH: m01s00i024
              history: Mean of surface_temperature aggregated over month, year
                   Mean of surface_temperature...
         Cell methods:
              mean: month, year
              mean: clim_season, season_year

The final result now represents the seasonal mean temperature for the three
seasons whose time bounds span exactly 2160 hours; see the note below for a
more tolerant test that keeps every complete season.
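Note that the equality test above is stricter than the n.b. suggests: a
season's bounds span exactly 2160 hours only when its three months total 90
days, which is only true of a ``djf`` season that does not include a leap
February. That is why just three seasons survive the extraction. A more
tolerant constraint (a sketch of our own, not part of the original example)
keeps every complete season, since the partial seasons at the ends of the
record cover at most two months::

    # A complete season always spans at least 90 days (2160 hours); the
    # partial seasons at the start and end of the record span at most
    # two months (under 1500 hours), so they are still rejected.
    spans_full_season = lambda time: (time.bound[1] - time.bound[0]) >= 2160
    full_seasons = annual_seasonal_mean.extract(iris.Constraint(time=spans_full_season))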