Format Handler

earthkit-utils provides a number of decorators that downstream earthkit packages use to make the user interface simpler for non-technical users.

This notebook demonstrates the format handler decorator and its effect on a decorated function.

[1]:
import os

# Import some common libraries we will use for the type-hinting examples
import numpy as np
import xarray as xr

# Import earthkit-data
from earthkit import data as ekd

# Import the handlers
from earthkit.utils.decorators import format_handler

# Set cache policy
ekd.settings.set("cache-policy", "user")

# Base URL used to construct the remote test data file paths
REMOTE_TEST_DATA_URL = "https://sites.ecmwf.int/repository/earthkit-data/test-data/"

Create some example earthkit data objects. The following are various Reader objects built from the test data.

[2]:
ekds_grib = ekd.from_source("url", os.path.join(REMOTE_TEST_DATA_URL, "era5_temperature_europe_20150101.grib"))
ekds_geojson = ekd.from_source("url", os.path.join(REMOTE_TEST_DATA_URL, "NUTS_RG_60M_2021_4326_LEVL_0.geojson"))
ekds_netcdf = ekd.from_source("url", os.path.join(REMOTE_TEST_DATA_URL, "test_single.nc"))
ekds_netcdf_satellite = ekd.from_source("url", os.path.join(REMOTE_TEST_DATA_URL, "CO2_iasi_metop_c_nlis_2021_01.nc"))

reader_objects = [ekds_grib, ekds_geojson, ekds_netcdf, ekds_netcdf_satellite]

Create some common data objects to use as examples. These are created from the Reader objects.

[3]:
fieldlist_example = ekds_grib.to_fieldlist()
xarray_ds_example = ekds_netcdf.to_xarray()
xarray_da_example = xarray_ds_example[list(xarray_ds_example.data_vars)[0]]
pandas_df_example = xarray_da_example.to_pandas()
pandas_series_example = pandas_df_example.iloc[:, 0]
numpy_example = ekds_grib.to_numpy()
geopandas_example = ekds_geojson.to_geopandas()

data_objects = [
    fieldlist_example,
    xarray_ds_example,
    xarray_da_example,
    pandas_df_example,
    pandas_series_example,
    numpy_example,
    geopandas_example,
]

The format_handler decorator

The format_handler decorator automatically converts the input data to the desired format. The standard behaviour is to use the type hints of the function; the example below demonstrates how data is always converted to the type given in the function signature.

[4]:
@format_handler()
def xrDataArray_function(data: xr.DataArray):
    assert isinstance(data, xr.DataArray), f"data is not an xarray.DataArray: got {type(data)}"
    return data


test_types = [
    ekds_grib,
    ekds_netcdf,
    ekds_netcdf_satellite,
    fieldlist_example,
    xarray_da_example,
    xarray_ds_example,
    pandas_series_example,
    numpy_example,
]

print(f"{'Input type':<30} -> {'Output type':<30}")
for input_obj in test_types:  # avoid shadowing the built-in input()
    output = xrDataArray_function(input_obj)
    print(f"{type(input_obj)} -> {type(output)}")
Input type                     -> Output type
<class 'earthkit.data.data.grib.GribData'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'earthkit.data.data.netcdf.NetCDFData'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'earthkit.data.data.netcdf.NetCDFData'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'earthkit.data.readers.grib.file.GribFieldListInFile'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'xarray.core.dataarray.DataArray'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'xarray.core.dataset.Dataset'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'pandas.Series'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'numpy.ndarray'> -> <class 'xarray.core.dataarray.DataArray'>
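
To make the idea of annotation-driven conversion concrete, the following is a minimal sketch in plain Python. It is not the earthkit-utils implementation: `simple_format_handler` and the `CONVERTERS` table are hypothetical names introduced here, and built-in types stand in for xarray objects so the example runs without any data.

```python
import functools
import inspect
import typing

# Hypothetical converter table: target type -> callable that builds it.
# This is a stand-in for illustration, not earthkit-utils' own machinery.
CONVERTERS = {
    list: list,
    float: float,
}


def simple_format_handler(func):
    """Convert each argument to the type named in its annotation."""
    hints = typing.get_type_hints(func)
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in list(bound.arguments.items()):
            target = hints.get(name)
            # Only convert when a converter is known and the value
            # does not already have the requested type.
            if target in CONVERTERS and not isinstance(value, target):
                bound.arguments[name] = CONVERTERS[target](value)
        return func(*bound.args, **bound.kwargs)

    return wrapper


@simple_format_handler
def total(values: list):
    assert isinstance(values, list), f"values is not a list: got {type(values)}"
    return sum(values)


print(total((1, 2, 3)))  # the tuple is converted to a list before the call
print(total([4, 5]))     # a list is passed through unchanged
```

The decorated function only ever sees the annotated type, which is the same contract that `xrDataArray_function` relies on above.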

The example above only uses input objects that can be successfully converted to the requested type, an xarray.DataArray. When the conversion is not possible, the decorator passes the unconverted object to the function and lets the function fail naturally. Below we pass a pandas.DataFrame object, which cannot be converted to an xarray.DataArray because it has too many columns, i.e. variables:

[5]:
try:
    xrDataArray_function(pandas_df_example)
except Exception as e:
    print(f"Expected exception for wrong input type:\n{e}")
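
The pass-through behaviour on a failed conversion can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the earthkit-utils code: `convert_or_pass_through` is a hypothetical decorator, and `float` stands in for the xarray target type.

```python
import functools


def convert_or_pass_through(target, converter):
    """Try converter(value); hand over the original value if it fails."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(value):
            if not isinstance(value, target):
                try:
                    value = converter(value)
                except (TypeError, ValueError):
                    # Conversion failed: leave the value untouched and
                    # let the decorated function decide what to do.
                    pass
            return func(value)

        return wrapper

    return decorator


@convert_or_pass_through(float, float)
def half(x: float):
    assert isinstance(x, float), f"x is not a float: got {type(x)}"
    return x / 2


print(half("3.0"))  # the string is converted to a float first
try:
    half("not a number")  # conversion fails, so the str reaches the function
except AssertionError as err:
    print(f"Expected exception for wrong input type:\n{err}")
```

With this design the error message comes from the decorated function itself, so it describes the type the function actually needs rather than an internal conversion detail.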

If the downstream function can handle multiple data types, the decorator first checks whether the input matches any of the listed types. If it does not, the decorator attempts to convert the input to each listed type in the order given. In the example below, the function can handle an xarray.DataArray or an xarray.Dataset. In the output, the xarray.Dataset input is now returned unchanged because it already matches a listed type, while inputs that match neither type are converted to the first listed type for which the conversion succeeds.

[6]:
@format_handler()
def xrDataArray_function(data: xr.DataArray | xr.Dataset):
    assert isinstance(data, (xr.DataArray, xr.Dataset)), (
        f"data is not an xarray.DataArray or xarray.Dataset: got {type(data)}"
    )
    return data


test_types = [
    ekds_grib,
    ekds_netcdf,
    ekds_netcdf_satellite,
    fieldlist_example,
    xarray_da_example,
    xarray_ds_example,
    pandas_series_example,
    numpy_example,
    pandas_df_example,
    geopandas_example,
    ekds_geojson,
]

print(f"{'Input type':<30} -> {'Output type':<30}")
for input_obj in test_types:  # avoid shadowing the built-in input()
    output = xrDataArray_function(input_obj)
    print(f"{type(input_obj)} -> {type(output)}")
Input type                     -> Output type
<class 'earthkit.data.data.grib.GribData'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'earthkit.data.data.netcdf.NetCDFData'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'earthkit.data.data.netcdf.NetCDFData'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'earthkit.data.readers.grib.file.GribFieldListInFile'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'xarray.core.dataarray.DataArray'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'xarray.core.dataset.Dataset'> -> <class 'xarray.core.dataset.Dataset'>
<class 'pandas.Series'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'numpy.ndarray'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'pandas.DataFrame'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'geopandas.geodataframe.GeoDataFrame'> -> <class 'xarray.core.dataarray.DataArray'>
<class 'earthkit.data.data.geojson.GeoJsonData'> -> <class 'xarray.core.dataarray.DataArray'>
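
The same logic applies to types outside xarray. In the example below, the function accepts a str or a numpy.ndarray: a string is passed through unchanged, while the other inputs are converted to a numpy.ndarray.

[7]: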
@format_handler()
def np_str_function(data: str | np.ndarray):
    print(type(data))
    return data


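# An earthkit data object is not a str, so it is converted to a numpy.ndarray
this_np_array = np_str_function(ekds_netcdf_satellite)
# A str already matches a listed type and is passed through unchanged
this_str = np_str_function("a_string")
# An int matches neither listed type and ends up as a numpy.ndarray
this_converted_np = np_str_function(1)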
<class 'numpy.ndarray'>
<class 'str'>
<class 'numpy.ndarray'>