ECMWF Cyclone Forecasts

This module provides an interface for downloading ECMWF historical forecast data and performing basic processing operations to make it suitable for immediate analysis. ECMWF’s forecasts are retrieved from the THORPEX Interactive Grand Global Ensemble (TIGGE) dataset, which provides raw data in cxml format. These forecasts are provided every 12h, with 6h leadtime increments.

This module does not yet provide the ability to download ECMWF’s lowest-latency forecasts from the Dissemination Data Store (DISS). This functionality is forthcoming.

Quick Start

import ocha_lens as lens
from datetime import datetime

# Load ECMWF forecasts as a pandas dataframe
df = lens.ecmwf_storm.load_hindcasts(
    start_date=datetime(2019, 12, 20),
    end_date=datetime(2020, 1, 15)
)

# Extract storm metadata
df_storms = lens.ecmwf_storm.get_storms(df)

# Get track data
gdf_tracks = lens.ecmwf_storm.get_tracks(df)

Output Data Structure

The primary goal of this module is to provide easy access to ECMWF data in a tabular, analysis-ready format. See below for the output schemas provided by this module. These schemas are designed to be interoperable with other cyclone track data sources (eg. IBTrACS).

lens.ecmwf_storm.get_storms()

This function outputs a table that contains one row per unique storm (as identified by the storm_id). This data can be used to obtain storm-level metadata. Forecasts for unnamed storms (often those in the early development of a storm, or that never materialized into a storm) are not given a storm_id and are not given records in this table.

Field

Type

Required

Validation

Description

storm_id

str

Required

Must be unique

Concatenation of <name>_<basin>_<season>

number

str

Required

-

Storm number identifier

season

int

Required

2005-2050 range

Storm season year[1]

name

str

Optional

-

Storm name, all uppercase

provider

str

Optional

-

Data provider

genesis_basin

str

Optional

Must match basin mapping[2]

Basin where forecast originated

See more details of the enforced schema from this validation in the source code.

lens.ecmwf_storm.get_tracks()

This function outputs cleaned tracks for all forecasts in the raw input data. Note that there will be many unnamed forecasts (and so without a storm_id) present in this table that are not in the storm-level table output above.

Field

Type

Required

Validation

Description

storm_id

str

Optional

-

Links to storm metadata

point_id

str

Required

-

Unique identifier for this track point

forecast_id

str

Required

-

Forecast ID from ECMWF

number

str

Optional

-

Storm number identifier

issued_time

pd.Timestamp

Required

-

When the forecast was issued

valid_time

pd.Timestamp

Required

-

Time this track point is valid for

provider

str

Required

-

Forecast provider

basin

str

Required

Must match basin mapping[2]

Basin where forecast originated[3]

leadtime

Int64

Required

≥ 0

Hours ahead of forecast issue time

pressure

float

Optional

800-1100 hPa range

Central pressure

wind_speed

float

Optional

0-300 knots range

Maximum sustained winds[4]

geometry

gpd.array.GeometryDtype

Required

EPSG:4326, valid lat/lon

Geographic location

See more details of the enforced schema from this validation in the source code.

Usage Considerations

Cyclone identification

It can be challenging to identify unique storms from this dataset of historical forecasts. Not all forecasts correspond to a known storm, and forecasts issued from before a storm was given a name may be challenging to group with forecasts that can be identified by the storm’s name. Moreover, ECMWF’s assigned forecast_id may not necessarily be unique across all storms or systems (see below).

While the assigned storm_id can be used to group forecasts from known storms, users should query for forecasts based on spatio/temporal bounding boxes to be sure of retrieving all forecasts for a given weather system.

Non-unique forecast_ids

The forecast_id field may not necessarily be unique to a given forecasted system. ECMWF followings the {initialization_time}_{latitude}_{longitude} convention for creating these IDs. We have observed some IDs with the 00N_00E (eg. '2020011512_00N_00E'), which appear to capture marginal systems around the world. IDs with non-zero coordinate suffixes appear to more reliably be unique to a given system.

Possibility of multiple storm_ids for the same system in ECMWF data

It is expected that systems with forecasts starting in different basins will have multiple storm_ids. For example, see esther_sp_2020/esther_si_2020, and lisa_ep_2022/lisa_na_2022. This happens because the genesis basin of a forecast is included in its ID. Use caution in your analysis when investigating a storm that is close to a basin boundary, or when your study area is close to a basin boundary. In these cases, queries by a spatial bounding box may be more appropriate.

Handling storms that cross the antimeridian

All points in tracks tables are normalized to the [-180, 180] longitude range. As such, analyses such as distance calculations close to the antimeridian may not return results as expected. The joining of multiple points into tracks for these storms may also need to be handled separately for points on either side of the antimeridian.

Duplicate forecasts in ECMWF data

Historical ECMWF tracks may have duplicate records for the same issued date. These duplicates come directly from the source cxml files, and may have slightly different positional or intensity information in the track. These duplicates are preserved in the database. See forecast_id='2022123112_310S_682E' for an example.

Additional Resources