API Reference

IBTrACS Data Processing

The ibtracs module provides utilities for downloading, loading, and processing IBTrACS (International Best Track Archive for Climate Stewardship) tropical cyclone data.

Data Loading

ocha_lens.ibtracs.download_ibtracs(dataset='ALL', save_dir='storm')

Download IBTrACS data to a specified or temporary directory.

Parameters:
  • dataset ({"ALL", "ACTIVE", "last3years", "EP", "NA", "NI", "SA", "SI", "SP", "WP"}, default "ALL") – Which IBTrACS dataset to download:
      - “ALL”: Complete historical record
      - “ACTIVE”: Records for active storms only
      - “last3years”: Records from the past three years
      - “EP”: Eastern North Pacific basin
      - “NA”: North Atlantic basin
      - “NI”: North Indian basin
      - “SA”: South Atlantic basin
      - “SI”: South Indian basin
      - “SP”: South Pacific basin
      - “WP”: Western North Pacific basin

  • save_dir (str, optional) – Directory to download the file to. Defaults to "storm".

Returns:

Path to the downloaded file

Return type:

Path

ocha_lens.ibtracs.load_ibtracs(file_path=None, dataset='ALL')

Load IBTrACS data from a NetCDF file, or download the data to a temporary directory first if no file is given.

Parameters:
  • file_path (str, optional) – Path to the IBTrACS NetCDF file. If None, downloads the file to a temp directory.

  • dataset ({"ALL", "ACTIVE", "last3years", "EP", "NA", "NI", "SA", "SI", "SP", "WP"}, default "ALL") – Which IBTrACS dataset to download:
      - “ALL”: Complete historical record
      - “ACTIVE”: Records for active storms only
      - “last3years”: Records from the past three years
      - “EP”: Eastern North Pacific basin
      - “NA”: North Atlantic basin
      - “NI”: North Indian basin
      - “SA”: South Atlantic basin
      - “SI”: South Indian basin
      - “SP”: South Pacific basin
      - “WP”: Western North Pacific basin

    Only used if file_path is None.

Returns:

Dataset containing IBTrACS data with dimensions (storm, date_time, quadrant)

Return type:

xarray.Dataset

Track Data Extraction

ocha_lens.ibtracs.get_tracks(ds, track_type='all')

Extract track data from IBTrACS source data. Be cautious when comparing wind speeds between storms reported by different providers (e.g., as may be the case with provisional vs. best tracks): providers use different averaging periods for sustained winds (for example, 1-minute vs. 10-minute averages), so their values are not directly comparable.

Parameters:
  • ds (xarray.Dataset) – IBTrACS dataset containing storm track data

  • track_type ({"provisional", "best", "all"}, default "all") – Which subset of tracks to return

Returns:

DataFrame containing track data with standardized column names

Return type:

pandas.DataFrame

Storm Metadata

ocha_lens.ibtracs.get_storms(ds)

Extract storm metadata from IBTrACS dataset.

Creates a dataset with one row per storm containing identifying information. This provides a summary of all storms in the dataset with their basic metadata.

Parameters:

ds (xarray.Dataset) – IBTrACS dataset containing storm track data

Returns:

DataFrame containing storm metadata with one row per storm

Return type:

pandas.DataFrame

Notes

The function takes the first available metadata for each storm when multiple records exist. This works because storm metadata is generally consistent across a storm’s lifetime.
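The "first available metadata" rule can be sketched in plain Python as below. This is a hypothetical illustration, not the ocha_lens implementation: records are assumed to be dicts keyed by a storm ID field (here "sid"), and the field names are illustrative. For each storm, every field keeps the first value that is not None, which is the same behaviour as pandas `GroupBy.first()` with its default of skipping missing values.

```python
from typing import Any, Dict, Iterable, List


def first_available_metadata(
    records: Iterable[Dict[str, Any]],
    fields: List[str],
    key: str = "sid",
) -> Dict[Any, Dict[str, Any]]:
    """Collapse per-time-step records into one metadata dict per storm.

    Hypothetical helper: for each storm (grouped by `key`), each field
    takes the first value that is not None, in record order.
    """
    storms: Dict[Any, Dict[str, Any]] = {}
    for rec in records:
        # Start every storm with all fields missing.
        meta = storms.setdefault(rec[key], {f: None for f in fields})
        for f in fields:
            # Only fill a field that is still missing, so the first available value wins.
            if meta[f] is None and rec.get(f) is not None:
                meta[f] = rec[f]
    return storms
```

Because storm metadata is generally constant over a storm's lifetime, taking the first non-missing value is usually equivalent to taking any non-missing value.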

Utility Functions

ocha_lens.ibtracs.normalize_radii(df, radii_cols=None)

Convert radii data from separate quadrant rows to list format.

This function converts radius data that’s stored with separate rows for each quadrant into a single row per storm point with radius values stored as lists.

Parameters:
  • df (pandas.DataFrame) – DataFrame containing storm track data with radii columns and quadrant information

  • radii_cols (list of str, optional) – List of column names containing radii data. If None, defaults to [“r34”, “r50”, “r64”]

Returns:

DataFrame with one row per track point, where each radii column holds a list of the values for the four quadrants (TODO: confirm the quadrant ordering)

Return type:

pandas.DataFrame
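The row-to-list conversion can be sketched in plain Python as follows. This is a hypothetical illustration rather than the library's code: rows are assumed to be dicts, a track point is assumed to be identified by ("sid", "valid_time"), and the quadrant column is assumed to be called "quadrant". Since the quadrant ordering of the real function is still unconfirmed, the sketch simply keeps the order in which the quadrant rows appear.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple


def normalize_radii_rows(
    rows: List[Dict[str, Any]],
    radii_cols: Optional[Sequence[str]] = None,
    point_keys: Sequence[str] = ("sid", "valid_time"),
) -> List[Dict[str, Any]]:
    """Collapse one-row-per-quadrant records into one row per track point.

    Hypothetical helper: each radii column becomes a list of per-quadrant
    values, in the order the quadrant rows appear for that point.
    """
    if radii_cols is None:
        radii_cols = ["r34", "r50", "r64"]
    points: Dict[Tuple[Any, ...], Dict[str, Any]] = {}
    for row in rows:
        key = tuple(row[k] for k in point_keys)
        if key not in points:
            # Copy the non-radii fields (minus the quadrant label) from the point's first row.
            base = {k: v for k, v in row.items() if k not in radii_cols and k != "quadrant"}
            base.update({col: [] for col in radii_cols})
            points[key] = base
        for col in radii_cols:
            points[key][col].append(row[col])
    return list(points.values())
```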

ECMWF Storm Data Processing

The ecmwf_storm module provides utilities for downloading, loading, and processing ECMWF tropical cyclone forecasts.

Data Loading

ocha_lens.ecmwf_storm.download_forecasts(date, cache_dir='storm', use_cache=False, skip_if_missing=False, stage='local')

Download historical ECMWF tropical cyclone forecast data from TIGGE, in XML (CXML) format, from https://rda.ucar.edu/datasets/d330003/dataaccess/#

Data can be saved locally or uploaded to Azure blob storage depending on the stage parameter.

Parameters:
  • date (datetime) – The datetime for which to download forecast data

  • cache_dir (str, default "storm") – Name of the directory in which to store raw CXML files, or the name of the Azure container if stage is “dev” or “prod”. Must be a single name rather than a full path. (#TODO: consider allowing full paths?) When writing to Azure, the container must already exist.

  • use_cache (bool, default False) – Whether to check for existing files before downloading

  • skip_if_missing (bool, default False) – If True, skip the file instead of downloading it when it does not exist on the server

  • stage ({"dev", "prod", "local"}, default "local") – Where to save the downloaded data:
      - “local”: Save to the local filesystem
      - “dev”: Upload to development Azure blob storage
      - “prod”: Upload to production Azure blob storage

Returns:

Path to the downloaded file if successful, None if download failed

Return type:

Path or None
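The use_cache behaviour (check for an existing file before downloading) follows a common cache-or-download pattern, sketched below for the local stage only. This is a hypothetical helper, not the library's code: `fetch` stands in for whatever performs the actual download and is assumed to return the file contents, or None on failure, matching the documented "None if download failed".

```python
from pathlib import Path
from typing import Callable, Optional


def cached_download(
    target: Path,
    fetch: Callable[[], Optional[bytes]],
    use_cache: bool = False,
) -> Optional[Path]:
    """Return `target`, downloading it first unless a cached copy can be reused.

    Hypothetical helper: `fetch` returns the file contents, or None if the
    download failed.
    """
    if use_cache and target.exists():
        return target  # reuse the existing file without any network call
    content = fetch()
    if content is None:
        return None  # download failed
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)
    return target
```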

ocha_lens.ecmwf_storm.load_forecasts(start_date=None, end_date=None, cache_dir='storm', use_cache=True, skip_if_missing=False, stage='local')

Load ECMWF tropical cyclone hindcast data for a date range.

Downloads and processes ECMWF forecast data from TIGGE for the specified date range. Data is downloaded at 12-hour intervals and processed into a standardized format.

By default, downloaded files are saved to the local “storm/” directory and loaded from there if they already exist. Optionally, data can be saved to or loaded from Azure blob storage containers by setting the stage parameter.

Parameters:
  • start_date (datetime, optional) – Start date for data retrieval. If None, defaults to yesterday

  • end_date (datetime, optional) – End date for data retrieval. If None, defaults to yesterday

  • cache_dir (str, default "storm") – Name of the directory in which to store raw CXML files, or the name of the Azure container if stage is “dev” or “prod”. Must be a single name rather than a full path. (#TODO: consider allowing full paths?) When writing to Azure, the container must already exist.

  • use_cache (bool, default True) – Whether to use cached files if they exist

  • skip_if_missing (bool, default False) – Whether to skip dates where files are missing on the server. Set to True if you’re pulling from what you know is a full cache.

  • stage ({"dev", "prod", "local"}, default "local") – Storage location for downloaded files. “dev” or “prod” refer to internal Azure blob storage containers.

Returns:

DataFrame containing processed forecast data with columns including issued_time, valid_time, latitude, longitude, pressure, wind_speed, etc. Returns None if no data is available for the specified date range

Return type:

pandas.DataFrame or None
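The 12-hour download schedule can be illustrated with the sketch below, which lists candidate issuance times between start_date and end_date. This is a hypothetical helper, not the library's code, and it assumes issuances at 00 UTC and 12 UTC on every day of the range (both ends inclusive, whole days). Datetimes are naive and treated as UTC, and "yesterday" is computed from the local date.

```python
from datetime import date, datetime, time, timedelta
from typing import List, Optional


def issuance_times(
    start_date: Optional[datetime] = None,
    end_date: Optional[datetime] = None,
) -> List[datetime]:
    """List 00 UTC and 12 UTC issuance times from start_date to end_date.

    Hypothetical helper: both bounds default to yesterday, and every day in
    the range contributes its 00 and 12 UTC cycles.
    """
    yesterday = date.today() - timedelta(days=1)
    start = start_date.date() if start_date else yesterday
    end = end_date.date() if end_date else yesterday
    times: List[datetime] = []
    current = datetime.combine(start, time(0))
    last = datetime.combine(end, time(12))
    while current <= last:
        times.append(current)
        current += timedelta(hours=12)  # step to the next 12-hourly cycle
    return times
```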

Track Data Extraction

ocha_lens.ecmwf_storm.get_tracks(df)

Extract tropical cyclone track data from ECMWF forecast data.

Processes ECMWF forecast data to create a tracks dataset with individual forecast points as rows. Each point contains storm information, forecast metadata, and geometric location data.

Parameters:

df (pandas.DataFrame) – DataFrame containing processed ECMWF forecast data

Returns:

GeoDataFrame containing track data with standardized column names and geometry points for each location

Return type:

geopandas.GeoDataFrame

Storm Metadata

ocha_lens.ecmwf_storm.get_storms(df)

Processes ECMWF tropical cyclone forecast data to create a storms dataset with one row per storm containing identifying information. Only storms with names are included in the output.

Parameters:

df (pandas.DataFrame) – DataFrame containing raw ECMWF forecast data

Returns:

DataFrame containing storm metadata with one row per storm

Return type:

pandas.DataFrame

ocha_lens.ecmwf_storm.get_forecasts(df)

Processes ECMWF tropical cyclone forecast data to create a forecasts dataset with one row per forecast containing identifying information. Only storms with names are included in the output.

Parameters:

df (pandas.DataFrame) – DataFrame containing raw ECMWF forecast data

Returns:

DataFrame containing forecast metadata with one row per forecast

Return type:

pandas.DataFrame

Notes

Storm IDs are created using the format “{name/number}_{basin}_{season}”. For storms with multiple forecasts, metadata is taken from the most recent forecast. Season calculation accounts for Southern Hemisphere cyclone seasons.
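The ID format and the Southern Hemisphere season adjustment can be sketched as below. This is a hypothetical illustration, not the library's implementation. It assumes that Northern Hemisphere seasons follow the calendar year, that Southern Hemisphere seasons run July to June and are labelled by the year in which they end, and that the hemisphere is decided from the latitude of the forecast point.

```python
from datetime import datetime


def cyclone_season(when: datetime, latitude: float) -> int:
    """Return the cyclone season year for a forecast point.

    Assumption: Southern Hemisphere storms from July onwards belong to the
    season labelled with the following calendar year.
    """
    if latitude < 0 and when.month >= 7:
        return when.year + 1
    return when.year


def make_storm_id(name: str, basin: str, when: datetime, latitude: float) -> str:
    """Build an ID of the form "{name/number}_{basin}_{season}"."""
    return f"{name}_{basin}_{cyclone_season(when, latitude)}"
```

For example, a Southern Hemisphere storm first forecast in December 2023 and one forecast in February 2024 would both fall in season 2024 under this assumption.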

NHC Tropical Cyclone Data Processing

The nhc module provides utilities for downloading, loading, and processing National Hurricane Center (NHC) and Central Pacific Hurricane Center (CPHC) tropical cyclone forecast and observation data.

Data Loading

ocha_lens.nhc.download_nhc(cache_dir='storm', use_cache=False)

Download current NHC storm data in JSON format.

Fetches active storm data from the National Hurricane Center’s CurrentStorms.json API and saves it to the local cache directory. Files are named using the latest forecast issuance time.

Parameters:
  • cache_dir (str, default "storm") – Directory to store raw JSON files

  • use_cache (bool, default False) – Whether to use existing cached file if available

Returns:

Path to the downloaded JSON file, or None if the download failed or there are no active storms

Return type:

Path or None
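Naming the cache file after the latest issuance time amounts to taking the maximum of the storms' issuance timestamps. The sketch below assumes the timestamps are available as ISO 8601 strings (possibly ending in "Z"); the `nhc_{YYYYMMDDHHMM}.json` pattern is a hypothetical stand-in, not necessarily the name the library writes.

```python
from datetime import datetime
from typing import Iterable


def latest_issuance_filename(issue_times: Iterable[str], prefix: str = "nhc") -> str:
    """Name a cache file after the most recent issuance time.

    Hypothetical naming pattern; raises ValueError if `issue_times` is empty,
    i.e. when there are no active storms.
    """
    # Replace a trailing "Z" so older Python versions can parse UTC timestamps.
    parsed = [datetime.fromisoformat(t.replace("Z", "+00:00")) for t in issue_times]
    latest = max(parsed)
    return f"{prefix}_{latest:%Y%m%d%H%M}.json"
```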

ocha_lens.nhc.load_nhc(file_path=None, cache_dir='storm', use_cache=False, year=None, basin=None)

Load and process NHC storm data from CurrentStorms.json or historical archive.

Supports two modes:

  1. Current mode (default): downloads current storms from NHC CurrentStorms.json.

  2. Archive mode: downloads historical ATCF data when year is specified.

Parameters:
  • file_path (str, optional) – Path to NHC JSON file. If None, downloads data

  • cache_dir (str, default "storm") – Directory for caching downloaded files

  • use_cache (bool, default False) – Whether to use existing cached file if available

  • year (int, optional) – Year for archive mode (e.g., 2023). If specified, loads historical ATCF data instead of current JSON data

  • basin (str, optional) – Basin code for archive mode: “AL” (Atlantic), “EP” (Eastern Pacific), or “CP” (Central Pacific). If None, loads all basins

Returns:

DataFrame with combined observations and forecasts, or None if loading fails

Return type:

pd.DataFrame or None

ocha_lens.nhc.download_nhc_archive(year, basin='AL', cache_dir='storm', use_cache=True)

Download ATCF archive files for all storms in a given year and basin.

Queries the FTP server to find all storms available for the specified year and basin, then downloads only those files. Files are saved in the {cache_dir}/raw/atcf/ subdirectory using the archive naming convention a{basin}{number}{year}.dat (e.g., aal012023.dat).

For recent years (current year and previous year), files are downloaded from the aid_public directory. For older years, files are downloaded from the archive directory.

Parameters:
  • year (int) – Year to download (e.g., 2023, 2024, 2025)

  • basin (str, default "AL") – Basin code: “AL” (Atlantic), “EP” (Eastern Pacific), or “CP” (Central Pacific)

  • cache_dir (str, default "storm") – Directory to store downloaded files

  • use_cache (bool, default True) – Whether to use existing cached files if available

Returns:

Paths to downloaded ATCF files

Return type:

list of Path
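The naming convention and the choice between the aid_public and archive directories can be sketched as follows. These helpers are hypothetical and only build local paths and directory names; they do not reproduce the library's FTP URLs. The storm number is assumed to be zero-padded to two digits and the basin lowercased, as in the aal012023.dat example.

```python
from datetime import date
from pathlib import Path
from typing import Optional

VALID_BASINS = {"AL", "EP", "CP"}


def atcf_archive_path(year: int, number: int, basin: str = "AL", cache_dir: str = "storm") -> Path:
    """Local path for an ATCF file named a{basin}{number}{year}.dat (hypothetical helper)."""
    if basin.upper() not in VALID_BASINS:
        raise ValueError(f"Unsupported basin: {basin!r}")
    filename = f"a{basin.lower()}{number:02d}{year}.dat"
    return Path(cache_dir) / "raw" / "atcf" / filename


def remote_directory(year: int, today: Optional[date] = None) -> str:
    """Use "aid_public" for the current or previous year, otherwise "archive"."""
    current_year = (today or date.today()).year
    return "aid_public" if year >= current_year - 1 else "archive"
```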

Track Data Extraction

ocha_lens.nhc.get_tracks(df)

Extract track-level data from NHC DataFrame.

Creates a GeoDataFrame with one row per track point (observation or forecast), including geometry for spatial analysis.

Parameters:

df (pd.DataFrame) – DataFrame from load_nhc()
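
Returns:

GeoDataFrame with one row per track point (observation or forecast) and a point geometry for each location
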
Return type:

gpd.GeoDataFrame

Storm Metadata

ocha_lens.nhc.get_storms(df)

Extract storm metadata from NHC DataFrame.

Creates a dataset with one row per storm containing identifying information. This provides a summary of all storms in the dataset with their basic metadata.

Parameters:

df (pd.DataFrame) – DataFrame from load_nhc()

Returns:

Storm metadata with schema validation applied

Return type:

pd.DataFrame

Wind Speed Probability

ocha_lens.nhc.get_wsp(issued_time=None, start=None, end=None, cache_dir='storm', use_cache=True)

Load NHC 5km wind speed probability polygons.

Three modes:

  1. Current (default, no arguments): fetches the latest issuance from CurrentStorms.json. Only available when storms are active.

  2. Single issuance: pass issued_time as YYYYMMDDHH or ISO timestamp.

  3. Archive range: pass start (and optionally end) to fetch all available issuances in the date range from the NHC GIS archive.

Parameters:
  • issued_time (str, optional) – Single issuance timestamp (YYYYMMDDHH or ISO format, e.g. ‘2023082200’).

  • start (str, optional) – Start of date range for archive mode (YYYYMMDDHH or ISO format).

  • end (str, optional) – End of date range. Defaults to now. Only used with start.

  • cache_dir (str, default "storm") – Directory for cached zip files (used in archive mode).

  • use_cache (bool, default True) – Whether to use cached zip files if available.

Returns:

Columns: issued_time, wind_threshold_kt, percentage, geometry (EPSG:4326). One row per (issued_time, wind threshold, probability band). Empty if no data is available.

Return type:

gpd.GeoDataFrame
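Since issued_time, start, and end accept either YYYYMMDDHH or ISO format, and end defaults to now, the timestamp handling for the single-issuance and archive-range modes can be sketched as below. These helpers are hypothetical, not the library's parsing code, and they assume timestamps without a time zone are in UTC.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Optional


def parse_issued_time(value: str) -> datetime:
    """Parse YYYYMMDDHH or an ISO timestamp into a UTC-aware datetime."""
    if len(value) == 10 and value.isdigit():
        dt = datetime.strptime(value, "%Y%m%d%H")
    else:
        dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Assumption: naive timestamps are UTC; aware ones are converted to UTC.
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def in_range(issued: Iterable[str], start: str, end: Optional[str] = None) -> List[datetime]:
    """Keep issuances between start and end (inclusive); end defaults to now."""
    lo = parse_issued_time(start)
    hi = parse_issued_time(end) if end else datetime.now(timezone.utc)
    return sorted(t for t in map(parse_issued_time, issued) if lo <= t <= hi)
```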