API Reference
IBTrACS Data Processing
The ibtracs module provides utilities for downloading, loading, and processing IBTrACS (International Best Track Archive for Climate Stewardship) tropical cyclone data.
Data Loading
- ocha_lens.ibtracs.download_ibtracs(dataset='ALL', save_dir='storm')
Download IBTrACS data to a specified or temporary directory.
- Parameters:
dataset ({"ALL", "ACTIVE", "last3years", "EP", "NA", "NI", "SA", "SI", "SP", "WP"}, default "ALL") – Which IBTrACS dataset to download:
  - “ALL”: Complete historical record
  - “ACTIVE”: Records for active storms only
  - “last3years”: Records from the past three years
  - “EP”: Eastern North Pacific basin
  - “NA”: North Atlantic basin
  - “NI”: North Indian basin
  - “SA”: South Atlantic basin
  - “SI”: South Indian basin
  - “SP”: South Pacific basin
  - “WP”: Western North Pacific basin
save_dir (str, optional) – Directory to download the file to. Defaults to "storm".
- Returns:
Path to the downloaded file
- Return type:
Path
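Each dataset option corresponds to a separate IBTrACS NetCDF file. The following is a minimal sketch of validating the option and building a download URL, assuming NCEI's public v04r01 access directory and its IBTrACS.<dataset>.v04r01.nc file naming. The URL, the version string, and the helpers ibtracs_url and local_target are assumptions for illustration; ocha_lens may locate files differently.

```python
from pathlib import Path

# Allowed values of the dataset parameter, as listed above (note "last3years" is lowercase).
IBTRACS_DATASETS = {"ALL", "ACTIVE", "last3years", "EP", "NA", "NI", "SA", "SI", "SP", "WP"}

# Assumption: NCEI's public IBTrACS v04r01 NetCDF directory; not taken from ocha_lens.
BASE_URL = (
    "https://www.ncei.noaa.gov/data/"
    "international-best-track-archive-for-climate-stewardship-ibtracs/"
    "v04r01/access/netcdf/"
)


def ibtracs_url(dataset: str = "ALL") -> str:
    """Hypothetical helper: validate dataset and return its assumed download URL."""
    if dataset not in IBTRACS_DATASETS:
        raise ValueError(f"unknown dataset {dataset!r}; expected one of {sorted(IBTRACS_DATASETS)}")
    return f"{BASE_URL}IBTrACS.{dataset}.v04r01.nc"


def local_target(dataset: str = "ALL", save_dir: str = "storm") -> Path:
    """Hypothetical helper: where the downloaded file would land inside save_dir."""
    filename = ibtracs_url(dataset).rsplit("/", 1)[-1]
    return Path(save_dir) / filename


print(ibtracs_url("NA"))
print(local_target("last3years"))
```

Validating against the documented set up front turns a typo such as "na" into an immediate error instead of a failed request.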
- ocha_lens.ibtracs.load_ibtracs(file_path=None, dataset='ALL')
Load IBTrACS data from a NetCDF file, or download the file to a temporary directory and load it from there.
- Parameters:
file_path (str, optional) – Path to the IBTrACS NetCDF file. If None, the file is downloaded to a temporary directory.
dataset ({"ALL", "ACTIVE", "last3years", "EP", "NA", "NI", "SA", "SI", "SP", "WP"}, default "ALL") – Which IBTrACS dataset to download:
  - “ALL”: Complete historical record
  - “ACTIVE”: Records for active storms only
  - “last3years”: Records from the past three years
  - “EP”: Eastern North Pacific basin
  - “NA”: North Atlantic basin
  - “NI”: North Indian basin
  - “SA”: South Atlantic basin
  - “SI”: South Indian basin
  - “SP”: South Pacific basin
  - “WP”: Western North Pacific basin
Only used if file_path is None.
- Returns:
Dataset containing IBTrACS data with dimensions (storm, date_time, quadrant)
- Return type:
xarray.Dataset
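load_ibtracs reads file_path when one is given and otherwise downloads the requested dataset to a temporary directory, ignoring dataset in the first case. A sketch of that path selection follows. resolve_ibtracs_path and fake_download are hypothetical, and the real function also opens the resulting file as an xarray.Dataset, which is omitted here to keep the example dependency-free.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional


def resolve_ibtracs_path(
    file_path: Optional[str],
    dataset: str,
    download: Callable[[str, str], Path],
) -> Path:
    """Hypothetical sketch of how load_ibtracs picks the file to open."""
    if file_path is not None:
        # An explicit path wins; dataset is ignored in this case.
        return Path(file_path)
    # No path given: download into a fresh temporary directory.
    tmp_dir = tempfile.mkdtemp(prefix="ibtracs_")
    return download(dataset, tmp_dir)


def fake_download(dataset: str, save_dir: str) -> Path:
    # Stand-in for a real download: only builds the would-be file path.
    return Path(save_dir) / f"IBTrACS.{dataset}.nc"


print(resolve_ibtracs_path("local/IBTrACS.ALL.nc", "NA", fake_download))
print(resolve_ibtracs_path(None, "NA", fake_download))
```

Injecting the downloader keeps the branching logic testable without network access.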
Track Data Extraction
- ocha_lens.ibtracs.get_tracks(ds, track_type='all')
Extract track data from IBTrACS source data. Use caution when comparing wind speeds between storms reported by different providers (e.g., provisional versus best tracks), because providers use different wind-averaging periods.
- Parameters:
ds (xarray.Dataset) – IBTrACS dataset containing storm track data
track_type ({"provisional", "best", "all"}, default "all") – Which subset of tracks to return
- Returns:
DataFrame containing track data with standardized column names
- Return type:
pandas.DataFrame
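The track_type option can be pictured as a simple filter. This sketch works on plain dictionaries and assumes each track point carries a boolean provisional flag; that field name and the helper select_tracks are hypothetical, and the reference does not say how ocha_lens tells provisional tracks from best tracks.

```python
from typing import Dict, List

VALID_TRACK_TYPES = {"provisional", "best", "all"}


def select_tracks(points: List[Dict], track_type: str = "all") -> List[Dict]:
    """Hypothetical sketch of the track_type filter."""
    if track_type not in VALID_TRACK_TYPES:
        raise ValueError(f"track_type must be one of {sorted(VALID_TRACK_TYPES)}")
    if track_type == "all":
        return list(points)
    want_provisional = track_type == "provisional"
    # Assumption: every point has a boolean "provisional" flag.
    return [p for p in points if p["provisional"] == want_provisional]


# Toy track points. Wind speeds from different providers may use different averaging periods.
points = [
    {"sid": "S1", "wind_speed": 65, "provisional": False},
    {"sid": "S2", "wind_speed": 70, "provisional": True},
]
print([p["sid"] for p in select_tracks(points, "best")])
```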
Storm Metadata
- ocha_lens.ibtracs.get_storms(ds)
Extract storm metadata from IBTrACS dataset.
Creates a DataFrame with one row per storm containing identifying information. This provides a summary of all storms in the dataset with their basic metadata.
- Parameters:
ds (xarray.Dataset) – IBTrACS dataset containing storm track data
- Returns:
DataFrame containing storm metadata with one row per storm
- Return type:
pandas.DataFrame
Notes
The function takes the first available metadata for each storm when multiple records exist. This works because storm metadata is generally consistent across a storm’s lifetime.
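One reading of "first available metadata", sketched here on plain dictionaries, is the first non-missing value of each field in record order. first_available_metadata and the toy records are hypothetical; the real function operates on the xarray dataset.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def first_available_metadata(records: List[Record], key: str = "sid") -> List[Record]:
    """Hypothetical sketch: one row per storm, keeping the first non-missing
    value of each field in the order records are seen."""
    storms: Dict[str, Record] = {}
    for rec in records:
        row = storms.setdefault(rec[key], {})
        for field, value in rec.items():
            # Fill a field only while it is still missing (absent or None).
            if row.get(field) is None:
                row[field] = value
    return list(storms.values())


# Toy records: the first S1 record lacks a name.
records = [
    {"sid": "S1", "name": None, "basin": "NA"},
    {"sid": "S1", "name": "ALPHA", "basin": "NA"},
    {"sid": "S2", "name": "BETA", "basin": "EP"},
]
print(first_available_metadata(records))
```

Because dictionaries keep insertion order, storms come out in the order they first appear.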
Utility Functions
- ocha_lens.ibtracs.normalize_radii(df, radii_cols=None)
Convert radii data from separate quadrant rows to list format.
This function converts radius data that’s stored with separate rows for each quadrant into a single row per storm point with radius values stored as lists.
- Parameters:
df (pandas.DataFrame) – DataFrame containing storm track data with radii columns and quadrant information
radii_cols (list of str, optional) – List of column names containing radii data. If None, defaults to [“r34”, “r50”, “r64”]
- Returns:
DataFrame with one row per storm point, in which each radii column holds a list of values for the 4 quadrants (TODO: confirm the quadrant ordering)
- Return type:
pandas.DataFrame
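The quadrant-to-list conversion can be sketched with the standard library. The quadrant labels and the NE, SE, SW, NW order used here are assumptions (the reference itself leaves the ordering unconfirmed), as are the sid, valid_time, and quadrant field names and the helper radii_to_lists. Imposing an explicit order makes list positions deterministic regardless of input row order.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Assumption: quadrant labels and order; the reference leaves the ordering unconfirmed.
QUADRANT_ORDER = ("NE", "SE", "SW", "NW")


def radii_to_lists(rows: List[Dict], radii_cols: Sequence[str] = ("r34", "r50", "r64")) -> List[Dict]:
    """Hypothetical sketch: one row per quadrant -> one row per track point,
    with a four-element list (in QUADRANT_ORDER) for each radii column."""
    by_point: Dict[Tuple, Dict[str, Dict]] = defaultdict(dict)
    for row in rows:
        by_point[(row["sid"], row["valid_time"])][row["quadrant"]] = row
    result = []
    for (sid, valid_time), quads in by_point.items():
        point = {"sid": sid, "valid_time": valid_time}
        for col in radii_cols:
            # Quadrants absent from the input become None, so each list has length 4.
            point[col] = [quads[q].get(col) if q in quads else None for q in QUADRANT_ORDER]
        result.append(point)
    return result


rows = [
    {"sid": "S1", "valid_time": "2023-08-30T00", "quadrant": "SW", "r34": 60},
    {"sid": "S1", "valid_time": "2023-08-30T00", "quadrant": "NE", "r34": 120},
    {"sid": "S1", "valid_time": "2023-08-30T00", "quadrant": "NW", "r34": 80},
    {"sid": "S1", "valid_time": "2023-08-30T00", "quadrant": "SE", "r34": 100},
]
print(radii_to_lists(rows, radii_cols=["r34"]))
```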
ECMWF Storm Data Processing
The ecmwf_storm module provides utilities for downloading, loading, and processing ECMWF cyclone forecasts.
Data Loading
- ocha_lens.ecmwf_storm.download_forecasts(date, cache_dir='storm', use_cache=False, skip_if_missing=False, stage='local')
Download historical ECMWF data from TIGGE in XML format from https://rda.ucar.edu/datasets/d330003/dataaccess/#
Data can be saved locally or uploaded to Azure blob storage depending on the stage parameter.
- Parameters:
date (datetime) – The datetime for which to download forecast data
cache_dir (str, default "storm") – Directory or container name in which to store raw CXML files. Refers to a container name if stage is “dev” or “prod”. Expected to be a single name rather than a full path (#TODO: consider allowing full paths?). If writing to Azure, the container must already exist.
use_cache (bool, default False) – Whether to check for existing files before downloading
skip_if_missing (bool, default False) – If True, skip the download if the file does not exist on the server
stage ({"dev", "prod", "local"}, default "local") – Where to save the downloaded data:
  - “local”: Save to local filesystem
  - “dev”: Upload to development Azure blob storage
  - “prod”: Upload to production Azure blob storage
- Returns:
Path to the downloaded file if successful, None if download failed
- Return type:
Path or None
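For download_forecasts, stage decides how cache_dir is interpreted. The sketch below only describes the destination rather than writing anything. describe_target, the "stage:container/file" notation, and the path-separator check are hypothetical illustrations of the documented rules; a real Azure upload would also need credentials and an existing container.

```python
from pathlib import Path

VALID_STAGES = {"local", "dev", "prod"}


def describe_target(filename: str, cache_dir: str = "storm", stage: str = "local") -> str:
    """Hypothetical sketch: where download_forecasts would put filename."""
    if stage not in VALID_STAGES:
        raise ValueError(f"stage must be one of {sorted(VALID_STAGES)}")
    if "/" in cache_dir or "\\" in cache_dir:
        # cache_dir is documented as a single name, not a full path.
        raise ValueError("cache_dir should be a single name, not a path")
    if stage == "local":
        # Local stage: cache_dir is a directory on the local filesystem.
        return str(Path(cache_dir) / filename)
    # dev/prod: cache_dir names an Azure blob container that must already exist.
    return f"{stage}:{cache_dir}/{filename}"


print(describe_target("example.xml"))
print(describe_target("example.xml", stage="prod"))
```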
- ocha_lens.ecmwf_storm.load_forecasts(start_date=None, end_date=None, cache_dir='storm', use_cache=True, skip_if_missing=False, stage='local')
Load ECMWF tropical cyclone hindcast data for a date range.
Downloads and processes ECMWF forecast data from TIGGE for the specified date range. Data is downloaded at 12-hour intervals and processed into a standardized format.
By default, downloaded files are saved locally to the “storm/” directory and loaded from there if they already exist. Optionally, data can be saved to or loaded from Azure blob storage containers by setting the stage parameter.
- Parameters:
start_date (datetime, optional) – Start date for data retrieval. If None, defaults to yesterday
end_date (datetime, optional) – End date for data retrieval. If None, defaults to yesterday
cache_dir (str, default "storm") – Directory or container name in which to store raw CXML files. Refers to a container name if stage is “dev” or “prod”. Expected to be a single name rather than a full path (#TODO: consider allowing full paths?). If writing to Azure, the container must already exist.
use_cache (bool, default True) – Whether to use cached files if they exist
skip_if_missing (bool, default False) – Whether to skip dates whose files are missing on the server. Set to True when loading from a cache you know to be complete.
stage ({"dev", "prod", "local"}, default "local") – Storage location for downloaded files. “dev” or “prod” refer to internal Azure blob storage containers.
- Returns:
DataFrame containing processed forecast data with columns including issued_time, valid_time, latitude, longitude, pressure, wind_speed, etc. Returns None if no data is available for the specified date range
- Return type:
pandas.DataFrame or None
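load_forecasts works through the date range at 12-hour intervals. The sketch below generates those times. It assumes issuances at 00 and 12 UTC (naive datetimes treated as UTC) and reads start_date and end_date at day granularity; both are assumptions, as is the helper name issuance_times.

```python
from datetime import date, datetime, time, timedelta
from typing import List, Optional


def issuance_times(start_date: Optional[datetime] = None, end_date: Optional[datetime] = None) -> List[datetime]:
    """Hypothetical sketch: 00 and 12 UTC times from start_date's day to end_date's day, inclusive."""
    yesterday = date.today() - timedelta(days=1)
    # Both bounds default to yesterday, as documented above.
    start_day = start_date.date() if start_date is not None else yesterday
    end_day = end_date.date() if end_date is not None else yesterday
    current = datetime.combine(start_day, time(0))
    last = datetime.combine(end_day, time(12))
    times = []
    while current <= last:
        times.append(current)
        current += timedelta(hours=12)
    return times


print(issuance_times(datetime(2023, 8, 28), datetime(2023, 8, 29)))
```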
Track Data Extraction
- ocha_lens.ecmwf_storm.get_tracks(df)
Extract tropical cyclone track data from ECMWF forecast data.
Processes ECMWF forecast data to create a tracks dataset with individual forecast points as rows. Each point contains storm information, forecast metadata, and geometric location data.
- Parameters:
df (pandas.DataFrame) – DataFrame containing processed ECMWF forecast data
- Returns:
GeoDataFrame containing track data with standardized column names and geometry points for each location
- Return type:
geopandas.GeoDataFrame
Storm Metadata
- ocha_lens.ecmwf_storm.get_storms(df)
Processes ECMWF tropical cyclone forecast data to create a storms dataset with one row per storm containing identifying information. Only storms with names are included in the output.
- Parameters:
df (pandas.DataFrame) – DataFrame containing raw ECMWF forecast data
- Returns:
DataFrame containing storm metadata with one row per storm
- Return type:
pandas.DataFrame
- ocha_lens.ecmwf_storm.get_forecasts(df)
Processes ECMWF tropical cyclone forecast data to create a forecasts dataset with one row per forecast containing identifying information. Only storms with names are included in the output.
- Parameters:
df (pandas.DataFrame) – DataFrame containing raw ECMWF forecast data
- Returns:
DataFrame containing forecast metadata with one row per forecast
- Return type:
pandas.DataFrame
Notes
Storm IDs are created using the format “{name/number}_{basin}_{season}”. For storms with multiple forecasts, metadata is taken from the most recent forecast. Season calculation accounts for Southern Hemisphere cyclone seasons.
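The storm ID format in the Notes can be sketched as follows. This assumes the common convention that a Southern Hemisphere season runs from July to June and is labelled by the year in which it ends. The July cutoff, the southern_hemisphere flag, the basin codes in the example, and the helper names are assumptions rather than ocha_lens internals.

```python
from datetime import datetime


def season(issued: datetime, southern_hemisphere: bool) -> int:
    """Season year. Assumption: Southern Hemisphere seasons run July to June
    and take the year in which they end; Northern Hemisphere uses the calendar year."""
    if southern_hemisphere and issued.month >= 7:
        return issued.year + 1
    return issued.year


def storm_id(name_or_number: str, basin: str, issued: datetime, southern_hemisphere: bool) -> str:
    """Hypothetical sketch of the {name/number}_{basin}_{season} format."""
    return f"{name_or_number}_{basin}_{season(issued, southern_hemisphere)}"


print(storm_id("ALPHA", "SI", datetime(2022, 12, 1), southern_hemisphere=True))
```

Under this convention, a South Indian storm in December 2022 and one in February 2023 share the 2023 season.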
NHC Tropical Cyclone Data Processing
The nhc module provides utilities for downloading, loading, and processing National Hurricane Center (NHC) and Central Pacific Hurricane Center (CPHC) tropical cyclone forecast and observation data.
Data Loading
- ocha_lens.nhc.download_nhc(cache_dir='storm', use_cache=False)
Download current NHC storm data in JSON format.
Fetches active storm data from the National Hurricane Center’s CurrentStorms.json API and saves it to the local cache directory. Files are named using the latest forecast issuance time.
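download_nhc names its cache file after the latest issuance time. The sketch below shows that step. It assumes each entry in the activeStorms list of CurrentStorms.json has an ISO 8601 lastUpdate timestamp. Those field names as used here, the nhc_current_YYYYMMDDHHMM.json pattern, and the helper current_cache_path are assumptions, not the library's actual naming.

```python
import json
from datetime import datetime
from pathlib import Path


def current_cache_path(payload: str, cache_dir: str = "storm") -> Path:
    """Hypothetical sketch: name the cache file after the latest lastUpdate time."""
    storms = json.loads(payload).get("activeStorms") or []
    if not storms:
        # No active storms means there is nothing to name the file after.
        raise ValueError("no active storms in payload")
    latest = max(
        datetime.fromisoformat(s["lastUpdate"].replace("Z", "+00:00")) for s in storms
    )
    return Path(cache_dir) / f"nhc_current_{latest:%Y%m%d%H%M}.json"


# Toy payload with two active storms.
sample = json.dumps({"activeStorms": [
    {"id": "al012023", "lastUpdate": "2023-08-29T21:00:00Z"},
    {"id": "ep022023", "lastUpdate": "2023-08-30T03:00:00Z"},
]})
print(current_cache_path(sample))
```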
- ocha_lens.nhc.load_nhc(file_path=None, cache_dir='storm', use_cache=False, year=None, basin=None)
Load and process NHC storm data from CurrentStorms.json or historical archive.
Supports two modes:
  1. Current mode (default): Downloads current storms from NHC CurrentStorms.json
  2. Archive mode: Downloads historical ATCF data when year is specified
- Parameters:
file_path (str, optional) – Path to NHC JSON file. If None, downloads data
cache_dir (str, default "storm") – Directory for caching downloaded files
use_cache (bool, default False) – Whether to use existing cached file if available
year (int, optional) – Year for archive mode (e.g., 2023). If specified, loads historical ATCF data instead of current JSON data
basin (str, optional) – Basin code for archive mode: “AL” (Atlantic), “EP” (Eastern Pacific), or “CP” (Central Pacific). If None, loads all basins
- Returns:
DataFrame with combined observations and forecasts, or None if loading fails
- Return type:
pd.DataFrame or None
- ocha_lens.nhc.download_nhc_archive(year, basin='AL', cache_dir='storm', use_cache=True)
Download ATCF archive files for all storms in a given year and basin.
Queries the FTP server to find all available storms for the specified year and basin, then downloads only those files. Files are saved with archive naming: a{basin}{number}{year}.dat (e.g., aal012023.dat) in the {cache_dir}/raw/atcf/ subdirectory.
For recent years (current year and previous year), files are downloaded from the aid_public directory. For older years, files are downloaded from the archive directory.
- Parameters:
year (int) – Year to download (e.g., 2023, 2024, 2025)
basin (str, default "AL") – Basin code: “AL” (Atlantic), “EP” (Eastern Pacific), or “CP” (Central Pacific)
cache_dir (str, default "storm") – Directory to store downloaded files
use_cache (bool, default True) – Whether to use existing cached files if available
- Returns:
Paths to downloaded ATCF files
- Return type:
list of Path
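The archive naming and directory rules above can be written out directly. In this sketch the storm number is zero-padded to two digits and the basin code is lowercased, matching the aal012023.dat example. atcf_filename, source_directory, and cache_path are hypothetical helpers, and the FTP host and full remote paths are deliberately left out.

```python
from datetime import date
from pathlib import Path
from typing import Optional

VALID_BASINS = {"AL", "EP", "CP"}


def atcf_filename(basin: str, number: int, year: int) -> str:
    """Hypothetical helper: a{basin}{number}{year}.dat, e.g. aal012023.dat."""
    if basin not in VALID_BASINS:
        raise ValueError(f"basin must be one of {sorted(VALID_BASINS)}")
    return f"a{basin.lower()}{number:02d}{year}.dat"


def source_directory(year: int, today: Optional[date] = None) -> str:
    """Current and previous year come from aid_public; older years from archive."""
    current_year = (today or date.today()).year
    return "aid_public" if year >= current_year - 1 else "archive"


def cache_path(basin: str, number: int, year: int, cache_dir: str = "storm") -> Path:
    """Local destination under {cache_dir}/raw/atcf/."""
    return Path(cache_dir) / "raw" / "atcf" / atcf_filename(basin, number, year)


print(atcf_filename("AL", 1, 2023), source_directory(2023, today=date(2025, 6, 1)))
```

Passing today explicitly keeps the year rule reproducible in tests.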
Track Data Extraction
Storm Metadata
- ocha_lens.nhc.get_storms(df)
Extract storm metadata from NHC DataFrame.
Creates a DataFrame with one row per storm containing identifying information. This provides a summary of all storms in the dataset with their basic metadata.
- Parameters:
df (pd.DataFrame) – DataFrame from load_nhc()
- Returns:
Storm metadata with schema validation applied
- Return type:
pd.DataFrame
Wind Speed Probability
- ocha_lens.nhc.get_wsp(issued_time=None, start=None, end=None, cache_dir='storm', use_cache=True)
Load NHC 5km wind speed probability polygons.
Three modes:
  - Current (default, no arguments): fetches the latest issuance from CurrentStorms.json. Only available when storms are active.
  - Single issuance: pass issued_time as a YYYYMMDDHH or ISO timestamp.
  - Archive range: pass start (and optionally end) to fetch all available issuances in the date range from the NHC GIS archive.
- Parameters:
issued_time (str, optional) – Single issuance timestamp (YYYYMMDDHH or ISO format, e.g. ‘2023082200’).
start (str, optional) – Start of date range for archive mode (YYYYMMDDHH or ISO format).
end (str, optional) – End of date range. Defaults to now. Only used with start.
cache_dir (str, default "storm") – Directory for cached zip files (used in archive mode).
use_cache (bool, default True) – Whether to use cached zip files if available.
- Returns:
Columns: issued_time, wind_threshold_kt, percentage, geometry (EPSG:4326). One row per (issued_time, wind threshold, probability band). Empty if no data is available.
- Return type:
gpd.GeoDataFrame
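get_wsp accepts timestamps as YYYYMMDDHH or ISO 8601 and picks one of three modes from its arguments. The sketch below covers both steps. parse_issued_time and wsp_mode are hypothetical. Treating naive timestamps as UTC is an assumption, and rejecting issued_time together with start is a choice made for this sketch, not documented behaviour.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_issued_time(value: str) -> datetime:
    """Hypothetical sketch: parse 'YYYYMMDDHH' or ISO 8601 into an aware UTC datetime."""
    if len(value) == 10 and value.isdigit():
        parsed = datetime.strptime(value, "%Y%m%d%H")
    else:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are UTC.
        return parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def wsp_mode(issued_time: Optional[str] = None, start: Optional[str] = None) -> str:
    """Hypothetical sketch of the three-way mode choice described above."""
    if issued_time is not None and start is not None:
        raise ValueError("pass issued_time or start, not both")
    if issued_time is not None:
        return "single"
    if start is not None:
        return "archive"
    return "current"


print(parse_issued_time("2023082200"), wsp_mode(start="2023082000"))
```

Normalizing every input to an aware UTC datetime means that the compact and ISO forms of the same issuance compare equal.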