API Reference

IBTrACS Data Processing

The ibtracs module provides utilities for downloading, loading, and processing IBTrACS (International Best Track Archive for Climate Stewardship) tropical cyclone data.

Data Loading

ocha_lens.ibtracs.download_ibtracs(dataset='ALL', save_dir='storm')

Download IBTrACS data to a specified or temporary directory.

Parameters:

dataset ({"ALL", "ACTIVE", "last3years", "EP", "NA", "NI", "SA", "SI", "SP", "WP"}, default "ALL") – Which IBTrACS dataset to download:
- “ALL”: Complete historical record
- “ACTIVE”: Records for active storms only
- “last3years”: Records from the past three years
- “EP”: Eastern North Pacific basin
- “NA”: North Atlantic basin
- “NI”: North Indian basin
- “SA”: South Atlantic basin
- “SI”: South Indian basin
- “SP”: South Pacific basin
- “WP”: Western North Pacific basin
save_dir (str, optional) – Directory to download to.

Returns:

Path to the downloaded file

Return type:

Path

ocha_lens.ibtracs.load_ibtracs(file_path=None, dataset='ALL')

Load IBTrACS data from NetCDF file or download to a temporary directory.

Parameters:

file_path (str, optional) – Path to the IBTrACS NetCDF file. If None, downloads the file to a temp directory.
dataset ({"ALL", "ACTIVE", "last3years", "EP", "NA", "NI", "SA", "SI", "SP", "WP"}, default "ALL") – Which IBTrACS dataset to download. Only used if file_path is None.
- “ALL”: Complete historical record
- “ACTIVE”: Records for active storms only
- “last3years”: Records from the past three years
- “EP”: Eastern North Pacific basin
- “NA”: North Atlantic basin
- “NI”: North Indian basin
- “SA”: South Atlantic basin
- “SI”: South Indian basin
- “SP”: South Pacific basin
- “WP”: Western North Pacific basin

Returns:

Dataset containing IBTrACS data with dimensions (storm, date_time, quadrant)

Return type:

xarray.Dataset

Track Data Extraction

ocha_lens.ibtracs.get_tracks(ds, track_type='all')

Extract track data from IBTrACS source data. Be cautious when comparing wind speed measurements from storms with different providers (e.g., provisional vs. best tracks), as different providers use different averaging periods.

Parameters:

ds (xarray.Dataset) – IBTrACS dataset containing storm track data
track_type ({"provisional", "best", "all"}) – Which subset of tracks to return

Returns:

DataFrame containing track data with standardized column names

Return type:

pandas.DataFrame

Storm Metadata

ocha_lens.ibtracs.get_storms(ds)

Extract storm metadata from IBTrACS dataset.

Creates a dataset with one row per storm containing identifying information. This provides a summary of all storms in the dataset with their basic metadata.

Parameters: ds (xarray.Dataset) – IBTrACS dataset containing storm track data
Returns: DataFrame containing storm metadata with one row per storm
Return type: pandas.DataFrame

Notes

The function takes the first available metadata for each storm when multiple records exist. This works because storm metadata is generally consistent across a storm’s lifetime.

Utility Functions

ocha_lens.ibtracs.normalize_radii(df, radii_cols=None)

Convert radii data from separate quadrant rows to list format.

This function converts radius data that’s stored with separate rows for each quadrant into a single row per storm point with radius values stored as lists.

Parameters:

df (pandas.DataFrame) – DataFrame containing storm track data with radii columns and quadrant information
radii_cols (list of str, optional) – List of column names containing radii data. If None, defaults to [“r34”, “r50”, “r64”]

Returns:

DataFrame with radii data converted to lists for each point where each list contains values for the 4 quadrants (TODO - Confirm the ordering)

Return type:

pandas.DataFrame
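
As a rough illustration of the reshaping described above, the following pure-Python sketch collapses per-quadrant rows into one row per storm point. It is not the library's implementation: the key columns (`sid`, `valid_time`), the `quadrant` column, and the helper name `collapse_quadrants` are assumptions made for the example, and the list order simply follows the order of the input rows, since the quadrant ordering is still to be confirmed.

```python
from collections import OrderedDict


def collapse_quadrants(rows, radii_cols=None, key_cols=("sid", "valid_time")):
    """Collapse one-row-per-quadrant records into one row per storm point.

    Hypothetical helper: the column names are illustrative assumptions.
    """
    radii_cols = radii_cols or ["r34", "r50", "r64"]
    grouped = OrderedDict()
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key not in grouped:
            # Start a new output row with an empty list per radius column.
            grouped[key] = {c: row[c] for c in key_cols}
            for col in radii_cols:
                grouped[key][col] = []
        for col in radii_cols:
            # Append in input order; missing values stay as None.
            grouped[key][col].append(row.get(col))
    return list(grouped.values())


# Four quadrant rows for a single storm point.
rows = [
    {"sid": "A", "valid_time": "t0", "quadrant": q, "r34": r, "r50": None, "r64": None}
    for q, r in enumerate([60, 50, 40, 55])
]
points = collapse_quadrants(rows)
print(points[0]["r34"])
```

Each output row keeps the identifying columns once and stores the four quadrant values of every radius column as a list.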

ECMWF Storm Data Processing

The ecmwf_storm module provides utilities for downloading, loading, and processing ECMWF cyclone forecasts.

Data Loading

ocha_lens.ecmwf_storm.download_forecasts(date, cache_dir='storm', use_cache=False, skip_if_missing=False, stage='local')

Download historical ECMWF data from TIGGE in XML format from https://rda.ucar.edu/datasets/d330003/dataaccess/#

Data can be saved locally or uploaded to Azure blob storage depending on the stage parameter.

Parameters:

date (datetime) – The datetime for which to download forecast data
cache_dir (str, default "storm") – Directory or container name to store raw cxml files. Refers to a container name if stage is “dev” or “prod”. Assumed to be a single string rather than a full path. (#TODO: consider allowing full paths?). If writing to Azure, the container must already exist.
use_cache (bool, default False) – Whether to check for existing files before downloading
skip_if_missing (bool, default False) – If True, skip the download when the file doesn’t exist on the server
stage ({"dev", "prod", "local"}, default "local") – Where to save the downloaded data: - “local”: Save to local filesystem - “dev”: Upload to development Azure blob storage - “prod”: Upload to production Azure blob storage

Returns:

Path to the downloaded file if successful, None if download failed

Return type:

Path or None

ocha_lens.ecmwf_storm.load_forecasts(start_date=None, end_date=None, cache_dir='storm', use_cache=True, skip_if_missing=False, stage='local')

Load ECMWF tropical cyclone hindcast data for a date range.

Downloads and processes ECMWF forecast data from TIGGE for the specified date range. Data is downloaded at 12-hour intervals and processed into a standardized format.

Default behaviour is to locally save downloaded files to “storm/” directory, and load from there if they already exist. Optionally, data can be saved to or loaded from Azure blob storage containers by setting the stage parameter.

Parameters:

start_date (datetime, optional) – Start date for data retrieval. If None, defaults to yesterday
end_date (datetime, optional) – End date for data retrieval. If None, defaults to yesterday
cache_dir (str, default "storm") – Directory or container name to store raw cxml files. Refers to a container name if stage is “dev” or “prod”. Assumed to be a single string rather than a full path. (#TODO: consider allowing full paths?) If writing to Azure, the container must already exist.
use_cache (bool, default True) – Whether to use cached files if they exist
skip_if_missing (bool, default False) – Whether to skip dates where files are missing on the server. Set to True if you’re pulling from what you know is a full cache.
stage ({"dev", "prod", "local"}, default "local") – Storage location for downloaded files. “dev” or “prod” refer to internal Azure blob storage containers.

Returns:

DataFrame containing processed forecast data with columns including issued_time, valid_time, latitude, longitude, pressure, wind_speed, etc. Returns None if no data is available for the specified date range

Return type:

pandas.DataFrame or None
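
To make the 12-hour download cadence concrete, here is a minimal sketch of how the issuance times for a date range could be enumerated. The helper `forecast_times` is hypothetical; it assumes the range is inclusive at both ends and that stepping starts at start_date itself, which may differ from how the library aligns its requests.

```python
from datetime import datetime, timedelta


def forecast_times(start_date, end_date, step_hours=12):
    """Yield datetimes from start_date to end_date (inclusive) every step_hours."""
    current = start_date
    step = timedelta(hours=step_hours)
    while current <= end_date:
        yield current
        current += step


# Assumed example range: 1 Sep 2024 00:00 through 2 Sep 2024 12:00.
times = list(forecast_times(datetime(2024, 9, 1, 0), datetime(2024, 9, 2, 12)))
for t in times:
    print(t.isoformat())
```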

Track Data Extraction

ocha_lens.ecmwf_storm.get_tracks(df)

Extract tropical cyclone track data from ECMWF forecast data.

Processes ECMWF forecast data to create a tracks dataset with individual forecast points as rows. Each point contains storm information, forecast metadata, and geometric location data.

Parameters: df (pandas.DataFrame) – DataFrame containing processed ECMWF forecast data
Returns: GeoDataFrame containing track data with standardized column names and geometry points for each location
Return type: geopandas.GeoDataFrame

Storm Metadata

ocha_lens.ecmwf_storm.get_storms(df)

Processes ECMWF tropical cyclone forecast data to create a storms dataset with one row per storm containing identifying information. Only storms with names are included in the output.

Parameters: df (pandas.DataFrame) – DataFrame containing raw ECMWF forecast data
Returns: DataFrame containing storm metadata with one row per storm
Return type: pandas.DataFrame

ocha_lens.ecmwf_storm.get_forecasts(df)

Processes ECMWF tropical cyclone forecast data to create a forecasts dataset with one row per forecast containing identifying information. Only storms with names are included in the output.

Parameters: df (pandas.DataFrame) – DataFrame containing raw ECMWF forecast data
Returns: DataFrame containing forecast metadata with one row per forecast
Return type: pandas.DataFrame

Notes

Storm IDs are created using the format “{name/number}_{basin}_{season}”. For storms with multiple forecasts, metadata is taken from the most recent forecast. Season calculation accounts for Southern Hemisphere cyclone seasons.
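
The ID format and the "most recent forecast wins" rule in the note above can be sketched as follows. This is an illustration only: `make_storm_id` and `latest_metadata` are hypothetical helpers, the field names (`storm_id`, `issued_time`) are assumed, and the season is passed in directly because the Southern Hemisphere season calculation is not specified here.

```python
from datetime import datetime


def make_storm_id(name, number, basin, season):
    """Build an ID of the form "{name/number}_{basin}_{season}".

    Assumption: the name is used when present, otherwise the storm number.
    """
    label = name if name else str(number)
    return f"{label}_{basin}_{season}"


def latest_metadata(forecasts):
    """Keep the record of the most recently issued forecast per storm ID."""
    latest = {}
    for fc in forecasts:
        sid = fc["storm_id"]
        if sid not in latest or fc["issued_time"] > latest[sid]["issued_time"]:
            latest[sid] = fc
    return latest


sid = make_storm_id("ALPHA", 5, "SI", 2023)  # placeholder storm, not a real record
forecasts = [
    {"storm_id": sid, "issued_time": datetime(2023, 2, 1, 0), "name": None},
    {"storm_id": sid, "issued_time": datetime(2023, 2, 1, 12), "name": "ALPHA"},
]
meta = latest_metadata(forecasts)
print(sid, meta[sid]["name"])
```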

NHC Tropical Cyclone Data Processing

The nhc module provides utilities for downloading, loading, and processing National Hurricane Center (NHC) and Central Pacific Hurricane Center (CPHC) tropical cyclone forecast and observation data.

Data Loading

ocha_lens.nhc.download_nhc(cache_dir='storm', use_cache=False)

Download current NHC storm data in JSON format.

Fetches active storm data from the National Hurricane Center’s CurrentStorms.json API and saves to local cache directory. Files are named using the latest forecast issuance time.

Parameters:

cache_dir (str, default "storm") – Directory to store raw JSON files
use_cache (bool, default False) – Whether to use existing cached file if available

Returns:

Path to downloaded JSON file, None if download failed or no active storms

Return type:

Path or None

ocha_lens.nhc.load_nhc(file_path=None, cache_dir='storm', use_cache=False, year=None, basin=None)

Load and process NHC storm data from CurrentStorms.json or historical archive.

Supports two modes:

1. Current mode (default): Downloads current storms from NHC CurrentStorms.json
2. Archive mode: Downloads historical ATCF data when year is specified

Parameters:

file_path (str, optional) – Path to NHC JSON file. If None, downloads data
cache_dir (str, default "storm") – Directory for caching downloaded files
use_cache (bool, default False) – Whether to use existing cached file if available
year (int, optional) – Year for archive mode (e.g., 2023). If specified, loads historical ATCF data instead of current json data
basin (str, optional) – Basin code for archive mode: “AL” (Atlantic), “EP” (Eastern Pacific), or “CP” (Central Pacific). If None, loads all basins

Returns:

DataFrame with combined observations and forecasts, or None if loading fails

Return type:

pd.DataFrame or None

ocha_lens.nhc.download_nhc_archive(year, basin='AL', cache_dir='storm', use_cache=True)

Download ATCF archive files for all storms in a given year and basin. Queries the FTP server to find all available storms for the specified year and basin, then downloads only those files. Files are saved with archive naming: a{basin}{number}{year}.dat (e.g., aal012023.dat) in the {cache_dir}/raw/atcf/ subdirectory.

For recent years (current year and previous year), files are downloaded from the aid_public directory. For older years, files are downloaded from the archive directory.

Parameters:

year (int) – Year to download (e.g., 2023, 2024, 2025)
basin (str, default "AL") – Basin code: “AL” (Atlantic), “EP” (Eastern Pacific), or “CP” (Central Pacific)
cache_dir (str, default "storm") – Directory to store downloaded files
use_cache (bool, default True) – Whether to use existing cached files if available

Returns:

Paths to downloaded ATCF files

Return type:

list of Path
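
The naming and directory rules described for the archive download can be sketched in a few lines. The helpers `atcf_archive_path` and `atcf_source_dir` are hypothetical; the sketch assumes the storm number is zero-padded to two digits and the basin is lower-cased (consistent with the aal012023.dat example), and it returns only the directory label rather than a full server path.

```python
from datetime import date
from pathlib import Path


def atcf_archive_path(cache_dir, basin, number, year):
    """Return the local path for one ATCF file under {cache_dir}/raw/atcf/."""
    filename = f"a{basin.lower()}{number:02d}{year}.dat"
    return Path(cache_dir) / "raw" / "atcf" / filename


def atcf_source_dir(year, today=None):
    """Pick the server directory: current and previous year vs. older years."""
    today = today or date.today()
    is_recent = year >= today.year - 1
    return "aid_public" if is_recent else "archive"


path = atcf_archive_path("storm", "AL", 1, 2023)
print(path.as_posix())
print(atcf_source_dir(2020, today=date(2025, 6, 1)))
```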

Track Data Extraction

ocha_lens.nhc.get_tracks(df)

Extract track-level data from NHC DataFrame.

Creates a GeoDataFrame with one row per track point (observation or forecast), including geometry for spatial analysis.

Parameters: df (pd.DataFrame) – DataFrame from load_nhc()
Return type: gpd.GeoDataFrame

Storm Metadata

ocha_lens.nhc.get_storms(df)

Extract storm metadata from NHC DataFrame.

Creates a dataset with one row per storm containing identifying information. This provides a summary of all storms in the dataset with their basic metadata.

Parameters: df (pd.DataFrame) – DataFrame from load_nhc()
Returns: Storm metadata with schema validation applied
Return type: pd.DataFrame

Wind Speed Probability

ocha_lens.nhc.get_wsp(issued_time=None, start=None, end=None, cache_dir='storm', use_cache=True)

Load NHC 5km wind speed probability polygons.

Three modes:

Current (default, no arguments): fetches the latest issuance from CurrentStorms.json. Only available when storms are active.
Single issuance: pass issued_time as YYYYMMDDHH or ISO timestamp.
Archive range: pass start (and optionally end) to fetch all available issuances in the date range from the NHC GIS archive.

Parameters:

issued_time (str, optional) – Single issuance timestamp (YYYYMMDDHH or ISO format, e.g. ‘2023082200’).
start (str, optional) – Start of date range for archive mode (YYYYMMDDHH or ISO format).
end (str, optional) – End of date range. Defaults to now. Only used with start.
cache_dir (str, default "storm") – Directory for cached zip files (used in archive mode).
use_cache (bool, default True) – Whether to use cached zip files if available.

Returns:

Columns: issued_time, wind_threshold_kt, percentage, geometry (EPSG:4326). One row per (issued_time, wind threshold, probability band). Empty if no data is available.

Return type:

gpd.GeoDataFrame
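
A small sketch of the two accepted timestamp formats and the three modes may help when building calls. Both helpers (`parse_issued_time`, `wsp_mode`) are hypothetical and not part of the module; in particular, the precedence of issued_time over start when both are supplied is an assumption.

```python
from datetime import datetime


def parse_issued_time(value):
    """Parse 'YYYYMMDDHH' or an ISO timestamp into a datetime."""
    if len(value) == 10 and value.isdigit():
        return datetime.strptime(value, "%Y%m%d%H")
    return datetime.fromisoformat(value)


def wsp_mode(issued_time=None, start=None, end=None):
    """Return which of the three documented modes a set of arguments selects."""
    if issued_time is not None:
        return "single"   # assumed to take precedence over start/end
    if start is not None:
        return "archive"  # end is optional and only used together with start
    return "current"


compact = parse_issued_time("2023082200")
iso = parse_issued_time("2023-08-22T00:00:00")
print(compact == iso, wsp_mode(), wsp_mode(start="2023082200"))
```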

GDACS Tropical Cyclone Data

The gdacs module provides a client for the GDACS (Global Disaster Alert and Coordination System) tropical cyclone API: event/episode traversal, advisory timelines, country-level population exposure, and matching GDACS events to NHC atcf_ids. No authentication is required.

Event & Episode Traversal

ocha_lens.gdacs.get_events(from_date=None, to_date=None, alert_levels=None, source=None, page_size=100)

Fetch GDACS tropical cyclone events.

Auto-paginates the GDACS event search endpoint, iterating over alert levels separately and deduplicating on eventid.

Parameters:

from_date (str, optional) – Start date as an ISO date string "YYYY-MM-DD". Default: GDACS API default.
to_date (str, optional) – End date as an ISO date string "YYYY-MM-DD". Default: GDACS API default.
alert_levels (list of {"Green", "Orange", "Red"}, optional) – Alert levels to query. Default: all three.
source ({"NOAA", "JTWC"}, optional) – Filter to events tagged with this source. Applied client-side because the API source filter is unreliable. Default: no filter.
page_size (int, default 100) – GDACS caps page size at 100; do not exceed.

Returns:

One row per event. Empty GeoDataFrame with the expected columns if no events match.

Return type:

geopandas.GeoDataFrame

ocha_lens.gdacs.get_event_detail(eventid)

Fetch full event detail JSON.

Includes the episode list and resource URLs for the timeline and per-buffer impact endpoints.

Parameters: eventid (int)
Return type: Dict[str, Any]

ocha_lens.gdacs.get_episode_detail(eventid, episodeid)

Fetch full episode detail JSON.

Each episode is one model run (issuance), typically every 6 hours. The structure mirrors get_event_detail(), but resource URLs point to episode-specific data.

Parameters:

eventid (int)
episodeid (int)

Return type:

Dict[str, Any]

ocha_lens.gdacs.latest_episode_id(event_detail)

Pull the latest episode id out of an event-detail JSON.

GDACS’s properties.episodes is a list of {details, ...} dicts; the episode id is embedded in the details URL as a query param. This helper hides that GDACS-internal quirk.

Raises:

NoEpisodesError – Event has no episodes (legitimately new event with no advisories yet — callers may want to handle this).
EpisodeUrlFormatError – Episode URL is malformed — likely a GDACS API contract change; should surface loudly rather than be skipped.

Parameters:

event_detail (Dict[str, Any])

Return type:

int
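
The URL-parsing step this helper hides could look roughly like the sketch below. It is written under several assumptions that the text above does not confirm: the query parameter is called `episodeid`, the highest id is treated as the latest episode, and the two exception classes are local stand-ins for the library's exceptions of the same name. The URLs use example.org rather than a real GDACS endpoint.

```python
from urllib.parse import parse_qs, urlparse


class NoEpisodesError(Exception):
    """Local stand-in for the library exception of the same name."""


class EpisodeUrlFormatError(Exception):
    """Local stand-in for the library exception of the same name."""


def latest_episode_id_sketch(event_detail, param="episodeid"):
    """Extract episode ids from each episode's details URL; return the largest."""
    episodes = event_detail.get("properties", {}).get("episodes") or []
    if not episodes:
        raise NoEpisodesError("event has no episodes yet")
    ids = []
    for episode in episodes:
        query = parse_qs(urlparse(episode.get("details", "")).query)
        values = query.get(param)
        if not values or not values[0].isdigit():
            # Surface loudly: this suggests the URL format has changed.
            raise EpisodeUrlFormatError(f"unexpected episode URL: {episode!r}")
        ids.append(int(values[0]))
    return max(ids)  # assumption: highest id == latest episode


detail = {
    "properties": {
        "episodes": [
            {"details": "https://example.org/api?eventid=1&episodeid=6"},
            {"details": "https://example.org/api?eventid=1&episodeid=7"},
        ]
    }
}
latest = latest_episode_id_sketch(detail)
print(latest)
```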

ocha_lens.gdacs.get_timeline(eventid, detail=None)

Fetch the advisory timeline for one TC event.

Each row is one advisory with position, wind speed, population exposure (pop39/pop74), and quadrant wind radii. pop39/pop74 are instantaneous snapshots — not cumulative.

Parameters:

eventid (int)
detail (dict, optional) – Pre-fetched output of get_event_detail() for this eventid. Pass it to skip the internal detail fetch when the caller already has it (saves one HTTP round-trip).

Raises:

NoTimelineError – Event has no timeline resource URL in its impacts list (legitimately rare — most TC events have one).

Returns:

Sorted by advisory_number ascending. Empty DataFrame (with expected columns) if the timeline endpoint returns no items — pandera validates the row structure either way.

Return type:

pandas.DataFrame

Population Exposure

ocha_lens.gdacs.get_exposure_adm0(eventid, episodeid=None, detail=None)

Fetch country-level population exposure per wind buffer.

Reads datums[alias='country'] from each buffer’s impact JSON — the canonical ADM0 rollup. One row per affected country, per buffer.

Note: only returns data for events ~2022 onward. Earlier events return empty results. Use get_timeline() for 2015+ coverage.

Parameters:

eventid (int)
episodeid (int, optional) – If provided, fetches that specific episode’s snapshot via get_episode_detail(). Otherwise uses the event-level resource URLs from get_event_detail() (latest snapshot).
detail (dict, optional) – Pre-fetched output of get_event_detail(). Pass it to skip the internal detail fetch on the event-level path. Mutually exclusive with episodeid — passing both raises ValueError (the two args describe different snapshots).

Returns:

Maps buffer key ("buffer39" or "buffer74") to a DataFrame with columns iso3, country, pop_affected, distance_km.

Return type:

dict

ocha_lens.gdacs.get_exposure_adm1(eventid, episodeid=None, detail=None)

Fetch ADM1-grain population exposure per wind buffer.

Reads datums[alias='alert'] from each buffer’s impact JSON. One row per affected sub-national admin unit, per buffer. Country identifiers are attached so callers can roll up to ADM0 if needed.

Parameters:

eventid (int)
episodeid (int, optional) – Same semantics as get_exposure_adm0().
detail (dict, optional) – Same semantics as get_exposure_adm0().

Returns:

Maps buffer key to a DataFrame with columns iso3, country, fips_admin, gmi_admin, admin_name, admin_type, pop_admin, pop_affected, distance_km.

Return type:

dict

Track Matching

ocha_lens.gdacs.match_to_atcf(timeline, nhc_tracks, max_dist_deg=0.05)

Match a GDACS event to an NHC atcf_id.

GDACS and our NHC table are two views of the same NHC forecaster output (GDACS scrapes the TCM Forecast/Advisory; we store the A-deck OFCL). They’re timestamped 3h apart — A-deck on synoptic valid times (00/06/12/18Z), TCM on advisory issue times (03/09/15/21Z) — but the forecast cone (t>0) is identical between them at shared valid_times. Only the observed t=0 position differs (the storm moved in the intervening 3h).

That asymmetry drives a two-strategy match:

_match_by_forecast_cone() (primary) — vote the GDACS forecast points onto NHC atcf_id values by exact valid_time. Robust (many points), product-agnostic, and self-healing for storms whose first advisories we never captured.
_match_by_genesis() (fallback) — single-point match on the genesis observed advisory, for completed storms whose timeline has no forecast cone left.

Parameters:

timeline (DataFrame) – Output of get_timeline(). Required columns: advisory_number, actual, advisory_datetime, latitude, longitude.
nhc_tracks (DataFrame) – NHC tracks deduped to one row per (atcf_id, valid_time) at the freshest issuance (all leadtimes kept — the forecast cone lives at leadtime>0). Required columns: atcf_id, valid_time, lat, lon. See load_freshest_nhc_tracks in the pipeline.
max_dist_deg (float, default 0.05) – Spatial tolerance in degrees. Cone matches for the correct storm are ~0°; 0.05° absorbs only rounding. Robustness comes from voting across the cone, not from a loose tolerance.

Returns:

atcf_id (e.g. "AL142024"), or None when neither strategy finds a match. None is the correct answer for a non-NHC (JTWC/RSMC) storm or a genuine gap in our NHC table.

Return type:

str | None
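
The voting idea behind the primary strategy can be illustrated with a short pure-Python sketch. This is not the module's `_match_by_forecast_cone()`; the function name, the `valid_time` key on the GDACS points, and the per-axis tolerance check (rather than a true distance) are all assumptions chosen to keep the example small.

```python
from collections import Counter


def match_by_cone_sketch(gdacs_points, nhc_rows, max_dist_deg=0.05):
    """Vote GDACS forecast points onto atcf_id values by exact valid_time."""
    by_time = {}
    for row in nhc_rows:
        by_time.setdefault(row["valid_time"], []).append(row)

    votes = Counter()
    for point in gdacs_points:
        for row in by_time.get(point["valid_time"], []):
            close = (
                abs(point["latitude"] - row["lat"]) <= max_dist_deg
                and abs(point["longitude"] - row["lon"]) <= max_dist_deg
            )
            if close:
                votes[row["atcf_id"]] += 1
    if not votes:
        return None  # no candidate storm shares a forecast point
    return votes.most_common(1)[0][0]


nhc_rows = [
    {"atcf_id": "AL142024", "valid_time": "2024-10-08T12", "lat": 22.5, "lon": -88.0},
    {"atcf_id": "AL142024", "valid_time": "2024-10-09T00", "lat": 23.4, "lon": -86.5},
    {"atcf_id": "AL152024", "valid_time": "2024-10-08T12", "lat": 30.0, "lon": -50.0},
]
gdacs_points = [
    {"valid_time": "2024-10-08T12", "latitude": 22.5, "longitude": -88.0},
    {"valid_time": "2024-10-09T00", "latitude": 23.4, "longitude": -86.51},
]
match = match_by_cone_sketch(gdacs_points, nhc_rows)
print(match)
```

Because every cone point casts its own vote, a single stray or missing point does not change the outcome, which is why the tolerance can stay tight.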

Utility Functions

ocha_lens.gdacs.to_iso3(gdacs_country_code)

Map a GDACS GMI_CNTRY code to standard ISO 3166-1 alpha-3.

Most GDACS codes are already valid ISO3 — pass-through. Only the X-prefixed proprietary codes (see GDACS_PROPRIETARY_TO_ISO3) get remapped.

Parameters: gdacs_country_code (str)
Return type: str

ADAM Tropical Cyclone Data

The adam module provides access to WFP ADAM (Automatic Disaster Analysis and Mapping) tropical cyclone population-exposure data: a paginated event listing and per-event admin-level (ADM0/1/2) exposure.

Data Loading

ocha_lens.adam.get_events(from_date=None, to_date=None, source='NOAA', all_episodes=False)

Fetch the ADAM TC event list.

ADAM’s /items endpoint returns one feature per (event, episode). By default we dedupe to the latest episode per event_id (cumulative snapshot at storm end). Pass all_episodes=True to skip the dedupe and return every episode-feature — the per-episode to_date is the snapshot’s effective time and each row carries its own population_csv_url.

Date filtering happens client-side after pagination (the server-side OGC filter syntax is awkward for date ranges); source is a server-side param.

Parameters:

from_date (str, optional) – ISO-format string giving the start of an inclusive overlap window. An event is kept if it was active at any point in [from_date, to_date] (i.e., its to_date >= window start AND its from_date <= window end). Pass None to leave this bound open. Strict containment would silently drop storms that started before the window opened.
to_date (str, optional) – ISO-format string giving the end of the same inclusive overlap window. Pass None to leave this bound open.
source ("NOAA" | "JTWC" | None, default "NOAA") – Server-side source filter. ADAM aggregates advisory feeds from multiple agencies; NOAA covers Atlantic + EPac, JTWC covers WPac + IO + SHem. Pass None to fetch all sources.
all_episodes (bool, default False) – When True, return every (event_id, episode_id) row rather than deduping to the latest episode. Increases row count by ~N× (N = avg episodes per event, ~10–50).

Returns:

One row per event_id (default) or per (event_id, episode_id) (all_episodes=True), columns per EVENT_SCHEMA.

Return type:

pandas.DataFrame
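
The overlap-window rule and the default dedupe can be sketched on plain dictionaries. Both helpers are hypothetical; the sketch assumes event dates are `datetime.date` objects and that the latest episode is the one with the highest `episode_id`, neither of which is stated above.

```python
from datetime import date


def overlaps(event, from_date=None, to_date=None):
    """True if the event was active at any point in [from_date, to_date]."""
    start = date.fromisoformat(from_date) if from_date else None
    end = date.fromisoformat(to_date) if to_date else None
    if start is not None and event["to_date"] < start:
        return False  # ended before the window opened
    if end is not None and event["from_date"] > end:
        return False  # started after the window closed
    return True


def latest_episodes(features):
    """Keep one feature per event_id: the one with the highest episode_id."""
    latest = {}
    for feature in features:
        eid = feature["event_id"]
        if eid not in latest or feature["episode_id"] > latest[eid]["episode_id"]:
            latest[eid] = feature
    return list(latest.values())


features = [
    {"event_id": 1, "episode_id": 3, "from_date": date(2024, 9, 28), "to_date": date(2024, 10, 1)},
    {"event_id": 1, "episode_id": 9, "from_date": date(2024, 9, 28), "to_date": date(2024, 10, 3)},
    {"event_id": 2, "episode_id": 4, "from_date": date(2024, 9, 20), "to_date": date(2024, 9, 30)},
]
kept = [f for f in latest_episodes(features) if overlaps(f, "2024-10-01", "2024-10-05")]
print([(f["event_id"], f["episode_id"]) for f in kept])
```

Event 1 started before the window opened but is still kept, which is the behavior strict containment would lose.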

ocha_lens.adam.get_exposure(event_id, population_csv_url)

Fetch and shape one ADAM event’s exposure data.

Downloads the per-episode population CSV at the given URL, applies the cumulative per-band → ≥-threshold conversion, aggregates ADM2 → ADM0/ADM1 (ADM2 also retained), maps ADM0_NAME → ISO3.

Parameters:

event_id (int) – Used only for error messages (the CSV doesn’t carry it internally).
population_csv_url (str) – Per-episode CSV URL from get_events() output. Must be non-empty; callers receiving null from ADAM should not invoke this function (raise NoExposureCSVError instead).

Returns:

Long-form, one row per (admin_level × admin_unit × wind_speed_kt). Columns per EXPOSURE_SCHEMA. Caller adds event_id / episode_id / valid_time before persisting.

Return type:

pandas.DataFrame

Raises:

NoExposureCSVError – URL is empty/None, or the downloaded CSV is missing the expected columns. Caller treats this as “skip this event for this run; retry next cycle” the same way the GDACS pipeline treats NoTimelineError.

Utility Functions

ocha_lens.adam.make_cumulative(df)

Convert ADAM per-band pop counts to cumulative ≥-threshold.

ADAM stores POP_60_KMH as “population in the 60-90 km/h band only” — not “population exposed to ≥ 60 km/h winds”. GDACS exposure is cumulative. This converter brings ADAM into the same semantic so the two sources are directly comparable in downstream queries.

Operates on per-band columns POP_60_KMH, POP_90_KMH, POP_120_KMH (whichever subset is present). Returns a copy with the columns rewritten in place. Null/NaN values are preserved as null (not coerced to 0) for the column being updated; for the columns being summed into an update, NaN is treated as 0 (we can’t add NaN to a count).

Parameters: df (DataFrame)
Return type: DataFrame
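
The per-band to cumulative conversion is easiest to see on a single row. The sketch below works on a plain dictionary rather than a DataFrame and handles only None as the null value (not float NaN); `make_cumulative_sketch` is a hypothetical stand-in, not the library function.

```python
BANDS = ["POP_60_KMH", "POP_90_KMH", "POP_120_KMH"]


def make_cumulative_sketch(row):
    """Turn per-band counts into ">= threshold" counts for one row."""
    present = [band for band in BANDS if band in row]
    out = dict(row)  # work on a copy, leave the input untouched
    for i, band in enumerate(present):
        if row[band] is None:
            continue  # null stays null for the column being updated
        # Higher bands are summed in; a null there counts as 0.
        higher = sum((row[b] or 0) for b in present[i + 1:])
        out[band] = row[band] + higher
    return out


full = make_cumulative_sketch({"POP_60_KMH": 100, "POP_90_KMH": 40, "POP_120_KMH": 10})
gap = make_cumulative_sketch({"POP_60_KMH": 100, "POP_90_KMH": None, "POP_120_KMH": 10})
print(full)
print(gap)
```

With bands of 100, 40 and 10 people, the cumulative values become 150, 50 and 10: everyone in a higher band is also exposed to the lower thresholds.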

ocha_lens.adam.name_to_iso3(name)

Resolve an ADM0_NAME to ISO 3166-1 alpha-3. Checks the override dict first; falls back to pycountry’s fuzzy search. Returns None when neither resolves (caller stores iso3 as null and downstream can flag).

Parameters: name (str)
Return type: str | None

Copernicus EMS Rapid Mapping

The cems module provides access to Copernicus EMS Rapid Mapping emergency activation products: activation discovery, the nested activation detail tree, flattened product/layer/statistics tables, and helpers for downloading products individually or in bulk. No authentication is required.

Discovery

ocha_lens.cems.get_activations(category=None, closed=None, country=None)

List all CEMS Rapid Mapping activations (one row per activation).

Walks the paginated public-activations-info endpoint (following each response’s next URL) and returns a tidy table. Filtering is applied client-side after the full list is fetched.

Parameters:

category (str, optional) – Case-insensitive exact match on the event category (e.g. "Earthquake", "Flood", "Wildfire").
closed (bool, optional) – Keep only ongoing (False) or closed (True) activations. None returns both.
country (str, optional) – Case-insensitive substring match against the joined countries string (so "venezu" matches “Venezuela”).

Returns:

Columns per ACTIVATION_SCHEMA, sorted by activation code descending (newest first).

Return type:

pandas.DataFrame

ocha_lens.cems.get_activation(code)

Fetch the full nested detail tree for one activation code.

Returns the raw (lightly-validated) activation dict — activation → aois → products → layers/images/stats — rather than a DataFrame, because the hierarchy doesn’t flatten to a single table without losing the AOI/product/layer relationships. Use get_products(), get_catalog(), and get_stats() for tabular views.

Parameters: code (str) – Activation code, e.g. "EMSR884".
Raises: ActivationNotFoundError – The endpoint returned no result for code.
Return type: Dict[str, Any]

Flattened Views

ocha_lens.cems.get_products(ref)

Flatten an activation to one row per product (the download targets).

This is the primary entry point for the bulk-download workflow: each row carries the product’s download_url (its zip) plus enough metadata to filter (type, AOI, feasibility, version/status).

Parameters: ref (str | dict) – An activation code (fetched via get_activation()) or an already-fetched activation dict.
Returns: Columns per PRODUCT_SCHEMA, one row per (AOI, product).
Return type: pandas.DataFrame

ocha_lens.cems.get_catalog(ref)

Flatten an activation to one row per layer (individual geo files).

Each layer row exposes the directly-downloadable geojson_url and its sld_url style file. Products that publish no individual layers still appear as a single row (layer fields null) so their product_zip_url is never dropped.

Parameters: ref (str | dict) – Activation code or an already-fetched activation dict.
Returns: Columns per CATALOG_SCHEMA.
Return type: pandas.DataFrame

ocha_lens.cems.get_stats(ref)

Flatten the per-product damage-statistics tables.

The service nests stats as category → subcategory → {unit, total, affected}. This emits one row per (product, category, subcategory). Totals/affected are coerced to float; placeholder strings (e.g. "NA") become null.

Parameters: ref (str | dict) – Activation code or an already-fetched activation dict.
Returns: Columns per STATS_SCHEMA. Products without a stats table contribute no rows.
Return type: pandas.DataFrame
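
The flattening of the nested stats structure can be illustrated with plain dictionaries. The helpers below are hypothetical and the sample categories are made up for the example; only the category → subcategory → {unit, total, affected} shape and the float/null coercion come from the description above.

```python
def to_float(value):
    """Coerce a value to float; placeholders such as "NA" become None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def flatten_stats(product_id, stats):
    """Emit one row per (product, category, subcategory)."""
    rows = []
    for category, subcategories in stats.items():
        for subcategory, cell in subcategories.items():
            rows.append(
                {
                    "product_id": product_id,
                    "category": category,
                    "subcategory": subcategory,
                    "unit": cell.get("unit"),
                    "total": to_float(cell.get("total")),
                    "affected": to_float(cell.get("affected")),
                }
            )
    return rows


# Made-up sample in the documented nesting.
sample = {
    "Built-up": {"Residential": {"unit": "No.", "total": "120", "affected": "35"}},
    "Transportation": {"Roads": {"unit": "km", "total": "NA", "affected": 4.2}},
}
stat_rows = flatten_stats("P01", sample)
for row in stat_rows:
    print(row)
```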

Downloading

ocha_lens.cems.download_products(ref, dest_dir=None, product_types=None, aoi_numbers=None, feasible_only=True)

Download every (filtered) product of an activation.

The headline bulk-download helper. Iterates the activation’s products, applies the optional filters, and downloads each one — to memory by default, or to dest_dir on disk.

Parameters:

ref (str | dict) – Activation code or an already-fetched activation dict.
dest_dir (str | Path, optional) – Directory to write zips into (created if missing). None → return each product’s bytes in memory.
product_types (list[str], optional) – Keep only these product types (e.g. ["GRA"] for grading/damage). None keeps all types.
aoi_numbers (list[int], optional) – Keep only these AOI numbers. None keeps all AOIs.
feasible_only (bool, default True) – Skip products marked feasible=False (requested but not produced, so they have no usable zip).

Returns:

Maps each product’s zip filename to its bytes (in memory) or written path (dest_dir given). Products with no published download URL — the normal state for products still W (waiting) / N (not produced) — are skipped; a one-line INFO summary reports the counts, with per-product detail logged at DEBUG.

Return type:

dict[str, bytes | pathlib.Path]
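
The filtering that precedes each download can be sketched as a simple predicate chain. The function `filter_products` and the dictionary keys (`type`, `aoi_number`, `feasible`, `download_url`) are assumptions for illustration; the real column names are defined by PRODUCT_SCHEMA.

```python
def filter_products(products, product_types=None, aoi_numbers=None, feasible_only=True):
    """Apply the optional filters, then drop products without a download URL."""
    kept = []
    for product in products:
        if product_types is not None and product["type"] not in product_types:
            continue
        if aoi_numbers is not None and product["aoi_number"] not in aoi_numbers:
            continue
        if feasible_only and product.get("feasible") is False:
            continue  # requested but not produced
        if not product.get("download_url"):
            continue  # nothing published yet (e.g. still waiting)
        kept.append(product)
    return kept


products = [
    {"type": "GRA", "aoi_number": 1, "feasible": True, "download_url": "https://example.org/a.zip"},
    {"type": "GRA", "aoi_number": 2, "feasible": False, "download_url": None},
    {"type": "DEL", "aoi_number": 1, "feasible": True, "download_url": "https://example.org/b.zip"},
    {"type": "GRA", "aoi_number": 3, "feasible": True, "download_url": None},
]
grading = filter_products(products, product_types=["GRA"])
print([p["aoi_number"] for p in grading])
```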

ocha_lens.cems.download_product(product, dest=None)

Download a single product’s zip.

Parameters:

product (str | dict | pandas.Series) – A product zip URL, a product dict (from get_activation()), or a row from get_products() / get_catalog().
dest (str | Path, optional) – Local destination (file or directory). None → return bytes.

Return type:

bytes | pathlib.Path

ocha_lens.cems.download_activation_bundle(ref, dest=None)

Download the activation-wide _products.zip bundle.

A single archive containing the latest version of every product for the activation (the productsPath field). Convenient when you want everything in one shot rather than per-product via download_products().

Parameters:

ref (str | dict) – Activation code or an already-fetched activation dict.
dest (str | Path, optional) – Local destination. None → return bytes.

Raises:

ValueError – The activation has no productsPath.

Return type:

bytes | Path

ocha_lens.cems.download_geojson(layer)

Download one layer’s GeoJSON into a GeoDataFrame (in memory).

Parameters: layer (str | dict | pandas.Series) – A GeoJSON URL, a layer dict from the detail tree (json key), or a catalog row from get_catalog() (geojson_url).
Return type: geopandas.GeoDataFrame
Raises: ValueError – No GeoJSON URL could be resolved from layer.

ocha_lens.cems.download_file(url, dest=None)

Download a URL to bytes (in memory) or to a local file.

The generic primitive the other downloaders build on. When dest is given the body is streamed to disk in chunks (so large product zips/COGs don’t have to fit in memory) and the written Path is returned; otherwise the full content is returned as bytes.

Parameters:

url (str) – File URL.
dest (str | Path, optional) – Local destination. If it is an existing directory (or ends with a path separator), the server-side filename is appended.

Return type:

bytes | pathlib.Path
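
The destination rule for dest can be sketched without any network access. The helper `resolve_dest` is hypothetical, and it assumes the "server-side filename" is the last component of the URL path; the library may derive it differently (for example from response headers).

```python
import os
from pathlib import Path
from urllib.parse import urlparse


def resolve_dest(url, dest):
    """Return the file path a download would be written to."""
    filename = Path(urlparse(url).path).name
    dest_str = str(dest)
    # Treat dest as a directory if it exists as one or ends with a separator.
    is_dir = Path(dest_str).is_dir() or dest_str.endswith(("/", os.sep))
    return Path(dest_str) / filename if is_dir else Path(dest_str)


url = "https://example.org/files/product.zip"  # placeholder URL
as_dir = resolve_dest(url, "nonexistent_dir_xyz/")
as_file = resolve_dest(url, "nonexistent_dir_xyz/custom.zip")
print(as_dir, as_file)
```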

Persistence

ocha_lens.cems.to_blob(data, blob_name, stage='dev', container_name='raster')

Upload bytes to the OCHA Azure blob store via ocha-stratus.

A thin convenience wrapper so a download can be pushed straight to blob, e.g. to_blob(download_product(row), "cems/EMSR884/grading.zip"). ocha-stratus is imported lazily here so it stays an optional dependency — the rest of this module works without it.

Parameters:

data (bytes) – Payload (e.g. the return of download_product() / download_file() with dest=None).
blob_name (str) – Destination blob path/key.
stage ({"dev", "prod"}, default "dev") – Azure stage.
container_name (str, default "raster") – Target container.

Raises:

ImportError – ocha-stratus is not installed.

Return type:

None

Storm Utilities

The utils.storm module provides shared geometry and matching helpers used across the cyclone datasources, including wind-buffer construction and matching NHC Wind Speed Probability (WSP) polygons to storm tracks.

Track Interpolation¶

ocha_lens.utils.storm.interpolate_track(gdf, time_col='valid_time', freq='30min', include_ends=True)[source]¶

Parameters:

gdf (GeoDataFrame)
time_col (str)
freq (str)
include_ends (bool)

Return type:

GeoDataFrame

Wind Buffers¶

ocha_lens.utils.storm.calculate_wind_buffers_gdf(gdf, quad_cols_format='usa_quadrant_radius_{speed}_{quad}', valid_time_col='valid_time')[source]¶

Calculate wind buffer polygons for each wind-speed threshold.

The storm track is interpolated to a regular 30-minute interval before the per-quadrant wind buffers are built. Reprojection goes through a basin-appropriate lon_wrap CRS so antimeridian-crossing tracks have continuous longitudes before projecting to Mercator.
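
Why continuous longitudes matter can be shown in plain Python. The sketch below is a stand-in for the idea, not the CRS-based mechanism the function uses: unwrap_longitudes is a hypothetical helper that shifts each longitude by a multiple of 360 so a track crossing the antimeridian does not jump from about +180 to about -180.

```python
def unwrap_longitudes(lons):
    """Shift longitudes by multiples of 360 so consecutive steps stay within 180 degrees."""
    if not lons:
        return []
    out = [lons[0]]
    for lon in lons[1:]:
        prev = out[-1]
        # Signed shortest step from the previous point to this longitude (mod 360).
        delta = (lon - prev + 180.0) % 360.0 - 180.0
        out.append(prev + delta)
    return out


# A track heading east across the antimeridian.
track = [178.0, 179.5, -179.0, -177.5]
print(unwrap_longitudes(track))  # [178.0, 179.5, 181.0, 182.5]
```

Without this step, interpolating or buffering between 179.5 and -179.0 would sweep the long way around the globe.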

Parameters:

gdf (gpd.GeoDataFrame) – Storm track points (EPSG:4326) with per-quadrant radius columns and, optionally, a basin column used to pick the projection.
quad_cols_format (str, default "usa_quadrant_radius_{speed}_{quad}") – Format string for the quadrant radius column names, with {speed} and {quad} placeholders (quads: ne/se/sw/nw).
valid_time_col (str, default "valid_time") – Name of the valid-time column used to order and interpolate the track.

Returns:

One row per wind-speed threshold (wind_speed_kt in {34, 50, 64}) with the merged buffer geometry (EPSG:4326).

Return type:

gpd.GeoDataFrame
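
The column names implied by quad_cols_format can be enumerated with a short sketch. The helper quadrant_columns is hypothetical, and filling {speed} with the bare integer knot value is an assumption; the thresholds and quadrant codes are the ones listed in the entry above.

```python
QUAD_COLS_FORMAT = "usa_quadrant_radius_{speed}_{quad}"  # documented default
SPEEDS_KT = (34, 50, 64)           # thresholds listed under Returns
QUADS = ("ne", "se", "sw", "nw")   # quadrant codes listed under Parameters


def quadrant_columns(fmt=QUAD_COLS_FORMAT, speeds=SPEEDS_KT, quads=QUADS):
    """Map each wind-speed threshold to its four per-quadrant radius column names."""
    return {
        speed: [fmt.format(speed=speed, quad=quad) for quad in quads]
        for speed in speeds
    }


for speed, columns in quadrant_columns().items():
    print(speed, columns)
```

A dataset with a different naming scheme would pass its own format string, provided it keeps both placeholders.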

WSP–Track Matching¶

ocha_lens.utils.storm.match_wsp_to_tracks(gdf_wsp, gdf_tracks, *, extra_containers=None)[source]¶

Match WSP polygons to NHC track forecasts.

Two passes per polygon part, in order:

Line-intersection. A part is matched to an atcf_id if that storm’s track LineString (the polyline through its track points at the WSP’s issued_time, or at issued_time + 3h) intersects the hole-filled polygon. WSPs are published ~3h after the nominal NHC advisory cycle, so the matching track advisory may be the one from that next cycle.
Containment fallback. If a part is still unmatched, check whether it sits fully inside any already-matched polygon at the same (issued_time, wind_threshold_kt). The filled (donut-hole-ignoring) exterior of each candidate container is tested against the filled part. The smallest qualifying container wins.

Parts are processed in ascending percentage order within each (issued_time, wind_threshold_kt) group so that the big outer bands (which are likeliest to match via line-intersection) get assigned first and can serve as containers for the small inner bands that follow.

Line-intersection matters because track points are sparse (12–24h spacing) while WSP probability bands — especially the inner 50–80% bands — can be narrow ribbons along the track. Containment fallback matters because NHC WSP bands are nested annuli: a 90% lobe typically sits in the donut hole of the 70% band of the same storm, so even when no track point is near the lobe its parent band already is.

Multi-storm parts: if a single (issued_time, wind_threshold_kt, percentage) polygon is a MultiPolygon, it is exploded before matching. A storm can appear in multiple rows for the same band when its cone splits into disjoint regions. Callers that need a one-row-per-(issued_time, wind_threshold_kt, percentage, atcf_id) view should .dissolve(by=[...]) the result themselves.

Parameters:

gdf_wsp (GeoDataFrame) – Rows from storms.nhc_wsp_polygon_raw. Required columns: issued_time, wind_threshold_kt, percentage, geometry.
gdf_tracks (GeoDataFrame) – Rows from storms.nhc_tracks_geo. Required columns: atcf_id, issued_time, geometry (Point, EPSG:4326). Optional: valid_time (used to order points along the track line).
extra_containers (GeoDataFrame | None) – Optional GeoDataFrame of already-matched polygons supplied by the caller (e.g. from a prior matching pass). Used only as containment-fallback donors — never re-matched. Required columns: issued_time, wind_threshold_kt, atcf_id, geometry. Any rows with atcf_id IS NULL are ignored.

Returns:

Exploded polygon parts with an added atcf_id column.

One row per (storm, polygon part) where the track-line intersects or containment fallback fires.
If a part is intersected by multiple storms’ tracks (overlapping cones), a row is emitted for each matching storm.
Parts with no match are kept, with atcf_id=None.

Return type:

GeoDataFrame
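
The two-pass logic described in this entry can be sketched on toy geometries with shapely. This is a simplified illustration, not the function's implementation: match_parts and fill_holes are hypothetical names, the inputs are plain tuples and dicts rather than GeoDataFrames, a single (issued_time, wind_threshold_kt) group is assumed, and the issued_time/+3h advisory selection is not modelled.

```python
from shapely.geometry import LineString, Polygon


def fill_holes(poly):
    """Return the polygon's exterior with any interior rings dropped."""
    return Polygon(poly.exterior)


def match_parts(parts, track_lines):
    """Toy two-pass matcher for one (issued_time, wind_threshold_kt) group.

    parts: list of (percentage, Polygon).
    track_lines: dict mapping a storm id to its track LineString.
    Returns a list of (percentage, Polygon, storm id or None).
    """
    matched = []  # (filled polygon, storm id) pairs usable as containers
    rows = []
    # Ascending percentage: wide outer bands first, narrow inner bands last.
    for pct, poly in sorted(parts, key=lambda item: item[0]):
        filled = fill_holes(poly)
        # Pass 1: every storm whose track line crosses the hole-filled part.
        ids = [sid for sid, line in track_lines.items() if line.intersects(filled)]
        if not ids:
            # Pass 2: smallest already-matched container that fully holds the part.
            holders = [(c.area, sid) for c, sid in matched if c.contains(filled)]
            if holders:
                ids = [min(holders, key=lambda h: h[0])[1]]
        if not ids:
            rows.append((pct, poly, None))  # unmatched parts are kept
        for sid in ids:
            rows.append((pct, poly, sid))
            matched.append((filled, sid))
    return rows


# Outer 10% band with a donut hole, an inner 90% lobe inside that hole,
# and an unrelated 50% part far from any track.
outer = Polygon([(0, 0), (10, 0), (10, 10), (0, 10)], [[(3, 3), (7, 3), (7, 7), (3, 7)]])
lobe = Polygon([(4, 4), (6, 4), (6, 6), (4, 6)])
far = Polygon([(20, 20), (21, 20), (21, 21), (20, 21)])
tracks = {"storm_a": LineString([(-1, 5), (2, 5)])}  # never reaches the lobe

rows = match_parts([(90, lobe), (10, outer), (50, far)], tracks)
```

Here the outer band matches by line-intersection, the lobe inherits the same storm through containment, and the distant part stays unmatched.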