AIS Query

The AISQueryHelper class queries Automatic Identification System (AIS) vessel-traffic data stored in a local DuckDB database backed by Parquet files. It supports spatial queries by bounding box or radius, vessel track retrieval, category-based filtering, and gridded vessel-count aggregation. Results are returned as GeoDataFrames by default.

class ecosound.environment.ais.AISQueryHelper(db_path: str)[source]

Bases: object

Helper class for efficiently querying AIS data stored in DuckDB + Parquet. Optimized for date, geographic, and vessel-based queries.

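The record-level query methods (query_rectangle, query_radius, query_vessel_track, query_by_vessel_category) return GeoDataFrames with Point geometries (WGS84/EPSG:4326) by default, or plain DataFrames when return_gdf=False. get_unique_vessels returns a DataFrame, get_statistics returns a dict, and create_gridded_vessel_counts returns an xarray DataArray.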

Initialize the query helper.

Parameters:

db_path – Path to DuckDB database file

close()[source]

Close the database connection.
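Because the connection must be closed explicitly, it can help to tie close() to a with block. The following is a minimal sketch rather than part of the library: run_rectangle_query is a hypothetical wrapper, and it assumes only that the helper exposes the close() and query_rectangle() methods documented here. contextlib.closing is used because this page does not say whether AISQueryHelper is itself a context manager.

```python
from contextlib import closing


def run_rectangle_query(helper, **query_kwargs):
    """Run one query_rectangle call and always close the helper afterwards.

    `helper` is expected to be an AISQueryHelper (or any object with
    close() and query_rectangle() methods). Hypothetical wrapper.
    """
    # closing() calls helper.close() when the block exits, even on error.
    with closing(helper):
        return helper.query_rectangle(**query_kwargs)
```

A call might look like run_rectangle_query(AISQueryHelper("ais.duckdb"), start_date="2024-01-01", end_date="2024-01-02", min_lat=48.0, max_lat=49.0, min_lon=-124.0, max_lon=-123.0), where the database path and coordinates are placeholders.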

query_rectangle(start_date: str, end_date: str, min_lat: float, max_lat: float, min_lon: float, max_lon: float, mmsi: int | List[int] | None = None, vessel_type: int | List[int] | None = None, vessel_name: str | None = None, limit: int | None = None, return_gdf: bool = True) DataFrame | GeoDataFrame[source]

Query AIS data within a geographic rectangle and date range. Optimized: the date filter uses Hive partitioning, and the latitude/longitude filter uses Parquet column statistics.

Parameters:
  • start_date – Start date (YYYY-MM-DD)

  • end_date – End date (YYYY-MM-DD)

  • min_lat – Minimum latitude

  • max_lat – Maximum latitude

  • min_lon – Minimum longitude

  • max_lon – Maximum longitude

  • mmsi – Single MMSI or list of MMSIs (optional)

  • vessel_type – Single vessel type code or list of codes (optional)

  • vessel_name – Vessel name pattern for LIKE search (optional)

  • limit – Maximum number of records to return (optional)

  • return_gdf – Return GeoDataFrame (default: True) or DataFrame

Returns:

GeoDataFrame with query results (or DataFrame if return_gdf=False)
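As an illustration of how the parameters combine, the sketch below wraps a single query_rectangle call. The function name cargo_positions, the coordinates, and the limit are made up for the example. The codes 70 to 79 are the standard AIS ship-type codes for cargo vessels; whether a given database stores exactly those codes is an assumption.

```python
def cargo_positions(helper, start_date, end_date):
    """Fetch up to 1000 cargo-vessel positions inside a fixed box.

    `helper` is expected to be an AISQueryHelper instance.
    The bounding box and limit below are placeholder values.
    """
    return helper.query_rectangle(
        start_date=start_date,            # "YYYY-MM-DD"
        end_date=end_date,                # "YYYY-MM-DD"
        min_lat=48.0,
        max_lat=49.0,
        min_lon=-124.0,
        max_lon=-123.0,
        vessel_type=list(range(70, 80)),  # AIS cargo codes (assumed to match the data)
        limit=1000,
        return_gdf=False,                 # plain DataFrame instead of GeoDataFrame
    )
```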

query_radius(start_date: str, end_date: str, center_lat: float, center_lon: float, radius_km: float, mmsi: int | List[int] | None = None, vessel_type: int | List[int] | None = None, vessel_name: str | None = None, limit: int | None = None, return_gdf: bool = True) DataFrame | GeoDataFrame[source]

Query AIS data within a radius from a center point. Optimized: applies a bounding-box filter first, then calculates the exact distance for the remaining records.

Parameters:
  • start_date – Start date (YYYY-MM-DD)

  • end_date – End date (YYYY-MM-DD)

  • center_lat – Center latitude (decimal degrees)

  • center_lon – Center longitude (decimal degrees)

  • radius_km – Radius in kilometers

  • mmsi – Single MMSI or list of MMSIs (optional)

  • vessel_type – Single vessel type code or list of codes (optional)

  • vessel_name – Vessel name pattern for LIKE search (optional)

  • limit – Maximum number of records to return (optional)

  • return_gdf – Return GeoDataFrame (default: True) or DataFrame

Returns:

GeoDataFrame with query results, including a ‘distance_km’ column (or DataFrame if return_gdf=False)
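The two-stage idea (cheap bounding-box test first, exact distance second) can be sketched in plain Python. This is not the library's implementation: the function names, the haversine distance formula, the spherical Earth radius of 6371 km, and the 1% box padding are all assumptions chosen for the illustration, and the sketch ignores the antimeridian and the poles.

```python
import math

EARTH_RADIUS_KM = 6371.0                        # mean Earth radius (assumed)
KM_PER_DEG = EARTH_RADIUS_KM * math.pi / 180.0  # km per degree of latitude


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def filter_by_radius(points, center_lat, center_lon, radius_km):
    """Keep (lat, lon) points within radius_km; return (lat, lon, distance_km)."""
    # Stage 1 bounds: padded by 1% so the box does not clip the circle
    # (adequate for small radii away from the poles).
    dlat = 1.01 * radius_km / KM_PER_DEG
    dlon = dlat / max(math.cos(math.radians(center_lat)), 1e-6)
    kept = []
    for lat, lon in points:
        # Stage 1: cheap rectangular rejection.
        if abs(lat - center_lat) > dlat or abs(lon - center_lon) > dlon:
            continue
        # Stage 2: exact distance only for the survivors.
        dist = haversine_km(center_lat, center_lon, lat, lon)
        if dist <= radius_km:
            kept.append((lat, lon, dist))
    return kept
```

Points in the corners of the box pass stage 1 but fail stage 2, which is why the exact distance check is still needed.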

query_vessel_track(mmsi: int, start_date: str, end_date: str, min_lat: float | None = None, max_lat: float | None = None, min_lon: float | None = None, max_lon: float | None = None, return_gdf: bool = True) DataFrame | GeoDataFrame[source]

Query all positions for a specific vessel (its track). Optimized: the MMSI filter is highly selective.

Parameters:
  • mmsi – Vessel MMSI

  • start_date – Start date (YYYY-MM-DD)

  • end_date – End date (YYYY-MM-DD)

  • min_lat – Minimum latitude (optional)

  • max_lat – Maximum latitude (optional)

  • min_lon – Minimum longitude (optional)

  • max_lon – Maximum longitude (optional)

  • return_gdf – Return GeoDataFrame (default: True) or DataFrame

Returns:

GeoDataFrame with vessel track, sorted by time (or DataFrame if return_gdf=False)

query_by_vessel_category(start_date: str, end_date: str, min_lat: float, max_lat: float, min_lon: float, max_lon: float, categories: str | List[str], limit: int | None = None, return_gdf: bool = True) DataFrame | GeoDataFrame[source]

Query AIS data by vessel category (Cargo, Tanker, Fishing, etc.). Uses the vessel_type_lookup table for categorization.

Parameters:
  • start_date – Start date (YYYY-MM-DD)

  • end_date – End date (YYYY-MM-DD)

  • min_lat – Minimum latitude

  • max_lat – Maximum latitude

  • min_lon – Minimum longitude

  • max_lon – Maximum longitude

  • categories – Single category or list of categories (e.g., ‘Cargo’, [‘Cargo’, ‘Tanker’])

  • limit – Maximum number of records to return (optional)

  • return_gdf – Return GeoDataFrame (default: True) or DataFrame

Returns:

GeoDataFrame with query results, including the vessel category (or DataFrame if return_gdf=False)

get_unique_vessels(start_date: str, end_date: str, min_lat: float, max_lat: float, min_lon: float, max_lon: float) DataFrame[source]

Get unique vessels in a region during a time period. Returns a DataFrame rather than a GeoDataFrame because each vessel has multiple positions and therefore no single point geometry.

Parameters:
  • start_date – Start date (YYYY-MM-DD)

  • end_date – End date (YYYY-MM-DD)

  • min_lat – Minimum latitude

  • max_lat – Maximum latitude

  • min_lon – Minimum longitude

  • max_lon – Maximum longitude

Returns:

DataFrame with unique vessels and their details

get_statistics(start_date: str, end_date: str, min_lat: float | None = None, max_lat: float | None = None, min_lon: float | None = None, max_lon: float | None = None) dict[source]

Get statistics about AIS data for a given region and time period.

Parameters:
  • start_date – Start date (YYYY-MM-DD)

  • end_date – End date (YYYY-MM-DD)

  • min_lat – Minimum latitude (optional)

  • max_lat – Maximum latitude (optional)

  • min_lon – Minimum longitude (optional)

  • max_lon – Maximum longitude (optional)

Returns:

Dictionary with statistics

create_gridded_vessel_counts(start_date: str, end_date: str, min_lat: float, max_lat: float, min_lon: float, max_lon: float, width_km: float, height_km: float, time_resolution_hours: int = 1) DataArray[source]

Create a gridded xarray DataArray with counts of unique vessels per grid cell and time step. The time step is set by time_resolution_hours (hourly by default).

Parameters:
  • start_date – Start date (YYYY-MM-DD or YYYY-MM-DD HH:MM:SS)

  • end_date – End date (YYYY-MM-DD or YYYY-MM-DD HH:MM:SS)

  • min_lat – Minimum latitude

  • max_lat – Maximum latitude

  • min_lon – Minimum longitude

  • max_lon – Maximum longitude

  • width_km – Grid cell width in kilometers

  • height_km – Grid cell height in kilometers

  • time_resolution_hours – Time resolution in hours (default: 1)

Returns:

xarray.DataArray with dimensions (time, lat, lon) containing unique vessel counts (NaN for zero counts)
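The aggregation can be pictured as binning each position into a (time, row, column) cell and counting distinct MMSIs per cell. The sketch below shows that idea with the standard library only. It is an assumption-laden illustration, not the method's code: the function name, the record layout (timestamp, lat, lon, mmsi), the km-to-degree conversion at the mid-latitude of the box, and the dict output are all hypothetical, whereas the real method returns an xarray DataArray.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta

KM_PER_DEG = 6371.0 * math.pi / 180.0  # km per degree of latitude (spherical Earth)


def gridded_unique_counts(records, t0, min_lat, max_lat, min_lon,
                          width_km, height_km, time_resolution_hours=1):
    """Count unique MMSIs per (time_index, row, col) cell.

    records: iterable of (timestamp, lat, lon, mmsi) tuples.
    Cells without any vessel are simply absent from the result.
    """
    dlat = height_km / KM_PER_DEG
    # Convert cell width to degrees of longitude at the box's mid-latitude.
    mid_lat = (min_lat + max_lat) / 2.0
    dlon = width_km / (KM_PER_DEG * math.cos(math.radians(mid_lat)))
    step_s = time_resolution_hours * 3600
    cells = defaultdict(set)
    for ts, lat, lon, mmsi in records:
        t_idx = int((ts - t0).total_seconds() // step_s)
        row = int((lat - min_lat) // dlat)
        col = int((lon - min_lon) // dlon)
        cells[(t_idx, row, col)].add(mmsi)  # a set keeps each vessel once
    return {key: len(vessels) for key, vessels in cells.items()}
```

Using a set per cell means a vessel that reports many positions inside the same cell and time step is counted once, which matches the "unique vessel" wording above.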

static calculate_bounding_box(center_lat: float, center_lon: float, radius_km: float) dict[source]

Calculate bounding box for a given center point and radius.

Parameters:
  • center_lat – Center latitude

  • center_lon – Center longitude

  • radius_km – Radius in kilometers

Returns:

Dictionary with min_lat, max_lat, min_lon, max_lon
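One common way to derive such a box is to convert the radius to degrees of latitude and then widen the longitude span by 1/cos(latitude). The sketch below shows that approach under stated assumptions: a spherical Earth of radius 6371 km, no handling of the antimeridian, and a hypothetical function name. The actual method may compute the box differently.

```python
import math

KM_PER_DEG = 6371.0 * math.pi / 180.0  # km per degree of latitude (assumed sphere)


def bounding_box(center_lat, center_lon, radius_km):
    """Approximate lat/lon box around a circle of radius_km."""
    dlat = radius_km / KM_PER_DEG
    # A degree of longitude shrinks by cos(latitude) away from the equator.
    cos_lat = max(math.cos(math.radians(center_lat)), 1e-6)
    dlon = radius_km / (KM_PER_DEG * cos_lat)
    return {
        "min_lat": max(center_lat - dlat, -90.0),
        "max_lat": min(center_lat + dlat, 90.0),
        "min_lon": center_lon - dlon,
        "max_lon": center_lon + dlon,
    }
```

At 60 degrees latitude the longitude span is about twice the latitude span for the same radius, because cos(60°) is 0.5.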