AIS Query
The AISQueryHelper class queries Automatic Identification System (AIS)
vessel-traffic data stored in a local DuckDB database backed by Parquet files.
It supports spatial queries by bounding box or radius, vessel track retrieval,
category-based filtering, and gridded vessel-count aggregation. Results are
returned as GeoDataFrames by default.
- class ecosound.environment.ais.AISQueryHelper(db_path: str)
Bases: object
Helper class for efficiently querying AIS data stored in DuckDB + Parquet. Optimized for date, geographic, and vessel-based queries.
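The position-query methods (query_rectangle, query_radius, query_vessel_track, query_by_vessel_category) return GeoDataFrames with Point geometries in WGS84 (EPSG:4326) by default; pass return_gdf=False to get a plain DataFrame instead.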
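The date optimization relies on Hive partitioning: each day's Parquet files live in their own `date=YYYY-MM-DD` directory, so a date-range query only has to open the directories inside the range. The sketch below illustrates that pruning step in plain Python. The root directory name `ais`, the partition column name `date`, and the inclusive end date are assumptions for illustration, not the package's confirmed layout.

```python
from datetime import date, timedelta
from typing import List


def date_partitions(start_date: str, end_date: str, root: str = "ais") -> List[str]:
    """List the Hive-style date partitions that a query must read.

    Hypothetical layout: <root>/date=YYYY-MM-DD/*.parquet. end_date is
    treated as inclusive (an assumption). Every other partition is skipped.
    """
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if end < start:
        raise ValueError("end_date must not be earlier than start_date")
    n_days = (end - start).days + 1
    return [f"{root}/date={(start + timedelta(days=i)).isoformat()}" for i in range(n_days)]


print(date_partitions("2024-02-28", "2024-03-01"))
# ['ais/date=2024-02-28', 'ais/date=2024-02-29', 'ais/date=2024-03-01']
```

In practice DuckDB can do this pruning itself when Parquet files are read with Hive partitioning enabled and the query filters on the partition column; the sketch only makes the mechanism visible.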
- __init__(db_path: str)
Initialize the query helper.
- Parameters:
db_path – Path to DuckDB database file
- query_rectangle(start_date: str, end_date: str, min_lat: float, max_lat: float, min_lon: float, max_lon: float, mmsi: int | List[int] | None = None, vessel_type: int | List[int] | None = None, vessel_name: str | None = None, limit: int | None = None, return_gdf: bool = True) → DataFrame | GeoDataFrame
Query AIS data within a geographic rectangle and date range. Optimized: Hive partitioning on date lets the query skip partitions outside the date range, and Parquet column statistics on lat/lon let it skip row groups outside the rectangle.
- Parameters:
start_date – Start date (YYYY-MM-DD)
end_date – End date (YYYY-MM-DD)
min_lat – Minimum latitude
max_lat – Maximum latitude
min_lon – Minimum longitude
max_lon – Maximum longitude
mmsi – Single MMSI or list of MMSIs (optional)
vessel_type – Single vessel type code or list of codes (optional)
vessel_name – Vessel name pattern for LIKE search (optional)
limit – Maximum number of records to return (optional)
return_gdf – Return GeoDataFrame (default: True) or DataFrame
- Returns:
GeoDataFrame with query results (or DataFrame if return_gdf=False)
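The optional filters combine as extra conditions on top of the date and rectangle bounds. The sketch below shows one way such a query could be assembled as a parameterized SQL string, accepting either a single value or a list for mmsi and vessel_type. The table name `ais` and the column names are hypothetical, and this is not the package's actual query text.

```python
from typing import List, Optional, Tuple, Union

IntOrList = Optional[Union[int, List[int]]]


def build_rectangle_query(
    start_date: str, end_date: str,
    min_lat: float, max_lat: float, min_lon: float, max_lon: float,
    mmsi: IntOrList = None, vessel_type: IntOrList = None,
    vessel_name: Optional[str] = None, limit: Optional[int] = None,
) -> Tuple[str, list]:
    """Return a parameterized SQL string and its parameter list.

    Table and column names (ais, date, lat, lon, mmsi, vessel_type,
    vessel_name) are hypothetical.
    """
    clauses = ["date BETWEEN ? AND ?", "lat BETWEEN ? AND ?", "lon BETWEEN ? AND ?"]
    params: list = [start_date, end_date, min_lat, max_lat, min_lon, max_lon]

    # Accept either a single code or a list of codes for each ID filter.
    for column, value in (("mmsi", mmsi), ("vessel_type", vessel_type)):
        if value is None:
            continue
        values = [value] if isinstance(value, int) else list(value)
        if not values:
            raise ValueError(f"empty list given for {column}")
        clauses.append(f"{column} IN ({', '.join(['?'] * len(values))})")
        params.extend(values)

    if vessel_name is not None:
        clauses.append("vessel_name LIKE ?")  # caller supplies % and _ wildcards
        params.append(vessel_name)

    sql = "SELECT * FROM ais WHERE " + " AND ".join(clauses)
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql, params
```

Keeping values out of the SQL text and passing them as parameters (for example to a DuckDB connection's `execute(sql, params)`) avoids quoting problems with vessel names.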
- query_radius(start_date: str, end_date: str, center_lat: float, center_lon: float, radius_km: float, mmsi: int | List[int] | None = None, vessel_type: int | List[int] | None = None, vessel_name: str | None = None, limit: int | None = None, return_gdf: bool = True) → DataFrame | GeoDataFrame
Query AIS data within a given radius of a center point. Optimized: a bounding box is used as a cheap prefilter, and the exact distance is then calculated only for the points inside it.
- Parameters:
start_date – Start date (YYYY-MM-DD)
end_date – End date (YYYY-MM-DD)
center_lat – Center latitude (decimal degrees)
center_lon – Center longitude (decimal degrees)
radius_km – Radius in kilometers
mmsi – Single MMSI or list of MMSIs (optional)
vessel_type – Single vessel type code or list of codes (optional)
vessel_name – Vessel name pattern for LIKE search (optional)
limit – Maximum number of records to return (optional)
return_gdf – Return GeoDataFrame (default: True) or DataFrame
- Returns:
GeoDataFrame with query results, including a ‘distance_km’ column (distance from the center point in kilometers)
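The second stage of a radius query keeps only the bounding-box candidates that really lie within the radius and records their distance. The sketch below does this with the haversine great-circle formula on a spherical Earth. The documentation does not say which distance formula the package uses, so haversine, the mean Earth radius value, and the `lat`/`lon`/`mmsi` row keys are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; the package may use another value


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two WGS84 points on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def refine_radius(rows, center_lat, center_lon, radius_km):
    """Keep bounding-box candidates that lie within radius_km and attach
    a 'distance_km' value to each (hypothetical row format: dicts)."""
    out = []
    for row in rows:
        d = haversine_km(center_lat, center_lon, row["lat"], row["lon"])
        if d <= radius_km:
            out.append({**row, "distance_km": d})
    return out


candidates = [
    {"mmsi": 1, "lat": 49.0, "lon": -123.0},    # at the center
    {"mmsi": 2, "lat": 49.0, "lon": -122.9},    # about 7.3 km east
    {"mmsi": 3, "lat": 49.09, "lon": -122.91},  # near a box corner, about 12 km away
]
print([r["mmsi"] for r in refine_radius(candidates, 49.0, -123.0, 10.0)])  # [1, 2]
```

Row 3 passes a 10 km bounding box but is dropped by the exact test, which is why the box alone is not sufficient.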
- query_vessel_track(mmsi: int, start_date: str, end_date: str, min_lat: float | None = None, max_lat: float | None = None, min_lon: float | None = None, max_lon: float | None = None, return_gdf: bool = True) → DataFrame | GeoDataFrame
Query all positions of a single vessel (its track). Optimized: the MMSI filter is highly selective.
- Parameters:
mmsi – Vessel MMSI
start_date – Start date (YYYY-MM-DD)
end_date – End date (YYYY-MM-DD)
min_lat – Minimum latitude (optional)
max_lat – Maximum latitude (optional)
min_lon – Minimum longitude (optional)
max_lon – Maximum longitude (optional)
return_gdf – Return GeoDataFrame (default: True) or DataFrame
- Returns:
GeoDataFrame with vessel track, sorted by time
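A track query reduces to three steps: select one MMSI, apply whichever spatial bounds were given, and sort by time. The sketch below shows those steps on plain dict rows. The row keys (`mmsi`, `time`, `lat`, `lon`), ISO 8601 timestamps, and an inclusive end date are assumptions for illustration.

```python
from datetime import date, datetime


def vessel_track(rows, mmsi, start_date, end_date,
                 min_lat=None, max_lat=None, min_lon=None, max_lon=None):
    """Positions of one vessel between start_date and end_date (inclusive,
    an assumption), optionally clipped to bounds, sorted by time."""
    start, end = date.fromisoformat(start_date), date.fromisoformat(end_date)

    def in_bounds(r):
        # Each bound is applied only when it was given.
        return ((min_lat is None or r["lat"] >= min_lat) and
                (max_lat is None or r["lat"] <= max_lat) and
                (min_lon is None or r["lon"] >= min_lon) and
                (max_lon is None or r["lon"] <= max_lon))

    track = [r for r in rows
             if r["mmsi"] == mmsi
             and start <= datetime.fromisoformat(r["time"]).date() <= end
             and in_bounds(r)]
    return sorted(track, key=lambda r: datetime.fromisoformat(r["time"]))


rows = [
    {"mmsi": 7, "time": "2024-05-02T08:00:00", "lat": 48.5, "lon": -123.4},
    {"mmsi": 7, "time": "2024-05-01T23:30:00", "lat": 48.4, "lon": -123.3},
    {"mmsi": 9, "time": "2024-05-01T12:00:00", "lat": 48.6, "lon": -123.5},
    {"mmsi": 7, "time": "2024-05-03T01:00:00", "lat": 48.7, "lon": -123.6},
]
print([r["time"] for r in vessel_track(rows, 7, "2024-05-01", "2024-05-02")])
# ['2024-05-01T23:30:00', '2024-05-02T08:00:00']
```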
- query_by_vessel_category(start_date: str, end_date: str, min_lat: float, max_lat: float, min_lon: float, max_lon: float, categories: str | List[str], limit: int | None = None, return_gdf: bool = True) → DataFrame | GeoDataFrame
Query AIS data by vessel category (Cargo, Tanker, Fishing, etc.). Uses the vessel_type_lookup table for categorization.
- Parameters:
start_date – Start date (YYYY-MM-DD)
end_date – End date (YYYY-MM-DD)
min_lat – Minimum latitude
max_lat – Maximum latitude
min_lon – Minimum longitude
max_lon – Maximum longitude
categories – Single category or list of categories (e.g., ‘Cargo’, [‘Cargo’, ‘Tanker’])
limit – Maximum number of records to return (optional)
return_gdf – Return GeoDataFrame (default: True) or DataFrame
- Returns:
GeoDataFrame with query results, including each record's vessel category
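Category filtering works by mapping each record's numeric vessel_type code to a category name and keeping the records whose category was requested. The sketch below uses a small dict as a stand-in for the vessel_type_lookup table, built from a subset of the standard AIS ship-type codes (30 fishing, 60 to 69 passenger, 70 to 79 cargo, 80 to 89 tanker). The actual contents of vessel_type_lookup, the "Passenger" and "Other" labels, and the dict row format are assumptions.

```python
from typing import List, Union

# Hypothetical stand-in for the vessel_type_lookup table.
VESSEL_TYPE_LOOKUP = {30: "Fishing"}
VESSEL_TYPE_LOOKUP.update({code: "Passenger" for code in range(60, 70)})
VESSEL_TYPE_LOOKUP.update({code: "Cargo" for code in range(70, 80)})
VESSEL_TYPE_LOOKUP.update({code: "Tanker" for code in range(80, 90)})


def filter_by_category(rows, categories: Union[str, List[str]]):
    """Keep rows whose vessel category is in `categories`, adding a 'category' key."""
    wanted = {categories} if isinstance(categories, str) else set(categories)
    out = []
    for row in rows:
        # Like a LEFT JOIN on vessel_type; unmatched or missing codes become "Other".
        category = VESSEL_TYPE_LOOKUP.get(row.get("vessel_type"), "Other")
        if category in wanted:
            out.append({**row, "category": category})
    return out


rows = [{"mmsi": 1, "vessel_type": 70}, {"mmsi": 2, "vessel_type": 84},
        {"mmsi": 3, "vessel_type": 30}, {"mmsi": 4, "vessel_type": None}]
print([r["mmsi"] for r in filter_by_category(rows, ["Cargo", "Tanker"])])  # [1, 2]
```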
- get_unique_vessels(start_date: str, end_date: str, min_lat: float, max_lat: float, min_lon: float, max_lon: float) → DataFrame
Get the unique vessels present in a region during a time period. Returns a DataFrame rather than a GeoDataFrame, because each vessel has many positions and therefore no single point geometry.
- Parameters:
start_date – Start date (YYYY-MM-DD)
end_date – End date (YYYY-MM-DD)
min_lat – Minimum latitude
max_lat – Maximum latitude
min_lon – Minimum longitude
max_lon – Maximum longitude
- Returns:
DataFrame with unique vessels and their details
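Collapsing position reports to one row per vessel is a group-by on MMSI, roughly `SELECT mmsi, MIN(time), MAX(time), COUNT(*) ... GROUP BY mmsi` in SQL. The sketch below shows that aggregation in plain Python. The output fields (`first_seen`, `last_seen`, `n_positions`) are hypothetical and not the documented columns, and timestamps are assumed to be ISO 8601 strings in one consistent format so that string comparison orders them correctly.

```python
def unique_vessels(rows):
    """Collapse position reports into one summary record per MMSI (hypothetical fields)."""
    summary = {}
    for r in rows:
        s = summary.get(r["mmsi"])
        if s is None:
            summary[r["mmsi"]] = {
                "mmsi": r["mmsi"],
                "vessel_name": r.get("vessel_name"),
                "first_seen": r["time"],
                "last_seen": r["time"],
                "n_positions": 1,
            }
        else:
            # ISO 8601 strings in the same format sort chronologically.
            s["first_seen"] = min(s["first_seen"], r["time"])
            s["last_seen"] = max(s["last_seen"], r["time"])
            s["n_positions"] += 1
    return sorted(summary.values(), key=lambda s: s["mmsi"])


rows = [
    {"mmsi": 5, "time": "2024-01-01T10:00:00", "vessel_name": "A"},
    {"mmsi": 3, "time": "2024-01-01T09:00:00", "vessel_name": "B"},
    {"mmsi": 5, "time": "2024-01-01T08:00:00", "vessel_name": "A"},
]
print([(v["mmsi"], v["n_positions"]) for v in unique_vessels(rows)])  # [(3, 1), (5, 2)]
```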
- get_statistics(start_date: str, end_date: str, min_lat: float | None = None, max_lat: float | None = None, min_lon: float | None = None, max_lon: float | None = None) → dict
Get statistics about AIS data for a given region and time period.
- Parameters:
start_date – Start date (YYYY-MM-DD)
end_date – End date (YYYY-MM-DD)
min_lat – Minimum latitude (optional)
max_lat – Maximum latitude (optional)
min_lon – Minimum longitude (optional)
max_lon – Maximum longitude (optional)
- Returns:
Dictionary with statistics
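The documentation does not list the keys of the statistics dictionary. The sketch below shows the kind of summary such a method could compute, with the optional bounds applied only when given. Every key name here is hypothetical, and the rows are assumed to be already restricted to the requested date range.

```python
def summarize_positions(rows, min_lat=None, max_lat=None, min_lon=None, max_lon=None):
    """Hypothetical summary of position reports inside optional bounds."""
    def keep(r):
        # Each bound is applied only when it was given.
        return ((min_lat is None or r["lat"] >= min_lat) and
                (max_lat is None or r["lat"] <= max_lat) and
                (min_lon is None or r["lon"] >= min_lon) and
                (max_lon is None or r["lon"] <= max_lon))

    selected = [r for r in rows if keep(r)]
    if not selected:
        return {"n_positions": 0, "n_vessels": 0, "first_time": None, "last_time": None}
    times = [r["time"] for r in selected]
    return {
        "n_positions": len(selected),
        "n_vessels": len({r["mmsi"] for r in selected}),
        "first_time": min(times),  # ISO 8601 strings sort chronologically
        "last_time": max(times),
    }


rows = [
    {"mmsi": 1, "time": "2024-01-01T00:00:00", "lat": 48.0, "lon": -123.0},
    {"mmsi": 1, "time": "2024-01-02T00:00:00", "lat": 48.1, "lon": -123.1},
    {"mmsi": 2, "time": "2024-01-01T12:00:00", "lat": 50.0, "lon": -123.0},
]
print(summarize_positions(rows, max_lat=49.0))
```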
- create_gridded_vessel_counts(start_date: str, end_date: str, min_lat: float, max_lat: float, min_lon: float, max_lon: float, width_km: float, height_km: float, time_resolution_hours: int = 1) → DataArray
Create a gridded xarray DataArray counting the unique vessels in each grid cell and time bin (hourly by default; the bin length is set by time_resolution_hours).
- Parameters:
start_date – Start date (YYYY-MM-DD or YYYY-MM-DD HH:MM:SS)
end_date – End date (YYYY-MM-DD or YYYY-MM-DD HH:MM:SS)
min_lat – Minimum latitude
max_lat – Maximum latitude
min_lon – Minimum longitude
max_lon – Maximum longitude
width_km – Grid cell width in kilometers
height_km – Grid cell height in kilometers
time_resolution_hours – Time resolution in hours (default: 1)
- Returns:
xarray.DataArray with dimensions (time, lat, lon) containing unique vessel counts; cells with no vessels hold NaN rather than 0
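The gridding step converts the cell size from kilometers to degrees, assigns each position to a (time bin, lat cell, lon cell), and counts distinct MMSIs per cell, so a vessel reporting many times in one hour counts once. The sketch below returns a sparse dict instead of a DataArray; absent keys correspond to the NaN cells. Several choices are assumptions, not the package's documented behavior: 111.32 km per degree of latitude, longitude spacing taken at the box's mid-latitude, naive UTC ISO timestamps, and time bins aligned to the Unix epoch rather than to start_date.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude (assumption)
EPOCH = datetime(1970, 1, 1)  # timestamps assumed naive UTC


def gridded_unique_counts(rows, min_lat, max_lat, min_lon, max_lon,
                          width_km, height_km, time_resolution_hours=1):
    """Count unique MMSIs per (time bin, lat cell, lon cell); sparse output."""
    # Convert cell sizes from km to degrees; longitude uses the box's mid-latitude.
    dlat = height_km / KM_PER_DEG_LAT
    dlon = width_km / (KM_PER_DEG_LAT * math.cos(math.radians((min_lat + max_lat) / 2)))
    n_lat = max(1, math.ceil((max_lat - min_lat) / dlat))
    n_lon = max(1, math.ceil((max_lon - min_lon) / dlon))
    step_s = int(time_resolution_hours * 3600)

    cells = defaultdict(set)  # (bin start, lat index, lon index) -> MMSIs seen
    for r in rows:
        if not (min_lat <= r["lat"] <= max_lat and min_lon <= r["lon"] <= max_lon):
            continue
        secs = int((datetime.fromisoformat(r["time"]) - EPOCH).total_seconds())
        t_bin = (EPOCH + timedelta(seconds=secs - secs % step_s)).isoformat()
        i = min(int((r["lat"] - min_lat) // dlat), n_lat - 1)  # clamp points on the max edge
        j = min(int((r["lon"] - min_lon) // dlon), n_lon - 1)
        cells[(t_bin, i, j)].add(r["mmsi"])

    # Absent keys are empty cells (NaN in the documented DataArray).
    return {key: len(mmsis) for key, mmsis in cells.items()}


rows = [
    {"mmsi": 1, "time": "2024-01-01T10:05:00", "lat": 48.01, "lon": -123.99},
    {"mmsi": 1, "time": "2024-01-01T10:45:00", "lat": 48.02, "lon": -123.98},
    {"mmsi": 2, "time": "2024-01-01T10:30:00", "lat": 48.03, "lon": -123.97},
    {"mmsi": 2, "time": "2024-01-01T11:10:00", "lat": 48.03, "lon": -123.97},
    {"mmsi": 3, "time": "2024-01-01T10:10:00", "lat": 50.0, "lon": -123.5},  # outside the box
]
print(gridded_unique_counts(rows, 48.0, 49.0, -124.0, -123.0, 10.0, 10.0))
```

Turning the sparse result into a dense (time, lat, lon) array is then a matter of allocating the full grid filled with NaN and writing each count into its cell.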
- static calculate_bounding_box(center_lat: float, center_lon: float, radius_km: float) → dict
Calculate bounding box for a given center point and radius.
- Parameters:
center_lat – Center latitude
center_lon – Center longitude
radius_km – Radius in kilometers
- Returns:
Dictionary with min_lat, max_lat, min_lon, max_lon
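A bounding box for a radius query must fully contain the circle, and its longitude half-width grows toward the poles because meridians converge. The sketch below computes such a box on a spherical Earth, widening to all longitudes when the circle reaches a pole. It is an illustration under stated assumptions (spherical Earth with mean radius 6371.0088 km, no wrapping across the antimeridian), not the package's implementation; it only reuses the documented dictionary keys.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumption)


def bounding_box_sketch(center_lat: float, center_lon: float, radius_km: float) -> dict:
    """Box enclosing all points within radius_km of the center on a sphere."""
    ang = radius_km / EARTH_RADIUS_KM  # angular radius in radians
    dlat = math.degrees(ang)
    min_lat, max_lat = center_lat - dlat, center_lat + dlat
    if min_lat <= -90.0 or max_lat >= 90.0:
        # The circle contains a pole, so every longitude is inside it.
        return {"min_lat": max(min_lat, -90.0), "max_lat": min(max_lat, 90.0),
                "min_lon": -180.0, "max_lon": 180.0}
    # Exact longitude half-width of a spherical cap; wider than dlat away from the equator.
    dlon = math.degrees(math.asin(math.sin(ang) / math.cos(math.radians(center_lat))))
    # Note: min_lon/max_lon are not wrapped into [-180, 180] near the antimeridian.
    return {"min_lat": min_lat, "max_lat": max_lat,
            "min_lon": center_lon - dlon, "max_lon": center_lon + dlon}


print(bounding_box_sketch(60.0, 10.0, 100.0))
```

At 60° latitude the longitude half-width comes out close to twice the latitude half-width, which is why a box built from a single degrees-per-kilometer factor would be too narrow there.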