AIS Downloader

The AISDataDownloaderDuckDB class downloads historical AIS vessel-traffic data from Marine Cadastre and stores it efficiently in a local DuckDB database with Parquet files for fast spatial and temporal queries.

class ecosound.environment.ais_downloader.AISDataDownloaderDuckDB(db_path: str, parquet_dir: str, temp_dir: str | None = None)

Bases: object

Downloads and processes AIS data from Marine Cadastre with DuckDB + Parquet storage. Optimized for fast temporal and geographic queries.

Initialize the AIS data downloader with DuckDB.

Parameters:
  • db_path – Path to DuckDB database file

  • parquet_dir – Directory to store Parquet files (partitioned by date)

  • temp_dir – Temporary directory for downloads (default: system temp)

BASE_URL = 'https://coast.noaa.gov/htdata/CMSP/AISDataHandler'

setup_database()

Initialize DuckDB database with optimized settings and schema.

create_ais_view()

Create a view that unions all Parquet files for easy querying. This view enables querying all AIS data as a single table.
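
A rough sketch of how such a unified view could be defined is given below. It relies on DuckDB's read_parquet function, which accepts glob patterns; the view name ais_data, the helper build_view_sql, and the recursive directory layout are assumptions made for illustration, not details confirmed by this class.

```python
from pathlib import Path


def build_view_sql(parquet_dir: str, view_name: str = "ais_data") -> str:
    """Return a CREATE VIEW statement covering every Parquet file under parquet_dir.

    ASSUMPTION: view_name and the recursive '**/*.parquet' layout are
    illustrative; the real class may name or organise things differently.
    """
    # as_posix() keeps forward slashes so the glob is also valid on Windows
    pattern = (Path(parquet_dir) / "**" / "*.parquet").as_posix()
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS "
        f"SELECT * FROM read_parquet('{pattern}')"
    )


sql = build_view_sql("data/ais_parquet")
print(sql)
```

The resulting statement could then be passed to a DuckDB connection, for example with duckdb.connect(db_path).execute(sql).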

generate_date_urls(start_date: str, end_date: str) -> List[Tuple[str, str]]

Generate download URLs for date range.

Parameters:
  • start_date – Start date in YYYY-MM-DD format

  • end_date – End date in YYYY-MM-DD format

Returns:

List of (url, date_string) tuples
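
To illustrate the shape of the returned list, here is a minimal standalone sketch of daily URL generation. The function name date_urls, the inclusive end date, and the file layout <BASE_URL>/<year>/AIS_<YYYY>_<MM>_<DD>.zip are assumptions; the method itself may build URLs differently.

```python
from datetime import date, timedelta
from typing import List, Tuple

BASE_URL = "https://coast.noaa.gov/htdata/CMSP/AISDataHandler"


def date_urls(start_date: str, end_date: str) -> List[Tuple[str, str]]:
    """Return one (url, date_string) tuple per day, both ends inclusive."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    urls: List[Tuple[str, str]] = []
    day = start
    while day <= end:
        # ASSUMPTION: one ZIP per day, stored in a per-year folder
        name = f"AIS_{day:%Y_%m_%d}.zip"
        urls.append((f"{BASE_URL}/{day.year}/{name}", day.isoformat()))
        day += timedelta(days=1)
    return urls


urls = date_urls("2023-01-30", "2023-02-01")
for url, day_str in urls:
    print(day_str, url)
```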

is_date_in_database(date_str: str) -> bool

Check if a date has already been processed and is in the database.

Parameters:

date_str – Date string in YYYY-MM-DD format

Returns:

True if date exists in database, False otherwise

async download_file(session: ClientSession, url: str, date_str: str, force_download: bool = False) -> Path | None

Download a single AIS data file.

Parameters:
  • session – aiohttp client session

  • url – Download URL

  • date_str – Date string for filename

  • force_download – If True, download even if file already exists

Returns:

Path to downloaded file or None if failed

async download_files(start_date: str, end_date: str, max_concurrent: int = 5, force_download: bool = False) -> List[Path]

Download multiple AIS data files concurrently.

Parameters:
  • start_date – Start date in YYYY-MM-DD format

  • end_date – End date in YYYY-MM-DD format

  • max_concurrent – Maximum concurrent downloads

  • force_download – If True, download files even if already in database

Returns:

List of downloaded file paths
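
The max_concurrent limit can be pictured with the following sketch, which caps the number of simultaneous tasks using asyncio.Semaphore. It is not the method's actual implementation: fake_download is a hypothetical stand-in for an aiohttp request so that the example runs with the standard library alone.

```python
import asyncio
from typing import List, Optional


async def fake_download(name: str, state: dict) -> Optional[str]:
    """Hypothetical stand-in for an HTTP download; tracks concurrent calls."""
    state["active"] += 1
    state["peak"] = max(state["peak"], state["active"])
    await asyncio.sleep(0.01)  # simulate network latency
    state["active"] -= 1
    return name


async def download_all(names: List[str], max_concurrent: int = 5) -> dict:
    """Run all downloads, never more than max_concurrent at the same time."""
    state = {"active": 0, "peak": 0, "done": []}
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(name: str) -> Optional[str]:
        # The semaphore blocks here once max_concurrent tasks are in flight
        async with semaphore:
            return await fake_download(name, state)

    results = await asyncio.gather(*(bounded(n) for n in names))
    # Failed downloads would return None and are dropped from the result
    state["done"] = [r for r in results if r is not None]
    return state


state = asyncio.run(
    download_all([f"file_{i}.zip" for i in range(12)], max_concurrent=3)
)
print("peak concurrency:", state["peak"], "files:", len(state["done"]))
```

Because download_files is a coroutine, it presumably has to be awaited or run through asyncio.run in the same way.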

extract_and_process_file(zip_path: Path, min_lat: float | None = None, max_lat: float | None = None, min_lon: float | None = None, max_lon: float | None = None, force_process: bool = False) -> int

Extract ZIP file and process CSV data into Parquet with geographic filtering.

Parameters:
  • zip_path – Path to ZIP file

  • min_lat – Minimum latitude of the geographic filter (optional)

  • max_lat – Maximum latitude of the geographic filter (optional)

  • min_lon – Minimum longitude of the geographic filter (optional)

  • max_lon – Maximum longitude of the geographic filter (optional)

  • force_process – If True, reprocess even if already in database

Returns:

Number of records inserted
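
The geographic filtering step can be sketched as follows, using only the standard library. This is an illustration under assumptions: it reads the first CSV in an in-memory ZIP and keeps rows inside the bounding box, it does not write Parquet, and the column names LAT and LON as well as the helper names are assumed rather than taken from the class.

```python
import csv
import io
import zipfile
from typing import Dict, List, Optional


def in_bounds(value: float, lower: Optional[float], upper: Optional[float]) -> bool:
    """True if value lies within [lower, upper]; a None bound is ignored."""
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True


def filter_zip_csv(
    zip_bytes: bytes,
    min_lat: Optional[float] = None,
    max_lat: Optional[float] = None,
    min_lon: Optional[float] = None,
    max_lon: Optional[float] = None,
) -> List[Dict[str, str]]:
    """Read the first CSV in a ZIP archive and keep rows inside the box."""
    kept: List[Dict[str, str]] = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        csv_name = next(n for n in archive.namelist() if n.lower().endswith(".csv"))
        with archive.open(csv_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            for row in csv.DictReader(text):
                # ASSUMPTION: coordinates are in columns named LAT and LON
                lat, lon = float(row["LAT"]), float(row["LON"])
                if in_bounds(lat, min_lat, max_lat) and in_bounds(lon, min_lon, max_lon):
                    kept.append(row)
    return kept


# Build a tiny in-memory archive to exercise the filter
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("AIS_demo.csv", "MMSI,LAT,LON\n111,48.5,-123.2\n222,30.0,-80.0\n")

rows = filter_zip_csv(
    buffer.getvalue(), min_lat=45.0, max_lat=50.0, min_lon=-125.0, max_lon=-120.0
)
print(len(rows), "row(s) kept")
```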

process_all_files(downloaded_files: List[Path], min_lat: float | None = None, max_lat: float | None = None, min_lon: float | None = None, max_lon: float | None = None, max_workers: int = 4, force_process: bool = False)

Process all downloaded files in parallel using ThreadPoolExecutor.

Parameters:
  • downloaded_files – List of downloaded ZIP file paths

  • min_lat – Minimum latitude of the geographic filter

  • max_lat – Maximum latitude of the geographic filter

  • min_lon – Minimum longitude of the geographic filter

  • max_lon – Maximum longitude of the geographic filter

  • max_workers – Maximum number of parallel workers (default: 4)

  • force_process – If True, reprocess even if already in database
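
The parallel pattern named above can be sketched with concurrent.futures. The worker process_one is a hypothetical placeholder that returns a fake record count; in the real class each worker would presumably call extract_and_process_file.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Dict, List


def process_one(zip_path: Path) -> int:
    """Hypothetical per-file worker; returns a made-up record count."""
    return len(zip_path.name)


def process_all(paths: List[Path], max_workers: int = 4) -> Dict[str, int]:
    """Process files in a thread pool and collect a count per file name."""
    counts: Dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its input so results can be labelled
        futures = {pool.submit(process_one, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                counts[path.name] = future.result()
            except Exception as exc:
                # One failing file should not abort the remaining ones
                print(f"{path.name} failed: {exc}")
                counts[path.name] = 0
    return counts


files = [Path(f"AIS_2023_01_0{d}.zip") for d in range(1, 4)]
counts = process_all(files, max_workers=2)
print(counts)
```

Collecting results with as_completed means a slow file does not delay the reporting of faster ones.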

optimize_database()

Optimize the database and create the unified view.

cleanup_temp_files()

Remove temporary download directory.

get_stats()

Print database statistics.

create_vessel_type_lookup_table()

Create and populate a vessel_type_lookup table in the DuckDB database. Based on the AIS specification standard codes.
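
For orientation, the sketch below builds a partial code-to-label mapping from the standard AIS ship type codes (ITU-R M.1371). The labels, the selection of codes, and the two-column (code, description) layout are assumptions; the table this method actually creates may differ.

```python
from typing import List, Tuple

# Single codes with a specific meaning (partial list)
SINGLE_CODES = {
    30: "Fishing",
    31: "Towing",
    35: "Military operations",
    36: "Sailing",
    37: "Pleasure craft",
    50: "Pilot vessel",
    51: "Search and rescue",
    52: "Tug",
    55: "Law enforcement",
}

# Code ranges that share one broad category (end is exclusive)
RANGE_CODES = [
    (60, 70, "Passenger"),
    (70, 80, "Cargo"),
    (80, 90, "Tanker"),
    (90, 100, "Other"),
]


def vessel_type_rows() -> List[Tuple[int, str]]:
    """Return (code, description) rows sorted by code."""
    rows = dict(SINGLE_CODES)
    for start, stop, label in RANGE_CODES:
        for code in range(start, stop):
            rows[code] = label
    return sorted(rows.items())


lookup = dict(vessel_type_rows())
print(lookup[30], lookup[75], lookup[89])
```

Rows in this form could be loaded into a DuckDB table with a parameterized INSERT via the connection's executemany method.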

example_queries()

Print example queries to demonstrate the fast querying capabilities.