HMD

The HMD class processes Hybrid Millidecade (HMD) spectral data stored in NetCDF files. It supports loading data from one or more deployment directories, extracting broadband or fractional-octave band levels, and optional Dask-based parallel processing for large datasets.

class ecosound.soundscape.hmd.HMD(n_workers=4, memory_per_worker='4GB', temp_directory=None, use_dask=True, use_processes=None)[source]

Bases: object

Hybrid Millidecade acoustic data processor for NetCDF spectral files.

Parameters:
  • n_workers (int, optional) – Number of Dask workers (default: 4)

  • memory_per_worker (str, optional) – Memory limit per worker (default: ‘4GB’)

  • temp_directory (str, optional) – Directory for Dask temporary files (default: system temp)

  • use_dask (bool, optional) – Enable Dask parallel processing (default: True)

  • use_processes (bool, optional) – Use processes instead of threads (default: auto-detect, False on Windows)

Examples

>>> hmd = HMD(n_workers=4)
>>> hmd.load_nc_files('path/to/deployment_01')
>>> result = hmd.extract_band_levels([[50, 300]], ['ship'])
>>> hmd.summary()

Initialize HMD processor with optional Dask cluster

__init__(n_workers=4, memory_per_worker='4GB', temp_directory=None, use_dask=True, use_processes=None)[source]

Initialize HMD processor with optional Dask cluster

load_nc_files(path, freq_range=None, time_range=None, recursive=False, chunks=None, prefilter_by_date=False, filename_pattern=None)[source]

Load NetCDF files from a directory.

Parameters:
  • path (str) – Path to directory containing .nc files

  • freq_range (tuple, optional) – Frequency range in Hz (min_freq, max_freq)

  • time_range (tuple, optional) – Time range as strings (‘2021-06-01’, ‘2021-06-30’). Start is inclusive, end is exclusive: [start, end)

  • recursive (bool, optional) – If True, search subdirectories recursively (default: False)

  • chunks (dict, optional) – Custom chunk sizes (default: {‘time’: 1440, ‘frequency’: 500})

  • prefilter_by_date (bool, optional) – Filter files by date before loading (default: False)

  • filename_pattern (str, optional) – Regex pattern to filter files by name. Only files matching this pattern will be loaded. Examples: - r’_d{8}.nc$’ : Files ending with _YYYYMMDD.nc (e.g., data_20201121.nc) - r'^deployment_.*\.nc$' : Files starting with 'deployment_' - r’.*_HMD_.*.nc$’ : Files containing ‘_HMD_’ Default: None (load all .nc files)

Return type:

self

Examples

>>> # Load all .nc files
>>> hmd.load_nc_files('data/deployment_01')
>>>
>>> # Load only files ending with date pattern _YYYYMMDD.nc
>>> hmd.load_nc_files('data/',
...                   recursive=True,
...                   filename_pattern=r'_\d{8}\.nc$')
>>>
>>> # Load only files with specific prefix
>>> hmd.load_nc_files('data/',
...                   filename_pattern=r'^HMD_.*\.nc$')
extract_band_levels(freq_bands, band_names=None, persist=True)[source]

Extract time series at specific frequencies or frequency bands.

Parameters:
  • freq_bands (list of lists/tuples) – List of frequency specifications. Each element can be: - [freq]: Single frequency point (e.g., [100] for 100 Hz) - [freq_min, freq_max]: Frequency band (e.g., [50, 300] for 50-300 Hz)

  • band_names (list of str, optional) – Names for each band. If None, auto-generated as ‘band_0’, ‘band_1’, etc.

  • persist (bool) – If True, keep result in distributed memory

Returns:

Dataset with time series for each band/frequency

Return type:

xarray.Dataset

Examples

>>> # Single frequencies
>>> bands = [[100], [500], [1000]]
>>> result = hmd.extract_band_levels(bands)
>>> # Frequency bands
>>> bands = [[50, 300], [500, 2000], [2000, 10000]]
>>> names = ['ship', 'fish', 'mammal']
>>> result = hmd.extract_band_levels(bands, band_names=names)
>>> # Mixed: single frequencies and bands
>>> bands = [[100], [50, 300], [1000], [2000, 10000]]
>>> names = ['100Hz', 'ship', '1000Hz', 'mammal']
>>> result = hmd.extract_band_levels(bands, band_names=names)
>>> # Access results
>>> ship_noise = result['ship']
>>> ts_100hz = result['100Hz']
compute_timeseries_stats(data, percentiles=[10, 50, 90], resolution='1h')[source]

Compute statistics (mean and percentiles) on acoustic data in dB. Optimized for parallel computation with Dask. Robust to NaN values.

compute_frequency_stats(data=None, percentiles=[1, 10, 25, 50, 75, 90, 99], include_mean=False, persist=True)[source]

Compute statistics (percentiles and optionally mean) across the time dimension for each frequency. Optimized for parallel computation with Dask. Robust to NaN values.

This method computes statistics across time for spectral data (PSD), producing a frequency-domain summary showing how sound levels are distributed at each frequency.

Parameters:
  • data (xarray.DataArray or xarray.Dataset, optional) – Input spectral data in dB units with ‘frequency’ and ‘time’ dimensions. If None, uses self.ds.psd (default: None)

  • percentiles (list of int/float, optional) – Percentiles to compute (default: [1, 10, 25, 50, 75, 90, 99]). Each percentile will be saved as ‘p{value}’ (e.g., p50 for median)

  • include_mean (bool, optional) – If True, also compute the proper dB mean (convert to linear, average, convert back). Default is False since mean is less commonly used for PSD analysis.

  • persist (bool, optional) – If True and using Dask, immediately persist results to distributed memory. Default is True for better performance.

Returns:

Statistics with ‘frequency’ dimension. Variables include: - ‘p{N}’: Nth percentile for each frequency (e.g., p50 is median) - ‘mean’: Proper dB mean (only if include_mean=True) - ‘count’: Number of valid observations at each frequency

Return type:

xarray.Dataset

Examples

>>> # Compute default percentiles for PSD
>>> hmd = HMD(n_workers=4)
>>> hmd.load_nc_files('deployment_01/')
>>> freq_stats = hmd.compute_frequency_stats()
>>>
>>> # Custom percentiles without persisting
>>> freq_stats = hmd.compute_frequency_stats(
...     percentiles=[5, 50, 95],
...     include_mean=True,
...     persist=False
... )
>>>
>>> # Use with custom data
>>> freq_stats = hmd.compute_frequency_stats(data=hmd.ds.psd)

Notes

  • For dB data, mean is computed properly: convert to linear → average → convert back

  • All percentiles are computed together in one operation for efficiency

  • Results are persisted to Dask distributed memory for faster subsequent access

  • Use this method in plot_psd for consistent statistics computation

plot_psd(style='quantile', percentiles=[1, 10, 25, 50, 75, 90, 99], freq_range=None, db_range=None, scale='log', cmap='bimary', colors=None, linewidth=1.5, alpha=0.7, legend_loc='best', title=None, figsize=(10, 6), dpi=100, save_path=None, show=True, return_data=False)[source]

Plot Power Spectral Density (PSD) with statistical summaries across frequency.

Creates visualizations showing the distribution of sound levels across frequency using either quantile lines (with confidence bands) or density heatmaps.

Parameters:
  • style (str, optional) – Visualization style (default: ‘quantile’): - ‘quantile’: Line plot with percentile bands - ‘density’: 2D histogram heatmap showing distribution - ‘both’: Overlay density heatmap with quantile lines on top

  • percentiles (list of float, optional) – Percentiles to display (default: [1, 10, 25, 50, 75, 90, 99]). Each percentile is plotted as a separate line labeled as L1, L10, etc.

  • freq_range (tuple, optional) – Frequency range to display (min_freq, max_freq) in Hz. If None, uses full frequency range from data.

  • db_range (tuple, optional) – dB scale limits (min_db, max_db) for y-axis. If None, auto-scales to data range.

  • scale (str, optional) – Frequency axis scale: ‘log’ (default) or ‘linear’

  • cmap (str, optional) – Matplotlib colormap for density plots (default: ‘viridis’). Good options: ‘viridis’, ‘plasma’, ‘inferno’, ‘Blues’, ‘YlOrRd’

  • colors (str, list, or colormap, optional) – Colors for percentile lines. Can be: - None: Auto-generated colors - String: Colormap name (e.g., ‘viridis’, ‘plasma’) - List: Explicit list of colors

  • linewidth (float, optional) – Line width for percentile curves (default: 1.5)

  • alpha (float, optional) – Line transparency (default: 0.7)

  • legend_loc (str, optional) – Legend location (default: ‘best’). Same options as plot_multiyear_overlay: - ‘best’, ‘upper right’, ‘lower left’, etc. - ‘outside right’, ‘outside right top’, ‘outside right bottom’ - ‘outside left’, ‘outside left top’, ‘outside left bottom’

  • title (str, optional) – Plot title (auto-generated if None)

  • figsize (tuple, optional) – Figure size in inches (default: (10, 6))

  • dpi (int, optional) – Resolution in dots per inch (default: 100)

  • save_path (str, optional) – Path to save figure (PNG, PDF, etc.)

  • show (bool, optional) – Whether to display the figure (default: True)

  • return_data (bool, optional) – If True, return computed statistics instead of plotting.

Returns:

Figure object if return_data=False, otherwise dict of computed statistics

Return type:

matplotlib.figure.Figure or dict

Examples

>>> # Load data and plot PSD quantiles with default percentiles
>>> hmd = HMD(n_workers=4)
>>> hmd.load_nc_files('deployment_01/', time_range=('2020-01-01', '2020-02-01'))
>>> hmd.plot_psd(style='quantile')  # Shows L1, L10, L25, L50, L75, L90, L99
>>>
>>> # Custom percentiles with legend outside and custom title
>>> hmd.plot_psd(style='quantile',
...              percentiles=[5, 50, 95],
...              title='Acoustic Spectral Characteristics',
...              legend_loc='outside right top')
>>>
>>> # Use colormap for percentile lines
>>> hmd.plot_psd(style='quantile',
...              colors='viridis',
...              linewidth=2,
...              alpha=0.8,
...              title='PSD Analysis')
>>>
>>> # Density plot with custom frequency range
>>> hmd.plot_psd(style='density', freq_range=(20, 2000), scale='log')
>>>
>>> # Overlay density and quantile plots
>>> hmd.plot_psd(style='both', figsize=(12, 6))
>>>
>>> # Get statistics for further analysis
>>> stats = hmd.plot_psd(return_data=True)

Notes

  • PSD plots show spectral characteristics of the acoustic environment

  • Quantile plots are useful for understanding typical and extreme levels

  • Density plots reveal the full distribution of sound levels at each frequency

  • Log scale is recommended for frequency axis to better visualize wide ranges

plot_ltsa(bin='1H', freq_range=None, db_range=(32, 108), scale='log', cmap='rainbow', statistic='median', plot_date_range=None, title=None, figsize=(14, 6), dpi=100, save_path=None, show=True, return_data=False)[source]

Plot Long-Term Spectral Average (LTSA) spectrogram.

Creates a spectrogram visualization with time on x-axis, frequency on y-axis, and color representing sound intensity. Data is binned in time and the specified statistic (median, mean, etc.) is computed for each time-frequency bin.

Parameters:
  • bin (str, optional) – Time interval for binning (default: ‘1H’). Examples: - ‘1H’: 1 hour bins - ‘6H’: 6 hour bins - ‘1D’: 1 day bins - ‘1W’: 1 week bins

  • freq_range (tuple, optional) – Frequency range to display (min_freq, max_freq) in Hz. If None, uses full frequency range from data.

  • db_range (tuple, optional) – Fixed dB scale limits (min_db, max_db) for color mapping. If None, auto-scales to data range.

  • scale (str, optional) – Frequency axis scale: ‘log’ (default) or ‘linear’

  • cmap (str, optional) – Matplotlib colormap name (default: ‘rainbow’). Good options: ‘rainbow’, ‘jet’, ‘viridis’, ‘plasma’, ‘inferno’

  • statistic (str, optional) – Statistic to compute for each bin (default: ‘median’). Options: ‘median’, ‘mean’, ‘min’, ‘max’, ‘std’

  • plot_date_range (tuple, list, or str, optional) – Date range to display on x-axis (default: None, uses full data range). Can be: - Tuple/list of two dates: (start_date, end_date) as strings or datetime - ‘fullyear’: Automatically set to Jan 1 - Dec 31 of the year being plotted - None: Use full range of available data

  • title (str, optional) – Plot title (auto-generated if None)

  • figsize (tuple, optional) – Figure size in inches (default: (14, 6))

  • dpi (int, optional) – Resolution in dots per inch (default: 100)

  • save_path (str, optional) – Path to save figure (PNG, PDF, etc.)

  • show (bool, optional) – Whether to display the figure (default: True)

  • return_data (bool, optional) – If True, return the binned data array instead of plotting. Useful for further analysis or custom plotting.

Returns:

Figure object if return_data=False, otherwise the binned data

Return type:

matplotlib.figure.Figure or xarray.DataArray

Examples

>>> # Load data and plot LTSA with 6-hour bins
>>> hmd = HMD(n_workers=4)
>>> hmd.load_nc_files('deployment_01/', time_range=('2020-01-01', '2020-02-01'))
>>> hmd.plot_ltsa(bin='6H', freq_range=(50, 1000), scale='log')
>>>
>>> # Daily bins with custom color range
>>> hmd.plot_ltsa(bin='1D', db_range=(60, 120), cmap='plasma')
>>>
>>> # Display full year (Jan 1 - Dec 31)
>>> hmd.plot_ltsa(bin='1D', plot_date_range='fullyear')
>>>
>>> # Display specific date range
>>> hmd.plot_ltsa(bin='1H', plot_date_range=('2020-06-01', '2020-06-30'))
>>>
>>> # Get binned data for further analysis
>>> ltsa_data = hmd.plot_ltsa(bin='1H', return_data=True)

Notes

  • LTSA is useful for visualizing long-term acoustic patterns

  • Binning reduces data volume while preserving key features

  • Log scale is recommended for frequency axis in most cases

  • Median is robust to outliers compared to mean

plot_multiyear_overlay(data, band_names=None, time_axis='dayofyear', smoothing_window=None, title=None, xlabel=None, ylabel='Sound Level (dB)', figsize=None, colors=None, alpha=0.7, linewidth=1.5, grid=True, legend_loc='best', save_path=None, show=True, show_median=False, median_color='black', median_linewidth=2.5, median_linestyle='-', median_alpha=1.0, show_percentile_range=False, percentiles=[10, 90], range_color='lightgray', range_alpha=0.3, **kwargs)[source]

Plot multi-year overlay of time series data with aligned time axes.

Creates plots where each year is shown as a separate line, aligned by day of year, week of year, etc. Useful for comparing seasonal patterns across multiple years.

Parameters:
  • data (xarray.DataArray or xarray.Dataset) – Time series data spanning multiple years. - If DataArray: plots single variable across years - If Dataset: plots selected or all data variables as subplots

  • band_names (str or list of str, optional) – For Dataset input, specify which variables to plot. If None, plots all data variables.

  • time_axis (str, optional) – How to align time across years (default: ‘dayofyear’): - ‘dayofyear’: Day of year (1-366), best for daily+ resolution - ‘weekofyear’: Week of year (1-53), good for weekly+ data - ‘month’: Month (1-12), for monthly data - ‘dayofweek’: Day of week (0-6), for analyzing weekly patterns

  • smoothing_window (int, optional) – Apply rolling mean smoothing with this window size. Units match time_axis (e.g., days for dayofyear). Default: None (no smoothing)

  • title (str, optional) – Plot title (auto-generated if None)

  • xlabel (str, optional) – X-axis label. If None, auto-generated based on time_axis (e.g., ‘Day of Year’, ‘Week of Year’, etc.)

  • ylabel (str, optional) – Y-axis label (default: ‘Sound Level (dB)’)

  • figsize (tuple, optional) – Figure size. Default: (14, 6) for single plot, scaled for multiple

  • colors (str, list, or colormap, optional) – Colors for each year. Can be: - None: Auto-generated using ‘tab10’ or ‘tab20’ colormap - String: Colormap name (e.g., ‘viridis’, ‘plasma’, ‘coolwarm’) - Colormap object: matplotlib.cm colormap - List: Explicit list of colors for each year

  • alpha (float, optional) – Line transparency (default: 0.7)

  • linewidth (float, optional) – Line width (default: 1.5)

  • grid (bool, optional) – Show grid (default: True)

  • legend_loc (str, optional) – Legend location (default: ‘best’). Can be: - ‘best’, ‘upper right’, ‘upper left’, ‘lower left’, ‘lower right’, etc. - ‘outside right’: Places legend outside plot on the right (centered) - ‘outside right top’: Places legend outside plot on the right (top-aligned) - ‘outside right bottom’: Places legend outside plot on the right (bottom-aligned) - ‘outside left’: Places legend outside plot on the left (centered) - ‘outside left top’: Places legend outside plot on the left (top-aligned) - ‘outside left bottom’: Places legend outside plot on the left (bottom-aligned)

  • save_path (str, optional) – Path to save figure

  • show (bool, optional) – Whether to display the figure (default: True)

  • show_median (bool, optional) – If True, display median line across all years (default: False)

  • median_color (str, optional) – Color for median line (default: ‘black’)

  • median_linewidth (float, optional) – Line width for median line (default: 2.5)

  • median_linestyle (str, optional) – Line style for median (‘-’, ‘–’, ‘-.’, ‘:’, default: ‘-‘)

  • median_alpha (float, optional) – Transparency for median line (default: 1.0)

  • show_percentile_range (bool, optional) – If True, display shaded range between percentiles (default: False)

  • percentiles (list of two numbers, optional) – Lower and upper percentiles for range (default: [10, 90])

  • range_color (str, optional) – Color for percentile range shading (default: ‘lightgray’)

  • range_alpha (float, optional) – Transparency for percentile range (default: 0.3)

  • **kwargs (dict) – Additional matplotlib plot arguments

Return type:

matplotlib.figure.Figure

Examples

>>> # Load multi-year data
>>> hmd.load_nc_files('deployment/', time_range=('2018-01-01', '2021-01-01'))
>>> band_levels = hmd.extract_band_levels([[50, 300]], ['ship'])
>>> stats = hmd.compute_timeseries_stats(band_levels, resolution='1D')
>>>
>>> # Plot mean levels across years, aligned by day of year
>>> hmd.plot_multiyear_overlay(stats['ship_mean'],
...                            time_axis='dayofyear',
...                            smoothing_window=7)  # 7-day smoothing
>>>
>>> # Compare multiple bands across years
>>> hmd.plot_multiyear_overlay(stats,
...                            band_names=['ship_mean', 'fish_mean'],
...                            time_axis='weekofyear')
>>>
>>> # Use a colormap for years
>>> hmd.plot_multiyear_overlay(stats['ship_mean'],
...                            colors='viridis',  # Colormap name
...                            time_axis='dayofyear')
>>>
>>> # Use a diverging colormap
>>> import matplotlib.pyplot as plt
>>> hmd.plot_multiyear_overlay(stats['ship_mean'],
...                            colors='coolwarm',
...                            time_axis='dayofyear')
>>>
>>> # Use explicit colors
>>> hmd.plot_multiyear_overlay(stats['ship_mean'],
...                            colors=['red', 'blue', 'green'],
...                            time_axis='dayofyear')
>>>
>>> # Place legend outside plot on the right
>>> hmd.plot_multiyear_overlay(stats['ship_mean'],
...                            legend_loc='outside right',
...                            time_axis='dayofyear')
>>>
>>> # Place legend outside on the right, top-aligned
>>> hmd.plot_multiyear_overlay(stats['ship_mean'],
...                            legend_loc='outside right top',
...                            time_axis='dayofyear')
>>>
>>> # Custom axis labels
>>> hmd.plot_multiyear_overlay(stats['ship_mean'],
...                            xlabel='Day of Year (Jan 1 = 1)',
...                            ylabel='SPL (dB re 1 µPa)',
...                            time_axis='dayofyear')
plot_timeseries(data, band_names=None, overlay=True, title=None, ylabel='Sound Level (dB)', xlabel='Time', figsize=None, colors=None, alpha=0.7, linewidth=1.5, grid=True, legend_loc='best', sharex=True, sharey=False, save_path=None, **kwargs)[source]

Plot time series of pre-extracted acoustic data.

Parameters:
  • data (xarray.DataArray or xarray.Dataset) – Pre-computed time series data. - If DataArray: plots single time series - If Dataset: plots selected or all data variables

  • band_names (str or list of str, optional) – For Dataset input, specify which variables to plot. If None, plots all data variables. For DataArray input, this is ignored.

  • overlay (bool, optional) – If True (default), plot all series on same axes. If False, create separate subplot for each series.

  • title (str, optional) – Plot title (auto-generated if None)

  • figsize (tuple, optional) – Figure size. Defaults based on overlay and number of series: - Single: (12, 4) - Multiple overlaid: (14, 6) - Multiple subplots: (12, 3*n_series)

  • colors (str or list, optional) – Colors for plotting. Single color or list of colors.

  • alpha (float, optional) – Line transparency (default: 0.7)

  • linewidth (float, optional) – Line width (default: 1.5)

  • grid (bool, optional) – Show grid (default: True)

  • legend_loc (str, optional) – Legend location (default: ‘best’), only used when overlay=True

  • sharex (bool, optional) – Share x-axis across subplots when overlay=False (default: True)

  • sharey (bool, optional) – Share y-axis across subplots when overlay=False (default: False)

  • save_path (str, optional) – Path to save figure

  • **kwargs (dict) – Additional matplotlib plot arguments

Return type:

matplotlib.figure.Figure

Examples

>>> # Single time series
>>> hmd.plot_timeseries(result['ship'])
>>> # Multiple bands overlaid (default)
>>> hmd.plot_timeseries(result, band_names=['ship', 'fish'])
>>> # Multiple bands as separate subplots
>>> hmd.plot_timeseries(result, band_names=['ship', 'fish'], overlay=False)
>>> # All bands as subplots with shared y-axis
>>> hmd.plot_timeseries(result, overlay=False, sharey=True)
timeseries_dashboard(data, title='Time Series Dashboard', port=5006, show=True, save_html=None)[source]

Create advanced interactive dashboard with variable selection and multiple views.

Parameters:
  • data (xarray.Dataset or list of xarray.Dataset) – Time series data (must be computed). Can be a single Dataset or a list of Datasets to overlay on the same plot.

  • title (str, optional) – Dashboard title

  • port (int, optional) – Port for web server

  • show (bool, optional) – Auto-open browser

  • save_html (str, optional) – Save to HTML file

Returns:

Panel template dashboard

Return type:

panel.template.Template

Examples

>>> # Single dataset
>>> stats = hmd.compute_timeseries_stats(band_levels, resolution='1h')
>>> hmd.plot_interactive_advanced(stats.compute())
>>> # Multiple datasets overlaid
>>> hmd.plot_interactive_advanced([band_levels.compute(), stats.compute()])
summary()[source]

Print summary of loaded dataset

rechunk(chunks=None)[source]

Rechunk the dataset for optimal performance.

Rechunking can improve performance when chunks don’t align well with the stored data or when you’re doing operations along specific dimensions.

Parameters:

chunks (dict, optional) – Chunk sizes for each dimension. If None, uses intelligent defaults based on the operation type: - For time-series extraction: {‘time’: -1, ‘frequency’: ‘auto’} - For spectral analysis: {‘time’: ‘auto’, ‘frequency’: -1} Default: {‘time’: 1440, ‘frequency’: -1} (good for band extraction)

Return type:

self

Examples

>>> # Rechunk for time-series extraction (extract_band_levels)
>>> hmd.rechunk({'time': -1, 'frequency': 'auto'})
>>>
>>> # Rechunk for spectral statistics
>>> hmd.rechunk({'time': 'auto', 'frequency': -1})
>>>
>>> # Custom chunking
>>> hmd.rechunk({'time': 10000, 'frequency': 500})

Notes

  • Use -1 to load entire dimension into single chunk

  • Use ‘auto’ to let Dask determine optimal size

  • Rechunking can be slow but improves downstream performance

  • Call this after load_nc_files() and before analysis operations

subset(freq_range=None, time_range=None, persist=True)[source]

Create a subset of the data

analyze_spatial_correlation(timeseries, spatial_grid, method='pearson', min_periods=None, fill_grid_nan=True, absolute=False, compute=True, use_lag=False, max_lag=None)[source]

Compute spatial correlation between a time series and gridded spatial data.

This method correlates a single time series (e.g., acoustic levels at a point) with time series at each spatial grid cell (e.g., vessel counts across a region). Uses Dask for parallel processing across grid cells.

Parameters:
  • timeseries (xarray.DataArray) – 1D time series with ‘time’ dimension (e.g., SPL at recorder location)

  • spatial_grid (xarray.DataArray) – 3D gridded data with dimensions (time, latitude, longitude) (e.g., vessel counts on a spatial grid)

  • method (str, optional) – Correlation method: ‘pearson’ (default), ‘spearman’, or ‘kendall’

  • min_periods (int, optional) – Minimum number of overlapping time points required. Default is None (uses all available overlap)

  • fill_grid_nan (bool, optional) – If True (default), fill NaN values in spatial_grid with zeros. This is useful for vessel count grids where NaN typically means zero vessels. A warning will be displayed if NaNs are found and filled.

  • absolute (bool, optional) – If True, return absolute values of correlation coefficients [0, 1]. If False (default), return full correlation values [-1, 1]. Use absolute=True when you care about correlation strength regardless of direction.

  • compute (bool, optional) – If True (default), compute result immediately. If False, return lazy Dask array.

  • use_lag (bool, optional) – If True, use cross-correlation with lag to find maximum correlation. If False (default), use standard correlation at lag=0. This is useful when the spatial signal (e.g., vessel activity) may precede or lag the acoustic signal.

  • max_lag (int, optional) – Maximum time lag (in time steps) to consider when use_lag=True. The correlation will be computed for all lags from -max_lag to +max_lag. If None and use_lag=True, defaults to min(50, n_times // 4). Positive lags mean the grid leads the timeseries.

Returns:

If use_lag=False: 2D correlation map with dimensions (latitude, longitude) containing correlation coefficients in [-1, 1] (or [0, 1] if absolute=True).

If use_lag=True: Dataset with two DataArrays: 'correlation' (maximum coefficient at each grid cell) and 'lag' (time lag in steps where maximum occurs; positive means the grid leads the timeseries).

Return type:

xarray.DataArray or xarray.Dataset

Examples

>>> # Correlate acoustic levels with vessel grid
>>> with AISQueryHelper(db_file) as ais:
...     vessel_grid = ais.create_gridded_vessel_counts(...)
>>>
>>> with HMD(n_workers=8) as hmd:
...     hmd.load_nc_files(deployment_dir, time_range=time_range)
...     band_levels = hmd.extract_band_levels([[50, 300]], ['ship'])
...     stats = hmd.compute_timeseries_stats(band_levels, resolution='1H')
...
...     # Correlate mean ship noise with vessel counts (fills NaNs with 0)
...     corr_map = hmd.analyze_spatial_correlation(
...         stats['ship_mean'],
...         vessel_grid,
...         method='pearson',
...         fill_grid_nan=True  # Default, fills NaNs in grid with 0
...     )
...
...     # Get absolute correlation (strength regardless of direction)
...     corr_map_abs = hmd.analyze_spatial_correlation(
...         stats['ship_mean'],
...         vessel_grid,
...         absolute=True  # Returns values in [0, 1]
...     )
...
...     # Or keep NaNs in the grid (may result in NaN correlations)
...     corr_map = hmd.analyze_spatial_correlation(
...         stats['ship_mean'],
...         vessel_grid,
...         fill_grid_nan=False
...     )
...
...     # Plot correlation map
...     corr_map.plot()
...
...     # Use cross-correlation with lag to account for temporal offset
...     result = hmd.analyze_spatial_correlation(
...         stats['ship_mean'],
...         vessel_grid,
...         use_lag=True,
...         max_lag=24  # Search up to 24 hours of lag
...     )
...
...     # Access the correlation and lag maps
...     corr_map_lag = result['correlation']  # Maximum correlation at each point
...     lag_map = result['lag']  # Lag in time steps where max occurs
...
...     # Plot results
...     corr_map_lag.plot()
...     lag_map.plot()  # Positive = grid leads, negative = timeseries leads

Notes

  • Time coordinates must overlap between timeseries and spatial_grid

  • NaN values are handled automatically (excluded from correlation)

  • Computation is parallelized across grid cells using Dask

  • Grid cells with insufficient valid data return NaN

  • When use_lag=True, only Pearson correlation is supported (method must be ‘pearson’)

save_to_csv(data, output_path, wide_format=True)[source]

Save xarray Dataset or DataArray to CSV file(s).

Parameters:
  • data (xarray.Dataset or xarray.DataArray) – Data to save

  • output_path (str) – Output file path. For Datasets with multiple variables: - If wide_format=True: saves to single file at this path - If wide_format=False: saves multiple files with variable names appended

  • wide_format (bool, optional) – If True (default), save all variables in one wide CSV. If False, save each variable to a separate CSV file.

Examples

>>> # Save statistics to CSV
>>> stats = hmd.compute_timeseries_stats(band_levels, resolution='1h')
>>> hmd.save_to_csv(stats, 'statistics.csv')
>>> # Save each variable separately
>>> hmd.save_to_csv(stats, 'statistics.csv', wide_format=False)
close()[source]

Close the Dask client and clean up resources