Working with Parquet Outputs
SpectralBridge writes Parquet sidecars for ENVI products and a merged Parquet table for each completed flight line. These files are the main analysis entry point for DuckDB, pandas, or other columnar tools.
Why Parquet?
- Supports efficient columnar reads.
- Compresses well for large flight lines.
- Works cleanly with DuckDB SQL.
- Avoids loading full scenes into memory for simple summaries.
File structure
Typical Parquet files include:
*_envi.parquet
*_brdfandtopo_corrected_envi.parquet
*_landsat_oli_envi.parquet
*_merged_pixel_extraction.parquet
Per-product files use one row per pixel-band observation and include columns such as:
- flightline_id
- row, col, x, y
- band
- wavelength_nm
- fwhm_nm
- reflectance
The merged Parquet combines raw, corrected, and sensor-resampled sidecars into
the per-flightline table named <flight_id>_merged_pixel_extraction.parquet.
Quick preview using DuckDB
import duckdb
duckdb.query("""
SELECT *
FROM '..._brdfandtopo_corrected_envi.parquet'
LIMIT 5
""").df()
Check dimensions without loading the full table:
duckdb.query("""
SELECT COUNT(*) AS nrows
FROM '..._landsat_oli_envi.parquet'
""").df()
Summarize a sensor product lazily:
duckdb.query("""
SELECT wavelength_nm, AVG(reflectance) AS mean_reflectance
FROM '..._landsat_oli_envi.parquet'
GROUP BY wavelength_nm
ORDER BY wavelength_nm
""").df()
Loading with pandas
import pandas as pd
df = pd.read_parquet("..._merged_pixel_extraction.parquet")
df.head()
Use pandas cautiously for large flight lines. DuckDB is usually better for filtering or aggregation before collecting results into memory.
Loading with xarray
Parquet-to-xarray workflows work best after pivoting or aggregating data. For
full spatial cubes, the ENVI .img/.hdr products remain the easier raster
interface.
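As a rough illustration of the pivot step, a long table with one row per pixel-band observation can be indexed by its coordinates and converted with pandas' `Series.to_xarray()`, which requires xarray to be installed. The data below is made up, and the choice of `row`, `col`, and `wavelength_nm` as dimensions is an assumption for this sketch. In practice the input would be a filtered or aggregated subset, not a full flight line.

```python
import pandas as pd
import xarray as xr  # needed by Series.to_xarray()

# Made-up long-format data: a 2 x 2 pixel patch with 2 wavelengths.
long_df = pd.DataFrame({
    "row": [0, 0, 0, 0, 1, 1, 1, 1],
    "col": [0, 0, 1, 1, 0, 0, 1, 1],
    "wavelength_nm": [450.0, 650.0] * 4,
    "reflectance": [0.10, 0.30, 0.20, 0.50, 0.15, 0.35, 0.25, 0.55],
})

# Index by the three coordinates, then convert the reflectance column.
# Each index level becomes a dimension of the resulting DataArray.
cube = long_df.set_index(["row", "col", "wavelength_nm"])["reflectance"].to_xarray()

assert isinstance(cube, xr.DataArray)
print(cube.dims, cube.shape)

# Label-based selection of one pixel's spectrum.
spectrum = cube.sel(row=0, col=1)
print(spectrum.values)
```

Any coordinate combination missing from the long table shows up as NaN in the cube, which is one reason to filter to a complete patch first.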