Pipeline Stages¶
TL;DR: This pipeline takes NEON hyperspectral HDF5, produces corrected ENVI cubes, optionally convolves them to other sensors, then exports tidy Parquet for analysis—plus a visual QA panel.
flowchart TD
A[H5 download\n*_directional_reflectance.h5] --> B[ENVI export\n*_envi.img/.hdr]
B --> C[Topo + BRDF correction\n*_brdfandtopo_corrected_envi.* + JSON]
C --> D[Cross-sensor convolution\nTM/ETM+/OLI/OLI-2/MicaSense]
D --> E[Parquet export\nall ENVI products]
E --> F[DuckDB merge\n<prefix>_merged_pixel_extraction.parquet]
F --> G[QA panel\n<prefix>_qa.png]
1) Download NEON HDF5¶
Purpose Pull *_directional_reflectance.h5 tiles.
Inputs → Outputs H5 → same (no transform).
Run it
cross-sensor-cal download --site NIWO --year 2023 --tiles L019-1 L020-1
2) Export to ENVI¶
Purpose Convert H5 → ENVI .img/.hdr.
Inputs → Outputs H5 → *_directional_reflectance_envi.img/.hdr.
Run it
cross-sensor-cal envi-export --in data/*.h5 --out envi/
_envi suffix used downstream.
3) Topographic + BRDF correction¶
Purpose Apply topo/BRDF; write sidecar JSON with parameters.
Optional: run apply_brightness_correction on cubes in Python before this stage
to harmonise illumination; the QA JSON will expose the per-band gain/offsets.
Inputs → Outputs *_envi.img/.hdr → *_brdfandtopo_corrected_envi.img/.hdr + *_brdfandtopo_corrected_envi.json.
Run it
cross-sensor-cal correct --in envi/*_envi.img --dtm dtm/NIWO.tif --out corrected/
4) Cross-sensor convolution¶
Purpose Resample to target sensor bandpasses.
Inputs → Outputs corrected ENVI → *_resampled_<sensor>_envi.img/.hdr inside
Convolution_Reflectance_Resample_<sensor> directories.
Each convolution now also writes a brightness-free companion cube with the suffix
_undarkened_envi.img/.hdr so you can inspect the raw spectral convolution before
any Landsat↔MicaSense brightness offset is applied.
Run it
cross-sensor-cal convolve --in corrected/*_brdfandtopo_corrected_envi.img --sensor OLI --out convolved/
5) Parquet export¶
Purpose Flatten ENVI to tidy Parquet and self-heal corrupted sidecars.
Behavior For each ENVI product the stage checks whether <product>.parquet exists,
validates existing files with pyarrow.parquet.read_schema, reuses the valid ones, and
deletes + rebuilds any that fail validation. Manual cleanup is rarely needed.
Inputs → Outputs ENVI products → *.parquet.
Run it
cross-sensor-cal export-parquet --in corrected/*.img Convolution_Reflectance_Resample_*/*.img --out parquet/
--chunksize to keep memory low; reruns are restart-safe.
6) DuckDB merge¶
Purpose Merge parquet tables into one analysis-ready file while skipping corrupt inputs.
Behavior Each candidate Parquet is validated once. Invalid files trigger warnings such
as [merge] ⚠️ Skipping invalid parquet <file>: <error> and are ignored. The merge
continues as long as at least one valid file remains; it errors only when none are usable
for the prefix. The DuckDB join is now streamed directly to Parquet with configurable
memory/row-group limits so large flightlines no longer require hundreds of gigabytes of
RAM.
Undarkened convolution outputs are merged alongside the darkened stacks with distinct
column prefixes (for example, micasense_to_match_oli_oli2_undarkened_b###_wl####nm),
making it easy to compare convolution-only spectra to the brightness-adjusted values.
Inputs → Outputs parquet/*.parquet → <prefix>_merged_pixel_extraction.parquet.
Run it
cross-sensor-cal merge-duckdb --in parquet/*.parquet --out merged/<prefix>_merged_pixel_extraction.parquet
--merge-memory-limit, --merge-threads, or --merge-row-group-size when invoking
the CLI if you need to tune streaming behaviour for especially large scenes.
7) QA panel¶
Purpose Visual checks and metrics; saves <prefix>_qa.png and <prefix>_qa.json.
Run it
cscal-qa --base-folder processed/flightlines --quick
--full if you need denser stats.
File-naming at a glance¶
- Raw H5 →
..._directional_reflectance.h5 - ENVI export →
..._directional_reflectance_envi.img/.hdr - Corrected →
..._brdfandtopo_corrected_envi.img/.hdr+ JSON - Convolved →
..._brdfandtopo_corrected_envi_<sensor>.img/.hdr - Merged parquet →
<prefix>_merged_pixel_extraction.parquet - QA →
<prefix>_qa.png