Package Architecture¶
This page describes how SpectralBridge is organized internally. Understanding this structure helps contributors extend the pipeline or integrate new sensors without breaking its guarantees.
Design philosophy and invariants¶
- Reproducibility first. Pipeline behavior is predictable and restart-safe; stages skip when valid outputs already exist instead of recomputing.
- Fixed ordering. `process_one_flightline` and `go_forth_and_multiply` orchestrate the same sequence: ENVI export → BRDF/topographic parameter build → BRDF+topo correction → sensor convolution/resampling → Parquet exports → DuckDB merge → QA panels.
- File contracts. `FlightlinePaths` centralizes filenames and directories; stages communicate only through these artifacts. Downstream docs and CI treat the merged Parquet and QA products as required outputs.
- Outputs are the API. Functions return little; correctness is expressed through on-disk ENVI/Parquet and QA files that can be inspected or reused.
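The skip-if-valid invariant can be sketched as a small wrapper. `run_stage` and `build` are hypothetical names for illustration, not the actual SpectralBridge API:

```python
from pathlib import Path
from typing import Callable

def run_stage(output: Path, build: Callable[[Path], None]) -> Path:
    """Idempotent stage wrapper: skip the work when a valid output exists.

    A minimal sketch of the restart-safety pattern; the real stages
    perform richer output validation than a size check.
    """
    if output.exists() and output.stat().st_size > 0:
        return output  # restart-safe: reuse the prior artifact as-is
    output.parent.mkdir(parents=True, exist_ok=True)
    build(output)  # the stage writes its artifact to disk
    return output
```

Calling `run_stage` twice with the same output path runs `build` only once; a resumed batch therefore pays nothing for stages that already completed.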
Pipeline architecture (high level)¶
- Stages are pure file transforms. Each consumes a known input set and writes ENVI, Parquet, and JSON/PNG sidecars. Stages do not mutate shared state.
- Orchestration happens in `pipelines/pipeline.py` via `process_one_flightline` (single flightline) and `go_forth_and_multiply` (batch). Both rely on `FlightlinePaths` to resolve paths and naming before delegating work.
- Communication between stages is file-based. The BRDF+topo-corrected ENVI is always the source for sensor resampling; Parquet exports derive from both raw and corrected ENVI; the merged Parquet and QA summaries consume all earlier artifacts.
- Restart safety is achieved because each stage validates its outputs and returns early when they already exist. Partial runs can be resumed without recomputing completed work or corrupting prior files.
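A path contract like `FlightlinePaths` can be sketched as a frozen dataclass; the `_qa.json` naming matches the documented QA contract, while the other filenames here are illustrative assumptions:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class FlightlinePathsSketch:
    """Simplified sketch of a centralized path contract.

    Stages never invent filenames; they ask this object, so every
    producer and consumer agrees on the same on-disk artifacts.
    """
    root: Path
    flight_id: str

    @property
    def corrected_envi(self) -> Path:
        # BRDF+topo-corrected ENVI: the sole input to sensor resampling
        return self.root / f"{self.flight_id}_brdf_topo.img"

    @property
    def merged_parquet(self) -> Path:
        # required output consumed by downstream docs and CI
        return self.root / f"{self.flight_id}_merged.parquet"

    @property
    def qa_json(self) -> Path:
        # QA summary filename assumed by docs and CI
        return self.root / f"{self.flight_id}_qa.json"
```

Because the object is frozen and derived purely from `root` and `flight_id`, two processes resolving the same flightline always agree on every artifact path.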
Directory structure (Python package)¶
- `spectralbridge/pipelines/`: orchestration entry points and Ray helpers
- `spectralbridge/exports/`: ENVI export helpers
- `spectralbridge/io/`: schema and I/O helpers (e.g., NEON schema resolution)
- `spectralbridge/utils/`: shared utilities, naming/path helpers, memory management
- `spectralbridge/data/`: spectral metadata and calibration tables
    - `landsat_band_parameters.json`: band centers/FWHM used for resampling
    - `brightness/*.json`: brightness adjustments between Landsat and MicaSense
    - `hyperspectral_bands.json`: reference metadata for hyperspectral inputs
- `spectralbridge/qa_plots.py` and `spectralbridge/sensor_panel_plots.py`: QA visualization utilities
- `spectralbridge/standard_resample.py`: spectral resampling and coefficients
Extending the system safely¶
Adding or modifying a target sensor¶
- Update spectral definitions in `spectralbridge/data/landsat_band_parameters.json` (or the analogous table for the new sensor) and ensure the resampling logic in `standard_resample.py` knows how to consume them.
- Confirm `get_flightline_products` and `FlightlinePaths` generate filenames for the new sensor; outputs must still include merged Parquet and QA artifacts.
- Add tests that validate band definitions and resampled outputs; do not bypass the existing stage ordering.
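Band-parameter-driven resampling can be sketched as a Gaussian spectral response built from each band's center/FWHM. The `{"center", "fwhm"}` schema and function names below are assumptions for illustration, not the actual `standard_resample.py` implementation:

```python
import numpy as np

def gaussian_srf(wavelengths: np.ndarray, center: float, fwhm: float) -> np.ndarray:
    """Normalized Gaussian spectral response function for one band."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # FWHM -> std dev
    weights = np.exp(-0.5 * ((wavelengths - center) / sigma) ** 2)
    return weights / weights.sum()  # weights sum to 1

def resample_spectrum(wavelengths: np.ndarray,
                      reflectance: np.ndarray,
                      bands: list[dict]) -> np.ndarray:
    """Convolve a hyperspectral spectrum down to broadband values.

    `bands` mimics entries a band-parameters JSON might hold,
    e.g. [{"center": 560.0, "fwhm": 60.0}, ...] (schema assumed).
    """
    return np.array([
        np.dot(gaussian_srf(wavelengths, b["center"], b["fwhm"]), reflectance)
        for b in bands
    ])
```

A useful sanity test for new band definitions: resampling a spectrally flat input must return that same flat value in every band, since each response function integrates to one.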
Updating brightness or calibration coefficients¶
- Brightness and regression tables live under `spectralbridge/data/brightness/` and are loaded via `brightness_config`. Changes here affect downstream cross-sensor harmonization.
- Keep the JSON schema and key names stable; update any dependent tests and documentation describing the coefficients.
- Validate against Landsat-referenced QA outputs to confirm calibrations remain within expected bounds.
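Schema stability can be guarded with a small validator run in tests. The per-band `slope`/`intercept` key names below are an assumed schema for illustration, not the actual brightness-table format:

```python
import json
from pathlib import Path

# Illustrative key set; the real brightness tables may differ.
REQUIRED_KEYS = {"slope", "intercept"}

def validate_brightness_table(path: Path) -> dict:
    """Load a brightness table and fail loudly if key names drift.

    Keeping this check in CI turns a silent harmonization change
    into an explicit test failure.
    """
    table = json.loads(path.read_text())
    for band, coeffs in table.items():
        missing = REQUIRED_KEYS - coeffs.keys()
        if missing:
            raise ValueError(f"band {band!r} missing keys: {sorted(missing)}")
    return table
```

A renamed or dropped coefficient key then surfaces as a `ValueError` naming the offending band, rather than as a subtle shift in harmonized reflectance.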
Modifying QA outputs¶
- QA panels and JSON summaries are produced after merging outputs. Filenames such as `<flight_id>_qa.png` and `<flight_id>_qa.json` are assumed by docs and CI.
- If adding metrics or changing formats, ensure `_qa.png` and `_qa.json` remain available, and update the QA tests under `tests/test_qa` accordingly.
- Maintain the quick-mode rendering used in CI fixtures so drift checks continue to pass.
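A guard for the QA file contract can be sketched as a helper suitable for the `tests/test_qa` suite; the function name is hypothetical, but the `<flight_id>_qa.png` / `<flight_id>_qa.json` names follow the documented contract:

```python
from pathlib import Path

def qa_artifacts_present(out_dir: Path, flight_id: str) -> bool:
    """Check that the QA files assumed by docs and CI exist and are non-empty."""
    required = [
        out_dir / f"{flight_id}_qa.png",
        out_dir / f"{flight_id}_qa.json",
    ]
    return all(p.exists() and p.stat().st_size > 0 for p in required)
```

Asserting this after a quick-mode pipeline run catches a renamed or missing QA artifact before downstream drift checks do.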
Relationship to scientific reproducibility¶
- The repository encodes the workflow described in the RSE manuscript; artifacts (ENVI, Parquet, QA) are the evidence trail for analyses.
- Centralized naming, stage ordering, and idempotent execution make runs auditable and repeatable across environments.
- Contributors are expected to preserve these invariants so published and future analyses can be reproduced from the same on-disk products.