Package Architecture¶
This page describes how SpectralBridge is organized internally. Understanding this structure helps contributors extend the pipeline or integrate new sensors without breaking its guarantees.
Design philosophy and invariants¶
- Reproducibility first. Pipeline behavior is predictable and restart-safe; stages skip when valid outputs already exist instead of recomputing.
- Fixed ordering. `process_one_flightline` and `go_forth_and_multiply` orchestrate the same sequence: ENVI export → BRDF/topographic parameter build → BRDF+topo correction → sensor convolution/resampling → Parquet exports → DuckDB merge → QA panels.
- File contracts. `FlightlinePaths` centralizes filenames and directories; stages communicate only through these artifacts. Downstream docs and CI treat the merged Parquet and QA products as required outputs.
- Outputs are the API. Functions return little; correctness is expressed through on-disk ENVI/Parquet and QA files that can be inspected or reused.
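The skip-if-valid invariant can be sketched as a small wrapper. `run_stage` and `build` are hypothetical names for illustration, not the actual SpectralBridge API:

```python
from pathlib import Path
from typing import Callable

def run_stage(output: Path, build: Callable[[Path], None]) -> Path:
    """Idempotent stage wrapper: skip the work when a valid output exists.

    A minimal sketch of the restart-safety pattern; the real stages
    perform richer output validation than a size check.
    """
    if output.exists() and output.stat().st_size > 0:
        return output  # restart-safe: reuse the prior artifact as-is
    output.parent.mkdir(parents=True, exist_ok=True)
    build(output)  # the stage writes its artifact to disk
    return output
```

Calling `run_stage` twice with the same output path runs `build` only once; a resumed batch therefore pays nothing for stages that already completed.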
Pipeline architecture (high level)¶
- Stages are pure file transforms. Each consumes a known input set and writes ENVI, Parquet, and JSON/PNG sidecars. Stages do not mutate shared state.
- Orchestration happens in `pipelines/pipeline.py` via `process_one_flightline` (single flightline) and `go_forth_and_multiply` (batch). Both rely on `FlightlinePaths` to resolve paths and naming before delegating work.
- Communication between stages is file-based. The BRDF+topo-corrected ENVI is always the source for sensor resampling; Parquet exports derive from both raw and corrected ENVI; the merged Parquet and QA summaries consume all earlier artifacts.
- Restart safety is achieved because each stage validates its outputs and returns early when they already exist. Partial runs can be resumed without recomputing completed work or corrupting prior files.
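A path contract like `FlightlinePaths` can be sketched as a frozen dataclass; the `_qa.json` naming matches the documented QA contract, while the other filenames here are illustrative assumptions:

```python
from dataclasses import dataclass
from pathlib import Path

@dataclass(frozen=True)
class FlightlinePathsSketch:
    """Simplified sketch of a centralized path contract.

    Stages never invent filenames; they ask this object, so every
    producer and consumer agrees on the same on-disk artifacts.
    """
    root: Path
    flight_id: str

    @property
    def corrected_envi(self) -> Path:
        # BRDF+topo-corrected ENVI: the sole input to sensor resampling
        return self.root / f"{self.flight_id}_brdf_topo.img"

    @property
    def merged_parquet(self) -> Path:
        # required output consumed by downstream docs and CI
        return self.root / f"{self.flight_id}_merged.parquet"

    @property
    def qa_json(self) -> Path:
        # QA summary filename assumed by docs and CI
        return self.root / f"{self.flight_id}_qa.json"
```

Because the object is frozen and derived purely from `root` and `flight_id`, two processes resolving the same flightline always agree on every artifact path.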
Directory structure (Python package)¶
- `spectralbridge/pipelines/`: orchestration entry points and Ray helpers
- `spectralbridge/exports/`: ENVI export helpers
- `spectralbridge/io/`: schema and I/O helpers (e.g., NEON schema resolution)
- `spectralbridge/utils/`: shared utilities, naming/path helpers, memory management
- `spectralbridge/data/`: spectral metadata and calibration tables
    - `landsat_band_parameters.json`: band centers/FWHM used for resampling
    - `brightness/*.json`: brightness adjustments between Landsat and MicaSense
    - `hyperspectral_bands.json`: reference metadata for hyperspectral inputs
- `spectralbridge/qa_plots.py` and `spectralbridge/sensor_panel_plots.py`: QA visualization utilities
- `spectralbridge/standard_resample.py`: spectral resampling and coefficients
Extending the system safely¶
Adding or modifying a target sensor¶
- Update spectral definitions in `spectralbridge/data/landsat_band_parameters.json` (or the analogous table for the new sensor) and ensure the resampling logic in `standard_resample.py` knows how to consume them.
- Confirm `get_flightline_products` and `FlightlinePaths` generate filenames for the new sensor; outputs must still include merged Parquet and QA artifacts.
- Add tests that validate band definitions and resampled outputs; do not bypass the existing stage ordering.
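Band-parameter-driven resampling can be sketched as a Gaussian spectral response built from each band's center/FWHM. The `{"center", "fwhm"}` schema and function names below are assumptions for illustration, not the actual `standard_resample.py` implementation:

```python
import numpy as np

def gaussian_srf(wavelengths: np.ndarray, center: float, fwhm: float) -> np.ndarray:
    """Normalized Gaussian spectral response function for one band."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # FWHM -> std dev
    weights = np.exp(-0.5 * ((wavelengths - center) / sigma) ** 2)
    return weights / weights.sum()  # weights sum to 1

def resample_spectrum(wavelengths: np.ndarray,
                      reflectance: np.ndarray,
                      bands: list[dict]) -> np.ndarray:
    """Convolve a hyperspectral spectrum down to broadband values.

    `bands` mimics entries a band-parameters JSON might hold,
    e.g. [{"center": 560.0, "fwhm": 60.0}, ...] (schema assumed).
    """
    return np.array([
        np.dot(gaussian_srf(wavelengths, b["center"], b["fwhm"]), reflectance)
        for b in bands
    ])
```

A useful sanity test for new band definitions: resampling a spectrally flat input must return that same flat value in every band, since each response function integrates to one.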
Updating brightness or calibration coefficients¶
- Brightness and regression tables live under `spectralbridge/data/brightness/` and are loaded via `brightness_config`. Changes here affect downstream cross-sensor harmonization.
- Keep the JSON schema and key names stable; update any dependent tests and documentation describing the coefficients.
- Validate against Landsat-referenced QA outputs to confirm calibrations remain within expected bounds.
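Schema stability can be guarded with a small validator run in tests. The per-band `slope`/`intercept` key names below are an assumed schema for illustration, not the actual brightness-table format:

```python
import json
from pathlib import Path

# Illustrative key set; the real brightness tables may differ.
REQUIRED_KEYS = {"slope", "intercept"}

def validate_brightness_table(path: Path) -> dict:
    """Load a brightness table and fail loudly if key names drift.

    Keeping this check in CI turns a silent harmonization change
    into an explicit test failure.
    """
    table = json.loads(path.read_text())
    for band, coeffs in table.items():
        missing = REQUIRED_KEYS - coeffs.keys()
        if missing:
            raise ValueError(f"band {band!r} missing keys: {sorted(missing)}")
    return table
```

A renamed or dropped coefficient key then surfaces as a `ValueError` naming the offending band, rather than as a subtle shift in harmonized reflectance.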
Modifying QA outputs¶
- QA panels and JSON summaries are produced after merging outputs. Filenames such as `<flight_id>_qa.png` and `<flight_id>_qa.json` are assumed by docs and CI.
- If adding metrics or changing formats, ensure `_qa.png` and `_qa.json` remain available, and update the QA tests under `tests/test_qa` accordingly.
- Maintain the quick-mode rendering used in CI fixtures so drift checks continue to pass.
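A guard for the QA file contract can be sketched as a helper suitable for the `tests/test_qa` suite; the function name is hypothetical, but the `<flight_id>_qa.png` / `<flight_id>_qa.json` names follow the documented contract:

```python
from pathlib import Path

def qa_artifacts_present(out_dir: Path, flight_id: str) -> bool:
    """Check that the QA files assumed by docs and CI exist and are non-empty."""
    required = [
        out_dir / f"{flight_id}_qa.png",
        out_dir / f"{flight_id}_qa.json",
    ]
    return all(p.exists() and p.stat().st_size > 0 for p in required)
```

Asserting this after a quick-mode pipeline run catches a renamed or missing QA artifact before downstream drift checks do.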
Relationship to scientific reproducibility¶
- The repository encodes the workflow described in the RSE manuscript; artifacts (ENVI, Parquet, QA) are the evidence trail for analyses.
- Centralized naming, stage ordering, and idempotent execution make runs auditable and repeatable across environments.
- Contributors are expected to preserve these invariants so published and future analyses can be reproduced from the same on-disk products.