Schemas
When do I need this? When validating outputs or writing downstream tooling that expects consistent columns and metadata.
Purpose
Document the shape of artifacts emitted by Stage 5 (Parquet export) and Stage 6 (merge) so you can trust what the outputs deliver.
Inputs
- Schema JSON files bundled with the project (see schemas/ in the repo)
- Sample Parquet files from Parquet export or Merge
Outputs
Validation reports confirming column presence, dtypes, and metadata blocks for ENVI-derived tables.
Run it
python scripts/validate_schema.py parquet/demo_brdfandtopo_corrected_envi.parquet schemas/parquet_brdfandtopo.json
Or run the column check inline with pyarrow:

import json
import pyarrow.parquet as pq

# Load the expected column list from the stage schema
with open("schemas/parquet_brdfandtopo.json", "r", encoding="utf-8") as fp:
    spec = json.load(fp)

# Read only the Parquet schema (no row data) and compare column names
schema = pq.read_schema("parquet/demo_brdfandtopo_corrected_envi.parquet")
missing = set(spec["columns"]) - set(schema.names)
print(f"Missing columns: {sorted(missing)}")
Pitfalls
- Always match schema files to the correct stage; merged tables include joined metadata absent in Stage 5 outputs.
- Column-name checks are case-sensitive and can fail on casing alone; normalize case before comparing (see the sketch after this list).
- When adding sensors, update both the schema and the Troubleshooting page to reflect new failure modes.
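A minimal sketch of the case normalization mentioned above, assuming lowercase as the canonical form (the project may define a different convention):

import json
import pyarrow.parquet as pq

with open("schemas/parquet_brdfandtopo.json", "r", encoding="utf-8") as fp:
    spec = json.load(fp)
schema = pq.read_schema("parquet/demo_brdfandtopo_corrected_envi.parquet")

# Lowercase both sides so e.g. "Wavelength" and "wavelength" compare equal
expected = {name.lower() for name in spec["columns"]}
actual = {name.lower() for name in schema.names}
print(f"Missing (case-insensitive): {sorted(expected - actual)}")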