
Schemas

When do I need this? When validating outputs or writing downstream tooling that expects consistent columns and metadata.

Purpose

Document the shape of artifacts emitted by Stage 5 (Parquet export) and Stage 6 (merge) so you can trust the tables those stages deliver.

Inputs

  • Schema JSON files bundled with the project (see schemas/ in the repo)
  • Sample Parquet files from Parquet export or Merge
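The schema files are plain JSON. A minimal sketch of the shape assumed by the snippets on this page; the keys and column names here are illustrative, not the exact fields shipped in schemas/:

{
  "name": "parquet_brdfandtopo",
  "columns": ["wavelength", "reflectance", "row", "col"],
  "dtypes": {"wavelength": "float64", "reflectance": "float32"}
}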

Outputs

Validation reports confirming column presence, dtypes, and metadata blocks for ENVI-derived tables.

Run it

# Run the bundled validator against a Stage 5 export:
python scripts/validate_schema.py parquet/demo_brdfandtopo_corrected_envi.parquet schemas/parquet_brdfandtopo.json

# Or spot-check column presence by hand in Python:
import json
import pyarrow.parquet as pq

with open("schemas/parquet_brdfandtopo.json", "r", encoding="utf-8") as fp:
    spec = json.load(fp)

# Read only the file's schema; there is no need to load the full table.
schema = pq.read_schema("parquet/demo_brdfandtopo_corrected_envi.parquet")
missing = set(spec["columns"]) - set(schema.names)
print(f"Missing columns: {sorted(missing)}")
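The snippet above covers column presence only; Outputs also mentions dtypes and metadata blocks. A minimal sketch of those extra checks, assuming (hypothetically) that the schema JSON maps column names to Arrow dtype strings under a "dtypes" key and lists required file-metadata keys under a "metadata_keys" key:

import json
import pyarrow.parquet as pq

with open("schemas/parquet_brdfandtopo.json", "r", encoding="utf-8") as fp:
    spec = json.load(fp)

pf = pq.ParquetFile("parquet/demo_brdfandtopo_corrected_envi.parquet")
schema = pf.schema_arrow  # Arrow schema, including key-value file metadata

# Compare declared dtypes against the file (the "dtypes" key is hypothetical).
for name, expected in spec.get("dtypes", {}).items():
    if name in schema.names:
        actual = str(schema.field(name).type)
        if actual != expected:
            print(f"dtype mismatch for {name}: {actual} != {expected}")

# Confirm expected metadata blocks exist (the "metadata_keys" key is hypothetical).
file_meta = schema.metadata or {}
for key in spec.get("metadata_keys", []):
    if key.encode() not in file_meta:
        print(f"missing metadata key: {key}")

Arrow stores key-value metadata as raw bytes, which is why the keys are encoded before the membership test.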

Pitfalls

  • Always match schema files to the correct stage; merged tables include joined metadata absent in Stage 5 outputs.
  • Case-sensitive column names can fail equality checks; normalize both sides before comparing (see the sketch after this list).
  • When adding sensors, update both the schema and the Troubleshooting page to reflect new failure modes.
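For the case-sensitivity pitfall, a minimal sketch that lowercases both sides before comparing; it reuses the demo file and schema from Run it:

import json
import pyarrow.parquet as pq

with open("schemas/parquet_brdfandtopo.json", "r", encoding="utf-8") as fp:
    spec = json.load(fp)

schema = pq.read_schema("parquet/demo_brdfandtopo_corrected_envi.parquet")

# Lowercase both sides so "Wavelength" and "wavelength" compare equal.
expected = {c.lower() for c in spec["columns"]}
found = {c.lower() for c in schema.names}
print(f"Missing (case-insensitive): {sorted(expected - found)}")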