
Configuration

Most users can rely on the default configuration, but the pipeline allows fine-grained control over processing stages, performance, and file handling.

This page documents the available configuration parameters and how they affect pipeline behavior.


Configuration sources

  1. Command-line options (highest priority)
  2. Environment variables
  3. Default internal settings
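
A minimal sketch of how these sources combine. The executable name `cscal` is an assumption (it is not named in this page), and the flag and variable values are purely illustrative:

```bash
# Assumed executable name "cscal"; values are illustrative.

# Environment variables override the internal defaults:
export CSCAL_LOGLEVEL=DEBUG
export CSCAL_TMPDIR=/scratch/tmp

# Command-line options have the highest priority and override both:
cscal --engine ray --max-workers 8
```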

Common settings

base-folder

Directory where the following outputs are written:

  • downloaded HDF5 tiles
  • ENVI exports
  • Parquet tables
  • QA artifacts
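
A hedged example of pointing all outputs at a single run directory; the `cscal` executable name is an assumption and the path is illustrative:

```bash
# Assumed executable name "cscal"; the path is illustrative.
# Downloaded HDF5 tiles, ENVI exports, Parquet tables, and QA artifacts
# are all written beneath this directory.
cscal --base-folder /data/pipeline_runs/run01
```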


engine

Execution backend:

  • thread (default)
  • ray (distributed or highly parallel workflows)
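
A sketch of selecting the Ray backend and attaching to an existing cluster via CSCAL_RAY_ADDRESS (documented under Environment variables below). The executable name and cluster address are assumptions:

```bash
# Assumed executable name "cscal"; the cluster address is illustrative.
# CSCAL_RAY_ADDRESS (see "Environment variables" below) points the
# pipeline at an existing Ray cluster.
export CSCAL_RAY_ADDRESS=ray://10.0.0.5:10001
cscal --engine ray
```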

max-workers

Controls concurrency in:

  • ENVI export
  • BRDF and topographic correction
  • Parquet extraction

Increase this value cautiously when memory is limited.
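
A sketch of capping concurrency on a memory-constrained machine; the executable name and worker count are illustrative:

```bash
# Assumed executable name "cscal"; the worker count is illustrative.
# A lower --max-workers value limits how many ENVI export, BRDF/topographic
# correction, and Parquet extraction tasks run at once, trading throughput
# for lower peak memory use.
cscal --max-workers 4
```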


start-at and end-at

Define which subset of pipeline stages to run.

Example:

```bash
--start-at brdf --end-at convolution
```


Environment variables

| Variable          | Meaning                      |
| ----------------- | ---------------------------- |
| CSCAL_TMPDIR      | Override temporary directory |
| CSCAL_LOGLEVEL    | Set logging verbosity        |
| CSCAL_RAY_ADDRESS | Use an existing Ray cluster  |


Advanced configuration

These settings primarily matter for large-scale workflows:

  • chunk sizes for Parquet extraction
  • memory thresholds for Ray worker processes
  • default CRS assignments
  • sensor SRF paths

Details of internal architecture appear in the Developer section.


Next steps

  • JSON schemas
  • Validation metrics
  • Pipeline stages