Schema reference

The canonical schema lives at schema/helios-provenance-v0.1.json and is shipped inside the wheel at helios_provenance/_schema/helios-provenance-v0.1.json.

It is a JSON Schema 2020-12 document with four top-level record types under a oneOf discriminator. All four extend a common HeliosProvenanceRecord base and reject extra properties.

Common base

Every record carries:

| Field | Type | Required | Notes |
|---|---|---|---|
| id | string (1-256 chars) | yes | Convention: helios:<type>:<source>:<localpart>. |
| record_type | enum | yes | One of the four type names below. |
| schema_version | const "0.1.0" | yes | Pinned to this spec version. |
| created_at | RFC-3339 timestamp | yes | When this provenance record was created. |
| agent | Agent object | yes | Who/what created the record. |

The Agent sub-object:

| Field | Type | Required |
|---|---|---|
| id | string | yes |
| name | string | yes |
| type | one of software, service, person, organization | yes |
| version | string | optional |
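
As an illustration of the common base, the sketch below assembles the shared fields in plain Python. The helper names make_id and base_record, the short type token "dataset" used inside the id, and all agent values are hypothetical; only the field names, the id convention, and the schema_version constant come from the tables above.

```python
from datetime import datetime, timezone


def make_id(id_type: str, source: str, localpart: str) -> str:
    """Build an id following the helios:<type>:<source>:<localpart> convention."""
    record_id = f"helios:{id_type}:{source}:{localpart}"
    if not 1 <= len(record_id) <= 256:
        raise ValueError("id must be 1-256 characters long")
    return record_id


def base_record(record_type: str, id_type: str, source: str, localpart: str) -> dict:
    """Return the fields that every record carries (hypothetical helper)."""
    return {
        "id": make_id(id_type, source, localpart),
        "record_type": record_type,
        "schema_version": "0.1.0",
        # Timezone-aware UTC timestamp; isoformat() yields an RFC-3339 string.
        "created_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        # Illustrative agent; "version" is the only optional key.
        "agent": {
            "id": "example-ingestor",
            "name": "Example ingestor",
            "type": "software",
            "version": "0.0.0",
        },
    }


record = base_record("HeliosDatasetRecord", "dataset", "NASA-DONKI", "example")
print(record["id"])
```

The type-specific fields described in the following sections would be merged into this dictionary before validation.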

HeliosDatasetRecord

Dataset-level metadata for one upstream space-weather data source. Crosswalkable to SPASE 2.7.1 NumericalData / Catalog resources.

| Field | Type | Required | Notes |
|---|---|---|---|
| source | string | yes | Short label (e.g. "NASA-DONKI", "CCMC-SEP-Scoreboard-A"). |
| mission | string | optional | Mission/programme. |
| instrument | string | optional | Instrument/product. |
| format | string | yes | MIME or short token (e.g. "application/json", "IONEX"). |
| temporal_coverage | TemporalCoverage | yes | {start, stop?, cadence?} |
| spatial_coverage | SpatialCoverage | optional | {frame?, region?, bbox?, point?} |
| doi | string (DOI pattern) | optional | 10.x/y form. |
| source_url | URI | yes | Where this dataset was fetched from. |
| license | string | optional | SPDX identifier or short label. |
| ingestion_timestamp | timestamp | yes | When the HELIOS ingestion tier fetched it. |
| spase_resource_id | spase://... URI | optional | Anchor the SPASE crosswalk. |
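
For orientation, here is a minimal sketch of what a dataset record might look like as a Python dictionary, with a quick pre-check for missing fields. Every value is illustrative rather than real data, the helper missing_required is hypothetical, and the check assumes that each field not marked optional above is required. It is not a substitute for schema validation.

```python
# Base fields every record carries, plus the dataset fields not marked optional.
# Assumption: rows without an "optional" marker are required.
REQUIRED_DATASET_FIELDS = {
    "id", "record_type", "schema_version", "created_at", "agent",
    "source", "format", "temporal_coverage", "source_url", "ingestion_timestamp",
}

# Hypothetical record; the id, agent, URL, and timestamps are placeholders.
dataset_record = {
    "id": "helios:dataset:NASA-DONKI:example",
    "record_type": "HeliosDatasetRecord",
    "schema_version": "0.1.0",
    "created_at": "2024-01-02T03:04:05Z",
    "agent": {"id": "example-ingestor", "name": "Example ingestor", "type": "software"},
    "source": "NASA-DONKI",
    "format": "application/json",
    "temporal_coverage": {"start": "2024-01-01T00:00:00Z"},  # stop and cadence omitted
    "source_url": "https://example.org/donki.json",
    "ingestion_timestamp": "2024-01-02T03:04:05Z",
}


def missing_required(record: dict) -> list:
    """Return the required dataset fields absent from `record`, sorted by name."""
    return sorted(REQUIRED_DATASET_FIELDS - set(record))


print(missing_required(dataset_record))
```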

HeliosModelOutputRecord

A single output value from an upstream model (or measurement) at one timestamp and optional location.

| Field | Type | Required | Notes |
|---|---|---|---|
| model_id | string | yes | Short model/source label. |
| model_version | string | yes | Or "unspecified". |
| dataset_refs | array of IDs (≥1) | yes | The HeliosDatasetRecords this output came from. |
| timestamp | timestamp | yes | Valid time of this value. |
| location | SpatialCoverage | optional | For spatially-resolved values. |
| value | number/string/bool | yes | The output. |
| value_units | string | yes | UDUNITS-compatible (e.g. "pfu", "TECU", "1"). |
| confidence_interval | ConfidenceInterval | optional | {lower, upper, alpha, method?} |
| ingestion_timestamp | timestamp | yes | |
| extra | object | optional | Source-specific metadata. |

HeliosTransformationRecord

A transformation applied during fusion — calibration, BMA averaging, conformal interval, scaling, filter. Maps to W3C PROV Activity.

| Field | Type | Required | Notes |
|---|---|---|---|
| type | enum | yes | One of calibration, bma, conformal, scaling, filter, other. |
| parameters | object | yes | Free-form. Suggested keys: method, fitted_on, hyperparameters. |
| code_ref | string | yes | URI / git permalink to the implementing code. |
| input_refs | array of IDs (≥1) | yes | Records consumed. |
| output_refs | array of IDs (≥1) | yes | Records produced. |
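
A schema of this kind constrains the shape of each reference array, but whether every referenced ID actually exists in a given bundle of records is presumably left to the caller. The sketch below shows one way such a cross-reference check could look. The function name dangling_refs and the shortened example IDs are hypothetical, and the check only inspects the top-level dataset_refs, input_refs, and output_refs fields.

```python
REF_FIELDS = ("dataset_refs", "input_refs", "output_refs")


def dangling_refs(records: list) -> dict:
    """Map each record id to the referenced ids that no record in `records` defines."""
    known = {record["id"] for record in records}
    problems = {}
    for record in records:
        missing = [
            ref
            for field in REF_FIELDS
            for ref in record.get(field, [])
            if ref not in known
        ]
        if missing:
            problems[record["id"]] = missing
    return problems


# Trimmed, illustrative records: only the id and reference fields are shown.
bundle = [
    {"id": "helios:dataset:a"},
    {"id": "helios:model:m1", "dataset_refs": ["helios:dataset:a"]},
    {
        "id": "helios:transformation:t1",
        "input_refs": ["helios:model:m1"],
        "output_refs": ["helios:model:not-in-bundle"],
    },
]
print(dangling_refs(bundle))
```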

HeliosFusedOutputRecord

The headline contribution: a single fused output value with full feature-level lineage.

| Field | Type | Required | Notes |
|---|---|---|---|
| prediction_target | string | yes | E.g. "sep_all_clear_revocation". |
| timestamp | timestamp | yes | Valid time. |
| location | SpatialCoverage | optional | For spatially-resolved fused outputs. |
| value | number | yes | The fused value. |
| value_units | string | yes | |
| conformal_interval | ConformalInterval | yes | {lower, upper, alpha, method, calibration_set_size?} |
| lineage | array of LineageStep (≥1) | yes | Ordered. Order is causally significant. |
| provenance_chain_hash | 64-char hex | yes | SHA-256 of canonicalised lineage. |

A LineageStep:

| Field | Type | Required |
|---|---|---|
| transformation_ref | ID | yes |
| input_refs | array of IDs (≥1) | yes |
| output_refs | array of IDs (≥1) | yes |
| weight | float in [0, 1] | optional |
| notes | string | optional |
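
To make the provenance_chain_hash idea concrete, the following sketch hashes a lineage array with SHA-256. The canonicalisation used here (JSON with sorted keys, compact separators, UTF-8 encoding) is an assumption for illustration; this page does not specify the scheme, and the reference implementation may use a different one, so hashes produced this way are not guaranteed to match. The function name chain_hash and the shortened IDs are hypothetical.

```python
import hashlib
import json


def chain_hash(lineage: list) -> str:
    """Return the SHA-256 hex digest of a deterministic JSON encoding of `lineage`.

    Assumed canonical form: sorted object keys, no insignificant whitespace,
    UTF-8 bytes. Array order is preserved, so reordering steps changes the hash.
    """
    canonical = json.dumps(
        lineage, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative two-step lineage with shortened, made-up IDs.
lineage = [
    {"transformation_ref": "t1", "input_refs": ["m1"], "output_refs": ["c1"]},
    {"transformation_ref": "t2", "input_refs": ["c1"], "output_refs": ["f1"], "weight": 0.5},
]

digest = chain_hash(lineage)
print(len(digest))  # a SHA-256 hex digest is always 64 characters
```

Under this assumed scheme, key order inside a step does not affect the digest, while the order of the steps does, which is consistent with lineage order being causally significant.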

Validation

```bash
helios-provenance-validate path/to/record.json
```

or in code:

```python
from helios_provenance import HeliosProvenanceValidator

v = HeliosProvenanceValidator()
# record_dict is the record to check, already parsed into a Python dict.
errors = v.errors(record_dict)  # list of jsonschema.ValidationError (empty == valid)
```

The validator uses jsonschema.FormatChecker so format: date-time and format: uri keywords are enforced, not just annotated.