HELIOS Fusion Engine — Pre-Registration
File this on the Open Science Framework BEFORE running hold-out evaluation on Table 3-1.
The OSF URL must be present in orchestration/osf_preregistration.url for the kill-gate runner to proceed.
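A minimal sketch of how a runner could enforce this precondition, assuming the file holds a single https URL. The helper name `read_prereg_url` and its validation rules are illustrative assumptions, not the actual kill-gate implementation.

```python
from pathlib import Path
from urllib.parse import urlparse


def read_prereg_url(path="orchestration/osf_preregistration.url"):
    """Return the stored pre-registration URL, or raise if it is missing or malformed.

    Hypothetical helper: the real kill-gate runner may validate differently.
    """
    url_file = Path(path)
    if not url_file.is_file():
        raise FileNotFoundError(f"pre-registration URL file not found: {url_file}")
    url = url_file.read_text(encoding="utf-8").strip()
    parsed = urlparse(url)
    # Assumption: a usable entry is exactly one https URL with a host name.
    if parsed.scheme != "https" or not parsed.netloc or len(url.split()) != 1:
        raise ValueError(f"{url_file} does not contain a single https URL")
    return url
```

A runner built this way would call `read_prereg_url()` first and stop on any exception, so hold-out evaluation cannot start without a filed pre-registration.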
1. Title
Pre-registered validation of the HELIOS fusion engine: Bayesian Model Averaging with isotonic-regression reliability calibration and conformal prediction across SEP Scoreboards A/B/C for NASA mission-operations all-clear revocation forecasting.
2. Investigators
- Thomas Waweru, Principal Investigator, 577 Industries Inc.
- [Senior ML Engineer name]
- [Space-Weather / Ionospheric Physics SME name]
3. Pre-registration date and version
- Date filed (UTC ISO 8601): TO_BE_FILLED_AT_FILING_TIME
- OSF DOI: TO_BE_FILLED_AT_FILING_TIME
- This template version: 1.0
4. Hypotheses
We pre-register the following directional hypotheses:
H1: Fused all-clear-revocation HSS on the held-out events exceeds the HSS of the best single component model on the same hold-out events by at least 15% (relative).
H2: Reliability-diagram slope (linear regression of observed vs. predicted probability) is within 0.15 of 1.0 across all three Kp severity strata (quiet: Kp 0-3; moderate: Kp 4-6; extreme: Kp 7-9).
H3 (secondary, not gate-determining): Fused Brier score and CRPS on the held-out events improve over the best single component model on the same hold-out events.
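The two gate-determining hypotheses reduce to simple numeric checks. The sketch below is one possible reading, with stated assumptions: "15% (relative)" is taken as (fused - best) / best >= 0.15, which is only meaningful when the best component HSS is positive, and H2 requires all three strata to be present. The function names are hypothetical.

```python
REQUIRED_STRATA = ("quiet", "moderate", "extreme")


def h1_passes(fused_hss, best_component_hss, min_relative_gain=0.15):
    """H1: fused HSS beats the best single component by at least 15% (relative)."""
    # Assumption: a relative gain is undefined for a non-positive baseline.
    if best_component_hss <= 0:
        raise ValueError("relative gain is undefined for a non-positive baseline HSS")
    gain = (fused_hss - best_component_hss) / best_component_hss
    return gain >= min_relative_gain


def h2_passes(slope_by_stratum, tolerance=0.15):
    """H2: reliability slope within `tolerance` of 1.0 in every Kp stratum."""
    missing = [s for s in REQUIRED_STRATA if s not in slope_by_stratum]
    if missing:
        raise ValueError(f"missing strata: {missing}")
    return all(abs(slope_by_stratum[s] - 1.0) <= tolerance for s in REQUIRED_STRATA)
```

For example, a fused HSS of 0.70 against a best-component HSS of 0.60 is a relative gain of about 16.7%, which would satisfy H1 under this reading.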
5. Data
Training events (locked; no augmentation, and no resampling other than SMOTE applied within the training set):
- Bastille Day — 2000-07-14
- Halloween storms — 2003-10-28 through 2003-11-04
- Mid-cycle 23 — 2005-01-20
- Late cycle 23 — 2006-12-13
- Cycle 24 onset — 2012-03-07
- Cycle 24 mid — 2012-05-17
- September 2017 storm — 2017-09-06 and 2017-09-10
Hold-out events (locked, sealed until after model freeze and OSF filing confirmation):
- Cycle 25 onset — 2022-01-20 (M5.5)
- Mid-cycle 25 — 2023-02-17 (X2.2)
- Gannon — 2024-05-11 (G5)
Component models (the BMA averages across these; "best component" baseline is whichever individual model scores highest on the hold-out by HSS, computed identically):
- UMASEP
- HESPERIA REleASE
- SEPMOD
- MagPy
- SEP Scoreboard A (consensus onset probability)
- SEP Scoreboard B (consensus peak flux prediction)
- SEP Scoreboard C (event time profile)
All component-model outputs are pulled from public sources via helios-spaceweather-connectors. No reprocessing; if a component model's output is unavailable for a given event, that component is excluded from BMA for that event only, with the exclusion recorded.
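The per-event exclusion rule can be sketched as a weighted average restricted to the components that produced output. This is an illustration only: it assumes fixed posterior weights per component and renormalizes them over the available components, which the text above does not specify. `fuse_event` and the example weights are hypothetical.

```python
def fuse_event(weights, probabilities):
    """Average component probabilities for one event, skipping unavailable components.

    weights:       dict of component name -> BMA weight (assumed given)
    probabilities: dict of component name -> probability, or None if unavailable
    Returns (fused_probability, sorted list of excluded component names).
    """
    available = {name: p for name, p in probabilities.items() if p is not None}
    excluded = sorted(name for name in weights if name not in available)
    # Assumption: weights are renormalized over the components that remain.
    total = sum(weights[name] for name in available)
    if total <= 0:
        raise ValueError("no component with positive weight is available")
    fused = sum(weights[name] * p for name, p in available.items()) / total
    return fused, excluded


# Illustrative values only; these are not measured weights or forecasts.
fused, excluded = fuse_event(
    {"UMASEP": 0.5, "SEPMOD": 0.3, "MagPy": 0.2},
    {"UMASEP": 0.8, "SEPMOD": None, "MagPy": 0.4},
)
```

The returned `excluded` list is what would be written to the exclusion record for that event.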
6. Metrics
Computed per CCMC's standard validation conventions [11, 13]:
- HSS (Heidke Skill Score) — primary metric for H1
- Reliability-diagram slope — primary metric for H2
- Brier score — secondary
- CRPS (continuous ranked probability score) — secondary
- TSS (True Skill Statistic) — reported
- POD (Probability of Detection) — reported
- FAR (False Alarm Ratio) — reported
All metrics report point estimates and bootstrapped 95% confidence intervals (1000 resamples with replacement over hold-out event-windows).
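The categorical scores and the bootstrap interval can be sketched as follows. The formulas are the standard 2x2 contingency-table definitions (a = hits, b = false alarms, c = misses, d = correct negatives). The percentile bootstrap and the fixed seed are assumptions, since the text above does not name the interval method.

```python
import numpy as np


def contingency(forecast, observed):
    """Count hits, false alarms, misses and correct negatives from binary arrays."""
    f = np.asarray(forecast, dtype=bool)
    o = np.asarray(observed, dtype=bool)
    return (int(np.sum(f & o)), int(np.sum(f & ~o)),
            int(np.sum(~f & o)), int(np.sum(~f & ~o)))


def _ratio(num, den):
    # Undefined ratios become NaN instead of raising.
    return num / den if den else float("nan")


def skill_scores(a, b, c, d):
    """Return POD, FAR (false alarm ratio), TSS and HSS for a 2x2 table."""
    pod = _ratio(a, a + c)
    return {
        "POD": pod,
        "FAR": _ratio(b, a + b),
        "TSS": pod - _ratio(b, b + d),
        "HSS": _ratio(2 * (a * d - b * c), (a + c) * (c + d) + (a + b) * (b + d)),
    }


def bootstrap_ci(forecast, observed, score="HSS", n_resamples=1000, seed=0):
    """Percentile 95% CI from resampling event-windows with replacement."""
    f = np.asarray(forecast, dtype=bool)
    o = np.asarray(observed, dtype=bool)
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(n_resamples):
        idx = rng.integers(0, len(f), size=len(f))
        stats.append(skill_scores(*contingency(f[idx], o[idx]))[score])
    # Resamples where the score is undefined are ignored.
    lo, hi = np.nanpercentile(stats, [2.5, 97.5])
    return float(lo), float(hi)
```

With only a few hold-out events, many resamples can contain a single class and yield an undefined score, so the number of usable resamples is worth reporting next to the interval.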
7. Severity strata
Kp-binned at the time of the prediction window:
- Quiet: Kp 0-3
- Moderate: Kp 4-6
- Extreme: Kp 7-9
All metrics are reported per-stratum AND aggregated.
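A sketch of the stratum assignment and of one way to compute the reliability-diagram slope. Two assumptions are made here that the text does not settle: Kp is supplied as a whole number from 0 to 9 (fractional values are rejected instead of being rounded), and the slope is an unweighted least-squares fit over equal-width probability bins.

```python
import numpy as np

STRATA = {"quiet": range(0, 4), "moderate": range(4, 7), "extreme": range(7, 10)}


def kp_stratum(kp):
    """Map a whole-number Kp index (0-9) to its pre-registered stratum name."""
    for name, values in STRATA.items():
        if kp in values:
            return name
    raise ValueError(f"Kp must be a whole number from 0 to 9, got {kp!r}")


def reliability_slope(predicted, observed, n_bins=10):
    """Slope of observed event frequency against mean predicted probability."""
    p = np.asarray(predicted, dtype=float)
    o = np.asarray(observed, dtype=float)
    # Equal-width bins on [0, 1]; a probability of exactly 1.0 goes in the last bin.
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    xs, ys = [], []
    for b in np.unique(bins):
        mask = bins == b
        xs.append(p[mask].mean())
        ys.append(o[mask].mean())
    if len(xs) < 2:
        raise ValueError("need at least two non-empty bins to fit a slope")
    slope, _intercept = np.polyfit(xs, ys, 1)
    return float(slope)
```

Per-stratum slopes follow by grouping forecasts with `kp_stratum` and calling `reliability_slope` on each group. The aggregated value uses all forecasts at once.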
8. Pre-registration discipline
- Model freeze date (date after which no hyperparameter changes, feature additions, or architecture revisions are permitted before hold-out evaluation): TO_BE_FILLED
- Hold-out evaluation date: TO_BE_FILLED (must be after OSF filing date)
- All code at evaluation time is tagged with git commit SHA TO_BE_FILLED in the helios-fusion-engine repo.
- The orchestration/kill_gate.py script's output is committed to helios-program/results/<date>-killgate.json and is the single source of truth for whether the kill-gate passed.
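For illustration, a result record could be written as below. Everything about the record is an assumption: the field names, the ISO 8601 form of `<date>`, the rule that the gate passes only when both H1 and H2 pass, and the refusal to overwrite an existing file. The actual schema is whatever orchestration/kill_gate.py emits.

```python
import json
from datetime import date
from pathlib import Path


def write_killgate_result(results_dir, commit_sha, h1_pass, h2_pass, run_date=None):
    """Write a <date>-killgate.json record and return its path (hypothetical schema)."""
    run_date = run_date or date.today()
    record = {
        "date": run_date.isoformat(),
        "commit_sha": commit_sha,
        "h1_pass": bool(h1_pass),
        "h2_pass": bool(h2_pass),
        # Assumption: the gate passes only if both gate-determining hypotheses pass.
        "gate_passed": bool(h1_pass and h2_pass),
    }
    out = Path(results_dir) / f"{run_date.isoformat()}-killgate.json"
    if out.exists():
        # Refuse to overwrite so that a committed result cannot be silently replaced.
        raise FileExistsError(f"result already recorded: {out}")
    out.write_text(json.dumps(record, indent=2, sort_keys=True) + "\n", encoding="utf-8")
    return out
```

Refusing to overwrite keeps the file consistent with its role as the single source of truth: a second run on the same date has to be recorded as a deviation, not as a replacement.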
9. Decision rules
Computed AFTER hold-out evaluation; no post-hoc re-baselining.
- PASS both H1 and H2 → file full arXiv preprint to astro-ph.SR with cs.LG cross-list. Companion footnote in companion.md populated with arXiv URL within 7 days of submission.
- PASS one, FAIL one → file an "honest ablation" preprint reporting the failing dimension, with explicit pre-registration reference. Useful for the community even though it isn't the headline result.
- FAIL both → do not publish. Ship helios-fusion-engine v0.1.0 with the negative result documented in a notebook and a clearly-labeled section of the README.
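The three rules above form a total mapping from the two gate outcomes to an action, sketched here. The function name and the returned labels are hypothetical; only the branching mirrors the rules.

```python
def decide(h1_pass, h2_pass):
    """Map the H1/H2 outcomes to the pre-registered action (labels are illustrative)."""
    passed = int(bool(h1_pass)) + int(bool(h2_pass))
    if passed == 2:
        return "full_preprint"             # arXiv astro-ph.SR with cs.LG cross-list
    if passed == 1:
        return "honest_ablation_preprint"  # report the failing dimension
    return "ship_negative_result"          # no paper; document in notebook and README
```

H3 does not appear in the signature because it is secondary and not gate-determining.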
Deviations from the pre-registration (e.g., a component model becomes unavailable, a hold-out event is rescoped due to data-availability issues) are documented as deviations in the final report; they are not retroactive corrections to the pre-registration.
10. References
[11] Whitman, K. et al., Tools Used in Space Radiation Operations, NASA Technical Reports (2023).
[13] Whitman, K. et al., NASA's Ongoing SEP Model Validation Effort Driving an Effective R2O2R Process, COSPAR (2024).
[31] Vovk, V., Gammerman, A., Shafer, G., Algorithmic Learning in a Random World (Conformal Prediction), 2nd ed., Springer (2022).
This template is the starting point. Before filing on OSF:
1. Fill in TO_BE_FILLED fields
2. Confirm component-model list against actual data availability per helios-spaceweather-connectors adapters at the time
3. Have the SME consultant review the metric definitions
4. File. Save the OSF DOI/URL to orchestration/osf_preregistration.url.
5. Tag the helios-fusion-engine repo with prereg-v1.0 at the locked commit.
Only then does the kill-gate runner permit invocation.