Platform

Versioning

How the backend treats control + configuration as versioned artifacts, enabling reproducible deployments across fleets and environments.

Bootctrl architecture overview

Design intent

Use this lens when implementing Versioning across a fleet: define clear boundaries, make changes snapshot-based, and keep operational signals observable.

  • Central versioning prevents drift and supports audits
  • Diffs attached to promotion decisions reduce risk
  • Rollback must be a first-class workflow, not an emergency hack

What it is

Cloud versioning is the authoritative record of “what should run where”: snapshots, releases, and their deployment history across a fleet.

Release workflow

  • Create snapshot → review/approve → promote → deploy to a target fleet
  • Track rollout status and associate failures with the release
  • Roll back by selecting the previous known-good snapshot
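
The workflow above can be modeled as a small state machine that rejects transitions skipping review or approval. A rough sketch only: the `Release` class and transition table are illustrative, not platform APIs.

```python
from dataclasses import dataclass, field

# Allowed state transitions; anything else (e.g. draft -> deployed) is rejected.
VALID_TRANSITIONS = {
    "draft": {"snapshot"},
    "snapshot": {"approved"},
    "approved": {"deploying"},
    "deploying": {"deployed", "failed"},
    "deployed": {"rolled_back"},
    "failed": {"rolled_back"},
}

@dataclass
class Release:
    snapshot_id: str
    state: str = "draft"
    history: list = field(default_factory=list)  # audit trail of transitions

    def advance(self, new_state: str) -> None:
        # Reject any transition that skips a review/approval step.
        if new_state not in VALID_TRANSITIONS.get(self.state, set()):
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.history.append((self.state, new_state))
        self.state = new_state

release = Release("snap_2025-12-28_15-04")
for step in ("snapshot", "approved", "deploying", "deployed"):
    release.advance(step)
```

The point of the explicit table is that "promotion bypass" becomes a hard error rather than a process violation someone has to notice.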

Architecture at a glance

  • Artifacts: draft → snapshot → deployment (immutable references)
  • Control plane stores versions and deployment intent; edge reports actual state
  • Telemetry/events are correlated to snapshot + deployment identifiers
  • This is a UI + backend + edge concern: engineering, operations, and audits all depend on it
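
The draft → snapshot → deployment chain can be sketched with frozen dataclasses standing in for immutable references. All names and the content-hash scheme below are assumptions for illustration, not the platform's actual schema.

```python
import hashlib
import json
from dataclasses import dataclass

@dataclass
class Draft:
    """Mutable working copy; never deployed directly."""
    config: dict

@dataclass(frozen=True)
class Snapshot:
    """Immutable cut of a draft, identified by a content hash."""
    snapshot_id: str
    config_hash: str

@dataclass(frozen=True)
class Deployment:
    """Deployment intent: which snapshot should run on which fleet."""
    deployment_id: str
    snapshot_id: str
    target_fleet: str

def cut_snapshot(draft: Draft, snapshot_id: str) -> Snapshot:
    # Canonical JSON (sorted keys) so the same config always hashes the same.
    digest = hashlib.sha256(
        json.dumps(draft.config, sort_keys=True).encode()
    ).hexdigest()
    return Snapshot(snapshot_id, digest)
```

Freezing the snapshot and deployment records is what makes "immutable references" enforceable rather than a convention.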

Typical workflow

  • Iterate in draft → cut a snapshot when ready for field validation
  • Promote through environments (dev → staging → production)
  • Canary deploy → monitor → expand; keep previous snapshot ready
  • After rollout: attach incident notes/diffs for future forensics

System boundary

Treat Versioning as a repeatable interface between engineering intent (design) and runtime reality (deployments + signals). Keep site-specific details configurable so the same design scales across sites.

How it relates to snapshots

Snapshots are the immutable units the version history is built from: each deployment references a specific snapshot, so the backend can always answer which version should run where and supply the exact artifact/config to reproduce it.
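
As an illustration, "which version should run where" reduces to a lookup over recorded deployment intent. The records and helper below are hypothetical.

```python
# Hypothetical deployment-intent records: (fleet, snapshot, monotonically
# increasing sequence number). The latest record per fleet wins.
deployments = [
    {"fleet": "site-a", "snapshot": "snap_2025-12-10_09-22", "seq": 1},
    {"fleet": "site-a", "snapshot": "snap_2025-12-28_15-04", "seq": 2},
    {"fleet": "site-b", "snapshot": "snap_2025-12-10_09-22", "seq": 1},
]

def desired_snapshot(fleet: str):
    """Answer 'which version should run on this fleet', or None if unknown."""
    rows = [d for d in deployments if d["fleet"] == fleet]
    return max(rows, key=lambda d: d["seq"])["snapshot"] if rows else None
```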

Example artifact

Release flow (conceptual)

draft: ui-edit-session
snapshot: snap_2025-12-28_15-04
promotion:
  - env: staging
    checks: [health_green, telemetry_ok, acceptance_passed]
  - env: production
    rollout: canary(1) -> batch(10) -> fleet
rollback:
  target_snapshot: snap_2025-12-10_09-22

Why it matters

  • Ensures deployments are reproducible and auditable
  • Supports safe rollback and staged releases
  • Prevents configuration drift across sites

Common failure modes

  • Deploying from mutable drafts (cannot reproduce behavior later)
  • Rollback ambiguity (no known-good snapshot selected)
  • Promotion bypass (prod changes without staging validation)
  • Drift: desired vs actual versions across the fleet
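
The last failure mode, drift, is detectable by reconciling desired against actual versions per device. A minimal sketch, assuming both sides are available as device → snapshot maps:

```python
def find_drift(desired: dict, actual: dict) -> dict:
    """Map device -> (desired_snapshot, actual_snapshot) for every mismatch,
    including devices present on only one side."""
    drift = {}
    for device in desired.keys() | actual.keys():
        want, have = desired.get(device), actual.get(device)
        if want != have:
            drift[device] = (want, have)
    return drift
```

Run on a schedule and alert on a non-empty result; a `None` on either side flags a device the control plane and the fleet disagree about entirely.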

Acceptance tests

  • Deploy the same snapshot twice and verify identical behavior/signals
  • Roll back to previous snapshot and verify forensics are preserved
  • Verify promotion gates are enforced (no direct prod changes)
  • Verify the deployed snapshot/version matches intent (no drift)
  • Run a canary validation: behavior, health, and telemetry align with expectations
  • Verify rollback works and restores known-good behavior
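
The first test, reproducibility, reduces to checking that a snapshot renders to an identical fingerprint on every deploy. A sketch assuming deterministic JSON rendering; the function names are illustrative.

```python
import hashlib
import json

def render_config(snapshot_config: dict) -> str:
    # Deterministic rendering: sorted keys, no timestamps, no randomness.
    return json.dumps(snapshot_config, sort_keys=True)

def deployed_fingerprint(snapshot_config: dict) -> str:
    """Two deploys of the same snapshot must yield the same fingerprint."""
    return hashlib.sha256(render_config(snapshot_config).encode()).hexdigest()
```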

In the platform

  • Snapshot promotion between environments
  • Deployment plans tied to version identifiers
  • Diffing and change history for accountability
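
Diffing for accountability can be as simple as categorizing keys into added / removed / changed. A hedged sketch over flat config dicts:

```python
def diff_snapshots(old: dict, new: dict) -> dict:
    """Categorize config keys into added / removed / changed, suitable for
    attaching to a promotion decision."""
    return {
        "added":   {k: new[k] for k in new.keys() - old.keys()},
        "removed": {k: old[k] for k in old.keys() - new.keys()},
        "changed": {k: (old[k], new[k])
                    for k in old.keys() & new.keys() if old[k] != new[k]},
    }
```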

What versioning protects you from

  • “Works on my machine/site” drift
  • Untracked edits and mystery state changes
  • Irreproducible deployments that can’t be audited or rolled back

Implementation checklist

  • Use one authoritative source of truth for versions and promotion
  • Require diffs/change history for every promoted snapshot
  • Keep environment separation explicit (dev/staging/prod)
  • Make rollback a first-class action with recorded outcomes

Rollout guidance

  • Start with a canary site that matches real conditions
  • Use health + telemetry gates; stop expansion on regressions
  • Keep rollback to a known-good snapshot fast and rehearsed
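
The canary(1) → batch(10) → fleet expansion with a stop-on-regression gate might look like the sketch below, where `healthy()` stands in for the real health + telemetry gates.

```python
def staged_rollout(fleet, healthy):
    """Expand canary(1) -> batch(10) -> remainder; halt on the first batch
    that fails its gate, leaving the rest on the previous snapshot."""
    done, remaining = [], list(fleet)
    batch_size = 1  # canary first
    while remaining:
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        if not all(healthy(device) for device in batch):
            # Regression: stop expansion; caller rolls the failed batch back.
            return done, batch + remaining
        done += batch
        batch_size = 10 if batch_size == 1 else max(len(remaining), 1)
    return done, []
```

Returning the untouched remainder explicitly is what keeps "rollback to known-good" a one-step action rather than a fleet-wide scramble.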

Deep dive

Common questions

Quick answers that help during commissioning and operations.

What does “authoritative” versioning mean?

It means the backend can always answer which version should run where and can provide the exact artifact/config to reproduce it.

How do we prevent drift across the fleet?

Deploy from snapshots, reconcile desired vs actual, and alert on drift. Avoid manual one-off edits that bypass versioning.

What should change history include?

Who changed what, when, why (optional), and the diff. It should be possible to reconstruct the release story from the log.
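
One possible shape for such a change record; the field names are assumptions, not the platform's schema.

```python
import time
from dataclasses import dataclass

@dataclass
class ChangeRecord:
    author: str          # who
    snapshot_id: str     # what
    timestamp: float     # when
    diff: dict           # the change itself
    reason: str = ""     # why (optional, per the answer above)

log = []

def record_change(author, snapshot_id, diff, reason=""):
    """Append one audit entry; the log alone should tell the release story."""
    entry = ChangeRecord(author, snapshot_id, time.time(), diff, reason)
    log.append(entry)
    return entry
```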