Orchestration

How the backend persists configuration, plans deployments, and orchestrates rollouts across a distributed fleet.

Bootctrl architecture overview

Design intent

Use this lens when implementing Orchestration across a fleet: define clear boundaries, make changes snapshot-based, and keep operational signals observable.

  • Orchestration is a state machine with explicit transitions
  • Reconciliation (desired vs observed) is how fleets stay consistent
  • Idempotent retries prevent partial rollouts from getting stuck
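
The state-machine framing above can be sketched as an explicit transition table. The state names follow the plan → deploy → verify → complete flow used later in this document, but the API shape is an illustrative assumption, not the platform's actual interface.

```python
# Hedged sketch of a deployment state machine with explicit transitions.
from enum import Enum

class DeployState(Enum):
    PLAN = "plan"
    DEPLOY = "deploy"
    VERIFY = "verify"
    COMPLETE = "complete"
    ROLLED_BACK = "rolled_back"

# Every legal transition is listed explicitly; anything else is rejected.
TRANSITIONS = {
    DeployState.PLAN: {DeployState.DEPLOY},
    DeployState.DEPLOY: {DeployState.VERIFY, DeployState.ROLLED_BACK},
    DeployState.VERIFY: {DeployState.COMPLETE, DeployState.ROLLED_BACK},
    DeployState.COMPLETE: set(),
    DeployState.ROLLED_BACK: set(),
}

def advance(current: DeployState, target: DeployState) -> DeployState:
    """Move to `target` only if the transition is explicitly allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```

Rejecting unknown transitions loudly, rather than silently coercing state, is what keeps partial rollouts from drifting into ambiguous states.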

What it is

The backend is the control plane: it persists configuration, versions designs, and orchestrates deployments to edge devices.

Architecture at a glance

  • Endpoints (protocol sessions) → points (signals) → mappings (typed bindings) → control app ports
  • Adapters isolate variable-latency protocol work from deterministic control execution paths
  • Validation and data-quality checks sit between “connected” and “correct”
  • This is a UI + backend + edge concern: changes affect real-world actuation

Typical workflow

  • Define endpoints and point templates (units, scaling, ownership)
  • Bind points to app ports and validate types/limits
  • Commission using a canary device and verify data quality (staleness/range)
  • Roll out with rate limits and monitoring for flaps and errors
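
The bind-and-validate step can be sketched as a pre-commissioning check. The Point and Port shapes, field names, and error strings below are illustrative assumptions.

```python
# Illustrative pre-commissioning check: refuse a point-to-port binding
# when types, units, or limits do not line up. Field names are assumed.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Point:
    name: str
    dtype: str                  # e.g. "REAL", "BOOL"
    unit: str                   # e.g. "rpm", "-" for unitless
    lo: Optional[float] = None  # engineering low limit
    hi: Optional[float] = None  # engineering high limit

@dataclass
class Port:
    name: str
    dtype: str
    unit: str

def validate_binding(point: Point, port: Port) -> list:
    """Return a list of problems; an empty list means the binding is valid."""
    errors = []
    if point.dtype != port.dtype:
        errors.append("type mismatch: %s vs %s" % (point.dtype, port.dtype))
    if point.unit != port.unit:
        errors.append("unit mismatch: %s vs %s" % (point.unit, port.unit))
    if point.lo is not None and point.hi is not None and point.lo >= point.hi:
        errors.append("limits inverted")
    return errors
```

Running this check at bind time, before any canary deployment, catches the cheap-to-fix mismatches while they are still design errors rather than field incidents.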

System boundary

Treat Orchestration as a repeatable interface between engineering intent (design) and runtime reality (deployments + signals). Keep site-specific details configurable so the same design scales across sites.

Example artifact

I/O mapping table (conceptual)

point_name,  protocol, address, type, unit, scale, direction, owner
pump_speed,  modbus,   40021,   REAL, rpm,  0.1,   read,      device:pump-1
valve_cmd,   modbus,   00013,   BOOL, -,    -,     write,     app:fb-network
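
One hypothetical way to consume such a table: parse the rows into records, then apply the scale factor when converting raw register values to engineering units. The parser and helper names are illustrative; the table content matches the conceptual example above.

```python
# Hypothetical consumer of the conceptual I/O mapping table.
import csv
import io

TABLE = """point_name,protocol,address,type,unit,scale,direction,owner
pump_speed,modbus,40021,REAL,rpm,0.1,read,device:pump-1
valve_cmd,modbus,00013,BOOL,-,-,write,app:fb-network
"""

def load_mappings(text):
    """Parse the table into one dict per point."""
    return list(csv.DictReader(io.StringIO(text)))

def to_engineering(row, raw):
    """Scale a raw value; '-' means the point carries no scale factor."""
    scale = row["scale"]
    return raw * float(scale) if scale != "-" else raw
```

A raw register value of 1234 on pump_speed reads as 123.4 rpm once the 0.1 scale is applied; this is exactly the units/scaling step that goes wrong silently when the table is wrong.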

Why it matters

  • Fleet-wide consistency for deployments and configuration
  • Automated rollout/rollback across sites
  • Single source of truth for audit and compliance

Common failure modes

  • Units/scaling mismatch (values look “reasonable” but are wrong)
  • Swapped addresses/endianness/encoding issues that only show under load
  • Staleness: values stop changing but connectivity stays “green”
  • Write conflicts from unclear single-writer ownership
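
A minimal sketch of a staleness guard, assuming each sample arrives with a timestamp: track the time of the last *change*, not the last receipt, so a frozen value is flagged even while connectivity stays green. Names and the threshold are illustrative.

```python
# Staleness guard sketch: "connected" is not the same as "correct".
class StalenessGuard:
    def __init__(self, max_age_s: float):
        self.max_age_s = max_age_s
        self.last_value = None
        self.last_change = None

    def update(self, value, now: float) -> None:
        """Record a sample; only a changed value refreshes the clock."""
        if value != self.last_value:
            self.last_value = value
            self.last_change = now

    def is_stale(self, now: float) -> bool:
        if self.last_change is None:
            return True           # never seen a value at all
        return (now - self.last_change) > self.max_age_s
```

For genuinely constant signals this needs tuning (or a protocol-level heartbeat), but for process values that should move, it separates "link up" from "data alive".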

Acceptance tests

  • Step input values and verify expected output actuation (end-to-end)
  • Inject stale/noisy values and confirm guards flag or suppress them
  • Confirm single-writer ownership with a write-conflict test
  • Verify the deployed snapshot/version matches intent (no drift)
  • Run a canary validation: behavior, health, and telemetry align with expectations
  • Verify rollback works and restores known-good behavior
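
The drift check can be sketched by hashing a canonical form of the desired snapshot and comparing it with the digest a device reports as deployed. The snapshot fields are assumptions for illustration.

```python
# Illustrative snapshot drift check: intent vs what a device reports.
import hashlib
import json

def snapshot_digest(config: dict) -> str:
    """Digest over a canonical JSON form, so key order cannot cause drift."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

def has_drift(desired: dict, reported_digest: str) -> bool:
    return snapshot_digest(desired) != reported_digest
```

Canonicalising before hashing matters: two semantically identical snapshots serialised with different key orders must not be reported as drift.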

In the platform

  • Stores device/resource registry and application models
  • Plans deployments from a snapshot to a target fleet
  • Tracks rollout status and failures

Implementation checklist

  • Define a deployment state machine (plan → deploy → verify → complete)
  • Store desired state and reconcile against observed device state
  • Track failures with categories (runtime/adapters/network/config)
  • Automate rollback when health gates fail during rollout
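
A minimal reconciliation sketch, assuming desired and observed state are flat key/value maps: compute the diff and apply only what differs, so retrying the same reconcile is an idempotent no-op.

```python
# Reconcile desired vs observed state; retries are idempotent by design.
def diff(desired: dict, observed: dict) -> dict:
    """Keys that are missing or different on the device."""
    return {k: v for k, v in desired.items() if observed.get(k) != v}

def reconcile(desired: dict, observed: dict) -> dict:
    """Push the missing/differing keys; returns what actually changed."""
    changes = diff(desired, observed)
    observed.update(changes)   # stand-in for pushing config to a device
    return changes
```

Because the second application of the same diff changes nothing, a reconcile that dies halfway can simply be rerun; this is what keeps partial rollouts from getting stuck.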

Rollout guidance

  • Start with a canary site that matches real conditions
  • Use health + telemetry gates; stop expansion on regressions
  • Keep rollback to a known-good snapshot fast and rehearsed
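
The gated expansion above can be sketched as fixed-size waves with a health check between them. `deploy` and `healthy` are hypothetical callbacks that a real platform would supply.

```python
# Gated rollout sketch: deploy in waves, stop expansion on regression.
def rollout(devices, deploy, healthy, wave_size=2):
    passed = []
    for i in range(0, len(devices), wave_size):
        wave = devices[i:i + wave_size]
        for device in wave:
            deploy(device)
        if not all(healthy(device) for device in wave):
            return passed, False      # gate failed: stop expanding
        passed.extend(wave)
    return passed, True
```

Returning the list of devices that passed the gate gives rollback a precise scope: only the last (failed) wave and anything after it needs attention.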

Common questions

Quick answers that help during commissioning and operations.

What does orchestration need to record for audits?

Who initiated a change, which snapshot was deployed, which targets were affected, the rollout timeline, and any health/telemetry outcomes.
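
Those fields can be captured as a single immutable record; the shape and example values below are illustrative, and a real system would append such records to an immutable log.

```python
# Hypothetical audit record covering the fields listed above.
from dataclasses import dataclass

@dataclass(frozen=True)
class AuditRecord:
    initiator: str        # who initiated the change
    snapshot_id: str      # which snapshot was deployed
    targets: tuple        # which devices/sites were affected
    started_at: str       # rollout timeline (start)
    finished_at: str      # rollout timeline (end)
    outcome: str          # health/telemetry outcome, e.g. "healthy"
```

Freezing the record enforces the audit property in code: once written, an entry cannot be mutated, only superseded by a later entry.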

How do we keep orchestration safe?

Use immutable artifacts, staged rollouts, and explicit gates. Avoid “push latest everywhere”.

What is the most common orchestration failure mode?

Partial rollouts with unclear state. Make transitions explicit and ensure retries are idempotent.