Connectivity

How the edge layer connects to real equipment using protocol adapters and robust connection management.

Design intent

Use this lens when implementing Connectivity across a fleet: define clear boundaries, make changes snapshot-based, and keep operational signals observable.

  • Stability comes from correct timeouts/backoff and realistic polling rates
  • Data quality checks catch subtle protocol/encoding issues
  • Single-writer ownership reduces unsafe write conflicts

What it is

Connectivity is the bridge to the physical world: protocol drivers, device sessions, and the mapping of signals into a stable I/O model used by control logic.

Data flow

  • Device/protocol session (e.g., Modbus/OPC UA) → point mapping layer → runtime inputs/outputs
  • Runtime outputs → point mapping → protocol writes
  • Connection + error state → health events and diagnostics
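
A minimal sketch of the first flow above, assuming a generic client whose read_register call stands in for a real protocol driver (all names are illustrative, not a specific adapter API):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class PointMapping:
        point_id: str    # stable identifier consumed by control logic
        address: int     # vendor/protocol-specific register address
        scale: float     # raw-to-engineering-units multiplier
        offset: float    # applied after scaling
        units: str       # explicit units, e.g. "degC"

    def read_point(client, mapping: PointMapping) -> tuple[str, float]:
        """Read one raw value from the device session and map it into
        engineering units keyed by a stable point id."""
        raw = client.read_register(mapping.address)  # protocol-specific read
        return mapping.point_id, raw * mapping.scale + mapping.offset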

Design principles

  • Stable point identifiers so control logic doesn’t depend on vendor-specific addressing
  • Explicit scaling/units and validation to prevent subtle “works but wrong” behavior
  • Backpressure and rate limits to protect devices from overload
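
For the last principle, a sketch of a minimal per-device poll pacer; a real deployment would use whatever limiter the platform provides, but the idea is the same:

    import time

    class PollPacer:
        """Enforce a minimum interval between polls so a device is never
        asked for data faster than it can reasonably serve it."""
        def __init__(self, min_interval_s: float):
            self.min_interval_s = min_interval_s
            self._last_poll = 0.0

        def wait_turn(self) -> None:
            sleep_for = self._last_poll + self.min_interval_s - time.monotonic()
            if sleep_for > 0:
                time.sleep(sleep_for)
            self._last_poll = time.monotonic()

Calling wait_turn() before every read turns "don't overload the device" from a guideline into an enforced property.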

Architecture at a glance

  • Endpoints (protocol sessions) → points (signals) → mappings (typed bindings) → control app ports
  • Adapters isolate variable-latency protocol work from deterministic control execution paths (sketched below)
  • Validation and data-quality checks sit between “connected” and “correct”
  • This is a UI + backend + edge concern: changes affect real-world actuation
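
One way to realize the adapter-isolation point above: a background thread absorbs variable-latency protocol reads while the control path does a non-blocking read of the latest snapshot. Here read_all is a placeholder for real protocol I/O:

    import threading
    import time
    from typing import Callable, Dict, Tuple

    class SnapshotAdapter:
        """Run variable-latency protocol reads on a background thread; the
        deterministic control path only ever reads the latest snapshot."""
        def __init__(self, read_all: Callable[[], Dict[str, float]],
                     period_s: float):
            self._read_all = read_all
            self._period_s = period_s
            self._lock = threading.Lock()
            self._snapshot: Dict[str, float] = {}
            self._stamp = 0.0
            threading.Thread(target=self._poll_loop, daemon=True).start()

        def _poll_loop(self) -> None:
            while True:
                try:
                    values = self._read_all()  # may block or time out
                    with self._lock:
                        self._snapshot, self._stamp = values, time.monotonic()
                except Exception:
                    pass  # keep the last-known-good snapshot on error
                time.sleep(self._period_s)

        def latest(self) -> Tuple[Dict[str, float], float]:
            """Non-blocking read for the control loop: values plus timestamp."""
            with self._lock:
                return dict(self._snapshot), self._stamp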

Typical workflow

  • Define endpoints and point templates (units, scaling, ownership)
  • Bind points to app ports and validate types/limits
  • Commission using a canary device and verify data quality (staleness/range), as sketched below
  • Roll out with rate limits and monitoring for flaps and errors
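
A sketch of the staleness/range verification behind the canary step; the thresholds are illustrative and would normally live in the point template:

    import time

    def quality_problems(value: float, stamp: float, lo: float, hi: float,
                         max_age_s: float) -> list[str]:
        """Return the data-quality problems found for one point reading."""
        problems = []
        if time.monotonic() - stamp > max_age_s:
            problems.append("stale")  # value not refreshed recently
        if not lo <= value <= hi:
            problems.append("out-of-range")  # catches plausible-but-wrong values
        return problems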

System boundary

Treat Connectivity as a repeatable interface between engineering intent (design) and runtime reality (deployments + signals). Keep site-specific details configurable so the same design scales across sites.

Example artifact

Implementation notes (conceptual)

topic: Connectivity
plan: define -> snapshot -> canary -> expand
signals: health + telemetry + events tied to version
rollback: select known-good snapshot

Why it matters

  • Integrates existing assets without re-platforming everything
  • Isolates protocol complexity away from control logic
  • Improves resilience through reconnect/backoff strategies
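
The reconnect/backoff bullet usually means capped exponential backoff with jitter. A sketch, where connect stands in for any protocol-specific connect call:

    import random
    import time

    def connect_with_backoff(connect, base_s=0.5, cap_s=30.0, max_attempts=8):
        """Retry a connect callable with capped exponential backoff and full
        jitter, so a fleet of clients does not retry in lockstep."""
        for attempt in range(max_attempts):
            try:
                return connect()
            except ConnectionError:
                if attempt == max_attempts - 1:
                    raise  # give up after the final attempt
                time.sleep(random.uniform(0, min(cap_s, base_s * 2 ** attempt)))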

Common failure modes

  • Session flapping from network instability, aggressive polling, or device session limits
  • Protocol-level errors (timeouts, invalid responses, permission failures)
  • Timeout/backoff misconfiguration creating retry storms
  • Data quality issues (scaling, endian/encoding mismatches, stale values)
  • Backpressure issues: buffers fill, telemetry drops, or adapters stall
  • Partial outages that create inconsistent, stale, or delayed signals
  • Write conflicts when multiple systems attempt to control the same register/tag

Acceptance tests

  • Simulate network loss and verify reconnect/backoff behavior (test sketch below)
  • Load test polling rates and confirm devices are not overloaded
  • Confirm store-and-forward covers expected outage windows
  • Verify the deployed snapshot/version matches intent (no drift)
  • Run a canary validation: behavior, health, and telemetry align with expectations
  • Verify rollback works and restores known-good behavior
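
A sketch of the first test in this list, using a fake endpoint that fails a fixed number of connects before succeeding (pytest-style; it reuses the connect_with_backoff helper sketched earlier):

    def test_reconnect_after_network_loss():
        failures = {"left": 3}

        def flaky_connect():
            if failures["left"] > 0:
                failures["left"] -= 1
                raise ConnectionError("simulated network loss")
            return "session"

        # Tiny delays keep the test fast while still exercising the retry path.
        session = connect_with_backoff(flaky_connect, base_s=0.001, cap_s=0.01)
        assert session == "session"
        assert failures["left"] == 0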

In the platform

  • Protocol adapters (e.g., Modbus, OPC UA, MQTT)
  • Connection lifecycle and health signals
  • Consistent point mapping into runtime inputs/outputs

What to monitor

  • Connection uptime, reconnect frequency, and backoff patterns
  • Read/write error rates by protocol endpoint and device
  • Latency distributions (p50/p95) for reads/writes (rolling-window sketch below)
  • Staleness and out-of-range checks on critical points
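
For the latency bullet, p50/p95 can come from a rolling window of timings per endpoint; a minimal sketch of the bookkeeping a metrics library would normally do:

    from collections import deque

    class LatencyWindow:
        """Keep the most recent read/write latencies for one endpoint and
        report percentiles for dashboards or alerting."""
        def __init__(self, size: int = 1000):
            self._samples = deque(maxlen=size)

        def record(self, latency_s: float) -> None:
            self._samples.append(latency_s)

        def percentile(self, p: float) -> float:
            ordered = sorted(self._samples)
            if not ordered:
                return 0.0
            # Nearest-rank approximation; fine for operational dashboards.
            idx = min(len(ordered) - 1, int(p / 100 * len(ordered)))
            return ordered[idx]

window.percentile(50) and window.percentile(95) then feed the same dashboards that gate rollout expansion.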

Implementation checklist

  • Validate endpoint settings (timeouts, retries, backoff) per protocol
  • Ensure scaling/units are explicit and tested for critical points
  • Set rate limits/backpressure so devices aren’t overloaded
  • Monitor connection flaps and error rates per endpoint/device

Rollout guidance

  • Start with a canary site that matches real conditions
  • Use health + telemetry gates; stop expansion on regressions (gate sketch below)
  • Keep rollback to a known-good snapshot fast and rehearsed
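
A gate can be as small as a pure function over health metrics already being collected; the thresholds below are illustrative and should come from site-specific baselines:

    def should_expand(reconnects_per_hour: float, error_rate: float,
                      max_reconnects: float = 2.0,
                      max_error_rate: float = 0.01) -> bool:
        """Gate rollout expansion on observed health; False means stop
        expanding and investigate (or roll back) before continuing."""
        return (reconnects_per_hour <= max_reconnects
                and error_rate <= max_error_rate)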

Deep dive

Common questions

Quick answers that help during commissioning and operations.

Why does connectivity “flap”?

Common causes include network instability, device session limits, aggressive polling, or timeout/backoff misconfiguration. Track reconnect frequency and error codes per endpoint.

How do we prevent subtle data quality bugs?

Make scaling/units explicit, validate encoding/endian assumptions, and add staleness/out-of-range checks so “plausible but wrong” values get flagged.

What is the safest write pattern?

Single-writer ownership, explicit write intent, and guardrails (limits, interlocks). Avoid multiple systems writing the same register/tag.
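
A minimal sketch of that pattern: a single owner token gates writes, and limits are checked before anything reaches the device (names and guardrail values are illustrative):

    class WriteGuard:
        """Allow writes only from the single registered owner, and only
        within configured limits; everything else is rejected loudly."""
        def __init__(self, owner: str, lo: float, hi: float, write):
            self._owner = owner
            self._lo, self._hi = lo, hi
            self._write = write  # protocol-specific write callable

        def write(self, requester: str, value: float) -> None:
            if requester != self._owner:
                raise PermissionError(f"{requester} does not own this point")
            if not self._lo <= value <= self._hi:
                raise ValueError(f"{value} outside [{self._lo}, {self._hi}]")
            self._write(value)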