Connectivity
How the edge layer connects to real equipment using protocol adapters and robust connection management.
Design intent
Use this lens when implementing Connectivity across a fleet: define clear boundaries, make changes snapshot-based, and keep operational signals observable.
- Stability comes from correct timeouts/backoff and realistic polling rates
- Data quality checks catch subtle protocol/encoding issues
- Single-writer ownership reduces unsafe write conflicts
What it is
Connectivity is the bridge to the physical world: protocol drivers, device sessions, and mapping signals into a stable I/O model used by control logic.
Data flow
- Device/protocol session (e.g., Modbus/OPC UA) → point mapping layer → runtime inputs/outputs
- Runtime outputs → point mapping → protocol writes
- Connection + error state → health events and diagnostics
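A minimal sketch of that flow, assuming hypothetical `session` and `runtime` objects in place of a real protocol client and runtime API:

    import time

    # Sketch of one poll cycle: protocol session -> point mapping -> runtime inputs,
    # with connection errors surfaced as health events. The session/runtime calls
    # are illustrative stand-ins, not a specific SDK.
    def poll_once(session, runtime, mappings):
        for point_id, m in mappings.items():
            try:
                raw = session.read_registers(m["address"], m["count"])
                value = raw[0] * m["scale"] + m["offset"]
                runtime.publish_input(point_id, value, ts=time.time(), quality="good")
            except TimeoutError as exc:
                runtime.publish_health("read_timeout", point=point_id, detail=str(exc))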
Design principles
- Stable point identifiers so control logic doesn’t depend on vendor-specific addressing
- Explicit scaling/units and validation to prevent subtle “works but wrong” behavior
- Backpressure and rate limits to protect devices from overload
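For instance, keeping identifiers, units, and scaling explicit could look like the sketch below (field names are assumptions, not a platform schema):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class PointDef:
        point_id: str        # stable identifier used by control logic
        address: str         # vendor-specific addressing stays isolated here
        unit: str            # explicit engineering unit, e.g. "degC"
        scale: float = 1.0   # engineering value = raw * scale + offset
        offset: float = 0.0
        lo: float = float("-inf")   # expected range for validation
        hi: float = float("inf")

    supply_temp = PointDef("boiler.supply_temp", "HR:40001", "degC", scale=0.1, lo=0.0, hi=120.0)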
Architecture at a glance
- Endpoints (protocol sessions) → points (signals) → mappings (typed bindings) → control app ports
- Adapters isolate variable-latency protocol work from deterministic control execution paths
- Validation and data-quality checks sit between “connected” and “correct”
- This is a UI + backend + edge concern: changes affect real-world actuation
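One way to picture the adapter/control isolation is a latest-value snapshot that the control path reads without ever waiting on the network (a sketch, not the platform's internal design):

    import threading

    class InputSnapshot:
        """Adapter threads push values after slow protocol I/O; the control loop
        reads the latest snapshot under a short lock, never a network round trip."""
        def __init__(self):
            self._lock = threading.Lock()
            self._values = {}

        def update(self, point_id, value, ts):
            with self._lock:
                self._values[point_id] = (value, ts)

        def read(self, point_id):
            with self._lock:
                return self._values.get(point_id)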
Typical workflow
- Define endpoints and point templates (units, scaling, ownership)
- Bind points to app ports and validate types/limits
- Commission using a canary device and verify data quality (staleness/range)
- Roll out with rate limits and monitoring for flaps and errors
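A sketch of the "validate types/limits" step when binding points to app ports; the dict shapes are assumptions for illustration:

    def validate_binding(point_def, port):
        # Pre-commissioning check: units and ranges must agree before a binding
        # is accepted. Returns a list of problems (empty means OK).
        errors = []
        if point_def["unit"] != port["unit"]:
            errors.append(f"unit mismatch: {point_def['unit']} vs {port['unit']}")
        if point_def["hi"] > port["hi"] or point_def["lo"] < port["lo"]:
            errors.append("point range exceeds the port's accepted limits")
        return errors

    # Example: a 0..120 degC point bound to a port that only accepts 0..100 degC.
    print(validate_binding({"unit": "degC", "lo": 0.0, "hi": 120.0},
                           {"unit": "degC", "lo": 0.0, "hi": 100.0}))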
System boundary
Treat Connectivity as a repeatable interface between engineering intent (design) and runtime reality (deployments + signals). Keep site-specific details configurable so the same design scales across sites.
Example artifact
Implementation notes (conceptual)
topic: Connectivity
plan: define -> snapshot -> canary -> expand
signals: health + telemetry + events tied to version
rollback: select known-good snapshot
Why it matters
- Integrates existing assets without re-platforming everything
- Isolates protocol complexity away from control logic
- Improves resilience through reconnect/backoff strategies
In the platform
- Protocol adapters (e.g., Modbus, OPC UA, MQTT)
- Connection lifecycle and health signals
- Consistent point mapping into runtime inputs/outputs
Common failure modes
- Connection flapping from network quality, device session limits, or overly aggressive polling
- Protocol-level errors (timeouts, invalid responses, permissions)
- Timeout/backoff misconfiguration creating retry storms
- Data quality issues (scaling, endian/encoding mismatches, stale values)
- Backpressure problems: buffers fill, telemetry drops, or adapters stall
- Partial outages that leave signals inconsistent, stale, or delayed
- Write conflicts when multiple systems attempt to control the same register/tag
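Retry storms in particular are usually a backoff problem; a common mitigation is exponential backoff with jitter, sketched here with illustrative defaults:

    import random

    def backoff_delay(attempt, base=1.0, cap=60.0):
        # Exponential backoff with full jitter: spreads reconnect attempts out so
        # adapters don't hammer a recovering device in lockstep after an outage.
        return random.uniform(0, min(cap, base * (2 ** attempt)))

    # attempt 0 -> up to 1 s, attempt 3 -> up to 8 s, attempt 10 -> capped at 60 s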
What to monitor
- Connection uptime, reconnect frequency, and backoff patterns
- Read/write error rates by protocol endpoint and device
- Latency distributions (p50/p95) for reads/writes
- Staleness and out-of-range checks on critical points
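Staleness and range checks are simple to express once a maximum age and expected range are configured per point; a sketch with assumed thresholds:

    import time

    def check_point(value, ts, lo, hi, max_age_s, now=None):
        # Flags values that are stale or outside the expected range so
        # "plausible but wrong" data shows up in monitoring.
        now = time.time() if now is None else now
        if now - ts > max_age_s:
            return "stale"
        if not (lo <= value <= hi):
            return "out_of_range"
        return "ok"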
Implementation checklist
- Validate endpoint settings (timeouts, retries, backoff) per protocol
- Ensure scaling/units are explicit and tested for critical points
- Set rate limits/backpressure so devices aren’t overloaded
- Monitor connection flaps and error rates per endpoint/device
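For the rate-limit item, a token bucket per endpoint is one simple shape (a sketch; the rate and burst would come from device documentation or testing):

    import time

    class TokenBucket:
        # Each protocol request takes one token; callers wait when the bucket is
        # empty, which caps the request rate seen by the device.
        def __init__(self, rate_per_s, burst):
            self.rate = rate_per_s
            self.capacity = burst
            self.tokens = burst
            self.last = time.monotonic()

        def acquire(self):
            while True:
                now = time.monotonic()
                self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                time.sleep((1 - self.tokens) / self.rate)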
Rollout guidance
- Start with a canary site that matches real conditions
- Use health + telemetry gates; stop expansion on regressions
- Keep rollback to a known-good snapshot fast and rehearsed
Acceptance tests
- Simulate network loss and verify reconnect/backoff behavior
- Load test polling rates and confirm devices are not overloaded
- Confirm store-and-forward covers expected outage windows
- Verify the deployed snapshot/version matches intent (no drift)
- Run a canary validation: behavior, health, and telemetry align with expectations
- Verify rollback works and restores known-good behavior
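The network-loss test can often be expressed against a fault-injecting fake endpoint. The sketch below is illustrative, with a trivial retry loop standing in for the real adapter:

    class FlakySession:
        # Fake endpoint that fails the first N reads to simulate network loss,
        # then recovers.
        def __init__(self, failures):
            self.failures = failures
            self.attempts = 0

        def read(self):
            self.attempts += 1
            if self.attempts <= self.failures:
                raise TimeoutError("simulated network loss")
            return 42

    def read_with_retry(session, max_attempts=5):
        for _ in range(max_attempts):
            try:
                return session.read()
            except TimeoutError:
                continue
        raise RuntimeError("gave up after max_attempts")

    def test_recovers_after_outage():
        session = FlakySession(failures=3)
        assert read_with_retry(session) == 42
        assert session.attempts == 4  # three simulated failures, then success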
Common questions
Quick answers that help during commissioning and operations.
Why does connectivity “flap”?
Common causes include network instability, device session limits, aggressive polling, or timeout/backoff misconfiguration. Track reconnect frequency and error codes per endpoint.
How do we prevent subtle data quality bugs?
Make scaling/units explicit, validate encoding/endian assumptions, and add staleness/out-of-range checks so “plausible but wrong” values get flagged.
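Word order is a typical example: the same pair of 16-bit registers decodes to very different values depending on an assumption that is easy to leave implicit. A sketch that makes it explicit and testable:

    import struct

    def decode_float32(registers, word_order="big"):
        # Combine two 16-bit registers into one 32-bit float; the word order is a
        # named, testable parameter instead of a silent assumption.
        hi, lo = registers if word_order == "big" else reversed(registers)
        return struct.unpack(">f", struct.pack(">HH", hi, lo))[0]

    print(decode_float32([0x41C8, 0x0000]))            # 25.0 with big word order
    print(decode_float32([0x41C8, 0x0000], "little"))  # near-zero denormal instead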
What is the safest write pattern?
Single-writer ownership, explicit write intent, and guardrails (limits, interlocks). Avoid multiple systems writing the same register/tag.
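A sketch of that pattern: one registered owner per point, and every write clamped to configured limits before it reaches the device (names and the clamp policy are illustrative):

    class WriteGuard:
        def __init__(self, owners, limits):
            self.owners = owners   # point_id -> owning system
            self.limits = limits   # point_id -> (lo, hi)

        def write(self, writer, point_id, value, send):
            # Reject writes from non-owners; clamp the rest to safe limits.
            if self.owners.get(point_id) != writer:
                raise PermissionError(f"{writer} does not own {point_id}")
            lo, hi = self.limits[point_id]
            send(point_id, max(lo, min(hi, value)))

    guard = WriteGuard({"valve.setpoint": "control_app"},
                       {"valve.setpoint": (0.0, 100.0)})
    guard.write("control_app", "valve.setpoint", 250.0,
                send=lambda p, v: print(p, v))  # prints "valve.setpoint 100.0"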