§ 03 · Read

System architecture

How the product composes. The on-box vs off-box seam, components, data flow, segmentation, and trust model.

Frame

Conversational Factory is the product. It is the witness, the historian, the MCP server, the operator interfaces, and the end-to-end packaging that ships to a customer site and lets them talk to a packing line.

It is built on the Industrial Independence Architecture (IIA), which is a separate architectural specification governing how a sovereign-unit-per-zone industrial monitoring system is shaped. IIA is the principle. Conversational Factory is the conversational implementation of it.

On-Box vs Off-Box

The product crosses a structural seam that the IIA spec calls out explicitly: the box (the sovereign zone appliance) hosts data, capture, and the i3X surface; the conversational layer (MCP server, NL→i3X, answer composer) runs off-box — on the operator’s workstation, a broker box at broader scope, or a small adjunct host on the local network.

Why this seam exists:

  1. AI clients live where humans are. Claude Desktop on a workstation, ChatGPT in a browser, a local LLM on the GPU host. The gateway has to be near the AI client; the AI client is never on the appliance.
  2. The box has SRP discipline. Safety, Reliability, Performance govern the cell. LLM-adjacent code (model versions, prompt eval drift, token costs, vendor APIs, response variance) is IT-side complexity that doesn’t fit inside a 4 GB / 2-core / 1 TB sovereign appliance with signed-config-only mutation.
  3. No HTTP at the boundary. MCP’s canonical Streamable HTTP transport would violate IIA SL3 if hosted on the box. Stdio MCP doesn’t have this problem, but arguments (1) and (2) still apply.
  4. Independent scaling. One zone box per cell, but one workstation may serve a whole site. The gateway scales with users; the witness scales with zones.
  5. Audit cleanliness. Two ledgers — operator-side (gateway: “what was asked”) and box-side (witness: “what was retrieved”) — correlated by request_id. Different retention, different concerns.
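
The request_id correlation in reason 5 can be pictured as a join across the two ledgers. The following is a minimal sketch, assuming both ledgers are JSON Lines and that every entry carries a request_id field; the other field names (question, i3x_call) are hypothetical and chosen only for illustration.

```python
import json
from collections import defaultdict


def correlate(gateway_lines, witness_lines):
    """Join two JSONL ledgers on request_id.

    Returns {request_id: {"asked": [...], "retrieved": [...]}}.
    """
    joined = defaultdict(lambda: {"asked": [], "retrieved": []})
    for line in gateway_lines:
        entry = json.loads(line)
        joined[entry["request_id"]]["asked"].append(entry)
    for line in witness_lines:
        entry = json.loads(line)
        joined[entry["request_id"]]["retrieved"].append(entry)
    return dict(joined)


# Hypothetical ledger records; only request_id is taken from the text above.
gateway = [
    json.dumps({"request_id": "r-1",
                "question": "why did line 3 lose throughput last shift?"}),
]
witness = [
    json.dumps({"request_id": "r-1", "i3x_call": "get_history"}),
    json.dumps({"request_id": "r-1", "i3x_call": "get_findings"}),
]

trace = correlate(gateway, witness)
```

Because the join key is the only shared field, each ledger can keep its own schema and retention policy.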

Concretely:

ON-BOX (per IIA zone appliance)               OFF-BOX (workstation / broker / adjunct)
─────────────────────────────────             ────────────────────────────────────────
  services/witness/                              services/conversational-gateway/
    • Continuous capture                           • MCP server (stdio + HTTP)
    • Medallion lake (Iron→Bronze→Silver)          • NL → i3X translator
    • AssetDB (Postgres)                           • Answer composer + citations
    • RTDP mesh                                    • Operator-side audit chain
    • Operator console (HTML)                      • Read-only guardrails

  services/query-plane/                          AI CLIENT (Claude Desktop, browser,
    • i3X v1 over mTLS  ◄═══════════════════════   local LLM, MCP-aware tool)
    • Witness adapter
    • Capability flagging                        speaks → MCP / structured query

  services/dpi/  (planned)                       reaches the box via:
    • marlinspike-dpi                              • i3X over mTLS (preferred)
                                                   • Zenoh queryables (alt profile)

  Inter-box mesh, in-flight bus, attestation     Operator-side audit:
  observers, signed-config applier — all on-box.   gateway-audit.jsonl, syncable up.

The architectural seam is i3X over mTLS. Everything left of the seam (witness, query plane, DPI) runs on the appliance. Everything right of the seam (conversational gateway, AI client) runs off-box.

The Layered Stack

                   CUSTOMER / OPERATOR / AI CLIENT

                              │  natural language, MCP tools

                   ┌──────────────────────────────┐
                   │  Conversational Gateway      │  OFF-BOX
                   │  • MCP server (stdio+HTTP)   │  Runs on workstation / broker /
                   │  • NL → i3X translator       │  adjunct host
                   │  • Answer composer           │
                   │  • Operator-side audit chain │
                   │  • Read-only guardrails      │
                   └──────────────┬───────────────┘
                                  │  i3X v1 over mTLS
                                  │  (the architectural seam)

                   ┌──────────────────────────────┐
                   │  i3X Query Plane             │  ON-BOX
                   │  • Object types              │  In-tree today; folds into
                   │  • Relationships             │  witness-rust eventually
                   │  • Current values + history  │
                   │  • Subscription stream       │
                   │  • Multi-source router       │
                   └──────────────┬───────────────┘


        ┌─────────────────────────────────────────────────┐
        │             THE WITNESS (eriswitness)            │
        │  Continuous capture → DPI → medallion lake →     │
        │  asset DB → operator console → mesh / RTDP       │
        │                                                  │
        │  Iron (pcapng)  ─►  Bronze (typed events)        │
        │       │                  │                       │
        │       │                  ▼                       │
        │       │            Silver (conversations,        │
        │       │             asset edges, topology)       │
        │       │                  │                       │
        │       └──────► AssetDB (Postgres, materialized)  │
        │                                                  │
        │  + 26 reference PCAP fixtures, distributed       │
        │    asset replication, forensic hold tiers        │
        └────────────────────┬─────────────────────────────┘

                             │  packets

                   ┌──────────────────────┐
                   │   SPAN port / TAP /   │
                   │   ARP relay / OVS     │
                   │   (passive only)      │
                   └──────────────────────┘


                       ACS DOMAIN
                  PLCs, HMIs, drives, switches

Components

Witness (services/witness/)

Symlinked to ~/eriswitness/. This is the existing Python implementation: it runs in customer environments today and supplies the substrate for everything above it.

What it provides:

  • Continuous capture via tshark ring-buffer on every monitored interface (real NIC, SPAN port, OVS bridge from a MiniNet zone container). 50 MB segments × 10-file ring, zero-gap rotation.
  • 34-protocol DPI covering OT (Modbus, DNP3, IEC 104, IEC 61850 GOOSE/SV/MMS, S7comm, PROFINET, BACnet, EtherNet/IP, OPC UA, HART-IP, EtherCAT, MRP, PRP, BSAP, CIP, CODESYS, FOX, GE EGD, GE SRTP, IO-Link, KNXnet, MELSEC, OMRON FINS, PCCC, ROC) and IT (DNS, DHCP, HTTP, TLS, SNMP, SSH, FTP, NTP, MQTT, AMQP, CoAP, MDNS, NBNS, RADIUS, RDP, LDAP, Kerberos, SMB, others) protocols.
  • Frame-level integrity (Stovetop): runt/oversized frame detection, FCS validation, padding entropy analysis for covert channels, DNP3 CRC validation.
  • Stateful L2 monitoring (Bilgepump): ARP spoof detection, VLAN hopping, STP root hijacking, rogue DHCP, identity conflicts, MAC flapping.
  • ICMP threat detection (ICMPeeker): redirect detection, covert tunnel entropy analysis, suspicious type flagging.
  • Medallion lake in DuckLake/Parquet: Iron pcapng → Bronze typed events (55 protocol STRUCT columns just for DPI conversations) → Silver conversations and asset edges → Gold dashboards.
  • Distributed AssetDB in Postgres: per-collector asset tables, RTDP replication across the mesh, MAC-primary identity, OUI vendor lookup, DNS hostname enrichment, CVE correlation, finding tracking, intervention workflow.
  • Multi-tenant hierarchy: org → site → zone → subzone, scoped scans, per-org Fernet encryption at rest, audit ledger, RBAC.
  • Forensic hold tiers: per-asset, per-zone, global. Targeted PCAP retention with zstd + Fernet encryption.
  • Server-rendered HTML operator console with 47 templates, no SPA, Three.js workspace viewer, Flask + Jinja2.
  • Continuous capture pipeline that writes Iron and computes Bronze + Silver inline, no separate batch step.
  • i3X surface with multi-source router (_i3x_router.py): AssetDB source for current values, DuckLake source for history, Historian source for OT signal time-series, Sparkplug source for MQTT/Sparkplug-B publishers.
  • Subscription stream (_i3x_subscriptions.py) for SSE-based change notifications.
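
The ring-buffer figures in the first bullet (50 MB segments, 10-file ring) map onto tshark's standard -b options. The sketch below only builds the argument list and does not start a capture. It is an assumption that the witness invokes tshark this way; the interface name and output path are hypothetical.

```python
def tshark_ring_argv(interface, out_path, segment_mb=50, ring_files=10):
    """Build a tshark argv for a fixed-size ring buffer.

    tshark's -b filesize: value is given in kilobytes, so megabytes are
    converted here. The exact kB definition (1000 vs 1024 bytes) depends
    on the tshark version; 1000 is assumed.
    """
    return [
        "tshark",
        "-i", interface,
        "-b", f"filesize:{segment_mb * 1000}",
        "-b", f"files:{ring_files}",
        "-w", out_path,
    ]


def ring_capacity_mb(segment_mb=50, ring_files=10):
    """Upper bound on Iron held in the ring before the oldest file is reused."""
    return segment_mb * ring_files


argv = tshark_ring_argv("eth1", "/var/lib/witness/iron/eth1.pcapng")
```

With the stated defaults the ring holds at most 500 MB per interface, which bounds how far back raw Iron is available without a forensic hold.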

Conversational Gateway (services/conversational-gateway/)

In-tree. The genuinely new work for this product.

Responsibilities:

  • MCP server exposing i3X verbs (list_objects, get_objects, get_values, get_history, subscribe, list_assets_in_zone, get_topology, get_findings, get_baseline_deviations) as MCP tools to AI clients.
  • NL → i3X translation. A natural-language query like “why did line 3 lose throughput last shift?” expands into a sequence of i3X calls: identify line 3’s elementId → fetch its components → range-query history for the relevant signals → fetch findings in the time window → compose.
  • Answer composer. Aggregates VQT history, asset metadata, topology, findings, and baseline deviations into LLM-readable context. Returns grounded answers with citations into the audit chain.
  • Audit binding. Every conversational query writes to the operator-side audit ledger, and the i3X calls it triggers are recorded in the witness ledger under the same request_id, so an operator can ask later “why did the AI tell you that?” and trace back through the i3X calls and the underlying lake reads.
  • Read-only guardrails. Architecturally enforced. The gateway has no path to a device write, no path to AssetDB mutation, no path to baseline modification. Read-only is a property of the call surface, not a prompt directive.
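
The responsibilities above can be sketched together: a call surface on which only the read verbs exist, and a plan that expands one question into a sequence of i3X calls. This is an illustrative sketch, not the gateway's implementation. The verb names come from the MCP tool list above; the ReadOnlySurface class, the parameter names, and the plan function are hypothetical.

```python
# Verbs taken from the MCP tool list; nothing else is registered.
READ_ONLY_VERBS = frozenset({
    "list_objects", "get_objects", "get_values", "get_history", "subscribe",
    "list_assets_in_zone", "get_topology", "get_findings",
    "get_baseline_deviations",
})


class ReadOnlySurface:
    """Dispatches only registered read verbs to a backend callable."""

    def __init__(self, backend):
        self._backend = backend  # callable: (verb, params) -> result

    def call(self, verb, **params):
        if verb not in READ_ONLY_VERBS:
            # There is no write verb to fall through to.
            raise PermissionError(f"{verb!r} is not part of the call surface")
        return self._backend(verb, params)


def plan_throughput_question(line_name, window):
    """Hypothetical expansion of 'why did <line> lose throughput?'."""
    return [
        ("list_objects", {"name": line_name}),        # resolve the elementId
        ("get_objects", {"parent": line_name}),       # fetch its components
        ("get_history", {"parent": line_name, "window": window}),
        ("get_findings", {"window": window}),
    ]


calls = []
surface = ReadOnlySurface(lambda verb, params: calls.append((verb, params)))
for verb, params in plan_throughput_question("line-3", "last-shift"):
    surface.call(verb, **params)
```

A request for a verb outside the set, such as a setpoint write, fails at dispatch regardless of what the model was prompted to do.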

i3X Query Plane (services/query-plane/)

In-tree, currently the home of the canonical i3X v1 reference implementation in Rust. Consumes the witness’s i3X surface and re-exposes it under the v1 contract.

Responsibilities:

  • Object/relationship type catalog backed by schemas/i3x/v1/.
  • Address-space resolution (FQDN → element).
  • Multi-source dispatch (AssetDB, DuckLake history, polling historian, Sparkplug) — same router pattern as the Python witness.
  • /v1/info capability flags (query.history, subscribe.stream, etc.) flip based on which sources are reachable.
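
The capability-flag behaviour in the last bullet can be sketched as a pure function of which sources are reachable. The flag names query.history and subscribe.stream come from the text; the mapping from sources to flags below is an assumption made for illustration, as are the lowercase source identifiers.

```python
def capability_flags(reachable):
    """Derive /v1/info-style flags from the set of reachable sources.

    Assumed mapping (not confirmed by the spec):
      query.history    needs DuckLake or the polling historian
      subscribe.stream needs AssetDB or Sparkplug
    """
    reachable = set(reachable)
    return {
        "query.history": bool(reachable & {"ducklake", "historian"}),
        "subscribe.stream": bool(reachable & {"assetdb", "sparkplug"}),
    }


flags_full = capability_flags({"assetdb", "ducklake", "historian", "sparkplug"})
flags_degraded = capability_flags({"assetdb"})
```

A client that reads the flags before planning can avoid issuing history queries that the appliance cannot currently serve.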

Data Flow

The product is read-only at every architectural seam.

  1. Wire → Iron. The witness captures every observable frame on the monitored interfaces into Iron pcapng segments. Capture mode (Full / DPI-only / Cleartext) selects fidelity. Capture is passive — no IP stack transmit on the ACS-facing interface.
  2. Iron → Bronze. tshark dissection emits structured Bronze events: protocol transactions, asset observations, topology observations, parse anomalies, extracted artifacts. Bronze is ~27× smaller than Iron.
  3. Bronze → Silver. The witness’s silver pipeline correlates conversations, builds asset edges, computes traffic matrices, fingerprints devices, runs baselines. Silver is locally computed at every appliance from its own Bronze.
  4. Silver → AssetDB. The data_lake.at(now) materialization writes the current truth into Postgres, where the operator console and the i3X surface read it for fast point-in-time lookups.
  5. AssetDB / Lake → i3X. The witness’s multi-source router and the in-tree query plane expose the data as i3X v1 — namespaces, object types, current values, history, subscription streams.
  6. i3X → Conversational Gateway. The gateway translates natural-language queries into i3X calls, composes the responses, and serves the conversation through MCP or its own structured query API.
  7. Gateway → Audit chain. Every query, every i3X call, every composed answer is bound to the audit ledger.

Segmentation Model

This is governed by IIA. Conversational Factory inherits the model:

  • One box per zone. Each zone appliance is sovereign and complete for its scope. The witness, the lake, the asset DB, and the query plane all run on every appliance; the conversational gateway runs off-box and reaches each appliance through i3X over mTLS.
  • Box-internal partitioning. INBOUND (passive collectors, witness, lake, IDS), INTERNAL DMZ (transient message bus, audit chain head publisher), OUTBOUND (edge publisher, structured query API on mTLS, outbound tunnel agent). Default-deny conduits between zones, mTLS at every internal hop.
  • Hierarchy. Cloud (optional) → Site → Zone → Subzone → Collector. Data flows upward (Bronze + mesh state always; targeted Iron under forensic hold). Commands flow downward through the mesh (Riptide).
  • Inter-zone visibility. Profile-mediated: Sparkplug B at L1/L2, OPC UA pub/sub, mTLS structured query, Iceberg/Delta batch, depending on the level. The architecture is profile-agnostic; deployments select.
  • Mesh discovery. ARP knock (Undertow L2) for same-segment peers; ICMP (Riptide L3) for cross-subnet command channel; Historian HTTPS/QUIC for bulk Iron transport. Each fails independently.
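
The hierarchy bullet implies that a scope at one level covers everything beneath it. One way to picture that is prefix matching on a hierarchy path. This is a sketch under that assumption; the tuple representation and the identifiers are hypothetical and not taken from IIA.

```python
def covers(scope, target):
    """True when `scope` is a prefix of `target` in the hierarchy path.

    Paths are tuples ordered Site -> Zone -> Subzone -> Collector.
    An empty scope stands for the optional cloud level and covers everything.
    """
    scope, target = tuple(scope), tuple(target)
    return len(scope) <= len(target) and target[:len(scope)] == scope


collector = ("site-a", "zone-3", "subzone-1", "collector-2")

zone_scope_ok = covers(("site-a", "zone-3"), collector)
other_zone = covers(("site-a", "zone-4"), collector)
```

Here a zone-level scope for zone-3 covers the collector, while a scope for zone-4 on the same site does not.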

Trust and Security Model

  • SRP inside the cell, CIA outside. Safety, Reliability, Performance govern the ACS data plane. Confidentiality, Integrity, Availability govern information at the boundary and above.
  • No HTTP at the boundary in either direction. No HTTP listener on the ACS or IT NIC. No outbound HTTP/HTTPS — no registry pulls, no rule-feed updates, no telemetry, no CRL/OCSP. Updates arrive via signed bundles in OS updates or mTLS-tunneled deltas.
  • Configuration is a signed artifact. No live mutation API. The management UI is a text generator; the parser is the trust boundary; the applier executes a gated internal call set, then exits. A configuration attestation observer cross-checks running state against the staged artifact and emits divergence events.
  • Read-only first. No device writes anywhere in the platform. The conversational gateway has no path to a setpoint change. Architectural, not policy.
  • Contract catalog. Every communication — internal and external — is governed by an explicit data contract. Contractlessness is a deployment defect.
  • Attestation observes prevention. Network IDS doubles as contract-attestation observer; IO master cross-checks the physical substrate. Findings emit under ot.attestation.*.
  • Audit chain. Append-only, externally verifiable, externally publishable. Every operator action, every conversational query, every baseline deviation, every forensic hold writes to it.
  • Per-org Fernet encryption at rest. Reports, retained PCAPs, forensic extracts. Nothing hits disk unencrypted.
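
An append-only, externally verifiable audit chain is commonly built as a hash chain, where each entry commits to the one before it. The sketch below shows that general technique with SHA-256. Whether the product's audit chain uses this exact construction is an assumption, and the entry layout and payload fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry's predecessor


def _digest(prev_hash, payload):
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(chain, payload):
    """Append a payload, linking it to the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"prev": prev_hash, "payload": payload,
             "hash": _digest(prev_hash, payload)}
    chain.append(entry)
    return entry


def verify(chain):
    """Recompute every link; any edit or reordering breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if _digest(entry["prev"], entry["payload"]) != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True


ledger = []
append_entry(ledger, {"kind": "conversational_query", "request_id": "r-1"})
append_entry(ledger, {"kind": "forensic_hold", "scope": "zone-3"})
intact = verify(ledger)
```

Publishing only the latest hash is enough for an outside party to detect later tampering with any earlier entry.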