Agentic Message Bus — Master Plan¶
This doc is the north star for converting Acteon into a message bus specialized for agentic orchestration. It locks architectural decisions, lays out all 9 phases, and documents the model. Phase-level progress links live at the bottom.
Feature gate: the bus feature is compiled behind
acteon-server/busCargo feature (opt-in, same asswarm). Builds without the feature compile unchanged and respond to/v1/topics//v1/publish//v1/subscribe/*with503.
Problem framing¶
Acteon already has ~60–70 % of a bus latent in its machinery: an action dispatch pipeline with rules, quotas, audit, retries, DLQ, circuit breakers, signing, SSE streaming, and 5 polyglot SDKs. What it lacks for agentic workloads is durable topic-based subscriptions, typed envelopes, agent-as-actor primitives, conversation threading, and streaming replies.
Rather than reinvent Kafka, we use Kafka as the transport/canonical log and build Acteon-native primitives on top for the agentic semantics Kafka doesn't care about.
Locked architectural decisions¶
- Kafka is canonical. Messages live on Kafka topics. Acteon's audit store is a searchable projection, not a parallel source of truth.
- Schema registry is Acteon-native. JSON Schema in V1; Avro later. No dependency on Confluent Schema Registry.
- Agent inbox model. Agents share one inbox topic keyed by
agent_idrather than owning a topic each. Better partition utilization; simpler ACLs. - Conversations are a state object. Messages for a conversation land on a shared
conversations.eventstopic partitioned byconversation_idso Kafka gives us per-conversation ordering for free. - HITL gates are pre-publish. Approvals park the message in Acteon state (not Kafka) until approved, then commit via a Kafka transaction. This is the hardest corner; see "Exactly-once edge" below.
- Streaming = per-chunk Kafka records.
stream_id+chunk_seq+ terminal marker. One primitive for LLM tokens, tool-result streams, partial updates. - Actions and Chains stay. Boundary:
POST /v1/dispatch= provider-executed action (unchanged).POST /v1/publish= bus-only event (no provider). Both flow through rules and quotas; only dispatch reaches the executor.- Feature-gated.
busCargo feature; off by default.
Kafka integration model¶
| Concept | Kafka owns | Acteon owns |
|---|---|---|
| Transport | Producer / broker / consumer | Validation, rules, quotas, HITL gate at publish edge |
| Partition assignment | Consumer group rebalancing | Subscription identity (name, ACL, schema binding) |
| Offsets | __consumer_offsets | Lag reporting, replay UX |
| Retention | retention.ms / retention.bytes | Policy metadata on Topic objects |
| Schema | — | Schema objects + validation at publish edge |
| Identity / ACL | (optionally via SASL) | API-key auth, tenant scoping, per-topic ACL |
Rule of thumb: if Kafka has a built-in primitive, we use it; if Kafka is agnostic (agent identity, schema semantics, conversation grouping), Acteon owns it.
Phased roadmap¶
| Phase | Weeks | Scope |
|---|---|---|
| 1 | 1–3 | acteon-bus crate (rdkafka). Topic type + CRUD. POST /v1/publish. GET /v1/subscribe/{id} SSE bridge. bus Cargo feature. Docker Kafka profile. |
| 2 | 4–5 | Subscription + ConsumerGroup first-class types. Ack endpoint. DLQ routing. /lag endpoint. |
| 3 | 6–7 | JSON Schema registry. Publish-edge validation. SDK codecs. |
| 4 | 8–9 | Agent type — identity, capabilities, heartbeat, inbox = shared topic keyed by agent_id. |
| 5 | 10–11 | Conversation type. Per-conversation partitioning on shared events topic. Thread UI. |
| 6 | 12–13 | ToolCall / ToolResult envelopes. correlation_id / reply_to. Streaming chunks. HITL tool-call approvals. |
| 7 | 14–15 | UI: Topics, Subscriptions, Agents, Conversations, Lag dashboards. Metrics. |
| 8 | 16–17 | 5-SDK parity for bus surface (Rust, Python, Node, Go, Java). |
| 9 | 18 | Docs, migration guide, example multi-agent app, benchmarks vs raw Kafka. |
Exactly-once edge¶
The pre-publish HITL gate introduces a "park then produce" window where intent and publication must commit atomically. Design for Phase 5:
- On publish with
require_approval: write anunpublished_messagerow keyed byapproval_idunder an existing transaction with the approval row. - Approval handler, on approve, produces via a Kafka transaction that includes deleting the
unpublished_messagerow (via transactional outbox pattern — we emit a companion "outbox-committed" record and have a cleanup worker reconcile). - Idempotent producer (
enable.idempotence=true) + per-(tenant,message_id)dedup on Kafka side via a hash-based approach.
This section is deliberately deferred to Phase 5 design; Phase 1 does not need transactional producers.
Migration story¶
Phase 1 does not migrate anything. Existing users keep using /v1/dispatch with no behavior change. The bus is additive.
Long-term, once all phases ship, a migration story emerges: - Fan-out dispatches that multiple consumers need → rewrite as Topic + subscribers. - Chains that are really event-driven pipelines → rewrite as topic-to-topic flows with agents subscribing. - Tool-calling apps that use raw dispatch → rewrite as ToolCall envelopes with typed responses.
Migration will be opt-in, per-feature, never forced.
Phase status¶
- Phase 1 — in progress (PR #1)
- Phase 2–9 — not started
See phase-1 feature doc for current-state details.