
A Complete Technical Deep-Dive into Architecture, Design, and Community
Version 4.0.0 | Built on Capability-Centric Architecture 0.2
TABLE OF CONTENTS
1. Prologue: Why the World Needs Another AI Platform (Hint: It Does Not --
Until Now)
2. What Is Octopussy?
3. The Intellectual Foundation: Capability-Centric Architecture (CCA) 0.2
4. The Seven Commandments: Design Principles as Enforced Invariants
5. The Competitive Landscape: Octopussy vs. OpenClaw
6. The System-Level Capability Map: Twenty-Six Arms, One Brain
7. Startup Orchestration: Kahn's Algorithm and the Lifecycle Manager
8. The Orchestrator: The Central Nervous System
9. The AgentFactory: Agents from Markdown Files
10. The Actor Model: Every Agent Is an Island
11. The Agent State Machine: A Strict Lifecycle
12. The OctopussyMessage: The One True Envelope
13. The Messaging Architecture: NATS, Queues, and Dead Letters
14. The LLM/VLM Port Architecture: Smart Routing for Language Models
15. Teams and Multi-Agent Patterns: Divide and Conquer
16. Memory, RAG, and GraphRAG: Giving Agents a Brain
17. Security Architecture: Zero-Trust, All the Way Down
18. The Sandbox: The Bulkhead That Saves Your Server
19. Token Budget Management: Money Does Not Grow on GPUs
20. The Critique-and-Revise Pattern: Quality Before Delivery
21. Observability: The Sidecar That Watches Everything
22. MCP Integration: The Universal Tool Protocol
23. The Plugin Architecture: Hot-Loading New Powers at Runtime
24. Configuration: YAML, Markdown, and Pydantic
25. Deployment: From a Raspberry Pi to a Kubernetes Cluster
26. The Open-Source Stack: Standing on the Shoulders of Giants
27. A Walk-Through: Building Your First Agent in Five Minutes
28. Known Bugs in the Specification and How They Were Fixed
29. Call to Action: Join the Community and Build Octopussy Together
30. Epilogue: The Future of Autonomous AI Agents
FIND MORE INFORMATION IN THE GITHUB REPOSITORY OF OCTOPUSSY
1. PROLOGUE: WHY THE WORLD NEEDS ANOTHER AI PLATFORM (HINT: IT DOES NOT -- UNTIL NOW)
Let us be honest with each other for a moment. The landscape of agentic AI
frameworks is already crowded. Every week a new library appears on GitHub,
promising to let you "build autonomous agents in five lines of code." Most of
them are impressive demos that collapse under the weight of a real production
workload. They share a common set of sins: they are monolithic, they treat
security as an afterthought, they hard-code their LLM providers, they block the
event loop at the worst possible moments, and they make you write code for
things that should be configuration.
OpenClaw -- formerly known as Moltbot, and before that ClawdBot, a lineage of
rebranding that should itself raise an eyebrow -- is the current incumbent in
this space. It is a layered monolith with static agents, Linux-only deployment,
synchronous REST messaging, minimal observability, no token budget management,
no MCP support, and hard-coded extensibility. It is not a bad piece of software.
It is simply not the right piece of software for a world where AI agents are
expected to run on everything from a Raspberry Pi to a thousand-node Kubernetes
cluster, talk to any LLM on the planet, coordinate in sophisticated multi-agent
teams, and do all of this securely, observably, and without costing a fortune in
API tokens.
Octopussy is the answer to that problem. It is not yet implemented. It is a
specification -- a precise, detailed, production-grade architectural blueprint
for what the ideal agentic AI platform should look like. And that is exactly why
you should care about it right now, because a specification without a community
is just a document. Octopussy needs builders, and this article is the invitation.
Read on. By the time you reach the last page, you will understand every corner
of the architecture, every design decision, every trade-off, and every place
where your contribution would make a difference.
2. WHAT IS OCTOPUSSY?
Octopussy is a production-grade, open-source, agentic AI platform built
entirely on Capability-Centric Architecture version 0.2. It enables individuals,
teams, and organizations to deploy autonomous AI agents -- individually or in
coordinated multi-agent teams -- across any platform: macOS, Windows, Linux,
single-board computers such as the Raspberry Pi and the NVIDIA Jetson Nano,
Docker containers, and Kubernetes clusters.
The name is deliberate. An octopus has eight arms, each capable of independent
action, each with its own distributed nervous system, yet all coordinated by a
central brain. Octopussy's agents are exactly like that: autonomous, capable of
independent action, yet coordinated by a central orchestrator. The analogy holds
surprisingly well all the way down to the architecture.
The ambition of Octopussy is captured in a single sentence from its
specification: Octopussy is the operating system for autonomous AI agents. Just
as an operating system handles memory management, process scheduling, I/O, and
security so that application developers do not have to, Octopussy handles agent
lifecycle, LLM routing, memory, retrieval-augmented generation, security,
sandboxing, token budgets, scheduling, multi-agent coordination, and
observability -- so that developers can focus exclusively on agent goals and
behaviors.
Installation is designed to be frictionless. The entire setup sequence is:
pip install octopussy
octopussy install
octopussy start
octopussy status
Four commands. That is it. The platform auto-detects its deployment environment
(local workstation, Docker, Kubernetes, or single-board computer) and
configures itself accordingly. No wrestling with environment variables, no
hand-editing of JSON files, no reading of fifty pages of documentation before
you can run "Hello, World."
The version described in this article is 4.0.0. It targets Python 3.11 or
later, and it implements the MCP (Model Context Protocol) specification dated
2025-11-25, using the Streamable HTTP transport.
3. THE INTELLECTUAL FOUNDATION: CAPABILITY-CENTRIC ARCHITECTURE (CCA) 0.2
Before we can understand Octopussy, we must understand the architectural
philosophy that governs every single line of it. That philosophy is
Capability-Centric Architecture, version 0.2, hereafter referred to as CCA.
CCA is a software architecture style in which every functional unit of a system
is expressed as a "Capability." A Capability is not a class, not a service, not
a microservice, and not a module in the traditional sense. It is a precisely
structured unit of functionality that consists of three mandatory components:
the Nucleus, the Contract, and the Envelope.
3.1 The Capability Nucleus
The Nucleus is the heart of a Capability. It is itself divided into three
mandatory layers, each with a strict rule about what it may and may not contain.
The Essence layer contains pure domain logic. It has no infrastructure, no I/O,
no external dependencies, and no knowledge of how it is deployed. If you are
writing the Essence of a capability that manages agent state transitions, the
Essence contains the state machine logic -- nothing more. It does not know
whether state is stored in SQLite or Redis. It does not know whether it is
called via REST or gRPC. It simply knows the rules of state transitions.
The Realization layer contains the technical mechanisms that make the Essence
work in the real world. This is where database adapters live, where queue
implementations live, where HTTP clients live. The Realization layer knows about
infrastructure, but it does not expose that knowledge to the outside world. It
serves the Essence.
The Adaptation layer contains the external interfaces. REST adapters, gRPC
adapters, CLI adapters, MCP adapters -- these all live in the Adaptation layer.
The Adaptation layer translates between the external world's language and the
Capability's internal language. It knows about HTTP verbs and gRPC method names,
but it delegates all actual work to the Essence via the Realization.
The figure below illustrates the three-layer structure of a Capability Nucleus:
Figure 1: The CCA Capability Nucleus
-----------------------------------------------------------------------
|                         CAPABILITY NUCLEUS                          |
|                                                                     |
|  +---------------------------------------------------------------+  |
|  |                          ADAPTATION                           |  |
|  |       (REST adapters, gRPC adapters, CLI adapters, MCP)       |  |
|  |     Translates external interfaces to internal contracts      |  |
|  +-------------------------------+-------------------------------+  |
|                                  |                                  |
|                            delegates to                             |
|                                  |                                  |
|  +-------------------------------v-------------------------------+  |
|  |                          REALIZATION                          |  |
|  |     (DB adapters, queue impls, HTTP clients, file readers)    |  |
|  |         Technical mechanisms that serve the Essence           |  |
|  +-------------------------------+-------------------------------+  |
|                                  |                                  |
|                               serves                                |
|                                  |                                  |
|  +-------------------------------v-------------------------------+  |
|  |                            ESSENCE                            |  |
|  |        (Pure domain logic, no I/O, no infrastructure)         |  |
|  |                The brain of the Capability                    |  |
|  +---------------------------------------------------------------+  |
-----------------------------------------------------------------------
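To make the Essence rule concrete, here is a minimal, hypothetical sketch of the Essence of the state-transition capability described above. The states and the transition table are invented for illustration; only the "pure logic, no I/O, no infrastructure" property is taken from the specification.

```python
from enum import Enum, auto

class AgentState(Enum):
    CREATED = auto()
    RUNNING = auto()
    PAUSED = auto()
    STOPPED = auto()

# Hypothetical Essence: pure transition rules. No database, no queue,
# no knowledge of REST or gRPC -- just the rules of state transitions.
_ALLOWED: dict[AgentState, set[AgentState]] = {
    AgentState.CREATED: {AgentState.RUNNING},
    AgentState.RUNNING: {AgentState.PAUSED, AgentState.STOPPED},
    AgentState.PAUSED: {AgentState.RUNNING, AgentState.STOPPED},
    AgentState.STOPPED: set(),
}

def transition(current: AgentState, target: AgentState) -> AgentState:
    """Return the new state, or raise ValueError on an illegal transition."""
    if target not in _ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Because the Essence is pure, it can be unit-tested without any infrastructure: the Realization layer later decides where the current state is persisted.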
3.2 The Capability Contract
Every Capability publishes a formal Contract. The Contract specifies three
things precisely.
Provisions are what the Capability provides to others -- its public interface.
These are the methods that other Capabilities may call. They are expressed as
abstract Python classes (Abstract Base Classes), which means the contract is
machine-checkable, not just a comment in a README file.
Requirements are what the Capability needs from others -- its dependencies.
Every Capability declares exactly which other Capabilities it depends on. This
declaration is not optional and not informal. It is used by the system's startup
manager to compute a topologically sorted startup order, ensuring that every
Capability's dependencies are fully started before the Capability itself is
initialized.
Protocols are the interaction rules, ordering constraints, and invariants that
govern how the Capability may be used. For example, the AgentFactoryCapability's
Contract specifies that validate_agent_config_dir() must be called before
create_agent_from_config(). If validation returns any errors,
create_agent_from_config() raises an AgentConfigurationError. This is not a
suggestion. It is a protocol, and the implementation enforces it.
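A minimal sketch of how such a machine-checkable Contract could be expressed as a Python Abstract Base Class. The two method names come from the AgentFactory protocol just described; the signatures and return types are assumptions for illustration.

```python
from abc import ABC, abstractmethod

# Hypothetical Provisions contract expressed as an Abstract Base Class.
class AgentFactoryContract(ABC):
    @abstractmethod
    async def validate_agent_config_dir(self, path: str) -> list[str]:
        """Return a list of validation errors (empty list means valid)."""

    @abstractmethod
    async def create_agent_from_config(self, path: str) -> str:
        """Create an agent; only valid after a successful validation call."""

# Machine-checkable: a class that fails to implement the contract
# cannot even be instantiated -- Python raises TypeError.
class BrokenFactory(AgentFactoryContract):
    pass
```

This is what "machine-checkable, not just a comment in a README" means in practice: an incomplete implementation is rejected at instantiation time, not discovered in production.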
3.3 The Capability Envelope
The Envelope is the versioning and evolution wrapper around a Capability. It
specifies the Capability's semantic version, the minimum client version it
supports, the deprecation policy, and the migration path. This ensures that
Capabilities can evolve over time without breaking their consumers, as long as
the evolution follows the rules of the Envelope.
3.4 The CCA Lifecycle
Every Capability, without exception, goes through a mandatory six-phase
lifecycle:
instantiate --> initialize() --> inject_dependency() --> start()
                                                            |
                                                        [runtime]
                                                            |
cleanup() <-- stop() <--------------------------------------+
The instantiate phase creates the Capability object. The initialize() phase
performs internal setup that does not require other Capabilities to be ready.
The inject_dependency() phase receives all required dependencies from the
lifecycle manager, injected via dependency injection. The start() phase begins
the Capability's runtime operation. The stop() phase gracefully halts operation.
The cleanup() phase releases all resources.
Shutdown proceeds in strict reverse topological order: the Capabilities that
depend on others are stopped first, and the foundational Capabilities are
stopped last. This mirrors the startup order and ensures that no Capability
attempts to use a dependency that has already been torn down.
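The phase sequence and the reverse-order shutdown can be sketched as follows. The method names follow the lifecycle above; the base-class shape and the manager loop are illustrative assumptions, not the real lifecycle manager.

```python
import asyncio

# Minimal sketch of the six-phase lifecycle. Method names follow the
# specification; everything else here is an assumption.
class Capability:
    def __init__(self, name: str) -> None:   # phase 1: instantiate
        self.name = name
        self.phase = "instantiated"

    async def initialize(self) -> None:      # phase 2: internal setup
        self.phase = "initialized"

    def inject_dependency(self, name: str, dep: "Capability") -> None:
        setattr(self, name, dep)             # phase 3: DI

    async def start(self) -> None:           # phase 4: runtime begins
        self.phase = "started"

    async def stop(self) -> None:            # phase 5: graceful halt
        self.phase = "stopped"

    async def cleanup(self) -> None:         # phase 6: release resources
        self.phase = "cleaned"

async def run_lifecycle(caps: list[Capability]) -> list[str]:
    order: list[str] = []
    for cap in caps:                         # startup in topological order
        await cap.initialize()
        await cap.start()
        order.append(cap.name)
    for cap in reversed(caps):               # shutdown in reverse order
        await cap.stop()
        await cap.cleanup()
        order.append(cap.name)
    return order
```

Note how `reversed(caps)` encodes the rule that dependents stop before their dependencies.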
3.5 Why CCA Matters for Octopussy
CCA is not just an organizational pattern. It is an enforcement mechanism. In
Octopussy, the CapabilityRegistry rejects any registration that does not conform
to the CCA structure. Circular dependencies are detected at registration time
and rejected immediately with a CircularDependencyError. The system will not
start if the dependency graph has a cycle. This is not a runtime check that
might be triggered in production. It is a startup-time check that fails fast and
loudly.
The consequence of this discipline is that Octopussy's architecture is
self-documenting, testable at the unit level (because the Essence has no
external dependencies), and evolvable (because the Contract is the only surface
that other Capabilities depend on, not the implementation).
4. THE SEVEN COMMANDMENTS: DESIGN PRINCIPLES AS ENFORCED INVARIANTS
Octopussy is governed by seven design principles. The specification is
emphatic on one point: these are enforced invariants, not aspirational
guidelines. Any version of the code that violates one of them is defective by
definition. Let us examine each principle and its enforcement mechanism.
Principle 1: CCA-First. Every functional unit is a Capability. No exceptions.
The CapabilityRegistry enforces this by rejecting any registration that does not
conform to the CCA structure. You cannot sneak a plain class into the system and
call it a Capability. The registry will throw you out.
Principle 2: Zero-Trust by Default. Every message is signed, and every call is
authenticated and authorized. The enforcement mechanism is that the HMAC
signature field on every OctopussyMessage is of type bytes, never
Optional[bytes]. There is no code path in the system where a message can exist
without a valid signature. Verification raises an exception on failure; it does
not return a boolean that a developer might forget to check.
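The "verification raises, never returns a boolean" rule can be sketched with Python's standard hmac module. This is a hypothetical helper pair; the actual signing scheme, key management, and message canonicalization belong to the SecurityCapability and are not specified here.

```python
import hashlib
import hmac

class SignatureError(Exception):
    """Raised on any signature mismatch -- there is no False to ignore."""

def sign(payload: bytes, key: bytes) -> bytes:
    return hmac.new(key, payload, hashlib.sha256).digest()

def verify(payload: bytes, signature: bytes, key: bytes) -> None:
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # Constant-time comparison; raises instead of returning a boolean.
    if not hmac.compare_digest(expected, signature):
        raise SignatureError("message signature mismatch")
```

A caller that forgets to handle verification failure gets an unhandled exception, not a silently ignored `False`.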
Principle 3: Async Everywhere. No blocking I/O anywhere in the system. All
lifecycle and agent methods are declared as "async def." For the rare cases
where a synchronous call is unavoidable (for example, calling a legacy library
that has no async interface), the code wraps the call in
asyncio.run_in_executor(). The asyncio event loop is never blocked.
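The executor-wrapping pattern looks roughly like this. It is a generic asyncio idiom, not Octopussy code; `legacy_blocking_call` stands in for any sync-only library.

```python
import asyncio
import time

def legacy_blocking_call(x: int) -> int:
    time.sleep(0.01)   # stands in for a sync-only legacy library call
    return x * 2

async def call_legacy(x: int) -> int:
    loop = asyncio.get_running_loop()
    # Off-load to the default thread pool; the event loop keeps running
    # and other agents keep processing messages in the meantime.
    return await loop.run_in_executor(None, legacy_blocking_call, x)
```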
Principle 4: Immutable Messages. All inter-agent communication uses frozen
dataclass messages. The OctopussyMessage class is decorated with
@dataclass(frozen=True), which means that any attempt to mutate a message after
creation raises a FrozenInstanceError immediately. Agents cannot accidentally
corrupt messages that are in transit.
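The frozen-dataclass behavior can be demonstrated with a reduced stand-in for OctopussyMessage (the real envelope, described in a later section, carries many more fields):

```python
from dataclasses import FrozenInstanceError, dataclass

# Reduced, illustrative stand-in for OctopussyMessage.
@dataclass(frozen=True)
class Message:
    sender: str
    recipient: str
    payload: str
```

Any assignment to a field of a frozen instance raises FrozenInstanceError at the moment of mutation, so a message in transit cannot be corrupted by a careless handler.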
Principle 5: Deny-by-Default Security. Least privilege is enforced at every
layer. Permissions are explicit allowlists. The absence of a permission is
equivalent to a denial. There is no "allow all" mode, no debug flag that
bypasses security, and no implicit permission inheritance.
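A deny-by-default check reduces to membership in an explicit allowlist. This tiny sketch (names hypothetical) shows the shape: there is no code path that grants access by omission.

```python
class PermissionDenied(Exception):
    pass

def check_tool_permission(allowed_tools: frozenset[str], tool: str) -> None:
    # Absence from the allowlist IS the denial -- no default-allow branch.
    if tool not in allowed_tools:
        raise PermissionDenied(f"tool {tool!r} not in allowlist")
```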
Principle 6: Pluggable Everything. LLMs, vector databases, graph databases,
tools, and communication adapters are all pluggable at runtime. The
PluginCapability supports hot-reload of new Capabilities and tools without
restarting the system. This is not just a nice-to-have feature; it is a first-
class architectural requirement.
Principle 7: Observable by Design. Full metrics, traces, and logs are provided
via an OpenTelemetry sidecar. Every Capability receives an ObservabilityPort via
dependency injection. The sidecar starts before all other Capabilities, ensuring
that observability is available from the very first moment of system operation.
The table below summarizes the seven principles and their enforcement mechanisms:
Table 1: The Seven Design Principles of Octopussy
-----------------------------------------------------------------------
ID   Principle                   Enforcement Mechanism
-----------------------------------------------------------------------
P1   CCA-First                   CapabilityRegistry rejects non-CCA
                                 registrations at startup time
-----------------------------------------------------------------------
P2   Zero-Trust by Default       HMAC signature is bytes, never
                                 Optional[bytes]; verification raises
                                 on failure, never returns false
-----------------------------------------------------------------------
P3   Async Everywhere            All lifecycle and agent methods are
                                 async def; sync calls wrapped in
                                 asyncio.run_in_executor()
-----------------------------------------------------------------------
P4   Immutable Messages          OctopussyMessage is frozen=True;
                                 mutations raise FrozenInstanceError
-----------------------------------------------------------------------
P5   Deny-by-Default Security    Permissions are explicit allowlists;
                                 absence of permission = denial
-----------------------------------------------------------------------
P6   Pluggable Everything        Plugin hot-reload via
                                 PluginCapability.hot_reload_plugin()
-----------------------------------------------------------------------
P7   Observable by Design        Every Capability receives
                                 ObservabilityPort via DI; sidecar
                                 starts before all others
-----------------------------------------------------------------------
5. THE COMPETITIVE LANDSCAPE: OCTOPUSSY VS. OPENCLAW
To understand what Octopussy achieves, it helps to compare it against the
current state of the art. OpenClaw (formerly Moltbot, and before that ClawdBot) is the
platform that Octopussy is designed to supersede. The comparison is not
flattering to OpenClaw, but it is honest.
Table 2: Octopussy 4.0.0 vs. OpenClaw -- Feature Comparison
-----------------------------------------------------------------------
Dimension       OpenClaw              Octopussy 4.0.0
-----------------------------------------------------------------------
Architecture    Layered monolith      CCA 0.2: Nucleus / Contract /
                                      Envelope
-----------------------------------------------------------------------
Agent Model     Static agents         Dynamic Actor model with
                                      AgentFactory
-----------------------------------------------------------------------
Installation    Complex, multi-step   pip install octopussy &&
                                      octopussy install
-----------------------------------------------------------------------
Agent Creation  Code-first            Configuration-first (Markdown
                                      files with YAML front matter)
-----------------------------------------------------------------------
LLM Support     Limited               All local and cloud providers,
                                      GPU-aware, fallback chains
-----------------------------------------------------------------------
Security        Basic                 Zero-trust, sandboxed,
                                      encrypted secrets store,
                                      guardrails, prompt injection
                                      scanning
-----------------------------------------------------------------------
Multi-Agent     Limited patterns      Pipeline, Coordinator/Worker,
                                      Mesh+Blackboard, Tree
-----------------------------------------------------------------------
Messaging       Synchronous REST      Full async Actor queues +
                                      gRPC + NATS event bus
-----------------------------------------------------------------------
Observability   Minimal               Sidecar pattern, full metrics,
                                      traces, and logs via
                                      OpenTelemetry
-----------------------------------------------------------------------
Token Budget    None                  Per-agent, multi-period,
                                      cost-aware routing integration
-----------------------------------------------------------------------
MCP Support     None                  Full MCP 2025-11-25
                                      (Streamable HTTP), both
                                      server and client roles
-----------------------------------------------------------------------
Platforms       Linux only            macOS, Windows, Linux, SBCs,
                                      Docker, Kubernetes
-----------------------------------------------------------------------
Extensibility   Hard-coded            Full plugin architecture with
                                      runtime hot-loading
-----------------------------------------------------------------------
The differences are not cosmetic. They represent fundamentally different
architectural philosophies. OpenClaw was built to solve the problem of "how do
we get an AI agent running quickly?" Octopussy was designed to solve the problem
of "how do we run AI agents reliably, securely, and at scale, for years, across
a heterogeneous infrastructure, with a team of developers who may not all be
AI experts?"
Those are different problems, and they require different solutions.
6. THE SYSTEM-LEVEL CAPABILITY MAP: TWENTY-SIX ARMS, ONE BRAIN
Octopussy consists of exactly twenty-six Capabilities. Each one is a fully
self-contained CCA Capability with its own Nucleus, Contract, and Envelope.
They are all registered in the OctopussyCapabilityRegistry and managed by the
OctopussyCapabilityLifecycleManager (OCLM).
The figure below shows the complete system-level Capability map. The indentation
reflects the logical grouping of Capabilities, not their dependency order (which
is computed dynamically by Kahn's algorithm, as described in the next section).
Figure 2: The Octopussy System-Level Capability Map (v4.0.0)
-----------------------------------------------------------------------
OctopussySystem (v4.0.0)
|
|-- SecretsCapability [Nucleus + Contract + Envelope v4.0]
|-- ConfigurationCapability [Nucleus + Contract + Envelope v4.0]
|-- ObservabilityCapability [Nucleus + Contract + Envelope v4.0]
| [SIDECAR -- starts first]
|-- SecurityCapability [Nucleus + Contract + Envelope v4.0]
|-- MessagingCapability [Nucleus + Contract + Envelope v4.0]
|-- PluginCapability [Nucleus + Contract + Envelope v4.0]
|-- ToolRegistryCapability [Nucleus + Contract + Envelope v4.0]
|-- LLMRouterCapability [Nucleus + Contract + Envelope v4.0]
|-- MemoryCapability [Nucleus + Contract + Envelope v4.0]
|-- RAGCapability [Nucleus + Contract + Envelope v4.0]
|-- GraphRAGCapability [Nucleus + Contract + Envelope v4.0]
|-- SandboxCapability [Nucleus + Contract + Envelope v4.0]
|-- TokenBudgetCapability [Nucleus + Contract + Envelope v4.0]
|-- PromptEngineCapability [Nucleus + Contract + Envelope v4.0]
|-- ActorCapability [Nucleus + Contract + Envelope v4.0]
|-- AgentFactoryCapability [Nucleus + Contract + Envelope v4.0]
|-- AgentLifecycleManagerCapability [Nucleus + Contract + Envelope v4.0]
|-- CritiqueCapability [Nucleus + Contract + Envelope v4.0]
|-- SchedulerCapability [Nucleus + Contract + Envelope v4.0]
|-- TeamCapability [Nucleus + Contract + Envelope v4.0]
|-- CommunicationAdapterCapability [Nucleus + Contract + Envelope v4.0]
|-- UserSessionCapability [Nucleus + Contract + Envelope v4.0]
|-- SpeechCapability [Nucleus + Contract + Envelope v4.0]
|-- OrchestratorCapability [Nucleus + Contract + Envelope v4.0]
|-- CLICapability [Nucleus + Contract + Envelope v4.0]
+-- WebUICapability [Nucleus + Contract + Envelope v4.0]
-----------------------------------------------------------------------
Each of these twenty-six Capabilities is described in detail in the sections
that follow. But before we dive into individual Capabilities, we need to
understand how they are started, because the startup sequence is itself a
fascinating piece of engineering.
7. STARTUP ORCHESTRATION: KAHN'S ALGORITHM AND THE LIFECYCLE MANAGER
The OctopussyCapabilityLifecycleManager (OCLM) is responsible for starting all
twenty-six Capabilities in the correct order. "Correct order" means that every
Capability's dependencies are fully started before the Capability itself begins
its initialize() phase.
The OCLM computes this order using Kahn's algorithm for topological sorting.
Kahn's algorithm works by repeatedly identifying Capabilities that have no
unresolved dependencies (their "in-degree" is zero), starting them, and then
removing them from the dependency graph, which may reduce the in-degree of other
Capabilities to zero, making them eligible for starting in the next round.
The figure below illustrates a simplified view of the dependency resolution
process for a subset of Capabilities:
Figure 3: Simplified Dependency Graph and Topological Startup Order
-----------------------------------------------------------------------
SecretsCapability      ConfigurationCapability      ObservabilityCapability
        |                     |                               |
        v                     v                               |
SecurityCapability      MessagingCapability                   |
        |        \            |                               |
        v         v           v                               |
LLMRouterCapability      MemoryCapability                     |
        |                     |                               |
        v                     v                               |
AgentFactoryCapability <------+                               |
        |                                                     |
        v                                                     |
AgentLifecycleManagerCapability                               |
        |                                                     |
        v                                                     |
OrchestratorCapability <--------------------------------------+
        |
        v
CLICapability / WebUICapability
Topological startup order (simplified):
1. ObservabilityCapability (sidecar, no dependencies)
2. SecretsCapability (no dependencies)
3. ConfigurationCapability (no dependencies)
4. SecurityCapability (depends on Secrets, Configuration)
5. MessagingCapability (depends on Secrets, Configuration)
6. LLMRouterCapability (depends on Security, Configuration)
7. MemoryCapability (depends on Configuration)
8. RAGCapability (depends on Memory)
9. SandboxCapability (depends on Security, Configuration)
10. TokenBudgetCapability (depends on Configuration)
11. AgentFactoryCapability (depends on LLMRouter, Memory, RAG, ...)
12. AgentLifecycleManager (depends on AgentFactory, Messaging)
13. OrchestratorCapability (depends on almost everything)
14. CLICapability (depends on Orchestrator)
15. WebUICapability (depends on Orchestrator)
-----------------------------------------------------------------------
If a circular dependency is detected during registration -- for example, if
Capability A declares that it requires Capability B, and Capability B declares
that it requires Capability A -- the registry immediately raises a
CircularDependencyError and refuses to proceed. The system will not start in an
inconsistent state.
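Kahn's algorithm, including the cycle check that backs CircularDependencyError, can be sketched in a few lines. The dependency mapping below is illustrative; the real OCLM reads Requirements from each Capability's Contract.

```python
from collections import deque

class CircularDependencyError(Exception):
    pass

def startup_order(deps: dict[str, set[str]]) -> list[str]:
    """Kahn's algorithm over {capability: set(its dependencies)}."""
    in_degree = {cap: len(d) for cap, d in deps.items()}
    dependents: dict[str, list[str]] = {cap: [] for cap in deps}
    for cap, d in deps.items():
        for dep in d:
            dependents[dep].append(cap)
    # Start with every capability that has no unresolved dependencies.
    ready = deque(sorted(cap for cap, n in in_degree.items() if n == 0))
    order: list[str] = []
    while ready:
        cap = ready.popleft()
        order.append(cap)
        for dependent in dependents[cap]:   # "remove" cap from the graph
            in_degree[dependent] -= 1
            if in_degree[dependent] == 0:
                ready.append(dependent)
    if len(order) != len(deps):             # leftovers imply a cycle
        raise CircularDependencyError("dependency graph contains a cycle")
    return order
```

Note that the cycle check falls out of the algorithm for free: if any node is never drained to in-degree zero, the output is shorter than the input, and the system refuses to start.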
The ObservabilityCapability is a special case. It is started first, before all
other Capabilities, regardless of what Kahn's algorithm would otherwise compute.
This is because every other Capability needs observability from the moment it
begins its initialize() phase. The sidecar must be watching before the first
event occurs.
Shutdown proceeds in strict reverse topological order. The Capabilities that
were started last are stopped first. This ensures that no Capability attempts to
use a dependency that has already been torn down.
8. THE ORCHESTRATOR: THE CENTRAL NERVOUS SYSTEM
The OrchestratorCapability is the single user-facing entry point for all agent
creation, team formation, task dispatch, and system management. Every external
interaction -- whether it arrives via the REST API, the gRPC interface, the
WebSocket connection, the CLI, the WebUI, or a communication adapter like Slack
or Teams -- is funneled through the OrchestratorCapability.
The figure below shows the internal structure of the OrchestratorCapability,
following the CCA Nucleus pattern:
Figure 4: OrchestratorCapability -- CCA Nucleus Structure
-----------------------------------------------------------------------
OrchestratorCapability
|
+-- ESSENCE
|   |-- TaskAnalyzer        Determines whether a task requires a single
|   |                       agent or a team, and which pattern to use
|   |-- AgentSelector       Selects or creates the best agent for a task
|   |-- TaskDispatcher      Routes tasks to agents via signed
|   |                       OctopussyMessages
|   |-- ResultAggregator    Collects and synthesizes multi-agent results
|   +-- SystemMonitor       Monitors all agent states, health, and budgets
|
+-- REALIZATION
|   |-- RESTAPIServer       FastAPI server on port 47200 (mTLS)
|   |-- gRPCServer          gRPC server on port 47201 (mTLS)
|   |-- WebSocketServer     WebSocket server on port 47202 (mTLS)
|   +-- AgentGraphRenderer  Renders the agent graph in ASCII, SVG, or JSON
|
+-- ADAPTATION
    |-- OrchestratorRESTAdapter  Maps HTTP requests to TaskDispatcher
    |-- OrchestratorGRPCAdapter  Maps gRPC calls to TaskDispatcher
    +-- OrchestratorWSAdapter    WebSocket streaming for real-time updates
-----------------------------------------------------------------------
The OrchestratorContract defines the public interface of the Orchestrator. The
most important methods are submit_task(), create_agent(), create_team(),
get_task_result(), get_system_status(), render_agent_graph(), and list_agents().
Notice that submit_task() returns a TaskHandle immediately, without waiting for
the task to complete. This is a deliberate design choice. Octopussy is an async
system. Tasks are dispatched and results are retrieved asynchronously. A caller
that needs the result polls get_task_result() with the TaskHandle, or subscribes
to a WebSocket stream for real-time updates.
The Orchestrator exposes its services on four network ports, all protected by
mutual TLS (mTLS):
Port 47200: REST API (FastAPI, HTTPS)
Port 47201: gRPC (binary protocol, mTLS)
Port 47202: WebSocket (real-time streaming, mTLS)
Port 47203: MCP server (Model Context Protocol, Streamable HTTP, mTLS)
All four ports require mTLS. There is no unencrypted endpoint, not even for
health checks on a loopback interface.
9. THE AGENTFACTORY: AGENTS FROM MARKDOWN FILES
One of the most distinctive features of Octopussy is its configuration-first
approach to agent creation. In OpenClaw and most other frameworks, creating an
agent means writing code. In Octopussy, creating an agent means writing
Markdown files.
The AgentFactoryCapability reads agent configuration directories, validates all
files, constructs immutable AgentSpec objects, resolves all dependencies,
provisions sandboxes, registers token budgets, and starts agent message loops.
It is the Abstract Factory for all agents.
9.1 The Agent Configuration Directory
Every agent is defined by a directory of files. The directory structure is:
/etc/octopussy/agents/
+-- my_research_agent/
    |-- goal.md           (Required) The agent's goal and system prompt
    |-- traits.md         (Required) Personality, tone, and behavioral traits
    |-- permissions.md    (Required) Allowed tools, sandbox limits, network
    |-- intelligence.md   (Required) LLM provider, model, routing strategy
    |-- scheduler.md      (Optional) For DAEMON agents: cron schedule
    |-- rag.md            (Optional) RAG configuration and document sources
    |-- graph_rag.md      (Optional) GraphRAG knowledge graph configuration
    +-- mcp_servers.json  (Optional) External MCP tool servers to connect to
Each file uses YAML front matter for structured configuration and Markdown body
text for human-readable descriptions. The YAML front matter is parsed by the
MarkdownFileReader using the mistune library, and the resulting data is
validated by Pydantic before being assembled into an AgentSpec.
The AgentSpec is an immutable frozen dataclass. Once created, it cannot be
modified. This is a deliberate choice: an agent's specification is fixed at
creation time. If you want to change an agent's behavior, you update the
configuration files and create a new agent.
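For illustration, here is a minimal front-matter splitter using only the standard library. The real MarkdownFileReader uses mistune and validates the result with Pydantic, neither of which is reproduced in this sketch.

```python
import re

# Matches a leading "---" fenced YAML block followed by the Markdown body.
_FRONT_MATTER = re.compile(r"\A---\n(.*?)\n---\n(.*)\Z", re.DOTALL)

def split_front_matter(text: str) -> tuple[str, str]:
    """Return (yaml_front_matter, markdown_body); raise if the block is missing."""
    match = _FRONT_MATTER.match(text)
    if match is None:
        raise ValueError("missing YAML front matter block")
    return match.group(1), match.group(2)
```

The YAML half would then be parsed and handed to a Pydantic model, so a typo in a configuration file fails validation before any agent is created.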
9.2 The AgentFactoryCapability Nucleus
Figure 5: AgentFactoryCapability -- CCA Nucleus Structure
-----------------------------------------------------------------------
AgentFactoryCapability
|
+-- ESSENCE
|   |-- ConfigurationParser  Parses Markdown + YAML front matter files
|   |-- AgentSpecBuilder     Constructs validated AgentSpec objects
|   |                        (Builder pattern)
|   |-- DependencyResolver   Resolves LLM ports, memory, RAG, tools
|   +-- AgentValidator       Validates the complete spec before creation
|
+-- REALIZATION
|   |-- MarkdownFileReader   Reads .md files with YAML front matter
|   |-- JSONFileReader       Reads .json configuration files
|   |-- AgentSpecCache       Caches validated specs to avoid re-parsing
|   +-- ConfigurationStore   Persists agent config metadata to SQLite
|
+-- ADAPTATION
    +-- AgentFactoryAdapter  Exposes the factory via OrchestratorContract
-----------------------------------------------------------------------
The AgentFactoryContract specifies a critical protocol: validate_agent_config_dir()
must be called before create_agent_from_config(). If validation returns any
errors, create_agent_from_config() raises an AgentConfigurationError. This
fail-fast approach means that configuration errors are caught before any
resources are allocated, before any sandboxes are provisioned, and before any
LLM connections are established.
9.3 The AgentSpec Data Model
The AgentSpec is composed of several nested frozen dataclasses. The most
important sub-specifications are:
LLMProviderSpec defines a single LLM provider: the provider name (such as
"openai", "anthropic", or "ollama"), the model identifier, the temperature, the
maximum token count, the timeout, the base URL for local providers, and the
name of the API key secret stored in SecretsCapability. Crucially, the API key
is never stored in the AgentSpec itself. Only the name of the secret is stored,
and the actual key is retrieved from SecretsCapability at runtime.
LLMIntelligenceSpec defines the agent's intelligence configuration: a primary
LLMProviderSpec, an optional fallback LLMProviderSpec, a routing strategy
(COST, QUALITY, BALANCED, LOCAL, or CLOUD), and optional VLM (Vision Language
Model) configuration.
SandboxSpec defines the agent's sandbox constraints: the sandbox backend
(Docker, gVisor, seccomp, or macOS sandbox), CPU limit in cores, memory limit
in megabytes, disk limit in megabytes, and whether network access is enabled.
Network access is disabled by default. An agent that needs network access must
explicitly list the allowed hosts.
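The sub-specifications above can be sketched as frozen dataclasses. The field subsets shown here are illustrative, not the full specs; defaults are assumptions for the sketch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LLMProviderSpec:
    provider: str                  # "openai", "anthropic", "ollama", ...
    model: str
    temperature: float = 0.7
    max_tokens: int = 4096
    api_key_secret_name: Optional[str] = None  # name of the secret, never the key itself

@dataclass(frozen=True)
class SandboxSpec:
    backend: str = "docker"        # docker | gvisor | seccomp | macos
    cpu_cores: float = 1.0
    memory_mb: int = 512
    disk_mb: int = 1024
    network_enabled: bool = False  # network access is disabled by default
    allowed_hosts: tuple[str, ...] = ()
```

Because the dataclasses are frozen, any attempt to mutate a spec after construction raises `dataclasses.FrozenInstanceError`, which is exactly the immutability guarantee the architecture relies on.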
10. THE ACTOR MODEL: EVERY AGENT IS AN ISLAND
Every Octopussy agent is an Actor. This is not a metaphor. It is a precise
architectural statement derived from the Actor Model of computation, originally
described by Carl Hewitt in 1973 and subsequently popularized by Erlang and
Akka.
In the Actor Model, an actor is an autonomous entity that:
- Communicates exclusively via messages (never via direct method calls)
- Owns its state exclusively (no shared mutable state with other actors)
- Processes messages one at a time from its inbox
- Can create new actors, send messages, and determine how to respond to the
next message it receives
Octopussy's OctopussyAgent implements all four of these properties:
Every agent communicates exclusively via OctopussyMessage. There are no direct
method calls between agents. If Agent A needs Agent B to do something, Agent A
sends a signed OctopussyMessage to Agent B's inbox. Agent B processes the
message asynchronously and sends a result back via another OctopussyMessage.
Every agent owns its state exclusively. No two agents share a mutable data
structure. The only shared data store is the Blackboard in the Mesh+Blackboard
team pattern, and even that is isolated per team and accessed via a controlled
interface.
Every agent processes messages asynchronously from its PriorityInboundQueue,
which is an asyncio.PriorityQueue. Messages are prioritized by their
MessagePriority field (LOW, NORMAL, HIGH, CRITICAL), so that control messages
(like STOP or PAUSE) can preempt task messages.
No agent ever blocks the asyncio event loop. All I/O operations are awaited, and
any synchronous calls are offloaded to a thread pool via loop.run_in_executor().
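The priority-preemption behavior of the inbox can be sketched with a plain `asyncio.PriorityQueue`. This is a toy illustration, assuming priorities are modeled as an `IntEnum` whose lower values sort first; the monotonic sequence number is a standard tie-breaker that keeps FIFO order within a priority level.

```python
import asyncio
import itertools
from enum import IntEnum

class MessagePriority(IntEnum):
    # Lower value sorts first in a PriorityQueue, so CRITICAL preempts LOW.
    CRITICAL = 0
    HIGH = 1
    NORMAL = 2
    LOW = 3

async def demo() -> list[str]:
    inbox: asyncio.PriorityQueue = asyncio.PriorityQueue()
    seq = itertools.count()  # tie-breaker: FIFO within the same priority
    await inbox.put((MessagePriority.LOW, next(seq), "task"))
    await inbox.put((MessagePriority.CRITICAL, next(seq), "STOP"))
    order = []
    while not inbox.empty():
        _, _, payload = await inbox.get()
        order.append(payload)
    return order
```

Even though the STOP control message was enqueued second, it is dequeued first, which is how a control command can preempt a backlog of task messages.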
10.1 Concrete Agent Types
Octopussy provides three concrete agent implementations, all derived from the
abstract OctopussyAgent base class:
StatelessOneShot agents execute a single task and terminate. They are used for
research, code generation, and one-off analysis tasks. After processing one
message, the agent sets its internal running flag to False and exits its message
loop gracefully.
StatefulLoop agents run indefinitely, maintaining conversation history across
messages. They are used for conversational agents and monitoring agents. They
continue processing messages until they receive a STOP control command.
ScheduledTaskDaemon agents run indefinitely on a schedule. They are triggered
by the SchedulerCapability according to a cron expression or an interval defined
in their scheduler.md configuration file. They are used for monitoring,
periodic reporting, and automated workflows.
10.2 The ActorCapability Nucleus
Figure 6: ActorCapability -- CCA Nucleus Structure
-----------------------------------------------------------------------
ActorCapability
|
+-- ESSENCE
| |-- OctopussyAgent (ABC) Abstract base for all agents
| |-- AgentStateMachine Manages agent state transitions
| |-- MessageDispatcher Routes incoming messages to handlers
| |-- TokenBudgetMonitor Tracks input/output token consumption
| |-- HeartbeatEmitter Emits heartbeats every 30 seconds
| |-- LoopGuard Prevents infinite loops (iteration ceiling)
| |-- ConversationMemory Short-term chat history per session
| +-- ReflectionEngine Self-reflection on outputs (ReAct pattern)
|
+-- REALIZATION

| |-- PriorityInboundQueue asyncio.PriorityQueue, coroutine-safe
| |-- OutboundQueue asyncio.Queue for sending messages
| |-- StateStore Persistent state (SQLite or Redis)
| |-- LLMPortAdapter Wraps LLMPort with retry + circuit breaker
| |-- VLMPortAdapter Wraps VLMPort with retry + circuit breaker
| |-- ToolExecutor Executes tools with permission check
| |-- MCPClientAdapter MCP 2025-11-25 client for tool calls
| +-- CacheStore Response cache (Redis)
|
+-- ADAPTATION
|-- InboundQueueAdapter Receives OctopussyMessages from bus
+-- OutboundQueueAdapter Sends OctopussyMessages to bus
-----------------------------------------------------------------------
10.3 The Message Loop
The heart of every agent is its _message_loop() method. This method implements
the Template Method pattern: it defines the fixed algorithm for processing
messages, and subclasses override only the process_task() method to implement
agent-specific behavior.
The message loop algorithm, in plain English, is as follows. The agent waits
for a message from its PriorityInboundQueue. When a message arrives, the agent
first checks whether enough time has passed since the last heartbeat; if so, it
emits a heartbeat. Then it verifies the message's HMAC-SHA256 signature. If the
signature is invalid, the message is logged, audited, and dropped -- the agent
does not process it. Then the agent checks whether the message has expired (its
TTL has elapsed). If it has, the message is dropped. If the message is a CONTROL
message (PAUSE, RESUME, STOP, SLEEP, WAKE, or STATUS), it is handled by the
control handler. Otherwise, the agent scans the message payload for prompt
injection. If injection is detected, the message is blocked. Then the agent
checks its token budget. If the budget is exhausted, the message is blocked.
Finally, the agent calls process_task() to perform the actual work.
The figure below shows the message loop as a flow diagram:
Figure 7: The OctopussyAgent Message Loop
-----------------------------------------------------------------------
+------------------+
| Wait for message |
| from Priority |
| InboundQueue |
+--------+---------+
|
v
+------------------+
| Heartbeat due? |--YES--> emit_heartbeat() --> continue
+--------+---------+
| NO
v
+------------------+
| verify_signature |--FAIL--> log + audit + DROP
+--------+---------+
| OK
v
+------------------+
| is_expired()? |--YES--> log + DROP
+--------+---------+
| NO
v
+------------------+
| CONTROL message? |--YES--> _handle_control() --> continue
+--------+---------+
| NO
v
+------------------+
| scan_prompt() |--INJECT--> log + audit + BLOCK
+--------+---------+
| CLEAN
v
+------------------+
| check_budget() |--EXHAUSTED--> log + BLOCK
+--------+---------+
| OK
v
+------------------+
| process_task() |
| (subclass impl) |
+--------+---------+
|
v
+------------------+
| Send result via |
| OutboundQueue |
+------------------+
|
+---> back to top
-----------------------------------------------------------------------
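The Template Method structure of the loop above can be sketched as follows. This is a simplified stand-in: the guard checks are trivial placeholders, and the loop drains a finite queue rather than running forever, but the shape (fixed algorithm in the base class, `process_task()` supplied by the subclass) matches the description.

```python
import asyncio
from abc import ABC, abstractmethod

class BaseAgent(ABC):
    """Template Method: the loop is fixed; subclasses implement only process_task()."""

    def __init__(self):
        self.inbox: asyncio.Queue = asyncio.Queue()

    # --- placeholder hooks standing in for the real checks described above ---
    def verify_signature(self, msg: dict) -> bool:
        return bool(msg.get("signature"))

    def is_expired(self, msg: dict) -> bool:
        return msg.get("expired", False)

    def scan_prompt(self, msg: dict) -> bool:
        return "ignore previous" not in str(msg.get("payload", ""))

    def check_budget(self) -> bool:
        return True

    @abstractmethod
    async def process_task(self, msg: dict) -> str: ...

    async def _message_loop(self) -> list[str]:
        results = []
        while not self.inbox.empty():
            msg = await self.inbox.get()
            if not self.verify_signature(msg):   # log + audit + DROP
                continue
            if self.is_expired(msg):             # log + DROP
                continue
            if not self.scan_prompt(msg):        # log + audit + BLOCK
                continue
            if not self.check_budget():          # log + BLOCK
                continue
            results.append(await self.process_task(msg))
        return results

class EchoAgent(BaseAgent):
    async def process_task(self, msg: dict) -> str:
        return msg["payload"]
```

Note that an unsigned or expired message never reaches `process_task()`; the guards run in the fixed order shown in the figure.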
11. THE AGENT STATE MACHINE: A STRICT LIFECYCLE
Every agent has a state, and that state can only change according to a strict
set of rules enforced by the AgentStateMachine. The state machine has nine
states:
CREATED is the initial state. The agent object has been instantiated but has not
yet been initialized.
INITIALIZING is the state during which the agent is performing its setup,
resolving its dependencies, and connecting to its LLM provider.
IDLE is the state in which the agent is fully initialized and waiting for
messages. This is the "ready" state.
PROCESSING is the state in which the agent is actively working on a task.
PAUSED is the state in which the agent has been explicitly paused. It will not
process new task messages until it is resumed.
SLEEPING is the state in which a DAEMON agent is between scheduled executions.
STOPPING is the transient state during which the agent is performing its
shutdown sequence.
STOPPED is the terminal state. An agent in this state cannot be restarted.
ERROR is the state into which an agent transitions when it encounters an
unhandled exception. An agent in ERROR state can be recovered (transitioning
back to IDLE) or stopped.
The valid state transitions are:
Table 3: Agent State Machine -- Valid Transitions
-----------------------------------------------------------------------
From State To State Trigger
-----------------------------------------------------------------------
CREATED INITIALIZING initialize() called
INITIALIZING IDLE initialization complete
INITIALIZING ERROR initialization failed
IDLE PROCESSING task message received
IDLE PAUSED pause() control command received
IDLE SLEEPING sleep() control command received
IDLE STOPPING stop() control command received
PROCESSING IDLE task complete
PROCESSING STOPPING stop() during task
PROCESSING ERROR unhandled exception
PAUSED IDLE resume() control command received
PAUSED STOPPING stop() control command received
SLEEPING IDLE wake() control command received
SLEEPING STOPPING stop() control command received
STOPPING STOPPED cleanup complete
ERROR STOPPING stop() control command received
ERROR IDLE error recovery successful
-----------------------------------------------------------------------
Any attempt to perform a transition that is not in this table raises an
InvalidStateTransitionError immediately. There is no "just try it and see what
happens" in Octopussy's agent lifecycle.
The figure below shows the state machine as a diagram:
Figure 8: Agent State Machine
-----------------------------------------------------------------------
+----------+
| CREATED |
+----+-----+
| initialize()
v
+---------------+
+--->| INITIALIZING |---error---> ERROR
| +-------+-------+
| | complete
| v
resume() -----+---------> IDLE <---------+
| / | \ |
| / | \ |
| pause() task sleep() |
| / | \ |
| v v v |
| PAUSED PROCESSING SLEEPING
| | | \ |
| | done \ error |
| | / \ \ |
| | / \ v |
| | v STOPPING ERROR
| +-->| | |
| | cleanup |
| v | |
| STOPPED <------+ |
+-------------------------------+
recover
-----------------------------------------------------------------------
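Table 3 translates directly into a transition table, and the fail-fast enforcement can be sketched in a few lines. The class and exception names follow the text; the set-of-pairs encoding is an implementation choice for the sketch.

```python
class InvalidStateTransitionError(Exception):
    pass

# The (from_state, to_state) pairs of Table 3.
VALID_TRANSITIONS = {
    ("CREATED", "INITIALIZING"),
    ("INITIALIZING", "IDLE"), ("INITIALIZING", "ERROR"),
    ("IDLE", "PROCESSING"), ("IDLE", "PAUSED"),
    ("IDLE", "SLEEPING"), ("IDLE", "STOPPING"),
    ("PROCESSING", "IDLE"), ("PROCESSING", "STOPPING"), ("PROCESSING", "ERROR"),
    ("PAUSED", "IDLE"), ("PAUSED", "STOPPING"),
    ("SLEEPING", "IDLE"), ("SLEEPING", "STOPPING"),
    ("STOPPING", "STOPPED"),
    ("ERROR", "STOPPING"), ("ERROR", "IDLE"),
}

class AgentStateMachine:
    def __init__(self):
        self.state = "CREATED"

    def transition(self, to_state: str) -> None:
        # Any transition not in the table is rejected immediately.
        if (self.state, to_state) not in VALID_TRANSITIONS:
            raise InvalidStateTransitionError(f"{self.state} -> {to_state}")
        self.state = to_state
```

A `CREATED -> IDLE` jump, for example, raises immediately: the agent must pass through INITIALIZING.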
12. THE OCTOPUSSYMESSAGE: THE ONE TRUE ENVELOPE
In Octopussy, there is exactly one way for agents and Capabilities to
communicate with each other: the OctopussyMessage. There are no direct method
calls between agents. There are no shared queues of raw strings. There are no
untyped dictionaries passed around. Every piece of communication is wrapped in
an OctopussyMessage.
The OctopussyMessage is a frozen dataclass with the following fields:
message_id: A UUID4 uniquely identifying this message.
sender_id: The UUID of the sending agent or Capability.
recipient_id: The UUID of the receiving agent or Capability.
message_type: One of TASK, RESULT, CONTROL, HEARTBEAT, ERROR, SYSTEM,
TOOL_CALL, or TOOL_RESULT.
payload: A dictionary of string keys to arbitrary JSON-serializable
values. The content of the task or result.
signature: A bytes object containing the HMAC-SHA256 signature of the
message. This field is never None and never empty.
trace_id: A UUID4 for distributed tracing, linking all messages in a
single task chain.
priority: One of LOW, NORMAL, HIGH, or CRITICAL.
ttl_seconds: The number of seconds before this message expires. Default
is 300 seconds (five minutes).
timestamp_utc: The Unix timestamp (UTC) at which the message was created.
version: Always "4.0". Used for protocol compatibility checking.
The OctopussyMessage is created via a class method called create(), which
computes the HMAC-SHA256 signature over the canonical JSON representation of
the message and stores it in the signature field. The __post_init__ method
validates that the signature is non-empty bytes, providing a defensive invariant
that catches any code that attempts to create an OctopussyMessage without going
through the create() factory method.
The HMAC comparison in verify_signature() uses hmac.compare_digest(), which is
a constant-time comparison function. This prevents timing attacks, where an
attacker could infer information about the correct signature by measuring how
long the comparison takes.
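The create-then-verify signing flow can be sketched with the standard library alone. This is a reduced stand-in for OctopussyMessage carrying only the fields the signing needs; the canonical-JSON key names here are assumptions for the sketch.

```python
import hashlib
import hmac
import json
import time
import uuid
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    sender_id: str
    recipient_id: str
    payload: dict
    message_id: str
    timestamp_utc: float
    signature: bytes

    def __post_init__(self):
        # Defensive invariant: a message can never exist unsigned.
        if not isinstance(self.signature, bytes) or not self.signature:
            raise ValueError("signature must be non-empty bytes")

    @staticmethod
    def _digest(sender, recipient, payload, message_id, ts, secret: bytes) -> bytes:
        # Canonical JSON: sorted keys, no whitespace, so both sides hash identical bytes.
        canonical = json.dumps(
            {"sender": sender, "recipient": recipient, "payload": payload,
             "id": message_id, "ts": ts},
            sort_keys=True, separators=(",", ":"),
        ).encode()
        return hmac.new(secret, canonical, hashlib.sha256).digest()

    @classmethod
    def create(cls, sender, recipient, payload, secret: bytes) -> "Message":
        mid, ts = str(uuid.uuid4()), time.time()
        sig = cls._digest(sender, recipient, payload, mid, ts, secret)
        return cls(sender, recipient, payload, mid, ts, sig)

    def verify_signature(self, secret: bytes) -> bool:
        expected = self._digest(self.sender_id, self.recipient_id, self.payload,
                                self.message_id, self.timestamp_utc, secret)
        # Constant-time comparison prevents timing attacks.
        return hmac.compare_digest(self.signature, expected)
```

Any attempt to construct the dataclass directly with an empty signature trips `__post_init__`, which is exactly the defensive invariant described above.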
The message types serve distinct purposes. TASK messages carry work to be done.
RESULT messages carry the output of completed work. CONTROL messages carry
lifecycle commands (PAUSE, RESUME, STOP, SLEEP, WAKE, STATUS). HEARTBEAT
messages are emitted by agents every thirty seconds to signal that they are
alive. ERROR messages carry error information. SYSTEM messages carry system-
level notifications. TOOL_CALL and TOOL_RESULT messages carry tool invocation
requests and responses.
13. THE MESSAGING ARCHITECTURE: NATS, QUEUES, AND DEAD LETTERS
The MessagingCapability provides the message bus that connects all agents and
Capabilities. It is backed by NATS.io, a high-performance, cloud-native
messaging system. NATS was chosen for its exceptional throughput, low latency,
and native support for the publish-subscribe pattern that the Actor Model
requires.
The figure below shows the complete message flow from a sending agent to a
receiving agent:
Figure 9: The Octopussy Messaging Architecture
-----------------------------------------------------------------------
Sender Agent
|
| OctopussyMessage.create(secret=hmac_secret)
v
OutboundQueue (asyncio.Queue)
|
v
OutboundQueueAdapter
|
v
MessagingCapability (NATS.io event bus)
|
v (route by recipient_id)
InboundQueueAdapter
|
v
PriorityInboundQueue (asyncio.PriorityQueue, keyed by MessagePriority)
|
v
OctopussyAgent._message_loop()
|
|-- verify_signature() --> DROP if invalid
|-- is_expired() --> DROP if expired
|-- CONTROL --> _handle_control()
|-- scan_prompt() --> BLOCK if injection detected
|-- check_budget() --> BLOCK if exhausted
+-- process_task() --> result
-----------------------------------------------------------------------
The MessagingCapability also maintains a Dead Letter Queue (DLQ). Messages that
cannot be delivered after all retry attempts are placed in the DLQ rather than
silently discarded. The DLQ can be inspected via the CLI (octopussy audit) and
messages can be replayed via the MessagingContract's replay_dlq_message() method.
This is an important operational feature: in a production system, undeliverable
messages are often symptoms of bugs or configuration errors, and having a record
of them is invaluable for debugging.
The MessagingContract defines five abstract methods: publish(), subscribe(),
unsubscribe(), get_dlq_messages(), and replay_dlq_message(). The
publish() method raises a MessageSignatureError if the message's signature is
invalid, providing a second line of defense against unsigned messages (the first
line being the agent's own verify_signature() call).
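The publish/DLQ/replay contract can be sketched with an in-memory toy bus. This is not the NATS-backed implementation; it only illustrates the two behaviors described above: unsigned messages are rejected at the bus, and undeliverable messages land in an inspectable, replayable DLQ rather than being discarded.

```python
class MessageSignatureError(Exception):
    pass

class InMemoryBus:
    """Toy stand-in for the NATS-backed MessagingCapability, with a DLQ."""

    def __init__(self):
        self.subscribers = {}  # recipient_id -> handler callable
        self.dlq = []          # undeliverable messages land here, never discarded

    def subscribe(self, recipient_id: str, handler) -> None:
        self.subscribers[recipient_id] = handler

    def publish(self, msg: dict) -> None:
        if not msg.get("signature"):
            # Second line of defense: the bus refuses unsigned messages.
            raise MessageSignatureError(msg.get("message_id", "?"))
        handler = self.subscribers.get(msg["recipient_id"])
        if handler is None:
            self.dlq.append(msg)  # kept for inspection and replay
        else:
            handler(msg)

    def replay_dlq_message(self, index: int) -> None:
        self.publish(self.dlq.pop(index))
```

A typical operational flow: a message to a not-yet-subscribed recipient lands in the DLQ, the operator fixes the configuration, and `replay_dlq_message()` pushes it through again.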
14. THE LLM/VLM PORT ARCHITECTURE: SMART ROUTING FOR LANGUAGE MODELS
One of the most practically important features of Octopussy is its LLM routing
architecture. In a world where there are dozens of LLM providers -- OpenAI,
Anthropic, Google, Mistral, Cohere, and many more -- plus local providers like
Ollama, LM Studio, and llama.cpp -- the question of "which LLM should this agent
use for this request?" is not trivial.
Octopussy answers this question with the LLMRouterCapability, which implements
five routing strategies:
COST strategy routes requests to the cheapest healthy provider that can handle
the request. This is useful for high-volume, low-stakes tasks like summarization
or classification.
QUALITY strategy routes requests to the highest-quality healthy provider. This
is useful for tasks where accuracy is paramount, such as legal document analysis
or medical information retrieval.
BALANCED strategy finds the best cost-quality trade-off. This is the default
strategy and is suitable for most general-purpose tasks.
LOCAL strategy routes requests exclusively to local providers (Ollama, LM Studio,
llama.cpp, etc.). This is useful for privacy-sensitive tasks where data must not
leave the local machine, or for environments without internet connectivity.
CLOUD strategy routes requests exclusively to cloud providers. This is useful
for tasks that require the highest-quality models, which are currently only
available via cloud APIs.
The figure below shows the complete LLM routing flow:
Figure 10: The LLM/VLM Port Architecture
-----------------------------------------------------------------------
AgentSpec.intelligence (LLMIntelligenceSpec)
|
v
LLMRouterCapability.resolve_llm_port()
|
|-- Strategy: COST --> cheapest healthy provider
|-- Strategy: QUALITY --> highest-quality healthy provider
|-- Strategy: BALANCED --> cost/quality trade-off
|-- Strategy: LOCAL --> local providers only (Ollama, etc.)
+-- Strategy: CLOUD --> cloud providers only
|
v
LLMPort (concrete implementation for selected provider)
|
v
LLMPortAdapter (retry + circuit breaker + fallback chain)
|
v
OctopussyAgent._call_llm()
|
|-- TokenBudgetPort.check_budget() --> abort if exhausted
|-- LLMRequest (frozen dataclass)
|-- LLMCompletion (frozen dataclass)
+-- TokenBudgetPort.record_usage()
-----------------------------------------------------------------------
The LLMPortAdapter wraps the LLMPort with a circuit breaker and a retry-with-
backoff mechanism. If a provider fails, the circuit breaker opens and requests
are automatically routed to the fallback provider specified in the
LLMIntelligenceSpec. If the fallback also fails, the circuit breaker for the
fallback opens, and the error is propagated to the agent, which can then decide
how to handle it (for example, by entering ERROR state and waiting for the
circuit breaker to reset).
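The breaker-plus-fallback behavior can be sketched as follows. Thresholds, timings, and function names here are illustrative assumptions; the real LLMPortAdapter also adds retry-with-backoff, which is omitted for brevity.

```python
import time

class CircuitBreaker:
    """Opens after `threshold` consecutive failures; half-opens after `reset_after` seconds."""

    def __init__(self, threshold: int = 3, reset_after: float = 30.0):
        self.threshold = threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.reset_after:
            self.opened_at = None  # half-open: let one probe request through
            self.failures = 0
            return True
        return False

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()

def call_with_fallback(primary, fallback, breaker: CircuitBreaker):
    """Route to the primary provider unless its breaker is open; then use the fallback."""
    if breaker.allow():
        try:
            result = primary()
            breaker.record(True)
            return result
        except Exception:
            breaker.record(False)
    return fallback()
```

Once the breaker is open, the primary provider is not even attempted, which is what stops a failing provider from adding latency to every request.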
The LLMRequest and LLMCompletion are both frozen dataclasses, consistent with
the Immutable Messages principle. They carry the system prompt, the user prompt,
the trace ID (for distributed tracing), and the completion text and token counts.
GPU awareness is built into the LLMProviderSpec via the GPUBackend field, which
can be set to CUDA, ROCm, MPS (Apple Silicon), MLX (Apple's ML framework),
Intel, Vulkan, or CPU. This allows Octopussy to select the appropriate local
provider based on the hardware available on the host machine.
15. TEAMS AND MULTI-AGENT PATTERNS: DIVIDE AND CONQUER
Some tasks are too complex for a single agent. Octopussy supports four
multi-agent team patterns, each suited to a different class of problem.
15.1 Coordinator/Worker Pattern
In the Coordinator/Worker pattern, a coordinator agent receives a complex task,
decomposes it into subtasks, dispatches each subtask to a worker agent, collects
the results, and synthesizes a final answer. This pattern is ideal for tasks
that can be parallelized, such as researching multiple topics simultaneously or
processing multiple documents in parallel.
Figure 11: Coordinator/Worker Pattern
-----------------------------------------------------------------------
OrchestratorCapability
|
v (signed OctopussyMessage)
CoordinatorAgent
|
|-- WorkerAgent_1 (subtask_1) --> result_1 -->+
|-- WorkerAgent_2 (subtask_2) --> result_2 -->+
+-- WorkerAgent_N (subtask_N) --> result_N -->+
|
v
ResultAggregator
|
v
synthesized result
-----------------------------------------------------------------------
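The fan-out/fan-in shape of the pattern maps naturally onto `asyncio.gather`. This sketch replaces real worker agents and message passing with plain coroutines; the decompose/dispatch/aggregate steps are the ones shown in the figure.

```python
import asyncio

async def worker(subtask: str) -> str:
    # Stand-in for a WorkerAgent processing one subtask (e.g. via an LLM call).
    await asyncio.sleep(0)
    return f"result({subtask})"

async def coordinator(task: str, n_workers: int = 3) -> str:
    # Decompose the task, fan out to workers concurrently, then aggregate.
    subtasks = [f"{task}/part{i}" for i in range(1, n_workers + 1)]
    results = await asyncio.gather(*(worker(s) for s in subtasks))
    return " + ".join(results)  # the ResultAggregator step
```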
15.2 Pipeline Pattern
In the Pipeline pattern, agents are arranged in a linear sequence. The output
of each agent becomes the input of the next. This pattern is ideal for multi-
stage processing tasks, such as: fetch data, then clean it, then analyze it,
then format the report.
Figure 12: Pipeline Pattern
-----------------------------------------------------------------------
OrchestratorCapability
|
v
Agent_1 (stage 1) --> output_1
|
v
Agent_2 (stage 2, input=output_1) --> output_2
|
v
Agent_3 (stage 3, input=output_2) --> output_3
|
v
Agent_N (stage N, input=output_N-1) --> final_result
-----------------------------------------------------------------------
15.3 Mesh + Blackboard Pattern
In the Mesh + Blackboard pattern, all agents in the team share a common
Blackboard data store (backed by Redis, keyed by team_id). Each agent reads
from and writes to the Blackboard in each round. The TeamCapability enforces a
maximum number of rounds (max_rounds) to prevent infinite loops. This pattern
is ideal for collaborative problem-solving tasks where agents need to build on
each other's contributions, such as brainstorming or iterative document
refinement.
Figure 13: Mesh + Blackboard Pattern
-----------------------------------------------------------------------
+---------------------------------------------+
| BlackboardStore |
| (Redis, keyed by team_id) |
+---------------------------------------------+
^ ^ ^
| | |
Agent_1 Agent_2 Agent_N
(read/write) (read/write) (read/write)
| | |
+--------------+--------------+
|
max_rounds enforced
by TeamCapability
-----------------------------------------------------------------------
15.4 Tree Pattern
In the Tree pattern, a root coordinator delegates to sub-coordinators, which in
turn delegate to leaf worker agents. This pattern is ideal for very large,
hierarchically structured tasks, such as writing a comprehensive report with
multiple sections, each section requiring multiple research sub-tasks.
Figure 14: Tree Pattern
-----------------------------------------------------------------------
RootCoordinator
|
|-- SubCoordinator_A
| |-- Worker_A1
| +-- Worker_A2
|
+-- SubCoordinator_B
|-- Worker_B1
+-- Worker_B2
-----------------------------------------------------------------------
All inter-agent communication within a team uses signed OctopussyMessages,
regardless of the team pattern. The Blackboard in the Mesh pattern is the only
shared data structure, and it is isolated per team (keyed by team_id). An agent
in Team A cannot accidentally read or write Team B's Blackboard.
16. MEMORY, RAG, AND GRAPHRAG: GIVING AGENTS A BRAIN
A stateless agent that cannot remember anything beyond its current context
window is severely limited. Octopussy provides three levels of memory for agents.
The MemoryCapability manages short-term and long-term agent memory across
sessions. Short-term memory is the conversation history within a single session,
maintained by the ConversationMemory component inside the ActorCapability's
Essence. Long-term memory is persisted across sessions and is managed by the
MemoryCapability.
The RAGCapability provides vector-based retrieval-augmented generation. When an
agent needs to answer a question that requires knowledge beyond its training
data, it can query the RAGCapability with a natural language query. The
RAGCapability converts the query to a vector embedding, searches a vector
database (ChromaDB by default), and returns the most semantically relevant
document chunks. The agent then uses these chunks as additional context in its
LLM prompt.
The GraphRAGCapability provides graph-based retrieval-augmented generation. This
is a more sophisticated form of RAG that uses a knowledge graph (Neo4j by
default) to represent relationships between entities. Graph RAG is particularly
effective for tasks that require multi-hop reasoning -- for example, "What are
the subsidiaries of Company X, and what are their regulatory obligations?"
The combination of vector RAG and graph RAG gives Octopussy agents access to
both semantic similarity search (finding documents that are conceptually similar
to a query) and structured relational search (traversing explicit relationships
in a knowledge graph).
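The "semantic similarity search" half of that combination boils down to nearest-neighbor search over embedding vectors. As a pure-Python illustration (the real RAGCapability delegates this to ChromaDB, and the embeddings come from a model, not hand-written vectors):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec: list[float], chunks, top_k: int = 2) -> list[str]:
    """chunks: (text, embedding) pairs; returns the top_k most similar texts."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]
```

The retrieved chunks are what the agent splices into its LLM prompt as additional context.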
17. SECURITY ARCHITECTURE: ZERO-TRUST, ALL THE WAY DOWN
Security in Octopussy is not a feature. It is an architectural property that
pervades every layer of the system. The security model is zero-trust: no
component, no message, and no user is trusted by default. Everything must be
authenticated and authorized.
17.1 The Zero-Trust Enforcement Chain
The figure below shows the complete security enforcement chain, from an inbound
request to the final response:
Figure 15: The Zero-Trust Security Enforcement Chain
-----------------------------------------------------------------------
Inbound request (REST / gRPC / WebSocket / Channel)
|
|-- mTLS verification (transport layer)
|-- JWT authentication (bearer token)
|-- TOTP verification (SUPERUSER and ADMIN roles only)
+-- RBAC authorization (deny-by-default allowlist)
|
v
OctopussyMessage
|
|-- HMAC-SHA256 signature verification
|-- TTL check (drop if expired)
+-- Prompt injection scan
|
v
Agent processing
|
|-- Tool permission check (explicit allowlist)
+-- Sandbox execution (Bulkhead isolation)
|
v
Agent output
|
|-- Guardrail validation
+-- HMAC-SHA256 signature on response
-----------------------------------------------------------------------
17.2 HMAC Signing
Every OctopussyMessage carries a mandatory HMAC-SHA256 signature. The signature
is computed over the canonical JSON representation of the message. The
comparison uses hmac.compare_digest(), which is a constant-time comparison
function that prevents timing attacks. The signature field is of type bytes,
never Optional[bytes]. There is no code path in the system where a message can
exist without a valid signature.
17.3 RBAC and TOTP
The SecurityCapability implements Role-Based Access Control (RBAC) with a deny-
by-default policy. Roles include USER, ADMIN, and SUPERUSER. TOTP (Time-based
One-Time Password, as specified in RFC 6238) is mandatory for ADMIN and
SUPERUSER roles. This means that even if an attacker obtains a valid JWT token
for an admin user, they cannot perform admin operations without also having
access to the TOTP authenticator.
17.4 Prompt Injection Scanning
Every message payload is scanned for prompt injection before it reaches an LLM.
Prompt injection is a class of attack where a malicious user embeds instructions
in the input that attempt to override the agent's system prompt. The
SecurityCapability's prompt injection scanner uses a combination of pattern
matching and heuristic analysis to detect and block injection attempts.
17.5 Guardrail Validation
Every agent output is validated by the guardrail engine before it is delivered
to the user or to another agent. The guardrail engine checks for policy
violations (such as the generation of harmful content), factual inconsistencies,
and format errors. If a guardrail check fails, the output is rejected and the
agent is asked to revise it (via the CritiqueCapability).
17.6 The SecretsCapability
All credentials, API keys, and tokens are stored in the SecretsCapability. They
are never stored in configuration files, environment variables, or logs. The
SecretsCapability supports three backends: a local AES-256-GCM encrypted file
(the default for single-node deployments), HashiCorp Vault (for enterprise
deployments), and AWS Secrets Manager (for cloud deployments).
17.7 Security Invariants
The specification defines six non-negotiable security invariants:
Every OctopussyMessage carries a mandatory HMAC-SHA256 signature (bytes, never
Optional[bytes]), and every message is verified before processing. Failed
verification results in the message being logged, audited, and dropped.
TOTP is mandatory for SUPERUSER and ADMIN roles. There is no bypass.
Every prompt is scanned for injection before reaching an LLM.
Every agent output is validated by the guardrail engine before delivery.
All secrets are stored in SecretsCapability and never in configuration files or
environment variables.
mTLS is enforced on all four network endpoints: ports 47200, 47201, 47202, and
47203.
18. THE SANDBOX: THE BULKHEAD THAT SAVES YOUR SERVER
When an agent executes a tool -- for example, running a Python script, making
an HTTP request, or reading a file -- there is a risk that the tool will behave
badly. It might consume excessive CPU or memory, attempt to access files it
should not, or try to make network connections to unauthorized hosts.
The SandboxCapability addresses this risk by implementing the Bulkhead pattern:
each agent gets its own isolated sandbox, and tool execution is isolated per
agent. A misbehaving tool in Agent A cannot affect Agent B or the host system.
The SandboxCapability supports four sandbox backends:
DockerSandboxAdapter is the default backend. It runs each agent's tools in a
Docker container with configurable resource limits. This provides strong
isolation with reasonable overhead.
GVisorSandboxAdapter uses Google's gVisor, a user-space kernel that provides
an additional layer of isolation on top of Docker. This is suitable for
environments where the tools being executed are particularly untrusted.
SeccompSandboxAdapter uses Linux's seccomp (Secure Computing Mode) to restrict
the system calls that a tool can make. This is a lightweight option that
provides good protection without the overhead of a full container.
MacOSSandboxAdapter uses macOS's built-in sandbox mechanism, which is based on
the TrustedBSD Mandatory Access Control framework. This is the appropriate
choice for macOS deployments.
The resource limits per sandbox are configurable in the agent's permissions.md
file and include CPU limit in cores, memory limit in megabytes, disk limit in
megabytes, and a network access policy (disabled by default, with an explicit
allowlist of permitted hosts).
19. TOKEN BUDGET MANAGEMENT: MONEY DOES NOT GROW ON GPUS
LLM API calls cost money. Local LLM inference costs electricity and GPU time.
In a production system with many agents running continuously, token costs can
accumulate rapidly. The TokenBudgetCapability provides per-agent token budget
enforcement across multiple time periods.
The TokenBudgetCapability supports three budget periods: per-task, daily, and
monthly. An agent's budget for each period is configured in its permissions.md
file. The BudgetEnforcer checks the budget before every LLM call. If the budget
for any period is exhausted, the call is aborted and an AgentBudgetExhaustedError
is raised. The CostAccumulator tracks token usage across all periods and persists
the state to SQLite or Redis.
The TokenBudgetCapability is integrated with the LLMRouterCapability for cost-
aware routing. When the COST routing strategy is selected, the LLMRouter uses
the pricing table (configured in /etc/octopussy/config/pricing.yaml) to select
the cheapest provider that can handle the request within the agent's remaining
budget.
This integration means that Octopussy can automatically optimize LLM spending
without requiring developers to manually manage provider selection. An agent
configured with a daily budget of $1.00 and the COST routing strategy will
automatically use the cheapest available provider for each request, and will
stop making LLM calls when the daily budget is exhausted.
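The check-before-call, record-after-call discipline can be sketched as follows. Period names and the token-count accounting are illustrative; the real CostAccumulator also persists state and resets daily/monthly counters on a clock.

```python
class AgentBudgetExhaustedError(Exception):
    pass

class BudgetEnforcer:
    """Tracks token usage against per-period ceilings (e.g. task, daily, monthly)."""

    def __init__(self, limits: dict[str, int]):
        self.limits = limits  # e.g. {"task": 1_000, "daily": 10_000}
        self.used = {period: 0 for period in limits}

    def check_budget(self, requested_tokens: int) -> None:
        # Called before every LLM call; aborts if ANY period would be exceeded.
        for period, limit in self.limits.items():
            if self.used[period] + requested_tokens > limit:
                raise AgentBudgetExhaustedError(f"{period} budget exhausted")

    def record_usage(self, tokens: int) -> None:
        # Called after every LLM call with the actual token count consumed.
        for period in self.used:
            self.used[period] += tokens
```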
20. THE CRITIQUE-AND-REVISE PATTERN: QUALITY BEFORE DELIVERY
The CritiqueCapability implements the Critique-and-Revise agentic pattern, which
is a quality gate that operates before any agent output is delivered to the user
or to another agent.
The pattern works as follows. After an agent produces an output, the output is
sent to the CritiqueEngine, which uses a dedicated LLM (separate from the
agent's primary LLM) to evaluate the output for quality, accuracy, and policy
compliance. If the output meets the quality threshold, it is approved and
delivered. If it does not, the RevisionRequestor sends the output back to the
agent along with specific feedback about what needs to be improved. The agent
then revises the output and submits it for critique again. This cycle continues
until the output is approved or a maximum number of revision cycles is reached.
This pattern is particularly valuable for high-stakes tasks where accuracy is
critical, such as legal document drafting, medical information synthesis, or
financial report generation. By using a separate LLM for critique, the system
avoids the "self-review" problem where an agent evaluating its own output is
biased toward approving it.
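The cycle described above reduces to a small control loop. The callables here are hypothetical stand-ins for the agent, the CritiqueEngine, and the RevisionRequestor.

```python
def critique_and_revise(produce, critique, revise, max_cycles: int = 3):
    """Loop until the (separate) critic approves or the revision ceiling is hit.

    produce()            -> initial output
    critique(output)     -> (approved: bool, feedback: str)
    revise(output, fb)   -> revised output
    Returns (output, approved).
    """
    output = produce()
    for _ in range(max_cycles):
        approved, feedback = critique(output)
        if approved:
            return output, True
        output = revise(output, feedback)
    return output, False  # ceiling hit; the caller decides how to handle rejection
```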
21. OBSERVABILITY: THE SIDECAR THAT WATCHES EVERYTHING
The ObservabilityCapability is the sidecar that watches everything. It is the
first Capability to start (before all others, regardless of the topological
order) and the last to stop. It provides three pillars of observability:
metrics, traces, and logs, all via OpenTelemetry.
Metrics are numerical measurements of system behavior over time: the number of
messages processed per second, the average LLM response latency, the token
budget utilization per agent, the number of messages in the Dead Letter Queue,
and so on. Metrics are exported to a Prometheus endpoint on port 9090.
Traces are records of the path that a request takes through the system. Every
OctopussyMessage carries a trace_id, which is propagated through all downstream
messages generated as a result of processing the original message. This allows
operators to reconstruct the complete execution path of any task, across any
number of agents and Capabilities.
Logs are structured records of significant events. All logs are emitted in
structured JSON format to facilitate machine parsing. The audit log is a special
log that records all security-relevant events (authentication attempts, HMAC
failures, prompt injection detections, guardrail violations) to a dedicated
audit log file (/var/log/octopussy/audit.jsonl).
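One line of that audit log can be sketched as follows; the field names are illustrative, but the format (one JSON object per line, i.e. JSONL) is exactly what the .jsonl extension implies:

```python
import json
import time

def audit_event(event: str, **fields) -> str:
    """Render one audit-log line as structured JSON (JSONL format)."""
    record = {"ts": time.time(), "event": event, **fields}
    return json.dumps(record, sort_keys=True)

line = audit_event("hmac_failure", agent_id="research_agent", action="dropped")
parsed = json.loads(line)    # machine-parseable, one event per line
assert parsed["event"] == "hmac_failure"
assert parsed["action"] == "dropped"
```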
The CLI provides convenient access to all three observability channels:
octopussy logs -- Stream system logs in real time
octopussy metrics -- Show current metrics snapshot
octopussy audit -- Show the audit log
22. MCP INTEGRATION: THE UNIVERSAL TOOL PROTOCOL
The Model Context Protocol (MCP) is an emerging standard for how AI agents
communicate with external tools and data sources. Octopussy implements the MCP
2025-11-25 specification using the Streamable HTTP transport, in both server
and client roles.
As an MCP server, Octopussy exposes its own tools and agents to external MCP
clients via port 47203. This means that any MCP-compatible client (such as
Claude Desktop, Cursor, or another Octopussy instance) can call Octopussy's
agents as if they were MCP tools.
As an MCP client, Octopussy agents can call external MCP tool servers. This is
configured in the agent's mcp_servers.json file, which lists the URLs of the
external MCP servers that the agent is permitted to call. All tool calls from
agents go through the MCPClientAdapter, which handles the MCP protocol
negotiation, the Streamable HTTP transport, and the retry-with-backoff logic.
The MCPClientAdapter sends the MCP-Protocol-Version: 2025-11-25 header on all
requests. The OctopussyMCPServer validates this header and returns a clear error
for unsupported protocol versions, ensuring forward compatibility as the MCP
specification evolves.
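The server-side version check described above can be sketched as a plain function; the return shape and function name are illustrative, not the OctopussyMCPServer's actual interface:

```python
SUPPORTED_MCP_VERSIONS = {"2025-11-25"}

def validate_mcp_request(headers: dict) -> tuple[int, str]:
    """Validate the MCP-Protocol-Version header; return (http_status, message)."""
    version = headers.get("MCP-Protocol-Version")
    if version is None:
        return 400, "missing MCP-Protocol-Version header"
    if version not in SUPPORTED_MCP_VERSIONS:
        # A clear, explicit error rather than silent acceptance gives
        # forward compatibility as the MCP specification evolves.
        return 400, f"unsupported MCP protocol version: {version}"
    return 200, "ok"

assert validate_mcp_request({"MCP-Protocol-Version": "2025-11-25"}) == (200, "ok")
status, msg = validate_mcp_request({"MCP-Protocol-Version": "2024-06-01"})
assert status == 400
```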
23. THE PLUGIN ARCHITECTURE: HOT-LOADING NEW POWERS AT RUNTIME
The PluginCapability provides runtime hot-loading of new Capabilities and tools.
This means that a new Capability can be added to a running Octopussy instance
without restarting the system. The hot_reload_plugin() method loads the new
Capability's Python module, registers it with the CapabilityRegistry, resolves
its dependencies, and starts it according to the CCA lifecycle.
This is a powerful feature for production systems where downtime is costly.
Instead of deploying a new version of the entire Octopussy platform to add a
new tool or Capability, operators can hot-load just the new plugin.
The PluginCapability also enforces the CCA structure on plugins: a plugin that
does not conform to the CCA Nucleus/Contract/Envelope structure will be rejected
at load time. This ensures that third-party plugins cannot introduce architectural
violations into the system.
Plugins are stored in the plugins directory, configured in octopussy.yaml:
plugins_dir: "/etc/octopussy/plugins"
Each plugin is a Python package with a standard entry point that the
PluginCapability uses to discover and load it.
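The load-and-validate step can be sketched with importlib. The real PluginCapability validates the full CCA Nucleus/Contract/Envelope structure; this stand-in checks only two lifecycle hooks, and the function and class names are hypothetical:

```python
import importlib
import sys
import types

def hot_reload_plugin(module_name: str, class_name: str):
    """Load a plugin class dynamically and reject it if it lacks lifecycle hooks."""
    module = importlib.import_module(module_name)
    plugin_cls = getattr(module, class_name)
    for hook in ("start", "stop"):           # minimal structural validation
        if not callable(getattr(plugin_cls, hook, None)):
            raise TypeError(f"{class_name} missing required hook: {hook}")
    return plugin_cls()

# Simulate an installed plugin package so the sketch is self-contained.
mod = types.ModuleType("demo_plugin")
class EchoCapability:
    def start(self): return "started"
    def stop(self): return "stopped"
mod.EchoCapability = EchoCapability
sys.modules["demo_plugin"] = mod

plugin = hot_reload_plugin("demo_plugin", "EchoCapability")
assert plugin.start() == "started"

# A class without the required hooks is rejected at load time.
try:
    hot_reload_plugin("collections", "OrderedDict")
except TypeError as err:
    assert "start" in str(err)
```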
24. CONFIGURATION: YAML, MARKDOWN, AND PYDANTIC
Octopussy's configuration system has two levels: system configuration and agent
configuration.
24.1 System Configuration
The system configuration is stored in /etc/octopussy/config/octopussy.yaml.
This file configures all system-level settings: network ports, TLS certificates,
LLM providers, vector database backend, graph database backend, cache backend,
persistence backend, secrets backend, sandbox backend, observability settings,
and messaging backend.
A representative system configuration file looks like this:
version: "4.0"
profile: "LOCAL"  # auto-detected; override if needed
platform:
  rest_port: 47200
  grpc_port: 47201
  ws_port: 47202
  mcp_port: 47203
  metrics_port: 9090
  tls_cert: "/etc/octopussy/tls/server.crt"
  tls_key: "/etc/octopussy/tls/server.key"
  tls_ca: "/etc/octopussy/tls/ca.crt"
llm:
  default_routing_strategy: "BALANCED"
  providers:
    - provider: "ollama"
      base_url: "http://localhost:11434"
      models:
        - "llama3.2:8b"
        - "mistral:7b"
  routing_table: "/etc/octopussy/config/routing_table.yaml"
  pricing_table: "/etc/octopussy/config/pricing.yaml"
vector_db:
  backend: "chromadb"
  chromadb_path: "/var/lib/octopussy/chromadb"
graph_db:
  backend: "neo4j"
  neo4j_uri: "bolt://localhost:7687"
  neo4j_user_secret: "neo4j_user"
  neo4j_password_secret: "neo4j_password"
secrets:
  backend: "local_encrypted"
  keyfile: "/etc/octopussy/secrets/.keyfile"
  store: "/etc/octopussy/secrets/secrets.enc"
sandbox:
  default_backend: "docker"
  docker_socket: "/var/run/docker.sock"
observability:
  otlp_endpoint: "http://localhost:4317"
  prometheus_port: 9090
  log_level: "INFO"
  audit_log: "/var/log/octopussy/audit.jsonl"
messaging:
  backend: "nats"
  nats_url: "nats://localhost:4222"
agents_dir: "/etc/octopussy/agents"
plugins_dir: "/etc/octopussy/plugins"
prompt_catalog_dir: "/etc/octopussy/prompt_catalog"
The ConfigurationCapability loads this file using PyYAML (version 6.0 or later)
and validates it using Pydantic (version 2.7 or later). The FileWatcher
component monitors the file for changes and triggers hot-reload of the affected
Capabilities when the configuration changes.
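The kind of validation Pydantic performs here can be illustrated with a stdlib-only stand-in (the real system uses Pydantic >= 2.7, which provides the same checks with far richer error reporting; the class below is a sketch, not the actual config model):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PlatformConfig:
    """Stdlib stand-in for a validated model of the platform: section."""
    rest_port: int
    grpc_port: int
    mcp_port: int

    def __post_init__(self):
        # Reject out-of-range ports at load time, before any Capability starts.
        for name in ("rest_port", "grpc_port", "mcp_port"):
            port = getattr(self, name)
            if not (1 <= port <= 65535):
                raise ValueError(f"{name} out of range: {port}")

cfg = PlatformConfig(rest_port=47200, grpc_port=47201, mcp_port=47203)
assert cfg.mcp_port == 47203

try:
    PlatformConfig(rest_port=0, grpc_port=47201, mcp_port=47203)
except ValueError as err:
    assert "rest_port" in str(err)
```

Catching configuration errors at load time, with a named field in the error message, is exactly why the specification pairs PyYAML (parsing) with Pydantic (validation).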
24.2 Schema Versioning
All configuration files carry a version field (the schema version), currently "4.0". When the
schema changes, a new schema version is introduced with full backward
compatibility for one major version. The ConfigurationCapability validates the
schema version and reports clear migration errors if an old schema is detected.
A migration guide is published before the old schema version is removed.
24.3 The ConfigurationCapability Nucleus
Figure 16: ConfigurationCapability -- CCA Nucleus Structure
-----------------------------------------------------------------------
ConfigurationCapability
|
+-- ESSENCE
|   |-- ConfigLoader             Loads octopussy.yaml with PyYAML + Pydantic
|   +-- ConfigValidator          Validates configuration schema
|
+-- REALIZATION
|   +-- FileWatcher              Watches config files for changes (watchfiles)
|
+-- ADAPTATION
    +-- ConfigurationAdapter     Exposes ConfigurationContract
-----------------------------------------------------------------------
25. DEPLOYMENT: FROM A RASPBERRY PI TO A KUBERNETES CLUSTER
Octopussy is designed to run on a remarkably wide range of hardware and
deployment environments. The platform auto-detects its deployment environment
and configures itself accordingly. The supported profiles are:
The LOCAL profile is for development on a workstation or laptop. It uses SQLite
for persistence, ChromaDB for vector storage, and local file storage for
secrets. It does not require Docker or Kubernetes.
The DOCKER profile is for containerized single-node deployments. The Octopussy
container exposes all four service ports and the metrics port.
The KUBERNETES profile is for production multi-node deployments. The Kubernetes
deployment manifest configures three replicas, liveness and readiness probes,
resource requests and limits, and volume mounts for agent configuration and TLS
certificates.
The SBC profile is for single-board computers such as the Raspberry Pi and the
NVIDIA Jetson Nano. It uses resource-constrained defaults: smaller memory
limits, CPU-only LLM inference, and SQLite for all persistence.
The Kubernetes deployment manifest is included in the specification and
configures:
- 3 replicas for high availability
- Liveness probe on port 47200 (/health, HTTPS)
- Readiness probe on port 47200 (/ready, HTTPS)
- Resource requests: 512Mi memory, 500m CPU
- Resource limits: 2Gi memory, 2000m CPU
- Volume mounts for agent configuration (read-only ConfigMap)
and TLS certificates (read-only Secret)
- Prometheus scraping annotations on port 9090
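The probe and resource bullets above translate into manifest stanzas along these lines (a sketch only: the container name and image tag are illustrative, while the paths, ports, and resource figures come from the list above):

```yaml
# Illustrative excerpt of the Deployment's container spec.
containers:
  - name: octopussy            # hypothetical container name
    image: octopussy:4.0.0
    livenessProbe:
      httpGet: { path: /health, port: 47200, scheme: HTTPS }
    readinessProbe:
      httpGet: { path: /ready, port: 47200, scheme: HTTPS }
    resources:
      requests: { memory: "512Mi", cpu: "500m" }
      limits:   { memory: "2Gi", cpu: "2000m" }
```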
The figure below shows the deployment architecture across all supported
environments:
Figure 17: Octopussy Deployment Architecture
-----------------------------------------------------------------------
+------------------+   +------------------+   +------------------+
|   LOCAL / SBC    |   |      DOCKER      |   |    KUBERNETES    |
|                  |   |                  |   |                  |
| octopussy:4.0.0  |   | octopussy:4.0.0  |   | 3x replicas      |
| SQLite           |   | SQLite or Redis  |   | Redis            |
| ChromaDB         |   | ChromaDB         |   | ChromaDB         |
| Local secrets    |   | Local secrets    |   | Vault / AWS SM   |
| CPU LLM          |   | GPU LLM          |   | GPU LLM          |
| No NATS          |   | NATS container   |   | NATS cluster     |
+------------------+   +------------------+   +------------------+
         |                      |                      |
         +--- All expose ports 47200, 47201, 47202, 47203 ---+
         +--- All protected by mTLS --------------------------+
         +--- All export metrics on port 9090 ----------------+
-----------------------------------------------------------------------
26. THE OPEN-SOURCE STACK: STANDING ON THE SHOULDERS OF GIANTS
Octopussy is built entirely on open-source components. The table below lists
the key libraries and their roles:
Table 4: Octopussy Open-Source Stack
-----------------------------------------------------------------------
Component          Library          Version      Purpose
-----------------------------------------------------------------------
Python runtime     CPython          3.11+        Runtime
Async framework    asyncio          stdlib       Async I/O
REST API           FastAPI          >= 0.111     REST server + MCP
gRPC               grpcio + tools   >= 1.64      gRPC transport
HTTP client        httpx            >= 0.27      MCP client + LLM API
Message bus        nats-py          >= 2.6       NATS.io async client
Vector DB          chromadb         >= 0.5       Default vector store
Graph DB           neo4j            >= 5.0       GraphRAG knowledge
Cache              redis            >= 5.0       Session + blackboard
Config parsing     PyYAML           >= 6.0       YAML configuration
Data validation    Pydantic         >= 2.7       Config schema
Markdown parsing   mistune          >= 3.0       Agent config files
CLI                Click + Rich     >= 8.1,      CLI interface
                                    >= 13.0
STT                OpenAI Whisper   local        Speech to text
TTS                Piper            local        Text to speech
Observability      OpenTelemetry    latest       Metrics/traces/logs
-----------------------------------------------------------------------
The choice of each library reflects careful consideration. FastAPI was chosen
for the REST server because it provides automatic OpenAPI documentation, native
async support, and excellent performance. NATS.io was chosen for the message bus
because it is lightweight, high-performance, and natively supports the publish-
subscribe pattern. ChromaDB was chosen as the default vector store because it
is easy to deploy locally and has good performance for small to medium datasets.
Pydantic was chosen for data validation because it provides runtime type
checking and clear error messages, which is essential for a system where
configuration errors should be caught early.
27. A WALK-THROUGH: BUILDING YOUR FIRST AGENT IN FIVE MINUTES
Let us make this concrete. Here is how you would create a simple research agent
in Octopussy, from installation to first task.
Step 1: Install Octopussy.
pip install octopussy
octopussy install
octopussy start
octopussy status
The "octopussy install" command sets up the directory structure, generates TLS
certificates for local development, initializes the secrets store, and starts
the NATS message broker. The "octopussy start" command starts all twenty-six
Capabilities in topological order. The "octopussy status" command shows the
health of all Capabilities.
Step 2: Create the agent configuration directory.
mkdir -p /etc/octopussy/agents/research_agent
Step 3: Write the goal.md file. This file defines the agent's purpose and
system prompt.
---
agent_type: STATELESS
execution_mode: ONE_SHOT
---
You are a research assistant. Given a topic, you search for relevant
information, synthesize it into a clear summary, and cite your sources.
You are thorough, accurate, and concise.
Step 4: Write the intelligence.md file. This file defines which LLM the agent
uses and how it routes requests.
---
primary:
  provider: ollama
  model_id: llama3.2:8b
  temperature: 0.3
  max_tokens: 2048
routing_strategy: LOCAL
---
Use the local Ollama instance for all inference. Prefer accuracy over speed.
Step 5: Write the permissions.md file. This file defines what the agent is
allowed to do.
---
allowed_tools:
  - web_search
  - read_file
sandbox:
  backend: DOCKER
  cpu_limit_cores: 0.5
  memory_limit_mb: 512
  network_enabled: true
  allowed_hosts:
    - "*.wikipedia.org"
    - "*.arxiv.org"
token_budget:
  per_task: 10000
  daily: 100000
---
The agent may search the web and read files. Network access is restricted
to Wikipedia and arXiv. Token budget is 10,000 per task and 100,000 per day.
Step 6: Write the traits.md file. This file defines the agent's personality.
---
---
Be precise and scholarly. Cite sources. Acknowledge uncertainty. Do not
speculate beyond what the evidence supports.
Step 7: Create the agent.
octopussy agent create /etc/octopussy/agents/research_agent
Octopussy will validate the configuration, resolve the LLM port, provision the
Docker sandbox, register the token budget, and start the agent's message loop.
The command returns an agent_id.
Step 8: Submit a task.
octopussy task submit "Summarize the current state of research on
transformer-based protein structure prediction" --agent <agent_id>
The command returns a task_id immediately. The agent is now working
asynchronously.
Step 9: Retrieve the result.
octopussy task result <task_id>
When the agent has finished, the result is printed to the terminal. If the
agent is still working, the command waits and polls until the result is ready.
That is it. Five minutes, nine steps, four configuration files, eight commands,
and you have a fully functional, sandboxed, budget-managed, zero-trust research
agent running on your local machine.
28. KNOWN BUGS IN THE SPECIFICATION AND HOW THEY WERE FIXED
The Octopussy specification, in its original form, contained a number of bugs
across five categories. A thorough audit identified forty-seven distinct issues.
The most significant ones are described here, along with the corrections applied.
28.1 Rendering Artifacts
The original specification was authored in Markdown and processed by the Markdig
parser. Several rendering artifacts leaked into the document text. The most
pervasive was the string "Markdig.Syntax.Inlines.HtmlEntityInline" appearing
wherever an ampersand (&) should have been. For example, "Vision & Positioning"
appeared as "Vision Markdig.Syntax.Inlines.HtmlEntityInline Positioning" in
the table of contents and section headings throughout the document. Similarly,
the shell AND operator (&&) in the installation command appeared as a double
artifact. HTML angle-bracket placeholders like <agent_name> were also stripped
by the parser.
Correction: All rendering artifacts were replaced with the correct characters
throughout the document.
28.2 Python Code Bugs
The most systematic bug in the code listings was the corruption of Python dunder
(double-underscore) names. The string "__init__" appeared as "init" throughout
all exception class constructors, and "__name__" appeared as "name" in all
logging.getLogger() calls. This would cause every exception class to be
syntactically broken and every logger to be incorrectly named.
A logical bug was found in the _emit_heartbeat() method: the payload field was
set to {"state": str(self._spec.agent_id)}, which echoes the agent's ID rather
than its current state. The correct payload is {"state": str(self._state)}.
The from __future__ import annotations statement was missing its double
underscores in all twelve Python files, appearing as "from future import
annotations", which would cause an ImportError at runtime.
F-string interpolation was broken in multiple exception classes: the variable
references inside f-strings were stripped, leaving empty placeholders.
Correction: All dunder names were restored. The heartbeat payload was corrected
to report the agent's current state. All f-string variables were restored.
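The corrected heartbeat logic can be sketched as follows; the class shape is illustrative, and only the payload expression itself comes from the specification:

```python
import enum

class AgentState(enum.Enum):
    IDLE = "IDLE"
    WORKING = "WORKING"

class Agent:
    def __init__(self, agent_id: str):      # dunder restored: __init__
        self._agent_id = agent_id
        self._state = AgentState.IDLE

    def _emit_heartbeat(self) -> dict:
        # Corrected: report the agent's current state, not its own ID.
        return {"state": str(self._state)}

hb = Agent("research_agent")._emit_heartbeat()
assert "IDLE" in hb["state"]
```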
28.3 Structural Issues
The document's section numbering was inconsistent: sections 1 through 4 were
correctly numbered in the table of contents, but all subsequent sections in the
body reset to "1." The Security Architecture section, which should have been
Section 10, was numbered "1." throughout.
The security invariants paragraph (describing HMAC, TOTP, guardrails, and mTLS)
was misplaced inside the SecretsCapability description (Section 4.13) rather
than in the Security Architecture section (Section 10) where it belongs.
Correction: All section numbers were corrected. The security invariants paragraph
was moved to Section 10.4.
28.4 Consistency and Accuracy Issues
The mTLS enforcement list in the Security Architecture section originally listed
only three ports (47200, 47201, 47202), omitting port 47203 (the MCP port).
This contradicted the system configuration file, which clearly shows all four
ports.
The description of failed message verification said the message was "silently
dropped," which contradicts the statement immediately preceding it that the
failure is "logged and audited." A silently dropped message is, by definition,
not logged.
The open-source stack appendix listed "PyYAML + Pydantic >= 6.5" as a single
entry, which is both ambiguous and wrong: the two libraries have independent
version lines, and "6.5" is a current version of neither.
Correction: Port 47203 was added to the mTLS enforcement list. "Silently dropped"
was corrected to "dropped after logging and auditing." PyYAML and Pydantic
versions were separated: PyYAML >= 6.0 and Pydantic >= 2.7.
28.5 Minor Style Issues
Unused imports were found in several files: the "field" name from dataclasses
was imported in message.py but not used; "Any" was imported in agent_spec.py
but not used in the shown fragment. Trailing whitespace was found in a test file.
Correction: Unused imports were removed. Trailing whitespace was eliminated.
29. CALL TO ACTION: JOIN THE COMMUNITY AND BUILD OCTOPUSSY TOGETHER
You have now read the complete architecture of Octopussy. You understand its
twenty-six Capabilities, its seven design principles, its zero-trust security
model, its async Actor-based agent execution, its four multi-agent team patterns,
its LLM routing strategies, its sandbox isolation, its token budget management,
its MCP integration, and its plugin architecture.
You also know that Octopussy does not yet exist as running code. It is a
specification. A very good specification -- one that has been carefully audited,
debugged, and refined -- but a specification nonetheless.
This is where you come in.
Octopussy is designed to be a community project. It is open-source. It is built
on open-source components. It is designed to be extended by plugins. And it is
designed to be understood by anyone who reads its specification carefully.
Here is what the community needs to build:
The core infrastructure needs to be implemented first. This means the
CapabilityRegistry, the OctopussyCapabilityLifecycleManager, the
OctopussyMessage class, the AgentStateMachine, and the base OctopussyAgent
abstract class. These are the foundations on which everything else is built.
The foundational Capabilities need to be implemented next. SecretsCapability,
ConfigurationCapability, and ObservabilityCapability are the three Capabilities
that everything else depends on. They should be implemented and thoroughly tested
before any other Capability is started.
The agent execution engine needs to be implemented. This means the
ActorCapability with its message loop, the AgentFactoryCapability with its
Markdown configuration parser, and the AgentLifecycleManagerCapability with its
state machine enforcement.
The LLM integration layer needs to be implemented. This means the
LLMRouterCapability with its five routing strategies, the LLMPortAdapter with
its circuit breaker and retry logic, and the concrete LLMPort implementations
for at least Ollama (for local inference) and OpenAI (for cloud inference).
The security layer needs to be implemented. This means the SecurityCapability
with its RBAC, TOTP, JWT validation, and prompt injection scanning, and the
SandboxCapability with its Docker backend.
The messaging layer needs to be implemented. This means the MessagingCapability
with its NATS.io backend, its Dead Letter Queue, and its priority routing.
The team patterns need to be implemented. This means the TeamCapability with
all four patterns: Coordinator/Worker, Pipeline, Mesh+Blackboard, and Tree.
The MCP integration needs to be implemented. This means both the MCP server
(exposing Octopussy tools to external clients) and the MCP client (allowing
agents to call external MCP tool servers).
The CLI and WebUI need to be implemented. These are the user-facing interfaces
that make Octopussy accessible to non-developers.
Where to start? The specification is your guide. Every Capability has a clearly
defined Nucleus structure, a Contract (abstract Python class), and a set of
dependencies. Start with the Capabilities that have no dependencies (Secrets,
Configuration, Observability), implement their Contracts as abstract classes,
and then implement the Essence, Realization, and Adaptation layers one at a
time.
The specification also includes a complete implementation skeleton with Python
code for the core enumerations, exception classes, OctopussyMessage, AgentSpec,
OctopussyAgent, and the contract interfaces. This skeleton is your starting
point. It is not complete -- it is a skeleton -- but it gives you the shape of
the code and the conventions to follow.
The community project needs:
Contributors who can implement Capabilities, starting with the foundational ones
and working up the dependency graph.
Contributors who can write tests, because a system this complex needs
comprehensive unit tests, integration tests, and end-to-end tests.
Contributors who can write documentation, because the architecture is rich and
the specification, while detailed, is not a tutorial.
Contributors who can build LLM provider adapters, because the more providers
Octopussy supports, the more useful it is.
Contributors who can build communication adapters (Slack, Teams, Telegram,
Discord, email), because agents that can only be reached via REST are limited.
Contributors who can test on different platforms (macOS, Windows, Linux, Raspberry
Pi, Jetson Nano), because cross-platform compatibility is a first-class
requirement.
The architecture is sound. The specification is detailed. The vision is clear.
What is missing is the code, and that is where you come in.
If you have ever wanted to contribute to a project that matters -- a project
that could genuinely change how autonomous AI agents are built and deployed --
this is your moment. Octopussy is not yet another chatbot wrapper. It is a
serious, production-grade, architecturally principled platform for autonomous
AI agents, and it needs serious, principled contributors.
Get the specification. Read it. Pick a Capability. Start coding.
30. EPILOGUE: THE FUTURE OF AUTONOMOUS AI AGENTS
We are at an inflection point in the history of software. For decades, software
has been reactive: it does what users tell it to do, when they tell it to do it.
Agentic AI changes this. An agent can be given a goal and left to pursue it
autonomously, using tools, memory, and reasoning to navigate a complex task
landscape without constant human supervision.
The potential applications are staggering: autonomous research assistants that
monitor the scientific literature and alert researchers to relevant new findings;
software engineering agents that monitor production systems, detect anomalies,
and propose fixes; business intelligence agents that continuously analyze data
streams and surface actionable insights; personal productivity agents that manage
calendars, emails, and tasks with minimal human intervention.
But the potential risks are equally significant. Autonomous agents that are not
properly sandboxed can cause damage. Agents that are not properly budgeted can
run up enormous API bills. Agents that are not properly secured can be
manipulated via prompt injection. Agents that are not properly observable are
black boxes that cannot be debugged or audited.
Octopussy addresses all of these risks, not as an afterthought, but as first-
class architectural properties. Sandboxing is built in. Token budgets are built
in. Zero-trust security is built in. Observability is built in. These are not
features that can be turned off or bypassed. They are enforced invariants.
The world needs an open-source, production-grade, architecturally principled
agentic AI platform. It needs a platform that takes security seriously, that
takes observability seriously, that takes cross-platform compatibility seriously,
and that takes developer experience seriously. It needs a platform that is built
on a solid architectural foundation -- one that can evolve without breaking, that
can be extended without compromising its integrity, and that can be understood
by a community of contributors without requiring them to reverse-engineer a
monolithic codebase.
Octopussy is designed to be that platform.
It is not finished. It is a specification, a blueprint, an invitation. The
invitation is to you: to read, to understand, to contribute, and to build
something that matters.
The octopus has eight arms. It needs many more hands.
APPENDIX A: COMPLETE CAPABILITY REFERENCE
Table 5: All 26 Octopussy Capabilities -- Quick Reference
-----------------------------------------------------------------------
 #  Capability Name                   Primary Role
-----------------------------------------------------------------------
 1  SecretsCapability                 Encrypted secrets store
                                      (AES-256-GCM, Vault, AWS SM)
 2  ConfigurationCapability           YAML config loading, validation,
                                      hot-reload
 3  ObservabilityCapability           OpenTelemetry sidecar (metrics,
                                      traces, logs)
 4  SecurityCapability                RBAC, TOTP, JWT, prompt injection
                                      scanning, guardrails
 5  MessagingCapability               NATS.io async message bus + DLQ
 6  PluginCapability                  Runtime hot-loading of Capabilities
                                      and tools
 7  ToolRegistryCapability            Registration, validation, and
                                      access to agent tools
 8  LLMRouterCapability               Smart LLM routing (COST / QUALITY /
                                      BALANCED / LOCAL / CLOUD)
 9  MemoryCapability                  Short-term and long-term agent
                                      memory
10  RAGCapability                     Vector-based retrieval-augmented
                                      generation (ChromaDB)
11  GraphRAGCapability                Graph-based RAG (Neo4j knowledge
                                      graph)
12  SandboxCapability                 Agent isolation (Docker, gVisor,
                                      seccomp, macOS)
13  TokenBudgetCapability             Per-agent token budgets (per-task,
                                      daily, monthly)
14  PromptEngineCapability            Prompt template management and
                                      rendering
15  ActorCapability                   Abstract base for all agents +
                                      message loop
16  AgentFactoryCapability            Agent creation from Markdown config
                                      directories
17  AgentLifecycleManagerCapability   Agent state machine enforcement +
                                      health monitoring
18  CritiqueCapability                Critique-and-Revise quality gate
19  SchedulerCapability               Cron-style scheduling for DAEMON
                                      agents
20  TeamCapability                    Multi-agent team lifecycle and
                                      coordination
21  CommunicationAdapterCapability    Slack, Teams, Telegram, Discord,
                                      email adapters
22  UserSessionCapability             User session management per channel
23  SpeechCapability                  STT (Whisper) + TTS (Piper) for
                                      voice agents
24  OrchestratorCapability            Central nervous system + REST/gRPC/
                                      WebSocket server
25  CLICapability                     Click + Rich CLI interface
26  WebUICapability                   React SPA web dashboard
-----------------------------------------------------------------------
APPENDIX B: DESIGN PATTERNS REFERENCE
Table 6: Design Patterns Used in Octopussy
-----------------------------------------------------------------------
Pattern              Category       Where Applied
-----------------------------------------------------------------------
Actor Model          Behavioral     OctopussyAgent -- message-passing
                                    concurrency, no shared state
State Machine        Behavioral     AgentStateMachine -- strict,
                                    validated state transitions
Template Method      Behavioral     OctopussyAgent._message_loop() --
                                    fixed algorithm, subclasses override
                                    process_task()
Strategy             Behavioral     LLMRouterCapability -- pluggable
                                    routing strategies
Observer             Behavioral     ObservabilityCapability -- event
                                    monitoring and metrics
Command              Behavioral     ControlCommand enum -- encapsulated
                                    lifecycle commands
Mediator             Behavioral     OrchestratorCapability -- mediates
                                    all agent interactions
Abstract Factory     Creational     AgentFactoryCapability -- creates
                                    all agent types from configuration
Builder              Creational     AgentSpecBuilder -- constructs
                                    complex immutable AgentSpec objects
Circuit Breaker      Resilience     LLMPortAdapter, VLMPortAdapter --
                                    prevents cascade failures
Bulkhead             Resilience     SandboxCapability -- isolates agent
                                    tool execution
Dead Letter Queue    Resilience     MessagingCapability -- captures
                                    undeliverable messages for replay
Retry + Backoff      Resilience     LLMPortAdapter -- retries failed LLM
                                    calls with exponential backoff
Adapter              Integration    All Adaptation layers -- maps
                                    external interfaces to internal
                                    contracts
Plugin               Extensibility  PluginCapability -- runtime
                                    hot-loading of new Capabilities
Blackboard           Agentic        TeamCapability (MESH) -- shared
                                    knowledge store for collaborative
                                    agents
ReAct                Agentic        ReflectionEngine -- Reason + Act
                                    cycles for complex task solving
Critique-and-Revise  Agentic        CritiqueCapability -- quality gate
                                    before result delivery
Plan-and-Execute     Agentic        OrchestratorCapability TaskAnalyzer
                                    -- decomposes complex tasks into
                                    plans
-----------------------------------------------------------------------
APPENDIX C: VERSIONING AND EVOLUTION POLICY
Octopussy follows strict semantic versioning (MAJOR.MINOR.PATCH). Breaking
changes to any Capability Contract increment the MAJOR version. New Capabilities
or non-breaking Contract additions increment the MINOR version. Bug fixes
increment the PATCH version.
Every Capability has its own Evolution Envelope, which specifies:
CAPABILITY_VERSION: the current version of the Capability's Contract.
MINIMUM_CLIENT_VERSION: the oldest Contract version that this implementation
still supports.
DEPRECATION_POLICY: at least twelve months' notice before removal of any
Contract method.
MIGRATION_GUIDE: a link to the migration documentation.
The MCP protocol version is pinned to 2025-11-25 in Octopussy 4.0.0. The
MCPClientAdapter sends the MCP-Protocol-Version: 2025-11-25 header on all
requests. The OctopussyMCPServer validates this header and returns a clear error
for unsupported versions.