Monday, March 30, 2026

OCTOPUSSY v5.0 -- THE OPEN-SOURCE AGENTIC AI PLATFORM - A TUTORIAL FOR CONTRIBUTORS AND CURIOUS MINDS




Written for developers, architects, and AI enthusiasts who want to understand, build upon, and contribute to a genuinely novel approach to autonomous AI systems.


CHAPTER ONE: WHY OCTOPUSSY EXISTS, AND WHY IT MATTERS


Octopussy aims to be an open-source agentic AI platform with a clean and sound architecture. I recently wrote a tutorial about Octopussy 4.0 (http://stal.blogspot.com/2026/03/octopussy-open-source-agentic-ai.html). Since then, I have cleaned up, extended, and improved its architecture specification, which resulted in version 5.0 of Octopussy. While that tutorial was aimed at developers and users, this article provides insights into the architecture.

Before we dive into the architecture, let us take a moment to understand the problem Octopussy is solving, because the problem is real, urgent, and surprisingly underserved by existing platforms.

The world of agentic AI -- systems where AI models do not merely answer questions but actually take actions, use tools, remember things, collaborate with other agents, and pursue goals autonomously -- is growing at a pace that is frankly difficult to keep up with. Developers everywhere are building agents that browse the web, write code, send emails, query databases, and coordinate with other agents to accomplish complex multi-step tasks. But the infrastructure underneath these agents is often a mess: a tangle of synchronous REST calls, hard-coded LLM provider integrations, no real security model, no memory architecture, no observability, and no principled way to extend or compose agent behaviors.

OpenClaw, one of the existing platforms in this space, represents a common pattern: a layered monolith where the architecture is essentially a pile of code organized by technical concern rather than by capability. Agents in OpenClaw are static, meaning they are defined in code and cannot be extended or replaced at runtime without modifying and redeploying the core platform. Security is basic, observability is minimal, multi-agent coordination patterns are limited, and the messaging model is synchronous REST -- which is a serious bottleneck for any system that needs to handle many agents working in parallel. The platform runs only on Linux, and there is no support for the Model Context Protocol (MCP), which is rapidly becoming the standard way for AI agents to call external tools.

Octopussy was designed from scratch to address every one of these shortcomings. It is not an incremental improvement; it is a rethinking of what an agentic AI platform should look like when you start from first principles and ask: what would it take to build something that is genuinely production-grade, genuinely extensible, genuinely secure, and genuinely portable?

The answer, as we will see in exhaustive detail throughout this tutorial, is Capability-Centric Architecture -- a design philosophy that treats every functional unit of the system as a self-contained, versioned, dependency-injected Capability, and that enforces this discipline without exception.

Octopussy is currently available as an architecture specification. This means the design is complete, documented, and ready for implementation. What it needs is contributors -- developers who understand the vision, appreciate the architecture, and want to help build something genuinely excellent. If you are reading this tutorial, you are exactly the kind of person the project is looking for.

 

CHAPTER TWO: CAPABILITY-CENTRIC ARCHITECTURE -- THE PHILOSOPHICAL FOUNDATION


To understand Octopussy, you must first understand Capability-Centric Architecture, or CCA, currently at version 0.2. CCA is not just a naming convention or a folder structure. It is a rigorous architectural discipline that governs how every single functional unit in the system is structured, how units communicate with each other, how they are created, how they start up and shut down, and how they evolve over time.

The central concept in CCA is the Capability. A Capability is a self-contained unit of functionality that encapsulates a single, clearly expressible responsibility. Every Capability has three mandatory internal layers, which together form its Nucleus. The first layer is the Essence, which contains pure domain logic with no infrastructure dependencies, no I/O, and no external calls. The second layer is the Realization, which contains the technical mechanisms that the Capability uses to do its work -- database adapters, HTTP clients, queue implementations, and so on. The third layer is the Adaptation, which contains the external interfaces through which other parts of the system interact with the Capability -- REST adapters, gRPC adapters, CLI adapters, and MCP adapters.

This three-layer structure is not optional. Every Capability in Octopussy, without exception, follows it. This discipline ensures that the domain logic of a Capability is always cleanly separated from its infrastructure concerns, which makes testing dramatically easier, makes swapping out infrastructure backends straightforward, and makes the system as a whole much easier to reason about.

Beyond the Nucleus, every Capability has a Contract and an Evolution Envelope. The Contract is the formal specification of the Capability's interface: what it provides to other Capabilities (its Provisions), what it needs from other Capabilities (its Requirements), and the interaction rules and invariants that govern its behavior (its Protocols). The Evolution Envelope carries semantic versioning information, a deprecation policy, and a migration guide, ensuring that the system can evolve over time without breaking existing integrations.

The distinction between Provisions and Requirements is crucial and worth dwelling on. Provisions are the interfaces a Capability exposes to the outside world -- the services it offers. Requirements are the interfaces a Capability consumes from other Capabilities -- its dependencies. The key insight is that Requirements are never fetched by the Capability itself. They are injected into the Capability by a central orchestrator called the OctopussyCapabilityLifecycleManager, or OCLM. This is dependency injection at the architectural level, and it is what makes the system's wiring explicit, testable, and free of hidden coupling.
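To make the injection pattern concrete, here is a minimal sketch of a Capability whose Requirement is wired in from the outside. The class and method names (ConfigurationPort, inject_dependency, StaticConfig) echo the text but are illustrative assumptions, not the real Octopussy API:

```python
# Illustrative sketch of architectural dependency injection; the names and
# signatures are assumptions modeled on the text, not the real platform code.
from typing import Protocol


class ConfigurationPort(Protocol):
    """A Requirement interface that another Capability provides."""
    def get(self, key: str) -> str: ...


class LLMRouterCapability:
    def __init__(self) -> None:
        # Requirements start unset; the Capability never fetches them itself.
        self._config: ConfigurationPort | None = None

    def inject_dependency(self, name: str, port: object) -> None:
        # Called only by the OCLM during lifecycle phase 3.
        if name == "configuration":
            self._config = port  # type: ignore[assignment]

    def routing_strategy(self) -> str:
        if self._config is None:
            raise RuntimeError("dependency not injected yet")  # fail fast
        return self._config.get("llm_router.strategy")


class StaticConfig:
    """A stand-in Provision used here in place of ConfigurationCapability."""
    def __init__(self, data: dict[str, str]) -> None:
        self._data = data

    def get(self, key: str) -> str:
        return self._data[key]


# The OCLM (simulated here) wires the Requirement into the Capability:
router = LLMRouterCapability()
router.inject_dependency("configuration",
                         StaticConfig({"llm_router.strategy": "BALANCED"}))
print(router.routing_strategy())  # BALANCED
```

Because the wiring happens from outside, a test can inject a fake port just as easily as the OCLM injects the real one -- which is exactly the testability benefit the pattern is meant to buy.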

Let us make this concrete with a small illustration. Imagine you are looking at the LLMRouterCapability, which is responsible for selecting the right language model for a given agent. Its Essence contains a RoutingEngine, a HealthChecker, and a FallbackChainResolver -- pure domain logic that knows nothing about HTTP or databases. Its Realization contains a RoutingTableStore that loads routing configuration from a YAML file, and a GPUDetector that figures out what hardware acceleration is available. Its Adaptation contains the LLMRouterAdapter, which exposes the LLMRouterContract to the rest of the system. The LLMRouterCapability's Requirements include a ConfigurationPort (from ConfigurationCapability), a SecretsPort (from SecretsCapability), a PluginContract (from PluginCapability), and an ObservabilityPort (from ObservabilityCapability). None of these are fetched by the LLMRouterCapability itself; they are all injected by the OCLM during the third phase of the six-phase lifecycle.

Speaking of which: the six-phase lifecycle is another cornerstone of CCA. Every Capability Instance passes through exactly these six phases, in order, with no exceptions: instantiate, initialize, inject_dependency, start, runtime, and stop/cleanup. The OCLM computes the correct startup order using Kahn's algorithm for topological sorting, which means it automatically figures out which Capabilities must start before which others based on their declared dependencies. Shutdown happens in strict reverse topological order. This eliminates an entire class of bugs that plague systems where startup and shutdown ordering is implicit or ad hoc.

Here is a simplified diagram of the six-phase lifecycle to make this tangible:

PHASE 1: INSTANTIATE
+-----------------------------------------------------------------+
|  CapabilityFactory.create(CapabilityClass, config)             |
|  Allocates the instance; sets fields to None / defaults        |
|  Does NOT connect to any external resource yet                 |
+-----------------------------------------------------------------+
                          |
                          v
PHASE 2: INITIALIZE
+-----------------------------------------------------------------+
|  capability.initialize()                                       |
|  Loads local configuration from ConfigurationPort              |
|  Sets up internal state; no external connections yet           |
+-----------------------------------------------------------------+
                          |
                          v
PHASE 3: INJECT DEPENDENCY
+-----------------------------------------------------------------+
|  OCLM calls capability.inject_dependency()                     |
|  Resolves all Requirement interfaces                           |
|  Injects concrete port references into the Capability          |
|  This is the ONLY way Capabilities obtain references to others |
+-----------------------------------------------------------------+
                          |
                          v
PHASE 4: START
+-----------------------------------------------------------------+
|  capability.start()                                            |
|  Opens connections, starts background tasks                    |
|  Capability is now fully operational                           |
+-----------------------------------------------------------------+
                          |
                          v
PHASE 5: RUNTIME
+-----------------------------------------------------------------+
|  Normal operation; Capability serves requests                  |
|  Emits metrics, traces, and logs via ObservabilityPort         |
+-----------------------------------------------------------------+
                          |
                          v
PHASE 6: STOP / CLEANUP
+-----------------------------------------------------------------+
|  capability.stop() then capability.cleanup()                   |
|  Closes connections, releases resources                        |
|  Happens in strict reverse topological order                   |
+-----------------------------------------------------------------+

The OCLM enforces this sequence on every Capability without exception. If a Capability tries to skip a phase or access a port before inject_dependency has been called, the system will fail loudly and clearly rather than silently misbehaving. This is by design: Octopussy is built to fail fast and fail visibly, never to swallow errors or produce mysterious behavior.

One more concept deserves attention before we move on: the Capability Registry. The OctopussyCapabilityRegistry is the authoritative catalogue of all Capability Instances in the system. When a Capability is registered, it declares its dependencies. The Registry uses Kahn's algorithm to detect circular dependencies at registration time and rejects them immediately with a CircularDependencyError. This means you will never encounter a situation where two Capabilities are waiting for each other to start -- a classic deadlock scenario that plagues systems with implicit dependency management.
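A toy version of the registration-time cycle check can be sketched in a few lines of Kahn's algorithm. The function name, graph shape, and CircularDependencyError class here are assumptions made for illustration, not the real Registry interface:

```python
# A minimal sketch of Kahn's algorithm as a registry might use it: compute a
# deterministic startup order and reject cycles at registration time.
from collections import deque


class CircularDependencyError(Exception):
    pass


def startup_order(deps: dict[str, set[str]]) -> list[str]:
    """deps maps each Capability to the set of Capabilities it depends on."""
    indegree = {name: len(d) for name, d in deps.items()}
    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order: list[str] = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for other, d in deps.items():
            if name in d:
                indegree[other] -= 1
                if indegree[other] == 0:
                    ready.append(other)
    if len(order) != len(deps):  # leftover nodes => a cycle exists
        raise CircularDependencyError("cycle among remaining capabilities")
    return order


print(startup_order({
    "Secrets": set(),
    "Configuration": {"Secrets"},
    "LLMRouter": {"Secrets", "Configuration"},
}))  # ['Secrets', 'Configuration', 'LLMRouter']
```

Feeding this function a graph such as {"A": {"B"}, "B": {"A"}} raises CircularDependencyError immediately, which is the fail-fast behavior the Registry relies on.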

The Registry is itself a CCA element, which is a nice illustration of the principle that CCA applies to everything, including the infrastructure that manages CCA itself.


CHAPTER THREE: THE SYSTEM-LEVEL CAPABILITY MAP -- TWENTY-SEVEN CAPABILITIES


Octopussy v5.0 consists of exactly twenty-seven Capabilities, organized into six architectural layers. Understanding this map is essential for understanding how the system hangs together, so let us walk through it carefully before we dive into the details of individual Capabilities.

The Infrastructure Layer is the foundation on which everything else rests. It contains SecretsCapability, ConfigurationCapability, ObservabilityCapability, SecurityCapability, and MessagingCapability. These five Capabilities must start before anything else, because every other Capability depends on at least some of them. SecretsCapability starts first because it holds the cryptographic keys and API credentials that every other Capability needs. ConfigurationCapability starts second because it provides the configuration data that every other Capability reads during initialization. ObservabilityCapability starts third, but it is special: it runs as a sidecar, meaning it starts independently and never blocks the startup of other Capabilities. SecurityCapability and MessagingCapability follow.

The Extensibility Layer comes next, containing PluginCapability and, new in v5.0, UserAgentExtensionCapability. PluginCapability is the runtime extensibility engine that allows new tools, LLM adapters, VLM adapters, communication adapters, vector database backends, graph database backends, MCP server plugins, inference engine plugins, prompt pattern plugins, and user-defined agent class plugins to be loaded, hot-reloaded, and unloaded without restarting the system. UserAgentExtensionCapability provides the mechanism by which users can register their own custom agent classes and have them treated as first-class citizens by the rest of the platform.

The Execution Layer contains SandboxCapability, ToolRegistryCapability, and LLMRouterCapability. SandboxCapability provides isolated execution environments for tool calls, using Docker, gVisor, seccomp-BPF, or macOS sandbox depending on the deployment profile. ToolRegistryCapability maintains the catalogue of all available tools and enforces per-agent permission policies. LLMRouterCapability selects the right language model for each agent based on a configurable routing strategy.

The Knowledge Layer contains MemoryCapability, RAGCapability, GraphRAGCapability, TokenBudgetCapability, and PromptEngineCapability. This layer is where agents get their intelligence infrastructure: long-term memory, retrieval-augmented generation from vector databases, graph-based knowledge retrieval, token budget enforcement, and prompt template management.

The Agent Layer contains ActorCapability, AgentLifecycleManagerCapability, AgentFactoryCapability, CritiqueCapability, and SchedulerCapability. This is where agents actually live and breathe. ActorCapability implements the Actor Model for agents, giving each agent its own isolated message queues and asyncio tasks. AgentFactoryCapability creates agents from configuration files. AgentLifecycleManagerCapability tracks the health and state of all running agents. CritiqueCapability provides output quality enforcement through a Critique-and-Revise agentic pattern. SchedulerCapability triggers agents on cron schedules.

The Coordination Layer contains TeamCapability, CommunicationAdapterCapability, UserSessionCapability, and SpeechCapability. TeamCapability orchestrates multi-agent teams using four coordination patterns. CommunicationAdapterCapability bridges between the internal messaging system and external communication channels. UserSessionCapability manages user sessions. SpeechCapability provides speech-to-text and text-to-speech using OpenAI Whisper and Piper TTS.

Finally, the Interface Layer contains OrchestratorCapability, CLICapability, and WebUICapability. OrchestratorCapability is the top-level coordinator that exposes the primary REST API on port 47200. CLICapability provides the command-line interface. WebUICapability provides the web-based user interface with real-time updates via WebSocket.

Here is the complete system-level Capability map, showing all twenty-seven Capabilities and their layer assignments:

OctopussySystem (v5.0.0)
|
+-- [INFRASTRUCTURE LAYER]
|   +-- [1]  SecretsCapability
|   +-- [2]  ConfigurationCapability
|   +-- [3]  ObservabilityCapability          [Sidecar]
|   +-- [4]  SecurityCapability
|   +-- [5]  MessagingCapability
|
+-- [EXTENSIBILITY LAYER]
|   +-- [6]  PluginCapability
|   +-- [15] UserAgentExtensionCapability      [NEW in v5.0]
|
+-- [EXECUTION LAYER]
|   +-- [7]  SandboxCapability
|   +-- [8]  ToolRegistryCapability
|   +-- [9]  LLMRouterCapability
|
+-- [KNOWLEDGE LAYER]
|   +-- [10] MemoryCapability
|   +-- [11] RAGCapability
|   +-- [12] GraphRAGCapability
|   +-- [13] TokenBudgetCapability
|   +-- [14] PromptEngineCapability
|
+-- [AGENT LAYER]
|   +-- [16] ActorCapability
|   +-- [17] AgentLifecycleManagerCapability
|   +-- [18] AgentFactoryCapability
|   +-- [19] CritiqueCapability
|   +-- [20] SchedulerCapability
|
+-- [COORDINATION LAYER]
|   +-- [21] TeamCapability
|   +-- [22] CommunicationAdapterCapability
|   +-- [23] UserSessionCapability
|   +-- [24] SpeechCapability
|
+-- [INTERFACE LAYER]
    +-- [25] OrchestratorCapability
    +-- [26] CLICapability
    +-- [27] WebUICapability

The startup order is computed topologically and is deterministic. It proceeds as follows: SecretsCapability first, then ConfigurationCapability, then ObservabilityCapability (as a non-blocking sidecar), then SecurityCapability, MessagingCapability, PluginCapability, SandboxCapability, ToolRegistryCapability, LLMRouterCapability, MemoryCapability, RAGCapability, GraphRAGCapability, TokenBudgetCapability, PromptEngineCapability, UserAgentExtensionCapability (which must precede ActorCapability and AgentFactoryCapability), ActorCapability, AgentLifecycleManagerCapability, AgentFactoryCapability, CritiqueCapability, SchedulerCapability, TeamCapability, CommunicationAdapterCapability, UserSessionCapability, SpeechCapability, OrchestratorCapability, CLICapability, and finally WebUICapability. Shutdown proceeds in strict reverse order.

This ordering is not arbitrary. It reflects the actual dependency relationships between Capabilities. You cannot start the LLMRouterCapability before the SecretsCapability, because the LLMRouterCapability needs API keys from the SecretsCapability. You cannot start the AgentFactoryCapability before the UserAgentExtensionCapability, because the factory needs to know about user-defined agent classes before it can create agents. The topological sort ensures all of this automatically.


CHAPTER FOUR: THE NUCLEUS IN DETAIL -- HOW A CAPABILITY IS BUILT


Now that we understand the overall map, let us zoom in and look at how an individual Capability is actually structured. We will use a concrete example to make this vivid.

Every Capability follows the same three-layer Nucleus structure. Here is the template, shown abstractly:

CapabilityName
|
+-- Essence
|   +-- [Pure domain logic components]
|   +-- [No I/O, no infrastructure, no external dependencies]
|
+-- Realization
|   +-- [Technical mechanism components]
|   +-- [Database adapters, HTTP clients, queue implementations]
|
+-- Adaptation
    +-- [CapabilityNameAdapter]
        +-- [Implements CapabilityNameContract]
        +-- [Exposes Provisions to other Capabilities via DI]

Let us apply this template to the MemoryCapability, which provides long-term key-value memory and semantic search memory for agents. The Essence contains two components: LongTermMemoryStore, which handles key-value persistence across conversations, and SemanticMemorySearch, which performs embedding-based search over accumulated memory. These are pure domain logic components -- they define what memory does, not how it is stored. The Realization contains four backend implementations: SQLiteMemoryBackend (the default for local and SBC deployments), RedisMemoryBackend (for high-performance caching scenarios), PostgreSQLMemoryBackend (for Docker and Kubernetes deployments), and EmbeddingSearchEngine (which uses sentence-transformers to generate embeddings for semantic search). The Adaptation contains the MemoryAdapter, which exposes the MemoryPort to agents via dependency injection.

This structure has a profound practical consequence: if you want to add a new memory backend -- say, a MongoDB backend -- you add it to the Realization layer without touching the Essence or the Adaptation. The domain logic stays the same, the interface stays the same, and the new backend slots in cleanly. This is the kind of extensibility that is extremely difficult to achieve in a layered monolith but falls out naturally from CCA.

Now let us look at the LLMRouterCapability in similar detail, because it is one of the most interesting and complex Capabilities in the system. Its Essence contains three components: the RoutingEngine, which applies one of five routing strategies (COST, QUALITY, BALANCED, LOCAL, or CLOUD); the HealthChecker, which continuously polls all configured LLM providers to determine which are healthy; and the FallbackChainResolver, which implements the primary-to-fallback-to-error chain that ensures agents always get a response even when their preferred provider is unavailable. The Realization contains the RoutingTableStore, which loads the routing configuration from a YAML file, and the GPUDetector, which detects available GPU backends (CUDA, Metal, or ROCm) and uses this information to prefer local providers when appropriate hardware is available. The Adaptation contains the LLMRouterAdapter, which implements the LLMRouterContract.

The routing strategies deserve a moment of explanation, because they represent a genuinely useful feature that most agentic platforms lack entirely. The COST strategy minimizes API costs by preferring cheaper providers. The QUALITY strategy maximizes output quality by preferring the best available models. The BALANCED strategy finds a middle ground. The LOCAL strategy prefers local providers such as Ollama, LMStudio, vLLM, SGLang, HuggingFace, or MLX, which is essential for privacy-sensitive deployments or edge deployments on devices like the Raspberry Pi or Jetson Nano. The CLOUD strategy prefers cloud providers such as OpenAI, Anthropic, Google Gemini, AWS Bedrock, or Azure OpenAI.

Here is a diagram of the LLM/VLM Router Architecture to make the routing logic visual:

LLMRouterCapability
|
+-- RoutingEngine (Strategy pattern)
|   +-- COST     -- minimize cost; prefer cheaper providers
|   +-- QUALITY  -- maximize quality; prefer best models
|   +-- BALANCED -- balance cost and quality
|   +-- LOCAL    -- prefer local providers
|   |              (Ollama, LMStudio, vLLM, SGLang, HF, MLX)
|   +-- CLOUD    -- prefer cloud providers
|                  (OpenAI, Anthropic, Gemini, Bedrock, Azure)
|
+-- HealthChecker
|   +-- polls all providers continuously
|   +-- marks unhealthy providers; excludes from routing
|
+-- FallbackChainResolver
|   +-- primary provider
|   +-- fallback provider (if primary unhealthy)
|   +-- LLMProviderError (if all providers unhealthy)
|
+-- GPUDetector
    +-- CUDA detection (NVIDIA GPUs)
    +-- Metal detection (Apple Silicon)
    +-- ROCm detection (AMD GPUs)
    +-- informs LOCAL routing decisions
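The Strategy-pattern selection described above can be illustrated with a small sketch. The provider table, cost and quality scores, and the route() function are invented for this example; the real RoutingEngine reads its data from the RoutingTableStore:

```python
# Illustrative Strategy-pattern routing over a hand-made provider table.
# Provider names, scores, and the scoring formulas are assumptions.
from enum import Enum


class RoutingStrategy(Enum):
    COST = "cost"
    QUALITY = "quality"
    BALANCED = "balanced"


PROVIDERS = [
    {"name": "local-ollama", "cost": 0.0, "quality": 0.6, "healthy": True},
    {"name": "cloud-premium", "cost": 1.0, "quality": 0.95, "healthy": True},
    {"name": "cloud-budget", "cost": 0.2, "quality": 0.8, "healthy": False},
]


def route(strategy: RoutingStrategy) -> str:
    # The HealthChecker's job: unhealthy providers never enter the pool.
    healthy = [p for p in PROVIDERS if p["healthy"]]
    if not healthy:
        raise RuntimeError("all providers unhealthy")  # LLMProviderError
    if strategy is RoutingStrategy.COST:
        best = min(healthy, key=lambda p: p["cost"])
    elif strategy is RoutingStrategy.QUALITY:
        best = max(healthy, key=lambda p: p["quality"])
    else:  # BALANCED: quality per unit of cost (epsilon avoids division by 0)
        best = max(healthy, key=lambda p: p["quality"] / (p["cost"] + 0.1))
    return best["name"]


print(route(RoutingStrategy.COST))     # local-ollama
print(route(RoutingStrategy.QUALITY))  # cloud-premium
```

Note how cloud-budget never wins despite its attractive cost: it is marked unhealthy, so the HealthChecker has already excluded it from routing.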

The LLMRouterCapability also supports Vision Language Models (VLMs) through the VLMPort, which is resolved alongside the LLMPort when an agent's specification declares vlm_enabled. This means agents can process images as well as text, using providers like Ollama (for local VLMs), OpenAI (GPT-4V), or Anthropic (Claude's vision capabilities).

One detail that is easy to overlook but critically important: API keys for all LLM and VLM providers are never stored in configuration files. They are always retrieved from the SecretsCapability. This is a hard architectural rule, not a recommendation. The SecretsCapability provides a SecretsPort that other Capabilities use to retrieve secrets by name, and key rotation is supported via SecretsCapability.rotate_secret(). This design ensures that secrets are never accidentally committed to version control, never exposed in log files, and can be rotated without restarting the system.


CHAPTER FIVE: THE ACTOR MODEL AND AGENT ARCHITECTURE


The heart of Octopussy is its agent model. Every agent in the system is an instance of OctopussyAgent, an abstract base class that implements the Actor Model. Understanding the Actor Model is essential for understanding how agents work, so let us take a moment to explain it before diving into the specifics.

The Actor Model is a mathematical model of concurrent computation in which the fundamental unit of computation is an actor -- an entity that has its own private state, communicates with other actors exclusively by sending and receiving messages, and processes messages one at a time from its own mailbox. Actors never share state with each other; all coordination happens through message passing. This model eliminates an entire class of concurrency bugs related to shared mutable state and makes it much easier to reason about the behavior of concurrent systems.

In Octopussy, each OctopussyAgent has two priority queues: an inbound queue (inQ) and an outbound queue (outQ), both implemented as asyncio.PriorityQueue instances. Messages are prioritized using the MessagePriority enum, which has four levels: CRITICAL (priority 0, processed first), HIGH (priority 1), NORMAL (priority 2), and LOW (priority 3). Each agent runs three dedicated asyncio tasks: a message loop that consumes from inQ, an outbound dispatch loop that consumes from outQ and forwards messages to the MessagingCapability, and a heartbeat loop that emits a heartbeat event every thirty seconds.
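A minimal sketch of the two-queue setup shows how asyncio.PriorityQueue yields CRITICAL messages first. The tuple encoding (priority, sequence, message) is an assumption of this sketch -- the sequence counter breaks ties so that equal-priority messages dequeue in FIFO order and never compare payloads directly:

```python
# Sketch of priority-ordered message consumption, assuming messages are
# enqueued as (priority, sequence, payload) tuples. Illustrative only.
import asyncio
import itertools
from enum import IntEnum


class MessagePriority(IntEnum):
    CRITICAL = 0   # processed first
    HIGH = 1
    NORMAL = 2
    LOW = 3


async def main() -> None:
    in_q: asyncio.PriorityQueue = asyncio.PriorityQueue()
    seq = itertools.count()  # tie-breaker for equal priorities

    await in_q.put((MessagePriority.LOW, next(seq), "cleanup"))
    await in_q.put((MessagePriority.CRITICAL, next(seq), "stop"))
    await in_q.put((MessagePriority.NORMAL, next(seq), "task"))

    while not in_q.empty():
        priority, _, payload = await in_q.get()
        print(priority.name, payload)


asyncio.run(main())
# CRITICAL stop
# NORMAL task
# LOW cleanup
```

The real agent runs this consumption inside a long-lived message-loop task alongside the outbound dispatch and heartbeat tasks, but the ordering behavior is the same.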

The message loop is implemented as a Chain of Responsibility, meaning each incoming message passes through a series of processing steps in order. The steps are: verify the HMAC-SHA256 signature (drop the message if invalid), check the TTL (drop the message if expired), check if the message is a CONTROL message (handle it immediately if so), scan the prompt payload for injection attacks (drop the message if an attack is detected), check the token budget (drop the message if the budget is exhausted), and finally call process_task, which is the method that subclasses override to implement their specific behavior.

Here is a diagram of the message processing chain inside an agent:

Inbound OctopussyMessage arrives in inQ
|
v
+-------------------------------------+
|  STEP 1: verify_signature(secret)  |---- FAIL ----> DROP + log warning
+-------------------------------------+
| PASS
v
+-------------------------------------+
|  STEP 2: is_expired()              |---- EXPIRED -> DROP + log warning
+-------------------------------------+
| NOT EXPIRED
v
+-------------------------------------+
|  STEP 3: message_type == CONTROL?  |---- YES -----> _handle_control()
+-------------------------------------+
| NO (TASK / EVENT)
v
+-------------------------------------+
|  STEP 4: scan_prompt(payload)      |---- INJECTION -> DROP + audit log
|          (SecurityPort)            |
+-------------------------------------+
| CLEAN
v
+-------------------------------------+
|  STEP 5: check_budget(agent_id)    |---- EXHAUSTED -> DROP + notify sender
|          (TokenBudgetPort)         |
+-------------------------------------+
| BUDGET OK
v
+-------------------------------------+
|  STEP 6: process_task(message)     |---- ERROR ----> AgentState.ERROR
|          (subclass override)       |                 + DLQ if unrecoverable
+-------------------------------------+
| SUCCESS
v
Result OctopussyMessage.create(...)
--> placed in outQ --> dispatched to MessagingCapability

There are three built-in agent subtypes. The StatelessOneShot agent executes once, returns a result, and terminates. This is the simplest agent type and is appropriate for tasks like answering a single question or performing a single web search. The StatefulLoop agent executes repeatedly until stopped, maintaining state between iterations. This is appropriate for agents that need to monitor something continuously or that engage in extended conversations. The ScheduledTaskDaemon agent wakes on a cron schedule, executes its task, and then sleeps until the next scheduled time. This is appropriate for periodic tasks like generating daily reports or checking for new data.

In addition to these three built-in types, v5.0 introduces the USER_DEFINED agent type, which allows users to register their own custom agent classes through the UserAgentExtensionCapability. We will cover this in detail in Chapter Twelve.

The AgentSpec is the immutable blueprint for an agent. It is a frozen dataclass -- meaning it cannot be modified after creation -- that contains everything the system needs to know about an agent: its unique identifier, name, description, type, execution mode, goal, persona, output format, intelligence specification (which LLM to use and with what parameters), permissions specification (which tools the agent is allowed to use), sandbox specification (resource limits), token budget specification, RAG specification (if the agent uses retrieval-augmented generation), graph RAG specification (if the agent uses graph-based knowledge retrieval), scheduler specification (if the agent runs on a schedule), MCP server specifications (if the agent calls external MCP servers), and tags.
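The frozen-dataclass behavior is easy to demonstrate. The sketch below shows only a handful of fields with abbreviated, assumed names -- the real AgentSpec carries the full list described above:

```python
# Condensed sketch of an immutable agent spec; field names are abbreviated
# assumptions, not the real AgentSpec schema.
import dataclasses
import uuid


@dataclasses.dataclass(frozen=True)
class AgentSpec:
    agent_id: uuid.UUID
    name: str
    agent_type: str
    goal: str


spec = AgentSpec(uuid.uuid4(), "research_agent", "StatelessOneShot",
                 "Summarize recent papers")

try:
    spec.name = "renamed"          # immutable: any mutation is rejected
except dataclasses.FrozenInstanceError as exc:
    print("rejected:", type(exc).__name__)  # rejected: FrozenInstanceError
```

The same FrozenInstanceError guarantee is what makes signed OctopussyMessages tamper-evident, as Chapter Six explains.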

The agent's persona and goal are loaded from Markdown files in a configuration directory, which makes it possible to define agents through configuration rather than code. This is one of the key advantages of Octopussy over OpenClaw: you can create a new agent by writing a few Markdown files and a YAML configuration, without touching any Python code. Here is a minimal example of what an agent configuration directory looks like:

my_research_agent/
|
+-- agent.yaml        (agent metadata: name, type, intelligence, budget, etc.)
+-- persona.md        (who the agent is: "You are a meticulous researcher...")
+-- goal.md           (what the agent does: "Your goal is to find and summarize...")
+-- permissions.md    (what tools the agent can use: web_search, file_write, etc.)

To create this agent from the command line, you would run:

octopussy agent create ./my_research_agent/

The AgentFactoryCapability reads the configuration directory, validates all the fields, builds an AgentSpec, and instantiates the appropriate OctopussyAgent subclass. The agent is then registered with the AgentLifecycleManagerCapability, which tracks its health and state throughout its lifetime.

The agent state machine enforces valid state transitions. An agent starts in the CREATED state, transitions to IDLE when it is ready to receive messages, moves to PROCESSING when it is handling a task, and returns to IDLE when the task is complete. If an error occurs, the agent moves to the ERROR state. The agent can also be PAUSED (temporarily suspended) or STOPPED (permanently terminated). Invalid state transitions are rejected by the state machine, which means you will never find an agent in an inconsistent state.

AGENT STATE MACHINE:

CREATED --> IDLE
IDLE --> PROCESSING --> IDLE
IDLE --> PAUSED --> IDLE
IDLE / PROCESSING --> ERROR
ERROR --> STOPPING --> STOPPED
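A transition-table sketch shows how invalid transitions get rejected. The VALID set below is read off the state machine described above; the real implementation may handle additional edge cases:

```python
# Minimal transition-table sketch of the agent state machine. The transition
# set is derived from the tutorial's diagram and is illustrative.
from enum import Enum, auto


class AgentState(Enum):
    CREATED = auto()
    IDLE = auto()
    PROCESSING = auto()
    PAUSED = auto()
    ERROR = auto()
    STOPPING = auto()
    STOPPED = auto()


VALID = {
    (AgentState.CREATED, AgentState.IDLE),
    (AgentState.IDLE, AgentState.PROCESSING),
    (AgentState.PROCESSING, AgentState.IDLE),
    (AgentState.IDLE, AgentState.PAUSED),
    (AgentState.PAUSED, AgentState.IDLE),
    (AgentState.IDLE, AgentState.ERROR),
    (AgentState.PROCESSING, AgentState.ERROR),
    (AgentState.ERROR, AgentState.STOPPING),
    (AgentState.STOPPING, AgentState.STOPPED),
}


class StateMachine:
    def __init__(self) -> None:
        self.state = AgentState.CREATED

    def transition(self, target: AgentState) -> None:
        if (self.state, target) not in VALID:
            raise ValueError(
                f"invalid transition {self.state.name} -> {target.name}")
        self.state = target


sm = StateMachine()
sm.transition(AgentState.IDLE)
sm.transition(AgentState.PROCESSING)
try:
    sm.transition(AgentState.STOPPED)   # PROCESSING -> STOPPED is invalid
except ValueError as e:
    print(e)
```

Rejecting the transition with an exception, rather than silently ignoring it, matches the platform's fail-fast philosophy.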


CHAPTER SIX: THE OCTOPUSSYMESSAGE -- THE UNIVERSAL COMMUNICATION VEHICLE


In Octopussy, there is exactly one way for agents and Capabilities to communicate with each other: the OctopussyMessage. This is not a convention or a guideline; it is a hard architectural rule. No direct method calls between agents are permitted. No shared state. No global variables. Only messages.

The OctopussyMessage is a frozen dataclass, which means it is immutable by design. Attempting to modify a field after creation raises a FrozenInstanceError. This immutability is not just a nice property; it is a security and correctness guarantee. A message that has been signed cannot be tampered with, because any modification would invalidate the signature.

Let us look at the fields of an OctopussyMessage in detail:

The message_id field is a UUID that uniquely identifies the message. 

The sender_id field is the UUID of the agent that sent the message. The recipient_id field is the UUID of the agent that should receive the message. 

The message_type field is one of four values: TASK (a request for the recipient to do something), RESULT (the output of a completed task), CONTROL (a system command such as pause, resume, or stop), or EVENT (a notification that something has happened, such as a heartbeat). 

The payload field is a dictionary of JSON-serializable values that carries the actual content of the message. 

The priority field determines how urgently the message should be processed, using the four-level MessagePriority enum. 

The trace_id field is a UUID that correlates all messages belonging to the same distributed trace, enabling end-to-end observability across multi-agent workflows. 

The created_at field is a Unix timestamp that records when the message was created. 

The ttl_seconds field specifies how long the message is valid; messages that arrive after their TTL has expired are dropped. 

The signature field is a mandatory HMAC-SHA256 signature computed over the message's key fields using a secret key retrieved from the SecretsCapability.

The signature is never Optional. It is always bytes. This is a deliberate design choice that eliminates an entire class of security vulnerabilities: you cannot accidentally create an unsigned message, because the OctopussyMessage.create() factory method always computes and attaches the signature. If you try to pass a non-bytes secret, the factory method raises a TypeError immediately.
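
The mechanics of immutability are plain Python. The following sketch uses a much-simplified field set (the real OctopussyMessage carries all the fields listed above); it shows that any attempt to mutate a frozen dataclass raises FrozenInstanceError.

```python
import dataclasses
import uuid


@dataclasses.dataclass(frozen=True)
class Message:
    message_id: str
    sender_id: str
    payload: dict


msg = Message(message_id=str(uuid.uuid4()),
              sender_id="agent-1",
              payload={"task": "summarize"})

# Mutation after creation is rejected by the dataclass machinery.
try:
    msg.sender_id = "attacker"
except dataclasses.FrozenInstanceError:
    print("mutation rejected")
```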

The HMAC signature is computed as follows:

# Conceptual: each field is stringified before hashing
sig_data = "".join(
    str(field) for field in (
        message_id, sender_id, recipient_id, message_type,
        created_at, ttl_seconds, priority,
    )
).encode("utf-8")

signature = hmac.new(secret, sig_data, hashlib.sha256).digest()

Verification uses hmac.compare_digest(), which is a constant-time comparison function that prevents timing attacks. A timing attack is a subtle security vulnerability where an attacker can determine whether a guess is correct by measuring how long the comparison takes; constant-time comparison eliminates this attack vector.
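
The scheme above can be sketched end to end in a few lines. The field values here are illustrative, and the real secret comes from the SecretsCapability; the sign/verify helper names are hypothetical.

```python
import hashlib
import hmac


def sign(secret: bytes, fields: tuple) -> bytes:
    # Mirrors the factory method: a non-bytes secret is rejected at once.
    if not isinstance(secret, bytes):
        raise TypeError("secret must be bytes")
    sig_data = "".join(str(f) for f in fields).encode("utf-8")
    return hmac.new(secret, sig_data, hashlib.sha256).digest()


def verify(secret: bytes, fields: tuple, signature: bytes) -> bool:
    # Constant-time comparison prevents timing attacks.
    return hmac.compare_digest(sign(secret, fields), signature)


secret = b"demo-secret"
fields = ("msg-1", "agent-a", "agent-b", "TASK", 1743300000, 300, 2)
sig = sign(secret, fields)
```

Any change to a signed field, however small, produces a different digest, so tampering is detected on verification.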

The Dead Letter Queue (DLQ) is the safety net for messages that cannot be delivered. If a message fails delivery after three consecutive attempts, it is stored in the DLQ rather than being silently discarded. Every DLQ entry contains the full original OctopussyMessage, the failure reason, the attempt count, and the timestamp of the last failure. DLQ entries are persisted in SQLite and survive system restarts. They can be inspected and replayed via the CLI or the REST API:

octopussy task list --dlq
octopussy task replay <dlq_entry_id>
octopussy task discard <dlq_entry_id>

This is a critical reliability feature. In a production agentic system, messages that fail delivery are not just inconvenient; they may represent work that needs to be done, decisions that need to be made, or alerts that need to be acted upon. The DLQ ensures that no work is ever silently lost.
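
The retry-then-DLQ behavior can be sketched as follows. This is a conceptual illustration, not the MessagingCapability implementation: the entry fields mirror the description above (original message, failure reason, attempt count, last-failure timestamp), while deliver_with_dlq and the in-memory list are stand-ins for the real delivery path and SQLite persistence.

```python
import time

MAX_ATTEMPTS = 3
dead_letter_queue: list[dict] = []


def deliver_with_dlq(message: dict, deliver) -> bool:
    """Try delivery up to MAX_ATTEMPTS times; park failures in the DLQ."""
    last_error = None
    for _attempt in range(MAX_ATTEMPTS):
        try:
            deliver(message)
            return True
        except Exception as exc:
            last_error = exc
    # Never silently discard: record everything needed for replay.
    dead_letter_queue.append({
        "message": message,
        "reason": str(last_error),
        "attempts": MAX_ATTEMPTS,
        "last_failure_at": time.time(),
    })
    return False
```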


CHAPTER SEVEN: SECURITY -- ZERO-TRUST FROM THE GROUND UP


Security in Octopussy is not an afterthought. It is a foundational design principle that permeates every layer of the system. The security model is zero-trust by default, meaning that every message is signed, every call is authenticated and authorized, and the absence of an explicit permission is treated as a denial.

The zero-trust enforcement chain has ten layers, and every inbound request passes through all of them in order. Let us walk through the chain:

The first layer is mTLS verification at the transport level. All gRPC communication uses mutual TLS, meaning both the client and the server must present valid certificates. This ensures that only authorized components can communicate with each other at the network level.

The second layer is JWT authentication. Every REST request must carry a valid JWT bearer token. The SecurityCapability validates the token and extracts the user identity.

The third layer is TOTP verification for elevated roles. Users with SUPERUSER or ADMIN roles must provide a valid Time-based One-Time Password in addition to their JWT token. This two-factor authentication requirement ensures that even if a JWT token is compromised, an attacker cannot perform privileged operations without also having access to the TOTP device.

The fourth layer is RBAC authorization. The SecurityCapability checks whether the authenticated user has permission to perform the requested action on the requested resource. Permissions are explicit allowlists: if a permission is not explicitly granted, it is denied. There is no implicit permission, no default allow, no "close enough."

The fifth layer is HMAC-SHA256 signature verification on the OctopussyMessage. As described in the previous section, every message carries a mandatory signature, and messages with invalid signatures are dropped immediately.

The sixth layer is TTL checking. Messages that have expired are dropped.

The seventh layer is prompt injection scanning. Before any message payload is passed to an LLM, the SecurityCapability scans it for prompt injection attacks -- attempts by malicious content in the payload to override the agent's instructions. Messages that contain injection attempts are dropped and an audit log event is generated.

The eighth layer is token budget checking. Before every LLM call, the TokenBudgetCapability checks whether the agent has sufficient budget remaining. If the budget is exhausted, the message is dropped and the sender is notified.

The ninth layer is sandbox isolation. Every tool execution happens inside a sandboxed subprocess that is isolated from the host system and from other agents. The sandbox enforces resource limits: by default, 1.0 CPU core, 512 MB of memory, 1024 MB of disk, and no network access (network can be explicitly enabled for specific tools that require it). A misbehaving tool in one agent cannot affect another agent or the host system. This is the Bulkhead pattern applied to agent isolation.

The tenth layer is output guardrail validation. Before an agent's output is returned to the caller, the SecurityCapability validates it against a guardrail policy. Outputs that violate the policy are suppressed and an audit log event is generated.

Here is a diagram of the complete zero-trust enforcement chain:

Inbound request (REST/gRPC/WebSocket)
|
+-- [1] mTLS verification (transport layer)
+-- [2] JWT authentication (SecurityCapability.authenticate())
+-- [3] TOTP verification (SUPERUSER/ADMIN roles)
+-- [4] RBAC authorization (deny-by-default)
|
v
OctopussyMessage
|
+-- [5] HMAC-SHA256 signature verification
+-- [6] TTL check (is_expired())
+-- [7] Prompt injection scan (SecurityCapability.scan_prompt())
+-- [8] Token budget check (TokenBudgetCapability)
|
v
Agent processing
|
+-- [9] Sandbox execution (SandboxCapability -- Bulkhead pattern)
|
v
Agent output
|
+-- [10] Guardrail validation (SecurityCapability.validate_output())
+-- HMAC-SHA256 signature on response (OctopussyMessage.create())
|
v
SIGNED RESPONSE DELIVERED TO CALLER
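
The message-level portion of this chain (layers 5 through 8) is essentially an ordered pipeline in which each check can drop the message. Here is a toy sketch of that structure; the check functions and message fields are deliberately simplistic stand-ins for the real SecurityCapability and TokenBudgetCapability logic.

```python
import time


def check_signature(msg: dict) -> bool:          # layer 5 (simplified)
    return msg.get("signature") == b"valid"

def check_ttl(msg: dict) -> bool:                # layer 6
    return time.time() < msg["created_at"] + msg["ttl_seconds"]

def check_injection(msg: dict) -> bool:          # layer 7 (toy heuristic)
    return "ignore previous instructions" not in str(msg["payload"]).lower()

def check_budget(msg: dict) -> bool:             # layer 8
    return msg.get("tokens_remaining", 0) > 0


CHECKS = [check_signature, check_ttl, check_injection, check_budget]


def admit(msg: dict) -> bool:
    # A message must pass every layer, in order; any failure drops it.
    return all(check(msg) for check in CHECKS)
```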

This ten-layer security model is dramatically more comprehensive than what OpenClaw offers. OpenClaw's security is described in the specification as "basic," which in practice means API key authentication with no message signing, no sandbox isolation, no prompt injection scanning, and no output guardrails. For production deployments where agents have access to real tools and real data, this is simply not acceptable.


CHAPTER EIGHT: MULTI-AGENT TEAMS -- FOUR COORDINATION PATTERNS


One of the most exciting features of Octopussy is its support for multi-agent teams. A team is a group of agents that collaborate to accomplish a task that is too complex or too large for a single agent. Octopussy supports four coordination patterns, each implemented by a dedicated engine in the TeamCapability.

Before we describe the patterns, it is worth noting that all inter-agent communication within a team uses OctopussyMessage exclusively, just like communication between individual agents. Team messages are signed, priority-queued, and subject to the same security checks as any other message. The result synthesis strategy -- how the team's final result is assembled from the individual agents' contributions -- is configurable per team.

The first pattern is Coordinator/Worker. In this pattern, a designated coordinator agent receives the overall task, decomposes it into subtasks, dispatches the subtasks to worker agents in parallel, collects the results, and synthesizes a final answer. The synthesis strategy can be one of three options: coordinator (the coordinator agent synthesizes the results using its own LLM), vote (the workers vote on the best result), or concat (the results are concatenated). The maximum number of coordination rounds and the overall timeout are configurable. This pattern is ideal for tasks that can be naturally decomposed into independent subtasks, such as researching multiple topics in parallel.

Here is a diagram of the Coordinator/Worker pattern:

External Task
|
v
+---------------------+
|   COORDINATOR       |  <-- receives task; decomposes into subtasks
|   Agent             |
+-------+-------------+
        |  OctopussyMessage (TASK) x N subtasks
        |
+-------+----------------------------------+
|        |               |                 |
v        v               v                 v
+--------+  +--------+  +--------+  +--------+
| Worker1|  | Worker2|  | Worker3|  | WorkerN|
+---+----+  +---+----+  +---+----+  +---+----+
    |           |           |           |
    | RESULT    | RESULT    | RESULT    | RESULT
    +-----+-----+-----------+-----------+
          |
          v
+---------------------+
|   COORDINATOR       |  <-- synthesizes results
|   (synthesis)       |      strategy: coordinator | vote | concat
+---------------------+
          |
          v
     Final Result
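
The parallel dispatch at the heart of this pattern maps naturally onto asyncio. The sketch below is illustrative only (the worker and coordinate names are not the TeamCapability API): it decomposes a task, fans the subtasks out concurrently with asyncio.gather, and applies the "concat" synthesis strategy.

```python
import asyncio


async def worker(subtask: str) -> str:
    # Stands in for a worker agent's real LLM/tool work.
    await asyncio.sleep(0)
    return f"result({subtask})"


async def coordinate(task: str) -> str:
    # Decompose into subtasks, dispatch in parallel, then synthesize.
    subtasks = [f"{task}#{i}" for i in range(3)]
    results = await asyncio.gather(*(worker(s) for s in subtasks))
    return " | ".join(results)          # "concat" synthesis strategy


final = asyncio.run(coordinate("research"))
```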

The second pattern is Pipeline. In this pattern, agents are arranged in a sequence, and the output of each agent becomes the input to the next. This is a strictly sequential pattern -- there is no parallelism within a pipeline. It is ideal for multi-step processing tasks where each step depends on the output of the previous step, such as: fetch data, clean data, analyze data, generate report.
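
Conceptually, a pipeline is a left fold over the agent sequence: each stage consumes the previous stage's output. In this toy sketch the stage functions stand in for agents.

```python
from functools import reduce


def fetch(data):            # stage 1: acquire raw data
    return data + ["raw"]

def clean(data):            # stage 2: filter out noise
    return [d for d in data if d != "noise"]

def analyze(data):          # stage 3: produce a summary
    return {"items": len(data)}


PIPELINE = [fetch, clean, analyze]


def run_pipeline(initial):
    # Strictly sequential: no parallelism within a pipeline.
    return reduce(lambda acc, stage: stage(acc), PIPELINE, initial)
```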

The third pattern is Mesh plus Blackboard. In this pattern, all agents in the team are peers -- there is no coordinator. Each agent can communicate directly with any other agent, and all agents share access to a Redis-backed blackboard that serves as a shared knowledge store. Agents can read from and write to the blackboard at any time, accumulating shared knowledge as they work. The final result is synthesized from the contents of the blackboard. This pattern is ideal for tasks that require emergent collaboration, where the best approach is not known in advance and agents need to discover it together.

The fourth pattern is Tree. In this pattern, agents are arranged in a hierarchy. A root agent receives the overall task and delegates subtasks to child agents. Child agents may further delegate to grandchild agents, and so on. Results flow upward from leaf agents to their parents, and ultimately to the root, which synthesizes the final result. This pattern is ideal for hierarchical tasks that can be recursively decomposed, such as writing a large document where each chapter is handled by a different agent.

Here is a diagram of the Tree pattern:

External Task
|
v
+----------------------+
|   ROOT Agent         |  <-- decomposes task; delegates to children
+-----------+----------+
            |
    +-------+-------+
    v               v
+--------+       +--------+
| Child1 |       | Child2 |  <-- may further delegate to grandchildren
+---+----+       +---+----+
    |               |
+---+---+       +---+---+
v       v       v       v
Leaf1  Leaf2  Leaf3  Leaf4  <-- leaf agents execute; return results upward
    |       |       |       |
    +---+---+       +---+---+
            |               |
            +-------+-------+
                    |
                    v
+----------------------+
|   ROOT Agent         |  <-- synthesizes all child results
+----------------------+
            |
            v
       Final Result

Teams are configured through YAML files. Here is a minimal example of a coordinator team configuration:

team_id: "auto-generated"
name: "research_team"
pattern: "coordinator"
agents:
  - config_dir: "./coordinator_agent/"
  - config_dir: "./researcher_agent_1/"
  - config_dir: "./researcher_agent_2/"
execution:
  max_rounds: 3
  result_synthesis: "coordinator"
  timeout_seconds: 600

This configuration defines a three-agent team where one agent acts as the coordinator and two agents act as researchers. The coordinator will receive tasks, decompose them into research subtasks, dispatch them to the researchers, and synthesize the results. The team has a maximum of three coordination rounds and a ten-minute timeout.


CHAPTER NINE: KNOWLEDGE INFRASTRUCTURE -- MEMORY, RAG, AND GRAPH RAG


Intelligent agents need more than just a language model. They need memory -- the ability to remember things across conversations and tasks. They need retrieval-augmented generation -- the ability to search through large bodies of knowledge and incorporate relevant information into their responses. And for complex knowledge domains, they need graph-based knowledge retrieval -- the ability to traverse relationships between entities and reason about connected knowledge.

Octopussy provides all three of these capabilities through dedicated Capabilities in the Knowledge Layer.

The MemoryCapability provides two types of memory. Long-term memory is a key-value store that persists across conversations. An agent can store a fact under a key and retrieve it later, even in a completely different session. This is implemented using SQLite by default, with Redis and PostgreSQL as alternatives. Semantic memory search allows an agent to search its accumulated memory using natural language queries, finding relevant memories even when the exact key is not known. This is implemented using sentence-transformers to generate embeddings and vector similarity search to find the most relevant memories.
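
The core idea of semantic memory search is embedding similarity. In Octopussy the embeddings come from sentence-transformers; in the self-contained toy below a trivial bag-of-words vector and cosine similarity stand in, and the memory store and helper names are illustrative.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy embedding: word counts (real system: sentence-transformers).
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


memory = {
    "pref:editor": "the user prefers the vim editor",
    "fact:deploy": "production deploys happen on tuesdays",
}


def semantic_search(query: str) -> str:
    # Find the most similar memory even when the exact key is unknown.
    q = embed(query)
    return max(memory, key=lambda k: cosine(q, embed(memory[k])))
```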

The RAGCapability provides retrieval-augmented generation. An agent can ingest documents into a named collection, and then retrieve the most relevant chunks from that collection in response to a query. The ingestion process chunks the documents, generates embeddings for each chunk, and stores the chunks and their embeddings in a vector database. The retrieval process generates an embedding for the query, performs a vector similarity search, and reranks the results. Collections are isolated per agent, meaning one agent's knowledge base does not bleed into another's.

Octopussy supports five vector database backends for RAG: ChromaDB (the default, running locally on port 8000), PGVector (PostgreSQL with the pgvector extension), Weaviate, Qdrant, and Milvus. The backend is selected through configuration, and switching backends requires only a configuration change -- no code changes.

Here is a small showcase of how RAG works from an agent's perspective. Imagine an agent that has been given a collection of technical documents and needs to answer questions about them. The agent first ingests the documents:

# Conceptual illustration of RAG ingestion (not literal Python API)
await rag_port.ingest(
    collection_name = "technical_docs_agent_42",
    documents       = ["Document 1 content...", "Document 2 content..."],
    metadata        = [{"source": "doc1.pdf"}, {"source": "doc2.pdf"}]
)

Later, when the agent receives a question, it retrieves the most relevant chunks:

# Conceptual illustration of RAG retrieval (not literal Python API)
results = await rag_port.retrieve(
    collection_name = "technical_docs_agent_42",
    query           = "What are the performance characteristics?",
    top_k           = 5
)
# results is a list of the 5 most relevant document chunks

The agent then incorporates these chunks into its prompt to the LLM, giving the LLM access to relevant information that it would not otherwise have. This is the essence of retrieval-augmented generation: augmenting the LLM's knowledge with retrieved information at inference time.

The GraphRAGCapability takes this a step further by providing graph-based knowledge retrieval. Instead of treating knowledge as a flat collection of documents, GraphRAG represents knowledge as a graph of entities and relationships. An agent can query this graph using Cypher (the query language for Neo4j), upsert nodes and edges, and traverse relationships to discover connected knowledge.

Octopussy supports three graph database backends: Neo4j (the default, using the Bolt protocol on port 7687), ArangoDB, and TigerGraph. As with the vector database backends, switching between them requires only a configuration change.
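
What graph traversal buys you over flat retrieval is the ability to follow relationships transitively. The toy sketch below uses an in-memory edge map; in Octopussy the equivalent would be a Cypher query (for example, MATCH (a)-[:DEPENDS_ON*]->(b) RETURN b) issued through the GraphRAG port against Neo4j. The graph contents and function names here are illustrative.

```python
# Entity graph as (node, relation) -> neighbor list.
graph = {
    ("ServiceA", "DEPENDS_ON"): ["ServiceB"],
    ("ServiceB", "DEPENDS_ON"): ["Database"],
}


def traverse(start: str, relation: str) -> list[str]:
    # Follow `relation` edges transitively from `start`.
    found, frontier = [], [start]
    while frontier:
        node = frontier.pop()
        for nxt in graph.get((node, relation), []):
            found.append(nxt)
            frontier.append(nxt)
    return found
```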

The combination of MemoryCapability, RAGCapability, and GraphRAGCapability gives Octopussy agents a rich and flexible knowledge infrastructure that is far beyond what most agentic platforms provide. An agent can remember specific facts in its key-value memory, search through large document collections using semantic similarity, and traverse knowledge graphs to reason about complex relationships -- all in the same workflow.


CHAPTER TEN: THE MODEL CONTEXT PROTOCOL -- TOOL CALLING DONE RIGHT


Tool calling is one of the most important capabilities of an agentic AI system. An agent that can only generate text is limited; an agent that can also search the web, read and write files, execute code, and call external APIs is genuinely useful. Octopussy implements tool calling through the Model Context Protocol (MCP), version 2025-11-25, using the Streamable HTTP transport.

MCP is an open standard for connecting AI models to external tools and data sources. The 2025-11-25 version uses HTTP POST for requests and Server-Sent Events (SSE) for streaming responses, with a required header of MCP-Protocol-Version: 2025-11-25 on all requests. Octopussy uses the mcp Python package (version 1.6 or later) to implement both the client side (for agents calling tools) and the server side (for exposing Octopussy's built-in tools to external systems).

A key design decision in Octopussy is that all tool calls go exclusively through the MCPClientAdapter. There are no LLM-dependent tool calling mechanisms -- no function calling, no tool use APIs specific to particular LLM providers. This is a deliberate choice that ensures tool calling works consistently regardless of which LLM provider is being used, and that the security and sandboxing infrastructure is always applied.

Here is a diagram of the MCP integration architecture:

OctopussyAgent
|
v
MCPClientAdapter
|
+-- validates MCP-Protocol-Version: 2025-11-25 on all requests
+-- sends tool call request via Streamable HTTP POST
+-- streams response via SSE (Server-Sent Events)
|
+-------+---------------------------+
|                                   |
v                                   v
OctopussyMCPServer              External MCP Servers
(built-in tools)                (per MCPServerSpec in AgentSpec)
|
+-- web_search                  Any MCP 2025-11-25 compliant server
+-- web_download                Configured per-agent in agent.yaml
+-- file_read                   URL, auth, and capabilities defined
+-- file_write                  in MCPServerSpec
+-- file_list
+-- file_delete
+-- os_command
+-- python_exec
+-- memory_store
+-- memory_retrieve

Octopussy comes with a built-in catalogue of ten tools, all registered in the ToolRegistryCapability at startup. The web_search tool performs web searches via a configurable search API. The web_download tool downloads and parses web pages, converting HTML to text. The file_read, file_write, file_list, and file_delete tools provide file system access within the paths explicitly allowed by the agent's PermissionsSpec. The os_command tool executes OS shell commands from a restricted allowlist. The python_exec tool executes Python code in an isolated subprocess. The memory_store and memory_retrieve tools provide direct access to the agent's key-value memory.

All of these tools require sandbox execution, meaning they run inside the SandboxCapability's isolated environment. This ensures that even if a tool behaves unexpectedly -- perhaps because it is processing malicious input -- it cannot harm the host system or other agents.

Per-agent tool permissions are enforced through the PermissionsSpec. An agent's configuration specifies which tools it is allowed to use (allowed_tools), which tools it is explicitly forbidden from using (forbidden_tools), and which file system paths it is allowed to access (allowed_paths). Attempting to use a tool that is not in the allowlist results in an immediate denial, not a security exception -- the tool simply does not execute.
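
Deny-by-default permission checking reduces to a few lines. The field names below mirror the PermissionsSpec described above, but the class and helper are an illustrative sketch, not the actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PermissionsSpec:
    allowed_tools: frozenset = frozenset()
    forbidden_tools: frozenset = frozenset()


def tool_permitted(spec: PermissionsSpec, tool: str) -> bool:
    # An explicit forbid always wins; absence from the allowlist
    # is itself a denial -- there is no default allow.
    if tool in spec.forbidden_tools:
        return False
    return tool in spec.allowed_tools
```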

External MCP servers are configured per-agent through MCPServerSpec entries in the agent's configuration. Each MCPServerSpec specifies the server's endpoint URL, an optional API key secret name (retrieved from SecretsCapability), the protocol version (always 2025-11-25), a timeout, and an optional tool name prefix. This allows agents to call any MCP-compliant external tool server, dramatically extending the range of tools available to agents without requiring any changes to the Octopussy core.
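
A configuration entry along these lines could express such an MCPServerSpec. The exact YAML key names below are illustrative assumptions; only the set of fields follows the description above.

```yaml
mcp_servers:
  - name: "search_tools"
    url: "https://mcp.example.com/v1"          # endpoint URL
    api_key_secret: "search_tools_api_key"     # resolved via SecretsCapability
    protocol_version: "2025-11-25"             # always this version
    timeout_seconds: 30
    tool_prefix: "ext_"                        # optional tool name prefix
```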


CHAPTER ELEVEN: OBSERVABILITY -- KNOWING WHAT YOUR AGENTS ARE DOING


In a production agentic system, observability is not optional. When you have dozens or hundreds of agents running in parallel, processing tasks, calling tools, and collaborating in teams, you need to be able to see what is happening, diagnose problems, and understand performance characteristics. Octopussy provides full observability through the ObservabilityCapability, which runs as a sidecar.

The sidecar pattern means that ObservabilityCapability starts independently of the other Capabilities and never blocks their startup. It is always available, even during the startup sequence itself. Every Capability receives an ObservabilityPort via dependency injection, which it uses to emit metrics, traces, and logs.

Octopussy uses OpenTelemetry as its observability framework. OpenTelemetry is the industry standard for distributed tracing, metrics, and logging, and it is supported by virtually every observability platform including Jaeger, Tempo, Prometheus, Grafana, Datadog, and many others. The ObservabilityCapability exports traces and metrics to an OTLP endpoint (configurable, defaulting to localhost:4317) and exposes Prometheus metrics on port 9090.

Every operation in the system is traced. The trace_id field in OctopussyMessage correlates all messages belonging to the same distributed trace, so you can follow a task from the moment it enters the system through every agent that processes it, every tool call it triggers, and every LLM call it makes. This end-to-end traceability is invaluable for debugging complex multi-agent workflows.

The ObservabilityCapability also maintains an immutable audit log in JSONL format. Every security event -- failed authentication, denied authorization, dropped message, detected prompt injection, suppressed output -- is recorded in the audit log. The audit log is append-only, meaning it cannot be modified or deleted, which makes it suitable for compliance and forensic purposes.

The quality attribute targets for observability are ambitious: 100% of operations traced, all metrics exported, and every security event logged. These are not aspirational goals; they are architectural requirements enforced by the design of the system.


CHAPTER TWELVE: THE PLUGIN ARCHITECTURE -- EXTENDING OCTOPUSSY


Octopussy is designed to be extended without modifying its core code. The PluginCapability provides a runtime extensibility engine that supports ten types of plugins: TOOL_PLUGIN (new tools), LLM_ADAPTER_PLUGIN (new LLM providers), VLM_ADAPTER_PLUGIN (new VLM providers), COMM_ADAPTER_PLUGIN (new communication channel adapters), VECTOR_DB_PLUGIN (new vector database backends), GRAPH_DB_PLUGIN (new graph database backends), MCP_SERVER_PLUGIN (new MCP servers), INFERENCE_ENGINE_PLUGIN (new inference engines), PROMPT_PATTERN_PLUGIN (new prompt patterns), and USER_AGENT_PLUGIN (new user-defined agent classes).

Every plugin is described by a plugin.json manifest file. Here is an example manifest for a user-defined agent plugin:

{
  "plugin_id": "my_specialist_agent",
  "plugin_type": "USER_AGENT_PLUGIN",
  "version": "1.0.0",
  "entry_module": "my_package.specialist_agent",
  "agent_type": "specialist",
  "agent_class": "SpecialistAgent",
  "description": "A specialized agent for domain-specific tasks.",
  "schema_version": "5.0",
  "min_octopussy_version": "5.0"
}

The plugin lifecycle has three operations: load, hot-reload, and unload. Loading a plugin validates its manifest, imports its Python module, and registers it with the appropriate Capability. Hot-reloading a plugin replaces the running version with an updated version without restarting the system. Unloading a plugin cleanly removes all its registrations and cleans up its imported module references.

The hot-reload capability is particularly valuable in development and production environments. In development, you can iterate on a tool or agent class and see the changes immediately without restarting the system. In production, you can deploy a bug fix or enhancement to a plugin without taking the system offline.

The UserAgentExtensionCapability, new in v5.0, is specifically designed for user-defined agent classes. It provides the UserAgentRegistryPort, which the AgentFactoryCapability uses to look up and instantiate user-defined agent classes. When a user registers a custom agent class through a USER_AGENT_PLUGIN, that class becomes available to the AgentFactoryCapability as if it were a built-in agent type. The user's agent class must subclass OctopussyAgent and implement the process_task method; everything else -- the message queues, the heartbeat loop, the security checks, the token budget enforcement -- is handled by the base class.

Here is a conceptual illustration of what a user-defined agent class looks like:

# Conceptual illustration of a user-defined agent class
# (subclasses OctopussyAgent, overrides process_task)

class SpecialistAgent(OctopussyAgent):
    """
    A user-defined agent that specializes in a particular domain.
    The base class handles all infrastructure concerns:
    - message queues (inQ, outQ)
    - HMAC verification
    - TTL checking
    - prompt injection scanning
    - token budget enforcement
    - heartbeat emission
    - state machine transitions
    The user only needs to implement process_task().
    """

    async def process_task(self, message: OctopussyMessage) -> OctopussyMessage:
        # Extract the task from the message payload
        task = message.payload["task"]

        # Use the injected LLM port to generate a response
        request = LLMRequest(
            messages=[{"role": "user", "content": task}],
            model=self.spec.intelligence.model,
            system=self.spec.persona,
        )
        completion = await self._llm_port.complete(request)

        # Return the result as a new OctopussyMessage
        return OctopussyMessage.create(
            sender_id    = self.spec.agent_id,
            recipient_id = message.sender_id,
            message_type = MessageType.RESULT,
            payload      = {"result": completion.content},
            secret       = self._hmac_secret,
            trace_id     = message.trace_id,
        )

This is a minimal but complete user-defined agent. The user writes only the process_task method; the entire infrastructure -- security, observability, memory, RAG, tool calling, team coordination -- is provided by the platform.


CHAPTER THIRTEEN: DEPLOYMENT -- FROM RASPBERRY PI TO KUBERNETES


One of Octopussy's most distinctive features is its ability to run unchanged on radically different hardware and infrastructure configurations. The same agent definitions -- the same persona.md, goal.md, agent.yaml, and permissions.md files -- run on a Raspberry Pi and on a Kubernetes cluster without modification. The deployment profile determines which infrastructure backends are used; the agent definitions are profile-agnostic.

There are four deployment profiles: SBC, LOCAL, DOCKER, and KUBERNETES.

The SBC profile targets single-board computers like the Raspberry Pi and Jetson Nano. It uses SQLite for memory storage, an in-memory message bus (asyncio queues, in-process), ChromaDB running locally for RAG, and local-only LLM providers (Ollama, LMStudio, HuggingFace, or MLX). The sandbox uses seccomp-BPF or macOS sandbox for isolation. This profile is designed for resource-constrained environments where simplicity and low overhead are paramount.

The LOCAL profile targets developer workstations. It uses SQLite for memory storage, an in-memory bus or NATS (on localhost:4222), ChromaDB for RAG, Neo4j for graph RAG, and both local and cloud LLM providers. Docker is used for sandboxing. This profile is designed for development and experimentation.

The DOCKER profile targets Docker Compose deployments. It uses PostgreSQL for memory storage, NATS for the message bus, ChromaDB or PGVector for RAG, Neo4j for graph RAG, Redis for caching, and all LLM providers (local and cloud). Docker-in-Docker is used for sandboxing. This profile is designed for team deployments and staging environments.

The KUBERNETES profile targets production clusters. It uses PostgreSQL with high availability and connection pooling for memory storage, a NATS cluster with JetStream persistence for the message bus, PGVector or Weaviate or Qdrant or Milvus for RAG, Neo4j cluster or ArangoDB or TigerGraph for graph RAG, Redis cluster for caching, gVisor (runsc) for the strongest sandbox isolation, and HashiCorp Vault for secrets management. Horizontal scaling is achieved through stateless agents and NATS clustering. This profile is designed for production deployments with high availability and linear throughput scaling.

The network ports used by Octopussy are fixed across all profiles:

47200  REST API (HTTPS/TLS)
47201  gRPC (mTLS)
47202  WebSocket (WSS)
 9090  Prometheus metrics
 4222  NATS
 5672  RabbitMQ AMQP
 4317  OTLP gRPC endpoint
 8000  ChromaDB REST
 7687  Neo4j Bolt

The quality attribute targets for the system are ambitious but achievable: task latency at the 99th percentile below five seconds for local LLMs and below two seconds for cloud LLMs; agent uptime above 99.9%; zero message loss under normal conditions; 100% of messages signed; 100% of prompts scanned; 100% of operations traced; and linear throughput scaling with replica count in Kubernetes.


CHAPTER FOURTEEN: DESIGN PATTERNS -- THE ENGINEERING EXCELLENCE BEHIND THE SCENES


Octopussy is not just well-architected at the macro level; it is also carefully engineered at the micro level, applying a rich set of design patterns to solve specific problems. Understanding these patterns helps you understand why the system behaves the way it does and how to extend it correctly.

In the behavioral category, the Actor Model is applied to OctopussyAgent to achieve message-passing concurrency without shared state. The State Machine pattern is applied to AgentStateMachine to enforce valid state transitions. The Chain of Responsibility pattern is applied to the message loop to create an ordered processing pipeline where each step can accept or reject the message. The Template Method pattern is applied to OctopussyAgent._message_loop(), which defines the fixed algorithm for message processing while allowing subclasses to override process_task() with their specific behavior. The Strategy pattern is applied to the LLMRouterCapability's RoutingEngine to make routing strategies pluggable. The Observer pattern is applied to ConfigurationCapability's FileWatcher to notify other Capabilities of configuration changes. The Command pattern is applied to OctopussyMessage of type CONTROL to encapsulate control commands. The Mediator pattern is applied to MessagingCapability to decouple inter-agent communication.

In the creational category, the Abstract Factory pattern is applied to CapabilityFactory and AgentFactoryCapability to decouple creation from lifecycle management. The Builder pattern is applied to AgentSpecBuilder to enable step-by-step construction of AgentSpec instances. The Factory Method pattern is applied to OctopussyMessage.create() to ensure that all messages are correctly signed at creation time. The Prototype pattern is applied to AgentSpec (a frozen dataclass) to enable immutable spec reuse.
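
A Builder for a frozen spec might look roughly like the following. This is a sketch under our own assumptions: the field set (name, role, tools) and the method names are invented for illustration; only the class names AgentSpec and AgentSpecBuilder come from the specification:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentSpec:
    # Minimal illustrative fields; the real spec is far richer.
    name: str
    role: str
    tools: tuple[str, ...]

class AgentSpecBuilder:
    """Hypothetical step-by-step construction of an immutable AgentSpec."""

    def __init__(self) -> None:
        self._name = ""
        self._role = "worker"
        self._tools: list[str] = []

    def named(self, name: str) -> "AgentSpecBuilder":
        self._name = name
        return self

    def with_role(self, role: str) -> "AgentSpecBuilder":
        self._role = role
        return self

    def with_tool(self, tool: str) -> "AgentSpecBuilder":
        self._tools.append(tool)
        return self

    def build(self) -> AgentSpec:
        if not self._name:
            raise ValueError("an AgentSpec needs a name")
        # Freeze the tool list into a tuple so the spec stays immutable.
        return AgentSpec(self._name, self._role, tuple(self._tools))
```

The pay-off of building a frozen dataclass this way is Prototype-style reuse: once built, a spec can be shared between factories without any defensive copying.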

In the resilience category, the Circuit Breaker pattern is applied to LLMPortAdapter and VLMPortAdapter to prevent cascading failures when an LLM provider is unhealthy. The Retry plus Backoff pattern is applied to LLMPortAdapter to recover from transient failures. The Bulkhead pattern is applied to SandboxCapability to isolate agent failures and prevent cascade failures. The Dead Letter Queue pattern is applied to MessagingCapability to capture failed messages for inspection and replay. The Timeout pattern is applied to all external calls to prevent indefinite blocking.
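
To illustrate how Circuit Breaker and Retry plus Backoff fit together, here is a deliberately small sketch. The threshold, cooldown, and backoff parameters are invented for illustration, not taken from the specification:

```python
import time

class CircuitBreaker:
    """Illustrative circuit breaker: after `threshold` consecutive
    failures the circuit opens and calls are rejected until `cooldown`
    seconds have elapsed, at which point one probe call is let through."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            self.opened_at = None      # half-open: allow a single probe
            self.failures = 0
            return True
        return False

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()

def backoff_delays(base: float = 0.5, factor: float = 2.0,
                   retries: int = 4) -> list[float]:
    """Exponential backoff schedule for retrying a transient failure."""
    return [base * factor**i for i in range(retries)]
```

In the real adapter, the retry loop would sleep through `backoff_delays()` between attempts and consult `allow()` before each call, so a persistently failing provider trips the breaker instead of exhausting retries forever.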

In the agentic category, the ReAct pattern (Reasoning plus Acting) is applied through the ReflectionEngine to enable agents to reason about their actions and adjust their behavior. The Plan-and-Execute pattern is applied through the PromptEngineCapability to enable agents to plan complex multi-step tasks before executing them. The Reflection pattern is applied through the ReflectionEngine to enable agents to critique their own outputs and improve them. The Critique-and-Revise pattern is applied through the CritiqueCapability to enforce output quality. The Blackboard pattern is applied through the MeshBlackboardEngine to enable shared knowledge accumulation in mesh teams.
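
The control flow of Critique-and-Revise is simple enough to show in a few lines. The following sketch abstracts the critic and reviser as plain callables (in Octopussy these roles would be played by LLM calls through the CritiqueCapability; the function names and signature here are our own illustration):

```python
from typing import Callable, Optional

def critique_and_revise(
    draft: str,
    critique: Callable[[str], Optional[str]],
    revise: Callable[[str, str], str],
    max_rounds: int = 3,
) -> str:
    """Illustrative Critique-and-Revise loop: critique() returns None
    when the draft passes, otherwise a criticism that revise() must
    address. max_rounds bounds the loop so it always terminates."""
    for _ in range(max_rounds):
        problem = critique(draft)
        if problem is None:
            break
        draft = revise(draft, problem)
    return draft
```

A toy instantiation: a critic that demands a trailing period, and a reviser that appends one. The same skeleton generalizes to LLM-backed critics enforcing real output guardrails.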

The anti-patterns that Octopussy explicitly avoids are equally instructive. The God Object anti-pattern is avoided by ensuring every Capability has a single, expressible responsibility. Circular dependencies are detected at registration time and rejected immediately. Shared mutable state is avoided by making all messages frozen dataclasses and ensuring state is owned by exactly one Capability. Synchronous blocking I/O is avoided by making all lifecycle and agent methods async def. Magic strings are avoided by using StrEnum and IntEnum for all constants. Direct instantiation of Capabilities is avoided by requiring all Capabilities to be created via CapabilityFactory. Optional security is avoided by making HMAC signatures mandatory (bytes, never Optional) and TOTP mandatory for elevated roles. Swallowed exceptions are avoided by logging all exceptions with full trace context before re-raising or routing to the DLQ. Implicit ordering is avoided by computing and validating startup order before the first initialize() call.


CHAPTER FIFTEEN: HOW TO CONTRIBUTE -- JOINING THE OCTOPUSSY PROJECT

 

Octopussy is currently available as an architecture specification. The design is complete, documented, and production-ready. What it needs is contributors who will turn this specification into working code. This is an unusual and exciting position for an open-source project to be in: the hard thinking has been done, the architecture is sound, and the path forward is clear. What remains is the implementation.

The implementation skeleton is organized into a clear directory structure that mirrors the Capability map. Each Capability has its own directory under octopussy/capabilities/, with subdirectories for contract.py, essence/, realization/, and adaptation/. The core infrastructure -- enums, message types, port interfaces, registry, lifecycle manager, and capability factory -- lives under octopussy/core/.
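
Laying out that skeleton for one Capability takes only a few commands. The following sketch uses a hypothetical "memory" Capability as the example; substitute any Capability name from the map:

```shell
# Create the core directory and the per-Capability skeleton described
# above, using a hypothetical "memory" Capability as an example.
mkdir -p octopussy/core
mkdir -p octopussy/capabilities/memory/essence
mkdir -p octopussy/capabilities/memory/realization
mkdir -p octopussy/capabilities/memory/adaptation
touch octopussy/capabilities/memory/contract.py
ls octopussy/capabilities/memory
```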

There are approximately forty implementation tasks, each corresponding to a specific file or set of files in the project structure. The tasks range from relatively straightforward (implementing the enums in core/enums.py or the OctopussyMessage in core/message.py) to more complex (implementing the LLMRouterCapability with its routing engine, health checker, and fallback chain resolver, or implementing the TeamCapability with its four coordination pattern engines).

The key invariants that every contributor must enforce are: all Capabilities use async def everywhere with no blocking I/O; all messages are OctopussyMessage instances with frozen=True and mandatory HMAC signatures; all ports are injected via inject_dependency() only, never fetched via a service locator; all tool calls go through MCPClientAdapter only with the MCP-Protocol-Version: 2025-11-25 header; all secrets are retrieved from SecretsCapability only and never stored in configuration files; all Capabilities are created via CapabilityFactory only; security is deny-by-default with HMAC bytes never Optional and TOTP mandatory for SUPERUSER and ADMIN roles; and the DLQ never silently discards messages but persists them in SQLite and survives restarts.
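
Two of these invariants -- frozen messages and mandatory HMAC bytes -- can be made visible in a few lines of Python. This is a sketch under our own simplifications (two payload fields, a colon-joined signing input); the real OctopussyMessage has many more fields, but the structural point stands: with `signature: bytes` and `frozen=True`, an unsigned or mutated message cannot even be constructed:

```python
import hashlib
import hmac
from dataclasses import dataclass

@dataclass(frozen=True)
class OctopussyMessage:
    """Sketch: the signature is plain bytes, never Optional, so every
    instance carries a signature by construction."""
    sender: str
    payload: str
    signature: bytes

    @classmethod
    def create(cls, sender: str, payload: str, key: bytes) -> "OctopussyMessage":
        # Factory Method: signing happens at creation time, not later.
        sig = hmac.new(key, f"{sender}:{payload}".encode(),
                       hashlib.sha256).digest()
        return cls(sender, payload, sig)

    def verify(self, key: bytes) -> bool:
        expected = hmac.new(key, f"{self.sender}:{self.payload}".encode(),
                            hashlib.sha256).digest()
        # Constant-time comparison to avoid timing side channels.
        return hmac.compare_digest(self.signature, expected)
```

Attempting to assign to any field of a frozen dataclass raises FrozenInstanceError, which is exactly the shared-mutable-state protection the invariants demand.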

The open-source stack is entirely based on well-established, actively maintained Python libraries. Python 3.11 or later is required. The key dependencies include FastAPI for the REST API, asyncio for async I/O, NATS.io (nats-py) for the message bus, APScheduler for the scheduler, ChromaDB for the default vector database, Neo4j for the default graph database, sentence-transformers for embeddings, OpenTelemetry SDK for observability, Prometheus client for metrics, Docker SDK for sandboxing, OpenAI Whisper for speech-to-text, Piper TTS for text-to-speech, and the mcp package (version 1.6 or later) for the Model Context Protocol.

If you are a developer who wants to contribute, the best place to start is with the core infrastructure: the enums, the OctopussyMessage, the port interfaces, the Capability Registry, the OCLM, and the CapabilityFactory. These are the foundation on which everything else rests, and getting them right is essential. Once the core infrastructure is in place, the individual Capabilities can be implemented in topological order, starting with SecretsCapability and working upward through the layers.
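
"Topological order" here is not hand-waving: Python's standard library ships graphlib.TopologicalSorter, which computes exactly this ordering and raises CycleError on circular dependencies, matching the registration-time cycle check described earlier. The dependency edges below are hypothetical examples, not the specification's actual dependency graph:

```python
from graphlib import TopologicalSorter  # stdlib since Python 3.9

# Hypothetical dependency edges: each Capability maps to the set of
# Capabilities it depends on. SecretsCapability depends on nothing,
# so it must come first in any valid startup order.
deps: dict[str, set[str]] = {
    "SecretsCapability": set(),
    "ConfigurationCapability": {"SecretsCapability"},
    "MessagingCapability": {"ConfigurationCapability"},
    "LLMRouterCapability": {"ConfigurationCapability", "SecretsCapability"},
}

# static_order() raises graphlib.CycleError if the graph has a cycle,
# which is how circular dependencies can be rejected at registration.
startup_order = list(TopologicalSorter(deps).static_order())
```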

If you are an architect or technical lead who wants to contribute at a higher level, the specification itself is the contribution surface. The architecture is documented in exhaustive detail, but there are always opportunities to refine, clarify, and extend the design as implementation reveals edge cases and opportunities for improvement.

If you are a domain expert -- in security, observability, LLM providers, vector databases, graph databases, or any of the other technical areas that Octopussy touches -- your expertise is invaluable for ensuring that the implementation of specific Capabilities meets the highest standards in your area.

And if you are simply curious and enthusiastic about agentic AI, the best thing you can do is read the specification, understand the architecture, and start experimenting. The installation is intentionally frictionless:

pip install octopussy
octopussy install
octopussy start
octopussy status

(Note: as of the writing of this tutorial, Octopussy is in the specification phase and the above commands represent the intended installation experience once the implementation is complete. Contributors are needed to make this a reality.)


CHAPTER SIXTEEN: OCTOPUSSY VERSUS OPENCLAW -- A DETAILED COMPARISON


Throughout this tutorial, we have made several references to OpenClaw, the existing platform that Octopussy is designed to supersede. Let us now make the comparison explicit and systematic, so you can understand exactly what Octopussy offers that OpenClaw does not.

In terms of architecture, OpenClaw uses a layered monolith, meaning its components are organized by technical concern (data layer, business layer, presentation layer) rather than by capability. This makes it difficult to replace individual components, hard to test in isolation, and prone to the God Object anti-pattern. Octopussy uses CCA 0.2 with its Nucleus/Contract/Envelope structure, which gives every component a single responsibility, a formal interface, and a versioning policy.

In terms of the agent model, OpenClaw uses static agents that are defined in code and cannot be extended or replaced at runtime. Octopussy uses a dynamic Actor Model with AgentFactory and a user-extensible agent framework, allowing new agent types to be registered and instantiated at runtime without modifying core code.

In terms of installation, OpenClaw requires a complex, multi-step installation process. Octopussy is designed for frictionless installation with three commands.

In terms of agent creation, OpenClaw is code-first, meaning you must write Python code to define an agent. Octopussy is configuration-first, meaning you can define an agent by writing Markdown and YAML files, with code-first as an option for advanced use cases.

In terms of LLM support, OpenClaw supports a limited set of LLM providers. Octopussy supports all major local and cloud providers, with GPU-aware routing and fallback chains.

In terms of security, OpenClaw's security is basic. Octopussy implements a ten-layer zero-trust security model with HMAC-SHA256 message signing, mTLS, JWT authentication, TOTP for elevated roles, RBAC, prompt injection scanning, output guardrails, and sandbox isolation.

In terms of multi-agent coordination, OpenClaw supports limited patterns. Octopussy supports four coordination patterns (Coordinator/Worker, Pipeline, Mesh plus Blackboard, and Tree) with configurable result synthesis strategies.

In terms of messaging, OpenClaw uses synchronous REST. Octopussy uses fully asynchronous Actor queues with priority ordering, gRPC with mTLS, and a NATS event bus.

In terms of observability, OpenClaw's observability is minimal. Octopussy provides full OpenTelemetry integration with distributed tracing, Prometheus metrics, structured logging, and an immutable audit log.

In terms of token budget management, OpenClaw has none. Octopussy provides per-agent, multi-period (hourly, daily, monthly), cost-aware token budget enforcement.

In terms of MCP support, OpenClaw has none. Octopussy provides full MCP 2025-11-25 (Streamable HTTP) support, both as a server (exposing built-in tools) and as a client (calling external MCP servers).

In terms of platform support, OpenClaw runs only on Linux. Octopussy runs on macOS, Windows, Linux, single-board computers (Raspberry Pi, Jetson Nano), Docker containers, and Kubernetes clusters.

In terms of extensibility, OpenClaw is hard-coded. Octopussy provides a full plugin architecture with ten plugin types, hot-reload support, and a user agent extension framework.

This comparison makes clear that Octopussy is not a marginal improvement over OpenClaw. It is a fundamentally different and fundamentally better platform, designed from the ground up to address the real needs of production agentic AI deployments.


CONCLUSION: THE INVITATION


Octopussy is a genuinely ambitious project. It aims to be the operating system for autonomous AI agents: a platform that handles all the infrastructure concerns -- lifecycle, routing, memory, security, sandboxing, observability, extensibility, multi-agent coordination -- so that developers can focus exclusively on what their agents should do and how they should behave.

The architecture is sound. The design is complete. The specification is detailed enough that a skilled developer can implement any Capability from it without ambiguity. The open-source stack is mature, well-maintained, and widely understood. The quality attribute targets are ambitious but achievable.

What Octopussy needs now is the community of developers, architects, and AI enthusiasts who will bring it to life. If you have read this tutorial and found yourself thinking "I could implement that" or "I know exactly how to make that work" -- then you are exactly the kind of contributor the project is looking for.

The architecture of Octopussy reflects a deep respect for software engineering principles: clean separation of concerns, explicit dependency management, immutable data, zero-trust security, comprehensive observability, and rigorous testing. These are not just nice properties; they are the foundation of a system that can be trusted to run autonomously in production, handling real tasks with real consequences.

The name "Octopussy" is, of course, a nod to the fact that the system has many arms -- twenty-seven Capabilities, each reaching into a different domain of functionality -- all coordinated by a central intelligence. Like its biological namesake, Octopussy is adaptable, extensible, and surprisingly sophisticated beneath its surface.

The project is open-source. The specification is public. The invitation is open. Come build something excellent.

========================================================================== END OF TUTORIAL
