Friday, March 06, 2026

Design Guide for Building Structured, Evolvable, and Governable Multi-Agent Systems using Capability-Centric Architecture





A note on model names: Model identifiers used throughout this article (e.g., GPT-5.3, Claude 4.6 Opus, Gemini 3.1 Pro) are illustrative placeholders representing the class of frontier models available at the time of writing. Substitute the actual identifiers available in your environment. All architectural principles are model-agnostic.

Python version: All code requires Python ≥ 3.10 for X | Y union syntax. Add from __future__ import annotations if targeting Python 3.9.


Preface: Why Architecture Matters More Than Ever in the Age of Agentic AI

There is a temptation, when working with the remarkable AI models available today, to believe that intelligence is sufficient. That if you can get a frontier model to reason correctly about a problem, the engineering scaffolding around it is a secondary concern — something you can figure out later, once the demos are impressive enough. This temptation is understandable, and it is also one of the most reliable ways to build a system that will collapse under its own weight the moment it reaches production.

The reason is straightforward: agentic AI systems are not just intelligent — they are complex. They are composed of multiple cooperating agents, each with different roles, different model backends, different tool dependencies, different latency requirements, and different rates of change. A code-generation agent powered by GPT-5.3 may need to be replaced with Gemini 3.1 Pro next quarter when benchmarks shift. A requirements-analysis agent may need to be upgraded when task complexity grows. A new documentation agent may need to be added to the pipeline without disturbing the agents already running in production. A monitoring agent may need to observe the behavior of all other agents without any of them knowing it exists. And crucially — your team may want to run the entire pipeline locally on an NVIDIA GPU workstation, an Apple Silicon Mac, an AMD ROCm server, or an Intel accelerator, without changing a single line of business logic.

None of these operations are trivial if the system was built without a coherent architectural framework. If agents are wired together through direct, concrete references; if their input and output formats are implicit conventions rather than formal contracts; if their startup order is determined by hope rather than dependency analysis; if their versioning is managed through comments in a configuration file — then every one of those operations becomes a painful, risky, and time-consuming exercise in archaeology.

This article argues that Capability-Centric Architecture (CCA) is the right framework for building agentic AI systems that are not merely impressive in demos but genuinely maintainable, evolvable, and governable in production. CCA was designed to address exactly the kind of complexity that multi-agent systems exhibit: diverse components with heterogeneous requirements, intricate dependency relationships, and the need to evolve rapidly without breaking what already works.

We will work through every major concept in CCA — the Capability as the fundamental unit of structure, the Capability Nucleus with its three internal layers, Capability Contracts with their Provisions, Requirements, and Protocols, the Capability Registry as the authoritative coordination hub, the Capability Lifecycle Manager as the orchestrator of system startup and shutdown, Evolution Envelopes as the formal mechanism for managing change over time, and Efficiency Gradients as the tool for matching implementation depth to operational criticality. For each concept, we explain the theory in depth, connect it explicitly to the challenges of agentic AI, and provide concrete, runnable Python code.


Chapter 1 — The Structural Problem with Unstructured Agentic AI

Before we can appreciate the solution, we need to be precise and honest about the problem. It is tempting to describe the problem in vague terms — "it gets complicated," "it doesn't scale," "it's hard to maintain" — but these descriptions are too abstract to be useful. Let us instead describe the specific, concrete failure modes that emerge when agentic AI systems are built without architectural discipline.

1.1 — The Problem of Concrete Coupling

The most common pattern in early agentic AI systems is direct, concrete coupling between components. The orchestrator agent imports the code-generation agent directly. The code-generation agent calls the review agent by instantiating it directly. The review agent knows the exact class name, constructor signature, and method names of the test-generation agent.

This seems harmless at first. It is fast to write, easy to understand in isolation, and produces working demos quickly. The problem emerges the moment you need to change anything. If you want to swap the code-generation agent from GPT-5.3 to Gemini 3.1 Pro, you must find every place in the codebase where the GPT-5.3-backed agent is referenced and update it. If the new agent has a slightly different output format — perhaps it returns a structured object instead of a raw string — every downstream agent that consumes that output must be updated as well. A change that should be local becomes a system-wide refactoring exercise.

CCA addresses this through Capability Contracts: formal, interface-oriented agreements that define what a Capability provides and what it requires, completely independently of how it is implemented. The orchestrator does not know whether the code-generation Capability uses GPT-5.3 or Gemini 3.1 Pro or a locally-running Llama 3 model. It only knows the Contract: given a structured task specification, the Capability will return a structured code artifact. The implementation can change freely as long as the Contract remains stable.

1.2 — The Problem of Implicit Interfaces

Even when agents are not directly coupled at the class level, they are often coupled through implicit interface conventions. One agent produces a Python dictionary with certain keys; the next agent expects those exact keys. One agent returns a list of strings; the next agent assumes the list is always non-empty. These conventions exist only in the minds of the developers who wrote the code and, perhaps, in a README file that is already six months out of date.

Implicit interfaces are invisible until they break. When they break, the failure is often mysterious — a KeyError deep in an agent's processing logic, a None dereference in a downstream consumer, a silent data corruption that produces subtly wrong outputs for days before anyone notices. Debugging these failures requires understanding the entire pipeline, not just the component that raised the exception.

CCA addresses this through the formal definition of Provisions and Requirements within Capability Contracts. Every interface is explicit, typed, and documented. Every dependency is declared. The system can verify at startup — before any agent processes a single token — that every declared Requirement has a corresponding Provision. Implicit interfaces become impossible by construction.
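The startup-time check just described can be sketched in a few lines. The names CodeGenerationContract, TelemetryContract, and verify_wiring below are illustrative placeholders, not part of the pipeline code built in later chapters; the real Registry performs this matching over registered Contracts.

```python
# Minimal sketch of startup-time wiring verification: every mandatory
# Requirement must be matched by some Provision before any agent runs.

class CodeGenerationContract:   # stands in for a Provision interface
    ...

class TelemetryContract:        # an optional dependency nobody provides
    ...

# Interfaces that some Capability provides:
provisions = {CodeGenerationContract}

# (interface, optional) pairs that some Capability requires:
requirements = [
    (CodeGenerationContract, False),
    (TelemetryContract, True),
]

def verify_wiring(provisions: set[type],
                  requirements: list[tuple[type, bool]]) -> list[type]:
    """Return every mandatory Requirement with no matching Provision."""
    return [iface for iface, optional in requirements
            if not optional and iface not in provisions]

# TelemetryContract is optional, so this configuration passes the check:
assert verify_wiring(provisions, requirements) == []
```

If the returned list is non-empty, the system refuses to start — turning what would have been a runtime KeyError deep inside an agent into a clear, immediate startup failure.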

1.3 — The Problem of Unmanaged Startup Order

Multi-agent systems have dependencies. The orchestrator depends on all specialist agents being ready before it can delegate work. The code-generation agent may depend on a tool-registry agent that provides access to external APIs. The monitoring agent may depend on a logging infrastructure agent that must be initialized first.

In unstructured systems, startup order is typically managed through one of two approaches: either everything is started simultaneously and race conditions are handled through retry logic and timeouts, or startup order is hardcoded in a configuration file that must be manually maintained as the dependency graph evolves. Both approaches are fragile. Race conditions produce intermittent failures that are notoriously difficult to reproduce and debug. Manually maintained startup sequences inevitably drift out of sync with the actual dependency graph, especially in rapidly evolving systems.

CCA addresses this through the Capability Lifecycle Manager, which queries the Capability Registry to build a complete dependency graph of all registered Capabilities, performs a topological sort to determine the correct initialization order, and then brings each Capability online in that order, injecting dependencies as they become available. The startup sequence is not configured manually — it is computed automatically from the formal dependency declarations in each Capability's Contract.
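The computation the Lifecycle Manager performs can be illustrated with the standard library's graphlib, which also raises CycleError if the declared Contracts ever form a circular dependency. The deps mapping below is an illustrative rendering of our pipeline's dependency graph, not output of the real Registry:

```python
from graphlib import TopologicalSorter

# Each Capability maps to the set of Capabilities it requires —
# exactly the information the Registry derives from Requirement declarations.
deps = {
    "Orchestrator":         {"RequirementsAnalysis", "CodeGeneration",
                             "CodeReview", "TestGeneration", "Documentation"},
    "CodeGeneration":       {"RequirementsAnalysis"},
    "CodeReview":           {"CodeGeneration"},
    "TestGeneration":       {"CodeGeneration"},
    "Documentation":        {"CodeGeneration"},
    "RequirementsAnalysis": set(),
}

# static_order() yields every node after all of its dependencies.
startup_order = list(TopologicalSorter(deps).static_order())

assert startup_order[0] == "RequirementsAnalysis"   # no dependencies
assert startup_order[-1] == "Orchestrator"          # depends on everything
```

Because the order is computed rather than configured, adding a new Capability with new dependencies automatically produces a correct startup sequence.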

1.4 — The Problem of Uncontrolled Evolution

AI models evolve rapidly. Each new model release brings new capabilities, new output formats, new context window sizes, and new pricing structures. An agentic AI system designed around a specific set of models will need to evolve continuously to take advantage of improvements and to adapt to changes in the model landscape.

Without a formal mechanism for managing this evolution, every model upgrade is a potential breaking change. Downstream agents that depend on the output format of an upgraded agent may break silently. Consumers that were written against an older version of an agent's interface may continue to work for a while — until some edge case triggers the incompatibility — and then fail in production in ways that are difficult to diagnose.

CCA addresses this through Evolution Envelopes: formal structures that encapsulate the versioning information, deprecation policies, and migration paths for every Capability. When a Capability's Contract changes in a backward-incompatible way, the Evolution Envelope provides a structured, documented migration path for every consumer. Evolution becomes explicit, predictable, and manageable rather than chaotic and surprising.

1.5 — The Problem of Hardware Lock-in

A less-discussed but increasingly important problem is inference backend lock-in. Teams that build agentic systems exclusively around cloud APIs find themselves unable to run workloads locally for cost, latency, privacy, or air-gap compliance reasons. Conversely, teams that build around a specific local inference library find themselves unable to switch backends without rewriting large portions of their codebase.

CCA addresses this through the LLM Port abstraction in the Realization layer. The Essence of every agent communicates only with an abstract LLMPort interface. The concrete implementation of that interface — whether it calls OpenAI's API, runs llama.cpp on CUDA, uses Apple's MLX framework, or invokes Intel OpenVINO — is a Realization detail that can be swapped without touching any other layer.
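The port pattern described here can be sketched as a Python Protocol. The LLMPort name matches the module structure shown later; the EchoRealization and summarize function are stand-ins invented for this sketch, not part of the real pipeline:

```python
from typing import Protocol

class LLMPort(Protocol):
    """Abstract inference port: the only LLM interface the Essence sees."""
    def complete(self, prompt: str, max_tokens: int = 1024) -> str: ...

class EchoRealization:
    """Stand-in Realization; a real one would call a cloud API,
    llama.cpp on CUDA, Apple MLX, or Intel OpenVINO."""
    def complete(self, prompt: str, max_tokens: int = 1024) -> str:
        return f"[echo] {prompt}"

def summarize(task: str, llm: LLMPort) -> str:
    # Essence-layer logic: it depends only on the abstract port, so
    # swapping inference backends never touches this code.
    return llm.complete(f"Summarize: {task}")

print(summarize("add retry logic", EchoRealization()))
```

Structural typing does the rest: any backend exposing a matching complete method satisfies the port, with no shared base class and no import of a concrete Realization from the Essence.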


Chapter 2 — The Example System: An Agentic Software Engineering Pipeline

Throughout this article, we build a concrete, running example: an Agentic Software Engineering Pipeline. This is a multi-agent system that accepts a natural-language feature request from a developer and autonomously produces reviewed, tested, and documented code that is ready for a pull request.

The pipeline consists of six agents, each with a distinct role:

+==========================+=======================+==========================================+
|      Capability          |  Default Cloud Model  |  Role                                    |
+==========================+=======================+==========================================+
| RequirementsAnalysis     | Claude 4.6 Opus       | Decomposes requests into eng. tasks      |
+--------------------------+-----------------------+------------------------------------------+
| CodeGeneration           | GPT-5.3               | Generates implementation code            |
+--------------------------+-----------------------+------------------------------------------+
| CodeReview               | Claude 4.6 Sonnet     | Reviews code, produces feedback          |
+--------------------------+-----------------------+------------------------------------------+
| TestGeneration           | Gemini 3.1 Pro        | Generates comprehensive test suites      |
+--------------------------+-----------------------+------------------------------------------+
| Documentation            | Gemini 3.1 Flash      | Produces docstrings, README, API docs    |
+--------------------------+-----------------------+------------------------------------------+
| Orchestrator             | GPT-5.3               | Plans, coordinates, synthesizes output   |
+--------------------------+-----------------------+------------------------------------------+

Each agent is modeled as a CCA Capability. The remainder of this article shows exactly how to do that, concept by concept, with complete, runnable code.


Chapter 3 — What Is a Capability?

The Capability is the foundational unit of structure in CCA. Before we can write a single line of code, we need to understand precisely what a Capability is, why it is defined the way it is, and how to identify the right Capabilities for a given system — because getting this wrong at the start will undermine everything that follows.

3.1 — The Definition

A Capability is defined as a cohesive set of functionality that consistently delivers tangible value, either to end-users or to other interacting Capabilities within the system. This definition is deceptively simple, and each word in it carries weight.

The word cohesive means that everything inside a Capability belongs together because it serves the same functional purpose. The code-generation logic, the prompt-engineering strategies, the output parsing, and the retry handling for a code-generation agent all belong together because they all serve the single purpose of turning a structured engineering task into working source code. They are cohesive. By contrast, the code-review logic does not belong in the same Capability, even though it is closely related, because it serves a different functional purpose: evaluating code quality rather than producing code.

The phrase delivers tangible value is a practical test for whether a candidate Capability is real or artificial. A Capability should be able to have its purpose stated in a single, clear sentence that a non-technical stakeholder could understand. "Code Generation produces working source code from a structured engineering task specification" passes this test. "LLM API Wrapper wraps the OpenAI API" does not — it describes a technical mechanism, not a business value. A Capability that cannot pass this sentence test is almost certainly either too narrow (a technical layer masquerading as a Capability) or too broad (multiple Capabilities that have been incorrectly merged).

3.2 — How Capabilities Differ from Bounded Contexts

Readers familiar with Domain-Driven Design will notice that Capabilities bear a strong resemblance to Bounded Contexts. This resemblance is intentional — CCA draws heavily from DDD — but the two concepts are not identical, and the differences matter.

A Bounded Context is primarily a conceptual tool for domain modeling. It establishes clear linguistic boundaries within a business domain: within the "Sales" bounded context, the word "customer" means one thing; within the "Support" bounded context, it may mean something subtly different. Bounded Contexts are concerned with the semantics of the domain model and the consistency of the ubiquitous language within each context.

A Capability in CCA goes further in three important ways. First, a Capability explicitly includes the technical mechanisms necessary to deliver its functionality — the Realization layer, which we will discuss in detail shortly. A Bounded Context says nothing about how its logic is executed; a Capability does. Second, a Capability explicitly specifies the quality attributes it must meet — its performance characteristics, reliability requirements, security constraints, and latency guarantees — through its Contract's Protocols. Third, a Capability carries a formal evolution strategy through its Evolution Envelope, specifying how it will change over time and how consumers should adapt. A Bounded Context has no equivalent mechanism.

In practical terms, this means that a Capability is not just a conceptual grouping of domain logic — it is a fully self-contained, deployable, testable, and governable unit of the system that knows what it does, how it does it, how well it must do it, and how it will change over time.

3.3 — The Critical Rule: Identify by Function, Not by Technology

The most common mistake when identifying Capabilities is to define them along technical lines rather than functional lines. This mistake is so common and so damaging that it deserves explicit, emphatic treatment.

Do not create a "DatabaseAccessCapability." Database access is a technical mechanism, not a business function. It belongs inside the Realization layer of whatever domain-specific Capability needs it. Do not create an "LLMAPICapability." Calling an LLM API is a technical mechanism. It belongs inside the Realization layer of the specific agent that needs it. Do not create a "LoggingCapability" that is shared by all other Capabilities. Logging is a cross-cutting concern that should be handled through the infrastructure layer, not a Capability in its own right.

Similarly, do not create Capabilities along organizational lines. The fact that one team owns the requirements-analysis logic and another team owns the code-generation logic does not automatically mean they should be separate Capabilities. Functional cohesion must be the primary driver. If two pieces of functionality are deeply intertwined and always change together, they belong in the same Capability regardless of team boundaries. If two pieces of functionality are independent and change at different rates, they belong in separate Capabilities regardless of who owns them.

In our pipeline, the six Capabilities we identified — RequirementsAnalysis, CodeGeneration, CodeReview, TestGeneration, Documentation, and Orchestrator — each pass the functional cohesion test. Each has a clear, single-sentence purpose. Each changes for different reasons and at different rates. Each can be independently tested, deployed, and evolved. This is the right decomposition.

3.4 — The Module Structure

With the Capability decomposition established, we can define the module structure for our system. This structure reflects the CCA principle that every Capability is a self-contained unit with its own internal layers:

agentic_pipeline/
├── models.py                        # Shared data classes (single source of truth)
├── contracts.py                     # Capability Contract Protocol interfaces
├── cca/
│   ├── evolution.py                 # EvolutionEnvelope, DeprecationNotice, MigrationPath
│   ├── lifecycle.py                 # CapabilityInstance ABC + LifecycleState enum
│   ├── registry.py                  # CapabilityRegistry, CapabilityDescriptor
│   └── manager.py                   # CapabilityLifecycleManager
├── capabilities/
│   ├── llm_port.py                  # Abstract LLMPort Protocol
│   ├── realizations/
│   │   ├── cloud/
│   │   │   ├── gpt53.py             # GPT-5.3 cloud Realization
│   │   │   ├── claude46.py          # Claude 4.6 cloud Realization
│   │   │   └── gemini31.py          # Gemini 3.1 cloud Realization
│   │   └── local/
│   │       ├── base.py              # LocalLLMRealization abstract base
│   │       ├── cuda_realization.py  # NVIDIA CUDA via llama-cpp-python
│   │       ├── mlx_realization.py   # Apple MLX
│   │       ├── rocm_realization.py  # AMD ROCm
│   │       ├── vulkan_realization.py # Vulkan cross-vendor GPU
│   │       ├── intel_realization.py # Intel OpenVINO
│   │       └── factory.py           # LLMRealizationFactory (hardware auto-detection)
│   ├── code_generation/
│   │   ├── essence.py
│   │   ├── adaptation.py
│   │   └── capability.py
│   ├── requirements_analysis/
│   │   └── capability.py
│   ├── code_review/
│   │   └── capability.py
│   ├── test_generation/
│   │   └── capability.py
│   ├── documentation/
│   │   └── capability.py
│   └── orchestrator/
│       ├── essence.py
│       ├── state_tracker.py
│       └── capability.py
└── assembly.py                      # System composition root


Chapter 4 — Shared Data Models

Before we can define Contracts or implement Capabilities, we need a shared vocabulary — the data structures that flow between Capabilities. In CCA, all shared domain models live in a single module. This is not merely a matter of convenience; it is an architectural principle. If each Capability defined its own version of what a "generated code artifact" looks like, Capabilities could not communicate without translation layers. By defining all shared types in one place, we ensure that when the CodeGenerationCapability produces a GeneratedCode object and the CodeReviewCapability consumes it, they are talking about exactly the same thing.

# models.py
# =============================================================================
# Shared domain data models for the Agentic Software Engineering Pipeline.
# All Capabilities import from this module. No Capability redefines these types.
# Requires Python >= 3.10.
# =============================================================================

from __future__ import annotations
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FeatureRequest:
    """
    The raw input to the pipeline — a natural-language feature request
    submitted by a developer. This is the entry point for the entire system.
    frozen=True makes instances hashable and prevents accidental mutation
    as the object flows through multiple Capabilities.
    """
    request_id: str
    title: str
    description: str
    requester: str
    priority: str  # "low" | "medium" | "high" | "critical"


@dataclass(frozen=True)
class EngineeringTask:
    """
    A structured engineering task produced by RequirementsAnalysisCapability.
    Represents a single, implementable unit of work derived from a feature
    request. Using tuples instead of lists preserves the frozen=True guarantee
    all the way through the data structure.
    """
    task_id: str
    title: str
    description: str
    acceptance_criteria: tuple[str, ...]
    estimated_complexity: str              # "low" | "medium" | "high"
    technical_constraints: tuple[str, ...]
    target_language: str


@dataclass(frozen=True)
class GeneratedCode:
    """
    The output of the CodeGenerationCapability for a single engineering task.
    The generation_model field records which model or realization produced
    this artifact, providing a full audit trail through the pipeline.
    """
    task_id: str
    language: str
    source_code: str
    explanation: str
    dependencies: tuple[str, ...]
    generation_model: str


@dataclass(frozen=True)
class ReviewFeedback:
    """
    A single structured feedback item produced by CodeReviewCapability.
    Each feedback item is categorized by severity and type so that
    downstream consumers (e.g., the Orchestrator) can make informed
    decisions about whether to accept, revise, or reject the code.
    """
    task_id: str
    severity: str         # "info" | "warning" | "error" | "critical"
    category: str         # "correctness" | "style" | "security" | "performance"
    description: str
    suggested_fix: str
    line_reference: str | None


@dataclass(frozen=True)
class ReviewResult:
    """
    The complete review outcome for a single engineering task.
    The approved field gives the Orchestrator a clear binary signal,
    while feedback_items provides the full detail for logging and
    potential revision loops.
    """
    task_id: str
    approved: bool
    feedback_items: tuple[ReviewFeedback, ...]
    overall_assessment: str


@dataclass(frozen=True)
class TestSuite:
    """
    A generated test suite for a single engineering task.
    """
    task_id: str
    test_framework: str
    test_source_code: str
    test_count: int
    coverage_targets: tuple[str, ...]


@dataclass(frozen=True)
class DocumentationArtifact:
    """
    Generated documentation for a single engineering task.
    Separating docstrings, README content, and API documentation
    allows consumers to use whichever format they need.
    """
    task_id: str
    docstrings: str
    readme_section: str
    api_documentation: str


@dataclass(frozen=True)
class PipelineResult:
    """
    The final synthesized output of the entire pipeline for one feature request.
    This is what the OrchestratorCapability returns to the caller after
    coordinating all specialist Capabilities.
    """
    request_id: str
    engineering_tasks: tuple[EngineeringTask, ...]
    generated_code: tuple[GeneratedCode, ...]
    review_results: tuple[ReviewResult, ...]
    test_suites: tuple[TestSuite, ...]
    documentation: tuple[DocumentationArtifact, ...]
    pipeline_status: str   # "success" | "partial_success" | "failure"
    summary: str
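
As a quick sanity check of these models, the snippet below shows why frozen=True and tuple fields matter in practice. The EngineeringTask definition is an abbreviated inline copy so the snippet runs standalone; in the real system you would import it from models.py, and the field values are illustrative:

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class EngineeringTask:
    # Abbreviated copy of the models.py definition, inlined so this
    # snippet is self-contained.
    task_id: str
    title: str
    acceptance_criteria: tuple[str, ...]

task = EngineeringTask(
    task_id="T-001",
    title="Add exponential-backoff retry to the HTTP client",
    acceptance_criteria=("retries at most 3 times", "jitter applied"),
)

# frozen=True rejects mutation as the object flows between Capabilities...
try:
    task.title = "something else"
except FrozenInstanceError:
    print("mutation rejected")

# ...and tuple fields keep the whole structure hashable, so artifacts
# can be used as dict keys or set members for deduplication and caching.
assert isinstance(hash(task), int)
```

Had acceptance_criteria been a list, the hash call would raise TypeError — which is exactly why the models above use tuples throughout.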


Chapter 5 — Capability Contracts

Capability Contracts are the cornerstone of CCA. Understanding them deeply is essential, because everything else in the architecture — the Registry, the Lifecycle Manager, the dependency injection system — is built on top of them. A Contract is a formal, explicit, interface-oriented agreement that precisely defines the public API of a Capability. It is the only thing that other Capabilities are allowed to know about a given Capability. They cannot see its Essence, its Realization, or its Adaptation. They can only see its Contract.

5.1 — Why Contracts Are Necessary

Consider what happens in a system without formal Contracts. The OrchestratorCapability needs to call the CodeGenerationCapability. Without a Contract, it must import the CodeGenerationCapability class directly, call whatever method happens to exist on it, and hope that the method signature, parameter types, and return type match what the Orchestrator expects. If the CodeGenerationCapability is later refactored — perhaps the method is renamed, or its return type changes — the Orchestrator breaks. The only way to discover the breakage is to run the system and observe the failure.

With a formal Contract, the situation is entirely different. The OrchestratorCapability declares that it requires a CodeGenerationContract — a typed interface that specifies exactly what methods are available, what parameters they accept, and what they return. The CodeGenerationCapability declares that it provides a CodeGenerationContract. The Registry verifies at startup that every declared Requirement has a matching Provision. If the CodeGenerationCapability is later refactored in a way that breaks the Contract, the type checker catches it before the code is even run. The Contract is the firewall between Capabilities.

5.2 — The Three Elements of a Contract

Every Capability Contract is composed of exactly three elements: Provisions, Requirements, and Protocols. Each element serves a distinct and non-overlapping purpose.

Provisions define the interfaces that a Capability offers to others. They are the Capability's promises to the world: "I will provide these services, and you can depend on them." A Provision is not a concrete class — it is an interface, a Protocol in Python terms, that specifies method signatures without any implementation. The concrete implementation lives inside the Capability's Adaptation layer, invisible to the outside world. This separation is what makes independent evolution possible: the Capability can completely rewrite its internal implementation as long as it continues to honor its Provisions.

Requirements define the interfaces that a Capability needs from others in order to function. They are the Capability's honest declaration of its dependencies: "I cannot do my job without these services from other Capabilities." By making Requirements explicit and formal, CCA enables the Registry to build a complete, accurate dependency graph of the entire system. This graph is what the Lifecycle Manager uses to determine startup order and to perform dependency injection. Without explicit Requirements, the dependency graph is invisible, and startup order must be guessed.

Protocols define the interaction patterns and quality attributes that govern how Capabilities communicate. This is where Contracts go beyond simple interface definitions. A Protocol specifies not just what methods exist, but how they should be called: synchronously or asynchronously, with what data format, with what latency guarantee, with what reliability expectation, and with what security requirements. A single Capability can declare multiple Protocols simultaneously — for example, a direct in-process call for high-performance consumers, a REST API for external clients, and a message queue interface for asynchronous event-driven integrations. This flexibility allows the same Capability to serve diverse consumers without changing its internal implementation.

5.3 — The Contract as a Stability Boundary

One of the most important architectural insights in CCA is that Contracts should be designed to be stable. The internal implementation of a Capability — its Essence, Realization, and Adaptation — can and should change frequently as requirements evolve, models improve, and technology advances. But the Contract should change rarely, and when it does change in a backward-incompatible way, that change must be managed explicitly through the Evolution Envelope.

This stability principle has a practical implication for how Contracts are designed: they should express what the Capability does in terms of business outcomes, not how it does it in terms of technical mechanisms. A CodeGenerationContract should expose a generate_code(task: EngineeringTask) -> GeneratedCode method, not a call_gpt_api(prompt: str) -> str method. The former is stable because it describes a business function; the latter is fragile because it exposes an implementation detail.
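Expressed as a Python Protocol, the stable form of that interface is only a few lines. This is a hedged sketch: the EngineeringTask and GeneratedCode classes below are empty stand-ins for the models.py types so the snippet runs standalone, and FakeCodeGeneration is invented purely to demonstrate structural conformance:

```python
from typing import Protocol, runtime_checkable

# Stand-ins for the models.py types, so this sketch is self-contained.
class EngineeringTask: ...
class GeneratedCode: ...

@runtime_checkable
class CodeGenerationContract(Protocol):
    """Stable, business-level Provision: task in, code artifact out.
    No prompt strings, model names, or API details leak through."""
    def generate_code(self, task: EngineeringTask) -> GeneratedCode: ...

class FakeCodeGeneration:
    """Any class with a matching generate_code method satisfies the
    Contract — no inheritance from the Protocol is required."""
    def generate_code(self, task: EngineeringTask) -> GeneratedCode:
        return GeneratedCode()

# runtime_checkable allows the Registry to verify conformance at startup:
assert isinstance(FakeCodeGeneration(), CodeGenerationContract)
```

Note that a runtime isinstance check only confirms the method exists; full signature checking is the job of a static type checker such as mypy or pyright.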

5.4 — Contract Implementation

# contracts.py
# =============================================================================
# Capability Contract Protocol interfaces for the Agentic Pipeline.
#
# These interfaces are the ONLY thing that Capabilities know about each other.
# No Capability imports another Capability's concrete implementation class.
# The only imports allowed between Capabilities are these Protocol interfaces
# and the shared data models from models.py.
# =============================================================================

from __future__ import annotations
from typing import Protocol, runtime_checkable, Callable
from dataclasses import dataclass
from enum import Enum

from models import (
    FeatureRequest, EngineeringTask, GeneratedCode,
    ReviewResult, TestSuite, DocumentationArtifact, PipelineResult,
)


# ---------------------------------------------------------------------------
# Protocol Specification Types
#
# These types describe the metadata of a Contract — not the interface methods
# themselves, but the structural information about how the Contract is used,
# what communication mechanisms it supports, and what quality guarantees it
# makes. This metadata is consumed by the Registry and the Lifecycle Manager.
# ---------------------------------------------------------------------------

class CommunicationMechanism(Enum):
    """
    The supported communication patterns between Capabilities.
    A single Capability may declare support for multiple mechanisms,
    allowing it to serve different consumers in different contexts.
    """
    DIRECT_CALL   = "direct_call"    # In-process, synchronous, zero-overhead
    REST_HTTP     = "rest_http"      # HTTP/2, suitable for cross-process or cross-host
    MESSAGE_QUEUE = "message_queue"  # Asynchronous, decoupled, event-driven
    GRPC          = "grpc"           # High-performance RPC, binary protocol


@dataclass(frozen=True)
class CapabilityProtocol:
    """
    Specifies the interaction pattern and quality attributes for one
    communication mode of a Capability.

    The max_latency_ms field is particularly important for agentic systems,
    where end-to-end pipeline latency is often a critical user-facing metric.
    Declaring latency expectations in the Contract makes them visible to the
    system architect and allows the Lifecycle Manager to flag mismatches
    between what a consumer expects and what a provider guarantees.
    """
    communication_mechanism: CommunicationMechanism
    data_format: str              # e.g., "python_objects", "json", "protobuf"
    max_latency_ms: int | None    # None means no strict latency requirement
    reliability: str              # e.g., "at-least-once", "exactly-once", "best-effort"
    authentication: str | None    # e.g., None, "api_key", "oauth2", "mtls"
    description: str


@dataclass(frozen=True)
class ProvisionDefinition:
    """
    Describes a single interface that a Capability provides to others.
    The interface_type is a Python Protocol class — a typed, inspectable
    description of the methods available to consumers.
    """
    name: str
    interface_type: type
    description: str


@dataclass(frozen=True)
class RequirementDefinition:
    """
    Describes a single interface that a Capability needs from others.
    The optional flag is important: if optional is True, the Lifecycle
    Manager will not fail startup if no provider is found for this
    Requirement. This supports graceful degradation in partial deployments.
    """
    name: str
    interface_type: type
    optional: bool
    description: str


@dataclass(frozen=True)
class CapabilityContract:
    """
    The complete formal Contract for a Capability.

    This is the only public face of a Capability that other Capabilities
    are allowed to see. It is registered with the CapabilityRegistry at
    startup and used by the CapabilityLifecycleManager to build the
    dependency graph, perform topological sorting, and inject dependencies.

    Design note: frozen=True prevents field reassignment but does not
    prevent mutation of the objects stored in those fields. Treat all
    fields as logically immutable — do not mutate the tuples after creation.
    """
    capability_name: str
    provisions: tuple[ProvisionDefinition, ...]
    requirements: tuple[RequirementDefinition, ...]
    protocols: tuple[CapabilityProtocol, ...]

    def provides(self, interface_type: type) -> bool:
        """Returns True if this Contract provides the given interface type."""
        return any(p.interface_type is interface_type for p in self.provisions)

    def requires_interface(self, interface_type: type) -> bool:
        """Returns True if this Contract requires the given interface type."""
        return any(r.interface_type is interface_type for r in self.requirements)

    def get_required_interfaces(self) -> list[type]:
        """Returns all interface types this Capability requires."""
        return [r.interface_type for r in self.requirements if not r.optional]

    def get_optional_interfaces(self) -> list[type]:
        """Returns all optional interface types this Capability may use."""
        return [r.interface_type for r in self.requirements if r.optional]


# ---------------------------------------------------------------------------
# Capability Contract Protocol Interfaces
#
# Each Protocol below is the formal interface for one Capability's Provision.
# These are what consumers depend on. They are stable by design.
#
# Note: Protocol methods do not need @abstractmethod. Protocol classes use
# structural subtyping (duck typing), not nominal inheritance. The method
# bodies are ellipsis (...) by convention, indicating "this method must exist
# on any conforming implementation."
#
# The @runtime_checkable decorator enables isinstance() checks at injection
# time, but note that these checks only verify method name presence, not
# signatures. Use a static type checker (mypy, pyright) for full validation.
# ---------------------------------------------------------------------------

@runtime_checkable
class RequirementsAnalysisContract(Protocol):
    """
    The formal interface of the RequirementsAnalysisCapability.

    Any Capability that needs to decompose feature requests into engineering
    tasks declares a Requirement for this interface. The OrchestratorCapability
    is the primary consumer, but any Capability that needs to understand the
    structure of a feature request can declare this Requirement.

    Stability note: This interface should change only when the fundamental
    concept of what "requirements analysis" means changes — which is rare.
    Adding new optional parameters to existing methods is backward-compatible.
    Removing or renaming methods requires a major version bump in the
    Evolution Envelope.
    """

    def analyze_feature_request(
        self, request: FeatureRequest
    ) -> list[EngineeringTask]:
        """
        Decomposes a natural-language feature request into a structured
        list of engineering tasks with acceptance criteria and complexity
        estimates. The returned tasks are ordered by dependency — tasks
        that must be completed first appear earlier in the list.
        """
        ...

    def refine_task(
        self, task: EngineeringTask, refinement_notes: str
    ) -> EngineeringTask:
        """
        Refines an existing engineering task based on feedback from
        downstream agents. Called when the CodeReviewCapability determines
        that the original task specification was ambiguous or incomplete.
        """
        ...


@runtime_checkable
class CodeGenerationContract(Protocol):
    """
    The formal interface of the CodeGenerationCapability.

    This is the most performance-sensitive interface in the pipeline, since
    code generation involves the largest LLM context windows and the longest
    inference times. The Protocol declaration makes this performance
    characteristic visible at the Contract level through the associated
    CapabilityProtocol metadata.
    """

    def generate_code(self, task: EngineeringTask) -> GeneratedCode:
        """
        Generates implementation code for a single engineering task.
        The returned GeneratedCode includes not just the source code but
        also an explanation of the implementation choices, the list of
        dependencies introduced, and a record of which model produced it.
        """
        ...

    def register_completion_listener(
        self, listener: Callable[[GeneratedCode], None]
    ) -> None:
        """
        Registers a callback invoked whenever code generation completes.
        This enables monitoring and observability Capabilities to track
        pipeline progress without being in the critical execution path.
        The CodeGenerationCapability does not know who is listening —
        it simply invokes all registered callbacks after each generation.
        """
        ...


@runtime_checkable
class CodeReviewContract(Protocol):
    """
    The formal interface of the CodeReviewCapability.

    Code review is the quality gate of the pipeline. Its output directly
    determines whether the Orchestrator accepts the generated code, requests
    revisions, or escalates to a human reviewer.
    """

    def review_code(
        self, code: GeneratedCode, task: EngineeringTask
    ) -> ReviewResult:
        """
        Reviews generated code against the engineering task specification.
        The task parameter is essential: the reviewer needs to know what
        the code was supposed to do in order to evaluate whether it does it
        correctly. Reviewing code without its specification is like grading
        an exam without the questions.
        """
        ...


@runtime_checkable
class TestGenerationContract(Protocol):
    """
    The formal interface of the TestGenerationCapability.
    """

    def generate_tests(
        self, code: GeneratedCode, task: EngineeringTask
    ) -> TestSuite:
        """
        Generates a comprehensive test suite for the given code,
        using the engineering task's acceptance criteria as the
        behavioral specification for the tests.
        """
        ...


@runtime_checkable
class DocumentationContract(Protocol):
    """
    The formal interface of the DocumentationCapability.
    """

    def generate_documentation(
        self, code: GeneratedCode, task: EngineeringTask
    ) -> DocumentationArtifact:
        """
        Generates technical documentation for the given code,
        including inline docstrings, a README section describing
        the feature, and API documentation for any public interfaces.
        """
        ...


@runtime_checkable
class OrchestratorContract(Protocol):
    """
    The formal interface of the OrchestratorCapability.
    This is the primary entry point for the entire pipeline.
    External callers — CLI tools, web APIs, CI/CD systems — interact
    with the pipeline exclusively through this interface.
    """

    def execute_pipeline(self, request: FeatureRequest) -> PipelineResult:
        """
        Executes the full agentic pipeline for the given feature request,
        coordinating all specialist Capabilities and returning the
        synthesized result ready for pull request creation.
        """
        ...
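To make the structural-subtyping note above concrete, here is a minimal, self-contained sketch. The FeatureRequest and EngineeringTask classes are trimmed inline stand-ins declared only so the snippet runs on its own, not the real models.py definitions:

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


# Trimmed inline stand-ins for the models.py types (illustrative only).
@dataclass(frozen=True)
class FeatureRequest:
    description: str


@dataclass(frozen=True)
class EngineeringTask:
    title: str


@runtime_checkable
class RequirementsAnalysisContract(Protocol):
    def analyze_feature_request(
        self, request: FeatureRequest
    ) -> list[EngineeringTask]: ...


# A conforming implementation. Note: it does NOT inherit from the Protocol.
class StubAnalyzer:
    def analyze_feature_request(
        self, request: FeatureRequest
    ) -> list[EngineeringTask]:
        return [EngineeringTask(title=f"Implement: {request.description}")]


analyzer = StubAnalyzer()
# Passes because the method name is present. This is a structural check only:
# runtime_checkable does not verify the signature, so a static type checker
# is still needed for full validation.
assert isinstance(analyzer, RequirementsAnalysisContract)
```

The absence of inheritance is the point: StubAnalyzer conforms purely by shape, which is why no Capability ever needs to import another Capability's concrete class.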

Chapter 6 — The Capability Nucleus: Essence, Realization, and Adaptation

Every Capability in CCA is internally structured as a Capability Nucleus — three distinct, concentric layers that separate what a Capability does from how it does it and how it exposes itself to the world. This three-layer structure is not bureaucratic overhead; it is the mechanism that makes independent evolution, testability, and deployment flexibility possible. Understanding each layer deeply, and understanding why the boundaries between them must be respected, is essential for applying CCA correctly.

6.1 — The Essence: Pure Logic, No Dependencies

The Essence is the innermost layer of the Capability Nucleus. It contains the pure domain logic or algorithmic core that defines what the Capability does. It is also the primary custodian of the Capability's core domain state. In our pipeline, the Essence of the CodeGenerationCapability contains the prompt-engineering strategies, the task decomposition logic, the output parsing and validation rules, and the retry and fallback decision logic. It does not contain any code that calls an API, reads from a database, writes to a file, or interacts with any external system.

This complete independence from external systems is the Essence's defining characteristic and its greatest strength. Because the Essence has no external dependencies — with the sole exception of other Capability Contracts, which are themselves pure interfaces — it can be tested in complete isolation. No mocking of HTTP clients. No stubbing of database connections. No spinning up of Docker containers. Just pure Python unit tests that exercise the business logic directly. This makes the test suite fast, deterministic, and easy to maintain.

The Essence is also the most stable part of a Capability. The business logic for decomposing a feature request into engineering tasks does not change when you switch from GPT-5.3 to Gemini 3.1 Pro. The prompt structure may change, but the logic for deciding how to structure the prompt, how to validate the output, and how to handle edge cases is stable business logic that belongs in the Essence. By isolating this logic in the Essence, you protect it from the churn of the infrastructure layers.
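What such a pure unit test looks like can be sketched as follows. The Task class and build_prompt function are simplified, hypothetical stand-ins rather than the real pipeline classes, but the shape of the test is the same: plain inputs in, plain strings out, nothing to mock:

```python
from dataclasses import dataclass


# Hypothetical stand-in for an engineering-task model (illustrative only).
@dataclass(frozen=True)
class Task:
    title: str
    acceptance_criteria: tuple[str, ...]


def build_prompt(task: Task) -> str:
    """Pure prompt-construction logic: a deterministic function of its input."""
    criteria = "\n".join(f"  - {c}" for c in task.acceptance_criteria)
    return f"Task: {task.title}\nAcceptance Criteria:\n{criteria}"


# A pure unit test: deterministic, fast, no HTTP clients, no containers.
task = Task(title="Add login", acceptance_criteria=("rejects bad passwords",))
prompt = build_prompt(task)
assert "Add login" in prompt
assert "  - rejects bad passwords" in prompt
```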

6.2 — The Realization: Technical Mechanisms, Infrastructure Integration

The Realization is the middle layer of the Capability Nucleus. It contains the technical mechanisms that make the Essence operational in a specific technical environment. The Realization implements the how of the Capability's operation.

In our pipeline, the Realization of the CodeGenerationCapability is where the LLM API calls happen. The Realization knows how to construct an HTTP request to the OpenAI API, how to handle rate limiting and retries at the network level, how to parse the raw API response into a structured format that the Essence can work with, and how to manage API keys and authentication. None of this knowledge exists in the Essence.

The critical insight about the Realization is that it is replaceable. Because the Essence communicates with the Realization only through an abstract LLMPort interface — never through a concrete API client class — the entire Realization can be swapped out without touching the Essence. Switching from the GPT-5.3 Realization to a local Llama 3 Realization running on CUDA means writing a new class that implements LLMPort and registering it in the assembly. The Essence, the Adaptation, and the Contract all remain unchanged.

This replaceability is not just a theoretical benefit — it is the practical solution to the hardware lock-in problem described in Part 1. We will show concrete Realization implementations for cloud APIs, NVIDIA CUDA, Apple MLX, AMD ROCm, Vulkan, and Intel OpenVINO in Part 10.

6.3 — The Adaptation: External Interfaces, Communication Protocols

The Adaptation is the outermost layer of the Capability Nucleus. It provides the explicit interfaces through which the Capability interacts with other Capabilities and with external systems. The Adaptation is responsible for translating between the Capability's internal representation of data and the external representation expected by its consumers.

In our pipeline, the Adaptation of the CodeGenerationCapability implements the CodeGenerationContract Protocol interface. When another Capability calls generate_code(task) on the CodeGenerationCapability, it is calling a method on the Adaptation. The Adaptation translates this call into the Essence's internal representation, delegates to the Essence and Realization to do the actual work, and then translates the result back into the GeneratedCode data model that the caller expects.

The Adaptation is also where multiple communication protocols are supported simultaneously. The same Capability can expose its functionality through a direct in-process call (for the Orchestrator running in the same process), a REST API (for external monitoring tools), and a message queue interface (for asynchronous batch processing). All three Adaptations delegate to the same Essence and Realization. The caller's choice of protocol does not affect the Capability's internal behavior.
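This "several protocols, one core" pattern can be sketched in miniature. All names below (EchoEssence, DirectAdaptation, JsonAdaptation) are hypothetical illustrations, not pipeline classes; the JSON adaptation plays the role a REST handler would:

```python
import json


class EchoEssence:
    """Stand-in domain logic, oblivious to how it is invoked."""

    def process(self, text: str) -> str:
        return text.upper()


class DirectAdaptation:
    """In-process protocol: callers pass and receive native Python objects."""

    def __init__(self, essence: EchoEssence) -> None:
        self._essence = essence

    def process(self, text: str) -> str:
        return self._essence.process(text)


class JsonAdaptation:
    """REST-style protocol: callers pass and receive JSON strings."""

    def __init__(self, essence: EchoEssence) -> None:
        self._essence = essence

    def handle(self, payload: str) -> str:
        request = json.loads(payload)            # external representation in
        result = self._essence.process(request["text"])
        return json.dumps({"result": result})    # external representation out


# Both adaptations delegate to the SAME essence instance.
essence = EchoEssence()
assert DirectAdaptation(essence).process("hi") == "HI"
assert json.loads(JsonAdaptation(essence).handle('{"text": "hi"}'))["result"] == "HI"
```

The essence is constructed once; each adaptation is a thin translator in front of it, so adding a new protocol never touches the domain logic.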

6.4 — The LLM Port: Abstracting Inference Backends

Before we can implement any of the three layers for our AI agents, we need to define the abstract interface that the Essence uses to communicate with the LLM — the LLMPort. This interface is the boundary between the stable business logic in the Essence and the replaceable infrastructure in the Realization.

# capabilities/llm_port.py
# =============================================================================
# The abstract LLMPort interface.
#
# This is the single most important abstraction in the entire system for
# enabling backend flexibility. Every agent's Essence communicates with its
# LLM exclusively through this interface. The concrete implementation —
# whether it calls OpenAI's API, runs llama.cpp on CUDA, uses Apple's MLX
# framework, or invokes Intel OpenVINO — lives in the Realization layer and
# is invisible to the Essence.
#
# By keeping this interface minimal and focused on the core inference
# operation, we ensure that any LLM backend can implement it without
# requiring changes to the interface itself.
# =============================================================================

from __future__ import annotations
from typing import Protocol, runtime_checkable
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMRequest:
    """
    A structured request to an LLM backend.
    The system_prompt establishes the agent's role and behavioral guidelines.
    The user_prompt contains the specific task or question for this invocation.
    The temperature controls the randomness of the output — lower values
    produce more deterministic, focused outputs (better for code generation),
    while higher values produce more creative, varied outputs (better for
    brainstorming and documentation).
    max_tokens limits the response length to prevent runaway generation
    and control costs.
    """
    system_prompt: str
    user_prompt: str
    temperature: float = 0.1
    max_tokens: int = 4096


@dataclass(frozen=True)
class LLMResponse:
    """
    The structured response from an LLM backend.
    The content field contains the raw text of the model's response.
    The model_identifier records which specific model and backend produced
    this response, providing a full audit trail for debugging and compliance.
    The token counts enable cost tracking and capacity planning.
    """
    content: str
    model_identifier: str      # e.g., "gpt-5.3", "llama-3-70b-cuda", "mlx-mistral-7b"
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


@runtime_checkable
class LLMPort(Protocol):
    """
    The abstract interface between a Capability's Essence and its LLM backend.

    Any object that implements the complete() method with this signature
    is a valid LLMPort. This includes cloud API clients (OpenAI, Anthropic,
    Google), local inference engines (llama.cpp via CUDA, MLX, ROCm, Vulkan,
    Intel OpenVINO), and mock implementations for testing.

    The simplicity of this interface is intentional. A single inference
    method, plus a lightweight availability check, is all that is needed to
    support the full range of LLM interactions in our pipeline. If you find
    yourself wanting to add methods to this interface, ask whether those
    methods belong here or in the Realization layer.
    """

    def complete(self, request: LLMRequest) -> LLMResponse:
        """
        Sends a request to the LLM backend and returns the response.
        Implementations are responsible for handling retries, rate limiting,
        authentication, and any backend-specific error conditions.
        The caller (the Essence) should not need to know about any of these
        infrastructure concerns.
        """
        ...

    def is_available(self) -> bool:
        """
        Returns True if the backend is currently available and ready to
        accept requests. Used by the LLMRealizationFactory during hardware
        detection and by the Lifecycle Manager during health checks.
        """
        ...
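One immediate payoff of this abstraction is testability: a mock backend conforms to the Port without inheriting from anything. The sketch below re-declares trimmed inline copies of LLMRequest, LLMResponse, and LLMPort so it runs on its own; MockLLMPort is a hypothetical test double, not part of the pipeline:

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


# Trimmed inline copies of the Port types above (field lists shortened).
@dataclass(frozen=True)
class LLMRequest:
    system_prompt: str
    user_prompt: str


@dataclass(frozen=True)
class LLMResponse:
    content: str
    model_identifier: str


@runtime_checkable
class LLMPort(Protocol):
    def complete(self, request: LLMRequest) -> LLMResponse: ...
    def is_available(self) -> bool: ...


class MockLLMPort:
    """A test double: canned responses, no network, no API keys."""

    def __init__(self, canned_content: str) -> None:
        self._canned_content = canned_content
        self.requests_seen: list[LLMRequest] = []

    def complete(self, request: LLMRequest) -> LLMResponse:
        self.requests_seen.append(request)  # recorded for test assertions
        return LLMResponse(content=self._canned_content,
                           model_identifier="mock-model")

    def is_available(self) -> bool:
        return True


mock = MockLLMPort(canned_content='{"source_code": "pass"}')
assert isinstance(mock, LLMPort)  # conforms structurally, no inheritance
```

Any Essence constructed with this mock exercises its full domain logic against predetermined responses, which is exactly what the hardware-backend Realizations in Part 10 will be swapped against.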

6.5 — Implementing the Nucleus: CodeGenerationCapability

Now we can implement the Capability Nucleus for the CodeGenerationCapability. We show the Essence and Adaptation layers here, then the glue class that wires them together into a CapabilityInstance; the concrete Realization implementations behind LLMPort follow in Part 10.

# capabilities/code_generation/essence.py
# =============================================================================
# CodeGenerationEssence: Pure domain logic for code generation.
#
# This class contains everything that is true about code generation regardless
# of which LLM backend is used, which communication protocol is active, or
# which deployment environment the system is running in.
#
# The only external dependency is LLMPort — an abstract interface, not a
# concrete class. This means the Essence can be tested with a mock LLMPort
# that returns predetermined responses, without any network calls, API keys,
# or infrastructure setup.
# =============================================================================

from __future__ import annotations
from dataclasses import dataclass, field
from typing import Callable

from models import EngineeringTask, GeneratedCode
from capabilities.llm_port import LLMPort, LLMRequest, LLMResponse


class CodeGenerationEssence:
    """
    The pure domain logic for the CodeGenerationCapability.

    This class is responsible for:
    1. Constructing the optimal prompt for a given engineering task,
       including the system prompt that establishes the agent's role,
       the task description, acceptance criteria, and technical constraints.
    2. Parsing and validating the LLM's response into a structured
       GeneratedCode object.
    3. Deciding when to retry (e.g., when the response is malformed),
       when to fall back to a simpler prompt, and when to give up.
    4. Notifying registered listeners when generation completes,
       enabling observability without coupling to monitoring infrastructure.

    The Essence does NOT know:
    - Which LLM model is being used (GPT-5.3, Gemini, Llama, etc.)
    - Whether the LLM is running in the cloud or locally
    - How the result will be delivered to the caller (direct call, REST, MQ)
    - How the Capability is deployed (monolith, microservice, serverless)
    """

    # The system prompt establishes the agent's role and behavioral guidelines.
    # It is defined at the class level because it is a stable part of the
    # Capability's domain logic, not a configuration parameter.
    _SYSTEM_PROMPT = """You are an expert software engineer with deep expertise
in writing clean, well-tested, production-ready code. Your task is to implement
code based on a structured engineering task specification. You must:
1. Implement the full solution, not just a skeleton or placeholder.
2. Follow the acceptance criteria exactly as specified.
3. Respect all technical constraints listed in the task.
4. Include inline comments explaining non-obvious implementation choices.
5. List all external dependencies your implementation requires.
Return your response as a JSON object with fields:
  source_code (string), explanation (string), dependencies (array of strings).
"""

    def __init__(self, llm_port: LLMPort) -> None:
        """
        The LLMPort is injected at construction time by the Adaptation layer,
        which receives it from the Realization. The Essence never constructs
        the LLMPort itself — doing so would create a direct dependency on a
        concrete implementation and defeat the purpose of the abstraction.
        """
        self._llm = llm_port
        # Completion listeners are registered by external observers (e.g., monitoring
        # Capabilities) and invoked after each successful generation. The Essence
        # does not know who the listeners are — it simply calls them.
        self._completion_listeners: list[Callable[[GeneratedCode], None]] = []

    def generate_code(self, task: EngineeringTask) -> GeneratedCode:
        """
        The core domain operation: generate code for an engineering task.

        This method orchestrates the full generation process:
        1. Build a structured prompt from the task specification.
        2. Send the prompt to the LLM via the abstract LLMPort.
        3. Parse and validate the response.
        4. Retry with a simplified prompt if parsing fails.
        5. Notify all registered listeners.
        6. Return the structured GeneratedCode artifact.
        """
        prompt = self._build_prompt(task)
        request = LLMRequest(
            system_prompt=self._SYSTEM_PROMPT,
            user_prompt=prompt,
            temperature=0.1,   # Low temperature for deterministic, focused code generation
            max_tokens=8192,
        )

        response = self._llm.complete(request)
        generated_code = self._parse_response(response, task)

        # Notify all registered listeners. This is a synchronous notification —
        # listeners should be fast (e.g., incrementing a counter, writing a log line).
        # If a listener needs to do heavy work, it should schedule it asynchronously.
        for listener in self._completion_listeners:
            try:
                listener(generated_code)
            except Exception:
                # A failing listener must never break the generation pipeline.
                # Errors in observers are intentionally swallowed rather than
                # propagated; production code should also log them here.
                pass

        return generated_code

    def register_completion_listener(
        self, listener: Callable[[GeneratedCode], None]
    ) -> None:
        """Registers a callback to be invoked after each successful generation."""
        self._completion_listeners.append(listener)

    def _build_prompt(self, task: EngineeringTask) -> str:
        """
        Constructs a structured prompt from the engineering task specification.
        The prompt structure is a domain concern — it encodes our knowledge
        of how to communicate effectively with code-generation models.
        This knowledge belongs in the Essence, not in the Realization.
        """
        criteria_text = "\n".join(
            f"  - {criterion}" for criterion in task.acceptance_criteria
        )
        constraints_text = "\n".join(
            f"  - {constraint}" for constraint in task.technical_constraints
        )
        return f"""
Task Title: {task.title}

Task Description:
{task.description}

Target Language: {task.target_language}
Estimated Complexity: {task.estimated_complexity}

Acceptance Criteria (your implementation MUST satisfy all of these):
{criteria_text}

Technical Constraints (your implementation MUST respect all of these):
{constraints_text}

Implement the complete solution now.
"""

    def _parse_response(
        self, response: LLMResponse, task: EngineeringTask
    ) -> GeneratedCode:
        """
        Parses the LLM's raw text response into a structured GeneratedCode object.
        This parsing logic is a domain concern: it encodes our knowledge of
        the expected response format and our validation rules.
        """
        import json
        import re

        # Attempt to extract a JSON block from the response.
        # Models sometimes wrap JSON in markdown code fences.
        content = response.content.strip()
        json_match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", content, re.DOTALL)
        if json_match:
            content = json_match.group(1)

        try:
            parsed = json.loads(content)
            return GeneratedCode(
                task_id=task.task_id,
                language=task.target_language,
                source_code=parsed.get("source_code", ""),
                explanation=parsed.get("explanation", ""),
                dependencies=tuple(parsed.get("dependencies", [])),
                generation_model=response.model_identifier,
            )
        except (json.JSONDecodeError, TypeError, AttributeError):
            # If parsing fails, return a best-effort result with the raw content.
            # The CodeReviewCapability will flag this as a quality issue.
            return GeneratedCode(
                task_id=task.task_id,
                language=task.target_language,
                source_code=content,
                explanation="Raw response — JSON parsing failed.",
                dependencies=(),
                generation_model=response.model_identifier,
            )
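Because _parse_response is pure logic, its fence-stripping rule can be exercised with nothing but strings. The sketch below extracts that rule into a stand-alone function (extract_json_payload is an illustrative name, not an Essence method); the FENCE constant exists only to avoid literal triple backticks inside this snippet:

```python
import json
import re

FENCE = "`" * 3  # a markdown code fence, built without literal backtick runs


def extract_json_payload(content: str) -> dict:
    """Inline copy of the fence-stripping rule shown above, for a runnable demo."""
    content = content.strip()
    pattern = FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE
    match = re.search(pattern, content, re.DOTALL)
    if match:
        content = match.group(1)
    return json.loads(content)


# Models sometimes wrap JSON in a markdown code fence; the rule unwraps it.
wrapped = FENCE + 'json\n{"source_code": "print(1)", "dependencies": []}\n' + FENCE
payload = extract_json_payload(wrapped)
assert payload["source_code"] == "print(1)"

# Bare JSON passes through unchanged.
assert extract_json_payload('{"source_code": "x = 1"}')["source_code"] == "x = 1"
```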

# capabilities/code_generation/adaptation.py
# =============================================================================
# CodeGenerationAdaptation: Implements the CodeGenerationContract interface.
#
# The Adaptation is the public face of the Capability. It implements the
# Protocol interface defined in contracts.py, translating between the
# external interface (what callers expect) and the internal domain logic
# (what the Essence provides).
#
# The Adaptation is thin by design. It should contain no business logic.
# Its only job is to delegate to the Essence and handle any translation
# between external and internal representations.
# =============================================================================

from __future__ import annotations
from typing import Callable

from models import EngineeringTask, GeneratedCode
from contracts import CodeGenerationContract
from capabilities.code_generation.essence import CodeGenerationEssence


class CodeGenerationAdaptation:
    """
    Implements the CodeGenerationContract Protocol interface.

    This class is what other Capabilities receive when they declare a
    Requirement for CodeGenerationContract. They call methods on this
    object, which delegates to the Essence for all actual work.

    The Adaptation is also where you would add cross-cutting concerns
    that are not part of the core domain logic but must be applied at
    every entry point: input validation, request logging, metrics
    collection, circuit breaking, and so on.
    """

    def __init__(self, essence: CodeGenerationEssence) -> None:
        self._essence = essence

    def generate_code(self, task: EngineeringTask) -> GeneratedCode:
        """
        Delegates to the Essence. Validates the input before delegating
        to catch malformed requests at the boundary rather than deep
        inside the domain logic where they are harder to diagnose.
        """
        if not task.task_id:
            raise ValueError("EngineeringTask must have a non-empty task_id.")
        if not task.description:
            raise ValueError("EngineeringTask must have a non-empty description.")
        return self._essence.generate_code(task)

    def register_completion_listener(
        self, listener: Callable[[GeneratedCode], None]
    ) -> None:
        """Delegates listener registration to the Essence."""
        self._essence.register_completion_listener(listener)

# capabilities/code_generation/capability.py
# =============================================================================
# CodeGenerationCapabilityInstance: The glue class.
#
# This class wires together the Essence, Realization, and Adaptation into a
# complete Capability that implements the CapabilityInstance interface.
# It is the object that the CapabilityLifecycleManager creates, initializes,
# starts, stops, and cleans up.
#
# The CapabilityInstance is the only class in the Capability that knows
# about all three layers. It is responsible for constructing them in the
# right order and wiring them together during initialization.
# =============================================================================

from __future__ import annotations
from cca.lifecycle import CapabilityInstance, LifecycleState
from capabilities.llm_port import LLMPort
from capabilities.code_generation.essence import CodeGenerationEssence
from capabilities.code_generation.adaptation import CodeGenerationAdaptation
from contracts import CodeGenerationContract


class CodeGenerationCapabilityInstance(CapabilityInstance):
    """
    The complete CodeGenerationCapability, implementing the CapabilityInstance
    lifecycle interface.

    The llm_port parameter is provided by the assembly layer (assembly.py),
    which is the only place in the system where concrete Realization classes
    are instantiated. This means the CapabilityInstance itself does not know
    which LLM backend is being used — it only knows that it has an LLMPort.
    """

    def __init__(self, llm_port: LLMPort) -> None:
        self._llm_port = llm_port
        self._essence: CodeGenerationEssence | None = None
        self._adaptation: CodeGenerationAdaptation | None = None
        self._state = LifecycleState.CREATED

    def initialize(self) -> None:
        """
        Constructs the Essence and Adaptation, wiring them together.
        This is called by the LifecycleManager after instantiation but
        before any dependencies are injected. At this point, the Capability
        sets up its internal structure but does not yet start any active
        operations (threads, connections, listeners).
        """
        self._essence = CodeGenerationEssence(llm_port=self._llm_port)
        self._adaptation = CodeGenerationAdaptation(essence=self._essence)
        self._state = LifecycleState.INITIALIZED

    def inject_dependency(self, interface_type: type, implementation: object) -> None:
        """
        The CodeGenerationCapability has no Requirements — it does not depend
        on any other Capability's Contract. It only needs an LLMPort, which is
        provided at construction time by the assembly layer.

        If this Capability were extended to depend on, say, a ToolRegistryContract
        (to give the code generator access to external tools), that dependency
        would be injected here by the LifecycleManager.
        """
        pass  # No external Capability dependencies for this Capability.

    def start(self) -> None:
        """
        Transitions the Capability to the Started state.
        For the CodeGenerationCapability, starting is simple — there are no
        background threads or persistent connections to establish. The LLM
        connection is made on-demand for each request.
        """
        self._state = LifecycleState.STARTED

    def stop(self) -> None:
        """
        Gracefully stops the Capability. Any in-flight requests are allowed
        to complete before the Capability stops accepting new ones.
        """
        self._state = LifecycleState.STOPPED

    def cleanup(self) -> None:
        """
        Releases all resources. After cleanup, this instance cannot be restarted.
        """
        self._essence = None
        self._adaptation = None
        self._state = LifecycleState.CLEANED_UP

    def get_contract_implementation(self, interface_type: type) -> object | None:
        """
        Returns the Adaptation object if the requested interface type is
        CodeGenerationContract. Returns None for any other type.

        The LifecycleManager calls this method when injecting this Capability's
        Provisions into other Capabilities that have declared a Requirement
        for CodeGenerationContract.
        """
        if interface_type is CodeGenerationContract:
            return self._adaptation
        return None

Chapter 7 — Evolution Envelopes

Change is inevitable in any software system, and in agentic AI systems it is especially rapid. Models are updated, deprecated, and replaced. Output formats evolve. New capabilities are added. Old interfaces become obsolete. Without a formal mechanism for managing this change, evolution becomes a source of instability rather than progress.

7.1 — Why Evolution Envelopes Are Necessary

Consider what happens when the CodeGenerationCapability's Contract needs to change. Perhaps the GeneratedCode model needs a new field — security_analysis — that the CodeReviewCapability will use to understand what security considerations the code generator already took into account. This is a backward-compatible change: existing consumers of CodeGenerationContract that do not use the new field will continue to work without modification. This is a MINOR version bump in Semantic Versioning terms.

Now consider a more disruptive change: the generate_code method needs to return a list of GeneratedCode objects instead of a single one, because some tasks are better served by generating multiple implementation alternatives. This is a backward-incompatible change: every consumer of CodeGenerationContract that calls generate_code and expects a single object will break. This is a MAJOR version bump.
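The breakage is easy to demonstrate with illustrative stand-ins for the generator and its consumer (these class names are hypothetical, not part of the guide's codebase):

```python
# Hypothetical demonstration of the backward-incompatible (MAJOR) change:
# generate_code() switching from a single result to a list of alternatives.
class GeneratedCode:
    def __init__(self, source_code: str) -> None:
        self.source_code = source_code


class GeneratorV1:              # Contract 1.x behavior
    def generate_code(self, task: str) -> GeneratedCode:
        return GeneratedCode(f"# solution for {task}")


class GeneratorV2:              # Contract 2.0 behavior (MAJOR bump)
    def generate_code(self, task: str) -> list[GeneratedCode]:
        return [GeneratedCode(f"# alternative {i} for {task}") for i in range(2)]


def consumer(generator) -> str:
    # A Contract-1.x-era consumer that assumes a single returned object.
    return generator.generate_code("sort a list").source_code


print(consumer(GeneratorV1()))   # works under Contract 1.x
try:
    consumer(GeneratorV2())      # breaks: a list has no .source_code
except AttributeError:
    print("v1 consumer broke under Contract 2.0")
```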

Without an Evolution Envelope, the only way to communicate this change to consumers is through documentation, Slack messages, or code comments — all of which are invisible to the system itself. With an Evolution Envelope, the change is formally recorded in the Capability's metadata, accessible through the Registry, with a documented migration path and an end-of-life date for the old interface. The system can warn consumers at startup if they are depending on a deprecated version.

7.2 — The Three Components of an Evolution Envelope

An Evolution Envelope contains three types of information. Versioning information records the current and previous version of the Capability using Semantic Versioning (MAJOR.MINOR.PATCH). The version number is not just metadata — it is a communication tool that tells consumers exactly what kind of change has occurred and what action they need to take. A PATCH bump means "nothing you depend on has changed." A MINOR bump means "new features are available; your existing code still works." A MAJOR bump means "you must update your code to use the new interface."
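This decision rule can be captured in a few lines. The helper below is an illustration of the semantics, not part of cca/evolution.py:

```python
# Illustrative helper: classify the kind of change between two semantic
# versions so a consumer can decide what action is required.
def classify_bump(old: str, new: str) -> str:
    old_parts = [int(p) for p in old.split(".")]
    new_parts = [int(p) for p in new.split(".")]
    if new_parts[0] != old_parts[0]:
        return "MAJOR"   # breaking: consumers must update their code
    if new_parts[1] != old_parts[1]:
        return "MINOR"   # additive: existing consumer code keeps working
    return "PATCH"       # internal fix: nothing consumers depend on changed


print(classify_bump("1.2.3", "2.0.0"))  # MAJOR
print(classify_bump("1.2.3", "1.3.0"))  # MINOR
print(classify_bump("1.2.3", "1.2.4"))  # PATCH
```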

Deprecation notices are formal announcements that a specific feature, method, or version of a Capability is being phased out. A deprecation notice includes the target (what is being deprecated), the end-of-life date (when it will stop working), the reason (why it is being deprecated), and the replacement (what consumers should use instead). By recording deprecations formally in the Evolution Envelope, the system can surface them at startup, giving teams visibility into upcoming breaking changes before they occur.

Migration paths provide concrete, actionable guidance for upgrading from an older version of a Contract to a newer one. A migration path includes the from-version and to-version, a list of breaking changes, a URL to detailed documentation, and optionally a reference to an automated migration tool or script. The goal is to make migration as low-friction as possible, so that teams are not discouraged from upgrading by the fear of unknown effort.

# cca/evolution.py
# =============================================================================
# Evolution Envelope: formal versioning, deprecation, and migration management.
#
# Every Capability has an EvolutionEnvelope registered alongside its Contract
# in the CapabilityRegistry. This makes the versioning status of every
# Capability visible to the entire system, not just to the team that owns it.
# =============================================================================

from __future__ import annotations
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class DeprecationNotice:
    """
    A formal notice that a specific feature or version of a Capability
    is being deprecated.

    The end_of_life_date is a hard commitment: after this date, the
    deprecated feature will be removed and consumers that have not
    migrated will break. Setting this date far enough in the future
    (typically 90-180 days for internal systems) gives consumers
    sufficient time to migrate without feeling rushed.

    The migration_guide_url is optional because not all deprecations
    require a separate migration guide — sometimes the replacement
    description in the 'replacement' field is sufficient.
    """
    target: str                    # e.g., "Contract v1.x", "generate_code() single-return"
    end_of_life_date: date
    reason: str
    replacement: str
    migration_guide_url: str | None = None


@dataclass(frozen=True)
class MigrationPath:
    """
    Concrete guidance for upgrading from one version of a Capability's
    Contract to another.

    The breaking_changes tuple lists every change that requires consumer
    code to be updated. Being explicit about breaking changes — rather than
    just saying "see the docs" — reduces the effort required to assess
    the impact of a migration and helps teams prioritize their migration work.

    The estimated_migration_effort field is a rough guide to help teams
    plan their sprints. It is not a guarantee, but it is better than
    no estimate at all.
    """
    from_version: str
    to_version: str
    breaking_changes: tuple[str, ...]
    documentation_url: str
    automated_migration_tool: str | None
    estimated_migration_effort: str    # "trivial" | "low" | "medium" | "high"


@dataclass
class EvolutionEnvelope:
    """
    The complete Evolution Envelope for a Capability.

    Note that this class is NOT frozen=True, unlike most of our data classes.
    This is intentional: the Evolution Envelope is expected to accumulate
    deprecation notices and migration paths over the lifetime of the system.
    It is mutable by design, but only the Capability's owner should mutate it.

    The EvolutionEnvelope is stored in the CapabilityDescriptor and is
    accessible through the CapabilityRegistry. This means any component
    in the system can query the versioning status of any Capability at
    runtime, enabling dynamic deprecation warnings and health checks.
    """
    current_version: str
    previous_version: str | None
    deprecation_notices: list[DeprecationNotice] = field(default_factory=list)
    migration_paths: list[MigrationPath] = field(default_factory=list)
    release_notes_url: str | None = None

    def is_version_deprecated(self, version: str) -> bool:
        """
        Returns True if the given version string appears in any active
        deprecation notice's target. Used by the Registry to warn about
        consumers that are depending on deprecated versions.
        """
        return any(version in notice.target for notice in self.deprecation_notices)

    def get_migration_path(
        self, from_version: str, to_version: str
    ) -> MigrationPath | None:
        """
        Returns the migration path between two specific versions, or None
        if no migration path has been defined for this pair. A missing
        migration path is itself a signal that should be flagged — it means
        the Capability owner has not yet provided migration guidance.
        """
        for path in self.migration_paths:
            if path.from_version == from_version and path.to_version == to_version:
                return path
        return None

    def get_active_deprecation_notices(
        self, as_of: date | None = None
    ) -> list[DeprecationNotice]:
        """
        Returns all deprecation notices that are still active as of the
        given date — that is, notices whose end_of_life_date has not yet
        passed. Defaults to today if no date is provided.

        This method is called by the CapabilityLifecycleManager during
        startup to surface deprecation warnings before the system begins
        processing requests.
        """
        check_date = as_of or date.today()
        return [
            notice for notice in self.deprecation_notices
            if notice.end_of_life_date >= check_date
        ]

    def add_deprecation_notice(self, notice: DeprecationNotice) -> None:
        """Adds a new deprecation notice to this envelope."""
        self.deprecation_notices.append(notice)

    def add_migration_path(self, path: MigrationPath) -> None:
        """Adds a new migration path to this envelope."""
        self.migration_paths.append(path)

Chapter 8 — The CapabilityInstance Interface and Lifecycle States

For the Lifecycle Manager to orchestrate all Capabilities uniformly — regardless of whether they are AI agents, infrastructure services, or monitoring components — every Capability must implement a common interface: CapabilityInstance. This interface defines the six lifecycle methods that the Lifecycle Manager calls in a precise sequence. It is not optional, and it is not a convenience — it is the explicit contract between every Capability and the infrastructure that manages it.

8.1 — Why a Uniform Lifecycle Interface Is Necessary

Without a uniform lifecycle interface, the Lifecycle Manager cannot orchestrate Capabilities in a general way. It would need to know the specific startup and shutdown procedures for each Capability, which means it would need to import and depend on every Capability's concrete implementation. This is precisely the kind of concrete coupling that CCA is designed to eliminate.

With the CapabilityInstance interface, the Lifecycle Manager knows nothing about the specifics of any Capability. It only knows that every Capability has initialize(), inject_dependency(), start(), stop(), cleanup(), and get_contract_implementation() methods. It can orchestrate the entire system — regardless of how many Capabilities it contains or what they do — using only these six methods.

8.2 — The Lifecycle States and Their Transitions

A Capability passes through six well-defined states during its lifetime. Understanding these states and the transitions between them is essential for implementing the CapabilityInstance interface correctly.

The Created state is the initial state. The Capability object has been instantiated by the Lifecycle Manager using the factory function registered in the Capability's descriptor. At this point, the object exists in memory but has not performed any setup. No external interactions should occur in this state — not even reading a configuration file. The constructor should be as lightweight as possible.

The Initialized state is entered when the Lifecycle Manager calls initialize(). In this state, the Capability performs its internal setup: loading configuration, constructing its Essence and Adaptation objects, allocating internal data structures, and preparing any resources that do not depend on other Capabilities. The key constraint is that initialize() must not use any injected dependencies — those are not yet available. If the Capability tries to call another Capability's interface in initialize(), it will fail because the dependency has not yet been injected.

The Dependencies Injected state is entered when the Lifecycle Manager has called inject_dependency() for all of the Capability's declared Requirements. At this point, the Capability has received references to all the external interfaces it needs. It can store these references internally but should not yet start using them actively — that happens in start().

The Started state is entered when the Lifecycle Manager calls start(). This is when the Capability becomes fully operational. It starts background threads, opens persistent connections, activates event listeners, and begins accepting requests. From this point on, the Capability is delivering value.

The Stopped state is entered when the Lifecycle Manager calls stop(), which happens in reverse topological order during shutdown. The Capability gracefully ceases its active operations: it stops accepting new requests, waits for in-flight requests to complete, halts background threads, and closes active connections. Crucially, it does not yet release its resources — that happens in cleanup().

The Cleaned Up state is the final state. The Lifecycle Manager calls cleanup() after stop() has completed. The Capability releases all resources: hardware handles, file descriptors, database connections, network sockets, and memory buffers. After cleanup, the Capability instance cannot be restarted without being re-instantiated and re-initialized from scratch.
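These transitions can be made machine-checkable. The table below is a sketch of how a manager might validate them; the linear next-state map is an assumption drawn from this chapter's narrative, not code from cca/lifecycle.py:

```python
# Sketch: enforcing the legal lifecycle transitions described above.
# The NEXT_STATE table is an assumption of this guide's narrative.
from enum import Enum, auto


class LifecycleState(Enum):
    CREATED = auto()
    INITIALIZED = auto()
    DEPENDENCIES_INJECTED = auto()
    STARTED = auto()
    STOPPED = auto()
    CLEANED_UP = auto()


# Each state maps to the single state that may legally follow it.
NEXT_STATE: dict[LifecycleState, LifecycleState] = {
    LifecycleState.CREATED: LifecycleState.INITIALIZED,
    LifecycleState.INITIALIZED: LifecycleState.DEPENDENCIES_INJECTED,
    LifecycleState.DEPENDENCIES_INJECTED: LifecycleState.STARTED,
    LifecycleState.STARTED: LifecycleState.STOPPED,
    LifecycleState.STOPPED: LifecycleState.CLEANED_UP,
}


def assert_transition(current: LifecycleState, target: LifecycleState) -> None:
    # Raise if the requested transition is not the legal successor.
    if NEXT_STATE.get(current) is not target:
        raise RuntimeError(f"Illegal lifecycle transition: {current.name} -> {target.name}")


assert_transition(LifecycleState.CREATED, LifecycleState.INITIALIZED)  # legal, no error
```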

# cca/lifecycle.py
# =============================================================================
# CapabilityInstance interface and LifecycleState enum.
#
# Every Capability in the system must implement CapabilityInstance.
# This is the contract between Capabilities and the CapabilityLifecycleManager.
# =============================================================================

from __future__ import annotations
from abc import ABC, abstractmethod
from enum import Enum, auto


class LifecycleState(Enum):
    """
    The well-defined lifecycle states every Capability passes through.
    The CapabilityLifecycleManager is the sole authority responsible for
    transitioning Capabilities between these states. A Capability must
    never transition itself — it must wait for the Manager to call the
    appropriate lifecycle method.
    """
    CREATED               = auto()   # Instantiated, no setup performed
    INITIALIZED           = auto()   # Internal setup done, awaiting deps
    DEPENDENCIES_INJECTED = auto()   # All deps injected, not yet active
    STARTED               = auto()   # Fully operational
    STOPPED               = auto()   # Active ops ceased, resources retained
    CLEANED_UP            = auto()   # All resources released


class CapabilityInstance(ABC):
    """
    The interface every Capability must implement to participate in the
    CCA lifecycle. This abstract base class defines the six lifecycle
    methods that the CapabilityLifecycleManager calls in sequence.

    Implementing classes must not call lifecycle methods on themselves —
    the Manager is the sole orchestrator of lifecycle transitions.
    Implementing classes must not call lifecycle methods on other
    Capabilities — all inter-Capability communication goes through
    injected Contract interfaces.
    """

    @abstractmethod
    def initialize(self) -> None:
        """
        Performs internal setup that does not require any injected dependencies.
        Called by the LifecycleManager immediately after instantiation,
        in topological order (dependencies-first).

        What belongs here:
        - Loading internal configuration
        - Constructing Essence and Adaptation objects
        - Allocating internal data structures
        - Setting up internal state

        What does NOT belong here:
        - Using injected dependencies (they are not yet available)
        - Starting threads or opening connections
        - Making network calls or reading from databases
        """
        ...

    @abstractmethod
    def inject_dependency(self, interface_type: type, implementation: object) -> None:
        """
        Receives an injected dependency from the LifecycleManager.
        Called once for each declared Requirement, after initialize()
        has completed and after the providing Capability has been started.

        The implementation parameter is the providing Capability's
        Adaptation object — specifically, the object returned by
        get_contract_implementation() on the provider.

        Implementations should store the injected dependency in an
        instance variable for use in start() and subsequent operations.
        They should validate that the injected object is an instance of
        the expected Protocol type (using isinstance() with @runtime_checkable).
        """
        ...

    @abstractmethod
    def start(self) -> None:
        """
        Transitions the Capability to the Started state.
        Called by the LifecycleManager after all dependencies have been
        injected. This is when the Capability becomes fully operational.

        What belongs here:
        - Starting background threads
        - Opening persistent connections (database pools, message queue connections)
        - Activating event listeners
        - Performing initial data loading or cache warming
        - Any operation that uses injected dependencies
        """
        ...

    @abstractmethod
    def stop(self) -> None:
        """
        Gracefully ceases active operations.
        Called by the LifecycleManager in reverse topological order during
        shutdown, ensuring that consumers are stopped before their providers.

        Implementations must ensure that all in-flight requests complete
        before this method returns. They must not accept new requests after
        this method is called. They must not release resources — that
        happens in cleanup().
        """
        ...

    @abstractmethod
    def cleanup(self) -> None:
        """
        Releases all resources.
        Called by the LifecycleManager after stop() has completed,
        also in reverse topological order.

        After this method returns, the Capability instance is inert.
        It cannot be restarted without being re-instantiated and
        re-initialized. Implementations should set all resource references
        to None to allow garbage collection.
        """
        ...

    @abstractmethod
    def get_contract_implementation(self, interface_type: type) -> object | None:
        """
        Returns the Adaptation object that implements the given interface type,
        or None if this Capability does not provide that interface.

        This method is called by the LifecycleManager when it needs to inject
        this Capability's Provisions into other Capabilities that have declared
        Requirements for them. The returned object is passed directly to the
        requiring Capability's inject_dependency() method.

        Implementations should return self._adaptation (or the specific
        Adaptation object) if interface_type matches one of their declared
        Provisions, and None otherwise.
        """
        ...

Chapter 9 — The Capability Registry

The Capability Registry is the authoritative source of truth for the entire CCA system. It is the central hub where all Capabilities register themselves, where all Contracts are recorded, where all dependencies are declared, and where the complete dependency graph of the system is maintained. Without the Registry, the Lifecycle Manager would have no way to know what Capabilities exist, what they provide, what they need, or in what order they should be initialized.

9.1 — What the Registry Does and Why It Matters

The Registry serves five distinct functions, each of which is essential to the system's operation.

Capability Registration is the process by which a Capability announces its existence to the system. When a Capability is registered, its CapabilityDescriptor — which includes its name, its Contract, its Evolution Envelope, and a factory function for creating instances — is stored in the Registry. From this point on, any component in the system can query the Registry to discover what Capabilities are available and what interfaces they provide.

Contract Management is the ongoing maintenance of the complete record of all Provisions and Requirements declared across all registered Capabilities. This record is what makes the dependency graph possible. Without it, the system would have no way to know that the OrchestratorCapability requires a CodeGenerationContract and that the CodeGenerationCapability provides one.

Dependency Resolution is the process of matching Requirements to Provisions. When the OrchestratorCapability declares that it requires a CodeGenerationContract, the Registry searches its records to find which registered Capability provides that interface. This matching is done by interface type — the exact Python class object — not by name, which ensures that the matching is unambiguous and type-safe.

Circular Dependency Detection is one of the Registry's most important safety functions. A circular dependency — where Capability A requires Capability B, which requires Capability C, which requires Capability A — makes topological sorting impossible and would cause the Lifecycle Manager to deadlock during initialization. The Registry detects circular dependencies at registration time, before the system starts, and rejects any registration that would create one. This turns a runtime deadlock into a startup-time error, which is far easier to diagnose and fix.

Topological Sorting is the computation of the correct initialization order for all registered Capabilities. The Registry provides the dependency graph information that the Lifecycle Manager uses to perform this sort. The result is an ordered list of Capabilities such that every Capability appears after all the Capabilities it depends on. This is the order in which the Lifecycle Manager will initialize, inject dependencies into, and start each Capability.
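Both topological sorting and the cycle failure mode can be sketched with the standard library's graphlib; the capability names below are illustrative:

```python
# Sketch using the stdlib: TopologicalSorter takes a mapping of
# node -> its dependencies (predecessors) and emits a valid order.
from graphlib import TopologicalSorter, CycleError

dependencies = {
    "OrchestratorCapability": {"CodeGenerationCapability", "CodeReviewCapability"},
    "CodeReviewCapability": {"CodeGenerationCapability"},
    "CodeGenerationCapability": set(),
}
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # CodeGenerationCapability first, OrchestratorCapability last

# A cycle makes ordering impossible and raises CycleError at sort time;
# the Registry converts this failure into a registration-time error.
try:
    list(TopologicalSorter({"A": {"B"}, "B": {"A"}}).static_order())
except CycleError:
    print("cycle detected")
```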

9.2 — The CapabilityDescriptor

CapabilityDescriptor is the complete registration record for a Capability. It contains everything the Registry needs to know about a Capability: its name (unique identifier within the system), its Contract (what it provides and requires), its Evolution Envelope (its versioning and deprecation status), and a factory function (a callable that creates a new instance of the Capability). The factory function is particularly important: it allows the Registry and Lifecycle Manager to create Capability instances on demand without knowing the Capability's concrete class.

# cca/registry.py
# =============================================================================
# CapabilityRegistry: The authoritative source of truth for the CCA system.
#
# The Registry is a singleton within the system — there is exactly one
# Registry, and all Capabilities register with it. This centralization is
# intentional: it makes the system's dependency graph visible in one place,
# which is essential for debugging, monitoring, and governance.
# =============================================================================

from __future__ import annotations
from dataclasses import dataclass, field
from typing import Callable, Any

from contracts import CapabilityContract
from cca.evolution import EvolutionEnvelope
from cca.lifecycle import CapabilityInstance


@dataclass
class CapabilityDescriptor:
    """
    The complete registration record for a Capability.

    The factory field is a zero-argument callable that creates a new
    CapabilityInstance. Using a factory (rather than storing the instance
    directly) allows the Registry to be populated before any instances are
    created, which is important for circular dependency detection — you need
    to know about all Capabilities before you can check for cycles.

    The factory is typically a lambda that captures the Capability's
    constructor arguments from the assembly layer. For example:
        lambda: CodeGenerationCapabilityInstance(llm_port=gpt53_realization)
    This lambda is evaluated lazily — only when the LifecycleManager
    decides to create the instance — so the LLM port is not constructed
    until it is actually needed.
    """
    name: str
    contract: CapabilityContract
    evolution_envelope: EvolutionEnvelope
    factory: Callable[[], CapabilityInstance]


@dataclass
class ContractBinding:
    """
    Records a formal dependency relationship between two Capabilities.

    A ContractBinding is created when the assembly layer explicitly declares
    that a specific Requirement of one Capability should be fulfilled by a
    specific Provision of another Capability. In most cases, the Registry
    can resolve these bindings automatically based on interface type matching,
    but explicit bindings are useful when multiple Capabilities provide the
    same interface and you need to specify which one a particular consumer
    should use.

    The provided_interface_name field is metadata for documentation and
    debugging — it is not used for programmatic matching, which is done
    by interface_type.
    """
    requiring_capability_name: str
    required_interface_type: type
    providing_capability_name: str
    provided_interface_name: str


class CapabilityRegistry:
    """
    The central hub for all Capability registrations, Contracts, and
    dependency relationships.

    The Registry is designed to be populated completely before the
    CapabilityLifecycleManager is started. All Capabilities should be
    registered and all bindings should be declared before any Capability
    is instantiated. This "declare everything first, then start" approach
    ensures that circular dependency detection is complete and accurate
    before any irreversible actions (like opening network connections) are taken.
    """

    def __init__(self) -> None:
        # Maps capability name -> CapabilityDescriptor
        self._capabilities: dict[str, CapabilityDescriptor] = {}
        # All declared contract bindings
        self._bindings: list[ContractBinding] = []

    def register(self, descriptor: CapabilityDescriptor) -> None:
        """
        Registers a Capability with the Registry.

        This method validates that:
        1. No Capability with the same name is already registered.
        2. Adding this Capability does not create a circular dependency.

        If either validation fails, the registration is rejected and the
        Registry state is unchanged. This rollback-on-failure behavior
        ensures the Registry is always in a consistent state.
        """
        if descriptor.name in self._capabilities:
            raise ValueError(
                f"A Capability named '{descriptor.name}' is already registered. "
                f"Capability names must be unique within the system."
            )

        self._capabilities[descriptor.name] = descriptor

        # Validate that adding this Capability does not create a cycle.
        # If it does, roll back the registration before raising the error.
        if self._has_circular_dependencies():
            del self._capabilities[descriptor.name]
            raise ValueError(
                f"Registering '{descriptor.name}' would create a circular dependency. "
                f"Review the Requirements declared in its CapabilityContract."
            )

        # Warn about any active deprecation notices in the Evolution Envelope.
        active_notices = descriptor.evolution_envelope.get_active_deprecation_notices()
        for notice in active_notices:
            print(
                f"[CCA WARNING] Capability '{descriptor.name}' has an active "
                f"deprecation notice: '{notice.target}' is deprecated "
                f"(EOL: {notice.end_of_life_date}). "
                f"Replacement: {notice.replacement}"
            )

    def bind(
        self,
        requiring: str,
        required_interface: type,
        providing: str,
        provided_interface_name: str,
    ) -> None:
        """
        Declares an explicit dependency binding between two Capabilities.

        This method validates that:
        1. Both the requiring and providing Capabilities are registered.
        2. The providing Capability actually provides the required interface.
        3. Adding this binding does not create a circular dependency.

        Explicit bindings are optional for most cases — the LifecycleManager
        can resolve dependencies automatically by interface type. Use explicit
        bindings when you need to override the automatic resolution, for
        example when multiple Capabilities provide the same interface.
        """
        if requiring not in self._capabilities:
            raise ValueError(
                f"Cannot create binding: requiring Capability '{requiring}' "
                f"is not registered."
            )
        if providing not in self._capabilities:
            raise ValueError(
                f"Cannot create binding: providing Capability '{providing}' "
                f"is not registered."
            )

        provider_contract = self._capabilities[providing].contract
        if not provider_contract.provides(required_interface):
            raise ValueError(
                f"Cannot create binding: Capability '{providing}' does not "
                f"provide the interface '{required_interface.__name__}'. "
                f"Check the Provisions declared in its CapabilityContract."
            )

        binding = ContractBinding(
            requiring_capability_name=requiring,
            required_interface_type=required_interface,
            providing_capability_name=providing,
            provided_interface_name=provided_interface_name,
        )
        self._bindings.append(binding)

        if self._has_circular_dependencies():
            self._bindings.pop()
            raise ValueError(
                f"Adding binding from '{requiring}' to '{providing}' would "
                f"create a circular dependency."
            )

    def find_provider(self, interface_type: type) -> CapabilityDescriptor | None:
        """
        Finds the Capability that provides the given interface type.
        Returns None if no provider is registered.

        In a system where multiple Capabilities could provide the same
        interface, this method returns the first match found. For more
        precise control, use explicit bindings via the bind() method.
        """
        for descriptor in self._capabilities.values():
            if descriptor.contract.provides(interface_type):
                return descriptor
        return None

    def get_descriptor(self, name: str) -> CapabilityDescriptor | None:
        """Returns the CapabilityDescriptor for the given Capability name."""
        return self._capabilities.get(name)

    def get_all_descriptors(self) -> list[CapabilityDescriptor]:
        """Returns all registered CapabilityDescriptors."""
        return list(self._capabilities.values())

    def get_topological_order(self) -> list[str]:
        """
        Computes the topological initialization order for all registered
        Capabilities using Kahn's algorithm.

        Kahn's algorithm works by repeatedly selecting Capabilities that
        have no unresolved dependencies (in-degree zero), adding them to
        the result, and then removing their outgoing edges from the graph.
        This continues until either all Capabilities have been added to
        the result (success) or no more Capabilities can be selected
        (which indicates a cycle, though this should have been caught
        at registration time).

        The result is a list of Capability names in the order they should
        be initialized: every Capability appears after all the Capabilities
        it depends on.
        """
        # Build the dependency graph: name -> set of names it depends on
        dependencies: dict[str, set[str]] = {
            name: set() for name in self._capabilities
        }

        for binding in self._bindings:
            requiring = binding.requiring_capability_name
            providing = binding.providing_capability_name
            if requiring in dependencies:
                dependencies[requiring].add(providing)

        # Also resolve implicit dependencies by interface type matching
        for name, descriptor in self._capabilities.items():
            for req in descriptor.contract.requirements:
                provider = self.find_provider(req.interface_type)
                # An optional Requirement still orders startup when a
                # provider exists: the provider must start first so that
                # injection can succeed. If no provider is registered,
                # an optional Requirement simply contributes no edge.
                if provider and provider.name != name:
                    dependencies[name].add(provider.name)

        # Kahn's algorithm
        in_degree = {name: len(deps) for name, deps in dependencies.items()}
        queue = [name for name, degree in in_degree.items() if degree == 0]
        result: list[str] = []

        while queue:
            # Select the next Capability with no unresolved dependencies.
            # Sort for determinism — the topological order is not unique,
            # and we want consistent behavior across runs.
            queue.sort()
            current = queue.pop(0)
            result.append(current)

            # Remove this Capability from the dependency sets of all others.
            for name, deps in dependencies.items():
                if current in deps:
                    deps.discard(current)
                    in_degree[name] -= 1
                    if in_degree[name] == 0:
                        queue.append(name)

        if len(result) != len(self._capabilities):
            raise RuntimeError(
                "Topological sort failed: circular dependency detected. "
                "This should have been caught at registration time. "
                "Please report this as a bug."
            )

        return result

    def _has_circular_dependencies(self) -> bool:
        """
        Detects circular dependencies using depth-first search.
        Returns True if any cycle exists in the current dependency graph.

        This method is called after every registration and binding to
        ensure the Registry remains cycle-free at all times. The cost
        of this check is O(V + E) where V is the number of Capabilities
        and E is the number of dependency edges — negligible at startup.
        """
        # Build adjacency list: name -> list of names it depends on
        graph: dict[str, list[str]] = {
            name: [] for name in self._capabilities
        }
        for binding in self._bindings:
            if binding.requiring_capability_name in graph:
                graph[binding.requiring_capability_name].append(
                    binding.providing_capability_name
                )
        for name, descriptor in self._capabilities.items():
            for req in descriptor.contract.requirements:
                provider = self.find_provider(req.interface_type)
                if provider and provider.name != name:
                    if provider.name not in graph[name]:
                        graph[name].append(provider.name)

        visited: set[str] = set()
        recursion_stack: set[str] = set()

        def dfs(node: str) -> bool:
            visited.add(node)
            recursion_stack.add(node)
            for neighbor in graph.get(node, []):
                if neighbor not in visited:
                    if dfs(neighbor):
                        return True
                elif neighbor in recursion_stack:
                    return True
            recursion_stack.discard(node)
            return False

        for name in self._capabilities:
            if name not in visited and dfs(name):
                return True
        return False

Chapter 10 — The Capability Lifecycle Manager

If the Registry is the system's memory — knowing what exists and how things relate — then the Lifecycle Manager is the system's will: the component that acts on that knowledge to bring the system to life and shut it down gracefully. The Lifecycle Manager is the only component in the system that creates Capability instances, calls their lifecycle methods, injects their dependencies, and orchestrates their shutdown. No other component does any of these things.

10.1 — Why a Dedicated Lifecycle Manager Is Necessary

In a system without a Lifecycle Manager, startup order is typically managed through one of two approaches: either the main() function manually instantiates and starts each component in the right order, or a dependency injection framework handles it automatically. The first approach breaks down as the system grows — the main() function becomes a sprawling, fragile sequence of imperative statements that must be manually updated every time a new Capability is added or a dependency changes. The second approach typically requires annotating every class with framework-specific decorators, creating a tight coupling between the application code and the DI framework.

The CCA Lifecycle Manager takes a different approach. It reads the dependency graph from the Registry, computes the correct initialization order automatically, and then executes the lifecycle sequence for each Capability in that order. Adding a new Capability to the system requires only registering it with the Registry — the Lifecycle Manager automatically incorporates it into the startup sequence without any changes to the orchestration code.

10.2 — The Startup Sequence

The Lifecycle Manager's startup sequence is a precise, ordered series of operations. First, it queries the Registry to get the topological order of all registered Capabilities. Then, for each Capability in that order, it executes four steps: it calls the factory function to create the Capability instance, it calls initialize() to perform internal setup, it calls inject_dependency() for each of the Capability's declared Requirements (passing the providing Capability's Adaptation object), and finally it calls start() to make the Capability operational.

The shutdown sequence is the reverse: for each Capability in reverse topological order, the Lifecycle Manager calls stop() and then cleanup(). Reverse topological order ensures that consumers are stopped before their providers — you would not want to shut down the CodeGenerationCapability while the OrchestratorCapability is still trying to use it.

# cca/manager.py
# =============================================================================
# CapabilityLifecycleManager: Orchestrates the complete lifecycle of all
# Capabilities in the system.
#
# This is the only component that creates CapabilityInstance objects,
# calls their lifecycle methods, and injects their dependencies.
# Everything else in the system interacts with Capabilities exclusively
# through their Contract interfaces.
# =============================================================================

from __future__ import annotations
from cca.registry import CapabilityRegistry
from cca.lifecycle import CapabilityInstance


class CapabilityLifecycleManager:
    """
    Orchestrates the complete lifecycle of all registered Capabilities.

    The Manager operates in two phases:
    1. Startup: Creates, initializes, injects dependencies into, and starts
       all Capabilities in topological order.
    2. Shutdown: Stops and cleans up all Capabilities in reverse topological order.

    The Manager maintains a registry of live instances so that it can
    inject the right implementation into each Capability's Requirements
    and so that it can orchestrate shutdown correctly.
    """

    def __init__(self, registry: CapabilityRegistry) -> None:
        self._registry = registry
        # Maps capability name -> live CapabilityInstance
        self._instances: dict[str, CapabilityInstance] = {}
        # The topological order, computed once at startup
        self._startup_order: list[str] = []

    def start_all(self) -> None:
        """
        Starts all registered Capabilities in topological order.

        This method is the single entry point for system startup.
        After this method returns successfully, every registered Capability
        is in the Started state and ready to accept requests.

        If any step fails, the exception is propagated to the caller.
        In a production system, you would want to add cleanup logic here
        to shut down any Capabilities that were successfully started before
        the failure occurred. For clarity, this example omits that logic.
        """
        self._startup_order = self._registry.get_topological_order()

        print(f"[CCA] Starting {len(self._startup_order)} Capabilities "
              f"in topological order: {' -> '.join(self._startup_order)}")

        for name in self._startup_order:
            descriptor = self._registry.get_descriptor(name)
            if descriptor is None:
                raise RuntimeError(f"No descriptor found for Capability '{name}'.")

            # Step 1: Create the instance using the registered factory.
            # The factory is a lambda that captures the Realization and other
            # constructor arguments from the assembly layer.
            print(f"[CCA]   Creating '{name}'...")
            instance = descriptor.factory()
            self._instances[name] = instance

            # Step 2: Initialize internal state (no dependencies yet).
            print(f"[CCA]   Initializing '{name}'...")
            instance.initialize()

            # Step 3: Inject all declared Requirements.
            # For each required interface, find the providing Capability's
            # live instance and get its contract implementation.
            self._inject_dependencies(name, instance, descriptor.contract)

            # Step 4: Start the Capability (activate threads, connections, etc.).
            print(f"[CCA]   Starting '{name}'...")
            instance.start()
            print(f"[CCA]   '{name}' is now STARTED.")

        print(f"[CCA] All {len(self._startup_order)} Capabilities started successfully.")

    def stop_all(self) -> None:
        """
        Stops and cleans up all Capabilities in reverse topological order.
        This ensures consumers are stopped before their providers.
        """
        shutdown_order = list(reversed(self._startup_order))
        print(f"[CCA] Shutting down {len(shutdown_order)} Capabilities "
              f"in reverse order: {' -> '.join(shutdown_order)}")

        for name in shutdown_order:
            instance = self._instances.get(name)
            if instance is None:
                continue
            print(f"[CCA]   Stopping '{name}'...")
            instance.stop()
            print(f"[CCA]   Cleaning up '{name}'...")
            instance.cleanup()
            print(f"[CCA]   '{name}' is CLEANED UP.")

        self._instances.clear()
        print("[CCA] All Capabilities shut down.")

    def get_instance(self, name: str) -> CapabilityInstance | None:
        """
        Returns the live CapabilityInstance for the given name.
        Used by the assembly layer to retrieve the entry-point Capability
        (typically the OrchestratorCapability) after startup.
        """
        return self._instances.get(name)

    def _inject_dependencies(
        self,
        name: str,
        instance: CapabilityInstance,
        contract: "CapabilityContract",
    ) -> None:
        """
        Injects all declared Requirements into the given Capability instance.

        For each required interface type, this method:
        1. Finds the registered Capability that provides that interface.
        2. Retrieves the live instance of that providing Capability.
        3. Gets the providing Capability's contract implementation for
           the required interface type.
        4. Calls inject_dependency() on the requiring Capability.

        If a required (non-optional) dependency cannot be satisfied,
        a RuntimeError is raised. Optional dependencies that cannot be
        satisfied are silently skipped.
        """
        for requirement in contract.requirements:
            provider_descriptor = self._registry.find_provider(
                requirement.interface_type
            )

            if provider_descriptor is None:
                if requirement.optional:
                    print(
                        f"[CCA]   Optional dependency '{requirement.name}' "
                        f"for '{name}' not satisfied — skipping."
                    )
                    continue
                raise RuntimeError(
                    f"Cannot satisfy required dependency '{requirement.name}' "
                    f"(interface: {requirement.interface_type.__name__}) "
                    f"for Capability '{name}'. "
                    f"No registered Capability provides this interface."
                )

            provider_instance = self._instances.get(provider_descriptor.name)
            if provider_instance is None:
                raise RuntimeError(
                    f"Provider '{provider_descriptor.name}' for dependency "
                    f"'{requirement.name}' of '{name}' has not been started yet. "
                    f"This indicates a topological sort error — please report as a bug."
                )

            implementation = provider_instance.get_contract_implementation(
                requirement.interface_type
            )
            if implementation is None:
                raise RuntimeError(
                    f"Provider '{provider_descriptor.name}' returned None for "
                    f"get_contract_implementation({requirement.interface_type.__name__}). "
                    f"The provider's Contract declares this Provision but the "
                    f"CapabilityInstance does not implement it."
                )

            print(
                f"[CCA]   Injecting '{requirement.name}' "
                f"({requirement.interface_type.__name__}) "
                f"from '{provider_descriptor.name}' into '{name}'."
            )
            instance.inject_dependency(requirement.interface_type, implementation)

Chapter 11 — Efficiency Gradients

Efficiency Gradients are a concept that is unique to CCA and that addresses a problem that most other architectural patterns simply ignore: the fact that different parts of a system have radically different performance requirements, and that a single uniform level of abstraction cannot serve all of them well.

11.1 — The Problem That Efficiency Gradients Solve

In a traditional layered architecture, every component in the system uses the same level of abstraction. Everything goes through the same ORM, the same HTTP client, the same message queue library. This uniformity is convenient for developers — there is only one way to do things — but it is deeply inefficient for systems where some components have stringent performance requirements and others do not.

Consider our agentic pipeline. The OrchestratorCapability needs to coordinate six specialist agents, synthesize their outputs, and return a result to the caller. Its performance requirements are moderate — a few seconds of latency is acceptable because the pipeline as a whole takes minutes. The CodeGenerationCapability, on the other hand, is the bottleneck of the pipeline: it makes the largest LLM calls, processes the most tokens, and accounts for the majority of the pipeline's total latency. Its performance requirements are stringent — every millisecond of overhead in the inference path matters.

If both Capabilities use the same level of abstraction — the same HTTP client, the same JSON serialization library, the same logging framework — then the CodeGenerationCapability is burdened with overhead that it does not need, while the OrchestratorCapability benefits from abstractions that it could use but does not strictly require. Efficiency Gradients allow each Capability to choose the level of abstraction that is appropriate for its specific requirements.

11.2 — Three Levels of the Gradient

CCA defines three broad levels of the Efficiency Gradient, though in practice the gradient is continuous rather than discrete.

At the low-abstraction, high-efficiency end of the gradient, Capabilities use the most direct, lowest-overhead mechanisms available. In embedded systems, this means bare-metal code, direct hardware register access, and interrupt service routines. In our AI pipeline, this means using the most efficient available inference path: for a local CUDA Realization, this might mean using the llama-cpp-python library with CUDA acceleration and bypassing any intermediate abstraction layers; for a cloud API Realization, it might mean using HTTP/2 connection pooling and binary serialization where available.

At the medium-abstraction, medium-efficiency level, Capabilities use standard operating system services, well-optimized libraries, and established protocols that offer good performance without requiring bare-metal control. Most of the Capabilities in our pipeline operate at this level: they use the standard httpx async HTTP client, standard JSON serialization, and standard Python data structures.

At the high-abstraction, low-efficiency end of the gradient, Capabilities use higher-level frameworks, rich abstractions, and general-purpose tools that prioritize developer productivity and flexibility over raw performance. The DocumentationCapability, which runs on Gemini 3.1 Flash and produces documentation that is read by humans rather than processed by machines, can afford to operate at this level. Its latency requirements are the most relaxed in the pipeline.
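The three levels can be made explicit in code. The following sketch (all names hypothetical, not part of the cca package) records each Capability's position on the gradient so the trade-off is visible at a glance during review:

```python
# A sketch of declaring gradient levels explicitly. The capability names
# and the EfficiencyLevel enum are illustrative assumptions, not part of
# the architecture's required machinery.
from __future__ import annotations
from enum import Enum


class EfficiencyLevel(Enum):
    LOW_ABSTRACTION = "low abstraction / high efficiency"    # hot inference paths
    MEDIUM = "medium abstraction / medium efficiency"        # standard libraries
    HIGH_ABSTRACTION = "high abstraction / low efficiency"   # productivity first


GRADIENT_ASSIGNMENTS: dict[str, EfficiencyLevel] = {
    "code_generation": EfficiencyLevel.LOW_ABSTRACTION,   # pipeline bottleneck
    "orchestrator":    EfficiencyLevel.MEDIUM,            # seconds of latency OK
    "documentation":   EfficiencyLevel.HIGH_ABSTRACTION,  # most relaxed latency
}

for name, level in GRADIENT_ASSIGNMENTS.items():
    print(f"{name}: {level.value}")
```

Nothing enforces these assignments at runtime; the value is documentary, making an implicit performance decision part of the reviewable design.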

11.3 — Efficiency Gradients Within a Single Capability

A key insight of Efficiency Gradients is that the balancing act can occur not just between different Capabilities, but also within the Realization layer of a single Capability. Not all methods within a Realization need to be optimized to the same level.

In the CodeGenerationCapability's Realization, the complete() method — the hot path that is called for every code generation request — should be optimized aggressively: connection pooling, minimal overhead, binary protocols where available, and no unnecessary logging in the critical path. But the is_available() method — called once at startup for health checking — can use a simple synchronous HTTP request with no optimization at all. The update_configuration() method — called rarely when the model configuration changes — can use a high-level configuration parsing library with full validation and detailed logging.

This within-Capability gradient is what makes CCA so practical for real systems. You do not have to choose between "optimize everything" and "abstract everything." You optimize the paths that matter and abstract the paths that do not.
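A minimal sketch of this within-Capability gradient, using a hypothetical stand-in rather than the real CodeGenerationCapability Realization: the hot path avoids logging and re-validation and reuses a resource created once at startup, while the cold paths favor straightforward, fully validated code because they run rarely.

```python
# Illustrative stand-in (hypothetical class and method bodies) showing hot
# and cold paths optimized to different levels inside one Realization.
from __future__ import annotations
import json
import logging

logger = logging.getLogger("codegen")


class GradientAwareRealization:
    def __init__(self) -> None:
        self._session = object()   # stand-in for a pooled HTTP/2 session
        self._config: dict[str, object] = {"temperature": 0.2}

    def complete(self, prompt: str) -> str:
        # HOT PATH: called per request. No logging, no re-validation;
        # reuse the pooled session created once at startup.
        return f"completion-for:{prompt}"

    def is_available(self) -> bool:
        # COLD PATH: called once at startup; simplicity over speed.
        logger.debug("health check")
        return self._session is not None

    def update_configuration(self, raw: str) -> None:
        # COLD PATH: called rarely; full parsing, validation, and logging.
        parsed = json.loads(raw)
        temp = parsed.get("temperature")
        if temp is not None and not 0.0 <= temp <= 2.0:
            raise ValueError("temperature out of range")
        logger.info("configuration updated: %s", parsed)
        self._config.update(parsed)
```

The point is not the specific optimizations but the asymmetry: only `complete()` justifies the engineering cost of a low-abstraction implementation.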

11.4 — Local LLM Realizations: Applying Efficiency Gradients to Inference Backends

The most dramatic application of Efficiency Gradients in our pipeline is the choice of inference backend. Cloud API Realizations are at the high-abstraction end of the gradient: they use standard HTTP clients, JSON serialization, and managed infrastructure. Local inference Realizations are at the low-abstraction end: they use native libraries that communicate directly with GPU hardware, bypassing the network stack entirely.

# capabilities/llm_port.py is already defined above.

# capabilities/realizations/cloud/gpt53.py
# =============================================================================
# GPT53Realization: Cloud API Realization for OpenAI GPT-5.3.
#
# This Realization operates at the medium-to-high abstraction level of the
# Efficiency Gradient. It uses the standard OpenAI Python client, which
# handles connection pooling, retries, and authentication automatically.
# The trade-off is that every request goes through the network, introducing
# latency that local Realizations avoid.
# =============================================================================

from __future__ import annotations
import os
from capabilities.llm_port import LLMPort, LLMRequest, LLMResponse


class GPT53Realization:
    """
    LLMPort implementation for the OpenAI GPT-5.3 API.

    This Realization is appropriate when:
    - Network latency is acceptable (typically > 500ms per request)
    - You need the highest-quality outputs from a frontier model
    - You do not have local GPU hardware available
    - Privacy and data residency requirements permit cloud processing

    The api_key is read from an environment variable at construction time,
    not at request time. This ensures that a missing API key causes a
    clear error at startup rather than a cryptic failure mid-pipeline.
    """

    MODEL_ID = "gpt-5.3"

    def __init__(self, api_key: str | None = None) -> None:
        self._api_key = api_key or os.environ.get("OPENAI_API_KEY")
        if not self._api_key:
            raise ValueError(
                "OpenAI API key not provided. Set the OPENAI_API_KEY "
                "environment variable or pass api_key to the constructor."
            )
        # Import here rather than at module level to avoid import errors
        # when the openai package is not installed (e.g., in environments
        # that use only local Realizations).
        import openai
        self._client = openai.OpenAI(api_key=self._api_key)

    def complete(self, request: LLMRequest) -> LLMResponse:
        """Sends a completion request to the GPT-5.3 API."""
        response = self._client.chat.completions.create(
            model=self.MODEL_ID,
            messages=[
                {"role": "system", "content": request.system_prompt},
                {"role": "user",   "content": request.user_prompt},
            ],
            temperature=request.temperature,
            max_tokens=request.max_tokens,
        )
        choice = response.choices[0]
        usage  = response.usage
        return LLMResponse(
            content=choice.message.content or "",
            model_identifier=self.MODEL_ID,
            prompt_tokens=usage.prompt_tokens,
            completion_tokens=usage.completion_tokens,
            total_tokens=usage.total_tokens,
        )

    def is_available(self) -> bool:
        """Checks availability by making a minimal test request."""
        try:
            self._client.models.retrieve(self.MODEL_ID)
            return True
        except Exception:
            return False


# capabilities/realizations/local/cuda_realization.py
# =============================================================================
# CUDARealization: Local inference on NVIDIA GPUs via llama-cpp-python.
#
# This Realization operates at the LOW-ABSTRACTION, HIGH-EFFICIENCY end of
# the Efficiency Gradient. It uses llama-cpp-python with CUDA acceleration,
# which communicates directly with the GPU via CUDA kernels, bypassing the
# network stack entirely. This eliminates the per-request network round
# trip of a cloud API, which often adds hundreds of milliseconds before
# the first token arrives, at the cost of requiring local NVIDIA GPU
# hardware and a GGUF model file.
#
# This Realization is appropriate when:
# - You have NVIDIA GPU hardware (CUDA compute capability >= 7.0 recommended)
# - Network latency is unacceptable (e.g., air-gapped environments)
# - Data privacy requirements prohibit sending data to cloud APIs
# - Cost optimization requires avoiding per-token cloud API charges
# =============================================================================

from __future__ import annotations
import os
from capabilities.llm_port import LLMPort, LLMRequest, LLMResponse


class CUDARealization:
    """
    LLMPort implementation using llama-cpp-python with CUDA acceleration.

    The model_path should point to a GGUF-format model file. GGUF is the
    standard format for quantized models compatible with llama.cpp.
    Models can be downloaded from HuggingFace Hub in this format.

    The n_gpu_layers parameter controls how many transformer layers are
    offloaded to the GPU. Setting it to -1 offloads all layers, which
    maximizes GPU utilization and minimizes inference latency. Reduce this
    value if you encounter out-of-memory errors.

    The n_ctx parameter sets the context window size. Larger values allow
    processing longer prompts but require more GPU memory.
    """

    def __init__(
        self,
        model_path: str,
        n_gpu_layers: int = -1,    # -1 = offload all layers to GPU
        n_ctx: int = 8192,
        model_identifier: str = "llama-cuda-local",
    ) -> None:
        if not os.path.exists(model_path):
            raise FileNotFoundError(
                f"GGUF model file not found at '{model_path}'. "
                f"Download a GGUF model from HuggingFace Hub and update "
                f"the model_path in your assembly configuration."
            )
        # Lazy import: only import llama_cpp when this Realization is used.
        # This prevents import errors in environments without llama-cpp-python.
        from llama_cpp import Llama
        self._model = Llama(
            model_path=model_path,
            n_gpu_layers=n_gpu_layers,
            n_ctx=n_ctx,
            verbose=False,   # Suppress llama.cpp's verbose startup output
        )
        self._model_identifier = model_identifier

    def complete(self, request: LLMRequest) -> LLMResponse:
        """
        Runs inference locally on the CUDA-accelerated GPU.
        The entire inference happens in-process — no network calls,
        no serialization overhead, no authentication round-trips.
        """
        # llama-cpp-python accepts OpenAI-style message dictionaries and
        # applies the model's own chat template internally.
        messages = [
            {"role": "system", "content": request.system_prompt},
            {"role": "user",   "content": request.user_prompt},
        ]
        response = self._model.create_chat_completion(
            messages=messages,
            temperature=request.temperature,
            max_tokens=request.max_tokens,
        )
        choice = response["choices"][0]
        usage  = response.get("usage", {})
        return LLMResponse(
            content=choice["message"]["content"] or "",
            model_identifier=self._model_identifier,
            prompt_tokens=usage.get("prompt_tokens", 0),
            completion_tokens=usage.get("completion_tokens", 0),
            total_tokens=usage.get("total_tokens", 0),
        )

    def is_available(self) -> bool:
        """
        Returns True if the model is loaded and CUDA is available.
        The model is loaded at construction time, so if __init__ succeeded,
        the model is available.
        """
        try:
            import llama_cpp
            return self._model is not None
        except ImportError:
            return False


# capabilities/realizations/local/mlx_realization.py
# =============================================================================
# MLXRealization: Local inference on Apple Silicon via Apple MLX.
#
# Apple MLX is a machine learning framework designed specifically for
# Apple Silicon (M1, M2, M3, M4 series chips). It uses the unified memory
# architecture of Apple Silicon to run inference on the GPU without any
# data transfer overhead — the CPU and GPU share the same memory pool.
# This makes MLX exceptionally efficient on Apple hardware.
#
# This Realization is appropriate when:
# - You are running on Apple Silicon (M-series Mac or iPad Pro)
# - You want the best performance on Apple hardware
# - You need local inference without NVIDIA or AMD hardware
# =============================================================================

from __future__ import annotations
from capabilities.llm_port import LLMPort, LLMRequest, LLMResponse


class MLXRealization:
    """
    LLMPort implementation using Apple MLX for Apple Silicon inference.

    Requires the mlx-lm package: pip install mlx-lm
    Models are loaded from HuggingFace Hub in MLX format, or converted
    from standard formats using the mlx_lm.convert utility.
    """

    def __init__(
        self,
        model_name: str,           # e.g., "mlx-community/Llama-3-8B-Instruct-4bit"
        max_tokens: int = 4096,
        model_identifier: str = "mlx-local",
    ) -> None:
        # Lazy import: only import mlx_lm when this Realization is used.
        from mlx_lm import load
        self._model, self._tokenizer = load(model_name)
        self._max_tokens = max_tokens
        self._model_identifier = model_identifier

    def complete(self, request: LLMRequest) -> LLMResponse:
        """
        Runs inference on Apple Silicon via MLX.
        MLX uses the unified memory architecture — no GPU memory transfers.
        """
        from mlx_lm import generate

        # Combine system and user prompts with a simple generic chat
        # template. Production code should prefer the tokenizer's
        # apply_chat_template() where available, so the prompt matches
        # the format the specific model was trained on.
        full_prompt = (
            f"<|system|>\n{request.system_prompt}\n"
            f"<|user|>\n{request.user_prompt}\n"
            f"<|assistant|>\n"
        )
        response_text = generate(
            self._model,
            self._tokenizer,
            prompt=full_prompt,
            max_tokens=request.max_tokens or self._max_tokens,
            temp=request.temperature,
            verbose=False,
        )
        # MLX generate() returns only the generated text, not token counts.
        # Estimate token counts from whitespace-split word counts as a
        # rough approximation.
        estimated_completion_tokens = len(response_text.split())
        estimated_prompt_tokens = len(full_prompt.split())
        return LLMResponse(
            content=response_text,
            model_identifier=self._model_identifier,
            prompt_tokens=estimated_prompt_tokens,
            completion_tokens=estimated_completion_tokens,
            total_tokens=estimated_prompt_tokens + estimated_completion_tokens,
        )

    def is_available(self) -> bool:
        """Returns True if running on Apple Silicon with MLX installed."""
        try:
            import mlx.core as mx
            # Check that we have a GPU device available
            return mx.default_device() == mx.gpu
        except ImportError:
            return False


# capabilities/realizations/local/factory.py
# =============================================================================
# LLMRealizationFactory: Hardware auto-detection and Realization selection.
#
# This factory encapsulates the logic for selecting the most appropriate
# LLM Realization based on the available hardware. It is used by the
# assembly layer when no explicit Realization has been configured.
#
# The factory tries backends in order of preference:
# 1. NVIDIA CUDA (highest performance on NVIDIA hardware)
# 2. Apple MLX (best performance on Apple Silicon)
# 3. AMD ROCm (for AMD GPU hardware)
# 4. Cloud API fallback (when no local hardware is available)
#
# Additional backends (e.g. Intel OpenVINO, Vulkan) can be added here
# without touching any other component.
#
# This factory is the only place in the system where hardware detection
# logic lives. All other components are hardware-agnostic.
# =============================================================================

from __future__ import annotations
from capabilities.llm_port import LLMPort


class LLMRealizationFactory:
    """
    Selects and instantiates the most appropriate LLMPort implementation
    based on available hardware and configuration.

    Usage in assembly.py:
        factory = LLMRealizationFactory(
            model_path="/models/llama-3-70b.Q4_K_M.gguf",
            hf_model_name="mlx-community/Llama-3-8B-Instruct-4bit",
            openai_api_key=os.environ.get("OPENAI_API_KEY"),
        )
        llm_port = factory.create()
    """

    def __init__(
        self,
        model_path: str | None = None,         # Path to GGUF file for CUDA/ROCm/Vulkan
        hf_model_name: str | None = None,      # HuggingFace model name for MLX
        openai_api_key: str | None = None,     # Fallback to OpenAI cloud API
        prefer_local: bool = True,             # If True, prefer local over cloud
    ) -> None:
        self._model_path = model_path
        self._hf_model_name = hf_model_name
        self._openai_api_key = openai_api_key
        self._prefer_local = prefer_local

    def create(self) -> LLMPort:
        """
        Creates and returns the most appropriate LLMPort for this environment.
        Tries local backends first (if prefer_local is True), then falls back
        to cloud APIs.
        """
        if self._prefer_local:
            local_port = self._try_local_backends()
            if local_port is not None:
                return local_port

        # Fall back to cloud API
        if self._openai_api_key:
            from capabilities.realizations.cloud.gpt53 import GPT53Realization
            print("[LLMFactory] Using cloud backend: OpenAI GPT-5.3")
            return GPT53Realization(api_key=self._openai_api_key)

        raise RuntimeError(
            "No LLM backend is available. Install a local inference library "
            "(llama-cpp-python, mlx-lm) or provide a cloud API key."
        )

    def _try_local_backends(self) -> LLMPort | None:
        """Tries each local backend in order of preference."""
        # Try NVIDIA CUDA first
        if self._try_cuda():
            from capabilities.realizations.local.cuda_realization import CUDARealization
            print("[LLMFactory] Using local backend: NVIDIA CUDA (llama-cpp-python)")
            return CUDARealization(model_path=self._model_path)

        # Try Apple MLX
        if self._try_mlx():
            from capabilities.realizations.local.mlx_realization import MLXRealization
            print("[LLMFactory] Using local backend: Apple MLX")
            return MLXRealization(model_name=self._hf_model_name or "mlx-community/Llama-3-8B-Instruct-4bit")

        # Try AMD ROCm (uses llama-cpp-python with ROCm build)
        if self._try_rocm():
            from capabilities.realizations.local.cuda_realization import CUDARealization
            print("[LLMFactory] Using local backend: AMD ROCm (llama-cpp-python ROCm build)")
            return CUDARealization(
                model_path=self._model_path,
                model_identifier="llama-rocm-local",
            )

        return None

    def _try_cuda(self) -> bool:
        """Returns True if NVIDIA CUDA is available."""
        try:
            import torch
            if torch.cuda.is_available():
                return True
        except ImportError:
            pass
        try:
            # Fallback: check via llama_cpp directly. This also runs when
            # torch is installed but reports no CUDA device.
            from llama_cpp import llama_supports_gpu_offload
            return llama_supports_gpu_offload()
        except ImportError:
            return False

    def _try_mlx(self) -> bool:
        """Returns True if Apple MLX is available (Apple Silicon)."""
        try:
            import mlx.core as mx
            return mx.default_device() == mx.gpu
        except Exception:  # ImportError, or MLX failing on non-Apple hardware
            return False

    def _try_rocm(self) -> bool:
        """Returns True if AMD ROCm is available."""
        try:
            import torch
            # torch.version.hip is set only in ROCm builds of PyTorch;
            # also confirm that a device is actually visible.
            return torch.version.hip is not None and torch.cuda.is_available()
        except (ImportError, AttributeError):
            return False

Chapter 12 — System Assembly

The assembly layer is the only place in the entire system where concrete types are wired together. Every other module — the Essences, the Realizations, the Adaptations, the Registry, the Lifecycle Manager — works exclusively with abstract interfaces. The assembly layer is where you say: "For this deployment, the CodeGenerationCapability will use the GPT-5.3 Realization, the OrchestratorCapability will use the local CUDA Realization, and the system will run with these six Capabilities in this configuration."

This centralization of concrete wiring is not just a convention — it is an architectural principle. When all concrete dependencies are declared in one place, changing the deployment configuration requires changing only that one place. Swapping from cloud to local inference, adding a new Capability, or changing which Realization a Capability uses are all single-file changes.

# assembly.py
# =============================================================================
# System Assembly: The composition root for the Agentic Pipeline.
#
# This is the ONLY file in the system that:
# - Imports concrete Realization classes
# - Instantiates CapabilityDescriptors with factory lambdas
# - Registers Capabilities with the Registry
# - Creates and starts the CapabilityLifecycleManager
#
# All other files work exclusively with abstract interfaces (Protocols)
# and shared data models. This file is the seam between the abstract
# architecture and the concrete deployment environment.
# =============================================================================

from __future__ import annotations
import os

from models import FeatureRequest
from contracts import (
    RequirementsAnalysisContract, CodeGenerationContract,
    CodeReviewContract, TestGenerationContract,
    DocumentationContract, OrchestratorContract,
)
from cca.registry import CapabilityRegistry, CapabilityDescriptor
from cca.manager import CapabilityLifecycleManager
from cca.evolution import EvolutionEnvelope, MigrationPath

# Import concrete Realization classes — allowed ONLY in assembly.py
from capabilities.realizations.cloud.gpt53 import GPT53Realization
from capabilities.realizations.cloud.claude46 import Claude46Realization
from capabilities.realizations.cloud.gemini31 import Gemini31Realization
from capabilities.realizations.local.factory import LLMRealizationFactory

# Import CapabilityInstance implementations
from capabilities.code_generation.capability import CodeGenerationCapabilityInstance
from capabilities.requirements_analysis.capability import RequirementsAnalysisCapabilityInstance
from capabilities.code_review.capability import CodeReviewCapabilityInstance
from capabilities.test_generation.capability import TestGenerationCapabilityInstance
from capabilities.documentation.capability import DocumentationCapabilityInstance
from capabilities.orchestrator.capability import OrchestratorCapabilityInstance

# Import contract builders
from contracts import (
    CapabilityContract, ProvisionDefinition, RequirementDefinition, CapabilityProtocol,
    CommunicationMechanism,
)


def build_code_generation_contract() -> CapabilityContract:
    """
    Builds the formal Contract for the CodeGenerationCapability.
    This function encodes the architectural decisions about what this
    Capability provides, what it needs, and how it communicates.
    """
    return CapabilityContract(
        capability_name="CodeGeneration",
        provisions=(
            ProvisionDefinition(
                name="CodeGenerationService",
                interface_type=CodeGenerationContract,
                description=(
                    "Generates implementation code from structured engineering tasks. "
                    "Supports completion listeners for observability."
                ),
            ),
        ),
        requirements=(),   # No dependencies on other Capabilities
        protocols=(
            CapabilityProtocol(
                communication_mechanism=CommunicationMechanism.DIRECT_CALL,
                data_format="python_objects",
                max_latency_ms=30_000,   # 30s — LLM inference can be slow
                reliability="at-least-once",
                authentication=None,
                description="In-process direct call. Latency dominated by LLM inference time.",
            ),
        ),
    )


def build_orchestrator_contract() -> CapabilityContract:
    """Builds the formal Contract for the OrchestratorCapability."""
    return CapabilityContract(
        capability_name="Orchestrator",
        provisions=(
            ProvisionDefinition(
                name="PipelineExecutionService",
                interface_type=OrchestratorContract,
                description="Executes the full agentic pipeline for a feature request.",
            ),
        ),
        requirements=(
            RequirementDefinition(
                name="RequirementsAnalysis",
                interface_type=RequirementsAnalysisContract,
                optional=False,
                description="Required to decompose feature requests into engineering tasks.",
            ),
            RequirementDefinition(
                name="CodeGeneration",
                interface_type=CodeGenerationContract,
                optional=False,
                description="Required to generate implementation code.",
            ),
            RequirementDefinition(
                name="CodeReview",
                interface_type=CodeReviewContract,
                optional=False,
                description="Required to review generated code for quality.",
            ),
            RequirementDefinition(
                name="TestGeneration",
                interface_type=TestGenerationContract,
                optional=False,
                description="Required to generate test suites.",
            ),
            RequirementDefinition(
                name="Documentation",
                interface_type=DocumentationContract,
                optional=True,   # Pipeline can succeed without documentation
                description="Optional: generates documentation artifacts.",
            ),
        ),
        protocols=(
            CapabilityProtocol(
                communication_mechanism=CommunicationMechanism.DIRECT_CALL,
                data_format="python_objects",
                max_latency_ms=300_000,  # 5 minutes for full pipeline
                reliability="exactly-once",
                authentication=None,
                description="In-process direct call. Latency is the sum of all agent latencies.",
            ),
        ),
    )


def build_and_start_pipeline(use_local_llm: bool = False) -> CapabilityLifecycleManager:
    """
    Assembles and starts the complete Agentic Software Engineering Pipeline.

    This function is the single entry point for system startup. It:
    1. Creates the appropriate LLM Realizations based on configuration.
    2. Builds CapabilityContracts for all six Capabilities.
    3. Builds EvolutionEnvelopes recording the current versioning status.
    4. Registers all Capabilities with the Registry.
    5. Creates and starts the CapabilityLifecycleManager.
    6. Returns the started Manager for use by the caller.

    The use_local_llm parameter switches between cloud API Realizations
    (default) and local hardware Realizations (CUDA, MLX, ROCm, etc.).
    This is the only parameter needed to switch the entire system between
    cloud and local inference — no other code changes are required.
    """
    registry = CapabilityRegistry()

    # -------------------------------------------------------------------------
    # Step 1: Create LLM Realizations
    # -------------------------------------------------------------------------
    if use_local_llm:
        # The factory auto-detects available hardware and selects the best backend.
        # The api_key is provided as a fallback in case no local hardware is found.
        factory = LLMRealizationFactory(
            model_path=os.environ.get("LOCAL_MODEL_PATH", "/models/llama-3-70b.Q4_K_M.gguf"),
            hf_model_name=os.environ.get("MLX_MODEL_NAME", "mlx-community/Llama-3-8B-Instruct-4bit"),
            openai_api_key=os.environ.get("OPENAI_API_KEY"),
            prefer_local=True,
        )
        # All agents share the same local LLM in this configuration.
        # In a production system, you might use different quantization levels
        # for different agents based on their quality requirements.
        shared_local_llm = factory.create()
        requirements_llm  = shared_local_llm
        code_gen_llm      = shared_local_llm
        code_review_llm   = shared_local_llm
        test_gen_llm      = shared_local_llm
        documentation_llm = shared_local_llm
        orchestrator_llm  = shared_local_llm
    else:
        # Cloud Realizations: each agent uses the model best suited to its task.
        # API keys are read from environment variables — never hardcoded.
        requirements_llm  = Claude46Realization(
            api_key=os.environ["ANTHROPIC_API_KEY"], model_variant="opus"
        )
        code_gen_llm      = GPT53Realization(api_key=os.environ["OPENAI_API_KEY"])
        code_review_llm   = Claude46Realization(
            api_key=os.environ["ANTHROPIC_API_KEY"], model_variant="sonnet"
        )
        test_gen_llm      = Gemini31Realization(
            api_key=os.environ["GOOGLE_API_KEY"], model_variant="pro"
        )
        documentation_llm = Gemini31Realization(
            api_key=os.environ["GOOGLE_API_KEY"], model_variant="flash"
        )
        orchestrator_llm  = GPT53Realization(api_key=os.environ["OPENAI_API_KEY"])

    # -------------------------------------------------------------------------
    # Step 2: Build Evolution Envelopes
    # Each Capability's versioning status is recorded here. In a production
    # system, these would be loaded from a configuration file or a service
    # registry, not hardcoded. They are hardcoded here for clarity.
    # -------------------------------------------------------------------------
    code_gen_envelope = EvolutionEnvelope(
        current_version="2.0.0",
        previous_version="1.3.2",
        migration_paths=[
            MigrationPath(
                from_version="1.x",
                to_version="2.0.0",
                breaking_changes=(
                    "generate_code() now returns GeneratedCode with a "
                    "'generation_model' field that was not present in v1.x.",
                    "The 'dependencies' field is now a tuple, not a list.",
                ),
                documentation_url="https://docs.internal/code-gen/migration-v1-v2",
                automated_migration_tool=None,
                estimated_migration_effort="low",
            )
        ],
    )

    orchestrator_envelope = EvolutionEnvelope(
        current_version="1.0.0",
        previous_version=None,
    )

    # -------------------------------------------------------------------------
    # Step 3: Register all Capabilities with the Registry.
    # The factory lambda is evaluated lazily — only when the LifecycleManager
    # decides to create the instance. This means the LLM Realization objects
    # are captured by the lambda but not used until start_all() is called.
    # -------------------------------------------------------------------------
    registry.register(CapabilityDescriptor(
        name="RequirementsAnalysis",
        contract=CapabilityContract(
            capability_name="RequirementsAnalysis",
            provisions=(ProvisionDefinition(
                name="RequirementsAnalysisService",
                interface_type=RequirementsAnalysisContract,
                description="Decomposes feature requests into engineering tasks.",
            ),),
            requirements=(),
            protocols=(CapabilityProtocol(
                communication_mechanism=CommunicationMechanism.DIRECT_CALL,
                data_format="python_objects",
                max_latency_ms=20_000,
                reliability="at-least-once",
                authentication=None,
                description="In-process direct call.",
            ),),
        ),
        evolution_envelope=EvolutionEnvelope(current_version="1.0.0", previous_version=None),
        factory=lambda: RequirementsAnalysisCapabilityInstance(llm_port=requirements_llm),
    ))

    registry.register(CapabilityDescriptor(
        name="CodeGeneration",
        contract=build_code_generation_contract(),
        evolution_envelope=code_gen_envelope,
        factory=lambda: CodeGenerationCapabilityInstance(llm_port=code_gen_llm),
    ))

    registry.register(CapabilityDescriptor(
        name="CodeReview",
        contract=CapabilityContract(
            capability_name="CodeReview",
            provisions=(ProvisionDefinition(
                name="CodeReviewService",
                interface_type=CodeReviewContract,
                description="Reviews generated code and produces structured feedback.",
            ),),
            requirements=(),
            protocols=(CapabilityProtocol(
                communication_mechanism=CommunicationMechanism.DIRECT_CALL,
                data_format="python_objects",
                max_latency_ms=15_000,
                reliability="at-least-once",
                authentication=None,
                description="In-process direct call.",
            ),),
        ),
        evolution_envelope=EvolutionEnvelope(current_version="1.0.0", previous_version=None),
        factory=lambda: CodeReviewCapabilityInstance(llm_port=code_review_llm),
    ))

    registry.register(CapabilityDescriptor(
        name="TestGeneration",
        contract=CapabilityContract(
            capability_name="TestGeneration",
            provisions=(ProvisionDefinition(
                name="TestGenerationService",
                interface_type=TestGenerationContract,
                description="Generates comprehensive test suites for generated code.",
            ),),
            requirements=(),
            protocols=(CapabilityProtocol(
                communication_mechanism=CommunicationMechanism.DIRECT_CALL,
                data_format="python_objects",
                max_latency_ms=20_000,
                reliability="at-least-once",
                authentication=None,
                description="In-process direct call.",
            ),),
        ),
        evolution_envelope=EvolutionEnvelope(current_version="1.0.0", previous_version=None),
        factory=lambda: TestGenerationCapabilityInstance(llm_port=test_gen_llm),
    ))

    registry.register(CapabilityDescriptor(
        name="Documentation",
        contract=CapabilityContract(
            capability_name="Documentation",
            provisions=(ProvisionDefinition(
                name="DocumentationService",
                interface_type=DocumentationContract,
                description="Generates technical documentation for generated code.",
            ),),
            requirements=(),
            protocols=(CapabilityProtocol(
                communication_mechanism=CommunicationMechanism.DIRECT_CALL,
                data_format="python_objects",
                max_latency_ms=10_000,
                reliability="best-effort",
                authentication=None,
                description="In-process direct call. Optional — pipeline succeeds without it.",
            ),),
        ),
        evolution_envelope=EvolutionEnvelope(current_version="1.0.0", previous_version=None),
        factory=lambda: DocumentationCapabilityInstance(llm_port=documentation_llm),
    ))

    registry.register(CapabilityDescriptor(
        name="Orchestrator",
        contract=build_orchestrator_contract(),
        evolution_envelope=orchestrator_envelope,
        factory=lambda: OrchestratorCapabilityInstance(llm_port=orchestrator_llm),
    ))

    # -------------------------------------------------------------------------
    # Step 4: Start the system.
    # The LifecycleManager takes over from here. It reads the Registry,
    # computes the topological order, and brings each Capability online.
    # -------------------------------------------------------------------------
    manager = CapabilityLifecycleManager(registry=registry)
    manager.start_all()
    return manager


def main() -> None:
    """
    Entry point for the Agentic Software Engineering Pipeline.
    Selects cloud or local inference based on the USE_LOCAL_LLM
    environment variable.
    """
    use_local = os.environ.get("USE_LOCAL_LLM", "false").lower() == "true"

    print(f"[Main] Starting pipeline in {'LOCAL' if use_local else 'CLOUD'} mode.")
    manager = build_and_start_pipeline(use_local_llm=use_local)

    try:
        # Retrieve the OrchestratorCapability's contract implementation.
        # This is the only Capability the caller interacts with directly.
        orchestrator_instance = manager.get_instance("Orchestrator")
        orchestrator: OrchestratorContract = (
            orchestrator_instance.get_contract_implementation(OrchestratorContract)
        )

        # Submit a feature request to the pipeline.
        request = FeatureRequest(
            request_id="FR-2026-001",
            title="Add rate limiting to the public API",
            description=(
                "Implement token-bucket rate limiting for all public API endpoints. "
                "Each API key should have a configurable request limit per minute. "
                "Exceeded limits should return HTTP 429 with a Retry-After header."
            ),
            requester="engineering-team@company.com",
            priority="high",
        )

        print(f"\n[Main] Executing pipeline for: '{request.title}'")
        result = orchestrator.execute_pipeline(request)

        print(f"\n[Main] Pipeline completed with status: {result.pipeline_status}")
        print(f"[Main] Summary: {result.summary}")
        print(f"[Main] Engineering tasks: {len(result.engineering_tasks)}")
        print(f"[Main] Generated code artifacts: {len(result.generated_code)}")
        print(f"[Main] Review results: {len(result.review_results)}")
        print(f"[Main] Test suites: {len(result.test_suites)}")
        print(f"[Main] Documentation artifacts: {len(result.documentation)}")

    finally:
        # Always shut down gracefully, even if an exception occurred.
        print("\n[Main] Initiating graceful shutdown...")
        manager.stop_all()
        print("[Main] Shutdown complete.")


if __name__ == "__main__":
    main()

Conclusion

Capability-Centric Architecture provides a coherent, principled answer to the architectural challenges that agentic AI systems face. By organizing the system around well-defined Capabilities — cohesive units of functionality that deliver tangible value — rather than technical layers or organizational boundaries, CCA creates a structure that is stable where it needs to be stable and flexible where it needs to be flexible.

The Capability Nucleus with its Essence, Realization, and Adaptation layers ensures that business logic is protected from infrastructure churn. The Essence contains what the agent does; the Realization contains how it does it in a specific environment; the Adaptation contains how it exposes itself to the world. These three layers can evolve independently, which means switching from cloud to local inference, adding a new communication protocol, or refactoring the core reasoning logic are all local changes that do not ripple through the system.
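That separation can be recapped in a few lines. The names below are illustrative stand-ins, not the full interfaces from the earlier chapters; the point is only that the Essence depends on a Protocol, never on a concrete backend:

```python
from typing import Protocol


class LLMPort(Protocol):
    """The seam between Essence and Realization."""
    def complete(self, prompt: str) -> str: ...


class SummarizerEssence:
    """WHAT the agent does: pure reasoning logic, no infrastructure."""
    def __init__(self, llm: LLMPort) -> None:
        self._llm = llm

    def summarize(self, text: str) -> str:
        return self._llm.complete(f"Summarize:\n{text}")


class EchoRealization:
    """HOW it runs in one environment (a trivial stand-in backend)."""
    def complete(self, prompt: str) -> str:
        return prompt.upper()


# The Adaptation decides how the Capability is exposed; here, a plain call.
essence = SummarizerEssence(llm=EchoRealization())
print(essence.summarize("hello"))  # the Essence never imports a concrete backend
```

Swapping `EchoRealization` for a CUDA, MLX, or cloud Realization leaves `SummarizerEssence` untouched, which is the whole point of the layering.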

Capability Contracts with their Provisions, Requirements, and Protocols make all dependencies explicit and formal. Every interface is typed and documented. Every dependency is declared and verified at startup. Implicit interfaces — the invisible coupling that makes unstructured systems so fragile — become impossible by construction.

The Capability Registry and Capability Lifecycle Manager together automate the most error-prone aspects of system management: dependency resolution, startup ordering, dependency injection, and graceful shutdown. Adding a new Capability to the system requires only registering it with the Registry — the infrastructure handles the rest.

Evolution Envelopes make change a first-class concern rather than an afterthought. Every Capability carries a formal record of its versioning status, its deprecation notices, and its migration paths. The system can surface upcoming breaking changes at startup, giving teams visibility and time to adapt before problems occur in production.
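A minimal sketch of that startup surface, using simplified stand-in types rather than the real `EvolutionEnvelope` and `DeprecationNotice` classes from the cca package:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DeprecationNotice:
    """Simplified stand-in for the real DeprecationNotice."""
    feature: str
    removal_date: date


@dataclass(frozen=True)
class Envelope:
    """Simplified stand-in for the real EvolutionEnvelope."""
    current_version: str
    deprecations: tuple[DeprecationNotice, ...] = ()


def startup_warnings(envelopes: dict[str, Envelope], today: date) -> list[str]:
    """Collects human-readable warnings for deprecations within 90 days."""
    warnings: list[str] = []
    for name, env in envelopes.items():
        for notice in env.deprecations:
            days_left = (notice.removal_date - today).days
            if days_left <= 90:
                warnings.append(
                    f"[{name} v{env.current_version}] '{notice.feature}' "
                    f"is removed in {days_left} days"
                )
    return warnings


envs = {
    "CodeGeneration": Envelope(
        "2.0.0",
        (DeprecationNotice("legacy 'dependencies' list", date(2026, 5, 1)),),
    )
}
for line in startup_warnings(envs, today=date(2026, 3, 6)):
    print(line)
```

In the real system, the LifecycleManager would run this check over the Registry during `start_all()` and emit the warnings through the logging Capability.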

Efficiency Gradients resolve the tension between performance and abstraction by allowing different Capabilities — and different parts of the same Capability — to operate at different levels of the abstraction stack. Critical inference paths can use low-level, high-efficiency backends; non-critical coordination paths can use high-level, developer-friendly abstractions. The architecture does not impose a single uniform approach on the entire system.

The result is a system that is not just intelligent but genuinely engineered: structured, evolvable, testable, and governable. That is what production-grade agentic AI requires.
