March 2026
A note on model names: Model identifiers used throughout this article (e.g., GPT-5.3, Claude 4.6 Opus, Gemini 3.1 Pro) are illustrative placeholders representing the class of frontier models available at the time of writing. Substitute the actual identifiers available in your environment. All architectural principles are model-agnostic.
Python version: All code requires Python ≥ 3.10 for the X | Y union syntax. Add from __future__ import annotations if targeting Python 3.9.
Preface: Why Architecture Matters More Than Ever in the Age of Agentic AI
There is a temptation, when working with the remarkable AI models available today, to believe that intelligence is sufficient. That if you can get a frontier model to reason correctly about a problem, the engineering scaffolding around it is a secondary concern — something you can figure out later, once the demos are impressive enough. This temptation is understandable, and it is also one of the most reliable ways to build a system that will collapse under its own weight the moment it reaches production.
The reason is straightforward: agentic AI systems are not just intelligent — they are complex. They are composed of multiple cooperating agents, each with different roles, different model backends, different tool dependencies, different latency requirements, and different rates of change. A code-generation agent powered by GPT-5.3 may need to be replaced with Gemini 3.1 Pro next quarter when benchmarks shift. A requirements-analysis agent may need to be upgraded when task complexity grows. A new documentation agent may need to be added to the pipeline without disturbing the agents already running in production. A monitoring agent may need to observe the behavior of all other agents without any of them knowing it exists. And crucially — your team may want to run the entire pipeline locally on an NVIDIA GPU workstation, an Apple Silicon Mac, an AMD ROCm server, or an Intel accelerator, without changing a single line of business logic.
None of these operations are trivial if the system was built without a coherent architectural framework. If agents are wired together through direct, concrete references; if their input and output formats are implicit conventions rather than formal contracts; if their startup order is determined by hope rather than dependency analysis; if their versioning is managed through comments in a configuration file — then every one of those operations becomes a painful, risky, and time-consuming exercise in archaeology.
This article argues that Capability-Centric Architecture (CCA) is the right framework for building agentic AI systems that are not merely impressive in demos but genuinely maintainable, evolvable, and governable in production. CCA was designed to address exactly the kind of complexity that multi-agent systems exhibit: diverse components with heterogeneous requirements, intricate dependency relationships, and the need to evolve rapidly without breaking what already works.
We will work through every major concept in CCA — the Capability as the fundamental unit of structure, the Capability Nucleus with its three internal layers, Capability Contracts with their Provisions, Requirements, and Protocols, the Capability Registry as the authoritative coordination hub, the Capability Lifecycle Manager as the orchestrator of system startup and shutdown, Evolution Envelopes as the formal mechanism for managing change over time, and Efficiency Gradients as the tool for matching implementation depth to operational criticality. For each concept, we explain the theory in depth, connect it explicitly to the challenges of agentic AI, and provide concrete, runnable Python code.
Chapter 1 — The Structural Problem with Unstructured Agentic AI
Before we can appreciate the solution, we need to be precise and honest about the problem. It is tempting to describe the problem in vague terms — "it gets complicated," "it doesn't scale," "it's hard to maintain" — but these descriptions are too abstract to be useful. Let us instead describe the specific, concrete failure modes that emerge when agentic AI systems are built without architectural discipline.
1.1 — The Problem of Concrete Coupling
The most common pattern in early agentic AI systems is direct, concrete coupling between components. The orchestrator agent imports the code-generation agent directly. The code-generation agent calls the review agent by instantiating it directly. The review agent knows the exact class name, constructor signature, and method names of the test-generation agent.
This seems harmless at first. It is fast to write, easy to understand in isolation, and produces working demos quickly. The problem emerges the moment you need to change anything. If you want to swap the code-generation agent from GPT-5.3 to Gemini 3.1 Pro, you must find every place in the codebase where the GPT-5.3-backed agent is referenced and update it. If the new agent has a slightly different output format — perhaps it returns a structured object instead of a raw string — every downstream agent that consumes that output must be updated as well. A change that should be local becomes a system-wide refactoring exercise.
CCA addresses this through Capability Contracts: formal, interface-oriented agreements that define what a Capability provides and what it requires, completely independently of how it is implemented. The orchestrator does not know whether the code-generation Capability uses GPT-5.3 or Gemini 3.1 Pro or a locally-running Llama 3 model. It only knows the Contract: given a structured task specification, the Capability will return a structured code artifact. The implementation can change freely as long as the Contract remains stable.
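To make the decoupling concrete, here is a minimal, illustrative sketch. The names CodeGenerationContract, Gpt53Generator, and LocalLlamaGenerator are placeholders for this sketch, not the pipeline's real classes, which we build in later chapters:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class CodeGenerationContract(Protocol):
    """The only thing a consumer is allowed to know about code generation."""
    def generate_code(self, task: str) -> str: ...

class Gpt53Generator:
    """Hypothetical cloud-backed implementation (illustrative name)."""
    def generate_code(self, task: str) -> str:
        return f"# code for {task!r} via GPT-5.3"

class LocalLlamaGenerator:
    """Hypothetical local implementation: same Contract, different backend."""
    def generate_code(self, task: str) -> str:
        return f"# code for {task!r} via local Llama"

def orchestrate(generator: CodeGenerationContract, task: str) -> str:
    # The orchestrator depends only on the Contract, never on a concrete
    # class, so swapping backends requires no change here.
    return generator.generate_code(task)

print(orchestrate(Gpt53Generator(), "add caching"))
print(orchestrate(LocalLlamaGenerator(), "add caching"))
```

Swapping GPT-5.3 for a local model is now a one-line change at the composition root; orchestrate is untouched.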
1.2 — The Problem of Implicit Interfaces
Even when agents are not directly coupled at the class level, they are often coupled through implicit interface conventions. One agent produces a Python dictionary with certain keys; the next agent expects those exact keys. One agent returns a list of strings; the next agent assumes the list is always non-empty. These conventions exist only in the minds of the developers who wrote the code and, perhaps, in a README file that is already six months out of date.
Implicit interfaces are invisible until they break. When they break, the failure is often mysterious — a KeyError deep in an agent's processing logic, a None dereference in a downstream consumer, a silent data corruption that produces subtly wrong outputs for days before anyone notices. Debugging these failures requires understanding the entire pipeline, not just the component that raised the exception.
CCA addresses this through the formal definition of Provisions and Requirements within Capability Contracts. Every interface is explicit, typed, and documented. Every dependency is declared. The system can verify at startup — before any agent processes a single token — that every declared Requirement has a corresponding Provision. Implicit interfaces become impossible by construction.
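The startup check can be sketched in a few lines. This toy version uses plain dicts of names; the real Registry in later chapters matches typed Protocol interfaces instead, so treat the shapes below as illustrative only:

```python
# Toy wiring check: every declared Requirement must be matched by some
# registered Capability's Provision, verified before any agent runs.
contracts = {
    "Orchestrator":   {"provides": set(),              "requires": {"CodeGeneration", "CodeReview"}},
    "CodeGeneration": {"provides": {"CodeGeneration"}, "requires": set()},
    "CodeReview":     {"provides": {"CodeReview"},     "requires": set()},
}

def verify_wiring(contracts: dict) -> list[str]:
    """Return one error message per unmet Requirement (empty list = fully wired)."""
    all_provisions = set()
    for contract in contracts.values():
        all_provisions |= contract["provides"]
    return [
        f"{name} requires {missing}, but nothing provides it"
        for name, contract in contracts.items()
        for missing in sorted(contract["requires"] - all_provisions)
    ]

print(verify_wiring(contracts))  # [] -- fully wired
```

A KeyError that would have surfaced mid-pipeline becomes a named, actionable error at startup.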
1.3 — The Problem of Unmanaged Startup Order
Multi-agent systems have dependencies. The orchestrator depends on all specialist agents being ready before it can delegate work. The code-generation agent may depend on a tool-registry agent that provides access to external APIs. The monitoring agent may depend on a logging infrastructure agent that must be initialized first.
In unstructured systems, startup order is typically managed through one of two approaches: either everything is started simultaneously and race conditions are handled through retry logic and timeouts, or startup order is hardcoded in a configuration file that must be manually maintained as the dependency graph evolves. Both approaches are fragile. Race conditions produce intermittent failures that are notoriously difficult to reproduce and debug. Manually maintained startup sequences inevitably drift out of sync with the actual dependency graph, especially in rapidly evolving systems.
CCA addresses this through the Capability Lifecycle Manager, which queries the Capability Registry to build a complete dependency graph of all registered Capabilities, performs a topological sort to determine the correct initialization order, and then brings each Capability online in that order, injecting dependencies as they become available. The startup sequence is not configured manually — it is computed automatically from the formal dependency declarations in each Capability's Contract.
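The topological sort itself is standard-library territory. In the real system the dependency mapping is read from the Contracts in the Registry; here it is written out by hand for illustration:

```python
from graphlib import TopologicalSorter  # stdlib since Python 3.9

# Each Capability maps to the set of Capabilities it requires (its predecessors).
dependencies = {
    "LoggingInfra":   set(),
    "ToolRegistry":   {"LoggingInfra"},
    "CodeGeneration": {"ToolRegistry"},
    "Orchestrator":   {"CodeGeneration"},
}

# static_order() yields every node after all of its predecessors.
startup_order = list(TopologicalSorter(dependencies).static_order())
print(startup_order)
# ['LoggingInfra', 'ToolRegistry', 'CodeGeneration', 'Orchestrator']
```

TopologicalSorter also raises CycleError if the declared dependencies are circular, which turns another class of runtime mystery into a startup-time diagnostic.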
1.4 — The Problem of Uncontrolled Evolution
AI models evolve rapidly. Each new model release brings new capabilities, new output formats, new context window sizes, and new pricing structures. An agentic AI system designed around a specific set of models will need to evolve continuously to take advantage of improvements and to adapt to changes in the model landscape.
Without a formal mechanism for managing this evolution, every model upgrade is a potential breaking change. Downstream agents that depend on the output format of an upgraded agent may break silently. Consumers that were written against an older version of an agent's interface may continue to work for a while — until some edge case triggers the incompatibility — and then fail in production in ways that are difficult to diagnose.
CCA addresses this through Evolution Envelopes: formal structures that encapsulate the versioning information, deprecation policies, and migration paths for every Capability. When a Capability's Contract changes in a backward-incompatible way, the Evolution Envelope provides a structured, documented migration path for every consumer. Evolution becomes explicit, predictable, and manageable rather than chaotic and surprising.
1.5 — The Problem of Hardware Lock-in
A less-discussed but increasingly important problem is inference backend lock-in. Teams that build agentic systems exclusively around cloud APIs find themselves unable to run workloads locally for cost, latency, privacy, or air-gap compliance reasons. Conversely, teams that build around a specific local inference library find themselves unable to switch backends without rewriting large portions of their codebase.
CCA addresses this through the LLM Port abstraction in the Realization layer. The Essence of every agent communicates only with an abstract LLMPort interface. The concrete implementation of that interface — whether it calls OpenAI's API, runs llama.cpp on CUDA, uses Apple's MLX framework, or invokes Intel OpenVINO — is a Realization detail that can be swapped without touching any other layer.
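A deliberately tiny sketch of that idea follows. The real Realizations in capabilities/realizations/ wrap actual inference backends; EchoRealization and the two-branch detect_backend here are illustrative stand-ins for the factory's hardware probing:

```python
import platform
from typing import Protocol

class LLMPort(Protocol):
    """The only LLM interface the Essence layer ever sees."""
    def complete(self, prompt: str) -> str: ...

class EchoRealization:
    """Placeholder Realization; a real one would wrap llama.cpp, MLX, etc."""
    def __init__(self, backend: str) -> None:
        self.backend = backend
    def complete(self, prompt: str) -> str:
        return f"[{self.backend}] {prompt}"

def detect_backend() -> str:
    # Grossly simplified detection: the real factory would also probe
    # CUDA, ROCm, OpenVINO, and Vulkan availability.
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mlx"
    return "cpu"

def make_llm_port() -> LLMPort:
    # The composition root picks the Realization; the Essence never knows.
    return EchoRealization(detect_backend())

port = make_llm_port()
print(port.complete("hello"))
```

Only make_llm_port changes when a new backend is added; every agent's Essence keeps calling complete.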
Chapter 2 — The Example System: An Agentic Software Engineering Pipeline
Throughout this article, we build a concrete, running example: an Agentic Software Engineering Pipeline. This is a multi-agent system that accepts a natural-language feature request from a developer and autonomously produces reviewed, tested, and documented code that is ready for a pull request.
The pipeline consists of six agents, each with a distinct role:
+======================+=====================+========================================+
| Capability           | Default Cloud Model | Role                                   |
+======================+=====================+========================================+
| RequirementsAnalysis | Claude 4.6 Opus     | Decomposes requests into eng. tasks    |
+----------------------+---------------------+----------------------------------------+
| CodeGeneration       | GPT-5.3             | Generates implementation code          |
+----------------------+---------------------+----------------------------------------+
| CodeReview           | Claude 4.6 Sonnet   | Reviews code, produces feedback        |
+----------------------+---------------------+----------------------------------------+
| TestGeneration       | Gemini 3.1 Pro      | Generates comprehensive test suites    |
+----------------------+---------------------+----------------------------------------+
| Documentation        | Gemini 3.1 Flash    | Produces docstrings, README, API docs  |
+----------------------+---------------------+----------------------------------------+
| Orchestrator         | GPT-5.3             | Plans, coordinates, synthesizes output |
+----------------------+---------------------+----------------------------------------+
Each agent is modelled as a CCA Capability. The remainder of this article shows exactly how to do that, concept by concept, with complete, runnable code.
Chapter 3 — What Is a Capability?
The Capability is the foundational unit of structure in CCA. Before we can write a single line of code, we need to understand precisely what a Capability is, why it is defined the way it is, and how to identify the right Capabilities for a given system — because getting this wrong at the start will undermine everything that follows.
3.1 — The Definition
A Capability is defined as a cohesive set of functionality that consistently delivers tangible value, either to end-users or to other interacting Capabilities within the system. This definition is deceptively simple, and each word in it carries weight.
The word cohesive means that everything inside a Capability belongs together because it serves the same functional purpose. The code-generation logic, the prompt-engineering strategies, the output parsing, and the retry handling for a code-generation agent all belong together because they all serve the single purpose of turning a structured engineering task into working source code. They are cohesive. By contrast, the code-review logic does not belong in the same Capability, even though it is closely related, because it serves a different functional purpose: evaluating code quality rather than producing code.
The phrase delivers tangible value is a practical test for whether a candidate Capability is real or artificial. A Capability should be able to have its purpose stated in a single, clear sentence that a non-technical stakeholder could understand. "Code Generation produces working source code from a structured engineering task specification" passes this test. "LLM API Wrapper wraps the OpenAI API" does not — it describes a technical mechanism, not a business value. A Capability that cannot pass this sentence test is almost certainly either too narrow (a technical layer masquerading as a Capability) or too broad (multiple Capabilities that have been incorrectly merged).
3.2 — How Capabilities Differ from Bounded Contexts
Readers familiar with Domain-Driven Design will notice that Capabilities bear a strong resemblance to Bounded Contexts. This resemblance is intentional — CCA draws heavily from DDD — but the two concepts are not identical, and the differences matter.
A Bounded Context is primarily a conceptual tool for domain modeling. It establishes clear linguistic boundaries within a business domain: within the "Sales" bounded context, the word "customer" means one thing; within the "Support" bounded context, it may mean something subtly different. Bounded Contexts are concerned with the semantics of the domain model and the consistency of the ubiquitous language within each context.
A Capability in CCA goes further in three important ways. First, a Capability explicitly includes the technical mechanisms necessary to deliver its functionality — the Realization layer, which we will discuss in detail shortly. A Bounded Context says nothing about how its logic is executed; a Capability does. Second, a Capability explicitly specifies the quality attributes it must meet — its performance characteristics, reliability requirements, security constraints, and latency guarantees — through its Contract's Protocols. Third, a Capability carries a formal evolution strategy through its Evolution Envelope, specifying how it will change over time and how consumers should adapt. A Bounded Context has no equivalent mechanism.
In practical terms, this means that a Capability is not just a conceptual grouping of domain logic — it is a fully self-contained, deployable, testable, and governable unit of the system that knows what it does, how it does it, how well it must do it, and how it will change over time.
3.3 — The Critical Rule: Identify by Function, Not by Technology
The most common mistake when identifying Capabilities is to define them along technical lines rather than functional lines. This mistake is so common and so damaging that it deserves explicit, emphatic treatment.
Do not create a "DatabaseAccessCapability." Database access is a technical mechanism, not a business function. It belongs inside the Realization layer of whatever domain-specific Capability needs it. Do not create an "LLMAPICapability." Calling an LLM API is a technical mechanism. It belongs inside the Realization layer of the specific agent that needs it. Do not create a "LoggingCapability" that is shared by all other Capabilities. Logging is a cross-cutting concern that should be handled through the infrastructure layer, not a Capability in its own right.
Similarly, do not create Capabilities along organizational lines. The fact that one team owns the requirements-analysis logic and another team owns the code-generation logic does not automatically mean they should be separate Capabilities. Functional cohesion must be the primary driver. If two pieces of functionality are deeply intertwined and always change together, they belong in the same Capability regardless of team boundaries. If two pieces of functionality are independent and change at different rates, they belong in separate Capabilities regardless of who owns them.
In our pipeline, the six Capabilities we identified — RequirementsAnalysis, CodeGeneration, CodeReview, TestGeneration, Documentation, and Orchestrator — each pass the functional cohesion test. Each has a clear, single-sentence purpose. Each changes for different reasons and at different rates. Each can be independently tested, deployed, and evolved. This is the right decomposition.
3.4 — The Module Structure
With the Capability decomposition established, we can define the module structure for our system. This structure reflects the CCA principle that every Capability is a self-contained unit with its own internal layers:
agentic_pipeline/
├── models.py                        # Shared data classes (single source of truth)
├── contracts.py                     # Capability Contract Protocol interfaces
├── cca/
│   ├── evolution.py                 # EvolutionEnvelope, DeprecationNotice, MigrationPath
│   ├── lifecycle.py                 # CapabilityInstance ABC + LifecycleState enum
│   ├── registry.py                  # CapabilityRegistry, CapabilityDescriptor
│   └── manager.py                   # CapabilityLifecycleManager
├── capabilities/
│   ├── llm_port.py                  # Abstract LLMPort Protocol
│   ├── realizations/
│   │   ├── cloud/
│   │   │   ├── gpt53.py             # GPT-5.3 cloud Realization
│   │   │   ├── claude46.py          # Claude 4.6 cloud Realization
│   │   │   └── gemini31.py          # Gemini 3.1 cloud Realization
│   │   └── local/
│   │       ├── base.py              # LocalLLMRealization abstract base
│   │       ├── cuda_realization.py    # NVIDIA CUDA via llama-cpp-python
│   │       ├── mlx_realization.py     # Apple MLX
│   │       ├── rocm_realization.py    # AMD ROCm
│   │       ├── vulkan_realization.py  # Vulkan cross-vendor GPU
│   │       ├── intel_realization.py   # Intel OpenVINO
│   │       └── factory.py             # LLMRealizationFactory (hardware auto-detection)
│   ├── code_generation/
│   │   ├── essence.py
│   │   ├── adaptation.py
│   │   └── capability.py
│   ├── requirements_analysis/
│   │   └── capability.py
│   ├── code_review/
│   │   └── capability.py
│   ├── test_generation/
│   │   └── capability.py
│   ├── documentation/
│   │   └── capability.py
│   └── orchestrator/
│       ├── essence.py
│       ├── state_tracker.py
│       └── capability.py
└── assembly.py                      # System composition root
Chapter 4 — Shared Data Models
Before we can define Contracts or implement Capabilities, we need a shared vocabulary — the data structures that flow between Capabilities. In CCA, all shared domain models live in a single module. This is not merely a matter of convenience; it is an architectural principle. If each Capability defined its own version of what a "generated code artifact" looks like, Capabilities could not communicate without translation layers. By defining all shared types in one place, we ensure that when the CodeGenerationCapability produces a GeneratedCode object and the CodeReviewCapability consumes it, they are talking about exactly the same thing.
# models.py
# =============================================================================
# Shared domain data models for the Agentic Software Engineering Pipeline.
# All Capabilities import from this module. No Capability redefines these types.
# Requires Python >= 3.10.
# =============================================================================
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureRequest:
    """
    The raw input to the pipeline — a natural-language feature request
    submitted by a developer. This is the entry point for the entire system.

    frozen=True makes instances hashable and prevents accidental mutation
    as the object flows through multiple Capabilities.
    """
    request_id: str
    title: str
    description: str
    requester: str
    priority: str  # "low" | "medium" | "high" | "critical"


@dataclass(frozen=True)
class EngineeringTask:
    """
    A structured engineering task produced by RequirementsAnalysisCapability.

    Represents a single, implementable unit of work derived from a feature
    request. Using tuples instead of lists preserves the frozen=True guarantee
    all the way through the data structure.
    """
    task_id: str
    title: str
    description: str
    acceptance_criteria: tuple[str, ...]
    estimated_complexity: str  # "low" | "medium" | "high"
    technical_constraints: tuple[str, ...]
    target_language: str


@dataclass(frozen=True)
class GeneratedCode:
    """
    The output of the CodeGenerationCapability for a single engineering task.

    The generation_model field records which model or realization produced
    this artifact, providing a full audit trail through the pipeline.
    """
    task_id: str
    language: str
    source_code: str
    explanation: str
    dependencies: tuple[str, ...]
    generation_model: str


@dataclass(frozen=True)
class ReviewFeedback:
    """
    A single structured feedback item produced by CodeReviewCapability.

    Each feedback item is categorized by severity and type so that
    downstream consumers (e.g., the Orchestrator) can make informed
    decisions about whether to accept, revise, or reject the code.
    """
    task_id: str
    severity: str  # "info" | "warning" | "error" | "critical"
    category: str  # "correctness" | "style" | "security" | "performance"
    description: str
    suggested_fix: str
    line_reference: str | None


@dataclass(frozen=True)
class ReviewResult:
    """
    The complete review outcome for a single engineering task.

    The approved field gives the Orchestrator a clear binary signal,
    while feedback_items provides the full detail for logging and
    potential revision loops.
    """
    task_id: str
    approved: bool
    feedback_items: tuple[ReviewFeedback, ...]
    overall_assessment: str


@dataclass(frozen=True)
class TestSuite:
    """
    A generated test suite for a single engineering task.
    """
    task_id: str
    test_framework: str
    test_source_code: str
    test_count: int
    coverage_targets: tuple[str, ...]


@dataclass(frozen=True)
class DocumentationArtifact:
    """
    Generated documentation for a single engineering task.

    Separating docstrings, README content, and API documentation
    allows consumers to use whichever format they need.
    """
    task_id: str
    docstrings: str
    readme_section: str
    api_documentation: str


@dataclass(frozen=True)
class PipelineResult:
    """
    The final synthesized output of the entire pipeline for one feature request.

    This is what the OrchestratorCapability returns to the caller after
    coordinating all specialist Capabilities.
    """
    request_id: str
    engineering_tasks: tuple[EngineeringTask, ...]
    generated_code: tuple[GeneratedCode, ...]
    review_results: tuple[ReviewResult, ...]
    test_suites: tuple[TestSuite, ...]
    documentation: tuple[DocumentationArtifact, ...]
    pipeline_status: str  # "success" | "partial_success" | "failure"
    summary: str
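A quick, standalone illustration of the frozen=True guarantee these models rely on. FeatureRequest is re-declared inline here (and the request values are invented) so the snippet runs on its own:

```python
from dataclasses import dataclass, FrozenInstanceError

# Re-declared inline from models.py above so this snippet is self-contained.
@dataclass(frozen=True)
class FeatureRequest:
    request_id: str
    title: str
    description: str
    requester: str
    priority: str

request = FeatureRequest(
    request_id="FR-001",
    title="Add retry logic",
    description="HTTP calls should retry with exponential backoff.",
    requester="dev@example.com",
    priority="high",
)

try:
    request.priority = "low"   # frozen=True turns this assignment into an error
    mutated = True
except FrozenInstanceError:
    mutated = False

print("mutated:", mutated)  # mutated: False
```

Any Capability that tries to modify a shared model in place fails immediately and loudly, rather than silently corrupting data for downstream consumers.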
Chapter 5 — Capability Contracts
Capability Contracts are the cornerstone of CCA. Understanding them deeply is essential, because everything else in the architecture — the Registry, the Lifecycle Manager, the dependency injection system — is built on top of them. A Contract is a formal, explicit, interface-oriented agreement that precisely defines the public API of a Capability. It is the only thing that other Capabilities are allowed to know about a given Capability. They cannot see its Essence, its Realization, or its Adaptation. They can only see its Contract.
5.1 — Why Contracts Are Necessary
Consider what happens in a system without formal Contracts. The OrchestratorCapability needs to call the CodeGenerationCapability. Without a Contract, it must import the CodeGenerationCapability class directly, call whatever method happens to exist on it, and hope that the method signature, parameter types, and return type match what the Orchestrator expects. If the CodeGenerationCapability is later refactored — perhaps the method is renamed, or its return type changes — the Orchestrator breaks. The only way to discover the breakage is to run the system and observe the failure.
With a formal Contract, the situation is entirely different. The OrchestratorCapability declares that it requires a CodeGenerationContract — a typed interface that specifies exactly what methods are available, what parameters they accept, and what they return. The CodeGenerationCapability declares that it provides a CodeGenerationContract. The Registry verifies at startup that every declared Requirement has a matching Provision. If the CodeGenerationCapability is later refactored in a way that breaks the Contract, the type checker catches it before the code is even run. The Contract is the firewall between Capabilities.
5.2 — The Three Elements of a Contract
Every Capability Contract is composed of exactly three elements: Provisions, Requirements, and Protocols. Each element serves a distinct and non-overlapping purpose.
Provisions define the interfaces that a Capability offers to others. They are the Capability's promises to the world: "I will provide these services, and you can depend on them." A Provision is not a concrete class — it is an interface, a Protocol in Python terms, that specifies method signatures without any implementation. The concrete implementation lives inside the Capability's Adaptation layer, invisible to the outside world. This separation is what makes independent evolution possible: the Capability can completely rewrite its internal implementation as long as it continues to honor its Provisions.
Requirements define the interfaces that a Capability needs from others in order to function. They are the Capability's honest declaration of its dependencies: "I cannot do my job without these services from other Capabilities." By making Requirements explicit and formal, CCA enables the Registry to build a complete, accurate dependency graph of the entire system. This graph is what the Lifecycle Manager uses to determine startup order and to perform dependency injection. Without explicit Requirements, the dependency graph is invisible, and startup order must be guessed.
Protocols define the interaction patterns and quality attributes that govern how Capabilities communicate. This is where Contracts go beyond simple interface definitions. A Protocol specifies not just what methods exist, but how they should be called: synchronously or asynchronously, with what data format, with what latency guarantee, with what reliability expectation, and with what security requirements. A single Capability can declare multiple Protocols simultaneously — for example, a direct in-process call for high-performance consumers, a REST API for external clients, and a message queue interface for asynchronous event-driven integrations. This flexibility allows the same Capability to serve diverse consumers without changing its internal implementation.
5.3 — The Contract as a Stability Boundary
One of the most important architectural insights in CCA is that Contracts should be designed to be stable. The internal implementation of a Capability — its Essence, Realization, and Adaptation — can and should change frequently as requirements evolve, models improve, and technology advances. But the Contract should change rarely, and when it does change in a backward-incompatible way, that change must be managed explicitly through the Evolution Envelope.
This stability principle has a practical implication for how Contracts are designed: they should express what the Capability does in terms of business outcomes, not how it does it in terms of technical mechanisms. A CodeGenerationContract should expose a generate_code(task: EngineeringTask) -> GeneratedCode method, not a call_gpt_api(prompt: str) -> str method. The former is stable because it describes a business function; the latter is fragile because it exposes an implementation detail.
5.4 — Contract Implementation
# contracts.py
# =============================================================================
# Capability Contract Protocol interfaces for the Agentic Pipeline.
#
# These interfaces are the ONLY thing that Capabilities know about each other.
# No Capability imports another Capability's concrete implementation class.
# The only imports allowed between Capabilities are these Protocol interfaces
# and the shared data models from models.py.
# =============================================================================
from __future__ import annotations
from typing import Protocol, runtime_checkable, Callable
from dataclasses import dataclass
from enum import Enum
from models import (
FeatureRequest, EngineeringTask, GeneratedCode,
ReviewResult, TestSuite, DocumentationArtifact, PipelineResult,
)
# ---------------------------------------------------------------------------
# Protocol Specification Types
#
# These types describe the metadata of a Contract — not the interface methods
# themselves, but the structural information about how the Contract is used,
# what communication mechanisms it supports, and what quality guarantees it
# makes. This metadata is consumed by the Registry and the Lifecycle Manager.
# ---------------------------------------------------------------------------
class CommunicationMechanism(Enum):
"""
The supported communication patterns between Capabilities.
A single Capability may declare support for multiple mechanisms,
allowing it to serve different consumers in different contexts.
"""
DIRECT_CALL = "direct_call" # In-process, synchronous, zero-overhead
REST_HTTP = "rest_http" # HTTP/2, suitable for cross-process or cross-host
MESSAGE_QUEUE = "message_queue" # Asynchronous, decoupled, event-driven
GRPC = "grpc" # High-performance RPC, binary protocol
@dataclass(frozen=True)
class CapabilityProtocol:
"""
Specifies the interaction pattern and quality attributes for one
communication mode of a Capability.
The max_latency_ms field is particularly important for agentic systems,
where end-to-end pipeline latency is often a critical user-facing metric.
Declaring latency expectations in the Contract makes them visible to the
system architect and allows the Lifecycle Manager to flag mismatches
between what a consumer expects and what a provider guarantees.
"""
communication_mechanism: CommunicationMechanism
data_format: str # e.g., "python_objects", "json", "protobuf"
max_latency_ms: int | None # None means no strict latency requirement
reliability: str # e.g., "at-least-once", "exactly-once", "best-effort"
authentication: str | None # e.g., None, "api_key", "oauth2", "mtls"
description: str
@dataclass(frozen=True)
class ProvisionDefinition:
"""
Describes a single interface that a Capability provides to others.
The interface_type is a Python Protocol class — a typed, inspectable
description of the methods available to consumers.
"""
name: str
interface_type: type
description: str
@dataclass(frozen=True)
class RequirementDefinition:
"""
Describes a single interface that a Capability needs from others.
The optional flag is important: if optional is True, the Lifecycle
Manager will not fail startup if no provider is found for this
Requirement. This supports graceful degradation in partial deployments.
"""
name: str
interface_type: type
optional: bool
description: str
@dataclass(frozen=True)
class CapabilityContract:
"""
The complete formal Contract for a Capability.
This is the only public face of a Capability that other Capabilities
are allowed to see. It is registered with the CapabilityRegistry at
startup and used by the CapabilityLifecycleManager to build the
dependency graph, perform topological sorting, and inject dependencies.
Design note: frozen=True prevents field reassignment but does not
prevent mutation of the objects stored in those fields. Treat all
fields as logically immutable — do not mutate the tuples after creation.
"""
capability_name: str
provisions: tuple[ProvisionDefinition, ...]
requirements: tuple[RequirementDefinition, ...]
protocols: tuple[CapabilityProtocol, ...]
def provides(self, interface_type: type) -> bool:
"""Returns True if this Contract provides the given interface type."""
return any(p.interface_type is interface_type for p in self.provisions)
def requires_interface(self, interface_type: type) -> bool:
"""Returns True if this Contract requires the given interface type."""
return any(r.interface_type is interface_type for r in self.requirements)
def get_required_interfaces(self) -> list[type]:
"""Returns all interface types this Capability requires."""
return [r.interface_type for r in self.requirements if not r.optional]
def get_optional_interfaces(self) -> list[type]:
"""Returns all optional interface types this Capability may use."""
return [r.interface_type for r in self.requirements if r.optional]
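To make the Contract's query methods concrete, here is a standalone sketch. The contract classes are re-declared in reduced form so the snippet runs on its own, and GreeterContract is a hypothetical Provision interface, not one of the pipeline's real contracts:

```python
from dataclasses import dataclass
from typing import Protocol

class GreeterContract(Protocol):  # hypothetical Provision interface
    def greet(self, name: str) -> str: ...

@dataclass(frozen=True)
class ProvisionDefinition:
    name: str
    interface_type: type
    description: str

@dataclass(frozen=True)
class CapabilityContract:
    capability_name: str
    provisions: tuple
    requirements: tuple

    def provides(self, interface_type: type) -> bool:
        # Identity comparison: the Contract provides exactly this type object.
        return any(p.interface_type is interface_type for p in self.provisions)

    def get_required_interfaces(self) -> list:
        return [r.interface_type for r in self.requirements if not r.optional]

contract = CapabilityContract(
    capability_name="greeter",
    provisions=(ProvisionDefinition("greet", GreeterContract, "Greets callers"),),
    requirements=(),
)
print(contract.provides(GreeterContract))   # True
print(contract.provides(str))               # False
print(contract.get_required_interfaces())   # []
```

The `is` comparison means that provides() matches the exact interface type object, not subclasses; consumers and providers must therefore reference the same Protocol class.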
# ---------------------------------------------------------------------------
# Capability Contract Protocol Interfaces
#
# Each Protocol below is the formal interface for one Capability's Provision.
# These are what consumers depend on. They are stable by design.
#
# Important: Protocol methods do not need @abstractmethod. Protocol classes
# use structural subtyping (duck typing), not nominal inheritance. The method
# bodies are ellipsis (...) by convention, indicating "this method must exist
# on any conforming implementation."
#
# The @runtime_checkable decorator enables isinstance() checks at injection
# time, but note that these checks only verify method name presence, not
# signatures. Use a static type checker (mypy, pyright) for full validation.
# ---------------------------------------------------------------------------
@runtime_checkable
class RequirementsAnalysisContract(Protocol):
"""
The formal interface of the RequirementsAnalysisCapability.
Any Capability that needs to decompose feature requests into engineering
tasks declares a Requirement for this interface. The OrchestratorCapability
is the primary consumer, but any Capability that needs to understand the
structure of a feature request can declare this Requirement.
Stability note: This interface should change only when the fundamental
concept of what "requirements analysis" means changes — which is rare.
Adding new optional parameters to existing methods is backward-compatible.
Removing or renaming methods requires a major version bump in the
Evolution Envelope.
"""
def analyze_feature_request(
self, request: FeatureRequest
) -> list[EngineeringTask]:
"""
Decomposes a natural-language feature request into a structured
list of engineering tasks with acceptance criteria and complexity
estimates. The returned tasks are ordered by dependency — tasks
that must be completed first appear earlier in the list.
"""
...
def refine_task(
self, task: EngineeringTask, refinement_notes: str
) -> EngineeringTask:
"""
Refines an existing engineering task based on feedback from
downstream agents. Called when the CodeReviewCapability determines
that the original task specification was ambiguous or incomplete.
"""
...
@runtime_checkable
class CodeGenerationContract(Protocol):
"""
The formal interface of the CodeGenerationCapability.
This is the most performance-sensitive interface in the pipeline, since
code generation involves the largest LLM context windows and the longest
inference times. The Protocol declaration makes this performance
characteristic visible at the Contract level through the associated
CapabilityProtocol metadata.
"""
def generate_code(self, task: EngineeringTask) -> GeneratedCode:
"""
Generates implementation code for a single engineering task.
The returned GeneratedCode includes not just the source code but
also an explanation of the implementation choices, the list of
dependencies introduced, and a record of which model produced it.
"""
...
def register_completion_listener(
self, listener: Callable[[GeneratedCode], None]
) -> None:
"""
Registers a callback invoked whenever code generation completes.
This enables monitoring and observability Capabilities to track
pipeline progress without being in the critical execution path.
The CodeGenerationCapability does not know who is listening —
it simply invokes all registered callbacks after each generation.
"""
...
@runtime_checkable
class CodeReviewContract(Protocol):
"""
The formal interface of the CodeReviewCapability.
Code review is the quality gate of the pipeline. Its output directly
determines whether the Orchestrator accepts the generated code, requests
revisions, or escalates to a human reviewer.
"""
def review_code(
self, code: GeneratedCode, task: EngineeringTask
) -> ReviewResult:
"""
Reviews generated code against the engineering task specification.
The task parameter is essential: the reviewer needs to know what
the code was supposed to do in order to evaluate whether it does it
correctly. Reviewing code without its specification is like grading
an exam without the questions.
"""
...
@runtime_checkable
class TestGenerationContract(Protocol):
"""
The formal interface of the TestGenerationCapability.
"""
def generate_tests(
self, code: GeneratedCode, task: EngineeringTask
) -> TestSuite:
"""
Generates a comprehensive test suite for the given code,
using the engineering task's acceptance criteria as the
behavioral specification for the tests.
"""
...
@runtime_checkable
class DocumentationContract(Protocol):
"""
The formal interface of the DocumentationCapability.
"""
def generate_documentation(
self, code: GeneratedCode, task: EngineeringTask
) -> DocumentationArtifact:
"""
Generates technical documentation for the given code,
including inline docstrings, a README section describing
the feature, and API documentation for any public interfaces.
"""
...
@runtime_checkable
class OrchestratorContract(Protocol):
"""
The formal interface of the OrchestratorCapability.
This is the primary entry point for the entire pipeline.
External callers — CLI tools, web APIs, CI/CD systems — interact
with the pipeline exclusively through this interface.
"""
def execute_pipeline(self, request: FeatureRequest) -> PipelineResult:
"""
Executes the full agentic pipeline for the given feature request,
coordinating all specialist Capabilities and returning the
synthesized result ready for pull request creation.
"""
...
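The structural-subtyping behavior described in the comment block above can be verified directly. In this sketch, PipelineEntryPoint is a hypothetical stand-in for OrchestratorContract; note that the isinstance() check passes even for a class with the wrong signature, which is exactly the name-only caveat:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class PipelineEntryPoint(Protocol):  # stand-in for OrchestratorContract
    def execute_pipeline(self, request: object) -> object: ...

class FakeOrchestrator:  # no inheritance from the Protocol anywhere
    def execute_pipeline(self, request: object) -> object:
        return {"status": "ok", "request": request}

print(isinstance(FakeOrchestrator(), PipelineEntryPoint))  # True

class WrongSignature:  # also passes, despite an incompatible signature
    def execute_pipeline(self) -> None:
        return None

print(isinstance(WrongSignature(), PipelineEntryPoint))  # True (name-only check)
```

This is why the injection-time isinstance() check is a sanity check, not a substitute for running mypy or pyright over the codebase.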
Chapter 6 — The Capability Nucleus: Essence, Realization, and Adaptation
Every Capability in CCA is internally structured as a Capability Nucleus — three distinct, concentric layers that separate what a Capability does from how it does it and how it exposes itself to the world. This three-layer structure is not bureaucratic overhead; it is the mechanism that makes independent evolution, testability, and deployment flexibility possible. Understanding each layer deeply, and understanding why the boundaries between them must be respected, is essential for applying CCA correctly.
6.1 — The Essence: Pure Logic, No Dependencies
The Essence is the innermost layer of the Capability Nucleus. It contains the pure domain logic or algorithmic core that defines what the Capability does. It is also the primary custodian of the Capability's core domain state. In our pipeline, the Essence of the CodeGenerationCapability contains the prompt-engineering strategies, the task decomposition logic, the output parsing and validation rules, and the retry and fallback decision logic. It does not contain any code that calls an API, reads from a database, writes to a file, or interacts with any external system.
This complete independence from external systems is the Essence's defining characteristic and its greatest strength. Because the Essence has no external dependencies — with the sole exception of other Capability Contracts, which are themselves pure interfaces — it can be tested in complete isolation. No mocking of HTTP clients. No stubbing of database connections. No spinning up of Docker containers. Just pure Python unit tests that exercise the business logic directly. This makes the test suite fast, deterministic, and easy to maintain.
The Essence is also the most stable part of a Capability. The business logic for decomposing a feature request into engineering tasks does not change when you switch from GPT-5.3 to Gemini 3.1 Pro. The prompt structure may change, but the logic for deciding how to structure the prompt, how to validate the output, and how to handle edge cases is stable business logic that belongs in the Essence. By isolating this logic in the Essence, you protect it from the churn of the infrastructure layers.
6.2 — The Realization: Technical Mechanisms, Infrastructure Integration
The Realization is the middle layer of the Capability Nucleus. It is dedicated to the technical mechanisms required to make the Essence functional and operational in the real world, within a specific technical environment. The Realization implements the how of the Capability's operation.
In our pipeline, the Realization of the CodeGenerationCapability is where the LLM API calls happen. The Realization knows how to construct an HTTP request to the OpenAI API, how to handle rate limiting and retries at the network level, how to parse the raw API response into a structured format that the Essence can work with, and how to manage API keys and authentication. None of this knowledge exists in the Essence.
The critical insight about the Realization is that it is replaceable. Because the Essence communicates with the Realization only through an abstract LLMPort interface — never through a concrete API client class — the entire Realization can be swapped out without touching the Essence. Switching from the GPT-5.3 Realization to a local Llama 3 Realization running on CUDA means writing a new class that implements LLMPort and registering it in the assembly. The Essence, the Adaptation, and the Contract all remain unchanged.
This replaceability is not just a theoretical benefit — it is the practical solution to the hardware lock-in problem described in Part 1. We will show concrete Realization implementations for cloud APIs, NVIDIA CUDA, Apple MLX, AMD ROCm, Vulkan, and Intel OpenVINO in Part 10.
6.3 — The Adaptation: External Interfaces, Communication Protocols
The Adaptation is the outermost layer of the Capability Nucleus. It provides the explicit interfaces through which the Capability interacts with other Capabilities and with external systems. The Adaptation is responsible for translating between the Capability's internal representation of data and the external representation expected by its consumers.
In our pipeline, the Adaptation of the CodeGenerationCapability implements the CodeGenerationContract Protocol interface. When another Capability calls generate_code(task) on the CodeGenerationCapability, it is calling a method on the Adaptation. The Adaptation translates this call into the Essence's internal representation, delegates to the Essence and Realization to do the actual work, and then translates the result back into the GeneratedCode data model that the caller expects.
The Adaptation is also where multiple communication protocols are supported simultaneously. The same Capability can expose its functionality through a direct in-process call (for the Orchestrator running in the same process), a REST API (for external monitoring tools), and a message queue interface (for asynchronous batch processing). All three Adaptations delegate to the same Essence and Realization. The caller's choice of protocol does not affect the Capability's internal behavior.
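A minimal sketch of this multi-protocol idea, using hypothetical stand-in classes rather than the pipeline's real ones: one Essence object is shared by a direct-call Adaptation and a queue-based Adaptation, with the MESSAGE_QUEUE mechanism modeled by a stdlib queue.

```python
import queue

class EchoEssence:
    """Stand-in domain logic shared by both Adaptations."""
    def process(self, text: str) -> str:
        return text.upper()

class DirectCallAdaptation:
    """In-process, synchronous access (DIRECT_CALL)."""
    def __init__(self, essence: EchoEssence) -> None:
        self._essence = essence
    def process(self, text: str) -> str:
        return self._essence.process(text)

class QueueAdaptation:
    """Asynchronous access (MESSAGE_QUEUE): drain requests, emit results."""
    def __init__(self, essence: EchoEssence, inbox, outbox) -> None:
        self._essence, self._inbox, self._outbox = essence, inbox, outbox
    def drain(self) -> None:
        # Process every pending request; results go to the outbox.
        while not self._inbox.empty():
            self._outbox.put(self._essence.process(self._inbox.get()))

essence = EchoEssence()                       # one Essence instance...
direct = DirectCallAdaptation(essence)        # ...two Adaptations sharing it
inbox, outbox = queue.Queue(), queue.Queue()
mq = QueueAdaptation(essence, inbox, outbox)
inbox.put("hello")
mq.drain()
print(direct.process("hello"), outbox.get())  # HELLO HELLO
```

Both callers observe identical behavior because all domain logic lives in the single shared Essence; the Adaptations only translate between transport and domain representations.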
6.4 — The LLM Port: Abstracting Inference Backends
Before we can implement any of the three layers for our AI agents, we need to define the abstract interface that the Essence uses to communicate with the LLM — the LLMPort. This interface is the boundary between the stable business logic in the Essence and the replaceable infrastructure in the Realization.
# capabilities/llm_port.py
# =============================================================================
# The abstract LLMPort interface.
#
# This is the single most important abstraction in the entire system for
# enabling backend flexibility. Every agent's Essence communicates with its
# LLM exclusively through this interface. The concrete implementation —
# whether it calls OpenAI's API, runs llama.cpp on CUDA, uses Apple's MLX
# framework, or invokes Intel OpenVINO — lives in the Realization layer and
# is invisible to the Essence.
#
# By keeping this interface minimal and focused on the core inference
# operation, we ensure that any LLM backend can implement it without
# requiring changes to the interface itself.
# =============================================================================
from __future__ import annotations
from typing import Protocol, runtime_checkable
from dataclasses import dataclass
@dataclass(frozen=True)
class LLMRequest:
"""
A structured request to an LLM backend.
The system_prompt establishes the agent's role and behavioral guidelines.
The user_prompt contains the specific task or question for this invocation.
The temperature controls the randomness of the output — lower values
produce more deterministic, focused outputs (better for code generation),
while higher values produce more creative, varied outputs (better for
brainstorming and documentation).
max_tokens limits the response length to prevent runaway generation
and control costs.
"""
system_prompt: str
user_prompt: str
temperature: float = 0.1
max_tokens: int = 4096
@dataclass(frozen=True)
class LLMResponse:
"""
The structured response from an LLM backend.
The content field contains the raw text of the model's response.
The model_identifier records which specific model and backend produced
this response, providing a full audit trail for debugging and compliance.
The token counts enable cost tracking and capacity planning.
"""
content: str
model_identifier: str # e.g., "gpt-5.3", "llama-3-70b-cuda", "mlx-mistral-7b"
prompt_tokens: int
completion_tokens: int
total_tokens: int
@runtime_checkable
class LLMPort(Protocol):
"""
The abstract interface between a Capability's Essence and its LLM backend.
Any object that implements the complete() method with this signature
is a valid LLMPort. This includes cloud API clients (OpenAI, Anthropic,
Google), local inference engines (llama.cpp via CUDA, MLX, ROCm, Vulkan,
Intel OpenVINO), and mock implementations for testing.
The simplicity of this interface is intentional. A single method is all
that is needed to support the full range of LLM interactions in our
pipeline. If you find yourself wanting to add methods to this interface,
ask whether those methods belong here or in the Realization layer.
"""
def complete(self, request: LLMRequest) -> LLMResponse:
"""
Sends a request to the LLM backend and returns the response.
Implementations are responsible for handling retries, rate limiting,
authentication, and any backend-specific error conditions.
The caller (the Essence) should not need to know about any of these
infrastructure concerns.
"""
...
def is_available(self) -> bool:
"""
Returns True if the backend is currently available and ready to
accept requests. Used by the LLMRealizationFactory during hardware
detection and by the Lifecycle Manager during health checks.
"""
...
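Because LLMPort is a structural Protocol, a test double conforms simply by implementing its two methods. The sketch below re-declares LLMRequest, LLMResponse, and LLMPort (matching the definitions above) so it runs standalone; CannedLLMPort is a hypothetical mock, not part of the pipeline:

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable

@dataclass(frozen=True)
class LLMRequest:
    system_prompt: str
    user_prompt: str
    temperature: float = 0.1
    max_tokens: int = 4096

@dataclass(frozen=True)
class LLMResponse:
    content: str
    model_identifier: str
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

@runtime_checkable
class LLMPort(Protocol):
    def complete(self, request: LLMRequest) -> LLMResponse: ...
    def is_available(self) -> bool: ...

class CannedLLMPort:
    """Returns a fixed response; useful for deterministic unit tests."""
    def __init__(self, canned: str) -> None:
        self._canned = canned
    def complete(self, request: LLMRequest) -> LLMResponse:
        return LLMResponse(
            content=self._canned,
            model_identifier="mock",
            prompt_tokens=len(request.user_prompt.split()),
            completion_tokens=len(self._canned.split()),
            total_tokens=0,
        )
    def is_available(self) -> bool:
        return True

port = CannedLLMPort('{"source_code": "pass"}')
print(isinstance(port, LLMPort))  # True: structural conformance, no inheritance
```

This is the same mechanism that lets a CUDA, MLX, or OpenVINO Realization slot in later: each one is just another class with these two methods.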
6.5 — Implementing the Nucleus: CodeGenerationCapability
Now we can implement the full Capability Nucleus for the CodeGenerationCapability. We show all three layers — Essence, Realization, and Adaptation — and then the glue class that wires them together into a CapabilityInstance.
# capabilities/code_generation/essence.py
# =============================================================================
# CodeGenerationEssence: Pure domain logic for code generation.
#
# This class contains everything that is true about code generation regardless
# of which LLM backend is used, which communication protocol is active, or
# which deployment environment the system is running in.
#
# The only external dependency is LLMPort — an abstract interface, not a
# concrete class. This means the Essence can be tested with a mock LLMPort
# that returns predetermined responses, without any network calls, API keys,
# or infrastructure setup.
# =============================================================================
from __future__ import annotations
from typing import Callable
from models import EngineeringTask, GeneratedCode
from capabilities.llm_port import LLMPort, LLMRequest, LLMResponse
class CodeGenerationEssence:
"""
The pure domain logic for the CodeGenerationCapability.
This class is responsible for:
1. Constructing the optimal prompt for a given engineering task,
including the system prompt that establishes the agent's role,
the task description, acceptance criteria, and technical constraints.
2. Parsing and validating the LLM's response into a structured
GeneratedCode object.
3. Deciding when to retry (e.g., when the response is malformed),
when to fall back to a simpler prompt, and when to give up.
4. Notifying registered listeners when generation completes,
enabling observability without coupling to monitoring infrastructure.
The Essence does NOT know:
- Which LLM model is being used (GPT-5.3, Gemini, Llama, etc.)
- Whether the LLM is running in the cloud or locally
- How the result will be delivered to the caller (direct call, REST, MQ)
- How the Capability is deployed (monolith, microservice, serverless)
"""
# The system prompt establishes the agent's role and behavioral guidelines.
# It is defined at the class level because it is a stable part of the
# Capability's domain logic, not a configuration parameter.
_SYSTEM_PROMPT = """You are an expert software engineer with deep expertise
in writing clean, well-tested, production-ready code. Your task is to implement
code based on a structured engineering task specification. You must:
1. Implement the full solution, not just a skeleton or placeholder.
2. Follow the acceptance criteria exactly as specified.
3. Respect all technical constraints listed in the task.
4. Include inline comments explaining non-obvious implementation choices.
5. List all external dependencies your implementation requires.
Return your response as a JSON object with fields:
source_code (string), explanation (string), dependencies (array of strings).
"""
def __init__(self, llm_port: LLMPort) -> None:
"""
The LLMPort is injected at construction time by the Adaptation layer,
which receives it from the Realization. The Essence never constructs
the LLMPort itself — doing so would create a direct dependency on a
concrete implementation and defeat the purpose of the abstraction.
"""
self._llm = llm_port
# Completion listeners are registered by external observers (e.g., monitoring
# Capabilities) and invoked after each successful generation. The Essence
# does not know who the listeners are — it simply calls them.
self._completion_listeners: list[Callable[[GeneratedCode], None]] = []
def generate_code(self, task: EngineeringTask) -> GeneratedCode:
"""
The core domain operation: generate code for an engineering task.
This method orchestrates the full generation process:
1. Build a structured prompt from the task specification.
2. Send the prompt to the LLM via the abstract LLMPort.
3. Parse and validate the response.
4. Retry with a simplified prompt if parsing fails.
5. Notify all registered listeners.
6. Return the structured GeneratedCode artifact.
"""
prompt = self._build_prompt(task)
request = LLMRequest(
system_prompt=self._SYSTEM_PROMPT,
user_prompt=prompt,
temperature=0.1, # Low temperature for deterministic, focused code generation
max_tokens=8192,
)
response = self._llm.complete(request)
generated_code = self._parse_response(response, task)
# Notify all registered listeners. This is a synchronous notification —
# listeners should be fast (e.g., incrementing a counter, writing a log line).
# If a listener needs to do heavy work, it should schedule it asynchronously.
for listener in self._completion_listeners:
try:
listener(generated_code)
except Exception:
# A failing listener must never break the generation pipeline.
# Errors in observers are swallowed here rather than propagated to the
# caller; a production implementation should log them before continuing.
pass
return generated_code
def register_completion_listener(
self, listener: Callable[[GeneratedCode], None]
) -> None:
"""Registers a callback to be invoked after each successful generation."""
self._completion_listeners.append(listener)
def _build_prompt(self, task: EngineeringTask) -> str:
"""
Constructs a structured prompt from the engineering task specification.
The prompt structure is a domain concern — it encodes our knowledge
of how to communicate effectively with code-generation models.
This knowledge belongs in the Essence, not in the Realization.
"""
criteria_text = "\n".join(
f" - {criterion}" for criterion in task.acceptance_criteria
)
constraints_text = "\n".join(
f" - {constraint}" for constraint in task.technical_constraints
)
return f"""
Task Title: {task.title}
Task Description:
{task.description}
Target Language: {task.target_language}
Estimated Complexity: {task.estimated_complexity}
Acceptance Criteria (your implementation MUST satisfy all of these):
{criteria_text}
Technical Constraints (your implementation MUST respect all of these):
{constraints_text}
Implement the complete solution now.
"""
def _parse_response(
self, response: LLMResponse, task: EngineeringTask
) -> GeneratedCode:
"""
Parses the LLM's raw text response into a structured GeneratedCode object.
This parsing logic is a domain concern: it encodes our knowledge of
the expected response format and our validation rules.
"""
import json
import re
# Attempt to extract a JSON block from the response.
# Models sometimes wrap JSON in markdown code fences.
content = response.content.strip()
json_match = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", content, re.DOTALL)
if json_match:
content = json_match.group(1)
try:
parsed = json.loads(content)
return GeneratedCode(
task_id=task.task_id,
language=task.target_language,
source_code=parsed.get("source_code", ""),
explanation=parsed.get("explanation", ""),
dependencies=tuple(parsed.get("dependencies", [])),
generation_model=response.model_identifier,
)
except (json.JSONDecodeError, TypeError, AttributeError):
# If parsing fails, return a best-effort result with the raw content.
# The CodeReviewCapability will flag this as a quality issue.
return GeneratedCode(
task_id=task.task_id,
language=task.target_language,
source_code=content,
explanation="Raw response — JSON parsing failed.",
dependencies=(),
generation_model=response.model_identifier,
)
# capabilities/code_generation/adaptation.py
# =============================================================================
# CodeGenerationAdaptation: Implements the CodeGenerationContract interface.
#
# The Adaptation is the public face of the Capability. It implements the
# Protocol interface defined in contracts.py, translating between the
# external interface (what callers expect) and the internal domain logic
# (what the Essence provides).
#
# The Adaptation is thin by design. It should contain no business logic.
# Its only job is to delegate to the Essence and handle any translation
# between external and internal representations.
# =============================================================================
from __future__ import annotations
from typing import Callable
from models import EngineeringTask, GeneratedCode
from contracts import CodeGenerationContract
from capabilities.code_generation.essence import CodeGenerationEssence
class CodeGenerationAdaptation:
"""
Implements the CodeGenerationContract Protocol interface.
This class is what other Capabilities receive when they declare a
Requirement for CodeGenerationContract. They call methods on this
object, which delegates to the Essence for all actual work.
The Adaptation is also where you would add cross-cutting concerns
that are not part of the core domain logic but must be applied at
every entry point: input validation, request logging, metrics
collection, circuit breaking, and so on.
"""
def __init__(self, essence: CodeGenerationEssence) -> None:
self._essence = essence
def generate_code(self, task: EngineeringTask) -> GeneratedCode:
"""
Delegates to the Essence. Validates the input before delegating
to catch malformed requests at the boundary rather than deep
inside the domain logic where they are harder to diagnose.
"""
if not task.task_id:
raise ValueError("EngineeringTask must have a non-empty task_id.")
if not task.description:
raise ValueError("EngineeringTask must have a non-empty description.")
return self._essence.generate_code(task)
def register_completion_listener(
self, listener: Callable[[GeneratedCode], None]
) -> None:
"""Delegates listener registration to the Essence."""
self._essence.register_completion_listener(listener)
# capabilities/code_generation/capability.py
# =============================================================================
# CodeGenerationCapabilityInstance: The glue class.
#
# This class wires together the Essence, Realization, and Adaptation into a
# complete Capability that implements the CapabilityInstance interface.
# It is the object that the CapabilityLifecycleManager creates, initializes,
# starts, stops, and cleans up.
#
# The CapabilityInstance is the only class in the Capability that knows
# about all three layers. It is responsible for constructing them in the
# right order and wiring them together during initialization.
# =============================================================================
from __future__ import annotations
from cca.lifecycle import CapabilityInstance, LifecycleState
from capabilities.llm_port import LLMPort
from capabilities.code_generation.essence import CodeGenerationEssence
from capabilities.code_generation.adaptation import CodeGenerationAdaptation
from contracts import CodeGenerationContract
class CodeGenerationCapabilityInstance(CapabilityInstance):
"""
The complete CodeGenerationCapability, implementing the CapabilityInstance
lifecycle interface.
The llm_port parameter is provided by the assembly layer (assembly.py),
which is the only place in the system where concrete Realization classes
are instantiated. This means the CapabilityInstance itself does not know
which LLM backend is being used — it only knows that it has an LLMPort.
"""
def __init__(self, llm_port: LLMPort) -> None:
self._llm_port = llm_port
self._essence: CodeGenerationEssence | None = None
self._adaptation: CodeGenerationAdaptation | None = None
self._state = LifecycleState.CREATED
def initialize(self) -> None:
"""
Constructs the Essence and Adaptation, wiring them together.
This is called by the LifecycleManager after instantiation but
before any dependencies are injected. At this point, the Capability
sets up its internal structure but does not yet start any active
operations (threads, connections, listeners).
"""
self._essence = CodeGenerationEssence(llm_port=self._llm_port)
self._adaptation = CodeGenerationAdaptation(essence=self._essence)
self._state = LifecycleState.INITIALIZED
def inject_dependency(self, interface_type: type, implementation: object) -> None:
"""
The CodeGenerationCapability has no Requirements — it does not depend
on any other Capability's Contract. It only needs an LLMPort, which is
provided at construction time by the assembly layer.
If this Capability were extended to depend on, say, a ToolRegistryContract
(to give the code generator access to external tools), that dependency
would be injected here by the LifecycleManager.
"""
pass # No external Capability dependencies for this Capability.
def start(self) -> None:
"""
Transitions the Capability to the Started state.
For the CodeGenerationCapability, starting is simple — there are no
background threads or persistent connections to establish. The LLM
connection is made on-demand for each request.
"""
self._state = LifecycleState.STARTED
def stop(self) -> None:
"""
Gracefully stops the Capability. Any in-flight requests are allowed
to complete before the Capability stops accepting new ones.
"""
self._state = LifecycleState.STOPPED
def cleanup(self) -> None:
"""
Releases all resources. After cleanup, this instance cannot be restarted.
"""
self._essence = None
self._adaptation = None
self._state = LifecycleState.CLEANED_UP
def get_contract_implementation(self, interface_type: type) -> object | None:
"""
Returns the Adaptation object if the requested interface type is
CodeGenerationContract. Returns None for any other type.
The LifecycleManager calls this method when injecting this Capability's
Provisions into other Capabilities that have declared a Requirement
for CodeGenerationContract.
"""
if interface_type is CodeGenerationContract:
return self._adaptation
return None
Chapter 7 — Evolution Envelopes
Change is inevitable in any software system, and in agentic AI systems it is especially rapid. Models are updated, deprecated, and replaced. Output formats evolve. New capabilities are added. Old interfaces become obsolete. Without a formal mechanism for managing this change, evolution becomes a source of instability rather than progress.
7.1 — Why Evolution Envelopes Are Necessary
Consider what happens when the CodeGenerationCapability's Contract needs to change. Perhaps the GeneratedCode model needs a new field — security_analysis — that the CodeReviewCapability will use to understand what security considerations the code generator already took into account. This is a backward-compatible change: existing consumers of CodeGenerationContract that do not use the new field will continue to work without modification. This is a MINOR version bump in Semantic Versioning terms.
Now consider a more disruptive change: the generate_code method needs to return a list of GeneratedCode objects instead of a single one, because some tasks are better served by generating multiple implementation alternatives. This is a backward-incompatible change: every consumer of CodeGenerationContract that calls generate_code and expects a single object will break. This is a MAJOR version bump.
Without an Evolution Envelope, the only way to communicate this change to consumers is through documentation, Slack messages, or code comments — all of which are invisible to the system itself. With an Evolution Envelope, the change is formally recorded in the Capability's metadata, accessible through the Registry, with a documented migration path and an end-of-life date for the old interface. The system can warn consumers at startup if they are depending on a deprecated version.
7.2 — The Three Components of an Evolution Envelope
An Evolution Envelope contains three types of information. Versioning information records the current and previous version of the Capability using Semantic Versioning (MAJOR.MINOR.PATCH). The version number is not just metadata — it is a communication tool that tells consumers exactly what kind of change has occurred and what action they need to take. A PATCH bump means "nothing you depend on has changed." A MINOR bump means "new features are available; your existing code still works." A MAJOR bump means "you must update your code to use the new interface."
Deprecation notices are formal announcements that a specific feature, method, or version of a Capability is being phased out. A deprecation notice includes the target (what is being deprecated), the end-of-life date (when it will stop working), the reason (why it is being deprecated), and the replacement (what consumers should use instead). By recording deprecations formally in the Evolution Envelope, the system can surface them at startup, giving teams visibility into upcoming breaking changes before they occur.
Migration paths provide concrete, actionable guidance for upgrading from an older version of a Contract to a newer one. A migration path includes the from-version and to-version, a list of breaking changes, a URL to detailed documentation, and optionally a reference to an automated migration tool or script. The goal is to make migration as low-friction as possible, so that teams are not discouraged from upgrading by the fear of unknown effort.
# cca/evolution.py
# =============================================================================
# Evolution Envelope: formal versioning, deprecation, and migration management.
#
# Every Capability has an EvolutionEnvelope registered alongside its Contract
# in the CapabilityRegistry. This makes the versioning status of every
# Capability visible to the entire system, not just to the team that owns it.
# =============================================================================
from __future__ import annotations
from dataclasses import dataclass, field
from datetime import date
@dataclass(frozen=True)
class DeprecationNotice:
"""
A formal notice that a specific feature or version of a Capability
is being deprecated.
The end_of_life_date is a hard commitment: after this date, the
deprecated feature will be removed and consumers that have not
migrated will break. Setting this date far enough in the future
(typically 90-180 days for internal systems) gives consumers
sufficient time to migrate without feeling rushed.
The migration_guide_url is optional because not all deprecations
require a separate migration guide — sometimes the replacement
description in the 'replacement' field is sufficient.
"""
target: str # e.g., "Contract v1.x", "generate_code() single-return"
end_of_life_date: date
reason: str
replacement: str
migration_guide_url: str | None = None
@dataclass(frozen=True)
class MigrationPath:
"""
Concrete guidance for upgrading from one version of a Capability's
Contract to another.
The breaking_changes tuple lists every change that requires consumer
code to be updated. Being explicit about breaking changes — rather than
just saying "see the docs" — reduces the effort required to assess
the impact of a migration and helps teams prioritize their migration work.
The estimated_migration_effort field is a rough guide to help teams
plan their sprints. It is not a guarantee, but it is better than
no estimate at all.
"""
from_version: str
to_version: str
breaking_changes: tuple[str, ...]
documentation_url: str
automated_migration_tool: str | None
estimated_migration_effort: str # "trivial" | "low" | "medium" | "high"
@dataclass
class EvolutionEnvelope:
"""
The complete Evolution Envelope for a Capability.
Note that this class is NOT frozen=True, unlike most of our data classes.
This is intentional: the Evolution Envelope is expected to accumulate
deprecation notices and migration paths over the lifetime of the system.
It is mutable by design, but only the Capability's owner should mutate it.
The EvolutionEnvelope is stored in the CapabilityDescriptor and is
accessible through the CapabilityRegistry. This means any component
in the system can query the versioning status of any Capability at
runtime, enabling dynamic deprecation warnings and health checks.
"""
current_version: str
previous_version: str | None
deprecation_notices: list[DeprecationNotice] = field(default_factory=list)
migration_paths: list[MigrationPath] = field(default_factory=list)
release_notes_url: str | None = None
def is_version_deprecated(self, version: str) -> bool:
"""
Returns True if the given version string appears in any active
deprecation notice's target. Used by the Registry to warn about
consumers that are depending on deprecated versions.
"""
return any(version in notice.target for notice in self.deprecation_notices)
def get_migration_path(
self, from_version: str, to_version: str
) -> MigrationPath | None:
"""
Returns the migration path between two specific versions, or None
if no migration path has been defined for this pair. A missing
migration path is itself a signal that should be flagged — it means
the Capability owner has not yet provided migration guidance.
"""
for path in self.migration_paths:
if path.from_version == from_version and path.to_version == to_version:
return path
return None
def get_active_deprecation_notices(
self, as_of: date | None = None
) -> list[DeprecationNotice]:
"""
Returns all deprecation notices that are still active as of the
given date — that is, notices whose end_of_life_date has not yet
passed. Defaults to today if no date is provided.
This method is called by the CapabilityLifecycleManager during
startup to surface deprecation warnings before the system begins
processing requests.
"""
check_date = as_of or date.today()
return [
notice for notice in self.deprecation_notices
if notice.end_of_life_date >= check_date
]
def add_deprecation_notice(self, notice: DeprecationNotice) -> None:
"""Adds a new deprecation notice to this envelope."""
self.deprecation_notices.append(notice)
def add_migration_path(self, path: MigrationPath) -> None:
"""Adds a new migration path to this envelope."""
self.migration_paths.append(path)
Chapter 8 — The CapabilityInstance Interface and Lifecycle States
For the Lifecycle Manager to orchestrate all Capabilities uniformly — regardless of whether they are AI agents, infrastructure services, or monitoring components — every Capability must implement a common interface: CapabilityInstance. This interface defines the six lifecycle methods that the Lifecycle Manager calls in a precise sequence. It is not optional, and it is not a convenience — it is the explicit contract between every Capability and the infrastructure that manages it.
8.1 — Why a Uniform Lifecycle Interface Is Necessary
Without a uniform lifecycle interface, the Lifecycle Manager cannot orchestrate Capabilities in a general way. It would need to know the specific startup and shutdown procedures for each Capability, which means it would need to import and depend on every Capability's concrete implementation. This is precisely the kind of concrete coupling that CCA is designed to eliminate.
With the CapabilityInstance interface, the Lifecycle Manager knows nothing about the specifics of any Capability. It only knows that every Capability has initialize(), inject_dependency(), start(), stop(), cleanup(), and get_contract_implementation() methods. It can orchestrate the entire system — regardless of how many Capabilities it contains or what they do — using only these six methods.
8.2 — The Lifecycle States and Their Transitions
A Capability passes through six well-defined states during its lifetime. Understanding these states and the transitions between them is essential for implementing the CapabilityInstance interface correctly.
The Created state is the initial state. The Capability object has been instantiated by the Lifecycle Manager using the factory function registered in the Capability's descriptor. At this point, the object exists in memory but has not performed any setup. No external interactions should occur in this state — not even reading a configuration file. The constructor should be as lightweight as possible.
The Initialized state is entered when the Lifecycle Manager calls initialize(). In this state, the Capability performs its internal setup: loading configuration, constructing its Essence and Adaptation objects, allocating internal data structures, and preparing any resources that do not depend on other Capabilities. The key constraint is that initialize() must not use any injected dependencies — those are not yet available. If the Capability tries to call another Capability's interface in initialize(), it will fail because the dependency has not yet been injected.
The Dependencies Injected state is entered when the Lifecycle Manager has called inject_dependency() for all of the Capability's declared Requirements. At this point, the Capability has received references to all the external interfaces it needs. It can store these references internally but should not yet start using them actively — that happens in start().
The Started state is entered when the Lifecycle Manager calls start(). This is when the Capability becomes fully operational. It starts background threads, opens persistent connections, activates event listeners, and begins accepting requests. From this point on, the Capability is delivering value.
The Stopped state is entered when the Lifecycle Manager calls stop(), which happens in reverse topological order during shutdown. The Capability gracefully ceases its active operations: it stops accepting new requests, waits for in-flight requests to complete, halts background threads, and closes active connections. Crucially, it does not yet release its resources — that happens in cleanup().
The Cleaned Up state is the final state. The Lifecycle Manager calls cleanup() after stop() has completed. The Capability releases all resources: hardware handles, file descriptors, database connections, network sockets, and memory buffers. After cleanup, the Capability instance cannot be restarted without being re-instantiated and re-initialized from scratch.
# cca/lifecycle.py
# =============================================================================
# CapabilityInstance interface and LifecycleState enum.
#
# Every Capability in the system must implement CapabilityInstance.
# This is the contract between Capabilities and the CapabilityLifecycleManager.
# =============================================================================
from __future__ import annotations
from abc import ABC, abstractmethod
from enum import Enum, auto
class LifecycleState(Enum):
"""
The well-defined lifecycle states every Capability passes through.
The CapabilityLifecycleManager is the sole authority responsible for
transitioning Capabilities between these states. A Capability must
never transition itself — it must wait for the Manager to call the
appropriate lifecycle method.
"""
CREATED = auto() # Instantiated, no setup performed
INITIALIZED = auto() # Internal setup done, awaiting deps
DEPENDENCIES_INJECTED = auto() # All deps injected, not yet active
STARTED = auto() # Fully operational
STOPPED = auto() # Active ops ceased, resources retained
CLEANED_UP = auto() # All resources released
class CapabilityInstance(ABC):
"""
The interface every Capability must implement to participate in the
CCA lifecycle. This abstract base class defines the six lifecycle
methods that the CapabilityLifecycleManager calls in sequence.
Implementing classes must not call lifecycle methods on themselves —
the Manager is the sole orchestrator of lifecycle transitions.
Implementing classes must not call lifecycle methods on other
Capabilities — all inter-Capability communication goes through
injected Contract interfaces.
"""
@abstractmethod
def initialize(self) -> None:
"""
Performs internal setup that does not require any injected dependencies.
Called by the LifecycleManager immediately after instantiation,
in topological order (dependencies-first).
What belongs here:
- Loading internal configuration
- Constructing Essence and Adaptation objects
- Allocating internal data structures
- Setting up internal state
What does NOT belong here:
- Using injected dependencies (they are not yet available)
- Starting threads or opening connections
- Making network calls or reading from databases
"""
...
@abstractmethod
def inject_dependency(self, interface_type: type, implementation: object) -> None:
"""
Receives an injected dependency from the LifecycleManager.
Called once for each declared Requirement, after initialize()
has completed and after the providing Capability has been started.
The implementation parameter is the providing Capability's
Adaptation object — specifically, the object returned by
get_contract_implementation() on the provider.
Implementations should store the injected dependency in an
instance variable for use in start() and subsequent operations.
They should validate that the injected object is an instance of
the expected Protocol type (using isinstance() with @runtime_checkable).
"""
...
@abstractmethod
def start(self) -> None:
"""
Transitions the Capability to the Started state.
Called by the LifecycleManager after all dependencies have been
injected. This is when the Capability becomes fully operational.
What belongs here:
- Starting background threads
- Opening persistent connections (database pools, message queue connections)
- Activating event listeners
- Performing initial data loading or cache warming
- Any operation that uses injected dependencies
"""
...
@abstractmethod
def stop(self) -> None:
"""
Gracefully ceases active operations.
Called by the LifecycleManager in reverse topological order during
shutdown, ensuring that consumers are stopped before their providers.
Implementations must ensure that all in-flight requests complete
before this method returns. They must not accept new requests after
this method is called. They must not release resources — that
happens in cleanup().
"""
...
@abstractmethod
def cleanup(self) -> None:
"""
Releases all resources.
Called by the LifecycleManager after stop() has completed,
also in reverse topological order.
After this method returns, the Capability instance is inert.
It cannot be restarted without being re-instantiated and
re-initialized. Implementations should set all resource references
to None to allow garbage collection.
"""
...
@abstractmethod
def get_contract_implementation(self, interface_type: type) -> object | None:
"""
Returns the Adaptation object that implements the given interface type,
or None if this Capability does not provide that interface.
This method is called by the LifecycleManager when it needs to inject
this Capability's Provisions into other Capabilities that have declared
Requirements for them. The returned object is passed directly to the
requiring Capability's inject_dependency() method.
Implementations should return self._adaptation (or the specific
Adaptation object) if interface_type matches one of their declared
Provisions, and None otherwise.
"""
...
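To see the six methods in action, here is a toy Capability that obeys the lifecycle protocol. It is written as a plain duck-typed class for brevity — a real implementation would subclass cca.lifecycle.CapabilityInstance — and the GreetingContract name and the capability itself are illustrative inventions, not part of the pipeline described in this book.

```python
from __future__ import annotations

class GreetingContract:
    """The interface this toy Capability provides (illustrative)."""
    def greet(self, name: str) -> str: ...

class GreetingCapability:
    """A minimal Capability following the six-method lifecycle."""
    def __init__(self) -> None:
        # Created state: lightweight constructor, no setup, no I/O.
        self._template: str | None = None
        self._clock: object | None = None    # injected dependency placeholder
        self._running = False

    def initialize(self) -> None:
        # Internal setup only — injected dependencies are not yet available.
        self._template = "Hello, {}!"

    def inject_dependency(self, interface_type: type, implementation: object) -> None:
        # Store the dependency for later use; do not call it yet.
        self._clock = implementation

    def start(self) -> None:
        # The Capability becomes operational and may now use dependencies.
        self._running = True

    def stop(self) -> None:
        # Cease active operations; resources are retained until cleanup().
        self._running = False

    def cleanup(self) -> None:
        # Release everything; the instance is inert after this returns.
        self._template = None
        self._clock = None

    def get_contract_implementation(self, interface_type: type) -> object | None:
        if interface_type is GreetingContract:
            return self          # real code would return self._adaptation
        return None

    # The Provision itself:
    def greet(self, name: str) -> str:
        assert self._running, "greet() called outside the Started state"
        return self._template.format(name)

# The LifecycleManager would drive exactly this sequence:
cap = GreetingCapability()
cap.initialize()
cap.inject_dependency(object, object())   # placeholder dependency
cap.start()
print(cap.greet("CCA"))                   # Hello, CCA!
cap.stop()
cap.cleanup()
```

The assertion inside greet() illustrates a useful discipline: business methods should refuse to run outside the Started state, which turns lifecycle violations into loud, immediate failures rather than silent misbehavior.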
Chapter 9 — The Capability Registry
The Capability Registry is the authoritative source of truth for the entire CCA system. It is the central hub where all Capabilities register themselves, where all Contracts are recorded, where all dependencies are declared, and where the complete dependency graph of the system is maintained. Without the Registry, the Lifecycle Manager would have no way to know what Capabilities exist, what they provide, what they need, or in what order they should be initialized.
9.1 — What the Registry Does and Why It Matters
The Registry serves five distinct functions, each of which is essential to the system's operation.
Capability Registration is the process by which a Capability announces its existence to the system. When a Capability is registered, its CapabilityDescriptor — which includes its name, its Contract, its Evolution Envelope, and a factory function for creating instances — is stored in the Registry. From this point on, any component in the system can query the Registry to discover what Capabilities are available and what interfaces they provide.
Contract Management is the ongoing maintenance of the complete record of all Provisions and Requirements declared across all registered Capabilities. This record is what makes the dependency graph possible. Without it, the system would have no way to know that the OrchestratorCapability requires a CodeGenerationContract and that the CodeGenerationCapability provides one.
Dependency Resolution is the process of matching Requirements to Provisions. When the OrchestratorCapability declares that it requires a CodeGenerationContract, the Registry searches its records to find which registered Capability provides that interface. This matching is done by interface type — the exact Python class object — not by name, which ensures that the matching is unambiguous and type-safe.
Circular Dependency Detection is one of the Registry's most important safety functions. A circular dependency — where Capability A requires Capability B, which requires Capability C, which requires Capability A — makes topological sorting impossible and would cause the Lifecycle Manager to deadlock during initialization. The Registry detects circular dependencies at registration time, before the system starts, and rejects any registration that would create one. This turns a runtime deadlock into a startup-time error, which is far easier to diagnose and fix.
Topological Sorting is the computation of the correct initialization order for all registered Capabilities. The Registry provides the dependency graph information that the Lifecycle Manager uses to perform this sort. The result is an ordered list of Capabilities such that every Capability appears after all the Capabilities it depends on. This is the order in which the Lifecycle Manager will initialize, inject dependencies into, and start each Capability.
9.2 — The CapabilityDescriptor
A CapabilityDescriptor is the complete registration record for a Capability. It contains everything the Registry needs to know about a Capability: its name (unique identifier within the system), its Contract (what it provides and requires), its Evolution Envelope (its versioning and deprecation status), and a factory function (a callable that creates a new instance of the Capability). The factory function is particularly important: it allows the Registry and Lifecycle Manager to create Capability instances on demand without knowing the Capability's concrete class.
# cca/registry.py
# =============================================================================
# CapabilityRegistry: The authoritative source of truth for the CCA system.
#
# The Registry is a singleton within the system — there is exactly one
# Registry, and all Capabilities register with it. This centralization is
# intentional: it makes the system's dependency graph visible in one place,
# which is essential for debugging, monitoring, and governance.
# =============================================================================
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Callable
from contracts import CapabilityContract
from cca.evolution import EvolutionEnvelope
from cca.lifecycle import CapabilityInstance
@dataclass
class CapabilityDescriptor:
"""
The complete registration record for a Capability.
The factory field is a zero-argument callable that creates a new
CapabilityInstance. Using a factory (rather than storing the instance
directly) allows the Registry to be populated before any instances are
created, which is important for circular dependency detection — you need
to know about all Capabilities before you can check for cycles.
The factory is typically a lambda that captures the Capability's
constructor arguments from the assembly layer. For example:
lambda: CodeGenerationCapabilityInstance(llm_port=gpt53_realization)
This lambda is evaluated lazily — only when the LifecycleManager
decides to create the instance — so the LLM port is not constructed
until it is actually needed.
"""
name: str
contract: CapabilityContract
evolution_envelope: EvolutionEnvelope
factory: Callable[[], CapabilityInstance]
@dataclass
class ContractBinding:
"""
Records a formal dependency relationship between two Capabilities.
A ContractBinding is created when the assembly layer explicitly declares
that a specific Requirement of one Capability should be fulfilled by a
specific Provision of another Capability. In most cases, the Registry
can resolve these bindings automatically based on interface type matching,
but explicit bindings are useful when multiple Capabilities provide the
same interface and you need to specify which one a particular consumer
should use.
The provided_interface_name field is metadata for documentation and
debugging — it is not used for programmatic matching, which is done
by interface_type.
"""
requiring_capability_name: str
required_interface_type: type
providing_capability_name: str
provided_interface_name: str
class CapabilityRegistry:
"""
The central hub for all Capability registrations, Contracts, and
dependency relationships.
The Registry is designed to be populated completely before the
CapabilityLifecycleManager is started. All Capabilities should be
registered and all bindings should be declared before any Capability
is instantiated. This "declare everything first, then start" approach
ensures that circular dependency detection is complete and accurate
before any irreversible actions (like opening network connections) are taken.
"""
def __init__(self) -> None:
# Maps capability name -> CapabilityDescriptor
self._capabilities: dict[str, CapabilityDescriptor] = {}
# All declared contract bindings
self._bindings: list[ContractBinding] = []
def register(self, descriptor: CapabilityDescriptor) -> None:
"""
Registers a Capability with the Registry.
This method validates that:
1. No Capability with the same name is already registered.
2. Adding this Capability does not create a circular dependency.
If either validation fails, the registration is rejected and the
Registry state is unchanged. This rollback-on-failure behavior
ensures the Registry is always in a consistent state.
"""
if descriptor.name in self._capabilities:
raise ValueError(
f"A Capability named '{descriptor.name}' is already registered. "
f"Capability names must be unique within the system."
)
self._capabilities[descriptor.name] = descriptor
# Validate that adding this Capability does not create a cycle.
# If it does, roll back the registration before raising the error.
if self._has_circular_dependencies():
del self._capabilities[descriptor.name]
raise ValueError(
f"Registering '{descriptor.name}' would create a circular dependency. "
f"Review the Requirements declared in its CapabilityContract."
)
# Warn about any active deprecation notices in the Evolution Envelope.
active_notices = descriptor.evolution_envelope.get_active_deprecation_notices()
for notice in active_notices:
print(
f"[CCA WARNING] Capability '{descriptor.name}' has an active "
f"deprecation notice: '{notice.target}' is deprecated "
f"(EOL: {notice.end_of_life_date}). "
f"Replacement: {notice.replacement}"
)
def bind(
self,
requiring: str,
required_interface: type,
providing: str,
provided_interface_name: str,
) -> None:
"""
Declares an explicit dependency binding between two Capabilities.
This method validates that:
1. Both the requiring and providing Capabilities are registered.
2. The providing Capability actually provides the required interface.
3. Adding this binding does not create a circular dependency.
Explicit bindings are optional for most cases — the LifecycleManager
can resolve dependencies automatically by interface type. Use explicit
bindings when you need to override the automatic resolution, for
example when multiple Capabilities provide the same interface.
"""
if requiring not in self._capabilities:
raise ValueError(
f"Cannot create binding: requiring Capability '{requiring}' "
f"is not registered."
)
if providing not in self._capabilities:
raise ValueError(
f"Cannot create binding: providing Capability '{providing}' "
f"is not registered."
)
provider_contract = self._capabilities[providing].contract
if not provider_contract.provides(required_interface):
raise ValueError(
f"Cannot create binding: Capability '{providing}' does not "
f"provide the interface '{required_interface.__name__}'. "
f"Check the Provisions declared in its CapabilityContract."
)
binding = ContractBinding(
requiring_capability_name=requiring,
required_interface_type=required_interface,
providing_capability_name=providing,
provided_interface_name=provided_interface_name,
)
self._bindings.append(binding)
if self._has_circular_dependencies():
self._bindings.pop()
raise ValueError(
f"Adding binding from '{requiring}' to '{providing}' would "
f"create a circular dependency."
)
def find_provider(self, interface_type: type) -> CapabilityDescriptor | None:
"""
Finds the Capability that provides the given interface type.
Returns None if no provider is registered.
In a system where multiple Capabilities could provide the same
interface, this method returns the first match found. For more
precise control, use explicit bindings via the bind() method.
"""
for descriptor in self._capabilities.values():
if descriptor.contract.provides(interface_type):
return descriptor
return None
def get_descriptor(self, name: str) -> CapabilityDescriptor | None:
"""Returns the CapabilityDescriptor for the given Capability name."""
return self._capabilities.get(name)
def get_all_descriptors(self) -> list[CapabilityDescriptor]:
"""Returns all registered CapabilityDescriptors."""
return list(self._capabilities.values())
def get_topological_order(self) -> list[str]:
"""
Computes the topological initialization order for all registered
Capabilities using Kahn's algorithm.
Kahn's algorithm works by repeatedly selecting Capabilities that
have no unresolved dependencies (in-degree zero), adding them to
the result, and then removing their outgoing edges from the graph.
This continues until either all Capabilities have been added to
the result (success) or no more Capabilities can be selected
(which indicates a cycle, though this should have been caught
at registration time).
The result is a list of Capability names in the order they should
be initialized: every Capability appears after all the Capabilities
it depends on.
"""
# Build the dependency graph: name -> set of names it depends on
dependencies: dict[str, set[str]] = {
name: set() for name in self._capabilities
}
for binding in self._bindings:
requiring = binding.requiring_capability_name
providing = binding.providing_capability_name
if requiring in dependencies:
dependencies[requiring].add(providing)
# Also resolve implicit dependencies by interface type matching
for name, descriptor in self._capabilities.items():
for req in descriptor.contract.requirements:
if req.optional:
continue
provider = self.find_provider(req.interface_type)
if provider and provider.name != name:
dependencies[name].add(provider.name)
# Kahn's algorithm
in_degree = {name: len(deps) for name, deps in dependencies.items()}
queue = [name for name, degree in in_degree.items() if degree == 0]
result: list[str] = []
while queue:
# Select the next Capability with no unresolved dependencies.
# Sort for determinism — the topological order is not unique,
# and we want consistent behavior across runs.
queue.sort()
current = queue.pop(0)
result.append(current)
# Remove this Capability from the dependency sets of all others.
for name, deps in dependencies.items():
if current in deps:
deps.discard(current)
in_degree[name] -= 1
if in_degree[name] == 0:
queue.append(name)
if len(result) != len(self._capabilities):
raise RuntimeError(
"Topological sort failed: circular dependency detected. "
"This should have been caught at registration time. "
"Please report this as a bug."
)
return result
def _has_circular_dependencies(self) -> bool:
"""
Detects circular dependencies using depth-first search.
Returns True if any cycle exists in the current dependency graph.
This method is called after every registration and binding to
ensure the Registry remains cycle-free at all times. The cost
of this check is O(V + E) where V is the number of Capabilities
and E is the number of dependency edges — negligible at startup.
"""
# Build adjacency list: name -> list of names it depends on
graph: dict[str, list[str]] = {
name: [] for name in self._capabilities
}
for binding in self._bindings:
if binding.requiring_capability_name in graph:
graph[binding.requiring_capability_name].append(
binding.providing_capability_name
)
for name, descriptor in self._capabilities.items():
for req in descriptor.contract.requirements:
provider = self.find_provider(req.interface_type)
if provider and provider.name != name:
if provider.name not in graph[name]:
graph[name].append(provider.name)
visited: set[str] = set()
recursion_stack: set[str] = set()
def dfs(node: str) -> bool:
visited.add(node)
recursion_stack.add(node)
for neighbor in graph.get(node, []):
if neighbor not in visited:
if dfs(neighbor):
return True
elif neighbor in recursion_stack:
return True
recursion_stack.discard(node)
return False
return any(
dfs(name) for name in self._capabilities if name not in visited
)
Chapter 10 — The Capability Lifecycle Manager
If the Registry is the system's memory — knowing what exists and how things relate — then the Lifecycle Manager is the system's will: the component that acts on that knowledge to bring the system to life and shut it down gracefully. The Lifecycle Manager is the only component in the system that creates Capability instances, calls their lifecycle methods, injects their dependencies, and orchestrates their shutdown. No other component does any of these things.
10.1 — Why a Dedicated Lifecycle Manager Is Necessary
In a system without a Lifecycle Manager, startup order is typically managed through one of two approaches: either the main() function manually instantiates and starts each component in the right order, or a dependency injection framework handles it automatically. The first approach breaks down as the system grows — the main() function becomes a sprawling, fragile sequence of imperative statements that must be manually updated every time a new Capability is added or a dependency changes. The second approach typically requires annotating every class with framework-specific decorators, creating a tight coupling between the application code and the DI framework.
The CCA Lifecycle Manager takes a different approach. It reads the dependency graph from the Registry, computes the correct initialization order automatically, and then executes the lifecycle sequence for each Capability in that order. Adding a new Capability to the system requires only registering it with the Registry — the Lifecycle Manager automatically incorporates it into the startup sequence without any changes to the orchestration code.
10.2 — The Startup Sequence
The Lifecycle Manager's startup sequence is a precise, ordered series of operations. First, it queries the Registry to get the topological order of all registered Capabilities. Then, for each Capability in that order, it executes four steps: it calls the factory function to create the Capability instance, it calls initialize() to perform internal setup, it calls inject_dependency() for each of the Capability's declared Requirements (passing the providing Capability's Adaptation object), and finally it calls start() to make the Capability operational.
The shutdown sequence is the reverse: for each Capability in reverse topological order, the Lifecycle Manager calls stop() and then cleanup(). Reverse topological order ensures that consumers are stopped before their providers — you would not want to shut down the CodeGenerationCapability while the OrchestratorCapability is still trying to use it.
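Before the full implementation, the two loops can be sketched in miniature. The "registry" here is a stand-in: an already-topologically-sorted list of (name, factory, required_provider_names) tuples, and the Echo class is a trivial illustrative Capability that merely logs its transitions; real code reads the order and descriptors from the CapabilityRegistry instead.

```python
from __future__ import annotations

class Echo:
    """A trivial Capability obeying the lifecycle protocol; logs each step."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.log: list[str] = []
        self.deps: dict[type, object] = {}
    def initialize(self) -> None: self.log.append("init")
    def inject_dependency(self, itype: type, impl: object) -> None:
        self.deps[itype] = impl
    def start(self) -> None: self.log.append("start")
    def stop(self) -> None: self.log.append("stop")
    def cleanup(self) -> None: self.log.append("cleanup")
    def get_contract_implementation(self, itype: type) -> object | None:
        return self

# Stand-in registry: already in topological order (providers first).
registry = [
    ("llm-port", lambda: Echo("llm-port"), []),
    ("codegen", lambda: Echo("codegen"), ["llm-port"]),
]

# Startup: create -> initialize -> inject -> start, in topological order.
instances: dict[str, Echo] = {}
for name, factory, requires in registry:
    inst = factory()                       # step 1: create via factory
    inst.initialize()                      # step 2: internal setup
    for provider_name in requires:         # step 3: inject dependencies
        provider = instances[provider_name]
        inst.inject_dependency(Echo, provider.get_contract_implementation(Echo))
    inst.start()                           # step 4: become operational
    instances[name] = inst

# Shutdown: stop -> cleanup, in reverse topological order
# (consumers stop before their providers).
for name, _, _ in reversed(registry):
    instances[name].stop()
    instances[name].cleanup()

print(instances["codegen"].log)   # ['init', 'start', 'stop', 'cleanup']
```

Notice that by the time a consumer's dependencies are injected, every provider earlier in the order has already been started — exactly the guarantee the inject_dependency() docstring in Chapter 8 promises.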
# cca/manager.py
# =============================================================================
# CapabilityLifecycleManager: Orchestrates the complete lifecycle of all
# Capabilities in the system.
#
# This is the only component that creates CapabilityInstance objects,
# calls their lifecycle methods, and injects their dependencies.
# Everything else in the system interacts with Capabilities exclusively
# through their Contract interfaces.
# =============================================================================
from __future__ import annotations
from cca.registry import CapabilityRegistry
from cca.lifecycle import CapabilityInstance, LifecycleState
from contracts import CapabilityContract
class CapabilityLifecycleManager:
"""
Orchestrates the complete lifecycle of all registered Capabilities.
The Manager operates in two phases:
1. Startup: Creates, initializes, injects dependencies into, and starts
all Capabilities in topological order.
2. Shutdown: Stops and cleans up all Capabilities in reverse topological order.
The Manager maintains a registry of live instances so that it can
inject the right implementation into each Capability's Requirements
and so that it can orchestrate shutdown correctly.
"""
def __init__(self, registry: CapabilityRegistry) -> None:
self._registry = registry
# Maps capability name -> live CapabilityInstance
self._instances: dict[str, CapabilityInstance] = {}
# The topological order, computed once at startup
self._startup_order: list[str] = []
def start_all(self) -> None:
"""
Starts all registered Capabilities in topological order.
This method is the single entry point for system startup.
After this method returns successfully, every registered Capability
is in the Started state and ready to accept requests.
If any step fails, the exception is propagated to the caller.
In a production system, you would want to add cleanup logic here
to shut down any Capabilities that were successfully started before
the failure occurred. For clarity, this example omits that logic.
"""
self._startup_order = self._registry.get_topological_order()
print(f"[CCA] Starting {len(self._startup_order)} Capabilities "
f"in topological order: {' -> '.join(self._startup_order)}")
for name in self._startup_order:
descriptor = self._registry.get_descriptor(name)
if descriptor is None:
raise RuntimeError(f"No descriptor found for Capability '{name}'.")
# Step 1: Create the instance using the registered factory.
# The factory is a lambda that captures the Realization and other
# constructor arguments from the assembly layer.
print(f"[CCA] Creating '{name}'...")
instance = descriptor.factory()
self._instances[name] = instance
# Step 2: Initialize internal state (no dependencies yet).
print(f"[CCA] Initializing '{name}'...")
instance.initialize()
# Step 3: Inject all declared Requirements.
# For each required interface, find the providing Capability's
# live instance and get its contract implementation.
self._inject_dependencies(name, instance, descriptor.contract)
# Step 4: Start the Capability (activate threads, connections, etc.).
print(f"[CCA] Starting '{name}'...")
instance.start()
print(f"[CCA] '{name}' is now STARTED.")
print(f"[CCA] All {len(self._startup_order)} Capabilities started successfully.")
def stop_all(self) -> None:
"""
Stops and cleans up all Capabilities in reverse topological order.
This ensures consumers are stopped before their providers.
"""
shutdown_order = list(reversed(self._startup_order))
print(f"[CCA] Shutting down {len(shutdown_order)} Capabilities "
f"in reverse order: {' -> '.join(shutdown_order)}")
for name in shutdown_order:
instance = self._instances.get(name)
if instance is None:
continue
print(f"[CCA] Stopping '{name}'...")
instance.stop()
print(f"[CCA] Cleaning up '{name}'...")
instance.cleanup()
print(f"[CCA] '{name}' is CLEANED UP.")
self._instances.clear()
print("[CCA] All Capabilities shut down.")
def get_instance(self, name: str) -> CapabilityInstance | None:
"""
Returns the live CapabilityInstance for the given name.
Used by the assembly layer to retrieve the entry-point Capability
(typically the OrchestratorCapability) after startup.
"""
return self._instances.get(name)
def _inject_dependencies(
self,
name: str,
instance: CapabilityInstance,
contract: "CapabilityContract",
) -> None:
"""
Injects all declared Requirements into the given Capability instance.
For each required interface type, this method:
1. Finds the registered Capability that provides that interface.
2. Retrieves the live instance of that providing Capability.
3. Gets the providing Capability's contract implementation for
the required interface type.
4. Calls inject_dependency() on the requiring Capability.
If a required (non-optional) dependency cannot be satisfied,
a RuntimeError is raised. Optional dependencies that cannot be
satisfied are silently skipped.
"""
for requirement in contract.requirements:
provider_descriptor = self._registry.find_provider(
requirement.interface_type
)
if provider_descriptor is None:
if requirement.optional:
print(
f"[CCA] Optional dependency '{requirement.name}' "
f"for '{name}' not satisfied — skipping."
)
continue
raise RuntimeError(
f"Cannot satisfy required dependency '{requirement.name}' "
f"(interface: {requirement.interface_type.__name__}) "
f"for Capability '{name}'. "
f"No registered Capability provides this interface."
)
provider_instance = self._instances.get(provider_descriptor.name)
if provider_instance is None:
raise RuntimeError(
f"Provider '{provider_descriptor.name}' for dependency "
f"'{requirement.name}' of '{name}' has not been started yet. "
f"This indicates a topological sort error — please report as a bug."
)
implementation = provider_instance.get_contract_implementation(
requirement.interface_type
)
if implementation is None:
raise RuntimeError(
f"Provider '{provider_descriptor.name}' returned None for "
f"get_contract_implementation({requirement.interface_type.__name__}). "
f"The provider's Contract declares this Provision but the "
f"CapabilityInstance does not implement it."
)
print(
f"[CCA] Injecting '{requirement.name}' "
f"({requirement.interface_type.__name__}) "
f"from '{provider_descriptor.name}' into '{name}'."
)
instance.inject_dependency(requirement.interface_type, implementation)
Chapter 11 — Efficiency Gradients
Efficiency Gradients are unique to CCA, and they address a problem that most other architectural patterns simply ignore: different parts of a system have radically different performance requirements, and a single uniform level of abstraction cannot serve all of them well.
11.1 — The Problem That Efficiency Gradients Solve
In a traditional layered architecture, every component in the system uses the same level of abstraction. Everything goes through the same ORM, the same HTTP client, the same message queue library. This uniformity is convenient for developers — there is only one way to do things — but it is deeply inefficient for systems where some components have stringent performance requirements and others do not.
Consider our agentic pipeline. The OrchestratorCapability needs to coordinate six specialist agents, synthesize their outputs, and return a result to the caller. Its performance requirements are moderate — a few seconds of latency is acceptable because the pipeline as a whole takes minutes. The CodeGenerationCapability, on the other hand, is the bottleneck of the pipeline: it makes the largest LLM calls, processes the most tokens, and accounts for the majority of the pipeline's total latency. Its performance requirements are stringent — every millisecond of overhead in the inference path matters.
If both Capabilities use the same level of abstraction — the same HTTP client, the same JSON serialization library, the same logging framework — then the CodeGenerationCapability is burdened with overhead that it does not need, while the OrchestratorCapability benefits from abstractions that it could use but does not strictly require. Efficiency Gradients allow each Capability to choose the level of abstraction that is appropriate for its specific requirements.
11.2 — Three Levels of the Gradient
CCA defines three broad levels of the Efficiency Gradient, though in practice the gradient is continuous rather than discrete.
At the low-abstraction, high-efficiency end of the gradient, Capabilities use the most direct, lowest-overhead mechanisms available. In embedded systems, this means bare-metal code, direct hardware register access, and interrupt service routines. In our AI pipeline, this means using the most efficient available inference path: for a local CUDA Realization, this might mean using the llama-cpp-python library with CUDA acceleration and bypassing any intermediate abstraction layers; for a cloud API Realization, it might mean using HTTP/2 connection pooling and binary serialization where available.
At the medium-abstraction, medium-efficiency level, Capabilities use standard operating system services, well-optimized libraries, and established protocols that offer good performance without requiring bare-metal control. Most of the Capabilities in our pipeline operate at this level: they use the standard httpx async HTTP client, standard JSON serialization, and standard Python data structures.
At the high-abstraction, low-efficiency end of the gradient, Capabilities use higher-level frameworks, rich abstractions, and general-purpose tools that prioritize developer productivity and flexibility over raw performance. The DocumentationCapability, which runs on Gemini 3.1 Flash and produces documentation that is read by humans rather than processed by machines, can afford to operate at this level. Its latency requirements are the most relaxed in the pipeline.
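The three levels can be made explicit as metadata that the assembly layer consults. A minimal sketch (the `GradientLevel` enum and the per-Capability placement below are illustrative conventions, not part of the CCA core):

```python
from enum import Enum

class GradientLevel(Enum):
    """Where a Capability sits on the Efficiency Gradient."""
    LOW_ABSTRACTION = "low"        # hot paths: direct, minimal-overhead mechanisms
    MEDIUM_ABSTRACTION = "medium"  # standard libraries and established protocols
    HIGH_ABSTRACTION = "high"      # productivity-first frameworks and rich tooling

# Illustrative placement of three pipeline Capabilities on the gradient.
GRADIENT_PLACEMENT: dict[str, GradientLevel] = {
    "CodeGeneration": GradientLevel.LOW_ABSTRACTION,    # pipeline bottleneck
    "Orchestrator": GradientLevel.MEDIUM_ABSTRACTION,   # moderate requirements
    "Documentation": GradientLevel.HIGH_ABSTRACTION,    # most relaxed latency
}
```

Recording the placement explicitly turns an implicit performance assumption into something a reviewer can see and challenge when a Capability's requirements change.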
11.3 — Efficiency Gradients Within a Single Capability
A key insight behind Efficiency Gradients is that the trade-off applies not only between Capabilities but also within the Realization layer of a single Capability: not every method in a Realization needs to be optimized to the same level.
In the CodeGenerationCapability's Realization, the complete() method — the hot path that is called for every code generation request — should be optimized aggressively: connection pooling, minimal overhead, binary protocols where available, and no unnecessary logging in the critical path. But the is_available() method — called once at startup for health checking — can use a simple synchronous HTTP request with no optimization at all. The update_configuration() method — called rarely when the model configuration changes — can use a high-level configuration parsing library with full validation and detailed logging.
This within-Capability gradient is what makes CCA so practical for real systems. You do not have to choose between "optimize everything" and "abstract everything." You optimize the paths that matter and abstract the paths that do not.
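The within-Capability gradient can be sketched in a few lines. In this sketch, `PooledClient` is a stand-in for a connection-pooled, low-overhead client, not a real library API; the point is only that the hot path reuses one expensive-to-create client while the cold path constructs whatever is convenient:

```python
import functools

class PooledClient:
    """Stand-in for a connection-pooled, low-overhead HTTP client."""
    def __init__(self) -> None:
        self.requests_served = 0

    def post(self, payload: str) -> str:
        self.requests_served += 1
        return f"completion for {payload!r}"

class SketchRealization:
    """Illustrates optimizing only the hot path of a Realization."""

    @functools.cached_property
    def _pooled_client(self) -> PooledClient:
        # Created once on first use, then reused for every complete() call.
        return PooledClient()

    def complete(self, prompt: str) -> str:
        # HOT PATH: pooled client, no logging, no per-call setup.
        return self._pooled_client.post(prompt)

    def is_available(self) -> bool:
        # COLD PATH: called once at startup, so simplicity beats speed.
        # A throwaway client (or a plain synchronous request) is fine here.
        return PooledClient().post("ping").startswith("completion")

r = SketchRealization()
r.complete("task A")
r.complete("task B")
# Both hot-path calls went through the same pooled client instance.
```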
11.4 — Local LLM Realizations: Applying Efficiency Gradients to Inference Backends
The most dramatic application of Efficiency Gradients in our pipeline is the choice of inference backend. Cloud API Realizations are at the high-abstraction end of the gradient: they use standard HTTP clients, JSON serialization, and managed infrastructure. Local inference Realizations are at the low-abstraction end: they use native libraries that communicate directly with GPU hardware, bypassing the network stack entirely.
# capabilities/llm_port.py is already defined above.
# capabilities/realizations/cloud/gpt53.py
# =============================================================================
# GPT53Realization: Cloud API Realization for OpenAI GPT-5.3.
#
# This Realization operates at the medium-to-high abstraction level of the
# Efficiency Gradient. It uses the standard OpenAI Python client, which
# handles connection pooling, retries, and authentication automatically.
# The trade-off is that every request goes through the network, introducing
# latency that local Realizations avoid.
# =============================================================================
from __future__ import annotations
import os
from capabilities.llm_port import LLMPort, LLMRequest, LLMResponse
class GPT53Realization:
"""
LLMPort implementation for the OpenAI GPT-5.3 API.
This Realization is appropriate when:
- Network latency is acceptable (typically > 500ms per request)
- You need the highest-quality outputs from a frontier model
- You do not have local GPU hardware available
- Privacy and data residency requirements permit cloud processing
The api_key is read from an environment variable at construction time,
not at request time. This ensures that a missing API key causes a
clear error at startup rather than a cryptic failure mid-pipeline.
"""
MODEL_ID = "gpt-5.3"
def __init__(self, api_key: str | None = None) -> None:
self._api_key = api_key or os.environ.get("OPENAI_API_KEY")
if not self._api_key:
raise ValueError(
"OpenAI API key not provided. Set the OPENAI_API_KEY "
"environment variable or pass api_key to the constructor."
)
# Import here rather than at module level to avoid import errors
# when the openai package is not installed (e.g., in environments
# that use only local Realizations).
import openai
self._client = openai.OpenAI(api_key=self._api_key)
def complete(self, request: LLMRequest) -> LLMResponse:
"""Sends a completion request to the GPT-5.3 API."""
response = self._client.chat.completions.create(
model=self.MODEL_ID,
messages=[
{"role": "system", "content": request.system_prompt},
{"role": "user", "content": request.user_prompt},
],
temperature=request.temperature,
max_tokens=request.max_tokens,
)
choice = response.choices[0]
usage = response.usage
return LLMResponse(
content=choice.message.content or "",
model_identifier=self.MODEL_ID,
prompt_tokens=usage.prompt_tokens,
completion_tokens=usage.completion_tokens,
total_tokens=usage.total_tokens,
)
def is_available(self) -> bool:
"""Checks availability by making a minimal test request."""
try:
self._client.models.retrieve(self.MODEL_ID)
return True
except Exception:
return False
# capabilities/realizations/local/cuda_realization.py
# =============================================================================
# CUDARealization: Local inference on NVIDIA GPUs via llama-cpp-python.
#
# This Realization operates at the LOW-ABSTRACTION, HIGH-EFFICIENCY end of
# the Efficiency Gradient. It uses llama-cpp-python with CUDA acceleration,
# which communicates directly with the GPU via CUDA kernels, bypassing the
# network stack entirely. The result is dramatically lower latency (typically
# 50-200ms per token vs. 500ms+ for cloud APIs) at the cost of requiring
# local NVIDIA GPU hardware and a GGUF model file.
#
# This Realization is appropriate when:
# - You have NVIDIA GPU hardware (CUDA compute capability >= 7.0 recommended)
# - Network latency is unacceptable (e.g., air-gapped environments)
# - Data privacy requirements prohibit sending data to cloud APIs
# - Cost optimization requires avoiding per-token cloud API charges
# =============================================================================
from __future__ import annotations
import os
from capabilities.llm_port import LLMPort, LLMRequest, LLMResponse
class CUDARealization:
"""
LLMPort implementation using llama-cpp-python with CUDA acceleration.
The model_path should point to a GGUF-format model file. GGUF is the
standard format for quantized models compatible with llama.cpp.
Models can be downloaded from HuggingFace Hub in this format.
The n_gpu_layers parameter controls how many transformer layers are
offloaded to the GPU. Setting it to -1 offloads all layers, which
maximizes GPU utilization and minimizes inference latency. Reduce this
value if you encounter out-of-memory errors.
The n_ctx parameter sets the context window size. Larger values allow
processing longer prompts but require more GPU memory.
"""
def __init__(
self,
model_path: str,
n_gpu_layers: int = -1, # -1 = offload all layers to GPU
n_ctx: int = 8192,
model_identifier: str = "llama-cuda-local",
) -> None:
if not os.path.exists(model_path):
raise FileNotFoundError(
f"GGUF model file not found at '{model_path}'. "
f"Download a GGUF model from HuggingFace Hub and update "
f"the model_path in your assembly configuration."
)
# Lazy import: only import llama_cpp when this Realization is used.
# This prevents import errors in environments without llama-cpp-python.
from llama_cpp import Llama
self._model = Llama(
model_path=model_path,
n_gpu_layers=n_gpu_layers,
n_ctx=n_ctx,
verbose=False, # Suppress llama.cpp's verbose startup output
)
self._model_identifier = model_identifier
def complete(self, request: LLMRequest) -> LLMResponse:
"""
Runs inference locally on the CUDA-accelerated GPU.
The entire inference happens in-process — no network calls,
no serialization overhead, no authentication round-trips.
"""
# llama-cpp-python uses the ChatML message format
messages = [
{"role": "system", "content": request.system_prompt},
{"role": "user", "content": request.user_prompt},
]
response = self._model.create_chat_completion(
messages=messages,
temperature=request.temperature,
max_tokens=request.max_tokens,
)
choice = response["choices"][0]
usage = response.get("usage", {})
return LLMResponse(
content=choice["message"]["content"] or "",
model_identifier=self._model_identifier,
prompt_tokens=usage.get("prompt_tokens", 0),
completion_tokens=usage.get("completion_tokens", 0),
total_tokens=usage.get("total_tokens", 0),
)
    def is_available(self) -> bool:
        """
        Returns True if the model is loaded.
        The model is loaded at construction time, so if __init__ succeeded,
        inference is available; no separate import or CUDA check is needed here.
        """
        return self._model is not None
# capabilities/realizations/local/mlx_realization.py
# =============================================================================
# MLXRealization: Local inference on Apple Silicon via Apple MLX.
#
# Apple MLX is a machine learning framework designed specifically for
# Apple Silicon (M1, M2, M3, M4 series chips). It uses the unified memory
# architecture of Apple Silicon to run inference on the GPU without any
# data transfer overhead — the CPU and GPU share the same memory pool.
# This makes MLX exceptionally efficient on Apple hardware.
#
# This Realization is appropriate when:
# - You are running on Apple Silicon (M-series Mac or iPad Pro)
# - You want the best performance on Apple hardware
# - You need local inference without NVIDIA or AMD hardware
# =============================================================================
from __future__ import annotations
from capabilities.llm_port import LLMPort, LLMRequest, LLMResponse
class MLXRealization:
"""
LLMPort implementation using Apple MLX for Apple Silicon inference.
Requires the mlx-lm package: pip install mlx-lm
Models are loaded from HuggingFace Hub in MLX format, or converted
from standard formats using the mlx_lm.convert utility.
"""
def __init__(
self,
model_name: str, # e.g., "mlx-community/Llama-3-8B-Instruct-4bit"
max_tokens: int = 4096,
model_identifier: str = "mlx-local",
) -> None:
# Lazy import: only import mlx_lm when this Realization is used.
from mlx_lm import load
self._model, self._tokenizer = load(model_name)
self._max_tokens = max_tokens
self._model_identifier = model_identifier
def complete(self, request: LLMRequest) -> LLMResponse:
"""
Runs inference on Apple Silicon via MLX.
MLX uses the unified memory architecture — no GPU memory transfers.
"""
from mlx_lm import generate
# Combine system and user prompts into a single prompt string.
# MLX models typically use the ChatML format.
full_prompt = (
f"<|system|>\n{request.system_prompt}\n"
f"<|user|>\n{request.user_prompt}\n"
f"<|assistant|>\n"
)
response_text = generate(
self._model,
self._tokenizer,
prompt=full_prompt,
max_tokens=request.max_tokens or self._max_tokens,
temp=request.temperature,
verbose=False,
)
# MLX generate() returns only the generated text, not token counts.
        # Estimate token counts from whitespace-delimited word counts as a rough approximation.
estimated_completion_tokens = len(response_text.split())
estimated_prompt_tokens = len(full_prompt.split())
return LLMResponse(
content=response_text,
model_identifier=self._model_identifier,
prompt_tokens=estimated_prompt_tokens,
completion_tokens=estimated_completion_tokens,
total_tokens=estimated_prompt_tokens + estimated_completion_tokens,
)
def is_available(self) -> bool:
"""Returns True if running on Apple Silicon with MLX installed."""
try:
import mlx.core as mx
# Check that we have a GPU device available
return mx.default_device() == mx.gpu
except ImportError:
return False
# capabilities/realizations/local/factory.py
# =============================================================================
# LLMRealizationFactory: Hardware auto-detection and Realization selection.
#
# This factory encapsulates the logic for selecting the most appropriate
# LLM Realization based on the available hardware. It is used by the
# assembly layer when no explicit Realization has been configured.
#
# The factory tries backends in order of preference:
# 1. NVIDIA CUDA (highest performance on NVIDIA hardware)
# 2. Apple MLX (best performance on Apple Silicon)
# 3. AMD ROCm (for AMD GPU hardware)
# 4. Intel OpenVINO (for Intel CPUs and integrated graphics)
# 5. Cloud API fallback (when no local hardware is available)
#
# This factory is the only place in the system where hardware detection
# logic lives. All other components are hardware-agnostic.
# =============================================================================
from __future__ import annotations
from capabilities.llm_port import LLMPort
class LLMRealizationFactory:
"""
Selects and instantiates the most appropriate LLMPort implementation
based on available hardware and configuration.
Usage in assembly.py:
factory = LLMRealizationFactory(
model_path="/models/llama-3-70b.Q4_K_M.gguf",
hf_model_name="mlx-community/Llama-3-8B-Instruct-4bit",
openai_api_key=os.environ.get("OPENAI_API_KEY"),
)
llm_port = factory.create()
"""
def __init__(
self,
model_path: str | None = None, # Path to GGUF file for CUDA/ROCm/Vulkan
hf_model_name: str | None = None, # HuggingFace model name for MLX
openai_api_key: str | None = None, # Fallback to OpenAI cloud API
prefer_local: bool = True, # If True, prefer local over cloud
) -> None:
self._model_path = model_path
self._hf_model_name = hf_model_name
self._openai_api_key = openai_api_key
self._prefer_local = prefer_local
def create(self) -> LLMPort:
"""
Creates and returns the most appropriate LLMPort for this environment.
Tries local backends first (if prefer_local is True), then falls back
to cloud APIs.
"""
if self._prefer_local:
local_port = self._try_local_backends()
if local_port is not None:
return local_port
# Fall back to cloud API
if self._openai_api_key:
from capabilities.realizations.cloud.gpt53 import GPT53Realization
print("[LLMFactory] Using cloud backend: OpenAI GPT-5.3")
return GPT53Realization(api_key=self._openai_api_key)
raise RuntimeError(
"No LLM backend is available. Install a local inference library "
"(llama-cpp-python, mlx-lm) or provide a cloud API key."
)
    def _try_local_backends(self) -> LLMPort | None:
        """Tries each local backend in order of preference."""
        # Try NVIDIA CUDA first (requires a GGUF model path).
        if self._model_path and self._try_cuda():
            from capabilities.realizations.local.cuda_realization import CUDARealization
            print("[LLMFactory] Using local backend: NVIDIA CUDA (llama-cpp-python)")
            return CUDARealization(model_path=self._model_path)
        # Try Apple MLX
        if self._try_mlx():
            from capabilities.realizations.local.mlx_realization import MLXRealization
            print("[LLMFactory] Using local backend: Apple MLX")
            return MLXRealization(
                model_name=self._hf_model_name or "mlx-community/Llama-3-8B-Instruct-4bit"
            )
        # Try AMD ROCm (uses llama-cpp-python with a ROCm build; also needs a GGUF path).
        if self._model_path and self._try_rocm():
            from capabilities.realizations.local.cuda_realization import CUDARealization
            print("[LLMFactory] Using local backend: AMD ROCm (llama-cpp-python ROCm build)")
            return CUDARealization(
                model_path=self._model_path,
                model_identifier="llama-rocm-local",
            )
        return None
def _try_cuda(self) -> bool:
"""Returns True if NVIDIA CUDA is available."""
try:
import torch
return torch.cuda.is_available()
except ImportError:
pass
try:
# Fallback: check via llama_cpp directly
from llama_cpp import llama_supports_gpu_offload
return llama_supports_gpu_offload()
except ImportError:
return False
    def _try_mlx(self) -> bool:
        """Returns True if Apple MLX is available (Apple Silicon)."""
        try:
            import mlx.core as mx
            return mx.default_device() == mx.gpu
        except Exception:  # ImportError, or any failure querying the device
            return False
def _try_rocm(self) -> bool:
"""Returns True if AMD ROCm is available."""
try:
import torch
return torch.version.hip is not None
except (ImportError, AttributeError):
return False
Chapter 12 — System Assembly
The assembly layer is the only place in the entire system where concrete types are wired together. Every other module — the Essences, the Realizations, the Adaptations, the Registry, the Lifecycle Manager — works exclusively with abstract interfaces. The assembly layer is where you say: "For this deployment, the CodeGenerationCapability will use the GPT-5.3 Realization, the OrchestratorCapability will use the local CUDA Realization, and the system will run with these six Capabilities in this configuration."
This centralization of concrete wiring is not just a convention — it is an architectural principle. When all concrete dependencies are declared in one place, changing the deployment configuration requires changing only that one place. Swapping from cloud to local inference, adding a new Capability, or changing which Realization a Capability uses are all single-file changes.
# assembly.py
# =============================================================================
# System Assembly: The composition root for the Agentic Pipeline.
#
# This is the ONLY file in the system that:
# - Imports concrete Realization classes
# - Instantiates CapabilityDescriptors with factory lambdas
# - Registers Capabilities with the Registry
# - Creates and starts the CapabilityLifecycleManager
#
# All other files work exclusively with abstract interfaces (Protocols)
# and shared data models. This file is the seam between the abstract
# architecture and the concrete deployment environment.
# =============================================================================
from __future__ import annotations
import os
from datetime import date
from models import FeatureRequest
from contracts import (
RequirementsAnalysisContract, CodeGenerationContract,
CodeReviewContract, TestGenerationContract,
DocumentationContract, OrchestratorContract,
)
from cca.registry import CapabilityRegistry, CapabilityDescriptor
from cca.manager import CapabilityLifecycleManager
from cca.evolution import EvolutionEnvelope, DeprecationNotice, MigrationPath
# Import concrete Realization classes — allowed ONLY in assembly.py
from capabilities.realizations.cloud.gpt53 import GPT53Realization
from capabilities.realizations.cloud.claude46 import Claude46Realization
from capabilities.realizations.cloud.gemini31 import Gemini31Realization
from capabilities.realizations.local.factory import LLMRealizationFactory
# Import CapabilityInstance implementations
from capabilities.code_generation.capability import CodeGenerationCapabilityInstance
from capabilities.requirements_analysis.capability import RequirementsAnalysisCapabilityInstance
from capabilities.code_review.capability import CodeReviewCapabilityInstance
from capabilities.test_generation.capability import TestGenerationCapabilityInstance
from capabilities.documentation.capability import DocumentationCapabilityInstance
from capabilities.orchestrator.capability import OrchestratorCapabilityInstance
# Import contract builders
from contracts import (
CapabilityContract, ProvisionDefinition, RequirementDefinition, CapabilityProtocol,
CommunicationMechanism,
)
def build_code_generation_contract() -> CapabilityContract:
"""
Builds the formal Contract for the CodeGenerationCapability.
This function encodes the architectural decisions about what this
Capability provides, what it needs, and how it communicates.
"""
return CapabilityContract(
capability_name="CodeGeneration",
provisions=(
ProvisionDefinition(
name="CodeGenerationService",
interface_type=CodeGenerationContract,
description=(
"Generates implementation code from structured engineering tasks. "
"Supports completion listeners for observability."
),
),
),
requirements=(), # No dependencies on other Capabilities
protocols=(
CapabilityProtocol(
communication_mechanism=CommunicationMechanism.DIRECT_CALL,
data_format="python_objects",
max_latency_ms=30_000, # 30s — LLM inference can be slow
reliability="at-least-once",
authentication=None,
description="In-process direct call. Latency dominated by LLM inference time.",
),
),
)
def build_orchestrator_contract() -> CapabilityContract:
"""Builds the formal Contract for the OrchestratorCapability."""
return CapabilityContract(
capability_name="Orchestrator",
provisions=(
ProvisionDefinition(
name="PipelineExecutionService",
interface_type=OrchestratorContract,
description="Executes the full agentic pipeline for a feature request.",
),
),
requirements=(
RequirementDefinition(
name="RequirementsAnalysis",
interface_type=RequirementsAnalysisContract,
optional=False,
description="Required to decompose feature requests into engineering tasks.",
),
RequirementDefinition(
name="CodeGeneration",
interface_type=CodeGenerationContract,
optional=False,
description="Required to generate implementation code.",
),
RequirementDefinition(
name="CodeReview",
interface_type=CodeReviewContract,
optional=False,
description="Required to review generated code for quality.",
),
RequirementDefinition(
name="TestGeneration",
interface_type=TestGenerationContract,
optional=False,
description="Required to generate test suites.",
),
RequirementDefinition(
name="Documentation",
interface_type=DocumentationContract,
optional=True, # Pipeline can succeed without documentation
description="Optional: generates documentation artifacts.",
),
),
protocols=(
CapabilityProtocol(
communication_mechanism=CommunicationMechanism.DIRECT_CALL,
data_format="python_objects",
max_latency_ms=300_000, # 5 minutes for full pipeline
reliability="exactly-once",
authentication=None,
description="In-process direct call. Latency is the sum of all agent latencies.",
),
),
)
def build_and_start_pipeline(use_local_llm: bool = False) -> CapabilityLifecycleManager:
    """
    Assembles and starts the complete Agentic Software Engineering Pipeline.

    This function is the single entry point for system startup. It:

    1. Creates the appropriate LLM Realizations based on configuration.
    2. Builds CapabilityContracts for all six Capabilities.
    3. Builds EvolutionEnvelopes recording the current versioning status.
    4. Registers all Capabilities with the Registry.
    5. Creates and starts the CapabilityLifecycleManager.
    6. Returns the started Manager for use by the caller.

    The use_local_llm parameter switches between cloud API Realizations
    (default) and local hardware Realizations (CUDA, MLX, ROCm, etc.).
    This is the only parameter needed to switch the entire system between
    cloud and local inference — no other code changes are required.
    """
    registry = CapabilityRegistry()

    # -------------------------------------------------------------------------
    # Step 1: Create LLM Realizations
    # -------------------------------------------------------------------------
    if use_local_llm:
        # The factory auto-detects available hardware and selects the best backend.
        # The api_key is provided as a fallback in case no local hardware is found.
        factory = LLMRealizationFactory(
            model_path=os.environ.get("LOCAL_MODEL_PATH", "/models/llama-3-70b.Q4_K_M.gguf"),
            hf_model_name=os.environ.get("MLX_MODEL_NAME", "mlx-community/Llama-3-8B-Instruct-4bit"),
            openai_api_key=os.environ.get("OPENAI_API_KEY"),
            prefer_local=True,
        )
        # All agents share the same local LLM in this configuration.
        # In a production system, you might use different quantization levels
        # for different agents based on their quality requirements.
        shared_local_llm = factory.create()
        requirements_llm = shared_local_llm
        code_gen_llm = shared_local_llm
        code_review_llm = shared_local_llm
        test_gen_llm = shared_local_llm
        documentation_llm = shared_local_llm
        orchestrator_llm = shared_local_llm
    else:
        # Cloud Realizations: each agent uses the model best suited to its task.
        # API keys are read from environment variables — never hardcoded.
        requirements_llm = Claude46Realization(
            api_key=os.environ["ANTHROPIC_API_KEY"], model_variant="opus"
        )
        code_gen_llm = GPT53Realization(api_key=os.environ["OPENAI_API_KEY"])
        code_review_llm = Claude46Realization(
            api_key=os.environ["ANTHROPIC_API_KEY"], model_variant="sonnet"
        )
        test_gen_llm = Gemini31Realization(
            api_key=os.environ["GOOGLE_API_KEY"], model_variant="pro"
        )
        documentation_llm = Gemini31Realization(
            api_key=os.environ["GOOGLE_API_KEY"], model_variant="flash"
        )
        orchestrator_llm = GPT53Realization(api_key=os.environ["OPENAI_API_KEY"])

    # -------------------------------------------------------------------------
    # Step 2: Build Evolution Envelopes
    # Each Capability's versioning status is recorded here. In a production
    # system, these would be loaded from a configuration file or a service
    # registry, not hardcoded. They are hardcoded here for clarity.
    # -------------------------------------------------------------------------
    code_gen_envelope = EvolutionEnvelope(
        current_version="2.0.0",
        previous_version="1.3.2",
        migration_paths=[
            MigrationPath(
                from_version="1.x",
                to_version="2.0.0",
                breaking_changes=(
                    "generate_code() now returns GeneratedCode with a "
                    "'generation_model' field that was not present in v1.x.",
                    "The 'dependencies' field is now a tuple, not a list.",
                ),
                documentation_url="https://docs.internal/code-gen/migration-v1-v2",
                automated_migration_tool=None,
                estimated_migration_effort="low",
            )
        ],
    )
    orchestrator_envelope = EvolutionEnvelope(
        current_version="1.0.0",
        previous_version=None,
    )

    # -------------------------------------------------------------------------
    # Step 3: Register all Capabilities with the Registry.
    # The factory lambda is evaluated lazily — only when the LifecycleManager
    # decides to create the instance. This means the LLM Realization objects
    # are captured by the lambda but not used until start_all() is called.
    # -------------------------------------------------------------------------
    registry.register(CapabilityDescriptor(
        name="RequirementsAnalysis",
        contract=CapabilityContract(
            capability_name="RequirementsAnalysis",
            provisions=(ProvisionDefinition(
                name="RequirementsAnalysisService",
                interface_type=RequirementsAnalysisContract,
                description="Decomposes feature requests into engineering tasks.",
            ),),
            requirements=(),
            protocols=(CapabilityProtocol(
                communication_mechanism=CommunicationMechanism.DIRECT_CALL,
                data_format="python_objects",
                max_latency_ms=20_000,
                reliability="at-least-once",
                authentication=None,
                description="In-process direct call.",
            ),),
        ),
        evolution_envelope=EvolutionEnvelope(current_version="1.0.0", previous_version=None),
        factory=lambda: RequirementsAnalysisCapabilityInstance(llm_port=requirements_llm),
    ))
    registry.register(CapabilityDescriptor(
        name="CodeGeneration",
        contract=build_code_generation_contract(),
        evolution_envelope=code_gen_envelope,
        factory=lambda: CodeGenerationCapabilityInstance(llm_port=code_gen_llm),
    ))
    registry.register(CapabilityDescriptor(
        name="CodeReview",
        contract=CapabilityContract(
            capability_name="CodeReview",
            provisions=(ProvisionDefinition(
                name="CodeReviewService",
                interface_type=CodeReviewContract,
                description="Reviews generated code and produces structured feedback.",
            ),),
            requirements=(),
            protocols=(CapabilityProtocol(
                communication_mechanism=CommunicationMechanism.DIRECT_CALL,
                data_format="python_objects",
                max_latency_ms=15_000,
                reliability="at-least-once",
                authentication=None,
                description="In-process direct call.",
            ),),
        ),
        evolution_envelope=EvolutionEnvelope(current_version="1.0.0", previous_version=None),
        factory=lambda: CodeReviewCapabilityInstance(llm_port=code_review_llm),
    ))
    registry.register(CapabilityDescriptor(
        name="TestGeneration",
        contract=CapabilityContract(
            capability_name="TestGeneration",
            provisions=(ProvisionDefinition(
                name="TestGenerationService",
                interface_type=TestGenerationContract,
                description="Generates comprehensive test suites for generated code.",
            ),),
            requirements=(),
            protocols=(CapabilityProtocol(
                communication_mechanism=CommunicationMechanism.DIRECT_CALL,
                data_format="python_objects",
                max_latency_ms=20_000,
                reliability="at-least-once",
                authentication=None,
                description="In-process direct call.",
            ),),
        ),
        evolution_envelope=EvolutionEnvelope(current_version="1.0.0", previous_version=None),
        factory=lambda: TestGenerationCapabilityInstance(llm_port=test_gen_llm),
    ))
    registry.register(CapabilityDescriptor(
        name="Documentation",
        contract=CapabilityContract(
            capability_name="Documentation",
            provisions=(ProvisionDefinition(
                name="DocumentationService",
                interface_type=DocumentationContract,
                description="Generates technical documentation for generated code.",
            ),),
            requirements=(),
            protocols=(CapabilityProtocol(
                communication_mechanism=CommunicationMechanism.DIRECT_CALL,
                data_format="python_objects",
                max_latency_ms=10_000,
                reliability="best-effort",
                authentication=None,
                description="In-process direct call. Optional — pipeline succeeds without it.",
            ),),
        ),
        evolution_envelope=EvolutionEnvelope(current_version="1.0.0", previous_version=None),
        factory=lambda: DocumentationCapabilityInstance(llm_port=documentation_llm),
    ))
    registry.register(CapabilityDescriptor(
        name="Orchestrator",
        contract=build_orchestrator_contract(),
        evolution_envelope=orchestrator_envelope,
        factory=lambda: OrchestratorCapabilityInstance(llm_port=orchestrator_llm),
    ))

    # -------------------------------------------------------------------------
    # Step 4: Start the system.
    # The LifecycleManager takes over from here. It reads the Registry,
    # computes the topological order, and brings each Capability online.
    # -------------------------------------------------------------------------
    manager = CapabilityLifecycleManager(registry=registry)
    manager.start_all()
    return manager


def main() -> None:
    """
    Entry point for the Agentic Software Engineering Pipeline.
    Demonstrates both cloud and local inference configurations.
    """
    use_local = os.environ.get("USE_LOCAL_LLM", "false").lower() == "true"
    print(f"[Main] Starting pipeline in {'LOCAL' if use_local else 'CLOUD'} mode.")
    manager = build_and_start_pipeline(use_local_llm=use_local)
    try:
        # Retrieve the OrchestratorCapability's contract implementation.
        # This is the only Capability the caller interacts with directly.
        orchestrator_instance = manager.get_instance("Orchestrator")
        orchestrator: OrchestratorContract = (
            orchestrator_instance.get_contract_implementation(OrchestratorContract)
        )

        # Submit a feature request to the pipeline.
        request = FeatureRequest(
            request_id="FR-2026-001",
            title="Add rate limiting to the public API",
            description=(
                "Implement token-bucket rate limiting for all public API endpoints. "
                "Each API key should have a configurable request limit per minute. "
                "Exceeded limits should return HTTP 429 with a Retry-After header."
            ),
            requester="engineering-team@company.com",
            priority="high",
        )
        print(f"\n[Main] Executing pipeline for: '{request.title}'")
        result = orchestrator.execute_pipeline(request)

        print(f"\n[Main] Pipeline completed with status: {result.pipeline_status}")
        print(f"[Main] Summary: {result.summary}")
        print(f"[Main] Engineering tasks: {len(result.engineering_tasks)}")
        print(f"[Main] Generated code artifacts: {len(result.generated_code)}")
        print(f"[Main] Review results: {len(result.review_results)}")
        print(f"[Main] Test suites: {len(result.test_suites)}")
        print(f"[Main] Documentation artifacts: {len(result.documentation)}")
    finally:
        # Always shut down gracefully, even if an exception occurred.
        print("\n[Main] Initiating graceful shutdown...")
        manager.stop_all()
        print("[Main] Shutdown complete.")


if __name__ == "__main__":
    main()
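The lazy evaluation that the Step 3 comment relies on is easy to demonstrate in isolation. Below is a condensed, self-contained sketch; `MiniRegistry`, `MiniDescriptor`, and `FakeInstance` are illustrative stand-ins, not the article's real classes. Registration merely stores the factory callable, and nothing is constructed until `start_all()` runs.

```python
from dataclasses import dataclass
from typing import Callable

construction_log: list[str] = []


class FakeInstance:
    """Stands in for a real Capability instance; records when it is built."""
    def __init__(self, name: str) -> None:
        construction_log.append(name)
        self.name = name


@dataclass
class MiniDescriptor:
    name: str
    factory: Callable[[], FakeInstance]


class MiniRegistry:
    def __init__(self) -> None:
        self._descriptors: dict[str, MiniDescriptor] = {}

    def register(self, descriptor: MiniDescriptor) -> None:
        # Registration only stores the descriptor; the factory is not called.
        self._descriptors[descriptor.name] = descriptor

    def start_all(self) -> dict[str, FakeInstance]:
        # Factories are evaluated here, at startup time, not before.
        return {name: d.factory() for name, d in self._descriptors.items()}


registry = MiniRegistry()
registry.register(MiniDescriptor("CodeGeneration", lambda: FakeInstance("CodeGeneration")))
registry.register(MiniDescriptor("Orchestrator", lambda: FakeInstance("Orchestrator")))

assert construction_log == []  # nothing has been built at registration time
instances = registry.start_all()
assert construction_log == ["CodeGeneration", "Orchestrator"]
```

This is why the LLM Realization objects in `build_and_start_pipeline` can be created up front and captured by the lambdas without any Capability coming online prematurely.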
Conclusion
Capability-Centric Architecture provides a coherent, principled answer to the architectural challenges that agentic AI systems face. By organizing the system around well-defined Capabilities — cohesive units of functionality that deliver tangible value — rather than technical layers or organizational boundaries, CCA creates a structure that is stable where it needs to be stable and flexible where it needs to be flexible.
The Capability Nucleus with its Essence, Realization, and Adaptation layers ensures that business logic is protected from infrastructure churn. The Essence contains what the agent does; the Realization contains how it does it in a specific environment; the Adaptation contains how it exposes itself to the world. These three layers can evolve independently, which means switching from cloud to local inference, adding a new communication protocol, or refactoring the core reasoning logic are all local changes that do not ripple through the system.
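The layer separation can be reduced to a few lines. The sketch below uses hypothetical class names (`ReviewEssence`, `CannedRealization`, and so on) rather than the article's pipeline classes; it shows only the shape of the idea: the Essence depends on a port, each Realization implements that port, and the Adaptation wraps the Essence for one kind of caller.

```python
from typing import Protocol


class LLMPort(Protocol):
    """The port the Essence depends on: an interface, not a vendor SDK."""
    def complete(self, prompt: str) -> str: ...


class ReviewEssence:
    """Essence: the reasoning logic, written only against the port."""
    def __init__(self, llm: LLMPort) -> None:
        self._llm = llm

    def review(self, code: str) -> str:
        return self._llm.complete(f"Review this code:\n{code}")


class CannedRealization:
    """Realization: one concrete backend (a stub standing in for a cloud API)."""
    def complete(self, prompt: str) -> str:
        return "Looks good."


class ShoutingRealization:
    """A second Realization: swapping it in touches no Essence code."""
    def complete(self, prompt: str) -> str:
        return "LOOKS GOOD."


class PlainTextAdaptation:
    """Adaptation: how the Capability is exposed to one kind of caller."""
    def __init__(self, essence: ReviewEssence) -> None:
        self._essence = essence

    def handle(self, code: str) -> str:
        return f"review: {self._essence.review(code)}"


# The same Essence runs unchanged against either Realization.
a = PlainTextAdaptation(ReviewEssence(CannedRealization()))
b = PlainTextAdaptation(ReviewEssence(ShoutingRealization()))
assert a.handle("x = 1") == "review: Looks good."
assert b.handle("x = 1") == "review: LOOKS GOOD."
```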
Capability Contracts with their Provisions, Requirements, and Protocols make all dependencies explicit and formal. Every interface is typed and documented. Every dependency is declared and verified at startup. Implicit interfaces — the invisible coupling that makes unstructured systems so fragile — become impossible by construction.
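As a minimal illustration of that startup-time verification, assuming a deliberately simplified `MiniContract` in place of the full `CapabilityContract`: every declared Requirement must match some Capability's Provision, or startup fails loudly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniContract:
    capability_name: str
    provisions: tuple[str, ...] = ()
    requirements: tuple[str, ...] = ()


def verify_contracts(contracts: list[MiniContract]) -> None:
    # Startup check: every Requirement must be some Capability's Provision.
    provided = {p for c in contracts for p in c.provisions}
    for c in contracts:
        missing = [r for r in c.requirements if r not in provided]
        if missing:
            raise RuntimeError(f"{c.capability_name} requires unprovided: {missing}")


contracts = [
    MiniContract("CodeGeneration", provisions=("CodeGenerationService",)),
    MiniContract("Orchestrator", requirements=("CodeGenerationService",)),
]
verify_contracts(contracts)  # passes: every requirement is satisfied

# A dangling requirement is caught before anything starts.
error_message = ""
try:
    verify_contracts([MiniContract("Orchestrator", requirements=("MissingService",))])
except RuntimeError as exc:
    error_message = str(exc)
assert "MissingService" in error_message
```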
The Capability Registry and Capability Lifecycle Manager together automate the most error-prone aspects of system management: dependency resolution, startup ordering, dependency injection, and graceful shutdown. Adding a new Capability to the system requires only registering it with the Registry — the infrastructure handles the rest.
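The ordering computation at the heart of that automation can be sketched with the standard library's `graphlib`. The dependency graph below is hypothetical, and a real LifecycleManager does more than sort, but the core step is a topological sort with shutdown as its reverse.

```python
from graphlib import TopologicalSorter

# Each Capability mapped to the Capabilities it requires (illustrative graph).
requires = {
    "RequirementsAnalysis": set(),
    "CodeGeneration": {"RequirementsAnalysis"},
    "CodeReview": {"CodeGeneration"},
    "Orchestrator": {"RequirementsAnalysis", "CodeGeneration", "CodeReview"},
}

# static_order() yields each node only after all of its dependencies.
start_order = list(TopologicalSorter(requires).static_order())

# Graceful shutdown simply walks the same order backwards, so nothing is
# stopped while a dependent still needs it.
stop_order = list(reversed(start_order))

assert start_order.index("RequirementsAnalysis") < start_order.index("CodeGeneration")
assert stop_order[0] == "Orchestrator"
```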
Evolution Envelopes make change a first-class concern rather than an afterthought. Every Capability carries a formal record of its versioning status, its deprecation notices, and its migration paths. The system can surface upcoming breaking changes at startup, giving teams visibility and time to adapt before problems occur in production.
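The surfacing step can be sketched as a walk over every envelope's migration paths (the `Mini*` dataclasses here are simplified stand-ins for the article's versions), producing a report the system can log once at startup where operators will see it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniMigrationPath:
    from_version: str
    to_version: str
    breaking_changes: tuple[str, ...]


@dataclass(frozen=True)
class MiniEnvelope:
    capability_name: str
    current_version: str
    migration_paths: tuple[MiniMigrationPath, ...] = ()


def startup_change_report(envelopes: list[MiniEnvelope]) -> list[str]:
    # Collect every declared breaking change into one startup-time report.
    report = []
    for env in envelopes:
        for path in env.migration_paths:
            for change in path.breaking_changes:
                report.append(
                    f"{env.capability_name} {path.from_version}->{path.to_version}: {change}"
                )
    return report


envelopes = [
    MiniEnvelope("CodeGeneration", "2.0.0", (
        MiniMigrationPath("1.x", "2.0.0",
                          ("'dependencies' is now a tuple, not a list.",)),
    )),
    MiniEnvelope("Orchestrator", "1.0.0"),  # no pending migrations
]
report = startup_change_report(envelopes)
for line in report:
    print(line)
```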
Efficiency Gradients resolve the tension between performance and abstraction by allowing different Capabilities — and different parts of the same Capability — to operate at different levels of the abstraction stack. Critical inference paths can use low-level, high-efficiency backends; non-critical coordination paths can use high-level, developer-friendly abstractions. The architecture does not impose a single uniform approach on the entire system.
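One way to express such a gradient as code is a small policy that assigns each Capability a backend tier; the two-tier split below is entirely hypothetical and only illustrates that the choice is per-Capability rather than system-wide.

```python
from enum import Enum


class Tier(Enum):
    LOW_LEVEL = "low_level"    # e.g. a hand-tuned CUDA/MLX inference path
    HIGH_LEVEL = "high_level"  # e.g. a convenient, slower framework abstraction


def select_tier(capability: str, latency_critical: frozenset[str]) -> Tier:
    # Latency-critical Capabilities take the low-level path; the rest keep
    # the developer-friendly abstraction.
    return Tier.LOW_LEVEL if capability in latency_critical else Tier.HIGH_LEVEL


critical = frozenset({"CodeGeneration"})
assert select_tier("CodeGeneration", critical) is Tier.LOW_LEVEL
assert select_tier("Documentation", critical) is Tier.HIGH_LEVEL
```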
The result is a system that is not just intelligent but genuinely engineered: structured, evolvable, testable, and governable. That is what production-grade agentic AI requires.