Friday, April 03, 2026

Self-Extending AI: How to Teach an LLM to Build Its Own Tools


 



+------------------------------------------------------------------+
|                                                                  |
|   "What if the hammer you need doesn't exist yet --              |
|    so you teach the carpenter to forge it on the spot?"          |
|                                                                  |
+------------------------------------------------------------------+

TABLE OF CONTENTS

  1. The Problem Nobody Talks About Enough
  2. A Brief Map of the Territory
  3. The Architecture of a Self-Extending Agent
  4. The Tool Registry: A Living Catalogue
  5. The Code Generator: Teaching the LLM to Write Tools
  6. The Validator: Trust, But Verify
  7. The MCP Server: Plugging Into the Protocol
  8. The Agent Loop: Where Everything Comes Alive
  9. Dynamic Agents: Beyond Tools
 10. Security, Safety, and the Responsible Path Forward
 11. Conclusion: The Beginning of Something Larger

1. THE PROBLEM NOBODY TALKS ABOUT ENOUGH

Every developer who has built a serious tool-based LLM application has hit the same invisible wall. You design your system carefully. You write a dozen tools -- functions that let the model search the web, query a database, send an email, parse a PDF, call an API. You test everything. The demo goes beautifully. And then, inevitably, a user types something like this:

  "Can you calculate the Haversine distance between these two GPS
   coordinates and then plot the result on an ASCII map?"

And your system freezes. Not because the LLM is confused -- the LLM knows exactly what to do. It freezes because the tool simply does not exist. You never wrote a Haversine calculator. You never wrote an ASCII map plotter. The model knows the answer, but it cannot reach it. It is like a surgeon who understands the procedure perfectly but has been handed the wrong instrument tray.

This is the fundamental tension at the heart of tool-based agentic AI: the set of tools you define at design time is always, inevitably, incomplete. The world is infinite. Your tool list is not.

The conventional responses to this problem are unsatisfying. You can try to anticipate every possible user need and write tools in advance -- but this is a fool's errand that leads to bloated, unmaintainable codebases. You can tell the user "sorry, that capability is not available" -- but this is a failure mode that erodes trust and utility. You can redeploy the system every time a new tool is needed -- but this breaks the promise of a living, responsive AI system.

So here is the question that should keep every AI engineer up at night:

+------------------------------------------------------------------+
|                                                                  |
|   What if the LLM could write the missing tool itself,           |
|   validate it, register it, and use it -- all in real time,      |
|   without any human intervention and without restarting?         |
|                                                                  |
+------------------------------------------------------------------+

The answer, as this article will demonstrate in considerable detail, is yes. Not only is this possible, but it is architecturally elegant, practically useful, and -- with the right safeguards -- surprisingly safe. And the implications extend far beyond tools. The same mechanism that lets an agent generate a missing function can be used to generate and integrate entirely new sub-agents, new reasoning strategies, and new orchestration patterns. We are talking about a system that grows its own cognitive apparatus on demand.

This article walks through the complete architecture of such a system, built in Python using the Model Context Protocol (MCP), asyncio, and a live tool registry. We will examine each layer in depth, understand why each design decision was made, and look at the moments where things can go wrong and how to prevent them.

Readers are assumed to have a working understanding of LLMs, function calling, and the basics of agentic AI. What follows is not a beginner's tutorial -- it is an engineering deep-dive into one of the most interesting problems in applied AI today.

The complete code is available in a GitHub repository.


2. A BRIEF MAP OF THE TERRITORY

Before we descend into code, it is worth establishing a shared mental model of the landscape we are navigating.

Modern agentic AI systems are built around a loop. The agent receives a goal, reasons about what actions to take, invokes tools to carry out those actions, observes the results, and reasons again. This loop continues until the goal is achieved or the agent gives up. The tools are the agent's hands -- the mechanisms by which it affects the world outside its own context window.

The Model Context Protocol, or MCP, is an open standard developed by Anthropic that formalises how tools are described, discovered, and invoked. An MCP server exposes a list of tools via a "tools/list" endpoint and handles invocations via a "tools/call" endpoint. An MCP client -- typically the agent -- queries the server to discover what tools are available and then calls them as needed. This clean separation between tool definition and tool invocation is what makes MCP so powerful as a foundation for dynamic systems.
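Concretely, a single entry in a "tools/list" response carries a name, a human-readable description, and a JSON Schema for its inputs. A sketch of the shape, using the running example from section 1 (field names follow the MCP specification's tool descriptor):

```python
# Shape of one tool descriptor as it appears in a tools/list response.
haversine_tool_descriptor = {
    "name": "haversine_distance",
    "description": "Calculate the great-circle distance between two GPS coordinates.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "lat1": {"type": "number"},
            "lon1": {"type": "number"},
            "lat2": {"type": "number"},
            "lon2": {"type": "number"},
        },
        "required": ["lat1", "lon1", "lat2", "lon2"],
    },
}
```

Everything the agent knows about a tool before calling it is contained in this descriptor, which is why the quality of the description and schema matters so much.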

  STANDARD (STATIC) AGENTIC SYSTEM

  +------------+       tools/list        +------------------+
  |            | ----------------------> |                  |
  |   AGENT    |                         |   MCP SERVER     |
  |   (LLM)    | <---------------------- | (fixed tool set) |
  |            |       tools/call        |                  |
  +------------+                         +------------------+
        |
        | "I need a Haversine tool..."
        |
        v
   [DEAD END -- tool does not exist]

In a static system, the tool list is fixed at startup. The server reads a configuration, loads a set of pre-written functions, and serves them forever. This is simple, predictable, and brittle.

Now consider what happens when we make the tool registry dynamic:

  SELF-EXTENDING AGENTIC SYSTEM

  +------------+    tools/list (live)    +------------------+
  |            | ----------------------> |                  |
  |   AGENT    |                         |   MCP SERVER     |
  |   (LLM)    | <---------------------- | (dynamic tools)  |
  |            |       tools/call        |                  |
  +------------+                         +--------+---------+
        |                                         |
        | "I need a Haversine tool"               |
        |                                         v
        |                              +----------+---------+
        +----------------------------> |                    |
          generate_and_register_tool   |  TOOL REGISTRY     |
                                       |  (grows at runtime)|
                                       +----------+---------+
                                                  |
                                       +----------+---------+
                                       |                    |
                                       |  CODE GENERATOR    |
                                       |  (LLM writes code) |
                                       +--------------------+

The agent, upon discovering that a tool does not exist, does not give up. Instead, it calls a special meta-tool called "generate_and_register_tool", passing a natural language description of what it needs. The code generator -- itself powered by an LLM -- writes the function, the validator checks it for safety and correctness, the registry stores it, and the MCP server immediately makes it available. The agent then calls the newly created tool as if it had always been there.

The system has extended itself. No restart. No human intervention. No deployment pipeline. The gap in capability has been filled, in real time, by the system itself.

This is the architecture we will now examine in detail.


3. THE ARCHITECTURE OF A SELF-EXTENDING AGENT

Good software architecture tells a story. Each component has a clear responsibility, a clear interface, and a clear reason to exist. The self-extending agent we are building is composed of five major components, each of which we will examine in its own section.

+------------------------------------------------------------------+
|                   SELF-EXTENDING AGENT SYSTEM                    |
+------------------------------------------------------------------+
|                                                                  |
|   +----------+     +----------+     +----------+                 |
|   |          |     |          |     |          |                 |
|   |  AGENT   +---->+   MCP    +---->+  TOOL    |                 |
|   |  LOOP    |     |  SERVER  |     | REGISTRY |                 |
|   |          |<----+          |<----+          |                 |
|   +----------+     +----+-----+     +----+-----+                 |
|                         |                |                       |
|                         |                |                       |
|                    +----v-----+     +----v-----+                 |
|                    |          |     |          |                 |
|                    |  CODE    |     |   TOOL   |                 |
|                    | GENERATOR|     |VALIDATOR |                 |
|                    |          |     |          |                 |
|                    +----------+     +----------+                 |
|                                                                  |
+------------------------------------------------------------------+

The five components and their responsibilities are as follows.

The Agent Loop is the orchestrating intelligence. It holds the conversation history, decides when to call tools, interprets tool results, and determines when a goal has been achieved. It is the component that first notices a tool is missing and decides to generate one.

The MCP Server is the protocol boundary. It translates between the agent's tool calls and the registry's internal representation. It also hosts the meta-tools -- the built-in tools that allow the agent to manage the registry itself.

The Tool Registry is the living catalogue of available tools. It stores tool metadata, source code, callable functions, and usage statistics. It is thread-safe, async-native, and designed to grow at runtime.

The Code Generator is the creative engine. It takes a natural language description of a desired capability and produces working Python code, complete with type annotations and a docstring that the registry uses to build the tool's JSON Schema.

The Tool Validator is the safety layer. Before any generated code is allowed into the registry, the validator performs static analysis, checking for dangerous patterns, verifying the function signature, and ensuring the code is structurally sound.

These five components form a closed loop of capability generation. Let us examine each one in turn.


4. THE TOOL REGISTRY: A LIVING CATALOGUE

The Tool Registry is the heart of the system. Everything else exists to serve it or to consume from it. Its job is deceptively simple: keep track of what tools exist, store their code and metadata, and make them callable. But the implementation details reveal a rich set of engineering challenges.

The first challenge is concurrency. In an async Python system, multiple agent turns may be running concurrently. One turn might be calling a tool while another is registering a new one. Without careful synchronisation, this leads to race conditions, stale reads, and corrupted state. The registry uses asyncio.Lock to protect all mutations, but with a critical subtlety: the lock is acquired and released within a single coroutine frame, never held across an await boundary that could cause a deadlock.
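A minimal sketch of that lock discipline (class and attribute names mirror the article's registry but are illustrative): the lock guards the entry lookup and the statistics updates, while the tool invocation itself runs outside the critical section.

```python
import asyncio
from typing import Any


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, Any] = {}
        self._lock = asyncio.Lock()

    async def call_tool(self, name: str, arguments: dict[str, Any]) -> Any:
        # Hold the lock only long enough to read the entry and bump stats.
        async with self._lock:
            entry = self._tools[name]
            entry.call_count += 1
        try:
            # The tool runs outside the critical section, so a slow tool
            # never blocks concurrent registrations or lookups.
            return entry.callable_fn(**arguments)
        except Exception as exc:
            async with self._lock:
                entry.last_error = str(exc)
            raise
```

Note that the lock is released before the tool runs and re-acquired only to record a failure, which is exactly the "never held across an await boundary" rule in practice.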

The second challenge is the representation of a tool. A tool is not just a function -- it is a bundle of related information. It has a name, a description, a JSON Schema describing its inputs, the original source code (for introspection and debugging), a callable reference, usage statistics, and tags for categorisation. We capture all of this in a dataclass called ToolEntry:

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable


@dataclass
class ToolEntry:
    name: str
    description: str
    input_schema: dict[str, Any]
    source_code: str
    callable_fn: Callable[..., Any]
    tags: list[str] = field(default_factory=list)
    call_count: int = 0
    last_error: str | None = None
    # datetime.utcnow() is deprecated; store an aware UTC timestamp instead.
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

Notice that the ToolEntry carries both the source code as a string and the callable_fn as a live Python callable. This duality is intentional and important. The source code is stored for transparency -- so the agent can inspect what a tool does, and so a human operator can audit the system's self-generated code. The callable is stored for performance -- so that invocation does not require re-parsing or re-compiling the source on every call.

The third challenge is the JSON Schema. MCP requires every tool to declare its input schema in JSON Schema format. For hand-written tools, you write this schema manually. For dynamically generated tools, you need to derive it automatically from the function's type annotations. The registry does this using Python's inspect module combined with a type-annotation-to-JSON-Schema converter:

def _annotation_to_json_schema(annotation: Any) -> dict[str, Any]:
    if annotation is str:
        return {"type": "string"}
    if annotation is int:
        return {"type": "integer"}
    if annotation is float:
        return {"type": "number"}
    if annotation is bool:
        return {"type": "boolean"}
    if annotation is list:
        return {"type": "array"}
    if annotation is dict:
        return {"type": "object"}
    return {}

This function is called for each parameter in the generated function's signature. The result is assembled into a complete JSON Schema object that the MCP server can serve to the agent, allowing the agent to understand exactly what arguments the new tool expects.
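Putting the pieces together, the per-parameter conversion is driven by inspect.signature. This sketch inlines the type map from above; build_input_schema is an illustrative name, not necessarily the repository's:

```python
import inspect
from typing import Any

# Inlined version of the annotation-to-schema mapping shown above.
_TYPE_MAP: dict[Any, dict[str, str]] = {
    str: {"type": "string"}, int: {"type": "integer"},
    float: {"type": "number"}, bool: {"type": "boolean"},
    list: {"type": "array"}, dict: {"type": "object"},
}


def build_input_schema(fn: Any) -> dict[str, Any]:
    """Derive a JSON Schema object from a function's type annotations."""
    properties: dict[str, Any] = {}
    required: list[str] = []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = dict(_TYPE_MAP.get(param.annotation, {}))
        # Parameters without a default value are required arguments.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {"type": "object", "properties": properties, "required": required}
```

A parameter with a default value is simply omitted from the "required" list, which the agent-side LLM interprets as an optional argument.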

The fourth challenge is the registration process itself. When a new tool arrives -- as source code from the generator -- the registry must execute that code in a controlled namespace and extract the resulting function. This is done using Python's built-in exec() function, which is powerful and dangerous in equal measure. The registry uses a restricted namespace and relies on the validator (described in the next section) to ensure the code is safe before exec() is ever called:

async def register_from_source(
    self, source_code: str, tags: list[str] | None = None
) -> ToolEntry:
    # A fresh namespace: no module globals leak into the generated code.
    # Builtins remain available so validated code can still `import math`
    # and friends; the validator is what keeps dangerous imports and
    # calls out before exec() is ever reached.
    namespace: dict[str, Any] = {"__builtins__": __builtins__}
    exec(compile(source_code, "<generated>", "exec"), namespace)
    # ... extract function, build schema, store entry ...

The compile() call before exec() is not just a performance optimisation -- it also provides a second opportunity to catch syntax errors before they corrupt the registry's state.
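The elided extraction step amounts to finding the single new function in the executed namespace. A sketch (the function name is illustrative):

```python
import inspect
from typing import Any, Callable


def extract_new_function(namespace: dict[str, Any]) -> Callable[..., Any]:
    """Pull the single generated function out of an exec'd namespace."""
    functions = [
        obj for name, obj in namespace.items()
        if inspect.isfunction(obj) and not name.startswith("_")
    ]
    # The validator guarantees exactly one top-level function, but the
    # registry checks again rather than trusting upstream components.
    if len(functions) != 1:
        raise ValueError(
            f"Expected exactly one function, found {len(functions)}"
        )
    return functions[0]
```

Checking the count here, even though the validator already enforces it, keeps the registry correct if it is ever fed source code from another path.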

The fifth and final challenge is introspection. The agent needs to be able to ask the registry questions: What tools exist? How many times has each tool been called? What was the last error? The registry exposes a get_stats() method that returns a structured summary of all this information, which the agent can use to make intelligent decisions about whether to reuse an existing tool or generate a new one.

  TOOL REGISTRY -- INTERNAL STATE

  +---------------------------------------------------------+
  |  _tools: dict[str, ToolEntry]                           |
  |                                                         |
  |  "haversine_distance"                                   |
  |    +- description: "Calculates distance between..."     |
  |    +- input_schema: {lat1, lon1, lat2, lon2}            |
  |    +- source_code: "def haversine_distance(...):\n..."  |
  |    +- callable_fn: <function haversine_distance>        |
  |    +- call_count: 7                                     |
  |    +- last_error: None                                  |
  |    +- tags: ["math", "geography"]                       |
  |                                                         |
  |  "ascii_map_plotter"                                    |
  |    +- description: "Renders a simple ASCII map..."      |
  |    +- ...                                               |
  +---------------------------------------------------------+

The registry is, in a very real sense, the agent's long-term procedural memory. It accumulates tools across sessions (if persisted to disk), grows smarter over time, and allows the agent to avoid regenerating tools it has already created. This is a form of learning that does not require retraining the underlying model.


5. THE CODE GENERATOR: TEACHING THE LLM TO WRITE TOOLS

The Code Generator is where the magic happens, and also where the most interesting engineering challenges live. Its job is to take a natural language description of a desired capability and produce a working, well-formed Python function that the registry can accept.

The generator is itself powered by an LLM -- the same model that drives the agent, or a separate one dedicated to code generation. This creates a fascinating recursive structure: the LLM is using an LLM to extend its own capabilities. The outer LLM decides it needs a tool; the inner LLM writes the tool; the outer LLM then uses it.

  CODE GENERATION PIPELINE

  Natural Language Description
           |
           v
  +--------+--------+
  |                 |
  |  PROMPT BUILDER |  <-- injects schema requirements,
  |                 |      safety constraints, examples
  +--------+--------+
           |
           v
  +--------+--------+
  |                 |
  |   LLM API CALL  |  <-- temperature low (0.2),
  |                 |      deterministic preferred
  +--------+--------+
           |
           v
  +--------+--------+
  |                 |
  |  CODE EXTRACTOR |  <-- strips markdown fences,
  |                 |      isolates the function
  +--------+--------+
           |
           v
  +--------+--------+
  |                 |
  |   VALIDATOR     |  <-- static analysis, safety checks
  |                 |
  +--------+--------+
           |
           v
  +--------+--------+
  |                 |
  |    REGISTRY     |  <-- exec(), store, serve
  |                 |
  +-----------------+

The prompt engineering for code generation is critically important and deserves careful attention. A naive prompt like "write a Python function that calculates Haversine distance" will produce code, but it will likely be missing type annotations, will have an inconsistent style, and may import libraries that are not available in the execution environment. A well-engineered prompt is far more specific:

SYSTEM_PROMPT = """
You are an expert Python engineer writing tool functions for an
AI agent system. You must follow these rules without exception:

1. Write exactly ONE Python function with the given name.
2. Every parameter must have a type annotation.
3. The return type must be annotated.
4. The function must have a Google-style docstring with an Args
   section and a Returns section.
5. Use only the Python standard library. Do not import third-party
   packages unless explicitly told they are available.
6. The function must be synchronous (not async).
7. Handle errors gracefully and return meaningful error messages.
8. Do not include any code outside the function definition.
"""

Each of these constraints exists for a concrete reason. The single function requirement prevents the generator from producing helper classes or module-level state that the registry cannot handle. The type annotation requirement enables automatic JSON Schema generation. The docstring requirement provides the description that the agent uses to decide when to call the tool. The standard-library-only constraint is a safety measure that prevents dependency hell. The synchronous requirement simplifies the execution model in the registry.
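For reference, a function that satisfies all eight rules looks like this. It is hand-written here to illustrate the target format, not captured generator output:

```python
import math


def haversine_distance(
    lat1: float, lon1: float, lat2: float, lon2: float
) -> float:
    """Calculate the great-circle distance between two GPS coordinates.

    Args:
        lat1: Latitude of the first point in decimal degrees.
        lon1: Longitude of the first point in decimal degrees.
        lat2: Latitude of the second point in decimal degrees.
        lon2: Longitude of the second point in decimal degrees.

    Returns:
        The distance between the two points in kilometres.
    """
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))
```

One function, every parameter annotated, return type declared, Google-style docstring, standard library only, synchronous: exactly what the registry's schema builder and the agent's tool-selection logic need.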

The code extraction step deserves special mention. LLMs, even when instructed to produce only code, frequently wrap their output in Markdown fences (triple backticks). The extractor must handle this gracefully, stripping the fences and any surrounding prose to isolate the raw Python source. A robust extractor uses a combination of regex matching and heuristic line-by-line scanning:

def extract_code(raw_response: str) -> str:
    # Try to extract from a markdown code fence first.
    fence_match = re.search(
        r"```(?:python)?\n(.*?)```", raw_response, re.DOTALL
    )
    if fence_match:
        return fence_match.group(1).strip()
    # Fall back to finding the first 'def' line.
    lines = raw_response.splitlines()
    start = next(
        (i for i, line in enumerate(lines) if line.startswith("def ")), None
    )
    if start is not None:
        return "\n".join(lines[start:]).strip()
    raise ValueError("No function definition found in LLM response.")

The temperature setting for the code generation LLM call is an important tuning parameter. Higher temperatures produce more creative but less reliable code. Lower temperatures produce more predictable, syntactically correct code but may miss creative solutions to unusual problems. In practice, a temperature of 0.2 strikes the right balance for code generation -- low enough to be reliable, high enough to handle unusual capability descriptions without getting stuck in degenerate patterns.

One of the most powerful features of the generation pipeline is retry logic. If the validator rejects the generated code, the error message from the validator is fed back into the LLM as a correction prompt:

for attempt in range(max_retries):
    raw = await self._call_llm(prompt)
    code = extract_code(raw)
    valid, errors = await self._validator.validate(code)
    if valid:
        return await self._registry.register_from_source(code, tags)
    # Feed errors back to the LLM for self-correction.
    prompt = self._build_correction_prompt(prompt, code, errors)

raise RuntimeError(f"Failed after {max_retries} attempts.")

This retry-with-feedback loop is a microcosm of the broader agentic pattern: observe, reason, act, observe again. The code generator is itself a tiny agent, trying to satisfy the validator's requirements through iterative refinement. In practice, most well-described capabilities are generated correctly on the first attempt. The retry loop handles edge cases and unusual requirements.


6. THE VALIDATOR: TRUST, BUT VERIFY

The validator is the component that makes the entire system safe enough to run in production. Without it, the code generator is a mechanism for arbitrary code execution -- a security nightmare. With it, the system can confidently accept and run LLM-generated code within well-defined boundaries.

The validator operates entirely on the Abstract Syntax Tree (AST) of the generated code, never executing it. This is the key insight: you can learn an enormous amount about what code does by analysing its structure, without running it and risking the consequences of malicious or buggy behaviour.

  VALIDATION PIPELINE

  Source Code (string)
           |
           v
  +--------+--------+
  |                 |
  |   ast.parse()   |  --> SyntaxError caught here
  |                 |
  +--------+--------+
           |
           v
  +--------+--------+
  |                 |
  | STRUCTURE CHECK |  --> exactly one top-level function?
  |                 |      correct name?
  +--------+--------+
           |
           v
  +--------+--------+
  |                 |
  | SIGNATURE CHECK |  --> all params annotated?
  |                 |      return type present?
  +--------+--------+
           |
           v
  +--------+--------+
  |                 |
  | SECURITY SCAN   |  --> forbidden calls? dangerous imports?
  |                 |      network access? file system writes?
  +--------+--------+
           |
           v
  +--------+--------+
  |                 |
  | COMPLEXITY CHECK|  --> too many lines? too deeply nested?
  |                 |
  +-----------------+

The security scan is the most important step. It walks the AST looking for patterns that indicate dangerous behaviour. The list of forbidden patterns includes direct calls to exec() or eval() (which would allow the generated code to execute further arbitrary code), imports of modules like os, sys, subprocess, socket, and shutil (which provide access to the file system, network, and process management), and any use of import or importlib (which could be used to circumvent the import blacklist):

FORBIDDEN_IMPORTS = {
    "os", "sys", "subprocess", "socket", "shutil",
    "importlib", "ctypes", "pickle", "marshal",
}

FORBIDDEN_CALLS = {
    "exec", "eval", "compile", "__import__",
    "open", "breakpoint",
}

class SecurityVisitor(ast.NodeVisitor):
    def __init__(self) -> None:
        self.errors: list[str] = []

    def visit_Import(self, node: ast.Import) -> None:
        for alias in node.names:
            root = alias.name.split(".")[0]
            if root in FORBIDDEN_IMPORTS:
                self.errors.append(
                    f"Forbidden import: '{alias.name}'"
                )
        self.generic_visit(node)

    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
        # 'from os import path' must be caught as well as 'import os'.
        root = (node.module or "").split(".")[0]
        if root in FORBIDDEN_IMPORTS:
            self.errors.append(
                f"Forbidden import: 'from {node.module} import ...'"
            )
        self.generic_visit(node)

    def visit_Call(self, node: ast.Call) -> None:
        if isinstance(node.func, ast.Name):
            if node.func.id in FORBIDDEN_CALLS:
                self.errors.append(
                    f"Forbidden call: '{node.func.id}()'"
                )
        self.generic_visit(node)

The complexity check is a subtler safety measure. Code that is extremely long or deeply nested is harder to reason about and more likely to contain bugs or hidden malicious logic. By setting limits on the number of lines (say, 150) and the maximum nesting depth (say, 5), the validator ensures that generated tools remain simple, auditable, and predictable.
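The nesting-depth half of this check is a short recursive walk over the AST. The limit of 5 and the node set are as described above; the function name is illustrative:

```python
import ast

# Statements that introduce a new level of block nesting.
_NESTING_NODES = (ast.FunctionDef, ast.If, ast.For, ast.While, ast.With, ast.Try)


def max_nesting_depth(source: str) -> int:
    """Return the deepest block-nesting level found in the source."""
    def depth_of(node: ast.AST, current: int = 0) -> int:
        if isinstance(node, _NESTING_NODES):
            current += 1
        child_depths = [depth_of(c, current) for c in ast.iter_child_nodes(node)]
        return max([current] + child_depths)

    return depth_of(ast.parse(source))
```

A tool whose `max_nesting_depth` exceeds the limit is rejected with an error message naming the limit, which gives the generator's retry loop something concrete to correct.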

The signature check ensures that the generated function has the name that was requested, that all parameters have type annotations, and that the return type is declared. This is not just a style requirement -- it is a functional requirement, because the registry's schema generation depends on these annotations being present.

When the validator passes all checks, it returns a clean success signal. When it fails, it returns a structured list of error messages that are precise enough for the code generator's retry loop to act on. The quality of these error messages directly affects the quality of the self-correction loop -- vague errors produce vague corrections.

It is worth being honest about the limits of static analysis. A sufficiently determined adversary can write malicious code that passes AST-based checks. The validator is a strong first line of defence, not an impenetrable wall. In production systems, additional layers of defence -- sandboxed execution environments, resource limits, human review queues for generated code -- should be considered. The validator buys you a great deal of safety for very little cost, but it is not a substitute for a comprehensive security posture.


7. THE MCP SERVER: PLUGGING INTO THE PROTOCOL

The MCP Server is the component that makes the system interoperable with the broader ecosystem of MCP-compatible clients and agents. It translates between the agent's protocol-level tool calls and the registry's internal Python API.

The server is built on the low-level MCP Python SDK Server class, which gives us fine-grained control over how tools are listed and called. This is important because we need the tool list to be dynamic -- read fresh from the registry on every "tools/list" request -- rather than computed once at startup.

The most important design decision in the MCP server is the separation between meta-tools and dynamic tools. Meta-tools are the built-in tools that allow the agent to manage the registry: generate a new tool, list existing tools, inspect a tool's source code, remove a tool, and get registry statistics. These meta-tools are always present, regardless of what dynamic tools have been registered. Dynamic tools are everything the registry has accumulated at runtime.

  MCP SERVER -- TOOL NAMESPACE

  +-----------------------------------------------------------+
  |  META-TOOLS (always present)                              |
  |    - generate_and_register_tool                           |
  |    - list_registered_tools                                |
  |    - get_tool_source                                      |
  |    - remove_tool                                          |
  |    - get_registry_stats                                   |
  +-----------------------------------------------------------+
  |  DYNAMIC TOOLS (grow at runtime)                          |
  |    - haversine_distance          (generated at 14:03:22)  |
  |    - ascii_map_plotter           (generated at 14:05:11)  |
  |    - compound_interest_calc      (generated at 14:09:44)  |
  |    - ...                                                  |
  +-----------------------------------------------------------+

The "tools/list" handler is elegantly simple precisely because of the registry's clean API. It reads the current state of the registry on every call, ensuring that any tool registered since the last list request is immediately visible:

@self._server.list_tools()
async def handle_list_tools() -> list[mcp_types.Tool]:
    meta_tools = make_meta_tools()
    dynamic_entries = await self._registry.get_all_tools()
    dynamic_tools = [entry_to_mcp_tool(e) for e in dynamic_entries]
    return meta_tools + dynamic_tools

The entry_to_mcp_tool() conversion function is the bridge between the registry's internal ToolEntry representation and the MCP protocol's Tool descriptor. It extracts the name, description, and input schema from the ToolEntry and packages them into the format the protocol expects. This conversion is trivial in code but architecturally significant -- it is the point where the registry's internal world meets the external protocol world.

The "tools/call" handler routes incoming calls to either the meta-tool dispatcher or the registry's call_tool() method. Error handling here is critical: any exception from a tool call must be caught and returned as a structured error message, not allowed to propagate up and crash the server. The agent needs to be able to read the error, understand what went wrong, and decide how to proceed -- perhaps by generating a corrected version of the tool.
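The error-wrapping half of that handler can be sketched as a thin async shim. The result shape here is a stand-in dict; the real handler returns the SDK's protocol types:

```python
from typing import Any


async def safe_call(
    registry: Any, name: str, arguments: dict[str, Any]
) -> dict[str, Any]:
    """Invoke a registry tool, converting any exception into a structured
    error the agent can read and act on, instead of crashing the server."""
    try:
        result = await registry.call_tool(name, arguments)
        return {"is_error": False, "content": str(result)}
    except Exception as exc:
        # Name the exception type so the agent can reason about the failure.
        return {"is_error": True, "content": f"{type(exc).__name__}: {exc}"}
```

Including the exception type in the content is deliberate: a bare message like "bad input" is far less actionable for the agent than "ValueError: bad input".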

One subtle but important implementation detail concerns the MCP SDK's initialisation options. The get_capabilities() method requires a NotificationOptions object -- not None, not omitted, but an explicit instance. Passing None causes an AttributeError deep in the SDK when it tries to access the tools_changed attribute of the notification options. The correct import path is from mcp.server, not from mcp.server.models, a distinction that is easy to miss and painful to debug:

from mcp.server.models import InitializationOptions
from mcp.server import NotificationOptions   # NOT from mcp.server.models

# ...

capabilities = self._server.get_capabilities(
    notification_options=NotificationOptions(),
    experimental_capabilities={},
)

This kind of subtle SDK-version-specific detail is exactly the sort of thing that turns a two-hour integration into a two-day debugging session. Document it. Comment it. Never assume the obvious import path is the correct one.


8. THE AGENT LOOP: WHERE EVERYTHING COMES ALIVE

The agent loop is where all the components come together and the system begins to exhibit genuinely intelligent behaviour. Understanding the loop in detail is essential to understanding why the self-extension mechanism works as smoothly as it does.

The loop begins with a user message. The agent assembles its current context -- the conversation history, the system prompt, and the list of available tools from the MCP server -- and sends this to the LLM. The LLM responds with either a final answer or a tool call request.

  AGENT LOOP -- DETAILED FLOW

  User Message
       |
       v
  +----+----+
  |         |
  | Assemble|  <-- history + system prompt + tool list
  | Context |
  |         |
  +----+----+
       |
       v
  +----+----+
  |         |
  |  LLM    |  <-- OpenAI-compatible API call
  |  Call   |
  |         |
  +----+----+
       |
       +------------------+------------------+
       |                  |                  |
       v                  v                  v
  [Final Answer]   [Tool Call:         [Tool Call:
                    known tool]         unknown tool]
       |                  |                  |
       v                  v                  v
  [Return to       [Execute via        [Call generate_
   User]            Registry]           and_register_tool,
                         |              then retry]
                         v
                   [Observe Result]
                         |
                         v
                   [Back to LLM Call]

The most interesting path through this diagram is the rightmost one: the case where the LLM determines that it needs a tool that does not exist. In a well-prompted system, the LLM is instructed to check the available tool list before attempting a task, and to call generate_and_register_tool if the required capability is absent.

The system prompt for the agent is carefully crafted to encourage this behaviour. It explains the meta-tools, gives examples of when to use them, and explicitly instructs the model to prefer reusing existing tools over generating new ones. This last point is important for efficiency: generating a tool takes several seconds and an LLM API call, so the agent should only do it when genuinely necessary.

The conversation flow for a self-extension event looks something like this in practice:

  USER:  "Calculate the Haversine distance between Paris
          (48.8566 N, 2.3522 E) and London (51.5074 N, 0.1278 W)"

  AGENT: [checks tool list -- no Haversine tool found]
         [calls generate_and_register_tool with description:
          "Calculate the great-circle distance between two GPS
           coordinates using the Haversine formula. Parameters:
           lat1, lon1, lat2, lon2 as floats in decimal degrees.
           Returns distance in kilometres as a float."]

  SYSTEM: [LLM generates code, validator approves, registry stores]
          [tool 'haversine_distance' now available]

  AGENT: [calls haversine_distance(48.8566, 2.3522, 51.5074, -0.1278)]

  TOOL:  343.56

  AGENT: "The Haversine distance between Paris and London is
          approximately 343.56 kilometres."

The entire self-extension event -- from the agent noticing the missing tool to having a result -- takes a few seconds. From the user's perspective, the system simply answered the question. The self-extension happened invisibly, in the background, as a natural part of the agent's reasoning process.
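For concreteness, the generated `haversine_distance` tool might look something like the following. This is a plausible sketch of generator output, not the literal code produced in the transcript above:

```python
# A plausible example of what the generator might emit for the Haversine
# request: standard-library only, typed parameters, documented return value.
import math

def haversine_distance(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in decimal degrees, in km."""
    r = 6371.0  # mean Earth radius in kilometres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))
```

Called with the Paris and London coordinates from the transcript, this returns roughly 343.5 km, matching the agent's answer.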

The agent loop also handles the case where a generated tool fails at runtime. If a tool raises an exception, the error is returned to the agent as a structured message. The agent can then inspect the tool's source code using get_tool_source, reason about what went wrong, call generate_and_register_tool with a corrected description, and try again. This creates a self-healing loop that can recover from generation errors without human intervention.

One important implementation detail in the agent loop is the management of the tool list across turns. The agent fetches the tool list at the beginning of each turn, not once at startup. This ensures that tools generated in earlier turns are available in later turns. Without this, the agent would generate the same tool over and over, never realising it had already created it.
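The loop's skeleton, including the fresh tool-list fetch, can be sketched as follows. The `llm` and `server` objects and their message shapes are illustrative stand-ins for an OpenAI-compatible client and the MCP server, not a real SDK:

```python
# Minimal sketch of the agent loop. The llm/server interfaces and message
# dicts are hypothetical; only the control flow mirrors the article.
def run_turn(llm, server, history, user_message, max_steps=8):
    history.append({"role": "user", "content": user_message})
    for _ in range(max_steps):
        tools = server.list_tools()           # fetched fresh on EVERY step,
        reply = llm.chat(history, tools=tools)  # so new tools are visible
        if reply.get("tool_call") is None:    # final answer -> return to user
            history.append({"role": "assistant", "content": reply["content"]})
            return reply["content"]
        call = reply["tool_call"]             # otherwise execute and loop
        result = server.call_tool(call["name"], call["arguments"])
        history.append({"role": "tool", "name": call["name"],
                        "content": str(result)})
    return "step budget exhausted"
```

Because `list_tools()` sits inside the loop rather than before it, a tool registered by `generate_and_register_tool` on one step is already available on the next.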


9. DYNAMIC AGENTS: BEYOND TOOLS

Everything discussed so far has focused on dynamically generating tools -- individual functions that perform specific computations or actions. But the same architecture can be extended to generate and integrate entirely new agents. This is where the concept of a self-extending system becomes genuinely profound.

Consider the difference between a tool and an agent. A tool is a stateless function: it takes inputs, performs a computation, and returns an output. An agent is a stateful reasoning loop: it maintains context, makes sequences of decisions, calls tools of its own, and pursues goals over multiple steps. An agent is, in a sense, a tool that can think.

In the MCP framework, an agent can be exposed as a tool. From the calling agent's perspective, invoking a sub-agent looks identical to invoking a simple function -- it sends arguments and receives a result. The fact that the sub-agent internally runs a multi-step reasoning loop is an implementation detail hidden behind the tool interface.

  DYNAMIC AGENT GENERATION

  Orchestrator Agent
         |
         | "I need a sub-agent that can research a topic,
         |  summarise findings, and check facts"
         |
         v
  generate_and_register_tool
         |
         v
  Code Generator
         |
         | [generates an agent function that internally
         |  runs its own LLM loop with its own tools]
         |
         v
  Tool Registry
         |
         v
  "research_and_verify_agent" now available as a tool
         |
         v
  Orchestrator calls research_and_verify_agent("quantum computing")
         |
         v
  Sub-agent runs its own loop, returns structured summary

The generated agent function is a Python function that, internally, instantiates an LLM client, runs a reasoning loop, and returns a structured result. From the registry's perspective, it is just another callable. From the orchestrator's perspective, it is just another tool. But from the system's perspective, it is a new cognitive module that did not exist five minutes ago.
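The shape of such a generated agent function can be sketched like this. The internal reasoning loop is reduced to a stub (`llm.run_step` is a hypothetical interface); in a real system each step would be one or more LLM calls with tool use:

```python
# Sketch of a generated sub-agent exposed as an ordinary tool. The llm
# interface and step names are illustrative assumptions.
def make_research_agent(llm, tools):
    """Wrap a multi-step agent loop behind a plain-function interface."""
    def research_and_verify_agent(topic: str) -> dict:
        findings = []
        for step in ("search", "summarise", "fact-check"):
            # Each step would normally be an LLM call plus its own tool calls.
            findings.append(llm.run_step(step, topic, tools))
        return {"topic": topic, "findings": findings}
    return research_and_verify_agent
```

To the registry, `research_and_verify_agent` is just another callable with a name, a description, and a schema; the multi-step loop inside is invisible at the interface.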

This pattern -- orchestrators generating sub-agents on demand -- suggests a path toward genuinely adaptive AI systems. An orchestrator that encounters a complex, multi-step task it cannot handle with its current tools can generate a specialised sub-agent for that task, delegate to it, and incorporate its results. The sub-agent can, in turn, generate its own tools or sub-agents if needed. The system grows its own cognitive architecture in response to the demands placed on it.

The implications are significant enough to warrant a moment of reflection. We are describing a system that can, in principle, expand its own capabilities without limit, in any direction, in response to any demand. This is extraordinarily powerful. It is also, as we will discuss in the next section, a capability that demands extraordinary care.

There are practical limits, of course. Generated agents are only as good as the LLM that generates them. Complex agent architectures are difficult to specify in natural language. The validator's constraints limit what kinds of code can be generated. And the standard-library-only constraint means that generated agents cannot use specialised frameworks without explicit allowlisting. But within these limits, the space of possible dynamically generated agents is vast.

One particularly promising application is the generation of domain-specific agents. An orchestrator handling a medical query might generate a sub-agent specialised in medical literature search and clinical reasoning. An orchestrator handling a financial query might generate a sub-agent specialised in market data analysis and risk calculation. Each of these sub-agents is generated once, stored in the registry, and reused for all subsequent queries of the same type. Over time, the system accumulates a library of specialised cognitive modules, each tailored to a specific domain or task type.

This is, in a very real sense, a form of learning. Not the gradient-descent kind of learning that updates model weights -- but a higher-level, architectural kind of learning that accumulates reusable cognitive structures. The system gets better at handling the kinds of tasks it has seen before, not by changing its underlying model, but by building up a library of tools and agents that encode hard-won problem-solving strategies.


10. SECURITY, SAFETY, AND THE RESPONSIBLE PATH FORWARD

No article about dynamically executing LLM-generated code would be complete without an honest, detailed discussion of the risks. The self-extending agent is a powerful system, and powerful systems can cause serious harm if deployed carelessly.

The threat model has several distinct layers, and each requires its own mitigation strategy.

  THREAT MODEL

  +----------------------------------------------------------+
  |  THREAT LAYER 1: Prompt Injection                        |
  |  A malicious user crafts a prompt that causes the agent  |
  |  to generate a tool with harmful behaviour.              |
  |  MITIGATION: Validator, sandboxing, human review queue.  |
  +----------------------------------------------------------+
  |  THREAT LAYER 2: LLM Hallucination                       |
  |  The code generator produces plausible-looking but       |
  |  incorrect code that passes validation but fails at      |
  |  runtime in subtle ways.                                 |
  |  MITIGATION: Retry loop, runtime error capture, testing. |
  +----------------------------------------------------------+
  |  THREAT LAYER 3: Resource Exhaustion                     |
  |  A generated tool enters an infinite loop or allocates   |
  |  unbounded memory.                                       |
  |  MITIGATION: Execution timeouts, memory limits,          |
  |              asyncio.wait_for() wrappers.                |
  +----------------------------------------------------------+
  |  THREAT LAYER 4: Registry Pollution                      |
  |  The registry accumulates many low-quality or redundant  |
  |  tools, degrading performance and confusing the agent.   |
  |  MITIGATION: Tool quality scoring, automatic pruning,    |
  |              human review of the tool catalogue.         |
  +----------------------------------------------------------+
  |  THREAT LAYER 5: Cascading Failures                      |
  |  A generated sub-agent generates its own sub-agents,     |
  |  which generate further sub-agents, consuming resources  |
  |  without bound.                                          |
  |  MITIGATION: Recursion depth limits, generation budgets, |
  |              circuit breakers.                           |
  +----------------------------------------------------------+

The validator addresses the most immediate security concern -- arbitrary code execution -- but it is not sufficient on its own. A defence-in-depth approach is required. The execution environment for generated tools should be sandboxed, ideally using a container or a restricted Python interpreter that limits access to the file system, network, and process management. The asyncio.wait_for() wrapper should be applied to all tool calls to enforce execution time limits. Memory usage should be monitored and capped.
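The timeout wrapper is the simplest of these safeguards to show concretely. A minimal sketch, assuming tools are exposed as async callables:

```python
# Sketch of the execution-time safeguard: every tool call goes through
# asyncio.wait_for() so a runaway tool cannot hang the server.
import asyncio

async def call_with_timeout(tool, arguments, timeout_s=10.0):
    try:
        return await asyncio.wait_for(tool(**arguments), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Returned as a structured error the agent can reason about.
        return {"is_error": True,
                "content": f"tool exceeded {timeout_s}s execution limit"}
```

Note that `wait_for` cancels the offending task but cannot reclaim memory it already allocated; memory caps need OS- or container-level enforcement.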

The prompt injection threat is particularly subtle. A user who understands the system's architecture might craft a capability description that sounds innocent but produces harmful code. For example, a description like "a tool that reads configuration from the environment and returns it" might produce code that exfiltrates environment variables. The validator's import blacklist catches the most obvious cases (os.environ requires the os module), but creative attackers may find ways around it. Human review of generated code, at least for sensitive deployments, is a prudent additional safeguard.

The registry pollution problem is underappreciated. As the system runs over time, it accumulates tools. Some of these tools are high-quality and frequently used. Others are one-off solutions to unusual problems, generated once and never called again. A large, cluttered registry degrades the agent's ability to find the right tool -- the tool list becomes so long that the LLM's attention is diluted across hundreds of options. Automatic pruning strategies -- removing tools that have not been called in a certain time window, or that have a high error rate -- help keep the registry lean and useful.
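A pruning pass over per-tool usage statistics might look like this. The `ToolStats` shape is an assumption for illustration; a real registry would track these counters on each call:

```python
# Sketch of an automatic pruning pass. The ToolStats fields are a
# hypothetical shape for the registry's per-tool usage counters.
import time
from dataclasses import dataclass

@dataclass
class ToolStats:
    name: str
    last_called: float   # unix timestamp of most recent call
    calls: int
    errors: int

def prune(stats, now=None, max_idle_s=7 * 24 * 3600, max_error_rate=0.5):
    """Return names of tools to remove: stale or unreliable ones."""
    now = now if now is not None else time.time()
    doomed = []
    for s in stats:
        idle_too_long = (now - s.last_called) > max_idle_s
        error_rate = s.errors / s.calls if s.calls else 0.0
        if idle_too_long or error_rate > max_error_rate:
            doomed.append(s.name)
    return doomed
```

Running this on a schedule, with the doomed list going to a human review queue rather than straight to deletion, keeps the catalogue lean without silently discarding anything.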

The cascading agent generation problem is the most exotic risk but also the most important to think about in advance. If agents can generate sub-agents, and sub-agents can generate their own sub-agents, the system has the potential to grow without bound. A generation budget -- a hard limit on the number of tools or agents that can be generated in a single session -- is a simple and effective safeguard. A recursion depth limit prevents sub-agents from spawning their own sub-agents beyond a configurable depth.
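Both limits fit in a few lines. A sketch of a per-session guard, with illustrative names, that is checked before every generation event:

```python
# Sketch of a per-session generation budget plus recursion-depth guard.
# Class and limit names are illustrative, not from a real framework.
class GenerationBudget:
    def __init__(self, max_generations=20, max_depth=3):
        self.max_generations = max_generations
        self.max_depth = max_depth
        self.generations = 0

    def check(self, depth):
        """Raise before a new tool or agent is generated if a limit is hit."""
        if depth > self.max_depth:
            raise RuntimeError(
                f"recursion depth {depth} exceeds limit {self.max_depth}")
        if self.generations >= self.max_generations:
            raise RuntimeError("generation budget exhausted for this session")
        self.generations += 1
```

The `depth` argument is incremented each time an agent delegates generation to a sub-agent, so a runaway cascade fails fast with a readable error instead of consuming resources without bound.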

Finally, it is worth reflecting on the broader ethical dimension. A system that can extend its own capabilities is a system that can surprise its operators. The tools it generates may solve problems in unexpected ways. The agents it creates may exhibit emergent behaviours that were not anticipated. This is not necessarily bad -- surprising solutions are often the best ones -- but it requires a culture of careful monitoring, transparent logging, and willingness to intervene.

The system described in this article logs all generated code, all validation results, all tool calls, and all errors. This logging is not optional -- it is the foundation of the human oversight that makes the system trustworthy. An operator who can see exactly what code was generated, when, why, and with what results, is an operator who can maintain meaningful control over a self-extending system.


11. CONCLUSION: THE BEGINNING OF SOMETHING LARGER

We began with a simple, uncomfortable observation: tool-based AI systems are always incomplete. The world is infinite; your tool list is not. Every static system will eventually encounter a user need it cannot meet.

The self-extending agent architecture described in this article is a principled, practical response to that observation. By combining a dynamic tool registry, an LLM-powered code generator, a static analysis validator, and the Model Context Protocol, we can build systems that grow their own capabilities in real time, in response to real user needs, without human intervention and without restarting.

The key insights of this architecture are worth restating clearly.

The separation between meta-tools and dynamic tools is what makes the system self-referential without being self-destructive. The agent can manage its own tool set using the same interface it uses to call tools, without any special-casing in the agent loop itself.

The validator is what makes the system safe enough to deploy. Static AST analysis cannot catch every possible threat, but it catches the most common and most dangerous ones, and it does so without executing any code. The cost of validation is negligible; the benefit is enormous.

The retry-with-feedback loop is what makes the system robust. LLMs make mistakes. Generated code is sometimes wrong. A system that can observe its own failures, reason about them, and try again is a system that can recover from the inevitable imperfections of LLM-generated code.

The living tool list -- fetched fresh on every agent turn -- is what makes the system coherent across time. Tools generated in earlier turns are available in later turns. The agent's capabilities accumulate, turn by turn, building toward a richer and richer problem-solving repertoire.

And the extension to dynamic agents is what makes the system genuinely open-ended. When the unit of dynamic generation is not a function but an agent -- a reasoning loop with its own tools and its own context -- the system can grow cognitive structures of arbitrary complexity. It can specialise, delegate, and orchestrate in ways that were not anticipated at design time.

+------------------------------------------------------------------+
|                                                                  |
|   The most interesting AI systems are not the ones that          |
|   know the most at the start.                                    |
|                                                                  |
|   They are the ones that learn the fastest --                    |
|   not by updating their weights,                                 |
|   but by building their own tools.                               |
|                                                                  |
+------------------------------------------------------------------+

We are at the very beginning of understanding what self-extending AI systems can do. The architecture described here is a starting point, not a destination. The tool registry will grow more sophisticated. The validator will become more nuanced. The code generator will learn to produce better code from richer descriptions. The agent loop will develop more intelligent strategies for deciding when to generate versus when to reuse.

But the fundamental insight -- that an AI system can be given the ability to build its own tools, and that this ability transforms a static, brittle system into a dynamic, adaptive one -- that insight is here to stay. The hammer that forges its own hammers is a different kind of tool entirely.

And we have only just begun to understand what it can build.


  +------------------------------------------------------------+
  |  APPENDIX: COMPONENT SUMMARY                               |
  +------------------------------------------------------------+
  |                                                            |
  |  tool_registry.py   -- ToolEntry, DynamicToolRegistry      |
  |  code_generator.py  -- CodeGenerationPipeline              |
  |  tool_validator.py  -- ToolValidator, SecurityVisitor      |
  |  mcp_server.py      -- DynamicMCPServer, meta-tools        |
  |  agent.py           -- Agent loop, tool-call dispatch      |
  |  main.py            -- Entry point, component wiring       |
  |                                                            |
  +------------------------------------------------------------+
  |  KEY DEPENDENCIES                                          |
  +------------------------------------------------------------+
  |                                                            |
  |  mcp          -- Model Context Protocol Python SDK         |
  |  anyio        -- Async I/O abstraction layer               |
  |  openai       -- OpenAI-compatible API client              |
  |  asyncio      -- Python standard library async runtime     |
  |                                                            |
  +------------------------------------------------------------+

The Evolution of Software Development with Model-Driven Approaches and LLMs

 


Model-Driven Development

Model-Driven Development, often abbreviated as MDD, represents a paradigm shift in software engineering, moving the focus from writing code directly to creating abstract models of a system. At its core, MDD is a software development methodology that emphasizes the creation and use of domain models as primary artifacts throughout the software lifecycle. These models are not merely documentation; they are central to the development process, serving as the basis for understanding, designing, implementing, and even testing software systems. The fundamental idea behind MDD is to elevate the level of abstraction, allowing developers to concentrate on the "what" a system should do rather than getting immediately bogged down in the "how" it will be implemented. This separation of concerns aims to improve productivity, enhance software quality, facilitate maintainability, and promote platform independence.


The benefits of adopting an MDD approach are manifold. By working with higher-level abstractions, developers can manage complexity more effectively, especially in large and intricate systems. Models provide a clear, unambiguous representation of the system, which can reduce miscommunication among stakeholders, including business analysts, architects, and developers. Furthermore, MDD can lead to increased productivity through automation, as models can be used to generate significant portions of the executable code or other artifacts. This automation also contributes to improved quality by ensuring consistency between the model and the generated code, reducing the likelihood of manual coding errors. Software systems developed with MDD can also be more adaptable to changing technologies and platforms, as the platform-independent models can be retargeted to different implementation technologies through various transformations.


MDD encompasses several key concepts and subcategories that define its breadth and applicability. One prominent subcategory is Model-Driven Architecture, or MDA, which is a specific approach promoted by the Object Management Group (OMG). MDA distinguishes between Platform Independent Models (PIMs) and Platform Specific Models (PSMs). A PIM describes the system's functionality and structure without considering any specific implementation technology, while a PSM adapts the PIM to a particular platform, such as Java Enterprise Edition or .NET. The transformation from PIM to PSM, and subsequently to code, is a cornerstone of MDA.


Another crucial aspect of MDD involves Domain-Specific Languages, or DSLs. These are specialized programming or modeling languages tailored to a particular application domain, in contrast to general-purpose languages like Java or Python. DSLs allow domain experts to express solutions in terms that are natural to their problem space, often leading to more concise, understandable, and verifiable models. They are fundamental to capturing domain knowledge directly within the development process.


Code Generation is perhaps the most visible outcome of MDD. It is the automated process of producing source code in a target programming language directly from models. This can range from generating boilerplate code, such as data access objects or user interface components, to generating entire application layers. The effectiveness of code generation heavily relies on the precision and completeness of the input models.


Model Transformation is the process of converting one model into another. This can involve transforming a PIM into a PSM, refining a high-level conceptual model into a more detailed logical model, or even translating models between different modeling languages. Transformation rules, often expressed in specialized transformation languages, define how elements in the source model map to elements in the target model.


Finally, Model-Based Testing, or MBT, is a technique where test cases are derived from models of the system under test. By generating test cases directly from behavioral models, MBT aims to improve the quality and coverage of testing, ensuring that the implemented system behaves as specified in its models. This approach can automate the creation of test suites, making the testing process more efficient and thorough.


The Role of Large Language Models in Model-Driven Development


Large Language Models, or LLMs, are rapidly transforming various aspects of software development, and their potential to augment Model-Driven Development is particularly significant. LLMs can act as intelligent co-pilots, assisting human developers and modelers by automating repetitive tasks, generating initial drafts, and bridging the gap between natural language requirements and formal models. Their ability to understand, generate, and transform text makes them uniquely suited to interact with the textual representations often found in modeling artifacts and code.


There are several areas within MDD where LLMs are proving to be highly beneficial, significantly enhancing the efficiency and effectiveness of the development process. One such area is the Initial Model Creation from Natural Language. LLMs can interpret textual requirements, user stories, or functional specifications and translate them into preliminary model structures. For instance, a natural language description of system entities and their relationships can be converted into a class diagram or an entity-relationship model. This capability greatly accelerates the initial modeling phase, providing a solid starting point for human modelers.


LLMs are also highly effective in Assisting with Domain-Specific Language (DSL) Definition and Usage. They can help in designing the grammar and syntax of a new DSL based on a description of the domain concepts and desired expressiveness. Furthermore, once a DSL is defined, LLMs can assist users in writing correct DSL statements, suggest completions, or even generate DSL code from higher-level natural language descriptions. This democratizes the use of DSLs, making them more accessible to domain experts who might not be proficient in formal language syntax.


Another impactful application is Accelerating Code Generation from Models. While traditional MDD tools use predefined templates and transformation rules for code generation, LLMs can augment this process by generating specific code segments or filling in implementation details based on a model's structure and the target programming language. For example, given a class model, an LLM can generate data access layer code, API endpoints, or even simple business logic methods. This is particularly useful for generating boilerplate code or implementing standard design patterns.


Supporting Model Transformation Rule Development is another area where LLMs can provide substantial help. Defining complex transformation rules between different models (e.g., from a Platform Independent Model to a Platform Specific Model) can be a challenging and error-prone task. LLMs can assist by suggesting transformation patterns, generating initial rule sets based on examples, or helping to refine existing rules by explaining their effects.


Finally, LLMs are excellent at Generating Documentation and Test Cases. From a well-defined model, an LLM can automatically produce comprehensive documentation, including descriptions of entities, relationships, and behaviors. Similarly, for Model-Based Testing, LLMs can generate test scenarios, test data, and even executable test code based on the behavioral specifications embedded in models, thereby improving testing coverage and efficiency.


However, it is equally important to recognize areas within MDD where LLMs are less suitable or require significant human oversight due to their inherent limitations. One such area is Formal Verification and Proofs of Model Correctness. While LLMs can generate formal specifications or even suggest logical assertions, their current capabilities do not extend to performing rigorous mathematical proofs or symbolic reasoning to verify the correctness, completeness, or consistency of complex models. These tasks typically require specialized formal methods tools and human expertise in logic and mathematics. LLMs are pattern matchers and generators, not theorem provers.


Deep Semantic Understanding of Highly Complex, Nuanced Domain Logic also presents a challenge for LLMs. While they can handle common patterns and well-documented domains, accurately interpreting and generating models or code for highly intricate, subtle, or novel business logic often requires a level of contextual understanding and reasoning that current LLMs may lack. Ambiguities in natural language requirements, especially in critical systems, can lead to incorrect or incomplete model generation, necessitating extensive human review and correction.


Furthermore, driving Novel Modeling Paradigm Innovation is not a strength of LLMs. They excel at synthesizing and recombining existing knowledge and patterns. Creating entirely new, groundbreaking modeling languages, notations, or architectural paradigms that fundamentally change how we approach software design remains a human-driven creative and intellectual endeavor. LLMs can assist in the *implementation* of new paradigms once conceived, but not in their initial conceptualization.


Unlocking Maximum Value: Where LLMs Shine Brightest in MDD


To truly harness the power of Large Language Models in Model-Driven Development, it is essential to focus on the applications where their capabilities align most effectively with MDD's goals. These are the areas where LLMs can provide the greatest leverage, accelerating development and enhancing quality without requiring them to perform tasks beyond their current strengths.


One of the most impactful applications is Generating Conceptual and Logical Models from Requirements. The initial phase of any software project involves gathering and understanding requirements, often expressed in natural language. Translating these informal descriptions into structured models is a time-consuming and error-prone process. LLMs can significantly streamline this by acting as an intelligent interpreter. For instance, given a set of user stories or a detailed functional specification, an LLM can generate a first-draft conceptual model, such as a class diagram, an entity-relationship diagram, or a basic state machine. This output provides a concrete starting point for human modelers, who can then review, refine, and elaborate upon it.


Consider a scenario where we need to model a simple Library Management System. A natural language requirement might state: "The system should manage books and users. Users can borrow books. Each book has a title, author, and ISBN. Users have a name and a unique ID. A book can be borrowed by only one user at a time. The system needs to track the due date for each borrowed book." An LLM can process this and generate a textual representation of a class diagram, perhaps in a format like PlantUML, which can then be rendered visually.


Here is an example of how an LLM might generate a PlantUML description from a natural language requirement:


  ' Natural Language Requirement:
  ' "The system should manage books and users. Users can borrow books.
  '  Each book has a title, author, and ISBN. Users have a name and a
  '  unique ID. A book can be borrowed by only one user at a time. The
  '  system needs to track the due date for each borrowed book."

  ' LLM-generated PlantUML model:
  @startuml
  class Book {
      - String title
      - String author
      - String isbn
      - Date dueDate
  }

  class User {
      - String userId
      - String name
  }

  Book "1" -- "0..1" User : borrowed by >
  @enduml


This PlantUML snippet, generated by an LLM, defines the `Book` and `User` classes with their respective attributes and establishes a relationship in which each book is borrowed by at most one user. The `dueDate` attribute is placed within the `Book` class because it tracks the loan status of an individual copy. This output, while a draft, saves considerable manual effort and provides a structured basis for further modeling.


Another area where LLMs provide immense value is Automating Code Generation for Standard Patterns. Once a model is sufficiently detailed and precise, LLMs can be leveraged to generate executable code, particularly for common architectural layers or repetitive components. This includes generating data access objects (DAOs), repository interfaces, REST API endpoints, or even simple UI forms based on the structure defined in the models. The consistency and speed with which LLMs can generate this boilerplate code significantly reduce development time and minimize errors that often arise from manual, repetitive coding.


Continuing with our Library Management System example, once the class model is refined, an LLM can generate Python classes that represent these entities and a basic repository pattern for data persistence.


  # LLM-generated Python class for Book entity

  import datetime

  from typing import Optional


  class Book:

      """

      Represents a book in the library system.

      """

      def __init__(self, title: str, author: str, isbn: str, book_id: Optional[str] = None):

          if not title or not author or not isbn:

              raise ValueError("Title, author, and ISBN cannot be empty.")

          self.book_id = book_id if book_id else self._generate_id()

          self.title = title

          self.author = author

          self.isbn = isbn

          self.borrower_id: Optional[str] = None  # ID of the user who borrowed the book

          self.due_date: Optional[datetime.date] = None


      def _generate_id(self) -> str:

          """Generates a simple unique ID for the book. In a real system, this would be more robust."""

          import uuid

          return str(uuid.uuid4())


      def __repr__(self) -> str:

          return f"Book(ID='{self.book_id}', Title='{self.title}', Author='{self.author}', ISBN='{self.isbn}', Borrower='{self.borrower_id}', DueDate='{self.due_date}')"


      def is_borrowed(self) -> bool:

          """Checks if the book is currently borrowed."""

          return self.borrower_id is not None


      def borrow(self, user_id: str, due_date: datetime.date):

          """Marks the book as borrowed by a user with a specific due date."""

          if self.is_borrowed():

              raise ValueError(f"Book '{self.title}' is already borrowed.")

          self.borrower_id = user_id

          self.due_date = due_date


      def return_book(self):

          """Marks the book as returned."""

          if not self.is_borrowed():

              raise ValueError(f"Book '{self.title}' is not currently borrowed.")

          self.borrower_id = None

          self.due_date = None


  # LLM-generated Python class for User entity

  class User:

      """

      Represents a user in the library system.

      """

      def __init__(self, name: str, user_id: str = None):

          if not name:

              raise ValueError("User name cannot be empty.")

          self.user_id = user_id if user_id else self._generate_id()

          self.name = name


      def _generate_id(self) -> str:

          """Generates a simple unique ID for the user."""

          import uuid

          return str(uuid.uuid4())


      def __repr__(self) -> str:

          return f"User(ID='{self.user_id}', Name='{self.name}')"


This code snippet demonstrates how an LLM can generate well-structured Python classes complete with `__init__` methods, basic validation, unique ID generation, and domain-specific methods like `borrow` and `return_book`. The generated code adheres to clean code principles by encapsulating logic within the classes and providing clear method names.
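A brief usage sketch shows the generated entity methods in action. The compact `Book` restatement below mirrors the generated class above and is included only so the sketch runs standalone; the sample title and IDs are illustrative.

```python
import datetime
import uuid


# Compact restatement of the LLM-generated Book class (behavior as above).
class Book:
    def __init__(self, title, author, isbn, book_id=None):
        if not title or not author or not isbn:
            raise ValueError("Title, author, and ISBN cannot be empty.")
        self.book_id = book_id or str(uuid.uuid4())
        self.title, self.author, self.isbn = title, author, isbn
        self.borrower_id = None
        self.due_date = None

    def is_borrowed(self):
        return self.borrower_id is not None

    def borrow(self, user_id, due_date):
        if self.is_borrowed():
            raise ValueError(f"Book '{self.title}' is already borrowed.")
        self.borrower_id = user_id
        self.due_date = due_date

    def return_book(self):
        if not self.is_borrowed():
            raise ValueError(f"Book '{self.title}' is not currently borrowed.")
        self.borrower_id = None
        self.due_date = None


# Usage: borrow a book for two weeks, then return it.
book = Book("Dune", "Frank Herbert", "9780441013593")
book.borrow("user-1", datetime.date.today() + datetime.timedelta(days=14))
assert book.is_borrowed()
book.return_book()
assert not book.is_borrowed()
```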


Lastly, Facilitating Domain-Specific Language (DSL) Adoption is another critical area where LLMs excel. DSLs are powerful for expressing domain logic concisely, but their creation and usage can be challenging. LLMs can assist in several ways. They can help in defining the grammar for a new DSL by suggesting syntax rules based on a natural language description of the domain's operations and concepts. Once a DSL is defined, LLMs can then be used to interpret DSL statements and generate executable code from them. This enables domain experts to write business rules or system configurations in a language they understand, with the LLM acting as the translator to underlying programming constructs.


For our library system, we might define a simple DSL for loan rules. For example, a rule could be "A user cannot borrow more than 3 books simultaneously" or "New books cannot be borrowed for the first week after acquisition." An LLM can help define the structure for such rules and then interpret them.


Here is a simplified example of a DSL rule and how an LLM might generate code from it:


  // DSL Rule for loan policy

  // RULE: Maximum_Books_Per_User

  // DESCRIPTION: A user cannot borrow more than 3 books at any given time.

  // CONDITION: user.borrowed_books_count >= 3

  // ACTION: deny_loan("User has reached maximum borrowing limit.")


  # LLM-generated Python function from the DSL rule

  def check_maximum_books_per_user(user_id: str, library_repository) -> bool:

      """

      Checks if a user has exceeded the maximum number of borrowed books.

      Args:

          user_id: The ID of the user.

          library_repository: An instance of the LibraryRepository for data access.

      Returns:

          True if the user can borrow more books, False otherwise.

      """

      borrowed_books = library_repository.get_borrowed_books_by_user(user_id)

      if len(borrowed_books) >= 3:

          print(f"Loan denied for user {user_id}: User has reached maximum borrowing limit.")

          return False

      return True


This example illustrates how an LLM can parse a structured DSL rule and translate it into a functional Python function, complete with a docstring and a clear return value. This significantly empowers domain experts to directly influence system behavior through DSLs, while LLMs handle the translation into executable code.
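Part of this translation can also be mechanized: a small parser for the structured rule format gives both the LLM and human reviewers a fixed target. A minimal sketch, assuming the `RULE/DESCRIPTION/CONDITION/ACTION` layout shown above (the `LoanRule` type and `parse_rule` helper are illustrative, not from the running example):

```python
from dataclasses import dataclass


@dataclass
class LoanRule:
    name: str
    description: str
    condition: str
    action: str


def parse_rule(text: str) -> LoanRule:
    """Parses one rule written in the RULE/DESCRIPTION/CONDITION/ACTION format."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        # Tolerate optional leading comment markers before the keyword.
        fields[key.strip().lstrip("/ ").upper()] = value.strip()
    return LoanRule(
        name=fields["RULE"],
        description=fields["DESCRIPTION"],
        condition=fields["CONDITION"],
        action=fields["ACTION"],
    )


rule_text = """
RULE: Maximum_Books_Per_User
DESCRIPTION: A user cannot borrow more than 3 books at any given time.
CONDITION: user.borrowed_books_count >= 3
ACTION: deny_loan("User has reached maximum borrowing limit.")
"""

rule = parse_rule(rule_text)
assert rule.name == "Maximum_Books_Per_User"
```

With rules available as structured data, the `CONDITION` and `ACTION` strings can be handed to the LLM one at a time, which keeps prompts small and makes the generated functions easy to review individually.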


Practical Application: Leveraging LLMs for Enhanced MDD Workflows


Integrating Large Language Models into Model-Driven Development workflows involves a systematic approach, leveraging their generative and interpretive capabilities at various stages. The process typically follows an iterative human-in-the-loop methodology, where LLMs provide initial drafts and assistance, and human experts provide critical review and refinement.


Step 1: Requirements to Initial Model Draft


The first step in leveraging LLMs for MDD is to transform raw, natural language requirements into structured, preliminary models. This phase is crucial for establishing a common understanding and a formal basis for subsequent development.


Prompt Engineering for Model Generation: The effectiveness of an LLM in generating models from requirements heavily depends on the quality of the prompt. A well-crafted prompt should clearly state the goal (e.g., "Generate a PlantUML class diagram"), provide the natural language requirements, specify any desired modeling elements (e.g., "include attributes and relationships"), and indicate the preferred output format.


Consider the requirements for our Library Management System:

"

  • The Library Management System needs to manage books and users.
  • Books have a title, author, ISBN, and a unique book ID.
  • Users have a name and a unique user ID.
  • The system must track which user has borrowed which book, including the date of borrowing and the due date.
  • A book can only be borrowed by one user at a time.
  • The system should also manage loan transactions, recording who borrowed what and when.
"


A prompt to an LLM might look like this:

"Generate a PlantUML class diagram for a Library Management System based on the following requirements. Include classes for Book, User, and Loan. Define attributes for each class. Show relationships and their cardinalities.

Requirements:

- Books have a title, author, ISBN, and a unique book ID.

- Users have a name and a unique user ID.

- The system must track which user has borrowed which book, including the date of borrowing and the due date.

- A book can only be borrowed by one user at a time.

- The system should also manage loan transactions, recording who borrowed what and when."


The LLM would then generate a PlantUML diagram similar to the one shown previously, but potentially more detailed, including a `Loan` class to track transactions explicitly.


  @startuml

  class Book {

      - String bookId

      - String title

      - String author

      - String isbn

  }


  class User {

      - String userId

      - String name

  }


  class Loan {

      - String loanId

      - Date borrowDate

      - Date dueDate

  }


  User "1" -- "*" Loan : makes >

  Book "1" -- "0..1" Loan : is part of >

  Loan "1" -- "1" Book : references >

  Loan "1" -- "1" User : references >

  @enduml


This more refined model now includes a `Loan` class, an associative class that attaches additional attributes (like `borrowDate` and `dueDate`) to the borrowing relationship between `User` and `Book`. The LLM, guided by the prompt, has correctly identified the need for an associative class to manage the borrowing relationship and its associated data.


Step 2: Refining Models and Generating Code


Once an initial model draft is generated, the next phase involves human review, refinement, and subsequent code generation. This is an iterative process where the model evolves, and LLMs continue to assist in translating these refined models into executable code.


Iterative Model Enhancement: Human modelers review the LLM-generated model for correctness, completeness, and adherence to architectural standards. They might add more attributes, methods, or relationships, introduce inheritance, or apply design patterns. These refined models can then be fed back to the LLM with prompts for further refinement or for generating specific code artifacts.


Consider the refined PlantUML model for the Library Management System. We can now ask the LLM to generate Python ORM classes and a basic repository pattern for data persistence.


Prompt to LLM:

"Generate Python classes for the Book, User, and Loan entities based on the provided PlantUML diagram. Include appropriate data types, constructors, and basic methods for borrowing/returning books within the entities. Also, create a simple `LibraryRepository` class that manages these entities in memory, providing methods for adding, retrieving, updating, and deleting books, users, and loans. Ensure the `LibraryRepository` handles the logic for borrowing and returning books by updating the `Loan` and `Book` statuses.

PlantUML Diagram:

(Paste the PlantUML diagram from above here)"


The LLM would then generate Python code that reflects these entities and the repository, similar to the snippets shown earlier, but now including the `Loan` entity and the repository logic. This generated code would serve as a robust foundation, allowing developers to focus on implementing complex business logic rather than boilerplate.


Step 3: DSL-Driven Business Logic with LLMs


Domain-Specific Languages are powerful for capturing business rules directly. LLMs can greatly facilitate both the definition and the application of DSLs within an MDD context.


Defining a DSL with LLM assistance: An LLM can help design the syntax and semantics of a DSL. By providing examples of desired rules in natural language, the LLM can suggest a formal grammar or a structured textual format for the DSL. This empowers domain experts to contribute directly to the system's logic.


Running Example: DSL for Loan Rules and Code Generation.

Let's define a simple DSL for library loan policies. We want rules like:

1.  "A user cannot borrow more than 3 books simultaneously."

2.  "New books cannot be borrowed for the first 7 days after acquisition."

3.  "A user must return overdue books before borrowing new ones."


Prompt to LLM:

"Design a simple, structured text-based DSL for defining library loan policies. Each rule should have a unique name, a description, a condition, and an action. Then, for the following three rules, provide their representation in the designed DSL format. Finally, generate Python functions that implement these DSL rules, assuming access to a `LibraryRepository` instance."


The LLM might propose a DSL structure and then apply it:


  // Proposed DSL Structure:

  // RULE: <RuleName>

  // DESCRIPTION: <Human-readable description>

  // CONDITION: <Logical expression using domain terms>

  // ACTION: <System action to take if condition is met>


  // DSL Rule 1: Maximum_Books_Per_User

  RULE: Maximum_Books_Per_User

  DESCRIPTION: A user cannot borrow more than 3 books at any given time.

  CONDITION: user.get_borrowed_books_count() >= 3

  ACTION: deny_loan("User has reached maximum borrowing limit.")


  // DSL Rule 2: New_Book_Embargo

  RULE: New_Book_Embargo

  DESCRIPTION: New books cannot be borrowed for the first 7 days after acquisition.

  CONDITION: book.acquisition_date + 7_days > current_date

  ACTION: deny_loan("Book is too new to be borrowed.")


  // DSL Rule 3: Overdue_Book_Block

  RULE: Overdue_Book_Block

  DESCRIPTION: A user must return overdue books before borrowing new ones.

  CONDITION: user.has_overdue_books()

  ACTION: deny_loan("User has overdue books.")


Following this, the LLM would generate Python code for each rule, interpreting the `CONDITION` and `ACTION` parts into executable logic.


  # LLM-generated Python function for Maximum_Books_Per_User rule

  def enforce_maximum_books_per_user(user_id: str, library_repository) -> bool:

      """

      Enforces the rule that a user cannot borrow more than 3 books.

      Returns True if the user can borrow, False otherwise.

      """

      borrowed_books = library_repository.get_borrowed_books_by_user(user_id)

      if len(borrowed_books) >= 3:

          print(f"Loan denied for user {user_id}: User has reached maximum borrowing limit.")

          return False

      return True


  # LLM-generated Python function for New_Book_Embargo rule

  import datetime


  def enforce_new_book_embargo(book_id: str, library_repository) -> bool:

      """

      Enforces the rule that new books cannot be borrowed for the first 7 days.

      Returns True if the book can be borrowed, False otherwise.

      """

      book = library_repository.get_book_by_id(book_id)

      if book and book.acquisition_date: # Assuming book has an acquisition_date

          if (datetime.date.today() - book.acquisition_date).days < 7:

              print(f"Loan denied for book '{book.title}': Book is too new to be borrowed.")

              return False

      return True


  # LLM-generated Python function for Overdue_Book_Block rule

  def enforce_overdue_book_block(user_id: str, library_repository) -> bool:

      """

      Enforces the rule that a user cannot borrow new books if they have overdue books.

      Returns True if the user can borrow, False otherwise.

      """

      overdue_books = library_repository.get_overdue_books_by_user(user_id)

      if overdue_books:

          print(f"Loan denied for user {user_id}: User has overdue books.")

          return False

      return True


This systematic approach demonstrates how LLMs can be integrated into MDD, from initial model conceptualization to the implementation of domain-specific business logic, significantly enhancing productivity and consistency.


Constituents and Details: A Deeper Dive


The effective integration of Large Language Models into Model-Driven Development relies on understanding several key constituents and operational details. It is not about replacing human intelligence but augmenting it, creating a powerful synergy between human expertise and AI capabilities.


The LLM as an Intelligent Co-Pilot:

In the MDD context, the LLM should be viewed as an intelligent co-pilot rather than an autonomous agent. It excels at tasks that involve pattern recognition, synthesis of information, and translation between different representations (natural language, formal models, code). The LLM can quickly generate initial drafts, explore alternatives, and provide suggestions, freeing up human modelers and developers to focus on higher-level design decisions, critical thinking, and complex problem-solving. This collaborative model ensures that human oversight and domain expertise remain central to the development process, mitigating the risks associated with LLM limitations such as hallucination or misinterpretation.


Prompt Engineering: The Art of Effective Communication with LLMs for MDD:

The quality of the output from an LLM is directly proportional to the quality of the input prompt. Prompt engineering is therefore a critical skill for leveraging LLMs in MDD. Effective prompts are clear, specific, and provide sufficient context. They often include:

  • Clear Instructions: Explicitly state the desired task (e.g., "Generate a PlantUML diagram," "Write Python code").
  • Contextual Information: Provide all necessary background, such as requirements, existing model fragments, or desired architectural patterns.
  • Output Format Specification: Define the expected format of the output (e.g., "in PlantUML syntax," "as Python classes," "formatted as a DSL rule").
  • Constraints and Examples: Specify any limitations or provide examples of desired output to guide the LLM.
  • Iterative Refinement: Instead of expecting a perfect output in one go, prompts can be designed for iterative refinement, where previous LLM outputs are fed back with human corrections or additional instructions.
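These guidelines can be operationalized as reusable prompt templates rather than ad hoc strings. A small sketch using Python's standard `string.Template`; the template wording and placeholder names are illustrative assumptions:

```python
from string import Template

# A reusable prompt template embodying the guidelines above:
# a clear instruction, the context (requirements), and an explicit
# output-format specification, with the variable parts as placeholders.
MODEL_PROMPT = Template(
    "Generate a $output_format for a $system_name based on the following "
    "requirements. Include attributes and relationships with cardinalities.\n\n"
    "Requirements:\n$requirements"
)

prompt = MODEL_PROMPT.substitute(
    output_format="PlantUML class diagram",
    system_name="Library Management System",
    requirements="- Books have a title, author, ISBN, and a unique book ID.",
)
print(prompt)
```

Keeping templates under version control makes prompt refinements reviewable in the same way as code changes, which supports the iterative refinement loop described above.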


Integration Strategies:

LLMs can be integrated into MDD environments through various mechanisms:

  • API Integration: Many LLM providers offer APIs that allow developers to programmatically send prompts and receive responses. This enables custom tools or plugins to be built that interact with LLMs, embedding their capabilities directly into existing MDD platforms or IDEs.
  • Plugins and Extensions: For popular MDD tools or IDEs (e.g., Eclipse-based modeling tools, VS Code), LLM functionalities can be exposed through dedicated plugins that provide contextual assistance, code generation, or model transformation capabilities.
  • Custom Tools: Organizations can develop custom tools that orchestrate interactions between users, LLMs, and MDD frameworks. These tools can manage prompt templates, parse LLM outputs, and integrate them with model repositories.
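As a sketch of the API-integration route, the snippet below wraps a hypothetical LLM HTTP endpoint in a helper that custom MDD tooling could call. The URL, JSON request/response shape, and field names are all assumptions for illustration, not any real provider's API; injecting the client function keeps the model-generation logic testable without a live service.

```python
import json
import urllib.request

LLM_API_URL = "https://llm.example.com/v1/generate"  # hypothetical endpoint


def call_llm(prompt: str, api_url: str = LLM_API_URL) -> str:
    """Sends a prompt to a (hypothetical) LLM HTTP API and returns its text.
    The JSON schema used here is an illustrative assumption."""
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    request = urllib.request.Request(
        api_url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]


def generate_class_diagram(requirements: str, llm=call_llm) -> str:
    """Builds a model-generation prompt and delegates to the injected LLM
    client, so the function can be unit-tested with a stub."""
    prompt = (
        "Generate a PlantUML class diagram for the following requirements. "
        "Include attributes and relationships.\n\n" + requirements
    )
    return llm(prompt)
```

A custom tool built on such a helper can then parse the returned PlantUML, store it in a model repository, and surface it to the modeler for the human-in-the-loop review discussed next.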


Human-in-the-Loop: The Indispensable Role of Human Expertise:

Despite the advanced capabilities of LLMs, the "human-in-the-loop" principle is paramount in MDD. Human modelers and developers are essential for:

  • Validation and Verification: Critically reviewing LLM-generated models and code for correctness, completeness, and adherence to domain-specific nuances or architectural standards.
  • Ambiguity Resolution: Resolving ambiguities in natural language requirements that LLMs might misinterpret.
  • Strategic Decision Making: Making high-level architectural decisions, selecting appropriate modeling paradigms, and defining overall system strategy.
  • Innovation: Driving true innovation in modeling techniques, DSL design, and problem-solving approaches.
  • Error Correction: Identifying and correcting "hallucinations" or logical errors in LLM outputs.


Challenges and Considerations:

While promising, integrating LLMs into MDD also presents challenges:

  • Model Accuracy and Hallucination: LLMs can sometimes generate plausible but incorrect or non-existent information, known as hallucination. This necessitates rigorous human review of all generated artifacts.
  • Context Window Limitations: For very large or complex models, the LLM's context window (the amount of text it can process at once) might be a limiting factor, requiring strategies for breaking down problems or iterative processing.
  • Security and Data Privacy: When using external LLM services, considerations around sensitive data in prompts and outputs, intellectual property, and compliance with data privacy regulations (e.g., GDPR) are crucial.
  • Explainability and Trust: Understanding why an LLM generated a particular model or code snippet can be challenging, impacting trust and debuggability.
  • Integration Complexity: Seamlessly integrating LLM outputs into existing MDD toolchains and ensuring compatibility with various modeling languages and code generation frameworks can be complex.


Conclusion: The Future of MDD with AI Augmentation


The convergence of Model-Driven Development and Large Language Models marks a significant evolution in software engineering. MDD, with its emphasis on abstraction and automation, provides a structured framework, while LLMs offer unprecedented capabilities for intelligent assistance, generation, and transformation of textual and code artifacts.


By strategically applying LLMs to tasks such as initial model generation from natural language requirements, accelerating boilerplate code generation from refined models, and facilitating the definition and use of Domain-Specific Languages, organizations can unlock substantial gains in productivity, consistency, and overall software quality. LLMs act as powerful co-pilots, enabling human modelers and developers to operate at higher levels of abstraction, focusing on creative problem-solving and critical design decisions rather than repetitive manual tasks.


While challenges related to accuracy, verification, and integration persist, the iterative, human-in-the-loop approach ensures that these advanced AI capabilities are harnessed responsibly. The future of MDD is not one where AI replaces human expertise, but rather one where it profoundly augments it, leading to more efficient, robust, and adaptable software systems. As LLM technology continues to advance, its role in streamlining and enhancing the model-driven paradigm will only grow, paving the way for a new era of intelligent software development.


Addendum: Full Running Example Code - Library Management System


This section provides the complete, runnable Python code for the Library Management System, integrating the concepts discussed in the article, including entity classes, a simple in-memory repository, and the application of DSL-driven business rules. The whole example was generated by Anthropic's Claude Sonnet 4.


  import datetime

  import uuid

  from typing import List, Optional, Dict


  # --- 1. Entity Classes ---


  class Book:

      """

      Represents a book in the library system.

      Attributes:

          book_id (str): Unique identifier for the book.

          title (str): The title of the book.

          author (str): The author of the book.

          isbn (str): The International Standard Book Number.

          acquisition_date (datetime.date): The date the book was acquired by the library.

          borrower_id (Optional[str]): The ID of the user who borrowed the book, if any.

          due_date (Optional[datetime.date]): The date the book is due, if borrowed.

      """

      def __init__(self, title: str, author: str, isbn: str, acquisition_date: datetime.date, book_id: Optional[str] = None):

          if not title or not author or not isbn:

              raise ValueError("Title, author, and ISBN cannot be empty.")

          if not isinstance(acquisition_date, datetime.date):

              raise TypeError("Acquisition date must be a datetime.date object.")


          self.book_id = book_id if book_id else str(uuid.uuid4())

          self.title = title

          self.author = author

          self.isbn = isbn

          self.acquisition_date = acquisition_date

          self.borrower_id: Optional[str] = None

          self.due_date: Optional[datetime.date] = None


      def __repr__(self) -> str:

          return (f"Book(ID='{self.book_id}', Title='{self.title}', Author='{self.author}', "

                  f"ISBN='{self.isbn}', AcqDate='{self.acquisition_date}', "

                  f"Borrower='{self.borrower_id}', DueDate='{self.due_date}')")


      def is_borrowed(self) -> bool:

          """Checks if the book is currently borrowed."""

          return self.borrower_id is not None


      def borrow(self, user_id: str, due_date: datetime.date):

          """

          Marks the book as borrowed by a user with a specific due date.

          Raises ValueError if the book is already borrowed.

          """

          if self.is_borrowed():

              raise ValueError(f"Book '{self.title}' (ID: {self.book_id}) is already borrowed.")

          self.borrower_id = user_id

          self.due_date = due_date


      def return_book(self):

          """

          Marks the book as returned.

          Raises ValueError if the book is not currently borrowed.

          """

          if not self.is_borrowed():

              raise ValueError(f"Book '{self.title}' (ID: {self.book_id}) is not currently borrowed.")

          self.borrower_id = None

          self.due_date = None


      def is_overdue(self) -> bool:

          """Checks if the book is currently borrowed and overdue."""

          return self.is_borrowed() and self.due_date < datetime.date.today()


  class User:

      """

      Represents a user in the library system.

      Attributes:

          user_id (str): Unique identifier for the user.

          name (str): The name of the user.

      """

      def __init__(self, name: str, user_id: Optional[str] = None):

          if not name:

              raise ValueError("User name cannot be empty.")

          self.user_id = user_id if user_id else str(uuid.uuid4())

          self.name = name


      def __repr__(self) -> str:

          return f"User(ID='{self.user_id}', Name='{self.name}')"


  class Loan:

      """

      Represents a loan transaction in the library system.

      Attributes:

          loan_id (str): Unique identifier for the loan.

          book_id (str): The ID of the borrowed book.

          user_id (str): The ID of the user who borrowed the book.

          borrow_date (datetime.date): The date the book was borrowed.

          due_date (datetime.date): The date the book is due to be returned.

          return_date (Optional[datetime.date]): The actual date the book was returned, if any.

      """

      def __init__(self, book_id: str, user_id: str, borrow_date: datetime.date, due_date: datetime.date, loan_id: Optional[str] = None):

          if not book_id or not user_id:

              raise ValueError("Book ID and User ID cannot be empty for a loan.")

          if not isinstance(borrow_date, datetime.date) or not isinstance(due_date, datetime.date):

              raise TypeError("Borrow date and due date must be datetime.date objects.")

          if due_date < borrow_date:

              raise ValueError("Due date cannot be before borrow date.")


          self.loan_id = loan_id if loan_id else str(uuid.uuid4())

          self.book_id = book_id

          self.user_id = user_id

          self.borrow_date = borrow_date

          self.due_date = due_date

          self.return_date: Optional[datetime.date] = None


      def __repr__(self) -> str:

          return (f"Loan(ID='{self.loan_id}', BookID='{self.book_id}', UserID='{self.user_id}', "

                  f"BorrowDate='{self.borrow_date}', DueDate='{self.due_date}', "

                  f"ReturnDate='{self.return_date}')")


      def is_active(self) -> bool:

          """Checks if the loan is currently active (book not yet returned)."""

          return self.return_date is None


      def mark_returned(self, return_date: datetime.date):

          """Marks the loan as returned on a specific date."""

          if not self.is_active():

              raise ValueError(f"Loan {self.loan_id} is not active.")

          if not isinstance(return_date, datetime.date):

              raise TypeError("Return date must be a datetime.date object.")

          if return_date < self.borrow_date:

              raise ValueError("Return date cannot be before borrow date.")

          self.return_date = return_date


      def is_overdue_and_active(self) -> bool:

          """Checks if the loan is active and overdue."""

          return self.is_active() and self.due_date < datetime.date.today()



  # --- 2. Repository Pattern (In-memory implementation) ---


  class LibraryRepository:

      """

      Manages the storage and retrieval of Book, User, and Loan entities.

      This is an in-memory implementation for simplicity.

      """

      def __init__(self):

          self._books: Dict[str, Book] = {}

          self._users: Dict[str, User] = {}

          self._loans: Dict[str, Loan] = {}


      # Book operations

      def add_book(self, book: Book):

          if book.book_id in self._books:

              raise ValueError(f"Book with ID {book.book_id} already exists.")

          self._books[book.book_id] = book


      def get_book_by_id(self, book_id: str) -> Optional[Book]:

          return self._books.get(book_id)


      def get_all_books(self) -> List[Book]:

          return list(self._books.values())


      def update_book(self, book: Book):

          if book.book_id not in self._books:

              raise ValueError(f"Book with ID {book.book_id} does not exist for update.")

          self._books[book.book_id] = book


      def delete_book(self, book_id: str):

          if book_id not in self._books:

              raise ValueError(f"Book with ID {book_id} does not exist for deletion.")

          del self._books[book_id]


      # User operations

      def add_user(self, user: User):

          if user.user_id in self._users:

              raise ValueError(f"User with ID {user.user_id} already exists.")

          self._users[user.user_id] = user


      def get_user_by_id(self, user_id: str) -> Optional[User]:

          return self._users.get(user_id)


      def get_all_users(self) -> List[User]:

          return list(self._users.values())


      def update_user(self, user: User):

          if user.user_id not in self._users:

              raise ValueError(f"User with ID {user.user_id} does not exist for update.")

          self._users[user.user_id] = user


      def delete_user(self, user_id: str):

          if user_id not in self._users:

              raise ValueError(f"User with ID {user_id} does not exist for deletion.")

          del self._users[user_id]


      # Loan operations

      def add_loan(self, loan: Loan):

          if loan.loan_id in self._loans:

              raise ValueError(f"Loan with ID {loan.loan_id} already exists.")

          self._loans[loan.loan_id] = loan


      def get_loan_by_id(self, loan_id: str) -> Optional[Loan]:

          return self._loans.get(loan_id)


      def get_active_loans_by_user(self, user_id: str) -> List[Loan]:

          return [loan for loan in self._loans.values() if loan.user_id == user_id and loan.is_active()]


      def get_borrowed_books_by_user(self, user_id: str) -> List[Book]:

          active_loans = self.get_active_loans_by_user(user_id)

          return [self.get_book_by_id(loan.book_id) for loan in active_loans if self.get_book_by_id(loan.book_id)]


      def get_overdue_books_by_user(self, user_id: str) -> List[Book]:

          active_loans = self.get_active_loans_by_user(user_id)

          overdue_books = []

          for loan in active_loans:

              book = self.get_book_by_id(loan.book_id)

              if book and book.is_overdue():

                  overdue_books.append(book)

          return overdue_books


      def get_active_loan_for_book(self, book_id: str) -> Optional[Loan]:

          for loan in self._loans.values():

              if loan.book_id == book_id and loan.is_active():

                  return loan

          return None


      def get_all_loans(self) -> List[Loan]:

          return list(self._loans.values())



  # --- 3. DSL-Driven Business Rules (Implemented as functions) ---


  MAX_BOOKS_PER_USER = 3
  NEW_BOOK_EMBARGO_DAYS = 7


  def enforce_maximum_books_per_user(user_id: str, repository: LibraryRepository) -> bool:
      """
      Enforces the rule: a user cannot borrow more than MAX_BOOKS_PER_USER books simultaneously.
      Returns True if the user can borrow, False otherwise.
      """
      borrowed_books = repository.get_borrowed_books_by_user(user_id)
      if len(borrowed_books) >= MAX_BOOKS_PER_USER:
          print(f"  [RULE VIOLATED] Loan denied for user {user_id}: User has reached maximum borrowing limit ({MAX_BOOKS_PER_USER} books).")
          return False
      return True


  def enforce_new_book_embargo(book_id: str, repository: LibraryRepository) -> bool:
      """
      Enforces the rule: new books cannot be borrowed for the first NEW_BOOK_EMBARGO_DAYS after acquisition.
      Returns True if the book can be borrowed, False otherwise.
      """
      book = repository.get_book_by_id(book_id)
      if not book:
          print(f"  [RULE CHECK FAILED] Book with ID {book_id} not found.")
          return False  # Cannot check the rule if the book doesn't exist

      if (datetime.date.today() - book.acquisition_date).days < NEW_BOOK_EMBARGO_DAYS:
          print(f"  [RULE VIOLATED] Loan denied for book '{book.title}': Book is too new to be borrowed (acquired on {book.acquisition_date}).")
          return False
      return True


  def enforce_overdue_book_block(user_id: str, repository: LibraryRepository) -> bool:
      """
      Enforces the rule: a user must return overdue books before borrowing new ones.
      Returns True if the user can borrow, False otherwise.
      """
      overdue_books = repository.get_overdue_books_by_user(user_id)
      if overdue_books:
          print(f"  [RULE VIOLATED] Loan denied for user {user_id}: User has {len(overdue_books)} overdue books.")
          return False
      return True


  # --- 4. Library Service (Orchestrates operations and applies rules) ---


  class LibraryService:
      """
      Provides high-level operations for the library system,
      orchestrating repository interactions and applying business rules.
      """
      def __init__(self, repository: LibraryRepository):
          self._repository = repository
          self._loan_rules = [
              enforce_maximum_books_per_user,
              enforce_new_book_embargo,
              enforce_overdue_book_block,
          ]

      def register_book(self, title: str, author: str, isbn: str, acquisition_date: datetime.date) -> Book:
          book = Book(title, author, isbn, acquisition_date)
          self._repository.add_book(book)
          print(f"Registered book: {book}")
          return book

      def register_user(self, name: str) -> User:
          user = User(name)
          self._repository.add_user(user)
          print(f"Registered user: {user}")
          return user

      def borrow_book(self, book_id: str, user_id: str, loan_duration_days: int = 14) -> Optional[Loan]:
          book = self._repository.get_book_by_id(book_id)
          user = self._repository.get_user_by_id(user_id)

          if not book:
              print(f"Error: Book with ID {book_id} not found.")
              return None
          if not user:
              print(f"Error: User with ID {user_id} not found.")
              return None
          if book.is_borrowed():
              print(f"Error: Book '{book.title}' is already borrowed.")
              return None

          # Apply all loan rules. The rules take different parameters, so we
          # dispatch on each rule's declared parameters. Note that
          # co_varnames also lists a function's local variables, so we slice
          # it to the first co_argcount names (the actual parameters); a more
          # robust DSL interpreter would pass a single context object instead.
          for rule in self._loan_rules:
              params = rule.__code__.co_varnames[:rule.__code__.co_argcount]
              if "book_id" in params and "user_id" in params:
                  if not rule(book_id, user_id, self._repository):
                      return None
              elif "book_id" in params:
                  if not rule(book_id, self._repository):
                      return None
              elif "user_id" in params:
                  if not rule(user_id, self._repository):
                      return None
              # Every current rule takes book_id or user_id, so no fallback
              # branch is needed.

          borrow_date = datetime.date.today()
          due_date = borrow_date + datetime.timedelta(days=loan_duration_days)

          try:
              book.borrow(user_id, due_date)
              loan = Loan(book_id, user_id, borrow_date, due_date)
              self._repository.update_book(book)  # Update book status in repository
              self._repository.add_loan(loan)
              print(f"Successfully borrowed: '{book.title}' by '{user.name}'. Due: {due_date}")
              return loan
          except ValueError as e:
              print(f"Error during borrowing process: {e}")
              return None

      def return_book(self, book_id: str) -> bool:
          book = self._repository.get_book_by_id(book_id)
          if not book:
              print(f"Error: Book with ID {book_id} not found.")
              return False
          if not book.is_borrowed():
              print(f"Error: Book '{book.title}' is not currently borrowed.")
              return False

          loan = self._repository.get_active_loan_for_book(book_id)
          if loan:
              try:
                  book.return_book()
                  loan.mark_returned(datetime.date.today())
                  self._repository.update_book(book)
                  # The loan is mutated in place inside the repository's
                  # _loans dict, so no explicit loan update is needed.
                  print(f"Successfully returned: '{book.title}'.")
                  return True
              except ValueError as e:
                  print(f"Error during returning process: {e}")
                  return False
          else:
              print(f"Error: No active loan found for book '{book.title}'.")
              return False

      def get_user_borrowed_books(self, user_id: str) -> List[Book]:
          user = self._repository.get_user_by_id(user_id)
          if not user:
              print(f"Error: User with ID {user_id} not found.")
              return []
          return self._repository.get_borrowed_books_by_user(user_id)

      def get_user_overdue_books(self, user_id: str) -> List[Book]:
          user = self._repository.get_user_by_id(user_id)
          if not user:
              print(f"Error: User with ID {user_id} not found.")
              return []
          return self._repository.get_overdue_books_by_user(user_id)



  # --- 5. Demonstration / Main Execution ---


  if __name__ == "__main__":
      print("--- Library Management System Demonstration ---")

      repository = LibraryRepository()
      service = LibraryService(repository)

      # 1. Register books
      print("\n--- Registering Books ---")
      book1 = service.register_book("The Hitchhiker's Guide to the Galaxy", "Douglas Adams", "978-0345391803", datetime.date.today() - datetime.timedelta(days=30))
      book2 = service.register_book("Pride and Prejudice", "Jane Austen", "978-0141439518", datetime.date.today() - datetime.timedelta(days=10))
      book3 = service.register_book("1984", "George Orwell", "978-0451524935", datetime.date.today() - datetime.timedelta(days=5))
      book4 = service.register_book("New Release Novel", "Author X", "978-1234567890", datetime.date.today() - datetime.timedelta(days=2))  # A new book
      book5 = service.register_book("Another Classic", "Classic Author", "978-9876543210", datetime.date.today() - datetime.timedelta(days=20))

      # 2. Register users
      print("\n--- Registering Users ---")
      user1 = service.register_user("Alice Wonderland")
      user2 = service.register_user("Bob The Builder")

      # 3. Demonstrate borrowing (success and rule violations)
      print("\n--- Demonstrating Borrowing ---")

      print(f"\nAttempting to borrow for {user1.name}:")
      service.borrow_book(book1.book_id, user1.user_id)  # Should succeed
      service.borrow_book(book2.book_id, user1.user_id)  # Should succeed
      service.borrow_book(book5.book_id, user1.user_id)  # Should succeed

      print(f"\n{user1.name} currently has {len(service.get_user_borrowed_books(user1.user_id))} books borrowed.")
      print("Attempting to borrow a 4th book for Alice (should be denied by MAX_BOOKS_PER_USER rule):")
      service.borrow_book(book3.book_id, user1.user_id)  # Denied: borrowing limit reached

      print("\nAttempting to borrow a new book (should be denied by NEW_BOOK_EMBARGO rule):")
      service.borrow_book(book4.book_id, user2.user_id)  # Denied: book still under embargo

      # Simulate an overdue book for user2
      print("\n--- Simulating Overdue Book for User2 ---")
      overdue_book = service.register_book("Overdue Title", "Overdue Author", "978-1111111111", datetime.date.today() - datetime.timedelta(days=40))
      # Manually create a loan whose due date is already in the past
      loan_overdue = Loan(overdue_book.book_id, user2.user_id, datetime.date.today() - datetime.timedelta(days=30), datetime.date.today() - datetime.timedelta(days=10))
      repository.add_loan(loan_overdue)
      overdue_book.borrow(user2.user_id, loan_overdue.due_date)
      repository.update_book(overdue_book)
      print(f"Manually created an overdue loan for {user2.name} with book '{overdue_book.title}'.")

      print(f"\n{user2.name} has {len(service.get_user_overdue_books(user2.user_id))} overdue books.")
      print("Attempting to borrow a book for Bob (should be denied by OVERDUE_BOOK_BLOCK rule):")
      service.borrow_book(book3.book_id, user2.user_id)  # Denied: user has overdue books

      # 4. Demonstrate returning books
      print("\n--- Demonstrating Returning Books ---")
      print(f"Books borrowed by {user1.name}: {service.get_user_borrowed_books(user1.user_id)}")
      service.return_book(book1.book_id)  # Should succeed
      print(f"Books borrowed by {user1.name} after return: {service.get_user_borrowed_books(user1.user_id)}")

      print("\nAttempting to borrow for Alice again after returning a book:")
      service.borrow_book(book3.book_id, user1.user_id)  # Should now succeed

      print("\n--- Final State ---")
      print("All Books:")
      for book in repository.get_all_books():
          print(book)
      print("\nAll Users:")
      for user in repository.get_all_users():
          print(user)
      print("\nAll Loans:")
      for loan in repository.get_all_loans():
          print(loan)


This complete example demonstrates the MDD principles in practice: the entity classes define the domain model, the repository encapsulates data access, and the rule functions express business logic in a DSL-like, declarative style. The `LibraryService` acts as the application layer, orchestrating these components, while the print statements trace the flow and show how the business rules block invalid operations -- reflecting the "how to use LLMs" section, where an LLM would generate exactly these components from the models and DSL definitions.
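One rough edge worth noting: `borrow_book` dispatches rules by inspecting each function's parameter names, and the inline comment there already hints that a more robust DSL interpreter would pass a specific context object instead. The sketch below shows one way that could look. It is not part of the listing above, and the names (`LoanContext`, `loan_rule`, `check_all`) are hypothetical; repository lookups are elided so the snippet stays self-contained.

```python
# Sketch: rules registered via a decorator, all sharing one uniform
# signature, so no parameter-name introspection is needed at dispatch time.
import datetime
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class LoanContext:
    """Everything a loan rule might need, passed to every rule uniformly."""
    user_id: str
    book_id: str
    today: datetime.date = field(default_factory=datetime.date.today)

# Global registry of rules; each rule maps a LoanContext to allow/deny.
LOAN_RULES: List[Callable[[LoanContext], bool]] = []

def loan_rule(fn: Callable[[LoanContext], bool]) -> Callable[[LoanContext], bool]:
    """Decorator that registers a rule in the DSL-like rule registry."""
    LOAN_RULES.append(fn)
    return fn

@loan_rule
def max_books(ctx: LoanContext) -> bool:
    # Repository lookup elided; a real rule would count ctx.user_id's loans
    # and return False to deny the loan.
    return True

def check_all(ctx: LoanContext) -> bool:
    """A loan is allowed only if every registered rule passes."""
    return all(rule(ctx) for rule in LOAN_RULES)
```

With this shape, `borrow_book` shrinks to a single `check_all(LoanContext(user_id, book_id))` call, and adding a rule never touches the dispatch code.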