Tuesday, April 14, 2026

ORCHESTRATING INTELLIGENCE: HOW TO BUILD LARGE, COMPLEX SOFTWARE SYSTEMS AND DOCUMENTS USING MULTIPLE COORDINATED LLM PROMPTS




CHAPTER ONE: THE PROMISE AND THE PROBLEM

There is a seductive fantasy that many developers and writers entertain when they first encounter a powerful large language model. The fantasy goes something like this: you describe your entire software system, or your entire book, in one magnificent, sweeping prompt, and the model produces the whole thing in one glorious response. The fantasy is understandable. These models are genuinely remarkable. They can write code, draft prose, reason about architecture, and synthesize ideas at a speed no human team could match. But the fantasy runs headlong into a hard physical reality, and understanding that reality is the first step toward building something genuinely useful.

Every large language model operates within a context window. The context window is the total amount of text, measured in tokens, that the model can hold in its working memory at any one moment. A token is roughly three to four characters of English text, so a context window of 128,000 tokens corresponds to roughly 90,000 to 100,000 words. As of 2025, GPT-4o supports 128,000 tokens, Claude 3.5 Sonnet supports 200,000 tokens, and Gemini 1.5 Pro reaches up to one million tokens. These are impressive numbers, and they grow with each generation of models. Yet they are still finite, and finitude matters enormously when you are trying to generate a 50,000-line codebase or a 300-page technical document.
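For quick budgeting, the rule of thumb above can be turned into a rough estimator. This is a sketch using the approximately-four-characters-per-token heuristic, not a real tokenizer, so treat its numbers as ballpark figures only:

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate using the ~4-characters-per-token heuristic
    for English text. A real tokenizer gives exact counts; this is only
    for quick context budgeting."""
    return max(1, round(len(text) / chars_per_token))

def fits_in_window(text: str, window_tokens: int = 128_000) -> bool:
    """Check whether a prompt plausibly fits a given context window."""
    return estimate_tokens(text) <= window_tokens
```

A 400-character snippet estimates to about 100 tokens; a prompt of 800,000 characters estimates to roughly 200,000 tokens and would overflow a 128,000-token window.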

The naive response to this constraint is to simply wait for context windows to grow large enough that the problem disappears. This is a mistake for two reasons. The first reason is economic. Processing very long contexts is computationally expensive. Attention mechanisms, the mathematical heart of transformer-based models, scale quadratically with sequence length, which means that doubling the context length roughly quadruples the compute cost. The second reason is qualitative. Research published by Stanford and other institutions under the title "Lost in the Middle" (Liu et al., 2023, arxiv.org/abs/2307.03172) demonstrated that even when a model technically supports a very long context, its ability to reliably use information placed in the middle of that context degrades significantly. The model tends to pay disproportionate attention to the beginning and the end of the context, and information buried in the middle is effectively lost. This means that even if you could fit your entire codebase into a single prompt, the model would not reliably use all of it. Bigger context windows are necessary but not sufficient.
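The quadratic claim is easy to make concrete. Under the simplified cost-proportional-to-n-squared model (which ignores constant factors and the non-attention parts of the network), doubling the context quadruples the relative attention compute:

```python
def relative_attention_cost(context_len: int, base_len: int = 128_000) -> float:
    """Relative self-attention compute under the simplified cost ~ n^2
    model. Ignores constants and non-attention FLOPs; illustrative only."""
    return (context_len / base_len) ** 2
```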

The sophisticated response to this constraint is to embrace it as a design principle. Instead of fighting the limitation, you architect your prompting strategy around it. You decompose your large system into smaller, well-defined units, you craft each prompt to be self-contained and focused, and you create a lightweight coordination mechanism that allows each prompt to know exactly what it needs to know about the other parts of the system without needing to read those parts in full. This is the art and science that this article is about.


CHAPTER TWO: WHY NAIVE CHAINING FAILS

The most obvious approach to multi-prompt generation is what practitioners call naive prompt chaining. You generate the output of Prompt 1, then you paste that output into Prompt 2 as context, then you paste both outputs into Prompt 3, and so on. This approach is described in introductory tutorials on sites like promptingguide.ai and datacamp.com, and it works beautifully for short, linear tasks. If you are summarizing a document in three passes, or translating a short story one scene at a time, naive chaining is perfectly adequate.

For large software systems or long documents, naive chaining collapses under its own weight. Imagine you are building a web application with a backend API, a database layer, a business logic layer, a frontend, and a test suite. You generate the database schema in Prompt 1, producing perhaps 200 lines of SQL and a set of Python dataclasses. You paste those 200 lines into Prompt 2 and generate the repository layer, producing 400 lines of Python. Now you paste all 600 lines into Prompt 3 to generate the service layer. By Prompt 5 or Prompt 6, you are pasting thousands of lines of generated code into each new prompt, and you are rapidly approaching or exceeding the context limit. Even if you stay within the limit, you are wasting most of the context window on code the model does not need to read in full. The model does not need to see every line of your repository implementation in order to write the service layer. It needs to know the method signatures, the return types, and the exceptions that the repository layer exposes. Everything else is noise.
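A small simulation makes this growth visible. The token counts below are illustrative, not measurements; the point is that under naive chaining, the context pasted into prompt N is the sum of every earlier output:

```python
def chained_context_sizes(module_tokens: list[int]) -> list[int]:
    """Under naive chaining, prompt i carries the full output of every
    earlier prompt. Returns the pasted-context size for each prompt."""
    sizes, total = [], 0
    for tokens in module_tokens:
        sizes.append(total)   # context pasted into this prompt
        total += tokens       # its output joins the pile for the next one
    return sizes
```

With five modules producing 200, 400, 600, 800, and 1,000 tokens of output, the pasted context grows as 0, 200, 600, 1,200, 2,000: linear outputs, quadratically growing total context.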

The same problem afflicts document generation. If you are writing a technical book with ten chapters, and you paste the full text of Chapters 1 through 4 into the prompt for Chapter 5, you are consuming enormous amounts of context on prose that the model does not need to read word for word. What the model needs to know is what topics were covered in each previous chapter, what terminology was established, what claims were made, and what the reader can be assumed to know by the time they reach Chapter 5. A summary of those facts is far more useful, and far more context-efficient, than the full text.

The core insight is this: what subsequent prompts need from previous prompts is not the full artifact but a structured, compressed description of the artifact's interface, its contract with the rest of the system. This insight is the foundation of every sophisticated multi-prompt strategy.
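For Python code, that compressed interface description can even be extracted mechanically. The sketch below uses the standard-library ast module (Python 3.9+ for ast.unparse) to pull public function signatures out of a source file while discarding the bodies; a real pipeline would also want classes and exceptions:

```python
import ast

def _fmt_arg(a: ast.arg) -> str:
    # Keep the type annotation when the source provides one
    return f"{a.arg}: {ast.unparse(a.annotation)}" if a.annotation else a.arg

def extract_interface(source: str) -> list[str]:
    """Pull public function signatures out of Python source, discarding
    the bodies: the 'interface, not artifact' idea made mechanical."""
    signatures = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.name.startswith("_"):
                continue  # private helpers are not part of the contract
            args = ", ".join(_fmt_arg(a) for a in node.args.args)
            ret = f" -> {ast.unparse(node.returns)}" if node.returns else ""
            signatures.append(f"{node.name}({args}){ret}")
    return signatures
```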


CHAPTER THREE: THE CENTRAL CONCEPT - THE SYSTEM MANIFEST

The solution to the coordination problem is what we will call the System Manifest. The System Manifest is a single, carefully crafted document that is small enough to fit comfortably within the context window of any individual prompt, yet rich enough to give that prompt everything it needs to generate its artifact correctly and consistently with all other artifacts in the system.

Think of the System Manifest as the constitution of your project. It does not contain the full implementation of anything. It contains the rules, the contracts, the shared vocabulary, and the structural overview that every part of the system must respect. Every prompt in your multi-prompt workflow receives the System Manifest as its first input. The manifest is the one piece of shared context that travels with every prompt, and because it is deliberately kept small, it never threatens to overflow the context window.

The System Manifest has two variants depending on whether you are building software or generating a document, but the underlying philosophy is identical in both cases. Let us examine each variant in detail.

For a software system, the System Manifest contains the following categories of information. It begins with a project overview of no more than one paragraph, describing what the system does, what technology stack it uses, and what architectural pattern it follows. It then lists all modules or components of the system, giving each a name, a one-sentence description, and a reference identifier. It then provides the shared type definitions: all data transfer objects, domain entities, enumerations, and custom exceptions that are used across module boundaries. It then provides the interface contracts for each module: the public method signatures, including parameter names, types, and return types, but not the implementations. Finally, it lists the file and directory structure of the project, so that every prompt knows exactly where to place its output.

For a document, the System Manifest contains the following. It begins with a project overview describing the document's purpose, intended audience, and overall argument or thesis. It then lists all chapters or sections with their titles and a two-to-three sentence description of what each covers. It establishes the shared terminology and definitions that will be used throughout the document. It notes any cross-references: places where Chapter 3 will refer back to a concept introduced in Chapter 1, or where Chapter 7 will build on an argument made in Chapter 4. Finally, it specifies the stylistic conventions: tone, level of technicality, citation format, and any other constraints that must be consistent across all sections.

The key discipline in crafting a System Manifest is ruthless compression. Every line of the manifest must earn its place. If a piece of information is only relevant to one prompt and no other, it does not belong in the manifest. It belongs in the individual prompt for that module or chapter. The manifest contains only what is genuinely shared. This discipline keeps the manifest small and ensures that the context budget of each individual prompt is spent on the information that prompt actually needs.

Here is a concrete example of a System Manifest for a small software project. This example is deliberately simplified to illustrate the structure clearly.


SYSTEM MANIFEST v1.0
Project: TaskFlow - A lightweight task management REST API
Stack: Python 3.12, FastAPI, SQLAlchemy 2.0, PostgreSQL, Pydantic v2
Architecture: Layered (Models -> Repository -> Service -> Router)
Test Framework: pytest

MODULES:
[M1] models - SQLAlchemy ORM models and Pydantic schemas
[M2] repository - Database access layer, all SQL operations
[M3] service - Business logic, validation, orchestration
[M4] router - FastAPI route handlers, HTTP request/response
[M5] tests - pytest test suite for all layers

FILE STRUCTURE:
taskflow/
    __init__.py
    models.py            (M1)
    repository.py        (M2)
    service.py           (M3)
    router.py            (M4)
    tests/
        test_repository.py (M5)
        test_service.py    (M5)
        test_router.py     (M5)

SHARED TYPES (defined in models.py, imported by all other modules):

class TaskStatus(str, Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    DONE = "done"

class TaskCreate(BaseModel):
    title: str
    description: str | None = None
    status: TaskStatus = TaskStatus.PENDING

class TaskRead(BaseModel):
    id: int
    title: str
    description: str | None
    status: TaskStatus
    created_at: datetime

class TaskUpdate(BaseModel):
    title: str | None = None
    description: str | None = None
    status: TaskStatus | None = None

class TaskNotFoundError(Exception): pass
class TaskValidationError(Exception): pass

INTERFACE CONTRACTS:

repository.py exposes:
    create_task(db: Session, task: TaskCreate) -> TaskRead
    get_task(db: Session, task_id: int) -> TaskRead
    get_all_tasks(db: Session, skip: int, limit: int) -> list[TaskRead]
    update_task(db: Session, task_id: int, task: TaskUpdate) -> TaskRead
    delete_task(db: Session, task_id: int) -> None
    Raises: TaskNotFoundError when task_id does not exist

service.py exposes:
    create_task(task: TaskCreate, db: Session) -> TaskRead
    get_task(task_id: int, db: Session) -> TaskRead
    get_all_tasks(skip: int, limit: int, db: Session) -> list[TaskRead]
    update_task(task_id: int, task: TaskUpdate, db: Session) -> TaskRead
    delete_task(task_id: int, db: Session) -> None
    Raises: TaskNotFoundError, TaskValidationError

router.py exposes (HTTP):
    POST   /tasks       -> 201 TaskRead
    GET    /tasks       -> 200 list[TaskRead]
    GET    /tasks/{id}  -> 200 TaskRead | 404
    PUT    /tasks/{id}  -> 200 TaskRead | 404 | 422
    DELETE /tasks/{id}  -> 204 | 404

CONVENTIONS:

  • All functions are async where possible
  • All database sessions are injected via FastAPI Depends()
  • Error handling: service raises domain exceptions, router converts to HTTP
  • All dates are UTC, stored as datetime with timezone

Notice several things about this manifest. It is entirely self-contained and could be read by any developer, or any LLM, without needing to see any of the actual code. It is precise: the method signatures leave no ambiguity about what each function accepts and returns. It is compact: the entire manifest fits comfortably within a few hundred tokens, leaving the vast majority of each prompt's context window available for the actual generation task. And it is complete in the sense that matters: it contains everything that any one module needs to know about any other module in order to implement its own responsibilities correctly.


CHAPTER FOUR: THE PROMPT TEMPLATE PATTERN

With the System Manifest defined, the next architectural decision is how to structure each individual prompt. Every prompt in a multi-prompt workflow should follow the same template, which we will call the Prompt Template Pattern. This template has four sections, and the discipline of always using all four sections is what makes the resulting artifacts seamlessly integrable.

The first section is the Manifest Injection. This is simply the full System Manifest, pasted verbatim at the top of every prompt. Because the manifest is small, this costs very little context budget. Because it is always present, every prompt has access to the full shared vocabulary and interface contracts of the system.

The second section is the Role and Scope Declaration. This section tells the model exactly which module or chapter it is generating, and explicitly tells it what it is NOT responsible for generating. This negative specification is just as important as the positive one. Without it, models have a tendency to generate more than they are asked for, producing partial implementations of neighboring modules that conflict with the implementations those modules will receive in their own dedicated prompts.

The third section is the Local Context. This is the information that is specific to the current prompt and not shared with others. For a software module, this might include detailed business rules, specific algorithm requirements, or particular edge cases that only this module handles. For a document chapter, this might include the specific arguments to be made, the examples to be used, and the conclusions to be drawn. The local context is where the bulk of the prompt's content lives.

The fourth section is the Output Specification. This section tells the model exactly what format its output should take, what it should include, and what it should exclude. For code generation, this typically means specifying that the output should be a complete, runnable Python file with all necessary imports, no placeholder comments, and no explanatory prose outside of docstrings. For document generation, this typically means specifying the desired length, the heading structure, and any formatting conventions.

Here is what a prompt following this template looks like for the repository module of our TaskFlow example.


[MANIFEST INJECTION]
SYSTEM MANIFEST v1.0
Project: TaskFlow - A lightweight task management REST API
... (full manifest as shown above) ...

[ROLE AND SCOPE DECLARATION]
You are generating the file taskflow/repository.py for the TaskFlow project. Your responsibility is ONLY the repository layer (M2). Do not generate any code for models.py, service.py, router.py, or any test files. Assume that models.py already exists and exports exactly the types listed in the manifest. Import from taskflow.models as needed.

[LOCAL CONTEXT]
The repository layer must use SQLAlchemy 2.0 style (select() statements, not the legacy Query API). All database operations must be synchronous (not async) at this layer, as the async boundary is handled by FastAPI's thread pool executor. The get_task, update_task, and delete_task functions must raise TaskNotFoundError (imported from taskflow.models) when no task with the given task_id exists. The get_all_tasks function must support pagination via skip and limit parameters, with a maximum limit of 100 enforced internally. The create_task function must set created_at to the current UTC time.

[OUTPUT SPECIFICATION]
Produce a single, complete Python file. Begin with the module docstring. Include all necessary imports. Implement all five functions listed in the manifest interface contract for repository.py. Include docstrings for each function. Do not include any prose, explanation, or markdown formatting. Output only valid Python code.


This prompt is focused, complete, and self-contained. A model receiving this prompt knows exactly what to build, how it fits into the larger system, what types to use, what exceptions to raise, and what the neighboring modules will expect from it. The output will be a file that slots directly into the project without modification.
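If you generate these prompts programmatically, the four-section template reduces to a small assembly function. This is a minimal sketch; the bracketed headers follow the template used in this article:

```python
def build_module_prompt(manifest: str, scope: str, local_context: str,
                        output_spec: str) -> str:
    """Assemble a prompt from the four sections of the Prompt Template
    Pattern, in the fixed order the pattern prescribes."""
    sections = [
        ("[MANIFEST INJECTION]", manifest),
        ("[ROLE AND SCOPE DECLARATION]", scope),
        ("[LOCAL CONTEXT]", local_context),
        ("[OUTPUT SPECIFICATION]", output_spec),
    ]
    return "\n\n".join(f"{header}\n{body.strip()}" for header, body in sections)
```

Keeping assembly in one function guarantees that every prompt in the workflow has the same structure, which is exactly the discipline the pattern demands.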


CHAPTER FIVE: THE MANIFEST EVOLUTION PATTERN

One subtlety that the naive multi-prompt approach misses entirely is that the System Manifest is not always static. In some workflows, particularly in software development, the act of generating one module may reveal information that needs to be added to the manifest before subsequent prompts are run. This is the Manifest Evolution Pattern.

Consider a scenario where you are building a system and, while generating the service layer, you realize that the service needs to expose a new exception type that the router will need to handle. If you simply add this exception to the service layer code without updating the manifest, the router prompt will not know about it, and the router will not handle it correctly. The solution is to pause after generating the service layer, update the manifest to include the new exception type in the interface contracts section, and then run the router prompt with the updated manifest.

The Manifest Evolution Pattern requires discipline. You must treat the manifest as a living document that is updated whenever a generated artifact introduces something that other artifacts will need to know about. The update must be minimal: you add only the new interface element, not the implementation. And you must re-run any previously completed prompts if the update affects them, though in a well-designed system this should be rare.

A practical way to manage manifest evolution is to version the manifest explicitly, as shown in the example above with "SYSTEM MANIFEST v1.0". When you update the manifest, you increment the version number. Each generated file can include a comment at the top noting which manifest version it was generated against, making it easy to identify which files need to be regenerated if the manifest changes significantly.
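A minimal sketch of this bookkeeping is shown below. The comment format ("# Generated against: ...") is our own convention, not a standard; the point is that stale files can then be found mechanically:

```python
import re

VERSION_RE = re.compile(r"# Generated against: SYSTEM MANIFEST v(\d+)\.(\d+)")

def stamp(file_text: str, version: str) -> str:
    """Prepend a manifest-version comment (our own convention)."""
    return f"# Generated against: SYSTEM MANIFEST v{version}\n{file_text}"

def stale_files(files: dict[str, str], current: tuple[int, int]) -> list[str]:
    """Return filenames whose stamped version lags the current manifest,
    or which carry no stamp at all."""
    out = []
    for name, text in files.items():
        m = VERSION_RE.search(text)
        if m is None or (int(m.group(1)), int(m.group(2))) < current:
            out.append(name)
    return out
```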


CHAPTER SIX: THE SKELETON-FIRST STRATEGY

The most powerful strategy for multi-prompt generation of large systems is what we call the Skeleton-First Strategy, which is inspired by but distinct from the academic Skeleton-of-Thought technique described by Ning et al. (2023, arxiv.org/abs/2307.15337). The idea is to use the very first prompt in your workflow not to generate any implementation at all, but to generate the skeleton of the entire system: the manifest itself, the file structure, the interface contracts, and the type definitions.

This first prompt, which we call the Architect Prompt, is the most important prompt in the entire workflow. It receives a high-level description of the system you want to build, and it produces the System Manifest. All subsequent prompts then use this manifest as their foundation. The Architect Prompt is the only prompt that needs to understand the whole system. Every other prompt only needs to understand its own piece plus the manifest.

Here is what an Architect Prompt looks like for a software project.


You are a senior software architect. Your task is to design the architecture for the following system and produce a System Manifest that will be used to guide the implementation of each module in separate subsequent prompts.

SYSTEM DESCRIPTION: Build a REST API for a task management application called TaskFlow. The system should allow users to create, read, update, and delete tasks. Each task has a title, an optional description, a status (pending/in_progress/done), and a creation timestamp. The system should be built with Python, FastAPI, SQLAlchemy, and PostgreSQL. It should follow a layered architecture with separate modules for models, repository, service, and routing. It should include a pytest test suite.

PRODUCE: A System Manifest in plain text containing exactly the following sections:

  1. Project overview (one paragraph, technology stack, architecture pattern)
  2. Module list with identifiers, names, and one-sentence descriptions
  3. File and directory structure
  4. Shared type definitions (complete Pydantic and SQLAlchemy class definitions, enumerations, and custom exceptions that cross module boundaries)
  5. Interface contracts for each module (public method signatures with full type annotations, no implementations, and notes on exceptions raised)
  6. Conventions (async strategy, error handling pattern, date handling, etc.)

The manifest must be self-contained, precise, and compact. It will be injected into every subsequent implementation prompt, so every line must earn its place. Do not include any implementation code. Do not include any prose beyond what is specified in the sections above.


The output of this prompt is the System Manifest. You review it, refine it if necessary, and then use it as the foundation for all subsequent implementation prompts. This review step is critical and must not be skipped. The manifest is the blueprint, and errors in the blueprint propagate to every module. A few minutes spent reviewing and correcting the manifest saves hours of fixing inconsistent implementations later.

For document generation, the Architect Prompt works analogously. It receives a description of the document's purpose and scope, and it produces a Document Manifest containing the chapter list, the thesis statement, the shared terminology, the cross-reference map, and the stylistic conventions. Each subsequent Chapter Prompt then receives this Document Manifest and generates one chapter.

Here is an example of a Document Architect Prompt.


You are an expert technical writer and document architect. Your task is to design the structure for the following document and produce a Document Manifest that will guide the writing of each chapter in separate subsequent prompts.

DOCUMENT DESCRIPTION: Write a comprehensive guide titled "Introduction to Quantum Computing for Software Engineers." The audience is experienced software developers with no prior knowledge of quantum mechanics or quantum computing. The document should cover the fundamental concepts, the programming model, the major algorithms, the current hardware landscape, and practical guidance for getting started. Target length: approximately 80 pages.

PRODUCE: A Document Manifest in plain text containing exactly the following sections:

  1. Document overview: purpose, audience, thesis, and overall argument arc
  2. Chapter list: for each chapter, its number, title, and a 3-5 sentence description of its content and its role in the overall argument
  3. Shared terminology: definitions of all technical terms that will be used across multiple chapters, to ensure consistent usage throughout
  4. Cross-reference map: a list of specific concepts introduced in one chapter that are referenced or built upon in later chapters
  5. Stylistic conventions: tone, level of mathematical formalism, use of analogies, citation style, and any other consistency requirements

The manifest must be compact enough to fit in a single prompt context alongside a chapter-generation instruction. Every entry must be precise and useful.



CHAPTER SEVEN: THE CHAPTER/MODULE PROMPT PATTERN IN DEPTH

Now that we have the manifest and the template, let us go deeper into the craft of writing individual module or chapter prompts. The quality of each individual prompt determines the quality of the corresponding artifact, and there are several techniques that consistently produce better results.

The first technique is explicit boundary specification. Every prompt must state not only what it generates but also what it does not generate. This sounds redundant, but it is essential. Large language models have a strong tendency to be helpful in ways you did not ask for, and in a multi-prompt workflow, unsolicited helpfulness is dangerous. If your service layer prompt does not explicitly say "do not generate any router code," the model may decide to include a few FastAPI route definitions as a convenience, and those definitions will conflict with the router you generate in the next prompt.

The second technique is dependency declaration. Every prompt should explicitly list the modules or chapters it depends on, and should state what it expects those dependencies to provide. This is not redundant with the manifest: the manifest states what each module provides, while the dependency declaration in the individual prompt states what this specific module needs from those provisions. The distinction matters because it forces you, as the prompt author, to think carefully about whether the manifest actually contains everything this module needs.

The third technique is example-driven specification for edge cases. When a module or chapter has complex behavior that is difficult to specify in abstract terms, include a small concrete example in the local context section of the prompt. For a repository function, this might be a brief description of what should happen when a task is not found. For a document chapter, this might be a sentence or two describing the kind of argument or example that should appear in a particular section. Concrete examples dramatically reduce the ambiguity that leads to inconsistent or incorrect outputs.

The fourth technique is the Negative Example. Alongside specifying what the output should look like, specify what it should not look like. For code generation, this might mean saying "do not use the legacy SQLAlchemy Query API, use select() statements instead." For document generation, this might mean saying "do not use bullet points or numbered lists in this chapter; all content should be in flowing prose." Negative examples are particularly useful for enforcing stylistic and architectural conventions that models tend to violate when left to their own devices.

Here is a more detailed example of a chapter prompt for the quantum computing document, demonstrating all four techniques.


[MANIFEST INJECTION]
DOCUMENT MANIFEST v1.0
Document: Introduction to Quantum Computing for Software Engineers
... (full manifest as shown above) ...

[ROLE AND SCOPE DECLARATION]
You are writing Chapter 3: "Quantum Gates and Circuits." Your responsibility is ONLY this chapter. Do not write any content that belongs to Chapter 2 (Qubits and Superposition) or Chapter 4 (Quantum Algorithms). You may reference concepts from Chapter 2 using the shared terminology defined in the manifest, but do not re-explain them. You may briefly preview that Chapter 4 will build on the circuit model, but do not describe the algorithms.

[DEPENDENCY DECLARATION]
This chapter depends on Chapter 2 having introduced the following concepts, which you may use without re-defining: qubit, superposition, Bloch sphere, bra-ket notation, probability amplitude. The reader is assumed to understand that a qubit can exist in a superposition of |0> and |1> and that measurement collapses it to one of these states.

[LOCAL CONTEXT]
This chapter introduces quantum gates as the quantum analog of classical logic gates, explains that they are unitary transformations on qubit states, and shows how they are combined into quantum circuits. Cover the following gates in this order: the Pauli-X gate (quantum NOT), the Hadamard gate (the most important gate for creating superposition), the CNOT gate (the fundamental two-qubit gate that creates entanglement), and the Toffoli gate (showing universality). For each gate, provide its matrix representation, its effect on a qubit state, and a practical intuition for what it does. Then explain how gates are combined into circuits using a circuit diagram notation (described in text, not as an actual diagram). Conclude by explaining the concept of circuit depth and why it matters for near-term quantum hardware.

IMPORTANT: The Hadamard gate section should include a worked example showing what happens when H is applied to |0> and then to |1>, computing the resulting superposition states explicitly. This is the kind of concrete calculation that software engineers find most illuminating.

[OUTPUT SPECIFICATION]
Write approximately 4,000 words of flowing prose. Use the conventions defined in the manifest: accessible tone, minimal but precise mathematical notation, analogies to classical computing where helpful. Do not use bullet points or numbered lists. Use section headings within the chapter. Do not include a chapter summary (that is handled separately). Output only the chapter text, starting with the chapter title.


This prompt is specific enough that two different runs of it against the same model would produce very similar outputs in terms of structure and coverage, while still leaving the model enough creative latitude to produce engaging, high-quality prose or code. That balance between specificity and latitude is the hallmark of a well-crafted individual prompt.
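Some of this discipline can be enforced mechanically before a prompt is ever run. The sketch below lints a prompt for the four template sections and for at least one explicit negative boundary; the checks are deliberately crude and assume the bracketed headers used in this article:

```python
REQUIRED_SECTIONS = [
    "[MANIFEST INJECTION]",
    "[ROLE AND SCOPE DECLARATION]",
    "[LOCAL CONTEXT]",
    "[OUTPUT SPECIFICATION]",
]

def lint_prompt(prompt: str) -> list[str]:
    """Flag missing template sections and a missing negative boundary
    ('do not ...'), per the explicit-boundary technique."""
    problems = [f"missing section {s}" for s in REQUIRED_SECTIONS
                if s not in prompt]
    if "do not" not in prompt.lower():
        problems.append("no explicit negative boundary ('do not ...')")
    return problems
```

Running the linter over every prompt before execution catches the most common template violations at a cost of a few milliseconds.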


CHAPTER EIGHT: THE INTEGRATION STRATEGY

Generating all the artifacts is only half the battle. The second half is integrating them into a coherent whole. A well-designed multi-prompt workflow makes integration nearly automatic, but it requires that you build integration into the workflow from the beginning rather than treating it as an afterthought.

For software systems, integration means that all generated files can be placed in the directory structure specified in the manifest and the resulting system compiles, passes its tests, and runs correctly without manual modification. This is achievable if and only if the manifest's interface contracts are precise and complete, and if every individual prompt strictly respects those contracts. When you find that integration requires manual fixes, the root cause is almost always one of three things: an imprecise interface contract in the manifest, a prompt that generated more than it was asked to (violating the boundary specification), or a prompt that generated less than it was asked to (missing a required interface element).

A practical integration workflow for software looks like this. After generating all modules, you place them in the directory structure. You then run a static type checker such as mypy or pyright against the entire codebase. Type errors at this stage are extremely informative: they tell you exactly where a generated module is using a type incorrectly, which almost always points to an inconsistency between the manifest and the generated code. You fix the inconsistency, either by correcting the generated code manually (for small issues) or by running a targeted correction prompt (for larger issues). You then run the test suite. Test failures at this stage tell you where the behavioral contracts are not being honored.
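The mapping from type errors back to modules can be partially automated. The sketch below parses mypy's default "file:line: error: message" output format and groups errors by file, so each error can be traced to the module (and therefore the prompt) that produced it:

```python
import re

MYPY_LINE = re.compile(r"^(?P<file>[^:]+):(?P<line>\d+): error: (?P<msg>.+)$")

def parse_mypy_errors(output: str) -> dict[str, list[tuple[int, str]]]:
    """Group mypy 'file:line: error: message' lines by file. Summary
    lines and notes that do not match the pattern are ignored."""
    errors: dict[str, list[tuple[int, str]]] = {}
    for line in output.splitlines():
        m = MYPY_LINE.match(line.strip())
        if m:
            errors.setdefault(m.group("file"), []).append(
                (int(m.group("line")), m.group("msg")))
    return errors
```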

For documents, integration means that all chapters, when assembled in order, read as a coherent whole with consistent terminology, consistent style, and appropriate cross-references. The Document Manifest's cross-reference map is the key tool here. After assembling the chapters, you read through the cross-references and verify that each one is correctly implemented: that Chapter 3 actually does reference the concept introduced in Chapter 2 in the way the manifest specified, and that Chapter 7 actually does build on the argument made in Chapter 4.

A useful integration prompt can be run after all artifacts are assembled. This prompt receives the manifest and a list of the integration issues found (type errors, test failures, or cross-reference gaps), and it produces a targeted correction for each issue. Because the correction prompt has the manifest as context, it understands the intended design and can produce corrections that are consistent with the overall architecture.

Here is an example of an integration correction prompt for a software system.


[MANIFEST INJECTION] SYSTEM MANIFEST v1.0 ... (full manifest) ...

[INTEGRATION ISSUE] After assembling all modules and running mypy, the following type error was found in service.py, line 47:

error: Argument 1 to "create_task" has incompatible type "TaskCreate"; expected "Task"

The service.py module is calling repository.create_task() with a TaskCreate Pydantic model, which is exactly what the manifest's interface contract for create_task specifies. The error therefore suggests that the generated repository.py is using the SQLAlchemy Task ORM model as its parameter type instead of the Pydantic TaskCreate schema specified in the manifest.

[TASK] Produce a corrected version of the create_task function in repository.py that correctly accepts a TaskCreate Pydantic model as its parameter, converts it to a SQLAlchemy Task ORM object internally, persists it, and returns a TaskRead Pydantic model as specified in the manifest. Output only the corrected function, not the entire file.


This targeted correction approach is far more efficient than regenerating an entire module. It uses the manifest as the source of truth, identifies the specific deviation, and produces a minimal fix.
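A corrected function along the lines the prompt requests might look like the following sketch. The Pydantic schemas and the SQLAlchemy ORM model and session are replaced here with plain dataclass stand-ins (TaskCreate, Task, TaskRead, and FakeSession are illustrative names, not real library objects) so that the pattern the manifest demands — accept the input schema, convert to the ORM object, persist it, return the read schema — is visible without external dependencies.

```python
from dataclasses import dataclass

# Stand-ins for the example's Pydantic schemas and SQLAlchemy ORM model.
@dataclass
class TaskCreate:          # input schema: what the API accepts
    title: str
    description: str = ""

@dataclass
class Task:                # ORM model: what the database stores
    id: int
    title: str
    description: str

@dataclass
class TaskRead:            # output schema: what the API returns
    id: int
    title: str
    description: str

class FakeSession:
    """Minimal stand-in for a database session; assigns ids on add."""
    def __init__(self):
        self.rows: list[Task] = []
    def add(self, row: Task) -> None:
        row.id = len(self.rows) + 1
        self.rows.append(row)

def create_task(session: FakeSession, data: TaskCreate) -> TaskRead:
    """Accept the input schema, convert to the ORM object, persist it,
    and return the read schema -- the contract the manifest specifies."""
    row = Task(id=0, title=data.title, description=data.description)
    session.add(row)
    return TaskRead(id=row.id, title=row.title, description=row.description)
```

In a real codebase the conversion and persistence lines would use the actual Pydantic and SQLAlchemy APIs; the boundary being corrected — schema in, ORM inside, schema out — is the same.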


CHAPTER NINE: PATTERNS AND BEST PRACTICES

Having walked through the core strategy in detail, we can now articulate the patterns and best practices that emerge from it as a set of principles. These are not abstract rules but practical lessons learned from applying multi-prompt generation to real systems.

The first pattern is the Manifest-First Principle. Always generate the manifest before generating any artifact. Never start writing implementation prompts before the manifest is complete and reviewed. The manifest is the foundation, and a shaky foundation produces a shaky system. This principle seems obvious, but it is violated constantly by practitioners who are eager to start generating code and treat the manifest as an optional nicety.

The second pattern is the Single Responsibility Prompt. Each prompt generates exactly one module, one class, one chapter, or one well-defined artifact. If you find yourself writing a prompt that generates two modules, split it into two prompts. The single responsibility principle, familiar from software design, applies with equal force to prompt design. Prompts that try to do too much produce outputs that are too large to review carefully, too complex to integrate cleanly, and too broad to be corrected efficiently when something goes wrong.

The third pattern is the Interface-Before-Implementation ordering. Always generate the interface definitions before the implementations. In software, this means generating the shared types and interface contracts (the models module) before generating the repository, service, or router. In documents, this means generating the chapter outlines before generating the full chapter text. This ordering ensures that the implementations are grounded in concrete, reviewed interface definitions rather than in each prompt's private assumptions about them.

The fourth pattern is the Minimal Shared Context Rule. The manifest should contain only information that is genuinely shared across multiple prompts. Information that is only relevant to one prompt belongs in that prompt's local context section, not in the manifest. Violating this rule makes the manifest grow until it becomes too large to fit comfortably in every prompt's context window, defeating its purpose.

The fifth pattern is the Versioned Manifest. Always version the manifest, and always note in each generated artifact which manifest version it was generated against. This makes it easy to identify which artifacts need to be regenerated when the manifest evolves, and it creates a clear audit trail of the system's design history.
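One lightweight way to implement the Versioned Manifest pattern is a comment-header stamp on every generated artifact. The stamp format below is an assumption, not a standard; the useful part is that staleness checks become mechanical. A sketch:

```python
import re

STAMP = "# Generated against SYSTEM MANIFEST v{version}\n"
_STAMP_RE = re.compile(r"# Generated against SYSTEM MANIFEST v(?P<version>[\d.]+)")

def stamp_artifact(text: str, version: str) -> str:
    """Prepend the manifest version the artifact was generated against."""
    return STAMP.format(version=version) + text

def stale_artifacts(artifacts: dict[str, str], current_version: str) -> list[str]:
    """Return names of artifacts whose stamp does not match the current
    manifest version (or that carry no stamp at all)."""
    stale = []
    for name, text in artifacts.items():
        m = _STAMP_RE.search(text)
        if m is None or m["version"] != current_version:
            stale.append(name)
    return stale
```

After a manifest bump, the stale list is exactly the set of prompts to re-run.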

The sixth pattern is the Review Gate. After generating each artifact and before running the next prompt, review the artifact for correctness and consistency with the manifest. Do not run the next prompt if the current artifact has obvious errors. Errors propagate through a multi-prompt workflow just as bugs propagate through a codebase: the later you catch them, the more expensive they are to fix.

The seventh pattern is the Idempotent Prompt. Write each prompt so that running it multiple times produces the same artifact. This means avoiding prompts that rely on random or time-dependent behavior, and it means specifying the output format precisely enough that the model has little room to vary its structure. Idempotent prompts make it safe to re-run a prompt when you need to regenerate an artifact after a manifest update.


CHAPTER TEN: THE META-PROMPT - LETTING THE LLM DESIGN YOUR PROMPTS

Everything described so far assumes that you, the human, are designing the manifest and the individual prompts. But there is a powerful extension of this strategy: you can use an LLM to design the prompts themselves. This is the domain of meta-prompting, a technique formalized in research by Suzgun and Kalai (2024, arxiv.org/abs/2401.12954) and related to the Automatic Prompt Engineer work by Zhou et al. (2022, arxiv.org/abs/2211.01910).

A meta-prompt is a prompt that instructs the model to produce not an artifact but a set of prompts that will produce artifacts. The meta-prompt receives a high-level description of the system you want to build and produces the complete prompt workflow: the Architect Prompt, the System Manifest template, and all the individual module or chapter prompts. You then execute these prompts in sequence to generate the actual system.

This approach is particularly valuable for large, complex systems where designing the prompt workflow itself is a significant intellectual effort. The meta-prompt offloads that design effort to the model, allowing you to focus on reviewing and refining the generated prompts rather than writing them from scratch.

Here is a complete, production-quality meta-prompt that you can use to generate a multi-prompt workflow for any software system or document. This prompt is designed to be used with any capable LLM such as GPT-4o, Claude 3.5 Sonnet, or Gemini 1.5 Pro.


META-PROMPT: MULTI-PROMPT WORKFLOW GENERATOR

You are an expert prompt engineer and software architect. Your task is to design a complete multi-prompt workflow that will allow a large language model to generate a complex software system or document through a series of focused, coordinated prompts. The workflow must be designed so that no single prompt exceeds a context budget of approximately 8,000 tokens (including the shared manifest), and so that all generated artifacts can be integrated seamlessly without manual modification.

SYSTEM/DOCUMENT TO BUILD: [INSERT YOUR HIGH-LEVEL DESCRIPTION HERE]

STEP 1: ANALYZE AND DECOMPOSE Analyze the system or document described above. Identify the natural decomposition into independent or loosely-coupled units (modules, layers, chapters, sections). For each unit, identify what it needs to know about other units in order to be generated correctly. Identify the shared information that all units need. Write a brief analysis (200-300 words) of your decomposition strategy and the key design decisions you are making.

STEP 2: GENERATE THE ARCHITECT PROMPT Write the Architect Prompt that will be used to generate the System Manifest. The Architect Prompt must instruct the model to produce a manifest containing:

  • Project/document overview (one paragraph)
  • Unit list with identifiers, names, and one-sentence descriptions
  • File/directory structure (for software) or chapter list (for documents)
  • Shared type definitions or shared terminology
  • Interface contracts (for software) or cross-reference map (for documents)
  • Conventions and stylistic rules

The Architect Prompt must be self-contained and must not assume any prior context. It must instruct the model to keep the manifest compact (target: under 1,500 tokens) while being precise and complete.

STEP 3: GENERATE THE INDIVIDUAL UNIT PROMPTS For each unit identified in Step 1, write a complete prompt following this template:

[MANIFEST INJECTION PLACEHOLDER] Note: "PASTE SYSTEM MANIFEST HERE" - this placeholder will be replaced with the actual manifest after the Architect Prompt is run.

[ROLE AND SCOPE DECLARATION] Specify exactly what this prompt generates and explicitly list what it does NOT generate.

[DEPENDENCY DECLARATION] List the units this unit depends on and what it expects from them.

[LOCAL CONTEXT] Provide all information specific to this unit: business rules, algorithms, edge cases, examples, and any other details needed for correct generation.

[OUTPUT SPECIFICATION] Specify the exact format, length, and content of the expected output.

STEP 4: GENERATE THE INTEGRATION PROMPT Write a prompt that will be used after all units are generated to verify integration and produce corrections for any inconsistencies found. This prompt must receive the manifest and a description of any integration issues and produce targeted corrections.

STEP 5: GENERATE THE EXECUTION GUIDE Write a brief guide (one page) for the human operator explaining:

  • The order in which to run the prompts
  • What to review after each prompt before proceeding
  • How to handle manifest evolution if a generated artifact reveals new shared information
  • How to use the integration prompt to fix issues found during assembly

OUTPUT FORMAT: Present your output in the following order:

  1. Decomposition Analysis
  2. Architect Prompt (clearly delimited)
  3. Individual Unit Prompts (one per unit, clearly delimited and labeled)
  4. Integration Prompt (clearly delimited)
  5. Execution Guide

Each prompt must be complete and ready to use. Do not include placeholders other than the manifest injection placeholder specified above. Do not include explanatory prose within the prompts themselves.


This meta-prompt is designed to be run once, producing a complete workflow document that you then execute step by step. The quality of the output depends heavily on the quality of your system description in the "SYSTEM/DOCUMENT TO BUILD" section. The more precise and detailed that description is, the more precise and useful the generated prompts will be. A vague description produces a vague workflow; a precise description produces a precise workflow.

When you run this meta-prompt, review the output carefully before executing any of the generated prompts. Pay particular attention to the Architect Prompt and the manifest structure it will produce. If the manifest structure does not capture all the shared information you expect, revise the Architect Prompt before proceeding. The meta-prompt output is a starting point, not a final answer, and your domain knowledge is essential for refining it into a truly effective workflow.


CHAPTER ELEVEN: APPLYING THE STRATEGY TO A REAL DOCUMENT - A WALKTHROUGH

To make the strategy fully concrete, let us walk through its application to a real document generation task from beginning to end. We will generate a technical white paper titled "Edge Computing in Industrial IoT: Architecture, Security, and Implementation." This is a substantial document with multiple chapters covering different technical domains, and it illustrates all the coordination challenges discussed earlier.

The first step is to run the Architect Prompt. We describe the document to the model: its purpose (to guide industrial engineers and architects in deploying edge computing solutions for IoT), its audience (engineers with networking and embedded systems backgrounds), its target length (approximately 60 pages), and its major themes (architecture patterns, security considerations, hardware selection, and implementation guidance). The model produces a Document Manifest.

The manifest it produces includes a chapter list with seven chapters: an introduction to edge computing and its role in industrial IoT, a chapter on reference architectures and deployment patterns, a chapter on network topology and connectivity, a chapter on security at the edge, a chapter on hardware and platform selection, a chapter on software stack and middleware, and a concluding chapter on implementation roadmap and case studies. The manifest defines shared terminology including terms like "edge node," "fog layer," "OT/IT convergence," "MQTT," and "time-series data." The cross-reference map notes that Chapter 4 (security) will reference the network topology concepts from Chapter 3, that Chapter 6 (software stack) will reference the hardware constraints from Chapter 5, and that Chapter 7 (implementation) will draw on the architecture patterns from Chapter 2.

We review this manifest carefully. We notice that the manifest does not define the term "brownfield deployment," which will be important in Chapter 7 when discussing how to add edge computing to existing industrial installations. We add this term to the shared terminology section and increment the manifest version to 1.1. We also notice that the cross-reference map does not mention that Chapter 5 should reference the security requirements from Chapter 4 when discussing hardware selection (specifically, the need for hardware security modules and secure boot). We add this cross-reference. The manifest is now complete.

We then run the Chapter 2 prompt (reference architectures). The prompt injects the manifest, declares that it is writing Chapter 2 and not any other chapter, declares its dependency on Chapter 1's introduction of the edge computing concept, and provides detailed local context about the three main architecture patterns to cover: the hierarchical edge-fog-cloud pattern, the peer-to-peer edge mesh pattern, and the cloud-offload pattern. The output is approximately 4,500 words of well-structured technical prose.

We run Chapter 3 (network topology). This prompt explicitly references the architecture patterns introduced in Chapter 2 (using the cross-reference map from the manifest) and builds on them to discuss the networking implications of each pattern. The shared terminology ensures that the same terms are used consistently: "edge node" in Chapter 3 means exactly what it meant in Chapter 2.

We continue through all seven chapters. When we reach Chapter 4 (security), the prompt's dependency declaration explicitly states that it depends on Chapter 3 having introduced the network topology concepts, and the local context instructs the model to frame security considerations in terms of those topologies. The result is a chapter that reads as a natural continuation of Chapter 3 rather than a standalone security treatise.

After generating all seven chapters, we assemble them in order and read through the cross-references. We find that Chapter 6 does not reference the hardware security module requirement from Chapter 5 when discussing the software stack's security features, even though the cross-reference map calls for Chapter 6 to build on Chapter 5's hardware constraints. We run a targeted correction prompt that receives the manifest, the relevant sections of Chapters 5 and 6, and an instruction to add a paragraph to Chapter 6 that connects the software security stack to the hardware security module capabilities described in Chapter 5. The correction prompt produces a single paragraph that slots cleanly into Chapter 6 without disturbing the rest of the chapter.

The final assembled document reads as a coherent whole. The terminology is consistent throughout. The cross-references are accurate. The argument builds logically from chapter to chapter. A reader would not suspect that the document was generated by seven separate prompts rather than written in a single continuous session.


CHAPTER TWELVE: COMMON FAILURE MODES AND HOW TO AVOID THEM

No strategy is complete without an honest discussion of how it can fail. Multi-prompt generation has several characteristic failure modes, and understanding them in advance allows you to design your workflow to avoid them.

The first failure mode is the Leaky Module. This occurs when a generated module or chapter contains content that belongs to a neighboring module or chapter. The service layer prompt produces some router code. Chapter 3 re-explains concepts that belong in Chapter 2. The root cause is almost always an insufficiently specific Role and Scope Declaration. The fix is to add explicit negative specifications: "Do not generate any router code" or "Do not re-explain the concept of superposition; assume the reader has read Chapter 2."

The second failure mode is the Orphaned Type. This occurs when a generated module introduces a new type or class that is not in the manifest, and a neighboring module that needs to use that type does not know it exists. The root cause is a manifest that was not updated when the new type was introduced. The fix is the Manifest Evolution Pattern: always update the manifest when a generated artifact introduces new shared information, and always re-run any prompts that would be affected by the update.

The third failure mode is the Inconsistent Convention. This occurs when different modules use different coding styles, naming conventions, or architectural patterns, making the assembled system feel like it was written by multiple different developers with different preferences. The root cause is a manifest that does not specify conventions precisely enough. The fix is to add a detailed Conventions section to the manifest that covers naming conventions, error handling patterns, logging patterns, and any other stylistic choices that must be consistent across modules.

The fourth failure mode is the Context Bleed. This occurs when a prompt is so long that the model loses track of the manifest's interface contracts and starts generating code or prose that contradicts them. The root cause is a local context section that is too long, leaving insufficient context budget for the model to properly attend to the manifest. The fix is to trim the local context ruthlessly, moving any information that is not essential to the generation task into comments or documentation that can be added manually after generation.

The fifth failure mode is the Hallucinated Dependency. This occurs when a generated module imports or references a function, class, or concept that does not exist in the manifest or in any other generated module. The root cause is that the model is drawing on its training data to fill gaps in the specification, producing references to libraries, functions, or patterns that are not part of the intended system. The fix is to be explicit in the manifest and in each prompt about exactly which external libraries and internal modules are available, and to include a statement like "Do not import any library not listed in the manifest" in the Output Specification section.


CHAPTER THIRTEEN: SCALING UP - VERY LARGE SYSTEMS

The strategy described so far works well for systems of moderate complexity: a REST API with five to ten modules, or a document with seven to ten chapters. For very large systems, such as a full-stack application with dozens of modules, or a book with thirty chapters, additional scaling strategies are needed.

The first scaling strategy is hierarchical decomposition. Instead of a single flat manifest covering all modules, you create a two-level hierarchy. The top-level manifest covers the major subsystems of the application: the backend API, the frontend, the data pipeline, the authentication service. Each subsystem then has its own sub-manifest that covers the modules within that subsystem. Each module prompt receives both the top-level manifest (for cross-subsystem interface contracts) and the relevant sub-manifest (for intra-subsystem details). This two-level structure keeps each manifest small while providing each prompt with the right level of detail.
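The context-assembly step for this two-level structure is simple enough to sketch directly. The function below, a stdlib-only illustration with hypothetical delimiter labels, builds the shared context for one module prompt from the top-level manifest plus only the sub-manifest of the subsystem that module belongs to:

```python
def build_prompt_context(top_manifest: str,
                         sub_manifests: dict[str, str],
                         subsystem: str) -> str:
    """Assemble the shared context for one module prompt: the top-level
    manifest (cross-subsystem contracts) plus only the sub-manifest of
    the subsystem the module belongs to."""
    if subsystem not in sub_manifests:
        raise KeyError(f"no sub-manifest for subsystem {subsystem!r}")
    return (
        "[TOP-LEVEL MANIFEST]\n" + top_manifest.strip() + "\n\n"
        "[SUBSYSTEM MANIFEST: " + subsystem + "]\n"
        + sub_manifests[subsystem].strip() + "\n"
    )
```

A backend module prompt, for example, never sees the frontend sub-manifest, which is what keeps each prompt's context budget under control.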

The second scaling strategy is parallel generation. Once the manifest is complete and reviewed, many of the individual module or chapter prompts can be run in parallel, because they are independent of each other's outputs. The repository layer does not need to wait for the service layer to be generated; both can be generated simultaneously using the manifest as their shared foundation. This dramatically reduces the wall-clock time required to generate a large system.
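Parallel generation is a few lines with concurrent.futures, as in the sketch below. The LLM call is injected as a plain callable (prompt text in, artifact text out) so the fan-out logic stays independent of any particular provider SDK; in practice that callable would wrap an OpenAI or Anthropic client.

```python
from concurrent.futures import ThreadPoolExecutor

def generate_in_parallel(manifest: str,
                         unit_prompts: dict[str, str],
                         call_llm,
                         max_workers: int = 4) -> dict[str, str]:
    """Run independent unit prompts concurrently. Each prompt gets the
    manifest injected at the top; call_llm is any callable taking the
    full prompt text and returning the generated artifact."""
    def run(item):
        name, template = item
        return name, call_llm(manifest + "\n\n" + template)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run, unit_prompts.items()))
```

Only prompts with no dependency on each other's outputs belong in the same parallel batch; anything that depends on a Summary Artifact or an updated manifest still runs sequentially.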

The third scaling strategy is the Summary Artifact. For very long documents where chapters do build on each other's content (not just their interface contracts), you can generate a Summary Artifact after each chapter: a one-to-two paragraph summary of the chapter's key points, arguments, and conclusions. These summaries are then included in the manifest or in the local context of subsequent chapter prompts, giving those prompts access to the substance of previous chapters without requiring them to read the full text.

The Summary Artifact strategy is particularly valuable for narrative documents where the argument evolves across chapters. The summary captures the intellectual content of each chapter in a form that is compact enough to include in subsequent prompts without consuming excessive context budget. It is the document equivalent of the interface contract in software: a compressed, precise description of what the artifact provides to the rest of the system.
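Because summaries accumulate as the document grows, they eventually need trimming to fit the context budget. One plausible policy, sketched below under the rough three-to-four-characters-per-token estimate the article uses, keeps the most recent summaries and drops the oldest when the budget is tight:

```python
def rolling_summary_context(summaries: list[str],
                            token_budget: int = 1500) -> str:
    """Join chapter summaries oldest-first, dropping the oldest ones if
    the total would exceed the budget (estimated at ~4 chars per token)."""
    char_budget = token_budget * 4
    kept: list[str] = []
    total = 0
    # Walk newest-first so recent chapters survive when the budget is tight.
    for s in reversed(summaries):
        if total + len(s) > char_budget:
            break
        kept.append(s)
        total += len(s)
    return "\n\n".join(reversed(kept))
```

Recency is only one reasonable eviction policy; for documents whose later chapters lean heavily on an early foundational chapter, you might instead pin that chapter's summary and trim the middle.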


CHAPTER FOURTEEN: TOOLING AND AUTOMATION

The multi-prompt strategy described in this article can be executed entirely manually: you write the prompts, run them one by one in your preferred LLM interface, review the outputs, and assemble the artifacts. For small to medium systems, this manual approach is perfectly adequate and has the advantage of keeping you closely involved in every step of the generation process.

For larger systems, or for teams that need to apply this strategy repeatedly across many projects, automation becomes valuable. The prompt workflow can be implemented as a simple script that reads the manifest and the individual prompt templates from files, injects the manifest into each template, calls the LLM API, and writes the output to the appropriate file in the project directory. This automation eliminates the mechanical work of copying and pasting the manifest into each prompt, reduces the risk of human error, and makes it easy to regenerate individual artifacts when the manifest evolves.
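The whole script can be under thirty lines. In the sketch below, the "[MANIFEST INJECTION]" placeholder and the .txt/.out naming convention are assumptions, and the LLM call is again an injected callable rather than a hard-wired provider client:

```python
from pathlib import Path

def run_workflow(manifest_path: str,
                 template_dir: str,
                 output_dir: str,
                 call_llm) -> list[str]:
    """Inject the manifest into every prompt template in template_dir,
    run each through call_llm (a callable: prompt text -> artifact text),
    and write the artifacts to output_dir under the template's name."""
    manifest = Path(manifest_path).read_text()
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for template in sorted(Path(template_dir).glob("*.txt")):
        prompt = template.read_text().replace("[MANIFEST INJECTION]", manifest)
        artifact = call_llm(prompt)
        target = out / template.with_suffix(".out").name
        target.write_text(artifact)
        written.append(str(target))
    return written
```

Keeping the templates and the manifest as plain files in the repository also means that a manifest bump plus a re-run of this script regenerates exactly the affected artifacts.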

Tools like LangChain (python.langchain.com) provide abstractions for building such workflows, including prompt templates, chain composition, and output parsers. However, for the specific multi-prompt strategy described here, the tooling requirements are modest enough that a simple Python script using the OpenAI or Anthropic API directly is often more transparent and easier to maintain than a full LangChain implementation.

The most important piece of tooling is not the automation script but the manifest itself. Treat the manifest as a first-class artifact in your project repository. Version it with git. Review changes to it with the same care you would review changes to a public API. The manifest is the contract that holds your multi-prompt system together, and it deserves the same respect as any other critical design document.


CONCLUSION: THE DISCIPLINE OF DECOMPOSITION

The multi-prompt strategy described in this article is, at its heart, an application of a principle that software engineers have known for decades: decomposition. Complex systems are built from simple, well-defined components with clear interfaces. The insight that this article adds is that the same principle applies to the process of generating complex systems with LLMs. The prompts are the components, the manifest is the interface specification, and the integration step is the assembly process.

What makes this insight non-trivial is the specific constraint of the LLM context window, which forces you to be disciplined about what information is shared and what information is local. This constraint is, paradoxically, a gift. It forces you to think clearly about the boundaries of each component, the contracts between components, and the minimal shared vocabulary that all components need. Systems designed under this discipline tend to be cleaner and more modular than systems designed without it.

The manifest is the key innovation. It is the answer to the question of how subsequent prompts can know what previous prompts have generated without reading those previous outputs in full. The answer is that they do not need to read the outputs; they need to read the contracts. A well-designed contract is always smaller than the implementation it describes, and it is always more useful to a neighboring component than the implementation itself.

As LLMs continue to improve and context windows continue to grow, some of the specific constraints that motivate this strategy will relax. But the underlying discipline of decomposition, interface-first design, and minimal shared context will remain valuable. These are not workarounds for LLM limitations; they are good engineering practices that happen to be enforced by those limitations. The engineer who masters them will build better systems with LLMs, and better systems without them.


QUICK REFERENCE: THE MULTI-PROMPT WORKFLOW AT A GLANCE

For the reader who wants a concise summary of the complete workflow described in this article, here it is in plain terms.

You begin by running the Architect Prompt, which takes your high-level system description and produces the System Manifest. The manifest contains the project overview, the unit list, the file structure, the shared types or terminology, the interface contracts or cross-reference map, and the conventions. You review the manifest carefully and correct any errors or gaps before proceeding.

You then run the individual Unit Prompts, one per module or chapter. Each prompt injects the full manifest, declares its role and scope (including what it does not generate), declares its dependencies, provides its local context, and specifies its output format. You review each output before running the next prompt. If a generated artifact introduces new shared information, you update the manifest before proceeding.

After all units are generated, you assemble them and run integration checks: static type checking and tests for software, cross-reference verification for documents. You use the Integration Prompt to produce targeted corrections for any issues found.

If you want the LLM to design the prompts for you, you run the Meta-Prompt first, providing your system description. The meta-prompt produces the complete workflow: the Architect Prompt, all Unit Prompts, and the Integration Prompt. You review and refine this workflow, then execute it.

The result is a large, complex, coherent artifact that no single prompt could have produced, assembled from focused, well-coordinated pieces that fit together as naturally as if they had been written in a single session.




REFERENCES AND FURTHER READING

Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., and Liang, P. (2023). "Lost in the Middle: How Language Models Use Long Contexts." arXiv:2307.03172. This paper provides the empirical foundation for understanding why large context windows do not eliminate the need for careful context management, demonstrating that model performance degrades significantly when relevant information is placed in the middle of long input contexts.

Ning, X., Lin, Z., Zhou, Z., Yang, H., and Wang, Y. (2023). "Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding." arXiv:2307.15337. This paper introduces the skeleton-first approach to structured generation, in which a model first produces a high-level skeleton of its answer and then fills in the details, a technique that inspired the Skeleton-First Strategy described in this article.

Suzgun, M. and Kalai, A. T. (2024). "Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding." arXiv:2401.12954. This paper formalizes the meta-prompting technique, in which a single LLM acts as an orchestrator that decomposes complex tasks and assigns subtasks to specialized expert instances, demonstrating significant improvements on coding and reasoning benchmarks.

Zhou, Y., Muresanu, A. I., Han, Z., Paster, K., Pitis, S., Chan, H., and Ba, J. (2022). "Large Language Models Are Human-Level Prompt Engineers." arXiv:2211.01910. This paper introduces the Automatic Prompt Engineer framework for automatically generating and selecting high-quality prompts, providing the theoretical basis for using LLMs to generate and optimize prompts rather than relying solely on human authorship.

The Anthropic documentation on prompt chaining (docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/chain-prompts) provides practical guidance on implementing prompt chaining with Claude, including best practices for managing context across multiple prompts and patterns for self-correction and pipeline enforcement.

The Prompt Engineering Guide (promptingguide.ai), maintained by DAIR.AI (Distributed AI Research Institute), provides a comprehensive and regularly updated overview of prompting techniques including chain-of-thought, decomposition, and meta-prompting, with links to the original research papers for each technique.

