CHAPTER 1: WHY THIS MATTERS, AND WHAT WE ARE ACTUALLY TALKING ABOUT

Before we dive into files, patterns, and configuration details, let us establish something important: configuring an AI agent is not the same as writing a prompt. A prompt is a question you ask once. An agent configuration is closer to writing a job description, an employee handbook, a set of operating procedures, and a personality profile for a new hire who will work autonomously, make decisions, use tools, and potentially manage other workers, all without you being in the room. Get it wrong, and your agent will either do nothing useful, do the wrong thing confidently, or, in the worst case, do exactly what you said rather than what you meant.

The two platforms we will use as our primary reference throughout this tutorial are Hermes Agent, developed by Nous Research, and OpenClaw, an open-source framework for building personal AI assistants. Both platforms share a philosophy that is worth understanding before anything else: configuration lives in plain text files. Your agent's soul, its memory, its skills, its goals, and its operating rules are all Markdown and YAML documents sitting in a directory on your filesystem. This is not a limitation. It is a superpower. It means you can version-control your agent with Git, diff its personality over time, review what it knows, and audit what it has learned. It means your agent is inspectable, portable, and reproducible in a way that a black-box SaaS chatbot never could be.

Hermes Agent stores its configuration under ~/.hermes/ and uses files like SOUL.md, MEMORY.md, USER.md, AGENTS.md, and a config.yaml for infrastructure settings. OpenClaw stores its workspace under ~/.openclaw/ and uses SOUL.md, AGENTS.md, TOOLS.md, USER.md, BOOTSTRAP.md, and skill directories conforming to the agentskills.io open standard. The two platforms are different products with different strengths, but their configuration philosophies are close enough that lessons learned on one transfer readily to the other. Where they differ in important ways, we will call that out explicitly.

Let us now build our mental model from the ground up, starting with the most fundamental question a developer must answer before touching a single configuration file.

CHAPTER 2: WHAT IS THE GOAL, AND WHY DOES IT DETERMINE EVERYTHING ELSE

The goal is the single most important input to your entire agent configuration. Every other decision, which model to use, how to write the soul, which skills to install, whether to use a single agent or a team, whether to run once or on a schedule, flows downstream from a clear, precise, well-structured goal definition. Developers who skip this step and jump straight to configuring tools or writing personality descriptions almost always end up with agents that are busy but not useful.

A goal has several dimensions that you must think through before writing a single line of configuration. The first dimension is specificity: how precisely can you describe what done looks like? The second is scope: how many distinct steps or domains does achieving this goal require? The third is frequency: is this a one-time task, a regularly recurring task, or an ongoing standing objective? The fourth is verifiability: how will you or the agent know when the goal has been achieved? The fifth is risk: what is the worst thing the agent could do while pursuing this goal, and how much does that matter?

Let us look at two contrasting goal definitions to make this concrete.

EXAMPLE: Weak Goal Definition

"Help me with my GitHub repository."

EXAMPLE: Strong Goal Definition

"Audit the open issues in the GitHub repository at github.com/myorg/myproject. For each issue labeled 'bug' that has been open for more than 30 days and has no assignee, post a comment asking the original reporter for reproduction steps, then assign the issue to @triagebot. Generate a Markdown summary report of all actions taken and save it to ./reports/triage-YYYY-MM-DD.md. Stop when all qualifying issues have been processed or when 50 issues have been handled, whichever comes first."

The difference between these two definitions is not just clarity. It is the difference between an agent that will ask you clarifying questions forever and an agent that can work autonomously to completion. The strong definition contains an implicit definition of done (all qualifying issues processed or 50 handled), a scope boundary (only bugs, only older than 30 days, only unassigned), a verification artifact (the summary report), and a safety limit (50 issues maximum). Every one of these elements maps directly to something you will configure in your agent files.

Hermes Agent formalizes this with its /goal command and the associated goal loop. When you invoke /goal, you are not just giving the agent a task. You are activating a separate judge model that evaluates after every turn whether the goal has been satisfied. If the judge says no, Hermes continues working. This means your goal definition must be precise enough for a second LLM to evaluate it objectively. Vague goals produce agents that never stop, because the judge can never confidently say the goal is complete. Overly narrow goals produce agents that stop too early.

In OpenClaw, the goal is typically embedded in the AGENTS.md file as a standing mission, or passed as a natural-language instruction to a cron job or one-shot task. The platform does not have a separate judge model by default, which means the definition of done must be encoded in the skill or the prompt itself, often as an explicit checklist or a structured output requirement.

The practical advice here is this: before you open any configuration file, write your goal as a user story with acceptance criteria. Then ask yourself whether a reasonably intelligent person who had never spoken to you before could read that goal and know exactly when they were done. If the answer is no, your goal needs more work.

CHAPTER 3: THE SOUL - GIVING YOUR AGENT AN IDENTITY THAT HOLDS

Once you have a clear goal, the next file you will write is the soul. In both Hermes Agent and OpenClaw, this is SOUL.md, and it is loaded as the very first content in the system prompt at the start of every session. Think of it as the agent's character sheet, the document that answers the question: who is this agent, and how does it behave in every situation it encounters?

The soul serves several practical engineering purposes that are easy to underestimate. First, it prevents personality drift. Without a fixed soul, an LLM's behavior shifts based on whatever context happens to dominate the conversation. An agent that starts a session helping with financial analysis might gradually adopt the tone and assumptions of a casual chatbot if the conversation drifts in that direction. The SOUL.md file anchors the agent's identity against this drift. Second, it establishes hard behavioral limits that are harder to override than instructions buried in a skill or a task prompt. Third, it defines the communication style, which has a surprisingly large effect on output quality. An agent told to be concise and technical will produce different outputs than one told to be thorough and explanatory, even when given identical tasks.

Here is what a well-constructed SOUL.md looks like for a software engineering assistant agent:

FILE: ~/.hermes/SOUL.md (or ~/.openclaw/SOUL.md)

IDENTITY You are Meridian, a senior software engineering assistant with deep expertise in distributed systems, API design, and developer tooling. You are not a general-purpose chatbot. You exist to help engineering teams ship reliable software faster.

PERSONALITY AND TONE You communicate like a senior engineer who respects the reader's intelligence. You are direct, precise, and economical with words. You do not pad responses with affirmations like "Great question!" or "Certainly!". You do not apologize for being direct. When you are uncertain, you say so explicitly rather than hedging with vague language. You prefer concrete examples over abstract explanations. You ask clarifying questions only when the ambiguity would materially affect your approach, and you ask at most one clarifying question per turn.

CORE VALUES You prioritize correctness over speed. You prefer reversible actions over irreversible ones. You never delete files, drop database tables, or make destructive changes without explicit confirmation from the user. You treat security as a first-class concern, not an afterthought.

HARD LIMITS You will not execute commands that modify production systems without a human in the loop. You will not store or transmit credentials in plain text. You will not generate code that deliberately circumvents security controls. If asked to do any of these things, you explain why you are declining and suggest a safer alternative.

SELF-AWARENESS You know that you are an AI agent running on a language model. You do not pretend to have experiences you do not have. When your knowledge has a cutoff date that is relevant to a task, you say so.

Several things are worth noting about this example. The identity section is specific about domain expertise, which helps the underlying LLM activate the right knowledge and reasoning patterns. The personality section uses negative examples ("you do not pad responses") because LLMs respond well to explicit prohibitions alongside positive instructions. The hard limits section is written in unambiguous imperative language, because this is the section that needs to survive even when a user or another agent tries to override it. The self-awareness section prevents a class of hallucination where the agent confidently asserts things it cannot know.

A common antipattern here is what practitioners call the "Swiss Army Knife Soul," where the developer tries to make the agent good at everything by listing dozens of domains and capabilities. This produces an agent that is mediocre at everything and excellent at nothing, because the soul's specificity is what activates focused reasoning in the underlying model. Another antipattern is the "Motivational Poster Soul," which consists entirely of vague aspirational statements like "You are helpful, harmless, and honest" without any concrete behavioral guidance. These statements are true of every LLM by default and add no useful signal.

The soul should also be kept stable. It is not the place for task-specific instructions, project-specific conventions, or information that changes over time. Those belong in other files, which we will cover next.

CHAPTER 4: THE BRAIN - MEMORY, CONTEXT, AND WHAT THE AGENT KNOWS

If the soul defines who the agent is, the brain files define what the agent knows. In both Hermes Agent and OpenClaw, the brain is distributed across several files that serve different purposes and have different update frequencies. Understanding this distinction is essential for building agents that are both effective and efficient with tokens.

MEMORY.md is the agent's working memory about its environment. In Hermes Agent, this file is capped at approximately 800 tokens and injected directly into the system prompt at the start of every session. It contains facts that are stable but not permanent: the timezone of the server, the naming conventions of the project, the location of key configuration files, the API endpoints the agent uses most frequently. Think of it as the sticky notes on the agent's monitor. It is not a place for conversation history or task-specific notes. It is a place for environmental facts that would otherwise need to be re-established at the start of every session.

USER.md is the agent's model of the person it works with. It stores preferences, communication style, recurring contexts, and behavioral patterns that the agent has learned or been told. In Hermes Agent, this is populated through a process called Honcho dialectic user modeling, where the agent builds an increasingly detailed understanding of the user over time. In OpenClaw, it is typically seeded manually and then updated by the agent as it learns. A well-populated USER.md dramatically reduces the friction of working with an agent, because the agent stops asking questions it has already learned the answers to.

AGENTS.md is the operational layer. This is where you put the agent's working procedures, numbered workflows, memory management rules, session routing logic, and security constraints. In OpenClaw, this file takes priority over SOUL.md for security-related rules, which is an important architectural decision: personality can be overridden by operational necessity, but operational rules cannot be overridden by personality. In Hermes Agent, AGENTS.md files can exist at the project root and in subdirectories, and they are automatically injected when the agent performs tool calls in those directories. This means you can have project-specific operating procedures that activate only when the agent is working in a particular codebase.

Here is a concrete example of an AGENTS.md file for a developer assistant agent:

FILE: ~/myproject/AGENTS.md

PROJECT CONTEXT This is a Python 3.11 FastAPI application using PostgreSQL 15 and Redis 7. The main application lives in ./src/. Tests are in ./tests/ and must be run with pytest. The CI pipeline runs on GitHub Actions and configuration is in ./.github/workflows/.

MEMORY RULES After completing any task that involved discovering a new fact about this project (a new environment variable, a new service dependency, a new naming convention), write that fact to MEMORY.md before ending the session.

CODING CONVENTIONS All new functions must have type annotations. All new modules must have a module-level docstring. SQL queries must use parameterized statements. Never use SELECT * in production queries.

WORKFLOW: IMPLEMENTING A FEATURE Step 1. Read the relevant issue or specification completely before writing any code. Step 2. Identify which existing modules will be affected and read them. Step 3. Write tests first, then implementation. Step 4. Run the test suite with pytest before reporting completion. Step 5. If tests fail, fix the implementation, not the tests, unless the tests are demonstrably wrong.

SECURITY RULES Never log request bodies that may contain credentials or PII. Never commit .env files. If you discover a hardcoded credential in the codebase, flag it immediately and do not proceed with the original task until it is addressed.

Notice that this AGENTS.md is project-specific and lives in the project root, not in the global ~/.hermes/ directory. This is intentional. The global configuration defines who the agent is and how it generally behaves. The project-level AGENTS.md defines how it behaves in this specific context. This layered approach allows you to have one agent that behaves appropriately in multiple different projects without needing separate agent instances for each.

The relationship between these files and token efficiency is worth dwelling on. Every token loaded into the system prompt costs money and consumes context window space. Hermes Agent's 800-token cap on MEMORY.md is not arbitrary. It reflects a real engineering tradeoff: you want the agent to have enough context to be useful without burning so much of the context window on static facts that there is no room for the actual conversation. OpenClaw uses a similar progressive disclosure approach for skills, loading only the name and description of each skill initially and loading the full instructions only when a skill is activated. This is good engineering, and you should design your configuration files with the same discipline.

CHAPTER 5: SKILLS - TEACHING THE AGENT HOW TO DO THINGS

Skills are where the agent's capabilities live. In both Hermes Agent and OpenClaw, skills conform to the agentskills.io open standard, which means they are portable across platforms and can be shared through community hubs like ClawHub. A skill is a directory containing at minimum a SKILL.md file, which combines YAML frontmatter for metadata with a Markdown body for instructions.

The frontmatter is what the agent reads during the discovery phase, when it is deciding which skills are relevant to the current task. The description field in the frontmatter is therefore the most important field in the entire skill, because it is the signal the agent uses to decide whether to load the full skill instructions. A poor description means the skill will never be activated even when it is exactly what is needed.

Here is a complete example of a skill for generating structured code review reports:

DIRECTORY: ~/.hermes/skills/code-review-report/

FILE: SKILL.md

name: code-review-report description: Use this skill when the user asks for a code review, a PR review, a pull request analysis, or a structured assessment of code quality. Generates a standardized Markdown report covering correctness, security, performance, and maintainability. license: MIT compatibility: tools: [read_file, terminal]

CODE REVIEW REPORT SKILL

When activated, you will produce a structured code review report. Follow these steps precisely.

STEP 1: GATHER CONTEXT Read the files specified by the user. If a pull request URL is provided, use the terminal tool to run "gh pr diff " to retrieve the diff. If neither is provided, ask the user to specify what should be reviewed.

STEP 2: ANALYZE THE CODE Evaluate the code across four dimensions. For correctness, look for logic errors, off-by-one errors, unhandled edge cases, and incorrect assumptions. For security, look for injection vulnerabilities, hardcoded credentials, improper input validation, and insecure defaults. For performance, look for N+1 query patterns, unnecessary computation in hot paths, and missing indexes on frequently queried columns. For maintainability, look for code that violates the project's naming conventions (check AGENTS.md), missing tests, and functions that do more than one thing.

STEP 3: PRODUCE THE REPORT Output a Markdown document with the following structure:

Code Review Report

Date: [ISO date] Reviewer: Meridian (AI) Scope: [files or PR reviewed]

Summary

[Two to three sentences describing the overall quality and the most important finding.]

Findings

Critical (must fix before merge)

[Each finding as a numbered item with file, line number if available, description, and suggested fix.]

Important (should fix soon)

[Same format.]

Minor (nice to have)

[Same format.]

Positive Observations

[What the code does well. This section is mandatory and must contain at least one observation.]

STEP 4: SAVE THE REPORT Save the report to ./reviews/review-YYYY-MM-DD-HH-MM.md unless the user specifies a different location. Confirm the save location in your response.

This skill illustrates several best practices. The description is written from the perspective of what the user will ask, not what the skill does internally, because the agent matches user requests against skill descriptions. The steps are numbered and imperative, which produces more reliable execution than prose instructions. The output format is specified precisely, which means the agent's output is predictable and machine-parseable. The skill references AGENTS.md for project conventions, which means it adapts to the project context automatically.

Hermes Agent adds a powerful dimension to skills through its closed learning loop. After completing a task that required five or more tool calls, Hermes can automatically generate a new skill document capturing the workflow it just executed. This means your agent's skill library grows over time as it solves new problems. The engineering implication is significant: you do not need to anticipate every skill your agent will need at configuration time. You need to seed it with enough skills to handle its initial tasks, and then let it learn. However, this also means you need an audit mechanism. Skills generated automatically by the agent should be reviewed before they are promoted to active use, because a skill that encodes a bad pattern will cause that bad pattern to be repeated indefinitely. Using Git to track the skills directory and implementing a staging-to-active promotion workflow is strongly recommended.

CHAPTER 6: THE AGENTIC LOOP - HOW THE AGENT ACTUALLY WORKS

Understanding the agentic loop is not optional for a developer or architect working with these platforms. It is the mechanism by which everything else operates, and misunderstanding it is the source of a large fraction of the bugs, inefficiencies, and unexpected behaviors that developers encounter.

Both Hermes Agent and OpenClaw implement a ReAct-style agentic loop. ReAct stands for Reasoning and Acting, and the pattern was introduced in a 2022 paper by Yao et al. at Princeton and Google. The core insight of ReAct is that interleaving reasoning steps with action steps produces more reliable and interpretable agent behavior than either pure chain-of-thought reasoning or pure action execution. Here is what a single iteration of the loop looks like:

ITERATION N OF THE REACT LOOP

THOUGHT: The user wants to find all Python files in the repository that import the 'requests' library but do not handle the ConnectionError exception. I should use the terminal tool to run a grep command to find files that import requests, then read each file to check for exception handling.

ACTION: terminal COMMAND: grep -rl "import requests" ./src/

OBSERVATION: ./src/api/client.py ./src/utils/http_helpers.py ./src/services/data_fetcher.py

THOUGHT: Three files import requests. I need to read each one and check whether they have try/except blocks that catch ConnectionError or its parent class RequestException.

ACTION: read_file PATH: ./src/api/client.py

OBSERVATION: [file contents]

THOUGHT: This file has a try/except block but only catches Timeout, not ConnectionError. This is a bug. I will note this and continue to the next file.

[... continues for remaining files ...]

FINAL ANSWER: I found two files with incomplete exception handling...

In Hermes Agent's implementation, the loop is driven by the finish_reason field returned by the LLM API. If finish_reason is tool_calls, the agent executes the requested tools, appends the results to the conversation trajectory, and calls the LLM again. If finish_reason is stop, the agent has produced a final answer and the loop exits. The trajectory, which is the full ordered list of messages including user inputs, assistant responses, tool requests, and tool results, is the agent's working memory during a task. It is what allows the agent to maintain coherent reasoning across many steps.

Hermes Agent implements an iteration budget of 90 turns per task by default. This is a critical safety mechanism. Without an iteration limit, a confused or stuck agent can loop indefinitely, consuming tokens and API credits without producing useful output. The 90-turn default is generous enough for complex tasks but bounded enough to prevent runaway execution. For your own deployments, you should calibrate this limit based on the complexity of your tasks. A simple data retrieval task might need at most 10 turns. A complex multi-file refactoring task might legitimately need 40 or 50. Setting the limit too low causes premature termination. Setting it too high wastes money and makes debugging harder.

Context compression is another mechanism you need to understand. As the trajectory grows through many iterations, it eventually approaches the model's context window limit. Hermes Agent handles this by compressing the middle portion of the conversation: it summarizes older turns while preserving recent messages and all tool call/result pairs. This is a reasonable heuristic, but it means that information from early in a long task may be lost or distorted. For tasks where early context is critical, such as a task that begins by reading a specification document, you should either keep tasks short enough to avoid compression or explicitly instruct the agent to write key facts to MEMORY.md before the compression threshold is reached.

The difference between ReAct and Plan-and-Execute is worth understanding because it affects which pattern you should choose for different goal types. ReAct is adaptive: the agent decides its next action based on the most recent observation, which means it can respond to surprises and correct mistakes mid-task. Plan-and-Execute is structured: the agent first produces a complete plan, then executes each step. Plan-and-Execute is more legible and auditable, because you can inspect the plan before execution begins. It is also more brittle, because if the environment does not match the plan's assumptions, the agent may execute incorrect steps confidently. For most tasks in Hermes Agent and OpenClaw, ReAct is the right default. Reserve Plan-and-Execute for tasks where auditability is more important than adaptability, such as regulated workflows where a human must approve the plan before execution.

CHAPTER 7: CONFIGURING FOR ONE-SHOT GOALS

A one-shot goal is a task that needs to be accomplished once and then is done. It might be a complex task that takes many steps and hours of agent time, but it has a clear completion condition and will not recur. Examples include migrating a database schema, generating a comprehensive audit report, refactoring a module to use a new API, or building a prototype application.

The configuration for a one-shot goal has several distinctive characteristics. The goal definition must be self-contained, because there is no recurring context to rely on. The agent needs everything it needs to complete the task embedded in the goal description, the relevant AGENTS.md files, and the skills it has available. The definition of done must be explicit and verifiable, because the agent needs to know when to stop.

In Hermes Agent, the /goal command is the right mechanism for complex one-shot tasks. Here is how you would configure and invoke a one-shot goal for a database migration task:

INVOCATION:

/goal

GOAL: Migrate the user authentication system from session-based to JWT-based authentication.

DEFINITION OF DONE:

All endpoints that previously required a session cookie now accept and validate a JWT Bearer token.
The existing session middleware has been removed or disabled.
A new JWT utility module exists at ./src/auth/jwt.py with functions for token generation, validation, and refresh.
All existing tests pass.
New tests exist for the JWT utility module with at least 80% coverage.
A migration guide exists at ./docs/auth-migration.md explaining the change for API consumers.

VERIFICATION: Run pytest and confirm all tests pass. Check that ./docs/auth-migration.md exists and contains at least 500 words.

CONSTRAINTS: Do not modify the database schema. Do not change the user model. Do not remove the old session code until the JWT implementation is complete and tested.

BUDGET: 60 turns maximum.

Notice the structure of this goal definition. The definition of done is a numbered checklist of concrete, verifiable conditions. The verification section tells the agent how to confirm completion. The constraints section establishes safety boundaries. The budget section prevents runaway execution. This is not a prompt. It is a specification, and writing it as a specification rather than a request is what makes autonomous one-shot execution possible.

For one-shot tasks in OpenClaw, the equivalent approach is to create a dedicated skill file for the task and invoke it directly. This is particularly useful when the task is complex enough that you want to define the workflow in advance rather than relying on the agent to figure it out:

FILE: ~/.openclaw/skills/auth-migration/SKILL.md

name: auth-migration description: Use this skill to migrate the authentication system from session-based to JWT-based. This is a one-time migration task for the myproject application.

AUTH MIGRATION WORKFLOW

This skill guides the complete migration from session-based to JWT authentication. Execute the following steps in order and do not skip any step.

PHASE 1: ANALYSIS (Steps 1-3) Step 1. Read ./src/auth/session.py and ./src/middleware/session_middleware.py to understand the current implementation. Step 2. Read all files in ./src/api/ to identify every endpoint that uses session authentication. Step 3. Write a summary of findings to ./migration/analysis.md before proceeding.

PHASE 2: IMPLEMENTATION (Steps 4-7) Step 4. Create ./src/auth/jwt.py with token generation, validation, and refresh functions. Step 5. Update each identified endpoint to accept JWT Bearer tokens. Step 6. Write unit tests for ./src/auth/jwt.py targeting 80% coverage. Step 7. Run pytest and fix any failures before proceeding.

PHASE 3: DOCUMENTATION AND CLEANUP (Steps 8-9) Step 8. Write ./docs/auth-migration.md with a migration guide for API consumers. Step 9. Disable (do not delete) the old session middleware by wrapping it in a feature flag.

COMPLETION CHECK After Step 9, run pytest one final time. If all tests pass, report completion with a summary of all files created or modified.

The key insight for one-shot goals is that the more work you put into the configuration upfront, the less supervision the agent requires during execution. A well-configured one-shot task can run completely unattended. A poorly configured one will require constant intervention, which defeats the purpose of using an agent at all.

CHAPTER 8: CONFIGURING FOR RECURRING GOALS

Recurring goals are fundamentally different from one-shot goals in ways that affect almost every aspect of configuration. A recurring goal runs on a schedule, which means the agent must be able to execute it without any human context from the previous run, without asking clarifying questions, and without requiring setup steps that assume a fresh environment.

Both Hermes Agent and OpenClaw support cron-style scheduling. OpenClaw uses standard Unix cron expression syntax and stores job definitions in ~/.openclaw/cron/jobs.json. Hermes Agent supports similar scheduling through its task system. In both cases, each scheduled execution starts a fresh agent session with no memory of previous executions, unless you explicitly design the agent to persist state between runs.

This statelessness is the most important architectural characteristic of recurring agent jobs, and it is the source of the most common configuration mistakes. Developers who configure recurring jobs as if they were interactive sessions end up with agents that fail silently because they cannot ask the clarifying questions they would ask in an interactive context, or that produce inconsistent output because they make different assumptions each run.

Here is a concrete example of a recurring job configuration in OpenClaw for a daily engineering metrics report:

FILE: ~/.openclaw/cron/jobs.json (relevant entry)

{ "id": "daily-engineering-metrics", "name": "Daily Engineering Metrics Report", "schedule": "0 7 * * 1-5", "timezone": "Europe/Berlin", "mode": "isolated", "skill": "daily-engineering-metrics", "timeout": 600, "retries": 2, "model": "claude-4-5-haiku" }

And the corresponding skill file:

FILE: ~/.openclaw/skills/daily-engineering-metrics/SKILL.md

name: daily-engineering-metrics description: Generates the daily engineering metrics report. Runs automatically every weekday morning. Do not invoke manually unless testing.

DAILY ENGINEERING METRICS REPORT

This skill runs every weekday at 07:00 Europe/Berlin time. It produces a metrics report and delivers it to the team Slack channel. All steps must complete within 10 minutes.

CONTEXT Today's date is available via the terminal command "date +%Y-%m-%d". The reporting period is the previous calendar day. All times are in Europe/Berlin timezone.

DATA COLLECTION Step 1. Retrieve the GitHub Actions workflow run summary for the previous day using: gh run list --created [YESTERDAY] --json conclusion,name,duration Step 2. Retrieve the count of open PRs awaiting review using: gh pr list --state open --json createdAt,reviewDecision Step 3. Retrieve the Sentry error count for the previous day using the Sentry API skill (invoke: /sentry-daily-summary [YESTERDAY]).

REPORT FORMAT Produce a report with exactly this structure:

Engineering Daily Metrics - [DATE] CI Success Rate: [X]% ([N] runs, [M] failures) PRs Awaiting Review: [N] (oldest: [AGE] days) Production Errors: [N] ([DELTA] vs previous day) Attention Required: [List any metric that is outside normal range, or "None"]

DELIVERY Send the completed report to the #engineering-metrics Slack channel using the slack-message skill. If Slack delivery fails, save the report to ./reports/metrics-[DATE].md and log the delivery failure.

ERROR HANDLING If any data collection step fails, do not abort the entire report. Use "DATA UNAVAILABLE" for that metric and note the failure in the Attention Required section. Always produce and deliver a report, even if some data is missing.

Several design decisions in this configuration deserve explanation. The schedule "0 7 * * 1-5" means 7:00 AM on Monday through Friday, which is the correct cron expression for weekday mornings. The timezone is specified explicitly as Europe/Berlin rather than relying on the system default, because cron timezone bugs are notoriously difficult to debug. The mode is "isolated," which means each run starts a completely fresh session with no memory of previous runs. This is the recommended mode for recurring jobs because it prevents state from one run from contaminating the next.

The model selected is claude-4-5-haiku rather than a more powerful model. This is a deliberate cost optimization. A daily metrics report does not require deep reasoning. It requires reliable tool execution and consistent formatting. A faster, cheaper model is entirely adequate, and using it instead of a frontier model reduces the cost of this job from potentially several dollars per run to a few cents. Over a year of weekday runs, this difference is significant.

The error handling section is critical for recurring jobs. In an interactive session, the agent can ask the user what to do when something goes wrong. In a scheduled job, there is no user to ask. The skill must specify exactly what to do in every foreseeable failure mode. The pattern used here, continue with partial data and flag the failure in the output, is generally better than aborting the entire job, because a partial report is more useful than no report.

The timeout and retries fields in the job configuration provide a safety net. A timeout of 600 seconds (10 minutes) ensures that a stuck job does not run indefinitely. Two retries means that transient failures, like a momentary API outage, will be automatically recovered without human intervention.

CHAPTER 9: MULTI-AGENT SYSTEMS - ORCHESTRATION, HANDOFF, AND SUBGOAL DECOMPOSITION

Single-agent configurations are sufficient for a large fraction of real-world tasks. But some goals are genuinely too large, too complex, or too multi-domain for a single agent to handle reliably. When you reach this boundary, you need a multi-agent architecture, and the configuration challenges multiply significantly.

The fundamental reason to use multiple agents is not that a single agent is not smart enough. Modern LLMs are capable of remarkable breadth. The reason is context and specialization. A single agent working on a complex goal accumulates context as it works, and that context eventually crowds out the information it needs for later steps. A multi-agent system solves this by giving each agent a focused context relevant to its specific subtask. Additionally, specialized agents can be configured with domain-specific souls, skills, and memory that make them more reliable within their domain than a generalist agent would be.

Hermes Agent supports multi-agent setups through isolated profiles, each with its own configuration, memory, skills, and model. The delegate_task tool allows a parent agent to spawn subagents, passing them a goal and context. Subagents start with a fresh conversation and have no knowledge of the parent's history. Everything the subagent needs must be passed explicitly through the goal and context fields. OpenClaw supports similar patterns through its agent management system and can run multiple agents with different configurations simultaneously.

Let us walk through a concrete multi-agent scenario to make the orchestration concepts tangible. The goal is to produce a comprehensive competitive analysis report for a software product, covering technical capabilities, pricing, customer sentiment, and strategic positioning. This goal spans at least four distinct domains, each requiring different tools and different expertise.

MULTI-AGENT ARCHITECTURE FOR COMPETITIVE ANALYSIS

ORCHESTRATOR AGENT (brain.md: generalist, soul.md: project manager persona) | |-- delegates to --> RESEARCH AGENT (soul: analyst, skills: web-search, document-synthesis) | Goal: Gather technical specifications and pricing for competitors A, B, C | Output: ./research/raw-data.md | |-- delegates to --> SENTIMENT AGENT (soul: analyst, skills: review-scraping, sentiment-analysis) | Goal: Analyze customer reviews on G2, Capterra, Reddit for competitors A, B, C | Output: ./research/sentiment-summary.md | |-- waits for both outputs, then delegates to --> | |-- SYNTHESIS AGENT (soul: senior consultant, skills: report-writing, strategic-analysis) Goal: Synthesize raw-data.md and sentiment-summary.md into final report Output: ./reports/competitive-analysis-YYYY-MM-DD.md

The orchestrator's AGENTS.md would contain the workflow definition:

FILE: ~/.hermes/profiles/orchestrator/AGENTS.md

ORCHESTRATION WORKFLOW: COMPETITIVE ANALYSIS

Step 1. Verify that the target competitors and product have been specified. If not, ask the user before proceeding.

Step 2. Delegate to the research profile with the following goal: "Gather technical specifications, feature lists, and public pricing for [COMPETITORS]. Save structured findings to ./research/raw-data.md. Include sources for all claims."

Step 3. Delegate to the sentiment profile with the following goal: "Analyze customer reviews on G2, Capterra, and Reddit for [COMPETITORS] from the past 12 months. Identify top 5 praise themes and top 5 complaint themes per competitor. Save findings to ./research/sentiment-summary.md."

Step 4. Wait for both delegations to complete. Verify that ./research/raw-data.md and ./research/sentiment-summary.md both exist and are non-empty.

Step 5. Delegate to the synthesis profile with the following goal: "Read ./research/raw-data.md and ./research/sentiment-summary.md. Produce a comprehensive competitive analysis report at ./reports/competitive-analysis-[DATE].md. The report must include an executive summary, a feature comparison matrix, a pricing analysis, a customer sentiment analysis, and strategic recommendations."

Step 6. Verify the final report exists and report its location to the user.

HANDOFF PROTOCOL When delegating to a subagent, always include: (1) the specific output file path, (2) the format requirements for that output, (3) the scope boundaries (what the subagent should NOT do), and (4) the quality criteria the output must meet.

The handoff protocol section at the bottom of this AGENTS.md is worth examining carefully. The four elements it specifies, output file path, format requirements, scope boundaries, and quality criteria, are the minimum information a subagent needs to produce output that the orchestrator can use. Missing any one of these elements is a common source of multi-agent failures.

Output file path is obvious but often omitted, resulting in subagents saving output to unpredictable locations. Format requirements prevent the synthesis agent from receiving raw-data.md in a format it cannot parse. Scope boundaries prevent subagents from doing work that belongs to another agent, which causes duplication and inconsistency. Quality criteria give the subagent a self-evaluation mechanism so it can catch its own failures before reporting completion.

The handoff itself should be treated as a structured protocol, not a free-text message. The research community on multi-agent systems has consistently found that free-text handoffs are the primary source of context loss in multi-agent systems. When you pass context from one agent to another as unstructured prose, the receiving agent must interpret that prose, and interpretation introduces error. When you pass context as a structured schema, the receiving agent can parse it reliably.

Here is what a structured handoff payload looks like in practice:

HANDOFF PAYLOAD (JSON Schema)

{ "schema_version": "1.0", "task_id": "competitive-analysis-2025-06-22", "delegating_agent": "orchestrator", "receiving_agent": "synthesis", "goal": "Produce competitive analysis report", "inputs": { "raw_data_path": "./research/raw-data.md", "sentiment_path": "./research/sentiment-summary.md" }, "output": { "path": "./reports/competitive-analysis-2025-06-22.md", "format": "Markdown with H1 title, H2 sections as specified", "minimum_length_words": 2000 }, "scope": { "include": ["feature comparison", "pricing analysis", "sentiment synthesis", "strategic recommendations"], "exclude": ["additional research", "web browsing", "contacting external APIs"] }, "quality_criteria": [ "Every claim in the feature comparison must cite a source from raw-data.md", "Strategic recommendations must be grounded in the sentiment data", "Executive summary must be readable by a non-technical executive" ], "deadline_turns": 30}

Including a schema_version field in every handoff payload is a practice borrowed from API design. As your multi-agent system evolves, the handoff schema will change. Versioning the schema allows you to maintain backward compatibility and detect mismatches between agents that have been updated at different times.

The question of when to use parallel versus sequential subagent execution is an architectural decision with significant performance implications. In the competitive analysis example above, the research agent and the sentiment agent can run in parallel because they have no dependency on each other's output. The synthesis agent must run after both, because it depends on their outputs. This is a DAG (directed acyclic graph) execution pattern, and it is the most efficient structure for tasks with independent parallel branches followed by a synthesis step.

Hermes Agent's support for asynchronous subagents means that the orchestrator does not need to block while waiting for each subagent to complete. It can delegate both the research and sentiment tasks, then poll for their completion before delegating the synthesis task. This reduces the total wall-clock time for the competitive analysis from the sum of all three agents' execution times to the maximum of the two parallel agents' times plus the synthesis agent's time.

CHAPTER 10: GOAL TYPES AND THEIR CONFIGURATION SIGNATURES

Different types of goals have characteristic configuration patterns. Understanding these patterns allows you to quickly identify the right configuration approach for a new goal rather than starting from scratch each time.

A data transformation goal, such as converting a dataset from one format to another, validating records against a schema, or enriching records with data from an external API, has a simple configuration signature. It needs a precise input/output specification, a skill that defines the transformation logic, and a verification step that confirms the output meets the expected schema. It does not need a complex soul, a rich USER.md, or multi-agent orchestration. The soul can be minimal and technical. The skill should include explicit error handling for malformed input records. The goal definition should specify what to do with records that fail validation, whether to skip them, flag them, or abort.

A research and synthesis goal, such as producing a report, summarizing a body of literature, or answering a complex question that requires gathering information from multiple sources, has a different signature. It needs strong web search and document reading skills, a soul that emphasizes accuracy and source attribution, and a goal definition that specifies the required depth, format, and citation standards. For complex research goals, a multi-agent approach with a dedicated research agent and a separate synthesis agent often produces better results than a single agent trying to do both, because the research phase and the synthesis phase require different cognitive modes and different context.

A code generation and maintenance goal needs a soul that emphasizes correctness and testability, an AGENTS.md with explicit coding conventions, skills for reading and writing code files and running tests, and a goal definition that includes acceptance criteria in the form of test requirements. The most important configuration element for code goals is the feedback loop: the agent must be able to run the code it writes and observe the results, which means the terminal tool must be available and the AGENTS.md must specify how to run tests.

A monitoring and alerting goal, which is almost always a recurring goal, needs a minimal soul, a self-contained skill that specifies exactly what to check and what constitutes an alert condition, explicit error handling for every foreseeable failure mode, and a delivery mechanism for the alert output. The skill must be written so that it can execute completely without human interaction, because monitoring jobs run unattended. The alert threshold and escalation logic should be in the skill file, not hardcoded in the cron job definition, so they can be updated without modifying the scheduling configuration.

An orchestration goal, where the agent's primary job is to coordinate other agents rather than to do work directly, needs a soul that emphasizes clarity and precision in communication, an AGENTS.md with a detailed workflow definition including handoff protocols, and skills for spawning subagents and aggregating their outputs. The orchestrator's soul should explicitly de-emphasize doing work directly, because an orchestrator that starts doing the work of its subagents undermines the entire multi-agent architecture.

CHAPTER 11: BEST PRACTICES AND ANTIPATTERNS - THE HARD-WON LESSONS

The following observations come from the documented experience of practitioners working with these platforms and the broader agentic AI community. They are organized not as a list but as a narrative, because the relationships between these practices matter as much as the practices themselves.

The single most important best practice is to write your configuration files as if you are onboarding a new employee, not as if you are writing a prompt. A prompt is optimized for a single interaction. An employee handbook is optimized for consistent behavior across thousands of interactions in contexts you cannot fully anticipate. The difference in mindset produces dramatically different configuration quality. When you write SOUL.md, ask yourself: if this agent encounters a situation I have not thought of, will this document give it enough guidance to make a reasonable decision? When you write AGENTS.md, ask yourself: if this agent is working on a task at 3 AM with no one available to ask, will these procedures keep it on track?

The second critical practice is to test your configuration before deploying it to production. This sounds obvious, but many developers skip it because testing an agent feels different from testing code. The right approach is to create a set of representative test scenarios that cover the normal case, edge cases, and failure cases, then run the agent through each scenario and evaluate its behavior. For recurring jobs, run the job manually using the platform's "run now" feature before enabling the schedule. For multi-agent systems, test each subagent in isolation before testing the orchestrator. Hermes Agent's /goal command makes this straightforward: you can run the same goal definition multiple times with different inputs and compare the results.

The third practice is to instrument your agents from the start. Both platforms produce logs of the agent's trajectory, including its thoughts, actions, and observations. These logs are invaluable for debugging, but only if you actually read them. Set up a workflow where you review the trajectory logs for your most important recurring jobs at least weekly. Look for patterns of inefficiency, such as the agent repeatedly searching for information that should be in MEMORY.md, or patterns of error, such as the agent consistently failing at a particular step and recovering through an expensive retry.

Now for the antipatterns, which are perhaps more instructive than the best practices because they are more specific and more avoidable.

The "Omniscient Soul" antipattern occurs when a developer tries to make the agent capable of everything by writing a SOUL.md that claims expertise in every domain. The result is an agent that is confidently mediocre across all domains rather than reliably excellent in its actual domain. The fix is to be ruthlessly specific about the agent's domain and to create separate agents with separate souls for genuinely different domains.

The "Empty Context" antipattern occurs when USER.md and MEMORY.md are left empty or nearly empty. The agent then spends the first several turns of every session re-establishing context that should already be known. This is expensive in tokens and frustrating in practice. The fix is to seed these files with everything the agent should know before it starts working, and to configure the agent to update them as it learns.

The "Prompt Injection Vulnerability" is not just a theoretical concern. OpenClaw's own documentation acknowledges that a significant percentage of community skills contain malicious instructions, and that the agent's inability to reliably separate commands from data makes it susceptible to prompt injection attacks that can poison its memory and influence its long-term behavior. The fix is to audit every community skill before installing it, to run agents in isolated containers rather than directly on your host machine, and to implement a review gate for any skill the agent generates automatically.

The "Excessive Agency" antipattern is identified in OpenClaw's security documentation as the number one risk for AI agents. It occurs when an agent is given more permissions, more tools, and more autonomy than its task actually requires. An agent that can read files, write files, execute terminal commands, send messages, and call external APIs has a very large blast radius if it makes a mistake or is manipulated. The fix is to apply the principle of least privilege: give the agent only the tools it needs for its specific task, and configure hard limits in SOUL.md and AGENTS.md for the most dangerous operations.

The "Stateful Cron" antipattern occurs when a recurring job is configured as if it has access to state from previous runs. The developer writes a skill that says "compare today's results with yesterday's results" without providing a mechanism for the agent to actually access yesterday's results, because each cron execution starts a fresh session. The fix is to explicitly design state persistence into recurring jobs: write the previous run's output to a known file location, and have the skill read that file at the start of each run.

The "Free-Text Handoff" antipattern in multi-agent systems has already been mentioned, but it deserves emphasis. When an orchestrator passes context to a subagent as unstructured prose, the subagent must interpret that prose, and different LLM instances will interpret the same prose differently. This produces non-deterministic behavior in your multi-agent system, which is the opposite of what you want. The fix is to use structured JSON handoff payloads with explicit schemas, as demonstrated in Chapter 9.

CHAPTER 12: EVALUATING YOUR CONFIGURATION

A configuration that looks good on paper may not work well in practice, and a configuration that works well in practice may be fragile in ways that only appear under unusual conditions. Evaluation is the discipline that bridges this gap, and it is underdeveloped in most agentic AI deployments.

The most useful evaluation technique for agent configurations is what the research community calls LLM-as-judge evaluation. You run the agent on a set of test cases, collect the full trajectory including thoughts, actions, observations, and final output, and then use a separate LLM to score the trajectory against a rubric. The rubric should cover correctness (did the agent achieve the goal?), efficiency (did the agent take a reasonable path, or did it waste turns on unnecessary steps?), safety (did the agent stay within its configured boundaries?), and reliability (would the agent produce a consistent result if run again?).

For recurring jobs, you can automate this evaluation by running the job in a staging environment before promoting it to production. The staging environment should mirror the production environment as closely as possible, including the same data sources and the same tool configurations. Run the job several times in staging, review the trajectories, and only promote to production when the job is consistently producing correct output.

For multi-agent systems, evaluation is more complex because you need to evaluate both the individual agents and the system as a whole. Start by evaluating each subagent in isolation with representative inputs. Then evaluate the orchestrator's handoff quality by inspecting the handoff payloads it generates. Finally, evaluate the end-to-end system with representative goals and measure the quality of the final output.

The most important metric for a one-shot goal agent is goal completion rate: what fraction of the time does the agent successfully achieve the goal within its turn budget? For a recurring job agent, the most important metrics are reliability (what fraction of scheduled runs complete successfully?) and consistency (how similar are the outputs across runs with similar inputs?). For a multi-agent system, the most important metric is handoff success rate: what fraction of handoffs result in the receiving agent producing output that meets the quality criteria?

These metrics should be tracked over time, not just measured once. Agent behavior can drift as the underlying LLM is updated by the provider, as the agent's skill library grows through the learning loop, and as the environment the agent operates in changes. A monitoring dashboard that tracks these metrics and alerts when they fall below acceptable thresholds is a worthwhile investment for any production agent deployment.

CONCLUSION: THE DISCIPLINE OF AGENT CONFIGURATION

Configuring an AI agent to reliably achieve goals is a discipline that combines software engineering, technical writing, system design, and a deep understanding of how large language models reason. It is not prompt engineering, though prompt engineering skills are useful. It is not traditional software development, though software development skills are essential. It is something new, and it rewards practitioners who approach it with the same rigor and craftsmanship they would bring to any other serious engineering challenge.

The platforms we have examined, Hermes Agent and OpenClaw, represent a thoughtful approach to this challenge. By making configuration explicit, file-based, and auditable, they give developers the tools to build agents that are inspectable, reproducible, and improvable over time. The agentskills.io open standard for skills, the ReAct agentic loop, the structured handoff protocols for multi-agent systems, and the cron-based scheduling for recurring jobs are all mature patterns with real-world validation.

The journey from a vague idea of "I want an agent that does X" to a production-ready agent configuration that reliably achieves X is longer than most developers expect the first time. It requires clear goal definition, careful soul design, thoughtful memory architecture, well-crafted skills, appropriate loop configuration, and rigorous evaluation. Each of these elements interacts with the others in ways that are not always obvious. But the payoff, an agent that works autonomously, learns over time, and handles complex multi-step goals without constant supervision, is substantial enough to justify the investment.

Start with a single, well-defined goal. Write the goal definition as a specification with acceptance criteria. Write the soul as an employee handbook, not a prompt. Seed the memory files with everything the agent should know before it starts. Install or write the skills the agent needs. Test before deploying. Review the trajectories. Iterate. The agent will get better, and so will you.

Hitchhiker's Guide to AI, Software Architecture, and Everything Else

Monday, June 22, 2026

CONFIGURING AI AGENTS TO ACHIEVE GOALS: A DEEP TECHNICAL TUTORIAL FOR DEVELOPERS AND ARCHITECTS