Sunday, May 31, 2026

SKILLS IN AGENTIC AI: A COMPLETE TUTORIAL FOR DEVELOPERS

 


PREFACE: WHY YOU SHOULD CARE ABOUT THIS

Imagine you hired a brilliant new employee. On their first day, you hand them a mountain of tasks: schedule meetings, analyze spreadsheets, write reports, call vendors, and monitor a production system. If that employee had no skills, no prior knowledge, no ability to use a phone or open a spreadsheet, they would be completely useless despite their raw intelligence. The same is true for AI agents.

Agentic AI is the field concerned with building AI systems that do not merely answer questions but actually take actions, pursue goals, make decisions, and complete complex multi-step tasks autonomously. These systems are called agents, and they are rapidly becoming the backbone of modern AI applications. But an agent without skills is like that brilliant but helpless new hire: smart in theory, useless in practice.

This tutorial is written for developers who have never worked with Agentic AI before. You do not need a machine learning background. You do not need to have built a neural network. You do need to be comfortable reading code, thinking in abstractions, and following a logical argument step by step. By the end of this article, you will understand what skills are, why they exist, how to design them well, how agents use them, and how to avoid the many traps that catch beginners off guard.

CHAPTER ONE: UNDERSTANDING AGENTIC AI FROM SCRATCH

Before we can talk about skills, we need to understand the world they live in. Let us spend a few pages building that foundation.

A traditional AI application is reactive and stateless. You send it a prompt, it sends back a response, and the interaction is complete. There is no memory of what happened before, no ability to take actions in the world, and no concept of a goal that spans multiple steps. This is the familiar chatbot model, and while it is useful, it is fundamentally limited.

An agent is different. An agent has a goal, a memory, a reasoning process, and the ability to take actions. It can look at its current situation, decide what to do next, do it, observe the result, and then decide what to do after that. This loop, which researchers call the Observe-Think-Act loop or sometimes the ReAct loop (Reasoning plus Acting), is the heartbeat of every agent system.

The loop looks like this in abstract terms:

+------------------+
|   Observe World  |
+--------+---------+
         |
         v
+------------------+
|  Reason / Plan   |
+--------+---------+
         |
         v
+------------------+
|   Select Action  |
+--------+---------+
         |
         v
+------------------+
|   Execute Action |
+--------+---------+
         |
         v
+------------------+
| Observe Outcome  |
+--------+---------+
         |
         (loop back to top)

The agent keeps cycling through this loop until it decides the goal is complete, or until it runs out of attempts, or until some external condition stops it. Each iteration of the loop is called a step or a turn.
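To make the loop concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular framework implements it: the names run_agent_loop and scripted_decide are hypothetical, and the decide function is a scripted stand-in for the language model.

```python
from typing import Callable

# A "skill" here is any callable that accepts keyword arguments.
Skill = Callable[..., object]


def run_agent_loop(
    goal: str,
    decide: Callable[[str, list], dict],
    skills: dict[str, Skill],
    max_steps: int = 5,
) -> list:
    """Cycle through reason, act, observe until done or out of steps."""
    history: list = []  # observations gathered so far
    for _ in range(max_steps):
        # Reason / plan: decide() stands in for the language model.
        decision = decide(goal, history)
        if decision["action"] == "finish":
            break
        # Execute the selected action, then record the observed outcome.
        outcome = skills[decision["action"]](**decision["arguments"])
        history.append({"action": decision["action"], "outcome": outcome})
    return history


def scripted_decide(goal: str, history: list) -> dict:
    """Hypothetical stand-in for a model: add two numbers once, then stop."""
    if not history:
        return {"action": "add", "arguments": {"a": 2, "b": 3}}
    return {"action": "finish", "arguments": {}}


trace = run_agent_loop("add 2 and 3", scripted_decide, {"add": lambda a, b: a + b})
print(trace)
```

The max_steps limit is the "runs out of attempts" exit described above; without it, a decision function that never returns "finish" would loop forever.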

Now here is the critical question: when the agent reaches the "Execute Action" box, what exactly can it do? The answer is: only what it has been given the capability to do. Those capabilities are what we call skills.

CHAPTER TWO: WHAT IS A SKILL?

A skill, in the context of Agentic AI, is a well-defined, self-contained capability that an agent can invoke to accomplish a specific sub-task. Think of a skill as a function that the agent knows about, can decide to call, and can interpret the result of.

This definition has several important parts, and each one matters enormously.

"Well-defined" means the skill has a clear name, a clear description of what it does, a clear specification of what inputs it needs, and a clear description of what it returns. Vagueness is the enemy of reliable agent behavior.

"Self-contained" means the skill does not depend on hidden state or secret knowledge. Everything the skill needs to do its job is either passed in as a parameter or is part of the skill's own internal implementation. The agent does not need to know how the skill works internally, only what it does and how to call it.

"Capability that an agent can invoke" means the agent has been told this skill exists and has been given enough information to decide when to use it and how to call it correctly. The agent does not discover skills on its own; they are registered with the agent at startup.

"Accomplish a specific sub-task" means skills are not general-purpose. A skill does one thing well. It does not try to do everything. This is the single-responsibility principle applied to agent capabilities.

Let us look at a concrete, simple example. Suppose you are building an agent that helps employees at a company manage their calendars. You might give it the following skills:

get_calendar_events(date: str) -> list[dict]
create_calendar_event(title: str, date: str, time: str, duration_minutes: int) -> dict
delete_calendar_event(event_id: str) -> bool
find_free_slots(date: str, duration_minutes: int) -> list[str]
send_meeting_invitation(event_id: str, attendee_emails: list[str]) -> bool

Each of these is a skill. Each one does exactly one thing. Each one has a clear name that tells you what it does, clear parameters that tell you what it needs, and a clear return type that tells you what it gives back.

When the agent receives a request like "Schedule a one-hour meeting with Alice and Bob tomorrow afternoon," it will reason through the problem and decide it needs to call find_free_slots first to see what times are available, then call create_calendar_event to book the slot, and finally call send_meeting_invitation to notify Alice and Bob. The agent orchestrates these skills in sequence to accomplish the overall goal.

This is the power of skills: they transform a vague, high-level goal into a series of precise, executable actions.
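The sequence described above can be sketched with in-memory stand-ins for three of the calendar skills. Everything in this block is assumed for illustration: the BUSY data, the candidate time slots, the date, and the email addresses are made up, and a real agent would choose the calls itself rather than follow a hard-coded script.

```python
# In-memory stand-ins for the calendar skills (hypothetical data throughout).
BUSY = {"2026-06-01": ["13:00", "14:00"]}
EVENTS: dict[str, dict] = {}
INVITES: list[tuple] = []


def find_free_slots(date: str, duration_minutes: int) -> list[str]:
    """Return afternoon start times that are not already taken."""
    # duration_minutes is ignored in this toy version; every slot is one hour.
    candidates = ["13:00", "14:00", "15:00", "16:00"]
    return [slot for slot in candidates if slot not in BUSY.get(date, [])]


def create_calendar_event(title: str, date: str, time: str, duration_minutes: int) -> dict:
    """Store a new event and return it, including its generated event_id."""
    event_id = f"evt_{len(EVENTS) + 1}"
    EVENTS[event_id] = {
        "event_id": event_id,
        "title": title,
        "date": date,
        "time": time,
        "duration_minutes": duration_minutes,
    }
    return EVENTS[event_id]


def send_meeting_invitation(event_id: str, attendee_emails: list[str]) -> bool:
    """Record an invitation; fails if the event does not exist."""
    if event_id not in EVENTS:
        return False
    INVITES.append((event_id, tuple(attendee_emails)))
    return True


# The order an agent might choose for the scheduling request.
slots = find_free_slots("2026-06-01", 60)
event = create_calendar_event("Sync with Alice and Bob", "2026-06-01", slots[0], 60)
sent = send_meeting_invitation(event["event_id"], ["alice@example.com", "bob@example.com"])
```

Note how each call feeds the next: the chosen slot goes into create_calendar_event, and the returned event_id goes into send_meeting_invitation.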

CHAPTER THREE: SKILLS VERSUS TOOLS - CLEARING UP THE CONFUSION

If you have read anything about Agentic AI before, you have almost certainly encountered the word "tools." Many frameworks use the words "skill" and "tool" interchangeably, and this causes a great deal of confusion. Let us sort this out once and for all.

A useful way to separate the two, and the convention this tutorial follows, is this: a tool is the technical implementation of a capability, while a skill is the agent-facing description of that capability. The tool is the code that does the work. The skill is the metadata and interface that tells the agent the tool exists and how to use it.

Think of it this way: a hammer is a tool. The skill of "driving a nail into wood" is the knowledge of how to pick up the hammer, position it correctly, and swing it with the right force. The skill wraps the tool and gives it meaning in the context of a goal-directed agent.

In practice, a skill definition typically contains three things. First, it contains a human-readable name and description that the agent's language model can understand and reason about. Second, it contains a schema for the inputs, specifying what parameters are required, what their types are, and what they mean. Third, it contains a reference to the actual function or API call that will be executed when the skill is invoked.

Here is a simple illustration of how a tool function and a skill definition relate to each other. The tool is just a Python function:

import requests


def fetch_weather_data(city: str, units: str = "metric") -> dict:
    """
    Fetches current weather data for a given city from a weather API.

    Args:
        city: The name of the city to fetch weather for.
        units: The unit system to use. Either 'metric' or 'imperial'.

    Returns:
        A dictionary containing temperature, humidity, and conditions.
    """
    # NOTE: the URL below is a placeholder. Point it at a real weather API
    # before running this function.
    api_url = "https://api.weatherservice.example.com/current"
    # A timeout keeps the agent from hanging if the service does not respond.
    response = requests.get(
        api_url, params={"city": city, "units": units}, timeout=10
    )
    response.raise_for_status()
    return response.json()

The skill definition is the metadata that wraps this function and makes it understandable to an agent. In many frameworks, this looks like a dictionary or a structured object:

weather_skill = {
    "name": "get_current_weather",
    "description": (
        "Retrieves the current weather conditions for a specified city. "
        "Use this skill when the user asks about weather, temperature, "
        "rain, humidity, or general atmospheric conditions in a location. "
        "Do not use this for historical weather data or forecasts."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "city": {
                "type": "string",
                "description": "The name of the city, e.g. 'Berlin' or 'New York'."
            },
            "units": {
                "type": "string",
                "enum": ["metric", "imperial"],
                "description": (
                    "The unit system for temperature. Use 'metric' for Celsius "
                    "and 'imperial' for Fahrenheit. Default to 'metric'."
                )
            }
        },
        "required": ["city"]
    },
    "function": fetch_weather_data
}

Notice how much richer the skill definition is compared to the raw function. The description tells the agent not just what the skill does but also when to use it and when not to use it. The parameter descriptions give the agent enough context to fill in the values correctly. This is the crucial difference between a tool (the function) and a skill (the full definition including the function).
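One practical consequence of bundling the function into the definition is that the two halves go to different places: the model receives only the metadata, while your own code keeps the callable. Here is a minimal sketch of that split. The helper names model_facing_definition and invoke are hypothetical, and lookup_weather_stub is an offline stand-in with made-up values so the example runs without a network.

```python
import json


def lookup_weather_stub(city: str, units: str = "metric") -> dict:
    """Offline stand-in for a real weather call; the values are made up."""
    return {"city": city, "units": units, "temperature": 21.0, "conditions": "cloudy"}


skill = {
    "name": "get_current_weather",
    "description": "Retrieves the current weather conditions for a specified city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "units": {"type": "string", "enum": ["metric", "imperial"]},
        },
        "required": ["city"],
    },
    "function": lookup_weather_stub,
}


def model_facing_definition(skill_def: dict) -> dict:
    """Drop the callable: the model only needs name, description, and schema."""
    return {key: value for key, value in skill_def.items() if key != "function"}


def invoke(skill_def: dict, arguments: dict) -> dict:
    """Run the underlying tool with the arguments the model supplied."""
    return skill_def["function"](**arguments)


# The stripped definition is plain data, so it can be serialized for an API call.
payload = json.dumps(model_facing_definition(skill))
result = invoke(skill, {"city": "Berlin"})
```

Trying to serialize the full skill dictionary would fail, because a Python function is not JSON-serializable. That is one reason many designs store the handler separately from the definition.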

APIs and frameworks such as OpenAI's function calling, LangChain, and Microsoft's Semantic Kernel use slightly different terminology and structures, but the underlying concept is always the same: you are giving the agent a named, described, parameterized capability.

For the rest of this tutorial, we will use the word "skill" to mean the complete package: the description, the schema, and the implementation together.

CHAPTER FOUR: A TAXONOMY OF SKILLS

Not all skills are alike. Understanding the different categories of skills will help you design your agent systems more thoughtfully.

The first major category is retrieval skills. These skills fetch information from somewhere and return it to the agent. The weather example above is a retrieval skill. Other examples include searching a database, reading a file, querying an API, looking up a user's profile, or fetching the current price of a stock. Retrieval skills are read-only; they observe the world without changing it.

The second major category is action skills. These skills change something in the world. Sending an email, creating a calendar event, writing to a database, posting a message to a Slack channel, or executing a shell command are all action skills. Action skills have side effects, which means they require extra care in design and testing.

The third major category is computation skills. These skills perform calculations or transformations on data. Converting a temperature from Celsius to Fahrenheit, parsing a JSON document, calculating the distance between two GPS coordinates, or summarizing a long text are all computation skills. Computation skills are typically pure functions: given the same input, they always produce the same output, and they have no side effects. Summarization is the usual exception when it is backed by a language model, because the output can vary from one call to the next.

The fourth major category is orchestration skills. These are higher-level skills that themselves invoke other skills or agents. An orchestration skill might be "research and summarize a topic," which internally calls a web search skill, a text extraction skill, and a summarization skill in sequence. Orchestration skills are how you build hierarchical, multi-agent systems.

The fifth major category is memory skills. These skills interact with the agent's memory systems, allowing it to store information for later retrieval, recall past interactions, or update its knowledge base. Examples include save_to_memory, recall_from_memory, and update_user_preference.

Understanding which category a skill belongs to helps you think about its design constraints. Retrieval skills should be fast and reliable. Action skills should be idempotent where possible (meaning calling them twice produces the same result as calling them once) and should have clear error handling. Computation skills should be tested exhaustively. Orchestration skills should be designed with failure modes in mind.

CHAPTER FIVE: THE PURPOSE AND VALUE OF SKILLS

Now that we know what skills are, let us talk about why they exist and what problems they solve. This is not an academic question; understanding the purpose of skills will directly influence how you design them.

The first and most fundamental purpose of skills is to give agents access to the real world. A large language model, by itself, is a text-in, text-out system. It knows a great deal about the world from its training data, but it cannot look up today's stock price, cannot send an email, cannot read your company's internal database. Skills are the bridge between the agent's reasoning capabilities and the actual systems and data sources it needs to interact with.

The second purpose is to make agent behavior predictable and auditable. When an agent can only interact with the world through a defined set of skills, you know exactly what it can and cannot do. You can log every skill invocation, inspect the inputs and outputs, and build a complete audit trail of the agent's actions. This is enormously valuable for debugging, compliance, and trust-building.
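Because every action passes through a skill, an audit trail can be added in one place. The following is a minimal sketch using a decorator. The names audited and AUDIT_LOG are hypothetical, the log is an in-memory list rather than durable storage, and the wrapper accepts keyword arguments only, which matches how skills are usually called from a parsed argument dictionary.

```python
import functools
import time
from typing import Callable

# In-memory audit trail; a real system would write to durable storage.
AUDIT_LOG: list[dict] = []


def audited(skill: Callable) -> Callable:
    """Wrap a skill so every invocation is recorded, whether it succeeds or fails."""

    @functools.wraps(skill)
    def wrapper(**kwargs):
        entry = {"skill": skill.__name__, "arguments": kwargs, "started_at": time.time()}
        try:
            entry["result"] = skill(**kwargs)
            entry["status"] = "ok"
            return entry["result"]
        except Exception as exc:
            entry["status"] = "error"
            entry["error"] = repr(exc)
            raise
        finally:
            # Runs on both paths, so failed calls are logged too.
            AUDIT_LOG.append(entry)

    return wrapper


@audited
def convert_celsius_to_fahrenheit(celsius: float) -> float:
    return celsius * 9 / 5 + 32


convert_celsius_to_fahrenheit(celsius=100.0)
```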

The third purpose is to enable specialization and reuse. Once you have built a well-designed skill, you can give it to multiple agents. Your calendar skill can be used by your scheduling agent, your personal assistant agent, and your meeting-room-booking agent. Skills are the building blocks of a modular agent ecosystem.

The fourth purpose is to constrain the agent's action space. This might sound counterintuitive, but limiting what an agent can do is actually a safety feature. An agent that can only call a predefined set of skills cannot accidentally (or maliciously) do things you did not intend. The skill set defines the agent's "blast radius," and keeping that blast radius small and well-understood is a key principle of safe agent design.

The fifth purpose is to separate concerns cleanly. The agent's language model handles reasoning, planning, and natural language understanding. The skills handle the actual work of interacting with systems and data. This separation means you can upgrade the language model without rewriting your skills, and you can add new skills without retraining the model.

CHAPTER SIX: DESIGNING PERFECT SKILLS - THE FUNDAMENTALS

Designing a good skill is harder than it looks. A poorly designed skill will confuse the agent, produce unreliable results, and make your system fragile. A well-designed skill will be used correctly almost every time, will handle errors gracefully, and will be a joy to maintain.

Let us walk through the principles of good skill design one by one, with examples at each step.

PRINCIPLE 1: NAME YOUR SKILL LIKE A VERB PHRASE

The name of a skill is the first thing the agent sees when deciding whether to use it. A good skill name is a clear, unambiguous verb phrase that describes exactly what the skill does. Bad names are vague nouns like "weather" or "calendar." Good names are precise verb phrases like "get_current_weather" or "create_calendar_event."

Compare these two sets of skill names:

Bad:  weather, calendar, email, database, file
Good: get_current_weather, create_calendar_event, send_email, query_database, read_file

The good names immediately tell the agent (and the developer) what action the skill performs. The bad names are ambiguous: does "calendar" create events, delete them, or just read them?
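If you want to enforce this convention rather than rely on review, a small check at registration time is one option. The sketch below is only a heuristic: the ALLOWED_VERBS set is a hypothetical starting list that you would extend, and the function name is_verb_phrase_name is made up for this example.

```python
import re

# Hypothetical allowlist of leading verbs; extend it to match your own conventions.
ALLOWED_VERBS = {"get", "create", "delete", "find", "send", "query", "read", "update", "search"}

# snake_case with at least two words, e.g. "send_email".
SNAKE_CASE = re.compile(r"^[a-z]+(_[a-z0-9]+)+$")


def is_verb_phrase_name(name: str) -> bool:
    """True if the name is multi-word snake_case and starts with an allowed verb."""
    if not SNAKE_CASE.match(name):
        return False
    return name.split("_", 1)[0] in ALLOWED_VERBS


for candidate in ["weather", "calendar", "get_current_weather", "send_email"]:
    print(candidate, is_verb_phrase_name(candidate))
```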

PRINCIPLE 2: WRITE DESCRIPTIONS THAT TEACH THE AGENT WHEN TO USE THE SKILL

The description field is not just documentation for human developers. It is the primary signal the agent uses to decide whether a particular skill is appropriate for the current situation. A good description answers three questions: what does this skill do, when should you use it, and when should you not use it.

Here is an example of a weak description versus a strong description for the same skill:

Weak:  "Gets weather information."

Strong: "Retrieves real-time weather conditions for a specific city,
         including temperature, humidity, wind speed, and general
         conditions (sunny, cloudy, rainy, etc.). Use this skill
         whenever the user asks about current weather, temperature,
         or atmospheric conditions in a specific location. Do NOT
         use this for weather forecasts (use get_weather_forecast
         instead) or for historical weather data (use
         get_historical_weather instead). This skill requires an
         internet connection and may fail if the city name is
         misspelled or unrecognized."

The strong description is longer, but that length is doing real work. It tells the agent exactly what data it will receive, when to use this skill versus related skills, and what failure modes to expect. This level of detail dramatically reduces the chance that the agent will call the wrong skill or misinterpret the result.

PRINCIPLE 3: DESIGN PARAMETERS WITH PRECISION AND GENEROSITY

Every parameter in your skill should have a clear type, a clear description, and, where possible, examples or constraints. Do not assume the agent will infer what you mean; tell it explicitly.

Consider a skill for searching a product catalog. Here is a poorly designed parameter set:

# Poor parameter design - too vague, missing constraints and examples
search_products_skill_bad = {
    "name": "search_products",
    "description": "Searches for products.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "filters": {"type": "object"},
            "limit": {"type": "integer"}
        },
        "required": ["query"]
    }
}

Now here is the same skill with well-designed parameters:

# Good parameter design - precise types, rich descriptions, examples, constraints
search_products_skill_good = {
    "name": "search_products",
    "description": (
        "Searches the product catalog for items matching a text query. "
        "Returns a list of matching products with their names, prices, "
        "availability, and product IDs. Use this when the user wants to "
        "find, browse, or compare products. For best results, use specific "
        "product names or categories rather than vague terms."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": (
                    "The search term or phrase. Should be a product name, "
                    "category, or descriptive phrase. Examples: "
                    "'wireless headphones', 'red running shoes size 42', "
                    "'4K OLED television under 1000 euros'."
                )
            },
            "category": {
                "type": "string",
                "enum": [
                    "electronics", "clothing", "sports", "home",
                    "books", "toys", "food", "automotive"
                ],
                "description": (
                    "Optional category filter to narrow results. "
                    "Only specify this if the user has indicated a "
                    "specific product category."
                )
            },
            "max_price_euros": {
                "type": "number",
                "description": (
                    "Optional maximum price in euros. Only specify this "
                    "if the user has mentioned a budget or price limit. "
                    "Must be a positive number."
                ),
                "minimum": 0
            },
            "max_results": {
                "type": "integer",
                "description": (
                    "Maximum number of results to return. Defaults to 10. "
                    "Use a smaller number (3-5) for quick lookups and a "
                    "larger number (20-50) for comprehensive searches."
                ),
                "default": 10,
                "minimum": 1,
                "maximum": 100
            }
        },
        "required": ["query"]
    }
}

The difference is dramatic. The well-designed version gives the agent everything it needs to call the skill correctly in almost any situation. The enum for category steers the agent away from inventing category names that do not exist. The minimum constraint on max_price_euros signals that a negative number is invalid. The detailed description of query with examples helps the agent formulate effective search terms. A schema is guidance, not a guarantee: a model can still emit values that violate it, so validate the arguments before executing the skill.
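A schema only describes what is valid; something still has to check the arguments the model actually sends. Here is a minimal sketch of such a check. It is hand-rolled for illustration and handles only the 'required', 'enum', 'minimum', 'maximum', and 'default' keywords; the function name validate_arguments is hypothetical, the schema is a trimmed copy of the one above, and in production you would more likely use a full JSON Schema validator.

```python
def validate_arguments(schema: dict, arguments: dict) -> tuple[dict, list[str]]:
    """Check arguments against a small subset of JSON Schema.

    Returns the arguments with defaults filled in, plus a list of error strings.
    """
    errors: list[str] = []
    properties = schema.get("properties", {})

    for name in schema.get("required", []):
        if name not in arguments:
            errors.append(f"missing required parameter '{name}'")

    for name, value in arguments.items():
        spec = properties.get(name)
        if spec is None:
            errors.append(f"unknown parameter '{name}'")
            continue
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"'{name}' must be one of {spec['enum']}")
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if "minimum" in spec and (not is_number or value < spec["minimum"]):
            errors.append(f"'{name}' must be a number >= {spec['minimum']}")
        if "maximum" in spec and (not is_number or value > spec["maximum"]):
            errors.append(f"'{name}' must be a number <= {spec['maximum']}")

    # Fill in declared defaults for parameters the model left out.
    cleaned = dict(arguments)
    for name, spec in properties.items():
        if "default" in spec and name not in cleaned:
            cleaned[name] = spec["default"]
    return cleaned, errors


schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "category": {"type": "string", "enum": ["electronics", "clothing", "sports"]},
        "max_price_euros": {"type": "number", "minimum": 0},
        "max_results": {"type": "integer", "default": 10, "minimum": 1, "maximum": 100},
    },
    "required": ["query"],
}

# An invented category and a negative price are both reported.
cleaned, errors = validate_arguments(
    schema, {"query": "headphones", "category": "gadgets", "max_price_euros": -5}
)
print(errors)
```

Returning the errors as strings, rather than raising, lets you hand them back to the agent so it can correct the call on its next step.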

PRINCIPLE 4: RETURN STRUCTURED, SELF-DESCRIBING DATA

The output of a skill is just as important as its input. When a skill returns data, that data needs to be interpretable by the agent without any additional context. Avoid returning raw, opaque data structures. Return clean, well-labeled dictionaries or objects that the agent can reason about directly.

Here is an example of a skill implementation that returns poorly structured data:

def get_user_info_bad(user_id: str) -> tuple:
    """Returns user information as a tuple."""
    # This is a bad practice - tuples are opaque and positional
    user = database.fetch_user(user_id)
    return (user[0], user[3], user[7], user[12])
    # What are positions 0, 3, 7, 12? Nobody knows without reading the DB schema.

And here is the same skill returning well-structured data:

def get_user_info_good(user_id: str) -> dict:
    """
    Retrieves profile information for a specific user.

    Args:
        user_id: The unique identifier for the user (e.g., 'usr_12345').

    Returns:
        A dictionary with the following keys:
        - user_id (str): The user's unique identifier.
        - full_name (str): The user's full display name.
        - email (str): The user's primary email address.
        - department (str): The organizational department the user belongs to.
        - role (str): The user's job role or title.
        - is_active (bool): Whether the user account is currently active.
        - created_at (str): ISO 8601 timestamp of account creation.

    Raises:
        ValueError: If the user_id format is invalid.
        UserNotFoundError: If no user exists with the given user_id.
    """
    if not user_id.startswith("usr_"):
        raise ValueError(f"Invalid user_id format: '{user_id}'. Must start with 'usr_'.")

    user = database.fetch_user(user_id)

    if user is None:
        raise UserNotFoundError(f"No user found with ID: {user_id}")

    return {
        "user_id": user.id,
        "full_name": user.display_name,
        "email": user.primary_email,
        "department": user.department_name,
        "role": user.job_title,
        "is_active": user.account_status == "active",
        "created_at": user.created_at.isoformat()
    }

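The well-structured version returns a dictionary with clearly named keys. The agent can read "department" and know it is getting the user's department. It can read "is_active" and know whether the account is usable. The docstring also documents the error cases. Keep in mind that the agent only sees this text if your framework copies the docstring into the skill description; otherwise it helps human maintainers only, and the failure modes you want the agent to anticipate belong in the description as well.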

PRINCIPLE 5: HANDLE ERRORS GRACEFULLY AND INFORMATIVELY

Skills will fail. Networks go down, APIs return unexpected responses, users provide invalid input. A skill that crashes with an unhandled exception is worse than useless; it breaks the agent's reasoning loop and may leave the system in an inconsistent state.

Good skill design means anticipating failure modes and returning informative error information that the agent can reason about and potentially recover from.

from dataclasses import dataclass
from typing import Optional


@dataclass
class SkillResult:
    """
    A standardized container for skill execution results.

    Using a consistent result wrapper across all skills makes it easier
    for the agent to handle both success and failure cases uniformly.
    """
    success: bool
    data: Optional[dict]
    error_code: Optional[str]
    error_message: Optional[str]
    retry_suggested: bool = False


def send_email(
    recipient_email: str,
    subject: str,
    body: str,
    sender_name: str = "AI Assistant"
) -> SkillResult:
    """
    Sends an email to a specified recipient.

    Args:
        recipient_email: The email address of the recipient.
        subject: The subject line of the email. Keep under 100 characters.
        body: The full text body of the email.
        sender_name: The display name to show as the sender.

    Returns:
        A SkillResult indicating success or failure. On success, data
        contains the message_id of the sent email. On failure, error_code
        and error_message describe what went wrong.
    """
    import re
    import smtplib

    # Validate email format before attempting to send
    email_pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    if not re.match(email_pattern, recipient_email):
        return SkillResult(
            success=False,
            data=None,
            error_code="INVALID_EMAIL",
            error_message=(
                f"The email address '{recipient_email}' does not appear to be "
                f"valid. Please verify the address and try again."
            ),
            retry_suggested=False
        )

    try:
        message_id = email_service.send(
            to=recipient_email,
            subject=subject,
            body=body,
            sender_name=sender_name
        )
        return SkillResult(
            success=True,
            data={"message_id": message_id, "recipient": recipient_email},
            error_code=None,
            error_message=None
        )

    except smtplib.SMTPRecipientsRefused:
        return SkillResult(
            success=False,
            data=None,
            error_code="RECIPIENT_REFUSED",
            error_message=(
                f"The email server refused to deliver to '{recipient_email}'. "
                f"The address may not exist or may be blocked."
            ),
            retry_suggested=False
        )

    except smtplib.SMTPServerDisconnected:
        return SkillResult(
            success=False,
            data=None,
            error_code="SERVER_UNAVAILABLE",
            error_message=(
                "The email server is temporarily unavailable. "
                "This is likely a transient network issue."
            ),
            retry_suggested=True
        )

    except smtplib.SMTPException as exc:
        # Any other SMTP failure: report it as data instead of letting the
        # exception propagate and break the agent's reasoning loop.
        return SkillResult(
            success=False,
            data=None,
            error_code="SEND_FAILED",
            error_message=f"The email could not be sent: {exc}",
            retry_suggested=False
        )

Notice how each error case returns a different error_code and a different error_message that explains what went wrong in plain language. The retry_suggested field tells the agent whether it makes sense to try again. This kind of rich error information allows the agent to make intelligent decisions about how to proceed when things go wrong.

CHAPTER SEVEN: HOW AGENTS PROCESS AND SELECT SKILLS

Now we come to one of the most fascinating parts of the whole system: how does an agent actually decide which skill to use, and what happens mechanically when it makes that decision?

The answer depends on the underlying language model and framework, but the general process is remarkably consistent across implementations. Let us walk through it step by step.

STEP 1: SKILL REGISTRATION

When you create an agent, you register a set of skills with it. This registration process takes the skill definitions (names, descriptions, parameter schemas) and formats them in a way that the language model can understand. In most modern frameworks, this means converting the skill definitions into a structured format (usually JSON Schema) and including them in the system prompt or in a special "tools" section of the API request.

Here is what a simplified agent setup looks like:

from typing import Any, Callable


class AgentSkillRegistry:
    """
    Manages the collection of skills available to an agent.

    This registry is responsible for storing skill definitions,
    validating skill calls, and dispatching execution to the
    appropriate underlying functions.
    """

    def __init__(self):
        # Maps skill names to their full definitions including the callable
        self._skills: dict[str, dict] = {}

    def register(self, skill_definition: dict, handler: Callable) -> None:
        """
        Registers a new skill with the agent.

        Args:
            skill_definition: The metadata dict with name, description,
                              and parameters schema.
            handler: The Python callable that implements the skill.

        Raises:
            ValueError: If a skill with the same name is already registered.
        """
        name = skill_definition["name"]
        if name in self._skills:
            raise ValueError(
                f"A skill named '{name}' is already registered. "
                f"Skill names must be unique within an agent."
            )
        self._skills[name] = {
            "definition": skill_definition,
            "handler": handler
        }

    def get_definitions_for_model(self) -> list[dict]:
        """
        Returns all skill definitions in the format expected by the
        language model API (without the handler callables, which the
        model does not need to know about).
        """
        return [
            entry["definition"]
            for entry in self._skills.values()
        ]

    def execute(self, skill_name: str, arguments: dict) -> Any:
        """
        Executes a skill by name with the given arguments.

        Args:
            skill_name: The name of the skill to execute.
            arguments: A dictionary of argument names to values.

        Returns:
            The return value of the skill's handler function.

        Raises:
            KeyError: If no skill with the given name is registered.
        """
        if skill_name not in self._skills:
            raise KeyError(
                f"No skill named '{skill_name}' is registered. "
                f"Available skills: {list(self._skills.keys())}"
            )
        handler = self._skills[skill_name]["handler"]
        return handler(**arguments)

This registry is the central hub of the skill system. It stores the definitions, provides them to the language model, and dispatches execution when the model decides to call a skill.
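The registry's execute method raises on an unknown name, and a handler raises TypeError when the arguments do not match its signature. In an agent loop you usually want those failures returned as data so the model can correct itself. Here is a minimal sketch of a dispatch wrapper that does this. It uses a plain dictionary of handlers instead of the class above to stay self-contained, the names HANDLERS and dispatch_skill_call are hypothetical, and it assumes the model supplies its arguments as a JSON-encoded string, as several function-calling APIs do.

```python
import json
from typing import Callable

# Hypothetical handler table standing in for the registry.
HANDLERS: dict[str, Callable] = {
    "convert_celsius_to_fahrenheit": lambda celsius: {"fahrenheit": celsius * 9 / 5 + 32},
}


def dispatch_skill_call(skill_name: str, arguments_json: str) -> str:
    """Run one model-requested skill call and return a JSON string for the model."""
    handler = HANDLERS.get(skill_name)
    if handler is None:
        return json.dumps({"ok": False, "error": f"unknown skill '{skill_name}'"})

    try:
        arguments = json.loads(arguments_json)
    except json.JSONDecodeError:
        return json.dumps({"ok": False, "error": "arguments were not valid JSON"})

    try:
        return json.dumps({"ok": True, "result": handler(**arguments)})
    except TypeError as exc:
        # Raised when the arguments do not match the handler's signature.
        return json.dumps({"ok": False, "error": f"bad arguments: {exc}"})


print(dispatch_skill_call("convert_celsius_to_fahrenheit", '{"celsius": 0}'))
print(dispatch_skill_call("launch_rocket", "{}"))
```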

STEP 2: THE MODEL RECEIVES THE SKILL DEFINITIONS

When the agent sends a request to the language model, it includes the skill definitions alongside the conversation history and the current user message. The model has been trained to understand these definitions and to produce structured "function call" outputs when it decides a skill should be invoked.

From the model's perspective, the skill definitions look something like this in the API request:

# This is a simplified representation of what gets sent to the language model API.
# Real APIs like OpenAI's have a specific format for this, but the concept is the same.

api_request = {
    "model": "gpt-4o",
    "messages": [
        {
            "role": "system",
            "content": "You are a helpful assistant with access to calendar tools."
        },
        {
            "role": "user",
            "content": "Can you schedule a 30-minute meeting with Alice tomorrow at 2pm?"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "create_calendar_event",
                "description": (
                    "Creates a new event in the user's calendar. Use this when "
                    "the user wants to schedule a meeting, appointment, or reminder."
                ),
                "parameters": {
                    "type": "object",
                    "properties": {
                        "title": {
                            "type": "string",
                            "description": "The title or name of the calendar event."
                        },
                        "date": {
                            "type": "string",
                            "description": "The date of the event in YYYY-MM-DD format."
                        },
                        "time": {
                            "type": "string",
                            "description": "The start time in HH:MM 24-hour format."
                        },
                        "duration_minutes": {
                            "type": "integer",
                            "description": "The duration of the event in minutes."
                        }
                    },
                    "required": ["title", "date", "time", "duration_minutes"]
                }
            }
        }
    ],
    "tool_choice": "auto"
}

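The "tool_choice": "auto" setting lets the model decide for itself whether to call a skill or respond with text. You can also set it to "required" to force the model to call at least one skill, to "none" to prevent skill calls entirely, or to an object that names a specific skill, such as {"type": "function", "function": {"name": "create_calendar_event"}}, to force the model to call that particular skill.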
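The possible values can be sketched as a small helper. The shapes below follow the OpenAI Chat Completions format used in the request above; the helper name build_tool_choice and its "specific" mode are hypothetical, and other providers use different field names.

```python
from typing import Optional, Union


def build_tool_choice(mode: str, skill_name: Optional[str] = None) -> Union[str, dict]:
    """Returns a tool_choice value in the OpenAI Chat Completions format."""
    if mode in ("auto", "none", "required"):
        # These three modes are passed through as plain strings
        return mode
    if mode == "specific":
        if not skill_name:
            raise ValueError("skill_name is required when mode is 'specific'")
        # Forcing one particular skill requires an object, not a bare name
        return {"type": "function", "function": {"name": skill_name}}
    raise ValueError(f"Unknown tool_choice mode: {mode!r}")


forced = build_tool_choice("specific", "create_calendar_event")
```

Forcing a specific skill is mostly useful in tests and in narrow pipelines where you already know which skill must run.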

STEP 3: THE MODEL DECIDES TO CALL A SKILL

The language model processes the request and decides that calling the create_calendar_event skill is the right thing to do. Instead of generating a text response, it generates a structured function call output:

# This is what the model's response looks like when it decides to call a skill.
# This is not code you write; it is what the API returns to you.

model_response = {
    "id": "chatcmpl-abc123",
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": None,
                "tool_calls": [
                    {
                        "id": "call_xyz789",
                        "type": "function",
                        "function": {
                            "name": "create_calendar_event",
                            "arguments": '{"title": "Meeting with Alice", "date": "2025-06-01", "time": "14:00", "duration_minutes": 30}'
                        }
                    }
                ]
            },
            "finish_reason": "tool_calls"
        }
    ]
}

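Notice that the model has filled in all the required parameters. It resolved "tomorrow" to a specific date, converted "2pm" to "14:00" in 24-hour format, and used "30" for the duration. Keep in mind that the model has no clock of its own: it can resolve relative dates reliably only if the current date is part of its context, for example in the system message. With that caveat, this is the language model doing what it does best: understanding natural language and converting it into structured data.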
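Relative expressions like "tomorrow" can only be resolved if the model knows what today is. One common approach is to put the current date into the system message on every request. The sketch below assumes UTC and a hypothetical helper name, build_system_message; a real calendar assistant would more likely use the user's own time zone.

```python
from datetime import datetime, timezone
from typing import Optional


def build_system_message(now: Optional[datetime] = None) -> dict:
    """Builds a system message that tells the model the current date and time."""
    now = now or datetime.now(timezone.utc)
    return {
        "role": "system",
        "content": (
            "You are a helpful assistant with access to calendar tools. "
            f"The current date is {now:%Y-%m-%d} ({now:%A}) and the "
            f"current time is {now:%H:%M} UTC."
        ),
    }


# A fixed timestamp makes the example reproducible
message = build_system_message(datetime(2025, 5, 31, 9, 30, tzinfo=timezone.utc))
```

With this message in place, a request sent on 2025-05-31 gives the model what it needs to turn "tomorrow" into 2025-06-01.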

STEP 4: YOUR CODE EXECUTES THE SKILL

Now your agent code takes over. It parses the model's response, extracts the skill name and arguments, and calls the appropriate handler through the registry:

import json


def handle_model_response(model_response: dict, registry: AgentSkillRegistry) -> str:
    """
    Processes a model response that contains a skill call request.

    This function extracts the skill name and arguments from the model's
    output, executes the skill, and returns the result as a JSON string
    that can be fed back to the model in the next turn.

    Args:
        model_response: The raw response dictionary from the language model API.
        registry: The skill registry to use for executing the skill.

    Returns:
        A JSON string containing the skill execution result.
    """
    choice = model_response["choices"][0]
    message = choice["message"]

    # Check if the model actually requested a skill call
    if not message.get("tool_calls"):
        # The model responded with text, not a skill call
        # "content" can be present but None, so fall back to an empty string
        return message.get("content") or ""

    # Process the first tool call (in this simplified example)
    tool_call = message["tool_calls"][0]
    skill_name = tool_call["function"]["name"]
    arguments_json = tool_call["function"]["arguments"]

    # Parse the arguments and execute the skill. Parsing happens inside the
    # try block because the model can occasionally emit malformed JSON.
    try:
        arguments = json.loads(arguments_json)
        result = registry.execute(skill_name, arguments)
        return json.dumps({
            "status": "success",
            "result": result
        })
    except Exception as e:
        return json.dumps({
            "status": "error",
            "error": str(e)
        })
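Before executing a skill, it can help to check the model's arguments against the skill's parameter schema, so that mistakes come back as clear messages rather than as exceptions from deep inside the handler. The following is a minimal sketch that checks only required keys, unexpected keys, and basic types; the function name validate_arguments is an assumption, and a dedicated JSON Schema library would cover far more of the specification.

```python
from typing import Any

# Maps JSON Schema type names to the Python types that json.loads produces
_JSON_TYPES: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "integer": (int,),
    "number": (int, float),
    "boolean": (bool,),
    "object": (dict,),
    "array": (list,),
}


def validate_arguments(arguments: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Returns a list of human-readable problems; an empty list means valid."""
    problems = []
    properties = schema.get("properties", {})

    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"Missing required argument '{name}'.")

    for name, value in arguments.items():
        if name not in properties:
            problems.append(f"Unexpected argument '{name}'.")
            continue
        type_name = properties[name].get("type", "")
        expected = _JSON_TYPES.get(type_name)
        if expected is None:
            continue  # Unknown or unspecified type: skip the type check
        # bool is a subclass of int in Python, so reject it for numeric types
        bool_as_number = isinstance(value, bool) and bool not in expected
        if bool_as_number or not isinstance(value, expected):
            problems.append(f"Argument '{name}' should be of type {type_name}.")

    return problems


schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "duration_minutes": {"type": "integer"},
    },
    "required": ["title"],
}
problems = validate_arguments({"duration_minutes": "30"}, schema)
```

Returning the problem list to the model as an error result gives it a chance to correct the call on its next attempt.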

STEP 5: THE RESULT IS FED BACK TO THE MODEL

The skill execution result is added to the conversation history and sent back to the model. The model then generates a natural language response based on the result. This completes one full iteration of the agent loop.

from typing import Any


def run_agent_turn(
    user_message: str,
    conversation_history: list[dict],
    registry: AgentSkillRegistry,
    model_client: Any
) -> str:
    """
    Executes one complete turn of the agent loop.

    A "turn" consists of receiving a user message, potentially calling
    one or more skills, and producing a final natural language response.

    Args:
        user_message: The latest message from the user.
        conversation_history: The list of previous messages in the conversation.
        registry: The skill registry containing available skills.
        model_client: The language model API client.

    Returns:
        The agent's final natural language response to the user.
    """
    # Add the user's message to the conversation history
    conversation_history.append({
        "role": "user",
        "content": user_message
    })

    # Keep looping until the model produces a text response (not a skill call)
    while True:
        # Send the conversation to the model along with available skills
        response = model_client.chat.completions.create(
            model="gpt-4o",
            messages=conversation_history,
            tools=registry.get_definitions_for_model(),
            tool_choice="auto"
        )

        choice = response.choices[0]
        message = choice.message

        # Add the assistant's message to history (whether it's a skill call or text)
        conversation_history.append(message.model_dump())

        # If the model did not request any skill calls, this is the final
        # answer. Checking tool_calls (rather than finish_reason == "stop")
        # also ends the loop on other finish reasons such as "length".
        if not message.tool_calls:
            return message.content or ""

        # Otherwise execute every requested skill call
        for tool_call in message.tool_calls:
            try:
                arguments = json.loads(tool_call.function.arguments)
                skill_result = registry.execute(tool_call.function.name, arguments)
            except Exception as e:
                # Report the failure to the model instead of crashing the loop
                skill_result = {"status": "error", "error": str(e)}

            # Add the skill result to the conversation history
            # so the model can see what the skill returned
            conversation_history.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(skill_result)
            })
        # Loop back to let the model process the skill results
        # and either call more skills or produce a final response

This loop is the heart of the agent. It keeps running until the model decides it has enough information to give the user a final answer. In a complex task, the model might call five or ten skills before it has everything it needs. Each skill call adds more information to the conversation history, and the model uses all of that accumulated information to reason toward a solution.
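Because the model decides when the loop ends, it is prudent to cap the number of iterations so that a confused model cannot call skills forever. The sketch below shows one way to do that. The step callable is a hypothetical stand-in for one model request plus any skill executions, and the exception name is likewise an assumption.

```python
from typing import Callable, Optional


class AgentLoopLimitExceeded(RuntimeError):
    """Raised when the agent does not finish within the allowed iterations."""


def run_bounded_loop(step: Callable[[], Optional[str]], max_iterations: int = 10) -> str:
    """
    Calls step() until it returns a final answer or the budget runs out.

    step returns the final text, or None if the model asked for more
    skill calls and another round is needed.
    """
    for _ in range(max_iterations):
        answer = step()
        if answer is not None:
            return answer
    raise AgentLoopLimitExceeded(
        f"No final answer after {max_iterations} iterations."
    )


# Example: a fake step that needs two skill-calling rounds before answering
_rounds = iter([None, None, "The meeting is booked."])
final_answer = run_bounded_loop(lambda: next(_rounds), max_iterations=5)
```

The right limit depends on the task. A calendar assistant rarely needs more than a handful of rounds, while a research agent may legitimately need many more.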

CHAPTER EIGHT: BUILDING A COMPLETE SKILL FROM SCRATCH - A WORKED EXAMPLE

Let us now build a complete, production-quality skill from scratch. We will build a skill that searches a company's internal knowledge base and returns relevant articles. This is a realistic, common use case that illustrates all the principles we have discussed.

We start by thinking about what the skill needs to do. It needs to accept a search query, optionally filter by category or date range, search an index of documents, and return a list of relevant results with enough information for the agent to use them.

We also need to think about failure modes. The search service might be unavailable. The query might be empty or too long. The results might be empty. Each of these cases needs to be handled gracefully.

Let us start with the data models:

from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class KnowledgeBaseArticle:
    """
    Represents a single article from the company knowledge base.

    This dataclass serves as the canonical data model for knowledge base
    content throughout the skill system. Using a typed dataclass instead
    of a raw dictionary prevents subtle bugs caused by typos in key names.
    """
    article_id: str
    title: str
    summary: str
    content_preview: str  # First 500 characters of the full content
    category: str
    author: str
    published_at: datetime
    last_updated_at: datetime
    url: str
    relevance_score: float  # Between 0.0 (not relevant) and 1.0 (highly relevant)


@dataclass
class KnowledgeBaseSearchResult:
    """
    The complete result of a knowledge base search operation.

    Wrapping results in a dedicated class makes it easy to add metadata
    (like total_count and search_time_ms) without changing the interface
    of the skill itself.
    """
    articles: list[KnowledgeBaseArticle]
    total_count: int
    search_time_ms: float
    query_used: str
    success: bool
    error_message: Optional[str] = None

Now let us implement the actual search function. This is the "tool" part of the skill:

import time
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def search_knowledge_base(
    query: str,
    category: Optional[str] = None,
    max_results: int = 5,
    min_relevance_score: float = 0.3,
    published_after: Optional[str] = None
) -> KnowledgeBaseSearchResult:
    """
    Searches the company knowledge base for articles matching a query.

    This function connects to the internal search service, executes a
    semantic search using the provided query, applies any specified filters,
    and returns a ranked list of matching articles.

    Args:
        query: The search query. Should be a natural language question or
               a set of keywords. Maximum 500 characters. Cannot be empty.
        category: Optional category filter. Must be one of the valid
                  categories: 'hr', 'it', 'finance', 'legal', 'operations',
                  'product'. If None, searches all categories.
        max_results: Maximum number of articles to return. Must be between
                     1 and 20. Defaults to 5.
        min_relevance_score: Minimum relevance score (0.0 to 1.0) for an
                             article to be included in results. Higher values
                             mean stricter matching. Defaults to 0.3.
        published_after: Optional date filter in YYYY-MM-DD format. Only
                         returns articles published on or after this date.

    Returns:
        A KnowledgeBaseSearchResult containing the matching articles and
        metadata about the search operation.
    """
    start_time = time.monotonic()

    # --- Input Validation ---
    # We validate inputs here rather than relying on the caller to get it right.
    # This makes the skill robust even if the agent passes unexpected values.

    if not query or not query.strip():
        return KnowledgeBaseSearchResult(
            articles=[],
            total_count=0,
            search_time_ms=0.0,
            query_used=query,
            success=False,
            error_message=(
                "The search query cannot be empty. Please provide a "
                "meaningful search term or question."
            )
        )

    if len(query) > 500:
        # Truncate rather than reject, to be forgiving of long inputs
        logger.warning(
            "Knowledge base search query truncated from %d to 500 characters.",
            len(query)
        )
        query = query[:500]

    if not 1 <= max_results <= 20:
        logger.warning(
            "max_results value %d is out of range [1, 20]. Clamping to range.",
            max_results
        )
        max_results = max(1, min(20, max_results))

    valid_categories = {"hr", "it", "finance", "legal", "operations", "product"}
    if category is not None and category.lower() not in valid_categories:
        return KnowledgeBaseSearchResult(
            articles=[],
            total_count=0,
            search_time_ms=0.0,
            query_used=query,
            success=False,
            error_message=(
                f"Invalid category '{category}'. Valid categories are: "
                f"{', '.join(sorted(valid_categories))}."
            )
        )

    # Parse the date filter if provided
    date_filter = None
    if published_after is not None:
        try:
            date_filter = datetime.strptime(published_after, "%Y-%m-%d")
        except ValueError:
            return KnowledgeBaseSearchResult(
                articles=[],
                total_count=0,
                search_time_ms=0.0,
                query_used=query,
                success=False,
                error_message=(
                    f"Invalid date format '{published_after}'. "
                    f"Please use YYYY-MM-DD format, e.g., '2024-01-15'."
                )
            )

    # --- Execute the Search ---
    try:
        raw_results = knowledge_base_client.semantic_search(
            query=query.strip(),
            category_filter=category.lower() if category else None,
            limit=max_results * 2,  # Fetch extra to allow for score filtering
            after_date=date_filter
        )

        # Filter by minimum relevance score and take the top max_results
        filtered_results = [
            r for r in raw_results
            if r.score >= min_relevance_score
        ][:max_results]

        # Convert raw results to our typed data model
        articles = [
            KnowledgeBaseArticle(
                article_id=r.id,
                title=r.title,
                summary=r.summary,
                content_preview=r.content[:500] if r.content else "",
                category=r.category,
                author=r.author_name,
                published_at=r.published_at,
                last_updated_at=r.updated_at,
                url=f"https://kb.company.internal/articles/{r.id}",
                relevance_score=round(r.score, 3)
            )
            for r in filtered_results
        ]

        elapsed_ms = (time.monotonic() - start_time) * 1000

        return KnowledgeBaseSearchResult(
            articles=articles,
            total_count=len(articles),
            search_time_ms=round(elapsed_ms, 1),
            query_used=query,
            success=True
        )

    except ConnectionError as e:
        logger.error("Knowledge base search failed due to connection error: %s", e)
        return KnowledgeBaseSearchResult(
            articles=[],
            total_count=0,
            search_time_ms=0.0,
            query_used=query,
            success=False,
            error_message=(
                "Unable to connect to the knowledge base search service. "
                "This is likely a temporary network issue. Please try again "
                "in a few moments."
            )
        )

    except Exception as e:
        logger.error("Unexpected error during knowledge base search: %s", e, exc_info=True)
        return KnowledgeBaseSearchResult(
            articles=[],
            total_count=0,
            search_time_ms=0.0,
            query_used=query,
            success=False,
            error_message=(
                f"An unexpected error occurred during the search. "
                f"Error type: {type(e).__name__}. Please contact support "
                f"if this problem persists."
            )
        )
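The connection error message above asks the user to try again, but a brief outage can often be absorbed inside the skill itself. The following is a minimal sketch of a retry helper with exponential backoff that could wrap the call to the search client. The helper name and its defaults are assumptions, and the sleep function is injectable so the example runs without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Runs operation, retrying on ConnectionError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # Out of attempts: let the caller's handler report it
            # Wait 0.5s, 1s, 2s, ... between attempts
            sleep(base_delay_seconds * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


# Example: a fake search that fails twice before succeeding
attempts = {"count": 0}
delays: list[float] = []


def flaky_search() -> list[str]:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("search service unavailable")
    return ["VPN setup guide"]


results = call_with_retries(flaky_search, sleep=delays.append)
```

Only ConnectionError is retried here. Retrying on every exception would also repeat requests that can never succeed, such as a malformed query.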

Now we need to convert this KnowledgeBaseSearchResult into a format that the agent can easily reason about. We do this with a serialization function:

def serialize_search_result(result: KnowledgeBaseSearchResult) -> dict:
    """
    Converts a KnowledgeBaseSearchResult into a plain dictionary suitable
    for returning to the language model.

    The language model works with text and JSON, not Python objects, so
    we need to convert our typed objects into a serializable format.
    Dates are converted to ISO 8601 date strings (YYYY-MM-DD). Relevance
    scores were already rounded when the articles were built.

    Args:
        result: The search result to serialize.

    Returns:
        A plain dictionary that can be JSON-serialized and returned to
        the language model.
    """
    if not result.success:
        return {
            "success": False,
            "error": result.error_message,
            "articles": [],
            "total_count": 0
        }

    serialized_articles = []
    for article in result.articles:
        serialized_articles.append({
            "article_id": article.article_id,
            "title": article.title,
            "summary": article.summary,
            "content_preview": article.content_preview,
            "category": article.category,
            "author": article.author,
            "published_at": article.published_at.strftime("%Y-%m-%d"),
            "last_updated_at": article.last_updated_at.strftime("%Y-%m-%d"),
            "url": article.url,
            "relevance_score": article.relevance_score
        })

    return {
        "success": True,
        "total_count": result.total_count,
        "search_time_ms": result.search_time_ms,
        "query_used": result.query_used,
        "articles": serialized_articles
    }
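Writing the serializer by hand gives you full control over which fields the model sees. For comparison, here is a shorter alternative built on dataclasses.asdict with a fallback encoder for dates. ArticleStub is a hypothetical, trimmed-down stand-in for KnowledgeBaseArticle so that the sketch runs on its own.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date, datetime


@dataclass
class ArticleStub:
    """Hypothetical, trimmed-down stand-in for KnowledgeBaseArticle."""
    article_id: str
    title: str
    published_at: datetime
    relevance_score: float


def json_default(value: object) -> str:
    """Fallback encoder for types that json.dumps cannot handle natively."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


article = ArticleStub("kb-42", "Vacation policy", datetime(2024, 1, 15, 9, 0), 0.912)

# asdict converts the dataclass to a plain dict; json_default handles the datetime
payload = json.dumps(asdict(article), default=json_default)
```

The trade-off is that asdict exposes every field, including any internal ones added later, so the explicit serializer is usually the safer choice for data that ends up in a model's context.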

Finally, we define the complete skill definition that ties everything together:

KNOWLEDGE_BASE_SEARCH_SKILL = {
    "name": "search_knowledge_base",
    "description": (
        "Searches the company's internal knowledge base for articles, "
        "guides, policies, and documentation. Use this skill when the user "
        "asks questions about company policies, procedures, IT systems, HR "
        "processes, financial guidelines, legal requirements, or operational "
        "procedures. This skill performs semantic search, meaning it "
        "understands the meaning of the query, not just keyword matching. "
        "For best results, phrase the query as a natural language question "
        "or use descriptive keywords. Do NOT use this for real-time data "
        "like current stock prices or live system status; use the "
        "appropriate specialized skills for those purposes."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": (
                    "The search query. Phrase this as a natural language "
                    "question or a set of descriptive keywords. Examples: "
                    "'How do I request vacation days?', "
                    "'VPN setup instructions for Windows', "
                    "'expense reimbursement policy'. Maximum 500 characters."
                ),
                "maxLength": 500
            },
            "category": {
                "type": "string",
                "enum": ["hr", "it", "finance", "legal", "operations", "product"],
                "description": (
                    "Optional category to restrict the search. Only specify "
                    "this if you are confident about the category. If unsure, "
                    "omit this parameter to search all categories."
                )
            },
            "max_results": {
                "type": "integer",
                "description": (
                    "Maximum number of articles to return. Use 3-5 for quick "
                    "lookups and up to 20 for comprehensive research. "
                    "Defaults to 5."
                ),
                "default": 5,
                "minimum": 1,
                "maximum": 20
            },
            "published_after": {
                "type": "string",
                "description": (
                    "Optional date filter in YYYY-MM-DD format. Only returns "
                    "articles published on or after this date. Use this when "
                    "the user specifically asks for recent information."
                ),
                "pattern": "^\\d{4}-\\d{2}-\\d{2}$"
            }
        },
        "required": ["query"]
    }
}
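A definition in this shape still has to be wrapped before it matches the "tools" entries shown in Step 2. The sketch below shows that wrapping as a standalone function. The name to_chat_completions_tool is hypothetical, the target shape is the OpenAI Chat Completions format, and the example definition is shortened for brevity.

```python
def to_chat_completions_tool(skill_definition: dict) -> dict:
    """Wraps a skill definition in the 'tools' entry shape shown in Step 2."""
    return {
        "type": "function",
        "function": {
            "name": skill_definition["name"],
            "description": skill_definition["description"],
            "parameters": skill_definition["parameters"],
        },
    }


# A shortened definition, used only to keep the example small
example_definition = {
    "name": "search_knowledge_base",
    "description": "Searches the company's internal knowledge base.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

tool_entry = to_chat_completions_tool(example_definition)
```

Schema keywords such as "default", "minimum", and "pattern" act as guidance for the model rather than as guarantees, which is why the implementation function above still validates its own inputs.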

To register this skill with an agent, you would do the following:

def create_knowledge_assistant_agent() -> tuple[AgentSkillRegistry, dict]:
    """
    Creates and configures a knowledge assistant agent with the
    knowledge base search skill registered and ready to use.

    Returns:
        A tuple of (registry, system_config) ready to be used with
        the run_agent_turn function.
    """
    registry = AgentSkillRegistry()

    # Create a wrapper function that handles serialization
    def knowledge_base_search_handler(**kwargs) -> dict:
        """Wraps search_knowledge_base with serialization for the agent."""
        result = search_knowledge_base(**kwargs)
        return serialize_search_result(result)

    registry.register(
        skill_definition=KNOWLEDGE_BASE_SEARCH_SKILL,
        handler=knowledge_base_search_handler
    )

    system_config = {
        "role": "system",
        "content": (
            "You are a helpful internal assistant for company employees. "
            "You have access to the company knowledge base and can search "
            "it to answer questions about policies, procedures, and guidelines. "
            "Always cite the article title and URL when you use information "
            "from the knowledge base. If the search returns no results, "
            "acknowledge this honestly and suggest the user contact the "
            "relevant department directly."
        )
    }

    return registry, system_config

This complete example demonstrates how all the pieces fit together: the data models, the implementation function, the serialization layer, the skill definition, and the registration process.

CHAPTER NINE: ADVANCED SKILL PATTERNS

Once you have mastered the basics, there are several advanced patterns that will take your skill design to the next level.

PATTERN 1: SKILL CHAINING AND COMPOSITION

Sometimes a complex task requires a sequence of skills to be called in a specific order. While the agent can figure out this sequence on its own, you can also create higher-level "composite" skills that encapsulate common sequences. This is especially useful when the intermediate steps are implementation details that the agent does not need to reason about.

Consider a skill that "researches and summarizes a topic." Internally, it might call a web search skill, then a content extraction skill, then a summarization skill. But from the agent's perspective, it is just one skill: research_and_summarize.

def research_and_summarize(
    topic: str,
    max_sources: int = 3,
    summary_length: str = "medium"
) -> dict:
    """
    Researches a topic by searching the web, extracting content from
    the top results, and producing a coherent summary.

    This composite skill combines web search, content extraction, and
    summarization into a single operation. Use it when you need a
    comprehensive overview of a topic rather than just a list of links.

    Args:
        topic: The topic to research. Be specific for better results.
        max_sources: Number of web sources to consult (1-5). More sources
                     give a more comprehensive view but take longer.
        summary_length: How long the summary should be. One of:
                        'brief' (2-3 sentences), 'medium' (1-2 paragraphs),
                        'detailed' (3-5 paragraphs).

    Returns:
        A dictionary containing the summary text, the list of sources
        consulted, and metadata about the research process.
    """
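    # Keep max_sources within the documented 1-5 range
    max_sources = max(1, min(5, max_sources))

    if summary_length not in {"brief", "medium", "detailed"}:
        return {
            "success": False,
            "error": (
                f"Invalid summary_length '{summary_length}'. "
                "Use 'brief', 'medium', or 'detailed'."
            )
        }

    # Step 1: Search the web for relevant pages
    search_results = web_search_tool(query=topic, num_results=max_sources * 2)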

    if not search_results:
        return {
            "success": False,
            "error": f"No web results found for topic: '{topic}'"
        }

    # Step 2: Extract content from the top results
    extracted_contents = []
    sources_used = []

    for result in search_results[:max_sources]:
        content = extract_webpage_content(url=result["url"])
        if content and len(content) > 100:  # Skip nearly-empty pages
            extracted_contents.append(content[:3000])  # Limit per-source length
            sources_used.append({
                "title": result["title"],
                "url": result["url"]
            })

    if not extracted_contents:
        return {
            "success": False,
            "error": "Could not extract readable content from any search results."
        }

    # Step 3: Summarize the combined content
    combined_content = "\n\n---\n\n".join(extracted_contents)
    summary = summarize_text(
        text=combined_content,
        topic=topic,
        length=summary_length
    )

    return {
        "success": True,
        "topic": topic,
        "summary": summary,
        "sources": sources_used,
        "sources_consulted": len(sources_used)
    }

PATTERN 2: PARAMETERIZED SKILL FACTORIES

Sometimes you need to create many similar skills that differ only in their configuration. For example, you might have ten different databases, each of which needs a query skill. Instead of writing ten nearly-identical skill definitions, you can use a factory function to generate them:

from typing import Callable


def create_database_query_skill(
    database_name: str,
    database_description: str,
    allowed_tables: list[str],
    connection_string: str
) -> tuple[dict, Callable[..., dict]]:
    """
    Factory function that creates a database query skill for a specific database.

    This factory pattern allows you to create multiple similar skills without
    duplicating code. Each generated skill is fully independent and can be
    registered separately with different agents.

    Args:
        database_name: A short identifier for the database (e.g., 'sales_db').
        database_description: A human-readable description of what data this
                              database contains, used in the skill description.
        allowed_tables: List of table names the agent is permitted to query.
        connection_string: The database connection string.

    Returns:
        A tuple of (skill_definition, handler_function) ready for registration.
    """
    skill_definition = {
        "name": f"query_{database_name}",
        "description": (
            f"Executes a read-only SQL query against the {database_name} database. "
            f"{database_description} "
            f"Only SELECT queries are permitted. "
            f"Available tables: {', '.join(allowed_tables)}. "
            f"Returns results as a list of row dictionaries."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "sql_query": {
                    "type": "string",
                    "description": (
                        f"A valid SQL SELECT query for the {database_name} database. "
                        f"Must start with SELECT. Must only reference these tables: "
                        f"{', '.join(allowed_tables)}. "
                        f"Always include a LIMIT clause to avoid returning too many rows."
                    )
                },
                "max_rows": {
                    "type": "integer",
                    "description": "Maximum number of rows to return. Defaults to 50.",
                    "default": 50,
                    "minimum": 1,
                    "maximum": 1000
                }
            },
            "required": ["sql_query"]
        }
    }

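    def handler(sql_query: str, max_rows: int = 50) -> dict:
        """Executes the SQL query against the configured database."""
        # Security check: only allow a single SELECT statement. This prefix
        # test is a first line of defense only; also connect with a
        # read-only database role so writes fail even if a check is bypassed.
        normalized_query = sql_query.strip().rstrip(";").strip()
        if not normalized_query.upper().startswith("SELECT"):
            return {
                "success": False,
                "error": "Only SELECT queries are permitted. Modification queries are not allowed."
            }
        if ";" in normalized_query:
            return {
                "success": False,
                "error": "Only a single SQL statement is permitted per call."
            }

        # Keep max_rows within the range advertised in the skill definition
        max_rows = max(1, min(1000, max_rows))

        # NOTE: access to allowed_tables is NOT enforced here. In production,
        # parse the SQL and reject any table outside allowed_tables, or
        # enforce the restriction with database-level permissions.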

        try:
            with database_connection(connection_string) as conn:
                cursor = conn.cursor()
                cursor.execute(sql_query)
                columns = [desc[0] for desc in cursor.description]
                rows = cursor.fetchmany(max_rows)
                return {
                    "success": True,
                    "columns": columns,
                    "rows": [dict(zip(columns, row)) for row in rows],
                    "row_count": len(rows)
                }
        except Exception as e:
            return {
                "success": False,
                "error": f"Query execution failed: {str(e)}"
            }

    return skill_definition, handler

You can then use this factory to create skills for multiple databases:

# Create skills for two different databases using the same factory
sales_skill_def, sales_handler = create_database_query_skill(
    database_name="sales_db",
    database_description=(
        "Contains sales transactions, customer records, and revenue data "
        "for the current and previous fiscal years."
    ),
    allowed_tables=["transactions", "customers", "products", "regions"],
    connection_string="postgresql://sales-db.internal:5432/sales"
)

hr_skill_def, hr_handler = create_database_query_skill(
    database_name="hr_db",
    database_description=(
        "Contains employee records, department structures, and payroll data. "
        "Access is restricted to non-sensitive fields only."
    ),
    allowed_tables=["employees", "departments", "positions"],
    connection_string="postgresql://hr-db.internal:5432/hr"
)

# Register both skills with the agent
registry.register(sales_skill_def, sales_handler)
registry.register(hr_skill_def, hr_handler)
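The factory's handler leaves the table allowlist check as a placeholder. The following is a best-effort sketch of such a check using a regular expression. The function name is an assumption, and the approach has known gaps: it misses comma-separated table lists and quoted identifiers, and it can be fooled by text inside string literals. Treat it as an early warning for the agent, not as a security boundary, and keep database-level permissions as the real enforcement.

```python
import re

# Matches the identifier that follows FROM or JOIN, e.g. "FROM customers c"
_TABLE_REFERENCE = re.compile(
    r"\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_.]*)",
    re.IGNORECASE,
)


def find_disallowed_tables(sql_query: str, allowed_tables: list[str]) -> list[str]:
    """Returns the referenced table names that are not in allowed_tables."""
    allowed = {name.lower() for name in allowed_tables}
    referenced = {match.lower() for match in _TABLE_REFERENCE.findall(sql_query)}
    # Schema-qualified names such as "public.customers" are flagged unless
    # the qualified form itself appears in allowed_tables
    return sorted(referenced - allowed)


blocked = find_disallowed_tables(
    "SELECT e.name, s.amount FROM employees e JOIN salaries s ON s.emp_id = e.id",
    allowed_tables=["employees", "departments", "positions"],
)
```

A handler could call this before executing the query and return an error result that names the blocked tables, which tells the model exactly what to change.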

PATTERN 3: SKILLS WITH CONFIRMATION STEPS

For action skills that have significant real-world consequences (sending emails, deleting records, making purchases), it is often wise to implement a two-step confirmation pattern. The first step describes what will be done and returns a confirmation token. The second step performs the action, and the agent should call it only after the user has explicitly approved the summary from the first step.

import uuid
from datetime import datetime, timedelta


# In-memory store for pending actions (in production, use a persistent store)
_pending_actions: dict[str, dict] = {}


def prepare_bulk_email(
    recipient_list_name: str,
    subject: str,
    body: str
) -> dict:
    """
    Prepares a bulk email for sending but does NOT send it yet.

    This is the first step of a two-step email sending process. Call this
    skill to prepare the email and get a confirmation token. Then call
    confirm_and_send_bulk_email with the token to actually send it.
    This pattern prevents accidental mass emails.

    Args:
        recipient_list_name: The name of the mailing list to send to.
        subject: The email subject line.
        body: The email body text.

    Returns:
        A dictionary containing a confirmation_token, a summary of what
        will be sent, and the number of recipients.
    """
    recipient_count = mailing_list_service.get_count(recipient_list_name)

    if recipient_count == 0:
        return {
            "success": False,
            "error": f"No recipients found in list '{recipient_list_name}'."
        }

    # Generate a unique token for this pending action
    confirmation_token = str(uuid.uuid4())
    expires_at = datetime.utcnow() + timedelta(minutes=5)

    # Store the pending action
    _pending_actions[confirmation_token] = {
        "type": "bulk_email",
        "recipient_list_name": recipient_list_name,
        "recipient_count": recipient_count,
        "subject": subject,
        "body": body,
        "expires_at": expires_at
    }

    return {
        "success": True,
        "confirmation_token": confirmation_token,
        "summary": (
            f"Ready to send email to {recipient_count} recipients "
            f"in list '{recipient_list_name}' with subject '{subject}'. "
            f"This token expires in 5 minutes."
        ),
        "recipient_count": recipient_count,
        "expires_at": expires_at.isoformat()
    }


def confirm_and_send_bulk_email(confirmation_token: str) -> dict:
    """
    Executes a bulk email send that was previously prepared.

    This is the second step of the two-step email sending process. You
    must first call prepare_bulk_email to get a confirmation_token, then
    call this skill with that token to actually send the emails. Tokens
    expire after 5 minutes.

    Args:
        confirmation_token: The token returned by prepare_bulk_email.

    Returns:
        A dictionary indicating success or failure of the send operation.
    """
    pending = _pending_actions.get(confirmation_token)

    if pending is None:
        return {
            "success": False,
            "error": (
                "Invalid or expired confirmation token. "
                "Please call prepare_bulk_email again to get a new token."
            )
        }

    if datetime.utcnow() > pending["expires_at"]:
        del _pending_actions[confirmation_token]
        return {
            "success": False,
            "error": "This confirmation token has expired. Please prepare the email again."
        }

    # Execute the actual send
    try:
        result = email_service.send_bulk(
            list_name=pending["recipient_list_name"],
            subject=pending["subject"],
            body=pending["body"]
        )
        del _pending_actions[confirmation_token]
        return {
            "success": True,
            "emails_sent": result.sent_count,
            "message": f"Successfully sent {result.sent_count} emails."
        }
    except Exception as e:
        return {
            "success": False,
            "error": f"Send operation failed: {str(e)}"
        }

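This two-step pattern is a useful safety mechanism, but it does not enforce confirmation on its own. Because the first skill's response explicitly says the email has not been sent yet and summarizes what will happen, a well-prompted agent will usually describe the pending action to the user and ask for confirmation before calling the second skill. Nothing in the code prevents the agent from calling both skills back to back, though, so for high-stakes actions instruct the agent in its system prompt to wait for explicit user approval, or have the application require human approval before confirm_and_send_bulk_email is executed.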

CHAPTER TEN: BEST PRACTICES AND COMMON PITFALLS

We have covered a lot of ground. Now let us consolidate the most important lessons into a set of best practices, and let us also look at the pitfalls that catch even experienced developers off guard.

BEST PRACTICE 1: KEEP SKILLS SMALL AND FOCUSED

The single most important principle in skill design is that each skill should do exactly one thing. When you find yourself writing a skill that has a long list of optional parameters that enable completely different behaviors depending on which ones are set, that is a sign you need to split it into multiple skills.

A skill called manage_calendar that can create, read, update, and delete events based on an "operation" parameter is a bad design. Four separate skills (create_calendar_event, get_calendar_events, update_calendar_event, delete_calendar_event) are a much better design. The agent can reason about four specific capabilities far more reliably than about one vague, multi-mode capability.
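The split can be sketched as four small definitions. This is a minimal sketch, assuming a flat name/description/parameters layout for skill definitions; the make_skill helper and the parameter names are hypothetical and exist only for this illustration.

```python
# Hypothetical helper and parameter names, used only to illustrate the split.
def make_skill(name: str, description: str, properties: dict, required: list[str]) -> dict:
    """Builds a skill definition with a JSON-schema style parameter block."""
    return {
        "name": name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


# Shared parameter schema: event IDs come from another skill, never from the agent
EVENT_ID = {"type": "string", "description": "Event ID returned by get_calendar_events."}

CALENDAR_SKILLS = [
    make_skill(
        "create_calendar_event",
        "Creates one new calendar event. Does NOT modify existing events.",
        {
            "title": {"type": "string", "description": "Event title."},
            "start": {"type": "string", "description": "Start time in ISO 8601 format."},
        },
        ["title", "start"],
    ),
    make_skill(
        "get_calendar_events",
        "Lists the events on one date. Read-only.",
        {"date": {"type": "string", "description": "Date in YYYY-MM-DD format."}},
        ["date"],
    ),
    make_skill(
        "update_calendar_event",
        "Changes the title of one existing event. Does NOT create events.",
        {"event_id": EVENT_ID, "title": {"type": "string", "description": "New title."}},
        ["event_id", "title"],
    ),
    make_skill(
        "delete_calendar_event",
        "Permanently deletes one existing event.",
        {"event_id": EVENT_ID},
        ["event_id"],
    ),
]
```

Note that no definition carries an "operation" parameter: the choice of operation is expressed by which skill the agent calls, and each skill only accepts the parameters that make sense for it.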

BEST PRACTICE 2: MAKE SKILL DESCRIPTIONS ADVERSARIALLY CLEAR

When writing skill descriptions, imagine that the agent is going to try to use the skill in every wrong way possible. Your description should make it hard to misuse the skill without realizing it. Explicitly state what the skill does NOT do. Explicitly state what the valid input ranges are. Explicitly state what happens when the skill fails.
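As a rough illustration, a description that follows these three rules might look like the sketch below. The skill name list_low_stock_products, its limits, and the error format are hypothetical and chosen only for this example. The validation function enforces the same range that the description promises, so the two cannot drift apart silently.

```python
# Hypothetical skill: the name, limits, and error shape are illustrative only.
LOW_STOCK_SKILL = {
    "name": "list_low_stock_products",
    "description": (
        "Lists products whose stock is below their reorder threshold. "
        "Does NOT place orders, reserve items, or modify stock levels. "
        "max_results must be an integer from 1 to 50. "
        "On failure, returns {'success': False, 'error': <reason>} and changes nothing."
    ),
}


def validate_max_results(max_results) -> dict:
    """Enforces the input range that the description states."""
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(max_results, bool) or not isinstance(max_results, int):
        return {"success": False, "error": "max_results must be an integer."}
    if not 1 <= max_results <= 50:
        return {
            "success": False,
            "error": f"max_results must be between 1 and 50, got {max_results}.",
        }
    return {"success": True}
```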

BEST PRACTICE 3: LOG EVERY SKILL INVOCATION

Every time a skill is called, log the skill name, the input arguments, the output, and the execution time. This logging is invaluable for debugging agent behavior, auditing what the agent has done, and identifying skills that are being called incorrectly or too frequently.

import logging
import time
import json
import uuid
from functools import wraps
from typing import Any, Callable

logger = logging.getLogger("skill_audit")


def skill_audit_wrapper(skill_name: str, handler: Callable) -> Callable:
    """
    Wraps a skill handler with comprehensive audit logging.

    This decorator logs every skill invocation with its inputs, outputs,
    execution time, and success/failure status. All logs are written to
    the 'skill_audit' logger, which should be configured to write to a
    persistent, searchable log store in production.

    Args:
        skill_name: The name of the skill being wrapped.
        handler: The original skill handler function.

    Returns:
        A wrapped version of the handler with audit logging.
    """
    @wraps(handler)
    def wrapper(**kwargs) -> Any:
        start_time = time.monotonic()
        invocation_id = str(uuid.uuid4())[:8]  # Short ID for log correlation

        logger.info(
            "SKILL_CALL | id=%s | skill=%s | args=%s",
            invocation_id,
            skill_name,
            json.dumps(kwargs, default=str)
        )

        try:
            result = handler(**kwargs)
            elapsed_ms = (time.monotonic() - start_time) * 1000

            logger.info(
                "SKILL_SUCCESS | id=%s | skill=%s | elapsed_ms=%.1f | result=%s",
                invocation_id,
                skill_name,
                elapsed_ms,
                json.dumps(result, default=str)[:500]  # Truncate large outputs
            )
            return result

        except Exception as e:
            elapsed_ms = (time.monotonic() - start_time) * 1000
            logger.error(
                "SKILL_ERROR | id=%s | skill=%s | elapsed_ms=%.1f | error=%s: %s",
                invocation_id,
                skill_name,
                elapsed_ms,
                type(e).__name__,
                str(e),
                exc_info=True
            )
            raise

    return wrapper

BEST PRACTICE 4: IMPLEMENT RATE LIMITING AND TIMEOUTS

Skills that call external APIs or services should always have timeouts and rate limiting. An agent that gets stuck waiting for a skill that never returns will appear frozen to the user, and an agent stuck in a retry loop can exhaust API quotas and fill the language model's context window with repeated error messages.

import functools
import signal
from typing import TypeVar, Callable

T = TypeVar('T')


def with_timeout(seconds: int):
    """
    Decorator that adds a timeout to a skill handler.

    If the skill does not complete within the specified number of seconds,
    it raises a TimeoutError. This prevents the agent from getting stuck
    waiting indefinitely for a slow or unresponsive external service.

    Args:
        seconds: Maximum number of seconds to wait for the skill to complete.
    """
    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> T:
            def timeout_handler(signum, frame):
                raise TimeoutError(
                    f"Skill '{func.__name__}' timed out after {seconds} seconds. "
                    f"The external service may be slow or unavailable."
                )

            # Set the timeout alarm. SIGALRM is Unix-only and works only in the
            # main thread; elsewhere, run the call in a worker thread and wait
            # on it with a timeout instead.
            old_handler = signal.signal(signal.SIGALRM, timeout_handler)
            signal.alarm(seconds)

            try:
                result = func(*args, **kwargs)
                return result
            finally:
                # Always cancel the alarm, even if an exception occurred
                signal.alarm(0)
                signal.signal(signal.SIGALRM, old_handler)

        return wrapper
    return decorator


# Usage example: apply a 10-second timeout to a web search skill
@with_timeout(seconds=10)
def web_search(query: str, num_results: int = 5) -> list[dict]:
    """Searches the web and returns a list of result dictionaries."""
    return search_api_client.search(query=query, limit=num_results)
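The timeout decorator above covers only half of this best practice. Rate limiting can be sketched in the same decorator style. This is a minimal sliding-window sketch under stated assumptions: the with_rate_limit name and the clock parameter are hypothetical, the limiter is per process and not thread-safe, and it returns a structured error rather than raising so that the agent can read the message and back off.

```python
import functools
import time
from collections import deque
from typing import Callable


def with_rate_limit(
    max_calls: int,
    per_seconds: float,
    clock: Callable[[], float] = time.monotonic,
):
    """Allows at most max_calls invocations within any per_seconds window."""
    def decorator(func):
        call_times: deque = deque()  # Timestamps of recent accepted calls

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            now = clock()

            # Drop timestamps that have left the sliding window
            while call_times and now - call_times[0] >= per_seconds:
                call_times.popleft()

            if len(call_times) >= max_calls:
                retry_in = per_seconds - (now - call_times[0])
                return {
                    "success": False,
                    "error": (
                        f"Rate limit exceeded for '{func.__name__}'. "
                        f"Retry in {retry_in:.1f} seconds."
                    ),
                }

            call_times.append(now)
            return func(*args, **kwargs)

        return wrapper
    return decorator
```

The injectable clock exists only to make the limiter testable without sleeping. In a multi-process deployment the counters would need to live in a shared store rather than in process memory.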

PITFALL 1: THE AMBIGUOUS SKILL PROBLEM

One of the most common mistakes is creating skills whose descriptions overlap so much that the agent cannot reliably choose between them. If you have a skill called get_user_info and another called fetch_user_profile, and both descriptions say roughly the same thing, the agent will sometimes call the wrong one.

The solution is to make each skill's description explicitly differentiate it from similar skills. If you have two similar skills, mention the other one in each description and explain when to use each.
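For the two skills named above, differentiated descriptions might look like the following sketch. The split of responsibilities between the two skills (account data versus public profile) is an assumption made for illustration; what matters is that each description names its sibling and says when to prefer it.

```python
# Hypothetical descriptions: the division of fields between the two skills
# is assumed for this example.
USER_SKILL_DESCRIPTIONS = {
    "get_user_info": (
        "Returns account-level data for a user: email, role, and account status. "
        "Use this for login or permission questions. "
        "For display name, bio, or avatar, use fetch_user_profile instead."
    ),
    "fetch_user_profile": (
        "Returns the public profile of a user: display name, bio, and avatar URL. "
        "Use this when describing a person to someone else. "
        "For email, role, or account status, use get_user_info instead."
    ),
}
```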

PITFALL 2: THE MISSING CONTEXT PROBLEM

Skills often need context that the agent has but that is not in the skill's parameters. For example, a skill that creates a calendar event might need to know the user's timezone, but you do not want to require the agent to pass the timezone every time. The solution is to inject context at the handler level, not the skill definition level.

class ContextualSkillHandler:
    """
    A skill handler that has access to user context without requiring
    the agent to pass that context as explicit parameters.

    This pattern is useful for context that is stable within a session
    (like user timezone or language preference) but would be tedious and
    error-prone to pass as an explicit parameter on every skill call.
    """

    def __init__(self, user_context: dict):
        """
        Args:
            user_context: A dictionary of user-specific context values,
                          such as timezone, language, and preferences.
        """
        self._context = user_context

    def create_calendar_event(
        self,
        title: str,
        date: str,
        time: str,
        duration_minutes: int
    ) -> dict:
        """
        Creates a calendar event, automatically applying the user's timezone.

        The timezone is taken from the user context rather than being
        passed as a parameter, which simplifies the skill interface and
        prevents the agent from accidentally using the wrong timezone.
        """
        user_timezone = self._context.get("timezone", "UTC")

        # Convert the provided local time to UTC using the user's timezone
        local_datetime = parse_local_datetime(date, time, user_timezone)
        utc_datetime = convert_to_utc(local_datetime, user_timezone)

        return calendar_service.create_event(
            title=title,
            start_utc=utc_datetime,
            duration_minutes=duration_minutes,
            owner_user_id=self._context["user_id"]
        )

PITFALL 3: THE SIDE EFFECT SURPRISE

Action skills that modify state can cause serious problems if the agent calls them multiple times due to a misunderstanding or a retry loop. Always design action skills to be idempotent where possible, meaning calling them multiple times with the same arguments produces the same result as calling them once.

For skills that cannot be made idempotent (like sending an email), implement deduplication using a client-provided idempotency key:

def send_notification(
    user_id: str,
    message: str,
    notification_type: str,
    idempotency_key: str
) -> dict:
    """
    Sends a notification to a user.

    The idempotency_key parameter prevents duplicate notifications if this
    skill is called multiple times with the same key. The agent should
    generate a unique key for each distinct notification it intends to send,
    and reuse the same key if retrying a failed send.

    Args:
        user_id: The ID of the user to notify.
        message: The notification message text.
        notification_type: One of 'email', 'sms', 'push'.
        idempotency_key: A unique string identifying this specific notification
                         intent. If a notification with this key was already
                         sent successfully, this call is a no-op.

    Returns:
        A dictionary indicating whether the notification was sent or was
        already delivered (idempotent duplicate).
    """
    # Check if we already processed this idempotency key
    existing = notification_store.find_by_idempotency_key(idempotency_key)
    if existing is not None:
        return {
            "success": True,
            "was_duplicate": True,
            "notification_id": existing.id,
            "message": "Notification was already sent; this is a duplicate request."
        }

    # Send the notification and store the idempotency key
    notification_id = notification_service.send(
        user_id=user_id,
        message=message,
        type=notification_type
    )
    notification_store.record(
        idempotency_key=idempotency_key,
        notification_id=notification_id
    )

    return {
        "success": True,
        "was_duplicate": False,
        "notification_id": notification_id,
        "message": "Notification sent successfully."
    }

PITFALL 4: THE SKILL OVERLOAD PROBLEM

Giving an agent too many skills is a real problem. Practical experience suggests that language models become less reliable at skill selection once they have more than roughly 20-30 skills available at once, although the exact threshold depends on the model and on how distinct the skill descriptions are. Every skill definition also consumes context tokens, and the model becomes more likely to choose the wrong skill or to miss a relevant one.

The solution is to use dynamic skill loading: instead of registering all possible skills at startup, load only the skills that are relevant to the current task. This can be done with a skill router that selects the appropriate subset of skills based on the user's request:

class DynamicSkillRouter:
    """
    Manages a large library of skills and dynamically selects the most
    relevant subset for each agent interaction.

    This router solves the "skill overload" problem by ensuring the agent
    never sees more than a manageable number of skills at once, while still
    having access to the full library when needed.
    """

    def __init__(self, max_skills_per_request: int = 15):
        """
        Args:
            max_skills_per_request: Maximum number of skills to provide to
                                    the agent for any single request. Lower
                                    values improve reliability but may miss
                                    relevant skills for complex tasks.
        """
        self._all_skills: list[dict] = []
        self._max_skills = max_skills_per_request

    def register_skill(self, skill_definition: dict, handler: callable) -> None:
        """Adds a skill to the full library."""
        self._all_skills.append({
            "definition": skill_definition,
            "handler": handler,
            # Extract keywords from the description for routing
            "keywords": self._extract_keywords(skill_definition["description"])
        })

    def get_relevant_skills(self, user_message: str) -> list[dict]:
        """
        Returns the most relevant skills for a given user message.

        Uses simple keyword overlap to select the skills most likely to be
        needed for the current request. A production router would typically
        add semantic similarity (for example, embedding-based search).

        Args:
            user_message: The user's current message or request.

        Returns:
            A list of skill definitions (without handlers) for the most
            relevant skills, up to max_skills_per_request.
        """
        message_lower = user_message.lower()

        # Score each skill by keyword overlap with the user message
        scored_skills = []
        for skill in self._all_skills:
            score = sum(
                1 for keyword in skill["keywords"]
                if keyword in message_lower
            )
            scored_skills.append((score, skill))

        # Sort by score descending, take the top N
        scored_skills.sort(key=lambda x: x[0], reverse=True)
        top_skills = [s["definition"] for _, s in scored_skills[:self._max_skills]]

        return top_skills

    def _extract_keywords(self, description: str) -> list[str]:
        """Extracts meaningful keywords from a skill description."""
        # In production, use a proper NLP keyword extractor
        # This is a simplified version for illustration
        stop_words = {"the", "a", "an", "is", "are", "this", "that", "use", "when"}
        # Strip surrounding punctuation so "policies," still matches "policies"
        words = [w.strip(".,;:!?()[]'\"") for w in description.lower().split()]
        return [w for w in words if len(w) > 3 and w not in stop_words]

PITFALL 5: THE HALLUCINATED PARAMETER PROBLEM

Language models sometimes "hallucinate" parameter values, meaning they make up plausible-sounding values that are actually wrong. For example, if you ask an agent to look up a user by name and the agent cannot find the user's ID, it might invent a plausible-looking user ID like "usr_12345" and pass it to your skill. Your skill then fails with a "user not found" error.

The solution is to validate all inputs strictly and return clear, informative error messages when validation fails. Also, design your skill workflows so that the agent retrieves IDs and other opaque identifiers from skills rather than constructing them from scratch.

Instead of: "Look up user John Smith" -> agent invents user_id "usr_john_smith" -> skill fails.

Do this: "Look up user John Smith" -> agent calls find_user_by_name("John Smith") -> gets back user_id "usr_98765" -> agent calls get_user_info("usr_98765") -> success.
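The validation half of this advice can be sketched as follows. This is a minimal sketch, assuming user IDs have the form "usr_" followed by five digits, as in the examples above; the ID pattern and the in-memory _KNOWN_USERS store are hypothetical stand-ins for a real format rule and a real user service. The key point is that each error message tells the agent which skill to call to obtain a valid ID.

```python
import re

# Assumed ID format, inferred from the examples in this section
USER_ID_PATTERN = re.compile(r"usr_\d{5}")

# Stand-in for a real user store
_KNOWN_USERS = {"usr_98765": {"name": "John Smith"}}


def get_user_info(user_id: str) -> dict:
    """Returns user details, rejecting malformed or unknown IDs with guidance."""
    if not isinstance(user_id, str) or not USER_ID_PATTERN.fullmatch(user_id):
        return {
            "success": False,
            "error": (
                f"'{user_id}' is not a valid user ID. Do not construct user IDs; "
                "call find_user_by_name to obtain one."
            ),
        }

    user = _KNOWN_USERS.get(user_id)
    if user is None:
        return {
            "success": False,
            "error": (
                f"No user exists with ID '{user_id}'. "
                "Call find_user_by_name to obtain a valid ID."
            ),
        }

    return {"success": True, "user_id": user_id, "name": user["name"]}
```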

CHAPTER ELEVEN: TESTING YOUR SKILLS

Skills are code, and code needs tests. Testing skills for agent systems has some unique challenges compared to testing regular functions, because you need to test not just the function's behavior but also whether the skill definition is clear enough for the agent to use correctly.

There are three levels of testing for skills.

The first level is unit testing, which tests the skill's implementation function in isolation. This is standard software testing: you call the function with various inputs and verify the outputs. Pay special attention to edge cases and error conditions.

import pytest
from unittest.mock import patch, MagicMock
from datetime import datetime


class TestSearchKnowledgeBase:
    """
    Unit tests for the search_knowledge_base skill implementation.

    These tests verify the skill's behavior in isolation, using mocks
    to replace the actual knowledge base client. This allows us to test
    all code paths without requiring a real database connection.
    """

    def test_empty_query_returns_error(self):
        """An empty query should return a failure result, not raise an exception."""
        result = search_knowledge_base(query="")
        assert result.success is False
        assert result.error_message is not None
        assert "empty" in result.error_message.lower()

    def test_whitespace_only_query_returns_error(self):
        """A query containing only whitespace should be treated as empty."""
        result = search_knowledge_base(query="   ")
        assert result.success is False

    def test_invalid_category_returns_error(self):
        """An unrecognized category name should return a clear error."""
        result = search_knowledge_base(query="vacation policy", category="nonexistent")
        assert result.success is False
        assert "invalid category" in result.error_message.lower()

    def test_invalid_date_format_returns_error(self):
        """A date in the wrong format should return a clear error."""
        result = search_knowledge_base(
            query="recent updates",
            published_after="15-01-2024"  # Wrong format; should be YYYY-MM-DD
        )
        assert result.success is False
        assert "date format" in result.error_message.lower()

    @patch("knowledge_base_client.semantic_search")
    def test_successful_search_returns_articles(self, mock_search):
        """A successful search should return a list of articles."""
        # Arrange: set up the mock to return two fake articles
        mock_search.return_value = [
            MagicMock(
                id="art_001",
                title="Vacation Policy 2024",
                summary="Our vacation policy allows 25 days per year.",
                content="Full content here...",
                category="hr",
                author_name="HR Team",
                published_at=datetime(2024, 1, 15),
                updated_at=datetime(2024, 3, 1),
                score=0.92
            )
        ]

        # Act: call the skill with a valid query
        result = search_knowledge_base(query="vacation days policy")

        # Assert: verify the result structure and content
        assert result.success is True
        assert result.total_count == 1
        assert result.articles[0].title == "Vacation Policy 2024"
        assert result.articles[0].relevance_score == 0.92
        assert result.articles[0].article_id == "art_001"

    @patch("knowledge_base_client.semantic_search")
    def test_connection_error_returns_graceful_failure(self, mock_search):
        """A network error should return a helpful error message, not crash."""
        mock_search.side_effect = ConnectionError("Network unreachable")

        result = search_knowledge_base(query="any query")

        assert result.success is False
        assert "connect" in result.error_message.lower()
        assert result.articles == []

The second level is integration testing, which tests the skill with real external dependencies (or realistic test doubles). This verifies that the skill works correctly in the actual environment it will be deployed in.
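As a rough illustration of this second level, the sketch below runs a skill against a real SQL engine instead of a mock. It assumes an in-memory SQLite database as the realistic test double; the count_articles_in_category skill and the articles table are hypothetical and exist only for this example.

```python
import sqlite3


def count_articles_in_category(conn: sqlite3.Connection, category: str) -> dict:
    """Hypothetical skill handler that queries a real SQL database."""
    try:
        row = conn.execute(
            "SELECT COUNT(*) FROM articles WHERE category = ?",
            (category,),
        ).fetchone()
        return {"success": True, "count": row[0]}
    except sqlite3.Error as e:
        # Database errors become structured failures the agent can read
        return {"success": False, "error": f"Database query failed: {e}"}


def test_count_articles_against_real_database():
    """The skill should return correct counts from an actual SQL engine."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (id TEXT, category TEXT)")
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?)",
        [("art_001", "hr"), ("art_002", "hr"), ("art_003", "it")],
    )
    assert count_articles_in_category(conn, "hr") == {"success": True, "count": 2}
    assert count_articles_in_category(conn, "legal") == {"success": True, "count": 0}
    conn.close()


def test_missing_table_returns_graceful_failure():
    """A schema problem should produce a failure result, not an exception."""
    conn = sqlite3.connect(":memory:")  # No tables created
    result = count_articles_in_category(conn, "hr")
    assert result["success"] is False
    assert "Database query failed" in result["error"]
    conn.close()
```

Unlike the unit tests above, these tests exercise the actual SQL statement, so a typo in a column name or a schema mismatch shows up here rather than in production.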

The third level is agent-level testing, which tests whether the agent correctly selects and uses the skill in response to natural language inputs. This is the most important and most often neglected level of testing.

class TestAgentSkillSelection:
    """
    Agent-level tests that verify the agent correctly selects and uses
    the knowledge base search skill in response to user messages.

    These tests use a real language model (or a test model) to verify
    that the skill descriptions are clear enough for the agent to use
    the skills correctly.
    """

    def setup_method(self):
        """Set up a fresh agent with the knowledge base skill for each test."""
        self.registry, self.system_config = create_knowledge_assistant_agent()
        self.conversation_history = [self.system_config]

    def test_agent_calls_search_for_policy_question(self):
        """
        When asked about a company policy, the agent should call the
        knowledge base search skill rather than answering from memory.
        """
        skill_calls = []

        # Intercept skill calls to verify the agent uses the right skill
        original_execute = self.registry.execute
        def tracking_execute(skill_name, arguments):
            skill_calls.append({"skill": skill_name, "args": arguments})
            return {"success": True, "articles": [], "total_count": 0}
        self.registry.execute = tracking_execute

        # Run the agent with a policy question
        run_agent_turn(
            user_message="What is the company's policy on remote work?",
            conversation_history=self.conversation_history,
            registry=self.registry,
            model_client=test_model_client
        )

        # Verify the agent called the knowledge base search skill
        assert len(skill_calls) >= 1
        assert skill_calls[0]["skill"] == "search_knowledge_base"
        assert "remote work" in skill_calls[0]["args"]["query"].lower()

CHAPTER TWELVE: PUTTING IT ALL TOGETHER - A COMPLETE MINI-AGENT

Let us close the tutorial with a complete, working mini-agent that demonstrates everything we have learned. This agent is a simple "company assistant" that can answer questions about company policies, check the weather, and send notifications. It is small enough to understand completely but realistic enough to illustrate real-world patterns.

"""
company_assistant.py

A complete mini-agent demonstrating skills in Agentic AI.

This module implements a simple company assistant agent with three skills:
  1. search_knowledge_base - searches internal documentation
  2. get_current_weather   - fetches weather for a city
  3. send_notification     - sends a notification to a user

The agent uses the OpenAI API with function calling to select and invoke
skills based on user requests.

Usage:
    python company_assistant.py
"""

import json
import logging
import uuid
from dataclasses import dataclass
from typing import Any, Optional

# Configure logging so we can see what the agent is doing
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s | %(name)s | %(levelname)s | %(message)s"
)
logger = logging.getLogger("company_assistant")


# =============================================================================
# SKILL IMPLEMENTATIONS (the "tools")
# =============================================================================

def search_knowledge_base_impl(query: str, category: Optional[str] = None) -> dict:
    """
    Mock implementation of knowledge base search.
    In production, this would connect to a real search service.
    """
    logger.info("Searching knowledge base for: '%s' (category: %s)", query, category)

    # Simulated knowledge base entries for demonstration
    mock_articles = [
        {
            "title": "Remote Work Policy",
            "summary": "Employees may work remotely up to 3 days per week with manager approval.",
            "category": "hr",
            "url": "https://kb.company.internal/articles/remote-work-policy"
        },
        {
            "title": "Expense Reimbursement Guidelines",
            "summary": "Expenses up to 50 EUR can be reimbursed without a receipt. Above that, a receipt is required.",
            "category": "finance",
            "url": "https://kb.company.internal/articles/expense-reimbursement"
        },
        {
            "title": "VPN Setup Guide",
            "summary": "Download the company VPN client from the IT portal and use your SSO credentials.",
            "category": "it",
            "url": "https://kb.company.internal/articles/vpn-setup"
        }
    ]

    # Simple keyword matching for the mock
    query_lower = query.lower()
    matching = [
        article for article in mock_articles
        if any(word in article["title"].lower() or word in article["summary"].lower()
               for word in query_lower.split())
        and (category is None or article["category"] == category)
    ]

    return {
        "success": True,
        "total_count": len(matching),
        "articles": matching
    }


def get_current_weather_impl(city: str, units: str = "metric") -> dict:
    """
    Mock implementation of weather fetching.
    In production, this would call a real weather API.
    """
    logger.info("Fetching weather for: '%s' (units: %s)", city, units)

    # Simulated weather data
    mock_weather = {
        "berlin": {"temperature": 18, "conditions": "Partly cloudy", "humidity": 65},
        "new york": {"temperature": 24, "conditions": "Sunny", "humidity": 45},
        "london": {"temperature": 14, "conditions": "Rainy", "humidity": 80}
    }

    city_lower = city.lower()
    weather = mock_weather.get(city_lower)

    if weather is None:
        return {
            "success": False,
            "error": f"Weather data not available for '{city}'. Try a major city name."
        }

    temp = weather["temperature"]  # Mock data is stored in Celsius
    if units == "imperial":
        temp = round(temp * 9 / 5 + 32)
        unit_symbol = "F"
    else:
        # Treat any other value as metric so the value and symbol always agree
        unit_symbol = "C"

    return {
        "success": True,
        "city": city,
        "temperature": temp,
        "unit": unit_symbol,
        "conditions": weather["conditions"],
        "humidity_percent": weather["humidity"]
    }


def send_notification_impl(
    user_id: str,
    message: str,
    notification_type: str = "push"
) -> dict:
    """
    Mock implementation of notification sending.
    In production, this would call a real notification service.
    """
    logger.info(
        "Sending %s notification to user '%s': %s",
        notification_type, user_id, message[:50]
    )

    notification_id = f"notif_{uuid.uuid4().hex[:8]}"

    return {
        "success": True,
        "notification_id": notification_id,
        "message": f"Notification sent successfully to user '{user_id}'."
    }


# =============================================================================
# SKILL DEFINITIONS (the agent-facing metadata)
# =============================================================================

SKILLS = [
    {
        "type": "function",
        "function": {
            "name": "search_knowledge_base",
            "description": (
                "Searches the company internal knowledge base for articles about "
                "policies, procedures, IT guides, and operational information. "
                "Use this when the user asks about company rules, how to do something "
                "at the company, or needs information from internal documentation. "
                "Do NOT use this for real-time information like weather or stock prices."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": (
                            "A natural language search query. Examples: "
                            "'remote work policy', 'how to set up VPN', "
                            "'expense reimbursement rules'."
                        )
                    },
                    "category": {
                        "type": "string",
                        "enum": ["hr", "it", "finance", "legal", "operations"],
                        "description": (
                            "Optional category filter. Only specify if you are "
                            "confident about the category."
                        )
                    }
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": (
                "Retrieves current weather conditions for a specified city. "
                "Use this when the user asks about weather, temperature, or "
                "atmospheric conditions in a location. For forecasts, this skill "
                "is not appropriate; it only provides current conditions."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The name of the city, e.g. 'Berlin' or 'Tokyo'."
                    },
                    "units": {
                        "type": "string",
                        "enum": ["metric", "imperial"],
                        "description": "Temperature units. Use 'metric' for Celsius.",
                        "default": "metric"
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "send_notification",
            "description": (
                "Sends a push notification, email, or SMS to a specific user. "
                "Use this when the user explicitly asks to notify or alert someone. "
                "Always confirm the recipient and message content before calling this skill."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "user_id": {
                        "type": "string",
                        "description": "The unique identifier of the user to notify."
                    },
                    "message": {
                        "type": "string",
                        "description": "The notification message text. Keep under 200 characters."
                    },
                    "notification_type": {
                        "type": "string",
                        "enum": ["push", "email", "sms"],
                        "description": "The delivery channel for the notification.",
                        "default": "push"
                    }
                },
                "required": ["user_id", "message"]
            }
        }
    }
]


# =============================================================================
# SKILL DISPATCHER
# =============================================================================

def dispatch_skill(skill_name: str, arguments: dict) -> dict:
    """
    Routes a skill call to the appropriate implementation function.

    This dispatcher acts as the central hub between the agent's skill
    selection decisions and the actual implementation functions. Adding
    a new skill requires only adding an entry here and defining the
    implementation function above.

    Args:
        skill_name: The name of the skill to execute.
        arguments: The arguments to pass to the skill implementation.

    Returns:
        The result dictionary from the skill implementation.
    """
    skill_map = {
        "search_knowledge_base": search_knowledge_base_impl,
        "get_current_weather": get_current_weather_impl,
        "send_notification": send_notification_impl
    }

    if skill_name not in skill_map:
        return {
            "success": False,
            "error": f"Unknown skill: '{skill_name}'. Available: {list(skill_map.keys())}"
        }

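    handler = skill_map[skill_name]
    try:
        return handler(**arguments)
    except TypeError as exc:
        # The model supplied missing or unexpected arguments. Report this
        # back so the agent can correct the call instead of crashing the loop.
        return {
            "success": False,
            "error": f"Invalid arguments for skill '{skill_name}': {exc}"
        }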


# =============================================================================
# AGENT LOOP
# =============================================================================

def run_company_assistant(user_message: str, model_client: Any) -> str:
    """
    Runs the company assistant agent for a single user interaction.

    This function implements the complete agent loop: it sends the user's
    message to the language model, handles any skill calls the model requests,
    and returns the final natural language response.

    Args:
        user_message: The user's input message.
        model_client: An initialized OpenAI API client.

    Returns:
        The agent's final response as a plain text string.
    """
    conversation = [
        {
            "role": "system",
            "content": (
                "You are a helpful company assistant. You can search the company "
                "knowledge base, check the weather, and send notifications. "
                "Always be concise and cite your sources when using the knowledge base. "
                "If you cannot find relevant information, say so honestly."
            )
        },
        {
            "role": "user",
            "content": user_message
        }
    ]

    logger.info("Starting agent turn for message: '%s'", user_message[:80])

    # The agent loop: keep going until the model produces a text response
    max_iterations = 10  # Safety limit to prevent infinite loops
    for iteration in range(max_iterations):
        logger.info("Agent loop iteration %d", iteration + 1)

        response = model_client.chat.completions.create(
            model="gpt-4o",
            messages=conversation,
            tools=SKILLS,
            tool_choice="auto"
        )

        choice = response.choices[0]
        message = choice.message

        # Add the assistant's message to the conversation history
        conversation.append(message.model_dump(exclude_none=True))

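        # If the model requested no skill calls, this is the final response.
        # Checking tool_calls directly also covers finish reasons other than
        # "stop" (for example "length"), which would otherwise be silently
        # retried until the iteration limit.
        if not message.tool_calls:
            logger.info(
                "Agent produced final response after %d iterations (finish_reason=%s).",
                iteration + 1,
                choice.finish_reason
            )
            return message.content or ""

        # Otherwise, execute every skill call the model requested
        for tool_call in message.tool_calls:
            skill_name = tool_call.function.name

            try:
                arguments = json.loads(tool_call.function.arguments)
            except json.JSONDecodeError as exc:
                # Malformed arguments: report the problem back to the model
                # instead of crashing the loop
                skill_result = {
                    "success": False,
                    "error": f"Arguments for '{skill_name}' were not valid JSON: {exc}"
                }
            else:
                logger.info(
                    "Executing skill: '%s' with args: %s",
                    skill_name,
                    json.dumps(arguments)
                )
                skill_result = dispatch_skill(skill_name, arguments)

            logger.info(
                "Skill '%s' returned: success=%s",
                skill_name,
                skill_result.get("success", "unknown")
            )

            # Add the skill result to the conversation
            conversation.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(skill_result)
            })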

    # If we hit the iteration limit, return a graceful error
    logger.warning("Agent hit maximum iteration limit (%d).", max_iterations)
    return (
        "I was unable to complete your request within the allowed number of steps. "
        "Please try rephrasing your question or breaking it into smaller parts."
    )


# =============================================================================
# MAIN ENTRY POINT
# =============================================================================

if __name__ == "__main__":
    from openai import OpenAI

    client = OpenAI()  # Uses OPENAI_API_KEY environment variable

    print("Company Assistant ready. Type 'quit' to exit.")
    print("-" * 50)

    while True:
        user_input = input("You: ").strip()
        if user_input.lower() in ("quit", "exit", "q"):
            print("Goodbye!")
            break
        if not user_input:
            continue

        response = run_company_assistant(user_input, client)
        print(f"Assistant: {response}")
        print()

This complete example is a working agent. With a valid OpenAI API key in the OPENAI_API_KEY environment variable, you can run it and interact with it. Given the skill descriptions above, the model should call the knowledge base search skill when you ask about company policies, the weather skill when you ask about current weather, and the notification skill when you ask it to notify someone. Skill selection is the model's decision, so this behavior is likely but not guaranteed. The log output shows which skills are called, with which arguments, and whether each call succeeded.

CONCLUSION: THE BIGGER PICTURE

We have covered a tremendous amount of ground in this tutorial. Let us step back and look at the bigger picture.

Skills are the fundamental unit of capability in Agentic AI. They are the bridge between an agent's reasoning ability and the real world. Without well-designed skills, even the most powerful language model is trapped in a box, unable to take meaningful action. With well-designed skills, an agent becomes a genuine force multiplier: it can automate complex workflows, integrate disparate systems, and accomplish goals that would take a human hours of tedious work.

The key insights from this tutorial can be summarized as follows. First, a skill is not just a function; it is a function plus a rich description that teaches the agent when and how to use it. Second, the description field is arguably more important than the implementation, because it is what the agent uses to decide whether to call the skill at all. Third, skills should be small, focused, and self-contained, following the single-responsibility principle. Fourth, error handling is not optional; every skill must handle its failure modes gracefully and return informative error messages. Fifth, the agent loop is simple but powerful: observe, reason, call skills, observe results, repeat until done. Sixth, testing skills requires testing at three levels: unit tests for the implementation, integration tests for the real-world connections, and agent-level tests for the skill descriptions.

The field of Agentic AI is evolving rapidly. New frameworks, new models, and new patterns are emerging constantly. But the fundamental concept of skills, of giving agents well-defined, well-described capabilities to interact with the world, is stable and foundational. Everything else is built on top of it.

You now have the knowledge to build real agent systems. Start small: pick one task you want to automate, define two or three skills for it, and build the simplest possible agent that uses them. Then iterate. Add more skills. Handle more edge cases. Test more thoroughly. The path from "hello world" to production-grade agent systems is long, but you now know how to walk it.

Good luck, and build something remarkable.

APPENDIX: QUICK REFERENCE CARD

SKILL DEFINITION CHECKLIST

[ ] Name is a clear verb phrase (e.g., get_user_profile, not user)
[ ] Description explains WHAT the skill does
[ ] Description explains WHEN to use it
[ ] Description explains WHEN NOT to use it
[ ] All parameters have types and descriptions
[ ] Required parameters are marked as required
[ ] Optional parameters have sensible defaults
[ ] Enum constraints are used where applicable
[ ] Return value structure is documented
[ ] Error cases are documented
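
Several items on this checklist can be checked mechanically. The following is a minimal sketch, assuming the OpenAI-style definition layout used in this tutorial. The function name lint_skill_definition, the 40-character description threshold, and the "at least two words" naming rule are illustrative assumptions, not part of any library. It cannot judge whether a description is actually good; it only catches omissions.

```python
from typing import Any


def lint_skill_definition(skill: dict[str, Any]) -> list[str]:
    """Return a list of checklist violations for one skill definition.

    Hypothetical helper. Assumes the layout used in this tutorial:
    {"type": "function", "function": {"name", "description", "parameters"}}.
    """
    problems: list[str] = []
    fn = skill.get("function", {})
    name = fn.get("name", "")

    # Rough proxy for "clear verb phrase": at least two underscore-separated words.
    if len([part for part in name.split("_") if part]) < 2:
        problems.append(f"name '{name}' is not a multi-word verb phrase")

    # Arbitrary threshold: a very short description cannot say when to use the skill.
    if len(fn.get("description", "").strip()) < 40:
        problems.append("description is missing or too short")

    params = fn.get("parameters", {})
    properties = params.get("properties", {})
    for param_name, spec in properties.items():
        if "type" not in spec:
            problems.append(f"parameter '{param_name}' has no type")
        if not spec.get("description"):
            problems.append(f"parameter '{param_name}' has no description")

    # Every parameter listed as required must actually be declared.
    for required_name in params.get("required", []):
        if required_name not in properties:
            problems.append(f"required parameter '{required_name}' is not declared")

    return problems


GOOD_SKILL = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Retrieves current weather conditions for a specified city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "The name of the city."}
            },
            "required": ["city"],
        },
    },
}

BAD_SKILL = {
    "type": "function",
    "function": {
        "name": "user",
        "description": "User stuff.",
        "parameters": {
            "type": "object",
            "properties": {"id": {}},
            "required": ["id", "email"],
        },
    },
}

for problem in lint_skill_definition(BAD_SKILL):
    print("-", problem)
```

Running a check like this over every entry in your skill list before deployment is cheap, and it catches the kind of omission that otherwise only shows up as puzzling agent behavior.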

SKILL IMPLEMENTATION CHECKLIST

[ ] All inputs are validated before use
[ ] All external calls are wrapped in try/except
[ ] Error returns are informative and actionable
[ ] Return value is a plain dict (JSON-serializable)
[ ] Execution time is reasonable (add timeouts if needed)
[ ] Side effects are idempotent or use idempotency keys
[ ] Logging is in place for all invocations
[ ] Unit tests cover happy path and error cases
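
To make the checklist concrete, here is a small sketch of a skill implementation that follows it. Everything in it is hypothetical: the skill get_exchange_rate_impl, the SUPPORTED_CURRENCIES allow-list, and the injected fetch_rate backend are invented for illustration and do not appear elsewhere in this tutorial. The backend is passed in as a parameter so the skill can be unit tested without a network. Timeouts are assumed to be enforced inside the backend.

```python
import json
import logging
from typing import Callable

logger = logging.getLogger("skills")

# Hypothetical allow-list for this sketch.
SUPPORTED_CURRENCIES = {"USD", "EUR", "GBP", "JPY"}


def get_exchange_rate_impl(
    base: str,
    quote: str,
    fetch_rate: Callable[[str, str], float],
) -> dict:
    """Hypothetical skill: look up an exchange rate via an injected backend."""
    logger.info("get_exchange_rate called: base=%r quote=%r", base, quote)

    # 1. Validate all inputs before use.
    for label, value in (("base", base), ("quote", quote)):
        if not isinstance(value, str) or value.upper() not in SUPPORTED_CURRENCIES:
            return {
                "success": False,
                "error": (
                    f"Invalid {label} currency {value!r}. "
                    f"Supported codes: {sorted(SUPPORTED_CURRENCIES)}."
                ),
            }

    # 2. Wrap the external call so a backend failure becomes a result, not a crash.
    try:
        rate = float(fetch_rate(base.upper(), quote.upper()))
    except Exception as exc:  # deliberately broad at the skill boundary
        logger.exception("Rate backend failed")
        return {
            "success": False,
            "error": f"Rate lookup failed ({type(exc).__name__}). Try again later.",
        }

    # 3. Return a plain, JSON-serializable dict.
    return {"success": True, "base": base.upper(), "quote": quote.upper(), "rate": rate}


def fake_backend(base: str, quote: str) -> float:
    """Stand-in backend for tests: always returns the same rate."""
    return 1.25


def broken_backend(base: str, quote: str) -> float:
    """Stand-in backend that simulates an outage."""
    raise TimeoutError("backend timed out")


ok = get_exchange_rate_impl("usd", "eur", fake_backend)
invalid = get_exchange_rate_impl("usd", "xyz", fake_backend)
failed = get_exchange_rate_impl("usd", "eur", broken_backend)
print(json.dumps(ok))
```

Notice that all three outcomes (success, invalid input, and backend failure) come back as the same kind of dict with a success flag, so the agent loop never has to handle an exception from this skill.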

AGENT LOOP CHECKLIST

[ ] Maximum iteration limit is set
[ ] All skill calls are logged
[ ] Skill results are added to conversation history
[ ] Final response handles the case where the limit is hit
[ ] System prompt sets the agent's role and constraints
[ ] Skill count per request is kept manageable (under 20)
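
The first four items on this checklist can be verified without an API key by driving the loop with a scripted stand-in for the model. The sketch below is a simplified, provider-independent version of the loop, not the OpenAI-based implementation shown earlier. The names run_loop, scripted_model, stuck_model, and echo_dispatch, as well as the simplified message dictionaries, are assumptions made for illustration.

```python
import json
from typing import Any, Callable

LIMIT_MESSAGE = "I could not finish within the allowed number of steps."


def run_loop(
    model_step: Callable[[list[dict[str, Any]]], dict[str, Any]],
    dispatch: Callable[[str, dict[str, Any]], dict[str, Any]],
    user_message: str,
    max_iterations: int = 5,
) -> tuple[str, list[dict[str, Any]]]:
    """Simplified agent loop; returns (answer, conversation history)."""
    conversation: list[dict[str, Any]] = [{"role": "user", "content": user_message}]

    for _ in range(max_iterations):
        reply = model_step(conversation)
        conversation.append({"role": "assistant", **reply})

        calls = reply.get("tool_calls") or []
        if not calls:
            # No skill calls requested: this is the final answer.
            return reply.get("content") or "", conversation

        for call in calls:
            result = dispatch(call["name"], call["arguments"])
            # Every skill result goes back into the history.
            conversation.append(
                {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}
            )

    # The limit was hit: return a graceful message instead of looping forever.
    return LIMIT_MESSAGE, conversation


def scripted_model(conversation: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical model: calls one skill, then answers using its result."""
    if conversation[-1]["role"] == "tool":
        return {"content": "Done: " + conversation[-1]["content"]}
    return {"tool_calls": [{"id": "call_1", "name": "echo", "arguments": {"text": "hi"}}]}


def stuck_model(conversation: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical model that never stops requesting skills."""
    return {"tool_calls": [{"id": "call_x", "name": "echo", "arguments": {"text": "again"}}]}


def echo_dispatch(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Trivial dispatcher used only by this sketch."""
    return {"success": True, "skill": name, "echo": arguments["text"]}


answer, history = run_loop(scripted_model, echo_dispatch, "say hi")
stuck_answer, stuck_history = run_loop(stuck_model, echo_dispatch, "loop", max_iterations=3)
print(answer)
print(stuck_answer)
```

The stuck_model case is the important one: it confirms that a model which never stops calling skills still produces a bounded conversation and a readable final message.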
