Wednesday, November 19, 2025

Implementing the Smallest Possible LLM-based Agentic AI System




Introduction


An LLM-based Agentic AI represents a paradigm shift in how we interact with and leverage large language models. Instead of merely responding to a single prompt, an agent can perceive its environment, reason about a task, decide on an action, execute that action, and then observe the outcome, iteratively working towards a goal. For software engineers looking to understand this concept, building the smallest possible demo is an excellent way to demystify the process and illustrate both its fundamental simplicity and the areas where complexity can arise.

At its core, an agentic AI powered by a large language model functions as an autonomous entity capable of performing tasks that require multiple steps, external interactions, and adaptive decision-making. The large language model acts as the agent's "brain," responsible for understanding instructions, generating thoughts, formulating plans, and deciding which actions to take. This brain needs to interact with several other components to be truly agentic. These components include a memory to retain context and past observations, a set of tools or actions that allow the agent to interact with its environment, and a mechanism for perception to receive feedback from those interactions. The entire process is orchestrated by a control loop, which continuously cycles through perception, reasoning, action, and observation until the task is complete.
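
Sketched at its barest, this control loop fits in a few lines of Python. The snippet below is only a conceptual skeleton; the three callables are placeholders, not library functions, and the steps in this article will implement them concretely:

# Conceptual skeleton of an agent control loop.
# 'reason', 'parse_action', and 'execute_tool' are placeholder callables,
# not part of any library; Steps 1-5 below flesh them out.
def agent_loop(task, reason, parse_action, execute_tool):
    memory = [{"role": "user", "content": task}]  # accumulated context
    while True:
        thought = reason(memory)            # LLM proposes a thought/action
        action = parse_action(thought)      # extract the requested tool call
        if action is None:                  # no action requested:
            return thought                  # treat output as the final answer
        observation = execute_tool(action)  # act on the environment
        memory.append({"role": "system", "content": f"Observation: {observation}"})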

For our smallest possible program, we will choose a straightforward technology stack. Python is the natural choice due to its extensive libraries and ease of integration with large language models. To keep the setup minimal and avoid local model deployment complexities, we will utilize a commercial Large Language Model API, such as OpenAI's, which provides a robust and accessible interface for interacting with powerful models. While local models using frameworks like Hugging Face Transformers are viable for more advanced setups, an API simplifies our initial focus on agentic principles. Our tools will be simple Python functions, demonstrating how an agent can extend its capabilities beyond pure language generation.

Let us now proceed with the step-by-step implementation.


Step 1: Setting up the LLM Interaction


The first fundamental building block is the ability to communicate with the large language model. This involves sending a prompt and receiving a completion. We will use the OpenAI Python client for this purpose. You will need to install it first with 'pip install openai' and set your API key, preferably as an environment variable rather than hardcoded in the source.

Here is a basic Python function to interact with the LLM:


import os
from openai import OpenAI

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

def call_llm(prompt_messages):
    try:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",  # Or "gpt-4", "gpt-4o", etc.
            messages=prompt_messages,
            temperature=0.7,
            max_tokens=500,
            stop=["Observation:"]  # Crucial for parsing agent output
        )
        return response.choices[0].message.content
    except Exception as e:
        print(f"Error calling LLM: {e}")
        return "ERROR"


The 'prompt_messages' argument is a list of dictionaries, typically containing 'role' (e.g., "system", "user", "assistant") and 'content'. The 'stop' sequence is vital for agentic behavior; it tells the LLM to stop generating text once it outputs 'Observation:', which helps us parse its output reliably. Prompt engineering is key here: the way we structure these messages guides the LLM's behavior and reasoning process.
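
To make the message format concrete, here is a short, illustrative sketch of how 'call_llm' might be invoked (it assumes OPENAI_API_KEY is set in the environment):

# Illustrative message list in the OpenAI chat format.
example_messages = [
    {"role": "system", "content": "You are a helpful agent. Use tools when needed."},
    {"role": "user", "content": "What is 123 + 456?"},
]
reply = call_llm(example_messages)
# Because of stop=["Observation:"], generation halts before the model can
# fabricate its own 'Observation:' line; the control loop supplies the real one.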


Step 2: Defining Simple Tools


An agent distinguishes itself by its ability to use external tools. For our demo, we will create two very simple, illustrative tools: one for a mock web search and another for a basic calculator. These tools are just Python functions that the agent can "call."


def mock_web_search(query):
    """Simulates a web search and returns a predefined result."""
    print(f"Executing tool: mock_web_search with query '{query}'")
    if "Michael Stal" in query:
        return "Michael Stal is a software engineer in Munich."
    elif "current year" in query:
        return "The current year is 2025."  # Hardcoded for demo simplicity
    else:
        return f"No specific result found for '{query}'. This is a mock search."

def simple_calculator(expression):
    """Evaluates a simple mathematical expression."""
    print(f"Executing tool: simple_calculator with expression '{expression}'")
    try:
        # WARNING: Using eval() is dangerous in production code.
        # This is for demonstration purposes ONLY.
        result = str(eval(expression))
        return result
    except Exception as e:
        return f"Error evaluating expression: {e}"


These functions represent the external capabilities of our agent. The agent itself does not know how to perform a web search or calculate; it only knows how to *ask* for these operations to be performed by designated tools.
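
It is worth sanity-checking the tools in isolation before wiring them into the agent. The expected results below follow directly from the definitions above:

# Direct invocations, independent of any LLM.
print(simple_calculator("123 + 456"))   # returns "579"
print(mock_web_search("current year"))  # returns the hardcoded 2025 answer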


Step 3: Implementing the Agent's Memory


To maintain context and allow for multi-turn interactions and iterative reasoning, an agent needs a form of memory. For this minimal demo, a simple Python list will suffice to store the conversation history and the agent's internal thoughts and observations. Each entry in this list will be a dictionary, similar to the 'messages' format used by the LLM API.


agent_memory = []

def add_to_memory(role, content):
    agent_memory.append({"role": role, "content": content})


This 'agent_memory' list will accumulate all interactions, including user queries, the agent's thoughts, actions, and observations from tools. When we construct the prompt for the LLM, we will pass this entire history, allowing the LLM to "remember" past steps.
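
After a single tool-using turn, the accumulated memory might look like the following snapshot (hypothetical contents, shown as comments; SYSTEM_PROMPT is defined in the next step):

# Hypothetical state of agent_memory after one Thought/Action/Observation cycle:
# [
#     {"role": "system", "content": SYSTEM_PROMPT},
#     {"role": "user", "content": "What is 123 + 456?"},
#     {"role": "assistant", "content": "Thought: ...\nAction: simple_calculator(expression='123 + 456')"},
#     {"role": "system", "content": "Observation: 579"},
# ]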


Step 4: Crafting the Agent's Prompt (The "Brain" Configuration)


This is perhaps the most critical part of defining our agent. The prompt acts as the agent's programming, instructing the LLM on its role, its goal, the tools it has access to, and how it should structure its output. We need to clearly define the expected output format so that our control loop can reliably parse the LLM's response.

The system message sets the stage:


SYSTEM_PROMPT = """
You are a helpful AI assistant named XYZ, designed to assist users.
You can use tools to gather information.
Your goal is to answer the user's question or complete their request.
You operate in a loop of Thought, Action, and Observation.
When you need to use a tool, output 'Action:' followed by the tool name and arguments.
Example:
Action: tool_name(arg1='value1', arg2='value2')

Available tools:
1. mock_web_search(query: str) -> str
   Description: Performs a mock web search for the given query. Useful for general knowledge questions.
2. simple_calculator(expression: str) -> str
   Description: Evaluates a simple mathematical expression. Useful for calculations.

When you have enough information to answer the user's request, output 'Final Answer:' followed by your answer.
Always start your reasoning with 'Thought:'.
If you need more information, continue the Thought/Action/Observation cycle.
"""


Notice the explicit instructions for 'Action:' and 'Final Answer:'. These are the signals our parsing logic will look for. The tool descriptions are also critical; the LLM uses these to understand when and how to use each tool.
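
To make the expected format concrete, a well-behaved response to "What is 123 + 456?" might look like the following (hypothetical model output, shown for illustration only):

Thought: This is an arithmetic question, so I should use the calculator tool.
Action: simple_calculator(expression='123 + 456')

Because the stop sequence from Step 1 halts generation at 'Observation:', the model cannot invent a tool result; the control loop executes the tool and feeds the genuine observation back into memory.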


Step 5: Building the Agent Control Loop


This is the heart of the agent, orchestrating the perceive-reason-act-observe cycle. It will be a loop that continues until the agent provides a final answer.


def run_agent(initial_task):
    add_to_memory("system", SYSTEM_PROMPT)
    add_to_memory("user", initial_task)

    print(f"User: {initial_task}")

    while True:
        # 1. Reason (LLM call)
        current_prompt_messages = agent_memory[:]  # Copy current memory for LLM
        llm_response = call_llm(current_prompt_messages)
        print(f"LLM Response: {llm_response}")

        if llm_response == "ERROR":
            print("Agent encountered an error with LLM call. Exiting.")
            break

        add_to_memory("assistant", llm_response)

        # 2. Parse LLM's response for Action or Final Answer
        if "Final Answer:" in llm_response:
            final_answer = llm_response.split("Final Answer:", 1)[1].strip()
            print(f"Agent Final Answer: {final_answer}")
            break
        elif "Action:" in llm_response:
            try:
                # Basic parsing for Action: tool_name(arg='value')
                action_line = llm_response.split("Action:", 1)[1].strip()
                # Find the first parenthesis to separate tool name from args
                tool_name_end_index = action_line.find('(')
                if tool_name_end_index == -1:
                    raise ValueError("Invalid action format: missing opening parenthesis.")

                tool_name = action_line[:tool_name_end_index].strip()
                args_str = action_line[tool_name_end_index:].strip()

                # Very basic argument parsing (assumes simple key='value' structure).
                # This is a highly simplified parser for the demo.
                # A robust solution would use ast.literal_eval or a proper parser.
                args = {}
                if args_str.startswith('(') and args_str.endswith(')'):
                    args_content = args_str[1:-1]  # Remove parentheses
                    if args_content:
                        # Split by comma, then by equals. Very brittle.
                        for arg_pair in args_content.split(','):
                            if '=' in arg_pair:
                                key, value = arg_pair.split('=', 1)
                                # Remove quotes from value if present
                                value = value.strip().strip("'").strip('"')
                                args[key.strip()] = value
                else:
                    raise ValueError("Invalid action format: arguments not enclosed in parentheses.")

                # 3. Execute Action
                observation = ""
                if tool_name == "mock_web_search":
                    query = args.get('query')
                    if query:
                        observation = mock_web_search(query)
                    else:
                        observation = "Error: mock_web_search requires a 'query' argument."
                elif tool_name == "simple_calculator":
                    expression = args.get('expression')
                    if expression:
                        observation = simple_calculator(expression)
                    else:
                        observation = "Error: simple_calculator requires an 'expression' argument."
                else:
                    observation = f"Unknown tool: {tool_name}"

                print(f"Observation: {observation}")
                add_to_memory("system", f"Observation: {observation}")

            except Exception as e:
                error_message = f"Agent parsing or execution error: {e}. LLM output was: {llm_response}"
                print(error_message)
                add_to_memory("system", f"Observation: {error_message}")
                # In a real agent, you might try to recover or ask the LLM to re-evaluate.
                # For this demo, we'll just continue the loop, hoping the LLM corrects.
        else:
            print("Agent did not output an Action or Final Answer. Continuing...")
            # This case means the LLM is still 'thinking' or generated an unexpected output.
            # In a real agent, you might have a maximum number of 'thought' steps.
            # For this demo, it will just loop back with the new LLM response in memory.



Putting It All Together


Now, let's combine all the pieces into a runnable script.


import os
from openai import OpenAI

# Initialize OpenAI client
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Agent Memory
agent_memory = []

def add_to_memory(role, content):
    agent_memory.append({"role": role, "content": content})

# LLM Interaction Function
def call_llm(prompt_messages):
    try:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=prompt_messages,
            temperature=0.7,
            max_tokens=500,
            stop=["Observation:"]
        )
        return response.choices[0].message.content
    except Exception as e:
        print(f"Error calling LLM: {e}")
        return "ERROR"

# Tool Definitions
def mock_web_search(query):
    print(f"Executing tool: mock_web_search with query '{query}'")
    if "Michael Stal" in query:
        return "Michael Stal is a software engineer in Munich."
    elif "current year" in query:
        return "The current year is 2025."  # Hardcoded for demo simplicity
    else:
        return f"No specific result found for '{query}'. This is a mock search."

def simple_calculator(expression):
    print(f"Executing tool: simple_calculator with expression '{expression}'")
    try:
        # WARNING: Using eval() is dangerous in production code.
        # This is for demonstration purposes ONLY.
        result = str(eval(expression))
        return result
    except Exception as e:
        return f"Error evaluating expression: {e}"

# System Prompt for the Agent
SYSTEM_PROMPT = """
You are a helpful AI assistant named XYZ, designed to assist users.
You can use tools to gather information.
Your goal is to answer the user's question or complete their request.
You operate in a loop of Thought, Action, and Observation.
When you need to use a tool, output 'Action:' followed by the tool name and arguments.
Example:
Action: tool_name(arg1='value1', arg2='value2')

Available tools:
1. mock_web_search(query: str) -> str
   Description: Performs a mock web search for the given query. Useful for general knowledge questions.
2. simple_calculator(expression: str) -> str
   Description: Evaluates a simple mathematical expression. Useful for calculations.

When you have enough information to answer the user's request, output 'Final Answer:' followed by your answer.
Always start your reasoning with 'Thought:'.
If you need more information, continue the Thought/Action/Observation cycle.
"""

# Agent Control Loop
def run_agent(initial_task):
    global agent_memory  # Access the global memory list
    agent_memory = []    # Reset memory for a new task

    add_to_memory("system", SYSTEM_PROMPT)
    add_to_memory("user", initial_task)

    print(f"User: {initial_task}\n")

    while True:
        current_prompt_messages = agent_memory[:]
        llm_response = call_llm(current_prompt_messages)
        print(f"LLM Response:\n{llm_response}\n")

        if llm_response == "ERROR":
            print("Agent encountered an error with LLM call. Exiting.")
            break

        add_to_memory("assistant", llm_response)

        if "Final Answer:" in llm_response:
            final_answer = llm_response.split("Final Answer:", 1)[1].strip()
            print(f"Agent Final Answer: {final_answer}\n")
            break
        elif "Action:" in llm_response:
            try:
                action_line = llm_response.split("Action:", 1)[1].strip()
                tool_name_end_index = action_line.find('(')
                if tool_name_end_index == -1:
                    raise ValueError("Invalid action format: missing opening parenthesis.")

                tool_name = action_line[:tool_name_end_index].strip()
                args_str = action_line[tool_name_end_index:].strip()

                args = {}
                if args_str.startswith('(') and args_str.endswith(')'):
                    args_content = args_str[1:-1]
                    if args_content:
                        for arg_pair in args_content.split(','):
                            if '=' in arg_pair:
                                key, value = arg_pair.split('=', 1)
                                value = value.strip().strip("'").strip('"')
                                args[key.strip()] = value
                else:
                    raise ValueError("Invalid action format: arguments not enclosed in parentheses.")

                observation = ""
                if tool_name == "mock_web_search":
                    query = args.get('query')
                    if query:
                        observation = mock_web_search(query)
                    else:
                        observation = "Error: mock_web_search requires a 'query' argument."
                elif tool_name == "simple_calculator":
                    expression = args.get('expression')
                    if expression:
                        observation = simple_calculator(expression)
                    else:
                        observation = "Error: simple_calculator requires an 'expression' argument."
                else:
                    observation = f"Unknown tool: {tool_name}"

                print(f"Observation: {observation}\n")
                add_to_memory("system", f"Observation: {observation}")

            except Exception as e:
                error_message = f"Agent parsing or execution error: {e}. LLM output was: {llm_response}"
                print(error_message)
                add_to_memory("system", f"Observation: {error_message}")
        else:
            print("Agent did not output an Action or Final Answer. Continuing...\n")

# Example Usage
if __name__ == "__main__":
    # Ensure OPENAI_API_KEY is set in your environment variables
    if not os.environ.get("OPENAI_API_KEY"):
        print("Please set the OPENAI_API_KEY environment variable.")
    else:
        print("--- Running Agent Demo 1: Simple Calculation ---")
        run_agent("What is 123 + 456?")
        print("\n" + "="*50 + "\n")

        print("--- Running Agent Demo 2: Web Search and Follow-up ---")
        run_agent("Who is Michael Stal? And which books did he co-author?")
        print("\n" + "="*50 + "\n")

        print("--- Running Agent Demo 3: Unknown Request ---")
        run_agent("Tell me a joke about a robot.")
        print("\n" + "="*50 + "\n")



Discussion on Complexity and Future Steps


This demo clearly illustrates the fundamental mechanism of an LLM-based agent. The core loop of reasoning, acting, and observing is surprisingly simple to implement. However, the path from this minimal example to a robust, production-ready agent introduces several layers of complexity.

One significant area of complexity is the robust parsing of the LLM's output. Our demo uses simple string splitting, which is highly brittle. Real-world agents often employ more sophisticated parsing techniques, such as regular expressions, Pydantic models for structured output, or even function calling features provided by LLM APIs, which can directly return structured JSON.
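
As one illustration, the string-splitting parser could be replaced with a regular expression. The sketch below shows one possible approach (the pattern is illustrative, not a standard):

import re

# Matches "Action: tool_name(...)"; the argument string still needs safe parsing.
ACTION_RE = re.compile(r"Action:\s*(\w+)\((.*)\)", re.DOTALL)

def parse_action(llm_response):
    match = ACTION_RE.search(llm_response)
    if match is None:
        return None  # no action requested
    return match.group(1), match.group(2)  # (tool_name, args_str)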

Another challenge lies in sophisticated memory management. A simple list of messages works for short interactions, but for longer, more complex tasks, this approach quickly becomes inefficient and hits context window limits. Advanced agents use techniques like summarization, retrieval-augmented generation (RAG) with vector databases, or hierarchical memory systems to manage vast amounts of information.
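
As a first step beyond the naive list, a sliding window can keep the prompt within budget. Here is a minimal sketch, assuming the message format used in this demo:

def trim_memory(memory, max_messages=20):
    """Keeps the system prompt plus the most recent messages.
    A crude sliding window; real agents would summarize or retrieve instead."""
    if len(memory) <= max_messages:
        return memory
    return [memory[0]] + memory[-(max_messages - 1):]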

Tool orchestration also grows in complexity. Our demo uses hardcoded tool calls. In a more advanced scenario, an agent might need to select from dozens or hundreds of tools, understand their interdependencies, and even dynamically generate tool arguments. Error handling and self-correction are crucial; what happens when a tool fails or the LLM generates an invalid action? Robust agents need mechanisms to detect and recover from such issues.
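
One incremental improvement over the hardcoded if/elif dispatch is a tool registry. The sketch below maps tool names to the demo's two functions, so dispatch becomes a single lookup and adding a tool requires only one extra entry:

# Dictionary-based tool registry; replaces the if/elif chain in run_agent.
TOOLS = {
    "mock_web_search": mock_web_search,
    "simple_calculator": simple_calculator,
}

def execute_tool(tool_name, args):
    tool = TOOLS.get(tool_name)
    if tool is None:
        return f"Unknown tool: {tool_name}"
    try:
        return tool(**args)  # args is the parsed keyword-argument dict
    except TypeError as e:
        return f"Error: bad arguments for {tool_name}: {e}"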

Finally, prompt engineering for reliability and consistency is an ongoing challenge. Crafting prompts that reliably elicit the desired agentic behavior, especially across different tasks and LLM versions, requires significant iteration and testing. Frameworks like LangChain, LlamaIndex, and CrewAI abstract away much of this complexity, providing pre-built components for memory, tools, and control loops, allowing developers to focus on the higher-level logic of their agents. While these frameworks are excellent for building real-world applications, understanding the fundamental mechanics demonstrated here is invaluable.


Conclusion


The core concept of an LLM-based agent, cycling through perception, reasoning, action, and observation, is fundamentally straightforward to grasp and implement. As demonstrated, a minimal working example can be constructed with relatively few lines of Python code and a commercial LLM API. The true complexity arises not from the basic loop itself, but from the need for robustness, scalability, and advanced features like sophisticated parsing, intelligent memory management, and comprehensive error handling, which are essential for real-world applications. This demo serves as a foundational stepping stone, encouraging software engineers to experiment further and explore the vast potential of agentic AI.
