Monday, May 19, 2025

Building a smart and autonomous Agentic AI system

INTRODUCTION


Large language models have revolutionized software engineering tasks such as summarization, question answering, and code generation. Even so, they hallucinate, they cannot access information outside the text on which they were trained, and they cannot execute multi-step procedures or interact with external systems without additional scaffolding. To build applications that are reliable and up to date, engineers enrich these models with external knowledge sources and orchestration. This enrichment typically combines retrieval from document stores, structured queries to knowledge graphs, planning modules that decompose requests into executable steps, and memory systems that track state over time. In this article, software engineers will learn how to assemble these pieces into an autonomous agent that follows a plan, invokes tools, and delivers accurate results.


RETRIEVAL-AUGMENTED GENERATION


Retrieval-Augmented Generation is a design pattern in which a user’s query is first used to fetch pertinent documents from an external repository. Those documents are then provided as context to the language model so that the model need not rely solely on its internal parameters for factual knowledge. By offloading factual information to an updatable store, RAG reduces hallucinations and keeps the system current without retraining.


The following code example demonstrates how to build a simple RAG pipeline with FAISS for vector similarity search, the OpenAI embeddings API for generating vector representations, and the GPT-4 chat completions endpoint for generation. The code first embeds a collection of documents and indexes them in FAISS. For each new query, the pipeline embeds the query, retrieves the top passages by similarity, assembles those passages into a prompt, and calls the language model to generate a grounded answer.


import faiss
import numpy as np
from openai import OpenAI

# A single OpenAI client serves both the embeddings endpoint and the
# GPT-4 chat completions endpoint used for generation.
client = OpenAI()
embedding_model = "text-embedding-ada-002"

def embed(text):
    # Helper that returns the embedding vector for a piece of text.
    return client.embeddings.create(model=embedding_model, input=text).data[0].embedding

# This section prepares the documents and builds a FAISS index.
# Each document is embedded using the OpenAI embedding model.
# The resulting vectors are added to an IndexFlatL2 structure,
# which supports fast nearest-neighbor searches by L2 distance.
documents = [
    "The Apollo missions landed humans on the Moon between 1969 and 1972.",
    "The International Space Station has been continuously inhabited since November 2000."
]
document_embeddings = [embed(d) for d in documents]
dimension = len(document_embeddings[0])
index = faiss.IndexFlatL2(dimension)
# FAISS expects float32 vectors.
index.add(np.array(document_embeddings, dtype="float32"))

def retrieve_and_generate(query, top_k=2):
    # The query is embedded with the same model as the documents.
    q_embedding = np.array([embed(query)], dtype="float32")
    # FAISS performs a similarity search to find the top-k documents.
    distances, indices = index.search(q_embedding, top_k)
    retrieved = [documents[i] for i in indices[0]]

    # The retrieved passages are formatted into a context string.
    context = "\n".join(f"Source {i+1}: {text}" for i, text in enumerate(retrieved))
    prompt = (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )

    # The assembled prompt is sent to GPT-4 via the chat completions endpoint.
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
    )
    return response.choices[0].message.content.strip()

# Example invocation of the RAG pipeline.
answer = retrieve_and_generate(
    "When did humans first live on the International Space Station?"
)
print(answer)



KNOWLEDGE GRAPH INTEGRATION


While RAG handles unstructured text, knowledge graphs capture facts as structured triples of entities and relationships. Querying a knowledge graph enables precise fact lookup and relationship traversal that text retrieval alone cannot guarantee. This can be particularly important when you need to validate or compute across relationships rather than rely on natural-language patterns.


The next example shows how to use RDFLib to load a Turtle file into a Graph object and then execute a SPARQL query to find entities related to a given concept. The results are formatted into a plain-text table that can be provided to the language model so that its reasoning can incorporate structured data.


from rdflib import Graph

# The graph is initialized and a local Turtle file is parsed.
# The file contains triples conforming to a custom schema
# under the example.org namespace.
g = Graph()
g.parse("knowledge_graph.ttl", format="turtle")

# This SPARQL query retrieves topics related to MachineLearning,
# along with their definitions. The PREFIX declaration
# maps the ex: prefix to the custom schema URI.
sparql = """
PREFIX ex: <http://example.org/schema#>

SELECT ?topic ?definition WHERE {
  ?topic ex:relatedTo ex:MachineLearning .
  ?topic ex:definition ?definition .
}
"""

results = g.query(sparql)

# The results are formatted as a simple table with two columns.
formatted = "Topic | Definition\n"
for row in results:
    # The URI is split to extract the local name of the topic.
    topic_name = str(row.topic).split("#")[-1]
    definition = str(row.definition)
    formatted += f"{topic_name} | {definition}\n"

# The table is then injected into a prompt for the model to reason over.
context = (
    "Here is a list of topics related to Machine Learning:\n"
    + formatted
)
prompt = (
    context
    + "\n\nBased on this information, which topic would be most relevant "
    "to neural network explainability?"
)
answer = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
    max_tokens=128,
).choices[0].message.content.strip()
print(answer)



AGENTIC ARCHITECTURE AND PLANNING


Beyond answering single queries, an autonomous agent needs to plan and execute multiple steps without human intervention. A practical pattern is to have the model generate a textual plan, then interpret each step, run it through the appropriate function—whether that is retrieval, a graph query, or a tool call—and then verify success before proceeding.


In the following code example, the generate_plan function asks GPT-4 to produce a numbered list of steps for a high-level task. The run_plan function then iterates over each step, calls execute_step to perform the work, and calls assess_success to ask the model whether the execution succeeded. If any step fails, the process aborts with an error.


import re

def generate_plan(task_description):
    # The prompt explicitly requests a step-by-step numbered plan.
    prompt = (
        "Create a step-by-step plan to accomplish the following task:\n\n"
        f"{task_description}\n\n"
        "Plan:\n1."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
        stop=["\n\n"],
    )
    plan_text = response.choices[0].message.content.strip()
    # Any leading numbering such as "2." is stripped from each non-empty line.
    steps = [
        re.sub(r"^\d+\.\s*", "", line.strip())
        for line in plan_text.split("\n")
        if line.strip()
    ]
    return steps


def execute_step(step):
    # This placeholder might dispatch to retrieve_and_generate or SPARQL.
    # For demonstration, it simply returns a formatted message.
    return f"Executed: {step}"


def assess_success(step_output):
    # The model is asked to judge success, answering yes or no.
    eval_prompt = (
        "The agent attempted the following step:\n\n"
        f"{step_output}\n\n"
        "Did it succeed? Answer yes or no with a brief justification."
    )
    eval_resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": eval_prompt}],
        max_tokens=64,
    )
    verdict = eval_resp.choices[0].message.content.strip().lower()
    return verdict.startswith("yes")


def run_plan(task):
    # The high-level task is decomposed into atomic steps.
    steps = generate_plan(task)
    for step in steps:
        output = execute_step(step)
        if not assess_success(output):
            raise RuntimeError(f"Step failed and aborted: {step}")
    return "All steps completed successfully."


# Example of running the agentic plan executor.
result = run_plan(
    "Compile a list of key milestones in space exploration and summarize them."
)
print(result)



ADDITIONAL ENHANCEMENTS


To build a production-grade autonomous agent, further components are often added. A long-term memory system can record past interactions and retrieved facts in a Redis key-value store so that future planning can reference earlier results. Tool invocation frameworks can let the agent execute side effects—such as sending an email or performing a calculation—instead of relying purely on generated text. Chain-of-thought prompting patterns encourage the model to reveal intermediate reasoning steps, improving transparency and debuggability. While each enhancement adds complexity, they address specific shortcomings: memory preserves context beyond a single session, tools provide deterministic capabilities, and chain-of-thought helps avoid reasoning errors.
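
As a brief illustration of the last point, the sketch below shows one way chain-of-thought prompting could be layered onto the same chat client used in the earlier examples. The function name answer_with_reasoning, the "Reasoning:" and "Final answer:" labels, and the prompt wording are illustrative assumptions rather than a fixed API; the idea is simply to ask the model to show its intermediate steps so they can be logged for inspection, while only the final answer is returned to the caller.

# A minimal chain-of-thought prompting sketch. The prompt wording and the
# "Reasoning:" / "Final answer:" labels are assumptions for illustration only.
def answer_with_reasoning(question):
    prompt = (
        "Answer the question below. First work through the problem step by step "
        "under a 'Reasoning:' heading, then state the result under a "
        "'Final answer:' heading.\n\n"
        f"Question: {question}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=256,
    )
    text = response.choices[0].message.content
    # The visible reasoning can be logged for transparency and debugging,
    # while only the final answer is handed back to the caller.
    if "Final answer:" in text:
        return text.split("Final answer:")[-1].strip()
    return text.strip()

print(answer_with_reasoning("Which flew first, the Space Shuttle or the Saturn V?"))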


IMPLEMENTATION EXAMPLE


The following complete Python implementation demonstrates an Agent class that unifies all of the above ideas. It initializes a FAISS index for document retrieval, loads an RDF graph for structured queries, uses Redis for memory, and provides a simple planner that dispatches each step to retrieval, a graph lookup, a date normalization tool that converts string dates to ISO format, or a direct model call. The class exposes a single entry point, answer_question, which runs the plan, invokes tools as needed, and logs each step’s result to memory.


import re

import faiss
import numpy as np
import redis
from dateutil import parser as date_parser
from openai import OpenAI
from rdflib import Graph


class Agent:
    def __init__(self, docs, ttl_graph_path, redis_url="redis://localhost:6379/0"):
        # Initialize the OpenAI client used for embeddings and chat completions.
        self.client = OpenAI()
        self.chat_model = "gpt-4"

        # Prepare document retrieval components.
        self.embedding_model = "text-embedding-ada-002"
        self.documents = docs
        embeddings = [self._embed(d) for d in docs]
        dim = len(embeddings[0])
        self.index = faiss.IndexFlatL2(dim)
        # FAISS expects float32 vectors.
        self.index.add(np.array(embeddings, dtype="float32"))

        # Load knowledge graph from Turtle.
        self.graph = Graph()
        self.graph.parse(ttl_graph_path, format="turtle")

        # Initialize Redis for long-term memory.
        self.memory = redis.Redis.from_url(redis_url)

    def _embed(self, text):
        # Embed a piece of text with the OpenAI embeddings endpoint.
        return self.client.embeddings.create(
            model=self.embedding_model, input=text
        ).data[0].embedding

    def _complete(self, prompt, max_tokens=256, stop=None):
        # Send a single-turn prompt to the chat completions endpoint.
        resp = self.client.chat.completions.create(
            model=self.chat_model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
            stop=stop,
        )
        return resp.choices[0].message.content.strip()

    def retrieve(self, query, top_k=2):
        q_emb = np.array([self._embed(query)], dtype="float32")
        _, idxs = self.index.search(q_emb, top_k)
        return [self.documents[i] for i in idxs[0]]

    def query_graph(self, sparql):
        return list(self.graph.query(sparql))

    def normalize_date(self, date_str):
        dt = date_parser.parse(date_str)
        return dt.date().isoformat()

    def log_memory(self, key, value):
        self.memory.set(key, value)

    def generate_plan(self, question):
        prompt = (
            "Decompose the following question into a sequence of actionable steps:\n\n"
            f"{question}\n\n"
            "Steps:\n1."
        )
        text = self._complete(prompt, max_tokens=256, stop=["\n\n"])
        # Any leading numbering such as "2." is stripped from each non-empty line.
        steps = [
            re.sub(r"^\d+\.\s*", "", line.strip())
            for line in text.split("\n")
            if line.strip()
        ]
        return steps

    def execute(self, question):
        plan = self.generate_plan(question)
        full_answer = []
        for i, step in enumerate(plan, start=1):
            # Decide which subsystem to invoke.
            if step.lower().startswith("retrieve"):
                docs = self.retrieve(question)
                result = "\n".join(f"- {d}" for d in docs)
            elif "sparql" in step.lower():
                sparql = step.split("SPARQL:")[-1].strip()
                rows = self.query_graph(sparql)
                result = str(rows)
            elif "date" in step.lower():
                # Extract example date from step for demonstration.
                example_date = "November 2000"
                norm = self.normalize_date(example_date)
                result = f"Normalized {example_date} to {norm}"
            else:
                # Fallback: call the LLM directly.
                result = self._complete(step, max_tokens=128)

            # Log result to memory with a step-specific key.
            mem_key = f"step_{i}"
            self.log_memory(mem_key, result)
            full_answer.append(f"Step {i}: {result}")

        # Combine and return the agent's final response.
        return "\n".join(full_answer)

    def answer_question(self, question):
        return self.execute(question)


# Example usage of the Agent:
documents = [
    "Saturn V launched Apollo 11 in July 1969.",
    "The Space Shuttle first flew in 1981."
]
agent = Agent(documents, "knowledge_graph.ttl")
output = agent.answer_question("Create a timeline of major U.S. human spaceflight vehicles.")
print(output)



CONCLUSION


By enriching a base language model with retrieval-augmented generation, knowledge graph queries, autonomous planning, memory storage, and tool invocation, software engineers can construct agents that go well beyond single-turn Q&A. Each component addresses a key weakness: retrieval grounds facts in updatable text, graphs enforce structured relationships, planning coordinates multi-step tasks, memory preserves context, and tools enable deterministic actions. When these pieces are integrated carefully and their interactions monitored, the result is a robust autonomous agent that follows a plan and delivers reliable, high-quality results.
