Thursday, April 24, 2025

Beyond RAG: Overcoming Context Loss and Relevance Gaps in Retrieval-Augmented Generation with Knowledge Graphs

Introduction


Retrieval-Augmented Generation (RAG) has emerged as a leading paradigm for enhancing large language models (LLMs) with external knowledge. By combining the generative power of LLMs with the precision of information retrieval, RAG systems can answer questions and perform tasks that require up-to-date or domain-specific information. However, as RAG has been adopted in real-world applications, its limitations have become increasingly apparent. Chief among these are the loss of context due to document chunking and the mismatch between similarity and true relevance during retrieval. In this article, we explore these challenges in depth and introduce a knowledge graph-based alternative that addresses the core weaknesses of RAG, offering a path toward more accurate, context-aware, and efficient AI systems. Code examples are provided to illustrate the concepts.


The Problems with RAG


1. Context Fragmentation


At the heart of RAG is the process of splitting documents into smaller chunks, typically a few sentences or paragraphs each. This is necessary because LLMs have a limited context window, and vector databases perform best with manageable chunk sizes. However, this chunking process is inherently lossy. Information that spans across chunk boundaries is severed, and the subtle interplay between different parts of a document is lost. For example, a legal document may define a term in one section and use it in another; chunking can break this connection, making it difficult for the system to provide accurate answers about the document’s meaning.


A typical RAG chunking and embedding process might look like this:


```
from sentence_transformers import SentenceTransformer
import numpy as np

# Example document
doc = "Aspirin is a medication used to reduce pain. It can also reduce fever and inflammation. Side effects include stomach upset and bleeding."

# Chunking (naive split by sentences)
chunks = doc.split('. ')

# Embed each chunk independently
model = SentenceTransformer('all-MiniLM-L6-v2')
embeddings = model.encode(chunks)

# The embeddings can now be stored in a vector database for similarity search
```


This approach loses the connection between "Aspirin" and its side effects if they are in different chunks.


2. Similarity vs. Relevance


RAG systems typically use vector similarity search to retrieve chunks that are "similar" to the user’s query. Embeddings capture semantic similarity, but this is not always aligned with what is truly relevant. For instance, a query about "side effects of a medication" might retrieve chunks that mention the medication in similar contexts but do not actually list side effects.


A typical similarity search might look like this:


```
from sklearn.metrics.pairwise import cosine_similarity

query = "What are the side effects of aspirin?"
query_embedding = model.encode([query])

# Rank the chunks by cosine similarity to the query
similarities = cosine_similarity(query_embedding, embeddings)
most_similar_idx = np.argmax(similarities)
print("Most relevant chunk:", chunks[most_similar_idx])
```


This may return a chunk that mentions "Aspirin" but not its side effects, highlighting the gap between similarity and true relevance.


A Knowledge Graph-Based Solution

To overcome these limitations, we propose a fundamentally different approach: using knowledge graphs to represent and retrieve information. A knowledge graph is a structured representation of entities, concepts, and their relationships, capturing not just isolated facts but the rich web of context that connects them.


1. Knowledge Graph Construction


The first step is to preprocess the document corpus to extract entities (such as people, organizations, chemicals, or legal terms), concepts (such as processes, events, or categories), and the relationships between them (such as "causes," "is part of," "is defined by," etc.). This can be done using a combination of natural language processing techniques, such as named entity recognition, relation extraction, and coreference resolution.
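The extraction step itself can be approximated with hand-written patterns. This is only a minimal sketch: the regex patterns stand in for trained NER and relation-extraction models, and carrying the subject across sentences stands in for coreference resolution ("It" → "Aspirin").

```
import re

DOC = ("Aspirin is a medication used to reduce pain. "
       "It can also reduce fever and inflammation. "
       "Side effects include stomach upset and bleeding.")

def extract_triples(text):
    # Hand-written patterns stand in for trained NER / relation-extraction
    # models; carrying `subject` across sentences stands in for
    # coreference resolution ("It" -> "Aspirin").
    triples, subject = [], None
    for sent in re.split(r"(?<=\.)\s+", text.strip()):
        if m := re.match(r"(\w+) is a medication used to reduce (.+?)\.?$", sent):
            subject, objs, relation = m.group(1), m.group(2), "treats"
        elif m := re.match(r"It can also reduce (.+?)\.?$", sent):
            objs, relation = m.group(1), "treats"
        elif m := re.match(r"Side effects include (.+?)\.?$", sent):
            objs, relation = m.group(1), "side_effect"
        else:
            continue
        for obj in re.split(r",\s*|\s+and\s+", objs):
            triples.append((subject, relation, obj.strip()))
    return triples

print(extract_triples(DOC))
```

The resulting (subject, relation, object) triples map directly onto graph edges.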


A simple example using the `networkx` library:


```
import networkx as nx

# Create a directed graph
G = nx.DiGraph()

# Add nodes (entities/concepts)
G.add_node("Aspirin", type="medication")
G.add_node("Pain", type="symptom")
G.add_node("Fever", type="symptom")
G.add_node("Inflammation", type="symptom")
G.add_node("Stomach upset", type="side_effect")
G.add_node("Bleeding", type="side_effect")

# Add edges (relationships)
G.add_edge("Aspirin", "Pain", relation="treats")
G.add_edge("Aspirin", "Fever", relation="treats")
G.add_edge("Aspirin", "Inflammation", relation="treats")
G.add_edge("Aspirin", "Stomach upset", relation="side_effect")
G.add_edge("Aspirin", "Bleeding", relation="side_effect")
```


2. Contextual Mapping and Preservation


Each node in the graph can be linked to its full context in the source document. For example, you can store the relevant sentence or paragraph as an attribute of the node.


```
G.nodes["Aspirin"]["context"] = "Aspirin is a medication used to reduce pain, fever, and inflammation."
G.nodes["Stomach upset"]["context"] = "Side effects include stomach upset."
G.nodes["Bleeding"]["context"] = "Side effects include bleeding."
```


3. Relevance-Driven Retrieval


When a user asks a question, the system can traverse the graph to find the most relevant nodes and their context, rather than relying on vector similarity.


For example, to answer "What are the side effects of aspirin?":


```
# Follow only the edges whose relation is "side_effect"
side_effects = [n for n in G.successors("Aspirin") if G["Aspirin"][n]["relation"] == "side_effect"]
for effect in side_effects:
    print(effect, ":", G.nodes[effect]["context"])
```


This will output:

```
Stomach upset : Side effects include stomach upset.
Bleeding : Side effects include bleeding.
```


4. LLM Integration and Reasoning


The LLM can be prompted with structured information from the graph, such as:


```
Aspirin is a medication used to reduce pain, fever, and inflammation. Its side effects include stomach upset and bleeding.
```


Or, for more advanced integration, the LLM can be fine-tuned to interpret graph-structured data directly, enabling it to synthesize information from multiple nodes and relationships.
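The simpler prompting route can be sketched by serializing an entity's outgoing edges into plain-text facts for the LLM's context window. The graph is rebuilt here for self-containment, and the prompt template and verb mapping are illustrative choices, not a fixed API:

```
import networkx as nx

# Rebuild a small graph for self-containment (same facts as above)
G = nx.DiGraph()
G.add_edge("Aspirin", "Pain", relation="treats")
G.add_edge("Aspirin", "Fever", relation="treats")
G.add_edge("Aspirin", "Stomach upset", relation="side_effect")
G.add_edge("Aspirin", "Bleeding", relation="side_effect")

# Map edge relations to natural-language verbs (illustrative choices)
VERBS = {"treats": "treats", "side_effect": "has side effect"}

def graph_to_prompt(G, entity):
    # Serialize the entity's outgoing edges into plain-text facts
    # that can be placed in the LLM's context window.
    facts = [f"{entity} {VERBS[G[entity][n]['relation']]} {n.lower()}."
             for n in G.successors(entity)]
    return "Answer using only these facts:\n" + "\n".join(facts)

print(graph_to_prompt(G, "Aspirin"))
```

Because the facts are explicit and complete for the entity, the LLM is grounded in exactly the relationships the graph asserts rather than whatever text happened to be retrieved.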


Advantages Over RAG

The knowledge graph-based approach offers several key advantages over traditional RAG systems:


- Context Preservation: By linking concepts to their full textual surroundings and related entities, the system preserves the rich context that is often lost in chunked retrieval.

- Improved Relevance: Retrieval is driven by explicit relationships and domain knowledge, rather than superficial similarity, leading to more accurate and useful answers.

- Cross-Document Reasoning: The graph structure enables the system to aggregate and synthesize information from multiple sources, providing a more comprehensive view of the topic.

- Efficiency: Once the knowledge graph is constructed, retrieval and reasoning can be faster and more targeted, reducing computational load and response time.

- Transparency: The graph provides a clear audit trail of how answers are derived, increasing user trust and enabling easier debugging and improvement.
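The cross-document point can be made concrete: once facts from several sources land in one graph, a two-hop traversal answers questions no single chunk contains. A sketch, in which the two documents and the `source` attribute are hypothetical:

```
import networkx as nx

# One graph built from facts extracted out of two hypothetical documents
G = nx.DiGraph()
G.add_edge("Aspirin", "Inflammation", relation="treats", source="doc_a")
G.add_edge("Ibuprofen", "Inflammation", relation="treats", source="doc_b")
G.add_edge("Aspirin", "Bleeding", relation="side_effect", source="doc_a")

# Two-hop traversal: which other drugs treat the same symptoms as Aspirin?
symptoms = [n for n in G.successors("Aspirin")
            if G["Aspirin"][n]["relation"] == "treats"]
alternatives = {d for s in symptoms for d in G.predecessors(s)
                if G[d][s]["relation"] == "treats" and d != "Aspirin"}
print(alternatives)  # {'Ibuprofen'}
```

The `source` attribute on each edge doubles as the audit trail mentioned above: every hop in the answer can be traced back to the document that asserted it.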


Trade-Offs and Challenges

While the knowledge graph approach offers significant benefits, it also introduces new challenges:


- Preprocessing Complexity: Building and maintaining a high-quality knowledge graph requires sophisticated NLP techniques and domain expertise. This increases the upfront investment in system development.
- Scalability: As the corpus grows, the graph can become large and complex, requiring efficient storage, indexing, and traversal algorithms.
- Dynamic Updates: Incorporating new information or correcting errors in the graph requires robust update mechanisms and quality control.
- LLM Adaptation: Training or prompting LLMs to effectively use structured graph data is an active area of research and may require custom architectures or fine-tuning.
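For the dynamic-update point, one lightweight pattern is an upsert helper that versions edges, so corrections are auditable rather than silent. A sketch under stated assumptions: the `upsert_fact` helper and `version` attribute are hypothetical conventions, not `networkx` features.

```
import networkx as nx

G = nx.DiGraph()
G.add_edge("Aspirin", "Pain", relation="treats", version=1)

def upsert_fact(G, subj, obj, relation):
    # Version each edge so updates are auditable rather than silent.
    if G.has_edge(subj, obj):
        G[subj][obj]["version"] += 1
    else:
        G.add_edge(subj, obj, version=1)
    G[subj][obj]["relation"] = relation

upsert_fact(G, "Aspirin", "Reye's syndrome", "risk_in_children")  # new fact
upsert_fact(G, "Aspirin", "Pain", "treats")  # re-assertion bumps the version
print(G["Aspirin"]["Pain"]["version"])  # 2
```

A production system would add provenance and review workflows on top, but the principle is the same: updates modify explicit facts, not opaque embedding vectors.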


Despite these challenges, the long-term benefits of improved accuracy, context preservation, and efficiency make the knowledge graph approach a compelling alternative to RAG, especially for applications where precision and trustworthiness are paramount.


Conclusion

RAG has played a crucial role in bridging the gap between LLMs and external knowledge, but its reliance on chunked embeddings and similarity search imposes fundamental limitations. By shifting to a knowledge graph-based paradigm, we can overcome context loss, improve relevance, and enable more sophisticated reasoning. While this approach demands greater investment in preprocessing and system design, it promises a new generation of AI systems that are not only more accurate and efficient, but also more transparent and trustworthy. As the field continues to evolve, knowledge graphs are poised to become a cornerstone of advanced AI applications, unlocking the full potential of large language models in knowledge-intensive domains.
