Welcome to the fascinating world of Large Language Models, Retrieval-Augmented Generation, and the Model Context Protocol! If you're reading this, you're about to embark on an exciting journey that will take you from complete beginner to someone who can build sophisticated AI-powered applications using the Go programming language and entirely open source components.
Think of this tutorial as your friendly guide through what might initially seem like a complex maze of technologies. We'll take everything step by step, explaining not just the "how" but also the "why" behind every concept and code snippet you'll encounter. Most importantly, we'll do this using only open source tools and models that you can run locally or deploy freely.
UNDERSTANDING THE FUNDAMENTAL CONCEPTS
Before we dive into writing code, let's establish a solid foundation by understanding what we're actually building. Imagine you're constructing a house - you wouldn't start laying bricks without first understanding the blueprint and having the right tools ready.
Large Language Models, commonly abbreviated as LLMs, are sophisticated artificial intelligence systems that have been trained on vast amounts of text data. Think of them as incredibly well-read assistants who can understand and generate human-like text. They're like having a conversation with someone who has read millions of books, articles, and documents, and can draw upon that knowledge to help you with various tasks.
In the open source world, we have access to excellent models like Llama 2, Code Llama, Mistral, and many others that can run locally on your machine without requiring external API calls or sending your data to third-party services. This gives you complete control over your data and eliminates ongoing costs.
However, LLMs have a significant limitation that we need to address. They're trained on data up to a certain point in time, and they don't have access to your specific, private, or real-time information. This is where Retrieval-Augmented Generation, or RAG, comes into play.
RAG is like giving your well-read assistant access to your personal library and filing cabinet. When you ask a question, the system first searches through your specific documents and data to find relevant information, then provides that context to the LLM so it can give you a more accurate and personalized response.
The Model Context Protocol, or MCP, is a standardized way for AI applications to connect to external tools and data sources. Think of it as a universal adapter: any MCP-compatible client can discover and call the capabilities an MCP server exposes, without custom integration code for each pairing.
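Concretely, the protocol exchanges structured JSON messages. A request in the simplified shape used by the server we build later in this tutorial might look like the following (the official protocol is based on JSON-RPC 2.0, so real implementations carry a few more fields):

```
{"method": "chat/query", "params": {"query": "What is Go?"}, "id": "1"}
```

The server answers with a matching `id` and either a `result` or an `error` object.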
SETTING UP YOUR GO DEVELOPMENT ENVIRONMENT
Before we can start building our LLM application, we need to prepare our development environment. Go, also known as Golang, is a programming language developed by Google that's particularly well-suited for building networked applications and services.
First, you'll need to install Go on your system. Visit the official Go website and download the installer for your operating system. Once installed, you can verify that everything is working correctly by opening your terminal or command prompt and typing:
go version
This should display the version of Go that you've installed. If you see an error message instead, you may need to check your installation or add Go to your system's PATH environment variable.
Next, let's create a new directory for our project. Navigate to a location where you'd like to store your code and create a new folder:
mkdir llm-rag-mcp-tutorial
cd llm-rag-mcp-tutorial
Now, initialize a new Go module. A Go module is like a container that holds all the code and dependencies for your project:
go mod init llm-rag-mcp-tutorial
This creates a file called go.mod that will track all the external libraries your project depends on.
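After running the command, go.mod contains little more than the module path and a Go version directive. The version line below is illustrative; yours will reflect the toolchain you installed:

```
module llm-rag-mcp-tutorial

go 1.22
```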
For our open source LLM integration, we'll be using Ollama, which is an excellent tool for running large language models locally. You'll need to install Ollama on your system by visiting their website and following the installation instructions for your operating system.
Once Ollama is installed, you can download a model. This tutorial uses Llama 2, but newer open models such as Llama 3 or Mistral work the same way; just substitute the model name. Pull it by executing:
ollama pull llama2
This will download the model to your local machine. You can then start the Ollama service if it isn't already running (ollama serve), which listens on http://localhost:11434 by default and exposes a REST API that our Go application can communicate with.
UNDERSTANDING THE ARCHITECTURE OF OUR APPLICATION
Before we start coding, let's visualize what we're building. Our application will consist of several interconnected components, each with a specific responsibility.
At the core, we'll have an LLM client that communicates with our locally running Ollama service. This client will be responsible for sending prompts to the LLM and receiving responses without any external dependencies.
Surrounding this core, we'll build a RAG system that can search through documents and provide relevant context to enhance the LLM's responses. This system will include a document indexer, a search mechanism, and a context formatter.
On top of this foundation, we'll implement an MCP server that exposes our application's capabilities to other systems, and an MCP client that can consume services from other MCP-compatible applications.
Finally, we'll tie everything together with a user interface that allows people to interact with our system in a natural and intuitive way.
BUILDING YOUR FIRST OPEN SOURCE LLM CLIENT
Let's start by creating a simple client that can communicate with our locally running Ollama service. We'll begin with a basic structure and gradually add more sophisticated features.
Create a new file called main.go in your project directory:
package main
import (
"bufio"
"bytes"
"encoding/json"
"fmt"
"io"
"net/http"
"os"
"strings"
"time"
)
// OllamaClient represents our client for communicating with the local Ollama service
type OllamaClient struct {
baseURL string
client *http.Client
model string
}
// Message represents a single message in our conversation
type Message struct {
Role string `json:"role"`
Content string `json:"content"`
}
// ChatRequest represents the structure of our API request to Ollama
type ChatRequest struct {
Model string `json:"model"`
Messages []Message `json:"messages"`
Stream bool `json:"stream"`
}
// ChatResponse represents the structure of the API response from Ollama
type ChatResponse struct {
Model string `json:"model"`
CreatedAt string `json:"created_at"`
Message Message `json:"message"`
Done bool `json:"done"`
}
// NewOllamaClient creates a new instance of our Ollama client
func NewOllamaClient(baseURL, model string) *OllamaClient {
return &OllamaClient{
baseURL: baseURL,
model: model,
client: &http.Client{
Timeout: 120 * time.Second, // Generous timeout for local inference
},
}
}
// SendMessage sends a message to the local LLM and returns the response
func (c *OllamaClient) SendMessage(userMessage string) (string, error) {
// Create the request payload for Ollama's chat endpoint
request := ChatRequest{
Model: c.model,
Messages: []Message{
{
Role: "user",
Content: userMessage,
},
},
Stream: false, // We want a complete response, not streaming
}
// Convert the request to JSON
jsonData, err := json.Marshal(request)
if err != nil {
return "", fmt.Errorf("failed to marshal request: %w", err)
}
// Create the HTTP request to Ollama's chat endpoint
url := c.baseURL + "/api/chat"
req, err := http.NewRequest("POST", url, bytes.NewBuffer(jsonData))
if err != nil {
return "", fmt.Errorf("failed to create request: %w", err)
}
// Set the required headers
req.Header.Set("Content-Type", "application/json")
// Send the request to our local Ollama service
resp, err := c.client.Do(req)
if err != nil {
return "", fmt.Errorf("failed to send request to Ollama: %w", err)
}
defer resp.Body.Close()
// Read the response body
body, err := io.ReadAll(resp.Body)
if err != nil {
return "", fmt.Errorf("failed to read response: %w", err)
}
// Check for HTTP errors
if resp.StatusCode != http.StatusOK {
return "", fmt.Errorf("Ollama API request failed with status %d: %s", resp.StatusCode, string(body))
}
// Parse the response from Ollama
var chatResponse ChatResponse
if err := json.Unmarshal(body, &chatResponse); err != nil {
return "", fmt.Errorf("failed to unmarshal response: %w", err)
}
return chatResponse.Message.Content, nil
}
// SendMessageWithContext sends a message along with additional context to the LLM
func (c *OllamaClient) SendMessageWithContext(userMessage, context string) (string, error) {
// Combine the context with the user message
enhancedMessage := fmt.Sprintf("Context: %s\n\nQuestion: %s", context, userMessage)
return c.SendMessage(enhancedMessage)
}
// CheckHealth verifies that Ollama is running and accessible
func (c *OllamaClient) CheckHealth() error {
url := c.baseURL + "/api/tags"
resp, err := c.client.Get(url)
if err != nil {
return fmt.Errorf("failed to connect to Ollama: %w", err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
return fmt.Errorf("Ollama service returned status %d", resp.StatusCode)
}
return nil
}
func main() {
// Initialize our Ollama client
// This assumes Ollama is running locally on the default port
client := NewOllamaClient("http://localhost:11434", "llama2")
// Check if Ollama is running
fmt.Println("Checking connection to Ollama...")
if err := client.CheckHealth(); err != nil {
fmt.Printf("Error: Cannot connect to Ollama service: %v\n", err)
fmt.Println("Please make sure Ollama is installed and running.")
fmt.Println("You can start it with: ollama serve")
return
}
fmt.Println("Successfully connected to Ollama!")
fmt.Println("Welcome to the Open Source LLM Chat Client!")
fmt.Println("Type 'quit' to exit the program.")
fmt.Println("----------------------------------------")
// Create a scanner to read user input
scanner := bufio.NewScanner(os.Stdin)
for {
fmt.Print("You: ")
// Read user input
if !scanner.Scan() {
break
}
userInput := strings.TrimSpace(scanner.Text())
// Check if user wants to quit
if strings.ToLower(userInput) == "quit" {
fmt.Println("Goodbye!")
break
}
// Skip empty inputs
if userInput == "" {
continue
}
// Send the message to the local LLM
fmt.Print("Assistant: ")
response, err := client.SendMessage(userInput)
if err != nil {
fmt.Printf("Error: %v\n", err)
continue
}
// Display the response
fmt.Printf("%s\n\n", response)
}
}
This code creates a client that communicates with a locally running Ollama service, giving us access to powerful open source language models without any external dependencies or API costs.
The OllamaClient struct holds the configuration needed to communicate with our local Ollama service, including the base URL where Ollama is running and the model we want to use.
The SendMessage method handles the entire process of communicating with Ollama. It creates a properly formatted request, sends it to the local service, and parses the response. Notice how we're using Ollama's chat endpoint, which provides a conversational interface similar to what you'd find in commercial services.
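For reference, here is roughly what our structs marshal into on the wire, first the request and then a typical non-streaming response. The field set is abbreviated: Ollama's real responses include additional timing and token-count fields that our ChatResponse struct simply ignores during unmarshaling, and the timestamp below is illustrative:

```
{"model": "llama2", "messages": [{"role": "user", "content": "Hello"}], "stream": false}

{"model": "llama2", "created_at": "2024-01-01T00:00:00Z", "message": {"role": "assistant", "content": "Hi there!"}, "done": true}
```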
The SendMessageWithContext method is particularly important for our RAG implementation. It allows us to provide additional context along with the user's question, which is exactly what we'll need when we want to enhance responses with information from our document store.
The CheckHealth method verifies that our Ollama service is running and accessible before we try to use it. This helps provide clear error messages if something isn't configured correctly.
IMPLEMENTING DOCUMENT STORAGE AND RETRIEVAL
Now that we have a working LLM client, let's add the ability to store and search through documents. This is the foundation of our RAG system.
Create a new file called document_store.go:
package main
import (
"encoding/json"
"fmt"
"io/ioutil"
"math"
"os"
"path/filepath"
"sort"
"strings"
"time"
"unicode"
)
// Document represents a single document in our store
type Document struct {
ID string `json:"id"`
Title string `json:"title"`
Content string `json:"content"`
Metadata map[string]string `json:"metadata"`
CreatedAt time.Time `json:"created_at"`
UpdatedAt time.Time `json:"updated_at"`
WordCount int `json:"word_count"`
}
// DocumentStore manages our collection of documents
type DocumentStore struct {
documents map[string]*Document
dataDir string
index *SimpleIndex
}
// SearchResult represents a document found during search
type SearchResult struct {
Document *Document
Score float64
Snippet string
Matches []string
}
// SimpleIndex provides basic text indexing capabilities
type SimpleIndex struct {
wordToDocuments map[string]map[string]float64 // word -> document ID -> TF score
documentCount int
}
// NewDocumentStore creates a new document store with indexing capabilities
func NewDocumentStore(dataDir string) (*DocumentStore, error) {
// Create the data directory if it doesn't exist
if err := os.MkdirAll(dataDir, 0755); err != nil {
return nil, fmt.Errorf("failed to create data directory: %w", err)
}
store := &DocumentStore{
documents: make(map[string]*Document),
dataDir: dataDir,
index: NewSimpleIndex(),
}
// Load existing documents
if err := store.loadDocuments(); err != nil {
return nil, fmt.Errorf("failed to load documents: %w", err)
}
return store, nil
}
// NewSimpleIndex creates a new text index
func NewSimpleIndex() *SimpleIndex {
return &SimpleIndex{
wordToDocuments: make(map[string]map[string]float64),
documentCount: 0,
}
}
// AddDocument adds a new document to the store and updates the index
func (ds *DocumentStore) AddDocument(title, content string, metadata map[string]string) (*Document, error) {
// Generate a unique ID for the document
id := fmt.Sprintf("doc_%d", time.Now().UnixNano())
// Count words in the content
wordCount := len(strings.Fields(content))
// Create the document
doc := &Document{
ID: id,
Title: title,
Content: content,
Metadata: metadata,
CreatedAt: time.Now(),
UpdatedAt: time.Now(),
WordCount: wordCount,
}
// Add to memory store
ds.documents[id] = doc
// Update the search index
ds.index.AddDocument(doc)
// Save to disk
if err := ds.saveDocument(doc); err != nil {
// Remove from memory and index if save failed
delete(ds.documents, id)
ds.index.RemoveDocument(id)
return nil, fmt.Errorf("failed to save document: %w", err)
}
return doc, nil
}
// GetDocument retrieves a document by ID
func (ds *DocumentStore) GetDocument(id string) (*Document, bool) {
doc, exists := ds.documents[id]
return doc, exists
}
// SearchDocuments performs an indexed search with term-frequency scoring
func (ds *DocumentStore) SearchDocuments(query string, maxResults int) []SearchResult {
if len(ds.documents) == 0 {
return []SearchResult{}
}
queryWords := ds.tokenizeAndNormalize(query)
if len(queryWords) == 0 {
return []SearchResult{}
}
// Calculate scores for each document
documentScores := make(map[string]float64)
documentMatches := make(map[string][]string)
for _, word := range queryWords {
if docScores, exists := ds.index.wordToDocuments[word]; exists {
for docID, score := range docScores {
documentScores[docID] += score
documentMatches[docID] = append(documentMatches[docID], word)
}
}
}
// Convert to search results
var results []SearchResult
for docID, score := range documentScores {
if doc, exists := ds.documents[docID]; exists {
snippet := ds.generateSnippet(doc.Content, query, 200)
results = append(results, SearchResult{
Document: doc,
Score: score,
Snippet: snippet,
Matches: documentMatches[docID],
})
}
}
// Sort results by score (highest first)
sort.Slice(results, func(i, j int) bool {
return results[i].Score > results[j].Score
})
// Limit results
if len(results) > maxResults {
results = results[:maxResults]
}
return results
}
// AddDocument adds a document to the index
func (idx *SimpleIndex) AddDocument(doc *Document) {
// Tokenize the document content
words := idx.tokenizeAndNormalize(doc.Title + " " + doc.Content)
// Calculate term frequency for this document
termFreq := make(map[string]int)
for _, word := range words {
termFreq[word]++
}
// Add to index with TF-IDF scoring
for word, freq := range termFreq {
if idx.wordToDocuments[word] == nil {
idx.wordToDocuments[word] = make(map[string]float64)
}
// Calculate TF (term frequency)
tf := float64(freq) / float64(len(words))
// Store the TF score; IDF weighting could be layered on top later
idx.wordToDocuments[word][doc.ID] = tf
}
idx.documentCount++
}
// RemoveDocument removes a document from the index
func (idx *SimpleIndex) RemoveDocument(docID string) {
for word := range idx.wordToDocuments {
delete(idx.wordToDocuments[word], docID)
if len(idx.wordToDocuments[word]) == 0 {
delete(idx.wordToDocuments, word)
}
}
idx.documentCount--
}
// tokenizeAndNormalize breaks text into normalized words
func (ds *DocumentStore) tokenizeAndNormalize(text string) []string {
return ds.index.tokenizeAndNormalize(text)
}
// tokenizeAndNormalize breaks text into normalized words
func (idx *SimpleIndex) tokenizeAndNormalize(text string) []string {
// Convert to lowercase
text = strings.ToLower(text)
// Split into words and clean them
var words []string
currentWord := strings.Builder{}
for _, r := range text {
if unicode.IsLetter(r) || unicode.IsDigit(r) {
currentWord.WriteRune(r)
} else {
if currentWord.Len() > 0 {
word := currentWord.String()
if len(word) > 2 { // Filter out very short words
words = append(words, word)
}
currentWord.Reset()
}
}
}
// Don't forget the last word
if currentWord.Len() > 0 {
word := currentWord.String()
if len(word) > 2 {
words = append(words, word)
}
}
return words
}
// generateSnippet creates a relevant snippet from the document content
func (ds *DocumentStore) generateSnippet(content, query string, maxLength int) string {
queryWords := ds.tokenizeAndNormalize(query)
words := strings.Fields(content)
if len(words) == 0 {
return ""
}
// Find the best starting position for the snippet
bestStart := 0
maxMatches := 0
// Look for the position with the most query word matches in a window
windowSize := 50
for i := 0; i <= len(words)-windowSize && i < len(words); i++ {
matches := 0
windowText := strings.ToLower(strings.Join(words[i:i+windowSize], " "))
for _, queryWord := range queryWords {
matches += strings.Count(windowText, queryWord)
}
if matches > maxMatches {
maxMatches = matches
bestStart = i
}
}
// Build snippet starting from the best position
var snippet strings.Builder
currentLength := 0
for i := bestStart; i < len(words) && currentLength < maxLength; i++ {
if i > bestStart {
snippet.WriteString(" ")
currentLength++
}
snippet.WriteString(words[i])
currentLength += len(words[i])
}
result := snippet.String()
if len(result) >= maxLength {
result = result[:maxLength-3] + "..."
}
return result
}
// saveDocument saves a document to disk
func (ds *DocumentStore) saveDocument(doc *Document) error {
filename := filepath.Join(ds.dataDir, doc.ID+".json")
data, err := json.MarshalIndent(doc, "", " ")
if err != nil {
return fmt.Errorf("failed to marshal document: %w", err)
}
if err := os.WriteFile(filename, data, 0644); err != nil {
return fmt.Errorf("failed to write document file: %w", err)
}
return nil
}
// loadDocuments loads all documents from disk and rebuilds the index
func (ds *DocumentStore) loadDocuments() error {
files, err := os.ReadDir(ds.dataDir)
if err != nil {
// Directory might not exist yet, which is fine
return nil
}
for _, file := range files {
if !strings.HasSuffix(file.Name(), ".json") {
continue
}
filename := filepath.Join(ds.dataDir, file.Name())
data, err := os.ReadFile(filename)
if err != nil {
fmt.Printf("Warning: failed to read document file %s: %v\n", filename, err)
continue
}
var doc Document
if err := json.Unmarshal(data, &doc); err != nil {
fmt.Printf("Warning: failed to unmarshal document file %s: %v\n", filename, err)
continue
}
ds.documents[doc.ID] = &doc
ds.index.AddDocument(&doc)
}
return nil
}
// ListDocuments returns all documents in the store
func (ds *DocumentStore) ListDocuments() []*Document {
var docs []*Document
for _, doc := range ds.documents {
docs = append(docs, doc)
}
// Sort by creation time (newest first)
sort.Slice(docs, func(i, j int) bool {
return docs[i].CreatedAt.After(docs[j].CreatedAt)
})
return docs
}
// GetDocumentCount returns the number of documents in the store
func (ds *DocumentStore) GetDocumentCount() int {
return len(ds.documents)
}
// GetRelevantContext retrieves and formats relevant document content for RAG
func (ds *DocumentStore) GetRelevantContext(query string, maxDocuments int, maxContextLength int) string {
results := ds.SearchDocuments(query, maxDocuments)
if len(results) == 0 {
return "No relevant documents found."
}
var contextBuilder strings.Builder
currentLength := 0
for i, result := range results {
docContext := fmt.Sprintf("Document %d - %s:\n%s\n\n",
i+1, result.Document.Title, result.Snippet)
if currentLength + len(docContext) > maxContextLength {
break
}
contextBuilder.WriteString(docContext)
currentLength += len(docContext)
}
return contextBuilder.String()
}
This document store implementation provides a solid foundation for our RAG system. The key piece is the SimpleIndex struct, which scores documents by term frequency (TF): how often each query word appears in a document relative to its length. A full TF-IDF (Term Frequency-Inverse Document Frequency) scheme would additionally down-weight words that appear in many documents; the index already tracks everything needed to add that weighting later.
The tokenizeAndNormalize method breaks text into individual words, converts them to lowercase, and filters out very short words that don't contribute much to search relevance. This preprocessing step is crucial for effective text search.
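You can try the same tokenization rules in isolation. This standalone copy of the logic (the helper name tokenize is ours, for illustration) lowercases the input, splits on any non-alphanumeric rune, and keeps only tokens longer than two characters:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenize mirrors tokenizeAndNormalize: lowercase, split on
// non-alphanumeric runes, drop one- and two-character tokens.
func tokenize(text string) []string {
	text = strings.ToLower(text)
	var words []string
	var current strings.Builder
	flush := func() {
		if current.Len() > 2 {
			words = append(words, current.String())
		}
		current.Reset()
	}
	for _, r := range text {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			current.WriteRune(r)
		} else {
			flush()
		}
	}
	flush() // don't forget the last word
	return words
}

func main() {
	fmt.Println(tokenize("Hello, Go-World 2024!")) // [hello world 2024]
}
```

Note that "Go" disappears entirely: the two-character filter trades a little recall for a much smaller index.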
The AddDocument method not only stores documents but also updates our search index. This means that every time we add a new document, it becomes immediately searchable with proper relevance scoring.
The SearchDocuments method uses our index to find documents that match the user's query. It calculates relevance scores based on how frequently query terms appear in each document and returns the most relevant results first.
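The scoring in miniature: this toy index (illustrative names only, simplified from the tutorial's SimpleIndex) stores per-document term frequency and sums it over the query words, which is the same accumulation SearchDocuments performs:

```go
package main

import (
	"fmt"
	"strings"
)

// index maps word -> document ID -> term frequency.
type index map[string]map[string]float64

// add tokenizes naively on whitespace and records TF = count / length.
func (idx index) add(docID, text string) {
	words := strings.Fields(strings.ToLower(text))
	for _, w := range words {
		if idx[w] == nil {
			idx[w] = make(map[string]float64)
		}
		idx[w][docID] += 1.0 / float64(len(words))
	}
}

// score sums the stored TF of each query word per document.
func (idx index) score(query string) map[string]float64 {
	scores := make(map[string]float64)
	for _, w := range strings.Fields(strings.ToLower(query)) {
		for docID, tf := range idx[w] {
			scores[docID] += tf
		}
	}
	return scores
}

func main() {
	idx := index{}
	idx.add("doc1", "go is fast go is simple")
	idx.add("doc2", "python is popular")
	scores := idx.score("go simple")
	fmt.Printf("doc1=%.2f doc2=%.2f\n", scores["doc1"], scores["doc2"]) // doc1=0.50 doc2=0.00
}
```

Documents that mention the query words more often, relative to their length, accumulate higher scores and sort to the top.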
The GetRelevantContext method is specifically designed for our RAG implementation. It searches for relevant documents and formats them into a context string that we can provide to our LLM along with the user's question.
CREATING A RAG-ENABLED CHAT SYSTEM
Now let's combine our LLM client with our document store to create a RAG-enabled chat system. Create a new file called rag_chat.go:
package main
import (
"bufio"
"fmt"
"os"
"strings"
)
// RAGChatSystem combines document retrieval with LLM generation
type RAGChatSystem struct {
llmClient *OllamaClient
documentStore *DocumentStore
maxContextDocs int
maxContextLength int
}
// NewRAGChatSystem creates a new RAG-enabled chat system
func NewRAGChatSystem(llmClient *OllamaClient, documentStore *DocumentStore) *RAGChatSystem {
return &RAGChatSystem{
llmClient: llmClient,
documentStore: documentStore,
maxContextDocs: 3, // Use top 3 most relevant documents
maxContextLength: 2000, // Limit context to 2000 characters
}
}
// ProcessQuery handles a user query with RAG enhancement
func (rag *RAGChatSystem) ProcessQuery(query string) (string, error) {
// First, search for relevant documents
context := rag.documentStore.GetRelevantContext(
query,
rag.maxContextDocs,
rag.maxContextLength,
)
// If we found relevant context, use it to enhance the response
if context != "No relevant documents found." {
enhancedQuery := fmt.Sprintf(`Based on the following context, please answer the user's question. If the context doesn't contain relevant information, please say so and provide a general answer.
Context:
%s
User Question: %s
Please provide a helpful and accurate answer based on the context provided.`, context, query)
return rag.llmClient.SendMessage(enhancedQuery)
}
// If no relevant context found, just send the query directly
return rag.llmClient.SendMessage(query)
}
// AddDocumentInteractive allows users to add documents through the chat interface
func (rag *RAGChatSystem) AddDocumentInteractive() error {
scanner := bufio.NewScanner(os.Stdin)
fmt.Print("Enter document title: ")
if !scanner.Scan() {
return fmt.Errorf("failed to read title")
}
title := strings.TrimSpace(scanner.Text())
fmt.Print("Enter document content (type 'END' on a new line to finish):\n")
var contentBuilder strings.Builder
for {
if !scanner.Scan() {
break
}
line := scanner.Text()
if line == "END" {
break
}
contentBuilder.WriteString(line + "\n")
}
content := strings.TrimSpace(contentBuilder.String())
if title == "" || content == "" {
return fmt.Errorf("title and content cannot be empty")
}
// Add metadata
metadata := map[string]string{
"source": "user_input",
"type": "manual",
}
doc, err := rag.documentStore.AddDocument(title, content, metadata)
if err != nil {
return fmt.Errorf("failed to add document: %w", err)
}
fmt.Printf("Document '%s' added successfully with ID: %s\n", doc.Title, doc.ID)
return nil
}
// ShowDocumentStats displays information about the document store
func (rag *RAGChatSystem) ShowDocumentStats() {
count := rag.documentStore.GetDocumentCount()
fmt.Printf("Document store contains %d documents.\n", count)
if count > 0 {
docs := rag.documentStore.ListDocuments()
fmt.Println("Recent documents:")
for i, doc := range docs {
if i >= 5 { // Show only the 5 most recent
break
}
fmt.Printf(" - %s (ID: %s, %d words)\n", doc.Title, doc.ID, doc.WordCount)
}
}
}
// SearchDocumentsInteractive allows users to search documents
func (rag *RAGChatSystem) SearchDocumentsInteractive(query string) {
results := rag.documentStore.SearchDocuments(query, 5)
if len(results) == 0 {
fmt.Println("No documents found matching your query.")
return
}
fmt.Printf("Found %d relevant documents:\n\n", len(results))
for i, result := range results {
fmt.Printf("%d. %s (Score: %.2f)\n", i+1, result.Document.Title, result.Score)
fmt.Printf(" Snippet: %s\n", result.Snippet)
fmt.Printf(" Matches: %s\n\n", strings.Join(result.Matches, ", "))
}
}
// RunInteractiveChat starts the main chat loop
func (rag *RAGChatSystem) RunInteractiveChat() {
fmt.Println("RAG-Enhanced Chat System")
fmt.Println("========================")
fmt.Println("Commands:")
fmt.Println(" /add - Add a new document")
fmt.Println(" /search - Search documents")
fmt.Println(" /stats - Show document statistics")
fmt.Println(" /help - Show this help message")
fmt.Println(" /quit - Exit the program")
fmt.Println()
fmt.Println("Just type your question to chat with RAG enhancement!")
fmt.Println("----------------------------------------")
scanner := bufio.NewScanner(os.Stdin)
for {
fmt.Print("You: ")
if !scanner.Scan() {
break
}
input := strings.TrimSpace(scanner.Text())
if input == "" {
continue
}
// Handle commands
switch {
case input == "/quit":
fmt.Println("Goodbye!")
return
case input == "/help":
fmt.Println("Commands:")
fmt.Println(" /add - Add a new document")
fmt.Println(" /search - Search documents")
fmt.Println(" /stats - Show document statistics")
fmt.Println(" /help - Show this help message")
fmt.Println(" /quit - Exit the program")
continue
case input == "/add":
if err := rag.AddDocumentInteractive(); err != nil {
fmt.Printf("Error adding document: %v\n", err)
}
continue
case strings.HasPrefix(input, "/search "):
query := strings.TrimSpace(input[8:])
if query != "" {
rag.SearchDocumentsInteractive(query)
} else {
fmt.Println("Please provide a search query. Example: /search golang tutorial")
}
continue
case input == "/stats":
rag.ShowDocumentStats()
continue
case strings.HasPrefix(input, "/"):
fmt.Println("Unknown command. Type /help for available commands.")
continue
}
// Process regular chat messages with RAG
fmt.Print("Assistant: ")
response, err := rag.ProcessQuery(input)
if err != nil {
fmt.Printf("Error: %v\n", err)
continue
}
fmt.Printf("%s\n\n", response)
}
}
// Example function to populate the document store with sample data
func (rag *RAGChatSystem) AddSampleDocuments() error {
sampleDocs := []struct {
title string
content string
}{
{
"Go Programming Basics",
`Go is a programming language developed by Google. It's designed for simplicity and efficiency.
Go features garbage collection, memory safety, and excellent concurrency support through goroutines.
The language has a clean syntax and compiles to native machine code, making it fast and efficient.
Go is particularly well-suited for building web services, command-line tools, and distributed systems.`,
},
{
"Introduction to Machine Learning",
`Machine Learning is a subset of artificial intelligence that enables computers to learn and improve
from experience without being explicitly programmed. There are three main types: supervised learning,
unsupervised learning, and reinforcement learning. Common algorithms include linear regression,
decision trees, neural networks, and support vector machines. ML is used in applications like
image recognition, natural language processing, and recommendation systems.`,
},
{
"RESTful API Design Principles",
`REST (Representational State Transfer) is an architectural style for designing web services.
Key principles include statelessness, uniform interface, cacheable responses, and layered system architecture.
RESTful APIs use standard HTTP methods (GET, POST, PUT, DELETE) and status codes.
Resources are identified by URLs, and data is typically exchanged in JSON format.
Good API design includes proper error handling, versioning, and comprehensive documentation.`,
},
}
for _, doc := range sampleDocs {
metadata := map[string]string{
"source": "sample_data",
"type": "educational",
}
_, err := rag.documentStore.AddDocument(doc.title, doc.content, metadata)
if err != nil {
return fmt.Errorf("failed to add sample document '%s': %w", doc.title, err)
}
}
fmt.Println("Sample documents added successfully!")
return nil
}
This RAG chat system brings together all the components we've built so far. The ProcessQuery method is the heart of the system - it searches for relevant documents, formats them as context, and sends an enhanced prompt to the LLM.
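The prompt assembly inside ProcessQuery can be exercised on its own; buildPrompt here is an illustrative standalone copy of the same template:

```go
package main

import "fmt"

// buildPrompt wraps retrieved context and the user's question in the
// instruction template that ProcessQuery sends to the model.
func buildPrompt(context, query string) string {
	return fmt.Sprintf(`Based on the following context, please answer the user's question. If the context doesn't contain relevant information, please say so and provide a general answer.

Context:
%s

User Question: %s

Please provide a helpful and accurate answer based on the context provided.`, context, query)
}

func main() {
	fmt.Println(buildPrompt("Go compiles to native code.", "Is Go compiled?"))
}
```

The instruction to admit when the context is unhelpful matters in practice: without it, models tend to force an answer out of irrelevant retrieved text.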
The system includes several interactive features that make it easy to manage documents and test the RAG functionality. Users can add documents, search through them, and see statistics about their document store.
The AddSampleDocuments method provides some initial content to work with, demonstrating how the system handles different types of technical documentation.
IMPLEMENTING THE MODEL CONTEXT PROTOCOL SERVER
Now let's implement an MCP server that exposes our RAG functionality to other applications. Create a new file called mcp_server.go:
package main
import (
"encoding/json"
"fmt"
"log"
"net/http"
"strconv"
"strings"
)
// MCPServer implements the Model Context Protocol server
type MCPServer struct {
ragSystem *RAGChatSystem
port int
}
// MCPRequest represents a generic MCP request
type MCPRequest struct {
Method string `json:"method"`
Params map[string]interface{} `json:"params"`
ID string `json:"id"`
}
// MCPResponse represents a generic MCP response
type MCPResponse struct {
Result interface{} `json:"result,omitempty"`
Error *MCPError `json:"error,omitempty"`
ID string `json:"id"`
}
// MCPError represents an error in MCP format
type MCPError struct {
Code int `json:"code"`
Message string `json:"message"`
}
// ToolInfo describes an available tool
type ToolInfo struct {
Name string `json:"name"`
Description string `json:"description"`
Parameters map[string]interface{} `json:"parameters"`
}
// NewMCPServer creates a new MCP server
func NewMCPServer(ragSystem *RAGChatSystem, port int) *MCPServer {
return &MCPServer{
ragSystem: ragSystem,
port: port,
}
}
// Start begins serving the MCP protocol
func (s *MCPServer) Start() error {
http.HandleFunc("/mcp", s.handleMCPRequest)
http.HandleFunc("/health", s.handleHealth)
fmt.Printf("MCP Server starting on port %d\n", s.port)
fmt.Printf("Available endpoints:\n")
fmt.Printf(" POST /mcp - MCP protocol endpoint\n")
fmt.Printf(" GET /health - Health check endpoint\n")
return http.ListenAndServe(fmt.Sprintf(":%d", s.port), nil)
}
// handleHealth provides a simple health check endpoint
func (s *MCPServer) handleHealth(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(map[string]string{
"status": "healthy",
"service": "mcp-rag-server",
})
}
// handleMCPRequest processes MCP protocol requests
func (s *MCPServer) handleMCPRequest(w http.ResponseWriter, r *http.Request) {
if r.Method != "POST" {
http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
return
}
w.Header().Set("Content-Type", "application/json")
var request MCPRequest
if err := json.NewDecoder(r.Body).Decode(&request); err != nil {
s.sendError(w, request.ID, -32700, "Parse error")
return
}
var response MCPResponse
response.ID = request.ID
switch request.Method {
case "tools/list":
response.Result = s.listTools()
case "tools/call":
result, err := s.callTool(request.Params)
if err != nil {
response.Error = &MCPError{
Code: -32603,
Message: err.Error(),
}
} else {
response.Result = result
}
case "documents/search":
result, err := s.searchDocuments(request.Params)
if err != nil {
response.Error = &MCPError{
Code: -32603,
Message: err.Error(),
}
} else {
response.Result = result
}
case "documents/add":
result, err := s.addDocument(request.Params)
if err != nil {
response.Error = &MCPError{
Code: -32603,
Message: err.Error(),
}
} else {
response.Result = result
}
case "chat/query":
result, err := s.processQuery(request.Params)
if err != nil {
response.Error = &MCPError{
Code: -32603,
Message: err.Error(),
}
} else {
response.Result = result
}
default:
response.Error = &MCPError{
Code: -32601,
Message: "Method not found",
}
}
json.NewEncoder(w).Encode(response)
}
// sendError sends an error response
func (s *MCPServer) sendError(w http.ResponseWriter, id string, code int, message string) {
response := MCPResponse{
Error: &MCPError{
Code: code,
Message: message,
},
ID: id,
}
json.NewEncoder(w).Encode(response)
}
// listTools returns the available tools
func (s *MCPServer) listTools() map[string]interface{} {
tools := []ToolInfo{
{
Name: "search_documents",
Description: "Search through the document store for relevant information",
Parameters: map[string]interface{}{
"type": "object",
"properties": map[string]interface{}{
"query": map[string]interface{}{
"type": "string",
"description": "The search query",
},
"max_results": map[string]interface{}{
"type": "integer",
"description": "Maximum number of results to return",
"default": 5,
},
},
"required": []string{"query"},
},
},
{
Name: "add_document",
Description: "Add a new document to the store",
Parameters: map[string]interface{}{
"type": "object",
"properties": map[string]interface{}{
"title": map[string]interface{}{
"type": "string",
"description": "The document title",
},
"content": map[string]interface{}{
"type": "string",
"description": "The document content",
},
"metadata": map[string]interface{}{
"type": "object",
"description": "Additional metadata for the document",
},
},
"required": []string{"title", "content"},
},
},
{
Name: "rag_query",
Description: "Process a query using RAG (Retrieval-Augmented Generation)",
Parameters: map[string]interface{}{
"type": "object",
"properties": map[string]interface{}{
"query": map[string]interface{}{
"type": "string",
"description": "The user's question or query",
},
},
"required": []string{"query"},
},
},
}
return map[string]interface{}{
"tools": tools,
}
}
// callTool executes a tool based on the request
func (s *MCPServer) callTool(params map[string]interface{}) (interface{}, error) {
toolName, ok := params["name"].(string)
if !ok {
return nil, fmt.Errorf("tool name is required")
}
arguments, ok := params["arguments"].(map[string]interface{})
if !ok {
return nil, fmt.Errorf("tool arguments are required")
}
switch toolName {
case "search_documents":
return s.searchDocuments(arguments)
case "add_document":
return s.addDocument(arguments)
case "rag_query":
return s.processQuery(arguments)
default:
return nil, fmt.Errorf("unknown tool: %s", toolName)
}
}
// searchDocuments handles document search requests
func (s *MCPServer) searchDocuments(params map[string]interface{}) (interface{}, error) {
query, ok := params["query"].(string)
if !ok {
return nil, fmt.Errorf("query parameter is required")
}
maxResults := 5
if mr, ok := params["max_results"]; ok {
if mrFloat, ok := mr.(float64); ok {
maxResults = int(mrFloat)
} else if mrStr, ok := mr.(string); ok {
if parsed, err := strconv.Atoi(mrStr); err == nil {
maxResults = parsed
}
}
}
results := s.ragSystem.documentStore.SearchDocuments(query, maxResults)
// Convert results to a format suitable for JSON response
var jsonResults []map[string]interface{}
for _, result := range results {
jsonResults = append(jsonResults, map[string]interface{}{
"document_id": result.Document.ID,
"title": result.Document.Title,
"score": result.Score,
"snippet": result.Snippet,
"matches": result.Matches,
"word_count": result.Document.WordCount,
"created_at": result.Document.CreatedAt,
})
}
return map[string]interface{}{
"results": jsonResults,
"total": len(jsonResults),
}, nil
}
// addDocument handles document addition requests
func (s *MCPServer) addDocument(params map[string]interface{}) (interface{}, error) {
title, ok := params["title"].(string)
if !ok {
return nil, fmt.Errorf("title parameter is required")
}
content, ok := params["content"].(string)
if !ok {
return nil, fmt.Errorf("content parameter is required")
}
metadata := make(map[string]string)
if metaParam, ok := params["metadata"].(map[string]interface{}); ok {
for key, value := range metaParam {
if strValue, ok := value.(string); ok {
metadata[key] = strValue
}
}
}
// Add default metadata
metadata["source"] = "mcp_api"
metadata["type"] = "api_added"
doc, err := s.ragSystem.documentStore.AddDocument(title, content, metadata)
if err != nil {
return nil, fmt.Errorf("failed to add document: %w", err)
}
return map[string]interface{}{
"document_id": doc.ID,
"title": doc.Title,
"word_count": doc.WordCount,
"created_at": doc.CreatedAt,
"message": "Document added successfully",
}, nil
}
// processQuery handles RAG query requests
func (s *MCPServer) processQuery(params map[string]interface{}) (interface{}, error) {
query, ok := params["query"].(string)
if !ok {
return nil, fmt.Errorf("query parameter is required")
}
response, err := s.ragSystem.ProcessQuery(query)
if err != nil {
return nil, fmt.Errorf("failed to process query: %w", err)
}
// Also get the relevant documents that were used
context := s.ragSystem.documentStore.GetRelevantContext(query, 3, 2000)
usedDocuments := s.ragSystem.documentStore.SearchDocuments(query, 3)
var docInfo []map[string]interface{}
for _, result := range usedDocuments {
docInfo = append(docInfo, map[string]interface{}{
"document_id": result.Document.ID,
"title": result.Document.Title,
"score": result.Score,
})
}
return map[string]interface{}{
"response": response,
"used_documents": docInfo,
"context_provided": context != "No relevant documents found.",
}, nil
}
This MCP server implementation exposes our RAG functionality through a standardized protocol that other applications can consume. The server provides three main tools: document search, document addition, and RAG-enhanced query processing.
The server follows MCP conventions for structured requests and responses, JSON-RPC-style error codes, and tool discovery, exposed here over a simplified HTTP transport. Other applications can connect to this server and use our RAG capabilities without needing to understand the internal implementation.
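Since tool calls in this design are just JSON over HTTP, it helps to see what a request looks like on the wire. This short sketch reuses the MCPRequest struct from mcp_server.go and marshals a hypothetical tools/call request for the search_documents tool:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MCPRequest mirrors the struct defined in mcp_server.go.
type MCPRequest struct {
	Method string                 `json:"method"`
	Params map[string]interface{} `json:"params"`
	ID     string                 `json:"id"`
}

// encode renders a request exactly as it travels over the wire.
func encode(req MCPRequest) string {
	data, _ := json.Marshal(req)
	return string(data)
}

func main() {
	// A tools/call request targeting the search_documents tool.
	req := MCPRequest{
		Method: "tools/call",
		Params: map[string]interface{}{
			"name":      "search_documents",
			"arguments": map[string]interface{}{"query": "Go programming"},
		},
		ID: "req_1",
	}
	fmt.Println(encode(req))
}
```

Note that encoding/json emits struct fields in declaration order and sorts map keys alphabetically, so the wire format is deterministic and easy to diff in logs.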
BUILDING AN MCP CLIENT
Now let's create an MCP client that can consume services from other MCP servers. Create a new file called mcp_client.go:
package main
import (
"bytes"
"encoding/json"
"fmt"
"io"
"net/http"
"time"
)
// MCPClient provides functionality to connect to MCP servers
type MCPClient struct {
serverURL string
client *http.Client
}
// ToolCallRequest represents a request to call a tool
type ToolCallRequest struct {
Name string `json:"name"`
Arguments map[string]interface{} `json:"arguments"`
}
// NewMCPClient creates a new MCP client
func NewMCPClient(serverURL string) *MCPClient {
return &MCPClient{
serverURL: serverURL,
client: &http.Client{
Timeout: 30 * time.Second,
},
}
}
// ListTools retrieves available tools from the MCP server
func (c *MCPClient) ListTools() (map[string]interface{}, error) {
request := MCPRequest{
Method: "tools/list",
Params: make(map[string]interface{}),
ID: fmt.Sprintf("req_%d", time.Now().UnixNano()),
}
response, err := c.sendRequest(request)
if err != nil {
return nil, err
}
if response.Error != nil {
return nil, fmt.Errorf("MCP error %d: %s", response.Error.Code, response.Error.Message)
}
result, ok := response.Result.(map[string]interface{})
if !ok {
return nil, fmt.Errorf("unexpected response format")
}
return result, nil
}
// CallTool executes a tool on the MCP server
func (c *MCPClient) CallTool(toolName string, arguments map[string]interface{}) (interface{}, error) {
request := MCPRequest{
Method: "tools/call",
Params: map[string]interface{}{
"name": toolName,
"arguments": arguments,
},
ID: fmt.Sprintf("req_%d", time.Now().UnixNano()),
}
response, err := c.sendRequest(request)
if err != nil {
return nil, err
}
if response.Error != nil {
return nil, fmt.Errorf("MCP error %d: %s", response.Error.Code, response.Error.Message)
}
return response.Result, nil
}
// SearchDocuments searches for documents using the MCP server
func (c *MCPClient) SearchDocuments(query string, maxResults int) (interface{}, error) {
arguments := map[string]interface{}{
"query": query,
}
if maxResults > 0 {
arguments["max_results"] = maxResults
}
return c.CallTool("search_documents", arguments)
}
// AddDocument adds a document using the MCP server
func (c *MCPClient) AddDocument(title, content string, metadata map[string]string) (interface{}, error) {
arguments := map[string]interface{}{
"title": title,
"content": content,
}
if metadata != nil {
metaInterface := make(map[string]interface{})
for k, v := range metadata {
metaInterface[k] = v
}
arguments["metadata"] = metaInterface
}
return c.CallTool("add_document", arguments)
}
// ProcessRAGQuery sends a query for RAG processing
func (c *MCPClient) ProcessRAGQuery(query string) (interface{}, error) {
arguments := map[string]interface{}{
"query": query,
}
return c.CallTool("rag_query", arguments)
}
// sendRequest sends an MCP request to the server
func (c *MCPClient) sendRequest(request MCPRequest) (*MCPResponse, error) {
jsonData, err := json.Marshal(request)
if err != nil {
return nil, fmt.Errorf("failed to marshal request: %w", err)
}
url := c.serverURL + "/mcp"
req, err := http.NewRequest("POST", url, bytes.NewBuffer(jsonData))
if err != nil {
return nil, fmt.Errorf("failed to create request: %w", err)
}
req.Header.Set("Content-Type", "application/json")
resp, err := c.client.Do(req)
if err != nil {
return nil, fmt.Errorf("failed to send request: %w", err)
}
defer resp.Body.Close()
body, err := io.ReadAll(resp.Body)
if err != nil {
return nil, fmt.Errorf("failed to read response: %w", err)
}
if resp.StatusCode != http.StatusOK {
return nil, fmt.Errorf("HTTP error %d: %s", resp.StatusCode, string(body))
}
var response MCPResponse
if err := json.Unmarshal(body, &response); err != nil {
return nil, fmt.Errorf("failed to unmarshal response: %w", err)
}
return &response, nil
}
// CheckHealth checks if the MCP server is healthy
func (c *MCPClient) CheckHealth() error {
url := c.serverURL + "/health"
resp, err := c.client.Get(url)
if err != nil {
return fmt.Errorf("failed to connect to MCP server: %w", err)
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
return fmt.Errorf("MCP server returned status %d", resp.StatusCode)
}
return nil
}
This MCP client provides a clean interface for interacting with MCP servers. It handles the protocol details and provides convenient methods for common operations like searching documents, adding documents, and processing RAG queries.
The client includes proper error handling and mirrors the server's request and response formats, which makes it easy to integrate with any server that speaks the same protocol, not just our own implementation.
PUTTING IT ALL TOGETHER
Now let's create a comprehensive main application that demonstrates all the components working together. Update your main.go file:
package main
import (
"flag"
"fmt"
"log"
"os"
)
func main() {
// Command line flags
var (
mode = flag.String("mode", "chat", "Mode to run: chat, server, client, or demo")
port = flag.Int("port", 8080, "Port for MCP server")
serverURL = flag.String("server", "http://localhost:8080", "MCP server URL for client mode")
dataDir = flag.String("data", "./documents", "Directory for document storage")
ollamaURL = flag.String("ollama", "http://localhost:11434", "Ollama server URL")
model = flag.String("model", "llama2", "LLM model to use")
)
flag.Parse()
switch *mode {
case "chat":
runChatMode(*dataDir, *ollamaURL, *model)
case "server":
runServerMode(*dataDir, *ollamaURL, *model, *port)
case "client":
runClientMode(*serverURL)
case "demo":
runDemoMode(*dataDir, *ollamaURL, *model)
default:
fmt.Printf("Unknown mode: %s\n", *mode)
fmt.Println("Available modes: chat, server, client, demo")
os.Exit(1)
}
}
// runChatMode starts the interactive RAG chat system
func runChatMode(dataDir, ollamaURL, model string) {
fmt.Println("Starting RAG Chat Mode...")
// Initialize components
llmClient := NewOllamaClient(ollamaURL, model)
// Check Ollama connection
if err := llmClient.CheckHealth(); err != nil {
log.Fatalf("Cannot connect to Ollama: %v", err)
}
documentStore, err := NewDocumentStore(dataDir)
if err != nil {
log.Fatalf("Failed to initialize document store: %v", err)
}
ragSystem := NewRAGChatSystem(llmClient, documentStore)
// Add sample documents if the store is empty
if documentStore.GetDocumentCount() == 0 {
fmt.Println("Document store is empty. Adding sample documents...")
if err := ragSystem.AddSampleDocuments(); err != nil {
log.Printf("Warning: Failed to add sample documents: %v", err)
}
}
// Start interactive chat
ragSystem.RunInteractiveChat()
}
// runServerMode starts the MCP server
func runServerMode(dataDir, ollamaURL, model string, port int) {
fmt.Println("Starting MCP Server Mode...")
// Initialize components
llmClient := NewOllamaClient(ollamaURL, model)
// Check Ollama connection
if err := llmClient.CheckHealth(); err != nil {
log.Fatalf("Cannot connect to Ollama: %v", err)
}
documentStore, err := NewDocumentStore(dataDir)
if err != nil {
log.Fatalf("Failed to initialize document store: %v", err)
}
ragSystem := NewRAGChatSystem(llmClient, documentStore)
// Add sample documents if the store is empty
if documentStore.GetDocumentCount() == 0 {
fmt.Println("Document store is empty. Adding sample documents...")
if err := ragSystem.AddSampleDocuments(); err != nil {
log.Printf("Warning: Failed to add sample documents: %v", err)
}
}
// Start MCP server
mcpServer := NewMCPServer(ragSystem, port)
if err := mcpServer.Start(); err != nil {
log.Fatalf("Failed to start MCP server: %v", err)
}
}
// runClientMode demonstrates the MCP client
func runClientMode(serverURL string) {
fmt.Println("Starting MCP Client Mode...")
client := NewMCPClient(serverURL)
// Check server health
if err := client.CheckHealth(); err != nil {
log.Fatalf("Cannot connect to MCP server: %v", err)
}
fmt.Println("Connected to MCP server successfully!")
// List available tools
tools, err := client.ListTools()
if err != nil {
log.Fatalf("Failed to list tools: %v", err)
}
fmt.Println("Available tools:")
if toolsList, ok := tools["tools"].([]interface{}); ok {
for _, tool := range toolsList {
if toolMap, ok := tool.(map[string]interface{}); ok {
name, _ := toolMap["name"].(string)
description, _ := toolMap["description"].(string)
fmt.Printf(" - %s: %s\n", name, description)
}
}
}
// Demonstrate document search
fmt.Println("\nSearching for 'Go programming'...")
searchResult, err := client.SearchDocuments("Go programming", 3)
if err != nil {
log.Printf("Search failed: %v", err)
} else {
fmt.Printf("Search result: %+v\n", searchResult)
}
// Demonstrate RAG query
fmt.Println("\nProcessing RAG query...")
ragResult, err := client.ProcessRAGQuery("What is Go programming language?")
if err != nil {
log.Printf("RAG query failed: %v", err)
} else {
fmt.Printf("RAG result: %+v\n", ragResult)
}
}
// runDemoMode provides a comprehensive demonstration
func runDemoMode(dataDir, ollamaURL, model string) {
fmt.Println("Starting Demo Mode...")
fmt.Println("This will demonstrate all components of the RAG system.")
// Initialize components
llmClient := NewOllamaClient(ollamaURL, model)
// Check Ollama connection
if err := llmClient.CheckHealth(); err != nil {
log.Fatalf("Cannot connect to Ollama: %v", err)
}
fmt.Println("✓ Connected to Ollama")
documentStore, err := NewDocumentStore(dataDir)
if err != nil {
log.Fatalf("Failed to initialize document store: %v", err)
}
fmt.Println("✓ Document store initialized")
ragSystem := NewRAGChatSystem(llmClient, documentStore)
// Add sample documents
if err := ragSystem.AddSampleDocuments(); err != nil {
log.Printf("Warning: Failed to add sample documents: %v", err)
}
fmt.Println("✓ Sample documents added")
// Demonstrate document search
fmt.Println("\n--- Document Search Demo ---")
results := documentStore.SearchDocuments("machine learning", 2)
for i, result := range results {
fmt.Printf("%d. %s (Score: %.2f)\n", i+1, result.Document.Title, result.Score)
fmt.Printf(" Snippet: %s\n", result.Snippet)
}
// Demonstrate RAG query
fmt.Println("\n--- RAG Query Demo ---")
response, err := ragSystem.ProcessQuery("What is machine learning?")
if err != nil {
log.Printf("RAG query failed: %v", err)
} else {
fmt.Printf("Question: What is machine learning?\n")
fmt.Printf("Answer: %s\n", response)
}
// Demonstrate without context
fmt.Println("\n--- Query without Context Demo ---")
response2, err := ragSystem.ProcessQuery("What is quantum computing?")
if err != nil {
log.Printf("Query failed: %v", err)
} else {
fmt.Printf("Question: What is quantum computing?\n")
fmt.Printf("Answer: %s\n", response2)
}
fmt.Println("\n--- Demo Complete ---")
fmt.Println("You can now run the system in different modes:")
fmt.Println(" go run . -mode=chat # Interactive chat")
fmt.Println(" go run . -mode=server # Start MCP server")
fmt.Println(" go run . -mode=client # Test MCP client")
}
This comprehensive main application ties everything together and provides multiple ways to interact with our RAG system. Users can run it in different modes depending on their needs.
TESTING AND DEPLOYMENT CONSIDERATIONS
Now that we have a complete RAG system with MCP support, let's discuss how to test and deploy it effectively.
First, make sure you have Ollama installed and running with a suitable model. You can test the basic functionality by running:
go run . -mode=demo
This will demonstrate all the components working together and help you verify that everything is configured correctly.
For interactive use, you can run the chat mode:
go run . -mode=chat
This provides a user-friendly interface for adding documents and asking questions with RAG enhancement.
To test the MCP functionality, start the server in one terminal:
go run . -mode=server -port=8080
Then test the client in another terminal:
go run . -mode=client -server=http://localhost:8080
When deploying this system, consider the following best practices. Ensure that your Ollama service is properly secured and not exposed to the public internet unless necessary. Use environment variables for configuration instead of command-line flags in production. Implement proper logging and monitoring to track system performance and usage. Consider adding authentication and authorization to your MCP server if it will be accessed by multiple clients.
For scaling, you might want to implement a more sophisticated document indexing system using dedicated search engines like Elasticsearch or vector databases for semantic search. You could also add caching layers to improve response times for frequently asked questions.
CONCLUSION AND NEXT STEPS
Congratulations! You've built a complete RAG-enhanced LLM application with MCP support using entirely open source components. This system demonstrates the power of combining local language models with document retrieval and standardized protocols for AI system integration.
Your system now includes a sophisticated document store with text indexing, a RAG-enhanced chat interface, an MCP server that exposes your capabilities to other applications, and an MCP client that can consume external services.
Some potential enhancements you might consider include implementing vector embeddings for semantic search, adding support for different document formats like PDF and Word, implementing user authentication and multi-tenancy, adding real-time document synchronization, and creating a web-based user interface.
The foundation you've built is solid and extensible. You can continue to enhance it with additional features while maintaining the clean architecture and open source principles that make it powerful and flexible.
Remember that the field of AI and language models is rapidly evolving. Keep an eye on new open source models and tools that might enhance your system's capabilities. The modular design you've implemented makes it easy to swap out components as better alternatives become available.
Most importantly, you now have hands-on experience with the fundamental concepts and technologies that power modern AI applications. This knowledge will serve you well as you continue to explore and build in this exciting field.