Friday, May 01, 2026

The Shortest LLM and RAG Chatbot




Introduction

How can we build the shortest possible LLM chatbot with RAG support? Here is my attempt. Can you beat it?


What is RAG and why is it useful?


RAG helps your chatbot answer questions more accurately by retrieving relevant information from a predefined set of documents (your "knowledge base") before generating a response. Instead of relying solely on the LLM's pre-trained knowledge, it provides the LLM with specific, up-to-date context, reducing hallucinations and improving relevance.
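
In code terms, the pattern is simply "retrieve, then prompt". Here is a deliberately naive, dependency-free sketch of that flow (the keyword-overlap retrieval and all names here are purely illustrative; the real chatbot below replaces this with embeddings and a vector store):


# Toy RAG flow: naive keyword-overlap retrieval standing in for embedding search.
knowledge_base = [
    "GlobalTech was founded in 2005.",
    "GlobalTech's headquarters are in Silicon Valley.",
]

def retrieve(question: str) -> str:
    # Pick the snippet that shares the most words with the question.
    words = set(question.lower().replace("?", "").split())
    return max(knowledge_base, key=lambda doc: len(words & set(doc.lower().rstrip(".").split())))

question = "Where are GlobalTech's headquarters?"
prompt = f"Answer using only this context:\n{retrieve(question)}\n\nQuestion: {question}"
# The combined prompt (context + question), not the bare question, is what goes to the LLM.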


Open Source Components We'll Use:


  • `ollama`: A fantastic tool for running open-source LLMs and embedding models locally. It simplifies model management immensely.
  • `llama3` (or similar): A powerful open-source LLM from Meta, served by `ollama`.
  • `nomic-embed-text`: An open-source embedding model, also served by `ollama`, used to convert text into numerical vectors for searching.
  • `FAISS`: Facebook AI Similarity Search, an open-source library for efficient similarity search and clustering of dense vectors. We'll use it as our vector store (a tiny standalone example follows this list).
  • `langchain`: An open-source framework designed to build applications with LLMs, making RAG pipelines very straightforward.
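
Since FAISS may be the least familiar piece, here is a small standalone taste of what it does for us, using random vectors in place of real embeddings (numpy is assumed to be installed; the dimension is arbitrary for the demo, though 768 matches what `nomic-embed-text` produces):


import faiss
import numpy as np

dim = 768                                                 # embedding size
doc_vectors = np.random.rand(8, dim).astype("float32")    # stand-ins for document embeddings
index = faiss.IndexFlatL2(dim)                            # exact (brute-force) L2-distance index
index.add(doc_vectors)                                    # store the document vectors

query_vector = np.random.rand(1, dim).astype("float32")   # stand-in for the question's embedding
distances, ids = index.search(query_vector, 3)            # indices of the 3 closest documents
print(ids[0], distances[0])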


Getting Started: Setup Steps


Before running the code, you'll need to set up your environment.


1.  Install `ollama`:

Download and install `ollama` from their official website: [ollama.com](https://ollama.com/). Follow the instructions for your operating system.


2.  Pull LLM and Embedding Models with `ollama`:

Once `ollama` is installed and running, open your terminal and pull the necessary models. We'll use `llama3` for the LLM and `nomic-embed-text` for embeddings.


    ollama pull llama3

    ollama pull nomic-embed-text

    

(You can choose other `ollama` models if you prefer; just ensure they are pulled and update the model names in the Python code accordingly.)
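
Once the downloads finish, you can confirm the models are available locally:


    ollama list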


3.  Install Python Libraries:

You'll need `langchain-community`, `langchain-core`, and `faiss-cpu`.


    pip install langchain-community langchain-core faiss-cpu
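
(Optional) a quick sanity check that everything is importable:


    python -c "import faiss, langchain_community, langchain_core; print('imports OK')"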

   


The Shortest RAG Chatbot Code


Here's the Python code for your RAG chatbot. It's designed to be as minimal as possible while demonstrating the full RAG workflow.



# 1. Import necessary components from langchain
from langchain_community.llms import Ollama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# --- Configuration ---
LLM_MODEL = "llama3"                  # Ensure this model is pulled in Ollama
EMBEDDING_MODEL = "nomic-embed-text"  # Ensure this model is pulled in Ollama

# 2. Define your knowledge base (sample documents)
# In a real-world scenario, these would be loaded from files, databases, etc.
documents = [
    Document(page_content="GlobalTech Innovations is a leading technology company specializing in AI, cloud computing, and sustainable energy solutions."),
    Document(page_content="The company was founded by Dr. Evelyn Reed and Mr. David Chen on 15 March 2005."),
    Document(page_content="GlobalTech's headquarters are located in Silicon Valley, California, with major offices in London and Singapore."),
    Document(page_content="Their flagship product, 'Nexus AI', provides advanced data analytics and machine learning capabilities for enterprises."),
    Document(page_content="The current CEO of GlobalTech Innovations is Sarah Jenkins, appointed in January 2023."),
    Document(page_content="GlobalTech is committed to fostering innovation and developing ethical AI practices."),
    Document(page_content="They recently launched 'EcoCloud', a new cloud platform designed for energy efficiency and reduced carbon footprint."),
    Document(page_content="GlobalTech Innovations employs over 10,000 people worldwide and serves a diverse global client base."),
]

# 3. Initialize Embedding Model (via Ollama)
print(f"Initializing embedding model: {EMBEDDING_MODEL}...")
embeddings = OllamaEmbeddings(model=EMBEDDING_MODEL)

# 4. Create and populate the Vector Store (FAISS)
# This step embeds your documents and stores them for efficient retrieval.
print("Creating FAISS vector store from documents...")
vectorstore = FAISS.from_documents(documents, embeddings)
retriever = vectorstore.as_retriever()  # A retriever fetches the most relevant documents

# 5. Initialize the Large Language Model (via Ollama)
print(f"Initializing LLM: {LLM_MODEL}...")
llm = Ollama(model=LLM_MODEL)

# 6. Define the RAG Prompt Template
# This template instructs the LLM on how to use the retrieved context.
template = """You are a helpful assistant.
Answer the question based ONLY on the following context.
If you cannot find the answer in the context, politely state that you don't have enough information.

Context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

# 7. Construct the RAG Chain
# This chain defines the flow: retrieve -> format prompt -> invoke LLM -> parse output.
print("Setting up the RAG chain...")
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}  # Retrieve context, pass the question through
    | prompt              # Apply the prompt template
    | llm                 # Invoke the LLM
    | StrOutputParser()   # Parse the LLM output to a string
)

# 8. Start the Chat Loop
print("\n--- RAG Chatbot Ready! ---")
print(f"Using LLM: {LLM_MODEL}, Embeddings: {EMBEDDING_MODEL}")
print("Type your questions, or 'exit' to quit.")

while True:
    user_query = input("\nYou: ")
    if user_query.lower() == 'exit':
        print("Goodbye! Stay productive!")
        break

    print("Bot (thinking...): ", end="", flush=True)
    try:
        response = rag_chain.invoke(user_query)
        print(response)
    except Exception as e:
        print(f"An error occurred: {e}. Please ensure Ollama is running and models are pulled correctly.")




How to Run the Chatbot


1.  Save the code above as a Python file (e.g., `rag_chatbot.py`).

2.  Open your terminal or command prompt.

3.  Navigate to the directory where you saved the file.

4.  Run the script:

    

    python rag_chatbot.py

    

5.  The chatbot will initialize, and then you can start asking questions!



Example Interactions:



You: Who founded GlobalTech Innovations?

Bot: GlobalTech Innovations was founded by Dr. Evelyn Reed and Mr. David Chen.


You: Where is GlobalTech's headquarters?

Bot: GlobalTech's headquarters are located in Silicon Valley, California, with major offices in London and Singapore.


You: What is Nexus AI?

Bot: 'Nexus AI' is GlobalTech Innovations' flagship product, providing advanced data analytics and machine learning capabilities for enterprises.


You: What is the capital of France?

Bot: I don't have enough information to answer that question.
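
The sample documents are hard-coded to keep the script short. To point the chatbot at your own material, you could build the `documents` list from files on disk instead; here is a minimal sketch, assuming a folder of plain-text notes (the folder name and glob pattern are placeholders):


from pathlib import Path
from langchain_core.documents import Document

# Hypothetical folder of .txt notes; swap in your own path and file pattern.
documents = [
    Document(page_content=path.read_text(encoding="utf-8"), metadata={"source": str(path)})
    for path in sorted(Path("my_knowledge_base").glob("*.txt"))
]


For longer files you would also want to split them into smaller chunks before embedding, but that is a topic for a slightly less short chatbot.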



This setup provides a working, incredibly concise foundation for a RAG-powered chatbot built entirely from open-source tools. Enjoy streamlining your information retrieval!