Introduction
The integration of Large Language Models (LLMs) into research workflows has changed how we approach information discovery and synthesis. An LLM-based research agent is an application that can understand a user's query, autonomously gather relevant information from diverse sources, critically evaluate that information, and present it coherently with proper citations. This article provides a detailed guide for software engineers on how to build such a system, covering both theoretical concepts and practical implementation details.
A research agent differs from a standard chatbot in its ability to act autonomously on behalf of the user, making decisions about what information to retrieve, how to validate it, and how to present findings in a useful way. The agentic nature of these systems allows them to perform complex sequences of actions without constant user intervention, making them particularly valuable for research tasks that require exploring multiple sources and synthesizing diverse information.
System Architecture Overview
The architecture of an LLM-based research agent consists of several interconnected components that work together to process user queries, retrieve information, validate sources, and generate responses. At its core, the system uses one or more LLMs to understand user intent, generate search queries, evaluate information relevance, and synthesize findings.
The research agent operates in a cycle that begins with receiving a user query, proceeds through information gathering and evaluation, and concludes with presenting findings to the user. The user can then refine their query based on the initial results, initiating another cycle of research. This iterative approach allows for progressive refinement of the research focus and ensures that the information provided aligns with the user's needs.
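To make this cycle concrete, here is a minimal sketch of the control loop that the rest of this article fleshes out; the agent object, input function, and presentation function are placeholders for the components introduced below.
def research_loop(agent, get_user_input, present_findings):
    # Minimal sketch of the research cycle; `agent`, `get_user_input`, and
    # `present_findings` stand in for components built later in this article.
    query = get_user_input("Research question: ")
    while query:
        findings = agent.process_query(query)  # gather, validate, and synthesize
        present_findings(findings)             # show findings with citations
        # The user can refine the question based on the results, or stop.
        query = get_user_input("Refined question (leave blank to finish): ")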
Core Components
User Interface
The user interface serves as the entry point for interactions with the research agent. We can implement this interface in two primary ways: as a text-based chatbot or as a web application. The text-based interface is simpler to implement and can be integrated into existing command-line tools or messaging platforms. The web-based interface offers more flexibility in terms of displaying information, managing interactions, and integrating with other web services.
For a text-based interface, we might use a simple command-line application that processes user input and displays responses. This approach is straightforward but limits the ways in which information can be presented. A web-based interface, on the other hand, can leverage HTML, CSS, and JavaScript to create a more interactive and visually appealing experience. This might include features such as collapsible sections for different sources, interactive citations, and the ability to save or export research findings.
Let's look at a simple example of how we might implement a basic text-based interface using Python:
import readline # For better input handling with command history
class ResearchAgentCLI:
def __init__(self, research_agent):
self.research_agent = research_agent
self.history = []
def start_session(self):
print("Welcome to the Research Agent. Enter your research question or type 'exit' to quit.")
while True:
user_input = input("\nResearch Question: ")
if user_input.lower() == 'exit':
print("Thank you for using the Research Agent. Goodbye!")
break
# Process the user query through the research agent
response = self.research_agent.process_query(user_input)
# Display the response
print("\n" + "="*80)
print("RESEARCH FINDINGS:")
print(response['findings'])
print("\nSOURCES:")
for i, source in enumerate(response['sources'], 1):
print(f"{i}. {source['title']} - {source['authors']} ({source['year']})")
print(f" URL: {source['url']}")
print("="*80)
# Store the interaction in history
self.history.append({
'query': user_input,
'response': response
})
print("\nWould you like to refine your query based on these findings? (yes/no)")
refine = input().lower()
if refine == 'yes':
print("Please enter your refined query:")
else:
print("You can ask a new research question or type 'exit' to quit.")
# Usage example
if __name__ == "__main__":
from research_agent import ResearchAgent
agent = ResearchAgent()
cli = ResearchAgentCLI(agent)
cli.start_session()
This code creates a simple command-line interface for interacting with the research agent. It initializes a session, takes user input, processes it through the research agent, and displays the findings and sources. It also provides an option for the user to refine their query based on the initial results.
For a web-based interface, we might use a framework like Flask or FastAPI to create a backend API that communicates with the research agent, and a frontend built with HTML, CSS, and JavaScript (possibly using a framework like React or Vue.js) to handle user interactions. Here's a simplified example of how we might implement a Flask-based API for our research agent:
from flask import Flask, request, jsonify
from research_agent import ResearchAgent
app = Flask(__name__)
research_agent = ResearchAgent()
@app.route('/api/research', methods=['POST'])
def conduct_research():
data = request.json
user_query = data.get('query', '')
if not user_query:
return jsonify({'error': 'No query provided'}), 400
# Process the query through the research agent
response = research_agent.process_query(user_query)
return jsonify(response)
@app.route('/api/refine', methods=['POST'])
def refine_research():
data = request.json
original_query = data.get('original_query', '')
refined_query = data.get('refined_query', '')
previous_findings = data.get('previous_findings', {})
if not refined_query:
return jsonify({'error': 'No refined query provided'}), 400
# Process the refined query, potentially using previous findings
response = research_agent.refine_query(original_query, refined_query, previous_findings)
return jsonify(response)
if __name__ == '__main__':
app.run(debug=True)
This Flask application creates two API endpoints: one for conducting initial research based on a user query, and another for refining research based on previous findings. The frontend would make HTTP requests to these endpoints and display the results to the user.
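As a sketch of how a frontend or any other HTTP client might call these endpoints, the following uses Python's requests library against the /api/research route; the localhost URL and the 'findings'/'sources' keys simply mirror the Flask example above and would need adjusting to your deployment.
import requests

def run_research(query: str, base_url: str = "http://localhost:5000"):
    # Hypothetical client for the Flask API above; assumes the server is
    # running locally on Flask's default port.
    resp = requests.post(f"{base_url}/api/research", json={"query": query})
    resp.raise_for_status()
    data = resp.json()
    print(data.get("findings", ""))
    for i, source in enumerate(data.get("sources", []), 1):
        print(f"{i}. {source.get('title', '')} - {source.get('url', '')}")
    return data

if __name__ == "__main__":
    run_research("What are the main approaches to retrieval-augmented generation?")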
Query Processing
The query processing component is responsible for understanding the user's research question and transforming it into a format that can be used for information retrieval. This involves several steps, including parsing the user's input, identifying key concepts and terms, and generating search queries for different information sources.
LLMs are particularly well-suited for this task because they can understand natural language queries and extract relevant information. By prompting the LLM with the user's query and instructions on how to process it, we can generate effective search queries that capture the essence of the research question.
Here's an example of how we might implement query processing using an LLM:
import openai
import json
import os
from typing import List, Dict, Any
class QueryProcessor:
def __init__(self, api_key=None):
# Use provided API key or get from environment variable
self.api_key = api_key or os.getenv("OPENAI_API_KEY")
openai.api_key = self.api_key
def process_query(self, user_query: str) -> Dict[str, Any]:
"""
Process a user query to extract key concepts and generate search queries.
Args:
user_query: The research question or description from the user
Returns:
A dictionary containing processed query information
"""
# Construct a prompt for the LLM
prompt = f"""
I need to conduct research on the following topic:
"{user_query}"
Please help me by:
1. Identifying the main research question
2. Extracting key concepts and terms
3. Generating 3-5 effective search queries for academic databases
4. Suggesting specific academic databases or sources that might be relevant
Format your response as a structured JSON object.
"""
# Call the LLM API
response = openai.ChatCompletion.create(
model="gpt-4",
messages=[
{"role": "system", "content": "You are a research assistant helping to process research queries."},
{"role": "user", "content": prompt}
],
response_format={"type": "json_object"}
)
# Parse the response
try:
result = json.loads(response.choices[0].message.content)
return result
except json.JSONDecodeError:
# Fallback if the response is not valid JSON
return {
"main_question": user_query,
"key_concepts": [],
"search_queries": [user_query],
"suggested_sources": ["Google Scholar", "PubMed", "arXiv"]
}
def refine_query(self, original_query: str, refined_query: str, previous_results: Dict[str, Any]) -> Dict[str, Any]:
"""
Refine a query based on user feedback and previous results.
Args:
original_query: The original research question
refined_query: The user's refined question or feedback
previous_results: Results from the previous search
Returns:
A dictionary containing the refined query information
"""
# Construct a prompt for the LLM
prompt = f"""
I previously researched:
"{original_query}"
Based on the initial findings, I want to refine my research with this follow-up:
"{refined_query}"
Previous findings summary:
{previous_results.get('summary', 'No summary available')}
Please help me refine my research by:
1. Identifying the new focus or direction
2. Extracting additional key concepts and terms
3. Generating 3-5 refined search queries
4. Suggesting specific sources to focus on or exclude
Format your response as a structured JSON object.
"""
# Call the LLM API
response = openai.ChatCompletion.create(
model="gpt-4",
messages=[
{"role": "system", "content": "You are a research assistant helping to refine research queries."},
{"role": "user", "content": prompt}
],
response_format={"type": "json_object"}
)
# Parse the response
try:
result = json.loads(response.choices[0].message.content)
return result
except json.JSONDecodeError:
# Fallback if the response is not valid JSON
return {
"new_focus": refined_query,
"additional_concepts": [],
"refined_queries": [refined_query],
"source_recommendations": []
}
This QueryProcessor class uses the OpenAI API to process user queries and generate structured information that can be used for information retrieval. It includes methods for processing initial queries and refining queries based on user feedback and previous results. The LLM is prompted to identify key concepts, generate search queries, and suggest relevant sources, with the results returned in a structured format that can be used by other components of the system.
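A short usage sketch, assuming OPENAI_API_KEY is set in the environment, might look like this; the exact keys depend on how the LLM structures its JSON, and the names shown match the fallback format in the class.
# Usage sketch for QueryProcessor; assumes OPENAI_API_KEY is set in the environment.
processor = QueryProcessor()
processed = processor.process_query("How do transformer models handle long-range dependencies?")
print(processed.get("main_question"))
for search_query in processed.get("search_queries", []):
    print("Search query:", search_query)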
Information Retrieval System
The information retrieval component is responsible for gathering relevant information from various sources based on the processed query. This includes searching the internet, academic databases, and other repositories for documents, papers, and articles that might contain information relevant to the user's research question.
To implement this component, we need to integrate with various APIs and services that provide access to different information sources. This might include general search engines like Google, academic databases like PubMed or arXiv, and specialized repositories for specific domains.
Here's an example of how we might implement a modular information retrieval system that can search multiple sources:
import requests
import json
import os
import time
from typing import List, Dict, Any
from bs4 import BeautifulSoup
import arxiv
from scholarly import scholarly  # the scholarly package exposes its API via this submodule
class InformationRetriever:
def __init__(self, api_keys=None):
"""
Initialize the information retriever with API keys for various services.
Args:
api_keys: Dictionary of API keys for different services
"""
self.api_keys = api_keys or {}
self.serp_api_key = self.api_keys.get('serp_api') or os.getenv("SERP_API_KEY")
def search_web(self, query: str, num_results: int = 10) -> List[Dict[str, Any]]:
"""
Search the web using SerpAPI (Google Search API wrapper).
Args:
query: The search query
num_results: Number of results to return
Returns:
A list of search results with title, snippet, and URL
"""
if not self.serp_api_key:
print("Warning: No SerpAPI key provided. Web search disabled.")
return []
url = "https://serpapi.com/search"
params = {
"q": query,
"api_key": self.serp_api_key,
"engine": "google",
"num": num_results
}
try:
response = requests.get(url, params=params)
data = response.json()
results = []
for item in data.get('organic_results', []):
results.append({
'title': item.get('title', ''),
'snippet': item.get('snippet', ''),
'url': item.get('link', ''),
'source_type': 'web'
})
return results
except Exception as e:
print(f"Error searching the web: {e}")
return []
def search_arxiv(self, query: str, max_results: int = 10) -> List[Dict[str, Any]]:
"""
Search arXiv for academic papers.
Args:
query: The search query
max_results: Maximum number of results to return
Returns:
A list of papers with title, authors, abstract, and URL
"""
try:
search = arxiv.Search(
query=query,
max_results=max_results,
sort_by=arxiv.SortCriterion.Relevance
)
results = []
for paper in search.results():
results.append({
'title': paper.title,
'authors': [author.name for author in paper.authors],  # arxiv returns Author objects; .name gives the string
'abstract': paper.summary,
'url': paper.pdf_url,
'published': paper.published.strftime('%Y-%m-%d'),
'source_type': 'arxiv'
})
return results
except Exception as e:
print(f"Error searching arXiv: {e}")
return []
def search_google_scholar(self, query: str, max_results: int = 10) -> List[Dict[str, Any]]:
"""
Search Google Scholar for academic papers.
Args:
query: The search query
max_results: Maximum number of results to return
Returns:
A list of papers with title, authors, abstract, and URL
"""
try:
search_query = scholarly.search_pubs(query)
results = []
count = 0
for paper in search_query:
if count >= max_results:
break
# Get detailed information
try:
# Recent versions of scholarly return plain dictionaries rather than objects
detailed_paper = scholarly.fill(paper)
bib = detailed_paper.get('bib', {})
results.append({
    'title': bib.get('title', ''),
    'authors': bib.get('author', []),
    'abstract': bib.get('abstract', ''),
    'url': detailed_paper.get('pub_url', ''),
    'year': bib.get('pub_year', ''),
    'citations': detailed_paper.get('num_citations', 0),
    'source_type': 'google_scholar'
})
count += 1
except Exception as inner_e:
print(f"Error retrieving details for a Google Scholar paper: {inner_e}")
# Avoid rate limiting
time.sleep(1)
return results
except Exception as e:
print(f"Error searching Google Scholar: {e}")
return []
def retrieve_information(self, queries: List[str], sources: List[str] = None) -> Dict[str, List[Dict[str, Any]]]:
"""
Retrieve information from multiple sources based on a list of queries.
Args:
queries: List of search queries
sources: List of sources to search (defaults to all available)
Returns:
A dictionary mapping source names to lists of results
"""
if sources is None:
sources = ['web', 'arxiv', 'google_scholar']
results = {source: [] for source in sources}
for query in queries:
if 'web' in sources:
web_results = self.search_web(query)
results['web'].extend(web_results)
if 'arxiv' in sources:
arxiv_results = self.search_arxiv(query)
results['arxiv'].extend(arxiv_results)
if 'google_scholar' in sources:
scholar_results = self.search_google_scholar(query)
results['google_scholar'].extend(scholar_results)
# Remove duplicates (based on URL)
for source in results:
unique_urls = set()
unique_results = []
for result in results[source]:
url = result.get('url', '')
if url and url not in unique_urls:
unique_urls.add(url)
unique_results.append(result)
results[source] = unique_results
return results
This InformationRetriever class provides methods for searching different sources of information, including the web (using SerpAPI), arXiv, and Google Scholar. It includes a method for retrieving information from multiple sources based on a list of queries, with results organized by source. The class handles error cases and rate limiting to ensure reliable operation.
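A usage sketch might look like the following; a SerpAPI key is only needed for the web search, while the arXiv and Google Scholar searches run without one.
# Usage sketch for InformationRetriever; replace the placeholder key with a real one
# or omit it to skip web search.
retriever = InformationRetriever(api_keys={"serp_api": "YOUR_SERP_API_KEY"})
results = retriever.retrieve_information(
    queries=["retrieval augmented generation survey"],
    sources=["arxiv", "web"]
)
for source_name, items in results.items():
    print(f"{source_name}: {len(items)} results")
    for item in items[:3]:
        print("  -", item.get("title", ""))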
Source Validation and Reference Management
The source validation and reference management component is responsible for evaluating the credibility and relevance of the information retrieved, and for managing citations and references. This is a critical component for ensuring that the research agent provides reliable and verifiable information.
To implement this component, we need to develop methods for assessing the quality of sources, extracting citation information, and formatting references according to standard citation styles. We can use LLMs to assist with these tasks, particularly for evaluating the relevance and credibility of sources.
Here's an example of how we might implement this component:
import openai
import os
import json
from typing import List, Dict, Any
from datetime import datetime
class SourceValidator:
def __init__(self, api_key=None):
"""
Initialize the source validator with an API key for the LLM service.
Args:
api_key: API key for the LLM service
"""
self.api_key = api_key or os.getenv("OPENAI_API_KEY")
openai.api_key = self.api_key
def validate_sources(self, sources: List[Dict[str, Any]], query: str) -> List[Dict[str, Any]]:
"""
Validate sources for relevance and credibility.
Args:
sources: List of sources to validate
query: The original research query
Returns:
A list of validated sources with additional metadata
"""
validated_sources = []
for source in sources:
# Skip if there's not enough information to validate
if not source.get('title') or not source.get('url'):
continue
# Prepare source information for validation
source_info = {
'title': source.get('title', ''),
'authors': source.get('authors', []),
'abstract': source.get('abstract', source.get('snippet', '')),
'url': source.get('url', ''),
'published_date': source.get('published', source.get('year', '')),
'source_type': source.get('source_type', 'unknown')
}
# Validate the source using the LLM
validation_result = self._validate_with_llm(source_info, query)
# Combine original source info with validation results
validated_source = {**source, **validation_result}
validated_sources.append(validated_source)
# Sort sources by relevance score
validated_sources.sort(key=lambda x: x.get('relevance_score', 0), reverse=True)
return validated_sources
def _validate_with_llm(self, source_info: Dict[str, Any], query: str) -> Dict[str, Any]:
"""
Use an LLM to validate a source for relevance and credibility.
Args:
source_info: Information about the source
query: The original research query
Returns:
A dictionary with validation results
"""
# Construct a prompt for the LLM
prompt = f"""
I'm researching: "{query}"
Please evaluate the following source for relevance and credibility:
Title: {source_info['title']}
Authors: {', '.join(source_info['authors']) if isinstance(source_info['authors'], list) else source_info['authors']}
Abstract/Snippet: {source_info['abstract']}
URL: {source_info['url']}
Published: {source_info['published_date']}
Source Type: {source_info['source_type']}
Please provide:
1. A relevance score (0-100) indicating how relevant this source is to my research query
2. A credibility score (0-100) based on the source type, authors, publication venue, etc.
3. A brief explanation of your evaluation
4. Key points from this source that are relevant to my research
Format your response as a structured JSON object.
"""
try:
# Call the LLM API
response = openai.ChatCompletion.create(
model="gpt-4",
messages=[
{"role": "system", "content": "You are a research assistant evaluating sources for relevance and credibility."},
{"role": "user", "content": prompt}
],
response_format={"type": "json_object"}
)
# Parse the response
result = json.loads(response.choices[0].message.content)
# Add a timestamp for the validation
result['validated_at'] = datetime.now().isoformat()
return result
except Exception as e:
print(f"Error validating source with LLM: {e}")
# Return default values if validation fails
return {
'relevance_score': 50,
'credibility_score': 50,
'evaluation': "Automated validation failed. Please evaluate this source manually.",
'key_points': [],
'validated_at': datetime.now().isoformat()
}
class ReferenceManager:
def __init__(self):
"""Initialize the reference manager."""
pass
def format_reference(self, source: Dict[str, Any], style: str = 'apa') -> str:
"""
Format a source as a reference according to the specified citation style.
Args:
source: Source information
style: Citation style (apa, mla, chicago, etc.)
Returns:
A formatted reference string
"""
if style.lower() == 'apa':
return self._format_apa(source)
elif style.lower() == 'mla':
return self._format_mla(source)
elif style.lower() == 'chicago':
return self._format_chicago(source)
else:
return self._format_apa(source) # Default to APA
def _format_apa(self, source: Dict[str, Any]) -> str:
"""Format a reference in APA style."""
authors = source.get('authors', [])
if isinstance(authors, list):
if len(authors) == 0:
author_str = "No author"
elif len(authors) == 1:
author_str = authors[0]
elif len(authors) < 8:
author_str = ", ".join(authors[:-1]) + ", & " + authors[-1]
else:
author_str = ", ".join(authors[:6]) + ", ... " + authors[-1]
else:
author_str = authors
year = source.get('year', source.get('published', 'n.d.'))
if isinstance(year, str) and len(year) >= 4:
year = year[:4] # Extract just the year
title = source.get('title', 'Untitled')
url = source.get('url', '')
# Format based on source type
source_type = source.get('source_type', '').lower()
if source_type == 'arxiv' or source_type == 'google_scholar':
# Format as a journal article or preprint
return f"{author_str}. ({year}). {title}. Retrieved from {url}"
else:
# Format as a web page
return f"{author_str}. ({year}). {title}. Retrieved from {url}"
def _format_mla(self, source: Dict[str, Any]) -> str:
"""Format a reference in MLA style."""
# Implementation similar to _format_apa but following MLA guidelines
# This is a simplified version
authors = source.get('authors', [])
if isinstance(authors, list):
if len(authors) == 0:
author_str = "No author"
elif len(authors) == 1:
author_str = authors[0]
elif len(authors) == 2:
author_str = authors[0] + " and " + authors[1]
else:
author_str = authors[0] + ", et al."
else:
author_str = authors
title = source.get('title', 'Untitled')
url = source.get('url', '')
# Format based on source type
return f"{author_str}. \"{title}.\" {url}. Accessed {datetime.now().strftime('%d %b. %Y')}."
def _format_chicago(self, source: Dict[str, Any]) -> str:
"""Format a reference in Chicago style."""
# Implementation similar to _format_apa but following Chicago guidelines
# This is a simplified version
authors = source.get('authors', [])
if isinstance(authors, list):
if len(authors) == 0:
author_str = "No author"
elif len(authors) == 1:
author_str = authors[0]
else:
author_str = authors[0] + ", et al."
else:
author_str = authors
year = source.get('year', source.get('published', 'n.d.'))
if isinstance(year, str) and len(year) >= 4:
year = year[:4] # Extract just the year
title = source.get('title', 'Untitled')
url = source.get('url', '')
return f"{author_str}. {year}. \"{title}.\" {url}."
This code includes two classes: SourceValidator and ReferenceManager. The SourceValidator class uses an LLM to evaluate sources for relevance and credibility, providing scores and explanations for each source. The ReferenceManager class formats references according to different citation styles, including APA, MLA, and Chicago. Together, these classes ensure that the research agent provides reliable information with proper citations.
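A brief usage sketch, assuming OPENAI_API_KEY is set and that `results` is the dictionary returned by the InformationRetriever above, might look like this:
# Usage sketch: validate retrieved sources, then format the top ones as APA references.
validator = SourceValidator()
ref_manager = ReferenceManager()

all_sources = [s for items in results.values() for s in items]
validated = validator.validate_sources(all_sources, query="retrieval augmented generation")

for source in validated[:5]:
    print(source.get("relevance_score"), "-", source.get("title"))
    print("   ", ref_manager.format_reference(source, style="apa"))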
Response Generation
The response generation component is responsible for synthesizing the information retrieved and validated into a coherent response that addresses the user's research question. This involves summarizing the findings, highlighting key points, and presenting the information in a clear and organized manner.
LLMs are particularly well-suited for this task because they can understand and generate natural language text. By providing the LLM with the validated information and instructions on how to synthesize it, we can generate comprehensive and coherent responses.
Here's an example of how we might implement this component:
import openai
import os
import json
from datetime import datetime
from typing import List, Dict, Any
class ResponseGenerator:
def __init__(self, api_key=None):
"""
Initialize the response generator with an API key for the LLM service.
Args:
api_key: API key for the LLM service
"""
self.api_key = api_key or os.getenv("OPENAI_API_KEY")
openai.api_key = self.api_key
def generate_response(self, query: str, validated_sources: List[Dict[str, Any]],
reference_style: str = 'apa') -> Dict[str, Any]:
"""
Generate a comprehensive response based on validated sources.
Args:
query: The original research query
validated_sources: List of validated sources
reference_style: Citation style for references
Returns:
A dictionary containing the response and metadata
"""
# Filter sources by relevance
relevant_sources = [s for s in validated_sources if s.get('relevance_score', 0) > 50]
# If no relevant sources, return a message indicating this
if not relevant_sources:
return {
'findings': "I couldn't find any relevant information for your query. Please try refining your question or exploring different keywords.",
'sources': [],
'summary': "No relevant information found."
}
# Prepare source information for the LLM
source_info = []
for i, source in enumerate(relevant_sources[:10]): # Limit to top 10 sources
source_info.append({
'id': i + 1,
'title': source.get('title', ''),
'authors': source.get('authors', []),
'key_points': source.get('key_points', []),
'relevance_score': source.get('relevance_score', 0),
'credibility_score': source.get('credibility_score', 0),
'evaluation': source.get('evaluation', '')
})
# Generate a response using the LLM
response = self._synthesize_with_llm(query, source_info)
# Format references for the sources
formatted_sources = []
for i, source in enumerate(relevant_sources[:10]):
reference_id = f"[{i+1}]"
formatted_reference = self._format_reference(source, reference_style)
formatted_sources.append({
'id': reference_id,
'reference': formatted_reference,
'title': source.get('title', ''),
'authors': source.get('authors', []),
'url': source.get('url', ''),
'year': source.get('year', source.get('published', '')),
'relevance_score': source.get('relevance_score', 0),
'credibility_score': source.get('credibility_score', 0)
})
# Return the complete response
return {
'findings': response['findings'],
'sources': formatted_sources,
'summary': response['summary'],
'limitations': response.get('limitations', 'No specific limitations noted.')
}
def _synthesize_with_llm(self, query: str, source_info: List[Dict[str, Any]]) -> Dict[str, Any]:
"""
Use an LLM to synthesize information from multiple sources.
Args:
query: The original research query
source_info: Information about relevant sources
Returns:
A dictionary with the synthesized findings and metadata
"""
# Construct a prompt for the LLM
sources_text = "\n\n".join([
f"Source {s['id']}:\nTitle: {s['title']}\nAuthors: {', '.join(s['authors']) if isinstance(s['authors'], list) else s['authors']}\nKey Points: {'; '.join(s['key_points'])}\nRelevance: {s['relevance_score']}/100, Credibility: {s['credibility_score']}/100"
for s in source_info
])
prompt = f"""
I'm researching: "{query}"
I've found the following relevant sources:
{sources_text}
Please synthesize this information into a comprehensive response that addresses my research question. Your response should:
1. Provide a clear and detailed answer to my question based on the sources
2. Highlight key findings and insights
3. Note areas of consensus and disagreement among sources
4. Identify any limitations or gaps in the available information
5. Include in-text citations using the format [Source ID] (e.g., [1], [2], etc.)
Also provide:
- A brief summary (2-3 sentences) of the overall findings
- Any limitations of the current research
Format your response as a structured JSON object with 'findings', 'summary', and 'limitations' fields.
"""
try:
# Call the LLM API
response = openai.ChatCompletion.create(
model="gpt-4",
messages=[
{"role": "system", "content": "You are a research assistant synthesizing information from multiple sources."},
{"role": "user", "content": prompt}
],
response_format={"type": "json_object"}
)
# Parse the response
result = json.loads(response.choices[0].message.content)
return result
except Exception as e:
print(f"Error synthesizing response with LLM: {e}")
# Return a default response if synthesis fails
return {
'findings': "I encountered an error while synthesizing the information. Here are the key points from the most relevant sources:\n\n" +
"\n\n".join([f"Source {s['id']}: {s['title']}\n" +
"Key Points: " + "; ".join(s['key_points']) for s in source_info]),
'summary': "Synthesis failed. Please review the key points from each source.",
'limitations': "The automated synthesis process encountered an error."
}
def _format_reference(self, source: Dict[str, Any], style: str) -> str:
"""
Format a reference according to the specified citation style.
Args:
source: Source information
style: Citation style
Returns:
A formatted reference string
"""
# This is a simplified version - in a real implementation, you would use a more robust
# reference formatting system or library
authors = source.get('authors', [])
if isinstance(authors, list):
if len(authors) == 0:
author_str = "No author"
elif len(authors) == 1:
author_str = authors[0]
elif len(authors) < 8:
author_str = ", ".join(authors[:-1]) + ", & " + authors[-1]
else:
author_str = ", ".join(authors[:6]) + ", ... " + authors[-1]
else:
author_str = authors
year = source.get('year', source.get('published', 'n.d.'))
if isinstance(year, str) and len(year) >= 4:
year = year[:4] # Extract just the year
title = source.get('title', 'Untitled')
url = source.get('url', '')
if style.lower() == 'apa':
return f"{author_str}. ({year}). {title}. Retrieved from {url}"
elif style.lower() == 'mla':
return f"{author_str}. \"{title}.\" {url}. Accessed {datetime.now().strftime('%d %b. %Y')}."
elif style.lower() == 'chicago':
return f"{author_str}. {year}. \"{title}.\" {url}."
else:
return f"{author_str}. ({year}). {title}. {url}"
This ResponseGenerator class uses an LLM to synthesize information from multiple validated sources into a coherent response. It includes methods for generating the response and formatting references according to different citation styles. The response includes the synthesized findings, a summary, and information about any limitations or gaps in the available information.
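A usage sketch, assuming OPENAI_API_KEY is set and that `validated` is the list produced by the SourceValidator above, might look like this:
# Usage sketch: synthesize a cited answer from the validated sources.
generator = ResponseGenerator()
report = generator.generate_response(
    query="What are the main approaches to retrieval-augmented generation?",
    validated_sources=validated,
    reference_style="apa"
)
print(report["summary"])
print(report["findings"])
for src in report["sources"]:
    print(src["id"], src["reference"])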
Iterative Refinement Mechanism
The iterative refinement mechanism allows users to refine their research questions based on the initial findings. This is a critical feature for a research agent because it enables a more interactive and dynamic research process, where the user can progressively narrow down their focus or explore different aspects of their topic.
To implement this mechanism, we need to develop methods for processing refined queries and incorporating previous findings into the new search. This involves maintaining a history of interactions and using this history to inform subsequent searches.
Here's an example of how we might implement this component:
import os
import json
import openai
from typing import List, Dict, Any
from datetime import datetime
class ResearchSession:
def __init__(self, session_id=None):
"""
Initialize a research session.
Args:
session_id: Unique identifier for the session
"""
self.session_id = session_id or datetime.now().strftime("%Y%m%d%H%M%S")
self.interactions = []
self.current_query = None
self.current_findings = None
def add_interaction(self, query: str, findings: Dict[str, Any]) -> None:
"""
Add an interaction to the session history.
Args:
query: The research query
findings: The findings returned for the query
"""
interaction = {
'timestamp': datetime.now().isoformat(),
'query': query,
'findings': findings
}
self.interactions.append(interaction)
self.current_query = query
self.current_findings = findings
def get_session_history(self) -> List[Dict[str, Any]]:
"""
Get the session history.
Returns:
A list of interactions in the session
"""
return self.interactions
def get_latest_interaction(self) -> Dict[str, Any]:
"""
Get the latest interaction in the session.
Returns:
The latest interaction or None if there are no interactions
"""
if self.interactions:
return self.interactions[-1]
return None
def save_session(self, file_path: str) -> None:
"""
Save the session to a file.
Args:
file_path: Path to the file
"""
session_data = {
'session_id': self.session_id,
'interactions': self.interactions,
'created_at': self.interactions[0]['timestamp'] if self.interactions else datetime.now().isoformat(),
'last_updated': datetime.now().isoformat()
}
with open(file_path, 'w') as f:
json.dump(session_data, f, indent=2)
@classmethod
def load_session(cls, file_path: str) -> 'ResearchSession':
"""
Load a session from a file.
Args:
file_path: Path to the file
Returns:
A ResearchSession object
"""
with open(file_path, 'r') as f:
session_data = json.load(f)
session = cls(session_id=session_data.get('session_id'))
session.interactions = session_data.get('interactions', [])
if session.interactions:
session.current_query = session.interactions[-1]['query']
session.current_findings = session.interactions[-1]['findings']
return session
class RefinementManager:
def __init__(self, api_key=None):
"""
Initialize the refinement manager with an API key for the LLM service.
Args:
api_key: API key for the LLM service
"""
self.api_key = api_key or os.getenv("OPENAI_API_KEY")
openai.api_key = self.api_key
def process_refinement(self, session: ResearchSession, refined_query: str) -> Dict[str, Any]:
"""
Process a refined query based on the session history.
Args:
session: The research session
refined_query: The refined query from the user
Returns:
A dictionary with information for the refined search
"""
# Get the latest interaction
latest = session.get_latest_interaction()
if not latest:
# If there's no previous interaction, treat this as a new query
return {
'query': refined_query,
'search_queries': [refined_query],
'suggested_sources': ["Google Scholar", "PubMed", "arXiv"],
'is_refinement': False
}
# Use the LLM to analyze the refinement
refinement_info = self._analyze_refinement(latest['query'], refined_query, latest['findings'])
return {
'original_query': latest['query'],
'refined_query': refined_query,
'search_queries': refinement_info.get('search_queries', [refined_query]),
'suggested_sources': refinement_info.get('suggested_sources', []),
'focus_areas': refinement_info.get('focus_areas', []),
'exclude_areas': refinement_info.get('exclude_areas', []),
'is_refinement': True
}
def _analyze_refinement(self, original_query: str, refined_query: str,
previous_findings: Dict[str, Any]) -> Dict[str, Any]:
"""
Use an LLM to analyze how a query has been refined.
Args:
original_query: The original research query
refined_query: The refined query from the user
previous_findings: Findings from the previous search
Returns:
A dictionary with analysis of the refinement
"""
# Construct a prompt for the LLM
prompt = f"""
Original Research Question: "{original_query}"
Refined Research Question: "{refined_query}"
Previous Findings Summary:
{previous_findings.get('summary', 'No summary available')}
Please analyze how the research question has been refined and provide:
1. 3-5 specific search queries that would help address the refined question
2. Suggested sources that would be most relevant for this refinement
3. Specific areas or aspects to focus on
4. Areas or aspects from the original search that should be excluded
Format your response as a structured JSON object.
"""
try:
# Call the LLM API
response = openai.ChatCompletion.create(
model="gpt-4",
messages=[
{"role": "system", "content": "You are a research assistant helping to refine research queries."},
{"role": "user", "content": prompt}
],
response_format={"type": "json_object"}
)
# Parse the response
result = json.loads(response.choices[0].message.content)
return result
except Exception as e:
print(f"Error analyzing refinement with LLM: {e}")
# Return default values if analysis fails
return {
'search_queries': [refined_query],
'suggested_sources': ["Google Scholar", "PubMed", "arXiv"],
'focus_areas': [],
'exclude_areas': []
}
This code includes two classes: ResearchSession and RefinementManager. The ResearchSession class manages the history of interactions in a research session, including queries and findings. The RefinementManager class processes refined queries based on the session history, using an LLM to analyze how the query has been refined and to generate appropriate search queries and suggestions for the refined search.
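A usage sketch, assuming OPENAI_API_KEY is set and that `report` is the response dictionary from the ResponseGenerator above, might look like this:
# Usage sketch: record an interaction, persist the session, and analyze a follow-up query.
session = ResearchSession()
session.add_interaction("What are the main approaches to retrieval-augmented generation?", report)
session.save_session("rag_session.json")

refiner = RefinementManager()
plan = refiner.process_refinement(session, "How are retrieval-augmented systems evaluated in practice?")
print(plan["is_refinement"], plan.get("search_queries"))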
Technical Implementation
LLM Selection and Integration
The choice of LLM is a critical decision in building a research agent. There are two main categories to consider: commercial LLMs provided as services (like OpenAI's GPT-4, Anthropic's Claude, or Google's Gemini) and local LLMs that can be run on your own hardware (like Llama 2, Mistral, or Falcon).
Commercial LLMs typically offer better performance and are easier to integrate, but they come with costs, potential privacy concerns, and dependency on external services. Local LLMs provide more control, privacy, and potentially lower costs for high-volume usage, but they require more technical expertise to set up and may have lower performance, especially on consumer hardware.
For a research agent that needs to process and synthesize complex information, a more capable model is generally preferable. However, the choice depends on your specific requirements, budget, and technical constraints.
Here's an example of how we might implement a modular LLM interface that can work with different models:
import os
import json
from typing import List, Dict, Any, Union, Optional
import openai
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
class LLMInterface:
def __init__(self, model_type: str = 'openai', model_name: str = 'gpt-4',
api_key: str = None, device: str = None):
"""
Initialize the LLM interface.
Args:
model_type: Type of LLM ('openai', 'local', etc.)
model_name: Name of the model
api_key: API key for commercial LLMs
device: Device to run local models on ('cuda', 'mps', or 'cpu'; ROCm builds of PyTorch use 'cuda')
"""
self.model_type = model_type.lower()
self.model_name = model_name
# Set up based on model type
if self.model_type == 'openai':
self.api_key = api_key or os.getenv("OPENAI_API_KEY")
openai.api_key = self.api_key
self.model = None
self.tokenizer = None
elif self.model_type == 'local':
# Determine device for local models
if device:
    # Normalize: PyTorch addresses ROCm GPUs through the 'cuda' device string
    self.device = 'cuda' if device == 'rocm' else device
else:
if torch.cuda.is_available():
    # ROCm builds of PyTorch also report through torch.cuda,
    # so this branch covers both NVIDIA and AMD GPUs
    self.device = 'cuda'
elif torch.backends.mps.is_available():
    self.device = 'mps'
else:
    self.device = 'cpu'
print(f"Loading local model {model_name} on {self.device}...")
self.tokenizer = AutoTokenizer.from_pretrained(model_name)
self.model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype=torch.float16 if self.device != 'cpu' else torch.float32,
device_map=self.device
)
else:
raise ValueError(f"Unsupported model type: {model_type}")
def generate(self, prompt: str, system_message: str = None,
max_tokens: int = 1000, temperature: float = 0.7,
json_format: bool = False) -> Dict[str, Any]:
"""
Generate a response from the LLM.
Args:
prompt: The prompt to send to the LLM
system_message: System message for models that support it
max_tokens: Maximum number of tokens to generate
temperature: Temperature for generation
json_format: Whether to request JSON format output
Returns:
A dictionary with the generated text and metadata
"""
if self.model_type == 'openai':
return self._generate_openai(prompt, system_message, max_tokens, temperature, json_format)
elif self.model_type == 'local':
return self._generate_local(prompt, system_message, max_tokens, temperature)
else:
raise ValueError(f"Unsupported model type: {self.model_type}")
def _generate_openai(self, prompt: str, system_message: str = None,
max_tokens: int = 1000, temperature: float = 0.7,
json_format: bool = False) -> Dict[str, Any]:
"""Generate text using OpenAI API."""
messages = []
if system_message:
messages.append({"role": "system", "content": system_message})
messages.append({"role": "user", "content": prompt})
kwargs = {
"model": self.model_name,
"messages": messages,
"max_tokens": max_tokens,
"temperature": temperature
}
if json_format:
kwargs["response_format"] = {"type": "json_object"}
try:
response = openai.ChatCompletion.create(**kwargs)
return {
'text': response.choices[0].message.content,
'model': self.model_name,
'finish_reason': response.choices[0].finish_reason,
'usage': {
'prompt_tokens': response.usage.prompt_tokens,
'completion_tokens': response.usage.completion_tokens,
'total_tokens': response.usage.total_tokens
}
}
except Exception as e:
print(f"Error generating text with OpenAI: {e}")
return {
'text': f"Error generating response: {str(e)}",
'model': self.model_name,
'error': str(e)
}
def _generate_local(self, prompt: str, system_message: str = None,
max_tokens: int = 1000, temperature: float = 0.7) -> Dict[str, Any]:
"""Generate text using a local model."""
try:
# Combine system message and prompt if provided
full_prompt = ""
if system_message:
full_prompt = f"{system_message}\n\n"
full_prompt += prompt
# Tokenize the input
inputs = self.tokenizer(full_prompt, return_tensors="pt").to(self.device)
# Generate response
with torch.no_grad():
outputs = self.model.generate(
inputs.input_ids,
max_new_tokens=max_tokens,
temperature=temperature,
do_sample=temperature > 0,
pad_token_id=self.tokenizer.eos_token_id
)
# Decode the response
generated_text = self.tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
return {
'text': generated_text,
'model': self.model_name,
'finish_reason': 'stop',
'usage': {
'prompt_tokens': inputs.input_ids.shape[1],
'completion_tokens': outputs.shape[1] - inputs.input_ids.shape[1],
'total_tokens': outputs.shape[1]
}
}
except Exception as e:
print(f"Error generating text with local model: {e}")
return {
'text': f"Error generating response: {str(e)}",
'model': self.model_name,
'error': str(e)
}
This LLMInterface class provides a unified interface for interacting with different types of LLMs, including commercial APIs like OpenAI and local models using the Hugging Face Transformers library. It handles device selection for local models, supporting NVIDIA CUDA, AMD ROCm, and Apple MPS for GPU acceleration. The class includes methods for generating text with different parameters and formats, with appropriate error handling.
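A usage sketch might look like the following; the hosted call assumes OPENAI_API_KEY is set, and the local model name is just an example of a Hugging Face checkpoint you might choose.
# Usage sketch for LLMInterface; the same call shape works for both backends.
hosted = LLMInterface(model_type="openai", model_name="gpt-4")
result = hosted.generate(
    prompt="Summarize the key idea of retrieval-augmented generation in two sentences.",
    system_message="You are a concise research assistant.",
    max_tokens=120
)
print(result["text"])

# Example local model (any causal LM checkpoint from the Hugging Face Hub would work):
# local = LLMInterface(model_type="local", model_name="mistralai/Mistral-7B-Instruct-v0.2")
# print(local.generate("Explain dense passage retrieval briefly.")["text"])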
GPU Acceleration
GPU acceleration is essential for running large language models efficiently, especially when using local models. Different hardware platforms require different approaches to GPU acceleration:
1. NVIDIA CUDA: The most widely supported platform for deep learning, used with NVIDIA GPUs.
2. AMD ROCm: AMD's open-source platform for GPU computing, used with AMD GPUs.
3. Apple MPS (Metal Performance Shaders): Apple's framework for GPU computing on Mac devices with Apple Silicon or compatible AMD GPUs.
To support these different platforms, we need to configure our environment appropriately and ensure that our code can detect and use the available hardware.
Here's an example of how we might implement a utility class for managing GPU resources:
import os
import torch
import platform
import subprocess
from typing import Dict, Any, List, Optional
class GPUManager:
def __init__(self):
"""Initialize the GPU manager."""
self.device = self._detect_device()
self.device_info = self._get_device_info()
def _detect_device(self) -> str:
"""
Detect the available GPU device.
Returns:
Device string ('cuda', 'rocm', 'mps', or 'cpu')
"""
if torch.cuda.is_available():
    # ROCm builds of PyTorch report through torch.cuda and set torch.version.hip
    if getattr(torch.version, 'hip', None):
        return 'rocm'
    return 'cuda'
elif torch.backends.mps.is_available():
    return 'mps'
else:
    return 'cpu'
def _get_device_info(self) -> Dict[str, Any]:
"""
Get information about the detected device.
Returns:
A dictionary with device information
"""
info = {
'device_type': self.device,
'platform': platform.system()
}
if self.device == 'cuda':
info['device_count'] = torch.cuda.device_count()
info['current_device'] = torch.cuda.current_device()
info['device_name'] = torch.cuda.get_device_name(info['current_device'])
info['memory_allocated'] = torch.cuda.memory_allocated(info['current_device'])
info['memory_reserved'] = torch.cuda.memory_reserved(info['current_device'])
try:
# Try to get more detailed information using nvidia-smi
result = subprocess.run(['nvidia-smi', '--query-gpu=name,memory.total,memory.free,memory.used,temperature.gpu', '--format=csv,noheader'],
capture_output=True, text=True)
if result.returncode == 0:
parts = result.stdout.strip().split(', ')
if len(parts) >= 5:
info['detailed'] = {
'name': parts[0],
'memory_total': parts[1],
'memory_free': parts[2],
'memory_used': parts[3],
'temperature': parts[4]
}
except Exception:
pass
elif self.device == 'rocm':
info['device_count'] = torch.cuda.device_count() # ROCm uses CUDA API
info['current_device'] = torch.cuda.current_device()
info['device_name'] = torch.cuda.get_device_name(info['current_device'])
info['memory_allocated'] = torch.cuda.memory_allocated(info['current_device'])
info['memory_reserved'] = torch.cuda.memory_reserved(info['current_device'])
try:
# Try to get more detailed information using rocm-smi
result = subprocess.run(['rocm-smi', '--showmeminfo', 'vram'],
capture_output=True, text=True)
if result.returncode == 0:
info['detailed'] = {'rocm_smi_output': result.stdout.strip()}
except Exception:
pass
elif self.device == 'mps':
# Limited information available for MPS
info['device_name'] = 'Apple Silicon GPU'
else:
info['device_name'] = 'CPU'
info['cpu_count'] = os.cpu_count()
return info
def optimize_for_inference(self, model) -> Any:
"""
Optimize a model for inference on the detected device.
Args:
model: The model to optimize
Returns:
The optimized model
"""
if self.device == 'cuda':
# Convert to half precision for NVIDIA GPUs
model = model.half().to(self.device)
# Could add CUDA-specific optimizations here
elif self.device == 'rocm':
    # ROCm tensors still use the 'cuda' device string in PyTorch
    model = model.half().to('cuda')
elif self.device == 'mps':
# MPS optimizations
model = model.to(self.device)
else:
# CPU optimizations
model = model.float().to(self.device)
return model
def get_recommended_batch_size(self) -> int:
"""
Get a recommended batch size based on the device.
Returns:
Recommended batch size
"""
if self.device == 'cpu':
return 1
elif self.device == 'mps':
return 4 # Conservative default for Apple Silicon
else:
# For CUDA and ROCm, try to estimate based on available memory
try:
if self.device == 'cuda' or self.device == 'rocm':
free_memory = torch.cuda.get_device_properties(0).total_memory - torch.cuda.memory_allocated(0)
# Very rough estimate: assume roughly 1 GiB of GPU memory per batch item for a large model
batch_size = max(1, int(free_memory / (1024**3)))
return min(8, batch_size)  # Cap at 8 to be safe
except Exception:
pass
# Default if estimation fails
return 2
def print_device_info(self) -> None:
"""Print information about the detected device."""
print(f"Device: {self.device_info['device_type']}")
print(f"Device Name: {self.device_info['device_name']}")
if self.device == 'cuda' or self.device == 'rocm':
print(f"Device Count: {self.device_info['device_count']}")
print(f"Current Device: {self.device_info['current_device']}")
print(f"Memory Allocated: {self.device_info['memory_allocated'] / 1024**2:.2f} MB")
print(f"Memory Reserved: {self.device_info['memory_reserved'] / 1024**2:.2f} MB")
if 'detailed' in self.device_info:
print("\nDetailed Information:")
for key, value in self.device_info['detailed'].items():
print(f" {key}: {value}")
elif self.device == 'cpu':
print(f"CPU Count: {self.device_info['cpu_count']}")
This GPUManager class provides utilities for detecting and managing GPU resources. It can identify the type of GPU available (NVIDIA CUDA, AMD ROCm, or Apple MPS), gather information about the device, optimize models for inference on the detected device, and recommend appropriate batch sizes based on available memory. This helps ensure that the research agent can run efficiently on different hardware platforms.
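A usage sketch might look like this:
# Usage sketch for GPUManager: inspect the detected device and pick a batch size.
gpu = GPUManager()
gpu.print_device_info()
print("Recommended batch size:", gpu.get_recommended_batch_size())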
Tool Integration for Agentic Capabilities
An agentic AI system can use various tools to extend its capabilities beyond what the base LLM can do. For a research agent, these tools might include web browsers, search engines, PDF readers, citation managers, and other utilities that help with gathering and processing information.
To implement tool integration, we need to create a framework that allows the LLM to select and use appropriate tools based on the user's query and the current state of the research process. This involves defining a set of tools with clear interfaces, implementing a mechanism for the LLM to select tools, and handling the results of tool usage.
Here's an example of how we might implement a tool integration framework:
import json
import requests
import subprocess
import os
from typing import List, Dict, Any, Callable, Optional
from abc import ABC, abstractmethod
class Tool(ABC):
"""Abstract base class for tools."""
@property
@abstractmethod
def name(self) -> str:
"""Get the name of the tool."""
pass
@property
@abstractmethod
def description(self) -> str:
"""Get the description of the tool."""
pass
@abstractmethod
def run(self, input_data: Any) -> Dict[str, Any]:
"""
Run the tool with the given input.
Args:
input_data: Input data for the tool
Returns:
A dictionary with the results
"""
pass
class WebSearchTool(Tool):
"""Tool for searching the web."""
def __init__(self, api_key: str = None):
"""
Initialize the web search tool.
Args:
api_key: API key for the search service
"""
self.api_key = api_key or os.getenv("SERP_API_KEY")
@property
def name(self) -> str:
return "web_search"
@property
def description(self) -> str:
return "Search the web for information on a topic"
def run(self, input_data: str) -> Dict[str, Any]:
"""
Search the web for the given query.
Args:
input_data: Search query
Returns:
A dictionary with search results
"""
if not self.api_key:
return {"error": "No API key provided for web search"}
url = "https://serpapi.com/search"
params = {
"q": input_data,
"api_key": self.api_key,
"engine": "google"
}
try:
response = requests.get(url, params=params)
data = response.json()
results = []
for item in data.get('organic_results', [])[:5]:
results.append({
'title': item.get('title', ''),
'snippet': item.get('snippet', ''),
'link': item.get('link', '')
})
return {
"query": input_data,
"results": results
}
except Exception as e:
return {"error": f"Error searching the web: {str(e)}"}
class ArxivSearchTool(Tool):
"""Tool for searching arXiv."""
@property
def name(self) -> str:
return "arxiv_search"
@property
def description(self) -> str:
return "Search arXiv for academic papers on a topic"
def run(self, input_data: str) -> Dict[str, Any]:
"""
Search arXiv for the given query.
Args:
input_data: Search query
Returns:
A dictionary with search results
"""
try:
import arxiv
search = arxiv.Search(
query=input_data,
max_results=5,
sort_by=arxiv.SortCriterion.Relevance
)
results = []
for paper in search.results():
results.append({
'title': paper.title,
'authors': [author.name for author in paper.authors],  # arxiv returns Author objects; .name gives the string
'abstract': paper.summary,
'url': paper.pdf_url,
'published': paper.published.strftime('%Y-%m-%d')
})
return {
"query": input_data,
"results": results
}
except Exception as e:
return {"error": f"Error searching arXiv: {str(e)}"}
class PDFReaderTool(Tool):
"""Tool for reading PDF documents."""
@property
def name(self) -> str:
return "pdf_reader"
@property
def description(self) -> str:
return "Extract text from a PDF document"
def run(self, input_data: str) -> Dict[str, Any]:
"""
Extract text from a PDF document.
Args:
input_data: URL or path to the PDF document
Returns:
A dictionary with the extracted text
"""
try:
import PyPDF2
import io
# Check if input is a URL or a local path
if input_data.startswith(('http://', 'https://')):
# Download the PDF
response = requests.get(input_data)
pdf_file = io.BytesIO(response.content)
else:
# Open local file
pdf_file = open(input_data, 'rb')
# Extract text
reader = PyPDF2.PdfReader(pdf_file)
text = ""
for page_num in range(min(5, len(reader.pages))): # Limit to first 5 pages
page = reader.pages[page_num]
text += page.extract_text() + "\n\n"
# Close the file if it's a local file
if not input_data.startswith(('http://', 'https://')):
pdf_file.close()
return {
"source": input_data,
"text": text[:10000], # Limit text length
"pages_read": min(5, len(reader.pages)),
"total_pages": len(reader.pages)
}
except Exception as e:
return {"error": f"Error reading PDF: {str(e)}"}
class ToolManager:
"""Manager for tools used by the research agent."""
def __init__(self, llm_interface):
"""
Initialize the tool manager.
Args:
llm_interface: Interface to the LLM
"""
self.llm_interface = llm_interface
self.tools = {}
# Register default tools
self.register_tool(WebSearchTool())
self.register_tool(ArxivSearchTool())
self.register_tool(PDFReaderTool())
def register_tool(self, tool: Tool) -> None:
"""
Register a tool with the manager.
Args:
tool: The tool to register
"""
self.tools[tool.name] = tool
def get_tool(self, tool_name: str) -> Optional[Tool]:
"""
Get a tool by name.
Args:
tool_name: Name of the tool
Returns:
The tool or None if not found
"""
return self.tools.get(tool_name)
def list_tools(self) -> List[Dict[str, str]]:
"""
List all available tools.
Returns:
A list of dictionaries with tool information
"""
return [{"name": tool.name, "description": tool.description} for tool in self.tools.values()]
def select_tool(self, query: str) -> Dict[str, Any]:
"""
Use the LLM to select an appropriate tool for a query.
Args:
query: The user's query
Returns:
A dictionary with the selected tool and parameters
"""
tools_list = self.list_tools()
tools_json = json.dumps(tools_list)
prompt = f"""
I need to select the most appropriate tool to help with this research query:
"{query}"
Available tools:
{tools_json}
Please select the most appropriate tool and provide any parameters needed.
Format your response as a JSON object with 'tool_name' and 'parameters' fields.
"""
system_message = "You are a research assistant selecting tools to help with research queries."
response = self.llm_interface.generate(
prompt=prompt,
system_message=system_message,
json_format=True
)
try:
result = json.loads(response['text'])
return result
except json.JSONDecodeError:
# Fallback if the response is not valid JSON
return {
"tool_name": "web_search",
"parameters": query
}
def use_tool(self, tool_name: str, parameters: Any) -> Dict[str, Any]:
"""
Use a tool with the given parameters.
Args:
tool_name: Name of the tool to use
parameters: Parameters for the tool
Returns:
The results from the tool
"""
tool = self.get_tool(tool_name)
if not tool:
return {"error": f"Tool not found: {tool_name}"}
try:
return tool.run(parameters)
except Exception as e:
return {"error": f"Error using tool {tool_name}: {str(e)}"}
def process_with_tools(self, query: str, max_tools: int = 3) -> Dict[str, Any]:
"""
Process a query using appropriate tools.
Args:
query: The user's query
max_tools: Maximum number of tools to use
Returns:
A dictionary with the results
"""
results = []
for i in range(max_tools):
# Select a tool
tool_selection = self.select_tool(query)
tool_name = tool_selection.get('tool_name')
parameters = tool_selection.get('parameters')
# Use the tool
tool_result = self.use_tool(tool_name, parameters)
# Add to results
results.append({
"tool": tool_name,
"parameters": parameters,
"result": tool_result
})
# Check if we need to continue
if i < max_tools - 1:
# Ask the LLM if we need more information
prompt = f"""
I'm researching: "{query}"
So far, I've used these tools and found:
{json.dumps(results, indent=2)}
Do I need to use additional tools to answer the query effectively?
Respond with a JSON object with 'need_more_tools' (true/false) and 'next_tool_suggestion' fields.
"""
system_message = "You are a research assistant deciding if more information is needed."
response = self.llm_interface.generate(
prompt=prompt,
system_message=system_message,
json_format=True
)
try:
continuation = json.loads(response['text'])
if not continuation.get('need_more_tools', True):
break
except Exception:
# If parsing fails, continue with the loop
pass
return {
"query": query,
"tool_results": results
}
This code includes several classes for implementing tool integration in the research agent. The Tool abstract base class defines the interface for all tools, with concrete implementations for web search, arXiv search, and PDF reading. The ToolManager class manages the available tools, selects appropriate tools for a given query using the LLM, and processes queries using multiple tools in sequence. This framework allows the research agent to leverage external capabilities to gather and process information more effectively.
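As a usage sketch, the following registers a hypothetical extra tool alongside the defaults and runs a query through the selection loop; the Wikipedia summary endpoint is a public REST API, and `llm_interface` is assumed to be an LLMInterface instance from the previous section.
class WikipediaSummaryTool(Tool):
    """Hypothetical extra tool that fetches a short encyclopedic summary."""
    @property
    def name(self) -> str:
        return "wikipedia_summary"

    @property
    def description(self) -> str:
        return "Fetch a short encyclopedic summary of a topic from Wikipedia"

    def run(self, input_data: str) -> Dict[str, Any]:
        url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{input_data.replace(' ', '_')}"
        try:
            resp = requests.get(url, timeout=10)
            resp.raise_for_status()
            return {"query": input_data, "summary": resp.json().get("extract", "")}
        except Exception as e:
            return {"error": f"Error fetching Wikipedia summary: {e}"}

tool_manager = ToolManager(llm_interface)
tool_manager.register_tool(WikipediaSummaryTool())
output = tool_manager.process_with_tools("recent advances in retrieval-augmented generation")
print(json.dumps(output, indent=2))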
Putting It All Together: The Research Agent
Now that we've implemented all the core components of our research agent, we need to integrate them into a cohesive system. This involves creating a main class that orchestrates the various components and manages the overall research process.
Here's an example of how we might implement the main ResearchAgent class:
import os
import json
from typing import Dict, Any, List, Optional
from datetime import datetime
class ResearchAgent:
def __init__(self, config: Dict[str, Any] = None):
"""
Initialize the research agent with the specified configuration.
Args:
config: Configuration dictionary
"""
self.config = config or {}
# Set up LLM interface
model_type = self.config.get('model_type', 'openai')
model_name = self.config.get('model_name', 'gpt-4')
api_key = self.config.get('api_key')
# Initialize GPU manager
self.gpu_manager = GPUManager()
device = self.gpu_manager.device
print(f"Initializing Research Agent with {model_type} model {model_name} on {device}")
self.gpu_manager.print_device_info()
# Initialize LLM interface
self.llm_interface = LLMInterface(
model_type=model_type,
model_name=model_name,
api_key=api_key,
device=device
)
# Initialize components
self.query_processor = QueryProcessor(api_key=api_key)
self.tool_manager = ToolManager(self.llm_interface)
self.information_retriever = InformationRetriever(api_keys=self.config.get('api_keys', {}))
self.source_validator = SourceValidator(api_key=api_key)
self.reference_manager = ReferenceManager()
self.response_generator = ResponseGenerator(api_key=api_key)
self.refinement_manager = RefinementManager(api_key=api_key)
# Initialize session
self.session = ResearchSession()
def process_query(self, query: str) -> Dict[str, Any]:
"""
Process a research query.
Args:
query: The research query
Returns:
A dictionary with the research findings
"""
print(f"Processing query: {query}")
# Process the query
processed_query = self.query_processor.process_query(query)
print(f"Processed query: {json.dumps(processed_query, indent=2)}")
# Use tools to gather initial information
tool_results = self.tool_manager.process_with_tools(query)
print(f"Tool results gathered from {len(tool_results['tool_results'])} tools")
# Retrieve information from various sources
search_queries = processed_query.get('search_queries', [query])
suggested_sources = processed_query.get('suggested_sources', [])
sources_to_search = ['web', 'arxiv', 'google_scholar']
if suggested_sources:
# Map suggested sources to actual source names
source_mapping = {
'google scholar': 'google_scholar',
'arxiv': 'arxiv',
'web': 'web',
'internet': 'web',
'pubmed': 'pubmed'
}
sources_to_search = [source_mapping.get(s.lower(), s.lower()) for s in suggested_sources]
sources_to_search = [s for s in sources_to_search if s in ['web', 'arxiv', 'google_scholar', 'pubmed']]
retrieved_info = self.information_retriever.retrieve_information(search_queries, sources_to_search)
# Combine all sources
all_sources = []
# Add sources from tool results
for tool_result in tool_results['tool_results']:
if 'result' in tool_result and 'results' in tool_result['result']:
for result in tool_result['result']['results']:
all_sources.append({
'title': result.get('title', ''),
'authors': result.get('authors', []),
'abstract': result.get('abstract', result.get('snippet', '')),
'url': result.get('link', result.get('url', '')),
'published': result.get('published', ''),
'source_type': tool_result['tool']
})
# Add sources from information retrieval
for source_type, sources in retrieved_info.items():
all_sources.extend(sources)
print(f"Total sources gathered: {len(all_sources)}")
# Validate sources
validated_sources = self.source_validator.validate_sources(all_sources, query)
print(f"Sources validated: {len(validated_sources)}")
# Generate response
reference_style = self.config.get('reference_style', 'apa')
response = self.response_generator.generate_response(query, validated_sources, reference_style)
# Add the interaction to the session
self.session.add_interaction(query, response)
return response
def refine_query(self, refined_query: str) -> Dict[str, Any]:
"""
Process a refined query based on previous research.
Args:
refined_query: The refined research query
Returns:
A dictionary with the research findings
"""
# Derive refinement context from the previous session
# (refinement_info is not used further in this simplified flow)
refinement_info = self.refinement_manager.process_refinement(self.session, refined_query)
# Process the refined query as a fresh research cycle
return self.process_query(refined_query)
def save_session(self, file_path: str) -> None:
"""
Save the current session to a file.
Args:
file_path: Path to save the session
"""
self.session.save_session(file_path)
def load_session(self, file_path: str) -> None:
"""
Load a session from a file.
Args:
file_path: Path to the session file
"""
self.session = ResearchSession.load_session(file_path)
This ResearchAgent class integrates all the components we've developed into a cohesive system. It initializes the necessary components, processes research queries by coordinating the various components, and manages research sessions. The process_query method orchestrates the entire research process, from processing the query to generating the final response. The refine_query method handles query refinement based on previous research.
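One component used above but not shown in this section is ResearchSession. A minimal sketch that supports the attributes and methods ResearchAgent relies on (session_id, interactions, current_query, add_interaction, save_session, and load_session) might look like the following; the exact fields stored per interaction are an assumption.
import json
import uuid
from datetime import datetime
from typing import Any, Dict, List

class ResearchSession:
    """Tracks the queries and responses exchanged in one research session."""

    def __init__(self):
        self.session_id = uuid.uuid4().hex
        self.created_at = datetime.now().isoformat()
        self.interactions: List[Dict[str, Any]] = []
        self.current_query: str = ""

    def add_interaction(self, query: str, response: Dict[str, Any]) -> None:
        self.current_query = query
        self.interactions.append({
            "timestamp": datetime.now().isoformat(),
            "query": query,
            "response": response,
        })

    def save_session(self, file_path: str) -> None:
        with open(file_path, "w") as f:
            json.dump(self.__dict__, f, indent=2)

    @classmethod
    def load_session(cls, file_path: str) -> "ResearchSession":
        with open(file_path, "r") as f:
            data = json.load(f)
        session = cls()
        session.__dict__.update(data)
        return session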
Running the Research Agent
To run the research agent, we need to create a script that initializes the agent and provides an interface for users to interact with it. Here's an example of how we might implement a simple command-line interface:
import argparse
import json
import os
from research_agent import ResearchAgent
def main():
# Parse command-line arguments
parser = argparse.ArgumentParser(description="LLM-based Research Agent")
parser.add_argument("--config", type=str, default="config.json", help="Path to configuration file")
parser.add_argument("--model", type=str, help="Model type (openai, local)")
parser.add_argument("--model_name", type=str, help="Model name")
parser.add_argument("--api_key", type=str, help="API key for commercial LLMs")
parser.add_argument("--session", type=str, help="Path to session file to load")
args = parser.parse_args()
# Load configuration
config = {}
if os.path.exists(args.config):
with open(args.config, 'r') as f:
config = json.load(f)
# Override config with command-line arguments
if args.model:
config['model_type'] = args.model
if args.model_name:
config['model_name'] = args.model_name
if args.api_key:
config['api_key'] = args.api_key
# Initialize the research agent
agent = ResearchAgent(config)
# Load session if specified
if args.session and os.path.exists(args.session):
agent.load_session(args.session)
print(f"Loaded session from {args.session}")
# Start the CLI
print("Welcome to the Research Agent CLI")
print("Enter your research question or type 'exit' to quit")
print("Type 'refine' to refine your previous query")
print("Type 'save <filename>' to save the current session")
while True:
user_input = input("\nResearch Question: ")
if user_input.lower() == 'exit':
print("Thank you for using the Research Agent. Goodbye!")
break
elif user_input.lower() == 'refine':
if not agent.session.interactions:
print("No previous query to refine. Please enter a research question first.")
continue
print(f"Previous query: {agent.session.current_query}")
refined_query = input("Refined Query: ")
response = agent.refine_query(refined_query)
print("\n" + "="*80)
print("RESEARCH FINDINGS:")
print(response['findings'])
print("\nSOURCES:")
for i, source in enumerate(response['sources'], 1):
print(f"{source['id']} {source['reference']}")
print("="*80)
elif user_input.lower().startswith('save '):
filename = user_input[5:].strip()
if not filename:
filename = f"session_{agent.session.session_id}.json"
agent.save_session(filename)
print(f"Session saved to {filename}")
else:
response = agent.process_query(user_input)
print("\n" + "="*80)
print("RESEARCH FINDINGS:")
print(response['findings'])
print("\nSOURCES:")
for i, source in enumerate(response['sources'], 1):
print(f"{source['id']} {source['reference']}")
print("="*80)
if __name__ == "__main__":
main()
This script provides a simple command-line interface for interacting with the research agent. It parses command-line arguments, loads configuration from a file, initializes the research agent, and provides commands for conducting research, refining queries, and saving sessions.
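The file passed via --config is plain JSON. A hypothetical config.json covering the keys the agent reads above (model_type, model_name, api_key, api_keys, reference_style) might look like this; the key names match the code, but all values, and the specific entries under api_keys, are placeholders.
{
  "model_type": "openai",
  "model_name": "gpt-4",
  "api_key": "YOUR_LLM_API_KEY",
  "api_keys": {
    "web_search": "YOUR_SEARCH_API_KEY"
  },
  "reference_style": "apa"
}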
For a web-based interface, we would need to create a web server using a framework like Flask or FastAPI, and a frontend using HTML, CSS, and JavaScript. Here's a simplified example of how we might implement a Flask-based web server:
from flask import Flask, request, jsonify, render_template
import json
import os
from research_agent import ResearchAgent
app = Flask(__name__)
# Initialize the research agent
config_path = os.environ.get('CONFIG_PATH', 'config.json')
config = {}
if os.path.exists(config_path):
with open(config_path, 'r') as f:
config = json.load(f)
research_agent = ResearchAgent(config)
@app.route('/')
def index():
return render_template('index.html')
@app.route('/api/research', methods=['POST'])
def conduct_research():
data = request.json
query = data.get('query', '')
if not query:
return jsonify({'error': 'No query provided'}), 400
response = research_agent.process_query(query)
return jsonify(response)
@app.route('/api/refine', methods=['POST'])
def refine_research():
data = request.json
refined_query = data.get('refined_query', '')
if not refined_query:
return jsonify({'error': 'No refined query provided'}), 400
response = research_agent.refine_query(refined_query)
return jsonify(response)
@app.route('/api/save_session', methods=['POST'])
def save_session():
data = request.json
filename = data.get('filename', f"session_{research_agent.session.session_id}.json")
research_agent.save_session(filename)
return jsonify({'message': f'Session saved to {filename}'})
@app.route('/api/load_session', methods=['POST'])
def load_session():
data = request.json
filename = data.get('filename', '')
if not filename or not os.path.exists(filename):
return jsonify({'error': 'Invalid or non-existent session file'}), 400
research_agent.load_session(filename)
return jsonify({'message': f'Session loaded from {filename}'})
if __name__ == '__main__':
app.run(debug=True)
This Flask application provides API endpoints for conducting research, refining queries, and saving/loading sessions. It would need to be complemented with HTML, CSS, and JavaScript files to create a complete web interface.
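Before building a frontend, the endpoints can be exercised directly. The following sketch uses the requests library against a locally running development server; the default Flask address is assumed, and the example queries and filename are placeholders.
import requests

BASE_URL = "http://127.0.0.1:5000"  # default Flask development server address

# Submit a research query
resp = requests.post(f"{BASE_URL}/api/research",
                     json={"query": "What are recent advances in retrieval-augmented generation?"})
result = resp.json()
print(result.get("findings", result))

# Refine the query based on the first round of findings
resp = requests.post(f"{BASE_URL}/api/refine",
                     json={"refined_query": "Focus on retrieval-augmented generation for scientific literature"})
print(resp.json().get("findings"))

# Persist the session on the server
requests.post(f"{BASE_URL}/api/save_session", json={"filename": "rag_session.json"})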
Evaluation and Optimization
To ensure that our research agent performs effectively, we need to evaluate its output and optimize its components. This involves assessing the quality of the retrieved information, the relevance and credibility of the sources, the accuracy of the synthesized findings, and the overall user experience.
There are several approaches to evaluation and optimization:
1. User feedback: Collect feedback from users on the quality and usefulness of the research findings. This can be done through explicit feedback mechanisms (like ratings or comments) or implicit feedback (like user engagement metrics).
2. Expert evaluation: Have domain experts evaluate the quality of the research findings and the accuracy of the information provided. This can help identify areas where the agent is performing well or poorly.
3. Automated metrics: Develop automated metrics to evaluate different aspects of the agent's performance, such as the relevance of retrieved sources, the diversity of information, and the coherence of the synthesized findings.
4. A/B testing: Compare different configurations or components of the agent to identify which ones perform better. This can help optimize the system over time.
Here's an example of how we might implement a simple evaluation framework:
import json
from typing import Dict, Any, List
from datetime import datetime
class EvaluationMetrics:
"""Class for calculating evaluation metrics for the research agent."""
@staticmethod
def calculate_metrics(query: str, response: Dict[str, Any], feedback: Dict[str, Any] = None) -> Dict[str, Any]:
"""
Calculate evaluation metrics for a research response.
Args:
query: The research query
response: The response from the research agent
feedback: Optional user feedback
Returns:
A dictionary with evaluation metrics
"""
metrics = {}
# Source metrics
sources = response.get('sources', [])
metrics['num_sources'] = len(sources)
if sources:
relevance_scores = [s.get('relevance_score', 0) for s in sources if 'relevance_score' in s]
credibility_scores = [s.get('credibility_score', 0) for s in sources if 'credibility_score' in s]
if relevance_scores:
metrics['avg_relevance_score'] = sum(relevance_scores) / len(relevance_scores)
metrics['max_relevance_score'] = max(relevance_scores)
metrics['min_relevance_score'] = min(relevance_scores)
if credibility_scores:
metrics['avg_credibility_score'] = sum(credibility_scores) / len(credibility_scores)
metrics['max_credibility_score'] = max(credibility_scores)
metrics['min_credibility_score'] = min(credibility_scores)
# Source diversity
source_types = [s.get('source_type', 'unknown') for s in sources]
unique_source_types = set(source_types)
metrics['source_diversity'] = len(unique_source_types)
metrics['source_type_distribution'] = {t: source_types.count(t) for t in unique_source_types}
# Response metrics
findings = response.get('findings', '')
metrics['response_length'] = len(findings)
metrics['response_word_count'] = len(findings.split())
# Citation metrics
citation_count = findings.count('[') # Simple approximation
metrics['citation_count'] = citation_count
metrics['citation_density'] = citation_count / metrics['response_word_count'] if metrics['response_word_count'] > 0 else 0
# User feedback metrics
if feedback:
metrics['user_feedback'] = feedback
return metrics
class EvaluationManager:
"""Manager for evaluating the research agent."""
def __init__(self, log_file: str = None):
"""
Initialize the evaluation manager.
Args:
log_file: Path to the evaluation log file
"""
self.log_file = log_file or f"evaluation_log_{datetime.now().strftime('%Y%m%d%H%M%S')}.jsonl"
def log_evaluation(self, query: str, response: Dict[str, Any], feedback: Dict[str, Any] = None) -> Dict[str, Any]:
"""
Log an evaluation entry.
Args:
query: The research query
response: The response from the research agent
feedback: Optional user feedback
Returns:
The evaluation metrics
"""
# Calculate metrics
metrics = EvaluationMetrics.calculate_metrics(query, response, feedback)
# Create log entry
log_entry = {
'timestamp': datetime.now().isoformat(),
'query': query,
'metrics': metrics
}
# Write to log file
with open(self.log_file, 'a') as f:
f.write(json.dumps(log_entry) + '\n')
return metrics
def analyze_logs(self, log_file: str = None) -> Dict[str, Any]:
"""
Analyze evaluation logs.
Args:
log_file: Path to the log file to analyze (defaults to the instance's log file)
Returns:
A dictionary with analysis results
"""
file_to_analyze = log_file or self.log_file
entries = []
with open(file_to_analyze, 'r') as f:
for line in f:
entries.append(json.loads(line))
if not entries:
return {'error': 'No log entries found'}
# Aggregate metrics
aggregated = {
'num_entries': len(entries),
'avg_num_sources': sum(e['metrics'].get('num_sources', 0) for e in entries) / len(entries),
'avg_response_word_count': sum(e['metrics'].get('response_word_count', 0) for e in entries) / len(entries),
'avg_citation_density': sum(e['metrics'].get('citation_density', 0) for e in entries) / len(entries)
}
# User feedback if available
feedback_entries = [e for e in entries if 'user_feedback' in e['metrics']]
if feedback_entries:
aggregated['avg_user_rating'] = sum(e['metrics']['user_feedback'].get('rating', 0) for e in feedback_entries) / len(feedback_entries)
return aggregated
This code includes two classes for evaluating the research agent: EvaluationMetrics and EvaluationManager. The EvaluationMetrics class calculates various metrics for a research response, including source metrics, response metrics, citation metrics, and user feedback metrics. The EvaluationManager class logs evaluation entries and provides methods for analyzing the logs.
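To put the framework to work, we can wrap the agent's process_query calls and log each response. The snippet below is a sketch: it assumes the ResearchAgent and EvaluationManager classes defined above are available in scope, that the API key value is a placeholder, and that user feedback is a simple dictionary with a 'rating' field (the field analyze_logs averages).
import json

agent = ResearchAgent(config={"model_type": "openai", "model_name": "gpt-4", "api_key": "YOUR_LLM_API_KEY"})
evaluator = EvaluationManager(log_file="evaluation_log.jsonl")

queries = [
    "What are the main approaches to defending against prompt injection?",
    "How does retrieval-augmented generation reduce hallucinations?",
]

for query in queries:
    response = agent.process_query(query)
    # Hypothetical user feedback collected after showing the findings
    feedback = {"rating": 4, "comment": "Relevant sources, slightly verbose."}
    metrics = evaluator.log_evaluation(query, response, feedback)
    print(f"{query[:40]}... -> {metrics['num_sources']} sources, "
          f"citation density {metrics['citation_density']:.3f}")

# Aggregate the log into summary statistics
summary = evaluator.analyze_logs()
print(json.dumps(summary, indent=2))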
Conclusion
In this article, we've explored how to build an LLM-based research agent that can help users find, evaluate, and synthesize information from various sources. We've covered the key components of such a system, including the user interface, query processing, information retrieval, source validation, response generation, and iterative refinement. We've also discussed technical implementation details, such as LLM selection, GPU acceleration, and tool integration.
The research agent we've designed is capable of understanding natural language queries, searching for relevant information from multiple sources, evaluating the credibility and relevance of sources, synthesizing findings into coherent responses, and supporting iterative refinement of research questions. It leverages the power of large language models to understand and generate text, while also integrating with external tools to extend its capabilities.
Building an effective research agent requires careful attention to various aspects, including the quality and relevance of information retrieved, the credibility of sources, the coherence and accuracy of synthesized findings, and the overall user experience. By evaluating and optimizing these aspects, we can create a research agent that provides valuable assistance to users in their research endeavors.
As large language models and related technologies continue to evolve, the capabilities of research agents will likely improve, enabling more sophisticated and effective research assistance. Future directions might include more advanced source evaluation, better handling of conflicting information, more sophisticated tool integration, and improved personalization based on user preferences and research history.