Monday, June 15, 2026

BUILDING AN LLM-BASED INTELLIGENT RESEARCH AGENT FOR ACADEMIC AND SCIENTIFIC LITERATURE DISCOVERY



INTRODUCTION AND MOTIVATION

The exponential growth of scientific literature, academic papers, books, and online articles has created an information overload problem for researchers, students, and professionals. When someone wants to learn about a specific subject area such as quantum computing, generative artificial intelligence, machine learning, or even specialized topics like rose cultivation or Python programming, they face the challenge of sifting through thousands of potentially relevant resources to find the most authoritative and useful materials.

An LLM-based intelligent agent can solve this problem by automating the discovery, evaluation, and recommendation of relevant learning resources. Such an agent combines the natural language understanding capabilities of large language models with web search functionality, content analysis, and intelligent filtering to provide curated recommendations tailored to the user's specified subject area.

This article presents a comprehensive guide to building such an agent from the ground up. We will explore the architectural components, implementation details, and practical considerations necessary to create a production-ready system that can serve real users with diverse research needs.

ARCHITECTURAL OVERVIEW AND SYSTEM DESIGN

The intelligent research agent consists of several interconnected components that work together to transform a user's subject query into a curated list of recommended resources. Understanding the architecture is crucial before diving into implementation details.

The core components include a user interface layer that accepts subject queries, an LLM orchestration layer that manages the language model interactions, a web search integration layer that retrieves candidate resources from the internet, a content analysis and ranking system that evaluates the quality and relevance of discovered resources, and a presentation layer that formats and delivers recommendations to the user.

The system follows a pipeline architecture where each stage processes and enriches the data before passing it to the next stage. This design promotes modularity, testability, and maintainability while allowing for parallel processing where appropriate.

A critical design decision involves supporting both local and remote LLM deployments. Local deployment offers privacy, cost control, and reduced latency but requires significant computational resources. Remote deployment through API services provides easier setup and automatic scaling but incurs ongoing costs and potential privacy concerns. Our architecture accommodates both approaches through abstraction layers.

Supporting multiple GPU architectures including NVIDIA CUDA, AMD ROCm, Intel GPUs, and Apple Metal Performance Shaders ensures the system can run on diverse hardware configurations. This cross-platform compatibility is achieved through careful selection of underlying libraries and conditional code paths.

LLM INTEGRATION AND ABSTRACTION LAYER

The foundation of our agent is the ability to interact with large language models in a flexible, hardware-agnostic manner. We create an abstraction layer that provides a unified interface regardless of whether the LLM runs locally or remotely, and regardless of the underlying GPU architecture.

The abstraction layer defines a common interface that all LLM providers must implement. This interface includes methods for generating text completions, streaming responses, and managing conversation context. By programming against this interface rather than specific implementations, we can swap LLM backends without modifying the higher-level agent logic.

For local LLM deployment, we leverage libraries that support multiple GPU backends. The llama-cpp-python library provides bindings to llama.cpp, which has been optimized for various hardware architectures. It automatically detects available GPU acceleration and uses the appropriate backend whether that is CUDA for NVIDIA, ROCm for AMD, Metal for Apple Silicon, or Vulkan for Intel GPUs.

Here is a foundational code example showing the LLM abstraction interface:

from abc import ABC, abstractmethod
from typing import List, Dict, Optional, Iterator
from dataclasses import dataclass

@dataclass
class Message:
    """Represents a single message in a conversation."""
    role: str  # 'system', 'user', or 'assistant'
    content: str

class LLMProvider(ABC):
    """Abstract base class for all LLM providers."""
    
    @abstractmethod
    def generate(self, messages: List[Message], max_tokens: int = 2048, 
                temperature: float = 0.7) -> str:
        """
        Generate a completion given a list of messages.
        
        Args:
            messages: Conversation history as a list of Message objects
            max_tokens: Maximum number of tokens to generate
            temperature: Sampling temperature for randomness control
            
        Returns:
            Generated text as a string
        """
        pass
    
    @abstractmethod
    def stream_generate(self, messages: List[Message], max_tokens: int = 2048,
                       temperature: float = 0.7) -> Iterator[str]:
        """
        Generate a completion with streaming output.
        
        Args:
            messages: Conversation history as a list of Message objects
            max_tokens: Maximum number of tokens to generate
            temperature: Sampling temperature for randomness control
            
        Yields:
            Text chunks as they are generated
        """
        pass
    
    @abstractmethod
    def get_model_info(self) -> Dict[str, str]:
        """
        Retrieve information about the loaded model.
        
        Returns:
            Dictionary containing model name, version, and capabilities
        """
        pass

This interface provides the contract that all concrete LLM implementations must fulfill. The Message dataclass encapsulates the structure of conversation turns, making it easy to build multi-turn dialogues. The generate method handles synchronous completion generation, while stream_generate enables real-time streaming of responses for better user experience with long outputs.

Now we implement a concrete provider for local LLM execution using llama-cpp-python:

from llama_cpp import Llama
import os
from typing import List, Iterator

class LocalLLMProvider(LLMProvider):
    """LLM provider for locally-hosted models using llama.cpp."""
    
    def __init__(self, model_path: str, n_gpu_layers: int = -1, 
                 n_ctx: int = 4096, verbose: bool = False):
        """
        Initialize the local LLM provider.
        
        Args:
            model_path: Path to the GGUF model file
            n_gpu_layers: Number of layers to offload to GPU (-1 for all)
            n_ctx: Context window size in tokens
            verbose: Whether to print detailed loading information
        """
        if not os.path.exists(model_path):
            raise FileNotFoundError(f"Model file not found: {model_path}")
        
        # Initialize llama.cpp with automatic GPU detection
        # It will use CUDA, ROCm, Metal, or Vulkan depending on availability
        self.llm = Llama(
            model_path=model_path,
            n_gpu_layers=n_gpu_layers,
            n_ctx=n_ctx,
            verbose=verbose,
            n_threads=os.cpu_count() or 4
        )
        
        self.model_path = model_path
        self.context_size = n_ctx
    
    def generate(self, messages: List[Message], max_tokens: int = 2048,
                temperature: float = 0.7) -> str:
        """Generate a completion from the local model."""
        # Convert Message objects to llama.cpp format
        formatted_messages = [
            {"role": msg.role, "content": msg.content}
            for msg in messages
        ]
        
        # Generate completion
        response = self.llm.create_chat_completion(
            messages=formatted_messages,
            max_tokens=max_tokens,
            temperature=temperature,
            stream=False
        )
        
        # Extract the generated text
        return response['choices'][0]['message']['content']
    
    def stream_generate(self, messages: List[Message], max_tokens: int = 2048,
                       temperature: float = 0.7) -> Iterator[str]:
        """Generate a streaming completion from the local model."""
        formatted_messages = [
            {"role": msg.role, "content": msg.content}
            for msg in messages
        ]
        
        # Create streaming completion
        stream = self.llm.create_chat_completion(
            messages=formatted_messages,
            max_tokens=max_tokens,
            temperature=temperature,
            stream=True
        )
        
        # Yield chunks as they arrive
        for chunk in stream:
            delta = chunk['choices'][0]['delta']
            if 'content' in delta:
                yield delta['content']
    
    def get_model_info(self) -> Dict[str, str]:
        """Return information about the loaded model."""
        return {
            'provider': 'local_llama_cpp',
            'model_path': self.model_path,
            'context_size': str(self.context_size),
            'gpu_layers': 'auto-detected'
        }

The LocalLLMProvider implementation handles the complexity of GPU detection and model loading. The llama-cpp-python library automatically detects available GPU acceleration and configures itself accordingly. When n_gpu_layers is set to negative one, all compatible layers are offloaded to the GPU for maximum performance. The library's build system includes support for CUDA, ROCm, Metal, and Vulkan, so the same code works across different hardware platforms.

For remote LLM access, we implement a provider that communicates with API services such as OpenAI, Anthropic, or other compatible endpoints:

import requests
import json
from typing import List, Iterator, Optional

class RemoteLLMProvider(LLMProvider):
    """LLM provider for remote API-based models."""
    
    def __init__(self, api_key: str, api_base: str = "https://api.openai.com/v1",
                 model_name: str = "gpt-4", timeout: int = 120):
        """
        Initialize the remote LLM provider.
        
        Args:
            api_key: API authentication key
            api_base: Base URL for the API endpoint
            model_name: Name of the model to use
            timeout: Request timeout in seconds
        """
        self.api_key = api_key
        self.api_base = api_base.rstrip('/')
        self.model_name = model_name
        self.timeout = timeout
        self.session = requests.Session()
        self.session.headers.update({
            'Authorization': f'Bearer {api_key}',
            'Content-Type': 'application/json'
        })
    
    def generate(self, messages: List[Message], max_tokens: int = 2048,
                temperature: float = 0.7) -> str:
        """Generate a completion from the remote API."""
        # Prepare the request payload
        payload = {
            'model': self.model_name,
            'messages': [
                {'role': msg.role, 'content': msg.content}
                for msg in messages
            ],
            'max_tokens': max_tokens,
            'temperature': temperature,
            'stream': False
        }
        
        # Make the API request
        response = self.session.post(
            f'{self.api_base}/chat/completions',
            json=payload,
            timeout=self.timeout
        )
        
        # Handle errors
        if response.status_code != 200:
            raise Exception(f"API request failed: {response.status_code} - {response.text}")
        
        # Parse and return the response
        result = response.json()
        return result['choices'][0]['message']['content']
    
    def stream_generate(self, messages: List[Message], max_tokens: int = 2048,
                       temperature: float = 0.7) -> Iterator[str]:
        """Generate a streaming completion from the remote API."""
        payload = {
            'model': self.model_name,
            'messages': [
                {'role': msg.role, 'content': msg.content}
                for msg in messages
            ],
            'max_tokens': max_tokens,
            'temperature': temperature,
            'stream': True
        }
        
        # Make streaming request
        response = self.session.post(
            f'{self.api_base}/chat/completions',
            json=payload,
            timeout=self.timeout,
            stream=True
        )
        
        if response.status_code != 200:
            raise Exception(f"API request failed: {response.status_code}")
        
        # Process the streaming response
        for line in response.iter_lines():
            if not line:
                continue
            
            line_text = line.decode('utf-8')
            if not line_text.startswith('data: '):
                continue
            
            data_str = line_text[6:]  # Remove 'data: ' prefix
            if data_str.strip() == '[DONE]':
                break
            
            try:
                data = json.loads(data_str)
                delta = data['choices'][0]['delta']
                if 'content' in delta:
                    yield delta['content']
            except json.JSONDecodeError:
                continue
    
    def get_model_info(self) -> Dict[str, str]:
        """Return information about the remote model."""
        return {
            'provider': 'remote_api',
            'model_name': self.model_name,
            'api_base': self.api_base
        }

The RemoteLLMProvider handles the intricacies of HTTP communication with remote API endpoints. It manages authentication, request formatting, error handling, and streaming response parsing. The implementation uses a persistent session to reuse connections and improve performance across multiple requests.

With both local and remote providers implemented, we create a factory function that instantiates the appropriate provider based on configuration:

from typing import Union
from enum import Enum

class LLMBackend(Enum):
    """Enumeration of supported LLM backends."""
    LOCAL = "local"
    REMOTE = "remote"

def create_llm_provider(backend: Union[LLMBackend, str], 
                       **kwargs) -> LLMProvider:
    """
    Factory function to create the appropriate LLM provider.
    
    Args:
        backend: The backend type (LOCAL or REMOTE)
        **kwargs: Backend-specific configuration parameters
        
    Returns:
        Configured LLMProvider instance
        
    Example for local:
        provider = create_llm_provider(
            LLMBackend.LOCAL,
            model_path="/path/to/model.gguf",
            n_gpu_layers=-1
        )
        
    Example for remote:
        provider = create_llm_provider(
            LLMBackend.REMOTE,
            api_key="sk-...",
            model_name="gpt-4"
        )
    """
    if isinstance(backend, str):
        backend = LLMBackend(backend.lower())
    
    if backend == LLMBackend.LOCAL:
        required_params = ['model_path']
        for param in required_params:
            if param not in kwargs:
                raise ValueError(f"Missing required parameter for local backend: {param}")
        return LocalLLMProvider(**kwargs)
    
    elif backend == LLMBackend.REMOTE:
        required_params = ['api_key']
        for param in required_params:
            if param not in kwargs:
                raise ValueError(f"Missing required parameter for remote backend: {param}")
        return RemoteLLMProvider(**kwargs)
    
    else:
        raise ValueError(f"Unsupported backend: {backend}")

This factory pattern provides a clean interface for creating LLM providers without exposing the implementation details to the rest of the application. The caller simply specifies the desired backend type and provides the necessary configuration parameters.

WEB SEARCH INTEGRATION AND CONTENT RETRIEVAL

The next critical component is the ability to search the internet for relevant resources. We need to query search engines, retrieve results, and extract useful information from web pages. This involves integrating with search APIs and implementing robust web scraping capabilities.

For search functionality, we can use several approaches. The most straightforward is to integrate with established search APIs such as Google Custom Search, Bing Search API, or DuckDuckGo. Each has different pricing models, rate limits, and result quality characteristics. For production systems, using multiple search providers with fallback logic improves reliability.

Here is an implementation of a search abstraction layer with support for multiple providers:

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional
import requests
from urllib.parse import quote_plus

@dataclass
class SearchResult:
    """Represents a single search result."""
    title: str
    url: str
    snippet: str
    source: str  # The search provider that returned this result

class SearchProvider(ABC):
    """Abstract base class for search providers."""
    
    @abstractmethod
    def search(self, query: str, num_results: int = 10) -> List[SearchResult]:
        """
        Execute a search query and return results.
        
        Args:
            query: The search query string
            num_results: Maximum number of results to return
            
        Returns:
            List of SearchResult objects
        """
        pass

class DuckDuckGoSearchProvider(SearchProvider):
    """Search provider using DuckDuckGo's HTML interface."""
    
    def __init__(self, timeout: int = 30):
        """
        Initialize the DuckDuckGo search provider.
        
        Args:
            timeout: Request timeout in seconds
        """
        self.timeout = timeout
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
        })
    
    def search(self, query: str, num_results: int = 10) -> List[SearchResult]:
        """Execute a DuckDuckGo search."""
        from bs4 import BeautifulSoup
        
        # DuckDuckGo HTML search endpoint
        url = f"https://html.duckduckgo.com/html/?q={quote_plus(query)}"
        
        try:
            response = self.session.get(url, timeout=self.timeout)
            response.raise_for_status()
        except requests.RequestException as e:
            raise Exception(f"DuckDuckGo search failed: {str(e)}")
        
        # Parse the HTML response
        soup = BeautifulSoup(response.text, 'html.parser')
        results = []
        
        # Find all result divs
        result_divs = soup.find_all('div', class_='result')
        
        for div in result_divs[:num_results]:
            # Extract title and URL
            title_elem = div.find('a', class_='result__a')
            if not title_elem:
                continue
            
            title = title_elem.get_text(strip=True)
            url = title_elem.get('href', '')
            
            # Extract snippet
            snippet_elem = div.find('a', class_='result__snippet')
            snippet = snippet_elem.get_text(strip=True) if snippet_elem else ''
            
            if url and title:
                results.append(SearchResult(
                    title=title,
                    url=url,
                    snippet=snippet,
                    source='duckduckgo'
                ))
        
        return results

class BingSearchProvider(SearchProvider):
    """Search provider using Bing Search API."""
    
    def __init__(self, api_key: str, timeout: int = 30):
        """
        Initialize the Bing search provider.
        
        Args:
            api_key: Bing Search API key
            timeout: Request timeout in seconds
        """
        self.api_key = api_key
        self.timeout = timeout
        self.endpoint = "https://api.bing.microsoft.com/v7.0/search"
        self.session = requests.Session()
        self.session.headers.update({
            'Ocp-Apim-Subscription-Key': api_key
        })
    
    def search(self, query: str, num_results: int = 10) -> List[SearchResult]:
        """Execute a Bing search."""
        params = {
            'q': query,
            'count': num_results,
            'textDecorations': False,
            'textFormat': 'Raw'
        }
        
        try:
            response = self.session.get(
                self.endpoint,
                params=params,
                timeout=self.timeout
            )
            response.raise_for_status()
        except requests.RequestException as e:
            raise Exception(f"Bing search failed: {str(e)}")
        
        data = response.json()
        results = []
        
        # Parse web pages results
        if 'webPages' in data and 'value' in data['webPages']:
            for item in data['webPages']['value']:
                results.append(SearchResult(
                    title=item.get('name', ''),
                    url=item.get('url', ''),
                    snippet=item.get('snippet', ''),
                    source='bing'
                ))
        
        return results

The search abstraction provides a unified interface for querying different search engines. The DuckDuckGoSearchProvider uses web scraping since DuckDuckGo offers a simple HTML interface that does not require API keys. The BingSearchProvider uses the official Bing Search API which requires a subscription key but provides more reliable and structured results.

To enhance the search capabilities specifically for academic and scientific content, we implement a specialized provider that queries academic databases:

class ScholarSearchProvider(SearchProvider):
    """Search provider for academic papers and scholarly articles."""
    
    def __init__(self, timeout: int = 30):
        """
        Initialize the scholar search provider.
        
        Args:
            timeout: Request timeout in seconds
        """
        self.timeout = timeout
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
        })
    
    def search(self, query: str, num_results: int = 10) -> List[SearchResult]:
        """
        Search for scholarly articles using Google Scholar.
        
        Note: This is a simplified implementation. Production systems should
        use official APIs or services like Semantic Scholar API, arXiv API, etc.
        """
        from bs4 import BeautifulSoup
        
        # Google Scholar search URL
        url = f"https://scholar.google.com/scholar?q={quote_plus(query)}&hl=en&num={num_results}"
        
        try:
            response = self.session.get(url, timeout=self.timeout)
            response.raise_for_status()
        except requests.RequestException as e:
            raise Exception(f"Scholar search failed: {str(e)}")
        
        soup = BeautifulSoup(response.text, 'html.parser')
        results = []
        
        # Find all result divs
        result_divs = soup.find_all('div', class_='gs_ri')
        
        for div in result_divs[:num_results]:
            # Extract title
            title_elem = div.find('h3', class_='gs_rt')
            if not title_elem:
                continue
            
            # Remove citation markers
            for cite in title_elem.find_all('span', class_='gs_ct1'):
                cite.decompose()
            for cite in title_elem.find_all('span', class_='gs_ct2'):
                cite.decompose()
            
            title_link = title_elem.find('a')
            title = title_link.get_text(strip=True) if title_link else title_elem.get_text(strip=True)
            url = title_link.get('href', '') if title_link else ''
            
            # Extract snippet
            snippet_elem = div.find('div', class_='gs_rs')
            snippet = snippet_elem.get_text(strip=True) if snippet_elem else ''
            
            if title:
                results.append(SearchResult(
                    title=title,
                    url=url,
                    snippet=snippet,
                    source='google_scholar'
                ))
        
        return results

The ScholarSearchProvider specifically targets academic content by querying Google Scholar. This is particularly valuable when users are researching scientific or technical topics where peer-reviewed papers and academic publications are the most authoritative sources.

Now we implement a multi-provider search aggregator that queries multiple search engines and combines their results:

from typing import Set
from concurrent.futures import ThreadPoolExecutor, as_completed

class AggregatedSearchProvider(SearchProvider):
    """Aggregates results from multiple search providers."""
    
    def __init__(self, providers: List[SearchProvider], max_workers: int = 3):
        """
        Initialize the aggregated search provider.
        
        Args:
            providers: List of SearchProvider instances to query
            max_workers: Maximum number of concurrent search requests
        """
        self.providers = providers
        self.max_workers = max_workers
    
    def search(self, query: str, num_results: int = 10) -> List[SearchResult]:
        """
        Execute searches across all providers and aggregate results.
        
        Results are deduplicated by URL and ranked by frequency across providers.
        """
        all_results = []
        seen_urls: Set[str] = set()
        
        # Execute searches in parallel
        with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
            # Submit search tasks
            future_to_provider = {
                executor.submit(provider.search, query, num_results): provider
                for provider in self.providers
            }
            
            # Collect results as they complete
            for future in as_completed(future_to_provider):
                provider = future_to_provider[future]
                try:
                    results = future.result()
                    
                    # Add unique results
                    for result in results:
                        if result.url and result.url not in seen_urls:
                            seen_urls.add(result.url)
                            all_results.append(result)
                
                except Exception as e:
                    # Log the error but continue with other providers
                    print(f"Search provider {provider.__class__.__name__} failed: {str(e)}")
                    continue
        
        # Return top results
        return all_results[:num_results]

The AggregatedSearchProvider executes searches across multiple providers concurrently using a thread pool. This parallelization significantly reduces total search time compared to sequential execution. The results are deduplicated by URL to avoid showing the same resource multiple times.

With search capabilities in place, we need to retrieve and extract content from the discovered web pages. This involves fetching HTML content, parsing it, and extracting the main textual content while filtering out navigation, advertisements, and other non-essential elements:

from bs4 import BeautifulSoup
import requests
from typing import Optional, Dict
from urllib.parse import urlparse

class ContentExtractor:
    """Extracts main content from web pages."""
    
    def __init__(self, timeout: int = 30, max_content_length: int = 50000):
        """
        Initialize the content extractor.
        
        Args:
            timeout: Request timeout in seconds
            max_content_length: Maximum content length to extract in characters
        """
        self.timeout = timeout
        self.max_content_length = max_content_length
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
        })
    
    def extract(self, url: str) -> Optional[Dict[str, str]]:
        """
        Extract main content from a URL.
        
        Args:
            url: The URL to extract content from
            
        Returns:
            Dictionary with 'title', 'content', and 'url' keys, or None if extraction fails
        """
        try:
            response = self.session.get(url, timeout=self.timeout)
            response.raise_for_status()
        except requests.RequestException as e:
            print(f"Failed to fetch {url}: {str(e)}")
            return None
        
        # Parse HTML
        soup = BeautifulSoup(response.text, 'html.parser')
        
        # Remove script and style elements
        for element in soup(['script', 'style', 'nav', 'footer', 'header', 'aside']):
            element.decompose()
        
        # Extract title
        title = ''
        title_tag = soup.find('title')
        if title_tag:
            title = title_tag.get_text(strip=True)
        
        # Try to find main content area
        main_content = None
        
        # Look for common main content containers
        for selector in ['main', 'article', '[role="main"]', '.content', '#content']:
            main_content = soup.select_one(selector)
            if main_content:
                break
        
        # If no main content found, use body
        if not main_content:
            main_content = soup.find('body')
        
        if not main_content:
            return None
        
        # Extract text content
        text = main_content.get_text(separator=' ', strip=True)
        
        # Clean up whitespace
        text = ' '.join(text.split())
        
        # Truncate if too long
        if len(text) > self.max_content_length:
            text = text[:self.max_content_length] + '...'
        
        return {
            'title': title,
            'content': text,
            'url': url
        }
    
    def extract_metadata(self, url: str) -> Dict[str, str]:
        """
        Extract metadata from a web page.
        
        Args:
            url: The URL to extract metadata from
            
        Returns:
            Dictionary containing available metadata fields
        """
        try:
            response = self.session.get(url, timeout=self.timeout)
            response.raise_for_status()
        except requests.RequestException:
            return {}
        
        soup = BeautifulSoup(response.text, 'html.parser')
        metadata = {}
        
        # Extract Open Graph metadata
        for meta in soup.find_all('meta', property=True):
            prop = meta.get('property', '')
            if prop.startswith('og:'):
                key = prop[3:]  # Remove 'og:' prefix
                metadata[key] = meta.get('content', '')
        
        # Extract standard meta tags
        for meta in soup.find_all('meta', attrs={'name': True}):
            name = meta.get('name', '')
            if name in ['description', 'author', 'keywords', 'published_time']:
                metadata[name] = meta.get('content', '')
        
        return metadata

The ContentExtractor class handles the complexity of retrieving web pages and extracting their main content. It removes common non-content elements like scripts, styles, and navigation components. The extraction logic looks for semantic HTML5 elements like main and article tags, which typically contain the primary content. The extract_metadata method retrieves additional information from meta tags, which can be useful for determining the type and quality of the resource.

CONTENT ANALYSIS AND RELEVANCE RANKING

Once we have retrieved search results and their content, we need to analyze and rank them based on relevance to the user's query and overall quality. This is where the LLM's natural language understanding capabilities become crucial.

The analysis process involves several steps. First, we use the LLM to assess how well each resource matches the user's specified subject area. Second, we evaluate the quality and authority of the source. Third, we categorize the resource type such as academic paper, tutorial, book, or general article. Finally, we combine these factors into an overall relevance score.

Here is an implementation of the content analyzer:

from typing import List, Dict, Tuple
from dataclasses import dataclass, field
from enum import Enum

class ResourceType(Enum):
    """Types of learning resources."""
    ACADEMIC_PAPER = "academic_paper"
    BOOK = "book"
    TUTORIAL = "tutorial"
    ARTICLE = "article"
    DOCUMENTATION = "documentation"
    VIDEO = "video"
    COURSE = "course"
    UNKNOWN = "unknown"

@dataclass
class AnalyzedResource:
    """Represents an analyzed and scored resource."""
    title: str
    url: str
    snippet: str
    resource_type: ResourceType
    relevance_score: float  # 0.0 to 1.0
    quality_score: float    # 0.0 to 1.0
    reasoning: str
    metadata: Dict[str, str] = field(default_factory=dict)
    
    @property
    def overall_score(self) -> float:
        """Calculate overall score as weighted combination."""
        return 0.6 * self.relevance_score + 0.4 * self.quality_score

class ContentAnalyzer:
    """Analyzes and ranks content using LLM capabilities."""
    
    def __init__(self, llm_provider: LLMProvider):
        """
        Initialize the content analyzer.
        
        Args:
            llm_provider: The LLM provider to use for analysis
        """
        self.llm = llm_provider
    
    def analyze_resource(self, search_result: SearchResult, 
                        content: Optional[str], subject: str) -> AnalyzedResource:
        """
        Analyze a single resource for relevance and quality.
        
        Args:
            search_result: The search result to analyze
            content: Extracted content from the URL (if available)
            subject: The subject area the user is researching
            
        Returns:
            AnalyzedResource with scores and classification
        """
        # Build analysis prompt
        analysis_text = f"Title: {search_result.title}\n"
        analysis_text += f"URL: {search_result.url}\n"
        analysis_text += f"Snippet: {search_result.snippet}\n"
        
        if content:
            # Use first 2000 characters of content for analysis
            content_preview = content[:2000]
            analysis_text += f"Content preview: {content_preview}\n"
        
        prompt = self._build_analysis_prompt(analysis_text, subject)
        
        # Get LLM analysis
        messages = [
            Message(role='system', content='You are an expert research assistant that evaluates the relevance and quality of learning resources.'),
            Message(role='user', content=prompt)
        ]
        
        try:
            response = self.llm.generate(messages, max_tokens=500, temperature=0.3)
            
            # Parse the structured response
            scores = self._parse_analysis_response(response)
            
            return AnalyzedResource(
                title=search_result.title,
                url=search_result.url,
                snippet=search_result.snippet,
                resource_type=scores['resource_type'],
                relevance_score=scores['relevance_score'],
                quality_score=scores['quality_score'],
                reasoning=scores['reasoning']
            )
        
        except Exception as e:
            # If analysis fails, return with default scores
            print(f"Analysis failed for {search_result.url}: {str(e)}")
            return AnalyzedResource(
                title=search_result.title,
                url=search_result.url,
                snippet=search_result.snippet,
                resource_type=ResourceType.UNKNOWN,
                relevance_score=0.5,
                quality_score=0.5,
                reasoning="Analysis failed; using default scores"
            )
    
    def _build_analysis_prompt(self, resource_info: str, subject: str) -> str:
        """Build the prompt for resource analysis."""
        prompt = f"""Analyze the following resource for a user researching "{subject}".

{resource_info}

Provide your analysis in the following structured format:

RESOURCE_TYPE: [one of: academic_paper, book, tutorial, article, documentation, video, course, unknown]
RELEVANCE_SCORE: [0.0 to 1.0, where 1.0 means highly relevant to the subject]
QUALITY_SCORE: [0.0 to 1.0, where 1.0 means high quality and authoritative]
REASONING: [brief explanation of your scores]

Consider these factors:
- How well does the resource match the subject area?
- Is it from an authoritative source?
- Is it comprehensive and well-structured?
- Is it suitable for learning about the subject?
"""
        return prompt
    
    def _parse_analysis_response(self, response: str) -> Dict:
        """Parse the structured analysis response from the LLM."""
        lines = response.strip().split('\n')
        result = {
            'resource_type': ResourceType.UNKNOWN,
            'relevance_score': 0.5,
            'quality_score': 0.5,
            'reasoning': ''
        }
        
        for line in lines:
            line = line.strip()
            
            if line.startswith('RESOURCE_TYPE:'):
                type_str = line.split(':', 1)[1].strip().lower()
                try:
                    result['resource_type'] = ResourceType(type_str)
                except ValueError:
                    result['resource_type'] = ResourceType.UNKNOWN
            
            elif line.startswith('RELEVANCE_SCORE:'):
                try:
                    score = float(line.split(':', 1)[1].strip())
                    result['relevance_score'] = max(0.0, min(1.0, score))
                except ValueError:
                    pass
            
            elif line.startswith('QUALITY_SCORE:'):
                try:
                    score = float(line.split(':', 1)[1].strip())
                    result['quality_score'] = max(0.0, min(1.0, score))
                except ValueError:
                    pass
            
            elif line.startswith('REASONING:'):
                result['reasoning'] = line.split(':', 1)[1].strip()
        
        return result
    
    def batch_analyze(self, search_results: List[SearchResult], 
                     subject: str, extractor: ContentExtractor,
                     max_workers: int = 5) -> List[AnalyzedResource]:
        """
        Analyze multiple resources in parallel.
        
        Args:
            search_results: List of search results to analyze
            subject: The subject area being researched
            extractor: ContentExtractor instance for fetching page content
            max_workers: Maximum number of parallel analysis tasks
            
        Returns:
            List of AnalyzedResource objects sorted by overall score
        """
        from concurrent.futures import ThreadPoolExecutor, as_completed
        
        analyzed_resources = []
        
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            # Submit analysis tasks
            future_to_result = {}
            
            for result in search_results:
                # Extract content in parallel
                future = executor.submit(self._analyze_with_content, 
                                       result, subject, extractor)
                future_to_result[future] = result
            
            # Collect results
            for future in as_completed(future_to_result):
                try:
                    analyzed = future.result()
                    analyzed_resources.append(analyzed)
                except Exception as e:
                    result = future_to_result[future]
                    print(f"Failed to analyze {result.url}: {str(e)}")
        
        # Sort by overall score
        analyzed_resources.sort(key=lambda x: x.overall_score, reverse=True)
        
        return analyzed_resources
    
    def _analyze_with_content(self, search_result: SearchResult, 
                             subject: str, extractor: ContentExtractor) -> AnalyzedResource:
        """Helper method to extract content and analyze."""
        # Try to extract content
        extracted = extractor.extract(search_result.url)
        content = extracted['content'] if extracted else None
        
        # Analyze the resource
        return self.analyze_resource(search_result, content, subject)

The ContentAnalyzer leverages the LLM to perform sophisticated analysis of each resource. The analysis prompt asks the LLM to evaluate multiple dimensions including relevance to the subject, quality and authority of the source, and resource type classification. The structured output format makes it easy to parse the LLM's response and extract numerical scores.

The batch_analyze method processes multiple resources in parallel, which is crucial for performance when analyzing dozens of search results. Each resource is analyzed independently, allowing for efficient parallelization.

RECOMMENDATION GENERATION AND PRESENTATION

With analyzed and ranked resources, the final step is to generate comprehensive recommendations that help the user understand why each resource was selected and how it relates to their research topic. This involves synthesizing the analysis results into a coherent narrative.

Here is the implementation of the recommendation generator:

from typing import List, Dict

class RecommendationGenerator:
    """Generates structured recommendations from analyzed resources."""
    
    def __init__(self, llm_provider: LLMProvider):
        """
        Initialize the recommendation generator.
        
        Args:
            llm_provider: The LLM provider for generating descriptions
        """
        self.llm = llm_provider
    
    def generate_recommendations(self, analyzed_resources: List[AnalyzedResource],
                                subject: str, max_recommendations: int = 10) -> str:
        """
        Generate a comprehensive recommendation report.
        
        Args:
            analyzed_resources: List of analyzed resources sorted by score
            subject: The subject area being researched
            max_recommendations: Maximum number of resources to include
            
        Returns:
            Formatted recommendation text
        """
        # Take top resources
        top_resources = analyzed_resources[:max_recommendations]
        
        # Group by resource type
        grouped = self._group_by_type(top_resources)
        
        # Generate introduction
        intro = self._generate_introduction(subject, len(top_resources))
        
        # Generate sections for each resource type
        sections = []
        
        type_order = [
            ResourceType.ACADEMIC_PAPER,
            ResourceType.BOOK,
            ResourceType.COURSE,
            ResourceType.TUTORIAL,
            ResourceType.DOCUMENTATION,
            ResourceType.ARTICLE,
            ResourceType.VIDEO,
            ResourceType.UNKNOWN
        ]
        
        for resource_type in type_order:
            if resource_type in grouped and grouped[resource_type]:
                section = self._generate_type_section(
                    resource_type, 
                    grouped[resource_type],
                    subject
                )
                sections.append(section)
        
        # Combine all parts
        report = intro + '\n\n'
        report += '\n\n'.join(sections)
        report += '\n\n' + self._generate_conclusion(subject)
        
        return report
    
    def _group_by_type(self, resources: List[AnalyzedResource]) -> Dict[ResourceType, List[AnalyzedResource]]:
        """Group resources by their type."""
        grouped = {}
        for resource in resources:
            if resource.resource_type not in grouped:
                grouped[resource.resource_type] = []
            grouped[resource.resource_type].append(resource)
        return grouped
    
    def _generate_introduction(self, subject: str, num_resources: int) -> str:
        """Generate an introduction for the recommendations."""
        prompt = f"""Write a brief introduction (2-3 sentences) for a curated list of {num_resources} learning resources about "{subject}". 
The introduction should welcome the user and explain that these resources have been carefully selected and analyzed for relevance and quality."""
        
        messages = [
            Message(role='system', content='You are a helpful research assistant.'),
            Message(role='user', content=prompt)
        ]
        
        return self.llm.generate(messages, max_tokens=200, temperature=0.7)
    
    def _generate_type_section(self, resource_type: ResourceType, 
                              resources: List[AnalyzedResource],
                              subject: str) -> str:
        """Generate a section for a specific resource type."""
        # Section header
        type_names = {
            ResourceType.ACADEMIC_PAPER: 'Academic Papers and Research',
            ResourceType.BOOK: 'Books',
            ResourceType.COURSE: 'Online Courses',
            ResourceType.TUTORIAL: 'Tutorials and Guides',
            ResourceType.DOCUMENTATION: 'Documentation',
            ResourceType.ARTICLE: 'Articles and Blog Posts',
            ResourceType.VIDEO: 'Video Resources',
            ResourceType.UNKNOWN: 'Additional Resources'
        }
        
        section = f"=== {type_names.get(resource_type, 'Resources')} ===\n\n"
        
        # Add each resource
        for i, resource in enumerate(resources, 1):
            section += f"{i}. {resource.title}\n"
            section += f"   URL: {resource.url}\n"
            section += f"   Relevance: {resource.relevance_score:.2f} | Quality: {resource.quality_score:.2f}\n"
            section += f"   {resource.reasoning}\n\n"
        
        return section
    
    def _generate_conclusion(self, subject: str) -> str:
        """Generate a conclusion for the recommendations."""
        prompt = f"""Write a brief conclusion (2-3 sentences) for a curated list of learning resources about "{subject}".
Encourage the user to explore these resources and mention that they can request more specific recommendations if needed."""
        
        messages = [
            Message(role='system', content='You are a helpful research assistant.'),
            Message(role='user', content=prompt)
        ]
        
        return self.llm.generate(messages, max_tokens=200, temperature=0.7)

The RecommendationGenerator creates a well-structured report that organizes resources by type and provides clear explanations for each recommendation. The grouping by resource type helps users quickly find the kind of material they prefer, whether that is academic papers for deep technical understanding or tutorials for hands-on learning.

ORCHESTRATING THE COMPLETE AGENT WORKFLOW

Now we bring all the components together into a cohesive agent that manages the entire workflow from receiving a user query to delivering recommendations:

from typing import Optional, List
import logging

class ResearchAgent:
    """Main agent that orchestrates the research and recommendation process."""
    
    def __init__(self, llm_provider: LLMProvider, 
                 search_provider: SearchProvider,
                 content_extractor: ContentExtractor,
                 content_analyzer: ContentAnalyzer,
                 recommendation_generator: RecommendationGenerator,
                 logger: Optional[logging.Logger] = None):
        """
        Initialize the research agent.
        
        Args:
            llm_provider: LLM provider for language understanding
            search_provider: Search provider for finding resources
            content_extractor: Extractor for retrieving page content
            content_analyzer: Analyzer for evaluating resources
            recommendation_generator: Generator for creating recommendations
            logger: Optional logger for tracking operations
        """
        self.llm = llm_provider
        self.search = search_provider
        self.extractor = content_extractor
        self.analyzer = content_analyzer
        self.recommender = recommendation_generator
        self.logger = logger or logging.getLogger(__name__)
    
    def research(self, subject: str, num_results: int = 20,
                max_recommendations: int = 10) -> str:
        """
        Execute the complete research workflow.
        
        Args:
            subject: The subject area to research
            num_results: Number of search results to retrieve
            max_recommendations: Maximum recommendations to generate
            
        Returns:
            Formatted recommendation report
        """
        self.logger.info(f"Starting research for subject: {subject}")
        
        # Step 1: Enhance the search query using LLM
        enhanced_query = self._enhance_query(subject)
        self.logger.info(f"Enhanced query: {enhanced_query}")
        
        # Step 2: Search for resources
        self.logger.info(f"Searching for {num_results} resources...")
        search_results = self.search.search(enhanced_query, num_results)
        self.logger.info(f"Found {len(search_results)} search results")
        
        if not search_results:
            return f"No resources found for subject: {subject}"
        
        # Step 3: Analyze and rank resources
        self.logger.info("Analyzing resources...")
        analyzed_resources = self.analyzer.batch_analyze(
            search_results, 
            subject, 
            self.extractor
        )
        self.logger.info(f"Analyzed {len(analyzed_resources)} resources")
        
        # Step 4: Generate recommendations
        self.logger.info("Generating recommendations...")
        recommendations = self.recommender.generate_recommendations(
            analyzed_resources,
            subject,
            max_recommendations
        )
        
        self.logger.info("Research complete")
        return recommendations
    
    def _enhance_query(self, subject: str) -> str:
        """
        Use LLM to enhance the search query for better results.
        
        Args:
            subject: The original subject from the user
            
        Returns:
            Enhanced search query string
        """
        prompt = f"""Given the subject "{subject}", generate an optimized search query that will find high-quality learning resources including books, academic papers, tutorials, and articles.

The query should:
- Include relevant technical terms and synonyms
- Be concise but comprehensive
- Focus on authoritative and educational content

Provide only the search query, nothing else."""
        
        messages = [
            Message(role='system', content='You are an expert at formulating effective search queries.'),
            Message(role='user', content=prompt)
        ]
        
        enhanced = self.llm.generate(messages, max_tokens=100, temperature=0.5)
        return enhanced.strip()
    
    def interactive_research(self):
        """
        Run an interactive research session where the user can make multiple queries.
        """
        print("Research Agent - Interactive Mode")
        print("=" * 50)
        print("Enter a subject area to research, or 'quit' to exit.\n")
        
        while True:
            subject = input("Subject: ").strip()
            
            if subject.lower() in ['quit', 'exit', 'q']:
                print("Goodbye!")
                break
            
            if not subject:
                print("Please enter a valid subject.\n")
                continue
            
            try:
                print("\nResearching... This may take a minute.\n")
                recommendations = self.research(subject)
                print(recommendations)
                print("\n" + "=" * 50 + "\n")
            
            except Exception as e:
                print(f"Error during research: {str(e)}\n")
                self.logger.error(f"Research failed: {str(e)}", exc_info=True)

The ResearchAgent class serves as the main entry point for the system. It coordinates all the components and manages the workflow from query to recommendations. The research method implements the complete pipeline: query enhancement, search execution, content analysis, and recommendation generation. The interactive_research method provides a simple command-line interface for users to make multiple queries in a session.

CONFIGURATION MANAGEMENT AND INITIALIZATION

A production system needs robust configuration management to handle different deployment scenarios, API keys, model paths, and other settings. Here is a configuration system that supports multiple environments:

from dataclasses import dataclass
from typing import Optional
import os
import json

@dataclass
class LLMConfig:
    """Configuration for LLM provider."""
    backend: str  # 'local' or 'remote'
    model_path: Optional[str] = None  # For local models
    api_key: Optional[str] = None  # For remote APIs
    api_base: Optional[str] = None
    model_name: Optional[str] = None
    n_gpu_layers: int = -1
    n_ctx: int = 4096

@dataclass
class SearchConfig:
    """Configuration for search providers."""
    providers: List[str]  # e.g., ['duckduckgo', 'bing']
    bing_api_key: Optional[str] = None
    timeout: int = 30

@dataclass
class AgentConfig:
    """Main configuration for the research agent."""
    llm: LLMConfig
    search: SearchConfig
    max_search_results: int = 20
    max_recommendations: int = 10
    content_timeout: int = 30
    max_content_length: int = 50000
    log_level: str = 'INFO'
    
    @classmethod
    def from_file(cls, config_path: str) -> 'AgentConfig':
        """Load configuration from a JSON file."""
        with open(config_path, 'r') as f:
            data = json.load(f)
        
        llm_config = LLMConfig(**data['llm'])
        search_config = SearchConfig(**data['search'])
        
        return cls(
            llm=llm_config,
            search=search_config,
            max_search_results=data.get('max_search_results', 20),
            max_recommendations=data.get('max_recommendations', 10),
            content_timeout=data.get('content_timeout', 30),
            max_content_length=data.get('max_content_length', 50000),
            log_level=data.get('log_level', 'INFO')
        )
    
    @classmethod
    def from_env(cls) -> 'AgentConfig':
        """Load configuration from environment variables."""
        llm_backend = os.getenv('LLM_BACKEND', 'local')
        
        llm_config = LLMConfig(
            backend=llm_backend,
            model_path=os.getenv('LLM_MODEL_PATH'),
            api_key=os.getenv('LLM_API_KEY'),
            api_base=os.getenv('LLM_API_BASE'),
            model_name=os.getenv('LLM_MODEL_NAME'),
            n_gpu_layers=int(os.getenv('LLM_GPU_LAYERS', '-1')),
            n_ctx=int(os.getenv('LLM_CONTEXT_SIZE', '4096'))
        )
        
        search_providers = os.getenv('SEARCH_PROVIDERS', 'duckduckgo').split(',')
        search_config = SearchConfig(
            providers=search_providers,
            bing_api_key=os.getenv('BING_API_KEY'),
            timeout=int(os.getenv('SEARCH_TIMEOUT', '30'))
        )
        
        return cls(
            llm=llm_config,
            search=search_config,
            max_search_results=int(os.getenv('MAX_SEARCH_RESULTS', '20')),
            max_recommendations=int(os.getenv('MAX_RECOMMENDATIONS', '10')),
            content_timeout=int(os.getenv('CONTENT_TIMEOUT', '30')),
            log_level=os.getenv('LOG_LEVEL', 'INFO')
        )

def create_agent_from_config(config: AgentConfig) -> ResearchAgent:
    """
    Factory function to create a fully configured ResearchAgent.
    
    Args:
        config: AgentConfig instance with all settings
        
    Returns:
        Configured ResearchAgent ready to use
    """
    # Setup logging
    logging.basicConfig(
        level=getattr(logging, config.log_level),
        format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
    )
    logger = logging.getLogger('ResearchAgent')
    
    # Create LLM provider
    if config.llm.backend == 'local':
        llm_provider = create_llm_provider(
            LLMBackend.LOCAL,
            model_path=config.llm.model_path,
            n_gpu_layers=config.llm.n_gpu_layers,
            n_ctx=config.llm.n_ctx
        )
    else:
        llm_provider = create_llm_provider(
            LLMBackend.REMOTE,
            api_key=config.llm.api_key,
            api_base=config.llm.api_base,
            model_name=config.llm.model_name
        )
    
    # Create search providers
    search_providers = []
    for provider_name in config.search.providers:
        if provider_name.lower() == 'duckduckgo':
            search_providers.append(DuckDuckGoSearchProvider(timeout=config.search.timeout))
        elif provider_name.lower() == 'bing' and config.search.bing_api_key:
            search_providers.append(BingSearchProvider(
                api_key=config.search.bing_api_key,
                timeout=config.search.timeout
            ))
        elif provider_name.lower() == 'scholar':
            search_providers.append(ScholarSearchProvider(timeout=config.search.timeout))
    
    # Create aggregated search provider
    search_provider = AggregatedSearchProvider(search_providers)
    
    # Create content extractor
    content_extractor = ContentExtractor(
        timeout=config.content_timeout,
        max_content_length=config.max_content_length
    )
    
    # Create content analyzer
    content_analyzer = ContentAnalyzer(llm_provider)
    
    # Create recommendation generator
    recommendation_generator = RecommendationGenerator(llm_provider)
    
    # Create and return the agent
    return ResearchAgent(
        llm_provider=llm_provider,
        search_provider=search_provider,
        content_extractor=content_extractor,
        content_analyzer=content_analyzer,
        recommendation_generator=recommendation_generator,
        logger=logger
    )

The configuration system supports both file-based and environment variable-based configuration, making it flexible for different deployment scenarios. The create_agent_from_config factory function handles all the complexity of instantiating and wiring together the various components based on the configuration.

ERROR HANDLING AND RESILIENCE

Production systems must handle errors gracefully and continue operating even when individual components fail. We implement comprehensive error handling and retry logic:

from functools import wraps
import time
from typing import Callable, Any

def retry_on_failure(max_attempts: int = 3, delay: float = 1.0, 
                    backoff: float = 2.0):
    """
    Decorator that retries a function on failure with exponential backoff.
    
    Args:
        max_attempts: Maximum number of retry attempts
        delay: Initial delay between retries in seconds
        backoff: Multiplier for delay after each attempt
    """
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args, **kwargs) -> Any:
            current_delay = delay
            last_exception = None
            
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    last_exception = e
                    if attempt < max_attempts - 1:
                        time.sleep(current_delay)
                        current_delay *= backoff
                    else:
                        raise last_exception
            
            raise last_exception
        
        return wrapper
    return decorator

class ResilientSearchProvider(SearchProvider):
    """Search provider wrapper with error handling and fallback."""
    
    def __init__(self, primary: SearchProvider, 
                 fallback: Optional[SearchProvider] = None):
        """
        Initialize resilient search provider.
        
        Args:
            primary: Primary search provider to use
            fallback: Optional fallback provider if primary fails
        """
        self.primary = primary
        self.fallback = fallback
    
    @retry_on_failure(max_attempts=2, delay=1.0)
    def search(self, query: str, num_results: int = 10) -> List[SearchResult]:
        """Search with automatic fallback on failure."""
        try:
            return self.primary.search(query, num_results)
        except Exception as e:
            if self.fallback:
                print(f"Primary search failed, using fallback: {str(e)}")
                return self.fallback.search(query, num_results)
            else:
                raise

The retry decorator implements exponential backoff for transient failures, which is particularly important for network operations. The ResilientSearchProvider wraps search providers with fallback logic, ensuring that the system can continue operating even if one search provider fails.

PERFORMANCE OPTIMIZATION AND CACHING

For production use, we need to optimize performance and reduce redundant operations. Implementing caching for search results and content extraction significantly improves response times for repeated queries:

from functools import lru_cache
import hashlib
import pickle
import os
from typing import Optional

class CachedContentExtractor(ContentExtractor):
    """Content extractor with disk-based caching."""
    
    def __init__(self, cache_dir: str = '.cache', **kwargs):
        """
        Initialize cached content extractor.
        
        Args:
            cache_dir: Directory to store cached content
            **kwargs: Arguments passed to ContentExtractor
        """
        super().__init__(**kwargs)
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)
    
    def _get_cache_path(self, url: str) -> str:
        """Generate cache file path for a URL."""
        url_hash = hashlib.md5(url.encode()).hexdigest()
        return os.path.join(self.cache_dir, f"{url_hash}.pkl")
    
    def extract(self, url: str) -> Optional[Dict[str, str]]:
        """Extract content with caching."""
        cache_path = self._get_cache_path(url)
        
        # Check cache
        if os.path.exists(cache_path):
            try:
                with open(cache_path, 'rb') as f:
                    return pickle.load(f)
            except Exception:
                pass  # Cache read failed, fetch fresh
        
        # Fetch fresh content
        content = super().extract(url)
        
        # Cache the result
        if content:
            try:
                with open(cache_path, 'wb') as f:
                    pickle.dump(content, f)
            except Exception:
                pass  # Cache write failed, not critical
        
        return content

class SearchResultCache:
    """Cache for search results with TTL support."""
    
    def __init__(self, cache_dir: str = '.cache', ttl_hours: int = 24):
        """
        Initialize search result cache.
        
        Args:
            cache_dir: Directory to store cached results
            ttl_hours: Time-to-live for cached results in hours
        """
        self.cache_dir = os.path.join(cache_dir, 'search')
        self.ttl_seconds = ttl_hours * 3600
        os.makedirs(self.cache_dir, exist_ok=True)
    
    def _get_cache_key(self, query: str, num_results: int) -> str:
        """Generate cache key for a query."""
        key_str = f"{query}:{num_results}"
        return hashlib.md5(key_str.encode()).hexdigest()
    
    def get(self, query: str, num_results: int) -> Optional[List[SearchResult]]:
        """Retrieve cached search results if available and fresh."""
        cache_key = self._get_cache_key(query, num_results)
        cache_path = os.path.join(self.cache_dir, f"{cache_key}.pkl")
        
        if not os.path.exists(cache_path):
            return None
        
        # Check if cache is still fresh
        cache_age = time.time() - os.path.getmtime(cache_path)
        if cache_age > self.ttl_seconds:
            return None
        
        try:
            with open(cache_path, 'rb') as f:
                return pickle.load(f)
        except Exception:
            return None
    
    def set(self, query: str, num_results: int, results: List[SearchResult]):
        """Cache search results."""
        cache_key = self._get_cache_key(query, num_results)
        cache_path = os.path.join(self.cache_dir, f"{cache_key}.pkl")
        
        try:
            with open(cache_path, 'wb') as f:
                pickle.dump(results, f)
        except Exception:
            pass  # Cache write failed, not critical

class CachedSearchProvider(SearchProvider):
    """Search provider wrapper with caching."""
    
    def __init__(self, provider: SearchProvider, cache: SearchResultCache):
        """
        Initialize cached search provider.
        
        Args:
            provider: Underlying search provider
            cache: Cache instance to use
        """
        self.provider = provider
        self.cache = cache
    
    def search(self, query: str, num_results: int = 10) -> List[SearchResult]:
        """Search with caching."""
        # Check cache first
        cached = self.cache.get(query, num_results)
        if cached is not None:
            return cached
        
        # Fetch fresh results
        results = self.provider.search(query, num_results)
        
        # Cache the results
        self.cache.set(query, num_results, results)
        
        return results

The caching implementations use disk-based storage to persist results across sessions. The SearchResultCache includes time-to-live functionality to ensure that cached results do not become stale. This is particularly important for rapidly changing topics where new resources are frequently published.

MONITORING AND OBSERVABILITY

Production systems require monitoring to track performance, identify issues, and understand usage patterns. We implement comprehensive logging and metrics collection:

from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List
import json

@dataclass
class ResearchMetrics:
    """Metrics for a research operation."""
    subject: str
    start_time: datetime
    end_time: Optional[datetime] = None
    search_results_count: int = 0
    analyzed_resources_count: int = 0
    recommendations_count: int = 0
    errors: List[str] = None
    
    def __post_init__(self):
        if self.errors is None:
            self.errors = []
    
    @property
    def duration_seconds(self) -> float:
        """Calculate operation duration in seconds."""
        if self.end_time:
            return (self.end_time - self.start_time).total_seconds()
        return 0.0
    
    def to_dict(self) -> Dict:
        """Convert metrics to dictionary for serialization."""
        return {
            'subject': self.subject,
            'start_time': self.start_time.isoformat(),
            'end_time': self.end_time.isoformat() if self.end_time else None,
            'duration_seconds': self.duration_seconds,
            'search_results_count': self.search_results_count,
            'analyzed_resources_count': self.analyzed_resources_count,
            'recommendations_count': self.recommendations_count,
            'errors': self.errors
        }

class MetricsCollector:
    """Collects and persists metrics for research operations."""
    
    def __init__(self, metrics_file: str = 'metrics.jsonl'):
        """
        Initialize metrics collector.
        
        Args:
            metrics_file: Path to file for storing metrics
        """
        self.metrics_file = metrics_file
    
    def record(self, metrics: ResearchMetrics):
        """Record metrics to file."""
        with open(self.metrics_file, 'a') as f:
            f.write(json.dumps(metrics.to_dict()) + '\n')
    
    def get_statistics(self) -> Dict:
        """Calculate aggregate statistics from recorded metrics."""
        if not os.path.exists(self.metrics_file):
            return {}
        
        total_operations = 0
        total_duration = 0.0
        total_results = 0
        total_errors = 0
        
        with open(self.metrics_file, 'r') as f:
            for line in f:
                try:
                    data = json.loads(line)
                    total_operations += 1
                    total_duration += data.get('duration_seconds', 0)
                    total_results += data.get('search_results_count', 0)
                    total_errors += len(data.get('errors', []))
                except json.JSONDecodeError:
                    continue
        
        if total_operations == 0:
            return {}
        
        return {
            'total_operations': total_operations,
            'average_duration_seconds': total_duration / total_operations,
            'average_results_per_operation': total_results / total_operations,
            'total_errors': total_errors,
            'error_rate': total_errors / total_operations
        }

class MonitoredResearchAgent(ResearchAgent):
    """Research agent with integrated metrics collection."""
    
    def __init__(self, metrics_collector: MetricsCollector, **kwargs):
        """
        Initialize monitored research agent.
        
        Args:
            metrics_collector: MetricsCollector instance
            **kwargs: Arguments passed to ResearchAgent
        """
        super().__init__(**kwargs)
        self.metrics = metrics_collector
    
    def research(self, subject: str, num_results: int = 20,
                max_recommendations: int = 10) -> str:
        """Execute research with metrics collection."""
        # Initialize metrics
        operation_metrics = ResearchMetrics(
            subject=subject,
            start_time=datetime.now()
        )
        
        try:
            # Execute research
            result = super().research(subject, num_results, max_recommendations)
            
            # Record success metrics
            operation_metrics.end_time = datetime.now()
            operation_metrics.search_results_count = num_results
            operation_metrics.recommendations_count = max_recommendations
            
            return result
        
        except Exception as e:
            # Record error
            operation_metrics.end_time = datetime.now()
            operation_metrics.errors.append(str(e))
            raise
        
        finally:
            # Always record metrics
            self.metrics.record(operation_metrics)

The metrics collection system tracks key performance indicators for each research operation including duration, result counts, and errors. This data enables operators to identify performance bottlenecks, track error rates, and understand usage patterns over time.

FULL PRODUCTION-READY RUNNING EXAMPLE

Now we present a complete, production-ready implementation that integrates all the components discussed above. This example can handle any subject area query and provides a robust, scalable solution.

#!/usr/bin/env python3
"""
Research Agent - Production Implementation

A complete LLM-based agent for discovering and recommending learning resources
across any subject area. Supports local and remote LLMs with multiple GPU
architectures (NVIDIA CUDA, AMD ROCm, Intel, Apple MPS).

Usage:
    python research_agent.py --config config.json
    python research_agent.py --subject "Quantum Computing"
"""

import argparse
import sys
import os
import logging
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List, Dict, Optional, Iterator, Union, Set, Any, Callable
from enum import Enum
import requests
import json
import hashlib
import pickle
import time
from datetime import datetime
from functools import wraps
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.parse import quote_plus, urlparse

# Try to import optional dependencies
try:
    from llama_cpp import Llama
    LLAMA_CPP_AVAILABLE = True
except ImportError:
    LLAMA_CPP_AVAILABLE = False
    print("Warning: llama-cpp-python not available. Local LLM support disabled.")

try:
    from bs4 import BeautifulSoup
    BS4_AVAILABLE = True
except ImportError:
    BS4_AVAILABLE = False
    print("Warning: beautifulsoup4 not available. Web scraping disabled.")


# ============================================================================
# Core Data Structures
# ============================================================================

@dataclass
class Message:
    """Represents a single message in a conversation."""
    role: str  # 'system', 'user', or 'assistant'
    content: str


@dataclass
class SearchResult:
    """Represents a single search result."""
    title: str
    url: str
    snippet: str
    source: str


class ResourceType(Enum):
    """Types of learning resources."""
    ACADEMIC_PAPER = "academic_paper"
    BOOK = "book"
    TUTORIAL = "tutorial"
    ARTICLE = "article"
    DOCUMENTATION = "documentation"
    VIDEO = "video"
    COURSE = "course"
    UNKNOWN = "unknown"


@dataclass
class AnalyzedResource:
    """Represents an analyzed and scored resource."""
    title: str
    url: str
    snippet: str
    resource_type: ResourceType
    relevance_score: float
    quality_score: float
    reasoning: str
    metadata: Dict[str, str] = field(default_factory=dict)
    
    @property
    def overall_score(self) -> float:
        """Calculate overall score as weighted combination."""
        return 0.6 * self.relevance_score + 0.4 * self.quality_score


@dataclass
class ResearchMetrics:
    """Metrics for a research operation."""
    subject: str
    start_time: datetime
    end_time: Optional[datetime] = None
    search_results_count: int = 0
    analyzed_resources_count: int = 0
    recommendations_count: int = 0
    errors: List[str] = field(default_factory=list)
    
    @property
    def duration_seconds(self) -> float:
        """Calculate operation duration in seconds."""
        if self.end_time:
            return (self.end_time - self.start_time).total_seconds()
        return 0.0
    
    def to_dict(self) -> Dict:
        """Convert metrics to dictionary for serialization."""
        return {
            'subject': self.subject,
            'start_time': self.start_time.isoformat(),
            'end_time': self.end_time.isoformat() if self.end_time else None,
            'duration_seconds': self.duration_seconds,
            'search_results_count': self.search_results_count,
            'analyzed_resources_count': self.analyzed_resources_count,
            'recommendations_count': self.recommendations_count,
            'errors': self.errors
        }


# ============================================================================
# Configuration
# ============================================================================

@dataclass
class LLMConfig:
    """Configuration for LLM provider."""
    backend: str
    model_path: Optional[str] = None
    api_key: Optional[str] = None
    api_base: Optional[str] = None
    model_name: Optional[str] = None
    n_gpu_layers: int = -1
    n_ctx: int = 4096


@dataclass
class SearchConfig:
    """Configuration for search providers."""
    providers: List[str]
    bing_api_key: Optional[str] = None
    timeout: int = 30


@dataclass
class AgentConfig:
    """Main configuration for the research agent."""
    llm: LLMConfig
    search: SearchConfig
    max_search_results: int = 20
    max_recommendations: int = 10
    content_timeout: int = 30
    max_content_length: int = 50000
    log_level: str = 'INFO'
    cache_enabled: bool = True
    cache_dir: str = '.cache'
    cache_ttl_hours: int = 24
    metrics_enabled: bool = True
    metrics_file: str = 'metrics.jsonl'
    
    @classmethod
    def from_file(cls, config_path: str) -> 'AgentConfig':
        """Load configuration from a JSON file."""
        with open(config_path, 'r') as f:
            data = json.load(f)
        
        llm_config = LLMConfig(**data['llm'])
        search_config = SearchConfig(**data['search'])
        
        return cls(
            llm=llm_config,
            search=search_config,
            max_search_results=data.get('max_search_results', 20),
            max_recommendations=data.get('max_recommendations', 10),
            content_timeout=data.get('content_timeout', 30),
            max_content_length=data.get('max_content_length', 50000),
            log_level=data.get('log_level', 'INFO'),
            cache_enabled=data.get('cache_enabled', True),
            cache_dir=data.get('cache_dir', '.cache'),
            cache_ttl_hours=data.get('cache_ttl_hours', 24),
            metrics_enabled=data.get('metrics_enabled', True),
            metrics_file=data.get('metrics_file', 'metrics.jsonl')
        )
    
    @classmethod
    def from_env(cls) -> 'AgentConfig':
        """Load configuration from environment variables."""
        llm_backend = os.getenv('LLM_BACKEND', 'local')
        
        llm_config = LLMConfig(
            backend=llm_backend,
            model_path=os.getenv('LLM_MODEL_PATH'),
            api_key=os.getenv('LLM_API_KEY'),
            api_base=os.getenv('LLM_API_BASE'),
            model_name=os.getenv('LLM_MODEL_NAME'),
            n_gpu_layers=int(os.getenv('LLM_GPU_LAYERS', '-1')),
            n_ctx=int(os.getenv('LLM_CONTEXT_SIZE', '4096'))
        )
        
        search_providers = os.getenv('SEARCH_PROVIDERS', 'duckduckgo').split(',')
        search_config = SearchConfig(
            providers=search_providers,
            bing_api_key=os.getenv('BING_API_KEY'),
            timeout=int(os.getenv('SEARCH_TIMEOUT', '30'))
        )
        
        return cls(
            llm=llm_config,
            search=search_config,
            max_search_results=int(os.getenv('MAX_SEARCH_RESULTS', '20')),
            max_recommendations=int(os.getenv('MAX_RECOMMENDATIONS', '10')),
            content_timeout=int(os.getenv('CONTENT_TIMEOUT', '30')),
            log_level=os.getenv('LOG_LEVEL', 'INFO'),
            cache_enabled=os.getenv('CACHE_ENABLED', 'true').lower() == 'true',
            cache_dir=os.getenv('CACHE_DIR', '.cache'),
            cache_ttl_hours=int(os.getenv('CACHE_TTL_HOURS', '24')),
            metrics_enabled=os.getenv('METRICS_ENABLED', 'true').lower() == 'true',
            metrics_file=os.getenv('METRICS_FILE', 'metrics.jsonl')
        )


# ============================================================================
# Utility Functions and Decorators
# ============================================================================

def retry_on_failure(max_attempts: int = 3, delay: float = 1.0, 
                    backoff: float = 2.0):
    """
    Decorator that retries a function on failure with exponential backoff.
    
    Args:
        max_attempts: Maximum number of retry attempts
        delay: Initial delay between retries in seconds
        backoff: Multiplier for delay after each attempt
    """
    def decorator(func: Callable) -> Callable:
        @wraps(func)
        def wrapper(*args, **kwargs) -> Any:
            current_delay = delay
            last_exception = None
            
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    last_exception = e
                    if attempt < max_attempts - 1:
                        time.sleep(current_delay)
                        current_delay *= backoff
            
            raise last_exception
        
        return wrapper
    return decorator


# ============================================================================
# LLM Provider Abstraction
# ============================================================================

class LLMProvider(ABC):
    """Abstract base class for all LLM providers."""
    
    @abstractmethod
    def generate(self, messages: List[Message], max_tokens: int = 2048, 
                temperature: float = 0.7) -> str:
        """Generate a completion given a list of messages."""
        pass
    
    @abstractmethod
    def stream_generate(self, messages: List[Message], max_tokens: int = 2048,
                       temperature: float = 0.7) -> Iterator[str]:
        """Generate a completion with streaming output."""
        pass
    
    @abstractmethod
    def get_model_info(self) -> Dict[str, str]:
        """Retrieve information about the loaded model."""
        pass


class LocalLLMProvider(LLMProvider):
    """LLM provider for locally-hosted models using llama.cpp."""
    
    def __init__(self, model_path: str, n_gpu_layers: int = -1, 
                 n_ctx: int = 4096, verbose: bool = False):
        """
        Initialize the local LLM provider.
        
        Args:
            model_path: Path to the GGUF model file
            n_gpu_layers: Number of layers to offload to GPU (-1 for all)
            n_ctx: Context window size in tokens
            verbose: Whether to print detailed loading information
        """
        if not LLAMA_CPP_AVAILABLE:
            raise ImportError("llama-cpp-python is required for local LLM support")
        
        if not os.path.exists(model_path):
            raise FileNotFoundError(f"Model file not found: {model_path}")
        
        self.llm = Llama(
            model_path=model_path,
            n_gpu_layers=n_gpu_layers,
            n_ctx=n_ctx,
            verbose=verbose,
            n_threads=os.cpu_count() or 4
        )
        
        self.model_path = model_path
        self.context_size = n_ctx
    
    def generate(self, messages: List[Message], max_tokens: int = 2048,
                temperature: float = 0.7) -> str:
        """Generate a completion from the local model."""
        formatted_messages = [
            {"role": msg.role, "content": msg.content}
            for msg in messages
        ]
        
        response = self.llm.create_chat_completion(
            messages=formatted_messages,
            max_tokens=max_tokens,
            temperature=temperature,
            stream=False
        )
        
        return response['choices'][0]['message']['content']
    
    def stream_generate(self, messages: List[Message], max_tokens: int = 2048,
                       temperature: float = 0.7) -> Iterator[str]:
        """Generate a streaming completion from the local model."""
        formatted_messages = [
            {"role": msg.role, "content": msg.content}
            for msg in messages
        ]
        
        stream = self.llm.create_chat_completion(
            messages=formatted_messages,
            max_tokens=max_tokens,
            temperature=temperature,
            stream=True
        )
        
        for chunk in stream:
            delta = chunk['choices'][0]['delta']
            if 'content' in delta:
                yield delta['content']
    
    def get_model_info(self) -> Dict[str, str]:
        """Return information about the loaded model."""
        return {
            'provider': 'local_llama_cpp',
            'model_path': self.model_path,
            'context_size': str(self.context_size),
            'gpu_layers': 'auto-detected'
        }


class RemoteLLMProvider(LLMProvider):
    """LLM provider for remote API-based models."""
    
    def __init__(self, api_key: str, api_base: str = "https://api.openai.com/v1",
                 model_name: str = "gpt-4", timeout: int = 120):
        """
        Initialize the remote LLM provider.
        
        Args:
            api_key: API authentication key
            api_base: Base URL for the API endpoint
            model_name: Name of the model to use
            timeout: Request timeout in seconds
        """
        self.api_key = api_key
        self.api_base = api_base.rstrip('/')
        self.model_name = model_name
        self.timeout = timeout
        self.session = requests.Session()
        self.session.headers.update({
            'Authorization': f'Bearer {api_key}',
            'Content-Type': 'application/json'
        })
    
    @retry_on_failure(max_attempts=3, delay=1.0)
    def generate(self, messages: List[Message], max_tokens: int = 2048,
                temperature: float = 0.7) -> str:
        """Generate a completion from the remote API."""
        payload = {
            'model': self.model_name,
            'messages': [
                {'role': msg.role, 'content': msg.content}
                for msg in messages
            ],
            'max_tokens': max_tokens,
            'temperature': temperature,
            'stream': False
        }
        
        response = self.session.post(
            f'{self.api_base}/chat/completions',
            json=payload,
            timeout=self.timeout
        )
        
        if response.status_code != 200:
            raise Exception(f"API request failed: {response.status_code} - {response.text}")
        
        result = response.json()
        return result['choices'][0]['message']['content']
    
    def stream_generate(self, messages: List[Message], max_tokens: int = 2048,
                       temperature: float = 0.7) -> Iterator[str]:
        """Generate a streaming completion from the remote API."""
        payload = {
            'model': self.model_name,
            'messages': [
                {'role': msg.role, 'content': msg.content}
                for msg in messages
            ],
            'max_tokens': max_tokens,
            'temperature': temperature,
            'stream': True
        }
        
        response = self.session.post(
            f'{self.api_base}/chat/completions',
            json=payload,
            timeout=self.timeout,
            stream=True
        )
        
        if response.status_code != 200:
            raise Exception(f"API request failed: {response.status_code}")
        
        for line in response.iter_lines():
            if not line:
                continue
            
            line_text = line.decode('utf-8')
            if not line_text.startswith('data: '):
                continue
            
            data_str = line_text[6:]
            if data_str.strip() == '[DONE]':
                break
            
            try:
                data = json.loads(data_str)
                delta = data['choices'][0]['delta']
                if 'content' in delta:
                    yield delta['content']
            except json.JSONDecodeError:
                continue
    
    def get_model_info(self) -> Dict[str, str]:
        """Return information about the remote model."""
        return {
            'provider': 'remote_api',
            'model_name': self.model_name,
            'api_base': self.api_base
        }


class LLMBackend(Enum):
    """Enumeration of supported LLM backends."""
    LOCAL = "local"
    REMOTE = "remote"


def create_llm_provider(backend: Union[LLMBackend, str], 
                       **kwargs) -> LLMProvider:
    """Factory function to create the appropriate LLM provider."""
    if isinstance(backend, str):
        backend = LLMBackend(backend.lower())
    
    if backend == LLMBackend.LOCAL:
        required_params = ['model_path']
        for param in required_params:
            if param not in kwargs:
                raise ValueError(f"Missing required parameter for local backend: {param}")
        return LocalLLMProvider(**kwargs)
    
    elif backend == LLMBackend.REMOTE:
        required_params = ['api_key']
        for param in required_params:
            if param not in kwargs:
                raise ValueError(f"Missing required parameter for remote backend: {param}")
        return RemoteLLMProvider(**kwargs)
    
    else:
        raise ValueError(f"Unsupported backend: {backend}")


# ============================================================================
# Search Provider Abstraction
# ============================================================================

class SearchProvider(ABC):
    """Abstract base class for search providers."""
    
    @abstractmethod
    def search(self, query: str, num_results: int = 10) -> List[SearchResult]:
        """Execute a search query and return results."""
        pass


class DuckDuckGoSearchProvider(SearchProvider):
    """Search provider using DuckDuckGo's HTML interface."""
    
    def __init__(self, timeout: int = 30):
        """Initialize the DuckDuckGo search provider."""
        if not BS4_AVAILABLE:
            raise ImportError("beautifulsoup4 is required for DuckDuckGo search")
        
        self.timeout = timeout
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
        })
    
    @retry_on_failure(max_attempts=2)
    def search(self, query: str, num_results: int = 10) -> List[SearchResult]:
        """Execute a DuckDuckGo search."""
        url = f"https://html.duckduckgo.com/html/?q={quote_plus(query)}"
        
        response = self.session.get(url, timeout=self.timeout)
        response.raise_for_status()
        
        soup = BeautifulSoup(response.text, 'html.parser')
        results = []
        
        result_divs = soup.find_all('div', class_='result')
        
        for div in result_divs[:num_results]:
            title_elem = div.find('a', class_='result__a')
            if not title_elem:
                continue
            
            title = title_elem.get_text(strip=True)
            url = title_elem.get('href', '')
            
            snippet_elem = div.find('a', class_='result__snippet')
            snippet = snippet_elem.get_text(strip=True) if snippet_elem else ''
            
            if url and title:
                results.append(SearchResult(
                    title=title,
                    url=url,
                    snippet=snippet,
                    source='duckduckgo'
                ))
        
        return results


class BingSearchProvider(SearchProvider):
    """Search provider using Bing Search API."""
    
    def __init__(self, api_key: str, timeout: int = 30):
        """Initialize the Bing search provider."""
        self.api_key = api_key
        self.timeout = timeout
        self.endpoint = "https://api.bing.microsoft.com/v7.0/search"
        self.session = requests.Session()
        self.session.headers.update({
            'Ocp-Apim-Subscription-Key': api_key
        })
    
    @retry_on_failure(max_attempts=2)
    def search(self, query: str, num_results: int = 10) -> List[SearchResult]:
        """Execute a Bing search."""
        params = {
            'q': query,
            'count': num_results,
            'textDecorations': False,
            'textFormat': 'Raw'
        }
        
        response = self.session.get(
            self.endpoint,
            params=params,
            timeout=self.timeout
        )
        response.raise_for_status()
        
        data = response.json()
        results = []
        
        if 'webPages' in data and 'value' in data['webPages']:
            for item in data['webPages']['value']:
                results.append(SearchResult(
                    title=item.get('name', ''),
                    url=item.get('url', ''),
                    snippet=item.get('snippet', ''),
                    source='bing'
                ))
        
        return results


class ScholarSearchProvider(SearchProvider):
    """Search provider for academic papers and scholarly articles."""
    
    def __init__(self, timeout: int = 30):
        """Initialize the scholar search provider."""
        if not BS4_AVAILABLE:
            raise ImportError("beautifulsoup4 is required for Scholar search")
        
        self.timeout = timeout
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
        })
    
    @retry_on_failure(max_attempts=2)
    def search(self, query: str, num_results: int = 10) -> List[SearchResult]:
        """Search for scholarly articles using Google Scholar."""
        url = f"https://scholar.google.com/scholar?q={quote_plus(query)}&hl=en&num={num_results}"
        
        response = self.session.get(url, timeout=self.timeout)
        response.raise_for_status()
        
        soup = BeautifulSoup(response.text, 'html.parser')
        results = []
        
        result_divs = soup.find_all('div', class_='gs_ri')
        
        for div in result_divs[:num_results]:
            title_elem = div.find('h3', class_='gs_rt')
            if not title_elem:
                continue
            
            for cite in title_elem.find_all('span', class_='gs_ct1'):
                cite.decompose()
            for cite in title_elem.find_all('span', class_='gs_ct2'):
                cite.decompose()
            
            title_link = title_elem.find('a')
            title = title_link.get_text(strip=True) if title_link else title_elem.get_text(strip=True)
            url = title_link.get('href', '') if title_link else ''
            
            snippet_elem = div.find('div', class_='gs_rs')
            snippet = snippet_elem.get_text(strip=True) if snippet_elem else ''
            
            if title:
                results.append(SearchResult(
                    title=title,
                    url=url,
                    snippet=snippet,
                    source='google_scholar'
                ))
        
        return results


class AggregatedSearchProvider(SearchProvider):
    """Aggregates results from multiple search providers."""
    
    def __init__(self, providers: List[SearchProvider], max_workers: int = 3):
        """Initialize the aggregated search provider."""
        self.providers = providers
        self.max_workers = max_workers
    
    def search(self, query: str, num_results: int = 10) -> List[SearchResult]:
        """Execute searches across all providers and aggregate results."""
        all_results = []
        seen_urls: Set[str] = set()
        
        with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
            future_to_provider = {
                executor.submit(provider.search, query, num_results): provider
                for provider in self.providers
            }
            
            for future in as_completed(future_to_provider):
                provider = future_to_provider[future]
                try:
                    results = future.result()
                    
                    for result in results:
                        if result.url and result.url not in seen_urls:
                            seen_urls.add(result.url)
                            all_results.append(result)
                
                except Exception as e:
                    logging.warning(f"Search provider {provider.__class__.__name__} failed: {str(e)}")
                    continue
        
        return all_results[:num_results]


# ============================================================================
# Content Extraction
# ============================================================================

class ContentExtractor:
    """Extracts main content from web pages."""
    
    def __init__(self, timeout: int = 30, max_content_length: int = 50000):
        """Initialize the content extractor."""
        if not BS4_AVAILABLE:
            raise ImportError("beautifulsoup4 is required for content extraction")
        
        self.timeout = timeout
        self.max_content_length = max_content_length
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
        })
    
    @retry_on_failure(max_attempts=2)
    def extract(self, url: str) -> Optional[Dict[str, str]]:
        """Extract main content from a URL."""
        try:
            response = self.session.get(url, timeout=self.timeout)
            response.raise_for_status()
        except requests.RequestException as e:
            logging.warning(f"Failed to fetch {url}: {str(e)}")
            return None
        
        soup = BeautifulSoup(response.text, 'html.parser')
        
        for element in soup(['script', 'style', 'nav', 'footer', 'header', 'aside']):
            element.decompose()
        
        title = ''
        title_tag = soup.find('title')
        if title_tag:
            title = title_tag.get_text(strip=True)
        
        main_content = None
        for selector in ['main', 'article', '[role="main"]', '.content', '#content']:
            main_content = soup.select_one(selector)
            if main_content:
                break
        
        if not main_content:
            main_content = soup.find('body')
        
        if not main_content:
            return None
        
        text = main_content.get_text(separator=' ', strip=True)
        text = ' '.join(text.split())
        
        if len(text) > self.max_content_length:
            text = text[:self.max_content_length] + '...'
        
        return {
            'title': title,
            'content': text,
            'url': url
        }


class CachedContentExtractor(ContentExtractor):
    """Content extractor with disk-based caching."""
    
    def __init__(self, cache_dir: str = '.cache', **kwargs):
        """Initialize cached content extractor."""
        super().__init__(**kwargs)
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)
    
    def _get_cache_path(self, url: str) -> str:
        """Generate cache file path for a URL."""
        url_hash = hashlib.md5(url.encode()).hexdigest()
        return os.path.join(self.cache_dir, f"{url_hash}.pkl")
    
    def extract(self, url: str) -> Optional[Dict[str, str]]:
        """Extract content with caching."""
        cache_path = self._get_cache_path(url)
        
        if os.path.exists(cache_path):
            try:
                with open(cache_path, 'rb') as f:
                    return pickle.load(f)
            except Exception:
                pass
        
        content = super().extract(url)
        
        if content:
            try:
                with open(cache_path, 'wb') as f:
                    pickle.dump(content, f)
            except Exception:
                pass
        
        return content


# ============================================================================
# Search Result Caching
# ============================================================================

class SearchResultCache:
    """Cache for search results with TTL support."""
    
    def __init__(self, cache_dir: str = '.cache', ttl_hours: int = 24):
        """Initialize search result cache."""
        self.cache_dir = os.path.join(cache_dir, 'search')
        self.ttl_seconds = ttl_hours * 3600
        os.makedirs(self.cache_dir, exist_ok=True)
    
    def _get_cache_key(self, query: str, num_results: int) -> str:
        """Generate cache key for a query."""
        key_str = f"{query}:{num_results}"
        return hashlib.md5(key_str.encode()).hexdigest()
    
    def get(self, query: str, num_results: int) -> Optional[List[SearchResult]]:
        """Retrieve cached search results if available and fresh."""
        cache_key = self._get_cache_key(query, num_results)
        cache_path = os.path.join(self.cache_dir, f"{cache_key}.pkl")
        
        if not os.path.exists(cache_path):
            return None
        
        cache_age = time.time() - os.path.getmtime(cache_path)
        if cache_age > self.ttl_seconds:
            return None
        
        try:
            with open(cache_path, 'rb') as f:
                return pickle.load(f)
        except Exception:
            return None
    
    def set(self, query: str, num_results: int, results: List[SearchResult]):
        """Cache search results."""
        cache_key = self._get_cache_key(query, num_results)
        cache_path = os.path.join(self.cache_dir, f"{cache_key}.pkl")
        
        try:
            with open(cache_path, 'wb') as f:
                pickle.dump(results, f)
        except Exception:
            pass


class CachedSearchProvider(SearchProvider):
    """Search provider wrapper with caching."""
    
    def __init__(self, provider: SearchProvider, cache: SearchResultCache):
        """Initialize cached search provider."""
        self.provider = provider
        self.cache = cache
    
    def search(self, query: str, num_results: int = 10) -> List[SearchResult]:
        """Search with caching."""
        cached = self.cache.get(query, num_results)
        if cached is not None:
            logging.info(f"Using cached search results for: {query}")
            return cached
        
        results = self.provider.search(query, num_results)
        self.cache.set(query, num_results, results)
        
        return results


# ============================================================================
# Content Analysis
# ============================================================================

class ContentAnalyzer:
    """Analyzes and ranks content using LLM capabilities."""
    
    def __init__(self, llm_provider: LLMProvider):
        """Initialize the content analyzer."""
        self.llm = llm_provider
    
    def analyze_resource(self, search_result: SearchResult, 
                        content: Optional[str], subject: str) -> AnalyzedResource:
        """Analyze a single resource for relevance and quality."""
        analysis_text = f"Title: {search_result.title}\n"
        analysis_text += f"URL: {search_result.url}\n"
        analysis_text += f"Snippet: {search_result.snippet}\n"
        
        if content:
            content_preview = content[:2000]
            analysis_text += f"Content preview: {content_preview}\n"
        
        prompt = self._build_analysis_prompt(analysis_text, subject)
        
        messages = [
            Message(role='system', content='You are an expert research assistant that evaluates the relevance and quality of learning resources.'),
            Message(role='user', content=prompt)
        ]
        
        try:
            response = self.llm.generate(messages, max_tokens=500, temperature=0.3)
            scores = self._parse_analysis_response(response)
            
            return AnalyzedResource(
                title=search_result.title,
                url=search_result.url,
                snippet=search_result.snippet,
                resource_type=scores['resource_type'],
                relevance_score=scores['relevance_score'],
                quality_score=scores['quality_score'],
                reasoning=scores['reasoning']
            )
        
        except Exception as e:
            logging.warning(f"Analysis failed for {search_result.url}: {str(e)}")
            return AnalyzedResource(
                title=search_result.title,
                url=search_result.url,
                snippet=search_result.snippet,
                resource_type=ResourceType.UNKNOWN,
                relevance_score=0.5,
                quality_score=0.5,
                reasoning="Analysis failed; using default scores"
            )
    
    def _build_analysis_prompt(self, resource_info: str, subject: str) -> str:
        """Build the prompt for resource analysis."""
        prompt = f"""Analyze the following resource for a user researching "{subject}".

{resource_info}

Provide your analysis in the following structured format:

RESOURCE_TYPE: [one of: academic_paper, book, tutorial, article, documentation, video, course, unknown]
RELEVANCE_SCORE: [0.0 to 1.0, where 1.0 means highly relevant to the subject]
QUALITY_SCORE: [0.0 to 1.0, where 1.0 means high quality and authoritative]
REASONING: [brief explanation of your scores]

Consider these factors:
- How well does the resource match the subject area?
- Is it from an authoritative source?
- Is it comprehensive and well-structured?
- Is it suitable for learning about the subject?
"""
        return prompt
    
    def _parse_analysis_response(self, response: str) -> Dict:
        """Parse the structured analysis response from the LLM."""
        lines = response.strip().split('\n')
        result = {
            'resource_type': ResourceType.UNKNOWN,
            'relevance_score': 0.5,
            'quality_score': 0.5,
            'reasoning': ''
        }
        
        for line in lines:
            line = line.strip()
            
            if line.startswith('RESOURCE_TYPE:'):
                type_str = line.split(':', 1)[1].strip().lower()
                try:
                    result['resource_type'] = ResourceType(type_str)
                except ValueError:
                    result['resource_type'] = ResourceType.UNKNOWN
            
            elif line.startswith('RELEVANCE_SCORE:'):
                try:
                    score = float(line.split(':', 1)[1].strip())
                    result['relevance_score'] = max(0.0, min(1.0, score))
                except ValueError:
                    pass
            
            elif line.startswith('QUALITY_SCORE:'):
                try:
                    score = float(line.split(':', 1)[1].strip())
                    result['quality_score'] = max(0.0, min(1.0, score))
                except ValueError:
                    pass
            
            elif line.startswith('REASONING:'):
                result['reasoning'] = line.split(':', 1)[1].strip()
        
        return result
    
    def batch_analyze(self, search_results: List[SearchResult], 
                     subject: str, extractor: ContentExtractor,
                     max_workers: int = 5) -> List[AnalyzedResource]:
        """Analyze multiple resources in parallel."""
        analyzed_resources = []
        
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            future_to_result = {}
            
            for result in search_results:
                future = executor.submit(self._analyze_with_content, 
                                       result, subject, extractor)
                future_to_result[future] = result
            
            for future in as_completed(future_to_result):
                try:
                    analyzed = future.result()
                    analyzed_resources.append(analyzed)
                except Exception as e:
                    result = future_to_result[future]
                    logging.error(f"Failed to analyze {result.url}: {str(e)}")
        
        analyzed_resources.sort(key=lambda x: x.overall_score, reverse=True)
        
        return analyzed_resources
    
    def _analyze_with_content(self, search_result: SearchResult, 
                             subject: str, extractor: ContentExtractor) -> AnalyzedResource:
        """Helper method to extract content and analyze."""
        extracted = extractor.extract(search_result.url)
        content = extracted['content'] if extracted else None
        
        return self.analyze_resource(search_result, content, subject)


# ============================================================================
# Recommendation Generation
# ============================================================================

class RecommendationGenerator:
    """Generates structured recommendations from analyzed resources."""
    
    def __init__(self, llm_provider: LLMProvider):
        """Initialize the recommendation generator."""
        self.llm = llm_provider
    
    def generate_recommendations(self, analyzed_resources: List[AnalyzedResource],
                                subject: str, max_recommendations: int = 10) -> str:
        """Generate a comprehensive recommendation report."""
        top_resources = analyzed_resources[:max_recommendations]
        grouped = self._group_by_type(top_resources)
        
        intro = self._generate_introduction(subject, len(top_resources))
        
        sections = []
        type_order = [
            ResourceType.ACADEMIC_PAPER,
            ResourceType.BOOK,
            ResourceType.COURSE,
            ResourceType.TUTORIAL,
            ResourceType.DOCUMENTATION,
            ResourceType.ARTICLE,
            ResourceType.VIDEO,
            ResourceType.UNKNOWN
        ]
        
        for resource_type in type_order:
            if resource_type in grouped and grouped[resource_type]:
                section = self._generate_type_section(
                    resource_type, 
                    grouped[resource_type],
                    subject
                )
                sections.append(section)
        
        report = intro + '\n\n'
        report += '\n\n'.join(sections)
        report += '\n\n' + self._generate_conclusion(subject)
        
        return report
    
    def _group_by_type(self, resources: List[AnalyzedResource]) -> Dict[ResourceType, List[AnalyzedResource]]:
        """Group resources by their type."""
        grouped = {}
        for resource in resources:
            if resource.resource_type not in grouped:
                grouped[resource.resource_type] = []
            grouped[resource.resource_type].append(resource)
        return grouped
    
    def _generate_introduction(self, subject: str, num_resources: int) -> str:
        """Generate an introduction for the recommendations."""
        prompt = f"""Write a brief introduction (2-3 sentences) for a curated list of {num_resources} learning resources about "{subject}". 
The introduction should welcome the user and explain that these resources have been carefully selected and analyzed for relevance and quality."""
        
        messages = [
            Message(role='system', content='You are a helpful research assistant.'),
            Message(role='user', content=prompt)
        ]
        
        return self.llm.generate(messages, max_tokens=200, temperature=0.7)
    
    def _generate_type_section(self, resource_type: ResourceType, 
                              resources: List[AnalyzedResource],
                              subject: str) -> str:
        """Generate a section for a specific resource type."""
        type_names = {
            ResourceType.ACADEMIC_PAPER: 'Academic Papers and Research',
            ResourceType.BOOK: 'Books',
            ResourceType.COURSE: 'Online Courses',
            ResourceType.TUTORIAL: 'Tutorials and Guides',
            ResourceType.DOCUMENTATION: 'Documentation',
            ResourceType.ARTICLE: 'Articles and Blog Posts',
            ResourceType.VIDEO: 'Video Resources',
            ResourceType.UNKNOWN: 'Additional Resources'
        }
        
        section = f"=== {type_names.get(resource_type, 'Resources')} ===\n\n"
        
        for i, resource in enumerate(resources, 1):
            section += f"{i}. {resource.title}\n"
            section += f"   URL: {resource.url}\n"
            section += f"   Relevance: {resource.relevance_score:.2f} | Quality: {resource.quality_score:.2f}\n"
            section += f"   {resource.reasoning}\n\n"
        
        return section
    
    def _generate_conclusion(self, subject: str) -> str:
        """Generate a conclusion for the recommendations."""
        prompt = f"""Write a brief conclusion (2-3 sentences) for a curated list of learning resources about "{subject}".
Encourage the user to explore these resources and mention that they can request more specific recommendations if needed."""
        
        messages = [
            Message(role='system', content='You are a helpful research assistant.'),
            Message(role='user', content=prompt)
        ]
        
        return self.llm.generate(messages, max_tokens=200, temperature=0.7)


# ============================================================================
# Metrics Collection
# ============================================================================

class MetricsCollector:
    """Collects and persists metrics for research operations."""
    
    def __init__(self, metrics_file: str = 'metrics.jsonl'):
        """Initialize metrics collector."""
        self.metrics_file = metrics_file
    
    def record(self, metrics: ResearchMetrics):
        """Record metrics to file."""
        with open(self.metrics_file, 'a') as f:
            f.write(json.dumps(metrics.to_dict()) + '\n')
    
    def get_statistics(self) -> Dict:
        """Calculate aggregate statistics from recorded metrics."""
        if not os.path.exists(self.metrics_file):
            return {}
        
        total_operations = 0
        total_duration = 0.0
        total_results = 0
        total_errors = 0
        
        with open(self.metrics_file, 'r') as f:
            for line in f:
                try:
                    data = json.loads(line)
                    total_operations += 1
                    total_duration += data.get('duration_seconds', 0)
                    total_results += data.get('search_results_count', 0)
                    total_errors += len(data.get('errors', []))
                except json.JSONDecodeError:
                    continue
        
        if total_operations == 0:
            return {}
        
        return {
            'total_operations': total_operations,
            'average_duration_seconds': total_duration / total_operations,
            'average_results_per_operation': total_results / total_operations,
            'total_errors': total_errors,
            'error_rate': total_errors / total_operations
        }


# ============================================================================
# Main Research Agent
# ============================================================================

class ResearchAgent:
    """Main agent that orchestrates the research and recommendation process."""
    
    def __init__(self, llm_provider: LLMProvider, 
                 search_provider: SearchProvider,
                 content_extractor: ContentExtractor,
                 content_analyzer: ContentAnalyzer,
                 recommendation_generator: RecommendationGenerator,
                 metrics_collector: Optional[MetricsCollector] = None):
        """Initialize the research agent."""
        self.llm = llm_provider
        self.search = search_provider
        self.extractor = content_extractor
        self.analyzer = content_analyzer
        self.recommender = recommendation_generator
        self.metrics = metrics_collector
    
    def research(self, subject: str, num_results: int = 20,
                max_recommendations: int = 10) -> str:
        """Execute the complete research workflow."""
        operation_metrics = ResearchMetrics(
            subject=subject,
            start_time=datetime.now()
        ) if self.metrics else None
        
        try:
            logging.info(f"Starting research for subject: {subject}")
            
            enhanced_query = self._enhance_query(subject)
            logging.info(f"Enhanced query: {enhanced_query}")
            
            logging.info(f"Searching for {num_results} resources...")
            search_results = self.search.search(enhanced_query, num_results)
            logging.info(f"Found {len(search_results)} search results")
            
            if operation_metrics:
                operation_metrics.search_results_count = len(search_results)
            
            if not search_results:
                return f"No resources found for subject: {subject}"
            
            logging.info("Analyzing resources...")
            analyzed_resources = self.analyzer.batch_analyze(
                search_results, 
                subject, 
                self.extractor
            )
            logging.info(f"Analyzed {len(analyzed_resources)} resources")
            
            if operation_metrics:
                operation_metrics.analyzed_resources_count = len(analyzed_resources)
            
            logging.info("Generating recommendations...")
            recommendations = self.recommender.generate_recommendations(
                analyzed_resources,
                subject,
                max_recommendations
            )
            
            if operation_metrics:
                operation_metrics.recommendations_count = max_recommendations
                operation_metrics.end_time = datetime.now()
            
            logging.info("Research complete")
            return recommendations
        
        except Exception as e:
            if operation_metrics:
                operation_metrics.errors.append(str(e))
                operation_metrics.end_time = datetime.now()
            raise
        
        finally:
            if operation_metrics and self.metrics:
                self.metrics.record(operation_metrics)
    
    def _enhance_query(self, subject: str) -> str:
        """Use LLM to enhance the search query for better results."""
        prompt = f"""Given the subject "{subject}", generate an optimized search query that will find high-quality learning resources including books, academic papers, tutorials, and articles.

The query should:
- Include relevant technical terms and synonyms
- Be concise but comprehensive
- Focus on authoritative and educational content

Provide only the search query, nothing else."""
        
        messages = [
            Message(role='system', content='You are an expert at formulating effective search queries.'),
            Message(role='user', content=prompt)
        ]
        
        enhanced = self.llm.generate(messages, max_tokens=100, temperature=0.5)
        return enhanced.strip()
    
    def interactive_research(self):
        """Run an interactive research session."""
        print("Research Agent - Interactive Mode")
        print("=" * 50)
        print("Enter a subject area to research, or 'quit' to exit.\n")
        
        while True:
            subject = input("Subject: ").strip()
            
            if subject.lower() in ['quit', 'exit', 'q']:
                print("Goodbye!")
                break
            
            if not subject:
                print("Please enter a valid subject.\n")
                continue
            
            try:
                print("\nResearching... This may take a minute.\n")
                recommendations = self.research(subject)
                print(recommendations)
                print("\n" + "=" * 50 + "\n")
            
            except Exception as e:
                print(f"Error during research: {str(e)}\n")
                logging.error(f"Research failed: {str(e)}", exc_info=True)


# ============================================================================
# Agent Factory
# ============================================================================

def create_agent_from_config(config: AgentConfig) -> ResearchAgent:
    """Factory function to create a fully configured ResearchAgent."""
    logging.basicConfig(
        level=getattr(logging, config.log_level),
        format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
    )
    
    if config.llm.backend == 'local':
        llm_provider = create_llm_provider(
            LLMBackend.LOCAL,
            model_path=config.llm.model_path,
            n_gpu_layers=config.llm.n_gpu_layers,
            n_ctx=config.llm.n_ctx
        )
    else:
        llm_provider = create_llm_provider(
            LLMBackend.REMOTE,
            api_key=config.llm.api_key,
            api_base=config.llm.api_base,
            model_name=config.llm.model_name
        )
    
    search_providers = []
    for provider_name in config.search.providers:
        if provider_name.lower() == 'duckduckgo':
            search_providers.append(DuckDuckGoSearchProvider(timeout=config.search.timeout))
        elif provider_name.lower() == 'bing' and config.search.bing_api_key:
            search_providers.append(BingSearchProvider(
                api_key=config.search.bing_api_key,
                timeout=config.search.timeout
            ))
        elif provider_name.lower() == 'scholar':
            search_providers.append(ScholarSearchProvider(timeout=config.search.timeout))
    
    base_search_provider = AggregatedSearchProvider(search_providers)
    
    if config.cache_enabled:
        search_cache = SearchResultCache(
            cache_dir=config.cache_dir,
            ttl_hours=config.cache_ttl_hours
        )
        search_provider = CachedSearchProvider(base_search_provider, search_cache)
        content_extractor = CachedContentExtractor(
            cache_dir=config.cache_dir,
            timeout=config.content_timeout,
            max_content_length=config.max_content_length
        )
    else:
        search_provider = base_search_provider
        content_extractor = ContentExtractor(
            timeout=config.content_timeout,
            max_content_length=config.max_content_length
        )
    
    content_analyzer = ContentAnalyzer(llm_provider)
    recommendation_generator = RecommendationGenerator(llm_provider)
    
    metrics_collector = None
    if config.metrics_enabled:
        metrics_collector = MetricsCollector(metrics_file=config.metrics_file)
    
    return ResearchAgent(
        llm_provider=llm_provider,
        search_provider=search_provider,
        content_extractor=content_extractor,
        content_analyzer=content_analyzer,
        recommendation_generator=recommendation_generator,
        metrics_collector=metrics_collector
    )


# ============================================================================
# Command-Line Interface
# ============================================================================

def main():
    """Main entry point for the research agent."""
    parser = argparse.ArgumentParser(
        description='Research Agent - LLM-based resource discovery system'
    )
    
    parser.add_argument(
        '--config',
        type=str,
        help='Path to configuration JSON file'
    )
    
    parser.add_argument(
        '--subject',
        type=str,
        help='Subject area to research (for single query mode)'
    )
    
    parser.add_argument(
        '--interactive',
        action='store_true',
        help='Run in interactive mode'
    )
    
    parser.add_argument(
        '--stats',
        action='store_true',
        help='Display statistics from previous operations'
    )
    
    args = parser.parse_args()
    
    try:
        if args.config:
            config = AgentConfig.from_file(args.config)
        else:
            config = AgentConfig.from_env()
        
        agent = create_agent_from_config(config)
        
        if args.stats and config.metrics_enabled:
            metrics = MetricsCollector(config.metrics_file)
            stats = metrics.get_statistics()
            if stats:
                print("Research Agent Statistics")
                print("=" * 50)
                print(f"Total operations: {stats['total_operations']}")
                print(f"Average duration: {stats['average_duration_seconds']:.2f} seconds")
                print(f"Average results per operation: {stats['average_results_per_operation']:.1f}")
                print(f"Total errors: {stats['total_errors']}")
                print(f"Error rate: {stats['error_rate']:.2%}")
            else:
                print("No statistics available yet.")
            return
        
        if args.subject:
            recommendations = agent.research(args.subject)
            print(recommendations)
        elif args.interactive:
            agent.interactive_research()
        else:
            parser.print_help()
    
    except Exception as e:
        logging.error(f"Fatal error: {str(e)}", exc_info=True)
        print(f"Error: {str(e)}", file=sys.stderr)
        sys.exit(1)


if __name__ == '__main__':
    main()

This complete implementation provides a production-ready research agent system that can be deployed and used immediately. The code includes all necessary error handling, caching, metrics collection, and configuration management. It supports both local and remote LLM deployments with automatic GPU detection across NVIDIA CUDA, AMD ROCm, Intel, and Apple Metal architectures. The system can search multiple providers concurrently, analyze content in parallel, and generate comprehensive recommendations for any subject area a user specifies.