Hitchhiker's Guide to AI, Software Architecture, and Everything Else: BUILDING AN LLM-BASED ALGORITHM, DESIGN TACTICS AND DESIGN PATTERNS FINDER





INTENSIVE ANALYSIS AND PLANNING
Before diving into the implementation details, it helps to pin down the problem. We are building a system that goes beyond keyword matching or database lookups. The core challenge is an assistant that understands the context of a software development problem and recommends algorithms, design patterns, and architectural tactics that fit that context.
The system must analyze natural language descriptions of problems and extract relevant contextual information such as performance requirements, data characteristics, scalability needs, and existing architectural constraints. It then needs to reason about which solutions fit best given these constraints. For example, when a developer needs to handle concurrent access to shared resources, the system should not just list all concurrency patterns. It should recommend specific solutions: reader-writer locks for read-heavy workloads, optimistic locking for low-contention scenarios, or actor models for message-passing architectures.
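To make the concurrency example concrete, here is a deliberately naive sketch of the context-to-recommendation mapping. It is not how the finder works internally (the LLM does the reasoning there); it only illustrates the kind of decision being described. The ConcurrencyContext fields, the 0.8 read-ratio cutoff, and the mutex default are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConcurrencyContext:
    """Hypothetical context fields; names and thresholds are illustrative only."""
    read_ratio: float       # fraction of operations that are reads, 0.0 to 1.0
    contention: str         # "low" or "high"
    message_passing: bool   # True if components already communicate via messages


def recommend_concurrency(ctx: ConcurrencyContext) -> str:
    """Map a context to one recommendation, mirroring the examples in the text."""
    if ctx.message_passing:
        return "actor model"
    if ctx.read_ratio >= 0.8:  # assumed cutoff for "read-heavy"
        return "reader-writer lock"
    if ctx.contention == "low":
        return "optimistic locking"
    return "mutex"  # conservative default, not taken from the text


if __name__ == "__main__":
    print(recommend_concurrency(ConcurrencyContext(0.95, "high", False)))
```

A hard-coded table like this breaks down as soon as constraints interact, which is the motivation for letting an LLM weigh them instead.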
The architecture must support both local and remote LLM deployments across diverse GPU platforms including Nvidia CUDA, AMD ROCm, Intel GPUs, and Apple Metal Performance Shaders. This requires abstraction layers that hide hardware-specific details while maximizing performance on each platform.
My plan consists of several major components. First, I will design the overall system architecture showing how components interact. Second, I will detail the LLM integration layer that abstracts away hardware and deployment differences. Third, I will explain the knowledge base structure containing algorithms, patterns, and tactics with their contextual metadata. Fourth, I will describe the prompt engineering and context extraction mechanisms. Fifth, I will show the analysis and recommendation engine. Finally, I will provide a complete working implementation as a running example.
INTRODUCTION TO THE PROBLEM DOMAIN
Software developers constantly face decisions about which algorithms to implement, which design patterns to apply, and which architectural tactics to employ. These decisions significantly impact system performance, maintainability, scalability, and correctness. However, the sheer number of available options combined with their context-dependent trade-offs makes choosing the right solution challenging.
Consider a developer building a distributed caching system. They need to decide on cache eviction algorithms, consistency protocols, data partitioning strategies, fault tolerance mechanisms, and monitoring approaches. Each decision involves trade-offs between performance, consistency, availability, and complexity. The developer might consider LRU versus LFU eviction, strong versus eventual consistency, hash-based versus range-based partitioning, and active versus passive replication.
Traditional approaches to finding algorithms and patterns include searching documentation, consulting textbooks, or asking colleagues. These methods have limitations. Documentation searches return results based on keyword matching without understanding context. Textbooks provide comprehensive coverage but require significant time to navigate. Colleagues may not always be available or may lack expertise in specific domains.
An LLM-based finder addresses these limitations by combining the breadth of knowledge encoded in large language models with context-aware reasoning. The system accepts natural language problem descriptions, understands the implicit and explicit constraints, and recommends solutions that fit the specific situation. This goes far beyond simple retrieval because the LLM can reason about trade-offs, combine multiple constraints, and explain why certain solutions work better than others in the given context.
SYSTEM ARCHITECTURE OVERVIEW
The system architecture consists of several interconnected layers, each with specific responsibilities. At the foundation lies the hardware abstraction layer that provides uniform interfaces to different GPU architectures and compute backends. Above this sits the LLM integration layer that manages model loading, inference, and communication regardless of whether models run locally or remotely.
The knowledge base layer contains structured information about algorithms, design patterns, and architectural tactics. This is not merely a static database but includes rich metadata about applicability contexts, performance characteristics, implementation complexity, and relationships between different solutions. The knowledge base works in conjunction with the LLM's inherent knowledge rather than replacing it.
The prompt engineering layer transforms user queries into effective prompts that guide the LLM toward producing high-quality recommendations. This layer incorporates context extraction, constraint identification, and structured output formatting instructions. It ensures that the LLM receives all necessary information to make informed recommendations.
The analysis and recommendation engine orchestrates the entire process. It receives user input, extracts context, queries the knowledge base for relevant candidates, constructs prompts, invokes the LLM, parses responses, and formats final recommendations. This engine holds the orchestration logic that ties the other layers together.
Finally, the presentation layer formats recommendations into comprehensive reports. These reports include not just lists of algorithms or patterns but detailed explanations of why each recommendation fits the context, what trade-offs exist, and how to implement the solutions.
The system processes a user query through several stages. First, the context extractor analyzes the query to identify key requirements such as performance needs, scalability constraints, data characteristics, and quality attributes. Second, the knowledge base filter identifies candidate solutions that match the extracted context. Third, the prompt constructor builds a detailed prompt that includes the user query, extracted context, candidate solutions, and instructions for analysis. Fourth, the LLM generates recommendations with explanations. Fifth, the report formatter structures the output into a comprehensive document.
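The five stages can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the implementation developed in this article: every function name, the keyword table, and the catalog format are hypothetical, the context extractor is a trivial keyword scan, and the LLM is passed in as a plain callable so the sketch runs without a model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PipelineResult:
    query: str
    context: Dict[str, List[str]] = field(default_factory=dict)
    candidates: List[str] = field(default_factory=list)
    prompt: str = ""
    llm_output: str = ""
    report: str = ""


def extract_context(query: str) -> Dict[str, List[str]]:
    # Stage 1 (stub): a keyword scan stands in for real context extraction.
    keywords = {
        "performance": ["latency", "fast", "throughput"],
        "scalability": ["scale", "distributed", "cluster"],
    }
    lowered = query.lower()
    found = {}
    for attribute, words in keywords.items():
        hits = [w for w in words if w in lowered]
        if hits:
            found[attribute] = hits
    return found


def filter_candidates(context: Dict[str, List[str]],
                      catalog: Dict[str, List[str]]) -> List[str]:
    # Stage 2: keep catalog entries tagged with at least one extracted attribute.
    return [name for name, attrs in catalog.items() if set(attrs) & set(context)]


def build_prompt(query: str, context: Dict[str, List[str]],
                 candidates: List[str]) -> str:
    # Stage 3: combine query, context, and candidates with analysis instructions.
    return (f"Problem: {query}\n"
            f"Context: {sorted(context)}\n"
            f"Candidates: {', '.join(candidates)}\n"
            "Recommend the best fit and explain the trade-offs.")


def run_pipeline(query: str, catalog: Dict[str, List[str]],
                 llm: Callable[[str], str]) -> PipelineResult:
    result = PipelineResult(query=query)
    result.context = extract_context(query)
    result.candidates = filter_candidates(result.context, catalog)
    result.prompt = build_prompt(query, result.context, result.candidates)
    result.llm_output = llm(result.prompt)                          # Stage 4
    result.report = f"RECOMMENDATION REPORT\n{result.llm_output}"   # Stage 5
    return result


if __name__ == "__main__":
    catalog = {"Bloom Filter": ["performance", "scalability"],
               "Observer": ["maintainability"]}
    fake_llm = lambda prompt: "Use a Bloom Filter."  # stand-in for a real model
    print(run_pipeline("Need fast lookups across a distributed cluster",
                       catalog, fake_llm).report)
```

Passing the LLM as a callable keeps each stage testable in isolation, which is one reason to keep the stages separate rather than sending the raw query straight to the model.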
HARDWARE ABSTRACTION AND GPU SUPPORT
Supporting multiple GPU architectures requires careful abstraction. Each vendor provides different APIs, driver stacks, and optimization strategies. Nvidia CUDA dominates the deep learning landscape, but AMD ROCm, Intel oneAPI, and Apple Metal Performance Shaders each have their own strengths and user bases.
The abstraction layer detects available hardware at runtime and selects appropriate backends. For Nvidia GPUs, it uses CUDA and cuDNN libraries. For AMD GPUs, it leverages ROCm and MIOpen. For Intel GPUs, it employs oneAPI and oneDNN. For Apple Silicon, it uses Metal Performance Shaders and the Accelerate framework.
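The key insight is that modern deep learning frameworks like PyTorch already provide much of this abstraction. PyTorch supports CUDA, ROCm, and MPS backends through its device abstraction, and recent releases also expose Intel GPUs as an xpu device. ROCm builds present AMD GPUs through the same torch.cuda API as Nvidia GPUs and set torch.version.hip, so that attribute is what distinguishes the two vendors. By building on PyTorch, we inherit cross-platform GPU support while maintaining the flexibility to optimize for specific hardware when needed.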
Here is a code snippet showing the hardware detection and backend selection logic:
import torch
import platform
import subprocess

class HardwareDetector:
    def __init__(self):
        self.available_devices = []
        self.preferred_device = None
        self._detect_hardware()
    
    def _detect_hardware(self):
        # ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API,
        # so torch.version.hip is what tells the two vendors apart.
        is_rocm = getattr(torch.version, 'hip', None) is not None
        
        # Check for CUDA (Nvidia)
        if torch.cuda.is_available() and not is_rocm:
            cuda_device = {
                'type': 'cuda',
                'name': torch.cuda.get_device_name(0),
                'memory': torch.cuda.get_device_properties(0).total_memory,
                'compute_capability': torch.cuda.get_device_capability(0),
                'device_index': 0
            }
            self.available_devices.append(cuda_device)
        
        # Check for ROCm (AMD)
        if torch.cuda.is_available() and is_rocm:
            rocm_device = {
                'type': 'rocm',
                'name': torch.cuda.get_device_name(0),
                'memory': torch.cuda.get_device_properties(0).total_memory,
                'device_index': 0
            }
            self.available_devices.append(rocm_device)
        
        # Check for MPS (Apple Silicon)
        if hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
            mps_device = {
                'type': 'mps',
                'name': 'Apple Metal Performance Shaders',
                'memory': self._get_mps_memory(),
                'device_index': 0
            }
            self.available_devices.append(mps_device)
        
        # Check for Intel GPU (oneAPI)
        if hasattr(torch, 'xpu') and torch.xpu.is_available():
            xpu_device = {
                'type': 'xpu',
                'name': torch.xpu.get_device_name(0),
                'memory': torch.xpu.get_device_properties(0).total_memory,
                'device_index': 0
            }
            self.available_devices.append(xpu_device)
        
        # Fallback to CPU
        cpu_device = {
            'type': 'cpu',
            'name': platform.processor(),
            'memory': self._get_cpu_memory(),
            'device_index': 0
        }
        self.available_devices.append(cpu_device)
        
        self.preferred_device = self._select_preferred_device()
    
    def _get_mps_memory(self):
        try:
            result = subprocess.run(['sysctl', 'hw.memsize'], 
                                  capture_output=True, text=True)
            if result.returncode == 0:
                memory_bytes = int(result.stdout.split(':')[1].strip())
                return memory_bytes
        except Exception:
            pass
        return 8 * 1024 * 1024 * 1024
    
    def _get_cpu_memory(self):
        # psutil is a third-party dependency (pip install psutil).
        import psutil
        return psutil.virtual_memory().total
    
    def _select_preferred_device(self):
        priority_order = ['cuda', 'rocm', 'xpu', 'mps', 'cpu']
        for device_type in priority_order:
            for device in self.available_devices:
                if device['type'] == device_type:
                    return device
        return self.available_devices[-1]
    
    def get_torch_device(self):
        device_type = self.preferred_device['type']
        if device_type == 'cuda' or device_type == 'rocm':
            return torch.device('cuda:0')
        elif device_type == 'mps':
            return torch.device('mps')
        elif device_type == 'xpu':
            return torch.device('xpu:0')
        else:
            return torch.device('cpu')
This hardware detector examines the runtime environment to identify available compute devices. It checks for Nvidia CUDA support first, then AMD ROCm, Apple MPS, and Intel XPU, and always appends the CPU as a fallback. The GPU checks use PyTorch's built-in capabilities to query device availability and properties; on Apple Silicon the memory size is read from sysctl, with 8 GB assumed if that call fails. The detector maintains a list of all available devices and selects one using a fixed priority order (CUDA, ROCm, XPU, MPS, then CPU) that favors GPUs over CPUs.
The detector also gathers metadata about each device, including its name, total memory, and (for CUDA devices) compute capability. This information helps the system make decisions about model selection and batch sizing. For instance, a device with limited memory might require smaller models or reduced batch sizes. The system can also use it to warn users when their chosen model is too large for the available hardware and to suggest alternatives.
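As a rough illustration of how the memory figure could drive model selection, the sketch below estimates a model's weight footprint from its parameter count and checks it against a device's memory. The function names and the 1.2 overhead factor (headroom for activations and caches) are assumptions for this example, not measured values, and the detector reports total rather than free memory, so a real check would need to be more conservative.

```python
from typing import Optional

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def estimate_model_bytes(num_params: int, dtype: str = "float16",
                         overhead: float = 1.2) -> int:
    """Estimate memory for the weights plus an assumed overhead factor."""
    return int(num_params * BYTES_PER_PARAM[dtype] * overhead)


def fits_on_device(num_params: int, device_memory_bytes: int,
                   dtype: str = "float16") -> bool:
    return estimate_model_bytes(num_params, dtype) <= device_memory_bytes


def suggest_dtype(num_params: int, device_memory_bytes: int) -> Optional[str]:
    """Return the highest precision that fits, or None if nothing fits."""
    for dtype in ("float32", "float16", "int8"):
        if fits_on_device(num_params, device_memory_bytes, dtype):
            return dtype
    return None


if __name__ == "__main__":
    sixteen_gib = 16 * 1024 ** 3
    # A 7-billion-parameter model on a 16 GiB device under these assumptions.
    print(suggest_dtype(7_000_000_000, sixteen_gib))
```

The memory value would come from the 'memory' field of the detector's preferred_device dictionary. When suggest_dtype returns None, the system could warn the user and propose a smaller model.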
LLM INTEGRATION LAYER
The LLM integration layer provides a unified interface for working with language models regardless of their deployment location or underlying framework. This layer supports both local models running on the detected hardware and remote models accessed via APIs.
For local models, the integration layer uses the Transformers library from Hugging Face combined with PyTorch. This combination provides access to thousands of pre-trained models while leveraging the hardware abstraction we built earlier. The layer handles model loading, tokenization, inference, and memory management.
For remote models, the integration layer supports various API protocols including OpenAI's API format, Anthropic's Claude API, and custom endpoints. It manages authentication, request formatting, rate limiting, and error handling. The abstraction ensures that higher-level components can work with any LLM without knowing implementation details.
Here is the core LLM integration interface:
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Union
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
import requests
import json
import time

class LLMInterface(ABC):
    @abstractmethod
    def generate(self, prompt: str, max_tokens: int = 2048, 
                temperature: float = 0.7, top_p: float = 0.9) -> str:
        pass
    
    @abstractmethod
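    # Note: the builtin any() is a function, not a type, so the values are
    # annotated as object here (typing.Any would also work with an import).
    def get_model_info(self) -> Dict[str, object]: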
        pass

class LocalLLM(LLMInterface):
    def __init__(self, model_name: str, hardware_detector: HardwareDetector):
        self.model_name = model_name
        self.device = hardware_detector.get_torch_device()
        self.hardware_info = hardware_detector.preferred_device
        
        print(f"Loading model {model_name} on device {self.device}")
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        
        load_kwargs = {
            'torch_dtype': torch.float16 if self.device.type != 'cpu' else torch.float32,
            'low_cpu_mem_usage': True
        }
        
        if self.device.type == 'cuda':
            load_kwargs['device_map'] = 'auto'
        
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name, 
            **load_kwargs
        )
        
        if 'device_map' not in load_kwargs:
            self.model = self.model.to(self.device)
        
        self.model.eval()
        
        if self.device.type == 'cuda':
            try:
                self.model = torch.compile(self.model, mode='reduce-overhead')
            except Exception:
                pass
    
    def generate(self, prompt: str, max_tokens: int = 2048,
                temperature: float = 0.7, top_p: float = 0.9) -> str:
        inputs = self.tokenizer(prompt, return_tensors='pt', 
                              truncation=True, max_length=4096)
        inputs = {k: v.to(self.device) for k, v in inputs.items()}
        
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_new_tokens=max_tokens,
                temperature=temperature,
                top_p=top_p,
                do_sample=True,
                pad_token_id=self.tokenizer.eos_token_id
            )
        
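        # Decode only the newly generated tokens. Stripping the prompt by
        # comparing decoded text is fragile because decoding does not always
        # reproduce the prompt string exactly.
        prompt_length = inputs['input_ids'].shape[1]
        response = self.tokenizer.decode(outputs[0][prompt_length:],
                                         skip_special_tokens=True)
        
        return response.strip()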
    
    def get_model_info(self) -> Dict[str, object]:
        return {
            'model_name': self.model_name,
            'device': str(self.device),
            'hardware': self.hardware_info,
            'type': 'local'
        }

class RemoteLLM(LLMInterface):
    def __init__(self, api_endpoint: str, api_key: str, 
                model_name: str, provider: str = 'openai'):
        self.api_endpoint = api_endpoint
        self.api_key = api_key
        self.model_name = model_name
        self.provider = provider
        self.session = requests.Session()
        self.session.headers.update({
            'Authorization': f'Bearer {api_key}',
            'Content-Type': 'application/json'
        })
    
    def generate(self, prompt: str, max_tokens: int = 2048,
                temperature: float = 0.7, top_p: float = 0.9) -> str:
        if self.provider == 'openai':
            return self._generate_openai(prompt, max_tokens, temperature, top_p)
        elif self.provider == 'anthropic':
            return self._generate_anthropic(prompt, max_tokens, temperature, top_p)
        else:
            return self._generate_generic(prompt, max_tokens, temperature, top_p)
    
    def _generate_openai(self, prompt: str, max_tokens: int,
                        temperature: float, top_p: float) -> str:
        payload = {
            'model': self.model_name,
            'messages': [{'role': 'user', 'content': prompt}],
            'max_tokens': max_tokens,
            'temperature': temperature,
            'top_p': top_p
        }
        
        max_retries = 3
        for attempt in range(max_retries):
            try:
                response = self.session.post(
                    f'{self.api_endpoint}/chat/completions',
                    json=payload,
                    timeout=120
                )
                response.raise_for_status()
                result = response.json()
                return result['choices'][0]['message']['content']
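            except requests.exceptions.RequestException as e:
                if attempt == max_retries - 1:
                    # Chain the original error so the root cause stays visible.
                    raise RuntimeError(f"API request failed: {e}") from e
                # Exponential backoff: wait 1s, then 2s, before retrying.
                time.sleep(2 ** attempt)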
    
    def _generate_anthropic(self, prompt: str, max_tokens: int,
                           temperature: float, top_p: float) -> str:
        payload = {
            'model': self.model_name,
            'messages': [{'role': 'user', 'content': prompt}],
            'max_tokens': max_tokens,
            'temperature': temperature,
            'top_p': top_p
        }
        
        headers = {
            'x-api-key': self.api_key,
            'anthropic-version': '2023-06-01',
            'Content-Type': 'application/json'
        }
        
        response = requests.post(
            f'{self.api_endpoint}/messages',
            headers=headers,
            json=payload,
            timeout=120
        )
        response.raise_for_status()
        result = response.json()
        return result['content'][0]['text']
    
    def _generate_generic(self, prompt: str, max_tokens: int,
                         temperature: float, top_p: float) -> str:
        payload = {
            'prompt': prompt,
            'max_tokens': max_tokens,
            'temperature': temperature,
            'top_p': top_p
        }
        
        response = self.session.post(self.api_endpoint, json=payload, timeout=120)
        response.raise_for_status()
        result = response.json()
        
        if 'text' in result:
            return result['text']
        elif 'response' in result:
            return result['response']
        elif 'output' in result:
            return result['output']
        else:
            raise ValueError(f"Unknown response format: {result}")
    
    def get_model_info(self) -> Dict[str, object]:
        return {
            'model_name': self.model_name,
            'provider': self.provider,
            'endpoint': self.api_endpoint,
            'type': 'remote'
        }
This integration layer defines an abstract interface that all LLM implementations must satisfy. The LocalLLM class implements this interface for models running on local hardware. It uses the hardware detector to select the appropriate device, loads the model with optimizations specific to that device, and provides a generate method that handles tokenization, inference, and decoding.
The RemoteLLM class implements the same interface for API-based models. It supports multiple providers including OpenAI and Anthropic, with a generic fallback for custom endpoints. The class handles authentication, request formatting, and error handling. As written, only the OpenAI path retries failed requests, making up to three attempts with exponential backoff; the Anthropic and generic paths make a single attempt and raise on failure. Retrying gives some robustness against transient network issues and rate limiting.
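In the listing above, the retry loop lives inside the OpenAI request method. One way to share that behavior across providers is a small helper that wraps any callable. The sketch below is an assumption about how this could be factored out, not part of the original implementation; the name call_with_backoff and its parameters are hypothetical. The sleep function is injectable so the helper can be exercised without real delays.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(func: Callable[[], T],
                      max_retries: int = 3,
                      base_delay: float = 1.0,
                      retry_on: Tuple[Type[BaseException], ...] = (Exception,),
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, retrying on the given exceptions with exponential backoff.

    Delays grow as base_delay * 2**attempt (1s, 2s, 4s, ... by default).
    The last failure is re-raised unchanged.
    """
    for attempt in range(max_retries):
        try:
            return func()
        except retry_on:
            if attempt == max_retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("max_retries must be at least 1")


if __name__ == "__main__":
    recorded_delays = []
    attempts = {"count": 0}

    def flaky_request() -> str:
        # Fails twice, then succeeds, to simulate a transient outage.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise ConnectionError("transient failure")
        return "ok"

    # Record the delays instead of sleeping so the demo runs instantly.
    print(call_with_backoff(flaky_request, sleep=recorded_delays.append))
    print(recorded_delays)
```

With a helper like this, the Anthropic and generic request methods could wrap their HTTP call in a lambda and pass requests.exceptions.RequestException as retry_on, which would give all three providers the same retry behavior.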
Both implementations expose a get_model_info method that returns metadata about the model and its deployment. As shown, this covers the model name, the device or endpoint, and whether the model is local or remote. Extending it with fields such as the context window size would let higher-level components adjust prompt lengths or select different generation parameters based on model capabilities.
KNOWLEDGE BASE STRUCTURE
While LLMs possess broad knowledge about algorithms, design patterns, and architectural tactics, a structured knowledge base enhances the system's ability to provide contextually relevant recommendations. The knowledge base does not replace the LLM's knowledge but complements it by providing structured metadata, relationships, and context-specific information.
The knowledge base organizes information into three main categories: algorithms, design patterns, and architectural tactics. Each entry includes a detailed description, applicability contexts, performance characteristics, implementation complexity, related solutions, and example use cases.
For algorithms, the knowledge base captures computational complexity, resource requirements, applicability conditions, and trade-offs. This includes not just sorting algorithms but graph algorithms, string matching algorithms, numerical algorithms, cryptographic algorithms, compression algorithms, and many others. For design patterns, it records the problem the pattern solves, the structure of the solution, consequences of using the pattern, and relationships to other patterns across creational, structural, and behavioral categories. For architectural tactics, it documents quality attributes the tactic addresses, trade-offs involved, and architectural contexts where the tactic applies.
Here is the knowledge base schema and implementation:
from dataclasses import dataclass, field
from typing import List, Dict, Set, Optional
from enum import Enum
import json

class DifficultyLevel(Enum):
    TRIVIAL = 1
    EASY = 2
    MODERATE = 3
    CHALLENGING = 4
    EXPERT = 5

class QualityAttribute(Enum):
    PERFORMANCE = "performance"
    SCALABILITY = "scalability"
    AVAILABILITY = "availability"
    SECURITY = "security"
    MAINTAINABILITY = "maintainability"
    TESTABILITY = "testability"
    USABILITY = "usability"
    MODIFIABILITY = "modifiability"

@dataclass
class ApplicabilityContext:
    problem_domain: List[str]
    scale_characteristics: List[str]
    performance_requirements: List[str]
    resource_constraints: List[str]
    quality_attributes: List[QualityAttribute]
    preferred_scenarios: List[str]
    avoid_scenarios: List[str]
    environmental_factors: List[str]

@dataclass
class Algorithm:
    name: str
    category: str
    description: str
    complexity_analysis: str
    applicability: ApplicabilityContext
    implementation_difficulty: DifficultyLevel
    related_algorithms: List[str]
    tags: Set[str]
    example_use_cases: List[str]
    trade_offs: List[str]
    
@dataclass
class DesignPattern:
    name: str
    category: str
    intent: str
    problem: str
    solution: str
    structure: str
    participants: List[str]
    consequences: List[str]
    applicability: ApplicabilityContext
    implementation_difficulty: DifficultyLevel
    related_patterns: List[str]
    tags: Set[str]
    example_use_cases: List[str]
    known_uses: List[str]

@dataclass
class ArchitecturalTactic:
    name: str
    quality_attributes: List[QualityAttribute]
    description: str
    mechanism: str
    trade_offs: List[str]
    applicability: ApplicabilityContext
    implementation_difficulty: DifficultyLevel
    related_tactics: List[str]
    tags: Set[str]
    example_use_cases: List[str]
    architectural_impact: str

class KnowledgeBase:
    def __init__(self):
        self.algorithms: Dict[str, Algorithm] = {}
        self.design_patterns: Dict[str, DesignPattern] = {}
        self.architectural_tactics: Dict[str, ArchitecturalTactic] = {}
        self._initialize_knowledge()
    
    def _initialize_knowledge(self):
        self._add_algorithms()
        self._add_design_patterns()
        self._add_architectural_tactics()
    
    def _add_algorithms(self):
        dijkstra = Algorithm(
            name="Dijkstra's Algorithm",
            category="Graph Algorithms",
            description="Finds shortest paths from a source vertex to all other vertices in a weighted graph with non-negative edge weights using a greedy approach with a priority queue.",
            complexity_analysis="Time complexity O((V + E) log V) with binary heap, O(V^2) with array. Space complexity O(V) for distance array and priority queue.",
            applicability=ApplicabilityContext(
                problem_domain=["Routing", "Network optimization", "Path finding", "Resource allocation"],
                scale_characteristics=["Small to medium graphs", "Sparse or dense graphs"],
                performance_requirements=["Optimal shortest paths required", "Single-source paths needed"],
                resource_constraints=["Memory for priority queue acceptable"],
                quality_attributes=[QualityAttribute.PERFORMANCE],
                preferred_scenarios=[
                    "Finding shortest paths in road networks",
                    "Network routing protocols",
                    "GPS navigation systems",
                    "When all edge weights are non-negative",
                    "When single-source shortest paths are needed"
                ],
                avoid_scenarios=[
                    "Graphs with negative edge weights",
                    "All-pairs shortest paths needed",
                    "When approximate solutions suffice",
                    "Extremely large graphs where memory is constrained"
                ],
                environmental_factors=["Static or slowly changing graphs"]
            ),
            implementation_difficulty=DifficultyLevel.MODERATE,
            related_algorithms=["Bellman-Ford", "A* Search", "Floyd-Warshall"],
            tags={"graph", "shortest-path", "greedy", "optimization"},
            example_use_cases=[
                "GPS navigation finding shortest route between locations",
                "Network routers determining optimal packet paths",
                "Game AI pathfinding on weighted terrain"
            ],
            trade_offs=[
                "Cannot handle negative edge weights unlike Bellman-Ford",
                "More complex than BFS but handles weighted graphs",
                "Single-source only, use Floyd-Warshall for all-pairs"
            ]
        )
        self.algorithms["dijkstra"] = dijkstra
        
        bloom_filter = Algorithm(
            name="Bloom Filter",
            category="Probabilistic Data Structures",
            description="A space-efficient probabilistic data structure for testing set membership with possible false positives but no false negatives.",
            complexity_analysis="Time complexity O(k) for insertion and lookup where k is number of hash functions. Space complexity O(m) where m is bit array size, much smaller than storing actual elements.",
            applicability=ApplicabilityContext(
                problem_domain=["Caching", "Databases", "Network systems", "Spell checking"],
                scale_characteristics=["Large datasets", "Memory-constrained environments"],
                performance_requirements=["Fast membership testing", "Low memory footprint"],
                resource_constraints=["Limited memory", "High throughput needed"],
                quality_attributes=[QualityAttribute.PERFORMANCE, QualityAttribute.SCALABILITY],
                preferred_scenarios=[
                    "Testing if element might be in large set",
                    "Reducing expensive disk or network lookups",
                    "Web crawlers avoiding duplicate URLs",
                    "Database query optimization",
                    "When false positives are acceptable"
                ],
                avoid_scenarios=[
                    "When false positives are unacceptable",
                    "When deletion is required",
                    "When exact membership must be determined",
                    "Small datasets where hash table fits in memory"
                ],
                environmental_factors=["Read-heavy workloads", "Append-only scenarios"]
            ),
            implementation_difficulty=DifficultyLevel.EASY,
            related_algorithms=["Counting Bloom Filter", "Cuckoo Filter", "Hash Table"],
            tags={"probabilistic", "hashing", "space-efficient", "membership-testing"},
            example_use_cases=[
                "Web browsers checking malicious URLs against blacklist",
                "Databases avoiding expensive disk reads for non-existent keys",
                "Distributed systems reducing network calls"
            ],
            trade_offs=[
                "False positives possible, must verify with authoritative source",
                "Cannot delete elements from standard Bloom filter",
                "Extremely space-efficient compared to hash tables"
            ]
        )
        self.algorithms["bloom_filter"] = bloom_filter
        
        raft = Algorithm(
            name="Raft Consensus Algorithm",
            category="Distributed Systems",
            description="A consensus algorithm for managing replicated logs in distributed systems, designed to be more understandable than Paxos while providing equivalent guarantees.",
            complexity_analysis="Message complexity O(n) per operation in normal case where n is number of nodes. Handles network partitions and node failures with leader election.",
            applicability=ApplicabilityContext(
                problem_domain=["Distributed databases", "Configuration management", "Coordination services"],
                scale_characteristics=["Small to medium clusters", "Typically 3-7 nodes"],
                performance_requirements=["Strong consistency", "Fault tolerance", "Availability while a majority of nodes is reachable"],
                resource_constraints=["Network bandwidth for log replication"],
                quality_attributes=[QualityAttribute.AVAILABILITY, QualityAttribute.SCALABILITY],
                preferred_scenarios=[
                    "Building distributed databases requiring strong consistency",
                    "Configuration stores like etcd",
                    "Coordination services",
                    "When understandability matters for implementation",
                    "When you need proven consensus algorithm"
                ],
                avoid_scenarios=[
                    "Single-node systems",
                    "When eventual consistency suffices",
                    "Very large clusters",
                    "When network partitions are extremely rare"
                ],
                environmental_factors=["Unreliable networks", "Node failures expected"]
            ),
            implementation_difficulty=DifficultyLevel.EXPERT,
            related_algorithms=["Paxos", "Multi-Paxos", "Viewstamped Replication"],
            tags={"distributed", "consensus", "replication", "fault-tolerance"},
            example_use_cases=[
                "etcd distributed configuration store",
                "Distributed databases like CockroachDB",
                "Kubernetes control plane coordination"
            ],
            trade_offs=[
                "Simpler to understand than Paxos",
                "Requires majority for progress, unavailable if majority fails",
                "Higher latency than eventual consistency approaches"
            ]
        )
        self.algorithms["raft"] = raft
    
    def _add_design_patterns(self):
        circuit_breaker = DesignPattern(
            name="Circuit Breaker",
            category="Resilience Patterns",
            intent="Prevent cascading failures in distributed systems by detecting failures and preventing calls to failing services.",
            problem="When a remote service fails, clients continue making requests that are doomed to fail, wasting resources and potentially causing cascading failures throughout the system.",
            solution="Wrap remote calls in a circuit breaker object that monitors for failures. After a threshold of failures, the circuit opens and immediately returns errors without attempting the call. After a timeout, it enters half-open state to test if service recovered.",
            structure="Circuit breaker maintains state (Closed, Open, Half-Open) and failure count. Closed state allows calls through. Open state fails fast. Half-Open state allows limited test calls.",
            participants=["Circuit Breaker", "Protected Service", "Client"],
            consequences=[
                "Prevents resource exhaustion from repeated failed calls",
                "Allows failing services time to recover",
                "Provides fast failure rather than waiting for timeouts",
                "Adds complexity to service interaction",
                "Requires tuning of failure thresholds and timeouts"
            ],
            applicability=ApplicabilityContext(
                problem_domain=["Microservices", "Distributed systems", "API integration"],
                scale_characteristics=["Multiple service dependencies", "High request volumes"],
                performance_requirements=["Fast failure detection", "Resilience to cascading failures"],
                resource_constraints=["Limited connection pools", "Thread pool exhaustion risks"],
                quality_attributes=[QualityAttribute.AVAILABILITY, QualityAttribute.PERFORMANCE],
                preferred_scenarios=[
                    "Calling external services that may fail",
                    "Microservice architectures with many dependencies",
                    "Systems requiring high availability",
                    "When cascading failures are a risk",
                    "When you need graceful degradation"
                ],
                avoid_scenarios=[
                    "Single monolithic application",
                    "When all failures should be retried indefinitely",
                    "Local in-process calls",
                    "When failure detection overhead is unacceptable"
                ],
                environmental_factors=["Unreliable networks", "External service dependencies"]
            ),
            implementation_difficulty=DifficultyLevel.MODERATE,
            related_patterns=["Retry", "Bulkhead", "Timeout"],
            tags={"resilience", "fault-tolerance", "distributed-systems", "microservices"},
            example_use_cases=[
                "E-commerce site protecting against payment gateway failures",
                "Microservices preventing cascading failures",
                "Mobile apps handling unreliable network conditions"
            ],
            known_uses=["Netflix Hystrix", "Resilience4j", "Polly"]
        )
        self.design_patterns["circuit_breaker"] = circuit_breaker
        
        cqrs = DesignPattern(
            name="CQRS (Command Query Responsibility Segregation)",
            category="Architectural Patterns",
            intent="Separate read and write operations into different models to optimize each independently.",
            problem="In complex domains, a single model serving both reads and writes becomes convoluted, with conflicting requirements for queries and updates.",
            solution="Split the model into separate command (write) and query (read) models. Commands modify state, queries return data. Each model can be optimized for its specific purpose.",
            structure="Command model handles writes with domain logic and validation. Query model provides optimized read views, potentially denormalized. Event bus or database replication synchronizes models.",
            participants=["Command Model", "Query Model", "Command Handlers", "Query Handlers", "Synchronization Mechanism"],
            consequences=[
                "Enables independent scaling of reads and writes",
                "Allows different data models optimized for each purpose",
                "Increases complexity with two models to maintain",
                "Introduces eventual consistency between models",
                "Simplifies complex domain models"
            ],
            applicability=ApplicabilityContext(
                problem_domain=["Complex business domains", "High-traffic applications", "Event-driven systems"],
                scale_characteristics=["Read-heavy or write-heavy workloads", "Large scale systems"],
                performance_requirements=["Independent scaling", "Optimized queries", "High throughput"],
                resource_constraints=["Different storage requirements for reads and writes"],
                quality_attributes=[QualityAttribute.SCALABILITY, QualityAttribute.PERFORMANCE, QualityAttribute.MAINTAINABILITY],
                preferred_scenarios=[
                    "Complex domains with different read and write needs",
                    "Systems with vastly different read and write loads",
                    "When you need to scale reads and writes independently",
                    "Event-sourced systems",
                    "When query optimization is critical"
                ],
                avoid_scenarios=[
                    "Simple CRUD applications",
                    "When strong consistency is required everywhere",
                    "Small applications where complexity overhead is high",
                    "When development team is small"
                ],
                environmental_factors=["Distributed systems", "Microservices architectures"]
            ),
            implementation_difficulty=DifficultyLevel.CHALLENGING,
            related_patterns=["Event Sourcing", "Domain-Driven Design", "Materialized View"],
            tags={"architectural", "scalability", "domain-driven-design", "separation-of-concerns"},
            example_use_cases=[
                "E-commerce platforms with complex product catalogs and heavy browsing",
                "Financial systems with audit requirements and complex reporting",
                "Social media platforms with different read and write patterns"
            ],
            known_uses=["Microsoft Azure", "EventStore", "Axon Framework"]
        )
        self.design_patterns["cqrs"] = cqrs
        
        saga = DesignPattern(
            name="Saga Pattern",
            category="Distributed Transactions",
            intent="Manage distributed transactions across multiple services using a sequence of local transactions with compensating actions.",
            problem="Traditional ACID transactions don't work well across microservices. Two-phase commit is complex and reduces availability.",
            solution="Break distributed transaction into sequence of local transactions. Each step publishes event triggering next step. If step fails, execute compensating transactions to undo previous steps.",
            structure="Saga coordinator orchestrates steps. Each step is a local transaction with corresponding compensating transaction. Can be choreographed (event-driven) or orchestrated (central coordinator).",
            participants=["Saga Coordinator", "Participating Services", "Compensating Transactions", "Event Bus"],
            consequences=[
                "Enables distributed transactions without locking",
                "Maintains availability during failures",
                "Requires compensating logic for rollback",
                "Eventual consistency rather than immediate",
                "More complex than local transactions"
            ],
            applicability=ApplicabilityContext(
                problem_domain=["Microservices", "Distributed systems", "E-commerce", "Booking systems"],
                scale_characteristics=["Multiple services involved in transaction", "Long-running processes"],
                performance_requirements=["High availability", "No distributed locking"],
                resource_constraints=["Cannot use two-phase commit", "Services must be autonomous"],
                quality_attributes=[QualityAttribute.AVAILABILITY, QualityAttribute.SCALABILITY],
                preferred_scenarios=[
                    "Distributed transactions across microservices",
                    "Long-running business processes",
                    "When two-phase commit is not feasible",
                    "E-commerce order processing",
                    "Travel booking systems"
                ],
                avoid_scenarios=[
                    "Single database transactions",
                    "When ACID guarantees are absolutely required",
                    "Simple workflows without failure scenarios",
                    "When compensating logic is impossible"
                ],
                environmental_factors=["Microservices architecture", "Event-driven systems"]
            ),
            implementation_difficulty=DifficultyLevel.CHALLENGING,
            related_patterns=["Event Sourcing", "CQRS", "Process Manager"],
            tags={"distributed-transactions", "microservices", "eventual-consistency", "resilience"},
            example_use_cases=[
                "E-commerce order processing across inventory, payment, and shipping services",
                "Travel booking coordinating flights, hotels, and car rentals",
                "Banking transfers across different account systems"
            ],
            known_uses=["Uber trip management", "Amazon order processing"]
        )
        self.design_patterns["saga"] = saga
    
    def _add_architectural_tactics(self):
        load_balancing = ArchitecturalTactic(
            name="Load Balancing",
            quality_attributes=[QualityAttribute.PERFORMANCE, QualityAttribute.SCALABILITY, QualityAttribute.AVAILABILITY],
            description="Distribute incoming requests across multiple server instances to optimize resource utilization, maximize throughput, minimize response time, and avoid overload.",
            mechanism="Load balancer sits between clients and servers, distributing requests using algorithms like round-robin, least connections, weighted distribution, or consistent hashing.",
            trade_offs=[
                "Adds complexity with additional component",
                "Load balancer can become single point of failure",
                "Session affinity complicates stateful applications",
                "Dramatically improves scalability and availability"
            ],
            applicability=ApplicabilityContext(
                problem_domain=["Web applications", "API services", "Microservices", "Databases"],
                scale_characteristics=["Multiple server instances", "High traffic volumes"],
                performance_requirements=["High throughput", "Low latency", "Even resource utilization"],
                resource_constraints=["Multiple servers available"],
                quality_attributes=[QualityAttribute.PERFORMANCE, QualityAttribute.SCALABILITY, QualityAttribute.AVAILABILITY],
                preferred_scenarios=[
                    "Horizontal scaling of stateless services",
                    "High-availability requirements",
                    "Traffic spikes requiring elastic scaling",
                    "Geographically distributed users",
                    "When you have multiple server instances"
                ],
                avoid_scenarios=[
                    "Single server deployments",
                    "When all requests must go to same instance",
                    "Extremely low latency requirements where hop is unacceptable",
                    "Very small scale applications"
                ],
                environmental_factors=["Cloud deployments", "Container orchestration platforms"]
            ),
            implementation_difficulty=DifficultyLevel.EASY,
            related_tactics=["Caching", "Horizontal Scaling", "Service Discovery"],
            tags={"scalability", "performance", "availability", "distribution"},
            example_use_cases=[
                "Web application distributing requests across multiple servers",
                "API gateway routing to microservice instances",
                "Database read replicas for query distribution"
            ],
            architectural_impact="Requires stateless services or session management. Affects deployment architecture and monitoring strategies."
        )
        self.architectural_tactics["load_balancing"] = load_balancing
        
        caching = ArchitecturalTactic(
            name="Caching",
            quality_attributes=[QualityAttribute.PERFORMANCE],
            description="Store frequently accessed data in fast-access storage to reduce latency and load on backend systems.",
            mechanism="Cache layer stores copies of data closer to consumers. Can be client-side, server-side, or distributed. Uses eviction policies like LRU, LFU, or TTL to manage cache size.",
            trade_offs=[
                "Improves read performance significantly",
                "Reduces load on backend systems",
                "Introduces cache invalidation complexity",
                "Stale data possible with inconsistent cache",
                "Additional memory requirements"
            ],
            applicability=ApplicabilityContext(
                problem_domain=["Web applications", "Databases", "APIs", "Content delivery"],
                scale_characteristics=["Read-heavy workloads", "Repeated access patterns"],
                performance_requirements=["Low latency", "High throughput", "Reduced backend load"],
                resource_constraints=["Memory available for cache", "Network latency to backend"],
                quality_attributes=[QualityAttribute.PERFORMANCE, QualityAttribute.SCALABILITY],
                preferred_scenarios=[
                    "Frequently accessed data that changes infrequently",
                    "Expensive computations or queries",
                    "High read-to-write ratios",
                    "Geographically distributed users",
                    "Reducing database load"
                ],
                avoid_scenarios=[
                    "Data that must always be fresh",
                    "Write-heavy workloads",
                    "When cache invalidation is extremely complex",
                    "Unique queries with no reuse"
                ],
                environmental_factors=["Distributed systems", "Content delivery networks"]
            ),
            implementation_difficulty=DifficultyLevel.MODERATE,
            related_tactics=["Content Delivery Network", "Materialized Views", "Read Replicas"],
            tags={"performance", "latency", "scalability", "optimization"},
            example_use_cases=[
                "Web application caching database query results",
                "CDN caching static assets close to users",
                "API response caching for repeated requests"
            ],
            architectural_impact="Requires cache invalidation strategy. Affects consistency guarantees and data freshness."
        )
        self.architectural_tactics["caching"] = caching
        
        rate_limiting = ArchitecturalTactic(
            name="Rate Limiting",
            quality_attributes=[QualityAttribute.AVAILABILITY, QualityAttribute.SECURITY],
            description="Control the rate of requests to prevent resource exhaustion, ensure fair usage, and protect against abuse.",
            mechanism="Track request rates per client, IP, or API key. Reject or queue requests exceeding limits. Algorithms include token bucket, leaky bucket, fixed window, or sliding window.",
            trade_offs=[
                "Protects system from overload and abuse",
                "Ensures fair resource allocation",
                "May reject legitimate requests during spikes",
                "Adds latency for rate checking",
                "Requires distributed coordination in scaled systems"
            ],
            applicability=ApplicabilityContext(
                problem_domain=["APIs", "Web services", "Microservices", "Public endpoints"],
                scale_characteristics=["Shared resources", "Multi-tenant systems"],
                performance_requirements=["Prevent resource exhaustion", "Fair usage"],
                resource_constraints=["Limited backend capacity", "Shared infrastructure"],
                quality_attributes=[QualityAttribute.AVAILABILITY, QualityAttribute.SECURITY, QualityAttribute.PERFORMANCE],
                preferred_scenarios=[
                    "Public APIs with usage tiers",
                    "Protecting backend from overload",
                    "Multi-tenant SaaS applications",
                    "Preventing abuse and DDoS attacks",
                    "Ensuring fair resource allocation"
                ],
                avoid_scenarios=[
                    "Internal services with trusted clients",
                    "When all traffic is legitimate and predictable",
                    "Single-tenant dedicated systems",
                    "When latency overhead is unacceptable"
                ],
                environmental_factors=["Public internet exposure", "Untrusted clients"]
            ),
            implementation_difficulty=DifficultyLevel.MODERATE,
            related_tactics=["Throttling", "Circuit Breaker", "API Gateway"],
            tags={"availability", "security", "resource-management", "fairness"},
            example_use_cases=[
                "API limiting requests per API key",
                "Web application preventing brute force attacks",
                "SaaS platform enforcing usage tiers"
            ],
            architectural_impact="Requires distributed state management for rate counters. Affects API design and client error handling."
        )
        self.architectural_tactics["rate_limiting"] = rate_limiting
    
    def search(self, query: str, category: Optional[str] = None) -> Dict[str, List]:
        query_lower = query.lower()
        results = {
            'algorithms': [],
            'design_patterns': [],
            'architectural_tactics': []
        }
        
        if category is None or category == 'algorithms':
            for algo in self.algorithms.values():
                if self._matches_query(algo, query_lower):
                    results['algorithms'].append(algo)
        
        if category is None or category == 'design_patterns':
            for pattern in self.design_patterns.values():
                if self._matches_query(pattern, query_lower):
                    results['design_patterns'].append(pattern)
        
        if category is None or category == 'architectural_tactics':
            for tactic in self.architectural_tactics.values():
                if self._matches_query(tactic, query_lower):
                    results['architectural_tactics'].append(tactic)
        
        return results
    
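    def _matches_query(self, item, query: str) -> bool:
        # Design patterns are constructed with an intent instead of a description,
        # so fall back to intent rather than assuming item.description exists.
        description = getattr(item, 'description', '') or getattr(item, 'intent', '')
        searchable_text = f"{item.name} {description}".lower()
        
        if hasattr(item, 'tags'):
            searchable_text += " " + " ".join(item.tags).lower()
        
        if hasattr(item, 'category'):
            searchable_text += " " + item.category.lower()
        
        if hasattr(item, 'applicability'):
            app = item.applicability
            searchable_text += " " + " ".join(app.problem_domain).lower()
            searchable_text += " " + " ".join(app.preferred_scenarios).lower()
        
        # Plain substring test: the whole (already lower-cased) query must occur verbatim.
        return query in searchable_text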
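This knowledge base implementation provides a structured repository of algorithms, design patterns, and architectural tactics. Each entry carries rich metadata that enables context-aware matching. The search method optionally filters by category and performs a case-insensitive substring match of the whole query against each entry's name, description, tags, category, problem domains, and preferred scenarios. Because the entire query must appear verbatim, short queries such as "caching" or "consensus" match far more reliably than full-sentence problem descriptions.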
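To make the matching behavior concrete, here is a minimal sketch that contrasts the whole-query substring rule with a simple term-overlap ranking. The `Entry` class and the helper names are hypothetical stand-ins for illustration, not the knowledge base classes themselves, and term overlap is only one of several possible refinements.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Entry:
    # Hypothetical stand-in for a knowledge base item.
    name: str
    description: str
    tags: Set[str] = field(default_factory=set)

def searchable_text(entry: Entry) -> str:
    return " ".join([entry.name, entry.description, *sorted(entry.tags)]).lower()

def substring_match(entry: Entry, query: str) -> bool:
    # Same rule as _matches_query: the whole query must appear verbatim.
    return query.lower() in searchable_text(entry)

def term_overlap(entry: Entry, query: str) -> int:
    # Count distinct query words (longer than 2 characters) present in the entry.
    entry_words = set(re.findall(r"[a-z0-9]+", searchable_text(entry)))
    query_words = {w for w in re.findall(r"[a-z0-9]+", query.lower()) if len(w) > 2}
    return len(entry_words & query_words)

def rank(entries: List[Entry], query: str) -> List[Entry]:
    # Highest overlap first; entries with no overlap are dropped.
    scored = [(term_overlap(e, query), e) for e in entries]
    return [e for score, e in sorted(scored, key=lambda pair: -pair[0]) if score > 0]

entries = [
    Entry("Caching", "Store frequently accessed data in fast-access storage.",
          {"performance", "latency"}),
    Entry("Rate Limiting", "Control the rate of requests to prevent resource exhaustion.",
          {"availability", "security"}),
]
query = "reduce latency for frequently accessed data"

print(substring_match(entries[0], "caching"))   # single keyword matches
print(substring_match(entries[0], query))       # full sentence does not
print([e.name for e in rank(entries, query)])   # term overlap still finds Caching
```

Term overlap trades precision for recall: it finds entries for sentence-length queries but will also surface weak matches, so a real system would likely combine it with the extracted context.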
The knowledge base is extensible: adding an entry means constructing one more record and registering it under a key. In a production system, this would likely be backed by a database rather than in-memory storage, enabling persistence and concurrent access. Because each entry is a self-contained record, a version field could be added so that the knowledge base can evolve over time as new patterns emerge and best practices change.
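As a rough illustration of what a database-backed, versioned store might look like, the following sketch keeps each entry as a JSON payload in SQLite. The table layout and the function names `open_store`, `save_entry`, and `load_latest` are assumptions made for this example and are not part of the implementation above.

```python
import json
import sqlite3
from typing import Optional

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    # One row per (key, version); older versions are kept for history.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        " key TEXT NOT NULL,"
        " kind TEXT NOT NULL,"
        " version INTEGER NOT NULL,"
        " payload TEXT NOT NULL,"
        " PRIMARY KEY (key, version))"
    )
    return conn

def save_entry(conn: sqlite3.Connection, key: str, kind: str, payload: dict) -> int:
    # Store a new version instead of overwriting the previous one.
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM entries WHERE key = ?", (key,)
    ).fetchone()
    version = row[0] + 1
    conn.execute(
        "INSERT INTO entries (key, kind, version, payload) VALUES (?, ?, ?, ?)",
        (key, kind, version, json.dumps(payload)),
    )
    conn.commit()
    return version

def load_latest(conn: sqlite3.Connection, key: str) -> Optional[dict]:
    row = conn.execute(
        "SELECT payload FROM entries WHERE key = ? ORDER BY version DESC LIMIT 1",
        (key,),
    ).fetchone()
    return json.loads(row[0]) if row else None

conn = open_store()
save_entry(conn, "caching", "architectural_tactic", {"name": "Caching", "tags": ["performance"]})
save_entry(conn, "caching", "architectural_tactic", {"name": "Caching", "tags": ["performance", "latency"]})
print(load_latest(conn, "caching"))
```

Note that sets such as `tags` are not JSON-serializable, so they would need to be converted to lists before saving and back to sets after loading.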
CONTEXT EXTRACTION AND ANALYSIS
The context extraction component analyzes user queries to identify implicit and explicit requirements, constraints, and preferences. The goal is to capture the underlying problem characteristics that influence solution selection, not merely to match keywords against entry names.
The extractor shown here relies on regular-expression pattern matching informed by domain knowledge rather than a full natural language processing pipeline. It looks for mentions of scale, performance requirements, consistency needs, resource constraints, and quality attributes. It also identifies the problem domain and the type of solution being sought.
Here is the context extraction implementation:
import re
from typing import Dict, List, Set
from dataclasses import dataclass

@dataclass
class ExtractedContext:
    problem_type: str
    domain: List[str]
    scale_indicators: List[str]
    performance_requirements: List[str]
    quality_attributes: Set[QualityAttribute]
    constraints: List[str]
    keywords: Set[str]
    raw_query: str

class ContextExtractor:
    def __init__(self):
        # In these patterns a bare "." stands in for any single separator character,
        # so "high traffic" and "high-traffic" both match.
        self.scale_patterns = {
            'large': r'\b(large|huge|massive|millions?|billions?|petabytes?|terabytes?)\b',
            'medium': r'\b(medium|thousands?|gigabytes?|moderate)\b',
            'small': r'\b(small|tiny|few|hundreds?|megabytes?)\b',
            'distributed': r'\b(distributed|clusters?|multi-node|sharded)\b',
            'high-traffic': r'\b(high.traffic|many.requests|heavy.load)\b'
        }
        
        self.performance_patterns = {
            'low-latency': r'\b(low.latency|fast|quick|real.time|milliseconds?)\b',
            'high-throughput': r'\b(high.throughput|many.requests|bulk|batch)\b',
            'memory-efficient': r'\b(memory.efficient|low.memory|space.efficient)\b',
            'cpu-efficient': r'\b(cpu.efficient|computationally.efficient|low.overhead)\b'
        }
        
        # Stems use \w* or \w+ so that inflected forms ("testing", "reliability")
        # match as well as the bare word ("test", "encrypt").
        self.quality_patterns = {
            QualityAttribute.PERFORMANCE: r'\b(performance|fast|efficient|speed|throughput|latency)\b',
            QualityAttribute.SCALABILITY: r'\b(scal\w+|grow\w*|expand\w*|elastic\w*)\b',
            QualityAttribute.AVAILABILITY: r'\b(availab\w+|uptime|reliab\w+|fault.toleran\w+)\b',
            QualityAttribute.SECURITY: r'\b(secur\w+|auth\w*|encrypt\w*|protect\w*)\b',
            QualityAttribute.MAINTAINABILITY: r'\b(maintain\w*|readab\w+|clean|simple)\b',
            QualityAttribute.TESTABILITY: r'\b(test\w*|verif\w+|validat\w+)\b'
        }
        
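        # Plural and inflected forms are accepted; with a plain \b...\b pattern,
        # "microservices" or "persistence" would otherwise go undetected.
        self.domain_patterns = {
            'web': r'\b(web|https?|rest|apis?|browsers?|frontend|backend)\b',
            'database': r'\b(databases?|sql|nosql|quer(?:y|ies)|storage|persist\w*)\b',
            'distributed': r'\b(distributed|microservices?|clusters?|nodes?)\b',
            'real-time': r'\b(real.time|streaming|events?|messag\w+)\b',
            'security': r'\b(security|crypto\w*|auth\w*|access.control)\b',
            'networking': r'\b(networks?|networking|routing|protocols?|packets?)\b'
        }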
    
    def extract(self, query: str) -> ExtractedContext:
        query_lower = query.lower()
        
        problem_type = self._identify_problem_type(query_lower)
        domains = self._extract_domains(query_lower)
        scale = self._extract_scale(query_lower)
        performance = self._extract_performance(query_lower)
        quality_attrs = self._extract_quality_attributes(query_lower)
        constraints = self._extract_constraints(query_lower)
        keywords = self._extract_keywords(query_lower)
        
        return ExtractedContext(
            problem_type=problem_type,
            domain=domains,
            scale_indicators=scale,
            performance_requirements=performance,
            quality_attributes=quality_attrs,
            constraints=constraints,
            keywords=keywords,
            raw_query=query
        )
    
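    def _identify_problem_type(self, query: str) -> str:
        # Checks run in order, so a query mentioning both "algorithm" and
        # "architecture" is classified as an algorithm problem.
        if re.search(r'\b(algorithms?|compute|calculate|process)\b', query):
            return 'algorithm'
        elif re.search(r'\b(patterns?|design|structure|organize)\b', query):
            return 'design_pattern'
        # architect\w* also covers "architecture" and "architectural".
        elif re.search(r'\b(architect\w*|systems?|tactics?|quality)\b', query):
            return 'architectural_tactic'
        else:
            return 'general'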
    
    def _extract_domains(self, query: str) -> List[str]:
        domains = []
        for domain, pattern in self.domain_patterns.items():
            if re.search(pattern, query):
                domains.append(domain)
        return domains
    
    def _extract_scale(self, query: str) -> List[str]:
        scale = []
        for scale_type, pattern in self.scale_patterns.items():
            if re.search(pattern, query):
                scale.append(scale_type)
        return scale
    
    def _extract_performance(self, query: str) -> List[str]:
        performance = []
        for perf_type, pattern in self.performance_patterns.items():
            if re.search(pattern, query):
                performance.append(perf_type)
        return performance
    
    def _extract_quality_attributes(self, query: str) -> Set[QualityAttribute]:
        attributes = set()
        for attr, pattern in self.quality_patterns.items():
            if re.search(pattern, query):
                attributes.add(attr)
        return attributes
    
    def _extract_constraints(self, query: str) -> List[str]:
        constraints = []
        
        if re.search(r'\b(limited.memory|memory.constrained|low.memory)\b', query):
            constraints.append('limited_memory')
        if re.search(r'\b(no.locking|lock.free|non.blocking)\b', query):
            constraints.append('no_locking')
        if re.search(r'\b(eventual.consistency|eventually.consistent)\b', query):
            constraints.append('eventual_consistency')
        if re.search(r'\b(strong.consistency|strictly.consistent)\b', query):
            constraints.append('strong_consistency')
        if re.search(r'\b(stateless|no.state)\b', query):
            constraints.append('stateless')
        
        return constraints
    
    def _extract_keywords(self, query: str) -> Set[str]:
        stop_words = {'the', 'a', 'an', 'and', 'or', 'but', 'in', 'on', 'at', 'to', 'for',
                     'of', 'with', 'by', 'from', 'as', 'is', 'was', 'are', 'were', 'be',
                     'been', 'being', 'have', 'has', 'had', 'do', 'does', 'did', 'will',
                     'would', 'should', 'could', 'may', 'might', 'must', 'can', 'i', 'you',
                     'he', 'she', 'it', 'we', 'they', 'what', 'which', 'who', 'when',
                     'where', 'why', 'how', 'need', 'want', 'looking', 'find'}
        
        words = re.findall(r'\b\w+\b', query)
        keywords = {w for w in words if w not in stop_words and len(w) > 2}
        
        return keywords
The context extractor uses pattern matching to identify various contextual elements in the user query. It looks for scale indicators like mentions of large datasets or distributed systems. It identifies performance requirements such as low latency or high throughput needs. It extracts quality attributes that the user cares about, such as availability or security.
The extractor also identifies constraints that limit solution options, such as memory limitations or consistency requirements. It extracts keywords that can be used for matching against the knowledge base. All of this extracted information feeds into the recommendation engine to guide solution selection.
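The following minimal sketch shows the kind of labels this pattern matching produces for a sample query. It uses two trimmed pattern tables modeled on those in `ContextExtractor`; the helper `match_labels` and the sample query are made up for illustration.

```python
import re
from typing import Dict, List

# Trimmed pattern tables in the style of ContextExtractor (illustration only).
SCALE_PATTERNS: Dict[str, str] = {
    'large': r'\b(large|huge|massive|millions?|billions?|petabytes?|terabytes?)\b',
    'distributed': r'\b(distributed|clusters?|multi-node|sharded)\b',
}
PERFORMANCE_PATTERNS: Dict[str, str] = {
    'low-latency': r'\b(low.latency|fast|quick|real.time|milliseconds?)\b',
    'high-throughput': r'\b(high.throughput|many.requests|bulk|batch)\b',
}

def match_labels(patterns: Dict[str, str], query: str) -> List[str]:
    # Return every label whose pattern occurs somewhere in the query.
    text = query.lower()
    return [label for label, pattern in patterns.items() if re.search(pattern, text)]

query = "We store millions of records on a sharded cluster and need low-latency reads"
scale = match_labels(SCALE_PATTERNS, query)
performance = match_labels(PERFORMANCE_PATTERNS, query)
print(scale)        # ['large', 'distributed']
print(performance)  # ['low-latency']

# Word boundaries are strict: without an optional "s", the plural is missed.
print(re.search(r'\bcluster\b', 'three clusters') is None)        # True
print(re.search(r'\bclusters?\b', 'three clusters') is not None)  # True
```

The last two lines illustrate why the patterns need to allow plural and inflected forms: a word-boundary pattern for a singular term silently ignores its plural.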
PROMPT ENGINEERING AND CONSTRUCTION
The prompt engineering layer constructs effective prompts that guide the LLM toward producing high-quality, contextually relevant recommendations. This involves combining the user query, extracted context, candidate solutions from the knowledge base, and instructions for analysis and formatting.
Effective prompts provide clear instructions, relevant context, and examples of desired output format. They guide the LLM to consider trade-offs, explain reasoning, and structure responses in a useful way. The prompt construction process is critical to getting good results from the LLM.
Here is the prompt construction implementation:
class PromptConstructor:
    def __init__(self, knowledge_base: KnowledgeBase):
        self.knowledge_base = knowledge_base
    
    def construct_recommendation_prompt(self, query: str, 
                                      context: ExtractedContext,
                                      candidates: Dict[str, List]) -> str:
        prompt_parts = []
        
        prompt_parts.append("You are an expert software architect and algorithm specialist.")
        prompt_parts.append("Your task is to recommend appropriate algorithms, design patterns, and architectural tactics based on the user's problem description and context.")
        prompt_parts.append("")
        
        prompt_parts.append("USER QUERY:")
        prompt_parts.append(query)
        prompt_parts.append("")
        
        prompt_parts.append("EXTRACTED CONTEXT:")
        prompt_parts.append(f"Problem Type: {context.problem_type}")
        prompt_parts.append(f"Domains: {', '.join(context.domain) if context.domain else 'Not specified'}")
        prompt_parts.append(f"Scale: {', '.join(context.scale_indicators) if context.scale_indicators else 'Not specified'}")
        prompt_parts.append(f"Performance Requirements: {', '.join(context.performance_requirements) if context.performance_requirements else 'Not specified'}")
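        # quality_attributes is a set; sort the values so the same context always yields the same prompt.
        prompt_parts.append(f"Quality Attributes: {', '.join(sorted(qa.value for qa in context.quality_attributes)) if context.quality_attributes else 'Not specified'}")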
        prompt_parts.append(f"Constraints: {', '.join(context.constraints) if context.constraints else 'None identified'}")
        prompt_parts.append("")
        
        if candidates['algorithms']:
            prompt_parts.append("CANDIDATE ALGORITHMS:")
            for algo in candidates['algorithms'][:5]:
                prompt_parts.append(f"- {algo.name} ({algo.category}): {algo.description}")
                prompt_parts.append(f"  Complexity: {algo.complexity_analysis}")
                prompt_parts.append(f"  Best for: {', '.join(algo.applicability.preferred_scenarios[:3])}")
                prompt_parts.append("")
        
        if candidates['design_patterns']:
            prompt_parts.append("CANDIDATE DESIGN PATTERNS:")
            for pattern in candidates['design_patterns'][:5]:
                prompt_parts.append(f"- {pattern.name} ({pattern.category}): {pattern.intent}")
                prompt_parts.append(f"  Problem: {pattern.problem}")
                prompt_parts.append(f"  Best for: {', '.join(pattern.applicability.preferred_scenarios[:3])}")
                prompt_parts.append("")
        
        if candidates['architectural_tactics']:
            prompt_parts.append("CANDIDATE ARCHITECTURAL TACTICS:")
            for tactic in candidates['architectural_tactics'][:5]:
                prompt_parts.append(f"- {tactic.name}: {tactic.description}")
                prompt_parts.append(f"  Quality Attributes: {', '.join([qa.value for qa in tactic.quality_attributes])}")
                prompt_parts.append(f"  Best for: {', '.join(tactic.applicability.preferred_scenarios[:3])}")
                prompt_parts.append("")
        
        prompt_parts.append("INSTRUCTIONS:")
        prompt_parts.append("1. Analyze the user's query and extracted context carefully")
        prompt_parts.append("2. Consider the candidate solutions provided and your broader knowledge")
        prompt_parts.append("3. Recommend the most appropriate solutions that fit the specific context")
        prompt_parts.append("4. For each recommendation, explain:")
        prompt_parts.append("   - Why it fits the context")
        prompt_parts.append("   - What trade-offs are involved")
        prompt_parts.append("   - How it should be implemented or applied")
        prompt_parts.append("   - What alternatives exist and when they might be better")
        prompt_parts.append("5. Prioritize recommendations by relevance to the context")
        prompt_parts.append("6. If the context is ambiguous, provide recommendations for different interpretations")
        prompt_parts.append("7. Include practical implementation guidance")
        prompt_parts.append("")
        
        prompt_parts.append("FORMAT YOUR RESPONSE AS:")
        prompt_parts.append("SUMMARY: Brief overview of the problem and recommended approach")
        prompt_parts.append("")
        prompt_parts.append("PRIMARY RECOMMENDATIONS:")
        prompt_parts.append("For each recommendation:")
        prompt_parts.append("- Name and category")
        prompt_parts.append("- Why it fits this context")
        prompt_parts.append("- Key trade-offs")
        prompt_parts.append("- Implementation guidance")
        prompt_parts.append("")
        prompt_parts.append("ALTERNATIVE APPROACHES:")
        prompt_parts.append("Other solutions to consider with their contexts")
        prompt_parts.append("")
        prompt_parts.append("IMPLEMENTATION CONSIDERATIONS:")
        prompt_parts.append("Practical advice for applying the recommendations")
        
        return "\n".join(prompt_parts)
The prompt constructor builds a single prompt containing the user query, the extracted context, up to five candidates of each kind from the knowledge base, and detailed instructions for the LLM. Each candidate is listed with its key characteristics so that the LLM can compare the options against the context.
The instructions direct the LLM to analyze the context, explain trade-offs, give implementation guidance, and consider alternatives. The format specification asks for the response in fixed sections, which makes responses more consistent and easier to scan or post-process, although the model may still deviate from the requested format.
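Because the prompt requests a fixed set of section headings, a response can be split back into those sections afterwards. The following is a minimal sketch of one way to do that. The names split_sections and SECTION_HEADERS are hypothetical and do not appear elsewhere in this article, and the sketch assumes the model reproduces each heading verbatim at the start of a line, which an LLM will not always do.

```python
import re
from typing import Dict

# Headings requested in the "FORMAT YOUR RESPONSE AS" part of the prompt.
SECTION_HEADERS = [
    "SUMMARY",
    "PRIMARY RECOMMENDATIONS",
    "ALTERNATIVE APPROACHES",
    "IMPLEMENTATION CONSIDERATIONS",
]

def split_sections(response: str) -> Dict[str, str]:
    """Split an LLM response into the sections named in SECTION_HEADERS.

    Headings that are missing from the response are absent from the result.
    """
    pattern = re.compile(
        r"^(" + "|".join(re.escape(h) for h in SECTION_HEADERS) + r"):[ \t]*",
        re.MULTILINE,
    )
    matches = list(pattern.finditer(response))
    sections: Dict[str, str] = {}
    for i, match in enumerate(matches):
        # A section runs until the next recognized heading or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(response)
        sections[match.group(1)] = response[match.end():end].strip()
    return sections
```

A caller can then check which sections are present and fall back to showing the raw response when the model ignored the requested format.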
RECOMMENDATION ENGINE
The recommendation engine orchestrates the entire process of analyzing user queries and generating recommendations. It coordinates the context extractor, knowledge base, prompt constructor, and LLM to produce comprehensive, contextually relevant recommendations.
The engine implements the core workflow: extract context from the query, search the knowledge base for candidates, construct the prompt, invoke the LLM, and format the response. If any step raises an exception, the engine catches it and returns a result with success set to False and the error message, rather than propagating the exception to the caller.
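The workflow above can be sketched as a small function in which each stage is a plain callable. This is an illustration only: run_pipeline and its parameters are hypothetical names, and returning the locally found candidates when the LLM call fails is an assumption about what a fallback could look like, not a description of the engine in this article.

```python
from typing import Any, Callable, Dict, List

def run_pipeline(
    query: str,
    extract: Callable[[str], Dict[str, Any]],
    search: Callable[[Dict[str, Any]], List[str]],
    build_prompt: Callable[[str, Dict[str, Any], List[str]], str],
    generate: Callable[[str], str],
) -> Dict[str, Any]:
    """Run extract -> search -> prompt -> generate, degrading on LLM failure."""
    context = extract(query)
    candidates = search(context)
    prompt = build_prompt(query, context, candidates)
    try:
        llm_response = generate(prompt)
    except Exception as exc:  # assumed policy: any LLM error triggers the fallback
        return {
            "success": False,
            "error": str(exc),
            "query": query,
            # Fallback: still return the candidates found in the knowledge base.
            "candidates": candidates,
        }
    return {
        "success": True,
        "query": query,
        "candidates": candidates,
        "llm_response": llm_response,
    }
```

Keeping the stages as injected callables also makes the workflow easy to exercise with stubs, without loading a model or calling a remote API.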
Here is the recommendation engine implementation:
class RecommendationEngine:
    def __init__(self, llm: LLMInterface, knowledge_base: KnowledgeBase):
        self.llm = llm
        self.knowledge_base = knowledge_base
        self.context_extractor = ContextExtractor()
        self.prompt_constructor = PromptConstructor(knowledge_base)
    
    def get_recommendations(self, query: str) -> Dict[str, object]:
        try:
            context = self.context_extractor.extract(query)
            
            candidates = self._find_candidates(context)
            
            prompt = self.prompt_constructor.construct_recommendation_prompt(
                query, context, candidates
            )
            
            llm_response = self.llm.generate(
                prompt,
                max_tokens=3000,
                temperature=0.7,
                top_p=0.9
            )
            
            report = self._format_report(query, context, candidates, llm_response)
            
            return {
                'success': True,
                'query': query,
                'context': context,
                'candidates': candidates,
                'llm_response': llm_response,
                'report': report
            }
            
        except Exception as e:
            return {
                'success': False,
                'error': str(e),
                'query': query
            }
    
    def _find_candidates(self, context: ExtractedContext) -> Dict[str, List]:
        all_candidates = {
            'algorithms': [],
            'design_patterns': [],
            'architectural_tactics': []
        }
        
        for keyword in context.keywords:
            results = self.knowledge_base.search(keyword)
            all_candidates['algorithms'].extend(results['algorithms'])
            all_candidates['design_patterns'].extend(results['design_patterns'])
            all_candidates['architectural_tactics'].extend(results['architectural_tactics'])
        
        # Dataclass instances are unhashable by default (eq=True without frozen=True),
        # so set() would raise TypeError. Deduplicate by name instead; dicts keep
        # insertion order, so the first occurrence of each item is preserved.
        for kind in list(all_candidates):
            unique = {item.name: item for item in all_candidates[kind]}
            all_candidates[kind] = list(unique.values())
        
        scored_candidates = {
            'algorithms': self._score_and_sort(all_candidates['algorithms'], context),
            'design_patterns': self._score_and_sort(all_candidates['design_patterns'], context),
            'architectural_tactics': self._score_and_sort(all_candidates['architectural_tactics'], context)
        }
        
        return scored_candidates
    
    def _score_and_sort(self, items: List, context: ExtractedContext) -> List:
        scored_items = []
        for item in items:
            score = self._calculate_relevance_score(item, context)
            scored_items.append((score, item))
        
        scored_items.sort(reverse=True, key=lambda x: x[0])
        return [item for score, item in scored_items]
    
    def _calculate_relevance_score(self, item, context: ExtractedContext) -> float:
        score = 0.0
        
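        if hasattr(item, 'applicability'):
            app = item.applicability
            
            for domain in context.domain:
                if any(domain.lower() in problem_domain.lower() for problem_domain in app.problem_domain):
                    score += 2.0
            
            # Tactics carry quality attributes directly; algorithms and patterns
            # only list them in their applicability context.
            item_qas = getattr(item, 'quality_attributes', app.quality_attributes)
            for qa in context.quality_attributes:
                if qa in item_qas:
                    score += 1.5
            
            # DesignPattern has no 'description' field, so fall back to 'intent'.
            text = (getattr(item, 'description', None) or getattr(item, 'intent', '')).lower()
            tags = getattr(item, 'tags', set())
            for keyword in context.keywords:
                if keyword in item.name.lower():
                    score += 3.0
                if keyword in text:
                    score += 1.0
                if keyword in tags:
                    score += 1.5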
        
        return score
    
    def _format_report(self, query: str, context: ExtractedContext,
                      candidates: Dict[str, List], llm_response: str) -> str:
        report_lines = []
        
        report_lines.append("=" * 80)
        report_lines.append("ALGORITHM, DESIGN PATTERN, AND ARCHITECTURAL TACTIC FINDER")
        report_lines.append("=" * 80)
        report_lines.append("")
        
        report_lines.append("QUERY:")
        report_lines.append(query)
        report_lines.append("")
        
        report_lines.append("CONTEXT ANALYSIS:")
        report_lines.append(f"Problem Type: {context.problem_type}")
        report_lines.append(f"Domains: {', '.join(context.domain) if context.domain else 'Not specified'}")
        report_lines.append(f"Scale: {', '.join(context.scale_indicators) if context.scale_indicators else 'Not specified'}")
        report_lines.append(f"Performance: {', '.join(context.performance_requirements) if context.performance_requirements else 'Not specified'}")
        report_lines.append(f"Quality Attributes: {', '.join([qa.value for qa in context.quality_attributes]) if context.quality_attributes else 'Not specified'}")
        report_lines.append(f"Constraints: {', '.join(context.constraints) if context.constraints else 'None'}")
        report_lines.append("")
        
        report_lines.append("=" * 80)
        report_lines.append("RECOMMENDATIONS")
        report_lines.append("=" * 80)
        report_lines.append("")
        report_lines.append(llm_response)
        report_lines.append("")
        
        report_lines.append("=" * 80)
        report_lines.append("KNOWLEDGE BASE MATCHES")
        report_lines.append("=" * 80)
        report_lines.append("")
        
        if candidates['algorithms']:
            report_lines.append("ALGORITHMS:")
            for algo in candidates['algorithms'][:5]:
                report_lines.append(f"- {algo.name}")
            report_lines.append("")
        
        if candidates['design_patterns']:
            report_lines.append("DESIGN PATTERNS:")
            for pattern in candidates['design_patterns'][:5]:
                report_lines.append(f"- {pattern.name}")
            report_lines.append("")
        
        if candidates['architectural_tactics']:
            report_lines.append("ARCHITECTURAL TACTICS:")
            for tactic in candidates['architectural_tactics'][:5]:
                report_lines.append(f"- {tactic.name}")
            report_lines.append("")
        
        return "\n".join(report_lines)
The recommendation engine coordinates all components to produce comprehensive recommendations. It extracts context from the user query, searches the knowledge base for relevant candidates, scores and ranks candidates based on relevance to the context, constructs an effective prompt, invokes the LLM, and formats the final report.
The scoring mechanism considers multiple factors including domain match, quality attribute alignment, and keyword presence. This ensures that the most relevant candidates are prioritized and included in the prompt sent to the LLM. The LLM then provides deeper analysis and recommendations based on both the candidates and its broader knowledge.
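To make the weighting concrete, the sketch below re-implements only the keyword part of the score as a standalone function and applies it to two toy items. The function keyword_score and the SimpleNamespace items are illustrative stand-ins rather than part of the engine; the weights (3.0 for a name match, 1.0 for a description match, 1.5 for a tag match) are the ones used in _calculate_relevance_score.

```python
from types import SimpleNamespace
from typing import Iterable

def keyword_score(item, keywords: Iterable[str]) -> float:
    """Keyword portion of the relevance score: name 3.0, description 1.0, tag 1.5."""
    score = 0.0
    for keyword in keywords:
        if keyword in item.name.lower():
            score += 3.0
        if keyword in item.description.lower():
            score += 1.0
        if keyword in item.tags:
            score += 1.5
    return score

# Toy stand-ins for knowledge base entries (hypothetical, heavily simplified).
bloom = SimpleNamespace(
    name="Bloom Filter",
    description="Space-efficient probabilistic set membership test.",
    tags={"probabilistic", "hashing"},
)
dijkstra = SimpleNamespace(
    name="Dijkstra's Algorithm",
    description="Shortest paths in a weighted graph.",
    tags={"graph", "shortest-path"},
)

keywords = {"filter", "probabilistic"}
ranked = sorted(
    [dijkstra, bloom],
    key=lambda it: keyword_score(it, keywords),
    reverse=True,
)
print([(it.name, keyword_score(it, keywords)) for it in ranked])
```

With these keywords the Bloom filter scores 5.5 (3.0 for "filter" in the name, plus 1.0 and 1.5 for "probabilistic" in the description and tags) and Dijkstra's algorithm scores 0.0, so the Bloom filter is ranked first.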
COMPLETE RUNNING EXAMPLE
Now I will provide a complete implementation that integrates all components into a working system. This example supports both local and remote LLMs, reports errors to the caller instead of crashing, and provides a clean interface for users. Treat it as a starting point: it has not been hardened for production use.
import torch
import platform
import subprocess
import psutil
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Union, Set
from dataclasses import dataclass
from enum import Enum
import requests
import json
import time
import re
from transformers import AutoTokenizer, AutoModelForCausalLM

class DifficultyLevel(Enum):
    TRIVIAL = 1
    EASY = 2
    MODERATE = 3
    CHALLENGING = 4
    EXPERT = 5

class QualityAttribute(Enum):
    PERFORMANCE = "performance"
    SCALABILITY = "scalability"
    AVAILABILITY = "availability"
    SECURITY = "security"
    MAINTAINABILITY = "maintainability"
    TESTABILITY = "testability"
    USABILITY = "usability"
    MODIFIABILITY = "modifiability"

@dataclass
class ApplicabilityContext:
    problem_domain: List[str]
    scale_characteristics: List[str]
    performance_requirements: List[str]
    resource_constraints: List[str]
    quality_attributes: List[QualityAttribute]
    preferred_scenarios: List[str]
    avoid_scenarios: List[str]
    environmental_factors: List[str]

@dataclass
class Algorithm:
    name: str
    category: str
    description: str
    complexity_analysis: str
    applicability: ApplicabilityContext
    implementation_difficulty: DifficultyLevel
    related_algorithms: List[str]
    tags: Set[str]
    example_use_cases: List[str]
    trade_offs: List[str]

@dataclass
class DesignPattern:
    name: str
    category: str
    intent: str
    problem: str
    solution: str
    structure: str
    participants: List[str]
    consequences: List[str]
    applicability: ApplicabilityContext
    implementation_difficulty: DifficultyLevel
    related_patterns: List[str]
    tags: Set[str]
    example_use_cases: List[str]
    known_uses: List[str]

@dataclass
class ArchitecturalTactic:
    name: str
    quality_attributes: List[QualityAttribute]
    description: str
    mechanism: str
    trade_offs: List[str]
    applicability: ApplicabilityContext
    implementation_difficulty: DifficultyLevel
    related_tactics: List[str]
    tags: Set[str]
    example_use_cases: List[str]
    architectural_impact: str

@dataclass
class ExtractedContext:
    problem_type: str
    domain: List[str]
    scale_indicators: List[str]
    performance_requirements: List[str]
    quality_attributes: Set[QualityAttribute]
    constraints: List[str]
    keywords: Set[str]
    raw_query: str

class HardwareDetector:
    def __init__(self):
        self.available_devices = []
        self.preferred_device = None
        self._detect_hardware()
    
    def _detect_hardware(self):
        if torch.cuda.is_available():
            # ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API,
            # so torch.version.hip is what distinguishes ROCm from CUDA. Checking
            # it first avoids registering the same GPU twice under both labels.
            is_rocm = getattr(torch.version, 'hip', None) is not None
            gpu_device = {
                'type': 'rocm' if is_rocm else 'cuda',
                'name': torch.cuda.get_device_name(0),
                'memory': torch.cuda.get_device_properties(0).total_memory,
                'device_index': 0
            }
            if not is_rocm:
                gpu_device['compute_capability'] = torch.cuda.get_device_capability(0)
            self.available_devices.append(gpu_device)
        
        if hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
            mps_device = {
                'type': 'mps',
                'name': 'Apple Metal Performance Shaders',
                'memory': self._get_mps_memory(),
                'device_index': 0
            }
            self.available_devices.append(mps_device)
        
        if hasattr(torch, 'xpu') and torch.xpu.is_available():
            xpu_device = {
                'type': 'xpu',
                'name': torch.xpu.get_device_name(0),
                'memory': torch.xpu.get_device_properties(0).total_memory,
                'device_index': 0
            }
            self.available_devices.append(xpu_device)
        
        cpu_device = {
            'type': 'cpu',
            'name': platform.processor(),
            'memory': self._get_cpu_memory(),
            'device_index': 0
        }
        self.available_devices.append(cpu_device)
        
        self.preferred_device = self._select_preferred_device()
    
    def _get_mps_memory(self):
        try:
            result = subprocess.run(['sysctl', 'hw.memsize'], 
                                  capture_output=True, text=True)
            if result.returncode == 0:
                memory_bytes = int(result.stdout.split(':')[1].strip())
                return memory_bytes
        except Exception:
            pass
        return 8 * 1024 * 1024 * 1024
    
    def _get_cpu_memory(self):
        return psutil.virtual_memory().total
    
    def _select_preferred_device(self):
        priority_order = ['cuda', 'rocm', 'xpu', 'mps', 'cpu']
        for device_type in priority_order:
            for device in self.available_devices:
                if device['type'] == device_type:
                    return device
        return self.available_devices[-1]
    
    def get_torch_device(self):
        device_type = self.preferred_device['type']
        if device_type == 'cuda' or device_type == 'rocm':
            return torch.device('cuda:0')
        elif device_type == 'mps':
            return torch.device('mps')
        elif device_type == 'xpu':
            return torch.device('xpu:0')
        else:
            return torch.device('cpu')

class LLMInterface(ABC):
    @abstractmethod
    def generate(self, prompt: str, max_tokens: int = 2048, 
                temperature: float = 0.7, top_p: float = 0.9) -> str:
        pass
    
    @abstractmethod
    def get_model_info(self) -> Dict[str, object]:
        pass

class LocalLLM(LLMInterface):
    def __init__(self, model_name: str, hardware_detector: HardwareDetector):
        self.model_name = model_name
        self.device = hardware_detector.get_torch_device()
        self.hardware_info = hardware_detector.preferred_device
        
        print(f"Loading model {model_name} on device {self.device}")
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        
        load_kwargs = {
            'torch_dtype': torch.float16 if self.device.type != 'cpu' else torch.float32,
            'low_cpu_mem_usage': True
        }
        
        if self.device.type == 'cuda':
            load_kwargs['device_map'] = 'auto'
        
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name, 
            **load_kwargs
        )
        
        if 'device_map' not in load_kwargs:
            self.model = self.model.to(self.device)
        
        self.model.eval()
        
        if self.device.type == 'cuda':
            try:
                self.model = torch.compile(self.model, mode='reduce-overhead')
            except Exception:
                pass
    
    def generate(self, prompt: str, max_tokens: int = 2048,
                temperature: float = 0.7, top_p: float = 0.9) -> str:
        inputs = self.tokenizer(prompt, return_tensors='pt', 
                              truncation=True, max_length=4096)
        inputs = {k: v.to(self.device) for k, v in inputs.items()}
        
        with torch.no_grad():
            outputs = self.model.generate(
                **inputs,
                max_new_tokens=max_tokens,
                temperature=temperature,
                top_p=top_p,
                do_sample=True,
                pad_token_id=self.tokenizer.eos_token_id
            )
        
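        # Decode only the newly generated tokens. Stripping the prompt by string
        # prefix is unreliable because decoding does not always reproduce the
        # original prompt text exactly.
        prompt_length = inputs['input_ids'].shape[1]
        response = self.tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
        
        return response.strip()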
    
    def get_model_info(self) -> Dict[str, object]:
        return {
            'model_name': self.model_name,
            'device': str(self.device),
            'hardware': self.hardware_info,
            'type': 'local'
        }

class RemoteLLM(LLMInterface):
    def __init__(self, api_endpoint: str, api_key: str, 
                model_name: str, provider: str = 'openai'):
        self.api_endpoint = api_endpoint
        self.api_key = api_key
        self.model_name = model_name
        self.provider = provider
        self.session = requests.Session()
        self.session.headers.update({
            'Authorization': f'Bearer {api_key}',
            'Content-Type': 'application/json'
        })
    
    def generate(self, prompt: str, max_tokens: int = 2048,
                temperature: float = 0.7, top_p: float = 0.9) -> str:
        if self.provider == 'openai':
            return self._generate_openai(prompt, max_tokens, temperature, top_p)
        elif self.provider == 'anthropic':
            return self._generate_anthropic(prompt, max_tokens, temperature, top_p)
        else:
            return self._generate_generic(prompt, max_tokens, temperature, top_p)
    
    def _generate_openai(self, prompt: str, max_tokens: int,
                        temperature: float, top_p: float) -> str:
        payload = {
            'model': self.model_name,
            'messages': [{'role': 'user', 'content': prompt}],
            'max_tokens': max_tokens,
            'temperature': temperature,
            'top_p': top_p
        }
        
        max_retries = 3
        for attempt in range(max_retries):
            try:
                response = self.session.post(
                    f'{self.api_endpoint}/chat/completions',
                    json=payload,
                    timeout=120
                )
                response.raise_for_status()
                result = response.json()
                return result['choices'][0]['message']['content']
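            except requests.exceptions.RequestException as e:
                if attempt == max_retries - 1:
                    # Chain the original exception so the root cause stays visible.
                    raise RuntimeError(f"API request failed: {e}") from e
                # Exponential backoff: wait 1s, then 2s, before retrying.
                time.sleep(2 ** attempt)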
    
    def _generate_anthropic(self, prompt: str, max_tokens: int,
                           temperature: float, top_p: float) -> str:
        payload = {
            'model': self.model_name,
            'messages': [{'role': 'user', 'content': prompt}],
            'max_tokens': max_tokens,
            'temperature': temperature,
            'top_p': top_p
        }
        
        headers = {
            'x-api-key': self.api_key,
            'anthropic-version': '2023-06-01',
            'Content-Type': 'application/json'
        }
        
        response = requests.post(
            f'{self.api_endpoint}/messages',
            headers=headers,
            json=payload,
            timeout=120
        )
        response.raise_for_status()
        result = response.json()
        return result['content'][0]['text']
    
    def _generate_generic(self, prompt: str, max_tokens: int,
                         temperature: float, top_p: float) -> str:
        payload = {
            'prompt': prompt,
            'max_tokens': max_tokens,
            'temperature': temperature,
            'top_p': top_p
        }
        
        response = self.session.post(self.api_endpoint, json=payload, timeout=120)
        response.raise_for_status()
        result = response.json()
        
        if 'text' in result:
            return result['text']
        elif 'response' in result:
            return result['response']
        elif 'output' in result:
            return result['output']
        else:
            raise ValueError(f"Unknown response format: {result}")
    
    def get_model_info(self) -> Dict[str, object]:
        return {
            'model_name': self.model_name,
            'provider': self.provider,
            'endpoint': self.api_endpoint,
            'type': 'remote'
        }

class KnowledgeBase:
    def __init__(self):
        self.algorithms: Dict[str, Algorithm] = {}
        self.design_patterns: Dict[str, DesignPattern] = {}
        self.architectural_tactics: Dict[str, ArchitecturalTactic] = {}
        self._initialize_knowledge()
    
    def _initialize_knowledge(self):
        self._add_algorithms()
        self._add_design_patterns()
        self._add_architectural_tactics()
    
    def _add_algorithms(self):
        dijkstra = Algorithm(
            name="Dijkstra's Algorithm",
            category="Graph Algorithms",
            description="Finds shortest paths from a source vertex to all other vertices in a weighted graph with non-negative edge weights using a greedy approach with a priority queue.",
            complexity_analysis="Time complexity O((V + E) log V) with binary heap, O(V^2) with array. Space complexity O(V) for distance array and priority queue.",
            applicability=ApplicabilityContext(
                problem_domain=["Routing", "Network optimization", "Path finding", "Resource allocation"],
                scale_characteristics=["Small to medium graphs", "Sparse or dense graphs"],
                performance_requirements=["Optimal shortest paths required", "Single-source paths needed"],
                resource_constraints=["Memory for priority queue acceptable"],
                quality_attributes=[QualityAttribute.PERFORMANCE],
                preferred_scenarios=[
                    "Finding shortest paths in road networks",
                    "Network routing protocols",
                    "GPS navigation systems",
                    "When all edge weights are non-negative",
                    "When single-source shortest paths are needed"
                ],
                avoid_scenarios=[
                    "Graphs with negative edge weights",
                    "All-pairs shortest paths needed",
                    "When approximate solutions suffice",
                    "Extremely large graphs where memory is constrained"
                ],
                environmental_factors=["Static or slowly changing graphs"]
            ),
            implementation_difficulty=DifficultyLevel.MODERATE,
            related_algorithms=["Bellman-Ford", "A* Search", "Floyd-Warshall"],
            tags={"graph", "shortest-path", "greedy", "optimization"},
            example_use_cases=[
                "GPS navigation finding shortest route between locations",
                "Network routers determining optimal packet paths",
                "Game AI pathfinding on weighted terrain"
            ],
            trade_offs=[
                "Cannot handle negative edge weights unlike Bellman-Ford",
                "More complex than BFS but handles weighted graphs",
                "Single-source only, use Floyd-Warshall for all-pairs"
            ]
        )
        self.algorithms["dijkstra"] = dijkstra
        
        bloom_filter = Algorithm(
            name="Bloom Filter",
            category="Probabilistic Data Structures",
            description="A space-efficient probabilistic data structure for testing set membership with possible false positives but no false negatives.",
            complexity_analysis="Time complexity O(k) for insertion and lookup where k is number of hash functions. Space complexity O(m) where m is bit array size, much smaller than storing actual elements.",
            applicability=ApplicabilityContext(
                problem_domain=["Caching", "Databases", "Network systems", "Spell checking"],
                scale_characteristics=["Large datasets", "Memory-constrained environments"],
                performance_requirements=["Fast membership testing", "Low memory footprint"],
                resource_constraints=["Limited memory", "High throughput needed"],
                quality_attributes=[QualityAttribute.PERFORMANCE, QualityAttribute.SCALABILITY],
                preferred_scenarios=[
                    "Testing if element might be in large set",
                    "Reducing expensive disk or network lookups",
                    "Web crawlers avoiding duplicate URLs",
                    "Database query optimization",
                    "When false positives are acceptable"
                ],
                avoid_scenarios=[
                    "When false positives are unacceptable",
                    "When deletion is required",
                    "When exact membership must be determined",
                    "Small datasets where hash table fits in memory"
                ],
                environmental_factors=["Read-heavy workloads", "Append-only scenarios"]
            ),
            implementation_difficulty=DifficultyLevel.EASY,
            related_algorithms=["Counting Bloom Filter", "Cuckoo Filter", "Hash Table"],
            tags={"probabilistic", "hashing", "space-efficient", "membership-testing"},
            example_use_cases=[
                "Web browsers checking malicious URLs against blacklist",
                "Databases avoiding expensive disk reads for non-existent keys",
                "Distributed systems reducing network calls"
            ],
            trade_offs=[
                "False positives possible, must verify with authoritative source",
                "Cannot delete elements from standard Bloom filter",
                "Extremely space-efficient compared to hash tables"
            ]
        )
        self.algorithms["bloom_filter"] = bloom_filter
        
        raft = Algorithm(
            name="Raft Consensus Algorithm",
            category="Distributed Systems",
            description="A consensus algorithm for managing replicated logs in distributed systems, designed to be more understandable than Paxos while providing equivalent guarantees.",
            complexity_analysis="Message complexity O(n) per operation in normal case where n is number of nodes. Handles network partitions and node failures with leader election.",
            applicability=ApplicabilityContext(
                problem_domain=["Distributed databases", "Configuration management", "Coordination services"],
                scale_characteristics=["Small to medium clusters", "Typically 3-7 nodes"],
                performance_requirements=["Strong consistency", "Fault tolerance", "Availability while a majority of nodes can communicate"],
                resource_constraints=["Network bandwidth for log replication"],
                quality_attributes=[QualityAttribute.AVAILABILITY, QualityAttribute.SCALABILITY],
                preferred_scenarios=[
                    "Building distributed databases requiring strong consistency",
                    "Configuration stores like etcd",
                    "Coordination services",
                    "When understandability matters for implementation",
                    "When you need proven consensus algorithm"
                ],
                avoid_scenarios=[
                    "Single-node systems",
                    "When eventual consistency suffices",
                    "Very large clusters",
                    "When network partitions are extremely rare"
                ],
                environmental_factors=["Unreliable networks", "Node failures expected"]
            ),
            implementation_difficulty=DifficultyLevel.EXPERT,
            related_algorithms=["Paxos", "Multi-Paxos", "Viewstamped Replication"],
            tags={"distributed", "consensus", "replication", "fault-tolerance"},
            example_use_cases=[
                "etcd distributed configuration store",
                "Distributed databases like CockroachDB",
                "Kubernetes control plane coordination"
            ],
            trade_offs=[
                "Simpler to understand than Paxos",
                "Requires majority for progress, unavailable if majority fails",
                "Higher latency than eventual consistency approaches"
            ]
        )
        self.algorithms["raft"] = raft
        
        aes_encryption = Algorithm(
            name="AES Encryption",
            category="Cryptography",
            description="Advanced Encryption Standard, a symmetric block cipher that encrypts data in fixed-size blocks using keys of 128, 192, or 256 bits.",
            complexity_analysis="Time complexity O(n) where n is data size. Space complexity O(1) for in-place encryption. Computational cost depends on key size and number of rounds.",
            applicability=ApplicabilityContext(
                problem_domain=["Data security", "Communication security", "Storage encryption", "Authentication"],
                scale_characteristics=["Any data size", "Streaming or block processing"],
                performance_requirements=["Fast encryption and decryption", "Hardware acceleration available"],
                resource_constraints=["CPU cycles for encryption", "Memory for key schedule"],
                quality_attributes=[QualityAttribute.SECURITY, QualityAttribute.PERFORMANCE],
                preferred_scenarios=[
                    "Encrypting sensitive data at rest",
                    "Securing network communications",
                    "File and disk encryption",
                    "When symmetric encryption is appropriate",
                    "When hardware acceleration is available"
                ],
                avoid_scenarios=[
                    "When public key cryptography is needed",
                    "Key distribution is problematic",
                    "When you need digital signatures",
                    "Extremely resource-constrained devices"
                ],
                environmental_factors=["Trusted key management infrastructure", "Hardware crypto support"]
            ),
            implementation_difficulty=DifficultyLevel.MODERATE,
            related_algorithms=["ChaCha20", "RSA", "Elliptic Curve Cryptography"],
            tags={"cryptography", "encryption", "security", "symmetric"},
            example_use_cases=[
                "HTTPS/TLS encrypting web traffic",
                "Full disk encryption systems",
                "VPN tunnels encrypting network traffic"
            ],
            trade_offs=[
                "Very fast with hardware support",
                "Requires secure key distribution",
                "Industry standard with extensive analysis"
            ]
        )
        self.algorithms["aes_encryption"] = aes_encryption
        
        lru_cache = Algorithm(
            name="LRU Cache Eviction",
            category="Caching Algorithms",
            description="Least Recently Used cache eviction policy that removes the least recently accessed item when cache is full.",
            complexity_analysis="Time complexity O(1) for get and put operations using hash map plus doubly linked list. Space complexity O(capacity).",
            applicability=ApplicabilityContext(
                problem_domain=["Caching", "Memory management", "Database buffers", "Web caching"],
                scale_characteristics=["Fixed cache size", "Moderate to high access rates"],
                performance_requirements=["Fast cache operations", "Good hit rates"],
                resource_constraints=["Limited cache memory", "Need predictable performance"],
                quality_attributes=[QualityAttribute.PERFORMANCE],
                preferred_scenarios=[
                    "General-purpose caching with temporal locality",
                    "Page replacement in operating systems",
                    "Database buffer management",
                    "When recent items likely to be accessed again",
                    "When implementation simplicity matters"
                ],
                avoid_scenarios=[
                    "When access patterns are purely random",
                    "When frequency matters more than recency",
                    "Scan-resistant caching needed",
                    "When cache size is extremely small"
                ],
                environmental_factors=["Workloads with temporal locality"]
            ),
            implementation_difficulty=DifficultyLevel.MODERATE,
            related_algorithms=["LFU Cache", "ARC Cache", "CLOCK Algorithm"],
            tags={"caching", "eviction", "data-structures", "performance"},
            example_use_cases=[
                "Web browser caching recently viewed pages",
                "Database query result caching",
                "Operating system page cache"
            ],
            trade_offs=[
                "Simple and effective for many workloads",
                "Vulnerable to cache pollution from scans",
                "O(1) operations with proper implementation"
            ]
        )
        self.algorithms["lru_cache"] = lru_cache
    
    def _add_design_patterns(self):
        circuit_breaker = DesignPattern(
            name="Circuit Breaker",
            category="Resilience Patterns",
            intent="Prevent cascading failures in distributed systems by detecting failures and preventing calls to failing services.",
            problem="When a remote service fails, clients continue making requests that are doomed to fail, wasting resources and potentially causing cascading failures throughout the system.",
            solution="Wrap remote calls in a circuit breaker object that monitors for failures. After a threshold of failures, the circuit opens and immediately returns errors without attempting the call. After a timeout, it enters a half-open state to test whether the service has recovered.",
            structure="Circuit breaker maintains state (Closed, Open, Half-Open) and failure count. Closed state allows calls through. Open state fails fast. Half-Open state allows limited test calls.",
            participants=["Circuit Breaker", "Protected Service", "Client"],
            consequences=[
                "Prevents resource exhaustion from repeated failed calls",
                "Allows failing services time to recover",
                "Provides fast failure rather than waiting for timeouts",
                "Adds complexity to service interaction",
                "Requires tuning of failure thresholds and timeouts"
            ],
            applicability=ApplicabilityContext(
                problem_domain=["Microservices", "Distributed systems", "API integration"],
                scale_characteristics=["Multiple service dependencies", "High request volumes"],
                performance_requirements=["Fast failure detection", "Resilience to cascading failures"],
                resource_constraints=["Limited connection pools", "Thread pool exhaustion risks"],
                quality_attributes=[QualityAttribute.AVAILABILITY, QualityAttribute.PERFORMANCE],
                preferred_scenarios=[
                    "Calling external services that may fail",
                    "Microservice architectures with many dependencies",
                    "Systems requiring high availability",
                    "When cascading failures are a risk",
                    "When you need graceful degradation"
                ],
                avoid_scenarios=[
                    "Single monolithic application",
                    "When all failures should be retried indefinitely",
                    "Local in-process calls",
                    "When failure detection overhead is unacceptable"
                ],
                environmental_factors=["Unreliable networks", "External service dependencies"]
            ),
            implementation_difficulty=DifficultyLevel.MODERATE,
            related_patterns=["Retry", "Bulkhead", "Timeout"],
            tags={"resilience", "fault-tolerance", "distributed-systems", "microservices"},
            example_use_cases=[
                "E-commerce site protecting against payment gateway failures",
                "Microservices preventing cascading failures",
                "Mobile apps handling unreliable network conditions"
            ],
            known_uses=["Netflix Hystrix", "Resilience4j", "Polly"]
        )
        self.design_patterns["circuit_breaker"] = circuit_breaker
        
        cqrs = DesignPattern(
            name="CQRS (Command Query Responsibility Segregation)",
            category="Architectural Patterns",
            intent="Separate read and write operations into different models to optimize each independently.",
            problem="In complex domains, a single model serving both reads and writes becomes convoluted, with conflicting requirements for queries and updates.",
            solution="Split the model into separate command (write) and query (read) models. Commands modify state, queries return data. Each model can be optimized for its specific purpose.",
            structure="Command model handles writes with domain logic and validation. Query model provides optimized read views, potentially denormalized. Event bus or database replication synchronizes models.",
            participants=["Command Model", "Query Model", "Command Handlers", "Query Handlers", "Synchronization Mechanism"],
            consequences=[
                "Enables independent scaling of reads and writes",
                "Allows different data models optimized for each purpose",
                "Increases complexity with two models to maintain",
                "Introduces eventual consistency between models",
                "Simplifies complex domain models"
            ],
            applicability=ApplicabilityContext(
                problem_domain=["Complex business domains", "High-traffic applications", "Event-driven systems"],
                scale_characteristics=["Read-heavy or write-heavy workloads", "Large scale systems"],
                performance_requirements=["Independent scaling", "Optimized queries", "High throughput"],
                resource_constraints=["Different storage requirements for reads and writes"],
                quality_attributes=[QualityAttribute.SCALABILITY, QualityAttribute.PERFORMANCE, QualityAttribute.MAINTAINABILITY],
                preferred_scenarios=[
                    "Complex domains with different read and write needs",
                    "Systems with vastly different read and write loads",
                    "When you need to scale reads and writes independently",
                    "Event-sourced systems",
                    "When query optimization is critical"
                ],
                avoid_scenarios=[
                    "Simple CRUD applications",
                    "When strong consistency is required everywhere",
                    "Small applications where complexity overhead is high",
                    "When development team is small"
                ],
                environmental_factors=["Distributed systems", "Microservices architectures"]
            ),
            implementation_difficulty=DifficultyLevel.CHALLENGING,
            related_patterns=["Event Sourcing", "Domain-Driven Design", "Materialized View"],
            tags={"architectural", "scalability", "domain-driven-design", "separation-of-concerns"},
            example_use_cases=[
                "E-commerce platforms with complex product catalogs and heavy browsing",
                "Financial systems with audit requirements and complex reporting",
                "Social media platforms with different read and write patterns"
            ],
            known_uses=["Microsoft Azure", "EventStore", "Axon Framework"]
        )
        self.design_patterns["cqrs"] = cqrs
        
        saga = DesignPattern(
            name="Saga Pattern",
            category="Distributed Transactions",
            intent="Manage distributed transactions across multiple services using a sequence of local transactions with compensating actions.",
            problem="Traditional ACID transactions don't work well across microservices. Two-phase commit is complex and reduces availability.",
            solution="Break the distributed transaction into a sequence of local transactions. Each step publishes an event that triggers the next step. If a step fails, execute compensating transactions to undo the preceding steps.",
            structure="Each step is a local transaction paired with a compensating transaction. A saga can be choreographed (services react to each other's events) or orchestrated (a central coordinator directs each step).",
            participants=["Saga Coordinator", "Participating Services", "Compensating Transactions", "Event Bus"],
            consequences=[
                "Enables distributed transactions without locking",
                "Maintains availability during failures",
                "Requires compensating logic for rollback",
                "Eventual consistency rather than immediate",
                "More complex than local transactions"
            ],
            applicability=ApplicabilityContext(
                problem_domain=["Microservices", "Distributed systems", "E-commerce", "Booking systems"],
                scale_characteristics=["Multiple services involved in transaction", "Long-running processes"],
                performance_requirements=["High availability", "No distributed locking"],
                resource_constraints=["Cannot use two-phase commit", "Services must be autonomous"],
                quality_attributes=[QualityAttribute.AVAILABILITY, QualityAttribute.SCALABILITY],
                preferred_scenarios=[
                    "Distributed transactions across microservices",
                    "Long-running business processes",
                    "When two-phase commit is not feasible",
                    "E-commerce order processing",
                    "Travel booking systems"
                ],
                avoid_scenarios=[
                    "Single database transactions",
                    "When ACID guarantees are absolutely required",
                    "Simple workflows without failure scenarios",
                    "When compensating logic is impossible"
                ],
                environmental_factors=["Microservices architecture", "Event-driven systems"]
            ),
            implementation_difficulty=DifficultyLevel.CHALLENGING,
            related_patterns=["Event Sourcing", "CQRS", "Process Manager"],
            tags={"distributed-transactions", "microservices", "eventual-consistency", "resilience"},
            example_use_cases=[
                "E-commerce order processing across inventory, payment, and shipping services",
                "Travel booking coordinating flights, hotels, and car rentals",
                "Banking transfers across different account systems"
            ],
            known_uses=["Uber trip management", "Amazon order processing"]
        )
        self.design_patterns["saga"] = saga
        
        repository = DesignPattern(
            name="Repository Pattern",
            category="Data Access Patterns",
            intent="Mediate between domain and data mapping layers using a collection-like interface for accessing domain objects.",
            problem="Direct database access from business logic creates tight coupling, makes testing difficult, and scatters data access code throughout the application.",
            solution="Create repository classes that encapsulate data access logic and provide collection-like interfaces. Domain layer depends on repository interfaces, implementations handle database details.",
            structure="Repository interface defines domain-oriented methods. Concrete repository implements interface using ORM or direct database access. Domain layer uses repositories without knowing persistence details.",
            participants=["Repository Interface", "Concrete Repository", "Domain Entities", "Data Access Layer"],
            consequences=[
                "Centralizes data access logic",
                "Enables easier unit testing with mock repositories",
                "Provides clean separation between domain and persistence",
                "Can add caching or other cross-cutting concerns",
                "May introduce additional abstraction layer"
            ],
            applicability=ApplicabilityContext(
                problem_domain=["Business applications", "Domain-driven design", "Data-centric applications"],
                scale_characteristics=["Any application size", "Multiple data sources"],
                performance_requirements=["Testability", "Maintainability"],
                resource_constraints=["Need to support multiple databases", "Testing without database"],
                quality_attributes=[QualityAttribute.MAINTAINABILITY, QualityAttribute.TESTABILITY],
                preferred_scenarios=[
                    "Domain-driven design implementations",
                    "When you need to test business logic without database",
                    "Multiple data sources for same entities",
                    "When data access patterns are complex",
                    "When you want to centralize data access logic"
                ],
                avoid_scenarios=[
                    "Very simple CRUD applications",
                    "When ORM already provides sufficient abstraction",
                    "Performance-critical code needing direct SQL",
                    "When abstraction overhead is unacceptable"
                ],
                environmental_factors=["Enterprise applications", "Team development"]
            ),
            implementation_difficulty=DifficultyLevel.EASY,
            related_patterns=["Unit of Work", "Data Mapper", "Active Record"],
            tags={"data-access", "domain-driven-design", "persistence", "separation-of-concerns"},
            example_use_cases=[
                "E-commerce application managing product catalog",
                "CRM system accessing customer data",
                "Content management system with multiple storage backends"
            ],
            known_uses=["Spring Data", "Entity Framework", "Hibernate"]
        )
        self.design_patterns["repository"] = repository
    
    def _add_architectural_tactics(self):
        load_balancing = ArchitecturalTactic(
            name="Load Balancing",
            quality_attributes=[QualityAttribute.PERFORMANCE, QualityAttribute.SCALABILITY, QualityAttribute.AVAILABILITY],
            description="Distribute incoming requests across multiple server instances to optimize resource utilization, maximize throughput, minimize response time, and avoid overload.",
            mechanism="Load balancer sits between clients and servers, distributing requests using algorithms like round-robin, least connections, weighted distribution, or consistent hashing.",
            trade_offs=[
                "Adds complexity with additional component",
                "Load balancer can become single point of failure",
                "Session affinity complicates stateful applications",
                "Dramatically improves scalability and availability"
            ],
            applicability=ApplicabilityContext(
                problem_domain=["Web applications", "API services", "Microservices", "Databases"],
                scale_characteristics=["Multiple server instances", "High traffic volumes"],
                performance_requirements=["High throughput", "Low latency", "Even resource utilization"],
                resource_constraints=["Multiple servers available"],
                quality_attributes=[QualityAttribute.PERFORMANCE, QualityAttribute.SCALABILITY, QualityAttribute.AVAILABILITY],
                preferred_scenarios=[
                    "Horizontal scaling of stateless services",
                    "High-availability requirements",
                    "Traffic spikes requiring elastic scaling",
                    "Geographically distributed users",
                    "When you have multiple server instances"
                ],
                avoid_scenarios=[
                    "Single server deployments",
                    "When all requests must go to same instance",
                    "Extremely low latency requirements where hop is unacceptable",
                    "Very small scale applications"
                ],
                environmental_factors=["Cloud deployments", "Container orchestration platforms"]
            ),
            implementation_difficulty=DifficultyLevel.EASY,
            related_tactics=["Caching", "Horizontal Scaling", "Service Discovery"],
            tags={"scalability", "performance", "availability", "distribution"},
            example_use_cases=[
                "Web application distributing requests across multiple servers",
                "API gateway routing to microservice instances",
                "Database read replicas for query distribution"
            ],
            architectural_impact="Requires stateless services or session management. Affects deployment architecture and monitoring strategies."
        )
        self.architectural_tactics["load_balancing"] = load_balancing
        
        caching = ArchitecturalTactic(
            name="Caching",
            quality_attributes=[QualityAttribute.PERFORMANCE],
            description="Store frequently accessed data in fast-access storage to reduce latency and load on backend systems.",
            mechanism="Cache layer stores copies of data closer to consumers. Can be client-side, server-side, or distributed. Uses eviction policies like LRU, LFU, or TTL to manage cache size.",
            trade_offs=[
                "Improves read performance significantly",
                "Reduces load on backend systems",
                "Introduces cache invalidation complexity",
                "Stale data possible when the cache lags behind the source",
                "Additional memory requirements"
            ],
            applicability=ApplicabilityContext(
                problem_domain=["Web applications", "Databases", "APIs", "Content delivery"],
                scale_characteristics=["Read-heavy workloads", "Repeated access patterns"],
                performance_requirements=["Low latency", "High throughput", "Reduced backend load"],
                resource_constraints=["Memory available for cache", "Network latency to backend"],
                quality_attributes=[QualityAttribute.PERFORMANCE, QualityAttribute.SCALABILITY],
                preferred_scenarios=[
                    "Frequently accessed data that changes infrequently",
                    "Expensive computations or queries",
                    "High read-to-write ratios",
                    "Geographically distributed users",
                    "Reducing database load"
                ],
                avoid_scenarios=[
                    "Data that must always be fresh",
                    "Write-heavy workloads",
                    "When cache invalidation is extremely complex",
                    "Unique queries with no reuse"
                ],
                environmental_factors=["Distributed systems", "Content delivery networks"]
            ),
            implementation_difficulty=DifficultyLevel.MODERATE,
            related_tactics=["Content Delivery Network", "Materialized Views", "Read Replicas"],
            tags={"performance", "latency", "scalability", "optimization"},
            example_use_cases=[
                "Web application caching database query results",
                "CDN caching static assets close to users",
                "API response caching for repeated requests"
            ],
            architectural_impact="Requires cache invalidation strategy. Affects consistency guarantees and data freshness."
        )
        self.architectural_tactics["caching"] = caching
        
        rate_limiting = ArchitecturalTactic(
            name="Rate Limiting",
            quality_attributes=[QualityAttribute.AVAILABILITY, QualityAttribute.SECURITY],
            description="Control the rate of requests to prevent resource exhaustion, ensure fair usage, and protect against abuse.",
            mechanism="Track request rates per client, IP, or API key. Reject or queue requests exceeding limits. Algorithms include token bucket, leaky bucket, fixed window, or sliding window.",
            trade_offs=[
                "Protects system from overload and abuse",
                "Ensures fair resource allocation",
                "May reject legitimate requests during spikes",
                "Adds latency for rate checking",
                "Requires distributed coordination in scaled systems"
            ],
            applicability=ApplicabilityContext(
                problem_domain=["APIs", "Web services", "Microservices", "Public endpoints"],
                scale_characteristics=["Shared resources", "Multi-tenant systems"],
                performance_requirements=["Prevent resource exhaustion", "Fair usage"],
                resource_constraints=["Limited backend capacity", "Shared infrastructure"],
                quality_attributes=[QualityAttribute.AVAILABILITY, QualityAttribute.SECURITY, QualityAttribute.PERFORMANCE],
                preferred_scenarios=[
                    "Public APIs with usage tiers",
                    "Protecting backend from overload",
                    "Multi-tenant SaaS applications",
                    "Mitigating abuse and denial-of-service attempts",
                    "Ensuring fair resource allocation"
                ],
                avoid_scenarios=[
                    "Internal services with trusted clients",
                    "When all traffic is legitimate and predictable",
                    "Single-tenant dedicated systems",
                    "When latency overhead is unacceptable"
                ],
                environmental_factors=["Public internet exposure", "Untrusted clients"]
            ),
            implementation_difficulty=DifficultyLevel.MODERATE,
            related_tactics=["Throttling", "Circuit Breaker", "API Gateway"],
            tags={"availability", "security", "resource-management", "fairness"},
            example_use_cases=[
                "API limiting requests per API key",
                "Web application preventing brute force attacks",
                "SaaS platform enforcing usage tiers"
            ],
            architectural_impact="Requires distributed state management for rate counters. Affects API design and client error handling."
        )
        self.architectural_tactics["rate_limiting"] = rate_limiting
        
        bulkhead = ArchitecturalTactic(
            name="Bulkhead Isolation",
            quality_attributes=[QualityAttribute.AVAILABILITY, QualityAttribute.PERFORMANCE],
            description="Isolate resources into pools to prevent failures in one area from cascading to others, similar to bulkheads in ships.",
            mechanism="Partition resources like thread pools, connection pools, or service instances. Each partition serves specific functionality. Failure in one partition doesn't exhaust resources for others.",
            trade_offs=[
                "Prevents cascading resource exhaustion",
                "Improves fault isolation",
                "May underutilize resources due to partitioning",
                "Adds complexity in resource management",
                "Requires careful capacity planning per partition"
            ],
            applicability=ApplicabilityContext(
                problem_domain=["Microservices", "Multi-tenant systems", "High-availability applications"],
                scale_characteristics=["Shared resource pools", "Multiple independent workloads"],
                performance_requirements=["Fault isolation", "Predictable performance"],
                resource_constraints=["Limited thread pools", "Connection pool limits"],
                quality_attributes=[QualityAttribute.AVAILABILITY, QualityAttribute.PERFORMANCE],
                preferred_scenarios=[
                    "Services with multiple dependencies of varying reliability",
                    "Multi-tenant applications needing isolation",
                    "When one slow dependency shouldn't affect others",
                    "Thread pool exhaustion is a risk",
                    "When you need to limit blast radius of failures"
                ],
                avoid_scenarios=[
                    "Single-threaded applications",
                    "When resource utilization must be maximized",
                    "Simple applications with few dependencies",
                    "When all operations have similar characteristics"
                ],
                environmental_factors=["Microservices with varying SLAs", "Shared infrastructure"]
            ),
            implementation_difficulty=DifficultyLevel.MODERATE,
            related_tactics=["Circuit Breaker", "Timeout", "Thread Pool Sizing"],
            tags={"resilience", "isolation", "fault-tolerance", "resource-management"},
            example_use_cases=[
                "Microservice isolating thread pools per external dependency",
                "Multi-tenant SaaS isolating resources per tenant",
                "API gateway with separate connection pools per backend"
            ],
            architectural_impact="Requires resource pool management. Affects capacity planning and monitoring."
        )
        self.architectural_tactics["bulkhead"] = bulkhead
    
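    def search(self, query: str, category: Optional[str] = None) -> Dict[str, List]:
        """Return knowledge-base entries whose text contains the query.

        `category` may be 'algorithms', 'design_patterns', or
        'architectural_tactics'; None searches all three.
        """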
        query_lower = query.lower()
        results = {
            'algorithms': [],
            'design_patterns': [],
            'architectural_tactics': []
        }
        
        if category is None or category == 'algorithms':
            for algo in self.algorithms.values():
                if self._matches_query(algo, query_lower):
                    results['algorithms'].append(algo)
        
        if category is None or category == 'design_patterns':
            for pattern in self.design_patterns.values():
                if self._matches_query(pattern, query_lower):
                    results['design_patterns'].append(pattern)
        
        if category is None or category == 'architectural_tactics':
            for tactic in self.architectural_tactics.values():
                if self._matches_query(tactic, query_lower):
                    results['architectural_tactics'].append(tactic)
        
        return results
    
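    def _matches_query(self, item, query: str) -> bool:
        # Design patterns are constructed with `intent` rather than
        # `description`, so fall back to it instead of raising AttributeError.
        summary = getattr(item, 'description', None) or getattr(item, 'intent', '')
        searchable_text = f"{item.name} {summary}".lower()
        
        if hasattr(item, 'tags'):
            searchable_text += " " + " ".join(item.tags).lower()
        
        if hasattr(item, 'category'):
            searchable_text += " " + item.category.lower()
        
        if hasattr(item, 'applicability'):
            app = item.applicability
            searchable_text += " " + " ".join(app.problem_domain).lower()
            searchable_text += " " + " ".join(app.preferred_scenarios).lower()
        
        # Plain substring match: the whole (lowercased) query must appear verbatim.
        return query in searchable_text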

class ContextExtractor:
    """Extracts structured context from a free-text query using regex heuristics."""

    def __init__(self):
        self.scale_patterns = {
            'large': r'\b(large|huge|massive|millions?|billions?|petabytes?|terabytes?)\b',
            'medium': r'\b(medium|thousands?|gigabytes?|moderate)\b',
            'small': r'\b(small|tiny|few|hundreds?|megabytes?)\b',
            'distributed': r'\b(distributed|clusters?|multi-node|sharded)\b',
            'high-traffic': r'\b(high.traffic|many.requests|heavy.load)\b'
        }
        
        self.performance_patterns = {
            'low-latency': r'\b(low.latency|fast|quick|real.time|milliseconds?)\b',
            'high-throughput': r'\b(high.throughput|many.requests|bulk|batch)\b',
            'memory-efficient': r'\b(memory.efficient|low.memory|space.efficient)\b',
            'cpu-efficient': r'\b(cpu.efficient|computationally.efficient|low.overhead)\b'
        }
        
        # `\w*` (rather than `\w+`) lets bare stems such as "test", "auth",
        # and "maintain" match in addition to their longer forms.
        self.quality_patterns = {
            QualityAttribute.PERFORMANCE: r'\b(performance|fast|efficient|speed|throughput|latency)\b',
            QualityAttribute.SCALABILITY: r'\b(scal\w*|grow|expand|elastic)\b',
            QualityAttribute.AVAILABILITY: r'\b(availab\w*|uptime|reliable|fault.tolerant)\b',
            QualityAttribute.SECURITY: r'\b(secur\w*|auth\w*|encrypt\w*|protect)\b',
            QualityAttribute.MAINTAINABILITY: r'\b(maintain\w*|readable|clean|simple)\b',
            QualityAttribute.TESTABILITY: r'\b(test\w*|verify|validate)\b'
        }
        
        # Optional suffixes let plurals and derived forms ("apis",
        # "microservices", "cryptography", "networking") match as well.
        self.domain_patterns = {
            'web': r'\b(web|https?|rest|apis?|browsers?|frontend|backend)\b',
            'database': r'\b(databases?|sql|nosql|quer(?:y|ies)|storage|persist\w*)\b',
            'distributed': r'\b(distributed|microservices?|clusters?|nodes?)\b',
            'real-time': r'\b(real.time|streaming|events?|messages?)\b',
            'security': r'\b(security|crypto\w*|auth\w*|access.control)\b',
            'networking': r'\b(network\w*|routing|protocols?|packets?)\b'
        }
    
    def extract(self, query: str) -> ExtractedContext:
        """Run every heuristic over the lowercased query and bundle the results."""
        query_lower = query.lower()
        
        problem_type = self._identify_problem_type(query_lower)
        domains = self._extract_domains(query_lower)
        scale = self._extract_scale(query_lower)
        performance = self._extract_performance(query_lower)
        quality_attrs = self._extract_quality_attributes(query_lower)
        constraints = self._extract_constraints(query_lower)
        keywords = self._extract_keywords(query_lower)
        
        return ExtractedContext(
            problem_type=problem_type,
            domain=domains,
            scale_indicators=scale,
            performance_requirements=performance,
            quality_attributes=quality_attrs,
            constraints=constraints,
            keywords=keywords,
            raw_query=query
        )
    
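    def _identify_problem_type(self, query: str) -> str:
        # First match wins, so a query that mentions several categories is
        # classified by the earliest branch. Stems use \w* / \w+ so that plurals
        # and derived forms ("algorithms", "architecture") are matched too.
        if re.search(r'\b(algorithm\w*|comput\w+|calculat\w+|process\w*)\b', query):
            return 'algorithm'
        elif re.search(r'\b(pattern\w*|design\w*|structur\w+|organi[sz]\w+)\b', query):
            return 'design_pattern'
        elif re.search(r'\b(architect\w*|system\w*|tactic\w*|quality)\b', query):
            return 'architectural_tactic'
        else:
            return 'general'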
    
    def _extract_domains(self, query: str) -> List[str]:
        domains = []
        for domain, pattern in self.domain_patterns.items():
            if re.search(pattern, query):
                domains.append(domain)
        return domains
    
    def _extract_scale(self, query: str) -> List[str]:
        scale = []
        for scale_type, pattern in self.scale_patterns.items():
            if re.search(pattern, query):
                scale.append(scale_type)
        return scale
    
    def _extract_performance(self, query: str) -> List[str]:
        performance = []
        for perf_type, pattern in self.performance_patterns.items():
            if re.search(pattern, query):
                performance.append(perf_type)
        return performance
    
    def _extract_quality_attributes(self, query: str) -> Set[QualityAttribute]:
        attributes = set()
        for attr, pattern in self.quality_patterns.items():
            if re.search(pattern, query):
                attributes.add(attr)
        return attributes
    
    def _extract_constraints(self, query: str) -> List[str]:
        constraints = []
        
        if re.search(r'\b(limited.memory|memory.constrained|low.memory)\b', query):
            constraints.append('limited_memory')
        if re.search(r'\b(no.locking|lock.free|non.blocking)\b', query):
            constraints.append('no_locking')
        if re.search(r'\b(eventual.consistency|eventually.consistent)\b', query):
            constraints.append('eventual_consistency')
        if re.search(r'\b(strong.consistency|strictly.consistent)\b', query):
            constraints.append('strong_consistency')
        if re.search(r'\b(stateless|no.state)\b', query):
            constraints.append('stateless')
        
        return constraints
    
    def _extract_keywords(self, query: str) -> Set[str]:
        stop_words = {'the', 'a', 'an', 'and', 'or', 'but', 'in', 'on', 'at', 'to', 'for',
                     'of', 'with', 'by', 'from', 'as', 'is', 'was', 'are', 'were', 'be',
                     'been', 'being', 'have', 'has', 'had', 'do', 'does', 'did', 'will',
                     'would', 'should', 'could', 'may', 'might', 'must', 'can', 'i', 'you',
                     'he', 'she', 'it', 'we', 'they', 'what', 'which', 'who', 'when',
                     'where', 'why', 'how', 'need', 'want', 'looking', 'find'}
        
        words = re.findall(r'\b\w+\b', query)
        keywords = {w for w in words if w not in stop_words and len(w) > 2}
        
        return keywords

class PromptConstructor:
    def __init__(self, knowledge_base: KnowledgeBase):
        self.knowledge_base = knowledge_base
    
    def construct_recommendation_prompt(self, query: str, 
                                      context: ExtractedContext,
                                      candidates: Dict[str, List]) -> str:
        prompt_parts = []
        
        prompt_parts.append("You are an expert software architect and algorithm specialist.")
        prompt_parts.append("Your task is to recommend appropriate algorithms, design patterns, and architectural tactics based on the user's problem description and context.")
        prompt_parts.append("")
        
        prompt_parts.append("USER QUERY:")
        prompt_parts.append(query)
        prompt_parts.append("")
        
        prompt_parts.append("EXTRACTED CONTEXT:")
        prompt_parts.append(f"Problem Type: {context.problem_type}")
        prompt_parts.append(f"Domains: {', '.join(context.domain) if context.domain else 'Not specified'}")
        prompt_parts.append(f"Scale: {', '.join(context.scale_indicators) if context.scale_indicators else 'Not specified'}")
        prompt_parts.append(f"Performance Requirements: {', '.join(context.performance_requirements) if context.performance_requirements else 'Not specified'}")
        prompt_parts.append(f"Quality Attributes: {', '.join([qa.value for qa in context.quality_attributes]) if context.quality_attributes else 'Not specified'}")
        prompt_parts.append(f"Constraints: {', '.join(context.constraints) if context.constraints else 'None identified'}")
        prompt_parts.append("")
        
        if candidates['algorithms']:
            prompt_parts.append("CANDIDATE ALGORITHMS:")
            for algo in candidates['algorithms'][:5]:
                prompt_parts.append(f"- {algo.name} ({algo.category}): {algo.description}")
                prompt_parts.append(f"  Complexity: {algo.complexity_analysis}")
                prompt_parts.append(f"  Best for: {', '.join(algo.applicability.preferred_scenarios[:3])}")
                prompt_parts.append("")
        
        if candidates['design_patterns']:
            prompt_parts.append("CANDIDATE DESIGN PATTERNS:")
            for pattern in candidates['design_patterns'][:5]:
                prompt_parts.append(f"- {pattern.name} ({pattern.category}): {pattern.intent}")
                prompt_parts.append(f"  Problem: {pattern.problem}")
                prompt_parts.append(f"  Best for: {', '.join(pattern.applicability.preferred_scenarios[:3])}")
                prompt_parts.append("")
        
        if candidates['architectural_tactics']:
            prompt_parts.append("CANDIDATE ARCHITECTURAL TACTICS:")
            for tactic in candidates['architectural_tactics'][:5]:
                prompt_parts.append(f"- {tactic.name}: {tactic.description}")
                prompt_parts.append(f"  Quality Attributes: {', '.join([qa.value for qa in tactic.quality_attributes])}")
                prompt_parts.append(f"  Best for: {', '.join(tactic.applicability.preferred_scenarios[:3])}")
                prompt_parts.append("")
        
        prompt_parts.append("INSTRUCTIONS:")
        prompt_parts.append("1. Analyze the user's query and extracted context carefully")
        prompt_parts.append("2. Consider the candidate solutions provided and your broader knowledge")
        prompt_parts.append("3. Recommend the most appropriate solutions that fit the specific context")
        prompt_parts.append("4. For each recommendation, explain:")
        prompt_parts.append("   - Why it fits the context")
        prompt_parts.append("   - What trade-offs are involved")
        prompt_parts.append("   - How it should be implemented or applied")
        prompt_parts.append("   - What alternatives exist and when they might be better")
        prompt_parts.append("5. Prioritize recommendations by relevance to the context")
        prompt_parts.append("6. If the context is ambiguous, provide recommendations for different interpretations")
        prompt_parts.append("7. Include practical implementation guidance")
        prompt_parts.append("")
        
        prompt_parts.append("FORMAT YOUR RESPONSE AS:")
        prompt_parts.append("SUMMARY: Brief overview of the problem and recommended approach")
        prompt_parts.append("")
        prompt_parts.append("PRIMARY RECOMMENDATIONS:")
        prompt_parts.append("For each recommendation:")
        prompt_parts.append("- Name and category")
        prompt_parts.append("- Why it fits this context")
        prompt_parts.append("- Key trade-offs")
        prompt_parts.append("- Implementation guidance")
        prompt_parts.append("")
        prompt_parts.append("ALTERNATIVE APPROACHES:")
        prompt_parts.append("Other solutions to consider with their contexts")
        prompt_parts.append("")
        prompt_parts.append("IMPLEMENTATION CONSIDERATIONS:")
        prompt_parts.append("Practical advice for applying the recommendations")
        
        return "\n".join(prompt_parts)

class RecommendationEngine:
    def __init__(self, llm: LLMInterface, knowledge_base: KnowledgeBase):
        self.llm = llm
        self.knowledge_base = knowledge_base
        self.context_extractor = ContextExtractor()
        self.prompt_constructor = PromptConstructor(knowledge_base)
    
    def get_recommendations(self, query: str) -> Dict[str, object]:
        try:
            context = self.context_extractor.extract(query)
            
            candidates = self._find_candidates(context)
            
            prompt = self.prompt_constructor.construct_recommendation_prompt(
                query, context, candidates
            )
            
            llm_response = self.llm.generate(
                prompt,
                max_tokens=3000,
                temperature=0.7,
                top_p=0.9
            )
            
            report = self._format_report(query, context, candidates, llm_response)
            
            return {
                'success': True,
                'query': query,
                'context': context,
                'candidates': candidates,
                'llm_response': llm_response,
                'report': report
            }
            
        except Exception as e:
            return {
                'success': False,
                'error': str(e),
                'query': query
            }
    
    def _find_candidates(self, context: ExtractedContext) -> Dict[str, List]:
        all_candidates = {
            'algorithms': [],
            'design_patterns': [],
            'architectural_tactics': []
        }
        
        for keyword in context.keywords:
            results = self.knowledge_base.search(keyword)
            all_candidates['algorithms'].extend(results['algorithms'])
            all_candidates['design_patterns'].extend(results['design_patterns'])
            all_candidates['architectural_tactics'].extend(results['architectural_tactics'])
        
        # De-duplicate by name rather than via set(): entries may not be
        # hashable, and a dict keeps first-seen order for stable results.
        for kind in list(all_candidates):
            all_candidates[kind] = list(
                {item.name: item for item in all_candidates[kind]}.values()
            )
        
        scored_candidates = {
            'algorithms': self._score_and_sort(all_candidates['algorithms'], context),
            'design_patterns': self._score_and_sort(all_candidates['design_patterns'], context),
            'architectural_tactics': self._score_and_sort(all_candidates['architectural_tactics'], context)
        }
        
        return scored_candidates
    
    def _score_and_sort(self, items: List, context: ExtractedContext) -> List:
        scored_items = []
        for item in items:
            score = self._calculate_relevance_score(item, context)
            scored_items.append((score, item))
        
        scored_items.sort(reverse=True, key=lambda x: x[0])
        return [item for score, item in scored_items]
    
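    def _calculate_relevance_score(self, item, context: ExtractedContext) -> float:
        score = 0.0
        
        if hasattr(item, 'applicability'):
            app = item.applicability
            
            # Domain match: +2.0 per extracted domain found in the item's domains
            for domain in context.domain:
                if any(domain in item_domain.lower() for item_domain in app.problem_domain):
                    score += 2.0
            
            # Quality-attribute match: +1.5 each (only for items that declare them)
            item_qas = getattr(item, 'quality_attributes', ())
            for qa in context.quality_attributes:
                if qa in item_qas:
                    score += 1.5
            
            # Keyword match: name +3.0, description (or intent) +1.0, tag +1.5
            name = item.name.lower()
            # Design patterns are rendered via `intent` in the prompt, so fall
            # back to it when an item has no `description`
            text = (getattr(item, 'description', None) or getattr(item, 'intent', '')).lower()
            tags = getattr(item, 'tags', ())
            for keyword in context.keywords:
                if keyword in name:
                    score += 3.0
                if keyword in text:
                    score += 1.0
                if keyword in tags:
                    score += 1.5
        
        return score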
    
    def _format_report(self, query: str, context: ExtractedContext,
                      candidates: Dict[str, List], llm_response: str) -> str:
        report_lines = []
        
        report_lines.append("=" * 80)
        report_lines.append("ALGORITHM, DESIGN PATTERN, AND ARCHITECTURAL TACTIC FINDER")
        report_lines.append("=" * 80)
        report_lines.append("")
        
        report_lines.append("QUERY:")
        report_lines.append(query)
        report_lines.append("")
        
        report_lines.append("CONTEXT ANALYSIS:")
        report_lines.append(f"Problem Type: {context.problem_type}")
        report_lines.append(f"Domains: {', '.join(context.domain) if context.domain else 'Not specified'}")
        report_lines.append(f"Scale: {', '.join(context.scale_indicators) if context.scale_indicators else 'Not specified'}")
        report_lines.append(f"Performance: {', '.join(context.performance_requirements) if context.performance_requirements else 'Not specified'}")
        report_lines.append(f"Quality Attributes: {', '.join([qa.value for qa in context.quality_attributes]) if context.quality_attributes else 'Not specified'}")
        report_lines.append(f"Constraints: {', '.join(context.constraints) if context.constraints else 'None'}")
        report_lines.append("")
        
        report_lines.append("=" * 80)
        report_lines.append("RECOMMENDATIONS")
        report_lines.append("=" * 80)
        report_lines.append("")
        report_lines.append(llm_response)
        report_lines.append("")
        
        report_lines.append("=" * 80)
        report_lines.append("KNOWLEDGE BASE MATCHES")
        report_lines.append("=" * 80)
        report_lines.append("")
        
        if candidates['algorithms']:
            report_lines.append("ALGORITHMS:")
            for algo in candidates['algorithms'][:5]:
                report_lines.append(f"- {algo.name}")
            report_lines.append("")
        
        if candidates['design_patterns']:
            report_lines.append("DESIGN PATTERNS:")
            for pattern in candidates['design_patterns'][:5]:
                report_lines.append(f"- {pattern.name}")
            report_lines.append("")
        
        if candidates['architectural_tactics']:
            report_lines.append("ARCHITECTURAL TACTICS:")
            for tactic in candidates['architectural_tactics'][:5]:
                report_lines.append(f"- {tactic.name}")
            report_lines.append("")
        
        return "\n".join(report_lines)

class AlgorithmPatternFinder:
    def __init__(self, llm: LLMInterface):
        self.knowledge_base = KnowledgeBase()
        self.engine = RecommendationEngine(llm, self.knowledge_base)
    
    def find_solutions(self, query: str) -> str:
        result = self.engine.get_recommendations(query)
        
        if result['success']:
            return result['report']
        else:
            return f"Error processing query: {result['error']}"

def create_local_finder(model_name: str = "microsoft/phi-2") -> AlgorithmPatternFinder:
    hardware = HardwareDetector()
    print(f"Detected hardware: {hardware.preferred_device['type']} - {hardware.preferred_device['name']}")
    llm = LocalLLM(model_name, hardware)
    return AlgorithmPatternFinder(llm)

def create_remote_finder(api_endpoint: str, api_key: str, 
                        model_name: str, provider: str = 'openai') -> AlgorithmPatternFinder:
    llm = RemoteLLM(api_endpoint, api_key, model_name, provider)
    return AlgorithmPatternFinder(llm)

if __name__ == "__main__":
    print("Algorithm, Design Pattern, and Architectural Tactic Finder")
    print("=" * 80)
    print("")
    
    use_remote = input("Use remote LLM? (y/n): ").strip().lower() == 'y'
    
    if use_remote:
        from getpass import getpass  # read the key without echoing it to the terminal
        api_endpoint = input("API endpoint: ").strip()
        api_key = getpass("API key: ")
        model_name = input("Model name: ").strip()
        provider = input("Provider (openai/anthropic/generic): ").strip().lower() or 'openai'
        finder = create_remote_finder(api_endpoint, api_key, model_name, provider)
    else:
        model_name = input("Model name (default: microsoft/phi-2): ") or "microsoft/phi-2"
        finder = create_local_finder(model_name)
    
    print("")
    print("Finder initialized. Enter your queries or 'quit' to exit.")
    print("")
    
    while True:
        query = input("Query: ").strip()
        if query.lower() == 'quit':
            break
        if not query:
            continue  # skip empty input rather than sending it to the LLM
        
        print("")
        print("Processing query...")
        print("")
        
        report = finder.find_solutions(query)
        print(report)
        print("")

This complete running example integrates all components into a single program. It supports both local and remote LLMs across multiple GPU architectures and covers hardware detection, context extraction, knowledge base searching, prompt construction, and report formatting. At the top level, get_recommendations catches any exception and returns it as an error result instead of raising, so a failure in one query does not end the interactive session.
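
The context extraction step is the easiest part to try in isolation. The following is a minimal sketch, not the full ContextExtractor: it assumes a trimmed-down pattern table and a hypothetical helper named match_labels, and it spells out the separator as whitespace or a hyphen instead of the looser "." used in the listing above.

```python
import re
from typing import Dict, List

# Hypothetical, trimmed-down pattern tables; the full example has more categories.
SCALE_PATTERNS: Dict[str, str] = {
    "distributed": r"\b(distributed|cluster|multi-node|sharded)\b",
    "high-traffic": r"\b(high[\s-]?traffic|many[\s-]requests|heavy[\s-]load)\b",
}

PERFORMANCE_PATTERNS: Dict[str, str] = {
    "low-latency": r"\b(low[\s-]?latency|real[\s-]?time|milliseconds?)\b",
    "memory-efficient": r"\b(memory[\s-]efficient|low[\s-]memory)\b",
}


def match_labels(query: str, patterns: Dict[str, str]) -> List[str]:
    """Return the labels whose pattern occurs in the query, in table order."""
    text = query.lower()
    return [label for label, pattern in patterns.items() if re.search(pattern, text)]


if __name__ == "__main__":
    q = "We need a low-latency cache for a sharded, high traffic service"
    print(match_labels(q, SCALE_PATTERNS))
    print(match_labels(q, PERFORMANCE_PATTERNS))
```

Because each category is just a label mapped to a regular expression, adding a new signal is a one-line change to the table.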

To use the system, create a finder with create_local_finder or create_remote_finder, then call its find_solutions method with a natural language query. The call extracts context from the query, searches the knowledge base for candidates, ranks them, constructs a prompt, invokes the LLM, and returns a formatted report with recommendations for that problem context. If any step fails, find_solutions returns an error message string instead of a report.
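
The same flow can be exercised without a GPU or an API key by substituting a stub for the model. The sketch below is a simplified illustration under stated assumptions, not the implementation above: StubLLM, KNOWLEDGE_BASE, and the module-level helper functions are hypothetical names, and the knowledge base is reduced to two entries.

```python
import re
from typing import Dict, List, Set


class StubLLM:
    """Hypothetical stand-in for a real backend; returns a canned answer."""

    def generate(self, prompt: str, **kwargs) -> str:
        return f"(stub answer for a {len(prompt)}-character prompt)"


# Hypothetical two-entry knowledge base: name -> description
KNOWLEDGE_BASE: Dict[str, str] = {
    "Circuit Breaker": "stops calling a failing remote service until it recovers",
    "Bloom Filter": "probabilistic set membership test with low memory use",
}

STOP_WORDS = {"the", "a", "an", "for", "to", "of", "with", "need", "how"}


def extract_keywords(query: str) -> Set[str]:
    """Step 1: reduce the query to lowercase content words."""
    words = re.findall(r"\w+", query.lower())
    return {w for w in words if w not in STOP_WORDS and len(w) > 2}


def find_candidates(keywords: Set[str]) -> List[str]:
    """Step 2: keep entries whose name or description mentions a keyword."""
    hits = []
    for name, description in KNOWLEDGE_BASE.items():
        text = f"{name} {description}".lower()
        if any(keyword in text for keyword in keywords):
            hits.append(name)
    return hits


def build_prompt(query: str, candidates: List[str]) -> str:
    """Step 3: give the model the query plus the shortlisted entries."""
    lines = ["USER QUERY:", query, "", "CANDIDATES:"]
    lines += [f"- {name}: {KNOWLEDGE_BASE[name]}" for name in candidates]
    return "\n".join(lines)


def find_solutions(query: str, llm) -> str:
    """Steps 4 and 5: call the model and wrap its answer in a report."""
    candidates = find_candidates(extract_keywords(query))
    answer = llm.generate(build_prompt(query, candidates), max_tokens=256)
    matches = [f"- {name}" for name in candidates]
    return "\n".join(["QUERY:", query, "", "RECOMMENDATIONS:", answer,
                      "", "KNOWLEDGE BASE MATCHES:", *matches])


if __name__ == "__main__":
    print(find_solutions("Need to protect calls to a failing remote service", StubLLM()))
```

A stub like this is also a convenient seam for unit tests, since everything except the model call is deterministic.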

The knowledge base includes examples spanning graph algorithms, probabilistic data structures, distributed systems algorithms, cryptography, caching algorithms, resilience patterns, architectural patterns, distributed transaction patterns, data access patterns, and various architectural tactics. This breadth lets the system ground its recommendations in concrete candidates across many problem domains.
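
How entries from such a knowledge base are ranked can be shown with a small, self-contained sketch. It reuses the keyword weights from the listing above (name 3.0, description 1.0, tag 1.5) but is otherwise an assumption-laden simplification: the Entry dataclass and the relevance and rank helpers are hypothetical, and domain and quality-attribute scoring are left out.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Entry:
    """Hypothetical, simplified knowledge base entry."""
    name: str
    description: str
    tags: List[str] = field(default_factory=list)


def relevance(entry: Entry, keywords: Set[str]) -> float:
    """Sum the keyword weights: name 3.0, description 1.0, exact tag 1.5."""
    score = 0.0
    for keyword in keywords:
        if keyword in entry.name.lower():
            score += 3.0
        if keyword in entry.description.lower():
            score += 1.0
        if keyword in entry.tags:
            score += 1.5
    return score


def rank(entries: Iterable[Entry], keywords: Set[str]) -> List[Entry]:
    """Drop duplicate names, then sort by descending relevance (stable sort)."""
    unique = {entry.name: entry for entry in entries}
    return sorted(unique.values(), key=lambda e: relevance(e, keywords), reverse=True)


if __name__ == "__main__":
    bloom = Entry("Bloom Filter", "probabilistic set membership with low memory",
                  ["probabilistic", "memory"])
    dijkstra = Entry("Dijkstra", "shortest path in a weighted graph", ["graph"])
    lru = Entry("LRU Cache", "evicts least recently used entries to bound memory",
                ["caching", "memory"])
    for entry in rank([bloom, dijkstra, lru, bloom], {"memory", "cache"}):
        print(entry.name, relevance(entry, {"memory", "cache"}))
```

Keying the de-duplication on the entry name avoids requiring the entries to be hashable, which a plain dataclass with list fields is not.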

Thursday, June 18, 2026
