Sunday, March 15, 2026

BUILDING AN LLM-POWERED ANTLR PARSER GENERATOR




Introduction


The development of domain-specific languages and parsers has traditionally required deep expertise in compiler construction and formal language theory. This article describes how to build a chatbot that uses Large Language Models (LLMs) to generate ANTLR v4 lexers and parsers from user specifications. The system accepts either a request for a named language (for example, "a JSON parser") or a grammar written in Backus-Naur Form (BNF), and it runs the model on a GPU when one is available.


The proposed architecture combines the power of modern LLMs with established parsing technologies to democratize parser generation. Users can simply describe their parsing needs in natural language or provide formal grammar specifications, and the system will automatically generate complete parser implementations with detailed guidance for refinement and deployment.


System Architecture Overview


The LLM-powered ANTLR generator consists of several interconnected components that work together to transform user requests into functional parsers. The core architecture follows clean architecture principles with clear separation of concerns and dependency inversion.


The primary components include the LLM Interface Layer, which handles communication with both local and remote language models, the Grammar Search Engine for discovering existing ANTLR grammars, the BNF Conversion Module for transforming Backus-Naur Form specifications into ANTLR syntax, the ANTLR Generation Engine for producing parser code, and the Result Summarization Component for providing user guidance.


The system leverages GPU acceleration through a unified GPU abstraction layer that supports NVIDIA CUDA, AMD ROCm, and Apple Metal Performance Shaders. This enables efficient processing of large language models while maintaining compatibility across different hardware platforms.


Core Component Design


The LLM Interface Layer serves as the central communication hub between user requests and the language model. This component abstracts the differences between local and remote LLM deployments, providing a consistent interface for natural language processing tasks.


    from transformers import AutoTokenizer  # Hugging Face tokenizer loader

    class LLMInterface:
        def __init__(self, model_config, gpu_config):
            self.model_config = model_config
            self.gpu_accelerator = GPUAccelerator(gpu_config)
            self.tokenizer = self._initialize_tokenizer()
            self.model = self._load_model()  # _load_model is not shown in this excerpt

        def _initialize_tokenizer(self):
            # Initialize tokenizer based on model configuration
            if self.model_config.model_type == "local":
                return AutoTokenizer.from_pretrained(self.model_config.model_path)
            else:
                return RemoteTokenizer(self.model_config.api_endpoint)

        def process_request(self, user_input, context):
            # Process user input and generate appropriate response
            tokens = self.tokenizer.encode(user_input)
            with self.gpu_accelerator.context():
                response = self.model.generate(tokens, context)
            return self.tokenizer.decode(response)


The LLM Interface Layer handles the complexity of model loading and GPU memory management. When processing requests, it ensures optimal utilization of available hardware resources while maintaining consistent response quality across different deployment scenarios.
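As a minimal sketch of how local and remote deployments can sit behind one interface, the following uses a `typing.Protocol`. The names `TextBackend`, `EchoLocalBackend`, `EchoRemoteBackend`, and `make_backend` are hypothetical and are not part of the system described here; the two backends only echo their input so that the example runs without a model or a network connection.

```python
from typing import Protocol


class TextBackend(Protocol):
    """Anything that can turn a prompt into a completion."""

    def complete(self, prompt: str) -> str: ...


class EchoLocalBackend:
    """Stand-in for a locally loaded model (hypothetical)."""

    def complete(self, prompt: str) -> str:
        return f"[local] {prompt}"


class EchoRemoteBackend:
    """Stand-in for a remote API client (hypothetical); makes no network call."""

    def __init__(self, api_endpoint: str):
        self.api_endpoint = api_endpoint

    def complete(self, prompt: str) -> str:
        return f"[remote:{self.api_endpoint}] {prompt}"


def make_backend(model_type: str, api_endpoint: str = "") -> TextBackend:
    """Select a backend using the same "local"/"remote" switch as model_config."""
    if model_type == "local":
        return EchoLocalBackend()
    if model_type == "remote":
        return EchoRemoteBackend(api_endpoint)
    raise ValueError(f"unknown model_type: {model_type!r}")
```

Callers depend only on `complete`, so swapping a local model for a hosted one does not touch the rest of the pipeline.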


The Grammar Search Engine looks for existing ANTLR grammar files on the web. Given a language name, it queries several search providers, validates the grammars it finds, and returns the best-ranked candidate.


    class GrammarSearchEngine:
        def __init__(self, search_config):
            self.search_providers = self._initialize_providers(search_config)
            self.grammar_validator = ANTLRGrammarValidator()
            self.cache = GrammarCache()

        def search_grammar(self, language_name):
            # Search for existing ANTLR grammars for the specified language
            cached_result = self.cache.get(language_name)
            if cached_result:
                return cached_result

            search_terms = self._generate_search_terms(language_name)
            results = []

            for provider in self.search_providers:
                provider_results = provider.search(search_terms)
                validated_results = self._validate_grammars(provider_results)
                results.extend(validated_results)

            best_grammar = self._rank_and_select(results)
            self.cache.store(language_name, best_grammar)
            return best_grammar


The search engine employs multiple search strategies including GitHub repository searches, academic paper repositories, and specialized ANTLR grammar collections. Each discovered grammar undergoes validation to ensure syntactic correctness and completeness before being considered for use.
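The validation and ranking step could start with a cheap structural check before the ANTLR tool is ever invoked. The sketch below is an assumption about how such a pre-filter might look: `quick_check` and `rank` are hypothetical helpers, the regular expressions only recognize rules that start at the beginning of a line, and ranking by rule count is an arbitrary placeholder heuristic. A real validator would run the ANTLR tool on each candidate.

```python
import re

# "grammar X;", "lexer grammar X;" or "parser grammar X;"
HEADER = re.compile(
    r"^\s*(?:lexer\s+|parser\s+)?grammar\s+([A-Za-z_]\w*)\s*;", re.MULTILINE
)
# A rule name at the start of a line followed by a colon
RULE = re.compile(r"^\s*(?:fragment\s+)?([A-Za-z_]\w*)\s*:", re.MULTILINE)


def quick_check(text: str) -> dict:
    """Extract the grammar name and rule names without running ANTLR."""
    header = HEADER.search(text)
    rules = RULE.findall(text)
    return {
        "name": header.group(1) if header else None,
        # ANTLR convention: parser rules start lowercase, lexer rules uppercase
        "parser_rules": [r for r in rules if r[0].islower()],
        "lexer_rules": [r for r in rules if r[0].isupper()],
    }


def rank(candidates: list) -> list:
    """Drop candidates that fail the check; order the rest by rule count."""
    scored = []
    for text in candidates:
        info = quick_check(text)
        if info["name"] and info["parser_rules"]:
            score = len(info["parser_rules"]) + len(info["lexer_rules"])
            scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]
```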


The BNF Conversion Module transforms Backus-Naur Form specifications into valid ANTLR v4 grammar syntax while preserving the meaning of the original specification.


    class BNFConverter:
        def __init__(self):
            self.bnf_parser = BNFParser()
            self.antlr_generator = ANTLRGrammarGenerator()
            self.semantic_analyzer = SemanticAnalyzer()

        def convert_bnf_to_antlr(self, bnf_specification):
            # Parse BNF specification and convert to ANTLR grammar
            bnf_ast = self.bnf_parser.parse(bnf_specification)
            semantic_model = self.semantic_analyzer.analyze(bnf_ast)
            antlr_grammar = self.antlr_generator.generate(semantic_model)
            return antlr_grammar

        def _handle_bnf_constructs(self, bnf_node):
            # Convert specific BNF constructs to ANTLR equivalents
            if bnf_node.type == "ALTERNATIVE":
                return self._convert_alternatives(bnf_node)
            elif bnf_node.type == "SEQUENCE":
                return self._convert_sequence(bnf_node)
            elif bnf_node.type == "OPTIONAL":
                return self._convert_optional(bnf_node)
            elif bnf_node.type == "REPETITION":
                return self._convert_repetition(bnf_node)
            else:
                # Fail loudly instead of silently returning None
                raise ValueError(f"Unsupported BNF construct: {bnf_node.type}")


The BNF Converter handles the nuanced differences between BNF notation and ANTLR syntax. It recognizes common BNF patterns and transforms them into idiomatic ANTLR constructs while maintaining the original language semantics.
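To make the mapping concrete, here is a small sketch that rewrites a single rule. It assumes a common notation in which nonterminals are written `<like-this>`, terminals are double-quoted, `[ ... ]` marks an optional part, and `{ ... }` marks repetition (the last two come from extended BNF). The function `bnf_rule_to_antlr` is hypothetical and handles only these constructs.

```python
import re

# One token per match: <nonterminal>, "literal", or a structural symbol
TOKEN = re.compile(r'<([^<>]+)>|"([^"]*)"|(::=|[|\[\]{}])')

# Structural symbols and their ANTLR counterparts
SYMBOLS = {"::=": ":", "|": "|", "[": "(", "]": ")?", "{": "(", "}": ")*"}


def bnf_rule_to_antlr(rule: str) -> str:
    """Translate one BNF/EBNF rule into an ANTLR parser rule."""
    out = []
    for nonterminal, literal, symbol in TOKEN.findall(rule):
        if nonterminal:
            # ANTLR parser rule names must start with a lowercase letter
            name = nonterminal.strip().lower()
            out.append(name.replace("-", "_").replace(" ", "_"))
        elif symbol:
            out.append(SYMBOLS[symbol])
        else:
            # ANTLR string literals use single quotes
            escaped = literal.replace("\\", "\\\\").replace("'", "\\'")
            out.append("'" + escaped + "'")
    return " ".join(out) + " ;"


if __name__ == "__main__":
    print(bnf_rule_to_antlr('<expr> ::= <term> { "+" <term> }'))
    print(bnf_rule_to_antlr('<if-stmt> ::= "if" <cond> [ "else" <stmt> ]'))
```

A full converter works on a parsed tree rather than on tokens, which is what lets it detect ambiguities and left recursion; the token-level version is only meant to show the correspondence between the two notations.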


GPU Acceleration Framework


The GPU acceleration framework provides a unified interface for leveraging different GPU architectures. This abstraction layer enables the system to automatically detect and utilize available GPU resources regardless of the underlying hardware platform.


    import torch

    class GPUAccelerator:
        def __init__(self, gpu_config):
            self.gpu_config = gpu_config
            self.gpu_type = self._detect_gpu_type()
            self.device_manager = self._create_device_manager()
            self.memory_manager = GPUMemoryManager(self.gpu_type)

        def _detect_gpu_type(self):
            # Automatically detect available GPU hardware
            if torch.cuda.is_available():
                # ROCm builds of PyTorch expose AMD GPUs through the torch.cuda
                # API, so torch.version.hip is used to tell the two apart
                if getattr(torch.version, "hip", None):
                    return "AMD_ROCM"
                return "NVIDIA_CUDA"
            elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
                return "APPLE_MPS"
            else:
                return "CPU_FALLBACK"

        def context(self):
            # Provide GPU context for model operations
            return self.device_manager.get_context()


The GPU acceleration framework automatically optimizes memory allocation and computation scheduling based on the detected hardware. For NVIDIA GPUs, it utilizes CUDA cores and Tensor cores when available. For AMD hardware, it leverages ROCm for compute acceleration. Apple Silicon devices benefit from Metal Performance Shaders integration for efficient neural network operations.


The framework also manages memory so that large language models fit on the available device. It employs techniques such as model sharding and dynamic batching to maximize throughput while preventing out-of-memory errors.
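Dynamic batching can be illustrated without any GPU code. The sketch below groups requests so that the padded size of each batch (number of requests times the longest request) stays within a token budget. The function name `batch_by_token_budget` and the budget-based policy are assumptions made for illustration; the article does not specify the batching rule.

```python
from typing import List


def batch_by_token_budget(lengths: List[int], max_tokens: int) -> List[List[int]]:
    """Group request indices so that len(batch) * longest_request <= max_tokens.

    Requests are sorted by length first so that short and long prompts do not
    share a batch and waste memory on padding.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches: List[List[int]] = []
    current: List[int] = []
    for i in order:
        # Sorted ascending, so the newest request is the longest in the batch
        padded_size = (len(current) + 1) * lengths[i]
        if current and padded_size > max_tokens:
            batches.append(current)
            current = [i]
        else:
            current.append(i)
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Three short prompts and two long ones, with a budget of 400 tokens
    print(batch_by_token_budget([10, 200, 12, 190, 11], max_tokens=400))
```

A request that exceeds the budget on its own still gets a batch to itself, so the caller has to decide separately whether to truncate or reject it.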


ANTLR Generation Pipeline


The ANTLR Generation Pipeline orchestrates the entire process from user input to final parser generation. This component coordinates between all other modules to ensure smooth execution and proper error handling throughout the generation process.


    class ANTLRGenerationPipeline:
        def __init__(self, config):
            self.llm_interface = LLMInterface(config.llm_config, config.gpu_config)
            self.grammar_search = GrammarSearchEngine(config.search_config)
            self.bnf_converter = BNFConverter()
            self.antlr_compiler = ANTLRCompiler()
            # The summarizer needs the LLM to write its report
            self.result_summarizer = ResultSummarizer(self.llm_interface)

        def generate_parser(self, user_request):
            # Main pipeline for parser generation
            request_analysis = self.llm_interface.analyze_request(user_request)

            if request_analysis.input_type == "LANGUAGE_NAME":
                grammar = self.grammar_search.search_grammar(request_analysis.language)
            elif request_analysis.input_type == "BNF_SPECIFICATION":
                grammar = self.bnf_converter.convert_bnf_to_antlr(request_analysis.bnf)
            else:
                raise UnsupportedRequestTypeError("Unknown request type")

            parser_code = self.antlr_compiler.compile_grammar(
                grammar,
                request_analysis.target_language
            )

            summary = self.result_summarizer.create_summary(
                grammar,
                parser_code,
                request_analysis
            )

            return GenerationResult(grammar, parser_code, summary)


The pipeline implements comprehensive error handling and recovery mechanisms. When grammar search fails, it can fall back to LLM-generated grammars. If BNF conversion encounters ambiguities, it requests clarification from the user through natural language interaction.
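The fallback from grammar search to an LLM-generated grammar can be expressed as an ordered list of sources that are tried in turn. This is a sketch under that assumption: `first_available_grammar` and the source callables are hypothetical, and the stand-in functions in the usage example return fixed values instead of searching the web or calling a model.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A source is a (name, fetch) pair; fetch returns grammar text or None
GrammarSource = Tuple[str, Callable[[str], Optional[str]]]


def first_available_grammar(
    language: str, sources: Sequence[GrammarSource]
) -> Tuple[str, str]:
    """Return (source_name, grammar) from the first source that succeeds."""
    failures: List[str] = []
    for name, fetch in sources:
        try:
            grammar = fetch(language)
        except Exception as exc:  # a failing source must not stop the chain
            failures.append(f"{name}: {exc}")
            continue
        if grammar:
            return name, grammar
        failures.append(f"{name}: no result")
    raise LookupError(f"no grammar for {language!r} ({'; '.join(failures)})")


if __name__ == "__main__":
    def search(language: str) -> Optional[str]:
        return None  # pretend the web search found nothing

    def llm(language: str) -> Optional[str]:
        return f"grammar {language.capitalize()};"  # placeholder for model output

    print(first_available_grammar("calc", [("search", search), ("llm", llm)]))
```

Returning the source name alongside the grammar lets the summary tell the user whether the grammar was found or generated, which matters for how much review it needs.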


The ANTLR Compiler component wraps the standard ANTLR tool chain and provides additional functionality for multi-language code generation. It supports Java, Python, C#, JavaScript, Go, and C++ target languages with appropriate runtime library integration.


    class ANTLRCompiler:
        def __init__(self):
            self.antlr_tool = ANTLRTool()
            self.code_generators = self._initialize_generators()

        def compile_grammar(self, grammar, target_language):
            # Compile ANTLR grammar to target language
            grammar_file = self._write_grammar_file(grammar)

            compilation_result = self.antlr_tool.compile(
                grammar_file,
                target_language
            )

            if compilation_result.has_errors():
                return self._handle_compilation_errors(compilation_result)

            generated_code = self._collect_generated_files(compilation_result)
            return self._package_result(generated_code, target_language)


The compiler automatically handles ANTLR tool invocation, manages temporary files, and collects all generated artifacts. It also performs post-processing to integrate runtime libraries and generate example usage code.
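Tool invocation comes down to building a `java -jar` command line. The sketch below assembles one using the ANTLR 4 tool's `-Dlanguage`, `-o`, and `-visitor` options; the function name `antlr_command`, the jar path, and the file names in the usage example are placeholders.

```python
from typing import List

# Map user-facing names to the values the ANTLR 4 tool expects for -Dlanguage
TARGETS = {
    "java": "Java",
    "python": "Python3",
    "c#": "CSharp",
    "javascript": "JavaScript",
    "go": "Go",
    "c++": "Cpp",
}


def antlr_command(
    jar_path: str,
    grammar_file: str,
    target: str,
    out_dir: str,
    visitor: bool = False,
) -> List[str]:
    """Build the argument list for running the ANTLR tool on one grammar."""
    try:
        language = TARGETS[target.lower()]
    except KeyError:
        raise ValueError(f"unsupported target language: {target!r}") from None
    cmd = ["java", "-jar", jar_path, f"-Dlanguage={language}", "-o", out_dir]
    if visitor:
        cmd.append("-visitor")  # also generate a visitor interface
    cmd.append(grammar_file)
    return cmd


if __name__ == "__main__":
    print(antlr_command("antlr-complete.jar", "Calculator.g4", "Python", "build"))
```

The list form can be passed directly to `subprocess.run(cmd, capture_output=True, text=True)`, which avoids shell quoting problems with user-supplied file names.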


Natural Language Processing Integration


The natural language processing capabilities enable the system to understand complex user requests and provide intelligent responses. The LLM integration goes beyond simple text generation to include semantic understanding of grammar specifications and parser requirements.


    class RequestAnalyzer:
        def __init__(self, llm_interface):
            self.llm_interface = llm_interface
            self.intent_classifier = IntentClassifier()
            self.entity_extractor = EntityExtractor()

        def analyze_request(self, user_input):
            # Analyze user request to determine processing strategy
            intent = self.intent_classifier.classify(user_input)
            entities = self.entity_extractor.extract(user_input)

            if intent == "LANGUAGE_PARSER_REQUEST":
                return LanguageRequest(
                    language=entities.get("language_name"),
                    target_language=entities.get("target_language", "Java"),
                    features=entities.get("language_features", [])
                )
            elif intent == "BNF_CONVERSION_REQUEST":
                return BNFRequest(
                    bnf_specification=entities.get("bnf_text"),
                    target_language=entities.get("target_language", "Java"),
                    grammar_name=entities.get("grammar_name", "CustomGrammar")
                )
            else:
                # Avoid returning None for intents the pipeline cannot handle
                raise UnsupportedRequestTypeError(f"Unrecognized intent: {intent}")


The request analyzer employs sophisticated natural language understanding techniques to extract structured information from user queries. It can handle ambiguous requests by asking clarifying questions and maintains conversation context across multiple interactions.


The system also implements advanced prompt engineering techniques to ensure consistent and accurate responses from the underlying language model. These prompts are carefully crafted to elicit specific types of information while maintaining natural conversation flow.
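One common way to get consistent, machine-readable answers is to ask the model for a JSON object and to validate the reply before using it. The sketch below assumes that approach; the prompt wording and the helpers `build_prompt` and `parse_reply` are illustrative and not taken from the system itself. The field names mirror the `input_type` values used by the pipeline.

```python
import json

PROMPT_TEMPLATE = """You are a request classifier for a parser generator.
Reply with a single JSON object and nothing else, using these keys:
  "input_type": "LANGUAGE_NAME" or "BNF_SPECIFICATION"
  "language": name of the language to parse, or null
  "target_language": language of the generated parser code

User request:
{request}
"""

ALLOWED_TYPES = {"LANGUAGE_NAME", "BNF_SPECIFICATION"}


def build_prompt(user_request: str) -> str:
    """Insert the user's text into the fixed classification prompt."""
    return PROMPT_TEMPLATE.format(request=user_request)


def parse_reply(reply: str) -> dict:
    """Extract and validate the JSON object from a model reply."""
    # Models often wrap JSON in extra prose, so cut out the outermost braces
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object in model reply")
    data = json.loads(reply[start:end + 1])
    if data.get("input_type") not in ALLOWED_TYPES:
        raise ValueError(f"unexpected input_type: {data.get('input_type')!r}")
    data.setdefault("target_language", "Java")  # same default as the analyzer
    return data
```

Rejecting replies that fail validation gives the caller a clear point at which to retry or to ask the user a clarifying question.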


Result Summarization and User Guidance


The Result Summarization Component generates comprehensive reports that help users understand what was created and how to proceed with their generated parsers. This component leverages the LLM's natural language generation capabilities to produce clear, actionable guidance.


    class ResultSummarizer:
        def __init__(self, llm_interface):
            self.llm_interface = llm_interface
            self.template_engine = SummaryTemplateEngine()

        def create_summary(self, grammar, parser_code, request_analysis):
            # Generate comprehensive summary of generation results
            grammar_analysis = self._analyze_grammar_structure(grammar)
            code_analysis = self._analyze_generated_code(parser_code)

            summary_context = {
                "grammar_structure": grammar_analysis,
                "code_structure": code_analysis,
                "target_language": request_analysis.target_language,
                "original_request": request_analysis.original_text
            }

            summary_text = self.llm_interface.generate_summary(summary_context)
            refinement_suggestions = self._generate_refinement_suggestions(
                grammar_analysis,
                code_analysis
            )

            return ParserSummary(summary_text, refinement_suggestions)


The summarization component analyzes the generated grammar and code to identify potential areas for improvement. It provides specific suggestions for enhancing parser performance, adding error handling, and extending functionality.


The component also generates example usage code and integration instructions tailored to the target programming language. This includes dependency management, build configuration, and testing strategies appropriate for the generated parser.


Error Handling and Recovery Strategies


Robust error handling is essential for a production-ready parser generation system. The architecture implements multiple layers of error detection and recovery to ensure graceful handling of various failure scenarios.


    class ErrorHandler:
        def __init__(self):
            self.error_strategies = {
                "GRAMMAR_SEARCH_FAILED": self._handle_search_failure,
                "BNF_CONVERSION_ERROR": self._handle_conversion_error,
                "ANTLR_COMPILATION_ERROR": self._handle_compilation_error,
                "GPU_MEMORY_ERROR": self._handle_gpu_error
            }

        def handle_error(self, error_type, error_context):
            # Route errors to appropriate handling strategies
            if error_type in self.error_strategies:
                return self.error_strategies[error_type](error_context)
            else:
                return self._handle_unknown_error(error_type, error_context)

        def _handle_search_failure(self, context):
            # Fallback to LLM-generated grammar when search fails
            fallback_grammar = self._generate_grammar_from_llm(context.language)
            return RecoveryResult("GRAMMAR_GENERATED", fallback_grammar)


The error handling system implements progressive fallback strategies. When automated grammar search fails, the system can generate grammars using the LLM's knowledge of programming languages. If BNF conversion encounters ambiguities, it engages in clarifying dialogue with the user.


For GPU-related errors, the system automatically falls back to CPU processing while notifying the user of reduced performance. Memory management errors trigger automatic model optimization and batch size adjustment.
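Batch size adjustment after a memory error can be sketched as a retry loop that halves the batch until the work fits. The helper `run_with_backoff` is hypothetical, and Python's built-in `MemoryError` stands in for the out-of-memory exception that a GPU framework would raise in a real deployment.

```python
from typing import Callable, List, Sequence


def run_with_backoff(
    items: Sequence[str],
    run_batch: Callable[[Sequence[str]], List[str]],
    batch_size: int,
) -> List[str]:
    """Process items in batches, halving the batch size after each memory error."""
    results: List[str] = []
    position = 0
    while position < len(items):
        batch = items[position:position + batch_size]
        try:
            results.extend(run_batch(batch))
            position += len(batch)
        except MemoryError:
            if batch_size == 1:
                raise  # a single item does not fit; the caller must fall back
            batch_size = max(1, batch_size // 2)
    return results
```

When even a batch of one fails, the error propagates, and that is the point at which a CPU fallback of the kind described above would take over.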


Performance Optimization Techniques


The system employs various optimization techniques to ensure responsive performance even when processing complex grammar specifications or large language models. These optimizations span multiple system layers from GPU utilization to caching strategies.


    class PerformanceOptimizer:
        def __init__(self, system_config):
            self.system_config = system_config
            self.gpu_optimizer = GPUOptimizer()
            self.cache_manager = CacheManager()
            self.model_optimizer = ModelOptimizer()

        def optimize_inference(self, model, input_data):
            # Apply various optimization techniques for model inference
            optimized_model = self.model_optimizer.optimize(model)

            if self.gpu_optimizer.supports_mixed_precision():
                optimized_model = self.gpu_optimizer.enable_mixed_precision(optimized_model)

            batch_size = self.gpu_optimizer.calculate_optimal_batch_size(
                optimized_model,
                input_data
            )

            return self._run_optimized_inference(optimized_model, input_data, batch_size)


The performance optimization framework implements dynamic batching to maximize GPU utilization, mixed-precision inference on supported hardware, and caching of frequently requested grammars and model outputs.


The system also employs model quantization techniques when appropriate to reduce memory usage while maintaining output quality. For local model deployments, it supports model sharding across multiple GPUs when available.
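The effect of quantization on memory is simple arithmetic: weight memory is roughly the parameter count times the bytes per parameter. The sketch below works this out for a hypothetical model with 7 billion parameters. It counts weights only and ignores activations, the key-value cache, and framework overhead, so real usage is higher.

```python
# Bytes needed to store one parameter at each precision
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate memory for model weights, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in ("fp32", "fp16", "int8", "int4"):
        print(f"{dtype}: {weight_memory_gib(7e9, dtype):.2f} GiB")
```

Under these assumptions, halving the precision halves the weight footprint, which is often what decides whether a model fits on a single GPU or has to be sharded.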


Security and Privacy Considerations


Security and privacy are paramount when building systems that process user code and grammar specifications. The architecture implements multiple security layers to protect user data and prevent malicious code execution.


    class SecurityManager:
        def __init__(self):
            self.input_sanitizer = InputSanitizer()
            self.code_analyzer = CodeSecurityAnalyzer()
            self.sandbox_manager = SandboxManager()

        def validate_user_input(self, user_input):
            # Sanitize and validate user input for security threats
            sanitized_input = self.input_sanitizer.sanitize(user_input)

            if self.input_sanitizer.contains_malicious_patterns(sanitized_input):
                raise SecurityViolationError("Potentially malicious input detected")

            return sanitized_input

        def execute_antlr_compilation(self, grammar_file):
            # Execute ANTLR compilation in sandboxed environment
            with self.sandbox_manager.create_sandbox() as sandbox:
                return sandbox.execute_antlr(grammar_file)


The security framework implements input sanitization to prevent injection attacks, sandboxed execution environments for ANTLR compilation, and comprehensive logging for security auditing. All generated code undergoes static analysis to identify potential security vulnerabilities.


For remote LLM deployments, the system implements secure communication protocols and ensures that sensitive grammar specifications are not inadvertently stored or logged by external services.
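One concrete piece of input sanitization concerns the grammar name. ANTLR expects the file name to match the grammar name, so a user-supplied name ends up in a file path and should be checked before anything is written. The sketch below shows one way to do that; `write_grammar_safely` and the 64-character limit are assumptions, not part of the system described here.

```python
import re
import tempfile
from pathlib import Path

# Letters, digits and underscores only: no path separators, no dots
SAFE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,63}")


def write_grammar_safely(grammar_name: str, grammar_text: str, workdir: str) -> Path:
    """Write <grammar_name>.g4 inside workdir, rejecting unsafe names."""
    if not SAFE_NAME.fullmatch(grammar_name):
        raise ValueError(f"unsafe grammar name: {grammar_name!r}")
    path = Path(workdir) / f"{grammar_name}.g4"
    path.write_text(grammar_text, encoding="utf-8")
    return path


if __name__ == "__main__":
    # A fresh temporary directory per request keeps generated files isolated
    with tempfile.TemporaryDirectory() as workdir:
        print(write_grammar_safely("Calculator", "grammar Calculator;", workdir).name)
```

Name validation complements the sandbox rather than replacing it: the sandbox limits what the tool can do, while the check keeps files from being written outside the working directory in the first place.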


Testing and Quality Assurance


Comprehensive testing ensures the reliability and correctness of generated parsers. The system implements automated testing frameworks that validate both the generation process and the resulting parser implementations.


    class QualityAssuranceFramework:
        def __init__(self):
            self.grammar_tester = GrammarTester()
            self.parser_validator = ParserValidator()
            self.performance_profiler = PerformanceProfiler()

        def validate_generated_parser(self, grammar, parser_code, test_cases):
            # Comprehensive validation of generated parser
            grammar_validation = self.grammar_tester.validate_grammar(grammar)

            if not grammar_validation.is_valid():
                return ValidationResult(False, grammar_validation.errors)

            parser_validation = self.parser_validator.validate_parser(
                parser_code,
                test_cases
            )

            performance_metrics = self.performance_profiler.profile_parser(
                parser_code,
                test_cases
            )

            return ValidationResult(
                parser_validation.is_valid(),
                parser_validation.errors,
                performance_metrics
            )


The quality assurance framework automatically generates test cases based on grammar specifications, validates parser correctness against known language samples, and profiles performance characteristics to identify potential bottlenecks.


The testing system also includes regression testing capabilities to ensure that system updates do not break existing functionality. It maintains a comprehensive test suite covering various programming languages and grammar patterns.
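Generating test cases from a grammar can be done by expanding rules at random. The sketch below uses a tiny hand-written grammar table for arithmetic expressions rather than a parsed ANTLR file; the table `GRAMMAR`, the `generate` function, and the depth limit are assumptions chosen to keep the example short.

```python
import random
from typing import List

# Each rule maps to a list of alternatives; each alternative is a symbol list.
# The first alternative of every rule is non-recursive.
GRAMMAR = {
    "expr": [["term"], ["term", "+", "expr"], ["term", "*", "expr"]],
    "term": [["NUMBER"], ["(", "expr", ")"]],
}


def generate(symbol: str, rng: random.Random, depth: int = 0, max_depth: int = 4) -> List[str]:
    """Expand a symbol into a list of tokens by picking alternatives at random."""
    if symbol == "NUMBER":
        return [str(rng.randint(0, 99))]
    if symbol not in GRAMMAR:
        return [symbol]  # literal token such as "+" or "("
    alternatives = GRAMMAR[symbol]
    # Past the depth limit, take the non-recursive alternative so expansion ends
    chosen = alternatives[0] if depth >= max_depth else rng.choice(alternatives)
    tokens: List[str] = []
    for part in chosen:
        tokens.extend(generate(part, rng, depth + 1, max_depth))
    return tokens


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed makes the test inputs reproducible
    for _ in range(3):
        print(" ".join(generate("expr", rng)))
```

Every sentence produced this way is valid by construction, so it tests acceptance only; checking that the parser rejects bad input needs a separate set of deliberately malformed samples.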


Deployment and Scaling Considerations


The system architecture supports various deployment scenarios from single-user desktop applications to large-scale cloud services. The modular design enables flexible scaling strategies based on usage patterns and resource requirements.


    class DeploymentManager:
        def __init__(self, deployment_config):
            self.config = deployment_config
            self.resource_manager = ResourceManager()
            self.load_balancer = LoadBalancer()
        
        def deploy_system(self):
            # Deploy system components based on configuration
            if self.config.deployment_type == "SINGLE_USER":
                return self._deploy_standalone()
            elif self.config.deployment_type == "MULTI_USER":
                return self._deploy_distributed()
            elif self.config.deployment_type == "CLOUD_SERVICE":
                return self._deploy_cloud_native()
        
        def _deploy_distributed(self):
            # Deploy distributed system with load balancing
            llm_cluster = self.resource_manager.create_llm_cluster()
            grammar_service = self.resource_manager.create_grammar_service()
            compilation_service = self.resource_manager.create_compilation_service()
            
            self.load_balancer.configure_routing(
                llm_cluster, 
                grammar_service, 
                compilation_service
            )


The deployment framework supports horizontal scaling of individual components based on demand. LLM inference can be distributed across multiple GPU nodes, while grammar search and compilation services can scale independently.


For cloud deployments, the system integrates with container orchestration platforms and implements auto-scaling policies based on request volume and resource utilization metrics.
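An auto-scaling policy based on utilization can be as simple as a proportional rule: scale the replica count by the ratio of observed to target utilization and round up. The Kubernetes Horizontal Pod Autoscaler uses a rule of this form. The function name `desired_replicas` and the minimum and maximum bounds are illustrative.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: int,
    target_utilization: int,
    min_replicas: int = 1,
    max_replicas: int = 10,
) -> int:
    """Proportional scaling rule; utilization values are percentages."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    # Clamp so that a metric spike cannot scale the service without limit
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Four LLM workers at 90% GPU utilization with a 60% target
    print(desired_replicas(4, 90, 60))
```

Because LLM inference, grammar search, and compilation scale independently, each service would apply the rule to its own metric, for example GPU utilization for inference and queue length for compilation.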


Future Enhancement Opportunities


The current architecture provides a solid foundation for future enhancements and feature additions. Several areas present opportunities for significant capability improvements and user experience enhancements.


Advanced grammar optimization techniques could automatically refine generated grammars for better performance and maintainability. Machine learning models could learn from user feedback to improve grammar quality over time.


Integration with version control systems would enable collaborative grammar development and change tracking. Advanced IDE plugins could provide real-time grammar assistance and parser debugging capabilities.


Multi-modal input support could allow users to provide grammar specifications through diagrams, flowcharts, or other visual representations. This would make the system accessible to users who prefer visual specification methods.


Running Example Implementation


The following complete implementation demonstrates a calculator language parser generator that showcases all the key concepts discussed in this article. This example processes a user request for a simple arithmetic expression parser and generates a complete ANTLR grammar with Java parser code.


The source code of the complete, general-purpose LLM agent follows.


                             



"""

Complete LLM-Powered ANTLR Parser Generator
Calculator Language Example Implementation
"""

import torch
import requests
import subprocess
import tempfile
import os
import json
from typing import Dict, List, Optional, Tuple
from dataclasses import dataclass
from abc import ABC, abstractmethod


# Configuration classes for system setup
@dataclass
class LLMConfig:
    model_type: str  # "local" or "remote"
    model_path: str
    api_endpoint: Optional[str] = None
    api_key: Optional[str] = None


@dataclass
class GPUConfig:
    enable_gpu: bool = True
    memory_limit: Optional[int] = None
    mixed_precision: bool = True


@dataclass
class SystemConfig:
    llm_config: LLMConfig
    gpu_config: GPUConfig
    antlr_jar_path: str
    temp_directory: str


# Core domain models
@dataclass
class UserRequest:
    original_text: str
    request_type: str  # "LANGUAGE" or "BNF"
    language_name: Optional[str] = None
    bnf_specification: Optional[str] = None
    target_language: str = "Java"


@dataclass
class GenerationResult:
    grammar_content: str
    parser_code: Dict[str, str]  # filename -> content mapping
    summary: str
    refinement_suggestions: List[str]


# GPU Acceleration Framework
class GPUAccelerator:
    def __init__(self, config: GPUConfig):
        self.config = config
        self.device = self._detect_and_configure_device()

    def _detect_and_configure_device(self):
        """Detect and configure the best available GPU device"""
        if not self.config.enable_gpu:
            return torch.device("cpu")

        if torch.cuda.is_available():
            device = torch.device("cuda")
            if self.config.memory_limit:
                # memory_limit is treated as bytes; the fraction passed to
                # PyTorch must not exceed 1.0, so it is clamped
                total_memory = torch.cuda.get_device_properties(0).total_memory
                torch.cuda.set_per_process_memory_fraction(
                    min(1.0, self.config.memory_limit / total_memory)
                )
            return device
        elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
            return torch.device("mps")
        else:
            print("No GPU available, falling back to CPU")
            return torch.device("cpu")

    def get_device(self):
        """Get the configured device for tensor operations"""
        return self.device


# LLM Interface Implementation
class LLMInterface:
    def __init__(self, config: LLMConfig, gpu_accelerator: GPUAccelerator):
        self.config = config
        self.gpu_accelerator = gpu_accelerator
        self.device = gpu_accelerator.get_device()

    def analyze_request(self, user_input: str) -> UserRequest:
        """Analyze user input to determine request type and extract parameters"""
        # Simplified analysis for demonstration
        user_input_lower = user_input.lower()

        if "calculator" in user_input_lower or "arithmetic" in user_input_lower:
            return UserRequest(
                original_text=user_input,
                request_type="LANGUAGE",
                language_name="calculator",
                target_language="Java"
            )
        elif "bnf" in user_input_lower or "::=" in user_input:
            return UserRequest(
                original_text=user_input,
                request_type="BNF",
                bnf_specification=self._extract_bnf_from_input(user_input),
                target_language="Java"
            )
        else:
            # Default to language request
            return UserRequest(
                original_text=user_input,
                request_type="LANGUAGE",
                language_name=self._extract_language_name(user_input),
                target_language="Java"
            )

    def _extract_bnf_from_input(self, user_input: str) -> str:
        """Extract BNF specification from user input"""
        # Simple extraction - in production this would be more sophisticated
        lines = user_input.split('\n')
        bnf_lines = [line for line in lines if '::=' in line or line.strip().startswith('<')]
        return '\n'.join(bnf_lines)

    def _extract_language_name(self, user_input: str) -> str:
        """Extract language name from user input"""
        # Simple keyword extraction - in production this would use NLP
        common_languages = ["java", "python", "c++", "javascript", "calculator", "json", "xml"]
        user_input_lower = user_input.lower()

        # Check longer names first so that "javascript" is not reported as "java"
        for lang in sorted(common_languages, key=len, reverse=True):
            if lang in user_input_lower:
                return lang

        return "unknown"

    def generate_summary(self, generation_result: GenerationResult) -> str:
        """Generate a comprehensive summary of the generation process"""
        summary = f"""
ANTLR Parser Generation Summary
==============================

Generated Grammar: {len(generation_result.grammar_content)} characters
Target Language: Java
Generated Files: {len(generation_result.parser_code)} files

Grammar Structure Analysis:
- The grammar defines a complete parser for the requested language
- Lexer rules handle tokenization of input text
- Parser rules define the syntactic structure

Generated Files:
"""
        for filename in generation_result.parser_code.keys():
            summary += f"- {filename}\n"

        summary += """
Next Steps:
1. Compile the generated Java files with ANTLR runtime dependency
2. Create a main class to instantiate and use the parser
3. Add error handling and semantic actions as needed
4. Test with sample input files

Integration Instructions:
- Add ANTLR runtime JAR to your classpath
- Import the generated parser classes
- Create parser instances and call parse methods
"""
        return summary


# Grammar Search Engine

class GrammarSearchEngine:

    def __init__(self):

        self.known_grammars = {

            "calculator": self._get_calculator_grammar(),

            "json": self._get_json_grammar(),

            "arithmetic": self._get_calculator_grammar()

        }

    

    def search_grammar(self, language_name: str) -> Optional[str]:

        """Search for existing ANTLR grammar for the specified language"""

        if language_name.lower() in self.known_grammars:

            return self.known_grammars[language_name.lower()]

        

        # In production, this would search GitHub, ANTLR grammar repositories, etc.

        print(f"No known grammar found for {language_name}, generating basic template")

        return None

    

    def _get_calculator_grammar(self) -> str:

        """Return a complete calculator grammar"""

        return """

grammar Calculator;


// Parser rules

expr:   expr ('*'|'/') expr

    |   expr ('+'|'-') expr

    |   '(' expr ')'

    |   NUMBER

    ;


// Lexer rules

NUMBER: [0-9]+ ('.' [0-9]+)?;

WS: [ \\t\\r\\n]+ -> skip;

"""

    

    def _get_json_grammar(self) -> str:

        """Return a basic JSON grammar"""

        return """

grammar JSON;


json:   value;


value:  STRING

    |   NUMBER

    |   'true'

    |   'false'

    |   'null'

    |   object

    |   array

    ;


object: '{' pair (',' pair)* '}'

    |   '{' '}'

    ;


pair: STRING ':' value;


array:  '[' value (',' value)* ']'

    |   '[' ']'

    ;


STRING: '"' (~[\\r\\n"\\\\] | '\\\\' .)* '"';

NUMBER: '-'? [0-9]+ ('.' [0-9]+)?;

WS: [ \\t\\r\\n]+ -> skip;

"""


# BNF to ANTLR Converter

class BNFConverter:

    def __init__(self):

        self.conversion_rules = {

            "::=": ":",

            "<": "",

            ">": "",

            "|": "|"

        }

    

    def convert_bnf_to_antlr(self, bnf_specification: str) -> str:

        """Convert BNF specification to ANTLR grammar"""

        lines = bnf_specification.strip().split('\n')

        antlr_lines = []

        

        # Add grammar header

        antlr_lines.append("grammar GeneratedGrammar;")

        antlr_lines.append("")

        

        # Convert each BNF rule

        for line in lines:

            if '::=' in line:

                antlr_line = self._convert_bnf_rule(line)

                antlr_lines.append(antlr_line)

        

        # Add basic lexer rules

        antlr_lines.extend([

            "",

            "// Basic lexer rules",

            "ID: [a-zA-Z][a-zA-Z0-9]*;",

            "NUMBER: [0-9]+;",

            "WS: [ \\t\\r\\n]+ -> skip;"

        ])

        

        return '\n'.join(antlr_lines)

    

    def _convert_bnf_rule(self, bnf_rule: str) -> str:

        """Convert a single BNF rule to ANTLR syntax"""

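        # Strip surrounding whitespace, remove angle brackets and convert the assignment operator.

        # Note: this simple replacement also alters '<' or '>' inside quoted literals.

        converted = bnf_rule.strip().replace('<', '').replace('>', '').replace('::=', ':')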

        

        # Add semicolon if not present

        if not converted.strip().endswith(';'):

            converted += ';'

        

        return converted


# ANTLR Compiler Wrapper

class ANTLRCompiler:

    def __init__(self, antlr_jar_path: str, temp_directory: str):

        self.antlr_jar_path = antlr_jar_path

        self.temp_directory = temp_directory

    

    def compile_grammar(self, grammar_content: str, target_language: str = "Java") -> Dict[str, str]:

        """Compile ANTLR grammar and return generated code"""

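        # ANTLR requires the file name to match the declared grammar name,

        # so derive it from the "grammar Name;" header (fall back to "Grammar")

        grammar_name = "Grammar"

        for line in grammar_content.splitlines():

            stripped = line.strip()

            if stripped.startswith("grammar ") and stripped.endswith(";"):

                grammar_name = stripped[len("grammar "):-1].strip()

                break

        grammar_file = os.path.join(self.temp_directory, f"{grammar_name}.g4")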

        

        with open(grammar_file, 'w') as f:

            f.write(grammar_content)

        

        # Run ANTLR compiler

        cmd = [

            "java", "-jar", self.antlr_jar_path,

            "-Dlanguage=" + target_language,

            "-o", self.temp_directory,

            grammar_file

        ]

        

        try:

            result = subprocess.run(cmd, capture_output=True, text=True, check=True)

            print("ANTLR compilation successful")

        except subprocess.CalledProcessError as e:

            print(f"ANTLR compilation failed: {e.stderr}")

            return {}

        

        # Collect generated files

        generated_files = {}

        for filename in os.listdir(self.temp_directory):

            if filename.endswith('.java') or filename.endswith('.py') or filename.endswith('.cpp'):

                filepath = os.path.join(self.temp_directory, filename)

                with open(filepath, 'r') as f:

                    generated_files[filename] = f.read()

        

        return generated_files


# Main Pipeline Orchestrator

class ANTLRGenerationPipeline:

    def __init__(self, config: SystemConfig):

        self.config = config

        self.gpu_accelerator = GPUAccelerator(config.gpu_config)

        self.llm_interface = LLMInterface(config.llm_config, self.gpu_accelerator)

        self.grammar_search = GrammarSearchEngine()

        self.bnf_converter = BNFConverter()

        self.antlr_compiler = ANTLRCompiler(config.antlr_jar_path, config.temp_directory)

    

    def generate_parser(self, user_input: str) -> GenerationResult:

        """Main pipeline for generating ANTLR parsers from user input"""

        print(f"Processing request: {user_input}")

        

        # Analyze user request

        request = self.llm_interface.analyze_request(user_input)

        print(f"Request type: {request.request_type}")

        

        # Generate or find grammar

        if request.request_type == "LANGUAGE":

            grammar_content = self.grammar_search.search_grammar(request.language_name)

            if not grammar_content:

                grammar_content = self._generate_default_grammar(request.language_name)

        elif request.request_type == "BNF":

            grammar_content = self.bnf_converter.convert_bnf_to_antlr(request.bnf_specification)

        else:

            raise ValueError(f"Unsupported request type: {request.request_type}")

        

        print("Grammar generated successfully")

        

        # Compile grammar to target language

        parser_code = self.antlr_compiler.compile_grammar(grammar_content, request.target_language)

        

        # Generate result summary

        result = GenerationResult(

            grammar_content=grammar_content,

            parser_code=parser_code,

            summary="",

            refinement_suggestions=[]

        )

        

        result.summary = self.llm_interface.generate_summary(result)

        result.refinement_suggestions = self._generate_refinement_suggestions(result)

        

        return result

    

    def _generate_default_grammar(self, language_name: str) -> str:

        """Generate a basic grammar template for unknown languages"""

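        # Keep only characters that are valid in an ANTLR grammar name (e.g. "c++" becomes "C")

        grammar_name = "".join(ch for ch in language_name if ch.isalnum()).capitalize()

        if not grammar_name or not grammar_name[0].isalpha():

            grammar_name = "Generated" + grammar_name

        return f"""

grammar {grammar_name};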


// Main entry point

start: statement+;


statement: expression ';';


expression: ID | NUMBER | STRING;


// Lexer rules

ID: [a-zA-Z][a-zA-Z0-9]*;

NUMBER: [0-9]+;

STRING: '"' (~[\\r\\n"\\\\] | '\\\\' .)* '"';

WS: [ \\t\\r\\n]+ -> skip;

"""

    

    def _generate_refinement_suggestions(self, result: GenerationResult) -> List[str]:

        """Generate suggestions for improving the generated parser"""

        suggestions = [

            "Add semantic actions to build an Abstract Syntax Tree (AST)",

            "Implement error recovery strategies for better error messages",

            "Add support for comments in the language specification",

            "Consider adding operator precedence rules for mathematical expressions",

            "Implement visitor or listener patterns for tree traversal",

            "Add comprehensive unit tests for the parser"

        ]

        return suggestions


# Example usage and demonstration

def main():

    """Demonstrate the complete ANTLR generation pipeline"""

    # Configuration setup

    config = SystemConfig(

        llm_config=LLMConfig(

            model_type="local",

            model_path="gpt2"  # Placeholder for actual model

        ),

        gpu_config=GPUConfig(

            enable_gpu=True,

            mixed_precision=True

        ),

        antlr_jar_path="/path/to/antlr-4.9.2-complete.jar",  # Update with actual path

        temp_directory=tempfile.mkdtemp()

    )

    

    # Create pipeline instance

    pipeline = ANTLRGenerationPipeline(config)

    

    # Example 1: Generate calculator parser

    print("Example 1: Calculator Language Parser")

    print("=" * 50)

    

    calculator_request = "Generate a parser for a simple calculator language that supports arithmetic expressions with numbers, parentheses, and basic operators"

    

    try:

        result = pipeline.generate_parser(calculator_request)

        

        print("Generated Grammar:")

        print("-" * 20)

        print(result.grammar_content)

        print()

        

        print("Generated Files:")

        print("-" * 20)

        for filename, content in result.parser_code.items():

            print(f"File: {filename}")

            print(f"Size: {len(content)} characters")

            print()

        

        print("Summary:")

        print("-" * 20)

        print(result.summary)

        print()

        

        print("Refinement Suggestions:")

        print("-" * 20)

        for i, suggestion in enumerate(result.refinement_suggestions, 1):

            print(f"{i}. {suggestion}")

        

    except Exception as e:

        print(f"Error generating parser: {e}")

    

    # Example 2: BNF conversion

    print("\n\nExample 2: BNF to ANTLR Conversion")

    print("=" * 50)

    

    bnf_request = """

    Convert this BNF to ANTLR:

    <expr> ::= <term> | <expr> '+' <term> | <expr> '-' <term>

    <term> ::= <factor> | <term> '*' <factor> | <term> '/' <factor>

    <factor> ::= <number> | '(' <expr> ')'

    """

    

    try:

        result = pipeline.generate_parser(bnf_request)

        

        print("Converted Grammar:")

        print("-" * 20)

        print(result.grammar_content)

        

    except Exception as e:

        print(f"Error converting BNF: {e}")

    

    # Cleanup

    import shutil

    shutil.rmtree(config.temp_directory)


if __name__ == "__main__":

    main()


This implementation ties together the key concepts discussed in the article. It analyzes natural language requests, looks up existing grammars, converts BNF specifications to ANTLR syntax, compiles grammars with the ANTLR tool, and returns a summary with refinement suggestions.
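The BNF conversion step can also be sketched in isolation. The following is a minimal illustration, not the converter used in the listing: the function name bnf_rule_to_antlr is hypothetical, and it assumes one production per line with single-quoted terminals. Unlike a plain string replacement, it leaves angle brackets inside quoted literals untouched and rewrites hyphenated rule names into valid ANTLR identifiers.

```python
import re


def bnf_rule_to_antlr(rule: str) -> str:
    """Convert one BNF production, e.g. "<expr> ::= <term> | <expr> '+' <term>"."""
    lhs, rhs = rule.split("::=", 1)

    def rename(match: re.Match) -> str:
        # ANTLR parser rule names start with a lowercase letter and may not
        # contain hyphens or spaces, so replace anything else with "_".
        name = re.sub(r"[^0-9A-Za-z_]", "_", match.group(1).strip()) or "rule"
        return name[0].lower() + name[1:]

    # Split the right-hand side into quoted literals and everything else.
    # The capturing group makes re.split keep the literals in the result.
    parts = re.split(r"('(?:[^'\\]|\\.)*')", rhs)

    converted = []
    for part in parts:
        if part.startswith("'"):
            converted.append(part)  # literal such as '<=': copy verbatim
        else:
            part = re.sub(r"<([^<>]+)>", rename, part)
            converted.append(re.sub(r"\s+", " ", part))  # normalize spacing

    lhs_name = re.sub(r"<([^<>]+)>", rename, lhs).strip()
    return f"{lhs_name}: {''.join(converted).strip()};"


if __name__ == "__main__":
    print(bnf_rule_to_antlr("<expr> ::= <term> | <expr> '+' <term>"))
    print(bnf_rule_to_antlr("<cmp-op> ::= '<' | '<=' | '>'"))
```

The sketch does not define lexer rules for terminals such as a number token, so a converted grammar may still reference undefined rules and needs review before it is passed to ANTLR.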


The example includes GPU acceleration support, a modular architecture with clean separation of concerns, basic error handling around grammar compilation and parser generation, and room for extension. The calculator example walks through the entire pipeline, from user request to generated parser code.


The implementation follows clean architecture principles with dependency injection, abstract interfaces, and clear separation between domain logic and infrastructure concerns. Each component can be independently tested and extended without affecting other parts of the system.
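As a rough illustration of why injected dependencies make components easy to test, the sketch below swaps the grammar lookup for an in-memory stub. All names here (GrammarLookup, StubGrammarLookup, GrammarResolver) are hypothetical and do not appear in the listing; they only mirror the search-then-fall-back step of the pipeline.

```python
from typing import Dict, Optional, Protocol


class GrammarLookup(Protocol):
    """Anything that can look up a grammar by language name."""

    def search_grammar(self, language_name: str) -> Optional[str]: ...


class StubGrammarLookup:
    """Test double that serves grammars from an in-memory dictionary."""

    def __init__(self, grammars: Dict[str, str]):
        self._grammars = grammars

    def search_grammar(self, language_name: str) -> Optional[str]:
        return self._grammars.get(language_name.lower())


class GrammarResolver:
    """Hypothetical slice of the pipeline: find a grammar or fall back."""

    def __init__(self, lookup: GrammarLookup, fallback: str):
        # The lookup is passed in rather than constructed here, so tests
        # can supply a stub instead of a network-backed search engine.
        self._lookup = lookup
        self._fallback = fallback

    def resolve(self, language_name: str) -> str:
        return self._lookup.search_grammar(language_name) or self._fallback


if __name__ == "__main__":
    resolver = GrammarResolver(
        StubGrammarLookup({"json": "grammar JSON;"}),
        fallback="grammar Fallback;",
    )
    print(resolver.resolve("JSON"))
    print(resolver.resolve("toml"))
```

Because GrammarResolver depends only on the search_grammar method, the same class would work with a real search engine or with the stub, without any change to its code.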


System Overview


This implementation provides a general-purpose LLM agent that turns free-form user prompts into ANTLR v4 parsers. The agent analyzes each request, searches for an existing grammar when appropriate, converts BNF specifications, generates custom grammars, and runs the ANTLR toolchain to produce the parser code.


The system is not limited to predefined examples or templates: when no existing grammar fits, the language specification or parsing requirement is handed to the LLM to generate one.
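Running the ANTLR toolchain comes down to assembling a command line for the ANTLR tool. The sketch below shows one way to do that. The function name build_antlr_command and its parameter layout are assumptions made for illustration; the flags themselves (-Dlanguage, -visitor, -no-visitor, -listener, -no-listener, -o) are standard ANTLR 4 tool options.

```python
from typing import List


def build_antlr_command(
    jar_path: str,
    grammar_file: str,
    output_dir: str,
    target_language: str = "Java",
    generate_visitor: bool = True,
    generate_listener: bool = True,
    java_path: str = "java",
) -> List[str]:
    """Assemble the argument list for one invocation of the ANTLR 4 tool."""
    return [
        java_path, "-jar", jar_path,
        f"-Dlanguage={target_language}",  # e.g. Java, Python3, Cpp
        "-visitor" if generate_visitor else "-no-visitor",
        "-listener" if generate_listener else "-no-listener",
        "-o", output_dir,
        grammar_file,  # file name must match the declared grammar name
    ]


if __name__ == "__main__":
    # Placeholder paths; pass the resulting list to subprocess.run to execute it.
    print(build_antlr_command("/path/to/antlr-complete.jar", "Calculator.g4", "out",
                              target_language="Python3"))
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.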


COMPLETE IMPLEMENTATION



import os

import sys

import json

import subprocess

import tempfile

import shutil

import requests

import re

import logging

import asyncio

import aiohttp

from typing import Dict, List, Optional, Tuple, Union, Any

from dataclasses import dataclass, asdict, field

from abc import ABC, abstractmethod

from pathlib import Path

from datetime import datetime, timedelta

import hashlib

import yaml

from urllib.parse import quote_plus, urljoin

import xml.etree.ElementTree as ET


# Configure comprehensive logging

logging.basicConfig(

    level=logging.INFO,

    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',

    handlers=[

        logging.FileHandler('antlr_agent.log'),

        logging.StreamHandler(sys.stdout)

    ]

)

logger = logging.getLogger(__name__)


# Core Configuration Classes

@dataclass

class LLMConfig:

    """Configuration for Language Model integration"""

    provider: str  # "openai", "anthropic", "huggingface", "local", "ollama"

    model_name: str

    api_key: Optional[str] = None

    api_endpoint: Optional[str] = None

    max_tokens: int = 8000

    temperature: float = 0.3

    local_model_path: Optional[str] = None

    timeout: int = 120


@dataclass

class GPUConfig:

    """GPU acceleration configuration"""

    enable_gpu: bool = True

    gpu_type: str = "auto"  # "nvidia", "amd", "apple", "auto"

    memory_limit_gb: Optional[float] = None

    mixed_precision: bool = True

    device_id: int = 0


@dataclass

class SearchConfig:

    """Web search configuration for grammar discovery"""

    enable_web_search: bool = True

    github_token: Optional[str] = None

    search_engines: List[str] = field(default_factory=lambda: ["github", "antlr-grammars"])

    max_results_per_source: int = 5

    timeout: int = 30

    cache_duration_hours: int = 24


@dataclass

class ANTLRConfig:

    """ANTLR tool configuration"""

    jar_path: str

    version: str = "4.13.1"

    java_path: str = "java"

    target_languages: List[str] = field(default_factory=lambda: ["Java", "Python3", "Cpp", "CSharp", "JavaScript", "Go"])

    generate_visitor: bool = True

    generate_listener: bool = True


@dataclass

class SystemConfig:

    """Main system configuration"""

    llm_config: LLMConfig

    gpu_config: GPUConfig

    search_config: SearchConfig

    antlr_config: ANTLRConfig

    output_base_directory: str

    temp_directory: str

    enable_caching: bool = True

    max_concurrent_requests: int = 3


# Data Models

@dataclass

class ParsedRequest:

    """Structured representation of user request"""

    original_prompt: str

    intent: str  # "generate_parser", "convert_bnf", "find_grammar", "custom_language"

    language_name: Optional[str] = None

    language_description: Optional[str] = None

    bnf_specification: Optional[str] = None

    ebnf_specification: Optional[str] = None

    target_language: str = "Java"

    grammar_name: Optional[str] = None

    special_requirements: List[str] = field(default_factory=list)

    examples: List[str] = field(default_factory=list)

    confidence: float = 0.0


@dataclass

class GrammarSource:

    """Information about a grammar source"""

    content: str

    source_type: str  # "web", "built-in", "generated"

    url: Optional[str] = None

    quality_score: float = 0.0

    language: str = ""

    description: str = ""

    license: Optional[str] = None


@dataclass

class GenerationResult:

    """Complete result of parser generation process"""

    request: ParsedRequest

    grammar_file_path: str

    generated_files: Dict[str, str]  # relative_path -> absolute_path

    output_directory: str

    compilation_success: bool

    compilation_log: str

    antlr_version: str

    target_language: str

    generation_time: float

    summary: str

    next_steps: List[str]

    performance_notes: List[str]


# GPU Detection and Acceleration

class GPUManager:

    """Manages GPU detection and optimization across different vendors"""

    

    def __init__(self, config: GPUConfig):

        self.config = config

        self.device_info = self._detect_hardware()

        self.device = self._configure_device()

    

    def _detect_hardware(self) -> Dict[str, Any]:

        """Comprehensive GPU hardware detection"""

        info = {

            "type": "cpu",

            "name": "CPU",

            "memory_gb": 0,

            "compute_capability": None,

            "driver_version": None

        }

        

        if not self.config.enable_gpu:

            return info

        

        # NVIDIA CUDA Detection

        if self._check_nvidia():

            try:

                import torch

                if torch.cuda.is_available():

                    device_props = torch.cuda.get_device_properties(self.config.device_id)

                    info.update({

                        "type": "nvidia",

                        "name": device_props.name,

                        "memory_gb": device_props.total_memory / (1024**3),

                        "compute_capability": f"{device_props.major}.{device_props.minor}",

                        "driver_version": torch.version.cuda

                    })

                    logger.info(f"NVIDIA GPU detected: {info['name']}")

            except Exception as e:

                logger.warning(f"NVIDIA detection failed: {e}")

        

        # Apple Metal Detection

        elif self._check_apple_metal():

            try:

                import torch

                if hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():

                    info.update({

                        "type": "apple",

                        "name": "Apple Silicon GPU",

                        "memory_gb": 16,  # Unified memory estimation

                        "compute_capability": "Metal Performance Shaders"

                    })

                    logger.info("Apple Silicon GPU with Metal detected")

            except Exception as e:

                logger.warning(f"Apple Metal detection failed: {e}")

        

        # AMD ROCm Detection

        elif self._check_amd_rocm():

            info.update({

                "type": "amd",

                "name": "AMD GPU",

                "memory_gb": 8,  # Default estimation

                "compute_capability": "ROCm"

            })

            logger.info("AMD GPU with ROCm detected")

        

        return info

    

    def _check_nvidia(self) -> bool:

        """Check for NVIDIA GPU availability"""

        try:

            result = subprocess.run(['nvidia-smi'], capture_output=True, text=True)

            return result.returncode == 0

        except FileNotFoundError:

            return False

    

    def _check_apple_metal(self) -> bool:

        """Check for Apple Metal support"""

        try:

            import platform

            return platform.system() == "Darwin" and platform.machine() in ["arm64", "aarch64"]

        except Exception:

            return False

    

    def _check_amd_rocm(self) -> bool:

        """Check for AMD ROCm support"""

        try:

            result = subprocess.run(['rocm-smi'], capture_output=True, text=True)

            return result.returncode == 0

        except FileNotFoundError:

            return False

    

    def _configure_device(self):

        """Configure optimal device for computation"""

        if self.device_info["type"] == "cpu":

            return "cpu"

        

        try:

            import torch

            

            if self.device_info["type"] == "nvidia":

                device = torch.device(f"cuda:{self.config.device_id}")

                if self.config.memory_limit_gb:

                    # The fraction must lie in [0, 1]; clamp in case the limit exceeds device memory

                    fraction = min(1.0, self.config.memory_limit_gb / self.device_info["memory_gb"])

                    torch.cuda.set_per_process_memory_fraction(fraction, self.config.device_id)

                return device

            

            elif self.device_info["type"] == "apple":

                return torch.device("mps")

            

            elif self.device_info["type"] == "amd":

                return torch.device("cuda")  # ROCm uses CUDA-like interface

        

        except Exception as e:

            logger.warning(f"Device configuration failed: {e}")

        

        return "cpu"

    

    def get_device_info(self) -> Dict[str, Any]:

        """Get comprehensive device information"""

        return self.device_info.copy()


# Abstract LLM Interface

class LLMProvider(ABC):

    """Abstract base class for LLM providers"""

    

    @abstractmethod

    async def analyze_request(self, prompt: str) -> ParsedRequest:

        """Analyze user prompt and extract structured information"""

        pass

    

    @abstractmethod

    async def generate_grammar(self, request: ParsedRequest) -> str:

        """Generate ANTLR grammar based on request"""

        pass

    

    @abstractmethod

    async def convert_bnf_to_antlr(self, bnf_content: str, grammar_name: str) -> str:

        """Convert BNF/EBNF to ANTLR grammar"""

        pass

    

    @abstractmethod

    async def enhance_grammar(self, grammar: str, requirements: List[str]) -> str:

        """Enhance existing grammar with additional requirements"""

        pass

    

    @abstractmethod

    async def generate_summary(self, result: GenerationResult) -> str:

        """Generate comprehensive summary and documentation"""

        pass


# OpenAI Implementation

class OpenAIProvider(LLMProvider):

    """OpenAI GPT implementation"""

    

    def __init__(self, config: LLMConfig, gpu_manager: GPUManager):

        self.config = config

        self.gpu_manager = gpu_manager

        

        if not config.api_key:

            raise ValueError("OpenAI API key required")

    

    async def _make_request(self, messages: List[Dict], temperature: Optional[float] = None) -> str:

        """Make async request to OpenAI API"""

        headers = {

            "Authorization": f"Bearer {self.config.api_key}",

            "Content-Type": "application/json"

        }

        

        data = {

            "model": self.config.model_name,

            "messages": messages,

            "max_tokens": self.config.max_tokens,

            # Compare against None so that an explicit temperature of 0.0 is honored

            "temperature": self.config.temperature if temperature is None else temperature

        }

        

        async with aiohttp.ClientSession() as session:

            try:

                async with session.post(

                    "https://api.openai.com/v1/chat/completions",

                    headers=headers,

                    json=data,

                    timeout=aiohttp.ClientTimeout(total=self.config.timeout)

                ) as response:

                    response.raise_for_status()

                    result = await response.json()

                    return result["choices"][0]["message"]["content"]

            

            except Exception as e:

                logger.error(f"OpenAI API request failed: {e}")

                raise

    

    async def analyze_request(self, prompt: str) -> ParsedRequest:

        """Analyze user prompt using OpenAI"""

        system_message = {

            "role": "system",

            "content": """You are an expert in formal languages, parsing, and ANTLR grammar development. 

            Analyze user requests for parser generation and extract structured information.

            

            Respond with JSON containing:

            - intent: "generate_parser", "convert_bnf", "find_grammar", or "custom_language"

            - language_name: if requesting parser for existing language (null if custom)

            - language_description: detailed description of the language to parse

            - bnf_specification: if BNF/EBNF is provided in the request

            - target_language: programming language for generated parser (default "Java")

            - grammar_name: suggested name for the grammar

            - special_requirements: array of special features or requirements

            - examples: array of example inputs if provided

            - confidence: confidence score 0.0-1.0 for the analysis

            

            Be thorough in extracting language_description even for known languages."""

        }

        

        user_message = {

            "role": "user", 

            "content": f"Analyze this parser generation request:\n\n{prompt}"

        }

        

        response = await self._make_request([system_message, user_message], temperature=0.1)

        

        try:

            # Extract JSON from response

            json_match = re.search(r'\{.*\}', response, re.DOTALL)

            if json_match:

                data = json.loads(json_match.group())

                

                return ParsedRequest(

                    original_prompt=prompt,

                    intent=data.get("intent", "generate_parser"),

                    language_name=data.get("language_name"),

                    language_description=data.get("language_description", ""),

                    bnf_specification=data.get("bnf_specification"),

                    target_language=data.get("target_language", "Java"),

                    grammar_name=data.get("grammar_name"),

                    special_requirements=data.get("special_requirements", []),

                    examples=data.get("examples", []),

                    confidence=data.get("confidence", 0.5)

                )

        

        except Exception as e:

            logger.warning(f"Failed to parse LLM analysis: {e}")

        

        # Fallback analysis

        return self._fallback_analysis(prompt)

    

    def _fallback_analysis(self, prompt: str) -> ParsedRequest:

        """Fallback analysis when JSON parsing fails"""

        prompt_lower = prompt.lower()

        

        # Detect BNF/EBNF

        if "::=" in prompt or ("=" in prompt and "<" in prompt and ">" in prompt):

            return ParsedRequest(

                original_prompt=prompt,

                intent="convert_bnf",

                bnf_specification=prompt,

                language_description="BNF specification conversion",

                target_language="Java",

                confidence=0.8

            )

        

        # Detect known languages

        known_languages = {

            "json": "JSON data format",

            "xml": "XML markup language", 

            "sql": "SQL database query language",

            "calculator": "arithmetic expression calculator",

            "java": "Java programming language",

            "python": "Python programming language",

            "javascript": "JavaScript programming language",

            "c++": "C++ programming language"

        }

        

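        # Check longer names first so that "javascript" is not matched as "java"

        for lang, desc in sorted(known_languages.items(), key=lambda item: len(item[0]), reverse=True):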

            if lang in prompt_lower:

                return ParsedRequest(

                    original_prompt=prompt,

                    intent="find_grammar",

                    language_name=lang,

                    language_description=desc,

                    target_language="Java",

                    confidence=0.7

                )

        

        return ParsedRequest(

            original_prompt=prompt,

            intent="custom_language",

            language_description=prompt,

            target_language="Java",

            confidence=0.5

        )

    

    async def generate_grammar(self, request: ParsedRequest) -> str:

        """Generate ANTLR grammar from request"""

        system_message = {

            "role": "system",

            "content": """You are an expert ANTLR v4 grammar developer. Generate complete, production-ready ANTLR grammars.

            

            Requirements:

            - Use ANTLR v4 syntax exactly

            - Include grammar declaration with appropriate name

            - Define comprehensive lexer rules for all tokens

            - Create well-structured parser rules with proper precedence

            - Handle whitespace and comments appropriately

            - Follow ANTLR naming conventions (parser rules lowercase, lexer rules uppercase)

            - Include error handling considerations

            - Make grammar unambiguous and efficient

            

            Respond with only the grammar content, no explanations."""

        }

        

        prompt_parts = [f"Generate ANTLR v4 grammar for: {request.language_description}"]

        

        if request.language_name:

            prompt_parts.append(f"Language name: {request.language_name}")

        

        if request.grammar_name:

            prompt_parts.append(f"Grammar name: {request.grammar_name}")

        

        if request.special_requirements:

            prompt_parts.append(f"Special requirements: {', '.join(request.special_requirements)}")

        

        if request.examples:

            prompt_parts.append(f"Example inputs:\n{chr(10).join(request.examples)}")

        

        prompt_parts.append(f"Target language: {request.target_language}")

        

        user_message = {

            "role": "user",

            "content": "\n\n".join(prompt_parts)

        }

        

        return await self._make_request([system_message, user_message])

    

    async def convert_bnf_to_antlr(self, bnf_content: str, grammar_name: str) -> str:

        """Convert BNF/EBNF to ANTLR grammar"""

        system_message = {

            "role": "system",

            "content": """Convert BNF/EBNF specifications to ANTLR v4 grammar syntax.

            

            Conversion rules:

            - Replace ::= with :

            - Remove angle brackets from non-terminals  

            - Convert | to ANTLR alternatives

            - Handle optional elements [...] as (...)?

            - Handle repetition {...} as (...)*

            - Add appropriate lexer rules

            - Ensure ANTLR v4 compatibility

            - Add grammar declaration

            - Include whitespace handling

            

            Respond with only the converted grammar."""

        }

        

        user_message = {

            "role": "user",

            "content": f"Convert this BNF/EBNF to ANTLR v4 grammar named '{grammar_name}':\n\n{bnf_content}"

        }

        

        return await self._make_request([system_message, user_message])

    

    async def enhance_grammar(self, grammar: str, requirements: List[str]) -> str:

        """Enhance existing grammar with additional requirements"""

        system_message = {

            "role": "system", 

            "content": "Enhance the given ANTLR grammar to meet additional requirements. Maintain compatibility and add features as requested."

        }

        

        user_message = {

            "role": "user",

            "content": f"Enhance this ANTLR grammar:\n\n{grammar}\n\nAdditional requirements:\n{chr(10).join(f'- {req}' for req in requirements)}"

        }

        

        return await self._make_request([system_message, user_message])

    

    async def generate_summary(self, result: GenerationResult) -> str:

        """Generate comprehensive summary"""

        system_message = {

            "role": "system",

            "content": "Generate clear, comprehensive documentation for ANTLR parser generation results. Include usage instructions and next steps."

        }

        

        user_message = {

            "role": "user",

            "content": f"""Generate summary for this ANTLR parser generation:


Original Request: {result.request.original_prompt}

Grammar File: {result.grammar_file_path}

Target Language: {result.target_language}

Compilation: {'Success' if result.compilation_success else 'Failed'}

Generated Files: {len(result.generated_files)}

Generation Time: {result.generation_time:.2f}s


Include:

- Overview of what was generated

- File structure and contents

- Integration instructions for {result.target_language}

- Usage examples

- Next development steps

- Performance considerations"""

        }

        

        return await self._make_request([system_message, user_message])


# Grammar Search Engine

class GrammarSearchEngine:

    """Comprehensive grammar search across multiple sources"""

    

    def __init__(self, config: SearchConfig):

        self.config = config

        self.cache = {}

        self.session = None

    

    async def search_grammar(self, language_name: str) -> Optional[GrammarSource]:

        """Search for existing grammar across all sources"""

        cache_key = language_name.lower()

        

        # Check cache

        if cache_key in self.cache:

            cached_time, result = self.cache[cache_key]

            if datetime.now() - cached_time < timedelta(hours=self.config.cache_duration_hours):

                return result

        

        # Search all configured sources

        results = []

        

        if self.config.enable_web_search:

            if "github" in self.config.search_engines:

                github_results = await self._search_github(language_name)

                results.extend(github_results)

            

            if "antlr-grammars" in self.config.search_engines:

                antlr_results = await self._search_antlr_grammars(language_name)

                results.extend(antlr_results)

        

        # Select best result

        if results:

            best_result = max(results, key=lambda x: x.quality_score)

            self.cache[cache_key] = (datetime.now(), best_result)

            return best_result

        

        return None

    

    async def _search_github(self, language_name: str) -> List[GrammarSource]:

        """Search GitHub for ANTLR grammars"""

        results = []

        

        if not self.session:

            self.session = aiohttp.ClientSession()

        

        try:

            # Search GitHub API

            # "extension:g4" is the code-search qualifier for file extensions; "filetype:" is not recognized

            query = f"{language_name} antlr grammar extension:g4"

            url = f"https://api.github.com/search/code?q={quote_plus(query)}"

            

            headers = {}

            if self.config.github_token:

                headers["Authorization"] = f"token {self.config.github_token}"

            

            async with self.session.get(url, headers=headers, timeout=self.config.timeout) as response:

                if response.status == 200:

                    data = await response.json()

                    

                    for item in data.get("items", [])[:self.config.max_results_per_source]:

                        # Fetch grammar content

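                        # Code-search items carry no "download_url", so derive the raw URL from html_url

                        raw_url = item["html_url"].replace("github.com", "raw.githubusercontent.com", 1).replace("/blob/", "/", 1)

                        content = await self._fetch_github_file(raw_url)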

                        if content:

                            quality_score = self._calculate_quality_score(content, item)

                            

                            results.append(GrammarSource(

                                content=content,

                                source_type="web",

                                url=item["html_url"],

                                quality_score=quality_score,

                                language=language_name,

                                description=f"GitHub: {item['repository']['full_name']}"

                            ))

        

        except Exception as e:

            logger.warning(f"GitHub search failed: {e}")

        

        return results

    

    async def _fetch_github_file(self, download_url: str) -> Optional[str]:

        """Fetch file content from GitHub"""

        try:

            async with self.session.get(download_url, timeout=self.config.timeout) as response:

                if response.status == 200:

                    return await response.text()

        except Exception as e:

            logger.warning(f"Failed to fetch GitHub file: {e}")

        

        return None

    

    async def _search_antlr_grammars(self, language_name: str) -> List[GrammarSource]:

        """Search official ANTLR grammars repository"""

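        results = []

        

        # The session may not exist yet when this is the only enabled source

        if not self.session:

            self.session = aiohttp.ClientSession()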

        

        try:

            # Search the official ANTLR grammars-v4 repository

            base_url = "https://api.github.com/repos/antlr/grammars-v4/contents"

            

            async with self.session.get(base_url, timeout=self.config.timeout) as response:

                if response.status == 200:

                    contents = await response.json()

                    

                    # Look for matching directories

                    for item in contents:

                        if (item["type"] == "dir" and 

                            language_name.lower() in item["name"].lower()):

                            

                            grammar_content = await self._fetch_antlr_grammar_dir(item["url"])

                            if grammar_content:

                                results.append(GrammarSource(

                                    content=grammar_content,

                                    source_type="web",

                                    url=f"https://github.com/antlr/grammars-v4/tree/master/{item['name']}",

                                    quality_score=0.9,  # High quality for official grammars

                                    language=language_name,

                                    description=f"Official ANTLR grammar: {item['name']}"

                                ))

        

        except Exception as e:

            logger.warning(f"ANTLR grammars search failed: {e}")

        

        return results

    

    async def _fetch_antlr_grammar_dir(self, dir_url: str) -> Optional[str]:

        """Fetch grammar from ANTLR grammars directory"""

        try:

            async with self.session.get(dir_url, timeout=self.config.timeout) as response:

                if response.status == 200:

                    files = await response.json()

                    

                    # Look for .g4 files

                    for file_info in files:

                        if file_info["name"].endswith(".g4"):

                            content = await self._fetch_github_file(file_info["download_url"])

                            if content and "grammar" in content:

                                return content

        

        except Exception as e:

            logger.warning(f"Failed to fetch ANTLR grammar directory: {e}")

        

        return None

    

    def _calculate_quality_score(self, content: str, metadata: Dict) -> float:

        """Calculate quality score for grammar"""

        score = 0.0

        

        # Basic grammar structure

        if "grammar" in content and ":" in content:

            score += 0.3

        

        # Lexer rules present

        if re.search(r'[A-Z_]+\s*:', content):

            score += 0.2

        

        # Parser rules present  

        if re.search(r'[a-z_]+\s*:', content):

            score += 0.2

        

        # Repository stars (if available)

        if "stargazers_count" in metadata.get("repository", {}):

            stars = metadata["repository"]["stargazers_count"]

            score += min(0.2, stars / 1000)

        

        # Recent activity

        if "updated_at" in metadata.get("repository", {}):

            score += 0.1

        

        return min(1.0, score)

    

    async def close(self):

        """Close HTTP session"""

        if self.session:

            await self.session.close()


# ANTLR Compiler and File Manager

class ANTLRCompiler:

    """Manages ANTLR compilation and file generation"""

    

    def __init__(self, config: ANTLRConfig):

        self.config = config

        self._verify_antlr_installation()

    

    def _verify_antlr_installation(self):

        """Verify ANTLR installation and Java availability"""

        if not os.path.exists(self.config.jar_path):

            raise FileNotFoundError(f"ANTLR JAR not found: {self.config.jar_path}")

        

        try:

            result = subprocess.run(

                [self.config.java_path, "-version"],

                capture_output=True, text=True

            )

            if result.returncode != 0:

                raise RuntimeError("Java not available")

        except FileNotFoundError:

            raise RuntimeError("Java not found in PATH")

        

        logger.info(f"ANTLR {self.config.version} verified at {self.config.jar_path}")

    

    def create_project_directory(self, base_dir: str, grammar_name: str) -> str:

        """Create organized project directory structure"""

        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")

        project_name = f"{grammar_name}_{timestamp}"

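        # Use an absolute path: ANTLR is later run with cwd=project_dir, which would break relative paths

        project_dir = os.path.abspath(os.path.join(base_dir, project_name))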

        

        # Create directory structure

        os.makedirs(project_dir, exist_ok=True)

        os.makedirs(os.path.join(project_dir, "grammar"), exist_ok=True)

        os.makedirs(os.path.join(project_dir, "generated"), exist_ok=True)

        os.makedirs(os.path.join(project_dir, "examples"), exist_ok=True)

        os.makedirs(os.path.join(project_dir, "docs"), exist_ok=True)

        

        logger.info(f"Created project directory: {project_dir}")

        return project_dir

    

    def save_grammar(self, grammar_content: str, project_dir: str, grammar_name: str) -> str:

        """Save grammar to file with proper naming"""

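        # ANTLR requires the declared grammar name to match the .g4 file name

        declaration = re.compile(r'^(\s*(?:lexer\s+|parser\s+)?grammar\s+)\w+(\s*;)', re.MULTILINE)

        if declaration.search(grammar_content):

            grammar_content = declaration.sub(rf'\g<1>{grammar_name}\g<2>', grammar_content, count=1)

        else:

            grammar_content = f"grammar {grammar_name};\n\n{grammar_content}"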

        

        grammar_file = os.path.join(project_dir, "grammar", f"{grammar_name}.g4")

        

        with open(grammar_file, 'w', encoding='utf-8') as f:

            f.write(grammar_content)

        

        logger.info(f"Grammar saved: {grammar_file}")

        return grammar_file

    

    async def compile_grammar(self, grammar_file: str, target_language: str, project_dir: str) -> Tuple[bool, str, Dict[str, str]]:

        """Compile ANTLR grammar and return results"""

        output_dir = os.path.join(project_dir, "generated")

        

        # Build ANTLR command

        cmd = [

            self.config.java_path,

            "-jar", self.config.jar_path,

            "-Dlanguage=" + target_language,

            "-o", output_dir

        ]

        

        if self.config.generate_visitor:

            cmd.append("-visitor")

        

        if self.config.generate_listener:

            cmd.append("-listener")

        

        cmd.append(grammar_file)

        

        logger.info(f"Compiling grammar with command: {' '.join(cmd)}")

        

        try:

            # Run ANTLR compilation

            result = subprocess.run(

                cmd,

                capture_output=True,

                text=True,

                timeout=120,

                cwd=project_dir

            )

            

            compilation_log = f"STDOUT:\n{result.stdout}\n\nSTDERR:\n{result.stderr}"

            success = result.returncode == 0

            

            # Collect generated files

            generated_files = {}

            if success:

                for root, dirs, files in os.walk(output_dir):

                    for file in files:

                        if file.endswith(('.java', '.py', '.cpp', '.h', '.cs', '.js', '.ts', '.go')):

                            full_path = os.path.join(root, file)

                            rel_path = os.path.relpath(full_path, project_dir)

                            generated_files[rel_path] = full_path

            

            logger.info(f"Compilation {'succeeded' if success else 'failed'}")

            return success, compilation_log, generated_files

        

        except subprocess.TimeoutExpired:

            error_msg = "ANTLR compilation timed out"

            logger.error(error_msg)

            return False, error_msg, {}

        

        except Exception as e:

            error_msg = f"ANTLR compilation failed: {e}"

            logger.error(error_msg)

            return False, error_msg, {}

    

    def generate_build_files(self, project_dir: str, target_language: str, grammar_name: str):

        """Generate build files and integration examples"""

        if target_language == "Java":

            self._generate_java_build_files(project_dir, grammar_name)

        elif target_language == "Python3":

            self._generate_python_build_files(project_dir, grammar_name)

        elif target_language == "Cpp":

            self._generate_cpp_build_files(project_dir, grammar_name)

    

    def _generate_java_build_files(self, project_dir: str, grammar_name: str):

        """Generate Java build files and examples"""

        # Maven pom.xml

        pom_content = f"""<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0"

         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 

         http://maven.apache.org/xsd/maven-4.0.0.xsd">

    <modelVersion>4.0.0</modelVersion>

    

    <groupId>com.example</groupId>

    <artifactId>{grammar_name.lower()}-parser</artifactId>

    <version>1.0.0</version>

    

    <properties>

        <maven.compiler.source>11</maven.compiler.source>

        <maven.compiler.target>11</maven.compiler.target>

        <antlr.version>{self.config.version}</antlr.version>

    </properties>

    

    <dependencies>

        <dependency>

            <groupId>org.antlr</groupId>

            <artifactId>antlr4-runtime</artifactId>

            <version>${{antlr.version}}</version>

        </dependency>

    </dependencies>

    

    <build>

        <plugins>

            <plugin>

                <groupId>org.antlr</groupId>

                <artifactId>antlr4-maven-plugin</artifactId>

                <version>${{antlr.version}}</version>

                <executions>

                    <execution>

                        <goals>

                            <goal>antlr4</goal>

                        </goals>

                    </execution>

                </executions>

            </plugin>

        </plugins>

    </build>

</project>"""

        

        with open(os.path.join(project_dir, "pom.xml"), 'w') as f:

            f.write(pom_content)

        

        # Example Java usage

        example_content = f"""import org.antlr.v4.runtime.*;

import org.antlr.v4.runtime.tree.*;


public class {grammar_name}Example {{

    public static void main(String[] args) throws Exception {{

        // Create input stream (from string, file, or stdin)

        String input = "your input here";

        CharStream inputStream = CharStreams.fromString(input); // ANTLRInputStream is deprecated since ANTLR 4.7

        

        // Create lexer

        {grammar_name}Lexer lexer = new {grammar_name}Lexer(inputStream);

        CommonTokenStream tokens = new CommonTokenStream(lexer);

        

        // Create parser

        {grammar_name}Parser parser = new {grammar_name}Parser(tokens);

        

        // Parse starting from the root rule (adjust as needed)

        ParseTree tree = parser.startRule(); // Replace 'startRule' with your actual start rule

        

        // Print parse tree

        System.out.println(tree.toStringTree(parser));

        

        // Use visitor or listener for tree processing

        // {grammar_name}BaseVisitor visitor = new {grammar_name}BaseVisitor();

        // visitor.visit(tree);

    }}

}}"""

        

        with open(os.path.join(project_dir, "examples", f"{grammar_name}Example.java"), 'w') as f:

            f.write(example_content)

    

    def _generate_python_build_files(self, project_dir: str, grammar_name: str):

        """Generate Python build files and examples"""

        # requirements.txt

        requirements = f"antlr4-python3-runtime=={self.config.version}\n"

        

        with open(os.path.join(project_dir, "requirements.txt"), 'w') as f:

            f.write(requirements)

        

        # Example Python usage

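        example_content = f"""import os

import sys


# Make the project root importable so the 'generated' package resolves

sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), os.pardir))


from antlr4 import *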

from generated.{grammar_name}Lexer import {grammar_name}Lexer

from generated.{grammar_name}Parser import {grammar_name}Parser


def main():

    # Create input stream

    input_text = "your input here"

    input_stream = InputStream(input_text)

    

    # Create lexer

    lexer = {grammar_name}Lexer(input_stream)

    token_stream = CommonTokenStream(lexer)

    

    # Create parser

    parser = {grammar_name}Parser(token_stream)

    

    # Parse starting from root rule (adjust as needed)

    tree = parser.startRule()  # Replace 'startRule' with your actual start rule

    

    # Print parse tree

    print(tree.toStringTree(recog=parser))

    

    # Use visitor or listener for tree processing

    # visitor = {grammar_name}Visitor()

    # visitor.visit(tree)


if __name__ == '__main__':

    main()

"""

        

        with open(os.path.join(project_dir, "examples", f"{grammar_name.lower()}_example.py"), 'w') as f:

            f.write(example_content)

    

    def _generate_cpp_build_files(self, project_dir: str, grammar_name: str):

        """Generate C++ build files and examples"""

        # CMakeLists.txt

        cmake_content = f"""cmake_minimum_required(VERSION 3.10)

project({grammar_name}Parser)


set(CMAKE_CXX_STANDARD 17)


# Find ANTLR runtime

find_package(PkgConfig REQUIRED)

pkg_check_modules(ANTLR4 REQUIRED antlr4-runtime)


# Include directories

include_directories(${{ANTLR4_INCLUDE_DIRS}})

include_directories(generated)


# Source files

file(GLOB GENERATED_SOURCES "generated/*.cpp")

set(SOURCES

    examples/{grammar_name.lower()}_example.cpp

    ${{GENERATED_SOURCES}}

)


# Executable

add_executable({grammar_name.lower()}_parser ${{SOURCES}})


# Link libraries

target_link_libraries({grammar_name.lower()}_parser ${{ANTLR4_LIBRARIES}})

"""

        

        with open(os.path.join(project_dir, "CMakeLists.txt"), 'w') as f:

            f.write(cmake_content)

        

        # Example C++ usage

        example_content = f"""#include <iostream>

#include <fstream>

#include "antlr4-runtime.h"

#include "{grammar_name}Lexer.h"

#include "{grammar_name}Parser.h"


using namespace antlr4;


int main(int argc, char* argv[]) {{

    // Create input stream

    std::string input = "your input here";

    ANTLRInputStream inputStream(input);

    

    // Create lexer

    {grammar_name}Lexer lexer(&inputStream);

    CommonTokenStream tokens(&lexer);

    

    // Create parser

    {grammar_name}Parser parser(&tokens);

    

    // Parse starting from root rule (adjust as needed)

    tree::ParseTree* tree = parser.startRule(); // Replace 'startRule' with your actual start rule

    

    // Print parse tree

    std::cout << tree->toStringTree(&parser) << std::endl;

    

    return 0;

}}

"""

        

        with open(os.path.join(project_dir, "examples", f"{grammar_name.lower()}_example.cpp"), 'w') as f:

            f.write(example_content)


# Main LLM Agent

class ANTLRGeneratorAgent:

    """Main LLM Agent for ANTLR parser generation"""

    

    def __init__(self, config: SystemConfig):

        self.config = config

        self.gpu_manager = GPUManager(config.gpu_config)

        self.llm_provider = self._create_llm_provider()

        self.search_engine = GrammarSearchEngine(config.search_config)

        self.compiler = ANTLRCompiler(config.antlr_config)

        

        # Ensure output directory exists

        os.makedirs(config.output_base_directory, exist_ok=True)

        

        logger.info("ANTLR Generator Agent initialized")

    

    def _create_llm_provider(self) -> LLMProvider:

        """Create appropriate LLM provider based on configuration"""

        if self.config.llm_config.provider == "openai":

            return OpenAIProvider(self.config.llm_config, self.gpu_manager)

        else:

            raise ValueError(f"Unsupported LLM provider: {self.config.llm_config.provider}")

    

    async def generate_parser(self, user_prompt: str) -> GenerationResult:

        """Main method to generate parser from user prompt"""

        start_time = datetime.now()

        

        logger.info(f"Processing user prompt: {user_prompt[:100]}...")

        

        try:

            # Step 1: Analyze user request

            request = await self.llm_provider.analyze_request(user_prompt)

            logger.info(f"Request analysis: {request.intent} (confidence: {request.confidence})")

            

            # Step 2: Determine grammar source strategy

            grammar_content = await self._obtain_grammar(request)

            

            # Step 3: Create project directory

            grammar_name = request.grammar_name or self._generate_grammar_name(request)

            project_dir = self.compiler.create_project_directory(

                self.config.output_base_directory, 

                grammar_name

            )

            

            # Step 4: Save grammar file

            grammar_file = self.compiler.save_grammar(grammar_content, project_dir, grammar_name)

            

            # Step 5: Compile grammar

            success, compilation_log, generated_files = await self.compiler.compile_grammar(

                grammar_file, 

                request.target_language, 

                project_dir

            )

            

            # Step 6: Generate build files and examples

            if success:

                self.compiler.generate_build_files(project_dir, request.target_language, grammar_name)

            

            # Step 7: Create result object

            generation_time = (datetime.now() - start_time).total_seconds()

            

            result = GenerationResult(

                request=request,

                grammar_file_path=grammar_file,

                generated_files=generated_files,

                output_directory=project_dir,

                compilation_success=success,

                compilation_log=compilation_log,

                antlr_version=self.config.antlr_config.version,

                target_language=request.target_language,

                generation_time=generation_time,

                summary="",

                next_steps=[],

                performance_notes=[]

            )

            

            # Step 8: Generate summary and documentation

            result.summary = await self.llm_provider.generate_summary(result)

            result.next_steps = self._generate_next_steps(result)

            result.performance_notes = self._generate_performance_notes(result)

            

            # Step 9: Save documentation

            await self._save_documentation(result)

            

            logger.info(f"Parser generation completed in {generation_time:.2f}s")

            return result

        

        except Exception as e:

            logger.error(f"Parser generation failed: {e}")

            raise

    

    async def _obtain_grammar(self, request: ParsedRequest) -> str:

        """Obtain grammar content based on request type"""

        if request.intent == "convert_bnf" and request.bnf_specification:

            logger.info("Converting BNF specification to ANTLR")

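            # Use the same fallback as generate_parser so the declared name matches the saved file name

            grammar_name = request.grammar_name or self._generate_grammar_name(request)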

            return await self.llm_provider.convert_bnf_to_antlr(

                request.bnf_specification, 

                grammar_name

            )

        

        elif request.intent == "find_grammar" and request.language_name:

            logger.info(f"Searching for existing grammar: {request.language_name}")

            

            # Try to find existing grammar

            existing_grammar = await self.search_engine.search_grammar(request.language_name)

            

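            # Inclusive threshold: a structurally complete grammar without repository metadata scores exactly 0.7

            if existing_grammar and existing_grammar.quality_score >= 0.7: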

                logger.info(f"Found high-quality grammar from {existing_grammar.source_type}")

                

                # Enhance if special requirements exist

                if request.special_requirements:

                    return await self.llm_provider.enhance_grammar(

                        existing_grammar.content, 

                        request.special_requirements

                    )

                

                return existing_grammar.content

            

            else:

                logger.info("No suitable existing grammar found, generating new one")

                return await self.llm_provider.generate_grammar(request)

        

        else:

            logger.info("Generating custom grammar from description")

            return await self.llm_provider.generate_grammar(request)

    

    def _generate_grammar_name(self, request: ParsedRequest) -> str:

        """Generate appropriate grammar name"""

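        # Grammar names must be valid identifiers, so keep only alphanumeric words

        # (a name such as "C++" would otherwise produce an invalid declaration and file name)

        source = request.language_name or request.language_description or ""

        words = re.findall(r'[A-Za-z][A-Za-z0-9]*', source)

        if words:

            return ''.join(word[0].upper() + word[1:] for word in words[:2])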

        

        return "CustomGrammar"

    

    def _generate_next_steps(self, result: GenerationResult) -> List[str]:

        """Generate next steps for the user"""

        steps = []

        

        if result.compilation_success:

            steps.extend([

                f"Navigate to the project directory: {result.output_directory}",

                f"Review the generated grammar file: {os.path.basename(result.grammar_file_path)}",

                "Examine the generated parser files in the 'generated' directory",

                "Run the example code in the 'examples' directory",

                "Customize the grammar for your specific needs",

                "Add semantic actions or tree processing logic",

                "Create comprehensive test cases",

                "Integrate the parser into your application"

            ])

            

            if result.target_language == "Java":

                steps.append("Build the project using Maven: mvn compile")

            elif result.target_language == "Python3":

                steps.append("Install dependencies: pip install -r requirements.txt")

            elif result.target_language == "Cpp":

                steps.append("Build using CMake: mkdir build && cd build && cmake .. && make")

        

        else:

            steps.extend([

                "Review the compilation errors in the log",

                "Fix grammar syntax issues",

                "Re-run the ANTLR compilation",

                "Consider simplifying the grammar structure"

            ])

        

        return steps

    

    def _generate_performance_notes(self, result: GenerationResult) -> List[str]:

        """Generate performance optimization notes"""

        notes = []

        

        if result.compilation_success:

            notes.extend([

                "Left-factor alternatives that share a common prefix to reduce prediction lookahead",

                "Use lexer modes for context-sensitive tokenization",

                "Implement error recovery strategies for production use",

                "Profile parser performance with large inputs",

                "Try SLL prediction mode first and fall back to LL only when parsing fails"

            ])

            

            if result.generation_time > 30:

                notes.append("Generation took over 30 seconds; a faster GPU helps local models, while remote APIs are bound by network and service latency")

        

        return notes

    

    async def _save_documentation(self, result: GenerationResult):

        """Save comprehensive documentation"""

        docs_dir = os.path.join(result.output_directory, "docs")

        

        # Save summary

        with open(os.path.join(docs_dir, "README.md"), 'w', encoding='utf-8') as f:

            f.write(f"# {os.path.basename(result.output_directory)}\n\n")

            f.write(result.summary)

            f.write("\n\n## Next Steps\n\n")

            for i, step in enumerate(result.next_steps, 1):

                f.write(f"{i}. {step}\n")

            

            f.write("\n\n## Performance Notes\n\n")

            for note in result.performance_notes:

                f.write(f"- {note}\n")

        

        # Save generation metadata

        metadata = {

            "generation_time": result.generation_time,

            "antlr_version": result.antlr_version,

            "target_language": result.target_language,

            "compilation_success": result.compilation_success,

            "original_prompt": result.request.original_prompt,

            "gpu_info": self.gpu_manager.get_device_info()

        }

        

        with open(os.path.join(docs_dir, "metadata.json"), 'w', encoding='utf-8') as f:

            json.dump(metadata, f, indent=2, default=str)

    

    async def close(self):

        """Clean up resources"""

        await self.search_engine.close()


# CLI Interface

async def main():

    """Main CLI interface for the ANTLR Generator Agent"""

    import argparse

    

    parser = argparse.ArgumentParser(description="LLM-Powered ANTLR Parser Generator")

    parser.add_argument("prompt", help="User prompt for parser generation")

    parser.add_argument("--config", help="Configuration file path")

    parser.add_argument("--output-dir", help="Output directory", default="./generated_parsers")

    parser.add_argument("--target-lang", help="Target language", default="Java")

    parser.add_argument("--antlr-jar", help="Path to ANTLR JAR file", required=True)

    parser.add_argument("--openai-key", help="OpenAI API key")

    

    args = parser.parse_args()

    

    # Create configuration

    config = SystemConfig(

        llm_config=LLMConfig(

            provider="openai",

            model_name="gpt-4",

            api_key=args.openai_key or os.getenv("OPENAI_API_KEY")

        ),

        gpu_config=GPUConfig(enable_gpu=True),

        search_config=SearchConfig(enable_web_search=True),

        antlr_config=ANTLRConfig(jar_path=args.antlr_jar),

        output_base_directory=args.output_dir,

        temp_directory=tempfile.mkdtemp()

    )

    

    # Create and run agent

    agent = ANTLRGeneratorAgent(config)

    

    try:

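        # Forward --target-lang to the request analysis, which otherwise never sees it

        prompt = f"{args.prompt}\n\nTarget language: {args.target_lang}"

        result = await agent.generate_parser(prompt)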

        

        print(f"\n{'='*60}")

        print("ANTLR PARSER GENERATION COMPLETED")

        print(f"{'='*60}")

        print(f"Project Directory: {result.output_directory}")

        print(f"Compilation: {'SUCCESS' if result.compilation_success else 'FAILED'}")

        print(f"Generation Time: {result.generation_time:.2f} seconds")

        print(f"Generated Files: {len(result.generated_files)}")

        

        if result.compilation_success:

            print(f"\nGenerated Files:")

            for rel_path in result.generated_files.keys():

                print(f"  - {rel_path}")

        

        print(f"\nNext Steps:")

        for i, step in enumerate(result.next_steps[:5], 1):

            print(f"  {i}. {step}")

        

        print(f"\nFull documentation available in: {result.output_directory}/docs/")

    

    except Exception as e:

        logger.error(f"Generation failed: {e}")

        sys.exit(1)

    

    finally:

        await agent.close()


if __name__ == "__main__":

    asyncio.run(main())
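

A note on compile_grammar: it is declared as a coroutine, but the ANTLR tool runs as a blocking child process. One way to keep the event loop responsive is to move the blocking call onto a worker thread with asyncio.to_thread (Python 3.9+). The following is a minimal sketch under that assumption; the helper name run_tool and the stand-in command are illustrative and not part of the listing above.

```python
import asyncio
import subprocess
import sys
from typing import List, Tuple


async def run_tool(cmd: List[str], timeout: float = 120.0) -> Tuple[bool, str]:
    """Run a blocking command in a worker thread; return (success, log)."""

    def _run() -> subprocess.CompletedProcess:
        return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)

    try:
        # The event loop stays free to serve other tasks while the tool runs
        result = await asyncio.to_thread(_run)
    except subprocess.TimeoutExpired:
        return False, "command timed out"
    except FileNotFoundError as exc:
        return False, f"executable not found: {exc}"

    log = f"STDOUT:\n{result.stdout}\n\nSTDERR:\n{result.stderr}"
    return result.returncode == 0, log


if __name__ == "__main__":
    # Stand-in command; a real call would pass the "java -jar ..." argument list
    ok, log = asyncio.run(run_tool([sys.executable, "-c", "print('ok')"]))
    print(ok)
    print(log)
```

For a single CLI invocation the difference is negligible; it matters once several generation requests share one event loop, as in a chatbot backend.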




USAGE EXAMPLES


Example 1: Generate calculator parser

>> python antlr_agent.py "Create a parser for arithmetic expressions with variables, functions, and parentheses" --antlr-jar /path/to/antlr.jar --openai-key your_key


Example 2: Convert BNF to ANTLR

>> python antlr_agent.py "Convert this BNF: <expr> ::= <term> | <expr> '+' <term>" --antlr-jar /path/to/antlr.jar --target-lang Python3
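

For the BNF in Example 2, the target ANTLR rule is expr : term | expr '+' term ; (ANTLR v4 accepts this kind of direct left recursion). The agent delegates the conversion to the LLM; the sketch below is a purely rule-based stand-in that only shows the shape of the rewrite for simple productions. The function name bnf_to_antlr is hypothetical, quoted literals that themselves contain angle-bracketed names are not handled, and term would still need its own rule before ANTLR accepts the grammar.

```python
import re


def bnf_to_antlr(bnf: str, grammar_name: str) -> str:
    """Rewrite simple one-line BNF productions as ANTLR parser rules."""
    rules = []
    for line in bnf.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        lhs, rhs = line.split("::=", 1)
        # <if-stmt> becomes the parser rule name if_stmt
        name = lhs.strip().strip("<>").replace("-", "_").lower()
        # Replace every <non-terminal> reference with its rule name
        body = re.sub(
            r"<([A-Za-z][\w-]*)>",
            lambda m: m.group(1).replace("-", "_").lower(),
            rhs.strip(),
        )
        rules.append(f"{name} : {body} ;")
    return f"grammar {grammar_name};\n\n" + "\n".join(rules) + "\n"


if __name__ == "__main__":
    print(bnf_to_antlr("<expr> ::= <term> | <expr> '+' <term>", "Expr"))
```

A rewrite like this covers only the mechanical part; choosing lexer rules, precedence, and associativity is where the LLM-based conversion is expected to add value.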


Example 3: Generate JSON parser with extensions

>> python antlr_agent.py "Generate a JSON parser that also supports comments and trailing commas" --antlr-jar /path/to/antlr.jar
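

A request like Example 3 may be routed through the grammar search path, where candidates are ranked by a structural quality score that is compared against a 0.7 threshold. The following is a simplified, standalone sketch of such a heuristic with rule detection anchored to line starts. The function name score_grammar and the stars parameter are assumptions for illustration; the weights mirror those in _calculate_quality_score.

```python
import re
from typing import Optional


def score_grammar(content: str, stars: Optional[int] = None) -> float:
    """Structural quality heuristic in [0, 1] for a candidate .g4 file."""
    score = 0.0
    # A grammar declaration at the start of some line
    if re.search(r"^\s*(?:lexer\s+|parser\s+)?grammar\s+\w+\s*;", content, re.MULTILINE):
        score += 0.3
    # Lexer rule names begin with an upper-case letter
    if re.search(r"^[A-Z]\w*\s*:", content, re.MULTILINE):
        score += 0.2
    # Parser rule names begin with a lower-case letter
    if re.search(r"^[a-z]\w*\s*:", content, re.MULTILINE):
        score += 0.2
    # Optional popularity bonus, capped at 0.2
    if stars is not None:
        score += min(0.2, stars / 1000)
    # Round to hide floating-point noise from the additions
    return round(min(1.0, score), 2)


if __name__ == "__main__":
    sample = "grammar Json;\nvalue : STRING | NUMBER ;\nSTRING : '\"' ~[\"]* '\"' ;\nNUMBER : [0-9]+ ;\n"
    print(score_grammar(sample))
    print(score_grammar(sample, stars=150))
```

Without repository metadata the structural checks alone top out at 0.7, so whether the comparison against the threshold is strict or inclusive decides if such a grammar is reused or regenerated.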


Example 4: Custom domain-specific language

>> python antlr_agent.py "Create a parser for a configuration language with sections, key-value pairs, and lists" --antlr-jar /path/to/antlr.jar --target-lang Cpp
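

In a request like Example 4, the grammar name may have to be derived from the description when the request analysis yields no language name. ANTLR requires that name to be a valid identifier and to match both the grammar declaration and the .g4 file name. A small sketch of deriving a safe name and reading the declared name before invoking the tool follows; both helper names are hypothetical.

```python
import re
from typing import Optional

# Matches "grammar X;", "lexer grammar X;" and "parser grammar X;" at a line start
_DECLARATION = re.compile(
    r"^\s*(?:lexer\s+|parser\s+)?grammar\s+([A-Za-z_]\w*)\s*;", re.MULTILINE
)


def to_grammar_name(text: str, fallback: str = "CustomGrammar") -> str:
    """Build a CamelCase identifier from the first two words of free text."""
    words = re.findall(r"[A-Za-z][A-Za-z0-9]*", text or "")
    name = "".join(word[0].upper() + word[1:] for word in words[:2])
    return name or fallback


def declared_grammar_name(grammar: str) -> Optional[str]:
    """Return the name in the grammar declaration, or None if there is none."""
    match = _DECLARATION.search(grammar)
    return match.group(1) if match else None


if __name__ == "__main__":
    file_name = to_grammar_name("configuration language with sections")
    declared = declared_grammar_name("grammar Config;\nfile : section* ;")
    if declared != file_name:
        print(f"mismatch: declared {declared!r}, file would be {file_name}.g4")
```

Detecting the mismatch up front gives a clearer message than the tool's own error and leaves the choice of renaming the file or the declaration to the caller.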


This implementation provides a general-purpose LLM agent that accepts free-form prompts and generates ANTLR v4 parsers from them. It is not tied to the examples above: it handles concrete language requests, BNF conversions, and custom language descriptions, searching the web for existing grammars and using GPU acceleration where available.
