Monday, December 22, 2025

An LLM-Based Git Analysis Agent: Deconstructing Repositories with AI




Introduction


Modern software development heavily relies on version control systems, with Git being the undisputed leader. Navigating and understanding complex Git repositories, especially unfamiliar ones, can be a daunting and time-consuming task for developers, project managers, and new team members alike. The sheer volume of code, commit history, branching strategies, and associated documentation often presents a significant barrier to entry or rapid comprehension. This article introduces the concept and detailed architecture of an LLM-based Git analysis agent designed to automate this process, providing comprehensive and insightful summaries of any given repository.

The core challenge addressed by this agent lies in bridging the gap between raw Git repository data and human-understandable, high-level summaries. Furthermore, a significant technical hurdle for any Large Language Model (LLM) is its inherent context window limitation. A typical repository can contain hundreds or thousands of files, far exceeding the token capacity of even the most advanced LLMs if processed all at once. Our agent is specifically engineered to overcome this by employing a progressive summarization strategy, breaking down the analysis into manageable, context-aware chunks.


Agent Architecture Overview


The Git analysis agent operates through a series of interconnected modules, each responsible for a specific aspect of repository understanding and information synthesis. This modular design ensures maintainability, scalability, and adherence to clean architecture principles. The overall flow begins with user input, proceeds through repository acquisition and detailed analysis, leverages LLMs for summarization, and culminates in a structured, comprehensive report.


Here is a conceptual ASCII diagram illustrating the agent's architecture:


+---------------------+     +--------------------------+
| User Configuration  |---->| Repository Acquisition   |
| (LLM, Repo Path)    |     | (Local/Remote)           |
+---------------------+     +--------------------------+
          |                             |
          v                             v
+---------------------+     +--------------------------+
| Orchestration Engine|---->| Git Interaction Module   |
| (Main Control Flow) |     | (Log, Diff, Files, Tags) |
+---------------------+     +--------------------------+
          |                             |
          v                             v
+---------------------+     +--------------------------+
| File Analysis Module|---->| LLM Integration Layer    |
| (Read, Chunk Files) |     | (Prompting, API Calls)   |
+---------------------+     +--------------------------+
          |                             |
          v                             v
+---------------------+     +--------------------------+
| Progressive         |     | Output Generation        |
| Summarization &     |---->| (Structured Report)      |
| Memory Module       |     +--------------------------+
+---------------------+



The agent's journey starts with the user providing configuration details, including the target repository's location and the LLM settings. The Repository Acquisition module then handles fetching the repository, whether it is a local directory or a remote URL. The Orchestration Engine acts as the central coordinator, directing the flow of analysis. It delegates tasks to the Git Interaction Module for extracting metadata such as commit history, branches, and tags. Concurrently, the File Analysis Module reads individual files, preparing their content for LLM processing. The LLM Integration Layer manages all communication with the chosen Large Language Model, crafting prompts and parsing responses. Crucially, the Progressive Summarization and Memory Module aggregates file-level summaries into higher-level insights, effectively managing context. Finally, the Output Generation module compiles all gathered and summarized information into a coherent and detailed report for the user.


Detailed Constituent Descriptions


Let us delve deeper into each critical component of our LLM-based Git analysis agent, providing code examples to illustrate their functionality.


Configuration Management


Effective configuration management is paramount for flexibility and ease of use. The user must be able to specify the repository path (local or remote) and the details for the LLM, including whether it is a local model (e.g., via Ollama or a local server) or a remote API (e.g., OpenAI, Azure OpenAI). This module centralizes these settings, making them accessible throughout the agent.

We define a `Configuration` class to encapsulate these settings, ensuring that all necessary parameters are available before the analysis begins.


# config.py


import os

from typing import Optional


class LLMConfig:

    """

    Encapsulates configuration settings for the Large Language Model.

    Supports both remote API-based LLMs and local server-based LLMs.

    """

    def __init__(self,

                 llm_type: str, # 'openai', 'local'

                 api_key: Optional[str] = None,

                 model_name: str = "gpt-4o-mini",

                 base_url: Optional[str] = None):

        """

        Initializes the LLM configuration.


        Args:

            llm_type: Specifies the type of LLM ('openai' for remote API, 'local' for a local server).

            api_key: The API key for remote LLM services (e.g., OpenAI API key).

                     This should ideally be loaded from environment variables for security.

            model_name: The specific model identifier to use (e.g., "gpt-4o-mini", "llama3").

            base_url: The base URL for local LLM servers (e.g., "http://localhost:11434/v1").

        """

        if llm_type not in ['openai', 'local']:

            raise ValueError("llm_type must be 'openai' or 'local'")


        self.llm_type = llm_type

        self.api_key = api_key if api_key else os.getenv("OPENAI_API_KEY")

        self.model_name = model_name

        self.base_url = base_url


        if self.llm_type == 'openai' and not self.api_key:

            raise ValueError("OPENAI_API_KEY environment variable or api_key must be set for OpenAI LLM type.")

        if self.llm_type == 'local' and not self.base_url:

            raise ValueError("base_url must be set for local LLM type.")


    def __repr__(self) -> str:

        """Provides a string representation of the LLMConfig object."""

        return (f"LLMConfig(llm_type='{self.llm_type}', model_name='{self.model_name}', "

                f"base_url='{self.base_url if self.base_url else 'N/A'}')")


class AgentConfig:

    """

    Main configuration class for the Git analysis agent.

    Holds repository path and LLM configuration.

    """

    def __init__(self,

                 repo_path: str,

                 llm_config: LLMConfig,

                 output_dir: str = "analysis_results"):

        """

        Initializes the agent configuration.


        Args:

            repo_path: The path to the local Git repository or its remote URL.

            llm_config: An instance of LLMConfig containing LLM-specific settings.

            output_dir: The directory where analysis results and summaries will be stored.

        """

        self.repo_path = repo_path

        self.llm_config = llm_config

        self.output_dir = output_dir


        # Ensure output directory exists

        os.makedirs(self.output_dir, exist_ok=True)


    def __repr__(self) -> str:

        """Provides a string representation of the AgentConfig object."""

        return (f"AgentConfig(repo_path='{self.repo_path}', llm_config={self.llm_config}, "

                f"output_dir='{self.output_dir}')")


Repository Acquisition


This module is responsible for obtaining the Git repository. It must handle two primary scenarios: a local file path already present on the user's machine or a remote URL pointing to a repository on platforms like GitHub or GitLab. For remote repositories, it performs a clone operation.

The `GitRepositoryManager` class encapsulates the logic for cloning remote repositories and validating local paths. It ensures that the agent always operates on a valid, accessible Git repository.


# git_operations.py


import os

import shutil

import git # type: ignore # gitpython library
from typing import Optional


class GitRepositoryManager:

    """

    Manages the acquisition and cleanup of Git repositories.

    Handles cloning remote repositories and validating local paths.

    """

    def __init__(self, repo_source_path: str, clone_dir: str = "cloned_repos"):

        """

        Initializes the GitRepositoryManager.


        Args:

            repo_source_path: The path to the local Git repository or its remote URL.

            clone_dir: The directory where remote repositories will be cloned.

        """

        self.repo_source_path = repo_source_path

        self.clone_dir = clone_dir

        self.local_repo_path: Optional[str] = None

        self.is_cloned = False


        os.makedirs(self.clone_dir, exist_ok=True)


    def acquire_repository(self) -> str:

        """

        Acquires the Git repository, either by using a local path or cloning a remote one.


        Returns:

            The absolute path to the local Git repository directory.


        Raises:

            ValueError: If the provided path is not a valid Git repository.

            git.InvalidGitRepositoryError: If cloning fails or the local path is not a Git repo.

            git.GitCommandError: If a git command fails during cloning.

        """

        if os.path.isdir(self.repo_source_path) and \

           os.path.exists(os.path.join(self.repo_source_path, '.git')):

            # It's already a local Git repository

            self.local_repo_path = os.path.abspath(self.repo_source_path)

            print(f"Using local repository at: {self.local_repo_path}")

        elif self.repo_source_path.startswith(('http://', 'https://', 'git@')):

            # It's a remote URL, clone it

            repo_name = self.repo_source_path.split('/')[-1].replace('.git', '')

            target_path = os.path.join(self.clone_dir, repo_name)


            if os.path.exists(target_path):

                print(f"Repository already cloned to {target_path}. Pulling latest changes...")

                repo = git.Repo(target_path)

                origin = repo.remotes.origin

                origin.pull()

            else:

                print(f"Cloning remote repository {self.repo_source_path} to {target_path}...")

                git.Repo.clone_from(self.repo_source_path, target_path)

            self.local_repo_path = os.path.abspath(target_path)

            self.is_cloned = True

            print(f"Repository successfully cloned/updated at: {self.local_repo_path}")

        else:

            raise ValueError(f"Invalid repository source: {self.repo_source_path}. "

                             "Must be a local path to a Git repo or a remote URL.")


        # Final check to ensure it's a valid Git repository

        try:

            _ = git.Repo(self.local_repo_path)

        except git.InvalidGitRepositoryError as e:

            raise ValueError(f"The path '{self.local_repo_path}' is not a valid Git repository.") from e


        return self.local_repo_path


    def cleanup(self) -> None:

        """

        Removes the cloned repository directory if it was cloned by this manager.

        """

        if self.is_cloned and self.local_repo_path and os.path.exists(self.local_repo_path):

            print(f"Cleaning up cloned repository: {self.local_repo_path}")

            shutil.rmtree(self.local_repo_path)

            self.local_repo_path = None

            self.is_cloned = False
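Because `cleanup` only removes directories the manager itself cloned, a `try`/`finally` block is the natural usage pattern. A minimal sketch (the URL is illustrative):

# Example: acquiring and releasing a repository

from git_operations import GitRepositoryManager

manager = GitRepositoryManager("https://github.com/example/example.git")
try:
    repo_path = manager.acquire_repository()
    print(f"Analyzing repository at {repo_path}")
    # ... run the analysis here ...
finally:
    manager.cleanup()  # No-op for local repositories that were never cloned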


Repository Traversal and Git Metadata Extraction


Once the repository is acquired, the `GitAnalyzer` module takes over to extract crucial metadata from the Git history. This includes information about contributors, commit patterns, branches, tags (representing releases), and general repository statistics. This data provides foundational context for the LLM's subsequent analysis.


# git_operations.py (continued)


from collections import defaultdict

from datetime import datetime


class GitAnalyzer:

    """

    Analyzes a local Git repository to extract metadata such as contributors,

    commit history, branches, and tags.

    """

    def __init__(self, repo_path: str):

        """

        Initializes the GitAnalyzer with the path to the local repository.


        Args:

            repo_path: The absolute path to the local Git repository.

        """

        try:

            self.repo = git.Repo(repo_path)

            self.repo_path = repo_path

        except git.InvalidGitRepositoryError as e:

            raise ValueError(f"'{repo_path}' is not a valid Git repository.") from e


    def get_contributors(self) -> dict:

        """

        Analyzes commit history to identify contributors and their commit counts.


        Returns:

            A dictionary where keys are contributor names (author name <email>)

            and values are their respective commit counts.

        """

        contributors = defaultdict(int)

        for commit in self.repo.iter_commits():

            author_info = f"{commit.author.name} <{commit.author.email}>"

            contributors[author_info] += 1

        return dict(contributors)


    def get_commit_summary(self, max_commits: int = 50) -> list[dict]:

        """

        Retrieves a summary of recent commits.


        Args:

            max_commits: The maximum number of commits to retrieve.


        Returns:

            A list of dictionaries, each representing a commit with its hash, author,

            date, and message.

        """

        commit_list = []

        for i, commit in enumerate(self.repo.iter_commits()):

            if i >= max_commits:

                break

            commit_list.append({

                "hash": commit.hexsha,

                "author": f"{commit.author.name} <{commit.author.email}>",

                "date": datetime.fromtimestamp(commit.committed_date).strftime('%Y-%m-%d %H:%M:%S'),

                "message": commit.message.strip()

            })

        return commit_list


    def get_branches(self) -> list[str]:

        """

        Lists all local and remote branches in the repository.


        Returns:

            A list of branch names.

        """

        return [head.name for head in self.repo.heads] + \
               [ref.name for remote in self.repo.remotes for ref in remote.refs]


    def get_tags(self) -> list[str]:

        """

        Lists all tags (often representing releases) in the repository.


        Returns:

            A list of tag names.

        """

        return [tag.name for tag in self.repo.tags]


    def get_repo_structure(self) -> str:

        """

        Generates a simplified tree-like representation of the repository's file structure.

        Excludes typical Git-related directories and common build artifacts.


        Returns:

            A string representing the directory tree.

        """

        structure_lines = []

        ignore_patterns = ['.git', '__pycache__', 'venv', '.venv', 'node_modules',

                           'target', 'build', 'dist', '.idea', '.vscode']

        for root, dirs, files in os.walk(self.repo_path):

            # Filter out ignored directories

            dirs[:] = [d for d in dirs if d not in ignore_patterns]


            level = root.replace(self.repo_path, '').count(os.sep)

            indent = '    ' * level

            relative_path = os.path.relpath(root, self.repo_path)

            if relative_path == '.': # Don't print '.' for the root itself

                structure_lines.append(f"{os.path.basename(self.repo_path)}/")

            else:

                structure_lines.append(f"{indent}|-- {os.path.basename(root)}/")

            subindent = '    ' * (level + 1)

            for f in files:

                structure_lines.append(f"{subindent}|-- {f}")

        return "\n".join(structure_lines)



File-Level Analysis and Summarization Strategy


This is the core module addressing the LLM context window limitation. Instead of feeding the entire repository to the LLM, the agent processes files individually. The `FileProcessor` reads file contents, and then the `LLMSummarizer` uses the LLM to generate a concise summary for each file. This approach ensures that the LLM receives manageable chunks of information.

The `LLMClient` acts as an abstraction layer for interacting with different LLM providers, making the system flexible.


# llm_interface.py


import os

from abc import ABC, abstractmethod

from typing import Any, Dict, List, Optional

from openai import OpenAI # type: ignore


from config import LLMConfig


class LLMClient(ABC):

    """

    Abstract base class for LLM clients, defining the common interface.

    """

    @abstractmethod

    def get_completion(self, prompt: str, temperature: float = 0.7) -> str:

        """

        Sends a prompt to the LLM and returns its completion.


        Args:

            prompt: The text prompt to send to the LLM.

            temperature: Controls the randomness of the output. Higher values mean more random.


        Returns:

            The generated text completion from the LLM.

        """

        pass


class OpenAILLMClient(LLMClient):

    """

    Concrete implementation of LLMClient for OpenAI API.

    """

    def __init__(self, config: LLMConfig):

        """

        Initializes the OpenAI LLM client.


        Args:

            config: An LLMConfig instance containing OpenAI-specific settings.

        """

        if config.llm_type != 'openai':

            raise ValueError("LLMConfig must be of type 'openai' for OpenAILLMClient.")

        if not config.api_key:

            raise ValueError("OpenAI API key is missing in configuration.")

        self.client = OpenAI(api_key=config.api_key)

        self.model_name = config.model_name

        print(f"Initialized OpenAI LLM Client with model: {self.model_name}")


    def get_completion(self, prompt: str, temperature: float = 0.7) -> str:

        """

        Sends a prompt to the OpenAI API and returns its completion.

        """

        try:

            response = self.client.chat.completions.create(

                model=self.model_name,

                messages=[

                    {"role": "system", "content": "You are a helpful assistant."},

                    {"role": "user", "content": prompt}

                ],

                temperature=temperature,

            )

            return response.choices[0].message.content if response.choices[0].message.content else ""

        except Exception as e:

            print(f"Error calling OpenAI API: {e}")

            return f"Error: Could not get completion from OpenAI API - {e}"


class LocalLLMClient(LLMClient):

    """

    Concrete implementation of LLMClient for local LLM servers (e.g., Ollama).

    Assumes a compatible OpenAI-like API endpoint.

    """

    def __init__(self, config: LLMConfig):

        """

        Initializes the Local LLM client.


        Args:

            config: An LLMConfig instance containing local LLM-specific settings.

        """

        if config.llm_type != 'local':

            raise ValueError("LLMConfig must be of type 'local' for LocalLLMClient.")

        if not config.base_url:

            raise ValueError("Base URL is missing for local LLM configuration.")

        self.client = OpenAI(base_url=config.base_url, api_key="ollama") # API key is often dummy for local

        self.model_name = config.model_name

        print(f"Initialized Local LLM Client with model: {self.model_name} at {config.base_url}")


    def get_completion(self, prompt: str, temperature: float = 0.7) -> str:

        """

        Sends a prompt to the local LLM server and returns its completion.

        """

        try:

            response = self.client.chat.completions.create(

                model=self.model_name,

                messages=[

                    {"role": "system", "content": "You are a helpful assistant."},

                    {"role": "user", "content": prompt}

                ],

                temperature=temperature,

            )

            return response.choices[0].message.content if response.choices[0].message.content else ""

        except Exception as e:

            print(f"Error calling Local LLM API: {e}")

            return f"Error: Could not get completion from Local LLM API - {e}"



The `FileProcessor` is responsible for reading file content, while the `LLMSummarizer` orchestrates the prompt creation and interaction with the `LLMClient`.


# summarization.py


import os

from typing import Any, Dict, Optional


from llm_interface import LLMClient


class FileProcessor:

    """

    Handles reading and processing of individual files within the repository.

    """

    def __init__(self, repo_root: str):

        """

        Initializes the FileProcessor.


        Args:

            repo_root: The root directory of the Git repository.

        """

        self.repo_root = repo_root


    def read_file_content(self, file_path: str) -> Optional[str]:

        """

        Reads the content of a specified file.

        Handles common encoding issues and skips binary files.


        Args:

            file_path: The absolute path to the file.


        Returns:

            The content of the file as a string, or None if it's a binary file

            or cannot be read.

        """

        if not os.path.exists(file_path) or not os.path.isfile(file_path):

            print(f"Warning: File not found or is not a file: {file_path}")

            return None


        # Heuristic to skip binary files

        mime_type_guess = None

        try:

            import mimetypes

            mime_type_guess, _ = mimetypes.guess_type(file_path)

        except ImportError:

            pass # mimetypes might not be available in some minimal environments


        if mime_type_guess and not mime_type_guess.startswith('text'):

            print(f"Skipping binary file: {file_path} (MIME type: {mime_type_guess})")

            return None


        # Attempt to read as text

        try:

            with open(file_path, 'r', encoding='utf-8') as f:

                return f.read()

        except UnicodeDecodeError:

            print(f"Skipping non-UTF-8 or binary file: {file_path}")

            return None

        except Exception as e:

            print(f"Error reading file {file_path}: {e}")

            return None


class LLMSummarizer:

    """

    Uses an LLM to generate summaries for file contents and aggregated information.

    """

    def __init__(self, llm_client: LLMClient):

        """

        Initializes the LLMSummarizer with an LLM client.


        Args:

            llm_client: An instance of a concrete LLMClient implementation.

        """

        self.llm_client = llm_client


    def summarize_file(self, file_path: str, file_content: str) -> str:

        """

        Generates a concise summary for a single file's content.


        Args:

            file_path: The relative path of the file being summarized.

            file_content: The full content of the file.


        Returns:

            A summary string generated by the LLM.

        """

        prompt = (

            f"You are an expert software engineer tasked with summarizing code and configuration files. "

            f"Provide a concise summary of the purpose, key functionalities, and important configurations "

            f"or dependencies found in the following file. Focus on what this file *does* and its role "

            f"within a larger project. Keep the summary under 150 words.\n\n"

            f"File: {file_path}\n"

            f"Content:\n```\n{file_content}\n```\n\n"

            f"Concise Summary:"

        )

        return self.llm_client.get_completion(prompt)


    def summarize_directory(self, directory_path: str, file_summaries: Dict[str, str]) -> str:

        """

        Generates a summary for a directory based on the summaries of its contained files.


        Args:

            directory_path: The relative path of the directory.

            file_summaries: A dictionary mapping file paths to their summaries within this directory.


        Returns:

            A summary string for the directory.

        """

        if not file_summaries:

            return f"Directory '{directory_path}' contains no relevant files or summaries."


        summaries_text = "\n".join([f"- {path}: {summary}" for path, summary in file_summaries.items()])

        prompt = (

            f"You are an expert software architect analyzing a project structure. "

            f"Based on the following file summaries, provide a concise overview of the purpose "

            f"and primary functionalities of the directory '{directory_path}'. "

            f"Identify any common themes, dependencies, or architectural patterns. "

            f"Keep the summary under 200 words.\n\n"

            f"Directory: {directory_path}\n"

            f"File Summaries:\n{summaries_text}\n\n"

            f"Concise Directory Summary:"

        )

        return self.llm_client.get_completion(prompt)


    def summarize_repository(self,

                             repo_name: str,

                             repo_structure: str,

                             directory_summaries: Dict[str, str],

                             git_metadata: Dict[str, Any]) -> str:

        """

        Generates a comprehensive summary of the entire repository.


        Args:

            repo_name: The name of the repository.

            repo_structure: A string representation of the repository's file structure.

            directory_summaries: A dictionary mapping directory paths to their summaries.

            git_metadata: A dictionary containing aggregated Git metadata (contributors, commits, etc.).


        Returns:

            A comprehensive summary string for the entire repository.

        """

        dir_summaries_text = "\n".join([f"- {path}: {summary}" for path, summary in directory_summaries.items()])

        contributors_text = "\n".join([f"  - {author} ({count} commits)" for author, count in git_metadata.get('contributors', {}).items()])

        recent_commits_text = "\n".join([f"  - {c['date']} by {c['author']}: {c['message']}" for c in git_metadata.get('recent_commits', [])[:5]])

        branches_text = ", ".join(git_metadata.get('branches', []))

        tags_text = ", ".join(git_metadata.get('tags', []))


        prompt = (

            f"You are a highly intelligent AI assistant specializing in software project analysis. "

            f"Your task is to provide a comprehensive and detailed summary of the Git repository named '{repo_name}'. "

            f"Synthesize information from the repository's structure, directory-level summaries, and Git metadata. "

            f"Cover the following aspects:\n"

            f"1.  **Overall Purpose and Key Functionalities:** What is the project about? What problems does it solve?\n"

            f"2.  **Architectural Overview/Structure:** Describe the main components and how they are organized.\n"

            f"3.  **Core Technologies/Dependencies:** Identify programming languages, frameworks, and key libraries.\n"

            f"4.  **Development Environment/Setup:** How would one set up and run this project? (e.g., Docker, `requirements.txt`)\n"

            f"5.  **Key Contributors and Activity:** Who are the main developers and what is the recent activity?\n"

            f"6.  **Release Strategy/Versioning:** How are releases managed (tags, branches)?\n"

            f"7.  **Known Issues/Limitations:** Any explicit mentions of problems or areas for improvement (from README/comments).\n"

            f"8.  **Evolution/Changes:** High-level overview of recent significant changes.\n\n"

            f"Repository Name: {repo_name}\n"

            f"Repository Structure:\n{repo_structure}\n\n"

            f"Directory Summaries:\n{dir_summaries_text}\n\n"

            f"Git Metadata:\n"

            f"  Contributors:\n{contributors_text}\n"

            f"  Recent Commits:\n{recent_commits_text}\n"

            f"  Branches: {branches_text}\n"

            f"  Tags (Releases): {tags_text}\n\n"

            f"Comprehensive Repository Summary:"

        )

        return self.llm_client.get_completion(prompt, temperature=0.2) # Lower temperature for factual summary
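One caveat: `summarize_file` embeds the entire file content in the prompt, so a single very large file can still overflow the context window. A simple guard is to truncate (or chunk) the content before prompting. The sketch below uses a rough four-characters-per-token heuristic, which is an assumption, not an exact tokenizer:

# summarization.py (optional guard, sketched for illustration)

def truncate_for_prompt(text: str, max_tokens: int = 3000) -> str:
    """
    Truncates text to approximately max_tokens, assuming ~4 characters per token.
    A real implementation might use an actual tokenizer (e.g., tiktoken) instead.
    """
    max_chars = max_tokens * 4  # crude characters-per-token heuristic
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + "\n... [content truncated for context window] ..."

With this guard in place, `summarize_file` could call `file_content = truncate_for_prompt(file_content)` before building the prompt.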


Progressive Summarization and Memory


This module is crucial for managing the context window. It stores file-level summaries and then aggregates them into directory-level summaries, and finally into an overall repository summary. This hierarchical summarization ensures that the LLM never receives an overwhelming amount of raw data at once, but rather progressively distilled information. The `SummaryAggregator` orchestrates this process, storing intermediate results.


# summarization.py (continued)


import json


class SummaryAggregator:

    """

    Manages the storage and aggregation of file and directory summaries.

    """

    def __init__(self, output_dir: str):

        """

        Initializes the SummaryAggregator.


        Args:

            output_dir: The directory where summaries will be saved.

        """

        self.output_dir = output_dir

        os.makedirs(output_dir, exist_ok=True)

        self.file_summaries: Dict[str, str] = {}

        self.directory_summaries: Dict[str, str] = {}

        self.repo_summary: Optional[str] = None

        self.git_metadata: Dict[str, Any] = {}


    def add_file_summary(self, relative_path: str, summary: str) -> None:

        """

        Adds a summary for a specific file.


        Args:

            relative_path: The path of the file relative to the repository root.

            summary: The LLM-generated summary for the file.

        """

        self.file_summaries[relative_path] = summary

        self._save_summary(f"file_summary_{relative_path.replace(os.sep, '_').replace('.', '_')}.txt", summary)


    def add_directory_summary(self, relative_path: str, summary: str) -> None:

        """

        Adds a summary for a specific directory.


        Args:

            relative_path: The path of the directory relative to the repository root.

            summary: The LLM-generated summary for the directory.

        """

        self.directory_summaries[relative_path] = summary

        self._save_summary(f"dir_summary_{relative_path.replace(os.sep, '_')}.txt", summary)


    def set_repo_summary(self, summary: str) -> None:

        """

        Sets the final comprehensive repository summary.


        Args:

            summary: The LLM-generated summary for the entire repository.

        """

        self.repo_summary = summary

        self._save_summary("repository_summary.txt", summary)


    def set_git_metadata(self, metadata: Dict[str, Any]) -> None:

        """

        Stores the extracted Git metadata.


        Args:

            metadata: A dictionary containing Git metadata.

        """

        self.git_metadata = metadata

        self._save_summary("git_metadata.json", json.dumps(metadata, indent=2))


    def get_file_summaries_for_directory(self, relative_dir_path: str) -> Dict[str, str]:

        """

        Retrieves file summaries belonging to a specific directory.


        Args:

            relative_dir_path: The relative path of the directory.


        Returns:

            A dictionary of file paths to summaries within that directory.

        """

        if relative_dir_path in (".", ""):  # Root directory: files with no directory component
            return {p: s for p, s in self.file_summaries.items() if not os.path.dirname(p)}

        # For subdirectories, keep only files located directly in that directory
        prefix = relative_dir_path + os.sep
        return {p: s for p, s in self.file_summaries.items()
                if p.startswith(prefix) and os.path.dirname(p) == relative_dir_path}



    def _save_summary(self, filename: str, content: str) -> None:

        """

        Helper method to save a summary to a file.

        """

        file_path = os.path.join(self.output_dir, filename)

        try:

            with open(file_path, 'w', encoding='utf-8') as f:

                f.write(content)

            print(f"Saved summary to {file_path}")

        except Exception as e:

            print(f"Error saving summary to {file_path}: {e}")


Output Generation


The final stage involves compiling all the gathered and summarized information into a coherent, human-readable report. This report should present the repository's structure, purpose, key features, development environment, contributors, and any identified issues or release information in an organized manner. The `GitAnalysisAgent` itself will handle the final report generation by orchestrating the collection of all summaries.
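The agent presented here returns the final summary as plain text, but a structured Markdown report is straightforward to assemble from the aggregator's contents. The `render_markdown_report` helper below sketches one possible format; it is an illustration, not a prescribed part of the agent:

# reporting.py (illustrative sketch of a structured report)

from typing import Any, Dict


def render_markdown_report(repo_name: str,
                           repo_summary: str,
                           directory_summaries: Dict[str, str],
                           git_metadata: Dict[str, Any]) -> str:
    """
    Assembles a Markdown report from the aggregated analysis results.
    """
    lines = [f"# Repository Analysis: {repo_name}", "", "## Overview", repo_summary, ""]
    lines.append("## Directory Summaries")
    for path, summary in sorted(directory_summaries.items()):
        lines.append(f"### {path}")
        lines.append(summary)
    lines.append("## Contributors")
    for author, count in git_metadata.get("contributors", {}).items():
        lines.append(f"- {author}: {count} commits")
    return "\n".join(lines)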


The Git Analysis Agent (Orchestrator)


The `GitAnalysisAgent` class serves as the main orchestrator, tying all the modules together. It manages the entire workflow, from repository acquisition to final report generation, ensuring that each step is executed logically and efficiently.


# agent.py


import os

from typing import Any, Dict, Optional


from config import AgentConfig, LLMConfig

from git_operations import GitRepositoryManager, GitAnalyzer

from llm_interface import LLMClient, OpenAILLMClient, LocalLLMClient

from summarization import FileProcessor, LLMSummarizer, SummaryAggregator


class GitAnalysisAgent:

    """

    The main orchestrator for the LLM-based Git analysis agent.

    Coordinates repository acquisition, Git metadata extraction, file processing,

    LLM summarization, and report generation.

    """

    def __init__(self, config: AgentConfig):

        """

        Initializes the GitAnalysisAgent with the provided configuration.


        Args:

            config: An instance of AgentConfig containing all necessary settings.

        """

        self.config = config

        self.repo_manager = GitRepositoryManager(config.repo_path, config.output_dir)

        self.llm_client: LLMClient

        if config.llm_config.llm_type == 'openai':

            self.llm_client = OpenAILLMClient(config.llm_config)

        elif config.llm_config.llm_type == 'local':

            self.llm_client = LocalLLMClient(config.llm_config)

        else:

            raise ValueError(f"Unsupported LLM type: {config.llm_config.llm_type}")


        self.llm_summarizer = LLMSummarizer(self.llm_client)

        self.summary_aggregator = SummaryAggregator(config.output_dir)

        self.local_repo_path: Optional[str] = None

        self.git_analyzer: Optional[GitAnalyzer] = None

        self.file_processor: Optional[FileProcessor] = None


    def analyze_repository(self) -> str:

        """

        Executes the full repository analysis workflow.


        Returns:

            The final comprehensive repository summary as a string.

        """

        print("\n--- Starting Repository Analysis ---")

        try:

            # 1. Acquire Repository

            self.local_repo_path = self.repo_manager.acquire_repository()

            self.git_analyzer = GitAnalyzer(self.local_repo_path)

            self.file_processor = FileProcessor(self.local_repo_path)


            # 2. Extract Git Metadata

            print("\n--- Extracting Git Metadata ---")

            git_metadata = self._extract_git_metadata()

            self.summary_aggregator.set_git_metadata(git_metadata)


            # 3. Analyze and Summarize Files

            print("\n--- Analyzing and Summarizing Files ---")

            self._analyze_and_summarize_files()


            # 4. Summarize Directories

            print("\n--- Summarizing Directories ---")

            self._summarize_directories()


            # 5. Generate Final Repository Summary

            print("\n--- Generating Final Repository Summary ---")

            repo_name = os.path.basename(self.local_repo_path)

            repo_structure = self.git_analyzer.get_repo_structure() if self.git_analyzer else "Could not generate structure."

            final_repo_summary = self.llm_summarizer.summarize_repository(

                repo_name=repo_name,

                repo_structure=repo_structure,

                directory_summaries=self.summary_aggregator.directory_summaries,

                git_metadata=git_metadata

            )

            self.summary_aggregator.set_repo_summary(final_repo_summary)

            print("\n--- Repository Analysis Complete ---")

            return final_repo_summary


        except Exception as e:

            print(f"An error occurred during analysis: {e}")

            return f"Analysis failed due to an error: {e}"

        finally:

            self.repo_manager.cleanup() # Ensure cloned repos are removed


    def _extract_git_metadata(self) -> Dict[str, Any]:

        """Helper to extract and return Git metadata."""

        if not self.git_analyzer:

            raise RuntimeError("GitAnalyzer not initialized.")


        metadata = {

            "contributors": self.git_analyzer.get_contributors(),

            "recent_commits": self.git_analyzer.get_commit_summary(max_commits=10),

            "branches": self.git_analyzer.get_branches(),

            "tags": self.git_analyzer.get_tags(),

            "repo_structure_preview": self.git_analyzer.get_repo_structure() # Store a preview for context

        }

        print("Git metadata extracted.")

        return metadata


    def _analyze_and_summarize_files(self) -> None:

        """

        Traverses the repository, reads files, and generates LLM summaries for each.

        """

        if not self.local_repo_path or not self.file_processor:

            raise RuntimeError("Repository path or file processor not initialized.")


        # Walk through the repository, excluding common ignored directories

        ignore_dirs = ['.git', '__pycache__', 'venv', '.venv', 'node_modules',

                       'target', 'build', 'dist', '.idea', '.vscode']

        

        # Add common documentation files to process first, as they often contain purpose

        priority_files = ['README.md', 'Dockerfile', 'requirements.txt', 'package.json', 'pom.xml']

        processed_files = set()


        # Process priority files first if they exist at the root

        for p_file in priority_files:

            abs_path = os.path.join(self.local_repo_path, p_file)

            if os.path.exists(abs_path) and os.path.isfile(abs_path):

                relative_path = os.path.relpath(abs_path, self.local_repo_path)

                print(f"Processing priority file: {relative_path}")

                content = self.file_processor.read_file_content(abs_path)

                if content:

                    summary = self.llm_summarizer.summarize_file(relative_path, content)

                    self.summary_aggregator.add_file_summary(relative_path, summary)

                processed_files.add(relative_path)



        for root, dirs, files in os.walk(self.local_repo_path):

            # Modify dirs in-place to prune traversal

            dirs[:] = [d for d in dirs if d not in ignore_dirs]


            for file_name in files:

                abs_file_path = os.path.join(root, file_name)

                relative_file_path = os.path.relpath(abs_file_path, self.local_repo_path)


                if relative_file_path in processed_files:

                    continue # Skip files already processed as priority


                # Skip common non-source files or very large files

                if any(relative_file_path.endswith(ext) for ext in ['.png', '.jpg', '.jpeg', '.gif', '.bin', '.zip', '.tar.gz', '.log']) or \

                   os.path.getsize(abs_file_path) > 1024 * 1024: # e.g., 1MB limit for text files

                    print(f"Skipping large or non-text file: {relative_file_path}")

                    continue


                print(f"Processing file: {relative_file_path}")

                content = self.file_processor.read_file_content(abs_file_path)

                if content:

                    summary = self.llm_summarizer.summarize_file(relative_file_path, content)

                    self.summary_aggregator.add_file_summary(relative_file_path, summary)

                processed_files.add(relative_file_path)


    def _summarize_directories(self) -> None:

        """

        Generates summaries for directories based on their contained file summaries.

        Processes directories from deepest to shallowest to ensure dependencies.

        """

        if not self.local_repo_path:

            raise RuntimeError("Repository path not initialized.")


        # Get all unique directory paths that have files summarized

        all_file_paths = self.summary_aggregator.file_summaries.keys()

        all_dirs = set()

        for f_path in all_file_paths:

            current_dir = os.path.dirname(f_path)

            while current_dir and current_dir != '.':

                all_dirs.add(current_dir)

                current_dir = os.path.dirname(current_dir)

        

        # Ensure root directory is included if there are any files

        if all_file_paths:

            all_dirs.add(".") # Represents the root directory


        # Sort directories by depth (deepest first) to summarize from bottom-up

        sorted_dirs = sorted(list(all_dirs), key=lambda x: x.count(os.sep), reverse=True)


        for dir_path in sorted_dirs:

            print(f"Summarizing directory: {dir_path if dir_path != '.' else 'root'}")

            file_summaries_in_dir = self.summary_aggregator.get_file_summaries_for_directory(dir_path)

            

            # Include sub-directory summaries in the current directory's context

            # This is key for progressive summarization

            sub_dir_summaries_for_context = {}

            for existing_dir, existing_summary in self.summary_aggregator.directory_summaries.items():
                # The root ('.') must see every directory summary, not just '.'-prefixed ones
                if dir_path == "." or existing_dir.startswith(dir_path + os.sep):
                    sub_dir_summaries_for_context[existing_dir] = existing_summary

            

            combined_context = {**file_summaries_in_dir, **sub_dir_summaries_for_context}


            if combined_context:

                dir_summary = self.llm_summarizer.summarize_directory(dir_path, combined_context)

                self.summary_aggregator.add_directory_summary(dir_path, dir_summary)

            else:

                print(f"No relevant file or sub-directory summaries found for {dir_path}. Skipping directory summary.")


Running Example and Usage


To demonstrate the agent's capabilities, we will use a small, self-contained Python project. This project includes a `README.md`, `requirements.txt`, `Dockerfile`, and a `src` directory with a `main.py` and `utils.py`.

First, let us define the structure and content of our example repository. You would typically create these files in a directory, initialize a Git repository, and make a few commits.


my_simple_project/
├── .gitignore
├── Dockerfile
├── README.md
├── requirements.txt
└── src/
    ├── __init__.py
    ├── main.py
    └── utils.py



Content for `my_simple_project` files:


`README.md`:

# My Simple Project


This is a basic Python project demonstrating a simple utility.

It includes a main script and a utility module.


## Features

- Greets a user.

- Performs a simple arithmetic operation.


## Setup

1. Clone the repository.

2. Install dependencies: `pip install -r requirements.txt`

3. Run: `python src/main.py`


## Known Issues

- The arithmetic operation currently only supports integers.


`requirements.txt`:


# No external dependencies for this simple example

# But in a real project, this would list packages like:

# requests==2.28.1

# numpy==1.23.5


`Dockerfile`:


# Use an official Python runtime as a parent image

FROM python:3.9-slim-buster


# Set the working directory in the container

WORKDIR /app


# Copy the current directory contents into the container at /app

COPY . /app


# Install any needed packages specified in requirements.txt

RUN pip install --no-cache-dir -r requirements.txt


# Make port 80 available to the world outside this container

# EXPOSE 80


# Run main.py when the container launches

CMD ["python", "src/main.py"]


`src/__init__.py`: (This file can be empty; its purpose is to mark `src` as a Python package.)



`src/main.py`:


# src/main.py


from src.utils import add_numbers, greet


def run_application():

    """

    Main function to run the simple application logic.

    """

    print("Starting My Simple Project application...")

    name = "Alice"

    greet(name)


    num1 = 10

    num2 = 5

    result = add_numbers(num1, num2)

    print(f"The sum of {num1} and {num2} is: {result}")

    print("Application finished.")


if __name__ == "__main__":

    run_application()


`src/utils.py`:


# src/utils.py


def greet(name: str) -> None:

    """

    Prints a greeting message to the console.


    Args:

        name: The name of the person to greet.

    """

    print(f"Hello, {name}! Welcome to the utility module.")


def add_numbers(a: int, b: int) -> int:

    """

    Adds two integer numbers and returns their sum.


    Args:

        a: The first integer.

        b: The second integer.


    Returns:

        The sum of a and b.

    """

    return a + b


`.gitignore`:


# Byte-compiled / optimized / DLL files

__pycache__/

*.pyc

*.pyd

*.pyo


# Virtual environment

venv/

.venv/


# Editor backup files

*~


To run the analysis, you would typically have a `main.py` script that initializes the agent with the desired configuration. Ensure you have `gitpython` and `openai` libraries installed (`pip install GitPython openai`). For local LLMs, you would need an Ollama server running and a model pulled.


# main.py


import os

from config import AgentConfig, LLMConfig

from agent import GitAnalysisAgent


def setup_example_repo(repo_name: str = "my_simple_project") -> str:

    """

    Creates a dummy Git repository for demonstration purposes.

    """

    repo_path = os.path.join(os.getcwd(), repo_name)

    if os.path.exists(repo_path):

        import shutil

        shutil.rmtree(repo_path) # Clean up previous run


    os.makedirs(repo_path, exist_ok=True)

    

    # Create files

    with open(os.path.join(repo_path, "README.md"), "w") as f:

        f.write("# My Simple Project\n\nThis is a basic Python project demonstrating a simple utility.\nIt includes a main script and a utility module.\n\n## Features\n- Greets a user.\n- Performs a simple arithmetic operation.\n\n## Setup\n1. Clone the repository.\n2. Install dependencies: `pip install -r requirements.txt`\n3. Run: `python src/main.py`\n\n## Known Issues\n- The arithmetic operation currently only supports integers.\n")

    with open(os.path.join(repo_path, "requirements.txt"), "w") as f:

        f.write("# No external dependencies for this simple example\n")

    with open(os.path.join(repo_path, "Dockerfile"), "w") as f:

        f.write("FROM python:3.9-slim-buster\nWORKDIR /app\nCOPY . /app\nRUN pip install --no-cache-dir -r requirements.txt\nCMD [\"python\", \"src/main.py\"]\n")

    with open(os.path.join(repo_path, ".gitignore"), "w") as f:

        f.write("__pycache__/\n*.pyc\nvenv/\n")


    src_dir = os.path.join(repo_path, "src")

    os.makedirs(src_dir, exist_ok=True)

    with open(os.path.join(src_dir, "__init__.py"), "w") as f:

        f.write("")

    with open(os.path.join(src_dir, "main.py"), "w") as f:

        f.write("from src.utils import add_numbers, greet\n\ndef run_application():\n    print(\"Starting My Simple Project application...\")\n    name = \"Alice\"\n    greet(name)\n    num1 = 10\n    num2 = 5\n    result = add_numbers(num1, num2)\n    print(f\"The sum of {num1} and {num2} is: {result}\")\n    print(\"Application finished.\")\n\nif __name__ == \"__main__\":\n    run_application()\n")

    with open(os.path.join(src_dir, "utils.py"), "w") as f:

        f.write("def greet(name: str) -> None:\n    print(f\"Hello, {name}! Welcome to the utility module.\")\n\ndef add_numbers(a: int, b: int) -> int:\n    return a + b\n")


    # Initialize Git repository and make an initial commit

    import git # type: ignore

    repo = git.Repo.init(repo_path)

    repo.index.add(repo.untracked_files)  # Stage all untracked files (respects .gitignore)

    repo.index.commit("Initial commit: Set up basic project structure and files")


    # Simulate another commit

    with open(os.path.join(src_dir, "main.py"), "a") as f:

        f.write("\n# Added a comment to simulate a change\n")

    repo.index.add([os.path.join(src_dir, "main.py")])

    repo.index.commit("Feature: Added a comment to main.py")


    print(f"Example repository '{repo_name}' created and initialized at {repo_path}")

    return repo_path


def main():

    """

    Main function to configure and run the Git analysis agent.

    """

    # --- IMPORTANT: Configure your LLM here ---

    # For OpenAI: Ensure OPENAI_API_KEY environment variable is set

    # llm_config = LLMConfig(llm_type='openai', model_name='gpt-4o-mini')


    # For Local LLM (e.g., Ollama running 'llama3' model at default port)

    # Make sure Ollama is running and you have 'llama3' model pulled:

    # ollama run llama3

    llm_config = LLMConfig(llm_type='local', model_name='llama3', base_url='http://localhost:11434/v1')


    # --- Setup example local repository ---

    local_repo_path = setup_example_repo("my_simple_project_to_analyze")

    # Alternatively, use a remote repository:

    # remote_repo_url = "https://github.com/git/git.git" # Example remote repo (will be cloned)

    # agent_config = AgentConfig(repo_path=remote_repo_url, llm_config=llm_config)


    agent_config = AgentConfig(repo_path=local_repo_path, llm_config=llm_config)


    agent = GitAnalysisAgent(agent_config)

    final_summary = agent.analyze_repository()


    print("\n==============================================================================")

    print("FINAL REPOSITORY ANALYSIS REPORT")

    print("==============================================================================")

    print(final_summary)

    print("==============================================================================")

    print(f"Detailed summaries are saved in: {agent_config.output_dir}")


if __name__ == "__main__":

    main()


When `main.py` is executed, it first sets up the example Git repository locally. Then, it initializes the `AgentConfig` with the path to this local repository and the chosen LLM configuration. The `GitAnalysisAgent` is instantiated and its `analyze_repository` method is called. This method orchestrates the entire process: cloning (if remote), extracting Git metadata, iterating through files to generate individual summaries, aggregating these into directory summaries, and finally synthesizing all this information into a comprehensive repository-level summary using the LLM. All intermediate and final summaries are saved to the `analysis_results` directory.

This agent provides a powerful tool for quickly gaining deep insights into any Git repository, significantly reducing the manual effort required for understanding complex codebases and their development history.


ADDENDUM: Full Running Example Code


To make the running example fully self-contained and executable, here are all the Python files that constitute the agent and the `main.py` script to run it.


1. `config.py`


# config.py


import os

from typing import Optional


class LLMConfig:

    """

    Encapsulates configuration settings for the Large Language Model.

    Supports both remote API-based LLMs and local server-based LLMs.

    """

    def __init__(self,

                 llm_type: str, # 'openai', 'local'

                 api_key: Optional[str] = None,

                 model_name: str = "gpt-4o-mini",

                 base_url: Optional[str] = None):

        """

        Initializes the LLM configuration.


        Args:

            llm_type: Specifies the type of LLM ('openai' for remote API, 'local' for a local server).

            api_key: The API key for remote LLM services (e.g., OpenAI API key).

                     This should ideally be loaded from environment variables for security.

            model_name: The specific model identifier to use (e.g., "gpt-4o-mini", "llama3").

            base_url: The base URL for local LLM servers (e.g., "http://localhost:11434/v1").

        """

        if llm_type not in ['openai', 'local']:

            raise ValueError("llm_type must be 'openai' or 'local'")


        self.llm_type = llm_type

        self.api_key = api_key if api_key else os.getenv("OPENAI_API_KEY")

        self.model_name = model_name

        self.base_url = base_url


        if self.llm_type == 'openai' and not self.api_key:

            raise ValueError("OPENAI_API_KEY environment variable or api_key must be set for OpenAI LLM type.")

        if self.llm_type == 'local' and not self.base_url:

            raise ValueError("base_url must be set for local LLM type.")


    def __repr__(self) -> str:

        """Provides a string representation of the LLMConfig object."""

        return (f"LLMConfig(llm_type='{self.llm_type}', model_name='{self.model_name}', "

                f"base_url='{self.base_url if self.base_url else 'N/A'}')")


class AgentConfig:

    """

    Main configuration class for the Git analysis agent.

    Holds repository path and LLM configuration.

    """

    def __init__(self,

                 repo_path: str,

                 llm_config: LLMConfig,

                 output_dir: str = "analysis_results"):

        """

        Initializes the agent configuration.


        Args:

            repo_path: The path to the local Git repository or its remote URL.

            llm_config: An instance of LLMConfig containing LLM-specific settings.

            output_dir: The directory where analysis results and summaries will be stored.

        """

        self.repo_path = repo_path

        self.llm_config = llm_config

        self.output_dir = output_dir


        # Ensure output directory exists

        os.makedirs(self.output_dir, exist_ok=True)


    def __repr__(self) -> str:

        """Provides a string representation of the AgentConfig object."""

        return (f"AgentConfig(repo_path='{self.repo_path}', llm_config={self.llm_config}, "

                f"output_dir='{self.output_dir}')")




2. `git_operations.py`


# git_operations.py


import os

import shutil

import git # type: ignore # gitpython library

from typing import Optional, Any, Dict

from collections import defaultdict

from datetime import datetime


class GitRepositoryManager:

    """

    Manages the acquisition and cleanup of Git repositories.

    Handles cloning remote repositories and validating local paths.

    """

    def __init__(self, repo_source_path: str, clone_dir: str = "cloned_repos"):

        """

        Initializes the GitRepositoryManager.


        Args:

            repo_source_path: The path to the local Git repository or its remote URL.

            clone_dir: The directory where remote repositories will be cloned.

        """

        self.repo_source_path = repo_source_path

        self.clone_dir = clone_dir

        self.local_repo_path: Optional[str] = None

        self.is_cloned = False


        os.makedirs(self.clone_dir, exist_ok=True)


    def acquire_repository(self) -> str:

        """

        Acquires the Git repository, either by using a local path or cloning a remote one.


        Returns:

            The absolute path to the local Git repository directory.


        Raises:

            ValueError: If the provided path is not a valid Git repository.

            git.InvalidGitRepositoryError: If cloning fails or the local path is not a Git repo.

            git.GitCommandError: If a git command fails during cloning.

        """

        if os.path.isdir(self.repo_source_path) and \

           os.path.exists(os.path.join(self.repo_source_path, '.git')):

            # It's already a local Git repository

            self.local_repo_path = os.path.abspath(self.repo_source_path)

            print(f"Using local repository at: {self.local_repo_path}")

        elif self.repo_source_path.startswith(('http://', 'https://', 'git@')):

            # It's a remote URL, clone it

            repo_name = self.repo_source_path.rstrip('/').split('/')[-1].removesuffix('.git')

            target_path = os.path.join(self.clone_dir, repo_name)


            if os.path.exists(target_path):

                print(f"Repository already cloned to {target_path}. Pulling latest changes...")

                repo = git.Repo(target_path)

                origin = repo.remotes.origin

                origin.pull()

            else:

                print(f"Cloning remote repository {self.repo_source_path} to {target_path}...")

                git.Repo.clone_from(self.repo_source_path, target_path)

            self.local_repo_path = os.path.abspath(target_path)

            self.is_cloned = True

            print(f"Repository successfully cloned/updated at: {self.local_repo_path}")

        else:

            raise ValueError(f"Invalid repository source: {self.repo_source_path}. "

                             "Must be a local path to a Git repo or a remote URL.")


        # Final check to ensure it's a valid Git repository

        try:

            _ = git.Repo(self.local_repo_path)

        except git.InvalidGitRepositoryError as e:

            raise ValueError(f"The path '{self.local_repo_path}' is not a valid Git repository.") from e


        return self.local_repo_path


    def cleanup(self) -> None:

        """

        Removes the cloned repository directory if it was cloned by this manager.

        """

        if self.is_cloned and self.local_repo_path and os.path.exists(self.local_repo_path):

            print(f"Cleaning up cloned repository: {self.local_repo_path}")

            shutil.rmtree(self.local_repo_path)

            self.local_repo_path = None

            self.is_cloned = False


class GitAnalyzer:

    """

    Analyzes a local Git repository to extract metadata such as contributors,

    commit history, branches, and tags.

    """

    def __init__(self, repo_path: str):

        """

        Initializes the GitAnalyzer with the path to the local repository.


        Args:

            repo_path: The absolute path to the local Git repository.

        """

        try:

            self.repo = git.Repo(repo_path)

            self.repo_path = repo_path

        except git.InvalidGitRepositoryError as e:

            raise ValueError(f"'{repo_path}' is not a valid Git repository.") from e


    def get_contributors(self) -> dict:

        """

        Analyzes commit history to identify contributors and their commit counts.


        Returns:

            A dictionary where keys are contributor names (author name <email>)

            and values are their respective commit counts.

        """

        contributors = defaultdict(int)

        for commit in self.repo.iter_commits():

            author_info = f"{commit.author.name} <{commit.author.email}>"

            contributors[author_info] += 1

        return dict(contributors)


    def get_commit_summary(self, max_commits: int = 50) -> list[dict]:

        """

        Retrieves a summary of recent commits.


        Args:

            max_commits: The maximum number of commits to retrieve.


        Returns:

            A list of dictionaries, each representing a commit with its hash, author,

            date, and message.

        """

        commit_list = []

        # iter_commits supports max_count natively, avoiding a manual counter

        for commit in self.repo.iter_commits(max_count=max_commits):

            commit_list.append({

                "hash": commit.hexsha,

                "author": f"{commit.author.name} <{commit.author.email}>",

                "date": datetime.fromtimestamp(commit.committed_date).strftime('%Y-%m-%d %H:%M:%S'),

                "message": commit.message.strip()

            })

        return commit_list


    def get_branches(self) -> list[str]:

        """

        Lists all local and remote branches in the repository.


        Returns:

            A list of branch names.

        """

        local_branches = [head.name for head in self.repo.heads]

        # remote.refs yields the actual remote branches (e.g., "origin/main"),
        # whereas iterating self.repo.remotes would only yield remote names

        remote_branches = [ref.name for remote in self.repo.remotes for ref in remote.refs]

        return local_branches + remote_branches


    def get_tags(self) -> list[str]:

        """

        Lists all tags (often representing releases) in the repository.


        Returns:

            A list of tag names.

        """

        return [tag.name for tag in self.repo.tags]


    def get_repo_structure(self) -> str:

        """

        Generates a simplified tree-like representation of the repository's file structure.

        Excludes typical Git-related directories and common build artifacts.


        Returns:

            A string representing the directory tree.

        """

        structure_lines = []

        ignore_patterns = ['.git', '__pycache__', 'venv', '.venv', 'node_modules',

                           'target', 'build', 'dist', '.idea', '.vscode']

        for root, dirs, files in os.walk(self.repo_path):

            # Filter out ignored directories

            dirs[:] = [d for d in dirs if d not in ignore_patterns]


            # Compute depth from the relative path so substring matches of the
            # repository path elsewhere in the tree cannot skew the indentation

            relative_path = os.path.relpath(root, self.repo_path)

            level = 0 if relative_path == '.' else relative_path.count(os.sep) + 1

            indent = '    ' * level

            if relative_path == '.': # Don't print '.' for the root itself

                structure_lines.append(f"{os.path.basename(self.repo_path)}/")

            else:

                structure_lines.append(f"{indent}|-- {os.path.basename(root)}/")

            subindent = '    ' * (level + 1)

            for f in files:

                structure_lines.append(f"{subindent}|-- {f}")

        return "\n".join(structure_lines)




3. `llm_interface.py`


# llm_interface.py


from abc import ABC, abstractmethod

from openai import OpenAI # type: ignore


from config import LLMConfig


class LLMClient(ABC):

    """

    Abstract base class for LLM clients, defining the common interface.

    """

    @abstractmethod

    def get_completion(self, prompt: str, temperature: float = 0.7) -> str:

        """

        Sends a prompt to the LLM and returns its completion.


        Args:

            prompt: The text prompt to send to the LLM.

            temperature: Controls the randomness of the output. Higher values mean more random.


        Returns:

            The generated text completion from the LLM.

        """

        pass


class OpenAILLMClient(LLMClient):

    """

    Concrete implementation of LLMClient for OpenAI API.

    """

    def __init__(self, config: LLMConfig):

        """

        Initializes the OpenAI LLM client.


        Args:

            config: An LLMConfig instance containing OpenAI-specific settings.

        """

        if config.llm_type != 'openai':

            raise ValueError("LLMConfig must be of type 'openai' for OpenAILLMClient.")

        if not config.api_key:

            raise ValueError("OpenAI API key is missing in configuration.")

        self.client = OpenAI(api_key=config.api_key)

        self.model_name = config.model_name

        print(f"Initialized OpenAI LLM Client with model: {self.model_name}")


    def get_completion(self, prompt: str, temperature: float = 0.7) -> str:

        """

        Sends a prompt to the OpenAI API and returns its completion.

        """

        try:

            response = self.client.chat.completions.create(

                model=self.model_name,

                messages=[

                    {"role": "system", "content": "You are a helpful assistant."},

                    {"role": "user", "content": prompt}

                ],

                temperature=temperature,

            )

            return response.choices[0].message.content or ""

        except Exception as e:

            print(f"Error calling OpenAI API: {e}")

            return f"Error: Could not get completion from OpenAI API - {e}"


class LocalLLMClient(LLMClient):

    """

    Concrete implementation of LLMClient for local LLM servers (e.g., Ollama).

    Assumes a compatible OpenAI-like API endpoint.

    """

    def __init__(self, config: LLMConfig):

        """

        Initializes the Local LLM client.


        Args:

            config: An LLMConfig instance containing local LLM-specific settings.

        """

        if config.llm_type != 'local':

            raise ValueError("LLMConfig must be of type 'local' for LocalLLMClient.")

        if not config.base_url:

            raise ValueError("Base URL is missing for local LLM configuration.")

        self.client = OpenAI(base_url=config.base_url, api_key="ollama") # API key is often dummy for local

        self.model_name = config.model_name

        print(f"Initialized Local LLM Client with model: {self.model_name} at {config.base_url}")


    def get_completion(self, prompt: str, temperature: float = 0.7) -> str:

        """

        Sends a prompt to the local LLM server and returns its completion.

        """

        try:

            response = self.client.chat.completions.create(

                model=self.model_name,

                messages=[

                    {"role": "system", "content": "You are a helpful assistant."},

                    {"role": "user", "content": prompt}

                ],

                temperature=temperature,

            )

            return response.choices[0].message.content or ""

        except Exception as e:

            print(f"Error calling Local LLM API: {e}")

            return f"Error: Could not get completion from Local LLM API - {e}"



4. `summarization.py`


# summarization.py


import os

import json

import mimetypes # Used for file type guessing

from typing import Any, Dict, Optional


from llm_interface import LLMClient


class FileProcessor:

    """

    Handles reading and processing of individual files within the repository.

    """

    def __init__(self, repo_root: str):

        """

        Initializes the FileProcessor.


        Args:

            repo_root: The root directory of the Git repository.

        """

        self.repo_root = repo_root


    def read_file_content(self, file_path: str) -> Optional[str]:

        """

        Reads the content of a specified file.

        Handles common encoding issues and skips binary files.


        Args:

            file_path: The absolute path to the file.


        Returns:

            The content of the file as a string, or None if it's a binary file

            or cannot be read.

        """

        if not os.path.exists(file_path) or not os.path.isfile(file_path):

            print(f"Warning: File not found or is not a file: {file_path}")

            return None


        # Heuristic to skip binary files; mimetypes.guess_type only inspects
        # the file name, so it cannot fail here and needs no exception handling

        mime_type_guess, _ = mimetypes.guess_type(file_path)


        if mime_type_guess and not mime_type_guess.startswith('text'):

            print(f"Skipping binary file: {file_path} (MIME type: {mime_type_guess})")

            return None


        # Attempt to read as text

        try:

            with open(file_path, 'r', encoding='utf-8') as f:

                return f.read()

        except UnicodeDecodeError:

            print(f"Skipping non-UTF-8 or binary file: {file_path}")

            return None

        except Exception as e:

            print(f"Error reading file {file_path}: {e}")

            return None


class LLMSummarizer:

    """

    Uses an LLM to generate summaries for file contents and aggregated information.

    """

    def __init__(self, llm_client: LLMClient):

        """

        Initializes the LLMSummarizer with an LLM client.


        Args:

            llm_client: An instance of a concrete LLMClient implementation.

        """

        self.llm_client = llm_client


    def summarize_file(self, file_path: str, file_content: str) -> str:

        """

        Generates a concise summary for a single file's content.


        Args:

            file_path: The relative path of the file being summarized.

            file_content: The full content of the file.


        Returns:

            A summary string generated by the LLM.

        """

        prompt = (

            f"You are an expert software engineer tasked with summarizing code and configuration files. "

            f"Provide a concise summary of the purpose, key functionalities, and important configurations "

            f"or dependencies found in the following file. Focus on what this file *does* and its role "

            f"within a larger project. Keep the summary under 150 words.\n\n"

            f"File: {file_path}\n"

            f"Content:\n```\n{file_content}\n```\n\n"

            f"Concise Summary:"

        )

        return self.llm_client.get_completion(prompt)


    def summarize_directory(self, directory_path: str, combined_context: Dict[str, str]) -> str:

        """

        Generates a summary for a directory based on the summaries of its contained files and sub-directories.


        Args:

            directory_path: The relative path of the directory.

            combined_context: A dictionary mapping file/sub-directory paths to their summaries within this directory.


        Returns:

            A summary string for the directory.

        """

        if not combined_context:

            return f"Directory '{directory_path}' contains no relevant files or summaries."


        summaries_text = "\n".join([f"- {path}: {summary}" for path, summary in combined_context.items()])

        

        dir_name_display = directory_path if directory_path != "." else "the root directory"


        prompt = (

            f"You are an expert software architect analyzing a project structure. "

            f"Based on the following file and sub-directory summaries, provide a concise overview of the purpose "

            f"and primary functionalities of {dir_name_display}. "

            f"Identify any common themes, dependencies, or architectural patterns. "

            f"Keep the summary under 200 words.\n\n"

            f"Directory: {dir_name_display}\n"

            f"Contextual Summaries:\n{summaries_text}\n\n"

            f"Concise Directory Summary:"

        )

        return self.llm_client.get_completion(prompt)


    def summarize_repository(self,

                             repo_name: str,

                             repo_structure: str,

                             directory_summaries: Dict[str, str],

                             git_metadata: Dict[str, Any]) -> str:

        """

        Generates a comprehensive summary of the entire repository.


        Args:

            repo_name: The name of the repository.

            repo_structure: A string representation of the repository's file structure.

            directory_summaries: A dictionary mapping directory paths to their summaries.

            git_metadata: A dictionary containing aggregated Git metadata (contributors, commits, etc.).


        Returns:

            A comprehensive summary string for the entire repository.

        """

        dir_summaries_text = "\n".join([f"- {path}: {summary}" for path, summary in directory_summaries.items()])

        contributors_text = "\n".join([f"  - {author} ({count} commits)" for author, count in git_metadata.get('contributors', {}).items()])

        recent_commits_text = "\n".join([f"  - {c['date']} by {c['author']}: {c['message']}" for c in git_metadata.get('recent_commits', [])[:5]])

        branches_text = ", ".join(git_metadata.get('branches', []))

        tags_text = ", ".join(git_metadata.get('tags', []))


        prompt = (

            f"You are a highly intelligent AI assistant specializing in software project analysis. "

            f"Your task is to provide a comprehensive and detailed summary of the Git repository named '{repo_name}'. "

            f"Synthesize information from the repository's structure, directory-level summaries, and Git metadata. "

            f"Cover the following aspects:\n"

            f"1.  **Overall Purpose and Key Functionalities:** What is the project about? What problems does it solve?\n"

            f"2.  **Architectural Overview/Structure:** Describe the main components and how they are organized.\n"

            f"3.  **Core Technologies/Dependencies:** Identify programming languages, frameworks, and key libraries.\n"

            f"4.  **Development Environment/Setup:** How would one set up and run this project? (e.g., Docker, `requirements.txt`)\n"

            f"5.  **Key Contributors and Activity:** Who are the main developers and what is the recent activity?\n"

            f"6.  **Release Strategy/Versioning:** How are releases managed (tags, branches)?\n"

            f"7.  **Known Issues/Limitations:** Any explicit mentions of problems or areas for improvement (from README/comments).\n"

            f"8.  **Evolution/Changes:** High-level overview of recent significant changes.\n\n"

            f"Repository Name: {repo_name}\n"

            f"Repository Structure:\n{repo_structure}\n\n"

            f"Directory Summaries:\n{dir_summaries_text}\n\n"

            f"Git Metadata:\n"

            f"  Contributors:\n{contributors_text}\n"

            f"  Recent Commits:\n{recent_commits_text}\n"

            f"  Branches: {branches_text}\n"

            f"  Tags (Releases): {tags_text}\n\n"

            f"Comprehensive Repository Summary:"

        )

        return self.llm_client.get_completion(prompt, temperature=0.2) # Lower temperature for factual summary


class SummaryAggregator:

    """

    Manages the storage and aggregation of file and directory summaries.

    """

    def __init__(self, output_dir: str):

        """

        Initializes the SummaryAggregator.


        Args:

            output_dir: The directory where summaries will be saved.

        """

        self.output_dir = output_dir

        os.makedirs(output_dir, exist_ok=True)

        self.file_summaries: Dict[str, str] = {}

        self.directory_summaries: Dict[str, str] = {}

        self.repo_summary: Optional[str] = None

        self.git_metadata: Dict[str, Any] = {}


    def add_file_summary(self, relative_path: str, summary: str) -> None:

        """

        Adds a summary for a specific file.


        Args:

            relative_path: The path of the file relative to the repository root.

            summary: The LLM-generated summary for the file.

        """

        self.file_summaries[relative_path] = summary

        # Sanitize path for filename

        safe_filename = relative_path.replace(os.sep, '_').replace('.', '_')

        self._save_summary(f"file_summary_{safe_filename}.txt", summary)


    def add_directory_summary(self, relative_path: str, summary: str) -> None:

        """

        Adds a summary for a specific directory.


        Args:

            relative_path: The path of the directory relative to the repository root.

            summary: The LLM-generated summary for the directory.

        """

        self.directory_summaries[relative_path] = summary

        # Sanitize path for filename

        safe_filename = relative_path.replace(os.sep, '_')

        self._save_summary(f"dir_summary_{safe_filename}.txt", summary)


    def set_repo_summary(self, summary: str) -> None:

        """

        Sets the final comprehensive repository summary.


        Args:

            summary: The LLM-generated summary for the entire repository.

        """

        self.repo_summary = summary

        self._save_summary("repository_summary.txt", summary)


    def set_git_metadata(self, metadata: Dict[str, Any]) -> None:

        """

        Stores the extracted Git metadata.


        Args:

            metadata: A dictionary containing Git metadata.

        """

        self.git_metadata = metadata

        self._save_summary("git_metadata.json", json.dumps(metadata, indent=2))


    def get_file_summaries_for_directory(self, relative_dir_path: str) -> Dict[str, str]:

        """

        Retrieves file summaries belonging directly to a specific directory (not subdirectories).


        Args:

            relative_dir_path: The relative path of the directory (e.g., "src", "." for root).


        Returns:

            A dictionary of file paths to summaries within that directory.

        """

        if relative_dir_path == ".":

            # Files directly in the root, not in any subdirectory

            return {p: s for p, s in self.file_summaries.items() if os.path.dirname(p) == ""}

        else:

            # Files directly in the specified subdirectory

            return {p: s for p, s in self.file_summaries.items() if os.path.dirname(p) == relative_dir_path}



    def _save_summary(self, filename: str, content: str) -> None:

        """

        Helper method to save a summary to a file.

        """

        file_path = os.path.join(self.output_dir, filename)

        try:

            with open(file_path, 'w', encoding='utf-8') as f:

                f.write(content)

            print(f"Saved summary to {file_path}")

        except Exception as e:

            print(f"Error saving summary to {file_path}: {e}")



5. `agent.py`


# agent.py


import os

from typing import Any, Dict, Optional


from config import AgentConfig, LLMConfig

from git_operations import GitRepositoryManager, GitAnalyzer

from llm_interface import LLMClient, OpenAILLMClient, LocalLLMClient

from summarization import FileProcessor, LLMSummarizer, SummaryAggregator


class GitAnalysisAgent:

    """

    The main orchestrator for the LLM-based Git analysis agent.

    Coordinates repository acquisition, Git metadata extraction, file processing,

    LLM summarization, and report generation.

    """

    def __init__(self, config: AgentConfig):

        """

        Initializes the GitAnalysisAgent with the provided configuration.


        Args:

            config: An instance of AgentConfig containing all necessary settings.

        """

        self.config = config

        self.repo_manager = GitRepositoryManager(config.repo_path, config.output_dir)

        self.llm_client: LLMClient

        if config.llm_config.llm_type == 'openai':

            self.llm_client = OpenAILLMClient(config.llm_config)

        elif config.llm_config.llm_type == 'local':

            self.llm_client = LocalLLMClient(config.llm_config)

        else:

            raise ValueError(f"Unsupported LLM type: {config.llm_config.llm_type}")


        self.llm_summarizer = LLMSummarizer(self.llm_client)

        self.summary_aggregator = SummaryAggregator(config.output_dir)

        self.local_repo_path: Optional[str] = None

        self.git_analyzer: Optional[GitAnalyzer] = None

        self.file_processor: Optional[FileProcessor] = None


    def analyze_repository(self) -> str:

        """

        Executes the full repository analysis workflow.


        Returns:

            The final comprehensive repository summary as a string.

        """

        print("\n--- Starting Repository Analysis ---")

        try:

            # 1. Acquire Repository

            self.local_repo_path = self.repo_manager.acquire_repository()

            self.git_analyzer = GitAnalyzer(self.local_repo_path)

            self.file_processor = FileProcessor(self.local_repo_path)


            # 2. Extract Git Metadata

            print("\n--- Extracting Git Metadata ---")

            git_metadata = self._extract_git_metadata()

            self.summary_aggregator.set_git_metadata(git_metadata)


            # 3. Analyze and Summarize Files

            print("\n--- Analyzing and Summarizing Files ---")

            self._analyze_and_summarize_files()


            # 4. Summarize Directories

            print("\n--- Summarizing Directories ---")

            self._summarize_directories()


            # 5. Generate Final Repository Summary

            print("\n--- Generating Final Repository Summary ---")

            repo_name = os.path.basename(self.local_repo_path)

            repo_structure = self.git_analyzer.get_repo_structure() if self.git_analyzer else "Could not generate structure."

            final_repo_summary = self.llm_summarizer.summarize_repository(

                repo_name=repo_name,

                repo_structure=repo_structure,

                directory_summaries=self.summary_aggregator.directory_summaries,

                git_metadata=git_metadata

            )

            self.summary_aggregator.set_repo_summary(final_repo_summary)

            print("\n--- Repository Analysis Complete ---")

            return final_repo_summary


        except Exception as e:

            print(f"An error occurred during analysis: {e}")

            return f"Analysis failed due to an error: {e}"

        finally:

            self.repo_manager.cleanup() # Ensure cloned repos are removed


    def _extract_git_metadata(self) -> Dict[str, Any]:

        """Helper to extract and return Git metadata."""

        if not self.git_analyzer:

            raise RuntimeError("GitAnalyzer not initialized.")


        metadata = {

            "contributors": self.git_analyzer.get_contributors(),

            "recent_commits": self.git_analyzer.get_commit_summary(max_commits=10),

            "branches": self.git_analyzer.get_branches(),

            "tags": self.git_analyzer.get_tags(),

            "repo_structure_preview": self.git_analyzer.get_repo_structure() # Store a preview for context

        }

        print("Git metadata extracted.")

        return metadata


    def _analyze_and_summarize_files(self) -> None:

        """

        Traverses the repository, reads files, and generates LLM summaries for each.

        """

        if not self.local_repo_path or not self.file_processor:

            raise RuntimeError("Repository path or file processor not initialized.")


        # Walk through the repository, excluding common ignored directories

        ignore_dirs = ['.git', '__pycache__', 'venv', '.venv', 'node_modules',

                       'target', 'build', 'dist', '.idea', '.vscode']

        

        # Process common documentation and manifest files first, as they usually state the project's purpose

        priority_files = ['README.md', 'Dockerfile', 'requirements.txt', 'package.json', 'pom.xml']

        processed_files = set()


        # Process priority files first if they exist at the root

        for p_file in priority_files:

            abs_path = os.path.join(self.local_repo_path, p_file)

            if os.path.exists(abs_path) and os.path.isfile(abs_path):

                relative_path = os.path.relpath(abs_path, self.local_repo_path)

                print(f"Processing priority file: {relative_path}")

                content = self.file_processor.read_file_content(abs_path)

                if content:

                    summary = self.llm_summarizer.summarize_file(relative_path, content)

                    self.summary_aggregator.add_file_summary(relative_path, summary)

                processed_files.add(relative_path)



        for root, dirs, files in os.walk(self.local_repo_path):

            # Modify dirs in-place to prune traversal

            dirs[:] = [d for d in dirs if d not in ignore_dirs]


            for file_name in files:

                abs_file_path = os.path.join(root, file_name)

                relative_file_path = os.path.relpath(abs_file_path, self.local_repo_path)


                if relative_file_path in processed_files:

                    continue # Skip files already processed as priority


                # Skip common non-source files or very large files

                if any(relative_file_path.endswith(ext) for ext in ['.png', '.jpg', '.jpeg', '.gif', '.bin', '.zip', '.tar.gz', '.log']) or \

                   os.path.getsize(abs_file_path) > 1024 * 1024: # e.g., 1MB limit for text files

                    print(f"Skipping large or non-text file: {relative_file_path}")

                    continue


                print(f"Processing file: {relative_file_path}")

                content = self.file_processor.read_file_content(abs_file_path)

                if content:

                    summary = self.llm_summarizer.summarize_file(relative_file_path, content)

                    self.summary_aggregator.add_file_summary(relative_file_path, summary)

                processed_files.add(relative_file_path)


    def _summarize_directories(self) -> None:

        """

        Generates summaries for directories based on their contained file summaries.

        Processes directories from deepest to shallowest to ensure dependencies.

        """

        if not self.local_repo_path:

            raise RuntimeError("Repository path not initialized.")


        # Get all unique directory paths that have files summarized

        all_file_paths = self.summary_aggregator.file_summaries.keys()

        all_dirs = set()

        for f_path in all_file_paths:

            current_dir = os.path.dirname(f_path)

            while current_dir and current_dir != '.':

                all_dirs.add(current_dir)

                current_dir = os.path.dirname(current_dir)

        

        # Ensure root directory is included if there are any files

        if all_file_paths:

            all_dirs.add(".") # Represents the root directory


        # Sort directories by depth (deepest first) to summarize bottom-up;
        # the root "." is given depth -1 so it is always summarized last

        sorted_dirs = sorted(all_dirs, key=lambda d: -1 if d == "." else d.count(os.sep), reverse=True)


        for dir_path in sorted_dirs:

            print(f"Summarizing directory: {dir_path if dir_path != '.' else 'root'}")

            file_summaries_in_dir = self.summary_aggregator.get_file_summaries_for_directory(dir_path)

            

            # Include sub-directory summaries in the current directory's context

            # This is key for progressive summarization.

            # We look for summaries of directories that are direct children of the current dir_path.

            sub_dir_summaries_for_context = {}

            for existing_dir, existing_summary in self.summary_aggregator.directory_summaries.items():

                # Check whether existing_dir is a direct child of dir_path,
                # e.g., if dir_path is "src", existing_dir could be "src/utils".
                # os.path.dirname() returns "" for top-level entries, so map it
                # to "." so that children of the root directory are matched too.

                parent_dir = os.path.dirname(existing_dir) or "."

                if existing_dir != dir_path and parent_dir == dir_path:

                    sub_dir_summaries_for_context[existing_dir] = existing_summary

            

            combined_context = {**file_summaries_in_dir, **sub_dir_summaries_for_context}


            if combined_context:

                dir_summary = self.llm_summarizer.summarize_directory(dir_path, combined_context)

                self.summary_aggregator.add_directory_summary(dir_path, dir_summary)

            else:

                print(f"No relevant file or sub-directory summaries found for {dir_path}. Skipping directory summary.")




6. `main.py`


# main.py


import os

import shutil

import git # type: ignore

from config import AgentConfig, LLMConfig

from agent import GitAnalysisAgent


def setup_example_repo(repo_name: str = "my_simple_project") -> str:

    """

    Creates a dummy Git repository for demonstration purposes.

    """

    repo_path = os.path.join(os.getcwd(), repo_name)

    if os.path.exists(repo_path):

        print(f"Cleaning up existing example repository at {repo_path}")

        shutil.rmtree(repo_path) # Clean up previous run


    os.makedirs(repo_path, exist_ok=True)

    

    # Create files

    with open(os.path.join(repo_path, "README.md"), "w") as f:

        f.write("# My Simple Project\n\nThis is a basic Python project demonstrating a simple utility.\nIt includes a main script and a utility module.\n\n## Features\n- Greets a user.\n- Performs a simple arithmetic operation.\n\n## Setup\n1. Clone the repository.\n2. Install dependencies: `pip install -r requirements.txt`\n3. Run: `python src/main.py`\n\n## Known Issues\n- The arithmetic operation currently only supports integers.\n")

    with open(os.path.join(repo_path, "requirements.txt"), "w") as f:

        f.write("# No external dependencies for this simple example\n")

    with open(os.path.join(repo_path, "Dockerfile"), "w") as f:

        f.write("FROM python:3.9-slim-buster\nWORKDIR /app\nCOPY . /app\nRUN pip install --no-cache-dir -r requirements.txt\nCMD [\"python\", \"src/main.py\"]\n")

    with open(os.path.join(repo_path, ".gitignore"), "w") as f:

        f.write("__pycache__/\n*.pyc\nvenv/\n")


    src_dir = os.path.join(repo_path, "src")

    os.makedirs(src_dir, exist_ok=True)

    with open(os.path.join(src_dir, "__init__.py"), "w") as f:

        f.write("")

    with open(os.path.join(src_dir, "main.py"), "w") as f:

        f.write("from src.utils import add_numbers, greet\n\ndef run_application():\n    \"\"\"\n    Main function to run the simple application logic.\n    \"\"\"\n    print(\"Starting My Simple Project application...\")\n    name = \"Alice\"\n    greet(name)\n\n    num1 = 10\n    num2 = 5\n    result = add_numbers(num1, num2)\n    print(f\"The sum of {num1} and {num2} is: {result}\")\n    print(\"Application finished.\")\n\nif __name__ == \"__main__\":\n    run_application()\n")

    with open(os.path.join(src_dir, "utils.py"), "w") as f:

        f.write("def greet(name: str) -> None:\n    \"\"\"\n    Prints a greeting message to the console.\n\n    Args:\n        name: The name of the person to greet.\n    \"\"\"\n    print(f\"Hello, {name}! Welcome to the utility module.\")\n\ndef add_numbers(a: int, b: int) -> int:\n    \"\"\"\n    Adds two integer numbers and returns their sum.\n\n    Args:\n        a: The first integer.\n        b: The second integer.\n\n    Returns:\n        The sum of a and b.\n    \"\"\"\n    return a + b\n")


    # Initialize Git repository and make an initial commit

    repo = git.Repo.init(repo_path)

    repo.index.add(repo.untracked_files)  # stage every new file (honors .gitignore)

    repo.index.commit("Initial commit: Set up basic project structure and files")


    # Simulate another commit

    with open(os.path.join(src_dir, "main.py"), "a") as f:

        f.write("\n# Added a comment to simulate a change\n")

    repo.index.add([os.path.join(src_dir, "main.py")])

    repo.index.commit("Feature: Added a comment to main.py")


    print(f"Example repository '{repo_name}' created and initialized at {repo_path}")

    return repo_path


def main():

    """

    Main function to configure and run the Git analysis agent.

    """

    # --- IMPORTANT: Configure your LLM here ---

    # For OpenAI: Ensure OPENAI_API_KEY environment variable is set

    # llm_config = LLMConfig(llm_type='openai', model_name='gpt-4o-mini')


    # For Local LLM (e.g., Ollama running 'llama3' model at default port)

    # Make sure Ollama is running and you have 'llama3' model pulled:

    # ollama run llama3

    llm_config = LLMConfig(llm_type='local', model_name='llama3', base_url='http://localhost:11434/v1')


    # --- Setup example local repository ---

    local_repo_path = setup_example_repo("my_simple_project_to_analyze")

    # Alternatively, use a remote repository:

    # remote_repo_url = "https://github.com/git/git.git" # Example remote repo (will be cloned)

    # agent_config = AgentConfig(repo_path=remote_repo_url, llm_config=llm_config)


    agent_config = AgentConfig(repo_path=local_repo_path, llm_config=llm_config)


    agent = GitAnalysisAgent(agent_config)

    final_summary = agent.analyze_repository()


    print("\n==============================================================================")

    print("FINAL REPOSITORY ANALYSIS REPORT")

    print("==============================================================================")

    print(final_summary)

    print("==============================================================================")

    print(f"Detailed summaries are saved in: {agent_config.output_dir}")


if __name__ == "__main__":

    main()
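
To point the agent at a real remote project instead of the generated example, the commented-out lines in `main()` can be swapped in. A condensed variant, where the model name and URL are placeholders and `OPENAI_API_KEY` is assumed to be exported:

```python
# Condensed alternative to main(): analyze a remote repository via OpenAI.
from config import AgentConfig, LLMConfig
from agent import GitAnalysisAgent

llm_config = LLMConfig(llm_type='openai', model_name='gpt-4o-mini')
agent_config = AgentConfig(repo_path="https://github.com/git/git.git", llm_config=llm_config)
print(GitAnalysisAgent(agent_config).analyze_repository())
```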

