Hitchhiker's Guide to AI, Software Architecture, and Everything Else: ARTIFICIAL INTELLIGENCE AGENTS FOR JUPYTER NOTEBOOK GENERATION

A notable development in artificial intelligence is the emergence of LLM-based agents that generate executable Jupyter Notebooks from natural language prompts. These agents let users, from data scientists to business analysts, rapidly prototype analyses, visualize data, and explore complex datasets without writing the code themselves. This article covers the architecture, constituents, and implementation details required to construct such an agent, with emphasis on supporting diverse LLM deployments and hardware configurations.

INTRODUCTION

An LLM-based agent for Jupyter Notebook generation can substantially improve productivity and accessibility in data science and software development. Instead of manually crafting code, users articulate their analytical goals or programming tasks in plain English. The agent then interprets these intentions, plans a series of actions, generates the necessary code, optionally executes it, and compiles the results into a structured, runnable Jupyter Notebook. This makes advanced computational tasks accessible to a broader audience and shortens development cycles for experienced practitioners. The system we envision supports both local and remote Large Language Models (LLMs) and is designed to run across several GPU platforms, including NVIDIA CUDA, AMD ROCm, Apple Metal Performance Shaders (MPS), and Intel's integrated and discrete GPUs.

CORE ARCHITECTURE OF THE NOTEBOOK GENERATION AGENT

The architecture of an LLM-based agent for generating Jupyter Notebooks is inherently modular, comprising several interconnected components that work in concert to fulfill user requests. At its heart lies an orchestration layer that leverages the reasoning capabilities of a Large Language Model. This layer interacts with a suite of specialized tools, an execution environment, and a robust LLM integration layer that abstracts away the complexities of different LLM providers and hardware backends. The final output is a well-structured Jupyter Notebook, ready for immediate use or further refinement.

Figure 1: High-Level Agent Architecture

CONSTITUENTS AND THEIR DETAILS

Let us explore each constituent of this architecture in detail.

User Interface and Prompt Engineering

The user interface serves as the primary gateway for interaction, allowing users to submit their requests in natural language. This interface can range from a simple command-line tool to a sophisticated web application. Effective prompt engineering is crucial here, as the clarity and specificity of the user's prompt directly impact the agent's ability to generate accurate and relevant notebooks. The agent's internal prompt, which guides the LLM, will often include instructions on the desired output format (e.g., Python code, markdown for explanations), available tools, and constraints.

Example of a user prompt: "Analyze the 'sales_data.csv' file. Show the top 5 products by total sales. Create a line plot of monthly sales trends and save the notebook as 'sales_analysis.ipynb'."

Agent Orchestration Layer

This layer acts as the brain of the agent, responsible for interpreting the user's prompt, devising a plan to achieve the stated goal, executing that plan using available tools, and refining the approach based on observations. It embodies the "Plan, Act, Observe, Refine" loop.

Planning: The LLM analyzes the user's request and breaks it down into a sequence of smaller, manageable tasks. For instance, "analyze sales data" might become "load data", "calculate total sales per product", "identify top 5 products", "aggregate sales by month", "generate plot code", "assemble notebook".
Acting: The agent invokes specific tools (e.g., a code interpreter, a file reader) to perform the planned tasks. The LLM generates the arguments or code for these tools.
Observing: The agent receives feedback from the tools, such as the output of executed code, error messages, or data summaries.
Refining: Based on the observations, the LLM adjusts its plan, corrects errors, or generates further steps to move closer to the goal. This iterative process is fundamental to the agent's intelligence and robustness.

A conceptual snippet for the agent's core loop might look like this:

class NotebookAgent:
    def __init__(self, llm_connector, tools):
        self.llm = llm_connector
        self.tools = tools
        self.notebook_cells = [] # Stores generated cells

    def generate_notebook_from_prompt(self, user_prompt):
        # Initial planning phase using the LLM
        initial_plan_prompt = f"""
        You are an expert data scientist agent. Your goal is to generate a Jupyter Notebook
        that fulfills the user's request. Break down the user's request into a series of
        steps, including data loading, processing, analysis, visualization, and notebook
        assembly. List the steps clearly.

        User Request: {user_prompt}
        """
        plan_response = self.llm.invoke(initial_plan_prompt)
        current_plan = self._parse_plan(plan_response)

        for step in current_plan:
            # For each step, generate code or invoke a tool
            action_prompt = f"""
            Based on the overall plan and the current step, generate the Python code
            or specify the tool to use.
            Current Step: {step}
            Previous Cells: {self.notebook_cells}
            """
            action_response = self.llm.invoke(action_prompt)
            action_type, content = self._parse_action(action_response)

            if action_type == "code":
                # Add code to notebook cells
                self.notebook_cells.append({"cell_type": "code", "source": content})
                # Potentially execute code and observe output for next steps
                # output = self.tools["code_interpreter"].execute(content)
                # self._process_observation(output)
            elif action_type == "markdown":
                self.notebook_cells.append({"cell_type": "markdown", "source": content})
            elif action_type == "tool_invocation":
                tool_name, tool_args = self._parse_tool_invocation(content)
                if tool_name in self.tools:
                    tool_output = self.tools[tool_name].run(tool_args)
                    # Process tool_output, potentially add to notebook or inform LLM
                    self._process_tool_output(tool_output)
                else:
                    print(f"Error: Tool '{tool_name}' not found.")

        # Final assembly and saving of the notebook
        return self._assemble_and_save_notebook(self.notebook_cells, "generated_notebook.ipynb")

    def _parse_plan(self, llm_output):
        # Placeholder for parsing LLM's plan output into actionable steps
        # This would typically involve more sophisticated parsing, potentially
        # using regex or another LLM call for structured output.
        print(f"Raw plan from LLM: {llm_output}")
        # Simple split for demonstration; blank lines are dropped so they are
        # not treated as steps.
        return [line.strip() for line in llm_output.splitlines() if line.strip()]

    def _parse_action(self, llm_output):
        # Placeholder for parsing LLM's action output (code, markdown, tool)
        # Example: "CODE: print('Hello')" or "MARKDOWN: # Introduction"
        if llm_output.startswith("CODE:"):
            return "code", llm_output[len("CODE:"):].strip()
        elif llm_output.startswith("MARKDOWN:"):
            return "markdown", llm_output[len("MARKDOWN:"):].strip()
        elif llm_output.startswith("TOOL:"):
            # Example: TOOL: file_reader(path='data.csv')
            return "tool_invocation", llm_output[len("TOOL:"):].strip()
        return "unknown", llm_output

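    def _parse_tool_invocation(self, content):
        # Placeholder: split "tool_name(args)" into the tool name and the raw
        # argument string, e.g. "file_reader(path='data.csv')" becomes
        # ("file_reader", "path='data.csv'"). Structured (e.g., JSON) output
        # from the LLM would be more robust in practice.
        name, _, rest = content.partition("(")
        return name.strip(), rest.rsplit(")", 1)[0].strip()

    def _process_tool_output(self, output):
        # Placeholder for processing tool output, e.g., feeding back to LLM
        print(f"Tool output processed: {output}")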

    def _assemble_and_save_notebook(self, cells, filename):
        # This method would use nbformat to create and save the .ipynb file
        print(f"Assembling and saving notebook to {filename} with {len(cells)} cells.")
        # Actual implementation would use nbformat
        return filename
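
The newline split in _parse_plan keeps numbering and bullet markers in each step. As a minimal sketch of the regex-based parsing its comment alludes to, the helper below strips common list markers. The name parse_plan_steps is hypothetical and not part of the class above, and the sketch assumes the LLM returns one step per line.

```python
import re

# Matches leading list markers such as "1.", "2)", "-", "*", or "Step 3:".
_MARKER = re.compile(
    r"^\s*(?:step\s+\d+\s*[:.)-]|\d+\s*[.)]|[-*•])\s*", re.IGNORECASE
)

def parse_plan_steps(llm_output: str) -> list[str]:
    """Return the non-empty plan steps with list markers removed."""
    steps = []
    for line in llm_output.splitlines():
        step = _MARKER.sub("", line).strip()
        if step:
            steps.append(step)
    return steps
```

For stricter guarantees, asking the LLM for a JSON list of steps and validating it is likely more reliable than any marker-stripping heuristic.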

LLM Integration Layer

This is a critical component that abstracts the complexities of interacting with various LLMs, whether they are hosted remotely (e.g., OpenAI, Anthropic) or run locally (e.g., Llama 2, Mixtral). It also manages the underlying hardware configuration, ensuring optimal utilization of GPUs across different vendors.

Remote LLMs: For remote models, this layer handles API key management, rate limiting, request/response serialization, and error handling. It provides a unified interface regardless of the specific API endpoint.
Local LLMs: For local models, this layer manages model loading, memory allocation, and device placement. It needs to support various local inference engines and frameworks.
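
The error handling and rate limiting mentioned for remote LLMs can be sketched as a small retry wrapper with exponential backoff. This is an illustration under stated assumptions, not part of any provider SDK: call_with_retries and its parameters are hypothetical names, and a real system would catch only the provider's rate-limit and transient network errors rather than every exception.

```python
import random
import time

def call_with_retries(fn, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt (plus jitter) and retry.

    Re-raises the last exception once max_attempts calls have failed.
    The sleep parameter is injectable so the wrapper can be tested quickly.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff with a little jitter to avoid synchronized retries.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

A connector could then wrap its remote call, for example call_with_retries(lambda: connector.invoke(prompt)).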

The key challenge here is supporting diverse GPU architectures. This layer must intelligently detect available hardware and configure the LLM inference engine accordingly.

NVIDIA CUDA: The most common target, supported by PyTorch and TensorFlow and by llama-cpp-python when it is built with CUDA support. Detection typically uses torch.cuda.is_available().
AMD ROCm: AMD's open-source compute platform. PyTorch and TensorFlow have ROCm builds, and llama-cpp-python can be built with ROCm support. ROCm builds of PyTorch reuse the torch.cuda API, so torch.cuda.is_available() returns True on a working ROCm system and torch.version.hip is set. Checking the ROCM_PATH environment variable is a weaker hint that only indicates an installation.
Apple MPS (Metal Performance Shaders): Apple's framework for accelerating machine learning on Apple Silicon. PyTorch supports MPS via torch.backends.mps.is_available().
Intel GPUs (integrated and discrete): Intel provides oneAPI and specific optimizations for PyTorch and TensorFlow. Detection might involve torch.xpu.is_available() or checking for Intel-specific libraries.

A simplified LLMConnector class demonstrating this abstraction:

import os
import torch
from openai import OpenAI
from llama_cpp import Llama # For local GGUF models

class LLMConnector:
    def __init__(self, model_type="local", model_name="llama2-7b-chat.Q4_K_M.gguf", api_key=None, base_url=None):
        self.model_type = model_type
        self.model_name = model_name
        self.api_key = api_key
        self.base_url = base_url
        self.llm_instance = None
        self._initialize_llm()

    def _initialize_llm(self):
        if self.model_type == "remote":
            if not self.api_key:
                raise ValueError("API key is required for remote LLM.")
            self.llm_instance = OpenAI(api_key=self.api_key, base_url=self.base_url)
            print(f"Initialized remote LLM: {self.model_name}")
        elif self.model_type == "local":
            model_path = os.path.join("models", self.model_name)
            if not os.path.exists(model_path):
                raise FileNotFoundError(f"Local model not found at {model_path}")

            # Determine GPU layers based on available hardware
            n_gpu_layers = 0
            if torch.cuda.is_available():
                print("CUDA GPU detected.")
                n_gpu_layers = -1 # Use all GPU layers
            elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
                print("Apple MPS detected.")
                n_gpu_layers = -1 # Use all GPU layers
            elif os.getenv("ROCM_PATH") or (hasattr(torch, 'xpu') and torch.xpu.is_available()):
                # Heuristic check for a ROCm installation or an Intel XPU (oneAPI).
                # ROCm builds of PyTorch are normally caught by the torch.cuda check above.
                print("ROCm or Intel XPU detected.")
                n_gpu_layers = -1 # Use all GPU layers
            else:
                print("No suitable GPU detected or configured, running on CPU.")
                n_gpu_layers = 0 # Run on CPU

            try:
                self.llm_instance = Llama(
                    model_path=model_path,
                    n_ctx=4096, # Context window size
                    n_gpu_layers=n_gpu_layers, # Number of layers to offload to GPU
                    verbose=False # Suppress Llama.cpp verbose output
                )
                print(f"Initialized local LLM: {self.model_name} with {n_gpu_layers} GPU layers.")
            except Exception as e:
                print(f"Error initializing local LLM: {e}. Falling back to CPU if possible.")
                self.llm_instance = Llama(
                    model_path=model_path,
                    n_ctx=4096,
                    n_gpu_layers=0, # Force CPU
                    verbose=False
                )

        else:
            raise ValueError(f"Unsupported LLM model type: {self.model_type}")

    def invoke(self, prompt, max_tokens=1024, temperature=0.7):
        if self.model_type == "remote":
            try:
                response = self.llm_instance.chat.completions.create(
                    model=self.model_name,
                    messages=[{"role": "user", "content": prompt}],
                    max_tokens=max_tokens,
                    temperature=temperature
                )
                return response.choices[0].message.content
            except Exception as e:
                print(f"Error invoking remote LLM: {e}")
                raise
        elif self.model_type == "local":
            try:
                response = self.llm_instance.create_chat_completion(
                    messages=[{"role": "user", "content": prompt}],
                    max_tokens=max_tokens,
                    temperature=temperature
                )
                return response["choices"][0]["message"]["content"]
            except Exception as e:
                print(f"Error invoking local LLM: {e}")
                raise
        return "" # Should not reach here

This LLMConnector demonstrates how to abstract away the LLM interaction. For local models, it attempts to detect and utilize available GPU resources from NVIDIA, Apple, AMD, or Intel. Setting n_gpu_layers=-1 in llama-cpp-python requests that all layers be offloaded to the GPU; this only takes effect if the package was built with a matching GPU backend. For transformers-based models, explicit device placement (model.to("cuda"), model.to("mps"), model.to("xpu")) would be managed within this layer.

Tooling Layer

The tooling layer provides the agent with capabilities beyond pure text generation. These tools are essentially functions or modules that the LLM can call to interact with the external environment, perform computations, or access data.

Common tools include:

Code Interpreter: Executes Python code in a sandboxed environment. This is crucial for data loading, manipulation, statistical analysis, and plotting.
File System Access: Reads and writes files, lists directories.
Data Access: Connects to databases, APIs, or cloud storage.
Visualization Libraries: Generates plots and charts (e.g., Matplotlib, Seaborn, Plotly).
Internet Search: Fetches information from the web (e.g., for finding specific library usage or data formats).

Each tool should have a clear description that the LLM can understand, along with defined input parameters and expected output formats.
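
One way to keep tool descriptions, parameters, and output formats together is a small specification object that can be rendered into the agent's internal prompt. The sketch below is illustrative only: ToolSpec, render_tool_catalog, and the file_reader entry are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class ToolSpec:
    """Describes one tool in terms the LLM can read."""
    name: str
    description: str
    parameters: dict = field(default_factory=dict)  # parameter name -> meaning
    returns: str = "text"
    func: Optional[Callable[..., str]] = None  # the callable that does the work

def render_tool_catalog(tools: list) -> str:
    """Format tool descriptions for inclusion in the agent's internal prompt."""
    lines = []
    for tool in tools:
        params = ", ".join(f"{k}: {v}" for k, v in tool.parameters.items()) or "none"
        lines.append(
            f"- {tool.name}: {tool.description} "
            f"Parameters: {params}. Returns: {tool.returns}."
        )
    return "\n".join(lines)

# Example entry (hypothetical tool).
file_reader = ToolSpec(
    name="file_reader",
    description="Reads a text file and returns its contents.",
    parameters={"path": "path of the file to read"},
    returns="file contents as a string",
)
```

Keeping the description next to the callable makes it harder for the prompt and the actual tool behavior to drift apart.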

Example of a CodeInterpreter tool:

import io
import sys
import traceback
import pandas as pd # Example dependency for code execution

class CodeInterpreter:
    def __init__(self, sandbox_mode=False):
        self.sandbox_mode = sandbox_mode
        self.global_vars = {} # For maintaining state across executions
        self.local_vars = {}

    def execute(self, code_string):
        # Redirect stdout and stderr to capture output
        old_stdout = sys.stdout
        old_stderr = sys.stderr
        redirected_output = io.StringIO()
        redirected_error = io.StringIO()
        sys.stdout = redirected_output
        sys.stderr = redirected_error

        try:
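            # Execute code in a controlled environment
            # For true sandboxing, this would involve subprocesses, Docker, or similar.
            # A single namespace is used for globals and locals; with two separate
            # dicts, functions defined by the code could not see its top-level names.
            exec(code_string, self.global_vars)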
            output = redirected_output.getvalue()
            error = redirected_error.getvalue()
            if error:
                return f"ERROR: {error}\nOUTPUT: {output}"
            return f"SUCCESS: {output}"
        except Exception as e:
            error_traceback = traceback.format_exc()
            return f"EXECUTION FAILED: {error_traceback}\nOUTPUT: {redirected_output.getvalue()}"
        finally:
            # Restore stdout and stderr
            sys.stdout = old_stdout
            sys.stderr = old_stderr

# Example usage within the agent
# code_interpreter = CodeInterpreter()
# result = code_interpreter.execute("import pandas as pd\ndf = pd.DataFrame({'col': [1,2,3]})\nprint(df)")
# print(result)

For production environments, the CodeInterpreter must be robustly sandboxed, perhaps by running code in a separate process, a Docker container, or a dedicated Jupyter kernel managed via jupyter_client. This prevents malicious code execution and isolates dependencies.
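
As a first step toward the separate-process option, the sketch below runs a snippet in a fresh Python interpreter with a timeout. It assumes process isolation alone is acceptable for the illustration: it does not restrict file or network access, so it is not a substitute for a container. The function name run_in_subprocess and the result dictionary layout are hypothetical.

```python
import subprocess
import sys

def run_in_subprocess(code: str, timeout_s: float = 10.0) -> dict:
    """Run code in a fresh Python interpreter and capture its output."""
    try:
        proc = subprocess.run(
            # -I: isolated mode (ignores PYTHON* environment variables
            # and the user site-packages directory).
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return {"ok": False, "stdout": "", "stderr": f"timed out after {timeout_s}s"}
    return {"ok": proc.returncode == 0, "stdout": proc.stdout, "stderr": proc.stderr}
```

Each call starts from a clean interpreter, so variables do not persist between snippets. A multi-step analysis would need a long-lived kernel or explicit state passing.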

Notebook Generation Logic

Once the agent has generated code snippets, markdown explanations, and potentially executed some steps to gather results, these pieces need to be assembled into a coherent Jupyter Notebook. The nbformat library is the standard Python library for reading, writing, and manipulating .ipynb files.

The agent will construct a list of notebook cells, each containing either code or markdown. For code cells, it might also include execution outputs if the code was run internally for verification or to provide context.

import nbformat
from nbformat.v4 import new_notebook, new_code_cell, new_markdown_cell

class NotebookAssembler:
    def __init__(self):
        pass

    def assemble_notebook(self, cells, filename="generated_notebook.ipynb"):
        """
        Assembles a list of cells into a Jupyter Notebook file.

        Args:
            cells (list): A list of dictionaries, each representing a cell.
                          Example: [{"cell_type": "code", "source": "print('Hello')"},
                                    {"cell_type": "markdown", "source": "# Introduction"}]
            filename (str): The name of the output .ipynb file.
        Returns:
            str: The path to the generated notebook file.
        """
        notebook = new_notebook()
        for cell_data in cells:
            if cell_data["cell_type"] == "code":
                cell = new_code_cell(cell_data["source"])
                # If execution outputs were captured, they could be added here
                # cell.outputs = [...]
            elif cell_data["cell_type"] == "markdown":
                cell = new_markdown_cell(cell_data["source"])
            else:
                print(f"Warning: Unknown cell type '{cell_data['cell_type']}', skipping.")
                continue
            notebook.cells.append(cell)

        try:
            with open(filename, 'w', encoding='utf-8') as f:
                nbformat.write(notebook, f)
            print(f"Notebook successfully saved to {filename}")
            return filename
        except Exception as e:
            print(f"Error saving notebook to {filename}: {e}")
            raise
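
To make the cell structure concrete, the sketch below builds the same kind of notebook as a plain dictionary, including a captured stdout stream for code cells. It is meant only to show what ends up in the .ipynb JSON; nbformat, with its validation, remains the better choice for real use. The function name build_notebook_dict and the "stdout" key on the input cells are assumptions made for this example.

```python
import json

def build_notebook_dict(cells):
    """Build a minimal nbformat v4 notebook as a plain dict."""
    nb_cells = []
    for cell in cells:
        if cell["cell_type"] == "code":
            outputs = []
            if cell.get("stdout"):
                # A "stream" output is how captured stdout text is stored.
                outputs.append(
                    {"output_type": "stream", "name": "stdout", "text": cell["stdout"]}
                )
            nb_cells.append({
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": outputs,
                "source": cell["source"],
            })
        elif cell["cell_type"] == "markdown":
            nb_cells.append(
                {"cell_type": "markdown", "metadata": {}, "source": cell["source"]}
            )
    return {"cells": nb_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}

def save_notebook_dict(notebook, filename):
    """Write the notebook dict to disk as JSON."""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1)
```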

Execution Environment (for Verification and Testing)

While the agent generates the notebook, it's often beneficial for it to execute portions of the generated code internally to verify correctness, gather outputs, and inform subsequent steps. This execution must occur in a controlled, sandboxed environment to prevent security risks and manage dependencies.

Options for an execution environment include:

- Isolated Python subprocess calls.
- Dedicated Docker containers, providing strong isolation.
- Using jupyter_client to programmatically interact with a Jupyter kernel. This allows executing cells and capturing rich outputs, similar to how a user would interact with a notebook.

The execution environment should also manage dependencies. Before running generated code, it might need to install required libraries (e.g., pandas, matplotlib).

GPU/Hardware Abstraction

As highlighted in the LLM Integration Layer, supporting diverse GPU architectures is paramount for broad applicability. The strategy involves:

- Detection: Programmatically identify the available hardware (NVIDIA, AMD, Apple, Intel). Libraries like torch offer functions such as torch.cuda.is_available() (which also reports AMD GPUs on ROCm builds of PyTorch), torch.backends.mps.is_available(), and torch.xpu.is_available() for Intel GPUs. Environment variables like ROCM_PATH can also be indicative.
- Configuration: Based on detection, configure the LLM inference engine.
  - For llama-cpp-python, this means setting n_gpu_layers appropriately during Llama object instantiation.
  - For transformers models, it involves moving the model to the correct device: model.to("cuda") (also used for ROCm builds), model.to("mps"), model.to("xpu"), or model.to("cpu").
  - For models that require specific backend installations (e.g., PyTorch with ROCm or Intel oneAPI), the system should guide the user on prerequisites or attempt to use a CPU fallback if GPU is unavailable or misconfigured.
- Fallback: Always provide a CPU fallback mechanism if GPU acceleration is not available or encounters errors. This ensures the agent remains functional, albeit with potentially slower performance.
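
The detection, configuration, and fallback steps can be sketched as two small helpers. This is a minimal illustration that assumes PyTorch may or may not be installed. The names detect_torch_device and gpu_layers_for are hypothetical, and the n_gpu_layers value only has an effect if llama-cpp-python was built with a matching GPU backend.

```python
def detect_torch_device() -> str:
    """Return a PyTorch device string, preferring GPUs and falling back to "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so only CPU is assumed.

    if torch.cuda.is_available():
        # ROCm builds of PyTorch also report through the torch.cuda API
        # (torch.version.hip is set on those builds); the device stays "cuda".
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"
    return "cpu"

def gpu_layers_for(device: str) -> int:
    """Map a device string to an n_gpu_layers value: -1 (all layers) or 0 (CPU)."""
    return 0 if device == "cpu" else -1
```

A transformers model would then be placed with model.to(detect_torch_device()), while a llama-cpp-python model would receive n_gpu_layers=gpu_layers_for(detect_torch_device()).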

DETAILED IMPLEMENTATION ASPECTS

Prompt Design for Notebook Generation

Crafting effective prompts for the LLM is an art and a science. The agent's internal prompt should guide the LLM to:

- Understand the user's intent.
- Identify necessary tools.
- Generate correct and executable code.
- Provide clear markdown explanations.
- Format the output appropriately for notebook cells.
- Handle potential errors gracefully.

The prompt should include:

- Role definition: "You are an expert Python programmer and data scientist."
- Task description: "Your goal is to generate a Jupyter Notebook to analyze data."
- Available tools: "You have access to a CodeInterpreter tool to run Python code and a NotebookAssembler to save the final notebook."
- Output format instructions: "Generate code cells prefixed with 'CODE:' and markdown cells with 'MARKDOWN:'. If you need to execute code to get information, use the CodeInterpreter and respond with the output."
- Constraints: "Ensure all necessary imports are at the beginning of the code cells. Provide comments for complex logic."
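
The prompt components listed above can be assembled programmatically so that every LLM call receives the same structure. The following is a minimal sketch: build_agent_prompt and the constant names are hypothetical, and the wording of each section is taken from the examples above.

```python
ROLE = "You are an expert Python programmer and data scientist."
TASK = "Your goal is to generate a Jupyter Notebook to analyze data."
OUTPUT_FORMAT = (
    "Generate code cells prefixed with 'CODE:' and markdown cells "
    "prefixed with 'MARKDOWN:'."
)
CONSTRAINTS = (
    "Ensure all necessary imports are at the beginning of the code cells. "
    "Provide comments for complex logic."
)

def build_agent_prompt(user_request: str, tools: dict) -> str:
    """Combine role, task, tools, format rules, constraints, and the request.

    tools maps a tool name to a one-line description.
    """
    tool_lines = "\n".join(f"- {name}: {desc}" for name, desc in tools.items())
    sections = [
        ROLE,
        TASK,
        f"Available tools:\n{tool_lines}",
        f"Output format: {OUTPUT_FORMAT}",
        f"Constraints: {CONSTRAINTS}",
        f"User Request: {user_request}",
    ]
    return "\n\n".join(sections)
```

Keeping the user request last is a design choice for this sketch; some models respond better when instructions are repeated after the request.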

Agent Loop: Plan, Act, Observe, Refine

The iterative nature of the agent's operation is key to its intelligence.

- Plan: The LLM generates a high-level plan.
- Act: The LLM generates code or tool calls based on the plan.
- Observe: The CodeInterpreter or other tools execute the action and return results (output, errors, data).
- Refine: The LLM analyzes the observations. If successful, it proceeds to the next plan step. If an error occurs, it attempts to debug and correct the code, or adjust the plan. This feedback loop is what makes the agent robust.

Code Execution and Sandboxing

As previously discussed, executing arbitrary code generated by an LLM requires strict sandboxing.

- Security: Prevent access to sensitive files, network resources, or system commands. Docker containers are an excellent solution for this, providing strong isolation.
- Dependency Management: Each execution environment should have its own set of dependencies. The agent might need to infer and install required libraries (e.g., pip install pandas) before running the analysis code.
- State Management: For a multi-step analysis, the execution environment needs to maintain state (e.g., variables defined in one cell should be accessible in subsequent cells). This is naturally handled by a single Jupyter kernel or by passing state explicitly between sandbox runs.
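
The state-management point can be illustrated with a runner that executes successive snippets against one shared namespace, much as cells share a kernel. This sketch runs in-process purely for illustration and provides no sandboxing. StatefulRunner is a hypothetical name.

```python
import contextlib
import io

class StatefulRunner:
    """Runs code snippets against one shared namespace, like cells in a kernel."""

    def __init__(self):
        self.namespace = {"__name__": "__main__"}

    def run(self, code: str) -> str:
        """Execute code and return whatever it printed to stdout."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            # One dict serves as both globals and locals, so functions defined
            # in one snippet can see names defined in earlier snippets.
            exec(code, self.namespace)
        return buffer.getvalue()

runner = StatefulRunner()
runner.run("total = 40")                       # "cell" 1 defines a variable
runner.run("def add(n):\n    return total + n")  # "cell" 2 defines a function
print(runner.run("print(add(2))"))             # "cell" 3 uses both; prints 42
```

When snippets run in separate processes or containers instead, the same effect requires serializing intermediate results (for example to files) between runs.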

Handling Dependencies

The generated notebooks will inevitably rely on various Python libraries (e.g., pandas, matplotlib, scikit-learn). The agent should:

- Explicitly include import statements in the generated code.
- Potentially suggest or automatically add !pip install <library_name> commands in the notebook's initial cells if it detects missing dependencies in the execution environment.

The execution environment itself must be configured with common data science libraries or have the capability to install them on demand.
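
Detecting missing dependencies can be approximated by parsing the generated code for imports and checking whether each top-level module can be found. The sketch below uses only the standard library. The function names are hypothetical, and it assumes the check runs inside the same environment that will execute the notebook.

```python
import ast
import importlib.util

def imported_modules(code: str) -> set:
    """Return the top-level module names imported by the given source code."""
    names = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import x".
            names.add(node.module.split(".")[0])
    return names

def missing_modules(code: str) -> list:
    """Return imported modules that cannot be found in the current environment."""
    return sorted(
        name for name in imported_modules(code)
        if importlib.util.find_spec(name) is None
    )
```

Note that import names and package names can differ (for example, the sklearn module is installed as scikit-learn), so a mapping step is needed before generating pip install commands.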

Error Handling and Debugging

LLMs can make mistakes. The agent must be designed to handle errors gracefully.

- Capture Errors: The CodeInterpreter must capture stdout, stderr, and exceptions.
- Feedback to LLM: Error messages and stack traces should be fed back to the LLM in the "Observe" phase.
- Correction Loop: The LLM should then attempt to debug the code, generate a corrected version, or modify its plan. This might involve prompting the LLM with the error message and the problematic code, asking it to identify and fix the issue.
- User Notification: If the agent cannot resolve an error after several attempts, it should inform the user.
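
A bounded correction loop along these lines might look like the sketch below. The fix_code callable stands in for an LLM call that receives the failing code and its traceback; its signature is an assumption made for this example, as is the name run_with_corrections. Code runs in-process here for brevity, whereas a real agent would use its sandboxed interpreter.

```python
import traceback

def run_with_corrections(code, fix_code, max_attempts=3):
    """Execute code; on failure ask fix_code(code, error) for a revised version.

    Returns (final_code, error), where error is None on success and the last
    traceback string if every attempt failed.
    """
    error = None
    for attempt in range(1, max_attempts + 1):
        try:
            exec(code, {})  # In-process for brevity only; not sandboxed.
            return code, None
        except Exception:
            error = traceback.format_exc()
            if attempt < max_attempts:
                # Feed the error back and try the corrected version next.
                code = fix_code(code, error)
    return code, error
```

When the returned error is not None, the agent has exhausted its attempts and should surface the traceback to the user rather than keep retrying.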

Security Considerations

Running LLM-generated code poses significant security risks.

- Sandboxing: This is the most critical measure. Isolate code execution in containers or virtual machines.
- Resource Limits: Limit CPU, memory, and execution time to prevent denial-of-service attacks or runaway processes.
- Input Validation: While the agent processes natural language, any direct file paths or external resource URLs provided by the user or generated by the LLM should be carefully validated.
- Least Privilege: The execution environment should run with the minimum necessary permissions.
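
For the input-validation point, one common check is to confine file paths to a designated workspace directory. The sketch below is a minimal illustration; resolve_inside and the idea of a single workspace root are assumptions for this example, and it complements sandboxing rather than replacing it.

```python
from pathlib import Path

def resolve_inside(workspace: str, user_path: str) -> Path:
    """Resolve user_path relative to workspace and reject paths that escape it."""
    root = Path(workspace).resolve()
    # resolve() normalizes ".." segments and follows symlinks; an absolute
    # user_path replaces root entirely and is then rejected below.
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"Path escapes workspace: {user_path}")
    return candidate
```

A file-reading tool would call resolve_inside before opening anything, so that a prompt such as "read ../../etc/passwd" fails with a clear error instead of leaking data.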

CONCLUSION

Building an LLM-based agent for Jupyter Notebook generation is a complex yet highly rewarding endeavor. By meticulously designing the agent's orchestration, abstracting LLM interactions, providing robust tooling, and ensuring secure code execution across diverse hardware, we can create a powerful system that significantly enhances productivity and accessibility in data science and development. The ability to seamlessly switch between local and remote LLMs, coupled with comprehensive GPU support, ensures the agent's versatility and performance for a wide range of users and computational environments. Such an agent moves us closer to a future where natural language is a primary interface for complex computational tasks, empowering more individuals to harness the power of data and AI.

ADDENDUM: FULL RUNNING EXAMPLE

This full running example demonstrates a complete NotebookAgent that can process a user prompt, generate Python code, and assemble a Jupyter Notebook. It includes the LLMConnector, CodeInterpreter, and NotebookAssembler components, integrated into a cohesive system. For the purpose of this running example, the LLMConnector will be configured to use a local LLM, and the CodeInterpreter will run in a simplified in-process mode for demonstration, with the understanding that a production system would require robust sandboxing.

First, ensure you have the necessary libraries installed (torch is included because the code below imports it for GPU detection):

pip install openai llama-cpp-python nbformat pandas matplotlib torch

You will also need a local GGUF model file, for example, llama2-7b-chat.Q4_K_M.gguf. Place this file in a directory named models relative to where your script runs. You can download such models from Hugging Face (e.g., TheBloke's repositories).

We will use a simple sales_data.csv file for our example. Create this file in the same directory as your Python script:

sales_data.csv

Date,Product,Sales
2023-01-01,Product A,100
2023-01-01,Product B,150
2023-01-02,Product A,120
2023-01-02,Product C,200
2023-01-03,Product B,180
2023-01-03,Product A,110
2023-02-01,Product A,90
2023-02-01,Product C,250
2023-02-02,Product B,160
2023-02-02,Product A,130
2023-03-01,Product A,110
2023-03-01,Product B,170
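
As a sanity reference for checking what a generated notebook reports, the totals for this sample file can be computed with the standard library alone. This sketch is independent of the agent; the function name summarize is hypothetical, and the data is embedded as a string so the snippet runs on its own.

```python
import csv
import io
from collections import defaultdict

SALES_CSV = """Date,Product,Sales
2023-01-01,Product A,100
2023-01-01,Product B,150
2023-01-02,Product A,120
2023-01-02,Product C,200
2023-01-03,Product B,180
2023-01-03,Product A,110
2023-02-01,Product A,90
2023-02-01,Product C,250
2023-02-02,Product B,160
2023-02-02,Product A,130
2023-03-01,Product A,110
2023-03-01,Product B,170
"""

def summarize(csv_text):
    """Return (total sales per product, total sales per month)."""
    by_product = defaultdict(int)
    by_month = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        amount = int(row["Sales"])
        by_product[row["Product"]] += amount
        by_month[row["Date"][:7]] += amount  # "YYYY-MM" prefix of the date
    return dict(by_product), dict(by_month)

by_product, by_month = summarize(SALES_CSV)
print(by_product)  # {'Product A': 660, 'Product B': 660, 'Product C': 450}
print(by_month)    # {'2023-01': 860, '2023-02': 630, '2023-03': 280}
```

The sample contains only three products, and Product A and Product B tie at 660, so a "top 5" request will list three rows whose order for the tied pair depends on the sort used.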

Now, here is the complete Python code for the agent:

import os
import io
import sys
import traceback
import pandas as pd
import matplotlib.pyplot as plt
import nbformat
from nbformat.v4 import new_notebook, new_code_cell, new_markdown_cell
import torch
from openai import OpenAI
from llama_cpp import Llama # For local GGUF models

# --- 1. LLM Integration Layer ---
class LLMConnector:
    """
    Connects to various LLMs, abstracting local and remote inference.
    Handles GPU detection and configuration for local models.
    """
    def __init__(self, model_type="local", model_name="llama2-7b-chat.Q4_K_M.gguf", api_key=None, base_url=None):
        self.model_type = model_type
        self.model_name = model_name
        self.api_key = api_key
        self.base_url = base_url
        self.llm_instance = None
        self._initialize_llm()

    def _initialize_llm(self):
        """Initializes the LLM instance based on model_type."""
        if self.model_type == "remote":
            if not self.api_key:
                raise ValueError("API key is required for remote LLM.")
            # If base_url is provided, it can be a custom endpoint (e.g., local OpenAI-compatible server)
            self.llm_instance = OpenAI(api_key=self.api_key, base_url=self.base_url)
            print(f"Initialized remote LLM: {self.model_name}")
        elif self.model_type == "local":
            model_path = os.path.join("models", self.model_name)
            if not os.path.exists(model_path):
                raise FileNotFoundError(f"Local model not found at {model_path}. Please download it and place it in the 'models' directory.")

            # Determine GPU layers based on available hardware
            n_gpu_layers = 0
            if torch.cuda.is_available():
                print("CUDA GPU detected. Using all GPU layers.")
                n_gpu_layers = -1 # Use all GPU layers
            elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
                print("Apple MPS detected. Using all GPU layers.")
                n_gpu_layers = -1 # Use all GPU layers
            elif os.getenv("ROCM_PATH") or (hasattr(torch, 'xpu') and torch.xpu.is_available()):
                # Basic check for ROCm (AMD) or Intel XPU (oneAPI)
                # Note: Full ROCm/Intel support with llama.cpp requires specific compilation.
                # This check is a best effort.
                print("ROCm or Intel XPU detected. Attempting to use all GPU layers.")
                n_gpu_layers = -1 # Use all GPU layers
            else:
                print("No suitable GPU detected or configured for local LLM. Running on CPU.")
                n_gpu_layers = 0 # Run on CPU

            try:
                self.llm_instance = Llama(
                    model_path=model_path,
                    n_ctx=4096, # Context window size, adjust as needed
                    n_gpu_layers=n_gpu_layers, # Number of layers to offload to GPU
                    verbose=False # Suppress Llama.cpp verbose output
                )
                print(f"Initialized local LLM: {self.model_name} with {n_gpu_layers} GPU layers.")
            except Exception as e:
                print(f"Error initializing local LLM with GPU support: {e}. Falling back to CPU.")
                self.llm_instance = Llama(
                    model_path=model_path,
                    n_ctx=4096,
                    n_gpu_layers=0, # Force CPU
                    verbose=False
                )
        else:
            raise ValueError(f"Unsupported LLM model type: {self.model_type}. Choose 'local' or 'remote'.")

    def invoke(self, prompt, max_tokens=1024, temperature=0.7):
        """
        Invokes the LLM with the given prompt.
        """
        messages = [{"role": "user", "content": prompt}]
        if self.model_type == "remote":
            try:
                response = self.llm_instance.chat.completions.create(
                    model=self.model_name,
                    messages=messages,
                    max_tokens=max_tokens,
                    temperature=temperature
                )
                return response.choices[0].message.content
            except Exception as e:
                print(f"Error invoking remote LLM: {e}")
                raise
        elif self.model_type == "local":
            try:
                response = self.llm_instance.create_chat_completion(
                    messages=messages,
                    max_tokens=max_tokens,
                    temperature=temperature
                )
                return response["choices"][0]["message"]["content"]
            except Exception as e:
                print(f"Error invoking local LLM: {e}")
                raise
        return "" # Should not be reached

# --- 2. Tooling Layer: Code Interpreter ---
class CodeInterpreter:
    """
    Executes Python code in a controlled environment.
    For production, this should be a sandboxed subprocess or Docker container.
    """
    def __init__(self):
        # One shared namespace keeps state between executions. Passing separate
        # globals and locals to exec() would stop functions defined by the code
        # from seeing variables created at the top level of an earlier snippet.
        self.global_vars = {'pd': pd, 'plt': plt} # Pre-import common libraries

    def execute(self, code_string):
        """
        Executes the given Python code string and captures its output.
        """
        old_stdout = sys.stdout
        old_stderr = sys.stderr
        redirected_output = io.StringIO()
        redirected_error = io.StringIO()
        sys.stdout = redirected_output
        sys.stderr = redirected_error

        try:
            # Runs in the current process with no isolation; this is NOT a sandbox.
            exec(code_string, self.global_vars)
            output = redirected_output.getvalue()
            error = redirected_error.getvalue()
            if error:
                # stderr also carries warnings, so non-empty stderr is not necessarily a failure
                return f"EXECUTION SUCCESS (with stderr output):\n{error}\nOUTPUT (stdout):\n{output}"
            return f"EXECUTION SUCCESS:\n{output}"
        except Exception as e:
            error_traceback = traceback.format_exc()
            return f"EXECUTION FAILED (exception):\n{error_traceback}\nOUTPUT (stdout):\n{redirected_output.getvalue()}"
        finally:
            sys.stdout = old_stdout
            sys.stderr = old_stderr

# --- 3. Notebook Generation Logic ---
class NotebookAssembler:
    """
    Assembles a list of cells into a Jupyter Notebook (.ipynb) file.
    """
    def __init__(self):
        pass

    def assemble_notebook(self, cells, filename="generated_notebook.ipynb"):
        """
        Assembles a list of cell data into a Jupyter Notebook file.

        Args:
            cells (list): A list of dictionaries, each representing a cell.
                          Example: [{"cell_type": "code", "source": "print('Hello')"},
                                    {"cell_type": "markdown", "source": "# Introduction"}]
            filename (str): The name of the output .ipynb file.
        Returns:
            str: The path to the generated notebook file.
        """
        notebook = new_notebook()
        for cell_data in cells:
            if cell_data["cell_type"] == "code":
                cell = new_code_cell(cell_data["source"])
                # In a more advanced system, outputs from CodeInterpreter could be added here
                if "outputs" in cell_data:
                    cell.outputs = cell_data["outputs"]
            elif cell_data["cell_type"] == "markdown":
                cell = new_markdown_cell(cell_data["source"])
            else:
                print(f"Warning: Unknown cell type '{cell_data['cell_type']}', skipping.")
                continue
            notebook.cells.append(cell)

        try:
            with open(filename, 'w', encoding='utf-8') as f:
                nbformat.write(notebook, f)
            print(f"Notebook successfully saved to {filename}")
            return filename
        except Exception as e:
            print(f"Error saving notebook to {filename}: {e}")
            raise

# --- 4. Agent Orchestration Layer ---
class NotebookAgent:
    """
    Orchestrates the LLM, tools, and notebook assembly to generate Jupyter Notebooks.
    """
    def __init__(self, llm_connector, code_interpreter, notebook_assembler):
        self.llm = llm_connector
        self.code_interpreter = code_interpreter
        self.notebook_assembler = notebook_assembler
        self.notebook_cells = [] # Stores generated cells
        self.conversation_history = [] # For maintaining context with the LLM

    def _add_to_history(self, role, content):
        """Adds a message to the conversation history."""
        self.conversation_history.append({"role": role, "content": content})

    def _get_full_prompt(self, current_instruction):
        """Constructs the full prompt including history and current instruction."""
        # This is a simplified approach; for production, a more sophisticated
        # prompt engineering strategy (e.g., few-shot examples, specific tool descriptions)
        # would be used.
        base_prompt = """
        You are an expert Python programmer and data scientist. Your goal is to generate a
        Jupyter Notebook based on the user's request. You have access to a CodeInterpreter
        tool to execute Python code and observe its output. You must generate code cells
        and markdown cells.

        Instructions:
        1.  Start with a markdown introduction.
        2.  For each step, generate the necessary Python code.
        3.  If you need to verify code or get data, use the CodeInterpreter tool by
            outputting "TOOL_CODE_EXEC:<your python code here>". The output of the tool
            will be provided to you.
        4.  If you want to output a code cell for the notebook, use "NOTEBOOK_CODE:<your python code here>".
        5.  If you want to output a markdown cell for the notebook, use "NOTEBOOK_MARKDOWN:<your markdown content here>".
        6.  Ensure all necessary imports are at the beginning of relevant code cells.
        7.  Provide explanations in markdown cells for each code block.
        8.  Do not include any `!pip install` commands in the generated code, assume libraries are available.
        9.  After completing the task, indicate completion with "TASK_COMPLETE".

        Current Notebook Cells (so far):
        """
        current_cells_str = "\n".join([f"  - {c['cell_type'].upper()}: {c['source'][:50]}..." for c in self.notebook_cells])
        if not current_cells_str:
            current_cells_str = "  (No cells yet)"

        history_str = "\n".join([f"{msg['role'].upper()}: {msg['content']}" for msg in self.conversation_history])

        return f"{base_prompt}\n{current_cells_str}\n\n{history_str}\n\nUSER_INSTRUCTION: {current_instruction}\n\nYOUR_RESPONSE:"

    def generate_notebook_from_prompt(self, user_prompt, output_filename="generated_notebook.ipynb"):
        """
        Generates a Jupyter Notebook based on the user's natural language prompt.
        """
        print(f"Agent received prompt: '{user_prompt}'")
        self._add_to_history("user", user_prompt)

        max_iterations = 15 # Prevent infinite loops
        iteration = 0
        task_completed = False

        while iteration < max_iterations and not task_completed:
            iteration += 1
            print(f"\n--- Agent Iteration {iteration} ---")
            current_instruction = f"Continue generating the notebook based on the user's request: '{user_prompt}'. " \
                                  f"Current state: {len(self.notebook_cells)} cells generated. " \
                                  f"If the task is complete, output 'TASK_COMPLETE'."

            full_llm_prompt = self._get_full_prompt(current_instruction)
            llm_response = self.llm.invoke(full_llm_prompt, max_tokens=2048, temperature=0.2)
            self._add_to_history("assistant", llm_response)
            print(f"LLM Response:\n{llm_response}")

            # Note completion, but still process any cells in this same response;
            # breaking out first would discard cells sent alongside TASK_COMPLETE.
            task_completed = "TASK_COMPLETE" in llm_response

            # Split the response into tagged blocks. A tag opens a block, and untagged
            # lines that follow belong to it, so multi-line cells are kept intact.
            tags = ("NOTEBOOK_MARKDOWN:", "NOTEBOOK_CODE:", "TOOL_CODE_EXEC:")
            blocks = [] # Each entry: (tag, list of content lines)
            for line in llm_response.replace("TASK_COMPLETE", "").split('\n'):
                tag = next((t for t in tags if line.startswith(t)), None)
                if tag:
                    blocks.append((tag, [line[len(tag):].lstrip()]))
                elif blocks:
                    blocks[-1][1].append(line)
                # Untagged text before the first tag is ignored; it stays in history.

            action_taken = False
            for tag, block_lines in blocks:
                content = "\n".join(block_lines).strip()
                if not content:
                    continue
                if tag == "NOTEBOOK_MARKDOWN:":
                    self.notebook_cells.append({"cell_type": "markdown", "source": content})
                    print(f"Added MARKDOWN cell: {content[:50]}...")
                elif tag == "NOTEBOOK_CODE:":
                    self.notebook_cells.append({"cell_type": "code", "source": content})
                    print(f"Added CODE cell: {content[:50]}...")
                else: # TOOL_CODE_EXEC:
                    print(f"Executing code with CodeInterpreter: {content[:100]}...")
                    execution_result = self.code_interpreter.execute(content)
                    self._add_to_history("tool_output", execution_result)
                    print(f"CodeInterpreter Output:\n{execution_result[:200]}...") # Limit output for console
                action_taken = True

            if task_completed:
                print("LLM indicated task completion.")
            elif not action_taken:
                print("LLM did not provide a recognized action. Will re-prompt.")
                # This might indicate the LLM is stuck or needs more specific guidance.
                # In a real system, this might trigger an error or a more direct prompt to the LLM.

        if not task_completed:
            print("Agent reached maximum iterations without completing the task.")

        # Final assembly and saving
        if self.notebook_cells:
            final_notebook_path = self.notebook_assembler.assemble_notebook(self.notebook_cells, output_filename)
            print(f"Notebook generation complete. Saved to: {final_notebook_path}")
            return final_notebook_path
        else:
            print("No cells were generated for the notebook.")
            return None

# --- Main Execution Block ---
if __name__ == "__main__":
    # Ensure 'models' directory exists for local LLM
    if not os.path.exists("models"):
        os.makedirs("models")
        print("Created 'models' directory. Please place your GGUF model file (e.g., llama2-7b-chat.Q4_K_M.gguf) inside it.")
        sys.exit(1) # Exit if model not present

    # Create a dummy sales_data.csv for the example
    sales_data_content = """Date,Product,Sales
2023-01-01,Product A,100
2023-01-01,Product B,150
2023-01-02,Product A,120
2023-01-02,Product C,200
2023-01-03,Product B,180
2023-01-03,Product A,110
2023-02-01,Product A,90
2023-02-01,Product C,250
2023-02-02,Product B,160
2023-02-02,Product A,130
2023-03-01,Product A,110
2023-03-01,Product B,170
"""
    with open("sales_data.csv", "w") as f:
        f.write(sales_data_content)
    print("Created 'sales_data.csv' for the example.")

    # --- Configuration ---
    # Choose 'local' or 'remote'
    # For 'remote', provide your OpenAI API key and model name
    # For 'local', ensure your GGUF model is in the 'models' directory
    LLM_CONFIG = {
        "type": "local",
        "model_name": "llama2-7b-chat.Q4_K_M.gguf", # Replace with your model if different
        "api_key": os.getenv("OPENAI_API_KEY"), # Only needed for remote
        "base_url": None # For custom OpenAI-compatible endpoints
    }

    print("\nInitializing LLM Connector...")
    llm_connector = LLMConnector(
        model_type=LLM_CONFIG["type"],
        model_name=LLM_CONFIG["model_name"],
        api_key=LLM_CONFIG["api_key"],
        base_url=LLM_CONFIG["base_url"]
    )

    print("\nInitializing Code Interpreter and Notebook Assembler...")
    code_interpreter = CodeInterpreter()
    notebook_assembler = NotebookAssembler()

    print("\nInitializing Notebook Agent...")
    agent = NotebookAgent(llm_connector, code_interpreter, notebook_assembler)

    user_request = "Analyze 'sales_data.csv'. Show the top 5 products by total sales. Create a line plot of monthly sales trends. Save the notebook as 'sales_analysis.ipynb'."
    print(f"\nUser Request: {user_request}")

    generated_notebook_path = agent.generate_notebook_from_prompt(user_request, "sales_analysis.ipynb")

    if generated_notebook_path:
        print(f"\nSuccessfully generated notebook: {generated_notebook_path}")
        print("You can now open 'sales_analysis.ipynb' with Jupyter Lab or Jupyter Notebook.")
    else:
        print("\nNotebook generation failed or no cells were produced.")
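
The CodeInterpreter docstring above notes that, for production, code should run in a sandboxed subprocess or Docker container rather than through exec() in the agent's own process. As a first step in that direction, the following is a minimal sketch of a subprocess-based runner. The function name run_in_subprocess and its timeout_seconds parameter are hypothetical and not part of the script above. It returns result strings in the same style as CodeInterpreter.execute. A child process provides crash isolation and a timeout, but it is not a security sandbox on its own.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_subprocess(code_string, timeout_seconds=10):
    """Run code_string in a fresh Python process and return a result string.

    Hypothetical helper for illustration. The child inherits the caller's
    working directory, so relative paths such as 'sales_data.csv' still resolve.
    """
    with tempfile.TemporaryDirectory() as workdir:
        # Write the snippet to a temporary file so tracebacks show line numbers.
        script_path = Path(workdir) / "snippet.py"
        script_path.write_text(code_string, encoding="utf-8")
        try:
            completed = subprocess.run(
                [sys.executable, str(script_path)],
                capture_output=True,  # Collect stdout and stderr separately
                text=True,            # Decode output to str
                timeout=timeout_seconds,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising TimeoutExpired.
            return f"EXECUTION FAILED (timeout after {timeout_seconds}s)"

    if completed.returncode != 0:
        return (
            f"EXECUTION FAILED (exit code {completed.returncode}):\n"
            f"{completed.stderr}\nOUTPUT (stdout):\n{completed.stdout}"
        )
    return f"EXECUTION SUCCESS:\n{completed.stdout}"


if __name__ == "__main__":
    print(run_in_subprocess("print(sum(range(5)))"))
```

The trade-off is that each call starts from a clean interpreter, so variables do not persist between snippets the way they do with the exec()-based CodeInterpreter. An agent using this approach would need to send self-contained snippets or replay earlier cells before each run.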

Tuesday, June 02, 2026
