Wednesday, February 11, 2026

BUILDING YOUR FIRST RECURSIVE DESCENT COMPILER: A JOURNEY INTO THE HEART OF LANGUAGE PROCESSING




Welcome, brave developer, to one of the most rewarding adventures in computer science! Today we're going to build a recursive descent compiler from scratch. Think of it as constructing a mechanical translator that can understand mathematical expressions like "x + 2 * (y - 3)" and transform them into something a computer can execute. It's like teaching a robot to read mathematical poetry and turn it into precise instructions.


WHAT EXACTLY IS A RECURSIVE DESCENT COMPILER?


Imagine you're reading a complex sentence in a foreign language. You break it down piece by piece, understanding each part before moving to the next. A recursive descent compiler works similarly - it reads source code character by character, recognizes patterns (like numbers, operators, and parentheses), and builds a tree-like structure that represents the meaning of the code.


The "recursive" part comes from the fact that our compiler calls itself to handle nested structures. When it encounters parentheses in an expression like "(x + y) * z", it recursively processes the inner expression "x + y" before continuing with the multiplication.


The "descent" refers to how the compiler moves down through different levels of the grammar rules, from high-level concepts like "expression" down to basic elements like "number" or "variable".


THE ANATOMY OF OUR TARGET LANGUAGE


Before we start building, let's define exactly what our compiler will understand. Our mathematical expression language will support:


Variables represented by single letters like x, y, z. These act as placeholders for numbers that will be provided at runtime.


Integer numbers such as 42, 0, or 999. We'll keep things simple and stick to positive integers for now.


The four basic arithmetic operations: addition (+), subtraction (-), multiplication (*), and division (/). These follow standard mathematical precedence rules.


Parentheses for grouping expressions and overriding default precedence, just like in regular mathematics.


Here are some valid expressions in our language:

- x + 5

- (a + b) * c

- x * 2 + y / 3 - z


THE FORMAL GRAMMAR IN BNF NOTATION


Every programming language needs a precise specification of its syntax. We use Backus-Naur Form (BNF) to describe our grammar formally. BNF is like a recipe book for valid sentences in our language. Here's our complete grammar:


    <expression> ::= <term> ( ( '+' | '-' ) <term> )*

    <term>       ::= <factor> ( ( '*' | '/' ) <factor> )*

    <factor>     ::= <number> | <variable> | '(' <expression> ')'

    <number>     ::= [0-9]+

    <variable>   ::= [a-zA-Z]


Let me explain what each symbol means in this mathematical poetry:


The angle brackets like <expression> represent non-terminal symbols - these are concepts that get broken down into smaller parts.


The ::= symbol means "is defined as" or "consists of".


The vertical bar | means "or" - it gives us choices.


The parentheses () group things together, and the asterisk * means "zero or more repetitions".


The square brackets in [0-9]+ represent character classes, where + means "one or more".


This grammar elegantly captures operator precedence through its hierarchical structure. Expressions are built from terms, terms are built from factors, and factors are the most basic elements. This hierarchy ensures that multiplication and division bind more tightly than addition and subtraction, exactly as in mathematics.


The beauty of this grammar is that it's both human-readable and directly translatable into code. Each non-terminal symbol will become a method in our recursive descent parser.


THE GRAMMAR: THE DNA OF OUR LANGUAGE


Now let's understand how our BNF grammar works in practice. The grammar follows the mathematical precedence rules you learned in school:


An Expression consists of one or more Terms connected by addition or subtraction operators. This handles the lowest precedence operations. When we see "a + b - c", we parse it as terms connected by operators.


A Term consists of one or more Factors connected by multiplication or division operators. This handles medium precedence operations. The expression "a * b / c" becomes factors connected by multiplicative operators.


A Factor is either a number, a variable, or a parenthesized expression. This handles the highest precedence elements. Parentheses can contain any expression, which is where the recursion happens.


This hierarchical structure automatically handles operator precedence. Multiplication and division (in Terms) bind more tightly than addition and subtraction (in Expressions), which is exactly what we want.


The recursive nature appears in the factor rule: a factor can be a parenthesized expression, and an expression can contain terms, which contain factors. This circular definition allows for unlimited nesting like "((a + b) * (c - d)) / e".


THE LEXICAL ANALYZER: BREAKING TEXT INTO TOKENS


Before we can parse expressions, we need to break the input text into meaningful chunks called tokens. Think of this as converting a stream of characters into a stream of words that our parser can understand.


    class TokenType:

        NUMBER = 'NUMBER'

        VARIABLE = 'VARIABLE'

        PLUS = 'PLUS'

        MINUS = 'MINUS'

        MULTIPLY = 'MULTIPLY'

        DIVIDE = 'DIVIDE'

        LPAREN = 'LPAREN'

        RPAREN = 'RPAREN'

        EOF = 'EOF'


    class Token:

        def __init__(self, type, value):

            self.type = type

            self.value = value

        

        def __repr__(self):

            return f'Token({self.type}, {self.value})'


Our Token class is beautifully simple. Each token has a type (what kind of thing it is) and a value (the actual content). For example, the number "42" becomes Token(NUMBER, 42), while the plus sign "+" becomes Token(PLUS, '+').


The lexical analyzer (or lexer) scans through the input character by character and groups them into tokens:


    class Lexer:

        def __init__(self, text):

            self.text = text

            self.pos = 0

            self.current_char = self.text[self.pos] if self.pos < len(self.text) else None

        

        def advance(self):

            """Move to the next character in the input"""

            self.pos += 1

            if self.pos >= len(self.text):

                self.current_char = None

            else:

                self.current_char = self.text[self.pos]

        

        def skip_whitespace(self):

            """Skip over spaces and tabs"""

            while self.current_char is not None and self.current_char.isspace():

                self.advance()

        

        def read_number(self):

            """Read a multi-digit number"""

            result = ''

            while self.current_char is not None and self.current_char.isdigit():

                result += self.current_char

                self.advance()

            return int(result)


The lexer maintains a position pointer that moves through the input text. The advance method moves this pointer forward, while skip_whitespace ignores spaces and tabs (which don't affect the meaning of mathematical expressions).


The read_number method demonstrates an important pattern in lexical analysis. When we encounter a digit, we don't just read one character - we keep reading until we've consumed the entire number. This allows us to handle multi-digit numbers like "123" as a single token rather than three separate digit tokens.
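This "maximal munch" scanning idea can be sketched in isolation. The following standalone helper (not part of the compiler itself, just an illustration) consumes digits until a non-digit appears and reports where scanning stopped:

```python
# Standalone sketch of maximal-munch number scanning: keep consuming digits
# until a non-digit appears, then return the number and the position past it.
def scan_number(text, pos):
    start = pos
    while pos < len(text) and text[pos].isdigit():
        pos += 1
    return int(text[start:pos]), pos

value, next_pos = scan_number("123+x", 0)
print(value, next_pos)  # 123 3
```

The lexer's read_number method applies the same loop, except that it tracks position in instance state rather than returning it.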


    def get_next_token(self):

        """Extract the next token from the input"""

        while self.current_char is not None:

            if self.current_char.isspace():

                self.skip_whitespace()

                continue

            

            if self.current_char.isdigit():

                return Token(TokenType.NUMBER, self.read_number())

            

            if self.current_char.isalpha():

                var = self.current_char

                self.advance()

                return Token(TokenType.VARIABLE, var)

            

            if self.current_char == '+':

                self.advance()

                return Token(TokenType.PLUS, '+')

            

            if self.current_char == '-':

                self.advance()

                return Token(TokenType.MINUS, '-')

            

            if self.current_char == '*':

                self.advance()

                return Token(TokenType.MULTIPLY, '*')

            

            if self.current_char == '/':

                self.advance()

                return Token(TokenType.DIVIDE, '/')

            

            if self.current_char == '(':

                self.advance()

                return Token(TokenType.LPAREN, '(')

            

            if self.current_char == ')':

                self.advance()

                return Token(TokenType.RPAREN, ')')

            

            raise Exception(f'Invalid character: {self.current_char}')

        

        return Token(TokenType.EOF, None)


The get_next_token method is the heart of our lexer. It examines the current character and decides what kind of token to create. Notice how each branch handles a different type of input: digits become numbers, letters become variables, and special characters become operators or parentheses.


The method uses a while loop to skip over whitespace and continues processing until it finds a meaningful character. If it encounters a character it doesn't recognize, it raises an exception with a helpful error message.


ABSTRACT SYNTAX TREES: THE SKELETON OF MEANING


Once we have tokens, we need to organize them into a structure that represents the meaning of the expression. This structure is called an Abstract Syntax Tree (AST). Think of it as a family tree for mathematical operations.


    class ASTNode:

        """Base class for all AST nodes"""

        pass


    class NumberNode(ASTNode):

        def __init__(self, value):

            self.value = value

        

        def __repr__(self):

            return f'NumberNode({self.value})'


    class VariableNode(ASTNode):

        def __init__(self, name):

            self.name = name

        

        def __repr__(self):

            return f'VariableNode({self.name})'


    class BinaryOpNode(ASTNode):

        def __init__(self, left, operator, right):

            self.left = left

            self.operator = operator

            self.right = right

        

        def __repr__(self):

            return f'BinaryOpNode({self.left}, {self.operator}, {self.right})'


Our AST nodes are like building blocks. NumberNode represents literal numbers, VariableNode represents variables, and BinaryOpNode represents operations that take two operands (like addition or multiplication).


The BinaryOpNode is particularly important because it captures the hierarchical nature of mathematical expressions. For the expression "2 + 3 * 4", we'd create a tree where the root is an addition operation, the left child is NumberNode(2), and the right child is a multiplication operation with its own children NumberNode(3) and NumberNode(4).
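To make that tree concrete, here is a standalone sketch using bare-bones stand-in classes (Num and BinOp are simplified hypothetical versions of the node classes above, with operators as plain strings). Evaluating the tree bottom-up shows why the deeper multiplication runs first:

```python
# Simplified stand-in node classes for illustration only.
class Num:
    def __init__(self, value):
        self.value = value

class BinOp:
    def __init__(self, left, op, right):
        self.left, self.op, self.right = left, op, right

# The tree for "2 + 3 * 4": addition at the root, multiplication nested below.
tree = BinOp(Num(2), '+', BinOp(Num(3), '*', Num(4)))

def evaluate(node):
    """Recursively evaluate the tree; children are computed before parents."""
    if isinstance(node, Num):
        return node.value
    ops = {'+': lambda a, b: a + b, '*': lambda a, b: a * b}
    return ops[node.op](evaluate(node.left), evaluate(node.right))

print(evaluate(tree))  # 14, because 3 * 4 is evaluated before the addition
```

The nesting of the tree, not any explicit precedence table, is what makes the result 14 rather than 20.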


THE PARSER: WHERE THE MAGIC HAPPENS


Now we come to the star of our show: the recursive descent parser. This is where we implement the grammar rules and build our AST. The parser reads tokens and constructs the tree structure that represents the meaning of the input.


Notice how each method in our parser corresponds directly to a rule in our BNF grammar. This is the beauty of recursive descent - the code structure mirrors the grammar structure.


    class Parser:

        def __init__(self, lexer):

            self.lexer = lexer

            self.current_token = self.lexer.get_next_token()

        

        def error(self, message="Invalid syntax"):

            raise Exception(message)

        

        def eat(self, token_type):

            """Consume a token of the expected type"""

            if self.current_token.type == token_type:

                self.current_token = self.lexer.get_next_token()

            else:

                self.error(f'Expected {token_type}, got {self.current_token.type}')


The eat method is crucial to understanding how recursive descent works. When we expect a specific type of token (like a plus sign), we call eat(TokenType.PLUS). This method verifies that the current token matches our expectation and then advances to the next token. If the token doesn't match, it raises an error.


This pattern of "expect and consume" appears throughout our parser. It's like reading a recipe and checking off ingredients as you use them.


    def factor(self):

        """Parse a factor: number, variable, or parenthesized expression

        Implements: <factor> ::= <number> | <variable> | '(' <expression> ')'

        """

        token = self.current_token

        

        if token.type == TokenType.NUMBER:

            self.eat(TokenType.NUMBER)

            return NumberNode(token.value)

        

        elif token.type == TokenType.VARIABLE:

            self.eat(TokenType.VARIABLE)

            return VariableNode(token.value)

        

        elif token.type == TokenType.LPAREN:

            self.eat(TokenType.LPAREN)

            node = self.expression()  # Recursive call!

            self.eat(TokenType.RPAREN)

            return node

        

        else:

            self.error(f'Unexpected token: {token.type}')


The factor method implements the <factor> rule from our BNF grammar. Notice the direct correspondence: we handle numbers, variables, and parenthesized expressions exactly as specified in the grammar.


The recursive call to self.expression() when we encounter a left parenthesis is the essence of recursive descent. When we find a nested structure, we call the appropriate parsing method to handle it. This is where the "recursive" in "recursive descent" comes from.


The parentheses case is particularly elegant. We consume the opening parenthesis, recursively parse whatever expression is inside, then consume the closing parenthesis. The result is that parentheses effectively disappear from our AST, having served their purpose of controlling parsing order.


    def term(self):

        """Parse a term: factors connected by * or /

        Implements: <term> ::= <factor> ( ( '*' | '/' ) <factor> )*

        """

        node = self.factor()

        

        while self.current_token.type in (TokenType.MULTIPLY, TokenType.DIVIDE):

            token = self.current_token

            if token.type == TokenType.MULTIPLY:

                self.eat(TokenType.MULTIPLY)

            elif token.type == TokenType.DIVIDE:

                self.eat(TokenType.DIVIDE)

            

            node = BinaryOpNode(left=node, operator=token, right=self.factor())

        

        return node


The term method implements the <term> rule from our BNF grammar. It demonstrates how we handle the repetition indicated by the asterisk in our grammar rule. We start by parsing one factor, then enter a loop that continues as long as we see multiplication or division operators.


Each iteration creates a new BinaryOpNode with the previous result as the left operand and a new factor as the right operand. This left-associative approach means that "a * b * c" is parsed as "(a * b) * c", which matches mathematical convention and ensures that operations are performed in the correct order.
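Associativity is easy to verify directly with division, where grouping changes the answer. A quick check shows why the loop must group to the left:

```python
# Left-associative grouping matters for non-commutative operators.
left_assoc = (8 / 4) / 2    # how the term() loop groups "8 / 4 / 2"
right_assoc = 8 / (4 / 2)   # what a naive right-recursive rule would produce
print(left_assoc, right_assoc)  # 1.0 4.0
```

Only the left-associative reading matches the conventional value of 8 / 4 / 2.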


    def expression(self):

        """Parse an expression: terms connected by + or -

        Implements: <expression> ::= <term> ( ( '+' | '-' ) <term> )*

        """

        node = self.term()

        

        while self.current_token.type in (TokenType.PLUS, TokenType.MINUS):

            token = self.current_token

            if token.type == TokenType.PLUS:

                self.eat(TokenType.PLUS)

            elif token.type == TokenType.MINUS:

                self.eat(TokenType.MINUS)

            

            node = BinaryOpNode(left=node, operator=token, right=self.term())

        

        return node


The expression method implements the <expression> rule from our BNF grammar. It follows the same pattern as term, but operates at a higher level in our grammar hierarchy. It combines terms using addition and subtraction.


Because expression calls term, and term calls factor, we automatically get the correct operator precedence: factors bind most tightly, then terms (multiplication/division), then expressions (addition/subtraction). This is the magic of our grammar design - the precedence is built into the structure.


    def parse(self):

        """Parse the input and return the AST"""

        node = self.expression()

        if self.current_token.type != TokenType.EOF:

            self.error("Expected end of input")

        return node


The parse method is our entry point. It parses a complete expression and verifies that we've consumed all the input. If there are leftover tokens, it means the input contains syntax errors.


THE INTERPRETER: BRINGING EXPRESSIONS TO LIFE


Having an AST is wonderful, but it's just a data structure. To make our expressions actually compute results, we need an interpreter that can traverse the AST and evaluate it.


    class Interpreter:

        def __init__(self, variables=None):

            self.variables = variables or {}

        

        def visit(self, node):

            """Dispatch to the appropriate visit method"""

            method_name = f'visit_{type(node).__name__}'

            visitor = getattr(self, method_name, self.generic_visit)

            return visitor(node)

        

        def generic_visit(self, node):

            raise Exception(f'No visit method for {type(node).__name__}')


Our interpreter uses the Visitor pattern, which is a clean way to perform operations on tree structures. The visit method looks at the type of node it receives and calls the corresponding visit method. This approach keeps the AST nodes simple (they just hold data) while allowing us to add new operations without modifying the node classes.
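Here is a minimal standalone illustration of that extensibility, using tiny hypothetical node classes (Num and Add) rather than the compiler's own. A new operation, pretty-printing, is added as a separate visitor with no changes to the nodes:

```python
# Simplified stand-in nodes for illustration only.
class Num:
    def __init__(self, value):
        self.value = value

class Add:
    def __init__(self, left, right):
        self.left, self.right = left, right

class Printer:
    """A new operation on the tree, added without touching the node classes."""
    def visit(self, node):
        # Same getattr-based dispatch pattern as the interpreter above.
        method = getattr(self, f'visit_{type(node).__name__}')
        return method(node)

    def visit_Num(self, node):
        return str(node.value)

    def visit_Add(self, node):
        return f'({self.visit(node.left)} + {self.visit(node.right)})'

print(Printer().visit(Add(Num(1), Add(Num(2), Num(3)))))  # (1 + (2 + 3))
```

An optimizer, a type checker, or a code generator could each be another visitor class alongside the interpreter, all sharing the same AST.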


    def visit_NumberNode(self, node):

        """Evaluate a number node"""

        return node.value

    

    def visit_VariableNode(self, node):

        """Evaluate a variable node"""

        if node.name not in self.variables:

            raise Exception(f'Undefined variable: {node.name}')

        return self.variables[node.name]

    

    def visit_BinaryOpNode(self, node):

        """Evaluate a binary operation node"""

        left_val = self.visit(node.left)

        right_val = self.visit(node.right)

        

        if node.operator.type == TokenType.PLUS:

            return left_val + right_val

        elif node.operator.type == TokenType.MINUS:

            return left_val - right_val

        elif node.operator.type == TokenType.MULTIPLY:

            return left_val * right_val

        elif node.operator.type == TokenType.DIVIDE:

            if right_val == 0:

                raise Exception('Division by zero')

            return left_val / right_val

        else:

            raise Exception(f'Unknown operator: {node.operator.type}')


The visit methods for each node type are beautifully simple. NumberNode just returns its value, VariableNode looks up the variable in our symbol table, and BinaryOpNode recursively evaluates its children and applies the operator.


Notice how the recursive structure of the AST naturally leads to recursive evaluation. To evaluate "2 + 3 * 4", we first evaluate the left side (2), then the right side (3 * 4), and finally apply the addition. The multiplication is evaluated first because it's deeper in the tree, which gives us the correct precedence.


PUTTING IT ALL TOGETHER: A COMPLETE EXAMPLE


Let's trace through a complete example to see how all the pieces work together. Consider the expression "x + 2 * (y - 3)" with variables x=5 and y=7.


First, the lexer breaks this into tokens:

- Token(VARIABLE, 'x')

- Token(PLUS, '+')

- Token(NUMBER, 2)

- Token(MULTIPLY, '*')

- Token(LPAREN, '(')

- Token(VARIABLE, 'y')

- Token(MINUS, '-')

- Token(NUMBER, 3)

- Token(RPAREN, ')')

- Token(EOF, None)


Then the parser builds an AST following our grammar rules. It starts with expression(), which calls term(), which calls factor() for the variable x. The parser continues, building a tree that represents the structure of the expression.


Finally, the interpreter traverses this tree, substituting variable values and performing calculations, ultimately returning the result: 

5 + 2 * (7 - 3) = 5 + 2 * 4 = 5 + 8 = 13.


TESTING OUR COMPILER


A good compiler needs thorough testing. Here's how we can test our creation:


    def test_compiler():

        """Test our compiler with various expressions"""

        test_cases = [

            ("42", {}, 42),

            ("x", {"x": 10}, 10),

            ("2 + 3", {}, 5),

            ("2 * 3 + 4", {}, 10),

            ("2 + 3 * 4", {}, 14),

            ("(2 + 3) * 4", {}, 20),

            ("x + y * z", {"x": 1, "y": 2, "z": 3}, 7),

            ("(x + y) * (z - w)", {"x": 2, "y": 3, "z": 10, "w": 5}, 25),

        ]

        

        for expression, variables, expected in test_cases:

            lexer = Lexer(expression)

            parser = Parser(lexer)

            ast = parser.parse()

            interpreter = Interpreter(variables)

            result = interpreter.visit(ast)

            

            print(f"Expression: {expression}")

            print(f"Variables: {variables}")

            print(f"Expected: {expected}, Got: {result}")

            print(f"Test {'PASSED' if result == expected else 'FAILED'}")

            print("-" * 40)


This test suite covers the essential features of our language: numbers, variables, all four operators, parentheses, and operator precedence. Each test case specifies an expression, the variable values to use, and the expected result.


FULL RUNNING EXAMPLE


Here's the complete, working implementation of our recursive descent compiler:


    class TokenType:

        NUMBER = 'NUMBER'

        VARIABLE = 'VARIABLE'

        PLUS = 'PLUS'

        MINUS = 'MINUS'

        MULTIPLY = 'MULTIPLY'

        DIVIDE = 'DIVIDE'

        LPAREN = 'LPAREN'

        RPAREN = 'RPAREN'

        EOF = 'EOF'


    class Token:

        def __init__(self, type, value):

            self.type = type

            self.value = value

        

        def __repr__(self):

            return f'Token({self.type}, {self.value})'


    class Lexer:

        def __init__(self, text):

            self.text = text

            self.pos = 0

            self.current_char = self.text[self.pos] if self.pos < len(self.text) else None

        

        def advance(self):

            """Move to the next character in the input"""

            self.pos += 1

            if self.pos >= len(self.text):

                self.current_char = None

            else:

                self.current_char = self.text[self.pos]

        

        def skip_whitespace(self):

            """Skip over spaces and tabs"""

            while self.current_char is not None and self.current_char.isspace():

                self.advance()

        

        def read_number(self):

            """Read a multi-digit number"""

            result = ''

            while self.current_char is not None and self.current_char.isdigit():

                result += self.current_char

                self.advance()

            return int(result)

        

        def get_next_token(self):

            """Extract the next token from the input"""

            while self.current_char is not None:

                if self.current_char.isspace():

                    self.skip_whitespace()

                    continue

                

                if self.current_char.isdigit():

                    return Token(TokenType.NUMBER, self.read_number())

                

                if self.current_char.isalpha():

                    var = self.current_char

                    self.advance()

                    return Token(TokenType.VARIABLE, var)

                

                if self.current_char == '+':

                    self.advance()

                    return Token(TokenType.PLUS, '+')

                

                if self.current_char == '-':

                    self.advance()

                    return Token(TokenType.MINUS, '-')

                

                if self.current_char == '*':

                    self.advance()

                    return Token(TokenType.MULTIPLY, '*')

                

                if self.current_char == '/':

                    self.advance()

                    return Token(TokenType.DIVIDE, '/')

                

                if self.current_char == '(':

                    self.advance()

                    return Token(TokenType.LPAREN, '(')

                

                if self.current_char == ')':

                    self.advance()

                    return Token(TokenType.RPAREN, ')')

                

                raise Exception(f'Invalid character: {self.current_char}')

            

            return Token(TokenType.EOF, None)


    class ASTNode:

        """Base class for all AST nodes"""

        pass


    class NumberNode(ASTNode):

        def __init__(self, value):

            self.value = value

        

        def __repr__(self):

            return f'NumberNode({self.value})'


    class VariableNode(ASTNode):

        def __init__(self, name):

            self.name = name

        

        def __repr__(self):

            return f'VariableNode({self.name})'


    class BinaryOpNode(ASTNode):

        def __init__(self, left, operator, right):

            self.left = left

            self.operator = operator

            self.right = right

        

        def __repr__(self):

            return f'BinaryOpNode({self.left}, {self.operator}, {self.right})'


    class Parser:

        def __init__(self, lexer):

            self.lexer = lexer

            self.current_token = self.lexer.get_next_token()

        

        def error(self, message="Invalid syntax"):

            raise Exception(message)

        

        def eat(self, token_type):

            """Consume a token of the expected type"""

            if self.current_token.type == token_type:

                self.current_token = self.lexer.get_next_token()

            else:

                self.error(f'Expected {token_type}, got {self.current_token.type}')

        

        def factor(self):

            """Parse a factor: number, variable, or parenthesized expression

            Implements: <factor> ::= <number> | <variable> | '(' <expression> ')'

            """

            token = self.current_token

            

            if token.type == TokenType.NUMBER:

                self.eat(TokenType.NUMBER)

                return NumberNode(token.value)

            

            elif token.type == TokenType.VARIABLE:

                self.eat(TokenType.VARIABLE)

                return VariableNode(token.value)

            

            elif token.type == TokenType.LPAREN:

                self.eat(TokenType.LPAREN)

                node = self.expression()

                self.eat(TokenType.RPAREN)

                return node

            

            else:

                self.error(f'Unexpected token: {token.type}')

        

        def term(self):

            """Parse a term: factors connected by * or /

            Implements: <term> ::= <factor> ( ( '*' | '/' ) <factor> )*

            """

            node = self.factor()

            

            while self.current_token.type in (TokenType.MULTIPLY, TokenType.DIVIDE):

                token = self.current_token

                if token.type == TokenType.MULTIPLY:

                    self.eat(TokenType.MULTIPLY)

                elif token.type == TokenType.DIVIDE:

                    self.eat(TokenType.DIVIDE)

                

                node = BinaryOpNode(left=node, operator=token, right=self.factor())

            

            return node

        

        def expression(self):

            """Parse an expression: terms connected by + or -

            Implements: <expression> ::= <term> ( ( '+' | '-' ) <term> )*

            """

            node = self.term()

            

            while self.current_token.type in (TokenType.PLUS, TokenType.MINUS):

                token = self.current_token

                if token.type == TokenType.PLUS:

                    self.eat(TokenType.PLUS)

                elif token.type == TokenType.MINUS:

                    self.eat(TokenType.MINUS)

                

                node = BinaryOpNode(left=node, operator=token, right=self.term())

            

            return node

        

        def parse(self):

            """Parse the input and return the AST"""

            node = self.expression()

            if self.current_token.type != TokenType.EOF:

                self.error("Expected end of input")

            return node


    class Interpreter:

        def __init__(self, variables=None):

            self.variables = variables or {}

        

        def visit(self, node):

            """Dispatch to the appropriate visit method"""

            method_name = f'visit_{type(node).__name__}'

            visitor = getattr(self, method_name, self.generic_visit)

            return visitor(node)

        

        def generic_visit(self, node):

            raise Exception(f'No visit method for {type(node).__name__}')

        

        def visit_NumberNode(self, node):

            """Evaluate a number node"""

            return node.value

        

        def visit_VariableNode(self, node):

            """Evaluate a variable node"""

            if node.name not in self.variables:

                raise Exception(f'Undefined variable: {node.name}')

            return self.variables[node.name]

        

        def visit_BinaryOpNode(self, node):

            """Evaluate a binary operation node"""

            left_val = self.visit(node.left)

            right_val = self.visit(node.right)

            

            if node.operator.type == TokenType.PLUS:

                return left_val + right_val

            elif node.operator.type == TokenType.MINUS:

                return left_val - right_val

            elif node.operator.type == TokenType.MULTIPLY:

                return left_val * right_val

            elif node.operator.type == TokenType.DIVIDE:

                if right_val == 0:

                    raise Exception('Division by zero')

                return left_val / right_val

            else:

                raise Exception(f'Unknown operator: {node.operator.type}')


    def evaluate_expression(expression, variables=None):

        """Convenience function to evaluate a mathematical expression"""

        lexer = Lexer(expression)

        parser = Parser(lexer)

        ast = parser.parse()

        interpreter = Interpreter(variables)

        return interpreter.visit(ast)


    def test_compiler():

        """Test our compiler with various expressions"""

        test_cases = [

            ("42", {}, 42),

            ("x", {"x": 10}, 10),

            ("2 + 3", {}, 5),

            ("2 * 3 + 4", {}, 10),

            ("2 + 3 * 4", {}, 14),

            ("(2 + 3) * 4", {}, 20),

            ("x + y * z", {"x": 1, "y": 2, "z": 3}, 7),

            ("(x + y) * (z - w)", {"x": 2, "y": 3, "z": 10, "w": 5}, 25),

        ]

        

        print("Testing our recursive descent compiler:")

        print("=" * 50)

        

        for expression, variables, expected in test_cases:

            try:

                result = evaluate_expression(expression, variables)

                status = "PASSED" if result == expected else "FAILED"

                print(f"Expression: {expression}")

                print(f"Variables: {variables}")

                print(f"Expected: {expected}, Got: {result}")

                print(f"Test {status}")

            except Exception as e:

                print(f"Expression: {expression}")

                print(f"Variables: {variables}")

                print(f"ERROR: {e}")

                print("Test FAILED")

            

            print("-" * 40)


    def main():

        """Interactive calculator using our compiler"""

        print("Mathematical Expression Calculator")

        print("Enter expressions like: x + 2 * (y - 3)")

        print("Type 'quit' to exit")

        print()

        

        variables = {}

        

        while True:

            try:

                expression = input("Expression: ").strip()

                

                if expression.lower() == 'quit':

                    break

                

                if '=' in expression and not any(op in expression for op in ['+', '-', '*', '/', '(', ')']):

                    # Variable assignment
                    # (Note: the lexer reads single-letter variables, so only
                    # one-letter names are usable in later expressions.)

                    var_name, value = expression.split('=')

                    var_name = var_name.strip()

                    value = float(value.strip())

                    variables[var_name] = value

                    print(f"Set {var_name} = {value}")

                else:

                    # Expression evaluation

                    result = evaluate_expression(expression, variables)

                    print(f"Result: {result}")

                

            except Exception as e:

                print(f"Error: {e}")

            

            print()


    if __name__ == "__main__":

        test_compiler()

        print("\n" + "=" * 50)

        main()


CONCLUSION: YOUR JOURNEY INTO COMPILER CONSTRUCTION


Congratulations! You've just built a complete recursive descent compiler from scratch. This isn't just a toy example - it's a real compiler that demonstrates all the fundamental concepts used in professional language implementations.


What makes recursive descent so powerful is its directness. The structure of your code mirrors the structure of your grammar, making it easy to understand, debug, and extend. Each grammar rule becomes a method, each choice becomes an if-statement, and each repetition becomes a loop.


Your compiler correctly handles operator precedence, associativity, and nested expressions. It builds a proper abstract syntax tree and can evaluate expressions with variables. These are the same techniques used in compilers for languages like Pascal, and they form the foundation for understanding more advanced parsing techniques.


From here, you could extend your compiler in many directions. You could add more operators (exponentiation, modulo), support for floating-point numbers, function calls, or even control structures like if-statements and loops. The recursive descent framework you've built can accommodate all of these extensions.
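As a taste of the exponentiation extension, one common approach is a new grammar level between term and factor, made right-recursive because exponentiation is right-associative. This is a hypothetical sketch over a pre-tokenized list, not part of the compiler above:

```python
# Hypothetical grammar extension (not implemented in the article's compiler):
#
#   <term>  ::= <power> ( ( '*' | '/' ) <power> )*
#   <power> ::= <factor> ( '^' <power> )?   # right recursion => right-associative
#
# A minimal standalone parser over a token list like [2, '^', 3, '^', 2]:
def parse_power(tokens):
    """Parse <factor> ('^' <power>)? where factors are bare numbers."""
    base = tokens.pop(0)                      # factor: a bare number here
    if tokens and tokens[0] == '^':
        tokens.pop(0)                         # consume '^'
        return base ** parse_power(tokens)    # recurse to the right
    return base

print(parse_power([2, '^', 3, '^', 2]))  # 2 ** (3 ** 2) = 512
```

In the real parser, term() would call power() instead of factor(), and power() would recursively call itself after consuming the exponentiation operator.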


The journey from text to executable code is one of the most beautiful processes in computer science, and you've now experienced it firsthand. Every time you write code in any programming language, remember that somewhere, a compiler very much like the one you just built is working hard to transform your thoughts into machine instructions.


Welcome to the wonderful world of language implementation!
