INTRODUCTION TO SUPERMATH
SuperMath is a streamlined mathematical notation language designed to make writing complex mathematical formulas as simple as typing plain text. The primary goal of SuperMath is to eliminate the steep learning curve associated with traditional mathematical typesetting systems while maintaining the expressiveness needed for advanced mathematics, physics, quantum mechanics, tensor calculus, linear algebra, statistics, and engineering. Unlike LaTeX, which requires memorizing numerous commands and special syntax, SuperMath uses intuitive conventions that mirror how people naturally think about and speak mathematical expressions.
The design philosophy behind SuperMath centers on three core principles. First, readability comes before brevity. A SuperMath expression should be immediately understandable to anyone who reads it, even without prior knowledge of the syntax. Second, the syntax should follow natural mathematical conventions wherever possible. For instance, multiplication is implied when appropriate, just as in standard mathematical notation. Third, common operations should require minimal typing, while rare operations may require slightly more verbose syntax.
Consider a simple example. The quadratic formula in traditional LaTeX requires writing something like “x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}”. In SuperMath, this becomes “x = (-b +- sqrt(b^2 - 4ac)) / (2a)”. Notice how SuperMath uses familiar programming language conventions like parentheses for grouping and forward slashes for division, making it immediately accessible to anyone with basic computer literacy.
THE SUPERMATH SYNTAX SPECIFICATION
At its foundation, SuperMath treats mathematical expressions as combinations of operands and operators. Basic arithmetic operations use standard ASCII characters. Addition uses the plus sign, subtraction uses the minus sign or hyphen, multiplication can be explicit using an asterisk or implicit through juxtaposition, and division uses the forward slash. These choices align with how most programming languages handle arithmetic, reducing the cognitive load for users already familiar with such systems.
Exponents and powers represent one of the most common operations in mathematics. SuperMath uses the caret symbol to denote exponents. For simple single-character exponents, you can write “x^2” to represent x squared. For complex exponents, you must use curly braces like “x^{2n+1}”. This dual syntax accommodates both quick typing for simple cases and clear grouping for complex exponents like “e^{-t/tau}”.
Subscripts follow the same pattern but use the underscore character. Chemical formulas like water can be written as “H_2O”, while mathematical sequences use notation like “a_{n+1}”. Single-character subscripts do not require braces, but complex subscripts must be enclosed for clarity. A critical feature is that subscripts and superscripts can appear in any order and on the same symbol. Both “x_i^2” and “x^2_i” produce the same result, which is x with subscript i and superscript 2. This flexibility matches how mathematicians naturally write formulas.
Fractions in SuperMath can be expressed in two ways. The inline notation uses the forward slash, which is suitable for simple fractions appearing within running text. For display-style fractions that deserve more visual prominence, SuperMath provides the “frac(numerator, denominator)” function. Thus, “1/2” and “frac(1, 2)” both represent one-half, but the latter signals that the converter should render it as a stacked fraction.
Greek letters permeate mathematical and scientific writing. SuperMath uses full English names for Greek letters, introduced by a leading backslash. A closing backslash is optional and is useful for separating the name from whatever follows, as in “\sigma\ sqrt(2\pi)”. For instance, “\alpha” represents the Greek letter alpha, “\beta” represents beta, and so forth. Capital Greek letters use capitalized names, so “\Omega” produces the capital omega symbol. This approach sacrifices some brevity but gains clarity, especially for those less familiar with the Greek alphabet.
Special mathematical operators extend beyond basic arithmetic. The nabla operator for gradient calculations uses “\nabla”. The partial derivative symbol uses “\partial”. Comparison operators include “!=” for not equal, “<=” for less than or equal, “>=” for greater than or equal, “<<” for much less than, and “>>” for much greater than. The approximately equal symbol uses “~~”. Proportionality uses “prop”. Infinity uses “inf” or “infty”. Set membership uses “in” and “notin”. Set operations include “cup” for union, “cap” for intersection, “subset” for proper subset, and “subseteq” for subset or equal.
Logical operators form another essential category. Conjunction uses “and” or the ampersand symbol. Disjunction uses “or”. Negation uses “not”. Implication uses “implies” or “=>”. Equivalence uses “iff” or “<=>”. Universal quantification uses “forall”. Existential quantification uses “exists”. These word-based operators make logical expressions readable without requiring knowledge of symbolic logic notation.
Special number sets have dedicated syntax. The real numbers use “\reals”, complex numbers use “\complex”, natural numbers use “\naturals”, integers use “\integers”, and rational numbers use “\rationals”. These produce the appropriate blackboard bold letters in the output.
Functions with multiple arguments represent a crucial capability in SuperMath. The function syntax uses parentheses with comma-separated arguments, exactly like programming languages. A function with two arguments looks like “func(arg1, arg2)” and with three arguments like “func(arg1, arg2, arg3)”. The parser handles any number of arguments, and individual renderers determine how to format them. For example, the logarithm with arbitrary base uses “log(10, 100)” to denote the logarithm base 10 of 100. The binomial coefficient, representing “n choose k” or “n over k”, uses “binomial(n, k)” or the shorthand “choose(n, k)”. Modular arithmetic uses “mod(a, n)” to denote a modulo n. The two-argument arctangent uses “atan2(y, x)”. Greatest common divisor uses “gcd(a, b)” and can accept more arguments like “gcd(a, b, c)”. Least common multiple uses “lcm(a, b)” similarly.
Calculus operations require special attention. Integrals in SuperMath use the syntax “integral(expression, variable, lower, upper)” for definite integrals and “integral(expression, variable)” for indefinite integrals. For example, the integral of x squared from zero to one becomes “integral(x^2, x, 0, 1)”. Derivatives use “derivative(expression, variable)” for first derivatives and “derivative(expression, variable, n)” for nth derivatives. Partial derivatives extend this to “partial(expression, variable)” for first partials and “partial(expression, variable, n)” for nth partial derivatives. The convenient shorthand “d/dx(f(x))” also works for derivatives.
Summation and product notation follow similar patterns. The sum from i equals one to n of i squared becomes “sum(i^2, i, 1, n)”. Products use “product(expression, index, lower, upper)” with identical syntax structure. Limits use “limit(expression, variable, value)” to represent the limit as a variable approaches a value. You can specify direction with “limit(expression, variable, value, direction)” where direction is “+” for right limit, “-” for left limit, or omitted for two-sided limit.
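As a rough illustration of how the argument order above could map onto LaTeX, the following sketch builds the output strings directly. The helper names `integral_to_latex` and `indexed_to_latex` are hypothetical, the arguments are assumed to be already-rendered LaTeX fragments, and the chosen LaTeX layout is an assumption rather than the converter's documented output.

```python
from typing import Optional


def integral_to_latex(expression: str, variable: str,
                      lower: Optional[str] = None,
                      upper: Optional[str] = None) -> str:
    """Render integral(expression, variable[, lower, upper]) as LaTeX text."""
    limits = ""
    if lower is not None and upper is not None:
        # Definite integral: attach the bounds to the integral sign.
        limits = "_{" + lower + "}^{" + upper + "}"
    return "\\int" + limits + " " + expression + " \\, d" + variable


def indexed_to_latex(name: str, expression: str, index: str,
                     lower: str, upper: str) -> str:
    """Render sum(...) or product(...) over an index range as LaTeX text."""
    symbols = {"sum": "\\sum", "product": "\\prod"}
    return symbols[name] + "_{" + index + "=" + lower + "}^{" + upper + "} " + expression


print(integral_to_latex("x^2", "x", "0", "1"))
print(indexed_to_latex("sum", "i^2", "i", "1", "n"))
```

Keeping the expression first and the bounds last means the indefinite and definite forms differ only in whether the trailing arguments are present.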
Matrices and vectors require two-dimensional notation. SuperMath uses square brackets with semicolons to separate rows. A two-by-two matrix looks like “[1, 2; 3, 4]” where the semicolon indicates a new row. Column vectors become “[1; 2; 3]” and row vectors are “[1, 2, 3]”. Matrix operations use functional notation. Transpose uses “transpose(A)”. Determinant uses “det(A)”. Matrix trace uses “trace(A)”. Matrix inverse uses “inv(A)”. Matrix rank uses “rank(A)”. Reduced row echelon form uses “rref(A)”. Eigenvalues use “eigenvalues(A)” or the shorthand “eig(A)”. Eigenvectors use “eigenvectors(A)”. The characteristic polynomial uses “charpoly(A)”. Matrix exponential uses “expm(A)”. Null space uses “null(A)” and column space uses “col(A)”.
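The row and column convention can be sketched with plain string splitting. This is a simplified illustration, not the converter's parser: `parse_matrix_literal` and `matrix_to_latex` are hypothetical names, cells are assumed to contain no nested commas or semicolons (so entries like “gcd(a, b)” would not work here), and the `pmatrix` environment is an assumed choice of LaTeX output.

```python
from typing import List


def parse_matrix_literal(text: str) -> List[List[str]]:
    """Split a literal such as "[1, 2; 3, 4]" into rows of cell strings."""
    stripped = text.strip()
    if not (stripped.startswith("[") and stripped.endswith("]")):
        raise ValueError("matrix literal must be wrapped in square brackets")
    body = stripped[1:-1]
    # Semicolons separate rows, commas separate the entries within a row.
    rows = [[cell.strip() for cell in row.split(",")] for row in body.split(";")]
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same number of entries")
    return rows


def matrix_to_latex(rows: List[List[str]]) -> str:
    """Join cells with '&' and rows with a LaTeX line break."""
    body = " \\\\ ".join(" & ".join(row) for row in rows)
    return "\\begin{pmatrix} " + body + " \\end{pmatrix}"


print(parse_matrix_literal("[1, 2; 3, 4]"))   # two rows of two cells
print(parse_matrix_literal("[1; 2; 3]"))      # column vector: three rows
print(matrix_to_latex(parse_matrix_literal("[1, 2, 3]")))  # row vector
```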
Vector notation includes several conventions. Bold vectors use “vec(x)” to produce a bold x. Vector arrows use “arrow(x)” to produce x with an arrow on top. The dot product between two vectors uses “dot(a, b)” with two arguments. The cross product uses “cross(a, b)” also with two arguments. Vector magnitude uses “abs(v)” for absolute value notation or “norm(v)” for norm notation with double bars. These multi-argument functions demonstrate how SuperMath handles operations requiring multiple inputs.
Accents and decorations provide additional mathematical semantics. A dot above a variable, common in physics for time derivatives, uses “dot(x)” with one argument. Double dots use “ddot(x)”. Hats use “hat(x)”. Bars use “bar(x)”. Tildes use “tilde(x)”. These decorations can combine with subscripts and superscripts, so “dot(x)_i” produces x-dot with subscript i.
Quantum mechanics introduces specialized notation that SuperMath fully supports. Dirac bra-ket notation forms the foundation of quantum mechanical expressions. A ket vector uses “ket(psi)” which produces |ψ⟩. A bra vector uses “bra(phi)” which produces ⟨φ|. Inner products combine both as “braket(phi, psi)” which produces ⟨φ|ψ⟩ and demonstrates another multi-argument function. Expectation values use “expectation(A, psi)” with two arguments which produces ⟨ψ|A|ψ⟩. These notations are essential for quantum theory and previously required cumbersome LaTeX commands.
Tensor notation receives special treatment to support Einstein summation convention. While SuperMath cannot automatically infer summation from repeated indices, it provides clear notation for tensors. A tensor with multiple indices uses standard subscript and superscript notation like “T^{ij}_{kl}”. The Kronecker delta uses “\delta_{ij}” or the specialized function “kronecker(i, j)” with two arguments. The Levi-Civita tensor uses “\epsilon_{ijk}” or “levicivita(i, j, k)” with three arguments. Covariant derivatives use “\nabla_i” combining the nabla operator with subscripts.
Statistics and probability introduce a comprehensive set of functions essential for data analysis and inference. Descriptive statistics functions operate on data sets or variables. The arithmetic mean uses “mean(X)” with a single argument representing the data. The median uses “median(X)”. The mode uses “mode(X)”. Sample variance uses “var(X)” and population variance uses “pvar(X)”. Sample standard deviation uses “std(X)” or “stdev(X)”, while population standard deviation uses “pstd(X)”. The range uses “range(X)”. Quantiles use “quantile(X, p)” with two arguments where p is the quantile level. Percentiles use “percentile(X, p)” similarly. The interquartile range uses “iqr(X)”.
Measures of association between two variables use two-argument functions. Covariance uses “cov(X, Y)” for sample covariance. Correlation coefficient uses “corr(X, Y)” or “cor(X, Y)”. These functions are essential for regression analysis and multivariate statistics.
Probability notation uses several conventions. Generic probability uses “prob(event)” or the shorthand “P(event)”. Expected value can be written as “E(X)” using the capital E function. Variance in probability notation uses “Var(X)” and covariance uses “Cov(X, Y)”. These notational functions distinguish between the statistical operator and the computed value.
Probability distributions require multi-argument functions specifying parameters. The normal distribution probability density function uses “normal(x, mu, sigma)” with three arguments for the value, mean, and standard deviation. The standard normal uses “normal(x, 0, 1)” or simply “phi(x)”. Binomial probability uses “binomial_prob(n, k, p)” with three arguments for number of trials, number of successes, and probability. Poisson probability uses “poisson_prob(k, lambda)” with the count and rate parameter. Other distributions follow similar patterns with appropriate parameters.
Statistical inference functions support hypothesis testing and confidence intervals. Standard error uses “se(X)” for a single sample. Z-scores use “zscore(x, mu, sigma)” with three arguments. T-statistics use “tstat(x, mu, se)” similarly. Confidence intervals use “ci(X, alpha)” where alpha is the significance level. P-values might be expressed as “pvalue(statistic, distribution)”.
Absolute values and norms use different syntax to distinguish them. Simple absolute values use “abs(x)” which produces vertical bars like |x|. Vector norms use “norm(x)” which produces double vertical bars like ||x||. Matrix norms can specify type with “norm(A, p)” where p is the norm type. Floor and ceiling functions use “floor(x)” and “ceil(x)” respectively.
Special functions extend SuperMath’s capabilities. The square root function uses “sqrt(x)”, while the nth root uses “root(n, x)” with two arguments showing the order first then the radicand. Trigonometric functions follow standard programming conventions with names like “sin(x)”, “cos(x)”, and “tan(x)”. Inverse trigonometric functions use “arcsin(x)”, “arccos(x)”, and “arctan(x)”. The two-argument arctangent uses “atan2(y, x)” for computing angles from coordinates. Hyperbolic functions use “sinh(x)”, “cosh(x)”, and “tanh(x)”. Logarithms use “log(x)” for base-10, “ln(x)” for natural logarithm, and “log(base, x)” with two arguments for arbitrary bases. Exponential function uses “exp(x)”.
Combinatorial functions handle discrete mathematics. Factorial uses “factorial(n)”. Binomial coefficients representing “n choose k” or “n over k” use “binomial(n, k)” or the alternative “choose(n, k)”, both with two arguments. This notation is fundamental in combinatorics and appears in the output as n stacked above k inside parentheses, without a fraction bar. Permutations use “permutation(n, k)” also with two arguments. The gamma function uses “gamma(x)”. The beta function uses “beta(a, b)” with two arguments.
Number theory functions support modular arithmetic and divisibility. Modular reduction uses “mod(a, n)” with two arguments. Greatest common divisor uses “gcd(a, b)” and can accept more arguments like “gcd(a, b, c)”. Least common multiple uses “lcm(a, b)” similarly. Floor division uses “floordiv(a, b)” with two arguments.
PRACTICAL EXAMPLES OF SUPERMATH NOTATION
To illustrate how SuperMath handles real mathematical content across different disciplines, consider several comprehensive examples. Einstein’s famous mass-energy equivalence becomes “E = mc^2”, which is straightforward and requires no special formatting. The Pythagorean theorem can be written as “a^2 + b^2 = c^2”, equally simple and readable.
Moving to more complex territory, the Gaussian distribution function demonstrates SuperMath’s handling of multi-level expressions. The probability density function becomes “f(x) = frac(1, \sigma\ sqrt(2\pi)) exp(-frac((x - \mu)^2, 2\sigma^2))”. Alternatively, using the normal distribution function, this becomes “f(x) = normal(x, \mu, \sigma)”. Notice how nested fractions and Greek letters combine naturally.
Statistics provides extensive practical examples. The sample variance formula demonstrates summation with statistical functions as “var(X) = frac(1, n-1) sum((x_i - mean(X))^2, i, 1, n)”. The correlation coefficient shows multi-argument function composition as “corr(X, Y) = frac(cov(X, Y), std(X) std(Y))”. The standard error of the mean uses “se(X) = frac(std(X), sqrt(n))”. The t-statistic for a one-sample test becomes “t = frac(mean(X) - \mu_0, se(X))”.
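To make the formulas above concrete, the following sketch evaluates them numerically and compares the variance against Python's standard `statistics` module. SuperMath itself only typesets these expressions; the function names and the sample data here are illustrative assumptions.

```python
import math
import statistics
from typing import List


def sample_variance(xs: List[float]) -> float:
    """var(X) = frac(1, n-1) sum((x_i - mean(X))^2, i, 1, n)"""
    n = len(xs)
    mean_x = sum(xs) / n
    return sum((x - mean_x) ** 2 for x in xs) / (n - 1)


def standard_error(xs: List[float]) -> float:
    """se(X) = frac(std(X), sqrt(n))"""
    return math.sqrt(sample_variance(xs)) / math.sqrt(len(xs))


def t_statistic(xs: List[float], mu_0: float) -> float:
    """t = frac(mean(X) - mu_0, se(X))"""
    return (sum(xs) / len(xs) - mu_0) / standard_error(xs)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up sample, mean 5

# The hand-written formula should agree with the library implementation.
print(sample_variance(data), statistics.variance(data))
print(standard_error(data))
print(t_statistic(data, 4.0))
```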
Linear regression equations use subscripts and statistical notation. Simple linear regression appears as “y = \beta_0 + \beta_1 x + \epsilon” where epsilon represents the error term. The least squares estimator for the slope uses “\beta_1 = frac(cov(X, Y), var(X))”. Multiple regression extends to “y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon”.
Probability theory provides examples using the probability notation. Bayes’ theorem becomes “P(A|B) = frac(P(B|A) P(A), P(B))”. The expected value of a discrete random variable uses “E(X) = sum(P(X=x_i) x_i, i, 1, n)”. The variance formula in terms of expectation becomes “Var(X) = E(X^2) - E(X)^2”. The binomial probability mass function uses “P(X=k) = binomial_prob(n, k, p) = binomial(n, k) p^k (1-p)^{n-k}”.
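A short worked check of these identities, using exact fractions so no rounding is involved. The fair six-sided die and the helper names are illustrative assumptions; nothing here is part of SuperMath itself.

```python
from fractions import Fraction
from math import comb


def expectation(pmf, transform=lambda x: x):
    """E(g(X)) = sum(P(X=x_i) g(x_i), i, 1, n) for a finite distribution."""
    return sum(p * transform(x) for x, p in pmf.items())


def variance(pmf):
    """Var(X) = E(X^2) - E(X)^2"""
    return expectation(pmf, lambda x: x * x) - expectation(pmf) ** 2


def binomial_pmf(n, k, p):
    """P(X=k) = binomial(n, k) p^k (1-p)^{n-k}"""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# A fair die: each face 1..6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

print(expectation(die))   # E(X)
print(variance(die))      # Var(X)
# The binomial probabilities over k = 0..n should sum to one.
print(sum(binomial_pmf(4, k, Fraction(1, 3)) for k in range(5)))
```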
The central limit theorem statement demonstrates how statistical notation combines with limits. It can be expressed as “limit(frac(mean(X) - \mu, \sigma/sqrt(n)), n, inf) follows normal(0, 1)” showing that the standardized sample mean approaches the standard normal distribution.
Confidence intervals for the mean use “ci(X, 0.05) = mean(X) +- t_{n-1, 0.025} se(X)”, following the “ci(X, alpha)” signature with a 5% significance level, where the plus-minus operator and statistical functions combine. Hypothesis testing notation uses “H_0: \mu\ = \mu_0 versus H_1: \mu\ != \mu_0” with subscripted hypotheses.
Combinatorics demonstrates the binomial coefficient notation. Pascal’s identity becomes “binomial(n, k) = binomial(n-1, k-1) + binomial(n-1, k)” or using the alternative notation “choose(n, k) = choose(n-1, k-1) + choose(n-1, k)”. The binomial theorem uses “sum(binomial(n, k) x^k y^{n-k}, k, 0, n) = (x + y)^n”. These examples show how “n over k” notation integrates seamlessly.
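Both identities can be spot-checked numerically with the standard library's `math.comb` (Python 3.8 or later). The helper names below are illustrative, and the check covers only small integer cases, so it is a sanity test rather than a proof.

```python
from math import comb


def pascal_identity_holds(n: int, k: int) -> bool:
    """binomial(n, k) = binomial(n-1, k-1) + binomial(n-1, k)"""
    return comb(n, k) == comb(n - 1, k - 1) + comb(n - 1, k)


def binomial_expansion(x: int, y: int, n: int) -> int:
    """sum(binomial(n, k) x^k y^{n-k}, k, 0, n)"""
    return sum(comb(n, k) * x ** k * y ** (n - k) for k in range(n + 1))


# Pascal's identity for every 1 <= k <= n < 10.
print(all(pascal_identity_holds(n, k)
          for n in range(1, 10) for k in range(1, n + 1)))

# The binomial theorem: the expansion equals (x + y)^n.
print(binomial_expansion(2, 3, 5), (2 + 3) ** 5)
```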
The Schrödinger equation showcases quantum mechanics notation. The time-dependent version using Dirac notation becomes “i\hbar\ derivative(ket(\psi), t) = H ket(\psi)” where H represents the Hamiltonian operator. This demonstrates how derivative notation combines with quantum ket notation.
Quantum mechanical expectation values demonstrate the bra-ket notation with multiple arguments. The expectation value of momentum becomes “expectation(p, \psi)” which renders as ⟨ψ|p|ψ⟩. The uncertainty principle appears as “\Delta\ x \Delta\ p >= \hbar/2”. These examples show how SuperMath makes quantum notation accessible.
Maxwell’s equations in differential form demonstrate vector calculus notation. Gauss’s law becomes “div(vec(E)) = \rho\ / \epsilon_0”, where “div” represents the divergence operator applied to the electric field vector. Faraday’s law uses the curl operator as “curl(vec(E)) = -partial(vec(B), t)”. The complete set of Maxwell’s equations combines these operators with vector fields and scalar fields in a unified notation.
General relativity introduces tensor notation with Einstein summation convention. The Einstein field equations can be written as “R_{ij} - frac(1,2)R g_{ij} = frac(8\pi\ G, c^4) T_{ij}” where i and j are free indices, so the expression stands for one equation per index pair. The Riemann curvature tensor uses multiple indices as “R^i_{jkl}”. The metric tensor appears as “g_{ij}” with covariant indices or “g^{ij}” with contravariant indices. The Kronecker delta can be written as “kronecker(i, j)” or simply “\delta_{ij}”.
Linear algebra provides extensive examples. Eigenvalue equations appear as “A vec(v) = \lambda\ vec(v)”, the defining relation between a matrix, an eigenvector, and its eigenvalue. Computing eigenvalues uses “eigenvalues(A)” as a function call. Matrix determinants use “det(A)” while traces use “trace(A)”. Matrix inversion appears as “inv(A)”. The characteristic polynomial uses “charpoly(A)”, or “charpoly(A, \lambda)” with a second argument naming the polynomial variable. A complete eigendecomposition might be written as “A = V diag(eigenvalues(A)) inv(V)” showing how functions compose with operations.
Chemical equations also benefit from SuperMath’s subscript notation. The combustion of methane becomes “CH_4 + 2O_2 -> CO_2 + 2H_2O”, which is both visually clear and easy to type. More complex organic chemistry with multiple functional groups uses similar notation with appropriate groupings and parentheses.
THE CONVERTER TOOL ARCHITECTURE
The SuperMath converter tool consists of three main components working in concert. The lexical analyzer, or lexer, breaks the input string into meaningful tokens. The parser validates the token sequence and builds an abstract syntax tree representing the mathematical expression’s structure. Finally, the renderer traverses this tree and generates the target format, whether LaTeX, MathML for Microsoft Office, or other output formats. This clean separation of concerns allows each component to evolve independently and makes the system maintainable.
The lexer operates on a character-by-character basis, identifying token boundaries and classifying each token’s type. Numbers become numeric literals, operators like plus and minus become operator tokens, and function names become function tokens. Greek letter markers trigger special handling to recognize the full letter name. The lexer maintains position information for error reporting, allowing the tool to pinpoint exactly where syntax errors occur in the input. The lexer handles multi-character operators like “+-” for plus-minus, “=>” for implication, and “!=” for not equal by looking ahead one or more characters.
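The lookahead idea can be sketched independently of the full lexer. The version below is a simplified illustration that tries the longest operator first, so that “<=>” is not split into “<=” and “>”. The function name `scan_operators` and the (text, position) tuple format are assumptions made for this sketch, and every other character is emitted as a single-character token.

```python
from typing import List, Tuple

# Multi-character operators, tried longest first so "<=>" wins over "<=".
MULTI_CHAR_OPERATORS = sorted(
    ["<=>", "+-", "!=", "<=", ">=", "<<", ">>", "~~", "->", "<-", "=>"],
    key=len,
    reverse=True,
)


def scan_operators(text: str) -> List[Tuple[str, int]]:
    """Return (token_text, start_position) pairs for the input string."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        for op in MULTI_CHAR_OPERATORS:
            # startswith with an offset peeks ahead without copying the string.
            if text.startswith(op, i):
                tokens.append((op, i))
                i += len(op)
                break
        else:
            # No multi-character operator starts here: emit one character.
            tokens.append((text[i], i))
            i += 1
    return tokens


print(scan_operators("a <=> b"))
print(scan_operators("x +- y != z"))
```

Recording the start position with each token is what lets error messages point at the exact offending column.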
The parser implements a recursive descent strategy, processing tokens according to SuperMath’s grammar rules. It recognizes operator precedence, ensuring that multiplication and division bind more tightly than addition and subtraction. Function calls receive special parsing treatment, with the parser expecting opening parentheses, argument lists separated by commas, and closing parentheses. The parser handles functions with any number of arguments by accumulating expression nodes between commas. It constructs tree nodes representing each operation, with child nodes representing operands or sub-expressions. A critical improvement in the parser handles subscripts and superscripts at the same level, allowing both “x_i^2” and “x^2_i” to produce identical results representing x with both a subscript and superscript.
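A reduced sketch of the recursive descent approach, covering only precedence and order-independent scripts. It assumes single-letter variables, no implicit multiplication, no unary minus, and no function calls, and it returns nested tuples instead of the AST classes. The names `MiniParser`, `tokenize`, and `parse` are hypothetical.

```python
import re

TOKEN_PATTERN = re.compile(r"\s*(\d+\.?\d*|[A-Za-z]|[-+*/^_()])")
OPERATOR_TOKENS = {"+", "-", "*", "/", "^", "_", ")"}


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN_PATTERN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class MiniParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.index = 0

    def peek(self):
        return self.tokens[self.index] if self.index < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.index += 1
        return token

    def parse_expression(self):
        # Lowest precedence: addition and subtraction.
        node = self.parse_term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.parse_term())
        return node

    def parse_term(self):
        # Multiplication and division bind more tightly.
        node = self.parse_scripted()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.parse_scripted())
        return node

    def parse_scripted(self):
        # Collect subscript and superscript in either order on the same base.
        base = self.parse_atom()
        scripts = {}
        while self.peek() in ("_", "^"):
            marker = self.take()
            if marker in scripts:
                raise ValueError(f"duplicate '{marker}' on the same base")
            scripts[marker] = self.parse_atom()
        if not scripts:
            return base
        return ("script", base, scripts.get("_"), scripts.get("^"))

    def parse_atom(self):
        token = self.take()
        if token is None or token in OPERATOR_TOKENS:
            raise ValueError(f"unexpected token: {token!r}")
        if token == "(":
            node = self.parse_expression()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return node
        return token


def parse(text):
    parser = MiniParser(tokenize(text))
    tree = parser.parse_expression()
    if parser.peek() is not None:
        raise ValueError(f"unexpected trailing token: {parser.peek()!r}")
    return tree


print(parse("1 + 2 * 3"))
print(parse("x_i^2") == parse("x^2_i"))
```

Each precedence level gets its own method, and a lower level only ever calls the next tighter one, which is what makes multiplication group before addition without any explicit precedence table.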
The abstract syntax tree provides a format-independent representation of the mathematical expression. Each node type corresponds to an operation or value. Leaf nodes represent numbers, variables, constants, or Greek letters. Internal nodes represent operations like addition, multiplication, or function application. Special node types handle subscripts, superscripts, matrices, and quantum mechanical notation. The FunctionNode type stores both the function name and a list of argument nodes, enabling arbitrary-arity functions. This tree structure makes it straightforward to add new output formats without modifying the parsing logic. The tree also enables future optimization passes that could simplify expressions or detect common patterns.
The LaTeX renderer walks the syntax tree recursively, generating LaTeX commands for each node type. Simple operations map directly to LaTeX operators. Functions like square root become LaTeX commands like “\sqrt{x}”. Fractions trigger “\frac{numerator}{denominator}” output. Greek letters map to their LaTeX equivalents such as “\alpha” or “\beta”. For multi-argument functions, the renderer formats arguments according to the specific function’s requirements. Some functions like binomial coefficients use specialized LaTeX commands like “\binom{n}{k}”. Others use standard function notation with parentheses and comma-separated arguments. The renderer handles parenthesization automatically, adding LaTeX grouping where needed for correct precedence. Special attention goes to quantum mechanical notation, where bra-ket notation requires specific LaTeX packages or custom commands. The renderer can optionally include necessary package declarations in its output.
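The dispatch between specialized commands and generic function notation might look roughly like the sketch below. It uses two simplified stand-in node types (`Symbol` and `Call`) rather than the full AST classes, and the template table and the `\mathrm` fallback are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Symbol:
    text: str


@dataclass
class Call:
    name: str
    arguments: List["Node"]


Node = Union[Symbol, Call]

# Functions with a dedicated LaTeX command; doubled braces are literal braces.
LATEX_TEMPLATES = {
    "frac": "\\frac{{{0}}}{{{1}}}",
    "sqrt": "\\sqrt{{{0}}}",
    "binomial": "\\binom{{{0}}}{{{1}}}",
    "choose": "\\binom{{{0}}}{{{1}}}",
}


def to_latex(node: Node) -> str:
    """Recursively render a node, children first."""
    if isinstance(node, Symbol):
        return node.text
    rendered = [to_latex(argument) for argument in node.arguments]
    template = LATEX_TEMPLATES.get(node.name)
    if template is not None:
        return template.format(*rendered)
    # Fallback: upright function name with comma-separated arguments.
    return "\\mathrm{" + node.name + "}(" + ", ".join(rendered) + ")"


tree = Call("frac", [Symbol("1"), Call("sqrt", [Symbol("x")])])
print(to_latex(tree))
print(to_latex(Call("gcd", [Symbol("a"), Symbol("b"), Symbol("c")])))
```

Because the fallback branch joins however many arguments it receives, functions of any arity render without needing an entry in the table.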
The Microsoft Office renderer targets MathML, which is the mathematical markup language supported by Microsoft Word and other Office applications. MathML uses XML tags to represent mathematical structure. The renderer emits tags like “msup” for superscripts, “mfrac” for fractions, and “mi” for identifiers. For multi-argument functions, the renderer creates appropriate groupings with “mrow” elements and separates arguments with comma operators. Modern versions of Microsoft Office can import MathML directly, allowing SuperMath expressions to appear as native equation objects. The renderer produces MathML 3.0 compatible output to ensure maximum compatibility across different Office versions. Special MathML elements like “msubsup” for combined subscript-superscript notation produce cleaner and more semantic output than nested elements.
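As a minimal sketch of the element structure described above, the following builds “x_i^2” and “1/2” with Python's standard `xml.etree.ElementTree`. The helper names are hypothetical and the snippet only demonstrates the tag nesting; it makes no claim about what any particular Office version accepts.

```python
import xml.etree.ElementTree as ET

MATHML_NAMESPACE = "http://www.w3.org/1998/Math/MathML"


def identifier(name: str) -> ET.Element:
    element = ET.Element("mi")
    element.text = name
    return element


def number(value: str) -> ET.Element:
    element = ET.Element("mn")
    element.text = value
    return element


def subsup(base: ET.Element, sub: ET.Element, sup: ET.Element) -> ET.Element:
    # msubsup takes exactly three children: base, subscript, superscript.
    element = ET.Element("msubsup")
    element.extend([base, sub, sup])
    return element


def fraction(numerator: ET.Element, denominator: ET.Element) -> ET.Element:
    element = ET.Element("mfrac")
    element.extend([numerator, denominator])
    return element


def wrap_math(child: ET.Element) -> str:
    root = ET.Element("math", {"xmlns": MATHML_NAMESPACE})
    root.append(child)
    return ET.tostring(root, encoding="unicode")


x_i_squared = wrap_math(subsup(identifier("x"), identifier("i"), number("2")))
one_half = wrap_math(fraction(number("1"), number("2")))
print(x_i_squared)
print(one_half)
```

Using a single “msubsup” element keeps the subscript and superscript attached to the same base, instead of nesting an “msub” inside an “msup”.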
COMPLETE PRODUCTION-READY IMPLEMENTATION WITH FULL STATISTICS SUPPORT
The following presents a complete, production-ready implementation of the SuperMath converter tool with all corrections, enhancements, full linear algebra support, and comprehensive statistics functions. This implementation handles the full SuperMath syntax including quantum mechanical notation, tensor calculus, linear algebra operations, statistics and probability, multi-argument functions, and all common mathematical operations. The code follows clean architecture principles with clear separation between lexical analysis, parsing, and rendering concerns.
import re
from enum import Enum
from typing import Any, Dict, List, Optional, Union


class TokenType(Enum):
    NUMBER = 'NUMBER'
    VARIABLE = 'VARIABLE'
    PLUS = 'PLUS'
    MINUS = 'MINUS'
    MULTIPLY = 'MULTIPLY'
    DIVIDE = 'DIVIDE'
    POWER = 'POWER'
    UNDERSCORE = 'UNDERSCORE'
    LPAREN = 'LPAREN'
    RPAREN = 'RPAREN'
    LBRACE = 'LBRACE'
    RBRACE = 'RBRACE'
    LBRACKET = 'LBRACKET'
    RBRACKET = 'RBRACKET'
    LANGLE = 'LANGLE'
    RANGLE = 'RANGLE'
    PIPE = 'PIPE'
    COMMA = 'COMMA'
    SEMICOLON = 'SEMICOLON'
    FUNCTION = 'FUNCTION'
    GREEK = 'GREEK'
    SPECIAL = 'SPECIAL'
    OPERATOR = 'OPERATOR'
    ARROW = 'ARROW'
    EOF = 'EOF'


class Token:
    # position is the character offset in the input, kept for error reporting.
    def __init__(self, token_type: TokenType, value: Any, position: int):
        self.type = token_type
        self.value = value
        self.position = position

    def __repr__(self):
        return f"Token({self.type}, {self.value}, {self.position})"


class ASTNode:
    pass


class NumberNode(ASTNode):
    def __init__(self, value: float):
        self.value = value


class VariableNode(ASTNode):
    def __init__(self, name: str):
        self.name = name


class BinaryOpNode(ASTNode):
    def __init__(self, operator: str, left: ASTNode, right: ASTNode):
        self.operator = operator
        self.left = left
        self.right = right


class UnaryOpNode(ASTNode):
    def __init__(self, operator: str, operand: ASTNode):
        self.operator = operator
        self.operand = operand


class SuperscriptNode(ASTNode):
    def __init__(self, base: ASTNode, exponent: ASTNode):
        self.base = base
        self.exponent = exponent


class SubscriptNode(ASTNode):
    def __init__(self, base: ASTNode, subscript: ASTNode):
        self.base = base
        self.subscript = subscript


class SubSuperscriptNode(ASTNode):
    def __init__(self, base: ASTNode, subscript: ASTNode, superscript: ASTNode):
        self.base = base
        self.subscript = subscript
        self.superscript = superscript


class FunctionNode(ASTNode):
    def __init__(self, name: str, arguments: List[ASTNode]):
        self.name = name
        self.arguments = arguments


class GreekLetterNode(ASTNode):
    def __init__(self, letter: str):
        self.letter = letter


class SpecialSymbolNode(ASTNode):
    def __init__(self, symbol: str):
        self.symbol = symbol


class MatrixNode(ASTNode):
    def __init__(self, rows: List[List[ASTNode]]):
        self.rows = rows


class Lexer:
    def __init__(self, text: str):
        self.text = text
        self.position = 0
        self.current_char = self.text[0] if text else None
        self.functions = {
            'sqrt', 'root', 'frac', 'sin', 'cos', 'tan', 'arcsin', 'arccos', 'arctan',
            'sinh', 'cosh', 'tanh', 'log', 'ln', 'exp', 'integral', 'derivative',
            'partial', 'sum', 'product', 'limit', 'transpose', 'det', 'trace',
            'div', 'curl', 'grad', 'dot', 'cross', 'norm', 'abs', 'floor', 'ceil',
            'vec', 'arrow', 'hat', 'bar', 'tilde', 'ddot',
            'ket', 'bra', 'braket', 'expectation',
            'min', 'max', 'gcd', 'lcm', 'mod', 'atan2', 'floordiv',
            'eigenvalues', 'eigenvectors', 'eig', 'rank', 'inv', 'rref',
            'charpoly', 'expm', 'null', 'col',
            'binomial', 'choose', 'permutation', 'factorial', 'gamma', 'beta',
            'kronecker', 'levicivita',
            'mean', 'median', 'mode', 'var', 'pvar', 'std', 'stdev', 'pstd',
            'cov', 'corr', 'cor', 'range', 'quantile', 'percentile', 'iqr',
            'prob', 'E', 'Var', 'Cov', 'P',
            'normal', 'phi', 'binomial_prob', 'poisson_prob',
            'se', 'ci', 'zscore', 'tstat', 'pvalue', 'diag'
        }
        self.greek_letters = {
            'alpha', 'beta', 'gamma', 'delta', 'epsilon', 'zeta',
            'eta', 'theta', 'iota', 'kappa', 'lambda', 'mu',
            'nu', 'xi', 'omicron', 'pi', 'rho', 'sigma',
            'tau', 'upsilon', 'phi', 'chi', 'psi', 'omega',
            'Gamma', 'Delta', 'Theta', 'Lambda', 'Xi', 'Pi',
            'Sigma', 'Upsilon', 'Phi', 'Psi', 'Omega'
        }
        self.special_symbols = {
            'nabla', 'partial', 'inf', 'infty', 'hbar',
            'reals', 'complex', 'naturals', 'integers', 'rationals'
        }
        self.operators = {
            '+-': 'PLUSMINUS',
            '!=': 'NOTEQUAL',
            '<=': 'LEQ',
            '>=': 'GEQ',
            '<<': 'MUCH_LESS',
            '>>': 'MUCH_GREATER',
            '~~': 'APPROX',
            '->': 'RIGHTARROW',
            '<-': 'LEFTARROW',
            '=>': 'IMPLIES',
            '<=>': 'IFF',
            'prop': 'PROPORTIONAL',
            'in': 'IN',
            'notin': 'NOTIN',
            'cup': 'UNION',
            'cap': 'INTERSECTION',
            'subset': 'SUBSET',
            'subseteq': 'SUBSETEQ',
            'and': 'AND',
            'or': 'OR',
            'not': 'NOT',
            'forall': 'FORALL',
            'exists': 'EXISTS'
        }

    def advance(self):
        self.position += 1
        if self.position < len(self.text):
            self.current_char = self.text[self.position]
        else:
            self.current_char = None

    def peek(self, offset: int = 1) -> Optional[str]:
        peek_pos = self.position + offset
        if peek_pos < len(self.text):
            return self.text[peek_pos]
        return None

    def skip_whitespace(self):
        while self.current_char and self.current_char.isspace():
            self.advance()

    def read_number(self) -> Token:
        start_pos = self.position
        num_str = ''
        while self.current_char and (self.current_char.isdigit() or self.current_char == '.'):
            num_str += self.current_char
            self.advance()
        return Token(TokenType.NUMBER, float(num_str), start_pos)

    def read_identifier(self) -> Token:
        start_pos = self.position
        id_str = ''
        while self.current_char and self.current_char.isalnum():
            id_str += self.current_char
            self.advance()
        # An underscore normally introduces a subscript (as in x_i), so it is
        # kept in the identifier only when that yields a known function name
        # such as binomial_prob. Otherwise it is left for the UNDERSCORE token.
        if self.current_char == '_':
            suffix = re.match(r'_[A-Za-z0-9]+', self.text[self.position:])
            if suffix and id_str + suffix.group(0) in self.functions:
                id_str += suffix.group(0)
                for _ in suffix.group(0):
                    self.advance()
        if id_str in self.functions:
            return Token(TokenType.FUNCTION, id_str, start_pos)
        elif id_str in self.special_symbols:
            return Token(TokenType.SPECIAL, id_str, start_pos)
        elif id_str in self.operators:
            return Token(TokenType.OPERATOR, id_str, start_pos)
        else:
            return Token(TokenType.VARIABLE, id_str, start_pos)

    def read_greek_letter(self) -> Token:
        start_pos = self.position
        self.advance()  # skip the leading backslash
        greek_str = ''
        while self.current_char and self.current_char.isalpha():
            greek_str += self.current_char
            self.advance()
        # The closing backslash is optional.
        if self.current_char == '\\':
            self.advance()
        if greek_str in self.greek_letters:
            return Token(TokenType.GREEK, greek_str, start_pos)
        elif greek_str in self.special_symbols:
            return Token(TokenType.SPECIAL, greek_str, start_pos)
        else:
            raise ValueError(f"Unknown Greek letter or special symbol: {greek_str}")
    def get_next_token(self) -> Token:
        while self.current_char:
            if self.current_char.isspace():
                self.skip_whitespace()
                continue
            if self.current_char.isdigit():
                return self.read_number()
            if self.current_char.isalpha():
                return self.read_identifier()
            if self.current_char == '\\':
                return self.read_greek_letter()
            # Multi-character operators are matched before their one-character prefixes.
            if self.current_char == '+':
                if self.peek() == '-':
                    pos = self.position
                    self.advance()
                    self.advance()
                    return Token(TokenType.OPERATOR, '+-', pos)
                pos = self.position
                self.advance()
                return Token(TokenType.PLUS, '+', pos)
            if self.current_char == '-':
                if self.peek() == '>':
                    pos = self.position
                    self.advance()
                    self.advance()
                    return Token(TokenType.ARROW, '->', pos)
                pos = self.position
                self.advance()
                return Token(TokenType.MINUS, '-', pos)
            if self.current_char == '*':
                pos = self.position
                self.advance()
                return Token(TokenType.MULTIPLY, '*', pos)
            if self.current_char == '/':
                pos = self.position
                self.advance()
                return Token(TokenType.DIVIDE, '/', pos)
            if self.current_char == '^':
                pos = self.position
                self.advance()
                return Token(TokenType.POWER, '^', pos)
            if self.current_char == '_':
                pos = self.position
                self.advance()
                return Token(TokenType.UNDERSCORE, '_', pos)
            if self.current_char == '(':
                pos = self.position
                self.advance()
                return Token(TokenType.LPAREN, '(', pos)
            if self.current_char == ')':
                pos = self.position
                self.advance()
                return Token(TokenType.RPAREN, ')', pos)
            if self.current_char == '{':
                pos = self.position
                self.advance()
                return Token(TokenType.LBRACE, '{', pos)
            if self.current_char == '}':
                pos = self.position
                self.advance()
                return Token(TokenType.RBRACE, '}', pos)
            if self.current_char == '[':
                pos = self.position
                self.advance()
                return Token(TokenType.LBRACKET, '[', pos)
            if self.current_char == ']':
                pos = self.position
                self.advance()
                return Token(TokenType.RBRACKET, ']', pos)
            if self.current_char == '<':
                pos = self.position
                if self.peek() == '=':
                    if self.peek(2) == '>':
                        self.advance()
                        self.advance()
                        self.advance()
                        return Token(TokenType.OPERATOR, '<=>', pos)
                    self.advance()
                    self.advance()
                    return Token(TokenType.OPERATOR, '<=', pos)
                elif self.peek() == '<':
                    self.advance()
                    self.advance()
                    return Token(TokenType.OPERATOR, '<<', pos)
                elif self.peek() == '-':
                    self.advance()
                    self.advance()
                    return Token(TokenType.ARROW, '<-', pos)
                self.advance()
                return Token(TokenType.LANGLE, '<', pos)
            if self.current_char == '>':
                pos = self.position
                if self.peek() == '=':
                    self.advance()
                    self.advance()
                    return Token(TokenType.OPERATOR, '>=', pos)
                elif self.peek() == '>':
                    self.advance()
                    self.advance()
                    return Token(TokenType.OPERATOR, '>>', pos)
                self.advance()
                return Token(TokenType.RANGLE, '>', pos)
            if self.current_char == '=':
                pos = self.position
                if self.peek() == '>':
                    self.advance()
                    self.advance()
                    return Token(TokenType.OPERATOR, '=>', pos)
                self.advance()
                return Token(TokenType.OPERATOR, '=', pos)
            if self.current_char == '!':
                pos = self.position
                if self.peek() == '=':
                    self.advance()
                    self.advance()
                    return Token(TokenType.OPERATOR, '!=', pos)
                self.advance()
                return Token(TokenType.OPERATOR, '!', pos)
            if self.current_char == '~':
                pos = self.position
                if self.peek() == '~':
                    self.advance()
                    self.advance()
                    return Token(TokenType.OPERATOR, '~~', pos)
                self.advance()
                return Token(TokenType.OPERATOR, '~', pos)
            if self.current_char == '|':
                pos = self.position
                self.advance()
                return Token(TokenType.PIPE, '|', pos)
            if self.current_char == ',':
                pos = self.position
                self.advance()
                return Token(TokenType.COMMA, ',', pos)
            if self.current_char == ';':
                pos = self.position
                self.advance()
                return Token(TokenType.SEMICOLON, ';', pos)
            if self.current_char == '&':
                pos = self.position
                self.advance()
                return Token(TokenType.OPERATOR, '&', pos)
            raise ValueError(f"Invalid character: {self.current_char} at position {self.position}")
        return Token(TokenType.EOF, None, self.position)
class Parser:
    def __init__(self, lexer: Lexer):
        self.lexer = lexer
        self.current_token = self.lexer.get_next_token()

    def eat(self, token_type: TokenType):
        # Consume the current token if it has the expected type, otherwise fail.
        if self.current_token.type == token_type:
            self.current_token = self.lexer.get_next_token()
        else:
            raise SyntaxError(f"Expected {token_type}, got {self.current_token.type} at position {self.current_token.position}")

    def parse(self) -> ASTNode:
        result = self.expression()
        if self.current_token.type != TokenType.EOF:
            raise SyntaxError(f"Unexpected token: {self.current_token} at position {self.current_token.position}")
        return result
    def expression(self) -> ASTNode:
        node = self.term()
        # The lexer emits '->' and '<-' as ARROW tokens, so ARROW has to be accepted
        # by the loop condition alongside PLUS, MINUS and the OPERATOR values.
        while self.current_token.type in [TokenType.PLUS, TokenType.MINUS, TokenType.ARROW] or \
                (self.current_token.type == TokenType.OPERATOR and self.current_token.value in ['+-', '=', '!=', '<=', '>=', '<<', '>>', '~~', 'in', 'notin', 'and', 'or', 'implies', '=>', '<=>', 'iff']):
            if self.current_token.type in [TokenType.PLUS, TokenType.MINUS]:
                operator = self.current_token.type.value
            else:
                operator = self.current_token.value
            self.eat(self.current_token.type)
            right = self.term()
            node = BinaryOpNode(operator, node, right)
        return node
    def term(self) -> ASTNode:
        node = self.atom()
        # Token types that can begin an atom and therefore signal implicit multiplication.
        implicit_starts = [TokenType.NUMBER, TokenType.VARIABLE,
                           TokenType.LPAREN, TokenType.FUNCTION,
                           TokenType.GREEK, TokenType.SPECIAL,
                           TokenType.LBRACKET]
        # Explicit and implicit products share one loop so that chains such as
        # "binomial(n, k) p^k (1-p)^{n-k}" or "2x / y" are consumed completely.
        while True:
            token = self.current_token
            if token.type in [TokenType.MULTIPLY, TokenType.DIVIDE]:
                operator = token.type.value
                self.eat(token.type)
            elif token.type == TokenType.OPERATOR and token.value in ['dot', 'cross', 'cup', 'cap']:
                operator = token.value
                self.eat(TokenType.OPERATOR)
            elif token.type in implicit_starts:
                # Juxtaposition: there is no operator token to consume.
                operator = 'MULTIPLY'
            else:
                break
            right = self.atom()
            node = BinaryOpNode(operator, node, right)
        return node
    def atom(self) -> ASTNode:
        node = self.factor()
        subscript_node = None
        superscript_node = None
        # Subscript and superscript may appear in either order, at most once each.
        while self.current_token.type in [TokenType.UNDERSCORE, TokenType.POWER]:
            if self.current_token.type == TokenType.UNDERSCORE:
                if subscript_node is not None:
                    raise SyntaxError("Multiple subscripts on same symbol")
                self.eat(TokenType.UNDERSCORE)
                if self.current_token.type == TokenType.LBRACE:
                    self.eat(TokenType.LBRACE)
                    subscript_node = self.expression()
                    self.eat(TokenType.RBRACE)
                else:
                    subscript_node = self.factor()
            elif self.current_token.type == TokenType.POWER:
                if superscript_node is not None:
                    raise SyntaxError("Multiple superscripts on same symbol")
                self.eat(TokenType.POWER)
                if self.current_token.type == TokenType.LBRACE:
                    self.eat(TokenType.LBRACE)
                    superscript_node = self.expression()
                    self.eat(TokenType.RBRACE)
                else:
                    superscript_node = self.factor()
        if subscript_node and superscript_node:
            node = SubSuperscriptNode(node, subscript_node, superscript_node)
        elif subscript_node:
            node = SubscriptNode(node, subscript_node)
        elif superscript_node:
            node = SuperscriptNode(node, superscript_node)
        return node

    def factor(self) -> ASTNode:
        token = self.current_token
        if token.type == TokenType.PLUS:
            self.eat(TokenType.PLUS)
            return UnaryOpNode('+', self.factor())
        if token.type == TokenType.MINUS:
            self.eat(TokenType.MINUS)
            return UnaryOpNode('-', self.factor())
        if token.type == TokenType.NUMBER:
            self.eat(TokenType.NUMBER)
            return NumberNode(token.value)
        if token.type == TokenType.VARIABLE:
            self.eat(TokenType.VARIABLE)
            return VariableNode(token.value)
        if token.type == TokenType.GREEK:
            self.eat(TokenType.GREEK)
            return GreekLetterNode(token.value)
        if token.type == TokenType.SPECIAL:
            self.eat(TokenType.SPECIAL)
            return SpecialSymbolNode(token.value)
        if token.type == TokenType.LPAREN:
            self.eat(TokenType.LPAREN)
            node = self.expression()
            self.eat(TokenType.RPAREN)
            return node
        if token.type == TokenType.LBRACKET:
            return self.parse_matrix()
        if token.type == TokenType.PIPE:
            return self.parse_absolute_value()
        if token.type == TokenType.FUNCTION:
            return self.parse_function()
        raise SyntaxError(f"Unexpected token: {token}")

    def parse_function(self) -> FunctionNode:
        func_name = self.current_token.value
        self.eat(TokenType.FUNCTION)
        self.eat(TokenType.LPAREN)
        arguments = []
        if self.current_token.type != TokenType.RPAREN:
            arguments.append(self.expression())
            while self.current_token.type == TokenType.COMMA:
                self.eat(TokenType.COMMA)
                arguments.append(self.expression())
        self.eat(TokenType.RPAREN)
        self.validate_function_arguments(func_name, arguments)
        return FunctionNode(func_name, arguments)
    def validate_function_arguments(self, func_name: str, arguments: List[ASTNode]):
        single_arg_functions = {
            'sqrt', 'sin', 'cos', 'tan', 'arcsin', 'arccos', 'arctan',
            'sinh', 'cosh', 'tanh', 'ln', 'exp', 'abs', 'floor', 'ceil',
            'vec', 'arrow', 'hat', 'bar', 'tilde', 'ddot',
            'det', 'trace', 'transpose', 'inv', 'rank', 'rref',
            'eigenvalues', 'eigenvectors', 'eig', 'null', 'col', 'expm',
            'factorial', 'gamma', 'ket', 'bra', 'phi', 'diag',
            'mean', 'median', 'mode', 'var', 'pvar', 'std', 'stdev', 'pstd',
            'range', 'iqr', 'se', 'E', 'Var', 'P', 'prob'
        }
        two_arg_functions = {
            'root', 'frac', 'atan2', 'mod', 'floordiv', 'cross',
            'binomial', 'choose', 'permutation', 'beta', 'expectation',
            'cov', 'corr', 'cor', 'quantile', 'percentile', 'kronecker',
            'ci', 'Cov', 'poisson_prob'
        }
        three_arg_functions = {
            'normal', 'binomial_prob', 'zscore', 'tstat', 'levicivita'
        }
        # Functions that accept more than one argument count, or a fixed count
        # other than one, two or three. 'dot', 'norm', 'braket' and 'charpoly'
        # belong to none of the fixed-arity sets, so they are validated here.
        allowed_counts = {
            'dot': (1, 2), 'norm': (1, 2), 'braket': (1, 2), 'charpoly': (1, 2),
            'log': (1, 2), 'derivative': (2, 3), 'partial': (2, 3),
            'limit': (3, 4), 'integral': (2, 4),
            'sum': (4,), 'product': (4,),
        }
        if func_name in single_arg_functions:
            expected = (1,)
        elif func_name in two_arg_functions:
            expected = (2,)
        elif func_name in three_arg_functions:
            expected = (3,)
        else:
            # Functions that appear in no table are not validated.
            expected = allowed_counts.get(func_name)
        if expected is not None and len(arguments) not in expected:
            wanted = " or ".join(str(count) for count in expected)
            noun = "argument" if expected == (1,) else "arguments"
            raise ValueError(f"Function '{func_name}' expects {wanted} {noun}, got {len(arguments)}")
    def parse_matrix(self) -> MatrixNode:
        # Commas separate the elements of a row; semicolons separate rows.
        self.eat(TokenType.LBRACKET)
        rows = []
        current_row = []
        if self.current_token.type != TokenType.RBRACKET:
            current_row.append(self.expression())
            while self.current_token.type in [TokenType.COMMA, TokenType.SEMICOLON]:
                if self.current_token.type == TokenType.COMMA:
                    self.eat(TokenType.COMMA)
                    current_row.append(self.expression())
                else:
                    self.eat(TokenType.SEMICOLON)
                    rows.append(current_row)
                    current_row = []
                    if self.current_token.type != TokenType.RBRACKET:
                        current_row.append(self.expression())
        if current_row:
            rows.append(current_row)
        self.eat(TokenType.RBRACKET)
        if rows:
            row_length = len(rows[0])
            for i, row in enumerate(rows):
                if len(row) != row_length:
                    raise SyntaxError(f"Row {i} has {len(row)} elements, expected {row_length}")
        return MatrixNode(rows)

    def parse_absolute_value(self) -> FunctionNode:
        self.eat(TokenType.PIPE)
        content = self.expression()
        self.eat(TokenType.PIPE)
        return FunctionNode('abs', [content])
class LaTeXRenderer:
    def __init__(self):
        self.greek_map = {
            'alpha': r'\alpha', 'beta': r'\beta', 'gamma': r'\gamma',
            'delta': r'\delta', 'epsilon': r'\epsilon', 'zeta': r'\zeta',
            'eta': r'\eta', 'theta': r'\theta', 'iota': r'\iota',
            'kappa': r'\kappa', 'lambda': r'\lambda', 'mu': r'\mu',
            'nu': r'\nu', 'xi': r'\xi', 'omicron': r'o',
            'pi': r'\pi', 'rho': r'\rho', 'sigma': r'\sigma',
            'tau': r'\tau', 'upsilon': r'\upsilon', 'phi': r'\phi',
            'chi': r'\chi', 'psi': r'\psi', 'omega': r'\omega',
            'Gamma': r'\Gamma', 'Delta': r'\Delta', 'Theta': r'\Theta',
            'Lambda': r'\Lambda', 'Xi': r'\Xi', 'Pi': r'\Pi',
            'Sigma': r'\Sigma', 'Upsilon': r'\Upsilon', 'Phi': r'\Phi',
            'Psi': r'\Psi', 'Omega': r'\Omega'
        }
        self.special_map = {
            'nabla': r'\nabla',
            'partial': r'\partial',
            'inf': r'\infty',
            'infty': r'\infty',
            'hbar': r'\hbar',
            'reals': r'\mathbb{R}',
            'complex': r'\mathbb{C}',
            'naturals': r'\mathbb{N}',
            'integers': r'\mathbb{Z}',
            'rationals': r'\mathbb{Q}'
        }
        self.operator_map = {
            '+-': r'\pm',
            '=': '=',
            '!=': r'\neq',
            '<=': r'\leq',
            '>=': r'\geq',
            '<<': r'\ll',
            '>>': r'\gg',
            '~~': r'\approx',
            '->': r'\rightarrow',
            '<-': r'\leftarrow',
            '=>': r'\Rightarrow',
            'implies': r'\Rightarrow',
            '<=>': r'\Leftrightarrow',
            'iff': r'\Leftrightarrow',
            'prop': r'\propto',
            'in': r'\in',
            'notin': r'\notin',
            'cup': r'\cup',
            'cap': r'\cap',
            # Infix 'dot' and 'cross' are accepted by Parser.term(), so they need
            # symbols here; they match the function forms dot(a, b) and cross(a, b).
            'dot': r'\cdot',
            'cross': r'\times',
            'subset': r'\subset',
            'subseteq': r'\subseteq',
            'and': r'\land',
            'or': r'\lor',
            '&': r'\land',
            'not': r'\neg',
            'forall': r'\forall',
            'exists': r'\exists'
        }
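    # Operators produced by Parser.term(); they bind more tightly than sums and relations.
    TERM_OPERATORS = ('MULTIPLY', 'DIVIDE', 'dot', 'cross', 'cup', 'cap')

    def render_grouped(self, node: ASTNode, any_binary: bool = False) -> str:
        """Render node, restoring the grouping parentheses that the parser discards.

        Sums and relations are always wrapped. With any_binary=True (used for
        subscript and superscript bases) products and quotients are wrapped too.
        """
        text = self.render(node)
        if isinstance(node, BinaryOpNode) and (
                any_binary or node.operator not in self.TERM_OPERATORS):
            return f"\\left({text}\\right)"
        return text

    def render(self, node: ASTNode) -> str:
        if isinstance(node, NumberNode):
            return str(node.value) if node.value % 1 != 0 else str(int(node.value))
        elif isinstance(node, VariableNode):
            return node.name
        elif isinstance(node, GreekLetterNode):
            return self.greek_map.get(node.letter, node.letter)
        elif isinstance(node, SpecialSymbolNode):
            return self.special_map.get(node.symbol, node.symbol)
        elif isinstance(node, BinaryOpNode):
            if node.operator == 'DIVIDE':
                # \frac delimits its own operands, so no parentheses are needed.
                return f"\\frac{{{self.render(node.left)}}}{{{self.render(node.right)}}}"
            # A sum or relation can only be a right operand, or an operand of a
            # product, if the input parenthesized it, e.g. a - (b + c) or (1-p) q.
            if node.operator in self.TERM_OPERATORS:
                left = self.render_grouped(node.left)
            else:
                left = self.render(node.left)
            right = self.render_grouped(node.right)
            if node.operator == 'PLUS':
                return f"{left} + {right}"
            elif node.operator == 'MINUS':
                return f"{left} - {right}"
            elif node.operator == 'MULTIPLY':
                return f"{left} {right}"
            elif node.operator in self.operator_map:
                op_symbol = self.operator_map[node.operator]
                return f"{left} {op_symbol} {right}"
            else:
                return f"{left} {node.operator} {right}"
        elif isinstance(node, UnaryOpNode):
            operand = self.render_grouped(node.operand)
            if node.operator == '-':
                return f"-{operand}"
            else:
                return f"+{operand}"
        elif isinstance(node, SuperscriptNode):
            base = self.render_grouped(node.base, any_binary=True)
            exponent = self.render(node.exponent)
            return f"{base}^{{{exponent}}}"
        elif isinstance(node, SubscriptNode):
            base = self.render_grouped(node.base, any_binary=True)
            subscript = self.render(node.subscript)
            return f"{base}_{{{subscript}}}"
        elif isinstance(node, SubSuperscriptNode):
            base = self.render_grouped(node.base, any_binary=True)
            subscript = self.render(node.subscript)
            superscript = self.render(node.superscript)
            return f"{base}_{{{subscript}}}^{{{superscript}}}"
        elif isinstance(node, FunctionNode):
            return self.render_function(node)
        elif isinstance(node, MatrixNode):
            return self.render_matrix(node)
        else:
            raise ValueError(f"Unknown node type: {type(node)}")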
    def render_function(self, node: FunctionNode) -> str:
        if node.name == 'sqrt':
            arg = self.render(node.arguments[0])
            return f"\\sqrt{{{arg}}}"
        elif node.name == 'root':
            n = self.render(node.arguments[0])
            arg = self.render(node.arguments[1])
            return f"\\sqrt[{n}]{{{arg}}}"
        elif node.name == 'frac':
            num = self.render(node.arguments[0])
            denom = self.render(node.arguments[1])
            return f"\\frac{{{num}}}{{{denom}}}"
        elif node.name in ['sin', 'cos', 'tan', 'arcsin', 'arccos', 'arctan',
                           'sinh', 'cosh', 'tanh', 'ln', 'exp', 'det']:
            # Each of these names is also a built-in LaTeX operator command.
            arg = self.render(node.arguments[0])
            return f"\\{node.name}{{{arg}}}"
        elif node.name == 'log':
            if len(node.arguments) == 1:
                arg = self.render(node.arguments[0])
                return f"\\log{{{arg}}}"
            else:
                base = self.render(node.arguments[0])
                arg = self.render(node.arguments[1])
                return f"\\log_{{{base}}}{{{arg}}}"
        elif node.name == 'abs':
            arg = self.render(node.arguments[0])
            return f"\\left| {arg} \\right|"
        elif node.name == 'norm':
            if len(node.arguments) == 1:
                arg = self.render(node.arguments[0])
                return f"\\left\\| {arg} \\right\\|"
            else:
                arg = self.render(node.arguments[0])
                p = self.render(node.arguments[1])
                return f"\\left\\| {arg} \\right\\|_{{{p}}}"
        elif node.name == 'floor':
            arg = self.render(node.arguments[0])
            return f"\\lfloor {arg} \\rfloor"
        elif node.name == 'ceil':
            arg = self.render(node.arguments[0])
            return f"\\lceil {arg} \\rceil"
        elif node.name == 'vec':
            arg = self.render(node.arguments[0])
            return f"\\mathbf{{{arg}}}"
        elif node.name == 'arrow':
            arg = self.render(node.arguments[0])
            return f"\\vec{{{arg}}}"
        elif node.name == 'hat':
            arg = self.render(node.arguments[0])
            return f"\\hat{{{arg}}}"
        elif node.name == 'bar':
            arg = self.render(node.arguments[0])
            return f"\\bar{{{arg}}}"
        elif node.name == 'tilde':
            arg = self.render(node.arguments[0])
            return f"\\tilde{{{arg}}}"
        elif node.name == 'dot':
            # One argument is a time-derivative accent; two arguments are a dot product.
            if len(node.arguments) == 1:
                arg = self.render(node.arguments[0])
                return f"\\dot{{{arg}}}"
            else:
                left = self.render(node.arguments[0])
                right = self.render(node.arguments[1])
                return f"{left} \\cdot {right}"
        elif node.name == 'ddot':
            arg = self.render(node.arguments[0])
            return f"\\ddot{{{arg}}}"
        elif node.name == 'ket':
            content = self.render(node.arguments[0])
            return f"\\left| {content} \\right\\rangle"
        elif node.name == 'bra':
            content = self.render(node.arguments[0])
            return f"\\left\\langle {content} \\right|"
        elif node.name == 'braket':
            if len(node.arguments) == 2:
                left = self.render(node.arguments[0])
                right = self.render(node.arguments[1])
                return f"\\left\\langle {left} \\middle| {right} \\right\\rangle"
            else:
                content = self.render(node.arguments[0])
                return f"\\left\\langle {content} \\right\\rangle"
        elif node.name == 'expectation':
            if len(node.arguments) == 2:
                operator = self.render(node.arguments[0])
                state = self.render(node.arguments[1])
                return f"\\left\\langle {state} \\middle| {operator} \\middle| {state} \\right\\rangle"
            else:
                content = self.render(node.arguments[0])
                return f"\\left\\langle {content} \\right\\rangle"
        elif node.name == 'integral':
            # Two arguments give an indefinite integral; four add lower and upper bounds.
            if len(node.arguments) == 2:
                expr = self.render(node.arguments[0])
                var = self.render(node.arguments[1])
                return f"\\int {expr} \\, d{var}"
            elif len(node.arguments) == 4:
                expr = self.render(node.arguments[0])
                var = self.render(node.arguments[1])
                lower = self.render(node.arguments[2])
                upper = self.render(node.arguments[3])
                return f"\\int_{{{lower}}}^{{{upper}}} {expr} \\, d{var}"
        elif node.name == 'sum':
            expr = self.render(node.arguments[0])
            var = self.render(node.arguments[1])
            lower = self.render(node.arguments[2])
            upper = self.render(node.arguments[3])
            return f"\\sum_{{{var}={lower}}}^{{{upper}}} {expr}"
        elif node.name == 'product':
            expr = self.render(node.arguments[0])
            var = self.render(node.arguments[1])
            lower = self.render(node.arguments[2])
            upper = self.render(node.arguments[3])
            return f"\\prod_{{{var}={lower}}}^{{{upper}}} {expr}"
        elif node.name == 'limit':
            expr = self.render(node.arguments[0])
            var = self.render(node.arguments[1])
            value = self.render(node.arguments[2])
            if len(node.arguments) == 4:
                direction = self.render(node.arguments[3])
                return f"\\lim_{{{var} \\to {value}^{{{direction}}}}} {expr}"
            else:
                return f"\\lim_{{{var} \\to {value}}} {expr}"
        elif node.name == 'derivative':
            expr = self.render(node.arguments[0])
            var = self.render(node.arguments[1])
            if len(node.arguments) == 3:
                order = self.render(node.arguments[2])
                return f"\\frac{{d^{{{order}}}}}{{d{var}^{{{order}}}}} {expr}"
            else:
                return f"\\frac{{d}}{{d{var}}} {expr}"
        elif node.name == 'partial':
            expr = self.render(node.arguments[0])
            var = self.render(node.arguments[1])
            if len(node.arguments) == 3:
                order = self.render(node.arguments[2])
                return f"\\frac{{\\partial^{{{order}}}}}{{\\partial {var}^{{{order}}}}} {expr}"
            else:
                return f"\\frac{{\\partial}}{{\\partial {var}}} {expr}"
        elif node.name in ['div', 'curl', 'grad']:
            arg = self.render(node.arguments[0])
            return f"\\text{{{node.name}}} {arg}"
        elif node.name == 'cross':
            left = self.render(node.arguments[0])
            right = self.render(node.arguments[1])
            return f"{left} \\times {right}"
        elif node.name == 'transpose':
            arg = self.render(node.arguments[0])
            return f"{arg}^T"
        elif node.name == 'inv':
            arg = self.render(node.arguments[0])
            return f"{arg}^{{-1}}"
        elif node.name == 'trace':
            arg = self.render(node.arguments[0])
            return f"\\text{{tr}}({arg})"
        elif node.name in ['rank', 'null', 'col', 'rref', 'expm', 'diag']:
            arg = self.render(node.arguments[0])
            return f"\\text{{{node.name}}}({arg})"
        elif node.name in ['eigenvalues', 'eigenvectors']:
            arg = self.render(node.arguments[0])
            return f"\\text{{{node.name}}}({arg})"
        elif node.name == 'eig':
            arg = self.render(node.arguments[0])
            return f"\\text{{eig}}({arg})"
        elif node.name == 'charpoly':
            if len(node.arguments) == 1:
                arg = self.render(node.arguments[0])
                return f"\\text{{charpoly}}({arg})"
            else:
                matrix = self.render(node.arguments[0])
                var = self.render(node.arguments[1])
                return f"\\text{{charpoly}}({matrix}, {var})"
        elif node.name in ['binomial', 'choose']:
            n = self.render(node.arguments[0])
            k = self.render(node.arguments[1])
            return f"\\binom{{{n}}}{{{k}}}"
        elif node.name == 'kronecker':
            i = self.render(node.arguments[0])
            j = self.render(node.arguments[1])
            return f"\\delta_{{{i}{j}}}"
        elif node.name == 'levicivita':
            i = self.render(node.arguments[0])
            j = self.render(node.arguments[1])
            k = self.render(node.arguments[2])
            return f"\\epsilon_{{{i}{j}{k}}}"
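        elif node.name == 'factorial':
            # LaTeX has no \factorial command; a factorial is written with a trailing "!".
            arg = self.render(node.arguments[0])
            # Parenthesize compound arguments so that (n + 1)! is not read as n + 1!.
            if isinstance(node.arguments[0], (BinaryOpNode, UnaryOpNode)):
                arg = f"\\left({arg}\\right)"
            return f"{arg}!"
        elif node.name == 'gamma':
            # The gamma function uses the capital letter; \gamma is the lowercase symbol.
            arg = self.render(node.arguments[0])
            return f"\\Gamma({arg})"
        elif node.name in ['min', 'max', 'gcd', 'lcm']:
            args = ", ".join([self.render(arg) for arg in node.arguments])
            # Standard LaTeX defines \min, \max and \gcd but has no \lcm command.
            command = "\\text{lcm}" if node.name == 'lcm' else f"\\{node.name}"
            return f"{command}({args})"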
        elif node.name in ['mean', 'median', 'mode', 'var', 'pvar', 'std', 'stdev',
                           'pstd', 'range', 'iqr', 'se']:
            arg = self.render(node.arguments[0])
            return f"\\text{{{node.name}}}({arg})"
        elif node.name in ['cov', 'corr', 'cor', 'quantile', 'percentile', 'ci', 'Cov']:
            args = ", ".join([self.render(arg) for arg in node.arguments])
            return f"\\text{{{node.name}}}({args})"
        elif node.name in ['E', 'Var', 'P', 'prob']:
            arg = self.render(node.arguments[0])
            return f"\\text{{{node.name}}}({arg})"
        elif node.name in ['normal', 'phi']:
            args = ", ".join([self.render(arg) for arg in node.arguments])
            func_display = 'N' if node.name == 'normal' else '\\phi'
            return f"{func_display}({args})"
        elif node.name in ['binomial_prob', 'poisson_prob']:
            args = ", ".join([self.render(arg) for arg in node.arguments])
            return f"\\text{{{node.name}}}({args})"
        elif node.name in ['zscore', 'tstat', 'pvalue']:
            args = ", ".join([self.render(arg) for arg in node.arguments])
            return f"\\text{{{node.name}}}({args})"
        elif node.name in ['mod', 'floordiv', 'atan2', 'permutation', 'beta']:
            args = ", ".join([self.render(arg) for arg in node.arguments])
            return f"\\text{{{node.name}}}({args})"
        else:
            # Fallback: typeset the function name upright, followed by its arguments.
            args = ", ".join([self.render(arg) for arg in node.arguments])
            return f"\\text{{{node.name}}}({args})"

    def render_matrix(self, node: MatrixNode) -> str:
        rows_str = []
        for row in node.rows:
            row_str = " & ".join([self.render(elem) for elem in row])
            rows_str.append(row_str)
        matrix_content = " \\\\ ".join(rows_str)
        return f"\\begin{{bmatrix}} {matrix_content} \\end{{bmatrix}}"
class MathMLRenderer:
    def __init__(self):
        self.greek_unicode = {
            'alpha': 'α', 'beta': 'β', 'gamma': 'γ', 'delta': 'δ',
            'epsilon': 'ε', 'zeta': 'ζ', 'eta': 'η', 'theta': 'θ',
            'iota': 'ι', 'kappa': 'κ', 'lambda': 'λ', 'mu': 'μ',
            'nu': 'ν', 'xi': 'ξ', 'omicron': 'ο', 'pi': 'π',
            'rho': 'ρ', 'sigma': 'σ', 'tau': 'τ', 'upsilon': 'υ',
            'phi': 'φ', 'chi': 'χ', 'psi': 'ψ', 'omega': 'ω',
            'Gamma': 'Γ', 'Delta': 'Δ', 'Theta': 'Θ', 'Lambda': 'Λ',
            'Xi': 'Ξ', 'Pi': 'Π', 'Sigma': 'Σ', 'Upsilon': 'Υ',
            'Phi': 'Φ', 'Psi': 'Ψ', 'Omega': 'Ω'
        }
        self.special_unicode = {
            'nabla': '∇',
            'partial': '∂',
            'inf': '∞',
            'infty': '∞',
            'hbar': 'ℏ',
            'reals': 'ℝ',
            'complex': 'ℂ',
            'naturals': 'ℕ',
            'integers': 'ℤ',
            'rationals': 'ℚ'
        }
        self.operator_unicode = {
            '+-': '±',
            '=': '=',
            '!=': '≠',
            '<=': '≤',
            '>=': '≥',
            '<<': '≪',
            '>>': '≫',
            '~~': '≈',
            '->': '→',
            '<-': '←',
            '=>': '⇒',
            'implies': '⇒',
            '<=>': '⇔',
            'iff': '⇔',
            'prop': '∝',
            'in': '∈',
            'notin': '∉',
            'cup': '∪',
            'cap': '∩',
            # Infix 'dot' and 'cross' are accepted by Parser.term(), so they need
            # symbols here; they match the function forms dot(a, b) and cross(a, b).
            'dot': '·',
            'cross': '×',
            'subset': '⊂',
            'subseteq': '⊆',
            'and': '∧',
            'or': '∨',
            '&': '∧',
            'not': '¬',
            'forall': '∀',
            'exists': '∃'
        }
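    # Operators produced by Parser.term(); they bind more tightly than sums and relations.
    TERM_OPERATORS = ('MULTIPLY', 'DIVIDE', 'dot', 'cross', 'cup', 'cap')

    def render_grouped(self, node: ASTNode, any_binary: bool = False) -> str:
        """Render node, restoring the grouping parentheses that the parser discards.

        Sums and relations are always wrapped. With any_binary=True (used for
        subscript and superscript bases) products and quotients are wrapped too.
        """
        xml = self.render(node)
        if isinstance(node, BinaryOpNode) and (
                any_binary or node.operator not in self.TERM_OPERATORS):
            return f"<mrow><mo>(</mo>{xml}<mo>)</mo></mrow>"
        return xml

    def render(self, node: ASTNode) -> str:
        if isinstance(node, NumberNode):
            value = str(node.value) if node.value % 1 != 0 else str(int(node.value))
            return f"<mn>{value}</mn>"
        elif isinstance(node, VariableNode):
            return f"<mi>{node.name}</mi>"
        elif isinstance(node, GreekLetterNode):
            char = self.greek_unicode.get(node.letter, node.letter)
            return f"<mi>{char}</mi>"
        elif isinstance(node, SpecialSymbolNode):
            char = self.special_unicode.get(node.symbol, node.symbol)
            return f"<mi>{char}</mi>"
        elif isinstance(node, BinaryOpNode):
            if node.operator == 'DIVIDE':
                # <mfrac> delimits its own operands, so no parentheses are needed.
                return f"<mfrac>{self.render(node.left)}{self.render(node.right)}</mfrac>"
            # A sum or relation can only be a right operand, or an operand of a
            # product, if the input parenthesized it, e.g. a - (b + c) or (1-p) q.
            if node.operator in self.TERM_OPERATORS:
                left = self.render_grouped(node.left)
            else:
                left = self.render(node.left)
            right = self.render_grouped(node.right)
            if node.operator == 'PLUS':
                return f"<mrow>{left}<mo>+</mo>{right}</mrow>"
            elif node.operator == 'MINUS':
                return f"<mrow>{left}<mo>-</mo>{right}</mrow>"
            elif node.operator == 'MULTIPLY':
                return f"<mrow>{left}<mo></mo>{right}</mrow>"
            elif node.operator in self.operator_unicode:
                op_char = self.operator_unicode[node.operator]
                return f"<mrow>{left}<mo>{op_char}</mo>{right}</mrow>"
            else:
                return f"<mrow>{left}<mo>{node.operator}</mo>{right}</mrow>"
        elif isinstance(node, UnaryOpNode):
            operand = self.render_grouped(node.operand)
            if node.operator == '-':
                return f"<mrow><mo>-</mo>{operand}</mrow>"
            else:
                return f"<mrow><mo>+</mo>{operand}</mrow>"
        elif isinstance(node, SuperscriptNode):
            base = self.render_grouped(node.base, any_binary=True)
            exponent = self.render(node.exponent)
            return f"<msup>{base}{exponent}</msup>"
        elif isinstance(node, SubscriptNode):
            base = self.render_grouped(node.base, any_binary=True)
            subscript = self.render(node.subscript)
            return f"<msub>{base}{subscript}</msub>"
        elif isinstance(node, SubSuperscriptNode):
            base = self.render_grouped(node.base, any_binary=True)
            subscript = self.render(node.subscript)
            superscript = self.render(node.superscript)
            return f"<msubsup>{base}{subscript}{superscript}</msubsup>"
        elif isinstance(node, FunctionNode):
            return self.render_function(node)
        elif isinstance(node, MatrixNode):
            return self.render_matrix(node)
        else:
            raise ValueError(f"Unknown node type: {type(node)}")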
    def render_function(self, node: FunctionNode) -> str:
        if node.name == 'sqrt':
            arg = self.render(node.arguments[0])
            return f"<msqrt>{arg}</msqrt>"
        elif node.name == 'root':
            # <mroot> takes the radicand first and the index second.
            n = self.render(node.arguments[0])
            arg = self.render(node.arguments[1])
            return f"<mroot>{arg}{n}</mroot>"
        elif node.name == 'frac':
            num = self.render(node.arguments[0])
            denom = self.render(node.arguments[1])
            return f"<mfrac>{num}{denom}</mfrac>"
        elif node.name in ['sin', 'cos', 'tan', 'arcsin', 'arccos', 'arctan',
                           'sinh', 'cosh', 'tanh', 'log', 'ln', 'exp', 'det',
                           'mean', 'median', 'mode', 'var', 'pvar', 'std', 'stdev', 'pstd',
                           'range', 'iqr', 'se', 'E', 'Var', 'P', 'prob']:
            if node.name == 'log' and len(node.arguments) == 2:
                base = self.render(node.arguments[0])
                arg = self.render(node.arguments[1])
                return f"<mrow><msub><mi>log</mi>{base}</msub><mo></mo><mrow><mo>(</mo>{arg}<mo>)</mo></mrow></mrow>"
            else:
                arg = self.render(node.arguments[0])
                return f"<mrow><mi>{node.name}</mi><mo></mo><mrow><mo>(</mo>{arg}<mo>)</mo></mrow></mrow>"
        elif node.name == 'abs':
            arg = self.render(node.arguments[0])
            return f"<mrow><mo>|</mo>{arg}<mo>|</mo></mrow>"
        elif node.name == 'norm':
            arg = self.render(node.arguments[0])
            if len(node.arguments) == 2:
                p = self.render(node.arguments[1])
                # The p subscript sits on the closing delimiter only, as in the LaTeX output.
                return f"<mrow><mo>∥</mo>{arg}<msub><mo>∥</mo>{p}</msub></mrow>"
            return f"<mrow><mo>∥</mo>{arg}<mo>∥</mo></mrow>"
        elif node.name == 'floor':
            arg = self.render(node.arguments[0])
            return f"<mrow><mo>⌊</mo>{arg}<mo>⌋</mo></mrow>"
        elif node.name == 'ceil':
            arg = self.render(node.arguments[0])
            return f"<mrow><mo>⌈</mo>{arg}<mo>⌉</mo></mrow>"
        elif node.name in ['vec', 'arrow', 'hat', 'bar', 'tilde', 'ddot']:
            arg = self.render(node.arguments[0])
            # Accents are drawn by placing the accent character over the argument.
            accent_map = {
                'vec': '→', 'arrow': '→', 'hat': '^',
                'bar': '¯', 'tilde': '~', 'ddot': '¨'
            }
            accent = accent_map.get(node.name, '^')
            return f"<mover>{arg}<mo>{accent}</mo></mover>"
        elif node.name == 'dot':
            if len(node.arguments) == 1:
                arg = self.render(node.arguments[0])
                return f"<mover>{arg}<mo>˙</mo></mover>"
            else:
                left = self.render(node.arguments[0])
                right = self.render(node.arguments[1])
                return f"<mrow>{left}<mo>·</mo>{right}</mrow>"
        elif node.name == 'ket':
            content = self.render(node.arguments[0])
            return f"<mrow><mo>|</mo>{content}<mo>⟩</mo></mrow>"
        elif node.name == 'bra':
            content = self.render(node.arguments[0])
            return f"<mrow><mo>⟨</mo>{content}<mo>|</mo></mrow>"
        elif node.name == 'braket':
            if len(node.arguments) == 2:
                left = self.render(node.arguments[0])
                right = self.render(node.arguments[1])
                return f"<mrow><mo>⟨</mo>{left}<mo>|</mo>{right}<mo>⟩</mo></mrow>"
            else:
                content = self.render(node.arguments[0])
                return f"<mrow><mo>⟨</mo>{content}<mo>⟩</mo></mrow>"
        elif node.name == 'expectation':
            if len(node.arguments) == 2:
                operator = self.render(node.arguments[0])
                state = self.render(node.arguments[1])
                return f"<mrow><mo>⟨</mo>{state}<mo>|</mo>{operator}<mo>|</mo>{state}<mo>⟩</mo></mrow>"
            else:
                content = self.render(node.arguments[0])
                return f"<mrow><mo>⟨</mo>{content}<mo>⟩</mo></mrow>"
        elif node.name == 'integral':
            if len(node.arguments) == 2:
                expr = self.render(node.arguments[0])
                var = self.render(node.arguments[1])
                return f"<mrow><mo>∫</mo>{expr}<mo></mo><mi>d</mi>{var}</mrow>"
            elif len(node.arguments) == 4:
                expr = self.render(node.arguments[0])
                var = self.render(node.arguments[1])
                lower = self.render(node.arguments[2])
                upper = self.render(node.arguments[3])
                return f"<mrow><msubsup><mo>∫</mo>{lower}{upper}</msubsup>{expr}<mo></mo><mi>d</mi>{var}</mrow>"
        elif node.name == 'sum':
            expr = self.render(node.arguments[0])
            var = self.render(node.arguments[1])
            lower = self.render(node.arguments[2])
            upper = self.render(node.arguments[3])
            return f"<mrow><munderover><mo>∑</mo><mrow>{var}<mo>=</mo>{lower}</mrow>{upper}</munderover>{expr}</mrow>"
        elif node.name == 'product':
            expr = self.render(node.arguments[0])
            var = self.render(node.arguments[1])
            lower = self.render(node.arguments[2])
            upper = self.render(node.arguments[3])
            return f"<mrow><munderover><mo>∏</mo><mrow>{var}<mo>=</mo>{lower}</mrow>{upper}</munderover>{expr}</mrow>"
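        elif node.name == 'limit':
            expr = self.render(node.arguments[0])
            var = self.render(node.arguments[1])
            value = self.render(node.arguments[2])
            if len(node.arguments) == 4:
                # One-sided limit: attach the direction to the target value,
                # mirroring the four-argument form in the LaTeX renderer.
                direction = self.render(node.arguments[3])
                value = f"<msup>{value}{direction}</msup>"
            return f"<mrow><munder><mo>lim</mo><mrow>{var}<mo>→</mo>{value}</mrow></munder>{expr}</mrow>"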
        elif node.name == 'derivative':
            expr = self.render(node.arguments[0])
            var = self.render(node.arguments[1])
            if len(node.arguments) == 3:
                order = self.render(node.arguments[2])
                return f"<mfrac><mrow><msup><mi>d</mi>{order}</msup></mrow><mrow><mi>d</mi><msup>{var}{order}</msup></mrow></mfrac><mrow>{expr}</mrow>"
            else:
                return f"<mfrac><mi>d</mi><mrow><mi>d</mi>{var}</mrow></mfrac><mrow>{expr}</mrow>"
        elif node.name == 'partial':
            expr = self.render(node.arguments[0])
            var = self.render(node.arguments[1])
            if len(node.arguments) == 3:
                order = self.render(node.arguments[2])
                return f"<mfrac><mrow><msup><mo>∂</mo>{order}</msup></mrow><mrow><mo>∂</mo><msup>{var}{order}</msup></mrow></mfrac><mrow>{expr}</mrow>"
            else:
                return f"<mfrac><mo>∂</mo><mrow><mo>∂</mo>{var}</mrow></mfrac><mrow>{expr}</mrow>"
        elif node.name == 'cross':
            left = self.render(node.arguments[0])
            right = self.render(node.arguments[1])
            return f"<mrow>{left}<mo>×</mo>{right}</mrow>"
        elif node.name == 'transpose':
            arg = self.render(node.arguments[0])
            return f"<msup>{arg}<mi>T</mi></msup>"
        elif node.name == 'inv':
            arg = self.render(node.arguments[0])
            return f"<msup>{arg}<mn>-1</mn></msup>"
        elif node.name in ['binomial', 'choose']:
            n = self.render(node.arguments[0])
            k = self.render(node.arguments[1])
            # <mfenced> is deprecated and absent from MathML Core, so the
            # delimiters are written as explicit <mo> elements.
            return f"<mrow><mo>(</mo><mfrac linethickness='0'>{n}{k}</mfrac><mo>)</mo></mrow>"
        else:
            # Fallback: function name followed by a comma-separated argument list.
            func_name = f"<mi>{node.name}</mi>"
            if len(node.arguments) > 0:
                args_list = []
                for i, arg in enumerate(node.arguments):
                    args_list.append(self.render(arg))
                    if i < len(node.arguments) - 1:
                        args_list.append("<mo>,</mo>")
                args = "".join(args_list)
                return f"<mrow>{func_name}<mo>(</mo>{args}<mo>)</mo></mrow>"
            else:
                return func_name

    def render_matrix(self, node: MatrixNode) -> str:
        rows_xml = []
        for row in node.rows:
            row_xml = "<mtr>" + "".join([f"<mtd>{self.render(elem)}</mtd>" for elem in row]) + "</mtr>"
            rows_xml.append(row_xml)
        matrix_content = "".join(rows_xml)
        # Explicit <mo> brackets replace the deprecated <mfenced> element.
        return f"<mrow><mo>[</mo><mtable>{matrix_content}</mtable><mo>]</mo></mrow>"
class SuperMathConverter:
    def __init__(self):
        self.latex_renderer = LaTeXRenderer()
        self.mathml_renderer = MathMLRenderer()

    def convert(self, supermath_text: str, output_format: str = 'latex') -> str:
        try:
            lexer = Lexer(supermath_text)
            parser = Parser(lexer)
            ast = parser.parse()
            if output_format.lower() == 'latex':
                return self.latex_renderer.render(ast)
            elif output_format.lower() == 'mathml':
                return f"<math xmlns='http://www.w3.org/1998/Math/MathML'>{self.mathml_renderer.render(ast)}</math>"
            else:
                raise ValueError(f"Unknown output format: {output_format}")
        except Exception as e:
            # Chain the original exception so that its traceback is preserved.
            raise ValueError(f"Conversion error: {str(e)}") from e
def main():
    converter = SuperMathConverter()
    test_expressions = [
        ("Basic algebra", "x^2 + 2x + 1"),
        ("Subscript and superscript", "x_i^2"),
        ("Subscript and superscript reversed", "x^2_i"),
        ("Fraction", "frac(a + b, c)"),
        ("Square root", "sqrt(x^2 + y^2)"),
        ("Integral", "integral(x^2, x, 0, 1)"),
        ("Summation", "sum(i^2, i, 1, n)"),
        ("Mass-energy equivalence", "E = mc^2"),
        # A raw string literal cannot end in a backslash, so this entry escapes them instead.
        ("Greek letters", "\\alpha\\ + \\beta\\ = \\gamma\\"),
        ("Sequence notation", "x_{n+1} = x_n + 1"),
        ("Gaussian distribution", r"frac(1, \sigma\ sqrt(2\pi\)) exp(-frac((x - \mu\)^2, 2\sigma\^2))"),
        ("Matrix", "[1, 2; 3, 4]"),
        ("Determinant", "det([1, 2; 3, 4])"),
        ("Eigenvalue equation", r"A vec(v) = \lambda\ vec(v)"),
        ("Eigenvalues function", "eigenvalues(A)"),
        ("Matrix inverse", "inv(A)"),
        ("Binomial coefficient n choose k", "binomial(n, k)"),
        ("Alternative choose notation", "choose(n, k)"),
        ("Sample variance", r"var(X) = frac(1, n-1) sum((x_i - mean(X))^2, i, 1, n)"),
        ("Correlation", "corr(X, Y) = frac(cov(X, Y), std(X) std(Y))"),
        ("Normal distribution function", r"normal(x, \mu\, \sigma\)"),
        ("Expected value", "E(X^2) - E(X)^2"),
        ("Quantum ket", r"ket(\psi\)"),
        ("Quantum braket", r"braket(\phi\, \psi\)"),
        ("Tensor with indices", "T^{ij}_{kl}"),
        ("Kronecker delta", r"\delta\_{ij}"),
        ("Binomial probability", "binomial_prob(n, k, p) = binomial(n, k) p^k (1-p)^{n-k}"),
        ("Standard error", "se(X) = frac(std(X), sqrt(n))")
    ]
    print("SUPERMATH CONVERTER DEMONSTRATION")
    print("=" * 70)
    print()
    for caption, expr in test_expressions:
        print(f"SuperMath: {caption} {expr}")
        print()
        try:
            latex_output = converter.convert(expr, 'latex')
            print(f"LaTeX: {latex_output}")
            print()
            # Only the first 100 characters of the MathML are shown to keep the output short.
            mathml_output = converter.convert(expr, 'mathml')
            print(f"MathML: {mathml_output[:100]}...")
            print()
        except Exception as e:
            print(f"Error: {str(e)}")
            print()
        print("-" * 70)
        print()


if __name__ == "__main__":
    main()
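
The operator branches in get_next_token all follow the same longest-match rule: a multi-character spelling such as "<=>" is tried before its prefixes "<=" and "<". As a minimal standalone sketch of that rule (not part of the listing above), the lookup can be written as a table scan. The names OPERATORS, scan_operator and scan_all are hypothetical and introduced only for this illustration; the LaTeX commands are the ones in LaTeXRenderer.operator_map.

```python
from typing import List, Optional, Tuple

# Hypothetical table for illustration: operator spellings mapped to the
# LaTeX commands that LaTeXRenderer.operator_map assigns to them.
OPERATORS = {
    '<=>': r'\Leftrightarrow',
    '<=': r'\leq',
    '>=': r'\geq',
    '<<': r'\ll',
    '>>': r'\gg',
    '=>': r'\Rightarrow',
    '->': r'\rightarrow',
    '<-': r'\leftarrow',
    '!=': r'\neq',
    '~~': r'\approx',
    '+-': r'\pm',
    '=': '=',
}

# Try longer spellings first so that '<=>' is not split into '<=' and '>'.
_BY_LENGTH = sorted(OPERATORS, key=len, reverse=True)


def scan_operator(text: str, pos: int) -> Optional[Tuple[str, int]]:
    """Return (operator, next_position) for the longest operator at pos, or None."""
    for op in _BY_LENGTH:
        if text.startswith(op, pos):
            return op, pos + len(op)
    return None


def scan_all(text: str) -> List[str]:
    """Collect every operator in text, skipping characters that start none."""
    found = []
    pos = 0
    while pos < len(text):
        match = scan_operator(text, pos)
        if match is None:
            pos += 1
        else:
            found.append(match[0])
            pos = match[1]
    return found


if __name__ == "__main__":
    for op in scan_all("a <=> b <= c -> d"):
        print(op, OPERATORS[op])
```

A table like this keeps the precedence between overlapping spellings in one place. The hand-written branches in the lexer have the advantage that each operator can return its own token type (OPERATOR, ARROW, LANGLE and so on) without a second lookup.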