PROLOGUE: A CHESS ENGINE IN YOUR BROWSER
There is something almost magical about opening a web browser, navigating to a single HTML file, and finding yourself face to face with a chess engine that thinks, learns, and plays. No installation. No server. No GPU cluster humming in a data center. Just one self-contained file — HTML, CSS, and JavaScript woven together — that implements a complete, multi-strategy chess AI combining minimax search with alpha-beta pruning, Monte Carlo Tree Search, a Q-learning reinforcement learning system, an opening book, a position evaluator with piece-square tables, and a polished, responsive user interface with sound effects. This is exactly what RL Chess AI v4.0 delivers.
This article takes you on a guided tour of every significant architectural decision in that file. We will examine how the chess board is represented in memory, how legal moves are generated, how the AI decides what to move, how reinforcement learning is woven into the decision loop, how the user interface is constructed and kept responsive, and how the whole system is held together by a clean, event-driven control flow. Along the way we will look at actual code excerpts and conceptual diagrams to make the abstract concrete. And at the end, you will find a complete user guide so that anyone — from a curious beginner to an experienced developer — can sit down and play.
Let us begin at the very foundation.
Hint: all source code is available via GitHub: https://github.com/ms1963/RLChess
CHAPTER ONE: THE BOARD REPRESENTATION
Every chess engine, from the simplest to the most sophisticated, must answer one fundamental question before it can do anything else: how do I represent the board in memory? The answer to that question has profound consequences for the speed of move generation, the ease of implementing special rules, and the overall elegance of the code.
RL Chess AI uses a flat, one-dimensional array of 64 integers. The board is declared as:
this.board = new Array(64).fill(PIECE.EMPTY);
This is the classic "mailbox" or "8x8 array" representation, where each of the 64 squares is assigned an index from 0 (top-left, a8 in standard chess notation) to 63 (bottom-right, h1). The row and column of any square index can be recovered with two tiny helper methods:
row(sq) { return sq >> 3; }
col(sq) { return sq & 7; }
The right-shift by 3 is equivalent to integer division by 8, giving the rank. The bitwise AND with 7 is equivalent to modulo 8, giving the file. These are fast bitwise operations, and their use throughout the engine is a deliberate micro-optimization that avoids repeated division.
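These helpers can be exercised in isolation. The following standalone sketch mirrors them as plain functions; sqIndex is a hypothetical inverse helper added here for illustration, not a method of the engine:

```javascript
// Sketch of the index/coordinate round trip used throughout the engine.
const row = sq => sq >> 3;           // integer division by 8 -> rank (0..7)
const col = sq => sq & 7;            // modulo 8 -> file (0..7)
const sqIndex = (r, f) => r * 8 + f; // hypothetical inverse helper

// In this orientation, a8 is index 0 and h1 is index 63.
```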
Each square holds an integer from a frozen constant object called PIECE:
const PIECE = Object.freeze({
  EMPTY: 0,
  WP: 1, WN: 2, WB: 3, WR: 4, WQ: 5, WK: 6,
  BP: 7, BN: 8, BB: 9, BR: 10, BQ: 11, BK: 12
});
White pieces are numbered 1 through 6, black pieces 7 through 12, and an empty square is 0. This numeric encoding is compact and enables very fast color checks. To determine whether a piece is white, the engine checks whether its value is between 1 and 6 inclusive. To determine whether it is black, it checks 7 through 12. The helper methods isWhite, isBlack, isOwn, and isEnemy encapsulate this logic cleanly:
isWhite(p) { return p >= PIECE.WP && p <= PIECE.WK; }
isBlack(p) { return p >= PIECE.BP && p <= PIECE.BK; }
isOwn(p, color) { return color === WHITE ? this.isWhite(p) : this.isBlack(p); }
isEnemy(p, color) { return color === WHITE ? this.isBlack(p) : this.isWhite(p); }
Notice that WHITE and BLACK are simply the integers 0 and 1. This means that the opponent's color can always be obtained by the expression "1 - color", which is used pervasively throughout the engine. It is a small but elegant trick that eliminates the need for conditional branches whenever the code needs to refer to the other side.
The initial board position is set up by the _initBoard method:
_initBoard() {
  const back = [PIECE.WR, PIECE.WN, PIECE.WB, PIECE.WQ,
                PIECE.WK, PIECE.WB, PIECE.WN, PIECE.WR];
  for (let f = 0; f < 8; f++) {
    this.board[56 + f] = back[f];      // White back rank (row 7)
    this.board[48 + f] = PIECE.WP;     // White pawns (row 6)
    this.board[8 + f]  = PIECE.BP;     // Black pawns (row 1)
    this.board[f]      = back[f] + 6;  // Black back rank (row 0)
  }
}
The fact that black pieces are exactly 6 higher than their white counterparts means that the black back rank can be initialized by simply adding 6 to the white back rank array. This is a direct consequence of the numeric encoding scheme and shows how a well-chosen representation can simplify initialization logic.
Beyond the board array itself, the ChessGame class maintains several additional pieces of state that are essential for correct chess play. The "castling" object tracks whether each of the four castling rights (white kingside, white queenside, black kingside, black queenside) is still available. The "enPassant" integer holds the index of the en passant target square, or -1 if no en passant capture is currently possible. The "halfMoves" counter tracks the number of half-moves since the last pawn move or capture, which is needed for the fifty-move draw rule. The "fullMoves" counter tracks the full move number. The "moveHistory", "positionHistory", and "sanHistory" arrays record the game's history in three different formats: as raw move objects, as board hashes for threefold repetition detection, and as Standard Algebraic Notation strings for display and PGN export.
This combination of a flat array for the board and a small set of auxiliary state variables is a time-tested approach. It is not as fast as the bitboard representations used by top engines like Stockfish, where each piece type is represented as a 64-bit integer with one bit per square, but it is far more readable and maintainable, and for a browser-based engine it is more than adequate.
CHAPTER TWO: MOVE GENERATION — THE HEART OF THE ENGINE
Move generation is the most performance-critical part of any chess engine. The engine must be able to enumerate all legal moves from any position quickly, because it will call this function thousands or even tens of thousands of times per second during search. RL Chess AI implements move generation in the generateMoves method of the ChessGame class, which dispatches to specialized sub-methods for each piece type.
The outer loop of generateMoves is straightforward: it iterates over all 64 squares, finds squares occupied by pieces of the current player's color, and calls the appropriate move generator for each piece type:
for (let from = 0; from < 64; from++) {
  const p = b[from];
  if (!p || !this.isOwn(p, color)) continue;
  const r = this.row(from), f = this.col(from);
  if (p === PIECE.WP || p === PIECE.BP)
    this._pawnMoves(from, r, f, color, moves, onlyCaptures);
  else if (p === PIECE.WN || p === PIECE.BN)
    this._knightMoves(from, r, f, color, moves, onlyCaptures);
  else if (p === PIECE.WB || p === PIECE.BB)
    this._slidingMoves(from, r, f, color, moves,
      [[-1,-1],[-1,1],[1,-1],[1,1]], onlyCaptures);
  else if (p === PIECE.WR || p === PIECE.BR)
    this._slidingMoves(from, r, f, color, moves,
      [[-1,0],[1,0],[0,-1],[0,1]], onlyCaptures);
  else if (p === PIECE.WQ || p === PIECE.BQ)
    this._slidingMoves(from, r, f, color, moves,
      [[-1,-1],[-1,1],[1,-1],[1,1],[-1,0],[1,0],[0,-1],[0,1]], onlyCaptures);
  else if (p === PIECE.WK || p === PIECE.BK)
    this._kingMoves(from, r, f, color, moves, onlyCaptures);
}
After collecting all pseudo-legal moves — moves that are geometrically valid but may leave the king in check — the method applies a legality filter. For each candidate move, it temporarily applies the move to the board, checks whether the moving side's king is now in check, and if so, discards the move. This is the standard "make, test, unmake" approach:
return moves.filter(m => {
  const undo = this._applyMove(m);
  const legal = !this.isInCheck(color);
  this._undoMove(m, undo);
  return legal;
});
The _applyMove and _undoMove pair is central to the engine's operation. Rather than maintaining a complex incremental update system, _applyMove saves a complete snapshot of the mutable state (board array, castling rights, en passant square, half-move clock, and turn) into an "undo" object, applies the move, and returns the undo object. _undoMove simply restores all fields from that snapshot. This is clean and correct, though it does involve copying the entire 64-element board array on every move application. For a JavaScript engine running in a browser, this is an acceptable trade-off for simplicity.
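A stripped-down model of the pattern, not the engine's actual _applyMove (which also updates castling rights and handles captures, promotion, and en passant), might look like this:

```javascript
// Simplified snapshot-based make/unmake, illustrating the pattern only.
function applyMove(state, move) {
  const undo = {
    board: state.board.slice(),   // full copy of the 64-entry array
    enPassant: state.enPassant,
    halfMoves: state.halfMoves,
    turn: state.turn
  };
  state.board[move.to] = state.board[move.from];
  state.board[move.from] = 0;     // PIECE.EMPTY
  state.turn = 1 - state.turn;
  return undo;
}

function undoMove(state, undo) {
  state.board = undo.board;       // restore every saved field
  state.enPassant = undo.enPassant;
  state.halfMoves = undo.halfMoves;
  state.turn = undo.turn;
}
```

The cost of this simplicity is the board copy on every call, which is exactly the trade-off the paragraph above describes.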
The sliding piece move generator deserves particular attention because it handles bishops, rooks, and queens with a single unified method. The key insight is that these pieces all move by repeatedly stepping in a fixed direction until they hit the edge of the board or another piece. The directions are passed as a parameter: diagonal directions for bishops, orthogonal for rooks, and all eight directions for queens. The inner loop steps along each direction until it leaves the board, encounters a friendly piece (in which case it stops without adding a move), or encounters an enemy piece (in which case it adds a capture move and then stops):
_slidingMoves(from, r, f, color, moves, dirs, onlyCaptures) {
  const b = this.board;
  for (const [dr, df] of dirs) {
    let nr = r + dr, nf = f + df;
    while (nr >= 0 && nr < 8 && nf >= 0 && nf < 8) {
      const to = nr * 8 + nf;
      if (this.isOwn(b[to], color)) break;
      if (this.isEnemy(b[to], color)) {
        moves.push({from, to, flags: 'c', capture: b[to]});
        break;
      }
      if (!onlyCaptures)
        moves.push({from, to, flags: '', capture: PIECE.EMPTY});
      nr += dr; nf += df;
    }
  }
}
This is a model of clarity. The "onlyCaptures" flag is used during quiescence search, which we will discuss in the AI section, to generate only capture moves and avoid the cost of generating quiet moves that are not needed.
Pawn moves are the most complex because pawns have the most special rules: they move differently from how they capture, they can move two squares from their starting row, they can promote when they reach the last rank, and they can capture en passant. The _pawnMoves method handles all of these cases. The direction of movement is determined by color: white pawns move in the negative row direction (up the board, toward row 0), while black pawns move in the positive direction. The starting row for a double push is row 6 for white and row 1 for black. The promotion row is row 0 for white and row 7 for black. When a pawn reaches the promotion row, the method generates four separate moves, one for each possible promotion piece (queen, rook, bishop, knight), each flagged with the promoted piece type.
The en passant capture is handled by checking whether the destination square of a diagonal pawn capture matches the current enPassant square stored in the game state. If it does, the move is flagged as 'ep', and the _applyMove method knows to remove the captured pawn from the square behind the destination rather than from the destination itself.
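The logic can be sketched standalone for a white pawn; whitePawnEpMove and applyEp are hypothetical helpers that compress what _pawnMoves and _applyMove do in the real code:

```javascript
// Sketch: generating and applying an en passant capture for a white pawn.
// White pawns move toward row 0; the captured black pawn sits one row
// behind the destination square (i.e. at to + 8 in this indexing).
function whitePawnEpMove(from, enPassant) {
  const r = from >> 3, f = from & 7;
  for (const df of [-1, 1]) {            // the two diagonal captures
    const nf = f + df;
    if (nf < 0 || nf > 7) continue;
    const to = (r - 1) * 8 + nf;         // white moves up the board
    if (to === enPassant) return { from, to, flags: 'ep' };
  }
  return null;
}

function applyEp(board, move) {
  board[move.to] = board[move.from];
  board[move.from] = 0;
  board[move.to + 8] = 0;  // remove the pawn behind the destination
}
```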
Castling is generated inside the _kingMoves method, but only when the king is not currently in check. The conditions checked are: the relevant castling right flag must be true, the squares between the king and rook must be empty, and the squares the king passes through must not be attacked by the opponent. This last condition is checked using the isAttacked method, which we will examine next.
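A sketch of the white kingside case, assuming the indexing from Chapter One (e1 = 60, f1 = 61, g1 = 62) and taking isAttacked as a supplied function; rights.wk is a hypothetical stand-in for the engine's castling object:

```javascript
// Sketch: the three white kingside castling conditions from the text.
// isAttacked(sq, byColor) is assumed to be provided by the caller.
const BLACK = 1;

function canCastleWK(board, rights, isAttacked) {
  if (!rights.wk) return false;                          // right retained?
  if (board[61] !== 0 || board[62] !== 0) return false;  // f1, g1 empty?
  // King must not be in check nor pass through an attacked square.
  for (const sq of [60, 61, 62]) {
    if (isAttacked(sq, BLACK)) return false;
  }
  return true;
}
```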
The isAttacked method determines whether a given square is attacked by any piece of a given color. It checks for attacks from pawns (which attack diagonally forward), knights (which attack in their characteristic L-shape), sliding pieces (which attack along ranks, files, and diagonals), and kings (which attack the eight adjacent squares). The method is written as a series of pattern checks rather than by generating moves, which makes it faster for the specific purpose of attack detection. Here is the knight attack check, which illustrates the style:
// Knights
for (const [dr, df] of [[-2,-1],[-2,1],[-1,-2],[-1,2],
                        [1,-2],[1,2],[2,-1],[2,1]]) {
  const nr = r + dr, nf = f + df;
  if (nr >= 0 && nr < 8 && nf >= 0 && nf < 8) {
    const kn = byColor === WHITE ? PIECE.WN : PIECE.BN;
    if (b[nr*8 + nf] === kn) return true;
  }
}
The isInCheck method simply finds the king of the given color and calls isAttacked on that square:
isInCheck(color) {
  const kSq = this.findKing(color);
  if (kSq < 0) return false;
  return this.isAttacked(kSq, 1 - color);
}
The findKing method performs a linear scan of the board to find the king. This is called frequently during move generation (once per candidate move in the legality filter), and a linear scan of 64 elements is fast enough in practice, though caching the king's position would be a possible optimization.
The game status — normal, check, checkmate, stalemate, fifty-move draw, insufficient material, or threefold repetition — is computed by the getStatus method. Checkmate and stalemate are distinguished by whether the side to move is in check when it has no legal moves. Threefold repetition is detected by counting how many times the current board hash appears in the position history. Insufficient material is detected by checking whether the remaining pieces on both sides are insufficient to force checkmate (for example, king versus king, or king and bishop versus king).
The board hash used for threefold repetition detection and the transposition table is computed by the _boardHash method. The implementation encodes the board state as a string that includes the board array, the turn, the castling rights, and the en passant square. While a Zobrist hash (a standard technique using random 64-bit integers XORed together) would be faster and more collision-resistant, the string-based approach is simpler to implement correctly and sufficient for a browser-based engine.
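A hash in this spirit can be sketched as follows; the exact field layout of the real _boardHash may differ:

```javascript
// Sketch: a string hash over everything that defines a chess "position"
// for repetition purposes: piece placement, side to move, castling
// rights, and the en passant square.
function boardHash(state) {
  const castle =
    (state.castling.wk ? 'K' : '') + (state.castling.wq ? 'Q' : '') +
    (state.castling.bk ? 'k' : '') + (state.castling.bq ? 'q' : '');
  return state.board.join(',') + '|' + state.turn +
         '|' + (castle || '-') + '|' + state.enPassant;
}
```

Note that two positions with identical piece placement but different castling rights or en passant squares hash differently, which is exactly what correct repetition detection requires.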
CHAPTER THREE: THE EVALUATOR — TEACHING THE ENGINE WHAT IS GOOD
A chess engine that can generate all legal moves from any position is necessary but not sufficient. The engine also needs to be able to judge which positions are good and which are bad. This is the job of the Evaluator class, which computes a numerical score for any given position from the perspective of the side to move.
The evaluation function in RL Chess AI is a classical hand-crafted evaluator with two main components: material counting and piece-square tables.
Material counting assigns a fixed value to each piece type. The values used are standard in chess programming:
Pawn: 100 centipawns
Knight: 320 centipawns
Bishop: 330 centipawns
Rook: 500 centipawns
Queen: 900 centipawns
King: 20000 centipawns (effectively infinite)
The unit of "centipawns" (hundredths of a pawn) is conventional in chess engines. A positive score means white is better; a negative score means black is better. The material score is computed by summing the values of all white pieces and subtracting the values of all black pieces.
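Using the piece encoding from Chapter One, where each black piece is its white counterpart plus 6, material counting reduces to one pass over the array. A sketch:

```javascript
// Sketch: material balance from white's perspective, in centipawns.
// Index 0 is unused (EMPTY); indices 1..6 are P, N, B, R, Q, K.
const VALUES = [0, 100, 320, 330, 500, 900, 20000];

function materialScore(board) {
  let score = 0;
  for (const p of board) {
    if (p === 0) continue;
    if (p <= 6) score += VALUES[p];       // white piece adds its value
    else        score -= VALUES[p - 6];   // black piece subtracts it
  }
  return score;
}
```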
Piece-square tables (PSTs) add a positional bonus or penalty to each piece based on where it stands on the board. The idea is to encode simple strategic principles directly into the evaluation. For example, the knight PST gives high bonuses for central squares and large penalties for corner squares, reflecting the well-known principle that knights are most powerful in the center and weakest on the rim:
knight: [
  -50,-40,-30,-30,-30,-30,-40,-50,
  -40,-20,  0,  0,  0,  0,-20,-40,
  -30,  0, 10, 15, 15, 10,  0,-30,
  -30,  5, 15, 20, 20, 15,  5,-30,
  -30,  0, 15, 20, 20, 15,  0,-30,
  -30,  5, 10, 15, 15, 10,  5,-30,
  -40,-20,  0,  5,  5,  0,-20,-40,
  -50,-40,-30,-30,-30,-30,-40,-50
]
The king has two different PSTs: one for the middlegame and one for the endgame. In the middlegame, the king is heavily penalized for being in the center (where it is vulnerable to attacks) and rewarded for being tucked safely behind pawns on the kingside or queenside. In the endgame, when there are fewer pieces and the danger of mating attacks is reduced, the king is rewarded for being active and centralized. The engine detects the endgame by checking whether the total material on the board has fallen below a threshold of 2600 centipawns, which roughly corresponds to a position where both sides have lost their queens and most of their minor pieces.
The PST for black pieces is obtained by mirroring the white PST vertically. This is done by remapping the square index: for a black piece on square sq, the PST index is computed as (7 - row(sq)) * 8 + col(sq). This elegant trick means that only one set of PSTs needs to be stored, and the same tables serve both colors.
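The remap can be expressed as a single helper (a sketch; pstValue is a hypothetical name, not the engine's):

```javascript
// Sketch: reading one white-oriented PST for both colors.
// For a black piece, flip the row (r becomes 7 - r); the file is unchanged.
function pstValue(pst, sq, isWhitePiece) {
  if (isWhitePiece) return pst[sq];
  const r = sq >> 3, f = sq & 7;
  return pst[(7 - r) * 8 + f];
}
```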
Beyond material and PST values, the evaluator adds a mobility bonus: it generates all legal moves for both sides and adds 5 centipawns per extra move available to white (or subtracts 5 per extra move available to black). This rewards positions where one side has more options, which is a rough proxy for piece activity and coordination.
The total evaluation score is returned from the perspective of white: positive means white is better, negative means black is better. The search algorithms, which we will examine next, negate the score when switching sides, following the "negamax" convention.
CHAPTER FOUR: THE AI BRAIN — FOUR STRATEGIES UNDER ONE ROOF
The most architecturally interesting part of RL Chess AI is its AI system, which is not a single monolithic engine but a collection of four distinct strategies that can be selected by the user and combined in various ways. The AIController class orchestrates these strategies:
class AIController {
  constructor() {
    this.minimax = new MinimaxEngine();
    this.mcts = new MCTSEngine();
    this.rl = new RLSystem();
    this.book = new OpeningBook();
  }
}
The decision pipeline in the getMove method follows a clear priority order. First, the opening book is consulted. If a book move is available for the current position, it is returned immediately, with a small random delay added to make the AI feel more natural. If no book move is available, the RL system is given a chance to suggest a move (with a 25% probability of being followed if it has a suggestion). If neither the book nor RL provides a move, the main search engine is invoked according to the selected strategy.
This layered approach is a smart design decision. Opening theory is well-established and does not need to be searched; using a book is both faster and stronger than any search. The RL system can provide guidance in positions it has seen before, potentially overriding the search in a small fraction of cases. And the search engine handles the vast middle and endgame territory where neither the book nor RL has reliable knowledge.
Let us examine each component in detail.
THE OPENING BOOK
The OpeningBook class stores a collection of opening lines as sequences of (from, to) square index pairs. There are three flavors: standard, aggressive, and defensive. The standard book includes lines like the Ruy Lopez, the Italian Game, the Sicilian Defense, the French Defense, and the Queen's Gambit. The aggressive book favors sharp, attacking lines. The defensive book favors solid, positional setups.
When the engine needs a book move, it compares the game's move history against all stored lines, filtering to those that match the moves played so far and still have at least one more move to suggest. From the matching lines, one is chosen at random, and the next move in that line is returned. This randomization is important: it prevents the AI from always playing the same opening, making it more interesting to play against.
The OpeningBook class also provides the getOpeningName method, which identifies the current opening by matching the SAN (Standard Algebraic Notation) history against a dictionary of named openings. The dictionary includes entries like:
'e4 e5 Nf3 Nc6 Bb5' => "Ruy Lopez (Spanish)"
'e4 c5' => "Sicilian Defense"
'd4 d5 c4' => "Queen's Gambit"
'Nf3' => "Reti Opening"
This name is displayed in the UI's "Opening" panel, giving the user a pleasant sense of context as the game develops.
THE MINIMAX ENGINE WITH ALPHA-BETA PRUNING
The MinimaxEngine is the workhorse of the AI. It implements the minimax algorithm with alpha-beta pruning, iterative deepening, quiescence search, null move pruning, a transposition table, killer move heuristic, and history heuristic. This is a formidable collection of techniques, each of which contributes meaningfully to the engine's strength.
The minimax algorithm explores the game tree by alternating between maximizing and minimizing players. At each node, it evaluates the position if the depth limit has been reached, or recursively explores all child positions otherwise. Alpha-beta pruning dramatically reduces the number of nodes that need to be explored by maintaining a window of acceptable scores: if a position is found to be worse than what the current player can already guarantee elsewhere, it is pruned without further exploration.
The engine uses the "negamax" formulation of alpha-beta, where the score is always computed from the perspective of the side to move and negated when returning to the parent. This simplifies the code because there is no need to distinguish between the maximizing and minimizing player:
alphaBeta(game, depth, alpha, beta, color, ply = 0) {
  this.nodes++;
  // ... (time check, transposition table lookup) ...
  if (depth === 0) {
    return useQ
      ? this.quiescence(game, alpha, beta, color)
      : Evaluator.evaluate(game) * (color === WHITE ? 1 : -1);
  }
  // ... (null move pruning, move ordering) ...
  for (const move of moves) {
    const undo = game._applyMove(move);
    const score = -this.alphaBeta(game, depth - 1, -beta, -alpha, 1 - color, ply + 1);
    game._undoMove(move, undo);
    if (score > bestScore) { bestScore = score; bestMove = move; }
    if (score > alpha) { alpha = score; }
    if (alpha >= beta) { /* beta cutoff */ break; }
  }
  // ...
}
Iterative deepening means the engine searches to depth 1 first, then depth 2, then depth 3, and so on up to the configured maximum depth. This has several advantages: it provides a best move estimate at every depth (useful for time management), it populates the transposition table with information from shallower searches that can guide deeper searches, and it allows the search to be interrupted at any time with a reasonable move available.
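The driver loop can be sketched as follows, with search standing in for one complete fixed-depth alpha-beta search; the last fully completed depth always leaves a usable result behind:

```javascript
// Sketch: iterative deepening with a time budget. 'search(depth)' is a
// placeholder for a full fixed-depth search returning a best-move result.
function iterativeDeepening(search, maxDepth, timeLimitMs) {
  const start = Date.now();
  let best = null;
  for (let depth = 1; depth <= maxDepth; depth++) {
    const result = search(depth);
    if (Date.now() - start > timeLimitMs) break;  // out of time
    best = result;   // keep only fully completed depths
  }
  return best;
}
```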
The time limit is enforced by checking the elapsed time every 2048 nodes (using the bitwise trick "(this.nodes & 2047) === 0" to avoid calling Date.now() on every node). If the time limit has been exceeded, the search returns the static evaluation of the current position, effectively cutting off the search at that branch.
Quiescence search addresses the "horizon effect," a well-known problem in chess programming where the engine may evaluate a position as good simply because it has not looked far enough ahead to see a pending capture. The quiescence search extends the search beyond the nominal depth limit, but only for capture moves, until a "quiet" position (one with no pending captures) is reached. The stand-pat score — the static evaluation of the current position — is used as a lower bound: if the position is already good enough without making any capture, the search can return immediately.
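In outline, quiescence search looks like the following sketch, written in the negamax convention; game.evaluate and game.generateCaptures are assumed interfaces standing in for the engine's evaluator and capture-only move generation:

```javascript
// Sketch: quiescence search with a stand-pat lower bound. Only capture
// moves are searched, so the recursion terminates at "quiet" positions.
function quiescence(game, alpha, beta) {
  const standPat = game.evaluate();      // static score, side to move
  if (standPat >= beta) return beta;     // already good enough: stand pat
  if (standPat > alpha) alpha = standPat;

  for (const move of game.generateCaptures()) {
    const undo = game.applyMove(move);
    const score = -quiescence(game, -beta, -alpha);
    game.undoMove(move, undo);
    if (score >= beta) return beta;      // beta cutoff
    if (score > alpha) alpha = score;
  }
  return alpha;
}
```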
Null move pruning is a powerful technique that speeds up the search by making a "null move" — passing the turn to the opponent without moving — and searching the resulting position at a reduced depth. If the opponent cannot improve their position even with an extra move, the current position is likely so good for the moving side that a beta cutoff can be applied. The engine applies null move pruning when the depth is at least 3, the current player is not in check, and the search is not at the root. The reduction is 3 plies (depth - 3), which is aggressive but effective.
Move ordering is critical for alpha-beta efficiency. The engine orders moves using a combination of four heuristics. The transposition table move (the best move found in a previous search of the same position) is tried first. Captures are ordered by the MVV-LVA (Most Valuable Victim, Least Valuable Attacker) heuristic: capturing a queen with a pawn is tried before capturing a pawn with a queen. Killer moves — quiet moves that caused a beta cutoff at the same depth in a sibling branch — are tried next. Finally, the history heuristic assigns a score to each (from, to) pair based on how often it caused a beta cutoff in previous searches, with the score increasing quadratically with depth.
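The MVV-LVA part can be sketched as a scoring function over capture moves (piece codes follow the PIECE encoding; VAL and the victim weight of 10 are illustrative choices, not necessarily the engine's exact constants):

```javascript
// Sketch: MVV-LVA capture scoring. Higher score = search earlier.
// The victim's value dominates; a cheaper attacker breaks ties upward.
const VAL = [0, 100, 320, 330, 500, 900, 20000];
const pieceValue = p => VAL[p > 6 ? p - 6 : p];  // black = white + 6

function mvvLva(move, board) {
  if (!move.capture) return 0;
  return 10 * pieceValue(move.capture) - pieceValue(board[move.from]);
}

// Sorting captures by descending score puts PxQ before QxP.
const orderCaptures = (moves, board) =>
  moves.slice().sort((a, b) => mvvLva(b, board) - mvvLva(a, board));
```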
The transposition table stores previously computed search results indexed by board hash. Each entry records the depth of the search, the score, the type of bound (exact, lower, or upper), and the best move found. When the same position is encountered again at the same or lesser depth, the stored result can be returned immediately, avoiding redundant computation. The table has a maximum size of 500,000 entries and evicts 10% of its entries when full, using a simple FIFO-like eviction strategy.
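A minimal probe/store pair illustrates the bound logic; this sketch omits the stored best move and the size cap described above:

```javascript
// Sketch: transposition table with exact/lower/upper bound handling.
// A stored result is usable only if it was searched at least as deeply
// as the current request.
const tt = new Map();

function ttStore(hash, depth, score, flag) {  // flag: 'exact'|'lower'|'upper'
  tt.set(hash, { depth, score, flag });
}

function ttProbe(hash, depth, alpha, beta) {
  const e = tt.get(hash);
  if (!e || e.depth < depth) return null;       // too shallow to trust
  if (e.flag === 'exact') return e.score;
  if (e.flag === 'lower' && e.score >= beta) return e.score;   // cutoff
  if (e.flag === 'upper' && e.score <= alpha) return e.score;  // cutoff
  return null;  // bound does not resolve the current window
}
```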
Late Move Reduction (LMR) is also implemented: for moves beyond the fourth in the ordered list, at depth 3 or greater, the engine first searches with a reduced depth (depth - 2) and a null window. If the reduced search fails high (suggesting the move may be better than expected), it is re-searched at full depth. This heuristic is based on the observation that later moves in the ordered list are less likely to be good, so they can be searched more shallowly without much loss of accuracy.
MONTE CARLO TREE SEARCH
The MCTSEngine implements the Monte Carlo Tree Search algorithm, which takes a fundamentally different approach from minimax. Instead of exhaustively exploring the game tree to a fixed depth, MCTS builds a search tree incrementally by repeatedly running random simulations (called "rollouts" or "playouts") from the current position to the end of the game, and using the results to guide future exploration.
Each node in the MCTS tree is an MCTSNode object that stores a snapshot of the board state (board array, turn, castling rights, en passant square, half-move clock, and full-move count), a reference to the parent node, a list of child nodes, and statistics (number of wins and visits). The snapshot approach means that the game state can be reconstructed at any node by calling _makeGame, which creates a new ChessGame object and populates it from the snapshot.
The MCTS algorithm proceeds in four phases, repeated for a fixed number of iterations (at least 200, scaled by the configured depth). In the selection phase, the algorithm traverses the tree from the root, always choosing the child with the highest UCB1 (Upper Confidence Bound) score, until it reaches a node with unexplored children or a terminal node. The UCB1 formula balances exploitation (choosing the child with the highest win rate) against exploration (choosing less-visited children):
UCB1 = (wins / visits) + C * sqrt(ln(parent.visits) / visits)
where C is the exploration constant, set to the square root of 2 (Math.SQRT2 in the code). In the expansion phase, one of the unexplored children is added to the tree. In the simulation phase, a random playout is run from the new node: moves are chosen randomly (with a bias toward captures, which are selected 60% of the time when available) until the game ends or a depth limit of 60 half-moves is reached. If the depth limit is reached without a terminal position, the static evaluator is used to determine the winner. In the backpropagation phase, the result of the simulation is propagated back up the tree, updating the win and visit counts of all ancestor nodes.
At the end of all iterations, the best child of the root is selected as the engine's move. The selection criterion is maximum visits rather than maximum win rate, because the visit count is a more robust statistic in the presence of noisy simulation results.
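The selection rule follows directly from the UCB1 formula. In this sketch, unvisited children receive infinite priority so that each child is simulated at least once:

```javascript
// Sketch: UCB1 child selection with C = sqrt(2), as in the text.
const C = Math.SQRT2;

function ucb1(child, parentVisits) {
  if (child.visits === 0) return Infinity;  // explore unvisited first
  return child.wins / child.visits +
         C * Math.sqrt(Math.log(parentVisits) / child.visits);
}

function selectChild(node) {
  let best = null, bestScore = -Infinity;
  for (const child of node.children) {
    const s = ucb1(child, node.visits);
    if (s > bestScore) { bestScore = s; best = child; }
  }
  return best;
}
```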
MCTS has several appealing properties for a chess engine. It does not require a hand-crafted evaluation function (though the implementation uses one as a fallback). It naturally balances exploration and exploitation. And it can be parallelized easily, though the current implementation is single-threaded. Its weakness compared to minimax is that it tends to be weaker in tactically sharp positions where precise calculation is needed, because random playouts can miss forced sequences.
THE REINFORCEMENT LEARNING SYSTEM
The RLSystem class implements a form of Q-learning, one of the foundational algorithms of reinforcement learning. Q-learning maintains a table of Q-values, where each entry Q(s, a) represents the expected long-term reward of taking action a (a chess move) in state s (a board position). Over time, as the engine plays games and observes the outcomes, these Q-values are updated to better reflect the true value of each action.
The state is represented by the board hash (the same string hash used for the transposition table and threefold repetition detection). The action is represented by the (from, to) square pair, encoded as the string "from-to". The Q-table is a plain JavaScript object mapping board hashes to sub-objects mapping action strings to Q-values.
The move selection policy uses epsilon-greedy exploration: with probability epsilon (set to 0.1, meaning 10% of the time), the RL system selects a random move; otherwise, it selects the move with the highest Q-value for the current board hash. This balance between exploration (trying new things) and exploitation (using what has been learned) is fundamental to reinforcement learning.
selectMove(game, moves) {
  if (Math.random() < epsilon)
    return moves[Math.floor(Math.random() * moves.length)];
  const hash = game._boardHash();
  const qVals = rlState.qTable[hash];
  if (!qVals) return null;
  let best = null, bestQ = -Infinity;
  for (const m of moves) {
    const k = `${m.from}-${m.to}`;
    const q = qVals[k] || 0;
    if (q > bestQ) { bestQ = q; best = m; }
  }
  return best;
}
During a game, every move made by the AI is recorded in the gameHistory array as a (hash, move) pair. At the end of the game, the learnFromGame method performs the Q-value update. The reward signal is simple: +1 for a win, +0.1 for a draw, and -1 for a loss. The update propagates backward through the game history using a discounted return:
learnFromGame(winner, aiColor) {
  const reward = winner === aiColor ? 1 : winner === -1 ? 0.1 : -1;
  const lr = rlState.learningRate;  // 0.03
  const disc = rlState.discount;    // 0.95
  let R = reward;
  for (let i = rlState.gameHistory.length - 1; i >= 0; i--) {
    const { hash, move } = rlState.gameHistory[i];
    if (!rlState.qTable[hash]) rlState.qTable[hash] = {};
    const k = `${move.from}-${move.to}`;
    rlState.qTable[hash][k] =
      (rlState.qTable[hash][k] || 0) +
      lr * (R - (rlState.qTable[hash][k] || 0));
    R *= disc;
  }
}
The update rule is a simplified form of the Monte Carlo Q-learning update: the Q-value for each (state, action) pair is nudged toward the discounted return R by a learning rate of 0.03. The discount factor of 0.95 means that rewards received earlier in the game are worth slightly less than rewards received later, which is a standard technique for handling delayed rewards in reinforcement learning.
The Q-table and the win/draw/loss statistics are persisted to the browser's localStorage under the key 'mini_chess_rl'. This means that the RL system's knowledge accumulates across sessions: the more games are played, the more the Q-table grows, and the more informed the RL system's move suggestions become. The UI displays the current state of the Q-table (number of Q-states) and the RL win/draw/loss record in the "RL Learning" panel.
It is important to understand what the RL system is and is not doing here. It is not learning a deep neural network policy like AlphaZero. It is learning a shallow lookup table that maps board positions to move preferences. This table will only be useful for positions that have been seen before, and the number of distinct chess positions is astronomically large, so the table will remain sparse for a very long time. The practical effect is that the RL system provides a mild influence on the AI's play in positions that recur frequently (such as common opening positions), while the minimax or MCTS engine handles the vast majority of decisions.
THE HYBRID MODE
In the default "Hybrid" mode, the AI combines all four strategies in a priority cascade. The opening book is consulted first. If no book move is available, the RL system is given a 25% chance to override the search with its best Q-table move. If neither applies, the minimax engine runs with the configured depth and time limit. The "Neural Evaluation" mode is a variant of minimax where the top candidates from the static evaluator are used to reorder the search results, giving a slight bias toward moves that look good according to the PST-based evaluation.
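The cascade itself is only a few lines of control flow. The sketch below is a plausible reconstruction, with the stages injected as stub functions (bookMove, bestQTableMove, and minimaxMove are hypothetical names, not the file's actual identifiers):

```javascript
// Priority cascade for Hybrid mode: book first, then a 25% chance of an
// RL override, then full minimax search as the fallback.
// The stage functions are injected so the cascade itself is testable.
function hybridMove(game, { bookMove, bestQTableMove, minimaxMove, random = Math.random }) {
  const book = bookMove(game);
  if (book) return { move: book, source: 'book' };

  const rl = bestQTableMove(game);
  if (rl && random() < 0.25) return { move: rl, source: 'rl' };

  return { move: minimaxMove(game), source: 'minimax' };
}
```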
This layered architecture means that the AI's behavior changes qualitatively as the game progresses. In the opening, it plays book moves that reflect established theory. In the early middlegame, it may occasionally follow RL-guided moves from previous games. In the main game, it uses the full power of minimax with all its optimizations. This creates a natural and varied playing style.
CHAPTER FIVE: THE USER INTERFACE — BEAUTY MEETS FUNCTION
A chess engine without a user interface is just a library. RL Chess AI wraps its engine in a polished, responsive web interface built entirely with HTML and CSS, with no external dependencies. The visual design uses a dark color scheme with teal and gold accents, defined through CSS custom properties (variables):
:root {
--primary: #009999; /* Teal */
--accent: #e8b84b; /* Gold */
--bg: #0f0f1a; /* Very dark navy */
--card: #1a1a2e; /* Dark navy for cards */
--light-square: #f0d9b5; /* Classic light chess square */
--dark-square: #b58863; /* Classic dark chess square */
}
The use of CSS custom properties is a deliberate design decision that makes theming and maintenance straightforward. All colors are defined in one place, and changing the theme requires modifying only the :root block.
The layout is a three-column CSS grid: a left panel for game settings and RL statistics, a central board area, and a right panel for evaluation, move history, opening name, and PGN export. On smaller screens, the grid collapses to a single column with the board at the top, making the application usable on tablets and phones. The responsive breakpoints are at 1200px, 960px, and 520px, with the board shrinking from 560x560 pixels (70px squares) down to 320x320 pixels (40px squares) on the smallest screens.
The chess board itself is rendered as an 8x8 CSS grid of square div elements. Each square is a div with a background color determined by its position (light or dark), and the piece is rendered as a Unicode chess symbol inside the div. The Unicode symbols used are the standard chess piece characters: ♙♘♗♖♕♔ for white pieces and ♟♞♝♜♛♚ for black pieces. These are rendered in a large font size (52px at the default board size) using a font stack that prioritizes "Segoe UI Symbol" and "Apple Color Emoji" for good cross-platform rendering.
The board rendering function, renderBoard, rebuilds the entire board DOM on every update. This is not the most efficient approach (a more sophisticated implementation would update only the changed squares), but it is simple and correct, and the performance is more than adequate for a chess game where the board changes at most once per second.
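A stripped-down version of that rebuild-everything approach might look as follows; the PIECE_CHARS mapping, the signed piece codes, and the index-0-equals-a8 layout are assumptions for illustration:

```javascript
// Unicode glyphs indexed by signed piece codes (white positive, black
// negative) — a simple lookup object stands in for the real mapping.
const PIECE_CHARS = { 1: '♙', 2: '♘', 3: '♗', 4: '♖', 5: '♕', 6: '♔',
                      '-1': '♟', '-2': '♞', '-3': '♝', '-4': '♜', '-5': '♛', '-6': '♚' };

// On a 0..63 board with index 0 = a8, a square is light when
// rank + file is even.
function isLightSquare(index) {
  const rank = Math.floor(index / 8), file = index % 8;
  return (rank + file) % 2 === 0;
}

// Throw away the old DOM and rebuild all 64 squares from scratch.
function renderBoard(board, boardEl) {
  boardEl.innerHTML = '';
  for (let i = 0; i < 64; i++) {
    const sq = document.createElement('div');
    sq.className = 'square ' + (isLightSquare(i) ? 'light' : 'dark');
    sq.dataset.index = i;
    if (board[i] !== 0) sq.textContent = PIECE_CHARS[board[i]];
    boardEl.appendChild(sq);
  }
}
```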
Visual feedback is provided through CSS classes applied to squares. The selected square gets a blue highlight, the last move's source and destination squares get a green highlight, and a square whose king is in check gets a red highlight. Legal move destinations are shown as small dots (using a CSS pseudo-element with a circular background) when move hints are enabled.
Piece interaction is supported through both click-to-move and drag-and-drop. Click-to-move works by tracking the selected square in the UI state object: the first click on a friendly piece selects it and highlights its legal moves, and a second click on a highlighted destination executes the move. Drag-and-drop uses the HTML5 Drag and Drop API: the dragstart event sets the source square, and the drop event executes the move if the destination is legal.
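The click-to-move logic reduces to a small pure function over the selection state. This is an illustrative model rather than the file's literal handler; legalTargets would come from the move generator:

```javascript
// Given the current selection and a clicked square, decide what happens.
// Returns the new selection plus an optional move to execute.
function handleSquareClick(selected, clicked, isFriendlyPiece, legalTargets) {
  if (selected !== null && legalTargets.includes(clicked)) {
    return { selected: null, move: { from: selected, to: clicked } }; // second click: move
  }
  if (isFriendlyPiece(clicked)) {
    return { selected: clicked, move: null };  // first click: select and show hints
  }
  return { selected: null, move: null };       // click elsewhere: deselect
}
```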
Pawn promotion is handled by a modal dialog that appears when a pawn reaches the last rank. The modal displays the four possible promotion pieces as large Unicode symbols, and clicking one executes the promotion move with the chosen piece.
The game-over modal displays the result with an appropriate icon (a trophy for a win, a handshake for a draw, a skull for a loss) and a message explaining the reason (checkmate, stalemate, fifty-move rule, insufficient material, or threefold repetition). The modal appears with a 600-millisecond delay after the game ends, giving the board time to render the final position before the overlay appears.
The evaluation bar is a vertical bar on the left side of the board that shows the current position evaluation as a percentage. A score of 0 (equal position) corresponds to 50% height. A score of +800 centipawns or more (a large white advantage) fills the bar completely, and -800 or less empties it. The bar is updated asynchronously after each move, with a 50-millisecond debounce to avoid unnecessary computation.
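The score-to-height mapping is a clamp followed by a rescale. A minimal sketch using the ±800 centipawn bounds described above (the function name is illustrative):

```javascript
// Map a centipawn score to the eval bar's fill height in percent:
// 0 -> 50%, +800 or more -> 100%, -800 or less -> 0%.
function evalBarHeight(centipawns) {
  const clamped = Math.max(-800, Math.min(800, centipawns));
  return 50 + (clamped / 800) * 50;
}
```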
The move history panel on the right shows the game's moves in SAN notation, formatted in two columns (white's move and black's move) with move numbers. The current move is highlighted. The opening name panel shows the name of the current opening, updated after each move. The PGN panel shows the game in Portable Game Notation format, which can be copied and imported into other chess tools.
The thinking display shows the AI's search progress in real time: each depth of the iterative deepening search produces a line showing the depth, score, node count, and nodes per second. For MCTS, it shows the simulation count. This transparency into the AI's thought process is both educational and engaging.
The sound engine uses the Web Audio API to generate synthesized sound effects for moves, captures, checks, castling, and game over. The sounds are generated programmatically using oscillators and gain nodes, with different waveforms and frequencies for different events. A regular move uses a square wave at 440 Hz; a capture uses a sawtooth wave at 220 Hz; a check uses two sine tones at 880 Hz and 660 Hz in quick succession; castling uses two sine tones at 330 Hz and 440 Hz; and game over plays a four-note ascending arpeggio. This is a delightful touch that adds immersion without requiring any audio files.
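A sketch of how such an engine can be organized: a pure table of tone recipes (the waveforms and frequencies below follow the description above; the durations and gain levels are assumptions) plus a player that schedules oscillator and gain nodes:

```javascript
// Tone recipes for each game event: a sequence of notes, each with a
// waveform, a frequency in Hz, and a duration in seconds.
const SOUNDS = {
  move:    [{ type: 'square',   freq: 440, dur: 0.08 }],
  capture: [{ type: 'sawtooth', freq: 220, dur: 0.10 }],
  check:   [{ type: 'sine', freq: 880, dur: 0.07 }, { type: 'sine', freq: 660, dur: 0.07 }],
  castle:  [{ type: 'sine', freq: 330, dur: 0.08 }, { type: 'sine', freq: 440, dur: 0.08 }],
};

// Play an event's notes back to back through a shared AudioContext.
// Passing null (e.g. when audio is unavailable) makes it a no-op.
function playSound(ctx, event) {
  if (!ctx) return;
  let t = ctx.currentTime;
  for (const note of SOUNDS[event] || []) {
    const osc = ctx.createOscillator();
    const gain = ctx.createGain();
    osc.type = note.type;
    osc.frequency.value = note.freq;
    gain.gain.setValueAtTime(0.2, t);                            // attack level
    gain.gain.exponentialRampToValueAtTime(0.001, t + note.dur); // quick decay
    osc.connect(gain).connect(ctx.destination);
    osc.start(t);
    osc.stop(t + note.dur);
    t += note.dur;
  }
}
```

In the browser, a single `AudioContext` is created once (typically after the first user gesture, as autoplay policies require) and reused for every call.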
The session statistics panel tracks wins, draws, and losses across games within a browser session, persisted to localStorage. The RL learning panel shows the AI's cumulative win/draw/loss record from Q-learning and the number of Q-states in the table, giving the user a sense of how much the AI has learned.
CHAPTER SIX: CONTROL FLOW AND STATE MANAGEMENT
The application's control flow is event-driven and asynchronous. The global state is managed through a small set of module-level variables: the current game object, the game mode, the human player's color, flags for whether the AI is currently thinking or an AI-vs-AI game is running, the board flip state, the UI state (selected square, possible moves, last move), the clock state, the session statistics, and the RL state.
The gameGeneration counter is a particularly clever piece of state management. Every time a new game is started, gameGeneration is incremented. The doAIMove function captures the current generation at the start of its execution and checks it at every asynchronous boundary. If the generation has changed (because the user started a new game while the AI was thinking), the function returns without executing the move. This prevents stale AI moves from being applied to a new game, which would be a subtle and hard-to-debug bug.
The doAIMove function is asynchronous (declared with "async") and calls the AI's getMove method, which is also asynchronous. The asynchronous nature is important because the search can take several seconds at higher depths, and blocking the browser's main thread would freeze the UI. The search is wrapped in a Promise that resolves after a setTimeout with a delay of 0, which yields control to the browser's event loop and allows the UI to remain responsive during the search.
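Both patterns together fit in a dozen lines. The following is a simplified skeleton, with computeMove standing in for the real search:

```javascript
let gameGeneration = 0;

// Yield to the event loop so the UI can repaint between search steps.
const yieldToBrowser = () => new Promise(resolve => setTimeout(resolve, 0));

// Returns the chosen move, or null if a new game started mid-search.
async function doAIMove(game, computeMove) {
  const myGen = gameGeneration;   // capture the generation up front
  await yieldToBrowser();         // async boundary: let the UI breathe
  const move = await computeMove(game);
  if (myGen !== gameGeneration) return null;  // stale: a new game began
  return move;
}
```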
The executeMove function is the central hub of the game loop. It applies the move to the game state, plays the appropriate sound, re-renders the board, updates all UI panels, checks for game-over conditions, and schedules the next AI move if necessary. The game-over check is comprehensive: it tests for checkmate, stalemate, fifty-move draw, insufficient material, and threefold repetition, in that order.
The undo functionality undoes two moves at a time in human-vs-AI mode (the human's last move and the AI's last response), or one move at a time in human-vs-human mode. Undoing a move in AI-vs-AI mode is not supported (the AI-vs-AI game is simply stopped). The undo operation increments gameGeneration to cancel any pending AI move, then calls undoLastMove on the game object, which pops the last entry from the move history and restores the previous board state from the saved undo information.
CHAPTER SEVEN: DESIGN DECISIONS AND THEIR RATIONALE
Having examined all the components of RL Chess AI, it is worth stepping back and reflecting on the key architectural decisions and why they were made.
The decision to implement everything in a single HTML file is the most fundamental. It makes the application trivially easy to deploy and share: just open the file in a browser. There are no build steps, no package managers, no servers. The trade-off is that the file is large and all concerns (HTML, CSS, JavaScript, game logic, AI, UI) are co-located. For a project of this scale, the trade-off is clearly worth it.
The choice of a flat 64-element array for the board, rather than a bitboard representation, prioritizes readability and correctness over raw performance. A bitboard engine can generate moves an order of magnitude faster, but it is also much harder to implement correctly. For a browser-based engine where the search depth is limited to 6 plies and the time limit is a few seconds, the flat array is fast enough.
The decision to support four different search strategies (Hybrid, Pure Minimax, MCTS, Neural) is primarily pedagogical and experimental. It allows users to compare the behavior of different AI approaches and observe their strengths and weaknesses. The Hybrid mode is the default and the strongest, but the ability to switch to pure MCTS or pure Neural evaluation makes the application a useful teaching tool.
The reinforcement learning system is the most unconventional component. A Q-table over raw board positions is not a practical approach to learning chess at a high level — the state space is simply too large. But it is a correct and illustrative implementation of Q-learning that demonstrates the core ideas of reinforcement learning in a concrete, interactive setting. As the user plays more games, the Q-table grows, and the RL system's influence on the AI's play becomes more visible. This is a genuine, if modest, form of machine learning happening in the browser.
The use of localStorage for persistence is a pragmatic choice. It requires no server, no database, and no authentication. The Q-table can grow large over many games, and localStorage has a limit of typically 5-10 MB per origin, but for a Q-table indexed by board hashes, this is unlikely to be a problem in practice.
The sound engine using the Web Audio API rather than audio files is a clever choice that keeps the application self-contained. The synthesized sounds are not as rich as recorded sounds, but they are distinctive and informative, and they add significantly to the user experience.
CHAPTER EIGHT: POSSIBLE EXTENSIONS
RL Chess AI is a complete and polished application, but there are many directions in which it could be extended. Here are the most promising ones, each with a brief rationale.
A Zobrist hashing scheme would replace the current string-based board hash with a fast 64-bit integer hash computed by XORing random values for each piece on each square. This would make the transposition table lookups and threefold repetition detection significantly faster, which would allow the engine to search deeper in the same amount of time.
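A toy version of the scheme, using BigInt and a splitmix64-style generator for the random table (a real engine would precompute the table once at startup and fold in side-to-move, castling, and en passant keys as well):

```javascript
// One random 64-bit value per (piece type, square): 12 x 64 entries.
function makeZobristTable(seed = 0x9e3779b97f4a7c15n) {
  let s = seed;
  const next = () => {
    s = (s + 0x9e3779b97f4a7c15n) & 0xffffffffffffffffn;
    let z = s;
    z = ((z ^ (z >> 30n)) * 0xbf58476d1ce4e5b9n) & 0xffffffffffffffffn;
    z = ((z ^ (z >> 27n)) * 0x94d049bb133111ebn) & 0xffffffffffffffffn;
    return z ^ (z >> 31n);
  };
  const table = [];
  for (let p = 0; p < 12; p++) table.push(Array.from({ length: 64 }, next));
  return table;
}

// Full hash: XOR the value of every piece on its square.
function hashBoard(board, table) {
  let h = 0n;
  for (let sq = 0; sq < 64; sq++) {
    const p = board[sq];
    if (p !== 0) h ^= table[p > 0 ? p - 1 : 5 - p][sq]; // map ±1..±6 to 0..11
  }
  return h;
}
```

The payoff is incremental hashing: making a move updates the hash with two XORs (remove the piece from its origin, add it at the destination) instead of rescanning all 64 squares.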
A neural network evaluation function, implemented using TensorFlow.js or ONNX Runtime Web, would replace the hand-crafted PST evaluator with a learned function that can capture more subtle positional features. This is the approach used by modern top engines like Leela Chess Zero and the neural network version of Stockfish (NNUE). The network could be pre-trained on a database of grandmaster games and then fine-tuned using the RL system.
A deep Q-network (DQN) would replace the Q-table with a neural network that maps board states to Q-values. This would allow the RL system to generalize to unseen positions, overcoming the fundamental limitation of the current Q-table approach. Combined with self-play training (the AI-vs-AI mode already provides the infrastructure for this), a DQN could learn to play at a significantly higher level over time.
A proper time management system would replace the current fixed time limit per move with a more sophisticated algorithm that allocates time based on the game clock, the complexity of the position, and the number of moves remaining. This would make the engine more competitive in timed games.
An endgame tablebase lookup would allow the engine to play perfectly in positions with few pieces remaining. Endgame tablebases for up to 7 pieces are available online and could be accessed via a web API.
A FEN (Forsyth-Edwards Notation) import/export feature would allow users to set up arbitrary positions and analyze them, not just play from the starting position. This would make the application more useful as an analysis tool.
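Parsing the piece-placement field of a FEN string into the engine's flat 64-element array is straightforward. A sketch, assuming a signed-integer piece encoding (white positive, black negative) and an index-0-equals-a8 layout:

```javascript
// Map FEN letters to signed piece codes (white positive, black negative).
const FEN_PIECES = { P: 1, N: 2, B: 3, R: 4, Q: 5, K: 6,
                     p: -1, n: -2, b: -3, r: -4, q: -5, k: -6 };

// Parse only the first FEN field (piece placement) into a 64-cell array,
// rank by rank from the 8th rank down, matching a top-left-first layout.
function fenToBoard(fen) {
  const board = new Array(64).fill(0);
  const placement = fen.split(' ')[0];
  let sq = 0;
  for (const ch of placement) {
    if (ch === '/') continue;                      // rank separator
    if (ch >= '1' && ch <= '8') sq += Number(ch);  // run of empty squares
    else board[sq++] = FEN_PIECES[ch];
  }
  return board;
}
```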
A principal variation display would show the engine's main line of thought — the sequence of moves it considers best for both sides — in the thinking panel, rather than just the current depth and score. This is standard in professional chess interfaces and greatly aids understanding.
Multiplayer support via WebSockets or WebRTC would allow two human players to play against each other over the internet, with the AI available as an analysis tool. This would transform the application from a single-player experience into a social one.
ADDENDUM: USER GUIDE FOR RL CHESS AI
This section is a complete guide for anyone who wants to play with or explore RL Chess AI. No prior knowledge of chess programming is required, though a basic familiarity with the rules of chess is assumed.
GETTING STARTED
To open the application, simply open the HTML file in any modern web browser (Chrome, Firefox, Edge, or Safari). The application will load immediately and start a new game automatically. You will see a dark-themed interface with the chess board in the center, a left panel with game settings and AI options, and a right panel with evaluation, move history, and opening information.
CHOOSING YOUR GAME MODE
The "Game Mode" dropdown in the left panel offers three options. "Human vs AI" is the default mode, where you play against the computer. "AI vs AI" lets you watch two AI engines play each other — a fascinating way to observe the different search strategies in action. "Human vs Human" allows two people to play on the same device, taking turns.
In "Human vs AI" mode, the "Play As" dropdown lets you choose whether to play as White, Black, or Random. If you choose Black, the board will automatically flip so that your pieces are at the bottom.
SETTING THE DIFFICULTY
The "AI Difficulty (Depth)" slider controls how deeply the minimax engine searches. The range is 1 (Beginner) to 6 (Expert). At depth 1, the engine looks only one move ahead and plays very weakly. At depth 3 (the default), it plays at a solid amateur level. At depth 6, it searches deeply and plays quite strongly, though it may take several seconds per move. For a comfortable game, depth 3 or 4 is recommended for most players.
SETTING THE TIME CONTROL
The "Time Control" dropdown sets a clock for each player. Options range from "No Limit" (unlimited time per move) to "30 min (Classical)". When a time control is active, each player's clock is displayed next to their name above and below the board. If a player's clock runs out, they lose on time. Note that the AI does not respect the time control in its search — it uses the depth setting to limit its thinking time.
CHOOSING THE AI STRATEGY
The "Search Strategy" dropdown in the "AI Engine" card offers four options. "Hybrid (Neural + Minimax)" is the default and strongest mode, combining the opening book, RL guidance, and minimax search. "Pure Minimax + alpha-beta" uses only the minimax engine without RL influence. "Monte Carlo Tree Search" uses MCTS instead of minimax — it plays differently and can be interesting to compare. "Neural Evaluation" uses minimax but with a neural-style evaluation ordering.
CHOOSING THE OPENING BOOK
The "Opening Book" dropdown controls which opening lines the AI follows. "Standard" includes a variety of classical openings. "Aggressive" biases toward sharp, attacking lines. "Defensive" biases toward solid, positional setups. "Off" disables the opening book entirely, forcing the AI to calculate from move 1.
TOGGLING ADVANCED FEATURES
Four toggle switches control advanced AI features. "Reinforcement Learning" enables or disables the Q-learning system. When enabled, the AI may occasionally play RL-guided moves and will learn from the outcomes of games. "Quiescence Search" enables or disables the quiescence extension in the minimax search. Disabling it makes the engine faster but weaker, as it becomes susceptible to the horizon effect. "Null Move Pruning" enables or disables null move pruning. Disabling it makes the search more thorough but slower. "Move Hints" controls whether legal move destinations are shown as dots when you click a piece.
MAKING YOUR MOVES
To move a piece, click on it to select it. Legal destination squares will be highlighted with small dots (if move hints are enabled) and the selected piece's square will turn blue. Click on a destination square to execute the move. Alternatively, you can drag and drop pieces directly to their destination squares.
When a pawn reaches the last rank, a promotion dialog will appear, showing the four possible promotion pieces (queen, rook, bishop, knight) as large symbols. Click the piece you want to promote to.
READING THE INTERFACE
The evaluation bar on the left side of the board shows the current position assessment. When the bar is above the midpoint, white is better; below the midpoint, black is better. The score in centipawns is shown in the "Evaluation" panel on the right.
The "Thinking" panel shows the AI's search progress in real time. Each line shows the search depth, the score, the number of nodes searched, and the nodes per second. For MCTS, it shows the simulation count.
The "Move History" panel shows all moves played in SAN notation. The "Opening" panel shows the name of the current opening. The "PGN" panel shows the game in Portable Game Notation, which you can copy and import into other chess tools like Lichess or Chess.com for further analysis.
The "RL Learning" panel shows the AI's cumulative win/draw/loss record from Q-learning and the number of Q-states in the table. This grows as you play more games.
The "Session Stats" panel shows your wins, draws, and losses in the current browser session.
CONTROLLING THE GAME
The "New Game" button (also available in the header) starts a fresh game with the current settings. The "Undo" button takes back the last move (or the last two moves in Human vs AI mode, undoing both your move and the AI's response). The "Flip Board" button rotates the board 180 degrees. The "Analyze" button (if present) shows a detailed analysis of the current position, including material balance, move count, check status, castling rights, and en passant availability.
In "AI vs AI" mode, a "Start" button begins the automated game and a "Stop" button halts it.
TIPS FOR PLAYING WELL
The AI is strongest in the middlegame, where its minimax search can find tactical combinations. You may find it easier to outplay the AI in the endgame, where precise technique is required and the search depth may not be sufficient to see all the consequences of each move. Choosing an aggressive opening book and a low difficulty setting is a good way to start if you are new to the game. As you improve, increase the depth and switch to the standard or defensive opening book.
The RL system improves over time, so the AI may play slightly differently after many games than it did at the start. If you want to reset the AI's learned knowledge, you can clear the browser's localStorage for the application's origin.
TROUBLESHOOTING
If the board appears blank or the game does not start, try refreshing the browser. If the AI takes too long to move, reduce the difficulty depth. If sounds are not playing, check that your browser's audio is not muted and that the Web Audio API is supported (it is supported in all modern browsers). If the application runs slowly on a mobile device, reduce the board size by using a smaller screen or zooming out.
EPILOGUE
RL Chess AI is a remarkable piece of software engineering. In a single HTML file, it delivers a complete, playable, and genuinely intelligent chess application that demonstrates minimax search, Monte Carlo Tree Search, reinforcement learning, opening theory, position evaluation, and a polished responsive UI — all without a single external dependency. It is a testament to what can be achieved with modern JavaScript and a clear architectural vision.
The application is not trying to compete with Stockfish or AlphaZero. It is trying to be educational, entertaining, and accessible, and in those goals it succeeds admirably. Whether you are a chess player who wants a capable opponent, a developer who wants to understand how chess engines work, or a machine learning enthusiast who wants to see Q-learning in action, RL Chess AI has something to offer you.
The code is open source under the MIT License, which means you are free to study it, modify it, and build upon it. The extensions described in Chapter Eight are all within reach for a motivated developer. The foundation is solid, the architecture is clean, and the possibilities are wide open.
Now go play some chess.