// recursive self-improving chess engine — iteration _

Milton
Loop

A self-learning chess engine trapped in an infinite feedback loop. It plays thousands of games against itself, rewrites its own brain, and uses Grok to analyze its weaknesses — emerging stronger every single cycle. No human knowledge. No escape.

Enter the Loop
Current Elo: --
Online Games: --
Win Rate: --
Current Streak: --
SIMULATED GAMES: --  |  LOOP ITERATIONS: --  |  POSITIONS EVALUATED: --
LOOP RUNNING FOR: --

Who is Milton?

Milton is a character from The Simpsons — a chess-obsessed nerd who hides in the basement of Springfield Elementary with Martin Prince and his friends in a secret room called the "Refuge of the Damned".

"A place where we can work on extra credit assignments without fear of reprisal." — Martin Prince

Not unlike a Mac Mini M4 tucked away in a corner, quietly running an infinite chess training loop with zero human oversight. No one bothers Milton. No one interrupts the loop. He just sits there, plays chess, and gets stronger.

Milton just wants to play chess. So does this engine. The difference is Milton Loop never stops playing.

EXCELSIOR!!!

How Milton thinks

An endless cycle of self-improvement. Every iteration, the neural network plays thousands of games against itself, trains on the results, evaluates the new model — and loops again. Grok analyzes every online game to identify weaknesses and sharpen the loop. Forever.

Self-Play
Generates thousands of games using MCTS + neural network. Every position becomes training data.
Train
Neural network learns from self-play data. Policy head learns moves, value head learns evaluation.
Arena
New model battles the champion. If it wins 55%+, it becomes the new champion.
Deploy
The champion goes live on Lichess. Plays rated games against real opponents worldwide.
↺ loop forever — every cycle produces a stronger engine

Watch Milton play in real-time

White
Black
Scanning for active game...
Milton is training between games. Challenge magnusgrok on Lichess to enter the loop.

The ascent

Elo rating plotted over time. Each data point is a real game on Lichess.
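
Lichess rates with Glicko-2 under the hood, but classic Elo is a close approximation and shows how each result moves the curve. A minimal sketch; the K-factor of 20 is an assumption for illustration, not Milton's actual setting:

# Classic Elo update: illustrative approximation of the rating math
def elo_update(rating: float, opponent: float, score: float, k: float = 20) -> float:
    """score: 1.0 for a win, 0.5 for a draw, 0.0 for a loss."""
    expected = 1 / (1 + 10 ** ((opponent - rating) / 400))
    return rating + k * (score - expected)

# A 1500-rated Milton beating a 1600 opponent gains ~12.8 points
print(elo_update(1500, 1600, 1.0))  # ≈ 1512.8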

Inside the loop

Real-time metrics from the training pipeline running on local hardware.

Loop Iterations: -- (self-play → train → arena cycles)
Simulated Games: -- (local self-play games generated)
Positions in Memory: -- (board states in replay buffer)
Neural Network: 9.6M params (10-block ResNet · dual heads)
Architecture: AlphaZero (MCTS + residual CNN + PUCT)
Compute: Mac Mini (Apple Silicon · MPS GPU)
Inference: 200 sims (MCTS simulations per move)
Uptime: 24/7 (the loop never stops)
Loop Output Stream
[init] LOOP Milton Loop daemon initializing...
[init] SYS Device: Apple Silicon MPS | Memory: 16GB unified
[init] NET Model loaded: 9,633,315 parameters (128 filters, 10 res blocks)
[init] GROK Connected to Grok API (grok-3) — post-game analysis active
[init] LOOP Entering infinite self-play loop...
[init] BOT Connected to Lichess as: magnusgrok

The brain

A 9.6M parameter residual convolutional neural network. Board state flows in, move probabilities and position evaluation flow out.

Input: 18 planes of 8×8
Conv: 128 filters, 3×3
Residual Tower: ×10 128-channel res blocks
Policy Head: 4,672 move probabilities
Value Head: [-1, +1] position evaluation
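
The exact layout of the 18 input planes isn't spelled out here; a plausible encoding is 12 piece planes (6 piece types × 2 colors), 4 castling-rights planes, one side-to-move plane, and one en-passant plane. A sketch of that assumed layout using python-chess:

# Hypothetical 18-plane board encoding (layout assumed, not confirmed)
import chess
import numpy as np

def encode_board(board: chess.Board) -> np.ndarray:
    planes = np.zeros((18, 8, 8), dtype=np.float32)

    # Planes 0-11: one plane per (color, piece type)
    for square, piece in board.piece_map().items():
        idx = (0 if piece.color == chess.WHITE else 6) + piece.piece_type - 1
        planes[idx, square // 8, square % 8] = 1.0

    # Planes 12-15: castling rights, broadcast as constant planes
    planes[12].fill(board.has_kingside_castling_rights(chess.WHITE))
    planes[13].fill(board.has_queenside_castling_rights(chess.WHITE))
    planes[14].fill(board.has_kingside_castling_rights(chess.BLACK))
    planes[15].fill(board.has_queenside_castling_rights(chess.BLACK))

    # Plane 16: side to move; plane 17: en-passant target square
    planes[16].fill(board.turn == chess.WHITE)
    if board.ep_square is not None:
        planes[17, board.ep_square // 8, board.ep_square % 8] = 1.0
    return planes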

Where Milton moves

Heatmap of destination squares across all online games. Brighter = more moves landed there.

Move Density: Low → High
Hottest square: --
Total moves analyzed: --
Center control: --
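
The heatmap reduces to a destination-square histogram over the game archive. A minimal sketch with python-chess; milton_games.pgn is a hypothetical export of the bot's Lichess games:

# Destination-square heatmap: brighter = more moves landed there
import chess
import chess.pgn
import numpy as np

def move_heatmap(pgn_path: str) -> np.ndarray:
    counts = np.zeros(64, dtype=np.int64)
    with open(pgn_path) as f:
        while (game := chess.pgn.read_game(f)) is not None:
            for move in game.mainline_moves():
                counts[move.to_square] += 1
    return counts.reshape(8, 8)  # [rank][file], a1 at [0][0]

heat = move_heatmap("milton_games.pgn")
print("Hottest square:", chess.square_name(int(heat.argmax())))
print("Center control:", heat[3:5, 3:5].sum() / heat.sum())  # d4/e4/d5/e5 share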

One machine. Zero knowledge. Infinite loops.

Can a neural network reach master-level chess through pure self-play on a single consumer machine?

🔁

The Recursive Loop

Milton plays thousands of local games against itself every day. Each game generates training data. The network improves. The improved network generates better training data. The loop feeds itself.

🧠

Zero Human Knowledge

No opening books. No endgame tables. No grandmaster games. Milton starts from random play and discovers chess strategy from scratch through pure reinforcement learning.

🌐

Live on Lichess

Between training loops, Milton goes online. Real opponents. Rated games. Real consequences. Challenge it.

🔍

Built in Public

Every game logged. Every training metric tracked. Every line of code visible. This is an open experiment in machine self-improvement.

Recent encounters

Live feed from Lichess. Wins, losses, draws — every game Milton plays online.

Fetching games from Lichess...

Milton's discovered openings

No opening book was programmed. These are the openings Milton gravitates toward through pure self-play intuition.

♔ As White
Loading...
♚ As Black
Loading...

Under the hood

Built from scratch in PyTorch. Here's what powers the loop.

the_loop.py
neural_net.py
mcts.py
# The infinite loop — Milton's core lifecycle
def milton_loop(config):
    champion = load_model("champion.pt")
    iteration = 0

    while True:  # ← the loop never breaks
        iteration += 1
        log(f"Loop iteration {iteration} starting...")

        # 1. SELF-PLAY: generate training data
        games = self_play(champion, num_games=100)
        # each game: 200 MCTS sims/move, ~60 positions
        # → ~6,000 new (board, policy, value) samples

        # 2. TRAIN: update the neural network
        challenger = train(champion, games)
        # cross-entropy(policy) + mse(value)

        # 3. ARENA: is the student better?
        win_rate = arena(challenger, champion, n=40)

        if win_rate > 0.55:
            champion = challenger  # new champion crowned
            deploy_to_lichess(champion)

        # 4. LOOP: do it all again, but stronger
        # every iteration, the engine improves
        # every game it plays itself makes it smarter
        # there is no break condition
# The brain: residual CNN with dual heads
import torch.nn as nn

class ChessNet(nn.Module):

    def __init__(self):
        super().__init__()
        # 10 residual blocks, 128 filters
        # input: 18 planes of 8x8 (board state)
        self.conv_block = ConvBlock(18, 128)  # defined elsewhere in this file
        self.residual_tower = nn.Sequential(
            *[ResidualBlock(128) for _ in range(10)]
        )

        # Policy head: "what move to play"
        self.policy_head = nn.Sequential(
            nn.Conv2d(128, 32, 1), nn.Flatten(),
            nn.Linear(32 * 64, 4672)  # 73 move types × 64 squares
        )

        # Value head: "who's winning" → [-1, +1]
        self.value_head = nn.Sequential(
            nn.Conv2d(128, 1, 1), nn.Flatten(),
            nn.Linear(64, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Tanh()
        )
# Monte Carlo Tree Search — 200 sims per move
def search(self, board):
    root = Node()
    self.expand(root, board)
    self.add_dirichlet_noise(root)  # exploration

    for _ in range(200):
        node, path = root, [root]
        sim_board = board.copy()  # fresh board for this simulation

        # SELECT: walk tree via PUCT formula
        while node.is_expanded:
            action, node = self.select_child(node)
            # Q(s,a) + c * P(s,a) * sqrt(N) / (1+n)
            sim_board.push(action)
            path.append(node)

        # EXPAND + EVALUATE with neural net
        value = self.expand(node, sim_board)

        # BACKUP value through the path
        self.backup(path, value)

    return self.get_action_probs(root)
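
The train() step in the_loop.py combines the two losses its comment names: cross-entropy on the policy and MSE on the value. A minimal sketch, assuming ChessNet's forward returns (policy_logits, value) and the targets come from the replay buffer:

# One gradient step: cross-entropy(policy) + mse(value)
import torch.nn.functional as F

def train_step(model, optimizer, planes, target_policy, outcome):
    policy_logits, value = model(planes)
    # target_policy: MCTS visit distribution (soft labels)
    policy_loss = F.cross_entropy(policy_logits, target_policy)
    value_loss = F.mse_loss(value.squeeze(-1), outcome)
    loss = policy_loss + value_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()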

Questions

What is Milton Loop?

A recursive self-learning chess engine that teaches itself chess from scratch through an infinite feedback loop. It uses the same AlphaZero architecture that DeepMind used to master chess — but running on a single Mac Mini M4 instead of 5,000 TPUs. Milton plays thousands of games against itself locally, trains a neural network on the results, and uses Grok for real-time post-game analysis to identify weaknesses and guide the training loop. Improved versions are continuously deployed to play real opponents on Lichess.

What does "the loop" mean?

The loop is the core engine cycle: self-play generates games → neural network trains on those games → new model is evaluated against the current champion in an arena → if better, it becomes the new champion and gets deployed → then the cycle restarts. This loop runs continuously, 24/7. Every iteration produces a slightly stronger chess engine. There is no end condition — the loop runs forever.

How many games does it simulate locally?

Each training iteration generates 100 self-play games using Monte Carlo Tree Search with 200 simulations per move. Each game averages ~60 positions, producing ~6,000 training samples per cycle. Over time, the replay buffer accumulates hundreds of thousands of positions. The engine runs 24/7, generating new games continuously — the number grows every second.
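
The replay buffer can be as simple as a bounded deque: new self-play positions push the oldest ones out. A sketch; the 500,000-position capacity is an assumption for illustration:

# Bounded replay buffer: oldest positions are evicted automatically
from collections import deque
import random

replay_buffer = deque(maxlen=500_000)

def add_game(samples):
    """samples: list of (planes, mcts_policy, outcome) tuples from one game."""
    replay_buffer.extend(samples)

def sample_batch(batch_size=256):
    return random.sample(replay_buffer, min(batch_size, len(replay_buffer)))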

What hardware does it run on?

A single Apple Mac Mini M4 with 16GB unified memory. It uses Metal Performance Shaders (MPS) for GPU-accelerated neural network inference and training. The Grok API handles post-game analysis and strategic pattern recognition over the network. No cloud compute for training. No cluster. No rented TPUs. Just one machine, one API, one loop, running forever.
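
In PyTorch, targeting the M4's GPU is a one-line device check: MPS when available, CPU as the fallback:

# Run the network on Apple Silicon via Metal Performance Shaders
import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
net = ChessNet().to(device)  # ChessNet as sketched in neural_net.py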

Can I play against it?

Yes. Milton plays on Lichess as magnusgrok. Challenge it in bullet, blitz, or rapid. It accepts challenges automatically. Fair warning — it's still early in the loop and plays... creatively. But it's getting stronger every day.
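
Auto-accepting works by listening to the Lichess Bot API's event stream and answering every incoming challenge. A minimal sketch with requests; LICHESS_TOKEN stands in for the bot's real API token:

# Accept every incoming Lichess challenge (simplified)
import json
import requests

HEADERS = {"Authorization": "Bearer LICHESS_TOKEN"}  # placeholder token

with requests.get("https://lichess.org/api/stream/event",
                  headers=HEADERS, stream=True) as stream:
    for line in stream.iter_lines():
        if not line:
            continue  # keep-alive newline
        event = json.loads(line)
        if event["type"] == "challenge":
            challenge_id = event["challenge"]["id"]
            requests.post(
                f"https://lichess.org/api/challenge/{challenge_id}/accept",
                headers=HEADERS,
            )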

What's the end goal?

2500 Elo — master-level strength. Through pure self-play, zero human knowledge, on a single consumer machine. The original AlphaZero reached superhuman strength in four hours on specialized hardware. We're doing it the slow way. But the loop doesn't sleep.

How does Grok fit into the loop?

Grok (via the xAI API) powers Milton's post-game intelligence layer. After every online game, Grok analyzes the full PGN — identifying critical moments, tactical blind spots, positional weaknesses, and endgame errors. Over multiple games, it detects systematic patterns (e.g. "Milton consistently misses knight forks beyond 2-move depth"). These insights feed directly back into the training pipeline: positions where Milton struggles get higher training weight, and self-play can be biased toward game types that expose his weaknesses. It's like having an AI chess coach watching every game and adjusting the curriculum in real time.
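
The analysis call itself can be a single chat completion. A sketch assuming xAI's OpenAI-compatible endpoint; the system prompt is illustrative, not Milton's actual one:

# Post-game analysis via the xAI API (OpenAI-compatible client)
from openai import OpenAI

client = OpenAI(base_url="https://api.x.ai/v1", api_key="XAI_API_KEY")

def analyze_game(pgn: str) -> str:
    response = client.chat.completions.create(
        model="grok-3",
        messages=[
            {"role": "system", "content": (
                "You are a chess coach. Identify critical moments, "
                "tactical blind spots, and recurring weaknesses."
            )},
            {"role": "user", "content": pgn},
        ],
    )
    return response.choices[0].message.content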

What's the AI tournament plan?

The long-term vision is an AI chess tournament: Claude (Anthropic) vs. GPT (OpenAI) vs. Grok (xAI). Same AlphaZero architecture, same training pipeline, same hardware. The only variable is which LLM powers the strategic analysis layer. Round robin, 100 games each matchup, full transparency. The question: does the choice of LLM meaningfully affect chess training quality? Milton's architecture is model-agnostic — swap the API, and the pipeline works identically.

Why "Milton"?

Named after Milton from The Simpsons — a chess-obsessed nerd hiding in the "Refuge of the Damned" beneath Springfield Elementary. Milton just wants to play chess. So does this engine. The difference is Milton Loop never stops. Read the full origin story.

Dev address: soon