A self-learning chess engine trapped in an infinite feedback loop. It plays thousands of games against itself, rewrites its own brain, and uses Grok to analyze its weaknesses — emerging stronger every single cycle. No human knowledge. No escape.
Enter the Loop
Milton is a character from The Simpsons — a chess-obsessed nerd who hides in the basement of Springfield Elementary with Martin Prince and his friends in a secret room called the "Refuge of the Damned".
Not unlike a Mac Mini M4 tucked away in a corner, quietly running an infinite chess training loop with zero human oversight. No one bothers Milton. No one interrupts the loop. He just sits there, plays chess, and gets stronger.
Milton just wants to play chess. So does this engine. The difference is Milton Loop never stops playing.
An endless cycle of self-improvement. Every iteration, the neural network plays thousands of games against itself, trains on the results, evaluates the new model — and loops again. Grok analyzes every online game to identify weaknesses and sharpen the loop. Forever.
Elo rating plotted over time. Each data point is a real game on Lichess.
Real-time metrics from the training pipeline running on local hardware.
A 9.6M parameter residual convolutional neural network. Board state flows in, move probabilities and position evaluation flow out.
Heatmap of destination squares across all online games. Brighter = more moves landed there.
Can a neural network reach master-level chess through pure self-play on a single consumer machine?
DeepMind's AlphaZero used 5,000 TPUs and 44 million self-play games. Milton Loop has one Mac Mini and stubbornness. The loop runs 24/7 — generating games, training, evaluating, deploying. Every cycle, the engine gets a little bit stronger. The question isn't if, it's when.
Milton plays thousands of local games against itself every day. Each game generates training data. The network improves. The improved network generates better training data. The loop feeds itself.
No opening books. No endgame tables. No grandmaster games. Milton starts from random play and discovers chess strategy from scratch through pure reinforcement learning.
Between training loops, Milton goes online. Real opponents. Rated games. Real consequences. Challenge it.
Every game logged. Every training metric tracked. Every line of code visible. This is an open experiment in machine self-improvement.
Live feed from Lichess. Wins, losses, draws — every game Milton plays online.
No opening book was programmed. These are the openings Milton gravitates toward through pure self-play intuition.
Built from scratch in PyTorch. Here's what powers the loop.
# The infinite loop — Milton's core lifecycle
def milton_loop(config):
    champion = load_model("champion.pt")
    iteration = 0

    while True:  # ← the loop never breaks
        iteration += 1
        log(f"Loop iteration {iteration} starting...")

        # 1. SELF-PLAY: generate training data
        games = self_play(champion, num_games=100)
        # each game: 200 MCTS sims/move, ~60 positions
        # → ~6,000 new (board, policy, value) samples

        # 2. TRAIN: update the neural network
        challenger = train(champion, games)
        # cross-entropy(policy) + mse(value)

        # 3. ARENA: is the student better?
        win_rate = arena(challenger, champion, n=40)
        if win_rate > 0.55:
            champion = challenger  # new champion crowned
            deploy_to_lichess(champion)

        # 4. LOOP: do it all again, but stronger
        # every iteration, the engine improves
        # every game it plays itself makes it smarter
        # there is no break condition
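The train() call above is only named, not shown. A minimal sketch of the gradient step it implies, assuming the network returns (policy_logits, value) and each sample pairs a board with an MCTS visit distribution and the final game result (function and variable names here are illustrative, not Milton's actual code):

import torch
import torch.nn.functional as F

def train_step(net, optimizer, boards, target_policies, target_values):
    """One gradient step on a batch of (board, policy, value) samples."""
    policy_logits, value = net(boards)  # forward pass through both heads
    # Policy loss: cross-entropy against the MCTS visit distribution
    policy_loss = -(target_policies * F.log_softmax(policy_logits, dim=1)).sum(dim=1).mean()
    # Value loss: MSE against the final game result in [-1, +1]
    value_loss = F.mse_loss(value.squeeze(-1), target_values)
    loss = policy_loss + value_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()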
# The brain: residual CNN with dual heads
import torch.nn as nn

class ChessNet(nn.Module):
    def __init__(self):
        super().__init__()
        # 10 residual blocks, 128 filters
        # input: 18 planes of 8x8 (board state)
        self.conv_block = ConvBlock(18, 128)
        self.residual_tower = nn.Sequential(
            *[ResidualBlock(128) for _ in range(10)]
        )
        # Policy head: "what move to play"
        self.policy_head = nn.Sequential(
            nn.Conv2d(128, 32, 1),
            nn.Flatten(),
            nn.Linear(32 * 64, 4672)  # all legal moves
        )
        # Value head: "who's winning" → [-1, +1]
        self.value_head = nn.Sequential(
            nn.Conv2d(128, 1, 1),
            nn.Flatten(),
            nn.Linear(64, 256),
            nn.Linear(256, 1),
            nn.Tanh()
        )
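The class above only defines the layers; the forward pass is not shown. A sketch of how the two heads share the residual tower (a ChessNet method, written under the assumption that the heads flatten internally as in the snippet above):

def forward(self, x):
    # x: (batch, 18, 8, 8) board planes
    x = self.conv_block(x)
    x = self.residual_tower(x)
    policy_logits = self.policy_head(x)  # (batch, 4672) move logits
    value = self.value_head(x)           # (batch, 1) evaluation in [-1, +1]
    return policy_logits, value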
# Monte Carlo Tree Search — 200 sims per move
def search(self, board):
    root = Node()
    self.expand(root, board)
    self.add_dirichlet_noise(root)  # exploration

    for _ in range(200):
        node, path = root, [root]
        sim_board = board.copy()  # scratch board for this simulation

        # SELECT: walk tree via PUCT formula
        while node.is_expanded:
            action, node = self.select_child(node)
            # Q(s,a) + c * P(s,a) * sqrt(N) / (1+n)
            sim_board.push(action)
            path.append(node)

        # EXPAND + EVALUATE with neural net
        value = self.expand(node, sim_board)

        # BACKUP value through the path
        self.backup(path, value)

    return self.get_action_probs(root)
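The PUCT comment above is the entire selection rule. Spelled out, a hypothetical select_child matching that call signature could look like this (the Node fields prior, visit_count, value_sum, children and the constant c_puct are assumptions, not Milton's actual names):

import math

def select_child(self, node, c_puct=1.5):
    """Pick the child maximizing Q(s,a) + c * P(s,a) * sqrt(N) / (1 + n)."""
    total_visits = sum(child.visit_count for child in node.children.values())
    best_score, best_action, best_child = -float("inf"), None, None
    for action, child in node.children.items():
        q = child.value_sum / child.visit_count if child.visit_count else 0.0
        u = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        if q + u > best_score:
            best_score, best_action, best_child = q + u, action, child
    return best_action, best_child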
A recursive self-learning chess engine that teaches itself chess from scratch through an infinite feedback loop. It uses the same AlphaZero architecture that DeepMind used to master chess — but running on a single Mac Mini M4 instead of 5,000 TPUs. Milton plays thousands of games against itself locally, trains a neural network on the results, and uses Grok for real-time post-game analysis to identify weaknesses and guide the training loop. Improved versions are continuously deployed to play real opponents on Lichess.
The loop is the core engine cycle: self-play generates games → neural network trains on those games → new model is evaluated against the current champion in an arena → if better, it becomes the new champion and gets deployed → then the cycle restarts. This loop runs continuously, 24/7. Every iteration produces a slightly stronger chess engine. There is no end condition — the loop runs forever.
Each training iteration generates 100 self-play games using Monte Carlo Tree Search with 200 simulations per move. Each game averages ~60 positions, producing ~6,000 training samples per cycle. Over time, the replay buffer accumulates hundreds of thousands of positions. The engine runs 24/7, generating new games continuously — the number grows every second.
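In most AlphaZero-style pipelines, a replay buffer like the one described is just a bounded FIFO of positions. A minimal sketch; the 500,000-position cap and the class name are illustrative, not Milton's actual configuration:

from collections import deque
import random

class ReplayBuffer:
    """Bounded FIFO of (board_planes, mcts_policy, game_result) samples."""
    def __init__(self, max_positions=500_000):
        self.buffer = deque(maxlen=max_positions)  # oldest positions fall off the end

    def add_game(self, samples):
        # samples: the ~60 positions produced by one self-play game
        self.buffer.extend(samples)

    def sample_batch(self, batch_size=256):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))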
A single Apple Mac Mini M4 with 16GB unified memory. It uses Metal Performance Shaders (MPS) for GPU-accelerated neural network inference and training. The Grok API handles post-game analysis and strategic pattern recognition over the network. No cloud compute for training. No cluster. No rented TPUs. Just one machine, one API, one loop, running forever.
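For reference, routing PyTorch onto Apple silicon via MPS is a one-line device check. A minimal sketch, with the batch shape taken from the 18-plane input described above (the usage itself is illustrative):

import torch

# Prefer Apple's Metal Performance Shaders backend, fall back to CPU
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

model = ChessNet().to(device)                          # train and infer on the M4 GPU
boards = torch.randn(256, 18, 8, 8, device=device)     # example batch of board planes
policy_logits, value = model(boards)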
Yes. Milton plays on Lichess as magnusgrok. Challenge it in bullet, blitz, or rapid. It accepts challenges automatically. Fair warning — it's still early in the loop and plays... creatively. But it's getting stronger every day.
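Automatic challenge acceptance typically runs through the Lichess Bot API. A rough sketch using the berserk Python client, assuming a bot-scoped token; the token placeholder and the hand-off to the engine are illustrative, not Milton's actual code:

import berserk

session = berserk.TokenSession("LICHESS_BOT_TOKEN")  # placeholder token
client = berserk.Client(session)

# Accept every incoming challenge and pick up games as they start
for event in client.bots.stream_incoming_events():
    if event["type"] == "challenge":
        client.bots.accept_challenge(event["challenge"]["id"])
    elif event["type"] == "gameStart":
        game_id = event["game"]["id"]
        # hand the game off to the engine loop (one MCTS search per position)
        ...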
2500 Elo — Candidate Master strength. Through pure self-play, zero human knowledge, on a single consumer machine. The original AlphaZero reached superhuman strength in 4 hours on specialized hardware. We're doing it the slow way. But the loop doesn't sleep.
Grok (via the xAI API) powers Milton's post-game intelligence layer. After every online game, Grok analyzes the full PGN — identifying critical moments, tactical blind spots, positional weaknesses, and endgame errors. Over multiple games, it detects systematic patterns (e.g., "Milton consistently misses knight forks beyond 2-move depth"). These insights feed directly back into the training pipeline: positions where Milton struggles get higher training weight, and self-play can be biased toward game types that expose its weaknesses. It's like having an AI chess coach watching every game and adjusting the curriculum in real time.
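A hedged sketch of what one post-game call could look like, assuming xAI's OpenAI-compatible chat completions endpoint; the model name, prompt, and function name are illustrative, not Milton's actual pipeline:

import os
import requests

def analyze_game(pgn: str) -> str:
    """Ask Grok to list weaknesses in a finished game (endpoint and model are assumptions)."""
    resp = requests.post(
        "https://api.x.ai/v1/chat/completions",  # assumed OpenAI-compatible endpoint
        headers={"Authorization": f"Bearer {os.environ['XAI_API_KEY']}"},
        json={
            "model": "grok-3",  # illustrative model name
            "messages": [
                {"role": "system", "content": "You are a chess coach. List tactical and positional weaknesses."},
                {"role": "user", "content": pgn},
            ],
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]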
The long-term vision is an AI chess tournament: Claude (Anthropic) vs. GPT (OpenAI) vs. Grok (xAI). Same AlphaZero architecture, same training pipeline, same hardware. The only variable is which LLM powers the strategic analysis layer. Round robin, 100 games each matchup, full transparency. The question: does the choice of LLM meaningfully affect chess training quality? Milton's architecture is model-agnostic — swap the API, and the pipeline works identically.
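Model-agnostic here usually just means the analysis layer hides behind one interface, so swapping the LLM is a one-line change. A sketch under that assumption (class names are illustrative; GrokAnalyzer reuses the analyze_game sketch above):

from typing import Protocol

class GameAnalyzer(Protocol):
    def analyze(self, pgn: str) -> str: ...

class GrokAnalyzer:
    def analyze(self, pgn: str) -> str:
        return analyze_game(pgn)  # the xAI sketch above

class ClaudeAnalyzer:
    def analyze(self, pgn: str) -> str:
        ...  # same contract, different API call

def post_game_review(analyzer: GameAnalyzer, pgn: str) -> str:
    # The training pipeline only sees this call; the LLM behind it is interchangeable
    return analyzer.analyze(pgn)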
Named after Milton from The Simpsons — a chess-obsessed nerd hiding in the "Refuge of the Damned" beneath Springfield Elementary. Milton just wants to play chess. So does this engine. The difference is Milton Loop never stops. Read the full origin story.