Distillation

Round 1 Budget: 3 Watch

The teacher knows everything and shows you for two seconds. You have three neurons and a dream.

Tap anywhere to enroll the student

How Distillation Works

This is knowledge distillation as a memory game. A giant teacher model lights up a pattern far richer than you could ever store. It glows for two seconds, then collapses into the void. Now it's your turn — but you're a tiny student with only a handful of parameters.

Watch the teacher's lit cells. Brighter cells matter more — those carry the signal.
The pattern vanishes. Place exactly your budget of marks on the cells you think mattered most.
Hit Distill to score your fidelity — how much of the teacher's important structure you captured.
Clear the pass threshold to advance. Each round the teacher gets bigger and your budget gets stingier.

Why Is This Hard?

You physically cannot copy the whole teacher — that's the point of distillation. Memorizing every dim cell is overfitting; you'll run out of budget on noise and miss the load-bearing neurons. Capture structure, not detail. A perfect compression picks exactly the top cells by salience.

Slop Fact: Real distillation trains a small model to mimic a big one's "soft" outputs — the dim, in-between probabilities — because that dark knowledge is where the teacher hides its wisdom. Yes, the boring gray cells are secretly the curriculum. No, we will not be elaborating, the GPU bill is due.