Beam Search

Decode the sequence. Tap nodes to keep your top-K beams.

Greedy is a trap. The brightest node rarely leads to the best leaf.

How Beam Search Works

This is the beam-search decoding algorithm wearing a party hat. A decoder that commits to the single best-looking token at every step is doing greedy decoding, and greedy decoding is notoriously myopic. Beam search instead keeps the top K partial sequences ("beams") alive, expands them in parallel, and prunes the rest at every step. You are now the search.

A tree of tokens descends from the top. You start with K = 3 live beams.
Each step reveals the children of every live beam — the candidate next tokens, each with a logit score.
Tap to keep exactly K of those candidates. The rest get pruned (no mercy).
Hit Advance Beams to descend a level. Repeat until you reach the leaves at the bottom.
Your score is the value of the best leaf any beam reaches, plus the depth you survived.
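
For the curious, the loop you are playing by hand can be sketched in a few lines of Python. This is a minimal illustration of textbook beam search, not the game's actual code: the toy tree `TREE`, the function name `beam_search`, and the choice to rank beams by the sum of node scores along the path are all assumptions made for the example. (Real decoders usually sum log-probabilities rather than raw logits.)

```python
# Hypothetical toy tree: each node is (local_score, {token: child_node}).
TREE = (0.0, {
    "a": (0.9, {"x": (0.1, {}), "y": (0.2, {})}),
    "b": (0.5, {"x": (0.9, {}), "y": (0.3, {})}),
    "c": (0.4, {"x": (0.2, {}), "y": (0.1, {})}),
})


def beam_search(tree, k, depth):
    """Keep the k best partial paths at each level, ranked by summed score."""
    # Each beam is (cumulative_score, token_path, node).
    beams = [(0.0, (), tree)]
    for _ in range(depth):
        candidates = []
        for total, path, (_, children) in beams:
            for token, child in children.items():
                # Extend the beam by one token and add that token's score.
                candidates.append((total + child[0], path + (token,), child))
        if not candidates:
            break  # every live beam is a dead end; nothing left to expand
        # Prune: only the k highest-scoring candidates survive.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:k]
    return [(round(total, 2), path) for total, path, _ in beams]


print(beam_search(TREE, k=1, depth=2))  # [(1.1, ('a', 'y'))]
print(beam_search(TREE, k=2, depth=2))  # [(1.4, ('b', 'x')), (1.1, ('a', 'y'))]
```

With K = 1 the search locks onto the bright "a" branch and tops out at 1.1. With K = 2 the dimmer "b" branch stays alive long enough to reveal the better path, which is the whole reason to keep more than one beam.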

Why Greedy Is A Trap

The number on each node is its local logit: how good that single token looks right now. But the best leaf often hides behind a mediocre node. A high-logit branch can dead-end into garbage; a dim one can blossom into the global optimum. If you always grab the brightest node, you are doing greedy decoding, and greedy decoding marches confidently toward a worse leaf.
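
The trap is easy to reproduce on a tiny tree. The sketch below is illustrative only: the tree shape, the numbers, and the helper names `greedy_leaf` and `best_leaf` are made up for the example, and it assumes a leaf's value is simply the logit written on it.

```python
# Hypothetical tree: each node is (local_logit, [children]).
# A node with no children is treated as a leaf.
TREE = (0, [
    (9, [(1, []), (2, [])]),  # bright node, poor leaves
    (4, [(8, []), (3, [])]),  # dim node, hides the best leaf
])


def greedy_leaf(node):
    """Follow the brightest child at every level; return the leaf value reached."""
    logit, children = node
    while children:
        logit, children = max(children, key=lambda c: c[0])
    return logit


def best_leaf(node):
    """Exhaustively search the whole tree for the highest-valued leaf."""
    logit, children = node
    if not children:
        return logit
    return max(best_leaf(child) for child in children)


print(greedy_leaf(TREE))  # 2
print(best_leaf(TREE))    # 8
```

Greedy chases the 9 and ends up holding a 2; the 8 was sitting behind the node that looked worse at the first step.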

If every one of your K beams hits a dead end with no children to expand, the search collapses and the run is over. Spread your bets. That's the whole point of keeping a beam width.
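
Collapse can be sketched the same way. The example below is a rough model rather than the game's real logic: it assumes a player who always keeps the K brightest candidates, a hypothetical three-level tree, and helper names (`advance`, `survive`) invented for the illustration. A childless node above the bottom level plays the role of a dead end.

```python
# Hypothetical tree: each node is (local_logit, [children]).
# The two bright nodes at level 2 are dead ends; only the dim branch
# continues down to level 3.
TREE = (0, [
    (9, [(8, []), (7, [])]),
    (3, [(2, [(10, [])])]),
])


def advance(beams, k):
    """Expand all live beams and keep the k brightest children (a naive policy)."""
    candidates = [child for _, children in beams for child in children]
    return sorted(candidates, key=lambda c: c[0], reverse=True)[:k]


def survive(root, k, max_depth):
    """Return the depth reached before the search collapses or hits the bottom."""
    beams, depth = [root], 0
    while depth < max_depth:
        beams = advance(beams, k)
        if not beams:
            break  # every live beam was a dead end: the search collapses
        depth += 1
    return depth


for k in (1, 2, 3):
    print(k, survive(TREE, k, max_depth=3))
# 1 2
# 2 2
# 3 3
```

With K = 1 or K = 2 every surviving beam sits on a bright dead end and the run stops at depth 2. Only K = 3 leaves room for the dim node that actually leads to the bottom.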

Slop Fact: Beam search was lifted from 1970s speech recognition (it traces back to the Harpy system at Carnegie Mellon), and to this day nobody agrees on the optimal beam width. Too narrow and you're basically greedy (K = 1 is exactly greedy); too wide and you tend to generate bland, repetitive mush. The default in most chatbots is "whatever the GPU could afford that quarter."