Chain of Thought

How Chain of Thought Works

You are a language model trying to reason your way from a premise to a conclusion, one inference at a time. Between you and the answer is the latent void. Each stone you place is a valid step. Each misstep is a hallucination, and you fall.

  1. Read the chain of reasoning so far, shown at the top of the path.
  2. Three candidate next steps appear. Exactly one validly follows from the chain.
  3. Tap the valid inference to lay down the next stepping stone and advance.
  4. Tap a non-sequitur, fallacy, or confident nonsense and you hallucinate — lose a life.
  5. Reach the conclusion to complete the chain. You have three lives, and chains get longer as you go.
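The rules above can be sketched as a small loop. This is only an illustration, not the game's actual code: the names `Round`, `play`, and `choose` are made up here, and the sketch assumes a wrong tap costs one life and the same step is offered again, which the rules do not spell out.

```python
from dataclasses import dataclass


@dataclass
class Round:
    chain: list[str]       # the reasoning so far
    candidates: list[str]  # three candidate next steps
    valid_index: int       # index of the one step that validly follows


def play(rounds, choose, lives=3):
    """Walk the path; return (completed, lives_left, stones_placed)."""
    stones = 0
    for rnd in rounds:
        while True:
            if choose(rnd) == rnd.valid_index:
                stones += 1  # valid inference: lay a stone and advance
                break
            lives -= 1       # hallucination: lose a life
            if lives == 0:
                return False, 0, stones  # fell into the latent void
    return True, lives, stones


# A made-up one-step chain, for illustration only.
demo = [
    Round(
        chain=["If it rains, the path is wet.", "It rains."],
        candidates=[
            "The path is wet.",
            "It is Tuesday.",
            "Rain is wet because paths exist.",
        ],
        valid_index=0,
    ),
]

print(play(demo, choose=lambda rnd: rnd.valid_index))  # (True, 3, 1)
```

A player who always taps the valid step finishes with all three lives; one who keeps tapping the same wrong step falls after three taps.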

Why Is This Hard?

The wrong answers are not random — they are plausible. They are the things a confident model says when it has run out of actual reasoning: affirming the consequent, smuggling in a fact nobody gave you, or just vibing toward a conclusion that feels right. Resisting them is the whole eval.
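To see why affirming the consequent is a trap and not just a style issue, a brute-force truth table helps. The sketch below uses hypothetical helper names (`implies`, `is_valid`) and treats an argument form as valid when no assignment of truth values makes every premise true and the conclusion false.

```python
from itertools import product


def implies(a, b):
    # Material implication: "if a then b" is false only when a is true and b is false.
    return (not a) or b


def is_valid(premises, conclusion):
    """True if no truth assignment satisfies all premises but falsifies the conclusion."""
    for p, q in product([True, False], repeat=2):
        if all(prem(p, q) for prem in premises) and not conclusion(p, q):
            return False  # found a counterexample
    return True


# Modus ponens: P -> Q, P, therefore Q.
modus_ponens = is_valid(
    [lambda p, q: implies(p, q), lambda p, q: p],
    lambda p, q: q,
)

# Affirming the consequent: P -> Q, Q, therefore P.
affirming_consequent = is_valid(
    [lambda p, q: implies(p, q), lambda p, q: q],
    lambda p, q: p,
)

print(modus_ponens)          # True
print(affirming_consequent)  # False
```

The counterexample for the second form is P false and Q true: both premises hold, yet the conclusion does not. The two forms look almost identical on the page, which is what makes the wrong stone tempting.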

Slop Fact: Adding "Let's think step by step" to a prompt has been shown to boost model accuracy on many multi-step reasoning benchmarks, which is either a profound result about reasoning or proof that we built a machine that performs better when politely encouraged. Researchers remain too afraid to ask which.
