Two Truths, One Model

How Two Truths, One Model Works

Every round, a model serves you three statements about AI, tech, science, and history. Two are real. One is a confident hallucination — a sentence delivered with the unshakeable certainty of a system that has never once experienced doubt. Your job is to find the slop.

  1. Read the three statements (tap a card, or press 1, 2, or 3)
  2. Tap the one you think the model made up
  3. Beat the timer — faster picks bank a bigger speed bonus
  4. String correct picks together to grow your combo multiplier
  5. You have three lives. Wrong picks and timeouts cost one each
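
For the curious, the rules above can be sketched in a few lines of Python. This is only an illustration, not the game's actual scoring code: the rules name a speed bonus, a combo multiplier, and three lives, but give no numbers, so BASE_POINTS, TIME_LIMIT, MAX_SPEED_BONUS, the linear bonus formula, and the choice to reset the combo on a miss are all assumptions, as are the names GameState and play_round.

```python
from dataclasses import dataclass

# Assumed values: the rules give no numbers for points, timer, or bonus.
BASE_POINTS = 100
TIME_LIMIT = 15.0        # seconds per round (hypothetical)
MAX_SPEED_BONUS = 100    # bonus for an instant pick (hypothetical)
STARTING_LIVES = 3       # from the rules: "You have three lives"


@dataclass
class GameState:
    score: int = 0
    combo: int = 0
    lives: int = STARTING_LIVES


def play_round(state, picked, fake, seconds_taken):
    """Apply one round to the state.

    picked is the index (0-2) of the chosen statement, or None on a timeout.
    fake is the index of the made-up statement.
    """
    timed_out = picked is None or seconds_taken >= TIME_LIMIT
    if timed_out or picked != fake:
        # Wrong picks and timeouts each cost one life.
        # Resetting the combo here is an assumption, not a stated rule.
        state.combo = 0
        state.lives -= 1
        return state

    # Correct pick: extend the combo and award points.
    state.combo += 1
    # Assumed linear speed bonus: full bonus at 0 s, none at the time limit.
    remaining = 1.0 - seconds_taken / TIME_LIMIT
    speed_bonus = round(MAX_SPEED_BONUS * remaining)
    # Assumed multiplier: the current combo length.
    state.score += (BASE_POINTS + speed_bonus) * state.combo
    return state


def is_game_over(state):
    return state.lives <= 0
```

Under these assumptions, a quick correct pick early in a streak is worth less than a slower correct pick deep into one, which is why protecting the combo can matter more than shaving seconds.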

Why Is This Hard?

Hallucinations don't sound wrong. They're fluent, specific, and oddly satisfying: when the model hits a gap in what it knows, it fills it with the most plausible-sounding token sequence and commits with full confidence. The fabrication often reads more cleanly than the truth, which is exactly why it fools you.

Slop Fact: A language model has no internal "I'm not sure" flag bolted to its outputs — confidence and correctness are decoupled. It will assert that bananas are lethal and that the Eiffel Tower was built for Barcelona in precisely the same tone it uses for things that are true. You are the alignment layer now.
