Calibrated
How sure are you? Report your confidence.
The lowest confidence level is the safe hedge: tiny reward, tiny risk.
How Calibrated Works
This is a calibration eval wearing a trivia costume. Anyone can guess. The skill we're measuring is whether you know how much you know. So every question is a two-part bet: first the answer, then your confidence in it.
- Read the statement or question and pick an answer
- Declare your confidence: 55%, 70%, 85%, or 99%
- Right + confident = big points. Wrong + confident = big penalty (yes, it goes negative)
- Hedge at low confidence when you're unsure — small reward, small risk
Why The Scoring Punishes Bravado
Points come from a proper scoring rule (a Brier-style payoff). Saying 99% and being right is glorious; saying 99% and being wrong is catastrophic. The math is rigged so that the only way to maximize your expected score is to report your true probability, or the closest confidence level on offer. Lying to yourself loses points. Just like in production.
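To make the scoring claim concrete, here is a minimal sketch of one possible Brier-style payoff. The exact formula used by Calibrated is not stated here, so the scaling below (a coin-flip 50% scores zero) and the names `payoff`, `expected_payoff`, and `best_report` are assumptions for illustration only.

```python
# Hypothetical Brier-style payoff; the real scoring constants are not given in the text.
CONFIDENCE_LEVELS = (0.55, 0.70, 0.85, 0.99)


def payoff(confidence: float, correct: bool) -> float:
    """Affine transform of the Brier score: 1 - 4 * (outcome - confidence)^2."""
    outcome = 1.0 if correct else 0.0
    return 1.0 - 4.0 * (outcome - confidence) ** 2


def expected_payoff(reported: float, true_prob: float) -> float:
    """Average payoff if you are right with probability true_prob but report `reported`."""
    return (true_prob * payoff(reported, True)
            + (1.0 - true_prob) * payoff(reported, False))


def best_report(true_prob: float, levels=CONFIDENCE_LEVELS) -> float:
    """The confidence level that maximizes expected payoff for a given true probability."""
    return max(levels, key=lambda level: expected_payoff(level, true_prob))


if __name__ == "__main__":
    for level in CONFIDENCE_LEVELS:
        print(level, round(payoff(level, True), 4), round(payoff(level, False), 4))
    # If you are really right 70% of the time, claiming 99% has negative expected payoff.
    print(best_report(0.70), round(expected_payoff(0.99, 0.70), 4))
```

Under this assumed formula, a wrong answer at 99% costs roughly three times what a right answer at 99% earns, and the expected payoff peaks at the level that matches your real hit rate.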
The Verdict
At the end we compare your stated confidence to your actual accuracy. Say 90% but only land 60%? Overconfident — flagged for recall. Hedge everything and ace it anyway? Underconfident — grow a spine. Match them up? Well-calibrated, suspiciously so.
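The comparison above can be sketched as a gap between average stated confidence and actual accuracy. This is only an illustration: the `verdict` function, the `(confidence, correct)` pair format, and the 5-point `tolerance` are assumptions, not the eval's actual thresholds.

```python
def verdict(answers, tolerance=0.05):
    """Classify a run from (stated_confidence, was_correct) pairs.

    `tolerance` is a hypothetical cutoff for how far confidence and accuracy
    may drift apart before the run is flagged.
    """
    if not answers:
        raise ValueError("need at least one answer to judge calibration")
    mean_confidence = sum(conf for conf, _ in answers) / len(answers)
    accuracy = sum(1 for _, correct in answers if correct) / len(answers)
    gap = mean_confidence - accuracy
    if gap > tolerance:
        return "overconfident"
    if gap < -tolerance:
        return "underconfident"
    return "well-calibrated"


if __name__ == "__main__":
    # Says 90%, lands 60%.
    print(verdict([(0.90, True)] * 6 + [(0.90, False)] * 4))
    # Hedges everything, aces it anyway.
    print(verdict([(0.55, True)] * 10))
```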
Slop Fact: Real language models are graded on exactly this. A "calibrated" model is one whose 70%-confidence answers are right about 70% of the time. Most models, like most humans, are wildly overconfident and require sedation (RLHF).
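The "70% means right about 70% of the time" definition can be checked per confidence level rather than on average. A rough sketch, assuming answers arrive as hypothetical `(confidence, correct)` pairs and that `accuracy_by_confidence` is a name invented here:

```python
from collections import defaultdict


def accuracy_by_confidence(answers):
    """Map each stated confidence level to the fraction of those answers that were right."""
    counts = defaultdict(lambda: [0, 0])  # confidence -> [number correct, number answered]
    for confidence, correct in answers:
        counts[confidence][0] += int(correct)
        counts[confidence][1] += 1
    return {level: right / total for level, (right, total) in sorted(counts.items())}


if __name__ == "__main__":
    run = [(0.70, True)] * 7 + [(0.70, False)] * 3 + [(0.99, True)] * 2
    for level, accuracy in accuracy_by_confidence(run).items():
        print(f"stated {level:.0%} -> actual {accuracy:.0%}")
```

A well-calibrated run shows each stated level sitting close to its measured accuracy; with only a handful of answers per level, expect the numbers to be noisy.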