Few-Shot

Tap label an example Test grade the model

How Few-Shot Works

You are the prompt engineer. There is a hidden concept, and a tiny model that wants to learn it from your examples. Your job is to label the fewest items you can while still getting the model to generalize perfectly to data it has never seen.

A grid of items appears. Each has features (shape, color, count, etc.).
Tap an item once to label it positive (it fits the concept). Tap again for negative. Tap a third time to un-label.
After each label, the model re-fits its best guess of the rule and reports accuracy on a hidden test set.
Hit Test the model when you think it has generalized. Reach 100% to clear the concept.

Why fewer is better

Every redundant example is wasted compute and a token you'll never get back. A single well-chosen counterexample can collapse the whole hypothesis space. This is the entire art of in-context learning, minus the API bill.

Slop Fact: "Few-shot" originally meant "we couldn't afford to fine-tune, so we stuffed three examples into the prompt and called it emergent behavior." It worked, somehow, which is the scariest part of the whole field.