Mixture of Experts
How Mixture of Experts Works
You are the gating network. A real Mixture of Experts model doesn't run every parameter on every token. Instead, it routes each token to a handful of specialist sub-networks, the experts. You are doing that by hand, badly, under load, while the datacenter overheats.
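What a gating network does automatically can be sketched in a few lines. This is a minimal illustration that assumes top-k softmax gating, one common routing scheme; the function name `top_k_route`, the logit values, and `k=2` are invented for the example and are not taken from any particular model or from this game.

```python
import math

def top_k_route(gate_logits, k=2):
    """Pick the k highest-scoring experts and renormalize their weights.

    gate_logits: one score per expert for a single token (hypothetical values).
    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    # Softmax over all experts (subtract the max for numerical stability).
    m = max(gate_logits)
    exps = [math.exp(x - m) for x in gate_logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep only the k most probable experts; the rest are never run.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

    # Renormalize so the kept weights sum to 1.
    kept = sum(probs[i] for i in chosen)
    return [(i, probs[i] / kept) for i in chosen]

# Four hypothetical experts; the token is sent to experts 1 and 3 only.
routes = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
print(routes)
```

The skipped experts cost nothing for this token, which is the whole point: the model's output is a weighted mix of only the chosen experts' outputs.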
- Typed requests stream into the incoming queue — math, code, poems, vision, and more, each with its own icon and color.
- Send each one to its matching expert lane: tap a lane to route the front request, or tap a specific request first to send it out of order.
- Every expert has finite throughput. Dump too much on one lane and its buffer backs up, glows red, and starts dropping tokens.
- Mis-route to the wrong specialist and you eat a penalty — the math nerd does not write your sonnets.
- Experts occasionally go OFFLINE (preemption happens). Route around them until they spin back up.
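The rules above can be modeled as a small simulation. This is a sketch under stated assumptions, not the game's actual code: the class `ExpertLane`, the function `route`, the capacity of 2, and the order in which the offline, mis-route, and overflow checks run are all hypothetical.

```python
from collections import deque

class ExpertLane:
    """One expert with a bounded buffer (all names and numbers are hypothetical)."""

    def __init__(self, kind, capacity=4, per_tick=1):
        self.kind = kind            # request type this expert handles
        self.capacity = capacity    # max queued requests before drops start
        self.per_tick = per_tick    # requests processed per tick
        self.buffer = deque()
        self.online = True
        self.served = 0

    def tick(self):
        """Process up to per_tick queued requests, if the expert is online."""
        if not self.online:
            return
        for _ in range(min(self.per_tick, len(self.buffer))):
            self.buffer.popleft()
            self.served += 1


def route(request_kind, lane):
    """Return 'ok', 'misroute', 'offline', or 'drop' for one routing decision."""
    if not lane.online:
        return "offline"
    if lane.kind != request_kind:
        return "misroute"
    if len(lane.buffer) >= lane.capacity:
        return "drop"
    lane.buffer.append(request_kind)
    return "ok"


math_lane = ExpertLane("math", capacity=2)
results = [route("math", math_lane) for _ in range(3)]  # third one overflows
results.append(route("poem", math_lane))                # wrong specialist
math_lane.tick()                                        # frees one slot
results.append(route("math", math_lane))
math_lane.online = False                                # preempted
results.append(route("math", math_lane))
print(results)
# ['ok', 'ok', 'drop', 'misroute', 'ok', 'offline']
```

Real MoE training systems face a similar trade-off: some implementations give each expert a fixed capacity and drop or reroute tokens that overflow it.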
How To Survive
Keep the queue short, spread the load so no lane melts, and chain correct routes for a fat combo multiplier. The run ends when your drop rate melts the cluster. Throughput is the only metric your manager remembers.
Slop Fact: Sparse MoE models can have a trillion-plus parameters but only activate a sliver per token, which is how labs claim "more parameters" and "same compute" in the same breath without anyone laughing. You are the load balancer they swore they'd automate next quarter.
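The "more parameters, same compute" arithmetic is easy to check. The sketch below assumes a simplified model in which every token uses all shared parameters plus `top_k` experts; the function `moe_param_counts` and every number in the configuration are hypothetical, picked only to make the sums easy to follow.

```python
def moe_param_counts(shared, per_expert, num_experts, top_k):
    """Return (total, active_per_token) parameter counts for a sparse MoE.

    shared: parameters every token uses (attention, embeddings, and so on)
    per_expert: parameters in one expert, summed across all MoE layers
    """
    total = shared + per_expert * num_experts
    active = shared + per_expert * top_k
    return total, active

# Hypothetical configuration, not a description of any real model.
total, active = moe_param_counts(
    shared=20_000_000_000,
    per_expert=8_000_000_000,
    num_experts=128,
    top_k=2,
)
print(f"total:  {total / 1e9:.0f}B")
print(f"active: {active / 1e9:.0f}B ({active / total:.1%} of total)")
# total:  1044B
# active: 36B (3.4% of total)
```

Under these made-up numbers, a model with just over a trillion parameters does roughly the per-token work of a 36B dense model, though all of the experts still have to be stored somewhere.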