Mixture of Experts

Eight experts, one router, infinite traffic. Send the math to the math nerd and the vibes to the poet. Do not melt a datacenter.

How Mixture of Experts Works

You are the gating network. A real Mixture of Experts model doesn't run every parameter on every token — it routes each input to a handful of specialist sub-networks. You are doing that by hand, badly, under load, while the datacenter overheats.
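
For the curious, here is a minimal sketch of what the gating network does when nobody is tapping lanes for it. It assumes one common variant, top-k routing with softmax weights renormalized over the chosen experts; real models differ in the details. The function names, the value k=2, and the example scores are all made up for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k=2):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts.

    Weights are the softmax probabilities of the chosen experts,
    renormalized so they sum to 1. (Assumed variant, not the only one.)
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]

# Hypothetical router scores for one token across eight experts.
logits = [0.1, 2.0, -1.0, 0.3, 1.5, -0.5, 0.0, 0.2]
routes = top_k_route(logits, k=2)
for expert, weight in routes:
    print(f"expert {expert}: weight {weight:.3f}")
```

Only the selected experts run on the token, and their outputs are mixed using these weights. Every other expert sits idle for that token, which is the whole trick.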

Typed requests stream into the incoming queue — math, code, poems, vision, and more, each with its own icon and color.
Send each one to its matching expert lane: tap a lane to route the front request, or tap a specific request first to send it out of order.
Every expert has finite throughput. Dump too much on one lane and its buffer backs up, glows red, and starts dropping tokens.
Mis-route to the wrong specialist and you eat a penalty — the math nerd does not write your sonnets.
Experts occasionally go OFFLINE (preemption happens). Route around them until they spin back up.
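
The rules above (finite buffers, dropped tokens, offline experts) can be sketched as a small dispatcher. This is an illustration under stated assumptions, not how any particular framework does it: the `dispatch` function, its fallback-to-next-preference policy, and the fixed `capacity` value are hypothetical. Real systems often size capacity from a capacity factor times the average tokens per expert, and many simply drop overflow instead of trying a second choice.

```python
def dispatch(assignments, num_experts, capacity, offline=frozenset()):
    """Assign tokens to expert buffers in arrival order.

    assignments: per token, a list of expert indices in preference order.
    Returns (buffers, dropped): buffers[e] lists token ids accepted by
    expert e; dropped lists token ids that found no available expert.
    """
    buffers = [[] for _ in range(num_experts)]
    dropped = []
    for token_id, preferences in enumerate(assignments):
        for expert in preferences:
            if expert in offline:
                continue  # route around an unavailable expert
            if len(buffers[expert]) < capacity:
                buffers[expert].append(token_id)
                break
        else:
            # Every preferred expert was full or offline.
            dropped.append(token_id)
    return buffers, dropped

# Five tokens all prefer expert 0, with expert 1 as a second choice.
prefs = [[0, 1]] * 5
buffers, dropped = dispatch(prefs, num_experts=3, capacity=2, offline={2})
print(buffers)  # [[0, 1], [2, 3], []]
print(dropped)  # [4]
```

Piling everything onto one lane fills it first, spills into the backup, and then starts dropping. Spreading preferences across experts is what keeps `dropped` empty.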

How To Survive

Keep the queue short, spread the load so no lane melts, and chain correct routes for a fat combo multiplier. The run ends when your drop rate melts the cluster. Throughput is the only metric your manager remembers.

Slop Fact: Sparse MoE models can have a trillion-plus parameters but only activate a sliver per token, which is how labs claim "more parameters" and "same compute" in the same breath without anyone laughing. You are the load balancer they swore they'd automate next quarter.
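
The arithmetic behind that claim fits in a few lines. This is a simplified sketch: it lumps all always-on weights (attention, embeddings, and so on) into one `shared` number and assumes every token uses exactly `k` experts. The figures are invented for illustration and do not describe any real model.

```python
def moe_param_counts(shared, per_expert, num_experts, k):
    """Return (total, active) parameter counts for a simplified sparse MoE.

    total:  everything stored in the model.
    active: what a single token actually touches when routed to k experts.
    """
    total = shared + num_experts * per_expert
    active = shared + k * per_expert
    return total, active

# Hypothetical sizes: 2B shared, eight experts of 5B each, top-2 routing.
total, active = moe_param_counts(
    shared=2_000_000_000, per_expert=5_000_000_000, num_experts=8, k=2
)
print(f"total:  {total / 1e9:.0f}B")          # total:  42B
print(f"active: {active / 1e9:.0f}B")         # active: 12B
print(f"fraction active: {active / total:.1%}")
```

Adding experts grows `total` but leaves `active` untouched, so the headline parameter count can climb while per-token compute stays roughly flat.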