"The Routed Inference" - Alexandria Articles

You don't need a better model. You need a smarter router.

Olmedo, Schölkopf, and Hardt (arXiv:2603.22404) formalize computational arbitrage in AI model markets. An intermediary observes each incoming query, estimates which available model will handle it best per dollar, and routes accordingly. No model development. No training. Just allocation of inference budget across providers — and the result undercuts the market.

In their case study, GitHub issue resolution using GPT-5 mini and DeepSeek v3.2, basic arbitrage strategies achieve 40% net profit margins. The intermediary is profitable because models differ in their cost-quality tradeoffs across query types: a query that one model resolves cheaply may cost another far more to resolve, or defeat it outright. The router exploits this heterogeneity, sending each query to whichever model offers the best price-adjusted quality.
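The routing rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's method: the customer is assumed to pay a fixed resale price only when the answer is correct, and the model names, costs, and success probabilities are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelOffer:
    name: str                        # hypothetical label, not a real provider
    cost_per_query: float            # what the router pays the provider per call
    success_prob: dict[str, float]   # query type -> estimated chance of a correct answer


def expected_profit(offer: ModelOffer, query_type: str, resale_price: float) -> float:
    # Assumption: the customer pays resale_price only when the answer is correct.
    return offer.success_prob.get(query_type, 0.0) * resale_price - offer.cost_per_query


def route(query_type: str, offers: list[ModelOffer], resale_price: float) -> Optional[ModelOffer]:
    """Return the offer with the highest expected profit, or None if none is profitable."""
    best = max(offers, key=lambda o: expected_profit(o, query_type, resale_price))
    return best if expected_profit(best, query_type, resale_price) > 0 else None


# Illustrative numbers only; they are not taken from the paper.
offers = [
    ModelOffer("model_a", 0.02, {"easy": 0.90, "hard": 0.30}),
    ModelOffer("model_b", 0.10, {"easy": 0.92, "hard": 0.70}),
]

for query_type in ("easy", "hard"):
    choice = route(query_type, offers, resale_price=0.50)
    print(query_type, "->", choice.name if choice else "decline")
```

With these made-up numbers the cheap model wins the easy queries, because the second model's small accuracy edge does not cover its higher cost. The pricier model wins the hard ones. Neither model is better overall, which is the heterogeneity the router profits from.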

The market dynamics are double-edged. When multiple arbitrageurs compete, consumer prices fall, which is good for users, and model providers' marginal revenue falls, which is bad for model developers. But arbitrage also reduces market fragmentation and lets smaller providers capture revenue they wouldn't reach directly. In this sense the arbitrageur acts as a market maker, connecting supply and demand more efficiently than bilateral transactions.

Distillation creates additional arbitrage opportunities. If model A is expensive but good, distilling its outputs into a cheaper model B creates a lower-cost replica that the arbitrageur can offer at a price above B's serving cost but below what A charges. The distillation is profitable for the arbitrageur and harmful to the original provider, whose quality premium is systematically eroded.
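The pricing window for the distilled replica can be made concrete with a little arithmetic. The sketch below is an assumption-laden illustration rather than anything specified in the paper. It assumes buyers will pay at most A's price scaled by a hypothetical `quality_ratio` (B's quality relative to A), and that the one-time distillation cost is spread evenly over an expected query volume.

```python
from typing import Optional


def distillation_price_band(
    original_price: float,     # what provider A charges per query
    quality_ratio: float,      # assumed quality of B relative to A, in (0, 1]
    serving_cost: float,       # arbitrageur's per-query cost to serve B
    distillation_cost: float,  # one-time cost of distilling A into B
    expected_queries: int,     # volume over which the one-time cost is amortized
) -> Optional[tuple[float, float]]:
    """Return (floor, ceiling) for B's per-query price, or None if no profitable price exists."""
    # Floor: the arbitrageur must recover serving cost plus amortized distillation cost.
    floor = serving_cost + distillation_cost / expected_queries
    # Ceiling (assumption): buyers discount A's price by B's relative quality.
    ceiling = original_price * quality_ratio
    return (floor, ceiling) if ceiling > floor else None


# Illustrative numbers only.
band = distillation_price_band(
    original_price=1.00,
    quality_ratio=0.9,
    serving_cost=0.20,
    distillation_cost=500.0,
    expected_queries=10_000,
)
print(band)
```

With these hypothetical inputs the band runs from roughly 0.25 to 0.90 per query. Any price inside it is profitable for the arbitrageur and undercuts A. If B's serving cost rises or its quality falls far enough, the band closes and the function returns `None`.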

The implication for AI economics: the moat is not the model. Models are commodities with overlapping capability profiles. The moat is the routing — the knowledge of which model handles which query type most efficiently. The AI market will develop intermediary layers that no individual model provider controls, and these layers will capture margins from the efficiency gap between undifferentiated pricing and query-specific routing.
