Jun 3, 2026 · 10:48 PM

Factory AI brings cost routing to enterprise coding agents

Factory AI launched Factory Router in private research preview for its CLI and Desktop App. The company says it can cut coding-agent token spend by 20-25% while preserving most of Claude Opus 4.7's benchmark performance.

Julian Lim

Factory AI is trying to make model routing the quiet control layer behind enterprise coding agents. The pitch is simple: keep near-frontier coding performance, but stop paying frontier prices for every task.

Factory AI launched Factory Router on June 1, and the timing matters. AI coding has moved past the fun demo stage. Engineering teams are now running long agent sessions, handing over migrations, reviews, bug fixes, and documentation work, then watching token bills become a new line item that behaves more like cloud infrastructure than a software subscription.

The new Router is in private research preview for Factory's CLI and Desktop App. Once enabled, it appears in the model picker and automatically sends each Droid session to the model Factory thinks is best suited to the work. That could mean a cheaper model for a simple mechanical task, a stronger frontier model for a difficult codebase change, or another provider path if an endpoint starts failing.

According to Factory's June 1 product note, Router cuts token spend by 20-25% while preserving most of Claude Opus 4.7's benchmark performance. On Terminal-Bench 2, Factory says it reached 99% of Opus 4.7's pass rate at 20% lower cost per session. On Legacy-Bench, it reached 96% of the pass rate at 25% lower cost. The company also says cost per successful run came in at 80.5% of Opus on Terminal-Bench 2 and 78.0% on Legacy-Bench.
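The relationship between those figures can be checked with simple arithmetic. The sketch below assumes that cost per successful run is cost per session divided by pass rate, both expressed relative to Opus 4.7. That definition and the function name are assumptions for illustration, not something Factory's note spells out.

```python
def relative_cost_per_success(relative_cost: float, relative_pass_rate: float) -> float:
    """Cost per successful run as a fraction of the baseline.

    Both inputs are fractions of the baseline model's value,
    e.g. 0.80 for "20% lower cost" and 0.99 for "99% of the pass rate".
    """
    if relative_pass_rate <= 0:
        raise ValueError("relative_pass_rate must be positive")
    return relative_cost / relative_pass_rate


# Rounded headline figures from the article (assumed inputs).
terminal_bench = relative_cost_per_success(0.80, 0.99)
legacy_bench = relative_cost_per_success(0.75, 0.96)

print(f"Terminal-Bench 2: {terminal_bench:.1%}")  # 80.8%
print(f"Legacy-Bench: {legacy_bench:.1%}")        # 78.1%
```

Under that assumption the rounded inputs give roughly 80.8% and 78.1%, close to the 80.5% and 78.0% Factory reports. The small gaps would be consistent with the headline percentages being rounded.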

Those numbers are vendor-published, so they should be read with the usual care. But the direction is hard to ignore. The expensive part of AI coding is no longer only the model. It is the habit of sending every repo scan, small refactor, and speculative agent loop through the strongest model available because nobody wants to be blamed for choosing the cheaper option.

Cloud teams already learned this lesson. When infrastructure becomes elastic, usage grows first and financial discipline comes later. AI coding is following the same path, only faster. A single developer can now launch multiple agent sessions, each one reading files, calling tools, retrying steps, and carrying context across a long run. That can be productive, but it also turns software work into a variable compute bill.

This is why routing is becoming more than a convenience feature. It is starting to look like a FinOps layer for engineering teams. A good router does not merely pick the cheapest model. It decides when cost savings are harmless and when they become expensive because the weaker model fails, retries, or sends the task back to a human.
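To make that trade-off concrete, here is a minimal sketch of one way a router could weigh price against the chance of failure. It is not Factory's method. The model names, costs, success probabilities, and the rule that a failed attempt is retried once on the frontier model are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    model: str        # hypothetical model label
    cost: float       # hypothetical relative cost of one attempt
    p_success: float  # hypothetical chance the attempt succeeds on this task


def expected_cost(option: Option, fallback_cost: float) -> float:
    """Cost of one attempt plus the expected cost of a fallback retry on failure."""
    return option.cost + (1.0 - option.p_success) * fallback_cost


def route(options: list[Option], fallback_cost: float) -> Option:
    """Pick the option with the lowest expected cost, not the lowest sticker price."""
    return min(options, key=lambda o: expected_cost(o, fallback_cost))


FRONTIER_COST = 1.0

# An easy mechanical task: the cheap model almost always succeeds.
easy = [Option("cheap", 0.3, 0.95), Option("frontier", FRONTIER_COST, 0.97)]
# A hard codebase change: the cheap model usually fails and triggers a retry.
hard = [Option("cheap", 0.3, 0.15), Option("frontier", FRONTIER_COST, 0.90)]

print(route(easy, FRONTIER_COST).model)  # cheap
print(route(hard, FRONTIER_COST).model)  # frontier
```

With these made-up numbers, the cheap model wins the easy task (expected cost 0.35 versus 1.03) but loses the hard one (1.15 versus 1.10), because the retries it triggers cost more than it saves.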

Factory is trying to position Router at exactly that boundary. Its own explanation of the cost curve is useful: savings are easy when cheaper models take work they can already handle, then much harder once the router starts assigning difficult work to models that cannot finish it. Factory says more aggressive routing would drop Terminal-Bench 2 performance to 81% of Opus at 56% of the cost, and Legacy-Bench to 49% at 30% of the cost.

That is the real enterprise question. Nobody wants a dashboard that saves money by silently lowering quality. The useful version is a system that knows when not to save.

Factory is turning cost control into a product wedge

Factory is not entering a quiet market. Cursor, GitHub Copilot, Claude Code, OpenAI Codex, Cognition, and a long list of smaller coding-agent tools are all fighting for the same engineering workflow. The first wave of competition was about capability: who could write better code, understand larger repos, and finish more tasks without help. The next wave is about governance, reliability, and cost.

That shift plays into Factory's enterprise story. TechCrunch reported in April that Factory raised $150 million at a $1.5 billion valuation, with customers including Morgan Stanley, Ernst & Young, and Palo Alto Networks. Those are the kinds of buyers that care about more than an impressive coding demo. They need policy controls, throughput commitments, model availability, and a way to explain to finance why AI spend is rising.

Router adds several pieces aimed at that buyer. Factory says it supports provider failover, dedicated tokens-per-minute capacity for enterprise customers, access to frontier models as they come online, US-hosted open-source model options, standard model policy, routing rules, and admin guidance. In plain terms, it gives engineering leaders a way to say which work should be eligible for automatic routing and which areas need stricter control.

OpenRouter's recent $113 million Series B, along with its claim that weekly volume grew from 5 trillion to 25 trillion tokens in six months, shows the broader market is already treating model routing as infrastructure. Factory's version is narrower, focused on software engineering sessions rather than general AI traffic. That focus could help, because coding work has clearer success signals than many enterprise AI tasks. Either the code compiles, the tests pass, the migration completes, and the pull request survives review, or it does not.

The risk is that routing becomes invisible until something goes wrong. If an agent fails because it was sent to the wrong model, the savings will not feel like savings. Factory will need to prove that its controls are understandable enough for admins and trusted enough for developers who already have strong model preferences.

The bigger point is that AI coding platforms are being judged by a new standard. It is no longer enough to show that an agent can do the work. The winning tools will have to show that they can do the work repeatedly, inside enterprise constraints, without turning every engineering team into its own token-budget committee. Factory Router is an early sign that the cost layer is becoming part of the product, and that may matter as much as the model leaderboard.


Julian Lim is an entrepreneur, technology writer, and a researcher. He started JL Data Analysis after graduating from NUS in Intelligent Systems. Julian writes about technology innovations and entrepreneurship on Business Times, Asia Pacific Magazine and occasionally contributes to Startup Fortune.