Jun 29, 2026 · 12:57 PM

Coinbase halved its AI bill without restricting engineers, and the playbook is worth stealing

Brian Armstrong cut Coinbase's AI bill by nearly 50% without capping usage, defaulting engineers to Chinese open-weight models like GLM 5.2 at $1.40 per million tokens versus Anthropic Opus at $5, while lifting cache hit rates from 5% to 60%. The playbook is replicable, but the regulatory question around Chinese-origin models in a federally registered financial firm is one Armstrong's X thread quietly skipped.

Janet Harrison

Brian Armstrong cut Coinbase's AI spend by nearly 50% while token usage kept climbing, by defaulting to Chinese open-weight models at a fraction of the cost of Anthropic Opus, a move that works technically but raises questions Washington hasn't finished asking.

The standard corporate response to a runaway AI bill is friction: usage caps, approval gates, spend alerts, a manager who emails you when you've hit 80% of your quota. Coinbase tried a different approach, and the numbers are hard to argue with. Brian Armstrong published the company's five-lever playbook on X last week, and the headline figure is a nearly 50% cut in AI spend against growing token consumption, without limiting what any engineer could do.

The core move is model routing through an internal LLM gateway. Coinbase now defaults its engineers to Zhipu AI's GLM 5.2, priced at roughly $1.40 per million tokens, and Moonshot AI's Kimi K2.7 Code. Compare that to Anthropic's Opus at $5 per million tokens, and you start to see where the savings come from. Engineers can still choose whatever model they want; the defaults just changed. And here's the kicker Armstrong shared: 91% of Coinbase's engineers never hit the old usage caps in the first place. The problem was never overuse. It was expensive defaults sitting where cheap ones could do the same job.
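To make the "defaults changed, choice stayed" idea concrete, here is a minimal sketch of how a gateway could resolve a model and price a request. It is not Coinbase's implementation. The model keys, the `Request` type, and the function names are hypothetical, and the flat per-million-token price ignores any split between input and output tokens; only the $1.40 and $5 figures come from the article.

```python
from dataclasses import dataclass
from typing import Optional

# USD per million tokens. The two prices are the figures quoted in the article;
# the dictionary keys are illustrative labels, not real API model identifiers.
PRICE_PER_MILLION = {
    "glm-5.2": 1.40,
    "opus": 5.00,
}

# Assumption: one gateway-wide default. A real gateway may vary it by team or tool.
DEFAULT_MODEL = "glm-5.2"


@dataclass
class Request:
    prompt: str
    tokens: int
    model: Optional[str] = None  # the engineer's explicit choice, if any


def resolve_model(request: Request) -> str:
    """Honor an explicit choice; otherwise fall back to the cheap default."""
    if request.model is not None:
        if request.model not in PRICE_PER_MILLION:
            raise ValueError(f"unknown model: {request.model}")
        return request.model
    return DEFAULT_MODEL


def cost_usd(request: Request) -> float:
    """Cost of one request at a flat per-million-token price (a simplification)."""
    model = resolve_model(request)
    return request.tokens / 1_000_000 * PRICE_PER_MILLION[model]
```

Under these assumptions, a request that names no model is billed at the default's rate, while passing `model="opus"` still works and simply costs about 3.6 times as much per token.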

Caching did as much heavy lifting as model selection. Coinbase's internal LibreChat deployment had a cache hit rate of 5% before the optimization push. After it: 60%. That's not a marginal gain; it eliminates the majority of redundant API calls before they reach a model at all. Armstrong's third lever was context discipline: engineers are now expected to start fresh sessions when switching tasks, narrow the scope of files passed in, and close unused tools. Lean context windows cost less to process, and the cumulative effect across a large engineering org is real money.
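The article does not say which caching mechanism Coinbase uses, so the following is only an illustration of what a cache hit rate measures. It sketches an exact-match response cache keyed on model and prompt; the `ResponseCache` class and the `call_model` callback are hypothetical names, and a production setup might instead rely on provider-side prompt caching or semantic matching.

```python
import hashlib
from typing import Callable, Dict


class ResponseCache:
    """Exact-match cache in front of a model call, with hit-rate accounting."""

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        self._call_model = call_model  # hypothetical backend: (model, prompt) -> text
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # Hash model and prompt together so identical prompts sent to
        # different models do not share a cache entry.
        return hashlib.sha256(f"{model}\x00{prompt}".encode("utf-8")).hexdigest()

    def complete(self, model: str, prompt: str) -> str:
        key = self._key(model, prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        response = self._call_model(model, prompt)  # only misses reach the model
        self._store[key] = response
        return response

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

In this sketch, every hit is a request that never reaches the model, which is why moving the hit rate from 5% to 60% translates so directly into spend.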

The fourth and fifth levers, task-based model selection and spend visibility tied to impact, are the harder ones to operationalize but arguably the most durable. Matching the cheapest capable model to each specific task requires someone to actually categorize the tasks first, which is more systems work than it looks. Spend visibility only changes behavior if the people generating the spend can see what they're producing with it, which means tying cost data to output quality rather than just output volume.
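A rough sketch of those two levers, under loudly stated assumptions: the task categories, the capability tiers, and the "accepted changes" measure of impact are all invented here for illustration and do not come from Armstrong's thread. Only the two prices are from the article.

```python
from typing import Dict, List, NamedTuple


class Model(NamedTuple):
    name: str
    price_per_million: float  # USD, as quoted in the article
    tier: int  # hypothetical capability rank; higher means more capable


MODELS: List[Model] = [
    Model("glm-5.2", 1.40, 1),
    Model("opus", 5.00, 2),
]

# Hypothetical task catalog: the minimum tier each category is assumed to need.
REQUIRED_TIER: Dict[str, int] = {
    "autocomplete": 1,
    "unit-test-generation": 1,
    "architecture-review": 2,
}


def cheapest_capable(task: str) -> Model:
    """Pick the lowest-priced model whose tier meets the task's requirement."""
    needed = REQUIRED_TIER[task]
    capable = [m for m in MODELS if m.tier >= needed]
    if not capable:
        raise LookupError(f"no model can handle task: {task}")
    return min(capable, key=lambda m: m.price_per_million)


def cost_per_accepted_change(spend_usd: float, accepted_changes: int) -> float:
    """One possible impact-linked metric: spend divided by accepted output."""
    if accepted_changes <= 0:
        return float("inf")  # spend with nothing accepted has no finite unit cost
    return spend_usd / accepted_changes
```

The catalog is the expensive part: the routing function is a few lines, but someone has to decide which tier each category of work really needs, and the metric is only as good as the definition of an accepted change.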

GLM 5.2 is made by Zhipu AI, a Beijing company. Kimi K2.7 Code is from Moonshot AI, also Chinese. Both are open-weight models, which means Coinbase runs them on its own servers rather than routing data to any Chinese endpoint, and that matters legally. Self-hosting eliminates the API data-routing risk that the US Department of Homeland Security flagged explicitly in its guidance on Chinese AI: China's National Intelligence Law of 2017 can compel Chinese firms to hand over data from US persons or businesses on government demand. If you're not hitting their API, that risk doesn't apply to you the same way.

But the regulatory picture is still unsettled. As TechTimes reported this week, US House lawmakers opened a formal inquiry in May 2026 into cybersecurity risks from Chinese AI models in critical infrastructure, naming Zhipu AI specifically alongside DeepSeek, MiniMax, and ByteDance. Coinbase is a federally registered money services business that handles financial data for millions of Americans. Whether that puts it inside the blast radius of whatever that inquiry produces is a question Armstrong conspicuously didn't address in his X thread, and it's not a hypothetical given Washington's escalating export control posture.

Armstrong did offer one forward-looking claim worth tracking: he expects 80% of AI workloads will shift to models that are 99% cheaper than today's frontier within 12 to 18 months, with the top-tier models retained only for things like scientific research. That's a specific enough prediction to hold him to. If it's right, the cost structure of running AI at scale changes entirely, and the companies that built routing infrastructure now rather than later will have a real advantage over the ones still paying Opus rates for autocomplete.
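As a back-of-the-envelope check on what that prediction would mean for a bill, the sketch below treats "80% of workloads" as 80% of today's spend and assumes the remaining 20% stays at current frontier prices with unchanged volume. Both are simplifying assumptions, not claims from the thread, and the function name is made up for this example.

```python
def blended_cost_ratio(shifted_share: float, discount: float) -> float:
    """Fraction of today's bill left after moving `shifted_share` of spend
    to models that are `discount` cheaper (0.99 means 99% cheaper)."""
    if not 0.0 <= shifted_share <= 1.0 or not 0.0 <= discount <= 1.0:
        raise ValueError("shares and discounts must be between 0 and 1")
    cheap_part = shifted_share * (1.0 - discount)  # shifted work at the new price
    frontier_part = 1.0 - shifted_share            # work that stays on frontier models
    return cheap_part + frontier_part


# Armstrong's figures: 80% of workloads on models that are 99% cheaper.
ratio = blended_cost_ratio(0.80, 0.99)
print(f"Remaining share of today's bill: {ratio:.1%}")
```

Under those assumptions the bill falls to roughly a fifth of its current size, and almost all of what remains is the 20% still running on frontier models.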

The practical takeaway for any startup or mid-size company running meaningful AI workloads is that the Coinbase playbook doesn't require Coinbase's engineering headcount to copy. A gateway with sensible defaults, aggressive caching, and task-matched routing is buildable for almost any team. The regulatory piece is more situational: a consumer fintech handling sensitive financial data sits in a different risk posture than a SaaS tool writing marketing copy. Know which one you are before you default to GLM.

Janet Harrison has more than 16 years of experience in the financial services industry, giving her a broad understanding of how news affects the financial markets, and is an early adopter of blockchain technology and digital currencies. Janet is an active holder and trader who spends the majority of her time analyzing blockchain projects and reports and watching new and upcoming projects and other initiatives in the industry. She has a master's degree in economics, and her previous roles include investment banking.