AI token costs are forcing startups to rethink how they scale

AI is no longer just a product decision for founders. The token bill is becoming a real operating constraint, and the companies that treat it casually will feel it first.

The next big startup expense may not be headcount, cloud storage or paid acquisition. It may be the invisible meter running behind every AI feature, every coding agent and every customer support workflow that calls a large language model again and again until the monthly bill becomes hard to explain.

That is the signal coming out of the latest reporting on AI infrastructure costs. As TechCrunch reported this week, companies that rushed into generative AI are now trying to understand why falling per-token prices have not translated into lower bills. The reason is simple enough: usage is growing faster than prices are falling. Agentic tools make that worse because they do not just answer one question. They plan, search, retry, rewrite, call tools and keep context alive.
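
The arithmetic is easy to sketch. The figures below are invented for illustration and do not come from the reporting: a hypothetical company sees its per-token price fall 60 percent while its token volume grows tenfold.

```python
def monthly_bill(tokens: float, price_per_million: float) -> float:
    """Return the monthly spend in dollars for a given token volume."""
    return tokens / 1_000_000 * price_per_million


# Hypothetical numbers, chosen only to show the shape of the problem.
before = monthly_bill(tokens=200_000_000, price_per_million=10.0)
after = monthly_bill(tokens=2_000_000_000, price_per_million=4.0)

# Price per token dropped, yet the bill is a multiple of what it was.
growth = after / before
print(f"before=${before:,.0f} after=${after:,.0f} growth={growth:.1f}x")
```

In this made-up case the unit price drops by more than half and the bill still quadruples, because volume is the larger term.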

For founders, this changes the shape of the business. A feature that looks cheap in a demo can become expensive in production. A customer who appears profitable under a flat subscription plan can become loss-making once they start using AI heavily. A sales team can promise automation, while finance later discovers the gross margin was never built to handle the real cost of inference.
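
A minimal sketch of that margin problem follows. The subscription price, token volumes and per-token price are assumptions made up for this example, and the function ignores every cost other than inference unless one is passed in.

```python
def gross_margin(subscription_price: float,
                 tokens_used: float,
                 price_per_million: float,
                 other_cogs: float = 0.0) -> float:
    """Gross margin as a fraction of revenue for one customer in one month."""
    inference_cost = tokens_used / 1_000_000 * price_per_million
    return (subscription_price - inference_cost - other_cogs) / subscription_price


# Two hypothetical customers on the same flat $50 plan.
light_user = gross_margin(50.0, tokens_used=1_000_000, price_per_million=5.0)
heavy_user = gross_margin(50.0, tokens_used=20_000_000, price_per_million=5.0)

print(f"light user margin: {light_user:.0%}")
print(f"heavy user margin: {heavy_user:.0%}")
```

Under these assumed numbers the light user is comfortably profitable, while the heavy user costs twice what they pay. The same plan produces both outcomes.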

The industry is now treating token spend the way cloud spend was treated a decade ago. At first, the appeal was speed. Teams could ship without buying servers, negotiating long contracts or waiting on internal infrastructure. Then the bill arrived, and someone had to decide which workloads mattered and which ones were just expensive habits.

OpenAI CEO Sam Altman has helped put numbers around the shift. Axios reported this week that OpenAI's top token user is consuming about 100 billion tokens a month, compared with roughly 100,000 tokens a month for the company's top user more than six years ago. That is not a normal increase in software usage. It is a different operating model.

The problem is not that AI is useless. That would be easier. The problem is that AI is useful enough for teams to use it constantly, while the cost of that usage is still poorly measured. TechCrunch cited Jellyfish research showing that heavy AI users can be more productive, but may consume far more tokens to get there. Nicholas Arcolano of Jellyfish told TechCrunch that per-developer consumption rose about 18.6 times in nine months, driven largely by agentic features.

That is why the founder question is no longer whether the model works. It is whether the model works at the right price, for the right task, with the right controls. If an AI agent saves an employee two hours, the spend may be justified. If it burns through repeated calls to produce a mediocre answer that still needs human repair, the company has not automated work. It has rented an expensive assistant that needs supervision.

A New Tooling Market Is Forming

The response is already visible. Startups and larger vendors are building the missing layer between AI usage and financial control. Pay-i is focused on measuring and optimizing the cost and performance of generative AI investments. Faros AI, Jellyfish and Waydev are watching how developer agents affect engineering output and spend. Datadog, New Relic and Ramp are adding cost visibility around AI, cloud and usage workflows because the buyer is no longer only the developer.

The Linux Foundation also announced plans on June 3, 2026, to launch the Tokenomics Foundation, a standards effort intended to give companies a common way to measure AI infrastructure economics. That matters because a token is not the same thing across every model, vendor or workload. A cheap model can become expensive if it takes too many retries. A costly frontier model can still be worth it if it solves the task with fewer calls and fewer failures.
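
One way to make that comparison concrete is to look at expected cost per successful task rather than cost per call. The sketch below assumes each attempt succeeds independently with a fixed probability and that failed attempts are simply retried, which is a simplification. The prices and success rates are hypothetical.

```python
def expected_cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend to get one successful result when failures are retried.

    Assumes independent attempts, so the expected number of attempts
    is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical models: one cheap but unreliable, one pricey but reliable.
cheap = expected_cost_per_success(cost_per_attempt=0.002, success_rate=0.10)
frontier = expected_cost_per_success(cost_per_attempt=0.015, success_rate=0.95)

print(f"cheap model:    ${cheap:.4f} per successful task")
print(f"frontier model: ${frontier:.4f} per successful task")
```

With these invented figures the model that costs more per call comes out cheaper per finished task. Different numbers would flip the result, which is the argument for measuring it.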

This is where model routing becomes important. Instead of sending every request to the most capable model, companies are beginning to route simple tasks to cheaper models and reserve frontier systems for work that genuinely needs them. Prompt caching, shorter context windows, tighter prompts and better output controls are moving from engineering tricks to margin protection. They are not glamorous, but neither was cloud cost tagging. Both become important once waste starts showing up in board materials.
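
In its simplest form, routing is a rule that runs before the model call. The sketch below is an assumption about what such a rule might look like, not a description of any vendor's product. The task labels, the length threshold and the model names `small-model` and `frontier-model` are all placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    task: str    # hypothetical label assigned by the calling feature
    prompt: str


# Placeholder policy: which tasks count as simple, and how long is too long.
SIMPLE_TASKS = {"classify", "extract", "summarize_short"}
MAX_SIMPLE_PROMPT_CHARS = 2_000


def choose_model(req: Request) -> str:
    """Send short, well-bounded tasks to the cheaper model, the rest upward."""
    if req.task in SIMPLE_TASKS and len(req.prompt) <= MAX_SIMPLE_PROMPT_CHARS:
        return "small-model"
    return "frontier-model"


print(choose_model(Request(task="classify", prompt="Is this ticket about billing?")))
print(choose_model(Request(task="refactor", prompt="Rewrite this module.")))
```

Real routers tend to be more involved, for example by escalating to the larger model when the smaller one fails. The point of the sketch is only that the decision is explicit and can be measured.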

Open-weight and self-hosted models will benefit from this pressure, but only for companies with enough scale and discipline. Running your own model can reduce marginal cost, yet it adds infrastructure, talent, maintenance and utilization risk. For many startups, the better first step is not buying GPUs. It is knowing which product flows are profitable after the model bill is included.

This is the practical lesson for founders. AI pricing cannot be an afterthought attached to a SaaS plan built for older software economics. If your product depends on tokens, then token usage belongs in product analytics, customer segmentation, pricing, finance and investor updates. The companies that understand cost per successful task will have a clearer path than those that only track monthly active users and hope the model providers keep getting cheaper.
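
As a rough sketch of what tracking cost per successful task could mean in practice, the snippet below rolls up logged model calls by customer. The record fields (`customer`, `cost_usd`, `succeeded`) and the sample data are hypothetical. A real pipeline would need its own definition of success.

```python
from collections import defaultdict
from typing import Optional


def cost_per_successful_task(records: list[dict]) -> dict[str, Optional[float]]:
    """Total spend divided by successful tasks, per customer.

    Returns None for a customer with spend but no successful tasks.
    """
    spend: dict[str, float] = defaultdict(float)
    successes: dict[str, int] = defaultdict(int)
    for r in records:
        spend[r["customer"]] += r["cost_usd"]
        if r["succeeded"]:
            successes[r["customer"]] += 1
    return {
        customer: (spend[customer] / successes[customer]
                   if successes[customer] else None)
        for customer in spend
    }


# Invented log entries for two customers.
sample = [
    {"customer": "acme", "cost_usd": 0.5, "succeeded": False},
    {"customer": "acme", "cost_usd": 0.5, "succeeded": True},
    {"customer": "beta", "cost_usd": 0.25, "succeeded": False},
]
report = cost_per_successful_task(sample)
print(report)
```

Failed attempts still count toward spend here, so a customer whose requests often need retries shows a higher cost per success even if each call is cheap.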

The market will not punish every AI-heavy company. It will punish the ones that confuse adoption with value. The next phase of AI startups will be shaped by teams that can prove not just that their product works, but that every expensive call to a model earns its place.
