DeepInfra, a cloud inference platform that gives developers API access to open-weight models at aggressively low prices, has closed a $107 million Series B with Nvidia among the investors, bringing its total funding to over $133 million. The startup has grown its processing volume more than 8,000-fold since its 2022 seed round and now runs production AI workloads for cost-sensitive developers and enterprises that cannot justify paying hyperscaler margins on every inference call.
The round lands at a moment when the inference layer of the AI stack is moving from an afterthought to a strategic battleground. DeepInfra was founded by Nikola Borisov, Yessenzhar Kanapin, and Georgios Papoutsis, all former engineers at imo.im, a messaging app that served over 200 million monthly active users. Their background in high-throughput, latency-sensitive infrastructure at scale is not incidental to what they built. DeepInfra's core value proposition is simple to explain and hard to execute: take open-weight models from Google, Meta, Mistral, DeepSeek, and others, host them on owned hardware, and serve them through a clean API at prices that undercut OpenAI and Anthropic by a factor of five to ten. The $1-per-million-token rate it was advertising in 2023, against GPT-4 Turbo's $10, was a provocative opening position. The 8,000x growth in processing volume since then suggests the market responded.
Nvidia's participation in the round is the detail that elevates this beyond a standard infrastructure funding story. Nvidia has now invested directly in CoreWeave, Lambda Labs, Baseten, Cursor, Anthropic, and a portfolio of over 100 AI startups. The pattern is consistent enough to read as a deliberate strategy rather than opportunistic deal flow. Each investment deepens the dependency of a key customer or platform on Nvidia hardware, creates an information channel about how frontier AI workloads are being structured and what GPU configurations they require, and builds relationships with the companies most likely to anchor large GPU procurement decisions as they scale. DeepInfra's Series A in April 2025, led by Felicis and Georges Harik, explicitly cited additional Nvidia Blackwell GPU capacity as the primary use of funds. Nvidia investing in the Series B is, among other things, an investment in a company whose growth directly drives Blackwell procurement.
The ecosystem-versus-dependency-building question is worth taking seriously rather than treating as rhetorical. Nvidia's investment arm, NVentures, describes its mandate as supporting the AI ecosystem broadly. The practical effect of that mandate is that independent inference clouds increasingly have Nvidia as a shareholder while simultaneously depending on Nvidia for the hardware that makes their business possible. That is not necessarily a conflict, but it creates a structural dynamic where Nvidia has both financial and informational influence over the infrastructure layer that determines which open-weight models developers can affordably access. DeepInfra's founding logic, that owning hardware is more cost-effective than renting from cloud providers, only holds if the cost and availability of that hardware remain favourable. A shareholder with significant influence over GPU supply and pricing is not a neutral party in evaluating that assumption over time.
For founders choosing where to host their AI inference workloads, the practical comparison between DeepInfra and its competitors comes down to a specific set of tradeoffs. AWS Bedrock, Google Vertex, and Azure AI provide model inference with the security posture, compliance certifications, and enterprise support contracts that regulated industries require. They also carry the margin structures of hyperscaler cloud businesses. DeepInfra, Baseten, Together AI, and similar independent inference clouds offer lower per-token costs, faster model availability for newly released open-weight checkpoints, and API designs built specifically for developer workflows rather than enterprise procurement teams. The tradeoff is that these platforms carry more concentration risk, offer less contractual SLA protection, and now, in some cases, have a hardware supplier as an investor whose interests are not purely aligned with minimising your inference costs.
DeepInfra's growth trajectory is a data point about how the AI infrastructure market is actually stratifying. The hyperscalers are capturing enterprise procurement budgets through bundling, compliance infrastructure, and relationship-driven sales. Independent inference clouds are capturing developer and startup budgets through price, model variety, and API simplicity. The middle segment, established startups and scale-ups with real inference volumes but genuine cost sensitivity, is the territory that both sides are actively competing for, and it is where DeepInfra's combination of owned hardware, Nvidia hardware partnerships, and aggressive pricing puts it in its strongest competitive position. The $107 million Series B gives DeepInfra the capital to expand GPU capacity, support more model families, and develop the enterprise features that move it into that middle segment without abandoning the developer-first positioning that generated its growth.
The deeper market signal in this round is not about DeepInfra specifically. It is that Nvidia is increasingly present at every layer of the AI stack simultaneously: chip designer, hardware supplier, cloud infrastructure investor, enterprise software partner, and now inference cloud shareholder. That vertical integration, achieved through investment rather than acquisition, gives Nvidia a form of market intelligence and influence that is structurally different from what a pure hardware company possesses. Founders building on inference infrastructure should understand that when Nvidia invests in their cloud provider, the provider's GPU supplier gains a financial interest in that provider's product roadmap, pricing decisions, and hardware procurement choices. That is not a reason to avoid platforms Nvidia has invested in. It is a reason to understand the incentive structure of the infrastructure you are building on, because in the AI stack, hardware and capital are increasingly the same thing.