Nvidia pushes deeper into enterprise AI with Nemotron 3 Ultra

Nvidia is turning its AI advantage into a fuller enterprise stack. Nemotron 3 Ultra shows the company wants developers building agents on its models, not only buying its chips.

Nvidia has spent the past few years becoming the company everyone else needs to run AI. With Nemotron 3 Ultra, it is making a more direct claim on what gets built on top of that infrastructure.

The new open model, announced around GTC Taipei on June 1, sits at the high end of Nvidia's Nemotron 3 family and is aimed squarely at long-running agents. These are not simple chatbots. They are systems that plan, call tools, inspect files, write code, remember context and keep working across a chain of tasks. That is where enterprise AI is moving, and Nvidia clearly wants to be more than the GPU supplier behind someone else's model.

According to Nvidia's announcement, Nemotron 3 Ultra is a 550-billion-parameter mixture-of-experts model built for coding, research and enterprise workflows. Nvidia claims up to 5x faster inference and up to 30% lower cost than open frontier models in its class. The company says the model is expected to be available on June 4 through Hugging Face, ModelScope, OpenRouter and build.nvidia.com as Nvidia NIM microservices, and through cloud and inference partners.

That availability matters. A model announcement is useful, but enterprise adoption begins when developers can test it inside the places they already work. Nvidia is trying to shorten that path by placing Ultra near the agent frameworks and deployment channels that engineering teams are already considering.

The real story is not only that Nvidia has another large model. The important part is how tightly the model fits into the rest of the company's software push.

Nemotron 3 Ultra is post-trained for agent platforms and harnesses including Hermes Agent, LangChain Deep Agents, OpenClaw, OpenHands and OpenCode. In plain English, Nvidia is not just publishing weights and hoping developers figure out the rest. It is working closer to the orchestration layer where models become useful agents, with memory, tool use, routing, privacy rules and security controls.

That is why the launch came alongside a broader enterprise agent toolkit. Nvidia also highlighted NemoClaw blueprints, OpenShell secure runtime technology and CUDA-X libraries that can be exposed as skills for agents. Companies including Cadence, Dassault Systèmes, Siemens and Synopsys were named in connection with autonomous engineering workflows, while CrowdStrike and Palantir are using Nemotron models for cybersecurity and operational decision-making systems.

This is a familiar Nvidia pattern. The company does not usually stop at the component. It builds the surrounding stack, then gives developers reasons to stay inside it. CUDA did that for accelerated computing. The question now is whether Nemotron, NIM and the agent tooling can do something similar for enterprise AI software.

Open models are becoming enterprise infrastructure

Nemotron 3 Ultra also arrives in a market where open models have become serious business infrastructure. Meta's Llama models helped normalize the idea that companies could build on open-weight systems rather than send every workload to a closed API. Mistral has pushed the same argument from Europe, with efficient models that appeal to teams worried about cost, control and regulatory exposure.

Nvidia's angle is different. It is not trying to look like a pure model lab. Its advantage is that the model can be tuned around the hardware, libraries and deployment services it already controls. Nemotron 3 uses a hybrid Mamba-Transformer mixture-of-experts architecture, and Nvidia has previously described the family as supporting long context and agentic reasoning with open data, recipes and training infrastructure where it has redistribution rights.

For buyers, that creates a practical tradeoff. Llama and Mistral may remain attractive because they are broadly supported and independent of one hardware vendor's commercial gravity. Nvidia can argue that enterprises running on its GPUs should use models optimized for that environment, especially when inference cost becomes one of the biggest line items in agent deployment.

This is where pricing leverage enters the picture. If a company runs Nvidia GPUs, deploys through NIM, uses Nvidia agent blueprints and chooses Nemotron models, Nvidia gets a deeper relationship than a one-time hardware sale. The customer gets integration. Nvidia gets a software layer that can defend margins even as cloud providers and model labs fight over API pricing.

There is still plenty to prove. Nvidia's performance and cost claims will matter more once developers can compare Ultra in real workloads against Llama, Mistral, Qwen and other strong open models. Enterprises will also need to understand licensing terms, hosting requirements, safety behavior and how well the model handles domain-specific work after tuning.

Even so, the direction is clear. Nvidia is moving from being the company underneath the AI boom to being one of the companies defining the software layer above it. Nemotron 3 Ultra is not just a model release. It is a signal that the next fight in enterprise AI will be about who owns the stack where agents actually get work done.
