Jun 18, 2026 · 2:32 AM

Nvidia's ENPIRE lets AI coding agents train robots to install the GPUs that run AI

Nvidia's GEAR lab, with Carnegie Mellon and UC Berkeley, published ENPIRE this week: a framework in which AI coding agents autonomously write robot training code, test it on real hardware, and iterate until policies work. An eight-robot fleet hit 99% success on tasks including GPU installation with no human supervision, but token costs scale faster than fleet size, raising real questions about the economics of autonomous robotics at industrial scale.

Janet Harrison

A paper from Nvidia, Carnegie Mellon, and UC Berkeley describes a robotics loop where coding agents write, test, and revise robot training code on real hardware, including a GPU installation task that reached a 99% pass@8 success rate.

There's a recursive quality to Nvidia's ENPIRE paper that should make you stop for a moment. Researchers at Nvidia's GEAR lab, working with Carnegie Mellon University and UC Berkeley, describe a system in which AI coding agents use real robots to improve robot training code. The agents write the code, run the trial, read the failure, change the code, and try again. No researcher steps in to steer the loop. One of the tasks was seating a GPU into a motherboard, which is a neat little image of where this industry is heading: AI helping train the robots that may one day install the hardware AI depends on.

According to the paper, ENPIRE stands for Environment, Policy Improvement, Rollout, and Evolution. The name is awkward, but the structure is clear. One part resets the physical scene. Another launches policy-improvement trials. Another evaluates the result across a fleet of robots. The final piece sends the evidence back to coding agents, including Codex, Claude Code, and Kimi Code, so they can inspect logs, consult prior research, and edit their own training scripts. The agents share work through Git, which is exactly the kind of unglamorous detail that makes the paper feel less like a demo and more like a real research workflow.
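To make the four stages concrete, here is a minimal toy sketch of that loop in Python. It is not the paper's code: every name (`reset_scene`, `train_policy`, `rollout`, `revise`, the `effort` knob) is hypothetical, the "robots" are random draws, and the "coding agent" is a one-line stand-in. The Git-based sharing between agents is not modelled.

```python
import random
from dataclasses import dataclass


@dataclass
class Trial:
    recipe: dict          # stand-in for the training script an agent edits
    success_rate: float   # fraction of fleet rollouts that succeeded
    log: str              # evidence handed back to the agent


def reset_scene(robot_id: int) -> None:
    """Environment: placeholder for physically resetting one station."""


def train_policy(recipe: dict) -> float:
    """Policy improvement: toy policy quality in [0, 1] derived from the recipe."""
    return min(1.0, recipe["effort"] / 5)


def rollout(quality: float, fleet_size: int, rng: random.Random) -> float:
    """Rollout: each simulated robot succeeds with probability `quality`."""
    wins = 0
    for robot_id in range(fleet_size):
        reset_scene(robot_id)
        wins += rng.random() < quality
    return wins / fleet_size


def revise(recipe: dict, last: Trial) -> dict:
    """Evolution: stand-in for an agent reading `last.log` and editing the script."""
    return {**recipe, "effort": recipe["effort"] + 1}


def run_loop(fleet_size: int = 8, target: float = 0.99,
             max_iters: int = 20, seed: int = 0) -> list[Trial]:
    rng = random.Random(seed)
    recipe = {"effort": 1}
    history: list[Trial] = []
    for _ in range(max_iters):
        rate = rollout(train_policy(recipe), fleet_size, rng)
        trial = Trial(recipe, rate, f"effort={recipe['effort']} success={rate:.2f}")
        history.append(trial)
        if rate >= target:      # stop once the fleet clears the bar
            break
        recipe = revise(recipe, trial)
    return history


if __name__ == "__main__":
    for t in run_loop():
        print(t.log)
```

The point of the sketch is the shape of the loop, not the numbers: write, run, read the failure, change the recipe, try again, with no human in the body of the `for` loop.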

The headline result is the 99% pass@8 success rate on contact-heavy tasks using an eight-robot fleet of dual-arm YAM stations. The tasks included pin insertion, GPU installation, and cutting a zip tie with a cutter tool. These aren't forgiving jobs. A robot can miss a block by a few millimeters in a lab benchmark and still look impressive on video. A connector slot on a motherboard doesn't care about the video. If the alignment is wrong, the part doesn't seat.
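The article does not spell out how the paper defines pass@8. Assuming it follows the usual meaning borrowed from code-generation benchmarks (the chance that at least one of k attempts succeeds), the standard unbiased estimate from n recorded attempts with c successes can be sketched as follows. The function name and the example counts are illustrative, not taken from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k attempts succeeds,
    given n recorded attempts of which c succeeded (sampling without replacement)."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k attempts include a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical counts: 16 attempts, 4 successes, scored at k = 8.
print(pass_at_k(16, 4, 8))
```

Under this reading, pass@8 is a more forgiving number than a single-attempt success rate, which is worth keeping in mind when comparing the 99% figure with per-trial results reported elsewhere.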

That's why the GPU task is the one people will remember. Not because Nvidia has suddenly built a fully autonomous data center installer. It hasn't. The paper doesn't claim that. What it does show is narrower and more interesting: coding agents can run a physical robotics research loop long enough to find working training recipes without a human repeatedly rewriting the experiment. Frankly, that's the part worth paying attention to. The spectacle is the GPU. The real story is the research labor being automated around it.

The scaling results were useful, but they weren't magic. Moving from one agent to eight cut the time to solve the Push-T task from about five hours to about two, according to the researchers. Pin insertion fell from more than 90 minutes to roughly 40. If you're running a robotics lab, those numbers matter. A slow policy-development cycle eats days quickly, and every failed run costs robot time, compute time, and human patience.
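A quick back-of-the-envelope calculation shows why those gains fall short of magic. The sketch below uses the article's rounded figures and assumes both comparisons are one agent versus eight; the function names are illustrative.

```python
def speedup(t_single: float, t_fleet: float) -> float:
    """How many times faster the fleet finished than a single agent."""
    return t_single / t_fleet


def efficiency(t_single: float, t_fleet: float, n_agents: int) -> float:
    """Speedup divided by agent count; 1.0 would be perfect linear scaling."""
    return speedup(t_single, t_fleet) / n_agents


# Rounded figures from the article: Push-T in hours, pin insertion in minutes.
for task, t1, t8 in [("Push-T", 5.0, 2.0), ("pin insertion", 90.0, 40.0)]:
    print(f"{task}: {speedup(t1, t8):.2f}x speedup, "
          f"{efficiency(t1, t8, 8):.0%} scaling efficiency")
```

On these rough numbers, eight agents deliver a little over twice the speed of one, so each added agent contributes well under a full agent's worth of progress.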

The catch is the token bill. The paper measures this with Mean Robot Utilization and Mean Token Utilization, two dry terms that point to a very practical problem. As more agents join the fleet, each one spends more time reading logs, summarizing other branches, and coordinating with the work around it. Robot utilization per agent falls while token use rises. Faster convergence is still useful, but it isn't free.

This is where the investor-friendly version of the story needs a little discipline. ENPIRE doesn't prove that physical automation suddenly becomes cheap. It proves that some robotics research can become faster and more self-directed if you can afford the robots, the GPUs, and the agent traffic required to keep the loop running. Operators with deep compute budgets will see the attraction first. Smaller labs may find that a simpler setup gives them a better cost-per-policy, even if it takes longer on the wall clock.

The paper also keeps one old robotics problem firmly in view: simulation can flatter you. On the Push-T benchmark, all three coding agents solved the task in simulation. Two of the three failed when moved onto physical hardware. The researchers point to robot dynamics, friction, and object movement that the simulated environment didn't capture cleanly. Anyone who has watched a robot behave perfectly on screen and badly on a table will recognize the problem at once.

ENPIRE's answer is to work with the real robot from the beginning. That makes the experiments slower and messier than pure simulation, but it also makes the result harder to dismiss. A virtual policy that survives only in software is a promise. A policy that survives repeated contact with plastic, metal, cable ties, connector slots, and actual robot arms has done something more concrete.

The planned open-source release is another important piece, although Nvidia hasn't announced a release date. If the full codebase lands as described, the next test won't be whether ENPIRE works inside Nvidia's own lab setup. It will be whether other robotics groups can adapt it to different arms, different sensors, and less controlled tasks. Carnegie Mellon and UC Berkeley's names on the paper help, but reproducibility isn't proven by a logo line. It's proven when someone else makes the thing work.

Here's the thing: the paper's strongest claim isn't that robots can now build AI infrastructure. That's too broad. The stronger claim is that coding agents can take over a meaningful slice of the repetitive experimentation that slows physical AI research down. They can fail, inspect the failure, change the code, and try again while the researcher is somewhere else.

For anyone building physical AI systems, the question now is not whether this kind of loop will be tried. It will. The open question is whether the coordination cost falls fast enough to make it useful outside well-funded labs. Eight robot arms solving a task is impressive. Thousands of machines working through real factory variance is a different test entirely, and the ENPIRE paper doesn't answer it yet.


Janet Harrison has over 16 years of experience in the financial services industry, giving her a deep understanding of how news affects financial markets. An early adopter of blockchain technology and digital currencies, she is an active holder and trader who spends most of her time analyzing blockchain projects and reports and following new and upcoming initiatives in the industry. She holds a master's degree in economics, and her previous roles include investment banking.