Origin Lab is betting that video games will become a serious data source for AI systems that need to understand worlds, not just words.
The next fight in AI training data may not be over books, websites, or YouTube clips. It may be over the material sitting inside game engines: camera paths, player inputs, depth maps, physics states, objects, scenes, and the structured information that makes virtual worlds measurable.
Origin Lab has raised $8 million to help video game companies license that kind of data to builders of AI world models, according to a report from TechCrunch. The San Francisco startup is positioning itself as a broker and infrastructure layer between rights holders who own valuable interactive content and AI labs that need cleaner, richer data than scraped web video can provide.
That sounds narrow at first. It is not. World models are one of the most competitive areas in AI because they promise systems that can reason about environments, movement, space, and consequences. Large language models learned from text. Video models learned from flat media. A model built to operate in robotics, simulation, gaming, autonomous systems, or 3D design needs something deeper. It needs to know what happened, where it happened, what caused it, and what changed next.
Origin Lab says it captures and delivers rights-cleared multimodal content for AI training, spanning video game play capture, 3D environments, TV and film, and animation. Its company materials describe catalogs enriched with queryable metadata, structured annotations, and clear licensing. The important phrase is not just "rights-cleared." It is "captured at the source."
Video games are useful because they are already structured worlds. A game engine knows where the camera is, what the player did, which objects are present, how physics behaved, and how the environment responded. A Twitch clip or gameplay video shows the surface. The engine data explains the scene underneath.
That difference matters for world-model builders. Origin Lab job postings describe synchronized streams of video, depth, telemetry, and input captured directly from inside the engine. They also refer to camera telemetry, depth buffers, world actors, deterministic game state resets, Unreal, Unity, and proprietary engines. This is not the same as downloading a gameplay video and tagging it afterward. It is closer to turning a game world into a controlled data factory.
For AI labs, that could be valuable because the open web is messy, legally contested, and thin on the kind of causal information that spatial systems need. A video can show a car turning. Engine-level data can show the camera angle, the road geometry, the objects nearby, the control input, the timing, and the result. That is a different training signal.
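To make that distinction concrete, here is a minimal Python sketch of what one synchronized engine-level record might look like and how a cause-and-effect pair could be read from it. The schema, the field names, and the `turns_after_input` helper are all hypothetical illustrations, not Origin Lab's actual format or tooling.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FrameRecord:
    """One engine tick; every stream shares the same timestamp (hypothetical schema)."""
    timestamp_ms: int
    camera_position: Tuple[float, float, float]  # world-space x, y, z
    steering_input: float                        # control input, -1.0 (left) to 1.0 (right)
    visible_objects: List[str]                   # names of objects present in the scene
    mean_depth_m: float                          # simple summary of the depth buffer


def turns_after_input(
    frames: List[FrameRecord], threshold: float = 0.5
) -> List[Tuple[int, float]]:
    """Pair each strong steering input with the sideways camera shift on the next frame."""
    pairs = []
    for current, following in zip(frames, frames[1:]):
        if abs(current.steering_input) >= threshold:
            dx = following.camera_position[0] - current.camera_position[0]
            pairs.append((current.timestamp_ms, dx))
    return pairs


# Made-up sample values, for illustration only.
frames = [
    FrameRecord(0, (0.0, 0.0, 0.0), 0.0, ["road", "car"], 12.0),
    FrameRecord(33, (0.0, 0.0, 1.0), 0.8, ["road", "car"], 11.5),
    FrameRecord(66, (0.5, 0.0, 2.0), 0.0, ["road", "car", "sign"], 11.0),
]
print(turns_after_input(frames))  # [(33, 0.5)]
```

A flat gameplay video would leave a model to infer the steering input and the camera shift from pixels. In a record like this, both are stored explicitly and aligned by timestamp.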
There is also a business reason this is happening now. AI companies have already made licensing deals with media owners, publishers, stock image libraries, and data providers. Game companies have watched that market develop while sitting on interactive worlds that may be even more useful for the next generation of models. If Origin Lab can package those worlds into usable datasets, studios may get a new revenue stream without simply handing their intellectual property to model developers for free.
The rights question becomes the product
Origin Lab is trying to make provenance part of the value. Its LinkedIn profile says every asset is rights-cleared and never scraped, and that the company was founded by a team with backgrounds at Twitch and Amazon and in AI and the video game industry. Colin Carrier, listed as a co-founder, has written that world models need structured, interactive, multimodal data from environments where actions produce consequences and states change over time.
That is also the moat the company appears to be building. Anyone can argue that games are rich training environments. Fewer companies can persuade rights holders to license the material, build the capture tools, preserve the metadata, and deliver it in a form AI researchers can actually use. The legal wrapper and the technical wrapper have to work together.
This is where Origin Lab will need to prove the model. Game studios are careful with assets, source access, anti-cheat systems, player data, and brand control. AI labs, meanwhile, want scale, consistency, and pricing that makes sense compared with synthetic data, open datasets, simulation platforms, and internal capture pipelines. Origin Lab sits in the middle of those expectations.
The timing helps. World Labs raised $1 billion in February with backing from Autodesk, Nvidia, AMD, Fidelity, and others to bring world models into 3D workflows. AMI Labs, co-founded by Yann LeCun, raised $1.03 billion in March to build world models. Runway has also been pushing beyond video generation toward models that better understand physical environments. When that much capital moves into a category, the supporting data market usually follows.
Origin Lab is still small, and an $8 million round does not make it the default marketplace for game data. But it does show where the pressure is moving. AI companies need training data that is cleaner, more structured, and easier to defend. Rights holders want to participate in the upside instead of watching their work become raw material for someone else's model.
The practical question now is whether licensed game data becomes a real market or stays a specialized input for a handful of frontier labs. If world models keep moving from research demos into products for games, robotics, film, architecture, and simulation, the demand for structured virtual-world data will not stay theoretical for long. Origin Lab is making a simple bet: the next valuable AI dataset is not just what the internet has already shown us, but what interactive worlds can measure from the inside.