Major AI Labs Have Agreed to Give the US Government Early Model Access and the Arrangement Is Already Reshaping Who Controls Frontier AI Release Cadence

OpenAI, Google DeepMind, Anthropic, and several other frontier AI laboratories have agreed to provide US government agencies with early access to their models before public release for safety and national security evaluation, marking the point at which pre-release government review transitions from a voluntary commitment made under the Biden executive order on AI to an operating assumption embedded in how the most powerful AI companies in the world manage their release process, with implications that extend well beyond the labs directly involved.

The arrangement Bloomberg describes differs from the prior executive order commitments in a structurally important way. The Biden administration's October 2023 executive order required AI companies developing models above a specified compute threshold to report safety test results to the federal government and share information about red-teaming outcomes. That requirement was framed primarily as information sharing after internal evaluation, not as a pre-release access window that allows government evaluators to conduct their own independent testing before the model becomes available to the public. The new agreement creates an earlier intervention point: government evaluators receive model access before the release date, allowing independent assessment of capabilities, risks, and national security implications that is not dependent on the lab's own characterisation of what its model can do. The difference is the same as the difference between a pharmaceutical company submitting its own clinical trial data and an FDA inspector having independent access to test the drug before it ships. The underlying dynamic shifts from voluntary disclosure to structured external review, even if the current arrangement remains technically voluntary rather than legally mandated.

Which federal body conducts the evaluations is the detail that most determines how consequential this arrangement becomes in practice. The US AI Safety Institute, housed within the National Institute of Standards and Technology, has been the primary government entity building technical evaluation capacity for frontier AI models, and it conducted pre-deployment evaluations of GPT-4o and Claude 3 Opus under the framework established by the prior administration. The current Trump administration has maintained NIST's AI Safety Institute while reorienting its emphasis from safety toward competitiveness and national security considerations, which reflects a different risk prioritisation but not a complete discontinuation of the evaluation function. The arrangement Bloomberg reports appears to involve both the AI Safety Institute's technical evaluation capacity and broader national security review by agencies including the Department of Defense and the intelligence community, which is the dimension that most distinguishes it from the prior executive order framework. A model evaluation that includes national security agencies assessing offensive capability risks, dual-use potential, and adversarial applications is a qualitatively different process than a safety institute running standardised benchmarks and red-team protocols.

The incumbency advantage this creates is real and worth naming directly rather than treating as a secondary consideration. Running a pre-release government evaluation requires a lab to have legal and policy teams capable of managing government access protocols, information security processes that meet federal requirements for handling classified or sensitive evaluation results, engineering resources to provide evaluation environments that give government testers meaningful access without exposing production infrastructure or training code, and the institutional relationships that make the evaluation process operate smoothly rather than generating friction that delays release on unpredictable timelines. OpenAI, Google DeepMind, and Anthropic have invested substantially in Washington policy presence, have cleared personnel who can interact with national security reviewers, and have built the internal processes needed to execute this kind of arrangement. A smaller lab releasing a model that approaches frontier capability does not have those resources and cannot quickly acquire them. The practical effect is that the labs inside the agreement can navigate pre-release government review as a manageable operational step, while labs outside the agreement face a choice between building equivalent infrastructure at significant cost, releasing without government review and accepting the reputational and regulatory risk that implies, or simply not releasing models above the capability threshold where the review expectation applies.

The open-source dimension creates a specific tension that the current arrangement does not resolve. Meta's Llama series, Mistral's open-weight releases, and the growing ecosystem of open-source frontier models operate on a release cadence and transparency model that is structurally different from the closed-model API approach of the companies inside the government review agreement. An open-source model that is released publicly by definition cannot be subject to a meaningful pre-release evaluation window, because the weights are available immediately to anyone who downloads them. The government can evaluate the model after release, but the early-access advantage that the arrangement provides for assessing novel capability risks is absent. This creates a regulatory asymmetry that the current voluntary framework leaves unresolved: closed-model labs participate in pre-release review and gain the trust signal and institutional relationship that comes with it, while open-source releases are evaluated after the fact if at all, which may create downstream pressure on open-source release norms if a future open-source model is associated with a high-profile misuse incident that a pre-release evaluation might have flagged.

For startups building on frontier APIs, the pre-release government evaluation arrangement creates a downstream effect that will be visible in release timelines rather than in any direct regulatory requirement. If government evaluation adds two to four weeks to a frontier model release cycle, and if evaluation findings occasionally result in capability modifications or delayed release while a specific risk is mitigated, the API dependencies that startups build on top of frontier models become slightly less predictable. A startup that has built a product feature on a specific model capability and has communicated a launch date based on an expected model release may find that the government evaluation process introduces uncertainty that the lab cannot fully disclose publicly because the evaluation findings involve sensitive national security considerations. That timeline uncertainty is manageable if it is anticipated and built into product development planning, but startups that assume frontier model releases will arrive on the schedule implied by leaked benchmarks and conference announcements will periodically be surprised by evaluation-related delays that the labs cannot explain in detail. The practical mitigation is to build on model capabilities that are already released and evaluated rather than racing to launch on day one of a new model release, which is good product engineering practice regardless of the government evaluation dynamic but becomes more important as the review process formalises.

The arrangement also creates a new trust signal in enterprise and government procurement that will widen the competitive distance between frontier labs inside the review process and those outside it. An enterprise buyer in a regulated industry, a bank, a healthcare system, a defence contractor, evaluating which AI model to deploy in production workflows will increasingly treat government pre-release evaluation as a credibility indicator that reduces their own due diligence burden. A model that has been evaluated by federal agencies and released without modification signals a risk profile that the buyer can reference in their own compliance documentation. A model that has not been through that process requires the buyer to conduct equivalent evaluation independently or accept the compliance gap. As government-reviewed models become the default expectation in procurement decisions for sensitive use cases, the labs inside the evaluation arrangement gain a sales advantage in regulated markets that compounds over time and that smaller competitors cannot easily overcome without equivalent government relationships.

Also read: Coinbase Is Cutting 14 Percent of Its Staff and AI Is Now Both the Explanation and the Strategy That Makes the Cuts Credible • Google Chrome Quietly Installed a 4 GB AI Model on User Devices and the Backlash Is About Platform Power More Than Storage Space • A Reddit Post About €800 in Unexpected Anthropic Charges Raises Real Questions About How AI Billing Systems Handle Gift Flows and Fraud Disputes