DeepSeek has released DeepEP V2 and TileKernels, two open-source infrastructure tools that target the systems-level bottlenecks of training and serving massive AI models, putting fresh pressure on proprietary solutions from NVIDIA and hyperscalers.
DeepSeek built its reputation on releasing model weights. Today it's doing something arguably more consequential: open-sourcing the plumbing. The Chinese AI research organization dropped DeepEP V2 and TileKernels on April 23, a pair of low-level software tools aimed squarely at the engineering constraints that emerge when you're training trillion-parameter models across thousands of GPUs. This isn't a new model announcement. It's an infrastructure play, and the timing makes clear that DeepSeek sees systems-level optimization as the next competitive front in frontier AI.
DeepEP V2 is the second generation of DeepSeek's distributed communication library, built to address a specific and painful bottleneck in Mixture-of-Experts training. MoE architectures, which DeepSeek helped popularize through its V3 model release, activate only a fraction of their parameters for any given token. A learned router picks a small number of experts (feed-forward sub-networks, typically spread across many GPUs) for each token, so GPUs must constantly exchange data across nodes: first to dispatch tokens to the GPUs hosting their chosen experts, then to combine the experts' outputs back into place. That all-to-all communication pattern is notoriously expensive. DeepEP V2 claims to cut latency specifically in these dispatch and combine phases, which is where MoE training tends to stall at scale.
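To make the dispatch and combine phases concrete, here is a minimal single-process sketch of one MoE layer. It is not DeepEP's API or DeepSeek's routing scheme: the function names (route, dispatch, combine, moe_layer) are hypothetical, it assumes softmax top-k gating with the selected weights renormalized (one common choice), and the in-memory dictionaries stand in for what a real cluster does with all-to-all transfers between GPUs.

```python
import math
from collections import defaultdict


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, k):
    """For each token, return its top-k (expert_id, gate_weight) pairs."""
    routes = []
    for logits in router_logits:
        probs = softmax(logits)
        top = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:k]
        norm = sum(probs[e] for e in top)  # renormalize over the chosen experts
        routes.append([(e, probs[e] / norm) for e in top])
    return routes


def dispatch(tokens, routes):
    """Dispatch phase: group token copies by destination expert.

    In a real cluster each bucket is sent to the GPU hosting that expert.
    """
    buckets = defaultdict(list)
    for t_idx, token_routes in enumerate(routes):
        for expert_id, _ in token_routes:
            buckets[expert_id].append((t_idx, tokens[t_idx]))
    return buckets


def combine(expert_outputs, routes, num_tokens, dim):
    """Combine phase: send outputs back and sum them, weighted by gate value."""
    gate = {(t, e): w for t, token_routes in enumerate(routes) for e, w in token_routes}
    out = [[0.0] * dim for _ in range(num_tokens)]
    for expert_id, results in expert_outputs.items():
        for t_idx, vec in results:
            w = gate[(t_idx, expert_id)]
            for d in range(dim):
                out[t_idx][d] += w * vec[d]
    return out


def moe_layer(tokens, router_logits, experts, k=2):
    routes = route(router_logits, k)
    buckets = dispatch(tokens, routes)
    # Each expert only sees the tokens routed to it.
    expert_outputs = {
        e: [(t, experts[e](x)) for t, x in batch] for e, batch in buckets.items()
    }
    return combine(expert_outputs, routes, len(tokens), len(tokens[0]))


# Usage: four toy "experts" that just scale their input by 1, 2, 3, 4.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
tokens = [[1.0, 1.0], [2.0, 0.0]]
router_logits = [[0.0, 0.0, 10.0, 10.0], [5.0, 0.0, 0.0, 5.0]]
print(moe_layer(tokens, router_logits, experts))  # [[3.5, 3.5], [5.0, 0.0]]
```

Every token leaves its home GPU twice, once on the way out to its experts and once on the way back. That round trip is why the latency of these two phases, rather than the expert computation itself, can dominate at scale.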
TileKernels takes a different approach to the same underlying goal. Rather than optimizing how GPUs talk to each other, it optimizes what each GPU actually does with its compute time. The release is a collection of CUDA kernels for matrix operations, but written to bypass standard libraries like NVIDIA's CUTLASS by targeting the tile-level architecture of Hopper and Blackwell GPUs directly. Assembly-level instruction tuning of this kind allows for non-standard tiling strategies that wring more floating-point throughput out of the heavy computation phases of transformer inference. It's the kind of work that typically happens inside NVIDIA or inside a hyperscaler's internal ML systems team, not in the open.
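The idea of tiling can be shown without a GPU. The sketch below is plain Python, not TileKernels code, and its tile_m, tile_n and tile_k parameters are hypothetical names. It computes a matrix product one output tile at a time, accumulating partial products over slices of the shared dimension. A CUDA kernel follows the same structure, with thread blocks owning output tiles, input tiles staged in shared memory, and the inner loops mapped to tensor-core instructions. The source does not describe which tiling strategies TileKernels actually uses.

```python
def matmul_naive(a, b):
    """Reference C = A @ B for lists of lists."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)] for i in range(n)]


def matmul_tiled(a, b, tile_m=2, tile_n=2, tile_k=2):
    """Compute C = A @ B one (tile_m x tile_n) output tile at a time.

    Each output tile accumulates partial products over tile_k-wide slices
    of the shared dimension; edge tiles are simply smaller.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile_m):
        for j0 in range(0, m, tile_n):
            # Accumulator for one output tile (registers, on a GPU).
            acc = [[0.0] * min(tile_n, m - j0) for _ in range(min(tile_m, n - i0))]
            for p0 in range(0, k, tile_k):
                # Stage the current A and B tiles (shared memory, on a GPU).
                a_tile = [row[p0:p0 + tile_k] for row in a[i0:i0 + tile_m]]
                b_tile = [row[j0:j0 + tile_n] for row in b[p0:p0 + tile_k]]
                for i, a_row in enumerate(a_tile):
                    for p, a_val in enumerate(a_row):
                        for j, b_val in enumerate(b_tile[p]):
                            acc[i][j] += a_val * b_val
            # Write the finished tile back to C.
            for i, row in enumerate(acc):
                c[i0 + i][j0:j0 + len(row)] = row
    return c


a = [[i * 5 + j for j in range(5)] for i in range(3)]      # 3 x 5
b = [[(i + j) % 3 for j in range(4)] for i in range(5)]    # 5 x 4
print(matmul_tiled(a, b, tile_m=2, tile_n=3, tile_k=2) == matmul_naive(a, b))  # True
```

The result is identical for any tile shape. What changes on real hardware is how often each element of A and B is reloaded versus reused from fast on-chip memory. That trade-off is the knob that hand-tuned, non-standard tiling strategies turn to extract more throughput.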
The open-source pressure on proprietary stacks
The significance here extends beyond DeepSeek's own training pipeline. By releasing both tools publicly, DeepSeek gives the broader open-source community the means to train massive MoE models more efficiently on clusters they assemble and operate themselves. That directly undermines one of the stickiest value propositions of NVIDIA's DGX Cloud and similar closed-stack inference runtimes: the argument that you need a vertically integrated, proprietary environment to get reliable performance at frontier scale. If the communication and compute layers are open, that argument gets harder to sustain.
This also reflects a broader shift in where the real engineering leverage in AI now lives. For most of the past few years, the race was about model architecture and data. The organizations that cracked attention mechanisms, scaling laws, and instruction tuning had the edge. That gap has narrowed considerably. What separates competitive frontier labs today is increasingly their ability to run efficiently at scale, and that's a software infrastructure problem as much as a research one. DeepSeek is betting that releasing these tools builds goodwill and ecosystem momentum while simultaneously demonstrating that its engineering bench is genuinely world-class.
For investors and operators watching the AI infrastructure market, the practical takeaway is worth sitting with. Every release like this chips away at the moat that premium closed-stack vendors have been defending. It won't happen overnight, and enterprise buyers will remain cautious about stitching together open components for mission-critical workloads. But the direction of travel is clear. Watch whether Western labs respond by accelerating their own open infrastructure contributions, or whether they retreat further into proprietary APIs as a defensive posture. DeepSeek just made the latter strategy a little more expensive to sustain.