Skymizer's HTX301 fits a 700B-parameter model on a single PCIe card, and that changes the on-prem AI calculus

Taiwanese AI compiler and silicon company Skymizer announced the HTX301 ahead of Computex 2026. It is a single PCIe card packing six HTX301 chips and 384GB of memory, and Skymizer says it can run inference on 700-billion-parameter models locally at roughly 240 watts. If that holds, it removes the need for the GPU clusters, NVLink interconnects, and intensive cooling systems that previously made frontier-scale inference a hyperscaler-only proposition.

The architectural choice here is deliberate and worth understanding before reaching for the performance claims. Skymizer built the HTX301 as a decode-first chip, meaning it is optimised specifically for the memory-bandwidth-intensive token generation phase that dominates real-world inference latency, not the compute-dense prefill stage where Nvidia GPUs operate most efficiently. The split-phase logic means HTX301 cards are designed to work alongside existing GPU infrastructure for prefill, while handling decode where memory capacity and bandwidth matter most. That framing positions the card as an inference efficiency layer, not a general-purpose GPU replacement, which is a more credible claim than trying to beat Nvidia at everything simultaneously.
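The decode-first argument can be made concrete with a back-of-envelope sketch. The Python below is illustrative and is not Skymizer's method. It assumes a dense model in which every weight is read once per forward pass and contributes about two floating-point operations per token, and it ignores KV-cache traffic, batching across requests, and sparse architectures. The function names, the 4-bit weight size, and the 1 TB/s bandwidth figure are hypothetical placeholders; the announcement details covered here include no bandwidth number.

```python
def arithmetic_intensity(tokens_per_weight_pass: int, bytes_per_weight: float) -> float:
    """Approximate FLOPs performed per byte of weights read.

    Assumes a dense model: each weight costs ~2 FLOPs (multiply + add)
    for every token processed in the same pass over the weights.
    """
    return 2.0 * tokens_per_weight_pass / bytes_per_weight


def decode_tokens_per_second_ceiling(
    params: float, bytes_per_weight: float, bandwidth_bytes_per_s: float
) -> float:
    """Upper bound on single-stream decode speed when every generated token
    requires streaming all weights from memory once."""
    return bandwidth_bytes_per_s / (params * bytes_per_weight)


if __name__ == "__main__":
    BYTES_PER_WEIGHT = 0.5      # assumption: 4-bit quantized weights
    ASSUMED_BANDWIDTH = 1e12    # assumption: 1 TB/s, NOT a published HTX301 spec

    # Prefill processes the whole prompt in one pass over the weights.
    print("prefill intensity:", arithmetic_intensity(2048, BYTES_PER_WEIGHT))
    # Decode processes one new token per pass over the weights.
    print("decode intensity: ", arithmetic_intensity(1, BYTES_PER_WEIGHT))
    print(
        "decode ceiling (tokens/s):",
        decode_tokens_per_second_ceiling(700e9, BYTES_PER_WEIGHT, ASSUMED_BANDWIDTH),
    )
```

Under these assumptions, prefill does thousands of times more arithmetic per byte fetched than decode does, which is why decode speed tracks memory bandwidth and capacity while prefill tracks raw compute.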

The underlying platform is HyperThought, a software and hardware co-design architecture Skymizer first introduced at Computex 2025. HyperThought runs on LISA, a proprietary language instruction set architecture optimised for transformer inference, and includes a unified software stack with a KV-cache manager, a phase-aware scheduler, and a dynamic placement engine that rebalances prefill and decode pools in real time as workloads shift. That software layer matters because raw silicon claims without a production-ready runtime are common in the challenger chip market and rarely survive contact with enterprise deployment requirements. Skymizer is presenting the HTX301 as a full-stack product, not just a chip specification sheet.

The target market spans enterprise inference appliances, AI workstations, edge servers, and smart NAS systems, with the company also citing use cases including transcription, translation, visual understanding, and multimodal AI at the edge. The 700B-parameter capability at 240W per card is the headline number because it directly addresses the operational problem most on-premises AI buyers face. Running Llama 3.1 405B or a model of comparable scale today requires multiple H100 GPUs connected by high-speed fabric, demanding data center cooling, significant power headroom, and a capital expense that only large enterprises or well-funded startups can absorb. A single PCIe card that fits a standard server chassis and draws about the power of a high-end workstation GPU changes that equation significantly, if the performance holds at production scale.
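A rough sizing calculation shows why the 384GB figure and the multi-GPU comparison both hang together. The sketch below is an assumption-laden estimate, not a vendor specification: it counts weight storage only (no KV cache, activations, or runtime overhead), uses decimal gigabytes, and treats the weight precision as a free parameter because the announcement does not say which precision the 700B claim assumes. The helper names are hypothetical; the 80GB figure is the memory capacity of a standard H100.

```python
import math


def weight_footprint_gb(params: float, bits_per_weight: int) -> float:
    """Storage needed for the weights alone, in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


def devices_needed(footprint_gb: float, device_memory_gb: float) -> int:
    """Minimum number of devices whose combined memory holds the weights."""
    return math.ceil(footprint_gb / device_memory_gb)


if __name__ == "__main__":
    CARD_MEMORY_GB = 384   # from the announcement
    H100_MEMORY_GB = 80    # standard H100 capacity

    # 700B parameters at several candidate precisions (precision is an assumption).
    for bits in (16, 8, 4):
        size = weight_footprint_gb(700e9, bits)
        print(f"700B @ {bits}-bit: {size:.0f} GB, fits on one card: {size <= CARD_MEMORY_GB}")

    # A 405B-parameter model on H100s, weights only.
    for bits in (16, 8):
        size = weight_footprint_gb(405e9, bits)
        print(f"405B @ {bits}-bit: {size:.0f} GB, H100s needed: {devices_needed(size, H100_MEMORY_GB)}")
```

By this estimate, a 700B-parameter model fits in 384GB only at roughly 4 bits per weight, which leaves limited headroom for KV cache, so the supported precision and context length are worth checking when full specifications are published.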

For SF readers, the HTX301 announcement points to a structural shift in where AI inference economics are contested. Nvidia dominates cloud GPU training and prefill-heavy inference. The long tail of private inference, on-premises deployment, agentic workloads that run continuously, and startup experimentation with large open models is a different market. It is more price-sensitive, more constrained by power and space, and more interested in data sovereignty than hyperscaler customers who sign GPU cluster reservations years in advance. That is exactly the wedge Skymizer is targeting, and it is not alone. Groq's LPU architecture targets inference speed. Cerebras targets very large model capacity. Tenstorrent, SambaNova, and a growing list of others are all trying to own specific segments of the inference market that Nvidia's pricing and allocation model leaves underserved.

The Taiwan angle matters separately. TSMC manufactures the chips that power the AI industry, but Taiwan has historically stayed upstream of the branded accelerator market, supplying the silicon that becomes Nvidia H100s, AMD MI300Xs, and Apple M-series chips rather than competing on finished AI products. Skymizer's HTX301 represents a genuine push into branded accelerator territory from a Taiwanese AI company, timed to a moment when enterprise buyers are actively looking for alternatives. That combination of deep manufacturing knowledge, proximity to the supply chain, and software co-design expertise is not easy to replicate quickly. Whether the HTX301 delivers on its specifications at volume, and at what price, will determine whether Skymizer becomes a real challenger or a compelling prototype that never reaches enterprise procurement lists.

Pricing and commercial availability were not announced in the April preview, with the full product reveal positioned for Computex 2026 in late May. That timing makes sense. Computex is where enterprise hardware decisions get made for the following buying cycle, and Skymizer will be competing for attention alongside every major GPU and AI chip vendor in the industry. For founders building AI products that depend on inference costs, the HTX301 is worth watching not because it solves everything today, but because it signals how quickly memory-capacity-first inference hardware is maturing outside Nvidia's ecosystem.
