Skymizer crams 700B LLMs onto one low-power PCIe card

Taiwan's Skymizer is making a bold claim for AI inference: ultra-large models can run on a single low-power PCIe card, without the cost and complexity of GPU clusters.

Skymizer Taiwan has previewed HTX301, the first reference chip for its HyperThought platform, with a pitch aimed directly at one of enterprise AI's most expensive problems. According to the company announcement distributed by PR Newswire, a single PCIe card using six HTX301 chips and 384 GB of memory can run inference for models with up to 700 billion parameters at about 240 watts. If that holds up in real deployments, it would give companies a new path to running large models on-premises without building around Nvidia GPU clusters, NVLink-style interconnects, or heavy cooling infrastructure.
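A quick back-of-the-envelope check shows what those figures imply about weight precision. The sketch below uses only the numbers quoted in the announcement (384 GB, 700 billion parameters, six chips, about 240 watts). The function name, the use of decimal gigabytes, and the assumption that all card memory is available for weights are illustrative simplifications, not anything Skymizer has stated.

```python
def bits_per_parameter(memory_gb: float, params_billions: float) -> float:
    """Upper bound on average bits per weight if all memory held weights.

    Assumes decimal gigabytes (1 GB = 1e9 bytes) and ignores KV cache and
    activation memory; both are simplifying assumptions for illustration.
    """
    memory_bytes = memory_gb * 1e9
    params = params_billions * 1e9
    return memory_bytes * 8 / params


# Figures quoted in the announcement.
CARD_MEMORY_GB = 384
MAX_PARAMS_BILLIONS = 700
CHIPS_PER_CARD = 6
CARD_POWER_WATTS = 240

budget_bits = bits_per_parameter(CARD_MEMORY_GB, MAX_PARAMS_BILLIONS)
memory_per_chip_gb = CARD_MEMORY_GB / CHIPS_PER_CARD
power_per_chip_w = CARD_POWER_WATTS / CHIPS_PER_CARD

print(f"Bits available per parameter: {budget_bits:.2f}")
print(f"Memory per chip: {memory_per_chip_gb:.0f} GB")
print(f"Power per chip: {power_per_chip_w:.0f} W")
```

Under these assumptions the budget works out to roughly 4.4 bits per parameter, so a 700-billion-parameter model would need its weights stored at around 4-bit precision or lower. For comparison, 16-bit weights alone would take about 1,400 GB.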

The design is built around a practical distinction that matters more as AI workloads move from demos to daily use. Large language model inference has two phases: prefill, where the prompt is processed, and decode, where the model generates tokens one by one. GPUs can handle both, but decode-heavy workloads often run into memory bandwidth limits, because each new token requires reading the model weights and the KV cache from memory. Skymizer says HyperThought addresses that by pairing decode-first silicon with software orchestration, including a KV-cache manager, phase-aware scheduler, and dynamic placement engine. Its LISA v3 instruction set is also designed for transformer workloads, with support for multimodal models that handle text and images as well as agent-style tasks.
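The bandwidth limit on decode can be illustrated with a simple roofline-style estimate: if generating one token means reading roughly all of a dense model's weights once, single-stream decode speed cannot exceed memory bandwidth divided by weight size. The model size and bandwidth values below are hypothetical placeholders chosen for illustration; the announcement does not give bandwidth figures for HTX301, and the function name is invented for this sketch.

```python
def max_decode_tokens_per_second(weight_bytes: float,
                                 bandwidth_bytes_per_s: float) -> float:
    """Bandwidth-bound ceiling for batch-size-1 decode of a dense model.

    Assumes every weight is read once per generated token and ignores
    KV-cache reads and compute time, so real throughput would be lower.
    """
    if weight_bytes <= 0 or bandwidth_bytes_per_s <= 0:
        raise ValueError("weight size and bandwidth must be positive")
    return bandwidth_bytes_per_s / weight_bytes


# Hypothetical model: 70 billion parameters at 4 bits (0.5 bytes) each.
weight_bytes = 70e9 * 0.5

# Hypothetical memory bandwidths in GB/s, not vendor specifications.
for bandwidth_gb_s in (500, 1000, 2000):
    ceiling = max_decode_tokens_per_second(weight_bytes, bandwidth_gb_s * 1e9)
    print(f"{bandwidth_gb_s} GB/s -> at most {ceiling:.1f} tokens/s per stream")
```

The estimate applies to a single request on a dense model. Batching several requests amortizes each weight read across more tokens, and mixture-of-experts models read only part of their weights per token, which is why scheduling and KV-cache management matter as much as raw bandwidth.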

That is why the power and memory claims matter. Ultra-large models usually force companies into expensive infrastructure decisions before they even know whether the application will justify the cost. Skymizer says HyperThought can scale from one chip to six chips on a card, with memory configurations from 32 GB to 384 GB, supporting models from 4 billion to 700 billion parameters. The same IP can also be packaged for SoCs, which points to a broader market than data centers: edge servers, AI workstations, smart storage systems, vehicles, robots, and other environments where privacy, latency, or data control make cloud inference less attractive.

The economics are the real story. Cloud inference is easy to start but can become a recurring tax as usage grows, especially for internal copilots, code assistants, RTL generators, and customer-facing agents that run all day. On-premises inference shifts that burden toward hardware, power, maintenance, and utilization. Skymizer is arguing that a purpose-built card can make that tradeoff more attractive by reducing the amount of infrastructure needed to serve large models privately. It is not the same as making inference free, but it could make the cost curve easier to manage for companies that already know they need steady, secure AI capacity.
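The cloud-versus-on-premises tradeoff described above can be framed as a break-even calculation. In the sketch below, only the 240-watt figure comes from the announcement. The hardware price, electricity rate, maintenance cost, and cloud bill are hypothetical inputs, and the function names are invented for this example; no pricing for the card has been published in the material cited here.

```python
import math
from typing import Optional


def monthly_power_cost(watts: float, price_per_kwh: float,
                       hours_per_month: float = 730) -> float:
    """Electricity cost of running a device continuously for one month."""
    return watts / 1000 * hours_per_month * price_per_kwh


def breakeven_months(hardware_cost: float, onprem_monthly_cost: float,
                     cloud_monthly_cost: float) -> Optional[int]:
    """First whole month in which on-premises total cost drops to or below cloud.

    Returns None when on-premises running costs are not lower than the
    cloud bill, because the hardware would then never pay for itself.
    """
    monthly_savings = cloud_monthly_cost - onprem_monthly_cost
    if monthly_savings <= 0:
        return None
    return math.ceil(hardware_cost / monthly_savings)


# 240 W is the card power quoted in the announcement; the rest is hypothetical.
power = monthly_power_cost(watts=240, price_per_kwh=0.15)
months = breakeven_months(
    hardware_cost=30_000,             # hypothetical card plus host server
    onprem_monthly_cost=power + 500,  # hypothetical maintenance of 500 per month
    cloud_monthly_cost=3_000,         # hypothetical steady cloud inference bill
)
print(f"Monthly power cost: {power:.2f}")
print(f"Break-even after {months} months")
```

The result is sensitive to utilization: a card that sits idle still carries its hardware cost, while a cloud bill shrinks with usage, which is why the argument is strongest for workloads that run steadily.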

Taiwan Sharpens Its Inference Edge

Taiwan already sits near the center of the AI hardware supply chain, and Skymizer is leaning into a different part of that advantage. Founded in 2013, the company built its reputation around compiler technology before moving deeper into AI silicon and accelerator IP. That background matters because inference performance is not just about raw compute. It depends on how software schedules work, moves memory, preserves cache state, and keeps the chip busy under real traffic. HyperThought is therefore being sold less as a chip alone and more as a hardware and software stack for enterprises that want tighter control over AI deployment.

Skymizer still has to prove the claims outside its own announcement. Nvidia remains the dominant force in AI acceleration, and rivals including AMD, Intel with its Gaudi line, and custom ASIC vendors are all chasing parts of the same inference market. The question is whether HyperThought can deliver strong enough throughput, latency, software support, and developer adoption to move from an impressive architecture to a product buyers trust. COMPUTEX 2026 should give the industry a clearer look at the roadmap. For startups and enterprise teams watching inference costs rise, the takeaway is simple: the next phase of AI infrastructure will not be judged only by who trains the biggest model, but by who can run useful models privately, efficiently, and repeatedly.
