Orthrus gives a reason to recheck the economics of local AI inference
Orthrus claims lossless parallel decoding for Qwen3-based models by adding a trainable diffusion view while keeping the backbone frozen. The real test is whether its reported speedups survive production serving in frameworks such as vLLM and SGLang.
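"Lossless" here usually means that the faster decoder emits exactly the tokens the unmodified model would have produced on its own. The summary does not say how Orthrus checks its parallel guesses. The sketch below therefore uses a generic substitute, greedy draft-and-verify as in speculative decoding, and should not be read as Orthrus's own algorithm. `toy_target` stands in for the frozen backbone and `toy_drafter` for the trainable parallel component. Both, like `greedy_decode` and `draft_and_verify`, are hypothetical names chosen for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces: the target returns its greedy next token for a prefix;
# the drafter guesses k tokens at once (the cheap "parallel" part).
NextToken = Callable[[List[int]], int]
Drafter = Callable[[List[int], int], List[int]]


def greedy_decode(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Reference: plain autoregressive greedy decoding, one token per step."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def draft_and_verify(target: NextToken, drafter: Drafter, prompt: List[int],
                     n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy draft-and-verify decoding (generic speculative-decoding idea).

    In a real system the target scores all drafted positions in one forward
    pass; here the checks run in a loop for clarity. Returns the sequence and
    the number of verification steps used.
    """
    seq = list(prompt)
    target_len = len(prompt) + n_new
    steps = 0
    while len(seq) < target_len:
        draft = drafter(seq, k)
        steps += 1
        accepted: List[int] = []
        for tok in draft:
            expected = target(seq + accepted)
            if tok != expected:
                # First mismatch: keep the target's own token and stop.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # All drafts matched, so the target's next prediction comes for free.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    # Accepted tokens always equal the greedy ones, so trimming any overshoot is safe.
    return seq[:target_len], steps


# Toy usage: a deterministic "target" and a drafter that is only sometimes right.
def toy_target(seq: List[int]) -> int:
    return (seq[-1] * 3 + len(seq)) % 11


def toy_drafter(seq: List[int], k: int) -> List[int]:
    # Deliberately imperfect: ignores the length term the target uses.
    s, out = list(seq), []
    for _ in range(k):
        out.append((s[-1] * 3) % 11)
        s.append(out[-1])
    return out


reference = greedy_decode(toy_target, [1, 2], 20)
fast, n_steps = draft_and_verify(toy_target, toy_drafter, [1, 2], 20, k=4)
assert fast == reference  # identical tokens to plain greedy decoding
```

The output never changes; only the number of target steps does. A perfect drafter yields k + 1 tokens per step, and a useless one yields 1. Whether that per-step gain holds up under a production server's batching is the open question the summary raises.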