AI llama.cpp adds Multi-Token Prediction and doubles Qwen3.6 27B throughput for local inference 5 min 4.7K views
ENTREPRENEURSHIP Qwen3.6 Heretic v2 shows the local AI community is now engineering refusal-free frontier models 4 min 1.5K views
AI 2.5x faster local inference on 48GB of VRAM is starting to make the case for replacing hosted APIs 5 min 356 views