llama.cpp MTP leak fix stabilizes local AI agents
A VRAM leak in llama.cpp's Multi-Token Prediction (MTP) stack could crash servers after repeated sleep cycles. A fix merged on May 21 ensures that speculative decoding resources are properly freed, making self-hosted coding agents more reliable over long-running sessions.