IBM's Neel Sundaresan says most AI coding wastes frontier models on trivial tasks

IBM Automation and AI GM Neel Sundaresan, founding engineer of GitHub Copilot, told The New Stack that most AI coding is like taking your Ferrari to buy milk: expensive frontier models are deployed on routine tasks where cheaper specialised models and better orchestration deliver better outcomes. The evidence offered is IBM Bob's intelligent task routing, already deployed to 80,000 IBM developers.

Sundaresan's critique is grounded in two decades of developer productivity research. Frontier models like Claude 3.5 Sonnet or GPT-4o excel at complex reasoning and novel code generation, but they are overkill for roughly 80 percent of coding tasks: boilerplate, refactoring, test generation, and documentation. Those tasks do not require frontier-scale parameter counts or prices on the order of $15 per million tokens. A small model such as a 7B Granite model or the 12B Mistral Nemo can handle them at about 10 percent of the cost with about 95 percent of the quality. The problem is not model capability but deployment architecture: tools that default to the most expensive model for every prompt, regardless of task complexity.
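To make the arithmetic concrete, here is a minimal sketch of the blended spend when the routine share of work moves to a cheaper model. The 80 percent and 10 percent figures come from the paragraph above; the $10,000 baseline, the function name, and the assumption that token volume is proportional to task share are illustrative simplifications, not figures from Sundaresan or IBM.

```python
def blended_cost(baseline_spend, routine_share=0.80, cheap_cost_ratio=0.10):
    """Estimate spend after routing routine work to a cheaper model.

    Assumption (hypothetical): token volume is proportional to task share,
    so 80 percent of tasks account for 80 percent of the baseline bill.
    """
    # Work that stays on the frontier model keeps its full price.
    frontier_part = baseline_spend * (1 - routine_share)
    # Routine work is billed at a fraction of the frontier price.
    routed_part = baseline_spend * routine_share * cheap_cost_ratio
    return frontier_part + routed_part


baseline = 10_000.0  # illustrative monthly token bill in dollars
after = blended_cost(baseline)
saving = 1 - after / baseline
print(f"Blended spend: ${after:,.0f} ({saving:.0%} lower)")
```

Under these assumptions the bill falls from $10,000 to about $2,800. The saving is sensitive to how much of the token volume the routine tasks actually represent, which the article does not specify.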

IBM Bob solves this through intelligent routing. The system analyses the task, selects the optimal model from Claude, Mistral, Granite, or fine-tuned specialist models, and adds human checkpoints for high-risk actions. Sundaresan reported 80,000 IBM developers using Bob daily, with measurable gains in delivery velocity. The key innovation is not the models themselves but the orchestration layer that matches task to capability, keeping costs down while maintaining quality. That approach extends to full development workflows: planning, coding, testing, deployment, and security review are all routed to specialised agents rather than a single generalist model.
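The routing idea can be sketched in a few lines. This is not Bob's implementation, which the article does not describe in detail; the task kinds, tier names, and high-risk action list below are all hypothetical placeholders chosen to mirror the description above: classify the task, pick a model tier, and flag high-risk actions for a human checkpoint.

```python
from dataclasses import dataclass

# Hypothetical categories; a real router would classify tasks far more carefully.
ROUTINE_KINDS = {"boilerplate", "refactoring", "test_generation", "documentation"}
HIGH_RISK_ACTIONS = {"deploy", "delete_data", "change_permissions"}


@dataclass
class Task:
    kind: str                  # e.g. "test_generation" or "architecture"
    action: str = "edit_code"  # what the agent wants to do with the result


@dataclass
class Route:
    model_tier: str            # placeholder tier name, not a real model ID
    needs_human_approval: bool


def route(task: Task) -> Route:
    """Send routine work to a small model and everything else to a frontier model."""
    tier = "small-specialised" if task.kind in ROUTINE_KINDS else "frontier"
    # High-risk actions get a human checkpoint regardless of which tier handles them.
    return Route(model_tier=tier, needs_human_approval=task.action in HIGH_RISK_ACTIONS)


for task in (
    Task("test_generation"),
    Task("architecture"),
    Task("refactoring", action="deploy"),
):
    print(task, "->", route(task))
```

Keeping the approval check separate from tier selection means a cheap model can still trigger a human checkpoint, which matches the article's framing of safeguards as a product-design concern rather than a model property.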

The r/technology post with 195 upvotes reflects a broader developer sentiment. Frontier model marketing emphasises raw intelligence, but enterprise buyers care about total cost of ownership. A tool that burns $10,000 monthly on token spend for routine code generation gets defunded when the VP of engineering sees the bill. Sundaresan argues the industry has reached the ceiling of model performance gains. The next leap comes from product design: context structuring, human-in-the-loop safeguards, and task-specific model selection. IBM's approach prioritises those elements over model size.

For SF founders, Sundaresan's framing exposes the distortion in AI coding ROI. Agentic stacks like Devin and Cursor sell the vision of autonomous engineering teams, but the economics favour specialised tools for 80 percent of the workload. Enterprise buyers will adopt expensive generalists for high-value tasks like system architecture and novel algorithm design. They will route routine work to cheaper alternatives. Startups that build the routing layer, model marketplace, or workflow orchestration platform capture the value between the Ferrari and the daily driver.

The investor question is whether AI coding companies are building durable workflow businesses or packaging compute-heavy demos. The durable businesses are those that solve the orchestration problem. Replit's agentic IDE routes tasks across models and adds verification steps. Sourcegraph's Cody integrates code search with generation and testing. Continue's open-source assistant supports local deployment with model switching. The compute-heavy demos, like Devin, generate impressive videos but face token cost scrutiny at scale. Investors who fund the former are backing potential category leaders. Those who fund the latter are chasing short-term hype.

Startups can position cheaper, narrower coding assistants by owning specific workflows. A tool that excels at test generation, with built-in coverage analysis and CI integration, beats a generalist on that task 90 percent of the time at 10 percent of the cost. Security code review agents that scan for OWASP Top 10 violations and suggest fixes have a clear enterprise path. Refactoring specialists that optimise legacy codebases for performance and maintainability address a $100 billion market. The generalist agent hype creates space for vertical specialists that integrate into existing developer workflows rather than replacing them.
