Cursor makes Composer 2.5 a cheaper rival for coding agents

Cursor is trying to make the economics of AI coding look different, not just the benchmarks.

Cursor has released Composer 2.5, and the most important point is not that it can sit near the top of several coding tests. It is that Cursor is putting a capable coding model inside its own development environment at a price that makes heavy agent use feel less like an experiment and more like an operating cost.

The model became available on May 18, 2026. Cursor says Composer 2.5 is a substantial step up from Composer 2, especially on long-running work, complex instructions and the kind of sustained tool use that matters when an agent is changing real code rather than answering a one-off prompt.

That distinction matters. Developers do not judge these systems only by whether they can score well on a benchmark. They judge them by whether the model can stay on task, remember the purpose of a change, use the terminal without wandering off and avoid wasting expensive tokens while it tries to recover from its own mistakes.

Composer 2.5 is priced at $0.50 per million input tokens and $2.50 per million output tokens in its standard version. Cursor also offers a faster default version at $3.00 per million input tokens and $15.00 per million output tokens, six times the standard rate on both sides. Those numbers are low enough to change how teams think about when to use an agent and when to keep a task manual.
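To make those prices concrete, here is a minimal Python sketch of the arithmetic. The per-million-token prices are the ones quoted above; the session size (400,000 input tokens and 40,000 output tokens) and the names `PRICES` and `session_cost` are assumptions made up for illustration, not figures or identifiers from Cursor.

```python
# Prices in USD per million tokens, as quoted in the article.
PRICES = {
    "standard": {"input": 0.50, "output": 2.50},
    "fast": {"input": 3.00, "output": 15.00},
}


def session_cost(variant, input_tokens, output_tokens):
    """Return the USD cost of one session at the quoted per-million-token prices."""
    price = PRICES[variant]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical agent session (assumed sizes, not measured data).
standard = session_cost("standard", 400_000, 40_000)
fast = session_cost("fast", 400_000, 40_000)
print(f"standard: ${standard:.2f}, fast: ${fast:.2f}")
```

Under these assumed token counts the standard version comes to about 30 cents and the fast version to about $1.80. Real sessions will vary widely in size and in their mix of input and output.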

As The Decoder recently noted, Composer 2.5 scored 79.8 percent on SWE-Bench Multilingual and 63.2 percent on CursorBench v3.1, putting it close to models such as Claude Opus 4.7 and GPT-5.5 on several software engineering measures. OfficeChai also reported a 69.3 percent score on Terminal-Bench 2.0, almost level with Opus 4.7 at 69.4 percent, while GPT-5.5 remained well ahead on that specific test.

That is a useful reality check. Composer 2.5 is not a clean sweep over every frontier model. The more practical claim is that it is close enough on important coding tasks while costing far less per token and, according to Cursor's own effort curves, less than one dollar per task on CursorBench v3.1.
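For a rough sense of what a one-dollar ceiling allows, the sketch below inverts the pricing: given a budget and an assumed ratio of input to output tokens, it estimates how many tokens fit. The 10:1 ratio and the choice of the standard price tier are assumptions for illustration only. The article does not say which tier or token mix Cursor's effort curves reflect.

```python
def tokens_within_budget(budget_usd, input_price, output_price, input_per_output):
    """Estimate the (input, output) token counts that fit in a budget.

    Prices are USD per million tokens. input_per_output is the assumed
    number of input tokens consumed for every output token.
    """
    cost_per_output_token = (input_per_output * input_price + output_price) / 1_000_000
    output_tokens = int(budget_usd / cost_per_output_token)
    return output_tokens * input_per_output, output_tokens


# Standard-tier prices quoted in the article; the 10:1 mix is hypothetical.
inp, out = tokens_within_budget(1.00, 0.50, 2.50, input_per_output=10)
spent = (inp * 0.50 + out * 2.50) / 1_000_000
print(f"{inp:,} input + {out:,} output tokens for ${spent:.4f}")
```

With these assumptions, a dollar covers roughly 1.3 million input tokens and about 130,000 output tokens, which is the scale of a long multi-step agent run rather than a single prompt.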

For a startup, that matters. A single impressive demo does not pay the cloud bill. A model that can help engineers across thousands of edits, reviews, refactors and test runs at a lower unit cost can become part of the workflow in a way that a premium model may not.

Cursor is becoming less dependent on other labs

Composer 2.5 is built on Moonshot AI's open-source Kimi K2.5 checkpoint, the same base used for Composer 2. Cursor says the new model was improved through more difficult reinforcement learning environments, targeted textual feedback and 25 times more synthetic tasks than its predecessor.

The targeted feedback point is more than a technical footnote. Long coding sessions can run through hundreds of thousands of tokens, and a single final reward is a blunt way to teach a model what went wrong. Cursor's approach gives the model more localized signals, such as how to correct a bad tool call or a confusing response at the moment it happened.
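The difference between a single final reward and localized signals can be illustrated with a toy example. This is not Cursor's training method, whose details the article does not give; the `Step` type, the scoring functions and the three-step trajectory are all hypothetical, chosen only to show why a per-step signal carries more information.

```python
from dataclasses import dataclass


@dataclass
class Step:
    description: str
    ok: bool  # hypothetical per-step judgment, standing in for targeted feedback


def final_reward_signal(steps):
    """Sparse signal: every step receives the same score for the final outcome."""
    outcome = 1.0 if all(step.ok for step in steps) else 0.0
    return [outcome] * len(steps)


def localized_signal(steps):
    """Dense signal: each step is scored on its own."""
    return [1.0 if step.ok else 0.0 for step in steps]


# A made-up three-step trajectory with one bad tool call in the middle.
trajectory = [
    Step("read the failing test", True),
    Step("call a tool with wrong arguments", False),
    Step("edit the correct file", True),
]

sparse = final_reward_signal(trajectory)
dense = localized_signal(trajectory)
print("final reward only:", sparse)
print("localized feedback:", dense)
```

In the sparse case the two good steps are penalized along with the bad tool call, while the localized signal singles out the step that went wrong. Over a session of hundreds of thousands of tokens, that gap in information grows accordingly.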

That is the kind of work that turns a general model into a product model. The base intelligence comes from Kimi K2.5, but the value Cursor wants to own is how the model behaves inside Cursor: when it edits files, when it calls tools, when it explains a plan and when it decides how much effort a task deserves.

There is also a larger signal behind the launch. Cursor says it is working with SpaceXAI on a significantly larger model trained from scratch with 10 times more total compute, using Colossus 2's million H100-equivalent capacity. If that effort lands, Cursor will be moving from model customizer toward something closer to a full-stack AI coding lab.

That would put pressure on the current AI coding market. Anthropic, OpenAI, Google and other model providers still have broad advantages, but software development is a narrow enough domain for specialization to matter. A model that knows the product surface deeply can sometimes be more useful than a stronger general model that sits outside the workflow.

The next question is whether developers feel the difference outside charts. Early adoption will depend on latency, reliability, code quality and whether Composer 2.5 can handle messy production repositories without creating more review work than it saves. Benchmarks open the door, but daily use decides whether teams keep the model in their default setup.

For now, Cursor has made a clear move. It is not only competing on intelligence. It is competing on the cost of letting AI agents work all day, and that may be the more important battle for the next phase of software development.
