Amazon's AI push shows how bad metrics can distort good tools

Amazon wanted faster AI adoption. The warning for every company is that workers quickly learn to optimize whatever management decides to count.

Amazon's internal AI push has moved from productivity story to management lesson after employees reportedly began using an internal agent called MeshClaw for low-value work to lift their usage numbers. That is not a small footnote in the AI race. It is what happens when a company turns adoption into a scoreboard before it has fully answered what useful adoption looks like.

According to a new report from the Financial Times, Amazon has been rolling out MeshClaw in recent weeks as an internal agent that can connect to workplace software and act on behalf of employees, handling tasks such as deployment, email triage and Slack interaction. The company is aiming for more than 80 per cent of developers to use AI each week, and it has tracked AI token consumption on internal leaderboards. Some employees have described a behavior they call tokenmaxxing, where AI activity itself becomes the thing to maximize.

Amazon management reportedly says the numbers will not be used in performance reviews. That may be true in the formal sense. But anyone who has worked inside a large organization knows the gap between official policy and practical pressure. If teams can see the rankings, leaders can see the rankings, and workers believe low usage may make them look behind the curve, the metric starts behaving like an instruction.

The problem is not MeshClaw itself. Agentic tools can be genuinely useful when they remove friction from repetitive work, help engineers navigate large codebases or let teams test ideas faster. Amazon also has obvious reasons to push hard. It sells AI infrastructure through AWS, competes with Microsoft and Google in cloud AI, and needs its own workforce to operate the way it expects customers to operate.

The problem is that token consumption is a poor stand-in for productivity. Token counts measure how much text an AI model reads and generates. They do not measure whether a bug was fixed cleanly, whether a customer problem was solved, whether a system became easier to maintain or whether an engineer made a better decision. A developer can burn through huge volumes of tokens asking an agent to summarize unnecessary threads, rewrite notes that needed no rewriting or run tasks that did not need automation in the first place.

This is an old mistake in a new format. Companies once learned that counting lines of code could reward bloated software. Counting meetings can reward busyness. Counting AI tokens can reward the appearance of modern work while hiding whether the work mattered. The more visible the leaderboard, the stronger the incentive to perform for it.

There is also a cost issue. AI usage is not free, especially when agents run across long contexts, chain tasks together and repeatedly call large models. A company such as Amazon can absorb far more compute waste than most startups, but the operating lesson is sharper for smaller companies. If a startup copies the same behavior without discipline, it may end up paying for noise while congratulating itself on cultural transformation.

Meta Shows The Same Tension

Amazon is not alone. Meta has faced its own version of the same debate after reports of an internal AI token leaderboard known as Claudeonomics, which ranked heavy users and turned AI consumption into a visible status signal. That kind of system can push employees to experiment, and experimentation matters. But it can also make the question too simple: who used the most AI, rather than who used it well.

The better comparison is not between companies that use AI and companies that do not. That debate is already over. The real divide is between companies that treat AI as a capability and companies that treat AI usage as a trophy. Meta, Amazon, Box, Indeed and other tech firms are all trying to get employees to move faster with generative tools, but the most durable approaches will connect AI activity to delivery, quality, cost control and customer impact.

That means a product team should care whether AI shortened the time from issue discovery to release. An engineering leader should care whether generated code survives review and does not create rework two weeks later. A support organization should care whether AI improves resolution time without annoying customers. These are harder things to measure than tokens, but they are closer to the business.

For founders, the lesson is practical. If you want people to adopt AI, make the desired behavior specific. Ask teams to show where AI reduced cycle time, improved documentation, raised test coverage or removed repetitive operational work. Give examples of useful workflows. Set guardrails for agents with broad permissions. Watch spend. And be careful with public rankings, because employees will always notice what leadership chooses to display.

Amazon's MeshClaw episode may fade quickly as an internal workplace story, but the larger point will not. AI adoption is now a management system, not just a software rollout. The companies that get the most from it will not be the ones that generate the largest dashboards. They will be the ones that reward useful work, keep incentives honest and know the difference between automation and activity.
