MIT researchers tested 41 AI models across 11,000 real workplace tasks and found that most outputs barely meet minimum standards and fall well short of superior quality on complex work.
If you have been lying awake worrying that a large language model is about to take your job, new research from the Massachusetts Institute of Technology offers a heavy dose of reassurance. The headline finding from a sweeping new study is that artificial intelligence, despite billions of dollars in corporate investment and breathless media coverage, currently operates at roughly the level of an uninterested intern. It can clear the lowest possible bar for acceptable work in many scenarios, but the moment you need precision, creativity, or multi-step reasoning, it struggles to deliver.
The MIT team put 41 different language models, including well-known systems from OpenAI, Google, and Anthropic, through their paces on more than 11,000 text-based tasks drawn from official Labor Department job descriptions. Human evaluators with actual professional experience in those fields then graded the outputs on a one-to-nine scale. A score of seven was defined as "minimally sufficient," meaning the work product was usable without any human edits. As things stand today, AI models hit that seven roughly 65 percent of the time across all tasks. That sounds reasonable until you look at the upper bound. The probability of an AI model achieving a nine, defined as superior quality, never exceeded 50 percent regardless of how much time the system was given. When a task demanded multiple steps or nuanced judgment, the models were more likely to fail than succeed.
The data paints a clear picture of where the technology excels and where it stumbles. Routine, text-heavy tasks associated with fields like construction administration and maintenance logistics were handled with relative ease. Highly skilled roles in legal services and information technology told a different story, with noticeably lower success rates. This tracks with what we have seen play out publicly over the past year. Deloitte had to answer for two separate government reports, one in Australia and another in Canada, that were riddled with fabricated information generated by AI. CNET and Sports Illustrated both faced backlash after quietly publishing AI-generated articles under invented bylines. A law firm in New York was forced to apologize in court after fake AI-generated citations made their way into a bankruptcy filing. The technology is not just occasionally inaccurate; it can be confidently wrong in ways that create real legal and reputational liability.
The Business Reality Behind the Hype
The MIT data makes clear that most businesses should, for now, focus their deployment strategy on augmentation rather than wholesale replacement. The study aligns with broader workforce trends documented by the World Economic Forum and McKinsey Global Institute, which have consistently found that AI adoption is most effective when it targets repetitive, predictable tasks rather than complex decision-making. Companies chasing full automation of knowledge work are finding the growing pains far more expensive than expected. The cost of error correction, fact-checking, and reputational damage can easily erase the labor savings gained from removing human workers. According to Fortune's coverage of the MIT findings, success rates on the analyzed tasks are improving by up to 11 percentage points annually as models become more capable. The researchers estimate that by 2029, AI could handle 80 to 95 percent of text-based tasks at a minimally sufficient level. That timeline suggests the technology is on a clear upward trajectory, but it is moving at a pace that demands patience rather than panic.
For startup founders and enterprise leaders making capital allocation decisions right now, the practical takeaway is straightforward. AI should be integrated into workflows where it can handle grunt work, draft initial documents, and accelerate data processing, all under strict human supervision. The competitive advantage in 2025 does not belong to the companies that try to replace their workforce entirely, but rather to those that figure out how to make their human workers significantly more productive by pairing them with capable but imperfect machines. Watch for incremental gains in multimodal capabilities and reasoning over the next 12 to 18 months as the real bellwether for when the economics of partial automation start to shift dramatically.