Nature Just Retracted the Most-Cited Study on ChatGPT in Education and 262 Papers Built on Its Conclusions Are Now Citing Discredited Evidence

Nature's Humanities and Social Sciences Communications journal has retracted a May 2025 meta-analysis by Jin Wang and Wenxiang Fan that claimed ChatGPT produces a large positive effect on student learning performance, learning perception, and higher-order thinking. The retraction notice cites "discrepancies in the meta-analysis" that "ultimately undermine the confidence the Editor can place in the validity of the analysis and resulting conclusions." The authors have not responded to correspondence. That leaves 262 citing papers, approximately half a million reader accesses, and an Altmetric score of 767 built on findings that the publisher has now formally disavowed.

The scale of the paper's circulation before retraction is the detail that matters most. A Humanities and Social Sciences Communications paper with an Altmetric score of 767 within twelve months of publication is not a technical research finding circulating in specialist academic channels. It is a finding that education journalists covered, that education influencers cited on LinkedIn, that edtech sales teams included in procurement decks, and that policy advisers referenced when making arguments to school districts and university administrators about whether to adopt AI learning tools. The paper's central claim, an overall effect size of 0.867 for ChatGPT's impact on student learning performance from a meta-analysis of 35 studies, was not a modest or hedged conclusion. It was a large, confident, easily quotable number that answered precisely the question every institutional buyer of edtech AI wanted answered before signing a contract. That is why it drew roughly half a million accesses: it was commercially useful to a large number of people who were motivated to believe it.

The editors cited methodological concerns without elaborating publicly, but researchers who examined the paper at publication describe problems spanning several categories. The most fundamental is structural: a meta-analysis of ChatGPT's educational effects published in May 2025 was necessarily built on primary studies published between 2022 and 2024, covering a technology that had been publicly available for approximately two years and that was changing materially with every model release during that period. The studies being pooled were measuring different versions of ChatGPT, in different educational contexts, with different task designs, different comparison conditions, and different outcome measures. Meta-analysis methodology requires that the pooled studies measure sufficiently similar constructs for their combined effect size to be a meaningful aggregate. Education researcher Ben Williamson, whose LinkedIn post reached thousands of readers this week, described the paper as "a meta-analysis of studies of a technology that was only 2 years old, feeding on junk science and then amplifying those findings." The primary studies themselves, many from under-resourced research environments with small samples, short intervention periods, and outcome measures that could not be independently verified, were not the robust foundation that a credible meta-analysis requires. The retraction notice's language about "discrepancies" suggests the editors found specific numerical inconsistencies in the pooled analysis rather than simply disagreeing with the interpretive conclusions, which is a more serious finding than methodological disagreement.
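For readers unfamiliar with what "pooling" involves, the following is a minimal sketch of standard inverse-variance pooling with a heterogeneity check (Cochran's Q, I², and a DerSimonian-Laird random-effects estimate). It is not a reconstruction of Wang and Fan's analysis, whose details the retraction notice does not disclose. The function name `pool_effects` and the three example effect sizes and variances are hypothetical, chosen only to show how studies that disagree widely can still produce a single tidy-looking average.

```python
import math


def pool_effects(effects, variances):
    """Pool study effect sizes by inverse-variance weighting.

    Returns the fixed-effect estimate, Cochran's Q, I^2 (share of
    variation attributable to between-study heterogeneity), the
    DerSimonian-Laird tau^2, and the random-effects estimate.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")

    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / total_w

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    # DerSimonian-Laird estimate of between-study variance.
    c = total_w - sum(w * w for w in weights) / total_w
    tau2 = max(0.0, (q - df) / c)

    re_weights = [1.0 / (v + tau2) for v in variances]
    random = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    random_se = math.sqrt(1.0 / sum(re_weights))

    return {"fixed": fixed, "q": q, "i2": i2, "tau2": tau2,
            "random": random, "random_se": random_se}


# Hypothetical inputs, NOT taken from the retracted paper:
# three studies reporting very different effects with equal precision.
result = pool_effects([0.2, 0.9, 1.6], [0.04, 0.04, 0.04])
print(result)
```

A high I² in output like this signals that most of the spread between studies reflects real differences in what was measured rather than sampling noise, which is the situation critics describe when different ChatGPT versions, tasks, and outcome measures are averaged together.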

The edtech AI sales cycle is where this retraction has its most immediate commercial consequence. Enterprise and institutional procurement of AI learning tools typically moves through several stages: vendor demonstration, reference case review, pilot study design, academic literature review, and contract negotiation. The literature review stage is where a paper like Wang and Fan's does its most damage, because procurement teams that lack deep research methodology expertise are not equipped to evaluate effect size inflation, sample pooling appropriateness, or the structural problem of meta-analysing a rapidly evolving technology with two years of primary literature. They are equipped to ask whether a peer-reviewed study published in a Nature journal supports the vendor's claims. When the answer was yes, many institutional buyers treated that as sufficient validation. It was not, and now the paper that provided it no longer exists as a credible source.

The 262 papers that cite Wang and Fan now cite a retracted study, but retraction does not automatically propagate through the academic literature. Citing papers are not required to issue corrections, and many of those papers have been published in journals that will not audit their reference lists for retracted sources. More practically, the sales decks and procurement documents that cited the study directly or cited secondary sources that relied on it are not going to be updated. An edtech company that included "research published in Nature shows large positive effects on student learning" in its materials in 2025 may still have those materials in circulation, with nothing in the market to alert the procurement team reading them that the underlying paper has been withdrawn.

The fragility of academic evidence as a foundation for product marketing is not unique to AI or to education. It is a structural feature of the relationship between commercial product development timelines and research validation timelines. Products are sold now. Research takes years to replicate, validate, and synthesise into conclusions robust enough to bear commercial weight. The gap between those timelines is routinely filled with preliminary studies, single-institution pilots, and meta-analyses of immature literature, dressed in journal imprimatur and cited as though they represent settled evidence.

What procurement teams at school districts, universities, and enterprise learning and development functions should actually demand from AI learning tool vendors is more specific than "show me a published study." The useful questions are: how large were the studies you are citing, how long did the interventions run, what was the comparison condition, were the outcome measures standardised or vendor-designed, and has the finding been independently replicated by a team with no commercial relationship to the product. A randomised controlled trial with several hundred students, a semester-long intervention, standardised test outcomes measured by a party independent of the vendor, and replication by at least one independent research group is the evidence standard that justifies changing how students are educated. None of the studies pooled in Wang and Fan's meta-analysis met that standard individually, which is precisely why the meta-analysis was required to pool them, and why the pooled result was fragile enough to collapse under editorial scrutiny. The retraction does not mean ChatGPT has no educational value. It means the evidence base for confident claims about its educational value is substantially thinner than the circulating literature suggested, and that the question remains genuinely open rather than settled.