How do we maintain the integrity of human knowledge when the tools we use to synthesize it are prone to fabrication? This is the central question facing the academic community following a concerning discovery regarding the use of Large Language Models (LLMs) in scholarly writing. The foundational trust that keeps science moving forward depends on the accuracy of citations, yet a new investigation highlights a growing trend where artificial intelligence is populating research with entirely nonexistent references.
According to a CNET report, researchers affiliated with Cornell University and the University of California, Los Angeles (UCLA) identified 146,900 instances of AI-generated fake citations within scientific papers. To reach this figure, the team audited 111 million references drawn from 2.5 million papers. By comparing these citations against known databases, they found a clear upward trend in the prevalence of nonexistent references following the widespread adoption of LLMs in 2023.
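To put those totals in proportion, the short calculation below derives a few rates from the three figures reported above. It is only a back-of-the-envelope sketch: the variable names are illustrative, and the per-paper numbers are simple averages, not findings from the study.

```python
# Figures as reported in the article
fake_citations = 146_900
total_references = 111_000_000
total_papers = 2_500_000

# Share of all audited references flagged as fabricated
fake_share = fake_citations / total_references
fakes_per_10k_refs = fake_share * 10_000

# Simple averages (assumption: these say nothing about how the
# fabricated references are distributed across individual papers)
refs_per_paper = total_references / total_papers
fakes_per_1000_papers = fake_citations / total_papers * 1_000

print(f"Fabricated share of references: {fake_share:.4%}")
print(f"Fabricated references per 10,000 audited: {fakes_per_10k_refs:.1f}")
print(f"Average references per paper: {refs_per_paper:.1f}")
print(f"Fabricated references per 1,000 papers: {fakes_per_1000_papers:.1f}")
```

Roughly 13 in every 10,000 audited references were flagged, which averages out to about 59 fabricated references per 1,000 papers. That average is not the share of papers affected, since a single paper can contain several fabricated references.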
Distinguishing Hallucinations from Human Error
It is important to separate what the study actually found from the sensationalist narrative that AI is intentionally sabotaging science. The researchers were not hunting only for malicious actors; they were analyzing the systemic impact of "hallucinations," the tendency of models like ChatGPT or Gemini to output plausible-sounding but completely fabricated information. Human researchers have long engaged in "sloppy science" or deliberate citation manipulation, but the current data suggests a significant shift. The researchers noted that these fabricated references are not concentrated in a handful of fraudulent papers but are dispersed across a wide range of works, which suggests that many scholars are using AI tools as drafting assistants without verifying the output.
The Fragility of Scientific Repositories
The impact of these findings is magnified by where the papers were hosted. The researchers focused on four major scientific repositories: arXiv, bioRxiv, SSRN, and PubMed Central. These platforms serve as the frontline for global research, allowing scientists to share findings before formal peer review. Polluting these repositories with "noise," as Steinn Sigurdsson, the scientific director of arXiv, described it, compromises their utility as reliable tools for discovery. Usha Haley, a professor of management at Wichita State University, emphasized that this trend threatens the very foundation of cumulative knowledge and peer review, particularly among early-career scholars who may be over-relying on automated drafting technologies.
Limitations to Consider
While the numbers are striking, the study's methodology calls for careful interpretation. The team distinguished between simple human errors, such as typos in a bibliography, and true hallucinations. However, separating a subtle, AI-generated fake from an older, human-driven citation error remains a difficult analytical problem. The study also does not capture the entire academic landscape, as it examined only four repositories. Future research will need to track how these patterns shift as publishers implement stricter automated screening.
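To illustrate why that distinction is hard, here is a minimal sketch of one way a cited title could be checked against an index of known works. It is not the study's method. The index, the example titles, the function names, and the 0.85 threshold are all hypothetical, and the matching uses Python's standard-library difflib, not a real bibliographic database.

```python
from difflib import SequenceMatcher

# Hypothetical placeholder index; a real audit would query
# bibliographic databases rather than a hard-coded list.
KNOWN_TITLES = [
    "Citation Accuracy in Preprint Servers",
    "Measuring Reference Errors in Biomedical Literature",
]

def best_match(title, known=KNOWN_TITLES):
    """Return the closest known title and its similarity score in [0, 1]."""
    scored = [
        (SequenceMatcher(None, title.lower(), k.lower()).ratio(), k)
        for k in known
    ]
    score, match = max(scored)
    return match, score

def classify(title, near=0.85):
    """Label a cited title; the 0.85 threshold is an arbitrary assumption."""
    _, score = best_match(title)
    if score == 1.0:
        return "verified"      # matches an indexed title (ignoring case)
    if score >= near:
        return "likely typo"   # close to a real title, probably human error
    return "unmatched"         # no close match found in this index

print(classify("Citation Accuracy in Preprint Servers"))
print(classify("Citaton Accuracy in Preprint Servers"))
print(classify("Quantum Pigeons and the Thermodynamics of Soup"))
```

An "unmatched" result does not prove that a reference was fabricated. The work may simply be missing from the index, which is one reason a blurry middle ground between typos and hallucinations remains.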
The next steps for the research community involve a race between detection and generation. arXiv has already moved to address the issue by announcing a policy to ban authors who submit work containing unverified AI content. The effectiveness of these bans will be the next measurable signal to watch: if the rate of nonexistent references plateaus or declines in the next audit of these databases, that would suggest institutional policy can curtail the influence of generative AI on the scholarly record. That next reading of the citation data will help show whether we are witnessing a temporary trend or a lasting degradation of academic trust.







