Research repository arXiv will ban authors for a year if they let AI do all the work
ArXiv, a widely used open repository for preprint research, is doing more to crack down on the careless use of large language models in scientific papers.

Although papers are posted to the site before they are peer-reviewed, arXiv (pronounced "archive") has become one of the main ways that research circulates in fields like computer science and math, and the site itself has become a source of data on trends in scientific research. ArXiv has already taken steps to combat a growing number of low-quality, AI-generated papers, for example by requiring first-time posters to get an endorsement from an established author. And after being hosted by Cornell for more than 20 years, the organization is becoming an independent nonprofit, which should allow it to raise more money to address issues like AI slop.

In its latest move, Thomas Dietterich, the chair of arXiv's computer science section, posted Thursday that "if a submission contains incontrovertible evidence that the authors did not check the results of LLM generation, this means we can't trust anything in the paper." That incontrovertible evidence could include things like "hallucinated references" and comments to or from the LLM, Dietterich said.
If such evidence is found, a paper's authors will face "a 1-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted by a reputable peer-reviewed venue."

Note that this isn't an outright prohibition on using LLMs, but rather an insistence that, as Dietterich put it, authors take "full responsibility" for the content, "irrespective of how the contents are generated." So if researchers copy-paste "inappropriate language, plagiarized content, biased content, errors, mistakes, incorrect references, or misleading content" directly from an LLM, then they're still responsible for it.

Dietterich told 404 Media that this will be a "one-strike" rule, but moderators must flag the issue and section chairs must confirm the evidence before imposing the penalty. Authors will also be able to appeal the decision.

Recent peer-reviewed research has found that fabricated citations are on the rise in biomedical research, likely due to LLMs, though to be fair, scientists aren't the only ones getting caught using citations that were made up by AI.