According to Ars Technica, researchers in Germany have been investigating ways to determine whether papers have been edited by AI, and, of course, words are key to all these processes. Using a corpus of 14 million PubMed abstracts from 2010-2024, the researchers tracked word-frequency changes, finding, for instance, that use of “Ebola” spiked in 2015, “Zika” in 2017, and “Coronavirus” in 2020-22, consistent with outbreaks of those diseases and with the corresponding increase in the number of PubMed papers mentioning them. The authors note that, previously, most “excess vocabulary” (that is, words appearing in excess of expectations, given their usage in prior years) consisted of nouns. Since the introduction of ChatGPT and similar large language models, however, the “excess vocabulary” in these PubMed abstracts also includes a growing number of “style words,” 66% of which were verbs and 16% adjectives.
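The core idea of “excess vocabulary” can be sketched in a few lines: compare each word’s frequency in the current year against its frequency in earlier years, and flag words whose usage far outstrips that baseline. The toy data, function names, and the flooring constant below are all my own illustrative assumptions, not the researchers’ actual pipeline or corpus.

```python
from collections import Counter

# Toy corpus of abstracts keyed by year (hypothetical data, nothing
# like the researchers' 14-million-abstract PubMed corpus).
abstracts = {
    2022: ["we analyze the data", "results indicate the effect"],
    2023: ["we delve into the intricate data", "we delve into the results"],
}

def word_frequencies(texts):
    """Fraction of abstracts in which each word appears at least once."""
    counts = Counter()
    for text in texts:
        for word in set(text.split()):
            counts[word] += 1
    return {w: c / len(texts) for w, c in counts.items()}

def excess_usage(current_freq, baseline_freq, floor=0.01):
    """Ratio of observed to baseline frequency; values far above 1
    mark candidate 'excess' words. The floor keeps previously unseen
    words from dividing by zero (an illustrative choice)."""
    return {
        w: f / max(baseline_freq.get(w, 0.0), floor)
        for w, f in current_freq.items()
    }

baseline = word_frequencies(abstracts[2022])
current = word_frequencies(abstracts[2023])
excess = excess_usage(current, baseline)

# Words absent from the baseline but common now score highest.
print(sorted(excess.items(), key=lambda kv: -kv[1])[:3])
```

On this toy data, style words like “delve” and “intricate,” unseen in the baseline year, dominate the ranking, while common words like “the” score about 1.0.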
According to the authors,
“In this paper, we leveraged excess word usage as a data-driven, principled method to show how LLMs have affected scientific writing. We found that the effect was unprecedented in quality and quantity: hundreds of words have abruptly increased their frequency after ChatGPT became available. In contrast to previous shifts in word popularity, the 2023–24 excess words were not content-related nouns, but rather style-affecting verbs and adjectives that ChatGPT-like LLMs prefer.
“The following examples from three real 2023 abstracts illustrate this ChatGPT-style flowery language:
- By meticulously delving into the intricate web connecting […] and […], this comprehensive chapter takes a deep dive into their involvement as significant risk factors for […].
- A comprehensive grasp of the intricate interplay between […] and […] is pivotal for effective therapeutic strategies.
- Initially, we delve into the intricacies of […], accentuating its indispensability in cellular physiology, the enzymatic labyrinth governing its flux, and the pivotal […] mechanisms.
Our analysis of the excess frequency of such LLM-preferred style words suggests that at least 10% of 2024 PubMed abstracts were processed with LLMs. With ∼1.5 million papers being currently indexed in PubMed per year, this means that LLMs assist in writing at least 150 thousand papers per year. This estimate is based on our emerging lists of LLM marker words that showed large excess usage in 2024, which strongly suggests these words are preferred by LLMs like ChatGPT that became popular by that time. Importantly, this is only a lower bound: abstracts not using any of the LLM marker words are not included in our estimates, so the true fraction of LLM-processed abstracts is likely much higher.”
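The lower-bound logic the authors describe (abstracts containing no marker word are simply not counted) can be illustrated with a minimal sketch. The marker list and sample abstracts below are hypothetical stand-ins; the paper’s actual marker lists are much larger and its correction for pre-LLM baseline usage is more careful than the simple subtraction assumed here.

```python
# Hypothetical marker-word list for illustration only.
MARKER_WORDS = {"delve", "intricate", "pivotal", "meticulously", "interplay"}

def lower_bound_fraction(abstracts, markers, baseline_rate=0.0):
    """Fraction of abstracts containing at least one marker word,
    minus an assumed pre-LLM baseline rate. Abstracts with no
    marker words are never counted, so this is a lower bound."""
    hits = sum(1 for a in abstracts if markers & set(a.lower().split()))
    return max(hits / len(abstracts) - baseline_rate, 0.0)

sample = [
    "we delve into the intricate interplay of factors",
    "a randomized trial of drug x versus placebo",
    "results were pivotal for clinical practice",
]
# Two of the three toy abstracts contain a marker word.
print(lower_bound_fraction(sample, MARKER_WORDS))
```

Scaling a fraction like this by the roughly 1.5 million papers PubMed indexes per year is how the authors arrive at their “at least 150 thousand papers” figure.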
There are other recent, similar studies, though not nearly as extensive, such as this one by Edward J. Ciaccio, which used an automated tool (GPTZero) to detect “chatbot catchphrases and buzzwords” (or, as Ciaccio delightfully puts it, text written “in the peculiar dialect of GPT”) in a selected set of scientific papers. These papers contained telltale phrases such as “leverages the strengths,” “unveiled a wealth of,” and “embodies a rich tapestry.”
Since ChatGPT was trained on massive amounts of English-language publications from various sources, it is not too surprising that words often used primarily for emphasis and effect appear even in scientific papers and their abstracts. As a former lit major, I tend to use several of the words identified here as “flowery language” (“intricate” and “significant” are favorites), but “delve” and “leverages” aren’t among them, so I can say with some confidence that I am a human and not an LLM. (I also ran my last few paragraphs here through GPTZero, which reported 95% confidence that I am a human.) In my opinion, an even better quick test for an LLM would be the use of “erstwhile.” No human being today would say that.
Moreover, I can see the wry humor in the last sentence of the researchers’ discussion: “We hope that future work will meticulously delve into tracking LLM usage more accurately and assess which policy changes are crucial to tackle the intricate challenges posed by the rise of LLMs in scientific publishing” (which I believe current AI implementations, however sophisticated, cannot).
At least not yet.