Reproducibility and replicability are core principles of modern science: can a study’s findings be reproduced by other researchers, or are the results merely a one-off?
Replicable studies are necessary for credible and reliable research. But in recent years, a lack of reproducibility across a range of scientific disciplines has raised questions about the validity of certain findings. Some commentators have called the phenomenon a “replication crisis.”
Oded Netzer, the Arthur J. Samberg Professor of Business at Columbia Business School, and a team of researchers wanted to examine the reproducibility of social science through a linguistic lens. “The language we use is often a signal of our intentions, personality, identity, and state of mind,” says Netzer. “We wondered: Can the language researchers use when they write a scientific paper tell us about the replicability of that work?”
Key takeaways:
- The language used in academic papers can predict the research’s replicability, even after controlling for various characteristics of the authors, the paper, and study design.
- Replicable studies often use detailed, complex, and confident language, which aligns with markers of truthful communication.
- Nonreplicable studies tend to use vague language and exhibit markers of persuasion, such as “positive words” and clout-focused terms.
- Language analysis could be a useful tool in assessing the credibility of scientific studies, supporting the goals of the open science movement.
How the research was done: The researchers examined 299 papers, all of which had previously been subjected to replication studies conducted as part of large-scale replication projects in psychology and economics. The success or failure of these replication attempts served as the ground truth for Netzer’s study.
The texts of the 299 papers were run through a statistical model powered by machine learning to pinpoint linguistic patterns. The analysis relied on representation learning models, as well as tools like the Linguistic Inquiry and Word Count (LIWC) dictionaries, a set of 92 predefined word categories that reflect various psychological constructs, cognitive processes, and linguistic dimensions. “We targeted features like whether the paper uses more numeric and quantifiable words. Does it use a more positive tone? Does it use more or fewer emotional words?” explains Netzer. Another focus area was obfuscation and readability: what level of reading comprehension is needed to understand the text? The analysis also examined the narrative arc of the texts. “We wanted to know whether these papers tell a story,” says Netzer.
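To make the kind of feature extraction described above concrete, here is a minimal sketch. It is not the authors’ actual pipeline: LIWC is a proprietary dictionary set and the study also used representation learning models, so the tiny hand-made word categories and simple readability proxies below are illustrative assumptions only.

```python
# Illustrative sketch of LIWC-style feature extraction (not the study's code).
# The mini category dictionary and readability proxies are assumptions.
import re
from collections import Counter

# Hypothetical stand-ins for a few LIWC-style categories.
CATEGORIES = {
    "quantifiers": {"more", "most", "fewer", "several", "many", "much"},
    "positive": {"remarkable", "novel", "exciting", "strong", "important"},
    "interrogatives": {"who", "what", "when", "where", "why", "how"},
    "clout": {"we", "us", "our", "group", "social"},
}

def linguistic_features(text: str) -> dict:
    """Return per-1,000-word category rates plus crude readability proxies."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(words)
    n = max(len(words), 1)

    features = {
        cat: 1000 * sum(counts[w] for w in vocab) / n
        for cat, vocab in CATEGORIES.items()
    }
    # Longer sentences and longer words generally mean harder, less readable text.
    features["avg_sentence_len"] = n / max(len(sentences), 1)
    features["avg_word_len"] = sum(len(w) for w in words) / n
    features["numeric_rate"] = 1000 * len(re.findall(r"\d+(?:\.\d+)?", text)) / n
    return features

if __name__ == "__main__":
    sample = ("We find a remarkable effect. What explains it? "
              "Several analyses, across 3 samples, point to more robust results.")
    print(linguistic_features(sample))
```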
The researchers implemented a strict set of controls to ensure findings weren’t influenced by other factors. “We collected comprehensive metadata, or control variables that are not strictly the text,” says Netzer. “This includes things like the keywords used in the paper, the area within psychology or economics it is focused on, the types of subjects studied, and if the research was conducted in the United States or elsewhere.” The researchers also identified control variables like the number of figures and tables included, sources cited, and authors’ level of seniority, among other metrics.
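As a rough illustration of this study design (not the authors’ actual model, data, or results), the sketch below shows one common way to test whether text features add predictive power beyond control variables: compare the cross-validated performance of a controls-only model with a controls-plus-language model. All inputs here are randomly generated placeholders.

```python
# Illustrative only: random placeholder data, not the 299-paper dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

n_papers = 299
X_language = rng.normal(size=(n_papers, 8))       # e.g., LIWC-style rates, readability
X_controls = rng.normal(size=(n_papers, 5))       # e.g., field, figures, seniority
y_replicated = rng.integers(0, 2, size=n_papers)  # 1 = replication succeeded

def cv_auc(X, y):
    """Mean cross-validated AUC of a simple logistic regression."""
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

auc_controls = cv_auc(X_controls, y_replicated)
auc_full = cv_auc(np.hstack([X_controls, X_language]), y_replicated)

# If language carries signal beyond the controls, auc_full should beat
# auc_controls (with random data, both hover around 0.5).
print(f"controls only: {auc_controls:.2f}, controls + language: {auc_full:.2f}")
```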
What the researchers found: Even taking the control variables into account, the study found that the language used in academic papers was a significant predictor of replicability.
In replicable papers, the language tended to be more complex, detailed, and confident sounding. There were more quantifying and comparative terms, auxiliary verbs, and interrogative words — who, what, when, why, where — indicating a deeper engagement with the data. “The replicable papers had a lot of the markers we find in linguistic work around truth-telling,” says Netzer. “They did a better job at contextualizing the results.”
In contrast, nonreplicable papers used more positive words, future tense, and vague language. They also tended to contain more phrases associated with clout — like we, us, group, and social.
Additionally, nonreplicable studies leaned more on abstraction and skilled storytelling, and were easier to read. “Readability was higher and these papers tended to have more of a narrative arc,” says Netzer. He hypothesizes that this is due to the nature of how papers are typically evaluated: “If the science is weak but the writing is good, it may still pass the bar.”
An important caveat, Netzer notes, is that the findings don’t imply that authors of nonreplicable studies are operating with ill intent. “It’s not that the authors are lying or know they’re wrong. But they may have a hunch, or they may simply be less confident in their results,” he explains. That lack of confidence shows up in the paper’s writing style.
Why it matters: Amid a larger “attack on science,” the replication crisis adds fuel to the fire, undermining public trust in scientific findings.
As such, this research has significant implications for the scientific and academic communities. By highlighting the connection between language and replicability, it provides a new tool for assessing credibility. Netzer also hopes that his team’s work can contribute to the mounting efforts of the open science movement, which aims to increase transparency and reproducibility in research.
He also believes the general public can apply these findings to their critical thinking process when, for instance, listening to science podcasts or reading about a new “breakthrough” on social media. “As we’re listening to or reading science, we should pay attention to signals like: Does the research provide sufficient detail and elaboration, or does it report the results in an overly positive manner?” says Netzer. “Assessing these things can help us figure out if we can trust the research.”
Adapted from “The Language of (Non)replicable Social Science” by Oded Netzer from Columbia Business School, Michal Herzenstein from the University of Delaware, Sanjana Rosario from Columbia Business School, and Shin Oblander from the University of British Columbia.