Everyone knows that how you say something is often just as important as what you intend to convey. People intuitively understand that storytelling, verbal flow, and logical connection make for a more compelling argument, but researchers have long struggled to both concretize and quantify this insight. As computational technology has advanced, researchers in fields ranging from computer science to English to psychology to management science have realized that digital tools may offer previously unattainable information about how we communicate.
Olivier Toubia, the Glaubinger Professor of Business in the Marketing Division of Columbia Business School, wanted to determine whether the semantic structure of a piece of writing (the way meaning is organized and conveyed in language), regardless of the specific content, might reveal something about the writer’s thought processes and creativity.
For most of human history, analysis at the scale required to obtain meaningful and statistically reliable results would have been impossible. Today, however, natural language processing (NLP) allows researchers to evaluate hundreds or thousands of discrete texts and draw connections that could not be identified without technological assistance.
These techniques offer new and revealing perspectives on ancient questions. As Toubia puts it, “It’s fascinating to be able to use computational methods to quantify these very human-specific activities.”
Key takeaways from the research:
- A natural language processing analysis of 40,000 college application essays showed significant correlations between the semantic structure of the essay and future academic performance, even after more than 100 controls were applied.
- High semantic volume, or the size of the conceptual territory covered by the essay, correlated positively with future GPA.
- Low semantic speed, meaning the tendency to clearly link parts of an argument or piece of writing, also corresponded with higher future GPA.
- Further research on structures of writing may have significant business applications in human resources, product development, and other areas.
How the research was done:
Toubia and his co-researcher began by identifying a unique data set: 40,000 college application essays and corresponding student information, including major, high school attended, socioeconomic background, and eventual GPA. Though Toubia and his colleague wrote code to sift, sort, and analyze the data, for reasons of privacy and research ethics, they had no direct access to student records.
NLP often works by mapping material onto hypothetical spaces. As human beings, we can perceive three dimensions in space as well as a fourth dimension: time. NLP allows analysis along hypothetical dimensions. Just as different types of ambiguity can coexist in written language — as critic William Empson laid out in his book Seven Types of Ambiguity — a natural language system can chart a piece of writing along dozens or hundreds of dimensions. Toubia calls this “a space of knowledge, a conceptual space.”
Toubia’s analysis used preexisting and extensively tested methods to break pieces of writing into 25-word “chunks” and geometrically map “key writing features of interest” in a multidimensional “semantic space.” With the data points mapped, he says, “each essay is represented by a path in the latent 300-dimensional semantic space.” Principles of geometry were then used to determine semantic volume, or the amount of ground covered, and semantic speed, or the speed with which content moves.
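To make the idea of an essay as a path more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the researchers' code: the function names, the toy two-dimensional word vectors, and the word-averaging step are all hypothetical, and the "spread" function is a simple stand-in for semantic volume, since the study's exact geometric computation is not described here. Only the 25-word chunk size and the notion of a path through a semantic space come from the description above.

```python
import math


def chunk_words(text, size=25):
    """Split text into consecutive chunks of `size` words (the last may be shorter)."""
    words = text.split()
    return [words[i:i + size] for i in range(0, len(words), size)]


def embed_chunk(words, vectors):
    """Average the vectors of the known words in a chunk.

    Hypothetical stand-in for a real embedding model; `vectors` maps
    lowercase words to equal-length lists of floats.
    """
    dim = len(next(iter(vectors.values())))
    known = [vectors[w.lower()] for w in words if w.lower() in vectors]
    if not known:
        return [0.0] * dim  # no known words: place the chunk at the origin
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]


def semantic_speed(path):
    """Mean Euclidean distance between consecutive points on the path."""
    if len(path) < 2:
        return 0.0
    steps = [math.dist(a, b) for a, b in zip(path, path[1:])]
    return sum(steps) / len(steps)


def semantic_spread(path):
    """Root-mean-square distance of the points from their centroid.

    Assumption: a crude proxy for "ground covered", not the study's
    volume measure.
    """
    if not path:
        return 0.0
    dim = len(path[0])
    centroid = [sum(p[d] for p in path) / len(path) for d in range(dim)]
    return math.sqrt(sum(math.dist(p, centroid) ** 2 for p in path) / len(path))


# Toy usage with made-up 2-dimensional vectors (a real space would have ~300).
toy_vectors = {"music": [1.0, 0.0], "piano": [0.9, 0.1], "physics": [0.0, 1.0]}
essay = "music piano music physics physics piano"
path = [embed_chunk(chunk, toy_vectors) for chunk in chunk_words(essay, size=2)]
print(semantic_speed(path), semantic_spread(path))
```

In this sketch, an essay that takes small steps between neighboring chunks gets a low speed, while an essay whose chunks end up far from their common center gets a high spread, which mirrors the distinction drawn between the two measures.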
What the researchers found:
The study found that higher semantic volume paired with lower semantic speed in college essays predicted better academic performance, even after more than 100 control factors (including major, high school attended, and socioeconomic background) were considered. High semantic volume means that an essay covers a good deal of conceptual and semantic ground; it does not end where it began. Lower semantic speed means that the essay doesn’t jump about in a cryptic or free-associative way; its logic and linkages are apparent to an attentive reader.
For Toubia, this latest research is a continuation of ongoing studies that have examined, for example, the semantic structure of successful television shows. He believes future research may help demonstrate things that artificial intelligence cannot do, though so far, he and his colleagues have had success evaluating everything from speed-dating transcripts to television scripts.
In business, Toubia believes that topographical analysis of writing might be useful for human resources professionals. One application, he says, would be “detecting the talents and skills of your prospective employees, not just about what they write about but how they write. And this would be easier and fairer to apply than some common tests, which people can study for or use ChatGPT to answer.”
Eventually, these tools may also prove valuable for companies looking to improve idea generation and hire more innovative people. By analyzing effective and creative writing, employers may even be able to nurture those skills and better understand the thought process for coming up with new products.
The authors conclude their paper with suggestions for further research on a range of topics, from the predictive value of cover letters to potential relationships between the semantic speed and volume of a person’s writing and that person’s penchant for convergent or divergent thinking. Convergent thinking involves identifying the best ideas or solutions, while divergent thinking involves developing new ideas or options; each has a vital role to play in everyday life and in business. The authors also hope to examine and quantify depth of understanding, and they propose studying the speed and volume of generative AI outputs.
Ultimately, Toubia says, this research offers insights into fundamental questions about humanity: “We’ve seen lots about GenAI and creativity. What is creativity? We’re trying to bring a scientific approach to creativity that suggests creativity is about finding these connections. It’s not about something that’s never been seen before. It’s about finding new combinations.”
Adapted from “The Topography of Thought” by Jonah Berger of the University of Pennsylvania’s Wharton School and Olivier Toubia of Columbia Business School.