As privacy regulations limit access to third-party data, companies are struggling to personalize customer experiences.
New research from Columbia Business School shows that real-time customer journey data — like search queries and filter use — can accurately predict preferences without relying on past purchase history.
The findings suggest companies don’t need to store customer data long term to deliver personalized experiences, offering a potential win-win-win for privacy advocates, businesses, and consumers alike.
Key Takeaways:
- In recent years, regulations and industry shifts have limited companies’ access to third-party data — information about online behavior collected by external companies as users browse different websites.
- As a result, companies are increasingly looking to leverage first-party customer journey data — like queries, clicks, and filtering behavior that occur on a brand’s own website — to better understand customer preferences.
- Analyzing first-party interactions during the purchase journey can sharpen predictions of customers’ preferences quickly: After just five clicks, predictions of a customer’s preferred airline alliance were up to 73 percent better than those made when the customer first arrived on the site.
- This approach helps solve the “cold start” problem of serving new or unidentified customers, while at the same time respecting users’ privacy by relying only on current-session data.
Preserving User Privacy
Think back to the last time you searched for a flight — maybe it was for a family vacation to the Caribbean or a business trip to Chicago. When you started your search, you probably used the filter features on your travel platform of choice, selecting preferences like departure time or specific airlines.
Those filters — and the flight options you clicked in the search results — likely varied depending on the trip you were planning to take. Knowing you’d be traveling with kids, you might have searched for exclusively nonstop flights to St. Lucia. For that business conference on your company’s dime, you might have excluded Basic Economy in favor of more comfortable classes. For another work trip, you might have even booked flights taking you to and from your destination in a single day.
These digital breadcrumbs — or first-party data collected directly from interactions with a company’s website or app — provide valuable insights into your buying behavior. For travel platforms, this intel is incredibly useful. If they know you prefer to fly on Sundays and want a nonstop journey, they won’t waste your time showing you Tuesday departures with two layovers.
But how much first-party data do companies really need to predict customer preferences — and how can they leverage it effectively while preserving user privacy? These are the core questions Oded Netzer, the Arthur J. Samberg Professor of Business at Columbia Business School, wanted to answer in a recent research initiative.
“Our thought process was that maybe there’s more information on a company’s own website than they tend to give credit to — especially when you start looking not only at what someone bought but their entire purchase journey,” he says.
First-Party Data and the ‘Cold Start’ Problem
First-party data has become a first-tier priority, thanks to evolving trends in consumer protection and privacy advocacy — from Europe’s GDPR to Google’s purported move away from third-party cookies.
“The past few years, we’ve been hearing a lot about the movement to increase consumer privacy by restricting companies' ability to store historical data or use third-party data. This could be a positive change in terms of consumer privacy, but it could also make it more difficult for consumers to find what they want,” says Netzer. “That sparked us to think about what companies can actually learn from the data they already have on hand in order to offer better products to consumers.”
Traditional uses of first-party data rely heavily on purchase history. In many instances, however, that history is sparse or irrelevant to a consumer’s current needs. (Your family vacation last December reveals little about the ideal flight for that midweek business trip.)
This challenge is known as the “cold start” problem: It’s difficult to serve customers effectively when you have little or no historical data to draw on. “Our goal with analyzing the journey was to compensate for that lack of historical data by looking at what consumers are doing now,” says Netzer.
A New Model for Analyzing Buying Decisions
To analyze consumer behavior, the researchers developed a probabilistic machine learning model that combined various types of first-party data: search queries, clicks, filters, and ultimate purchase decisions. It also factored in historical purchase data when available but weighted it differently based on relevance to the current search. “If I know that you’ve bought a flight on United three times in the past but I currently see four clicks on an American Airlines flight, our model allows us to figure out how to weigh those pieces of information,” says Netzer.
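To make the weighing idea concrete, here is a minimal sketch in Python. It is not the researchers’ model. It swaps in a simple Dirichlet-multinomial estimate in which past purchases count as fractional observations and current-session clicks count in full. The function name `blend_preferences` and the parameters `history_weight` and `prior_strength` are hypothetical, and the fixed `history_weight` stands in for a relevance weight that the actual model would presumably infer from the data.

```python
from collections import Counter


def blend_preferences(options, past_purchases, session_clicks,
                      history_weight=0.5, prior_strength=1.0):
    """Return the posterior-mean preference for each option.

    Illustrative assumption: each past purchase counts as `history_weight`
    of an observation, each current-session click counts as one full
    observation, and every option starts with `prior_strength`
    pseudo-observations so unseen options keep a nonzero probability.
    """
    past = Counter(past_purchases)
    clicks = Counter(session_clicks)
    scores = {
        option: prior_strength + history_weight * past[option] + clicks[option]
        for option in options
    }
    total = sum(scores.values())
    return {option: score / total for option, score in scores.items()}


# The scenario from the quote: three past United purchases,
# four current clicks on American Airlines flights.
airlines = ["United", "American", "Delta"]
probs = blend_preferences(
    airlines,
    past_purchases=["United"] * 3,
    session_clicks=["American"] * 4,
)
for airline in airlines:
    print(f"{airline}: {probs[airline]:.2f}")
```

With the default `history_weight` of 0.5, the four current clicks outweigh the three past purchases, so American ranks first. Raising `history_weight` above 1 would tip the estimate back toward United, which is the trade-off the quote describes.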
The researchers applied their model to data from a major online travel platform, analyzing more than 25,000 customer journeys. Using a Pitman-Yor process, a Bayesian nonparametric method that infers the number of distinct contexts from the data rather than fixing it in advance, they were able to segment journeys by travel need. Each context revealed distinct patterns in areas such as price sensitivity, advance booking windows, and preferences for direct flights.
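For readers curious about how a Pitman-Yor process lets the number of contexts grow with the data, the sketch below computes its standard assignment rule: given how many journeys already sit in each context, it returns the probability that the next journey joins each existing context or opens a new one. This covers only the prior over groupings, not the researchers’ full model, which also accounts for what happens within each journey. The function name and the example counts are hypothetical, and the sketch assumes a positive concentration parameter for simplicity.

```python
def pitman_yor_assignment_probs(context_counts, discount=0.5, concentration=1.0):
    """Prior probabilities for assigning the next journey to a context.

    context_counts: number of journeys already in each existing context.
    discount:       Pitman-Yor discount d, with 0 <= d < 1.
    concentration:  Pitman-Yor concentration theta (assumed > 0 here).

    Returns (probabilities for each existing context, probability of a new one).
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must satisfy 0 <= discount < 1")
    if concentration <= 0:
        raise ValueError("this sketch assumes concentration > 0")
    if any(count < 1 for count in context_counts):
        raise ValueError("every existing context needs at least one journey")

    n = sum(context_counts)   # journeys seen so far
    k = len(context_counts)   # contexts seen so far
    denom = concentration + n

    # Existing context j: (n_j - d) / (theta + n)
    existing = [(count - discount) / denom for count in context_counts]
    # New context: (theta + d * k) / (theta + n)
    new = (concentration + discount * k) / denom
    return existing, new


# Hypothetical example: 10 journeys already grouped into three contexts.
existing, new = pitman_yor_assignment_probs([6, 3, 1])
print([round(p, 3) for p in existing], round(new, 3))
```

Larger contexts attract new journeys more strongly, while the chance of opening a new context rises with the number of contexts already found. That is why the method can keep discovering rarer travel needs as more journeys arrive.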
Surprising Depth of Insights from Just a Few Clicks
The research revealed that companies can learn a surprising amount about customer preferences from just a few interactions. “After just two clicks, we could predict your airline alliance by 25 percent more than we could when you entered the website — and after five clicks, by 73 percent more,” says Netzer. And when it came to final flight selection, the researchers’ model predicted the actual product chosen by a customer 10 times more accurately than models relying on historical purchase data alone.
The model’s effectiveness has implications for personalization in marketing, including improved recommendations. The research also demonstrates promising applications for retargeting campaigns directed at potential customers who have previously engaged with the brand. The model identified which flights to show in retargeting ads, improving predicted click-through rates by up to 28 percent compared to traditional methods.
The Future of First-Party Data
While a growing emphasis on privacy-compliant data handling poses significant challenges for businesses, Netzer believes this model could offer a practical solution — one that could enable firms to serve customers better and, at the same time, appease regulators. For instance, policymakers could let companies use customer journey data within a single browsing session, so long as they delete it immediately after the session concludes.
“The beauty here is that this data is cheap,” he adds. “You can delete it immediately after the customer leaves the website, so the privacy issue is very low. At the same time, it can be valuable for helping the customer find what they want.”
In addition to improving product recommendations, Netzer thinks such real-time analysis could help companies zero in on the elusive “moment of truth” — the point where a customer shifts from simply exploring their options to making a purchasing decision. By identifying such inflection points, companies can better help the customer progress in their purchase journey. “In a world where there are almost infinite choices, this could help customers identify what they want very fast. This can help firms leverage that information and serve the right product to consumers as soon as they show up,” he says.
Netzer adds that the approach could apply well beyond flight purchases, particularly in industries such as furniture or automotive sales, where purchases are infrequent and the cold-start problem is therefore more common.
“Every journey has a context,” says Netzer. “The flexibility of this approach allows us to realize that not all historical data is relevant to the current journey, so you shouldn’t be pulling information from just that one bucket.”
Chart shows price sensitivity throughout the customer journey.
Adapted from “The Customer Journey as a Source of Information” by Oded Netzer of Columbia Business School, Nicolas Padilla of London Business School, and Eva Ascarza of Harvard Business School.