Uncovering the Costly Bias in Marketplace Testing

Statistical bias could be misleading your product and feature testing, according to research from Columbia Business School Professor Hannah Li, but solutions might be easier than you think.

By Jonathan Sperling
April 21, 2025
Research In Brief

About the Researcher(s)

Hannah Li

Assistant Professor of Business
Decision, Risk, and Operations Division


A/B testing can be a relatively quick and cost-efficient tool for leaders and their companies: test a new feature on a subset of users to understand its impact before broader deployment. In many industries, however, this testing comes with a serious caveat.

Imagine you're testing a new feature on your website — the impact of showing better-quality photos for rental listings on a platform like Airbnb. You randomly split users into two groups in preparation for an A/B test: a treatment group sees new, high-quality photos, while a control group sees the original, standard images.
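In practice, that split is usually deterministic, so a returning user always lands in the same group. A minimal sketch of such an assignment, with a hypothetical user_id and experiment salt (not Airbnb's or any platform's actual code):

import hashlib

def assign_group(user_id: str, salt: str = "photo-test-v1") -> str:
    # Hash the salted user ID so the same user always gets the same arm.
    digest = hashlib.sha256((salt + user_id).encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 == 0 else "control"

print(assign_group("user-42"))  # stable across sessions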

In a perfect world, each user's behavior would be unaffected by what the other group sees. But that assumption often breaks down in reality, especially in marketplaces or social networks. According to research from Hannah Li, an assistant professor in Columbia Business School's Decision, Risk, and Operations Division, users don't operate in isolation; they interact, compete, and influence each other.

"When you run A/B testing in marketplaces where you have users buying and selling things from each other, the users are no longer going to be independent," Li says.

Key Takeaways:

- Traditional A/B testing assumes a type of user independence, i.e., that the treatment assigned to one individual does not influence the behavior of another.

- On platforms like marketplaces or social networks, this assumption often fails because users interact, compete, or influence one another, creating interference bias.

- As a result, companies risk wrongly rolling out or rejecting features, all while believing they're making sound, data-driven decisions.

- Smarter experimental designs, such as Two-Sided Randomization, can reduce this bias.

- Other forms of bias can arise in recommendation systems, where users can strategically interact with recommendation algorithms by deliberately changing how they engage with content.

Preventing Statistical Bias

Li explained that when someone in the treatment group books a listing due to the higher-quality photos, there's now one less listing available for someone in the control group. This means the treatment unintentionally affects the control group, violating a core assumption of A/B testing: independence. 

That distortion is what Li and her fellow researchers call interference bias, an effect that can inflate results by as much as 230 percent, meaning companies might believe an intervention is more than twice as effective as it actually is. That can lead to false confidence in a product change: launching something you think is a success, only to find it doesn't work in the real world. Worse, it might cause you to kill ideas that would've worked, simply because your experiment didn't account for how users affect one another. All the while, the company believes it is making airtight, data-driven decisions.

In their research, Li and her co-researchers found that implementing the right experimental systems can curtail this bias.

Interference in Action

To investigate how interference bias arises in two-sided platforms, the researchers developed a formal marketplace model using continuous-time Markov chains. This mathematical framework allowed them to simulate a dynamic environment where buyers and sellers arrive, interact, and transact over time. 
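The paper's full model is richer than can be reproduced here, but a toy version conveys the mechanism. In the Gillespie-style sketch below, buyers and listings arrive as Poisson processes and compete for one shared inventory; all rates and probabilities are invented for illustration, not taken from the paper.

import random

def simulate(p_treat=0.6, p_ctrl=0.3, buyer_rate=1.0, listing_rate=0.8,
             horizon=10_000.0, seed=0):
    rng = random.Random(seed)
    inventory = 0
    bookings = {"treatment": 0, "control": 0}
    arrivals = {"treatment": 0, "control": 0}
    total_rate = buyer_rate + listing_rate
    t = 0.0
    while t < horizon:
        t += rng.expovariate(total_rate)  # time to the next event
        if rng.random() < listing_rate / total_rate:
            inventory += 1  # a new listing enters the market
        else:
            group = "treatment" if rng.random() < 0.5 else "control"
            arrivals[group] += 1
            p = p_treat if group == "treatment" else p_ctrl
            if inventory > 0 and rng.random() < p:
                inventory -= 1  # the booking removes the listing for everyone
                bookings[group] += 1
    return {g: bookings[g] / max(arrivals[g], 1) for g in bookings}

# The naive treatment-minus-control gap overstates what a full rollout
# would deliver, because treated bookings deplete the same inventory
# the control group draws from.
print(simulate())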

Li and her co-researchers found that this bias can be prevented through a novel form of experimental design, known as Two-Sided Randomization (TSR). Instead of randomizing only sellers or only buyers into treatment and control groups, TSR randomizes both sides of the marketplace simultaneously. This design allows the platform to measure the competition effects between sellers and between buyers, the source of the interference bias, and account for those effects in the experiment's estimates.
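A minimal sketch of the assignment step in TSR, with hypothetical IDs: both sides get an independent coin flip, so every potential transaction falls into one of four buyer-arm by listing-arm cells. The paper derives estimators that combine these cell outcomes to correct for competition effects; the bookkeeping below only shows the design, not the authors' estimator.

import random

rng = random.Random(1)

def coin_flip(ids):
    # Independent 50/50 assignment for one side of the market.
    return {i: ("treatment" if rng.random() < 0.5 else "control") for i in ids}

buyer_arm = coin_flip(range(1_000))   # hypothetical buyer IDs
listing_arm = coin_flip(range(500))   # hypothetical listing IDs

# Outcomes are tallied per (buyer arm, listing arm) cell; the two
# "mixed" cells are what reveal cross-side competition effects.
cells = {(b, l): [] for b in ("treatment", "control")
         for l in ("treatment", "control")}

def record_outcome(buyer_id, listing_id, booked: bool):
    cells[(buyer_arm[buyer_id], listing_arm[listing_id])].append(booked)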

This leads to far more accurate estimates of an experiment's Global Treatment Effect (GTE) — the metric most companies care about when deciding whether to roll out a feature to all users. Simulations from Li and her co-researchers' paper show that TSR consistently produces lower bias than standard experimental methods, across a wide range of market conditions.

If TSR is not feasible, there are other approaches companies can take, according to Li. Cluster Randomization, for example, groups users (e.g., by region) and randomizes whole clusters to minimize cross-group interaction, as in the sketch below.
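A sketch of that idea, assuming a hypothetical user-to-region mapping: randomization happens at the region level, so a user and most of the people they compete with share the same arm.

import random

rng = random.Random(7)
regions = ["NYC", "SF", "Austin", "Chicago", "Miami", "Seattle"]

# Flip whole clusters, not individual users.
region_arm = {r: ("treatment" if rng.random() < 0.5 else "control")
              for r in regions}

def assign(user_region: str) -> str:
    return region_arm[user_region]  # everyone in a region shares one arm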

Another technique is Switchback Testing. Instead of splitting users into control and treatment groups, the platform alternates the treatment across time periods for all users (e.g., on one day, off the next).
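A sketch with an illustrative daily schedule (real deployments choose period lengths carefully to balance interference against day-of-week effects):

from datetime import date

def arm_for(day: date, start: date = date(2025, 1, 1)) -> str:
    # The whole platform toggles together: on one day, off the next.
    return "treatment" if (day - start).days % 2 == 0 else "control"

print(arm_for(date(2025, 1, 1)))  # -> treatment
print(arm_for(date(2025, 1, 2)))  # -> control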

When Users Are Strategic

A subsequent paper by Li studies how people strategically interact with online platforms to influence the content recommended to them, another form of bias that can throw companies off.

Typically, platforms like TikTok, Netflix, and Amazon suggest content based on users' past behaviors, assuming user interactions are straightforward reflections of their preferences. However, Li and her co-researchers' study suggests that users often engage in strategic behavior to shape their future recommendations.

For instance, when participants were informed that an algorithm prioritizes "likes" and "dislikes," they used these features almost twice as much as those told the algorithm focuses on viewing time. Through surveys, the researchers found that nearly half of the participants admitted to altering their behavior on platforms to control future recommendations. Some users even reported avoiding content they enjoy to prevent the platform from over-recommending similar content in the future.

"If you watch a video on YouTube, the platform learns that you like it. If you don't watch it, they learn you don't like it. But what we heard is that users are strategizing. They may see a YouTube video and actually like it, but they know that if they click on it, they will get millions of the same videos for the next three weeks. So, they don't watch the video," Li says, adding that "when this happens, the data that's being collected is not representative of the user's true preferences."

Experimental Music

To study how users adapt their behavior in response to recommendation systems, Li and her co-authors created their own music streaming app—essentially a simplified version of Spotify. This gave them total control over what users saw and how the system reacted. By stripping away real-world platform complexities, they could focus entirely on whether users tried to “game” the algorithm.

The study’s 750 participants were randomly assigned to different conditions in a controlled environment. Everyone listened to songs and could “like” or “dislike” them, or just skip ahead. In the first session, participants used the music player naturally, as if they were on a real platform. 

In the following session, participants were randomly told different things about how the recommendation algorithm worked. Some were told the system cared most about likes/dislikes, others were told it prioritized listening time, and a control group got no guidance.

This setup let the researchers test how user behavior changed depending on what users believed the algorithm cared about—without changing the actual algorithm. By observing how people’s actions varied under these scenarios, the researchers could see whether users acted strategically—choosing actions not just based on personal enjoyment, but also based on what they thought would “train” the algorithm in their favor.

The researchers paid close attention to the number of “likes” and “dislikes” and how long users stayed on each song, or dwell time. The researchers also conducted follow-up surveys to confirm whether users admitted to similar strategic behaviors on real-world platforms like Spotify or TikTok.
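As a sketch of that comparison, with invented event records standing in for the study's data (the real schema is not public here), the per-condition like rate and average dwell time fall out of a simple aggregation:

from collections import defaultdict

# Hypothetical log rows: one per song impression per participant.
events = [
    {"condition": "told_likes", "liked": True,  "dwell_s": 41.0},
    {"condition": "told_dwell", "liked": False, "dwell_s": 88.5},
    {"condition": "control",    "liked": False, "dwell_s": 30.2},
]

stats = defaultdict(lambda: {"n": 0, "likes": 0, "dwell": 0.0})
for e in events:
    s = stats[e["condition"]]
    s["n"] += 1
    s["likes"] += e["liked"]   # True counts as 1
    s["dwell"] += e["dwell_s"]

for cond, s in stats.items():
    print(cond, "like rate:", s["likes"] / s["n"],
          "avg dwell (s):", s["dwell"] / s["n"])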

Li suggested that the fact users are strategizing indicates that recommendation systems, such as Instagram’s “Explore” page, may be over-indexing on known user preferences rather than exploring new content. Adjusting the algorithms to be less heavy-handed in pushing familiar content could help address this issue.

She also noted that, ideally, users would be able to easily alter the algorithm behind their personal feed rather than having to strategize their behavior. Giving users more control and transparency over the recommendation system could help mitigate strategization.

 

Adapted from “Measuring Strategization in Recommendation,” by Hannah Li of Columbia Business School and Sarah H. Cen, Andrew Ilyas, Jennifer Allen, and Aleksander Mądry of the Massachusetts Institute of Technology.

Also adapted from “Experimental Design in Two-Sided Platforms,” by Hannah Li of Columbia Business School, Ramesh Johari of Stanford University, Inessa Liskovich of Stanford University, and Gabriel Y. Weintraub of Stanford University.
