
No Matter the Question, ChatGPT Wants to Be First: New Research Reveals ChatGPT’s Puzzling Bias for ‘Option A’

New Columbia Business School study finds that A.I. tools like ChatGPT consistently favor the first option presented when evaluating choices, but aggregating results from multiple, differently worded prompts cancels out the bias

Based on Research by
Melanie Brucks, Olivier Toubia
Published
November 12, 2025
Publication
CBS Newsroom
Focus On
Artificial Intelligence (AI), AI & Transformative Tech
Category
General Release
News Type(s)
Press Release
Topic(s)
Artificial Intelligence, Decisions, Digital IQ

About the Researcher(s)

Melanie Brucks

Assistant Professor of Business
Marketing Division

Olivier Toubia

Glaubinger Professor of Business
Marketing Division

View the Research

Prompt architecture induces methodological artifacts in large language models


New York, NY – Businesses have quickly adopted large language models like ChatGPT to handle time-consuming tasks such as analyzing large volumes of text. However, as the technology grows in popularity in the corporate world, experts are sounding the alarm about an emerging pitfall: a new Columbia Business School study reveals that when A.I. tools like ChatGPT are asked to evaluate options, such as candidates for a job opening, they consistently favor the first option in the prompt, regardless of how the prompt is written. The researchers find that this pattern holds across a broad range of multiple-choice questions and answer formats, meaning that ChatGPT and other large language models have a strong bias toward selecting the first answer option they are given.

The study, “Prompt Architecture Induces Methodological Artifacts in Large Language Models,” co-authored by Columbia Business School Professors Melanie Brucks and Olivier Toubia, finds that even subtle differences in how prompts are written can significantly shape the results produced by A.I. tools like ChatGPT. The researchers found that these models consistently favor whichever option is presented first, regardless of which option occupies that position, how the options are labeled, or how the question is framed. For example, ChatGPT may tell a recruiter that ‘Candidate A’ is more qualified than ‘Candidate B’ simply because ‘Candidate A’ appeared first in the prompt, even when the prompt explicitly states that order should not matter. However, the researchers also found that submitting multiple prompts with different phrasing and structure, varying the order, labels, and framing in each, and then averaging the results almost entirely eliminates this ordinal bias.
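The aggregation idea can be sketched in a few lines of Python. This is an illustrative sketch, not the researchers’ code: `ask_model` is a hypothetical stand-in for any LLM call that returns the label of the chosen option, and the helper names `build_prompts` and `aggregate_choice` are likewise hypothetical. The sketch enumerates every combination of option order, label set, and framing, so each candidate appears first in the same number of prompts, and then reports the share of prompts in which each candidate was chosen.

```python
from collections import Counter
from itertools import permutations
from typing import Callable, Dict, Iterator, Sequence, Tuple

# Hypothetical stand-in for any LLM call: takes a prompt string and returns
# the label of the option the model picked (e.g. "A"). Not a real library API.
AskModel = Callable[[str], str]


def build_prompts(
    candidates: Sequence[str],
    label_sets: Sequence[Tuple[str, ...]],
    framings: Sequence[str],
) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield one prompt per (order, label set, framing) combination, together
    with the mapping from each label back to its candidate.
    Each label set must contain exactly one label per candidate."""
    for order in permutations(candidates):
        for labels in label_sets:
            for framing in framings:
                mapping = dict(zip(labels, order))
                options = "\n".join(f"{label}: {cand}" for label, cand in mapping.items())
                yield f"{framing}\n{options}", mapping


def aggregate_choice(
    ask_model: AskModel,
    candidates: Sequence[str],
    label_sets: Sequence[Tuple[str, ...]],
    framings: Sequence[str],
) -> Dict[str, float]:
    """Return the share of prompt variants in which each candidate was chosen."""
    votes: Counter = Counter()
    answered = 0
    for prompt, mapping in build_prompts(candidates, label_sets, framings):
        label = ask_model(prompt).strip()
        if label in mapping:  # ignore answers that are not a valid label
            votes[mapping[label]] += 1
            answered += 1
    if answered == 0:
        raise ValueError("model returned no valid labels")
    return {cand: votes[cand] / answered for cand in candidates}


# Usage with a deliberately extreme toy "model" that always picks the first
# option listed (the line right after the single-line framing sentence).
def always_first(prompt: str) -> str:
    return prompt.splitlines()[1].split(":")[0]


shares = aggregate_choice(
    always_first,
    candidates=["Alice", "Bob"],
    label_sets=[("A", "B"), ("1", "2")],
    framings=[
        "Which candidate is more qualified? Reply with the label only.",
        "Order does not matter. Which candidate is more qualified? Reply with the label only.",
    ],
)
print(shares)  # {'Alice': 0.5, 'Bob': 0.5}
```

Even a model that always picks the first-listed option ends up splitting its choices evenly once every candidate appears first equally often. Full enumeration grows factorially with the number of options. With more than a handful of options, randomly sampling orders, labels, and framings, as the study’s “randomized prompts” suggest, is the more practical choice.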

“As A.I. becomes part of everyday decision-making from hiring and healthcare to public policy, companies and users need to recognize that no prompt is ever truly neutral,” said Olivier Toubia, the Glaubinger Professor of Business in the Marketing Division at Columbia Business School. “Rather than searching for the perfect way to phrase a question, our research shows that combining results from multiple, differently worded prompts can effectively cancel out bias and lead to more reliable outcomes.”

To test how prompt design influences A.I. behavior, the researchers conducted a series of large-scale, full-factorial experiments across two studies using OpenAI’s ChatGPT (including GPT-3 and GPT-4) and Meta’s Llama 3.1. In the first study, they asked the models to compare three sets of items (for example, three sets of five countries labeled A, B, and C) and decide whether the second or third set was more similar to the first. They systematically varied how each prompt was written: the labels, the order of the sets, the framing of the question, and whether the model was asked to justify its answer. Across 5,447 prompts, the models displayed strong biases: they chose the first-listed option 63% of the time and chose option ‘B’ over option ‘C’ 74% of the time, whereas an unbiased model would choose each option 50% of the time.

The researchers replicated this experiment with a simpler setup across 64,800 trials and found nearly identical results: the models chose the first option 64.29% of the time. However, when the researchers aggregated responses across multiple randomized prompts, the bias nearly disappeared, falling to 50.01% for ChatGPT and 50.06% for Llama. This suggests that combining results from several prompt variations is far more effective than trying to craft a single, “perfect” prompt.
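A short calculation shows why averaging over prompt variants can cancel a positional preference. It rests on a deliberately simplified assumption of our own, not the study’s model: the model picks whichever option is listed first with a fixed probability p, independent of content, and option X is listed first in a fraction q of the prompt variants.

```latex
\Pr(\text{model chooses } X) = q\,p + (1 - q)(1 - p)

% Single fixed order, X always listed first (q = 1):
\Pr(\text{model chooses } X) = p \approx 0.63

% Counterbalanced variants, X listed first in half of them (q = 1/2):
\Pr(\text{model chooses } X) = \tfrac{1}{2}\,p + \tfrac{1}{2}\,(1 - p) = \tfrac{1}{2}
```

Under this assumption the result holds for any value of p. The same reasoning applies to a fixed preference for a particular label, such as ‘B’ over ‘C’, provided each option receives each label equally often across the variants.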

Key findings from the research include:

  • Prompt Design Shapes Every A.I. Output. Each prompt fed to a large language model carries features, such as order, framing, and labeling, that collectively form its “prompt architecture” and can significantly influence results.

  • There is no such thing as a perfect prompt. Telling ChatGPT that “order doesn’t matter” or using different kinds of labels or wording does not significantly reduce bias.

  • Aggregating prompts eliminates bias. Combining results from multiple, differently worded prompts effectively cancels out the bias introduced by any single prompt.

“No matter how sophisticated A.I. tools become, bias will always exist in the prompts we provide. Our hope is that businesses and users make it standard practice to aggregate results rather than relying on a single prompt,” said Melanie Brucks, Assistant Professor of Business in the Marketing Division at Columbia Business School.


To learn more about the cutting-edge research being conducted, please visit the Columbia Business School Research website.

###
