Why better AI doesn't always mean better outcomes

There’s a popular narrative that as generative AI models get smarter, people and businesses will automatically become more productive. Each new release—whether from OpenAI, Google, Anthropic, or others—tends to be treated as a near-linear upgrade over the last, promising sharper reasoning, better writing, stronger coding, and more sophisticated image generation. The implication is that progress in AI capability naturally translates into progress in human performance.

But new research suggests the reality is more complicated. In a recent paper, Columbia Business School Assistant Professor David Holtz, along with researchers from MIT, Stanford, the University of Maryland, the University of Cyprus, and Microsoft, argues that better AI models alone do not guarantee better outcomes. A significant share of the upside depends on whether users learn to interact with the new systems differently.

“Forecasts tend to assume that generative AI models will just get better and better, and as a result, the productivity gains will grow and grow,” Holtz says. “Our research shows that in order to maximize those gains, we need to be constantly updating the way we interact with the models.”

The researchers call this process “prompt adaptation.” Through two experiments involving roughly 37,000 prompts, the team found that, in some cases, prompt adaptation accounted for nearly half of the improvement associated with a newer generative AI model.

But the importance of prompting turns out to vary dramatically depending on the task. When participants were trying to steer an AI system toward a precise outcome, prompt adaptation mattered enormously. In more open-ended creative work, the improvements came overwhelmingly from the model itself rather than from better prompting strategies.

The findings complicate the idea that AI progress is simply something users passively receive. Instead, to see the best results, individuals and organizations must work with AI, evolving their behavior as the models evolve.

Filling a gap in AI research

While an enormous amount of research has examined how generative AI models function, how capable they are becoming, and what their economic effects might be, far less attention has focused on how people actually learn to use these systems effectively. Prompting advice is often shared through online tutorials, Reddit threads, and “prompt engineering” guides, but there has been relatively little rigorous evidence demonstrating the importance of prompting strategies.

To investigate that angle, the researchers designed two online experiments involving nearly 3,750 participants. In both studies, users interacted with DALL-E 2 and DALL-E 3, two different image generation models released by OpenAI, repeatedly refining prompts over a series of attempts.

In the first study, participants were asked to complete a tightly bounded task: recreating a target image as accurately as possible. Success could be measured objectively by comparing the generated image with the original.

The second study was more open-ended and creative. Participants were asked to design logos for hypothetical organizations based on short briefs. For this task, there was no single correct answer, and quality had to be evaluated comparatively.

Participants assigned to DALL-E 3 performed better on both tasks, but the source of that improvement differed dramatically between the two studies. In the image-replication task, roughly half of the performance improvement from DALL-E 3 came not from the model alone, but from users adapting their prompts to the newer system. In the creative logo task, by contrast, most of the improvement came directly from the stronger model itself rather than from changes in prompting behavior.
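The split between a model effect and a prompting effect can be illustrated with a small calculation. This is a minimal sketch, not the paper's actual method: it assumes that prompts written for the older model can be replayed on the newer one, and the function name, argument names, and scores below are all hypothetical.

```python
def decompose_improvement(old_model_old_prompts, new_model_old_prompts,
                          new_model_new_prompts):
    """Split a total quality gain into a model effect and a prompt effect.

    All three arguments are average quality scores on the same scale
    (hypothetical; the article does not specify the metric).
    """
    # Total gain: new model with adapted prompts vs. old model with old prompts.
    total = new_model_new_prompts - old_model_old_prompts
    # Model effect: same (old) prompts, only the model changes.
    model_effect = new_model_old_prompts - old_model_old_prompts
    # Prompt-adaptation effect: same (new) model, only the prompts change.
    prompt_effect = new_model_new_prompts - new_model_old_prompts
    # Share of the total gain attributable to prompt adaptation.
    prompt_share = prompt_effect / total if total else float("nan")
    return {
        "total": total,
        "model_effect": model_effect,
        "prompt_effect": prompt_effect,
        "prompt_share": prompt_share,
    }


# Illustrative numbers only (not results from the study), on a 0-100 scale.
result = decompose_improvement(50, 55, 60)
print(result)
```

With these made-up scores, half of the ten-point gain comes from the model and half from changed prompts, which mirrors the pattern described for the image-replication task. A share near zero would instead correspond to the pattern described for the logo task.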

A long-term investment for organizations

The findings carry important implications for organizations rushing to integrate generative AI into everyday operations. The biggest lesson is simply that adopting a stronger model is not enough on its own. Instead, as AI tools improve, it’s important to make recurring investments in the workers using those tools.

Consider a company using AI to sort and summarize customer-service complaints. A more advanced model might perform somewhat better out of the box. But employees who learn, through experimentation or training, how to better structure prompts using the new model could unlock substantially larger gains in accuracy and usefulness.

“Prompting these models is a lot more similar to managing human workers than it is to writing software,” Holtz says. “And you can't manage every human the same way. You need to learn what works for different people and adapt your management style accordingly.”

The paper also challenges the idea that prompting expertise is reserved for elite “prompt engineers.” In the experiments, ordinary users improved rapidly through trial and error, often within a single session. These results suggest that organizations may need to think less about static prompt templates and more about building adaptive habits around AI use. As models evolve, those habits may need to be updated.

“Every model is going to have its idiosyncrasies that you need to learn,” Holtz says. “If you want to get the best results out of any given model, you need to learn those idiosyncrasies.”

About the Researcher(s)

David Holtz

Assistant Professor of Business: Decision, Risk, and Operations Division

View the Research

Prompt Adaptation as a Dynamic Complement in Generative AI Systems
