The Exploration-Exploitation Trade-off in the Newsvendor Problem

Abstract

When an inventory manager attempts to construct probabilistic models of demand based on past data, demand samples are almost never available: only sales data can be used. This limitation, referred to as demand censoring, introduces an exploration-exploitation trade-off as the ordering decisions impact the information collected. Much of the literature has sought to understand how operational decisions should be modified to incorporate this trade-off. We ask an even more basic question: when does the exploration-exploitation trade-off matter? To what extent should one deviate from a myopic policy that takes, given the information at hand, the optimal decision for the current period without consideration for future periods? We analyze these questions in the context of a well-studied stationary multi-period newsvendor problem in which the decision-maker starts with a prior on a vector of parameters characterizing the demand distribution. We show that, under very general conditions, the myopic policy will almost surely learn, in the long run, the optimal decision one would have taken with knowledge of the unknown parameters. Furthermore, we analyze finite time performance for a broad family of tractable cases. Through a combination of analytical parametric bounds and exhaustive exact analysis, we show that the myopic optimality gap is negligible for most practical instances, articulating the conjunction of conditions that could lead to a non-trivial value. The collection of results establishes that the myopic policy is a viable and appealing heuristic for the newsvendor problem with demand censoring.

Authors: Omar Besbes, Juan Manuel Chaneton, and Ciamac Moallemi

Format: Working Paper

Publication Date: November 4, 2021

Full Citation

Besbes, Omar, Juan Manuel Chaneton, and Ciamac Moallemi. The Exploration-Exploitation Trade-off in the Newsvendor Problem. November 04, 2021.