Abstract
We consider a repeated newsvendor problem in which the decision-maker (DM) does not have access to the underlying distribution of discrete demand. We analyze three informational settings: i) the DM observes realized demand in each period; ii) the DM only observes realized sales; and iii) the DM observes realized sales but also a lost sales indicator that records whether demand was censored or not. We analyze the implications of censoring on performance and key characteristics that effective policies should possess. We provide a characterization of the best achievable performance in each of these cases, where we measure performance in terms of regret: the worst case difference between the cumulative costs of any policy and the optimal cumulative costs with knowledge of the demand distribution. In particular, we show that for both the first and the third settings, the best achievable performance is bounded (i.e., does not scale with the number of periods) while in the second setting, it grows logarithmically with the number of periods. We link the latter degradation in performance to the need for continuous exploration with sub-optimal decisions and provide a characterization of the frequency with which this should occur.