We consider a sequential adaptive allocation problem formulated as a traditional two-armed bandit problem with one important modification: at each time step t, before selecting which arm to pull, the decision maker observes a random variable X_t that provides information about the reward of each arm. Performance is measured as the fraction of time an inferior arm (one generating lower mean reward) is pulled. We derive a minimax lower bound showing that, in the absence of sufficient statistical "diversity" in the distribution of the covariate X, a property that we shall refer to as lack of persistent excitation, no policy can improve on the best achievable performance in the traditional bandit problem without side information.
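
To make the setup and the performance measure concrete, the following is a minimal simulation sketch and not the paper's model or method. The linear mean rewards, the uniform covariate on [-1, 1], the Gaussian noise, and all names (`inferior_fraction`, `oracle_policy`, `ignore_covariate_policy`, `PARAMS`) are assumptions chosen only for illustration.

```python
import random

# Hypothetical reward model (an assumption, not taken from the paper):
# arm i has mean reward intercept_i + slope_i * x given covariate x.
PARAMS = [(0.0, 1.0), (0.0, -1.0)]  # arm 0 is better when x > 0, arm 1 when x < 0


def mean_reward(arm, x, params):
    """Mean reward of `arm` when the observed covariate is `x`."""
    intercept, slope = params[arm]
    return intercept + slope * x


def inferior_fraction(policy, params, horizon, rng, noise_sd=1.0):
    """Fraction of time steps at which `policy` pulls an inferior arm."""
    inferior_pulls = 0
    history = []  # (covariate, arm, reward) triples seen so far
    for _ in range(horizon):
        x = rng.uniform(-1.0, 1.0)   # side information X_t, observed before acting
        arm = policy(x, history)     # the decision may depend on X_t and the past
        means = [mean_reward(a, x, params) for a in (0, 1)]
        if means[arm] < max(means):  # the pulled arm has strictly lower mean reward
            inferior_pulls += 1
        reward = rng.gauss(means[arm], noise_sd)
        history.append((x, arm, reward))
    return inferior_pulls / horizon


def oracle_policy(x, history):
    """Benchmark that knows the true parameters; it never pulls an inferior arm."""
    return 0 if mean_reward(0, x, PARAMS) >= mean_reward(1, x, PARAMS) else 1


def ignore_covariate_policy(x, history):
    """Always pulls arm 0, discarding the side information."""
    return 0


if __name__ == "__main__":
    rng = random.Random(0)
    print("oracle:", inferior_fraction(oracle_policy, PARAMS, 2000, rng))
    print("ignores X_t:", inferior_fraction(ignore_covariate_policy, PARAMS, 2000, rng))
```

In this toy model the covariate determines which arm is better, so a policy that ignores X_t pulls an inferior arm roughly half the time, while the oracle never does. A learning policy would sit between the two, and the abstract's lower bound concerns how small this fraction can be made when the distribution of X is not sufficiently diverse.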

© Copyright 2011 IEEE.

Assaf Zeevi and Alexander Goldenshluger
Journal Article
Publication Date: March 01, 2011
Journal: IEEE Transactions on Information Theory

Full Citation

Zeevi, Assaf and Alexander Goldenshluger. “A Note on Performance Limitations in Bandit Problems with Side Information.” IEEE Transactions on Information Theory (March 01, 2011):