Optimal Exploration-Exploitation in a Multi-armed Bandit Problem with Non-stationary Rewards