February 25, 2025 at 2:30 pm
Solving pure exploration tasks in bandit models and beyond
Bandit models are well studied in the machine learning community due to numerous applications to online content optimization. In a multi-armed bandit model, an agent sequentially tries different actions, called arms. Each arm is associated with an unknown probability distribution. In a pure exploration task, the agent wants to learn something about these distributions (e.g., which arm has the largest expectation) by querying as few samples as possible from them. I will first present a generic algorithm, called Track and Stop, that attains the minimal sample complexity in the asymptotic regime in which the error probability goes to zero. The high computational cost of Track and Stop led to the development of other types of algorithms that are computationally more appealing while keeping a (near-)optimal sample complexity. In particular, I will advocate the use of the so-called Top Two algorithms. Finally, I will present some results about pure exploration in Markov decision processes, in which the agent's state can evolve with the chosen action and the goal is to learn a good policy (a mapping from states to actions).
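For readers unfamiliar with Top Two methods, here is a minimal sketch of one representative instance, Top Two Thompson Sampling, for best-arm identification with Bernoulli arms: at each round a posterior sample designates a "leader", and with some probability a re-sampled "challenger" is pulled instead. The fixed budget, the beta = 0.5 parameter, and the empirical-mean recommendation are illustrative simplifications of my own (the algorithms discussed in the talk use a statistical stopping rule rather than a fixed budget).

```python
# Illustrative sketch of Top Two Thompson Sampling for Bernoulli arms.
# Not the exact algorithm from the talk: the fixed budget and the final
# empirical-mean recommendation replace a proper stopping rule.
import numpy as np

rng = np.random.default_rng(0)

def top_two_thompson(means, budget=2000, beta=0.5):
    k = len(means)
    successes = np.ones(k)  # Beta(1, 1) prior on each arm's mean
    failures = np.ones(k)
    for _ in range(budget):
        # Leader: arm maximizing one joint posterior sample.
        leader = np.argmax(rng.beta(successes, failures))
        arm = leader
        if rng.random() > beta:
            # Challenger: re-sample the posterior (boundedly) until a
            # different arm wins, and pull that arm instead.
            for _ in range(100):
                arm = np.argmax(rng.beta(successes, failures))
                if arm != leader:
                    break
        reward = rng.random() < means[arm]  # Bernoulli observation
        successes[arm] += reward
        failures[arm] += 1 - reward
    # Recommend the arm with the highest empirical mean.
    return int(np.argmax(successes / (successes + failures)))

print(top_two_thompson([0.3, 0.5, 0.6]))  # typically recommends arm 2
```

The beta parameter balances how often the leader versus the challenger is sampled; beta = 0.5 is a common default in the Top Two literature.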
Amphi Ircica - 50 avenue Halley - Haute Borne - Villeneuve d'Ascq