Abstract - Résumé

This document presents in a unified way different results about the optimal solution of several multi-armed bandit problems. We present and analyze algorithms for sequential decision making that adaptively sample several probability distributions with unknown characteristics, in order to achieve different types of objectives. Our contributions cover two types of problems. On the one hand, we study reward maximization in some variants of the classical bandit model; on the other hand, we focus on so-called active identification problems, in which there is no incentive to maximize reward, but one should instead optimize exploration in order to answer some (possibly complex) question about the underlying distributions. We highlight several common tools for solving these problems. First, lower bounds, which not only make it possible to assess the optimality of an algorithm, but also guide the design of asymptotically optimal algorithms; we indeed provide several examples of lower-bound-inspired algorithms. Then, we emphasize the importance of time-uniform self-normalized concentration inequalities for analyzing algorithms. Finally, on the algorithmic side, we present several variants of an important Bayesian principle called Thompson Sampling, which leads to easy-to-implement asymptotically optimal algorithms in some particular cases.
Defended on 13/11/2020.
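To make the Thompson Sampling principle mentioned in the abstract concrete, here is a minimal sketch of its classical Beta-Bernoulli instantiation: the learner maintains a Beta posterior on each arm's mean, draws one sample per arm from these posteriors, and plays the arm with the largest sample. This is only an illustration of the general principle, not any of the specific variants studied in the document; the function name `thompson_sampling_bernoulli` and its parameters are hypothetical, and the true arm means are passed in only to simulate rewards.

```python
import numpy as np

def thompson_sampling_bernoulli(arm_means, horizon, rng=None):
    """Sketch of Beta-Bernoulli Thompson Sampling (illustrative only).

    `arm_means` holds the unknown success probabilities, used here
    purely to simulate rewards. Returns the total reward collected.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_arms = len(arm_means)
    successes = np.ones(n_arms)  # Beta(1, 1) uniform prior on each mean
    failures = np.ones(n_arms)
    total_reward = 0
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its Beta posterior...
        theta = rng.beta(successes, failures)
        # ...and play the arm whose posterior sample is largest.
        arm = int(np.argmax(theta))
        reward = rng.binomial(1, arm_means[arm])
        # Conjugate posterior update for the played arm.
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward
```

For example, `thompson_sampling_bernoulli([0.3, 0.5, 0.7], horizon=10000)` would simulate a three-armed Bernoulli bandit; over time the posterior concentrates and the algorithm plays the best arm almost exclusively, which is the behavior that the asymptotic optimality results alluded to above make precise.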