The context of this thesis is dynamic recommendation, which we formalize as a contextual bandit problem. Recommendation is the act, by an intelligent system, of supplying a user of an application with personalized content so as to enhance what is referred to as "user experience", e.g. recommending a product on a merchant website or an article on a blog. Recommendation is considered dynamic when the content to recommend or the users' tastes evolve rapidly, e.g. news recommendation. Many applications of interest to us generate a tremendous amount of data through their millions of online users. Nevertheless, using this data to evaluate a new recommendation technique, or even to compare two dynamic recommendation algorithms, is far from trivial. This is the problem we consider here. Some approaches have already been proposed. Nonetheless, they have not been studied thoroughly, either from a theoretical point of view (unquantified bias, loose convergence bounds...) or from an empirical one (experiments on private data only). In this work we start by filling many gaps in the theoretical analysis. Then we comment on the results of an experiment of unprecedented scale in this area: a public challenge we organized. This challenge, along with some complementary experiments, revealed an unexpected source of huge bias: time acceleration. The rest of this work tackles this issue. We show that a bootstrap-based approach makes it possible to significantly reduce this bias and, more importantly, to control it.
Thesis advisor: Philippe PREUX, Université de Lille 3
Co-advisor: Jérémie MARY, Université de Lille 3
Reviewers: Olivier CAPPÉ, CNRS, Telecom ParisTech; Ludovic DENOYER, Université Paris 6
Examiners: Olivier CHAPELLE, CRITEO Labs; Rémi GILLERON, Université de Lille 3
Invited member: Lihong LI, Microsoft Research