This thesis presents our work on reinforcement learning in stochastic games. The first two parts of this manuscript study the problem of learning from batch data. We propose a first approach using approximate dynamic programming in zero-sum two-player Markov games and discuss its limitations for general-sum Markov games. As a second approach, we study the direct minimization of the Bellman residual for zero-sum two-player Markov games; this approach is then generalized to general-sum Markov games. Finally, we study the online setting and propose an actor-critic algorithm that converges in both zero-sum two-player Markov games and cooperative multistage games.
Thesis supervisor: Olivier PIETQUIN. Reviewers: Damien ERNST, Doina PRECUP. Examiners: Laurence DUCHIEN, Ronald ORTNER, Bilal PIOT, Bruno SCHERRER, Karl TUYLS.