Colloquium Polaris 07/09/2015

On July 9, 2015 at 2:00 pm

Speaker: Inderjit S. Dhillon

"NOMAD: A Distributed Framework for Latent Variable Models"

Latent variable models are the cornerstone of many machine learning problems. As data grows in size and complexity, developing scalable and distributed algorithms for these models is a contemporary challenge. In this talk, I will focus on two such problems of considerable current interest: matrix completion and topic modeling. We tackle these problems by developing a new framework, which we call NOMAD. In both problems, certain variables behave NOMAD-ically: they migrate from processor to processor after performing their tasks at each processor. As a result, the corresponding distributed algorithms are decentralized, lock-free, asynchronous and serializable (or almost serializable). Thanks to these properties, our NOMAD-ic algorithms exhibit good scaling behavior on matrix completion problems with billions of ratings and on topic modeling problems with billions of words. For example, on a distributed machine with 32 processors, each with 4 cores, we can solve a matrix completion problem with 2.7B ratings in 10 minutes, and a topic modeling problem with 1.5B word occurrences and 1024 topics in 16 minutes.

Highlights