Title: Scaling Datalog for Machine Learning on Big Data
Mr. Vinayak Borkar, University of California, Irvine
Date & Time: April 17, 2012 15:30
Venue: SIC 201, 2nd Floor, C Block, Dept. of Computer Science & Engg., Kanwal Rekhi Bldg.
The talk will present a case for a declarative foundation for machine learning on big data. Instead of creating a new system for each machine learning task, we explore the use of recursive queries to program a variety of machine learning systems. This approach allows us to use database optimization techniques to identify efficient evaluation strategies and to execute the tasks on a general-purpose data flow execution engine (Hyracks). As a proof of concept, we consider two diverse programming models from the machine learning domain, Pregel and Iterative Map-Reduce-Update, and show how they can be captured as Datalog queries. Experimental evaluation of the system on a large cluster with real data shows that such an approach provides good performance without losing generality or ease of programming. Joint work with Yingyi Bu, Michael J. Carey, Joshua Rosen, Neoklis Polyzotis, Tyson Condie, Markus Weimer, and Raghu Ramakrishnan.
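For readers unfamiliar with recursive queries, the following is a minimal sketch of how a recursive Datalog program can be evaluated bottom-up until a fixpoint is reached. It is not taken from the talk or from Hyracks: the graph-reachability rules, the function name `reachable`, and the sample data are all assumptions chosen for illustration, and the naive evaluation loop shown here is the textbook strategy rather than the optimized plans the abstract refers to.

```python
# Illustrative sketch only (not the speaker's system): naive bottom-up
# evaluation of a two-rule recursive Datalog program.
#
#   reach(X, Y) :- edge(X, Y).
#   reach(X, Z) :- reach(X, Y), edge(Y, Z).

def reachable(edges):
    """Return the least fixpoint of the rules above as a set of (x, y) pairs."""
    reach = set(edges)  # first rule: every edge fact is also a reach fact
    while True:
        # Second rule: join reach(X, Y) with edge(Y, Z) on the shared Y.
        derived = {(x, z) for (x, y) in reach for (y2, z) in edges if y == y2}
        new_facts = derived - reach
        if not new_facts:  # nothing new was derived, so the fixpoint is reached
            return reach
        reach |= new_facts


if __name__ == "__main__":
    # Hypothetical sample graph: a -> b -> c -> d
    sample_edges = {("a", "b"), ("b", "c"), ("c", "d")}
    for fact in sorted(reachable(sample_edges)):
        print(fact)
```

Each pass of the loop plays the role of one iteration of an iterative computation, which is the general intuition behind expressing iterative programming models as recursive queries.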
Speaker Profile:
Vinayak Borkar is Lead Software Engineer on the ASTERIX project at UC Irvine, in addition to being a PhD candidate. He has worked in Silicon Valley for many years after finishing his MTech at IIT Bombay.
