Title: Distributed Cube Materialization on Holistic Measures
Speaker: Dr. Arnab Nandi
Date & Time: October 19, 2011 11:00
Venue: Conference Room, 1st floor, C Block, Dept. of CSE, Kanwal Rekhi Bldg.
Cube computation over massive datasets is critical for many important real-world analyses. Unlike commonly studied algebraic measures such as SUM, which are amenable to parallel computation, efficient cube computation of holistic measures such as TOP-K is non-trivial and often impossible with current methods. In this talk, we discuss real-world challenges faced while performing cube materialization tasks on Web-scale datasets. Specifically, we identify an important subset of holistic measures and introduce MR-Cube, a MapReduce-based framework for efficient cube computation on these measures. We provide extensive experimental analyses over both real and synthetic data. We demonstrate that, unlike existing techniques, which cannot scale to the 100-million-tuple mark for our datasets, MR-Cube successfully and efficiently computes cubes with holistic measures over billion-tuple datasets. This work was done at Yahoo! Research and was published at ICDE 2011.
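To make the algebraic/holistic contrast concrete, here is a toy illustration. It is not MR-Cube. It assumes that TOP-K means "the k most frequent items", and the data and the `top_k` helper are hypothetical. A SUM can be computed exactly by adding one partial value per partition. Merging each partition's local top-1 result, however, can miss the true global winner.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical toy data: (item, amount) rows split across two map-side partitions.
partitions: List[List[Tuple[str, int]]] = [
    [("x", 1), ("x", 1), ("x", 1), ("y", 2), ("y", 2)],  # local counts: x=3, y=2
    [("z", 1), ("z", 1), ("z", 1), ("y", 2), ("y", 2)],  # local counts: z=3, y=2
]
all_rows = [row for part in partitions for row in part]


def top_k(rows: List[Tuple[str, int]], k: int) -> List[str]:
    """Exact TOP-K most frequent items, computed over all the given rows."""
    return [item for item, _ in Counter(item for item, _ in rows).most_common(k)]


# SUM (algebraic): each partition emits one number, and adding the partials is exact.
partial_sums = [sum(amount for _, amount in part) for part in partitions]
merged_sum = sum(partial_sums)
exact_sum = sum(amount for _, amount in all_rows)

# TOP-1 (holistic): each partition emits only its local winner and that winner's count.
local_top: Counter = Counter()
for part in partitions:
    local_top.update(dict(Counter(item for item, _ in part).most_common(1)))
naive_top1 = [item for item, _ in local_top.most_common(1)]

# "y" is never a local winner, yet globally y=4 beats x=3 and z=3.
exact_top1 = top_k(all_rows, 1)

print(merged_sum, exact_sum, naive_top1, exact_top1)
```

The merged partial sums match the exact sum. The merged local winners omit "y", which is the true top-1 item. Exact computation of such measures needs per-group state that grows with the data rather than a fixed-size partial.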
Speaker Profile:
Arnab Nandi recently completed his PhD at Michigan and will be joining Ohio State Univ. in Spring 2011 as a tenure-track faculty member. Arnab spent a year at IIT Bombay after his bachelor's degree before going on to his PhD at Michigan.
