Title: DataHub: A Hosted Platform for Organizing, Managing, Sharing, Collaborating, and Processing Data
Mr. Anant Bhardwaj, CSAIL MIT
Date & Time: January 7, 2015 14:30
Venue: Conference Room, C Block, 1st Floor, Department of Computer Science and Engineering, Kanwal Rekhi (KReSIT) Building
In this talk, I will describe DataHub, a hosted data platform we are building at MIT. The DataHub platform is 1) a hosted data store (files as well as databases) with versioning and collaboration capabilities, and 2) an app ecosystem that hosts apps for various data-processing activities such as ingestion, curation, integration, discovery, query, analytics, visualization, and machine learning. The DataHub platform exposes an SDK (Thrift-based APIs that can be compiled into any of the 20+ Thrift-supported languages), which developers and vendors can use to write apps and publish them to the DataHub App Center. DataHub users can use any of the apps from the App Center to process their data as fits their needs. I will also discuss some data-processing apps we have built for DataHub: a) Distill, an example-based ETL system for converting semi-structured text into a structured table; b) DViz, a simple visualization interface; and c) DataHub Notebook, an IPython extension that enables sophisticated data science directly inside DataHub.
Speaker Profile:
Anant Bhardwaj is a Ph.D. student in the Computer Science & Artificial Intelligence Laboratory (CSAIL) at MIT, co-advised by David Karger and Samuel Madden. His primary interest these days is in developing platforms and tools for managing and making sense of data. His research projects draw ideas from various fields such as databases, distributed systems, algorithms, machine learning, and human-computer interaction. His current projects are: 1) DataHub: a hosted platform for data management, 2) Distill: an example-based ETL system for converting semi-structured text into a structured table, 3) Barista: a distributed, synchronously replicated, fault-tolerant relational data store, and 4) Confer: a tool for conference planning (deployed at 13 academic conferences including CHI, CSCW, KDD, ACM MM, SIGMOD, SIGIR, and WSDM, with more than 18,000 unique users). He received an M.S. in Computer Science from Stanford University and a B.E. in Computer Engineering from the University of Pune. At Stanford, he worked in the Human-Computer Interaction (HCI) group with Scott Klemmer and Jeff Heer.
