CS 632 Advanced Data Base(d) Management Systems Home Page

CS 632: Advanced DBMS

S. Sudarshan

Spring 2013

All students must sign up for CS 632 on Piazza; click here to sign up.

Previous offerings: 2011, 2010, 2009, 2007, 2006, 2004, 2003, 2002, 2001, 2000, 1999.

End sem paper and Midsem paper from 2011

About The Course

Reading material will consist primarily of research papers. All students will have to present a research paper of their choice, either from the list below or other papers subject to instructors approval. There will also be two exams (midsem/endsem), assignments, and a course project.

Anyone who does an exceptional course project that has the potential to be a publishable paper is eligible for a straight AA grade. Otherwise the grading breakup would be midsem 25, endsem 40, project 20 and assignments plus seminar presentation 15 (the breakup of these will depend on whether we have individual or joint seminars, which depends on the final enrollment).

Assignments To be decided.

Project This year the project is mandatorily an implementation oriented project, unlike some previous years where a literature survey was acceptable as a project. (You may still need to do some literature survey to figure out your project though.) Projects should be done in groups of 2.

A basic project will take any of the papers we study in the course, or other related papers, and implement the algorithms in the paper, and do a very basic performance study. However, I would expect most projects to improve upon existing techniques.

A more advanced project would take a problem specification for which no solution is publicly available, figure out how to solve it, and implement the solution.

Textbook (for background material only)

Database System Concepts, 6th Ed.
Avi Silberschatz, Hank Korth, and S. Sudarshan. McGraw Hill, 2010.
(book home page) <

Query Optimization
1. (Jan 7/8)	Rule-Based Query Optimization using the Volcano Framework. Chapter 2 from Multiquery Optimization and Applications, Prasan Roy, PhD thesis, IIT Bombay, 2000. Related papers, not required reading: Prasan's full thesis The Volcano Optimizer Generator: Extensibility and Efficient Search. Goetz Graefe, William J. McKenna ICDE 1993: 209-218 The Cascades Framework for Query Optimization. Goetz Graefe IEEE Data Eng. Bull. 18(3): 19-29 (1995)	Talk (ppt)
2. (Jan 10/14)	Efficient and Extensible Algorithms for Multi-Query Optimization, Prasan Roy, S. Seshadri, S. Sudarshan, and Siddhesh Bhobhe, In ACM SIGMOD Conf. on the Management of Data., 2000.	Talk (ppt)
3. (Jan 15/17)	Rewriting Procedures for Batched Bindings Ravindra Guravannavar and S. Sudarshan, VLDB 2008 Related papers, not required reading: Optimizing Nested Queries with Parameter Sort Orders Ravindra Guravannavar, Ramanujam H.S., S. Sudarshan Talk (ppt) Reducing Order Enforcement Cost in Complex Query Plans Ravindra Guravannavar and S. Sudarshan, ICDE 2007 Tech Report (@arxiv.org) Talk (ppt)	Talk (ppt)
4. (Jan 21/22)	Execution strategies for SQL subqueries Mostafa Elhemali, Cesar A. Galindo-Legaria, Torsten Grabs, Milind Joshi SIGMOD Conference 2007: 993-1004 (Talk from SIGMOD 07 (ppt)) Related papers, not required reading: Query Processing for SQL Updates Cesar A. Galindo-Legaria, Stefano Stefani, Florian Waas SIGMOD Conference 2004: 844-849 Talk (ppt)	Talk (ppt)
Adaptive Query processing
5. (Jan 28/29)	Eddies: Continuously Adaptive Query Processing, Avnur and Hellerstein, SIGMOD 2000. (Talk taken from http://web.cs.wpi.edu/~cs561/s05/talks/eddy-sigmod00-cs561.ppt) (Adaptive Query Processing using Eddies (ppt) by Amol Deshpande) (Jan 25, 2011)	Talk (ppt)
6. (Jan 29/31)	Robust Query Processing through Progressive Optimization, Volker Markl, Vijayshankar Raman, David E. Simmen, Guy M. Lohman, Hamid Pirahesh, SIGMOD 2004: 659-670	Talk (ppt)
Massively Parallel Data Management Systems (a.k.a. Big Data Systems)
Background reading: The parallel database chapter and the distributed database chapter from DB Concepts. Slides: Chapter 18: Parallel Databases, and Chapter 19: Distributed Databases (plus 3PC, not available on book site)
7. (Feb 4/5)	Bigtable: A Distributed Storage System for Structured Data Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, OSDI 06) Video of talk by Jeff Dean: Local mp4 copy OR on video.google.com Related papers, not required reading: Megastore: Providing Scalable, Highly Available Storage for Interactive Services, Jason Baker, Chris Bond, James C. Corbett, JJ Furman, Andrey Khorlin, James Larson, Jean-Michel Leon, Yawei Li, Alexander Lloyd, and Vadim Yushprakh CIDR 2011 You can also read about the Google AppEngines DataStore API, an API in Python.	Talk (ppt)
8. (Feb 5/7)	PNUTS: Yahoo!'s Hosted Data Serving Platform, Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver and Ramana Yerneni. VLDB (industry track) 2008. Related papers, not required reading: database implementation on S3 (Brantner et al SIGMOD 2008)	VLDB Talk by Brian Cooper (ppt)
9. (Feb 11/12)	Asynchronous view maintenance for VLSD databases Parag Agrawal, Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava, Raghu Ramakrishnan, SIGMOD 2009 Old talk from 2010 (ppt) Related papers, not required reading: The Megastore paper (see above), to understand how it does asynchronous maintenance of indices.	Talk (odp) and (pdf)
10. (Feb 12)	SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets Ronnie Chaiken, Bob Jenkins, Per-Åke Larson, Bill Ramsey, Darren Shakib, Simon Weaver, and Jingren Zhou, VLDB 2008 Related papers, not required reading: Hive - a petabyte scale data warehouse using Hadoop, A. Thusoo, J. S. Sarma, N. Jain, Shao Zheng, P. Chakka, Zhang Ning, S. Antony, Liu Hao, and R. Murthy, ICDE 2010 Pig Latin: A Not-So-Foreign Language for Data Processing Chris Olston, Brian Reed, Utarsh Srivastava, Ravi Kumar and Andrew Tomkins SIGMOD 2008, Talk (ppt)	Talk (pptx)
11. (Feb 14)	Incorporating Partitioning and Parallel Plans into the SCOPE Optimizer Jingren Zhou, Per-Ake (Paul) Larson, and Ronnie Chaiken, in Proc. of the Int'l Conf. on Data Engineering (ICDE), 2010.	Talk (pptx)
Week of 18-23: Midsemester Exam
12. (Feb 25)	Guest lecture on HSearch by Abinasha Karana, Founder and CTO Bizosys, Bangalore	Talk (pdf)
IR and DB
13 and 14. (Feb 26, 28)	Keyword Searching and Browsing in Databases using BANKS Gaurav Bhalotia, Charuta Nakhe, Arvind Hulgeri, Soumen Chakrabarti and S. Sudarshan, ICDE 2002 Related papers, not required reading: Bidirectional Expansion For Keyword Search on Graph Databases, Varun Kacholia, Shashank Pandit, Soumen Chakrabarti, S Sudarshan, Rushi Desai and Hrishikesh Karambelkar, VLDB 2005 (Talk: ppt)	Talk (ppt)
Peer to Peer Systems
15. (Mar 4 or 5)	Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications, I. Stoica, R. Morris, D. Karger, M. Frans Kaashoek, H. Balakrishnan, In Proc. ACM SIGCOMM 2001. Expanded version appears in IEEE/ACM Trans. Networking, 11(1), February 2003. P2P overview talk (in pdf) and Talk (pptx)) Old talk from 2010 Related papers, not required reading: A Scalable Content-Addressable Network, S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker, In Proc. ACM SIGCOMM 2001) Querying the Internet with PIER Ryan Huebsch, Joseph M. Hellerstein, Nick Lanham, Boon Thau Loo, Scott Shenker, and Ion Stoica, VLDB 03 (Talk:ppt)	Talk (pdf) by Durgesh Samant
Big Data (again)
16. (Mar 5)	Spanner: Google's Globally-Distributed Database James C. Corbett et al., OSDI 2012	Talk (pptx) by Sagar Chordia
Column Stores
17. (Mar 7)	Column-stores vs. row-stores: how different are they really? Daniel J. Abadi, Samuel Madden, Nabil Hachem: SIGMOD Conference 2008: 967-980 Talk from 2010 and Talk from 2011 (ppt) by Paresh Modak and Souman Mandal . See also VLDB 09 tutorial on column stores by Hariozopoulos, Abadi and Boncz	Talk (pdf) (source files) by Subhro Bhattacharyya and Souvik Pal
Streaming Data
18. (Mar 11, 5 PM)	Monitoring Streams - A New Class of Data Management Applications, Donald Carney, Ugur Çetintemel, Mitch Cherniack, Christian Convey, Sangdon Lee, Greg Seidman, Michael Stonebraker, Nesime Tatbul, Stanley B. Zdonik VLDB 2002: 215-226 Talk (pptx) by Joydip Datta and Debarghya Majumdar You must also read this talk: (PODS 2002 talk by Motwani) Related papers, not required reading Aurora: A New Model and Architecture for Data Stream Management. Abadi, D. J., Carney, D., Cetintemel, U., Cherniack, M., Convey, C., Lee, S., Stonebraker, M., Tatbul, N., and Zdonik, S. The VLDB Journal 12 (2003), 120-139. Abadi, D., Ahmad, Y., Balazinska, M., Cetintemel, U., Cherniack, M., Hwang, J.-H., Lindner, W., Maskey, A. S., Rasin, A., Ryvkina, E., Tatbul, N., Xing, Y., and Zdonik, S. The Design of the Borealis Stream Processing Engine. In Proceedings of the 2nd Conference on Innovative Databasee Research (CIDR) (Jan. 2005), pp. 277-289. Models and issues in data stream systems, Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, Jennifer Widom PODS 2002 (PODS 2002 talk by Motwani)	Talk (pptx) by Ajay Gupta,Vinit Deodhar
19. (Mar 12)	Physically Independent Stream Merging Badrish Chandramouli, David Maier and Jonathan Goldstein ICDE 2012	Talk (pdf) by Amol Bhangdiya and Pushkar Khadilkar
Big Data yet again
20. (Mar 14)	Calvin: Fast Distributed Transactions for Partitioned Database systems Systems Alexander Thomson, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, and Daniel J. Abadi. SIGMOD 2012	Talk (pptx) by K. V. Mahesh and Abhishek Gupta
Declarative Data Processing (outside of databases)
21. (Mar 18)	Declarative Networking Boon Thau Loo, Tyson Condie, Minos Garofalakis, David E. Gay, Joseph M. Hellerstein, Petros Maniatis, Raghu Ramakrishnan, Timothy Roscoe, and Ion Stoica CACM 52(11), Nov 2009	Talk (pptx) by Harsh Vardhan and Sandeep Joshi
22. (Mar 19)	Scalability for Virtual Worlds Nitin Gupta, Alan J. Demers, Johannes Gehrke, Philipp Unterbrunner, Walker M. White ICDE 2009 Talk (ppt) by Siddharth Chinoy and Zibran Shaikh Related papers, not required reading: SEMMO: A Scalable Engine for Massively Multiplayer Online Games (Demonstration Paper) Nitin Gupta, Alan Demers, and Johannes Gehrke, SIGMOD 2008 Database Research Opportunities in Computer Games Walker White, Christoph Koch, Nitin Gupta, Johannes Gehrke, and Alan Demers, In SIGMOD Record, September 2007.	Talk (pptx) by Pratik Patre and Biplab Kar
Test Data Generation
23. (Mar 25)	Generating Test Data for Killing SQL Mutants: A Constraint-based Approach, Shetal Shah, S. Sudarshan, Suhas Kajbaje, Sandeep Patidar, Bhanu Pratap Gupta, Devang Vira, ICDE 2011 Related papers, not required reading: X-Data: Test Data Generation for Killing SQL Mutants Bhanu Pratap Gupta, Devang Vira and S. Sudarshan, ICDE 2010 (short paper). Talk at ICDE	Talk (ppt) by Shetal Shah
Big Data Returns once more
24. (Mar 25)	Nephele/PACTs: a programming model and execution framework for web-scale analytical processing Dominic Battré, Stephan Ewen, Fabian Hueske, Odej Kao, Volker Markl, Daniel Warneke SoCC 2010: 119-130	Talk by Karthik Ramachandra (DeWitt's slides (pptx) and Karthik's slides (pptx)
OLAP
25. (Apr 1)	OLAP Over Uncertain and Imprecise Data Douglas Burdick, Prasad Deshpande, T. S. Jayram, Raghu Ramakrishnan and Shivakumar Vaithyanathan, VLDB 2005 (Talk: Olap basics(pdf) and OLAP on uncertain/imprecise data (Talk from 2011 in ppt), Talk1 from 2010 in (pdf) ) and Talk 2 in odp and in ppt and by T. S. Jayram (Related material if you are interested, but not part of CS632: Probablistic Databases talk by Amol Deshpande on 12 Jan 2006) Probabilistic databases: diamonds in the dirt (and an extended version) Nilesh Dalvi, Christopher Re and Dan Suciu, CACM 52(7) July 2009	Talk by Haseeb and Deepankar
Big Data Yet Again and Again
26. (Apr 2)	Pregel: a system for large-scale graph processing Grzegorz Malewicz, Matthew H. Austern, Aart J.C Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski, SIGMOD 2010. Talk (pptx) by (Gaurav Malpani and Mayank Singhal) (Feb 18, 2011) Related papers, not required reading: Keyword Search on External Memory Data Graphs Bhavana Dalvi, Meghana Kshirsagar and S. Sudarshan, Proceedings of the VLDB 1(1), 1189-1204, 2008 ( ppt)	Talk (pptx) by Aishwarya G and Subashish Saha
27. (Apr 4)	Automated Selection of Materialized Views and Indexes for SQL Databases by Sanjay Agrawal, Surajit Chaudhuri and Vivek Narasayya.	Talk (pptx) by Hasan Kumar Reddy A.
28. (Apr 8)	Overflow from previous classes.
29. (Apr 9)	Highly available storage systems - A Practioner's perspective Talk by Joydeep Sen Sarma (Co-Founder Qubole, Bangalore, and co-creator of Hive) Abstract: In recent years a large number distributed systems have come up that aim to provide reliable implementations of stateful applications (like file systems, hash-tables) in the presence of different types of failures. We will do a quick survey of such applications, look at failure scenarios a system designer has to deal with and some common patterns on choosing tradeoffs in consistency, availability and partitioning (CAP theorem) in building such systems. I will be providing perspectives from my experience with building and operating such systems. A good reference for this talk is Dr. Brewer's recent summary on CAP theorem: http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed	Talk (pptx) by Joydeep Sen Sarma
30. (Thu Apr 11)	April 11 is actually a holiday, but still, view this video and we will discuss it in a later class. Facebook News Feed: Social Data at Scale by Serkan Piantino
31. (Mon Apr 15)	RDF-3X: a RISC-style Engine for RDF Thomas Neumann, Gerhard Weikum, VLDB 2008	Talk (pptx) by Pankaj Vanwari
32. (Tue Apr 16)	Discussion on future of data management