CS 632: Advanced DBMS

S. Sudarshan

Spring 2014  

All students must sign up for CS 632, Spring 2014, on Piazza.

Previous offerings: 2013, 2011, 2010, 2009, 2007, 2006, 2004, 2003, 2002, 2001, 2000, 1999.

End sem paper and Midsem paper from 2011

About The Course

Reading material will consist primarily of research papers. Every student will have to present a research paper of their choice, either from the list below or another paper, subject to the instructor's approval. There will also be two exams (midsem and endsem), assignments, and a course project.

Anyone who does an exceptional course project that has the potential to become a publishable paper is eligible for a straight AA grade. Otherwise, the grading breakup is: midsem 25, endsem 40, project 20, and assignments plus seminar presentation 15 (the split of these 15 marks will depend on whether seminars are individual or joint, which in turn depends on the final enrollment).

Assignments: To be decided.

Project: This year the project must be implementation-oriented, unlike some previous years, when a literature survey was acceptable as a project. (You may still need to do some literature survey to decide on your project, though.) Projects should be done in groups of 2.

A basic project will take one of the papers we study in the course, or a related paper, implement the algorithms in the paper, and carry out a basic performance study. However, I expect most projects to improve upon existing techniques.

A more advanced project would take a problem specification for which no solution is publicly available, figure out how to solve it, and implement the solution.

Textbook (for background material only)

Database System Concepts, 6th Ed.
Avi Silberschatz, Hank Korth, and S. Sudarshan. McGraw Hill, 2010.
(book home page)
Query Optimization
1. (Jan 6/7) Rule-Based Query Optimization using the Volcano Framework.
Chapter 2 from Multiquery Optimization and Applications,
Prasan Roy, PhD thesis, IIT Bombay, 2000.

Related papers, not required reading:
Talk (ppt)
2. (Jan 9/13) Efficient and Extensible Algorithms for Multi-Query Optimization,
Prasan Roy, S. Seshadri, S. Sudarshan, and Siddhesh Bhobhe,
In ACM SIGMOD Conf. on the Management of Data, 2000.
Talk (ppt)
3. (Jan 14/16) Rewriting Procedures for Batched Bindings
Ravindra Guravannavar and S. Sudarshan, VLDB 2008

Related papers, not required reading:
Talk (ppt)
4. (Jan 21/22) Execution strategies for SQL subqueries
Mostafa Elhemali, Cesar A. Galindo-Legaria, Torsten Grabs, Milind Joshi
SIGMOD Conference 2007: 993-1004
(Talk from SIGMOD 07 (ppt))

Related papers, not required reading:
Talk (ppt)
Adaptive Query Processing
5. (Jan 27) Robust Query Processing through Progressive Optimization,
Volker Markl, Vijayshankar Raman, David E. Simmen, Guy M. Lohman, Hamid Pirahesh, SIGMOD 2004: 659-670
Talk (ppt)
Massively Parallel Data Management Systems (a.k.a. Big Data Systems)
Background reading: The parallel database chapter and the distributed database chapter from DB Concepts.
Slides: Chapter 18: Parallel Databases, and Chapter 19: Distributed Databases (plus 3PC, not available on book site)
6. (Jan 30) Bigtable: A Distributed Storage System for Structured Data
Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, OSDI 06
Video of talk by Jeff Dean: Local mp4 copy OR on video.google.com

Related papers, not required reading:
Talk (ppt)
7. (Jan 30, Feb 3) PNUTS: Yahoo!'s Hosted Data Serving Platform,
Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver and Ramana Yerneni.
VLDB (industry track) 2008.

Related papers, not required reading:
  • Database implementation on S3 (Brantner et al., SIGMOD 2008)
VLDB Talk by Brian Cooper (ppt)
8. (Feb 4) Asynchronous view maintenance for VLSD databases
Parag Agrawal, Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava, Raghu Ramakrishnan, SIGMOD 2009

Old talk (odp) and (pdf)
Related papers, not required reading:
  • The Megastore paper (see above), to understand how it does asynchronous maintenance of indices.
Talk (ppt)
9. (Feb 6) Incorporating Partitioning and Parallel Plans into the SCOPE Optimizer
Jingren Zhou, Per-Ake (Paul) Larson, and Ronnie Chaiken,
in Proc. of the Int'l Conf. on Data Engineering (ICDE), 2010.
Related papers, not required reading:
Talk (pptx)
10. (Feb 10, 11, 13) Spanner: Google's Globally-Distributed Database
James C. Corbett et al., OSDI 2012
Talk (pptx)
Week of Feb 18-23: Midsemester Exam
11. (Feb 24) Pregel: a system for large-scale graph processing
Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski, SIGMOD 2010.
Talk (pptx) by Gaurav Malpani and Mayank Singhal (Feb 18, 2011)
Related papers, not required reading:
Talk (pptx)
12. (Feb 25, 27) Calvin: Fast Distributed Transactions for Partitioned Database Systems
Alexander Thomson, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao, and Daniel J. Abadi.
SIGMOD 2012
Talk (pptx)
13. (Mar 3) Hekaton: SQL Server's memory-optimized OLTP engine.
Cristian Diaconu, Craig Freedman, Erik Ismert, Per-Åke Larson, Pravin Mittal, Ryan Stonecipher, Nitin Verma, Mike Zwilling
SIGMOD Conference 2013: 1243-1254
Talk has been put up on the Moodle site
Column Stores
14. (Mar 4) Column-stores vs. row-stores: how different are they really?
Daniel J. Abadi, Samuel Madden, Nabil Hachem:
SIGMOD Conference 2008: 967-980
Talk from 2010, Talk from 2011 (ppt) by Paresh Modak and Souman Mandal, and Talk from 2010 (pdf) (Talk from 2010 source files) by Subhro Bhattacharyya and Souvik Pal
See also VLDB 09 tutorial on column stores by Harizopoulos, Abadi and Boncz
Related papers, not required reading
Talk (pdf) by E. K. Venkatesh
15. (Mon Mar 10, Tue Mar 11) RDF-3X: a RISC-style Engine for RDF
Thomas Neumann, Gerhard Weikum, VLDB 2008
Talk in 2013 (pptx) by Pankaj Vanwari
Talk (odp) by Karan Punamiya
Streaming Data
16. (Thu Mar 13, Mon March 18) Monitoring Streams - A New Class of Data Management Applications,
Donald Carney, Ugur Çetintemel, Mitch Cherniack, Christian Convey, Sangdon Lee, Greg Seidman, Michael Stonebraker, Nesime Tatbul, Stanley B. Zdonik
VLDB 2002: 215-226
Talk in 2011 (pptx) by Joydip Datta and Debarghya Majumdar, and Talk in 2013 (pptx) by Ajay Gupta and Vinit Deodhar.
You must also read this talk: PODS 2002 talk by Motwani
Related papers, not required reading
Talk (pptx) by Bharat Radhakrishnan
OLAP
17. (Mar 18/20/24) OLAP Over Uncertain and Imprecise Data Douglas Burdick, Prasad Deshpande, T. S. Jayram, Raghu Ramakrishnan and Shivakumar Vaithyanathan, VLDB 2005
(Talks: OLAP basics (pdf), and OLAP on uncertain/imprecise data (Talk from 2011 in ppt, Talk 1 from 2010 in pdf, and Talk 2 in odp and in ppt),
and a talk by T. S. Jayram)
Related material if you are interested, but not part of CS 632:
Talk (pptx) by Raghav Sagar
Test Data Generation
18. (Mar 24/April 1) Generating Test Data for Killing SQL Mutants: A Constraint-based Approach,
Shetal Shah, S. Sudarshan, Suhas Kajbaje, Sandeep Patidar, Bhanu Pratap Gupta, Devang Vira, ICDE 2011
Talk (ppt) by Shetal Shah in 2013
Related papers, not required reading:
Talk (ppt) by Sunny Raj Rathod
Declarative Data Processing (outside of databases)
19. (April 3/7) Declarative Networking
Boon Thau Loo, Tyson Condie, Minos Garofalakis, David E. Gay, Joseph M. Hellerstein, Petros Maniatis, Raghu Ramakrishnan, Timothy Roscoe, and Ion Stoica
CACM 52(11), Nov 2009
Talk (pptx) by Harsh Vardhan and Sandeep Joshi
Talk (pdf) by Sandeep Kale
20. (April 7/8) CrowdDB: answering queries with crowdsourcing
Michael J. Franklin, Donald Kossmann, Tim Kraska, Sukriti Ramesh and Reynold Xin
ACM SIGMOD 2011
Talk (and talk sources) by Anil Shanbag
21. (Apr 8/9) Automated Selection of Materialized Views and Indexes for SQL Databases
Sanjay Agrawal, Surajit Chaudhuri and Vivek Narasayya
VLDB 2000
Talk from 2013 (pptx) by Hasan Kumar Reddy A.
Talk (odp) and (pdf) by Tarun Jain
22. (Apr 10/14) Scalability for Virtual Worlds
Nitin Gupta, Alan J. Demers, Johannes Gehrke, Philipp Unterbrunner, Walker M. White
ICDE 2009
Talk in 2011 (ppt), and Talk in 2013 (pptx) by Pratik Patre and Biplab Kar by Siddharth Chinoy and Zibran Shaikh
Related papers, not required reading:
Talk (odp and pdf) by Mayuri Khardikar
IR and DB
23. (April 14,15) Keyword Searching and Browsing in Databases using BANKS
Gaurav Bhalotia, Charuta Nakhe, Arvind Hulgeri, Soumen Chakrabarti and S. Sudarshan, ICDE 2002

Related papers, not required reading:
Talk (ppt)
(Wed Apr 16) Discussion on the future of data management. Check out: Facebook News Feed: Social Data at Scale, by Serkan Piantino