CS 632:  Advanced DBMS 
 S. Sudarshan  
Spring 2010  
 
  Previous offerings: 
2009, 
2007, 
2006, 
2004, 
2003, 
2002, 
2001, 2000, 1999. 
 End paper and
Midsem paper
from 2009
About The Course
Reading material will consist primarily of research papers. 
All students will have to present a research paper
of their choice, either from the list below or other
papers, subject to the instructor's approval.
There will also be two exams (midsem/endsem) and a course
project.
Anyone who does an exceptional course project that has the
potential to be a publishable paper is eligible for a 
straight AA grade.  Otherwise the grading breakup would be
midsem 25, endsem 40, project 25 and seminar presentation 10.
Textbook (for background material only)
Database System Concepts, 6th Ed.
Avi Silberschatz, Hank Korth, and S. Sudarshan.
McGraw Hill, 2010.
(book home page,
 Local copy of slides for all chapters)
 
The list of papers below is from 2009, and will get refined as we 
go along in 2010.  
Query Optimization
- 
Rule-Based Query Optimization using the 
Volcano Framework.  
 Chapter 2 from 
Multiquery Optimization and Applications,
 Prasan Roy, PhD thesis, IIT Bombay, 2000. 
ppt
 (Jan 4, 2010)
-  Efficient and Extensible Algorithms for Multi-Query Optimization,
 Prasan Roy, S. Seshadri, S. Sudarshan, and Siddhesh Bhobhe,
 In ACM SIGMOD Conf. on the Management of Data, 2000.
ppt
 (Jan 7, 2010)
- 
Rewriting Procedures for Batched Bindings
 Ravindra Guravannavar and S. Sudarshan, VLDB 2008
 Talk (ppt)(Jan 11, 2010)
 Related papers, not required reading:
- 
Execution strategies for SQL subqueries
 Mostafa Elhemali, Cesar A. Galindo-Legaria, Torsten Grabs, Milind Joshi
 SIGMOD Conference 2007: 993-1004
 Talk
from SIGMOD 07 (ppt)
Class lecture (ppt) (Jan 14, 2010)
- 
Query Processing for SQL Updates
 Cesar A. Galindo-Legaria, Stefano Stefani, Florian Waas
 SIGMOD Conference 2004: 844-849
 talk (ppt) (18 Jan 2010)
Adaptive Query Processing
-  
Eddies: Continuously Adaptive Query Processing, 
 Avnur and Hellerstein, SIGMOD 2000.
 (Eddies(ppt)) (taken from
http://web.cs.wpi.edu/~cs561/s05/talks/eddy-sigmod00-cs561.ppt)
 (Adaptive Query Processing using Eddies (ppt) by Amol Deshpande) 
(Jan 21, 25, 2010)
-  
Robust Query Processing through Progressive Optimization,
 Volker Markl, Vijayshankar Raman, David E. Simmen, Guy M. Lohman, 
Hamid Pirahesh,
SIGMOD 2004: 659-670
 PPT (Jan 25, 28, 2010)
- 
Scalable Join Processing on Very Large RDF Graphs
 Thomas Neumann and Gerhard Weikum, SIGMOD 2009 (Feb 4, 2010)
 (talk on basic rdf3x,
 talk on scalable join processing)
 IR and DB
-  Keyword Searching and Browsing 
in Databases using BANKS 
 Gaurav Bhalotia, Charuta Nakhe, Arvind Hulgeri, Soumen Chakrabarti and
S. Sudarshan, ICDE 2002
 (Long talk by Sudarshan, ppt),
(Short talk by Ramdas, pdf) 
(Feb 8 and 11 2010)
 
 Related papers, not required reading:
-  
Combining Keyword Search and Forms for Ad Hoc Querying of Databases
 Eric Chu, Akanksha Baid, Xiaoyong Chai, AnHai Doan and Jeffrey Naughton, 
SIGMOD 2009 (Feb 22, 2010) (talk (pptx))
 
 Related papers, not required reading:
Week of 13-20 Feb: Midsemester Exam
Massively Parallel Database/Storage Systems
-  Background reading: The parallel database chapter and the
distributed database chapter from DB Concepts.  
 Slides:  
Chapter 18: Parallel Databases, and 
Chapter 19: Distributed Databases (plus 3PC, not available on book site)
(Feb 22, 2010)
-  
Bigtable: A Distributed Storage System for Structured Data
 Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, OSDI 06
 Video of talk by Jeff Dean: Local mp4 copy OR on video.google.com
 Class Presentation (Feb 25, 2010)
 -  You can also read about the Google App Engine Datastore API, 
an API in Python, which is allegedly built on top of Google's
Megastore, which itself is supposedly a relational engine on top of 
Bigtable.  However, no details of Megastore are public, and 
the only online information comes from (believe it or not) 
a blog entry by James Hamilton of Microsoft SQL Server, 
derived from a talk by Jonas Karlsson at SIGMOD 2008.
 
- 
PNUTS: Yahoo!'s Hosted Data Serving Platform,
 Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver and Ramana Yerneni.
 VLDB (industry track) 2008.
 VLDB Talk by Brian Cooper (ppt) (March 4, 2010)
 Related papers, not required reading:
-  
MapReduce: Simplified Data Processing on Large Clusters
 Jeffrey Dean and Sanjay Ghemawat, OSDI 2004,
 Talk by Dinesh Dharme (8 March 2010)
- 
Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters
Hung-chih Yang, Ali Dasdan, Ruey-Lung Hsiao and D. Stott Parker, 
SIGMOD 2007
 Talk by Senthilnathan N (8 March 2010)
Database Testing
-  Reverse Query Processing 
 Carsten Binnig, Donald Kossmann and Eric Lo, ICDE 2007,
 Talk by  Bhupesh Chawda (11 March 2010)
 Related papers, not required reading:
- 
 Automating the Detection of Snapshot 
Isolation Anomalies
 Sudhir Jorwekar, Alan Fekete, Krithi Ramamritham, S. Sudarshan
 VLDB 2007: 1263-1274
 Talk by Shailendra Shrivastav (15 March 2010)
 Related papers, not required reading:
Peer to Peer Systems
-  
  Chord: A Scalable Peer-to-Peer Lookup Service for Internet
  Applications,
 I. Stoica, R. Morris, D. Karger, M. Frans Kaashoek, H. Balakrishnan,
 In Proc. ACM SIGCOMM 2001. 
Expanded version appears in IEEE/ACM Trans. Networking, 11(1), February 2003.
 P2P overview talk (in pdf)  and
 talk by
Mohammed Junaid and Gopalakrishnan S. (18 March 2010)
 Related papers, not required reading:
- 
A Scalable Content-Addressable Network,
 S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker, 
In Proc. ACM SIGCOMM 2001.
 
-    Querying the Internet with PIER
 Ryan Huebsch, Joseph M. Hellerstein, Nick Lanham, Boon Thau Loo,
Scott Shenker, and Ion Stoica, VLDB 03
 (Talk:ppt)
  Consistency and Asynchrony 
-  Consistency Rationing in the Cloud: Pay only when it matters
 Tim Kraska, Martin Hentschel, Gustavo Alonso and Donald Kossmann, VLDB 2009
 Talk by Sandeep and Rajashekar (22 March 2010)
-  
Asynchronous view maintenance for VLSD databases
 Parag Agrawal, Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava, Raghu Ramakrishnan, SIGMOD 2009
 Talk by Purva Joshi (05 April 2010)
Data Streams
-  
Finding the Frequent Items in Streams of Data
 Graham Cormode and Marios Hadjieleftheriou, VLDB 2008 and CACM 52(10) Oct 2009
 Talk by 
 Ankur Agarwal (25 March 2010)
 
 Related paper, not required reading
Query Processing, Resource Management, and 
	Approximation in a Data Stream Management System 
 Motwani, Widom, Arasu, Babcock, Babu, Datar, Manku, Olston, 
	Rosenstein and Varma, CIDR 2003
 (PODS 2002 talk by Motwani)
 Data Storage
-  
Column-stores vs. row-stores: how different are they really?
 Daniel J. Abadi, Samuel Madden, Nabil Hachem:
 SIGMOD Conference 2008: 967-980
 Talk by 
 Karthik SR (07 April 2010).
See also VLDB 09 tutorial on column stores by Harizopoulos, 
Abadi and Boncz
Security and Privacy
-  
Redundancy and Information Leakage
in Fine-Grained Access Control,
 Govind Kabra, Ravishankar Ramamurthy and S. Sudarshan
 Talk by 
 (Aditya Joshi and Subhait Datta, 08 April 2010)
 Also:  SIGMOD Talk, 
Overview of database security and 
an Overview of Fine-Grained Authorization
 
 
 Other interesting papers on privacy, not covered this year:
-  
l-Diversity: Privacy Beyond k-Anonymity,
 Ashwin Machanavajjhala, Johannes Gehrke, Daniel Kifer and
Muthuramakrishnan Venkitasubramaniam
 Talk: ppt
-  
Mondrian Multidimensional K-Anonymity
K. LeFevre, D. DeWitt, and R. Ramakrishnan. 
ICDE 2006
 
-  
Incognito: Efficient Full-Domain 
K-Anonymity.
 K. LeFevre, D. DeWitt, and R. Ramakrishnan, SIGMOD 2005.
 
-  
Protecting Privacy when Disclosing Information: k-Anonymity
and its Enforcement through Generalization and Suppression, 
 Pierangela Samarati and Latanya Sweeney,
 Procs. of the IEEE Symposium on Research in Security and Privacy, 1998.
 Talk (pdf)
 Dependence Detection 
-  Integrating conflicting data: the role of source dependence.
 Xin Luna Dong, Laure Berti-Equille and Divesh Srivastava. 
Procs. VLDB Endowment (PVLDB), 2(1): 550-561, 2009.
 talk by Divesh Srivastava (12 April 2010)
 Additional reading
XML Query Processing
-  Structural Joins: A Primitive for 
Efficient XML Query Pattern Matching, 
 D. Srivastava, S. Al-Khalifa, H.V. Jagadish, N. Koudas, J.M. Patel, Y. Wu, 
ICDE 2002.
 Talk by
Sandhya and Prabhas Samant (14 April 2010)
 Related papers, not required reading:
Uncertain and Probabilistic Data
-  
OLAP Over Uncertain and Imprecise Data
Douglas Burdick, Prasad Deshpande, T. S. Jayram, Raghu Ramakrishnan 
and Shivakumar Vaithyanathan, VLDB 2005
 (Talk: OLAP basics (pdf) and OLAP on uncertain/imprecise data (pdf))
 Talk (odp and ppt), and by T. S. Jayram (15 April 2010)
Related material if you are interested, but not part of CS632:
-  Current research directions in data management: A discussion
(15 April 2009)