CS631 : Implementation techniques in DBMS

Spring 2012

Previous years: 2006, 2005 (Exams in 2005: midsem and endsem)
Instructor: S. Sudarshan
Teaching Assistants: TBA

Textbook: Database System Concepts, Silberschatz, Korth and Sudarshan, 6th edition (2010), McGraw Hill.
Other reading material will be made available periodically.
Book slides, errata, solutions to selected exercises, and other resources are available at: http://db-book.com

Course contents: We will cover implementation techniques, including storage and indexing, query processing, and transaction processing (Chapters 10-16); database architectures (Chapters 17-19); and advanced topics (Chapters 22-26). Chapters 27-29 are for self-study but will be touched upon briefly in class. We will also cover several research papers during the course. This year there is an increased focus on Big Data, a hot area today. As part of the course assignments, you will write Map-Reduce programs on the Hadoop system. The default infrastructure for your course project is PostgreSQL; if you wish to use any other infrastructure, discuss it with me.

Evaluation scheme: Quizzes 20%, Mid-sem 20%, Assignments + Project 20%, End-sem 40%, and Homeworks 5% (sure, this adds up to 105%, so what?)
Note: All quizzes will be surprise quizzes. The best N-2 out of N quiz scores will be counted.
Audit Requirements: Must attend all classes and take all exams. No need to do homeworks/assignments/projects.
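
The quiz rule in the note above (best N-2 out of N scores counted) can be sketched as a small calculation. This is only an illustration: the function name `quiz_component`, the equal maximum score per quiz, and the equal weighting of the counted quizzes are assumptions, not something stated on this page.

```python
def quiz_component(scores, max_score=10.0, weight=20.0):
    """Scale the best N-2 of N quiz scores to the quiz weight (in percent).

    Assumptions (hypothetical, not from the course page): every quiz has
    the same maximum score, and counted quizzes are weighted equally.
    """
    n = len(scores)
    if n <= 2:
        raise ValueError("need more than two quizzes to drop two")
    # Sort from best to worst and keep the top N-2, i.e. drop the two lowest.
    best = sorted(scores, reverse=True)[: n - 2]
    return weight * sum(best) / (max_score * len(best))


# Five quizzes out of 10 each: the 0 and the 5 are dropped, 9 + 8 + 7 are kept.
print(quiz_component([8, 5, 0, 9, 7]))  # 16.0 (out of the 20% quiz weight)
```

Under these assumptions, a missed surprise quiz simply counts as one of the two dropped scores.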

Lecture schedule: TBA
Office hours: TBA
Project Information
  1. Project topic suggestions (under construction, will add more ideas)
  2. Project Groups: TBA. Information about Projects: TBA

PostgreSQL Resources (instructions on creating patch files for submitting your project)

Schedule of Lectures

| Lecture | Date | Topic | Notes |
| 1 | Jan 3 | Introduction/Overview + Chapter 10: Storage and File Structure | Assignment: Download and compile PostgreSQL, and open it in Eclipse, using the instructions provided |
| 2 | Jan 10 | Chapter 10: Storage and File Structure (Cont.) | Physical Storage in PostgreSQL: look inside the link for information on file layout, database page layout, and the free space map (lots more details in src/backend/storage/freespace/README), and on how PostgreSQL stores oversized attributes using the TOAST technique |
| 3 | Jan 13 | Guest Lecture: Anupam Singh, Jovian Data: A multidimensional database for the cloud | Talk part 1, part 2 |
| 4 | Jan 17 | Chapter 11: Indexing | Index access method interface in PostgreSQL (read only if you want to create new index types) |
| 5 | Jan 20 | Chapter 11: Indexing (Cont.) | Assignment on indexing and record representation (on moodle) |
| 6 | Jan 24 | Chapter 12: Query Processing | |
| 7 | Jan 27 | Chapter 12: Query Processing (Cont.) | |
| 8 | Jan 31 | Intro. to Query Optimization | |
| 8a | Feb 1 | Extra class: Overview of PostgreSQL internals -1 | PostgreSQL resources; in particular, see Tom Lane's talk |
| 9 | Feb 3 | Chapter 13: Query Optimization (Cont.) | Statistics in PostgreSQL, and examples of row estimation in PostgreSQL; in particular, see how PostgreSQL special-cases most common values (MCVs) |
| 10 | Feb 7 | Chapter 13: Query Optimization (Cont.) | Assignments on query plans in PostgreSQL |
| 10a | Feb 8 | Extra class: Overview of PostgreSQL internals -2, Debugging in Eclipse | |
| 11 | Feb 10 | Chapter 14: Transactions | |
| 12 | Feb 14 | Chapter 15: Concurrency Control | |
| 13 | Feb 17 | Chapter 15: Conc. Control (Cont.) | |
| | Feb 21 | No Class (Midsem) | |
| | Feb 25 | Midsemester Exam | |
| 14 | Feb 28 | Chapter 15: Conc. Control (Cont.): Snapshot isolation | |
| 15 | Mar 2 | Chapter 16: Recovery | |
| 16 | Mar 6 | Chapter 16: Recovery (Cont.) | |
| 17 | Mar 13 | Chapter 16: Recovery (Cont.): ARIES | |
| 18 | Mar 16 | Chapter 17: Database Architecture | Case study: Intel white paper on the SAP HANA system architecture |
| 19 | Mar 20 | Chapter 18: Parallel Databases | |
| 20 | Mar 21 | Chapter 18: Parallel Databases + Map Reduce | Map Reduce assignment |
| 21 | Mar 27 | Chapter 19: Distributed Databases | |
| 22 | Mar 30 | Chapter 19: Distributed Databases (Cont.) | |
| 23 | Apr 10 | Chapter 19: Distributed Databases (Cont.) | NoSQL Databases (ppt) |
| 24 | Apr 11 | Chapter 5: Sections on OLAP and Warehousing | |
| 25 | Apr 13 | Chapter 25: Spatial and Temporal Data and Mobility | Read the R-Tree paper by Guttman before class; talk on Spatial Index |
| 26 | Apr 17 | Guest Lecture by Vinayak Borkar: "Scaling Datalog for Machine Learning on Big Data", by Yingyi Bu, Vinayak Borkar, Michael J. Carey, Joshua Rosen, Neoklis Polyzotis, Tyson Condie, Markus Weimer, Raghu Ramakrishnan | Tech report. Talk is not public yet, but is on the moodle site |
| 27 | Apr 17 | Chapter 24: Advanced Application Development + Overview of Database Research | |