CS631: Implementation Techniques in DBMS

Autumn 2014

Previous years: Autumn 2013, Autumn 2012, Spring 2012, 2006, 2005 (Spring 2012 exams: midsem and endsem)
Instructor: S. Sudarshan
Teaching Assistants: E K Venkatesh (ekvenkatesh@cse), Mayuri Khardikar (mayuri@cse), Tarun Jain (tarunjain@cse), Shubham Kumar (shubhamiit@cse), Ankit (ankit@cse)
TA office hours: every weekday from 5 to 6 PM, in Infolab, SIC 212, Kanwal Rekhi building
Mon: Mayuri + Shubham
Tue: Venkatesh + Ankit
Wed: Tarun
Thu: Mayuri + Venkatesh + Shubham
Fri: Tarun + Ankit
Course websites: In addition to this static web page, we will also use Moodle for assignment submissions and Piazza for discussions. Information on signing up for Piazza will be provided later.
Textbook: Database System Concepts, Silberschatz, Korth and Sudarshan, 6th edition (2010), McGraw Hill.
Other reading material will be made available periodically.
Book Slides, Errata, solutions to selected exercises and other resources are available at: http://db-book.com

Course contents: We will cover implementation techniques, including storage and indexing, query processing, and transaction processing (Chapters 10-16), database architectures (Chapters 17-19), and advanced topics (Chapters 22-26). Chapters 27-29 will be self-study, but will be touched upon briefly in class. We will also cover several research papers during the course. This year we will have an increased focus on Big Data, a hot area today. As part of the course assignments, you will write Map-Reduce programs on the Hadoop system. You will also set up PostgreSQL and learn how to make small changes to it as part of the course assignments. The infrastructure for your course project will be your choice of PostgreSQL or Hyracks (a parallel database system from UC Irvine). If you wish to use any other infrastructure, you can discuss it with me.

Evaluation scheme: Quizzes/Homework 17%, Mid-sem 20%, Assignments 8%, Project 15%, and End-sem 40%.
Note: All quizzes will be surprise quizzes. Only the best N-2 of your N quiz scores will be counted.
Audit requirements: You must attend all classes and take all exams. Homework, assignments, and projects are not required.
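As an illustration of the evaluation scheme and the best N-2 of N quiz rule, here is a minimal sketch. It assumes every score is on a 0-100 scale and that the quiz part is the plain average of the best N-2 quiz scores; the page does not say how quizzes and homework are combined, so the sketch simply takes the "Quizzes/Homework" component as one 0-100 number. The names WEIGHTS, quiz_component, and final_total are hypothetical and not part of any official course tooling.

```python
# Hypothetical sketch of the grading arithmetic described above.
# Assumption: all scores are on a 0-100 scale; names are illustrative only.

# Weights (in percent) taken from the evaluation scheme; they sum to 100.
WEIGHTS = {
    "quizzes_homework": 17,
    "midsem": 20,
    "assignments": 8,
    "project": 15,
    "endsem": 40,
}
assert sum(WEIGHTS.values()) == 100


def quiz_component(scores: list[float]) -> float:
    """Average of the best N-2 out of N quiz scores (assumed equally weighted)."""
    if len(scores) < 3:
        # With fewer than 3 quizzes, N-2 would be zero or negative.
        raise ValueError("need at least 3 quizzes so that N-2 >= 1")
    best = sorted(scores, reverse=True)[: len(scores) - 2]  # drop the two lowest
    return sum(best) / len(best)


def final_total(components: dict[str, float]) -> float:
    """Weighted total out of 100, given each component on a 0-100 scale."""
    return sum(weight * components[name] / 100 for name, weight in WEIGHTS.items())
```

For example, with quiz scores [40, 90, 70, 80, 30], the two lowest scores (40 and 30) are dropped and quiz_component returns 80.0. The two-step structure mirrors the page: the drop-two rule is applied first, and the result then feeds into the 17% component of the weighted total.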

Lecture schedule: Mon 8.30-9.25, Tue 9.30-10.25, Thu 10.35-11.30
Office hours: TBA
Project Information
  1. PostgreSQL project topic suggestions (under construction; more ideas will be added)
  2. Hyracks project topic suggestions
  3. Project groups: TBA. Information about projects: TBA.

PostgreSQL resources: instructions on creating patch files for submitting your project
Hyracks resources: Hyracks overview talk, steps to set up Hyracks, Hyracks demo steps

Schedule of Lectures

Lecture | Date | Topic | Notes
1 | July 21 | Introduction/Overview + Chapter 10: Storage and File Structure | Assignment: Download and compile PostgreSQL, and open it in Eclipse, using the instructions provided here
2 | July 22, 24, 28 | Chapter 10: Storage and File Structure (Cont.) | Physical Storage in PostgreSQL: look inside this link for information on file layout, database page layout, and the free space map (with many more details in src/backend/storage/freespace/README); also see how PostgreSQL stores oversized attributes using the TOAST technique
3 | July 29, 31 | Chapter 11: Indexing | BigTable: CS632 talk slides (extra reading: BigTable paper, Jeff Dean video)
4 | Aug 4 | Chapter 11: Indexing (Cont.) | Index access method interface in PostgreSQL (read only if you want to create new index types)
5 | Aug 5, 7 | Chapter 12: Query Processing |
6 | Aug 11 | Chapter 12: Query Processing (Cont.) |
8 | TBD | Extra class: Overview of PostgreSQL internals, part 1 | PostgreSQL resources; in particular, see Tom Lane's talk
9 | Aug 11 | Chapter 13: Query Optimization | Statistics in PostgreSQL, and examples of row estimation in PostgreSQL; in particular, see how PostgreSQL special-cases most common values (MCVs)
10 | TBD | Chapter 13: Query Optimization (Cont.) | Assignments on query plans in PostgreSQL
11 | TBD | Extra class: Overview of PostgreSQL internals, part 2; Debugging in Eclipse |
12 | TBD | Chapter 14: Transactions |
13 | TBD | Chapter 15: Concurrency Control |
14 | TBD | Chapter 15: Concurrency Control (Cont.) |
   | TBD | No class (midsem) |
   | TBD | Midsemester exam |
15 | TBD | Chapter 15: Concurrency Control (Cont.): Snapshot Isolation |
16 | TBD | Chapter 16: Recovery |
17 | TBD | Chapter 16: Recovery (Cont.) |
18 | TBD | Chapter 16: Recovery (Cont.): ARIES |
19 | TBD | Chapter 17: Database Architecture | Case study: Intel white paper on the SAP HANA system architecture
20 | TBD | Chapter 18: Parallel Databases |
21 | TBD | Chapter 18: Parallel Databases + Map-Reduce | Map-Reduce assignment
22 | TBD | Chapter 19: Distributed Databases |
23 | TBD | Chapter 19: Distributed Databases (Cont.) |
24 | TBD | Chapter 19: Distributed Databases (Cont.) | Talk on distributed data storage (updated 3 Nov 2014; slides 1-84 and 132-140 only). If you are interested, you can read the BigTable paper; you can also view the BigTable talk at video.google.com. NoSQL Databases (ppt)
25 | TBD | Chapter 5: Sections on OLAP and Warehousing |
26 | TBD | Chapter 25: Spatial and Temporal Data and Mobility | Read just the Chapter 25 slides on spatial indexing; read the R-Tree paper by Guttman before class; talk on spatial indexing
27 | TBD | Chapter 24: Advanced Application Development + Overview of CS 632 | Read the Chapter 24 slides on performance tuning and performance benchmarks