ITDBMS Project

Requirements

4 students/project (except in very special cases)
Make your own arrangements with respect to machines for developing/testing code.
Projects to be finalized before midsem exams. Would suggest doing it much earlier and getting started on the project ASAP.
Will have a project checkpoint about a month after midsem exams.

Note: papers from ACM or IEEE sites can be accessed through the IITB library web site (library.iitb.ac.in)

PostgreSQL Projects

Note: If you choose to do one of these projects, first set up a copy of PostgreSQL source on your computer, compile it and get it running. Then start modifying it to implement your chosen project.

Case insensitive indexing/join in PGSQL (**)
Things to be done: modify indexing code (search and insert), join code (at least one of the join techniques, say indexed NL join, perhaps more than one of them). Have a system-level flag which indicates whether query processing should be case insensitive (this flag should be set before setting up a postgreSQL system). Bonus points: modify parser to allow case-insensitivity to be declared as part of the index specification, and as part of equality specification.
New index types using GiST
Reference for GiST: Joseph M. Hellerstein, Jeffrey F. Naughton, Avi Pfeffer: Generalized Search Trees for Database Systems. VLDB 1995
see also: GiST for PostgreSQL information at http://www.sai.msu.su/~megera/postgres/gist/
Multi-relation/column keyword indices (**) Read up about keyword indices (inverted indices) from Soumen Chakrabarti's excellent book "Mining the Web", and implement them. You can use GiST for this if you want. There are some implementations of text search built on GiST already available (see http://www.sai.msu.su/~megera/postgres/gist/). I don't think they work with multiple relations, make modified versions that will handle multiple relations in one index.
Performance monitor for PGSQL + add on to pgAdmin3 (**) (to be divided into two or more groups, each doing front/backend) E.g. what queries are currently running, how much resources each has used, facility to terminate a particular query. -- Lock queue/contention information Modules affected: query processing module pgadmin concurrency control module
Query decorrelation for PGSQL (***)
Queries with nested subqueries can be implemented using nested iteration. This is often very expensive, and many nested subqueries can be transformed to queries with joins/outerjoins. Read Galindo-Legaria/Joshi paper (see below) and implement their decorrelation technique in PostgreSQL. You will have to understand how PostgreSQL parses and represents queries and then add a stage just after parsing to perform decorrelation. Decorrelation can be done always, don't worry about whether it will provide a performance benefit.
Reference: Cesar A. Galindo-Legaria, Milind Joshi: Orthogonal Optimization of Subqueries and Aggregation. SIGMOD Conference 2001.
Progress estimator for SQL queries (***)
Have you ever waited for a long SQL query to complete, and had no idea how much more time it would take to complete (OK, perhaps you haven't but I'm sure you have waited for long classes to end, the principle is similar :-). Relax, help is at hand. There are a few recent papers on how to estimate how much more time a query execution will take. Implement (parts of) one of them on the PGSQL query evaluation engine. If you handle just fully pipelined queries, that should be sufficient.
References:
- Surajit Chaudhuri, Vivek R. Narasayya, Ravishankar Ramamurthy: Estimating Progress of Long Running SQL Queries. SIGMOD Conference 2004
- Gang Luo, Jeffrey F. Naughton, Curt Ellmann, Michael Watzke: Toward a Progress Indicator for Database Queries. SIGMOD Conference 2004: 791-802
Delta relation system for PGSQL (***) Keep logical log of relation updates with each transaction Output to a global logical log on transaction commit, with transaction counter -- optional: simple view maintenance based on above.

Generic Internals Projects

Cache conscious in-memory b-tree implementation (**)
Read paper by Rao and Ross (see below) and implement it either by modifying postgreSQL B+-trees, or on a stand alone B+-tree implementation of your own. Do a performance comparison with regular B-trees.
Reference:
Jun Rao, Kenneth A. Ross: Making B+-Trees Cache Conscious in Main Memory. SIGMOD Conference 2000

Delite Oriented Projects

Incorporating Database Security in DELite
Log-based Recovery System for DELite

Application Oriented Projects

To be added ...