Suggested projects for the DBMS class

Here are some suggestions, you are welcome to propose your own.
Each project team will comprise of two or three members.

Design and implement a rudimentary query optimizer for MySql.
Read the Pico-DBMS paper from VLDB 2000 on building databases for small handheld devices. At the very least, you should implement the storage manager and query manager for such a system assuming the memory sizes as specified in that article. You can start with Berkeley-db or My-SQL, if that helps.
Design and implement an index structure for supporting predicates on string columns of the form edit-distance("given-string") <= k. Use it to support approximate join on the string column of two relations where the join condition is that the edit distance between the two join attributes is <= k. Thispaper from VLDB 2001 proposes a method for doing this. You can implement either this or an improvement of this method. Extra credit for providing support for this in MySql.
Design and implement an index structure for supporting subset match queries on string columns. This paper from VLDB 2001 proposes a method for doing this. You can implement either this, or one of the papers referred to in the related work of this paper. Extra credit for providing support for this in MySql.
Study the transaction support in MySql. Design and implement support for finer granularity concurrency in MySql. You may assume there are no failures, therefore, you need not focus on recovery.
Design and implement Bitmapped indexing support in MySql.
Transaction support on LDAP. Two students, Anandi and Atuld (cse Mtechs) have developed advanced transaction support for updating LDAP storage repositories. Specifically, they have developed a middle layer that provides primitives using which entries in LDAP directories can be updated according to one of many advanced transaction model semantics. Your task is to understand these primitives and use them to build a set of advanced transaction models (in the process debug/stress what they have done).
In this project, you will "attach" a database to a Web Proxy server. The idea is to extend the proxy cache with a database so that previous query results can be stored in this database and if a query were to be repeated, that is, there is a "query hit", the stored results can be sent to the client without having to go to the web server and from there to the database backend. For some ideas, see VLDB2001 paper.
Design and implement support for better statistics collection in relational DBMS. You need to choose what kind of statistics you wish to maintain. Some examples are the number of distinct values of a column with predicates and amount of correlation between attributes. Reference
Design and implement a method for finding approximate quantiles in relational databases.
Design and implement a method for finding aggregates using a single pass in streaming data. Reference
Design a method for optimizing the storage of multidimensional data in R-trees by exploiting regions with high density. This project will involve using the Berkeley GIST package that already supports the basic R-trees implementation. This project will require enhancing the implementation with support for dense regions.
Application level recovery: So far what you have studied for recovery techniques work only on queries submitted to a DBMS. In practice these queries are generated from a user program written in C++ or Java. These programs also manipulate variables that are lost in case of a crash. What are the techniques available for recovering these programs?
Design and implement a multi-pass relevance feedback based system for retrieving objects by example. Extra credit for implementing in MySql.
Implement a rudimentary distributed query processing engine that will

OLD Projects

Indexing

Indexing for querying shapes of time series: Given a collection of time series, the project will develop methods for efficiently querying the shapes of the time series. Example queries will be of the form: give me all time series that dropped continuously for 4 consecutive years, or give me all that showed a sudden jump at any point in time. The project will involve first preprocessing the time series to collect these trend information and then indexing them appropriately to allow fast query answering thus.

Compare bitmapped indexing with R-trees for indexing multidimensional data: Bitmapped indexing is a recently popularized variant of B-tree indexing that is believed to work well on multidimensional warehouse data when the attributes have low cardinality. [More reading available]. R-tree is a classical method for indexing multidimensional data. This can be adapted to specifically handle multidimensional data by preprocessing data to identify dense and sparse regions of the data. This project will involve using the Berkeley GIST package for comparing these methods. The R-tree method is already implemented but bit-mapped indexing will need to be reimplemented.

Other ideas:

Similarity index for arbitrary subset of attributes
Indexing tagged data
Multi-key hashing on berkeley db

New database applications

Database extenders:

A smart calendar application on a database: this project will involve skilful use of triggers and a good database design to not only support a normal calendar application (See calendar supplied with Microsoft Outlook as an example), but also good analysis queries on the calendar e.g. plot the graph of business versus month of the year.
Build an mailing system on top of a database system instead of a file system. Can you do anything smart about sharing long emails sent to multiple recipients?

Database Engines:

Mobile database system: how is data consistency maintained in mobile database system. Consider a model where primary copy of data resides in large central servers and individual check out part of the data from that central server. After that they can do any changes they want on that data. How do you sync data when the mobile agent connects back. How do you efficiently determine the difference between multiple data types and do you resolve conflicts? Read about these issues and implement some part of the functionality using one of the existing systems.