Suggested projects for the DBMS class

Here are some suggestions; you are welcome to propose your own.
Each project team will consist of two or three members.

  1. Design and implement a rudimentary query optimizer for MySQL.
  2. Read the Pico-DBMS paper from VLDB 2000 on building databases for small handheld devices. At the very least, you should implement the storage manager and query manager for such a system, assuming the memory sizes specified in that article. You can start with Berkeley DB or MySQL, if that helps.
  3. Design and implement an index structure for supporting predicates on string columns of the form edit-distance(column, "given-string") <= k. Use it to support an approximate join on the string columns of two relations, where the join condition is that the edit distance between the two join attributes is <= k. This paper from VLDB 2001 proposes a method for doing this. You can implement either this method or an improvement on it. Extra credit for providing support for this in MySQL.
  4. Design and implement an index structure for supporting subset match queries on string columns. This paper from VLDB 2001 proposes a method for doing this. You can implement either this method or one from the papers cited in its related work section. Extra credit for providing support for this in MySQL.
  5. Study the transaction support in MySQL. Design and implement support for finer-granularity concurrency control in MySQL. You may assume there are no failures; therefore, you need not focus on recovery.
  6. Design and implement bitmapped indexing support in MySQL.
  7. Transaction support on LDAP. Two students, Anandi and Atuld (CSE M.Tech. students), have developed advanced transaction support for updating LDAP storage repositories. Specifically, they have developed a middle layer that provides primitives with which entries in LDAP directories can be updated according to one of several advanced transaction model semantics. Your task is to understand these primitives and use them to build a set of advanced transaction models, and in the process to debug and stress-test what they have done.
  8. In this project, you will "attach" a database to a Web proxy server. The idea is to extend the proxy cache with a database so that previous query results can be stored in it. If a query is repeated, that is, if there is a "query hit", the stored results can be sent to the client without going to the web server and from there to the database backend. For some ideas, see the VLDB 2001 paper.
  9. Design and implement support for better statistics collection in a relational DBMS. You need to choose what kind of statistics you wish to maintain. Some examples are the number of distinct values of a column under predicates and the degree of correlation between attributes. Reference
  10. Design and implement a method for finding approximate quantiles in relational databases.
  11. Design and implement a method for computing aggregates in a single pass over streaming data. Reference
  12. Design a method for optimizing the storage of multidimensional data in R-trees by exploiting regions with high density. This project will involve using the Berkeley GiST package, which already supports a basic R-tree implementation, and enhancing that implementation with support for dense regions.
  13. Application-level recovery: The recovery techniques you have studied so far apply only to queries submitted to a DBMS. In practice, these queries are generated by a user program written in C++ or Java. These programs also manipulate variables that are lost in case of a crash. What techniques are available for recovering these programs?
  14. Design and implement a multi-pass, relevance-feedback-based system for retrieving objects by example. Extra credit for implementing it in MySQL.
  15. Implement a rudimentary distributed query processing engine that will support simple join and select queries over a collection of homogeneous database sites. Extra credit for updates on replicated data.
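
To make the predicate in project 3 concrete, here is a minimal sketch of the check "edit distance <= k" together with a naive nested-loop approximate join. This is only a baseline to compare an index against; it is not the method from the VLDB 2001 paper. The function names `within_edit_distance` and `approximate_join` are made up for illustration, and edit distance is assumed to mean the standard Levenshtein distance (insertions, deletions, substitutions, each of cost 1).

```python
def within_edit_distance(a: str, b: str, k: int) -> bool:
    """Return True if the Levenshtein distance between a and b is <= k."""
    # The distance is at least the difference in lengths.
    if abs(len(a) - len(b)) > k:
        return False
    # prev[j] = distance between the first i-1 characters of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # match or substitute
        # Row minima never decrease, so we can stop early.
        if min(cur) > k:
            return False
        prev = cur
    return prev[-1] <= k


def approximate_join(left, right, k):
    """Naive nested-loop join: all pairs whose edit distance is <= k."""
    return [(x, y) for x in left for y in right
            if within_edit_distance(x, y, k)]


if __name__ == "__main__":
    print(approximate_join(["smith", "jones"], ["smyth", "james"], 1))
```

The nested loop compares every pair of strings, which is exactly the cost an index structure for this project should try to avoid.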

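As a starting point for project 11, the sketch below maintains count, mean, variance, minimum, and maximum in one pass over a stream, using constant memory. It uses Welford's online algorithm for the variance, which is one well-known technique and not necessarily the one in the reference above. The class name `RunningStats` is hypothetical, and the variance reported is the population variance.

```python
class RunningStats:
    """Single-pass aggregates over a stream of numbers (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0       # running sum of squared deviations from the mean
        self.minimum = None
        self.maximum = None

    def add(self, x: float) -> None:
        """Fold one stream element into the aggregates."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)
        self.minimum = x if self.minimum is None else min(self.minimum, x)
        self.maximum = x if self.maximum is None else max(self.maximum, x)

    @property
    def variance(self) -> float:
        """Population variance of the elements seen so far."""
        return self._m2 / self.count if self.count else 0.0


if __name__ == "__main__":
    stats = RunningStats()
    for value in [2, 4, 4, 4, 5, 5, 7, 9]:   # stands in for a data stream
        stats.add(value)
    print(stats.count, stats.mean, stats.variance, stats.minimum, stats.maximum)
```

Each element is seen once and then discarded, so the same structure works when the stream is too large to store.
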
OLD Projects

Indexing


New database applications

Database Engines: