Suggested projects for the DBMS class
Here are
some suggestions, you are welcome to propose your own.
Each project team will comprise of two or three
members.
- Design and implement a rudimentary query optimizer for MySql.
-
Read the
Pico-DBMS paper from VLDB 2000 on building databases for small handheld
devices.
At the very least, you should implement the storage manager and query manager for such
a system assuming the memory sizes as specified in that article.
You can start with Berkeley-db or My-SQL, if that helps.
- Design and implement an index structure for supporting predicates
on string columns of the form edit-distance("given-string") <= k. Use
it to support approximate join on the string column of two relations
where the join condition is that the edit distance between the two
join attributes is <= k. Thispaper from
VLDB 2001 proposes a method for doing this. You can implement either
this or an improvement of this method. Extra credit for providing
support for this in MySql.
- Design and implement an index structure for supporting subset
match queries on string columns. This
paper from VLDB 2001 proposes a method for doing this. You can
implement either this, or one of the papers referred to in the related
work of this paper. Extra credit for providing support for this in
MySql.
- Study the transaction support in MySql. Design and implement
support for finer granularity concurrency in MySql. You may assume
there are no failures, therefore, you need not focus on recovery.
-
Design and implement Bitmapped indexing support in MySql.
- Transaction support on LDAP. Two students, Anandi and Atuld
(cse Mtechs) have developed advanced transaction support for updating
LDAP storage repositories. Specifically, they have developed a middle
layer that provides primitives using which entries in LDAP directories
can be updated according to one of many advanced transaction model
semantics. Your task is to understand these primitives and use them
to build a set of advanced transaction models (in the process debug/stress what they have done).
- In
this project, you will "attach" a database to a Web Proxy server. The
idea is to extend the proxy cache with a database so that previous
query results can be stored in this database and if a query were to
be repeated, that is, there is a "query hit", the stored results can be sent to the client
without having to go to the web server and from there to the database backend.
For some ideas, see VLDB2001
paper.
- Design and implement support for better statistics collection in
relational DBMS. You need to choose what kind of statistics you wish
to maintain. Some examples are the number of distinct values of a
column with predicates and amount of correlation between attributes. Reference
- Design and implement a method for finding approximate quantiles
in relational databases.
- Design and implement a method for finding aggregates using a
single pass in streaming data. Reference
- Design a method for optimizing the storage of multidimensional
data in R-trees by exploiting regions with high density. This project
will involve using the Berkeley GIST package that already supports the
basic R-trees implementation. This project will require enhancing the
implementation with support for dense regions.
-
Application level recovery: So far what you have studied for recovery
techniques work only on queries submitted to a DBMS. In practice
these queries are generated from a user program written in C++ or
Java. These programs also manipulate variables that are lost in
case of a crash. What are the techniques available for recovering
these programs?
-
Design and implement a multi-pass relevance feedback based system for
retrieving objects by example. Extra credit for implementing in MySql.
-
Implement a rudimentary distributed query processing engine that will
support simple join and select queries over a collection of
homogeneous database sites. Extra credit for updates on replicated
data.
OLD Projects
Indexing
-
Indexing for querying shapes of time series: Given
a collection of time series, the project will develop methods for efficiently
querying the shapes of the time series. Example queries will be of the
form: give me all time series that dropped continuously for 4 consecutive
years, or give me all that showed a sudden jump at any point in time.
The project will involve first preprocessing the time series to collect
these trend information and then indexing them appropriately to allow fast
query answering thus.
-
Compare bitmapped indexing with R-trees for indexing multidimensional
data: Bitmapped indexing is a recently popularized variant of B-tree
indexing that is believed to work well on multidimensional warehouse data
when the attributes have low cardinality. [More reading available]. R-tree
is a classical method for indexing multidimensional data. This can be adapted
to specifically handle multidimensional data by preprocessing data to identify
dense and sparse regions of the data. This project will involve using
the Berkeley GIST package for comparing these methods. The R-tree
method is already implemented but bit-mapped indexing will need to be reimplemented.
Other ideas:
-
Similarity index for arbitrary subset of attributes
-
Indexing tagged data
-
Multi-key hashing on berkeley db
New database applications
Database extenders: The basic relational model often
falls short when managing richer datatypes like images and text.
This had led to the development of extenders that contain a collection
of predefined functions and types to handle the needs of these specific
data types. This project will require first understanding the architecture
of typical database extenders and implementing one of your own --- which
one will can be decided later. Some possibilities include: a multidimensional
data extender, a mining extender, an extender for handling complex objects.
-
A smart calendar application on a database: this project will involve skilful
use of triggers and a good database design to not only support a normal
calendar application (See calendar supplied with Microsoft Outlook as an
example), but also good analysis queries on the calendar e.g. plot the
graph of business versus month of the year.
-
-
Build an mailing system on top of a database system instead of a file system.
Can you do anything smart about sharing long emails sent to multiple recipients?
Database Engines:
-
Mobile database system: how is data consistency maintained in mobile
database system. Consider a model where primary copy of data resides
in large central servers and individual check out part of the data from
that central server. After that they can do any changes they want on that
data. How do you sync data when the mobile agent connects back. How
do you efficiently determine the difference between multiple data types
and do you resolve conflicts? Read about these issues and implement
some part of the functionality using one of the existing systems.