CS-444 Programming Assignment
Last Date for submission is Feb 20, 2003
For your programming assignments, you will be using toyDB. It provides an API for paged file system and creating indexes.
For questions on toyDB contact Satyashil
The toyDB offers two lowest layers of a small DBMS.
The Paged File (PF) layer - offering storage for pages of a file
The Access Method (AM) layer - offering B+ tree based access
The interface offered by the two layers is described in the documentation (files pf.ps and am.ps). The C code implementing these layers is also available.
The data.tar.gz contains databases which can be used for the assigments. It contains simple text files containing data about students. The data is dummy data created for the purpose of these assignments. Students can work on these data files. For example, the courses file contain all the courses being offered in all the departments with various fields such as course-no, course name and whether it's a PG course or UG course. The various fields are seperated by semicolon.
These project exercises require you to understand the interfaces and the C code, so that you can either build on it further or make changes.
1. Provide page buffering for the PF layer using two page replacement strategic to be chosen for a file at the time of opening it : LRU and MRU (most recently used - which may be profitably used for sequential file access). You also need to provide 'dirty' flags for updated pages (and add a call to indicate that a page has been modified). Define buffer pool size as a parameter. The program should maintain statistics on pages accessed (for read, write), logical and physical number of I/O's, etc.
(00005023)
2.To the paged file module, add disk storage simulator, which simulates pages for a disk (n cylinders, m tracks/clyn), and also a specific disk technology (like RAID 0, 1 or 5). You may do mapping of page numbers to disk address. The simulation should provide for a suitable access arm movement strategy, a cache on the disk, and gathering of statistics that will allow us to evaluate performance.
(00D05002)
3.Use PF layer to build a slotted page structure for storing a varying length record file. Use it to store students data. Provide for insert, delete of records, and for sequential reading. Collect some performance data like space utilization (after some number of updates).
(00005014)
4. The AM layer can be used to build index for a file. We may have a file already containing the records and then we build index in a single call. Or, we may start with empty file, and build index using multiple inserts. Compare the two approaches for an index on the Student file on roll-no field. If the given file is sorted, there are efficient techniques available for building an index. Implement one such technique, and compare performance with the AM layer. Get the necessary statistics for comparison.
(00D05010)
(00005019 - independently)
5. Using AM layer, create a secondary index (where records are not in the key order) for the student file on roll-no. Create another file (say, study) as a sequential file with records ordered on rollno. Write nested-loop join algorithm for joining these files and get performance statistics.
(00005030)
6. Use the two layers to implement external sort algorithms (merge sort, and sort by B+ tree scan), and compare performances for the sample data.
(00005011)
7.Implement extended hashing access method on top of the PF layer. Provide proper interface for your access method, and then try comparative performance of this and the present AL layer in terms of storage and access costs for a given file.
(00005013)
8. Build an index on multiple attributes using AM layer. GRID Files can be used for this purpose. Index can be created on an existing file containing records or can be created as the records get added. To simplify things use only two search-key attributes. Compare this approach with other index structures on the data files given. Get the necessary statistics for comparison. (00005031)
9. Using the two layers, develop a clustering file organisation where records of multiple relations having one common attribute are stored in one file. Write a program which does join of the two relations on the common attribute. Get physical performance parameters. (00D05009)