Take-home/Reading Assignments

Important Note : You must do these assignments yourself. Use of any unfair means will be dealt with as per Institute rules.

-------------------------------------------------------

1. (a) : Study new disk technologies that are being used for very large databases. These are Storage Area Networks (SAN) and Network Attached Storage servers (NAS). Write briefly about the standards (such as iSCSI) being used for them. Give typical LAN/WAN configurations that use these technologies. You may look at the specific characteristics, architectures, and reliability and availability features of some commercial offerings (Sun/Compaq/IBM). Prepare a short report (max 2 pages).
1. (b) : Modern disks often have their own main memory cache, typically above 1MB, and use it for pre-fetching of pages. The rationale is the empirical observation that when a disk page is requested, 80% of the time the next page is requested as well.
i) Give a reason why a DBMS may not want to rely on pre-fetching controlled by the disk.
ii) Explain the impact on the disk cache of several queries running concurrently, each scanning a different file.
iii) Can the above problem be addressed by the DBMS buffer manager doing its own pre-fetching ? Explain.
iv) Some disks support segmented caches, with about 4 to 6 segments, each of which can be used to cache pages from a different file. Does this technique help ? Given this technique, does it matter whether the DBMS buffer manager also does pre-fetching ?
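The setting of part ii) can be explored with a small simulation. The sketch below is only an illustration under stated assumptions: the disk cache is modelled as a single LRU pool of pages, every miss triggers a fixed read-ahead within the same file, and the concurrent scans interleave round-robin. The function name `simulate` and all of its parameters are hypothetical and do not describe any real disk firmware or DBMS.

```python
from collections import OrderedDict


def simulate(num_scans, pages_per_scan, cache_pages, readahead):
    """Return the fraction of page requests served from the disk cache.

    Assumptions (illustrative only): one shared LRU cache, a fixed
    read-ahead of `readahead` pages on every miss, and scans that take
    turns requesting their next sequential page.
    """
    cache = OrderedDict()  # page id -> None, least recently used first
    hits = 0
    total = 0

    def load(page):
        # Bring a page into the cache, evicting LRU pages if it is full.
        cache[page] = None
        cache.move_to_end(page)
        while len(cache) > cache_pages:
            cache.popitem(last=False)

    for offset in range(pages_per_scan):
        for scan in range(num_scans):
            page = (scan, offset)  # (file, page number within file)
            total += 1
            if page in cache:
                hits += 1
                cache.move_to_end(page)
            else:
                # Miss: read the page plus the next `readahead` pages
                # of the same file, as a pre-fetching disk would.
                for k in range(readahead + 1):
                    if offset + k < pages_per_scan:
                        load((scan, offset + k))
    return hits / total


if __name__ == "__main__":
    print("one scan  :", simulate(1, 64, 16, 7))
    print("four scans:", simulate(4, 64, 16, 7))
```

Under these assumptions, `simulate(1, 64, 16, 7)` models a single scan whose read-ahead is fully used, while `simulate(4, 64, 16, 7)` models four scans whose pre-fetched pages evict one another before they are read. Varying `cache_pages` and `readahead` gives a feel for when the shared cache stops helping.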

1. (c) : i) In dynamic hashing, we take the 'i' high-order bits of the hash value for indexing into the bucket address table. Can we instead use the 'i' low-order bits ? What changes would be required ? Are there any advantages ?
ii) Assume that we need to build a B+ tree index where the given keys are already in ascending order. How will you modify the insertion method so that index creation is efficient and space utilization is also better ?
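To make the two settings in part (c) concrete, here is a hedged sketch. The first two functions only show how a bucket address table slot would be computed from the high-order or low-order 'i' bits of a hash value. The third shows one common bottom-up way of packing already-sorted keys into B+ tree levels, which is not necessarily the modification the question expects. All names (`dir_index_high`, `dir_index_low`, `bulk_load`, `width`, `fanout`) are hypothetical, and the tree is simplified to nested lists with no sibling pointers and no minimum-occupancy fix for the last node of a level.

```python
def dir_index_high(h, i, width=32):
    """Slot chosen by the i high-order bits of a width-bit hash value."""
    return h >> (width - i)


def dir_index_low(h, i):
    """Slot chosen by the i low-order bits of the hash value."""
    return h & ((1 << i) - 1)


def bulk_load(sorted_keys, fanout):
    """Pack ascending keys into B+ tree levels, leaves first.

    Each leaf holds up to `fanout` keys and each internal node has up to
    `fanout` children. An internal node with c children stores c - 1
    separators: the smallest key under each child except the first.
    """
    leaves = [sorted_keys[k:k + fanout]
              for k in range(0, len(sorted_keys), fanout)]
    levels = [leaves]
    mins = [leaf[0] for leaf in leaves]  # smallest key under each node
    while len(mins) > 1:
        groups = [mins[k:k + fanout] for k in range(0, len(mins), fanout)]
        levels.append([g[1:] for g in groups])
        mins = [g[0] for g in groups]
    return levels


if __name__ == "__main__":
    h = 0b1011  # a 4-bit hash value, for illustration
    for i in (2, 3):
        print(i, dir_index_high(h, i, width=4), dir_index_low(h, i))
    for level in bulk_load(list(range(1, 11)), fanout=4):
        print(level)
```

Comparing the slot numbers printed for i = 2 and i = 3 shows how an entry moves when the table doubles under each scheme. The level lists show that every leaf except possibly the last is filled completely, in contrast to what repeated one-at-a-time insertion of ascending keys typically produces.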
Submit by Feb. 11, 2003 as ps or doc files by email with subject as Assgm 1

-------------------------------------------------------

Reading Assignments

Submit reports by April 3, 2003
Presentations from April 10 - 13, 2003
The following projects/assignments are to be done by individuals or groups of two as indicated. Each assignment includes reading of some papers and preparing a 20-30 minute presentation for the class. You are also encouraged to implement key features and try them out. Please discuss with me the programming part that you plan. A few references are mentioned with each topic, but you may read additional material as required.
1. (a) Database archiving : An efficient and flexible method for archiving a database, Mohan C., SIGMOD Conf. 1993.
(b) Creating indexes : Algorithm for creating indexes for very large tables without quiescing updates, Mohan C., SIGMOD 1992.
(roll no 00005019)
2. Transaction management in distributed databases : distributed deadlock detection and 2-phase commit. See Ceri/Pelagatti's book on Distributed DBs.
(roll no 00005014)
3. Volcano query optimizer (student reports under Prof. Sudarshan and the tool in Info lab).
(roll no 00005011)
4. TPC benchmarks (C, D and H). See text, p. 798, and www.tpc.org.
(roll no 00d05002)
5. Query processing, optimization and transaction management in Microsoft SQL Server (text Ch. 27, sigmod 2001 paper by Galindo-Legaria+Joshi, VLDB 98 paper by Graefe)
(roll no 00005030)
6. Distributed query optimization (Ceri+Pelagatti book on DDBs, text, p. 735)
(roll no 00d05009)
7. Parallel Database systems : architecture and query optimization. See Chapter 20 in text and references therein
(roll no 00005031)
8. R and R* trees and their use in spatio-temporal databases. See text, p. 875 and : SIGMOD Conf, 1984, p. 47-57, SIGMOD 1990, p. 322-331, SIGMOD 1990, p. 220-231, SIGMOD 1993, p. 237-246, vldb 1997, p. 396-405.
(roll no 00005013)
9. Indexing in data warehouses, especially use of bit-map indexes. See papers in ICDE'97, ICDE'98, sigmod'98 available from www.cs.toronto.edu/~mendel/dwbib.html
(roll no 00005023)
10. Indexing and querying XML data. See vldb'01 p. 361-370, vldb'02 p. 263-274, ICDE'02 p. 141-152, ICDE'03 p. 253-264.
(roll no 00d05010)
11. Ranked (top-K) queries : vldb'99, p.399 and p. 411, ICDE'02 paper by Bruno et al, Sigmod June 2002 paper by Chang et al.
------------------------------------------------------------------------

Reading Assignments - additional topics

5. Indexing interval-based data : proc. DEXA-98 p. 541, material on interval trees in Cormen's book, EDBT-98 p. 39, and TODS paper by Tsotras et al.; also, Ch. 17, 18 from the book : Temporal Databases, edited by Tansel et al.
7. Oracle tuning parameters and query execution plans
9. Evaluating top-K selection queries : see vldb'99 proceedings, p. 399
10. Star join algorithms (using bit-map indexing - see sigmod record, Sept. 95, pp. 8-11, and sigmod conf, 1997, pp. 38-49, VLDB 99, p. 530)
11. The XML storage and querying features in Oracle 9i.
12. Spatial databases, DB2 Spatial Extender or other case studies (see http://www2.software.ibm.com/casestudies/...)
13. Directory Systems, LDAP (Sudarshan's book, Ch. 19) + Oracle9i
15. Main memory databases (text Ch. 24, TKDE paper by Garcia-Molina+Salem and Dali system)
16. Nested and long duration transactions, text Ch. 24 + cited references
17. Workflow systems, text Ch. 24 + cited references + Oracle9i