Solutions for assignment-2 --------------------------- 8 marks ******* 1) a) 2 Marks Yes. It can be done using a B+ tree on name attribute. - Start traversing B+ tree with search key = prefix - The leaf node you end with will be the node with name lexicographically lesser than prefix string or name containing prefix. - Start scanning sequentially all leaf nodes from this node till we get a node with name value lexicographically greater than prefix string. 1) b) 1 Mark No. Names containing a given substring may be scattered at any location in the B+-tree, so there is no benefit of using the tree over a full relation scan. 1) c) 5 Marks B+ tree structure can be modified to make it capable of finding all names containing a given substring. Concept: Substring can be prefix of a suffix Construction: For each name in the set - Find all suffix strings of the given string - Index them independently in the B+ tree. - So they will occupy independent leaf nodes. - But the record pointer of all these leaves will point to same record as that of original string. Search: ------- - Given a string we can locate all strings which have this string as its prefix using the part a). - Pointers in the buckets associated with these leaf nodes will give us all record ids which have name containing this string as a substring. --------------------------------------------------------------------------------------- 6 marks ******** 2) a) 2 Marks Utilization will be approximately 3/5 = 60% 15,24 | 6,9,12 18,21 27,30 | | | 1,2,3,4,5 ... 12,13,14 15,16,17 ... 21,22,23 24,25,26 ...30,31,32 2) b) 4 Marks We have 'n' data values to be inserted already sorted in ascending order. To make maximum utilization of node space: Build leaf level: If each leaf node can contain maximum of 'm' data entries then divide the input in slices of size 'm' Fill up nodes with their maximum capacity Total number of leaf nodes needed will be ceiling(n/m) Build parent level from nodes at lower level 1. Propagate first entries of all leaf nodes starting from second node to second level. 2. Group these entries into nodes of size 'm+1' 3. Delete and send 'm+1'st entry from each node to one level up. 4. Repeat steps 3 & 4 till we get only one node at the current level as a root of B+ tree (Database implementations often leave some free space in each node to better handle future insertions) --------------------------------------------------------------------------------------- 2 Marks ******* 3) - To calculate existence bitmap it is sufficient to have all bitmaps associated with any one column in the relation. - For attributes which can have null value, there will be a separate null bitmap associated with that column. - Existence bitmap will just be OR of all bitmaps associated with a given column including null bitmap. ---------------------------------------------------------------------------------- 4 Marks ******* 4) Alterative (1) -------------- Here we can allocate 'C' contiguous blocks at a time. So we can use same hash function 'h' as that of, 'static hashing without any constraint on contiguous memory allocation. For that we will have to keep an additional B+ tree to keep track of hash function value and corresponding actual block address. If for a key 'k' we get hash value 'h(k)', then 'h(k)' will act as key in the B+ tree. The address of block corresponding to actual key 'k' will be stored there. Alternatively, we can use a file in the file system to store blocks; file systems also use tree structures to record and look up blocks in a file. Alternative (2) --------------- As we can allocate 'C' contiguous blocks at a time. So if the hash table requires 'k*C' number of blocks then we can split the value of hash function into 2 parts. First part of 'log k' bits will decide the set of 'C' blocks to which the record belongs and the second half of 'log C' bits will decide to which of block from that set the record belongs. We will have to keep mapping of set number and the starting physical address of that set of blocks. Alternative (2a) --------------- We can apply two different hash functions h1 & h2. Given a key first hash function will decide the set to which it belongs: h1(k) second hash function will decide the block to which it belongs: h2(h1(k))