Compressed data structures for annotated Web search

Supplementary material

Code from the project will be made available on request. For more details see the main project page.

  1. Section 6.1.1 - Equi vs. Freq
  2. Section 6.1.4 - Feature re-numbering
  3. Section 6.1.5 - Real time runs
  4. Section 6.1.6 - Sample documents
  1. Section 6.1.1 - Equi vs. Freq diagnosis

    Equi consistently does better than Freq. Below is a list of sample leaves. For each leaf, the figures plot the index of each feature where a sync has been placed against the probability of that feature.

    Leaf ID : 67576 - Equi allocation did better
    [Figures: Equi inner allocation for Leaf 67576; Freq inner allocation for Leaf 67576]


    Leaf ID : 752280 - Equi allocation did better
    [Figures: Equi inner allocation for Leaf 752280; Freq inner allocation for Leaf 752280]


    Leaf ID : 1146201 - Equi allocation did better
    [Figures: Equi inner allocation for Leaf 1146201; Freq inner allocation for Leaf 1146201]


    Leaf ID : 374897 - Freq allocation did better
    [Figures: Equi inner allocation for Leaf 374897; Freq inner allocation for Leaf 374897]


    Leaf ID : 593508 - Freq allocation did better
    [Figures: Equi inner allocation for Leaf 593508; Freq inner allocation for Leaf 593508]
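
    The comparison in this section can be illustrated with a minimal sketch. It assumes, purely for illustration, that Equi places syncs at evenly spaced feature indices, that Freq places them so that each segment carries roughly equal probability mass, and that the cost of reaching a feature is its distance from the nearest preceding sync. The function names and the cost model are hypothetical and are not taken from the project code.

```python
from bisect import bisect_right


def equi_syncs(n_features, n_syncs):
    """Evenly spaced sync positions, always including index 0."""
    return sorted({(k * n_features) // n_syncs for k in range(n_syncs)})


def freq_syncs(probs, n_syncs):
    """Assumed Freq rule: start a new segment each time the probability
    mass seen so far reaches the next 1/n_syncs share of the total."""
    total = sum(probs)
    syncs, running = [0], 0.0
    for i, p in enumerate(probs):
        if i > 0 and len(syncs) < n_syncs and running >= total * len(syncs) / n_syncs:
            syncs.append(i)
        running += p
    return syncs


def expected_scan_cost(probs, syncs):
    """Assumed cost model: probability-weighted distance from each
    feature to the nearest sync at or before it (syncs must contain 0)."""
    cost = 0.0
    for i, p in enumerate(probs):
        start = syncs[bisect_right(syncs, i) - 1]
        cost += p * (i - start)
    return cost


# Illustrative probabilities only, not measured data.
probs = [0.5, 0.125, 0.125, 0.125, 0.0625, 0.0625]
equi = equi_syncs(len(probs), 2)
freq = freq_syncs(probs, 2)
print(equi, expected_scan_cost(probs, equi))
print(freq, expected_scan_cost(probs, freq))
```

    Under these assumptions, a leaf with one dominant feature at the front can favour the evenly spaced placement, because the frequency-based rule spends its second sync right after the dominant feature and leaves a long tail to scan.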
  2. Section 6.1.4 - Feature re-numbering diagnosis

    The global FeatHits ordering (labeled Freq in the images below) is in excellent agreement with the feature probabilities in some of the costliest leaves. The plots below cover a few of these leaves. Each one plots the index of a feature against its probability of occurrence in the workload.

    Leaf ID : 16312 - Feature probabilities before and after permutation
    [Figures: feature probabilities for Leaf 16312 under the default ordering (before permutation); under the FeatHit ordering (after permutation)]


    Leaf ID : 16561 - Feature probabilities before and after permutation
    [Figures: feature probabilities for Leaf 16561 under the default ordering (before permutation); under the FeatHit ordering (after permutation)]


    Leaf ID : 16873 - Feature probabilities before and after permutation
    [Figures: feature probabilities for Leaf 16873 under the default ordering (before permutation); under the FeatHit ordering (after permutation)]
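
    As a rough sketch of the re-numbering idea, the code below assumes that FeatHits is a global hit count per feature and that re-numbering assigns the smallest new indices to the most frequently hit features. The names `feathit_permutation` and `renumber` are hypothetical, and the numbers in the usage lines are illustrative only.

```python
def feathit_permutation(feat_hits):
    """Return a dict mapping old feature index -> new index, with the
    most-hit features first. Ties keep their original relative order
    because sorted() is stable."""
    order = sorted(range(len(feat_hits)), key=lambda f: -feat_hits[f])
    return {old: new for new, old in enumerate(order)}


def renumber(leaf_probs, perm):
    """Rearrange one leaf's per-feature probabilities into the new numbering."""
    out = [0.0] * len(leaf_probs)
    for old, p in enumerate(leaf_probs):
        out[perm[old]] = p
    return out


# Illustrative values only, not measured data.
perm = feathit_permutation([3, 10, 0, 7])
print(perm)
print(renumber([0.1, 0.6, 0.0, 0.3], perm))
```

    If the global ordering agrees well with a leaf's own probabilities, the re-numbered list for that leaf comes out close to non-increasing, which is the shape the "after permutation" plots are meant to show.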


  3. Section 6.1.5 - Real-time speed vs. estimated cost

    The following graph shows the excellent agreement between our estimated cost and the actual time taken to annotate.

    [Figure: Estimated cost vs. real-time speed]
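
    One simple way to quantify this kind of agreement is the Pearson correlation between paired estimates and measurements. The sketch below is an illustration under that assumption; it is not necessarily how the graph was produced, and the sample values are placeholders rather than results from the project.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.
    Raises ZeroDivisionError if either sequence is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Placeholder pairs: (estimated cost, measured annotation time).
estimated = [1.0, 2.0, 3.0, 4.0]
measured = [2.0, 4.0, 6.0, 8.0]
print(pearson(estimated, measured))
```

    A value close to 1 indicates that the estimated cost tracks the measured time linearly. It does not say anything about the scale factor between the two.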

  4. Section 6.1.6 - Sample document list

    Here is the list of documents used for comparison with other systems. (1.6 MB)

