Online Test 2
-------------


(1) N-GRAM FREQUENCY ANALYSIS :
    ---------------------------

    Points : 15
    Approx. Time : 35 min

    Frequency analysis is the practice of counting the number of occurrences of
    different ciphertext characters in the hope that the information can be used
    to break ciphers.  An n-gram is a subsequence of n items from a given
    sequence.  The items in question can be phonemes, syllables, letters, words
    or base pairs depending on the application.
    
    For our current task, we limit n-grams to n consecutive case-insensitive
    alphabetical characters, so an n-gram never spans a space or punctuation
    mark.  For example, the input string 'My name is Siva' has

    1-grams : 'a', 'e', 'i', 'm', 'n', 's', 'v', 'y'
    2-grams : 'am', 'my', 'na', 'me', 'is', 'si', 'iv', 'va'
    3-grams : 'ame', 'iva', 'nam', 'siv'
    4-grams : 'name', 'siva'
    and no n-grams for n > 4.
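
    The definition above can be sketched as follows.  This is only an
    illustration, not a reference solution: the function name 'ngrams' is
    made up here, and "alphabetical" is assumed to mean the ASCII letters
    a-z after lower-casing.

```python
import re

def ngrams(text, n):
    """Return every n-gram (lower-cased) in text, in order of appearance."""
    if n <= 0:
        return []
    grams = []
    # Assumption: "alphabetical" means ASCII letters.  Each maximal run of
    # letters is sliced on its own, so no n-gram spans a space or punctuation.
    for run in re.findall(r"[a-z]+", text.lower()):
        for i in range(len(run) - n + 1):
            grams.append(run[i:i + n])
    return grams

print(sorted(set(ngrams("My name is Siva", 3))))
# prints ['ame', 'iva', 'nam', 'siv']
```

    A run shorter than n contributes nothing, which is why the example
    string has no n-grams for n > 4.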


    Running the program :
    ---------------------
    
    Write a Python script that reads text from stdin, takes an integer
    argument n (the '-n' option) and performs an n-gram frequency analysis on
    the input text.  By default, it prints the resulting table to stdout,
    sorted alphabetically by n-gram.  In addition, the program accepts an
    optional integer argument k (the '-k' option, k > 0), which tells it to
    print only the top k n-grams (or all of them, if fewer than k were found)
    in decreasing order of frequency.

    The program does NOT print anything if n <= 0 or if n is greater than the
    longest contiguous sequence of alphabetical characters in text.
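
    One possible way to build the table is sketched below.  The name
    'ngram_table' and its signature are hypothetical, letters are again
    assumed to be ASCII a-z, and the order of n-grams with equal counts in
    the top-k listing is not specified by the task, so this sketch leaves it
    to collections.Counter.

```python
import re
from collections import Counter

def ngram_table(text, n, k=None):
    """Return (ngram, count) rows in the order they would be printed."""
    if n <= 0:
        return []
    counts = Counter()
    # Assumption: only ASCII letters count as alphabetical characters.
    for run in re.findall(r"[a-z]+", text.lower()):
        for i in range(len(run) - n + 1):
            counts[run[i:i + n]] += 1
    if k is None:
        # Default mode: rows sorted by the n-gram itself.
        return sorted(counts.items())
    # Top-k mode: at most k rows, highest count first.
    return counts.most_common(k)

for gram, count in ngram_table("the then there", 2):
    print(gram, count)
# prints: en 1, er 1, he 3, re 1, th 3 (one row per line)
```

    If n exceeds the longest run of letters, the counter stays empty and the
    function returns no rows, so nothing is printed.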
    
    Sample Run :
    ------------

    $ cat data/tobermory | src/p1.py -n 0
    $ cat data/tobermory | src/p1.py -n 300
    $ cat data/tobermory | src/p1.py -n 2
      ab 29
      ac 30
      ad 79
      ...
      ze 2
      zi 2
      zo 2
    $ cat data/tobermory | src/p1.py -n 3 -k 15
      the 202
      and 80
      ing 76
      ...
      ble 36
      her 35
      ber 35


    
(2) SENTENCE GREP :
    -------------

    Points : 20
    Approx. Time : 40 min

    Suppose you want to search for all sentences in a book which contain a
    particular phrase.  
    
    A sentence, in our simplified model, is defined as a string of characters
    terminated by a '.', '?' or '!'.  A sentence also has all leading and
    trailing whitespace removed and doesn't contain any newline ('\n')
    characters.

    A few sample sentences (inside single quotes) :

    1. 'The Pythons had a definite idea about what they wanted to do with the series.'

    2. 'Several names for the show were considered before Monty Python's Flying Circus was settled upon.'

    3. '"How old are you," asked Jem, "four-and-a-half?'

    Note : In general, it's possible that sentences are embedded inside quotes
    or that they contain abbreviated words (like Mr., Prof.).  Ignore these
    cases for now, i.e. any of the above delimiters ends a sentence.  Period.
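
    A minimal sketch of this simplified sentence model follows.  The name
    'split_sentences' is made up for illustration.  Two choices here are
    assumptions rather than part of the task statement: each newline is
    replaced by a single space, and trailing text with no terminator is
    dropped.

```python
import re

def split_sentences(text):
    """Split text into sentences, each ending in '.', '?' or '!'."""
    # Assumption: a newline inside a sentence becomes a single space.
    flat = text.replace("\n", " ")
    # A sentence is a (possibly empty) run of non-delimiters followed by
    # exactly one delimiter.  Text after the last delimiter is ignored.
    pieces = re.findall(r"[^.?!]*[.?!]", flat)
    # Remove leading and trailing whitespace from every sentence.
    return [piece.strip() for piece in pieces]

print(split_sentences("Mr. Smith left.  Why\nso soon?"))
# prints ['Mr.', 'Smith left.', 'Why so soon?']
```

    As the example shows, an abbreviation such as 'Mr.' ends a sentence in
    this model, exactly as the note above requires.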
    

    Running the program :
    ---------------------
    
    Your program should accept a filename, a quoted search phrase and
    (optionally) a flag which, if set, makes the search case-insensitive.  The
    program prints out all sentences (one per line, in their order of
    appearance in the file) which contain the phrase.

    It does NOT print anything if the phrase NEVER appears in the file.

    The program accepts the '-h' option and produces (exactly) the following
    help message and exits.  Remember to add both the short (-f) and the long
    (--filename) options (likewise for the other options).

    $ src/p2.py -h
      Usage: p2.py [options]
      
      Options:
        -h, --help            show this help message and exit
        -f FILENAME, --filename=FILENAME
                              file containing a piece of text
        -i, --ignore_case     ignore case while searching
        -s SEARCH_PHRASE, --search_phrase=SEARCH_PHRASE
                              phrase to search for inside text
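
    The layout of this help message resembles what Python's standard
    'optparse' module prints, so the options could be declared roughly as
    below.  That the message was produced with optparse is an assumption,
    as are the function name 'build_parser' and the explicit prog="p2.py"
    (used here only so the usage line reads the same wherever the snippet
    runs).

```python
from optparse import OptionParser

def build_parser():
    """Declare the -f, -i and -s options; -h/--help is added automatically."""
    # Assumption: prog is fixed so the usage line reads "p2.py [options]".
    parser = OptionParser(prog="p2.py")
    parser.add_option("-f", "--filename", dest="filename",
                      help="file containing a piece of text")
    parser.add_option("-i", "--ignore_case", dest="ignore_case",
                      action="store_true", default=False,
                      help="ignore case while searching")
    parser.add_option("-s", "--search_phrase", dest="search_phrase",
                      help="phrase to search for inside text")
    return parser

# Parse an explicit argument list instead of sys.argv for demonstration.
options, args = build_parser().parse_args(
    ["-i", "-s", "rats", "-f", "data/tobermory"])
print(options.filename, options.ignore_case, options.search_phrase)
# prints: data/tobermory True rats
```

    The newer 'argparse' module formats its help differently (its usage
    line starts with a lower-case "usage:"), so it would not reproduce the
    message above exactly without extra work.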


    Sample Run :
    ------------

    $ src/p2.py -s "string not found" -f data/tokillamock
    $ src/p2.py -f data/monty_python -s "idea humor"
      If the majority found an idea humorous, it was included in the show.
    $ src/p2.py -s "rats" -f data/tobermory
      "  This time Clovis very distinctly said, "Beyond-rats!
    $ src/p2.py -i -s "rats" -f data/tobermory 
      No one said "Rats," though Clovis's lips moved in a monosyllabic contortion, which probably invoked those rodents of disbelief.
      "  This time Clovis very distinctly said, "Beyond-rats!



(3) DIRECTORY COMPARE :
    -----------------
    
    Points : 25
    Approx. Time : 50 min

    Suppose you have backed up the contents of your home directory on an
    external hard drive and you know that the backup process is error-prone.
    Your task is to verify whether the two directories (the original and the
    backed-up version) are identical and, if not, print out the differences
    between them.

    More generally, the contents of a directory in one tree may have more or
    fewer entries than the corresponding directory in the other tree.  If those
    differing elements are filenames, there is no corresponding file to compare
    with.  If they are directory names, there is no corresponding branch to
    descend through and we simply report the absence of that directory without
    bothering about the files contained in it.  The third possibility is that a
    file exists under the same name in both directories, but its contents
    differ.

    
    Running the program :
    ---------------------
    
    Your program should accept two directory names (say 'dir1' and 'dir2') and
    print out the differences between them (one difference per line) in the
    following format :

    (1) If a file or a directory in the tree rooted at 'dir1' is absent in
        'dir2', print '> ' followed by name of file/dir.

    (2) If a file or a directory in the tree rooted at 'dir2' is absent in
        'dir1', print '< ' followed by name of file/dir.

    (3) If a file in the tree rooted at 'dir1' has contents different from
        the corresponding file in 'dir2', print 'm ' followed by name of file.
        ('m' : mismatch)

    Note that if a directory is present in dir1 but absent in dir2 (or
    vice versa), then all files under this directory are necessarily absent
    from the other tree too.  However, we don't print out those files in our
    compact representation of the differences.
    
    The order of printing must be identical to the directory listing order of
    the UNION of the two dirs 'dir1' and 'dir2' (see sample run below).  The
    program does NOT print anything if the directories are identical.
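
    The rules above could be implemented along the following lines.  This
    is a sketch under stated assumptions, not a reference solution: the
    name 'compare_dirs' is hypothetical, "directory listing order" is taken
    to mean names sorted alphabetically at each level, and a name that is a
    file in one tree but a directory in the other (a case the task does not
    cover) is reported as a mismatch.

```python
import filecmp
import os

def compare_dirs(dir1, dir2, prefix=""):
    """Return the difference lines for two directory trees."""
    lines = []
    names1 = set(os.listdir(dir1))
    names2 = set(os.listdir(dir2))
    # Assumption: the listing order of the union is plain sorted order.
    for name in sorted(names1 | names2):
        rel = os.path.join(prefix, name)
        path1 = os.path.join(dir1, name)
        path2 = os.path.join(dir2, name)
        if name not in names2:
            lines.append("> " + rel)   # only in dir1; do not descend
        elif name not in names1:
            lines.append("< " + rel)   # only in dir2; do not descend
        elif os.path.isdir(path1) and os.path.isdir(path2):
            lines.extend(compare_dirs(path1, path2, rel))
        elif os.path.isdir(path1) or os.path.isdir(path2):
            # Assumption: file vs. directory counts as a mismatch.
            lines.append("m " + rel)
        elif not filecmp.cmp(path1, path2, shallow=False):
            lines.append("m " + rel)   # same name, different contents
    return lines
```

    Printing the result is then a matter of something like
    print("\n".join(compare_dirs(dir1, dir2))) when the list is non-empty.
    Passing shallow=False makes filecmp.cmp compare file contents rather
    than rely on size and modification time alone.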


    Sample Run :
    ------------
    
    $ src/p3.py data/cc data/cc_bkp
    $ src/p3.py data/india data/india_bkp
      > Bhilai/sector1
      > Bhilai/sector4/sector4.log
      < Bhilai/sector5
      > Bhilai/sector8
      < Bhilai/sector9
      m Bhopal/arera/arera.log
      > Bhopal/arera/arerakadsjf.log
      < Bhopal/badatalav
      < Bhopal/jahagirabad/jahagirabad.log
      < Mumbai/andheri/andheri_west
      > Mumbai/borivalli/borivalli_east
      m Mumbai/borivalli/borivalli_west/borivalli_west.log
      > Mumbai/malad/malad.log
      > Mumbai/mulund
