Assignments for IT619

Mini Project: Lex, YACC, Database access and others

Due: 10 Nov 2000 (30% credit)
Form teams of six.

Write a converter from bibtex to XML conforming to a standard bibliography DTD using lex and yacc. Generate a unique id for each bibtex entry using the following rule: key = first letter from the names of the first three authors followed by the last two digits of the year of publication. Keys must be distinct: if two bibtex entries turn out to have the same key, append a unique letter to the key.
Given a collection of ps and pdf files in a directory structure, use a ps2ascii converter (or any better method that you find) to convert them into ascii and extract the title, author and abstract fields from the papers. Generate an XML file with these three fields and a fourth field that hyperlinks to the original postscript file.
Given two XML publication lists, merge the two files: find entries that have the same author and title lists and generate a single merged list with the duplicates removed. For each duplicate pair, retain the union of the fields of the two entries.
Given any XML publication file, convert it into a .bib file.
Store the XML file in an Oracle database (get an account on oracle@quark from Mrs. Vijayalakshmi).
Provide a web-based interface for querying the publications and returning the result as an XML document. Every field in the XML should be queryable. Support keyword match for string fields and simple less-than, greater-than and equal-to matches for numeric fields like year. Also, one should be able to add entries to the database using the same form-based interface.
Use a suitable XSL file to display the XML publications file on the browser.
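
The unique-id rule from the first task in the list above can be sketched as follows. This is only an illustration in Python (the project itself asks for lex and yacc), and it rests on assumptions that the assignment leaves open: the helper names base_key and assign_keys are hypothetical, "first letter" is taken from each author name exactly as it is passed in, and a collision is resolved by appending the first unused lowercase letter.

```python
import string


def base_key(authors, year):
    """First letter of each of the first three author names, then the
    last two digits of the year (assumed reading of the rule)."""
    initials = "".join(name.strip()[0] for name in authors[:3] if name.strip())
    return f"{initials}{int(year) % 100:02d}"


def assign_keys(entries):
    """Return one distinct key per (authors, year) pair, in input order."""
    used = set()
    keys = []
    for authors, year in entries:
        key = base_key(authors, year)
        if key in used:
            # Collision: append the first lowercase letter not yet taken.
            for letter in string.ascii_lowercase:
                if key + letter not in used:
                    key += letter
                    break
            else:
                raise ValueError(f"ran out of letters for key {key}")
        used.add(key)
        keys.append(key)
    return keys


if __name__ == "__main__":
    sample = [
        (["Rao", "Iyer", "Das", "Nair"], 2000),
        (["Roy", "Ibrahim", "Dutta"], "2000"),
        (["Mehta"], 1998),
    ]
    print(assign_keys(sample))
```

A real converter would also have to decide how to split bibtex author strings ("Last, First" versus "First Last") before applying the rule; that choice is not fixed by the assignment.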

All submitted projects will be evaluated by the webteam and the TAs for use in the School's webpage project. The best project will get a bonus of seven points. Each member of the team has to clearly identify the part of the project that he/she has developed; accordingly, members of a team might end up with different grades.

A test case for this project is available at sunita@cygnus:~/archive. The root directory contains a collection of .bib files. Subdirectories under it contain several ps and pdf files. For many of these files there are entries in one or more of the .bib files. Your output will be a single XML file containing a distinct union of all the .bib entries and extracted entries from the ps/pdf files, along with pointers to the source files whenever available. The XML file will be loaded into an Oracle database and made available for querying through a form-based interface. I should then be able to generate, say, an XML file of all entries that have "Karmakar" as one of the authors and convert that file to .bib for inclusion in a latex document.
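
The "distinct union" described here, together with the merge task in the list above, can be sketched as follows. This is a minimal illustration in Python under stated assumptions: entries are represented as plain dicts after XML parsing, two entries count as duplicates when their author lists and titles match after lowercasing and whitespace normalisation, and when both entries carry the same field the first value seen is kept. The names identity and merge_lists are hypothetical, and the assignment does not say how conflicting field values should be resolved.

```python
def identity(entry):
    """Duplicate test (assumed): same author list and same title,
    ignoring case and repeated whitespace."""
    authors = tuple(" ".join(a.lower().split()) for a in entry.get("author", []))
    title = " ".join(entry.get("title", "").lower().split())
    return authors, title


def merge_lists(first, second):
    """Merge two publication lists, keeping the union of fields per entry."""
    merged = {}  # identity -> combined entry, in first-seen order
    for entry in list(first) + list(second):
        kept = merged.setdefault(identity(entry), {})
        for field, value in entry.items():
            # Keep the value already stored; only add fields that are missing.
            kept.setdefault(field, value)
    return list(merged.values())


if __name__ == "__main__":
    a = [{"author": ["A. Rao"], "title": "Fast  Sorting", "year": "1999"}]
    b = [
        {"author": ["a. rao"], "title": "fast sorting", "pages": "1-10"},
        {"author": ["B. Das"], "title": "Makefiles"},
    ]
    for entry in merge_lists(a, b):
        print(entry)
```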

Assignment 7: Profilers and version control

Due: 13 Oct 2000
Form teams of two. Create a cvs repository with two modules: one under your partner's login name and one under yours. Each of you writes an "inefficient" C program for sorting records with string attributes. Commit your changes to cvs and ask your partner to check out the changes. Your partner runs gprof on your program to find the main bottlenecks and removes them as much as possible, then checks in the changes. You check out the changed file, add comments around the changes made by your partner and check in your changes. There should thus be three versions of your program under cvs. Your sorting binary should be called "sorter" and invoked as follows:
sorter input-file numAttrs comma-separated-attribute-list
E.g. sorter example.txt 8 "8,3"
This sorts the file example.txt, whose records have eight string attributes, by the 8th attribute and then by the 3rd. The input file is assumed to be comma-separated. Make the old version of your code available as an executable "sorter.old".
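
To make the expected ordering concrete, here is a small reference sketch in Python (the assignment itself requires a C program). It assumes that attribute positions are 1-based, that comparison is plain lexicographic string comparison in ascending order, and that fields contain no embedded commas; the function name sort_records is hypothetical.

```python
def sort_records(text, num_attrs, attr_list):
    """Sort comma-separated records by the 1-based attribute positions
    given in attr_list, e.g. "8,3" means 8th attribute, then 3rd."""
    columns = [int(a) - 1 for a in attr_list.split(",")]
    rows = [line.split(",") for line in text.splitlines() if line.strip()]
    for row in rows:
        if len(row) != num_attrs:
            raise ValueError(f"expected {num_attrs} attributes, got {len(row)}")
    # Compare records on the selected attributes only, in the order given.
    rows.sort(key=lambda row: [row[c] for c in columns])
    return rows


if __name__ == "__main__":
    data = "b,x,2\na,y,1\nc,z,1\n"
    for row in sort_records(data, 3, "3,1"):
        print(",".join(row))
```

A sketch like this can serve as an oracle when checking that "sorter" and "sorter.old" produce the same output.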

Assignment 6: Makefiles

Due: 6 Oct 2000
You will be supplied a two-level directory structure where each leaf-level directory contains a collection of .cpp and .h files. A special "test" directory contains a collection of .txt and .out files. You need to generate a makefile in each of the leaf-level directories and a final makefile at the root level. Here is the source-tree in tar format. The makefile should have the following features:

It should work on both Windows NT with MS Visual C++ and Linux with the GNU compiler (g++) and GNU make. An environment variable OSTYPE is set to either WIN32 or LINUX to specify which. The makefile should support the following targets:
- all: make everything to generate an executable called galore
- clean: remove all object files
- test: if "lp_solve" or any of the "test/*.txt" files changes, run the following for each changed file (say sample.txt): "lp_solve < test/sample.txt > $TEMP/tmp.out", then compare the temp file to sample.out and echo the difference.
- opt: to remake everything with the optimization flag on.
- debug: to remake everything with the debug flag on.
It should automatically generate dependencies. For instance, if I change a .cpp file to include a new header file, then I should not need to change the makefile.
The root makefile should not contain names of any source files within a subdirectory. It either includes or calls make on the subdirectory makefiles.
All rules regarding what compiler to use etc. are defined in exactly one place, perhaps a separate file that other makefiles include. Also, any dependency is defined in exactly one place.
Makefiles in subdirectories should use pattern rules whenever possible.

Name your file "Makefile". You are free to add other files like Makefile.rules, Makefile.win etc., but there should always be a file called "Makefile" in every directory.

Assignment 5: Latex typesetting

Due: 29 Sept 2000
Generate a latex document that looks identical to this postscript file in all respects, including margins and font size. The figure can be different. The submission will consist of a latex file, a .eps file for the figure you included and a bibtex file with the two bibitems. Make sure that your code compiles on Cygnus; otherwise you get zero points.

Assignment 4

Due: 8 Sept 2000
Write a perl program that takes as arguments an HTTP URL, a string, a newsgroup name and an email address, and does the following:

Starts a daemon process to monitor the given URL for any updates. When the page at the URL contains the given string, or a newly added link from the page contains the string, it posts to the newsgroup the URL of the page that contains the string and exits, killing the daemon.
If the daemon gets killed for some other reason, it sends an email to the given email address and restarts the daemon.

For ease of grading, you absolutely must follow these additional requirements: The subject lines of the newsgroup message and the email message should contain your name. The input argument order should be exactly as specified above. Your main perl program should be called "restless.pl" and the daemon you start should be called "cantwait". For instance, you might want to run the following: perl restless.pl "http://www.it.iitb.ernet.in/~abhi" Matheran it.announce "me@it.iitb.ernet.in". This will watch Abhishek's page, or a newly appearing immediate link from it, for the word "Matheran".

Assignment 3

Due: 25 Aug 2000
In this assignment, grading will be based on both the correctness and the quality of the code. Short yet readable code is desirable. If your code is correct you will get 70% of the credit. The rest will depend on the quality of your code.

Write a perl script that, given a directory, recursively searches it for all files ending with .C or .H. All these files should be renamed to .cpp and .hpp respectively. Also, all "#include" commands in the .C or .H files have to be updated to reflect the new names. All "^M"s at the ends of lines in all files have to be removed.
Write a perl script that takes as input a text file redirected as STDIN. For each word in the file, it outputs the word if this is the first time the word occurs; otherwise it outputs the number of distinct words that appeared before the first occurrence of the word. For example, if the input is the string "to be or not to be", the output should be "to be or not 0 1".
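
The rule in the second script above can be checked against a short reference sketch. It is written in Python purely for illustration (the assignment requires perl), it assumes that words are separated by whitespace and compared case-sensitively, and the function name first_occurrence_codes is hypothetical.

```python
def first_occurrence_codes(text):
    """Echo each word on first sight; on a repeat, emit how many distinct
    words had appeared before that word's first occurrence."""
    first_seen = {}  # word -> count of distinct words seen before it
    out = []
    for word in text.split():
        if word in first_seen:
            out.append(str(first_seen[word]))
        else:
            first_seen[word] = len(first_seen)
            out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    print(first_occurrence_codes("to be or not to be"))  # prints: to be or not 0 1
```

The key observation is that the number to print for a repeated word is fixed at the moment the word is first seen, so a single hash lookup per word is enough.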

Assignment 2

Due: 18 Aug 2000

Use PHP to include a common navigation bar in all your pages. Those of you who used frames in the first assignment can replace the frames with tables and PHP includes.

Add a CGI script to your home page that contains at least two text input boxes, at least one radio button and at least one selection box. One of the text boxes should be a key. Store the entries that are submitted through the script in a ":" delimited text file. Write a shell program to do the following:

If the key does not match any of the previous keys, append a line to your text file to store the user inputs. The output file should be a plain html file with the list of fields and their values just saved.
Whenever a submit is made with a "key" that matches one of the keys stored in the text file above, the output file should be the same form with the previous values of the other fields loaded in.
If the key is the special user "Admin", then a new form should be displayed that has just one selection box whose elements are the fields in your first form. When the "Admin" selects a field, the output should be an html file that returns the number of distinct values entered so far for the field and the list of these distinct values along with a count of how many times each value occurred.
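
The "Admin" report in the last item above amounts to a distinct-value count over one column of the ":" delimited file. A minimal sketch in Python follows (the assignment requires a shell program); it assumes one record per line, no ":" inside field values, and a 0-based field index, and the name field_summary is hypothetical.

```python
from collections import Counter


def field_summary(lines, field_index):
    """Return (number of distinct values, sorted list of (value, count))
    for one field of a ':'-delimited record file."""
    values = [
        line.rstrip("\n").split(":")[field_index]
        for line in lines
        if line.strip()
    ]
    counts = Counter(values)
    return len(counts), sorted(counts.items())


if __name__ == "__main__":
    records = ["k1:red:small\n", "k2:blue:small\n", "k3:red:large\n"]
    distinct, table = field_summary(records, 1)
    print(distinct)
    for value, count in table:
        print(value, count)
```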

Assignment 1

Due: 11 Aug 2000
Prepare your home page using html. Your home page should have the following properties:

Use at least two fonts
Use at least one list environment
Use at least one table
Point to at least two other URLs; the school page should be one of them
Contain at least one image, optionally contain an audio file
Comprise more than one html page
Use cascading style sheets to share fonts, background and similar information across all the pages of your site
(Bonus points) Contain at least some text in a font other than Roman, say, some poetry in your mother tongue or Hindi or any other language. Including gifs/jpegs or any other image does not qualify for the extra points. A good starting point for more information on this is: here