Assignments for IT619
Mini Project: Lex, YACC, Database access and others
Due: 10 Nov 2000, (30% credit)
Form teams of six.
- Write a converter from bibtex to XML conforming to a standard
bibliography DTD using lex and yacc. Generate a unique id for each
bibtex entry using the following rule: key = first letter from the
names of the first three authors followed by the last two digits of
the year of publication. The key should be distinct: if two bibtex
entries turn out to have the same key, append a unique letter to the
- Given a collection of ps and pdf files in a directory
structure, use a ps2ascii converter (or any better method that you
find) to convert them into ascii and extract the title, author and
abstract fields from the papers. Generate an XML file with these three
fields and a fourth field that hyperlinks to the original postscript
- Given two XML publication lists, merge the two files to find entries
that have the same author list and title, and generate a single merged
list that removes the duplicates. Retain a union of the fields of the
- Given any XML publication file, convert it into a .bib file.
- Store the XML file in an Oracle database (get an account on
oracle@quark from Mrs. Vijayalakshmi).
- Provide a web-based interface for querying the publications and
returning the result as an XML document. Any field of the XML entries
should be queryable. Support keyword match for string fields and
simple less-than, greater-than and equal-to matches for numeric fields
like year. Also, one should be able to add entries into the database using the same form-based interface.
- Use a suitable XSL file to display the XML publications file in the browser.
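As an illustration of the unique-id rule in the first task, here is a minimal Python sketch (not the required lex/yacc solution). It assumes each entry has already been parsed into a list of author names and a year, that the "first letter" is taken from each name as given, and that the first entry keeps the bare key while later clashes get the suffixes a, b, c and so on. The function names and sample authors are hypothetical.

```python
import string

def base_key(authors, year):
    # First letter of each of the first three author names,
    # followed by the last two digits of the publication year.
    initials = "".join(name.strip()[0] for name in authors[:3])
    return initials + str(year)[-2:]

def assign_keys(entries):
    """Return one distinct key per (authors, year) entry, in input order."""
    seen = {}  # base key -> number of entries that produced it so far
    keys = []
    for authors, year in entries:
        base = base_key(authors, year)
        n = seen.get(base, 0)
        seen[base] = n + 1
        # Assumption: the first entry keeps the bare key, later ones get a, b, c, ...
        # (more than 26 clashes on one key would need a longer suffix scheme).
        keys.append(base if n == 0 else base + string.ascii_lowercase[n - 1])
    return keys

entries = [
    (["Rao", "Iyer", "Shah"], 1993),
    (["Reddy", "Irani", "Sen"], 1993),
    (["Kulkarni"], 1984),
]
print(assign_keys(entries))  # prints ['RIS93', 'RIS93a', 'K84']
```

A suffixed key always ends in a letter and a bare key always ends in a digit, so under these assumptions a suffixed key cannot clash with another entry's bare key.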
All submitted projects will be evaluated by the webteam and the TAs
for use in the School's webpage project. The best project will get
seven bonus points. Each member of the team has to clearly identify
the part of the project that he/she has developed; accordingly,
members of a team might end up with different grades.
A test case for this project is available at sunita@cygnus:~/archive.
The root directory contains a collection of .bib files.
Subdirectories under it contain several ps and pdf files. For many of
these files there are entries in one or more of the .bib files. Your
output will be a single XML file containing a distinct union of all
the .bib entries and extracted entries from the ps/pdf files along
with pointers to the source files whenever available. The XML file
will be loaded into an Oracle database and made available for querying
through a form-based interface. I should then be able to generate, say,
an XML file of the entries that have "Karmakar" as one of the authors and convert
the file to .bib for inclusion in a latex document.
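The "distinct union" step described above could look roughly like the following Python sketch. The element names (publications, entry, author, title) are hypothetical, since the DTD is not fixed here. Treating two entries as duplicates when their author lists and titles match after ignoring case and extra spaces is also an assumption.

```python
import xml.etree.ElementTree as ET

def entry_key(entry):
    # Duplicate test: same author list (in order) and same title,
    # compared case-insensitively with whitespace collapsed.
    authors = tuple(" ".join((a.text or "").split()).lower()
                    for a in entry.findall("author"))
    title = " ".join((entry.findtext("title") or "").split()).lower()
    return authors, title

def merge_lists(xml_a, xml_b):
    """Merge two publication lists, keeping a union of fields for duplicates."""
    merged = {}  # key -> kept entry; dicts preserve insertion order
    for doc in (xml_a, xml_b):
        for entry in ET.fromstring(doc).findall("entry"):
            key = entry_key(entry)
            if key not in merged:
                merged[key] = entry
                continue
            kept = merged[key]
            present = {child.tag for child in kept}
            for child in entry:
                # Copy over only the fields the kept entry does not have yet.
                if child.tag not in present:
                    kept.append(child)
    root = ET.Element("publications")
    root.extend(merged.values())
    return root

list_a = ("<publications><entry><author>P. Rao</author>"
          "<title>On Sorting</title><year>1999</year></entry></publications>")
list_b = ("<publications><entry><author>P. Rao</author>"
          "<title>On  sorting</title><pages>1-10</pages></entry>"
          "<entry><author>K. Iyer</author><title>Parsing</title></entry>"
          "</publications>")
print(ET.tostring(merge_lists(list_a, list_b), encoding="unicode"))
```

When both copies of an entry carry the same field with different values, this sketch keeps the value from the first list. A real merge would need a rule for such conflicts.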
Assignment 7: Profilers and version control
Due: 13 Oct 2000
Form teams of two. Create a cvs repository with two modules: one under
your partner's login name and one under yours. Each of you writes an
"inefficient" C program for sorting records with string attributes.
Commit your changes to cvs and ask your partner to check out the
changes. Partner runs gprof on your program to find the main
bottlenecks and removes them as much as possible. Partner checks in
the changes. You check out the changed file, add comments around the
changes made by the partner and check in your changes. There should
thus be three versions of your program under cvs.
Your sorting binary should be called "sorter" and invoked as follows:
sorter input-file numAttrs comma-separated-attribute-list
E.g. sorter example.txt 8 "8,3"
sorts the file example.txt consisting of eight string attributes in the
order of the 8th and 3rd attribute. The input file is assumed to be
comma-separated. Make the old version of your code available as an
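To cross-check the output of your sorter, a small Python reference sketch such as the following may help. It assumes that the first attribute in the list is the primary sort key and later ones break ties, that attributes are compared as plain strings, and that no attribute contains a comma. The name sort_records is hypothetical.

```python
def sort_records(lines, num_attrs, attr_list):
    """Sort comma-separated records by the 1-based attribute positions in attr_list."""
    positions = [int(p) - 1 for p in attr_list.split(",")]
    records = [line.rstrip("\n").split(",") for line in lines if line.strip()]
    for record in records:
        if len(record) != num_attrs:
            raise ValueError("expected %d attributes, got %d" % (num_attrs, len(record)))
    # Assumption: earlier positions in attr_list are more significant.
    records.sort(key=lambda record: [record[p] for p in positions])
    return [",".join(record) for record in records]

sample = ["b,x,2", "a,y,1", "c,x,1"]
for line in sort_records(sample, 3, "2,3"):
    print(line)
```

Here the records are ordered by the 2nd attribute and then the 3rd, so "c,x,1" comes first, followed by "b,x,2" and "a,y,1".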
Assignment 6: Makefiles
Due: 6 Oct 2000
You will be supplied a two-level directory structure where each
leaf-level directory contains a collection of .cpp and .h files. A special "test"
directory contains a collection of .txt and .out files. You need to
generate a makefile in each of the leaf-level directories and a final
makefile at the root level. Here is the source-tree
in tar format.
The makefile should have the following
- Name your file "Makefile". You are free to add other files like
Makefile.rules, Makefile.win etc., but there should always be a file
called "Makefile" in every directory.
- It should work on both Windows NT with MS Visual C++ and Linux
with the GNU compiler (g++) and GNU make. An environment variable
OSTYPE is set to either WIN32 or LINUX to specify which. The makefile
should support the following targets:
- all: make everything to generate an executable called galore
- clean: remove all object files
- test: if "lp_solve" or any of the "test/*.txt" files changes, run the
following for each changed file (say sample.txt): "lp_solve < test/sample.txt >
$TEMP/tmp.out", compare the temp file to sample.out and echo the difference.
- opt: to remake everything with the optimization flag on.
- debug: to remake everything with the debug flag on.
- It should automatically generate dependencies. For instance, if I
change a .cpp file to include a new header file, then I should not
need to change the makefile.
- The root makefile should not contain names of any source files
within a subdirectory. It either includes or calls make on the
- All rules regarding what compiler to use etc. are defined in
exactly one place, maybe a separate file that other makefiles
include. Also, any dependency is defined in exactly one place.
- Makefiles in subdirectories should use pattern rules whenever possible.
Assignment 5: Latex typesetting
Due: 29 Sept 2000
Generate a latex document that looks identical to this postscript file in all respects, including
margins and fontsize. The figure can be different. The submission
will consist of a latex file, a .eps file for the figure you included
and a bibtex file with the two bibitems. Make sure that your code
compiles on Cygnus --- otherwise you get zero points.
Due: 8 Sept 2000
Write a perl program that takes as arguments an http URL, a string, a newsgroup name and an email address, and does the following:
Starts a daemon process to monitor the given url for any updates. When
the page of the url contains the given string or a newly added link
from the page contains the string, it makes a posting to the newsgroup
with the url that contains the string and exits killing the
If the daemon gets killed for some other reason, it sends an email to the
given email address and restarts the daemon.
For ease of grading, you absolutely must follow these additional
requirements: The subject lines of the news group message and the
email message should contain your name. The input argument order
should be exactly as specified above. Your main perl program should be
called "restless.pl" and the daemon you start should be called
"cantwait". For instance, you might want to run the following: perl
restless.pl "http://www.it.iitb.ernet.in/~abhi" Matheran it.announce
"firstname.lastname@example.org". This will watch abhishek's page or a newly appearing
immediate link from it for the word "Matheran".
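The core check that the daemon repeats (does the page, or a newly added immediate link from it, contain the string?) can be sketched in Python as follows. This is only an outline of the logic, not the required perl program: the news posting, the email and the daemon restart are left out, the page text is passed in rather than downloaded, and fetch is a hypothetical callback that returns the text of a URL. The example.org addresses are placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collects the absolute URLs of all <a href=...> links on a page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(urljoin(self.base_url, href))

def extract_links(base_url, html):
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links

def find_match(url, html, needle, known_links, fetch):
    """Return the URL whose text contains needle, or None if there is no match."""
    if needle in html:  # searches the raw page text, markup included
        return url
    # Only links that were not on the page at the previous check are followed.
    for link in sorted(extract_links(url, html) - known_links):
        if needle in fetch(link):
            return link
    return None

base = "http://example.org/~user/index.html"
old_page = '<a href="news.html">news</a>'
new_page = old_page + '<a href="trip.html">trip</a>'
pages = {"http://example.org/~user/trip.html": "Weekend at Matheran"}
known = extract_links(base, old_page)
print(find_match(base, new_page, "Matheran", known, pages.__getitem__))
```

In this run the string is not on the watched page itself, but the newly added link trip.html contains it, so that link's URL is returned.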
Due: 25 Aug 2000
In this assignment, grading will be based both on correctness and
quality of the code. Short yet readable coding is desirable. If your
code is correct you will get 70% of the credit. The rest will depend
on the quality of your code.
Write a perl script that given a directory, will recursively search
the directory looking for all files ending with .C or .H. All these
files should be renamed as .cpp and .hpp respectively. Also, all
"#include" commands in the .C or .H files have to be renamed to
reflect the right names. All "^M"s from the end of all files have to
Write a perl script that takes as input a text file redirected as
STDIN. For each word in the file, it outputs the word if it is the
first time the word occurs; otherwise it outputs the number of distinct
words that appeared before the first occurrence of the word. For
example, if the input is the string "to be or not to be", the output
should be "to be or not 0 1".
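The rule in the second script can be checked against a short Python sketch like the one below (your submission must still be in perl). It assumes words are separated by whitespace and compared exactly, including case and punctuation. The name encode_words is hypothetical.

```python
def encode_words(text):
    first_seen = {}  # word -> number of distinct words seen before its first occurrence
    output = []
    for word in text.split():
        if word in first_seen:
            # Repeat: emit the count recorded when the word first appeared.
            output.append(str(first_seen[word]))
        else:
            # First occurrence: record how many distinct words preceded it.
            first_seen[word] = len(first_seen)
            output.append(word)
    return " ".join(output)

print(encode_words("to be or not to be"))  # prints: to be or not 0 1
```

The dictionary size at the moment a word first appears is exactly the number of distinct words before it, so no second pass over the input is needed.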
Due: 18 Aug 2000
Use PHP to include a common navigation bar in all your pages. Those of you who used frames in the first assignment can replace frames with tables and PHP includes.
Add a CGI script to your home page that contains at least two text
input boxes, at least one radio button and at least one selection
box. One of the text boxes should be a key. Store the entries that are
submitted through the script in a ":" delimited text file. Write a
shell program to do the following:
If the key does not match any of the previous keys, append a line to
your text file to store the user inputs. The output file should be a
plain html file with the list of fields and their values just saved.
Whenever a submit is made with a "key" that matches one of the keys
stored in the text file above, the output file should be the same form
with the previous values of other fields loaded in.
If the key is a special user "Admin", then a new form should be
displayed that has just one selection box whose elements are the
fields in your first form. When the "Admin" selects a field, the
output should be an html file that returns the number of distinct
values entered so far for the field and the list of these distinct
values along with a count of how many times each value occurred.
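The three cases above (new key, known key, Admin report) come down to a lookup, an append and a count over the ":" delimited file. The following Python sketch shows that logic on an in-memory copy of the file; the assignment itself asks for a shell program. It assumes the key is the first field of every line and that no value contains a ":". The function names and sample records are hypothetical.

```python
from collections import Counter

def parse_store(text):
    # One record per line, fields separated by ":"; the key is the first field.
    return [line.split(":") for line in text.splitlines() if line]

def submit(records, fields):
    """Return ("found", old_record) for a known key, else append and return ("added", record)."""
    for record in records:
        if record[0] == fields[0]:
            return "found", record  # caller reloads the form with these values
    records.append(list(fields))
    return "added", records[-1]     # caller lists the values just saved

def distinct_counts(records, column):
    # Admin view: each distinct value of one field with its number of occurrences.
    return Counter(record[column] for record in records)

records = parse_store("u1:Asha:red:tea\nu2:Ravi:blue:tea\n")
print(submit(records, ["u1", "", "", ""]))
print(submit(records, ["u3", "Meera", "red", "coffee"]))
counts = distinct_counts(records, 3)
print(len(counts), dict(counts))
```

Writing the updated records back is a matter of joining each record with ":" again, one record per line.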
Due: 11 Aug 2000
Prepare your home page using html. Your home page should have the following properties:
- Use at least two fonts
- Use at least one list environment
- Use at least one table
- Point to at least two other URLs, the school page should be one of them
- Contain at least one image, optionally contain an audio file
- Comprise more than one html page
- Use cascading style sheets to share font, background etc. information across all the pages of your site
- (Bonus points) Contain at least some text in a font other than Roman, say, some poetry in your mother tongue or Hindi or any other language. Including gifs/jpegs or any other image does not qualify for the extra points. A good starting point for more information on this is: