Noun Phrase Chunking using CRF:
Team:
	Prashanth Kamle (08305006)
	Sriram Kashyap  (08305028)

--------------------------------------------------------
Compiling the code:
Run 'compile.sh':
# ./compile.sh


--------------------------------------------------------
Running the code:

Step 1: Generating Features:

Features are generated from a template and a training data file that you specify.
To generate them, run 'featuregen.sh' with the path to the training data:
# ./featuregen.sh data/train.small
(The other datasets provided are data/train.medium and data/train.large.)

This process will create the following files in the data directory:
a. dictionary.txt
b. features.txt
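
A quick way to confirm that feature generation succeeded is to check that both files exist and are non-empty. The helper below is only a sketch: 'check_features' is a hypothetical function, not part of this project, and it assumes the files are written to the 'data' directory as listed above.

```shell
#!/bin/sh
# check_features DIR: hypothetical helper, not part of the project.
# Succeeds only if both generated files exist in DIR and are non-empty.
check_features() {
    dir=${1:-data}
    for f in dictionary.txt features.txt; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            return 1
        fi
    done
    echo "feature files present in $dir"
}

# Usage, after running ./featuregen.sh data/train.small:
# check_features data
```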



Step 2: Training Phase:

The training phase requires you to set the parameters in the file 'config' and then run 'run.sh'.

Format of 'config' file:
Line 1: path to dictionary file
Line 2: path to feature file
Line 3: path to file containing sentences (dataset to train on)
Line 4: train
(Line 4 must be 'train'; it tells the program to perform training.)

After setting the parameters in the 'config' file, run the 'run.sh' script to start training.

The output of the training phase is the file 'out.txt',
which contains the weight vector learned during training.
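
The training setup described above can be sketched as a short script. This is an illustration, not a script shipped with the project: it assumes that the paths in 'config' are relative to the directory from which 'run.sh' is started, and that the dictionary and feature files are in 'data' as produced by 'featuregen.sh'. The dataset path is only an example.

```shell
#!/bin/sh
# Sketch: write a training 'config' in the four-line format described above.
# Assumption: paths are relative to the directory where run.sh is started.
DATA_DIR=data

cat > config <<EOF
$DATA_DIR/dictionary.txt
$DATA_DIR/features.txt
$DATA_DIR/train.small
train
EOF

# Start training (uncomment when inside the project directory):
# ./run.sh
# Per this README, the learned weight vector is then written to out.txt.
```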



Step 3: Testing Phase:

The testing phase requires you to set the following parameters in the 'config' file and then run 'run.sh'.

Format of 'config' file:
Line 1: path to dictionary file (that was used to train the model)
Line 2: path to feature file (that was used to train the model)
Line 3: path to a file containing a single sentence from the dataset (the sentence need not be labeled)
Line 4: test
Line 5: path to the weight vector file generated by training the model ('out.txt')
(Line 4 must be 'test'; it tells the program to label the sentence instead of training.)

The output of this phase is a label for each word in the input sentence.
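
Because the 'config' file has no field names, a misplaced line is easy to miss. The helper below is a sketch that checks a config file against the two layouts described in this README. 'check_config' is a hypothetical function, not part of the project; 'config.test.example' and 'data/sentence.txt' are placeholder names. It only checks that the required lines are present, not that the files they name exist.

```shell
#!/bin/sh
# check_config FILE: hypothetical helper, not part of the project.
# Verifies that FILE follows the line layout described in this README.
check_config() {
    cfg=$1
    # Lines 1-3 must each name a file (dictionary, features, sentences).
    for n in 1 2 3; do
        if [ -z "$(sed -n "${n}p" "$cfg")" ]; then
            echo "config line $n is empty" >&2
            return 1
        fi
    done
    mode=$(sed -n 4p "$cfg")
    case $mode in
        train)
            ;;
        test)
            # Testing additionally needs the weight vector file on line 5.
            if [ -z "$(sed -n 5p "$cfg")" ]; then
                echo "config line 5 (weight vector file) is empty" >&2
                return 1
            fi
            ;;
        *)
            echo "config line 4 must be 'train' or 'test'" >&2
            return 1
            ;;
    esac
}

# Example: a testing config written to a placeholder file name.
cat > config.test.example <<EOF
data/dictionary.txt
data/features.txt
data/sentence.txt
test
out.txt
EOF
check_config config.test.example && echo "config looks well-formed"
```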

The labels:
B: Marks the first word of a noun phrase
I: Marks a word inside a noun phrase (any word after the first)
O: Marks all other words (outside any noun phrase)
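
To illustrate how the labels delimit noun phrases, the sketch below groups labeled words back into phrases. It assumes one "word label" pair per line, which is a made-up format for illustration and may not match what the program actually prints. 'extract_noun_phrases' is a hypothetical helper, not part of the project.

```shell
#!/bin/sh
# extract_noun_phrases: hypothetical helper, not part of the project.
# Assumed input: one "word label" pair per line (the real output may differ).
extract_noun_phrases() {
    awk '
        # B starts a new phrase; flush any phrase still open.
        $2 == "B" { if (np != "") print np; np = $1; next }
        # I extends the phrase that is currently open.
        $2 == "I" && np != "" { np = np " " $1; next }
        # O (or an I with no open phrase) closes the current phrase.
        { if (np != "") print np; np = "" }
        END { if (np != "") print np }
    '
}

# Example sentence labeled by hand for illustration.
printf '%s\n' 'The B' 'quick I' 'fox I' 'jumped O' 'over O' 'the B' 'fence I' \
    | extract_noun_phrases
# prints:
#   The quick fox
#   the fence
```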

