Assignment 3: Using learning for real-world applications
The generic sequence labelling problem
You have a sequence of entities, e1, e2, e3, ..., en,
and a corresponding sequence of labels, l1, l2, l3, ..., ln.
The problem: given a new sequence of entities, how can you produce the most likely sequence of corresponding labels, L*?
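Stated as an optimisation, L* is the label sequence with the highest probability given the entities. The first line below follows directly from that definition. The second line drops P(e1, ..., en), which does not depend on the labels. The third line assumes a first-order hidden Markov model, which is one common modelling choice and not something the assignment requires. Under that model, P(l1 | l0) is read as the probability of l1 starting a sequence.

```latex
\begin{aligned}
L^* &= \arg\max_{l_1,\dots,l_n} P(l_1,\dots,l_n \mid e_1,\dots,e_n) \\
    &= \arg\max_{l_1,\dots,l_n} P(e_1,\dots,e_n \mid l_1,\dots,l_n)\, P(l_1,\dots,l_n) \\
    &= \arg\max_{l_1,\dots,l_n} \prod_{i=1}^{n} P(l_i \mid l_{i-1})\, P(e_i \mid l_i)
       \quad \text{(first-order HMM assumption)}
\end{aligned}
```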
This problem arises in many real-world scenarios. For this assignment, you need to solve any one of the following three problems:
Problem 1 – Part-of-speech (POS) annotation problem
Entities’ sequence: words in a sentence
Label sequence: corresponding POS tags.
For example, for the entities' sequence "time flies like an arrow", possible label sequences include the following (N = noun, V = verb, P = preposition, A = article):

Entities:  time  flies  like  an  arrow
L1:        N     V      P     A   N
L2:        V     N      P     A   N
L3:        N     N      V     A   N
The most likely label sequence for the above sentence should be L1.
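One common way to find the most likely sequence is to treat the tags as the hidden states of a hidden Markov model (HMM) and decode with the Viterbi algorithm. The assignment does not prescribe this method, so treat it as one option among several. The following is a minimal sketch under that assumption. TAGS, START, TRANS, EMIT and the FLOOR value for unseen words are hypothetical, hand-picked for illustration rather than estimated from any corpus, and tuned so that L1 comes out on top for the example sentence.

```python
import math

# Hypothetical toy HMM: every number here is invented for illustration.
# Tags: N (noun), V (verb), P (preposition), A (article).
TAGS = ["N", "V", "P", "A"]

# P(l_1): probability of each tag starting a sentence.
START = {"N": 0.6, "V": 0.2, "P": 0.1, "A": 0.1}

# P(l_i | l_{i-1}): outer key is the previous tag, inner key the current tag.
TRANS = {
    "N": {"N": 0.3, "V": 0.5, "P": 0.1, "A": 0.1},
    "V": {"N": 0.3, "V": 0.1, "P": 0.3, "A": 0.3},
    "P": {"N": 0.3, "V": 0.1, "P": 0.1, "A": 0.5},
    "A": {"N": 0.8, "V": 0.1, "P": 0.05, "A": 0.05},
}

# P(e_i | l_i): words missing from a tag's table fall back to FLOOR,
# a crude stand-in for proper smoothing.
EMIT = {
    "N": {"time": 0.4, "flies": 0.2, "arrow": 0.4},
    "V": {"time": 0.1, "flies": 0.5, "like": 0.4},
    "P": {"like": 1.0},
    "A": {"an": 1.0},
}
FLOOR = 1e-6


def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy HMM."""
    if not words:
        return []

    def log_emit(tag, word):
        return math.log(EMIT[tag].get(word, FLOOR))

    # best[i][t]: log-probability of the best tag path for words[:i+1]
    # that ends in tag t; back[i][t]: the previous tag on that path.
    best = [{t: math.log(START[t]) + log_emit(t, words[0]) for t in TAGS}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in TAGS:
            prev = max(TAGS, key=lambda p: best[i - 1][p] + math.log(TRANS[p][t]))
            best[i][t] = (best[i - 1][prev] + math.log(TRANS[prev][t])
                          + log_emit(t, words[i]))
            back[i][t] = prev

    # Pick the best final tag, then follow the back-pointers to the start.
    path = [max(TAGS, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


print(viterbi("time flies like an arrow".split()))  # ['N', 'V', 'P', 'A', 'N']
```

Working in log space avoids numerical underflow on long sequences. The dynamic programme keeps only the best path into each tag at each position, so it runs in O(n × |tags|²) time instead of enumerating every possible label sequence.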
Problem 2 – Protein sequence annotation problem
Entities’ sequence: P1, P2, ...
Label sequence: P, S, T, ...
The label set for protein structure is {Primary (P), Secondary (S), Tertiary (T)}.
Problem 3 – Gene sequence annotation problem
The details (corpus links, etc.) are on Moodle.
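Whichever problem you choose, the training data will presumably pair entity sequences with label sequences. If you use an HMM-style model, which is an assumption since the assignment does not fix a method, you can estimate its probabilities by relative-frequency (maximum-likelihood) counting. The following is a minimal sketch that assumes each training sequence is a list of (entity, label) pairs. The function `estimate_hmm` and the two-sentence toy corpus are hypothetical, and the real corpus format on Moodle may differ.

```python
from collections import Counter, defaultdict


def estimate_hmm(tagged_sequences):
    """Relative-frequency estimates of HMM start, transition and emission probabilities.

    tagged_sequences: list of sequences, each a list of (entity, label) pairs
    (a hypothetical input format). No smoothing is applied, so unseen events
    simply have no entry, i.e. probability zero.
    """
    start_counts = Counter()
    trans_counts = defaultdict(Counter)  # previous label -> Counter of next labels
    emit_counts = defaultdict(Counter)   # label -> Counter of entities
    for seq in tagged_sequences:
        labels = [label for _, label in seq]
        if not labels:
            continue
        start_counts[labels[0]] += 1
        for prev, cur in zip(labels, labels[1:]):
            trans_counts[prev][cur] += 1
        for entity, label in seq:
            emit_counts[label][entity] += 1

    def normalise(counter):
        total = sum(counter.values())
        return {key: count / total for key, count in counter.items()}

    start = normalise(start_counts)
    trans = {prev: normalise(c) for prev, c in trans_counts.items()}
    emit = {label: normalise(c) for label, c in emit_counts.items()}
    return start, trans, emit


# Hypothetical toy corpus of two hand-tagged sentences.
corpus = [
    [("time", "N"), ("flies", "V"), ("like", "P"), ("an", "A"), ("arrow", "N")],
    [("fruit", "N"), ("flies", "N"), ("like", "V"), ("a", "A"), ("banana", "N")],
]
start, trans, emit = estimate_hmm(corpus)
print(start)       # {'N': 1.0}
print(trans["V"])  # {'P': 0.5, 'A': 0.5}
```

The same counting applies unchanged to protein or gene labels, since it only assumes a finite set of labels and discrete entities. With a real corpus you would normally add smoothing so that unseen entities or transitions do not receive zero probability.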