Collective entity disambiguation
- Comparisons between annotations made by CSAW, Cucerzan's algorithm, and Milne and Witten's algorithm.
- Web documents crawled for the CSAW evaluation in the SIGKDD 2009 paper.
- Ashish Kulkarni released a revision with some ground truth mistakes cleaned up.
- Ground truth annotations on the above documents collected from volunteers.
- Minor errata with corrections (thanks to the CS728 class for pointing these out):
  - In section 1.2, we wrote about Wikify!: "even random disambiguation results in an F1 score of 0.82". This is incorrect. Choosing the most frequent sense/entity gave the F1 score of 0.82. Random selection gave an F1 score closer to 0.5.
  - The unnumbered display equation in section 2.4.2 (just before section 2.5 begins) claims to express a (non-negative) relatedness as per Milne and Witten, but the numerator is clearly negative. The M&W 2008 paper gives a formula above Figure 2 that is non-negative, but although the left-hand side is called relatedness, the right-hand side decreases with increasing relatedness. In fact, their earlier AAAI paper displays the same formula on page 3, called sr. A plausible formula can be found here, in section 4.3, called "mw_coh".
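
To make the last erratum concrete, here is a minimal Python sketch of the issue. It assumes the Milne and Witten quantity has the commonly quoted form (log max(|A|,|B|) - log |A ∩ B|) / (log |W| - log min(|A|,|B|)), where A and B are the sets of pages linking to the two entities and |W| is the total number of pages. That quantity behaves like a distance, so the sketch also shows one plausible non-negative relatedness: one minus the distance, clipped at zero. The function names and the clipping convention are illustrative assumptions, not necessarily the exact "mw_coh" definition referenced above.

```python
import math


def mw_distance(inlinks_a, inlinks_b, num_pages):
    """Link-based distance in the assumed Milne-Witten form.

    Smaller values mean the two entities share more in-links,
    i.e. the value DEcreases as relatedness increases.
    """
    a, b = set(inlinks_a), set(inlinks_b)
    common = len(a & b)
    if common == 0:
        # log(0) is undefined; treat disjoint in-link sets as infinitely far.
        return math.inf
    big, small = max(len(a), len(b)), min(len(a), len(b))
    denom = math.log(num_pages) - math.log(small)
    if denom <= 0:
        # Degenerate case: the smaller in-link set covers every page.
        return 0.0
    return (math.log(big) - math.log(common)) / denom


def mw_relatedness(inlinks_a, inlinks_b, num_pages):
    """Assumed correction: 1 - distance, clipped to be non-negative."""
    return max(0.0, 1.0 - mw_distance(inlinks_a, inlinks_b, num_pages))


if __name__ == "__main__":
    # Hypothetical in-link sets over a collection of 1000 pages.
    a = {1, 2, 3, 4}
    b = {3, 4, 5, 6}
    print(mw_distance(a, b, 1000), mw_relatedness(a, b, 1000))
```

With this convention, identical in-link sets give a relatedness of 1, disjoint sets give 0, and the score grows as the overlap grows, which is the behaviour the left-hand side of the original formula promises.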