Experiences with the learning approach
• Too much manual search in preparing training data
• Hard to spot challenging and covering sets of duplicates in large lists
• Even harder to find close non-duplicates that will capture the nuances
  - examine instances that are similar on one attribute but dissimilar on another
Active learning is a generalization of this!
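The heuristic of examining instances that are similar on one attribute but dissimilar on another can be sketched in a few lines. This is only an illustration under stated assumptions: the function names (`sim`, `nuanced_pairs`), the thresholds (`high`, `low`), the sample records, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all hypothetical choices, not part of the original approach.

```python
from difflib import SequenceMatcher
from itertools import combinations


def sim(a, b):
    """Character-level similarity in [0, 1] (an assumed, simple measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def nuanced_pairs(records, attr_a, attr_b, high=0.8, low=0.4):
    """Return index pairs that are similar on one attribute but dissimilar on the other.

    Such pairs are candidates for manual labeling: they may be
    tricky duplicates or close non-duplicates.
    """
    pairs = []
    for (i, r), (j, s) in combinations(enumerate(records), 2):
        sa = sim(r[attr_a], s[attr_a])
        sb = sim(r[attr_b], s[attr_b])
        # Keep the pair only when the two attributes disagree strongly.
        if (sa >= high and sb <= low) or (sb >= high and sa <= low):
            pairs.append((i, j))
    return pairs


# Hypothetical sample data, for illustration only.
records = [
    {"name": "John Smith", "address": "12 Oak St"},
    {"name": "John Smith", "address": "PO Box 5549"},
    {"name": "Ava Diaz", "address": "12 Oak St"},
]

pairs = nuanced_pairs(records, "name", "address")
print(pairs)
```

The sketch compares all pairs, so it is quadratic in the list size. On large lists the same difficulty noted above returns, which is where letting an active learner pick the informative pairs becomes attractive.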