Challenges
• Errors and inconsistencies in data
• Spotting duplicates might be hard, as they
    ◦ may be spread far apart
    ◦ may not be groupable using obvious keys
• Domain-specific
    ◦ Existing manual approaches require retuning with every new domain
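The two reasons duplicates are hard to spot can be made concrete with a small Python sketch. The records are hypothetical toy data, and the three "Smith" entries are assumed to be the same person. A plain lexicographic sort and a "first word" grouping key stand in for obvious approaches. They are illustrative choices, not a method taken from these slides.

```python
from collections import defaultdict

# Hypothetical toy data: the three "Smith" entries are assumed to be one person.
RECORDS = [
    "John Smith",
    "Alice Brown",
    "Smith, John",
    "Bob Jones",
    "Peter Green",
    "Jon Smith",
    "Mary Davis",
    "Carol White",
    "Karen Lee",
]
DUPLICATES = {"John Smith", "Smith, John", "Jon Smith"}


def sorted_positions(records):
    """Return each record's index after a plain lexicographic sort."""
    return {name: i for i, name in enumerate(sorted(records))}


def group_by_first_token(records):
    """Group records on an 'obvious' key: the lowercased first word (illustrative)."""
    groups = defaultdict(list)
    for name in records:
        key = name.split()[0].strip(",").lower()
        groups[key].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Sorting: the reordered "Smith, John" lands far from the other two variants.
    pos = sorted_positions(RECORDS)
    print("sorted positions of the duplicates:", sorted(pos[n] for n in DUPLICATES))

    # Grouping: each spelling or ordering variant falls into its own group.
    groups = group_by_first_token(RECORDS)
    dup_keys = {k for k, names in groups.items() if DUPLICATES & set(names)}
    print("distinct groups holding the duplicates:", sorted(dup_keys))
```

After sorting, the duplicates sit at positions 3, 4 and 8. A scheme that only compares records within a small window of the sorted order would therefore miss "Smith, John". Grouping on the first word splits the same person across three keys ("john", "jon", "smith"), so exact-key grouping never compares them.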