Challenges
- Errors and inconsistencies in the data
- Duplicates can be hard to spot because they may be spread far apart:
  - they may not be groupable using obvious keys
- Domain-specific
  - Existing manual approaches require retuning for every new domain
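As a small illustration of the "spread far apart" and "obvious keys" points, the sketch below sorts and groups a handful of records on the name field. The records, field layout, and function names are all made up for this example, and real data would be far messier. It only aims to show that neither sorting nor exact-key grouping brings the duplicate pairs together.

```python
# Hypothetical records: ids 1 and 4 describe the same person, as do ids 2 and 5.
records = [
    (1, "Robert Smith", "12 Main Street"),
    (2, "Alice Jones", "4 Oak Ave"),
    (3, "Carol White", "9 Elm Rd"),
    (4, "Bob Smith", "12 Main St."),
    (5, "Jones, Alice", "4 Oak Avenue"),
]


def sort_positions(records):
    """Map each record id to its position after sorting on the name field."""
    ordered = sorted(records, key=lambda rec: rec[1].lower())
    return {rec[0]: pos for pos, rec in enumerate(ordered)}


def exact_key_groups(records):
    """Group record ids by the exact (lower-cased) name, the 'obvious' key."""
    groups = {}
    for rec_id, name, _address in records:
        groups.setdefault(name.lower(), []).append(rec_id)
    return groups


positions = sort_positions(records)
groups = exact_key_groups(records)

# Distance between the two records of each duplicate pair in the sorted order.
print(abs(positions[1] - positions[4]), abs(positions[2] - positions[5]))
# Size of every exact-key group; a duplicate pair would show up as a group of 2.
print(sorted(len(ids) for ids in groups.values()))
```

In this toy data, the name variants ("Robert" vs. "Bob", "Alice Jones" vs. "Jones, Alice") keep each duplicate pair out of the same group and away from each other in the sorted order. Any fix, such as normalising nicknames, would itself be a domain-specific rule.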