Optimizing the evaluation of a de-duplication function
Naïve application of a function would require quadratic time
1000 records would mean comparing on the order of 10^6 pairs!
Our optimizations to avoid materializing all pairs
Grouped evaluation model
Reordering similarity functions
Precede hard functions with simpler canopies
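
The canopy idea in the last point can be sketched as follows. This is only an illustration under stated assumptions, not the implementation behind these slides: the canopy key (first letter of the last name token), the use of `difflib.SequenceMatcher` as a stand-in for a "hard" similarity function, and all function names are hypothetical choices made for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

def naive_pairs(records):
    """Materialize every unordered pair of record indices: n*(n-1)/2 pairs."""
    return list(combinations(range(len(records)), 2))

def canopy_key(name):
    """Hypothetical cheap canopy: first letter of the last token, lowercased."""
    return name.split()[-1][0].lower()

def canopy_pairs(records):
    """Generate pairs only within canopies, never across them."""
    buckets = defaultdict(list)
    for i, rec in enumerate(records):
        buckets[canopy_key(rec)].append(i)
    pairs = []
    for ids in buckets.values():
        pairs.extend(combinations(ids, 2))
    return pairs

def expensive_similarity(a, b):
    """Stand-in for a costly similarity function (assumption for this sketch)."""
    return SequenceMatcher(None, a, b).ratio()

def find_duplicates(records, threshold):
    """Run the expensive function only on pairs that share a canopy."""
    return [
        (i, j)
        for i, j in canopy_pairs(records)
        if expensive_similarity(records[i], records[j]) >= threshold
    ]

records = [
    "Ann Smith", "Anne Smith", "Bob Jones",
    "Robert Jones", "A. Smith", "Carl Zhou",
]
```

On these six records the naive approach materializes 15 pairs, while the canopy restricts the expensive comparison to 4. The trade-off is that a canopy that is too coarse saves little work, and one that is too strict can separate true duplicates, so the cheap function has to be chosen so that it rarely splits records the hard function would match.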