The learning approach
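[Figure: the learning approach to duplicate detection]

Similarity functions f1, f2, …, fn map each record pair to a vector of similarity scores.

Labeled record pairs (D = duplicate, N = non-duplicate):
  Record 1, Record 2    D
  Record 1, Record 3    N
  Record 4, Record 5    D

Mapped examples (one row per labeled pair; label 1 = duplicate, 0 = non-duplicate):
  f1    f2    …   fn    label
  1.0   0.4   …   0.2   1
  0.0   0.1   …   0.3   0
  0.3   0.4   …   0.4   1

The mapped examples are used to train the classifier.

Unlabeled list: Record 6, Record 7, Record 8, Record 9, Record 10, Record 11

Pairs from the unlabeled list, mapped by the same similarity functions, before (?) and after classification:
  f1    f2    …   fn    before   after
  0.0   0.1   …   0.3   ?        0
  1.0   0.4   …   0.2   ?        1
  0.6   0.2   …   0.5   ?        0
  0.7   0.1   …   0.6   ?        0
  0.3   0.4   …   0.4   ?        1
  0.0   0.1   …   0.1   ?        0
  0.3   0.8   …   0.1   ?        1
  0.6   0.1   …   0.5   ?        1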
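
The flow in the figure (similarity functions turn labeled record pairs into score vectors, a classifier is trained on those vectors, and the same mapping is then applied to unlabeled pairs) can be sketched roughly as below. Everything concrete here is an assumption made for illustration: the three similarity functions, the record fields (`title`, `author`, `year`), the sample records, and the choice of a one-test "stump" as the classifier. The slide does not say which learner is used at this step.

```python
from difflib import SequenceMatcher

# Hypothetical similarity functions f1..fn: each maps a record pair to a score.
def title_sim(a, b):
    return SequenceMatcher(None, a["title"].lower(), b["title"].lower()).ratio()

def author_sim(a, b):
    return SequenceMatcher(None, a["author"].lower(), b["author"].lower()).ratio()

def year_match(a, b):
    return 1.0 if a["year"] == b["year"] else 0.0

SIMILARITY_FUNCTIONS = [title_sim, author_sim, year_match]

def map_pair(a, b):
    """Map a record pair to its vector of similarity scores."""
    return [f(a, b) for f in SIMILARITY_FUNCTIONS]

def train_stump(examples):
    """Pick the single (feature, threshold) test with the fewest training errors.

    examples: list of (vector, label), label 1 = duplicate, 0 = non-duplicate.
    The learned rule predicts 1 when vector[feature] > threshold.
    """
    best = None
    for j in range(len(examples[0][0])):
        values = sorted({vec[j] for vec, _ in examples})
        # Candidate thresholds: midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            errors = sum((vec[j] > threshold) != bool(label)
                         for vec, label in examples)
            if best is None or errors < best[0]:
                best = (errors, j, threshold)
    if best is None:
        raise ValueError("no feature separates the examples")
    return best[1], best[2]

def classify(stump, vector):
    feature, threshold = stump
    return 1 if vector[feature] > threshold else 0

# Made-up records standing in for Record 1 .. Record 8.
r1 = {"title": "Learning to match records", "author": "A. Kumar", "year": 2001}
r2 = {"title": "Learning to match records.", "author": "Kumar, A.", "year": 2001}
r3 = {"title": "Query optimization in databases", "author": "B. Lee", "year": 1995}
r4 = {"title": "Fast string joins", "author": "C. Diaz", "year": 1999}
r5 = {"title": "Fast string-joins", "author": "C Diaz", "year": 1999}
r6 = {"title": "Scalable entity resolution", "author": "D. Park", "year": 2003}
r7 = {"title": "Scalable Entity Resolution", "author": "D Park", "year": 2003}
r8 = {"title": "B-trees", "author": "E. Novak", "year": 1980}

# Labeled pairs -> mapped examples -> classifier.
labeled = [((r1, r2), 1), ((r1, r3), 0), ((r4, r5), 1)]
examples = [(map_pair(a, b), label) for (a, b), label in labeled]
stump = train_stump(examples)

# Unlabeled pairs get the same mapping, then a predicted label.
for a, b in [(r6, r7), (r6, r8)]:
    print(a["title"], "|", b["title"], "->", classify(stump, map_pair(a, b)))
```

A stump is used only to keep the sketch short; any classifier that accepts the score vectors could take its place.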
[Figure: learned classifier, a decision tree over similarity functions]

Node tests on similarity-function values:
  AuthorTitleNgrams ≤ 0.4
  AuthorEditDist ≤ 0.8
  YearDifference > 1
  All-Ngrams ≤ 0.48
  TitleIsNull < 1
  BldgNumberMatch ≤ 0.5

Leaf labels: Duplicate, Non-Duplicate
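
A tree of threshold tests over similarity-function values, like the tests listed above, could be evaluated along the following lines. This is a sketch under stated assumptions: the `Leaf` and `Split` classes and the `predict` helper are hypothetical names, and the arrangement of the two tests in `tree` (which test comes first, and which branch leads to which label) is invented for the example rather than taken from the slide.

```python
import operator
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    label: str            # "Duplicate" or "Non-Duplicate"

@dataclass
class Split:
    feature: str          # name of a similarity function
    op: str               # one of "<=", "<", ">"
    threshold: float
    if_true: "Node"       # subtree followed when the test holds
    if_false: "Node"      # subtree followed otherwise

Node = Union[Leaf, Split]

OPS = {"<=": operator.le, "<": operator.lt, ">": operator.gt}

def predict(node, scores):
    """Walk from the root to a leaf, applying one threshold test per level."""
    while isinstance(node, Split):
        holds = OPS[node.op](scores[node.feature], node.threshold)
        node = node.if_true if holds else node.if_false
    return node.label

# Hypothetical arrangement of two of the tests; not the tree from the slide.
tree = Split(
    "YearDifference", ">", 1,
    Leaf("Non-Duplicate"),
    Split("AuthorEditDist", "<=", 0.8, Leaf("Non-Duplicate"), Leaf("Duplicate")),
)

print(predict(tree, {"YearDifference": 0, "AuthorEditDist": 0.9}))
```

Keeping the tests as data rather than nested `if` statements means a tree produced by a learner can be loaded and evaluated without changing the code.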