Arvind Arasu, Surajit Chaudhuri, and Raghav Kaushik Presented by Bryan Wilhelm.


1 Arvind Arasu, Surajit Chaudhuri, and Raghav Kaushik Presented by Bryan Wilhelm

2 Problem Description. A single entity may be referenced in separate records in textually dissimilar ways, for example “Robert” and “Bob”. Traditional text similarity functions such as edit distance and the Jaccard coefficient cannot handle these cases. Current research is looking at string transformation databases. These databases can be extremely large.
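To make the “Robert” / “Bob” point concrete, here is a minimal sketch that computes both measures for that pair. The function names and the choice of character bigrams as the token set for the Jaccard coefficient are assumptions made for illustration; they are not taken from the slides.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def bigrams(s: str) -> set:
    """Character bigrams of a lowercased string (an illustrative tokenization)."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


if __name__ == "__main__":
    print(edit_distance("Robert", "Bob"))              # 4 edits for a 3-letter name
    print(jaccard(bigrams("Robert"), bigrams("Bob")))  # 1 shared bigram out of 6
    print(jaccard({"robert"}, {"bob"}))                # word-level tokens share nothing
```

Both measures rate the pair as dissimilar even though the two names refer to the same person, which is the gap that learned transformations such as Robert → Bob are meant to close.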

3 Problem Description

4 Solution: Definitions. Rule Application. Example: {Olathe → Olathe, 7, 4}. Alignment. Rule applications cannot overlap. Order does not matter. Coverage.
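The non-overlap and order-independence conditions can be sketched in code. This is only an illustration under stated assumptions: it reads the example {Olathe → Olathe, 7, 4} as a rule plus a start position in each of the two strings, and it treats those positions as character offsets. The slides do not confirm either reading, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class RuleApplication:
    """A rule lhs -> rhs applied at pos_x in string X and pos_y in string Y.

    Assumption: positions are character offsets; the slides do not say.
    """
    lhs: str
    rhs: str
    pos_x: int
    pos_y: int

    def span_x(self):
        return (self.pos_x, self.pos_x + len(self.lhs))

    def span_y(self):
        return (self.pos_y, self.pos_y + len(self.rhs))


def spans_overlap(a, b) -> bool:
    """True if half-open intervals a and b share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def is_valid_alignment(apps) -> bool:
    """No two rule applications may overlap in X or in Y.

    The check is pairwise, so the order of the applications is irrelevant.
    """
    return not any(
        spans_overlap(p.span_x(), q.span_x()) or spans_overlap(p.span_y(), q.span_y())
        for p, q in combinations(apps, 2)
    )
```

Because the check only looks at unordered pairs, passing the applications as a list or as a set in any order gives the same answer, matching the "order does not matter" point.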

5 Solution: Algorithm

6

7

8

9

10 Record Matching Application. Generating Example Pairs: Traditional text matching methods (such as the Jaccard coefficient) are used. Input from domain experts could also be considered, but this is expensive. A few incorrect pairs will not affect the end result. Validation of Transformations: All approaches involve confirmation by a domain expert.
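One way to generate example pairs with a traditional similarity function is to keep every pair of records whose Jaccard coefficient clears a threshold. The sketch below assumes whitespace tokenization, a threshold of 0.5, and a brute-force comparison of all pairs; none of these choices come from the slides, and the function names are hypothetical.

```python
from itertools import combinations


def tokens(record: str) -> frozenset:
    """Lowercased whitespace tokens of a record (an illustrative tokenization)."""
    return frozenset(record.lower().split())


def jaccard(a, b) -> float:
    """Jaccard coefficient |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def candidate_pairs(records, threshold=0.5):
    """Return record pairs whose token-level Jaccard coefficient is >= threshold.

    Brute force over all pairs; fine for a sketch, too slow for large inputs.
    """
    pairs = []
    for r1, r2 in combinations(records, 2):
        if jaccard(tokens(r1), tokens(r2)) >= threshold:
            pairs.append((r1, r2))
    return pairs


if __name__ == "__main__":
    records = [
        "Robert Smith 12 Main St",
        "Bob Smith 12 Main St",
        "Alice Jones 9 Oak Ave",
    ]
    for pair in candidate_pairs(records):
        print(pair)
```

In this made-up data the first two records share four of six distinct tokens, so they are kept as an example pair even though “Robert” and “Bob” themselves do not match. A loose threshold will admit some wrong pairs, which is acceptable given that a few incorrect pairs do not change the end result.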

11 Analysis

12

