Presentation is loading. Please wait.

Presentation is loading. Please wait.

Entity Profiling with Varying Source Reliabilities Furong Li, Mong Li Lee, Wynne Hsu {furongli, leeml, comp.nus.edu.sg National University of Singapore.

Similar presentations


Presentation on theme: "Entity Profiling with Varying Source Reliabilities Furong Li, Mong Li Lee, Wynne Hsu {furongli, leeml, comp.nus.edu.sg National University of Singapore."— Presentation transcript:

1 Entity Profiling with Varying Source Reliabilities Furong Li, Mong Li Lee, Wynne Hsu {furongli, leeml, comp.nus.edu.sg National University of Singapore

2 Outline  Motivation  Proposed Method  Performance Study  Conclusion 2

3 Entities in Multiple Sources Various name representations 3

4 Entities in Multiple Sources Various name representations Erroneous attribute values 4

5 Entities in Multiple Sources Various name representations Erroneous attribute values Incomplete information 5 No opening hours

6 Entities in Multiple Sources Various name representations Erroneous attribute values Incomplete information Ambiguous references 6

7 What We Want Entity Profiling 7 Various name representations Erroneous attribute values Incomplete information Ambiguous references

8 The Problem Involves Two Tasks  Record linkage  [Getoor et al, VLDB’12], [Negahban et al, CIKM’12]  Truth discovery  [Li et al, VLDB’13], [Yin et al, KDD’07] (LocalEats) (FourSquare) (Urbanspoon) Frank The erroneous values may prevent correct linkages Incomplete picture of the data limits the effectiveness of truth discovery

9 State-of-the-art Method  [Guo et al, VLDB’10]  Assume a set of (soft) uniqueness constraints  Transform records into attribute value pairs  Limitations  Make wrong associations when the percentage of erroneous values increases  Computationally expensive  The uniqueness constraint limits its generality 9

10 Outline  Motivation  Proposed Method  Performance Study  Conclusion 10

11 A Motivating Example X X ? √ √ √ 11 matching -> profiles

12 A Motivating Example X X ? 12 matching -> profiles √ √ √

13 The Example Tells Us 13

14 The Proposed Two-phase Method 14 X X

15 Confidence Based Matching  Bootstrap the framework with a small set of confident matches  Initialize reliability matrix based on confident matches X X 15

16 Confidence Based Matching 16

17 Confidence Based Matching 17

18 Confidence Based Matching 18

19 Adaptive Matching 19 Cluster signatures <- current truth Update reliability matrix Refine clusters by pruning the most dissimilar records Discount records belonging to multiple clusters Lower the impact of erroneous values on our matching decision X X

20 Outline  Motivation  Proposed Method  Performance Study  Conclusion 20

21 Comparative Methods  PIPELINE  Record linkage [1] + Truth discovery [2]  MATCH [3]  State-of-the-art method  COMET  The proposed method 1.Negahban et al. Scaling multiple-source entity resolution using statistically efficient transfer learning. In CIKM, Yin et al. TruthFinder. In KDD, Guo et al. Record linkage with uniqueness constraints and erroneous values. VLDB,

22 Results on Restaurant Dataset 22

23 23 %err: percentage of erroneous values %ambi: percentage of records with abbreviated names

24 24 %err: percentage of erroneous values %ambi: percentage of records with abbreviated names

25 Scalability Experiments 25

26 Outline  Motivation  Proposed Method  Performance Study  Conclusion 26

27 Conclusion  Address the problem of building entity profiles by collating data records from multiple sources in the presence of erroneous values  Interleave record linkage with truth discovery  Varying source reliabilities  Reduce the impact of erroneous values on matching decisions 27

28 Thanks! Q & A 28


Download ppt "Entity Profiling with Varying Source Reliabilities Furong Li, Mong Li Lee, Wynne Hsu {furongli, leeml, comp.nus.edu.sg National University of Singapore."

Similar presentations


Ads by Google