Truth Discovery with Multiple Conflicting Information Providers on the Web. Xiaoxin Yin, Jiawei Han, Philip S. Yu. Industrial and Government Track short paper.

1 Truth Discovery with Multiple Conflicting Information Providers on the Web. Xiaoxin Yin, Jiawei Han, Philip S. Yu. Industrial and Government Track short paper. Advisor: Dr. Koh Jia-Ling. Speaker: Che-Wei Liang. Date: 2007.11.20

2 Outline: Introduction; Problem Definitions; Computational Model – Web Site Trustworthiness and Fact Confidence – Iterative Computation; Empirical Study; Conclusions

3 Introduction. The World Wide Web – a necessary part of our lives. – Ex: Amazon.com, ShopZilla.com. Is the World Wide Web always trustworthy? – There is no guarantee that information on the web is correct.

4 Introduction. Example 1: Authors of books – some listings are incomplete! – some listings are incorrect!

5 Introduction. Ranking web pages – according to authority based on hyperlinks. – Ex: Authority-Hub analysis, PageRank, and more general link-based analysis. Does the authority or popularity of a web site imply that its information is accurate?

6 Introduction. Veracity problem – when multiple web sites provide conflicting facts, discover the true fact about each object.

7 Problem Definitions. Definition 1: Confidence of facts. – The probability of a fact f being correct, denoted by s(f). Definition 2: Trustworthiness of web sites. – The expected confidence of the facts provided by a web site w, denoted by t(w).

8 Problem Definitions. Facts may conflict with or support each other. – Ex: “Jennifer Widom” and “J. Widom”. Concept of implication – imp(f1 → f2): f1’s influence on f2’s confidence.

9 Basic heuristics. 1. Usually there is only one true fact for a property of an object. 2. This true fact appears to be the same or similar on different web sites.

10 Basic heuristics (cont.). 3. The false facts on different web sites are less likely to be the same or similar. 4. In a certain domain, a web site that provides mostly true facts for many objects will likely provide true facts for other objects.

11 Web Site Trustworthiness and Fact Confidence. Trustworthiness t(w) is the average confidence of the facts provided by w: t(w) = Σ_{f ∈ F(w)} s(f) / |F(w)|, where F(w) is the set of facts provided by w.
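
Following Definition 2 (trustworthiness is the expected confidence of the facts a site provides), t(w) can be sketched as a plain average over F(w). The function name and the example confidences below are illustrative assumptions, not part of the slides.

```python
def trustworthiness(confidences):
    """t(w): mean confidence s(f) over the facts F(w) provided by one site."""
    if not confidences:
        raise ValueError("F(w) is empty: the site provides no facts")
    return sum(confidences) / len(confidences)


# Hypothetical site whose three facts have confidences 0.9, 0.8 and 0.4.
t_w = trustworthiness([0.9, 0.8, 0.4])
```

A site with a few low-confidence facts is pulled down in proportion to how many of its facts they make up.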

12 Web Site Trustworthiness and Fact Confidence. It is more difficult to estimate the confidence of a fact.

13 Web Site Trustworthiness and Fact Confidence. Simple case – f1 is the only fact about object o1. – Assume w1 and w2 are independent. Confidence s(f) = 1 − Π_{w ∈ W(f)} (1 − t(w)), where W(f) is the set of web sites providing f.
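
Under the independence assumption of the simple case, f is wrong only if every site providing it is wrong, so its confidence is one minus the product of the sites' error probabilities. The sketch below assumes that reading; the function name and the trust values 0.9 and 0.8 are made up for illustration.

```python
import math


def confidence_simple(trust_values):
    """s(f) = 1 - prod over w in W(f) of (1 - t(w)), assuming independent sites."""
    return 1.0 - math.prod(1.0 - t for t in trust_values)


# Hypothetical: f1 is provided by w1 with t(w1) = 0.9 and w2 with t(w2) = 0.8.
s_f1 = confidence_simple([0.9, 0.8])
```

Every additional independent site can only raise the confidence, which is why the later slides need a correction for sites that copy from each other.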

14 Web Site Trustworthiness and Fact Confidence. Trustworthiness score of a web site: τ(w) = −ln(1 − t(w)). τ(w) is between 0 and +∞ and better characterizes how accurate w is. – Ex: t(w1) = 0.9, t(w2) = 0.99 → t(w2) = 1.1 × t(w1), but τ(w2) = 2 × τ(w1).
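
The example can be checked numerically. The sketch assumes the score is τ(w) = −ln(1 − t(w)), which is consistent with the ratio of 2 quoted for t(w1) = 0.9 and t(w2) = 0.99; the function name is an illustrative choice.

```python
import math


def trust_score(t):
    """tau(w) = -ln(1 - t(w)); maps t in [0, 1) onto [0, +inf)."""
    if not 0.0 <= t < 1.0:
        raise ValueError("t(w) must lie in [0, 1)")
    return -math.log(1.0 - t)


tau_w1 = trust_score(0.9)
tau_w2 = trust_score(0.99)
ratio = tau_w2 / tau_w1  # compare with t(w2) / t(w1), which is only 1.1
```

On this scale a site with a tenth of the error rate gets twice the score, whereas the raw probabilities differ by only 10%.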

15 Web Site Trustworthiness and Fact Confidence. Confidence score of a fact: σ(f) = −ln(1 − s(f)). – Property: σ(f) = Σ_{w ∈ W(f)} τ(w).
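
The additive property can be derived in a few lines. This is a sketch under two assumptions: the confidence score is defined analogously to the trustworthiness score, and the simple-case independence formula for s(f) holds.

```latex
\begin{align*}
s(f)      &= 1 - \prod_{w \in W(f)} \bigl(1 - t(w)\bigr)
             && \text{(independent sites, assumed)} \\
\sigma(f) &= -\ln\bigl(1 - s(f)\bigr)
             && \text{(confidence score, assumed definition)} \\
          &= -\ln \prod_{w \in W(f)} \bigl(1 - t(w)\bigr) \\
          &= \sum_{w \in W(f)} -\ln\bigl(1 - t(w)\bigr)
           = \sum_{w \in W(f)} \tau(w)
\end{align*}
```

Working with scores turns the product over sites into a sum, which is also numerically safer when many sites have trustworthiness close to 1.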

16 Web Site Trustworthiness and Fact Confidence. Adjusted confidence score of a fact f: σ*(f) = σ(f) + ρ · Σ_{f′ about the same object, f′ ≠ f} σ(f′) · imp(f′ → f).
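
A minimal sketch of the adjustment, assuming it adds to σ(f) the scores of the other facts about the same object, each weighted by its implication on f and scaled by ρ. The score values, the implication values, and the third fact ("Some Other Author") are hypothetical; all three facts are taken to describe the same object.

```python
def adjusted_score(f, sigma, imp, rho=0.5):
    """sigma*(f) = sigma(f) + rho * sum of sigma(g) * imp(g -> f) over g != f."""
    related = sum(
        sigma[g] * imp.get((g, f), 0.0)  # a missing pair means no influence
        for g in sigma
        if g != f
    )
    return sigma[f] + rho * related


# Hypothetical confidence scores for three facts about one book's author.
sigma = {"J. Widom": 2.0, "Jennifer Widom": 3.0, "Some Other Author": 1.0}
# Hypothetical implications: positive = supportive, negative = conflicting.
imp = {
    ("Jennifer Widom", "J. Widom"): 0.8,
    ("J. Widom", "Jennifer Widom"): 0.4,
    ("Some Other Author", "J. Widom"): -0.5,
    ("Some Other Author", "Jennifer Widom"): -0.5,
    ("J. Widom", "Some Other Author"): -0.5,
    ("Jennifer Widom", "Some Other Author"): -0.5,
}
adjusted = {f: adjusted_score(f, sigma, imp) for f in sigma}
```

With these made-up numbers the conflicting fact ends up with a negative adjusted score, which is exactly the situation the negative-confidence slide addresses.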

17 Web Site Trustworthiness and Fact Confidence. Compute the confidence of f from σ*(f) in the same way as from σ(f): s*(f) = 1 − e^(−σ*(f)). This assumes that different web sites are independent → incorrect! → Add a dampening factor γ, 0 < γ < 1: s*(f) = 1 − e^(−γ · σ*(f)).

18 Web Site Trustworthiness and Fact Confidence. Negative-confidence problem – a fact f conflicting with some facts provided by trustworthy web sites → σ*(f) < 0 and s*(f) < 0 → unreasonable! Use s(f) = 1 / (1 + e^(−γ · σ*(f))) instead. – If γ · σ*(f) is well above 0, s(f) is very close to s*(f). – If γ · σ*(f) is below 0, s(f) is close to zero but still positive.
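
The two behaviours can be compared directly. The sketch assumes the dampened confidence is s*(f) = 1 − e^(−γ·σ*(f)) and that the corrected confidence is the logistic function of γ·σ*(f); the function names and the sample scores 20 and −5 are illustrative.

```python
import math


def s_star(sigma_star, gamma=0.3):
    """Dampened confidence 1 - exp(-gamma * sigma*); negative when sigma* < 0."""
    return 1.0 - math.exp(-gamma * sigma_star)


def s_logistic(sigma_star, gamma=0.3):
    """Logistic confidence 1 / (1 + exp(-gamma * sigma*)); always in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-gamma * sigma_star))


# One strongly supported fact and one strongly contradicted fact.
for sigma_star in (20.0, -5.0):
    print(sigma_star, s_star(sigma_star), s_logistic(sigma_star))
```

For the large positive score the two values nearly coincide, while for the negative score only the logistic form stays a valid probability.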

19 Iterative Computation. TruthFinder – an iterative method. – Initially, TruthFinder has little information about the web sites and the facts. – In each iteration, it improves its knowledge about trustworthiness and confidence. – It stops when the computation reaches a stable state.
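
The loop can be sketched as below. This is not the authors' implementation: it ignores implication between facts, starts every site at an assumed trustworthiness t0 = 0.9, and treats "stable state" as the largest change in t(w) falling below a tolerance. All names, the initial value, and the stopping rule are assumptions made for illustration.

```python
import math


def truthfinder_sketch(claims, gamma=0.3, t0=0.9, tol=1e-6, max_iter=100):
    """claims maps each site to the non-empty set of facts it provides.

    Returns (t, s): trustworthiness per site and confidence per fact.
    """
    # W(f): the sites providing each fact.
    sites_of = {}
    for w, facts in claims.items():
        for f in facts:
            sites_of.setdefault(f, set()).add(w)

    t = {w: t0 for w in claims}
    s = {}
    for _ in range(max_iter):
        # Trustworthiness -> score tau(w); clamp so that log(0) cannot occur.
        tau = {w: -math.log(1.0 - min(t[w], 1.0 - 1e-12)) for w in claims}
        # sigma(f) is the sum of tau over W(f); map it back to a probability
        # with the dampened logistic function.
        s = {
            f: 1.0 / (1.0 + math.exp(-gamma * sum(tau[w] for w in ws)))
            for f, ws in sites_of.items()
        }
        # New trustworthiness: average confidence of the site's facts.
        new_t = {w: sum(s[f] for f in claims[w]) / len(claims[w]) for w in claims}
        delta = max(abs(new_t[w] - t[w]) for w in claims)
        t = new_t
        if delta < tol:
            break
    return t, s


# Hypothetical data: two objects (a, b), each with two competing facts.
example_claims = {
    "w1": {"a1", "b1"},
    "w2": {"a1", "b1"},
    "w3": {"a1", "b2"},
    "w4": {"a2", "b1"},
}
t, s = truthfinder_sketch(example_claims)
```

In this toy input the facts backed by three sites end with higher confidence than their single-site rivals, and the sites that provide only well-supported facts end with higher trustworthiness.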

20 Empirical Study. Compare with VOTING – which chooses the fact that is provided by the most web sites. Setup: Intel PC with a 1.66 GHz dual-core processor, 1 GB memory, Windows XP Professional. Parameters: ρ = 0.5 and γ = 0.3.
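
The VOTING baseline is simple enough to sketch in a few lines. The input layout (one mapping from site to claimed fact per object) and the sample claims are assumptions; ties are broken arbitrarily here, since the slides do not say how they are handled.

```python
from collections import Counter


def voting(claims_for_object):
    """Return the fact asserted by the most web sites for a single object."""
    counts = Counter(claims_for_object.values())
    fact, _ = counts.most_common(1)[0]
    return fact


# Hypothetical claims about the author of one book.
winner = voting({
    "w1": "Jennifer Widom",
    "w2": "Jennifer Widom",
    "w3": "J. Widom",
})
```

Unlike TruthFinder, this baseline gives every site the same weight and treats similar facts such as the two name variants as unrelated.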

21-24 Empirical Study (result figures; not included in the transcript)

25 Conclusions. Introduce and formulate the veracity problem – resolving conflicting facts from multiple web sites. – finding the true facts among them. Propose TruthFinder – utilizes web site trustworthiness and fact confidence to find trustworthy web sites and true facts. Experiments show that it achieves high accuracy.

