22 May 2006 Wu, Goel and Davison Models of Trust for the Web (MTW) WWW2006 Workshop LEHIGH UNIVERSITY.


1 22 May 2006 Wu, Goel and Davison Models of Trust for the Web (MTW) WWW2006 Workshop LEHIGH UNIVERSITY

2 Introduction: Web Search
• Web search: the access to the Web for hundreds of millions of people
• Hundreds of millions of queries per day
• Queries + people = TRAFFIC
• A HUGE incentive for web site owners to rank highly in search engine results
  - Communicate some message (advertising, political statement)
  - Install viruses, adware, etc.
Google, Yahoo!, MSN Search, Ask, A9, Exalead, Gigablast + metasearch + many more!

3 Introduction: Web Spam
• a.k.a. search engine spam, spamdexing
• Any technique to manipulate search engine results
  - Target page gets an undeservedly higher ranking
• Many methods
  - Link farms, keyword stuffing, cloaking, link bombs, and more
• The target of much of our work!

4 Propagating Trust and Distrust to Demote Web Spam
Baoning Wu, Vinay Goel, and Brian D. Davison
Computer Science & Engineering
Lehigh University
Bethlehem, PA USA

5 Outline
• Background and motivation
• Proposed methods
• Experimental results

6 Background: PageRank
• (Page and Brin, 1998)
• Uses number and status of “parents” to determine status of child
• $r^{(i+1)} = (1-\alpha) \cdot T \cdot r^{(i)} + \alpha \cdot s$
  - r: PageRank score vector (with N nodes)
  - T: transition matrix (NxN)
  - (1-α): decay factor; α: jump probability
  - s: uniform distribution of 1/N
• PageRank scores generate a ranking of node importance
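The PageRank iteration on this slide can be sketched in plain Python. This is only a minimal illustration, assuming a tiny hypothetical graph stored as an adjacency dict, a transition matrix T in which each parent splits its score equally over its out-links, and no dangling nodes. The names `pagerank`, `out_links` and `graph` are ours, not from the slides.

```python
def pagerank(out_links, alpha=0.15, iterations=50):
    """Iterate r = (1 - alpha) * T * r + alpha * s with a uniform s.

    out_links maps each node to the list of nodes it links to.
    Assumption: every node has at least one out-link (no dangling nodes).
    """
    nodes = list(out_links)
    n = len(nodes)
    r = {v: 1.0 / n for v in nodes}          # start from the uniform vector
    for _ in range(iterations):
        nxt = {v: alpha / n for v in nodes}  # alpha * s, with s = 1/N everywhere
        for parent, children in out_links.items():
            share = (1 - alpha) * r[parent] / len(children)
            for child in children:
                nxt[child] += share          # the (1 - alpha) * T * r term
        r = nxt
    return r


# Hypothetical 3-node graph: A and B both link to C, and C links back to A.
graph = {"A": ["C"], "B": ["C"], "C": ["A"]}
scores = pagerank(graph)
```

In this toy graph C has two parents and ends up with the highest score, while B, which has no parents, keeps only the jump contribution α/N.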

7 Background: TrustRank
• (Gyongyi and Garcia-Molina, VLDB 2004)
• Uses number and trust of “parents” to determine trust status of child
• $t^{(i+1)} = (1-\alpha) \cdot T \cdot t^{(i)} + \alpha \cdot s$
  - t: TrustRank score vector (with N nodes)
  - T: transition matrix (NxN)
  - (1-α): decay factor
  - s: seed set trust score distribution (vector of size N, but only seed nodes are non-zero)
• Demotes web spam by propagating trust from a known good seed set.
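The only change from PageRank is the vector s, which is non-zero only on the seed nodes. A minimal sketch of that idea follows, assuming a hypothetical four-node graph, an equal share of the seed mass for each seed, and that trust arriving at a node with no out-links is simply dropped. The names `trustrank`, `seeds` and `graph` are illustrative, not from the slides.

```python
def trustrank(out_links, seeds, alpha=0.15, iterations=50):
    """Iterate t = (1 - alpha) * T * t + alpha * s, with s concentrated on seeds.

    out_links maps each node to its out-links; every linked node must be a key.
    """
    nodes = list(out_links)
    # s: vector of size N, non-zero only for seed nodes (assumed uniform over seeds)
    s = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    t = dict(s)
    for _ in range(iterations):
        nxt = {v: alpha * s[v] for v in nodes}
        for parent, children in out_links.items():
            if not children:
                continue  # assumption: trust reaching a dangling node is dropped
            share = (1 - alpha) * t[parent] / len(children)
            for child in children:
                nxt[child] += share
        t = nxt
    return t


# Hypothetical chain: good -> mid -> far, plus an unrelated page "spam" -> far.
graph = {"good": ["mid"], "mid": ["far"], "far": [], "spam": ["far"]}
trust = trustrank(graph, seeds={"good"})
```

Here trust decays along the chain from the seed, and the page that no trusted page links to receives no trust at all.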

8 Specific Motivation
• In TrustRank
  - Parent divides its trust among its children.
  - This may not be optimal: real-world trust relationships are independent of the number of trusted entities.
• Distrust can also be propagated.
[Figure: nodes A and B; labels: Hyperlink, Trust Propagation, Distrust Propagation]

9 Key steps in propagation
• Decay of trust (d)
  - Trust is not perfectly transitive.
• Splitting of trust
  - For each parent, how to divide its score among its children.
• Accumulation of trust
  - For each child, how to accumulate the overall score given the portions from all of its parents.

10 Outline
• Background and motivation
• Proposed methods
• Experimental results

11 Choices for Trust Splitting
• Given a node i with trust score TR(i) and O(i) outgoing links:
  - Equal splitting: gives d*TR(i)/O(i) to each child (used by TrustRank)
  - Constant splitting: gives d*TR(i) to each child
  - Logarithmic splitting: gives d*TR(i)/log(1+O(i)) to each child
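The three splitting rules translate directly into small functions. The sketch below assumes the natural logarithm, since the slide does not state the base. The function names and the sample values (TR(i) = 1.0, O(i) = 10, d = 0.9) are ours.

```python
import math


def equal_split(tr, out_degree, d):
    """Each child gets d * TR(i) / O(i); this is the TrustRank rule."""
    return d * tr / out_degree


def constant_split(tr, out_degree, d):
    """Each child gets d * TR(i), regardless of how many children exist."""
    return d * tr


def log_split(tr, out_degree, d):
    """Each child gets d * TR(i) / log(1 + O(i)); natural log is assumed."""
    return d * tr / math.log(1 + out_degree)


# Hypothetical parent with trust 1.0, 10 outgoing links, and decay 0.9.
shares = {
    "equal": equal_split(1.0, 10, 0.9),
    "constant": constant_split(1.0, 10, 0.9),
    "log": log_split(1.0, 10, 0.9),
}
```

For a parent with many out-links, the logarithmic share falls between the equal share and the constant share, which is the intuition behind offering it as a middle ground.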

12 Choices for Trust Accumulation
• Simple summation
  - Sum the trust values from each parent
• Maximum share
  - Use the maximum of the trust values sent by the parents
• Maximum parent
  - Sum the trust values but never exceed the trust score of most-trusted parent
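Each accumulation rule is a different way of reducing the shares a child receives to a single score. A minimal sketch, assuming the child's incoming shares and its parents' own trust scores are available as lists; the function names and sample numbers are illustrative.

```python
def simple_summation(shares, parent_scores):
    """Sum the trust values received from all parents."""
    return sum(shares)


def maximum_share(shares, parent_scores):
    """Keep only the largest trust value sent by any parent."""
    return max(shares)


def maximum_parent(shares, parent_scores):
    """Sum the shares, capped at the trust score of the most-trusted parent."""
    return min(sum(shares), max(parent_scores))


# Hypothetical child with three parents.
shares = [0.3, 0.4, 0.2]         # what each parent sends to the child
parent_scores = [0.5, 0.6, 0.4]  # the parents' own trust scores

summed = simple_summation(shares, parent_scores)
largest = maximum_share(shares, parent_scores)
capped = maximum_parent(shares, parent_scores)
```

With these numbers the plain sum (about 0.9) exceeds every parent's own trust, so the maximum-parent rule caps the child at 0.6, while maximum share keeps only 0.4.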

13 Propagating Distrust
• Distrust can be propagated from a seed set of bad nodes.
• Similar to trust propagation, but in reverse: follow incoming links, not outgoing links
• Same key choices for decay, splitting and accumulation
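Following incoming links means that a page linking to a distrusted page inherits part of that distrust. The sketch below picks one combination arbitrarily for illustration (constant splitting with maximum-share accumulation) and assumes a seed distrust of 1.0 and decay d = 0.5. None of these values, nor the names `reverse_graph` and `propagate_distrust`, come from the slides.

```python
def reverse_graph(out_links):
    """Build child -> [parents] from parent -> [children]."""
    in_links = {v: [] for v in out_links}
    for parent, children in out_links.items():
        for child in children:
            in_links[child].append(parent)
    return in_links


def propagate_distrust(out_links, bad_seeds, d=0.5, rounds=10):
    """Push distrust from bad seeds backwards along hyperlinks.

    Assumes constant splitting (each parent receives d * score) and
    maximum-share accumulation (a parent keeps the largest value received).
    Every linked node must also appear as a key of out_links.
    """
    in_links = reverse_graph(out_links)
    distrust = {v: (1.0 if v in bad_seeds else 0.0) for v in out_links}
    for _ in range(rounds):
        nxt = dict(distrust)
        for node, parents in in_links.items():
            for parent in parents:
                nxt[parent] = max(nxt[parent], d * distrust[node])
        distrust = nxt
    return distrust


# Hypothetical chain a -> b -> spam -> c, with "spam" as the only bad seed.
graph = {"a": ["b"], "b": ["spam"], "spam": ["c"], "c": []}
dis = propagate_distrust(graph, {"spam"})
```

Pages a and b, which lead to the spam page, pick up decayed distrust, whereas c, which is only linked from the spam page, picks up none.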

14 Combining Trust and Distrust
• For each node i with trust score TR(i) and distrust score DIS_TR(i), the combined score Total(i) can be
  $Total(i) = \eta \cdot TR(i) - \beta \cdot DIS\_TR(i)$
  where 0 ≤ η ≤ 1, 0 ≤ β ≤ 1
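The combination is a weighted difference of the two scores. A minimal sketch, assuming scores are held in dicts and that a node missing from one dict has score 0 there; the function name `combine` and the sample scores are hypothetical.

```python
def combine(trust, distrust, eta=0.5, beta=0.5):
    """Total(i) = eta * TR(i) - beta * DIS_TR(i), with eta and beta in [0, 1]."""
    if not (0 <= eta <= 1 and 0 <= beta <= 1):
        raise ValueError("eta and beta must both lie in [0, 1]")
    nodes = trust.keys() | distrust.keys()
    # Assumption: a node absent from a dict has score 0 for that quantity.
    return {v: eta * trust.get(v, 0.0) - beta * distrust.get(v, 0.0) for v in nodes}


# Hypothetical scores for three sites.
trust = {"good": 0.8, "mixed": 0.4}
distrust = {"mixed": 0.6, "spam": 1.0}

total = combine(trust, distrust, eta=0.5, beta=0.5)
ranking = sorted(total, key=total.get, reverse=True)
```

Sorting by the combined score places the site with both trust and distrust between the clearly good and clearly bad ones.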

15 Outline
• Background and motivation
• Proposed methods
• Experimental results

16 Data set
• 20M pages from the Swiss search engine [search.ch] in 2004
• 350K sites with “.ch” domain
  - We used only this site graph
• Seed sets
  - 3,589 sites labeled as using web spam with various techniques (provided)
  - 20,005 sites with pages in dir.search.ch topics as trusted set

17 Experimental Design
• Explore various combinations of trust and distrust propagation
• Evaluation
  - TrustRank's performance was measured as the number of spam sites found among the highest-ranked ~1% of sites.
  - We use the same metric in this work.
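The evaluation metric counts labeled spam sites among the top-ranked fraction of sites, so a lower count is better. A minimal sketch, assuming scores are held in a dict and the cut-off is exactly 1% rounded down (at least one site); the function name `spam_in_top` and the synthetic data are ours.

```python
def spam_in_top(scores, spam_sites, fraction=0.01):
    """Count labeled spam sites among the highest-ranked fraction of sites."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, int(len(ranked) * fraction))  # size of the top bucket
    return sum(1 for site in ranked[:k] if site in spam_sites)


# Synthetic example: 200 sites, where site0 has the highest score.
scores = {f"site{i}": 200 - i for i in range(200)}
spam_sites = {"site1", "site150"}

found = spam_in_top(scores, spam_sites)
```

With 200 sites the top 1% holds two sites, and only one of the two labeled spam sites falls inside it.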

18 Baseline result
Algorithm                                 Num. spam sites
PageRank                                  90
TrustRank                                 58
Topical TrustRank (Wu et al., WWW2006)    33-42

19 Simple TrustRank Improvement: Increase jump probability (α)
[Figure: plot against jump probability α; default α=0.15]

20 Other trust propagation methods
Columns: Constant Splitting (Decay = 0.1, 0.3, 0.7, 0.9), Logarithmic Splitting (Decay = 0.1, 0.3, 0.7, 0.9)
Simple Summation: 364
Maximum Share: 34 13122018
Maximum Parent: 273233 372272932

21 Results of propagating distrust
Combined equally with TrustRank, 200 seeds
Columns: Constant Splitting (d_Distrust = 0.1, 0.3, 0.7, 0.9), Logarithmic Splitting (d_Distrust = 0.1, 0.3, 0.7, 0.9)
Simple Summation: 53 55 5753
Maximum Share: 53 595352
Maximum Parent: 53 5753

22 Combining trust and distrust
Using best scoring trust and distrust formulations, β = (1 - η)
[Figure: axis labeled from (Distrust Only) to (Trust Only); one value marked >2200]

23 Coverage of trust propagation
Percentage of sites affected by approach. TrustRank reached 76.05%.
Decay: 0.1, 0.3, 0.7, 0.9 (for each splitting method)
Maximum Share: Constant Splitting 77.71, 77.73, 77.74; Logarithmic Splitting 77.19, 77.72, 77.73
Maximum Parent: Constant Splitting 77.52, 77.71, 77.73, 77.74; Logarithmic Splitting 76.93, 77.60, 77.71, 77.72

24 Conclusions
• Propagating trust based on outdegree does not appear to be optimal.
• Alternative splitting and accumulation methods can help to demote top ranked spam sites.
• Propagating distrust can also help to demote top ranked spam sites.
• Additional tests needed! E.g., to examine impact on retrieval

25 Thank You! Questions?
Contact Info:
Dr. Brian D. Davison
davison(at)cse.lehigh.edu
WUME Laboratory
Computer Science and Engineering
Lehigh University
Bethlehem, PA 18015 USA
The WUME Lab http://wume.cse.lehigh.edu/

