Shuffling a Stacked Deck: The Case for Partially Randomized Ranking of Search Engine Results
Sandeep Pandey (Carnegie Mellon), Sourashis Roy (UCLA), Christopher Olston (Carnegie Mellon), Junghoo Cho (UCLA), Soumen Chakrabarti (IIT Bombay)


1 Shuffling a Stacked Deck: The Case for Partially Randomized Ranking of Search Engine Results
Sandeep Pandey (Carnegie Mellon), Sourashis Roy (UCLA), Christopher Olston (Carnegie Mellon), Junghoo Cho (UCLA), Soumen Chakrabarti (IIT Bombay)

2 Popularity as a Surrogate for Quality
@ Carnegie Mellon Databases
Search engines want to measure the “quality” of pages
Quality is hard to define and measure
Various “popularity” measures are used in ranking
– e.g., in-links, PageRank, user traffic
[Figure: ranked result list, positions 1 to 3]

3 Relationship Between Popularity and Quality
Popularity: depends on the number of users who “like” a page
– relies on both quality and awareness of the page
Popularity is different from quality
– But strongly correlated when awareness is large
[Figure: users aware of page p, and users who like page p]
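One way to make this relationship concrete is sketched below. It assumes that awareness A(p,t) is the fraction of users aware of page p at time t, and that quality Q(p) is the fraction of aware users who like p. The product form is an assumption used for illustration; the slide itself only says that popularity relies on both quantities.

```latex
P(p,t) = A(p,t) \cdot Q(p), \qquad 0 \le A(p,t) \le 1, \quad 0 \le Q(p) \le 1
```

Under this assumption P(p,t) approaches Q(p) as A(p,t) approaches 1, which is consistent with popularity and quality being strongly correlated when awareness is large.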

4 Problem
Popularity/quality correlation is weak for young pages
– Even if of high quality, a page may not (yet) be popular due to lack of user awareness
Plus, the process of gaining popularity is inhibited by the “entrenchment effect”
– [Cho et al. WWW’04], [Chakrabarti et al. SODA’05], [Mowshowitz et al. Communication’02] and many others

5 Entrenchment Effect
Search engines show entrenched (already-popular) pages at the top
Users discover pages via search engines; tend to focus on top results
[Figure: ranked result list, positions 1 to 6 and beyond; user attention falls on the entrenched pages at the top, new unpopular pages sit at the bottom]

6 Outline
Problem introduction
Key idea: Mitigate entrenchment by introducing randomness into ranking
– Randomized Rank Promotion Scheme
– Model of ranking and popularity evolution
– Evaluation
Summary

7 Alternative Approaches to Counteract the Entrenchment Effect
Weight links to young pages more
– [Baeza-Yates et al. SPIRE ’02]
– Proposed an age-based variant of PageRank
Extrapolate quality based on increase in popularity
– [Cho et al. SIGMOD ’05]
– Proposed an estimate of quality based on the derivative of popularity

8 Our Approach: Randomized Rank Promotion
Select random (young) pages to promote to good rank positions
Rank position to promote to is chosen at random
[Figure: original ranking 1 2 3 500 501.. next to reordered ranking 1 500 2 499 501.. 3]

9 Our Approach: Randomized Rank Promotion
Consequence: Users visit promoted pages, which improves the ability to estimate quality via popularity
Compared with previous approaches:
– Does not rely on temporal measurements (+)
– Sub-optimal (-)

10 Exploration/Exploitation Tradeoff
Exploration/exploitation tradeoff
– exploit known high-quality pages by assigning good rank positions
– explore quality of new pages by promoting them in rank
Existing search engines only exploit (to our knowledge)

11 Possible Objectives for Rank Promotion
Fairness
– Give each page an equal chance to become popular
– Incentive for search engines to be fair?
Quality (our choice)
– Maximize quality of search results seen by users (in aggregate)
– Quality of page p: extent to which users “like” p
– Q(p) ∈ [0,1]

12 Quality-Per-Click Metric (QPC)
V(p,t): number of visits made to page p at time t through the search engine
QPC: average quality of pages viewed by users, amortized over time
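The QPC definition above can be sketched as a visit-weighted average of Q(p). This is a minimal sketch over a finite observation window, not the authors' exact formula: the function name `quality_per_click`, the dictionary layout, and the sample numbers are all hypothetical.

```python
from typing import Dict, List


def quality_per_click(visits: List[Dict[str, float]],
                      quality: Dict[str, float]) -> float:
    """Visit-weighted average quality over all recorded time steps.

    visits[t][p] plays the role of V(p, t); quality[p] plays the role of Q(p).
    A finite window stands in for "amortized over time" (an assumption).
    """
    total_visits = 0.0
    weighted_quality = 0.0
    for step in visits:
        for page, count in step.items():
            total_visits += count
            weighted_quality += count * quality[page]
    if total_visits == 0:
        raise ValueError("no visits recorded")
    return weighted_quality / total_visits


# Hypothetical data: two pages observed over two time steps.
sample_quality = {"a": 0.9, "b": 0.3}
sample_visits = [{"a": 10, "b": 30}, {"a": 30, "b": 10}]
print(quality_per_click(sample_visits, sample_quality))
```

Weighting by visits means a low-quality page only hurts QPC to the extent that users actually click on it.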

13 Outline
Problem introduction
Key idea: Mitigate entrenchment by introducing randomness into ranking
– Randomized Rank Promotion Scheme
– Model of ranking and popularity evolution
– Evaluation
Summary

14 Desiderata for Randomized Rank Promotion
Want ability to:
– Control exploration/exploitation tradeoff
– “Select” certain pages as candidates for promotion
– “Protect” certain pages from demotion
[Figure: original ranking 1 2 3 500 501.. next to reordered ranking 1 500 2 499 501.. 3]

15 Randomized Rank Promotion Scheme
[Figure: the page set W is split into a promotion pool W_m and a remainder W − W_m; the promotion pool is put in random order to form list L_m, and the remainder is ordered by popularity to form list L_d]

16 Randomized Rank Promotion Scheme
[Figure: the remainder list L_d and the promotion list L_m are merged into the final result list, with labels k−1, r, and 1−r; example shown with k = 3 and r = 0.5]
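The two scheme slides can be read as a merge of two lists. The sketch below assumes one plausible reading of the figure labels: the top k−1 results come from the popularity-ordered list L_d, and each later position is filled from the randomly ordered promotion list L_m with probability r, otherwise from L_d. The function name `promote` and its signature are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence, Set


def promote(pages: Sequence[str], popularity: Dict[str, float],
            pool: Set[str], k: int, r: float,
            rng: Optional[random.Random] = None) -> List[str]:
    """Merge a random promotion list into a popularity ranking."""
    if k < 1 or not 0.0 <= r <= 1.0:
        raise ValueError("need k >= 1 and 0 <= r <= 1")
    rng = rng or random.Random()

    # L_m: promotion pool in random order.
    l_m = [p for p in pages if p in pool]
    rng.shuffle(l_m)
    # L_d: remaining pages by descending popularity.
    l_d = sorted((p for p in pages if p not in pool),
                 key=lambda p: popularity[p], reverse=True)

    # Positions 1 .. k-1 are protected: taken from L_d unchanged.
    merged = l_d[:k - 1]
    rest = l_d[k - 1:]

    i = j = 0
    while i < len(l_m) or j < len(rest):
        # Take from L_m with probability r, or when L_d is exhausted.
        take_promoted = j >= len(rest) or (i < len(l_m) and rng.random() < r)
        if take_promoted:
            merged.append(l_m[i])
            i += 1
        else:
            merged.append(rest[j])
            j += 1
    return merged
```

With r = 0 the result is the plain popularity ranking followed by the pool; raising r moves pool pages up, and raising k protects more of the top positions.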

17 Parameters
Promotion pool (W_m)
– Uniform rank promotion: give an equal chance to each page
– Selective rank promotion: exclusively target zero-awareness pages
Start rank (k)
– rank to start randomization from
Degree of randomization (r)
– controls the tradeoff between exploration and exploitation
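The two promotion-pool rules above can be sketched as two small selection functions. This is an illustration under assumptions: the names `uniform_pool` and `selective_pool` are hypothetical, and the per-page inclusion probability `include_prob` is an assumed knob that the slide does not specify.

```python
import random
from typing import Dict, Iterable, Optional, Set


def uniform_pool(pages: Iterable[str], include_prob: float,
                 rng: Optional[random.Random] = None) -> Set[str]:
    """Uniform rule: every page gets the same chance of entering W_m."""
    rng = rng or random.Random()
    return {p for p in pages if rng.random() < include_prob}


def selective_pool(awareness: Dict[str, float]) -> Set[str]:
    """Selective rule: only pages with zero awareness enter W_m."""
    return {p for p, a in awareness.items() if a == 0}


# Hypothetical awareness values for three pages.
print(selective_pool({"old": 0.6, "mid": 0.1, "new": 0.0}))
```

The selective rule needs an awareness estimate per page, while the uniform rule needs no per-page information at all.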

18 Tuning the Parameters
Objective: maximize quality-per-click (QPC)
Two ways to tune
– Real-world experiment
– Analytical modeling

19 Outline
Problem introduction
Key idea: Mitigate entrenchment by introducing randomness into ranking
– Randomized Rank Promotion Scheme
– Model of ranking and popularity evolution
– Evaluation
Summary

20 Popularity Evolution Cycle
[Figure: cycle linking Popularity P(p,t), Rank R(p,t), Awareness A(p,t), and Visit rate V(p,t)]

21 Popularity Evolution Cycle
Awareness A(p,t) → Popularity P(p,t) via F_AP(A(p,t))
Popularity P(p,t) → Rank R(p,t) via F_PR(P(p,t))
Rank R(p,t) → Visit rate V(p,t) via F_RV(R(p,t))
Visit rate V(p,t) → Awareness A(p,t) via F_VA(V(p,t))
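One turn of this cycle can be sketched as a single simulation step. The slide names the four mappings but not their forms, so every functional form below is an assumption chosen for illustration: popularity as awareness times quality, rank by descending popularity, visits following a power law in rank, and each visit coming from a uniformly random user. The function name `cycle_step` and its parameters are hypothetical.

```python
from typing import Dict


def cycle_step(awareness: Dict[str, float], quality: Dict[str, float],
               visits_per_step: float = 100.0, num_users: int = 1000,
               exponent: float = 1.5) -> Dict[str, float]:
    """Advance awareness by one step of the popularity evolution cycle."""
    # F_AP (assumed): popularity = awareness * quality.
    popularity = {p: awareness[p] * quality[p] for p in awareness}

    # F_PR (assumed): rank 1 goes to the most popular page.
    ranked = sorted(popularity, key=lambda p: popularity[p], reverse=True)
    rank = {p: i + 1 for i, p in enumerate(ranked)}

    # F_RV (assumed): visit share proportional to rank ** -exponent.
    weights = {p: rank[p] ** -exponent for p in ranked}
    norm = sum(weights.values())
    visits = {p: visits_per_step * weights[p] / norm for p in ranked}

    # F_VA (assumed): each visit is by a random user, so the unaware
    # fraction shrinks by (1 - 1/num_users) per visit.
    return {p: 1.0 - (1.0 - awareness[p]) * (1.0 - 1.0 / num_users) ** visits[p]
            for p in ranked}


# Hypothetical two-page example: an entrenched page and a better new page.
state = {"old": 0.5, "new": 0.0}
print(cycle_step(state, {"old": 0.4, "new": 0.9}))
```

In this toy setting the new page starts at the bottom rank despite its higher quality, so it receives fewer visits per step than the entrenched page, which is the entrenchment effect the deck describes.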

22 Deriving Popularity Evolution Curve
[Figure: popularity P(p,t) plotted against time t]
Next step: derive formula for popularity evolution curve
Assumptions
– Number of pages constant
– Pages are created and retired according to a Poisson process with rate parameter λ
– Quality distribution of pages is stationary

23 Deriving Popularity Evolution Curve
Doing the steady-state analysis, we get the popularity evolution curve [formula not preserved in the transcript; see paper]

24 Use Popularity Evolution Model to Tune Parameters
Model of popularity evolution process (see paper)
– Complex dynamic process
– To study, we combine approximate analysis with simulation
Next step: use model to tune rank promotion scheme
– Parameters: k, r and W_m
– Objective: maximize QPC

25 Tuning: Promotion Pool (W_m)
[Figure: comparison of no promotion, uniform promotion, and selective promotion, with k = 1 and r = 0.2]

26 Tuning: k and r
k: start rank
r: degree of randomization

27 Tuning: k and r
Maximize QPC (quality-per-click)
Avoid excessive “junk”
Preserve #1 result for navigational searches
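The three tuning goals above can be sketched as a constrained grid search. This is a hypothetical illustration, not the authors' procedure: `tune`, `evaluate_qpc`, `min_k`, and `max_r` are assumed names, with k ≥ 2 standing in for "preserve #1 result" and a cap on r standing in for "avoid excessive junk". The cap value is arbitrary.

```python
from itertools import product
from typing import Callable, Iterable, Tuple


def tune(k_values: Iterable[int], r_values: Iterable[float],
         evaluate_qpc: Callable[[int, float], float],
         min_k: int = 2, max_r: float = 0.2) -> Tuple[int, float, float]:
    """Return the (k, r, qpc) triple with the highest QPC among allowed settings."""
    best = None
    for k, r in product(k_values, r_values):
        # Skip settings that would randomize the #1 slot or promote too much.
        if k < min_k or r > max_r:
            continue
        qpc = evaluate_qpc(k, r)
        if best is None or qpc > best[2]:
            best = (k, r, qpc)
    if best is None:
        raise ValueError("no (k, r) setting satisfies the constraints")
    return best
```

Here `evaluate_qpc` would be supplied by the caller, for example a simulation of the popularity model or a live experiment.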

28 Model of the Web
Web = collection of multiple disjoint topic-specific communities (e.g., “Linux”, “Squash”, etc.)
A community is made up of a set of pages, interested users, and related queries
[Figure: two separate communities labeled Squash and Linux]

29 Robustness Across Different Web Communities

30 Summary
Entrenchment effect hurts search result quality
Solution: Randomized rank promotion
Model of Web evolution and QPC metric
– Used to tune & evaluate randomized rank promotion
Results:
– New high-quality pages become popular much faster
– Aggregate search result quality significantly improved

31 THE END
Paper available at: www.cs.cmu.edu/~spandey

