Searching  Google: page rank and anchor text  Hits: hubs and authorities  MSN’s Ranknet: learning to rank  Today’s web dragons.

1 Searching  Google: page rank and anchor text  Hits: hubs and authorities  MSN’s Ranknet: learning to rank  Today’s web dragons

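2 How to search: Google's PageRank
- PageRank
- Anchor text

$$\operatorname{rank}({\sim}\mathrm{me}) = \sum_{p \,\to\, {\sim}\mathrm{me}} \frac{\operatorname{rank}(p)}{\#\operatorname{outlinks}(p)}$$

More generally, with C(p,q) = 1 if page p links to page q, and o(p) the number of outlinks of p:

$$r(q) = \sum_p \frac{C(p,q)}{o(p)}\, r(p)$$

In matrix form $r = C r$, where C now holds the normalized entries $C(p,q)/o(p)$ (row q, column p): r is an eigenvector of C.
- Random surfer model
- Broken links (hence random jumps)
- Trapping states (adjust C)

[Figure: pages p1, p2 and p3 each link to ~me]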
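A minimal sketch of the random-surfer computation, assuming the common damping-factor formulation: with probability d the surfer follows an outlink, otherwise it jumps to a random page, and a page with no outlinks (a broken link or dead end) hands its rank to every page. The value d = 0.85, the names `pagerank` and `links`, and the toy graph are illustrative assumptions, not Google's actual parameters. Iterating r ← C r like this is power iteration, which converges to the eigenvector described on the slide.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=1000):
    """links: dict mapping each page to the list of pages it links to.

    d is an assumed damping factor: probability of following a link
    rather than jumping to a random page.
    """
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    r = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # rank held by pages with no outlinks is spread uniformly (random jump)
        dangling = sum(r[p] for p in pages if not links.get(p))
        new = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                # r(q) += r(p) / o(p) for every q that p links to
                share = d * r[p] / len(outs)
                for q in outs:
                    new[q] += share
        if sum(abs(new[p] - r[p]) for p in pages) < tol:
            return new
        r = new
    return r


# Toy graph echoing the slide's figure: p1, p2, p3 all link to ~me.
links = {"p1": ["~me"], "p2": ["~me", "p1"], "p3": ["~me"]}
for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

Spreading the rank of dead-end pages uniformly keeps the scores summing to 1, which is one simple way to handle the "broken links" case; a trapping state is handled by the same random jump.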

3 Chart of the web
- Terra incognita: 30% of nodes
- Milgram's continent: 30% of nodes
- Corporate continent: 20% of nodes
- New archipelago: 20% of nodes

Random surfer vs random searcher

4 Google search: anchor text
- PageRank
- Anchor text

[Figure: ~me says of itself "this is the best page ever"; a page by you links to ~me with the anchor text "that is the best page ever"]

Google uses:
- In anchor text?
- In URL?
- Title
- Meta tags
- Level
- Relative font size
- Capitalization
- Word position in document
- Secret ingredients

… and weights them according to a secret recipe.
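The key idea of anchor text is that the words on a link are credited to the page the link points to, not the page containing it. The sketch below shows that with a tiny inverted index. The field weights are invented for illustration (Google's recipe is secret, as the slide says), and `build_index`, `search` and the toy pages are hypothetical names and data.

```python
from collections import defaultdict

# Hypothetical field weights; the real weighting is not public.
FIELD_WEIGHTS = {"anchor": 3.0, "title": 2.0, "body": 1.0}


def build_index(pages, links):
    """pages: url -> {"title": str, "body": str};
    links: list of (source_url, target_url, anchor_text) triples."""
    index = defaultdict(lambda: defaultdict(float))  # term -> url -> weight
    for url, fields in pages.items():
        for field, text in fields.items():
            for term in text.lower().split():
                index[term][url] += FIELD_WEIGHTS[field]
    for _source, target, anchor in links:
        # anchor words describe the TARGET page, so index them under it
        for term in anchor.lower().split():
            index[term][target] += FIELD_WEIGHTS["anchor"]
    return index


def search(index, query):
    scores = defaultdict(float)
    for term in query.lower().split():
        for url, weight in index.get(term, {}).items():
            scores[url] += weight
    # highest score first; ties broken alphabetically by URL
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


pages = {
    "~me": {"title": "home", "body": "welcome to my page"},
    "you": {"title": "links", "body": "some pages I like"},
}
links = [("you", "~me", "the best page ever")]
index = build_index(pages, links)
print(search(index, "best"))
```

Here ~me is found for "best" even though that word never appears on ~me itself; in a real engine these per-field scores would be combined with PageRank.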

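5 HITS: hubs and authorities

$$\mathrm{hub}(x) = \sum_{p:\, x \to p} \mathrm{auth}(p) = \sum_p C(x,p)\, \mathrm{auth}(p)$$

So hub = C auth and, likewise, auth = C^T hub. Substituting, hub = C C^T hub (up to normalization): hub is an eigenvector of C C^T.
- Principal eigenvector → strongest community
- Other eigenvectors → other communities

[Figure: hubs linking to authorities]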
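A minimal power-iteration sketch of the two equations above, assuming C is given as a 0/1 adjacency matrix (C[x][p] = 1 if x links to p) in plain Python lists. Alternating auth = C^T hub and hub = C auth, with normalization each round, converges to the principal eigenvectors of C^T C and C C^T when the dominant eigenvalue is unique; the function name `hits` and the toy matrix are illustrative.

```python
import math


def hits(C, iters=100):
    """C[x][p] = 1 if page x links to page p (n x n list of lists)."""
    n = len(C)
    hub = [1.0] * n
    auth = [1.0] * n
    for _ in range(iters):
        # auth = C^T hub: authority sums the hub scores of pages linking in
        auth = [sum(C[x][p] * hub[x] for x in range(n)) for p in range(n)]
        # hub = C auth: hub score sums the authority of pages linked to
        hub = [sum(C[x][p] * auth[p] for p in range(n)) for x in range(n)]
        # only the direction matters, so rescale to unit length
        na = math.sqrt(sum(a * a for a in auth)) or 1.0
        nh = math.sqrt(sum(h * h for h in hub)) or 1.0
        auth = [a / na for a in auth]
        hub = [h / nh for h in hub]
    return hub, auth


# Pages 0 and 1 both list pages 2 and 3: 0, 1 are hubs, 2, 3 are authorities.
C = [[0, 0, 1, 1],
     [0, 0, 1, 1],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
hub, auth = hits(C)
print("hubs:", [round(h, 3) for h in hub])
print("authorities:", [round(a, 3) for a in auth])
```

Finding the weaker communities from the other eigenvectors would need a full eigendecomposition of C C^T rather than this single power iteration.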

6 Using HITS: Ask's Teoma
Web communities

[Figure: separate web communities for the query "jaguar"]

7 Query  neighborhood graph (search hits + neighbors) Using HITS: Ask’s Teoma Web communities jaguar jaguar jaguar jaguar jaguar Hub scores (lists of resources) Authority scores (target pages) helps to deal with synonyms pull in other relevant pages (e.g. Toyota is authority for “auto manufacturers” yet doesn’t contain the term)

8 Learning to rank: MSN's RankNet
- Training set: queries with matching documents, judged by humans
- Discriminant function: e.g. weighted sum of features, plus threshold
- Machine learning: learn the weights
- Apply to real queries

RankNet:
- 17,000 queries
- 10 documents/query
- Human judgement (1–5)
- 600 features
- Pairs of docs with the same query: which is more highly ranked?
- Train a neural net (1-layer, 2-layer)
- Results? Pretty good
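A minimal sketch of the pairwise idea, assuming the "1-layer" case (a linear scorer, i.e. a weighted sum of features) and the pairwise logistic (cross-entropy) loss RankNet is known for: for each pair of documents from the same query, the model's probability that the better-judged one ranks higher is sigmoid(score difference). The toy features, grades, learning rate and function names are all hypothetical; the real system used 600 features and also a 2-layer net.

```python
import math
import random


def sigmoid(s):
    # numerically stable logistic function
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def make_pairs(judged_docs):
    """judged_docs: (feature_vector, grade) for ONE query, grade 1-5.
    Returns (better, worse) pairs; equal grades give no pair."""
    return [(xi, xj) for xi, gi in judged_docs for xj, gj in judged_docs if gi > gj]


def train_pairwise(pairs, n_features, lr=0.1, epochs=200, seed=0):
    """Learn w so that score(w, better) > score(w, worse) for each pair."""
    rng = random.Random(seed)
    w = [0.0] * n_features
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for xi, xj in pairs:
            diff = [a - b for a, b in zip(xi, xj)]
            s = sum(wk * dk for wk, dk in zip(w, diff))  # score(xi) - score(xj)
            # loss = -log sigmoid(s); its gradient is -(1 - sigmoid(s)) * diff
            g = 1.0 - sigmoid(s)
            w = [wk + lr * g * dk for wk, dk in zip(w, diff)]
    return w


def score(w, x):
    return sum(wk * xk for wk, xk in zip(w, x))


# Hypothetical toy data: one query, three documents, two features each.
docs = [([0.9, 0.2], 5), ([0.5, 0.8], 3), ([0.1, 0.5], 1)]
w = train_pairwise(make_pairs(docs), n_features=2)
ranking = sorted(docs, key=lambda d: score(w, d[0]), reverse=True)
print([grade for _, grade in ranking])
```

Training on pairs means the model only has to get the relative order right, never the absolute 1–5 grade, which is what a search engine's ranking actually needs.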

9 Today's web dragons

[Photo: Sergey Brin and Larry Page]
- Google: 49% (1998, 2004)
- Yahoo: 23% (1994, 1996; Inktomi 2002, AltaVista 2003)
- MSN: 10% (2005)
- AOL: 7% (Excite since 1997, Google since 2002)
- Ask (Jeeves): 2% (Teoma 2001)
