Published by Raven Wedgeworth. Modified over 3 years ago.

1
Mining Web’s Link Structure
Sushanth Rai
University of Texas at Arlington
http://cseweb.uta.edu/~rai

3
Structure of WWW
– Highly decentralized
– Unstructured
– Hyperlink based
– Disorganized presentation

4
Searching the WWW
Searching: the process of discovering high-quality, relevant pages in response to a specific need for certain information

5
Challenges in Search Engines
– Index-based search engines may return one result or a million
– The heuristics used to rank pages rely on the frequency of occurrence of words
– Spamming can mislead index-based search engines
– Human language exhibits synonymy and polysemy
– Web pages are not self-descriptive

6
Searching with Hyperlinks
Features
– Hyperlinks represent latent human judgment
– Hyperlinks provide an opportunity to find potential authorities
Pitfalls
– Some links are created for purposes other than conferring authority
– Popularity must be balanced against relevance

7
Focused Subgraph of WWW
Authority: a page that is pointed to by many good hubs
Hub: a page that points to many good authorities
Authorities and hubs are extracted from a focused subgraph, a set of pages that
– Is relatively small
– Is rich in content related to the query
– Contains the strongest authorities

8
[Figure with regions labeled "root" and "base"]

9
Construction of Subgraph
Subgraph($\sigma$, $\mathcal{E}$, $t$, $d$)
  $\sigma$: a query string
  $\mathcal{E}$: a text-based search engine
  $t$, $d$: natural numbers
  Let $R_\sigma$ denote the top $t$ results of $\mathcal{E}$ on $\sigma$
  Set $S_\sigma = R_\sigma$
  For each page $p \in R_\sigma$
    Let $\Gamma^+(p)$ denote the set of all pages $p$ points to
    Let $\Gamma^-(p)$ denote the set of all pages pointing to $p$
    Add all pages in $\Gamma^+(p)$ to $S_\sigma$
    If $|\Gamma^-(p)| \le d$ then
      Add all pages in $\Gamma^-(p)$ to $S_\sigma$
    Else
      Add an arbitrary set of $d$ pages from $\Gamma^-(p)$ to $S_\sigma$
  End
  Return $S_\sigma$
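The construction above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the presenter's implementation: the `search` callable, the `out_links` and `in_links` dictionaries, and the toy page names are all hypothetical stand-ins for a real search engine and a real link index.

```python
import random

def subgraph(query, search, t, d, out_links, in_links, rng=None):
    """Expand the top-t search results (root set) into a base set.

    `search(query, t)` is a hypothetical callable returning a list of pages;
    `out_links[p]` / `in_links[p]` are hypothetical link lookups.
    """
    rng = rng or random.Random(0)          # seeded only to make the sketch repeatable
    root = search(query, t)
    base = set(root)
    for p in root:
        base.update(out_links.get(p, ()))  # every page that p points to
        preds = sorted(in_links.get(p, ()))
        if len(preds) <= d:
            base.update(preds)             # few in-links: keep them all
        else:
            base.update(rng.sample(preds, d))  # many in-links: arbitrary d of them
    return base

# Toy data (assumed): "a" and "b" are the search results.
out_links = {"a": {"c"}, "b": {"c", "d"}}
in_links = {"a": {"e", "f", "g"}}

def toy_search(query, t):
    return ["a", "b"][:t]

S = subgraph("java", toy_search, t=2, d=2, out_links=out_links, in_links=in_links)
print(sorted(S))
```

The cap `d` keeps a single very popular page from flooding the base set with its in-links.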

10
Pruning the Subgraph
In the graph $G[S_\sigma]$ induced by the set $S_\sigma$
– Identify which links are transverse (between pages in different domains) and which are intrinsic (between pages in the same domain)
– Delete all the intrinsic links and retain only the transverse links
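A small sketch of the pruning step, assuming that "same domain" can be approximated by comparing the full hostnames of the two URLs. That comparison rule, the function names, and the example edges are assumptions made for illustration.

```python
from urllib.parse import urlsplit

def is_intrinsic(src, dst):
    """Treat a link as intrinsic when both URLs share a hostname (assumed rule)."""
    return urlsplit(src).hostname == urlsplit(dst).hostname

def prune_intrinsic(edges):
    """Keep only transverse links from a list of (source, target) URL pairs."""
    return [(s, t) for s, t in edges if not is_intrinsic(s, t)]

# Illustrative edges: the first is a navigational link within one site.
edges = [
    ("http://java.sun.com/docs", "http://java.sun.com/"),
    ("http://www.gamelan.com/", "http://java.sun.com/"),
]
transverse = prune_intrinsic(edges)
print(transverse)
```

Intrinsic links mostly serve site navigation, so dropping them keeps them from counting as endorsements.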

11
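Computing Hubs and Authorities
Associate a non-negative authority weight $x^{\langle p \rangle}$ and a non-negative hub weight $y^{\langle p \rangle}$ with each page $p$
Weights of each type are normalized so that their squares sum to 1
Use the I and O operations iteratively to update the weights
– I: $x^{\langle p \rangle} \leftarrow \sum_{q:(q,p) \in E} y^{\langle q \rangle}$
– O: $y^{\langle p \rangle} \leftarrow \sum_{q:(p,q) \in E} x^{\langle q \rangle}$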
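One application of each operation can be traced on a tiny graph. The sketch below stores weights in dictionaries keyed by page; the function names and the three-link example graph are assumptions for illustration only.

```python
import math

def i_operation(edges, y):
    """Authority update: x[p] is the sum of y[q] over all links (q, p)."""
    x = {p: 0.0 for p in y}
    for q, p in edges:
        x[p] += y[q]
    return x

def o_operation(edges, x):
    """Hub update: y[p] is the sum of x[q] over all links (p, q)."""
    y = {p: 0.0 for p in x}
    for p, q in edges:
        y[p] += x[q]
    return y

def normalize(w):
    """Scale the weights so that their squares sum to 1."""
    norm = math.sqrt(sum(v * v for v in w.values()))
    return {p: v / norm for p, v in w.items()} if norm else dict(w)

# Assumed toy graph: h1 links to a1 and a2, h2 links to a1.
edges = [("h1", "a1"), ("h1", "a2"), ("h2", "a1")]
y0 = {p: 1.0 for p in ["h1", "h2", "a1", "a2"]}

x1 = i_operation(edges, y0)   # a1 collects weight from two hubs, a2 from one
y1 = o_operation(edges, x1)   # h1 points at both authorities, h2 at one
print(normalize(x1), normalize(y1))
```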

12
[Figure with labels: hubs, authorities, unrelated page of large in-degree]

13
Iterative Algorithm
Iterate($G$, $k$)
  $G$: a collection of $n$ linked pages
  $k$: a natural number
  Let $z$ denote the vector $(1, 1, 1, \ldots, 1) \in \mathbb{R}^n$
  Set $x_0 = z$
  Set $y_0 = z$
  For $j = 1, 2, \ldots, k$
    Apply the I operation to $(x_{j-1}, y_{j-1})$, obtaining new x-weights $x'_j$
    Apply the O operation to $(x'_j, y_{j-1})$, obtaining new y-weights $y'_j$
    Normalize $x'_j$, obtaining $x_j$
    Normalize $y'_j$, obtaining $y_j$
  End
  Return $(x_k, y_k)$
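The loop above can also be written in matrix form: with an adjacency matrix `A` where `A[i][j] == 1` when page `i` links to page `j`, the I operation is a product with the transpose of `A` and the O operation a product with `A`. The following is a plain-Python sketch under that assumption; the matrix layout and the four-page example are illustrative, not taken from the slides.

```python
import math

def normalize(v):
    """Scale a vector so that its squared entries sum to 1."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v] if norm else list(v)

def iterate(A, k):
    """Run k rounds of the I and O operations on adjacency matrix A."""
    n = len(A)
    x = [1.0] * n   # authority weights, x_0 = z
    y = [1.0] * n   # hub weights, y_0 = z
    for _ in range(k):
        # I operation: x'[j] sums y[i] over pages i that link to j.
        x_new = [sum(A[i][j] * y[i] for i in range(n)) for j in range(n)]
        # O operation on the fresh x': y'[i] sums x'[j] over pages j that i links to.
        y_new = [sum(A[i][j] * x_new[j] for j in range(n)) for i in range(n)]
        x, y = normalize(x_new), normalize(y_new)
    return x, y

# Assumed example: pages 0 and 1 act as hubs, pages 2 and 3 as authorities.
A = [
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
x, y = iterate(A, 20)
print(x, y)
```

In this example page 2 ends up with the larger authority weight because both hubs point to it, and page 0 ends up as the stronger hub.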

14
Results
(java) Authorities
.328 http://www.gamelan.com
.251 http://java.sun.com
.190 http://www.digitalfocus.com/digitalfocus/faq/howdoi.html
.183 http://sunsite.unc.edu/javafaq/javafaq.html
(Gates) Authorities
.643 http://www.roadahead.com
.458 http://www.microsoft.com
.440 http://www.microsoft.com/corpinfo/bill-g.htm

15
Results (Contd…)
Comparative results for AltaVista, Yahoo, and Clever on 26 broad search topics, each rated “bad”, “fair”, “good”, or “fantastic”
– For 31% of the topics, Yahoo and Clever received equivalent evaluations
– For 50%, Clever received a higher evaluation
– For 19%, Yahoo received the higher evaluation
– AltaVista failed to receive the higher evaluation on any of the 26 topics

16
Applications
– Constructing taxonomies semiautomatically
– Trawling the Web for emerging cybercommunities
– Mining structured information that is amenable to database techniques

17
Web Resources
Clever - http://www.almaden.ibm.com/cs/k53/clever.html
Google - http://www.google.com
WebL - http://www.research.compaq.com/SRC/WebL

18
Questions?
