3 Structure of WWW
–Highly decentralized
–Unstructured
–Hyperlink based
–Disorganized presentation
4 Searching the WWW
Searching: the process of discovering high-quality, relevant pages in response to a specific need for certain information
5 Challenges in Search Engines
–Index-based search engines may return a single result or millions of results
–Heuristics used to rank pages rely on the frequency of occurrence of words
–Spamming can mislead index-based search engines
–Human language exhibits synonymy and polysemy
–Web pages are not self-descriptive
6 Searching with Hyperlinks
Features
–Hyperlinks represent latent human judgment
–Hyperlinks provide an opportunity to find potential authorities
Pitfalls
–Links are created for purposes other than conferring authority
–Popularity must be balanced against relevance
7 Focused Subgraph of WWW
Authority: a page that is pointed to by many good hubs
Hub: a page that points to many good authorities
Authorities and hubs are extracted from a focused subgraph, a set of pages that
–Is relatively small
–Is rich in content related to the query
–Contains the strongest authorities
9 Construction of Subgraph
Subgraph($\sigma$, $\mathcal{E}$, $t$, $d$)
  $\sigma$: a query string
  $\mathcal{E}$: a text-based search engine
  $t$, $d$: natural numbers
  Let $R_\sigma$ denote the top $t$ results of $\mathcal{E}$ on $\sigma$
  Set $S_\sigma = R_\sigma$
  For each page $p \in R_\sigma$
    Let $\Gamma^+(p)$ denote the set of all pages $p$ points to
    Let $\Gamma^-(p)$ denote the set of all pages pointing to $p$
    Add all pages in $\Gamma^+(p)$ to $S_\sigma$
    If $|\Gamma^-(p)| \le d$ then
      Add all pages in $\Gamma^-(p)$ to $S_\sigma$
    Else
      Add an arbitrary set of $d$ pages from $\Gamma^-(p)$ to $S_\sigma$
  End
  Return $S_\sigma$
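The expansion step above can be sketched in Python. This is a minimal illustration, not the Clever implementation: the function name `build_subgraph` and the two dictionaries standing in for a link index (`out_links`, `in_links`) are hypothetical, and the root set is assumed to have been fetched from a search engine already.

```python
import random

def build_subgraph(root_pages, out_links, in_links, d, rng=None):
    """Expand a root set into a base set, following the Subgraph pseudocode.

    root_pages: the top-t results for the query (assumed already fetched).
    out_links:  dict page -> set of pages it points to (hypothetical index).
    in_links:   dict page -> set of pages pointing to it (hypothetical index).
    d:          cap on how many in-linking pages one root page may add.
    """
    rng = rng or random.Random(0)
    base = set(root_pages)                      # S = R
    for p in root_pages:
        base |= out_links.get(p, set())         # add every page p points to
        pointing = in_links.get(p, set())
        if len(pointing) <= d:
            base |= pointing                    # few in-links: keep them all
        else:
            # many in-links: keep an arbitrary set of d of them
            base |= set(rng.sample(sorted(pointing), d))
    return base

# Toy data: "a" has three in-links, so only d=2 of them are sampled.
out_links = {"a": {"x", "y"}, "b": {"y"}}
in_links = {"a": {"h1", "h2", "h3"}, "b": {"h1"}}
base = build_subgraph(["a", "b"], out_links, in_links, d=2)
```

The cap `d` keeps a single very popular root page from flooding the base set with its in-links.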
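10 Pruning the Subgraph
In the graph $G[S_\sigma]$ induced by the set $S_\sigma$
–Identify which links are transverse (between pages with different domain names) and which are intrinsic (between pages with the same domain name)
–Delete all intrinsic links and retain only transverse links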
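A rough sketch of the pruning step follows. It assumes that a link is intrinsic when both endpoints share a host name and transverse otherwise; comparing full host names through `urllib.parse` is a simplification chosen here for illustration, and the helper names are hypothetical.

```python
from urllib.parse import urlparse

def is_transverse(src_url, dst_url):
    """True when the two pages live on different hosts (assumed criterion)."""
    return urlparse(src_url).hostname != urlparse(dst_url).hostname

def prune_intrinsic(edges):
    """Keep only transverse links from a list of (source, target) URL pairs."""
    return [(s, t) for s, t in edges if is_transverse(s, t)]

edges = [
    ("http://www.example.com/a.html", "http://www.example.com/b.html"),  # intrinsic
    ("http://www.example.com/a.html", "http://www.example.org/"),        # transverse
]
kept = prune_intrinsic(edges)
```

Intrinsic links often exist only for navigation within one site, which is why they are dropped before weights are computed.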
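11 Computing Hubs and Authorities
–Associate a non-negative authority weight $x^{\langle p \rangle}$ and a non-negative hub weight $y^{\langle p \rangle}$ with each page $p$
–Weights of each type are normalized so that their squares sum to 1
–Use the I and O operations iteratively to update the weights
  I: $x^{\langle p \rangle} \leftarrow \sum_{q:(q,p) \in E} y^{\langle q \rangle}$
  O: $y^{\langle p \rangle} \leftarrow \sum_{q:(p,q) \in E} x^{\langle q \rangle}$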
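One application of the I and O operations can be worked through on a toy graph. The sketch below stores weights in dictionaries keyed by page; the function names and the three-edge graph are illustrative assumptions, not part of the original slides.

```python
import math

def i_operation(edges, y):
    """Authority update: x[p] = sum of y[q] over all edges (q, p)."""
    x = {p: 0.0 for p in y}
    for q, p in edges:
        x[p] += y[q]
    return x

def o_operation(edges, x):
    """Hub update: y[p] = sum of x[q] over all edges (p, q)."""
    y = {p: 0.0 for p in x}
    for p, q in edges:
        y[p] += x[q]
    return y

def normalize(w):
    """Scale weights so that their squares sum to 1 (no-op for all zeros)."""
    norm = math.sqrt(sum(v * v for v in w.values()))
    return {p: v / norm for p, v in w.items()} if norm else dict(w)

# Two hubs (h1, h2) pointing at two candidate authorities (a, b).
edges = [("h1", "a"), ("h2", "a"), ("h1", "b")]
y0 = {"h1": 1.0, "h2": 1.0, "a": 1.0, "b": 1.0}
x1 = i_operation(edges, y0)   # a gets 2.0, b gets 1.0, hubs get 0.0
y1 = o_operation(edges, x1)   # h1 gets 3.0, h2 gets 2.0
```

Page `a` is pointed to by both hubs and so receives the larger authority weight; `h1` points to both authorities and so receives the larger hub weight.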
12 [Figure: hubs, authorities, and an unrelated page of large in-degree]
13 Iterative Algorithm
Iterate($G$, $k$)
  $G$: a collection of $n$ linked pages
  $k$: a natural number
  Let $z$ denote the vector $(1, 1, 1, \ldots, 1) \in \mathbb{R}^n$
  Set $x_0 = z$
  Set $y_0 = z$
  For $j = 1, 2, \ldots, k$
    Apply the I operation to $(x_{j-1}, y_{j-1})$, obtaining new x-weights $x'_j$
    Apply the O operation to $(x'_j, y_{j-1})$, obtaining new y-weights $y'_j$
    Normalize $x'_j$, obtaining $x_j$
    Normalize $y'_j$, obtaining $y_j$
  End
  Return $(x_k, y_k)$
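The loop above can be sketched with an adjacency matrix, where the I operation amounts to multiplying the hub vector by the transpose of the matrix and the O operation to multiplying the authority vector by the matrix itself. This is a minimal pure-Python sketch under that assumption; the names `iterate` and `_unit` and the four-page graph are hypothetical.

```python
import math

def _unit(v):
    """Scale a vector so that its squares sum to 1 (unchanged if all zeros)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v] if norm else v

def iterate(adj, k):
    """Run k rounds of I, O, normalize. adj[p][q] == 1 if page p links to q."""
    n = len(adj)
    x = [1.0] * n   # x_0 = z
    y = [1.0] * n   # y_0 = z
    for _ in range(k):
        # I operation: authority of p sums the hub weights of pages linking to p
        x = [sum(adj[q][p] * y[q] for q in range(n)) for p in range(n)]
        # O operation: hub of p sums the new authority weights of pages p links to
        y = [sum(adj[p][q] * x[q] for q in range(n)) for p in range(n)]
        x = _unit(x)
        y = _unit(y)
    return x, y

# Pages 0 and 1 act as hubs; pages 2 and 3 are candidate authorities.
adj = [
    [0, 0, 1, 1],   # page 0 links to pages 2 and 3
    [0, 0, 1, 0],   # page 1 links to page 2
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
x, y = iterate(adj, k=20)
```

After the run, the largest entries of `x` mark the strongest authorities and the largest entries of `y` the strongest hubs; in this toy graph page 2 outranks page 3 as an authority and page 0 outranks page 1 as a hub.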
15 Results (Contd…)
Comparative results for AltaVista, Yahoo, and Clever on 26 broad search topics, with results rated as “bad”, “fair”, “good”, or “fantastic”
–For 31% of the topics, Yahoo and Clever received equivalent evaluations
–For 50%, Clever received a higher evaluation
–For 19%, Yahoo received a higher evaluation
–AltaVista did not receive the highest evaluation on any of the 26 topics
16 Applications
–Constructing taxonomies semi-automatically
–Trawling the web for emerging cyber-communities
–Mining structured information that is amenable to database techniques
17 Web Resources
Clever - http://www.almaden.ibm.com/cs/k53/clever.html
Google - http://www.google.com
WebL - http://www.research.compaq.com/SRC/WebL