VIPAS: Virtual Link Powered Authority Search in the Web Chi-Chun Lin and Ming-Syan Chen Network Database Laboratory National Taiwan University.


1 VIPAS: Virtual Link Powered Authority Search in the Web
Chi-Chun Lin and Ming-Syan Chen
Network Database Laboratory, National Taiwan University

2 Outline (M.-S. Chen, NTU)
- Motivation and Goal
- Preliminaries and Related Work
  - Introduction to Link-analysis
- Defects of Traditional Link-analysis and Ideas for Improvement
- System Framework and Algorithms
- Implementation and Experimental Results
- Conclusions

3 Motivation and Goal
- To find the most relevant pages satisfying the user's information need in the Web
- Traditional means for this task
  - Keyword-based search engines
- Problems
  - Some relevant pages do not contain the keywords in the page text
- An alternative method
  - Analyze the links contained in Web pages instead of ranking by keywords

4 HITS (1/3)
- Authority pages
  - A page pointed to by many other pages
- Hub pages
  - A page pointing to many other pages
- Mutual reinforcement
  - An authority pointed to by many hub pages is an even better authority
  - A hub pointing to many authority pages is an even better hub
- Based on this argument, the goal of HITS is to find the set of best authority pages

5 HITS (2/3)
- Let $x_p$ and $y_p$ denote the authority and hub score of page p, respectively
- Authority update (figure: pages q1, q2, q3 point to page p): $x_p := \sum_{q \to p} y_q$
- Hub update (figure: page p points to pages q1, q2, q3): $y_p := \sum_{p \to q} x_q$

6 HITS (3/3)
- Iterative algorithm
  1. Obtain a set of Web pages using a keyword-based query and expand it to form a base set
  2. Assign each page of the base set an initial authority and hub score of 1
  3. According to its links, update the scores of each page
  4. Normalize the scores so that $\sum_p (x_p)^2 = 1$ and $\sum_p (y_p)^2 = 1$ over all p in the base set
  5. Do steps 3 and 4 iteratively until the scores converge
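The five steps of the iterative algorithm can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `hits`, the dictionary representation of the base set, the tolerance and the tiny example graph are all assumptions made here. The hub update uses the freshly updated authority scores, which is one common ordering; the slides do not say which ordering was used.

```python
import math

def hits(links, max_iterations=100, tol=1e-10):
    """Run HITS on a base set.

    links: dict mapping each page to the pages it points to
           (hypothetical representation of the base set).
    Returns (authority, hub) score dicts, each normalized so that
    the sum of squared scores is 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Step 2: every page starts with authority and hub score 1.
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}

    for _ in range(max_iterations):
        # Step 3a: x_p := sum of y_q over all q pointing to p.
        new_auth = {p: 0.0 for p in pages}
        for q, targets in links.items():
            for p in targets:
                new_auth[p] += hub[q]
        # Step 3b: y_p := sum of x_q over all q that p points to
        # (assumption: uses the just-updated authority scores).
        new_hub = {p: sum(new_auth[q] for q in links.get(p, ())) for p in pages}

        # Step 4: normalize so the squared scores sum to 1.
        for scores in (new_auth, new_hub):
            norm = math.sqrt(sum(v * v for v in scores.values()))
            if norm > 0:
                for p in scores:
                    scores[p] /= norm

        # Step 5: stop once the scores no longer change noticeably.
        change = max(
            max(abs(new_auth[p] - auth[p]) for p in pages),
            max(abs(new_hub[p] - hub[p]) for p in pages),
        )
        auth, hub = new_auth, new_hub
        if change < tol:
            break
    return auth, hub

# Tiny made-up base set: three hubs, two candidate authorities.
example_links = {"h1": ["a", "b"], "h2": ["a"], "h3": ["a"]}
auth, hub = hits(example_links)
```

In this toy graph page `a` is pointed to by all three hubs, so it ends up with a higher authority score than `b`, and `h1` (which points to both) ends up as the best hub.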

7 The Problem with HITS
- Links in Web pages only reflect page creators' judgment
- Sometimes a link will not be put in a page even though its destination is very relevant
  - e.g., a company's homepage will not link to a competitor in the same industry
- We argue: page readers' consideration should be of equal importance

8 The Notion of Virtual Links
- The basic idea
  - Identify pages that are heavily accessed within a period, and form a "hot set" from these pages
  - Create "virtual links" for pages in the hot set and incorporate them into the computation of authority scores
- Design a Web warehouse for this task and utilize it to identify authoritative Web pages

9 System Framework
[Figure: system framework. Components: Web Pages, Page Archive, Keyword & Ranking Database, Authority Evaluator, Query Interface, Clickstream Database, Clicking Observer, Virtual Link Creator. Data flows labeled: page content & links, keywords, scores, virtual links, query, results.]

10 Creating Virtual Links
- Scenario:
  - A user interested in Java-related Web pages came to our system
  - She submitted a query with keyword "java"
  - Assume that the query result contains 100 URLs
  - She clicked the top 1-10 of the 100 URLs except the 6th
  - The hot set consists of the 9 URLs clicked

11 Creating Virtual Links (cont'd)
- 2 criteria
[Figure, panel 1: a single Virtual Hub shown with URL 1, URL 2, URL 5, URL 6, URL 7, URL 10. Panel 2: Hub 1, Hub 2, ..., Hub n shown with the same URLs.]

12 Algorithm VIPAS (Virtual Link Powered Authority Search)
- Initialization Phase
  1. For a query term, perform the regular HITS analysis
  2. Collect a base set of pages with computed authority and hub scores and store them in the database
- Virtual Link Collection Phase
  3. Monitor the user behavior to see whether a URL in the result list is clicked by the user or not
  4. After a period of user behavior observation, put URLs that are often accessed into the "hot set"
  5. Create virtual links for pages in the hot set

13 Algorithm VIPAS (cont'd)
- Refinement Phase
  6. For each page in the hot set, compute its new authority and hub scores
  7. Run several iterations of score updating for pages in the base set
- 2 flavors
  - VIPAS-VH (VIPAS with virtual links from a Virtual Hub)
  - VIPAS-TH (VIPAS with virtual links from Top Hubs)
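One way to picture the two flavors is as a step that adds extra edges to the base-set link graph before the refinement iterations. The sketch below is an assumption-laden illustration, not the authors' code: the function name `add_virtual_links`, the `__virtual_hub__` node name and the `top_n` parameter are hypothetical, and the virtual links are left unweighted because the weighting formulas are not preserved in this transcript.

```python
def add_virtual_links(links, hot_set, flavor="VH", hub_scores=None, top_n=2):
    """Return a copy of the link graph with virtual links added.

    links:      dict page -> set of pages it points to (base set)
    hot_set:    pages that received virtual links
    flavor:     "VH" = links from one virtual hub,
                "TH" = links from the top_n pages by hub score
    hub_scores: dict page -> hub score from the initial HITS run
                (required for "TH")
    All names here are illustrative assumptions.
    """
    augmented = {page: set(targets) for page, targets in links.items()}

    if flavor == "VH":
        sources = ["__virtual_hub__"]  # hypothetical node name
    elif flavor == "TH":
        if hub_scores is None:
            raise ValueError("hub_scores is required for flavor 'TH'")
        ranked = sorted(hub_scores, key=hub_scores.get, reverse=True)
        sources = ranked[:top_n]
    else:
        raise ValueError(f"unknown flavor: {flavor!r}")

    # Each source gets a virtual link to every hot-set page (no self-links).
    for source in sources:
        augmented.setdefault(source, set()).update(
            page for page in hot_set if page != source
        )
    return augmented

base_links = {"h1": {"a"}, "h2": {"b"}}
hot = {"a", "c"}
vh_graph = add_virtual_links(base_links, hot, flavor="VH")
th_graph = add_virtual_links(
    base_links, hot, flavor="TH",
    hub_scores={"h1": 0.9, "h2": 0.4, "a": 0.0}, top_n=1,
)
```

The augmented graph could then be fed to any HITS-style score update for the refinement phase; the original link dictionary is left unmodified so the two flavors can be compared on the same base set.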

14 Finding Hot Sets
1. In an observing period, pay attention to clicks on consecutive URLs in the result list
2. When a user clicks several consecutive URLs and then skips some of the URLs that follow, we mark those that have been skipped
3. Exclude pages that are marked with a frequency greater than a threshold from the forming of hot sets
4. Among the pages left, those that are accessed by at least a given percentage of users are put into the hot set
- Note: some relevant URLs that have already been browsed by the user will be skipped

15 Finding Hot Sets (cont'd)
Query result list (shown twice in the figure, once per example):
1. http://java.sun.com/
2. http://www.sun.com/java/
3. http://www.javaworld.com/
4. http://java.oreilly.com/
5. http://www.jars.com/
6. …
- Example 1 (labels down the list: clicked, skipped, clicked): URL 4 is marked
- Example 2 (labels down the list: skipped, clicked, skipped, clicked): URL 4 is marked, but URL 1 is not
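The marking rule and the two thresholds can be sketched as below. Several details are assumptions made for illustration only: "several" consecutive clicks is taken to mean at least `min_run = 2`, a skipped URL is only marked when the user clicks something later in the list, frequencies are measured as fractions of observed users, and the names `marked_skips`, `find_hot_set`, `skip_threshold` and `access_threshold` are hypothetical. The click patterns are one reading of the two examples on the slide.

```python
from collections import Counter

def marked_skips(result_list, clicked, min_run=2):
    """URLs skipped right after a run of >= min_run consecutive clicks
    and before some later click (assumed reading of the marking rule)."""
    flags = [url in clicked for url in result_list]
    if True not in flags:
        return set()
    last_click = len(flags) - 1 - flags[::-1].index(True)

    marked = set()
    run = 0          # length of the current run of consecutive clicks
    marking = False  # inside a skip block that follows a long enough run
    for i in range(last_click + 1):
        if flags[i]:
            run += 1
            marking = False
        else:
            if run >= min_run:
                marking = True
            if marking:
                marked.add(result_list[i])
            run = 0
    return marked

def find_hot_set(sessions, skip_threshold, access_threshold, min_run=2):
    """sessions: list of (result_list, clicked_urls), one entry per user."""
    n = len(sessions)
    mark_counts, click_counts = Counter(), Counter()
    for result_list, clicked in sessions:
        mark_counts.update(marked_skips(result_list, clicked, min_run))
        click_counts.update(set(clicked) & set(result_list))
    return {
        url for url, clicks in click_counts.items()
        if mark_counts[url] / n <= skip_threshold   # not skipped too often
        and clicks / n >= access_threshold          # accessed often enough
    }

urls = [
    "http://java.sun.com/",
    "http://www.sun.com/java/",
    "http://www.javaworld.com/",
    "http://java.oreilly.com/",
    "http://www.jars.com/",
]
u1, u2, u3, u4, u5 = urls

example_1 = marked_skips(urls, {u1, u2, u3, u5})  # clicked, skipped, clicked
example_2 = marked_skips(urls, {u2, u3, u5})      # skipped, clicked, skipped, clicked

sessions = [
    (urls, {u1, u2, u3, u5}),
    (urls, {u2, u3, u5}),
    (urls, {u4, u5}),
    (urls, {u1, u4}),
]
hot_set = find_hot_set(sessions, skip_threshold=0.2, access_threshold=0.4)
```

Under these assumptions URL 4 is marked in both examples while URL 1 is not, matching the slide. In the four made-up sessions URL 4 is clicked by half of the users but is also marked in half of the sessions, so the skip threshold keeps it out of the hot set.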

16 Assigning Weights to Virtual Links
- n pages in the hot set: $t_1, t_2, \ldots, t_n$
- Clickstream 1: $(t_1, t_2, t_3, t_4, x_1, x_2)$
- Clickstream 2: $(t_3, x_1, t_1)$

17 Assigning Weights to Virtual Links (cont'd)
- Final weight: [equation missing from the transcript]
- For period $T_i$ where $i \ge 2$: [equation missing from the transcript] (1/3 is the degeneration factor)

18 Computing the New Scores
- Let $x_p$ and $y_p$ denote the authority and hub score of page p, respectively
- For each page p, we update p's authority score by [equation missing from the transcript]
- Similarly, we update p's hub score by [equation missing from the transcript]

19 User-behavior Observation
- Use an ASP script
- Query result page (query result for keyword "Java"):
  1. The Source of Java(TM) Technology http://java.sun.com/
  2. … http://…
  3. … http://…
- Each plain URL, e.g. http://java.sun.com/, is replaced by wrapper.asp?URL=http://java.sun.com/
- When the wrapped link is clicked, the script will:
  1. Increment the click count of http://java.sun.com/
  2. Record the time
  3. Redirect the user to http://java.sun.com/
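The wrapper's three actions (count, timestamp, redirect) can be illustrated in a few lines. The original used an ASP script that is not shown in the slides; the Python below is only a sketch of the same idea, with hypothetical function names `wrap_url` and `handle_click` and an in-memory dictionary and list standing in for the clickstream database. Percent-encoding the target URL is an addition made here for safety; the slide shows the URL unencoded.

```python
import time
from urllib.parse import parse_qs, quote, urlsplit

def wrap_url(url, wrapper="wrapper.asp"):
    """Rewrite a plain result URL so that clicks go through the wrapper."""
    return f"{wrapper}?URL={quote(url, safe='')}"

def handle_click(request_path, click_counts, click_log, now=None):
    """Process one request to the wrapper.

    click_counts: dict url -> number of clicks (stand-in for the database)
    click_log:    list of (url, timestamp) records
    Returns an (HTTP status, headers) pair describing the redirect.
    """
    query = parse_qs(urlsplit(request_path).query)
    target = query["URL"][0]

    # 1. Increment the click count of the target URL.
    click_counts[target] = click_counts.get(target, 0) + 1
    # 2. Record the time of the click.
    click_log.append((target, time.time() if now is None else now))
    # 3. Redirect the user to the target URL.
    return 302, {"Location": target}

click_counts, click_log = {}, []
wrapped = wrap_url("http://java.sun.com/")
status, headers = handle_click(wrapped, click_counts, click_log, now=0.0)
```

Passing `now` explicitly is only there to make the sketch easy to test; a real observer would rely on the server clock.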

20 Implementation and Experiments
- Experimental testbed
  - NTUEE website (http://www.ee.ntu.edu.tw/)
- Data collection
  - 03/28/'02 ~ 05/31/'02
- Parameters (table of Parameter and Value; the parameter symbols did not survive extraction): 20%, 40%, A A 10, H H

21 Evaluation Method
- For a keyword, we manually select a list of authority pages and compare it with the output of each algorithm
- Discrepancy coefficient: [equation missing from the transcript]
Manually selected authority pages (H denotes http://www.ee.ntu.edu.tw):
SN | URL | Title
5633 | H/www/faculty/rb-wu/rb-wu.htm | Homepage of professor Ruey-Beei Wu
7228 | H/html_2000/WWW/faculty/english/Wu-Rei-Bei.html | [no title]
8682 | H/html_2000/www/faculty/rb-wu/rb-wu.htm | Homepage of professor Ruey-Beei Wu

22 Discrepancy Coefficient – Regular HITS
(H denotes http://www.ee.ntu.edu.tw)
Rank | SN | URL | Title
1 | 5633 | H/www/faculty/rb-wu/rb-wu.htm | Homepage of professor Ruey-Beei Wu
2 | 93 | H/professor_c.html | Faculty members of NTUEE
3 | 34 | H/prodata_c.html | Faculty members of NTUEE
4 | 94 | H/professor_e.html | Faculty members of NTUEE
5 | 8682 | H/html_2000/www/faculty/rb-wu/rb-wu.htm | Homepage of professor Ruey-Beei Wu
6 | 7229 | H/html_2000/WWW/faculty/english/Cao-Heng-Wei.html | [no title]
7 | 7269 | H/html_2000/WWW/faculty/english/Chen-Qiu-Lin.html | [no title]
8 | 5892 | H/html_2000/WWW/faculty/NoSort.html | [no title]
9 | 4959 | H/content/chinese/required/differential_equations.html | Engineering Mathematics I: Diff….
10 | 8904 | H/html_2000/content/chinese/required/differential_equations.html | Engineering Mathematics I: Diff….
41 | 7228 | H/html_2000/WWW/faculty/english/Wu-Rei-Bei.html | [no title]
$R_1 = 1$ (SN 5633), $R_2 = 5$ (SN 8682), $R_3 = 41$ (SN 7228)

23 Discrepancy Coefficient – VIPAS-VH
(H denotes http://www.ee.ntu.edu.tw)
Rank | SN | URL | Title
1 | 5633 | H/www/faculty/rb-wu/rb-wu.htm | Homepage of professor Ruey-Beei Wu
2 | 93 | H/professor_c.html | Faculty members of NTUEE
3 | 34 | H/prodata_c.html | Faculty members of NTUEE
4 | 94 | H/professor_e.html | Faculty members of NTUEE
5 | 8682 | H/html_2000/www/faculty/rb-wu/rb-wu.htm | Homepage of professor Ruey-Beei Wu
6 | 7228 | H/html_2000/WWW/faculty/english/Wu-Rei-Bei.html | [no title]
7 | 7229 | H/html_2000/WWW/faculty/english/Cao-Heng-Wei.html | [no title]
8 | 7269 | H/html_2000/WWW/faculty/english/Chen-Qiu-Lin.html | [no title]
9 | 5892 | H/html_2000/WWW/faculty/NoSort.html | [no title]
10 | 4959 | H/content/chinese/required/differential_equations.html | Engineering Mathematics I: Diff….
$R_1 = 1$ (SN 5633), $R_2 = 5$ (SN 8682), $R_3 = 6$ (SN 7228)
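The R values quoted for the two rankings can be recomputed by looking up where each manually selected page lands in an algorithm's output. The helper below is a small sketch with a hypothetical name, `selected_ranks`; the serial numbers and ranks are copied from the two ranking tables. It stops at the ranks, because the discrepancy coefficient formula itself is not preserved in this transcript.

```python
def selected_ranks(selected_sns, ranking):
    """Return (rank, SN) pairs for the manually selected pages,
    sorted by rank, so that the i-th pair corresponds to R_i.

    ranking: list of (rank, SN) pairs from an algorithm's output.
    Selected pages missing from the ranking are left out.
    """
    rank_of = {sn: rank for rank, sn in ranking}
    return sorted((rank_of[sn], sn) for sn in selected_sns if sn in rank_of)

selected = [5633, 7228, 8682]  # manually selected authority pages

hits_ranking = [
    (1, 5633), (2, 93), (3, 34), (4, 94), (5, 8682), (6, 7229),
    (7, 7269), (8, 5892), (9, 4959), (10, 8904), (41, 7228),
]
vipas_vh_ranking = [
    (1, 5633), (2, 93), (3, 34), (4, 94), (5, 8682), (6, 7228),
    (7, 7229), (8, 7269), (9, 5892), (10, 4959),
]

hits_r = selected_ranks(selected, hits_ranking)
vipas_vh_r = selected_ranks(selected, vipas_vh_ranking)
```

The only difference between the two outputs is SN 7228, which moves from rank 41 under regular HITS to rank 6 under VIPAS-VH.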

24 Evaluation Method
- Grouping coefficient: [equation missing from the transcript]
- Stability
  - The standard deviation of each algorithm's discrepancy coefficients for all of the keywords

25 Grouping Coefficient – Regular HITS
$R_1 = 1$ (SN 5633), $R_2 = 5$ (SN 8682), $R_3 = 41$ (SN 7228)
(H denotes http://www.ee.ntu.edu.tw)
Rank | SN | URL | Title
1 | 5633 | H/www/faculty/rb-wu/rb-wu.htm | Homepage of professor Ruey-Beei Wu
2 | 93 | H/professor_c.html | Faculty members of NTUEE
3 | 34 | H/prodata_c.html | Faculty members of NTUEE
4 | 94 | H/professor_e.html | Faculty members of NTUEE
5 | 8682 | H/html_2000/www/faculty/rb-wu/rb-wu.htm | Homepage of professor Ruey-Beei Wu
6 | 7229 | H/html_2000/WWW/faculty/english/Cao-Heng-Wei.html | [no title]
7 | 7269 | H/html_2000/WWW/faculty/english/Chen-Qiu-Lin.html | [no title]
8 | 5892 | H/html_2000/WWW/faculty/NoSort.html | [no title]
9 | 4959 | H/content/chinese/required/differential_equations.html | Engineering Mathematics I: Diff….
10 | 8904 | H/html_2000/content/chinese/required/differential_equations.html | Engineering Mathematics I: Diff….
41 | 7228 | H/html_2000/WWW/faculty/english/Wu-Rei-Bei.html | [no title]

26 Grouping Coefficient – VIPAS-VH
$R_1 = 1$ (SN 5633), $R_2 = 5$ (SN 8682), $R_3 = 6$ (SN 7228)
(H denotes http://www.ee.ntu.edu.tw)
Rank | SN | URL | Title
1 | 5633 | H/www/faculty/rb-wu/rb-wu.htm | Homepage of professor Ruey-Beei Wu
2 | 93 | H/professor_c.html | Faculty members of NTUEE
3 | 34 | H/prodata_c.html | Faculty members of NTUEE
4 | 94 | H/professor_e.html | Faculty members of NTUEE
5 | 8682 | H/html_2000/www/faculty/rb-wu/rb-wu.htm | Homepage of professor Ruey-Beei Wu
6 | 7228 | H/html_2000/WWW/faculty/english/Wu-Rei-Bei.html | [no title]
7 | 7229 | H/html_2000/WWW/faculty/english/Cao-Heng-Wei.html | [no title]
8 | 7269 | H/html_2000/WWW/faculty/english/Chen-Qiu-Lin.html | [no title]
9 | 5892 | H/html_2000/WWW/faculty/NoSort.html | [no title]
10 | 4959 | H/content/chinese/required/differential_equations.html | Engineering Mathematics I: Diff….

27 Experimental Results

28 Experimental Results (cont'd)

29 Conclusions
- Link-analysis algorithms are popular in Web information retrieval
  - But they need further improvement
- In our work, we built a Web warehouse
  - It incorporates user feedback into the identification of authoritative resources (Algorithm VIPAS)
  - Experimental results show that VIPAS is very effective and the warehouse is able to retrieve much more valuable information for users

