1 Discovering Authorities in Question Answer Communities by Using Link Analysis Pawel Jurczyk, Eugene Agichtein (CIKM 2007)

Slides:



Advertisements
Similar presentations
1 Evaluation Rong Jin. 2 Evaluation  Evaluation is key to building effective and efficient search engines usually carried out in controlled experiments.
Advertisements

Crawling, Ranking and Indexing. Organizing the Web The Web is big. Really big. –Over 3 billion pages, just in the indexable Web The Web is dynamic Problems:
TrustRank Algorithm Srđan Luković 2010/3482
Sandeep Pandey 1, Sourashis Roy 2, Christopher Olston 1, Junghoo Cho 2, Soumen Chakrabarti 3 1 Carnegie Mellon 2 UCLA 3 IIT Bombay Shuffling a Stacked.
COLLABORATIVE FILTERING Mustafa Cavdar Neslihan Bulut.
Vote Calibration in Community Question-Answering Systems Bee-Chung Chen (LinkedIn), Anirban Dasgupta (Yahoo! Labs), Xuanhui Wang (Facebook), Jie Yang (Google)
Asking Questions on the Internet
1 Entity Ranking Using Wikipedia as a Pivot (CIKM 10’) Rianne Kaptein, Pavel Serdyukov, Arjen de Vries, Jaap Kamps 2010/12/14 Yu-wen,Hsu.
Evaluating Search Engine
Finding High-Quality Content in Social Media chenwq 2011/11/26.
Database Management Systems, R. Ramakrishnan1 Computing Relevance, Similarity: The Vector Space Model Chapter 27, Part B Based on Larson and Hearst’s slides.
CS 345A Data Mining Lecture 1 Introduction to Web Mining.
Expertise Networks in Online Communities: Structure and Algorithms Jun Zhang Mark S. Ackerman Lada Adamic University of Michigan WWW 2007, May 8–12, 2007,
6/16/20151 Recent Results in Automatic Web Resource Discovery Soumen Chakrabartiv Presentation by Cui Tao.
VIPAS: Virtual Link Powered Authority Search in the Web Chi-Chun Lin and Ming-Syan Chen Network Database Laboratory National Taiwan University.
ISP 433/633 Week 7 Web IR. Web is a unique collection Largest repository of data Unedited Can be anything –Information type –Sources Changing –Growing.
(hyperlink-induced topic search)
CSE 522 – Algorithmic and Economic Aspects of the Internet Instructors: Nicole Immorlica Mohammad Mahdian.
A Search-based Method for Forecasting Ad Impression in Contextual Advertising Defense.
Motivation When searching for information on the WWW, user perform a query to a search engine. The engine return, as the query’s result, a list of Web.
HITS – Hubs and Authorities - Hyperlink-Induced Topic Search A on the left is an authority A on the right is a hub.
Quality-aware Collaborative Question Answering: Methods and Evaluation Maggy Anastasia Suryanto, Ee-Peng Lim Singapore Management University Aixin Sun.
Λ14 Διαδικτυακά Κοινωνικά Δίκτυα και Μέσα
Personalization in Local Search Personalization of Content Ranking in the Context of Local Search Philip O’Brien, Xiao Luo, Tony Abou-Assaleh, Weizheng.
Know your Neighbors: Web Spam Detection Using the Web Topology Presented By, SOUMO GORAI Carlos Castillo(1), Debora Donato(1), Aristides Gionis(1), Vanessa.
An Analysis of Assessor Behavior in Crowdsourced Preference Judgments Dongqing Zhu and Ben Carterette University of Delaware.
Using Hyperlink structure information for web search.
1 Applying Collaborative Filtering Techniques to Movie Search for Better Ranking and Browsing Seung-Taek Park and David M. Pennock (ACM SIGKDD 2007)
1 University of Qom Information Retrieval Course Web Search (Link Analysis) Based on:
When Experts Agree: Using Non-Affiliated Experts To Rank Popular Topics Meital Aizen.
CIKM’09 Date:2010/8/24 Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen 1.
Topical Crawlers for Building Digital Library Collections Presenter: Qiaozhu Mei.
« Pruning Policies for Two-Tiered Inverted Index with Correctness Guarantee » Proceedings of the 30th annual international ACM SIGIR, Amsterdam 2007) A.
« Performance of Compressed Inverted List Caching in Search Engines » Proceedings of the International World Wide Web Conference Commitee, Beijing 2008)
The PageRank Citation Ranking: Bringing Order to the Web Lawrence Page, Sergey Brin, Rajeev Motwani, Terry Winograd Presented by Anca Leuca, Antonis Makropoulos.
1 Computing Relevance, Similarity: The Vector Space Model.
Improving Web Search Results Using Affinity Graph Benyu Zhang, Hua Li, Yi Liu, Lei Ji, Wensi Xi, Weiguo Fan, Zheng Chen, Wei-Ying Ma Microsoft Research.
LOGO Finding High-Quality Content in Social Media Eugene Agichtein, Carlos Castillo, Debora Donato, Aristides Gionis and Gilad Mishne (WSDM 2008) Advisor.
21/11/20151Gianluca Demartini Ranking Clusters for Web Search Gianluca Demartini Paul–Alexandru Chirita Ingo Brunkhorst Wolfgang Nejdl L3S Info Lunch Hannover,
Designing multiple biometric systems: Measure of ensemble effectiveness Allen Tang NTUIM.
Finding high-Quality contents in Social media BY : APARNA TODWAL GUIDED BY : PROF. M. WANJARI.
Chapter 8 Evaluating Search Engine. Evaluation n Evaluation is key to building effective and efficient search engines  Measurement usually carried out.
Ranking CSCI 572: Information Retrieval and Search Engines Summer 2010.
SP_IRS Introduction to Research in Special and Inclusive Education(Autumn 2015) Lecture 1: Introduction Lecturer: Mr. S. Kumar.
Jiafeng Guo(ICT) Xueqi Cheng(ICT) Hua-Wei Shen(ICT) Gu Xu (MSRA) Speaker: Rui-Rui Li Supervisor: Prof. Ben Kao.
Harvesting Social Knowledge from Folksonomies Harris Wu, Mohammad Zubair, Kurt Maly, Harvesting social knowledge from folksonomies, Proceedings of the.
CoCQA : Co-Training Over Questions and Answers with an Application to Predicting Question Subjectivity Orientation Baoli Li, Yandong Liu, and Eugene Agichtein.
Diversifying Search Results Rakesh AgrawalSreenivas GollapudiSearch LabsMicrosoft Research Alan HalversonSamuel.
+ User-induced Links in Collaborative Tagging Systems Ching-man Au Yeung, Nicholas Gibbins, Nigel Shadbolt CIKM’09 Speaker: Nonhlanhla Shongwe 18 January.
1 1 COMP5331: Knowledge Discovery and Data Mining Acknowledgement: Slides modified based on the slides provided by Lawrence Page, Sergey Brin, Rajeev Motwani.
A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong, Brian D. Davison Lehigh University (SIGIR ’ 09) Speaker: Cho,
11 A Classification-based Approach to Question Routing in Community Question Answering Tom Chao Zhou 1, Michael R. Lyu 1, Irwin King 1,2 1 The Chinese.
A code-centric cluster-based approach for searching online support forums for programmers Christopher Scaffidi, Christopher Chambers, Sheela Surisetty.
TO Each His Own: Personalized Content Selection Based on Text Comprehensibility Date: 2013/01/24 Author: Chenhao Tan, Evgeniy Gabrilovich, Bo Pang Source:
1 The EigenRumor Algorithm for Ranking Blogs Advisor: Hsin-Hsi Chen Speaker: Sheng-Chung Yen ( 嚴聖筌 )
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
CiteData: A New Multi-Faceted Dataset for Evaluating Personalized Search Performance CIKM’10 Advisor : Jia-Ling, Koh Speaker : Po-Hsien, Shih.
1 What’s New on the Web? The Evolution of the Web from a Search Engine Perspective A. Ntoulas, J. Cho, and C. Olston, the 13 th International World Wide.
Text Based Information Retrieval
WSRec: A Collaborative Filtering Based Web Service Recommender System
HITS Hypertext-Induced Topic Selection
Methods and Apparatus for Ranking Web Page Search Results
Introduction to Web Mining
A Comparative Study of Link Analysis Algorithms
Improved Algorithms for Topic Distillation in a Hyperlinked Environment (ACM SIGIR ‘98) Ruey-Lung, Hsiao Nov 23, 2000.
Junghoo “John” Cho UCLA
CS 345A Data Mining Lecture 1
CS 345A Data Mining Lecture 1
Introduction to Web Mining
CS 345A Data Mining Lecture 1
Presentation transcript:

1 Discovering Authorities in Question Answer Communities by Using Link Analysis Pawel Jurczyk, Eugene Agichtein (CIKM 2007)

2 Introduction  QA portals allowing users to answer questions posted by others are rapidly growing in popularity.  The reason is that people can share their knowledge, and can find answers for both common and unique questions.  Some of these information needs are too specific to formulate as web search queries, or the content simply does not exist on the web.

3 Introduction  Unfortunately, the quality of the submitted answers is uneven, ranging from excellent detailed answers to snappy and insulting remarks or even advertisements for commercial content.  Therefore, it is increasingly important to better understand the issues of authority and trust in such communities.

4 Introduction  QA portals provide many mechanisms for community feedback.  When a question author chooses a best answer, he or she can provide a “quality” rating.  Another measure of quality of answer are the “thumbs up” and “thumbs down” votes.  Such community feedback is extremely valuable, but requires some time to accumulate.

5 Introduction  It becomes important to estimate the authority of users without exclusively relying on user feedback.  In particular, we attempt to discover authoritative users for specific question categories by analyzing the link structure of the community.

6 Related Work  The HITS algorithm is based on the observation that there are two types of pages: hub: links to authoritative pages. authority: source of information on given topic.

7  HITS assigns each page two scores: hub and authority value. Hub value represents the quality of outgoing links from the page. Authority value represents the quality of information located on that page. Related Work

8 Authority in QA Portals Q : hub A : authority

9 Authority in QA Portals  H(i) is the hub value of each user i from set of users 0..K posting questions  A(j) is the authority value of each user j from set of users 0..M posting answers.  The vectors H and A are initialized to all 0 and 1.

10 Experimental Setup  We observe that authoritative users tend to post answers that are popular or, obtain high ratings from the original question posters.

11 Experimental Setup  3 possible “gold standard” quality scores: Votes: number of positive votes minus negative votes combined with total percent of positive votes an author received from other users, averaged over all answers attempted. %Best: the fraction of best answers awarded to an author by asker over all answers attempted. Ratings: the number of stars an author obtains when their answer is selected as the “best answer” by the asker, averaged over all answers attempted.

12 Experimental Setup  Pearson correlation coefficient:  x : the ranks of users according to our authority estimation method.  y : the ranks of users according to the scores derived from the user feedback.

13 Experimental Setup  Methods compared: HITS: Our method. Degree (Baseline): Frequent posters tend to have significant interest invested in the topic, and degree of a node correlates with answer quality.

14 Experimental Setup

15 Experimental Results

16 Experimental Results

17 Experimental Results

18 Experimental Results

19 Experimental Results  We found that the goodness of fit (R 2 ) of the power law trendline correlates inversely with HITS performance on predicting the %Best scores for each authority.  The more a category graph distribution deviates form the power law, the better HITS authority scores correlate with user feedback.

20 Experimental Results  In the case of Government and Engineering, we can distinguish smaller groups (communities) which appear around subtopics.  In this case, HITS can successfully find expert users.

21 Experimental Results

22 Conclusions  QA portals are a rapidly emerging alternative to web search that exhibit dynamics and structures different from the traditional static Web.  In this paper, we presented an adaptation of the HITS algorithm for predicting experts in QA portals.  We performed a large scale empirical evaluation of this method, demonstrating its effectiveness for discovering authorities in topical categories.