A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong, Brian D. Davison Lehigh University (SIGIR ’ 09) Speaker: Cho,

Slides:



Advertisements
Similar presentations
Document Summarization using Conditional Random Fields Dou Shen, Jian-Tao Sun, Hua Li, Qiang Yang, Zheng Chen IJCAI 2007 Hao-Chin Chang Department of Computer.
Advertisements

Psychological Advertising: Exploring User Psychology for Click Prediction in Sponsored Search Date: 2014/03/25 Author: Taifeng Wang, Jiang Bian, Shusen.
Incorporating Participant Reputation in Community-driven Question Answering Systems Liangjie Hong, Zaihan Yang and Brian D. Davison Computer Science and.
1.Accuracy of Agree/Disagree relation classification. 2.Accuracy of user opinion prediction. 1.Task extraction performance on Bing web search log with.
Toward Whole-Session Relevance: Exploring Intrinsic Diversity in Web Search Date: 2014/5/20 Author: Karthik Raman, Paul N. Bennett, Kevyn Collins-Thompson.
Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.
A New Suffix Tree Similarity Measure for Document Clustering Hung Chim, Xiaotie Deng City University of Hong Kong WWW 2007 Session: Similarity Search April.
Explorations in Tag Suggestion and Query Expansion Jian Wang and Brian D. Davison Lehigh University, USA SSM 2008 (Workshop on Search in Social Media)
Search Engines and Information Retrieval
ADVISE: Advanced Digital Video Information Segmentation Engine
Expertise Networks in Online Communities: Structure and Algorithms Jun Zhang Mark S. Ackerman Lada Adamic University of Michigan WWW 2007, May 8–12, 2007,
Retrieval Evaluation. Brief Review Evaluation of implementations in computer science often is in terms of time and space complexity. With large document.
1 QA in Discussion Boards  Companies (e.g., Dell, IBM) use discussion boards as ways for customers to get answers to their questions  90% of 40 analyzed.
Overview of Search Engines
Knowledge Science & Engineering Institute, Beijing Normal University, Analyzing Transcripts of Online Asynchronous.
WEB FORUM MINING BASED ON USER SATISFACTION PAGE 1 WEB FORUM MINING BASED ON USER SATISFACTION By: Suresh Pokharel Information and Communications Technologies.
Computational Methods to Vocalize Arabic Texts H. Safadi*, O. Al Dakkak** & N. Ghneim**
1 Opinion Spam and Analysis (WSDM,08)Nitin Jindal and Bing Liu Date: 04/06/09 Speaker: Hsu, Yu-Wen Advisor: Dr. Koh, Jia-Ling.
Web Usage Mining with Semantic Analysis Date: 2013/12/18 Author: Laura Hollink, Peter Mika, Roi Blanco Source: WWW’13 Advisor: Jia-Ling Koh Speaker: Pei-Hao.
Search Engines and Information Retrieval Chapter 1.
Tag Clouds Revisited Date : 2011/12/12 Source : CIKM’11 Speaker : I- Chih Chiu Advisor : Dr. Koh. Jia-ling 1.
Leveraging Conceptual Lexicon : Query Disambiguation using Proximity Information for Patent Retrieval Date : 2013/10/30 Author : Parvaz Mahdabi, Shima.
Date: 2012/10/18 Author: Makoto P. Kato, Tetsuya Sakai, Katsumi Tanaka Source: World Wide Web conference (WWW "12) Advisor: Jia-ling, Koh Speaker: Jiun.
1 Entity Discovery and Assignment for Opinion Mining Applications (ACM KDD 09’) Xiaowen Ding, Bing Liu, Lei Zhang Date: 09/01/09 Speaker: Hsu, Yu-Wen Advisor:
1 Retrieval and Feedback Models for Blog Feed Search SIGIR 2008 Advisor : Dr. Koh Jia-Ling Speaker : Chou-Bin Fan Date :
Detecting Semantic Cloaking on the Web Baoning Wu and Brian D. Davison Lehigh University, USA WWW 2006.
Question Answering.  Goal  Automatically answer questions submitted by humans in a natural language form  Approaches  Rely on techniques from diverse.
When Experts Agree: Using Non-Affiliated Experts To Rank Popular Topics Meital Aizen.
CIKM’09 Date:2010/8/24 Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen 1.
Exploring Online Social Activities for Adaptive Search Personalization CIKM’10 Advisor : Jia Ling, Koh Speaker : SHENG HONG, CHUNG.
A Probabilistic Graphical Model for Joint Answer Ranking in Question Answering Jeongwoo Ko, Luo Si, Eric Nyberg (SIGIR ’ 07) Speaker: Cho, Chin Wei Advisor:
Mining the Web to Create Minority Language Corpora Rayid Ghani Accenture Technology Labs - Research Rosie Jones Carnegie Mellon University Dunja Mladenic.
A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong and Brian D. Davison Computer Science and Engineering Lehigh University.
Interactive Probabilistic Search for GikiCLEF Ray R Larson School of Information University of California, Berkeley Ray R Larson School of Information.
Retrieval Models for Question and Answer Archives Xiaobing Xue, Jiwoon Jeon, W. Bruce Croft Computer Science Department University of Massachusetts, Google,
CS 533 Information Retrieval Systems.  Introduction  Connectivity Analysis  Kleinberg’s Algorithm  Problems Encountered  Improved Connectivity Analysis.
Binxing Jiao et. al (SIGIR ’10) Presenter : Lin, Yi-Jhen Advisor: Dr. Koh. Jia-ling Date: 2011/4/25 VISUAL SUMMARIZATION OF WEB PAGES.
Probabilistic Models of Novel Document Rankings for Faceted Topic Retrieval Ben Cartrette and Praveen Chandar Dept. of Computer and Information Science.
Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.
Gao Cong, Long Wang, Chin-Yew Lin, Young-In Song, Yueheng Sun SIGIR’08 Speaker: Yi-Ling Tai Date: 2009/02/09 Finding Question-Answer Pairs from Online.
LOGO 1 Corroborate and Learn Facts from the Web Advisor : Dr. Koh Jia-Ling Speaker : Tu Yi-Lang Date : Shubin Zhao, Jonathan Betz (KDD '07 )
2005/12/021 Fast Image Retrieval Using Low Frequency DCT Coefficients Dept. of Computer Engineering Tatung University Presenter: Yo-Ping Huang ( 黃有評 )
1 Blog site search using resource selection 2008 ACM CIKM Advisor : Dr. Koh Jia-Ling Speaker : Chou-Bin Fan Date :
Finding Experts Using Social Network Analysis 2007 IEEE/WIC/ACM International Conference on Web Intelligence Yupeng Fu, Rongjing Xiang, Yong Wang, Min.
Automatic Video Tagging using Content Redundancy Stefan Siersdorfer 1, Jose San Pedro 2, Mark Sanderson 2 1 L3S Research Center, Germany 2 University of.
A Word Clustering Approach for Language Model-based Sentence Retrieval in Question Answering Systems Saeedeh Momtazi, Dietrich Klakow University of Saarland,Germany.
Using a Named Entity Tagger to Generalise Surface Matching Text Patterns for Question Answering Mark A. Greenwood and Robert Gaizauskas Natural Language.
Date: 2015/11/19 Author: Reza Zafarani, Huan Liu Source: CIKM '15
Effective Automatic Image Annotation Via A Coherent Language Model and Active Learning Rong Jin, Joyce Y. Chai Michigan State University Luo Si Carnegie.
Date: 2013/10/23 Author: Salvatore Oriando, Francesco Pizzolon, Gabriele Tolomei Source: WWW’13 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang SEED:A Framework.
Liangjie Hong and Brian D. Davison Department of Computer Science and Engineering Lehigh University SIGIR 2009.
Date: 2012/08/21 Source: Zhong Zeng, Zhifeng Bao, Tok Wang Ling, Mong Li Lee (KEYS’12) Speaker: Er-Gang Liu Advisor: Dr. Jia-ling Koh 1.
Date: 2012/11/29 Author: Chen Wang, Keping Bi, Yunhua Hu, Hang Li, Guihong Cao Source: WSDM’12 Advisor: Jia-ling, Koh Speaker: Shun-Chen, Cheng.
1 Adaptive Subjective Triggers for Opinionated Document Retrieval (WSDM 09’) Kazuhiro Seki, Kuniaki Uehara Date: 11/02/09 Speaker: Hsu, Yu-Wen Advisor:
Query Suggestions in the Absence of Query Logs Sumit Bhatia, Debapriyo Majumdar,Prasenjit Mitra SIGIR’11, July 24–28, 2011, Beijing, China.
Date: 2013/4/1 Author: Jaime I. Lopez-Veyna, Victor J. Sosa-Sosa, Ivan Lopez-Arevalo Source: KEYS’12 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang KESOSD.
Date: 2012/5/28 Source: Alexander Kotov. al(CIKM’11) Advisor: Jia-ling, Koh Speaker: Jiun Jia, Chiou Interactive Sense Feedback for Difficult Queries.
A Novel Relational Learning-to- Rank Approach for Topic-focused Multi-Document Summarization Yadong Zhu, Yanyan Lan, Jiafeng Guo, Pan Du, Xueqi Cheng Institute.
CONTEXTUAL SEARCH AND NAME DISAMBIGUATION IN USING GRAPHS EINAT MINKOV, WILLIAM W. COHEN, ANDREW Y. NG SIGIR’06 Date: 2008/7/17 Advisor: Dr. Koh,
11 A Classification-based Approach to Question Routing in Community Question Answering Tom Chao Zhou 1, Michael R. Lyu 1, Irwin King 1,2 1 The Chinese.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
Exploring Traversal Strategy for Web Forum Crawling Yida Wang, Jiang-Ming Yang, Wei Lai, Rui Cai Microsoft Research Asia, Beijing SIGIR
A Framework to Predict the Quality of Answers with Non-Textual Features Jiwoon Jeon, W. Bruce Croft(University of Massachusetts-Amherst) Joon Ho Lee (Soongsil.
LOGO Comments-Oriented Blog Summarization by Sentence Extraction Meishan Hu, Aixin Sun, Ee-Peng Lim (ACM CIKM’07) Advisor : Dr. Koh Jia-Ling Speaker :
Finding similar items by leveraging social tag clouds Speaker: Po-Hsien Shih Advisor: Jia-Ling Koh Source: SAC 2012’ Date: October 4, 2012.
CiteData: A New Multi-Faceted Dataset for Evaluating Personalized Search Performance CIKM’10 Advisor : Jia-Ling, Koh Speaker : Po-Hsien, Shih.
QUERY-PERFORMANCE PREDICTION: SETTING THE EXPECTATIONS STRAIGHT Date : 2014/08/18 Author : Fiana Raiber, Oren Kurland Source : SIGIR’14 Advisor : Jia-ling.
Queensland University of Technology
Date : 2013/1/10 Author : Lanbo Zhang, Yi Zhang, Yunfei Chen
A Classification-based Approach to Question Routing in Community Question Answering Tom Chao Zhou 22, Feb, 2010 Department of Computer.
Presentation transcript:

A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong, Brian D. Davison Lehigh University (SIGIR ’ 09) Speaker: Cho, Chin Wei Advisor: Dr. Koh, Jia-Ling Date: 09/16/2009

Outline  Introduction  Problem Definition  Classification Methods  Experiment  Conclusion

Introduction  Discussion boards and online forums are important platforms for people to share information. Customer support Community development Interactive reporting and online education  People would like to use discussion boards as problem-solving platforms.

Introduction  Users post questions, usually related to some specific problem, and rely on others to provide potential answers.  90% of 40 discussion boards contain question-answering knowledge.  Question answering content is usually the largest type of content on discussion boards in terms of the number of user- generated posts.

Introduction  Therefore, mining such content becomes desirable and valuable: Search engines can enhance search quality for question or problem related queries by providing answers mined from discussion boards Online QA services such as Yahoo! Answers would benefit from using content extracted from discussion boards Finding experts in social media by utilizing question answering content Automatic chat-bots

Introduction  However, to retrieve this kind of information automatically and effectively is still a non-trivial task.  The existence of other types of information (e.g., announcements, plans, elaborations, etc.) makes it difficult to assume that every thread in a discussion board is about a question. If the first post is about a question, following posts may contain similar experiences and potential solutions. If the first post is an announcement, following posts may contain clarifications, elaborations and acknowledgments.

Introduction  Our Goal: Identifying question-related first posts and finding potential answers in subsequent responses within the corresponding threads. Can we detect question-related threads in an efficient and effective manner? Can we effectively discover potential answers without actually analyzing the content of replied posts? Can this task be treated as a traditional information retrieval problem suitable to a relevance-based approach to the retrieval of question-answering content?

Problem Definition  A discussion board is a collection of threads.  Each thread consists of the first post and following replied posts.

Problem Definition  Questions  If the first post of one thread is about a specific problem that needs to be solved, we would consider that post as a whole to be a question post.  We do not focus on identifying “ question sentences ” or “ question paragraphs ” but instead to find whether the first post is a “ question post ”.

Problem Definition  For example, the following paragraph is a question post from UbuntuForums.org

Problem Definition  If there are multiple questions discussed in the first post, the interaction in following replied posts might become complex. Users may answer all those questions while others may only response to some of them  To simplify the task, we treat it as a single question post.

Problem Definition  Answers  If one of the replied posts contains answers to the questions proposed in the first post, we regard that reply as an answer post.  We do not consider the number of answers should match the number of questions. We only consider those replies that directly answer the questions from the first post.  We also consider replied posts not containing the actual content of answers but providing links to other answers as answer posts.

Problem Definition  Our task is: To detect whether the first post is a “ question post ” containing at least one problem needed to be solved. If the first post is a “ question post ”, try to identify the best answer post either directly answering at least one question proposed in the first post or pointing to other potential answer sources.

Classification Methods  We choose several content-based and non-content based features and carefully compare them individually and also in combinations.  We do not use any service- or dataset- specific heuristics or features (like the rank of users) in our classification model

Classification Methods  Question Detection  Question mark the number of question marks  5W1H Words the number of each 5W1H words  Total number of posts within one thread If one thread has many posts, either the topic of the thread probably shifts or the original first post may not contain enough information and hence further clarifications or elaborations are needed

Classification Methods  Authorship the number of posts one user starts and the number of posts one user replies We consider high quality contents to be answers and highly authoritative authors are users who usually answer others ’ questions Fresh users are more likely to post questions rather than answering questions  N-gram

Classification Methods  Answer Detection  The position of the answer post Answer post usually appears not very close to the bottom if the question receives a lot of replies  Authorship  N-gram

Classification Methods  Stop words the count of each stop word We want to see whether the author of answer posts would use more detailed and precise words rather than “ stop words ”, in contrast to other types of posts such as elaborations, suggestions and acknowledgment.  Query Likelihood Model Score (Language Model) We use this basic language model method to calculate the likelihood that a replied post is relevant to the original question post. We use this feature as an example to show how a relevance based model performs in the task.

Experiment  Data and Experiment Settings 721,442 threads from Photography On The Net, a digital camera forum (DC dataset) 555,954 threads from UbuntuForums, an Ubuntu Linux community forum (Ubuntu dataset) Randomly sampled 572 threads from the Ubuntu dataset and 500 threads from the DC Dataset Manually labeled all posts in these threads into question(answer) posts and non-question(non- answer) posts

Experiment  Question Detection

Experiment  N-grams and Sequential Pattern Mining (which requires a POS tagger) are relatively complicated methods, the computational effort may be impractical for large datasets.  We do further experiments on the combinations of those simple methods and see whether the performance can improve.

Experiment

 Answer Detection

Experiment

Conclusion  N-grams and the combination of several non- content features can improve the performance of detecting question-related threads in discussion boards.  The number of posts a user starts and the number of replies produced and their positions are two crucial factors in determining potential answers.  Relevance-based retrieval methods would not be effective in tackling the problem of finding possible answers.  We are able to design a simple ranking scheme that outperforms previous approaches.