A Word Clustering Approach for Language Model-based Sentence Retrieval in Question Answering Systems. Saeedeh Momtazi, Dietrich Klakow, Saarland University, Germany.

Presentation transcript:

A Word Clustering Approach for Language Model-based Sentence Retrieval in Question Answering Systems Saeedeh Momtazi, Dietrich Klakow, Saarland University, Germany (CIKM '09) Advisor: Dr. Koh, Jia-Ling Speaker: Cho, Chin-Wei Date: 2010.03.01

Outline: Introduction, Term Clustering Model, Term Clustering Algorithm, N-gram Models, Experiments, Conclusion

Introduction Open-domain QA has become one of the most actively investigated topics in natural language processing. The user receives an exact answer to a question rather than being overwhelmed with a large number of retrieved documents that must then be sorted through to find the desired answer.

Introduction In a complete QA system, document retrieval is an important component that should provide a list of candidate documents to be analyzed by the rest of the system. Retrieved documents are much larger than the required answer, and topic changes typically occur within a single document. Relevant information is most often found in one or two sentences.

Introduction Hence, it is essential to split the text into smaller segments, such as sentences, and rank them in a sentence retrieval step. Retrieved sentences are then further processed using a variety of techniques to extract the final answers.

Introduction Simple word-based unigram: all terms are treated independently, no relationships between words are considered, and the search looks only for the exact literal words present in the query, so it fails to retrieve other relevant information.

Introduction Q: “Who invented the car?” A1: “Between 1832 and 1839, Robert Anderson of Scotland invented the first crude electric car carriage.” . . . A2: “Nicolas-Joseph Cugnot built the first self propelled mechanical vehicle.” A3: “An automobile powered by his own engine was built by Karl Benz in 1885 and granted a patent.”

Introduction For this reason, it is desirable to have a more sophisticated model that captures the semantics of sentences rather than just the term distributions. Improvements in system performance from such schemes have proven challenging because of the difficulty of estimating term relationships and the difficulty of integrating both exact matches and term relationships in a single weighting scheme.

Term Clustering Model We use a class-based LM, applying term clustering to capture term relationships. In LM-based sentence retrieval, the probability P(Q|S) of generating query Q conditioned on the observation of sentence S is first calculated. Sentences are then ranked in descending order of this probability.
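A minimal sketch of this ranking step in Python, assuming a Jelinek-Mercer smoothed word-based unigram sentence model; the function names, the smoothing choice, and the weight lam are illustrative assumptions, not taken from the paper:

import math
from collections import Counter

def unigram_score(query_terms, sentence_terms, collection_counts,
                  collection_size, lam=0.6):
    # log P(Q|S) under a Jelinek-Mercer smoothed unigram sentence model
    sent_counts = Counter(sentence_terms)
    sent_len = max(len(sentence_terms), 1)
    log_prob = 0.0
    for q in query_terms:
        p_sent = sent_counts[q] / sent_len
        p_coll = collection_counts.get(q, 0) / collection_size
        log_prob += math.log(lam * p_sent + (1.0 - lam) * p_coll + 1e-12)
    return log_prob

def rank_sentences(query_terms, sentences, collection_counts, collection_size):
    # rank candidate sentences (given as lists of terms) by descending P(Q|S)
    scored = [(unigram_score(query_terms, s, collection_counts, collection_size), s)
              for s in sentences]
    return sorted(scored, key=lambda x: x[0], reverse=True)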

Term Clustering Model For word-based unigrams, the probability of the query Q given the sentence S is estimated based on query terms:
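The equation on this slide is an image that does not survive in the transcript; the standard word-based unigram query-likelihood estimate has the form

P(Q \mid S) = \prod_{i=1}^{|Q|} P(q_i \mid S)

where q_1, ..., q_{|Q|} are the query terms and P(q_i | S) is estimated from the (smoothed) term counts of sentence S.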

Term Clustering Model For class-based unigrams, P(Q|S) is computed based on the cluster to which each query term belongs:
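This equation is likewise an image in the original slides; a common form of the class-based unigram, consistent with the description here, is

P(Q \mid S) = \prod_{i=1}^{|Q|} P(q_i \mid C_{q_i}) \, P(C_{q_i} \mid S)

where C_{q_i} denotes the cluster containing query term q_i. The exact factorization used in the paper may differ in detail.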

Term Clustering Model - Overview In the word-based model, the terms of each sentence are extracted. "Between 1832 and 1839, Robert Anderson of Scotland invented the first crude electric car carriage." -> "invented", "car", … Then the sentence unigram is created from these terms.

Term Clustering Model - Overview In the class-based model, by contrast, a term clustering technique is used to create a map between terms and clusters. In this case, instead of the terms themselves, the clusters to which the terms belong are extracted: ("car", "vehicle", "automobile"), ("built", "invented"). Then the sentence unigram is created from the clusters.
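A small sketch of this mapping step, assuming a precomputed word-to-cluster dictionary; the cluster names and the handling of unknown terms below are illustrative assumptions:

from collections import Counter

# hypothetical word-to-cluster map, e.g. produced by a term clustering algorithm
word2cluster = {"car": "C_VEHICLE", "vehicle": "C_VEHICLE", "automobile": "C_VEHICLE",
                "built": "C_CREATE", "invented": "C_CREATE"}

def sentence_cluster_unigram(sentence_terms, word2cluster):
    # replace each term by its cluster id and build the cluster unigram
    cluster_ids = [word2cluster.get(t, t) for t in sentence_terms]  # unknown terms keep their surface form
    counts = Counter(cluster_ids)
    length = max(len(cluster_ids), 1)
    return {c: n / length for c, n in counts.items()}

# the two example answers now share the C_CREATE and C_VEHICLE events
print(sentence_cluster_unigram(["invented", "car"], word2cluster))
print(sentence_cluster_unigram(["built", "vehicle"], word2cluster))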

Term Clustering Model - Overview Finally, P(Q|S) is computed based on the new model of S, and the sentences are ranked according to this probability.

Term Clustering Algorithm The Brown algorithm uses mutual information between cluster pairs in a bottom-up approach to maximize Average Mutual Information (AMI) between adjacent clusters.
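For reference, the AMI maximized by the Brown algorithm is the mutual information between adjacent cluster labels,

AMI = \sum_{c_1, c_2} P(c_1, c_2) \log \frac{P(c_1, c_2)}{P(c_1)\,P(c_2)}

where P(c_1, c_2) is the probability that a word in cluster c_1 is immediately followed by a word in cluster c_2.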

Term Clustering Algorithm
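The algorithm illustration on this slide is not reproduced in the transcript. Below is a deliberately simplified, toy greedy sketch in the spirit of Brown clustering (start with one cluster per word and repeatedly merge the pair of clusters whose merge keeps AMI highest); real implementations use incremental AMI updates and a merge window for efficiency, and all names here are illustrative:

import math
from collections import Counter
from itertools import combinations

def ami(bigram_counts, cluster_of):
    # average mutual information between adjacent clusters, estimated from bigram counts
    pair, left, right, total = Counter(), Counter(), Counter(), 0
    for (w1, w2), n in bigram_counts.items():
        c1, c2 = cluster_of[w1], cluster_of[w2]
        pair[(c1, c2)] += n
        left[c1] += n
        right[c2] += n
        total += n
    return sum((n / total) * math.log(n * total / (left[c1] * right[c2]))
               for (c1, c2), n in pair.items())

def greedy_cluster(bigram_counts, vocab, num_clusters):
    # bottom-up: one cluster per word, then repeatedly merge the AMI-maximizing pair
    cluster_of = {w: i for i, w in enumerate(vocab)}
    clusters = set(cluster_of.values())
    while len(clusters) > num_clusters:
        best = None
        for a, b in combinations(clusters, 2):
            trial = {w: (a if c == b else c) for w, c in cluster_of.items()}
            score = ami(bigram_counts, trial)
            if best is None or score > best[0]:
                best = (score, a, b)
        _, a, b = best
        cluster_of = {w: (a if c == b else c) for w, c in cluster_of.items()}
        clusters.remove(b)
    return cluster_of

# toy usage: bigram counts from a small corpus, clustered into 2 classes
# bigrams = Counter({("highly", "relevant"): 3, ("strongly", "relevant"): 2, ("jazz", "concert"): 4})
# greedy_cluster(bigrams, ["highly", "strongly", "jazz", "relevant", "concert"], 2)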

Term Clustering Algorithm Example clusters: ("significantly", "highly", "strongly", "fairly"), ("shipping", "carrying"), ("Jazz", "symphony", "theater"). For example, on seeing the term "highly" in a query, the class-based model is able to retrieve sentences that do not contain this term but are nonetheless relevant to the query, because the cluster containing "highly" also contains other terms such as "strongly".

Interpolation Model The class-based model may increase system recall, since it is able to find more relevant sentences, but it may also decrease system precision by retrieving more irrelevant sentences. We therefore use a linear interpolation of our new class-based model and the word-based model:
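The interpolation formula on the slide is an image; a typical term-level formulation of such an interpolation, stated here as a reconstruction rather than the slide's exact equation, is

P(q_i \mid S) = \lambda \, P_{\mathrm{class}}(q_i \mid S) + (1 - \lambda) \, P_{\mathrm{word}}(q_i \mid S)

with an interpolation weight 0 \le \lambda \le 1.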

N-gram Models The unigram (N=1) is most often used in information retrieval applications, but word dependencies are not considered. To overcome this problem, it is necessary to consider longer word contexts, such as those used in the bigram (N=2) and trigram (N=3) models.

N-gram Models It is very difficult to use higher-order n-grams for sentence retrieval: data sparsity at the sentence level is much more pronounced than at the document level, and the probability of seeing long patterns is very low. To overcome this problem, we applied class-based bigram and trigram models:
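The bigram and trigram equations are also slide images; a plausible class-based bigram form, reconstructed by analogy with the class-based unigram above and not guaranteed to match the slide, is

P(Q \mid S) = \prod_{i=1}^{|Q|} P(q_i \mid C_{q_i}) \, P(C_{q_i} \mid C_{q_{i-1}}, S)

with the trigram conditioning each cluster on the two preceding clusters, C_{q_{i-1}} and C_{q_{i-2}}. Working over clusters rather than surface words makes these longer contexts far less sparse at the sentence level.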

Experiments Data set: TREC 2007 questions from the QA track. A set of 1,960 sentences was used as input to our system. The lexicon contained nearly 10,000 terms, including all words present in the questions and the sentences to be processed.

Conclusion We found that the class-based model using term clustering is an effective approach for addressing the data sparsity, exact matching, and term independence problems. Linear interpolation of a class-based bigram and a word-based unigram was also investigated. Our results indicated a significant improvement in sentence retrieval mean average precision, from 23.62% to 29.91%.