Knowledge Base Completion via Search-Based Question Answering

Date: 2014/10/23
Author: Robert West, Evgeniy Gabrilovich, Kevin Murphy, Shaohua Sun, Rahul Gupta, Dekang Lin
Source: WWW'14
Advisor: Jia-ling Koh
Speaker: Sz-Han, Wang

Outline
Introduction
Method
  Offline training
  KB Completion
Experiment
Conclusion

Introduction: Motivation
Large-scale knowledge bases (KBs) such as Freebase, NELL, and YAGO contain a wealth of valuable information, stored in the form of RDF triples (subject, relation, object).
Despite their size, these knowledge bases are still woefully incomplete in many ways.
[Figure: incompleteness of Freebase for some relations that apply to entities of type PERSON]

Introduction: Goal and Problem
Goal: propose a way to leverage existing Web-search-based question-answering technology to fill in the gaps in knowledge bases in a targeted way.
Problem: which questions should be issued to the QA system?
Example 1: the birthplace of the musician Frank Zappa
  "where does Frank Zappa come from?"
  "where was Frank Zappa born?" → more effective
Example 2: Frank Zappa's mother
  "who is the mother of Frank Zappa?" → "The Mothers of Invention" (incorrect)
  "who is the mother of Frank Zappa Baltimore?" → "Rose Marie Colimore" (correct)

Framework
Input: subject-relation pairs, e.g. (FRANK ZAPPA, PARENTS)
Output: previously unknown objects, e.g. (ROSE MARIE COLIMORE, …)
Query templates: "___ mother", "parents of ___"

Offline training: mining lexicalization templates
Construct query templates: (lexicalization template, augmentation template)
Mine lexicalization templates from search logs by keeping a count for each relation-template pair (R, q). For each logged query:
  1. Named-entity recognition
     Query: parents of Frank Zappa → Entity S: Frank Zappa
  2. Replace the entity S in the query with a placeholder
     Template q: parents of ___
  3. Run the QA system and get the answer entity
     Answer: "… Francis Zappa." → Entity A: Francis Zappa
  4. If (S, A) is linked by a relation R in the KB, increase the count of (R, q)
     R: PARENTS → (PARENTS, parents of _) +1
Resulting counts:
  (Relation, Template)                  Count
  (PARENTS, _ mother)                   10
  (PARENTS, parents of _)               20
  (PLACE OF BIRTH, where is _ born)     15
  …
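
The counting loop on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy KB, the entity list, the canned QA answers, and the function names (`find_entity`, `mine_templates`) are all hypothetical stand-ins for a real NER tagger, QA system, and knowledge base.

```python
from collections import Counter

# Hypothetical toy data standing in for a real KB, NER tagger and QA system.
KB = {("Frank Zappa", "Francis Zappa"): "PARENTS"}  # (subject, object) -> relation
KNOWN_ENTITIES = ["Frank Zappa"]
QA_ANSWERS = {"parents of Frank Zappa": "Francis Zappa"}


def find_entity(query):
    """Toy named-entity recognition: first known entity found in the query."""
    for entity in KNOWN_ENTITIES:
        if entity in query:
            return entity
    return None


def mine_templates(query_log):
    """Count (relation, template) pairs supported by the query log."""
    counts = Counter()
    for query in query_log:
        subject = find_entity(query)
        if subject is None:
            continue
        # Replace the entity with a placeholder to obtain the template.
        template = query.replace(subject, "_")
        answer = QA_ANSWERS.get(query)          # toy QA lookup
        relation = KB.get((subject, answer))    # is (S, A) linked in the KB?
        if relation is not None:
            counts[(relation, template)] += 1
    return counts


log = ["parents of Frank Zappa", "parents of Frank Zappa", "Frank Zappa tour dates"]
counts = mine_templates(log)
print(counts)
```

Queries whose answer is not linked to the subject in the KB (such as the third one in the toy log) contribute nothing to the counts.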

Offline training: query augmentation and manual screening
Construct query templates: (lexicalization template, augmentation template)
Query augmentation
  Attach extra words to a query as query augmentation.
  An augmentation template specifies a property (relation) whose value is substituted into the query.
Manual template screening
  Select 10 lexicalization templates from the top candidates found by the log mining.
  Select 10 augmentation templates from the relations pertaining to the subject type:
    PROFESSION, PARENTS, PLACE OF BIRTH, CHILDREN, NATIONALITY, SIBLINGS, EDUCATION, ETHNICITY, SPOUSES, [no augmentation]
Example
  Subject-relation pair: (Frank Zappa, PARENTS)
  Lexicalization template: ___ mother
  Augmentation template: PLACE OF BIRTH → Baltimore
  Query: Frank Zappa mother Baltimore
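
As a rough sketch of how a lexicalization template and an augmentation template combine into a query string: the function name `build_query`, the `_` placeholder, and the dictionary-shaped KB below are assumptions made for illustration only.

```python
def build_query(subject, lexicalization, augmentation_relation, kb):
    """Fill the subject into the template, then append the augmentation value.

    `kb` is a hypothetical mapping (subject, relation) -> known value.
    `augmentation_relation` is None for the "[no augmentation]" option.
    """
    query = lexicalization.replace("_", subject)
    if augmentation_relation is None:
        return query
    value = kb.get((subject, augmentation_relation))
    # If the KB has no value for the augmentation relation, fall back to the plain query.
    return f"{query} {value}" if value else query


kb = {("Frank Zappa", "PLACE OF BIRTH"): "Baltimore"}
print(build_query("Frank Zappa", "_ mother", "PLACE OF BIRTH", kb))
# prints: Frank Zappa mother Baltimore
```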

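KB Completion: query template selection
Candidates: 10 lexicalization templates × 10 augmentation templates = 100 query templates
Given a heatmap of query quality (MRR per query template), convert the heatmap to a probability distribution:
  $\Pr(q) \propto \exp(r \cdot \mathrm{MRR}(q))$
Strategies
  Greedy (r = ∞): always pick the templates with the highest MRR
  Random (r = 0): pick templates uniformly at random
Sample query templates without replacement from the 100 candidates.
Dangers of asking too many queries!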
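
A minimal sketch of this selection step, assuming the templates are drawn one at a time and the distribution is renormalized after each draw (the slide does not spell out the exact sampling procedure). The function name `sample_templates` and the toy MRR values are hypothetical.

```python
import math
import random


def sample_templates(mrr, n, r, rng=None):
    """Pick n distinct templates with Pr(q) proportional to exp(r * MRR(q)).

    mrr: dict mapping template -> MRR estimate (toy values in this sketch).
    r = 0 gives uniform random choice; r = inf gives the greedy top-n.
    """
    rng = rng or random.Random()
    if math.isinf(r):
        return sorted(mrr, key=mrr.get, reverse=True)[:n]
    remaining = dict(mrr)
    chosen = []
    while remaining and len(chosen) < n:
        keys = list(remaining)
        top = max(remaining.values())
        # Subtracting the maximum keeps exp() from overflowing for large r.
        weights = [math.exp(r * (remaining[k] - top)) for k in keys]
        pick = rng.choices(keys, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]
    return chosen


mrr = {"_ mother": 0.9, "parents of _": 0.5, "who is _ mom": 0.1}
print(sample_templates(mrr, 2, math.inf))          # greedy
print(sample_templates(mrr, 2, 0.0, random.Random(0)))  # uniform
```

Intermediate values of r trade off between exploiting the best-known templates and diversifying the set of queries asked.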

KB Completion: question answering
Use an in-house QA system.
Query analysis
  Find the head phrase of the query.
  Query: Frank Zappa mother
Web search
  Retrieve the top n result snippets from the search engine.

KB Completion: question answering (continued)
Snippet analysis: score each phrase in the result snippets
  score(Rose Marie Colimore) = w1*f1 + w2*f2 + w3*f3 + w4*f4 + w5*f5 + …
  f1: rank of the snippet
  f2: is a noun phrase
  f3: IDF
  f4: closeness to the query terms
  f5: relatedness to the head phrase
  …
  Example feature values for "Rose Marie Colimore": 1, 0.3, 0.8, 0.9, …
Phrase aggregation: compute an aggregate score for each distinct phrase
  score(Rose Marie Colimore) = w1*f1 + w2*f2 + w3*f3 + …
  f1: number of times the phrase appears
  f2: average of the per-mention values
  f3: maximum of the per-mention values
  …
  Example for "Rose Marie Colimore" (two mentions with values 60 and 70): f1 = 2, f2 = (60+70)/2 = 65, f3 = 70
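
The two scoring stages share the same linear form, which can be sketched as below. The weights are made-up numbers for illustration; the slide does not give the actual weights or the full feature list.

```python
from statistics import mean


def linear_score(features, weights):
    """Weighted sum w1*f1 + w2*f2 + ... over paired features and weights."""
    return sum(w * f for w, f in zip(weights, features))


def aggregate_features(mention_scores):
    """Aggregate features of one distinct phrase: count, average, maximum."""
    return [len(mention_scores), mean(mention_scores), max(mention_scores)]


# Two mentions of "Rose Marie Colimore" with per-mention scores 60 and 70.
features = aggregate_features([60, 70])
weights = [1.0, 0.5, 0.5]  # hypothetical weights
total = linear_score(features, weights)
print(features, total)
```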

KB Completion: answer resolution
Entity linking
  Take into account the lexical context of each mention.
  Take into account other entities near the given mention.
  Answer string: Gail → GAIL
  Context: "Zappa married his wife Gail" → GAIL ZAPPA
Discard incorrectly typed answer entities
  Relation: PARENTS → Type: Person
  Entity                      Type      Kept?
  THE MOTHERS OF INVENTION    Music     X (discarded)
  RAY COLLINS                 Person    kept
  MUSICAL ENSEMBLE            …         X (discarded)
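
The type filter can be illustrated with a short sketch. The two lookup tables are hypothetical toy data; in a real system the expected type would come from the KB schema and the entity types from the KB itself.

```python
# Hypothetical lookups: expected object type per relation, and known entity types.
EXPECTED_TYPE = {"PARENTS": "Person"}
ENTITY_TYPE = {
    "THE MOTHERS OF INVENTION": "Music",
    "RAY COLLINS": "Person",
}


def filter_by_type(relation, candidates):
    """Keep only candidate entities whose type matches the relation's expected type."""
    expected = EXPECTED_TYPE[relation]
    return [e for e in candidates if ENTITY_TYPE.get(e) == expected]


kept = filter_by_type("PARENTS", ["THE MOTHERS OF INVENTION", "RAY COLLINS"])
print(kept)  # ['RAY COLLINS']
```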

KB Completion: answer aggregation and answer calibration
Answer aggregation: merge the answer rankings of all queries into a single ranking
  Compute an entity's aggregate score as the mean of its ranking-specific scores:
  $s_E = \frac{1}{N_R} \sum_{i=1}^{N_R} S_i(E)$
  Example: entity FRANCIS ZAPPA, $N_R = 4$, $S_2(E) = 51$, $S_4(E) = 49$; the entity is absent from the other two rankings, which contribute 0
  score(FRANCIS ZAPPA) = (51 + 49)/4 = 25
Answer calibration: turn the scores into probabilities
  Apply logistic regression.
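
A small sketch of both steps, under the assumption (implied by the slide's arithmetic) that a ranking in which the entity does not appear contributes a score of 0. The logistic parameters `a` and `b` are hypothetical; in practice they would be fitted by logistic regression on labeled data.

```python
import math


def aggregate_score(entity, rankings):
    """Mean of the entity's ranking-specific scores; missing entries count as 0."""
    return sum(r.get(entity, 0.0) for r in rankings) / len(rankings)


def calibrate(score, a, b):
    """Map an aggregate score to a probability with a logistic function."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


# Four rankings (one per query); FRANCIS ZAPPA appears in the 2nd and 4th.
rankings = [{}, {"FRANCIS ZAPPA": 51}, {}, {"FRANCIS ZAPPA": 49}]
score = aggregate_score("FRANCIS ZAPPA", rankings)
prob = calibrate(score, a=0.1, b=-2.0)  # hypothetical parameters
print(score, prob)
```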

Experiment: training and test data, ranking metrics
Training and test data
  Type: PERSON
  Relations: PROFESSION, PARENTS, PLACE OF BIRTH, CHILDREN, NATIONALITY, SIBLINGS, EDUCATION, ETHNICITY, SPOUSES
  Start from the 100,000 most frequently searched-for persons.
  Divide them into 100 percentiles and randomly sample 10 subjects per percentile → 1,000 subjects per relation
Ranking metrics
  MRR (mean reciprocal rank)
  MAP (mean average precision)
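
The percentile-based sampling could look roughly like this. It is a sketch under assumptions: the subjects are given as a list already sorted by search frequency, the list length is divisible by the number of bins, and the function name and seed are invented for the example.

```python
import random


def sample_subjects(ranked_subjects, n_bins=100, per_bin=10, seed=0):
    """Split a popularity-ranked list into equal bins and sample from each bin."""
    rng = random.Random(seed)
    bin_size = len(ranked_subjects) // n_bins
    sample = []
    for b in range(n_bins):
        bucket = ranked_subjects[b * bin_size:(b + 1) * bin_size]
        sample.extend(rng.sample(bucket, per_bin))  # without replacement within a bin
    return sample


# Placeholder names standing in for the 100,000 most frequently searched persons.
subjects = [f"person_{i}" for i in range(100_000)]
sample = sample_subjects(subjects)
print(len(sample))  # 1000
```

Sampling per percentile keeps both very popular and rarely searched subjects in the evaluation set.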

Experiment: quality of answer ranking
[Figure: quality of answer ranking]

Experiment: quality of answer calibration
[Figure: quality of answer calibration]

Experiment: number of high-quality answers
[Figure: number of high-quality answers]

Conclusion
The paper presents a method for filling gaps in a knowledge base.
The method uses a question-answering system, which in turn takes advantage of mature Web-search technology to retrieve relevant and up-to-date text passages from which answer candidates are extracted.
The experiments show empirically that choosing the right queries, without choosing too many, is crucial.
For several relations, the system makes a large number of high-confidence predictions.

Ranking metrics
MRR (mean reciprocal rank)
  $RR_i = \frac{1}{r_i}$
  $\mathrm{MRR} = \frac{1}{n} \sum_{i=1}^{n} RR_i$
  Example: MRR = (1/3 + 1/2 + 1)/3 = 0.61
MAP (mean average precision)
  $AP_i = \frac{1}{m} \sum_{j=1}^{m} P_j$
  $\mathrm{MAP} = \frac{1}{n} \sum_{i=1}^{n} AP_i$
  Example: MAP = (0.57 + 0.83 + 0.4)/3 = 0.6
  Query   Average Precision
  Q1      0.57
  Q2      0.83
  Q3      0.4
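
Both metrics are easy to check numerically. The sketch below assumes that r_i is the rank of the first correct answer for query i and that P_j is the precision at the rank of the j-th relevant result; the function names are illustrative.

```python
def mean_reciprocal_rank(first_correct_ranks):
    """MRR over queries, given the rank of the first correct answer per query."""
    return sum(1.0 / r for r in first_correct_ranks) / len(first_correct_ranks)


def average_precision(relevance):
    """AP for one ranked list of 0/1 relevance labels."""
    hits, precisions = 0, []
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)  # precision at this relevant rank
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(ap_values):
    """MAP as the mean of per-query average precision values."""
    return sum(ap_values) / len(ap_values)


print(round(mean_reciprocal_rank([3, 2, 1]), 2))            # 0.61
print(round(mean_average_precision([0.57, 0.83, 0.4]), 2))  # 0.6
```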