Presentation on theme: "Knowledge Base Completion via Search-Based Question Answering"— Presentation transcript:
1 Knowledge Base Completion via Search-Based Question Answering
Date: 2014/10/23
Authors: Robert West, Evgeniy Gabrilovich, Kevin Murphy, Shaohua Sun, Rahul Gupta, Dekang Lin
Source: WWW'14
Advisor: Jia-ling Koh
Speaker: Sz-Han Wang
2 Outline
Introduction
Method: offline training, KB completion
Experiment
Conclusion
3 Introduction
Motivation:
Large-scale knowledge bases (KBs), e.g., Freebase, NELL, and YAGO, contain a wealth of valuable information, stored in the form of RDF triples (subject–relation–object).
Despite their size, these knowledge bases are still woefully incomplete in many ways.
[Figure: incompleteness of Freebase for some relations that apply to entities of type PERSON]
4 Introduction
Goal: leverage existing Web-search-based question-answering (QA) technology to fill the gaps in knowledge bases in a targeted way.
Problem: which questions should be issued to the QA system?
Example: the birthplace of the musician Frank Zappa
"where does Frank Zappa come from?"
"where was Frank Zappa born?" → more effective
Example: Frank Zappa's mother
"who is the mother of Frank Zappa?" → "The Mothers of Invention" (his band, not his mother)
"who is the mother of Frank Zappa Baltimore?" → "Rose Marie Colimore" → correct
5 Outline
Introduction
Method: offline training, KB completion
Experiment
Conclusion
6 Framework
Input: subject-relation pairs, e.g., (FRANK ZAPPA, PARENTS)
Output: previously unknown objects, e.g., (ROSE MARIE COLIMORE, …)
Query templates: "___ mother", "parents of ___"
7 Offline training
Construct query templates: (lexicalization template, augmentation template).
Mine lexicalization templates from search logs: count, for each relation-template pair (R, q̃), how often the pair co-occurs, e.g.:
(PARENTS, "___ mother"): 10
(PARENTS, "parents of ___"): 20
(PLACE OF BIRTH, "where is ___ born"): 15
…
Mining procedure for a logged query q:
1. Named-entity recognition on q: "parents of Frank Zappa" → entity S: Frank Zappa
2. Replace S in q with a placeholder → template q̃: "parents of ___"
3. Run the QA system on q → answer a: "… Francis Zappa." → answer entity A: Francis Zappa
4. If (S, A) is linked by a relation R in the KB (here R = PARENTS), increase the count of (R, q̃): (PARENTS, "parents of ___") +1
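The mining loop above can be sketched as follows. This is a minimal toy version, not the paper's pipeline: `find_entity` and `run_qa` are hypothetical stand-ins for the real NER and QA components, and the KB is a hard-coded set of triples.

```python
from collections import Counter

# Toy knowledge base of (subject, relation, object) triples.
KB = {("Frank Zappa", "PARENTS", "Francis Zappa")}

def mine_templates(query_log, find_entity, run_qa):
    """Count (relation, template) pairs as in the mining procedure.

    find_entity(text) -> an entity string found in the text (NER stand-in)
    run_qa(query)     -> the QA system's answer entity (QA stand-in)
    """
    counts = Counter()
    for q in query_log:
        s = find_entity(q)                    # step 1: NER, e.g. "Frank Zappa"
        if s is None or s not in q:
            continue
        template = q.replace(s, "___")        # step 2: "parents of ___"
        a = run_qa(q)                         # step 3: e.g. "Francis Zappa"
        for subj, rel, obj in KB:             # step 4: is (S, A) linked by R?
            if subj == s and obj == a:
                counts[(rel, template)] += 1  # increase count of (R, q~)
    return counts

counts = mine_templates(
    ["parents of Frank Zappa"],
    find_entity=lambda q: "Frank Zappa" if "Frank Zappa" in q else None,
    run_qa=lambda q: "Francis Zappa",
)
print(counts)  # Counter({('PARENTS', 'parents of ___'): 1})
```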
8 Offline training
Construct query templates: (lexicalization template, augmentation template).
Query augmentation: attach extra words to the query; an augmentation template specifies a property (relation) of the subject whose KB value is substituted in.
Manual template screening:
Select 10 lexicalization templates from the top candidates found by the log mining.
Select 10 augmentation templates from the relations pertaining to the subject type: PROFESSION, PARENTS, PLACE OF BIRTH, CHILDREN, NATIONALITY, SIBLINGS, EDUCATION, ETHNICITY, SPOUSES, [no augmentation]
Example:
Subject-relation pair: (Frank Zappa, PARENTS)
Lexicalization template: "___ mother"
Augmentation template: PLACE OF BIRTH → Baltimore
Query: "Frank Zappa mother Baltimore"
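Instantiating a (lexicalization, augmentation) template pair for the slide's example can be sketched like this; `build_query` and the inline KB are illustrative assumptions, not the paper's code.

```python
def build_query(subject, lex_template, aug_relation, kb_lookup):
    """Fill the lexicalization template, then append the augmentation value.

    kb_lookup(subject, relation) returns a value already known in the KB,
    or None if that relation is unknown for the subject.
    """
    query = lex_template.replace("___", subject)  # "Frank Zappa mother"
    if aug_relation is not None:                  # [no augmentation] -> None
        value = kb_lookup(subject, aug_relation)
        if value:
            query += " " + value                  # append e.g. "Baltimore"
    return query

# Toy KB lookup covering the slide's example.
kb = {("Frank Zappa", "PLACE OF BIRTH"): "Baltimore"}
q = build_query("Frank Zappa", "___ mother", "PLACE OF BIRTH",
                lambda s, r: kb.get((s, r)))
print(q)  # Frank Zappa mother Baltimore
```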
9 KB Completion: query template selection
10 lexicalization templates × 10 augmentation templates = 100 query templates.
Strategies: greedy (r = ∞), random (r = 0).
Given a heatmap of query quality, convert it to a probability distribution Pr(q̃) ∝ exp(r · MRR(q̃)) and sample templates without replacement.
Danger of asking too many queries!
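A minimal sketch of sampling templates without replacement from Pr(q̃) ∝ exp(r · MRR(q̃)). The MRR values are made up; r = 0 gives uniform random selection, and a large finite r approximates greedy (true r = ∞ would need an explicit argmax to avoid overflow in exp).

```python
import math
import random

def sample_templates(mrr, r, k, seed=0):
    """Draw k templates without replacement, Pr(q) ∝ exp(r * MRR(q))."""
    rng = random.Random(seed)
    remaining = dict(mrr)
    chosen = []
    for _ in range(min(k, len(remaining))):
        temps = list(remaining)
        weights = [math.exp(r * remaining[t]) for t in temps]
        x = rng.random() * sum(weights)       # pick proportionally to weight
        for t, w in zip(temps, weights):
            x -= w
            if x <= 0:
                break
        chosen.append(t)
        del remaining[t]                      # without replacement
    return chosen

# Hypothetical per-template MRR scores (the "heatmap" entries).
mrr = {"___ mother": 0.9, "parents of ___": 0.6, "___ family": 0.1}
print(sample_templates(mrr, r=50, k=2))  # near-greedy: best two templates
print(sample_templates(mrr, r=0, k=2))   # uniform random pair
```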
10 KB Completion: question answering
Use an in-house QA system.
Query analysis: find the head phrase of the query (e.g., query: "Frank Zappa mother").
Web search: retrieve the top n result snippets from the search engine.
11 KB Completion: question answering
Snippet analysis: score each phrase in the result snippets as a weighted sum of features:
score(phrase) = w1·f1 + w2·f2 + w3·f3 + w4·f4 + w5·f5 + …
Mention features: f1: rank of the snippet; f2: is a noun phrase; f3: IDF; f4: close to the query terms; f5: related to the head phrase; …
Example mention of "Rose Marie Colimore": f1 = 1, f2 = 0.3, f3 = 0.8, f4 = 0.9
Phrase aggregation: compute an aggregate score for each distinct phrase.
Aggregation features: f1: number of times the phrase appears; f2: average of the mention scores; f3: maximum of the mention scores; …
Example for "Rose Marie Colimore" with mention scores 60 and 70: f1 = 2, f2 = (60 + 70)/2 = 65, f3 = 70
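The two scoring stages can be sketched as below. The feature weights are invented for illustration; the paper's actual features and weights are not reproduced here.

```python
def phrase_score(features, weights):
    """Weighted linear score of one phrase mention: sum_i w_i * f_i."""
    return sum(w * f for w, f in zip(weights, features))

def aggregate(mention_scores):
    """Aggregation features over all mentions of one distinct phrase."""
    return {
        "count": len(mention_scores),                      # f1
        "mean": sum(mention_scores) / len(mention_scores), # f2
        "max": max(mention_scores),                        # f3
    }

# Hypothetical weights w1..w4 for the mention features on the slide.
weights = [0.5, 1.0, 2.0, 1.5]
print(phrase_score([1, 0.3, 0.8, 0.9], weights))
print(aggregate([60, 70]))  # {'count': 2, 'mean': 65.0, 'max': 70}
```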
12 KB Completion: answer resolution
Entity linking:
Take into account the lexical context of each mention, and other entities near the given mention.
Answer string "Gail" → GAIL; context "Zappa married his wife Gail" → GAIL ZAPPA.
Discard incorrectly typed answer entities, e.g., relation PARENTS → expected type Person:
THE MOTHERS OF INVENTION (type: Music) ✗
RAY COLLINS (type: Person) ✓
MUSICAL ENSEMBLE ✗
…
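The type filter is a simple lookup-and-keep step; a sketch under the assumption of a per-relation expected-type table (only the PARENTS → Person entry comes from the slide, the rest is illustrative):

```python
# Expected answer type per relation; PARENTS is from the slide,
# PLACE OF BIRTH is an illustrative assumption.
EXPECTED_TYPE = {"PARENTS": "Person", "PLACE OF BIRTH": "Location"}

def filter_by_type(candidates, relation):
    """Keep only answer entities whose KB type matches the relation."""
    want = EXPECTED_TYPE[relation]
    return [entity for entity, etype in candidates if etype == want]

candidates = [
    ("THE MOTHERS OF INVENTION", "Music"),
    ("RAY COLLINS", "Person"),
    ("MUSICAL ENSEMBLE", "Music"),
]
print(filter_by_type(candidates, "PARENTS"))  # ['RAY COLLINS']
```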
13 KB Completion: answer resolution, answer calibration
Answer resolution: merge all per-query answer rankings into a single ranking. Compute an entity's aggregate score as the mean of its ranking-specific scores over the N_R rankings:
s(E) = (1/N_R) · Σ_{i=1}^{N_R} s_i(E)
Example: entity FRANCIS ZAPPA, N_R = 4, s_2(E) = 51, …, s_4(E) = 49 → score(FRANCIS ZAPPA) = (51 + 49)/4 = 25
Answer calibration: turn the scores into probabilities by applying logistic regression.
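Both steps can be sketched in a few lines. Note that an entity missing from a ranking contributes 0 to the sum, which is why the example divides by 4 even though only two scores are nonzero. The logistic-regression parameters here are made-up placeholders; in the paper they would be fit on labeled answers.

```python
import math

def aggregate_score(per_ranking_scores, n_rankings):
    """s(E) = (1/N_R) * sum_i s_i(E); rankings missing E contribute 0."""
    return sum(per_ranking_scores) / n_rankings

def calibrate(score, w=0.1, b=-3.0):
    """Logistic calibration Pr(correct) = sigmoid(w*s + b).

    w and b would come from logistic regression trained on held-out
    labeled answers; the defaults here are illustrative only.
    """
    return 1.0 / (1.0 + math.exp(-(w * score + b)))

s = aggregate_score([51, 49], n_rankings=4)  # FRANCIS ZAPPA: (51+49)/4
print(s)             # 25.0
print(calibrate(s))  # a probability in (0, 1)
```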
14 Outline
Introduction
Method: offline training, KB completion
Experiment
Conclusion
15 Experiment
Training and test data:
Type: PERSON; relations: PROFESSION, PARENTS, PLACE OF BIRTH, CHILDREN, NATIONALITY, SIBLINGS, EDUCATION, ETHNICITY, SPOUSES.
From the 100,000 most frequently searched-for persons, divide into 100 percentiles and randomly sample 10 subjects per percentile → 1,000 subjects per relation.
Ranking metrics: MRR (mean reciprocal rank) and MAP (mean average precision).
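For reference, the two ranking metrics can be computed as follows; the input format (a list of (ranked answers, set of correct answers) pairs) is an assumption for the sketch.

```python
def mrr(rankings):
    """Mean reciprocal rank: average of 1/rank of the first correct answer."""
    total = 0.0
    for ranked, correct in rankings:
        for i, ans in enumerate(ranked, start=1):
            if ans in correct:
                total += 1.0 / i
                break
    return total / len(rankings)

def average_precision(ranked, correct):
    """Precision at each correct hit, averaged over all correct answers."""
    hits, ap = 0, 0.0
    for i, ans in enumerate(ranked, start=1):
        if ans in correct:
            hits += 1
            ap += hits / i
    return ap / max(len(correct), 1)

def mean_ap(rankings):
    return sum(average_precision(r, c) for r, c in rankings) / len(rankings)

data = [(["a", "b", "c"], {"b", "c"}),  # first hit at rank 2
        (["x", "y"], {"x"})]            # first hit at rank 1
print(mrr(data))      # (1/2 + 1/1) / 2 = 0.75
print(mean_ap(data))
```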
16 Experiment
[Figures: quality of answer ranking; quality of answer calibration]
19 Outline
Introduction
Method: offline training, KB completion
Experiment
Conclusion
20 Conclusion
Presents a method for filling gaps in a knowledge base.
Uses a question-answering system, which in turn takes advantage of mature Web-search technology to retrieve relevant and up-to-date text passages from which to extract answer candidates.
Shows empirically that choosing the right queries, without choosing too many, is crucial.
For several relations, the system makes a large number of high-confidence predictions.