Reading Report on Question Answering


1 Reading Report on Question Answering
Paper Reading. Yuzhong Qu, Department of Computer Science, Nanjing University

2 Articles
Open Domain Question Answering via Semantic Enrichment (WWW 2015; Microsoft, UIUC)
HAWK: Hybrid Question Answering using Linked Data (ESWC 2015; University of Leipzig)

3 Agenda
Introduction
Methodology
Feature development (Ranking)
Answer type prediction (Probabilistic models)
Experiments
Conclusion

4 Introduction
Web Search → open domain QA
KB-based QA (structured knowledge bases)
Corpus-based QA (unstructured corpus)

5 Introduction
Difficulties for KB-based QA systems:
Incompleteness of KB: information required to answer a question may not always exist in KBs.
Semantic parsing: although semantic parsing [2, 3] has been a hot research topic recently, the problem of mapping natural language utterances to logical-form queries is still considered largely unsolved.
Weaknesses of Web-based QA systems:
Insufficient knowledge about the generated answer candidates.
Different mentions of the same entity, such as "President Obama" and "Barack Obama", are viewed as different answer candidates.
Answer type checking relies on a generic named entity recognition component that provides a small set of crude type labels.

6 Introduction
A new QA system framework, named QuASE (Question Answering via Semantic Enrichment)

7 Example
Who was the first American in space?
Submit it to a search engine and get a set of relevant sentences:
"On May 5, 1961, Shepard piloted the Freedom 7 mission..."; "Alan Shepard became the first American in space when the Freedom 7..."
Link the entities to Freebase: Freedom 7, Alan Shepard, Sally Ride, …
Such linked entities are treated as answer candidates to the given question.
Semantic features are integrated into a ranking algorithm.
Answer: Alan Shepard

8 Example
Who was the first American in space?

9 Methodology
Web Sentence Selection via Search Engine
Answer Candidate Generation via Entity Linking
Feature Generation and Ranking

10 Methodology (1): Web Sentence Selection via Search Engine
Submit the question as a query to a search engine.
Collect the top-50 returned snippets, as well as the top-50 documents.
Compute the word count vector based on the returned snippets to represent the information for the query, denoted as wq.
For each sentence from the top-50 returned documents, compute its word count vector ws, and select the sentences with a high cos(ws, wq) into the high-quality sentence set.
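A minimal sketch of this selection step, assuming made-up snippets and a hand-picked similarity threshold (not the paper's actual implementation or parameters):

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def select_sentences(snippets, sentences, threshold=0.3):
    """Keep sentences whose word-count vector ws is close to the snippet vector wq."""
    wq = Counter(w for s in snippets for w in s.lower().split())
    return [s for s in sentences if cosine(Counter(s.lower().split()), wq) >= threshold]

# Toy usage
snippets = ["Alan Shepard first American in space", "Freedom 7 mission May 1961"]
sentences = ["Alan Shepard became the first American in space.",
             "The weather in Florida was sunny that day."]
print(select_sentences(snippets, sentences))
```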

11 Methodology (2): Answer Candidate Generation via Entity Linking
Primarily focus on questions targeted at certain entities in KBs (e.g., excluding "when" questions).
Use one of the entity linking systems [13] to identify answer candidates linked to Freebase.
This system achieved the best scores at TAC-KBP 2013, through several novel designs such as postponing surface form boundary detection and discriminating concepts from entities in Wikipedia pages.

12 Methodology (3): Feature Generation and Ranking
For each answer candidate, Freebase contains a wealth of information, such as its description text and entity types.
A set of semantic features is developed from this rich information and subsequently used in a ranking algorithm to evaluate the appropriateness of each candidate as the true answer.

13 Features
Count: a high frequency of an answer candidate serves as a significant indicator of being the correct answer (sketched below).
Textual Relevance.
Answer Type Related Features: probabilistic models are proposed to directly measure the matching degree between a question and an answer candidate's Freebase types.
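A toy sketch of how the first two features might be assembled per candidate; the description text and the exact feature definitions here are illustrative assumptions, not the paper's:

```python
def candidate_features(candidate, linked_mentions, question, descriptions):
    """Build a small feature dict for one answer candidate.
    linked_mentions: entities linked from the selected web sentences (assumed input)
    descriptions:    entity -> KB description text                   (assumed input)
    """
    count = linked_mentions.count(candidate)                 # frequency of the candidate
    q_words = set(question.lower().split())
    d_words = set(descriptions.get(candidate, "").lower().split())
    overlap = len(q_words & d_words) / max(len(q_words), 1)  # crude textual relevance
    return {"count": count, "textual_relevance": overlap}

print(candidate_features(
    "Alan Shepard",
    ["Alan Shepard", "Freedom 7", "Alan Shepard"],
    "Who was the first American in space?",
    {"Alan Shepard": "Alan Shepard was an American astronaut, the first American in space."},
))
```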

14 Word to Answer Type (WAT) Model
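The slide gives only the model's name. As an illustrative sketch only, assuming a naive-Bayes-style factorization over question words (not necessarily the paper's exact formulation), such a word-to-answer-type model can score a Freebase type t for a question q as

\[ P(t \mid q) \;\propto\; P(t) \prod_{w \in q} P(w \mid t) \]

where the type prior P(t) and the per-word probabilities P(w | t) would be estimated from the <question, answer types> training pairs described in the experimental setup.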

15 JQA Model
Joint <question, answer type> association (JQA) model

16 Experimental Setup
TREC dataset: factoid questions from TREC 8-12; after filtering, 1902 entity-oriented questions remain.
Bing query dataset: collected via crowdsourcing, with entity answers in Freebase; approximately 6000 question-answer pairs.

17 Experimental Setup
Training Dataset for JQA and WAT
Around 1.3 million <question, answer types> pairs built from Bing query logs:
Take <query, clicked url> pairs from the query click logs.
Each entity in Freebase is also linked to some urls related to that entity (mostly Wikipedia pages or official web sites of the entity).
Use the Freebase types of the entity corresponding to the clicked url as the answer types.
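A minimal sketch of this construction, where the click log and the url-to-entity and entity-to-types lookup tables are all assumed, hypothetical inputs:

```python
def build_training_pairs(click_log, url_to_entity, entity_types):
    """Turn <query, clicked url> pairs into <question, answer types> pairs.
    click_log:     iterable of (query, clicked_url) tuples              (assumed input)
    url_to_entity: maps a url (e.g. a Wikipedia page) to a KB entity    (assumed input)
    entity_types:  maps a KB entity to its Freebase types               (assumed input)
    """
    pairs = []
    for query, url in click_log:
        entity = url_to_entity.get(url)
        types = entity_types.get(entity, []) if entity else []
        if types:
            pairs.append((query, types))
    return pairs

print(build_training_pairs(
    [("first american in space", "https://en.wikipedia.org/wiki/Alan_Shepard")],
    {"https://en.wikipedia.org/wiki/Alan_Shepard": "Alan Shepard"},
    {"Alan Shepard": ["people.person", "spaceflight.astronaut"]},
))
```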

18 Experimental Setup
Answer Candidate Ranking
Use an in-house fast implementation of the MART gradient boosting decision tree algorithm [9, 21] to learn the ranker on the training set.
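The in-house MART implementation is not publicly available; as a pointwise stand-in only, a gradient boosted model from scikit-learn can illustrate the setup, with toy feature values (count, textual relevance, answer-type score) and labels that are pure assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# One row per answer candidate: (count, textual relevance, answer-type score).
X = np.array([[3, 0.8, 0.9], [1, 0.2, 0.1], [2, 0.5, 0.4],
              [5, 0.9, 0.7], [1, 0.1, 0.2], [2, 0.3, 0.6]])
y = np.array([1, 0, 0, 1, 0, 0])   # 1 = correct answer, 0 = incorrect

model = GradientBoostingClassifier(n_estimators=50, max_depth=2)
model.fit(X, y)

# Rank the candidates of a new question by predicted probability of correctness.
candidates = ["Alan Shepard", "Sally Ride", "Freedom 7"]
scores = model.predict_proba(np.array([[4, 0.7, 0.8], [2, 0.4, 0.3], [1, 0.2, 0.2]]))[:, 1]
print(sorted(zip(candidates, scores), key=lambda p: -p[1]))
```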

19 Experimental Setup
Evaluation Measures
F1 score: the harmonic mean of precision and recall.
Mean Reciprocal Rank (MRR).
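For concreteness, both measures can be computed with a few lines of self-contained code (a sketch, with toy inputs):

```python
def f1(predicted, gold):
    """F1 for one question: harmonic mean of precision and recall over answer sets."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def mean_reciprocal_rank(ranked_lists, gold_sets):
    """MRR over questions: 1/rank of the first correct answer, 0 if none appears."""
    total = 0.0
    for ranked, gold in zip(ranked_lists, gold_sets):
        total += next((1.0 / i for i, a in enumerate(ranked, 1) if a in gold), 0.0)
    return total / len(gold_sets)

print(f1(["Alan Shepard"], ["Alan Shepard"]))                                      # 1.0
print(mean_reciprocal_rank([["Sally Ride", "Alan Shepard"]], [{"Alan Shepard"}]))  # 0.5
```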

20 Experimental Setup
Alternative QA systems
KB-based QA system: Sempre [2, 3]
Web-based QA system: AskMSR+ [39]

21 Experimental Results

22 Experimental Results

23 Experimental Results
Analysis of QuASE and AskMSR+ on their respective failed questions
Compared with AskMSR+, the ranking features in QuASE are effective as long as the true answers are in the candidate list.

24 Experimental Results
For QuASE, improving entity linking performance, so that true answers are included in the candidate list, is important for further improving QA performance.

25 Conclusion
Contributions
A new QA framework, QuASE
Answer type checking models
Extensive experimental evaluation
Future work
Relationships among entities can also be explored as semantic features to be incorporated in the system
Improve entity linking performance

26 HAWK: Hybrid QA using Linked Data
Introduction
Methodology
Evaluation
Conclusion

27 Introduction
Hybrid question answering
Find and combine information stored in both structured and textual data sources:
the Document Web
labels and abstracts in Linked Data sources
Question Answering over Linked Data (QALD-4), Task 3: Hybrid question answering

28 Example
Which recipients of the Victoria Cross died in the Battle of Arnhem?
Cannot be answered by DBpedia or Wikipedia abstracts alone.
Structured part (DBpedia): ?uri dbo:award dbr:Victoria_Cross
Textual part (abstract): the abstract for John Hollington Grayburn says "he went into action in the Battle of Arnhem [...] but was killed after standing up in full view of a German tank".
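The structured half of this hybrid question can be posed against the public DBpedia endpoint; a minimal sketch (the textual half, matching abstracts against "died in the Battle of Arnhem", would still have to be handled separately, e.g. by full-text search):

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Structured part only: which resources received the Victoria Cross?
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbr: <http://dbpedia.org/resource/>
    SELECT DISTINCT ?uri WHERE { ?uri dbo:award dbr:Victoria_Cross . } LIMIT 20
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for binding in results["results"]["bindings"]:
    print(binding["uri"]["value"])
```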

29 Introduction
Architectural overview of HAWK

30 Methodology
POS-Tagging
Entity Annotation
Dependency Parsing
Linguistic Pruning
Semantic Annotation
Generating SPARQL Queries
Semantic Pruning of SPARQL Queries
Ranking

31 By Example
Which recipients of the Victoria Cross died in the Battle of Arnhem?
Predicate-argument tree; tree after pruning

32 Methodology
Semantic Annotation
Nouns correspond to object type properties and classes.
Verbs correspond to object type properties.
Question words (e.g., who or where) correspond to classes (e.g., Person or Place).
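A toy version of these mappings; the specific table entries and POS tags are illustrative assumptions, not HAWK's actual annotation tables:

```python
# Tiny lookup tables standing in for the semantic annotation step.
QUESTION_WORD_TO_CLASS = {"who": "dbo:Person", "where": "dbo:Place"}
NOUN_TO_PROPERTY_OR_CLASS = {"recipients": "dbo:award", "battle": "dbo:Battle"}
VERB_TO_PROPERTY = {"died": "dbo:deathPlace"}

def annotate(token, pos):
    """Map a token to a candidate ontology element based on its part-of-speech tag."""
    if pos == "WP":                       # question word (who, what, ...)
        return QUESTION_WORD_TO_CLASS.get(token.lower())
    if pos.startswith("NN"):              # noun
        return NOUN_TO_PROPERTY_OR_CLASS.get(token.lower())
    if pos.startswith("VB"):              # verb
        return VERB_TO_PROPERTY.get(token.lower())
    return None

print(annotate("Who", "WP"), annotate("died", "VBD"))   # dbo:Person dbo:deathPlace
```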

33 Methodology
Generating SPARQL Queries
Traverse the tree in a pre-order walk.
Related information is situated close together in the tree, and information becomes more restrictive from left to right.
Triple patterns are generated during the traversal (see the sketch below); e.g., a variable bound to the class Place will not have an outgoing predicate birthPlace.
Semantic Pruning of SPARQL Queries
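A minimal sketch of generating triple patterns from a pre-order walk; the node annotations and the one-pattern-per-edge rule are assumptions for illustration, not HAWK's actual generator:

```python
from dataclasses import dataclass, field
from itertools import count

@dataclass
class Node:
    annotation: str                      # e.g. an ontology property, class, or resource
    children: list = field(default_factory=list)

def generate_patterns(root):
    """Pre-order walk: each (parent, child) edge yields one toy triple pattern
    that links a fresh variable to the parent's and the child's annotations."""
    patterns, fresh = [], count(1)

    def walk(node):
        for child in node.children:
            patterns.append(f"?v{next(fresh)} {node.annotation} {child.annotation} .")
            walk(child)

    walk(root)
    return patterns

# Toy predicate-argument tree for the running example
# (the annotations are assumptions, not HAWK's actual output).
tree = Node("dbo:award", [Node("dbr:Victoria_Cross")])
print("\n".join(generate_patterns(tree)))   # ?v1 dbo:award dbr:Victoria_Cross .
```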

34 Methodology
Ranking
HAWK ranks queries using supervised training based on the gold standard answer set from the QALD-4 benchmark.
Feature selection?

35 Evaluation
To evaluate HAWK, focus on the hybrid training dataset comprising 25 questions, 17 of which are entity searches.
Using only DBpedia type information.
No aggregation process.

36 Experimental Results
Red: inability to generate the correct query.
Green: missing recall.
Blue: missing precision.

37 Experimental Results

38 Error Analysis
Failing entity annotation (Queries 1, 11 and 15: Jane Austin, G8, Los Alamos)
Without matching entity annotations, a full-text search retrieves too many matches to reach high precision on a limited result set.
Query structure (Queries 11 and 15)
Complex query structures lead to a multitude of interpretations.
Missing type information
Some of the resources in the gold standard do not have appropriate type information, leading to a large number of queries that need to be ranked correctly.

39 Error Analysis
Query examples
1. Give me the currencies of all G8 countries.
11. Who composed the music for the film that depicts the early life of Jane Austin?
15. Of the people that died of radiation in Los Alamos, whose death was an accident?

40 Conclusion
Contributions
HAWK, the first hybrid QA system for the Web of Data
A generic approach to generate SPARQL queries out of predicate-argument structures
Achieves up to 0.68 F-measure on the QALD-4 benchmark
Future work
Finding the correct ranking approach to map a predicate-argument tree to a possible interpretation
Computational complexity
Domain-specific applications (higher F-measures)

41 Thank you. Questions are welcome!

