Donna M. Gates Carnegie Mellon University


1 Generating Look-Back Reading Comprehension Questions from Expository Text
Donna M. Gates, Carnegie Mellon University
September 25-26, 2008, Workshop on the Question Generation Shared Task and Evaluation Challenge

2 Fact Based Questions
Reading Comprehension: Look-Back Strategy
  Question whose answer is Right-There-In-The-Text (Raphael 1982).
  Automatic generation of questions: system needs to understand the text well enough to formulate questions.
Q&A systems
  Questions are asked to find information.
  Need to find best matching answers or documents containing an answer to a query (Leidner et al. 2003).
Solution: Annotate documents with syntactic and/or semantic information.

3 NLP Programs and Knowledge Sources
Stanford NL Parser (Klein & Manning 2003)
WordNet (Fellbaum 1998) to get noun classifications
BBN IdentiFinder (Bikel et al. 1999) to get named entities
ASSERT (Pradhan et al. 2005) to produce PropBank (Palmer et al. 2005) tags
Stanford Tsurgeon (Levy & Andrew 2006) + handwritten transformation rules to transform declarative sentence trees into question trees
Code to combine the annotations and generate strings
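As a rough illustration of the final step (combining annotations and generating strings), the sketch below picks a wh-word from a semantic class and puts it in place of the subject. The `Clause` type, the `"person"`/`"thing"` labels, and both function names are hypothetical. The system described here transforms parse trees with handwritten rules, whereas this toy version works on strings that have already been split into subject and predicate.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    subject: str        # subject noun phrase, e.g. "Mr. Yashin"
    predicate: str      # rest of the sentence, without final punctuation
    subject_class: str  # hypothetical semantic label: "person" or "thing"


def choose_wh(subject_class: str) -> str:
    """Map a semantic class to a wh-word (assumed two-way split)."""
    return "Who" if subject_class == "person" else "What"


def subject_question(clause: Clause) -> str:
    """Replace the subject with a wh-word; the predicate is left unchanged."""
    return f"{choose_wh(clause.subject_class)} {clause.predicate}?"


yashin = Clause(
    subject="Mr. Yashin",
    predicate="makes a salary of more than three million dollars a season",
    subject_class="person",
)
print(subject_question(yashin))
```

In a scheme like this a wrong class label goes straight into the question: labelling "Air-raid warning sirens" as a person would produce a "Who sounded ...?" question.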

4 Example Text with Question

5 Evaluating Generated Questions
Data sets: MITRE's CBC4Kids Q&A data
  70+ texts for training
  50+ texts for testing
All automatically generated questions were evaluated for grammaticality/fluency and for whether the answer matching the question could be found easily.
A single grader
  More graders would be better: I could then obtain intercoder agreement measures.
Precision only
  The goal is to generate well-formed questions, not to generate all possible questions from a single sentence.
  No gold-standard questions are available for this specific task.
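The slides do not say which intercoder agreement measure would be used. One common choice for two graders is Cohen's kappa, sketched below as an assumption rather than as the author's method. The function name and the grade lists are made up for illustration.

```python
from collections import Counter


def cohen_kappa(grades_a: list[str], grades_b: list[str]) -> float:
    """Cohen's kappa for two graders who labeled the same items."""
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("both graders must label the same non-empty item list")
    n = len(grades_a)
    # Share of items on which the two graders gave the same grade.
    observed = sum(a == b for a, b in zip(grades_a, grades_b)) / n
    # Agreement expected by chance, from each grader's label frequencies.
    freq_a, freq_b = Counter(grades_a), Counter(grades_b)
    expected = sum(freq_a[g] * freq_b[g] for g in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both graders used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Made-up grades for four questions, for illustration only.
grader_1 = ["perfect", "perfect", "bad", "ok"]
grader_2 = ["perfect", "perfect", "bad", "bad"]
print(round(cohen_kappa(grader_1, grader_2), 2))
```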

6 Evaluation Scoring
Evaluation grades (Acceptable = Perfect + OK)
Perfect
  Mr. Yashin makes a salary of more than three million dollars a season.
  Who makes a salary of more than three million dollars a season?
OK
  A study was conducted by the aboriginals.
  Whom or what was conducted by the aboriginals?
Bad
  Air-raid warning sirens sounded in the Kosovo capital of Pristina this morning.
  Who sounded in the Kosovo capital of Pristina this morning?
  WordNet ambiguities: siren (mythical type of person vs. noise-making device)
Failed
  Memberships cost $180 a year for adults and $135 for students and seniors.
  Whom or what $180 did memberships cost a year for adults and $135 for students and seniors?
  Parsing problem

7 Results
Each entry gives the WH-phrase type, the question transformation, and the source sentence where one is shown.
Values under each entry, in order: Total, Perfect, OK, Bad, Failed
Subj NP: Who conducted a study?
  444  80%  6%  13%  2%
Subj Gerund: What will be a new experience? (Conducting studies will be a new experience.)
  0%
D. Obj1: What did aboriginals conduct? (Aboriginals conducted a study.)
  119  55%  8%  25%  12%
D. Obj2: What were aboriginals conducting?
  52  58%  10%
D. Obj3: What will aboriginals conduct?
  24  63%  17%
Passive Ag: By whom were studies conducted? (Studies were conducted by aboriginals.)
  11  91%  9%
Temp: When did aboriginals conduct a study?
  PP/S: (On Friday aboriginals conducted a study.)
    19  89%  5%  ..
  NP/S: (Last Friday aboriginals conducted a study.)
    1  100%
  PP/VP: (Aboriginals conducted a study on Friday.)
    14  86%  7%
  NP/VP: (Aboriginals conducted a study last Friday.)
    9

8 Result Highlights
Overall: 81% acceptable
Subject Wh Phrases
  Aboriginals conducted a study last month.
  Who conducted a study last month?
  Largest number of examples (444): 86% acceptable.
Direct Object Wh Phrases
  What did aboriginals conduct last month?
  What will aboriginals conduct?
  What were aboriginals conducting?
  Combined, lowest acceptable scores: 66% acceptable
Wh NP Temporal Expressions: last month
  When did aboriginals conduct a study?
  100% perfect (10 parsed and annotated correctly)
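The 86% and 66% figures can be reproduced from the results table with a little arithmetic. This is a sketch that assumes the first two percentages in each direct-object row are the Perfect and OK columns; the function and variable names are illustrative only. Because the table percentages are already rounded, the combined figure is approximate.

```python
# (total questions, perfect %, ok %) as listed in the results table
direct_object_rows = {
    "D. Obj1": (119, 55, 8),
    "D. Obj2": (52, 58, 10),
    "D. Obj3": (24, 63, 17),
}


def acceptable_pct(perfect_pct: float, ok_pct: float) -> float:
    """Acceptable = perfect + ok."""
    return perfect_pct + ok_pct


def combined_acceptable_pct(rows: dict[str, tuple[int, float, float]]) -> float:
    """Average the per-row acceptable rates, weighted by question count."""
    total = sum(count for count, _, _ in rows.values())
    weighted = sum(count * acceptable_pct(p, ok) for count, p, ok in rows.values())
    return weighted / total


print(acceptable_pct(80, 6))                               # Subj NP row -> 86
print(round(combined_acceptable_pct(direct_object_rows)))  # -> 66
```

The combined value matching the reported 66% suggests that this reading of the direct-object rows is the intended one.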

9 Issues to be Resolved
Need to expand semantic annotation and transformations to include locations.
Improve use of WordNet by filtering low-frequency senses: dish (satellite dish vs. attractive person), WHO vs. WHAT.
Incorporate other syntactic and semantic annotators.
Define a gold standard/target set of questions.
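One way the proposed sense filtering could work is sketched below: drop senses that account for only a small share of a word's uses before choosing between WHO and WHAT. The function name, the 10% threshold, the class labels, and the usage counts are all assumptions for illustration; none of them comes from the slides or from WordNet itself.

```python
def frequent_classes(senses: list[tuple[str, int]], min_share: float = 0.1) -> set[str]:
    """Keep the semantic classes of senses covering at least `min_share` of uses."""
    total = sum(count for _, count in senses)
    if total == 0:
        return {cls for cls, _ in senses}  # no frequency evidence: keep everything
    return {cls for cls, count in senses if count / total >= min_share}


# Made-up usage counts for "dish"; real counts would have to come from
# sense-frequency data.
dish_senses = [("thing", 40), ("thing", 12), ("person", 1)]

classes = frequent_classes(dish_senses)
# Ask "Who" only when every remaining sense is a person; otherwise ask "What".
wh_word = "Who" if classes == {"person"} else "What"
print(classes, wh_word)
```

With these counts the rare "attractive person" sense is filtered out, so the word is treated as a thing and the question uses WHAT.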

