
# Resources: Question Classification Schemes, Graesser et al. Automatic Factual Question Generation from Text (Chapter 3), Michael Heilman.


- Questions test a learner's factual knowledge
  - When did Alexander invade India?
  - Who invented the smallpox vaccine?
- They do not involve higher-order cognitive skills such as inference

- Overgenerate-and-rank framework
- CMU Question Generator: http://www.ark.cs.cmu.edu/mheilman/questions/

- Source sentence: a sentence taken directly from the input document
- Derived sentence: a declarative sentence derived in stage 1
- Answer phrase: a possible answer to a generated question
- Question phrase: the phrase containing the question word that replaces an answer phrase

- Mark clauses or phrases (Tregex) for
  - NLP transformation (simplification, compression)
  - Answer phrase marking
- Delete clauses or phrases (Tsurgeon) for
  - NLP transformation

Resources: "Tregex and Tsurgeon: tools for querying and manipulating tree data structures", Levy and Andrew. Web: http://nlp.stanford.edu/software/tregex.shtml

- Tregex is a Java program for identifying patterns in trees
- Like regular expressions for strings
- Simple example: `NP < NN`
- Command line: `tregex.sh "NP < NN" treeFilename`

[Figure: parse tree of "The firm stopped using crocidolite in its cigarette filters", with NP and NN nodes marked.]
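To make the meaning of `NP < NN` concrete, here is a minimal Python sketch that finds every NP with an NN as a direct child. It is not Tregex itself: `parse`, `subtrees`, `leaves`, and `np_over_nn` are hypothetical helpers, and the bracketing of the example sentence is an assumed simplification of the tree in the figure.

```python
import re

def parse(text):
    """Parse a bracketed tree such as "(NP (NN dog))" into nested lists."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token != "(":
            return token                 # a leaf word
        node = [tokens[pos]]             # first item is the node label
        pos += 1
        while tokens[pos] != ")":
            node.append(read())
        pos += 1                         # consume the closing ")"
        return node

    return read()

def subtrees(tree):
    """Yield every labelled node in pre-order."""
    if isinstance(tree, list):
        yield tree
        for child in tree[1:]:
            yield from subtrees(child)

def leaves(tree):
    """Return the words under a node, left to right."""
    return [tree] if isinstance(tree, str) else [w for c in tree[1:] for w in leaves(c)]

def np_over_nn(tree):
    """NP nodes that have an NN node as a direct child (NP < NN)."""
    return [n for n in subtrees(tree)
            if n[0] == "NP"
            and any(isinstance(c, list) and c[0] == "NN" for c in n[1:])]

# Assumed bracketing of the example sentence.
TREE = ("(S (NP (DT The) (NN firm)) (VP (VBD stopped) (VP (VBG using) "
        "(NP (NN crocidolite)) (PP (IN in) (NP (PRP its) (NN cigarette) (NNS filters))))))")

matches = [" ".join(leaves(n)) for n in np_over_nn(parse(TREE))]
print(matches)
```

The sketch reports each matching NP once; a real pattern matcher may report one match per matching (NP, NN) pair.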

- The basic units of Tregex are node descriptions
- Descriptions match node labels of a tree
  - Literal string to match: `NP`
  - Disjunction of literal strings separated by `|`: `NP|PP|VP`
  - Regular expression (Java 5 regex): `/NN.?/`, which matches NN, NNP, NNS
  - Wildcard symbol: `__` (two underscores), which matches any node
- Descriptions can be negated with `!`: `!NP`
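The node-description rules above can be approximated in a few lines of Python. This is only a sketch: `matches_description` is a hypothetical helper, it covers just the four forms listed on the slide, and it assumes regular expressions are unanchored (searched anywhere in the label), which should be checked against the Tregex documentation.

```python
import re

def matches_description(desc, label):
    """Check a node label against a simplified Tregex-style node description."""
    if desc.startswith("!"):                         # negation: !NP
        return not matches_description(desc[1:], label)
    if desc == "__":                                 # wildcard: any node
        return True
    if len(desc) >= 2 and desc[0] == "/" and desc[-1] == "/":
        # regular expression, assumed unanchored: /NN.?/
        return re.search(desc[1:-1], label) is not None
    return label in desc.split("|")                  # literal or disjunction

for desc, label in [("NP", "NP"), ("NP|PP|VP", "PP"), ("/NN.?/", "NNS"),
                    ("__", "ADJP"), ("!NP", "NP")]:
    print(desc, label, matches_description(desc, label))
```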

Relationships between tree nodes can be specified. There are many different relations; here are a few:

| Symbol | Description |
| --- | --- |
| `A < B` | A is the parent of B |
| `A << B` | A is an ancestor of B |
| `A $ B` | A and B are sisters |
| `A $+ B` | B is the next sister of A |
| `A <i B` | B is the i-th child of A |
| `A <: B` | B is the only child of A |
| `A <<# B` | B is a head of phrase A |
| `A <<- B` | B is the rightmost descendant of A |
| `A .. B` | A precedes B in a depth-first traversal of the tree |

Source: http://nlp.stanford.edu/manning/courses/ling289/Tregex.html
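A few of these relations are easy to state as Python predicates over nodes with parent pointers. The sketch below is illustrative only: the `Node` class and the function names are hypothetical, and head-based relations such as `<<#` are left out because they need head-finding rules.

```python
class Node:
    """A tree node that remembers its parent (hypothetical helper class)."""
    def __init__(self, label, *children):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

def is_parent(a, b):        # A < B
    return b.parent is a

def is_ancestor(a, b):      # A << B
    while b.parent is not None:
        b = b.parent
        if b is a:
            return True
    return False

def are_sisters(a, b):      # A $ B
    return a is not b and a.parent is not None and a.parent is b.parent

def next_sister(a, b):      # A $+ B: B is the next sister of A
    if not are_sisters(a, b):
        return False
    siblings = a.parent.children
    return siblings.index(a) + 1 == siblings.index(b)

def only_child(a, b):       # A <: B
    return b.parent is a and len(a.children) == 1

# (X (NP (NN dog)) (VP (VBZ barks)))
dog = Node("dog")
nn = Node("NN", dog)
np = Node("NP", nn)
vbz = Node("VBZ", Node("barks"))
vp = Node("VP", vbz)
x = Node("X", np, vp)

print(is_parent(np, nn), is_ancestor(x, dog), next_sister(np, vp))
```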

- Relations can be strung together for "and"
  - All relations are relative to the first node in the string
  - `NP < NN $ VP`: "An NP over an NN and with sister VP"
  - The `&` symbol is optional: `NP < NN & $ VP`
- Nodes can be grouped with parentheses
  - `NP < (NN < dog)`: "An NP over an NN that is over 'dog'"
  - Not the same as `NP < NN < dog`

- Ex: `NP < (NN < dog) $ (VP <<# (barks > VBZ))`
  - "An NP both over an NN over 'dog' and with a sister VP headed by 'barks' under VBZ"
  - Example tree: `(X (NP (NN dog)) (VP (VBZ barks)))`

- Operators can be combined via "or" with `|`
  - Ex: `NP < NN | < NNS`: "An NP over NN or over NNS"
- By default, `&` takes precedence over `|`
  - Ex: `NP < NNS | < NN & $ VP`: "NP over NNS OR both over NN and w/ sister VP"
- Equivalent operators are left-associative
- Any relation can be negated with the `!` prefix
  - Ex: `NP !<< NNP`: "An NP that does not dominate NNP"

- To specify operation order, use `[` and `]`
  - Ex: `NP [ < NNS | < NN ] $ VP`: "An NP either over NNS or NN, and w/ sister VP"
- Grouped relations can be negated
  - Just put `!` before the `[`
- Already we can build very complex expressions!
  - `NP <- /NN.?/ > (PP <<# (IN ![ < of | < on ]))`
  - "An NP with rightmost child matching /NN.?/ under a PP headed by some preposition (IN) that is not either 'of' or 'on'"
  - Example match: a PP headed by the IN "about" over an NP whose rightmost child is an NNS

- Sometimes we want to find which nodes matched particular sub-expressions
  - Ex: `/NN.?/ $- JJ|DT`
  - What was the modifier that preceded the noun?
- Name nodes with `=`; if the expression matches, the matching sub-expression can be retrieved by name
  - Ex: `/NN.?/ $- JJ|DT=premod`
  - The subtree with root matching `JJ|DT` is stored in a map under the key "premod"
- Note: named nodes are not allowed in the scope of a negation
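As a rough illustration of named nodes, the Python sketch below hard-codes the pattern `/NN.?/ $- JJ|DT=premod` and returns the captured modifier in a dictionary, much as a match would expose it under the key "premod". The function name and the nested-list tree encoding are assumptions made for this example, not part of Tregex.

```python
import re

def find_premodifiers(tree):
    """Hard-coded sketch of /NN.?/ $- JJ|DT=premod over nested-list trees.

    Returns one dict per match, with the matched noun and the named
    node "premod" (the sister immediately to the noun's left).
    """
    results = []

    def visit(node):
        kids = [c for c in node[1:] if isinstance(c, list)]
        for left, right in zip(kids, kids[1:]):
            if re.search(r"NN.?", right[0]) and left[0] in ("JJ", "DT"):
                results.append({"match": right, "premod": left})
        for kid in kids:
            visit(kid)

    visit(tree)
    return results

tree = ["NP", ["DT", "the"], ["JJ", "big"], ["NN", "dog"]]
found = find_premodifiers(tree)
print(found)
```

Only the JJ is captured here, because `$-` requires the modifier to be the immediately preceding sister of the noun.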

- Sometimes we want to try to match a sub-expression to retrieve named nodes if they exist, but still match the root if the sub-expression fails
- Use the optional relation prefix `?`
  - Ex: `NP < (NN ?$- JJ=premod) $+ CC $++ NP`
  - Matches an NP over NN with sisters CC and NP
  - If the NN is preceded by a JJ, we can retrieve the JJ using the key "premod"
  - If there is no JJ, the expression will still match
- Cannot be combined with negation

Tsurgeon:

- What? Performs operations on a grammatical tree
- How? Based on Tregex syntax
- Where? JavaNLP: `trees.tregex.tsurgeon`

Tregex recap:

- A utility for identifying patterns in trees (like regular expressions for strings)
- Patterns consist of node descriptions and relationships between nodes
- Ex: `NP < /^NN/`

[Figure: parse tree of "The firm stopped using crocidolite in its cigarette filters", with NP, NN, and NNS nodes marked.]

- Define a pattern to be matched on the trees: `VBZ=vbz $+ NP`
- Define one or several operation(s): `relabel vbz VBZ_TRANSITIVE`

`delete`: delete the node and everything below it.

- Tree: `(ROOT (SBARQ (SQ (NP (NNS Cats)) (VP (VBP do) (VP (WHNP what) (VB eat)))) (PUNCT ?)))`
- Pattern: `PUNCT=punct > SBARQ`
- Operation: `delete punct`
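The effect of `delete` can be mimicked on nested Python lists. This is a sketch with hypothetical helpers (`show`, `delete_under`), not the Tsurgeon implementation; it hard-codes the pattern "a PUNCT directly under an SBARQ" as two label arguments and assumes the PUNCT node is a daughter of SBARQ, as the pattern requires.

```python
def show(tree):
    """Render nested lists back into bracketed notation."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join([tree[0]] + [show(c) for c in tree[1:]]) + ")"

def delete_under(tree, parent_label, child_label):
    """Drop every child_label node (and its subtree) directly under parent_label."""
    if isinstance(tree, str):
        return tree
    kept = [delete_under(child, parent_label, child_label)
            for child in tree[1:]
            if not (tree[0] == parent_label
                    and isinstance(child, list)
                    and child[0] == child_label)]
    return [tree[0]] + kept

tree = ["ROOT", ["SBARQ",
                 ["SQ", ["NP", ["NNS", "Cats"]],
                        ["VP", ["VBP", "do"],
                               ["VP", ["WHNP", "what"], ["VB", "eat"]]]],
                 ["PUNCT", "?"]]]

result = delete_under(tree, "SBARQ", "PUNCT")
print(show(result))
```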

`excise name1 name2`: name1 is name2 or dominates name2. All children of name2 go into the parent of name1, where name1 was.

- Tree: `(ROOT (SBARQ (SQ (NP (NNS Cats)) (VP (VBP do) (VP (WHNP what) (VB eat))))))`
- Pattern: `SBARQ=sbarq > ROOT`
- Operation: `excise sbarq sbarq`
- Result: `(ROOT (SQ (NP (NNS Cats)) (VP (VBP do) (VP (WHNP what) (VB eat)))))`
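For the common case where name1 and name2 are the same node, `excise` simply splices a node out and promotes its children. The following Python sketch shows that case on nested lists; `excise_label` and `show` are hypothetical helpers, and selecting the node by label stands in for the Tregex pattern.

```python
def show(tree):
    """Render nested lists back into bracketed notation."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join([tree[0]] + [show(c) for c in tree[1:]]) + ")"

def excise_label(tree, label):
    """Splice out every non-root node with this label, promoting its children."""
    if isinstance(tree, str):
        return tree
    new_children = []
    for child in tree[1:]:
        child = excise_label(child, label)
        if isinstance(child, list) and child[0] == label:
            new_children.extend(child[1:])    # children take the node's place
        else:
            new_children.append(child)
    return [tree[0]] + new_children

tree = ["ROOT", ["SBARQ",
                 ["SQ", ["NP", ["NNS", "Cats"]],
                        ["VP", ["VBP", "do"],
                               ["VP", ["WHNP", "what"], ["VB", "eat"]]]]]]

result = excise_label(tree, "SBARQ")
print(show(result))
```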

- Tree: `(ROOT (SQ (NP (NNS Cats)) (VP (VBP do) (VP (WHNP what) (VB eat)))))`
- Pattern: `SQ=sq > ROOT !<- /PUNCT/`
- Operation: `insert (PUNCT .) >-1 sq`
- Result: `(ROOT (SQ (NP (NNS Cats)) (VP (VBP do) (VP (WHNP what) (VB eat))) (PUNCT .)))`

`insert` takes a tree and a position; the position is a relation followed by a node name:

- `$+`: the left sister of the named node
- `$-`: the right sister of the named node
- `>i`: the i-th daughter of the named node
- `>-i`: the i-th daughter, counting from the right, of the named node
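The `insert (PUNCT .) >-1 sq` step, guarded by `SQ=sq > ROOT !<- /PUNCT/`, can be sketched in Python as "append a final punctuation node unless one is already the last child". The helper names and the nested-list encoding are assumptions for illustration only.

```python
def show(tree):
    """Render nested lists back into bracketed notation."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join([tree[0]] + [show(c) for c in tree[1:]]) + ")"

def insert_final_punct(tree):
    """If ROOT has an SQ child whose last child is not PUNCT, append (PUNCT .)."""
    out = [tree[0]]
    for child in tree[1:]:
        is_sq = tree[0] == "ROOT" and isinstance(child, list) and child[0] == "SQ"
        last = child[-1] if is_sq else None
        ends_in_punct = isinstance(last, list) and "PUNCT" in last[0]
        if is_sq and not ends_in_punct:
            child = child + [["PUNCT", "."]]   # ">-1": becomes the last daughter
        out.append(child)
    return out

tree = ["ROOT", ["SQ", ["NP", ["NNS", "Cats"]],
                       ["VP", ["VBP", "do"],
                              ["VP", ["WHNP", "what"], ["VB", "eat"]]]]]

result = insert_final_punct(tree)
print(show(result))
```

The negated condition matters: applying the function a second time leaves the tree unchanged, which is what keeps a repeatedly applied rule from looping.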

`move`: moves the named node into the specified position.

- Tree: `(ROOT (SQ (NP (NNS Cats)) (VP (VBP do) (VP (WHNP what) (VB eat))) (PUNCT .)))`
- Pattern: `VP < (/^WH/=wh $++ /^VB/=vb)`
- Operation: `move vb $+ wh`
- Result: `(ROOT (SQ (NP (NNS Cats)) (VP (VBP do) (VP (VB eat) (WHNP what))) (PUNCT .)))`
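A Python sketch of the same `move` follows, again over nested lists and with hypothetical helper names. It looks for a VP containing a WH-labelled child with a VB-labelled sister somewhere to its right, then relocates that verb to sit immediately left of the WH node.

```python
def show(tree):
    """Render nested lists back into bracketed notation."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join([tree[0]] + [show(c) for c in tree[1:]]) + ")"

def move_verb_before_wh(tree):
    """Sketch of: VP < (/^WH/=wh $++ /^VB/=vb), then `move vb $+ wh`."""
    if isinstance(tree, str):
        return tree
    kids = [move_verb_before_wh(child) for child in tree[1:]]
    if tree[0] == "VP":
        for i, wh in enumerate(kids):
            if not (isinstance(wh, list) and wh[0].startswith("WH")):
                continue
            for j in range(i + 1, len(kids)):
                vb = kids[j]
                if isinstance(vb, list) and vb[0].startswith("VB"):
                    kids.insert(i, kids.pop(j))   # vb is now the left sister of wh
                    return [tree[0]] + kids
    return [tree[0]] + kids

tree = ["ROOT", ["SQ", ["NP", ["NNS", "Cats"]],
                       ["VP", ["VBP", "do"],
                              ["VP", ["WHNP", "what"], ["VB", "eat"]]],
                       ["PUNCT", "."]]]

result = move_verb_before_wh(tree)
print(show(result))
```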

`adjoin`: adjoins the specified auxiliary tree into the named node. The daughters of the target node become the daughters of the foot of the auxiliary tree (the node marked with `@`).

- Pattern: `VP=vp > SQ !> (__ << usually)`
- Operation: `adjoin (VP (ADVP (RB usually)) VP@) vp`
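The slide gives no input tree for this rule, so the Python sketch below assumes the "Cats do eat what" tree from the earlier examples. `adjoin_usually`, `leaves`, and `show` are hypothetical helpers; the function wraps the VP under SQ in the auxiliary tree and hands the old daughters to the foot node.

```python
def show(tree):
    """Render nested lists back into bracketed notation."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join([tree[0]] + [show(c) for c in tree[1:]]) + ")"

def leaves(tree):
    """Return the words under a node, left to right."""
    return [tree] if isinstance(tree, str) else [w for c in tree[1:] for w in leaves(c)]

def adjoin_usually(tree):
    """Sketch of: VP=vp > SQ !> (__ << usually), then adjoin (VP (ADVP (RB usually)) VP@) vp."""
    if isinstance(tree, str):
        return tree
    # The guard mirrors "!> (__ << usually)": skip if the parent already dominates "usually".
    guard = tree[0] == "SQ" and "usually" not in leaves(tree)
    children = []
    for child in tree[1:]:
        if guard and isinstance(child, list) and child[0] == "VP":
            foot = ["VP"] + child[1:]            # the foot inherits the target's daughters
            child = ["VP", ["ADVP", ["RB", "usually"]], foot]
        else:
            child = adjoin_usually(child)
        children.append(child)
    return [tree[0]] + children

# Assumed input tree, reused from the earlier Tsurgeon examples.
tree = ["ROOT", ["SQ", ["NP", ["NNS", "Cats"]],
                       ["VP", ["VBP", "do"],
                              ["VP", ["VB", "eat"], ["WHNP", "what"]]],
                       ["PUNCT", "."]]]

result = adjoin_usually(tree)
print(show(result))
```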

- Input: arbitrary text
- Output: simple, concise, declarative sentences

Input: Putin, the Russian Prime Minister, visited Moscow. Desired Output: Putin was the Russian Prime Minister.

[Figure: parse tree of "Putin, the Russian Prime Minister, visited Siberia", with three nodes labelled: noun (NP "Putin"), appositive (NP "the Russian Prime Minister"), and mainverb (VBD "visited").]

Pattern: `NP < (NP=noun !$-- NP $+ (/,/ $++ NP|PP=appositive !$ CC|CONJP)) >> (ROOT << /^VB.*/=mainverb)`

[Figure: parse tree of the example sentence, with the nodes matched as noun, appositive, and mainverb marked.]

1. Extract the matched nodes: NP "Putin", VBD "visited", NP "the Russian Prime Minister".
2. Replace the main verb with the singular past tense form of "be": VBD "was".
3. Assemble the new tree: `(ROOT (S (NP Putin) (VP (VBD was) (NP the Russian Prime Minister))))`
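The appositive extraction can be sketched end to end in Python. This is an illustration under simplifying assumptions, not the system's implementation: the tree is hand-written as nested lists, the appositive must immediately follow the comma, the subject is assumed singular, and the copula is chosen from the main verb's tag alone ("was" for VBD, otherwise "is"). All function names are hypothetical.

```python
def leaves(tree):
    """Return the words under a node, left to right."""
    return [tree] if isinstance(tree, str) else [w for c in tree[1:] for w in leaves(c)]

def first_verb_tag(tree):
    """Tag of the first VB* node in pre-order, or None."""
    if isinstance(tree, str):
        return None
    if tree[0].startswith("VB"):
        return tree[0]
    for child in tree[1:]:
        tag = first_verb_tag(child)
        if tag:
            return tag
    return None

def find_appositive(tree):
    """Find (noun, appositive) where an NP starts with NP , NP|PP."""
    if isinstance(tree, str):
        return None
    kids = tree[1:]
    if tree[0] == "NP" and len(kids) >= 3:
        noun, comma, appos = kids[0], kids[1], kids[2]
        if (isinstance(noun, list) and noun[0] == "NP"
                and isinstance(comma, list) and comma[0] == ","
                and isinstance(appos, list) and appos[0] in ("NP", "PP")):
            return noun, appos
    for child in kids:
        hit = find_appositive(child)
        if hit:
            return hit
    return None

def extract_appositive_sentence(tree):
    """Build 'NOUN was/is APPOSITIVE.' or return None if nothing matches."""
    hit = find_appositive(tree)
    if hit is None:
        return None
    noun, appos = hit
    copula = "was" if first_verb_tag(tree) == "VBD" else "is"   # singular subject assumed
    return " ".join(leaves(noun) + [copula] + leaves(appos)) + "."

tree = ["ROOT", ["S",
    ["NP", ["NP", ["NNP", "Putin"]], [",", ","],
           ["NP", ["DT", "the"], ["JJ", "Russian"], ["NNP", "Prime"], ["NNP", "Minister"]],
           [",", ","]],
    ["VP", ["VBD", "visited"], ["NP", ["NNP", "Siberia"]]],
    [".", "."]]]

sentence = extract_appositive_sentence(tree)
print(sentence)
```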

- Representation: phrase structure trees from the Stanford Parser
- Syntactic rules are written in the Tregex tree-searching language
  - Tregex operators encode tree relations such as dominance, sisterhood, etc.
- Manipulations are performed over the identified Tregex patterns (Tsurgeon)

Given an input sentence A that is assumed true, we aim to extract sentences B that are also true. Our operations are informed by two phenomena:

- semantic entailment
- presupposition

A entails B: B is true whenever A is true (Levinson 1983).

Entailment holds when removing certain types of modifiers.

- A: However, Jefferson did not believe the Embargo Act, which restricted trade with Europe, would hurt the American economy.
  - Discourse marker: "However"
  - Non-restrictive relative clause: "which restricted trade with Europe"
- B: Jefferson did not believe the Embargo Act would hurt the American economy.

In most clausal and verbal conjunctions, the individual conjuncts are entailed.

- A: Mr. Putin built his reputation in part on his success at suppressing terrorism, so the attacks could be considered a challenge to his stature.
- B1: Mr. Putin built his reputation in part on his success at suppressing terrorism.
- B2: The attacks could be considered a challenge to his stature.

In some constructions, B is true regardless of whether the main clause of sentence A is true; i.e., B is presupposed to be true.

- A: Hamilton did not like Jefferson, the third U.S. President. (negation of main clause)
- B: Jefferson was the third U.S. President.

Many presuppositions have clear syntactic or lexical associations.

| Trigger | Example |
| --- | --- |
| non-restrictive appositives | Jefferson, the third U.S. President, … |
| non-restrictive relative clauses | Jefferson, who was the third U.S. President… |
| participial modifiers | Jefferson, being the third U.S. President, … |
| temporal subordinate clauses | Before Jefferson was the third U.S. President, … |

Presupposed in each case: Jefferson was the third U.S. President.

- Input: declarative sentences derived in stage 1
- Output: a set of grammatically correct questions
  - Well-defined syntactic transformations
  - Identification of answer phrases for WH-movement
  - Marking of unmovable chunks
  - etc.

Declarative Sentence → Mark Unmovable Phrases → Generate Possible Question Phrase * → (Decompose Main Verb) → (Invert Subject and Auxiliary) → Insert Question Phrase → Perform Post-processing → Question

1. Mark phrases that cannot be answer phrases
2. Select an answer phrase, and generate a set of question phrases for it
3. Decompose the main verb
4. Invert the subject and auxiliary verb
5. Remove the answer phrase and insert one of the question phrases at the beginning of the main clause
6. Post-process to ensure proper formatting

- Exceptions
  - Yes-no questions
    - There is no answer phrase to remove and no question phrase to insert
  - The answer phrase is the subject of the declarative sentence
    - John met Sally → Who met Sally?
    - Decomposition of the main verb and subject-auxiliary inversion are not necessary
    - The subject is removed and replaced by a question phrase in the same position

- Question generation involves
  - WH-movement
    - To generate WH questions
    - The target answer phrase is transformed into a WH phrase and moved to the front (WH-fronting)
    - Are all phrases movable?
  - Subject-auxiliary inversion
    - To generate decision (yes-no) questions
    - The positions of the subject and auxiliary verb are swapped

- An example
  - Darwin studied how species evolve.
    - "Species" is a potential answer phrase
    - Fronting it gives the ungrammatical *What did Darwin study how evolve?
- Mark phrases that should not undergo WH-movement using Tregex patterns
  - Constraints over the phrases
    - Phrases under a clause with a WH complementizer cannot undergo WH-movement
    - `SBAR < /^WH.*P/ << NP|ADJP|VP|ADVP|PP=unmv`

- Clauses (i.e., "S" nodes) that are under verb phrases and are signalled as adjuncts by being offset by commas
  - Pattern: `VP < (S=unmv $,, /,/)`
  - Input sentence: James hurried, barely catching the bus.
  - Question to avoid: *What did James hurry?
  - `A $,, B`: A is a sister of B and follows B

- Iterate over the possible answer phrases
  - Generate a question for each
  - Skipped for decision questions
- An answer phrase is one of the following
  - Noun phrase ("NP"): Abraham Lincoln
  - Prepositional phrase ("PP"): in 1801
  - Subordinate clause ("SBAR"): that Thomas Jefferson was the 3rd U.S. President

- Mapping answer phrases to question phrases
  - Supersense tagger
    - Labels word tokens with high-level semantic classes
    - noun.person, noun.location, etc.

| Token | Tag |
| --- | --- |
| Richard | B-noun.person |
| Nixon | I-noun.person |
| visited | B-verb.social |
| China | B-noun.location |
| to | O |
| improve | B-verb.change |
| diplomacy | B-noun.communication |
| . | O |
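The B-/I-/O prefixes in the tagger output mark where a labelled span begins, continues, or is absent. A short Python sketch of grouping tokens into supersense spans follows; `bio_spans` is a hypothetical helper, and the token/tag alignment is the one read off the slide.

```python
def bio_spans(tokens, tags):
    """Group tokens into (phrase, supersense) spans using B-/I-/O tags."""
    spans = []
    current = None                       # [words, label] of the span being built
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            current = [[token], tag[2:]]
            spans.append(current)
        elif tag.startswith("I-") and current is not None and current[1] == tag[2:]:
            current[0].append(token)     # continue the open span
        else:
            current = None               # "O" or an inconsistent I- tag closes the span
    return [(" ".join(words), label) for words, label in spans]

tokens = ["Richard", "Nixon", "visited", "China", "to", "improve", "diplomacy", "."]
tags = ["B-noun.person", "I-noun.person", "B-verb.social", "B-noun.location",
        "O", "B-verb.change", "B-noun.communication", "O"]

spans = bio_spans(tokens, tags)
print(spans)
```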

| WH word | Conditions | Examples |
| --- | --- | --- |
| Who | `tag@head` = noun.person, or a personal pronoun | Abraham Lincoln, him, the 16th president |
| What | `tag@head` != noun.time or noun.person | The White House, the building |
| Where | Object of a PP tagged with noun.location, and preposition is one of: on, in, at, over, to | in Japan, to a small town |
| When | `tag@head` = noun.time | Wednesday, next year, 1929 |
| Whose NP | `tag@head` word is noun.person, and the answer phrase is modified with a possessive | John's car, the president's visit to Asia, the companies' profits |
| How many NP | The answer phrase is modified by a cardinal number or quantifier phrase | 10 books, two hundred years |
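The table can be read as a set of independent rules, each of which may add a candidate WH word. The Python sketch below encodes that literal reading; the function name, its parameters, and the decision to return every rule that fires are assumptions, and the real system's conditions may be stricter.

```python
LOCATIVE_PREPOSITIONS = {"on", "in", "at", "over", "to"}

def wh_words(head_tag, preposition=None, personal_pronoun=False,
             possessive=False, quantified=False):
    """Return candidate WH words for an answer phrase (literal reading of the table).

    head_tag:         supersense tag of the phrase's head word, e.g. "noun.person"
    preposition:      the preposition, if the answer phrase is a PP
    personal_pronoun: True if the phrase is a personal pronoun
    possessive:       True if the phrase is modified by a possessive
    quantified:       True if modified by a cardinal number or quantifier
    """
    words = []
    if head_tag == "noun.person" or personal_pronoun:
        words.append("who")
    if head_tag not in ("noun.time", "noun.person"):
        words.append("what")
    if head_tag == "noun.location" and preposition in LOCATIVE_PREPOSITIONS:
        words.append("where")
    if head_tag == "noun.time":
        words.append("when")
    if head_tag == "noun.person" and possessive:
        words.append("whose")
    if quantified:
        words.append("how many")
    return words

print(wh_words("noun.person"))                    # Abraham Lincoln
print(wh_words("noun.location", preposition="in"))  # in Japan
print(wh_words("noun.time"))                      # 1929
```

Several rules can fire for one phrase, which fits the overgenerate-and-rank approach: surplus candidates are left for the ranking stage to sort out.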

- Situation: subject-auxiliary inversion
- Condition: an auxiliary verb or modal is not present
- Action: main verb = auxiliary "do" + base form of the main verb
- Ex: John saw Mary → John did see Mary → Who did John see?
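A toy Python version of the decomposition step is shown below. It is a sketch only: the tiny `BASE_FORMS` table stands in for a real lemmatizer, the mapping from Penn Treebank verb tags to forms of "do" is an assumption, and `object_question` handles only the simple subject-verb-object case from the example.

```python
# Toy lemma table; a real system would use a morphological analyzer.
BASE_FORMS = {"saw": "see", "met": "meet", "visited": "visit", "studied": "study"}

# Assumed mapping from the tensed verb's tag to the matching form of "do".
DO_FORMS = {"VBD": "did", "VBZ": "does", "VBP": "do"}

def decompose(tag, verb):
    """Split a tensed main verb into (form of 'do', base form of the verb)."""
    return DO_FORMS[tag], BASE_FORMS.get(verb, verb)

def object_question(subject, tag, verb, wh="Who"):
    """Decompose the verb, invert subject and auxiliary, and front the WH word."""
    aux, base = decompose(tag, verb)
    return f"{wh} {aux} {subject} {base}?"

print(decompose("VBD", "saw"))
print(object_question("John", "VBD", "saw"))
```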

- Identifying main verbs that need to be decomposed:
  `ROOT < (S=clause < (VP=mainvp [ < (/VB.?/=tensed !< is|was|were|am|are|has|have|had|do|does|did) | < /VB.?/=tensed !< VP ]))`

`ROOT=root < (S=clause <+(/VP.*/) (VP < /(MD|VB.?)/=aux < (VP < /VB.?/=verb)))`

- Named nodes: clause, aux, verb
- `A <+(C) B`: A dominates B via an unbroken chain of nodes matching C

`ROOT=root < (S=clause <+(/VP.*/) (VP < (/VB.?/=copula < is|are|was|were|am) !< VP))`

- Copula: a word used to link the subject of a sentence with a predicate (a subject complement)

`S < (NP=np $+ VP)` `delete np` `S=start`

Sir Isaac Newton's book "Mathematical Principles of Natural Philosophy", first published in 1687, laid the foundations for classical mechanics.

TREE-I

TREE-II

- Tregex: `ROOT=root < (SQ=qclause << /^(NP|PP|SBAR)-0/=answer < VP=predicate)`
- Phrase to move: `(PP (IN in) (NP (CD 1687)))`

Insert WH subtree: (WHNP (WHADVP (WRB when)))

1. Whose book ``Mathematical Principles of Natural Philosophy'' was first published in 1687?
2. What laid the foundations for classical mechanics?
3. What did Sir Isaac Newton's book ``Mathematical Principles of Natural Philosophy'' lay?
4. When was Sir Isaac Newton's book ``Mathematical Principles of Natural Philosophy'' first published?
5. Did Sir Isaac Newton's book ``Mathematical Principles of Natural Philosophy'' lay the foundations for classical mechanics?
6. Whose book ``Mathematical Principles of Natural Philosophy'' laid the foundations for classical mechanics?
7. Was Sir Isaac Newton's book ``Mathematical Principles of Natural Philosophy'' first published in 1687?
8. What was first published in 1687?

Arvind Kejriwal, the AAP leader, resigned from the post of CM. Appositive tree

TREE-I TREE-II

- Tregex: `ROOT=root < (SQ=qclause << /^(NP|PP|SBAR)-0/=answer < VP=predicate)`
- Phrase to move: `(NP (NNP Arvind) (NNP Kejriwal))`

Insert WH subtree: (WHNP (WHNP (WRB who)))

1. Who resigned from the post of CM?
2. What did Arvind Kejriwal resign from?
3. Who was Arvind Kejriwal?
4. Who was the AAP leader?
5. Did Arvind Kejriwal resign from the post of CM?
6. Was Arvind Kejriwal the AAP leader?

- Question features
  - Length features
    - Length of the question, source sentence, and answer phrase
  - WH words
    - Boolean feature: whether a question is a WH question
  - N-gram log likelihood of the question
  - Grammatical features
  - Transformation features
  - etc.
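Two of the feature groups above, lengths and the WH indicator, are simple enough to sketch in Python. The feature names, the whitespace tokenization, and the `WH_WORDS` set are assumptions for illustration; the language-model, grammatical, and transformation features are omitted because they need external models.

```python
WH_WORDS = {"who", "what", "where", "when", "whose", "which", "how", "why"}

def question_features(question, source_sentence, answer_phrase):
    """Compute length features and a WH indicator for one candidate question."""
    q_tokens = question.split()
    first = q_tokens[0].lower() if q_tokens else ""
    return {
        "question_length": len(q_tokens),
        "source_length": len(source_sentence.split()),
        "answer_length": len(answer_phrase.split()),
        "is_wh": first in WH_WORDS,
    }

wh_feats = question_features("Who met Sally?", "John met Sally.", "John")
yn_feats = question_features("Did John meet Sally?", "John met Sally.", "")
print(wh_feats)
print(yn_feats)
```

A ranker would consume dictionaries like these, one per generated question, alongside the remaining feature groups.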

- Term project evaluation includes
  - Presentation (10 min)
  - Demonstration (20 min)
- Date: 18.04.2015 (Saturday), from 9:30 am
  - Groups 1-4
- Date: 18.04.2015 (Saturday), from 2:30 am
  - Groups 5-9
