Source sentence: a sentence taken directly from the input document
Derived sentence: a declarative sentence derived in stage 1
Answer phrase: a possible answer to generated questions
Question phrase: a phrase containing the question word that replaces an answer phrase
Tregex: mark clauses or phrases for NLP transformation (simplification, compression); answer phrase marking
Tsurgeon: delete clauses or phrases for NLP transformation
Resources: "Tregex and Tsurgeon: tools for querying and manipulating tree data structures", Levy and Andrew
Web:
Tregex: a Java program for identifying patterns in trees
Like regular expressions for strings
Simple example: NP < NN
[Figure: parse tree of "The firm stopped using crocidolite in its cigarette filters", with node labels S, NP, VP, PP, DT, NN, VBD, VBG, IN, PRP]
Command line: tregex.sh "NP < NN" treeFilename
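To make the simple example concrete, here is a minimal Python sketch of what NP < NN selects. It does not call Tregex; it re-implements only the "parent of" check on a small hand-written parser. The bracketing of the example sentence and the helper names (parse, subtrees, leaves, np_over_nn) are assumptions made for illustration.

```python
import re

def parse(s):
    """Parse a Penn-style bracketing into (label, children); leaves are plain strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                       # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                       # skip ")"
        return (label, children)

    return node()

def subtrees(t):
    """Yield t and every phrase node below it, in pre-order."""
    yield t
    for c in t[1]:
        if isinstance(c, tuple):
            yield from subtrees(c)

def leaves(t):
    """Return the words under node t, left to right."""
    words = []
    for c in t[1]:
        words.extend(leaves(c) if isinstance(c, tuple) else [c])
    return words

def np_over_nn(t):
    """Word yields of all NP nodes that have an NN child (the idea behind NP < NN)."""
    return [" ".join(leaves(n)) for n in subtrees(t)
            if n[0] == "NP"
            and any(isinstance(c, tuple) and c[0] == "NN" for c in n[1])]

# Assumed bracketing of the example sentence.
TREE = parse("(S (NP (DT The) (NN firm)) (VP (VBD stopped) (VP (VBG using) "
             "(NP (NN crocidolite)) (PP (IN in) (NP (PRP its) (NN cigarette) (NNS filters))))))")
print(np_over_nn(TREE))
```

All three NPs in this bracketing have an NN child, so all three are selected.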
The basic units of Tregex are node descriptions
Descriptions match node labels of a tree
▪ Literal string to match: NP
▪ Disjunction of literal strings separated by ‘|’: NP|PP|VP
▪ Regular expression (Java 5 regex): /NN.?/ (matches NN, NNP, NNS)
▪ Wildcard symbol: __ (two underscores), matches any node
Descriptions can be negated with !: !NP
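The node-description forms can be mimicked in a few lines of Python. This is a sketch of the idea, not the Tregex implementation: the function name matches is hypothetical, and using re.search for the /.../ form is an assumption about how the regular expression is applied to the label.

```python
import re

def matches(desc, label):
    """Check one node description against one node label."""
    if desc.startswith("!"):                      # negation: !NP
        return not matches(desc[1:], label)
    if desc == "__":                              # wildcard: any node
        return True
    if len(desc) >= 2 and desc.startswith("/") and desc.endswith("/"):
        return re.search(desc[1:-1], label) is not None   # regex: /NN.?/
    return label in desc.split("|")               # literal or disjunction

for desc, label in [("NP", "NP"), ("NP|PP|VP", "PP"), ("/NN.?/", "NNS"),
                    ("/NN.?/", "VBZ"), ("__", "X"), ("!NP", "VP")]:
    print(desc, label, matches(desc, label))
```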
Relationships between tree nodes can be specified
There are many different relations. Here are a few:

Symbol      Description
A < B       A is the parent of B
A << B      A is an ancestor of B
A $ B       A and B are sisters
A $+ B      B is the next sister of A
A <i B      B is the i-th child of A
A <: B      B is the only child of A
A <<# B     B is a head of phrase A
A <<- B     B is the rightmost descendant of A
A .. B      A precedes B in depth-first traversal of the tree
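A few of these relations can be written as plain predicates over a tree with parent pointers. The sketch below is illustrative only; the Node class and the function names are hypothetical and are not part of Tregex.

```python
class Node:
    """Tree node with a label, ordered children, and a parent pointer."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for c in self.children:
            c.parent = self

def parent_of(a, b):        # A < B
    return b.parent is a

def ancestor_of(a, b):      # A << B
    p = b.parent
    while p is not None:
        if p is a:
            return True
        p = p.parent
    return False

def sisters(a, b):          # A $ B
    return a is not b and a.parent is not None and a.parent is b.parent

def next_sister(a, b):      # A $+ B: B is the next sister of A
    if not sisters(a, b):
        return False
    sibs = a.parent.children
    return sibs.index(b) == sibs.index(a) + 1

def only_child(a, b):       # A <: B
    return b.parent is a and len(a.children) == 1

# (X (NP (NN dog)) (VP (VBZ barks)))
dog = Node("dog"); nn = Node("NN", [dog]); np = Node("NP", [nn])
barks = Node("barks"); vbz = Node("VBZ", [barks]); vp = Node("VP", [vbz])
x = Node("X", [np, vp])
print(parent_of(np, nn), ancestor_of(x, dog), next_sister(np, vp))
```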
Relations can be strung together for “and”
All relations are relative to the first node in the string
▪ NP < NN $ VP: “An NP over an NN and with sister VP”
The & symbol is optional: NP < NN & $ VP
Nodes can be grouped with parentheses
▪ NP < (NN < dog): “An NP over an NN that is over ‘dog’ ”
▪ Not the same as NP < NN < dog
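The difference between the grouped and ungrouped forms is easy to check by hand. In this sketch (plain tuples, hypothetical helper names, no Tregex involved) the ungrouped pattern asks for both NN and dog as children of the NP, because every relation is relative to the first node.

```python
def label(child):
    """Label of a phrase node, or the word itself for a leaf."""
    return child[0] if isinstance(child, tuple) else child

def grouped(np):
    """NP < (NN < dog): the NP has an NN child that itself has child 'dog'."""
    return any(isinstance(c, tuple) and c[0] == "NN"
               and any(label(g) == "dog" for g in c[1])
               for c in np[1])

def ungrouped(np):
    """NP < NN < dog: the NP has an NN child AND the NP has a child 'dog'."""
    return (any(label(c) == "NN" for c in np[1])
            and any(label(c) == "dog" for c in np[1]))

np = ("NP", [("NN", ["dog"])])          # (NP (NN dog))
print(grouped(np), ungrouped(np))
```

On (NP (NN dog)) only the grouped form matches, since ‘dog’ is a child of NN and not of NP.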
Ex: NP < (NN < dog) $ (VP <<# (barks > VBZ))
“An NP both over an NN over ‘dog’ and with a sister VP headed by ‘barks’ under VBZ”
[Figure: matching tree (X (NP (NN dog)) (VP (VBZ barks)))]
Operators can be combined via “or” with |
▪ Ex: NP < NN | < NNS: “An NP over NN or over NNS”
By default, & takes precedence over |
▪ Ex: NP < NNS | < NN & $ VP: “NP over NNS OR both over NN and w/ sister VP”
Equivalent operators are left-associative
Any relation can be negated with the “!” prefix
▪ Ex: NP !<< NNP: “An NP that does not dominate NNP”
To specify operation order, use [ and ]
▪ Ex: NP [ < NNS | < NN ] $ VP: “An NP either over NNS or NN, and w/ sister VP”
Grouped relations can be negated: just put ! before the [
Already we can build very complex expressions!
▪ NP <- /NN.?/ > (PP <<# (IN ![ < of | < on]))
▪ “An NP with rightmost child matching /NN.?/ under a PP headed by some preposition (IN) that is not either ‘of’ or ‘on’ ”
NP <- /NN.?/ > (PP <<# (IN ![ < of | < on]))
“An NP with rightmost child matching /NN.?/ under a PP headed by some preposition (IN) that is not either ‘of’ or ‘on’ ”
[Figure: matching tree, a PP with IN ‘about’ over an NP containing an NNS]
Sometimes we want to find which nodes matched particular sub-expressions
▪ Ex: /NN.?/ $- JJ|DT
▪ What was the modifier that preceded the noun?
Name nodes with =; if the expression matches, the matching sub-expression can be retrieved by name
▪ Ex: /NN.?/ $- JJ|DT=premod
▪ The subtree with root matching JJ|DT is stored in a map under the key “premod”
Note: named nodes are not allowed in the scope of a negation
Sometimes we want to try to match a sub-expression to retrieve named nodes if they exist, but still match the root if the sub-expression fails
Use the optional relation prefix ‘?’
▪ Ex: NP < (NN ?$- JJ=premod) $+ CC $++ NP
▪ Matches NP over NN with sisters CC and NP
▪ If NN is preceded by JJ, we can retrieve the JJ using the key “premod”
▪ If there is no JJ, the expression will still match
Cannot be combined with negation
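Named nodes and the optional prefix can be pictured as a match function that returns a map of names to nodes. The sketch below only mirrors the behaviour described above for a noun with an immediately preceding JJ or DT sister; the function match_noun, its arguments, and the (label, word) child representation are hypothetical.

```python
import re

def match_noun(children, i, optional=False):
    """Mimic /NN.?/ $- JJ|DT=premod on child i of a phrase.

    children: list of (label, word) pairs, in order.
    Returns a dict of named nodes on a match, or None when there is no match.
    With optional=True (the '?' prefix) the noun still matches without a premodifier.
    """
    label, _word = children[i]
    if re.search(r"NN.?", label) is None:
        return None                                  # root description fails
    if i > 0 and children[i - 1][0] in ("JJ", "DT"):
        return {"premod": children[i - 1]}           # named node retrieved
    return {} if optional else None

kids = [("DT", "the"), ("JJ", "big"), ("NN", "dog")]
print(match_noun(kids, 2))                                # premodifier found
print(match_noun([("NN", "dogs")], 0))                    # no match without '?'
print(match_noun([("NN", "dogs")], 0, optional=True))     # match, nothing named
```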
What? Tsurgeon makes operations on a grammatical tree
How? Based on Tregex syntax
Where? JavaNLP: trees.tregex.tsurgeon
Tregex: utility for identifying patterns in trees (like regular expressions for strings)
Patterns are built from node descriptions and relationships between nodes
Ex: NP < /^NN/
[Figure: parse tree of "The firm stopped using crocidolite in its cigarette filters", searched with NP < /^NN/]
Define a pattern to be matched on the trees
▪ VBZ=vbz $+ NP
Define one or several operation(s)
▪ relabel vbz VBZ_TRANSITIVE
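The effect of this pattern and operation pair can be sketched without Tsurgeon: find every VBZ whose next sister is an NP and change its label. The nested-list tree representation and the function name relabel_transitive are assumptions for illustration.

```python
def relabel_transitive(node):
    """Relabel VBZ nodes immediately followed by an NP sister.

    node: [label, child, ...] where leaves are plain strings. Mutates in place.
    """
    children = node[1:]
    for left, right in zip(children, children[1:]):
        if (isinstance(left, list) and isinstance(right, list)
                and left[0] == "VBZ" and right[0] == "NP"):
            left[0] = "VBZ_TRANSITIVE"
    for c in children:
        if isinstance(c, list):
            relabel_transitive(c)
    return node

transitive = ["S", ["NP", ["NN", "dog"]],
              ["VP", ["VBZ", "chases"], ["NP", ["NNS", "cats"]]]]
intransitive = ["S", ["NP", ["NN", "dog"]], ["VP", ["VBZ", "barks"]]]
print(relabel_transitive(transitive))
print(relabel_transitive(intransitive))
```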
Tree: (ROOT (SBARQ (SQ (NP (NNS Cats)) (VP (VBP do) (VP (WHNP what) (VB eat))))))
Pattern: SBARQ=sbarq > ROOT
Operation: excise sbarq sbarq
excise name1 name2: name1 is name2 or dominates name2. All children of name2 go into the parent of name1, where name1 was.
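For the case shown here, where name1 and name2 are the same node, excise amounts to splicing a node out and promoting its children. A minimal sketch under that assumption (nested lists as trees, hypothetical function name excise_children, no Tsurgeon calls):

```python
def excise_children(parent, label):
    """Splice out each direct child of `parent` labelled `label`.

    The removed node's children take its place, which corresponds to
    excising a node matched directly under `parent`. Mutates in place.
    """
    new_children = []
    for c in parent[1:]:
        if isinstance(c, list) and c[0] == label:
            new_children.extend(c[1:])     # promote the grandchildren
        else:
            new_children.append(c)
    parent[1:] = new_children
    return parent

tree = ["ROOT", ["SBARQ", ["SQ", ["NP", ["NNS", "Cats"]],
                           ["VP", ["VBP", "do"],
                            ["VP", ["WHNP", "what"], ["VB", "eat"]]]]]]
print(excise_children(tree, "SBARQ"))
```

After the operation, SQ hangs directly under ROOT.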
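[Figure: tree fragment with NP "Putin", VBD "visited", NP "the Russian Prime Minister"]
[Figure: tree fragment with NP "Putin", VBD "was" (singular past tense form of "be"), NP "the Russian Prime Minister"]
[Figure: resulting tree (ROOT (S (NP Putin) (VP (VBD was) (NP the Russian Prime Minister))))]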
Representation: phrase structure trees from the Stanford Parser
Syntactic rules are written in the Tregex tree searching language
▪ Tregex operators encode tree relations such as dominance, sisterhood, etc.
Manipulation is performed over the identified Tregex patterns (Tsurgeon)
Given an input sentence A that is assumed true, we aim to extract sentences B that are also true.
Our operations are informed by two phenomena:
▪ semantic entailment
▪ presupposition
A entails B: B is true whenever A is true (Levinson 1983).
Entailment holds when removing certain types of modifiers.
A: However, Jefferson did not believe the Embargo Act, which restricted trade with Europe, would hurt the American economy.
▪ discourse marker: “However”
▪ non-restrictive relative clause: “which restricted trade with Europe”
B: Jefferson did not believe the Embargo Act would hurt the American economy.
In most clausal and verbal conjunctions, the individual conjuncts are entailed.
A: Mr. Putin built his reputation in part on his success at suppressing terrorism, so the attacks could be considered a challenge to his stature.
B1: Mr. Putin built his reputation in part on his success at suppressing terrorism.
B2: The attacks could be considered a challenge to his stature.
In some constructions, B is true regardless of whether the main clause of sentence A is true, i.e., B is presupposed to be true.
A: Hamilton did not like Jefferson, the third U.S. President. (negation of main clause)
B: Jefferson was the third U.S. President.
Many presuppositions have clear syntactic or lexical associations.

Trigger                              Example
non-restrictive appositives          Jefferson, the third U.S. President, …
non-restrictive relative clauses     Jefferson, who was the third U.S. President…
participial modifiers                Jefferson, being the third U.S. President, …
temporal subordinate clauses         Before Jefferson was the third U.S. President, …

Presupposed in each case: Jefferson was the third U.S. President.
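As a rough illustration of the first trigger, the sketch below turns a non-restrictive appositive into its presupposed sentence by joining the two NPs with "was". It assumes the appositive has already been found as an NP, comma, NP sequence; the input layout and the function name appositive_sentence are hypothetical, and always choosing "was" is a simplification.

```python
def appositive_sentence(np):
    """Build 'X was Y.' from an NP of the form NP , NP (, optional).

    np: ("NP", [(label, [words...]), ...]) with children in order.
    Returns None when no NP-comma-NP sequence is present.
    """
    kids = np[1]
    for i in range(len(kids) - 2):
        if kids[i][0] == "NP" and kids[i + 1][0] == "," and kids[i + 2][0] == "NP":
            noun = " ".join(kids[i][1])
            appositive = " ".join(kids[i + 2][1])
            return f"{noun} was {appositive}."
    return None

subject = ("NP", [("NP", ["Jefferson"]),
                  (",", [","]),
                  ("NP", ["the", "third", "U.S.", "President"]),
                  (",", [","])])
print(appositive_sentence(subject))
```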
Input: declarative sentences derived in stage 1
Output: set of grammatically correct questions
▪ Well-defined syntactic transformations
▪ Identification of answer phrases for WH-movement
▪ Marking of unmovable chunks
▪ etc.
Declarative Sentence
-> Mark Unmovable Phrases
-> Generate Possible Question Phrase *
-> (Decompose Main Verb)
-> (Invert Subject and Auxiliary)
-> Insert Question Phrase
-> Perform Post-processing
-> Question
1. Mark phrases that cannot be answer phrases
2. Select an answer phrase, and generate a set of question phrases for it
3. Decompose the main verb
4. Invert the subject and auxiliary verb
5. Remove the answer phrase and insert one of the question phrases at the beginning of the main clause
6. Post-process to ensure proper formatting
Exceptions
Yes-no questions
▪ no answer phrase to remove and no question phrase to insert
The answer phrase is the subject of the declarative sentence
▪ John met Sally -> Who met Sally?
▪ decomposition of the main verb and subject-auxiliary inversion are not necessary
▪ the subject is removed and replaced by a question phrase in the same position
Question generation involves
WH-movement
▪ To generate WH questions
▪ Target answer phrase is transformed into a WH phrase and is moved to the front (WH-fronting)
▪ Are all phrases movable?
Subject-auxiliary inversion
▪ To generate decision (yes-no) questions
▪ Positions of subject and auxiliary verb are swapped
An example: Darwin studied how species evolve.
▪ ‘Species’ is a potential answer phrase
▪ *What did Darwin study how evolve?
Mark phrases that should not undergo WH-movement using Tregex patterns
▪ Constraints over the phrases
▪ Phrases under a clause with a WH complementizer cannot undergo WH-movement
▪ SBAR < /^WH.*P/ << NP|ADJP|VP|ADVP|PP=unmv
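The WH-complementizer constraint can be sketched as a tree walk: for every SBAR with a child whose label matches ^WH.*P, collect the NP, ADJP, VP, ADVP, and PP nodes below it. The bracketing of the Darwin sentence and all helper names here are assumptions; the code imitates the pattern and does not run Tregex.

```python
import re

BLOCKED = {"NP", "ADJP", "VP", "ADVP", "PP"}

def descendants(t):
    """Yield every phrase node strictly below t, in pre-order."""
    for c in t[1]:
        if isinstance(c, tuple):
            yield c
            yield from descendants(c)

def leaves(t):
    words = []
    for c in t[1]:
        words.extend(leaves(c) if isinstance(c, tuple) else [c])
    return words

def unmovable(tree):
    """Nodes that would be named 'unmv' by the WH-complementizer constraint."""
    marked = []
    for n in [tree, *descendants(tree)]:
        has_wh_child = any(isinstance(c, tuple) and re.search(r"^WH.*P", c[0])
                           for c in n[1])
        if n[0] == "SBAR" and has_wh_child:
            marked += [d for d in descendants(n) if d[0] in BLOCKED]
    return marked

# Assumed parse of "Darwin studied how species evolve."
TREE = ("S", [("NP", [("NNP", ["Darwin"])]),
              ("VP", [("VBD", ["studied"]),
                      ("SBAR", [("WHADVP", [("WRB", ["how"])]),
                                ("S", [("NP", [("NNS", ["species"])]),
                                       ("VP", [("VBP", ["evolve"])])])])])])
print([" ".join(leaves(n)) for n in unmovable(TREE)])
```

The subject NP "Darwin" lies outside the SBAR and stays available as an answer phrase.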
Clauses (i.e., “S” nodes) that are under verb phrases and are signalled as adjuncts by being offset by commas
Pattern: VP < (S=unmv $,, /,/)
Input sentence: James hurried, barely catching the bus.
Question to avoid: *What did James hurry?
A $,, B: A is a sister of B and follows B
Iterate over possible answer phrases and generate a question for each
Skipped for decision questions
An answer phrase is one of the following:
▪ Noun phrase (“NP”): Abraham Lincoln
▪ Prepositional phrase (“PP”): in 1801
▪ Subordinate clause (“SBAR”): that Thomas Jefferson was the 3rd U.S. President
Mapping answer phrases to question phrases
Supersense tagger
▪ Labels word tokens with high-level semantic classes
▪ noun.person, noun.location, etc.
Example:
Richard/B-noun.person Nixon/I-noun.person visited/B-verb.social China/B-noun.location to/O improve/B-verb.change diplomacy/B-noun.communication ./O
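The B-/I-/O labels in the example can be grouped into labelled spans with a short loop. This is a generic BIO-decoding sketch, not the supersense tagger itself; the function name spans is hypothetical.

```python
def spans(tokens, tags):
    """Group BIO-tagged tokens into (text, label) spans.

    A span starts at a B- tag and is extended by directly following I- tags
    with the same label; O tags close any open span.
    """
    out, current = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            current = [tag[2:], [tok]]
            out.append(current)
        elif tag.startswith("I-") and current is not None and current[0] == tag[2:]:
            current[1].append(tok)
        else:
            current = None
    return [(" ".join(words), label) for label, words in out]

TOKENS = ["Richard", "Nixon", "visited", "China", "to", "improve", "diplomacy", "."]
TAGS = ["B-noun.person", "I-noun.person", "B-verb.social", "B-noun.location",
        "O", "B-verb.change", "B-noun.communication", "O"]
print(spans(TOKENS, TAGS))
```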
WH-word / Conditions / Examples ([…] marks text lost from the table)

Who
  Conditions: […] or a personal pronoun
  Examples: Abraham Lincoln, him, the 16th president
What
  Conditions: […] not tagged noun.time or noun.person
  Examples: The White House, the building
Where
  Conditions: object of PP tagged with noun.location, and preposition is one of: on, in, at, over, to
  Examples: in Japan, to a small town
When
  Conditions: […]
  Examples: next year, 1929
Whose
  Conditions: […] word noun.person, and answer phrase is modified with a possessive
  Examples: John’s car, the president’s visit to Asia, the companies’ profits
How many NP
  Conditions: answer phrase is modified by a cardinal number or quantifier phrase
  Examples: 10 books, two hundred years
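A rough sketch of how such a table could drive the choice of question word is given below. Only the "where" rule follows the table directly. The rules for who, when, and what are assumptions, because those conditions are not fully legible in the table, and the pronoun list and the function name question_word are hypothetical. The whose and how many cases are left out.

```python
# Hypothetical, incomplete pronoun list used only for this illustration.
PRONOUNS = {"i", "me", "he", "him", "she", "her", "we", "us", "they", "them", "you"}
WHERE_PREPOSITIONS = {"on", "in", "at", "over", "to"}

def question_word(head, supersense, preposition=None):
    """Pick a WH-word from the head word's supersense tag.

    head: head word of the answer phrase (or of the PP object).
    supersense: tag such as "noun.person"; "O" when untagged.
    preposition: the preposition when the answer phrase is a PP, else None.
    """
    if (preposition is not None and preposition.lower() in WHERE_PREPOSITIONS
            and supersense == "noun.location"):
        return "where"                     # rule taken from the table
    if supersense == "noun.time":
        return "when"                      # assumed rule
    if supersense == "noun.person" or head.lower() in PRONOUNS:
        return "who"                       # assumed rule
    return "what"                          # assumed fallback

print(question_word("Lincoln", "noun.person"))
print(question_word("Japan", "noun.location", preposition="in"))
print(question_word("1929", "noun.time"))
print(question_word("building", "noun.artifact"))
```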
Situation: subject-auxiliary inversion
Condition: an auxiliary verb or modal is not present
Action: main verb = auxiliary do + base form of main verb
▪ John saw Mary -> John did see Mary -> Who did John see?
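The decomposition step can be sketched as a lookup from the main verb's POS tag to a form of "do", paired with the verb's base form. The sketch assumes the tag and lemma are already known (for example from a tagger and lemmatizer); the names DO_FORM, decompose, and object_question are hypothetical.

```python
# Form of "do" that carries the tense of the original main verb.
DO_FORM = {"VBD": "did", "VBZ": "does", "VBP": "do"}

def decompose(tag, lemma):
    """Split a tensed main verb into auxiliary 'do' + base form."""
    return DO_FORM[tag], lemma

def object_question(subject, tag, lemma, wh="Who"):
    """Decompose the verb, invert subject and auxiliary, and front the WH-word."""
    aux, base = decompose(tag, lemma)
    return f"{wh} {aux} {subject} {base}?"

print(decompose("VBD", "see"))                 # saw -> did + see
print(object_question("John", "VBD", "see"))   # John saw Mary -> question about Mary
```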
Identifying main verbs that need to be decomposed:
ROOT < (S=clause < (VP=mainvp [ < (/VB.?/=tensed !< is|was|were|am|are|has|have|had|do|does|did) | < /VB.?/=tensed !< VP ]))
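ROOT=root < (S=clause <+(/VP.*/) (VP < /(MD|VB.?)/=aux < (VP < /VB.?/=verb)))
[Figure: two example trees with the nodes named clause, aux, and verb marked]
A <+(C) B: A dominates B via an unbroken chain of nodes matching description C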
ROOT=root < (S=clause <+(/VP.*/) (VP < (/VB.?/=copula < is|are|was|were|am) !< VP))
Copula: a word used to link the subject of a sentence with a predicate (a subject complement)
Sir Isaac Newton's book "Mathematical Principles of Natural Philosophy", first published in 1687, laid the foundations for classical mechanics.
Tregex: ROOT=root < (SQ=qclause << /^(NP|PP|SBAR)-0/=answer < VP=predicate)
Phrase to move: (PP (IN in) (NP (CD 1687)))
Insert WH subtree: (WHNP (WHADVP (WRB when)))
1. Whose book "Mathematical Principles of Natural Philosophy" was first published in 1687?
2. What laid the foundations for classical mechanics?
3. What did Sir Isaac Newton's book "Mathematical Principles of Natural Philosophy" lay?
4. When was Sir Isaac Newton's book "Mathematical Principles of Natural Philosophy" first published?
5. Did Sir Isaac Newton's book "Mathematical Principles of Natural Philosophy" lay the foundations for classical mechanics?
6. Whose book "Mathematical Principles of Natural Philosophy" laid the foundations for classical mechanics?
7. Was Sir Isaac Newton's book "Mathematical Principles of Natural Philosophy" first published in 1687?
8. What was first published in 1687?
Arvind Kejriwal, the AAP leader, resigned from the post of CM.
[Figure: appositive tree]
1. Who resigned from the post of CM?
2. What did Arvind Kejriwal resign from?
3. Who was Arvind Kejriwal?
4. Who was the AAP leader?
5. Did Arvind Kejriwal resign from the post of CM?
6. Was Arvind Kejriwal the AAP leader?
Question features
Length features
▪ Length of question, source sentence, answer phrase
WH words
▪ Boolean feature: whether a question is a WH question
N-gram log likelihood of question
Grammatical features
Transformation features
etc.
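The length and WH-word features are simple to compute. The sketch below covers only those two groups, uses whitespace tokenization as a simplifying assumption, and invents the feature names and the WH-word list for illustration. The n-gram, grammatical, and transformation features are not attempted.

```python
# Hypothetical list of WH-words for the boolean feature.
WH_WORDS = {"who", "whom", "whose", "what", "which", "when", "where", "why", "how"}

def question_features(question, source, answer):
    """Length features and a WH indicator for one generated question."""
    q_tokens = question.split()
    return {
        "len_question": len(q_tokens),
        "len_source": len(source.split()),
        "len_answer": len(answer.split()),
        "is_wh": bool(q_tokens) and q_tokens[0].lower() in WH_WORDS,
    }

SOURCE = "Newton's book laid the foundations for classical mechanics."
print(question_features("What laid the foundations for classical mechanics?",
                        SOURCE, "Newton's book"))
print(question_features("Did Newton's book lay the foundations for classical mechanics?",
                        SOURCE, ""))
```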
Term project evaluation includes
▪ Presentation (10 min)
▪ Demonstration (20 min)
Date (Saturday) from 9:30 am: Groups 1-4
Date (Saturday) from 2:30 pm: Groups 5-9