
1 DEPENDENCY PARSING, FRAMENET, SEMANTIC ROLE LABELING, SEMANTIC PARSING Heng Ji jih@rpi.edu September 17, 2014 Acknowledgement: FrameNet slides from Charles Fillmore; Semantic Parsing slides from Rohit Kate and Yuk Wah Wong

2 Outline Dependency Parsing Formal definition Dynamic programming Supervised Classification Semantic Role Labeling Propbank Automatic SRL FrameNet Semantic Parsing

3 “Semantic Parsing” is, ironically, a semantically ambiguous term: semantic role labeling; finding generic relations in text; transforming a natural language sentence into its meaning representation

4 Semantic Parsing Semantic Parsing: Transforming natural language (NL) sentences into computer executable complete meaning representations (MRs) for domain-specific applications Realistic semantic parsing currently entails domain dependence Example application domains ATIS: Air Travel Information Service CLang: Robocup Coach Language Geoquery: A Database Query Application

5 ATIS: Air Travel Information Service Interface to an air travel database [Price, 1990]; a widely-used benchmark for spoken language understanding. Semantic parsing maps the query “May I see all the flights from Cleveland to Dallas?” to Air-Transportation Show: (Flight-Number) Origin: (City "Cleveland") Destination: (City "Dallas"), which retrieves the answer NA 1439, TQ 23, …

6 CLang: RoboCup Coach Language In the RoboCup Coach competition, teams compete to coach simulated players [http://www.robocup.org]. The coaching instructions are given in a computer language called CLang [Chen et al. 2003]. Semantic parsing maps “If the ball is in our goal area then player 1 should intercept it.” to the CLang expression (bpos (goal-area our) (do our {1} intercept))

7 Geoquery: A Database Query Application Query application for a U.S. geography database containing about 800 facts [Zelle & Mooney, 1996]. Semantic parsing maps “Which rivers run through the states bordering Texas?” to the query answer(traverse(next_to(stateid(‘texas’)))), which returns the answer Arkansas, Canadian, Cimarron, Gila, Mississippi, Rio Grande, …

8 What is the meaning of “meaning”? Representing the meaning of natural language is ultimately a difficult philosophical question Many attempts have been made to define generic formal semantics of natural language Can they really be complete? What can they do for us computationally? Not so useful if the meaning of Life is defined as Life’ Our meaning representation for semantic parsing does something useful for an application Procedural Semantics: The meaning of a sentence is a formal representation of a procedure that performs some action that is an appropriate response Answering questions Following commands

9 Meaning Representation Languages Meaning representation language (MRL) for an application is assumed to be present MRL is designed by the creators of the application to suit the application’s needs independent of natural language CLang was designed by RoboCup community to send formal coaching instructions to simulated players Geoquery’s MRL was based on the Prolog database

10 Engineering Motivation for Semantic Parsing Applications of domain-dependent semantic parsing Natural language interfaces to computing systems Communication with robots in natural language Personalized software assistants Question-answering systems Machine learning makes developing semantic parsers for specific applications more tractable Training corpora can be easily developed by tagging natural-language glosses with formal statements

11 Cognitive Science Motivation for Semantic Parsing Most natural-language learning methods require supervised training data that is not available to a child (no POS-tagged or treebank data). Assuming a child can infer the likely meaning of an utterance from context, NL-MR pairs are more cognitively plausible training data

12 Distinctions from Other NLP Tasks: Deeper Semantic Analysis Information extraction involves shallow semantic analysis. Show the long email Alice sent me yesterday → Sender: Alice; Sent-to: Me; Type: Long; Time: 7/10/2010

13 Distinctions from Other NLP Tasks: Deeper Semantic Analysis Semantic role labeling also involves shallow semantic analysis: in “Show the long email Alice sent me yesterday”, it labels roles such as sender (Alice), recipient (me), and theme (the long email)

14 Distinctions from Other NLP Tasks: Deeper Semantic Analysis Semantic parsing involves deeper semantic analysis to understand the whole sentence for some application Show the long email Alice sent me yesterday Semantic Parsing

15 Distinctions from Other NLP Tasks: Final Representation Part-of-speech tagging, syntactic parsing, SRL etc. generate some intermediate linguistic representation, typically for later processing; in contrast, semantic parsing generates a final representation. [Figure: POS tags (verb, determiner, adjective, noun, pronoun, …) and phrase-structure bracketing (noun phrases, verb phrases, sentence) for “Show the long email Alice sent me yesterday”]

16 Distinctions from Other NLP Tasks: Computer Readable Output The output of some NLP tasks, like question-answering, summarization and machine translation, is in NL and meant for humans to read. Since humans are intelligent, there is some room for incomplete, ungrammatical or incorrect output in these tasks; credit is given for partially correct output. In contrast, the output of semantic parsing is in a formal language and is meant for computers to read; it is critical to get the exact output, so evaluation is strict, with no partial credit

17 Distinctions from Other NLP Tasks Shallow semantic processing Information extraction Semantic role labeling Intermediate linguistic representations Part-of-speech tagging Syntactic parsing Semantic role labeling Output meant for humans Question answering Summarization Machine translation

18 Relations to Other NLP Tasks: Word Sense Disambiguation Semantic parsing includes performing word sense disambiguation: in “Which rivers run through the states bordering Mississippi?”, “Mississippi” could denote a state or a river; the parse answer(traverse(next_to(stateid(‘mississippi’)))) resolves it to the state

19 Relations to Other NLP Tasks: Syntactic Parsing Semantic parsing inherently includes syntactic parsing, but as dictated by the semantics. A semantic derivation for “our player 2 has the ball”: word-level meanings (our, player(_,_), 2, bowner(_), null) are composed into player(our,2) for “our player 2” and bowner(_) for “has the ball”, yielding MR: bowner(player(our,2))

20 Relations to Other NLP Tasks: Syntactic Parsing Semantic parsing inherently includes syntactic parsing, but as dictated by the semantics. The same semantic derivation with syntactic categories: PRP$-our, NN-player(_,_), CD-2, VB-bowner(_), null; NP-player(our,2), VP-bowner(_); S-bowner(player(our,2)). MR: bowner(player(our,2))

21 Relations to Other NLP Tasks: Machine Translation The MR could be looked upon as another NL [Papineni et al., 1997; Wong & Mooney, 2006] Which rivers run through the states bordering Mississippi? answer(traverse(next_to(stateid(‘mississippi’))))

22 Relations to Other NLP Tasks: Natural Language Generation Reversing a semantic parsing system becomes a natural language generation system [Jacobs, 1985; Wong & Mooney, 2007a] Which rivers run through the states bordering Mississippi? answer(traverse(next_to(stateid(‘mississippi’)))) Semantic Parsing NL Generation

23 Relations to Other NLP Tasks Tasks being performed within semantic parsing Word sense disambiguation Syntactic parsing as dictated by semantics Tasks closely related to semantic parsing Machine translation Natural language generation

28 Dependency Grammars In CFG-style phrase-structure grammars the main focus is on constituents. But it turns out you can get a lot done with just binary relations among the words in an utterance. In a dependency grammar framework, a parse is a tree where the nodes stand for the words in an utterance The links between the words represent dependency relations between pairs of words. Relations may be typed (labeled), or not.

29 Dependency Relations

30 Dependency Parse They hid the letter on the shelf

31 Dependency Parsing The dependency approach has a number of advantages over full phrase-structure parsing. It deals well with free word order languages where the constituent structure is quite fluid. Parsing is much faster than with CFG-based parsers. Dependency structure often captures the syntactic relations needed by later applications; CFG-based approaches often extract this same information from trees anyway.

32 Dependency Parsing There are two modern approaches to dependency parsing Optimization-based approaches that search a space of trees for the tree that best matches some criteria Shift-reduce approaches that greedily take actions based on the current word and state.
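
The shift-reduce family can be illustrated with a minimal arc-standard transition system. The sketch below (Python) is illustrative only, not any particular parser's implementation; choose_action stands in for the hand-written oracle or trained classifier discussed later.

# Minimal arc-standard shift-reduce dependency parser (illustrative sketch).
# Tokens are 1..n, with 0 as the artificial ROOT; choose_action is a stand-in
# for a hand-written oracle or a trained classifier.

def shift_reduce_parse(words, choose_action):
    stack = [0]                                  # start with ROOT on the stack
    buffer = list(range(1, len(words) + 1))      # remaining input words
    arcs = []                                    # collected (head, dependent) arcs

    while buffer or len(stack) > 1:
        action = choose_action(stack, buffer, arcs)
        if action == "SHIFT" and buffer:
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC" and len(stack) >= 2:
            dependent = stack.pop(-2)            # second-from-top takes top as head
            arcs.append((stack[-1], dependent))
        elif action == "RIGHT-ARC" and len(stack) >= 2:
            dependent = stack.pop()              # top takes second-from-top as head
            arcs.append((stack[-1], dependent))
        else:
            break                                # no applicable action: stop
    return arcs

Greedy parsers of this kind run in linear time, which is one source of the speed advantage mentioned on the previous slide.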

33 Phrase Structure Tree

34 Dependency Grammar Syntactic structure consists of lexical items, linked by binary asymmetric relations called dependencies Interested in grammatical relations between individual words (governing & dependent words) Does not propose a recursive structure Rather a network of relations These relations can also have labels

35 Draw the dependency tree Red figures on the screens indicated falling stocks

36 Dependency Tree

37 Dependency Tree Example Phrasal nodes are missing in the dependency structure when compared to constituency structure.

38 Dependency Tree with Labels

39 Comparison Dependency structures explicitly represent Head-dependent relations (directed arcs) Functional categories (arc labels) Possibly some structural categories (parts-of-speech) Phrase structures explicitly represent Phrases (non-terminal nodes) Structural categories (non-terminal labels) Possibly some functional categories (grammatical functions)

40 Learning DG over PSG Dependency Parsing is more straightforward Parsing can be reduced to labeling each token w_i with its head w_j Direct encoding of predicate-argument structure Fragments are directly interpretable Dependency structure independent of word order Suitable for free word order languages (like Indian languages)

41 Dependency Tree Formal definition An input word sequence w_1 … w_n Dependency graph D = (W, E) where W is the set of nodes, i.e. word tokens in the input sequence, and E is the set of unlabeled tree edges (w_i, w_j) with w_i, w_j ∈ W; (w_i, w_j) indicates an edge from w_i (parent) to w_j (child). The task of mapping an input string to a dependency graph satisfying certain conditions is called dependency parsing

42 Well-formedness A dependency graph is well-formed iff Single head: Each word has only one head. Acyclic: The graph should be acyclic. Connected: The graph should be a single tree with all the words in the sentence. Projective: If word A depends on word B, then all words between A and B are also subordinate to B (i.e. dominated by B).
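
As a sketch (not from the slides) of how these conditions can be checked, assume the parse is given as a list heads where heads[i-1] is the head of token i and 0 denotes the artificial root:

# Check well-formedness of a dependency graph (illustrative sketch).
# heads[i-1] is the head of token i (tokens numbered 1..n); head 0 is ROOT.

def is_well_formed(heads):
    n = len(heads)
    # Single head holds by construction: exactly one head entry per token.

    # Acyclic and connected: every token must reach ROOT by following heads.
    for i in range(1, n + 1):
        seen, j = set(), i
        while j != 0:
            if j in seen:
                return False                     # cycle detected
            seen.add(j)
            j = heads[j - 1]

    # Projective: for every arc (head h, dependent d), every word strictly
    # between h and d must be dominated by h.
    def dominated_by(h, k):
        while True:
            if k == h:
                return True
            if k == 0:
                return False
            k = heads[k - 1]

    for d in range(1, n + 1):
        h = heads[d - 1]
        lo, hi = min(h, d), max(h, d)
        if not all(dominated_by(h, k) for k in range(lo + 1, hi)):
            return False
    return True

# Example: "They hid the letter" with hid as root:
# They -> hid, hid -> ROOT, the -> letter, letter -> hid
assert is_well_formed([2, 0, 4, 2])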

43 Dependency Parsing Dependency based parsers can be broadly categorized into Grammar driven approaches Parsing done using grammars. Data driven approaches Parsing by training on annotated/un-annotated data.

44 Dependency Parsing Dependency based parsers can be broadly categorized into Grammar driven approaches Parsing done using grammars. Data driven approaches Parsing by training on annotated/un-annotated data. These approaches are not mutually exclusive

45 Covington’s Incremental Algorithm Incremental parsing in O(n^2) time by trying to link each new word to each preceding one [Covington 2001]: PARSE(x = (w_1, ..., w_n)): for i = 1 up to n: for j = i − 1 down to 1: LINK(w_i, w_j)

46 Covington’s Incremental Algorithm Incremental parsing in O(n^2) time by trying to link each new word to each preceding one [Covington 2001]: PARSE(x = (w_1, ..., w_n)): for i = 1 up to n: for j = i − 1 down to 1: LINK(w_i, w_j) Different conditions, such as Single-Head and Projectivity, can be incorporated into the LINK operation.
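
A minimal Python rendering of this pseudocode is sketched below; prefer_link is a hypothetical stand-in for whatever grammar or scoring model decides whether to draw a dependency, and the LINK operation shown enforces only the Single-Head condition.

# Covington-style incremental parsing (illustrative sketch).
# prefer_link(head, dep, words) is a hypothetical predicate deciding whether
# word `head` should govern word `dep`.

def covington_parse(words, prefer_link):
    n = len(words)
    head = [None] * (n + 1)                  # head[i] = head of token i (1-based)

    def link(i, j):
        # Try to connect w_i and w_j in either direction, respecting Single-Head.
        if head[i] is None and prefer_link(j, i, words):
            head[i] = j                      # w_j governs w_i
        elif head[j] is None and prefer_link(i, j, words):
            head[j] = i                      # w_i governs w_j

    for i in range(1, n + 1):                # each new word ...
        for j in range(i - 1, 0, -1):        # ... against every preceding word
            link(i, j)
    return head

Projectivity (and the other well-formedness conditions from slide 42) would be enforced by additional checks inside link before an arc is accepted.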

47 Dynamic Programming Basic Idea: Treat dependencies as constituents. Use, e.g., CYK parser (with minor modifications)

48 Dynamic Programming Approaches Original version [Hays 1964] (grammar driven) Link grammar [Sleator and Temperley 1991] (grammar driven) Bilexical grammar [Eisner 1996] (data driven) Maximum spanning tree [McDonald 2006] (data driven)

49 Eisner 1996 Two novel aspects: Modified parsing algorithm Probabilistic dependency parsing Time requirement: O(n^3) Modification: Instead of storing subtrees, store spans Span: Substring such that no interior word links to any word outside the span. Idea: In a span, only the boundary words are active, i.e. still need a head or a child One or both of the boundary words can be active
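
The span idea is usually presented today as a chart over "complete" and "incomplete" spans; the sketch below is that textbook formulation rather than a line-by-line rendering of Eisner (1996). It assumes an arc-score table score[h][m] for heads h and modifiers m (0 = ROOT) and returns only the best projective tree score; keeping backpointers alongside the chart would recover the tree itself.

# Eisner-style O(n^3) dynamic program over spans (illustrative sketch).
# score[h][m]: assumed arc score for head h -> modifier m, indices 0..n, 0 = ROOT.

NEG_INF = float("-inf")

def eisner_best_score(score, n):
    # C[s][t][d][c]: best score of the span (s, t);
    # d = 0: head at t ("<-"), d = 1: head at s ("->");
    # c = 0: incomplete (still expects a dependent), c = 1: complete.
    C = [[[[NEG_INF, NEG_INF], [NEG_INF, NEG_INF]] for _ in range(n + 1)]
         for _ in range(n + 1)]
    for s in range(n + 1):
        C[s][s][0][0] = C[s][s][0][1] = C[s][s][1][0] = C[s][s][1][1] = 0.0

    for length in range(1, n + 1):
        for s in range(n + 1 - length):
            t = s + length
            # Incomplete spans: place an arc between the endpoints.
            C[s][t][0][0] = max(C[s][r][1][1] + C[r + 1][t][0][1] + score[t][s]
                                for r in range(s, t))
            C[s][t][1][0] = max(C[s][r][1][1] + C[r + 1][t][0][1] + score[s][t]
                                for r in range(s, t))
            # Complete spans: absorb an already-complete sub-span.
            C[s][t][0][1] = max(C[s][r][0][1] + C[r][t][0][0] for r in range(s, t))
            C[s][t][1][1] = max(C[s][r][1][0] + C[r][t][1][1]
                                for r in range(s + 1, t + 1))
    return C[0][n][1][1]    # best projective tree headed by ROOT at position 0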

50 Example Red figures on the screen indicated falling stocks _ROOT_

51 Example Spans for “Red figures on the screen indicated falling stocks _ROOT_”, e.g. “indicated falling stocks” and “Red figures”

52 Assembly of correct parse Red figures on the screen indicated falling stocks _ROOT_ Start by combining adjacent words into minimal spans: “Red figures”, “figures on”, “on the”, …

53 Assembly of correct parse Red figures on the screen indicated falling stocks _ROOT_ Combine spans which overlap in one word; this word must be governed by a word in the left or right span. “on the” + “the screen” → “on the screen”

54 Assembly of correct parse Red figures on the screen indicated falling stocks _ROOT_ Combine spans which overlap in one word; this word must be governed by a word in the left or right span. “figures on” + “on the screen” → “figures on the screen”

55 Assembly of correct parse Red figures on the screen indicated falling stocks _ROOT_ Combine spans which overlap in one word; this word must be governed by a word in the left or right span. “Red figures on the screen” is an invalid span (its interior word “figures” still needs to link to a word outside the span)

56 Assembly of correct parse Red figures on the screen indicated falling stocks _ROOT_ Combine spans which overlap in one word; this word must be governed by a word in the left or right span. “indicated falling” + “falling stocks” → “indicated falling stocks”

57 Classifier-Based Parsing Data-driven deterministic parsing: Deterministic parsing requires an oracle. An oracle can be approximated by a classifier. A classifier can be trained using treebank data. Learning algorithms: Support vector machines (SVM) [Kudo and Matsumoto 2002, Yamada and Matsumoto 2003, Isozaki et al. 2004, Cheng et al. 2004, Nivre et al. 2006] Memory-based learning (MBL) [Nivre et al. 2004, Nivre and Scholz 2004] Maximum entropy modeling (MaxEnt) [Cheng et al. 2005]

58 Feature Models Learning problem: Approximate a function from parser states, represented by feature vectors, to parser actions, given a training set of gold standard derivations. Typical features: Tokens and POS tags of: Target words Linear context (neighbors in S and Q) Structural context (parents, children, siblings in G) Such features cannot be used in dynamic programming algorithms.
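
A sketch of how such a feature model reads off a transition-based parser state: the dictionary below collects token/POS features of the target words plus some linear and structural context. The feature names are illustrative, not those of any particular parser; any of the classifiers listed on the previous slide (SVM, MBL, MaxEnt) could then be trained on oracle derivations to map such feature vectors to actions.

# Illustrative feature extraction from a shift-reduce parser state.
# words[0] and pos[0] are assumed to be placeholders for the artificial ROOT.

def extract_features(stack, buffer, arcs, words, pos):
    feats = {}
    s0 = stack[-1] if stack else None            # target word on top of the stack
    b0 = buffer[0] if buffer else None           # next word in the queue

    if s0 is not None:
        feats["s0.word"], feats["s0.pos"] = words[s0], pos[s0]
        children = [d for (h, d) in arcs if h == s0]
        if children:                             # structural context: a child of s0
            feats["s0.leftmost_child.pos"] = pos[min(children)]
    if b0 is not None:
        feats["b0.word"], feats["b0.pos"] = words[b0], pos[b0]
    if len(buffer) > 1:                          # linear context in the queue
        feats["b1.pos"] = pos[buffer[1]]
    if s0 is not None and b0 is not None:        # conjoined feature
        feats["s0.pos+b0.pos"] = pos[s0] + "_" + pos[b0]
    return feats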

59 Feature Models

60 Dependency Parsers for download MST parser by Ryan McDonald Malt parser by Joakim Nivre Stanford parser

61 Outline Dependency Parsing Formal definition Dynamic programming Supervised Classification Semantic Role Labeling Propbank Automatic SRL FrameNet Semantic Parsing

62 What is PropBank: From Sentences to Propositions Powell met Zhu Rongji → Proposition: meet(Powell, Zhu Rongji). Powell met with Zhu Rongji / Powell and Zhu Rongji met / Powell and Zhu Rongji had a meeting / ... all map to meet(Somebody1, Somebody2) (and similarly for debate, consult, join, wrestle, battle). When Powell met Zhu Rongji on Thursday they discussed the return of the spy plane. → meet(Powell, Zhu), discuss([Powell, Zhu], return(X, plane))

63 Capturing semantic roles Faisal broke [ARG1 Tom’s chair]. [ARG1 Monica’s chair] was broken by Lucian. [ARG1 Nitin’s chair] broke into pieces when it fell down. (The same ARG1 surfaces in different syntactic positions, e.g. as SUBJ.)

64 A TreeBanked Sentence [Figure: constituency tree for a sentence along the lines of “The Supreme Court gave states working leeway”, with “Non Terminal” nodes (130969 ARGs) and “Terminal” nodes (4246 ARGs)]

65 The Same Sentence, PropBanked [Figure: the same tree with the predicate “gave” and its arguments labeled: ARG0 = The Supreme Court, ARG2 = states, ARG1 = working leeway; Leeway additionally bears an ARG0 label in the original figure]

66 Core Arguments Arg0 = agent Arg1 = direct object / theme / patient Arg2 = indirect object / benefactive / instrument / attribute / end state Arg3 = start point / benefactive / instrument / attribute Arg4 = end point

67 Secondary ArgMs LOC - where at? DIR - where to? MNR - how? PRP - why? TMP - when? TPC - topic REC - himself, themselves, each other PRD - this argument refers to or modifies another ADV - others

68 Distributions of Argument Types [Figure: distribution of CORE argument types and of ARGM types]

69 How to Use PropBank: Train a Semantic Role Labeling System (CoNLL 04, CoNLL 05, …) Our Goal: Given a list of (3073) target verbs, the system should be able to tag the possible nodes with semantic role labels (Ji et al., 2005)

70 Predicate Features: Lexical Head Word, Head POS of (-2,-1,0,1,2) window of Predicate Predicate is a transitive verb or not Predicate Voice (passive or not): the verb itself must be in its past participle form Passive Context: immediately following the verb "be", or postmodifying a noun in a reduced relative clause, e.g. "The building damaged by fire" Encoding: conjunction feature of Predicate POS_Passive Context

71 Predicate Sub-Categorization Feature VP → VBD NP NP: the phrase structure rule expanding the predicate’s grandparent. [Tree fragment as on slide 65]

72 Predicate Pivot Features Consider the predicate as a “pivot”; its grandparent’s children are defined in relation to it: vbd_NNS_VBG_NN, v_NNS_VBG_NN, gave_NNS_VBG_NN, vbd_NP_NP, v_NP_NP, gave_NP_NP [Tree fragment as on slide 65]

73 Argument Features: Lexical/Syntactic Head Word, Head POS, Phrase Type of (-2,-1,0,1,2) window words, Begin Word, Last Word, Left Sister, Right Sister, Parent of Argument (Head of PP replaced by head word of NP inside it) Head POS, Phrase Type of GrandParent Suffix1, Suffix2, Suffix3 Preceding, Succeeding Node’s Label Length (Span) Level from Leaves Beginning Letter of Phrase Type (for generalization) Punctuation before/after Whether it includes a Preposition or not, and Prep POS

74 Intervene Feature: Path Path: NP S VP VBD gave; PartialPath: NP S; InterPath: S VP VBD; PredPathArgPathLens: 3_1; CollapsedPath (delete nodes between clauses): NP S VP VBD gave; PathSOnly (replace all the non-clause nodes with "*"): NP S * * gave; PathSRemain (only keep clause nodes): NP S gave; PathNGrams: NP|S|VP, S|VP|VBD, VP|VBD|gave, VBD|gave|*, gave|*|*; PathLen: 4; ArgPhraseType, PathLen: NP, 4; PredHeadWord, PathSRemain: gave, NP S gave [Tree fragment as on slide 65]
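
A sketch of how the Path feature above can be computed: walk from the candidate argument node up to the lowest common ancestor, then down to the predicate, and join the node labels. The tiny Node class is an illustrative stand-in for whatever tree structure the SRL system actually uses; the space-separated output mirrors the feature values shown on this slide.

# Compute the constituent-path feature between an argument node and the
# predicate in a constituency tree (illustrative sketch).

class Node:
    def __init__(self, label, children=()):
        self.label, self.children, self.parent = label, list(children), None
        for child in self.children:
            child.parent = self

def ancestors(node):
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain                                  # node, its parent, ..., root

def path_feature(arg, pred):
    up, down = ancestors(arg), ancestors(pred)
    lca = next(node for node in up if node in down)        # lowest common ancestor
    up_part = up[:up.index(lca) + 1]                       # arg ... LCA
    down_part = list(reversed(down[:down.index(lca)]))     # nodes below LCA ... pred
    return " ".join(node.label for node in up_part + down_part)

# Tree fragment: S -> NP ("The Supreme Court") VP -> VBD -> gave ...
gave = Node("gave")
vp = Node("VP", [Node("VBD", [gave])])
np = Node("NP")
s = Node("S", [np, vp])
print(path_feature(np, gave))                     # "NP S VP VBD gave"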

75 Intervene Feature: Position Directionality: Left; SameClause: true; Dominate Phrase Type: S; Adjacency (Adjacent/not-Adjacent): Adjacent; ArgPhraseType, Adjacency: NP, Adjacent; PositionInClause (Begin, Inside, End): Begin; RelPosition (Distance between Spans): 0; RelPosition, Directionality: 0, Left; RelPosition, Transitive: 0, True; RelPosition, PredHeadPos, PassiveContext: 0, VBD, False; RelPosition, ArgPhraseType: 0, NP; RelPosition, PredPivotv: 0, v_NNS_VBG_NN [Tree fragment as on slide 65]

76 Intervene Features: Pivot Consider the predicate and candidate argument as “pivots”; other constituents are defined in relation to them: np_v_NP_NP_PP, CUR_v_NP_NP_PP, CUR_gave_NP_NP_PP [Tree fragment as on slide 65, with a following PP]

77 Other Features PredHeadWord, ArgHeadWord PredHeadWord, ArgPhraseType ArgPreposition, Transitive Frequency of VP, NP, SBAR, CC, “,”, “:”, “”” in the sentence …

79 Hmm. Haven’t I heard that word “frame” before? Yes, it’s intended to be seen as a variation on the word as it’s been used in various branches of the cognitive sciences in recent decades.

80 “Frames” Traditions Let’s locate “our” notion of frame within the various traditions in the cognitive sciences that use the words frame or schema (joined, sometimes, with stereotype, script, scenario or idealized cognitive model) which appear to be dealing with essentially the same concept. These terms are used to denote structured sets of expectations that play a central role in how human beings create or interpret their experiences.

81 customer The noun customer is typically defined as ‘someone who buys something in a shop or business.’ That includes everyone I know over the age of 5. Suppose you overhear somebody say Sue tends to be rude to customers. What situation do you imagine?

82 chicken (mass noun) The noun chicken, as a count noun, is the name of a well-known domestic bird. As a mass noun it is defined as ‘the meat of a chicken or chickens’. What’s wrong with the following sentence? The fox that lives near our farm likes chicken. (compare: likes chickens) The image you might get is of a fox eating fried chicken, holding a knife and a fork, and a napkin, in its paws.

83 The products of the lexical construction that yields mass noun uses of chicken, lamb, duck, turkey, etc., refer to meats prepared as part of human cuisine. *The wolf that lives near our ranch prefers lamb.

84 Invoking and Evoking Frames People invoke (summon up from their memory) frames, to make sense of their experience, linguistic or otherwise. a cognitive act Words evoke categories and knowledge structures that shape interpreters’ understanding of a text. a cognitive experience Warning: this is not a standard use of these words.

85 So, we need to describe words in terms of the “framal” background. If we don’t understand the frame, we don’t understand the word, or why the language needs this word, or why the speaker chose to use it.

86 The ideal dictionary should let you 1. Look up a word 2. Get a link to a description of the relevant frame, for each of its meanings, and see the names of the frame’s components 3. See a display of its combinatory affordances, its valence possibilities, both semantic and syntactic 4. Find a collection of example sentences illustrating all of its main combinatory patterns 5. Find a list of other words that evoke the same frame 6. Link to other semantically related frames

87 Frame examples: Risk Taking_a_risk: Protagonist, Action, Harm, Asset 1. I’m going to risk a swim in the sea. 2. You’ll risk bankruptcy if you make that investment. 3. You’re risking your reputation by doing that. 4. You’re taking a big risk. Being_at_risk: Protagonist, Harm, Asset 1. Buildings in California risk destruction by earthquakes. 2. Newborns in this hospital run the risk of hypothermia. 3. We risk our lives every day. 4. I am at risk of stroke.

88 Frame examples: Explanation Communication.explanation: Speaker, Addressee, Mystery, Account 1. The coach explained the problem to the team. 2. The coach explained that they hadn’t learned the maneuvers. 3. What’s your explanation of these facts? 4. The defense lawyer gave an inadequate explanation. Cognition.explanation: Mystery, Account 1. What can explain these facts? 2. A history of unrestricted logging explains the erosion pattern we see here.

89 Compare: explain & reveal In their cognitive senses, as opposed to their meanings as verbs of speaking, the verbs explain and reveal are near-inverses of each other, where the Mystery and the Account in the former correspond to Evidence and Conclusion in the latter. 1. A history of unrestrained logging explains the erosion pattern we see here. (explains, accounts for) 2. The erosion pattern we see here reveals a history of unrestrained logging. (reveals, shows, suggests)

90 Reading the Pictures [Figure: state - transition - state - transition - state] The boxes refer to five-part scenarios consisting of an initial state, a transition, an intermediary state, another transition, and a final state. The writing under the pictures abbreviates particular role names and gives verbs that evoke instances of the scenario. The bold borders indicate a profiling of some portion of the event.

91 X; return, go back, come back He returned to Hong Kong. He returned Tuesday evening after a week’s trip to Australia. He returned to his home for a few days. The verb RETURN profiles the time of arrival, but it evokes the entire frame; other information in the sentences can fill in some of the details of the larger scenario.

92 A, X; return, replace, put back I returned your books this morning. I returned to your desk the books that I had borrowed last week. After the earthquake we replaced all the books on the shelf.

93 frame elements Participants and Sub-events. Avenger: the one who enacts revenge; Offender: the original offender; Injured_party: the offender’s victim; Injury: the offender’s act; Punishment: the avenger’s act

94 grammar Components of linguistic form for expressing the FEs (defining valence). Subject Direct Object Prepositional marking (by, for, with, on, at, against) Subordinate clause marking (for DOING, by DOING)

95 Outline Dependency Parsing Formal definition Dynamic programming Supervised Classification Semantic Role Labeling Propbank Automatic SRL FrameNet Semantic Parsing

96 Learning Semantic Parsers [Diagram: Training sentences & meaning representations → Semantic Parser Learner → Semantic Parser; new sentences → Semantic Parser → meaning representations]

97 Data Collection for ATIS Air travel planning scenarios (Hirschman, 1992): You have 3 days for job hunting, and you have arranged job interviews in 2 different cities! Start from City-A and plan the flight and ground transportation itinerary to City-B and City-C, and back home to City-A. Use of human wizards: Subjects were led to believe they were talking to a fully automated system Human transcription and error correction behind the scene A group at SRI was responsible for database reference answers Collected more than 10,000 utterances and 1,000 sessions for ATIS-3

98 Sample Session in ATIS may i see all the flights from cleveland to, dallas can you show me the flights that leave before noon, only could you sh- please show me the types of aircraft used on these flights Air-Transportation Show: (Aircraft) Origin: (City "Cleveland") Destination: (City "Dallas") Departure-Time: (< 1200)

99 Chanel (Kuhn & De Mori, 1995) Consists of a set of decision trees Each tree builds part of a meaning representation Some trees decide whether a given attribute should be displayed in query results Some trees decide the semantic role of a given substring Each corresponds to a query constraint

100 Chanel (Kuhn & De Mori, 1995) show me TIME flights from CITY1 to CITY2 and how much they cost Tree 1: Display aircraft_code? Tree 23: Display fare_id? Tree 114: Display booking_class? CITY tree: For each CITY: origin, dest, or stop? TIME tree: For each TIME: arrival or departure? Display Attributes: {flight_id, fare_id} Constraints: {from_airport = CITY1, to_airport = CITY2, departure_time = TIME}

101 Statistical Parsing (Miller et al., 1996) Find the most likely meaning M_0, given words W and history H (M': pre-discourse meaning, T: parse tree) Three successive stages: parsing, semantic interpretation, and discourse Parsing model similar to Seneff (1992) Requires annotated parse trees for training

102 Statistical Parsing (Miller et al., 1996) [Figure: semantically annotated parse tree for “When do the flights that leave from Boston arrive in Atlanta”, with node labels such as time/wh-head, flight/np-head, departure/vp-head, departure/prep, city/npr, arrival/vp-head, location/prep, departure/pp, location/pp, flight/corenp, departure/vp, flight-constraints/rel-clause, flight/np, arrival/vp, /wh-question]

103 Recent Approaches Different levels of supervision Ranging from fully supervised to unsupervised Advances in machine learning Structured learning Kernel methods Grammar formalisms Combinatory categorial grammars Synchronous grammars Unified framework for handling various phenomena Spontaneous speech Discourse Perceptual context Generation

104 Combinatory Categorial Grammar Highly structured lexical entries A few general parsing rules (Steedman, 2000; Steedman & Baldridge, 2005) Each lexical entry is a word paired with a category Texas := NP borders := (S \ NP) / NP Mexico := NP New Mexico := NP

105 Parsing Rules (Combinators) Describe how adjacent categories are combined Functional application: A/B B ⇒ A (>); B A\B ⇒ A (<) Example: Texas borders New Mexico with categories NP, (S\NP)/NP, NP; “borders New Mexico” combines to S\NP by (>), and “Texas” then combines with it to S by (<)

106 CCG for Semantic Parsing Extend categories with semantic types: Texas := NP : texas borders := (S\NP)/NP : λx.λy.borders(y, x) Functional application with semantics: A/B : f B : a ⇒ A : f(a) (>); B : a A\B : f ⇒ A : f(a) (<)

107 Sample CCG Derivation Texas := NP : texas; borders := (S\NP)/NP : λx.λy.borders(y, x); New Mexico := NP : new_mexico. Applying (>) gives S\NP : λy.borders(y, new_mexico) for “borders New Mexico”; applying (<) gives S : borders(texas, new_mexico) for the whole sentence

108 Another Sample CCG Derivation Texas := NP : texas; borders := (S\NP)/NP : λx.λy.borders(y, x); New Mexico := NP : mexico. Applying (>) gives S\NP : λy.borders(y, mexico) for “borders New Mexico”; applying (<) gives S : borders(texas, mexico) for the whole sentence
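
The first of the two derivations above can be reproduced with a few lines of Python: represent each lexical semantics as a lambda and treat the (>) and (<) combinators as plain function application. This is only an illustration of how the semantic composition works, not a CCG parser.

# Semantic composition for "Texas borders New Mexico" (illustrative sketch;
# categories are omitted, only the lambda-calculus side is shown).

texas = "texas"
new_mexico = "new_mexico"
borders = lambda x: (lambda y: ("borders", y, x))    # λx.λy.borders(y, x)

# (>) forward application: borders + New Mexico  =>  S\NP semantics
vp = borders(new_mexico)                             # λy.borders(y, new_mexico)

# (<) backward application: Texas + S\NP  =>  S semantics
sentence_meaning = vp(texas)
print(sentence_meaning)                              # ('borders', 'texas', 'new_mexico')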

109 Probabilistic CCG for Semantic Parsing (Zettlemoyer & Collins, 2005) L (lexicon) = { Texas := NP : texas, borders := (S\NP)/NP : λx.λy.borders(y, x), Mexico := NP : mexico, New Mexico := NP : new_mexico }, w (feature weights) Features f_i(x, d): number of times lexical item i is used in derivation d Log-linear model: P_w(d | x) ∝ exp(w · f(x, d)) Best derivation: d* = argmax_d w · f(x, d) Consider all possible derivations d for the sentence x given the lexicon L
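
A sketch of the log-linear scoring itself, with made-up lexical-item features and weights: P_w(d | x) is a softmax over w · f(x, d), and the best derivation is the one with the highest dot product.

import math

# Log-linear scoring over candidate derivations (illustrative, made-up numbers).

def dot(w, f):
    return sum(w.get(name, 0.0) * count for name, count in f.items())

def derivation_probs(w, derivations):
    # derivations: {derivation id: feature-count dict} for one sentence x
    exp_scores = {d: math.exp(dot(w, f)) for d, f in derivations.items()}
    z = sum(exp_scores.values())
    return {d: s / z for d, s in exp_scores.items()}           # P_w(d | x)

w = {"New Mexico := NP : new_mexico": 1.5, "Mexico := NP : mexico": 0.2}
derivations = {"d_new_mexico": {"New Mexico := NP : new_mexico": 1},
               "d_mexico": {"Mexico := NP : mexico": 1}}
probs = derivation_probs(w, derivations)
best = max(probs, key=probs.get)                 # d* = argmax_d w . f(x, d)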

110 Learning Probabilistic CCG [Diagram: Training sentences & logical forms → Lexical Generation → Lexicon L; Parameter Estimation → feature weights w; sentences → CCG Parser (using L and w) → logical forms]

111 Lexical Generation Input: Texas borders New Mexico, borders(texas, new_mexico) Output lexicon: Texas := NP : texas; borders := (S\NP)/NP : λx.λy.borders(y, x); New Mexico := NP : new_mexico

112 Lexical Generation Input sentence: Texas borders New Mexico → output substrings: Texas, borders, New, Mexico, Texas borders, borders New, New Mexico, Texas borders New, … Input logical form: borders(texas, new_mexico) → output categories: NP : texas; NP : new_mexico; (S\NP)/NP : λx.λy.borders(y, x); (S\NP)/NP : λx.λy.borders(x, y); … The candidate lexicon is the cross product of the substrings and the categories

113 Category Rules Input trigger → output category: constant c → NP : c; arity-one predicate p → N : λx.p(x); arity-one predicate p → S\NP : λx.p(x); arity-two predicate p → (S\NP)/NP : λx.λy.p(y, x); arity-two predicate p → (S\NP)/NP : λx.λy.p(x, y); arity-one predicate p → N/N : λg.λx.p(x) ∧ g(x); arity-two predicate p and constant c → N/N : λg.λx.p(x, c) ∧ g(x); arity-two predicate p → (N\N)/NP : λx.λg.λy.p(y, x) ∧ g(x); arity-one function f → NP/N : λg.argmax/min(g(x), λx.f(x)); arity-one function f → S/NP : λx.f(x)
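
Lexical generation can then be sketched as the cross product of the sentence's substrings with the categories triggered by the logical form, as in the two preceding slides. The trigger extraction below is a hard-coded toy for borders(texas, new_mexico), not an implementation of the full rule table; parameter estimation later prunes this large candidate lexicon.

# Toy GENLEX-style lexical generation (illustrative sketch): pair every
# substring of the sentence with every category triggered by the logical form.

def substrings(words):
    n = len(words)
    return [" ".join(words[i:j]) for i in range(n) for j in range(i + 1, n + 1)]

def triggered_categories(logical_form):
    # Hard-coded triggers for borders(texas, new_mexico); a real system would
    # walk the logical form and apply the full table of category rules above.
    return ["NP : texas",
            "NP : new_mexico",
            "(S\\NP)/NP : λx.λy.borders(y, x)",
            "(S\\NP)/NP : λx.λy.borders(x, y)"]

sentence = "Texas borders New Mexico".split()
candidate_lexicon = [(span, cat)
                     for span in substrings(sentence)
                     for cat in triggered_categories("borders(texas, new_mexico)")]
# 10 substrings x 4 categories = 40 candidate lexical entries for this sentence.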

114 Parameter Estimation Maximum conditional likelihood Derivations d are not annotated, treated as hidden variables Stochastic gradient ascent (LeCun et al., 1998) Keep only those lexical items that occur in the highest scoring derivations of training set
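
For a log-linear model with hidden derivations, the gradient of the conditional log-likelihood has a standard form: expected feature counts over derivations that yield the correct logical form, minus expected counts over all derivations. A minimal sketch of one ascent step, with the expectations computed by brute-force enumeration (a real parser would compute them with dynamic programming):

import math

# One stochastic-gradient-ascent step for a log-linear model whose derivations
# are hidden variables (illustrative sketch, brute-force expectations).

def expected_counts(w, derivations):
    # derivations: list of feature-count dicts for one training sentence
    weights = [math.exp(sum(w.get(k, 0.0) * v for k, v in f.items()))
               for f in derivations]
    z = sum(weights)
    expectation = {}
    for f, wt in zip(derivations, weights):
        for k, v in f.items():
            expectation[k] = expectation.get(k, 0.0) + v * wt / z
    return expectation

def sgd_step(w, all_derivations, correct_derivations, learning_rate=0.1):
    # gradient_k = E[f_k | derivations of correct LF] - E[f_k | all derivations]
    good = expected_counts(w, correct_derivations)
    every = expected_counts(w, all_derivations)
    for k in set(good) | set(every):
        w[k] = w.get(k, 0.0) + learning_rate * (good.get(k, 0.0) - every.get(k, 0.0))
    return w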

115 Results Test for correct logical forms Precision: # correct / total # parsed sentences Recall: # correct / total # sentences For Geoquery, 96% precision, 79% recall Low recall due to incomplete lexical generation: Through which states does the Mississippi run?
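
For illustration only (hypothetical counts chosen to match the reported figures): if the test set contains 280 sentences, the parser produces logical forms for 230 of them, and 221 of those are correct, then precision = 221 / 230 ≈ 96% and recall = 221 / 280 ≈ 79%.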

116 Context-Free Semantic Grammar QUERY → What is CITY CITY → the capital CITY CITY → of STATE STATE → Ohio [Parse tree for “What is the capital of Ohio”]

117 Sample SCFG Production QUERY → What is CITY / answer(CITY) (natural language / formal language)

118 Sample SCFG Derivation QUERY

119 Sample SCFG Derivation Applying QUERY → What is CITY / answer(CITY) rewrites QUERY into the paired partial derivations “What is CITY” and answer(CITY)

120 Sample SCFG Derivation The full derivation pairs What is the capital of Ohio with answer(capital(loc_2(stateid('ohio')))): QUERY → What is CITY / answer(CITY); CITY → the capital CITY / capital(CITY); CITY → of STATE / loc_2(STATE); STATE → Ohio / stateid('ohio')
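
The lockstep expansion of the NL and MR sides can be sketched directly; the four productions below are the ones from this slide, and the simple string-replacement driver is just an illustration of synchronous rewriting, not a parser.

# Synchronous expansion of the sample SCFG derivation (illustrative sketch).
# Each step rewrites one non-terminal on the NL side and the MR side in lockstep.

PRODUCTIONS = {
    "what_is":     ("QUERY", "What is CITY",     "answer(CITY)"),
    "the_capital": ("CITY",  "the capital CITY", "capital(CITY)"),
    "of_state":    ("CITY",  "of STATE",         "loc_2(STATE)"),
    "ohio":        ("STATE", "Ohio",             "stateid('ohio')"),
}

def derive(production_names):
    nl, mr = "QUERY", "QUERY"
    for name in production_names:
        lhs, rhs_nl, rhs_mr = PRODUCTIONS[name]
        nl = nl.replace(lhs, rhs_nl, 1)          # rewrite the NL side ...
        mr = mr.replace(lhs, rhs_mr, 1)          # ... and the MR side together
    return nl, mr

print(derive(["what_is", "the_capital", "of_state", "ohio"]))
# ('What is the capital of Ohio', "answer(capital(loc_2(stateid('ohio'))))")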

121 Another Sample SCFG Derivation The same sentence What is the capital of Ohio can also be derived as answer(capital(loc_2(riverid('ohio')))), using CITY → of RIVER / loc_2(RIVER) and RIVER → Ohio / riverid('ohio')

122 Probabilistic SCFG for Semantic Parsing S (start symbol) = QUERY L (lexicon) = { QUERY → What is CITY / answer(CITY), CITY → the capital CITY / capital(CITY), CITY → of STATE / loc_2(STATE), STATE → Ohio / stateid('ohio') }, w (feature weights) Features f_i(x, d): number of times production i is used in derivation d Log-linear model: P_w(d | x) ∝ exp(w · f(x, d)) Best derivation: d* = argmax_d w · f(x, d)

123 Learning Probabilistic SCFG [Diagram: Training sentences & meaning representations, plus an unambiguous CFG for the meaning representations → Lexical Acquisition → Lexicon L; Parameter Estimation → feature weights w; sentences → SCFG Parser (using L and w) → meaning representations]

124 Lexical Acquisition SCFG productions are extracted from word alignments between training sentences and their meaning representations, e.g. The goalie should always stay in our half aligned with ( ( true ) ( do our { 1 } ( pos ( half our ) ) ) )

125 Extracting SCFG Productions The goalie should always stay in our half RULE → (CONDITION DIRECTIVE) CONDITION → (true) DIRECTIVE → (do TEAM {UNUM} ACTION) TEAM → our UNUM → 1 ACTION → (pos REGION) REGION → (half TEAM) TEAM → our

126 Extracting SCFG Productions The goalie should always stay in TEAM half RULE → (CONDITION DIRECTIVE) CONDITION → (true) DIRECTIVE → (do TEAM {UNUM} ACTION) TEAM → our UNUM → 1 ACTION → (pos REGION) REGION → (half TEAM) Extracted: TEAM → our / our

127 Extracting SCFG Productions The goalie should always stay in TEAM half RULE → (CONDITION DIRECTIVE) CONDITION → (true) DIRECTIVE → (do TEAM {UNUM} ACTION) TEAM → our UNUM → 1 ACTION → (pos REGION) REGION → (half TEAM)

128 Extracting SCFG Productions The goalie should always stay in REGION RULE → (CONDITION DIRECTIVE) CONDITION → (true) DIRECTIVE → (do TEAM {UNUM} ACTION) TEAM → our UNUM → 1 ACTION → (pos REGION) Extracted: REGION → TEAM half / (half TEAM)

129 Output SCFG Productions TEAM → our / our REGION → TEAM half / (half TEAM) ACTION → stay in REGION / (pos REGION) UNUM → goalie / 1 RULE → [the] UNUM should always ACTION / ((true) (do our {UNUM} ACTION)) Phrases can be non-contiguous

130 Handling Logical Forms with Variables Wong & Mooney (2007b) FORM → state / λx.state(x) FORM → by area / λx.λy.area(x, y) FORM → [the] smallest FORM FORM / λx.smallest(y, (FORM(x), FORM(x, y))) QUERY → what is FORM / answer(x, FORM(x)) Operators for variable binding

131 Unambiguous Supervision for Learning Semantic Parsers The training data for semantic parsing consists of hundreds of natural language sentences unambiguously paired with their meaning representations

132 Unambiguous Supervision for Learning Semantic Parsers The training data for semantic parsing consists of hundreds of natural language sentences unambiguously paired with their meaning representations Which rivers run through the states bordering Texas? answer(traverse(next_to(stateid(‘texas’)))) What is the lowest point of the state with the largest area? answer(lowest(place(loc(largest_one(area(state(all))))))) What is the largest city in states that border California? answer(largest(city(loc(next_to(stateid( 'california')))))) ……

133 Shortcomings of Unambiguous Supervision It requires considerable human effort to annotate each sentence with its correct meaning representation Does not model the type of supervision children receive when they are learning a language Children are not taught meanings of individual sentences They learn to identify the correct meaning of a sentence from several meanings possible in their perceptual context

134 ??? “Mary is on the phone”

135 Ambiguous Supervision for Learning Semantic Parsers A computer system simultaneously exposed to perceptual contexts and natural language utterances should be able to learn the underlying language semantics We consider ambiguous training data of sentences associated with multiple potential meaning representations Siskind (1996) uses this type of “referentially uncertain” training data to learn meanings of words We use ambiguous training data to learn meanings of sentences Capturing meaning representations from perceptual contexts is a difficult unsolved problem Our system directly works with symbolic meaning representations

136 “Mary is on the phone” ???

137 “Mary is on the phone” ???

138 Ironing(Mommy, Shirt) “Mary is on the phone” ???

139 Ironing(Mommy, Shirt) Working(Sister, Computer) “Mary is on the phone” ???

140 Ironing(Mommy, Shirt) Working(Sister, Computer) Carrying(Daddy, Bag) “Mary is on the phone” ???

141 Ironing(Mommy, Shirt) Working(Sister, Computer) Carrying(Daddy, Bag) Talking(Mary, Phone) Sitting(Mary, Chair) “Mary is on the phone” Ambiguous Training Example ???

142 Ironing(Mommy, Shirt) Working(Sister, Computer) Talking(Mary, Phone) Sitting(Mary, Chair) “Mommy is ironing shirt” Next Ambiguous Training Example ???

143 Sample Ambiguous Corpus Daisy gave the clock to the mouse. Mommy saw that Mary gave the hammer to the dog. The dog broke the box. John gave the bag to the mouse. The dog threw the ball. ate(mouse, orange) gave(daisy, clock, mouse) ate(dog, apple) saw(mother, gave(mary, dog, hammer)) broke(dog, box) gave(woman, toy, mouse) gave(john, bag, mouse) threw(dog, ball) runs(dog) saw(john, walks(man, dog))

144 Parsing in Watson

145 Watson English Slot Grammar (ESG) parser Deep parser which explores the syntactic and logical structure to generate semantic clues Fig: Slot filling for John sold a fish Fig: Slot Grammar analysis structure for John sold a fish (subj, obj, ndet) Slots | WS(arg) | features: Subj(n) | John(1) | noun pron; Top | sold(2,1,4) | verb; Ndet | a(3) | det indef; Obj(n) | fish(4) | noun

