
Published by Dylon Grim, modified over 3 years ago

2
Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

3
Previous work
- Probabilistic Context-Free Grammars
- Supervised induction methods
- Little work on raw data
- Mostly work on artificial CFGs
- Clustering

4
Our goal
Given a corpus of raw text separated into sentences, we want to derive a specification of the underlying grammar. This means we want to be able to:
- create new, unseen, grammatically correct sentences;
- accept new, unseen, grammatically correct sentences and reject ungrammatical ones.

5
What do we need to do?
G is given by the rewrite rules:
S → NP VP
NP → the N | a N
N → man | boy | dog
VP → V NP
V → saw | heard | sensed | sniffed
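The following is a minimal Python sketch of G. It assumes a dict-of-alternatives encoding, and the names `GRAMMAR`, `generate` and `language` are ours, for illustration only. The sketch produces random sentences and, because G is finite, enumerates its whole language. That language is the set a learner should eventually be able to create and accept.

```python
import itertools
import random

# The toy grammar G: each nonterminal maps to its list of alternatives.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"], ["a", "N"]],
    "N": [["man"], ["boy"], ["dog"]],
    "VP": [["V", "NP"]],
    "V": [["saw"], ["heard"], ["sensed"], ["sniffed"]],
}

def generate(symbol="S", rng=random):
    """Expand `symbol`, choosing a random alternative for every nonterminal."""
    if symbol not in GRAMMAR:  # terminal word
        return [symbol]
    alternative = rng.choice(GRAMMAR[symbol])
    return [word for part in alternative for word in generate(part, rng)]

def language(symbol="S"):
    """Enumerate every word sequence derivable from `symbol` (G has no recursion)."""
    if symbol not in GRAMMAR:
        return [[symbol]]
    sentences = []
    for alternative in GRAMMAR[symbol]:
        for parts in itertools.product(*(language(p) for p in alternative)):
            sentences.append([word for part in parts for word in part])
    return sentences

SENTENCES = {" ".join(s) for s in language()}

if __name__ == "__main__":
    print(" ".join(generate()))
    print(len(SENTENCES))                      # 144
    print("a man heard the dog" in SENTENCES)  # True
```

G licenses 2 × 3 = 6 noun phrases, so it yields 6 × 4 × 6 = 144 sentences. A learner that sees only some of them should still accept the rest.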

6
ADIOS in outline
Composed of three main elements:
- A representational data structure
- A segmentation criterion (MEX)
- A generalization ability
We will consider each of these in turn.

7
The model: a graph representation with words as vertices and sentences as paths.
[Figure: the sentences "Is that a cat?", "Where is the dog?", "And is that a horse?" and "Is that a dog?" drawn as paths (numbered 101 to 104) running from a BEGIN vertex to an END vertex through shared word vertices; labels point out a node and an edge.]
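As an illustration, here is a sketch of loading a corpus into such a graph. It assumes whitespace-tokenized sentences with punctuation already split off. The vertices are lowercased words plus `BEGIN`/`END` marker vertices. Each edge keeps the list of paths that traverse it, so sentences that share a word pair share an edge, and the same edge can carry many paths. `build_graph` is a hypothetical helper, not the ADIOS code.

```python
from collections import defaultdict

BEGIN, END = "BEGIN", "END"

def build_graph(sentences):
    """Load sentences as paths through a graph whose vertices are words.

    Returns (paths, edges): paths[k] is the vertex sequence of sentence k,
    and edges maps (u, v) to the indices of the paths that traverse u -> v.
    """
    paths = []
    edges = defaultdict(list)
    for k, sentence in enumerate(sentences):
        path = [BEGIN] + sentence.lower().split() + [END]
        paths.append(path)
        for u, v in zip(path, path[1:]):
            edges[(u, v)].append(k)  # one entry per traversing path
    return paths, dict(edges)

if __name__ == "__main__":
    corpus = ["Is that a cat ?", "Where is the dog ?",
              "And is that a horse ?", "Is that a dog ?"]
    paths, edges = build_graph(corpus)
    print(len({w for p in paths for w in p}))  # 12 vertices
    print(edges[("is", "that")])               # [0, 2, 3]
```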

8
ADIOS in outline
Composed of three main elements:
- A representational data structure
- A segmentation criterion (MEX)
- A generalization ability

9
Toy problem: Alice in Wonderland
The text is presented letter by letter, lowercased, with spaces and punctuation removed:
a l i c e w a s b e g i n n i n g t o g e t v e r y t i r e d o f s i t t i n g b y h e r s i s t e r o n t h e b a n k a n d o f h a v i n g n o t h i n g t o d o o n c e o r t w i c e s h e h a d p e e p e d i n t o t h e b o o k h e r s i s t e r w a s r e a d i n g b u t i t h a d n o p i c t u r e s o r c o n v e r s a t i o n s i n i t a n d w h a t i s t h e u s e o f a b o o k t h o u g h t a l i c e w i t h o u t p i c t u r e s o r c o n v e r s a t i o n

10
Detecting significant patterns
- Identifying patterns becomes easier on a graph.
- Sub-paths are automatically aligned.

11
Motif EXtraction

12
The Markov matrix
- The top-right triangle holds the $P_L$ probabilities; the bottom-left triangle holds the $P_R$ probabilities.
- The matrix is path-dependent: it is computed along a particular search path.
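Below is a sketch of computing such a matrix for one search path. It follows the same idea as the r/m estimate in the "Pattern significance" slide: a rightward probability is the count of a sub-path extended by one node, divided by the count of the sub-path itself. The exact indexing is our assumption, as is the choice of relative word frequency for the diagonal. The slide only says which triangle holds which probability. Counts here are occurrences of contiguous sub-paths across all paths.

```python
def occurrences(paths, seq):
    """Number of times the contiguous node sequence `seq` appears across all paths."""
    n, target = len(seq), list(seq)
    return sum(
        1
        for path in paths
        for start in range(len(path) - n + 1)
        if path[start:start + n] == target
    )

def ratio(num, den):
    return num / den if den else 0.0

def markov_matrix(paths, search_path):
    """Path-dependent matrix for search_path = e_0 .. e_{n-1} (hypothetical layout).

    Upper triangle (i < j): P_L, probability of extending e_{i+1}..e_j leftward to e_i.
    Lower triangle (i > j): P_R, probability of extending e_j..e_{i-1} rightward to e_i.
    Diagonal: relative frequency of e_i among all nodes on all paths.
    """
    n = len(search_path)
    total = sum(len(p) for p in paths)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                M[i][j] = occurrences(paths, search_path[i:i + 1]) / total
            elif i < j:
                M[i][j] = ratio(occurrences(paths, search_path[i:j + 1]),
                                occurrences(paths, search_path[i + 1:j + 1]))
            else:
                M[i][j] = ratio(occurrences(paths, search_path[j:i + 1]),
                                occurrences(paths, search_path[j:i]))
    return M
```

On the four example sentences of the graph slide ("is that a cat", "where is the dog", "and is that a horse", "is that a dog"), the column for "is" along the search path BEGIN is that a dog ? END reads 0.75, 1.0, then about 0.33. The drop after "is that a" is the kind of decrease MEX looks for when deciding where a pattern ends.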

14
Example of a probability matrix

15
Rewiring the graph
Once a pattern is identified as significant, the sub-paths it subsumes are merged into a new vertex and the graph is rewired accordingly. Repeating this process leads to the formation of complex, hierarchically structured patterns.
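A simplified sketch of the rewiring step follows. For brevity it rewires every occurrence of the pattern on every path. ADIOS itself decides which paths to rewire, which this sketch does not model. `rewire` and the node names `P1`/`P2` are hypothetical. Applying it repeatedly, with patterns that contain earlier pattern vertices, gives the hierarchy the slide describes.

```python
def rewire(paths, pattern, new_node):
    """Return new paths in which each occurrence of `pattern` is a single vertex."""
    pattern = list(pattern)
    n = len(pattern)
    rewired = []
    for path in paths:
        out, i = [], 0
        while i < len(path):
            if path[i:i + n] == pattern:
                out.append(new_node)  # the merged vertex stands for the whole sub-path
                i += n
            else:
                out.append(path[i])
                i += 1
        rewired.append(out)
    return rewired

if __name__ == "__main__":
    paths = [["BEGIN", "is", "that", "a", "cat", "?", "END"],
             ["BEGIN", "and", "is", "that", "a", "horse", "?", "END"]]
    p1 = rewire(paths, ["is", "that", "a"], "P1")
    p2 = rewire(p1, ["BEGIN", "P1"], "P2")  # P2 now contains P1: a hierarchy
    print(p2)
```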

16
MEX at work

17
ALICE motifs

Motif     Weight  Occurrences  Length
curious   1.00    19           6
hadbeen   1.00    17           6
however   1.00    20           6
perhaps   1.00    16           6
hastily   1.00    16           6
herself   1.00    78           6
footman   1.00    14           6
suppose   1.00    12           6
silence   0.99    14           6
witness   0.99    10           6
gryphon   0.97    54           6
serpent   0.97    11           6
angrily   0.97    8            6
croquet   0.97    8            6
venture   0.95    12           6
forsome   0.95    12           6
timidly   0.95    9            6
whisper   0.95    9            6
rabbit    1.00    27           5
course    1.00    25           5
eplied    1.00    22           5
seemed    1.00    26           5
remark    1.00    28           5

18
ADIOS in outline
Composed of three main elements:
- A representational data structure
- A segmentation criterion (MEX)
- A generalization ability

19
Generalization: defining an equivalence class
- show me flights from philadelphia to san francisco on wednesdays
- list all flights from boston to san francisco with the maximum number of stops
- show flights from dallas to san francisco
- may i see the flights from denver to san francisco please
Generalized search path: show me flights from {boston, philadelphia, denver, dallas} to san francisco on wednesdays

20
Generalization
Generalized search path: show me flights from {boston, philadelphia, denver, dallas} to san francisco on wednesdays
- i need to fly from boston to baltimore please give me…
- which airlines fly from dallas to denver
- please give me a flight from philadelphia to atlanta before ten a m in the morning
- list all flights going from boston to atlanta on wednesday…
P1: from _E1 to
_E1 = {boston, philadelphia, denver, dallas}

21
Context-sensitive generalization
- Slide a context window of size L across the current search path.
- For each $1 \le i \le L$:
  - look at all paths that are identical with the search path for $1 \le k \le L$, except for $k = i$;
  - define an equivalence class containing the nodes at index i of these paths;
  - replace the i-th node with the equivalence class;
  - find significant patterns using the MEX criterion.
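The following sketch covers the equivalence-class step for one window position and one slot. It uses 0-based offsets where the slide uses $1 \le i \le L$. The MEX step that follows is not included. The helper names are hypothetical.

```python
def equivalence_class(paths, search_path, start, L, i):
    """Nodes that can fill slot i (0-based) of the window search_path[start:start+L].

    A sub-path of length L contributes its node at offset i if it agrees with
    the window at every other offset k != i.
    """
    window = search_path[start:start + L]
    members = set()
    for path in paths:
        for s in range(len(path) - L + 1):
            candidate = path[s:s + L]
            if all(candidate[k] == window[k] for k in range(L) if k != i):
                members.add(candidate[i])
    return members

def generalized_search_path(search_path, start, i, members):
    """Copy of the search path with slot start+i replaced by the equivalence class."""
    generalized = list(search_path)
    generalized[start + i] = frozenset(members)
    return generalized

if __name__ == "__main__":
    corpus = [
        "show me flights from philadelphia to san francisco on wednesdays",
        "list all flights from boston to san francisco with the maximum number of stops",
        "show flights from dallas to san francisco",
        "may i see the flights from denver to san francisco please",
    ]
    paths = [s.split() for s in corpus]
    # Window "from philadelphia to san" (start 3, L = 4), generalizing slot 1.
    ec = equivalence_class(paths, paths[0], 3, 4, 1)
    print(sorted(ec))  # ['boston', 'dallas', 'denver', 'philadelphia']
```

On the ATIS sentences from the slides, this recovers the class {boston, philadelphia, denver, dallas} that fills the slot in "from _E1 to".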

22
Determining L
Involves a tradeoff:
- A larger L demands more context sensitivity in the inference, which hampers generalization.
- A smaller L detects more patterns, but many of them might be spurious.

23
The effects of context window width

24
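Over-generalization
- john believes that to please is easy
- john thinks that to please is fun
- jack and john believe that to please is hard
Generalized search path: john {believes, thinks, believe} that to please is easy
The class mixes "believes" and "thinks" with "believe", so the path also licenses the ungrammatical "john believe that to please is easy".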

25
Bootstrapping
Search path: what are the cheapest flights from denver to boston that stop in atlanta
A pre-existing equivalence class: {boston, philadelphia, denver, dallas}
Generalized search path I: what are the cheapest flights from {boston, philadelphia, denver, dallas} to {boston, philadelphia, denver, dallas} that stop in atlanta

26
Bootstrapping
Generalized search path I: what are the cheapest flights from {boston, philadelphia, denver, dallas} to {boston, philadelphia, denver, dallas} that stop in atlanta
- what is the cheapest fare from denver to philadelphia and from pittsburgh to atlanta
- i would… like the cheapest airfare from boston to denver december twenty sixth
- show me the cheapest flight from philadelphia to dallas which arrives…

27
Bootstrapping
Generalized search path II: what are the cheapest {flight, flights, airfare, fare} from {boston, philadelphia, denver} to {denver, philadelphia, dallas} that stop in atlanta
_P2: the cheapest _E2 from _E3 to _E4
_E2 = {flight, flights, airfare, fare}
_E3 = {boston, philadelphia, denver}
_E4 = {denver, philadelphia, dallas}

28
Bootstrapping
- Slide a context window of length L along the current search path.
- Consider all sub-paths of length L that begin in $a_1$ and end in $a_L$; these are the candidate paths.
- For each $1 \le i \le L$:
  - for each $1 \le k \le L$, $k \ne i$, replace node k with the equivalence class (EC) that contains node k and maximally overlaps the set of nodes at index k of the candidate paths;
  - continue as before.
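A sketch of one bootstrapping step, under several assumptions of ours that go beyond the slide. Equivalence classes are plain sets. Ties in overlap go to the first class found. The window endpoints $a_1$ and $a_L$ stay fixed, because the slide uses them to select the candidate paths. Offsets are 0-based, and `best_class` and `bootstrap_slot` are hypothetical names.

```python
def best_class(node, observed, classes):
    """Existing class containing `node` with maximal overlap with `observed`; else {node}."""
    containing = [c for c in classes if node in c]
    if not containing:
        return {node}
    return max(containing, key=lambda c: len(c & observed))

def bootstrap_slot(paths, window, classes, i):
    """Generalize slot i of `window` after replacing the other nodes by existing classes."""
    L = len(window)
    subpaths = [p[s:s + L] for p in paths for s in range(len(p) - L + 1)]
    # Candidate paths: sub-paths of length L that begin in a_1 and end in a_L.
    candidates = [sp for sp in subpaths if sp[0] == window[0] and sp[-1] == window[-1]]
    generalized = []
    for k in range(L):
        if k == i or k in (0, L - 1):
            generalized.append({window[k]})  # slot i and the endpoints stay literal
        else:
            observed = {c[k] for c in candidates}
            generalized.append(best_class(window[k], observed, classes))
    # "Continue as before": collect slot-i nodes of sub-paths matching elsewhere.
    members = {sp[i] for sp in subpaths
               if all(sp[k] in generalized[k] for k in range(L) if k != i)}
    return generalized, members
```

On the four ATIS sentences of the bootstrapping slides, take the window "the cheapest flights from denver to" and the class {boston, philadelphia, denver, dallas}. Plain generalization of the "flights" slot finds only {flights, fare}. With the city slot bootstrapped to the class, it finds {flights, fare, airfare, flight}, which is the slide's _E2.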

29
The ADIOS algorithm
- Initialization: load all data into a pseudograph.
- Until no more patterns are found, for each path P:
  - create generalized search paths from P;
  - detect significant patterns using MEX;
  - if found, add the best new pattern and its equivalence classes, and rewire the graph.

30
[Figure: The training process]

31

32
[Figure: The result]

33
Example

34
More Patterns

35
Evaluating performance
- In principle, we would like to compare ADIOS-generated parse trees with the true parse trees for given sentences.
- Alas, the true parse trees are subject to opinion.
- Some approaches don't even presuppose parse trees.

36
Evaluating performance
Define:
- Recall: the probability of ADIOS recognizing an unseen grammatical sentence.
- Precision: the proportion of grammatical ADIOS productions.
Recall can be assessed by leaving out some of the training corpus. Precision is trickier, unless we're learning a known CFG.
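When the target is a known CFG, both measures reduce to membership checks against its language. The sketch below uses the finite toy grammar G from earlier in the deck, written out as its one sentence shape "Det N V Det N". The "learner" is a hypothetical stub that never saw the word "dog". The stub, the sentences and the function names are illustrative assumptions, not ADIOS output.

```python
import itertools

# Known target grammar: the toy CFG G, whose sentences all have the shape Det N V Det N.
DETS, NOUNS = ["the", "a"], ["man", "boy", "dog"]
VERBS = ["saw", "heard", "sensed", "sniffed"]
LANGUAGE = {
    f"{d1} {n1} {v} {d2} {n2}"
    for d1, n1, v, d2, n2 in itertools.product(DETS, NOUNS, VERBS, DETS, NOUNS)
}

def recall(accepts, held_out):
    """Fraction of unseen grammatical sentences that the learner accepts."""
    return sum(1 for s in held_out if accepts(s)) / len(held_out)

def precision(productions, is_grammatical):
    """Fraction of the learner's productions that are grammatical."""
    return sum(1 for s in productions if is_grammatical(s)) / len(productions)

if __name__ == "__main__":
    stub_accepts = lambda s: "dog" not in s and s in LANGUAGE  # hypothetical learner
    held_out = ["the dog saw a man", "a boy heard the dog",
                "the man sniffed a boy", "a man saw the boy"]
    productions = ["the man saw a boy", "a boy heard the man",
                   "boy the saw man a", "the man sniffed"]
    print(recall(stub_accepts, held_out))                   # 0.5
    print(precision(productions, LANGUAGE.__contains__))    # 0.5
```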

37
The ATIS experiments
- ATIS-NL is a 13,043-sentence corpus of natural language: transcribed phone calls to an airline reservation service.
- ADIOS was trained on 12,700 sentences of ATIS-NL.
- The remaining 343 sentences were used to assess recall.
- Precision was determined with the help of 8 graduate students from Cornell University.

38
An ADIOS drawback
- ADIOS is inherently a heuristic and greedy algorithm.
- Once a pattern is created it remains forever, so errors accumulate.
- Sentence ordering affects the outcome: running ADIOS with different orderings gives patterns that cover different parts of the grammar.

39
An ad-hoc solution
- Train multiple learners on the corpus, each on a different sentence ordering, to create a forest of learners.
- To create a new sentence, pick one learner at random and use it to produce the sentence.
- To check the grammaticality of a given sentence: if any learner accepts it, declare it grammatical.
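A sketch of the forest wrapper follows. The real learner is replaced by a hypothetical `PrefixLearner` that keeps only the first `budget` sentences it is shown, which makes its coverage depend on the ordering, as the drawback slide describes. The `accepts`/`generate` interface and all names are assumptions for illustration.

```python
import random

class PrefixLearner:
    """Hypothetical stand-in for one ADIOS learner: it keeps only the first
    `budget` sentences in the order shown, so coverage depends on the order."""

    def __init__(self, ordered_corpus, budget):
        self.known = set(ordered_corpus[:budget])

    def accepts(self, sentence):
        return sentence in self.known

    def generate(self, rng):
        return rng.choice(sorted(self.known))


class Forest:
    """A forest of learners, each trained on a different ordering of one corpus."""

    def __init__(self, learners):
        self.learners = list(learners)

    def generate(self, rng):
        # Pick one learner at random and let it produce the sentence.
        return rng.choice(self.learners).generate(rng)

    def accepts(self, sentence):
        # Declare the sentence grammatical if any learner accepts it.
        return any(learner.accepts(sentence) for learner in self.learners)


def train_forest(corpus, n_learners, make_learner, seed=0):
    rng = random.Random(seed)
    learners = []
    for _ in range(n_learners):
        order = list(corpus)
        rng.shuffle(order)  # a different sentence ordering for each learner
        learners.append(make_learner(order))
    return Forest(learners)


def recall(accepts, held_out):
    return sum(1 for s in held_out if accepts(s)) / len(held_out)
```

The any-learner rule can only raise recall relative to each single member. The price is that the forest also accepts whatever any single member wrongly accepts. Generation, by contrast, inherits the precision of whichever learner is picked.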

40
The ATIS experiments
ADIOS performance scores:
- Recall: 40%
- Precision: 70%
For comparison, ATIS-CFG reached:
- Recall: 45%
- Precision: <1%(!)

41
ADIOS/ATIS-N comparison

42
Sample rating sheet, scale from acceptable to unacceptable (H = Human, C = Computer):
1. i would like a flight from washington to boston flight three twenty four on august twentieth (H)
2. round trip flight from boston to baltimore leaving boston less than a thousand dollars (C)
3. well what offers the ground transportation available in fort worth (C)
4. does continental fly from san francisco to atlanta (C)
5. does american airlines fly from philadelphia to dallas (H)
6. please describe to me the classes of service that are available (H)
7. i'd like to fly from philadelphia to dallas next week (H)
8. which airline offers the most flights from san francisco washington (C)
9. is it possible for me to fly from baltimore to san francisco (H)
10. i want to fly from boston to pittsburgh to san francisco (C)
11. would like to arrange a round trip flight from atlanta to boston to pittsburgh to san francisco tuesday the (C)
12. between eleven and twelve o'clock in the morning (H)
13. what offers the cheapest fare from boston to pittsburgh to atlanta (C)
14. what is the airfare from boston to pittsburgh to atlanta (C)

43
English as Second Language test
- A single instance of ADIOS was trained on the CHILDES corpus: 120,000 sentences of transcribed child-directed speech.
- It was then subjected to the Goteborg multiple-choice ESL test: 100 sentences, each with an open slot; pick the correct word out of three.
- ADIOS answered 60% of the questions correctly, comparable to the performance of an average ninth-grader.

44
ADIOS

45
Meta-analysis of ADIOS results
- Define a pattern spectrum as the histogram of pattern types for an individual learner.
- A pattern type is determined by its contents, e.g. TT, TET, EE, PE…
- One ADIOS learner was trained on each of 6 translations of the Bible.
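Here is a sketch of computing a pattern spectrum. It assumes that T, E and P stand for terminal, equivalence class and pattern, which is our reading of the type labels. It also assumes that each pattern is stored as a list of (kind, label) pairs. Normalizing the histogram is a further assumption. It lets learners trained on corpora of different sizes, such as different Bible translations, be compared directly.

```python
from collections import Counter

def pattern_type(pattern):
    """Type string of a pattern whose elements are (kind, label) pairs, e.g. 'TET'."""
    return "".join(kind for kind, _ in pattern)

def pattern_spectrum(patterns):
    """Normalized histogram of pattern types for one learner."""
    counts = Counter(pattern_type(p) for p in patterns)
    total = sum(counts.values())
    return {ptype: n / total for ptype, n in counts.items()}

if __name__ == "__main__":
    patterns = [
        [("T", "the"), ("E", "_E2"), ("T", "from")],
        [("T", "is"), ("T", "that")],
        [("P", "_P1"), ("E", "_E1")],
        [("T", "of"), ("T", "the")],
    ]
    print(pattern_spectrum(patterns))  # {'TET': 0.25, 'TT': 0.5, 'PE': 0.25}
```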

46
Pattern spectra

47
Language dendrogram

48
In case there's time…

49
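Pattern significance
Say we have found a potential pattern edge after node n of a sub-path running from node 1 to node n. Define:
- m: the number of paths from 1 to n;
- r: the number of paths from 1 to n+1.
Because it is a pattern edge, we know that the continuation probability has dropped below the MEX cutoff: $\frac{r}{m} < \eta \, P_R(1;n)$, where $P_R(1;n)$ is the rightward probability of reaching n from 1 and $\eta$ is the cutoff on the decrease ratio. Let's suppose that the true probability of n+1 given 1 through n is $p$. r/m is our best estimate of $p$, but only an estimate. What are the odds of getting r and m but still having $p \ge \eta \, P_R(1;n)$?

50
Pattern significance
Assume the true probability of n+1 given 1 through n sits exactly at the cutoff, $p = \eta \, P_R(1;n)$. The odds of getting result r out of m or better (that is, r or fewer continuations) are then given by the binomial tail
$$\Pr[X \le r] = \sum_{x=0}^{r} \binom{m}{x} p^{x} (1-p)^{m-x}, \qquad X \sim \mathrm{Binomial}(m, p).$$
If this is smaller than a predetermined α, we say the pattern-edge candidate is significant.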

51
To be continued…
