Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

1

2 Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

3 Previous work
Probabilistic Context Free Grammars
Supervised induction methods
Little work on raw data
  Mostly work on artificial CFGs
Clustering

4 Our goal
Given a corpus of raw text separated into sentences, we want to derive a specification of the underlying grammar
This means we want to be able to
  Create new, unseen, grammatically correct sentences
  Accept new, unseen, grammatically correct sentences and reject ungrammatical ones

5 What do we need to do?
G is given by the rewrite rules:
  S → NP VP
  NP → the N | a N
  N → man | boy | dog
  VP → V NP
  V → saw | heard | sensed | sniffed
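As a small illustration of the two abilities named in the goal (producing and accepting sentences), here is a minimal Python sketch built on the toy grammar G. The function names `generate` and `accepts` are illustrative assumptions, not part of ADIOS, and the acceptance check relies on the fact that every sentence of this particular G has the shape determiner, noun, verb, determiner, noun.

```python
import random

# Rewrite rules of the toy grammar G; each nonterminal maps to its alternatives.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"], ["a", "N"]],
    "N": [["man"], ["boy"], ["dog"]],
    "VP": [["V", "NP"]],
    "V": [["saw"], ["heard"], ["sensed"], ["sniffed"]],
}

def generate(symbol="S", rng=random):
    """Expand a symbol by picking one alternative uniformly at random."""
    if symbol not in GRAMMAR:  # terminal word
        return [symbol]
    words = []
    for part in rng.choice(GRAMMAR[symbol]):
        words.extend(generate(part, rng))
    return words

def accepts(words):
    """Membership test for G: sentences are always Det N V Det N."""
    if len(words) != 5:
        return False
    det1, noun1, verb, det2, noun2 = words
    dets = {"the", "a"}
    nouns = {alt[0] for alt in GRAMMAR["N"]}
    verbs = {alt[0] for alt in GRAMMAR["V"]}
    return (det1 in dets and det2 in dets
            and noun1 in nouns and noun2 in nouns
            and verb in verbs)
```

The induction problem runs in the opposite direction: given only sentences such as those returned by `generate`, recover something equivalent to `GRAMMAR`.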

6 ADIOS in outline
Composed of three main elements
  A representational data structure
  A segmentation criterion (MEX)
  A generalization ability
We will consider each of these in turn

7 The Model: Graph representation with words as vertices and sentences as paths.
Example sentences: "Is that a dog?", "Is that a cat?", "Where is the dog?", "And is that a horse?"
(Figure: each sentence is drawn as a numbered path from the BEGIN node to the END node through its word nodes; the legend marks a node and an edge.)
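A minimal sketch of this representation in Python, assuming whitespace tokenization with "?" treated as its own node. `build_graph` and its return layout are illustrative choices, not the authors' data structure. Parallel edges between the same two words are kept, one per traversal, so each sentence stays recoverable as a path.

```python
from collections import defaultdict

def build_graph(sentences):
    """Return (paths, edges).

    Each sentence becomes a path BEGIN ... END. edges[(u, v)] lists the
    (path_index, position) pairs of every traversal of the edge u -> v.
    """
    paths = []
    edges = defaultdict(list)
    for idx, sentence in enumerate(sentences):
        path = ["BEGIN"] + sentence.lower().split() + ["END"]
        paths.append(path)
        for pos in range(len(path) - 1):
            edges[(path[pos], path[pos + 1])].append((idx, pos))
    return paths, edges

corpus = [
    "is that a dog ?",
    "is that a cat ?",
    "where is the dog ?",
    "and is that a horse ?",
]
paths, edges = build_graph(corpus)
```

Here the edge from "is" to "that" is traversed by three of the four paths, which is the kind of shared sub-path the later steps look for.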

8 ADIOS in outline
Composed of three main elements
  A representational data structure
  A segmentation criterion (MEX)
  A generalization ability

9 Toy problem – Alice in Wonderland a l i c e w a s b e g i n n i n g t o g e t v e r y t i r e d o f s i t t i n g b y h e r s i s t e r o n t h e b a n k a n d o f h a v i n g n o t h i n g t o d o o n c e o r t w i c e s h e h a d p e e p e d i n t o t h e b o o k h e r s i s t e r w a s r e a d i n g b u t i t h a d n o p i c t u r e s o r c o n v e r s a t i o n s i n i t a n d w h a t i s t h e u s e o f a b o o k t h o u g h t a l i c e w i t h o u t p i c t u r e s o r c o n v e r s a t i o n

10 Detecting significant patterns
Identifying patterns becomes easier on a graph
Sub-paths are automatically aligned

11 Motif EXtraction

12 The Markov Matrix
The top right triangle defines the P_L probabilities, the bottom left triangle the P_R probabilities
The matrix is path-dependent
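The transcript does not preserve how P_R and P_L are defined. The sketch below assumes they are conditional probabilities estimated from sub-path counts: P_R extends a sub-path of the search path one node to the right, P_L one node to the left. The function names are hypothetical, and the search path is assumed to be one of the corpus paths so that no count in a denominator is zero.

```python
def count(paths, sub):
    """Number of times the node sequence `sub` occurs across all paths."""
    n = len(sub)
    return sum(
        1
        for path in paths
        for start in range(len(path) - n + 1)
        if path[start:start + n] == sub
    )

def p_right(paths, search, i, j):
    """Estimate P(search[j] | search[i..j-1]), growing the sub-path rightward."""
    if i == j:  # single node: plain relative frequency
        return count(paths, search[i:i + 1]) / sum(len(p) for p in paths)
    return count(paths, search[i:j + 1]) / count(paths, search[i:j])

def p_left(paths, search, i, j):
    """Estimate P(search[i] | search[i+1..j]), growing the sub-path leftward."""
    if i == j:  # single node: plain relative frequency
        return count(paths, search[j:j + 1]) / sum(len(p) for p in paths)
    return count(paths, search[i:j + 1]) / count(paths, search[i + 1:j + 1])
```

Evaluating both functions for every pair i ≤ j along one search path fills the two triangles of the matrix, which is why the matrix depends on the path. A sharp drop in either probability when the sub-path is extended by one node suggests a candidate pattern boundary.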

13

14 Example of a probability matrix

15 Rewiring the graph
Once a pattern is identified as significant, the sub-paths it subsumes are merged into a new vertex and the graph is rewired accordingly.
Repeating this process leads to the formation of complex, hierarchically structured patterns.
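A rough sketch of the rewiring step in Python, assuming paths are stored as lists of node labels. `rewire` and the label "P1" are illustrative names; the sketch replaces every occurrence of a pattern, whereas the actual algorithm may be more selective about which occurrences it merges.

```python
def rewire(paths, pattern, label):
    """Replace each occurrence of `pattern` (a list of nodes) with the node `label`."""
    if not pattern:
        raise ValueError("pattern must contain at least one node")
    n = len(pattern)
    rewired = []
    for path in paths:
        out, pos = [], 0
        while pos < len(path):
            if path[pos:pos + n] == pattern:
                out.append(label)   # the merged vertex stands in for the sub-path
                pos += n
            else:
                out.append(path[pos])
                pos += 1
        rewired.append(out)
    return rewired
```

Because the new label is an ordinary node, a later pass can find patterns that contain "P1" itself, which is how the hierarchy builds up.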

16 MEX at work

17 ALICE motifs
Motif     Weight  Occurrences  Length
curious   1.00    19           6
hadbeen   1.00    17           6
however   1.00    20           6
perhaps   1.00    16           6
hastily   1.00    16           6
herself   1.00    78           6
footman   1.00    14           6
suppose   1.00    12           6
silence   0.99    14           6
witness   0.99    10           6
gryphon   0.97    54           6
serpent   0.97    11           6
angrily   0.97    8            6
croquet   0.97    8            6
venture   0.95    12           6
forsome   0.95    12           6
timidly   0.95    9            6
whisper   0.95    9            6
rabbit    1.00    27           5
course    1.00    25           5
eplied    1.00    22           5
seemed    1.00    26           5
remark    1.00    28           5

18 ADIOS in outline
Composed of three main elements
  A representational data structure
  A segmentation criterion (MEX)
  A generalization ability

19 Generalization – defining an equivalence class
show me flights from philadelphia to san francisco on wednesdays
list all flights from boston to san francisco with the maximum number of stops
show flights from dallas to san francisco
may i see the flights from denver to san francisco please
Generalized search path: show me flights from {boston, philadelphia, denver, dallas} to san francisco on wednesdays

20 Generalization
Generalized search path: show me flights from {boston, philadelphia, denver, dallas} to san francisco on wednesdays
i need to fly from boston to baltimore please give me…
which airlines fly from dallas to denver
please give me a flight from philadelphia to atlanta before ten a m in the morning
list all flights going from boston to atlanta on wednesday…
P1: from _E1 to
_E1 = {boston, philadelphia, denver, dallas}

21 Context-sensitive generalization
Slide a context window of size L across the current search path
For each 1 ≤ i ≤ L, look at all paths that are identical to the search path for 1 ≤ k ≤ L, except at k = i
Define an equivalence class containing the nodes at index i for these paths
Replace the i-th node with the equivalence class
Find significant patterns using the MEX criterion
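The middle steps of this procedure can be sketched in Python as follows. `equivalence_class` is an illustrative name, indices are 0-based here rather than 1-based as on the slide, and the sketch covers only the construction of one candidate class, not the MEX test that follows.

```python
def equivalence_class(paths, window, i):
    """Nodes seen at slot i among sub-paths that match `window` at every other slot."""
    size = len(window)
    members = set()
    for path in paths:
        for start in range(len(path) - size + 1):
            candidate = path[start:start + size]
            if all(candidate[k] == window[k] for k in range(size) if k != i):
                members.add(candidate[i])
    return members

atis = [s.split() for s in [
    "show me flights from philadelphia to san francisco on wednesdays",
    "list all flights from boston to san francisco with the maximum number of stops",
    "show flights from dallas to san francisco",
    "may i see the flights from denver to san francisco please",
]]
window = ["flights", "from", "philadelphia", "to", "san"]
cities = equivalence_class(atis, window, 2)
```

With the four example sentences from the generalization slides and a window of size 5, the slot after "from" collects the four city names.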

22 Determining L
Involves a tradeoff
Larger L will demand more context sensitivity in the inference
  Will hamper generalization
Smaller L will detect more patterns
  But many might be spurious

23 The effects of context window width

24 Over-generalization
john believes that to please is easy
john thinks that to please is fun
jack and john believe that to please is hard
Generalized search path: john {believes, thinks, believe} that to please is easy

25 Bootstrapping
what are the cheapest flights from denver to boston that stop in atlanta
A pre-existing equivalence class: {boston, philadelphia, denver, dallas}
Generalized search path I: what are the cheapest flights from {boston, philadelphia, denver, dallas} to {boston, philadelphia, denver, dallas} that stop in atlanta

26 Bootstrapping
Search path with slots drawn from {boston, philadelphia, denver, dallas}: what are the cheapest flights from _ to _ that stop in atlanta
what is the cheapest fare from denver to philadelphia and from pittsburgh to atlanta
i would… like the cheapest airfare from boston to denver december twenty sixth
show me the cheapest flight from philadelphia to dallas which arrives…

27 Bootstrapping
Generalized search path II: what are the cheapest {flight, flights, airfare, fare} from {boston, philadelphia, denver} to {denver, philadelphia, dallas} that stop in atlanta
_P2: the cheapest _E2 from _E3 to _E4
_E2 = {flight, flights, airfare, fare}
_E3 = {boston, philadelphia, denver}
_E4 = {denver, philadelphia, dallas}

28 Bootstrapping
Slide a context window of length L along the current search path
Consider all sub-paths of length L that begin in a_1 and end in a_L
  These are the candidate paths
For each 1 ≤ i ≤ L
  For each 1 ≤ k ≤ L, k ≠ i
    Replace node k with the EC that contains node k and maximally overlaps the set of nodes at index k of the candidate paths
  Continue as before
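The class-selection step ("the EC that contains node k and maximally overlaps") can be sketched as below. Measuring overlap by the size of the set intersection is an assumption, since the transcript does not say how overlap is scored, and `best_class` is a hypothetical helper name.

```python
def best_class(node, observed, classes):
    """Among existing classes containing `node`, pick the one sharing the most
    members with `observed` (the nodes seen at this index in the candidate paths).
    Returns None when no existing class contains the node."""
    candidates = [ec for ec in classes if node in ec]
    if not candidates:
        return None
    return max(candidates, key=lambda ec: len(ec & observed))
```

When `best_class` returns None, the node at that index is simply left as it is in the search path.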

29 The ADIOS algorithm
Initialization – load all data into a pseudograph
Until no more patterns are found
  For each path P
    Create generalized search paths from P
    Detect significant patterns using MEX
    If found, add the best new pattern and equivalence classes and rewire the graph

30 The training process (figure: graph of numbered nodes, not reproduced)

31 (figure: graph of numbered nodes, not reproduced)

32 The result (figure: graph of numbered nodes, not reproduced)

33 Example

34 More Patterns

35 Evaluating performance
In principle, we would like to compare ADIOS-generated parse trees with the true parse trees for given sentences
Alas, the true parse trees are subject to opinion
Some approaches don't even assume parse trees

36 Evaluating performance
Define
  Recall – the probability of ADIOS recognizing an unseen grammatical sentence
  Precision – the proportion of grammatical ADIOS productions
Recall can be assessed by leaving out some of the training corpus
Precision is trickier
  Unless we're learning a known CFG
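Both measures reduce to simple proportions once the judgments are available. The sketch below assumes the learner is exposed as a callable `accepts(sentence)` and that grammaticality judgments of generated sentences arrive as booleans; both interfaces are illustrative.

```python
def recall(accepts, held_out):
    """Fraction of held-out grammatical sentences that the learner accepts."""
    if not held_out:
        raise ValueError("held_out must not be empty")
    return sum(1 for s in held_out if accepts(s)) / len(held_out)

def precision(judgments):
    """Fraction of generated sentences judged grammatical (one boolean each)."""
    if not judgments:
        raise ValueError("judgments must not be empty")
    return sum(1 for ok in judgments if ok) / len(judgments)
```

The asymmetry on the slide shows up in the inputs: `held_out` comes for free from the corpus, while `judgments` needs either a known target grammar or human raters.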

37 The ATIS experiments
ATIS-NL is a 13,043-sentence corpus of natural language
  Transcribed phone calls to an airline reservation service
ADIOS was trained on 12,700 sentences of ATIS-NL
  The remaining 343 sentences were used to assess recall
  Precision was determined with the help of 8 graduate students from Cornell University

38 An ADIOS drawback
ADIOS is inherently a heuristic and greedy algorithm
  Once a pattern is created it remains forever – errors compound
  Sentence ordering affects the outcome
Running ADIOS with different orderings gives patterns that cover different parts of the grammar

39 An ad-hoc solution
Train multiple learners on the corpus
  Each on a different sentence ordering
  Create a forest of learners
To create a new sentence
  Pick one learner at random
  Use it to produce a sentence
To check the grammaticality of a given sentence
  If any learner accepts the sentence, declare it grammatical
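The forest logic can be sketched in a few lines of Python. `SetLearner` is a hypothetical stand-in for a trained ADIOS learner (it only memorizes a fixed set of sentences), and the `generate`/`accepts` interface is an assumption made for illustration.

```python
import random

class SetLearner:
    """Hypothetical stand-in for a trained learner: knows a fixed set of sentences."""
    def __init__(self, sentences):
        self.sentences = sorted(sentences)

    def generate(self, rng):
        return rng.choice(self.sentences)

    def accepts(self, sentence):
        return sentence in self.sentences

class Forest:
    """Ensemble of learners, each trained on a different sentence ordering."""
    def __init__(self, learners, seed=None):
        self.learners = list(learners)
        self.rng = random.Random(seed)

    def generate(self):
        # Pick one learner at random and let it produce the sentence.
        return self.rng.choice(self.learners).generate(self.rng)

    def accepts(self, sentence):
        # Grammatical if any single learner accepts it.
        return any(learner.accepts(sentence) for learner in self.learners)
```

Accepting on any learner's vote raises recall as learners are added, since each covers a different part of the grammar, but a sentence wrongly accepted by one learner is also accepted by the forest.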

40 The ATIS experiments
ADIOS performance scores
  Recall – 40%
  Precision – 70%
For comparison, ATIS-CFG reached
  Recall – 45%
  Precision – <1% (!)

41 ADIOS/ATIS-N comparison

42 Columns: acceptable, unacceptable; H = Human, C = Computer
i would like a flight from washington to boston flight three twenty four on august twentieth 1 1 H
round trip flight from boston to baltimore leaving boston less than a thousand dollars 1 2 C
well what offers the ground transportation available in fort worth 1 3 C
does continental fly from san francisco to atlanta 1 4 C
does american airlines fly from philadelphia to dallas 1 5 H
please describe to me the classes of service that are available 1 6 H
i'd like to fly from philadelphia to dallas next week 1 7 H
which airline offers the most flights from san francisco washington 1 8 C
is it possible for me to fly from baltimore to san francisco 1 9 H
i want to fly from boston to pittsburgh to san francisco 1 1010 C
would like to arrange a round trip flight from atlanta to boston to pittsburgh to san francisco tuesday the 1 1 C
between eleven and twelve o'clock in the morning 1 1212 H
what offers the cheapest fare from boston to pittsburgh to atlanta 1 1313 C
what is the airfare from boston to pittsburgh to atlanta 1 1414 C

43 English as Second Language test
A single instance of ADIOS was trained on the CHILDES corpus
  120,000 sentences of transcribed child-directed speech
Subjected to the Göteborg multiple-choice ESL test
  100 sentences, each with an open slot
  Pick the correct word out of three
ADIOS answered 60% of the questions correctly
  An average ninth-grader performance

44 ADIOS

45 Meta-analysis of ADIOS results
Define a pattern spectrum as the histogram of pattern types for an individual learner
A pattern type is determined by its contents
  E.g. TT, TET, EE, PE…
A single ADIOS learner was trained with each of 6 translations of the Bible

46 Pattern spectra

47 Language dendrogram

48 In case there's time…

49 Pattern significance
Say we found a potential pattern-edge from nodes 1 to n.
Define
  m – the number of paths from 1 to n
  r – the number of paths from 1 to n+1
Because it's a pattern edge, we know that [formula not preserved in transcript]
Let's suppose that the true probability for n+1 given 1 through n is [symbol not preserved in transcript]
r/m is our best estimate, but just an estimate
What are the odds of getting r and m but still have [formula not preserved in transcript]?

50 Pattern significance
Assume [formula not preserved in transcript]
The odds of getting result r and m or better are then given by [formula not preserved in transcript]
If this is smaller than a predetermined α, we say the pattern-edge candidate is significant
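The formulas on these two slides did not survive in the transcript, so the following is only one plausible reading, not the authors' test. It assumes a one-sided binomial test: under the null hypothesis the true continuation probability equals some threshold `p0` (a hypothetical parameter), and the "odds" are the probability of seeing r or fewer continuations out of m paths.

```python
from math import comb

def binomial_tail(r, m, p0):
    """P(X <= r) for X ~ Binomial(m, p0): chance of at most r continuations in m paths."""
    return sum(comb(m, k) * p0**k * (1 - p0)**(m - k) for k in range(r + 1))

def is_significant(r, m, p0, alpha):
    """Candidate edge is significant if the observed drop is unlikely under p0.

    p0 and this one-sided form are assumptions; the slide's own formula is missing.
    """
    return binomial_tail(r, m, p0) < alpha
```

Under this reading, a small r/m counts as evidence of a pattern boundary only when m is large enough that the low ratio is unlikely to be a sampling accident.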

51 To be continued…

