Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

2 Grammar Induction With ADIOS (Automatic DIstillation Of Structure)

3 Previous work
- Probabilistic Context Free Grammars
- Supervised induction methods
- Little work on raw data
- Mostly work on artificial CFGs
- Clustering

4 Our goal
Given a corpus of raw text separated into sentences, we want to derive a specification of the underlying grammar.
This means we want to be able to:
- Create new, unseen, grammatically correct sentences
- Accept new, unseen, grammatically correct sentences and reject ungrammatical ones

5 What do we need to do?
G is given by the rewrite rules:
S → NP VP
NP → the N | a N
N → man | boy | dog
VP → V NP
V → saw | heard | sensed | sniffed
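The two abilities named in the goal (creating and accepting sentences) can be illustrated on this toy grammar. The following is a minimal Python sketch, not part of the original slides: it assumes the rewrite rules listed on slide 5, and the names `GRAMMAR`, `expand`, `LANGUAGE` and `accepts` are hypothetical. Because this grammar has no recursion, its language is finite and can simply be enumerated.

```python
# Toy grammar from slide 5: nonterminal -> list of alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"], ["a", "N"]],
    "N": [["man"], ["boy"], ["dog"]],
    "VP": [["V", "NP"]],
    "V": [["saw"], ["heard"], ["sensed"], ["sniffed"]],
}


def expand(symbols):
    """Yield every list of terminal words derivable from a list of symbols."""
    if not symbols:
        yield []
        return
    head, rest = symbols[0], symbols[1:]
    if head not in GRAMMAR:
        # Terminal word: keep it and expand the remainder.
        for tail in expand(rest):
            yield [head] + tail
    else:
        # Nonterminal: try each alternative right-hand side.
        for rhs in GRAMMAR[head]:
            yield from expand(rhs + rest)


# "Create": enumerate all sentences the grammar produces.
LANGUAGE = {" ".join(words) for words in expand(["S"])}


def accepts(sentence):
    """"Accept": a sentence is grammatical if the grammar can produce it."""
    return sentence in LANGUAGE


print(len(LANGUAGE))
print(accepts("the man saw a dog"), accepts("man the saw dog a"))
```

Enumeration only works here because the toy grammar is finite; a grammar with recursive rules would need a parser instead.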

6 ADIOS in outline
Composed of three main elements:
- A representational data structure
- A segmentation criterion (MEX)
- A generalization ability
We will consider each of these in turn.

7 The Model: graph representation with words as vertices and sentences as paths.
[Figure: a graph with BEGIN and END vertices built from the sentences "Is that a cat?", "Is that a dog?", "Where is the dog?" and "And is that a horse?". Nodes are words; edges carry labels such as 101 (1) and 102 (5).]
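A minimal Python sketch of such a representation follows. It is an illustration under stated assumptions, not the original implementation: the function name `build_graph`, the integer path ids, and the choice to store each edge as a (path id, position) pair are all hypothetical. Parallel edges from different sentences are kept distinct, which is what makes the structure a pseudograph.

```python
from collections import defaultdict

BEGIN, END = "BEGIN", "END"


def build_graph(sentences):
    """Return (paths, edges).

    paths: one list of vertices per sentence, wrapped in BEGIN/END.
    edges: (u, v) -> list of (path_id, position) pairs, one per traversal,
           so every sentence keeps its own edges between shared vertices.
    """
    paths = []
    edges = defaultdict(list)
    for path_id, sentence in enumerate(sentences):
        path = [BEGIN] + sentence.lower().split() + [END]
        paths.append(path)
        for position, (u, v) in enumerate(zip(path, path[1:])):
            edges[(u, v)].append((path_id, position))
    return paths, edges


sentences = [
    "is that a cat ?",
    "is that a dog ?",
    "where is the dog ?",
    "and is that a horse ?",
]
paths, edges = build_graph(sentences)
vertices = {v for path in paths for v in path}
print(len(vertices), edges[("is", "that")])
```

Sentences that share a sub-path such as "is that a" traverse the same vertices, so the shared structure lines up without any explicit alignment step.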

8 ADIOS in outline
Composed of three main elements:
- A representational data structure
- A segmentation criterion (MEX)
- A generalization ability

9 Toy problem – Alice in Wonderland a l i c e w a s b e g i n n i n g t o g e t v e r y t i r e d o f s i t t i n g b y h e r s i s t e r o n t h e b a n k a n d o f h a v i n g n o t h i n g t o d o o n c e o r t w i c e s h e h a d p e e p e d i n t o t h e b o o k h e r s i s t e r w a s r e a d i n g b u t i t h a d n o p i c t u r e s o r c o n v e r s a t i o n s i n i t a n d w h a t i s t h e u s e o f a b o o k t h o u g h t a l i c e w i t h o u t p i c t u r e s o r c o n v e r s a t i o n

10 Detecting significant patterns
- Identifying patterns becomes easier on a graph
- Sub-paths are automatically aligned

11 Motif EXtraction

12 The Markov Matrix
- The top-right triangle defines the P_L probabilities, the bottom-left triangle the P_R probabilities
- The matrix is path-dependent
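The transcript does not preserve the definitions of P_R and P_L. The Python sketch below assumes the usual reading of a path-dependent conditional probability: P_R is the fraction of occurrences of a sub-path that continue one vertex to the right, and P_L is the mirror image going left. The function names are hypothetical, and the search path is assumed to be one of the paths in the corpus (so the denominators are never zero).

```python
def count_subpath(paths, sub):
    """Count occurrences of the contiguous vertex sequence `sub` over all paths."""
    n = len(sub)
    return sum(
        1
        for path in paths
        for k in range(len(path) - n + 1)
        if path[k:k + n] == sub
    )


def p_right(paths, path, i, j):
    """Assumed P_R: share of occurrences of path[i..j-1] that continue to path[j] (i < j)."""
    return count_subpath(paths, path[i:j + 1]) / count_subpath(paths, path[i:j])


def p_left(paths, path, i, j):
    """Assumed P_L: share of occurrences of path[i+1..j] that are preceded by path[i] (i < j)."""
    return count_subpath(paths, path[i:j + 1]) / count_subpath(paths, path[i + 1:j + 1])


corpus = [
    "is that a cat ?".split(),
    "is that a dog ?".split(),
    "where is the dog ?".split(),
    "and is that a horse ?".split(),
]
search_path = corpus[0]
print(p_right(corpus, search_path, 0, 1))  # "is" -> "that"
print(p_right(corpus, search_path, 0, 3))  # "is that a" -> "cat"
```

On this tiny corpus the probability stays high along "is that a" and drops when the path continues to "cat", which is the kind of drop a segmentation criterion can look for.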

14 Example of a probability matrix

15 Rewiring the graph
Once a pattern is identified as significant, the sub-paths it subsumes are merged into a new vertex and the graph is rewired accordingly. Repeating this process leads to the formation of complex, hierarchically structured patterns.
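The rewiring step can be sketched in Python as follows. This is a simplified illustration that works on paths stored as lists of vertices; the function name `rewire` and the labels `P1` and `P2` are hypothetical, and the sketch ignores equivalence classes.

```python
def rewire(paths, pattern, label):
    """Replace every occurrence of `pattern` (a list of vertices) by the single vertex `label`."""
    if not pattern:
        raise ValueError("pattern must contain at least one vertex")
    n = len(pattern)
    new_paths = []
    for path in paths:
        out, k = [], 0
        while k < len(path):
            if path[k:k + n] == pattern:
                out.append(label)  # the merged sub-path becomes one vertex
                k += n
            else:
                out.append(path[k])
                k += 1
        new_paths.append(out)
    return new_paths


paths = [
    "is that a cat ?".split(),
    "is that a dog ?".split(),
    "where is the dog ?".split(),
]
step1 = rewire(paths, ["is", "that", "a"], "P1")
step2 = rewire(step1, ["P1", "dog"], "P2")  # a pattern built on a pattern
print(step2)
```

Because a later pattern may contain an earlier pattern's vertex, repeated rewiring yields the hierarchical structure described on the slide.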

16 MEX at work

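17 ALICE motifs
[Table with columns Weight, Occurrences, Length; the numeric values are not preserved in the transcript.]
Motifs: curious, hadbeen, however, perhaps, hastily, herself, footman, suppose, silence, witness, gryphon, serpent, angrily, croquet, venture, forsome, timidly, whisper, rabbit, course, eplied, seemed, remark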

18 ADIOS in outline
Composed of three main elements:
- A representational data structure
- A segmentation criterion (MEX)
- A generalization ability

19 Generalization – defining an equivalence class
- show me flights from philadelphia to san francisco on wednesdays
- list all flights from boston to san francisco with the maximum number of stops
- show flights from dallas to san francisco
- may i see the flights from denver to san francisco please
Generalized search path: show me flights from {boston | philadelphia | denver | dallas} to san francisco on wednesdays

20 Generalization
Generalized search path: show me flights from {boston | philadelphia | denver | dallas} to san francisco on wednesdays
- i need to fly from boston to baltimore please give me…
- which airlines fly from dallas to denver
- please give me a flight from philadelphia to atlanta before ten a m in the morning
- list all flights going from boston to atlanta on wednesday…
P1: from _E1 to
_E1 = {boston, philadelphia, denver, dallas}

21 Context-sensitive generalization
- Slide a context window of size L across the current search path
- For each 1 ≤ i ≤ L:
  - Look at all paths that are identical to the search path for 1 ≤ k ≤ L, except for k = i
  - Define an equivalence class containing the nodes at index i for these paths
  - Replace the i-th node with the equivalence class
  - Find significant patterns using the MEX criterion
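The first steps of this procedure (collecting the candidate equivalence class for each slot of one window) can be sketched in Python. This is a simplified illustration with hypothetical names; it uses 0-based offsets instead of the slide's 1-based indices, and it stops before the MEX significance test.

```python
def equivalence_classes(paths, search_path, start, L):
    """For the window search_path[start:start+L], return {i: set of vertices}.

    Slot i collects the vertex found at offset i in every length-L sub-path
    that matches the window at all other offsets.
    """
    window = search_path[start:start + L]
    if len(window) != L:
        raise ValueError("window does not fit inside the search path")
    classes = {}
    for i in range(L):
        members = set()
        for path in paths:
            for k in range(len(path) - L + 1):
                candidate = path[k:k + L]
                if all(candidate[t] == window[t] for t in range(L) if t != i):
                    members.add(candidate[i])
        classes[i] = members
    return classes


corpus = [
    "show me flights from philadelphia to san francisco on wednesdays".split(),
    "list all flights from boston to san francisco with the maximum number of stops".split(),
    "show flights from dallas to san francisco".split(),
    "may i see the flights from denver to san francisco please".split(),
]
# Window "flights from philadelphia to san" on the first sentence.
classes = equivalence_classes(corpus, corpus[0], start=2, L=5)
print(sorted(classes[2]))
```

Only the city slot picks up alternatives here; the other slots contain just the original word, so they would not be generalized.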

22 Determining L
Involves a tradeoff:
- Larger L will demand more context sensitivity in the inference
  - Will hamper generalization
- Smaller L will detect more patterns
  - But many might be spurious

23 The effects of context window width

24 Over-generalization
- john believes that to please is easy
- john thinks that to please is fun
- jack and john believe that to please is hard
Generalized search path: john {believes | thinks | believe} that to please is easy

25 Bootstrapping
Search path: what are the cheapest flights from denver to boston that stop in atlanta
A pre-existing equivalence class: {boston, philadelphia, denver, dallas}
Generalized search path I: what are the cheapest flights from {boston | philadelphia | denver | dallas} to {boston | philadelphia | denver | dallas} that stop in atlanta

26 Bootstrapping
Generalized search path I: what are the cheapest flights from {boston | philadelphia | denver | dallas} to {boston | philadelphia | denver | dallas} that stop in atlanta
- what is the cheapest fare from denver to philadelphia and from pittsburgh to atlanta
- i would… like the cheapest airfare from boston to denver december twenty sixth
- show me the cheapest flight from philadelphia to dallas which arrives…

27 Bootstrapping
Generalized search path II: what are the cheapest {flight | flights | airfare | fare} from {boston | philadelphia | denver} to {denver | philadelphia | dallas} that stop in atlanta
_P2: the cheapest _E2 from _E3 to _E4
_E2 = {flight, flights, airfare, fare}
_E3 = {boston, philadelphia, denver}
_E4 = {denver, philadelphia, dallas}

28 Bootstrapping
- Slide a context window of length L along the current search path
- Consider all sub-paths of length L that begin in a_1 and end in a_L; these are the candidate paths
- For each 1 ≤ i ≤ L:
  - For each 1 ≤ k ≤ L, k ≠ i: replace node k with the EC that contains node k and maximally overlaps the set of nodes at index k of the candidate paths
  - Continue as before
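The selection rule in the inner step (pick the existing equivalence class that contains the node and overlaps most with what the candidate paths show at that index) can be sketched in Python. The function name `best_class` and the example classes are hypothetical, and ties are resolved by taking the first class found, which the slide does not specify.

```python
def best_class(node, observed, known_classes):
    """Return the known equivalence class that contains `node` and shares the most
    members with `observed` (the vertices seen at this index in the candidate paths).
    Returns None when no known class contains `node`."""
    candidates = [ec for ec in known_classes if node in ec]
    if not candidates:
        return None
    return max(candidates, key=lambda ec: len(ec & observed))


known = [
    frozenset({"boston", "philadelphia", "denver", "dallas"}),
    frozenset({"denver", "colorado"}),
]
observed_at_k = {"denver", "boston", "philadelphia"}
print(sorted(best_class("denver", observed_at_k, known)))
print(best_class("atlanta", observed_at_k, known))
```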

29 The ADIOS algorithm
- Initialization: load all data into a pseudograph
- Until no more patterns are found:
  - For each path P:
    - Create generalized search paths from P
    - Detect significant patterns using MEX
    - If found, add the best new pattern and equivalence classes and rewire the graph

30 The training process

32 The result

33 Example

34 More Patterns

35 Evaluating performance
- In principle, we would like to compare ADIOS-generated parse trees with the true parse trees for given sentences
- Alas, the true parse trees are subject to opinion
- Some approaches don't even assume parse trees

36 Evaluating performance
Define:
- Recall: the probability of ADIOS recognizing an unseen grammatical sentence
- Precision: the proportion of grammatical ADIOS productions
Recall can be assessed by leaving out some of the training corpus.
Precision is trickier, unless we're learning a known CFG.
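Both measures reduce to simple proportions once an acceptance test and a grammaticality judgment are available. The Python sketch below is an illustration only: `accepts` stands for a trained learner's acceptance test and `is_grammatical` for a known CFG or a human judge, and both are hypothetical callables supplied by the caller. The toy sets are made-up placeholders, not experimental data.

```python
def recall(accepts, held_out_sentences):
    """Fraction of held-out grammatical sentences that the learner accepts."""
    held_out = list(held_out_sentences)
    return sum(1 for s in held_out if accepts(s)) / len(held_out)


def precision(generated_sentences, is_grammatical):
    """Fraction of learner-generated sentences that are judged grammatical."""
    generated = list(generated_sentences)
    return sum(1 for s in generated if is_grammatical(s)) / len(generated)


# Toy stand-ins: a "true" language and what a learner accepts or produces.
true_language = {"a b", "a c", "b c", "c a"}
learned = {"a b", "a c", "x y"}

r = recall(lambda s: s in learned, ["a b", "b c"])
p = precision(sorted(learned), lambda s: s in true_language)
print(r, p)
```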

37 The ATIS experiments
- ATIS-NL is a 13,043-sentence corpus of natural language
  - Transcribed phone calls to an airline reservation service
- ADIOS was trained on 12,700 sentences of ATIS-NL
- The remaining 343 sentences were used to assess recall
- Precision was determined with the help of 8 graduate students from Cornell University

38 An ADIOS drawback
- ADIOS is inherently a heuristic and greedy algorithm
  - Once a pattern is created it remains forever, so errors compound
- Sentence ordering affects the outcome
  - Running ADIOS with different orderings gives patterns that cover different parts of the grammar

39 An ad-hoc solution
- Train multiple learners on the corpus
  - Each on a different sentence ordering
  - Create a forest of learners
- To create a new sentence:
  - Pick one learner at random
  - Use it to produce a sentence
- To check the grammaticality of a given sentence:
  - If any learner accepts the sentence, declare it grammatical
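The forest logic can be sketched in a few lines of Python. The learner interface (`generate(rng)` and `accepts(sentence)`) and the stand-in class `SetLearner` are assumptions made for illustration; a real learner would be an ADIOS instance trained on one ordering of the corpus.

```python
import random


class SetLearner:
    """Hypothetical stand-in for one trained learner: it knows a fixed set of sentences."""

    def __init__(self, sentences):
        self.sentences = sorted(sentences)

    def generate(self, rng):
        return rng.choice(self.sentences)

    def accepts(self, sentence):
        return sentence in self.sentences


class Forest:
    """Combine several learners as described on the slide."""

    def __init__(self, learners, rng=None):
        self.learners = list(learners)
        self.rng = rng or random.Random()

    def generate(self):
        # Pick one learner at random and let it produce the sentence.
        learner = self.rng.choice(self.learners)
        return learner.generate(self.rng)

    def accepts(self, sentence):
        # Grammatical if any single learner accepts it.
        return any(learner.accepts(sentence) for learner in self.learners)


forest = Forest(
    [SetLearner({"the dog saw a man"}), SetLearner({"a boy heard the dog"})],
    rng=random.Random(0),
)
print(forest.accepts("a boy heard the dog"), forest.accepts("dog the saw"))
```

Accepting on any learner can only raise recall relative to a single learner, while generating from one learner at a time keeps each produced sentence tied to one consistent set of patterns.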

40 The ATIS experiments
ADIOS performance scores: Recall 40%, Precision 70%
For comparison, ATIS-CFG reached: Recall 45%, Precision <1% (!)

41 ADIOS/ATIS-N comparison

42 [Table: sentences rated acceptable or unacceptable, with their source (H = Human, C = Computer); the rating counts are not recoverable from the transcript.]
- i would like a flight from washington to boston flight three twenty four on august twentieth (H)
- round trip flight from boston to baltimore leaving boston less than a thousand dollars (C)
- well what offers the ground transportation available in fort worth (C)
- does continental fly from san francisco to atlanta (C)
- does american airlines fly from philadelphia to dallas (H)
- please describe to me the classes of service that are available (H)
- i'd like to fly from philadelphia to dallas next week (H)
- which airline offers the most flights from san francisco washington (C)
- is it possible for me to fly from baltimore to san francisco (H)
- i want to fly from boston to pittsburgh to san francisco (C)
- would like to arrange a round trip flight from atlanta to boston to pittsburgh to san francisco tuesday the (C)
- between eleven and twelve o'clock in the morning (H)
- what offers the cheapest fare from boston to pittsburgh to atlanta (C)
- what is the airfare from boston to pittsburgh to atlanta (C)

43 English as a Second Language test
- A single instance of ADIOS was trained on the CHILDES corpus
  - 120,000 sentences of transcribed child-directed speech
- Subjected to the Göteborg multiple-choice ESL test
  - 100 sentences, each with an open slot
  - Pick the correct word out of three
- ADIOS answered 60% of the questions correctly
  - An average ninth-grader's performance

44 ADIOS

45 Meta-analysis of ADIOS results
- Define a pattern spectrum as the histogram of pattern types for an individual learner
  - A pattern type is determined by its contents, e.g. TT, TET, EE, PE…
- A single ADIOS learner was trained on each of 6 translations of the Bible

46 Pattern spectra

47 Language dendrogram

48 In case there's time…

49 Pattern significance
Say we found a potential pattern-edge from nodes 1 to n. Define:
- m: the number of paths from 1 to n
- r: the number of paths from 1 to n+1
Because it's a pattern edge, we know that [equation not preserved in the transcript].
Let's suppose that the true probability for n+1 given 1 through n is [symbol not preserved in the transcript].
r/m is our best estimate, but just an estimate.
What are the odds of getting r and m but still have [condition not preserved in the transcript]?

50 Pattern significance
Assume [assumption not preserved in the transcript].
The odds of getting result r and m or better are then given by [equation not preserved in the transcript].
If this is smaller than a predetermined α, we say the pattern-edge candidate is significant.

51 To be continued…

