Slide 1: Introduction to Computational Natural Language Learning
Linguistics 79400 (Under: Topics in Natural Language Processing), Computer Science 83000 (Under: Topics in Artificial Intelligence)
The Graduate School of the City University of New York, Fall 2001
William Gregory Sakas
Hunter College, Department of Computer Science
Graduate Center, PhD Programs in Computer Science and Linguistics, The City University of New York

Slide 2: Feasibility
Is acquisition possible within a reasonable amount of time and/or with a reasonable amount of work?
See Clark (1994, in press), Niyogi and Berwick (1996), Lightfoot (1989) (degree-0), Sakas (2000), Tesar and Smolensky (1996), and many PAC results concerning induction of FSAs.
Feasibility measure (Sakas and Fodor, 2001): a near-linear increase in the expected number of sentences consumed before a learner converges on the target grammar.

Slide 3: [Figure: a schematic graph with regions labeled Infeasible and Feasible, plotted against the feature of the learning model to be tested, e.g., the number of parameters or average sentence length.]

Slide 4: A Feasibility Case Study: a three-parameter domain (Gibson and Wexler, 1994)
Sentences are strings of the symbols: S, V, O1, O2, aux, adv
SV / VS: subject precedes verb / verb precedes subject
+V2 / -V2: verb or aux must be in the second position in the sentence
VO / OV: verb precedes object / object precedes verb
Example: "Allie will eat the birds" → S aux V O
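Since each parameter is binary, the domain contains 2^3 = 8 grammars. A minimal sketch (the bit encoding is an illustrative assumption, not from Gibson and Wexler) enumerating them:

```python
# Each grammar in the three-parameter domain is a vector of three
# binary parameter settings; there are 2**3 = 8 grammars in all.
from itertools import product

PARAMETER_VALUES = [("SV", "VS"), ("VO", "OV"), ("-V2", "+V2")]

for bits in product([0, 1], repeat=3):
    grammar = [values[b] for values, b in zip(PARAMETER_VALUES, bits)]
    print(bits, "->", " ".join(grammar))  # e.g. (0, 0, 0) -> SV VO -V2
```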

Slide 5: Surprisingly, G&W's simple 3-parameter domain presents nontrivial obstacles to several learning strategies, but the space is ultimately learnable (for example, by a "blind-guess" learner; see the next slide).
Big question: How will the learning process scale up, in terms of feasibility, as the number of parameters increases?
Two problems for most acquisition strategies: 1) ambiguity, and 2) the size of the domain.

6 6 "Size of the domain" feasibility result: A Blind-guess learner succeeds only after consuming a number of sentences exponentially correlated with the number of parameters. Blind-guess learner (with oracle, no parsing, no error-driven constraint): 1)consume a sentence from the linguistic environment. 2)randomly select a setting for each parameter == randomly pick a grammar out of the domain. 3)if the settings indicate the target grammar stop, otherwise goto 1. A simple math result shows that the average number of trials required to achieve success* (in this case picking the target grammar) is: 1 / Probability of success. So if we can figure out the probability of picking the target grammar at step 3, we can model how many inputs the learner will require on average. * given that each trial has a possible outcome of either success or failure. For those technically minded, this is a hypergeometric distribution.

Slide 7: Probability of success = the probability of picking the target grammar at step 3, calculated as:
Pr(success) = 1 / (number of grammars) = 1 / 2^(number of parameters)
Applying the math result for the average number of trials:
1 / (1 / 2^(number of parameters)) = 2^(number of parameters)
If the number of parameters is 30, then the number of grammars is 2^30 = 1,073,741,824. So, on average, over a billion sentences would need to be consumed by the blind-guess learner before attaining the target. The search space is huge!
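A quick tabulation of this exponential blow-up (illustrative only):

```python
# Expected sentences for the blind-guess learner, 2**n, for a few n.
for n in (3, 10, 20, 30):
    print(f"n = {n:2d} -> {2 ** n:>13,} expected sentences")
```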

Slide 8: Cross-language ambiguity

SV OV +V2 (German-like):
S V
S V O
O V S
S V O2 O1
O1 V S O2
O2 V S O1
S AUX V
S AUX O V
O AUX S V
S AUX O2 O1 V
O1 AUX S O2 V
O2 AUX S O1 V
ADV V S
ADV V S O
ADV V S O2 O1
ADV AUX S V
ADV AUX S O V
ADV AUX S O2 O1 V

SV VO -V2 (English-like):
S V
S V O
S V O1 O2
S AUX V
S AUX V O
S AUX V O1 O2
ADV S V
ADV S V O
ADV S V O1 O2

(The original slide highlighted a few ambiguous strings; note, for example, that S V, S V O, and S AUX V occur in both languages.)

Slide 9: P&P acquisition: How can we obtain informative feasibility results by studying linguistically interesting domains with cognitively plausible learning algorithms?

Slide 10: So, how can questions of feasibility be answered as the number of grammars (exponentially) scales up?
One approach: create an input space for a linguistically plausible domain and run simulations (Briscoe (2000), Elman (1990, 1991, 1996), Yang (2000)). But this won't work for large domains.
Answer: introduce some formal notions in order to abstract away from the specific linguistic content of the input data.

Slide 11: A formal approach
1) Formalize the learning process and the input space.
2) Use the formalization in a probabilistic model to empirically test the learner across a wide range of learning scenarios.
This gives general data on the expected performance of acquisition algorithms, and can answer the question: Given learner L, if the input space exhibits characteristics x, y, and z, is feasible learning possible?

Slide 12: The Triggering Learning Algorithm (TLA) (Gibson and Wexler, 1994)
Searches the (huge) grammar space using local heuristics.
Repeat until convergence:
1) Receive a string s from L(G_targ).
2) If s can be parsed by G_curr, do nothing (error-driven).
3) Otherwise, pick a grammar that differs by one parameter value from the current grammar (the Single Value Constraint, SVC).
4) If this grammar parses the sentence, make it the current grammar; otherwise do nothing (greediness).
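A minimal sketch of one TLA update in Python (hypothetical code; `parses` stands in for whatever parser the learner has, and grammars are tuples of 0/1 parameter values):

```python
import random

def tla_step(g_curr, sentence, parses, rng):
    """One TLA update. `parses(g, s)` is an assumed oracle telling
    whether grammar g licenses sentence s."""
    if parses(g_curr, sentence):       # error-driven: no failure, no change
        return g_curr
    i = rng.randrange(len(g_curr))     # Single Value Constraint: flip
    candidate = list(g_curr)           # exactly one parameter value
    candidate[i] ^= 1
    candidate = tuple(candidate)
    if parses(candidate, sentence):    # greediness: adopt only if the new
        return candidate               # grammar parses the current input
    return g_curr                      # otherwise do nothing
```

The learner would call this repeatedly on sentences drawn from L(G_targ) until its hypothesis stops changing.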

Slide 13: TLA-minus (TLA-) (Sakas and Fodor, 2001)
A simplified TLA, used to introduce a way of studying the effects of cross-language ambiguity through some simple formulas.
Repeat until convergence:
1) Receive a string s from L(G_targ).
2) If s can be parsed by G_curr, do nothing.
3) Otherwise, pick a grammar at random (not necessarily one parameter distant).
4) If this grammar parses the sentence, make it the current grammar; otherwise do nothing.
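The same sketch with the one change TLA- makes: the candidate grammar is drawn uniformly from the whole domain rather than from the one-parameter neighbors (again hypothetical code):

```python
def tla_minus_step(g_curr, sentence, parses, rng):
    """One TLA- update: identical to the TLA step except that the new
    hypothesis is any random grammar, not one parameter distant."""
    if parses(g_curr, sentence):
        return g_curr
    candidate = tuple(rng.randrange(2) for _ in range(len(g_curr)))
    if parses(candidate, sentence):
        return candidate
    return g_curr
```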

Slide 14: Some variables for TLA- analysis:
n = number of parameters in the domain
r = number of relevant parameters in the domain
a = number of parameters expressed ambiguously by an arbitrary input sentence
An example of n and r: suppose there are 5 parameters in the domain under study and the learner is being exposed to a language for which either setting of parameter 5 (p5) is "correct." In this case n = 5 and r = 4; there are 2^n = 2^5 = 32 grammars in the domain, and 2^(n-r) = 2^(5-4) = 2^1 = 2 grammars that the learner can consider a successful convergence.

Slide 15: Some variables for TLA- analysis (cont'd):
An example of a: suppose a sentence s is consumed by the learner, where a successful parse can be achieved ONLY if p1 and p2 are set to 0 and 1, respectively, but the settings of p3 and p4 don't matter (and of course, as we said previously, the setting of p5 doesn't matter for the entire target language). In this case a = 2; that is, 2 parameters are ambiguously expressed: the settings of p3 and p4.

Slide 16: Some formulas for TLA- analysis:
Recall: the number of grammars that make up the cluster of legitimate target grammars is 2^(n-r), and the number of settings of the a ambiguously expressed parameters is 2^a.
Number of grammars that license an arbitrary sentence s:
= 2^(n-r) * 2^a = 2^(a+n-r) (**)
Probability that an arbitrary sentence s from the target language can be parsed by an arbitrary grammar:
(**) = (# grammars that can parse s) / (# grammars in the domain) = (2^(n-r) * 2^a) / 2^n = 2^(a-r)

Slide 17: Some formulas for TLA- analysis (cont'd):
Recall: the number of grammars that license an arbitrary sentence s is 2^(a+n-r).
Probability that one of those grammars is in the target cluster of 2^(n-r) grammars:
= (# grammars that make up the cluster) / (# grammars that can parse the current input)
= 2^(n-r) / 2^(a+n-r) = 1 / 2^a
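The two probabilities just derived, written as executable formulas and checked against the running n = 5, r = 4, a = 2 example (a sketch, with hypothetical function names):

```python
def pr_parse(n, r, a):
    """Pr that an arbitrary grammar parses an arbitrary target sentence:
    (2**(n-r) * 2**a) / 2**n, which simplifies to 2**(a - r)."""
    return 2 ** (a - r)

def pr_in_target_cluster(a):
    """Pr that a sentence-licensing grammar lies in the target cluster:
    2**(n-r) / 2**(a+n-r), which simplifies to 1 / 2**a."""
    return 1 / 2 ** a

print(pr_parse(5, 4, 2))        # 0.25
print(pr_in_target_cluster(2))  # 0.25
```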

Slide 18: A tree of choices and outcomes for the TLA- (in the original slide, blue marked probabilities, red actions, and black outcomes):
A sentence is received. Then:
- With probability 2^(a-r), G_curr can parse it: retain G_curr and get another sentence.
- With probability 1 - 2^(a-r), G_curr can't parse it: pick G'.
  - With probability ≈ 1 - 2^(a-r), G' can't parse it either: forget about G', keep G_curr (this is greediness), and get another sentence.
  - With probability ≈ 2^(a-r), G' can parse it: retain G' and get another sentence.
    - With probability 1/2^a, G' is one of the grammars in the target cluster; then G' == G_target and will be retained "in the limit" (this is due to the error-driven constraint).
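A minimal Monte-Carlo sketch of this tree (hypothetical code; it idealizes by holding the branch probabilities fixed across sentences), estimating how many sentences the TLA- consumes before adopting a grammar in the target cluster:

```python
import random

def tla_minus_sentences(n, r, a, runs=5000, seed=0):
    """Average sentences consumed before TLA- lands in the target
    cluster, following the branch probabilities of the tree."""
    rng = random.Random(seed)
    p_parse = 2 ** (a - r)   # a grammar parses a target sentence
    p_target = 1 / 2 ** a    # a parsing grammar is in the cluster
    total = 0
    for _ in range(runs):
        sentences = 0
        while True:
            sentences += 1
            if rng.random() < p_parse:      # G_curr parses: retain it
                continue
            if rng.random() < p_parse:      # G' parses: adopt it ...
                if rng.random() < p_target: # ... and it may be in the cluster
                    break
        total += sentences
    return total / runs

print(tla_minus_sentences(5, 4, 2))  # about 1 / (0.75 * 0.25 * 0.25), i.e. ~21
```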
