
1 Introduction to Computational Natural Language Learning
Linguistics 79400 (Under: Topics in Natural Language Processing)
Computer Science 83000 (Under: Topics in Artificial Intelligence)
The Graduate School of the City University of New York, Fall 2001
William Gregory Sakas
Hunter College, Department of Computer Science
Graduate Center, PhD Programs in Computer Science and Linguistics, The City University of New York

2 Meeting 1 (Overview): Today’s agenda:
- Why computationally model language learning?
- Linguistics, state space search and definitions
- Early (classic) computational approaches
- Gold: language can’t be learned theorem
- Angluin: oh yes it can
- Artificial Neural Networks: an introduction
- Tlearn software demonstration (if time)

3 Explicitness of the computational model can ground linguistic theories: "...it may be necessary to find out how language learning could work in order for the developmental data to tell us how it does work." (Pinker, 1979)
Can natural language grammar be modeled by X? Only if X is both descriptively adequate (predicts perceived linguistic phenomena) and explanatorily adequate (explains how the phenomena come to be) (Bertolo, MIT Encyclopedia of Cognitive Science).
If a computational model demonstrates that some formally defined class of models cannot be learned, X had better fall outside of that class, regardless of its descriptive adequacy.

4 Generative Linguistics
phrase structure (PS) grammar - a formalism based on rewrite rules which are recursively applied to yield the structure of an utterance.
transformational grammar - sentences have (at least) two phrase structures: an original or base-generated structure and the final or surface structure. A transformation is a mapping from one phrase structure to another.
principles and parameters - all languages share the same principles, with a finite number of sharply delineated differences, or parameters.
NON-generative linguistics: see Elman, "Language as a dynamical system."
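To make "rewrite rules recursively applied" concrete, here is a minimal Python sketch of PS-rule rewriting; the toy grammar and vocabulary are invented for illustration, not taken from the course.

```python
import random

# hypothetical toy PS grammar: nonterminal -> list of possible expansions
PS_RULES = {
    "S":  [["NP", "VP"]],
    "NP": [["she"], ["the", "N"]],
    "N":  [["girl"], ["apple"]],
    "VP": [["V"], ["V", "NP"]],
    "V":  [["walked"], ["ate"]],
}

def rewrite(symbol):
    """Recursively apply rewrite rules, yielding a bracketed structure."""
    if symbol not in PS_RULES:               # terminal: an actual word
        return symbol
    expansion = random.choice(PS_RULES[symbol])
    return [symbol] + [rewrite(s) for s in expansion]

print(rewrite("S"))   # e.g. ['S', ['NP', 'she'], ['VP', ['V', 'walked']]]
```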

5 Syntax acquisition can be viewed as a state space search:
- nodes represent grammars, including a start state and a target state.
- arcs represent a possible change from one hypothesized grammar to another.
[Diagram: grammars G_0, G_2, G_3, G_4, G_5, G_6, and G_targ as nodes in the search space.]
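As a concrete rendering of the search framing, a minimal Python sketch: the node labels echo the diagram’s G_0 ... G_targ, but the particular arcs and the breadth-first strategy are assumptions of this example.

```python
from collections import deque

# hypothetical arcs between hypothesized grammars (labels echo the diagram)
ARCS = {
    "G0": ["G2", "G3"],
    "G2": ["G5", "G6"],
    "G3": ["G4", "G5"],
    "G5": ["G_targ"],
    "G4": [], "G6": [], "G_targ": [],
}

def search(start, target):
    """Breadth-first search for a path of grammar changes to the target."""
    frontier, seen = deque([[start]]), {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == target:
            return path
        for g in ARCS[path[-1]]:
            if g not in seen:
                seen.add(g)
                frontier.append(path + [g])

print(search("G0", "G_targ"))   # ['G0', 'G2', 'G5', 'G_targ']
```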

6 Gold’s grammar enumeration learner (1967)
[Diagram: the learner steps through the enumeration G_0, G_1, G_2, G_3, ..., G_targ, staying at G_i while s ∈ L(G_i) and moving on when s ∉ L(G_i).]
where s is a function that returns the next sentence from the input sample being fed to the learner, and L(G_i) is the language generated by grammar G_i.
Two points:
- The learner is error-driven.
- Error-driven learners converge on the target in the limit.
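A minimal Python sketch of this error-driven enumeration learner. The enumeration over the toy alphabet {a} and the input sample are invented for illustration, and the enumeration is assumed to contain the target:

```python
def enumeration_learner(enumeration, text):
    """Yield the learner's hypothesis after each sentence of the input text.

    enumeration -- list of (grammar_name, language_as_set): G0, G1, G2, ...
    text        -- iterable of sentences s from the target language
    """
    i = 0                                  # start by conjecturing G0
    for s in text:
        while s not in enumeration[i][1]:  # error-driven: move only on error
            i += 1
        yield enumeration[i][0]

# toy enumeration: L(Gn) = {a^1, ..., a^(n+1)}; the target here is G2
enum = [(f"G{n}", {"a" * k for k in range(1, n + 2)}) for n in range(5)]
for hyp in enumeration_learner(enum, ["a", "aaa", "aa", "aaa"]):
    print(hyp)   # G0, G2, G2, G2 -- converges on the target in the limit
```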

7 Learnability - under what conditions is learning possible?
Feasibility - is acquisition possible within a reasonable amount of time and/or with a reasonable amount of work?
A class of grammars H is learnable iff ∃ a learner such that ∀ G ∈ H and ∀ (fair) input sample generable by G, the learner converges on G.

8 An early learnability result (Gold, 1967)
Exposed to input strings of an arbitrary target language L_targ = L(G_targ), where G_targ ∈ H, it is impossible to guarantee that a learner can converge on G_targ if H is any class in the Chomsky hierarchy. Moreover, no learner is uniformly faster than one that executes simple error-driven enumeration of languages.
H - the hypothesis space, the set of grammars that may be hypothesized by the learner.

9 The Overgeneralization Hazard
[Diagram: nested languages L(G_i) ⊂ L(G_k) ⊂ L(G_m) ⊂ L(G_o), with example sentences such as "Walked.", "She walked.", "She ate.", "She eated.", "Eated.", and "Walked she." placed in successively larger languages.]

10 If H = an infinite language L(G_i) together with the infinite set of finite languages included in it (e.g. L(G_k) ⊂ L(G_i)), then H is unlearnable.
Such an H ⊆ L_reg, so L_reg is unlearnable.
L_reg ⊂ L_cf ⊂ L_cs ⊂ L_re, so no class of languages in the Chomsky hierarchy is learnable.
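A toy sketch of why such a class defeats any learner. The learner_step strategy below is a hypothetical stand-in; the adversary’s move (freeze the input once the learner conjectures the infinite language) mirrors the overgeneralization argument:

```python
# Toy class: every finite language {a^1 ... a^n} plus the infinite {a^k : k >= 1}.
# The adversary presents input so the learner is either wrong or never settles.

def adversary(learner_step, rounds=8):
    shown_max = 1
    for t in range(rounds):
        hyp = learner_step("a" * shown_max)
        if hyp == "INF":
            # freeze: the text now enumerates a FINITE language, so "INF" is wrong
            print(t, hyp, "<- adversary repeats old strings; learner stays wrong")
        else:
            shown_max += 1   # keep extending; the text looks like the infinite language
            print(t, hyp, f"<- adversary presents a^{shown_max}")

# hypothetical learner: bets on the infinite language after 5 distinct lengths
seen = set()
def learner_step(s):
    seen.add(len(s))
    return "INF" if len(seen) >= 5 else f"FIN_{max(seen)}"

adversary(learner_step)
```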

11 Gold’s enumeration learner is as fast as any other learner
Assume there exists a rival learner that converges earlier than the enumeration learner: the rival arrives at the target at time i, the enumerator at time j (i < j). At time j, the enumeration learner had to be conjecturing SOME grammar consistent with the input up to that point. If the target had happened to be that grammar, the enumerator would have been correct and the rival incorrect. Thus, for every language on which the rival converges faster than the enumerator, there is a language for which the reverse is true.

12 Corollary: Language just can't be learned ;-)

13 The class of human languages must intersect the Chomsky hierarchy so that it does not coincide with any other class that properly includes any class in the hierarchy.
[Diagram: L_human cross-cutting the nested classes L_reg ⊂ L_cf ⊂ L_cs ⊂ L_re.]

14 Angluin’s Theorem (1980)
A class of grammars H is learnable iff for every language L_i = L(G_i), G_i ∈ H, there exists a finite subset D ⊆ L_i such that no other language L(G), G ∈ H, includes D and is included in L_i.
[Diagram: D nested inside L(G), which is nested inside L(G_i).] If such a language L(G) can be generated by a grammar in H, H is not learnable!
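The tell-tale condition can be checked mechanically for a tiny class of finite languages. The class H below is a made-up example; the brute-force subset search is only feasible at toy scale:

```python
from itertools import chain, combinations

def subsets(s):
    # all subsets of s, smallest first
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def has_telltale(Li, H):
    # look for a finite D ⊆ Li with no other L in H satisfying D ⊆ L ⊊ Li
    for D in map(set, subsets(Li)):
        if not any(D <= L and L < Li for L in H):
            return True
    return False

# hypothetical class: three nested finite languages over {"a"}
H = [frozenset({"a"}),
     frozenset({"a", "aa"}),
     frozenset({"a", "aa", "aaa"})]
print(all(has_telltale(L, H) for L in H))   # True: every L has a tell-tale
```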

15 Artificial Neural Networks: a brief introduction
[Diagrams of three network architectures: a) fully recurrent, b) feedforward, c) multi-component.]

16 How can we implement the AND function?
[Diagram: a unit receiving input activations, with a threshold node and a bias node. If these inputs are great enough, the unit fires; that is to say, a positive activation occurs at the unit’s output.]

17 How can we implement the AND function? We want an artificial neuron to implement this function. First we must decide on representation. Possible inputs: 1, 0; possible outputs: 1, 0.
Boolean AND:
unit inputs    unit output
1 1            1
1 0            0
0 1            0
0 0            0

18 [Diagram: a first try at the unit, with connection weights of 1 and a table of the resulting net value for each input pair. Oooops.]
net = Σ activations arriving at the threshold node

19 STEP activation function
f(x) = 1 if x > 0
f(x) = 0 if x ≤ 0
[Plot: f(net) as a step function of net.]
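Putting slides 17-19 together, here is a minimal Python sketch of a threshold unit that computes Boolean AND. The weights (1, 1) and bias weight (-1) are one workable choice consistent with the numbers on the slides, not the only one.

```python
def step(x):
    # STEP activation from slide 19
    return 1 if x > 0 else 0

def and_unit(a1, a2, w1=1, w2=1, bias=-1):
    # net = sum of activations arriving at the threshold node
    net = a1 * w1 + a2 * w2 + bias
    return step(net)

for inputs in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(inputs, "->", and_unit(*inputs))   # fires only for (1, 1)
```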

20 .75.5 1.25 f(net 9 ).777.3 1.6667 1.75 a8a8 a7a7 a9a9 w 79 w 89 = a 7 (w 79 ) = 1(.75) =.75 = a 8 (w 89 ) =.3(1.6667) =.5 net 9 = Σ j a j (w j9 ) =.3(1.6667) + 1(.75) = 1.25 = 1 / (1+ e (-net) ) =. 777.8 w 91.6216 a1a1

