Introduction to Computational Natural Language Learning
Linguistics 79400 (Under: Topics in Natural Language Processing)
Computer Science 83000 (Under: Topics in Artificial Intelligence)
The Graduate School of the City University of New York, Fall 2001
William Gregory Sakas
Hunter College, Department of Computer Science
Graduate Center, PhD Programs in Computer Science and Linguistics
The City University of New York

Meeting 1 (Overview): Today's agenda:
- Why computationally model language learning?
- Linguistics, state space search and definitions
- Early (classic) computational approaches
- Gold: the "language can't be learned" theorem
- Angluin: oh yes it can
- Artificial Neural Networks: an introduction
- Tlearn software demonstration (if time)

Explicitness of the computational model can ground linguistic theories: "...it may be necessary to find out how language learning could work in order for the developmental data to tell us how it does work." (Pinker, 1979)
Can natural language grammar be modeled by X? Only if X is both descriptively adequate (predicts perceived linguistic phenomena) and explanatorily adequate (explains how the phenomena come to be) (Bertolo, MIT Encyclopedia of Cognitive Science).
If a computational model demonstrates that some formally defined class of models cannot be learned, then X had better fall outside of that class, regardless of its descriptive adequacy.

Generative Linguistics
- phrase structure rule (PS) grammar: a formalism based on rewrite rules which are recursively applied to yield the structure of an utterance.
- transformational grammar: sentences have (at least) two phrase structures, an original or base-generated structure and the final or surface structure. A transformation is a mapping from one phrase structure to another.
- principles and parameters: all languages share the same principles, with a finite number of sharply delineated differences, or parameters.
NON-generative linguistics: see Elman, "Language as a dynamical system."

Syntax acquisition can be viewed as a state space search:
- nodes represent grammars, including a start state and a target state.
- arcs represent a possible change from one hypothesized grammar to another.
[Figure: a graph of grammar states G_0, G_2, G_3, G_4, G_5, G_6, ... leading to G_targ]

Gold's grammar enumeration learner (1967)
[Figure: the learner moves through an enumeration G_0 → G_1 → G_2 → G_3 → ... → G_targ, staying at G_i while s ∈ L(G_i) and moving on when s ∉ L(G_i)]
where s is a function that returns the next sentence from the input sample being fed to the learner, and L(G_i) is the language generated by grammar G_i.
Two points:
- The learner is error-driven.
- Error-driven learners converge on the target in the limit.
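As a concrete illustration, here is a minimal sketch (Python, not from the original slides) of an error-driven enumeration learner over a toy hypothesis space; the grammars, represented simply as sets of sentences, are hypothetical examples.

```python
# Minimal sketch of Gold-style error-driven enumeration learning.
# Grammars are represented (hypothetically) as finite sets of sentences;
# the learner walks a fixed enumeration and only moves on when its
# current grammar fails to generate an observed sentence.

def enumeration_learner(enumeration, text):
    """Yield the learner's conjecture (an index into `enumeration`)
    after each sentence of `text`."""
    i = 0
    for s in text:
        while s not in enumeration[i]:   # error-driven: change only on failure
            i += 1
        yield i

# Toy hypothesis space: G_0 .. G_targ (each grammar given as the language it generates).
enumeration = [
    {"walked"},                              # G_0
    {"walked", "she walked"},                # G_1
    {"walked", "she walked", "she ate"},     # G_2 = target
]
text = ["walked", "she walked", "she ate", "she walked", "she ate"]

for step, guess in enumerate(enumeration_learner(enumeration, text), 1):
    print(f"after sentence {step}: conjecture G_{guess}")
# The conjecture stabilizes at G_2: convergence in the limit on this text.
```

In the worst case the learner may have to pass through every earlier grammar in the enumeration, which is why enumeration is slow in practice; the later slide on speed explains why it nevertheless cannot be uniformly beaten.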

Learnability: under what conditions is learning possible?
Feasibility: is acquisition possible within a reasonable amount of time and/or with a reasonable amount of work?
A class of grammars H is learnable iff there exists a learner such that for every G ∈ H and every (fair) input sample generable by G, the learner converges on G.

An early learnability result (Gold, 1967): exposed to input strings of an arbitrary target language L_targ = L(G_targ) where G_targ ∈ H, it is impossible to guarantee that a learner can converge on G_targ if H is any class in the Chomsky hierarchy. Moreover, no learner is uniformly faster than one that executes simple error-driven enumeration of languages.
(H, the hypothesis space, is the set of grammars that may be hypothesized by the learner.)

The Overgeneralization Hazard
[Figure: a family of nested languages L(G_i), L(G_k), L(G_m), L(G_o), with example sentences "Walked.", "She walked.", "She ate.", "She eated.", "Eated.", "Walked she."]

If H contains an infinite language L(G_i) together with an infinite set of finite languages included in it, then H is unlearnable.
Such an H is contained in L_reg, so L_reg is unlearnable.
Since L_reg ⊂ L_cf ⊂ L_cs ⊂ L_re, no class of languages in the Chomsky hierarchy is learnable.
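To see the hazard concretely, here is a small sketch (again Python, with hypothetical toy languages): if the enumeration lets the learner conjecture the infinite language before a finite sublanguage, an error-driven learner fed a text for the finite target never receives a contradicting sentence and so never reaches the target.

```python
# Sketch of the overgeneralization trap: the class contains the infinite
# language L_inf = {a, aa, aaa, ...} and its finite sublanguages L_n.
# Membership is given by predicates so the infinite language can be represented.

def L_inf(s):          # the infinite language a+
    return set(s) == {"a"} and len(s) >= 1

def L_n(n):            # finite sublanguage {a, aa, ..., a^n}
    return lambda s: set(s) == {"a"} and 1 <= len(s) <= n

# An enumeration that (fatally) places the infinite language first.
enumeration = [("L_inf", L_inf), ("L_3", L_n(3))]

target_text = ["a", "aa", "aaa"] * 5     # a fair text for the finite target L_3
i = 0
for s in target_text:
    if not enumeration[i][1](s):         # error-driven: move only on error
        i += 1
print("final conjecture:", enumeration[i][0])
# Prints "L_inf": every sentence of L_3 is also in L_inf, so no error ever
# occurs and the learner stays overgeneralized forever.
```

This little demonstration is not itself a proof of Gold's theorem, which covers every possible learner, but it shows the overgeneralization hazard the proof exploits.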

Gold's enumeration learner is as fast as any other learner: assume there exists a rival learner that converges earlier than the enumeration learner. The rival arrives at the target at time i, the enumerator at time j (i < j). At time j, the enumeration learner must have been conjecturing SOME grammar consistent with the input up to that point. If the target had happened to be that grammar, the enumerator would have been correct and the rival incorrect. Thus, for every language on which the rival converges faster than the enumerator, there is a language for which the reverse is true.

Corollary: Language just can't be learned ;-)

The class of human languages must intersect the Chomsky hierarchy in such a way that it does not coincide with, or properly include, any class in the hierarchy.
[Figure: L_human cutting across the nested classes L_reg ⊂ L_cf ⊂ L_cs ⊂ L_re]

Angluin's Theorem (1980): a class of grammars H is learnable iff for every language L_i = L(G_i), G_i ∈ H, there exists a finite subset D ⊆ L_i such that no other language L(G), G ∈ H, includes D and is included in L_i.
[Figure: D ⊆ L(G) ⊆ L(G_i); if such an intermediate language can be generated by a grammar in H, H is not learnable!]
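For intuition, here is a brute-force sketch (Python, with a hypothetical toy class) that searches for such a finite subset D, often called a telltale, for each language in a small, explicitly listed class of finite languages; for classes containing infinite languages this kind of exhaustive check is of course not possible.

```python
from itertools import combinations

# Brute-force search for an Angluin-style "telltale" subset D of each L_i:
# D must be finite, D ⊆ L_i, and no other language L_j in the class may
# satisfy D ⊆ L_j ⊊ L_i (i.e. sit strictly between D and L_i).

def telltale(L_i, others):
    elems = sorted(L_i)
    for size in range(len(elems) + 1):
        for D in combinations(elems, size):
            D = set(D)
            if not any(D <= L_j and L_j < L_i for L_j in others):
                return D
    return None

# Hypothetical toy class of (finite) languages.
languages = [{"a"}, {"a", "b"}, {"a", "b", "c"}]
for i, L_i in enumerate(languages):
    others = [L for j, L in enumerate(languages) if j != i]
    print(f"L_{i} = {L_i}: telltale = {telltale(L_i, others)}")
# For L_2 = {a, b, c} the smallest telltale found is {"c"}: it is contained
# in neither proper subset {a} nor {a, b}, so those languages are ruled out.
```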

Artificial Neural Networks: a brief introduction
[Figure: three network architectures, a) fully recurrent, b) feedforward, c) multi-component]

How can we implement the AND function?
[Figure: a unit with input activations, a bias node, and a threshold node. If the inputs arriving at the threshold node are great enough, the unit fires; that is to say, a positive activation occurs at its output.]

How can we implement the AND function? We want an artificial neuron to implement this function. First we must decide on representation: possible inputs are 1 and 0; possible outputs are 1 and 0.
Boolean AND (unit inputs, unit output):
  0  0  ->  0
  0  1  ->  0
  1  0  ->  0
  1  1  ->  1

A first attempt, with a weight of 1 on each input:
net = ∑ of the activations arriving at the threshold node.
[Figure: unit inputs with weights of 1, unit output. Oooops: these weights do not yield AND.]

STEP activation function:
f(net) = 1 if net > 0
f(net) = 0 if net ≤ 0
[Figure: the unit computes net = Σ of its weighted inputs and outputs f(net).]
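Putting the pieces together, here is a minimal sketch (Python, with hypothetically chosen weights and a bias, not taken from the original slides) of a threshold unit that implements Boolean AND using the step activation above.

```python
# A single threshold unit computing Boolean AND.
# Weights of 1 on each input plus a bias weight of -1.5 (a standard but
# hypothetical choice): net exceeds 0 only when both inputs are 1.

def step(net):
    return 1 if net > 0 else 0

def and_unit(x1, x2, w1=1.0, w2=1.0, bias=-1.5):
    net = x1 * w1 + x2 * w2 + bias   # the bias node contributes a constant input
    return step(net)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", and_unit(x1, x2))
# 0 0 -> 0, 0 1 -> 0, 1 0 -> 0, 1 1 -> 1
```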

Computing the activation of unit 9 from units 7 and 8:
a_7 = 1, a_8 = 0.3, w_79 = 0.75, w_89 = 1.6667
contribution from unit 7: a_7 (w_79) = 1(0.75) = 0.75
contribution from unit 8: a_8 (w_89) = 0.3(1.6667) = 0.5
net_9 = Σ_j a_j (w_j9) = 0.3(1.6667) + 1(0.75) = 1.25
f(net_9) = 1 / (1 + e^(-net_9)) = 1 / (1 + e^(-1.25)) ≈ 0.78
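The same computation in code, as a minimal sketch (Python; the activations and weights are those of the worked example above, and the logistic output is rounded):

```python
import math

# Weighted-sum-plus-sigmoid unit, reproducing the worked example:
# unit 9 receives activations from units 7 and 8.

def logistic(net):
    return 1.0 / (1.0 + math.exp(-net))

activations = {7: 1.0, 8: 0.3}            # a_7, a_8
weights = {(7, 9): 0.75, (8, 9): 1.6667}  # w_79, w_89

net_9 = sum(activations[j] * weights[(j, 9)] for j in activations)
print("net_9 =", round(net_9, 4))               # approximately 1.25
print("f(net_9) =", round(logistic(net_9), 4))  # approximately 0.777
```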