Psych 156A/ Ling 150: Acquisition of Language II Lecture 13 Learning Biases

Announcements. Please pick up previous assignments if you haven’t already (HW1, HW2, Midterm). HW3 is due 5/25/10. Review questions for learning biases are available.

Summary from last time: Poverty of the Stimulus and Learning Strategies. Poverty of the stimulus: Children will often be faced with multiple generalizations that are compatible with the language data they encounter. In order to learn their native language, they must choose the correct generalizations. [Diagram: items encountered vs. items in English vs. items not in English]

Summary from last time: Poverty of the Stimulus and Learning Strategies. Claim of prior (innate) knowledge: Children only seem to make the right generalization. This suggests something biases them toward that generalization over other possible generalizations. Importantly, that something isn’t available in the data itself. It is knowledge they must already have in order to succeed at learning language. [Diagram: items encountered vs. items in English vs. items not in English]

Summary from last time: Poverty of the Stimulus and Learning Strategies. One learning bias: Experimental research on artificial languages (Gerken 2006) suggests that children prefer the more conservative generalization compatible with the data they encounter. [Diagram: data, contained within the less general hypothesis, contained within the more general hypothesis]

Learning Biases. “Innate capacities may take the form of biases or sensitivities toward particular types of information inherent in environmental events such as language, rather than a priori knowledge of grammar itself.” - Seidenberg (1997). Example: Children seem able to calculate transitional probabilities across syllables (Saffran, Aslin & Newport 1996). Example: Adults seem able to calculate transitional probabilities across grammatical categories (Thompson & Newport 2007).
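As a concrete aside (a minimal sketch in Python, not from the slides): a transitional probability is just a conditional frequency, TP(B given A) = frequency(AB) / frequency(A). The three-syllable “words” below are invented for illustration, in the style of the Saffran et al. stimuli.

```python
import random
from collections import Counter

def transitional_probs(syllables):
    """TP(B | A) = count of 'A B' / count of 'A', over adjacent syllables."""
    pairs = Counter(zip(syllables, syllables[1:]))
    firsts = Counter(syllables[:-1])
    return {(a, b): n / firsts[a] for (a, b), n in pairs.items()}

# Invented three-syllable "words", concatenated into a pause-free stream:
# TPs are high (here 1.0) inside a word, low (~1/3) across word boundaries.
words = [["pa", "bi", "ku"], ["go", "la", "tu"], ["da", "ro", "pi"]]
stream = [syl for w in random.choices(words, k=300) for syl in w]

tps = transitional_probs(stream)
print(round(tps[("pa", "bi")], 2))  # within word: 1.0
print(round(tps[("ku", "go")], 2))  # across a word boundary: ~0.33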

But is it always just statistical information of some kind? Gambell & Yang (2006) found that tracking transitional probabilities across syllables yields very poor word segmentation on realistic English data (though Pearl et al. found that more sophisticated statistical learning did much better). Other learning strategies, like the Unique Stress Constraint and algebraic learning, did far better. These other learning strategies were not statistical in nature - they did not use probabilistic information available in the data.

Peña et al. 2002: Experimental Study. Goal: examine the relation between statistical learning mechanisms and non-statistical learning mechanisms (like algebraic learning). Adult learners’ tasks on an artificial language: (1) word segmentation; (2) generalization about words in the language (somewhat similar to categorization). Example: BE__GA is a type of word in this language.

Peña et al. 2002: Experimental Study. The artificial language: an “AXC language” with syllable classes A, X, C. Generalization: A perfectly predicts C, so A_C is a word frame in the language: pu_ki, be_ga, ta_du. Intervening syllable X: _ra_, _li_, _fo_. Sample stream: pu ra ki be li ga ta fo du pu fo ki ta li du be ra ga …

Peña et al. 2002: Experimental Study. The artificial language: the “AXC language”. Note: transitional probability information is not informative here. Within a word, TrProb = 1/3 ≈ .333 (each A syllable is followed by any of the three X syllables, and each X by any of the three C syllables). Across a word boundary, TrProb = .5, so TrProb is actually higher at word boundaries. Only non-adjacent syllables are informative about what words are in the language: the non-adjacent syllable probability from A to C is 1. Sample stream: pu ra ki be li ga ta fo du pu fo ki ta li du be ra ga …
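To make these numbers concrete, here is a minimal simulation sketch (this is not Peña et al.’s actual stimulus-generation code; the constraint that consecutive words never share a frame is an assumption chosen to reproduce the boundary TrProb of .5 above):

```python
import random
from collections import Counter

frames = {"pu": "ki", "be": "ga", "ta": "du"}  # A perfectly predicts C
fillers = ["ra", "li", "fo"]                   # intervening X syllables

# Build an unsegmented stream of AXC words; consecutive words never share
# the same A_C frame (assumption: after "ki", the next word starts with
# "be" or "ta", giving the boundary TrProb of .5 on the slide).
stream, prev = [], None
for _ in range(5000):
    a = random.choice([f for f in frames if f != prev])
    stream += [a, random.choice(fillers), frames[a]]
    prev = a

pairs = Counter(zip(stream, stream[1:]))
firsts = Counter(stream[:-1])

def tp(a, b):
    """Adjacent transitional probability TP(b | a)."""
    return pairs[(a, b)] / firsts[a]

skips = Counter(zip(stream, stream[2:]))  # syllables one position apart

print(round(tp("pu", "ra"), 2))                      # within word: ~0.33
print(round(tp("ki", "be"), 2))                      # across boundary: ~0.5
print(round(skips[("pu", "ki")] / firsts["pu"], 2))  # non-adjacent A..C: 1.0
```

With a long enough stream, the printed values settle near 1/3, 1/2, and 1.0, matching the slide.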

First Question: Good word segmentation? After a 10-minute familiarization period, can adults distinguish words from part-words? Remember: transitional probability won’t help - it’ll bias them the wrong way. Word: pu ra ki. TrProb(pura) = 1/3 and TrProb(raki) = 1/3, so TrProb(puraki) = TrProb(pura) × TrProb(raki) = 1/3 × 1/3 = 1/9. Part-word: ra ki be. TrProb(raki) = 1/3 and TrProb(kibe) = 1/2, so TrProb(rakibe) = TrProb(raki) × TrProb(kibe) = 1/3 × 1/2 = 1/6 (higher than 1/9).
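A quick check of this arithmetic (an illustrative snippet, not part of the study; the TP values are the ones given on this slide):

```python
# A sequence's score is the product of its adjacent transitional
# probabilities; values below are taken from the slide.
tp = {("pu", "ra"): 1/3, ("ra", "ki"): 1/3, ("ki", "be"): 1/2}

def score(syllables):
    p = 1.0
    for pair in zip(syllables, syllables[1:]):
        p *= tp[pair]
    return p

print(score(["pu", "ra", "ki"]))  # word:      1/9 ~ 0.111
print(score(["ra", "ki", "be"]))  # part-word: 1/6 ~ 0.167 (higher!)
```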

First Question: Good word segmentation? Adults prefer real words to part-words that they actually heard. This means they can unconsciously track the non-adjacent probabilities of the AXC language and identify the words.

Next Question: Good generalization about words? Do adults make the generalization about which words belong in the language (e.g., that words of the language can take the form pu_ki)? Compare: rakibe (a part-word heard during training) vs. pubeki (a word that follows the pattern, but was not heard during training).

Next Question: Good generalization about words? Adults have no preference between part-words that they actually heard and real words that follow the generalization about words in the language, but which they didn’t actually hear. This means they can’t use the non-adjacent probabilities of the AXC language to identify properties of the words in general.

What’s going on? “We conjecture that this reflects the fact that the discovery of components of a stream and the discovery of structural regularities require different sorts of computations…the process of projecting generalizations…may not be statistical in nature.” - Peña et al. (2002)

Prediction for Different Types of Computation. “…it is the type of signal being processed rather than the amount of familiarization that determines the type of computation in which participants will engage…changing a signal even slightly may induce a change in computation.” - Peña et al. (2002). Types of computation: statistical, algebraic.

New Stimuli: Stimulating Algebraic Computation? A 10-minute familiarization period with 25 ms (subliminal) gaps after each word. If word segmentation is already accomplished, subjects will be free to engage their algebraic computation. This should allow them to succeed at identifying the properties of words in the artificial language (e.g., pu_ki, be_ga, ta_du), since this kind of structural regularity is hypothesized to be found by algebraic computation.

Question: Good generalization about words? Adults prefer real words that follow the generalization about words in the language, but which they didn’t actually hear, over part-words they did hear. This means they can use the non-adjacent probabilities of the AXC language to identify properties of the words in general. They make the structural generalization.

Prediction: Algebraic vs. Statistical Idea: Subjects are really using a different kind of computation (algebraic) because of the nature of the input. Specifically, the input is already subliminally segmented for them, so they don’t need to engage their statistical computation abilities to accomplish that. Instead, they are free to (unconsciously) notice more abstract properties via algebraic computation. Prediction 1: If the words are not segmented subliminally, statistical computation will be invoked. It doesn’t matter if subjects hear a lot more data. Their performance on preferring a real word they didn’t hear over a part-word they did hear will not improve.

Question: Good generalization about words? If given 30 minutes of training on the unsegmented artificial language, do adults fail to make the generalization even though they have a lot more data? Compare: rakibe (a part-word heard during training, a lot!) vs. pubeki (a word that follows the pattern, but was not heard during training).

Question: Good generalization about words? If given 30 minutes of training on the unsegmented artificial language, adults still prefer part-words that they actually heard over real words that follow the generalization about words in the language, but which they didn’t actually hear. They can’t make the generalization: Prediction 1 is supported.

Prediction: Algebraic vs. Statistical Idea: Subjects are really using a different kind of computation (algebraic) because of the nature of the input. Specifically, the input is already subliminally segmented for them, so they don’t need to engage their statistical computation abilities to accomplish that. Instead, they are free to notice more abstract properties via algebraic computation. Prediction 2: If the words are segmented subliminally, algebraic computation will be invoked. It doesn’t matter if subjects hear a lot less data. They will still prefer a real word they didn’t hear over a part-word they did hear.

Question: Good generalization about words? If given 2 minutes of training on the segmented artificial language, do adults make the generalization even though they have a lot less data? Compare: rakibe (a part-word heard during training) vs. pubeki (a word that follows the pattern, but was not heard during training).

Question: Good generalization about words? If given 2 minutes of training on the segmented artificial language, adults prefer real words that follow the generalization about words in the language, but which they didn’t actually hear, over part-words that they actually heard. They still make the generalization: Prediction 2 is supported.

Peña et al. (2002): Summary While humans may be able to compute powerful statistical relationships among the language data they’re exposed to, this may not be enough to capture all the linguistic knowledge humans come to possess. In particular, learning structural regularities (like structural properties of words) may require a non-statistical learning mechanism, perhaps algebraic computation. Different kinds of computation can be cued in learners based on the data at hand - this is a learning bias (to use a particular computation given particular data). Statistical computation was cued by the need to group and cluster items together. Algebraic computation was cued once items were already identified, and generalizations had to be made among the items.

What kind of things can statistical computation keep track of? Idea: “Learners might be able to compute certain types of statistical regularities, but not others.” - Newport & Aslin (2004). Important: an AXC syllable language (with a statistical regularity between the 1st and 3rd syllables of a word, like the one Peña et al. used) does not naturally occur in real languages. What kind of non-adjacent regularities do real languages actually exhibit? Can humans reliably segment these kinds of languages using statistical computation?

Naturally occurring non-adjacent regularities. Example of a non-adjacent dependency: between individual segments (sounds). Semitic languages build words from consonantal “stems”, where vowels are inserted to make different words. Arabic: k-t-b = “write”; kataba = “he wrote”; yaktubu = “he writes”; kitaab = “book”; maktab = “office”.

Non-adjacent segment regularities: consonants. Newport & Aslin (2004): an AXCXEX segment language. Consonant frames: p_g_t, d_k_b; filler vowels in IPA: a, i, Q, o, u, e (the generalization about words). Subject exposure time to an artificial language made up of these kinds of words: 20 minutes. Result 1: Subjects were able to segment words based on non-adjacent segment regularities. This is similar to the result found by Peña et al. (2002) with their artificial language.

Non-adjacent segment regularities: vowels. Newport & Aslin (2004): an XBXDXF segment language. Vowel frames: _a_u_e, _o_i_Q; filler consonants: p, g, t, d, k, b (the generalization about words). Subject exposure time to an artificial language made up of these kinds of words: 20 minutes. Result 2: Subjects were again able to segment words based on non-adjacent segment regularities, so this again accords with the results found by Peña et al. (2002).
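For concreteness, a small sketch of how stimuli of both types could be assembled (the frames and filler inventories come from these slides; the realize() helper and the random filling are illustrative assumptions, not Newport & Aslin’s actual procedure):

```python
import random

consonant_frames = ["p_g_t", "d_k_b"]           # AXCXEX: consonants fixed
filler_vowels = ["a", "i", "Q", "o", "u", "e"]  # fillers per the slide

vowel_frames = ["_a_u_e", "_o_i_Q"]             # XBXDXF: vowels fixed
filler_consonants = ["p", "g", "t", "d", "k", "b"]

def realize(frame, fillers):
    """Fill each '_' slot in a frame with a randomly chosen filler segment."""
    return "".join(random.choice(fillers) if ch == "_" else ch for ch in frame)

print(realize(consonant_frames[0], filler_vowels))  # e.g. "pagut"
print(realize(vowel_frames[0], filler_consonants))  # e.g. "kadute"
```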

Newport & Aslin (2004): Summary When subjects are tested with artificial languages that reflect properties real languages have (such as statistical dependencies between non-adjacent individual sounds), they are still able to track statistical regularities. This suggests that statistical computation is likely to be something real people use to notice the statistical regularities (non-adjacent or otherwise) that real languages have. It is not just something that will only work for the regularities that have been created in a lab setting, such as those between non-adjacent syllables in artificial languages.

Questions? Be working on the review questions and HW3.