From linear sequences to abstract structures: Distributional information in infant-directed speech. Hao Wang & Toby Mintz, Department of Psychology, University of Southern California.


From linear sequences to abstract structures: Distributional information in infant-directed speech. Hao Wang & Toby Mintz, Department of Psychology, University of Southern California. This research was supported in part by a grant from the National Science Foundation (BCS ). SoCal Workshop, UCLA

Outline
Introduction
– Learning word categories (e.g., noun and verb) is a crucial part of language acquisition
– The role of distributional information
– Frequent frames (FFs)
Analyses 1 & 2: structures of FFs in child-directed speech
Conclusion and implications

Speakers’ Implicit Knowledge of Categories
Upon hearing: I saw him slich.
Hypothesizing: They slich. He sliches. Johny was sliching.
Upon hearing: The truff was in the bag.
Hypothesizing: He has two truffs. She wants a truff. Some of the truffs are here.

Distributional Information
The contexts in which a word occurs
– Words before and after the target word
– Example: the cat is on the mat
– Affixes, in morphologically rich languages
(Cartwright & Brent, 1997; Chemla et al., 2009; Maratsos & Chalkley, 1980; Mintz, 2002, 2003; Redington et al.)
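The context idea above can be made concrete with a small sketch (not the authors' code): for each word, collect the pair of words immediately before and after it. Words that share many such contexts, like "cat" and "dog" below, are candidates for the same category. The padding symbols `<s>`/`</s>` are an assumption added so edge words still get a context pair.

```python
from collections import defaultdict

def word_contexts(sentences):
    """Collect the words immediately before and after each target word.

    `sentences` is a list of token lists; sentence boundaries are padded
    so edge words still get a (before, after) context pair.
    """
    contexts = defaultdict(list)
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]
        for i in range(1, len(padded) - 1):
            contexts[padded[i]].append((padded[i - 1], padded[i + 1]))
    return contexts

corpus = [["the", "cat", "is", "on", "the", "mat"],
          ["the", "dog", "is", "on", "the", "rug"]]
ctx = word_contexts(corpus)
# "cat" and "dog" share the context (the, is), hinting at a shared category
```

On a real corpus these context lists would be compared (e.g., by overlap or cosine similarity) to cluster words into categories.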

Frequent frames (Mintz, 2003)
Two words co-occurring frequently with one word intervening.

Frame      Freq.
you__it    433
you__to    265
you__the   257
what__you  234
to__it     220
want__to   …
…
the__is    79
…

Frame you__it, Peter corpus (Bloom, 1970): 433 tokens, 93 types, 100% verbs
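As a toy illustration of the frame statistic (a sketch, not Mintz's implementation), the definition above amounts to counting every (A, _, B) trigram by its outer word pair and recording which words fill the middle slot; the `min_count` cutoff stands in for the frequency threshold applied in the actual study:

```python
from collections import Counter, defaultdict

def frequent_frames(sentences, min_count=2):
    """Count A__B frames: two words co-occurring with exactly one word between.

    Returns frame counts at or above `min_count`, plus the slot fillers
    observed for every frame.
    """
    frames = Counter()
    fillers = defaultdict(Counter)
    for sent in sentences:
        for a, x, b in zip(sent, sent[1:], sent[2:]):
            frames[(a, b)] += 1
            fillers[(a, b)][x] += 1
    frequent = {f: c for f, c in frames.items() if c >= min_count}
    return frequent, fillers

corpus = [["you", "want", "it"], ["you", "see", "it"], ["you", "put", "it"]]
frames, fillers = frequent_frames(corpus)
# the frame you__it occurs 3 times, with fillers want/see/put -- all verbs
```

Grouping the fillers of each frequent frame into one category is exactly the categorization step whose accuracy the next slide reports.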

Accuracy Results Averaged Over All Six Corpora (Mintz, 2003)

Structure of Natural Languages
In contemporary linguistics, sentences are analyzed as hierarchical structures, and word categories are defined by their positions in those structures.
But FFs are defined over linear sequences. How can they accurately capture abstract structural regularities?

Why Are FFs So Good at Categorizing Words?
Is there anything special about the structures associated with FFs?
FFs are manifestations of hierarchically coherent and consistent patterns that largely constrain the possible word categories in the target position.

Analysis 1
Corpora
– The same six child-directed speech corpora from CHILDES (MacWhinney, 2000) as in Mintz (2003)
– Labeled with dependency structures (Sagae et al., 2007)
– Speech to children before the age of 2;6
– Eve (Brown, 1973), Peter (Bloom, Hood, & Lightbown, 1974; Bloom, Lightbown, & Hood, 1975), Naomi (Sachs, 1983), Nina (Suppes, 1974), Anne (Theakston, Lieven, Pine, & Rowland, 2001), and Aran (Theakston et al., 2001)

Grammatical Relations
A dependency structure consists of grammatical relations (GRs) between the words in a sentence.
Like a phrase structure, it is a representation of structural information (Sagae et al., 2007).

Consistency of Structures of FFs
Combinations of GRs used to represent structure
– W1-W3, W1-W2, W2-W3, W1-W2-W3
Measure
– For each FF, the percentage of tokens accounted for by the 4 most frequent GR patterns
Control
– The 45 most frequent unigrams (FUs), e.g., the__
(W1, W2, W3: the three word positions of a frame)
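The consistency measure can be sketched as follows (a minimal re-implementation under assumed inputs, not the authors' code): given one GR pattern per token of a frame, compute the share of tokens covered by the 4 most frequent patterns. The GR labels in the example are hypothetical.

```python
from collections import Counter

def top_k_coverage(gr_patterns, k=4):
    """Fraction of frame tokens accounted for by the k most frequent GR patterns.

    `gr_patterns` holds one GR pattern (e.g. a W1-W3 label pair) per token.
    """
    counts = Counter(gr_patterns)
    covered = sum(c for _, c in counts.most_common(k))
    return covered / len(gr_patterns)

# hypothetical W1-W3 GR patterns for 11 tokens of one frame
tokens = ([("SUBJ", "OBJ")] * 6 + [("SUBJ", "PRED")] * 2
          + [("OBJ", "SUBJ"), ("JCT", "OBJ"), ("PRED", "SUBJ")])
coverage = top_k_coverage(tokens)  # top 4 patterns cover 10 of 11 tokens
```

A coverage near 1.0 means the frame almost always occurs in the same few structural configurations, which is the consistency claim tested in Analysis 1.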

Results
* t(5) = 26.97, p < .001

Top 4 W1-W3 GR patterns

Frame       GR of W1*   GR of W3*   Token count
what__you   2 OBJ       2 SUBJ      …
            … OBJ       2 SUBJ      46
            5 OBJ       2 SUBJ      20
            3 POBJ      2 SUBJ      5
you__to     0 SUBJ      2 INF       …
            … SUBJ      0 JCT       …
            … SUBJ      2 INF       1
            0 SUBJ      0 INF       1
what__that  0 PRED      0 SUBJ      …
            … PRED      2 DET       14
            3 OBJ       2 DET       4
            2 OBJ       2 SUBJ      4
you__it     0 SUBJ      0 OBJ       …
            … SUBJ      2 SUBJ      6
            -2 OBJ      0 OBJ       2
            -2 OBJ      2 SUBJ      1

*The word position and head position for GRs in this table are positions relative to the target word of a frame. W1’s word position is always -1, W3’s is always 1. (… marks values missing from the transcript.)

Analysis 1 Summary
Frequent frames in child-directed speech select very consistent structures, which helps categorize words accurately.
Analysis 2: the internal organization of frequent frames.

Analysis 2
– Same corpora as Analysis 1
– GRs between a word in a frame and a word outside that frame (external links), and GRs between two words within a frame (internal links)
– For each FF type, the number of links per token was computed for each word position
(Diagram: internal links; external links; links wholly outside the frame are not counted)
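The internal/external classification above can be sketched for one frame token (a toy illustration under assumed inputs, not the authors' code): given each word's dependency head as a position index, a link is internal if both ends fall inside the three-word frame, external if exactly one end does, and ignored otherwise.

```python
def frame_links(heads, start):
    """Classify dependency links for a 3-word frame at positions start..start+2.

    `heads` maps each token position to its head position (-1 for the root).
    Returns (internal, external) link counts; links with both ends outside
    the frame are not counted.
    """
    frame = range(start, start + 3)
    internal = external = 0
    for dep, head in enumerate(heads):
        if head == -1:          # the root has no incoming link
            continue
        ends_inside = (dep in frame) + (head in frame)
        if ends_inside == 2:
            internal += 1
        elif ends_inside == 1:
            external += 1
    return internal, external

# "you want it now": want is the root; you, it, and now all depend on want
heads = [1, -1, 1, 1]
# frame you__it spans positions 0..2: you->want and it->want are internal,
# now->want crosses the frame boundary and is external
counts = frame_links(heads, 0)
```

Averaging such counts over all tokens of a frame, per word position, gives the links-per-token figures reported in the Analysis 2 tables.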

Links from/to W1

Conclusion & Implications
Frequent frames, which are simple linear relations between words, achieve accurate categorization by selecting structurally consistent and coherent environments.
The third word (W3) helps FFs focus on informative structures.
This relation between a linear-order pattern and the internal structures of languages may be a cue that children use to bootstrap into syntax.

References
– MacWhinney, B. (2000). The CHILDES Project: Tools for Analyzing Talk. Mahwah, NJ: Lawrence Erlbaum Associates.
– Mintz, T. H. (2003). Frequent frames as a cue for grammatical categories in child directed speech. Cognition, 90(1).
– Sagae, K., Lavie, A., & MacWhinney, B. (2005). Automatic measurement of syntactic development in child language. In Proceedings of the ACL.
– Sagae, K., Davis, E., Lavie, A., MacWhinney, B., & Wintner, S. (2007). High-accuracy annotation and parsing of CHILDES transcripts. In Proceedings of the ACL-2007 Workshop on Cognitive Aspects of Computational Language Acquisition.
Thank you!

Pure frequent frames?

Analysis 2: mean token coverage, by corpus (Eve, Peter, Nina, Naomi, Anne, Aran), for frequent frames (W1-W3, W1-W2, W2-W3, W1-W2-W3) and bigrams (W1-W2).

Analysis 2: FF external links. Table 3: average number of links per token for frequent frames, by corpus (Eve, Peter, Nina, Naomi, Anne, Aran; plus the average), counting external links to and from W1, W2, and W3.

FF internal links: average number of links per token, by corpus (Eve, Peter, Nina, Naomi, Anne, Aran; plus the average), counting internal links W1→W2, W1→W3, W2→W1, W2→W3, W3→W1, and W3→W2.

Analysis 2: FU links. Average number of links per token for frequent unigrams, by corpus (Eve, Peter, Nina, Naomi, Anne, Aran; plus the average): external links to/from W1 and W2, and internal links W1→W2 and W2→W1.