Una Y. Chow, Stephen J. Winters. Alberta Conference on Linguistics, November 1, 2014.

Presentation transcript:

Can exemplar theory account for native listeners’ perception of intonation in English statements and questions?

• Previous studies reveal considerable variation in speech.
• Peterson & Barney (1952): frequency of F1 (x-axis) vs. frequency of F2 (y-axis) for 10 vowels (i, ɪ, ɛ, æ, ɑ, ɔ, ʊ, u, ʌ, ɝ) produced by 76 speakers.
• How do listeners perceive speech sounds given this much variance?

• Johnson (1997) proposed an exemplar theory to account for listeners’ perception of speech.
• According to this theory (Johnson, 1997; Pierrehumbert, 2001), listeners store in memory the fine phonetic details of the words (or exemplars) that they hear, including details associated with the speaker’s identity, gender, and language.
• When listeners hear a new word, they categorize it according to the stored exemplars that are most similar to it overall.
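One common way to make "most similar overall" precise is sketched below. It follows the spirit of the exemplar model that Johnson (1997) builds on Nosofsky's Generalized Context Model. The symbols are our own labels, not the slides':
- x is the new token.
- x_i is a stored exemplar.
- w_m is the attention weight on dimension m.
- c is a sensitivity constant.
- C is a category.

The original model's exact parameterization may differ.

```latex
\begin{aligned}
d_i &= \Big( \sum_{m} w_m \,(x_m - x_{i,m})^2 \Big)^{1/2} \\
a_i &= e^{-c\, d_i} \\
E_C &= \sum_{i \in C} a_i, \qquad P(C \mid x) = \frac{E_C}{\sum_{K} E_K}
\end{aligned}
```

Close exemplars contribute large activations a_i and distant ones contribute almost nothing. The category whose stored exemplars lie nearest the new token, in aggregate, therefore receives the most evidence E_C.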

• The objective of my project was to create an exemplar-based computational model that would learn to categorize English statements and questions according to how similar a sentence’s intonation pattern is to those of previously encountered sentences.
• If a similarity-based model (Johnson, 1997) can classify novel sentences at an acceptable rate on the basis of intonation alone, it could be extended to account for the human perception of intonation more generally.

• Reads in audio-recorded samples of speech sounds (in .wav format), e.g., Ann teaches history.
• Removes any silence or noise before and after the speech sound.
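The slides do not say how the silence and noise are trimmed. Below is a minimal sketch under two assumptions:
- The input is 16-bit PCM.
- A simple energy threshold separates speech from silence: frames quieter than 5% of the loudest frame's RMS count as silence or background noise.

`read_wav_mono` and `trim_silence` are hypothetical names, not the project's functions.

```python
import array
import math
import sys
import wave


def read_wav_mono(path):
    """Read a 16-bit PCM .wav file and return (samples, sample_rate).

    Hypothetical helper: multichannel files keep only the first channel.
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = array.array("h", raw)
    if sys.byteorder == "big":
        samples.byteswap()  # .wav sample data is little-endian
    return list(samples[::n_channels]), rate


def trim_silence(samples, rate, frame_ms=10, threshold_ratio=0.05):
    """Drop leading and trailing frames whose RMS is below
    threshold_ratio times the RMS of the loudest frame (assumed rule)."""
    frame_len = max(1, int(rate * frame_ms / 1000))
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    rms = [math.sqrt(sum(s * s for s in f) / len(f)) for f in frames]
    if not rms or max(rms) == 0:
        return []  # nothing but silence
    threshold = threshold_ratio * max(rms)
    loud = [i for i, r in enumerate(rms) if r >= threshold]
    start = loud[0] * frame_len
    end = min(len(samples), (loud[-1] + 1) * frame_len)
    return samples[start:end]
```

Only the edges are trimmed. Pauses inside the utterance are kept, because they fall between the first and last frames above the threshold.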

• This function analyzes the pitch contour of the input sentence for salient cues.
• In English, the pitch of the voice tends to fall at the end of a statement but to rise at the end of an echo question (Wells, 2006). For example:
  ◦ Statement: Mary has a little lamb.
  ◦ Echo question: Mary has a little lamb?
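The slides do not state how the pitch contour was computed. As a stand-in, here is a basic autocorrelation pitch estimator in pure Python. The 40 ms frames, the 10 ms hop, the 75–500 Hz search range and the 0.3 voicing threshold are illustrative assumptions, not the authors' settings. Unvoiced or silent frames come back as None; these are the gaps that the interpolation step fills.

```python
import math


def estimate_f0(frame, rate, f0_min=75.0, f0_max=500.0, voicing_threshold=0.3):
    """Estimate F0 (Hz) of one frame by autocorrelation; None if it looks unvoiced."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]
    energy = sum(s * s for s in x)
    if energy == 0:
        return None  # digital silence
    lag_min = max(1, int(rate / f0_max))
    lag_max = min(int(rate / f0_min), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Normalized autocorrelation: high when the frame repeats every `lag` samples.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None or best_r < voicing_threshold:
        return None  # no clear periodicity: treat the frame as unvoiced
    return rate / best_lag


def pitch_contour(samples, rate, frame_ms=40, hop_ms=10):
    """Return (times, f0): one estimate per hop; None marks unvoiced frames."""
    frame_len = int(rate * frame_ms / 1000)
    hop = int(rate * hop_ms / 1000)
    times, f0 = [], []
    for start in range(0, len(samples) - frame_len + 1, hop):
        times.append((start + frame_len / 2) / rate)  # frame centre, in seconds
        f0.append(estimate_f0(samples[start:start + frame_len], rate))
    return times, f0
```

Simple trackers like this are prone to octave errors. When a multiple of the true period also correlates strongly, they report half or double the true F0, which is the kind of pitch halving or doubling reported later in the results. Dedicated tools such as Praat use more robust variants of the same idea.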

• This step first fills the gaps within a pitch contour (stretches where no pitch is detected, such as voiceless consonants and pauses) by interpolation. This estimates the missing values from the surrounding pitch values to create a continuous curve.
• It then locates the nuclear tone in the sentence, that is, the last fall or rise.
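The slides give neither the interpolation method nor an exact definition of "the last fall or rise". The sketch below makes two assumptions:
- Gaps are filled by linear interpolation, with the edges held at the nearest voiced value.
- The nuclear tone is the last monotonic run whose total pitch change reaches a minimum size. The 10 Hz minimum is hypothetical; it keeps small wobbles at the very end from being mistaken for the nuclear tone.

All function names are hypothetical.

```python
def fill_gaps(f0):
    """Linearly interpolate over None values; hold edge values constant."""
    known = [i for i, v in enumerate(f0) if v is not None]
    if not known:
        raise ValueError("no voiced frames")
    filled = list(f0)
    for i, v in enumerate(f0):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = f0[right]  # leading gap: copy first voiced value
        elif right is None:
            filled[i] = f0[left]  # trailing gap: copy last voiced value
        else:
            frac = (i - left) / (right - left)
            filled[i] = f0[left] + frac * (f0[right] - f0[left])
    return filled


def monotonic_runs(f0):
    """Split a continuous contour into maximal rising (+1) and falling (-1) runs.

    Returns (start, end, direction) tuples; flat steps extend the current run.
    """
    runs = []
    start, direction = 0, 0
    for i in range(1, len(f0)):
        step = f0[i] - f0[i - 1]
        d = (step > 0) - (step < 0)
        if d == 0 or d == direction:
            continue
        if direction != 0:
            runs.append((start, i - 1, direction))
            start = i - 1  # the turning point begins the next run
        direction = d
    if direction != 0:
        runs.append((start, len(f0) - 1, direction))
    return runs


def locate_nuclear_tone(f0, min_change_hz=10.0):
    """Return the last run whose total change is at least min_change_hz (assumed rule)."""
    for start, end, direction in reversed(monotonic_runs(f0)):
        if abs(f0[end] - f0[start]) >= min_change_hz:
            return start, end, direction
    return None
```

For a falling statement contour such as [None, 120, 125, None, 135, 130, 110, 90], the filled curve is [120, 120, 125, 130, 135, 130, 110, 90]. The final fall from index 4 to index 7 is returned as the nuclear tone, with direction -1.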

• To calculate how similar a new exemplar (i.e., sentence) is to other exemplars in ‘memory’, we used the following perceptual dimensions:
  ◦ the speed of the pitch change at the nuclear tone,
  ◦ the direction of the change, and
  ◦ the timing of the nuclear tone, i.e., its position within the sentence.
• This step extracts these similarity measures from the new exemplars, e.g., for the statement Ann teaches history:
  ◦ Category = S, exemplar = e07a21S, speed = 537, direction = -1, time = 0.6
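The slides report values such as speed = 537 and time = 0.6 without defining their units. The sketch below assumes the following (`nuclear_features` is a hypothetical name):
- Speed is the absolute pitch change divided by the duration of the nuclear tone, in Hz per second.
- Direction is +1 for a rise and -1 for a fall.
- Time is the onset of the nuclear tone as a proportion of the utterance's duration.

```python
def nuclear_features(times, f0, start, end):
    """Compute assumed (speed, direction, time) features for the nuclear tone
    spanning frames start..end of a gap-free contour.

    times: frame times in seconds; f0: pitch values in Hz (same length).
    """
    duration = times[end] - times[start]
    if duration <= 0:
        raise ValueError("the nuclear tone must span at least two frames")
    change = f0[end] - f0[start]
    speed = abs(change) / duration  # assumed unit: Hz per second
    direction = (change > 0) - (change < 0)  # +1 rise, -1 fall, 0 flat
    total = times[-1] - times[0]
    time = (times[start] - times[0]) / total  # onset as a fraction of the sentence
    return {"speed": speed, "direction": direction, "time": time}
```

For a contour sampled every 100 ms that falls from 200 Hz to 150 Hz over its last 0.4 s, this gives speed = 125, direction = -1 and time = 0.6.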

• In calculating similarities, the model assigns different weights to the dimensions.
• For example, the direction of the nuclear tone (whether it is a fall or a rise) may be a better cue to sentence type than its timing. If so, direction would be weighted more heavily than timing.
• This step trains the model to find the distribution of weights across the dimensions that yields the best accuracy in categorizing new sentences.
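The slides do not say how the weights were searched. One straightforward option, assumed here, is an exhaustive grid search over weight vectors that sum to 1, keeping the best-scoring one. `score` stands for any function that returns training accuracy for a given weight vector, such as the categorization step applied to held-out training items. `weight_grid` and `best_weights` are hypothetical names.

```python
import itertools


def weight_grid(n_dims, step=0.1):
    """Yield every weight vector on a grid of the given step whose weights sum to 1."""
    n_steps = round(1 / step)
    for combo in itertools.product(range(n_steps + 1), repeat=n_dims):
        if sum(combo) == n_steps:
            yield tuple(k / n_steps for k in combo)


def best_weights(score, n_dims, step=0.1):
    """Return (weights, score) maximizing score(weights); ties keep the first found."""
    best, best_score = None, float("-inf")
    for weights in weight_grid(n_dims, step):
        s = score(weights)
        if s > best_score:
            best, best_score = weights, s
    return best, best_score
```

With three dimensions (speed, direction, time) and a step of 0.1, the grid has 66 candidates, so exhaustive search is cheap. The tie-breaking rule matters when several weightings reach the same accuracy.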

• This step tests how accurately the model categorizes statements and questions in a set of sentences that is different from the training set.
• It uses the weighted sum of the dimensions (Johnson, 1997, p. 147) to estimate which category a new sentence belongs to.
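Below is a minimal sketch of the categorization rule. It assumes the GCM-style form usually associated with Johnson (1997):
1. A weighted Euclidean distance is computed to each stored exemplar.
2. Activation decays exponentially with distance.
3. Activation is summed per category, and the largest sum wins.

The function names, the sensitivity `c` and the feature values in the usage example are illustrative assumptions, not the project's code or data.

```python
import math


def category_evidence(token, exemplars, weights, c=1.0):
    """Sum exemplar activations per category.

    token:     feature tuple for the new sentence, e.g. (speed, direction, time)
    exemplars: list of (feature_tuple, label) pairs held in 'memory'
    weights:   one attention weight per feature dimension
    c:         sensitivity; larger values make activation fall off faster
    """
    evidence = {}
    for features, label in exemplars:
        # Weighted Euclidean distance between the new token and this exemplar.
        d = math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, token, features)))
        # Activation decays exponentially with distance and is pooled by label.
        evidence[label] = evidence.get(label, 0.0) + math.exp(-c * d)
    return evidence


def categorize(token, exemplars, weights, c=1.0):
    """Return the label whose exemplars give the largest summed activation."""
    evidence = category_evidence(token, exemplars, weights, c)
    return max(evidence, key=evidence.get)


# Illustrative usage with made-up, pre-scaled features (speed, direction, time).
memory = [((0.5, -1.0, 0.6), "S"), ((0.4, -1.0, 0.7), "S"),
          ((0.6, 1.0, 0.6), "Q"), ((0.7, 1.0, 0.5), "Q")]
direction_only = (0.0, 1.0, 0.0)
```

The raw dimensions are on very different scales: a speed of 537 sits beside a direction of -1. It would therefore be sensible to standardize each dimension (e.g., z-score it over the training set) before applying the weights. Otherwise the weights partly just compensate for units.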

• To evaluate how well the model generalizes, this step uses k-fold cross-validation (Refaeilzadeh et al., 2009), where k is the number of folds.
• In k-fold cross-validation, the training and test data are kept separate within a given run. They cross over in successive runs, so each exemplar is eventually tested once and only once.
• For example, in a 3-fold cross-validation the data are split into three folds. Each fold serves once as the test set while the other two are used for training.
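Below is a minimal sketch of the fold assignment, assuming a simple round-robin split. The slides do not say how sentences were assigned to folds, and `k_fold_splits` is a hypothetical name. Every index lands in exactly one test fold, matching the "once and only once" property above.

```python
def k_fold_splits(n_items, k):
    """Yield (train_indices, test_indices) for k-fold cross-validation.

    Items are assigned to folds round-robin, so every index appears in
    exactly one test fold. Shuffle the data first if its order is not random.
    """
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and the number of items")
    folds = [list(range(f, n_items, k)) for f in range(k)]
    for f in range(k):
        test = folds[f]
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test
```

With 130 sentences and k = 10, each test fold holds 13 sentences. In practice one might also stratify the folds so that each contains a similar mix of statements and questions.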

• 40 statements and 40 echo questions per speaker: 5 dialogues × 4 sentences × 2 repetitions
• Speakers:
  ◦ One male and one female (18 years old), native speakers of Canadian English
  ◦ Recruited through the online LING 201 (Introduction to Linguistics) Research Participation System at the University of Calgary
  ◦ Received 1% credit towards their LING 201 course grades for completing the one-hour recording session

• The stimuli were recorded in the sound booth of the Phonetics Lab at the University of Calgary.
• Statements and questions were 5, 7, 9, 11, or 13 syllables long, with 4 pairs of statements and questions for each length, e.g.:
  ◦ Ann teaches history. / Ann teaches history?
  ◦ Alice went horse riding with a friend. / Alice went horse riding with a friend?
  ◦ Morris wants to visit the old mansion on Monday. / Morris wants to visit the old mansion on Monday?

• For testing, we used 10-fold cross-validation.
• Fifteen sentences showed pitch halving or doubling, so these sentences and their corresponding statements or questions were removed from the training and test data. This reduced the number of sentences of each type from 80 (2 speakers × 40) to 65.
• All 65 questions had rising intonation, but 5 of the 65 statements also had rising intonation.
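The slides do not describe how pitch halving or doubling was detected. Below is a hypothetical screening check. It assumes a pitch track with None for unvoiced frames and flags any jump of roughly an octave between consecutive voiced frames, since genuine intonation rarely moves that far from one frame to the next. The name `has_octave_jump` and the 0.2-octave tolerance are assumptions.

```python
import math


def has_octave_jump(f0, tolerance=0.2):
    """Return True if two consecutive voiced estimates differ by about an octave.

    A ratio near 2 suggests pitch doubling; near 1/2, pitch halving.
    `tolerance` is in octaves: 0.2 flags ratios of about 1.74 to 2.30
    (or their inverses). Unvoiced frames (None) are skipped, so frames on
    either side of a gap are compared directly.
    """
    voiced = [v for v in f0 if v is not None and v > 0]
    for prev, cur in zip(voiced, voiced[1:]):
        if abs(abs(math.log2(cur / prev)) - 1.0) <= tolerance:
            return True
    return False
```

Flagged sentences could then be inspected by hand before being removed, together with their statement or question counterparts.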

• With all of the weight on the direction dimension, the 10-fold cross-validation
  ◦ correctly classified 95.69% of the exemplars during training, and
  ◦ correctly categorized statements (100%) and questions (75%–100%).

• How well the model categorizes the sentences depends on the intonation patterns of the sentences as well as on the generalized weights.
• The model works well for this data set when 100% of the weight is on the direction dimension. Accuracy declines when any weight is shifted to another dimension.
• Because it relies on pitch direction alone, this model would need to be modified to deal with uptalk, a terminal rising intonation in statements (Ladd, 2008).
• It would also be expected to fail for languages that do not rely mainly on pitch direction, such as Mandarin.

• Mandarin is a tone language that uses lexical tones to distinguish word meanings.
• Some researchers (e.g., Yuan, Shih, & Kochanski, 2002) claim that Mandarin raises the pitch of the sentence as a whole to signal an echo question.
• Can exemplar theory account for the perception of intonation in Mandarin sentences?

• Johnson, K. (1997). Speech perception without speaker normalization: An exemplar model. In K. Johnson & J. W. Mullennix (Eds.), Talker variability in speech processing. San Diego: Academic Press.
• Ladd, D. R. (2008). Intonational phonology. Cambridge: Cambridge University Press.
• Pierrehumbert, J. (2001). Exemplar dynamics: Word frequency, lenition, and contrast. In J. L. Bybee & P. J. Hopper (Eds.), Frequency and the emergence of linguistic structure. Philadelphia: John Benjamins.
• Refaeilzadeh, P., Tang, L., & Liu, H. (2009). Cross-validation. In L. Liu & M. T. Özsu (Eds.), Encyclopedia of database systems. Springer.
• Wells, J. C. (2006). English intonation: An introduction. Cambridge: Cambridge University Press.
• Yuan, J., Shih, C., & Kochanski, G. (2002). Comparison of declarative and interrogative intonation in Chinese. In B. Bel & I. Marlien (Eds.), Proceedings of the Speech Prosody 2002 Conference. Aix-en-Provence: Laboratoire Parole et Langage.

This research was funded by the University of Calgary Program for Undergraduate Research Experience (PURE), awarded to Una Chow in

Thank you!
Comments? Questions?