
Recognition of Voice Onset Time for Use in Detecting Pronunciation Variation
● Project Description
● What is Voice Onset Time (VOT)?
  – Physical Realization
  – Linguistic Significance
● Motivation for studying VOT
● Methodology for automatically analyzing VOT contrasts
● Evaluation Method
● Results
● Discussion

Project Description
● Automatically distinguish whether a voiceless stop consonant is pronounced with a native or accented pronunciation based on voice onset time characteristics.
● Use data from the Tball corpus: ESL children doing oral reading tasks.
● Evaluate different methods of accomplishing this:
  – State duration measurements
  – Explicit modeling of aspiration
  – Model probability discrimination

What is VOT?
● Voice onset time is defined for stops, e.g. /p,b,t,d,k,g/.
● It is the interval between the release of closure of an articulator (the transient “burst”) and the start of voicing.
● VOT has a continuum of values:
  – When the start of voicing precedes the release of closure for a stop, the VOT takes on a negative value.
  – When the release of closure and onset of voicing are coincident, VOT is zero.
  – When voicing comes after release of closure, VOT is positive.
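The sign convention above can be sketched in a few lines of Python. This is only an illustration: the function names, the timestamps in seconds, and the 1 ms tolerance used to call two events "coincident" are assumptions, not part of the original project.

```python
def vot_ms(burst_release_s, voicing_onset_s):
    """Return VOT in milliseconds: voicing onset time minus burst release time."""
    return (voicing_onset_s - burst_release_s) * 1000.0


def vot_category(vot, tol_ms=1.0):
    """Map a VOT value (ms) to the three cases on the slide.

    tol_ms is a hypothetical tolerance for treating release and
    voicing onset as coincident.
    """
    if vot < -tol_ms:
        return "negative"   # voicing starts before the release
    if vot > tol_ms:
        return "positive"   # voicing starts after the release
    return "zero"           # release and voicing roughly coincide


# Example: burst at 0.200 s, voicing at 0.250 s gives a VOT of about 50 ms.
print(vot_category(vot_ms(0.200, 0.250)))
```

Both timestamps would have to come from hand labels or a detector; the sketch only shows the arithmetic and the sign convention.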

Physical Realization of VOT
● Stop consonants are produced with a closure of the vocal tract at a specific point, the place of articulation.
● During the closure, there is a build-up of sub-laryngeal pressure.
● When the closure is released, there is a transient burst of air, frication due to turbulence at the place of articulation, and aspiration noise from turbulence at the glottis.
● Voicing may occur before, during, or after the closure.

Linguistic Significance of VOT
● VOT distinguishes consonants with the same place of articulation (/p/ vs. /b/, /t/ vs. /d/, etc.).
● However, different languages use different VOT intervals in contrasts (e.g. “taco”, “pasta”).
● English voiceless stops: VOT = ms
● Spanish voiceless stops: VOT = near zero
● English voiced stops: VOT = near zero
● Spanish voiced stops: negative VOT (voicing before the release of closure)

Linguistic Significance Cont'd
● In English, voiceless stops have a long VOT at the beginning of a word and before stressed vowels, so aspiration is a perceptual cue to word boundaries and stress.
● Since the frication and aspiration during the VOT are due to the build-up of pressure from the lungs, VOT may correspond with emphasis.

Motivation for Studying VOT
● This study was motivated by a desire to determine whether a phone was pronounced with a non-standard pronunciation.
● Other reasons to study VOT:
  – It is an important contrastive feature
  – It gives information about stress
  – It gives information about word segmentation
  – It may give information about emphasis

Methodology
● Baseline: use duration measurements from a forced alignment.
● Insert an /h/ symbol in the transcriptions with standard pronunciation, train accordingly, and decode the test files to see if the /h/ phone is recognized.
● Cut out the phones of interest from the audio file, train separate models and a combined model, and evaluate the likelihood of the separate models w.r.t. the combined model.
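The second method (inserting /h/ in the transcriptions) could look roughly like the sketch below. The phone labels, the `(phone, is_standard)` input format, and both function names are hypothetical; the slides do not say how the transcriptions were stored or edited.

```python
VOICELESS_STOPS = {"p", "t", "k"}
ASPIRATION = "h"  # symbol used to mark aspiration (long VOT)


def insert_aspiration(labelled_phones):
    """Build a training transcription from (phone, is_standard) pairs.

    An aspiration symbol is added after every voiceless stop that was
    transcribed with standard pronunciation.
    """
    out = []
    for phone, is_standard in labelled_phones:
        out.append(phone)
        if phone in VOICELESS_STOPS and is_standard:
            out.append(ASPIRATION)
    return out


def classify_decoded(decoded):
    """Label each voiceless stop in a decoded phone sequence.

    A stop followed by the aspiration symbol counts as long VOT,
    any other stop as short VOT.
    """
    results = []
    for i, phone in enumerate(decoded):
        if phone in VOICELESS_STOPS:
            nxt = decoded[i + 1] if i + 1 < len(decoded) else None
            results.append((phone, "long" if nxt == ASPIRATION else "short"))
    return results


# "taco" with a standard /t/ and a non-standard (short VOT) /k/
train = insert_aspiration([("t", True), ("aa", True), ("k", False), ("ow", True)])
print(train)                    # ['t', 'h', 'aa', 'k', 'ow']
print(classify_decoded(train))  # [('t', 'long'), ('k', 'short')]
```

In the actual experiment the decoded sequence would come from the recognizer rather than from the training transcription; the sketch only shows the bookkeeping around it.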

Methodology (cont'd)
● The data was transcribed by ear with special symbols for non-standard pronunciations.
  – Because the data for non-standard pronunciations was sparse, the symbol for dental /t/ was included as short VOT.
● Standard 3-state HMM models
  – 4 mixtures, T-state silence model
  – Different frame rates were tested
  – Bootstrap and flat start methods were tested

Evaluation Method
● The evaluation metric used was the error rate for each class, evaluated separately.
  – This was necessary because there were far fewer instances of the non-standard pronunciations.
● When using thresholds, the point of equal error rate for both classes was used.
  – This was necessary because moving the threshold would tilt the error rate toward one class or the other.
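A minimal sketch of picking an equal-error-rate threshold is given below, assuming the measurement is a duration in milliseconds and that values at or above the threshold are called long VOT. The function names and the toy data are illustrative only and are not taken from the study.

```python
def class_error_rates(short_vals, long_vals, threshold):
    """Per-class error rates when values >= threshold are called long VOT."""
    short_err = sum(v >= threshold for v in short_vals) / len(short_vals)
    long_err = sum(v < threshold for v in long_vals) / len(long_vals)
    return short_err, long_err


def equal_error_threshold(short_vals, long_vals):
    """Pick the observed value at which the two class error rates are closest."""
    candidates = sorted(set(short_vals) | set(long_vals))

    def gap(t):
        short_err, long_err = class_error_rates(short_vals, long_vals, t)
        return abs(short_err - long_err)

    best = min(candidates, key=gap)
    return best, class_error_rates(short_vals, long_vals, best)


# Toy durations in ms (made up for illustration)
short_vot = [10, 20, 30, 60]
long_vot = [25, 50, 70, 80, 90]
print(equal_error_threshold(short_vot, long_vot))  # (50, (0.25, 0.2))
```

With few samples the two error rates rarely match exactly, so the sketch settles for the candidate with the smallest gap. Reporting both rates separately keeps the sparse non-standard class from being swamped by the majority class.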

Results
● Baseline method error rates:
  – p: 55%, t: 23%, k: 29%
  – p: 19%, t: 20%, k: 48% using duration of 3rd HMM state
● With aspiration model (short VOT / long VOT):
  – p: 5% / 36%
  – t: 11% / 38%
  – k: 57% / 17%
● With probability comparison (short VOT / long VOT):
  – p: 36% / 4%
  – t: 0% / 5%
  – k: 0% / 6%
  – (trained on test data, so possibly overtrained)

Discussion
● Studies have noted that VOT length follows the order /k/ > /t/ > /p/.
  – This could explain why the baseline gets poor results for /p/
  – and why the aspiration model predicts the short VOT class best for /p,t/ but predicts the long VOT class best for /k/.
● Roughly, each successive method was more difficult to carry out than the one before it.
● The results improved from the baseline, but the last approach (comparing probabilities) may have been overtrained.
● Comparing probabilities may be easier to extend to other pronunciation modeling tasks.

Discussion (cont'd)
● Increasing the frame rate didn't help much.
  – Don't use a 1 ms frame rate unless you want to test your patience.
● If an initial consonant has a short VOT, this does not necessarily imply a non-standard accent.
  – Words like “today” and “together” have stress on the 2nd syllable, so the VOT of the initial consonant is shorter even for standard pronunciation.

Conclusion
● When classifying stop consonants based on VOT characteristics, different approaches work better on different stops:
  – Measuring the duration of the stop state works reasonably well for /t,k/ because they have a longer VOT than /p/.
  – Detecting insertion of an aspiration model during decoding works well for /p,t/ but not /k/, which has too many false positives.
  – Comparing phone probabilities worked well except for unaspirated /p/.

Future Work
● Since VOT is a timing-related phenomenon, it may help to explicitly model the state duration density in the HMMs.
● Other optimization criteria might be better suited than maximum likelihood estimation to train models for this purpose.
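As background for the first point, a standard HMM state models duration only implicitly, through its self-transition probability. The worked equation below is textbook HMM theory rather than a result from this study; the symbol $a_{ii}$ (self-transition probability of state $i$) and $d$ (number of frames spent in the state) are notation introduced here.

```latex
% Probability of staying in state i for exactly d frames:
% d-1 self-transitions followed by one exit.
p_i(d) = a_{ii}^{\,d-1}\,(1 - a_{ii}), \qquad d = 1, 2, \dots

% Expected duration in frames:
\mathbb{E}[d] = \frac{1}{1 - a_{ii}}
```

This distribution is geometric, so its most likely duration is always a single frame. That is a poor fit for a VOT interval with a characteristic length, which is one reason an explicit duration density might help.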