The trough effect: Can we predict tongue lowering from acoustic data alone? Yolanda Vazquez Alvarez.

Overview
1. Background on the ‘trough effect’
2. Aim of this experiment
3. Experimental method & results
4. Acoustic-to-articulatory mapping
5. Conclusions

Background – The ‘trough effect’
 The ‘trough’ effect occurs in symmetrical VCV sequences and has been described as ‘a momentary deactivation of the tongue movement during the consonant closure’ (Bell-Berti & Harris, 1974; Gay, 1975)

Background – Acoustic evidence
 Lindblom et al. (2002) collected direct measures of the F2 trajectories from symmetrical VCV utterances (V = /i/)

Background – Ultrasound evidence
 Used QMUC’s data from the trough experiment
 3 annotation points corresponding to 3 different tongue contours
 2 measurements of tongue displacement (MTD) were carried out for these 3 contours

Background – Ultrasound evidence
 MTDs were significantly different from each other for /iCi/ sequences (/ipi/: t(9) = , p < 0.010; /ibi/: t(9) = , p < 0.010)

Background – Advantages & disadvantages of both techniques
 Acoustics:
- Good time resolution
- Doesn’t require specialised equipment to acquire the data
- No visualisation of the tongue
 Ultrasound:
- Tongue contour visualisation
- Physical measurement
- Need for frame-by-frame analysis of the tongue recording

Aim of this experiment
Given the advantages of acoustic measurements: how confident can we be that the acoustic measurement of tongue lowering gives us a true representation of the trough effect?

Experimental method
Subjects: 5 native speakers of English, various accents
Data: symmetrical VCV sequences, C = /p/, /b/ & V = /i/
Repetitions: 2 reps, n = 20
Ultrasound analysis: 3 annotation points (V1mid, Cmid and V2sym); 2 distance measurements (V1-C and C-V2)
Acoustic analysis: 4 F2 annotation points (V1mid, V1offset, F2 onset, V2mid); 2 F2 measurements (F2V1-C and F2C-V2)

Experiment – Results
 Pearson correlation of V1-C and F2V1-C was significant (r(18) = .496, r² = 0.25, p < .05), predicting 25% of the variance in tongue lowering.
 Using both F2 predictors increased the correlation coefficient for V1-C, predicting 43% of the variance in tongue lowering.
 Pearson correlation of C-V2 and F2C-V2 was not significant.
[Figure: correlation of F2 values and ultrasound data (V1-C)]
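As a sketch of the kind of correlation analysis reported above (the numbers here are synthetic stand-ins, not the study's measurements, and the variable names are purely illustrative):

```python
# Illustrative only: Pearson correlation between a tongue-displacement
# measure (like V1-C) and an F2 measure (like F2V1-C) on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
tongue_lowering = rng.normal(size=20)                          # articulatory measure
f2_change = tongue_lowering + rng.normal(scale=0.5, size=20)   # acoustic measure

r = np.corrcoef(tongue_lowering, f2_change)[0, 1]
r_squared = r ** 2   # proportion of tongue-lowering variance predicted
```

The r² value is what licenses statements like "predicting 25% of the variance": it is the square of the Pearson coefficient.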

Experiment – Results
 3 possible reasons why we could not predict the rise for C-V2:
1. The start of the tongue rise falls within the closure, so F2 cannot show information about its possible movement
2. In the ultrasound data the measuring point fell mainly on the release for /p/, but we used V2mid because otherwise we would not have had sufficient F2 data
3. Ultrasound time resolution may be too poor to capture the rising of the tongue at the appropriate moment

Acoustic-to-articulatory mapping  Korin Richmond et al. (2003) at CSTR, Edinburgh Univ., used a multilayer perceptron (MLP) neural network to estimate articulatory trajectories  The neural network was trained on articulatory data (EMA) and acoustic data where articulatory feature vectors (x,y) were normalised to lie in the range [0.1,0.9]

Acoustic-to-articulatory mapping  The MLP was applied to the acoustic data from the ultrasound experiment  Despite being trained on a different speaker, the trough phenomena could be observed in the MLP estimates for the y-coordinates of tongue body movement

Acoustic-to-articulatory mapping
 Annotation times from the ultrasound measurement points were used to compare the estimated tongue positions from the MLP
 Tongue lowering and rising were observed in the MLP plots, but no statistically significant results were obtained
[Figure: MLP plots for /ibi/ and /ipi/, annotated at V1mid, Cmid and V2sym]

Conclusions  Acoustic information (F2) may be missing for crucial articulatory movement. It is hard to map acoustic change into articulatory change  Current ultrasound time resolution can be too poor to provide information of rapid articulatory change  However, a combined approach can help improve both techniques

References  Bell-Berti, F. & Harris, K. (1974). More on the motor organization of speech gestures. Haskins Laboratories: Status Report on Speech Research SR-37/38,  Gay, T. (1975). Some electromyographic measures of coarticulation in VCV-utterances. Haskins Laboratories: Status Report on Speech Research SR-44,  Lindblom, B., Sussman, H., Modarressi, G. & Burlingame, E. (2002). The trough effect: Implications for motor programming, Phonetica, 59,  K. Richmond, S. King, and P. Taylor. (2003). Modelling the uncertainty in recovering articulation from acoustics. Computer Speech and Language, 17:

Acknowledgements Thanks go to SHS at QMUC in Edinburgh for the use of the ultrasound data from the trough experiment. Also, I would like to thank Korin Richmond at CSTR in Edinburgh for his interest and help with the processing of the acoustic data using the MLP neural network.