A Text-free Approach to Assessing Nonnative Intonation
Joseph Tepperman, Abe Kazemzadeh, and Shrikanth Narayanan
Signal Analysis and Interpretation Laboratory, University of Southern California
This work is supported by the National Science Foundation.

Intonation in English is Predictable
“…if you’re a mind-reader” – D. Bolinger
We know what native speakers usually don’t do
– e.g. put pitch accents on function words
But what they can do is so open…
– “*I* didn’t steal your red hat” (Joe stole it)
– “I didn’t *steal* your red hat” (I ate it)
– “I didn’t steal your red *hat*” (I stole your red shirt)
How can we decide if a nonnative speaker’s “tune” sounds native?
– …without limiting the sentence structure?

Past Approaches to This Task
Compare the f0 contour with a reference expert pronunciation
– Doesn’t allow for variability
Extract features from syllables, then classify (a toy feature-extraction sketch follows below)
– Ad hoc
– Requires syllable segmentation
[Figure: f0 contours over time, with per-syllable features: mean, slope, max, min, range, duration]
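For reference, a minimal sketch of the kind of ad-hoc, per-syllable feature extraction that these past approaches rely on (mean, slope, max, min, range, duration over a syllable's f0 frames). The frame rate and the linear fit for the slope are illustrative assumptions, not taken from any particular cited system.

```python
import numpy as np

def syllable_f0_features(f0_frames, frame_rate=100.0):
    """Ad-hoc per-syllable f0 features (illustrative sketch only).

    f0_frames: 1-D array of f0 values for one syllable, in Hz.
    frame_rate: frames per second (assumed 100 fps, i.e. 10 ms frames).
    """
    f0 = np.asarray(f0_frames, dtype=float)
    t = np.arange(len(f0)) / frame_rate
    # Slope from a first-order least-squares fit over the syllable
    slope = np.polyfit(t, f0, 1)[0] if len(f0) > 1 else 0.0
    return {
        "mean": float(f0.mean()),
        "slope": float(slope),
        "max": float(f0.max()),
        "min": float(f0.min()),
        "range": float(f0.max() - f0.min()),
        "duration": len(f0) / frame_rate,
    }
```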

Our Proposed Method
Train native intonation HMMs over processed f0 contours (also: ∆f0, ∆∆f0, energy)
– Normalized, interpolated, smoothed (see the preprocessing sketch below)
– Annotated with ToBI labels
Decode a nonnative speaker’s contour
– Define an intonation “grammar” for recognition
– No text required
Calculate standard confidence measures for the recognized accents/boundaries
– Demonstrate correlation with overall pronunciation scores
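Below is a minimal sketch of the contour preprocessing described above: interpolation through unvoiced frames, normalization, smoothing, and ∆/∆∆ features. The z-score normalization and the moving-average window are assumptions for illustration, not the authors' exact pipeline.

```python
import numpy as np

def preprocess_f0(f0, smooth_win=5):
    """Interpolate, normalize, and smooth a raw f0 track (0 = unvoiced),
    then append delta and delta-delta features.

    Returns an array of shape (n_frames, 3): [f0, delta-f0, delta-delta-f0].
    Normalization and smoothing choices here are illustrative assumptions.
    """
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    if not voiced.any():
        raise ValueError("no voiced frames in this utterance")
    # Linear interpolation through unvoiced regions
    idx = np.arange(len(f0))
    f0_interp = np.interp(idx, idx[voiced], f0[voiced])
    # Utterance-level z-score normalization (one plausible choice)
    f0_norm = (f0_interp - f0_interp.mean()) / (f0_interp.std() + 1e-8)
    # Moving-average smoothing
    kernel = np.ones(smooth_win) / smooth_win
    f0_smooth = np.convolve(f0_norm, kernel, mode="same")
    # Delta and delta-delta trajectories
    d1 = np.gradient(f0_smooth)
    d2 = np.gradient(d1)
    return np.stack([f0_smooth, d1, d2], axis=1)
```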

HMM Training
– Interpolation & smoothing of the f0 contour: compensates for segmental effects on f0 realization
– Centerpoints of pitch accents/boundaries
– Baum-Welch training, 5 states for each model (a toy training sketch follows below)
[Diagram: example f0 contour over time with tone labels SIL, L*, H*, shown before and after interpolation & smoothing]
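The original models were presumably built with a standard HMM toolkit; the sketch below uses hmmlearn's GMMHMM purely as a stand-in to show the shape of the step: one 5-state model per tone label, trained with Baum-Welch (EM) on preprocessed contour segments. The input format and mixture count are assumptions.

```python
import numpy as np
from hmmlearn.hmm import GMMHMM

def train_tone_hmms(segments_by_tone, n_states=5, n_mix=2):
    """Train one HMM per tone label via Baum-Welch (EM).

    segments_by_tone: dict mapping a tone label (e.g. "H*", "L*", "SIL") to a
    list of (n_frames x n_features) arrays, one per labeled segment, already
    preprocessed as in preprocess_f0() above. The input format is an assumption.
    """
    models = {}
    for tone, segments in segments_by_tone.items():
        X = np.vstack(segments)               # all frames for this tone
        lengths = [len(s) for s in segments]  # per-segment frame counts
        hmm = GMMHMM(n_components=n_states, n_mix=n_mix,
                     covariance_type="diag", n_iter=20)
        hmm.fit(X, lengths)                   # Baum-Welch re-estimation
        models[tone] = hmm
    return models
```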

Intonation Grammars
Two sets of models:
– FSGs (finite-state grammars)
  – AB: [grammar diagram not preserved in transcript]
  – HL: [grammar diagram not preserved in transcript]
– Bigram tone models (a toy estimation sketch follows below)
  – e.g. SIL L* H* H% L%
Legend: H = high, L = low, * = accent, % = boundary, SIL = silence
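A toy sketch of estimating a bigram tone model from ToBI-style label sequences; padding the sequence with SIL and using add-one smoothing are illustrative assumptions.

```python
from collections import Counter

def train_tone_bigram(label_sequences):
    """Estimate P(next_tone | tone) from tone label sequences
    with add-one smoothing (illustrative sketch)."""
    vocab = sorted({t for seq in label_sequences for t in seq} | {"SIL"})
    bigrams, unigrams = Counter(), Counter()
    for seq in label_sequences:
        padded = ["SIL"] + list(seq) + ["SIL"]   # silence model at both ends
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            unigrams[prev] += 1
    V = len(vocab)
    return {(p, c): (bigrams[(p, c)] + 1) / (unigrams[p] + V)
            for p in vocab for c in vocab}

# Example with the label inventory from this slide (H/L, * = accent, % = boundary):
probs = train_tone_bigram([["L*", "H*", "H%"], ["H*", "L*", "L%"]])
```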

Score Calculation
For a single recognized tone “segment,” a confidence score is computed, where O is the speech observation in suprasegmental features, M_t is the recognized tone model, and i ranges over all tone HMMs. The overall utterance-level score is then computed over the T recognized tones. [The equations on this slide were images and are not preserved in the transcript; a standard formulation is sketched below.]
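The equation images did not survive in this transcript. The formulation below is a standard log-posterior confidence measure consistent with the variable definitions on the slide; treat the exact form as an assumption rather than the slide's own equation.

```latex
% Per-tone confidence for the recognized tone model M_t given observation O,
% normalized over all tone HMMs indexed by i (standard log-posterior form; assumed)
s_t = \log \frac{P(O \mid M_t)}{\sum_{i} P(O \mid M_i)}

% Utterance-level score: average over the T recognized tones (assumed)
S = \frac{1}{T} \sum_{t=1}^{T} s_t
```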

Corpora
Native (training set):
– The BURNC (Boston University Radio News Corpus): professionally read AE radio speech; ToBI transcripts for one speaker (1.2 hours)
– The IViE corpus: designed to capture intonation variation in BE; ToBI-like labels for read Southern BE (0.1 hours)
Nonnative (test set):
– The ISLE corpus: Italian and German learners of BE (23 of each); 138 read sentences (3 x 46 speakers); no tone transcripts
AE = American English, BE = British English
According to Bolinger, AE and BE differ not in tone shape but in frequency and context of use.

Perceptual Evaluations
3 sentences for each nonnative speaker:
– 1: “I said ‘white,’ not ‘bait.’”
– 2: “Could I have chicken soup as a starter and then lamb chops?”
– 3: “This summer I’d like to visit Rome for a few days.”
Overall pronunciation scored on a 1-to-5 scale
– Six native English-speaking evaluators
– Includes both prosodic and segment-level effects
– Speaker-level score = median of the three sentence-level scores
Mean inter-evaluator correlation (see the computation sketch below):
– All sentences: 0.657
– Speaker-level: 0.798
– Sentences 1–3: [values not preserved in transcript]
– Italian: 0.707
– German: 0.238
Notes: some sentences had more obvious pronunciation mistakes; evaluators were self-consistent and used context; Italian speakers were less proficient, and German is more closely related to English.
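A small sketch of how the agreement numbers above could be computed: mean pairwise Pearson correlation across evaluators, and per-speaker scores as the median of the three sentence-level ratings. The array shapes are assumptions.

```python
import numpy as np
from itertools import combinations

def mean_inter_evaluator_corr(scores):
    """scores: (n_evaluators x n_items) array of 1-5 pronunciation ratings.
    Returns the mean pairwise Pearson correlation between evaluators."""
    pairs = combinations(range(scores.shape[0]), 2)
    corrs = [np.corrcoef(scores[a], scores[b])[0, 1] for a, b in pairs]
    return float(np.mean(corrs))

def speaker_level_scores(sentence_scores_by_speaker):
    """Median of the three sentence-level scores for each speaker,
    as described on the slide."""
    return {spk: float(np.median(s))
            for spk, s in sentence_scores_by_speaker.items()}
```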

Results
Correlation between automatic scores and evaluator medians, for both model sets (HL and AB), four grammars (FSG; Bigram: BE; Bigram: AE; Bigram: both), and a variable number of mixtures per state. [Plot values not preserved in transcript.]
Observations:
– The theoretical FSG doesn’t apply to nonnatives
– Too many mixtures = overtraining
– BE and AE can be used together for the intonation grammar
– High/Low models are necessary

Results, continued
Automatic scores follow perceptual trends: mean inter-evaluator correlation vs. best automatic score correlation, broken down by all sentences, speaker level, Sentences 1–3, Italian, and German. [Table values not preserved in transcript.]
– One exception, marked on the original slide: automatic scores did not use context

In Conclusion
– A correlation of [value not preserved in transcript] between automatic scores and evaluator medians
  – Represents the contribution of intonation to overall pronunciation scores
  – Considering all factors, inter-human agreement is [value not preserved in transcript]
– Comparable to the SRI EduSpeak system, whose prosodic features are derived from knowledge of the text
– Now: combine this with segment-level features for robust overall pronunciation scores
– Can also potentially be used for:
  – Speaker ID
  – Pronunciation scoring of spontaneous speech