Investigating Pitch Accent Recognition in Non-native Speech

Investigating Pitch Accent Recognition in Non-native Speech Gina-Anne Levow August 4, 2009

Roadmap
- Motivation: Prosody Recognition in Non-native Speech
  - Prosody and Language Learning
  - Prosody Recognition in Non-native Speech
- LEAP Corpus
- Modeling Pitch Accent for Recognition
- Analysis of Pitch Accent in Learner Speech
- Pitch Accent Recognition:
  - Within-group
  - Cross-group
- Conclusion

Prosody and Language Learning
- Acquisition of prosody is essential for language learners
  - Contributes to semantic and pragmatic information as well as quality
  - Less emphasized in instruction (class, CALL (Chun, 1998))
  - Challenging to characterize
  - Often requires individual attention
- Computer-assisted Language Learning (CALL)
  - Potential for flexible, individual feedback
  - Many prior approaches emphasize scoring (Teixeira et al, 2000; Tepperman et al, 2008)
- Goal: Automatic prosodic labeling for targeted, individual feedback
  - Current focus: English pitch accent

Automatic Prosodic Labeling of Non-native Speech
- Significant strides in prosody labeling
  - Acoustic-only methods: > 80%
  - Syllable-based, binary
  - Native speakers, mostly broadcast news
- Challenges:
  - Characterization, comparison of learner prosody
    - Are pitch accents reliably produced?
    - Can recognition reach competitive levels?
  - Little prosodically labeled learner speech
    - Can other sources be employed?

LEAP Corpus
- “Learning Prosody in a Foreign Language” (Milde & Gut, 2002): papers on database, agreement, etc.
- Focus on prosodically labeled English set
  - Read speech: analogous to language lab
- ‘Extended’ EToBI tagset (Silverman et al, 1992)
  - 14 pitch accent tags, 14 phrase/boundary tags
- Collapse to standard sets:
  - Analysis: 4-way: High, Downstepped High, Low, Unaccented
  - Classification: binary: Accented/Unaccented
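The tag collapsing described on this slide could look roughly like the sketch below. The talk does not list the 14 LEAP accent tags, so the ToBI-style labels (`H*`, `!H*`, `L*`, `L+H*`, ...) and the rule "the starred tone decides the class" are illustrative assumptions, as are the function names.

```python
# Hypothetical tag inventory: standard ToBI-style labels are assumed here,
# not the actual 'extended' EToBI tags used in the LEAP corpus.

def collapse_four_way(tag):
    """Map a pitch accent tag to one of the four analysis classes."""
    if not tag:
        return "Unaccented"
    if "!H*" in tag:          # downstepped starred high tone, e.g. "!H*", "H+!H*"
        return "Downstepped High"
    if "H*" in tag:           # starred high tone, e.g. "H*", "L+H*"
        return "High"
    if "L*" in tag:           # starred low tone, e.g. "L*", "L*+H"
        return "Low"
    return "Unaccented"

def collapse_binary(tag):
    """Map a pitch accent tag to the binary classification labels."""
    if collapse_four_way(tag) == "Unaccented":
        return "Unaccented"
    return "Accented"
```

For example, `collapse_four_way("L+H*")` gives `"High"` under these assumptions, and an empty tag is treated as unaccented.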

LEAP Corpus
- Range of speakers, L1s, experience
- 37 recordings: ~300 syllables each
- 26 speakers

ID   Description
c1   Non-native, before prosody training
c2   Non-native, after first prosody training
c3   Non-native, after second prosody training
e1   Non-native, before travel abroad
e2   Non-native, after travel abroad
sl   “super-learner”, near-native
na   Native

Modeling Pitch Accent
- Pitch accent identity, realization depend on context
- Pitch is relative:
  - To speaker range
  - To neighboring accents, phrase range, e.g. downstep
- Coarticulatory effects:
  - Modeling improves recognition (e.g. Sun 2002)
- Approach based on Pitch Target Approximation Model
  - Tone/pitch accent target exponentially approached
  - Linear target: height, slope (Xu et al, 99)
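One common way to write "a linear target approached exponentially" is F0(t) = beta * exp(-lambda * t) + slope * t + height. The slide does not give the formula, so this functional form and the parameter names below are assumptions for illustration only.

```python
import math

def approximate_pitch(t, slope, height, beta, lam):
    """Pitch at time t after syllable onset.

    slope, height: the linear target (slope * t + height).
    beta: offset of the actual pitch from the target at onset (assumed name).
    lam: positive rate at which the contour approaches the target (assumed name).
    """
    target = slope * t + height
    return beta * math.exp(-lam * t) + target
```

At onset the contour sits `beta` away from the target; as `t` grows the exponential term decays and the contour converges on the target line.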

Local Feature Extraction
- Base features:
  - Pitch, Intensity: max, mean, min, range (Praat, speaker normalized)
  - Pitch at 5 points across voiced region
  - Duration
  - Initial, final in phrase
- Slope:
  - Linear fit to last half of pitch contour
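A minimal sketch of the per-syllable pitch features listed above, assuming the pitch track has already been extracted (for example with Praat) as a non-empty list of samples over the voiced region. The slide only says "speaker normalized"; z-scoring is one plausible choice and is an assumption here, as are all function and key names.

```python
from statistics import mean

def zscore(values, speaker_mean, speaker_sd):
    """Assumed normalization: z-score against the speaker's own statistics."""
    return [(v - speaker_mean) / speaker_sd for v in values]

def linear_slope(ys):
    """Least-squares slope of ys against the sample index 0..n-1."""
    n = len(ys)
    if n < 2:
        return 0.0
    x_bar = (n - 1) / 2
    y_bar = mean(ys)
    num = sum((i - x_bar) * (y - y_bar) for i, y in enumerate(ys))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den

def local_pitch_features(f0, n_points=5):
    """f0: non-empty list of normalized pitch samples for one syllable."""
    last = len(f0) - 1
    # Evenly spaced sample indices across the voiced region.
    idx = [round(k * last / (n_points - 1)) for k in range(n_points)]
    return {
        "max": max(f0),
        "mean": mean(f0),
        "min": min(f0),
        "range": max(f0) - min(f0),
        "points": [f0[i] for i in idx],
        # Linear fit to the last half of the contour.
        "slope": linear_slope(f0[len(f0) // 2:]),
    }
```

The same summary statistics could be computed for intensity; duration and phrase position would come from the syllable alignment rather than the pitch track.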

Context Features
- Local context:
  - Extended features
    - Adjacent points of preceding, following syllables
  - Difference features
    - Difference between
      - Pitch max, mean, mid, slope
      - Intensity max, mean
    - Of preceding, following and current syllable
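The difference features can be sketched as follows, assuming each syllable is a dict of already-computed local features. The key names and the choice to use 0.0 where a neighbor is missing are assumptions; the talk does not say how utterance edges are handled.

```python
def difference_features(syllables, keys=("pitch_max", "pitch_mean", "intensity_max")):
    """Add current-minus-previous and current-minus-next differences per key.

    syllables: list of dicts in utterance order (hypothetical layout).
    Edge syllables with no neighbor get 0.0 for the missing side (assumption).
    """
    out = []
    for i, cur in enumerate(syllables):
        row = dict(cur)  # keep the local features alongside the new ones
        prev = syllables[i - 1] if i > 0 else None
        nxt = syllables[i + 1] if i + 1 < len(syllables) else None
        for k in keys:
            row["d_prev_" + k] = cur[k] - prev[k] if prev is not None else 0.0
            row["d_next_" + k] = cur[k] - nxt[k] if nxt is not None else 0.0
        out.append(row)
    return out
```

A positive `d_prev_pitch_max` then means the syllable peaks higher than the one before it, which is the kind of local contrast the analysis slides examine.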

Analysis of Learner Pitch Accent
- Pitch height characterizes accent, but
  - Key feature is contrast with neighbors
- Contrasts:
  - Unaccented vs High accented syllables
  - Early learners (e1, c1) and native
  - Pitch height and pitch deltas w.r.t. previous syllable

Contrasts
- Pitch delta:
  - High significantly larger than unaccented
    - All groups
  - Differences significantly larger for native than early learner
- Pitch height:
  - e1: No significant difference between High, unaccented
  - c1, na: Significant difference between High, unaccented
- All speakers understand local contrast
  - Some learners do not have reliable global control
- Potential for effective pitch accent recognition

Contrasts in Learner Prosody
[Figure: two panels, Pitch Delta and Pitch Height]

Pitch Accent Recognition in Non-native Speech I
- Classifier: Support Vector Machine
  - Linear kernel, LibSVM (Chang & Lin, 2001)
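To illustrate what a linear-kernel SVM does on the binary accented/unaccented task, here is a small hand-rolled stand-in. It is not LibSVM and not the setup used in the talk: it minimizes the regularized hinge loss by plain batch subgradient descent, and the hyperparameters (`lam`, `epochs`, `lr`) are arbitrary assumptions.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Fit w, b by batch subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))).

    X: list of feature lists; y: labels in {-1, +1}
    (+1 = accented, -1 = unaccented in this sketch).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]   # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:                # only margin violators contribute
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Return +1 (accented) or -1 (unaccented) for one feature vector."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1
```

In practice one would hand the feature vectors to LibSVM itself (or a wrapper around it) rather than this toy optimizer; the sketch only shows the decision rule and objective that a linear kernel implies.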

Pitch Accent Recognition in Non-native Speech II
- Cross-group training with native and near-native speakers
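A cross-group setup can be sketched as a split on the corpus group IDs (`na`, `sl`, `c1`, ...). The record layout, and the reading that models are trained on native and near-native speakers and evaluated on the remaining learner groups, are assumptions; the slide does not spell out the protocol.

```python
NATIVE_LIKE = frozenset({"na", "sl"})  # native and "super-learner" groups

def cross_group_split(records, train_groups=NATIVE_LIKE):
    """Split syllable records into train and test sets by speaker group.

    records: list of dicts with a "group" key (hypothetical layout).
    """
    train = [r for r in records if r["group"] in train_groups]
    test = [r for r in records if r["group"] not in train_groups]
    return train, test
```

Within-group training would instead hold out part of a single group's recordings for testing.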

Conclusion
- Non-native pitch accent
  - Even early learners exhibit key local contrasts
  - Learners exhibit smaller contrasts than natives
  - Some learners do not achieve reliable global control
- Non-native pitch accent recognition:
  - Within-group training achieves competitive accuracies
  - Cross-group training also effective
    - No significant degradation for binary classification
- Potential effectiveness for CALL

Future Work
- Integrate non-native prosodic labeling in CALL setting
- Explore utility for tone languages
- Identify learner errors, relative to gold standard
- Employ resynthesis of learner’s speech for focused feedback
- Further explore effect of learner L1, for very early learners

Thanks
- LEAP Corpus (Ulrike Gut)
- LibSVM (C.-C. Chang and C.-J. Lin)
- This work was supported by NSF IIS #0414919
