Investigating Pitch Accent Recognition in Non-native Speech Gina-Anne Levow August 4, 2009
Roadmap Motivation: Prosody Recognition in Non-native Speech Prosody and Language Learning Prosody Recognition in Non-native Speech LEAP Corpus Modeling Pitch Accent for Recognition Analysis of Pitch Accent in Learner Speech Pitch Accent Recognition: Within-group Cross-group Conclusion
Prosody and Language Learning Acquisition of prosody essential for language learners Contributes to semantic, pragmatic info as well as quality Less emphasized in instruction (class, CALL (Chun, 1998)) Challenging to characterize Often requires individual attention Computer-assisted Language Learning (CALL) Potential for flexible, individual feedback Many prior approaches emphasize scoring (Teixeira et al., 2000; Tepperman et al., 2008) Goal: Automatic prosodic labeling for targeted, individual feedback Current focus: English pitch accent
Automatic Prosodic Labeling of Non-native Speech Significant strides in prosody labeling Acoustic-only methods: > 80% Syllable-based, binary Native speakers, mostly broadcast news Challenges: Characterization, comparison of learner prosody Are pitch accents reliably produced? Can recognition reach competitive levels? Little prosodically labeled learner speech Can other sources be employed?
LEAP Corpus “Learning Prosody in a Foreign Language” (Milde & Gut, 2002): papers on DB, agreement, etc Focus on prosodically labeled English set Read speech: analogous to language lab ‘Extended’ EToBI tagset (Silverman, 1992) 14 pitch accent tags, 14 phrase/boundary tags Collapse to standard sets: Analysis: 4-way: High, Downstepped High, Low, Unacc. Classification: Binary: Accented/Unaccented
LEAP Corpus
Range of speakers, L1s, experience
37 recordings: ~300 syllables each
26 speakers

ID   Description
c1   Non-native, before prosody training
c2   Non-native, after first prosody training
c3   Non-native, after second prosody training
e1   Non-native, before travel abroad
e2   Non-native, after travel abroad
sl   “super-learner”, near-native
na   Native
Modeling Pitch Accent Pitch accent identity, realization depend on context Pitch is relative: To speaker range To neighboring accents, phrase range e.g. downstep Coarticulatory effects: Modeling improves recognition (e.g. Sun 2002) Approach based on Pitch Target Approximation Model Tone/pitch accent target exponentially approached Linear target: height, slope (Xu et al, 99)
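The exponential approach to a linear target can be illustrated with a small sketch. The slides do not give the equation; the form below, y(t) = beta * exp(-lambda * t) + slope * t + height, is one common way of writing the pitch target approximation idea and is assumed here. The function name, parameter names, and the numeric values are hypothetical.

```python
import math

def pitch_target_contour(height, slope, beta, lam, duration, n_points=5):
    """Sample an f0 contour that approaches a linear target exponentially.

    Target line:  T(t) = slope * t + height
    Contour:      y(t) = beta * exp(-lam * t) + T(t)
    beta is the initial offset from the target; lam (> 0) is the approach rate.
    This functional form is an assumption, not taken from the slides.
    """
    times = [duration * i / (n_points - 1) for i in range(n_points)]
    return [(t, beta * math.exp(-lam * t) + slope * t + height) for t in times]

# Hypothetical values: a static high target at 200 Hz, approached from 30 Hz below
# over a 0.2 s syllable.
contour = pitch_target_contour(height=200.0, slope=0.0, beta=-30.0, lam=20.0, duration=0.2)
for t, f0 in contour:
    print(f"{t:.2f} s  {f0:.1f} Hz")
```

With a negative offset the contour rises toward the target and the gap shrinks over the syllable, which is why the later part of the contour is the more informative region for estimating the target.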
Local Feature Extraction Base features: Pitch, Intensity max, mean, min, range (Praat, speaker normalized) Pitch at 5 points across voiced region Duration Initial, final in phrase Slope: Linear fit to last half of pitch contour
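A minimal sketch of the pitch part of these local features, assuming the f0 track for a syllable's voiced region has already been extracted (for example with Praat). Speaker normalization is shown as a z-score, which is an assumption since the slides only say "speaker normalized". Intensity, duration, and phrase-position features are omitted, and all function and key names are hypothetical.

```python
def linear_slope(ys):
    """Least-squares slope of ys against sample index 0..n-1."""
    n = len(ys)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

def local_pitch_features(f0, speaker_mean, speaker_std, n_points=5):
    """Pitch features for one syllable from its voiced-region f0 samples."""
    # Assumed normalization: z-score against the speaker's overall mean and std.
    z = [(v - speaker_mean) / speaker_std for v in f0]
    n = len(z)
    # Indices of n_points evenly spaced samples across the voiced region.
    idx = [round(i * (n - 1) / (n_points - 1)) for i in range(n_points)]
    return {
        "pitch_max": max(z),
        "pitch_min": min(z),
        "pitch_mean": sum(z) / n,
        "pitch_range": max(z) - min(z),
        "pitch_points": [z[i] for i in idx],
        # Slope from a linear fit to the last half of the contour.
        "slope": linear_slope(z[n // 2:]),
    }

# Hypothetical rising contour in Hz, with hypothetical speaker statistics.
feats = local_pitch_features(
    [180, 190, 200, 210, 220, 230, 240, 250, 260],
    speaker_mean=200.0, speaker_std=20.0,
)
```

The slope here is per sample rather than per second; dividing by the frame step would convert it if a time-based slope is needed.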
Context Features Local context: Extended features: Adjacent points of preceding, following syllables Difference features: Difference between Pitch max, mean, mid, slope Intensity max, mean Of preceding, following and current syllable
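The difference features can be sketched as follows, assuming each syllable is already represented as a dictionary of local feature values. The sign convention (current minus neighbor), the zero fill at utterance edges, and the key names are assumptions made for illustration.

```python
DEFAULT_KEYS = (
    "pitch_max", "pitch_mean", "pitch_mid", "slope",
    "intensity_max", "intensity_mean",
)

def difference_features(syllables, keys=DEFAULT_KEYS):
    """For each syllable, differences to the preceding and following syllable."""
    rows = []
    for i, cur in enumerate(syllables):
        prev = syllables[i - 1] if i > 0 else None
        nxt = syllables[i + 1] if i + 1 < len(syllables) else None
        row = {}
        for k in keys:
            # Assumed convention: current minus neighbor; 0.0 when no neighbor exists.
            row[f"d_prev_{k}"] = cur[k] - prev[k] if prev is not None else 0.0
            row[f"d_next_{k}"] = cur[k] - nxt[k] if nxt is not None else 0.0
        rows.append(row)
    return rows

# Hypothetical three-syllable sequence with a single feature.
syllables = [{"pitch_max": 1.0}, {"pitch_max": 3.0}, {"pitch_max": 2.0}]
rows = difference_features(syllables, keys=("pitch_max",))
```

A positive `d_prev_pitch_max` means the current syllable peaks higher than the one before it, which is the kind of local contrast the analysis slides examine.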
Analysis of Learner Pitch Accent Pitch height characterizes accent, but Key feature is contrast with neighbors Contrasts: Unaccented vs High accented syllables Early learners (e1, c1) and native Pitch height and pitch deltas w.r.t. previous
Contrasts Pitch delta: High significantly larger than unaccented, all groups; differences significantly larger for native than early learner. Pitch height: e1: no significant difference between High and unaccented; c1, na: significant difference between High and unaccented. All speakers understand local contrast. Some learners do not have reliable global control. Potential for effective pitch accent recognition.
Contrasts in Learner Prosody [Figure: two panels, Pitch Delta and Pitch Height]
Pitch Accent Recognition in Non-native Speech I Classifier: Support Vector Machine Linear kernel, LibSVM (Chang & Lin, 2001)
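As a rough illustration of preparing data for this setup, the sketch below writes syllable feature vectors in LIBSVM's sparse text format, `<label> <index>:<value> ...`, with 1-based feature indices. The helper name, the label coding (+1 accented, -1 unaccented), and the example values are assumptions; the slides do not describe the actual data preparation.

```python
def to_libsvm_line(label, features):
    """Format one example as '<label> <index>:<value> ...' (1-based indices).

    Zero-valued features are omitted, which the sparse format allows.
    """
    parts = [f"{label:+d}"]
    parts += [f"{i}:{v:.6g}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join(parts)

# Hypothetical examples: +1 = accented, -1 = unaccented.
examples = [
    (1, [0.5, 0.0, -1.25]),
    (-1, [-0.2, 0.1, 0.0]),
]
lines = [to_libsvm_line(label, feats) for label, feats in examples]
print("\n".join(lines))
```

With the LIBSVM command-line tools, a file of such lines can be trained with a linear kernel using `svm-train -t 0 train.txt`, where `train.txt` is a hypothetical file name.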
Pitch Accent Recognition in Non-native Speech II Cross-group training with native and near-native speakers
Conclusion Non-native pitch accent Even early learners exhibit key local contrasts Learners exhibit smaller contrasts than natives Some learners do not achieve reliable global control Non-native pitch accent recognition: Within-group training achieves competitive accuracies Cross-group training also effective No significant degradation for binary classification Potential effectiveness for CALL
Future Work Integrate non-native prosodic labeling in CALL setting Explore utility for tone languages Identify learner errors, relative to gold std. Employ resynthesis of learner’s speech for focused feedback Further explore effect of learner L1, for very early learners
Thanks LEAP Corpus (Ulrike Gut) LibSVM (C.-C. Chang and C.-J. Lin) This work was supported by: NSF IIS #: 0414919