Context in Multilingual Tone and Pitch Accent Recognition Gina-Anne Levow University of Chicago September 7, 2005.



Roadmap
  Motivating Context
  Data Collections & Processing
  Modeling Context for Tone and Pitch Accent
  Context in Recognition
  Conclusion

Challenges
  Tone and Pitch Accent Recognition
    – Key component of language understanding
        Lexical tone carries word meaning
        Pitch accent carries semantic, pragmatic, discourse meaning
    – Non-canonical form (Shen 90, Shih 00, Xu 01)
        Tonal coarticulation modifies surface realization
          – In extreme cases, fall becomes rise
    – Tone is relative
        To speaker range
          – High for male may be low for female
        To phrase range, other tones
          – E.g. downstep

Strategy
  Common model across languages, SVM classifier
    – Acoustic-prosodic model: no word label, POS, lexical stress info
        No explicit tone label sequence model
    – English, Mandarin Chinese (also Cantonese)
  Exploit contextual information
    – Features from adjacent syllables
        Height, shape: direct, relative
    – Compensate for phrase contour
  Analyze impact of
    – Context position, context encoding, context type
    – > 20% relative improvement over no context
        Preceding context gives greater enhancement than following

Data Collection & Processing
  English: (Ostendorf et al, 95)
    – Boston University Radio News Corpus, f2b
    – Manually ToBI annotated, aligned, syllabified
    – Pitch accent aligned to syllables
        Unaccented, High, Downstepped High, Low
          – (Sun 02, Ross & Ostendorf 95)
  Mandarin:
    – TDT2 Voice of America Mandarin Broadcast News
    – Automatically force aligned to anchor scripts (CUSonic)
    – High, Mid-rising, Low, High falling, Neutral

Local Feature Extraction
  Uniform representation for tone, pitch accent
    – Motivated by Pitch Target Approximation Model
        Tone/pitch accent target exponentially approached
          – Linear target: height, slope (Xu et al, 99)
  Scalar features:
    – Pitch, Intensity max, mean (Praat, speaker normalized)
    – Pitch at 5 points across voiced region
    – Duration
    – Initial, final in phrase
  Slope:
    – Linear fit to last half of pitch contour
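The scalar and slope features listed on this slide can be sketched in a few lines. This is a minimal illustration, not the author's implementation: it assumes pitch and intensity frames have already been extracted (for example with Praat), restricted to the voiced region, and speaker normalized. The function names and the toy contour are hypothetical, and the phrase-initial and phrase-final flags are omitted.

```python
from statistics import mean


def linear_slope(times, values):
    """Least-squares slope of values against times (0.0 if undefined)."""
    t_bar, v_bar = mean(times), mean(values)
    denom = sum((t - t_bar) ** 2 for t in times)
    if denom == 0:
        return 0.0
    return sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values)) / denom


def sample_points(values, n=5):
    """Pick n roughly evenly spaced frames across the voiced region."""
    last = len(values) - 1
    return [values[round(i * last / (n - 1))] for i in range(n)]


def local_features(times, pitch, intensity):
    """Scalar features for one syllable (inputs assumed voiced and normalized)."""
    half = len(pitch) // 2
    return {
        "pitch_max": max(pitch),
        "pitch_mean": mean(pitch),
        "pitch_points": sample_points(pitch),
        "intensity_max": max(intensity),
        "intensity_mean": mean(intensity),
        "duration": times[-1] - times[0],
        # Slope: linear fit to the last half of the pitch contour.
        "slope": linear_slope(times[half:], pitch[half:]),
    }


# Hypothetical syllable: a falling contour sampled every 10 ms.
times = [0.00, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08]
pitch = [2.0, 1.5, 1.0, 0.5, 0.0, -0.5, -1.0, -1.5, -2.0]
intensity = [0.1, 0.4, 0.8, 1.0, 0.9, 0.7, 0.5, 0.3, 0.1]
feats = local_features(times, pitch, intensity)
print(feats["pitch_points"], feats["slope"])
```

Fitting the slope to only the last half of the contour follows the slide's point that the target is approached over the course of the syllable, so the later frames are the ones closest to it.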

Context Features
  Local context:
    – Extended features
        Pitch max, mean, adjacent points of preceding, following syllables
    – Difference features
        Difference between
          – Pitch max, mean, mid, slope
          – Intensity max, mean
        Of preceding, following and current syllable
  Phrasal context:
    – Compute collection average phrase slope
    – Compute scalar pitch values, adjusted for slope
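A rough sketch of the two local encodings and the phrasal adjustment follows. Several details are assumptions made for illustration rather than taken from the slides: the dictionary keys, the reading of "adjacent points" as the neighbor's pitch sample nearest the current syllable, and the phrasal compensation as subtraction of a linear trend (average slope times elapsed time in the phrase).

```python
DIFF_KEYS = ("pitch_max", "pitch_mean", "pitch_mid", "slope",
             "intensity_max", "intensity_mean")


def extended(neighbor, side):
    """Extended encoding: copy selected features of a neighboring syllable.

    side is "pre" (preceding syllable) or "post" (following syllable).
    """
    # Assumption: the "adjacent point" is the neighbor's pitch sample that
    # lies closest in time to the current syllable.
    points = neighbor["pitch_points"]
    adjacent = points[-1] if side == "pre" else points[0]
    return {
        f"{side}_pitch_max": neighbor["pitch_max"],
        f"{side}_pitch_mean": neighbor["pitch_mean"],
        f"{side}_pitch_adjacent": adjacent,
    }


def differences(current, neighbor, side):
    """Difference encoding: current minus neighbor for each shared feature."""
    return {f"{side}_diff_{k}": current[k] - neighbor[k] for k in DIFF_KEYS}


def compensate_phrase_slope(pitch_value, time_in_phrase, avg_phrase_slope):
    """Remove the collection-average phrase trend from one pitch value."""
    return pitch_value - avg_phrase_slope * time_in_phrase


# Hypothetical feature dictionaries for two adjacent syllables.
prev = {"pitch_max": 1.5, "pitch_mean": 1.0, "pitch_mid": 1.0, "slope": 0.5,
        "intensity_max": 2.0, "intensity_mean": 1.0,
        "pitch_points": [0.5, 0.75, 1.0, 1.25, 1.5]}
cur = {"pitch_max": 1.0, "pitch_mean": 0.0, "pitch_mid": 0.0, "slope": -2.0,
       "intensity_max": 1.5, "intensity_mean": 0.5,
       "pitch_points": [1.0, 0.5, 0.0, -0.5, -1.0]}

context = {**extended(prev, "pre"), **differences(cur, prev, "pre")}
adjusted = compensate_phrase_slope(1.0, 2.0, -0.25)
print(context, adjusted)
```

With a declining phrase (negative average slope), the compensation raises pitch values that occur late in the phrase, so a late high tone is not mistaken for a lower one.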

Classification Experiments
  Classifier: Support Vector Machine
    – Linear kernel
    – Multiclass formulation (SVMlight, Joachims), LibSVM (Chang & Lin 01)
    – 4:1 training / test splits
  Experiments: Effects of
    – Context position: preceding, following, none, both
    – Context encoding: Extended/Difference
    – Context type: local, phrasal
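The experimental design can be sketched independently of the classifier. The snippet below only illustrates a 4:1 split and the grid of context configurations; the SVM itself is not shown, and any linear-kernel SVM implementation could be trained on each configuration. The shuffling, the seed, and the function names are assumptions, and treating "both encodings at both positions" as the full feature set is one reading of the results labels.

```python
import random
from itertools import product

POSITIONS = ("preceding", "following", "both")
ENCODINGS = ("extended", "difference", "both")


def split_4_to_1(items, seed=0):
    """Shuffle and split items into 4 parts training, 1 part test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5
    return shuffled[:cut], shuffled[cut:]


def experiment_grid():
    """Enumerate context configurations as (position, encoding) pairs.

    The no-context baseline has no encoding; ("both", "both") uses every
    context feature.
    """
    return [("none", None)] + list(product(POSITIONS, ENCODINGS))


train, test = split_4_to_1(range(100))
grid = experiment_grid()
print(len(train), len(test), len(grid))
```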

Results: Local Context

Context          Mandarin Tone   English Pitch Accent
Full             74.5%           81.3%
Extend PrePost   74.0%           80.7%
Extend Pre       74.0%           79.9%
Extend Post      70.5%           76.7%
Diffs PrePost    75.5%           80.7%
Diffs Pre        76.5%           79.5%
Diffs Post       69.0%           77.3%
Both Pre         76.5%           79.7%
Both Post        71.5%           77.6%
No context       68.5%           75.9%
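The "> 20% relative" figure quoted elsewhere in the talk can be checked against these accuracies by comparing error rates rather than accuracies. The short calculation below uses the best result for each language (76.5% for Mandarin, 81.3% for English) against the no-context baseline; the function name is illustrative.

```python
def relative_error_reduction(baseline_acc, improved_acc):
    """Fraction of the baseline error removed; accuracies given in percent."""
    baseline_err = 100.0 - baseline_acc
    improved_err = 100.0 - improved_acc
    return (baseline_err - improved_err) / baseline_err


mandarin = relative_error_reduction(68.5, 76.5)  # No context vs. Diffs Pre
english = relative_error_reduction(75.9, 81.3)   # No context vs. Full
print(f"Mandarin: {mandarin:.1%}, English: {english:.1%}")
# prints "Mandarin: 25.4%, English: 22.4%"
```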

Discussion: Local Context
  Any context information improves over none
    – Preceding context information consistently improves over none or following context information
        English: Generally more context features are better
        Mandarin: Following context can degrade
    – Little difference in encoding (Extend vs Diffs)
  Consistent with phonological analysis (Xu) that coarticulation is carryover, not anticipatory

Results & Discussion: Phrasal Context

Phrase Context   Mandarin Tone   English Pitch Accent
Phrase           75.5%           81.3%
No Phrase        72%             79.9%

  Phrase contour compensation enhances recognition
    Simple strategy
    Non-linear slope compensation may improve results further

Conclusion
  Employ common acoustic representation
    – Tone (Mandarin), pitch accent (English)
        Cantonese, recent experiments
  SVM classifiers - linear kernel: 76%, 81%
  Local context effects:
    – Up to > 20% relative reduction in error
    – Preceding context greatest contribution
        Carryover vs anticipatory
  Phrasal context effects:
    – Compensation for phrasal contour improves recognition

Current & Future Work
  Application of model to different languages
    – Cantonese, Dschang (Bantu family)
        Cantonese: ~65% acoustic only, 85% w/segmental
  Integration of additional contextual influence
    – Topic, turn, discourse structure
    – HMSVM, GHMM models
    – Supported by NSF Grant #:

Confusion Matrix (English)
Rows: recognized tone. Columns: manually labeled tone.

Recognized    Unaccented       High             Low            D.S. High
Unaccented    95% (888/934)    25% (110/440)    100% (12/12)   53.5% (61/114)
High          4.6% (43/934)    73% (322/440)    0%             38.5% (44/114)
Low           0%               0%               0%             0%
D.S. High     0.3% (3/934)     2% (8/440)       0%             8% (9/114)
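As a consistency check, the counts in the English confusion matrix can be summed to recover overall accuracy and per-class recall. The sketch below is illustrative only. The zero counts in the Low row are inferred from the column totals, since the slide shows only "0%" there.

```python
LABELS = ["Unaccented", "High", "Low", "D.S. High"]

# COUNTS[recognized][manual]: rows are recognized labels, columns are manual
# labels, in the order given by LABELS.
COUNTS = [
    [888, 110, 12, 61],
    [43, 322, 0, 44],
    [0, 0, 0, 0],  # inferred: each column total is already accounted for
    [3, 8, 0, 9],
]


def overall_accuracy(counts):
    """Share of syllables whose recognized label equals the manual label."""
    correct = sum(counts[i][i] for i in range(len(counts)))
    total = sum(sum(row) for row in counts)
    return correct / total


def per_class_recall(counts):
    """For each manual label, the share of its syllables recognized correctly."""
    recalls = {}
    for j, label in enumerate(LABELS):
        column_total = sum(row[j] for row in counts)
        recalls[label] = counts[j][j] / column_total
    return recalls


accuracy = overall_accuracy(COUNTS)
recalls = per_class_recall(COUNTS)
print(f"{accuracy:.1%}")  # prints "81.3%"
```

The overall figure agrees with the 81.3% reported for English, while the per-class values show that the Low and Downstepped High accents are rarely recognized, so most of the accuracy comes from the Unaccented and High classes.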

Confusion Matrix (Mandarin)
Rows: recognized tone. Columns: manually labeled tone.

Recognized     High           Mid-Rising       Low            High-Falling    Neutral
High           84% (38/45)    9% (5/56)        5% (1/20)      13% (9/68)      0%
Mid-Rising     6.7% (3/45)    78.6% (44/56)    10% (2/20)     7% (5/68)       27.3% (3/11)
Low            0%             3.6% (2/56)      70% (14/20)    7% (5/68)       27.3%
High-Falling   7.4% (4/45)    3.6% (2/56)      10% (2/20)     70% (48/68)     0%
Neutral        0%             5.3% (3/56)      5% (1/20)      1.5% (1/68)     45%

Related Work
  Tonal coarticulation:
    – Xu & Sun, 02; Xu 97; Shih & Kochanski 00
  English pitch accent
    – X. Sun, 02; Hasegawa-Johnson et al, 04; Ross & Ostendorf 95
  Lexical tone recognition
    – SVM recognition of Thai tone: Thubthong 01
    – Context-dependent tone models
        Wang & Seneff 00, Zhou et al 04

Pitch Target Approximation Model
  Pitch target:
    – Linear model:
    – Exponentially approximated:
    – In practice, assume target well-approximated by mid-point (Sun, 02)
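The equations on this slide did not survive in the transcript. As a hedged reconstruction from the pitch target approximation literature cited in the talk (Xu et al, 99; Sun, 02), and not from the slide itself, the model is commonly written as below. The symbols are assumptions: T(t) is the underlying linear target with slope a and height b, y(t) is the surface pitch contour, β is the initial distance from the target, and λ > 0 is the rate of approach.

```latex
\begin{align}
T(t) &= a\,t + b \\
y(t) &= \beta\, e^{-\lambda t} + a\,t + b
\end{align}
```

Under this form the exponential term decays over the syllable, so y(t) approaches T(t) toward the end, which is consistent with estimating the target from the later part of the contour.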