Context in Multilingual Tone and Pitch Accent Recognition Gina-Anne Levow University of Chicago September 7, 2005.


1 Context in Multilingual Tone and Pitch Accent Recognition Gina-Anne Levow University of Chicago September 7, 2005

2 Roadmap
  Motivating Context
  Data Collections & Processing
  Modeling Context for Tone and Pitch Accent
  Context in Recognition
  Conclusion

3 Challenges
  Tone and Pitch Accent Recognition
    –Key component of language understanding
      Lexical tone carries word meaning
      Pitch accent carries semantic, pragmatic, discourse meaning
    –Non-canonical form (Shen 90, Shih 00, Xu 01)
      Tonal coarticulation modifies surface realization
      –In extreme cases, fall becomes rise
    –Tone is relative
      To speaker range
      –High for male may be low for female
      To phrase range, other tones
      –E.g. downstep

4 Strategy
  Common model across languages, SVM classifier
    –Acoustic-prosodic model: no word label, POS, lexical stress info
      No explicit tone label sequence model
    –English, Mandarin Chinese (also Cantonese)
  Exploit contextual information
    –Features from adjacent syllables
      Height, shape: direct, relative
    –Compensate for phrase contour
  Analyze impact of
    –Context position, context encoding, context type
    –More than 20% relative improvement over no context
      Preceding context gives greater enhancement than following

5 Data Collection & Processing
  English: (Ostendorf et al, 95)
    –Boston University Radio News Corpus, f2b
    –Manually ToBI annotated, aligned, syllabified
    –Pitch accent aligned to syllables
      Unaccented, High, Downstepped High, Low
    –(Sun 02, Ross & Ostendorf 95)
  Mandarin:
    –TDT2 Voice of America Mandarin Broadcast News
    –Automatically force aligned to anchor scripts (CUSonic)
    –High, Mid-rising, Low, High falling, Neutral

6 Local Feature Extraction
  Uniform representation for tone, pitch accent
    –Motivated by Pitch Target Approximation Model
      Tone/pitch accent target exponentially approached
      –Linear target: height, slope (Xu et al, 99)
  Scalar features:
    –Pitch, Intensity max, mean (Praat, speaker normalized)
    –Pitch at 5 points across voiced region
    –Duration
    –Initial, final in phrase
  Slope:
    –Linear fit to last half of pitch contour
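The scalar and slope features listed on this slide can be sketched as follows. This is a minimal illustration, not the author's implementation: the function names, the dictionary keys, and the choice of equally spaced sample indices are assumptions, and the input is assumed to be an already speaker-normalized pitch track for one syllable's voiced region (at least two frames).

```python
from statistics import mean

def linear_slope(xs, ys):
    """Least-squares slope of ys against xs (0.0 if xs has no spread)."""
    mx, my = mean(xs), mean(ys)
    denom = sum((x - mx) ** 2 for x in xs)
    if denom == 0:
        return 0.0
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / denom

def local_features(times, pitch, n_points=5):
    """Hypothetical per-syllable feature extractor.

    times, pitch: equal-length lists of frame times (seconds) and
    speaker-normalized pitch values over the voiced region.
    """
    n = len(pitch)
    # Assumption: the "5 points" are equally spaced across the voiced region.
    idx = [round(i * (n - 1) / (n_points - 1)) for i in range(n_points)]
    half = n // 2
    return {
        "pitch_max": max(pitch),
        "pitch_mean": mean(pitch),
        "pitch_points": [pitch[i] for i in idx],
        "duration": times[-1] - times[0],
        # Slope: linear fit to the last half of the pitch contour.
        "slope": linear_slope(times[half:], pitch[half:]),
    }
```

Intensity features and the phrase-initial/final flags would be added to the same dictionary in the same way; they are omitted here to keep the sketch short.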

7 Context Features
  Local context:
    –Extended features
      Pitch max, mean, adjacent points of preceding, following syllables
    –Difference features
      Difference between
      –Pitch max, mean, mid, slope
      –Intensity max, mean
      Of preceding, following and current syllable
  Phrasal context:
    –Compute collection average phrase slope
    –Compute scalar pitch values, adjusted for slope
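The two local context encodings can be illustrated with a short sketch. Everything here is hypothetical naming: the dictionary keys, the "current minus neighbour" sign convention, and the reading of "adjacent points" as the last pitch point of the preceding syllable and the first pitch point of the following one are assumptions, since the slide does not specify them.

```python
# Keys compared by the difference encoding (names are illustrative).
DIFF_KEYS = ("pitch_max", "pitch_mean", "pitch_mid", "slope",
             "intensity_max", "intensity_mean")

def extended_features(prev, nxt):
    """Extended encoding: copy raw values from neighbouring syllables."""
    feats = {}
    if prev is not None:
        feats["prev_pitch_max"] = prev["pitch_max"]
        feats["prev_pitch_mean"] = prev["pitch_mean"]
        # Assumed meaning of "adjacent point": last sample of the previous syllable.
        feats["prev_last_point"] = prev["pitch_points"][-1]
    if nxt is not None:
        feats["next_pitch_max"] = nxt["pitch_max"]
        feats["next_pitch_mean"] = nxt["pitch_mean"]
        # Assumed meaning of "adjacent point": first sample of the next syllable.
        feats["next_first_point"] = nxt["pitch_points"][0]
    return feats

def difference_features(prev, cur, nxt):
    """Difference encoding: current syllable minus each neighbour."""
    feats = {}
    for key in DIFF_KEYS:
        if prev is not None:
            feats["diff_prev_" + key] = cur[key] - prev[key]
        if nxt is not None:
            feats["diff_next_" + key] = cur[key] - nxt[key]
    return feats
```

Passing `None` for a neighbour drops that side, which is how the "preceding only" and "following only" conditions in the experiments could be produced from the same code.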

8 Classification Experiments
  Classifier: Support Vector Machine
    –Linear kernel
    –Multiclass formulation (SVMlight, Joachims), LibSVM (Chang & Lin 01)
    –4:1 training / test splits
  Experiments: Effects of
    –Context position: preceding, following, none, both
    –Context encoding: Extended/Difference
    –Context type: local, phrasal
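As a rough illustration of the 4:1 training/test splits, the sketch below partitions a list of examples into five rotating splits, each holding out one fifth for testing. The slide does not say how the splits were drawn, so the interleaved assignment and the function name are assumptions; the SVM training itself is left out.

```python
def four_to_one_splits(items, n_folds=5):
    """Yield (train, test) pairs; each test part holds about 1/n_folds of the items.

    Assumption: items are assigned to folds by index modulo n_folds.
    """
    for k in range(n_folds):
        test = [x for i, x in enumerate(items) if i % n_folds == k]
        train = [x for i, x in enumerate(items) if i % n_folds != k]
        yield train, test
```

Each `(train, test)` pair would then be passed to the classifier, and the reported accuracy could be the test accuracy of one split or the mean over all five.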

10 Results: Local Context
  Context          Mandarin Tone   English Pitch Accent
  Full             74.5%           81.3%
  Extend PrePost   74.0%           80.7%
  Extend Pre       74.0%           79.9%
  Extend Post      70.5%           76.7%
  Diffs PrePost    75.5%           80.7%
  Diffs Pre        76.5%           79.5%
  Diffs Post       69.0%           77.3%
  Both Pre         76.5%           79.7%
  Both Post        71.5%           77.6%
  No context       68.5%           75.9%

13 Discussion: Local Context
  Any context information improves over none
    –Preceding context information consistently improves over none or following context information
      English: Generally more context features are better
      Mandarin: Following context can degrade
    –Little difference in encoding (Extend vs Diffs)
  Consistent with phonological analysis (Xu) that coarticulation is carryover, not anticipatory

14 Results & Discussion: Phrasal Context
  Phrase Context   Mandarin Tone   English Pitch Accent
  Phrase           75.5%           81.3%
  No Phrase        72%             79.9%
  Phrase contour compensation enhances recognition
  Simple strategy
  Use of non-linear slope compensation may improve results further
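One plausible reading of the phrase contour compensation (compute a collection-wide average phrase slope, then adjust pitch values for it) is sketched below. The function names are hypothetical, and measuring time from the start of each phrase is an assumption, not something the slides state.

```python
from statistics import mean

def fit_slope(ts, fs):
    """Least-squares slope of pitch values fs against times ts."""
    mt, mf = mean(ts), mean(fs)
    denom = sum((t - mt) ** 2 for t in ts)
    if denom == 0:
        return 0.0
    return sum((t - mt) * (f - mf) for t, f in zip(ts, fs)) / denom

def average_phrase_slope(phrases):
    """phrases: list of (times, pitch) pairs, times measured from phrase start."""
    return mean([fit_slope(ts, fs) for ts, fs in phrases])

def compensate(t, f0, avg_slope):
    """Remove the linear phrase trend from a pitch value observed at time t."""
    return f0 - avg_slope * t
```

With a negative average slope (declination), a pitch value late in the phrase is raised by the compensation, so that it can be compared with values near the phrase start.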

15 Conclusion
  Employ common acoustic representation
    –Tone (Mandarin), pitch accent (English)
      Cantonese, recent experiments
  SVM classifiers - linear kernel: 76%, 81%
  Local context effects:
    –Up to >20% relative reduction in error
    –Preceding context greatest contribution
      Carryover vs anticipatory
  Phrasal context effects:
    –Compensation for phrasal contour improves recognition
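The relative reduction in error can be checked against the accuracies in the Results: Local Context table. The sketch below uses the "No context" row as the baseline and the best row per language (76.5% for Mandarin, 81.3% for English); the function name is illustrative.

```python
def relative_error_reduction(baseline_acc, improved_acc):
    """Fraction of the baseline error removed; accuracies are given in percent."""
    base_err = 100.0 - baseline_acc
    new_err = 100.0 - improved_acc
    return (base_err - new_err) / base_err

# Mandarin tone: 68.5% (no context) to 76.5% (best preceding-context row).
mandarin = relative_error_reduction(68.5, 76.5)   # 8.0 / 31.5
# English pitch accent: 75.9% (no context) to 81.3% (full context).
english = relative_error_reduction(75.9, 81.3)    # 5.4 / 24.1
```

Both values come out above 0.20 (roughly 25% for Mandarin and 22% for English), which is consistent with the ">20%" figure on the slide.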

16 Current & Future Work
  Application of model to different languages
    –Cantonese, Dschang (Bantu family)
      Cantonese: ~65% acoustic only, 85% w/segmental
  Integration of additional contextual influence
    –Topic, turn, discourse structure
    –HMSVM, GHMM models
  http://people.cs.uchicago.edu/~levow/projects/tai
    –Supported by NSF Grant #: 0414919

17 Confusion Matrix (English)
  Rows: recognized tone; columns: manually labeled tone
                Unaccented       High             Low             D.S. High
  Unaccented    95% (888/934)    25% (110/440)    100% (12/12)    53.5% (61/114)
  High          4.6% (43/934)    73% (322/440)    0%              38.5% (44/114)
  Low           0%               0%               0%              0%
  D.S. High     0.3% (3/934)     2% (8/440)       0%              8% (9/114)

19 Confusion Matrix (Mandarin)
  Rows: recognized tone; columns: manually labeled tone
                 High            Mid-Rising       Low            High-Falling    Neutral
  High           84% (38/45)     9% (5/56)        5% (1/20)      13% (9/68)      0%
  Mid-Rising     6.7% (3/45)     78.6% (44/56)    10% (2/20)     7% (5/68)       27.3% (3/11)
  Low            0%              3.6% (2/56)      70% (14/20)    7% (5/68)       27.3%
  High-Falling   7.4% (4/45)     3.6% (2/56)      10% (2/20)     70% (48/68)     0%
  Neutral        0%              5.3% (3/56)      5% (1/20)      1.5% (1/68)     45%
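The counts in the English and Mandarin confusion matrices can be turned back into overall accuracies as a sanity check. In the sketch below, rows are recognized labels and columns are manually labeled ones. The English "Low" row is taken to be all zeros, and the Mandarin Neutral column counts that the slide gives only as percentages (3 and 5 out of 11) are inferred from those percentages and the column total; both are assumptions.

```python
def overall_accuracy(matrix):
    """Diagonal count divided by the total count."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def per_class_accuracy(matrix):
    """Diagonal count divided by the column (manually labeled) total."""
    n = len(matrix)
    col_totals = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    return [matrix[c][c] / col_totals[c] for c in range(n)]

# Order: Unaccented, High, Low, Downstepped High.
english = [
    [888, 110, 12, 61],
    [43, 322, 0, 44],
    [0, 0, 0, 0],      # assumed: nothing was recognized as Low
    [3, 8, 0, 9],
]

# Order: High, Mid-Rising, Low, High-Falling, Neutral.
mandarin = [
    [38, 5, 1, 9, 0],
    [3, 44, 2, 5, 3],
    [0, 2, 14, 5, 3],  # last count inferred from 27.3% of 11
    [4, 2, 2, 48, 0],
    [0, 3, 1, 1, 5],   # last count inferred from 45% of 11
]

english_acc = overall_accuracy(english)    # 1219 / 1500
mandarin_acc = overall_accuracy(mandarin)  # 149 / 200
```

Under these assumptions the totals reproduce the "Full" context accuracies reported in the results table: about 81.3% for English and 74.5% for Mandarin.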

21 Related Work
  Tonal coarticulation:
    –Xu & Sun, 02; Xu 97; Shih & Kochanski 00
  English pitch accent
    –X. Sun, 02; Hasegawa-Johnson et al, 04; Ross & Ostendorf 95
  Lexical tone recognition
    –SVM recognition of Thai tone: Thubthong 01
    –Context-dependent tone models
      Wang & Seneff 00, Zhou et al 04

22 Pitch Target Approximation Model
  Pitch target:
    –Linear model:
    –Exponentially approximated:
    –In practice, assume target well-approximated by mid-point (Sun, 02)
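The formulas for the linear model and its exponential approximation are not present in this transcript. As a hedged reconstruction from the general pitch target approximation literature, and not necessarily in the notation of the original slide, the model is commonly written as:

```latex
% Linear pitch target with slope a and height b
T(t) = a t + b

% Surface pitch contour approaching the target exponentially
y(t) = \beta e^{-\lambda t} + a t + b, \qquad \lambda > 0
```

Here $\beta$ is the distance between the surface pitch and the target at syllable onset ($t = 0$) and $\lambda$ controls how quickly the contour approaches the target. All symbol names are assumptions. Because the exponential term decays over the syllable, the later part of the contour lies closest to the target, which fits the slide's choices of fitting the slope to the last half of the pitch contour and approximating the target by the mid-point.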

