COST2102 International School - Development of Multimodal Interfaces: Analyzing complementary acoustic cues for signalling prominence in different languages


Slide 1: COST2102 International School - Development of Multimodal Interfaces
Analyzing complementary acoustic cues for signalling prominence in different languages
William J. Barry, Bistra Andreeva, Jacques Koreman

Slide 2: Basis for this presentation
This talk presents the related results from three recent presentations:
- Koreman, J., Andreeva, B. & Barry, W.J. (2008). Accentuation cues in French and German. In: P.A. Barbosa, S. Madureira & C. Reis (eds.), Proc. Speech Prosody 2008, Campinas (Brazil), 613-616. Campinas, Brazil: Editora RG/CNPq.
- Koreman, J., Van Dommelen, W., Sikveland, R., Andreeva, B. & Barry, W.J. (in print). Cross-language differences in the production of phrasal prominence in Norwegian and German. Proc. Nordic Prosody 2008, Helsinki (Finland).
- Barry, W.J. & Andreeva, B. (2009). Cross-language and individual differences in the production and perception of syllabic prominence. Annual Meeting SPP 1234 Sprachlautliche Kompetenz 2009, Cologne (Germany).

Slide 3: Why present this here?
- Björn Granström: “Coherence between audio and video?”, e.g. between nodding and F0 in “Båten seglede forbi”.
- Kristiina Jokinen: “To what extent does non-verbal activity, esp. gestures and facial expressions, co-occur with verbal expressions?” (culture-dependence, communicative function)
- Are there cross-cultural (cross-language) differences in the importance of acoustic and visual cues? (There are for prosodic dimensions.)
- Are they complementary? (Prosodic dimensions are.)
- What does that mean for synchrony detection? (Trouble?)
This talk only deals with the acoustics of prominence. But because that involves several prosodic dimensions, the data analysis may also be relevant to multi-modal speech.

Slide 4: Outline
The ideas about the acoustic realization of prominence that I present here are mainly Bill Barry’s and Bistra Andreeva’s. (This is an acknowledgement, not an attempt to evade responsibility.)
- Research questions
- Recordings
- Measurements
- Statistical analysis, results and discussion from each of the three presentations
- Conclusion and possible relevance to COST 2102

Slide 5: Research questions
How do different languages exploit the universal means of signalling the varying prominence of words in an utterance?
- duration
- fundamental frequency
- energy
- spectral properties
Do the different word-phonological requirements of a language affect the degree to which these properties are exploited?
- duration (length opposition; word stress)
- fundamental frequency (tonal word-accent)
- spectral properties (phonologized vowel reduction)

Slide 6: Project
The present work is part of a larger project funded by the German Research Council: Cross-language and individual differences in the production and perception of syllabic prominence. Rhythm-typology revisited.
The languages investigated in the project are German, English, Norwegian, Bulgarian, Russian, French and Japanese. Judging from the analyses presented in this talk, article 1 covers French and German; article 2, Norwegian and German; article 3, Norwegian, French, Russian, German, English and Bulgarian.

Slide 7: Recordings
- Six speakers from homogeneous groups in each language
- Comparable production task across languages: varying accentuation due to different focus on critical words (CWs), elicited by questions:
  - broad
  - narrow non-contrastive (early or late)
  - narrow contrastive (early or late)
- Text replies to questions, each followed by a “dada” version (results given here, but checked with text versions)
Norwegian sentences:
1. Hun Siv drar med skipet snart.
2. Han Karl tenker på fag nå.
3. Hans far brukte sagen da.
4. Min pasta blir kald til da.
6. Min stabsmann forblir bak nå.
7. Han Krister fikk skiftet mitt.
German sentences:
1. Der Mann fuhr den Wagen vor.
2. Das Bild soll nicht hässlich sein.
3. Das Kind sollte im Bett sein.
4. Der Peter kann den Film gucken.
5. Das Mädchen soll ein Bild malen.
6. Mein Vater kann Türkisch lesen.

Slide 8: Measurements
- Duration: duration (ms) of stressed vowels, stressed syllables, CWs, feet
- F0: mean F0 (semitones) across the stressed vowel of the CW; F0 contour by comparison of the stressed vowel in the CW with preceding/following vowels
- Intensity: mean intensity (dB) of the stressed vowel in the CW; spectral balance = difference between the 70-1000 Hz band and the 1200-5000 Hz band in the stressed vowel of the CW
- Spectr. def.: F1–F3 at the middle of the stressed nucleus of the CW
Normalized relative to the mean across corresponding units in the sentence.
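The measures listed on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' analysis scripts: the function names, the 100 Hz reference for the semitone scale, the use of a ratio for normalization, and all sample values are hypothetical.

```python
import math


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert F0 in Hz to semitones relative to ref_hz.

    The 100 Hz reference is an assumption; the slides do not name one.
    """
    return 12.0 * math.log2(f0_hz / ref_hz)


def spectral_balance(low_band_db, high_band_db):
    """Difference (dB) between the 70-1000 Hz and the 1200-5000 Hz band levels."""
    return low_band_db - high_band_db


def normalize_to_sentence_mean(value, sentence_values):
    """Express a value relative to the mean of the corresponding units
    in the same sentence (a ratio is assumed here)."""
    sentence_mean = sum(sentence_values) / len(sentence_values)
    return value / sentence_mean


# Hypothetical stressed-vowel durations (ms) in one sentence
vowel_durations_ms = [80.0, 120.0, 100.0]
relative_duration = normalize_to_sentence_mean(120.0, vowel_durations_ms)

# Hypothetical band levels (dB) in the stressed vowel of a critical word
balance_db = spectral_balance(low_band_db=68.0, high_band_db=52.0)

print(relative_duration, hz_to_semitones(200.0), balance_db)
```

A ratio keeps durations comparable across sentences of different tempo; for measures already on a logarithmic scale (semitones, dB), subtracting the sentence mean would be the analogous choice.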

Slide 9: Statistical analysis FR-GE (Speech Prosody data)
- Multivariate ANOVAs for CW1 and CW2 separately, with independent variables:
  - language (FR, GE)
  - focus (accented, deaccented)
  - number of syllables in CW (1, 2)
- Multivariate ANOVAs per language (FR, GE)
- Stepwise discriminant analyses (cue weighting):
  - for CW1 and CW2 separately
  - for each language separately

Slide 10: Results: MANOVAs
Main effects for language (significance marks for CW1 and CW2 as they appear in the transcript; the column alignment is lost):
- vowel dur., syllable dur., word dur., foot dur.: * *** *** - ** ******
- F0 mean, F0 difference: *** ----
- intensity, spect. bal.: *** * ***
- F1, F2, F3: *** - * *** ** -
Interactions lang. × accentuation (CW1 and CW2, same caveat):
- vowel dur., syllable dur., word dur., foot dur.: ** * - *** -
- F0 mean, F0 difference: ***
- intensity, spect. bal.: ---- ----
- F1, F2, F3: - ** - **-**-

Slide 11: Results for duration
[Figure: CW1 syllable duration and CW1 word duration for GE and FR, for CWs of 1 and 2 syllables.]

Slide 12: Results for duration
[Figure: CW2 syllable duration and CW2 word duration (in final foot) for GE and FR, for CWs of 1 and 2 syllables.]
Effects greater for French than for German.

Slide 13: Results: discriminant analyses

CW1
Language   Parameter        sdc
French     mean F0           0.709
French     syllable dur.     0.665
French     vowel dur.       -0.379
French     intensity         0.328
German     intensity         0.683
German     mean F0           0.575
German     word duration     0.399
German     spect. balance   -0.209
German     vowel dur.        0.171
German     foot dur.         0.158

CW2
Language   Parameter        sdc
French     mean F0           0.962
French     intensity         0.576
French     F0 change        -0.419
French     vowel dur.        0.279
French     word dur.         0.164
German     intensity         0.932
German     vowel dur.        0.671
German     mean F0           0.515
German     syllable dur.    -0.430
German     spect. balance   -0.345

Slide 14: Discussion
- Duration effects (accented vs. deaccented) in the ANOVAs are greater for French than for German: is exploitation in German constrained by the segmental vowel length opposition?
- Spectral balance is included as a DA predictor in German: reduction increases the accented-deaccented opposition (but there is no language × accentuation interaction in the ANOVAs).
- But the importance of duration in French compared to German is not so clear in the discriminant analysis (DA), probably due to correlation between acoustic cues. DA is therefore not very suitable for analyzing these data.
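The concern about correlated cues can be checked before discriminant coefficients are interpreted. The following is a minimal sketch of a Pearson correlation between two cues; the function name and the numbers are invented for illustration and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented per-token values (NOT measurements from the talk)
intensity_db = [62.0, 65.0, 67.0, 70.0, 72.0]
vowel_dur_ms = [80.0, 95.0, 100.0, 118.0, 125.0]

r = pearson_r(intensity_db, vowel_dur_ms)
print(f"r = {r:.2f}")
```

When two predictors correlate this strongly, a stepwise procedure tends to credit whichever enters first and to shrink or flip the coefficient of the other, so the coefficient sizes no longer reflect the cues' separate contributions.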

Slide 15: Statistical analysis NO-GE (Nordic Prosody data)
- Multivariate ANOVAs for CW1 and CW2 separately, with independent variables:
  - language (NO, GE)
  - focus (broad, early narrow, late narrow)
  - number of syllables in CW (1, 2)
- Multivariate ANOVAs per language (NO, GE)

Slide 16: Results
[Tables: main effects for language, and language × accentuation interactions, for CW1 and CW2. Rows: vowel dur., syllable dur., word dur., foot dur.; F0 mean, F0 difference; intensity, spect. balance; F1, F2, F3. The per-cell significance marks are not legible in this transcript. Legible “n.s.” entries: two among the duration rows for the main effects; one among the intensity/spect. balance rows and one among the F1-F3 rows for the interactions.]

Slide 17: Results: MANOVAs per language
F-values* for accentuation for NO and GE, for CW1 and CW2

CW1
Parameter        NO     GE
vowel dur.       184    6
syllable dur.    318    92
word dur.        164    72
foot dur.        27     5
F0 mean          199    738
F0 difference    25     349
intensity        112    444
spect. balance   9      20
F1               23     70
F2               (2)    3
F3               (0)    4

CW2
Parameter        NO     GE
vowel dur.       294    (3)
syllable dur.    450    28
word dur.        245    44
foot dur.        121    10
F0 mean          47     1052
F0 difference    8      325
intensity        109    1053
spect. balance   18     107
F1               9      89
F2               15     (2)
F3               (1)    (1)

* F-value = ratio of treatment / residual variances; values in brackets n.s. at p=0.05
Additional values labelled “1 syll.” on the slide (their position in the tables is not recoverable): 47 20 292 135 348 153 505 134

Slide 18: Results: MANOVAs per language
η² values* for accentuation (for both CWs, NO and GE)

Parameter           NO CW1   NO CW2   GE CW1   GE CW2
Vowel duration      .556     .669     .038     .020
Syllable duration   .684     .756     .390     .168
Word duration       .527     .627     .335     .243
Foot duration       .155     .454     .035     .067
F0 mean             .576     .246     .837     .884
F0 difference       .145     .053     .709     .702
Intensity           .433     .428     .756     .884
Spectral balance    .057     .112     .123     .437
F1                  .134     .058     .331     .392
F2                  .012     .095     .022     .013
F3                  .003     .007     .026     .004

* η² = ratio of treatment / total variances. On the slide, η² values > 0.5 are shown in red and non-significant values in grey.

Slide 19: Results
- η² values are the ratio of treatment to total variance, and thus indicate the part of the total variance explained by the focus conditions.
- In Norwegian, durational cues (esp. syllable duration) distinguish the three conditions.
- In German, intensity and F0 are the strongest cues to distinguish the three conditions.
- The lack of importance of F0 in Norwegian is most likely an artefact of the different realizations of lexical tone 1 for mono- and disyllabic stimuli.
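The two effect measures used in these results, the F-value (treatment over residual variance) and η² (treatment over total variation), can be sketched for the simplest case. This is a univariate one-way layout with a single factor, written with sums of squares; the talk's values come from multivariate ANOVAs with further factors, so the sketch only illustrates the definitions. The function name and the numbers are made up.

```python
def one_way_anova(groups):
    """Return (F, eta_squared) for a list of groups of observations."""
    all_values = [x for group in groups for x in group]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total

    # Treatment (between-group) and residual (within-group) sums of squares
    ss_between = 0.0
    ss_within = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        ss_within += sum((x - group_mean) ** 2 for x in group)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)

    f_value = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_value, eta_squared


# Made-up normalized values for three focus conditions (NOT study data)
broad = [1.0, 2.0, 3.0]
early = [2.0, 3.0, 4.0]
late = [6.0, 7.0, 8.0]

f_value, eta_squared = one_way_anova([broad, early, late])
print(f_value, eta_squared)  # 21.0 0.875
```

Unlike F, η² is bounded between 0 and 1 and does not grow with the number of observations, which makes it the more convenient measure for comparing cues across languages.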

Slide 20: Results for intensity
- Similar patterns for (normalized) vowel intensity in German and Norwegian
- But greater differences between early, late and broad focus in German than in Norwegian
- In Norwegian, under late and broad focus, the intensity of CW2 is lower than that of CW1; this is not the case in German
[Figure: vowel intensity of CW1 and CW2 under early, late and broad focus, German and Norwegian.]

Slide 21: Results for duration, critical word 1
- Greater (normalized) durational differences between early, broad and late focus in Norwegian than in German
- Similar effect for CW2
[Figure: syllable duration and word duration of CW1 for German and Norwegian, for CWs of 1 and 2 syllables (1 σ, 2 σ), under early, late and broad focus.]

Slide 22: Results: summary
- German strongly uses intensity to signal prominence.
- Norwegian uses duration more.
  → But Norwegian also has a vowel length opposition and is classified as the same rhythm type as German (stress-timed), so this disconfirms the hypothesis that the use of acoustic cues depends on their phonological status in a language!
- F0 does play a role (esp. for German), but our measures do not reflect the different accent types well.
  → There is a difference in peak alignment of early and late/broad focus between Norwegian and German.

Slide 23: Discussion: duration in CW1
[Figure: syllable duration and word duration of CW1 for German, Norwegian and French, for CWs of 1 and 2 syllables (1 σ, 2 σ), under early, late and broad focus.]

Slide 24: Discussion: F0 in monosyllabic CW1, early focus
Despite two different pitch accents for German (H*) and Norwegian (L*H), both of them realized as rising pitch movements, there is a lot of overlap in relative peak alignment between speakers of the two languages (as shown by statistical tests).
[Figure: relative peak alignment for German and Norwegian speakers SP1-SP6.]

Slide 25: Discussion: F0 in monosyllabic CW2, late and broad focus
[Figure: relative peak alignment under late and broad focus for German and Norwegian speakers SP1-SP6.]
If German speakers differentiate broad from late focus, broad focus has earlier peak alignment than late focus. For Norwegian, the difference goes in the opposite direction.

Slide 26: [Figure: broad, late and early focus for German speakers SP1 and SP4.]

Slide 27: [Figure: broad, late and early focus for Norwegian speakers SP4 and SP10; labels “H” and “FOC”.]

Slide 28: Discussion: summary
- In French we found a fixed syllable duration (syllable-timing), while in German, syllable shortening enhances word isochrony (1 vs. 2 syllables). This is not the case in Norwegian, which is also classified as stress-timed.
- Is there a phonological explanation for this? Or should we conclude that the prosodic use of acoustic cues is independent of their phonological status in a language?
- Among our parameters, F0 needs special attention because of its mixed phonetic-phonological properties. A detailed analysis is given in the Nordic Prosody paper.

Slide 29: Analysis of 6 languages (SPP1234 data)
- ANOVAs with language as the independent variable
- Dependent variable is the mean change in values from broad to contrastive focus
- Mean change is expressed as a percentage (duration, F0) or in dB (intensity)
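The dependent variable described here can be sketched as follows. The sketch assumes that the percentage is taken relative to the broad-focus value and that changes are computed per speaker before averaging; the slides state neither, and the function names and numbers are hypothetical.

```python
def percent_change(broad, contrastive):
    """Change from broad to contrastive focus, in percent of the broad value
    (used for duration and F0)."""
    return 100.0 * (contrastive - broad) / broad


def db_change(broad_db, contrastive_db):
    """Change from broad to contrastive focus in dB (used for intensity)."""
    return contrastive_db - broad_db


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


# Hypothetical per-speaker syllable durations (ms), NOT data from the study
broad_ms = [200.0, 210.0]
contrastive_ms = [250.0, 252.0]
duration_changes = [percent_change(b, c) for b, c in zip(broad_ms, contrastive_ms)]
print(mean(duration_changes))  # 22.5

# Hypothetical per-speaker vowel intensities (dB)
broad_db = [60.0, 62.0]
contrastive_db = [63.0, 64.0]
intensity_changes = [db_change(b, c) for b, c in zip(broad_db, contrastive_db)]
print(mean(intensity_changes))  # 2.5
```

Intensity is left as a plain difference because dB is already a logarithmic (ratio) scale, whereas duration needs the division by the broad-focus value to become comparable across speakers and languages.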

Slide 30: Results for syllable duration of [da]
Languages use the acoustic carriers of prominence to different degrees (CS = Critical Syllable):
CS1: NO 46% > FR 32% > RU 25% ~ GE 22% > EN 17% ~ BU 16%
CS2: NO 53% > FR 38% > RU 26% > GE 17% ~ BU 17% ~ EN 14%
Note: No apparent connection between vowel length opposition and use of duration for accentuation (in contrast to Rebecca Dauer’s claim).

Slide 31: Results for F0 in text recordings
Languages use the acoustic carriers of prominence to different degrees:
CS1: FR 72% > EN 61% ~ GE 58% > BU 28% ~ NO 27% > RU 20%
CS2: GE 64% ~ FR 62% > EN 51% > BU 38% > RU 31% > NO 10%
Note: Despite some shift in rank between FR, EN and GE, and between NO and RU, for the early (CS1) and the late position (CS2), the generally high vs. low dynamics for the groups remain (the ranking for [dada] is even more consistent).

Slide 32: Results for intensity in [dada] recordings
Languages use the acoustic carriers of prominence to different degrees (intensities in dB):
CS1: BU 5.8 > FR 3.2 ~ GE 3.0 > RU 2.7 ~ EN 2.5 > NO 1.6
CS2: BU 6.5 > FR 5.6 = GE 5.6 > EN 4.2 > RU 3.7 > NO 2.8
Note: Larger intensity differences for CS2 than for CS1.

Slide 33: Conclusion and possible relevance
- For each acoustic parameter, there is a hierarchy of its exploitation for signalling focus-induced prominence in different languages.
- Similar differences may exist between languages/cultures in the way they exploit different gestures (face, hand, arm, etc.) and/or in the relative exploitation of acoustic/visual cues, e.g. to signal focus or other communicative functions.
- Possibly not only correlation (synchrony), but also complementarity of parameters.

Slide 34: Thank you for your attention

