COST2102 International School - Development of Multimodal Interfaces, slide 1: Analyzing complementary acoustic cues for signalling prominence in different languages

Presentation transcript:

Slide 1: Analyzing complementary acoustic cues for signalling prominence in different languages
William J. Barry, Bistra Andreeva, Jacques Koreman

Slide 2: Basis for this presentation
This talk presents the related results from three recent presentations:
- Koreman, J., Andreeva, B. & Barry, W.J. (2008). Accentuation cues in French and German. In: P.A. Barbosa, S. Madureira and C. Reis, Proc. Speech Prosody 2008, Campinas (Brazil). Campinas, Brazil: Editora RG/CNPq.
- Koreman, J., Van Dommelen, W., Sikveland, R., Andreeva, B. & Barry, W.J. (in press). Cross-language differences in the production of phrasal prominence in Norwegian and German. Proc. Nordic Prosody 2008, Helsinki (Finland).
- Barry, W.J. & Andreeva, B. (2009). Cross-language and individual differences in the production and perception of syllabic prominence. Annual Meeting SPP 1234 Sprachlautliche Kompetenz 2009, Cologne (Germany).

Slide 3: Why present this here?
- Björn Granström: “Coherence between audio and video?”, e.g. between nodding and F0 in “Båten seglede forbi”.
- Kristiina Jokinen: “To what extent does non-verbal activity, esp. gestures and facial expressions, co-occur with verbal expressions?” (culture-dependence, communicative function)
- Are there cross-cultural (cross-language) differences in the importance of acoustic and visual cues? (There are for prosodic dimensions.)
- Are they complementary? (Prosodic dimensions are.)
- What does that mean for synchrony detection? (Trouble?)
This talk only deals with the acoustics of prominence. But because prominence involves several prosodic dimensions, the data analysis may also be relevant to multimodal speech.

Slide 4: Outline
The ideas about the acoustic realization of prominence that I present here are mainly Bill Barry’s and Bistra Andreeva’s. (This is an acknowledgement, not an attempt to evade responsibility.)
From each of the three presentations:
- Research questions
- Recordings
- Measurements
- Statistical analysis
- Results
- Discussion
Conclusion and possible relevance to COST 2102

Slide 5: Research questions
- How do different languages exploit the universal means of signalling the varying prominence of words in an utterance: duration, fundamental frequency, energy and spectral properties?
- Do the different word-phonological requirements of a language affect the degree to which these properties are exploited?
  - duration (length opposition; word stress)
  - fundamental frequency (tonal word accent)
  - spectral properties (phonologized vowel reduction)

Slide 6: Project
The present work is part of a larger project funded by the German Research Council: “Cross-language and individual differences in the production and perception of syllabic prominence. Rhythm-typology revisited.” The languages investigated in the project are German, English, Norwegian, Bulgarian, Russian, French and Japanese.

Slide 7: Recordings
- Six speakers from homogeneous groups in each language
- Comparable production task across languages: varying accentuation due to different focus on critical words (CWs), elicited by questions:
  - broad
  - narrow non-contrastive (early or late)
  - narrow contrastive (early or late)
- Text replies to the questions, followed by a “dada” version
Norwegian sentences:
1. Hun Siv drar med skipet snart.
2. Han Karl tenker på fag nå.
3. Hans far brukte sagen da.
4. Min pasta blir kald til da.
6. Min stabsmann forblir bak nå.
7. Han Krister fikk skiftet mitt.
German sentences:
1. Der Mann fuhr den Wagen vor.
2. Das Bild soll nicht hässlich sein.
3. Das Kind sollte im Bett sein.
4. Der Peter kann den Film gucken.
5. Das Mädchen soll ein Bild malen.
6. Mein Vater kann Türkisch lesen.
Results given here, but checked with text versions.

Slide 8: Measurements
- Duration: duration (ms) of stressed vowels, stressed syllables, CWs and feet
- F0: mean F0 (semitones) across the stressed vowel of the CW; F0 contour, by comparing the stressed vowel in the CW with the preceding/following vowels
- Intensity: mean intensity (dB) of the stressed vowel in the CW
- Spectral balance: difference between the … Hz band and the … Hz band in the stressed vowel of the CW
- Normalized relative to the mean across corresponding units in the sentence
- Spectr. def.: F1–F3 at the middle of the stressed nucleus of the CW
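The measures above can be illustrated with a minimal Python sketch. The slides do not give the F0 reference for the semitone scale, the frequency limits of the spectral-balance bands, or the exact normalization formula, so `ref_hz`, the band energies passed in, and the choice of a ratio (for durations) versus a difference (for dB and semitone values) are all assumptions; the function names are hypothetical.

```python
import math
from statistics import mean


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert an F0 value in Hz to semitones relative to ref_hz.

    The 100 Hz default reference is an assumption; the slides do not state one.
    """
    return 12.0 * math.log2(f0_hz / ref_hz)


def spectral_balance_db(band_a_energy, band_b_energy):
    """Level difference in dB between two frequency bands (band A minus band B).

    Inputs are band energies (power, not amplitude). The band limits are
    chosen by the caller, since they are not given in the slides.
    """
    return 10.0 * math.log10(band_a_energy / band_b_energy)


def ratio_to_sentence_mean(values):
    """Normalize linear measures (e.g. durations in ms) as a ratio to the
    mean of the corresponding units in the same sentence (assumed method)."""
    m = mean(values)
    return [v / m for v in values]


def difference_from_sentence_mean(values):
    """Normalize log-scaled measures (dB, semitones) as a difference from
    the mean of the corresponding units in the same sentence (assumed method)."""
    m = mean(values)
    return [v - m for v in values]
```

For example, `ratio_to_sentence_mean([80.0, 140.0, 90.0, 110.0])` would express four stressed-vowel durations from one sentence relative to their mean, so that values above 1 mark units longer than the sentence average. Using a difference for dB and semitone values keeps them on a log scale, where subtraction already corresponds to a ratio.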

Slide 9: Statistical analysis FR-GE (Speech Prosody data)
- Multivariate ANOVAs for CW1 and CW2 separately, with independent variables:
  - language (FR, GE)
  - focus (accented, deaccented)
  - number of syllables in CW (1, 2)
- Multivariate ANOVAs per language (FR, GE)
- Stepwise discriminant analyses: cue weighting for CW1 and CW2 separately, for each language separately

Slide 10: Results: MANOVAs
Main effects of language (columns CW1, CW2; significance markers in extracted order):
- vowel dur., syllable dur., word dur., foot dur.: * *** *** - ** ******
- F0 mean, F0 difference: *** ----
- intensity, spect. balance: *** * ***
- F1, F2, F3: *** - * *** ** -
Interactions language × accentuation (columns CW1, CW2):
- vowel dur., syllable dur., word dur., foot dur.: ** * - *** -
- F0 mean, F0 difference: ***
- intensity, spect. balance, F1, F2, F3: - ** - **-**-

Slide 11: Results for duration
Figure: CW1 syllable duration and CW1 word duration, German (GE) and French (FR).

Slide 12: Results for duration
Figure: CW2 syllable duration and CW2 word duration in final foot, German (GE) and French (FR).
Effects are greater for French than for German.

Slide 13: Results: discriminant analyses
Parameters selected as predictors:
CW1
- French: mean F0, syllable dur., vowel dur., intensity
- German: intensity, mean F0, word duration, spect. balance, vowel dur., foot dur.
CW2
- French: mean F0, intensity, F0 change, vowel dur., word dur.
- German: intensity, vowel dur., mean F0, syllable dur., spect. balance

Slide 14: Discussion
- In the ANOVAs, the duration effects (accented vs. deaccented) are greater for French than for German: is the exploitation of duration in German constrained by its segmental vowel length opposition?
- Spectral balance is included as a DA predictor in German: reduction increases the accented–deaccented opposition (but there is no language × accentuation interaction in the ANOVAs).
- But the greater importance of duration in French compared to German is not so clear in the DA, probably because the acoustic cues are correlated. DA is therefore not very suitable for analyzing these data.

Slide 15: Statistical analysis NO-GE (Nordic Prosody data)
- Multivariate ANOVAs for CW1 and CW2 separately, with independent variables:
  - language (NO, GE)
  - focus (broad, early narrow, late narrow)
  - number of syllables in CW (1, 2)
- Multivariate ANOVAs per language (NO, GE)

Slide 16: Results
Table: main effects of language and interactions language × accentuation, for CW1 and CW2, on vowel, syllable, word and foot duration, F0 mean, F0 difference, intensity, spectral balance and F1, F2, F3.

Slide 17: Results: MANOVAs per language
Table: F-values for accentuation for Norwegian (NO) and German (GE), for CW1 (left) and CW2 (right), on vowel, syllable, word and foot duration, F0 mean, F0 difference, intensity, spectral balance and F1, F2, F3.
F-value = ratio of treatment to residual variance; values in brackets are not significant.

Slide 18: Results: MANOVAs per language
Table: η² values for accentuation for CW1 and CW2, Norwegian (NO) and German (GE), on vowel, syllable, word and foot duration, F0 mean, F0 difference, intensity, spectral balance and F1, F2, F3.
η² = ratio of treatment to total variance; in the original table, η² > 0.5 was marked in red and non-significant η² in grey.

Slide 19: Results
η² values are the ratio of treatment variance to total variance, and thus indicate the proportion of the total variance explained by the focus conditions.
- In Norwegian, durational cues (esp. syllable duration) distinguish the three conditions.
- In German, intensity and F0 are the strongest cues to distinguish the three conditions.
- The lack of importance of F0 in Norwegian is most likely an artefact of the different realizations of lexical tone 1 for mono- and disyllabic stimuli.
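As a minimal sketch of the two effect measures used in slides 17 to 19, assuming a simple one-way design with one group of measurements per focus condition (the study itself used multivariate ANOVAs with several factors), the F-value and η² can be computed from sums of squares. The function name and the toy data in the usage note are hypothetical.

```python
from statistics import mean


def f_and_eta_squared(groups):
    """One-way ANOVA F-value and eta squared (simplified one-factor design).

    groups: one list of measurements per focus condition,
    e.g. [broad, early_narrow, late_narrow].
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Treatment (between-condition) sum of squares
    ss_treatment = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Total sum of squares around the grand mean
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    # Residual (within-condition) sum of squares
    ss_residual = ss_total - ss_treatment
    df_treatment = len(groups) - 1
    df_residual = len(all_values) - len(groups)
    # F: treatment mean square divided by residual mean square
    f_value = (ss_treatment / df_treatment) / (ss_residual / df_residual)
    # eta squared: share of the total variation explained by the conditions
    eta_squared = ss_treatment / ss_total
    return f_value, eta_squared
```

With three toy conditions `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]`, the treatment sum of squares is 54 out of a total of 60, so η² is 0.9 and F is 27. The F-value grows with the amount of data, whereas η² stays a proportion between 0 and 1, which is why it is the more convenient measure for comparing cue strength across parameters.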

Slide 20: Results for intensity
Figure: normalized vowel intensity of CW1 and CW2 in early, late and broad focus, German and Norwegian.
- Similar patterns for (normalized) intensity in German and Norwegian
- But greater differences between early, late and broad focus in German than in Norwegian
- In Norwegian, the intensity of CW2 is lower than that of CW1 in late and broad focus, but not in German

Slide 21: Results for duration
Figure: critical word 1, syllable duration and word duration for monosyllabic (1σ) and disyllabic (2σ) CWs in early, late and broad focus, German and Norwegian.
- Greater (normalized) durational differences between early, broad and late focus in Norwegian than in German
- Similar effect for CW2

Slide 22: Results: summary
- German relies strongly on intensity to signal prominence.
- Norwegian uses duration more. But Norwegian also has a vowel length opposition and is classified as the same rhythm type as German (stress-timed), so this disconfirms the hypothesis that the use of acoustic cues depends on their phonological status in a language!
- F0 does play a role (esp. for German), but our measures do not reflect the different accent types well. → There is a difference in peak alignment of early and late/broad focus between Norwegian and German.

Slide 23: Discussion: duration in CW1
Figure: syllable duration and word duration for monosyllabic (1σ) and disyllabic (2σ) CWs in early, late and broad focus, German, Norwegian and French.

Slide 24: Discussion: F0 in monosyllabic CW1, early focus
Despite two different pitch accents for German (H*) and Norwegian (L*H), both realized as rising pitch movements, there is a lot of overlap in relative peak alignment between speakers of the two languages (as shown by statistical tests).
Figure: relative peak alignment for German and Norwegian speakers SP1–SP6.

Slide 25: Discussion: F0 in monosyllabic CW2
Figure: relative peak alignment in late and broad focus for German and Norwegian speakers SP1–SP6.
If German speakers differentiate broad from late focus, the peak is aligned earlier in broad focus than in late focus. Norwegian speakers show the opposite difference.
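The slides do not define how relative peak alignment was measured. One possible operationalization, assumed here purely for illustration, expresses the time of the F0 peak relative to the onset of the stressed vowel as a percentage of the vowel's duration, so that values above 100 mean the peak lies after the vowel offset. The functions below are a hypothetical sketch under that assumption, not the authors' procedure.

```python
def f0_peak_time(times_s, f0_hz):
    """Return the time of the highest F0 value in a contour.

    Assumes unvoiced frames have already been removed and that
    times_s and f0_hz are parallel lists.
    """
    if len(times_s) != len(f0_hz) or not f0_hz:
        raise ValueError("need non-empty, parallel time and F0 lists")
    # Index of the maximum F0 value
    i = max(range(len(f0_hz)), key=f0_hz.__getitem__)
    return times_s[i]


def relative_peak_alignment(peak_time_s, vowel_onset_s, vowel_offset_s):
    """Peak position as a percentage of the stressed vowel's duration.

    0 = vowel onset, 100 = vowel offset; values above 100 mean the peak
    falls after the vowel (hypothetical definition).
    """
    duration = vowel_offset_s - vowel_onset_s
    if duration <= 0:
        raise ValueError("vowel offset must follow vowel onset")
    return 100.0 * (peak_time_s - vowel_onset_s) / duration
```

Under this definition, an earlier peak in broad than in late focus would show up as a smaller percentage for the broad-focus tokens of the same speaker.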

Slide 26
Figure: broad, late and early focus, German SP1 and German SP4.

Slide 27
Figure: broad, late and early focus, Norwegian SP4 and Norwegian SP10 (labels: H, FOC).

Slide 28: Discussion: summary
- In French we found a fixed syllable duration (syllable timing), while in German, syllable shortening enhances word isochrony (1 vs. 2 syllables). This is not the case in Norwegian, which is also classified as stress-timed.
- Is there a phonological explanation for this? Or should we conclude that the prosodic use of acoustic cues is independent of their phonological status in a language?
- Among our parameters, F0 needs special attention because of its mixed phonetic-phonological properties. Detailed analysis in the Nordic Prosody paper.

Slide 29: Analysis of 6 languages (SPP 1234 data)
- ANOVAs with language as independent variable
- Dependent variable: mean change in values from broad to contrastive focus
- Mean change is expressed as a percentage (duration, F0) or in dB (intensity)
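A minimal sketch of this dependent variable, assuming that "mean change" is the mean of per-item changes between matched broad-focus and contrastive-focus tokens (the slides do not say whether changes were averaged per item or computed from condition means, nor whether F0 was taken in Hz). Durations and F0 values give a percentage change; intensity values, already in dB, give a difference in dB. The function names are hypothetical.

```python
from statistics import mean


def percent_change(broad, contrastive):
    """Change from broad to contrastive focus, as a percentage of the broad-focus value."""
    return 100.0 * (contrastive - broad) / broad


def mean_percent_change(pairs):
    """Mean percentage change over (broad, contrastive) pairs,
    e.g. syllable durations in ms or F0 values in Hz."""
    return mean(percent_change(b, c) for b, c in pairs)


def mean_db_change(pairs):
    """Mean change in dB over (broad, contrastive) intensity pairs.

    dB values are already logarithmic, so they are subtracted directly.
    """
    return mean(c - b for b, c in pairs)
```

For instance, a syllable lengthened from 100 ms to 150 ms and another from 200 ms to 260 ms give changes of 50% and 30%, i.e. a mean change of 40%; these per-language means are the kind of values ranked on the following slides.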

Slide 30: Results for syllable duration of [da]
Languages use the acoustic carriers of prominence to different degrees (CS = critical syllable):
- CS1: NO > FR > RU ~ GE > EN ~ BU (NO 46%, FR 32%, RU 25%, GE 22%, EN 17%, BU 16%)
- CS2: NO > FR > RU > GE ~ BU ~ EN (NO 53%, FR 38%, RU 26%, GE 17%, BU 17%, EN 14%)
Note: No apparent connection between vowel length opposition and use of duration for accentuation (in contrast to Rebecca Dauer’s claim).

Slide 31: Results for F0 in text recordings
Languages use the acoustic carriers of prominence to different degrees:
- CS1: FR > EN ~ GE > BU ~ NO > RU (FR 72%, EN 61%, GE 58%, BU 28%, NO 27%, RU 20%)
- CS2: GE ~ FR > EN > BU > RU > NO (GE 64%, FR 62%, EN 51%, BU 38%, RU 31%, NO 10%)
Note: Despite some shifts in rank among FR, EN and GE, and between NO and RU, from the early (CS1) to the late position (CS2), the general split into high- and low-dynamics groups remains (the ranking for [dada] is even more consistent).

Slide 32: Results for intensity in [dada] recordings
Languages use the acoustic carriers of prominence to different degrees (intensity differences in dB):
- CS1: BU > FR ~ GE > RU ~ EN > NO
- CS2: BU > FR = GE > EN > RU > NO
Note: Larger intensity differences for CS2 than for CS1.

Slide 33: Conclusion and possible relevance
- For each acoustic parameter, there is a hierarchy of its exploitation for signalling focus-induced prominence in different languages.
- Similar differences may exist between languages/cultures in the way they exploit different gestures (face, hand, arm, etc.) and/or in the relative exploitation of acoustic and visual cues, e.g. to signal focus or other communicative functions.
- Possibly not only correlation (synchrony), but also complementarity of parameters.

Slide 34: Thank you for your attention