AN ACOUSTIC PROFILE OF SPEECH EFFICIENCY R.J.J.H. van Son, Barbertje M. Streefkerk, and Louis C.W. Pols Institute of Phonetic Sciences / ACLC University.


1 AN ACOUSTIC PROFILE OF SPEECH EFFICIENCY R.J.J.H. van Son, Barbertje M. Streefkerk, and Louis C.W. Pols Institute of Phonetic Sciences / ACLC University of Amsterdam, Herengracht 338, 1016 CG Amsterdam, The Netherlands tel: +31 20 5252183; fax: +31 20 5252197 email: Rob.van.Son@hum.uva.nl ICSLP2000, Beijing, China, Oct. 20, 2000

2 INTRODUCTION
Speech is "efficient":
– Important components are emphasized
– Less important ones are de-emphasized
Two mechanisms:
1) Prosody: Lexical Stress and Sentence Accent (Prominence)
2) Predictability: Frequency of Occurrence (tested) and Context (not tested)

3 MECHANISMS FOR EFFICIENT SPEECH
Speech emphasis should mirror importance, which largely corresponds to unpredictability
Prosodic structure distributes emphasis according to importance (lexical stress, sentence accent / prominence)
Speakers can (de-)emphasize according to supposed (un)importance
Speech production mechanisms can facilitate redundant speech or hamper unpredictable speech

4 QUESTIONS
Can the distribution of emphasis or reduction be completely explained from Prosody? (Lexical stress and Sentence Accent / Prominence)
If not, can we identify a speech production mechanism that would assist efficiency in speech? E.g., preprogrammed articulation of redundant and/or high-frequency syllable-like segments?

5 SPEECH MATERIAL (DUTCH)
Single Male Speaker: Vowels and Consonants
– Matched Informal and Read speech, 791 matched VCV pairs
Polyphone: Vowels only
– 273 speakers (out of 5000), telephone speech, 1244 read sentences
– Segmented with a modified HMM recognizer (Xue Wang)
Corpora sizes: Number of realizations of vowels and consonants

Corpus                       Unstressed        Stressed          Total
Accent                       –        +        –        +
Single speaker, consonants   550      180      569      283      1582
Single speaker, vowels       812      461      528      224      2025
Polyphone, vowels            4435     4942     9603     3516     22496

Accent: Sentence accent / Prominence
Stressed/Unstressed: Lexical stress

6 METHODS: SPEECH PREPARATION
Single speaker corpus
– All 2 x 791 VCV segments hand-labeled
– Also sentence accent determined by hand
– 22 Native listeners identified consonants from this corpus
Polyphone corpus
– Automatically labeled using a pronunciation lexicon and a modified HMM recognizer
– 10 Judges marked prominent words (prominence 1-10)
Word and Syllable -log2(Frequencies) for both corpora were determined from Dutch CELEX
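The -log2 frequency measure named on this slide can be illustrated with a minimal sketch. The slide does not say how the CELEX counts were normalized, so relative frequency (count divided by corpus size) is an assumption here, and the function name, counts, and corpus size are all hypothetical.

```python
import math

def neg_log2_frequency(count, total):
    """Return -log2 of the relative frequency count/total (in bits).

    Rare items get high values, frequent items get low values.
    """
    if count <= 0 or total <= 0:
        raise ValueError("count and total must be positive")
    return -math.log2(count / total)

# Hypothetical counts and corpus size, NOT taken from CELEX.
toy_counts = {"de": 4000, "van": 2000, "spraak": 2}
toy_total = 8000

info = {word: neg_log2_frequency(n, toy_total) for word, n in toy_counts.items()}
for word, bits in info.items():
    print(f"{word}: {bits:.2f} bits")
```

On this scale a larger value means a rarer, less predictable word or syllable.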

7 METHODS: ANALYSIS Single Speaker Corpus
Consonants and Vowels
Duration in ms (vowels and consonants)
Contrast (vowels only)
– F1/F2 distance to (300, 1450) Hz in semitones
Spectral Center of Gravity (CoG) (V and C)
– Weighted mean frequency in semitones at point of maximum energy
Log2(Perplexity) from consonant identification
– Calculated from confusion matrices
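Two of the measures on this slide can be sketched as follows. The slide gives no formulas, so both are assumptions: vowel contrast is taken to be the Euclidean distance between the semitone offsets of F1 and F2 from the (300, 1450) Hz reference point, and log2(perplexity) is taken to be the Shannon entropy of the listener responses in one row of a confusion matrix. All function names are hypothetical.

```python
import math

REF_F1_HZ = 300.0   # reference point from the slide
REF_F2_HZ = 1450.0

def semitones(freq_hz, ref_hz):
    """Distance from ref_hz to freq_hz in semitones (12 per octave)."""
    return 12.0 * math.log2(freq_hz / ref_hz)

def vowel_contrast(f1_hz, f2_hz):
    """Assumed definition: Euclidean F1/F2 distance to the reference, in semitones."""
    return math.hypot(semitones(f1_hz, REF_F1_HZ), semitones(f2_hz, REF_F2_HZ))

def log2_perplexity(response_counts):
    """Assumed definition: entropy (bits) of one confusion-matrix row.

    response_counts holds how often listeners gave each response
    to a single stimulus. 0 bits means all listeners agreed.
    """
    total = sum(response_counts)
    if total <= 0:
        raise ValueError("need at least one response")
    entropy = 0.0
    for n in response_counts:
        if n > 0:
            p = n / total
            entropy -= p * math.log2(p)
    return entropy

# Hypothetical values for illustration only.
print(vowel_contrast(600.0, 1450.0))   # F1 one octave above the reference
print(log2_perplexity([11, 11]))       # 22 listeners split evenly over two responses
```

Under these assumptions a reduced vowel lies close to the reference point (small contrast), and a poorly identified consonant has a high log2(perplexity).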

8 METHODS: ANALYSIS Polyphone Corpus
Vowels only
Loudness in sone
Spectral Center of Gravity (CoG)
– Weighted mean frequency in semitones averaged over the segment
Prominence (1-10)
– The number of 'PROMINENT' listener judgements
– 0–5 is considered Unaccented
– 6–10 is considered Accented
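The CoG and the accent threshold on this slide can be sketched roughly as below. The slide does not specify the spectral weighting or the semitone reference, so this assumes a power-weighted mean frequency converted to semitones relative to a hypothetical reference `ref_hz`; only the 0–5 / 6–10 split comes from the slide.

```python
import math

def center_of_gravity_semitones(freqs_hz, powers, ref_hz=1.0):
    """Power-weighted mean frequency, expressed in semitones re ref_hz.

    Assumption: the mean is taken in Hz and then converted; the slides
    do not say whether averaging happens before or after the conversion.
    """
    total_power = sum(powers)
    if total_power <= 0:
        raise ValueError("spectrum has no energy")
    mean_hz = sum(f * p for f, p in zip(freqs_hz, powers)) / total_power
    return 12.0 * math.log2(mean_hz / ref_hz)

def is_accented(prominence_votes):
    """Slide rule: 0-5 'PROMINENT' judgements is Unaccented, 6-10 is Accented."""
    if not 0 <= prominence_votes <= 10:
        raise ValueError("expected between 0 and 10 judgements")
    return prominence_votes > 5

# Hypothetical two-bin spectrum with equal power at 1000 Hz and 3000 Hz.
cog = center_of_gravity_semitones([1000.0, 3000.0], [1.0, 1.0], ref_hz=1000.0)
print(cog)
print(is_accented(6), is_accented(5))
```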

9 CONSISTENCY OF MEASUREMENTS
Correlation coefficients between factors
[Figure: correlation coefficients for the pairs Duration x CoG, Duration x Px, CoG x Px, Duration x Contrast, Contrast x CoG and Loudness x CoG. Panels: Single Speaker consonants (n=1582), Single Speaker vowels (n=2025), Polyphone (n=22496). Filled symbols: p<=0.01]
Duration in ms
Loudness in sones
CoG: Spectral Center of Gravity (semitones)
Px: log2(Perplexity), plotted is –R
Contrast: F1/F2 distance to (300, 1450) Hz (semitones)

10 CONSONANT REDUCTION VERSUS FREQUENCY OF OCCURRENCE (correlation coefficients)
Single speaker corpus (n=1582)
[Figure: correlation coefficients for Duration, CoG and Perplexity. Filled symbols: p<=0.01]
CoG: Spectral Center of Gravity (semitones)
Perplexity: log2(Perplexity), plotted is –R
Syllable and word frequencies were correlated (R=0.230, p=0.01)

11 VOWEL REDUCTION VERSUS FREQUENCY OF OCCURRENCE (correlation coefficients)
Single speaker corpus (n=2025)
[Figure: correlation coefficients. Filled symbols: p<=0.01]
Duration in ms
Contrast: F1/F2 distance to (300, 1450) Hz (semitones)
CoG: Spectral Center of Gravity (semitones)
Syllable and word frequencies were correlated (R=0.280, p<=0.01)

12 DISCUSSION OF SINGLE SPEAKER DATA
There are consistent correlations between frequency of occurrence and “acoustic reduction” (duration, CoG and contrast), but not for consonant identification (perplexity)
Correlations for syllable frequencies tend to be larger than those for word frequencies (p <= 0.01)
Correlations were found after accounting for Phoneme identity, Lexical Stress and Sentence Accent
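The slides do not state how phoneme identity, stress and accent were "accounted for". One simple way to do this, shown here purely as an illustrative assumption and not as the authors' procedure, is to subtract the mean of each (phoneme, stress, accent) cell from both variables before correlating them. All names and data below are hypothetical.

```python
import math
from collections import defaultdict

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("zero variance")
    return sxy / math.sqrt(sxx * syy)

def center_within_cells(values, cells):
    """Subtract from each value the mean of its cell."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, c in zip(values, cells):
        sums[c] += v
        counts[c] += 1
    return [v - sums[c] / counts[c] for v, c in zip(values, cells)]

def within_cell_correlation(xs, ys, cells):
    """Correlate x and y after removing cell means from both."""
    return pearson_r(center_within_cells(xs, cells), center_within_cells(ys, cells))

# Hypothetical toy data: cells are (phoneme, stressed, accented) tuples.
cells = [("a", True, True)] * 3 + [("e", False, False)] * 3
neg_log2_freq = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
duration_ms = [10.0, 20.0, 30.0, 5.0, 15.0, 25.0]

r_raw = pearson_r(neg_log2_freq, duration_ms)
r_within = within_cell_correlation(neg_log2_freq, duration_ms, cells)
print(r_raw, r_within)
```

In the toy data the raw correlation is dominated by the difference between the two cells, while the within-cell correlation shows the relation between frequency and duration inside each cell.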

13 PROMINENCE VERSUS VOWEL REDUCTION AND FREQUENCY OF OCCURRENCE (correlation coefficients)
Polyphone corpus (n=22496)
[Figure: correlation coefficients for Loudness, CoG, Syllable freq. and Word freq. Filled symbols: p<=0.01]
Loudness (sone)
CoG: Spectral Center of Gravity (semitones)
Syllable and word frequencies (-log2(freq))

14 VOWEL REDUCTION VERSUS FREQUENCY OF OCCURRENCE (correlation coefficients)
Polyphone corpus (n=22496)
[Figure: correlation coefficients. Filled symbols: p<=0.01]
Loudness (sone)
CoG: Spectral Center of Gravity (semitones)
Syllable and word frequencies were correlated (R=0.316, p<=0.01)
Accent: + is Prom > 5, – is Prom <= 5

15 DISCUSSION OF POLYPHONE DATA
Perceived prominence correlates with “acoustic vowel reduction” (loudness, CoG) and frequency of occurrence (syllable and word)
There are small but consistent correlations between “acoustic vowel reduction” and frequency of occurrence
Correlations were found after accounting for Vowel identity, Lexical Stress and Prominence

16 CONCLUSIONS
LEXICAL STRESS and SENTENCE ACCENT / PROMINENCE cannot explain all of the “efficiency” of speech: FREQUENCY OF OCCURRENCE and possibly CONTEXT in general are needed for a full account
A SYLLABARY which speeds up (and reduces) the articulation of “stored”, high-frequency syllables with respect to “computed”, rare syllables might explain at least part of our data

17 SPOKEN LANGUAGE CORPUS How Efficient is Speech
8-10 speakers: ~60 minutes of speech each (fixed and variable materials)
– Informal story telling and retold stories: ~15 min
– Reading continuous texts: ~15 min
– Reading isolated (pseudo-)sentences: ~20 min
– Word lists: ~5 min
– Syllable lists: ~5 min

18 MEASURING SPEECH EFFICIENCY
Speaking Style differences (Informal, Retold, Read, Sentences, Lists)
Predictability
– Frequency of Occurrence (words and syllables)
– In Context (language models)
– Cloze-tests
– Shadowing (RT or delay)
Acoustic Reduction
– Segment identification
– Duration
– Spectral reduction

