
AN ACOUSTIC PROFILE OF SPEECH EFFICIENCY
R.J.J.H. van Son, Barbertje M. Streefkerk, and Louis C.W. Pols
Institute of Phonetic Sciences / ACLC, University of Amsterdam, Herengracht 338, 1016 CG Amsterdam, The Netherlands
ICSLP2000, Beijing, China, Oct. 20, 2000

INTRODUCTION
Speech is "efficient":
- Important components are emphasized
- Less important ones are de-emphasized
Two mechanisms:
1) Prosody: lexical stress and sentence accent (prominence)
2) Predictability: frequency of occurrence (tested) and context (not tested)

MECHANISMS FOR EFFICIENT SPEECH
- Speech emphasis should mirror importance, which largely corresponds to unpredictability
- Prosodic structure distributes emphasis according to importance (lexical stress, sentence accent / prominence)
- Speakers can (de-)emphasize according to supposed (un)importance
- Speech production mechanisms can facilitate redundant speech or hamper unpredictable speech

QUESTIONS
- Can the distribution of emphasis or reduction be completely explained from prosody (lexical stress and sentence accent / prominence)?
- If not, can we identify a speech production mechanism that would assist efficiency in speech? E.g. preprogrammed articulation of redundant and/or high-frequency syllable-like segments?

SPEECH MATERIAL (DUTCH)
- Single male speaker: vowels and consonants; matched informal and read speech; 791 matched VCV pairs
- Polyphone: vowels only; 273 speakers (out of 5000); telephone speech; 1244 read sentences
- Segmented with a modified HMM recognizer (Xue Wang)
- Corpus sizes: number of realizations of vowels and consonants
[Table: realizations per corpus (single-speaker consonants, single-speaker vowels, Polyphone vowels), broken down by lexical stress (stressed/unstressed) and sentence accent / prominence (-/+); cell counts not recoverable]

METHODS: SPEECH PREPARATION
Single-speaker corpus:
- All 2 x 791 VCV segments hand-labeled
- Sentence accent also determined by hand
- 22 native listeners identified consonants from this corpus
Polyphone corpus:
- Automatically labeled using a pronunciation lexicon and a modified HMM recognizer
- 10 judges marked prominent words (prominence 1-10)
Word and syllable -log2(frequencies) for both corpora were determined from Dutch CELEX.
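The -log2(frequency) measure can be sketched as follows. This is a minimal illustration with hypothetical counts; CELEX itself is a licensed lexical database and its actual frequency fields are not reproduced here.

```python
import math

# Hypothetical word counts standing in for Dutch CELEX frequency data.
counts = {"de": 500000, "spraak": 850, "fonetiek": 12}
total = sum(counts.values())

def neg_log2_frequency(word):
    """Return -log2 of a word's relative frequency: rarer words score higher."""
    return -math.log2(counts[word] / total)

for word in counts:
    print(word, round(neg_log2_frequency(word), 2))
```

The same computation applies to syllable counts; higher values mark less predictable (and, on the efficiency hypothesis, less reduced) units.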

METHODS: ANALYSIS (single-speaker corpus, consonants and vowels)
- Duration in ms (vowels and consonants)
- Contrast (vowels only): F1/F2 distance to (300, 1450) Hz in semitones
- Spectral Center of Gravity (CoG) (vowels and consonants): weighted mean frequency in semitones at the point of maximum energy
- log2(Perplexity), calculated from the confusion matrices of the consonant identification task
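Two of these measures can be sketched directly. The Euclidean combination of the F1 and F2 semitone axes in the contrast measure is an assumption of this sketch (the slide specifies only "distance in semitones"), and the confusion-matrix row below is illustrative, not taken from the paper.

```python
import math

def hz_to_semitones(f, ref):
    """Convert a frequency in Hz to semitones relative to a reference frequency."""
    return 12.0 * math.log2(f / ref)

def vowel_contrast(f1, f2):
    """Distance of (F1, F2) to the neutral point (300, 1450) Hz, in semitones.
    Combining the two axes Euclideanly is an assumption of this sketch."""
    return math.hypot(hz_to_semitones(f1, 300.0), hz_to_semitones(f2, 1450.0))

def log2_perplexity(response_counts):
    """log2(perplexity) of a listener-response distribution, i.e. its entropy
    in bits. `response_counts` is one row of an identification confusion matrix."""
    total = sum(response_counts)
    probs = [c / total for c in response_counts if c > 0]
    return sum(-p * math.log2(p) for p in probs)

# A vowel at the neutral point has zero contrast.
print(vowel_contrast(300, 1450))    # 0.0
# A consonant identified correctly by all 22 listeners: log2(perplexity) = 0.
print(log2_perplexity([22, 0, 0]))  # 0.0
```

A fully reduced vowel thus scores 0 on contrast, and a consonant whose identity listeners cannot recover at all (uniform confusions over n responses) scores log2(n) bits.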

METHODS: ANALYSIS (Polyphone corpus, vowels only)
- Loudness in sone
- Spectral Center of Gravity (CoG): weighted mean frequency in semitones, averaged over the segment
- Prominence (1-10): the number of 'PROMINENT' listener judgements
  0-5 is considered Unaccented
  6-10 is considered Accented
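The binarization of the prominence scores into accent classes is simple enough to state as a one-liner; this sketch only restates the thresholds given above.

```python
def accent_label(prominence_votes):
    """Map the number of 'PROMINENT' listener judgements (0-10) to the binary
    accent class used in the analysis: 0-5 -> unaccented, 6-10 -> accented."""
    if not 0 <= prominence_votes <= 10:
        raise ValueError("prominence votes must be between 0 and 10")
    return "accented" if prominence_votes >= 6 else "unaccented"

print(accent_label(3))  # unaccented
print(accent_label(8))  # accented
```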

CONSISTENCY OF MEASUREMENTS
Correlation coefficients between factors (filled symbols: p <= 0.01):
- Single-speaker corpus: consonants (n=1582), vowels (n=2025)
- Polyphone corpus (n=22496)
Factor pairs plotted: Duration x CoG, Duration x Px, CoG x Px, Duration x Contrast, Loudness x CoG, Contrast x CoG
Legend: Duration in ms; Loudness in sones; CoG: Spectral Center of Gravity (semitones); Px: log2(Perplexity), plotted is -R; Contrast: F1/F2 distance to (300, 1450) Hz (semitones)
[Figure: correlation values not recoverable]
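The correlation coefficients reported in these slides are standard Pearson R values; a minimal pure-Python sketch is below. The measurement values are illustrative only, and the paper's actual statistics (correlations computed after accounting for phoneme identity, stress, and accent) are not reproduced here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples
    (both samples must have nonzero variance)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative values only: duration (ms) and CoG (semitones) for five segments.
durations = [45, 60, 80, 95, 120]
cogs = [10.1, 11.5, 12.0, 13.2, 14.8]
print(round(pearson_r(durations, cogs), 3))
```

In practice one would also test the coefficient for significance (the slides mark p <= 0.01 with filled symbols); that step is omitted from this sketch.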

CONSONANT REDUCTION VERSUS FREQUENCY OF OCCURRENCE
Correlation coefficients, single-speaker corpus (n=1582); filled symbols: p <= 0.01
- Factors: Duration, CoG, Perplexity (against syllable and word frequency)
- Syllable and word frequencies were correlated (R=0.230, p=0.01)
Legend: CoG: Spectral Center of Gravity (semitones); Perplexity: log2(Perplexity), plotted is -R
[Figure: correlation values not recoverable]

VOWEL REDUCTION VERSUS FREQUENCY OF OCCURRENCE
Correlation coefficients, single-speaker corpus (n=2025); filled symbols: p <= 0.01
- Syllable and word frequencies were correlated (R=0.280, p<=0.01)
Legend: Duration in ms; Contrast: F1/F2 distance to (300, 1450) Hz (semitones); CoG: Spectral Center of Gravity (semitones)
[Figure: correlation values not recoverable]

DISCUSSION OF SINGLE-SPEAKER DATA
- There are consistent correlations between frequency of occurrence and "acoustic reduction" (duration, CoG and contrast), but not with consonant identification (perplexity)
- Correlations for syllable frequencies tend to be larger than those for word frequencies (p <= 0.01)
- Correlations were found after accounting for phoneme identity, lexical stress and sentence accent

PROMINENCE VERSUS VOWEL REDUCTION AND FREQUENCY OF OCCURRENCE
Correlation coefficients, Polyphone corpus (n=22496); filled symbols: p <= 0.01
- Factors: Loudness, CoG, syllable frequency, word frequency
Legend: Loudness (sone); CoG: Spectral Center of Gravity (semitones); syllable and word frequencies as -log2(freq)
[Figure: correlation values not recoverable]

VOWEL REDUCTION VERSUS FREQUENCY OF OCCURRENCE
Correlation coefficients, Polyphone corpus (n=22496); filled symbols: p <= 0.01
- Accent: + Prom > 5, - Prom <= 5
- Syllable and word frequencies were correlated (R=0.316, p<=0.01)
Legend: Loudness (sone); CoG: Spectral Center of Gravity (semitones)
[Figure: correlation values not recoverable]

DISCUSSION OF POLYPHONE DATA
- Perceived prominence correlates with "acoustic vowel reduction" (loudness, CoG) and frequency of occurrence (syllable and word)
- There are small but consistent correlations between "acoustic vowel reduction" and frequency of occurrence
- Correlations were found after accounting for vowel identity, lexical stress and prominence

CONCLUSIONS
- LEXICAL STRESS and SENTENCE ACCENT / PROMINENCE cannot explain all of the "efficiency" of speech: FREQUENCY OF OCCURRENCE, and possibly CONTEXT in general, are needed for a full account
- A SYLLABARY which speeds up (and reduces) the articulation of "stored", high-frequency syllables with respect to "computed", rare syllables might explain at least part of our data

SPOKEN LANGUAGE CORPUS: How Efficient is Speech?
8-10 speakers, ~60 minutes of speech each (fixed and variable materials):
- Informal story telling and retold stories: ~15 min
- Reading continuous texts: ~15 min
- Reading isolated (pseudo-)sentences: ~20 min
- Word lists: ~5 min
- Syllable lists: ~5 min

MEASURING SPEECH EFFICIENCY
Speaking style differences (informal, retold, read, sentences, lists)
Predictability:
- Frequency of occurrence (words and syllables)
- In context (language models)
- Cloze tests
- Shadowing (RT or delay)
Acoustic reduction:
- Segment identification
- Duration
- Spectral reduction