Pavel Skrelin (Saint-Petersburg State University) Some Principles and Methods of Measuring Fo and Tempo

My main principle: acoustic data retrieved from speech material for analysis should be connected with phonetic (linguistic) features; hence the obtained values should reflect concrete features and have a clear phonetic (linguistic) interpretation, and the methods of calculation and classification should take into account not only speech production but also speech perception properties.

Fo measurements (phrase № 53 in reading and spontaneous speech, F>40)

Terms.
Smoothing: the Fo data are processed with a rectangular window 100 ms long and a pitch-synchronous shift.
Correction: pitch marks are eliminated on voiced consonants and approximants, on voiced onsets, on hesitations, and on voiced transitions between vowels and consonants.
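A minimal sketch of how such smoothing and correction might be implemented, assuming the pitch marks are available as (time, Fo) pairs and the excluded segments as labelled time intervals; the function names and example values are illustrative, not the author's implementation.

```python
# Illustrative sketch (not the author's code): pitch-synchronous smoothing of
# Fo with a 100 ms rectangular window, plus removal of pitch marks that fall
# into excluded segments (voiced consonants, approximants, voiced onsets,
# hesitations, vowel-consonant transitions).

def smooth_fo(pitch_marks, window_s=0.100):
    """pitch_marks: list of (time_s, fo_hz); returns (time_s, smoothed_fo_hz).

    The rectangular window is centred on each pitch mark (pitch-synchronous
    shift), so every output value is the plain mean of the Fo values of the
    marks that fall inside the window."""
    half = window_s / 2.0
    smoothed = []
    for t, _ in pitch_marks:
        inside = [f for (u, f) in pitch_marks if abs(u - t) <= half]
        smoothed.append((t, sum(inside) / len(inside)))
    return smoothed

def correct_fo(pitch_marks, excluded_intervals):
    """Drop pitch marks that lie inside any excluded (start_s, end_s) interval."""
    return [(t, f) for (t, f) in pitch_marks
            if not any(start <= t <= end for (start, end) in excluded_intervals)]

# Example: correct first, then smooth what remains.
marks = [(0.010, 180.0), (0.016, 182.0), (0.022, 185.0), (0.150, 210.0)]
voiced_transitions = [(0.020, 0.030)]          # e.g. an [i-t] transition
contour = smooth_fo(correct_fo(marks, voiced_transitions))
```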

Why the correction is needed on voiced transitions between vowels and consonants.
This voiced transition does not affect the perceived vowel duration:
- the whole group [kak'i tak'i];
- [kaki] in isolation: original vowel length vs. the vowel without the transition;
- [i]: original length (66 ms) vs. the vowel without the transition (37 ms);
but it does affect the duration of the following consonant and the Fo values:

Smoothed Fo data with pitch marks on the voiced [i-t] transition vs. without pitch marks on that transition (figures).

Raw Fo data: reading vs. spontaneous speech (figures).

Smoothed Fo data: reading vs. spontaneous speech (figures).

Smoothed Fo without laryngealization: reading vs. spontaneous speech (figures).

Smoothed Fo without laryngealization and some consonants: reading vs. spontaneous speech (figures).

Fo measurements (table). Columns: raw Fo data, smoothed Fo data, smoothed Fo without laryngealization, smoothed Fo without laryngealization and some consonants; each for reading and spontaneous speech. Rows: average Fo (Hz), max Fo (Hz), min Fo (Hz), range (Hz), standard deviation, mean rise slope (Hz/s), mean fall slope (Hz/s).

Tempo measurements.
Methods may differ for different tasks:
- comparison on the basis of the whole material;
- tempo monitoring, e.g. for revealing tempo modifications specific to certain IU types or to the IU position in the utterance, or for local tempo comparison between read and spontaneous realizations of the same phrase.

Tempo measurements: comparison on the basis of the whole material.
Syllables: average duration of syllables realized in spontaneous speech vs. average duration of syllables realized in reading. Example for F>40: 152/143 = 1.06.
Sounds: average duration of sounds realized in spontaneous speech vs. average duration of sounds realized in reading. Example for F>40: 67/63 = 1.06.
A possible correction: taking into account the ideal number of syllables or sounds.
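As a minimal sketch, this whole-material comparison is just a ratio of average unit durations; the Python below is illustrative (function names are invented) and simply reuses the F>40 figures from the slide.

```python
# Whole-material tempo comparison: the ratio of the average unit (syllable or
# sound) duration in spontaneous speech to that in reading.

def average_duration(total_duration_ms, n_units):
    return total_duration_ms / n_units

def tempo_ratio(avg_spontaneous_ms, avg_reading_ms):
    return avg_spontaneous_ms / avg_reading_ms

# F>40 example from the slide: 152 ms vs 143 ms per syllable, 67 ms vs 63 ms per sound.
print(round(tempo_ratio(152, 143), 2))   # 1.06 (syllables)
print(round(tempo_ratio(67, 63), 2))     # 1.06 (sounds)
```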

The simplest way is a direct comparison of sound durations in both phrases. But some sounds are longer in reading and others in spontaneous speech, which makes the tempo comparison difficult and inconsistent.

Tempo measurements: tempo monitoring. Example (speaker F<20, phrase № 12: sound durations in spontaneous speech and reading; figure).

Methods for tempo monitoring.
1. Current syllable duration / average syllable duration:
current syllable duration = IU duration / number of syllables;
average syllable duration = net sound material duration / number of syllables.
Not good, because the result depends on the syllable structures in the current IU, so it requires a normalization that takes into account the average syllable structure (C/V coefficient) and the current one.

Methods for tempo monitoring.
2. Average sound duration in the current IU / average sound duration:
average sound duration in the current IU = IU duration / number of sounds;
average sound duration = net sound material duration / number of sounds.
A possible correction: taking into account the ideal number of sounds in the whole material and in the current IU.
Example for F<20:
Reading: 1st IU 59/64 = 0.92; 2nd IU 61/64 = 0.95.
Spontaneous speech: 1st IU 70/71 = 0.99; 2nd IU 54/71 = 0.76.
Not good, because the result takes into account neither the individual average durations of each sound in the IU nor the deviations of the current duration of each sound in the IU from its average duration in the whole material.
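As a minimal sketch, methods 1 and 2 are the same ratio computed over different units (syllables for method 1, sounds for method 2). The Python below is illustrative, not the author's code; it reproduces the F<20 figures for method 2 given above.

```python
# Methods 1 and 2: average unit duration in the current IU divided by the
# average unit duration over the whole material.

def iu_tempo_coefficient(avg_unit_duration_in_iu_ms, avg_unit_duration_overall_ms):
    """avg_unit_duration_in_iu_ms = IU duration / number of units in the IU;
    avg_unit_duration_overall_ms = net material duration / number of units overall."""
    return avg_unit_duration_in_iu_ms / avg_unit_duration_overall_ms

# Method 2, F<20 example (average sound durations in ms):
print(round(iu_tempo_coefficient(59, 64), 2))   # reading, 1st IU -> 0.92
print(round(iu_tempo_coefficient(61, 64), 2))   # reading, 2nd IU -> 0.95
print(round(iu_tempo_coefficient(70, 71), 2))   # spontaneous, 1st IU -> 0.99
print(round(iu_tempo_coefficient(54, 71), 2))   # spontaneous, 2nd IU -> 0.76
```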

Methods for tempo monitoring.
3. Average sound duration in the current IU / averaged sound duration in the IU:
average sound duration in the current IU = IU duration / number of sounds;
averaged sound duration = sum of the average sound durations (on the basis of the whole material) of the sounds in the IU / number of sounds in the IU.
(figures)

The same data in a clearer view (figures).

Methods for tempo monitoring.
3. Average sound duration in the current IU / averaged sound duration in the IU:
average sound duration in the current IU = IU duration / number of sounds;
averaged sound duration = sum of the average sound durations (on the basis of the whole material) / number of sounds.
Example for F<20:
Reading: 1st IU 59/72 = 0.82; 2nd IU 61/56 = 1.09.
Spontaneous speech: 1st IU 70/75 = 0.93; 2nd IU 54/66 = 0.82.
A possible correction: taking into account the ideal number of sounds in the whole material and in the current IU, and the average durations of pre-stressed and post-stressed vowels.
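A minimal sketch of method 3, assuming per-sound mean durations over the whole material are available in a dictionary; the function name, phone labels, and durations below are invented for illustration.

```python
# Method 3: average sound duration in the current IU divided by the mean of
# the per-sound average durations (over the whole material) of the sounds
# that make up this IU.

def iu_tempo_vs_expected(iu_duration_ms, iu_phones, mean_phone_duration_ms):
    """iu_phones: phone labels of the IU, in order;
    mean_phone_duration_ms: phone label -> average duration over the material."""
    avg_in_iu = iu_duration_ms / len(iu_phones)
    expected_avg = sum(mean_phone_duration_ms[p] for p in iu_phones) / len(iu_phones)
    return avg_in_iu / expected_avg

# Hypothetical usage (phone labels and mean durations are made up):
means = {"k": 60.0, "a": 85.0, "t": 55.0, "i": 70.0}
print(round(iu_tempo_vs_expected(240, ["k", "a", "t", "i"], means), 2))   # 0.89
```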

Methods for tempo monitoring.
4. Rob van Son's proposal (Z-values): "As Finnish and Dutch (and Russian?) use quantities on (some) phonemes, this is not a good way to define tempo. We had a PhD student (Xue Wang) who developed a very nice way to define 'local' tempo as the Z value of the phoneme (i.e., LocalTempo = (PhonemeDuration - MeanPhonemeDuration) / StandDeviation for each phoneme). The local speaking rate is then the mean of these values over an utterance."
Example for F<20:
Reading: 1st IU …; 2nd IU 0.26.
Spontaneous speech: 1st IU …; 2nd IU -0.41.
There is no comprehensible relation between the values and linguistic features.
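A minimal sketch of the quoted Z-value approach, assuming per-phone means and standard deviations estimated over the whole material; the function name and the statistics below are invented for illustration.

```python
# Method 4 (Rob van Son / Xue Wang): local tempo of an IU as the mean z-score
# of its phone durations, with mean and standard deviation estimated per
# phone over the whole material.

def local_tempo_z(phone_durations_ms, phone_labels, phone_stats):
    """phone_stats: phone label -> (mean_ms, stdev_ms)."""
    z_values = []
    for dur, label in zip(phone_durations_ms, phone_labels):
        mean_ms, stdev_ms = phone_stats[label]
        z_values.append((dur - mean_ms) / stdev_ms)
    return sum(z_values) / len(z_values)

# Hypothetical usage (statistics are made up):
stats = {"k": (60.0, 12.0), "a": (85.0, 20.0), "t": (55.0, 10.0)}
print(round(local_tempo_z([66, 95, 50], ["k", "a", "t"], stats), 2))   # 0.17
```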

Method comparison (IU-1 and IU-2, reading vs. spontaneous speech):
2. Average sound duration in current IU / average sound duration: IU-1 reading 0.92, spont. 0.99; IU-2 reading 0.95, spont. 0.76.
3. Average sound duration in current IU / averaged sound duration in the IU: IU-1 reading 0.82, spont. 0.93; IU-2 reading 1.09, spont. 0.82.
4. Mean for Z-values: IU-1 reading …, spont. …; IU-2 reading 0.26, spont. -0.41.