Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer.


Beyond the Phoneme: A Juncture-Accent Model of Spoken Language
Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang
International Computer Science Institute, 1947 Center Street, Berkeley, CA
{steveng, hmcarvey, leahh,

Acknowledgements and Thanks
Research Funding: U.S. Department of Defense, U.S. National Science Foundation

For Further Information Consult the web site:

OVERTURE The Central Challenge for Models of Speech Recognition

The Serial Frame Perspective on Speech
Traditional models of speech recognition assume the identity of a phonetic segment is derived from a detailed spectral profile of the acoustic signal computed for each time interval (frame) of speech

Phonemic Beads on a String Illustrated
In traditional models of speech recognition, words are conceptualized as mere sequences of phonetic segments (“phones”) strung together like “beads on a string”
No quarter is provided for stress accent or other syllabic properties

Language - The Traditional Perspective
The “classical” view of spoken language posits a quasi-arbitrary relation between the lower and higher tiers of linguistic organization
Cat = [k] + [ae] + [t]
Cat = /k/ + /ae/ + /t/
ASR systems focus on decoding words from sequences of phones

A Challenge for the “Phonemic Beads on a String” Approach to Speech Recognition Pronunciation Variability

Pronunciation Variability of Real Speech
Pronunciation patterns encountered in everyday life are extremely diverse
There are literally dozens of ways in which common words are pronounced (as the following two slides illustrate for the word “and”, based on manual phonetic annotation of a corpus comprising telephone dialogues)

How Many Pronunciations of “and”?
[Table spanning two slides; columns: N, Pronunciation; the canonical pronunciation is marked]

Pronunciation Variability of Real Speech
There are literally dozens of ways in which common words are pronounced, as the following slide illustrates for the 20 most frequent words from the same corpus (Switchboard), which together account for 35% of the word tokens in the corpus

How Many Different Pronunciations?
[Table columns: Rank, Word, N, #Pron, Most Common Pronunciation, MCP %Total]
The 20 most frequent words account for 35% of the tokens
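The per-word quantities named in the table (token count N, number of distinct pronunciations, most common pronunciation and its share of the tokens) could be tallied roughly as follows. This is a minimal sketch, not the procedure used for the Switchboard analysis: the `tokens` list is hypothetical toy data and `pronunciation_stats` is an illustrative helper name.

```python
from collections import Counter, defaultdict

# Hypothetical toy annotation: (word, phonetic transcription) pairs.
# These entries are illustrative only and are NOT taken from Switchboard.
tokens = [
    ("and", "ae n d"),
    ("and", "ae n"),
    ("and", "ix n"),
    ("and", "ae n"),
    ("the", "dh ax"),
    ("the", "dh iy"),
    ("the", "dh ax"),
]

def pronunciation_stats(tokens):
    """Return {word: (n_tokens, n_variants, most_common_pron, mcp_share)}."""
    by_word = defaultdict(Counter)
    for word, pron in tokens:
        by_word[word][pron] += 1
    stats = {}
    for word, counts in by_word.items():
        n = sum(counts.values())
        pron, freq = counts.most_common(1)[0]
        stats[word] = (n, len(counts), pron, freq / n)
    return stats

stats = pronunciation_stats(tokens)
for word, (n, n_variants, mcp, share) in stats.items():
    print(f"{word}: N={n}, #Pron={n_variants}, MCP='{mcp}', MCP share={share:.0%}")
```

Keeping one `Counter` per word makes it easy to report both the number of variants and the dominance of the most common one, which are the two quantities the slide contrasts.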

QUESTION How do listeners decode the speech signal given the large amount of pronunciation variation?

PART ONE Anatomy of a Syllable

Language - A Syllable-Centric Perspective
A more empirically grounded perspective of spoken language focuses on the SYLLABLE as the interface between “sound” and “meaning”
Within this framework the relationship between the syllable and the higher and lower tiers is non-arbitrary and statistically systematic

The Importance of the Syllable
The analyses to follow are all linked, in some fashion, to syllable structure
In order to highlight patterns germane to variation in segmental duration it is necessary to partition the data in terms of syllable position (as well as stress accent level)
As a consequence, we will examine the onsets, codas and nuclei of syllables separately in order to gain insight into the underlying patterns
What is an onset? What is a nucleus? What is a coda?
The following slides provide a brief (and gentle) introduction to syllable structure

Syllable and Phonetic Segment Illustrated
Syllables generally consist of three constituents: ONSET, NUCLEUS, CODA
Virtually all syllables contain a NUCLEUS, which is VOCALIC (by definition)
Most (but not all) syllables also contain an ONSET (usually a CONSONANT)
Many syllables contain a CODA (also typically a CONSONANT)
The most common syllable form in English is Onset + Nucleus + Coda (“Nine”)
Followed in popularity by Onset + Nucleus (“Two”)
Onset segments often differ in significant ways from coda segments
[Figure labels: “J” = JUNCTURE; OGI Numbers95 corpus]
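The onset / nucleus / coda partition described on this slide can be sketched in a few lines. This is an illustrative sketch only: it assumes the input is a single syllable given as Arpabet-style phone labels, the `VOCALIC` set and the function name `split_syllable` are choices made for this example, and counting syllabic consonants (el, em, en) as nuclei is an assumption.

```python
# Arpabet-style vocalic symbols; treating syllabic consonants (el, em, en)
# as possible nuclei is an assumption made for this sketch.
VOCALIC = {
    "aa", "ae", "ah", "ao", "aw", "ax", "ay", "eh", "er", "ey",
    "ih", "ix", "iy", "ow", "oy", "uh", "uw", "el", "em", "en",
}

def split_syllable(phones):
    """Split one syllable's phone list into (onset, nucleus, coda).

    The first vocalic phone is taken as the nucleus; everything before it
    is the onset and everything after it is the coda.
    """
    for i, phone in enumerate(phones):
        if phone in VOCALIC:
            return phones[:i], phone, phones[i + 1:]
    raise ValueError("no vocalic nucleus found")

print(split_syllable(["n", "ay", "n"]))  # "nine": onset + nucleus + coda
print(split_syllable(["t", "uw"]))       # "two": onset + nucleus, empty coda
```

An empty onset or coda comes back as an empty list, matching the slide's point that only the nucleus is (virtually) obligatory.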

PART TWO Spectro-Temporal Profiles

The Spectro-Temporal Profile (STeP)
Certain specific (and important) properties of the syllable are not well represented in terms of the traditional 2.5-D spectrographic representation
Stress Accent and Juncture are two such properties
A different representation, based on the log, critical-band energy profile across frequency and time, can provide the requisite detail
As shown in “miniature” below (and as shown in expanded form on the following slides)
STePs are derived from averages of hundreds of individual instances

Spectro-Temporal Profile - Disyllabic Word
[Figure: “Seven”, [s] [eh] [vx] [en], full-spectrum perspective, OGI Numbers95; labels: accented syllable, juncture, unaccented syllable, mean duration]

Spectro-Temporal Profile - Disyllabic Word
[Figure: “Seven”, [s] [eh] [vx] [en], high-frequency perspective, OGI Numbers95; labels: accented syllable, juncture, unaccented syllable, mean duration]
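The slides state only that a STeP is a log, critical-band energy profile across frequency and time, averaged over hundreds of instances. One way such an average might be formed is sketched below. How the original work aligned instances of different durations is not stated, so the nearest-frame time normalization, the energy floor and the function names are all assumptions of this sketch.

```python
import math

def time_normalize(frames, n_out):
    """Resample a list of frames to n_out frames by nearest-index lookup."""
    n_in = len(frames)
    return [frames[min(n_in - 1, int(i * n_in / n_out))] for i in range(n_out)]

def average_profile(instances, n_out=4, floor=1e-10):
    """Average log band energies over instances.

    Each instance is a list of frames; each frame is a list of per-band
    energies (e.g. one value per critical band). Returns n_out frames of
    mean log10 energy per band.
    """
    n_bands = len(instances[0][0])
    total = [[0.0] * n_bands for _ in range(n_out)]
    for frames in instances:
        for t, frame in enumerate(time_normalize(frames, n_out)):
            for b, energy in enumerate(frame):
                # Floor the energy so silent bands do not produce log(0).
                total[t][b] += math.log10(max(energy, floor))
    return [[value / len(instances) for value in row] for row in total]

# Two hypothetical instances of different lengths (2 and 4 frames, 2 bands).
short = [[10.0, 10.0], [10.0, 10.0]]
long_ = [[1000.0, 1000.0]] * 4
profile = average_profile([short, long_], n_out=2)
print(profile)
```

Averaging in the log domain rather than the linear domain keeps a few very loud instances from dominating the profile, which seems in keeping with the slide's emphasis on log energy.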

PART THREE Scientific Approach to Speech Recognition

A Scientific Approach to Speech Recognition
Ascertain the contribution of
(1) phonetic segment (and feature) classification,
(2) phonetic segmentation,
(3) stress accent, and
(4) syllable position
to ASR performance,
using the OGI Numbers95 Corpus as a controlled (limited-vocabulary) corpus
and a relatively transparent recognition engine utilizing the following articulatory-based features: manner and place of articulation, voicing, vowel height, lip-rounding, spectral dynamics and segment length,
which are explicitly tied to syllable position (i.e., onset, nucleus and coda) and stress-accent level
We will be comparing the “baseline” system (entirely automatic recognition) with an entirely “fabricated” set of input data (derived from hand-labeled phonetic annotation + autoSAL) as well as a “half-way house” system that is partially automatic and partially not (it is given manually derived phonetic segmentation, as well as whether each segment is vocalic or not)

Numbers95 Recognition – Stress Accent Impact
Entirely Stress-Accent Dependent Results
Word Error Rate
Fabricated 1.3%
Half-way House 2.0%
Baseline 5.6%
The half-way house system is much closer in performance to the fabricated-data version than to the baseline system, suggesting that accurate phonetic segmentation is extremely important for enhanced ASR performance, as is knowledge of the location of the syllabic nucleus
Stress-accent information is most important for the vocalic nucleus; without it WER increases by 10-20%
It is also important for the coda; without it WER increases by 7-15%
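The claim that the half-way house system sits much closer to the fabricated condition than to the baseline can be checked with simple arithmetic on the three reported word error rates. The sketch below uses only the figures from the slide; the helper names `gap` and `relative_reduction` are illustrative.

```python
# Word error rates (%) as reported on the slide.
wer = {"fabricated": 1.3, "half_way_house": 2.0, "baseline": 5.6}

def gap(a, b):
    """Absolute WER difference in percentage points between systems a and b."""
    return abs(wer[a] - wer[b])

def relative_reduction(better, worse):
    """Fraction of the worse system's WER removed by the better system."""
    return (wer[worse] - wer[better]) / wer[worse]

gap_to_fabricated = gap("half_way_house", "fabricated")
gap_to_baseline = gap("half_way_house", "baseline")

print(f"half-way house vs fabricated: {gap_to_fabricated:.1f} points")
print(f"half-way house vs baseline:   {gap_to_baseline:.1f} points")
print(f"relative reduction over baseline: "
      f"{relative_reduction('half_way_house', 'baseline'):.0%}")
```

On these figures the half-way house system is 0.7 points from the fabricated condition and 3.6 points from the baseline.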

Numbers95 Recognition – Pronunciation Impact
Effect of pronunciation variation as a function of syllable position, where the “canonical” pronunciation is potentially fixed for each syllable position separately (or “All” together)
“Standard” refers to the regular recognition system
[Table: Word Error Rate (%); columns: Standard, Onset, Nucleus, Coda, All; rows: Fabricated, Half-way House, Baseline]
Conclusions:
Onset segments are most canonical
Coda segments are least canonical
Therefore, it is important to provide for pronunciation variation in ASR systems

Numbers95 – Syllable Position Importance
Effect of pronunciation variation as a function of syllable position, where each syllabic constituent is “neutralized” with respect to lexical matching (i.e., each element is factored out of the decoding process separately)
“Standard” refers to the regular recognition system
[Table: Word Error Rate (%); columns: Standard, Onset, Nucleus, Coda; rows: Fabricated, Half-way House, Baseline]
Neutralization of the onset and nucleus elements exerts a greater impact on ASR performance than neutralization of codas
Conclusion: Onsets and nuclei are most important for lexical access in an ASR system (at least for the Numbers95 corpus)

PART FOUR Being Phonetically and Prosodically Annotated

Phonetic Transcription of Spontaneous English
Telephone dialogues of 5-10 minutes duration, from the SWITCHBOARD corpus, have been phonetically transcribed (labeled and segmented).
Most of this material has been annotated manually:
4 hours labeled at the phone level and segmented at the syllabic level
1 hour labeled and segmented at the phonetic-segment level
The remaining material was segmented at the phonetic-segment level using automatic methods.
45 minutes of hand-labeled stress-accent material.
An additional four hours of stress-accent material was automatically labeled (though unused in the current analysis).
There is a lot of diversity in the material transcribed: it spans speech of both genders (ca. 50/50%), reflecting a wide range of American dialectal variation, speaking rate and voice quality.
Transcription system: a variant of Arpabet, with phonetic diacritics such as _gl, _cr, _fr, _n, _vl, _vd.
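The diacritic convention can be illustrated with a small parser. This is only a minimal sketch, assuming that diacritics are attached to a base Arpabet-style symbol as underscore-delimited suffixes (for instance a hypothetical label such as `t_gl`). The function name `split_label` is illustrative and not part of any released tool, and the set below contains only the diacritics named on the slide.

```python
# Diacritics named on the slide; the full inventory may be larger.
KNOWN_DIACRITICS = {"gl", "cr", "fr", "n", "vl", "vd"}


def split_label(label):
    """Split a label like 't_gl' into its base symbol and diacritic list.

    Assumption: diacritics follow the base symbol, each introduced by '_'.
    """
    base, *marks = label.split("_")
    unknown = [m for m in marks if m not in KNOWN_DIACRITICS]
    if unknown:
        raise ValueError(f"unrecognised diacritic(s): {unknown}")
    return base, marks


if __name__ == "__main__":
    print(split_label("t_gl"))  # hypothetical label with one diacritic
    print(split_label("ae"))    # plain symbol, no diacritics
```

Rejecting unknown suffixes, rather than silently passing them through, makes it easier to spot labels that fall outside the assumed inventory.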

Phonetic Transcription of Spontaneous English The Data are Available at ….


Annotation of Stress Accent
Forty-five minutes of the phonetically annotated portion of the Switchboard corpus was manually labeled with respect to stress accent.
Three levels of accent were distinguished: Heavy, Light and None.
(In actuality, labelers assigned a “1” to fully accented syllables, a “null” to completely unaccented syllables, and a “0.5” to all others.)
An example of the annotation (attached to the vocalic nucleus) is shown below (where the accent levels could not be derived from a dictionary).
In this example most of the syllables are unaccented, with two labeled as lightly accented (0.5) and one other labeled as very lightly accented (0.25).
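The labeling scheme can be restated as a small mapping from the labelers' marks to the three named levels. This is a sketch under stated assumptions: "null" is represented as Python `None`, and any intermediate value (0.5, or the 0.25 seen in the example) is treated as "light". The function name `accent_level` is hypothetical.

```python
from typing import Optional


def accent_level(label: Optional[float]) -> str:
    """Map a labeler's mark to a named accent level.

    Assumptions: None stands for the "null" mark (unaccented), 1 means
    fully accented, and every value in between counts as light accent.
    """
    if label is None:
        return "none"
    if label >= 1.0:
        return "heavy"
    return "light"


if __name__ == "__main__":
    for mark in (1.0, 0.5, 0.25, None):
        print(mark, "->", accent_level(mark))
```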

The data are available at …. Annotation of Stress Accent


Automatic Labeling of Stress Accent This forty-five minutes of hand-labeled phonetic and prosodic annotation from the Switchboard corpus was used as training data for development of an Automatic Stress Accent Labeling System (AutoSAL)

How Good is AutoSAL?
There is a 79% concordance between human and machine accent labels when the tolerance level is a quarter-step.
There is 97.5% concordance when the tolerance level is half a step.
This degree of concordance is as high as that exhibited by two highly trained (human) transcribers.
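Concordance within a tolerance can be computed as the share of syllables whose two labels differ by no more than that tolerance. The sketch below is illustrative only: it assumes accent labels on a numeric scale where "null" is written as 0.0, a quarter-step is 0.25 and a half-step is 0.5. The function name and the toy labels are invented and are not AutoSAL output.

```python
def concordance(human, machine, tolerance):
    """Fraction of label pairs that differ by at most `tolerance`."""
    if len(human) != len(machine):
        raise ValueError("label sequences must have the same length")
    if not human:
        raise ValueError("label sequences must not be empty")
    agree = sum(1 for h, m in zip(human, machine) if abs(h - m) <= tolerance)
    return agree / len(human)


if __name__ == "__main__":
    # Invented labels for four syllables (not corpus data).
    human = [1.0, 0.5, 0.0, 0.5]
    machine = [0.75, 0.5, 0.5, 1.0]
    print(concordance(human, machine, 0.25))  # quarter-step tolerance
    print(concordance(human, machine, 0.5))   # half-step tolerance
```

Widening the tolerance can only keep or raise the score, which is why the half-step figure is necessarily at least as high as the quarter-step figure.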

PART FIVE Stress Accent and Syllable Position

The Importance of Syllable Structure
Before going into the details of durational variation at the segmental level, we briefly examine some general patterns of pronunciation variation that are conditioned by syllable position and stress accent.
These data serve to illustrate the sort of variation observed that is conditioned by position within the syllable.

Pronunciation Variation – Syllable and Accent
Pronunciation variation is systematic at the level of the syllable, particularly when stress accent is also taken into account.
BOTH syllable structure and accent level are required for a full accounting.
[Figure panels: All Segments, Deletions, Insertions, Substitutions; regions: ONSET Territory, NUCLEUS Territory, CODA Territory]
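The three deviation types (deletions, insertions, substitutions) can be tallied by aligning a canonical phone sequence with the sequence actually transcribed. The sketch below uses Python's `difflib.SequenceMatcher` for the alignment. This is a stand-in chosen for brevity, not the procedure used in the study, and it does not guarantee a minimum-edit alignment. The function name and the example sequences are hypothetical.

```python
from difflib import SequenceMatcher


def count_deviations(canonical, realized):
    """Count deletions, insertions and substitutions between phone lists."""
    counts = {"deletions": 0, "insertions": 0, "substitutions": 0}
    matcher = SequenceMatcher(a=canonical, b=realized, autojunk=False)
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        n_a, n_b = a1 - a0, b1 - b0
        if tag == "delete":
            counts["deletions"] += n_a
        elif tag == "insert":
            counts["insertions"] += n_b
        elif tag == "replace":
            # Pair phones one-to-one as substitutions; any surplus on
            # either side counts as a deletion or an insertion.
            paired = min(n_a, n_b)
            counts["substitutions"] += paired
            counts["deletions"] += n_a - paired
            counts["insertions"] += n_b - paired
    return counts


if __name__ == "__main__":
    # Hypothetical example: final consonant dropped.
    print(count_deviations(["ae", "n", "d"], ["ae", "n"]))
    # Hypothetical example: vowel replaced.
    print(count_deviations(["dh", "ae", "t"], ["dh", "eh", "t"]))
```

To reproduce a breakdown by syllable position, the same tally would be run separately on the onset, nucleus and coda portions of each syllable.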

PART SIX Durational Properties of Pronunciation Variation

Analysis of Durational Properties of Speech
The following analyses are conditioned on stress accent level and (for the most part) syllable position.
We’ll begin with analyses illustrating the patterns associated with three levels of stress accent (heavy, light and none) to show the graded nature of the durational properties pertaining to syllable and segment duration.
However, for purposes of illustrative clarity, many of the slides will show only two levels of accent (heavy and none) in order to delineate the differences in duration associated with stress accent level.
Under such conditions, the durational properties associated with light accent are generally intermediate between heavy accent and none.

Syllable Duration - Across Syllable Forms
There is a broad range of syllable structures observed in spoken English.
The CV and CVC forms cover ca. 60% of the syllables.
Together, the V, VC, CV and CVC forms account for 85% of syllables.
The CVCC and CCVC (complex syllable) forms account for another 10%.
(V = Vowel, C = Consonant)
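Coverage figures of this kind come from reducing each syllable to its consonant/vowel skeleton and counting the skeletons. The following is a minimal sketch, assuming syllables are given as lists of plain Arpabet symbols with no diacritics. The vowel set is the common Arpabet inventory rather than the exact one used in the study, and the toy syllables are invented.

```python
from collections import Counter

# Common Arpabet vowel symbols (an assumption; the study's set may differ).
ARPABET_VOWELS = {
    "aa", "ae", "ah", "ao", "aw", "ax", "ay", "eh", "er",
    "ey", "ih", "ix", "iy", "ow", "oy", "uh", "uw",
}


def syllable_form(phones):
    """Return the C/V skeleton of a syllable, e.g. ['k','ae','t'] -> 'CVC'."""
    return "".join("V" if p.lower() in ARPABET_VOWELS else "C" for p in phones)


def form_coverage(syllables):
    """Return the proportion of syllables that take each C/V form."""
    counts = Counter(syllable_form(s) for s in syllables)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {form: n / total for form, n in counts.items()}


if __name__ == "__main__":
    toy = [["dh", "ax"], ["k", "ae", "t"], ["ay"], ["t", "uw"]]
    print(form_coverage(toy))
```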

Syllable Duration - Across Syllable Forms
It is unsurprising that syllable duration is largely a function of the number of segments within the syllable (as shown in the graph below).
Note the systematic lengthening of the syllable for each form as the accent level increases from “NONE” to “LIGHT” to “HEAVY”.
This pattern is representative of accent’s impact on duration (as we’ll see).
[Figure: Canonical Syllable Forms; V = Vowel, C = Consonant]

Syllable Duration - Accent Level/Syllable Form
This graph shows the same data as the previous slides, but from the perspective of just two accent levels (“HEAVY” and “NONE”).
The heavily accented syllables are generally % longer than their unaccented counterparts.
The disparity in duration is most pronounced for syllable forms with one or no consonants (i.e., V, VC, CV).
This pattern implies that accent has the greatest impact on vocalic duration.
[Figure: Canonical Syllable Forms; V = Vowel, C = Consonant]

Nucleus Duration - Accent Level/Syllable Form
The hypothesis delineated on the previous slide (that accent has the most profound impact on vocalic duration) is confirmed in the graph below.
Vowels in accented syllables (of all forms) are at least twice as long as their unaccented counterparts.
This pattern implies that the syllable nucleus absorbs a major component of accent’s impact (at least as far as duration is concerned).
[Figure: Canonical Syllable Forms]
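A comparison such as "at least twice as long" amounts to a ratio of mean durations grouped by accent level. The sketch below shows that calculation on invented numbers. The durations are placeholders chosen for easy arithmetic, not measurements from the corpus, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_duration_by_accent(records):
    """Average duration per accent level from (accent, duration_ms) pairs."""
    groups = defaultdict(list)
    for accent, duration_ms in records:
        groups[accent].append(duration_ms)
    return {accent: mean(values) for accent, values in groups.items()}


def heavy_to_none_ratio(records):
    """Ratio of mean heavy-accent duration to mean unaccented duration."""
    means = mean_duration_by_accent(records)
    return means["heavy"] / means["none"]


if __name__ == "__main__":
    # Invented nucleus durations in milliseconds (illustration only).
    toy = [("heavy", 200), ("heavy", 180), ("none", 90), ("none", 100)]
    print(mean_duration_by_accent(toy))
    print(heavy_to_none_ratio(toy))
```

Running the same grouping separately for each syllable form would give the per-form comparison that the graph presents.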

PART SEVEN Stress Accent and the Vocalic Nucleus

Stress Accent’s Impact on the Vocalic Nucleus
Because the pattern of stress accent’s impact on vocalic duration is relatively uniform across syllable form, it is likely that the specific structure of the syllable has relatively little impact on vocalic duration.
As a consequence, the remaining analyses pertaining to accent’s impact on vocalic duration collapse the data across syllable form.
We now examine vocalic duration in somewhat greater detail and illustrate how duration, stress accent and vocalic identity interact.

The Spatial Patterning of Duration in Vocalic Nuclei

A Brief Primer on Vocalic Acoustics
Vowel quality is generally thought to be a function primarily of two articulatory properties, both related to the motion of the tongue.
The front-back plane is most closely associated with the second formant frequency (or more precisely F2 - F1) and the volume of the front-cavity resonance.
The height parameter is closely linked to the frequency of F1.
In the classic vowel “triangle,” segments are positioned in terms of the tongue positions associated with their production, as follows:

Spatial Patterning of Duration et al.
In the following slides duration is plotted on a 2-D grid, where the x-axis represents the (hypothetical) front-back tongue position (and hence remains a constant throughout the plots to follow).
The y-axis serves as the dependent measure, expressed in terms of either duration or the proportion of fully stressed (or unstressed) nuclei.

Vocalic Duration and Vowel Height
The spatial patterning of vocalic segments is systematic with respect to duration.
Low vowels, be they diphthongs or monophthongs, are longer (on average) than high vowels.
Thus, duration appears to be highly correlated with vowel height.
But … the situation is a little more complicated than first appearances would suggest.
[Figure panels: All nuclei, Diphthongs, Monophthongs]

Durational Differences - Stressed/Unstressed
There is a large dynamic range in duration between accented and unaccented vocalic nuclei.
Moreover, diphthongs and tense, low monophthongs tend to exhibit a larger dynamic range than the lax monophthongs.
[Figure labels: Canonical Syllable Forms; Lax monophthongs]

Vocalic Identity Among Unstressed Nuclei
The high, lax monophthongs are almost always unstressed.
The low vowels, be they monophthongs or diphthongs, are rarely unstressed.
The high diphthongs and high/mid, tense monophthongs occupy an intermediate position.

Vocalic Identity Among Fully Stressed Nuclei
The high vowels are rarely fully stressed.
The low vowels, be they monophthongs or diphthongs, are far more likely to be fully stressed.
An intermediate degree of stress accounts for the other vocalic instances (but will not be addressed here).

Vocalic Variation – Importance of Stress Accent
The vowels of heavily accented syllables are (mostly) pronounced canonically.
Low vowels are largely the province of accented syllables, and high vowels the province of unaccented syllables.
Moreover, there’s a lexical bias towards high vowels for unaccented forms.
That’s reinforced in patterns of deviation from canonical pronunciation.
[Figure panels: Canonical Pronunciations, Non-Canonical Pronunciations]

Vocalic Height Deviation from Canonical
Vowels are more likely to RISE in height than to descend when unaccented.
Vocalic lowering of height is rare.
Most deviations from the canonical maintain vowel height.
More than a single height step deviation is uncommon.
Virtually all 2-step height deviations occur in unaccented syllables.
[Figure panels: Amount of Change, Direction of Change]
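Height deviation can be expressed as a signed number of steps between the canonical vowel and the vowel actually produced. The sketch below assumes a simplified three-level height scale (low = 0, mid = 1, high = 2) and a hypothetical assignment of a few Arpabet vowels to those levels. Neither the scale nor the assignment is taken from the study.

```python
# Hypothetical height classes for a few Arpabet vowels (assumption).
VOWEL_HEIGHT = {
    "iy": 2, "ih": 2, "uw": 2, "uh": 2,   # high
    "ey": 1, "eh": 1, "ow": 1, "ah": 1,   # mid
    "ae": 0, "aa": 0, "ay": 0, "aw": 0,   # low
}


def height_change(canonical, realized):
    """Signed height steps: positive = raised, negative = lowered."""
    return VOWEL_HEIGHT[realized] - VOWEL_HEIGHT[canonical]


def describe(canonical, realized):
    """Summarise the direction and size of a height deviation."""
    steps = height_change(canonical, realized)
    if steps == 0:
        return "height maintained"
    direction = "raised" if steps > 0 else "lowered"
    return f"{direction} by {abs(steps)} step(s)"


if __name__ == "__main__":
    print(describe("ae", "ih"))  # low vowel realized as high vowel
    print(describe("eh", "ih"))  # mid vowel realized as high vowel
    print(describe("ih", "iy"))  # same height class
```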

The Vowel Space Under (Full) Stress (Accent)
In heavily accented nuclei there is a relatively even distribution of segments across the vowel space, with a slight bias towards the front and central vowels.
(Canonical Vowels Only)

The Vowel Space Without (Stress) Accent
In unaccented syllables vowels are confined largely to the high-front and high-central sectors of the articulatory space.
The low and mid vowels “get creamed”.
(Canonical Vowels Only)

The Vowel Spaces Compared
Stress accent exerts a profound effect on the character of the vowel space.
High vowels are largely associated with unaccented syllables.
Low vowels are mostly associated with accented forms.
This distinction between accented and unaccented syllables is of profound importance for understanding (and modeling) pronunciation variation.
[Figure panels: Heavily Accented, Unaccented; Canonical Vowels Only]

Is It Stress? Vocalic Identity? Or What?
Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse.
For any given vocalic class, stressed segments are longer (on average). The durational disparity is most pronounced among the low vowels and the diphthongs.
Low vowels tend to be much longer in duration than high vowels. This is the case even for diphthongs.
Low vowels are rarely without some measure of stress accent. This is true for monophthongs as well as diphthongs.
High vowels are RARELY fully stressed. This is particularly so for monophthongs, but also applies to diphthongs.
Thus, stress accent appears to be intricately involved with vocalic identity.

PART EIGHT Stress Accent’s Impact on Syllable Onsets

Stress Accent and Syllable Onsets
The onset is often cited as the key syllabic constituent with respect to “lexical access”.
It is therefore of interest to ascertain how the onset’s duration behaves as a function of accent level.
Because of the onset’s key role in lexical access, one might assume that its duration would be relatively stable across accent level.
The following slides suggest that this assumption is INCORRECT, and that the structure of the onset is more complex (and more interesting) than initial intuition would suggest.

Onset Duration - Accent Level/Syllable Form (Canonical Syllable Forms)
The duration of the syllable onset varies significantly as a function of accent level (though not quite as much as exhibited by vocalic constituents). Onset duration is similar across syllable forms (except that segments comprising complex onsets [i.e., CCVC] are slightly shorter). The duration of unaccented onsets is similar across syllable forms.

Onset Duration - Accent Level/Syllable Form (Canonical Syllable Forms)
Onsets of accented syllables are generally 50-60% longer than their unaccented counterparts. Although this durational difference is not quite as large as that observed for vocalic nuclei, it is still substantial (and mostly consistent across forms).
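To make the "50-60% longer" figure concrete, here is a minimal sketch of how such a relative lengthening could be computed from two mean durations. The function name and the millisecond values are illustrative assumptions, not measurements from the corpus.

```python
def relative_lengthening(accented_ms: float, unaccented_ms: float) -> float:
    """Percent by which the accented duration exceeds the unaccented one."""
    if unaccented_ms <= 0:
        raise ValueError("unaccented duration must be positive")
    return (accented_ms - unaccented_ms) / unaccented_ms * 100.0


# Hypothetical mean onset durations in milliseconds (NOT corpus values).
print(relative_lengthening(90.0, 60.0))  # 50.0
```

With these made-up inputs, an accented onset of 90 ms against an unaccented onset of 60 ms falls at the lower end of the range quoted on the slide.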

Place of Articulation – A Brief Primer
The tongue contacts (or nearly contacts) the roof of the mouth in producing many of the consonantal sounds in English.
Anterior: Labial [p] [b] [m]; Labio-dental [f] [v]; Inter-dental [th] [dh]
Central: Alveolar [t] [d] [n] [s] [z]
Posterior: Palatal [sh] [zh]; Velar [k] [g] [ng]
From Daniloff (1973)
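The three-way place grouping in this primer can be written down as a small lookup table. The sketch below only encodes the segments listed on the slide; the names `PLACE_CLASSES` and `place_class` are hypothetical, and segments the primer does not list (for example the approximants treated later as "chameleons") simply return `None`.

```python
from typing import Optional

# Segment symbols exactly as listed in the primer slide.
PLACE_CLASSES = {
    "anterior": {"p", "b", "m", "f", "v", "th", "dh"},
    "central": {"t", "d", "n", "s", "z"},
    "posterior": {"sh", "zh", "k", "g", "ng"},
}


def place_class(segment: str) -> Optional[str]:
    """Return the place class of a segment, or None if the primer omits it."""
    for name, members in PLACE_CLASSES.items():
        if segment in members:
            return name
    return None


print(place_class("dh"))  # anterior
print(place_class("l"))   # None (not listed in the primer)
```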

Segmental Identity and Stress Accent
It is of interest to compare accent’s impact on segmental duration with its impact on segmental realization (i.e., whether the segment is realized canonically or not). Usually, non-canonical realizations are manifest as segmental deletions. The pattern of segmental realization bears some correspondence to durational variation as a function of accent level, but also exhibits some interesting differences (which are potentially significant for models of phonetic organization). Before we examine the segmental patterns in detail, a brief primer on the interpretation of these data is presented.

Road Map - How to Interpret the Data
Compare the numbers in the YELLOW and ORANGE columns. Most numbers in the YELLOW / ORANGE columns will be similar, indicating that the phonetic realization of the segment is the canonical form. A large disparity between columns is marked with a blue box. READY? OK, let’s go!
Can = Canonical form; Trans = Transcribed (i.e., phonetically realized)
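The comparison described in this road map (canonical count versus transcribed count, with large disparities highlighted) can be sketched as follows. The slides do not say how large a disparity must be to earn a blue box, so the 25% threshold here is an assumption, as are the function name and all counts in the example.

```python
from typing import Dict, List, Tuple


def flag_disparities(
    counts: Dict[str, Tuple[int, int]], threshold: float = 0.25
) -> List[str]:
    """Return segments whose transcribed count departs from the canonical count.

    counts maps a segment label to (canonical, transcribed) token counts.
    threshold is a hypothetical cut-off on the relative difference.
    """
    flagged = []
    for segment, (canonical, transcribed) in counts.items():
        if canonical == 0:
            # No canonical tokens: any transcribed token is an insertion.
            if transcribed > 0:
                flagged.append(segment)
        elif abs(transcribed - canonical) / canonical > threshold:
            flagged.append(segment)
    return flagged


# Made-up counts for illustration only (NOT the values from the slides).
example = {"dh": (100, 60), "p": (100, 98), "q": (0, 12)}
print(flag_disparities(example))  # ['dh', 'q']
```

A transcribed count well below the canonical count corresponds to deletion, while a transcribed count with no canonical counterpart corresponds to insertion, as with the glottal stop [q].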

Syllable Onset Statistics – ANTERIOR Place
Stress accent has relatively little impact on anterior onset segments, EXCEPT for [dh] and [y]. [dh] (as in “the” and “them”) tends to delete in unaccented syllables, as does [y] (although to a lesser extent).
Can = Canonical form; Trans = Transcribed (i.e., phonetically realized)

Syllable Onset Statistics – CENTRAL Place
Central segments tend to “disappear” in the absence of stress accent. There is also a tendency for flaps ([dx] and [nx]) to insert under similar conditions. In heavily accented syllables, central segments maintain their canonical identity.
Can = Canonical form; Trans = Transcribed (i.e., phonetically realized)

Syllable Onset Statistics – Posterior Place
Posterior segments are remarkably stable in onset position. The only significant “deviation” from canonical realization is the intrusion of the glottal stop [q], which lacks phonemic status in English.
Can = Canonical form; Trans = Transcribed (i.e., phonetically realized)

Syllable Onset Statistics – Place Chameleons
“Chameleons” assimilate their place of articulation to the following vowel. They are relatively stable at syllable onset, except in unaccented forms. The reduced form of [l] is [lg], a glide-like element; it tends to assume the functional status of [l] in unaccented syllables.
Can = Canonical form; Trans = Transcribed (i.e., phonetically realized)

PART NINE Stress Accent’s Impact on Syllable Codas

Stress Accent and Syllable Codas
Stress accent’s impact on syllable codas differs from its impact on onsets. The disparity in duration between accented and unaccented forms tends to be significantly less for codas than for onsets (at least when deletions are omitted from consideration). There is a far greater probability of segmental deletion in coda constituents. Accent level exerts a powerful influence on segmental deletion and on segmental duration. To a certain degree, segmental deletion and duration interact (or are flip sides of the same phonetic coin).

Coda Duration - Accent Level/Syllable Form (Canonical Syllable Forms)
Coda duration (on average) is similar across syllable structure, both for accented and unaccented forms. There is a relatively small dynamic range in duration between accented and unaccented codas (relative to onsets and nuclei). Moreover, the duration of certain coda constituents is virtually identical in accented and unaccented syllables.

Syllable Coda Statistics – Anterior Place
Anterior coda segments are relatively stable under stress (accent). The segments [m] and [v] are exceptions: they often function as “flaps” in this context, and they tend to delete in unaccented syllables.
Can = Canonical form; Trans = Transcribed (i.e., phonetically realized)

Syllable Coda Statistics – Central Place
Central coda segments are extremely unstable under stress (accent) (except for the fricatives [s] and [z]). The segments [t], [d] and [n] tend to delete in coda position, even in heavily accented syllables. The major effect of stress accent is on the probability of segmental deletion (which is appreciably higher in unaccented forms).
Can = Canonical form; Trans = Transcribed (i.e., phonetically realized)

Syllable Coda Duration - CENTRAL Place (CANONICAL Syllable Forms)
The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables; this affects durational properties. Many of the coda segments do not exhibit a difference in duration. Many of the unaccented central codas are short in duration, in contrast to: (1) central onsets, (2) anterior codas, (3) posterior codas, (4) chameleon codas.

Syllable Coda Duration - CENTRAL Place (ALL Syllable Forms)
Because of the high probability of deletions for central coda consonants, the mean durations are quite low relative to other conditions. In some sense the default duration for central codas is very short.
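One way to see how deletions can pull a mean duration down is to compare two averaging conventions: omitting deleted tokens, or counting each deleted token as zero duration. The slides do not state which convention underlies the figures, so the sketch below is an assumption for illustration only; the function name and the token durations are invented.

```python
from typing import Optional, Sequence


def mean_duration(
    durations_ms: Sequence[Optional[float]], count_deletions_as_zero: bool
) -> float:
    """Mean duration of a segment's tokens; None marks a deleted token."""
    if count_deletions_as_zero:
        values = [0.0 if d is None else d for d in durations_ms]
    else:
        values = [d for d in durations_ms if d is not None]
    if not values:
        raise ValueError("no tokens to average")
    return sum(values) / len(values)


# Hypothetical coda tokens: two realized, two deleted (NOT corpus data).
tokens = [60.0, None, None, 40.0]
print(mean_duration(tokens, count_deletions_as_zero=False))  # 50.0
print(mean_duration(tokens, count_deletions_as_zero=True))   # 25.0
```

Under the second convention, a segment that deletes half the time shows half the mean duration of its realized tokens, which is one reading of the "flip sides of the same phonetic coin" remark about deletion and duration.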

Syllable Coda Statistics – POSTERIOR Place
Posterior coda segments are relatively stable under stress (accent). The primary exception is [ng], which tends to delete in unaccented syllables. The “infamous” glottal stop [q] tends to insert in this context.
Can = Canonical form; Trans = Transcribed (i.e., phonetically realized)

Syllable Coda Statistics – Place Chameleons
Chameleon segments are unstable under stress (accent). This is particularly true for [l] (for all levels of accent), where many canonical segments transmute into [lg], particularly in accented forms. The segment [r] tends to delete in unaccented syllables, but not otherwise.
Can = Canonical form; Trans = Transcribed (i.e., phonetically realized)

PART TEN What’s Going on in Pronunciation?

What’s Going On? (in pronunciation)
With respect to onset and coda segments, there are two basic forms: (1) those that are relatively stable across accent level, and (2) those that are not. Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulatory constriction is either anterior or posterior. The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables. The place chameleons (i.e., the approximants) are not very stable in either onset or coda position.
The vowels form two basic groups: (1) accented and (2) unaccented. The accented vowels are generally canonically realized and quasi-evenly distributed across the vowel space. The unaccented forms tend to concentrate in the high-front and high-central regions of the vowel space.
Certain segments are actually junctures, e.g., the flaps and the glottal stop. Several other so-called segments are junctures as well (as they function like flaps); the most noteworthy examples are [dh] and [v].
None of these properties is consistent with a segmental model of language.

Synopsis The Rationale for a Juncture-Accent Model of Spoken Language

Take Home Messages
Based on a detailed analysis of a manually annotated corpus of spontaneous American English (Switchboard), the following conclusions are drawn:
The pattern of pronunciation variation observed is inconsistent with a segmental model of spoken language.
The pronunciation patterns observed cut across segment and articulatory-feature classes.
The patterns observed display systematic variation when syllable structure and stress accent are taken into account.
Therefore, future-generation speech recognition systems need to build syllable structure and stress-accent information into pronunciation models and lexical representations.
A preliminary juncture-accent model provides a potential starting point for developing more realistic (and robust) lexical representations.
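As a rough illustration of what a lexical entry carrying syllable structure and stress-accent information might look like, here is a minimal sketch. It is not the authors' model: the `Syllable` class, the three-level `Accent` enum, and the helper `deletion_prone_codas` are all hypothetical, and the helper encodes only the coarse observation from the slides that centrally articulated stops and nasals ([t], [d], [n]) are especially likely to delete in the codas of unaccented syllables.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Accent(Enum):
    """Hypothetical accent levels; the slides do not define a scale here."""
    NONE = 0
    LIGHT = 1
    HEAVY = 2


# Centrally articulated stops and nasals named on the coda slides.
CENTRAL_STOPS_AND_NASALS = {"t", "d", "n"}


@dataclass
class Syllable:
    onset: List[str]
    nucleus: str
    coda: List[str]
    accent: Accent


def deletion_prone_codas(syllable: Syllable) -> List[str]:
    """Coda segments flagged as likely deletion sites (coarse heuristic)."""
    if syllable.accent is not Accent.NONE:
        return []
    return [seg for seg in syllable.coda if seg in CENTRAL_STOPS_AND_NASALS]


# Unaccented "that", written with the slide-style segment labels.
that = Syllable(onset=["dh"], nucleus="ae", coda=["t"], accent=Accent.NONE)
print(deletion_prone_codas(that))  # ['t']
```

The heuristic is deliberately binary. The slides report that [t], [d] and [n] also delete in heavily accented syllables, only less often, so a realistic pronunciation model would attach probabilities conditioned on accent level and syllable position rather than a yes/no flag.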

That’s All, Folks Many Thanks for Your Time and Attention