Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer Science Institute 1947 Center Street, Berkeley, CA {steveng, hmcarvey, leahh,
Acknowledgements and Thanks: Research funding from the U.S. Department of Defense and the U.S. National Science Foundation
For Further Information Consult the web site:
OVERTURE The Central Challenge for Models of Speech Recognition
The Serial Frame Perspective on Speech Traditional models of speech recognition assume the identity of a phonetic segment is derived from a detailed spectral profile of the acoustic signal computed for each time interval (frame) of speech
Phonemic Beads on a String Illustrated: In traditional models of speech recognition, words are conceptualized as mere sequences of phonetic segments (“phones”) strung together like “beads on a string”. No quarter is given to stress accent or other syllabic properties.
Language - The Traditional Perspective: The “classical” view of spoken language posits a quasi-arbitrary relation between the lower and higher tiers of linguistic organization. Cat = [k] + [ae] + [t]; Cat = /k/ + /ae/ + /t/. ASR systems focus on decoding words from sequences of phones.
A Challenge for the “Phonemic Beads on a String” Approach to Speech Recognition Pronunciation Variability
Pronunciation Variability of Real Speech: Pronunciation patterns encountered in everyday life are extremely diverse. There are literally dozens of ways in which common words are pronounced, as the following two slides illustrate for the word “AND”, based on manual phonetic annotation of a corpus comprising telephone dialogues.
How Many Pronunciations of “and”? [Table, spread over two slides. Columns: N, Pronunciation, N. The canonical pronunciation is marked.]
Pronunciation Variability of Real Speech: There are literally dozens of ways in which common words are pronounced, as the following slide illustrates for the 20 most frequent words from the same corpus (Switchboard), which together account for 35% of the word tokens in the corpus.
How Many Different Pronunciations? [Table columns: Rank, Word, N, # Pron, Most Common Pronunciation, MCP % Total.] The 20 most frequent words account for 35% of the tokens.
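The columns of the table above can be recomputed from any phonetically annotated token list. The following is a minimal sketch, assuming the annotation is available as (word, pronunciation) pairs; the function name `pronunciation_stats`, the field names and the toy tokens are hypothetical and are not taken from the Switchboard transcription.

```python
from collections import Counter, defaultdict

def pronunciation_stats(tokens):
    """Summarize pronunciation variability per word.

    tokens: iterable of (word, pronunciation) pairs, one pair per spoken token.
    Returns one row per word, most frequent word first.
    """
    variants = defaultdict(Counter)
    for word, pron in tokens:
        variants[word][pron] += 1

    total = sum(sum(counts.values()) for counts in variants.values())
    rows = []
    for word, counts in variants.items():
        n = sum(counts.values())
        mcp, mcp_n = counts.most_common(1)[0]  # most common pronunciation
        rows.append({
            "word": word,
            "n": n,                      # number of tokens of this word
            "n_pron": len(counts),       # number of distinct pronunciations
            "mcp": mcp,
            "mcp_share": mcp_n / n,      # MCP as a fraction of the word's tokens
            "token_share": n / total,    # word's share of all tokens
        })
    rows.sort(key=lambda row: row["n"], reverse=True)
    return rows

# Toy tokens, invented for illustration only.
toy_tokens = [
    ("and", "ae n d"), ("and", "ae n"), ("and", "ix n"), ("and", "ae n"),
    ("the", "dh ax"), ("the", "dh iy"), ("the", "dh ax"),
    ("cat", "k ae t"),
]
rows = pronunciation_stats(toy_tokens)
```

Summing `token_share` over the first 20 rows gives the kind of coverage figure quoted on the slide (35% for Switchboard).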
QUESTION How do listeners decode the speech signal given the large amount of pronunciation variation?
PART ONE Anatomy of a Syllable
Language - A Syllable-Centric Perspective: A more empirically grounded perspective on spoken language focuses on the SYLLABLE as the interface between “sound” and “meaning”. Within this framework, the relationship between the syllable and the higher and lower tiers is non-arbitrary and statistically systematic.
The Importance of the Syllable: The analyses to follow are all linked, in some fashion, to syllable structure. To highlight patterns germane to variation in segmental duration, it is necessary to partition the data by syllable position (as well as stress-accent level). As a consequence, we will examine the onsets, codas and nuclei of syllables separately in order to gain insight into the underlying patterns. What is an onset? What is a nucleus? What is a coda? The following slides provide a brief (and gentle) introduction to syllable structure.
Syllable and Phonetic Segment Illustrated (OGI Numbers95 corpus; “J” = JUNCTURE): Syllables generally consist of three constituents: ONSET, NUCLEUS and CODA. Virtually all syllables contain a NUCLEUS, which is VOCALIC (by definition). Most (but not all) syllables also contain an ONSET (usually a CONSONANT). Many syllables contain a CODA (also typically a CONSONANT). The most common syllable form in English is Onset + Nucleus + Coda (“Nine”), followed in frequency by Onset + Nucleus (“Two”). Onset segments often differ in significant ways from coda segments.
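The three-way partition described above can be sketched in a few lines. This is an illustration only: it assumes each syllable arrives as a list of Arpabet-style phone symbols containing exactly one vocalic symbol, and the `VOCALIC` set below (including the syllabic consonants "en", "el", "em") is an assumed inventory, not the authors' label set.

```python
# Assumed inventory of vocalic symbols (vowels, diphthongs, syllabic consonants).
VOCALIC = {
    "aa", "ae", "ah", "ao", "aw", "ax", "ay", "eh", "er", "ey",
    "ih", "ix", "iy", "ow", "oy", "uh", "uw", "en", "el", "em",
}

def split_syllable(phones):
    """Split one syllable into (onset, nucleus, coda).

    The nucleus is the single vocalic phone; everything before it is the
    onset and everything after it is the coda (either may be empty).
    """
    vocalic_idx = [i for i, p in enumerate(phones) if p in VOCALIC]
    if len(vocalic_idx) != 1:
        raise ValueError("expected exactly one vocalic phone per syllable")
    i = vocalic_idx[0]
    return phones[:i], phones[i], phones[i + 1:]

def cv_form(phones):
    """Return the syllable form as a string of C and V, e.g. 'CVC'."""
    return "".join("V" if p in VOCALIC else "C" for p in phones)

nine = ["n", "ay", "n"]   # Onset + Nucleus + Coda
two = ["t", "uw"]         # Onset + Nucleus
nine_parts = split_syllable(nine)
two_parts = split_syllable(two)
```

Treating a diphthong such as "ay" as a single nucleus symbol is a simplification that keeps the split unambiguous.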
PART TWO Spectro-Temporal Profiles
The Spectro-Temporal Profile (STeP): Certain specific (and important) properties of the syllable are not well represented by the traditional 2.5-D spectrographic representation. Stress accent and juncture are two such properties. A different representation, based on the log critical-band energy profile across frequency and time, can provide the requisite detail, as shown in “miniature” below (and in expanded form on the following slides). STePs are derived from averages of hundreds of individual instances.
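The averaging step can be illustrated as follows. This is a rough sketch under stated assumptions, not the authors' procedure: it assumes each instance is already a list of frames of linear critical-band energies, and it aligns instances by nearest-neighbour resampling to a fixed number of frames. How the original STePs were aligned in time is not stated on the slide, and the names `resample` and `average_profile` are hypothetical.

```python
import math

def resample(frames, n_out):
    """Pick n_out frames at evenly spaced positions (nearest-neighbour)."""
    n_in = len(frames)
    return [frames[min(n_in - 1, int(i * n_in / n_out))] for i in range(n_out)]

def average_profile(instances, n_frames=4, floor=1e-10):
    """Average log band energy (dB) over many instances of the same word.

    instances: list of instances; each instance is a list of frames and each
    frame is a list of linear energies, one per critical band.
    Returns an n_frames x n_bands matrix of mean log energies.
    """
    n_bands = len(instances[0][0])
    total = [[0.0] * n_bands for _ in range(n_frames)]
    for frames in instances:
        for t, frame in enumerate(resample(frames, n_frames)):
            for b, energy in enumerate(frame):
                # The floor avoids log(0) for silent bands.
                total[t][b] += 10.0 * math.log10(max(energy, floor))
    return [[value / len(instances) for value in row] for row in total]

# Two toy instances (2 frames x 2 bands), invented for illustration.
toy_instances = [
    [[1.0, 10.0], [100.0, 1.0]],
    [[100.0, 10.0], [100.0, 100.0]],
]
profile = average_profile(toy_instances, n_frames=2)
```

Averaging in the log domain, as here, is one reading of "log, critical-band energy profile"; averaging linear energies and then taking the log would give a different profile.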
Spectro-Temporal Profile - Disyllabic Word “Seven” (OGI Numbers95), full-spectrum perspective. [Figure: STeP of [s] [eh] [vx] [en], marking the accented syllable, the juncture, the unaccented syllable and the mean duration.]
Spectro-Temporal Profile - Disyllabic Word “Seven” (OGI Numbers95), high-frequency perspective. [Figure: STeP of [s] [eh] [vx] [en], marking the accented syllable, the juncture, the unaccented syllable and the mean duration.]
PART THREE Scientific Approach to Speech Recognition
A Scientific Approach to Speech Recognition: Ascertain the contribution of (1) phonetic segment (and feature) classification, (2) phonetic segmentation, (3) stress accent and (4) syllable position to ASR performance, using the OGI Numbers95 corpus as a controlled (limited-vocabulary) corpus and a relatively transparent recognition engine. The engine uses a variety of articulatory-based features (manner and place of articulation, voicing, vowel height, lip-rounding, spectral dynamics, segment length) that are explicitly tied to syllable position (i.e., onset, nucleus and coda) and stress-accent level. We compare the “baseline” system (entirely automatic recognition) with an entirely “fabricated” set of input data (derived from hand-labeled phonetic annotation + AutoSAL) and with a “half-way house” system that is partially automatic and partially not (it is given manually derived phonetic segmentation, as well as whether each segment is vocalic or not).
Numbers95 Recognition – Stress Accent Impact
Entirely stress-accent-dependent results (word error rate):
Fabricated       1.3%
Half-way House   2.0%
Baseline         5.6%
The half-way house system is much closer in performance to the fabricated-data version than to the baseline system, suggesting that accurate phonetic segmentation is extremely important for enhanced ASR performance, as is knowledge of the location of the syllabic nucleus. Stress-accent information is most important for the vocalic nucleus; without it, WER increases by 10-20%. It is also important for the coda, where WER increases by 7-15%.
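For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance between the reference and the recognizer output, divided by the number of reference words. The sketch below shows that standard definition; it is not the scoring tool used for the Numbers95 experiments, and the example word strings are invented. The final lines use only the three WER figures quoted above.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)

# Invented example: one substitution and one deletion in four words.
example_wer = word_error_rate("one two three four", "one too three")

# Share of the baseline-to-fabricated gap closed by the half-way house system,
# computed from the WER figures on the slide.
baseline, halfway, fabricated = 5.6, 2.0, 1.3
gap_closed = (baseline - halfway) / (baseline - fabricated)
```

With the slide's figures, `gap_closed` comes to roughly 0.84, which quantifies the statement that the half-way house system lies much closer to the fabricated condition than to the baseline.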
Numbers95 Recognition – Pronunciation Impact: Effect of pronunciation variation as a function of syllable position, where the “canonical” pronunciation is potentially fixed for each syllable position separately (or “All” together). “Standard” refers to the regular recognition system. [Table: word error rate (%) for the Fabricated, Half-way House and Baseline systems under the Standard, Onset, Nucleus, Coda and All conditions.] Conclusions: Onset segments are the most canonical and coda segments the least canonical. It is therefore important to provide for pronunciation variation in an ASR system.
Numbers95 – Syllable Position Importance: Effect of pronunciation variation as a function of syllable position, where each syllabic constituent is “neutralized” with respect to lexical matching (i.e., each element is factored out of the decoding process separately). “Standard” refers to the regular recognition system. [Table: word error rate (%) for the Fabricated, Half-way House and Baseline systems under the Standard, Onset, Nucleus and Coda conditions.] Neutralization of the onset and nucleus exerts a greater impact on ASR performance than neutralization of the coda. Conclusion: Onsets and nuclei are most important for lexical access in an ASR system (at least for the Numbers95 corpus).
PART FOUR Being Phonetically and Prosodically Annotated
Phonetic Transcription of Spontaneous English: Telephone dialogues of 5-10 minutes' duration, from the SWITCHBOARD corpus, have been phonetically transcribed (labeled and segmented). Most of this material has been annotated manually. The annotation comprises: 4 hours labeled at the phone level and segmented at the syllabic level; 1 hour labeled and segmented at the phonetic-segment level; the remaining material segmented at the phonetic-segment level using automatic methods; 45 minutes of hand-labeled stress-accent material; and an additional four hours of stress-accent material labeled automatically (though unused in the current analysis). The material transcribed is highly diverse: it spans speech of both genders (ca. 50/50%) and reflects a wide range of American dialectal variation, speaking rate and voice quality. Transcription system: a variant of Arpabet, with phonetic diacritics such as _gl, _cr, _fr, _n, _vl, _vd.
Phonetic Transcription of Spontaneous English The Data are Available at ….
Annotation of Stress Accent: Forty-five minutes of the phonetically annotated portion of the Switchboard corpus was manually labeled with respect to stress accent. Three levels of accent were distinguished: Heavy, Light and None. (In actuality, labelers assigned a “1” to fully accented syllables, a “null” to completely unaccented syllables, and a “0.5” to all others.) An example of the annotation (attached to the vocalic nucleus) is shown below (where the accent levels could not be derived from a dictionary). In this example most of the syllables are unaccented, with two labeled as lightly accented (0.5) and one other labeled as very lightly accented (0.25).
Annotation of Stress Accent: The data are available at ….
Automatic Labeling of Stress Accent This forty-five minutes of hand-labeled phonetic and prosodic annotation from the Switchboard corpus was used as training data for development of an Automatic Stress Accent Labeling System (AutoSAL)
How Good is AutoSAL? There is a 79% concordance between human and machine accent labels when the tolerance level is a quarter-step, and 97.5% concordance when the tolerance level is half a step. This degree of concordance is as high as that exhibited by two highly trained (human) transcribers.
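Concordance within a tolerance can be read as the fraction of syllables whose human and machine accent labels differ by no more than the tolerance. The sketch below assumes labels on a numeric 0-to-1 scale in quarter-steps, with "null" coded as 0.0; that coding, the function name `concordance` and the toy labels are assumptions made for illustration, not AutoSAL output.

```python
def concordance(human, machine, tolerance):
    """Fraction of syllables whose two labels differ by at most `tolerance`."""
    if not human or len(human) != len(machine):
        raise ValueError("label sequences must be non-empty and equally long")
    agree = sum(1 for h, m in zip(human, machine) if abs(h - m) <= tolerance)
    return agree / len(human)

# Toy labels on a quarter-step scale (0.0 = unaccented, 1.0 = fully accented).
human_labels = [1.0, 0.5, 0.0, 0.5, 0.0]
machine_labels = [0.75, 0.5, 0.5, 0.0, 0.0]

quarter_step = concordance(human_labels, machine_labels, tolerance=0.25)
half_step = concordance(human_labels, machine_labels, tolerance=0.5)
```

Widening the tolerance can only raise the score, which is consistent with the slide's figure rising from 79% at a quarter-step to 97.5% at half a step.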
PART FIVE Stress Accent and Syllable Position
The Importance of Syllable Structure: Before going into the details of durational variation at the segmental level, we briefly examine some general patterns of pronunciation variation that are conditioned by syllable position and stress accent. These data illustrate the sort of variation that is conditioned by position within the syllable.
Pronunciation Variation – Syllable and Accent: Pronunciation variation is systematic at the level of the syllable, particularly when stress accent is also taken into account. BOTH syllable structure and accent level are required for a full accounting. [Figure: deletions, insertions and substitutions for all segments, partitioned into ONSET, NUCLEUS and CODA territories.]
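One way to tally deletions, insertions and substitutions by syllable position is to align each canonical pronunciation with the transcribed one and record where they differ. The sketch below uses the matching blocks from Python's `difflib.SequenceMatcher` as a stand-in alignment; this is an assumption made for illustration, since the slide does not say how the original alignment was done. The function name and the example pronunciation of "and" are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher

def tally_deviations(canonical, positions, observed):
    """Count deviations from the canonical form, keyed by (kind, position).

    canonical: phones of the dictionary pronunciation.
    positions: parallel list giving 'onset', 'nucleus' or 'coda' per phone.
    observed:  phones actually transcribed.
    Insertions have no canonical phone, so their position is None.
    """
    tally = Counter()
    matcher = SequenceMatcher(a=canonical, b=observed, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "insert":
            tally[("insertion", None)] += j2 - j1
            continue
        # 'replace' pairs phones one-to-one as substitutions; any canonical
        # phones left over count as deletions. 'delete' has no substitutions.
        n_sub = min(i2 - i1, j2 - j1) if tag == "replace" else 0
        for k in range(i1, i2):
            kind = "substitution" if k - i1 < n_sub else "deletion"
            tally[(kind, positions[k])] += 1
        if tag == "replace" and (j2 - j1) > (i2 - i1):
            tally[("insertion", None)] += (j2 - j1) - (i2 - i1)
    return tally

# "and": canonical [ae n d] realized as [ix n] (reduced vowel, dropped [d]).
deviations = tally_deviations(
    canonical=["ae", "n", "d"],
    positions=["nucleus", "coda", "coda"],
    observed=["ix", "n"],
)
```

Adding the accent level of each syllable to the key would give the two-way partition (position by accent) that the slide argues is needed for a full accounting.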
PART SIX Durational Properties of Pronunciation Variation
Analysis of Durational Properties of Speech
The following analyses are conditioned on stress accent level and (for the most part) syllable position. We’ll begin with analyses illustrating the patterns associated with three levels of stress accent (heavy, light and none) to show the graded nature of syllable and segment duration. However, for purposes of illustrative clarity, many of the slides will show only two levels of accent (heavy and none) in order to delineate the differences in duration associated with stress accent level. The durational properties associated with light accent, which those slides omit, are generally intermediate between those of heavy accent and none.
Syllable Duration - Across Syllable Forms
There is a broad range of syllable structures observed in spoken English. The CV and CVC forms cover ca. 60% of the syllables. Together, the V, VC, CV and CVC forms account for 85% of syllables. The CVCC and CCVC (complex syllable) forms account for another 10%.
(V = vowel, C = consonant)
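The coverage figures above rest on reducing each syllable to its consonant/vowel skeleton and counting how often each skeleton occurs. A minimal sketch of that bookkeeping follows; the vowel inventory, the function names and the toy syllables are assumptions made for illustration and are not taken from the corpus or from the authors' tools.

```python
from collections import Counter

# Hypothetical vowel inventory (ARPABET-like symbols), assumed for illustration.
VOWELS = {"iy", "ih", "eh", "ae", "aa", "ah", "ao", "uh", "uw",
          "ay", "aw", "oy", "ey", "ow", "ax"}


def syllable_form(phones):
    """Map a list of phone symbols to its CV skeleton, e.g. k-ae-t -> 'CVC'."""
    return "".join("V" if p in VOWELS else "C" for p in phones)


def form_proportions(syllables):
    """Return each syllable form's share of all syllables, as a fraction."""
    counts = Counter(syllable_form(s) for s in syllables)
    total = sum(counts.values())
    return {form: n / total for form, n in counts.items()}


# Toy data only; these are not corpus counts.
toy_syllables = [["k", "ae", "t"], ["dh", "ax"], ["ay"],
                 ["ih", "n"], ["s", "t", "aa", "p"]]
props = form_proportions(toy_syllables)
```

Summing the shares for V, VC, CV and CVC in `props` gives the kind of cumulative coverage quoted on the slide.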
Syllable Duration - Across Syllable Forms
It is unsurprising that syllable duration is largely a function of the number of segments within the syllable (as shown in the graph below). Note the systematic lengthening of the syllable for each form as the accent level increases from “NONE” to “LIGHT” to “HEAVY”. This pattern is representative of accent’s impact on duration (as we’ll see).
[Graph: syllable duration by canonical syllable form and accent level]
Syllable Duration - Accent Level/Syllable Form
This graph shows the same data as the previous slides, but from the perspective of just two accent levels (“HEAVY” and “NONE”). The heavily accented syllables are generally % longer than their unaccented counterparts. The disparity in duration is most pronounced for syllable forms with one or no consonants (i.e., V, VC, CV). This pattern implies that accent has the greatest impact on vocalic duration.
[Graph: syllable duration for heavily accented and unaccented syllables, by canonical syllable form]
Nucleus Duration - Accent Level/Syllable Form
The hypothesis delineated on the previous slide (that accent has the most profound impact on vocalic duration) is confirmed in the graph below. Vowels in accented syllables (of all forms) are at least twice as long as their unaccented counterparts. This pattern implies that the syllable nucleus absorbs a major component of accent’s impact (at least as far as duration is concerned).
[Graph: nucleus duration by accent level across canonical syllable forms]
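The comparison described here amounts to averaging durations within each (syllable form, accent level) cell and then taking the ratio of the heavy-accent mean to the unaccented mean. The sketch below shows one way to do that; the record layout, the function names and the millisecond values are hypothetical and do not reproduce the measurements behind the graph.

```python
from collections import defaultdict
from statistics import mean


def mean_duration_by_group(records):
    """Average duration (ms) for each (syllable_form, accent_level) pair."""
    groups = defaultdict(list)
    for form, accent, duration_ms in records:
        groups[(form, accent)].append(duration_ms)
    return {key: mean(values) for key, values in groups.items()}


def heavy_to_none_ratio(means, form):
    """Ratio of the heavy-accent mean to the unaccented mean for one form."""
    return means[(form, "heavy")] / means[(form, "none")]


# Toy nucleus durations in ms; invented values, for illustration only.
toy_records = [
    ("CV", "heavy", 140.0), ("CV", "heavy", 160.0),
    ("CV", "none", 60.0), ("CV", "none", 80.0),
    ("CVC", "heavy", 120.0), ("CVC", "none", 60.0),
]
means = mean_duration_by_group(toy_records)
ratio_cv = heavy_to_none_ratio(means, "CV")
ratio_cvc = heavy_to_none_ratio(means, "CVC")
```

A ratio of 2.0 or more for every form would correspond to the "at least twice as long" statement on the slide.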
PART SEVEN Stress Accent and the Vocalic Nucleus
Stress Accent’s Impact on the Vocalic Nucleus
Because the pattern of stress accent’s impact on vocalic duration is relatively uniform across syllable forms, it is likely that the specific structure of the syllable has relatively little impact on vocalic duration. As a consequence, the remaining analyses pertaining to accent’s impact on vocalic duration collapse the data across syllable form. We now examine vocalic duration in somewhat greater detail and illustrate how duration, stress accent and vocalic identity interact.
The Spatial Patterning of Duration in Vocalic Nuclei
A Brief Primer on Vocalic Acoustics
Vowel quality is generally thought to be a function primarily of two articulatory properties, both related to the motion of the tongue. The front-back plane is most closely associated with the second formant frequency (or more precisely F2 - F1) and the volume of the front-cavity resonance. The height parameter is closely linked to the frequency of F1. In the classic vowel “triangle,” segments are positioned in terms of the tongue positions associated with their production.
[Figure: the vowel triangle]
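The two acoustic correlates named in the primer can be turned into plotting coordinates directly: F2 - F1 as a front-back proxy and F1 as a height proxy (a higher F1 corresponds to a lower vowel). The following sketch assumes rough, illustrative formant values for three corner vowels; the numbers, symbols and function name are placeholders rather than measurements from the material analyzed here.

```python
def vowel_coordinates(f1_hz, f2_hz):
    """Return (front_back_proxy, height_proxy) = (F2 - F1, F1), both in Hz."""
    return (f2_hz - f1_hz, f1_hz)


# Illustrative (F1, F2) values in Hz for three corner vowels; approximate only.
ILLUSTRATIVE_FORMANTS = {
    "iy": (270, 2290),  # high front
    "aa": (730, 1090),  # low
    "uw": (300, 870),   # high back
}

coords = {v: vowel_coordinates(f1, f2)
          for v, (f1, f2) in ILLUSTRATIVE_FORMANTS.items()}

# Largest F1 marks the lowest vowel; largest F2 - F1 marks the frontmost one.
lowest_vowel = max(coords, key=lambda v: coords[v][1])
frontmost_vowel = max(coords, key=lambda v: coords[v][0])
```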
Spatial Patterning of Duration et al.
In the following slides duration is plotted on a 2-D grid, where the x-axis represents the (hypothetical) front-back tongue position (and hence remains constant throughout the plots to follow). The y-axis serves as the dependent measure, expressed in terms of either duration or the proportion of fully stressed (or unstressed) nuclei.
Vocalic Duration and Vowel Height
The spatial patterning of vocalic segments is systematic with respect to duration. Low vowels, be they diphthongs or monophthongs, are longer (on average) than high vowels. Thus, duration appears to be highly correlated with vowel height. But … the situation is a little more complicated than first appearances would suggest.
[Figure panels: All nuclei, Diphthongs, Monophthongs]
Durational Differences - Stressed/Unstressed
There is a large dynamic range in duration between accented and unaccented vocalic nuclei. Moreover, diphthongs and tense, low monophthongs tend to exhibit a larger dynamic range than the lax monophthongs.
[Graph: durations of accented and unaccented nuclei, with the lax monophthongs marked]
Vocalic Identity Among Unstressed Nuclei
The high, lax monophthongs are almost always unstressed. The low vowels, be they monophthongs or diphthongs, are rarely unstressed. The high diphthongs and high/mid, tense monophthongs occupy an intermediate position.
Vocalic Identity Among Fully Stressed Nuclei
The high vowels are rarely fully stressed. The low vowels, be they monophthongs or diphthongs, are far more likely to be fully stressed. An intermediate degree of stress accounts for the other vocalic instances (but will not be addressed here).
Vocalic Variation – Importance of Stress Accent
The vowels of heavily accented syllables are (mostly) pronounced canonically. Low vowels are largely the province of accented syllables, and high vowels the province of unaccented syllables. Moreover, there’s a lexical bias towards high vowels for unaccented forms, which is reinforced in patterns of deviation from canonical pronunciation.
[Figure panels: Canonical Pronunciations, Non-Canonical Pronunciations]
Vocalic Height Deviation from Canonical
Vowels are more likely to RISE in height than to descend when unaccented. Vocalic lowering of height is rare. Most deviations from the canonical maintain vowel height. More than a single height step deviation is uncommon. Virtually all 2-step height deviations occur in unaccented syllables.
[Figure panels: Amount of Change, Direction of Change]
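The two quantities summarized on this slide, direction of change and amount of change, can be derived by assigning each vowel a height level and subtracting the canonical level from the transcribed one. The sketch below assumes a three-level height scale (low = 1, mid = 2, high = 3) and a small vowel set; both are simplifications introduced here for illustration, not the scale used in the original analysis.

```python
from collections import Counter

# Assumed three-level height scale: low = 1, mid = 2, high = 3.
HEIGHT = {"iy": 3, "ih": 3, "uw": 3, "eh": 2, "ah": 2, "ae": 1, "aa": 1}


def height_step(canonical, transcribed):
    """Signed height change: positive means the vowel was raised."""
    return HEIGHT[transcribed] - HEIGHT[canonical]


def tally_height_changes(pairs):
    """Count direction ('raise', 'lower', 'same') and step size over pairs."""
    direction = Counter()
    magnitude = Counter()
    for canonical, transcribed in pairs:
        step = height_step(canonical, transcribed)
        if step > 0:
            direction["raise"] += 1
        elif step < 0:
            direction["lower"] += 1
        else:
            direction["same"] += 1
        magnitude[abs(step)] += 1
    return direction, magnitude


# Toy (canonical, transcribed) vowel pairs; not corpus data.
toy_pairs = [("ae", "eh"), ("eh", "ih"), ("aa", "ih"),
             ("ih", "ih"), ("iy", "ih"), ("ah", "aa")]
direction, magnitude = tally_height_changes(toy_pairs)
```

Splitting `toy_pairs` by accent level before tallying would expose the pattern noted above, in which 2-step deviations cluster in unaccented syllables.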
The Vowel Space Under (Full) Stress (Accent)
In fully accented nuclei there is a relatively even distribution of segments across the vowel space, with a slight bias towards the front and central vowels.
[Figure: canonical vowels only]
The Vowel Space Without (Stress) Accent
In unaccented syllables vowels are confined largely to the high-front and high-central sectors of the articulatory space. The low and mid vowels “get creamed”.
[Figure: canonical vowels only]
The Vowel Spaces Compared
Stress accent exerts a profound effect on the character of the vowel space. High vowels are largely associated with unaccented syllables. Low vowels are mostly associated with accented forms. This distinction between accented and unaccented syllables is of profound importance for understanding (and modeling) pronunciation variation.
[Figure panels: Heavily Accented, Unaccented; canonical vowels only]
Is It Stress? Vocalic Identity? Or What?
Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse. For any given vocalic class, stressed segments are longer (on average), and the durational disparity is most pronounced among the low vowels and the diphthongs. Low vowels tend to be much longer in duration than high vowels; this is the case even for diphthongs. Low vowels are rarely without some measure of stress accent; this is true for monophthongs as well as diphthongs. High vowels are RARELY fully stressed; this is particularly so for monophthongs, but also applies to diphthongs. Thus, stress accent appears to be intricately involved with vocalic identity.
PART EIGHT Stress Accent’s Impact on Syllable Onsets
Stress Accent and Syllable Onsets
The onset is often cited as the key syllabic constituent with respect to “lexical access”. It is therefore of interest to ascertain how the onset’s duration behaves as a function of accent level. Because of the onset’s key role in lexical access, one might assume that its duration would be relatively stable across accent level. The following slides suggest that this assumption is INCORRECT, and that the structure of the onset is more complex (and more interesting) than initial intuition would suggest.
Onset Duration - Accent Level/Syllable Form
The duration of the syllable onset varies significantly as a function of accent level (though not quite as much as exhibited by vocalic constituents). Onset duration is similar across syllable form (except that segments comprising complex onsets [i.e., CCVC] are slightly shorter). The duration of unaccented onsets is similar across syllable forms.
[Graph: onset duration by accent level across canonical syllable forms]
Onset Duration - Accent Level/Syllable Form
Onsets of accented syllables are generally 50-60% longer than their unaccented counterparts. Although this durational difference is not quite as large as that observed for vocalic nuclei, it is still substantial (and mostly consistent across forms).
[Graph: onset duration by accent level across canonical syllable forms]
Place of Articulation – A Brief Primer
The tongue contacts (or nearly contacts) the roof of the mouth in producing many of the consonantal sounds in English.
Anterior: Labial [p] [b] [m]; Labio-dental [f] [v]; Inter-dental [th] [dh]
Central: Alveolar [t] [d] [n] [s] [z]
Posterior: Palatal [sh] [zh]; Velar [k] [g] [ng]
From Daniloff (1973)
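The three-way grouping in this primer is what the onset statistics that follow are organized around, so it can be convenient to hold it as a lookup table. The sketch below encodes only the segments listed above; the table and function names are hypothetical, and segments the primer does not list (for example the liquids and glides) deliberately return `None`.

```python
# Place classes exactly as listed in the primer (attributed to Daniloff, 1973).
PLACE_CLASSES = {
    "anterior": {
        "labial": ["p", "b", "m"],
        "labio-dental": ["f", "v"],
        "inter-dental": ["th", "dh"],
    },
    "central": {
        "alveolar": ["t", "d", "n", "s", "z"],
    },
    "posterior": {
        "palatal": ["sh", "zh"],
        "velar": ["k", "g", "ng"],
    },
}


def place_class(segment):
    """Return (region, place) for a listed segment, or None if it is unlisted."""
    for region, places in PLACE_CLASSES.items():
        for place, segments in places.items():
            if segment in segments:
                return (region, place)
    return None
```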
Segmental Identity and Stress Accent
It is of interest to compare accent’s impact on segmental duration with its impact on segmental realization (i.e., whether the segment is realized canonically or not). Usually, non-canonical realizations are manifest as segmental deletions. The pattern of segmental realization bears some correspondence to durational variation as a function of accent level, but also exhibits some interesting differences (which are potentially significant for models of phonetic organization). Before we examine the segmental patterns in detail, a brief primer on the interpretation of these data is presented.
Road Map - How to Interpret the Data
Compare the numbers in the YELLOW and ORANGE columns. Most numbers in the YELLOW / ORANGE columns will be similar, indicating that the phonetic realization of the segment is the canonical form. A large disparity between columns is marked with a blue box. READY? OK, let’s go!
Can = Canonical form; Trans = Transcribed (i.e., phonetically realized)
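The reading procedure in this road map is a comparison of two counts per segment: how often it appears in the canonical form (Can) and how often it is actually transcribed (Trans). A sketch of that comparison is given below. The 20% relative threshold is an arbitrary assumption, since the slides do not state what size of disparity earns a blue box, and the counts are invented.

```python
def flag_disparities(counts, threshold=0.2):
    """Flag segments whose transcribed count departs from the canonical count.

    counts maps segment -> (canonical_count, transcribed_count).
    Returns segment -> relative change (negative values indicate net deletion).
    """
    flagged = {}
    for segment, (canonical, transcribed) in counts.items():
        if canonical == 0:
            # Never canonical: any transcribed occurrence is an insertion.
            if transcribed > 0:
                flagged[segment] = float("inf")
            continue
        relative_change = (transcribed - canonical) / canonical
        if abs(relative_change) >= threshold:
            flagged[segment] = relative_change
    return flagged


# Toy counts, not taken from the tables in the slides.
toy_counts = {"dh": (100, 60), "t": (200, 190), "dx": (0, 25)}
flagged = flag_disparities(toy_counts)
```

Here the toy `dh` entry is flagged as a net deletion and `dx` as a pure insertion, while `t` stays within the threshold.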
Syllable Onset Statistics – ANTERIOR Place
Stress accent has relatively little impact on anterior onset segments, EXCEPT for [dh] and [y]. [dh] (as in “the” and “them”) tends to delete in unaccented syllables, as does [y] (although to a lesser extent).
Syllable Onset Statistics – CENTRAL Place
Central segments tend to “disappear” under (absence of) stress (accent). There is also a tendency for flaps ([dx] and [nx]) to insert under similar conditions. In heavily accented syllables, central segments maintain their canonical identity.
Syllable Onset Statistics – Posterior Place
Posterior segments are remarkably stable in onset position. The only significant “deviation” from canonical realization is the intrusion of the glottal stop [q], which lacks phonemic status in English.
Syllable Onset Statistics – Place Chameleons “Chameleons” assimilate their place of articulation to the following vowel They are relatively stable at syllable onset, except in unaccented forms The reduced form of [l] is [lg], a glide-like element – it tends to assume the functional status of [l] in unaccented syllables Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
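The Can/Trans comparison behind these onset statistics can be sketched as a simple tally. This is a minimal illustration, assuming each token is available as a (canonical, transcribed, accent level) triple. The record layout, the accent labels ("heavy", "none") and the toy data are hypothetical and are not Switchboard counts.

```python
from collections import Counter, defaultdict

# Hypothetical aligned tokens: (canonical segment, transcribed segment, accent level).
# A transcribed value of None marks a deleted segment. Toy data, invented for illustration.
records = [
    ("dh", "dh", "heavy"),
    ("dh", None, "none"),
    ("dh", None, "none"),
    ("dh", "dh", "none"),
    ("l", "l", "heavy"),
    ("l", "lg", "none"),
    ("k", "k", "heavy"),
    ("k", "k", "none"),
]


def realization_table(records):
    """Count canonical, substituted and deleted outcomes per (segment, accent)."""
    table = defaultdict(Counter)
    for canonical, transcribed, accent in records:
        if transcribed is None:
            outcome = "deleted"
        elif transcribed == canonical:
            outcome = "canonical"
        else:
            outcome = "substituted"
        table[(canonical, accent)][outcome] += 1
    return table


def rate(table, canonical, accent, outcome):
    """Proportion of tokens of a segment, at one accent level, with the given outcome."""
    counts = table.get((canonical, accent), Counter())
    total = sum(counts.values())
    return counts[outcome] / total if total else 0.0


table = realization_table(records)
print(rate(table, "dh", "none", "deleted"))
```

Keeping the counts keyed by both segment and accent level makes it straightforward to compare the same segment across accent conditions, which is the comparison the slides draw.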
PART NINE Stress Accent’s Impact on Syllable Codas
Stress Accent and Syllable Codas
Stress accent’s impact on syllable codas differs from that of onsets
The disparity in duration between accented and unaccented forms tends to be significantly less for codas than for onsets (at least when deletions are omitted from consideration)
There is a far greater probability of segmental deletion in coda constituents
Accent level exerts a powerful influence on segmental deletion and on segmental duration
To a certain degree segmental deletion and duration interact (or are flip sides of the same phonetic coin)
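The interaction between deletion and duration can be illustrated with a small calculation. The sketch below compares mean coda duration when deleted tokens are omitted with the mean when each deletion is counted as a zero-duration token. Counting deletions as zero is an assumption made here for illustration, and the durations (in milliseconds) and accent labels are invented rather than taken from the corpus.

```python
from statistics import mean

# Hypothetical coda tokens: (accent level, duration in ms).
# A duration of None marks a canonical coda that was deleted.
coda_tokens = [
    ("heavy", 80),
    ("heavy", 70),
    ("heavy", None),
    ("none", 60),
    ("none", None),
    ("none", None),
]


def mean_duration(tokens, accent, count_deletions_as_zero=False):
    """Mean duration at one accent level, with deletions omitted or counted as 0 ms."""
    durations = []
    for level, duration in tokens:
        if level != accent:
            continue
        if duration is None:
            if count_deletions_as_zero:
                durations.append(0.0)
        else:
            durations.append(float(duration))
    return mean(durations) if durations else None


for accent in ("heavy", "none"):
    print(accent,
          mean_duration(coda_tokens, accent),
          mean_duration(coda_tokens, accent, count_deletions_as_zero=True))
```

In this toy data the accented/unaccented gap is modest when deletions are omitted and widens once deletions are counted as zero, which is one way to read deletion and duration as two views of the same effect.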
Coda Duration – Accent Level/Syllable Form (Canonical Syllable Forms)
Coda duration (on average) is similar across syllable structure, both for accented and unaccented forms
There is a relatively small dynamic range in duration between accented and unaccented codas (relative to onsets and nuclei)
Moreover, the duration of certain coda constituents is virtually identical in accented and unaccented syllables
Syllable Coda Statistics – Anterior Place
Anterior coda segments are relatively stable under stress (accent)
The segments [m] and [v] are exceptions: they often function as “flaps” in this context, and they tend to delete in unaccented syllables
Can = Canonical form; Trans = Transcribed (i.e., phonetically realized)
Syllable Coda Statistics – Central Place
Central coda segments, except for the fricatives [s] and [z], are extremely unstable under stress (accent)
The segments [t], [d] and [n] tend to delete in coda position, even in heavily accented syllables
The major effect of stress accent is on the probability of segmental deletion (which is appreciably higher in unaccented forms)
Can = Canonical form; Trans = Transcribed (i.e., phonetically realized)
Syllable Coda Duration – Central Place (Canonical Syllable Forms)
The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables; this affects durational properties
The durations of many of the coda segments do not differ with accent level
Many of the unaccented central codas are short in duration, in contrast to: (1) central onsets, (2) anterior codas, (3) posterior codas, (4) chameleon codas
Syllable Coda Duration – Central Place (All Syllable Forms)
Because of the high probability of deletions for central coda consonants, the mean durations are quite low relative to other conditions
In some sense the default duration for central codas is very short
Syllable Coda Statistics – Posterior Place
Posterior coda segments are relatively stable under stress (accent)
The primary exception is [ng], which tends to delete in unaccented syllables
The “infamous” glottal stop [q] tends to insert in this context
Can = Canonical form; Trans = Transcribed (i.e., phonetically realized)
Syllable Coda Statistics – Place Chameleons
Chameleon segments are unstable under stress (accent)
This is particularly true for [l] (for all levels of accent), where many canonical segments transmute into [lg], particularly in accented forms
The segment [r] tends to delete in unaccented syllables, but not otherwise
Can = Canonical form; Trans = Transcribed (i.e., phonetically realized)
PART TEN What’s Going on in Pronunciation?
What’s Going On? (in pronunciation)
With respect to onset and coda segments, there are two basic forms: (1) those that are relatively stable across accent level, and (2) those that are not
Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulatory constriction is either anterior or posterior
The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables
The place chameleons (i.e., the approximants) are not very stable in either onset or coda position
The vowels form two basic groups: (1) accented and (2) unaccented
The accented vowels are generally canonically realized and quasi-evenly distributed across the vowel space
The unaccented forms tend to concentrate in the high-front and high-central regions of the vowel space
Certain segments are actually junctures, e.g., the flaps and the glottal stop
Several other so-called segments are junctures as well, since they function like flaps; the most noteworthy examples are [dh] and [v]
None of these properties is consistent with a segmental model of language
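The distinction between segments and junctures can be made concrete with a small tagging sketch. It assumes a phone-level transcription in the bracketed symbol set used in these slides and treats the flaps [dx] and [nx], the glottal stop [q], and the flap-like [dh] and [v] as juncture markers. Tagging [dh] and [v] as junctures in every context is a deliberate simplification, and the function name and example transcription are hypothetical.

```python
# Symbols described above as junctures rather than true segments.
# Treating [dh] and [v] as junctures in all contexts is a simplifying assumption.
JUNCTURE_SYMBOLS = frozenset({"dx", "nx", "q", "dh", "v"})


def tag_junctures(phones):
    """Label each transcribed phone as a 'juncture' or a 'segment'."""
    return [
        (phone, "juncture" if phone in JUNCTURE_SYMBOLS else "segment")
        for phone in phones
    ]


# Hypothetical transcription of a flapped two-syllable word.
example = ["b", "ah", "dx", "er"]
print(tag_junctures(example))
```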
Synopsis The Rationale for a Juncture-Accent Model of Spoken Language
Take Home Messages
Based on a detailed analysis of a manually annotated corpus of spontaneous American English (Switchboard) the following conclusions are drawn:
The pattern of pronunciation variation observed is inconsistent with a segmental model of spoken language
The pronunciation patterns observed cut across segment and articulatory-feature classes
The patterns observed display systematic variation when syllable structure and stress accent are taken into account
Therefore, future-generation speech recognition systems need to build syllable structure and stress-accent information into pronunciation models and lexical representations
A preliminary juncture-accent model provides a potential starting point for developing more realistic (and robust) lexical representations
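One way to picture a lexical representation that carries syllable structure and stress accent is sketched below. This is not the authors' model. The class names, the accent labels, the syllabification of the example word and the rule used to flag deletion-prone codas are all hypothetical, and the rule is a coarse reading of the observation that central coda stops and nasals delete most often in unaccented syllables.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Central stops and nasals named in the coda statistics: [t], [d], [n].
CENTRAL_STOPS_AND_NASALS = frozenset({"t", "d", "n"})


@dataclass
class Syllable:
    onset: List[str]
    nucleus: str
    coda: List[str]
    accent: str  # hypothetical labels: "heavy", "light", "none"


@dataclass
class LexicalEntry:
    word: str
    syllables: List[Syllable] = field(default_factory=list)


def deletion_prone_codas(entry: LexicalEntry) -> List[Tuple[int, str]]:
    """Return (syllable index, segment) for central coda stops/nasals in unaccented syllables."""
    flagged = []
    for index, syllable in enumerate(entry.syllables):
        if syllable.accent == "none":
            flagged.extend(
                (index, segment)
                for segment in syllable.coda
                if segment in CENTRAL_STOPS_AND_NASALS
            )
    return flagged


# Hypothetical entry; the syllabification is illustrative only.
wanted = LexicalEntry(
    "wanted",
    [
        Syllable(onset=["w"], nucleus="aa", coda=["n"], accent="heavy"),
        Syllable(onset=["t"], nucleus="ih", coda=["d"], accent="none"),
    ],
)
print(deletion_prone_codas(wanted))
```

Storing accent on the syllable rather than on the word lets a pronunciation model condition each onset, nucleus and coda on its own accent level.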
That’s All, Folks Many Thanks for Your Time and Attention