
1 Beyond the Phoneme A Juncture-Accent Model of Spoken Language Steven Greenberg, Hannah Carvey, Leah Hitchcock and Shuangyu Chang International Computer Science Institute 1947 Center Street, Berkeley, CA 94704 {steveng, hmcarvey, leahh, shawnc}@icsi.berkeley.edu

2 Acknowledgements and Thanks Research Funding U.S. Department of Defense U.S. National Science Foundation

3 For Further Information Consult the web site: www.icsi.berkeley.edu/~steveng

4 OVERTURE The Central Challenge for Models of Speech Recognition

5 The Serial Frame Perspective on Speech Traditional models of speech recognition assume the identity of a phonetic segment is derived from a detailed spectral profile of the acoustic signal computed for each time interval (frame) of speech

8 Phonemic Beads on a String Illustrated In traditional models of speech recognition words are conceptualized as mere sequences of phonetic segments (“phones”) …. Strung together like “beads on a string” No quarter is provided for stress accent or other syllabic properties

9 Language - The Traditional Perspective The “classical” view of spoken language posits a quasi-arbitrary relation between the lower and higher tiers of linguistic organization Cat= [k] + [ae] + [t] Cat = /k/ + /ae/ + /t/ ASR systems focus on decoding words from sequences of phones

10 A Challenge for the “Phonemic Beads on a String” Approach to Speech Recognition Pronunciation Variability

13 Pronunciation Variability of Real Speech Pronunciation patterns encountered in everyday life are extremely diverse There are literally dozens of ways in which common words are pronounced (as the following two slides illustrate for the word “AND” based on manual phonetic annotation of a corpus comprising telephone dialogues)

14 How Many Pronunciations of “and”? [Table: N, Pronunciation; canonical pronunciation marked]

15 How Many Pronunciations of “and”? [Table, continued: N, Pronunciation]

17 Pronunciation Variability of Real Speech There are literally dozens of ways in which common words are pronounced, as the following slide illustrates for the 20 most frequent words from the same corpus (Switchboard), which together account for 35% of the word tokens in the corpus

18 How Many Different Pronunciations? [Table columns: Rank, Word, N, #Pron, Most Common Pronunciation, MCP %Total] The 20 most frequent words account for 35% of the tokens
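The 35% figure is a token-coverage statistic: the share of all word tokens taken up by the most frequent word types. A minimal sketch of that calculation follows. The function name `top_n_coverage` and the toy token list are assumptions for illustration; they are not Switchboard data.

```python
from collections import Counter

def top_n_coverage(tokens, n):
    """Fraction of all tokens accounted for by the n most frequent word types."""
    if not tokens:
        raise ValueError("token list is empty")
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(tokens)

# Toy transcript (made up), 12 tokens in total.
tokens = "i and the you and that i a the and to uh".split()
print(top_n_coverage(tokens, 2))
```

Applied to a full corpus with n = 20, the same ratio would give the kind of coverage figure quoted on the slide.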

19 QUESTION How do listeners decode the speech signal given the large amount of pronunciation variation?

20 PART ONE Anatomy of a Syllable

21 Language - A Syllable-Centric Perspective A more empirically grounded perspective of spoken language focuses on the SYLLABLE as the interface between “sound” and “meaning” Within this framework the relationship between the syllable and the higher and lower tiers is non-arbitrary and systematic statistically

29 The Importance of the Syllable The analyses to follow are all linked, in some fashion, to syllable structure In order to highlight patterns germane to variation in segmental duration it is necessary to partition the data in terms of syllable position (as well as stress accent level) As a consequence, we will examine the onsets, codas and nuclei of syllables separately in order to gain insight into the underlying patterns What is an onset? What is a nucleus? What is a coda? The following slides provide a brief (and gentle) introduction to syllable structure

36 Syllable and Phonetic Segment Illustrated (OGI Numbers95 corpus; “J” = JUNCTURE) Syllables generally consist of three constituents - ONSET, NUCLEUS, CODA. Virtually all syllables contain a NUCLEUS, which is VOCALIC (by definition). Most (but not all) syllables also contain an ONSET (usually a CONSONANT). Many syllables contain a CODA (also typically a CONSONANT). The most common syllable form in English is Onset + Nucleus + Coda (“Nine”), followed in popularity by Onset + Nucleus (“Two”). Onset segments often differ in significant ways from coda segments.
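The onset / nucleus / coda partition can be sketched in a few lines of Python. This is an illustration only, not the procedure used in the presentation: the `VOCALIC` set, the function name `split_syllable`, and the Arpabet-like phone labels are all assumptions made for the example.

```python
# Hypothetical set of vocalic (nucleus) symbols, Arpabet-like; an assumption
# for illustration, not the inventory used in the presentation.
VOCALIC = {"aa", "ae", "ah", "ay", "eh", "en", "ih", "iy", "ow", "uw"}

def split_syllable(phones):
    """Split one syllable's phones into (onset, nucleus, coda).

    The nucleus is the first vocalic phone; everything before it is the
    onset and everything after it is the coda. Either may be empty.
    """
    for i, phone in enumerate(phones):
        if phone in VOCALIC:
            return phones[:i], [phone], phones[i + 1:]
    raise ValueError("no vocalic nucleus found")

print(split_syllable(["n", "ay", "n"]))  # "nine": onset + nucleus + coda
print(split_syllable(["t", "uw"]))       # "two": onset + nucleus, empty coda
```

Partitioning segments this way is what allows the later analyses to be reported separately for onsets, nuclei and codas.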

37 PART TWO Spectro-Temporal Profiles

43 The Spectro-Temporal Profile (STeP) Certain specific (and important) properties of the syllable are not well represented in terms of the traditional 2.5-D spectrographic representation Stress Accent and Juncture are two such properties A different representation, based on the log, critical-band energy profile across frequency and time, can provide the requisite detail As shown in “miniature” below …. (and as shown in expanded form on the following slides) STePs are derived from averages of hundreds of individual instances
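The averaging step behind a STeP can be sketched as follows. This is only an illustration under stated assumptions: each instance is taken to be an already time-aligned grid of log critical-band energies with identical dimensions (frames by bands), and the name `average_profile` is hypothetical. How the original analysis aligned or normalized the individual instances is not stated on the slides.

```python
def average_profile(instances):
    """Average equally sized [frame][band] grids of log energy, cell by cell."""
    if not instances:
        raise ValueError("need at least one instance")
    frames, bands = len(instances[0]), len(instances[0][0])
    for inst in instances:
        if len(inst) != frames or any(len(row) != bands for row in inst):
            raise ValueError("all instances must share the same shape")
    n = len(instances)
    return [[sum(inst[t][b] for inst in instances) / n for b in range(bands)]
            for t in range(frames)]

# Two made-up instances, 2 frames x 3 bands (arbitrary log-energy values).
a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[3.0, 2.0, 1.0], [0.0, 1.0, 2.0]]
print(average_profile([a, b]))
```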

44 Spectro-Temporal Profile - Disyllabic Word “Seven” (OGI Numbers95), full-spectrum perspective [Figure: segments [s] [eh] [vx] [en]; accented syllable, juncture, unaccented syllable; mean duration marked]

45 Spectro-Temporal Profile - Disyllabic Word “Seven” (OGI Numbers95), high-frequency perspective [Figure: segments [s] [eh] [vx] [en]; accented syllable, juncture, unaccented syllable; mean duration marked]

46 PART THREE Scientific Approach to Speech Recognition

56 A Scientific Approach to Speech Recognition Ascertain the contribution of (1) phonetic segment (and feature) classification, (2) phonetic segmentation, (3) stress accent, and (4) syllable position to ASR performance, using the OGI Numbers95 Corpus as a controlled (limited vocabulary) corpus and a relatively transparent recognition engine utilizing the following variety of articulatory-based features: manner and place of articulation, voicing, vowel height, lip-rounding, spectral dynamics, segment length. These features are explicitly tied to syllable position (i.e., onset, nucleus and coda) and stress-accent level. We will be comparing the “baseline” system (entirely automatic recognition) with an entirely “fabricated” set of input data (derived from hand-labeled phonetic annotation + AutoSAL) as well as a “half-way house” system that is partially automatic and partially not (manually derived phonetic segmentation, as well as whether each segment is vocalic or not).

61 Numbers95 Recognition – Stress Accent Impact
Entirely Stress-Accent Dependent Results
System            Word Error Rate
Fabricated        1.3%
Half-way House    2.0%
Baseline          5.6%
The half-way house system is much closer in performance to the fabricated data version than to the baseline system, suggesting that accurate phonetic segmentation is extremely important for enhanced ASR performance, as is knowledge of the location of the syllabic nucleus. Stress-accent information is most important for the vocalic nucleus: without it WER increases by 10-20%. It is also important for the coda: WER increases by 7-15%.
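The claim that the half-way house system sits much closer to the fabricated condition than to the baseline follows directly from the three word error rates. A short sketch of the arithmetic, using only the figures reported on the slide (the dictionary keys are naming choices made here):

```python
# Word error rates (%) as reported on the slide.
wer = {"fabricated": 1.3, "half_way_house": 2.0, "baseline": 5.6}

gap_to_fabricated = wer["half_way_house"] - wer["fabricated"]
gap_to_baseline = wer["baseline"] - wer["half_way_house"]

print(f"half-way house vs fabricated: {gap_to_fabricated:.1f} points")
print(f"baseline vs half-way house:   {gap_to_baseline:.1f} points")
```

The gap to the baseline is roughly five times the gap to the fabricated condition, which is the basis for the emphasis on segmentation and nucleus location.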

65 Numbers95 Recognition – Pronunciation Impact
Effect of pronunciation variation as a function of syllable position, where the “canonical” pronunciation is potentially fixed for each syllable position separately (or “All” together). “Standard” refers to the regular recognition system.
Word Error Rate (%)
System            Standard  Onset  Nucleus  Coda  All
Fabricated        1.29      1.33   1.61     1.63  1.76
Half-way House    1.97      2.16   2.21     2.55  2.81
Baseline          5.59      5.91   5.91     6.70  7.03
Conclusions: Onset segments are most canonical. Coda segments are least canonical. Therefore, it is important to provide for pronunciation variation in an ASR system.
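One way to read this table is as the relative increase in WER over the standard system when pronunciation is forced to canonical form at a given position. The sketch below computes that percentage from the slide's figures; the function name `relative_increase` and the choice of a relative (rather than absolute) measure are assumptions made here, not something stated in the presentation.

```python
# WER (%) from the slide; "standard" is the regular recognition system.
wer = {
    "Fabricated":     {"standard": 1.29, "onset": 1.33, "nucleus": 1.61, "coda": 1.63, "all": 1.76},
    "Half-way House": {"standard": 1.97, "onset": 2.16, "nucleus": 2.21, "coda": 2.55, "all": 2.81},
    "Baseline":       {"standard": 5.59, "onset": 5.91, "nucleus": 5.91, "coda": 6.70, "all": 7.03},
}

def relative_increase(row):
    """Percent increase in WER over the standard system, per condition."""
    base = row["standard"]
    return {k: 100.0 * (v - base) / base for k, v in row.items() if k != "standard"}

for system, row in wer.items():
    rel = relative_increase(row)
    print(system, {k: round(v, 1) for k, v in rel.items()})
```

In every row the onset condition costs the least and the coda condition costs the most of the three single-position conditions, consistent with the stated conclusions.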

68 Numbers95 – Syllable Position Importance
Effect of pronunciation variation as a function of syllable position, where each syllabic constituent is “neutralized” with respect to lexical matching (i.e., each element is factored out of the decoding process separately). “Standard” refers to the regular recognition system.
Word Error Rate (%)
System            Standard  Onset  Nucleus  Coda
Fabricated        1.29      9.70   5.95     3.92
Half-way House    1.97      11.27  13.28    6.60
Baseline          5.59      15.70  20.22    10.13
Neutralization of the onset and nucleic elements exerts a greater impact on ASR performance than neutralization of codas. Conclusion: Onsets and nuclei are most important for lexical access in an ASR system (at least for the Numbers95 corpus).
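The conclusion can be checked against the table by asking, for each system, which constituent adds the fewest WER points when neutralized. The helper name `least_damaging` is hypothetical; the numbers are those on the slide.

```python
# WER (%) from the slide: standard system vs. one constituent neutralized.
neutralized = {
    "Fabricated":     {"standard": 1.29, "onset": 9.70,  "nucleus": 5.95,  "coda": 3.92},
    "Half-way House": {"standard": 1.97, "onset": 11.27, "nucleus": 13.28, "coda": 6.60},
    "Baseline":       {"standard": 5.59, "onset": 15.70, "nucleus": 20.22, "coda": 10.13},
}

def least_damaging(row):
    """Return the constituent whose neutralization adds the fewest WER points."""
    added = {k: v - row["standard"] for k, v in row.items() if k != "standard"}
    return min(added, key=added.get)

for system, row in neutralized.items():
    print(system, least_damaging(row))
```

Which of onset and nucleus hurts more differs by system (onset for the fabricated data, nucleus for the other two), so the table supports the grouped statement about onsets and nuclei rather than a strict ranking between them.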

69 PART FOUR Being Phonetically and Prosodically Annotated

79 Phonetic Transcription of Spontaneous English Telephone dialogues of 5-10 minutes duration, from the SWITCHBOARD corpus, have been phonetically transcribed (labeled and segmented). Most of this material has been annotated manually. 4 hours labeled at the phone level and segmented at the syllabic level. 1 hour labeled and segmented at the phonetic-segment level. The remaining material segmented at the phonetic-segment level using automatic methods. 45 minutes of hand-labeled stress-accent material. An additional four hours of stress-accent material automatically labeled (though unused in the current analysis). There is a lot of diversity in the material transcribed: it spans speech of both genders (ca. 50/50%), reflecting a wide range of American dialectal variation, speaking rate and voice quality. Transcription system: a variant of Arpabet, with phonetic diacritics such as _gl, _cr, _fr, _n, _vl, _vd.

81 Phonetic Transcription of Spontaneous English The data are available at http://www.icsi.berkeley.edu/real/stp

91 Annotation of Stress Accent Forty-five minutes of the phonetically annotated portion of the Switchboard corpus was manually labeled with respect to stress accent. Three levels of accent were distinguished: Heavy, Light, None. (In actuality, labelers assigned a “1” to fully accented syllables, a “null” to completely unaccented syllables, and a “0.5” to all others.) An example of the annotation (attached to the vocalic nucleus) is shown below (where the accent levels could not be derived from a dictionary). In this example most of the syllables are unaccented, with two labeled as lightly accented (0.5) and one other labeled as very lightly accented (0.25).

93 Annotation of Stress Accent The data are available at http://www.icsi.berkeley.edu/~steveng/prosody

94 Automatic Labeling of Stress Accent This forty-five minutes of hand-labeled phonetic and prosodic annotation from the Switchboard corpus was used as training data for development of an Automatic Stress Accent Labeling System (AutoSAL)

97 How Good is AutoSAL? There is a 79% concordance between human and machine accent labels when the tolerance level is a quarter-step. There is 97.5% concordance when the tolerance level is half a step. This degree of concordance is as high as that exhibited by two highly trained (human) transcribers.
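Concordance within a tolerance can be sketched as the fraction of syllables whose human and machine labels differ by no more than that tolerance. The sketch rests on assumptions not confirmed by the slides: accent labels are treated as numbers on a 0 to 1 scale with “null” mapped to 0.0, a quarter-step is taken to be 0.25 and a half-step 0.5, and the label sequences are made up. It is not AutoSAL's actual evaluation code.

```python
def concordance(human, machine, tolerance):
    """Fraction of syllables whose labels differ by at most `tolerance`."""
    if len(human) != len(machine):
        raise ValueError("label sequences must be the same length")
    if not human:
        raise ValueError("label sequences are empty")
    agree = sum(abs(h - m) <= tolerance for h, m in zip(human, machine))
    return agree / len(human)

# Made-up accent labels on a 0..1 scale, for illustration only.
human   = [1.0, 0.0, 0.5, 0.0, 1.0, 0.5, 0.0, 0.25]
machine = [0.75, 0.0, 0.5, 0.5, 1.0, 1.0, 0.25, 0.25]

print(concordance(human, machine, 0.25))  # quarter-step tolerance
print(concordance(human, machine, 0.5))   # half-step tolerance
```

Widening the tolerance can only raise the score, which matches the pattern of the two reported figures.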

98 PART FIVE Stress Accent and Syllable Position

100 The Importance of Syllable Structure Before going into the details of durational variation at the segmental level we briefly examine some general patterns of pronunciation variation that are conditioned by syllable position and stress accent These data serve to illustrate the sort of variation observed that is conditioned by position within the syllable

103 Pronunciation Variation – Syllable and Accent Pronunciation variation is systematic at the level of the syllable Particularly when stress accent is also taken into account BOTH syllable structure and accent level are required for a full accounting All Segments Deletions Insertions Substitutions CODA Territory ONSET Territory NUCLEUS Territory

104 PART SIX Durational Properties of Pronunciation Variation

105 Analysis of Durational Properties of Speech The following analyses are conditioned on stress accent level and (for the most part) syllable position

106 Analysis of Durational Properties of Speech The following analyses are conditioned on stress accent level and (for the most part) syllable position We’ll begin with analyses illustrating the patterns associated with three levels of stress accent (heavy, light and none) to show the graded nature of the durational properties pertaining to syllable and segment duration

107 Analysis of Durational Properties of Speech The following analyses are conditioned on stress accent level and (for the most part) syllable position We’ll begin with analyses illustrating the patterns associated with three levels of stress accent (heavy, light and none) to show the graded nature of the durational properties pertaining to syllable and segment duration However, for purposes of illustrative clarity, many of the slides will show only two levels of accent (heavy and none) in order to delineate the differences in duration associated with stress accent level

108 Analysis of Durational Properties of Speech The following analyses are conditioned on stress accent level and (for the most part) syllable position We’ll begin with analyses illustrating the patterns associated with three levels of stress accent (heavy, light and none) to show the graded nature of the durational properties pertaining to syllable and segment duration However, for purposes of illustrative clarity, many of the slides will show only two levels of accent (heavy and none) in order to delineate the differences in duration associated with stress accent level Under such conditions, the durational properties associated with light accent are generally intermediate between heavy accent and none

109 Syllable Duration - Across Syllable Forms There is a broad range of syllable structures observed in spoken English

110 Syllable Duration - Across Syllable Forms There is a broad range of syllable structures observed in spoken English The CV and CVC forms cover ca. 60% of the syllables V = Vowel C = Consonant

111 Syllable Duration - Across Syllable Forms There is a broad range of syllable structures observed in spoken English The CV and CVC forms cover ca. 60% of the syllables Together, the V, VC, CV and CVC forms account for 85% of syllables V = Vowel C = Consonant

112 Syllable Duration - Across Syllable Forms There is a broad range of syllable structures observed in spoken English The CV and CVC forms cover ca. 60% of the syllables Together, the V, VC, CV and CVC forms account for 85% of syllables The CVCC and CCVC (complex syllable) forms account for another 10% V = Vowel C = Consonant
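The coverage figures quoted here are cumulative shares of syllable tokens by form. A minimal sketch of that tally follows, assuming each syllable has already been reduced to a C/V pattern string. The helper name and the toy token list are invented for illustration and do not reproduce the Switchboard counts.

```python
from collections import Counter
from typing import Iterable, Sequence


def form_coverage(forms: Sequence[str], subset: Iterable[str]) -> float:
    """Share of syllable tokens whose C/V pattern belongs to `subset`."""
    counts = Counter(forms)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no syllables to count")
    return sum(counts[f] for f in subset) / total


# Toy data: ten syllable tokens, each reduced to its C/V pattern.
tokens = ["CV", "CVC", "CV", "V", "VC", "CVCC", "CV", "CVC", "CCVC", "CV"]

cv_cvc = form_coverage(tokens, ["CV", "CVC"])            # the two commonest forms
simple = form_coverage(tokens, ["V", "VC", "CV", "CVC"])  # all simple forms
```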

113 Syllable Duration - Across Syllable Forms It is unsurprising that syllable duration is largely a function of the number of segments within the syllable (as shown in the graph below) Canonical Syllable Forms V = Vowel C = Consonant

114 Syllable Duration - Across Syllable Forms It is unsurprising that syllable duration is largely a function of the number of segments within the syllable (as shown in the graph below) Note the systematic lengthening of the syllable for each form as the accent level increases from “NONE” to “LIGHT” to “HEAVY” Canonical Syllable Forms V = Vowel C = Consonant

115 Syllable Duration - Across Syllable Forms It is unsurprising that syllable duration is largely a function of the number of segments within the syllable (as shown in the graph below) Note the systematic lengthening of the syllable for each form as the accent level increases from “NONE” to “LIGHT” to “HEAVY” This pattern is representative of accent’s impact on duration Canonical Syllable Forms V = Vowel C = Consonant

116 Syllable Duration - Across Syllable Forms It is unsurprising that syllable duration is largely a function of the number of segments within the syllable (as shown in the graph below) Note the systematic lengthening of the syllable for each form as the accent level increases from “NONE” to “LIGHT” to “HEAVY” This pattern is representative of accent’s impact on duration (as we’ll see) Canonical Syllable Forms V = Vowel C = Consonant

117 Syllable Duration - Accent Level/Syllable Form Canonical Syllable Forms This graph shows the same data as the previous slides, but from the perspective of just two accent levels (“HEAVY” and “NONE”) V = Vowel C = Consonant

118 Syllable Duration - Accent Level/Syllable Form Canonical Syllable Forms This graph shows the same data as the previous slides, but from the perspective of just two accent levels (“HEAVY” and “NONE”) The heavily accented syllables are generally 60-100% longer than their unaccented counterparts V = Vowel C = Consonant

119 Syllable Duration - Accent Level/Syllable Form Canonical Syllable Forms This graph shows the same data as the previous slides, but from the perspective of just two accent levels (“HEAVY” and “NONE”) The heavily accented syllables are generally 60-100% longer than their unaccented counterparts The disparity in duration is most pronounced for syllable forms with one or no consonants (i.e., V, VC, CV) V = Vowel C = Consonant

120 Syllable Duration - Accent Level/Syllable Form Canonical Syllable Forms This graph shows the same data as the previous slides, but from the perspective of just two accent levels (“HEAVY” and “NONE”) The heavily accented syllables are generally 60-100% longer than their unaccented counterparts The disparity in duration is most pronounced for syllable forms with one or no consonants (i.e., V, VC, CV) This pattern implies that accent has the greatest impact on vocalic duration V = Vowel C = Consonant
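The "60-100% longer" figure is a relative difference between mean durations. The sketch below shows the arithmetic only. The function name is hypothetical and the millisecond values are invented, not taken from the graph. Note that "100% longer" is the same statement as "twice as long".

```python
def percent_longer(accented_ms: float, unaccented_ms: float) -> float:
    """Relative lengthening of the accented mean over the unaccented mean, in %."""
    if unaccented_ms <= 0:
        raise ValueError("unaccented duration must be positive")
    return 100.0 * (accented_ms - unaccented_ms) / unaccented_ms


# Invented mean durations (ms) for one syllable form at two accent levels.
lower_bound = percent_longer(160.0, 100.0)  # 60% longer
upper_bound = percent_longer(200.0, 100.0)  # 100% longer, i.e. twice as long
```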

121 Canonical Syllable Forms Nucleus Duration - Accent Level/Syllable Form The hypothesis delineated on the previous slide (that accent has the most profound impact on vocalic duration) is confirmed in the graph below

122 Canonical Syllable Forms Nucleus Duration - Accent Level/Syllable Form The hypothesis delineated on the previous slide (that accent has the most profound impact on vocalic duration) is confirmed in the graph below Vowels in accented syllables (of all forms) are at least twice as long as their unaccented counterparts

123 Canonical Syllable Forms Nucleus Duration - Accent Level/Syllable Form The hypothesis delineated on the previous slide (that accent has the most profound impact on vocalic duration) is confirmed in the graph below Vowels in accented syllables (of all forms) are at least twice as long as their unaccented counterparts This pattern implies that the syllable nucleus absorbs a major component of accent’s impact (at least as far as duration is concerned)

124 PART SEVEN Stress Accent and the Vocalic Nucleus

125 Because the pattern of stress accent’s impact on vocalic duration is relatively uniform across syllable form it is likely that the specific structure of the syllable has relatively little impact on vocalic duration Stress Accent’s Impact on the Vocalic Nucleus

126 Because the pattern of stress accent’s impact on vocalic duration is relatively uniform across syllable form it is likely that the specific structure of the syllable has relatively little impact on vocalic duration As a consequence, the remaining analyses pertaining to accent’s impact on vocalic duration collapse the data across syllable form Stress Accent’s Impact on the Vocalic Nucleus

127 Because the pattern of stress accent’s impact on vocalic duration is relatively uniform across syllable form it is likely that the specific structure of the syllable has relatively little impact on vocalic duration As a consequence, the remaining analyses pertaining to accent’s impact on vocalic duration collapse the data across syllable form We now examine vocalic duration in somewhat greater detail and illustrate how duration, stress accent and vocalic identity interact Stress Accent’s Impact on the Vocalic Nucleus

128 The Spatial Patterning of Duration in Vocalic Nuclei

129 Vowel quality is generally thought to be a function primarily of two articulatory properties – both related to the motion of the tongue A Brief Primer on Vocalic Acoustics

130 Vowel quality is generally thought to be a function primarily of two articulatory properties – both related to the motion of the tongue The front-back plane is most closely associated with the second formant frequency (or more precisely F2 - F1) and the volume of the front-cavity resonance A Brief Primer on Vocalic Acoustics

131 Vowel quality is generally thought to be a function primarily of two articulatory properties – both related to the motion of the tongue The front-back plane is most closely associated with the second formant frequency (or more precisely F2 - F1) and the volume of the front-cavity resonance The height parameter is closely linked to the frequency of F1 A Brief Primer on Vocalic Acoustics

132 Vowel quality is generally thought to be a function primarily of two articulatory properties – both related to the motion of the tongue The front-back plane is most closely associated with the second formant frequency (or more precisely F2 - F1) and the volume of the front-cavity resonance The height parameter is closely linked to the frequency of F1 In the classic vowel “triangle,” segments are positioned in terms of the tongue positions associated with their production, as follows: A Brief Primer on Vocalic Acoustics

133 Vowel quality is generally thought to be a function primarily of two articulatory properties – both related to the motion of the tongue The front-back plane is most closely associated with the second formant frequency (or more precisely F2 - F1) and the volume of the front-cavity resonance The height parameter is closely linked to the frequency of F1 In the classic vowel “triangle,” segments are positioned in terms of the tongue positions associated with their production, as follows: A Brief Primer on Vocalic Acoustics

134 In the following slides duration is plotted on a 2-D grid, where the x-axis represents the (hypothetical) front-back tongue position Spatial Patterning of Duration et al.

135 In the following slides duration is plotted on a 2-D grid, where the x-axis represents the (hypothetical) front-back tongue position (and hence remains a constant throughout the plots to follow) Spatial Patterning of Duration et al.

136 In the following slides duration is plotted on a 2-D grid, where the x-axis represents the (hypothetical) front-back tongue position (and hence remains a constant throughout the plots to follow) The y-axis serves as the dependent measure, expressed in terms of either duration or the proportion of fully stressed (or unstressed) nuclei Spatial Patterning of Duration et al.

137 Vocalic Duration and Vowel Height The spatial patterning of vocalic segments is systematic with respect to duration

138 Vocalic Duration and Vowel Height The spatial patterning of vocalic segments is systematic with respect to duration Low vowels, be they diphthongs or monophthongs, are longer (on average) than high vowels

139 Vocalic Duration and Vowel Height All nuclei Diphthongs Monophthongs The spatial patterning of vocalic segments is systematic with respect to duration Low vowels, be they diphthongs or monophthongs, are longer (on average) than high vowels

140 Vocalic Duration and Vowel Height All nuclei Diphthongs Monophthongs The spatial patterning of vocalic segments is systematic with respect to duration Low vowels, be they diphthongs or monophthongs, are longer (on average) than high vowels Thus, duration appears to be highly correlated with vowel height

141 Vocalic Duration and Vowel Height All nuclei Diphthongs Monophthongs The spatial patterning of vocalic segments is systematic with respect to duration Low vowels, be they diphthongs or monophthongs, are longer (on average) than high vowels Thus, duration appears to be highly correlated with vowel height But … the situation is a little more complicated than first appearances would suggest

142 Durational Differences - Stressed/Unstressed There is a large dynamic range in duration between accented and unaccented vocalic nuclei Canonical Syllable Forms

143 Durational Differences - Stressed/Unstressed There is a large dynamic range in duration between accented and unaccented vocalic nuclei Moreover, diphthongs and tense, low monophthongs tend to exhibit a larger dynamic range than the lax monophthongs Canonical Syllable Forms

144 Durational Differences - Stressed/Unstressed There is a large dynamic range in duration between accented and unaccented vocalic nuclei Moreover, diphthongs and tense, low monophthongs tend to exhibit a larger dynamic range than the lax monophthongs Canonical Syllable Forms Lax monophthongs

145 Vocalic Identity Among Unstressed Nuclei The high, lax monophthongs are almost always unstressed

146 Vocalic Identity Among Unstressed Nuclei The high, lax monophthongs are almost always unstressed The low vowels, be they monophthongs or diphthongs, are rarely unstressed

147 Vocalic Identity Among Unstressed Nuclei The high, lax monophthongs are almost always unstressed The low vowels, be they monophthongs or diphthongs, are rarely unstressed The high diphthongs and high/mid, tense monophthongs occupy an intermediate position

148 The high vowels are rarely fully stressed Vocalic Identity Among Fully Stressed Nuclei

149 The high vowels are rarely fully stressed The low vowels, be they monophthongs or diphthongs, are far more likely to be fully stressed Vocalic Identity Among Fully Stressed Nuclei

150 The high vowels are rarely fully stressed The low vowels, be they monophthongs or diphthongs, are far more likely to be fully stressed An intermediate degree of stress accounts for the other vocalic instances Vocalic Identity Among Fully Stressed Nuclei

151 The high vowels are rarely fully stressed The low vowels, be they monophthongs or diphthongs, are far more likely to be fully stressed An intermediate degree of stress accounts for the other vocalic instances (but will not be addressed here) Vocalic Identity Among Fully Stressed Nuclei

152 The vowels of heavily accented syllables are (mostly) pronounced canonically Canonical Pronunciations Non-Canonical Pronunciations Vocalic Variation – Importance of Stress Accent

153 The vowels of heavily accented syllables are (mostly) pronounced canonically Low vowels are largely the province of accented syllables Canonical Pronunciations Non-Canonical Pronunciations Vocalic Variation – Importance of Stress Accent

154 The vowels of heavily accented syllables are (mostly) pronounced canonically Low vowels are largely the province of accented syllables, and High vowels the province of unaccented syllables Vocalic Variation – Importance of Stress Accent Canonical Pronunciations Non-Canonical Pronunciations

155 The vowels of heavily accented syllables are (mostly) pronounced canonically Low vowels are largely the province of accented syllables, and High vowels the province of unaccented syllables Moreover, there’s a lexical bias towards high vowels for unaccented forms Canonical Pronunciations Non-Canonical Pronunciations Vocalic Variation – Importance of Stress Accent

156 The vowels of heavily accented syllables are (mostly) pronounced canonically Low vowels are largely the province of accented syllables, and High vowels the province of unaccented syllables Moreover, there’s a lexical bias towards high vowels for unaccented forms That’s reinforced in patterns of deviation from canonical pronunciation Canonical Pronunciations Non-Canonical Pronunciations Vocalic Variation – Importance of Stress Accent

157 Vocalic Height Deviation from Canonical Amount of Change Direction of Change Vowels are more likely to RISE in height than to descend when unaccented

158 Vocalic Height Deviation from Canonical Amount of Change Direction of Change Vowels are more likely to RISE in height than to descend when unaccented Vocalic lowering of height is rare

159 Vocalic Height Deviation from Canonical Amount of Change Direction of Change Vowels are more likely to RISE in height than to descend when unaccented Vocalic lowering of height is rare Most deviations from the canonical maintain vowel height

160 Vocalic Height Deviation from Canonical Amount of Change Direction of Change Vowels are more likely to RISE in height than to descend when unaccented Vocalic lowering of height is rare Most deviations from the canonical maintain vowel height More than a single height step deviation is uncommon

161 Vocalic Height Deviation from Canonical Amount of Change Direction of Change Vowels are more likely to RISE in height than to descend when unaccented Vocalic lowering of height is rare Most deviations from the canonical maintain vowel height More than a single height step deviation is uncommon Virtually all 2-step height deviations occur in unaccented syllables
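The "amount" and "direction" of a height deviation can be expressed as a signed step count between the canonical and the transcribed vowel. The sketch below assumes a three-level height scale (low, mid, high) and a small, purely illustrative height table keyed by ARPAbet-style symbols. The slides do not give the actual height assignments or scale used in the analysis, so both the table and the function name are hypothetical.

```python
# Illustrative height table (assumption): 0 = low, 1 = mid, 2 = high.
VOWEL_HEIGHT = {
    "iy": 2, "ih": 2, "uw": 2, "uh": 2,   # high vowels
    "eh": 1, "ah": 1, "ow": 1,            # mid vowels
    "ae": 0, "aa": 0,                     # low vowels
}


def height_shift(canonical: str, transcribed: str) -> int:
    """Signed height change in steps: positive = raised, negative = lowered."""
    return VOWEL_HEIGHT[transcribed] - VOWEL_HEIGHT[canonical]


# Usage: classify a few (canonical, transcribed) vowel pairs.
pairs = [("ae", "ih"), ("eh", "ih"), ("iy", "ih"), ("ih", "eh")]
shifts = [height_shift(c, t) for c, t in pairs]
```

Under this coding, a shift of 0 corresponds to a deviation that maintains vowel height, and an absolute value of 2 corresponds to the 2-step deviations that the slide reports as occurring almost only in unaccented syllables.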

162 The Vowel Space Under (Full) Stress (Accent) In heavily accented nuclei there is a relatively even distribution of segments across the vowel space, with a slight bias towards the front and central vowels Canonical Vowels Only

163 In unaccented syllables vowels are confined largely to the high-front and high-central sectors of the articulatory space The Vowel Space Without (Stress) Accent Canonical Vowels Only

164 In unaccented syllables vowels are confined largely to the high-front and high-central sectors of the articulatory space The low and mid vowels “get creamed” The Vowel Space Without (Stress) Accent Canonical Vowels Only

165 Stress accent exerts a profound effect on the character of the vowel space The Vowel Spaces Compared Heavily Accented Unaccented Canonical Vowels Only

166 Stress accent exerts a profound effect on the character of the vowel space High vowels are largely associated with unaccented syllables The Vowel Spaces Compared Heavily Accented Unaccented Canonical Vowels Only

167 Stress accent exerts a profound effect on the character of the vowel space High vowels are largely associated with unaccented syllables Low vowels are mostly associated with accented forms The Vowel Spaces Compared Heavily Accented Unaccented Canonical Vowels Only

168 Stress accent exerts a profound effect on the character of the vowel space High vowels are largely associated with unaccented syllables Low vowels are mostly associated with accented forms This distinction between accented and unaccented syllables is of profound importance for understanding (and modeling) pronunciation variation The Vowel Spaces Compared Heavily Accented Unaccented Canonical Vowels Only

169 Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse Is It Stress? Vocalic Identity? Or What?

170 Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse For any given vocalic class, stressed segments are longer (on average) Is It Stress? Vocalic Identity? Or What?

171 Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse For any given vocalic class, stressed segments are longer (on average) The durational disparity is most pronounced among the low vowels and the diphthongs Is It Stress? Vocalic Identity? Or What?

172 Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse For any given vocalic class, stressed segments are longer (on average) The durational disparity is most pronounced among the low vowels and the diphthongs Low vowels tend to be much longer in duration than high vowels Is It Stress? Vocalic Identity? Or What?

173 Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse For any given vocalic class, stressed segments are longer (on average) The durational disparity is most pronounced among the low vowels and the diphthongs Low vowels tend to be much longer in duration than high vowels This is the case even for diphthongs Is It Stress? Vocalic Identity? Or What?

174 Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse For any given vocalic class, stressed segments are longer (on average) The durational disparity is most pronounced among the low vowels and the diphthongs Low vowels tend to be much longer in duration than high vowels This is the case even for diphthongs Low vowels are rarely without some measure of stress accent Is It Stress? Vocalic Identity? Or What?

175 Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse For any given vocalic class, stressed segments are longer (on average) The durational disparity is most pronounced among the low vowels and the diphthongs Low vowels tend to be much longer in duration than high vowels This is the case even for diphthongs Low vowels are rarely without some measure of stress accent This is true for monophthongs as well as diphthongs Is It Stress? Vocalic Identity? Or What?

176 Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse For any given vocalic class, stressed segments are longer (on average) The durational disparity is most pronounced among the low vowels and the diphthongs Low vowels tend to be much longer in duration than high vowels This is the case even for diphthongs Low vowels are rarely without some measure of stress accent This is true for monophthongs as well as diphthongs High vowels are RARELY fully stressed Is It Stress? Vocalic Identity? Or What?

177 Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse For any given vocalic class, stressed segments are longer (on average) The durational disparity is most pronounced among the low vowels and the diphthongs Low vowels tend to be much longer in duration than high vowels This is the case even for diphthongs Low vowels are rarely without some measure of stress accent This is true for monophthongs as well as diphthongs High vowels are RARELY fully stressed This is particularly so for monophthongs, but also applies to diphthongs Is It Stress? Vocalic Identity? Or What?

178 Duration appears to play an important (but certainly not exclusive) role in stress accent for spontaneous American English discourse For any given vocalic class, stressed segments are longer (on average) The durational disparity is most pronounced among the low vowels and the diphthongs Low vowels tend to be much longer in duration than high vowels This is the case even for diphthongs Low vowels are rarely without some measure of stress accent This is true for monophthongs as well as diphthongs High vowels are RARELY fully stressed This is particularly so for monophthongs, but also applies to diphthongs Thus, stress accent appears to be intricately involved with vocalic identity Is It Stress? Vocalic Identity? Or What?

179 PART EIGHT Stress Accent’s Impact on Syllable Onsets

180 Stress Accent and Syllable Onsets The onset is often cited as the key syllabic constituent with respect to “lexical access”

181 Stress Accent and Syllable Onsets The onset is often cited as the key syllabic constituent with respect to “lexical access” It is therefore of interest to ascertain how the onset’s duration behaves as a function of accent level

182 Stress Accent and Syllable Onsets The onset is often cited as the key syllabic constituent with respect to “lexical access” It is therefore of interest to ascertain how the onset’s duration behaves as a function of accent level Because of the onset’s key role in lexical access one might assume that its duration would be relatively stable across accent level

183 Stress Accent and Syllable Onsets The onset is often cited as the key syllabic constituent with respect to “lexical access” It is therefore of interest to ascertain how the onset’s duration behaves as a function of accent level Because of the onset’s key role in lexical access one might assume that its duration would be relatively stable across accent level The following slides suggest that this assumption is INCORRECT

184 Stress Accent and Syllable Onsets The onset is often cited as the key syllabic constituent with respect to “lexical access” It is therefore of interest to ascertain how the onset’s duration behaves as a function of accent level Because of the onset’s key role in lexical access one might assume that its duration would be relatively stable across accent level The following slides suggest that this assumption is INCORRECT, and that the structure of the onset is more complex (and more interesting) than initial intuition would suggest

185 Canonical Syllable Forms Onset Duration - Accent Level/Syllable Form The duration of the syllable onset varies significantly as a function of accent level (though not quite as much as exhibited by vocalic constituents)

186 Canonical Syllable Forms Onset Duration - Accent Level/Syllable Form The duration of the syllable onset varies significantly as a function of accent level (though not quite as much as exhibited by vocalic constituents) Onset duration is similar across syllable form

187 Canonical Syllable Forms Onset Duration - Accent Level/Syllable Form The duration of the syllable onset varies significantly as a function of accent level (though not quite as much as exhibited by vocalic constituents) Onset duration is similar across syllable form (except that segments comprising complex onsets [i.e., CCVC] are slightly shorter)

188 Canonical Syllable Forms Onset Duration - Accent Level/Syllable Form The duration of the syllable onset varies significantly as a function of accent level (though not quite as much as exhibited by vocalic constituents) Onset duration is similar across syllable form (except that segments comprising complex onsets [i.e., CCVC] are slightly shorter) The duration of unaccented onsets is similar across syllable forms

189 Canonical Syllable Forms Onset Duration - Accent Level/Syllable Form Onsets of accented syllables are generally 50-60% longer than their unaccented counterparts

190 Canonical Syllable Forms Onset Duration - Accent Level/Syllable Form Onsets of accented syllables are generally 50-60% longer than their unaccented counterparts Although this durational difference is not quite as large as observed for vocalic nuclei, it is still substantial (and mostly consistent across forms)

191 Place of Articulation – A Brief Primer The tongue contacts (or nearly so) the roof of the mouth in producing many of the consonantal sounds in English Anterior Labial [p] [b] [m] Labio-dental [f] [v] Inter-dental [th] [dh] Central Alveolar [t] [d] [n] [s] [z] Posterior Palatal [sh] [zh] Velar [k] [g] [ng] From Daniloff (1973)

192 Segmental Identity and Stress Accent It is of interest to compare accent’s impact on segmental duration with its impact on segmental realization (i.e., whether the segment is realized canonically or not …)

193 Segmental Identity and Stress Accent It is of interest to compare accent’s impact on segmental duration with its impact on segmental realization (i.e., whether the segment is realized canonically or not …) Usually, non-canonical realizations are manifest as segmental deletions

194 Segmental Identity and Stress Accent It is of interest to compare accent’s impact on segmental duration with its impact on segmental realization (i.e., whether the segment is realized canonically or not …) Usually, non-canonical realizations are manifest as segmental deletions The pattern of segmental realization bears some correspondence to durational variation as a function of accent level

195 Segmental Identity and Stress Accent It is of interest to compare accent’s impact on segmental duration with its impact on segmental realization (i.e., whether the segment is realized canonically or not …) Usually, non-canonical realizations are manifest as segmental deletions The pattern of segmental realization bears some correspondence to durational variation as a function of accent level But also exhibits some interesting differences

196 Segmental Identity and Stress Accent It is of interest to compare accent’s impact on segmental duration with its impact on segmental realization (i.e., whether the segment is realized canonically or not …) Usually, non-canonical realizations are manifest as segmental deletions The pattern of segmental realization bears some correspondence to durational variation as a function of accent level But also exhibits some interesting differences (which are potentially significant for models of phonetic organization)

197 Segmental Identity and Stress Accent It is of interest to compare accent’s impact on segmental duration with its impact on segmental realization (i.e., whether the segment is realized canonically or not …) Usually, non-canonical realizations are manifest as segmental deletions The pattern of segmental realization bears some correspondence to durational variation as a function of accent level But also exhibits some interesting differences (which are potentially significant for models of phonetic organization) Before we examine the segmental patterns in detail, a brief primer on the interpretation of these data is presented

198 Road Map - How to Interpret the Data Compare the numbers in the YELLOW and ORANGE columns Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

199 Road Map - How to Interpret the Data Compare the numbers in the YELLOW and ORANGE columns Most numbers in the YELLOW / ORANGE columns will be similar Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

200 Road Map - How to Interpret the Data Compare the numbers in the YELLOW and ORANGE columns Most numbers in the YELLOW / ORANGE columns will be similar Indicating that the phonetic realization of the segment is the canonical form Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

201 Road Map - How to Interpret the Data Compare the numbers in the YELLOW and ORANGE columns Most numbers in the YELLOW / ORANGE columns will be similar Indicating that the phonetic realization of the segment is the canonical form A large disparity between columns is marked with a blue box Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

202 Road Map - How to Interpret the Data Compare the numbers in the YELLOW and ORANGE columns Most numbers in the YELLOW / ORANGE columns will be similar Indicating that the phonetic realization of the segment is the canonical form A large disparity between columns is marked with a blue box READY? Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

203 Road Map - How to Interpret the Data Compare the numbers in the YELLOW and ORANGE columns Most numbers in the YELLOW / ORANGE columns will be similar Indicating that the phonetic realization of the segment is the canonical form A large disparity between columns is marked with a blue box READY? OK, Let’s go! Can = Canonical form Trans = Transcribed (i.e., phonetically realized)
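The reading procedure on this slide (compare the canonical and transcribed counts for each segment and mark large disparities) can be sketched as follows. The slides do not say how large a disparity must be to earn a blue box, so the 25% relative threshold is an assumption. The function name and the counts are invented for illustration.

```python
from typing import Dict, List, Tuple


def flag_disparities(counts: Dict[str, Tuple[int, int]],
                     threshold: float = 0.25) -> List[str]:
    """Return segments whose transcribed count departs from the canonical
    count by at least `threshold` (relative to the canonical count).

    `counts` maps a segment label to (canonical_count, transcribed_count).
    """
    flagged = []
    for segment, (can, trans) in counts.items():
        if can == 0:
            # No canonical instances: any transcribed instance is an insertion.
            if trans > 0:
                flagged.append(segment)
            continue
        if abs(trans - can) / can >= threshold:
            flagged.append(segment)
    return flagged


# Invented counts: (Can, Trans) per onset segment.
onset_counts = {"dh": (100, 60), "p": (80, 78), "q": (0, 12)}
boxed = flag_disparities(onset_counts)
```

In this toy table the canonical count for [dh] far exceeds the transcribed count (a deletion pattern), while [q] appears only in the transcribed column (an insertion pattern). Both would be marked, and [p] would not.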

204 Syllable Onset Statistics – ANTERIOR Place Stress accent has relatively little impact on anterior onset segments Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

205 Syllable Onset Statistics – ANTERIOR Place Stress accent has relatively little impact on anterior onset segments EXCEPT for [dh] and [y] Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

206 Syllable Onset Statistics – ANTERIOR Place Stress accent has relatively little impact on anterior onset segments EXCEPT for [dh] and [y] [dh] (as in “the” and “them”) tends to delete in unaccented syllables, as does [y] (although to a lesser extent) Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

207 Central segments tend to “disappear” under (absence of) stress (accent) Can = Canonical form Trans = Transcribed (i.e., phonetically realized) Syllable Onset Statistics – CENTRAL Place

208 Central segments tend to “disappear” under (absence of) stress (accent) There is also a tendency for flaps ([dx] and [nx]) to insert under similar conditions Can = Canonical form Trans = Transcribed (i.e., phonetically realized) Syllable Onset Statistics – CENTRAL Place

209 Central segments tend to “disappear” under (absence of) stress (accent) There is also a tendency for flaps ([dx] and [nx]) to insert under similar conditions In heavily accented syllables, central segments maintain their canonical identity Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

210 Syllable Onset Statistics – Posterior Place Posterior segments are remarkably stable in onset position Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

211 Syllable Onset Statistics – Posterior Place Posterior segments are remarkably stable in onset position The only significant “deviation” from canonical realization is the intrusion of the glottal stop [q], which lacks phonemic status in English Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

212 Syllable Onset Statistics – Place Chameleons “Chameleons” assimilate their place of articulation to the following vowel Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

213 Syllable Onset Statistics – Place Chameleons “Chameleons” assimilate their place of articulation to the following vowel They are relatively stable at syllable onset, except in unaccented forms Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

214 Syllable Onset Statistics – Place Chameleons “Chameleons” assimilate their place of articulation to the following vowel They are relatively stable at syllable onset, except in unaccented forms The reduced form of [l] is [lg], a glide-like element – it tends to assume the functional status of [l] in unaccented syllables Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

215 PART NINE Stress Accent’s Impact on Syllable Codas

216 Stress Accent and Syllable Codas Stress accent’s impact on syllable codas differs from that of onsets

217 Stress Accent and Syllable Codas Stress accent’s impact on syllable codas differs from that of onsets The disparity in duration between accented and unaccented forms tends to be significantly less for codas than for onsets (at least when deletions are omitted from consideration)

218 Stress Accent and Syllable Codas Stress accent’s impact on syllable codas differs from that of onsets The disparity in duration between accented and unaccented forms tends to be significantly less for codas than for onsets (at least when deletions are omitted from consideration) There is a far greater probability of segmental deletion in coda constituents

219 Stress Accent and Syllable Codas Stress accent’s impact on syllable codas differs from that of onsets The disparity in duration between accented and unaccented forms tends to be significantly less for codas than for onsets (at least when deletions are omitted from consideration) There is a far greater probability of segmental deletion in coda constituents Accent level exerts a powerful influence on segmental deletion and on segmental duration

220 Stress Accent and Syllable Codas Stress accent’s impact on syllable codas differs from that of onsets The disparity in duration between accented and unaccented forms tends to be significantly less for codas than for onsets (at least when deletions are omitted from consideration) There is a far greater probability of segmental deletion in coda constituents Accent level exerts a powerful influence on segmental deletion and on segmental duration To a certain degree segmental deletion and duration interact (or are flip sides of the same phonetic coin)

221 Coda Duration - Accent Level/Syllable Form Coda duration (on average) is similar across syllable structure, both for accented and unaccented forms Canonical Syllable Forms

222 Coda Duration - Accent Level/Syllable Form Coda duration (on average) is similar across syllable structure, both for accented and unaccented forms There is a relatively small dynamic range in duration between accented and unaccented codas (relative to onsets and nuclei) Canonical Syllable Forms

223 Coda Duration - Accent Level/Syllable Form Coda duration (on average) is similar across syllable structure, both for accented and unaccented forms There is a relatively small dynamic range in duration between accented and unaccented codas (relative to onsets and nuclei) Moreover, the duration of certain coda constituents is virtually identical in accented and unaccented syllables Canonical Syllable Forms

224 Syllable Coda Statistics – Anterior Place Anterior coda segments are relatively stable under stress (accent) Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

225 Syllable Coda Statistics – Anterior Place Anterior coda segments are relatively stable under stress (accent) The segments [m] and [v] are exceptions Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

226 Syllable Coda Statistics – Anterior Place Anterior coda segments are relatively stable under stress (accent) The segments [m] and [v] are exceptions – they often function as “flaps” in this context, and Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

227 Syllable Coda Statistics – Anterior Place Anterior coda segments are relatively stable under stress (accent) The segments [m] and [v] are exceptions – they often function as “flaps” in this context, and They tend to delete in unaccented syllables Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

228 Syllable Coda Statistics – Central Place Central coda segments are extremely unstable under stress (accent) Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

229 Syllable Coda Statistics – Central Place Central coda segments are extremely unstable under stress (accent) (except for the fricatives [s] and [z]) Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

230 Syllable Coda Statistics – Central Place Central coda segments are extremely unstable under stress (accent) (except for the fricatives [s] and [z]) The segments [t], [d] and [n] tend to delete in coda position, even in heavily accented syllables Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

231 Syllable Coda Statistics – Central Place Central coda segments are extremely unstable under stress (accent) (except for the fricatives [s] and [z]) The segments [t], [d] and [n] tend to delete in coda position, even in heavily accented syllables The major effect of stress accent is on the probability of segmental deletion (which is appreciably higher in unaccented forms) Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

232 Syllable Coda Duration - CENTRAL Place CANONICAL Syllable Forms The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables – this affects durational properties

233 Syllable Coda Duration - CENTRAL Place CANONICAL Syllable Forms The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables – this affects durational properties Many of the coda segments do not exhibit a difference in duration between accented and unaccented forms

234 Syllable Coda Duration - CENTRAL Place CANONICAL Syllable Forms The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables – this affects durational properties Many of the coda segments do not exhibit a difference in duration between accented and unaccented forms Many of the unaccented central codas are short in duration, in contrast to:

235 Syllable Coda Duration - CENTRAL Place CANONICAL Syllable Forms The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables – this affects durational properties Many of the coda segments do not exhibit a difference in duration between accented and unaccented forms Many of the unaccented central codas are short in duration, in contrast to: (1) central onsets

236 Syllable Coda Duration - CENTRAL Place CANONICAL Syllable Forms The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables – this affects durational properties Many of the coda segments do not exhibit a difference in duration between accented and unaccented forms Many of the unaccented central codas are short in duration, in contrast to: (1) central onsets, (2) anterior codas

237 Syllable Coda Duration - CENTRAL Place CANONICAL Syllable Forms The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables – this affects durational properties Many of the coda segments do not exhibit a difference in duration between accented and unaccented forms Many of the unaccented central codas are short in duration, in contrast to: (1) central onsets, (2) anterior codas, (3) posterior codas

238 Syllable Coda Duration - CENTRAL Place CANONICAL Syllable Forms The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables – this affects durational properties Many of the coda segments do not exhibit a difference in duration between accented and unaccented forms Many of the unaccented central codas are short in duration, in contrast to: (1) central onsets, (2) anterior codas, (3) posterior codas, (4) chameleon codas

239 Syllable Coda Duration - CENTRAL Place CANONICAL Syllable Forms The centrally articulated codas exhibit a high probability of deletion, particularly in unaccented syllables – this affects durational properties Many of the coda segments do not exhibit a difference in duration between accented and unaccented forms Many of the unaccented central codas are short in duration, in contrast to: (1) central onsets, (2) anterior codas, (3) posterior codas, (4) chameleon codas

240 Syllable Coda Duration - CENTRAL Place ALL Syllable Forms Because of the high probability of deletions for central coda consonants the mean durations are quite low relative to other conditions

241 Syllable Coda Duration - CENTRAL Place ALL Syllable Forms Because of the high probability of deletions for central coda consonants the mean durations are quite low relative to other conditions In some sense the default duration for central codas is very short
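One way to read this point is that deleted segments enter the average as tokens of zero duration. That is an assumption of the sketch below (the slides do not state how deletions are counted), and the numbers used are hypothetical. Under that assumption the mean over all forms is the mean of the realized tokens scaled by the proportion of tokens that survive.

```python
def mean_duration_all_forms(realized_mean_ms: float, deletion_prob: float) -> float:
    """Mean duration over all tokens when deleted tokens count as 0 ms.

    realized_mean_ms: mean duration of the tokens that are actually realized.
    deletion_prob:    proportion of canonical tokens that are deleted (0..1).
    """
    if not 0.0 <= deletion_prob <= 1.0:
        raise ValueError("deletion_prob must lie between 0 and 1")
    return (1.0 - deletion_prob) * realized_mean_ms


# Hypothetical values: a 60 ms realized mean with half of the tokens deleted.
print(mean_duration_all_forms(60.0, 0.5))
```

With no deletions the two means coincide, which is why the canonical-form and all-form durations diverge mainly for the segment classes with high deletion rates.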

242 Syllable Coda Statistics – Posterior Place Posterior coda segments are relatively stable under stress (accent) Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

243 Syllable Coda Statistics – Posterior Place Posterior coda segments are relatively stable under stress (accent) The primary exception is [ng], which tends to delete in unaccented syllables Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

244 Syllable Coda Statistics – POSTERIOR Place Posterior coda segments are relatively stable under stress (accent) The primary exception is [ng], which tends to delete in unaccented syllables The “infamous” glottal stop [q] tends to insert in this context Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

245 Syllable Coda Statistics – Place Chameleons Chameleon segments are unstable under stress (accent) Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

246 Syllable Coda Statistics – Place Chameleons Chameleon segments are unstable under stress (accent) This is particularly true for [l] (for all levels of accent), where many canonical segments transmute into [lg], particularly in accented forms Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

247 Syllable Coda Statistics – Place Chameleons Chameleon segments are unstable under stress (accent) This is particularly true for [l] (for all levels of accent), where many canonical segments transmute into [lg], particularly in accented forms The segment [r] tends to delete in unaccented syllables, but not otherwise Can = Canonical form Trans = Transcribed (i.e., phonetically realized)

248 PART TEN What’s Going on in Pronunciation?

249 With respect to onset and coda segments, there are two basic forms … What’s Going On? (in pronunciation)

250 With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and What’s Going On? (in pronunciation)

251 With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and (2) those that are not What’s Going On? (in pronunciation)

252 With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and (2) those that are not Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulation constriction is either anterior or posterior What’s Going On? (in pronunciation)

253 With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and (2) those that are not Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulation constriction is either anterior or posterior The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables What’s Going On? (in pronunciation)

254 With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and (2) those that are not Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulation constriction is either anterior or posterior The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables The place chameleons (i.e., the approximants) are not very stable in either onset or coda position What’s Going On? (in pronunciation)

255 With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and (2) those that are not Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulation constriction is either anterior or posterior The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables The place chameleons (i.e., the approximants) are not very stable in either onset or coda position The vowels form two basic groups – What’s Going On? (in pronunciation)

256 With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and (2) those that are not Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulation constriction is either anterior or posterior The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables The place chameleons (i.e., the approximants) are not very stable in either onset or coda position The vowels form two basic groups – (1) accented What’s Going On? (in pronunciation)

257 With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and (2) those that are not Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulation constriction is either anterior or posterior The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables The place chameleons (i.e., the approximants) are not very stable in either onset or coda position The vowels form two basic groups – (1) accented and (2) unaccented What’s Going On? (in pronunciation)

258 With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and (2) those that are not Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulation constriction is either anterior or posterior The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables The place chameleons (i.e., the approximants) are not very stable in either onset or coda position The vowels form two basic groups – (1) accented and (2) unaccented The accented vowels are generally canonically realized and quasi-evenly distributed across the vowel space What’s Going On? (in pronunciation)

259 With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and (2) those that are not Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulation constriction is either anterior or posterior The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables The place chameleons (i.e., the approximants) are not very stable in either onset or coda position The vowels form two basic groups – (1) accented and (2) unaccented The accented vowels are generally canonically realized and quasi-evenly distributed across the vowel space The unaccented forms tend to concentrate in the high-front and high-central regions of the vowel space What’s Going On? (in pronunciation)

260 With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and (2) those that are not Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulation constriction is either anterior or posterior The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables The place chameleons (i.e., the approximants) are not very stable in either onset or coda position The vowels form two basic groups – (1) accented and (2) unaccented The accented vowels are generally canonically realized and quasi-evenly distributed across the vowel space The unaccented forms tend to concentrate in the high-front and high-central regions of the vowel space Certain segments are actually junctures – e.g., the flaps and the glottal stop What’s Going On? (in pronunciation)

261 With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and (2) those that are not Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulation constriction is either anterior or posterior The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables The place chameleons (i.e., the approximants) are not very stable in either onset or coda position The vowels form two basic groups – (1) accented and (2) unaccented The accented vowels are generally canonically realized and quasi-evenly distributed across the vowel space The unaccented forms tend to concentrate in the high-front and high-central regions of the vowel space Certain segments are actually junctures – e.g., the flaps and the glottal stop Several other so-called segments are junctures as well What’s Going On? (in pronunciation)

262 With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and (2) those that are not Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulation constriction is either anterior or posterior The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables The place chameleons (i.e., the approximants) are not very stable in either onset or coda position The vowels form two basic groups – (1) accented and (2) unaccented The accented vowels are generally canonically realized and quasi-evenly distributed across the vowel space The unaccented forms tend to concentrate in the high-front and high-central regions of the vowel space Certain segments are actually junctures – e.g., the flaps and the glottal stop Several other so-called segments are junctures as well (as they function like flaps); the most noteworthy examples are [dh] and [v] What’s Going On? (in pronunciation)

263 With respect to onset and coda segments, there are two basic forms … (1) those that are relatively stable across accent level, and (2) those that are not Most of the non-continuants (i.e., stops and nasals) are stable when the locus of articulation constriction is either anterior or posterior The centrally articulated stops and nasals are highly unstable, particularly in coda position and in unaccented syllables The place chameleons (i.e., the approximants) are not very stable in either onset or coda position The vowels form two basic groups – (1) accented and (2) unaccented The accented vowels are generally canonically realized and quasi-evenly distributed across the vowel space The unaccented forms tend to concentrate in the high-front and high-central regions of the vowel space Certain segments are actually junctures – e.g., the flaps and the glottal stop Several other so-called segments are junctures as well (as they function like flaps); the most noteworthy examples are [dh] and [v] None of these properties is consistent with a segmental model of language What’s Going On? (in pronunciation)

264 Synopsis The Rationale for a Juncture-Accent Model of Spoken Language

265 Take Home Messages Based on a detailed analysis of a manually annotated corpus of spontaneous American English (Switchboard) the following conclusions are drawn:

266 The pattern of pronunciation variation observed is inconsistent with a segmental model of spoken language Take Home Messages

267 Based on a detailed analysis of a manually annotated corpus of spontaneous American English (Switchboard) the following conclusions are drawn: The pattern of pronunciation variation observed is inconsistent with a segmental model of spoken language The pronunciation patterns observed cut across segment and articulatory-feature classes Take Home Messages

268 Based on a detailed analysis of a manually annotated corpus of spontaneous American English (Switchboard) the following conclusions are drawn: The pattern of pronunciation variation observed is inconsistent with a segmental model of spoken language The pronunciation patterns observed cut across segment and articulatory-feature classes The patterns observed display systematic variation when syllable structure and stress accent are taken into account Take Home Messages

269 Based on a detailed analysis of a manually annotated corpus of spontaneous American English (Switchboard) the following conclusions are drawn: The pattern of pronunciation variation observed is inconsistent with a segmental model of spoken language The pronunciation patterns observed cut across segment and articulatory-feature classes The patterns observed display systematic variation when syllable structure and stress accent are taken into account Therefore, future-generation speech recognition systems need to build syllable structure and stress-accent information into pronunciation models and lexical representations Take Home Messages

270 Based on a detailed analysis of a manually annotated corpus of spontaneous American English (Switchboard) the following conclusions are drawn: The pattern of pronunciation variation observed is inconsistent with a segmental model of spoken language The pronunciation patterns observed cut across segment and articulatory-feature classes The patterns observed display systematic variation when syllable structure and stress accent are taken into account Therefore, future-generation speech recognition systems need to build syllable structure and stress-accent information into pronunciation models and lexical representations A preliminary juncture-accent model provides a potential starting point for developing more realistic (and robust) lexical representations Take Home Messages
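As a rough illustration of what a lexical representation carrying syllable structure and stress accent might look like, here is a minimal Python sketch. It is not the authors' juncture-accent model: the class names, the numeric accent scale, and the syllabification of the example word are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Syllable:
    onset: List[str]    # consonants before the vowel
    nucleus: List[str]  # the vocalic core
    coda: List[str]     # consonants after the vowel
    accent: float       # hypothetical scale: 0.0 = unaccented, 1.0 = heavily accented

    def segments(self) -> List[str]:
        return self.onset + self.nucleus + self.coda


@dataclass
class LexicalEntry:
    word: str
    syllables: List[Syllable]

    def canonical_phones(self) -> List[str]:
        """Flatten the entry into the familiar 'beads on a string' sequence."""
        return [p for syl in self.syllables for p in syl.segments()]

    def unaccented_codas(self) -> List[str]:
        """Coda segments in unaccented syllables, i.e. the positions the
        slides describe as most prone to deletion."""
        return [p for syl in self.syllables if syl.accent == 0.0 for p in syl.coda]


# Illustrative entry; the syllabification and accent values are assumed.
seven = LexicalEntry(
    word="seven",
    syllables=[
        Syllable(onset=["s"], nucleus=["eh"], coda=[], accent=1.0),
        Syllable(onset=["v"], nucleus=["ax"], coda=["n"], accent=0.0),
    ],
)

print(seven.canonical_phones())
print(seven.unaccented_codas())
```

The flat phone sequence is still recoverable from such an entry, but a pronunciation model can additionally condition on syllable position and accent level, which a purely segmental lexicon cannot express.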

271 That’s All, Folks Many Thanks for Your Time and Attention

