
1 Una Y. Chow & Stephen J. Winters. Alberta Conference on Linguistics, November 1, 2014

2  Can exemplar theory account for native listeners’ perception of intonation in English statements and questions?

3  Previous studies reveal significant variation in speech.  Peterson & Barney (1952): frequency of F1 (x-axis) vs. frequency of F2 (y-axis) for 10 vowels (i, ɪ, ɛ, æ, ɑ, ɔ, ʊ, u, ʌ, ɝ) produced by 76 speakers.  How do listeners perceive speech sounds given this amount of variation?

4  Johnson (1997) proposed an exemplar theory to account for listeners’ perception of speech.  According to this theory (Johnson, 1997; Pierrehumbert, 2001), listeners store in memory the fine phonetic details of the words (or exemplars) that they hear, including sounds associated with the speaker’s identity, gender, and language.  When listeners hear a new word, they categorize it with the exemplars in memory that are most similar to it overall.

5  The objective of my project was to create an exemplar-based computational model that would learn to categorize English statements and questions based on how similar a sentence is to previously encountered sentences in its intonation pattern.  If a similarity-based model (Johnson, 1997) can accurately classify novel sentences on the basis of intonation alone, it can be expanded to account for the human perception of intonation more generally.

6

7  Reads in audio-recorded samples of speech sounds (in .wav format), e.g., Ann teaches history.  Removes any silence or noise before and after the speech sound.
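The trimming step can be sketched as a simple frame-energy filter. This is a Python sketch only; the frame length and energy threshold are illustrative assumptions, not values from the model:

```python
import numpy as np

def trim_silence(samples, frame_len=512, threshold=0.01):
    """Drop low-energy frames from the start and end of a signal.

    samples: 1-D array of audio samples, normalized to [-1, 1].
    threshold: RMS energy below which a frame counts as silence (assumed).
    """
    n_frames = len(samples) // frame_len
    rms = np.array([
        np.sqrt(np.mean(samples[i * frame_len:(i + 1) * frame_len] ** 2))
        for i in range(n_frames)
    ])
    voiced = np.where(rms > threshold)[0]
    if len(voiced) == 0:
        return samples[:0]                      # signal is all silence
    start = voiced[0] * frame_len               # first voiced frame
    end = (voiced[-1] + 1) * frame_len          # one past the last voiced frame
    return samples[start:end]
```

A real front end would typically also apply noise reduction; this sketch only removes leading and trailing low-energy regions.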

8  This function analyzes the pitch contour of the input sentence for salient cues.  In English, the pitch of the voice tends to fall at the end of a statement but tends to rise at the end of an echo question (Wells, 2006). For example:  Statement: Mary has a little lamb.  Echo question: Mary has a little lamb?

9  This step first fills the gaps within the pitch contour by interpolation, in order to create a continuous curve.  It then locates the nuclear tone in the sentence, that is, the last pitch fall or rise.
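A minimal Python sketch of this step, assuming the pitch track arrives as a time array and an F0 array with NaN for unvoiced gaps; the turning-point heuristic for finding the last movement is a simplification of whatever the model actually does:

```python
import numpy as np

def fill_pitch_gaps(times, f0):
    """Linearly interpolate unvoiced gaps (NaN) to get a continuous contour."""
    times = np.asarray(times, dtype=float)
    f0 = np.asarray(f0, dtype=float)
    voiced = ~np.isnan(f0)
    return np.interp(times, times[voiced], f0[voiced])

def final_movement(f0):
    """Classify the last pitch movement: +1 for a final rise, -1 for a fall,
    judged from the last turning point of the gap-filled contour."""
    diffs = np.sign(np.diff(f0))
    turns = np.where(diffs[:-1] != diffs[1:])[0]   # where the slope changes sign
    last_turn = turns[-1] + 1 if len(turns) else 0
    return 1 if f0[-1] > f0[last_turn] else -1
```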

10  In order to calculate how similar a new exemplar (i.e., sentence) is to the other exemplars in ‘memory’, we used the following perceptual dimensions:  the speed of the change in pitch at the nuclear tone,  the direction of that change, and  the timing of the nuclear tone relative to its position in the sentence.  This step extracts these similarity measures from the new exemplars, e.g., for the statement Ann teaches history.:  Category = S, exemplar = e07a21S, speed = 537, direction = -1, time = 0.6.
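The extracted record can be represented as a small data structure. A Python sketch whose field names mirror the example above; the Hz/s unit for speed is an assumption:

```python
from dataclasses import dataclass

@dataclass
class Exemplar:
    category: str    # 'S' (statement) or 'Q' (question)
    name: str        # recording identifier
    speed: float     # speed of the pitch change at the nuclear tone (assumed Hz/s)
    direction: int   # +1 = rise, -1 = fall
    time: float      # relative position of the nuclear tone in the sentence (0-1)

# The example record from the slide:
ann = Exemplar(category='S', name='e07a21S', speed=537, direction=-1, time=0.6)
```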

11  In calculating similarities, the model assigns different weights to the dimensions.  For example, the direction of the nuclear tone (whether it is a fall or a rise) may serve as a better cue to sentence type than the timing of the nuclear tone; if so, direction should be weighted more heavily than timing.  This step trains the model to learn the weight distribution over the dimensions that yields the best accuracy in categorizing new sentences.

12  This step tests how accurately the model can categorize statements and questions from a set of sentences disjoint from the training set.  It uses the weighted sum of the dimensions to estimate which category a new sentence belongs to (Johnson, 1997: 147).
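The two steps above (learning the weights, then categorizing by weighted similarity) can be sketched together in Python. The exponential similarity function follows Johnson (1997), but the city-block distance, the coarse weight grid, and the leave-one-out scoring are illustrative assumptions, not the model's actual procedure:

```python
import itertools
import math

DIMS = ('speed', 'direction', 'time')

def distance(a, b, w):
    """Weighted city-block distance over the three perceptual dimensions."""
    return sum(w[d] * abs(a[d] - b[d]) for d in DIMS)

def classify(item, memory, w):
    """Sum exp(-distance) similarity per category and pick the most activated."""
    activation = {}
    for ex in memory:
        activation[ex['category']] = (activation.get(ex['category'], 0.0)
                                      + math.exp(-distance(item, ex, w)))
    return max(activation, key=activation.get)

def best_weights(memory, step=0.25):
    """Grid-search weight distributions (summing to 1) by leave-one-out accuracy."""
    grid = [i * step for i in range(int(1 / step) + 1)]
    best, best_acc = None, -1.0
    for ws in itertools.product(grid, repeat=2):
        if sum(ws) > 1:
            continue
        w = {'speed': ws[0], 'direction': ws[1], 'time': 1 - sum(ws)}
        hits = sum(
            classify(ex, memory[:i] + memory[i + 1:], w) == ex['category']
            for i, ex in enumerate(memory)
        )
        acc = hits / len(memory)
        if acc > best_acc:
            best, best_acc = w, acc
    return best, best_acc
```

On toy data where direction alone separates the categories, the search puts most of the weight on direction, mirroring the result reported on slide 17.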

13  To evaluate how well the model generalizes, this step uses k-fold cross-validation (Refaeilzadeh et al., 2009), where k is the number of folds.  In a k-fold cross-validation, the training and test data are separate in a given run but cross over in successive runs, such that each exemplar is eventually tested once and only once.  For example, in a 3-fold cross-validation, each third of the data serves as the test set in one of three runs.
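The partitioning logic can be sketched as follows in Python; the round-robin assignment of exemplars to folds is an assumption (the actual split may have been random or stratified):

```python
def k_fold_splits(exemplars, k):
    """Yield (train, test) partitions; each exemplar is tested exactly once."""
    folds = [exemplars[i::k] for i in range(k)]   # round-robin fold assignment
    for i in range(k):
        test = folds[i]
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, test
```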

14  40 statements and 40 echo questions per speaker: 5 dialogues x 4 sentences x 2 repetitions.  Speakers:  one male and one female (18 years old), native speakers of Canadian English,  recruited from the online LING 201 (Introduction to Linguistics) Research Participation System at the University of Calgary.  They received 1% credit towards their LING 201 course grade for completing the one-hour recording session.

15  The stimuli were recorded in the sound booth in the Phonetics Lab at the University of Calgary.  Statements and questions were 5, 7, 9, 11, and 13 syllables long, with 4 pairs of statements and questions for each length.  E.g.:  Ann teaches history. / Ann teaches history?  Alice went horse riding with a friend. / Alice went horse riding with a friend?  Morris wants to visit the old mansion on Monday. / Morris wants to visit the old mansion on Monday?

16  For testing, we used a 10-fold cross-validation.  Fifteen sentences showed pitch halving or doubling, so these sentences and their corresponding statements or questions were removed from the training and test data, reducing the total for each sentence type to 65.  All 65 questions had a rising intonation, but 5 of the 65 statements also had a rising intonation.

17  With all of the weight on the direction dimension, the 10-fold cross-validation  correctly classified 95.69%–97.46% of the training exemplars, and  correctly categorized the test statements (100%) and questions (75%–100%).

18  How well the model categorizes the sentences depends on the intonation patterns of the sentences as well as on the generalized weights.  The model works well for this data set when 100% of the weight is on the direction dimension; accuracy declines when weight is shifted to another dimension.  Therefore, this model would need to be modified in order to deal with uptalk, a terminal rising intonation (Ladd, 2008), in statements.  It is also predicted to fail for languages that do not rely mainly on pitch direction, such as Mandarin.

19  Mandarin is a tone language that uses lexical tones to differentiate meaning in words.  Some researchers (e.g., Yuan, Shih, & Kochanski, 2002) claim that Mandarin raises the pitch of the overall sentence to signal an echo question.  Can exemplar theory account for the perception of intonation in Mandarin sentences?

20  Johnson, K. (1997). Speech perception without speaker normalization: An exemplar model. In K. Johnson & J. W. Mullennix (Eds.), Talker variability in speech processing (pp. 145-165). San Diego: Academic Press.  Ladd, D. R. (2008). Intonational phonology. Cambridge: Cambridge University Press.  Pierrehumbert, J. (2001). Exemplar dynamics: Word frequency, lenition, and contrast. In J. L. Bybee & P. J. Hopper (Eds.), Frequency and the emergence of linguistic structure (pp. 137-157). Philadelphia: John Benjamins.

21  Refaeilzadeh, P., Tang, L., & Liu, H. (2009). Cross-validation. In L. Liu & M. T. Özsu (Eds.), Encyclopedia of database systems (pp. 532-538). New York: Springer.  Wells, J. C. (2006). English intonation: An introduction. Cambridge: Cambridge University Press.  Yuan, J., Shih, C., & Kochanski, G. (2002). Comparison of declarative and interrogative intonation in Chinese. In B. Bel & I. Marlien (Eds.), Proceedings of the Speech Prosody 2002 Conference (pp. 711-714). Aix-en-Provence: Laboratoire Parole et Langage.

22  This research was funded by the University of Calgary Program for Undergraduate Research Experience (PURE), awarded to Una Chow in 2013.

23  Thank you!  Comments? Questions?

