Characterisation of individuals’ formant dynamics using polynomial equations Kirsty McDougall Department of Linguistics University of Cambridge


1 Characterisation of individuals’ formant dynamics using polynomial equations Kirsty McDougall Department of Linguistics University of Cambridge kem37@cam.ac.uk IAFPA 2006

2 Speaker characteristics and static features of speech Most previous research has focussed on static features (instantaneous, average): straightforward to measure; a natural progression from other research areas (delineation of different languages and language varieties)

3 Speaker characteristics and static features of speech Static features reflect certain anatomical dimensions of a speaker, e.g. formant frequencies ~ length and configuration of the vocal tract (VT). Instantaneous and average measures demonstrate speaker differences, but are unable to distinguish all members of a population → look to dynamic (time-varying) features

4 Dynamic features of speech More information than static features. Reflect movement of a person’s speech organs as well as dimensions: people move in individual ways for skilled motor activities (walking, running, … and speech)

5 Can view speech as achievement of a series of linguistic ‘targets’; speakers likely to exhibit similar properties at ‘targets’ (e.g. segment midpoints), but move between these in individual ways → examine formant frequency dynamics

6-8 Formant dynamics [Figure: /aɪ/ in ‘bike’ uttered by two male speakers of Australian English; Frequency (Hz) against Time (s); slide 7 adds a 10% marker]

9 Research Questions How do speakers’ formant dynamics reflect individual differences in the production of the sequence /aɪk/? How can this dynamic information be captured to characterise individual speakers?

10 Target words (/aɪk/): bike /baɪk/, hike /haɪk/, like /laɪk/, mike /maɪk/, spike /spaɪk/

11 Data set e.g. “I don’t want the scooter, I want the bike now.” “Later won’t do, I want the bike now.” 5 repetitions x 5 words (bike, hike, like, mike, spike) x 2 stress levels (nuclear, non-nuclear) x 2 speaking rates (normal, fast) = 100 tokens per subject

12 Subjects 5 adult male native speakers of Australian English (A, B, C, D, E), aged 22-28, Brisbane/Gold Coast, Queensland

13-16 Speaker A “bike” (normal-nuclear) [Figure: points 1 and 2 marked, with +10% steps from 10% to 90% between them; F1, F2 and F3 labelled]
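The +10% sampling scheme can be sketched in a few lines of Python. This is only an illustration, not the measurement procedure used in the study: the function name `sample_times` and the boundary times are hypothetical, and it assumes that points 1 and 2 mark the start and end of the measured interval.

```python
def sample_times(onset, offset, step=0.10):
    """Return the time points at +10% steps strictly inside [onset, offset]."""
    duration = offset - onset
    n_steps = round(1 / step)  # 10 steps for a 10% step size
    # k = 1..9 gives the 10%, 20%, ..., 90% points
    return [onset + duration * k / n_steps for k in range(1, n_steps)]


# Hypothetical boundaries (in seconds) for one token
times = sample_times(0.20, 0.40)
print([round(t, 3) for t in times])
```

Each formant would then be measured at these nine time points, giving one contour per formant per token.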

17 F1 normal-nuclear [Plot: Frequency (Hz) against +10% step of /aɪ/]

18 F2 normal-nuclear [Plot: Frequency (Hz) against +10% step of /aɪ/]

19 F3 normal-nuclear [Plot: Frequency (Hz) against +10% step of /aɪ/]

20 Discriminant Analysis Multivariate technique used to determine whether a set of predictors (formant frequency measurements) can be combined to predict group (speaker) membership (ref. Tabachnick and Fidell 1996)

21-25 Discriminant Analysis, fast-nuclear [Scatterplot: Function 1 against Function 2, Speakers A-E] Each datapoint represents 1 token. Each speaker’s tokens are represented with a different colour, e.g. Speaker E’s 25 tokens of /aɪk/. DA constructs discriminant functions which maximise differences between speakers (each function is a linear combination of the formant frequency predictors). Assess how well the predictors distinguish speakers by extent of clustering of tokens + classification percentage: 95%
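The classification step of a discriminant analysis can be sketched in pure Python. This is a simplified illustration under stated assumptions, not the analysis run for these slides: it uses the textbook linear discriminant rule (pooled within-speaker covariance, equal priors), it does not compute the canonical functions plotted above, and the helper names and the two-predictor tokens are invented for the example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_lda(tokens, speakers):
    """Return {speaker: (weights, bias)} defining one linear score per speaker."""
    dim = len(tokens[0])
    groups = {}
    for x, s in zip(tokens, speakers):
        groups.setdefault(s, []).append(x)
    means = {s: [sum(col) / len(xs) for col in zip(*xs)]
             for s, xs in groups.items()}
    # Pooled within-speaker covariance matrix
    pooled = [[0.0] * dim for _ in range(dim)]
    for s, xs in groups.items():
        for x in xs:
            d = [xi - mi for xi, mi in zip(x, means[s])]
            for i in range(dim):
                for j in range(dim):
                    pooled[i][j] += d[i] * d[j]
    dof = len(tokens) - len(groups)
    pooled = [[v / dof for v in row] for row in pooled]
    model = {}
    for s, mu in means.items():
        w = solve(pooled, mu)  # w = inverse(pooled) * mean
        bias = -0.5 * sum(wi * mi for wi, mi in zip(w, mu))
        model[s] = (w, bias)
    return model


def classify(model, x):
    """Assign x to the speaker with the highest linear discriminant score."""
    def score(s):
        w, bias = model[s]
        return sum(wi * xi for wi, xi in zip(w, x)) + bias
    return max(model, key=score)


# Invented two-predictor tokens (Hz) for three hypothetical speakers
tokens = [
    (500, 1500), (510, 1480), (495, 1520), (505, 1490),  # speaker A
    (600, 1300), (590, 1320), (610, 1310), (605, 1290),  # speaker B
    (550, 1700), (560, 1690), (545, 1710), (555, 1720),  # speaker C
]
speakers = ["A"] * 4 + ["B"] * 4 + ["C"] * 4

model = fit_lda(tokens, speakers)
predicted = [classify(model, x) for x in tokens]
rate = 100 * sum(p == s for p, s in zip(predicted, speakers)) / len(speakers)
print(f"{rate:.0f}% of tokens classified correctly")
```

Note that this sketch scores the same tokens it was fitted on, which gives an optimistic rate; whether the reported percentages were cross-validated is not stated in the slides.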

26 Discriminant Analysis 95% 88% 95% 89%

27 Discussion DA scatterplots and classification rates are promising. However, the method is not very efficient: it is essentially based on a series of instantaneous measurements, probably containing dependent information. Recall: individuals’ F1 contours of /aɪk/ …

28 F1 normal-nuclear [Plot: Frequency (Hz) against +10% step of /aɪ/]

29 A new approach … Differences in location in frequency range; differences in curvature (location of turning points, convex/concave, steep/shallow). Need to capture most defining aspects of the contours efficiently → linear regression to parameterise curves with polynomial equations

30-35 Linear regression Technique for determining the equation of a line or curve which approximates the relationship between a set of (x, y) points [Plot: y against x] $y = a_0 + a_1 x$, where $a_0$ is the y-intercept and $a_1$ is the gradient
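For the straight-line case, the least-squares intercept and gradient have a closed form. The sketch below uses the standard formulas; the function name `fit_line` and the data points are illustrative only.

```python
def fit_line(xs, ys):
    """Return (a0, a1) for the least-squares line y = a0 + a1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx              # gradient
    a0 = mean_y - a1 * mean_x   # y-intercept
    return a0, a1


# Illustrative (x, y) points scattered around a rising line
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, gradient = fit_line(xs, ys)
print(f"y = {intercept:.2f} + {gradient:.2f}x")
```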

36-39 Linear regression Can also be used for curvilinear relationships [Plot: y against x] Quadratic: $y = a_0 + a_1 x + a_2 x^2$, where $a_0$ is the y-intercept and $a_1$, $a_2$ determine the shape and direction of the curve

40-41 Polynomial Equations [Plots: y against x for each curve] Cubic: $y = a_0 + a_1 x + a_2 x^2 + a_3 x^3$ Quartic: $y = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4$ Quintic: $y = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + a_4 x^4 + a_5 x^5$

42 /aɪk/ data: fit F1, F2, F3 contours with polynomial equations; test the reliability of the polynomial coefficients in distinguishing speakers. Quadratic: $y = a_0 + a_1 t + a_2 t^2$ Cubic: $y = a_0 + a_1 t + a_2 t^2 + a_3 t^3$
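Fitting a quadratic or cubic to a formant contour is an ordinary least-squares problem, which can be sketched in pure Python via the normal equations. The names `solve` and `polyfit` are hypothetical helpers, the contour values are invented, and the time axis t = 1, ..., 9 (one value per +10% step) is an assumption, since the slides do not state how time was normalised.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def polyfit(ts, ys, degree):
    """Return least-squares coefficients [a0, a1, ..., a_degree]."""
    size = degree + 1
    # Normal equations (X^T X) a = X^T y, where X[i][j] = ts[i] ** j
    xtx = [[sum(t ** (i + j) for t in ts) for j in range(size)]
           for i in range(size)]
    xty = [sum(y * t ** i for t, y in zip(ts, ys)) for i in range(size)]
    return solve(xtx, xty)


# Invented F1 contour (Hz) at nine assumed time steps
ts = list(range(1, 10))
f1 = [465, 520, 590, 650, 690, 700, 670, 590, 470]

quadratic = polyfit(ts, f1, 2)  # 3 coefficients
cubic = polyfit(ts, f1, 3)      # 4 coefficients
print([round(c, 2) for c in quadratic])
print([round(c, 2) for c in cubic])
```

The normal equations are adequate for low degrees on a short time axis like this one; for higher degrees a numerically safer solver (for example QR-based least squares in a numerical library) would be the usual choice.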

43-44 F1 contour, “bike”, Speaker A (normal-nuclear token 1) [Plot: Frequency (Hz), y, against Normalised time, t; actual data points with fitted curves] Quadratic fit: $y = 420.68 + 79.26t - 5.92t^2$, R = 0.879 Cubic fit: $y = 478.85 - 46.07t + 35.62t^2 - 3.46t^3$, R = 0.978

45 F2 contour, “bike”, Speaker A (normal-nuclear token 1) [Plot: Frequency (Hz), y, against Normalised time, t; actual data points with fitted curves] Quadratic fit: $y = 876.01 - 53.24t + 22.46t^2$, R = 0.985 Cubic fit: $y = 825.49 + 55.64t - 13.63t^2 + 3.01t^3$, R = 0.991
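A goodness-of-fit value like R can be computed by evaluating a fitted polynomial and correlating it with the measured contour. The sketch below assumes that R is the correlation between observed and fitted values, which the slides do not confirm, and that normalised time runs t = 1, ..., 9. The coefficients are the F1 fits quoted on the slides, but the `measured` values are invented, so the printed figures will not reproduce the reported R values.

```python
import math


def poly_eval(coeffs, t):
    """Evaluate a0 + a1*t + a2*t**2 + ... using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result


def pearson_r(observed, fitted):
    """Correlation between observed values and fitted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_f = sum(fitted) / n
    cov = sum((o - mean_o) * (f - mean_f) for o, f in zip(observed, fitted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_f = sum((f - mean_f) ** 2 for f in fitted)
    return cov / math.sqrt(var_o * var_f)


# F1 fits for "bike", Speaker A, as quoted on the slides
fits = {
    "quadratic": [420.68, 79.26, -5.92],
    "cubic": [478.85, -46.07, 35.62, -3.46],
}

ts = range(1, 10)  # assumed normalised time axis
measured = [470, 495, 575, 640, 700, 745, 710, 625, 420]  # invented, in Hz

for name, coeffs in fits.items():
    fitted = [poly_eval(coeffs, t) for t in ts]
    print(name, round(pearson_r(measured, fitted), 3))
```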

46 DA on polynomial coefficients Quadratic: 3 formants x 3 coefficients = 9 predictors Cubic: 3 formants x 4 coefficients = 12 predictors Cubic + duration of /aɪ/: 12 + 1 = 13 predictors
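The predictor counts follow directly from how each token's coefficients are concatenated. In this sketch `predictor_vector` is a hypothetical helper; the F1 and F2 coefficients are the fits quoted on the earlier slides, while the F3 coefficients and the duration are invented placeholders.

```python
def predictor_vector(per_formant_coeffs, duration=None):
    """Concatenate each formant's coefficients, optionally adding duration."""
    vector = [c for coeffs in per_formant_coeffs for c in coeffs]
    if duration is not None:
        vector.append(duration)
    return vector


f1_quadratic = [420.68, 79.26, -5.92]
f2_quadratic = [876.01, -53.24, 22.46]
f3_quadratic = [2400.0, 15.0, -1.5]       # invented placeholder

f1_cubic = [478.85, -46.07, 35.62, -3.46]
f2_cubic = [825.49, 55.64, -13.63, 3.01]
f3_cubic = [2400.0, 10.0, -2.0, 0.1]      # invented placeholder

quadratic_vec = predictor_vector([f1_quadratic, f2_quadratic, f3_quadratic])
cubic_vec = predictor_vector([f1_cubic, f2_cubic, f3_cubic])
cubic_dur_vec = predictor_vector([f1_cubic, f2_cubic, f3_cubic],
                                 duration=0.21)  # invented duration (s)

print(len(quadratic_vec), len(cubic_vec), len(cubic_dur_vec))  # 9 12 13
```

One such vector per token, labelled with its speaker, is what the discriminant analysis would take as input.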

47-53 Comparison of Classification Rates [Bar chart: % Correct Classification; No. of predictors: (9) (12) (13) (20); percentages shown on slide 51: 96%, 92%, 89%, 90%]

54 Summary of findings Comparing polynomial-based tests & direct measurement-based tests: small reduction in classification accuracy in return for a much smaller no. of predictors. Future: aim to develop this approach to enable inclusion of additional information → parametrise other dynamic aspects of speech to capture a dense amount of speaker-specific info with a small no. of predictors

55 Conclusion Differences in formant dynamics reflect differences in articulatory strategies (& VT dimensions) among speakers, e.g. speaker-specificity of /aɪk/ formant dynamics: differences in shape and frequency for F1, F2 and F3, preserved across changes in speaking rate and stress

56 Conclusion Trialled new technique for characterising individuals’ formant contours using polynomial equations on /aɪk/ data. Able to capture almost the same amount of speaker-specific information with far fewer predictors → polynomial approach using formant dynamics should make an important contribution to speaker characterisation techniques in future

57 Characterisation of individuals’ formant dynamics using polynomial equations Kirsty McDougall Department of Linguistics University of Cambridge kem37@cam.ac.uk IAFPA 2006

