The Reliability of Formant Measurements in High Quality Audio Data: The Effect of Agreeing Measurement Procedures
Martin Duckworth, Kirsty McDougall, Gea de Jong, Linda Shockey

Introduction
Formant measurement is implicitly a legal requirement in UK speaker comparison cases
Measurements on analogue spectrograms had to be made by hand and eye
Measurements on digital spectrograms can be assisted by formant trackers; LPC is common

Introduction
How replicable are measurements by eye on digital spectrograms?
If LPC tracking is used, what can lead to variability?
−Software settings
−Point at which data is extracted

Study Aims
What is required in order to make measurements more replicable?
If software (but not method) is held constant and data is high quality, can different laboratories make the same F1-3 measurements?
If the method of analysis is the same, does this lead to statistically improved reliability between laboratories?

Aims continued
We are aiming to find a reliable means of obtaining formant values
We are examining reliability, not validity

Data
Read speech from the Cambridge DyViS database
Male speakers of Standard Southern British English, aged 18-25
40 speakers: Set 1 (20 speakers), Set 2 (20 speakers)

Data
6 monophthongs: /iː, æ, ɑː, ɔː, ʊ, uː/
6 repetitions per vowel per speaker
Elicited in hVd contexts in sentences:
−It’s a warning we’d better HEED today.
−It’s only one loaf, but it’s all Peter HAD today.
−We worked rather HARD today.
−We built up quite a HOARD today.
−He insisted on wearing a HOOD today.
−He hates contracting words, but he said a WHO’D today.

Measurements
Analysts from 3 labs: Cambridge, Plymouth, Reading
Task: to measure F1, F2, F3 for each vowel token using Praat
Set 1: using individual, but constrained, methods
Set 2: after a meeting at which a single method is agreed

Set 1 Methods
Measure the formants at a relatively early point in the vowel
Measure formants over no more than 5 glottal pulses
Use either:
−LPC tracking checked against the spectrogram, or
−hand/eye measures

Set 2 Method
Measure towards the start of the vowel
Measure in a relatively steady early part of the vowel
Measure around the vowel's maximum intensity
Use a single time slice

Set 2 Method (continued)
Use the LPC formant tracker adjusted for best visual fit
When values generated by Praat are judged by visual inspection to be incorrect, replace them with correct values from a time slice immediately preceding or following the slice being measured
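The slice-selection rule in the Set 2 method could be sketched roughly as follows. This is an illustration only, not the procedure the labs ran in Praat: the `Slice` structure, the `looks_correct` flag standing in for the analyst's visual judgement, and the `early_fraction` cut-off are all hypothetical, and trying the preceding slice before the following one is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Slice:
    """One analysis frame of a vowel (hypothetical structure, not a Praat object)."""
    time: float                           # seconds from vowel onset
    intensity: float                      # dB
    formants: Tuple[float, float, float]  # F1, F2, F3 in Hz from the LPC tracker
    looks_correct: bool = True            # stands in for the analyst's visual check


def pick_slice(slices: List[Slice], early_fraction: float = 0.5) -> Optional[Slice]:
    """Return the single slice to measure, or None if none is usable.

    The "early part" of the vowel is assumed to be the first
    `early_fraction` of its frames; the agreed method gives no number.
    """
    if not slices:
        return None
    n_early = max(1, int(len(slices) * early_fraction))
    # Frame with maximum intensity within the early part of the vowel.
    peak = max(range(n_early), key=lambda i: slices[i].intensity)
    # Use the peak slice if it looks right; otherwise fall back to the
    # immediately preceding slice, then the immediately following one.
    for i in (peak, peak - 1, peak + 1):
        if 0 <= i < len(slices) and slices[i].looks_correct:
            return slices[i]
    return None
```

Returning None when neither neighbour is usable leaves the decision with the analyst, in keeping with the method's reliance on visual inspection.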

Statistical Analysis
3 formants × 6 vowels × 2 datasets = 36 tests
Two-way ANOVA
−repeated measures on the factor Lab (3)
−between-groups factor Speaker (20)
If Lab significant at p < 0.05: pairwise comparisons with Sidak correction
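For reference, the Sidak correction for the three pairwise lab comparisons can be written out as below. This is a sketch of the standard formula only; the function names are illustrative, and statistics packages apply the correction internally.

```python
def sidak_alpha(alpha: float, m: int) -> float:
    """Per-comparison significance level that keeps the familywise
    error rate at `alpha` across `m` comparisons."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


def sidak_adjust(p: float, m: int) -> float:
    """Sidak-adjusted p-value for one of `m` comparisons."""
    return 1.0 - (1.0 - p) ** m


n_labs = 3
n_pairs = n_labs * (n_labs - 1) // 2  # Lab1 vs Lab2, Lab1 vs Lab3, Lab2 vs Lab3
n_tests = 3 * 6 * 2                   # formants x vowels x datasets

per_comparison_alpha = sidak_alpha(0.05, n_pairs)  # about 0.017
```

With three comparisons, a raw pairwise p-value has to fall below roughly 0.017 to count as significant at the 0.05 level.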

Results: HAD, F1
[Figure: F1 of HAD for Lab1, Lab2 and Lab3, Set 1 and Set 2; annotations: 0.001, 0.000, NS]
Set 1: Lab significant
Set 2: Lab significant, but pairwise comparisons NS

Results: HAD, F2
[Figure: F2 of HAD for Lab1, Lab2 and Lab3, Set 1 and Set 2; annotations: NS, NS]
Lab: not significant

Results: HAD, F3
[Figure: F3 of HAD for Lab1, Lab2 and Lab3, Set 1 and Set 2; annotations: NS, 0.000, NS]
Set 1: Lab significant
Set 2: Lab not significant

Summary - HAD
Columns: Set 1 (F1, F2, F3), Set 2 (F1, F2, F3)
Main effect
−Lab: sig NS sig NS
Pairwise comparisons
−1 vs 2: sig NS
−1 vs 3: sig NS sig NS
−2 vs 3: sig NS sig NS
Improvement
Set 2: good news

Effect of Lab - 6 vowels: Set 1
Columns: F1, F2, F3
−heed: sig NS sig
−had: sig NS sig
−hard: sig
−hoard: sig
−who’d: sig NS
−hood: sig

Effect of Lab - 6 vowels: Set 1 and Set 2
Columns: Set 1 (F1, F2, F3), Set 2 (F1, F2, F3)
−heed: sig NS sig NS sig
−had: sig NS sig NS
−hard: sig NS sig
−hoard: sig NS
−who’d: sig NS sig
−hood: sig NS sig NS

Influence of Speaker
Interaction Lab x Speaker significant (p < 0.05) for F1-F3 of all 6 vowels for both Set 1 and Set 2
→ certain speakers lead to measurement differences among labs
for example…

F3 of HARD (Set 2): means by speaker
[Figure: mean F3 of HARD by speaker for each lab]
Agreement across labs in most cases, but certain individuals lead to measurement differences among labs
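One simple way to pick out the speakers on whom the labs disagree is to compare the per-speaker means across labs and flag large spreads. The sketch below is only illustrative: the function names, the data layout and the 100 Hz threshold in the usage note are assumptions, not part of the study's analysis.

```python
from typing import Dict, List


def lab_spread(means_by_lab: Dict[str, float]) -> float:
    """Range (max minus min) of one speaker's mean formant value across labs, in Hz."""
    values = list(means_by_lab.values())
    return max(values) - min(values)


def flag_speakers(
    means: Dict[str, Dict[str, float]], threshold_hz: float
) -> List[str]:
    """Return the speakers whose lab means differ by more than `threshold_hz`.

    `means` maps speaker ID -> {lab name -> mean formant value in Hz}.
    The threshold is an arbitrary choice for illustration.
    """
    return sorted(
        speaker
        for speaker, by_lab in means.items()
        if lab_spread(by_lab) > threshold_hz
    )
```

Calling flag_speakers(means, 100.0) on a table of per-speaker lab means would list the speakers worth re-examining by eye; a range is crude but easy to interpret.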

Difficult cases: subject 42 F3
Subject 42 HARD6: F3 = 3325 Hz
Subject 42 HARD4: F3 = 2219 Hz
Subject 42 HARD2: F3 = 2579 Hz

Difficult cases: subject 43 F3
Subject 43 HARD2: F3?
Subject 43 HARD1: F3?
[Spectrograms: visual inspection vs formant tracker]

The effect of intraspeaker variability, possibly voice quality
This can affect:
−The visibility of formants
−The functioning of the LPC tracker
for example…

The effect of intraspeaker variability
Subject 37: HAD1 F1 = ??
Subject 37: HAD6 F1
..had today.

Discussion: Laboratory Effects
Do different laboratories produce different formant values? YES
Does replicating the measurement method reduce these differences? YES
Could these be reduced further? YES

Other sources of variability
Settings (e.g. number of poles; number of formants in Praat)
The exact point in the vowel at which the measure is taken
The ‘readability’ of the spectrogram, which can be affected by speaker characteristics

Conclusion
Developing standard ways of collecting formant values could assist comparisons between experts in casework
If records are kept of time points, software and settings, then the measurement process can be replicated
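As an illustration of the record keeping suggested in the conclusion, a measurement could be stored together with its time point, software and settings. The field names below are hypothetical, as are any setting names stored in `settings`; this is a minimal sketch, not a proposed standard.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class MeasurementRecord:
    """One formant measurement plus what is needed to repeat it.

    All field names are illustrative assumptions.
    """
    speaker: str
    token: str                  # e.g. "HAD1"
    time_s: float               # time point of the measured slice
    software: str               # program name and version
    settings: Dict[str, float]  # analysis settings in force at measurement time
    f1_hz: float
    f2_hz: float
    f3_hz: float


def to_json(record: MeasurementRecord) -> str:
    """Serialise a record to a JSON string with a stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


def from_json(text: str) -> MeasurementRecord:
    """Rebuild a record from the JSON produced by `to_json`."""
    return MeasurementRecord(**json.loads(text))
```

A second analyst who receives such a record can return to the same time point with the same settings and check whether the same values are obtained.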

Acknowledgements
IAFPA Research Grant for travel expenses
Economic and Social Research Council UK for funding the DyViS Project ‘Dynamic Variability in Speech: A Forensic Phonetic Study of British English’ [RES-000-23-1248]
Other members of the DyViS project: Francis Nolan and Toby Hudson
