1 Francesco Borchi, Monica Carfagni, Matteo Nunziati. On TFSR (semi)automatic systems supportability: novel instruments for analysis and compensation
2 Outline. Main goal; TFSR systems; LogR estimation; Common test procedures for TFSR systems; System behaviour classification; Supportability evaluation tools; Score compensation tools; Quality assessment logics; Conclusion
3 Main goal. Our goal is to propose a general-purpose set of tools for system compensation and quality assessment. Specific goals: to build a generic framework for system analysis; to develop a novel generic tool for system compensation; to assess the system quality level on the basis of the amount of compensation the system itself requires.
4 TFSR Systems. [Diagram: voice sample 1 and voice sample 2 enter the TFSR system, which outputs LogR.] We define a TFSR system as a black box which receives two or more recordings as inputs and produces one or more scores (LogR) as outputs.
5 LogR estimation 1/2. LogR = log10[P(E | H0) / P(E | H1)]. The log-likelihood ratio indicates which hypothesis is better supported by the evidence E. Hypothesis 0 (H0): the two samples belong to the same speaker. Hypothesis 1 (H1): the two samples belong to different speakers. If LogR>0, support goes to H0. If LogR<0, support goes to H1. If LogR=0, no support is provided to either hypothesis.
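The definition above can be illustrated with a small worked example. This is only a sketch: the function names `log_r` and `supported_hypothesis` and the two probability values are illustrative assumptions, not part of the slides.

```python
import math


def log_r(p_e_given_h0: float, p_e_given_h1: float) -> float:
    """Return LogR = log10[P(E | H0) / P(E | H1)]."""
    if p_e_given_h0 <= 0 or p_e_given_h1 <= 0:
        raise ValueError("both probabilities must be positive")
    return math.log10(p_e_given_h0 / p_e_given_h1)


def supported_hypothesis(score: float) -> str:
    """Map the sign of LogR to the hypothesis it supports."""
    if score > 0:
        return "H0"  # same speaker
    if score < 0:
        return "H1"  # different speakers
    return "none"


# Illustrative values: the evidence is 100 times more likely under H0.
score = log_r(0.8, 0.008)
print(round(score, 6), supported_hypothesis(score))
```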
6 LogR estimation 2/2. The real LogR value is unknown: we can only estimate it through approximations, so our systems are error-prone. The goodness of a system depends on a number of factors: the way the voice samples were collected; the kind of parameters employed in the recognition; the algorithms used for parameter extraction; the mathematical model used to estimate LogR. Experimentation is the best way to assess system behaviour.
7 Common test procedures for TFSR systems 1/2. The system is tested against a set of recordings of known origin: Speaker 1 … Speaker N, with 2 or more recordings per speaker.
8 Common test procedures for TFSR systems 2/2. Recordings are mixed up and grouped in pairs: same-speaker pairs (SS) and different-speaker pairs (DS). SS pairs test system behaviour when H0 is true: is LogR>0? DS pairs test system behaviour when H1 is true: is LogR<0?
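The pairing step can be sketched as follows. The slides only say that recordings are "mixed up and grouped in pairs"; forming every unordered pair, as done here, is an assumption, and the speaker labels and recording names are hypothetical.

```python
from itertools import combinations


def build_pairs(recordings):
    """Split all unordered recording pairs into same-speaker (SS)
    and different-speaker (DS) lists.

    recordings: dict mapping a speaker label to its list of recordings.
    """
    labelled = [(spk, rec) for spk, recs in recordings.items() for rec in recs]
    ss_pairs, ds_pairs = [], []
    for (spk_a, rec_a), (spk_b, rec_b) in combinations(labelled, 2):
        target = ss_pairs if spk_a == spk_b else ds_pairs
        target.append((rec_a, rec_b))
    return ss_pairs, ds_pairs


# Hypothetical dataset: three speakers, two recordings each.
recordings = {
    "speaker1": ["s1_a", "s1_b"],
    "speaker2": ["s2_a", "s2_b"],
    "speaker3": ["s3_a", "s3_b"],
}
ss_pairs, ds_pairs = build_pairs(recordings)
print(len(ss_pairs), "SS pairs,", len(ds_pairs), "DS pairs")
```

Each SS pair is then scored to check whether LogR>0, and each DS pair to check whether LogR<0.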
9 System behaviour classification 1/3. Tippett plot: a common method to show system behaviour. [Figure: Tippett plot with % SS and % DS curves, H0 and H1 regions, and the false-negative and false-positive areas marked.]
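As a rough illustration, the two curves of a Tippett plot can be computed as the fraction of SS and DS scores that exceed each threshold. The score lists below are made-up values, and the helper name `tippett_curve` is an assumption; plotting is left out.

```python
def tippett_curve(scores, thresholds):
    """Fraction of scores strictly greater than each threshold."""
    n = len(scores)
    return [sum(s > t for s in scores) / n for t in thresholds]


# Made-up LogR scores for same-speaker and different-speaker pairs.
ss_scores = [2.1, 1.4, 0.7, 0.3, -0.4]
ds_scores = [-2.5, -1.8, -0.9, -0.2, 0.6]
thresholds = [-3, -2, -1, 0, 1, 2, 3]

ss_curve = tippett_curve(ss_scores, thresholds)
ds_curve = tippett_curve(ds_scores, thresholds)

# At threshold 0: SS scores not above 0 are false negatives,
# DS scores above 0 are false positives.
idx0 = thresholds.index(0)
false_negative_rate = 1 - ss_curve[idx0]
false_positive_rate = ds_curve[idx0]
print(ss_curve, ds_curve)
```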
10 System behaviour classification 2/3. [Figure: Tippett plot with red boxes marking score ranges that contain only false scores, i.e. wrong support.] Goal: provide a solution to eliminate the “false score only” areas (red boxes).
11 System behaviour classification 3/3. [Figure: Tippett plots labelled "isoperforming" and "hypoperforming".] Goal: provide a solution to reduce the amount of false scores.
12 Supportability evaluation tools 1/3. A quantitative evaluation of false scores has been proposed by P. Rose et al. (2003): LRtest = P(LogR>0 | H0) / P(LogR>0 | H1), where the numerator is the percentage of true positives and the denominator is the percentage of false positives. Advantage: interpretable via the Evett table. Limitations: no information is provided about false negatives; no information is provided about the distribution of false scores. Do they affect a narrow range of scores? Do they widely perturb the system response?
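A minimal sketch of an empirical LRtest estimate, assuming the two probabilities are approximated by the observed fractions of SS and DS scores above zero. The score lists are made-up, and returning infinity when no false positives are observed is a choice made here, not something the slides specify.

```python
import math


def lr_test(ss_scores, ds_scores):
    """Empirical LRtest = P(LogR>0 | H0) / P(LogR>0 | H1)."""
    true_positive_rate = sum(s > 0 for s in ss_scores) / len(ss_scores)
    false_positive_rate = sum(s > 0 for s in ds_scores) / len(ds_scores)
    if false_positive_rate == 0:
        return math.inf  # assumption: no observed false positives
    return true_positive_rate / false_positive_rate


# Made-up LogR scores.
ss_scores = [2.1, 1.4, 0.7, 0.3, -0.4]
ds_scores = [-2.5, -1.8, -0.9, -0.2, 0.6]
print(lr_test(ss_scores, ds_scores))
```

Note that the single false negative in `ss_scores` (the score -0.4) lowers the numerator but is not reported on its own, which is the limitation listed above.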
13 Supportability evaluation tools 2/3. We propose to generalize the LRtest index using a new tool, the “Supportability of System” function (SoS): SoS(x) = P(LogR>x | H0) / P(LogR>x | H1) if x>0; SoS(x) = [1 - P(LogR>x | H1)] / [1 - P(LogR>x | H0)] if x<0. Properties: interpretable via the Evett table; defined for both false positives and false negatives; univocally detects the amount of false scores for each LogR; provides the accuracy of each score. We know how much we can rely on our system, case by case!
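The two branches of SoS can be sketched with empirical fractions in place of the probabilities. This is an illustration under assumptions: the scores are made-up, the helper names are hypothetical, the slides do not define SoS at x=0 (so the sketch raises an error there), and returning infinity for a zero denominator is a choice made here.

```python
import math


def exceed_fraction(scores, x):
    """Empirical estimate of P(LogR > x) over a list of scores."""
    return sum(s > x for s in scores) / len(scores)


def sos(x, ss_scores, ds_scores):
    """Empirical Supportability of System at score x."""
    p_h0 = exceed_fraction(ss_scores, x)  # P(LogR>x | H0)
    p_h1 = exceed_fraction(ds_scores, x)  # P(LogR>x | H1)
    if x > 0:
        numerator, denominator = p_h0, p_h1
    elif x < 0:
        numerator, denominator = 1 - p_h1, 1 - p_h0
    else:
        raise ValueError("SoS is not defined at x = 0 in the slides")
    if denominator == 0:
        # Assumption: no false scores observed beyond x.
        return math.inf if numerator > 0 else math.nan
    return numerator / denominator


# Made-up LogR scores.
ss_scores = [2.1, 1.4, 0.7, 0.3, -0.4]
ds_scores = [-2.5, -1.8, -0.9, -0.2, 0.6]
for x in (-0.5, -0.3, 0.5, 1.0):
    print(x, sos(x, ss_scores, ds_scores))
```

For x just above zero the positive branch reduces to the empirical LRtest, which is the sense in which SoS generalizes that index.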
17 Score compensation tools 3/3. Compress all scores by a value defined by the SoS function. [Figure: original vs. compressed curves, showing a reduced amount of false scores and decreased values for true scores.] Compression reduces the amount of false scores at the cost of a lower discriminative power.
18 Quality assessment logics 1/3. Score compensation reduces the system’s discriminative power. Score compensation is required to prevent unbalanced responses. Compensation increases as SoS decreases. Compensation is intrinsic to the system. A good system must have a strong SoS for each LogR value.
19 Quality assessment logics 2/3. DMTI procedure. Step 1: test the system against a dataset (LogR). Step 2: calculate supportability (SoS). Step 3: calculate compensated scores (new LogR). Step 4: calculate the percentage P of new LogR values which have a “strong” SoS score (fixed by our standards). Step 5: evaluate the Degree of Supportability (DoS): DoS = atanh(2P-1).
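Steps 4 and 5 can be sketched as below, assuming P is expressed as a fraction between 0 and 1 so that 2P-1 falls in the domain of atanh. The threshold that makes an SoS value "strong" is not given in the slides; `strong_threshold` and the sample SoS values are hypothetical, and mapping P=1 and P=0 to plus and minus infinity is a choice made here.

```python
import math


def fraction_strong(sos_values, strong_threshold):
    """Step 4: fraction P of scores whose SoS reaches the threshold."""
    return sum(v >= strong_threshold for v in sos_values) / len(sos_values)


def degree_of_supportability(p):
    """Step 5: DoS = atanh(2P - 1), with P as a fraction in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("P must be a fraction between 0 and 1")
    if p == 1.0:
        return math.inf   # assumption: limit value as P approaches 1
    if p == 0.0:
        return -math.inf  # assumption: limit value as P approaches 0
    return math.atanh(2 * p - 1)


# Hypothetical SoS values for four compensated scores.
sos_values = [150.0, 20.0, 300.0, 5.0]
p = fraction_strong(sos_values, strong_threshold=100.0)
print(p, degree_of_supportability(p), degree_of_supportability(0.9))
```

With these definitions, DoS is zero when half of the scores are strong, positive when more than half are, and negative otherwise.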
20 Quality assessment logics 3/3. Regardless of the specific procedure, our DoS score is equivalent to a LogR score!
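One possible reading of this equivalence, offered here as an interpretation rather than as the authors' stated argument, follows from the standard identity for the inverse hyperbolic tangent. Substituting z = 2P-1 into DoS = atanh(2P-1) gives a log-odds expression in P:

```latex
\[
\operatorname{atanh}(z) = \tfrac{1}{2}\ln\frac{1+z}{1-z},
\qquad z = 2P - 1
\;\Longrightarrow\;
\mathrm{DoS} = \tfrac{1}{2}\ln\frac{2P}{2 - 2P}
             = \tfrac{1}{2}\ln\frac{P}{1-P}
             = \frac{\ln 10}{2}\,\log_{10}\frac{P}{1-P}.
\]
```

Under this reading, DoS is proportional to the base-10 logarithm of the ratio between the fraction of strongly supported scores and the fraction of the remaining ones, so it has the same sign convention as LogR: positive when P>1/2, zero when P=1/2, negative when P<1/2.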
21 Conclusion. A general-purpose tool has been developed to score system supportability. An additional mathematical tool has been developed to compensate unbalanced systems. The tools are system-independent and theoretically motivated rather than empirically built. The tools are useful to reduce both false positives and false negatives. False score reduction produces a decrease in discriminative power. This decrease is intrinsic to the system response and can be used unambiguously for system quality assessment. The proposed procedure for system quality assessment (degree of supportability) uses the well-known Evett scale to score system supportability.