Francesco Borchi, Monica Carfagni, Matteo Nunziati

Presentation transcript:

On TFSR (semi)automatic systems supportability: novel instruments for analysis and compensation. Francesco Borchi, Monica Carfagni, Matteo Nunziati

Outline: Main goal; TFSR systems; LogR estimation; Common test procedures for TFSR systems; System behaviour classification; Supportability evaluation tools; Score compensation tools; Quality assessment logics; Conclusion.

Main goal: Our goal is to propose a general-purpose set of tools for system compensation and quality assessment. Specific goals: to build a generic framework for system analysis; to develop a novel generic tool for system compensation; to assess the system quality level on the basis of the amount of compensation required by the system itself.

TFSR Systems: We define a TFSR system as a black box which receives two or more recordings as inputs and produces one or more scores (LogR) as outputs. [Diagram: Voice sample 1 and Voice sample 2 enter the TFSR system, which outputs LogR.]

LogR estimation 1/2: LogR = log10[P(E | H0) / P(E | H1)]. The log-likelihood ratio identifies the most supportable hypothesis. Hypothesis 0 (H0): the two samples belong to the same speaker. Hypothesis 1 (H1): the two samples belong to different speakers. If LogR > 0, support goes to H0. If LogR < 0, support goes to H1. If LogR = 0, no support is provided.
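
The decision rule on this slide can be sketched in a few lines of Python. This is only an illustration: the function names `log_r` and `supported_hypothesis` are hypothetical, and the two likelihoods are assumed to be already available as positive numbers (the slides do not say how a given system estimates them).

```python
import math


def log_r(p_e_given_h0: float, p_e_given_h1: float) -> float:
    """Base-10 log-likelihood ratio: LogR = log10[P(E|H0) / P(E|H1)]."""
    if p_e_given_h0 <= 0 or p_e_given_h1 <= 0:
        # Assumption of this sketch: both likelihoods must be strictly positive.
        raise ValueError("both likelihoods must be positive")
    return math.log10(p_e_given_h0 / p_e_given_h1)


def supported_hypothesis(logr: float) -> str:
    """Return which hypothesis the score supports, following the slide's rule."""
    if logr > 0:
        return "H0"  # same speaker
    if logr < 0:
        return "H1"  # different speakers
    return "none"    # LogR = 0: no support is provided
```

For instance, `supported_hypothesis(log_r(0.1, 0.001))` reports support for H0, because the evidence is about 100 times more likely under the same-speaker hypothesis.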

LogR estimation 2/2: The real LogR value is unknown. We can estimate it using some approximations. Our systems are error-prone. The quality of a system depends on a number of factors: the way the voice samples were collected; the kind of parameters employed in the recognition; the algorithms used for parameter extraction; the mathematical model used to estimate LogR. Experimentation is the best way to assess system behaviour.

Common test procedures for TFSR systems 1/2: The system is tested against a set of recordings of known origin, with two or more recordings for each speaker (Speaker 1 … Speaker N).

Common test procedures for TFSR systems 2/2: Recordings are mixed up and grouped in pairs: same-speaker pairs (SS) and different-speaker pairs (DS). SS pairs test system behaviour when H0 is true: is LogR > 0? DS pairs test system behaviour when H1 is true: is LogR < 0?
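
The pairing step can be sketched as follows. The function name `build_pairs` and the dictionary layout (speaker label mapped to a list of recording identifiers) are assumptions made for this illustration; the slides do not prescribe a data format, and a real test may sample pairs rather than enumerate all of them.

```python
from itertools import combinations


def build_pairs(recordings_by_speaker):
    """Split all unordered recording pairs into same-speaker (SS) and
    different-speaker (DS) pairs.

    recordings_by_speaker: dict mapping a speaker label to a list of
    recording identifiers (hypothetical layout chosen for this sketch).
    """
    # Flatten to (speaker, recording) tuples so each recording keeps its origin.
    labelled = [
        (speaker, recording)
        for speaker, recordings in recordings_by_speaker.items()
        for recording in recordings
    ]
    ss_pairs, ds_pairs = [], []
    for (spk_a, rec_a), (spk_b, rec_b) in combinations(labelled, 2):
        if spk_a == spk_b:
            ss_pairs.append((rec_a, rec_b))  # H0 is true for this pair
        else:
            ds_pairs.append((rec_a, rec_b))  # H1 is true for this pair
    return ss_pairs, ds_pairs
```

With two recordings of one speaker and three of another, this yields 4 SS pairs and 6 DS pairs.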

System behaviour classification 1/3: Tippett plot: a common method to show system behaviour. [Figure: Tippett plot with % SS and % DS curves, the H0 and H1 regions, and the false negatives and false positives marked.]
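
The curves of a Tippett plot are commonly built as the fraction of scores exceeding each threshold, computed separately for SS and DS pairs. The sketch below follows that convention; the function names and the strict "greater than" comparison are assumptions of this illustration, not details taken from the slides.

```python
def exceedance(scores, x):
    """Fraction of scores strictly greater than x: an estimate of P(LogR > x)."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(s > x for s in scores) / len(scores)


def tippett_curves(ss_scores, ds_scores, thresholds):
    """Return rows (x, P(LogR>x | H0), P(LogR>x | H1)), one per threshold."""
    return [
        (x, exceedance(ss_scores, x), exceedance(ds_scores, x))
        for x in thresholds
    ]


def error_rates(ss_scores, ds_scores):
    """False negatives: SS pairs with LogR < 0. False positives: DS pairs with LogR > 0."""
    false_negatives = sum(s < 0 for s in ss_scores) / len(ss_scores)
    false_positives = sum(s > 0 for s in ds_scores) / len(ds_scores)
    return false_negatives, false_positives
```

Plotting the second and third columns of `tippett_curves` against the first gives the two curves; `error_rates` reads off the two error fractions at LogR = 0.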

System behaviour classification 2/3: Some score ranges contain only false scores, that is, wrong support (red boxes in the plot). Goal: provide a solution to eliminate these “false score only” areas.

System behaviour classification 3/3: [Figure: plot panels labelled “isoperforming” and “hypoperforming”.] Goal: provide a solution to reduce the amount of false scores.

Supportability evaluation tools 1/3: A quantitative evaluation of false scores has been proposed by P. Rose et al. (2003): LRtest = P(LogR>0 | H0) / P(LogR>0 | H1), where the numerator is the percentage of true positives and the denominator is the percentage of false positives. The index is interpretable via the Evett table. Limitations: no information is provided about false negatives; no information is provided about the distribution of false scores. Do they affect a narrow range of scores? Do they widely perturb the system response?
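
As a minimal sketch, the LRtest index can be estimated from test scores by replacing the two probabilities with observed fractions. The function name `lr_test` and the choice to return infinity when no false positives are observed are assumptions of this illustration.

```python
def lr_test(ss_scores, ds_scores):
    """Estimate LRtest = P(LogR>0 | H0) / P(LogR>0 | H1) from test scores."""
    if not ss_scores or not ds_scores:
        raise ValueError("both score lists must be non-empty")
    true_positives = sum(s > 0 for s in ss_scores) / len(ss_scores)
    false_positives = sum(s > 0 for s in ds_scores) / len(ds_scores)
    if false_positives == 0:
        # Assumption of this sketch: no observed false positives
        # is reported as an unbounded ratio.
        return float("inf")
    return true_positives / false_positives
```

Note that this single number says nothing about SS pairs scored below zero, which is the limitation the slide points out.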

Supportability evaluation tools 2/3: We propose to generalize the LRtest index using a new tool, the “Supportability of System” function (SoS): SoS(x) = P(LogR>x | H0) / P(LogR>x | H1) if x > 0; SoS(x) = [1 - P(LogR>x | H1)] / [1 - P(LogR>x | H0)] if x < 0. SoS is interpretable via the Evett table; it is defined for both false positives and false negatives; it unambiguously gives the amount of false scores for each LogR; it provides the accuracy of each score. We know how much we can rely on our system, case by case!
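
The two branches of the SoS definition can be sketched as below, again estimating each probability as an observed fraction of test scores. The function names are hypothetical. Two choices are assumptions of this sketch rather than part of the slides: raising an error at x = 0, where the slide gives no definition, and returning infinity (or NaN for 0/0) when the denominator is zero.

```python
def exceedance(scores, x):
    """Fraction of scores strictly greater than x: an estimate of P(LogR > x)."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(s > x for s in scores) / len(scores)


def sos(x, ss_scores, ds_scores):
    """Supportability of System at score x, following the slide's two branches."""
    p_h0 = exceedance(ss_scores, x)  # P(LogR > x | H0)
    p_h1 = exceedance(ds_scores, x)  # P(LogR > x | H1)
    if x > 0:
        numerator, denominator = p_h0, p_h1
    elif x < 0:
        numerator, denominator = 1 - p_h1, 1 - p_h0
    else:
        # Assumption: the slide does not define SoS at x = 0.
        raise ValueError("SoS is not defined for x = 0")
    if denominator == 0:
        # Assumption: no false scores observed beyond x.
        return float("inf") if numerator > 0 else float("nan")
    return numerator / denominator
```

Evaluating `sos` over a grid of positive and negative thresholds gives a supportability profile for the whole score range instead of the single value provided by LRtest.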

Supportability evaluation tools 3/3: Example read from the plot: at LogR = -13, 90% of the scores are true and 20% are false, so SoS = 90/20 = 4.5.

Score compensation tools 1/3: Preliminary operation: eliminate the “false score only” areas by increasing or reducing all scores (translation). [Figure: original and translated curves, with labels X and DX.]

Score compensation tools 2/3: New LogR = LogR * tanh(log10(SoS)). [Figure: curves for LogR = 1, 2, 3 and 4.]

Score compensation tools 3/3: Compress all scores by an amount defined by the SoS function. [Figure: original and compressed curves, showing a reduced amount of false scores and decreased values for true scores.] The amount of false scores is reduced at the cost of a lower discriminative power.
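
The compression formula New LogR = LogR * tanh(log10(SoS)) is short enough to sketch directly. The function name `compensate` is hypothetical, and the requirement that SoS be positive is an assumption added so that the logarithm is defined.

```python
import math


def compensate(logr: float, sos_value: float) -> float:
    """Compressed score: New LogR = LogR * tanh(log10(SoS))."""
    if sos_value <= 0:
        # Assumption of this sketch: log10 needs a strictly positive SoS.
        raise ValueError("SoS must be positive")
    # tanh(log10(SoS)) tends to 1 for large SoS (score almost unchanged)
    # and equals 0 when SoS = 1 (score pulled to zero).
    return logr * math.tanh(math.log10(sos_value))
```

With SoS = 10 the factor is tanh(1), roughly 0.76, so a score keeps about three quarters of its magnitude; with SoS = 1 the score collapses to 0, meaning no support is reported where true and false scores are equally frequent.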

Quality assessment logics 1/3: Score compensation reduces the system’s discriminative power. Score compensation is required to prevent unbalanced responses. Compensation increases for decreasing values of SoS. Compensation is intrinsic to the system. A good system must have a strong SoS for each LogR value.

Quality assessment logics 2/3: DMTI procedure. Step 1: test the system against a dataset (LogR). Step 2: calculate supportability (SoS). Step 3: calculate compensated scores (New LogR). Step 4: calculate the percentage P of new LogR values which have a “strong” SoS score (fixed by our standards). Step 5: evaluate the Degree of Supportability (DoS): DoS = atanh(2P - 1).
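
Steps 4 and 5 can be sketched as follows, assuming the SoS value associated with each compensated score is already available. The names `degree_of_supportability`, `strong_threshold` and `eps` are hypothetical: the slides say only that the “strong” level is fixed by the authors' standards, and the clamping of P away from 0 and 1 is an assumption added here because atanh diverges at ±1.

```python
import math


def degree_of_supportability(sos_values, strong_threshold, eps=1e-6):
    """DoS = atanh(2P - 1), where P is the fraction of scores with a strong SoS.

    sos_values: SoS value associated with each compensated score.
    strong_threshold: hypothetical cut-off for a "strong" SoS.
    eps: assumption of this sketch, keeps P inside (0, 1) so atanh stays finite.
    """
    if not sos_values:
        raise ValueError("sos_values must not be empty")
    p = sum(v >= strong_threshold for v in sos_values) / len(sos_values)
    p = min(max(p, eps), 1 - eps)
    return math.atanh(2 * p - 1)
```

When half of the scores have a strong SoS, P = 0.5 and DoS = 0; DoS becomes positive as P grows above one half and negative below it.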

Quality assessment logics 3/3: Regardless of the specific procedure, our DoS score is equivalent to a LogR score!

Conclusion: A general-purpose tool has been developed to score system supportability. An additional mathematical tool has been developed to compensate unbalanced systems. The tools are system-independent and theoretically motivated rather than empirically built. The tools are useful to reduce both false positives and false negatives. False score reduction produces a decrease in discriminative power. This decrease is intrinsic to the system response and can be used unambiguously for system quality assessment. The proposed procedure for system quality assessment (degree of supportability) uses the well-known Evett scale to score the system supportability.

Thank You for your attention… Questions?