Universal and composite hypothesis testing via Mismatched Divergence
Jayakrishnan Unnikrishnan, LCAV, EPFL
Collaborators: Dayu Huang, Sean Meyn, Venu Veeravalli (University of Illinois); Amit Surana (UTRC)
IPG seminar, 2 March 2011

Outline
Universal Hypothesis Testing
– Hoeffding test
– Problems with large alphabets
Mismatched test
– Dimensionality reduction
– Improved performance
Extensions
– Composite null hypotheses
– Model-fitting with outliers
– Rate-distortion test
– Source coding with training
Conclusions

Universal Hypothesis Testing
Given a sequence of i.i.d. observations $X_1, X_2, \ldots, X_n$ from an unknown distribution $\mu$, test the hypothesis $H_0: \mu = \pi$ against the alternative $H_1: \mu \neq \pi$
– Focus on finite alphabets, i.e., PMFs
Applications: anomaly detection, spam filtering, etc.

Sufficient statistic
Empirical distribution: $\Gamma^n(z) = \frac{1}{n}\, n(z)$
– where $n(z)$ denotes the number of times letter $z$ appears in $X_1, \ldots, X_n$
– $\Gamma^n$ is a random vector on the probability simplex
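As an illustration (not from the slides), a minimal numpy sketch of the empirical distribution; the uniform null distribution, alphabet size, and sample size below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
A, n = 20, 40                      # alphabet size and sample size (arbitrary)
pi = np.full(A, 1.0 / A)           # null distribution: uniform, for illustration
x = rng.choice(A, size=n, p=pi)    # i.i.d. observations X_1, ..., X_n

# Empirical distribution (type): Gamma_n(z) = n(z) / n
gamma_n = np.bincount(x, minlength=A) / n
print(gamma_n)                     # a random point on the probability simplex
```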

Hoeffding's Universal Test
Hoeffding test [1965]: decide $H_1$ when $D(\Gamma^n \,\|\, \pi) \geq \eta$
– Uses the KL divergence between the empirical distribution $\Gamma^n$ and the null distribution $\pi$ as the test statistic
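A minimal sketch of this test statistic, assuming the decision rule written above; the threshold passed in is a placeholder, not a value from the talk:

```python
import numpy as np

def kl_divergence(mu, pi):
    """D(mu || pi) in nats; letters with mu[z] == 0 contribute zero."""
    mask = mu > 0
    return float(np.sum(mu[mask] * np.log(mu[mask] / pi[mask])))

def hoeffding_test(x, pi, eta):
    """Decide H1 (declare an anomaly) iff D(Gamma_n || pi) >= eta."""
    gamma_n = np.bincount(x, minlength=len(pi)) / len(x)
    stat = kl_divergence(gamma_n, pi)
    return stat, bool(stat >= eta)
```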

Hoeffding's Universal Test
Hoeffding test is optimal in the error-exponent sense:
– Sanov's theorem in large deviations implies the false-alarm probability decays as $P_\pi\{D(\Gamma^n \,\|\, \pi) \geq \eta\} \approx e^{-n\eta}$, and no universal test with this false-alarm exponent achieves a better missed-detection exponent
Better approximation of the false-alarm probability via
– Weak convergence under $H_0$: $2n\,D(\Gamma^n \,\|\, \pi) \xrightarrow{d} \chi^2_{A-1}$

Error exponents are inaccurate
[Figure: alphabet size A = 20]

Large Alphabet Regime
Hoeffding test performs poorly for large alphabet size $A$ (relative to the sample size)
– the test statistic suffers from high bias and variance
A popular fix: merging low-probability bins

Binning
[Figure]

Quantization
[Figure]
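One possible implementation of the "merge low-probability bins" fix (a sketch; the probability cutoff is an arbitrary illustrative choice):

```python
import numpy as np

def merge_low_prob_bins(pi, cutoff=0.02):
    """Pool every letter whose null probability is below `cutoff` into a
    single super-letter. Returns (mapping, pi_reduced), where mapping[z]
    gives the index of the reduced letter for original letter z."""
    keep = np.flatnonzero(pi >= cutoff)
    rare = np.flatnonzero(pi < cutoff)
    mapping = np.empty(len(pi), dtype=int)
    mapping[keep] = np.arange(len(keep))
    if rare.size:
        mapping[rare] = len(keep)                  # all rare letters share one bin
        pi_reduced = np.append(pi[keep], pi[rare].sum())
    else:
        pi_reduced = pi[keep]
    return mapping, pi_reduced

# Usage: run the Hoeffding test on y = mapping[x] against pi_reduced.
```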

General principle
Dimensionality reduction: essentially we compromise on universality but improve performance against typical alternatives
Generalization: a parametric family of distributions modeling the typical alternatives

Hoeffding test
[Figure]

Mismatched test
[Figures]

Mismatched test
Use the mismatched divergence $D^{\mathrm{MM}}(\Gamma^n \,\|\, \pi)$ instead of the KL divergence
– interpretable as a lower bound to the KL divergence
Idea in short: replace the unknown alternative distribution with its ML estimate from a parametric family, i.e., it is a GLRT

Exponential family example
For a family $\pi^{\theta}(z) \propto \pi(z)\, e^{\theta^{\mathsf T}\psi(z)}$, the mismatched divergence is the solution to a convex problem:
$D^{\mathrm{MM}}(\mu \,\|\, \pi) = \sup_{\theta} \bigl\{ \theta^{\mathsf T} \textstyle\sum_z \mu(z)\psi(z) - \log \sum_z \pi(z)\, e^{\theta^{\mathsf T}\psi(z)} \bigr\}$
Binning is recovered when the features $\psi$ are indicator functions of the bins
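A numerical sketch of this convex problem, based on my reading of the exponential-family form above; the feature matrix `psi` is an arbitrary illustrative input:

```python
import numpy as np
from scipy.optimize import minimize

def mismatched_divergence(mu, pi, psi):
    """D_MM(mu || pi) = sup_theta  theta . E_mu[psi] - log E_pi[exp(theta . psi)].
    psi is an (A, d) feature matrix defining the exponential family pi^theta."""
    d = psi.shape[1]

    def neg_objective(theta):
        f = psi @ theta                              # f_theta(z) = theta . psi(z)
        log_mgf = np.log(np.sum(pi * np.exp(f)))     # log E_pi[exp(f_theta)]
        return -(mu @ f - log_mgf)                   # concave problem, negated

    res = minimize(neg_objective, np.zeros(d), method="BFGS")
    return float(-res.fun)

# With psi containing an indicator feature for every letter, the supremum
# recovers the full KL divergence D(mu || pi); fewer features give a lower bound.
```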

Mismatched Test properties
+ Addresses the high-variance issues
- However, not universally optimal in the error-exponent sense
+ Optimal when the alternate distribution lies in the parametric family: achieves the same error exponents as the Hoeffding test, which implies optimality of the GLRT for such composite hypotheses

Performance comparison
[Figure: A = 19, n = 40]

Weak convergence
When observations are drawn under $H_0$: $2n\,D^{\mathrm{MM}}(\Gamma^n \,\|\, \pi) \xrightarrow{d} \chi^2_{d}$, with $d$ the dimension of the parametric family
– Approximate thresholds for a target false-alarm probability
When observations are drawn under an alternative: the centered, $\sqrt{n}$-scaled statistic is asymptotically Gaussian
– Approximate the power of the test
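A sketch of threshold selection from these limits, assuming the chi-squared degrees of freedom stated above (A - 1 for the Hoeffding statistic, d for a d-dimensional mismatched family); the numbers in the usage lines are arbitrary:

```python
from scipy.stats import chi2

def hoeffding_threshold(n, A, alpha):
    """eta such that P(D(Gamma_n || pi) >= eta | H0) ~ alpha,
    using 2n * D(Gamma_n || pi) -> chi^2 with A - 1 degrees of freedom."""
    return chi2.ppf(1.0 - alpha, df=A - 1) / (2.0 * n)

def mismatched_threshold(n, d, alpha):
    """Same idea for the mismatched test: 2n * D_MM -> chi^2 with d degrees
    of freedom under H0, where d is the dimension of the family."""
    return chi2.ppf(1.0 - alpha, df=d) / (2.0 * n)

print(hoeffding_threshold(n=40, A=20, alpha=0.05))   # ~0.377 nats
print(mismatched_threshold(n=40, d=2, alpha=0.05))   # a much smaller threshold
```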

EXTENSIONS AND APPLICATIONS

Composite null hypotheses
Composite null hypotheses / model fitting: test whether the observations are consistent with some member of a parametric family of distributions, rather than with a single fixed $\pi$
[Equations]

Weak convergence
When observations are drawn under the composite null, and when they are drawn under an alternative, the limiting distribution of the test statistic can again be characterized
– Approximate thresholds for a target false-alarm probability
– Approximate the power of the test
– Study outlier effects

Outliers in model-fitting
Data corrupted by outliers or model-mismatch
– Contamination mixture model: a small fraction $\epsilon$ of the observations comes from an arbitrary contaminating distribution
Goodness-of-fit metric
– Limiting behavior used to quantify the goodness of fit
– The limiting behavior of the goodness-of-fit metric changes under contamination
– Quantifies the sensitivity of the goodness-of-fit metric to outliers
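A small simulation sketch of such a contamination mixture; the contamination level `eps`, the fitted model `pi0`, and the contaminating distribution `q` are all arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
A, n, eps = 10, 5000, 0.05
pi0 = np.full(A, 1.0 / A)                   # model being fitted
q = np.zeros(A)
q[0] = 1.0                                  # outliers concentrated on one letter
mix = (1 - eps) * pi0 + eps * q             # contamination mixture model

x = rng.choice(A, size=n, p=mix)
gamma_n = np.bincount(x, minlength=A) / n

# Goodness-of-fit metric: divergence of the type from the fitted model.
mask = gamma_n > 0
gof = np.sum(gamma_n[mask] * np.log(gamma_n[mask] / pi0[mask]))
print(gof)   # inflated relative to the eps = 0 case, reflecting the outliers
```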

Rate-distortion test
A different generalization of binning
– Rate-distortion optimal compression
Test based on optimally compressed observations [P. Harremoës '09]
– Results on the limiting distribution of the test statistic

Source coding with training
A wants to encode and transmit a source to B
– Unknown distribution on a known alphabet
– Given training samples from the source
Choose codelengths based on the empirical frequencies of the training samples
The expected excess codelength is asymptotically chi-squared
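A sketch of that codelength rule, assuming idealized Shannon-style lengths given by minus the base-2 log of the empirical frequencies; the add-one smoothing for unseen letters is my addition, not something stated in the talk:

```python
import numpy as np

def codelengths_from_training(train, A):
    """Idealized codelengths (bits) from the empirical frequencies of the
    training sample; add-one smoothing keeps unseen letters finite."""
    counts = np.bincount(train, minlength=A) + 1.0
    p_hat = counts / counts.sum()
    return -np.log2(p_hat)

def expected_excess_codelength(p_true, lengths):
    """Expected codelength minus the source entropy, in bits per symbol."""
    entropy = -np.sum(p_true * np.log2(p_true))
    return float(p_true @ lengths - entropy)

rng = np.random.default_rng(2)
A, n_train = 8, 100
p_true = rng.dirichlet(np.ones(A))               # unknown source distribution
train = rng.choice(A, size=n_train, p=p_true)    # training samples
print(expected_excess_codelength(p_true, codelengths_from_training(train, A)))
```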

CLT vs LDP
Empirical distribution (type) of the observations $X_1, \ldots, X_n$:
– Obeys an LDP (Sanov's theorem): $P\{\Gamma^n \in E\} \approx \exp\bigl(-n \inf_{\mu \in E} D(\mu \,\|\, \pi)\bigr)$
– Obeys a CLT: $\sqrt{n}\,(\Gamma^n - \pi)$ is asymptotically Gaussian

CLT vs LDP
LDP
– Good for large deviations
– Approximates the asymptotic slope of the log-probability
– Pre-exponential factor may be significant
CLT
– Good for moderate deviations
– Approximates the probability itself
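A Monte Carlo sketch of this comparison for the Hoeffding statistic: the Sanov/LDP estimate exp(-n*eta), exponent only with no pre-exponential factor, versus the chi-squared tail from the weak-convergence limit. The values of A, n, eta, and the trial count are arbitrary:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
A, n, eta, trials = 20, 40, 0.375, 50_000
pi = np.full(A, 1.0 / A)

def kl(mu, pi):
    mask = mu > 0
    return np.sum(mu[mask] * np.log(mu[mask] / pi[mask]))

# Monte Carlo estimate of P(D(Gamma_n || pi) >= eta) under H0
stats = np.array([kl(np.bincount(rng.choice(A, n, p=pi), minlength=A) / n, pi)
                  for _ in range(trials)])
p_mc = float(np.mean(stats >= eta))

p_ldp = np.exp(-n * eta)                        # Sanov exponent alone
p_clt = chi2.sf(2 * n * eta, df=A - 1)          # weak-convergence approximation

print(f"Monte Carlo: {p_mc:.3g}   LDP: {p_ldp:.3g}   CLT: {p_clt:.3g}")
# The chi-squared value is typically far closer to the Monte Carlo estimate
# than the bare exponential, illustrating the slide's point.
```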

Conclusions
– Error exponents do not tell the whole story: they are not a good indicator of the exact probability, and tests with identical error exponents can differ drastically over finite samples
– Weak convergence results give better approximations than error exponents (LDPs)
– Compromising universality buys a performance improvement against typical alternatives
– Further uses: threshold selection, outlier sensitivity, source coding with training

References
– J. Unnikrishnan, D. Huang, S. Meyn, A. Surana, and V. V. Veeravalli, “Universal and Composite Hypothesis Testing via Mismatched Divergence,” IEEE Trans. Inf. Theory, to appear.
– J. Unnikrishnan, S. Meyn, and V. Veeravalli, “On Thresholds for Robust Goodness-of-Fit Tests,” presented at the IEEE Information Theory Workshop, Dublin, Aug. 2010.
– J. Unnikrishnan, “Model-fitting in the presence of outliers,” submitted to ISIT.

Thank You!