Visual Recognition Tutorial


236608 Visual Recognition Tutorial

Topics: bias and variance of estimators; the score and Fisher information; the Cramer-Rao inequality.

Estimators and their Properties

Let $\{p(x;\theta) : \theta \in \Theta\}$ be a parametric set of distributions. Given a sample $X_1, \dots, X_n$ drawn i.i.d. from one of the distributions in the set, we would like to estimate its parameter $\theta$ (thus identifying the distribution). An estimator for $\theta$ w.r.t. the sample is any function $\hat{\theta} = \hat{\theta}(X_1, \dots, X_n)$; notice that an estimator is a random variable. How do we measure the quality of an estimator?

Consistency: an estimator $\hat{\theta}_n$ for $\theta$ is consistent if $\hat{\theta}_n \to \theta$ in probability as $n \to \infty$. This is a (desirable) asymptotic property that motivates us to acquire large samples, but we are also interested in quality measures for finite (and small!) sample sizes.

Estimators and their Properties

Bias: define the bias of an estimator $\hat{\theta}$ to be $b(\hat{\theta}) = E_\theta[\hat{\theta}] - \theta$. Here, the expectation is w.r.t. the distribution $p(x;\theta)$. The estimator is unbiased if its bias is zero. Example: for the mean of a normal distribution, the sample mean $\bar{X} = \frac{1}{n}\sum_i X_i$ and the single observation $X_1$ are both unbiased. The estimator $\frac{1}{n}\sum_i (X_i - \bar{X})^2$ for its variance is biased, whereas the estimator $\frac{1}{n-1}\sum_i (X_i - \bar{X})^2$ is unbiased.

Variance: another important property of an estimator is its variance $\mathrm{Var}(\hat{\theta}) = E_\theta[(\hat{\theta} - E_\theta[\hat{\theta}])^2]$. We would like to find estimators with minimum bias and variance. Which is more important, bias or variance?
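The biased/unbiased variance claim above is easy to check by simulation. A minimal sketch, assuming numpy; the sample size n = 5, the number of trials, and the standard normal are illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 5, 200_000
sigma2 = 1.0  # true variance of the standard normal

# Draw many samples of size n and compute both variance estimators.
samples = rng.standard_normal((trials, n))
biased = samples.var(axis=1, ddof=0)    # divides by n
unbiased = samples.var(axis=1, ddof=1)  # divides by n - 1

# E[biased] = sigma2 * (n-1)/n = 0.8, while E[unbiased] = sigma2 = 1.
print(biased.mean(), unbiased.mean())
```

Averaging over many trials, the `ddof=0` estimator underestimates the true variance by the factor (n-1)/n, while the `ddof=1` estimator centers on the truth.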

Risky Estimators

We employ our decision-theoretic framework to measure the quality of estimators. Abbreviate $\hat{\theta} = \hat{\theta}(X_1,\dots,X_n)$ and consider the square-error loss function $\lambda(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$. The conditional risk associated with $\hat{\theta}$ when $\theta$ is the true parameter is $R(\hat{\theta} \mid \theta) = E_\theta[(\hat{\theta} - \theta)^2]$.

Claim: $R(\hat{\theta} \mid \theta) = b^2(\hat{\theta}) + \mathrm{Var}(\hat{\theta})$.

Proof: write $\hat{\theta} - \theta = (\hat{\theta} - E_\theta[\hat{\theta}]) + (E_\theta[\hat{\theta}] - \theta)$ and expand the square; the cross term has zero expectation, so $E_\theta[(\hat{\theta} - \theta)^2] = \mathrm{Var}(\hat{\theta}) + b^2(\hat{\theta})$.
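The risk = bias² + variance decomposition can be verified numerically. A sketch, assuming numpy; the deliberately biased shrunken estimator 0.8·x̄ and all constants are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, n, trials = 2.0, 10, 100_000

# A deliberately biased estimator of the mean: a shrunken sample mean.
samples = rng.normal(mu, 1.0, size=(trials, n))
est = 0.8 * samples.mean(axis=1)

mse = np.mean((est - mu) ** 2)   # empirical conditional risk
bias = est.mean() - mu           # empirical bias (about -0.4 here)
var = est.var()                  # empirical variance (ddof=0)

print(mse, bias ** 2 + var)  # the two agree
```

With empirical moments the decomposition is an algebraic identity, so the two printed numbers match to floating-point precision.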

Bias vs. Variance

So, for a given level of conditional risk, there is a tradeoff between bias and variance. This tradeoff is among the most important facts in pattern recognition and machine learning.

Classical approach: consider only unbiased estimators and try to find those with minimum possible variance. This approach is not always fruitful:
- Unbiasedness only means that the average of the estimator (w.r.t. $p(x;\theta)$) is $\theta$; it doesn't mean the estimate will be near $\theta$ for a particular sample (if the variance is large).
- In general, an unbiased estimator is not guaranteed to exist.
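A concrete instance of the tradeoff (a classical side fact, not stated on the slides): for normal data, dividing the sum of squared deviations by n+1 instead of n-1 gives a biased variance estimator with strictly smaller mean square error. A sketch, assuming numpy, with illustrative constants:

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 5, 200_000
samples = rng.standard_normal((trials, n))  # true variance is 1
ss = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

mse_unbiased = np.mean((ss / (n - 1) - 1.0) ** 2)  # unbiased, larger risk
mse_biased = np.mean((ss / (n + 1) - 1.0) ** 2)    # biased, smaller risk
print(mse_biased, mse_unbiased)
```

For n = 5 the theoretical risks are 1/3 (biased) versus 1/2 (unbiased): accepting a little bias buys a larger reduction in variance.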

The Score

The score of the family is the random variable $s(\theta) = \frac{\partial}{\partial \theta} \ln p(X;\theta)$; it measures the "sensitivity" of $\ln p(X;\theta)$ as a function of the parameter $\theta$.

Claim: $E_\theta[s(\theta)] = 0$.

Proof: $E_\theta[s(\theta)] = \int \frac{\partial_\theta p(x;\theta)}{p(x;\theta)}\, p(x;\theta)\,dx = \frac{\partial}{\partial \theta} \int p(x;\theta)\,dx = \frac{\partial}{\partial \theta} 1 = 0$.

Corollary: $\mathrm{Var}[s(\theta)] = E_\theta[s(\theta)^2]$.

The Score - Example

Consider the normal distribution $p(x;\mu) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ with unknown mean $\mu$. Then $s(\mu) = \frac{\partial}{\partial \mu} \ln p(x;\mu) = \frac{x - \mu}{\sigma^2}$; clearly, $E[s(\mu)] = 0$ and $\mathrm{Var}[s(\mu)] = \frac{1}{\sigma^2}$.
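Both facts about the normal score (zero mean, variance 1/σ²) can be checked by simulation. A sketch, assuming numpy; σ = 1, θ = 1.5, and the sample size are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
theta = 1.5
x = rng.normal(theta, 1.0, size=500_000)

# Score of N(theta, 1) w.r.t. theta: d/dtheta log p(x; theta) = x - theta.
score = x - theta
print(score.mean())  # close to 0: E[score] = 0
print(score.var())   # close to 1: Var[score] = 1 / sigma^2
```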

The Score - Vector Form

In the case where $\theta$ is a vector, the score is the vector whose $i$th component is $s_i(\theta) = \frac{\partial}{\partial \theta_i} \ln p(X;\theta)$.

Example: for the normal distribution with $\theta = (\mu, \sigma^2)$ both unknown, the score vector is $\left(\frac{x-\mu}{\sigma^2},\; \frac{(x-\mu)^2 - \sigma^2}{2\sigma^4}\right)$.

Fisher Information

Fisher information is designed to provide a measure of how much information the parametric probability law $p(x;\theta)$ carries about the parameter $\theta$. An adequate definition of such information should possess the following properties:
- The larger the sensitivity of $p(x;\theta)$ to changes in $\theta$, the larger should be the information.
- The information should be additive: the information carried by the combined law $p(x;\theta)\,q(y;\theta)$ should be the sum of those carried by $p$ and $q$.
- The information should be insensitive to the sign of the change in $\theta$, and preferably positive.
- The information should be a deterministic quantity; it should not depend on the specific random observation.

Fisher Information

Definition (scalar form): the Fisher information (about $\theta$) is the variance of the score, $I(\theta) = \mathrm{Var}[s(\theta)] = E_\theta[s(\theta)^2]$.

Example: consider a random variable $X \sim N(\mu, \sigma^2)$ with unknown mean $\mu$. From the score example, $I(\mu) = \mathrm{Var}\!\left[\frac{X-\mu}{\sigma^2}\right] = \frac{1}{\sigma^2}$.
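The definition I(θ) = Var[score] applies to any regular family, not just the normal. As an extra illustration (not from the slides), a Bernoulli(p) check against the known closed form I(p) = 1/(p(1-p)); numpy assumed, p = 0.3 illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
p = 0.3
x = rng.binomial(1, p, size=1_000_000)

# Score of Bernoulli(p): d/dp [x log p + (1-x) log(1-p)] = x/p - (1-x)/(1-p).
score = x / p - (1 - x) / (1 - p)
print(score.var())          # empirical Fisher information
print(1.0 / (p * (1 - p)))  # closed form, about 4.76
```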

Fisher Information - Cntd.

Whenever $\theta$ is a vector, the Fisher information is the matrix $I(\theta)$ where $I_{ij}(\theta) = E_\theta[s_i(\theta)\,s_j(\theta)] = \mathrm{Cov}[s_i(\theta), s_j(\theta)]$. Reminder: $s_i(\theta) = \frac{\partial}{\partial \theta_i} \ln p(X;\theta)$.

Remark: the Fisher information is only defined when the distributions satisfy some regularity conditions. (For example, they should be differentiable w.r.t. $\theta$, and all the distributions in the parametric family must have the same support set.)

Fisher Information - Cntd.

Claim: let $X_1, \dots, X_n$ be i.i.d. random variables with density $p(x;\theta)$. The score of the sample $X_1, \dots, X_n$ is the sum of the individual scores.

Proof: $\ln p(x_1,\dots,x_n;\theta) = \sum_{i=1}^n \ln p(x_i;\theta)$, so differentiating w.r.t. $\theta$ gives $s_n(\theta) = \sum_{i=1}^n s(x_i;\theta)$.

Example: if $X_1,\dots,X_n$ are i.i.d. $N(\mu,\sigma^2)$, the score is $\sum_{i=1}^n \frac{x_i - \mu}{\sigma^2} = \frac{n(\bar{x} - \mu)}{\sigma^2}$.

Fisher Information - Cntd.

Based on $n$ i.i.d. samples, the Fisher information about $\theta$ is $I_n(\theta) = \mathrm{Var}[s_n(\theta)] = \sum_{i=1}^n \mathrm{Var}[s(X_i;\theta)] = n\,I(\theta)$. Thus, the Fisher information is additive w.r.t. i.i.d. random variables.

Example: suppose $X_1,\dots,X_n$ are i.i.d. $N(\mu,\sigma^2)$. From the previous example, the Fisher information about $\mu$ based on one sample is $I(\mu) = \frac{1}{\sigma^2}$. Therefore, based on the entire sample, $I_n(\mu) = \frac{n}{\sigma^2}$.
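Additivity can be checked directly: for n i.i.d. N(θ, 1) draws the total score is the sum of the per-observation scores, and its variance should be n·I(θ) = n. A sketch, assuming numpy; n = 10 and the seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
theta, n, trials = 2.0, 10, 200_000

# Total score of each sample: sum of per-observation scores (x_i - theta).
x = rng.normal(theta, 1.0, size=(trials, n))
total_score = (x - theta).sum(axis=1)

print(total_score.var())  # close to n * I(theta) = 10
```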

The Cramer-Rao Inequality

Theorem: let $\hat{\theta}$ be an unbiased estimator for $\theta$. Then $\mathrm{Var}(\hat{\theta}) \geq \frac{1}{I_n(\theta)}$.

Proof: using $E_\theta[s_n(\theta)] = 0$, we have $\mathrm{Cov}[s_n(\theta), \hat{\theta}] = E_\theta[s_n(\theta)\,\hat{\theta}]$.

The Cramer-Rao Inequality - Cntd.

Now $E_\theta[s_n(\theta)\,\hat{\theta}] = \int \frac{\partial_\theta p(x;\theta)}{p(x;\theta)}\, \hat{\theta}(x)\, p(x;\theta)\,dx = \frac{\partial}{\partial \theta} \int \hat{\theta}(x)\, p(x;\theta)\,dx = \frac{\partial}{\partial \theta} E_\theta[\hat{\theta}] = \frac{\partial \theta}{\partial \theta} = 1$, where the last step uses the unbiasedness of $\hat{\theta}$.

The Cramer-Rao Inequality - Cntd.

So, $\mathrm{Cov}[s_n(\theta), \hat{\theta}] = 1$. By the Cauchy-Schwarz inequality, $1 = \mathrm{Cov}[s_n(\theta), \hat{\theta}]^2 \leq \mathrm{Var}[s_n(\theta)]\cdot \mathrm{Var}(\hat{\theta}) = I_n(\theta)\, \mathrm{Var}(\hat{\theta})$. Therefore, $\mathrm{Var}(\hat{\theta}) \geq \frac{1}{I_n(\theta)}$.

For a biased estimator with bias $b(\theta) = E_\theta[\hat{\theta}] - \theta$ we have $\mathrm{Var}(\hat{\theta}) \geq \frac{(1 + b'(\theta))^2}{I_n(\theta)}$.

The Cramer-Rao Inequality - General Case

The Cramer-Rao inequality also holds in general (vector) form: the error covariance matrix of $\hat{\theta}$ is bounded as $E_\theta[(\hat{\theta} - \theta)(\hat{\theta} - \theta)^T] \succeq I_n(\theta)^{-1}$, i.e. the difference of the two matrices is positive semi-definite.

The Cramer-Rao Inequality - Cntd.

Example: let $X_1,\dots,X_n$ be i.i.d. $N(\mu,\sigma^2)$. From the previous example, $I_n(\mu) = \frac{n}{\sigma^2}$. Now let $\bar{X} = \frac{1}{n}\sum_i X_i$ be an (unbiased) estimator for $\mu$. Then $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} = \frac{1}{I_n(\mu)}$, so $\bar{X}$ matches the Cramer-Rao lower bound.

Def: an unbiased estimator whose covariance meets the Cramer-Rao lower bound is called efficient.
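The efficiency of the sample mean can be verified empirically: over many repeated samples, its variance should sit at the Cramer-Rao bound σ²/n. A sketch, assuming numpy; σ = 2 and n = 25 are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
theta, sigma, n, trials = 0.0, 2.0, 25, 200_000

# Variance of the sample mean over many repetitions vs. the CR bound.
x = rng.normal(theta, sigma, size=(trials, n))
est = x.mean(axis=1)

print(est.var())       # empirical variance of the estimator
print(sigma ** 2 / n)  # Cramer-Rao lower bound: 0.16
```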

Efficiency

Theorem (efficiency): the unbiased estimator $\hat{\theta}$ is efficient, that is, $\mathrm{Var}(\hat{\theta}) = \frac{1}{I_n(\theta)}$, iff $s_n(\theta) = I_n(\theta)\,(\hat{\theta} - \theta)$.

Proof (if): if $s_n(\theta) = I_n(\theta)(\hat{\theta} - \theta)$, then $I_n(\theta) = \mathrm{Var}[s_n(\theta)] = I_n(\theta)^2\, \mathrm{Var}(\hat{\theta})$, meaning $\mathrm{Var}(\hat{\theta}) = \frac{1}{I_n(\theta)}$.

Efficiency

(Only if): recall the cross covariance between $s_n(\theta)$ and $\hat{\theta}$: $\mathrm{Cov}[s_n(\theta), \hat{\theta}] = 1$. The Cauchy-Schwarz inequality for random variables says $\mathrm{Cov}[s_n(\theta), \hat{\theta}]^2 \leq \mathrm{Var}[s_n(\theta)]\,\mathrm{Var}(\hat{\theta})$, with equality iff $s_n(\theta) = c(\theta)\,(\hat{\theta} - \theta)$ for some deterministic $c(\theta)$ (using $E[\hat{\theta}] = \theta$). Then $1 = \mathrm{Cov}[s_n(\theta), \hat{\theta}] = c(\theta)\,\mathrm{Var}(\hat{\theta}) = \frac{c(\theta)}{I_n(\theta)}$, so $c(\theta) = I_n(\theta)$.

Cramer-Rao Inequality and ML

Theorem: suppose there exists an efficient estimator $\hat{\theta}$ for all $\theta$. Then the ML estimator is $\hat{\theta}$.

Proof: by assumption $\hat{\theta}$ is efficient, so by the previous claim $s_n(\theta) = I_n(\theta)\,(\hat{\theta} - \theta)$ for all $\theta$. In particular, this holds at $\theta = \hat{\theta}_{ML}$; since $\hat{\theta}_{ML}$ is a maximum point of the log-likelihood, the left side is zero there, so $0 = I_n(\hat{\theta}_{ML})\,(\hat{\theta} - \hat{\theta}_{ML})$ and hence $\hat{\theta} = \hat{\theta}_{ML}$.
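For the running normal-mean example, this predicts that the ML estimate coincides with the efficient estimator, the sample mean. A grid-search sketch, assuming numpy; the grid range, spacing, and seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(3.0, 1.0, size=1000)

# Log-likelihood of N(theta, 1), up to an additive constant, on a theta grid.
grid = np.linspace(2.0, 4.0, 2001)
loglik = -0.5 * ((x[:, None] - grid[None, :]) ** 2).sum(axis=0)

theta_ml = grid[loglik.argmax()]
print(theta_ml, x.mean())  # the ML estimate lands on the sample mean
```

The maximizer of the grid matches the sample mean to within the grid spacing, as the theorem predicts.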