# Visual Recognition Tutorial

## Presentation on theme: "Visual Recognition Tutorial"— Presentation transcript:

236607 Visual Recognition Tutorial
- Bias and variance of estimators
- The score and Fisher information
- Cramer-Rao inequality

Estimators and their Properties
Let $\{p(x;\theta) : \theta \in \Theta\}$ be a parametric set of distributions. Given a sample $\mathbf{x} = (x_1, \dots, x_n)$ drawn i.i.d. from one of the distributions in the set, we would like to estimate its parameter $\theta$ (thus identifying the distribution). An estimator for $\theta$ w.r.t. $\mathbf{x}$ is any function $\hat\theta(\mathbf{x})$; notice that an estimator is a random variable.

How do we measure the quality of an estimator?

Consistency: an estimator $\hat\theta_n$ for $\theta$ is consistent if $\hat\theta_n \to \theta$ in probability as $n \to \infty$, i.e., $\lim_{n\to\infty} P(|\hat\theta_n - \theta| > \epsilon) = 0$ for every $\epsilon > 0$. This is a (desirable) asymptotic property that motivates us to acquire large samples. But we should emphasize that we are also interested in measures for finite (and small!) sample sizes.
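
The sample mean $\bar x_n$ is the standard example of a consistent estimator of a mean. A minimal Monte Carlo sketch, assuming i.i.d. $N(\theta, 1)$ data with a hypothetical $\theta = 2$ and tolerance $\epsilon = 0.1$ (illustrative choices, not from the tutorial), estimates $P(|\bar x_n - \theta| > \epsilon)$ for growing $n$:

```python
import random

def deviation_frequency(n, eps=0.1, trials=500, seed=0):
    """Monte Carlo estimate of P(|x_bar_n - theta| > eps) when x_i ~ N(theta, 1)."""
    rng = random.Random(seed)
    theta = 2.0  # hypothetical true parameter, chosen only for illustration
    hits = 0
    for _ in range(trials):
        x_bar = sum(rng.gauss(theta, 1.0) for _ in range(n)) / n
        if abs(x_bar - theta) > eps:
            hits += 1
    return hits / trials

for n in (10, 100, 1000):
    print(n, deviation_frequency(n))  # the frequency shrinks toward 0 as n grows
```

The frequencies fall from roughly 0.75 at $n = 10$ to nearly 0 at $n = 1000$, which is the behavior the definition of consistency describes.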

Estimators and their Properties
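Bias: define the bias of an estimator $\hat\theta$ to be
$$b(\hat\theta) = E[\hat\theta(\mathbf{x})] - \theta.$$
Here, the expectation is w.r.t. the distribution $p(\mathbf{x};\theta)$. The estimator is unbiased if its bias is zero.

Example: the estimators $x_1$ and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ for the mean of a normal distribution are both unbiased. The estimator $\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ for its variance is biased (its expectation is $\frac{n-1}{n}\sigma^2$), whereas the estimator $\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ is unbiased.

Variance: another important property of an estimator is its variance
$$\operatorname{Var}(\hat\theta) = E\big[(\hat\theta - E[\hat\theta])^2\big].$$
We would like to find estimators with minimum bias and variance. Which is more important, bias or variance?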
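
The bias of the $1/n$ variance estimator is easy to see empirically. A sketch, assuming samples of a hypothetical size $n = 5$ from $N(0, 4)$, averages both variance estimators over many samples; the averages should land near $\frac{n-1}{n}\sigma^2 = 3.2$ and $\sigma^2 = 4$ respectively:

```python
import random

def variance_estimators(xs):
    """Return the 1/n (biased) and 1/(n-1) (unbiased) estimators of the variance."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations from x_bar
    return ss / n, ss / (n - 1)

def average_estimates(n=5, sigma2=4.0, trials=20000, seed=1):
    """Average both estimators over many samples of size n from N(0, sigma2)."""
    rng = random.Random(seed)
    total_biased = total_unbiased = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
        biased, unbiased = variance_estimators(xs)
        total_biased += biased
        total_unbiased += unbiased
    return total_biased / trials, total_unbiased / trials

# Expected values are (n - 1) / n * sigma2 = 3.2 and sigma2 = 4.0; the printout is a noisy estimate of these.
print(average_estimates())
```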

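Risky Estimators
Employ our decision-theoretic framework to measure the quality of estimators. Abbreviate $\hat\theta = \hat\theta(\mathbf{x})$ and consider the square error loss function
$$\lambda(\hat\theta, \theta) = (\hat\theta - \theta)^2.$$
The conditional risk associated with $\hat\theta$ when $\theta$ is the true parameter is
$$R(\hat\theta \mid \theta) = E\big[(\hat\theta - \theta)^2\big].$$
Claim: $R(\hat\theta \mid \theta) = b^2(\hat\theta) + \operatorname{Var}(\hat\theta)$.
Proof:
$$E\big[(\hat\theta - \theta)^2\big] = E\big[(\hat\theta - E[\hat\theta] + E[\hat\theta] - \theta)^2\big] = E\big[(\hat\theta - E[\hat\theta])^2\big] + 2\big(E[\hat\theta] - \theta\big)E\big[\hat\theta - E[\hat\theta]\big] + \big(E[\hat\theta] - \theta\big)^2 = \operatorname{Var}(\hat\theta) + b^2(\hat\theta),$$
since the middle term vanishes.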
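
The decomposition also holds exactly for empirical moments (with $1/m$ normalization), which makes it easy to check. A sketch, assuming a deliberately biased, hypothetical estimator $0.8\,\bar x$ of the mean of $N(\theta, 1)$ with $\theta = 1$ and $n = 10$:

```python
import random

def risk_decomposition(estimates, theta):
    """Empirical MSE, squared bias and variance of a list of estimates of theta."""
    m = len(estimates)
    mean_est = sum(estimates) / m
    mse = sum((t - theta) ** 2 for t in estimates) / m
    bias2 = (mean_est - theta) ** 2
    var = sum((t - mean_est) ** 2 for t in estimates) / m  # 1/m normalization makes the identity exact
    return mse, bias2, var

rng = random.Random(2)
theta, n = 1.0, 10  # hypothetical true mean and sample size
# A deliberately biased (shrunk) estimator 0.8 * x_bar of the mean of N(theta, 1).
estimates = [0.8 * sum(rng.gauss(theta, 1.0) for _ in range(n)) / n for _ in range(5000)]
mse, bias2, var = risk_decomposition(estimates, theta)
print(mse, bias2 + var)  # equal up to floating-point rounding
```

Here the squared bias is near $(0.8 - 1)^2 = 0.04$ and the variance near $0.64 / 10 = 0.064$.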

Bias vs. Variance
So, for a given level of conditional risk, there is a tradeoff between bias and variance. This tradeoff is among the most important facts in pattern recognition and machine learning.
Classical approach: consider only unbiased estimators and try to find those with minimum possible variance. This approach is not always fruitful:
- Unbiasedness only means that the average of the estimator (w.r.t. $p(\mathbf{x};\theta)$) is $\theta$. It doesn't mean the estimate will be near $\theta$ for a particular sample (if the variance is large).
- In general, an unbiased estimator is not guaranteed to exist.
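
A worked illustration of the tradeoff, assuming i.i.d. $x_i \sim N(\theta, \sigma^2)$ and the hypothetical shrunk estimator $c\,\bar x$: its bias is $(c - 1)\theta$ and its variance $c^2\sigma^2/n$, so $R = (c-1)^2\theta^2 + c^2\sigma^2/n$, which is minimized at $c^\ast = \theta^2/(\theta^2 + \sigma^2/n) < 1$. A biased estimator can therefore have lower risk than the unbiased $\bar x$ (although $c^\ast$ depends on the unknown $\theta$, so this is only an illustration):

```python
def shrinkage_mse(c, theta, sigma2, n):
    """Risk of c * x_bar for the mean of N(theta, sigma2): squared bias plus variance."""
    bias = (c - 1.0) * theta
    variance = c ** 2 * sigma2 / n
    return bias ** 2 + variance

theta, sigma2, n = 1.0, 4.0, 4  # hypothetical values
c_star = theta ** 2 / (theta ** 2 + sigma2 / n)  # minimizer of the quadratic in c
print(shrinkage_mse(1.0, theta, sigma2, n))     # unbiased x_bar: 1.0
print(shrinkage_mse(c_star, theta, sigma2, n))  # biased, smaller risk: 0.5
```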

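The Score
The score $V$ of the family $p(x;\theta)$ is the random variable
$$V = \frac{\partial}{\partial\theta}\ln p(x;\theta) = \frac{1}{p(x;\theta)}\frac{\partial p(x;\theta)}{\partial\theta};$$
it measures the "sensitivity" of $p(x;\theta)$ as a function of the parameter $\theta$.
Claim: $E[V] = 0$.
Proof (assuming differentiation and integration can be interchanged):
$$E[V] = \int \frac{1}{p(x;\theta)}\frac{\partial p(x;\theta)}{\partial\theta}\,p(x;\theta)\,dx = \int \frac{\partial p(x;\theta)}{\partial\theta}\,dx = \frac{\partial}{\partial\theta}\int p(x;\theta)\,dx = \frac{\partial}{\partial\theta}1 = 0.$$
Corollary: $\operatorname{Var}(V) = E[V^2]$.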
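
The claim $E[V] = 0$ can be checked numerically for a family other than the normal. A sketch, assuming the exponential density $p(x;\theta) = \theta e^{-\theta x}$ for $x \ge 0$ (same support for every $\theta$), whose score is $V = 1/\theta - x$. The derivative is taken by a central finite difference and the expectation by a midpoint rule truncated at $x = 60$; both numerical choices are ours, not the tutorial's:

```python
import math

def log_pdf(x, theta):
    """ln p(x; theta) for the exponential density theta * exp(-theta * x), x >= 0."""
    return math.log(theta) - theta * x

def score(x, theta, h=1e-5):
    """Central finite-difference approximation of V = d/dtheta ln p(x; theta)."""
    return (log_pdf(x, theta + h) - log_pdf(x, theta - h)) / (2 * h)

def expect(f, theta, upper=60.0, steps=60000):
    """E[f(x)] under p(x; theta), midpoint rule on [0, upper]."""
    dx = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * dx
        total += f(x) * math.exp(log_pdf(x, theta)) * dx
    return total

theta = 1.5  # hypothetical parameter value
print(expect(lambda x: score(x, theta), theta))       # close to 0
print(expect(lambda x: score(x, theta) ** 2, theta))  # close to Var(V) = 1/theta^2
```

By the corollary, the second printed value is also $\operatorname{Var}(V)$, which for this family equals $\operatorname{Var}(x) = 1/\theta^2$.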

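The Score - Example
Consider the normal distribution $p(x;\theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x-\theta)^2}{2\sigma^2}\right)$ with known $\sigma$. Clearly,
$$V = \frac{\partial}{\partial\theta}\ln p(x;\theta) = \frac{x-\theta}{\sigma^2},$$
and $E[V] = 0$, $E[V^2] = \frac{E[(x-\theta)^2]}{\sigma^4} = \frac{1}{\sigma^2}$.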
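
A quick cross-check of the closed-form score $V = (x-\theta)/\sigma^2$ for $N(\theta, \sigma^2)$, assuming hypothetical values $\theta = 1$, $\sigma = 2$: the formula is compared with a finite-difference derivative of $\ln p$, and sample averages of $V$ and $V^2$ are compared with $0$ and $1/\sigma^2 = 0.25$:

```python
import math
import random

def log_normal_pdf(x, theta, sigma):
    """ln p(x; theta) for N(theta, sigma^2) with sigma known."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - theta) ** 2 / (2 * sigma ** 2)

def normal_score(x, theta, sigma):
    """Closed-form score V = (x - theta) / sigma^2."""
    return (x - theta) / sigma ** 2

def numeric_score(x, theta, sigma, h=1e-5):
    """Central finite difference of ln p(x; theta) in theta, for cross-checking."""
    return (log_normal_pdf(x, theta + h, sigma) - log_normal_pdf(x, theta - h, sigma)) / (2 * h)

rng = random.Random(3)
theta, sigma = 1.0, 2.0  # hypothetical parameter values
scores = [normal_score(rng.gauss(theta, sigma), theta, sigma) for _ in range(100000)]
print(sum(scores) / len(scores))                 # sample mean of V, near E[V] = 0
print(sum(v * v for v in scores) / len(scores))  # sample mean of V^2, near 1/sigma^2 = 0.25
```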

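The Score - Vector Form
In case $\theta = (\theta_1, \dots, \theta_k)$ is a vector, the score $V$ is the vector whose $i$th component is
$$V_i = \frac{\partial}{\partial\theta_i}\ln p(x;\theta).$$
Example: for $x \sim N(\mu, \sigma^2)$ with $\theta = (\mu, \sigma^2)$,
$$V = \left(\frac{x-\mu}{\sigma^2},\; -\frac{1}{2\sigma^2} + \frac{(x-\mu)^2}{2\sigma^4}\right)^T.$$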
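
For the normal family parameterized as $\theta = (\mu, \sigma^2)$, the two score components are $(x-\mu)/\sigma^2$ and $-1/(2\sigma^2) + (x-\mu)^2/(2\sigma^4)$. A sketch that checks these formulas against finite differences in each parameter separately, at hypothetical values $\mu = 0.5$, $\sigma^2 = 2$:

```python
import math

def log_pdf(x, mu, var):
    """ln p(x; mu, sigma^2) for the normal density, parameterized by the variance."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def vector_score(x, mu, var):
    """Closed-form score (d/dmu, d/dsigma^2) of ln p(x; mu, sigma^2)."""
    return ((x - mu) / var, -1 / (2 * var) + (x - mu) ** 2 / (2 * var ** 2))

def numeric_vector_score(x, mu, var, h=1e-6):
    """Central finite differences in each parameter separately."""
    d_mu = (log_pdf(x, mu + h, var) - log_pdf(x, mu - h, var)) / (2 * h)
    d_var = (log_pdf(x, mu, var + h) - log_pdf(x, mu, var - h)) / (2 * h)
    return d_mu, d_var

for x in (-1.0, 0.3, 2.5):
    print(vector_score(x, 0.5, 2.0), numeric_vector_score(x, 0.5, 2.0))
```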

Fisher Information
Fisher information is designed to provide a measure of how much information the parametric probability law $p(x;\theta)$ carries about the parameter $\theta$. An adequate definition of such information should possess the following properties:
- The larger the sensitivity of $p(x;\theta)$ to changes in $\theta$, the larger the information should be.
- The information should be additive: the information carried by the combined law $p(x_1, x_2;\theta)$ of independent $x_1, x_2$ should be the sum of those carried by $p(x_1;\theta)$ and $p(x_2;\theta)$.
- The information should be insensitive to the sign of the change in $\theta$, and preferably positive.
- The information should be a deterministic quantity; it should not depend on the specific random observation $x$.

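Fisher Information
Definition (scalar form): the Fisher information $J(\theta)$ (about $\theta$) is the variance of the score:
$$J(\theta) = E[V^2] = E\left[\left(\frac{\partial}{\partial\theta}\ln p(x;\theta)\right)^2\right].$$
Example: consider a random variable $x \sim N(\theta, \sigma^2)$. Its score is $V = (x-\theta)/\sigma^2$, so
$$J(\theta) = \frac{E[(x-\theta)^2]}{\sigma^4} = \frac{1}{\sigma^2}.$$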
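
The definition $J(\theta) = E[V^2]$ can be evaluated directly by numerical integration. A sketch for $N(\theta, \sigma^2)$, where $V = (x-\theta)/\sigma^2$; the integration window of $\pm 12\sigma$ and the midpoint rule are our own illustrative choices:

```python
import math

def normal_pdf(x, theta, sigma):
    """Density of N(theta, sigma^2)."""
    return math.exp(-(x - theta) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def fisher_information(theta, sigma, half_width=12.0, steps=24000):
    """J(theta) = E[V^2] with V = (x - theta) / sigma^2, midpoint rule on theta +/- half_width * sigma."""
    lo = theta - half_width * sigma
    dx = 2 * half_width * sigma / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        v = (x - theta) / sigma ** 2  # score at x
        total += v * v * normal_pdf(x, theta, sigma) * dx
    return total

print(fisher_information(0.0, 2.0))  # close to 1 / sigma^2 = 0.25
```

Note that the result does not depend on $\theta$, only on $\sigma$, in line with the information being a deterministic quantity.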

Fisher Information - Cntd.
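Whenever $\theta$ is a vector, the Fisher information is the matrix $J(\theta) = [J_{ij}(\theta)]$, where
$$J_{ij}(\theta) = E\left[\frac{\partial \ln p(x;\theta)}{\partial\theta_i}\,\frac{\partial \ln p(x;\theta)}{\partial\theta_j}\right].$$
Reminder: $\operatorname{Cov}(V_i, V_j) = E\big[(V_i - E[V_i])(V_j - E[V_j])\big]$; since $E[V] = 0$, $J(\theta)$ is the covariance matrix of the score vector.
Remark: the Fisher information is only defined when the distributions satisfy some regularity conditions. (For example, they should be differentiable w.r.t. $\theta$, and all the distributions in the parametric family must have the same support set.)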
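
As a worked example of the matrix form (our choice of example), take $x \sim N(\mu, \sigma^2)$ with $\theta = (\mu, \sigma^2)$, whose score vector is $\big((x-\mu)/\sigma^2,\ -1/(2\sigma^2) + (x-\mu)^2/(2\sigma^4)\big)$. Integrating the outer product $VV^T$ numerically should give $J(\theta) = \operatorname{diag}\big(1/\sigma^2,\ 1/(2\sigma^4)\big)$:

```python
import math

def fisher_matrix(mu, var, half_width=12.0, steps=24000):
    """2x2 Fisher information of N(mu, sigma^2) for theta = (mu, sigma^2): E[V V^T] by the midpoint rule."""
    sigma = math.sqrt(var)
    lo = mu - half_width * sigma
    dx = 2 * half_width * sigma / steps
    J = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        p = math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
        v = ((x - mu) / var, -1 / (2 * var) + (x - mu) ** 2 / (2 * var ** 2))  # score vector
        for i in range(2):
            for j in range(2):
                J[i][j] += v[i] * v[j] * p * dx
    return J

print(fisher_matrix(0.0, 2.0))  # close to [[1/var, 0], [0, 1/(2 var^2)]] = [[0.5, 0], [0, 0.125]]
```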

Fisher Information - Cntd.
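Claim: let $x_1, \dots, x_n$ be i.i.d. random variables with density $p(x;\theta)$. The score of $p(x_1, \dots, x_n;\theta)$ is the sum of the individual scores.
Proof:
$$V(x_1, \dots, x_n) = \frac{\partial}{\partial\theta}\ln\prod_{i=1}^{n}p(x_i;\theta) = \sum_{i=1}^{n}\frac{\partial}{\partial\theta}\ln p(x_i;\theta) = \sum_{i=1}^{n}V(x_i).$$
Example: if $x_1, \dots, x_n$ are i.i.d. $N(\theta, \sigma^2)$, the score is
$$V = \sum_{i=1}^{n}\frac{x_i - \theta}{\sigma^2} = \frac{n(\bar{x} - \theta)}{\sigma^2}.$$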
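
A numerical check of the claim for i.i.d. $N(\theta, \sigma^2)$ data: a finite-difference derivative of the joint log-likelihood is compared with the sum of the individual closed-form scores $(x_i - \theta)/\sigma^2$. The sample of 20 points and the parameter values are hypothetical:

```python
import math
import random

def log_likelihood(xs, theta, sigma):
    """ln p(x_1, ..., x_n; theta) for i.i.d. N(theta, sigma^2) data."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2) - (x - theta) ** 2 / (2 * sigma ** 2) for x in xs)

def joint_score(xs, theta, sigma, h=1e-5):
    """Finite-difference derivative of the joint log-likelihood in theta."""
    return (log_likelihood(xs, theta + h, sigma) - log_likelihood(xs, theta - h, sigma)) / (2 * h)

def sum_of_scores(xs, theta, sigma):
    """Sum of the individual closed-form scores (x_i - theta) / sigma^2."""
    return sum((x - theta) / sigma ** 2 for x in xs)

rng = random.Random(4)
xs = [rng.gauss(0.5, 1.5) for _ in range(20)]  # hypothetical sample
print(joint_score(xs, 0.0, 1.5), sum_of_scores(xs, 0.0, 1.5))  # agree up to rounding
```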

Fisher Information - Cntd.
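Based on $n$ i.i.d. samples, the Fisher information about $\theta$ is
$$J_n(\theta) = E\left[\Big(\sum_{i=1}^{n}V(x_i)\Big)^2\right] = \sum_{i=1}^{n}E\big[V(x_i)^2\big] = nJ(\theta),$$
where the cross terms vanish because the $V(x_i)$ are independent with zero mean. Thus, the Fisher information is additive w.r.t. i.i.d. random variables.
Example: suppose $x_1, \dots, x_n$ are i.i.d. $N(\theta, \sigma^2)$. From the previous example we know that the Fisher information about the parameter $\theta$ based on one sample is $J(\theta) = 1/\sigma^2$. Therefore, based on the entire sample, $J_n(\theta) = n/\sigma^2$.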
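
Additivity can be seen by simulation: the variance of the joint score $n(\bar x - \theta)/\sigma^2$ over repeated samples should be close to $J_n(\theta) = n/\sigma^2$. A sketch, assuming hypothetical values $\theta = 0$ and $\sigma = 2$:

```python
import random

def empirical_fisher(n, theta=0.0, sigma=2.0, trials=20000, seed=5):
    """Sample variance of the joint score n * (x_bar - theta) / sigma^2 over repeated samples."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        x_bar = sum(rng.gauss(theta, sigma) for _ in range(n)) / n
        scores.append(n * (x_bar - theta) / sigma ** 2)
    mean = sum(scores) / trials
    return sum((v - mean) ** 2 for v in scores) / (trials - 1)

print(empirical_fisher(8))  # near n / sigma^2 = 2.0
```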

The Cramer-Rao Inequality
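Theorem: let $\hat\theta(\mathbf{x})$ be an unbiased estimator for $\theta$, and let $J(\theta)$ be the Fisher information carried by the observation $\mathbf{x}$. Then
$$\operatorname{Var}(\hat\theta) \ge \frac{1}{J(\theta)}.$$
Proof: using $E[V] = 0$ we have:
$$\operatorname{Cov}(V, \hat\theta) = E\big[(V - E[V])(\hat\theta - E[\hat\theta])\big] = E\big[V(\hat\theta - \theta)\big] = E[V\hat\theta] - \theta E[V] = E[V\hat\theta].$$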

The Cramer-Rao Inequality - Cntd.
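Now
$$E[V\hat\theta] = \int \frac{1}{p(\mathbf{x};\theta)}\frac{\partial p(\mathbf{x};\theta)}{\partial\theta}\,\hat\theta(\mathbf{x})\,p(\mathbf{x};\theta)\,d\mathbf{x} = \int \hat\theta(\mathbf{x})\,\frac{\partial p(\mathbf{x};\theta)}{\partial\theta}\,d\mathbf{x} = \frac{\partial}{\partial\theta}\int \hat\theta(\mathbf{x})\,p(\mathbf{x};\theta)\,d\mathbf{x} = \frac{\partial}{\partial\theta}E[\hat\theta] = \frac{\partial\theta}{\partial\theta} = 1.$$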
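
The key step $E[V\hat\theta] = 1$ uses only unbiasedness, so it should hold for any unbiased estimator. A Monte Carlo sketch, assuming i.i.d. $N(\theta, 1)$ samples of a hypothetical size $n = 25$ with $\theta = 0.5$, checks it for both the sample mean and the sample median (the median is unbiased here because the normal distribution is symmetric and $n$ is odd):

```python
import random
import statistics

def mean_score_times_estimate(estimator, n=25, theta=0.5, sigma=1.0, trials=20000, seed=6):
    """Monte Carlo estimate of E[V * theta_hat] for i.i.d. N(theta, sigma^2) samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(theta, sigma) for _ in range(n)]
        v = sum(x - theta for x in xs) / sigma ** 2  # joint score of the sample
        total += v * estimator(xs)
    return total / trials

r_mean = mean_score_times_estimate(statistics.fmean)
r_median = mean_score_times_estimate(statistics.median)
print(r_mean, r_median)  # both should be close to 1
```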

The Cramer-Rao Inequality - Cntd.
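So, $\operatorname{Cov}(V, \hat\theta) = 1$. By the Cauchy-Schwarz inequality,
$$1 = \big[\operatorname{Cov}(V, \hat\theta)\big]^2 \le \operatorname{Var}(V)\operatorname{Var}(\hat\theta) = J(\theta)\operatorname{Var}(\hat\theta).$$
Therefore, $\operatorname{Var}(\hat\theta) \ge 1/J(\theta)$.
For a biased estimator with bias $b(\theta) = E[\hat\theta] - \theta$ we have:
$$E\big[(\hat\theta - \theta)^2\big] \ge \frac{\big(1 + b'(\theta)\big)^2}{J(\theta)} + b^2(\theta).$$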
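
The bound can be strict. A sketch, assuming i.i.d. $N(0, 1)$ samples of a hypothetical size $n = 25$, estimates the variance of the (unbiased) sample median and compares it with $1/J_n(\theta) = \sigma^2/n = 0.04$. That the median's variance exceeds the bound by a factor approaching $\pi/2$ for large $n$ is a standard result we quote, not one from the tutorial:

```python
import random
import statistics

def estimator_variance(estimator, n=25, theta=0.0, sigma=1.0, trials=10000, seed=7):
    """Monte Carlo variance of an estimator of the mean of N(theta, sigma^2)."""
    rng = random.Random(seed)
    estimates = [estimator([rng.gauss(theta, sigma) for _ in range(n)]) for _ in range(trials)]
    return statistics.pvariance(estimates)

n, sigma = 25, 1.0
bound = sigma ** 2 / n  # 1 / J_n(theta), the Cramer-Rao lower bound
var_median = estimator_variance(statistics.median, n=n, sigma=sigma)
print(bound, var_median)  # the median's variance is clearly above the bound
```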

The Cramer-Rao General Case
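The Cramer-Rao inequality also holds in a general (vector) form: the error covariance matrix of an unbiased estimator $\hat\theta$,
$$C_{\hat\theta} = E\big[(\hat\theta - \theta)(\hat\theta - \theta)^T\big],$$
is bounded as follows:
$$C_{\hat\theta} \succeq J^{-1}(\theta),$$
i.e., $C_{\hat\theta} - J^{-1}(\theta)$ is positive semidefinite.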
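
A worked instance of the matrix bound, using our own example: for $n$ i.i.d. $N(\mu, \sigma^2)$ samples, take the unbiased estimator $\hat\theta = (\bar x, s^2)$ with $s^2$ the $1/(n-1)$ sample variance. For normal data $\bar x$ and $s^2$ are independent with $\operatorname{Var}(\bar x) = \sigma^2/n$ and $\operatorname{Var}(s^2) = 2\sigma^4/(n-1)$, while the per-sample Fisher matrix $\operatorname{diag}(1/\sigma^2, 1/(2\sigma^4))$ gives $J_n^{-1} = \operatorname{diag}(\sigma^2/n,\ 2\sigma^4/n)$. The sketch checks that $C_{\hat\theta} - J_n^{-1}$ is positive semidefinite:

```python
def covariance_minus_bound(n, var):
    """C - J_n^{-1} for theta_hat = (x_bar, s^2) on n i.i.d. N(mu, var) samples."""
    C = [[var / n, 0.0], [0.0, 2 * var ** 2 / (n - 1)]]   # x_bar and s^2 are independent
    J_inv = [[var / n, 0.0], [0.0, 2 * var ** 2 / n]]     # inverse of n * diag(1/var, 1/(2 var^2))
    return [[C[i][j] - J_inv[i][j] for j in range(2)] for i in range(2)]

def is_psd_2x2(M, tol=1e-12):
    """A symmetric 2x2 matrix is PSD iff its diagonal entries and determinant are nonnegative."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return M[0][0] >= -tol and M[1][1] >= -tol and det >= -tol

D = covariance_minus_bound(n=10, var=4.0)
print(D, is_psd_2x2(D))
```

The first diagonal entry is zero ($\bar x$ attains its part of the bound), while $s^2$ stays strictly above it for finite $n$.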

The Cramer-Rao Inequality - Cntd.
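Example: let $x_1, \dots, x_n$ be i.i.d. $N(\theta, \sigma^2)$. From the previous example, $J_n(\theta) = n/\sigma^2$. Now let $\hat\theta = \bar{x} = \frac{1}{n}\sum_{i=1}^{n}x_i$ be an (unbiased) estimator for $\theta$. Then
$$\operatorname{Var}(\bar{x}) = \frac{\sigma^2}{n} = \frac{1}{J_n(\theta)},$$
so $\bar{x}$ matches the Cramer-Rao lower bound.
Def: an unbiased estimator whose covariance meets the Cramer-Rao lower bound is called efficient.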
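
A simulation of this example, assuming hypothetical values $n = 16$, $\theta = 3$, $\sigma = 2$: over many samples, $\bar x$ should average close to $\theta$ and have variance close to $1/J_n(\theta) = \sigma^2/n = 0.25$:

```python
import random
import statistics

rng = random.Random(8)
n, theta, sigma = 16, 3.0, 2.0  # hypothetical values
means = [statistics.fmean([rng.gauss(theta, sigma) for _ in range(n)]) for _ in range(20000)]
bound = sigma ** 2 / n  # 1 / J_n(theta) = sigma^2 / n = 0.25
print(statistics.fmean(means), statistics.pvariance(means), bound)
```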

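Efficiency
Theorem (Efficiency): the unbiased estimator $\hat\theta$ is efficient, that is, $\operatorname{Var}(\hat\theta) = 1/J(\theta)$, iff
$$V = \frac{\partial}{\partial\theta}\ln p(\mathbf{x};\theta) = J(\theta)\big(\hat\theta(\mathbf{x}) - \theta\big).$$
Proof (If): if $V = J(\theta)(\hat\theta - \theta)$, then
$$J(\theta) = E[V^2] = J^2(\theta)\,E\big[(\hat\theta - \theta)^2\big] = J^2(\theta)\operatorname{Var}(\hat\theta),$$
meaning $\operatorname{Var}(\hat\theta) = 1/J(\theta)$.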
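
For i.i.d. $N(\theta, \sigma^2)$ data the efficiency condition can be verified directly: the joint score $\sum_i (x_i - \theta)/\sigma^2$ equals $J_n(\theta)(\bar x - \theta)$ with $J_n(\theta) = n/\sigma^2$, for every $\theta$. A sketch on one hypothetical sample of size 12:

```python
import random

def joint_score(xs, theta, sigma):
    """Score of i.i.d. N(theta, sigma^2) data: sum of (x_i - theta) / sigma^2."""
    return sum((x - theta) / sigma ** 2 for x in xs)

rng = random.Random(9)
sigma, n = 1.5, 12  # hypothetical values
xs = [rng.gauss(0.7, sigma) for _ in range(n)]
x_bar = sum(xs) / n
J_n = n / sigma ** 2
for theta in (-1.0, 0.0, 0.7, 2.0):
    # The efficiency condition V = J_n(theta) * (theta_hat - theta) holds for theta_hat = x_bar.
    print(theta, joint_score(xs, theta, sigma), J_n * (x_bar - theta))
```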

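Efficiency
Only if: recall the cross covariance between $V$ and $\hat\theta$: $\operatorname{Cov}(V, \hat\theta) = E\big[V(\hat\theta - \theta)\big] = 1$. The Cauchy-Schwarz inequality for random variables says
$$\big[\operatorname{Cov}(V, \hat\theta)\big]^2 \le \operatorname{Var}(V)\operatorname{Var}(\hat\theta),$$
with equality iff $V - E[V] = c(\theta)\big(\hat\theta - E[\hat\theta]\big)$ for some constant $c(\theta)$, i.e., $V = c(\theta)(\hat\theta - \theta)$. Multiplying by $\hat\theta - \theta$ and taking expectations gives $1 = c(\theta)\operatorname{Var}(\hat\theta) = c(\theta)/J(\theta)$, so $c(\theta) = J(\theta)$.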
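
Since $\operatorname{Cov}(V, \hat\theta) = 1$, the squared correlation between the score and an unbiased estimator equals $1/(J(\theta)\operatorname{Var}(\hat\theta))$, so equality in Cauchy-Schwarz (correlation $\pm 1$) is exactly efficiency. A Monte Carlo sketch, assuming i.i.d. $N(0, 1)$ samples of a hypothetical size $n = 25$, contrasts the sample mean with the sample median:

```python
import math
import random
import statistics

def correlation(a, b):
    """Pearson correlation; Cauchy-Schwarz gives |corr| <= 1, with equality iff b is affine in a."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

rng = random.Random(10)
n, theta, sigma = 25, 0.0, 1.0  # hypothetical values
scores, means, medians = [], [], []
for _ in range(5000):
    xs = [rng.gauss(theta, sigma) for _ in range(n)]
    scores.append(sum(x - theta for x in xs) / sigma ** 2)
    means.append(statistics.fmean(xs))
    medians.append(statistics.median(xs))
print(correlation(scores, means))    # 1 up to rounding: x_bar is efficient
print(correlation(scores, medians))  # strictly below 1: the median is not
```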

Cramer-Rao Inequality and ML - Cntd.
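Theorem: suppose there exists an efficient estimator $\hat\theta$ for all $\theta$. Then the ML estimator is $\hat\theta$.
Proof: by assumption, $\operatorname{Var}(\hat\theta) = 1/J(\theta)$ for all $\theta$. By the previous claim,
$$V = J(\theta)\big(\hat\theta - \theta\big) \quad\text{or}\quad \frac{\partial}{\partial\theta}\ln p(\mathbf{x};\theta) = J(\theta)\big(\hat\theta(\mathbf{x}) - \theta\big) \quad\text{for all } \theta.$$
This holds at $\theta = \hat\theta_{ML}$, and since this is a maximum point the left side is zero, so $J(\hat\theta_{ML})\big(\hat\theta(\mathbf{x}) - \hat\theta_{ML}\big) = 0$, i.e., $\hat\theta_{ML} = \hat\theta(\mathbf{x})$ (given $J > 0$).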
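
For i.i.d. $N(\theta, \sigma^2)$ data, $\bar x$ is efficient, so by the theorem the ML estimate must coincide with $\bar x$. A sketch that maximizes the log-likelihood numerically with a ternary search (our choice of optimizer, valid because this log-likelihood is concave in $\theta$) on a hypothetical sample of 30 points:

```python
import math
import random

def log_likelihood(xs, theta, sigma):
    """ln p(x_1, ..., x_n; theta) for i.i.d. N(theta, sigma^2) data."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2) - (x - theta) ** 2 / (2 * sigma ** 2) for x in xs)

def maximize_concave(f, lo, hi, iters=200):
    """Ternary search for the maximizer of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1  # the maximizer lies in [m1, hi]
        else:
            hi = m2  # the maximizer lies in [lo, m2]
    return (lo + hi) / 2

rng = random.Random(11)
sigma = 2.0  # hypothetical known standard deviation
xs = [rng.gauss(1.3, sigma) for _ in range(30)]
theta_ml = maximize_concave(lambda t: log_likelihood(xs, t, sigma), min(xs), max(xs))
print(theta_ml, sum(xs) / len(xs))  # the ML estimate coincides with the efficient estimator x_bar
```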