PROBABILITY AND STATISTICS FOR ENGINEERING
Hossein Sameti
Department of Computer Engineering, Sharif University of Technology
Principles of Parameter Estimation

The Estimation Problem  We use the various concepts introduced and studied in earlier lectures to solve practical problems of interest.  Consider the problem of estimating an unknown parameter of interest from a few of its noisy observations. -the daily temperature in a city -the depth of a river at a particular spot  Observations (measurement) are made on data that contain the desired nonrandom parameter  and undesired noise.

The Estimation Problem  For example  or, the i th observation can be represented as   : the unknown nonrandom desired parameter  : random variables that may be dependent or independent from observation to observation.  The Estimation Problem: -Given n observations obtain the “best” estimator for the unknown parameter  in terms of these observations.

Estimators  Let us denote by the estimator for .  Obviously is a function of only the observations.  “Best estimator” in what sense?  Ideal solution: the estimate coincides with the unknown .  Almost always any estimate will result in an error given by  One strategy would be to select the estimator so as to minimize some function of this error -mean square error (MMSE), -absolute value of the error - etc.

A More Fundamental Approach: Principle of Maximum Likelihood
- Underlying assumption: the available data X_1, X_2, ..., X_n has something to do with the unknown parameter θ.
- We assume that the joint p.d.f. of the observations, f(x_1, x_2, ..., x_n; θ), depends on θ.
- This method
  - assumes that the given sample data set is representative of the population, and
  - chooses the value of θ that most likely caused the observed data to occur.

Principle of Maximum Likelihood  In other words, given the observations, is a function of  alone  The value of  that maximizes the above p.d.f is the most likely value for , and it is chosen as the ML estimate for .

- Given the observations, the joint p.d.f. f(x_1, x_2, ..., x_n; θ), viewed as a function of θ, represents the likelihood function.
- The ML estimate can be determined either from
  - the likelihood equation, θ̂_ML = arg sup_θ f(x_1, x_2, ..., x_n; θ),
  - or using the log-likelihood function L(θ) = log f(x_1, x_2, ..., x_n; θ).
- If L(θ) is differentiable and a supremum θ̂_ML exists, then θ̂_ML must satisfy the equation
  ∂ log f(x_1, x_2, ..., x_n; θ) / ∂θ = 0, evaluated at θ = θ̂_ML.
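Numerically, the log-likelihood can also be maximized directly. Here is a minimal sketch (Python with NumPy/SciPy); the Gaussian observation model and the parameter values are the same illustrative assumptions as in the earlier sketch, not values from the slides:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
theta_true, sigma, n = 4.2, 1.5, 50          # illustrative values, not from the slides
x = theta_true + rng.normal(0.0, sigma, n)   # observations x_i = theta + n_i

def neg_log_likelihood(theta):
    # Negative Gaussian log-likelihood, up to an additive constant independent of theta
    return np.sum((x - theta) ** 2) / (2.0 * sigma ** 2)

res = minimize_scalar(neg_log_likelihood, bounds=(x.min(), x.max()), method="bounded")
print(res.x, x.mean())   # the numerical maximizer agrees with the sample mean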

Example
- Let X_i = θ + n_i, i = 1, 2, ..., n, represent n observations, where θ is the unknown parameter of interest and the n_i are zero-mean independent normal r.vs with common variance σ².
- Determine the ML estimate for θ.

Solution
- Since the n_i are independent r.vs and θ is an unknown constant, the X_i are independent normal random variables.
- Thus the likelihood function takes the form f(x_1, ..., x_n; θ) = ∏_{i=1}^n f_{X_i}(x_i; θ).

Example - continued
- Each X_i is Gaussian with mean θ and variance σ² (why?).
- Thus f_{X_i}(x_i; θ) = (1/√(2πσ²)) exp(−(x_i − θ)² / (2σ²)).
- Therefore the likelihood function is
  f(x_1, ..., x_n; θ) = (2πσ²)^(−n/2) exp(−∑_{i=1}^n (x_i − θ)² / (2σ²)).
- It is easier to work with the log-likelihood function L(θ) = ln f(x_1, ..., x_n; θ) in this case.

Example - continued
- We obtain
  L(θ) = ln f(x_1, ..., x_n; θ) = −(n/2) ln(2πσ²) − ∑_{i=1}^n (x_i − θ)² / (2σ²),
- and taking the derivative with respect to θ, we get
  ∂L/∂θ = ∑_{i=1}^n (x_i − θ)/σ² = 0,
- or
  θ̂_ML(X_1, ..., X_n) = (1/n) ∑_{i=1}^n X_i.
- This linear estimator (the sample mean) represents the ML estimate for θ.
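The derivative step can be double-checked symbolically. A short sketch (Python with SymPy; using only four symbolic observations to keep the output small, which is an illustrative choice) differentiates the Gaussian log-likelihood and solves the likelihood equation, recovering the sample mean:

```python
import sympy as sp

n = 4                                     # small symbolic example; any n works the same way
theta, sigma = sp.symbols("theta sigma", positive=True)
xs = sp.symbols("x1:5")                   # symbolic observations x1, x2, x3, x4

# Log-likelihood of n independent N(theta, sigma^2) observations
L = -sp.Rational(n, 2) * sp.log(2 * sp.pi * sigma**2) \
    - sum((xi - theta) ** 2 for xi in xs) / (2 * sigma**2)

# Solve dL/dtheta = 0 for theta: the solution is the sample mean (x1 + ... + x4)/4
theta_ml = sp.solve(sp.Eq(sp.diff(L, theta), 0), theta)[0]
print(sp.simplify(theta_ml))
```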

Unbiased Estimators  Notice that the estimator is a r.v. Taking its expected value, we get  i.e., the expected value of the estimator does not differ from the desired parameter, and hence there is no bias between the two.  Such estimators are known as unbiased estimators.  represents an unbiased estimator for .

Consistent Estimators  Moreover the variance of the estimator is given by  The latter terms are zeros since and are independent r.vs.  So,  And:  another desired property. We say estimators that satisfy this limit are consistent estimators.

Example
- Let X_1, X_2, ..., X_n be i.i.d. uniform random variables in (0, θ) with common p.d.f.
  f_X(x; θ) = 1/θ, 0 < x < θ,
  where θ is an unknown parameter. Find the ML estimate for θ.

Solution
- The likelihood function in this case is given by
  f(x_1, ..., x_n; θ) = 1/θ^n for θ ≥ max(x_1, ..., x_n), and 0 otherwise.
- The likelihood function here is maximized by the minimum admissible value of θ.

Example - continued
- Since θ ≥ max(X_1, ..., X_n), we get θ̂_ML = max(X_1, X_2, ..., X_n) to be the ML estimate for θ,
- a nonlinear function of the observations.
- Is this an unbiased estimate for θ? To answer that, we need to evaluate its mean.
- It is easier to determine its p.d.f. and proceed directly.
- Let Z = max(X_1, X_2, ..., X_n), where the X_i are i.i.d. uniform in (0, θ).

Example - continued
- Then F_Z(z) = P(max(X_1, ..., X_n) ≤ z) = [F_X(z)]^n = (z/θ)^n, 0 < z < θ,
- so that f_Z(z) = n z^(n−1)/θ^n, 0 < z < θ.
- Using the above, we get
  E[θ̂_ML] = E[Z] = ∫_0^θ z f_Z(z) dz = nθ/(n+1).

Example - continued
- In this case E[θ̂_ML] = nθ/(n+1) ≠ θ, so the ML estimator is not an unbiased estimator for θ.
- However, note that E[θ̂_ML] → θ as n → ∞, i.e., the ML estimator is an asymptotically unbiased estimator.
- Also, E[Z²] = ∫_0^θ z² f_Z(z) dz = nθ²/(n+2),
- so that Var(θ̂_ML) = E[Z²] − (E[Z])² = nθ²/[(n+1)²(n+2)] → 0 as n → ∞, implying that this estimator is a consistent estimator.
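A quick Monte Carlo check of these bias and variance formulas (Python/NumPy; θ = 3 and the trial count are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
theta_true, trials = 3.0, 20000     # illustrative values, not from the slides

for n in (5, 50, 500):
    x = rng.uniform(0.0, theta_true, size=(trials, n))
    z = x.max(axis=1)               # ML estimate: maximum of the observations
    print(n,
          z.mean(), n * theta_true / (n + 1),                       # empirical vs. theoretical mean
          z.var(), n * theta_true**2 / ((n + 1)**2 * (n + 2)))      # empirical vs. theoretical variance
```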

Example
- Let X_1, X_2, ..., X_n be i.i.d. Gamma random variables with unknown parameters α and β.
- Determine the ML estimators for α and β.

Solution
- Here the common p.d.f. is f_X(x; α, β) = [β^α / Γ(α)] x^(α−1) e^(−βx), x > 0, and
  f(x_1, ..., x_n; α, β) = [β^(nα) / Γ(α)^n] (∏_{i=1}^n x_i)^(α−1) e^(−β ∑_{i=1}^n x_i).
- This gives the log-likelihood function to be
  L(α, β) = nα ln β − n ln Γ(α) + (α − 1) ∑_{i=1}^n ln x_i − β ∑_{i=1}^n x_i.

Example - continued
- Differentiating L with respect to α and β, we get
  ∂L/∂α = n ln β − n ψ(α) + ∑_{i=1}^n ln x_i = 0, where ψ(α) = Γ'(α)/Γ(α),
  ∂L/∂β = nα/β − ∑_{i=1}^n x_i = 0.
- Thus β̂_ML = n α̂_ML / ∑_{i=1}^n x_i.
- Substituting back into the first equation,
  n ln(n α̂_ML / ∑ x_i) − n ψ(α̂_ML) + ∑_{i=1}^n ln x_i = 0.
- Notice that this is highly nonlinear in α̂_ML, so it must be solved numerically.
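Since the equation for α̂ has no closed form, it is typically handed to a root finder. A minimal sketch (Python with NumPy/SciPy; the data-generating values α = 2.5, β = 1.2 and the sample size are arbitrary illustrative choices, and the shape/rate parameterization above is assumed):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import digamma

rng = np.random.default_rng(3)
alpha_true, beta_true, n = 2.5, 1.2, 2000           # illustrative values, not from the slides
x = rng.gamma(shape=alpha_true, scale=1.0 / beta_true, size=n)

s, ls = x.sum(), np.log(x).sum()

def score_alpha(a):
    # dL/d(alpha) after substituting beta_hat = n * a / sum(x_i)
    return n * np.log(n * a / s) - n * digamma(a) + ls

alpha_hat = brentq(score_alpha, 1e-6, 100.0)        # bracket chosen generously; root is unique
beta_hat = n * alpha_hat / s
print(alpha_hat, beta_hat)
```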

Conclusion
- In general, the likelihood equation
  - can have more than one solution, or no solutions at all;
- and the (log-)likelihood function
  - may not even be differentiable,
  - can be extremely complicated to maximize explicitly.

Best Unbiased Estimator
- We have seen that θ̂ = (1/n) ∑ X_i represents an unbiased estimator for θ with variance σ²/n.
- It is possible that, for a given n, there may be other unbiased estimators for this problem with even lower variance.
- If such is indeed the case, those estimators will naturally be preferable to the previous one.
- Is it possible to determine the lowest possible value for the variance of any unbiased estimator?
- A theorem by Cramer and Rao gives a complete answer to this problem.

Cramer-Rao Bound
- The variance of any unbiased estimator θ̂ for θ based on the observations X_1, ..., X_n must satisfy the lower bound
  Var(θ̂) ≥ 1 / E[(∂ ln f(x_1, ..., x_n; θ) / ∂θ)²] = −1 / E[∂² ln f(x_1, ..., x_n; θ) / ∂θ²].
- The right side of the above equation acts as a lower bound on the variance of all unbiased estimators for θ, provided the joint p.d.f. satisfies certain regularity restrictions (see (8-79)-(8-81), Text).

Efficient Estimators
- Any unbiased estimator whose variance coincides with the Cramer-Rao bound must be the best possible one.
- Such estimators are known as efficient estimators.
- Let us examine whether θ̂ = (1/n) ∑ X_i is efficient for the Gaussian observation model X_i = θ + n_i.
- Here ln f(x_1, ..., x_n; θ) = −(n/2) ln(2πσ²) − ∑ (x_i − θ)²/(2σ²), so
  ∂ ln f/∂θ = ∑_{i=1}^n (x_i − θ)/σ²  and  E[(∂ ln f/∂θ)²] = n/σ².
- So the Cramer-Rao lower bound is σ²/n.
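As a sanity check on this calculation, the following sketch (Python/NumPy; the same illustrative parameter values as in the earlier sketches) estimates E[(∂ ln f/∂θ)²] by Monte Carlo and confirms that its reciprocal equals σ²/n, the variance of the sample-mean estimator:

```python
import numpy as np

rng = np.random.default_rng(4)
theta_true, sigma, n, trials = 4.2, 1.5, 50, 20000   # illustrative values, not from the slides

# Score of the Gaussian model: d/d(theta) ln f(x_1,...,x_n; theta) = sum(x_i - theta) / sigma^2
x = theta_true + rng.normal(0.0, sigma, size=(trials, n))
score = (x - theta_true).sum(axis=1) / sigma**2

fisher_info = (score ** 2).mean()        # Monte Carlo estimate of E[(d ln f / d theta)^2]
print(fisher_info, n / sigma**2)         # matches n / sigma^2
print(1.0 / fisher_info, sigma**2 / n)   # Cramer-Rao bound equals the sample-mean variance
```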

Rao-Blackwell Theorem
- As we obtained before, the variance of this ML estimator (σ²/n) is the same as the specified bound, so the sample mean is efficient.
- If there are no unbiased estimators that are efficient, the best estimator will be an unbiased estimator with the lowest possible variance.
- How does one find such an unbiased estimator? The Rao-Blackwell theorem gives a complete answer to this problem.
- The Cramer-Rao bound can be extended to the multiparameter case as well.

Estimating Parameters with an A-priori p.d.f.
- So far, we have discussed unknown nonrandom parameters.
- What if the parameter of interest θ is a r.v. with a-priori p.d.f. f_θ(θ)?
- How does one obtain a good estimate for θ based on the observations X_1, X_2, ..., X_n?
- One technique is to use the observations to compute its a-posteriori p.d.f. f(θ | x_1, ..., x_n).
- Of course, we can use Bayes' theorem to obtain this a-posteriori p.d.f.:
  f(θ | x_1, ..., x_n) = f(x_1, ..., x_n | θ) f_θ(θ) / f(x_1, ..., x_n).
- Notice that this is only a function of θ, since x_1, ..., x_n represent the given observations.

MAP Estimator
- Once again, we can look for the most probable value of θ suggested by the above a-posteriori p.d.f.
- Naturally, the most likely value for θ is the one corresponding to the maximum of the a-posteriori p.d.f.:
  θ̂_MAP = arg max_θ f(θ | x_1, ..., x_n)
  (the MAP estimator for θ).
- It is possible to use other optimality criteria as well.
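To make the MAP idea concrete, here is a minimal sketch (Python/NumPy). The Gaussian prior N(mu0, tau^2) and all numeric values are assumptions made for illustration, not from the slides. It evaluates the unnormalized log-posterior for the model X_i = θ + n_i on a grid, picks its maximizer, and compares it with the known closed-form answer for this conjugate Gaussian case:

```python
import numpy as np

rng = np.random.default_rng(5)
sigma, n = 1.5, 20                    # noise std and sample size (illustrative)
mu0, tau = 0.0, 2.0                   # assumed Gaussian prior on theta: N(mu0, tau^2)
theta_true = rng.normal(mu0, tau)     # the parameter is now itself a random variable
x = theta_true + rng.normal(0.0, sigma, n)

# Unnormalized log-posterior: log-likelihood + log-prior (constants dropped)
grid = np.linspace(-10.0, 10.0, 100001)
log_post = (-np.sum((x[:, None] - grid) ** 2, axis=0) / (2 * sigma**2)
            - (grid - mu0) ** 2 / (2 * tau**2))

theta_map_grid = grid[np.argmax(log_post)]

# Closed-form MAP estimate (= posterior mean) for this Gaussian-Gaussian model
theta_map_exact = (tau**2 * x.sum() + sigma**2 * mu0) / (n * tau**2 + sigma**2)
print(theta_map_grid, theta_map_exact)
```

Note how the prior pulls the estimate toward mu0 when the data are few or noisy, and how the MAP estimate approaches the plain ML estimate (the sample mean) as n grows or as the prior variance tau^2 becomes large.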