6 Point Estimation
Copyright © Cengage Learning. All rights reserved.

6.2 Methods of Point Estimation
Copyright © Cengage Learning. All rights reserved.

Methods of Point Estimation We now introduce two “constructive” methods for obtaining point estimators: the method of moments and the method of maximum likelihood. By constructive we mean that the general definition of each type of estimator suggests explicitly how to obtain the estimator in any specific problem.

Methods of Point Estimation Although maximum likelihood estimators are generally preferable to moment estimators because of certain efficiency properties, they often require significantly more computation than do moment estimators. It is sometimes the case that these methods yield unbiased estimators.

The Method of Moments

The Method of Moments The basic idea of this method is to equate certain sample characteristics, such as the mean, to the corresponding population expected values. Then solving these equations for unknown parameter values yields the estimators.

The Method of Moments
Definition
Let X₁, . . . , Xₙ be a random sample from a pmf or pdf f(x). For k = 1, 2, 3, . . . , the kth population moment, or kth moment of the distribution f(x), is E(Xᵏ). The kth sample moment is (1/n) ΣXᵢᵏ.
Thus the first population moment is E(X) = μ, and the first sample moment is ΣXᵢ/n = X̄. The second population and sample moments are E(X²) and ΣXᵢ²/n, respectively. The population moments will be functions of any unknown parameters θ₁, θ₂, . . . .

The Method of Moments
Definition
Let X₁, . . . , Xₙ be a random sample from a distribution with pmf or pdf f(x; θ₁, . . . , θₘ), where θ₁, . . . , θₘ are parameters whose values are unknown. Then the moment estimators θ̂₁, . . . , θ̂ₘ are obtained by equating the first m sample moments to the corresponding first m population moments and solving for θ₁, . . . , θₘ.

The Method of Moments
If, for example, m = 2, E(X) and E(X²) will be functions of θ₁ and θ₂. Setting E(X) = (1/n) ΣXᵢ (= X̄) and E(X²) = (1/n) ΣXᵢ² gives two equations in θ₁ and θ₂. The solution then defines the estimators, as sketched numerically below.
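
As an illustration of the two-equation case (not from the text), here is a minimal Python sketch for a gamma(α, β) sample, using the facts E(X) = αβ and Var(X) = αβ²; the data values are made up for the example.

```python
import numpy as np

# Hypothetical sample (made-up positive observations)
x = np.array([152, 115, 109, 94, 88, 137, 152, 77, 160, 165])

m1 = x.mean()          # first sample moment:  (1/n) * sum(x_i)
m2 = np.mean(x**2)     # second sample moment: (1/n) * sum(x_i**2)

# Equate E(X) = alpha*beta to m1 and Var(X) = alpha*beta**2 to m2 - m1**2,
# then solve the two equations for alpha and beta.
beta_hat = (m2 - m1**2) / m1
alpha_hat = m1 / beta_hat

print(alpha_hat, beta_hat)
```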

Example 6.12
Let X₁, X₂, . . . , Xₙ represent a random sample of service times of n customers at a certain facility, where the underlying distribution is assumed exponential with parameter λ. Since there is only one parameter to be estimated, the estimator is obtained by equating E(X) to X̄. Since E(X) = 1/λ for an exponential distribution, this gives 1/λ = X̄, or λ = 1/X̄. The moment estimator of λ is then λ̂ = 1/X̄.
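
A minimal sketch of this estimator on made-up service times (the data and variable names are illustrative, not from the text):

```python
import numpy as np

# Hypothetical observed service times (minutes)
times = np.array([2.1, 0.4, 5.3, 1.8, 0.9, 3.2, 2.7, 1.1])

xbar = times.mean()
lambda_mom = 1.0 / xbar   # moment estimator: equate E(X) = 1/lambda to the sample mean

print(lambda_mom)
```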

Maximum Likelihood Estimation

Maximum Likelihood Estimation The method of maximum likelihood was first introduced by R. A. Fisher, a geneticist and statistician, in the 1920s. Most statisticians recommend this method, at least when the sample size is large, since the resulting estimators have certain desirable efficiency properties.

Example 6.15 The best protection against hacking into an online account is to use a password that has at least 8 characters consisting of upper- and lowercase letters, numerals, and special characters. [Note: The Jan. 2012 issue of Consumer Reports reported that only 25% of individuals surveyed used a strong password.] Suppose that 10 individuals who have email accounts with a certain provider are selected, and it is found that the first, third, and tenth individuals have such strong protection, whereas the others do not.

Example 6.15 cont’d
Let p = P(strong protection), i.e., p is the proportion of all such account holders having strong protection. Define (Bernoulli) random variables X₁, X₂, . . . , X₁₀ by Xᵢ = 1 if the ith individual has strong protection and Xᵢ = 0 otherwise. Then for the obtained sample, x₁ = x₃ = x₁₀ = 1 and the other seven xᵢ’s are all zero. The probability mass function of any particular Xᵢ is p^(xᵢ)(1 − p)^(1 − xᵢ), which becomes p if xᵢ = 1 and 1 − p when xᵢ = 0. Now suppose that the conditions of the various passwords are independent of one another.

Example 6.15 cont’d
This implies that the Xᵢ’s are independent, so their joint probability mass function is the product of the individual pmf’s. Thus the joint pmf evaluated at the observed xᵢ’s is
f(x₁, . . . , x₁₀; p) = p(1 − p)p ⋯ p(1 − p) = p³(1 − p)⁷ (6.4)
Suppose that p = .25. Then the probability of observing the sample that we actually obtained is (.25)³(.75)⁷ = .002086. If instead p = .50, then this probability is (.50)³(.50)⁷ = .000977. For what value of p is the obtained sample most likely to have occurred? That is, for what value of p is the joint pmf (6.4) as large as it can be? What value of p maximizes (6.4)?
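
A quick sketch (illustrative, not from the text) that evaluates the likelihood (6.4) on a grid of p values and picks out the maximizer numerically:

```python
import numpy as np

def likelihood(p):
    # joint pmf (6.4) for the observed sample: 3 strong passwords, 7 weak
    return p**3 * (1 - p)**7

print(likelihood(0.25))   # about .002086
print(likelihood(0.50))   # about .000977

grid = np.linspace(0.001, 0.999, 999)
print(grid[np.argmax(likelihood(grid))])   # approximately 0.30
```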

Example 6.15 cont’d
Figure 6.6(a) shows a graph of the likelihood (6.4) as a function of p. It appears that the graph reaches its peak above p = .3, the proportion of individuals in the sample with strong password protection.
[Figure 6.6(a): Graph of the likelihood (joint pmf) (6.4) from Example 6.15]

Example 6.15 cont’d
Figure 6.6(b) shows a graph of the natural logarithm of (6.4); since ln[g(u)] is a strictly increasing function of g(u), finding u to maximize the function g(u) is the same as finding u to maximize ln[g(u)].
[Figure 6.6(b): Graph of the natural logarithm of the likelihood]

Example 6.15 cont’d
We can verify our visual impression by using calculus to find the value of p that maximizes (6.4). Working with the natural log of the joint pmf is often easier than working with the joint pmf itself, since the joint pmf is typically a product, so its logarithm will be a sum. Here
ln[f(x₁, . . . , x₁₀; p)] = ln[p³(1 − p)⁷] = 3 ln(p) + 7 ln(1 − p) (6.5)

Example 6.15 cont’d
Thus
(d/dp) ln[f(x₁, . . . , x₁₀; p)] = (d/dp)[3 ln(p) + 7 ln(1 − p)] = 3/p − 7/(1 − p)
[the (−1) in the second term comes from applying the chain rule to ln(1 − p)].

Example 6.15 cont’d
Equating this derivative to 0 and solving for p gives 3(1 − p) = 7p, from which 3 = 10p and so p = 3/10 = .30, as conjectured. That is, our point estimate is p̂ = .30. It is called the maximum likelihood estimate because it is the parameter value that maximizes the likelihood (joint pmf) of the observed sample. In general, the second derivative should be examined to make sure a maximum has been obtained, but here this is obvious from Figure 6.6(a).
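
A short numerical check (illustrative; it uses SciPy, which the text does not) that maximizes the log-likelihood (6.5) and recovers p̂ = .30:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_lik(p):
    # negative of (6.5): 3*ln(p) + 7*ln(1 - p)
    return -(3 * np.log(p) + 7 * np.log(1 - p))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x)   # approximately 0.30
```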

Example 6.15 cont’d
Suppose that rather than being told the condition of every password, we had only been informed that three of the ten were strong. Then we would have the observed value of a binomial random variable X = the number with strong passwords. The pmf of X is
b(x; 10, p) = (10 choose x) p^x (1 − p)^(10 − x)
For x = 3, this becomes
b(3; 10, p) = (10 choose 3) p³(1 − p)⁷
The binomial coefficient is irrelevant to the maximization, so again p̂ = .30.

Maximum Likelihood Estimation
Definition
Let X₁, X₂, . . . , Xₙ have joint pmf or pdf f(x₁, . . . , xₙ; θ₁, . . . , θₘ), where the parameters θ₁, . . . , θₘ have unknown values. When x₁, . . . , xₙ are the observed sample values and f is regarded as a function of θ₁, . . . , θₘ, it is called the likelihood function. The maximum likelihood estimates (mle’s) θ̂₁, . . . , θ̂ₘ are those values of the θᵢ’s that maximize the likelihood function. When the Xᵢ’s are substituted in place of the xᵢ’s, the maximum likelihood estimators result.

Maximum Likelihood Estimation The likelihood function tells us how likely the observed sample is as a function of the possible parameter values. Maximizing the likelihood gives the parameter values for which the observed sample is most likely to have been generated—that is, the parameter values that “agree most closely” with the observed data.

Example 6.16
Suppose X₁, X₂, . . . , Xₙ is a random sample from an exponential distribution with parameter λ. Because of independence, the likelihood function is a product of the individual pdf’s:
f(x₁, . . . , xₙ; λ) = (λe^(−λx₁)) ⋯ (λe^(−λxₙ)) = λⁿ e^(−λΣxᵢ)
The natural logarithm of the likelihood function is
ln[f(x₁, . . . , xₙ; λ)] = n ln(λ) − λΣxᵢ

Example 6.16 cont’d
Equating (d/dλ)[ln(likelihood)] to zero results in n/λ − Σxᵢ = 0, or λ = n/Σxᵢ = 1/x̄. Thus the mle is λ̂ = 1/X̄; it is identical to the method of moments estimator [but it is not an unbiased estimator, since E(1/X̄) ≠ 1/E(X̄)].
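
A small sketch (illustrative, made-up data) showing that the closed-form mle 1/x̄ matches a direct numerical maximization of the exponential log-likelihood:

```python
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([2.1, 0.4, 5.3, 1.8, 0.9, 3.2, 2.7, 1.1])   # made-up sample
n = len(x)

def neg_log_lik(lam):
    # negative of n*ln(lambda) - lambda*sum(x_i)
    return -(n * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 10.0), method="bounded")
print(res.x, 1.0 / x.mean())   # the two values agree up to numerical tolerance
```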

Example 6.17
Let X₁, . . . , Xₙ be a random sample from a normal distribution. The likelihood function is
f(x₁, . . . , xₙ; μ, σ²) = Π (2πσ²)^(−1/2) e^(−(xᵢ − μ)²/(2σ²)) = (2πσ²)^(−n/2) exp[−Σ(xᵢ − μ)²/(2σ²)]
so
ln[f(x₁, . . . , xₙ; μ, σ²)] = −(n/2) ln(2πσ²) − (1/(2σ²)) Σ(xᵢ − μ)²

Example 6.17 cont’d
To find the maximizing values of μ and σ², we must take the partial derivatives of ln(f) with respect to μ and σ², equate them to zero, and solve the resulting two equations. Omitting the details, the resulting mle’s are
μ̂ = X̄  and  σ̂² = Σ(Xᵢ − X̄)²/n
The mle of σ² is not the unbiased estimator, so two different principles of estimation (unbiasedness and maximum likelihood) yield two different estimators.
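
A brief sketch (made-up data) contrasting the mle of σ², which divides by n, with the unbiased sample variance, which divides by n − 1:

```python
import numpy as np

x = np.array([4.2, 5.1, 3.8, 6.0, 4.9, 5.5])   # hypothetical normal sample

mu_mle = x.mean()                       # mle of mu is the sample mean
sigma2_mle = np.mean((x - mu_mle)**2)   # mle of sigma^2: divide by n
s2_unbiased = x.var(ddof=1)             # unbiased estimator: divide by n - 1

print(sigma2_mle, s2_unbiased)          # the mle is slightly smaller
```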

Estimating Functions of Parameters

Estimating Functions of Parameters
Once the mle for a parameter θ is available, the mle for any function of θ, such as 1/θ or √θ, is easily obtained.
Proposition (The Invariance Principle)
Let θ̂₁, θ̂₂, . . . , θ̂ₘ be the mle’s of the parameters θ₁, θ₂, . . . , θₘ. Then the mle of any function h(θ₁, θ₂, . . . , θₘ) of these parameters is the function h(θ̂₁, θ̂₂, . . . , θ̂ₘ) of the mle’s.

Example 6.20 (Example 6.17 continued)
In the normal case, the mle’s of μ and σ² are μ̂ = X̄ and σ̂² = Σ(Xᵢ − X̄)²/n. To obtain the mle of the function h(μ, σ²) = √σ² = σ, substitute the mle’s into the function:
σ̂ = √(σ̂²) = [Σ(Xᵢ − X̄)²/n]^(1/2)
The mle of σ is not the sample standard deviation S, though they are close unless n is quite small.
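
A short sketch of the invariance principle on the same kind of made-up sample: the mle of σ is just the square root of the mle of σ², and it differs slightly from S.

```python
import numpy as np

x = np.array([4.2, 5.1, 3.8, 6.0, 4.9, 5.5])   # hypothetical sample

sigma2_mle = np.mean((x - x.mean())**2)   # mle of sigma^2
sigma_mle = np.sqrt(sigma2_mle)           # invariance: mle of sigma = sqrt(mle of sigma^2)
s = x.std(ddof=1)                         # sample standard deviation S

print(sigma_mle, s)                       # close, but not equal
```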

Large Sample Behavior of the MLE

Large Sample Behavior of the MLE
Although the principle of maximum likelihood estimation has considerable intuitive appeal, the following proposition provides additional rationale for the use of mle’s.
Proposition
Under very general conditions on the joint distribution of the sample, when the sample size n is large, the maximum likelihood estimator of any parameter θ is approximately unbiased [E(θ̂) ≈ θ] and has variance that is either as small as or nearly as small as can be achieved by any estimator. Stated another way, the mle θ̂ is approximately the minimum variance unbiased estimator (MVUE) of θ.

Large Sample Behavior of the MLE Because of this result and the fact that calculus-based techniques can usually be used to derive the mle’s (though often numerical methods, such as Newton’s method, are necessary), maximum likelihood estimation is the most widely used estimation technique among statisticians. Many of the estimators used in the remainder of the book are mle’s. Obtaining an mle, however, does require that the underlying distribution be specified.
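
A simulation sketch (illustrative, not from the text) of this large-sample behavior: for exponential data the mle λ̂ = 1/X̄ is biased in small samples, but its average over many replications moves close to the true λ as n grows.

```python
import numpy as np

rng = np.random.default_rng(0)
true_lambda = 2.0

for n in (5, 50, 500):
    # simulate many samples of size n and compute the mle 1/xbar for each
    samples = rng.exponential(scale=1 / true_lambda, size=(10_000, n))
    mles = 1.0 / samples.mean(axis=1)
    print(n, mles.mean())   # approaches the true value 2.0 as n increases
```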