Statistical Inference and Regression Analysis: Stat-GB.3302

Presentation transcript:

Statistical Inference and Regression Analysis: Stat-GB.3302.30, Stat-UB.0015.01
Professor William Greene, Stern School of Business, IOMS Department, Department of Economics

Part 3 – Estimation Theory

Estimation
Nonparametric population features:
- Mean - income
- Correlation - disease incidence and smoking
- Ratio - income per household member
- Proportion - proportion of ASCAP music played that is produced by Dave Matthews
- Distribution - histogram and density estimation
Parameters:
- Fitting distributions - mean and variance of lognormal distribution of income
- Parametric models of populations - relationship of loan rates to attributes of minorities and others in Bank of America settlement on mortgage bias

Measurements as Observations
[Diagram: Population -> Measurement, guided by Theory; Characteristics, Behavior Patterns, Choices.]
The theory argues that there are meaningful quantities to be statistically analyzed.

Application - Health and Income
German Health Care Usage Data: 7,293 households, observed 1984-1995. Data downloaded from the Journal of Applied Econometrics Archive. Some variables in the file are:
DOCVIS = number of visits to the doctor in the observation period
HOSPVIS = number of visits to a hospital in the observation period
HHNINC = household nominal monthly net income in German marks / 10000 (4 observations with income = 0 were dropped)
HHKIDS = 1 if children under age 16 in the household; 0 otherwise
EDUC = years of schooling
AGE = age in years
PUBLIC = decision to buy public health insurance
HSAT = self-assessed health status (0, 1, ..., 10)

Observed Data

Inference about Population
[Diagram: Population -> Measurement; Characteristics, Behavior Patterns, Choices.]

Classical Inference
[Diagram: Population -> Measurement -> Sample; Characteristics, Behavior Patterns, Choices.]
The population is all 40 million German households (or all households in the entire world). The sample is the 7,293 German households observed in 1984-1995.
Imprecise inference about the entire population - sampling theory and asymptotics.

Bayesian Inference
[Diagram: Population -> Measurement -> Sample; Characteristics, Behavior Patterns, Choices.]
Sharp, 'exact' inference about only the sample - the 'posterior' density is posterior to the data.

Estimation of Population Features
Estimators and estimates:
- Estimator = strategy for use of the data
- Estimate = outcome of that strategy
Sampling distribution:
- Qualities of the estimator
- Uncertainty due to random sampling

Estimation
Point Estimator: provides a single estimate of the feature in question, based on prior and sample information.
Interval Estimator: provides a range of values that incorporates both the point estimator and the uncertainty about the ability of the point estimator to find the population feature exactly.

'Repeated Sampling' - A Sampling Distribution
This is a histogram of 1,000 means of samples of 20 observations from Normal[500, 100²]. The true mean is 500. Sample means vary around 500, some quite far off. The sample mean has a sampling mean and a sampling variance; it also has a probability distribution, which looks like a normal distribution.
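As a quick illustration (not part of the original slides), the repeated-sampling experiment above is easy to reproduce in Python with numpy; the seed and array shapes below are arbitrary choices:

import numpy as np

rng = np.random.default_rng(42)

# 1,000 samples of 20 observations each from Normal(500, 100^2)
means = rng.normal(loc=500, scale=100, size=(1000, 20)).mean(axis=1)

print(means.mean())       # sampling mean of the sample mean: close to 500
print(means.std(ddof=1))  # sampling std. dev.: close to 100/sqrt(20), about 22.4

A histogram of the means array reproduces the roughly normal shape described on the slide.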

Application: Credit Modeling
1992 American Express analysis of:
- Application process: acceptance or rejection; X = 0 (reject) or 1 (accept).
- Cardholder behavior:
  - Loan default (D = 0 or 1)
  - Average monthly expenditure (E = $/month)
  - General credit usage/behavior (Y = number of charges)
13,444 applications in November, 1992.

0.7809 is the true proportion in the population of 13,444 we are sampling from.

Estimation Concepts
Random sampling:
- Finite populations
- i.i.d. sample from an infinite population
Information:
- Prior
- Sample

Properties of Estimators

Unbiasedness The sample mean of the 100 sample estimates is 0.7844. The population mean (true proportion) is 0.7809.

Consistency
[Figure: sampling distributions of the estimator for N = 144, N = 1024, and N = 4900, each plotted on the range .70 to .88.]

Competing Estimators of a Parameter
Bank costs are normally distributed with mean μ. Which is a better estimator of μ: the mean (11.46) or the median (11.27)?

Interval estimates of the acceptance rate, based on the 100 samples of 144 observations.

Methods of Estimation
Information about the source population.
Approaches:
- Method of Moments
- Maximum Likelihood
- Bayesian

The Method of Moments

Estimating a Parameter
Mean of Poisson: p(y) = exp(-λ)λ^y / y!, y = 0, 1, ...; λ > 0. E[y] = λ, and E[(1/N)Σi yi] = λ. The sample mean is the estimator.
Mean of Exponential: f(y) = θ exp(-θy), y > 0; θ > 0. E[y] = 1/θ, and E[(1/N)Σi yi] = 1/θ. So 1/{(1/N)Σi yi} is the estimator of θ.
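A minimal numeric check of these two estimators (not from the slides; the true parameter values and sample size are hypothetical):

import numpy as np

rng = np.random.default_rng(0)

# Poisson: E[y] = lambda, so the sample mean estimates lambda directly.
y_pois = rng.poisson(lam=3.0, size=1000)
print(y_pois.mean())          # close to 3.0

# Exponential with density f(y) = theta*exp(-theta*y): E[y] = 1/theta,
# so 1/(sample mean) estimates theta.
y_exp = rng.exponential(scale=1/0.5, size=1000)   # numpy scale = 1/theta, theta = 0.5
print(1.0 / y_exp.mean())     # close to 0.5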

Mean and Variance of a Normal Distribution

Proportion for Bernoulli
In the AmEx data, the true population acceptance rate is 0.7809 = θ. Y = 1 if the application is accepted, 0 if not. E[y] = θ, and E[(1/N)Σi yi] = p(accept) = θ. The sample proportion is the estimator.

Gamma Distribution

Method of Moments (P) = (P) /(P) = dlog (P)/dP

Estimate One Parameter
Assume λ is known to be 0.1. Estimate P.
E[y] = P/λ = P/.1 = 10P
m1 = mean of y = 31.278
Estimate of P is 31.278/10 = 3.1278.
One equation in one unknown.

Application

Method of Moments Solutions
create   ; y1 = y ; y2 = log(y) ; ysq = y*y $
calc     ; m1 = xbr(y1) ; mlog = xbr(y2) ; m2 = xbr(ysq) $
Minimize ; start = 2.0, .06 ; labels = p, l
         ; fcn = (m1 - p/l)^2 + (mlog - (psi(p) - log(l)))^2 $
P|  2.41074
L|   .07707

Using m1 and m2 instead:
Minimize ; start = 2.0, .06 ; labels = p, l
         ; fcn = (m1 - p/l)^2 + (m2 - p*(p+1)/l^2)^2 $
P|  2.06182
L|   .06589
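The commands above are NLOGIT. A rough Python analogue of the first Minimize command, sketched with simulated stand-in data since the slides' income data are not reproduced here (the optimization is done in logs to keep both parameters positive):

import numpy as np
from scipy.optimize import minimize
from scipy.special import psi

rng = np.random.default_rng(0)
y = rng.gamma(shape=2.4, scale=1/0.077, size=500)  # hypothetical stand-in for y

m1, mlog = y.mean(), np.log(y).mean()   # the calc ; m1, mlog step

def fcn(logparams):
    # moment conditions: E[y] = P/lambda and E[log y] = psi(P) - log(lambda)
    p, lam = np.exp(logparams)
    return (m1 - p/lam)**2 + (mlog - (psi(p) - np.log(lam)))**2

res = minimize(fcn, x0=np.log([2.0, 0.06]), method="Nelder-Mead")
print(np.exp(res.x))   # estimates of (P, lambda)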

Properties of the MoM Estimator
Unbiased? Sometimes, e.g., the normal, Bernoulli, and Poisson means.
Consistent? Yes, by virtue of the Slutsky theorem, assuming the parameters can vary continuously and the moment functions are continuous and smooth.
Efficient? Maybe - remains to be seen. (Which pair of moments should be used for the gamma distribution?)
Sampling distribution? Generally normal, by virtue of the Lindeberg-Levy central limit theorem and the Slutsky theorem.

Estimating Sampling Variance
- Exact sampling results - Poisson mean, normal mean and variance
- Approximation based on linearization
- Bootstrapping - discussed later with the maximum likelihood estimator

Exact Variance of MoM
Estimating the normal or Poisson mean: the estimator is the sample mean = (1/N)Σi yi. The exact variance of the sample mean is 1/N times the population variance.

Linearization Approach – 1 Parameter

Linearization Approach – 1 Parameter

Linearization Approach - General

Exercise: Gamma Parameters
m1 = (1/N)Σi yi estimates P/λ
m2 = (1/N)Σi yi² estimates P(P+1)/λ²
1. What is the Jacobian? (Derivatives)
2. How do we compute the variance of m1, the variance of m2, and the covariance of m1 and m2? (The variance of m1 is 1/N times the variance of y; the variance of m2 is 1/N times the variance of y²; the covariance is 1/N times the covariance of y and y².)
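A minimal numpy sketch of this exercise (again with hypothetical simulated data): solving the two moment equations gives P = m1²/(m2 - m1²) and λ = m1/(m2 - m1²), and the linearization (delta method) approximates the sampling covariance of the estimates as J⁻¹ V J⁻ᵀ, where J is the Jacobian asked for in question 1 and V is the covariance matrix of (m1, m2) from question 2.

import numpy as np

rng = np.random.default_rng(1)
y = rng.gamma(shape=2.4, scale=1/0.077, size=500)   # stand-in data
N = len(y)

# Sample moments and their estimated covariance: (1/N) times the
# sample covariance matrix of (y, y^2), as stated in the exercise.
m1, m2 = y.mean(), (y**2).mean()
V = np.cov(np.vstack([y, y**2])) / N

# Solve the moment equations for the point estimates.
P_hat = m1**2 / (m2 - m1**2)
lam_hat = m1 / (m2 - m1**2)

# Jacobian of (P/lam, P(P+1)/lam^2) with respect to (P, lam).
J = np.array([
    [1/lam_hat,                -P_hat/lam_hat**2],
    [(2*P_hat + 1)/lam_hat**2, -2*P_hat*(P_hat + 1)/lam_hat**3],
])

Jinv = np.linalg.inv(J)
print(P_hat, lam_hat)
print(Jinv @ V @ Jinv.T)   # estimated asymptotic covariance of the estimates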

Sufficient Statistics

Sufficient Statistic

Sufficient Statistic

Sufficient Statistics

Gamma Density

Rao-Blackwell Theorem
The mean squared error of an estimator based on sufficient statistics is smaller than that of one not based on sufficient statistics. We deal in consistent estimators, so a large sample (approximate) version of the theorem is that estimators based on sufficient statistics are more efficient than those that are not.

Maximum Likelihood Estimation
A criterion comparable to the method of moments, with several virtues: broadly, it uses all the sample and nonsample information available, so it is efficient (better than MoM in many cases).

Setting Up the MLE
The distribution of the observed random variable is written as a function of the parameter(s) to be estimated:
P(yi|θ) = probability density of data given parameters
L(θ|yi) = likelihood of parameters given data
The likelihood function is constructed from the density. Construction: the joint probability density function of the observed sample of data - generally the product when the data are a random sample. The estimator is chosen to maximize the likelihood of the data (essentially the probability of observing the sample in hand).

Regularity Conditions
What they are:
1. log f(.) has three continuous derivatives with respect to the parameters.
2. Conditions needed to obtain expectations of derivatives are met. (E.g., the range of the variable is not a function of the parameters.)
3. The third derivative has finite expectation.
What they mean: moment conditions and convergence. We need to obtain expectations of derivatives, we need to be able to truncate Taylor series, and we will use central limit theorems.
The MLE exists for nonregular densities (see text), but with questionable statistical properties.

Regular Exponential Density
Exponential density f(yi|θ) = (1/θ)exp(-yi/θ). Average time until failure, θ, of light bulbs; yi = observed life until failure.
Regularity:
(1) The range of y is 0 to ∞, free of θ.
(2) log f(yi|θ) = -log θ - yi/θ; ∂log f(yi|θ)/∂θ = -1/θ + yi/θ². E[yi] = θ, so E[∂log f(θ)/∂θ] = 0.
(3) ∂²log f(yi|θ)/∂θ² = 1/θ² - 2yi/θ³ has finite expectation = -1/θ².
(4) ∂³log f(yi|θ)/∂θ³ = -2/θ³ + 6yi/θ⁴ has finite expectation = 4/θ³.
(5) All derivatives are continuous functions of θ.

Likelihood Function L()=Πi f(yi|) MLE = the value of  that maximizes the likelihood function. Generally easier to maximize the log of L. The same  maximizes log L In random sampling, logL=i log f(yi|)

Poisson Likelihood
(log and ln both mean natural log throughout this course.)

The MLE
The log-likelihood function: log L(θ|data) = Σi log f(yi|θ)
The likelihood equation(s): the first derivatives of log L equal zero at the MLE.
∂[Σi log f(yi|θ)]/∂θ = 0 at θ = MLE. (Interchange sum and differentiation:)
Σi [∂log f(yi|θ)/∂θ] = 0 at θ = MLE.

Applications Bernoulli Exponential Poisson Normal Gamma

Bernoulli

Exponential
Estimating the average time until failure, θ, of light bulbs. yi = observed life until failure.
f(yi|θ) = (1/θ)exp(-yi/θ)
L(θ) = Πi f(yi|θ) = θ^(-N) exp(-Σ yi/θ)
log L(θ) = -N log θ - Σ yi/θ
Likelihood equation: ∂log L(θ)/∂θ = -N/θ + Σ yi/θ² = 0
Solution (multiply both sides of the equation by θ²): θ = Σ yi/N (the sample average estimates the population average).
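A short numeric confirmation of this derivation (simulated data with a hypothetical true θ = 5): maximizing the exponential log likelihood numerically returns the sample mean.

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
y = rng.exponential(scale=5.0, size=200)

# negative of log L(theta) = -N log(theta) - sum(y)/theta
neg_loglik = lambda theta: len(y)*np.log(theta) + y.sum()/theta
res = minimize_scalar(neg_loglik, bounds=(0.01, 100.0), method="bounded")

print(res.x, y.mean())   # the two values agree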

Poisson Distribution

Normal Distribution

Gamma Distribution (P) = (P) /(P) = dlog (P)/dP

Gamma Application
create ; y1 = y ; y2 = log(y) ; ysq = y*y $
Gamma (Loglinear) Regression Model
Dependent variable: Y
Log likelihood function: -85.37567

Y          Coefficient   Std.Error     z    Prob |z|>Z*   95% Confidence Interval
LAMBDA       .07707***     .02544    3.03      .0024        .02722    .12692
P_scale     2.41074***     .71584    3.37      .0008       1.00757   3.81363

(LAMBDA is the parameter in the conditional mean function; P_scale is the scale parameter for the gamma model.)

Same solution as method of moments using m1 and mlog:
calc     ; m1 = xbr(y1) ; mlog = xbr(y2) ; m2 = xbr(ysq) $
Minimize ; start = 2.0, .06 ; labels = p, l
         ; fcn = (m1 - p/l)^2 + (mlog - (psi(p) - log(l)))^2 $
P|  2.41074
L|   .07707
Using m1 and m2:
P|  2.06182
L|   .06589

Properties
- Estimator regularity
- Finite sample vs. asymptotic properties
- Properties of the estimator
- Information used in estimation

Properties of the MLE
- Sometimes unbiased, usually not
- Always consistent (under regularity)
- Large sample normal distribution
- Efficient
- Invariant
- Sufficient (uses sufficient statistics)

Unbiasedness
Usually only when estimating a parameter that is the mean of the random variable:
- Normal mean
- Poisson mean
- Bernoulli probability (which is the mean)
Almost no other cases.

Consistency
Under regularity, the MLE is consistent. Without regularity it may be consistent, but that cannot be proved. In almost all cases the MLE is mean square consistent: its expectation converges to the parameter and its variance converges to zero. (Proof sketched in text, pp. 275-276.)

Large Sample Distribution

The Information Equality

Deduce The Variance of MLE

Computing the Variance of the MLE

Application: GSOEP Income
Descriptive statistics for 1 variable:
Variable     Mean       Std.Dev.   Minimum    Maximum   Cases   Missing
HHNINC      .355564     .166561    .030000      2.0      2698      0

Variance of MLE

Bootstrapping
Given the sample, i = 1, ..., N:
1. Sample N observations with replacement - some get picked more than once, some do not get picked.
2. Recompute the estimate of θ.
3. Repeat R times, obtaining R new estimates of θ.
4. Estimate the variance with the sample variance of the R new estimates.
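The procedure translates directly into a few lines of Python; this sketch (not from the slides) bootstraps the variance of a sample proportion, with the data, R, and seed all hypothetical:

import numpy as np

def bootstrap_variance(y, estimator, R=1000, seed=0):
    rng = np.random.default_rng(seed)
    N = len(y)
    # R resamples of size N, drawn with replacement
    estimates = np.array([estimator(y[rng.integers(0, N, size=N)])
                          for _ in range(R)])
    return estimates.var(ddof=1)

y = np.random.default_rng(3).binomial(1, 0.78, size=144)  # hypothetical 0/1 data
print(bootstrap_variance(y, np.mean))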

Bootstrap Results Estimated Variance = .003112.

Sufficiency
If sufficient statistics exist, the MLE will be a function of them; therefore, the MLE satisfies the Rao-Blackwell theorem (in large samples).

Efficiency
Cramér-Rao lower bound: the variance of a consistent, asymptotically normally distributed estimator is ≥ -1/{N·E[H(θ)]}.
The MLE achieves the C-R lower bound, so it is efficient.
Implication: for normal sampling, the mean is better than the median.
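The implication is easy to see by simulation (a sketch, not from the slides): for normal samples the sample mean's variance is near σ²/N, while the sample median's is near (π/2)·σ²/N, about 57% larger.

import numpy as np

rng = np.random.default_rng(4)
samples = rng.normal(loc=0.0, scale=1.0, size=(5000, 100))   # 5,000 samples of N = 100

print(samples.mean(axis=1).var())        # near 1/100 = .0100
print(np.median(samples, axis=1).var())  # near pi/200, about .0157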

Invariance

Bayesian Estimation
- Philosophical underpinnings
- How to combine information contained in the sample

"Estimation" - Assembling Information
- Prior information = out of sample, literally prior or outside information.
- Sample information is embodied in the likelihood.
- Result of the analysis: "posterior belief" = a blend of prior and likelihood.

Bayesian Investigation
No fixed "parameters": θ is a random variable. Data are realizations of random variables; there is a marginal distribution p(data). Parameters are part of the random state of nature: p(θ) = the distribution of θ independent of (prior to) the data. The investigation combines sample information with prior information. The outcome is a revision of the prior based on the observed information (the data).

Symmetrical Treatment
The likelihood is p(data|θ). The prior summarizes nonsample information about θ in p(θ). The joint distribution is p(data, θ) = p(data|θ)p(θ). Use Bayes theorem to get p(θ|data) = the posterior distribution.

The Posterior Distribution

Priors - Where Do They Come From?
What does the prior contain?
- Informative priors - real prior information
- Noninformative priors:
  - Diffuse
  - Uniform
  - Normal with huge variance
- Mathematical complications:
  - Improper priors
  - Conjugate priors

Application
Consider estimation of the probability that a production process will produce a defective product. In case 1, suppose the sampling design is to choose N = 25 items from the production line and count the number of defectives. If the probability that any item is defective is a constant θ between zero and one, then the likelihood for the sample of data is L(θ|data) = θ^D (1-θ)^(25-D), where D is the number of defectives, say, 8. The maximum likelihood estimator of θ will be q = D/25 = 0.32, and the asymptotic variance of the maximum likelihood estimator is estimated by q(1-q)/25 = 0.008704.
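For the Bayesian side of this application, here is a minimal sketch assuming a uniform Beta(1,1) prior (an assumption, but one consistent with the .333333 posterior mean and .007936 posterior variance quoted on the later Modern Bayesian Analysis slide): with that prior the posterior is Beta(D+1, N-D+1).

from scipy import stats

D, N = 8, 25
posterior = stats.beta(D + 1, N - D + 1)   # Beta(9, 18)

print(posterior.mean())            # (D+1)/(N+2) = 9/27 = .333333
print(posterior.var())             # about .007937
print(posterior.interval(0.95))    # central 95% posterior interval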

Application: Posterior Density

Posterior Moments

Mixing Prior and Sample Information

Modern Bayesian Analysis
Bayesian estimate of θ. Observations = 5000 (posterior mean was .333333).
Mean                = .334017
Standard deviation  = .086336
Posterior variance  = .007936
Sample variance     = .007454
Skewness            = .248077
Kurtosis-3 (excess) = -.161478
Minimum             = .066214
Maximum             = .653625
.025 percentile     = .177090
.975 percentile     = .510028

Modern Bayesian Analysis
Multiple parameter settings: derivation of the exact form of expectations and variances for p(θ1, θ2, ..., θK|data) is hopelessly complicated even if the density is tractable.
Strategy: sample joint observations (θ1, θ2, ..., θK) from the posterior population and use marginal means, variances, quantiles, etc.
How to sample the joint observations??? (Still hopelessly complicated.)

Magic: The Gibbs Sampler
Objective: sample joint observations on θ1, θ2, ..., θK from p(θ1, θ2, ..., θK|data). (Let K = 3.)
Strategy: Gibbs sampling. Derive p(θ1|θ2, θ3, data), p(θ2|θ1, θ3, data), and p(θ3|θ1, θ2, data).
Gibbs cycles produce the joint observations:
0. Start θ1, θ2, θ3 at some reasonable values.
1. Sample a draw from p(θ1|θ2, θ3, data) using the draws of θ2 and θ3 in hand.
2. Sample a draw from p(θ2|θ1, θ3, data) using the draw at step 1 for θ1.
3. Sample a draw from p(θ3|θ1, θ2, data) using the draws at steps 1 and 2.
4. Return to step 1.
After a burn-in period (a few thousand cycles), start collecting the draws. The set of draws ultimately gives a sample from the joint distribution.
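A toy Gibbs sampler in Python (not the course's model; the target here is a bivariate normal with correlation ρ, whose two full conditionals are p(x|y) = N(ρy, 1-ρ²) and p(y|x) = N(ρx, 1-ρ²)), following the cycle above with a burn-in before collection:

import numpy as np

rng = np.random.default_rng(5)
rho = 0.8
s = np.sqrt(1 - rho**2)

x, y = 0.0, 0.0          # step 0: start at reasonable values
draws = []
for t in range(12000):
    x = rng.normal(rho * y, s)   # draw from p(x | y)
    y = rng.normal(rho * x, s)   # draw from p(y | x)
    if t >= 2000:                # discard burn-in, then collect
        draws.append((x, y))

draws = np.array(draws)
print(np.corrcoef(draws.T)[0, 1])   # close to rho = 0.8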

Methodological Issues
Priors: schizophrenia
- Uninformative priors are disingenuous.
- Informative priors are not objective.
- Using existing information?
Bernstein-von Mises theorem and likelihood estimation:
- In large samples, the likelihood dominates.
- The posterior mean will be the same as the MLE.