Section Count Data Models. Introduction Many outcomes of interest are integer counts –Doctor visits –Low work days –Cigarettes smoked per day –Missed.


Section Count Data Models

Introduction
Many outcomes of interest are integer counts
–Doctor visits
–Lost work days
–Cigarettes smoked per day
–Missed school days
OLS models can easily handle some integer-valued outcomes

Example
–SAT scores are essentially integer values
–Few observations in the tails
–Distribution is fairly continuous
–OLS models these well
In contrast, suppose the outcome has
–A high fraction of zeros
–Small positive values

OLS models will
–Predict negative values
–Do a poor job of predicting the mass of observations at zero
Example
–Doctor visits in the past year, Medicare patients (65+)
–1987 National Medical Expenditure Survey
–Top-coded (for now) at 10
–17% have no visits

[Table: frequency distribution of visits, with columns Freq., Percent, and Cum.; cell values not recoverable]
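The claim that OLS can predict negative values for a count outcome can be illustrated with a minimal sketch. The data below are hypothetical toy values (not the doctor-visit sample), chosen only because they contain several zeros and a few small positive counts.

```python
# Hypothetical toy data: a count outcome with several zeros (not from the slides).
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 0, 1, 3, 8]

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

intercept, slope = ols_fit(x, y)
fitted = [intercept + slope * xi for xi in x]

# Fitted values below zero are impossible for a count outcome.
negative = [f for f in fitted if f < 0]
print(len(negative), "of", len(fitted), "fitted values are negative")
```

A straight line fitted through a cluster of zeros has to dip below zero somewhere in that region, which is the motivation for a model whose mean is positive by construction.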

Poisson Model
$y_i$ is drawn from a Poisson distribution
Poisson parameter varies across observations
$f(y_i; \lambda_i) = e^{-\lambda_i} \lambda_i^{y_i} / y_i!$ for $\lambda_i > 0$
$E[y_i] = \mathrm{Var}[y_i] = \lambda_i = f(x_i, \beta)$
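As a quick numerical check of the Poisson density and its equal mean and variance, here is a small sketch using only the Python standard library. The value of `lam` is illustrative, and the support is truncated at 200, an assumption that is harmless for a mean this small.

```python
import math

def poisson_pmf(y, lam):
    """f(y; lam) = exp(-lam) * lam**y / y!, evaluated in logs for stability."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

lam = 5.6                 # illustrative Poisson parameter
support = range(0, 200)   # truncated support; the tail beyond 200 is negligible here
probs = [poisson_pmf(y, lam) for y in support]

total = sum(probs)
mean = sum(y * p for y, p in zip(support, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(support, probs))
print(total, mean, var)
```

The probabilities sum to one, and both the mean and the variance come out equal to `lam`, which is the restriction the slides go on to question.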

$\lambda_i$ must be positive at all times
Therefore, we CANNOT let $\lambda_i = x_i \beta$
Let $\lambda_i = \exp(x_i \beta)$, so that $\ln(\lambda_i) = x_i \beta$

$d \ln(\lambda_i) / d x_i = \beta$
Remember that $d \ln(\lambda_i) = d\lambda_i / \lambda_i$
Interpret $\beta$ as the proportional change in the mean outcome for a one-unit change in x (multiply by 100 for a percentage change)
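The derivative gives the effect of a small change in x. For a discrete one-unit change, a short worked derivation may help; the subscript $k$ for a single regressor and the notation $\lambda_i(\cdot)$ are introduced here for illustration and are not in the original slides.

```latex
% Ratio of means when regressor k rises by one unit, all else held fixed
\frac{\lambda_i(x_{ik} + 1)}{\lambda_i(x_{ik})}
  = \frac{\exp(x_i \beta + \beta_k)}{\exp(x_i \beta)}
  = \exp(\beta_k)

% Exact proportional change, and its first-order approximation
\frac{\lambda_i(x_{ik} + 1) - \lambda_i(x_{ik})}{\lambda_i(x_{ik})}
  = \exp(\beta_k) - 1 \approx \beta_k
  \quad \text{when } |\beta_k| \text{ is small}
```

Reading $\beta_k$ directly as a percentage change is therefore an approximation that works well for small coefficients and becomes rougher as $|\beta_k|$ grows.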

Problems with Poisson
Variance grows with the mean
–$E[y_i] = \mathrm{Var}[y_i] = \lambda_i = f(x_i, \beta)$
Most data sets have overdispersion, where the variance grows faster than the mean
In the doctor visits sample, $\bar{y} = 5.6$, $s = 6.7$
Imposing Mean = Var is a severe restriction, and it tends to understate standard errors
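The overdispersion in the doctor-visit sample can be made concrete with a line of arithmetic. The sketch below assumes the two reported figures are the sample mean (5.6) and the sample standard deviation (6.7); the helper name `dispersion_ratio` is hypothetical.

```python
def dispersion_ratio(mean, sd):
    """Variance-to-mean ratio; equals 1 under the Poisson restriction."""
    return sd ** 2 / mean

mean_visits = 5.6   # reported in the slides (assumed to be the sample mean)
sd_visits = 6.7     # reported in the slides (sample standard deviation)

variance = sd_visits ** 2
ratio = dispersion_ratio(mean_visits, sd_visits)
print(round(variance, 2), round(ratio, 2))
```

A ratio far above one means the sample variance is several times the mean, which the Poisson model rules out by assumption.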

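Negative Binomial Model
[density formula missing from the transcript]
where $\gamma_i = \exp(x_i \beta)$ and $\delta \ge 0$
$E[y_i] = \delta \gamma_i = \delta \exp(x_i \beta)$
$\mathrm{Var}[y_i] = \delta (1 + \delta) \gamma_i$
$\mathrm{Var}[y_i] / E[y_i] = 1 + \delta$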
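The density itself did not survive in the transcript, so the sketch below is an assumption rather than the author's formula: it uses one standard negative binomial parameterization (a Poisson whose rate is gamma distributed with shape $\gamma_i$ and scale $\delta$) that reproduces the mean and variance stated above. The values of `gamma_i` and `delta` are illustrative.

```python
import math

def negbin_pmf(y, gamma_i, delta):
    """Assumed NB density: Gamma(y+g) / (y! Gamma(g)) * (1/(1+d))**g * (d/(1+d))**y."""
    log_p = (math.lgamma(y + gamma_i) - math.lgamma(y + 1) - math.lgamma(gamma_i)
             - gamma_i * math.log(1 + delta)
             + y * (math.log(delta) - math.log(1 + delta)))
    return math.exp(log_p)

gamma_i = 2.0    # illustrative value of exp(x_i * beta)
delta = 5.21     # illustrative dispersion parameter

support = range(0, 3000)   # truncated support; the remaining tail is negligible
probs = [negbin_pmf(y, gamma_i, delta) for y in support]

total = sum(probs)
mean = sum(y * p for y, p in zip(support, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(support, probs))
print(mean, var, var / mean)
```

Under this assumed parameterization the computed mean matches $\delta \gamma_i$ and the variance-to-mean ratio matches $1 + \delta$, in line with the moments on the slide.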

$\delta$ must always be $\ge 0$
In this case, the variance grows faster than the mean
If $\delta = 0$, the model collapses into the Poisson
Always estimate the negative binomial
If you cannot reject the null that $\delta = 0$, report the Poisson estimates

Notice that $\ln(E[y_i]) = \ln(\delta) + \ln(\gamma_i)$, so $d \ln(E[y_i]) / d x_i = \beta$
Parameters have the same interpretation as in the Poisson model

In STATA
POISSON estimates the Poisson model by maximum likelihood
–Syntax: POISSON y independent variables
NBREG estimates the negative binomial model by maximum likelihood
–Syntax: NBREG y independent variables
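To give a sense of what maximum likelihood estimation of the Poisson model involves, here is a minimal Newton-Raphson sketch in plain Python for one regressor plus an intercept. It is not Stata's implementation; the function name `fit_poisson` and the toy data are hypothetical.

```python
import math

# Hypothetical toy data (not from the slides).
x = [0, 1, 2, 3, 4, 5]
y = [0, 0, 0, 1, 3, 8]

def fit_poisson(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson MLE for lambda_i = exp(b0 + b1 * x_i)."""
    b0 = math.log(sum(y) / len(y))   # start at the intercept-only estimate
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: sum of (y_i - mu_i) times (1, x_i)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix: sum of mu_i times (1, x_i)(1, x_i)'
        a00 = sum(mu)
        a01 = sum(mi * xi for xi, mi in zip(x, mu))
        a11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = a00 * a11 - a01 * a01
        # Newton step: inverse of the 2x2 information matrix times the score
        step0 = (a11 * g0 - a01 * g1) / det
        step1 = (a00 * g1 - a01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

b0_hat, b1_hat = fit_poisson(x, y)
fitted = [math.exp(b0_hat + b1_hat * xi) for xi in x]
print(b0_hat, b1_hat)
```

At the optimum the fitted means sum to the observed counts (the first-order condition for the intercept), and every fitted mean is positive because of the exponential link.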

Interpret results for Poisson
Those with a CHRONIC condition have about 50% more mean MD visits
Those in EXCELlent health have about 78% fewer MD visits
BLACKS have about 33% fewer visits than whites
Income elasticity is 0.021, so a 10% increase in income generates a 0.21% increase in visits
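Reading a coefficient directly as a percentage change is a first-order approximation. The sketch below computes the exact change implied by the exponential mean, $\exp(\beta) - 1$, for the two dummy-variable coefficients reported in the results table, and the effect of a 10% income increase under an elasticity of 0.021. The variable names are illustrative.

```python
import math

# Poisson coefficients as reported in the results table.
coefs = {"Chronic": 0.500, "Excel": -0.784}

# Exact percentage change in mean visits when a dummy switches from 0 to 1.
exact_pct = {name: 100 * (math.exp(b) - 1) for name, b in coefs.items()}

# Coefficient on ln(income) is an elasticity: a 10% rise scales the mean by 1.10**b.
income_elasticity = 0.021
pct_change_visits = 100 * (1.10 ** income_elasticity - 1)

for name, pct in exact_pct.items():
    print(name, round(pct, 1))
print("10% income increase:", round(pct_change_visits, 2))
```

For large coefficients the exact change differs noticeably from the raw coefficient (roughly 65% rather than 50% for Chronic, and roughly 54% fewer rather than 78% fewer for Excel), while for the small income elasticity the two readings nearly coincide.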

Negative Binomial
Interpret results the same way as Poisson
Look at coefficient/standard error on delta
H0: delta = 0 (Poisson model is correct)
In this case, delta = 5.21 and its standard error is 0.15, so we easily reject the null
Var/Mean = 1 + delta = 6.21, so the Poisson is misspecified; expect standard errors that are too small in the misspecified model
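The coefficient-over-standard-error check on delta can be written out as a short calculation. This sketch treats the ratio as a standard normal statistic, which is a simplifying assumption; the variable names are illustrative.

```python
import math

delta_hat = 5.21   # estimate reported in the slides
se_delta = 0.15    # standard error reported in the slides

# Ratio of the estimate to its standard error.
z = delta_hat / se_delta

# Two-sided p-value under a standard normal reference distribution (assumption).
p_value = math.erfc(z / math.sqrt(2))

# Implied variance-to-mean ratio of the outcome.
var_to_mean = 1 + delta_hat

print(round(z, 2), p_value, var_to_mean)
```

With a ratio this large the null of delta = 0 is rejected at any conventional level, consistent with the slide's conclusion.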

Selected Results, Count Models
Parameter (Standard Error)

Variable    Poisson            Negative Binomial
Age…        …      (0.026)     0.103  (0.055)
Age…        …      (0.026)     0.204  (0.054)
Chronic     0.500  (0.014)     0.509  (0.029)
Excel      -0.784  (0.031)    -0.527  (0.059)
Ln(Inc)     0.021  (0.007)     0.038  (0.016)