Methods Workshop (3/10/07) Topic: Event Count Models.



Getting Ready
- Open STATA
- Type “findit spost”
- Download the spost software for the version of STATA you are running
- Type “findit outreg”
- Download the outreg software

Introduction
- What are event counts?
- Examples
  - Number of uses of force per year
  - Number of bills vetoed by the president per year
  - Number of Supreme Court draft opinions circulated
  - Number of people murdered by state (genocide)
- Comparison to other data forms
  - Dichotomous variables
  - Event history

Motivation
- OLS may produce biased, inconsistent, and inefficient estimates when applied to event count data.
- OLS can predict negative values of Y, whereas event counts are bounded below at zero.
- However, OLS models may be acceptable when the mean number of events in the event count series is large.

Types of Event Count Models
- Poisson
- Negative Binomial
- Generalized Event Count
- Truncated
- Hurdle
- Zero-inflated
- Poisson Autoregressive Model

Poisson Assumptions
- Let y be a random variable indicating the number of times that an event has occurred during an interval of time.
- $\Pr(y \mid \mu) = \dfrac{e^{-\mu}\,\mu^{y}}{y!}$
- $E(y) = \mu$
- $\mathrm{Var}(y) = E(y) = \mu$ (equidispersion)
- As μ increases, the probability of 0’s decreases.
- As μ increases, the Poisson approximates a normal distribution (Long & Freese, 224).
- Events in non-overlapping time periods are independent.
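A minimal sketch in Python (not part of the workshop's STATA session) of the Poisson probability function and the equidispersion property. The function names are illustrative, and the log-scale computation is only there to avoid overflow for large y.

```python
import math


def poisson_pmf(y, mu):
    """Pr(y | mu) = exp(-mu) * mu**y / y!, computed on the log scale."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


def poisson_moments(mu, y_max=200):
    """Mean and variance obtained by summing over the pmf up to y_max."""
    probs = [poisson_pmf(y, mu) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


# Equidispersion: the numerical mean and variance both agree with mu.
mean, var = poisson_moments(1.7)
print(round(mean, 6), round(var, 6))

# The probability of a zero count shrinks as mu grows.
for rate in (0.5, 1.7, 5.0):
    print(rate, round(poisson_pmf(0, rate), 4))
```

The sum is truncated at y_max, which is harmless for small rates because the remaining probability mass is negligible.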

Estimating a Poisson Model
- Data: Long’s (1990) study of 915 biochemists and the number of papers published during graduate school.
- Mean articles = 1.7; Figure 8.2 (Long, 1997: 220)
- The Poisson model may not fit these data well because it under-predicts zeros and over-predicts counts in the 1-4 publication range.
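The under-prediction of zeros can be checked by hand. The sketch below, in Python, compares the share of zeros implied by a Poisson distribution with mean 1.7 against the observed share, assuming the 275 zero observations reported in the zero-inflated output later in these slides. It uses the rounded mean from the slide, so the result is approximate.

```python
import math

mean_articles = 1.7   # sample mean reported on the slide (rounded)
n_obs = 915           # number of biochemists in the sample
n_zero = 275          # zero counts, taken from the zero-inflated output

# Under a Poisson distribution, Pr(y = 0 | mu) = exp(-mu).
predicted_zero_share = math.exp(-mean_articles)
observed_zero_share = n_zero / n_obs

print(f"predicted share of zeros: {predicted_zero_share:.3f}")
print(f"observed share of zeros:  {observed_zero_share:.3f}")
```

A Poisson distribution with this mean implies roughly 18% zeros, while about 30% of the biochemists published nothing.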

Estimating Poisson in STATA
- poisson art fem mar kid5 phd ment, nolog
- [STATA output: Poisson regression, Number of obs = 915; header reports LR chi2(5), Prob > chi2, Log likelihood, and Pseudo R2; table of Coef., Std. Err., z, P>|z|, and 95% Conf. Interval for fem, mar, kid5, phd, ment, and _cons; numeric values not preserved]
- We can use a variety of tools to interpret the coefficients.

Interpreting Poisson Estimates
- IRR: incidence rate ratios (add irr after the comma in the poisson command)
- [STATA output: Poisson regression, Number of obs = 915; table of IRR, Std. Err., z, P>|z|, and 95% Conf. Interval for fem, mar, kid5, phd, and ment; numeric values not preserved]
- Married graduate students are expected to publish 1.17 times as many articles as single graduate students (about 17% more), holding all other variables constant.
- Each additional article published by the mentor multiplies a graduate student’s expected number of articles by 1.03 (about 3% more), holding all other variables constant.
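An incidence rate ratio is the exponentiated coefficient, so it acts multiplicatively on the expected count. The Python sketch below uses the two rounded IRRs quoted on the slide (1.17 for mar, 1.03 for ment); the helper names are illustrative, and the ten-article mentor is a hypothetical case.

```python
import math


def irr_to_coef(irr):
    """Raw Poisson coefficient implied by an incidence rate ratio: b = ln(IRR)."""
    return math.log(irr)


def percent_change(irr, units=1.0):
    """Percent change in the expected count for a change of `units` in X."""
    return 100.0 * (irr ** units - 1.0)


irr_mar = 1.17    # married vs. single, from the slide
irr_ment = 1.03   # per additional mentor article, from the slide

print(round(irr_to_coef(irr_mar), 3))                  # implied raw coefficient
print(round(percent_change(irr_mar), 1))               # married vs. single
print(round(percent_change(irr_ment, units=10), 1))    # mentor with 10 more articles
```

Because the effect compounds, ten additional mentor articles imply an increase of about 34%, not 10 x 3% = 30%.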

Interpreting Poisson Estimates
- listcoef fem ment, help or listcoef fem ment, percent help
- [STATA output: poisson (N=915): Percentage Change in Expected Count; Observed SD; columns b, z, P>|z|, %, %StdX, and SDofX for fem and ment; numeric values not preserved]
- Legend:
  b = raw coefficient
  z = z-score for test of b=0
  P>|z| = p-value for z-test
  % = percent change in expected count for unit increase in X
  %StdX = percent change in expected count for SD increase in X
  SDofX = standard deviation of X
- Being a female scientist decreases the expected number of articles by 20%, holding all other variables constant.
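The two percentage columns follow from the raw coefficient by the standard transformations below, where $b$ is the raw coefficient and $s_x$ is the standard deviation of X (SDofX). The last line is only a rough consistency check: it assumes the fem coefficient is close to $\ln(0.80)$, which is what a 20% decrease implies.

```latex
\begin{align*}
\text{\%}     &= 100\,\bigl[\exp(b) - 1\bigr] \\
\text{\%StdX} &= 100\,\bigl[\exp(b \cdot s_x) - 1\bigr] \\
b_{\text{fem}} &\approx \ln(0.80) \approx -0.223
  \quad\Longrightarrow\quad 100\,\bigl[\exp(-0.223) - 1\bigr] \approx -20
\end{align*}
```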

Interpreting Poisson Coefficients
- You can use the mfx command in STATA to make point predictions for the counts.
- mfx compute, at(mean fem=0)
- Female (1.426), Male (1.785)
- Married (1.697), Single (1.453)
- Kids: 0 (1.764), 1 (1.467), 2 (1.219), 3 (1.013)
- You can also set more than one variable at a theoretically interesting value (e.g. mfx compute, at(mean fem=1 kid5=3)).
- You can also use the predict command to generate counts. See also Long and Freese’s description of several other substantive effects.
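Because the Poisson model is log-linear, a one-unit change in a regressor multiplies the predicted count by the same factor, exp(b), at every starting level. The Python sketch below checks this against the point predictions quoted on the slide; the variable names are illustrative.

```python
# Point predictions reported on the slide (other variables held at their means).
pred_sex = {"female": 1.426, "male": 1.785}
pred_kids = [1.764, 1.467, 1.219, 1.013]   # 0, 1, 2, 3 young children

# Ratio of the female to the male prediction: an estimate of exp(b_fem).
ratio_female_male = pred_sex["female"] / pred_sex["male"]

# Ratio of each prediction to the previous one: an estimate of exp(b_kid5).
kid_ratios = [later / earlier for earlier, later in zip(pred_kids, pred_kids[1:])]

print(round(ratio_female_male, 3))
print([round(r, 3) for r in kid_ratios])
```

The female-to-male ratio is about 0.80, which matches the 20% decrease reported by listcoef, and each additional young child scales the prediction by roughly 0.83.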

Negative Binomial
- What if the rate of productivity, or μ, differs across individuals? This is known as heterogeneity.
- Example: suppose men produce at a rate of μ + δ, while women produce at a rate of μ – δ. If there are equal numbers of men and women, then the mean rate is still [(μ + δ) + (μ – δ)]/2 = μ, but the variance of the pooled counts is μ + δ² > μ.
- When the variance exceeds the mean, as it does in this case, we have over-dispersion. The Poisson model is not appropriate in this case, so we estimate a negative binomial model.
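The effect of heterogeneity can be verified numerically. The Python sketch below pools two equally sized Poisson groups with rates μ + δ and μ – δ and computes the mean and variance of the pooled counts; the values μ = 2 and δ = 1 are hypothetical and chosen only for illustration.

```python
import math


def poisson_pmf(y, mu):
    """Poisson probability computed on the log scale to avoid overflow."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


def mixture_moments(mu, delta, y_max=200):
    """Mean and variance of a 50/50 mix of Poisson(mu + delta) and Poisson(mu - delta)."""
    probs = [
        0.5 * poisson_pmf(y, mu + delta) + 0.5 * poisson_pmf(y, mu - delta)
        for y in range(y_max + 1)
    ]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


mu, delta = 2.0, 1.0   # hypothetical rates; requires mu - delta > 0
mean, var = mixture_moments(mu, delta)
print(round(mean, 6), round(var, 6))   # mean stays at mu, variance is mu + delta**2
```

Each group is equidispersed on its own; the extra δ² in the pooled variance comes entirely from the difference between the group rates.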

Negative Binomial
- The Poisson model would also be problematic if there is contagion in the data, where individuals with a given set of x’s initially have the same probability of an event occurring, but this probability changes as events occur (e.g. a higher chance of publishing articles once you get the first one or two publications).
- We can test for heterogeneity/contagion using the poisgof command or by examining the significance of the alpha parameter in the negative binomial model.

Negative Binomial
- nbreg art fem mar kid5 phd ment, nolog
- [STATA output: Negative binomial regression, Number of obs = 915; table of Coef., Std. Err., z, P>|z|, and 95% Conf. Interval for fem, mar, kid5, phd, ment, and _cons, plus /lnalpha and alpha; likelihood-ratio test of alpha=0 reported as chibar2(01) with Prob>=chibar2; numeric values not preserved]
- Interpretation of alpha: the Poisson model assumes alpha equals zero; if the p-value for the chi-square test is less than .05, then the negative binomial model is preferred.
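A sketch in Python of the two quantities behind the alpha test, assuming the default mean-dispersion parameterization of nbreg, in which Var(y) = μ + αμ². The log likelihoods passed in below are hypothetical round numbers, not the values from the workshop output. The chibar2(01) p-value is half the upper tail of a chi-square with one degree of freedom, because alpha = 0 lies on the boundary of the parameter space.

```python
import math


def nb_variance(mu, alpha):
    """Negative binomial variance; reduces to the Poisson variance when alpha = 0."""
    return mu + alpha * mu ** 2


def lr_test_alpha(ll_poisson, ll_nbreg):
    """Likelihood-ratio statistic and chibar2(01) p-value for H0: alpha = 0."""
    lr = 2.0 * (ll_nbreg - ll_poisson)
    if lr <= 0.0:
        return 0.0, 1.0
    # Upper tail of chi-square(1) is erfc(sqrt(lr / 2)); halve it for chibar2(01).
    p_value = 0.5 * math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value


# Hypothetical log likelihoods for illustration only.
lr, p_value = lr_test_alpha(ll_poisson=-1650.0, ll_nbreg=-1560.0)
print(lr, p_value < 0.05)

print(nb_variance(mu=1.7, alpha=0.0), nb_variance(mu=1.7, alpha=0.5))
```

A large statistic with a p-value below .05 points toward the negative binomial model, in line with the decision rule on the slide.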

Comparing Models with Outreg
- poisson art fem mar kid5 phd ment, nolog
- outreg using c:\\outregexample
- nbreg art fem mar kid5 phd ment, nolog
- outreg using c:\\outregexample, append xstats
- View the output file; with a bit of manipulation it will look like the table in the handout.

Generalized Event Count
- Special cases of the GEC
  (a) Negative Binomial, Var(y) > E(y)
  (b) Poisson, Var(y) = E(y)
  (c) Continuous Parameter Binomial, Var(y) < E(y)
- This model can be estimated using Gary King’s COUNT program.

Other Issues
- Exposure: people might have different exposure times (e.g. years in PhD program); you can add an exposure option or add a variable capturing the natural log of exposure time, with its coefficient constrained to 1 (an offset). This could also be applied if there is a maximum number of counts (use ln(y_max)).
- No zeros in your event count, e.g. observations enter the sample only after the first count occurs. Solution: use a truncated model.
- Different processes generating zeros: use a hurdle count/split population model.
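The exposure adjustment amounts to adding ln(exposure) to the linear predictor with a coefficient fixed at 1, so that the expected count scales in proportion to exposure time. A minimal Python sketch; the linear predictor xb = 0.5 and the exposure times are hypothetical.

```python
import math


def expected_count(xb, exposure):
    """E(y) = exposure * exp(xb) = exp(xb + ln(exposure))."""
    return math.exp(xb + math.log(exposure))


xb = 0.5   # hypothetical linear predictor (the rate per unit of exposure is exp(xb))

two_years = expected_count(xb, exposure=2.0)
four_years = expected_count(xb, exposure=4.0)

# Doubling the time at risk doubles the expected count; the rate is unchanged.
print(round(two_years, 4), round(four_years, 4), round(four_years / two_years, 4))
```

If the log of exposure were entered as an ordinary regressor, its coefficient would be estimated rather than fixed at 1, and the proportionality would no longer be imposed.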

Other Issues
- Zero-inflated data with different processes generating zeros: use ZIP or ZINB
- [STATA output: Zero-inflated negative binomial regression, Number of obs = 915, Nonzero obs = 640, Zero obs = 275, Inflation model = logit; count equation (art) and inflation equation (inflate), each with fem, mar, kid5, phd, ment, and _cons, plus /lnalpha and alpha; remaining numeric values not preserved]

Interpretation of ZINB
- listcoef, help
- Top half of output represents scientists who have the opportunity to publish (e.g. among those with the opportunity to publish, being a woman decreases the expected rate of publication by a factor of 0.91, holding all other factors constant).
- Bottom half represents the chance of being in the always zero group versus the not always zero group (e.g. being a woman increases the odds of not having an opportunity to publish by a factor of 1.89).
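The two halves of a zero-inflated model combine into a single probability of observing a zero: a share π of observations is always zero, and the rest produce zeros through the count process. The Python sketch below shows this for ZIP and ZINB and also converts the odds factor of 1.89 from the slide into a change in probability. The values of π, μ, alpha, and the baseline probability are hypothetical.

```python
import math


def zip_prob_zero(mu, pi_always_zero):
    """Pr(y = 0) in a zero-inflated Poisson: pi + (1 - pi) * exp(-mu)."""
    return pi_always_zero + (1.0 - pi_always_zero) * math.exp(-mu)


def zinb_prob_zero(mu, alpha, pi_always_zero):
    """Pr(y = 0) in a zero-inflated negative binomial (mean-dispersion form)."""
    nb_zero = (1.0 + alpha * mu) ** (-1.0 / alpha)
    return pi_always_zero + (1.0 - pi_always_zero) * nb_zero


def apply_odds_factor(prob, factor):
    """Probability after multiplying the odds prob / (1 - prob) by `factor`."""
    odds = prob / (1.0 - prob) * factor
    return odds / (1.0 + odds)


# Hypothetical inputs: 15% always-zero share, mean 1.7, alpha 0.5.
print(round(zip_prob_zero(mu=1.7, pi_always_zero=0.15), 4))
print(round(zinb_prob_zero(mu=1.7, alpha=0.5, pi_always_zero=0.15), 4))

# Odds factor of 1.89 (from the slide) applied to a hypothetical baseline of 0.20.
print(round(apply_odds_factor(0.20, 1.89), 4))
```

Note that a factor of 1.89 applies to the odds, not the probability: a baseline of 0.20 rises to about 0.32, not to 0.38.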

Other Issues
- Autocorrelation: if the event count series shows persistence, use a PAR model; the alpha parameter is misleading in this case.
- Plot the autocorrelation function of the event count series (ac command; you need to tsset the data first). Look for persistence in the series.
- The PAR model can be estimated in R or Gauss; see the example in the handout from Mitchell and Moore (2002).
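For readers working outside STATA, the sample autocorrelation that the ac command plots can be sketched in a few lines of Python. The yearly count series below is hypothetical and was made up to show persistence; the function uses the usual estimator that divides by the full-sample sum of squares.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((value - mean) ** 2 for value in series)
    num = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return num / denom


# Hypothetical event counts per year; neighbouring years look alike.
counts = [0, 1, 1, 2, 3, 3, 4, 3, 2, 2, 1, 0, 0, 1, 2, 3, 4, 4, 3, 2]

for lag in range(1, 5):
    print(lag, round(autocorrelation(counts, lag), 3))
```

Autocorrelations that start high and decay slowly across lags are the kind of persistence that suggests a PAR model.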