STATA WORKSHOP www.lss.stir.ac.uk www.longitudinal.stir.ac.uk.



STATA WORKSHOP
- Practical things
- Toilets
- Take a break – do the exercises
- Fire drill
- Management Centre

STATA WORKSHOP
- Culture of the workshop
- Practical
- Culture of sharing knowledge
- Ask questions
- Something will go wrong (computers eh?)
- You will tell us something we don’t know!

STATA WORKSHOP
Structure of the workshop
Introductory stuff – Dr Vernon Gayle
Data management – Dr Paul Lambert
More stuff – Dr Vernon Gayle
Some advanced stuff – Professor David Bell

Statistical Modelling Some general points – see handout.

STATA SOFTWARE – GOOD POINTS
- Does all the simple stuff (SPSS)
- Is specifically designed for survey analysis (all the weighting and design related issues are better catered for)
- Fits many more models than standard software
- You can get started easily (menus and help)
- There is a growing user community (lists etc)
- New features emerge almost daily
- There are good labour market opportunities (UK little known; USA well known)

STATA SOFTWARE – BAD POINTS
- Poor data handling (compared with SPSS etc)
- The weighting and design related issues can be complicated (some analysts ignore them)
- There are still some models that can’t be fitted (see GLIM4; SABRE; MLwiN etc)
- STATA syntax is a pain in the bum
- There is a growing user community, but they are generally GEEKBOYS (like myself!)
- New features emerge almost daily; these are sometimes tricky to get to grips with

Recurrent Events Analysis

The structure of many large-scale studies results in survey data being collected at a number of discrete occasions. In this situation, rather than being continuous, time lends itself to being conceptualized as a sequence of discrete occasions. Furthermore, social scientists are often substantively interested in whether a specific event has occurred. Taken together, these two points make a discrete-time event history approach appealing.

Recurrent events are merely outcomes that can take place on a number of occasions. A simple example is unemployment measured month by month. In any given month an individual can either be employed or unemployed. If we had data for a calendar year we would have twelve discrete outcome measures (i.e. one for each month).
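The monthly unemployment example can be laid out as "person-period" data, with one row per person per month. What follows is a minimal sketch in Python rather than Stata; the person ids, the sequences and the coding (1 = employed, 0 = unemployed) are invented for illustration and are not workshop data.

```python
# Hypothetical wide-format records: one entry per person, twelve monthly flags.
# Assumed coding for this sketch: 1 = employed, 0 = unemployed.
wide = {
    "A": [0] * 12,            # constantly unemployed
    "B": [1] * 12,            # constantly employed
    "C": [0] * 5 + [1] * 7,   # unemployed, then gets a job in month six
}


def to_person_period(wide_records):
    """Return one (person, month, outcome) row per person per month."""
    rows = []
    for person, outcomes in wide_records.items():
        for month, y in enumerate(outcomes, start=1):
            rows.append((person, month, y))
    return rows


long_rows = to_person_period(wide)
for row in long_rows[:3]:
    print(row)
```

Each person contributes twelve rows, so the three people above give 36 binary outcomes in total. This long layout is the one that panel estimators typically expect.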

Social scientists now routinely employ statistical models for the analysis of discrete data, most notably logistic and log-linear models, in a wide variety of substantive areas. I believe that the adoption of a recurrent events approach is appealing because it is a logical extension of these models.

Willett and Singer (1995) conclude that discrete-time methods are generally considered to be simpler and more comprehensible. Mastery of discrete-time methods also facilitates a transition to continuous-time approaches should that be required.

Willett, J. and Singer, J. (1995) Investigating Onset, Cessation, Relapse, and Recovery: Using Discrete-Time Survival Analysis to Examine the Occurrence and Timing of Critical Events. In J. Gottman (ed) The Analysis of Change (Hove: Lawrence Erlbaum Associates).

Employment (BHPS data)
Time (year/wave):  t1  t2  t3  t4
Y:                 0   0   1   0

Consider a binary outcome or two-state event:
0 = Event has not occurred
1 = Event has occurred
In the cross-sectional situation we are used to modelling this with logistic regression.

[Figures: rows of monthly observations illustrating five employment patterns]
- Constantly unemployed
- Constantly employed
- Employed in month 1 then unemployed
- Unemployed but gets a job in month six
- Mixed employment patterns (four observations)

Here we have a binary outcome – so could we simply use logistic regression to model it?
[Table: monthly outcomes for one observation: 0 0 0 0 0 0]

Our studio audience says…. Yes and No!

POOLED CROSS-SECTIONAL LOGIT MODEL
In conventional logistic regression models, each observation is assumed to be independent and a logistic link function is used. The contribution to the likelihood by the i-th case and the t-th event is given by

$$L_{it}(\beta) = \frac{[\exp(\beta' x_{it})]^{y_{it}}}{1 + \exp(\beta' x_{it})}$$

where $x_{it}$ is a vector of explanatory variables and $\beta$ is a vector of parameter estimates.
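The pooled likelihood contribution, [exp(β'x_it)]^y_it / (1 + exp(β'x_it)), can be evaluated directly. The sketch below is in Python rather than Stata and uses made-up values for β and x_it. The function names are hypothetical. The pooled log-likelihood simply sums the log contributions, which is exactly where the independence assumption enters.

```python
import math


def logit_contribution(beta, x, y):
    """Contribution [exp(b'x)]^y / (1 + exp(b'x)) for one case-occasion."""
    eta = sum(b * xi for b, xi in zip(beta, x))  # linear predictor beta'x_it
    return math.exp(eta) ** y / (1.0 + math.exp(eta))


def pooled_log_likelihood(beta, rows):
    """Sum of log contributions, treating every row as independent."""
    return sum(math.log(logit_contribution(beta, x, y)) for x, y in rows)


# Illustrative values only: an intercept plus one covariate.
beta = [0.0, 1.0]
rows = [([1.0, 0.0], 1), ([1.0, 0.0], 0), ([1.0, 2.0], 1)]
print(pooled_log_likelihood(beta, rows))
```

With y = 1 the contribution is the fitted probability of the event, and with y = 0 it is one minus that probability.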

This approach can be regarded as a naïve solution to our data analysis problem. We need to consider a number of technical issues… Note: If any economist or on-the-ball social scientist spots this, you will get your grant/paper rejected!

[Table: one observation measured at two months, Y1 = 0 and Y2 = 0]
Pickle’s tip - In repeated measures analysis we would require something like a ‘paired’ t test rather than an ‘independent’ t test because we can assume that Y1 and Y2 are related.
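To see why the pairing matters, here is a small worked comparison of the two t statistics on the same numbers. It is a sketch in Python; the five pairs of values are invented, and the continuous measurements are used only because the t test analogy calls for them.

```python
import math
import statistics


def independent_t(a, b):
    """Two-sample t statistic (equal variances assumed), ignoring the pairing."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    diff = statistics.mean(a) - statistics.mean(b)
    return diff / math.sqrt(pooled_var * (1 / na + 1 / nb))


def paired_t(a, b):
    """Paired t statistic computed on the within-person differences."""
    diffs = [ai - bi for ai, bi in zip(a, b)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical measurements on the same five people at two occasions.
y1 = [10.0, 12.0, 14.0, 16.0, 18.0]
y2 = [11.0, 13.5, 15.0, 17.5, 19.0]

print(independent_t(y1, y2))
print(paired_t(y1, y2))
```

Because each person's two values are strongly related, the paired statistic is far larger in magnitude than the independent one for these numbers. Ignoring the pairing throws away that information.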

Repeated measures data violate an important assumption of conventional regression models. The responses of an individual at different points in time will not be independent of each other. This problem has been overcome by the inclusion of an additional, individual-specific error term.
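A quick simulation illustrates the dependence. In this Python sketch each person has an individual-specific term u_i shared by both of their binary responses. The standard deviation of 2.0, the sample size and the seed are arbitrary choices for illustration, not values from the workshop.

```python
import math
import random


def simulate_pairs(n_people, sigma_u, seed=1):
    """Simulate two binary responses per person sharing a person-specific u_i."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_people):
        u = rng.gauss(0.0, sigma_u)         # individual-specific term
        p = 1.0 / (1.0 + math.exp(-u))      # P(y = 1 | u_i) under a logit link
        pairs.append((int(rng.random() < p), int(rng.random() < p)))
    return pairs


def correlation(pairs):
    """Pearson correlation between the first and second responses."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    cov = sum((a - m1) * (b - m2) for a, b in pairs) / n
    v1 = sum((a - m1) ** 2 for a, _ in pairs) / n
    v2 = sum((b - m2) ** 2 for _, b in pairs) / n
    return cov / math.sqrt(v1 * v2)


corr_shared = correlation(simulate_pairs(20000, sigma_u=2.0))
corr_none = correlation(simulate_pairs(20000, sigma_u=0.0))
print(corr_shared, corr_none)
```

When sigma_u is zero the two responses are independent and their correlation is close to zero. With a shared u_i the correlation is clearly positive, which is the dependence a pooled model ignores.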

The logical extension to the standard (vanilla) logit (i.e. the pooled analysis) is to use an appropriate longitudinal model: the random effects logit model, also known as the logistic mixture model, and called xtlogit in STATA.

The random effects model extends the pooled cross-sectional model to include a case-specific random error term to better represent the effects of residual heterogeneity. For a sequence of outcomes for the i-th case, the basic random effects model has an integrated (or marginal) likelihood, obtained by integrating the case-specific error term out of the likelihood.
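One common way to write this integrated likelihood is sketched here. It is offered as an assumption rather than a copy of the slide's own equation. Here u stands for the case-specific random error, f(u) for its density (STATA's random effects xtlogit takes it to be normal with mean zero) and T_i for the number of occasions observed for case i. None of these three symbols appear in the original text.

$$L_i(\beta) = \int_{-\infty}^{\infty} \prod_{t=1}^{T_i} \frac{[\exp(\beta' x_{it} + u)]^{y_{it}}}{1 + \exp(\beta' x_{it} + u)} \, f(u) \, du$$

Setting u to zero for everyone collapses each term in the product to the pooled cross-sectional contribution, which is why the pooled logit can be seen as a special case.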

Davies and Pickles (1985) have demonstrated that the failure to explicitly model the effects of residual heterogeneity may cause severe bias in parameter estimates. Using longitudinal data the effects of omitted explanatory variables can be overtly accounted for within the statistical model. This greatly improves the accuracy of the estimated effects of the explanatory variables.