Webinar 4: Academic tools of data analysis: Comparing SPSS, Stata and R and engaging with Higher Education institutions
Scottish Civil Society Data Partnership
Academic tools of data analysis: Comparing SPSS, Stata and R and engaging with Higher Education Institutions
Paul Lambert, University of Stirling
Presentation to the Scottish Civil Society Data Partnership Project (S-CSDP), Webinar, Mar 2016
Components:
1) Academic research and statistical software
2) Examples in using SPSS for research
3) Examples in using Stata for research
4) Examples in using R for research
5) HE institutional access and the University of Stirling ‘Affiliate Membership for Third Sector Researchers’ scheme
S-CSDP, 11 Mar 2016
1) Academic research and statistical software
Academic researchers have used software designed specifically for the statistical analysis of survey and survey-like data since at least the mid-1960s (there are hundreds of options, e.g. Lambert et al. 2015)
There is a distinction between ‘general purpose’ and ‘specialist’ statistical software
Theme of ‘documentation for replication’: software is better when it can provide a replicable trail of data analysis and data management activities
Understanding filestore and software: Linking things together
(i) Somewhere on your computer, you typically have a copy of a data file (and its documentation)
(ii) Your next step is ordinarily to open the data in a software package that can read it and then process it
(iii) If you follow good practice, you will use separately saved ‘command files’ that run processes through the software on the data and generate the subsequent outputs
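The three steps above can be sketched as a short script. In the deck's own tools this would be an SPSS syntax file, a Stata do-file or an R script; the sketch below uses Python only as a neutral illustration of the same pattern. The file name `survey.csv`, the variable `volunteer` and the helper functions are all hypothetical, and the demo writes its own toy data so that it runs anywhere.

```python
"""Hypothetical 'command file': every step from raw data to output is scripted."""
import csv
import tempfile
from collections import Counter
from pathlib import Path


def load_data(data_path):
    """Steps (i)/(ii): open a copy of the data file (assumed here to be a CSV)."""
    with open(data_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def frequency_table(rows, variable):
    """Count the cases in each category of one variable."""
    return Counter(row[variable] for row in rows)


def write_output(table, output_path):
    """Step (iii): save the output so that rerunning the script regenerates it."""
    total = sum(table.values())
    with open(output_path, "w", encoding="utf-8") as f:
        for category, n in sorted(table.items()):
            f.write(f"{category}\t{n}\t{100 * n / total:.1f}%\n")


if __name__ == "__main__":
    # Hypothetical toy data standing in for a real survey extract.
    workdir = Path(tempfile.mkdtemp())
    data_path = workdir / "survey.csv"
    data_path.write_text("id,volunteer\n1,yes\n2,no\n3,no\n4,yes\n5,no\n", encoding="utf-8")

    rows = load_data(data_path)
    table = frequency_table(rows, "volunteer")
    output_path = workdir / "volunteer_freq.txt"
    write_output(table, output_path)
    print(output_path.read_text(encoding="utf-8"))
```

Because the data file, the script and the output are kept separate, anyone with the same data file can rerun the script and obtain the same table, which an interactive point-and-click session does not guarantee.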
…software wars in academic survey research…
If working with microdata, we ordinarily use specialist statistical software for data management and analysis
People tend to become individually quite attached to their favourite(s)
See also Lambert et al. (2015), and see the ‘lab materials’ at ex_summer_school/
Stata’s origins are in economics, but it has spread to other disciplines. It supports a very wide range of data management and analysis functionality. It is popular in North American and in North and Central European academic survey research.
R is freeware with a wide range of capabilities. It is mostly used by statisticians and methodologists.
MLwiN is an example of specialist software designed for a particular analytical purpose (fitting multilevel models).
SPSS used to be the leading social science package for survey research in disciplines other than economics. It is still widely available and commonly taught and used.
‘Stat-JR’ offers downloadable integration between software packages, including freeware, through locally installed copies (m/software/statjr/)
Controlling software: Using ‘syntax’
Documentation as replicable ‘workflows’
Reproducible (for self)
Replicable (for all)
Paper trail for the whole lifecycle (cf. Dale 2006; Freese 2007)
In survey research, this means using clearly annotated syntax files (e.g. Long 2009)
[Figure: syntax examples]
Modern computing / data: there’s no excuse for not documenting / replicating!
New opportunities for ‘workflow modelling’
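One way to keep the ‘paper trail’ described above is to have every scripted run record what it used. The Python sketch below is a hypothetical helper, not a feature of SPSS, Stata or R: it fingerprints the input data file with SHA-256, notes the Python version and the run time, and appends one JSON line per run to a log file. The function names, the record fields and the script name are illustrative assumptions.

```python
"""Hypothetical 'paper trail' helper: one JSON log line per scripted run."""
import hashlib
import json
import platform
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def file_sha256(path):
    """Fingerprint the input data so a rerun can confirm it used the same file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large survey files need not fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(data_path, script_name, steps):
    """Collect the details a reader would need to reproduce the run."""
    return {
        "script": script_name,
        "data_file": str(data_path),
        "data_sha256": file_sha256(data_path),
        "python_version": platform.python_version(),
        "run_at_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "steps": list(steps),
    }


def append_log(record, log_path):
    """Append the record as one JSON line, building a cumulative trail."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    # Toy data file standing in for a real survey extract.
    workdir = Path(tempfile.mkdtemp())
    data_path = workdir / "survey.csv"
    data_path.write_text("id,volunteer\n1,yes\n2,no\n", encoding="utf-8")

    record = provenance_record(
        data_path,
        script_name="analysis.py",  # hypothetical script name
        steps=["load survey.csv", "tabulate volunteer"],
    )
    append_log(record, workdir / "run_log.jsonl")
    print((workdir / "run_log.jsonl").read_text(encoding="utf-8"))
```

The checksum serves reproducibility ‘for self’ (was this the same file as last time?), while the listed steps and versions help others replicate the run.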
The tension between ‘simpler’ and ‘more complex’ statistical analysis
‘Complex’ analytical methods
E.g. statistical models; sampling weights and survey design factors; sensitivity analysis for data permutations; ‘multivariate’ and ‘multiprocess’ systems
Can be thought of as featuring a substantial element of ‘control’ for other factors relevant to the social mechanisms, e.g. ‘statistical’ models with many parameters expressing the influences of ‘background variables’ and complex data structures
‘Simpler’ analytical methods
E.g. univariate distributions, bivariate comparisons, accessible graphical summaries and headline percentages
Can be appealing to communicate and still have important strengths, e.g. statistically representative patterns
Can introduce risks when summarising social mechanisms: spurious or unduly simplified trends and associations (e.g. overlooked interactions); incorrect point estimates and/or incorrect representation of uncertainty; and they encourage the view that ‘statistics equal lies’
> Academic software tends to support ‘complex’ methods, whereas many accessible (e.g. online) data analysis tools use ‘simpler’ methods and cannot readily be adapted to more complex analytical methods
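A toy Python sketch can make two of these risks concrete. All numbers are invented for illustration, and the groups, age bands and weights are hypothetical; in practice the same checks would be run in SPSS, Stata or R with the survey's own design variables. First, a headline percentage that ignores sampling weights gives a different point estimate from the weighted one. Second, a raw difference between two groups disappears once a background variable (here, age band) is held constant by comparing within strata.

```python
"""Toy illustration: unweighted headlines and uncontrolled comparisons can mislead."""


def weighted_share(flags, weights):
    """Weighted proportion of cases whose flag is True (e.g. 'volunteers')."""
    return sum(w for flag, w in zip(flags, weights) if flag) / sum(weights)


def make_records(group, young_n, young_yes, old_n, old_yes):
    """Build hypothetical (group, age_band, volunteers) records for one group."""
    records = [(group, "young", i < young_yes) for i in range(young_n)]
    records += [(group, "old", i < old_yes) for i in range(old_n)]
    return records


def rate(records, group, age_band=None):
    """Share volunteering in a group, optionally within a single age band."""
    selected = [
        vol for g, age, vol in records
        if g == group and (age_band is None or age == age_band)
    ]
    return sum(selected) / len(selected)


if __name__ == "__main__":
    # 1) Sampling weights: volunteers were over-sampled, so they carry smaller weights.
    flags = [True, True, True, False, False, False]
    weights = [0.5, 0.5, 0.5, 1.5, 1.5, 1.5]
    print(f"unweighted: {sum(flags) / len(flags):.1%}")
    print(f"weighted:   {weighted_share(flags, weights):.1%}")

    # 2) Background variable: volunteering depends only on age band,
    #    but group A has more young members than group B.
    records = make_records("A", 10, 2, 5, 3) + make_records("B", 5, 1, 10, 6)
    print(f"overall   A {rate(records, 'A'):.1%}  B {rate(records, 'B'):.1%}")
    for band in ("young", "old"):
        print(f"{band:<9} A {rate(records, 'A', band):.1%}  B {rate(records, 'B', band):.1%}")
```

With these invented figures the unweighted headline is 50% against a weighted 25%, and the overall gap between groups A and B (about 33% against 47%) vanishes within each age band (20% and 60% in both groups): the raw comparison reflects the groups' age composition rather than group membership.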
2) Examples in using SPSS for research
Installation comments
SPSS interface
Using command syntax
Applied example: volunteering in the BHPS (British Household Panel Survey)
Sources of help, e.g.:
Field 2013
UCLA statistical software:
[Figure: the SPSS ‘Syntax’ editor; alternatively, use ‘Paste’ to get syntax code]
3) Examples in using Stata for research
Installation comments
Stata interface
Using command syntax
Applied example: volunteering in the ESS (European Social Survey)
Sources of help, e.g.:
Kohler & Kreuter 2012
UCLA statistical software:
[Figure: typical format of a ‘do’ file (‘command’ or ‘syntax’ file); typical Stata output window (results)]
4) Examples in using R for research
Installation comments
R interface
Using command syntax
Example: sample from Lambert (2015)
Sources of help, e.g.:
Field et al. 2012
Quick-R:
UCLA statistical software:
[Figure: standard R interface; RStudio]
5) HE institutional access and the University of Stirling ‘Affiliate Membership for Third Sector Researchers’ scheme
What collaborative opportunities are out there?
‘RCUK’ funding opportunities
ESRC SDAI (Secondary Data Analysis Initiative), which explicitly promotes impact and collaboration (ESRC 2015)
Secondary analysis in general appeals to major funders
Comparative research opportunities
Other HE sector collaboration potential
Further funded project options
Unfunded research capacity
PhD studentship sponsorship/collaborative schemes
Training enrolments and taught course projects, e.g. MSc dissertation projects
Routes to HE institutional access…?
Feedback at previous events highlights barriers to using secondary surveys for research without HE infrastructural support:
Filestore
Software
Library resources
Consulting colleagues
Collaboration with HE staff is often a good solution:
A friendly researcher/faculty member
A funded post, e.g. a sponsored PhD
For updates on a prospective new scheme that should help here, the University of Stirling Affiliate Membership scheme for Third Sector Researchers (AM-TSR), please see:
References cited
Dale, A. (2006). Quality Issues with Survey Research. International Journal of Social Research Methodology, 9(2).
Field, A. (2013). Discovering Statistics Using IBM SPSS Statistics, 4th edition. London: Sage.
Field, A., Miles, J., & Field, Z. (2012). Discovering Statistics Using R. London: Sage.
Freese, J. (2007). Replication Standards for Quantitative Social Science: Why Not Sociology? Sociological Methods and Research, 36(2).
Kohler, H. P., & Kreuter, F. (2012). Data Analysis Using Stata, 3rd edition. College Station, TX: Stata Press.
Lambert, P. S. (2015). Advances in data management for social survey research. In R. Procter & P. Halfpenny (Eds.), Innovations in Digital Research Methods. London: Sage.
Lambert, P. S., Browne, W. J., & Michaelides, D. T. (2015). Contemporary developments in statistical software for social scientists. In R. Procter & P. Halfpenny (Eds.), Innovations in Digital Research Methods. London: Sage.
Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press.