Analyzing Surveys Marcos Carzolio Associate Collaborator for LISA PhD Student Department of Statistics, VT Laboratory for Interdisciplinary Statistical.


1 Analyzing Surveys
Marcos Carzolio, Associate Collaborator for LISA; PhD Student, Department of Statistics, VT
Laboratory for Interdisciplinary Statistical Analysis

2 Outline
Data Cleaning and Preprocessing: Outlier Detection; Missing Value Imputation
Visualizing and Understanding Data: Boxplots, Histograms, and Scatterplots; Correlation Matrices
Analyzing Data: Contingency Tables; Analysis of Variance (ANOVA); Regression

3 Laboratory for Interdisciplinary Statistical Analysis
LISA helps VT researchers benefit from the use of statistics: experimental design, data analysis, interpreting results, grant proposals, and software (R, SAS, JMP, SPSS...).
Our goal is to improve the quality of research and the use of statistics at Virginia Tech.

4 How can LISA help?
Formulate the research question.
Screen data for integrity and unusual observations.
Implement graphical techniques to showcase the data: what is the story?
Develop and implement an analysis plan to address the research question.
Help interpret results.
Communicate! Help with writing the report or giving the talk.
Identify future research directions.

5 Laboratory for Interdisciplinary Statistical Analysis
Collaboration: From our website, request a meeting for personalized statistical advice. Great advice right now: meet with LISA before collecting your data.
Short Courses: Designed to help graduate students apply statistics in their research.
Walk-In Consulting: Monday to Friday, 1-3 pm in 401 Hutcheson; also Tuesdays, 1-3 pm in ICTAS Café X, and Thursdays, 1-3 pm in the GLC Video Conf. Room. Walk-in sessions are for questions requiring less than 30 minutes.
All services are FREE for VT researchers.
LISA helps VT researchers benefit from the use of statistics: designing experiments, analyzing data, interpreting results, grant proposals, and using software (R, SAS, JMP, Minitab...).

6 Some Useful Resources
R Statistical Computing Software can be downloaded for free from:
RStudio, a free Integrated Development Environment:
For a more interactive and user-friendly experience, try JMP, downloadable from the Virginia Tech software library: /jmp/index.html
Amelia II: A Program for Missing Data. Visit:

7 Types of Survey Data
Data Type | Description | Examples | Statistics
Nominal | Data whose labels have no intrinsic order or relative meaning | Strawberry, Banana, Hispanic | Mode
Ordinal | Data with an ordered structure | Small, Extra Large, Likert scale* | Median and percentiles
Interval (continuous or discrete) | Data with meaningful differences | Degrees Celsius, birthdates, GPS coordinates | Mean, standard deviation, correlation
Ratio (continuous or discrete) | Data with meaningful ratios (a true zero) | Weight, income, length | Mean, standard deviation, correlation
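The matching of data type to summary statistic in the table above can be sketched in R, the software whose output appears on later slides. The responses below are hypothetical, made up for illustration. For the ordinal item, this sketch takes the median of the level codes with `quantile(..., type = 1)`, which always returns an observed code, so the result maps back to an actual category.

```r
# Hypothetical responses (illustrative only, not data from the slides)
fruit  <- factor(c("Strawberry", "Banana", "Banana", "Apple", "Banana"))  # nominal
size   <- factor(c("Small", "Large", "Extra Large", "Small", "Large"),
                 levels = c("Small", "Large", "Extra Large"),
                 ordered = TRUE)                                         # ordinal
weight <- c(61.2, 75.0, 82.4, 58.9, 70.3)                                # ratio (kg)

# Nominal: report the mode, i.e. the most frequent label
mode_label <- names(which.max(table(fruit)))

# Ordinal: median of the integer level codes; type = 1 returns an observed
# code, so it can be mapped back to a category label
median_code  <- quantile(as.integer(size), probs = 0.5, type = 1, names = FALSE)
median_level <- levels(size)[median_code]

# Interval/ratio: means and standard deviations are meaningful
weight_mean <- mean(weight)
weight_sd   <- sd(weight)

cat("Mode:", mode_label, "| Median:", median_level,
    "| Mean weight:", weight_mean, "\n")
```

Taking the mean of `as.integer(size)` would also run, but it treats the gap between "Small" and "Large" as equal to the gap between "Large" and "Extra Large", which ordinal data do not guarantee.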

8 Outlier Detection and Handling
Outliers are data points that deviate so far from the main body of the data as to arouse suspicion about their origins.
Visualize your data: boxplots, histograms, and scatterplots.
Only remove outliers that are verifiable errors; extremeness in an observation is not in itself cause for removing it.
R package: outliers
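One common screening rule, the 1.5 × IQR rule that boxplots use to draw individual points, can be applied in base R with `boxplot.stats()`. This is a minimal sketch on hypothetical heights, not the method of the outliers package named above; it only flags values for follow-up, in keeping with the advice to remove only verifiable errors.

```r
# Hypothetical heights in inches; 610 is not a plausible human height,
# so it is a verifiable data-entry error
height <- c(64, 66, 67, 68, 68, 69, 70, 71, 72, 610)

# boxplot.stats() applies the rule boxplot() uses to draw points beyond the
# whiskers: values more than coef * (hinge spread) outside the hinges
flagged <- boxplot.stats(height, coef = 1.5)$out

# Flag rather than delete; drop a value only after confirming it is an error
is_flagged <- height %in% flagged
height[is_flagged]
```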

9 Missing Value Imputation
Imputation is the process of filling in the missing values of a dataset.
Before considering imputation, try following up with respondents to obtain their true answers.
Imputation can be very tricky (come to LISA for help).
If only one or two values are missing in a large dataset, the mean of the available values is a reasonable best guess.
Honaker, James et al., AMELIA II: A Program for Missing Data
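The mean-imputation rule of thumb above takes only a few lines of base R. The survey item and the helper name `impute_mean` are hypothetical. The sketch is only appropriate for the "one or two missing values" case: filling many gaps with the mean shrinks the variance and ignores relationships with other variables, which is where multiple imputation tools such as Amelia II come in.

```r
# Hypothetical survey item (hours of sleep) with one missing answer
hours_sleep <- c(7, 6, 8, NA, 7, 9, 6, 7)

# Hypothetical helper: replace every NA with the mean of the observed values
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

hours_filled <- impute_mean(hours_sleep)
hours_filled
```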

10 Visualizing Your Data: Boxplots
Source: SAS/GRAPH® 9.2: Statistical Graphics Procedures Guide, Second Edition
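Side-by-side boxplots by group can be drawn in base R with a formula. The data frame below is simulated and hypothetical; `boxplot()` also returns the statistics it plotted (hinges, median, whisker ends, group sizes), which is handy for checking what the picture shows.

```r
set.seed(1)

# Hypothetical data: heights (in) for 20 female and 20 male respondents
survey <- data.frame(
  gender = factor(rep(c("F", "M"), each = 20)),
  height = c(rnorm(20, mean = 64, sd = 3), rnorm(20, mean = 70, sd = 3))
)

# One box per group; the returned list holds the plotted statistics
bp <- boxplot(height ~ gender, data = survey,
              xlab = "Gender", ylab = "Height (in)")
bp$stats
```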

11 Visualizing Your Data: Histograms
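A histogram of a single numeric survey item is one call in base R. The ages are simulated and hypothetical; `hist()` uses Sturges' rule for the bins by default and invisibly returns the breaks and counts it plotted.

```r
set.seed(2)

# 100 hypothetical respondent ages
age <- round(rnorm(100, mean = 35, sd = 10))

# Draw the histogram and keep the returned bin breaks and counts
h <- hist(age, main = "Respondent age", xlab = "Age (years)")
h$counts
```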

12 Visualizing Your Data: Scatter Plots
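A scatter plot of two numeric items, with a least-squares line added as a visual guide, can be sketched as follows. The variable names echo those on the regression slide, but the values are simulated and hypothetical.

```r
set.seed(3)

# Hypothetical paired measurements (inches)
dad_height  <- rnorm(40, mean = 70, sd = 3)
resp_height <- 0.5 * dad_height + rnorm(40, mean = 34, sd = 2)

# Scatter plot with a dashed least-squares line
plot(dad_height, resp_height,
     xlab = "Father's height (in)", ylab = "Respondent's height (in)")
abline(lm(resp_height ~ dad_height), lty = 2)
```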

13 Understanding Your Data: Correlation Matrices
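A correlation matrix for several numeric items comes from `cor()` applied to a data frame. The three variables here are simulated and hypothetical. Setting `use = "pairwise.complete.obs"` computes each pair from the respondents who answered both items, which matters when survey answers are missing.

```r
set.seed(4)
n <- 50

# Hypothetical numeric survey items
dad_height  <- rnorm(n, mean = 70, sd = 3)
resp_height <- 0.5 * dad_height + rnorm(n, mean = 34, sd = 2)
exercise    <- rnorm(n, mean = 4, sd = 1.5)   # hours per week
survey <- data.frame(dad_height, resp_height, exercise)

# Pearson correlations, rounded for display
cm <- round(cor(survey, use = "pairwise.complete.obs"), 2)
print(cm)
```

For ordinal items such as Likert scales, passing `method = "spearman"` to `cor()` gives rank correlations, which fit the median-and-percentiles row of the data-type table better than Pearson correlations.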

14 Contingency Tables
A contingency table tabulates the number of responses in each category and helps to visualize the distribution of the data.
Use the approximate χ² test for independence:

    Pearson's Chi-squared test
    data:  tab
    X-squared = , df = 2, p-value =

    Warning message:
    In chisq.test(tab) : Chi-squared approximation may be incorrect

R gives this warning when any expected cell count is below 5.
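The test above can be reproduced on a 2 × 3 table, which gives the same df = 2 as the slide. The counts are hypothetical. Checking the expected counts first shows whether the warning will appear; here every expected count is at least 5, so it does not.

```r
# Hypothetical cross-tabulation of 60 respondents
tab <- matrix(c(12,  8, 10,
                 9, 11, 10),
              nrow = 2, byrow = TRUE,
              dimnames = list(gender = c("F", "M"),
                              preference = c("Strawberry", "Banana", "Other")))

test <- chisq.test(tab)

# df = (rows - 1) * (columns - 1) = 2
test$parameter

# The approximation is trusted when every expected count is at least 5
all(test$expected >= 5)
test
```

When some expected counts are small, `chisq.test(tab, simulate.p.value = TRUE)` or `fisher.test(tab)` avoids relying on the χ² approximation.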

15 Analysis of Variance
A technique used to test for differences between group means.
Always plot your data before doing analyses.

    Call:
       aov(formula = resp_height ~ gender)

    Terms:
                    gender Residuals
    Sum of Squares
    Deg. of Freedom      1        39
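A one-way ANOVA with the same structure as the output above (two groups, 41 respondents, so 1 and 39 degrees of freedom) can be sketched as follows. The heights are simulated and hypothetical; the boxplot comes first, as the slide advises.

```r
set.seed(5)

# Hypothetical data: 20 female and 21 male respondents (41 in total)
survey <- data.frame(
  gender      = factor(rep(c("F", "M"), times = c(20, 21))),
  resp_height = c(rnorm(20, mean = 64, sd = 3), rnorm(21, mean = 70, sd = 3))
)

# Plot first
boxplot(resp_height ~ gender, data = survey)

# One-way ANOVA: 1 df for gender, 41 - 2 = 39 residual df
fit <- aov(resp_height ~ gender, data = survey)
fit
summary(fit)
```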

16 Regression
Regression is actually a generalization of ANOVA.
Again, always plot your data.

    Call:
    lm(formula = exercise ~ dad_height)

    Residuals:
        Min      1Q  Median      3Q     Max

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)
    dad_height

    Residual standard error:  on 37 degrees of freedom
      (8 observations deleted due to missingness)
    Multiple R-squared:  ,  Adjusted R-squared:
    F-statistic:  on 1 and 37 DF,  p-value:
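The output above can be mimicked with simulated, hypothetical data: 47 respondents, 8 of whom skipped the exercise question, leave 39 complete cases and 37 residual degrees of freedom. The second half of the sketch illustrates the claim that regression generalizes ANOVA: with a single factor as the predictor, `lm()` reproduces the F statistic from `aov()`.

```r
set.seed(6)

# Hypothetical survey: father's height (in) and weekly exercise (hours)
n <- 47
dad_height <- rnorm(n, mean = 70, sd = 3)
exercise   <- 10 - 0.05 * dad_height + rnorm(n, mean = 0, sd = 1.5)
exercise[sample(n, 8)] <- NA   # 8 hypothetical non-responses

# Simple linear regression; rows with NA are dropped (na.omit default),
# leaving 39 cases and 39 - 2 = 37 residual degrees of freedom
fit <- lm(exercise ~ dad_height)
summary(fit)

# Regression generalizes ANOVA: a one-factor lm() gives the same F as aov()
gender      <- factor(rep(c("F", "M"), each = 15))
resp_height <- c(rnorm(15, mean = 64, sd = 3), rnorm(15, mean = 70, sd = 3))
f_lm  <- anova(lm(resp_height ~ gender))[["F value"]][1]
f_aov <- summary(aov(resp_height ~ gender))[[1]][["F value"]][1]
all.equal(f_lm, f_aov)
```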

17 Other Useful Resources
A PowerPoint on more automated outlier detection techniques: 2010/kdd10-outlier-tutorial.pdf
R package outliers: project.org/web/packages/outliers/outliers.pdf
On multiple imputation:
