Stats Lunch: Day 2 Screening Your Data: Why and How.

Stats Lunch: Day 2 Screening Your Data: Why and How

Data Screening Much of the following info comes from a more thorough (and frankly much better) description of data screening/cleaning: Tabachnick & Fidell (2001). Cleaning up your act: screening data prior to analysis, Using Multivariate Statistics (4 th Ed.), Allyn & Bacon, MA. Why Screen? 1)Ensure the accuracy of your data hence insuring you don’t publish absolute sh!t… 3)Maximize the power of your data set Which helps ensure you can find at least SOME sh!t to publish… 2)Ensure you meet assumptions of your planned analyses, and to identify any special analysis needs Helps ensure your stats aren’t absolute sh!t…

The Usual Suspects 1)Inaccurate Data a) Data entry errors b) Data merge errors 2) Missing Values 3)Outliers 4)Lack of Normality 5)Lack of Linearity 6)Homoscedasticity and Multivariate Problems

Avoiding Inaccurate Data 1)KNOW WHAT YOUR DATA ARE SUPPOSED TO LOOK LIKE What possible values are (including range). How many categories of subjects (controls, patients, etc) there should be. How many subjects should be in each group. 2)Find out what your data DO look like..  Run Descriptives and Frequencies in SPSS  Can use “Explore” and/or “Crosstabs” option to look at values by Group

Example: Avoiding Inaccurate Data 1)Choose “Descriptive Statistics” 2)Choose “Explore” 3)Explore by “Factor List” 4)Enter variables you care about in “Dependent List” 5)Click on Display “Statistics”, “Plots”, or “Both” 6)And then on the appropriated tabs… Yes, I realize the crossed arrows make for a terrible slide…

Example: Avoiding Inaccurate Data 7)Choose whatever you want, then click “continue” 8)It might be helpful to then click on options and choose “Report Missing Values”, then “continue” and “Ok”

What’s wrong with this picture?  Look for any warnings  Any unexpected groups  Unexpected N’s  Missing Values

What’s wrong with this picture? These are ITC Data  Look for means and Standard Deviations that are very different from other groups.  Do these differences make any sense?

What’s wrong with this picture?  Look for extreme cases  Do they look like OUTLIERS or INNACURATE DATA?

Dealing With Missing Data  Are missing values RANDOM or Systematic E.G., are 2 subjects missing electrode F4, or are %50 of your ketamine group missing day two data…  Options if missing values are RANDOM: 1)Omit the case (but don’t DELETE it) 2)Substitute in a score  Of that subject’s scores (e.g., mean of other frontal electrodes)  Group mean (e.g., F4 mean of all other controls)  Substitute predicted score from regression  Lots of other more complicated substitution algorithms.

Dealing With Missing Data  Options if missing values are NOT RANDOM: 1)You’re pretty much screwed…regardless of your response to problem, it will effect the validity of the study. However:  You might randomly delete subjects in other groups to match the sample sizes with your attenuated group.  Analyze data: both with and without missing cases  Treat missing data as a variable: Use of dummy variables in multiple regression models.

Outliers and Normality  Parametric Statistics (e.g., r, t- tests, ANOVA, Regressions) are built on the assumption of NORMALITY and LINEARITY…  If data are skewed or kurtotic, then this assumption is violated  If violation is large enough, it effects the accuracy of our results (typically towards making Type I errors).  Outliers (extreme scores) can be a major contributor to both (but particularly skewness)

Looking for Non-Normality  Can get measures of skewness and kurtosis from several “Descriptives” modules in SPSS  A basic rule of thumb is that if either number is greater than 1, or less than –1, you might have a problem…  However, dependent on sample size

Looking for Outliers  Outliers can be detected using several SPSS routines, but the easiest is by plotting your data (same for shape of distribution).  Two ways I like to do it 1)The Histogram A.Go to “graphs” and then select “Histogram”

Looking for Outliers B.Enter Variable you want to look at C.I like to display the normal curve… D.Then click “ok” Any score falling outside the curve is an outlier

Looking for Outliers  Another option is to “panel” by your Grouping variable…

Looking for Outliers 2) The sneaky scatterplot method  Make sure your subject # variable is numeric  Plot subject number on X axis, and score of interest on Y  Easiest to do if you “Select Cases” in the data menu (e.g., look at each group one at a time)… A.Go to “graphs” and then select “Scatter/Dot”

B.Choose the kind of plot you want and click “define” Looking for Outliers C. Enter the variales you want and click “ok”

Looking for Outliers  You’ll get a graph where you can easily see if there’s outliers…. D. By messing with the scale and chart size under the “chart editor” function (double click the graph on your output page), you can figure out the subject number of the outlier

Dealing with Outliers 1)Make sure it’s not a mistake (data entry error, impossible value, subject included in wrong group, etc.) 2)If it’s “real” data then you can:  Run with the data as is (and be sure you’re prepared to defend this choice)  Delete the case: this is dangerous ground, but can sometimes be justified (e.g., if z > 3…and again, be sure you can defend your choice)  Transform the data

Transformations  Mathematical routine applied to ALL data in your set, designed to reduce effects of extreme values and/or kurtosis (helps for both skewness and kurtosis): Doesn’t effect relationship between scores! Common Transformations: 1)Taking Square Root: Good for positive skew 2)Taking the Log10: Good for substantial positive skew 3)Taking the Inverse: good for severe positive skewness  Might need to add or subtract a constant from each score prior to transforming:  Add: if you have positively skewed data with some scores = 0  Subtract: If you’re dealing with negative skewness

Transformations These transforms are done using “Compute” commands in SPSS… Tabachnick and Fidel, Table 4.3

Example of a SPSS Transformation 1)Click on “transform”, and then “compute” 2)Name your new variable 3)Type in your expression 4)Click OK

Example of a SPSS Transformation 5)Rerun your descriptives, graphs, etc. Before: After Square Root Transform: If first you don’t succeed… After Log Transform

Linearity The second major assumption of parametric stats is that relationships are LINEAR  If your data fail to meet this assumption the stats won’t work…even if there is a real relationship there…  Another reason why you want to plot out your data…  We’ll deal with multivariate linearity at some later lunch…

If all else fails:  No matter how you transform them, your data are non- normal, or you see a non-linear relationship between your variables… 1)Run the data using parametric stats, but recognize (and describe/defend) limitations:  Use very conservative alpha levels (e.g., p <.001)  Treat your inferential stats primarily as descriptors… 2)Use nonparametric statistics

Stats Lunch: Day 2 Screening Your Data: Why and How.

Similar presentations

Presentation on theme: "Stats Lunch: Day 2 Screening Your Data: Why and How."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Stats Lunch: Day 2 Screening Your Data: Why and How.

Similar presentations

Presentation on theme: "Stats Lunch: Day 2 Screening Your Data: Why and How."— Presentation transcript:

Similar presentations

About project

Feedback