Statistical analysis
Why? (Besides making your life difficult …) Scientists must collect data AND analyze it. Does your data support your hypothesis? Is it valid? Statistics helps us find relationships between sets of data. You are the scientist now, so you must be comfortable analyzing your data.
Let’s look at two sets of data. Sample 1: -10, 0, 10, 20, 30. Sample 2: 8, 9, 10, 11, 12. What can you tell me about this data?
Mean: the “average” of the data, a measure of central tendency. Sample 1: -10, 0, 10, 20, 30. Sample 2: 8, 9, 10, 11, 12. Mean = 10 for both samples. Is this analysis complete? NO!
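If you want to check the mean yourself, here is a minimal sketch in Python using only the standard library. The variable names are made up for illustration.

```python
from statistics import mean

# The two samples from the slide
sample_1 = [-10, 0, 10, 20, 30]
sample_2 = [8, 9, 10, 11, 12]

# Mean = sum of the values divided by how many values there are
mean_1 = mean(sample_1)
mean_2 = mean(sample_2)

print(mean_1, mean_2)  # both means equal 10
```

Same mean, very different data: that is why the mean alone is not a complete analysis.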
Range: how far is the spread? Largest # - smallest #. Sample 1: -10, 0, 10, 20, 30; range = 30 - (-10) = 40. Sample 2: 8, 9, 10, 11, 12; range = 12 - 8 = 4. Does this data help? Yes, Sample 1 is more dispersed. Obvious? Perhaps, but now shown mathematically.
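The range calculation can be sketched in a couple of lines of Python. The helper name `data_range` is hypothetical, chosen only for this example.

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

sample_1 = [-10, 0, 10, 20, 30]
sample_2 = [8, 9, 10, 11, 12]

range_1 = data_range(sample_1)  # 30 - (-10) = 40
range_2 = data_range(sample_2)  # 12 - 8 = 4

print(range_1, range_2)
```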
Something more … standard deviation (SD). SD is a measure of how individual data points are dispersed around the mean.
Assuming a normal data distribution (bell curve): about 68% of all collected values lie within +/- 1 SD of the mean, and about 95% lie within +/- 2 SD of the mean. So what?
Standard deviation. A small SD indicates the data values are clustered around the mean; it may also indicate few extreme data points. A large SD indicates the data values are spread out; it may also indicate extreme data points. Outliers?
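Where do 68% and 95% come from? As a rough check, the sketch below asks Python's standard-library `NormalDist` how much of a bell curve lies within 1 and 2 SD of the mean. The helper name `fraction_within` is an assumption for illustration, and the figures apply only to an ideal normal distribution.

```python
from statistics import NormalDist

# An ideal bell curve with mean 0 and SD 1
standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of a normal distribution lying within +/- k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_1_sd = fraction_within(1)  # roughly 0.68
within_2_sd = fraction_within(2)  # roughly 0.95

print(round(within_1_sd, 3), round(within_2_sd, 3))
```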
Standard deviation
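For reference, one common way to write the standard deviation of a sample is shown below. The sample form (dividing by n - 1) is an assumption here; it is the form that reproduces the SD values of 15.8 and 1.58 quoted for Sample 1 and Sample 2 in this presentation.

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}{n - 1}}
```

Here x_i is each data point, x̄ is the mean, and n is the number of data points. For Sample 2 (8, 9, 10, 11, 12) the squared differences from the mean are 4, 1, 0, 1, 4, so s = sqrt(10 / 4), which is about 1.58.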
Let’s practice …
Let’s compare … Sample 1: SD = 15.8. Sample 2: SD = 1.58. How can I use this in my lab?
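These two SD values can be reproduced with a short Python sketch. It assumes the sample standard deviation (`statistics.stdev`, which divides by n - 1), since that is what matches the numbers above.

```python
from statistics import stdev

sample_1 = [-10, 0, 10, 20, 30]
sample_2 = [8, 9, 10, 11, 12]

# stdev() is the SAMPLE standard deviation (divides by n - 1)
sd_1 = stdev(sample_1)
sd_2 = stdev(sample_2)

print(round(sd_1, 1), round(sd_2, 2))  # 15.8 1.58
```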
Error bars. Error bars represent the variability of your data. They can show: STANDARD DEVIATION, range, or measurement uncertainties.
Error bars. On a bar graph, the bar represents the mean of your data and the error bars represent +/- 1 SD.
Error bars. On a line graph, the point represents the mean of your data and the error bars represent +/- 1 SD.
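Before you draw an error bar, you need three numbers: the mean and the two ends of the bar. A minimal sketch, assuming +/- 1 sample SD and a hypothetical helper called `error_bar`:

```python
from statistics import mean, stdev

def error_bar(values):
    """Return (lower, centre, upper) for a mean +/- 1 SD error bar."""
    centre = mean(values)   # height of the bar, or position of the point
    sd = stdev(values)      # sample SD (an assumption; divides by n - 1)
    return centre - sd, centre, centre + sd

lower_2, centre_2, upper_2 = error_bar([8, 9, 10, 11, 12])

print(round(lower_2, 2), centre_2, round(upper_2, 2))
```

Graphing tools usually ask for the mean and the SD separately and draw the bar ends for you.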
t-test. The t-test determines whether the difference between 2 sample means is statistically significant. Is the difference significant? Is the difference due to your variable, or is it random chance? How valid is your data? The t-test gives the probability (p value) that a difference this large would arise by random chance alone. A p value of 0.05 (5%) means a 5% chance that the difference is due to random chance, or 95% confidence that it is not … Key word: confidence! You want 95% confidence or higher (p of 0.05 or lower): then you can conclude the difference is likely due to your variable.
t-test. For tests, you do NOT need to calculate t-values, but you must be able to read a t-chart! For internal assessments, you may use calculators or Excel to calculate t-values.
This is the range you are hoping for: the difference between your samples has a HIGH probability of being due to your variable (and not chance). You need to be able to calculate degrees of freedom.
Calculating degrees of freedom. df = (n₁ + n₂) - 2, where n₁ = size of sample 1, n₂ = size of sample 2, and 2 = # of samples.
Calculating degrees of freedom. df = (n₁ + n₂) - 2. Sample 1: -10, 0, 10, 20, 30; n₁ = 5. Sample 2: 8, 9, 10, 11, 12; n₂ = 5. df = (5 + 5) - 2, so df = 8.
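For internal assessments, here is a rough sketch of how df and a t-value could be calculated in Python. The presentation does not say which t formula to use, so the pooled-variance (Student's) two-sample t-test is assumed, because its degrees of freedom match the df formula above. The function name `pooled_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(sample_a, sample_b):
    """Return (t, df) for a two-sample t-test with pooled variance."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = (n_a + n_b) - 2
    # Combine the two sample variances, weighted by their own df
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / standard_error
    return t, df

sample_1 = [-10, 0, 10, 20, 30]
sample_2 = [8, 9, 10, 11, 12]

t, df = pooled_t(sample_1, sample_2)
print(t, df)  # the means are equal, so t is 0; df is 8
```

Because both samples have the same mean, t comes out as 0: there is no difference between the means to test, however different the spreads are.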
Using the t-table. If df = 8 and t = 3.5, is this a significant difference? Yes: there is less than a 1% probability that the difference in the data is due to chance. Therefore, we can be more than 99% confident that the difference in the data is due to our variable.
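Reading a t-table is just a lookup, which can be sketched in Python. The critical values below are the usual two-tailed values from a standard t-table for a few df; whether your own t-chart is two-tailed is an assumption, so check its heading. The names `CRITICAL_T` and `smallest_p_reached` are made up for this example.

```python
# Two-tailed critical t values: df -> {p value: critical t}
CRITICAL_T = {
    6: {0.05: 2.447, 0.01: 3.707},
    7: {0.05: 2.365, 0.01: 3.499},
    8: {0.05: 2.306, 0.01: 3.355},
    9: {0.05: 2.262, 0.01: 3.250},
    10: {0.05: 2.228, 0.01: 3.169},
}

def smallest_p_reached(t, df):
    """Return the smallest tabulated p whose critical value |t| reaches,
    or None if t is below every critical value (not significant)."""
    reached = [p for p, critical in CRITICAL_T[df].items()
               if abs(t) >= critical]
    return min(reached) if reached else None

p_for_3_5 = smallest_p_reached(3.5, 8)  # 3.5 >= 3.355, so p < 0.01
p_for_1_5 = smallest_p_reached(1.5, 8)  # 1.5 < 2.306, so not significant

print(p_for_3_5, p_for_1_5)
```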
Other options, less commonly used in our class. Median: the middle #, when arranged in numeric order. Sample 1: -10, 0, 10, 20, 30; median = 10. Sample 2: 8, 9, 10, 11, 12; median = 10. Mode: the # that occurs most often. Sample 1: -10, 0, 10, 20, 30; no mode. Sample 2: 8, 9, 10, 11, 12; no mode.
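Median and mode can be checked the same way. In this sketch the `modes` helper is hypothetical; it returns an empty list when every value occurs only once, to match the "no mode" answers above.

```python
from collections import Counter
from statistics import median

def modes(values):
    """Return the most frequent value(s), or [] if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

sample_1 = [-10, 0, 10, 20, 30]
sample_2 = [8, 9, 10, 11, 12]

median_1 = median(sample_1)  # middle value of the sorted data: 10
median_2 = median(sample_2)  # 10
mode_1 = modes(sample_1)     # [] (no mode)
mode_2 = modes(sample_2)     # [] (no mode)

print(median_1, median_2, mode_1, mode_2)
```

With an even number of values, `median` averages the two middle numbers, so `median([1, 2, 3, 4])` gives 2.5.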
Some practice: looking at plant height. [Data table: height in sun (cm), height in shade (cm)] Calculate the mean for both samples. Sun = 130 cm. Shade = 130 cm.
Some practice: looking at plant height. [Data table: height in sun (cm), height in shade (cm)] Calculate the range for both samples. Sun = 58 cm. Shade = 152 cm.
Some practice: looking at plant height. [Data table: height in sun (cm), height in shade (cm)] Calculate the median for both samples. Sun = 126 cm. Shade = 131 cm. If even # of samples, find the average of the two middle numbers.
Some practice: looking at plant height. [Data table: height in sun (cm), height in shade (cm)] Calculate the mode for both samples. Sun = 124 cm. Shade = 131 cm.
Some practice: looking at plant height. [Data table: height in sun (cm), height in shade (cm)] Calculate the SD for both samples. Sun = cm Shade = cm
Some practice: looking at plant height. [Data table: height in sun (cm), height in shade (cm)] Sun: SD = cm Low SD indicates an even (close) distribution of data points, which suggests more reliable data. Shade: SD = cm High SD indicates a wide spread of data points, which MAY indicate a problem with your experimental design.
Some practice: looking at plant height. [Data table: height in sun (cm), height in shade (cm)] If t = 1.5, is this a significant difference? No: t = 1.5 is below the critical value at p = 0.05 for any degrees of freedom.
Be careful: correlation vs. cause. Observations (and carefully chosen data) may imply a CORRELATION, but do NOT necessarily demonstrate a cause. Example: The average global temperature has increased over the past 100 years. The number of pirates in the world has decreased over the past 100 years. Therefore, a decreased number of pirates causes increased global temperatures.
Be careful: correlation vs. cause. No, no!
Be careful: correlation vs. cause. To discern a CAUSE, a valid EXPERIMENT must be done. Other scientists must also be able to repeat your experiment.
Last word … Remember, it is ALWAYS better to SHOW that your experiment failed to support your hypothesis than to lie about it being a success!
Any questions?