
Statistics: Data Analysis and Presentation Fr Clinic II.





2 Statistics: Data Analysis and Presentation Fr Clinic II

3 Overview
• Tables and Graphs
• Populations and Samples
• Mean, Median, and Standard Deviation
• Standard Error & 95% Confidence Interval (CI)
• Error Bars
• Comparing Means of Two Data Sets
• Linear Regression (LR)

4 Warning
• Statistics is a huge field; I've simplified considerably here. For example:
  – Mean, Median, and Standard Deviation
    • There are alternative formulas
  – Standard Error and the 95% Confidence Interval
    • There are other ways to calculate CIs (e.g., z statistic instead of t; difference between two means rather than a single mean…)
  – Error Bars
    • Don't go beyond the interpretations I give here!
  – Comparing Means of Two Data Sets
    • We cover only the t test for two means when the variances are unknown but equal; there are other tests
  – Linear Regression
    • We look only at simple LR and calculate only the intercept, slope, and R². There is much more to LR!

5 Tables
[Table 1: Average Turbidity and Color of Water Treated by Portable Water Filters]
• Consistent Format, Title, Units, Big Fonts
• Differentiate Headings, Number Columns

6 Figures
[Figure 1: Turbidity of Pond Water, Treated and Untreated]
• Consistent Format, Title, Units
• Good Axis Titles, Big Fonts

7 Populations and Samples
• Population
  – All of the possible outcomes of an experiment or observation
    • US population
    • Particular type of steel beam
• Sample
  – A finite number of outcomes measured or observations made
    • 1000 US citizens
    • 5 beams
• We use samples to estimate population properties
  – Mean, Variability (e.g., standard deviation), Distribution
    • Height of 1000 US citizens used to estimate mean of US population

8 Mean and Median
Turbidity of Treated Water (NTU): 1, 3, 3, 6, 8, 10
• Mean = sum of values divided by number of samples = (1+3+3+6+8+10)/6 = 5.2 NTU
• Median = the middle number of the ranked values
    Rank:   1  2  3  4  5  6
    Number: 1  3  3  6  8  10
  – For an even number of sample points, average the middle two: (3+6)/2 = 4.5 NTU
Excel: Mean – AVERAGE; Median – MEDIAN
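The slide's arithmetic can be checked with a short Python sketch. It uses the standard-library `statistics` module in place of Excel's AVERAGE and MEDIAN; the variable names are illustrative only.

```python
import statistics

# Turbidity of treated water (NTU), the six values from the slide
turbidity_ntu = [1, 3, 3, 6, 8, 10]

# Mean: sum of values divided by the number of samples
mean_ntu = statistics.mean(turbidity_ntu)

# Median: with an even number of points, the two middle values are averaged
median_ntu = statistics.median(turbidity_ntu)

print(f"mean   = {mean_ntu:.1f} NTU")    # mean   = 5.2 NTU
print(f"median = {median_ntu:.1f} NTU")  # median = 4.5 NTU
```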

9 Variance
• Measure of variability
  – Sum of the squares of the deviations about the mean, divided by the degrees of freedom (n − 1):
    $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
• n = number of data points
Excel: variance – VAR

10 Standard Deviation, s
• Square root of the variance
• For phenomena following a Normal Distribution (bell curve), 95% of population values lie within 1.96 standard deviations of the mean
• Area under the curve is the probability of getting a value within the specified range
[Figure: bell curve with 95% of the area between −1.96 and 1.96 standard deviations from the mean]
Excel: standard deviation – STDEV
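As a rough illustration, the sample variance and standard deviation of the turbidity data from slide 8 can be computed by hand and cross-checked against Python's `statistics` module. This sketch assumes the sample form with n − 1 degrees of freedom, which is also what Excel's VAR and STDEV use.

```python
import math
import statistics

turbidity_ntu = [1, 3, 3, 6, 8, 10]  # NTU, from slide 8
n = len(turbidity_ntu)
mean_ntu = sum(turbidity_ntu) / n

# Sum of squared deviations about the mean, divided by n - 1
sum_sq_dev = sum((x - mean_ntu) ** 2 for x in turbidity_ntu)
variance = sum_sq_dev / (n - 1)

# Standard deviation is the square root of the variance
std_dev = math.sqrt(variance)

# The library functions use the same n - 1 definition
assert math.isclose(variance, statistics.variance(turbidity_ntu))
assert math.isclose(std_dev, statistics.stdev(turbidity_ntu))

print(f"variance = {variance:.2f} NTU^2")  # variance = 11.77 NTU^2
print(f"s        = {std_dev:.2f} NTU")     # s        = 3.43 NTU
```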

11 Standard Error of Mean
• Standard error of mean: $SE = \frac{s}{\sqrt{n}}$
  – For a sample of size n
  – Taken from a population with standard deviation s
  – Estimate of mean depends on the sample selected
  – As n increases, variance of the mean estimate goes down, i.e., the estimate of the population mean improves
  – As n increases, the distribution of the mean estimate approaches normal, regardless of the population distribution
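A small sketch of how the standard error shrinks as the sample grows, assuming the usual formula SE = s/√n. The value s = 0.5 NTU and the sample sizes are hypothetical, chosen only to show the trend.

```python
import math

def standard_error(s, n):
    """Standard error of the mean for standard deviation s and sample size n."""
    return s / math.sqrt(n)

s = 0.5  # NTU, hypothetical standard deviation

# Each fourfold increase in n halves the standard error
for n in (5, 20, 80):
    print(f"n = {n:3d}  SE = {standard_error(s, n):.3f} NTU")
```

Because SE scales with 1/√n, halving the uncertainty in the mean requires four times as many measurements.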

12 95% Confidence Interval (CI) for Mean
• Interval within which we are 95% confident the true mean lies: $\bar{x} \pm t_{95\%,n-1} \frac{s}{\sqrt{n}}$
• $t_{95\%,n-1}$ is the t-statistic for the 95% CI if sample size = n
  – If n ≥ 30, let $t_{95\%,n-1}$ = 1.96 (Normal Distribution)
  – Otherwise, use Excel formula: TINV(0.05,n-1)
• n = number of data points
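The 95% CI for the turbidity data from slide 8 can be sketched as follows, assuming the interval is mean ± t × s/√n. Python's standard library has no equivalent of TINV, so the sketch hard-codes a few two-tailed 95% critical values from a standard t table; the `T_95` lookup and the function name are illustrative assumptions.

```python
import math
import statistics

# Two-tailed 95% critical t values by degrees of freedom, from a standard
# t table (the values Excel's TINV(0.05, df) would return, rounded)
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def confidence_interval_95(sample):
    """Return (low, high) of the 95% CI for the mean of `sample`."""
    n = len(sample)
    if n >= 30:
        t_crit = 1.96  # normal approximation used on the slide
    elif (n - 1) in T_95:
        t_crit = T_95[n - 1]
    else:
        raise ValueError("extend T_95 for this sample size")
    mean = statistics.mean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(n)
    half_width = t_crit * std_err
    return mean - half_width, mean + half_width

turbidity_ntu = [1, 3, 3, 6, 8, 10]
low, high = confidence_interval_95(turbidity_ntu)
print(f"95% CI: {low:.1f} to {high:.1f} NTU")  # 95% CI: 1.6 to 8.8 NTU
```

With only six points the interval is wide, which is why the t value (2.571) is used rather than 1.96.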

13 Error Bars
• Show data variability on a plot of mean values
• Types of error bars include:
  – ± Standard Deviation, ± Standard Error, ± 95% CI
  – Maximum and minimum value

14 Using Error Bars to Compare Data
• Standard Deviation
  – Demonstrates data variability, but no comparison possible
• Standard Error
  – If bars overlap, any difference in means is not statistically significant
  – If bars do not overlap, indicates nothing!
• 95% Confidence Interval
  – If bars overlap, indicates nothing!
  – If bars do not overlap, difference is statistically significant
• We'll use 95% CI

15 Example 1 Create Bar Chart of Name vs Mean. Right click on data. Select “Format Data Series”.

16 Example 2

17 What can we do?
• Plot mean water quality data for various filters with error bars
• Plot mean water quality over time with error bars

18 Comparing Filter Performance
• Use the t test to determine whether the means of two populations are different
  – Based on two data sets
    • E.g., turbidity produced by two different filters

19 Comparing Two Data Sets Using the t Test
• Example: You pump 20 gallons of water through each of filters 1 and 2. After every gallon, you measure the turbidity.
  – Filter 1: Mean = 2 NTU, s = 0.5 NTU, n = 20
  – Filter 2: Mean = 3 NTU, s = 0.6 NTU, n = 20
• You ask the question: Do the filters make water with a different mean turbidity?

20 Do the Filters make different water?
• Use TTEST (Excel)
• TTEST returns the fractional probability of being wrong if you answer yes
  – We want this probability to be small: 0.01 to 0.10 (1 to 10%). Use 0.01
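For readers without Excel, here is a minimal sketch of the equal-variance (pooled) two-sample t test applied to the summary statistics from slide 19. Excel's TTEST works on the raw data ranges; this version starts from means, standard deviations, and sample sizes instead. The function names are illustrative, and the p-value is obtained by numerically integrating the t density with Simpson's rule, a choice made here only to avoid a third-party library.

```python
import math

def pooled_t_statistic(mean1, s1, n1, mean2, s2, n2):
    """t statistic and degrees of freedom, assuming equal variances."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df

def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p(t, df, steps=20000):
    """Two-tailed p-value via Simpson's rule on [0, |t|] (steps is even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * total * h / 3  # area between -|t| and +|t|
    return max(0.0, 1.0 - central_area)

# Filter 1: mean 2 NTU, s 0.5 NTU, n 20; Filter 2: mean 3 NTU, s 0.6 NTU, n 20
t_stat, df = pooled_t_statistic(2.0, 0.5, 20, 3.0, 0.6, 20)
p_value = two_tailed_p(t_stat, df)

print(f"t = {t_stat:.2f}, df = {df}")
print("different at the 0.01 level" if p_value < 0.01 else "no significant difference")
```

For this example the probability comes out far below 0.01, so the answer to the slide's question is yes.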

21 “t test” Questions
• Do two filters make different water?
  – Take multiple measurements of a particular water quality parameter for 2 filters
• Do two filters treat different amounts of water between cleanings?
  – Measure the amount of water filtered between cleanings for two filters
• Does the amount of water a filter treats between cleanings differ after a certain amount of water is treated?
  – For a single filter, measure the amount of water treated between cleanings before and after a certain total amount of water is treated

22 Linear Regression
• Fit the best straight line to a data set
• Right-click on a data point and use the “trendline” option. Use the “options” tab to get the equation and R².

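23 R² – Coefficient of Multiple Determination
$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}$
• ŷ_i = predicted y values, from the regression equation
• y_i = observed y values; ȳ = mean of the observed y values
• R² = fraction of variance explained by the regression (variance = standard deviation squared)
  – R² = 1 if the data lie along a straight line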
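The quantities Excel's trendline reports (slope, intercept, R²) can be reproduced with a short least-squares sketch. This assumes ordinary least squares for a single predictor and computes R² as 1 minus the ratio of residual to total sum of squares; the data points below are made up for illustration and do not come from the slides.

```python
def linear_regression(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Slope = covariance of x and y divided by variance of x
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared

# Hypothetical data, roughly following y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r_squared = linear_regression(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}, R^2 = {r_squared:.3f}")
```

Data lying exactly on a straight line give R² = 1, matching the slide's statement.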

