
1 Data Freshman Clinic II

2 Overview
- Populations and Samples
- Presentation
- Tables and Figures
- Central Tendency
- Variability
- Confidence Intervals
- Error Bars
- Student t test
- Linear Regression
- Applications

3 Populations and Samples
- Population
  – All possible data points
    - Entire US population
    - Every rainfall event in Glassboro (past, present, and future)
- Sample
  – Subset of population
- We use samples to estimate population parameters

4 Presentation
- Present clearly, objectively
- Properly communicate uncertainty
- Compare using valid statistics

5 Tables
Table 1: Water Quality (average of 3 to 5 values)

6 Figures – Bar Chart
Figure 1: Average Turbidity of Pond Water, Treated and Untreated

7 Figures – XY Scatter
Figure 2: Change in Water Quality

8 Central Tendency
- Example: Turbidity of Treated Water (NTU)
  – Sample is 1, 3, 3, 6, 8, 10 (n = 6)
- Mean = sum of values divided by number of data points
  – e.g., (1+3+3+6+8+10)/6 = 5.17 NTU
- Median = the middle number of the ordered sample
  – Rank:   1 2 3 4 5 6
  – Number: 1 3 3 6 8 10 (ordered)
  – For an even number of sample points, average the middle two, e.g., (3+6)/2 = 4.5
  – For an odd number of sample points, median = middle point
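The mean and median from this slide can be checked outside Excel. A minimal sketch using Python's standard `statistics` module; the variable names are illustrative only.

```python
import statistics

# Turbidity of treated water in NTU (the sample from the slide)
sample = [1, 3, 3, 6, 8, 10]

mean = statistics.mean(sample)      # sum of values / number of data points
median = statistics.median(sample)  # even n: average of the two middle values

print(f"mean = {mean:.2f} NTU, median = {median} NTU")
```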

9 Variability
- Standard deviation of a sample:
  $$s = \left[ \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \right]^{0.5}$$
  – $x_i$ = i-th data point
  – $\bar{x}$ = mean of sample
  – $n$ = number of data points
- e.g., $s = \left[ \frac{(1-5.2)^2 + (3-5.2)^2 + (3-5.2)^2 + (6-5.2)^2 + (8-5.2)^2 + (10-5.2)^2}{6-1} \right]^{0.5} = 3.43$
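The worked example can be reproduced step by step. A short sketch that spells out the n - 1 formula and cross-checks it against `statistics.stdev`, which uses the same sample definition; variable names are illustrative.

```python
import math
import statistics

sample = [1, 3, 3, 6, 8, 10]  # NTU, from the slides

n = len(sample)
mean = sum(sample) / n
# Sum of squared deviations from the sample mean
sum_sq = sum((x - mean) ** 2 for x in sample)
# Sample standard deviation: divide by n - 1, then take the square root
s = math.sqrt(sum_sq / (n - 1))

# Cross-check against the standard library's sample standard deviation
assert math.isclose(s, statistics.stdev(sample))
print(f"s = {s:.2f} NTU")
```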

10 Confidence Interval of Mean
- Estimated range within which population mean falls
  – e.g., 95% confidence interval of mean, based on our sample, is $(1.57 \le \mu \le 8.77)$, where $\mu$ = population mean
  – We are 95% confident the true mean of the population (from which our sample was drawn) lies within this range
- Confidence interval (CI) calculated from sample:
  $$\bar{x} - \frac{t\,s}{\sqrt{n}} \le \mu \le \bar{x} + \frac{t\,s}{\sqrt{n}}$$
  where $\bar{x}$ = sample mean, $t$ = statistical parameter related to confidence, $s$ = sample standard deviation, and $n$ = sample size

11 Calculating “t”
- In Excel, type “=TINV” into a cell and select the “=” symbol in the formula bar
- The Student's t-distribution inverse formula palette pops up
- “Probability” = 1 - confidence level (as a fraction)
  – e.g., if confidence level is 95%, “probability” = 1 - 0.95 = 0.05
- “Deg_freedom” = degrees of freedom = n - 1
- TINV returns “t”, the statistical parameter we need to estimate a confidence interval based on a sample
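For readers without Excel, the same two-tailed critical value can be approximated numerically. The sketch below is not how Excel computes TINV; it is an illustrative stand-in that integrates the t density with Simpson's rule and finds t by bisection. The function names `t_pdf`, `two_tailed_prob`, and `tinv` are hypothetical, and only the standard library is used.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_prob(t, df):
    """P(|T| > t): one minus the Simpson's-rule integral of the density over [-t, t]."""
    t = abs(t)
    if t == 0:
        return 1.0
    steps = 2 * max(500, int(50 * t))  # even number of intervals, step <= 0.01
    h = 2 * t / steps
    total = t_pdf(-t, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(-t + i * h, df)
    return 1 - total * h / 3


def tinv(probability, df):
    """Two-tailed critical t, found by bisection (assumed equivalent of Excel's TINV)."""
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_prob(mid, df) > probability:
            lo = mid  # tail area still too large: t must be bigger
        else:
            hi = mid
    return (lo + hi) / 2


# 95% confidence with n = 6 data points: probability = 0.05, deg_freedom = 5
t = tinv(0.05, 5)
print(f"t = {t:.2f}")
```

In practice a statistics library would be the usual choice; the hand-rolled integration is used here only to keep the example dependency-free.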

12 Calculating a Confidence Interval
- For our example:
  – “TINV” returned 2.57
  – t × s / sqrt(n) = 2.57 × 3.43 / sqrt(6) = 3.60
    - 5.17 - 3.60 = 1.57
    - 5.17 + 3.60 = 8.77
  – CI: $(1.57 \le \mu \le 8.77)$ with 95% confidence
    - i.e., we are 95% confident the population mean lies between 1.57 and 8.77
- Quite wide!
  – Lower “s” or higher “n” will narrow the range
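The arithmetic on this slide can be reproduced in a few lines. A minimal sketch, assuming the value t = 2.57 quoted from TINV in the slides; variable names are illustrative.

```python
import math
import statistics

sample = [1, 3, 3, 6, 8, 10]  # NTU, from the slides
t = 2.57                       # TINV(0.05, 5) as quoted in the slides

n = len(sample)
mean = statistics.mean(sample)
s = statistics.stdev(sample)          # sample standard deviation (n - 1)
half_width = t * s / math.sqrt(n)     # t x s / sqrt(n)

lower = mean - half_width
upper = mean + half_width
print(f"95% CI: {lower:.2f} <= mu <= {upper:.2f}")
```

Because the half-width scales with s / sqrt(n), halving s or quadrupling n would each halve the width of the interval for a fixed t.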

13 Error Bars
- Used to show data variability on a graph
- Bar chart, XY,…

14 Types of Error Bars
- Standard Error of Mean
- Confidence Interval
- Standard Deviation
- Percentage Standard Error
http://www.graphpad.com/articles/errorbars.htm

15 Adding Error Bars
1. Create chart in Excel
2. Select a data series by selecting a data point or bar
3. From “Format” menu, select “Selected data series…”
4. Select “custom”
5. Select + and - error bar data. This could be standard deviation, standard error, or confidence limits.

16 Error Bars and our Example
- Standard Error of Mean
- s / sqrt(n) = 3.43 / sqrt(6) = 1.40
- Put 1.40 in + and - cells
- Since the mean = 5.17, the error bars in a bar chart would go from
  – 5.17 - 1.40 = 3.77 to
  – 5.17 + 1.40 = 6.57
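The values that go into the + and - cells can be computed directly from the sample. A brief sketch with illustrative variable names:

```python
import math
import statistics

sample = [1, 3, 3, 6, 8, 10]  # NTU, from the slides

mean = statistics.mean(sample)
# Standard error of the mean: s / sqrt(n)
se = statistics.stdev(sample) / math.sqrt(len(sample))

bar_low = mean - se   # bottom of the error bar
bar_high = mean + se  # top of the error bar
print(f"SE = {se:.2f}; error bar from {bar_low:.2f} to {bar_high:.2f}")
```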

17 Interpreting Error Bars
- Error bars can be used to compare two sample means
- Standard Error (SE)
  – If SE bars do not overlap, no conclusions can be drawn
  – If SE bars overlap, the samples do not appear to be drawn from significantly different populations
- Confidence Interval (CI)
  – If CI bars do not overlap, the samples appear to be drawn from significantly different populations, at the confidence level of the confidence interval
  – If CI bars overlap, no conclusions can be drawn
http://www.graphpad.com/articles/errorbars.htm
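These rules of thumb amount to a small decision table. The sketch below encodes them; the helper names `bars_overlap` and `interpret` are hypothetical, and sample B is invented purely to show a non-overlapping case.

```python
def bars_overlap(mean_a, half_a, mean_b, half_b):
    """Return True if the bars mean_a ± half_a and mean_b ± half_b share any value."""
    return mean_a - half_a <= mean_b + half_b and mean_b - half_b <= mean_a + half_a


def interpret(bar_type, overlap):
    """Apply the slide's rules; bar_type is "SE" or "CI"."""
    if bar_type == "SE":
        return "apparently not significantly different" if overlap else "no conclusion"
    if bar_type == "CI":
        return "no conclusion" if overlap else "apparently significantly different"
    raise ValueError("bar_type must be 'SE' or 'CI'")


# Sample A: the treated-water example (mean 5.17, 95% CI half-width 3.60).
# Sample B: hypothetical values, not from the slides.
overlap = bars_overlap(5.17, 3.60, 12.0, 2.0)
verdict = interpret("CI", overlap)
print(overlap, verdict)
```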

18 Comparing Samples with a t-test
- Example: you measure untreated and treated pond water
  – Treated: mean = 2 NTU, s = 0.5 NTU, n = 20
  – Untreated: mean = 3 NTU, s = 0.6 NTU, n = 20
- You ask the question: is the average turbidity of treated water different from that of untreated water?
  – Use a t-test

19 Is the water different?
- Use TTEST (Excel)
- TTEST returns the probability (as a fraction) of being wrong if you claim a statistically significant difference (Type I error)
  – Select significance level ahead of time, usually 0.01 - 0.1
  – For our example, the probability returned, 0.0000015, is very small

20 T test steps
1. Identify two samples to compare
2. Select α, the significance level of the statistical test
  – We'll use 0.05 in this class
  – Confidence = 1 - α
3. Use Excel “TTEST” formula to estimate probability of Type I error
4. If probability returned by TTEST is less than or equal to 0.05, assume the samples come from two different populations
For our example, 0.0000015 < 0.05, so assume the treated water is different from the untreated water
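Excel's TTEST works on the raw data arrays, which the slides do not list. As a rough illustration of the same steps, the sketch below runs a two-tailed, pooled-variance (equal variance) t-test from the summary statistics on the earlier slide. The choice of the pooled-variance form is an assumption, since the slides do not say which TTEST type was used, and the helper names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df):
    """P(|T| > t): one minus the Simpson's-rule integral of the density over [-t, t]."""
    t = abs(t)
    if t == 0:
        return 1.0
    steps = 2 * max(500, int(50 * t))  # even number of intervals
    h = 2 * t / steps
    total = t_pdf(-t, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(-t + i * h, df)
    return 1 - total * h / 3


# Step 1: the two samples (summary statistics from the slides, in NTU)
mean_t, s_t, n_t = 2.0, 0.5, 20   # treated
mean_u, s_u, n_u = 3.0, 0.6, 20   # untreated

# Step 2: significance level
alpha = 0.05

# Step 3: probability of a Type I error (assumed pooled-variance, two-tailed test)
df = n_t + n_u - 2
pooled_var = ((n_t - 1) * s_t ** 2 + (n_u - 1) * s_u ** 2) / df
se_diff = math.sqrt(pooled_var * (1 / n_t + 1 / n_u))
t_stat = (mean_u - mean_t) / se_diff
p = two_tailed_p(t_stat, df)

# Step 4: compare with alpha
significant = p <= alpha
print(f"t = {t_stat:.2f}, p = {p:.2e}, significant at {alpha}: {significant}")
```

The p-value from this sketch need not match the slides' 0.0000015 digit for digit, because it starts from rounded summary statistics rather than the original measurements.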

21 Linear Regression
- Fit the best straight line to a data set
Right-click on a data point and use the “trendline” option. Use the “options” tab to show the equation and $R^2$.

22 $R^2$ - Coefficient of Multiple Determination
$$R^2 = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}$$
- $\hat{y}_i$ = predicted y values, from regression equation
- $\bar{y}$ = average of y
- $y_i$ = observed y values
- $R^2$ = fraction of variance explained by regression (variance = standard deviation squared)
  – = 1 if data lie along a straight line
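What the trendline option reports can be sketched by hand. The example below fits a least-squares line and computes R² as the regression sum of squares over the total sum of squares, one common form for a straight-line fit with an intercept. The function name `linear_fit` and the data are hypothetical, not measurements from the clinic.

```python
def linear_fit(x, y):
    """Least-squares straight line; returns (slope, intercept, r_squared)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Predicted y values from the regression equation
    y_hat = [intercept + slope * xi for xi in x]
    ss_reg = sum((yh - y_mean) ** 2 for yh in y_hat)  # explained by regression
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)      # total about the mean
    return slope, intercept, ss_reg / ss_tot


# Hypothetical data: stroke rate (x) versus flow rate (y)
x = [10, 20, 30, 40, 50]
y = [1.1, 1.9, 3.2, 3.9, 5.1]

slope, intercept, r2 = linear_fit(x, y)
print(f"y = {slope:.3f} x + {intercept:.3f}, R^2 = {r2:.3f}")
```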

23 What might you do in this class?
- Flow rate versus stroke rate
  – Figure with linear regression over linear range
- Ability to improve water quality
  – Table and t-test comparison with untreated water (for turbidity and apparent color), or
  – Bar chart (for turbidity and apparent color) with confidence interval error bars
- Pressure change versus flow rate, power versus flow rate
  – Figure (no statistics possible because we only took one reading of pressure for each flow rate and the relationship is non-linear)
- Force versus stroke rate
  – Figure with 95% confidence interval error bars for each data point
- Power versus flow rate
  – Figure

24 Example – Water Quality
Table 2: Improvement in Water Quality
Note: Statistical significance tested at significance level α = 0.05 using t-test

