Data Freshman Clinic II

Overview
- Populations and Samples
- Presentation
- Tables and Figures
- Central Tendency
- Variability
- Confidence Intervals
- Error Bars
- Student t test
- Linear Regression
- Applications

Populations and Samples
- Population
  - All possible data points
    - Entire US population
    - Every rainfall event in Glassboro (past, present, and future)
- Sample
  - Subset of population
- We use samples to estimate population parameters

Presentation
- Present clearly, objectively
- Properly communicate uncertainty
- Compare using valid statistics

Tables
Table 1: Water Quality (average of 3 to 5 values)

Figures – Bar Chart
Figure 1: Average Turbidity of Pond Water, Treated and Untreated

Figures – XY Scatter
Figure 2: Change in Water Quality

Central Tendency
- Example: Turbidity of Treated Water (NTU)
  - Sample is 1, 3, 3, 6, 8, 10 (n = 6)
- Mean = sum of values divided by number of data points
  - e.g., (1 + 3 + 3 + 6 + 8 + 10)/6 = 5.17 NTU
- Median = the middle number of the ordered sample
  - Rank:             1  2  3  4  5  6
  - Number (ordered): 1  3  3  6  8  10
  - For an even number of sample points, average the middle two, e.g., (3 + 6)/2 = 4.5
  - For an odd number of sample points, median = middle point
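The mean and median above can be checked outside Excel. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the original slides.

```python
from statistics import mean, median

# Turbidity sample from the slide (NTU)
turbidity_ntu = [1, 3, 3, 6, 8, 10]

sample_mean = mean(turbidity_ntu)      # sum of values / number of points
sample_median = median(turbidity_ntu)  # averages the middle two for even n

print(f"mean = {sample_mean:.2f} NTU, median = {sample_median} NTU")
```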

Variability
- Standard deviation of a sample:

  $$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

  where $x_i$ = i-th data point, $\bar{x}$ = mean of sample, n = number of data points
- e.g., s = [{(1-5.2)^2 + (3-5.2)^2 + (3-5.2)^2 + (6-5.2)^2 + (8-5.2)^2 + (10-5.2)^2}/(6-1)]^0.5 = 3.43
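As a cross-check of the worked example, the sketch below computes the sample standard deviation twice: once by writing out the n - 1 formula, and once with `statistics.stdev`, which uses the same n - 1 denominator. Variable names are illustrative.

```python
import math
from statistics import mean, stdev

turbidity_ntu = [1, 3, 3, 6, 8, 10]
n = len(turbidity_ntu)
x_bar = mean(turbidity_ntu)

# Written out: square root of the summed squared deviations over (n - 1)
s_manual = math.sqrt(sum((x - x_bar) ** 2 for x in turbidity_ntu) / (n - 1))

# Library call; stdev() is the sample (n - 1) standard deviation
s_library = stdev(turbidity_ntu)

print(f"s = {s_manual:.2f} (manual), {s_library:.2f} (statistics.stdev)")
```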

Confidence Interval of Mean
- Estimated range within which population mean falls
  - e.g., 95% confidence interval of mean, based on our sample, is (1.57 ≤ μ ≤ 8.77), where μ = population mean
  - We are 95% confident the true mean of the population (from which our sample was drawn) lies within this range
- Confidence interval (CI) calculated from sample:

  $$\bar{x} - \frac{t\,s}{\sqrt{n}} \le \mu \le \bar{x} + \frac{t\,s}{\sqrt{n}}$$

  where $\bar{x}$ = sample mean, t = statistical parameter related to confidence, s = sample standard deviation, and n = sample size

Calculating “t”
- In Excel, type “=TINV” into a cell and select the “=” symbol in the formula bar
- The Student’s t-distribution inverse formula palette pops up
- “Probability” = 1 – confidence level (as a fraction)
  - e.g., if confidence level is 95%, “probability” = 1 – 0.95 = 0.05
- “Deg_freedom” = degrees of freedom = n – 1
- TINV returns “t”, the statistical parameter we need to estimate a confidence interval based on a sample
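For readers without Excel, the two-tailed inverse that TINV returns can be approximated numerically. The sketch below is not Excel's algorithm: it integrates the Student's t density with Simpson's rule and inverts by bisection, using only the standard library. The function names (`t_pdf`, `two_tailed_prob`, `t_inv_two_tailed`) are hypothetical helpers introduced for illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_prob(t, df, steps=2000):
    """P(|T| > t), using Simpson's rule on the density over [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3)

def t_inv_two_tailed(probability, df):
    """Find t with P(|T| > t) = probability (the role TINV plays on the slide)."""
    lo, hi = 0.0, 1.0
    while two_tailed_prob(hi, df) > probability:  # widen until the root is bracketed
        lo, hi = hi, hi * 2
    for _ in range(60):                           # bisection
        mid = (lo + hi) / 2
        if two_tailed_prob(mid, df) > probability:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# 95% confidence, n = 6 data points -> probability = 0.05, deg_freedom = 5
t_value = t_inv_two_tailed(0.05, 6 - 1)
print(round(t_value, 2))
```

With probability = 0.05 and 5 degrees of freedom this should agree with the 2.57 used in the confidence interval example.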

Calculating a Confidence Interval
- For our example:
  - “TINV” returned 2.57
  - t x s / sqrt(n) = 2.57 x 3.43 / sqrt(6) = 3.60
    - 5.17 – 3.60 = 1.57
    - 5.17 + 3.60 = 8.77
  - CI: (1.57 ≤ μ ≤ 8.77) with 95% confidence
    - i.e., we are 95% confident the population mean lies between 1.57 and 8.77
- Quite wide!
  - Lower “s” or higher “n” will narrow range
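The same arithmetic can be reproduced in a few lines. This sketch hardcodes t = 2.57, the value the slide reports from TINV, rather than computing it; the variable names are illustrative.

```python
import math
from statistics import mean, stdev

turbidity_ntu = [1, 3, 3, 6, 8, 10]
t_value = 2.57  # as reported from TINV on the slide (95%, 5 degrees of freedom)

n = len(turbidity_ntu)
sample_mean = mean(turbidity_ntu)
half_width = t_value * stdev(turbidity_ntu) / math.sqrt(n)  # t x s / sqrt(n)

ci_low = sample_mean - half_width
ci_high = sample_mean + half_width
print(f"{ci_low:.2f} <= mu <= {ci_high:.2f}")
```

Because the half-width scales with s / sqrt(n), quadrupling the sample size at the same s would roughly halve the interval (slightly more, since t also shrinks as n grows).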

Error Bars
- Used to show data variability on a graph
- Bar chart, XY, …

Types of Error Bars
- Standard Error of Mean
- Confidence Interval
- Standard Deviation
- Percentage Standard Error

Adding Error Bars
1. Create chart in Excel
2. Select a data series by selecting a data point or bar
3. From “Format” menu, select “Selected data series…”
4. Select “custom”
5. Select + and – error bar data. This could be standard deviation, standard error, or confidence limits.

Error Bars and our Example
- Standard Error of Mean
  - s / sqrt(n) = 3.43 / sqrt(6) = 1.40
- Put 1.40 in + and – cells
- Since the mean = 5.17, the error bars in a bar chart would go from
  - 5.17 – 1.40 = 3.77 to
  - 5.17 + 1.40 = 6.57
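A short sketch of the standard-error calculation, useful for generating the + and – values before entering them in Excel. Variable names are illustrative.

```python
import math
from statistics import mean, stdev

turbidity_ntu = [1, 3, 3, 6, 8, 10]

sample_mean = mean(turbidity_ntu)
# Standard error of the mean: s / sqrt(n)
standard_error = stdev(turbidity_ntu) / math.sqrt(len(turbidity_ntu))

bar_low = sample_mean - standard_error
bar_high = sample_mean + standard_error
print(f"SE = {standard_error:.2f}; bar spans {bar_low:.2f} to {bar_high:.2f} NTU")
```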

Interpreting Error Bars
- Error bars can be used to compare two sample means
- Standard Error (SE)
  - SE bars do not overlap: no conclusions can be drawn
  - SE bars overlap: samples do not appear to be drawn from significantly different populations
- Confidence Interval (CI)
  - CI bars do not overlap: samples appear to be drawn from significantly different populations, at the confidence level of the confidence interval
  - CI bars overlap: no conclusions can be drawn
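The four rules above can be encoded as a small lookup. This is a sketch only: the function names and the example means and bar half-widths are hypothetical, chosen to exercise each branch.

```python
def bars_overlap(mean_a, half_a, mean_b, half_b):
    """True if the intervals mean +/- half-width touch or overlap."""
    return abs(mean_a - mean_b) <= half_a + half_b

def interpret(bar_type, overlap):
    """Apply the slide's rules for 'SE' or 'CI' error bars."""
    if bar_type == "SE":
        return ("no significant difference apparent" if overlap
                else "no conclusion can be drawn")
    if bar_type == "CI":
        return ("no conclusion can be drawn" if overlap
                else "significantly different at the CI's confidence level")
    raise ValueError("bar_type must be 'SE' or 'CI'")

# Hypothetical bars: means 2.0 and 3.0 with half-widths 0.1 and 0.15
separated = bars_overlap(2.0, 0.1, 3.0, 0.15)
print(interpret("CI", separated))
print(interpret("SE", separated))
```

Note the asymmetry: non-overlapping SE bars are not enough to claim a difference, while overlapping CI bars are not enough to rule one out.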

Comparing Samples with a t-test
- Example: you measure untreated and treated pond water
  - Treated: mean = 2 NTU, s = 0.5 NTU, n = 20
  - Untreated: mean = 3 NTU, s = 0.6 NTU, n = 20
- You ask the question: is the average turbidity of treated water different from that of untreated water?
  - Use a t-test

Is the water different?
- Use TTEST (Excel)
- TTEST returns the probability (as a fraction) of being wrong if you claim a statistically significant difference (Type I error)
  - Select significance level ahead of time, usually 0.05
  - For our example, the probability returned is very small

T test steps
1. Identify two samples to compare
2. Select α, the significance level of the statistical test
   - We’ll use 0.05 in this class
   - Confidence = 1 – α
3. Use Excel “TTEST” formula to estimate probability of Type I Error
4. If probability returned by TTEST is less than or equal to 0.05, assume the samples come from two different populations

For our example, the probability is < 0.05, so assume the treated water is different from the untreated water
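Excel's TTEST works on the raw data arrays, which the slides do not give. As a rough sketch, the same decision can be reached from the summary statistics (mean, s, n) alone. The assumptions here are mine, not the slides': a two-tailed test with unequal variances (Welch's test, comparable to TTEST type 3), and a p-value obtained by integrating the t density with Simpson's rule. All function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_prob(t, df, steps=20000):
    """P(|T| > t), using Simpson's rule on the density over [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3)

def welch_t_test(mean1, s1, n1, mean2, s2, n2):
    """Two-tailed Welch t-test from summary statistics (assumed test type)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t_stat = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df, two_tailed_prob(abs(t_stat), df)

alpha = 0.05
# Treated vs. untreated pond water, values from the slide
t_stat, df, p_value = welch_t_test(2.0, 0.5, 20, 3.0, 0.6, 20)
different = p_value <= alpha
print(f"t = {t_stat:.2f}, df = {df:.1f}, different = {different}")
```

A different TTEST type (for example, equal variances) would change the degrees of freedom slightly but not the conclusion for these numbers.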

Linear Regression
- Fit the best straight line to a data set
- Right-click on a data point and use the “trendline” option. Use the “options” tab to show the equation and R^2.

R^2 - Coefficient of Multiple Determination

$$R^2 = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}$$

where $\hat{y}_i$ = predicted y values, from regression equation; $\bar{y}$ = average of y; $y_i$ = observed y values
- R^2 = fraction of variance explained by regression (variance = standard deviation squared)
- R^2 = 1 if data lies along a straight line
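To see what the trendline option computes, the sketch below fits a least-squares line and evaluates R^2 as explained variation over total variation. The data are hypothetical (invented for illustration, not measurements from the clinic), and `fit_line` is an illustrative helper name.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares slope, intercept, and R^2 for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    predicted = [intercept + slope * x for x in xs]
    explained = sum((y_hat - y_bar) ** 2 for y_hat in predicted)  # about the mean
    total = sum((y - y_bar) ** 2 for y in ys)
    return slope, intercept, explained / total

# Hypothetical data, roughly linear
x_data = [1, 2, 3, 4, 5]
y_data = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r_squared = fit_line(x_data, y_data)
print(f"y = {intercept:.2f} + {slope:.2f}x, R^2 = {r_squared:.3f}")
```

For points that lie exactly on a line, such as (1, 2), (2, 4), (3, 6), the same function returns R^2 = 1.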

What might you do in this class?
- Flow rate versus stroke rate
  - Figure with linear regression over linear range
- Ability to improve water quality
  - Table and t-test comparison with untreated water (for turbidity and apparent color), or
  - Bar chart (for turbidity and apparent color) with confidence interval error bars
- Pressure change versus flow rate, power versus flow rate
  - Figure (no statistics possible because we only took one reading of pressure for each flow rate and the relationship is non-linear)
- Force versus stroke rate
  - Figure with 95% confidence interval error bars for each data point
- Power versus flow rate
  - Figure

Example – Water Quality
Table 2: Improvement in Water Quality
Note: Statistical significance tested at the α = 0.05 level using a t-test