Statistics: Data Analysis and Presentation Fr Clinic II.


Overview
- Tables and Graphs
- Populations and Samples
- Mean, Median, and Standard Deviation
- Standard Error & 95% Confidence Interval (CI)
- Error Bars
- Comparing Means of Two Data Sets
- Linear Regression (LR)

Warning
- Statistics is a huge field; I’ve simplified considerably here. For example:
  - Mean, Median, and Standard Deviation
    - There are alternative formulas
  - Standard Error and the 95% Confidence Interval
    - There are other ways to calculate CIs (e.g., z statistic instead of t; difference between two means, rather than a single mean)
  - Error Bars
    - Don’t go beyond the interpretations I give here!
  - Comparing Means of Two Data Sets
    - We cover only the t test for two means when the variances are unknown but equal; there are other tests
  - Linear Regression
    - We look only at simple LR and calculate only the intercept, slope, and R². There is much more to LR!

Tables
Table 1: Average Turbidity and Color of Water Treated by Portable Water Filters
- Consistent Format, Title, Units, Big Fonts
- Differentiate Headings, Number Columns

Figures
Figure 1: Turbidity of Pond Water, Treated and Untreated
- Consistent Format, Title, Units
- Good Axis Titles, Big Fonts

Populations and Samples
- Population
  - All of the possible outcomes of experiment or observation
    - US population
    - Particular type of steel beam
- Sample
  - A finite number of outcomes measured or observations made
    - 1000 US citizens
    - 5 beams
- We use samples to estimate population properties
  - Mean, Variability (e.g. standard deviation), Distribution
    - Height of 1000 US citizens used to estimate mean of US population

Mean and Median
- Turbidity of Treated Water (NTU)
Mean = Sum of values divided by number of samples = ( )/6 = 5.2 NTU
Median = The middle number
For an even number of sample points, average the middle two = (3+6)/2 = 4.5
Excel: Mean - AVERAGE; Median - MEDIAN
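As a rough illustration outside Excel, the same two statistics can be computed with Python's standard `statistics` module. The six turbidity values below are hypothetical; they are chosen only so that the results agree with the figures on the slide (mean about 5.2 NTU, median 4.5 NTU).

```python
import statistics

# Hypothetical turbidity readings (NTU); not the slide's actual data.
turbidity = [1, 3, 3, 6, 8, 10]

# Mean: sum of the values divided by the number of samples.
mean_ntu = statistics.mean(turbidity)

# Median: middle value of the ranked data; for an even count,
# the average of the two middle values (here 3 and 6).
median_ntu = statistics.median(turbidity)

print(f"mean = {mean_ntu:.1f} NTU, median = {median_ntu} NTU")
```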

Variance
- Measure of variability
  - Sum of the square of the deviation about the mean divided by degrees of freedom
- n = number of data points
Excel: variance - VAR
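Written out, the description above corresponds to the usual sample variance. This sketch assumes the degrees of freedom are n - 1, which is what Excel's VAR uses; the symbols x_i (individual data points) and x̄ (sample mean) are notation introduced here.

```latex
s^2 = \frac{\sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2}{n - 1}
```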

Standard Deviation, s
- Square root of the variance
- For phenomena following a Normal Distribution (bell curve), 95% of population values lie within 1.96 standard deviations of the mean
- Area under curve is probability of getting value within specified range
[Figure: bell curve plotted against Standard Deviations from Mean]
Excel: standard deviation - STDEV
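A minimal sketch of these quantities in Python, using only the standard library. The data values are hypothetical, and `statistics.variance` and `statistics.stdev` are used here as the sample (n - 1) counterparts of Excel's VAR and STDEV. The last lines check the 1.96 figure against the standard normal curve.

```python
import statistics
from statistics import NormalDist

# Hypothetical turbidity readings (NTU); not data from the slides.
data = [1, 3, 3, 6, 8, 10]

sample_variance = statistics.variance(data)  # divides by n - 1
sample_std = statistics.stdev(data)          # square root of the variance

# Area under the standard normal curve between -1.96 and +1.96
# standard deviations from the mean.
standard_normal = NormalDist(mu=0.0, sigma=1.0)
fraction_within = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

print(f"variance = {sample_variance:.2f}, s = {sample_std:.2f}")
print(f"fraction within 1.96 s of the mean = {fraction_within:.3f}")
```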

Standard Error of Mean
- Standard error of mean
  - Of sample of size n
  - Taken from population with standard deviation s
  - Estimate of mean depends on sample selected
  - As n increases, variance of mean estimate goes down, i.e., estimate of population mean improves
  - As n increases, mean estimate distribution approaches normal, regardless of population distribution
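For reference, the standard textbook expression for the standard error of the mean, in terms of the s and n defined above, is sketched below. The symbol SE is introduced here for illustration. Because n appears under a square root, quadrupling the sample size halves the standard error.

```latex
SE = \frac{s}{\sqrt{n}}
```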

95% Confidence Interval (CI) for Mean
- Interval within which we are 95% confident the true mean lies
- t(95%, n-1) is the t-statistic for a 95% CI if sample size = n
  - If n ≥ 30, let t(95%, n-1) = 1.96 (Normal Distribution)
  - Otherwise, use Excel formula: TINV(0.05,n-1)
- n = number of data points
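A hedged sketch of the calculation, assuming the usual form of the interval: mean ± t × s/√n. The function name `ci95_half_width` and the data are hypothetical. Python's standard library has no t-distribution, so for n < 30 the t value must be supplied by the caller; 2.571 below is the two-tailed 95% value for 5 degrees of freedom, i.e. what TINV(0.05,5) returns.

```python
import math
import statistics


def ci95_half_width(data, t_value=None):
    """Return the half-width of the 95% CI for the mean of `data`.

    For n >= 30 the normal value 1.96 is used, following the slide.
    For smaller samples, pass t_value from TINV(0.05, n-1) or a t table.
    """
    n = len(data)
    if t_value is None:
        if n < 30:
            raise ValueError("n < 30: supply t_value from TINV(0.05, n-1)")
        t_value = 1.96
    standard_error = statistics.stdev(data) / math.sqrt(n)
    return t_value * standard_error


# Hypothetical turbidity readings (NTU); n = 6, so 5 degrees of freedom.
turbidity = [1, 3, 3, 6, 8, 10]
mean_ntu = statistics.mean(turbidity)
half_width = ci95_half_width(turbidity, t_value=2.571)

print(f"95% CI: {mean_ntu:.2f} ± {half_width:.2f} NTU")
```

Requiring an explicit t value for small samples is a deliberate choice here: it avoids silently using 1.96 where it would make the interval too narrow.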

Error Bars
- Show data variability on plot of mean values
- Types of error bars include:
  - ± Standard Deviation, ± Standard Error, ± 95% CI
  - Maximum and minimum value

Using Error Bars to compare data
- Standard Deviation
  - Demonstrates data variability, but no comparison possible
- Standard Error
  - If bars overlap, any difference in means is not statistically significant
  - If bars do not overlap, indicates nothing!
- 95% Confidence Interval
  - If bars overlap, indicates nothing!
  - If bars do not overlap, difference is statistically significant
- We’ll use 95% CI
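The 95% CI rule above can be sketched as a small check. The function names and the example numbers are hypothetical; each bar is described by its mean and the half-width of its 95% CI.

```python
def bars_overlap(mean_a, half_a, mean_b, half_b):
    """True if the intervals mean_a ± half_a and mean_b ± half_b share any value."""
    return abs(mean_a - mean_b) <= half_a + half_b


def interpret_ci_bars(mean_a, half_a, mean_b, half_b):
    """Apply the slide's rule for 95% CI error bars."""
    if bars_overlap(mean_a, half_a, mean_b, half_b):
        return "indicates nothing"
    return "difference is statistically significant"


# Hypothetical mean turbidities (NTU) and 95% CI half-widths for two filters.
narrow_bars = interpret_ci_bars(2.0, 0.3, 3.0, 0.4)
wide_bars = interpret_ci_bars(2.0, 0.6, 3.0, 0.6)

print(narrow_bars)
print(wide_bars)
```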

Example 1
Create Bar Chart of Name vs Mean. Right-click on data. Select “Format Data Series”.

Example 2

What can we do?
- Plot mean water quality data for various filters with error bars
- Plot mean water quality over time with error bars

Comparing Filter Performance
- Use t test to determine if the means of two populations are different.
  - Based on two data sets
    - E.g., turbidity produced by two different filters

Comparing Two Data Sets using the t test
- Example - You pump 20 gallons of water through filters 1 and 2. After every gallon, you measure the turbidity.
  - Filter 1: Mean = 2 NTU, s = 0.5 NTU, n = 20
  - Filter 2: Mean = 3 NTU, s = 0.6 NTU, n = 20
- You ask the question - Do the Filters make water with a different mean turbidity?

Do the Filters make different water?
- Use TTEST (Excel)
- Returns the fractional probability of being wrong if you answer yes
  - We want the probability to be small: 0.01 to 0.10 (1 to 10%). Use 0.01
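Excel's TTEST works on the raw measurements, which the slides do not list. As a sketch under that limitation, the equal-variance two-sample t test can be run from the summary statistics in the two-filter example (means 2 and 3 NTU, s = 0.5 and 0.6 NTU, n = 20 each). The function names are hypothetical, and the two-tailed probability is obtained by numerically integrating the t-distribution density with Simpson's rule, since Python's standard library has no t-distribution.

```python
import math


def t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 minus the area between -|t| and +|t|.

    The area is computed with Simpson's rule on [0, |t|] and doubled
    (the density is symmetric). `steps` must be even.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (total * h / 3)
    return max(0.0, 1.0 - central_area)


def pooled_t_test(mean1, s1, n1, mean2, s2, n2):
    """Equal-variance (pooled) two-sample t test from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / std_err
    return t, df, two_tailed_p(t, df)


# Summary statistics from the two-filter example.
t_stat, df, p_value = pooled_t_test(2.0, 0.5, 20, 3.0, 0.6, 20)

print(f"t = {t_stat:.2f}, df = {df}, p = {p_value:.2e}")
print("different" if p_value < 0.01 else "no significant difference")
```

With these numbers the probability falls well below the 0.01 threshold, so the answer to the question on the slide would be yes.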

“t test” Questions
- Do two filters make different water?
  - Take multiple measurements of a particular water quality parameter for 2 filters
- Do two filters treat different amounts of water between cleanings?
  - Measure amount of water filtered between cleanings for two filters
- Does the amount of water a filter treats between cleanings differ after a certain amount of water is treated?
  - For a single filter, measure the amount of water treated between cleanings before and after a certain total amount of water is treated

Linear Regression
- Fit the best straight line to a data set
- Right-click on a data point and use the “trendline” option. Use the “options” tab to get the equation and R².
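Outside Excel, the same line can be fitted with the ordinary least-squares formulas. This is a minimal sketch, not a description of what the trendline option does internally; the function name `fit_line` and the data points are hypothetical.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept, r2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of the deviations about the means.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2: 1 minus residual sum of squares over total sum of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical data lying close to a straight line.
x_values = [0, 1, 2, 3, 4]
y_values = [1.0, 2.9, 5.1, 7.0, 9.0]
slope, intercept, r2 = fit_line(x_values, y_values)

print(f"y = {slope:.2f}x + {intercept:.2f}, R^2 = {r2:.4f}")
```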

R² - Coefficient of Multiple Determination
- ŷ_i = predicted y values, from the regression equation
- y_i = observed y values
- R² = fraction of variance explained by the regression (variance = standard deviation squared)
- R² = 1 if the data lie along a straight line
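In terms of the quantities defined above, the standard definition of R² can be sketched as follows. The symbol ȳ (the mean of the observed y values) is notation introduced here. When every predicted value equals the observed value, the numerator is zero and R² = 1.

```latex
R^2 = 1 - \frac{\sum_{i} \left( y_i - \hat{y}_i \right)^2}{\sum_{i} \left( y_i - \bar{y} \right)^2}
```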