Variance Harry R. Erwin, PhD School of Computing and Technology University of Sunderland

Resources
Crawley, M. J. (2005) Statistics: An Introduction Using R. Wiley.
Gentle, J. E. (2002) Elements of Computational Statistics. Springer.
Gonick, L. and Smith, W. (1993) The Cartoon Guide to Statistics. HarperResource (for fun).

Measure of Variability
This is the most important quantity in statistical analysis. Data may show no central tendency, but they almost always show variation. The greater the variability:
– the greater the uncertainty about parameters estimated from the data;
– the lower our ability to distinguish between competing hypotheses.

Typical Measures of Variability
– The range: depends only on the outlying values.
– Sum of the differences between the data and the mean: useless, because it is zero by definition.
– Sum of the absolute values of the differences between the data and the mean: a very good measure, although hard to work with mathematically.
– Sum of the squares of the differences between the data and the mean: the measure most often used. Divide by the number of data points to get the mean squared deviation.
– The variance is slightly different again (next slide). An R sketch of these measures follows.
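A minimal R sketch of these quantities, using a made-up data vector (the values are illustrative, not from the lecture):

x <- c(2, 3, 3, 4, 5, 7)             # illustrative data, mean = 4
max(x) - min(x)                      # the range: 5
sum(x - mean(x))                     # zero by definition (up to rounding error)
sum(abs(x - mean(x)))                # sum of absolute deviations: 8
sum((x - mean(x))^2)                 # sum of squares: 16
sum((x - mean(x))^2) / length(x)     # mean squared deviation: 16/6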

Degrees of Freedom
Suppose you have n data points, v. The mean, m(v), is the sum of the data point values divided by n (the number of independent pieces of information). Now suppose you know n-1 of the data point values and the mean. What is the value of the remaining point? It is completely determined. Hence an estimate involving the data points and the mean actually has only n-1 independent pieces of information, and the degrees of freedom of the estimate are n-1. Definition: the degrees of freedom of an estimate, df, equal the sample size, n, minus the number of parameters, p, already estimated from the data.
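A quick R illustration with made-up numbers: given the mean and n-1 of the points, the last point is forced.

v <- c(4, 7, 2, 9, 3)          # illustrative data, n = 5
m <- mean(v)                   # 5
length(v) * m - sum(v[1:4])    # n*m minus the first n-1 points recovers v[5] = 3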

Variance
If you have n points of data, v, from an unknown distribution and you want to estimate its variability, use the following equation:
variance = s² = (sum of squares)/(n-1)
where the sum of squares, Σ(v - m(v))², is computed about the sample mean. Note that you divide by n-1: this is the df of the sample variance, because one parameter (the mean) has already been estimated from the data. If you knew the true mean you could divide by n, but when all you have are the sample data, dividing by n yields an estimate that is biased low.
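A short check in R that this formula agrees with var() (illustrative data):

v <- c(2, 3, 3, 4, 5, 7)       # illustrative data
ss <- sum((v - mean(v))^2)     # sum of squares about the sample mean: 16
ss / (length(v) - 1)           # divide by n-1: 3.2
var(v)                         # R's var() also divides by n-1: 3.2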

Variance and Sample Size
The sample variance is not well behaved: the number of data points, n, affects how much the estimate varies. For a small number of points the variance estimate varies a lot, and it can still vary by about a factor of three for 30 points. Rules of thumb (a simulation sketch follows):
– You want a large number of independent data points if you need to estimate the variance.
– Fewer than 10 sample points is a very small sample.
– Fewer than 30 points is a small sample.
– 30 points is a reasonable sample.
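A hedged simulation sketch of this instability: draw repeated samples from a standard normal (true variance 1) and look at the central 95% range of the sample variances at each sample size.

set.seed(42)                   # illustrative seed
spread <- function(n) {
  quantile(replicate(1000, var(rnorm(n))), c(0.025, 0.975))
}
spread(5)      # very wide for a very small sample
spread(30)     # still spans roughly a factor of three
spread(300)    # much tighter for a large sample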

Measures of Unreliability
Given a sample variance, s², how much will the estimate of the mean vary across repeated samples? This is measured by the standard error of the mean:
SE = √(s²/n)
The central limit theorem implies that the estimate of the mean converges to a normal distribution as n increases. You can use this fact to derive a confidence interval for your estimate of the mean (n of 30 or more allows the normal distribution to be used).
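A minimal sketch of the standard error and the resulting large-sample interval (the sample is illustrative):

y <- rnorm(50, mean = 10, sd = 2)      # illustrative sample, n = 50
se <- sqrt(var(y) / length(y))         # standard error of the mean
mean(y) + c(-1.96, 1.96) * se          # approximate 95% CI using the normal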

Small Sample Confidence Intervals For n<30, you can’t assume the normal distribution applies. Instead, you usually use Student’s t-distribution, which incorporates the degrees of freedom of the sample. You can also use bootstrap methods (advanced).
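A sketch of the corresponding t-based interval for a small sample; t.test() reports the same interval, so it serves as a check.

y <- c(3, 5, 4, 6, 2, 5, 4)              # illustrative small sample, n = 7
se <- sqrt(var(y) / length(y))           # standard error of the mean
tcrit <- qt(0.975, df = length(y) - 1)   # Student's t quantile with n-1 df
mean(y) + c(-tcrit, tcrit) * se          # 95% confidence interval
t.test(y)$conf.int                       # the same interval from t.test()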

Confidence Intervals
Three ways of generating a confidence interval for an estimate:
– Assume a normal distribution (you need lots of samples).
– Assume a χ² distribution (fewer samples).
– Bootstrapping (makes the fewest assumptions, but is computationally demanding; a sketch follows).
Demonstration (advanced)
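A minimal percentile-bootstrap sketch in base R (no extra packages), resampling the data with replacement:

y <- c(3, 5, 4, 6, 2, 5, 4)                                 # illustrative sample
boot <- replicate(10000, mean(sample(y, replace = TRUE)))   # resampled means
quantile(boot, c(0.025, 0.975))                             # percentile 95% CI for the mean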

R Demonstrations of all this…
From the book (Crawley 2005):
ozone <- read.table("gardens.txt", header = T)   # read the garden ozone data
attach(ozone)                                    # put the columns on the search path
ozone                                            # print the data frame
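If gardens.txt is not to hand, a data frame of the same shape can be mocked up so the commands run; these values are placeholders, not the lecture's data:

set.seed(1)                                     # mock data, NOT the real gardens.txt
ozone <- data.frame(gardenA = rnorm(10, 3, 1),
                    gardenB = rnorm(10, 5, 1),
                    gardenC = rnorm(10, 5, 4))  # gardenC made deliberately noisier
attach(ozone)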

Ozone Data Frame
The data frame has three columns of ozone readings, ten observations each: gardenA, gardenB, and gardenC.

Continued
mean(gardenA)   # 3
mean(gardenB)   # 5
mean(gardenC)   # 5
Are gardenB and gardenC distinguishable?

Continued Further
var(gardenA)
var(gardenB)
var(gardenC)
gardenA and gardenB have the same variance; gardenC does not!

Apply var.test
var.test(gardenB, gardenC)

F test to compare two variances

data: gardenB and gardenC
F = , num df = 9, denom df = 9, p-value =
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
sample estimates: ratio of variances

Implications
Since gardenA and gardenB have the same variance, you can use t.test to compare their means and conclude that they are significantly different. Since gardenC has a different variance, you cannot use the standard t test to compare its mean with the others, and must use something weaker (see the sketch below).
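The "something weaker" is typically a non-parametric test; one standard choice is the Wilcoxon rank-sum test, which compares the two samples without assuming normality or equal variances:

wilcox.test(gardenB, gardenC)   # non-parametric comparison; B and C share a mean but not a variance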

Application of t.test to gardenA and gardenB
t.test(gardenA, gardenB)

Welch Two Sample t-test

data: gardenA and gardenB
t = , df = 18, p-value =
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
sample estimates:
mean of x  mean of y
        3          5
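As a usage note, the pieces of the result can be extracted programmatically (this works for any object of class "htest"):

result <- t.test(gardenA, gardenB)
result$p.value     # just the p-value
result$conf.int    # just the 95 percent confidence interval
result$estimate    # the two sample means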

Application of t.test to gardenA and gardenC
t.test(gardenA, gardenC)

Welch Two Sample t-test

data: gardenA and gardenC
t = , df = , p-value =
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
sample estimates:
mean of x  mean of y
        3          5