Basic concepts: measures of central tendency; measures of dispersion & variability


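Measures of central tendency
Arithmetic mean (= simple average): $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$, where $\sum$ denotes summation, $X_i$ is a measurement in the population, $i$ is the index of the measurement, and $N$ is the population size.
The best estimate of the population mean is the sample mean, $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$, where $n$ is the sample size.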

Measures of variability
All describe how "spread out" the data are.
1. Sum of squares (SS), the sum of squared deviations from the mean. For a sample, $SS = \sum_{i=1}^{n} (X_i - \bar{X})^2$.

2. Average or mean sum of squares = variance, $s^2$. For a sample, $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$. Why $n - 1$?

$n - 1$ represents the degrees of freedom, $\nu$ (Greek letter "nu"), or the number of independent quantities in the estimate $s^2$. The deviations from the sample mean always sum to zero, $\sum_{i=1}^{n} (X_i - \bar{X}) = 0$; therefore, once $n - 1$ of the deviations are specified, the last deviation is already determined.

3. Standard deviation, $s$. For a sample, $s = \sqrt{s^2}$. Variance has squared measurement units; to regain the original units, take the square root.

4. Standard error of the mean. For a sample, $s_{\bar{X}} = \frac{s}{\sqrt{n}}$. The standard error of the mean is a measure of variability among the means of repeated samples from a population.
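The four measures of variability above can be computed step by step. The following is a minimal sketch using only the Python standard library; the function name `describe` and the five measurements are hypothetical and serve only as an illustration.

```python
import math

def describe(sample):
    """Return the sample mean and the four measures of variability."""
    n = len(sample)
    mean = sum(sample) / n                      # sample mean, X-bar
    ss = sum((x - mean) ** 2 for x in sample)   # 1. sum of squares
    df = n - 1                                  # degrees of freedom, nu
    variance = ss / df                          # 2. variance, s^2
    sd = math.sqrt(variance)                    # 3. standard deviation, s
    se = sd / math.sqrt(n)                      # 4. standard error of the mean
    return {"n": n, "mean": mean, "ss": ss, "df": df,
            "variance": variance, "sd": sd, "se": se}

# Hypothetical measurements, for illustration only.
result = describe([2.0, 4.0, 4.0, 6.0, 9.0])
for name, value in result.items():
    print(f"{name:>8}: {value:.4f}")
```

For these five values the mean is 5, the sum of squares is 28, and the variance is 28/4 = 7, so the standard deviation and standard error follow as the square roots of 7 and 7/5.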

[Figure: a population of values, and the means of repeated random samples drawn from it, each with sample size n = 5.]

For a large enough number of large samples, the frequency distribution of the sample means (= sampling distribution) approaches a normal distribution.
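A small simulation can make the sampling distribution concrete. The sketch below is an illustration under assumed settings, not part of the original slides: the uniform population, the seed, and the number of repeated samples are all hypothetical choices. It draws repeated samples of size n = 5 and compares the spread of the sample means with the population standard deviation divided by the square root of n.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical, deliberately non-normal population (uniform on 0..100).
population = [random.uniform(0, 100) for _ in range(10_000)]

n = 5  # size of each sample
sample_means = [statistics.mean(random.sample(population, n))
                for _ in range(2_000)]

pop_sd = statistics.pstdev(population)
print("population mean:      ", round(statistics.mean(population), 2))
print("mean of sample means: ", round(statistics.mean(sample_means), 2))
print("SD of sample means:   ", round(statistics.stdev(sample_means), 2))
print("population SD/sqrt(n):", round(pop_sd / n ** 0.5, 2))
```

The sample means cluster around the population mean, and their standard deviation is close to the population standard deviation divided by the square root of n, which is the quantity the standard error of the mean estimates.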

Normal distribution: bell-shaped curve

Testing statistical hypotheses about 2 means
1. State the research question in terms of statistical hypotheses. We always start with a statement that hypothesizes "no difference", called the null hypothesis, $H_0$.
E.g., $H_0$: Mean bill length of female hummingbirds is equal to mean bill length of male hummingbirds, $\mu_{female} = \mu_{male}$.

Then we formulate a statement that must be true if the null hypothesis is false, called the alternate hypothesis, $H_A$.
E.g., $H_A$: Mean bill length of female hummingbirds is not equal to mean bill length of male hummingbirds, $\mu_{female} \neq \mu_{male}$.
If we reject $H_0$ as a result of sample evidence, then we conclude that $H_A$ is true.

2. Choose an appropriate statistical test that would allow you to reject $H_0$ if $H_0$ were false.

E.g., Student's t test for hypotheses about means, named for William Sealy Gosset (a.k.a. "Student").

Is the difference between sample means bigger than we would expect, given the variability in the sampled populations?

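t statistic: $t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$, where $\bar{X}_1$ is the mean of sample 1, $\bar{X}_2$ is the mean of sample 2, and $s_{\bar{X}_1 - \bar{X}_2}$ is the standard error of the difference between the sample means.
To estimate $s_{\bar{X}_1 - \bar{X}_2}$, we must first know the relation between both populations.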

Relation between populations
Dependent populations
Independent populations:
1. Identical (homogeneous) variances
2. Not identical (heterogeneous) variances

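Independent populations with homogeneous variances
Pooled variance: $s_p^2 = \frac{SS_1 + SS_2}{\nu_1 + \nu_2}$, where $SS_1$, $SS_2$ are the sample sums of squares and $\nu_1 = n_1 - 1$, $\nu_2 = n_2 - 1$ are the degrees of freedom.
Then, $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$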
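As a minimal sketch of the pooled-variance calculation for two independent samples with homogeneous variances, the function below pools the two sums of squares over the combined degrees of freedom and then forms the t statistic. The function name `pooled_t` and the two small samples are hypothetical; they are not the hummingbird data.

```python
import math

def pooled_t(sample1, sample2):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    ss1 = sum((x - mean1) ** 2 for x in sample1)  # sum of squares, sample 1
    ss2 = sum((x - mean2) ** 2 for x in sample2)  # sum of squares, sample 2
    df = (n1 - 1) + (n2 - 1)                      # nu_1 + nu_2
    pooled_var = (ss1 + ss2) / df                 # pooled variance
    se_diff = math.sqrt(pooled_var / n1 + pooled_var / n2)
    return (mean1 - mean2) / se_diff, df

# Hypothetical samples, for illustration only.
t, df = pooled_t([10.0, 12.0, 14.0], [7.0, 9.0, 11.0])
print(f"t = {t:.3f} with {df} degrees of freedom")
```

Here both sums of squares equal 8, so the pooled variance is 16/4 = 4 and the standard error of the difference is the square root of 8/3.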

3. Select the level of significance for the statistical test. Level of significance = alpha value = $\alpha$ = the probability of incorrectly rejecting the null hypothesis when it is, in fact, true.

Traditionally, researchers choose $\alpha$ = 0.05. That is, 5 percent of the time, or 1 time out of 20, the statistical test will reject $H_0$ when it is true. Note: the choice of 0.05 is arbitrary!

4. Determine the critical value the test statistic must attain to be declared significant. Most test statistics have a frequency distribution …

When sample sizes are small, the sampling distribution is described better by the t distribution than by the standard normal (Z) distribution. The shape of the t distribution depends on the degrees of freedom, $\nu = n - 1$.

[Figure: t distributions for $\nu$ = 1, $\nu$ = 5, and $\nu$ = 25, together with Z = t for $\nu = \infty$; horizontal axis: t.]

The distribution of a test statistic is divided into an area of acceptance and an area of rejection.

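[Figure: t distribution for $\alpha$ = 0.05, with an area of rejection below the lower critical value, an area of acceptance between the two critical values, and an area of rejection above the upper critical value.]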

5. Perform the statistical test.
E.g., mean bill length from a sample of 5 female hummingbirds, $\bar{X}_1$ = 15.75;
mean bill length from a sample of 5 male hummingbirds, $\bar{X}_2$ = 14.25.

6. Draw and state the conclusions. Compare the calculated test statistic with the critical test statistic at the chosen $\alpha$. Obtain the P-value = probability for the test statistic. Reject or fail to reject $H_0$.
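When only a printed t table is available, the P-value is usually bracketed between two tabulated significance levels rather than computed exactly. The sketch below shows that lookup for 8 degrees of freedom (two samples of 5). The critical values are the usual two-tailed table entries; the function name `bracket_p` and the test value t = 3.0 are hypothetical.

```python
# Two-tailed critical values of t for 8 degrees of freedom, as printed in
# standard t tables: (alpha, critical t), ordered from large alpha to small.
CRITICAL_T_DF8 = [(0.10, 1.860), (0.05, 2.306), (0.02, 2.896),
                  (0.01, 3.355), (0.001, 5.041)]

def bracket_p(t_value, table=CRITICAL_T_DF8):
    """Return (lower, upper) bounds on the two-tailed P for a t value.

    None marks an open end: (0.10, None) means P > 0.10 and
    (None, 0.001) means P < 0.001.
    """
    t_abs = abs(t_value)
    upper = None
    for alpha, critical in table:
        if t_abs >= critical:
            upper = alpha           # significant at this alpha, keep going
        else:
            return (alpha, upper)   # first alpha at which t is not significant
    return (None, upper)

# Hypothetical calculated t, for illustration only.
print(bracket_p(3.0))   # (0.01, 0.02): 0.01 < P < 0.02
print(bracket_p(1.0))   # (0.1, None): P > 0.10
```

A calculated t of 3.0 exceeds the 0.02 entry but not the 0.01 entry, so the test rejects at $\alpha$ = 0.05 and P lies between 0.01 and 0.02.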

Critical t for a test about equality = $t_{\alpha(2),\nu}$

E.g., to test $H_0$: $\mu_{female} = \mu_{male}$, $H_A$: $\mu_{female} \neq \mu_{male}$ at $\alpha$ = 0.05 using $n_{female}$ = 5, $n_{male}$ = 5, $\nu = n_{female} + n_{male} - 2 = 8$: if $|t| \geq 2.306$, reject $H_0$. $t_{\alpha(2),\nu} = t_{0.05(2),8} = 2.306$

Since calculated t > $t_{0.05(2),8}$ (the calculated value exceeds 2.306), reject $H_0$.
Conclude that hummingbird bill length is sexually size-dimorphic.

What is the probability, P, of observing by chance a difference as large as we saw between female and male hummingbird bill lengths?
0.01 < P < 0.02

Analysis of Variance (ANOVA)

What is ANOVA?
ANOVA (Analysis of Variance) is a procedure designed to determine if the manipulation of one or more independent variables in an experiment has a statistically significant influence on the value of the dependent variable.
It is assumed that:
Each independent variable is categorical (nominal scale). Independent variables are called factors and their values are called levels.
The dependent variable is numerical (ratio scale).
The basic idea is that the "variance" of the dependent variable given the influence of one or more independent variables {Expected Sum of Squares for a Factor} is checked to see if it is significantly greater than the "variance" of the dependent variable assuming no influence of the independent variables {also known as the Mean-Square-Error (MSE)}.

Analysis of Variance
Analysis of Variance (ANOVA) can be used to test for the equality of three or more population means using data obtained from observational or experimental studies.
We want to use the sample results to test the following hypotheses.
$H_0$: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
$H_a$: Not all population means are equal
If $H_0$ is rejected, we cannot conclude that all population means are different.
Rejecting $H_0$ means that at least two population means have different values.

Assumptions for Analysis of Variance
For each population, the response variable is normally distributed.
The variance of the response variable, denoted $\sigma^2$, is the same for all of the populations.
The effects of the independent variables are additive.
The observations must be independent.

Analysis of Variance: Testing for the Equality of k Population Means
Between-Treatments Estimate of Population Variance
Within-Treatments Estimate of Population Variance
Comparing the Variance Estimates: The F Test
ANOVA Table

Between-Treatments Estimate of Population Variance
A between-treatments estimate of $\sigma^2$ is called the mean square due to treatments (MSTR).
$MSTR = \frac{SSTR}{k - 1}$, with $SSTR = \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{\bar{x}})^2$, where $n_j$ and $\bar{x}_j$ are the size and mean of sample $j$ and $\bar{\bar{x}}$ is the overall sample mean.
The numerator of MSTR is called the sum of squares due to treatments (SSTR).
The denominator of MSTR represents the degrees of freedom associated with SSTR.

Within-Treatments Estimate of Population Variance
The estimate of $\sigma^2$ based on the variation of the sample observations within each treatment is called the mean square due to error (MSE).
$MSE = \frac{SSE}{n_T - k}$, with $SSE = \sum_{j=1}^{k} (n_j - 1) s_j^2$, where $s_j^2$ is the variance of sample $j$ and $n_T$ is the total number of observations.
The numerator of MSE is called the sum of squares due to error (SSE).
The denominator of MSE represents the degrees of freedom associated with SSE.
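The between-treatments and within-treatments estimates can be computed directly from raw observations. The following is a minimal sketch, assuming each treatment is given as a list of numbers; the function name `one_way_anova` and the toy data are hypothetical.

```python
def one_way_anova(groups):
    """Return SSTR, SSE, MSTR, MSE and F = MSTR/MSE for a list of samples."""
    k = len(groups)                                  # number of treatments
    n_total = sum(len(g) for g in groups)            # n_T
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between treatments: group sizes times squared distance of each
    # group mean from the overall mean.
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within treatments: squared deviations of observations from their
    # own group mean (equivalent to summing (n_j - 1) * s_j^2).
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    mstr = sstr / (k - 1)
    mse = sse / (n_total - k)
    return {"SSTR": sstr, "SSE": sse, "MSTR": mstr, "MSE": mse,
            "F": mstr / mse}

# Hypothetical toy data: three treatments with three observations each.
toy = one_way_anova([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(toy)
```

For the toy data the group means are 2, 3 and 7 around an overall mean of 4, so SSTR = 3(4 + 1 + 9) = 42 and SSE = 2 + 2 + 2 = 6, giving MSTR = 21, MSE = 1 and F = 21.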

Comparing the Variance Estimates: The F Test
If the null hypothesis is true and the ANOVA assumptions are valid, the sampling distribution of MSTR/MSE is an F distribution with MSTR d.f. equal to $k - 1$ and MSE d.f. equal to $n_T - k$.
If the means of the k populations are not equal, the value of MSTR/MSE will be inflated because MSTR overestimates $\sigma^2$.
Hence, we will reject $H_0$ if the resulting value of MSTR/MSE appears to be too large to have been selected at random from the appropriate F distribution.

Test for the Equality of k Population Means
Hypotheses
$H_0$: $\mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$
$H_a$: Not all population means are equal
Test Statistic
F = MSTR/MSE

Test for the Equality of k Population Means
Rejection Rule
Using test statistic: Reject $H_0$ if $F > F_\alpha$
Using p-value: Reject $H_0$ if p-value < $\alpha$
where the value of $F_\alpha$ is based on an F distribution with $k - 1$ numerator degrees of freedom and $n_T - k$ denominator degrees of freedom

The figure below shows the rejection region associated with a level of significance equal to $\alpha$, where $F_\alpha$ denotes the critical value.
[Figure: sampling distribution of MSTR/MSE; do not reject $H_0$ to the left of the critical value $F_\alpha$, reject $H_0$ to the right.]

ANOVA Table
Source of      Sum of     Degrees of    Mean
Variation      Squares    Freedom       Square     F
Treatment      SSTR       k - 1         MSTR       MSTR/MSE
Error          SSE        n_T - k       MSE
Total          SST        n_T - 1
SST divided by its degrees of freedom, $n_T - 1$, is simply the overall sample variance that would be obtained if we treated the entire set of $n_T$ observations as one data set.

Example: Reed Manufacturing
Analysis of Variance
J. R. Reed would like to know if the mean number of hours worked per week is the same for the department managers at her three manufacturing plants (Buffalo, Pittsburgh, and Detroit). A simple random sample of 5 managers from each of the three plants was taken, and the number of hours worked by each manager for the previous week is shown on the next slide.

Example: Reed Manufacturing
Sample Data
                   Plant 1    Plant 2       Plant 3
Observation        Buffalo    Pittsburgh    Detroit
1 to 5             …          …             …
Sample Mean        55         68            57
Sample Variance    26.0       26.5          24.5

Example: Reed Manufacturing
Hypotheses
$H_0$: $\mu_1 = \mu_2 = \mu_3$
$H_a$: Not all the means are equal
where:
$\mu_1$ = mean number of hours worked per week by the managers at Plant 1
$\mu_2$ = mean number of hours worked per week by the managers at Plant 2
$\mu_3$ = mean number of hours worked per week by the managers at Plant 3

Example: Reed Manufacturing
Mean Square Due to Treatments
Since the sample sizes are all equal, the overall sample mean is the average of the three sample means:
$\bar{\bar{x}} = (55 + 68 + 57)/3 = 60$
$SSTR = 5(55 - 60)^2 + 5(68 - 60)^2 + 5(57 - 60)^2 = 490$
$MSTR = 490/(3 - 1) = 245$
Mean Square Due to Error
$SSE = 4(26.0) + 4(26.5) + 4(24.5) = 308$
$MSE = 308/(15 - 3) = 25.667$

Example: Reed Manufacturing
F Test
If $H_0$ is true, the ratio MSTR/MSE should be near 1 because both MSTR and MSE are estimating $\sigma^2$. If $H_a$ is true, the ratio should be significantly larger than 1 because MSTR tends to overestimate $\sigma^2$.

Rejection Rule
Using test statistic: Reject $H_0$ if F > 3.89
Using p-value: Reject $H_0$ if p-value < .05
where $F_{.05}$ = 3.89 is based on an F distribution with 2 numerator degrees of freedom and 12 denominator degrees of freedom
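The critical value 3.89 would normally be read from an F table or obtained from statistical software. As a cross-check, the sketch below relies on a property that is not stated in the slides and is offered here as an assumption of the example: when the numerator has exactly 2 degrees of freedom (three treatments), the upper-tail probability of the F distribution has the closed form $P(F > f) = (d_2 / (d_2 + 2f))^{d_2/2}$, which can be solved for f. The function name is hypothetical, and the formula does not apply for other numerator degrees of freedom.

```python
def f_critical_two_numerator_df(alpha, denom_df):
    """Upper-tail critical value of F with 2 numerator degrees of freedom.

    Solves (d2 / (d2 + 2 f)) ** (d2 / 2) = alpha for f. Valid only when the
    numerator degrees of freedom equal 2 (that is, k = 3 treatments).
    """
    return (denom_df / 2) * (alpha ** (-2 / denom_df) - 1)

critical = f_critical_two_numerator_df(alpha=0.05, denom_df=12)
print(f"F_.05 with (2, 12) degrees of freedom: {critical:.2f}")
```

With $\alpha$ = .05 and 12 denominator degrees of freedom this reproduces the tabulated 3.89; for designs with more than three treatments a statistics library or table is still needed.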

Example: Reed Manufacturing
Test Statistic
F = MSTR/MSE = 245/25.667 = 9.55
Conclusion
F = 9.55 > $F_{.05}$ = 3.89, so we reject $H_0$. The mean number of hours worked per week by department managers is not the same at each plant.

Example: Reed Manufacturing
ANOVA Table
Source of      Sum of     Degrees of    Mean
Variation      Squares    Freedom       Square     F
Treatments     490        2             245        9.55
Error          308        12            25.667
Total          798        14
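The ANOVA table follows mechanically from the two sums of squares, the number of treatments, and the total sample size. The sketch below rebuilds it from the Reed Manufacturing values given in the example (SSTR = 490, SSE = 308, k = 3, n_T = 15); the function names and the column layout are hypothetical.

```python
def anova_table(sstr, sse, k, n_total):
    """Build one-way ANOVA rows: (source, SS, df, MS, F)."""
    df_treat = k - 1
    df_error = n_total - k
    mstr = sstr / df_treat
    mse = sse / df_error
    return [
        ("Treatments", sstr, df_treat, mstr, mstr / mse),
        ("Error", sse, df_error, mse, None),
        ("Total", sstr + sse, n_total - 1, None, None),  # SST = SSTR + SSE
    ]

def format_rows(rows):
    """Render the rows as fixed-width text."""
    lines = [f"{'Source':<12}{'SS':>8}{'df':>6}{'MS':>10}{'F':>8}"]
    for source, ss, df, ms, f_stat in rows:
        ms_text = "" if ms is None else f"{ms:.3f}"
        f_text = "" if f_stat is None else f"{f_stat:.2f}"
        lines.append(f"{source:<12}{ss:>8.0f}{df:>6d}{ms_text:>10}{f_text:>8}")
    return "\n".join(lines)

rows = anova_table(sstr=490, sse=308, k=3, n_total=15)
print(format_rows(rows))
```

The total row is not computed independently: SST is the sum of SSTR and SSE, and its degrees of freedom are the sum of the treatment and error degrees of freedom.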

Using Excel's Anova: Single Factor Tool
Step 1 Select the Tools pull-down menu
Step 2 Choose the Data Analysis option
Step 3 Choose Anova: Single Factor from the list of Analysis Tools
… continued

Using Excel's Anova: Single Factor Tool
Step 4 When the Anova: Single Factor dialog box appears:
Enter B1:D6 in the Input Range box
Select Grouped By Columns
Select Labels in First Row
Enter .05 in the Alpha box
Select Output Range
Enter A8 (your choice) in the Output Range box
Click OK

Using Excel's Anova: Single Factor Tool
[Figure: value worksheet, top and bottom portions.]

Using Excel's Anova: Single Factor Tool
Using the p-Value
The value worksheet shows that the p-value is .00331.
The rejection rule is "Reject $H_0$ if p-value < .05".
Thus, we reject $H_0$ because the p-value = .00331 < $\alpha$ = .05.
We conclude that the mean number of hours worked per week by the managers differs among the three plants.
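The p-value reported by Excel can be cross-checked without a spreadsheet in this particular case. The sketch below assumes, as a property of the F distribution that the slides do not state, that with exactly 2 numerator degrees of freedom the upper-tail probability is $P(F > f) = (d_2 / (d_2 + 2f))^{d_2/2}$. The function name is hypothetical, and the shortcut is valid only for three treatments.

```python
def f_p_value_two_numerator_df(f_stat, denom_df):
    """Upper-tail p-value of F with 2 numerator degrees of freedom.

    Closed form valid only when the numerator degrees of freedom equal 2;
    other designs need a statistics library or an F table.
    """
    return (denom_df / (denom_df + 2 * f_stat)) ** (denom_df / 2)

# Reed Manufacturing: MSTR = 245, MSE = 308 / 12, error d.f. = 12.
f_stat = 245 / (308 / 12)
p_value = f_p_value_two_numerator_df(f_stat, denom_df=12)
print(f"F = {f_stat:.2f}, p-value = {p_value:.5f}")
print("reject H0" if p_value < 0.05 else "do not reject H0")
```

The result agrees with the worksheet value of .00331 to five decimal places, so both the test-statistic rule and the p-value rule lead to rejecting $H_0$.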