Probability & Statistical Inference Lecture 8 MSc in Computing (Data Analytics)

1 Probability & Statistical Inference Lecture 8 MSc in Computing (Data Analytics)

2 Introduction
• In the previous lecture we were concerned with the analysis of data where we compared the means of two samples.
• Frequently data contain more than two samples; for example, an experiment may compare several treatments.
• In this lecture we introduce a statistical analysis that allows us to compare the means of more than two samples. The method is called 'Analysis of Variance', or ANOVA for short.

3 Total Sum of Squares
Data set: 14, 12, 10, 6, 4, 2
Group A: 6, 4, 2
Group B: 14, 12, 10
Overall mean: 8
Total Sum of Squares: $SS_T = (14-8)^2 + (12-8)^2 + (10-8)^2 + (6-8)^2 + (4-8)^2 + (2-8)^2 = 112$

4 Between Group Variation
• Sum of Squares of the Model: $SS_M = n_a(\mu - \mu_a)^2 + n_b(\mu - \mu_b)^2 = 3(8-4)^2 + 3(8-12)^2 = 96$

5 Within Group Variation
• Sum of Squares of the Error: $SS_E = (14-12)^2 + (12-12)^2 + (10-12)^2 + (6-4)^2 + (4-4)^2 + (2-4)^2 = 16$
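The three sums of squares on the preceding slides can be checked numerically. The following is a minimal sketch using only the toy data from the slides; the variable names are illustrative choices, not part of the lecture.

```python
# Toy data from the slides: two groups of three observations each.
groups = {"A": [6, 4, 2], "B": [14, 12, 10]}


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


all_values = [x for g in groups.values() for x in g]
grand_mean = mean(all_values)

# Total variation: every observation against the overall mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

# Between-group variation: each group mean against the overall mean,
# weighted by the group size.
ss_model = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())

# Within-group variation: every observation against its own group mean.
ss_error = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)

print(grand_mean, ss_total, ss_model, ss_error)
```

For this data set the total variation splits exactly into the model and error parts, which is the partition that ANOVA relies on.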

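6 Structure of the Data

| Group | Observations | Total | Mean |
|-------|--------------|-------|------|
| 1 | $x_{11}, x_{12}, \dots, x_{1n}$ | $x_{1\cdot}$ | $\bar{x}_{1\cdot}$ |
| 2 | $x_{21}, x_{22}, \dots, x_{2n}$ | $x_{2\cdot}$ | $\bar{x}_{2\cdot}$ |
| ... | ... | ... | ... |
| a | $x_{a1}, x_{a2}, \dots, x_{an}$ | $x_{a\cdot}$ | $\bar{x}_{a\cdot}$ |
| Total | | $x_{\cdot\cdot}$ | $\bar{x}_{\cdot\cdot}$ |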

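7 ANOVA Table

| Source | Degrees of Freedom | Sum of Squares | Mean Square | F-Stat |
|--------|--------------------|----------------|-------------|--------|
| Model | $a - 1$ | $SS_M$ | $SS_M/(a-1)$ | $MS_M/MS_E$ |
| Error | $n - a$ | $SS_E$ | $SS_E/(n-a)$ | |
| Total | $n - 1$ | $SS_T$ | $SS_T/(n-1)$ | |

Where: n is the sample size and a is the number of groups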

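8 ANOVA Table – Original Example

| Source | Degrees of Freedom | Sum of Squares | Mean Square | F-Stat |
|--------|--------------------|----------------|-------------|--------|
| Model | 2 - 1 = 1 | 96 | 96 | 24 |
| Error | 6 - 2 = 4 | 16 | 4 | |
| Total | 6 - 1 = 5 | 112 | | |

Where: n = 6 is the sample size and a = 2 is the number of groups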
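The entries of this table follow mechanically from the formulas on the previous slide. As a rough sketch, assuming a hypothetical helper named `one_way_anova` (not something the lecture defines), the degrees of freedom, mean squares, and F statistic can be computed with the standard library alone:

```python
def one_way_anova(groups):
    """Build the one-way ANOVA table entries for a list of samples.

    Returns a dict with (df, SS, MS) for the model and error rows,
    (df, SS) for the total row, and the F statistic.
    """
    values = [x for g in groups for x in g]
    n, a = len(values), len(groups)  # sample size and number of groups
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]

    ss_model = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_model, df_error = a - 1, n - a
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error

    return {
        "model": (df_model, ss_model, ms_model),
        "error": (df_error, ss_error, ms_error),
        "total": (n - 1, ss_model + ss_error),
        "F": ms_model / ms_error,
    }


# Groups B and A from the original example.
table = one_way_anova([[14, 12, 10], [6, 4, 2]])
for row, entries in table.items():
    print(row, entries)
```

The sketch stops at the F statistic; turning it into a p-value requires the F distribution with (a - 1, n - a) degrees of freedom, which is left to a statistics package or a table.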

9 Model Assumptions
• Independence of observations within and between samples
• Normality of the sampling distribution
• Equal variance across groups (also called the homoscedasticity assumption)

10 The ANOVA Equation
• We can describe the observations in the above table using the following equation:
Where: n is the sample size and a is the number of groups
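The equation on this slide is not preserved in the transcript. A standard way to write the one-way ANOVA model, offered here as an assumption rather than as the slide's exact content, uses $x_{ij}$ for observation $j$ in group $i$, $\mu$ for the overall mean, $\tau_i$ for the effect of group $i$, and $\varepsilon_{ij}$ for a random error term:

```latex
x_{ij} = \mu + \tau_i + \varepsilon_{ij},
\qquad i = 1, \dots, a, \quad j = 1, \dots, n_i,
\qquad \varepsilon_{ij} \sim N(0, \sigma^2) \ \text{independently}
```

Here $n_i$ is the number of observations in group $i$. The common variance $\sigma^2$ and the normal, independent errors correspond to the assumptions listed on the Model Assumptions slide.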

11 ANOVA Hypotheses
We wish to test the hypotheses:
The analysis of variance partitions the total variability into two parts.
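The hypotheses themselves are missing from the transcript. In the usual formulation of one-way ANOVA, stated here as an assumption and written in terms of the group means $\mu_1, \dots, \mu_a$, they and the partition of variability read:

```latex
H_0 : \mu_1 = \mu_2 = \dots = \mu_a
\qquad \text{versus} \qquad
H_1 : \mu_i \neq \mu_j \ \text{for at least one pair } (i, j)

SS_T = SS_M + SS_E
```

In the earlier two-group example the partition is $112 = 96 + 16$.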

12 Example

13 Graphical Display of Data Figure 13-1 (a) Box plots of hardwood concentration data. (b) Display of the model in Equation 13-1 for the completely randomized single-factor experiment

14 Example
• We can use ANOVA to test the hypothesis that different hardwood concentrations do not affect the mean tensile strength of the paper. The hypotheses are:
• The ANOVA table is below:

15 Example
• The p-value is less than 0.05, therefore $H_0$ can be rejected and we can conclude that the mean tensile strength of the paper differs for at least one of the hardwood concentrations.

16 Demo

17 Confidence Interval on the Mean
For 20% hardwood, the resulting confidence interval on the mean is
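The numerical interval is not preserved in the transcript. The general form commonly used for a $100(1-\alpha)\%$ confidence interval on a single treatment mean is sketched below, assuming $n_i$ observations in treatment $i$, a total sample size $N$ (the n of the ANOVA table), and $a$ treatments:

```latex
\bar{x}_{i\cdot} \;\pm\; t_{\alpha/2,\,N-a}\,\sqrt{\frac{MS_E}{n_i}}
```

The error mean square $MS_E$ from the ANOVA table replaces the single-sample variance, which is why the t quantile uses the error degrees of freedom $N - a$.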

18 Confidence Interval on the Difference of Two Treatments
For the hardwood concentration example,
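The worked numbers for this slide are also missing. As a hedged sketch of the general form, assuming $n_i$ and $n_j$ observations in treatments $i$ and $j$, total sample size $N$, and $a$ treatments, a $100(1-\alpha)\%$ confidence interval on the difference of two treatment means is usually written as:

```latex
\left(\bar{x}_{i\cdot} - \bar{x}_{j\cdot}\right)
\;\pm\; t_{\alpha/2,\,N-a}\,
\sqrt{MS_E\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}
```

When every treatment has the same number of observations $n$, the square root simplifies to $\sqrt{2\,MS_E/n}$. If the interval excludes zero, the two treatment means differ at level $\alpha$.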

19 An Unbalanced Experiment

20 Multiple Comparisons Following the ANOVA
• The least significant difference (LSD) is
If the sample sizes are different in each treatment:
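The LSD formulas are not preserved in the transcript. In the usual statement of Fisher's LSD method (an assumption here, not the slide's exact content), two treatment means differ significantly when their absolute difference exceeds $t_{\alpha/2,\,N-a}\sqrt{2\,MS_E/n}$ for equal group sizes $n$, or $t_{\alpha/2,\,N-a}\sqrt{MS_E(1/n_i + 1/n_j)}$ for unequal sizes. The sketch below applies this to the two-group example from the earlier slides; the function name `lsd` is hypothetical, and the critical value 2.776 for $t_{0.025,\,4}$ is taken from a standard t table.

```python
import math


def lsd(t_crit, ms_error, n_i, n_j):
    """Least significant difference for comparing two treatment means.

    Uses the unequal-sample-size form, which reduces to
    t * sqrt(2 * MS_E / n) when n_i == n_j == n.
    """
    return t_crit * math.sqrt(ms_error * (1.0 / n_i + 1.0 / n_j))


# Assumed critical value t_{0.025, 4} from a standard t table
# (alpha = 0.05, error degrees of freedom N - a = 6 - 2 = 4).
T_CRIT = 2.776

# MS_E = 4 and group sizes of 3 come from the original example's ANOVA table.
threshold = lsd(T_CRIT, ms_error=4.0, n_i=3, n_j=3)

# Group means from the original example: 12 (Group B) and 4 (Group A).
difference = abs(12 - 4)

print(f"LSD = {threshold:.3f}, |difference| = {difference}")
print("significant" if difference > threshold else "not significant")
```

With only two groups this comparison agrees with the F test; the method becomes useful when the ANOVA rejects $H_0$ for three or more treatments and each pair of means is compared in turn.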

21 Example: Multi-comparison Test

22

