
Analysis of Variance.


1 Analysis of Variance

2 In practice it is often necessary to compare a larger number of independent random samples in terms of their level. For m > 2 we are interested in the hypothesis $H_0: \mu_1 = \mu_2 = \dots = \mu_m$ against the alternative that $\mu_i$ differs from the other means for at least one $i$ ($i = 1, 2, \dots, m$). Here $\mu_i$, $i = 1, 2, \dots, m$, are the mean values of normally distributed populations with equal variances $\sigma^2$, i.e. $N(\mu_i, \sigma^2)$. This hypothesis is verified with an important statistical method called analysis of variance, abbreviated ANOVA (or AV).

3 AV is frequently used in the evaluation of biological experiments
In practice, AV is used to examine the impact of one or more factors (treatments) on a statistical attribute. Factors are labeled A, B, …. In AV they are regarded as qualitative attributes with different variants, called levels of the factor. The result is a quantitative statistical attribute denoted Y. AV is frequently used in the evaluation of biological experiments. The simplest case is AV with a single factor, called one-factor analysis of variance.

4 A level of the factor refers to:
a certain amount of a quantitative factor, e.g. the amount of pure nutrients in manure, or different income groups of households;
a certain kind of a qualitative factor, e.g. different varieties of the same crop, or methods of placing products in stores.
AV is a generalization of Student's t-test for independent samples. AV also examines the impact of qualitative factors on a quantitative attribute, i.e. it analyzes relationships between attributes.

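5 Scheme of a single-factor experiment (balanced design)

| Level of factor A | Repetition 1 | 2 | … | j | … | n | Line sum $Y_{i.}$ | Line average $\bar y_{i.}$ |
|---|---|---|---|---|---|---|---|---|
| 1 | $y_{11}$ | $y_{12}$ | … | $y_{1j}$ | … | $y_{1n}$ | $Y_{1.}$ | $\bar y_{1.}$ |
| 2 | $y_{21}$ | $y_{22}$ | … | $y_{2j}$ | … | $y_{2n}$ | $Y_{2.}$ | $\bar y_{2.}$ |
| … | … | … | … | … | … | … | … | … |
| i | $y_{i1}$ | $y_{i2}$ | … | $y_{ij}$ | … | $y_{in}$ | $Y_{i.}$ | $\bar y_{i.}$ |
| m | $y_{m1}$ | $y_{m2}$ | … | $y_{mj}$ | … | $y_{mn}$ | $Y_{m.}$ | $\bar y_{m.}$ |
| Total | | | | | | | Total sum $Y_{..}$ | Overall average $\bar y_{..}$ |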

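6 Line sum: $Y_{i.} = \sum_{j=1}^{n} y_{ij}$. Total sum: $Y_{..} = \sum_{i=1}^{m} \sum_{j=1}^{n} y_{ij} = \sum_{i=1}^{m} Y_{i.}$. Line average: $\bar y_{i.} = Y_{i.}/n$. Overall average: $\bar y_{..} = Y_{..}/(mn)$.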
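
A minimal Python sketch of these four quantities for the balanced layout. It assumes the data are stored as a list of rows, with one row per level of factor A and one column per repetition. The function name `line_sums_and_averages` and the numbers are hypothetical and serve only as an illustration.

```python
def line_sums_and_averages(data):
    """Line sums Y_i., line averages, total sum Y.. and overall average (balanced layout).

    data[i][j] holds y_ij: row i is a level of factor A, column j is a repetition.
    """
    m = len(data)
    n = len(data[0])
    if any(len(row) != n for row in data):
        raise ValueError("balanced design expected: every level needs n repetitions")
    line_sums = [sum(row) for row in data]      # Y_i. = sum over j of y_ij
    line_avgs = [s / n for s in line_sums]      # ybar_i. = Y_i. / n
    total_sum = sum(line_sums)                  # Y.. = sum over i of Y_i.
    overall_avg = total_sum / (m * n)           # ybar.. = Y.. / (m n)
    return line_sums, line_avgs, total_sum, overall_avg


# Hypothetical data: m = 3 levels, n = 3 repetitions
data = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
print(line_sums_and_averages(data))
# ([15, 21, 27], [5.0, 7.0, 9.0], 63, 7.0)
```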

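7 Model for the resulting observed value:
$y_{ij} = \mu + \alpha_i + e_{ij}$, where $i = 1, 2, \dots, m$ and $j = 1, 2, \dots, n$. Here $\mu$ is the expected value common to all levels of the factor and all observed values, and $\alpha_i$ is the effect of the i-th level of factor A. The term $e_{ij}$ is the random error: every measurement is affected by error, i.e. by the impact of random factors.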

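8 Equivalently, $y_{ij} = \mu_i + e_{ij}$ with $\mu_i = \mu + \alpha_i$. We can then formulate the null hypothesis $H_0: \alpha_1 = \alpha_2 = \dots = \alpha_i = \dots = \alpha_m = 0$, i.e. the effects of all levels of factor A are zero (insignificant). The alternative hypothesis is $H_1: \alpha_i \neq 0$ for at least one $i$ ($i = 1, 2, \dots, m$), i.e. the effect $\alpha_i$ of at least one level of the factor is significant (significantly different from zero).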

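9 The parameters are estimated by sample characteristics: $\hat\mu = \bar y_{..}$, $\hat\alpha_i = \bar y_{i.} - \bar y_{..}$ and $\hat e_{ij} = y_{ij} - \bar y_{i.}$.
The observed value can therefore be rewritten as $y_{ij} = \bar y_{..} + (\bar y_{i.} - \bar y_{..}) + (y_{ij} - \bar y_{i.})$.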
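
The estimates and the identity above can be checked numerically. The following sketch assumes a balanced layout stored as a list of rows. The function name `estimate_parameters` and the data are hypothetical.

```python
def estimate_parameters(data):
    """Estimates mu_hat, alpha_hat_i and residuals e_hat_ij for a balanced one-factor layout."""
    m, n = len(data), len(data[0])
    mu_hat = sum(sum(row) for row in data) / (m * n)    # mu_hat = ybar..
    row_means = [sum(row) / n for row in data]           # ybar_i.
    alpha_hat = [yb - mu_hat for yb in row_means]        # alpha_hat_i = ybar_i. - ybar..
    resid = [[y - row_means[i] for y in row]             # e_hat_ij = y_ij - ybar_i.
             for i, row in enumerate(data)]
    return mu_hat, alpha_hat, resid


data = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]   # hypothetical observations
mu_hat, alpha_hat, resid = estimate_parameters(data)
print(mu_hat, alpha_hat)   # 7.0 [-2.0, 0.0, 2.0]

# Every observation is recovered as ybar.. + (ybar_i. - ybar..) + (y_ij - ybar_i.)
for i, row in enumerate(data):
    for j, y in enumerate(row):
        assert abs(mu_hat + alpha_hat[i] + resid[i][j] - y) < 1e-12
```

In a balanced design the estimated effects $\hat\alpha_i$ sum to zero, which the example reflects.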

10 Comparison of two experiments with three levels of the factor
[Figure: two experiments, each showing factor levels 1, 2 and 3]

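11 Principle of ANOVA. The basic principle of the analysis of variance is the decomposition of the total variability of the investigated attribute: $S_c = S_1 + S_r$. The terms are:
$S_c$, the total variability;
$S_1$, the variability between levels of the factor caused by the action of factor A ("variability between groups");
$S_r$, the random (residual) variability ("variability within groups").
For the balanced design: $S_c = \sum_{i=1}^{m}\sum_{j=1}^{n} (y_{ij} - \bar y_{..})^2$, $S_1 = n \sum_{i=1}^{m} (\bar y_{i.} - \bar y_{..})^2$, and $S_r = \sum_{i=1}^{m}\sum_{j=1}^{n} (y_{ij} - \bar y_{i.})^2$.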
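
A short Python sketch that computes the three sums of squares and confirms $S_c = S_1 + S_r$. It assumes the data are given as a list of groups, one per level of the factor. The function name `sums_of_squares` and the data are hypothetical. Weighting each squared mean deviation by the group size (here always n) means the same function also applies when the levels have different numbers of repetitions.

```python
def sums_of_squares(groups):
    """Return (S_c, S_1, S_r) for a one-factor layout given as a list of groups (levels)."""
    all_y = [y for g in groups for y in g]
    grand_mean = sum(all_y) / len(all_y)                    # ybar..
    means = [sum(g) / len(g) for g in groups]               # ybar_i.
    s_c = sum((y - grand_mean) ** 2 for y in all_y)         # total variability
    s_1 = sum(len(g) * (mb - grand_mean) ** 2               # between groups
              for g, mb in zip(groups, means))
    s_r = sum((y - mb) ** 2                                 # within groups (residual)
              for g, mb in zip(groups, means) for y in g)
    return s_c, s_1, s_r


groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]   # hypothetical data, m = 3, n = 3
s_c, s_1, s_r = sums_of_squares(groups)
print(s_c, s_1, s_r)   # 30.0 24.0 6.0
```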

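12 One-factor ANOVA table (balanced design). Column 3 is obtained by dividing column 1 by column 2.

| Variability | 1 Sum of squares (SS) | 2 Degrees of freedom | 3 Mean square (MS) (1/2) | 4 F statistic |
|---|---|---|---|---|
| Between groups | $S_1$ | $m - 1$ | $s_1^2$ | $s_1^2 / s_r^2$ |
| Within groups | $S_r$ | $mn - m$ | $s_r^2$ | |
| Total | $S_c$ | $N - 1 = mn - 1$ | | |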

13 The test statistic for one-factor ANOVA can be written as $F = \frac{s_1^2}{s_r^2} = \frac{S_1/(m-1)}{S_r/(mn-m)}$.
The F value is compared with the appropriate table value of the F-distribution, $F_\alpha$ with $(m-1)$ and $(mn - m)$ degrees of freedom.
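
The complete ANOVA table for the balanced case can be sketched in a few lines of Python. Each mean square is the sum of squares divided by its degrees of freedom, and F is the ratio of the two mean squares. The function name `one_way_anova`, the dictionary keys, and the data are hypothetical.

```python
def one_way_anova(data):
    """One-factor ANOVA table for a balanced layout (m levels x n repetitions)."""
    m, n = len(data), len(data[0])
    grand_mean = sum(map(sum, data)) / (m * n)
    means = [sum(row) / n for row in data]
    s_1 = n * sum((mb - grand_mean) ** 2 for mb in means)                  # between groups
    s_r = sum((y - mb) ** 2 for row, mb in zip(data, means) for y in row)  # within groups
    df_1, df_r = m - 1, m * n - m
    ms_1, ms_r = s_1 / df_1, s_r / df_r   # mean squares s_1^2, s_r^2 (column 1 / column 2)
    return {"S1": s_1, "Sr": s_r, "Sc": s_1 + s_r,
            "df1": df_1, "dfr": df_r, "s1^2": ms_1, "sr^2": ms_r,
            "F": ms_1 / ms_r}


table = one_way_anova([[4, 5, 6], [6, 7, 8], [8, 9, 10]])   # hypothetical data
print(table["F"], table["df1"], table["dfr"])   # 12.0 2 6
```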

14 Decision about the test result:
If $F_{calc} \geq F_\alpha(m-1, N-m)$, we reject $H_0$. In that case the effect of at least one level of the factor is significant, i.e. the average level of the indicator for that level differs significantly from the others. At least one effect $\alpha_i$ is then significantly different from zero. If $F_{calc} < F_\alpha$, we do not reject $H_0$.
[Figure: F-distribution with the acceptance region of $H_0$ below $F_\alpha$ and the rejection region of $H_0$ above it]
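
The decision rule can be sketched as follows. The sketch assumes the critical value $F_\alpha$ is looked up in an F-distribution table; for $\alpha = 0.05$ with 2 and 6 degrees of freedom, the tabulated value is about 5.14. If SciPy is available, `scipy.stats.f.ppf(1 - alpha, dfn, dfd)` returns the same quantile, but this sketch does not depend on it. The function name `f_test_decision` and the F values are hypothetical.

```python
def f_test_decision(f_calc, f_crit):
    """Decision rule of one-factor ANOVA: reject H0 when F_calc >= F_alpha(m-1, N-m)."""
    if f_calc >= f_crit:
        return "reject H0: at least one effect alpha_i differs significantly from zero"
    return "do not reject H0"


F_CRIT_005_2_6 = 5.14   # table value of F_0.05 with (2, 6) degrees of freedom
print(f_test_decision(12.0, F_CRIT_005_2_6))   # prints the rejection message
print(f_test_decision(3.0, F_CRIT_005_2_6))    # do not reject H0
```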

15 If the null hypothesis is rejected:
we have found only that the effect of the factor on the examined attribute is significant. It is also necessary to identify which levels of the factor differ significantly. For this purpose, tests of contrasts (multiple comparison tests) are used, e.g. the Duncan test, the Scheffé test, the Tukey test and others.

16 Conditions for using AV:
The samples come from normal distributions. Violation of this assumption has a significant effect on the results of AV.
The random errors $e_{ij}$ are statistically independent.
The residual variances are identical: $\sigma_1^2 = \sigma_2^2 = \dots = \sigma_m^2 = \sigma^2$, i.e. $D(e_{ij}) = \sigma^2$ ($D$ denotes variance) for all $i = 1, 2, \dots, m$, $j = 1, 2, \dots, n$. This assumption is more serious and can be verified by the Cochran or Bartlett test.
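
For equal group sizes, Cochran's test statistic is $C = \max_i s_i^2 / \sum_i s_i^2$, where $s_i^2$ is the sample variance of the i-th level. It is compared with a tabulated critical value for m groups and n - 1 degrees of freedom, and a large C indicates that one group has a much larger variance than the others. The following is a minimal Python sketch of the statistic only. The function names and data are hypothetical. Bartlett's test is available in SciPy as `scipy.stats.bartlett` but is not used here.

```python
def sample_variance(values):
    """Unbiased sample variance s^2 with n - 1 in the denominator."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def cochran_c(groups):
    """Cochran's C = largest group variance / sum of group variances (equal group sizes)."""
    if len({len(g) for g in groups}) != 1:
        raise ValueError("Cochran's C assumes the same number of repetitions in every group")
    variances = [sample_variance(g) for g in groups]
    return max(variances) / sum(variances)


groups = [[4, 5, 6], [6, 7, 8], [7, 9, 11]]    # hypothetical data
print(round(cochran_c(groups), 4))   # 0.6667
```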

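17 Scheme of a single-factor experiment (unbalanced design, different numbers of repetitions)

| Level of factor A | Repetition 1 | 2 | … | j | … | $n_i$ | Line sum $Y_{i.}$ | Line average $\bar y_{i.}$ |
|---|---|---|---|---|---|---|---|---|
| 1 | $y_{11}$ | $y_{12}$ | … | $y_{1j}$ | … | $y_{1n_1}$ | $Y_{1.}$ | $\bar y_{1.}$ |
| 2 | $y_{21}$ | $y_{22}$ | … | $y_{2j}$ | … | $y_{2n_2}$ | $Y_{2.}$ | $\bar y_{2.}$ |
| … | … | … | … | … | … | … | … | … |
| i | $y_{i1}$ | $y_{i2}$ | … | $y_{ij}$ | … | $y_{in_i}$ | $Y_{i.}$ | $\bar y_{i.}$ |
| m | $y_{m1}$ | $y_{m2}$ | … | $y_{mj}$ | … | $y_{mn_m}$ | $Y_{m.}$ | $\bar y_{m.}$ |
| Total | | | | | | | $Y_{..}$ | Overall average $\bar y_{..}$ |

where $N = \sum_{i=1}^{m} n_i$, $\bar y_{i.} = Y_{i.}/n_i$ and $\bar y_{..} = Y_{..}/N$.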

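18 One-factor ANOVA table (unbalanced design). Column 3 is column 1 divided by column 2. Here $S_1 = \sum_{i=1}^{m} n_i (\bar y_{i.} - \bar y_{..})^2$.

| Variability | 1 Sum of squares (SS) | 2 Degrees of freedom | 3 Mean square (MS) (1/2) | 4 F statistic |
|---|---|---|---|---|
| Between groups | $S_1$ | $m - 1$ | $s_1^2$ | $s_1^2 / s_r^2$ |
| Within groups | $S_r$ | $N - m$ | $s_r^2$ | |
| Total | $S_c$ | $N - 1$ | | |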
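
The unbalanced table differs from the balanced one only in the weights $n_i$ and in the degrees of freedom $m - 1$ and $N - m$. A minimal Python sketch follows. The function name `one_way_anova_unbalanced` and the data are hypothetical.

```python
def one_way_anova_unbalanced(groups):
    """One-factor ANOVA with n_i repetitions at level i; N = sum of n_i."""
    m = len(groups)
    N = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / N                 # ybar.. = Y.. / N
    means = [sum(g) / len(g) for g in groups]                    # ybar_i. = Y_i. / n_i
    s_1 = sum(len(g) * (mb - grand_mean) ** 2                    # weighted by n_i
              for g, mb in zip(groups, means))
    s_r = sum((y - mb) ** 2 for g, mb in zip(groups, means) for y in g)
    df_1, df_r = m - 1, N - m
    ms_1, ms_r = s_1 / df_1, s_r / df_r
    return {"S1": s_1, "Sr": s_r, "Sc": s_1 + s_r, "df1": df_1, "dfr": df_r,
            "s1^2": ms_1, "sr^2": ms_r, "F": ms_1 / ms_r}


groups = [[4, 5, 6], [6, 7, 8, 7], [8, 9, 10, 9, 9]]   # hypothetical, n_i = 3, 4, 5
res = one_way_anova_unbalanced(groups)
print(res["df1"], res["dfr"], round(res["F"], 2))   # 2 9 23.0
```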

19 Two-factor analysis of variance with one observation in each subclass (TAV)
Consider the effect of factor A, which we investigate at m levels, i = 1, 2, …, m, and the effect of factor B, which is observed at n levels, j = 1, 2, …, n. For every combination of the i-th level of factor A and the j-th level of factor B we have only one observation (repetition) $y_{ij}$. We are therefore verifying two null hypotheses.

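20 Scheme for a two-factor experiment with one observation in each subclass (TAV)

| m levels of factor A \ n levels of factor B | 1 | 2 | … | j | … | n | Row sum $Y_{i.}$ | Row average $\bar y_{i.}$ |
|---|---|---|---|---|---|---|---|---|
| 1 | $y_{11}$ | $y_{12}$ | … | $y_{1j}$ | … | $y_{1n}$ | $Y_{1.}$ | $\bar y_{1.}$ |
| 2 | $y_{21}$ | $y_{22}$ | … | $y_{2j}$ | … | $y_{2n}$ | $Y_{2.}$ | $\bar y_{2.}$ |
| … | … | … | … | … | … | … | … | … |
| i | $y_{i1}$ | $y_{i2}$ | … | $y_{ij}$ | … | $y_{in}$ | $Y_{i.}$ | $\bar y_{i.}$ |
| m | $y_{m1}$ | $y_{m2}$ | … | $y_{mj}$ | … | $y_{mn}$ | $Y_{m.}$ | $\bar y_{m.}$ |
| Column sum $Y_{.j}$ | $Y_{.1}$ | $Y_{.2}$ | … | $Y_{.j}$ | … | $Y_{.n}$ | $Y_{..}$ | |
| Column average $\bar y_{.j}$ | $\bar y_{.1}$ | $\bar y_{.2}$ | … | $\bar y_{.j}$ | … | $\bar y_{.n}$ | | Overall average $\bar y_{..}$ |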

21 We are verifying the validity of two null hypotheses
The model for the examined attribute can be written as $y_{ij} = \mu + \alpha_i + \beta_j + e_{ij}$.
Hypothesis for factor A: $H_{01}: \alpha_1 = \alpha_2 = \dots = \alpha_i = \dots = \alpha_m = 0$, i.e. all effects of the levels of factor A are equal to zero (insignificant). The alternative hypothesis is $H_{11}: \alpha_i \neq 0$ for at least one $i$ ($i = 1, 2, \dots, m$): the effect $\alpha_i$ of at least one level of factor A is significant (significantly different from zero).

22 Hypothesis for factor B: $H_{02}: \beta_1 = \beta_2 = \dots = \beta_j = \dots = \beta_n = 0$
i.e. all effects of the levels of factor B are equal to zero (insignificant). The alternative hypothesis is $H_{12}: \beta_j \neq 0$ for at least one $j$ ($j = 1, 2, \dots, n$): the effect $\beta_j$ of at least one level of factor B is significant (significantly different from zero). doc. Ing. Zlata Sojková, CSc.

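23 Two-factor ANOVA table (TAV). Column 3 is column 1 divided by column 2.

| Variability | 1 Sum of squares (SS) | 2 Degrees of freedom | 3 Mean square (MS) (1/2) | 4 F statistic |
|---|---|---|---|---|
| Between rows (factor A) | $S_1$ | $m - 1$ | $s_1^2$ | $s_1^2 / s_r^2$ |
| Between columns (factor B) | $S_2$ | $n - 1$ | $s_2^2$ | $s_2^2 / s_r^2$ |
| Residual variability | $S_r$ | $(m - 1)(n - 1)$ | $s_r^2$ | |
| Total variability | $S_c$ | $mn - 1$ | | |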

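24 Decomposition of the total variability: $S_c = S_1 + S_2 + S_r$
The terms are:
$S_c = \sum_{i}\sum_{j} (y_{ij} - \bar y_{..})^2$, the total variability;
$S_1 = n \sum_{i=1}^{m} (\bar y_{i.} - \bar y_{..})^2$, the variability between rows (effect of factor A);
$S_2 = m \sum_{j=1}^{n} (\bar y_{.j} - \bar y_{..})^2$, the variability between columns (effect of factor B);
$S_r = \sum_{i}\sum_{j} (y_{ij} - \bar y_{i.} - \bar y_{.j} + \bar y_{..})^2$, the residual variability.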
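
The two-factor decomposition and both F statistics can be sketched in Python for a layout with one observation per cell. Rows correspond to levels of factor A and columns to levels of factor B. The function name `two_way_anova_no_replication`, the dictionary keys, and the data are hypothetical.

```python
def two_way_anova_no_replication(data):
    """Two-factor ANOVA, one observation per cell: data[i][j] = y_ij (rows = A, columns = B)."""
    m, n = len(data), len(data[0])
    grand = sum(map(sum, data)) / (m * n)                                   # ybar..
    row_means = [sum(row) / n for row in data]                              # ybar_i.
    col_means = [sum(data[i][j] for i in range(m)) / m for j in range(n)]   # ybar_.j
    s_1 = n * sum((r - grand) ** 2 for r in row_means)                      # between rows (A)
    s_2 = m * sum((c - grand) ** 2 for c in col_means)                      # between columns (B)
    s_r = sum((data[i][j] - row_means[i] - col_means[j] + grand) ** 2       # residual
              for i in range(m) for j in range(n))
    s_c = sum((y - grand) ** 2 for row in data for y in row)                # total
    df_1, df_2, df_r = m - 1, n - 1, (m - 1) * (n - 1)
    ms_r = s_r / df_r                                                       # s_r^2
    return {"S1": s_1, "S2": s_2, "Sr": s_r, "Sc": s_c,
            "F_A": (s_1 / df_1) / ms_r, "F_B": (s_2 / df_2) / ms_r}


data = [[2, 4, 6], [4, 5, 9], [6, 9, 9]]   # hypothetical: 3 levels of A, 3 levels of B
res = two_way_anova_no_replication(data)
print(res["S1"], res["S2"], res["Sr"], res["Sc"])   # 24.0 24.0 4.0 52.0
```

Each F statistic is compared with the table value $F_\alpha$ for its own numerator degrees of freedom ($m - 1$ or $n - 1$) and $(m-1)(n-1)$ denominator degrees of freedom.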

25 Investigating the relationships between statistical attributes
Investigating the relationship between qualitative attributes, e.g. A and B, is called measurement of association. Relationships between quantitative attributes are investigated by regression and correlation analysis.

26 Investigating the association
The analysis is based on association (pivot) tables. To test for a significant relationship between qualitative attributes we use the χ² test of independence:
H0: the two attributes A and B are independent.
H1: attributes A and B are dependent.
Attribute A has m levels (variants); attribute B has k levels (variants).

27 Hypotheses formulation
Dependence of the attributes shows up as differences in frequencies. For example, we examine whether the package size is affected by the size of the family:
H0: the choice of package size does not depend on the number of family members.
H1: the choice of package size is affected by the size of the family.
The procedure compares empirical and theoretical frequencies. The theoretical frequencies are what the empirical frequencies would be if attributes A and B were independent.

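28 Simultaneous frequencies (frequencies of the second order) $(a_ib_j)$ and marginal frequencies $(a_i)$ and $(b_j)$

| Package size \ Size of the family | (b1) | (b2) | (b3) | Total |
|---|---|---|---|---|
| up to 100 g (a1) | (a1b1) | (a1b2) | (a1b3) | (a1) |
| … g (a2) | (a2b1) | (a2b2) | (a2b3) | (a2) |
| over 250 g (a3) | (a3b1) | (a3b2) | (a3b3) | (a3) |
| Total | (b1) | (b2) | (b3) | n (total count of respondents) |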

29 Determination of theoretical frequencies
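The calculation is based on the theorem on independence of random events: if events A and B are independent, then $P(AB) = P(A) \cdot P(B)$. Thus, if attributes A and B are independent, $P(a_ib_j) = P(a_i) \cdot P(b_j)$. Estimating the probabilities by relative frequencies gives $\frac{(a_ib_j)_o}{n} = \frac{(a_i)}{n} \cdot \frac{(b_j)}{n}$, so the theoretical frequencies are $(a_ib_j)_o = \frac{(a_i) \cdot (b_j)}{n}$.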

30 Calculation of theoretical frequencies: $(a_1b_1)_o = 70 \cdot 40 / 300 = 9.33$
[Table: theoretical frequencies by package size (a1: up to 100 g, a2, a3: over 250 g) and family size (b1, b2, b3), with totals and the total count of respondents n]
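
The theoretical frequencies follow directly from the marginal totals. In the following Python sketch, the cell counts are invented for illustration. Only the margins $(a_1) = 70$, $(b_1) = 40$ and $n = 300$ are chosen to reproduce the worked value 9.33; the other counts are not from the original data. The function name `theoretical_frequencies` is hypothetical.

```python
def theoretical_frequencies(observed):
    """Expected counts (a_i b_j)_o = (a_i)(b_j)/n under independence of A and B."""
    row_totals = [sum(row) for row in observed]         # marginal frequencies (a_i)
    col_totals = [sum(col) for col in zip(*observed)]   # marginal frequencies (b_j)
    n = sum(row_totals)                                 # total count of respondents
    return [[a * b / n for b in col_totals] for a in row_totals]


# Hypothetical counts: rows a1..a3 (package size), columns b1..b3 (family size)
observed = [[20, 30, 20],
            [15, 70, 45],
            [5, 40, 55]]
expected = theoretical_frequencies(observed)
print(round(expected[0][0], 2))   # 9.33  (= 70 * 40 / 300)
```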

31 Calculation of the test statistic and decision:
The test statistic is $\chi^2 = \sum_{i=1}^{m}\sum_{j=1}^{k} \frac{\left[(a_ib_j) - (a_ib_j)_o\right]^2}{(a_ib_j)_o}$. If $\chi^2_{calc} \geq \chi^2_\alpha$ for significance level α and $(m-1)(k-1)$ degrees of freedom, H0 is rejected, i.e. attributes A and B are dependent. In our case this means that the number of family members significantly affects the choice of package size. Further, we should measure the strength of the dependence.

