THE MULTINOMIAL DISTRIBUTION AND ELEMENTARY TESTS FOR CATEGORICAL DATA — Presentation transcript

1 THE MULTINOMIAL DISTRIBUTION AND ELEMENTARY TESTS FOR CATEGORICAL DATA It is useful to have a probability model for the number of observations falling into each of k mutually exclusive classes. Such a model is given by the multinomial random variable, for which it is assumed that: 1. A total of n independent trials is made. 2. At each trial an observation falls into exactly one of k mutually exclusive classes. 3. The probabilities of falling into the k classes are p1, p2, …, pk, where pi is the probability of falling into class i, i = 1, 2, …, k. These probabilities are constant for all trials, with p1 + p2 + … + pk = 1.

2 If k = 2, we have the Binomial distribution. Let us define: X1 to be the number of type 1 outcomes in the n trials, X2 to be the number of type 2 outcomes, …, Xk to be the number of type k outcomes. As there are n trials, X1 + X2 + … + Xk = n.

3 The joint probability function for these random variables can be shown to be P(X1 = x1, X2 = x2, …, Xk = xk) = [n! / (x1! x2! … xk!)] p1^x1 p2^x2 … pk^xk, where x1 + x2 + … + xk = n. For k = 2, the probability function reduces to the Binomial probability of x1 successes in n trials, each with probability of success p1.

4 EXAMPLE A simple example of multinomial trials is the tossing of a die n times. At each trial the outcome is one of the values 1, 2, 3, 4, 5 or 6. Here k = 6. If n = 10, the probability of 2 ones, 2 twos, 2 threes, no fours, 2 fives and 2 sixes is [10! / (2! 2! 2! 0! 2! 2!)] (1/6)^10 ≈ 0,0019. To test hypotheses concerning the pi, the null hypothesis for this example, H0: pi = 1/6 for i = 1, …, 6, states that the die is fair, versus H1: H0 is false, which, of course, means that the die is not fair.
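As a rough check of this calculation, the following sketch (assuming Python with SciPy is available) evaluates the multinomial probability for the die example both directly and with scipy.stats.multinomial:

```python
from math import factorial

from scipy.stats import multinomial

# Counts for the die example: 2 ones, 2 twos, 2 threes, 0 fours, 2 fives, 2 sixes
counts = [2, 2, 2, 0, 2, 2]
n = sum(counts)                      # 10 tosses
p = [1 / 6] * 6                      # fair-die probabilities

# Direct evaluation of n!/(x1!...xk!) * p1^x1 ... pk^xk
coef = factorial(n)
for x in counts:
    coef //= factorial(x)
prob_manual = coef * (1 / 6) ** n

# Same value from SciPy's multinomial distribution
prob_scipy = multinomial.pmf(counts, n=n, p=p)

print(prob_manual, prob_scipy)       # both about 0.0019 for this outcome
```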

5 The left-hand side can be thought of as the sum of the terms (Xi − n pi)² / (n pi), i = 1, 2, …, k, which will be used in testing H0: pi = pi0, i = 1, …, k, versus H1: pi ≠ pi0 for at least one i, where the pi0 are the hypothesized values of the pi.

6 In the special case of k = 2, there are two possible outcomes at each trial, which can be called success and failure. A test of H0: p1 = p10 is a test of the same null hypothesis as the usual test of a Binomial proportion. The following are observed and expected values for this situation:
           Success      Failure          Total
Expected   n p10        n(1 − p10)       n
Observed   X            n − X            n

7 For an α-level test, a rejection region for testing H0: p1 = p10 versus H1: p1 ≠ p10 is given by |Z| ≥ z(α/2), where Z = (X − n p10) / √(n p10 (1 − p10)). We know that Z is approximately N(0, 1) for large n. Hence Q1 = Z² has approximately a χ²(1) distribution. We have |Z| ≥ z(α/2) if and only if Q1 ≥ χ²α(1), so the chi-square test with one degree of freedom is equivalent to the two-sided Z test.
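The equivalence Q1 = Z² can be verified numerically. A small sketch with hypothetical binomial counts (n, x and p10 below are made-up illustration values), assuming NumPy and SciPy:

```python
import numpy as np
from scipy.stats import norm, chi2

# Hypothetical binomial data: x successes in n trials, H0: p10 = 0.3
n, x, p10 = 50, 21, 0.3

z = (x - n * p10) / np.sqrt(n * p10 * (1 - p10))          # usual Z statistic
q1 = (x - n * p10) ** 2 / (n * p10) + ((n - x) - n * (1 - p10)) ** 2 / (n * (1 - p10))

print(z ** 2, q1)                    # identical up to rounding
# The two-sided Z test and the chi-square(1) test give the same decision:
alpha = 0.05
print(abs(z) >= norm.ppf(1 - alpha / 2), q1 >= chi2.ppf(1 - alpha, df=1))
```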

8 GOODNESS-of-FIT TESTS Thus far all our statistical inferences have involved population parameters like means, variances and proportions. Now we make inferences about the entire population distribution. A sample is taken, and we want to test a null hypothesis of the general form H0: the sample is from a specified distribution. The alternative hypothesis is always of the form H1: the sample is not from the specified distribution. A test of H0 versus H1 is called a goodness-of-fit test. Two tests are used to evaluate goodness of fit: 1. The χ² test, which is based on an approximate χ² statistic. 2. The Kolmogorov–Smirnov (K-S) test. This is called a nonparametric test, because it uses a test statistic that makes no assumptions about the distribution. The χ² test is best for testing discrete distributions, and the K-S test is best for continuous distributions.

9 Goodness of Fit ?? A goodness-of-fit test attempts to determine if a conspicuous discrepancy exists between the observed cell frequencies and those expected under H0. A useful measure for the overall discrepancy is given by Σ (O − E)² / E, where O and E symbolize an observed frequency and the corresponding expected frequency. The discrepancy in each cell is measured by the squared difference between the observed and the expected frequencies divided by the expected frequency.

10 The statistic was originally proposed by Karl Pearson (1857–1936), who found its distribution for large n to be approximately a χ² distribution with degrees of freedom = k − 1. Due to this distribution, the statistic is denoted by χ² and is called Pearson's statistic for goodness of fit. Null hypothesis: H0: pi = pi0, i = 1, 2, …, k. H1: at least one pi is not equal to its specified value. Test statistic: Q = Σ (Oi − Ei)² / Ei = Σ (Xi − n pi0)² / (n pi0). Rejection region: Q ≥ χ²α(k − 1), the upper-α point of the χ² distribution with d.f. = (k − 1).
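A minimal sketch of Pearson's goodness-of-fit test, using hypothetical die-roll counts (the observed vector below is invented for illustration) and assuming SciPy:

```python
from scipy.stats import chisquare, chi2

# Hypothetical counts from 60 rolls of a die; H0: each face has probability 1/6
observed = [8, 9, 12, 11, 6, 14]
expected = [60 / 6] * 6

q, p_value = chisquare(f_obs=observed, f_exp=expected)
critical = chi2.ppf(0.95, df=len(observed) - 1)   # k - 1 = 5 degrees of freedom

print(q, p_value, critical)
# Reject H0 at the 5% level only if q exceeds the critical value
print(q > critical)
```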

11 The chi-square statistic, first proposed by Karl Pearson in 1900, begins with the Binomial case. Let X1 ~ BIN(n, p1), where 0 < p1 < 1. According to the CLT, Z = (X1 − n p1) / √(n p1 (1 − p1)) is approximately N(0, 1) for large n, particularly when n p1 ≥ 5 and n(1 − p1) ≥ 5. As you know, Q1 = Z² ≈ χ²(1). If we let X2 = n − X1 and p2 = 1 − p1, then, because (X1 − n p1)² = (X2 − n p2)², we have Q1 = (X1 − n p1)² / (n p1) + (X2 − n p2)² / (n p2).

12 Pearson then constructed an expression similar to Q1, which involves X1 and X2 = n − X1, that we denote by Q(k−1), involving X1, X2, …, X(k−1) and Xk = n − X1 − X2 − … − X(k−1). Hence Q(k−1) = Σ (Xi − n pi)² / (n pi), summed over i = 1, …, k, and Q(k−1) is approximately χ²(k − 1) for large n.

13 EXAMPLE We observe n = 85 values of a random variable X that is thought to have a Poisson distribution, obtaining:
x          0    1    2    3    4    5
Frequency  41   29   9    4    1    1
The sample average is the appropriate estimate of λ = E(X). It is given by λ̂ = (0·41 + 1·29 + 2·9 + 3·4 + 4·1 + 5·1)/85 = 68/85 = 0,8. The expected frequencies for the first three cells are n pi, i = 0, 1, 2:
85 p0 = 85 P(X=0) = 85 (0,449) = 38,2
85 p1 = 85 P(X=1) = 85 (0,360) = 30,6
85 p2 = 85 P(X=2) = 85 (0,144) = 12,2

14 The expected frequency for the cell {3, 4, 5} is 85 (0,047) = 4,0 ; WHY ??? The computed Q3, with k = 4 cells after combination, does not exceed the critical value, so there is no reason to reject H0. H0: the sample is from a Poisson distribution versus H1: the sample is not from a Poisson distribution.
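A sketch of the full calculation for this example, assuming SciPy is available; note that one degree of freedom is lost for the estimated λ, so the statistic is referred to a χ² distribution with k − 1 − 1 = 2 degrees of freedom:

```python
import numpy as np
from scipy.stats import poisson, chi2

values = np.array([0, 1, 2, 3, 4, 5])
freq = np.array([41, 29, 9, 4, 1, 1])          # observed frequencies, n = 85
n = freq.sum()

lam = (values * freq).sum() / n                 # lambda-hat = 68/85 = 0.8

# Cells 0, 1, 2 kept separate; {3, 4, 5, ...} combined into one cell
probs = [poisson.pmf(0, lam), poisson.pmf(1, lam), poisson.pmf(2, lam)]
probs.append(1 - sum(probs))                    # tail probability for the combined cell

observed = np.array([41, 29, 9, 4 + 1 + 1])
expected = n * np.array(probs)

q = ((observed - expected) ** 2 / expected).sum()
critical = chi2.ppf(0.95, df=len(observed) - 1 - 1)   # one parameter estimated

print(expected.round(1))                  # roughly [38.2, 30.6, 12.2, 4.0]
print(round(q, 2), round(critical, 2))    # Q about 2.1 < 5.99, so do not reject H0
```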

15 EXERCISE The number X of telephone calls received each minute at a certain switchboard in the middle of a working day is thought to have a Poisson distribution. Data were collected, and the results were as follows:
x          0    1    2    3    4    5    6
Frequency  40   66   41   28   9    3    1
Fit a Poisson distribution. Then find the estimated expected value of each cell after combining {4, 5, 6} to make one cell. Compute Q4, since k = 5, and compare it to the critical χ² value with three degrees of freedom. Why do we use three degrees of freedom? Do we accept or reject the Poisson distribution?

16 CONTINGENCY TABLES In many cases, data can be classified into categories on the basis of two criteria. For example, a radio receiver may be classified as having low, average, or high fidelity and as having low, average, or high selectivity; or graduating engineering students may be classified according to their starting salary and their grade-point average. In a contingency table, the statistical question is whether the row criteria and column criteria are independent. The null and alternative hypotheses are H0: the row and column criteria are independent, and H1: the row and column criteria are associated. Consider a contingency table with r rows and c columns. The number of elements in the sample that are observed to fall into row class i and column class j is denoted by Xij.

17 The row sum for the i-th row is Ri = Σj Xij, and the column sum for the j-th column is Cj = Σi Xij. The total number of observations in the entire table is n = Σi Σj Xij = Σi Ri = Σj Cj. The contingency table for the general case is given on the next slide:

18 The General r x c Contingency Table
X11   X12   …   X1j   …   X1c   | R1
X21   X22   …   X2j   …   X2c   | R2
…
Xi1   Xi2   …   Xij   …   Xic   | Ri
…
Xr1   Xr2   …   Xrj   …   Xrc   | Rr
C1    C2    …   Cj    …   Cc    | n

19 There are several probabilities of importance associated with the table. The probability of an element's being in row class i and column class j in the population is denoted by pij. The probability of being in row class i is denoted by pi., and the probability of being in column class j is denoted by p.j. Null and alternative hypotheses regarding the independence of these probabilities would be stated as follows: H0: pij = pi. p.j for all pairs (i, j) versus H1: H0 is false. As pij, pi., p.j are all unknown, it is necessary to estimate these probabilities.

20 The estimates are p̂i. = Ri / n and p̂.j = Cj / n, and under the hypothesis of independence pij = pi. p.j, so pij would be estimated by p̂ij = (Ri / n)(Cj / n). The expected number of observations in cell (i, j) is n pij. Under the null hypothesis, the estimate of this expected number is Eij = n p̂i. p̂.j = Ri Cj / n. The chi-square statistic is computed as χ² = Σi Σj (Xij − Eij)² / Eij.

21 The actual critical region is given by χ² ≥ χ²α((r − 1)(c − 1)). If the computed χ² gets too large, namely exceeds χ²α((r − 1)(c − 1)), we reject the hypothesis that the two attributes are independent.

22 EXAMPLE Ninety graduating male engineers were classified by two attributes: grade-point average (low, average, high) and initial salary (low, high). The following results were obtained.
              Grade-Point Average
Salary        Low   Average   High   | Total
Low           15    18        7      | 40
High          5     22        23     | 50
Total         20    40        30     | 90

23 SOLUTION ; WHAT DOES THE RESULT MEAN ???
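The formulas of the solution slide did not survive transcription, but the test itself is easy to reproduce. A sketch for the salary by grade-point-average table, assuming SciPy:

```python
import numpy as np
from scipy.stats import chi2_contingency, chi2

# Rows: salary (low, high); columns: grade-point average (low, average, high)
table = np.array([[15, 18, 7],
                  [5, 22, 23]])

stat, p_value, dof, expected = chi2_contingency(table, correction=False)

print(expected.round(2))              # E_ij = R_i * C_j / n
print(round(stat, 2), dof, round(p_value, 4))
print(stat > chi2.ppf(0.95, dof))     # True here: reject independence at the 5% level
```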

24 EXERCISES 1. A test of the fidelity and the selectivity of 190 radios produced the results shown in the following table:
                     Fidelity
Selectivity     Low   Average   High
Low             7     12        31
Average         35    59        18
High            15    13        0
Use the 0,01 level of significance to test the null hypothesis that fidelity is independent of selectivity.

25 2. A test of the equality of two or more multinomial distributions can be made by using calculations that are associated with a contingency table. For example, n = 100 light bulbs were taken at random from each of three brands and were graded as A, B, C, or D.
              Grade
Brand     A     B      C     D    | Totals
1         27    42     21    10   | 100
2         23    39     25    13   | 100
3         22    36     23    19   | 100
Totals    72    117    69    42   | 300

26 Clearly, we want to test the equality of three multinomial distributions, each with k = 4 cells. Since under H0 the probability of falling into a particular grade category is independent of brand, we can test this hypothesis by computing the contingency-table χ² statistic and comparing it with the critical value χ²α with (3 − 1)(4 − 1) = 6 degrees of freedom.

27 ANALYSIS OF VARIANCE The Analysis of Variance (ANOVA, or AOV) is a generalization of the two-sample t-test, so that the means of k > 2 populations may be compared. ANalysis Of VAriance was first suggested by Sir Ronald Fisher, pioneer of the theory of design of experiments. He was professor of genetics at Cambridge University. The F-test is named in honor of Fisher.

28 Ironically, the name Analysis of Variance stems from the somewhat surprising fact that a set of computations on several variances is used to test the equality of several means.

29 ANOVA The term ANOVA appears to be a misnomer, since the objective is to analyze differences among the group means. The terminology of ANOVA can be confusing: this procedure is actually concerned with levels of means. Because the ANOVA deals with means, it may appear to be misnamed. The ANOVA belies its name in that it is not concerned with analyzing variances but rather with analyzing variation in means.

30 DEFINITION: ANOVA, or one-factor analysis of variance, is a procedure to test the hypothesis that several populations have the same means. FUNCTION: Using analysis of variance, we will be able to make inferences about whether our samples are drawn from populations having the same means.

31 INTRODUCTION The Analysis of Variance (ANOVA) is a statistical technique used to compare the locations (specifically, the expectations) of k > 2 populations. The study of ANOVA involves the investigation of very complex statistical models, which are interesting both statistically and mathematically. Two designs are considered here. The first is referred to as a one-way classification or a completely randomized design. The second is called a two-way classification or a randomized block design. The basic idea behind the term "ANOVA" is that the total variability of all the observations can be separated into distinct portions, each of which can be assigned a particular source or cause. This decomposition of the variability permits statistical estimation and tests of hypotheses.

32 32 Suppose that we are interested in k populations, from each of which we sample n observations. The observations are denoted by: Y ij, i = 1,2,…k ; j = 1,2,…n where Y ij represents the j th observation from population i. A basic null hypothesis to test is : H 0 : µ 1 = µ 2 = … =µ k that is, all the populations have the same expectation. The ANOVA method to test this null hypothesis is based on an F statistic.

33 THE COMPLETELY RANDOMIZED DESIGN WITH EQUAL SAMPLE SIZES 33 First we will consider comparison of the true expectation of k > 2 populations, sometimes referred to as the k – sample problem. For simplicity of presentation, we will assume initially that an equal number of observations are randomly sampled from each population. These observations are denoted by: Y 11, Y 12, ……, Y 1n Y 21, Y 22, ……, Y 2n. Y k1, Y k2, ……, Y kn

34 34 where Y ij represents the j th observation out of the n randomly sampled observations from the i th population. Hence, Y 12 would be the second observation from the first population. In the completely randomized design, the observations are assumed to : 1. Come from normal populations 2. Come from populations with the same variance 3. Have possibly different expectations, µ 1, µ 2, …, µ k These assumptions are expressed mathematically as follows : Y ij ~ NOR (µ i, σ 2 ) ; i = 1,2,...k (*) j = 1,2,…n This equation is equivalent to………….

35 Yij = µi + εij, with εij ~ NID(0, σ²), where N represents "normally", I represents "independently" and D represents "distributed". The 0 means that E(εij) = 0 for all pairs of indices i and j, and σ² means that Var(εij) = σ² for all such pairs. The parameters µ1, µ2, …, µk are the expectations of the k populations, about which inference is to be made. The initial hypotheses to be tested in the completely randomized design are H0: µ1 = µ2 = … = µk versus H1: µi ≠ µj for some pair of indices i ≠ j. (**)

36 The null hypothesis states that all of the k populations have the same expectation. If this is true, then we know from equation (*) that all of the Yij observations have the same normal distribution and we are observing not n observations from each of k populations, but nk observations, all from the same population. The random variable Yij may be written as Yij = µ + αi + εij, where, defining µ = (µ1 + µ2 + … + µk)/k as the common part, αi = µi − µ. So α1 + α2 + … + αk = 0.

37 Hence Yij = µ + αi + εij, with Σ αi = 0 and εij ~ NID(0, σ²). The hypotheses in equation (**) may be restated as H0: α1 = α2 = … = αk = 0 versus H1: αi ≠ 0 for at least one i. (***) The observation Yij has expectation E(Yij) = µ + αi.

38 The parameters αi are differences or deviations of the individual population expectations from this common part µ. If all of the µi are equal (say to µ), then all of the deviations αi are zero, because αi = µi − µ = 0. Hence the null hypothesis in equation (***) means that E(Yij) = µ; these expectations consist only of the common part. The total variability of the observations is Σi Σj (Yij − Ȳ..)², where Ȳ.. is the mean of all of the observations. It can be shown that Σi Σj (Yij − Ȳ..)² = n Σi (Ȳi. − Ȳ..)² + Σi Σj (Yij − Ȳi.)².

39 The notation Ȳi. represents the average of the observations from the i-th population; that is, Ȳi. = (1/n) Σj Yij. The last equation is represented by SST = SSA + SSE, where SST represents the total sum of squares, SSA represents the sum of squares due to differences among populations or treatments, and SSE represents the sum of squares that is unexplained or said to be "due to error". The results of an ANOVA are usually reported in an analysis of variance table. ANOVA table……….

40 ANOVA Table for the Completely Randomized Design with Equal Sample Sizes:
Source of Variation               Degrees of Freedom   Sum of Squares   Mean Square            F
Among populations or treatments   k − 1                SSA              MSA = SSA/(k − 1)      MSA/MSE
Error                             k(n − 1)             SSE              MSE = SSE/[k(n − 1)]
Total                             kn − 1               SST
For an α-level test, a reasonable critical region for the alternative hypothesis in equation (**) is F = MSA/MSE ≥ Fα(k − 1, k(n − 1)).
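The identity SST = SSA + SSE and the F ratio can be checked directly. A short sketch with made-up data for k = 3 populations and n = 4 observations each, assuming NumPy:

```python
import numpy as np

# Hypothetical data: k = 3 populations, n = 4 observations each
y = np.array([[12.0, 14.0, 11.0, 13.0],
              [15.0, 17.0, 16.0, 14.0],
              [10.0, 9.0, 12.0, 11.0]])
k, n = y.shape

grand_mean = y.mean()
group_means = y.mean(axis=1)

sst = ((y - grand_mean) ** 2).sum()                    # total sum of squares
ssa = n * ((group_means - grand_mean) ** 2).sum()      # among treatments
sse = ((y - group_means[:, None]) ** 2).sum()          # within (error)

msa, mse = ssa / (k - 1), sse / (k * (n - 1))
print(round(sst, 3), round(ssa + sse, 3))              # the two totals agree
print("F =", round(msa / mse, 3))
```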

41 THE COMPLETELY RANDOMIZED DESIGN WITH UNEQUAL SAMPLE SIZES In many studies in which the expectations of k > 2 populations are compared, the samples from each population are not ultimately of equal size, even in cases where we attempt to maintain equal sample sizes. For example, suppose we decide to compare three teaching methods using three classes of students. The teachers of the classes each agree to use one of the three teaching methods. The plan for the comparison is to give a common examination to all of the students in each class after two months of instruction. Even if the classes are initially of the same size, they may differ after two months because students have dropped out for one reason or another. Thus we need a way to analyze the k-sample problem when the samples are of unequal sizes.

42 In the case of UNEQUAL SAMPLE SIZES, the observations are denoted by Yij, i = 1, 2, …, k; j = 1, 2, …, ni, where Yij represents the j-th observation from the i-th population. For the i-th population there are ni observations. In the case of equal sample sizes, ni = n for i = 1, 2, …, k. The model assumptions are the same for the unequal sample size case as for the equal sample size case. The Yij are assumed to: 1. Come from normal populations. 2. Come from populations with the same variance. 3. Have possibly different expectations, µ1, µ2, …, µk.

43 These assumptions are expressed formally as Yij ~ NOR(µi, σ²), i = 1, 2, …, k; j = 1, 2, …, ni, or as Yij = µi + εij, with εij ~ NID(0, σ²). The first null and alternative hypotheses to test are exactly the same as those in the previous section, namely H0: µ1 = µ2 = … = µk versus H1: µi ≠ µl for some pair of indices i ≠ l. The model for the completely randomized design may be presented as Yij = µ + αi + εij, with Σ ni αi = 0 and εij ~ NID(0, σ²). In this case the overall mean µ is given by µ = Σ ni µi / N, where N = n1 + n2 + … + nk is the total number of observations.

44 Here µ is a weighted average of the population expectations, where the weights are ni/N, the proportions of observations coming from the i-th population. The hypotheses can also be restated as H0: αi = 0 for all i versus H1: αi ≠ 0 for at least one i. The observation Yij has expectation E(Yij) = µ + αi. If H0 is true, then E(Yij) = µ; hence all of the Yij have a common distribution, Yij ~ NOR(µ, σ²), under H0. The total variability of the observations is again partitioned into two portions by Σi Σj (Yij − Ȳ..)² = Σi ni (Ȳi. − Ȳ..)² + Σi Σj (Yij − Ȳi.)², or SST = SSA + SSE, where

45 as before, Ȳi. represents the average of the observations from the i-th population, N is the total number of observations, and Ȳ.. is the average of all the observations. Again, SST represents the total sum of squares, SSA represents the sum of squares due to differences among populations or treatments, and SSE represents the sum of squares due to error.

46 DEGREES OF FREEDOM The number of degrees of freedom partitions as TOTAL = TREATMENTS + ERROR: (N − 1) = (k − 1) + (N − k).

47 The mean square among treatments and the mean square for error are equal to the appropriate sum of squares divided by the corresponding dof. That is, MSA = SSA/(k − 1) and MSE = SSE/(N − k). It can be shown that MSE is an unbiased estimate of σ², that is, E(MSE) = σ²; similarly, under H0, E(MSA) = σ². Under the null hypothesis, F = MSA/MSE has an F-distribution with (k − 1) and (N − k) dof. Finally, we reject the null hypothesis at significance level α if F ≥ Fα(k − 1, N − k).

48 ANOVA TABLE for the Completely Randomized Design with unequal sample sizes
SOURCE                            dof      SS     MS                  F
Among Populations or Treatments   k − 1    SSA    MSA = SSA/(k − 1)   MSA/MSE
ERROR                             N − k    SSE    MSE = SSE/(N − k)
TOTAL                             N − 1    SST
Sometimes SSA is denoted SSTR, SSE is denoted SSER, and SST is denoted SSTO.

49 SUMMARY NOTATION FOR A CRD
Populations (treatments): 1, 2, 3, …, k, with means µ1, µ2, µ3, …, µk and common variance σ².
Independent random samples: sample sizes n1, n2, n3, …, nk; sample totals T1, T2, T3, …, Tk; sample means Ȳ1, Ȳ2, Ȳ3, …, Ȳk.
Total number of measurements: N = n1 + n2 + n3 + … + nk.

50 ANOVA F-TEST FOR A CRD with k treatments
H0: µ1 = µ2 = … = µk (i.e., there is no difference in the treatment means) versus Ha: At least two of the treatment means differ.
Test Statistic: F = MSTR/MSER.
Rejection Region: F > Fα(k − 1, N − k).

51 PARTITIONING OF THE TOTAL SUM OF SQUARES FOR THE COMPLETELY RANDOMIZED DESIGN
TOTAL SUM OF SQUARES (SSTO) = SUM OF SQUARES FOR TREATMENTS (SSTR) + SUM OF SQUARES FOR ERROR (SSER)

52 FORMULAS FOR THE CALCULATIONS IN THE CRD
CM = correction for the mean = (total of all observations)² / N
SSTO = total sum of squares = (sum of squares of all observations) − CM
SSTR = sum of squares for treatments = (sum of squares of treatment totals with each square divided by the number of observations for that treatment) − CM = Σ Ti²/ni − CM
SSER = SSTO − SSTR

53 MSTR = SSTR/(k − 1), MSER = SSER/(N − k), and F = MSTR/MSER, where k is the total number of treatments and N is the total number of observations.

54 EXAMPLE Four groups of students were subjected to different teaching techniques and tested at the end of a specified period of time. As a result of dropouts from the experimental groups (due to sickness, transfer, and so on), the number of students varied from group to group. Do the data shown in the table below present sufficient evidence to indicate a difference in the mean achievement for the four teaching techniques?
DATA FOR EXAMPLE (columns are techniques 1 to 4)
1        2        3        4
65       75       59       94
87       69       78       89
73       83       67       80
79       81       62       88
81       72       83       69
69       79       76
         90
Totals:  454      549      425      351
ni:      6        7        6        4
Means:   75,67    78,43    70,83    87,75

55 SOLUTION The mean squares for treatment and error are MST = SSTR/(k − 1) and MSE = SSER/(N − k).

56 The test statistic for testing H0: µ1 = µ2 = µ3 = µ4 is F = MST/MSE. The critical value of F for α = 0,05 with (k − 1) = 3 and (N − k) = 19 degrees of freedom is approximately 3,13. Since the computed F exceeds this value, we reject H0. CONCLUDE ?????
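A sketch of the computations for this example, assuming NumPy and SciPy; the grouping of scores by technique below is read off the columns of the data table and is consistent with the printed totals and means:

```python
import numpy as np
from scipy.stats import f_oneway, f

# Scores grouped by teaching technique (columns of the data table)
g1 = [65, 87, 73, 79, 81, 69]
g2 = [75, 69, 83, 81, 72, 79, 90]
g3 = [59, 78, 67, 62, 83, 76]
g4 = [94, 89, 80, 88]

groups = [g1, g2, g3, g4]
all_y = np.concatenate(groups)
N, k = all_y.size, len(groups)

cm = all_y.sum() ** 2 / N                                   # correction for the mean
sstr = sum(sum(g) ** 2 / len(g) for g in groups) - cm       # treatment SS
ssto = (all_y ** 2).sum() - cm                              # total SS
sser = ssto - sstr

mstr, mser = sstr / (k - 1), sser / (N - k)
F = mstr / mser
print(round(F, 2), round(f.ppf(0.95, k - 1, N - k), 2))     # about 3.77 vs about 3.13

# Cross-check with SciPy's one-way ANOVA
print(f_oneway(g1, g2, g3, g4))
```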

57 THE RANDOMIZED BLOCK DESIGN The randomized block design implies the presence of two qualitative independent variables, "blocks" and "treatments". Consequently, the total sum of squares of deviations of the response measurements about their mean may be partitioned into three parts: the sum of squares for blocks, treatments and error.

58 CRD: SSTO = SSTR + SSER. RBD: SSTO = SSBL + SSTR + SSER.

59 Definition: A randomized block design is a design devised to compare the means for k treatments utilizing b matched blocks of k experimental units each. Each treatment appears once in every block. The observations in a RBD can be represented by an array of the following type.

60 As before, the expectation of Yij, the observation from the i-th treatment (population), was given by E(Yij) = µ + αi. In this section (the RBD), the assumption about Yij is that
(i) Yij = µ + αi + βj + εij ; i = 1, 2, …, t ; j = 1, 2, …, b, with Σi αi = 0, Σj βj = 0 and εij ~ NID(0, σ²).
The observation Yij is said to be the observation from block j on treatment i. As in equation (i), it is assumed that there are t different treatments and b blocks.

61 Hence E(Yij) = µ + αi + βj, where µ is the overall effect, βj is the block effect, and αi is the treatment effect. One task is to test the null hypothesis H0: α1 = α2 = … = αt = 0, which states that there are no treatment differences.

62 Here, the i-th treatment mean is Ȳi. = (1/b) Σj Yij, the j-th block mean is Ȳ.j = (1/t) Σi Yij, and the overall mean is Ȳ.. = (1/bt) Σi Σj Yij. The decomposition of the total variability can be abbreviated as SSTO = SSTR + SSBL + SSER.

63 The degrees of freedom are partitioned as follows: dofTO = dofTR + dofBL + dofER, that is, bt − 1 = (t − 1) + (b − 1) + (b − 1)(t − 1). If the null hypothesis of no treatment differences is true, then both MSTR and MSER are unbiased estimates of σ².

64 It can further be shown that, under H0, F = MSTR/MSER has an F-distribution with (t − 1) and (b − 1)(t − 1) dof. Hence, using an α-level test, we reject H0 in favor of H1 if F ≥ Fα(t − 1, (b − 1)(t − 1)). For analogous reasons, a test of H0: no block differences versus H1: at least two blocks differ can be carried out using the critical region MSBL/MSER ≥ Fα(b − 1, (b − 1)(t − 1)).

65 Data Structure of a RBD with b blocks and k treatments: the rows are blocks 1, 2, …, b and the columns are treatments 1, 2, …, k; the entry in row j and column i is the observation Yij. Block means are computed along the rows and treatment means along the columns.

66 GENERAL FORM OF THE RANDOMIZED BLOCK DESIGN (TREATMENT i IS DENOTED BY Ai) Blocks 1, 2, …, b each contain the k treatments A1, A2, A3, …, Ak. Although we show the treatments in order within the blocks, in practice they would be assigned to the experimental units in a random order (thus the name randomized block design).

67 FORMULAS FOR CALCULATIONS IN RBD
CM = (total of all observations)² / N
SSTO = (sum of squares of all observations) − CM
SSTR = Σ Ti²/b − CM, where Ti is the total for treatment i
SSBL = Σ Bj²/k − CM, where Bj is the total for block j
SSER = SSTO − SSTR − SSBL
where N = total number of observations, b = number of blocks, k = number of treatments.

68 ANOVA Summary Table For RBD
SOURCE       DOF             SS      MS      F
Treatments   k − 1           SSTR    MSTR    MSTR/MSER
Blocks       b − 1           SSBL    MSBL    MSBL/MSER
Error        N − k − b + 1   SSER    MSER
TOTAL        N − 1           SSTO

69 EXAMPLE 69 A study was conducted in a large city to compare the supermarket prices of the four leading brands of coffee at the end of the year. Ten supermarkets in the city were selected, and the price per pound was recorded for each brand. 1. Set up the test of the null hypothesis that the mean prices of the four brands sold in the city were the same at the end of the year. Use α = 0,05 2. Calculate the F statistic 3. Do the data provide sufficient evidence to indicate a difference in the mean prices for the four brands of coffee?

70 SUPERMARKET   BRAND A   B       C       D       TOTALS
1                $ 2,43    $ 2,47  $ 2,47  $ 2,41  9,78
2                2,48      2,52    2,53    2,48    10,01
3                2,38      2,44    2,42    2,35    9,59
4                2,40      2,47    2,46    2,39    9,72
5                2,35      2,42    2,44    2,32    9,53
6                2,43      2,49    2,47    2,42    9,81
7                2,55      2,62    2,64    2,56    10,37
8                2,41      2,49    2,47    2,39    9,76
9                2,53      2,60    2,59    2,49    10,21
10               2,35      2,43    2,44    2,36    9,58
TOTALS           24,31     24,95   24,93   24,17

71 71 SOLUTION

72 72

73 Since the calculated F > F0,05, there is very strong evidence that at least two of the mean prices for the four coffee brands differ. Treatments: H0: µ1 = µ2 = µ3 = µ4, H1: at least two brands have different mean prices; test statistic F = MSTR/MSER = 92,8. Blocks: H0: mean coffee prices are the same for all ten supermarkets, H1: mean coffee prices differ for at least two supermarkets; test statistic F = MSBL/MSER = 107,9.

74 The dof for the block test statistic are b − 1 = 9 and N − k − b + 1 = 27, and F0,05 = 2,25.
ANOVA TABLE
SOURCE      DOF   SS        MS           F
Treatment   3     0,05000   0,016667     92,8
Block       9     0,17451   0,019390     107,9
Error       27    0,00485   0,00017963
TOTAL       39    0,22936
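A sketch reproducing this ANOVA table, assuming NumPy and SciPy; the price matrix is read from the data slide (with the first price for brand C taken as 2,47, the value implied by the printed row and column totals):

```python
import numpy as np
from scipy.stats import f

# Price per pound: rows = supermarkets (blocks), columns = brands A, B, C, D
prices = np.array([
    [2.43, 2.47, 2.47, 2.41],
    [2.48, 2.52, 2.53, 2.48],
    [2.38, 2.44, 2.42, 2.35],
    [2.40, 2.47, 2.46, 2.39],
    [2.35, 2.42, 2.44, 2.32],
    [2.43, 2.49, 2.47, 2.42],
    [2.55, 2.62, 2.64, 2.56],
    [2.41, 2.49, 2.47, 2.39],
    [2.53, 2.60, 2.59, 2.49],
    [2.35, 2.43, 2.44, 2.36],
])
b, k = prices.shape
N = prices.size

cm = prices.sum() ** 2 / N
sstr = (prices.sum(axis=0) ** 2).sum() / b - cm   # brands (treatments)
ssbl = (prices.sum(axis=1) ** 2).sum() / k - cm   # supermarkets (blocks)
ssto = (prices ** 2).sum() - cm
sser = ssto - sstr - ssbl

mstr, msbl, mser = sstr / (k - 1), ssbl / (b - 1), sser / (N - k - b + 1)
print(round(sstr, 5), round(ssbl, 5), round(sser, 5))   # about 0.05000, 0.17451, 0.00485
print(round(mstr / mser, 1), round(msbl / mser, 1))     # about 92.8 and 107.9
print(round(f.ppf(0.95, k - 1, N - k - b + 1), 2))      # treatment critical value
```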

75 NON PARAMETRIC TEST 75 The majority of hypothesis tests discussed so far have made inferences about population parameters, such as the mean and the proportion. These parametric tests have used the parametric statistics of samples that came from the population being tested. To formulate these tests, we made restrictive assumptions about the populations from which we drew our samples. For example, we assumed that our samples either were large or came from normally distributed populations. But populations are not always normal.

76 And even if a goodness-of-fit test indicates that a population is approximately normal, we cannot always be sure we're right, because the test is not 100 percent reliable. Fortunately, in recent times statisticians have developed useful techniques that do not make restrictive assumptions about the shape of the population distribution. These are known as distribution-free or, more commonly, nonparametric tests. Many analysts prefer nonparametric statistical procedures to their parametric counterparts. The hypotheses of a nonparametric test are concerned with something other than the value of a population parameter. A large number of these tests exist, but this section will examine only a few of the better known and more widely used ones:

77 77 NON PARAMETRIC TESTS SIGN TEST WILCOXON SIGNED RANK TEST MANN – WHITNEY TEST (WILCOXON RANK SUM TEST) RUN TEST KRUSKAL – WALLIS TEST KOLMOGOROV – SMIRNOV TEST LILLIEFORS TEST

78 THE SIGN TEST The sign test is used to test hypotheses about the median of a continuous distribution. The median of a distribution is a value µ̃ of the random variable X such that the probability is 0,5 that an observed value of X is less than or equal to the median, and the probability is 0,5 that an observed value of X is greater than or equal to the median. That is, P(X ≤ µ̃) = P(X ≥ µ̃) = 0,5. Since the normal distribution is symmetric, the mean of a normal distribution equals the median. Therefore, the sign test can be used to test hypotheses about the mean of a normal distribution.

79 Let X denote a continuous random variable with median µ̃, and let X1, X2, …, Xn denote a random sample of size n from the population of interest. If µ̃0 denotes the hypothesized value of the population median, then the usual forms of the hypotheses to be tested can be stated as follows: H0: µ̃ = µ̃0 versus H1: µ̃ > µ̃0 (right-tailed test), H1: µ̃ < µ̃0 (left-tailed test), or H1: µ̃ ≠ µ̃0 (two-tailed test).

80 Form the differences Di = Xi − µ̃0, i = 1, 2, …, n. Now if the null hypothesis is true, any difference is equally likely to be positive or negative. An appropriate test statistic is the number of these differences that are positive, say R+. Therefore, to test the null hypothesis we are really testing that the number of plus signs is a value of a Binomial random variable that has the parameter p = 0,5. A p-value for the observed number of plus signs r+ can be calculated directly from the Binomial distribution. For the right-tailed alternative H1: µ̃ > µ̃0, the p-value is P(R+ ≥ r+ | p = 0,5); if the computed p-value is less than or equal to some preselected significance level α, we will reject H0 and conclude that H1 is true.

81 To test the other one-sided hypothesis, H0: µ̃ = µ̃0 versus H1: µ̃ < µ̃0, we use the p-value P(R+ ≤ r+ | p = 0,5); if it is less than or equal to α, we will reject H0. The two-sided alternative may also be tested. If the hypotheses are H0: µ̃ = µ̃0 versus H1: µ̃ ≠ µ̃0, the p-value is 2 P(R+ ≤ r+ | p = 0,5) when r+ < n/2 and 2 P(R+ ≥ r+ | p = 0,5) when r+ > n/2.

82 It is also possible to construct a table of critical values for the sign test. As before, let r+ denote the number of the differences that are positive and let r− denote the number of the differences that are negative. Let r = min(r+, r−); the table of critical values r*α for the sign test ensures that the probability of rejecting a true H0 does not exceed α. If the observed value of the test statistic satisfies r ≤ r*α, then the null hypothesis should be rejected and H1 accepted.

83 If the alternative is H1: µ̃ > µ̃0, then reject H0 if r− ≤ r*α. If the alternative is H1: µ̃ < µ̃0, then reject H0 if r+ ≤ r*α. The level of significance of a one-sided test is one-half the value for a two-sided test.

84 TIES in the SIGN TEST Since the underlying population is assumed to be continuous, there is a zero probability that we will find a "tie", that is, a value of Xi exactly equal to µ̃0. When ties occur, they should be set aside and the sign test applied to the remaining data.

85 THE NORMAL APPROXIMATION When p = 0,5, the Binomial distribution is well approximated by a normal distribution when n is at least 10. Thus, since the mean of the Binomial is np and the variance is np(1 − p), the distribution of R+ is approximately normal with mean 0,5n and variance 0,25n whenever n is moderately large. Therefore, in these cases the null hypothesis can be tested using the statistic Z0 = (R+ − 0,5n) / (0,5 √n).

86 Critical/rejection regions for α-level tests of H0: µ̃ = µ̃0 versus the possible alternatives are given in this table:
Alternative      Critical/Rejection Region
H1: µ̃ > µ̃0      Z0 ≥ zα
H1: µ̃ < µ̃0      Z0 ≤ −zα
H1: µ̃ ≠ µ̃0      |Z0| ≥ z(α/2)

87 THE WILCOXON SIGNED-RANK TEST The sign test makes use only of the plus and minus signs of the differences between the observations and the median (or the plus and minus signs of the differences between the observations in the paired case). Frank Wilcoxon devised a test procedure that uses both direction (sign) and magnitude. This procedure is now called the Wilcoxon signed-rank test. The Wilcoxon signed-rank test applies to the case of symmetric continuous distributions. Under these assumptions, the mean equals the median.

88 Description of the test: We are interested in testing H0: µ = µ0 versus H1: µ ≠ µ0.

89 Assume that X1, X2, …, Xn is a random sample from a continuous and symmetric distribution with mean/median µ. Compute the differences Di = Xi − µ0, i = 1, 2, …, n. Rank the absolute differences |Di|, and then give the ranks the signs of their corresponding differences. Let W+ be the sum of the positive ranks and W− be the absolute value of the sum of the negative ranks, and let W = min(W+, W−). Critical values of W, say W*α, are tabulated. 1. If H1: µ ≠ µ0, then reject H0 if the observed value w ≤ W*α. 2. If H1: µ > µ0, reject H0 if w− ≤ W*α. 3. If H1: µ < µ0, reject H0 if w+ ≤ W*α.

90 LARGE SAMPLE APPROXIMATION If the sample size is moderately large (n > 20), then it can be shown that W+ (or W−) has approximately a normal distribution with mean n(n + 1)/4 and variance n(n + 1)(2n + 1)/24. Therefore, a test of H0: µ = µ0 can be based on the statistic Z0 = [W+ − n(n + 1)/4] / √[n(n + 1)(2n + 1)/24].

91 Wilcoxon Signed-Rank Test. Test statistic: W+, the sum of the positive signed ranks. Theorem: the probability distribution of W+ when H0 is true, based on a random sample of size n, satisfies E(W+) = n(n + 1)/4 and Var(W+) = n(n + 1)(2n + 1)/24.

92 Proof: Let Ui = 1 if the difference receiving rank i is positive and Ui = 0 otherwise, so that W+ = Σ i Ui, where the sum runs over i = 1, …, n. For a given i, the discrepancy has a 50 : 50 chance of being "+" or "−", so the Ui are independent with P(Ui = 1) = P(Ui = 0) = 1/2. Hence E(W+) = Σ i/2 = n(n + 1)/4 and Var(W+) = Σ i²/4 = n(n + 1)(2n + 1)/24.

93 93

94 94

95 PAIRED OBSERVATIONS The Wilcoxon signed-rank test can be applied to paired data. Let (X1j, X2j), j = 1, 2, …, n, be a collection of paired observations from two continuous distributions that differ only with respect to their means. The distribution of the differences Dj = X1j − X2j is continuous and symmetric. The null hypothesis is H0: µ1 = µ2, which is equivalent to H0: µD = 0. To use the Wilcoxon signed-rank test, the differences are first ranked in ascending order of their absolute values, and then the ranks are given the signs of the differences.

96 Let W+ be the sum of the positive ranks and W− be the absolute value of the sum of the negative ranks, and W = min(W+, W−). If the observed value w ≤ W*α, then H0: µ1 = µ2 is rejected and H1: µ1 ≠ µ2 accepted. If H1: µ1 > µ2, then reject H0 if w− ≤ W*α; if H1: µ1 < µ2, reject H0 if w+ ≤ W*α.

97 EXAMPLE Eleven students were randomly selected from a large statistics class, and their numerical grades on two successive examinations were recorded. Use the Wilcoxon signed rank test to determine whether the second test was more difficult than the first. Use α = 0,1.
Student   Test 1   Test 2   Difference   Rank   Signed Rank
1         94       85       9            8      8
2         78       65       13           10     10
3         89       92       -3           4      -4
4         62       56       6            7      7
5         49       52       -3           4      -4
6         78       74       4            6      6
7         80       79       1            1      1
8         82       84       -2           2      -2
9         62       48       14           11     11
10        83       71       12           9      9
11        79       82       -3           4      -4

98 SOLUTION: The sum of the positive ranks is W+ = 8 + 10 + 7 + 6 + 1 + 11 + 9 = 52. With n = 11, the large-sample statistic is Z0 = (52 − 33)/√126,5 ≈ 1,69. Since 1,69 > z0,1 = 1,28, REJECT H0: the second test appears to have been more difficult than the first.
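A sketch of the same calculation, assuming NumPy and SciPy; it uses the large-sample normal approximation exactly as on the slide:

```python
import numpy as np
from scipy.stats import rankdata, norm

test1 = np.array([94, 78, 89, 62, 49, 78, 80, 82, 62, 83, 79])
test2 = np.array([85, 65, 92, 56, 52, 74, 79, 84, 48, 71, 82])

d = test1 - test2
ranks = rankdata(np.abs(d))            # ties get average ranks
w_plus = ranks[d > 0].sum()            # sum of positive signed ranks = 52

n = len(d)
mean = n * (n + 1) / 4                 # 33
var = n * (n + 1) * (2 * n + 1) / 24   # 126.5
z0 = (w_plus - mean) / np.sqrt(var)

print(w_plus, round(z0, 2), round(norm.ppf(0.90), 2))   # 52.0, 1.69, 1.28
# z0 > z_0.10, so reject H0: the second test appears more difficult
```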

99 EXAMPLE Ten newly married couples were randomly selected, and each husband and wife were independently asked how many children they would like to have. The following information was obtained. Using the sign test, is there reason to believe that wives want fewer children than husbands? Assume a maximum size of type I error of 0,05.
COUPLE       1   2   3   4   5   6   7   8   9   10
WIFE X       3   2   1   0   0   1   2   2   2   0
HUSBAND Y    2   3   2   2   0   2   1   3   1   2

100 SOLUTION First set up H0 and H1: H0: p = 0,5 versus H1: p < 0,5, where p is the probability that the difference X − Y (wife minus husband) is positive. Couple 5 is a tie and is set aside. The signs of the remaining differences are:
Couple   1   2   3   4   6   7   8   9   10
Sign     +   -   -   -   -   +   -   +   -
There are three + signs. Under H0, S ~ BIN(9, 1/2), and P(S ≤ 3) = 0,2539. At level α = 0,05, since 0,2539 > 0,05, H0 should not be rejected.
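The same sign test can be carried out with SciPy's exact binomial test (available as scipy.stats.binomtest in SciPy 1.7 and later); a sketch:

```python
import numpy as np
from scipy.stats import binomtest

wife = np.array([3, 2, 1, 0, 0, 1, 2, 2, 2, 0])
husband = np.array([2, 3, 2, 2, 0, 2, 1, 3, 1, 2])

d = wife - husband
d = d[d != 0]                         # drop the tie (couple 5), leaving n = 9
n_plus = int((d > 0).sum())           # 3 plus signs

# H1: wives want fewer children, i.e. plus signs are rare (p < 0.5)
result = binomtest(n_plus, n=len(d), p=0.5, alternative="less")
print(n_plus, len(d), round(result.pvalue, 4))   # 3, 9, 0.2539 > 0.05: do not reject H0
```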

101 THE WILCOXON RANK-SUM TEST 101 Suppose that we have two independent continuous populations X 1 and X 2 with means µ 1 and µ 2. Assume that the distributions of X 1 and X 2 have the same shape and spread, and differ only (possibly) in their means. The Wilcoxon rank-sum test can be used to test the hypothesis H 0 : µ 1 = µ 2. This procedure is sometimes called the Mann- Whitney test or Mann-Whitney U Test.

102 Description of the Test Let X11, X12, …, X1n1 and X21, X22, …, X2n2 be two independent random samples of sizes n1 ≤ n2 from the continuous populations X1 and X2. We wish to test the hypotheses H0: µ1 = µ2 versus H1: µ1 ≠ µ2. The test procedure is as follows. Arrange all n1 + n2 observations in ascending order of magnitude and assign ranks to them. If two or more observations are tied, then use the mean of the ranks that would have been assigned if the observations differed.

103 Let W1 be the sum of the ranks in the smaller sample (1), and define W2 to be the sum of the ranks in the other sample. Then W2 = [(n1 + n2)(n1 + n2 + 1)/2] − W1. Now if the sample means do not differ, we will expect the sums of the ranks to be nearly equal for both samples after adjusting for the difference in sample size. Consequently, if the sums of the ranks differ greatly, we will conclude that the means are not equal. Referring to a table with the appropriate sample sizes n1 and n2, the critical value wα can be obtained.

104 H0: µ1 = µ2 is rejected if either of the observed values w1 or w2 is less than or equal to wα. If H1: µ1 < µ2, then reject H0 if w1 ≤ wα. For H1: µ1 > µ2, reject H0 if w2 ≤ wα.

105 LARGE-SAMPLE APPROXIMATION When both n1 and n2 are moderately large, say, greater than 8, the distribution of W1 can be well approximated by the normal distribution with mean µW1 = n1(n1 + n2 + 1)/2 and variance σ²W1 = n1 n2 (n1 + n2 + 1)/12.

106 Therefore, for n1 and n2 > 8, we could use Z0 = (W1 − µW1)/σW1 as a statistic, and the critical region is |Z0| ≥ z(α/2) for the two-tailed test, Z0 ≥ zα for the upper-tail test, or Z0 ≤ −zα for the lower-tail test.

107 EXAMPLE A large corporation is suspected of sex-discrimination in the salaries of its employees. From employees with similar responsibilities and work experience, 12 male and 12 female employees were randomly selected; their annual salaries in thousands of dollars are as follows:
Females   22,5   19,8   20,6   24,7   23,2   19,2   18,7   20,9   21,6   23,5   20,7   21,6
Males     21,9   21,6   22,4   24,0   24,1   23,4   21,2   23,9   20,5   24,5   22,3   23,6
Is there reason to believe that these random samples come from populations with different distributions? Use α = 0,05.

108 SOLUTION H0: f1(x) = f2(x). WHAT DOES THIS MEAN?? The random samples come from populations with the same distribution. H1: f1(x) ≠ f2(x). Combine the samples and rank the salaries:
SEX   SALARY   RANK
F     18,7     1
F     19,2     2
F     19,8     3
M     20,5     4
F     20,6     5
F     20,7     6
F     20,9     7
M     21,2     8
M     21,6     10
F     21,6     10
F     21,6     10

109 CONT'D
M     21,9     12
M     22,3     13
M     22,4     14
F     22,5     15
F     23,2     16
M     23,4     17
F     23,5     18
M     23,6     19
M     23,9     20
M     24,0     21
M     24,1     22
M     24,5     23
F     24,7     24

110 Suppose we choose the sample of females; then the sum of their ranks is R1 = RF = 117. The value of the U statistic is U = R1 − n1(n1 + 1)/2 = 117 − 78 = 39, and the standardized statistic is Z = (U − n1 n2 / 2) / √[n1 n2 (n1 + n2 + 1)/12] = (39 − 72)/√300 ≈ −1,91.

111 Graphically, with α = 0,05 the two-tailed critical values are −1,96 and 1,96. Since |Zcomputed| = 1,91 < 1,96, we do not reject H0. WHAT DOES THIS MEAN ??? There is not sufficient evidence to conclude that the salary distributions of female and male employees differ.
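A sketch of the rank-sum calculation, assuming NumPy and SciPy; the manual part mirrors the slide, and the SciPy call is only a cross-check (its p-value may differ slightly because SciPy adjusts the variance for ties):

```python
import numpy as np
from scipy.stats import rankdata, mannwhitneyu

females = np.array([22.5, 19.8, 20.6, 24.7, 23.2, 19.2,
                    18.7, 20.9, 21.6, 23.5, 20.7, 21.6])
males = np.array([21.9, 21.6, 22.4, 24.0, 24.1, 23.4,
                  21.2, 23.9, 20.5, 24.5, 22.3, 23.6])

n1, n2 = len(females), len(males)
ranks = rankdata(np.concatenate([females, males]))   # ties share average ranks
r_f = ranks[:n1].sum()                                # rank sum for females = 117

u = r_f - n1 * (n1 + 1) / 2                           # U statistic = 39
z = (u - n1 * n2 / 2) / np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
print(r_f, u, round(z, 2))      # 117.0, 39.0, -1.91: |z| < 1.96, do not reject H0

# Cross-check with SciPy's Mann-Whitney U test
print(mannwhitneyu(females, males, alternative="two-sided",
                   method="asymptotic", use_continuity=False))
```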

112 KOLMOGOROV – SMIRNOV TEST The Kolmogorov-Smirnov (K-S) test is conducted by comparing the hypothesized and sample cumulative distribution functions. A cumulative distribution function is defined as F(x) = P(X ≤ x), and the sample cumulative distribution function, S(x), is defined as the proportion of sample values that are less than or equal to x. The K-S test should be used instead of the χ² test to determine if a sample is from a specified continuous distribution. To illustrate how S(x) is computed, suppose we have the following 10 observations: 110, 89, 102, 80, 93, 121, 108, 97, 105, 103.

113 We begin by placing the values of x in ascending order, as follows: 80, 89, 93, 97, 102, 103, 105, 108, 110, 121. Because x = 80 is the smallest of the 10 values, the proportion of values of x that are less than or equal to 80 is S(80) = 0,1.
x     S(x)
80    0,1
89    0,2
93    0,3
97    0,4
102   0,5
103   0,6
105   0,7
108   0,8
110   0,9
121   1,0

114 The test statistic D is the maximum absolute difference between the two cdf's over all observed values. The range on D is 0 ≤ D ≤ 1, and the formula is D = max |F(x) − S(x)| over the observed values, where x = each observed value, S(x) = observed cdf at x, and F(x) = hypothesized cdf at x.

115 Let X(1), X(2), …, X(n) denote the ordered observations of a random sample of size n, and define the sample cdf as Sn(x) = (number of X(i) ≤ x)/n; Sn(x) is the proportion of the number of sample values less than or equal to x.

116 The Kolmogorov–Smirnov statistic is defined to be Dn = max |F(x) − Sn(x)|. For the size α of the type I error, the critical region is of the form Dn ≥ Dn,α, where Dn,α is obtained from a table of critical values.

117 EXAMPLE 1 A state vehicle inspection station has been designed so that inspection time follows a uniform distribution with limits of 10 and 15 minutes. A sample of 10 duration times during low and peak traffic conditions was taken. Use the K-S test with α = 0,05 to determine if the sample is from this uniform distribution. The times are: 11,3 10,4 9,8 12,6 14,8 13,0 14,3 13,3 11,5 13,6

118 SOLUTION 1. H0: the sample comes from the Uniform(10, 15) distribution versus H1: the sample does not come from the Uniform(10, 15) distribution. 2. The cumulative distribution function of the sample, S(x), is computed as the proportion of observations less than or equal to x, and the hypothesized cdf is F(x) = (x − 10)/5 for 10 ≤ x ≤ 15.

119 Results of the K-S calculations
Observed time x   S(x)   F(x)   |F(x) − S(x)|
9,8               0,10   0,00   0,10
10,4              0,20   0,08   0,12
11,3              0,30   0,26   0,04
11,5              0,40   0,30   0,10
12,6              0,50   0,52   0,02
13,0              0,60   0,60   0,00
13,3              0,70   0,66   0,04
13,6              0,80   0,72   0,08
14,3              0,90   0,86   0,04
14,8              1,00   0,96   0,04

120 The maximum difference is D = 0,12, attained at x = 10,4. From the table of critical values, with n = 10 and α = 0,05, D10;0,05 = 0,41. Since D = 0,12 < 0,41, do not reject H0.
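The same test can be run with SciPy's kstest, passing the Uniform(10, 15) distribution as loc = 10, scale = 5; a sketch:

```python
from scipy.stats import kstest

times = [11.3, 10.4, 9.8, 12.6, 14.8, 13.0, 14.3, 13.3, 11.5, 13.6]

# H0: the data come from Uniform(10, 15); loc=10, scale=5
result = kstest(times, "uniform", args=(10, 5))
print(round(result.statistic, 2), round(result.pvalue, 3))
# D comes out 0.12 here, well below the tabled critical value 0.41, so H0 is not rejected
```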

121 EXAMPLE 2 Suppose the following ten observations, 110, 89, 102, 80, 93, 121, 108, 97, 105, 103, were drawn from a normal distribution with mean µ = 100 and standard deviation σ = 10. Our hypotheses for this test are H0: the data were drawn from a normal distribution with µ = 100 and σ = 10, versus H1: the data were not drawn from a normal distribution with µ = 100 and σ = 10.

122 SOLUTION
x     F(x) = P(X ≤ x)
80    P(X ≤ 80) = P(Z ≤ -2,0) = 0,0228
89    P(X ≤ 89) = P(Z ≤ -1,1) = 0,1357
93    P(X ≤ 93) = P(Z ≤ -0,7) = 0,2420
97    P(X ≤ 97) = P(Z ≤ -0,3) = 0,3821
102   P(X ≤ 102) = P(Z ≤ 0,2) = 0,5793
103   P(X ≤ 103) = P(Z ≤ 0,3) = 0,6179
105   P(X ≤ 105) = P(Z ≤ 0,5) = 0,6915
108   P(X ≤ 108) = P(Z ≤ 0,8) = 0,7881
110   P(X ≤ 110) = P(Z ≤ 1,0) = 0,8413
121   P(X ≤ 121) = P(Z ≤ 2,1) = 0,9821

123
x     F(x)     S(x)   |F(x) − S(x)|
80    0,0228   0,1    0,0772
89    0,1357   0,2    0,0643
93    0,2420   0,3    0,0580
97    0,3821   0,4    0,0179
102   0,5793   0,5    0,0793  = D
103   0,6179   0,6    0,0179
105   0,6915   0,7    0,0085
108   0,7881   0,8    0,0119
110   0,8413   0,9    0,0587
121   0,9821   1,0    0,0179

124 If α = 0,05, then the critical value for n = 10 obtained from the table is 0,409. The decision rule is to reject H0 if D > 0,409. Since D = 0,0793 < 0,409, H0 is not rejected. This means the data are consistent with a normal distribution with µ = 100 and σ = 10.

125 LILLIEFORS TEST In most applications where we want to test for normality, the population mean and the population variance are not known. In order to perform the K-S test, however, we must assume that those parameters are known. The Lilliefors test is quite similar to the K-S test. The major difference between the two tests is that, with the Lilliefors test, the sample mean x̄ and the sample standard deviation s are used instead of µ and σ to calculate F(x).

126 EXAMPLE 126 A manufacturer of automobile seats has a production line that produces an average of 100 seats per day. Because of new government regulations, a new safety device has been installed, which the manufacturer believes will reduce average daily output. A random sample of 15 days’ output after the installation of the safety device is shown: 93, 103, 95, 101, 91, 105, 96, 94, 101, 88, 98, 94, 101, 92, 95 The daily production was assumed to be normally distributed. Use the Lilliefors test to examine that assumption, with α = 0,01

127 SOLUTION As in the K-S test, to compute S(x), order the observations as follows:
x     S(x)
88    1/15 = 0,067
91    2/15 = 0,133
92    3/15 = 0,200
93    4/15 = 0,267
94    6/15 = 0,400
95    8/15 = 0,533
96    9/15 = 0,600
98    10/15 = 0,667
101   13/15 = 0,867
103   14/15 = 0,933
105   15/15 = 1,000

128 From the data above, x̄ = 96,47 and s = 4,85 are obtained. Next, F(x) is computed as F(x) = P(Z ≤ (x − x̄)/s) for each of the values x = 88, 91, 92, …, 101, 103, 105.

129 Finally, the summary is as follows. From the table of critical values of the Lilliefors test, with α = 0,01 and n = 15, Dtab = 0,257; since D = 0,1509 < 0,257, accept H0.
x     F(x)     S(x)    |F(x) − S(x)|
88    0,0401   0,067   0,0269
91    0,1292   0,133   0,0038
92    0,1788   0,200   0,0212
93    0,2358   0,267   0,0312
94    0,3050   0,400   0,0950
95    0,3821   0,533   0,1509  = D
96    0,4602   0,600   0,1398
98    0,6255   0,667   0,0415
101   0,8238   0,867   0,0432
103   0,9115   0,933   0,0215
105   0,9608   1,000   0,0392
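A sketch of the Lilliefors calculation as done on the slide (maximum of |F(x) − S(x)| at the observed values, with x̄ and s estimated from the data), assuming NumPy and SciPy:

```python
import numpy as np
from scipy.stats import norm

data = np.array([93, 103, 95, 101, 91, 105, 96, 94, 101, 88, 98, 94, 101, 92, 95])

xbar, s = data.mean(), data.std(ddof=1)      # about 96.47 and 4.85

# S(x): proportion of observations <= x, evaluated at the distinct values
x = np.unique(data)
S = np.array([(data <= v).mean() for v in x])
F = norm.cdf((x - xbar) / s)                 # fitted normal cdf, as in the Lilliefors test

D = np.max(np.abs(F - S))                    # the slide's statistic
print(round(xbar, 2), round(s, 2), round(D, 4))
# D is about 0.15, below the tabled 1% critical value 0.257, so H0 is not rejected
```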

130 TEST BASED ON RUNS 130 Usually a sample that is taken from a population should be random. The runs test evaluates the null hypothesis H 0 : the order of the sample data is random The alternative hypothesis is simply the negation of H 0. There is no comparable parametric test to evaluate this null hypothesis. The order in which the data is collected must be retained so that the runs may be developed.

131 131 DEFINITIONS : 1. A run is defined as a sequence of the same symbols. Two symbols are defined, and each sequence must contain a symbol at least once. 2. A run of length j is defined as a sequence of j observations, all belonging to the same group, that is preceded or followed by observations belonging to a different group. For illustration, the ordered sequence by the sex of the employee is as follows : F F F M F F F M M F F M M M F F M F M M M M M F For the sex of the employee the ordered sequence exhibits runs of F’s and M’s.

132 The sequence begins with a run of length three, followed by a run of length one, followed by another run of length three, and so on. The total number of runs in this sequence is 11. Let R be the total number of runs observed in an ordered sequence of n1 + n2 observations, where n1 and n2 are the respective sample sizes. The possible values of R are 2, 3, 4, …, (n1 + n2). The only question to ask prior to performing the test is: is the sample size small or large? We will use the guideline that a small sample has n1 and n2 less than or equal to 15. The table gives the lower rL and upper rU values of the distribution f(r) with α/2 = 0,025 in each tail.

133 If n1 or n2 exceeds 15, the sample is considered large, in which case a normal approximation to f(r) is used to test H0 versus H1.

134 The mean and variance of R are determined to be E(R) = 2 n1 n2/(n1 + n2) + 1 and Var(R) = 2 n1 n2 (2 n1 n2 − n1 − n2) / [(n1 + n2)² (n1 + n2 − 1)], and the normal approximation uses Z = [R − E(R)] / √Var(R).
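A sketch of the runs test for the F/M sequence shown earlier, assuming NumPy and SciPy; the run count and the normal approximation are computed directly:

```python
import numpy as np
from scipy.stats import norm

seq = list("FFFMFFFMMFFMMMFFMFMMMMMF")    # ordered sequence from the earlier slide

r = 1 + sum(a != b for a, b in zip(seq, seq[1:]))   # number of runs = 11
n1, n2 = seq.count("F"), seq.count("M")             # 12 and 12

mu_r = 2 * n1 * n2 / (n1 + n2) + 1
var_r = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
z = (r - mu_r) / np.sqrt(var_r)

print(r, n1, n2, round(mu_r, 2), round(z, 2))
print(round(2 * (1 - norm.cdf(abs(z))), 3))   # two-sided p-value, well above 0.05
# The order of the sequence looks random at the 5% level
```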

135 THE KRUSKAL - WALLIS H TEST The Kruskal–Wallis H test is the nonparametric equivalent of the Analysis of Variance F test. It tests the null hypothesis that all k populations possess the same probability distribution against the alternative hypothesis that the distributions differ in location, that is, one or more of the distributions are shifted to the right or left of each other. The advantage of the Kruskal–Wallis H test over the F test is that we need make no assumptions about the nature of the sampled populations. A completely randomized design specifies that we select independent random samples of n1, n2, …, nk observations from the k populations.

136 To conduct the test, we first rank all n = n1 + n2 + n3 + … + nk observations and compute the rank sums R1, R2, …, Rk for the k samples. The ranks of tied observations are averaged in the same manner as for the WILCOXON rank sum test. Then, if H0 is true, and if the sample sizes n1, n2, …, nk each equal 5 or more, the test statistic defined by H = [12 / (n(n + 1))] Σ (Ri²/ni) − 3(n + 1) will have a sampling distribution that can be approximated by a chi-square distribution with (k − 1) degrees of freedom. Large values of H imply rejection of H0.

137 Therefore, the rejection region for the test is H > χ²α(k − 1), where χ²α is the value that locates α in the upper tail of the chi-square distribution. The test is summarized in the following:

138 KRUSKAL – WALLIS H TEST FOR COMPARING k POPULATION PROBABILITY DISTRIBUTIONS
H0: The k population probability distributions are identical.
H1: At least two of the k population probability distributions differ in location.
Test statistic: H = [12 / (n(n + 1))] Σ (Ri²/ni) − 3(n + 1), where
ni = number of measurements in sample i,
Ri = rank sum for sample i, where the rank of each measurement is computed according to its relative magnitude in the totality of data for the k samples.

139 n = total sample size = n1 + n2 + … + nk.
Rejection Region: H > χ²α with (k − 1) dof.
Assumptions: 1. The k samples are random and independent. 2. There are 5 or more measurements in each sample. 3. The observations can be ranked.
No assumptions have to be made about the shape of the population probability distributions.

140 Example Independent random samples of three different brands of magnetron tubes (the key components in microwave ovens) were subjected to stress testing, and the number of hours each operated without repair was recorded. Although these times do not represent typical life lengths, they do indicate how well the tubes can withstand extreme stress. The data are shown in the table below. Experience has shown that the distributions of life lengths for manufactured products are often non-normal, thus violating the assumptions required for the proper use of an ANOVA F test. Use the Kruskal–Wallis H test to determine whether evidence exists to conclude that the brands of magnetron tubes tend to differ in length of life under stress. Test using α = 0,05.

141 BRAND
A     B     C
36    49    71
48    33    31
5     60    140
67    2     59
53    55    42

142 SOLUTION Rank all the observations and sum the ranks for the three samples. H0: the population probability distributions of length of life under stress are identical for the three brands of magnetron tubes, versus H1: at least two of the population probability distributions differ in location.
A     rank    B     rank    C      rank
36    5       49    8       71     14
48    7       33    4       31     3
5     2       60    12      140    15
67    13      2     1       59     11
53    9       55    10      42     6
R1 = 36       R2 = 35       R3 = 49

143 Test statistic: H = [12/(15·16)] (36² + 35² + 49²)/5 − 3(16) ≈ 1,22. Since H = 1,22 < χ²0,05(2) = 5,99, there is no reason to reject H0.
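The same result follows from SciPy's built-in Kruskal–Wallis test; a sketch:

```python
from scipy.stats import kruskal, chi2

brand_a = [36, 48, 5, 67, 53]
brand_b = [49, 33, 60, 2, 55]
brand_c = [71, 31, 140, 59, 42]

h, p_value = kruskal(brand_a, brand_b, brand_c)
critical = chi2.ppf(0.95, df=2)                 # k - 1 = 2 degrees of freedom

print(round(h, 2), round(p_value, 3), round(critical, 2))
# H is about 1.22 < 5.99, so there is no evidence the three brands differ in location
```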

