
Chi-Square Test, Chapter 6

Introduction
Two statistical techniques are presented for analyzing nominal data:
–A goodness-of-fit test for the multinomial experiment.
–A contingency table test of independence.
Both tests use the $\chi^2$ distribution as the sampling distribution of the test statistic.

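Chi-squared goodness-of-fit test
We test whether there is sufficient evidence to reject a pre-specified set of values for the cell probabilities $p_i$. The hypotheses are:
$H_0$: every $p_i$ equals its pre-specified value.
$H_1$: at least one $p_i$ differs from its pre-specified value.
The test is based on comparing the actual (observed) frequency with the expected frequency of occurrences in every cell.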

The multinomial goodness-of-fit test – Example
–Two competing companies, A and B, have enjoyed dominant positions in the market. The companies conducted aggressive advertising campaigns.
–Market shares before the campaigns were: Company A = 45%, Company B = 40%, other competitors = 15%.

The multinomial goodness-of-fit test – Example 16.1 (continued)
–To study the effect of the campaigns on the market shares, a survey was conducted.
–200 customers were asked to indicate their preference regarding the advertised product.
–Survey results: 102 customers preferred company A’s product, 82 preferred company B’s product, and 16 preferred a competitor’s product.

The multinomial goodness-of-fit test – Example (continued)
Can we conclude at the 5% significance level that the market shares were affected by the advertising campaigns?

The multinomial goodness-of-fit test – Example
Solution
–The population investigated is the set of brand preferences.
–The data are nominal (A, B, or other).
–This is a multinomial experiment (three categories).
–The question of interest: are $p_1$, $p_2$, and $p_3$ after the campaigns different from their values before the campaigns?

The multinomial goodness-of-fit test – Example
The hypotheses are:
$H_0: p_1 = .45,\ p_2 = .40,\ p_3 = .15$
$H_1$: At least one $p_i$ changed.
The expected frequency for each category (cell) if the null hypothesis is true, next to the actual frequency the sample returned:
Company A: expected $e_1 = 200(.45) = 90$, observed 102
Company B: expected $e_2 = 200(.40) = 80$, observed 82
Others: expected $e_3 = 200(.15) = 30$, observed 16

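The multinomial goodness-of-fit test – Example
The test statistic is
$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i} = \frac{(102-90)^2}{90} + \frac{(82-80)^2}{80} + \frac{(16-30)^2}{30} = 1.60 + 0.05 + 6.53 = 8.18$,
where $f_i$ and $e_i$ are the observed and expected frequencies of cell $i$ and $k = 3$ is the number of cells.
The rejection region is $\chi^2 > \chi^2_{\alpha,k-1} = \chi^2_{.05,2} = 5.99$.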
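The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch using only built-ins; the variable names (`observed`, `p_null`) are illustrative and do not come from the original slides.

```python
# Chi-squared goodness-of-fit statistic for Example 16.1 (market shares).
observed = [102, 82, 16]       # company A, company B, other competitors
p_null = [0.45, 0.40, 0.15]    # market shares under H0
n = sum(observed)              # 200 surveyed customers

# Expected frequency of each cell if H0 is true: e_i = n * p_i
expected = [n * p for p in p_null]

# Test statistic: sum over the cells of (f_i - e_i)^2 / e_i
chi2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))

print([round(e, 2) for e in expected])  # [90.0, 80.0, 30.0]
print(round(chi2, 2))                   # 8.18
```

The third cell (others: 16 observed against 30 expected) contributes about 6.53 of the 8.18, so it drives the result.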


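The multinomial goodness-of-fit test – Example (continued)
Conclusion: Since 8.18 > 5.99, there is sufficient evidence at the 5% significance level to reject the null hypothesis. At least one of the probabilities $p_i$ is different. Because the three market shares must sum to 100%, a change in one share forces a change in at least one other; thus, at least two market shares have changed.
[Figure: $\chi^2$ distribution with 2 degrees of freedom, showing the rejection region beyond the critical value 5.99, the test statistic 8.18, α, and the p-value.]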
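For this particular case the critical value and the p-value can be checked without a table: the $\chi^2$ distribution with 2 degrees of freedom has the simple upper tail $P(X > x) = e^{-x/2}$. The sketch below relies on that closed form, which holds only for 2 degrees of freedom; for other cases a $\chi^2$ table or a statistics library is needed.

```python
import math

chi2 = 8.18    # test statistic from the example
alpha = 0.05   # significance level

# With 2 degrees of freedom, P(X > x) = exp(-x / 2).
# The critical value solves exp(-x / 2) = alpha, i.e. x = -2 ln(alpha).
critical = -2 * math.log(alpha)
p_value = math.exp(-chi2 / 2)

print(round(critical, 2))   # 5.99
print(round(p_value, 4))    # 0.0167
print(chi2 > critical)      # True, so H0 is rejected
```

A p-value of roughly 0.017 is below α = 0.05, which matches the conclusion reached with the rejection region.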

Required conditions – the rule of five
The test statistic is only approximately $\chi^2$ distributed. For the approximation to apply, the expected frequency must be at least 5 in every cell ($np_i \ge 5$). If the expected frequency in a cell is less than 5, combine that cell with other cells.
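A hypothetical helper for checking the rule of five before running the goodness-of-fit test. The function name and the 20-customer survey are illustrative assumptions, not part of the slides.

```python
def satisfies_rule_of_five(n, probabilities, minimum=5):
    """Return True if every expected cell frequency n * p_i is at least `minimum`."""
    return all(n * p >= minimum for p in probabilities)

# Example 16.1: n = 200 and the H0 shares .45, .40, .15 give 90, 80 and 30.
print(satisfies_rule_of_five(200, [0.45, 0.40, 0.15]))  # True

# A hypothetical survey of only 20 customers: 20 * .15 = 3 < 5.
print(satisfies_rule_of_five(20, [0.45, 0.40, 0.15]))   # False
```

In the second case the "others" cell would have to be combined with another cell before the test could be applied.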

Chi-squared test of a contingency table
This test is used to determine whether:
–two nominal variables are related, or
–there are differences between two or more populations of a nominal variable.
To accomplish these objectives, we need to classify the data according to two different criteria.

Contingency table $\chi^2$ test – Example
–In an effort to better predict the demand for courses offered by a certain MBA program, it was hypothesized that students’ academic background affects their choice of MBA major and thus their course selection.
–A random sample of last year’s MBA students was selected. The following contingency table summarizes the relevant data.

Contingency table $\chi^2$ test – Example
There are two ways to address the problem:
–If each undergraduate degree is considered a population, do these populations differ?
–If each classification is considered a nominal variable, are these two variables dependent?
[Table: the observed values.]

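Contingency table $\chi^2$ test – Example
Solution
–The hypotheses are:
$H_0$: The two variables are independent.
$H_1$: The two variables are dependent.
–The test statistic is $\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$, where $k$ is the number of cells in the contingency table and $f_i$ and $e_i$ are the observed and expected frequencies of cell $i$.
–The rejection region is $\chi^2 > \chi^2_{\alpha,(r-1)(c-1)}$, where $r$ and $c$ are the numbers of rows and columns.
Since $e_i = np_i$ but $p_i$ is unknown, we need to estimate the unknown probabilities from the data, assuming $H_0$ is true.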

Estimating the expected frequencies
Under the null hypothesis the two variables are independent, so, for example, P(Accounting and BA) = P(Accounting) · P(BA).
Row totals (undergraduate degree) and estimated probabilities: BA 60 (60/152), BENG 31 (31/152), BBA 39 (39/152), Other 22 (22/152).
Column totals (MBA major) and estimated probabilities: Accounting 61 (61/152), Finance 44 (44/152), Marketing 47 (47/152). Sample size: 152.
The number of students expected to fall in the cell "Accounting-BA" is
$e_{\text{Acct-BA}} = n\,p_{\text{Acct-BA}} = 152\left(\frac{61}{152}\right)\left(\frac{60}{152}\right) = \frac{61 \cdot 60}{152} = 24.08$.
The number of students expected to fall in the cell "Finance-BBA" is
$e_{\text{Finance-BBA}} = n\,p_{\text{Finance-BBA}} = 152\left(\frac{44}{152}\right)\left(\frac{39}{152}\right) = \frac{44 \cdot 39}{152} = 11.29$.

The expected frequencies for a contingency table
The expected frequency of the cell in row $i$ and column $j$ of the contingency table is calculated by
$e_{ij} = \frac{(\text{Row } i \text{ total})(\text{Column } j \text{ total})}{\text{Sample size}}$
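The formula can be applied to every cell at once. A minimal sketch in plain Python, with the observed table entered as nested lists (rows are undergraduate degrees, columns are MBA majors); the layout is an illustrative choice, not part of the original material.

```python
# Observed frequencies from the MBA example.
# Rows: BA, BENG, BBA, Other; columns: Accounting, Finance, Marketing.
observed = [
    [31, 13, 16],
    [8, 16, 7],
    [12, 10, 17],
    [10, 5, 7],
]

row_totals = [sum(row) for row in observed]        # [60, 31, 39, 22]
col_totals = [sum(col) for col in zip(*observed)]  # [61, 44, 47]
n = sum(row_totals)                                # 152

# e_ij = (row i total)(column j total) / sample size
expected = [[r * c / n for c in col_totals] for r in row_totals]

print(round(expected[0][0], 2))  # Accounting-BA: 24.08
print(round(expected[2][1], 2))  # Finance-BBA: 11.29
```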

Calculation of the $\chi^2$ statistic (Solution – continued)
Observed frequencies, with expected frequencies in parentheses:
Undergraduate degree | Accounting | Finance | Marketing | Total
BA | 31 (24.08) | 13 (17.37) | 16 (18.55) | 60
BENG | 8 (12.44) | 16 (8.97) | 7 (9.59) | 31
BBA | 12 (15.65) | 10 (11.29) | 17 (12.06) | 39
Other | 10 (8.83) | 5 (6.37) | 7 (6.80) | 22
Total | 61 | 44 | 47 | 152
$\chi^2 = \frac{(31-24.08)^2}{24.08} + \cdots + \frac{(5-6.37)^2}{6.37} + \cdots + \frac{(7-6.80)^2}{6.80} = 14.70$
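Summing $(f_{ij} - e_{ij})^2 / e_{ij}$ over all twelve cells reproduces the statistic. This sketch uses a hypothetical function name, `chi_squared_statistic`, and computes each expected frequency from the margins without rounding.

```python
observed = [
    [31, 13, 16],  # BA
    [8, 16, 7],    # BENG
    [12, 10, 17],  # BBA
    [10, 5, 7],    # Other
]  # columns: Accounting, Finance, Marketing

def chi_squared_statistic(table):
    """Chi-squared statistic for a contingency table, with e_ij taken from the margins."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency of cell (i, j)
            stat += (f - e) ** 2 / e
    return stat

print(round(chi_squared_statistic(observed), 2))  # 14.7
```

The largest single contribution (about 5.50) comes from the BENG-Finance cell, where 16 students were observed against roughly 8.97 expected.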

Contingency table $\chi^2$ test – Example (Solution – continued)
The critical value in our example, with $r = 4$ rows and $c = 3$ columns, is $\chi^2_{\alpha,(r-1)(c-1)} = \chi^2_{.05,(4-1)(3-1)} = \chi^2_{.05,6} = 12.5916$.
Conclusion: Since $\chi^2 = 14.70 > 12.5916$, there is sufficient evidence at the 5% significance level to infer that MBA students’ undergraduate degree and their course selection are dependent.
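The critical value 12.5916 and the p-value can be checked with the standard library alone because, for an even number of degrees of freedom $\nu$, the $\chi^2$ upper tail has the closed form $P(X > x) = e^{-x/2}\sum_{j=0}^{\nu/2-1} (x/2)^j / j!$. The sketch below combines that identity with bisection; the helper names are hypothetical, and for odd degrees of freedom a $\chi^2$ table or a statistics library is needed instead.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-squared variable with an even number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!.
    """
    if df % 2 != 0:
        raise ValueError("this closed form only holds for even df")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

def critical_value(alpha, df, lo=0.0, hi=100.0, iterations=100):
    """Find x with P(X > x) = alpha by bisection (the upper tail decreases in x)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if chi2_upper_tail_even_df(mid, df) > alpha:
            lo = mid  # tail area still too large: the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

rows, cols = 4, 3
df = (rows - 1) * (cols - 1)  # 6
crit = critical_value(0.05, df)
p_value = chi2_upper_tail_even_df(14.70, df)

print(df)                 # 6
print(round(crit, 4))     # 12.5916
print(round(p_value, 4))  # 0.0227
```

A p-value of about 0.023 is below 0.05, consistent with rejecting independence.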

Using the computer
Define a code to specify each nominal value:
Undergraduate degree: 1 = BA, 2 = BENG, 3 = BBA, 4 = Others
MBA major: 1 = Accounting, 2 = Finance, 3 = Marketing
Input the data in columns, one column for each variable (undergraduate degree and MBA major). Then select the Chi-squared / raw data option from Data Analysis Plus under Tools. See Xm16-02.
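The data file Xm16-02 is not reproduced here, so this sketch rebuilds an equivalent coded raw-data list (one (degree code, major code) pair per student) from the observed table and then cross-tabulates it with `collections.Counter`. It only illustrates what the coding scheme means; it is an assumption-based illustration, not how Data Analysis Plus works internally.

```python
from collections import Counter

# Codes as defined above.
DEGREES = {1: "BA", 2: "BENG", 3: "BBA", 4: "Other"}
MAJORS = {1: "Accounting", 2: "Finance", 3: "Marketing"}

# Observed counts from the example, used only to rebuild a raw-data list
# of (degree code, major code) pairs, one pair per student.
counts = {
    (1, 1): 31, (1, 2): 13, (1, 3): 16,
    (2, 1): 8,  (2, 2): 16, (2, 3): 7,
    (3, 1): 12, (3, 2): 10, (3, 3): 17,
    (4, 1): 10, (4, 2): 5,  (4, 3): 7,
}
raw = [pair for pair, k in counts.items() for _ in range(k)]

# Cross-tabulate the coded raw data back into a contingency table.
tally = Counter(raw)
table = [[tally[(d, m)] for m in sorted(MAJORS)] for d in sorted(DEGREES)]

print(len(raw))  # 152
for d, row in zip(sorted(DEGREES), table):
    print(DEGREES[d], row)  # first line: BA [31, 13, 16]
```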

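Required condition – rule of five
–The $\chi^2$ distribution provides an adequate approximation to the sampling distribution only if $e_{ij} \ge 5$ for all cells.
–When $e_{ij} < 5$, rows or columns must be combined so that the condition is met.
Example: one cell in column 3 has an expected frequency of 3.6, so we combine columns 2 and 3 (observed frequencies with expected frequencies in parentheses):
Column 2: 14 (12.8), 16 (16), 8 (9.2)
Column 3: 4 (5.1), 7 (6.3), 4 (3.6)
Combined: 18 (17.9), 23 (22.3), 12 (12.8), i.e. 14 + 4 (12.8 + 5.1), 16 + 7 (16 + 6.3), 8 + 4 (9.2 + 3.6)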
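Combining categories means adding both the observed and the expected frequencies cell by cell. A minimal sketch with a hypothetical helper `combine_columns`, assuming the example's figures are (observed, expected) pairs for columns 2 and 3:

```python
def combine_columns(col_a, col_b):
    """Merge two categories cell by cell: observed and expected counts both add."""
    return [(fa + fb, ea + eb) for (fa, ea), (fb, eb) in zip(col_a, col_b)]

# (observed, expected) pairs for columns 2 and 3 of the example.
column_2 = [(14, 12.8), (16, 16.0), (8, 9.2)]
column_3 = [(4, 5.1), (7, 6.3), (4, 3.6)]  # 3.6 < 5 violates the rule of five

merged = combine_columns(column_2, column_3)
for f, e in merged:
    print(f, round(e, 1))  # 18 17.9, then 23 22.3, then 12 12.8
```

After merging, every expected frequency is at least 5. The degrees of freedom must then be recomputed from the reduced number of columns.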
