MBA Statistics 51-651-00 COURSE #3 Is there a link? Quantitative data analysis.



Qualitative data analysis

Example: The human resources department of a large multinational enterprise carried out a study of its employees' job satisfaction. A total of 527 employees took part in this study.

Here are the results, presented in table format:

JOBS (jobs) by SATIS (satisfaction)

Frequency           | unsatisfied | satisfied |  Total
professional        |          17 |        62 |     79
white collar worker |          50 |       112 |    162
blue collar worker  |          99 |       187 |    286
Total               |         166 |       361 |    527

Question: Is there a link between the type of job and the level of satisfaction in this company? The "type of job" variable is a qualitative variable with three levels, i.e. three categories. In this example, the "satisfaction" variable is also qualitative, with two levels.

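It is easier to answer the question descriptively with percentages. Each cell shows, from top to bottom, the frequency, the % of the overall total, the % of the line (row) total and the % of the column total:

JOBS (jobs) by SATIS (satisfaction)

Frequency           |
%                   |
% line              |
% column            | unsatisfied | satisfied |  Total
professional        |          17 |        62 |     79
                    |        3.23 |     11.76 |  14.99
                    |       21.52 |     78.48 |
                    |       10.24 |     17.17 |
white collar worker |          50 |       112 |    162
                    |        9.49 |     21.25 |  30.74
                    |       30.86 |     69.14 |
                    |       30.12 |     31.02 |
blue collar worker  |          99 |       187 |    286
                    |       18.79 |     35.48 |  54.27
                    |       34.62 |     65.38 |
                    |       59.64 |     51.80 |
Total               |         166 |       361 |    527
                    |       31.50 |     68.50 | 100.00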
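The three kinds of percentage in the table can be recomputed from the raw counts. The following is a minimal sketch in plain Python; the function name `percentages` and the dictionary layout are illustrative choices, not something taken from the slides.

```python
# Observed counts from the satisfaction study.
# Rows: job type; columns: [unsatisfied, satisfied].
observed = {
    "professional": [17, 62],
    "white collar worker": [50, 112],
    "blue collar worker": [99, 187],
}


def percentages(table):
    """Return (cell %, line %, column %) dictionaries for a table of counts."""
    n = sum(sum(row) for row in table.values())
    col_totals = [sum(col) for col in zip(*table.values())]
    cell = {k: [100 * x / n for x in row] for k, row in table.items()}
    line = {k: [100 * x / sum(row) for x in row] for k, row in table.items()}
    column = {
        k: [100 * x / t for x, t in zip(row, col_totals)]
        for k, row in table.items()
    }
    return cell, line, column


cell_pct, line_pct, col_pct = percentages(observed)
for job in observed:
    # Share of unsatisfied / satisfied employees within each job type.
    print(job, [round(p, 2) for p in line_pct[job]])
```

Comparing the line percentages across job types is the descriptive check for a link: the more they differ from one row to the next, the stronger the apparent dependence.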

Frequency tables allow us:
- to summarize and present the information;
- to describe the presence or absence of a link between two qualitative variables (nominal and/or ordinal);
- to check, using a hypothesis test, whether there is a statistically significant link between two qualitative variables.

The two possible hypotheses we want to examine are:

$H_0$: There is no link between the two qualitative variables, i.e. the two variables are independent.
$H_1$: There is a link between the two qualitative variables, i.e. the two variables are dependent.

When two variables are independent, the distribution of percentages across the categories of one variable is the same within every category of the other.

To illustrate the concept of independence testing between two qualitative variables, let's take our previous example and suppose that we have the following numbers, chosen to make the calculations easier:

JOBS (jobs) by SATIS (satisfaction)

Frequency           | unsatisfied | satisfied |  Total
professional        |           0 |       100 |    100
white collar worker |         100 |       200 |    300
blue collar worker  |         300 |       300 |    600
Total               |         400 |       600 |   1000

The distribution of percentages is:

JOBS (jobs) by SATIS (satisfaction)

Frequency           |
%                   |
% line              |
% column            | unsatisfied | satisfied |  Total
professional        |           0 |       100 |    100
                    |        0.00 |     10.00 |  10.00
                    |        0.00 |    100.00 |
                    |        0.00 |     16.67 |
white collar worker |         100 |       200 |    300
                    |       10.00 |     20.00 |  30.00
                    |       33.33 |     66.67 |
                    |       25.00 |     33.33 |
blue collar worker  |         300 |       300 |    600
                    |       30.00 |     30.00 |  60.00
                    |       50.00 |     50.00 |
                    |       75.00 |     50.00 |
Total               |         400 |       600 |   1000
                    |       40.00 |     60.00 | 100.00

In the previous table, the two variables are dependent because:
- For each type of job, the distribution of employee satisfaction is different: 100% of the professionals are satisfied, compared to 67% of the white collar workers and only 50% of the blue collar workers (% line).
- Equivalently, for each category of satisfaction, the distribution of job types is different: among the unsatisfied, 0% are professionals, 25% are white collar workers and 75% are blue collar workers, compared to 17%, 33% and 50% respectively among the satisfied (% column).

If the two variables were completely independent, the cells of the table would contain the following frequencies (note: the row and column totals are unchanged):

JOBS (jobs) by SATIS (satisfaction)

Frequency           | unsatisfied | satisfied |  Total
professional        |          40 |        60 |    100
white collar worker |         120 |       180 |    300
blue collar worker  |         240 |       360 |    600
Total               |         400 |       600 |   1000

The distribution of percentages is:

JOBS (jobs) by SATIS (satisfaction)

Frequency           |
%                   |
% line              |
% column            | unsatisfied | satisfied |  Total
professional        |          40 |        60 |    100
                    |        4.00 |      6.00 |  10.00
                    |       40.00 |     60.00 |
                    |       10.00 |     10.00 |
white collar worker |         120 |       180 |    300
                    |       12.00 |     18.00 |  30.00
                    |       40.00 |     60.00 |
                    |       30.00 |     30.00 |
blue collar worker  |         240 |       360 |    600
                    |       24.00 |     36.00 |  60.00
                    |       40.00 |     60.00 |
                    |       60.00 |     60.00 |
Total               |         400 |       600 |   1000
                    |       40.00 |     60.00 | 100.00

In the previous table, the two variables are independent because:
- For each type of job, the distribution of employee satisfaction is the same, i.e. 60% of the employees are satisfied and 40% are unsatisfied (% line).
- Equivalently, for each category of satisfaction, the distribution of job types is the same, i.e. 10% are professionals, 30% are white collar workers and 60% are blue collar workers (% column).

The cells (i, j) of the previous table contain the "theoretical" frequencies, i.e. the frequencies we would observe if the two variables were perfectly independent. If the hypothesis of independence is true, the theoretical frequency for each cell of the cross table is:

$f_{theo,ij} = \frac{(\text{total of row } i) \times (\text{total of column } j)}{\text{total}}$
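As an illustration of this formula, here is a short Python sketch that derives the theoretical frequencies from a table of observed counts. The function name `theoretical_frequencies` is a hypothetical helper, and the input is the simplified "dependent" table used earlier in the slides.

```python
def theoretical_frequencies(observed):
    """Frequencies expected under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: professional, white collar, blue collar.
# Columns: unsatisfied, satisfied.
dependent = [[0, 100], [100, 200], [300, 300]]

for row in theoretical_frequencies(dependent):
    print(row)
```

Because the row and column totals are unchanged, the result reproduces the "completely independent" table (40/60, 120/180, 240/360).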

Testing the independence of two qualitative variables amounts to testing the difference between the observed frequencies and the theoretical frequencies. If the two variables are independent, the observed frequencies should be close to the theoretical frequencies. The test statistic is given by:

$\chi^2_{obs} = \sum_{i,j} \frac{(f_{obs,ij} - f_{theo,ij})^2}{f_{theo,ij}}$
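The statistic can be computed directly from the observed counts. The sketch below applies the formula to the 527-employee satisfaction table; `chi_square` is an illustrative helper name, not part of the course material.

```python
def chi_square(observed):
    """Sum of (f_obs - f_theo)^2 / f_theo over all cells of the table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, f_obs in enumerate(row):
            f_theo = row_totals[i] * col_totals[j] / n
            stat += (f_obs - f_theo) ** 2 / f_theo
    return stat


# Rows: professional, white collar, blue collar.
# Columns: unsatisfied, satisfied.
satisfaction = [[17, 62], [50, 112], [99, 187]]
print(round(chi_square(satisfaction), 3))
```

For this table the statistic comes out at roughly 4.96, with the professional row contributing most of it.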

We reject the hypothesis of independence if the value of the $\chi^2_{obs}$ statistic is large. The p-value (observed significance level) is computed from the chi-square probability distribution with the following number of degrees of freedom:

(#rows - 1) x (#columns - 1)

Note: This test is only valid for large samples, i.e. when all the theoretical frequencies are $\geq 5$ (or nearly so).

We can show that $0 \leq \chi^2_{obs} \leq n(m-1)$, where $m = \min(\#\text{rows}, \#\text{columns})$.
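To make the p-value calculation concrete, here is a sketch that uses only the Python standard library. It relies on the closed-form series for the chi-square upper tail with an integer number of degrees of freedom. The function names are hypothetical, and in practice a statistics package would normally be used instead.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    return (n_rows - 1) * (n_cols - 1)


def upper_bound(n, n_rows, n_cols):
    """Largest possible value of the statistic: n * (m - 1), m = min(rows, cols)."""
    return n * (min(n_rows, n_cols) - 1)


def chi2_sf(x, df):
    """P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= half / r
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus
    # sqrt(2/pi) * exp(-x/2) * sum_{r=1}^{(df-1)/2} x^(r - 1/2) / (1*3*...*(2r-1))
    term, total = 0.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        term = math.sqrt(x) if r == 1 else term * x / (2 * r - 1)
        total += term
    tail = math.erfc(math.sqrt(half))
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


df = degrees_of_freedom(3, 2)          # 3 job types x 2 satisfaction levels
print(df, round(chi2_sf(5.991, df), 3))
```

With 2 degrees of freedom, the p-value drops below 5% once the statistic exceeds about 5.99, which is the usual tabulated critical value.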

The value of the $\chi^2_{obs}$ statistic is 0 when the two variables are perfectly independent. It reaches its upper bound when there is a functional dependence, i.e. when the value of one variable completely determines the value of the other.

Example: independence

JOBS (jobs) by SATIS (satisfaction)

Frequency           |
% line              | unsatisfied | satisfied |  Total
professional        |          40 |        60 |    100
                    |       40.00 |     60.00 |
white collar worker |         120 |       180 |    300
                    |       40.00 |     60.00 |
blue collar worker  |         240 |       360 |    600
                    |       40.00 |     60.00 |
Total               |         400 |       600 |   1000

Statistic    DF      Value    Prob
Chi-square    2      0.000    1.0000

n = 1000

Example: dependence (functional link)

JOBS (jobs) by SATIS (satisfaction)

Frequency           |
% line              | unsatisfied | satisfied |  Total
professional        |           0 |       100 |    100
                    |        0.00 |    100.00 |
white collar worker |           0 |       300 |    300
                    |        0.00 |    100.00 |
blue collar worker  |         600 |         0 |    600
                    |      100.00 |      0.00 |
Total               |         600 |       400 |   1000

Statistic    DF      Value    Prob
Chi-square    2   1000.000    <.0001

n = 1000
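The two extreme cases can be checked numerically. The sketch below (with an illustrative `chi_square` helper) computes the statistic for the independence table and for the functional-link table, and compares the latter with the bound n(m - 1).

```python
def chi_square(observed):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (f_obs - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(observed)
        for j, f_obs in enumerate(row)
    )


independent = [[40, 60], [120, 180], [240, 360]]
functional = [[0, 100], [0, 300], [600, 0]]

n = sum(map(sum, functional))
m = min(len(functional), len(functional[0]))

print(chi_square(independent))              # lower extreme
print(chi_square(functional), n * (m - 1))  # statistic vs. upper bound
```

In the functional-link table, knowing the job type tells you the satisfaction level with certainty, which is why the statistic attains n(m - 1) = 1000.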

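Example:

JOBS (jobs) by SATIS (satisfaction)

Obs. frequency      |
Theo. frequency     |
%                   |
% line              |
% column            | unsatisfied | satisfied |  Total
professional        |          17 |        62 |     79
                    |       24.88 |     54.12 |
                    |        3.23 |     11.76 |  14.99
                    |       21.52 |     78.48 |
                    |       10.24 |     17.17 |
white collar worker |          50 |       112 |    162
                    |       51.03 |    110.97 |
                    |        9.49 |     21.25 |  30.74
                    |       30.86 |     69.14 |
                    |       30.12 |     31.02 |
blue collar worker  |          99 |       187 |    286
                    |       90.09 |    195.91 |
                    |       18.79 |     35.48 |  54.27
                    |       34.62 |     65.38 |
                    |       59.64 |     51.80 |
Total               |         166 |       361 |    527
                    |       31.50 |     68.50 | 100.00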

Results of the statistical test using CT.xls:

Thus, we do not reject the hypothesis of independence at the α = 5% level, because the p-value is > 5%.

What happens to the p-value if the size of the sample increases but the distributions stay the same?

JOBS (jobs) by SATIS (satisfaction)

Obs. frequency      |
Theo. frequency     |
%                   |
% line              |
% column            | unsatisfied | satisfied |  Total
professional        |          34 |       124 |    158
                    |       49.77 |    108.23 |
                    |        3.23 |     11.76 |  14.99
                    |       21.52 |     78.48 |
                    |       10.24 |     17.17 |
white collar worker |         100 |       224 |    324
                    |      102.06 |    221.94 |
                    |        9.49 |     21.25 |  30.74
                    |       30.86 |     69.14 |
                    |       30.12 |     31.02 |
blue collar worker  |         198 |       374 |    572
                    |      180.17 |    391.83 |
                    |       18.79 |     35.48 |  54.27
                    |       34.62 |     65.38 |
                    |       59.64 |     51.80 |
Total               |         332 |       722 |   1054
                    |       31.50 |     68.50 | 100.00

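Results of the statistical test:

Thus, we reject the hypothesis of independence at the α = 5% level, because the p-value is < 5%! With identical percentage distributions, doubling every count doubles the chi-square statistic while the degrees of freedom stay the same, so the p-value shrinks.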
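The effect of sample size can be reproduced with a few lines of Python. This is only a sketch: `chi_square` is an illustrative helper, and the p-value uses the fact that, for 2 degrees of freedom, the chi-square upper tail is exp(-x/2). The CT.xls output itself is not shown in the transcript, so the numbers printed here are a recomputation, not a copy of it.

```python
import math


def chi_square(observed):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (f_obs - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(observed)
        for j, f_obs in enumerate(row)
    )


original = [[17, 62], [50, 112], [99, 187]]            # n = 527
doubled = [[2 * f for f in row] for row in original]   # n = 1054

for label, table in (("n = 527", original), ("n = 1054", doubled)):
    stat = chi_square(table)
    p_value = math.exp(-stat / 2)  # upper tail for (3 - 1) x (2 - 1) = 2 df
    print(label, round(stat, 3), round(p_value, 4))
```

The first p-value lands above 5% and the second below it, even though the percentage distributions are identical: with a larger sample, the same departure from independence becomes statistically detectable.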

2x2 tables: test of the difference between two proportions

In two neighbouring municipalities, a survey was carried out to obtain taxpayers' opinions on the location of a garbage dump site. If the proportion of taxpayers in favour is significantly higher in one municipality than in the other, the site will probably be developed in that municipality. In municipality 1, a sample of 130 individuals answered the survey and 84 were in favour (64.6%), while in municipality 2, 124 individuals answered and 62 were in favour (50%).

Equivalent formulations of the problem:
1. $H_0: p_1 = p_2$ vs $H_1: p_1 \neq p_2$ (two-tailed test)
2. Is there a link between the municipality variable and the opinion on the location of a garbage dump site? $H_0$: independence between municipality and opinion vs $H_1$: dependence between municipality and opinion

Using CT.xls, one obtains:

One can reject $H_0$: the two proportions are significantly different.
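The equivalence of the two formulations can be illustrated numerically. The sketch below assumes a chi-square test without continuity correction (the transcript does not say which variant CT.xls uses) and compares it with the pooled two-proportion z statistic; all function and variable names are illustrative.

```python
import math


def chi_square(observed):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (f_obs - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i, row in enumerate(observed)
        for j, f_obs in enumerate(row)
    )


# Rows: municipality 1, municipality 2; columns: in favour, not in favour.
n1, n2 = 130, 124
favour1, favour2 = 84, 62
survey = [[favour1, n1 - favour1], [favour2, n2 - favour2]]

chi2 = chi_square(survey)
p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail for 1 df

# Formulation 1: two-tailed z test on p1 - p2 with a pooled proportion.
p1, p2 = favour1 / n1, favour2 / n2
pooled = (favour1 + favour2) / (n1 + n2)
z = (p1 - p2) / math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

print(round(chi2, 3), round(z ** 2, 3), round(p_value, 4))
```

The squared z statistic equals the chi-square statistic, so both formulations give the same two-tailed p-value, here a little under 2%.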