Chi-Square and Analysis of Variance (ANOVA) Lecture 9.

The Chi-Square Distribution and Test for Independence
Hypothesis testing between two or more categorical variables

Chi-square Test of Independence
Tests the association between two nominal (categorical) variables.
Null hypothesis: the two variables are independent.
It is really just a comparison between expected frequencies and observed frequencies among the cells in a crosstabulation table.

Example Crosstab: gender x binary question (expected frequencies in parentheses)
          Yes          No           Total
Males     46 (40.97)   71 (76.03)   117
Females   37 (42.03)   83 (77.97)   120
Total     83           154          237
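
The expected frequencies in the crosstab can be reproduced from the row and column totals. The following is a minimal sketch in plain Python using the observed counts from the example; the variable names are illustrative, and the closed-form p-value used here is valid only for df = 1.

```python
import math

# Observed counts from the gender x yes/no crosstab in the example.
observed = {
    "Males": {"Yes": 46, "No": 71},
    "Females": {"Yes": 37, "No": 83},
}

row_totals = {row: sum(cells.values()) for row, cells in observed.items()}
col_totals = {"Yes": 0, "No": 0}
for cells in observed.values():
    for col, count in cells.items():
        col_totals[col] += count
n = sum(row_totals.values())

# Expected count under independence: (row total * column total) / N.
expected = {
    row: {col: row_totals[row] * col_totals[col] / n for col in col_totals}
    for row in observed
}

# Chi-square statistic: sum over all cells of (O - E)^2 / E.
chi_square = sum(
    (observed[r][c] - expected[r][c]) ** 2 / expected[r][c]
    for r in observed
    for c in col_totals
)

# Upper-tail p-value; this erfc shortcut holds only when df = 1 (a 2x2 table).
p_value = math.erfc(math.sqrt(chi_square / 2))

print("expected Males/Yes:", round(expected["Males"]["Yes"], 2))
print("chi-square:", round(chi_square, 3), "p-value:", round(p_value, 3))
```

For this table the statistic comes out near 1.87, which is below the usual 0.05 critical value of 3.841 for df = 1, so the null hypothesis of independence would not be rejected.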

Degrees of freedom
Chi-square degrees of freedom: df = (r-1)(c-1), where r = # of rows and c = # of columns.
Thus, in any 2x2 contingency table, the degrees of freedom = 1.
As the degrees of freedom increase, the distribution shifts to the right and the critical values of chi-square become larger.
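
As a small illustration of the df rule, the sketch below computes df for a few table shapes and looks up the matching 0.05 critical value. The function name and the lookup dictionary are illustrative; the critical values are the usual ones printed in a standard chi-square table.

```python
def chi_square_df(rows: int, cols: int) -> int:
    """Degrees of freedom for an r x c contingency table: (r - 1)(c - 1)."""
    if rows < 2 or cols < 2:
        raise ValueError("a contingency table needs at least 2 rows and 2 columns")
    return (rows - 1) * (cols - 1)


# Critical values at alpha = 0.05, copied from a standard chi-square table.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}

for rows, cols in [(2, 2), (2, 3), (3, 3), (3, 4)]:
    df = chi_square_df(rows, cols)
    print(f"{rows}x{cols} table: df = {df}, critical value = {CRITICAL_05[df]}")
```

The loop shows the critical value growing as the table (and therefore df) gets larger.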

Chi-Square Distribution
The chi-square distribution results when independent variables with standard normal distributions are squared and summed.
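
This definition can be checked with a quick simulation. The sketch below (illustrative names, arbitrary seed and sample size) builds chi-square draws by squaring and summing standard normal values; the sample mean should land close to df, which is the known mean of a chi-square distribution and is consistent with the distribution shifting right as df grows.

```python
import random


def chi_square_draw(df: int, rng: random.Random) -> float:
    """One chi-square draw: the sum of df squared standard normal values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))


rng = random.Random(9)  # fixed seed so the run is repeatable
means = {}
for df in (1, 3, 10):
    draws = [chi_square_draw(df, rng) for _ in range(20000)]
    means[df] = sum(draws) / len(draws)
    print(f"df = {df}: sample mean = {means[df]:.2f}")
```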

Requirements for Chi-Square test
- Must be a random sample from the population
- Data must be in raw frequencies
- Observations must be independent of one another
- Categories for each variable must be mutually exclusive and exhaustive

Using the Chi-Square Test
Often used with contingency tables (i.e., crosstabulations), e.g., gender x race.
Basically, the chi-square test of independence tests whether the columns are contingent on the rows in the table.
In this case, the null hypothesis is that there is no relationship between row and column frequencies.

Practical Example: Expected frequencies versus observed frequencies
General Social Survey Example

ANOVA and the F-distribution
Hypothesis testing between a categorical variable with 3+ categories and a metric variable

Analysis of Variance
In its simplest form, it is used to compare means for three or more categories.
Example: Life Happiness scale and Marital Status (married, never married, divorced).
Relies on the F-distribution.
Just like the t-distribution and chi-square distribution, there is a different sampling distribution for each possible value of df.

What is ANOVA?
If we have a categorical variable with 3+ categories and a metric/scale variable, we could just run 3 t-tests.
The problem is that the 3 tests would not be independent of each other (the same groups are reused across comparisons, so once two of the mean differences are known, the third adds no new information).
A better approach: compare the variability between groups (treatment variance + error) to the variability within the groups (error).

The F-ratio
F = MS_bg / MS_wg
MS = mean square; bg = between groups; wg = within groups.
Numerator is the “effect” and denominator is the “error”.
Numerator df = # of categories - 1 (k-1).

Between-Group Sum of Squares (Numerator)
Between-group sum of squares = total variability - residual variability.
Total variability is quantified as the sum of the squares of the differences between each value and the grand mean. Also called the total sum-of-squares.
Variability within groups is quantified as the sum of squares of the differences between each value and its group mean. Also called the residual sum-of-squares.
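
The decomposition can be worked through by hand on a tiny data set. The sketch below uses hypothetical happiness scores (invented for illustration, not taken from the lecture or the General Social Survey) and assumes the standard within-groups df of N - k, which the slides do not state explicitly.

```python
# Hypothetical happiness scores by marital status (illustrative only).
groups = {
    "married": [8, 7, 9, 8],
    "never married": [6, 7, 5, 6],
    "divorced": [5, 4, 6, 5],
}

all_values = [x for scores in groups.values() for x in scores]
n = len(all_values)   # total number of observations (N)
k = len(groups)       # number of categories
grand_mean = sum(all_values) / n

# Total sum of squares: each value versus the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

# Within-group (residual) sum of squares: each value versus its own group mean.
ss_within = sum(
    (x - sum(scores) / len(scores)) ** 2
    for scores in groups.values()
    for x in scores
)

# Between-group sum of squares: total minus residual.
ss_between = ss_total - ss_within

# Mean squares divide each sum of squares by its degrees of freedom.
ms_between = ss_between / (k - 1)   # numerator df = k - 1
ms_within = ss_within / (n - k)     # denominator df = N - k (standard, assumed here)
f_ratio = ms_between / ms_within

print("SS total:", round(ss_total, 3))
print("SS within:", round(ss_within, 3))
print("SS between:", round(ss_between, 3))
print("F:", round(f_ratio, 3))
```

With these made-up scores the group means are far apart relative to the spread inside each group, so the F-ratio is well above 1.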

Null Hypothesis in ANOVA
If there is no difference between the means, then the between-group mean square should be approximately equal to the within-group mean square, so the F-ratio should be close to 1.

F-distribution
The F-test is always a one-tailed test. Why?

Logic of the ANOVA
Conceptual Intro to ANOVA

Bringing it all together: Choosing the appropriate bivariate statistic

Reminder About Causality
Remember from earlier lectures: bivariate statistics do not test causal relationships; they only show that there is a relationship.
Even if you plan to use more sophisticated causal tests, you should always run simple bivariate statistics on your key variables to understand their relationships.

Choosing the Appropriate Statistical Test
General rules for choosing a bivariate test:
- Two categorical variables: Chi-Square (crosstabulations)
- Two metric variables: Correlation
- One 3+ categorical variable, one metric variable: ANOVA
- One binary categorical variable, one metric variable: T-test
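
These rules can be written as a small lookup function. The sketch below is only one way to encode them; the function name, the "categorical"/"metric" labels, and the n_categories parameter are hypothetical conveniences, not part of the lecture.

```python
from typing import Optional


def choose_bivariate_test(
    var1: str, var2: str, n_categories: Optional[int] = None
) -> str:
    """Pick a bivariate test from two variable types.

    var1 and var2 are "categorical" or "metric". n_categories is the number
    of categories of the categorical variable and is only needed when one
    variable is categorical and the other is metric.
    """
    kinds = sorted([var1, var2])
    if kinds == ["categorical", "categorical"]:
        return "Chi-Square"
    if kinds == ["metric", "metric"]:
        return "Correlation"
    if kinds == ["categorical", "metric"]:
        if n_categories is None or n_categories < 2:
            raise ValueError("n_categories must be 2 or more")
        # Binary grouping variable -> t-test; 3+ categories -> ANOVA.
        return "T-test" if n_categories == 2 else "ANOVA"
    raise ValueError("variable types must be 'categorical' or 'metric'")


print(choose_bivariate_test("categorical", "categorical"))
print(choose_bivariate_test("metric", "categorical", n_categories=3))
```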

Assignment #2
Online (course website)
Due next Monday in class (April 10th)
