Intermediate Applied Statistics STAT 460 Lecture 20, 11/19/2004 Instructor: Aleksandra (Seša) Slavković TA: Wang Yu


Revised schedule
- Nov 8: lab on two-way ANOVA
- Nov 10: lecture on two-way ANOVA and blocking; post HW9
- Nov 12: lecture on repeated measures and review
- Nov 15: lab on repeated measures
- Nov 17: lecture on categorical data/logistic regression; HW9 due; post HW10
- Nov 19: lecture on categorical data/logistic regression
- Nov 22: lab on logistic regression & Project II introduction
- Nov 24 & 26: no class (Thanksgiving)
- Nov 29: lab
- Dec 1: lecture; HW10 due; post HW11
- Dec 3: lecture and quiz
- Dec 6: lab
- Dec 8: lecture; HW11 due
- Dec 10: lecture
- Dec 13: Project II due

Last lecture: Categorical Data

This lecture: Categorical Data/Response (ch. 18, 19, 20); Odds

Review: Categorical Variable
Notation:
- Population proportion = π (sometimes we use p)
- Population size = N
- Sample proportion = p̂ = X/n = # with trait / total #
- Sample size = n
The Rule for Sample Proportions: if numerous samples of size n are taken, the frequency curve of the sample proportions (the p̂'s) from the various samples will be approximately normal with mean π and standard deviation √(π(1−π)/n); that is, p̂ ~ N(π, π(1−π)/n).
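The course software is SAS, but the standard-deviation formula above is easy to check in any language. A minimal Python sketch (function name is my own, for illustration only):

```python
import math

def sampling_sd(pi, n):
    """Standard deviation of the sample proportion p-hat for
    samples of size n from a population with proportion pi:
    sqrt(pi * (1 - pi) / n)."""
    return math.sqrt(pi * (1 - pi) / n)

# With pi = 0.5 and n = 100: sqrt(0.25 / 100) = 0.05
print(sampling_sd(0.5, 100))  # 0.05
```

Note how the spread shrinks with the square root of n: quadrupling the sample size only halves the standard deviation.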

One-sample approximate z test and z-interval for π.

Difference between proportions: These tests can be extended to test the difference in the parameter π between two groups.

Warning: z-tests for proportions are based on a normal approximation, so they do not work well for small samples. A common rule of thumb is that n is large enough when the expected numbers of successes and failures are both sufficiently large. Because of improved computing power, an exact test based on the binomial distribution rather than the normal approximation is now available in most software.
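Most statistical packages (including SAS) provide the exact test ready-made. To show what it computes, here is a self-contained Python sketch of one common two-sided version, which sums the probabilities of all outcomes no more likely than the observed one (an assumption on my part; packages differ in how they define "two-sided"):

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def exact_binomial_test(x, n, p0):
    """Two-sided exact p-value for H0: pi = p0, given x successes in n trials.
    Sums P(X = k) over every k whose probability does not exceed P(X = x)."""
    px = binom_pmf(x, n, p0)
    return sum(binom_pmf(k, n, p0) for k in range(n + 1)
               if binom_pmf(k, n, p0) <= px + 1e-12)

# 3 successes in 10 trials under H0: pi = 0.5
print(exact_binomial_test(3, 10, 0.5))  # 0.34375
```

Because it works with the binomial distribution directly, this test is valid for any n, with no "large enough sample" condition.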

Analysis Grid (ref. Handout)

Outcome \ Explanatory   Quantitative          Discrete                          Both
Quantitative Outcome    Regression            ANOVA                             Regression (ANCOVA)
Discrete Outcome        Logistic Regression   Chi-Square Test of Independence   Logistic Regression

Contingency Table
- A statistical tool for summarizing and displaying results for categorical variables
- A two-way table is for two categorical variables
- A 2x2 table is for two categorical variables, each with two categories
- Place the count of each combination of the two variables in the appropriate cell of the table
- Explanatory variable categories label the rows; response variable categories label the columns

Example
A university offers only two degree programs: English and Computer Science. Admission is competitive and there is a suspicion of discrimination against women in the admission process. Here is a two-way table of all applicants by sex and admission status:

        Male   Female   Total
Admit    35      20       55
Deny     45      40       85
Total    80      60      140

These data show an association between the sex of the applicants and their success in obtaining admission.

Marginal & Conditional Distributions
Marginal distributions:
- Row variable (admission status): add up the values across each row, ignoring the column variable. In our example the distribution is 55, 85, 140. Observed proportions: 'admit' = 55/140 ≈ 0.39 and 'deny' = 85/140 ≈ 0.61. NOTE: they add up to 1.
- Column variable (sex): add up the values down each column, ignoring the row variable. In our example, what is the distribution? What are the observed proportions? Do they add up to 1?

Marginal & Conditional Distributions
Conditional distribution: conditional percentages; what percent of a particular row or column each cell count is.
Conditional distribution of sex for those admitted:
- % of admitted who are male = 35/55 ≈ 0.64 = 64%
- % of admitted who are female = ?
What is:
- % of male applicants admitted = ?
- % of female applicants admitted = ?
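The marginal and conditional calculations above are just row sums and within-row divisions. A small Python sketch using the admission table (the dictionary layout is my own choice, not a course convention):

```python
# Admission table from the example: rows admit/deny, columns male/female
table = {"admit": {"male": 35, "female": 20},
         "deny":  {"male": 45, "female": 40}}

total = sum(sum(row.values()) for row in table.values())        # 140

# Marginal distribution of the row variable (admission status)
row_totals = {r: sum(cells.values()) for r, cells in table.items()}  # admit 55, deny 85
row_props = {r: t / total for r, t in row_totals.items()}            # 0.39 and 0.61

# Conditional distribution of sex among the admitted
cond = {sex: n / row_totals["admit"] for sex, n in table["admit"].items()}
print(cond)  # male ≈ 0.64, female ≈ 0.36
```

The marginal proportions within a variable always sum to 1, and so does each conditional distribution; that makes a quick sanity check for hand calculations.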

Statistical Significance
- An observed relationship is statistically significant if the chance of observing a relationship that strong in the sample, when there is no actual relationship in the population, is small (usually less than 5%).
- In other words, a relationship is statistically significant if it is stronger than 95% of the relationships we would expect to see just by chance.
- If we say that no statistically significant relationship was found, that does not mean there is no relationship at all!
- Warnings: If the sample size is small, strong relationships may not achieve significance. If the sample size is large, even minor relationships can achieve significance, but these might not have practical importance.

Chi-Squared Test (χ² Test)
- A chi-squared test for independence
- The chi-squared statistic (χ²) for a contingency table follows a χ² distribution: skewed to the right, with min = 0 and max = infinity
- As the strength of the observed relationship in the sample increases, the statistic increases
- It combines information about the strength of the relationship and the sample size into one number
- Can be calculated for a contingency table of any size
- For a 2x2 table: if χ² > 3.84 then we have a statistically significant relationship. We either show a significant relationship (χ² > 3.84) or fail to reject (χ² < 3.84) the claim of independence between the two variables, which is our null hypothesis.
- H0: variables are independent; HA: variables are NOT independent

22  The chi-squared distribution with k-1 degrees of freedom acts as though it was the sum the squares of k-1 independent Normal(0,1) distributions. (Not that you need to know.)  See table on pages in textbook.

You Must Know:
- How to calculate the χ² statistic: compute the expected counts, compare the expected and observed counts, and compute the χ² statistic
- How to compare it to 3.84 for 2x2 tables
- How to draw a proper conclusion about the statistical relationship, and in general about the question of interest, for any two-way and k-way table

For our example:
Computing the χ² statistic:
- Expected count = the number of counts (individuals) we expect to fall in a particular cell = (row total)(column total)/(table total)
  - Expected number of admitted male students = (55 x 80)/140 ≈ 31.43
  - Expected number of admitted female students = ?
- Observed count = the number of counts actually in the cell
  - Observed number of admitted male students = 35
  - Observed number of admitted female students = ?
- Compare the observed and expected counts: (observed − expected)²/(expected)
  - For admitted male students: (35 − 31.43)²/31.43 ≈ 0.41
  - For admitted female students: = ?
- Compute the statistic = sum the above quantities over all the cells
  - In our case χ² ≈ 1.56
- Compare it to 3.84. Is it statistically significant? Are admission decisions independent of gender?
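The slides carry out this calculation in SAS menus; as a sketch of exactly the same arithmetic, here is the full computation for the admission table in Python (layout and names are mine):

```python
# Observed counts: rows admit/deny, columns male/female
observed = [[35, 20],
            [45, 40]]

row_totals = [sum(row) for row in observed]        # [55, 85]
col_totals = [sum(col) for col in zip(*observed)]  # [80, 60]
n = sum(row_totals)                                # 140

# Expected count = (row total)(column total) / (table total)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Sum (observed - expected)^2 / expected over all four cells
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))

print(round(chi2, 2))  # 1.56 -- below 3.84, so not statistically significant
```

Since 1.56 < 3.84, we fail to reject independence: this (pooled) table alone gives no significant evidence that admission depends on sex.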

Relative Risk, Increased Risk, Odds Ratio
- Quantifications of the chances of a particular outcome, and of how those chances change
- What are the chances that a randomly selected individual falls into a particular category of a categorical variable?
- There are two basic ways to express these chances:
  - Proportions = expressing one category as a proportion of the total. Proportion of admitted students who are female = 20/55 ≈ 0.36
  - Odds = comparing one category to another. Odds of being admitted = 55 to 85 = 55/85 to 1

Expressing Proportions & Odds
- There are 4 equivalent ways to express proportions: Percent = Proportion = Probability = Risk
  - 36% (percent) of all admitted students are female
  - The proportion of admitted students who are female is 0.36
  - The probability that an admitted student is female is 0.36
  - The risk that an admitted student is female is 0.36
- Odds are expressed by reducing the numbers with and without the characteristic of interest to the smallest possible whole numbers:
  - The odds of being admitted = 55 to 85 = 11 to 17 = 11/17 to 1
- Going back and forth between proportions and odds:
  - If the proportion has value π, then the odds are π/(1−π) to 1
  - If the odds of having a characteristic are a to b, then the proportion with the characteristic is a/(a+b)

Generalized forms of the expressions:
- Percentage with the characteristic = (number with the characteristic / total) x 100%
- Proportion with the characteristic = number with the characteristic / total
- Probability of having the characteristic = number with the characteristic / total
- Risk of having the characteristic = number with the characteristic / total
- Odds of having the characteristic = (number with the characteristic / number without the characteristic) to 1 = π/(1−π) to 1
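The two conversion rules above are inverses of each other, which is easy to verify in a couple of lines of Python (function names are mine, for illustration):

```python
def proportion_to_odds(p):
    """Odds 'to 1' of having a characteristic held with proportion p."""
    return p / (1 - p)

def odds_to_proportion(a, b):
    """Proportion with the characteristic when the odds are a to b."""
    return a / (a + b)

# Odds of admission in the example: 55 to 85 (i.e. 11 to 17)
print(odds_to_proportion(55, 85))  # 0.392857..., the same as 55/140
```

Note that scaling both odds numbers by the same factor (55:85 vs. 11:17) leaves the implied proportion unchanged, which is why odds can always be reduced to smallest whole numbers.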

Types of Risk: Relative Risk & Increased Risk
- Relative risk = the ratio of the risks for the two categories of the explanatory variable
  - Risk of being female, compared between rejected and accepted applicants:
    - Risk of being female if you are rejected = 40/85 ≈ 0.47
    - Risk of being female if you are accepted = 20/55 ≈ 0.36
    - Relative risk = 0.47/0.36 ≈ 1.31 to 1
  - What does this mean? What does a relative risk of 1 mean?
- Increased risk = usually, the percent increase in risk: increased risk = (change in risk / original risk) x 100%
  - Change in risk = 0.47 − 0.36 = 0.11
  - Original risk = baseline risk = 0.36
  - Increased risk = (0.11/0.36) x 100% ≈ 31%
  - There is a 31% increase in the chance of being female among rejected compared with accepted applicants
- Equivalently, increased risk = (relative risk − 1.0) x 100% = (1.31 − 1.0) x 100% = 31%
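The same numbers in a short Python sketch. One caveat worth seeing explicitly: the slide's 1.31 comes from rounding the two risks to 0.47 and 0.36 before dividing; keeping full precision gives a slightly different ratio.

```python
# Proportion female among denied and among admitted applicants
risk_denied = 40 / 85    # ≈ 0.47
risk_admitted = 20 / 55  # ≈ 0.36

relative_risk = risk_denied / risk_admitted     # exactly 22/17 ≈ 1.29
increased_risk = (relative_risk - 1.0) * 100    # percent increase in risk

# The slide's value of 1.31 comes from dividing the pre-rounded
# risks 0.47/0.36; the unrounded ratio is about 1.29.
print(round(relative_risk, 2))  # 1.29
```

The lesson: carry full precision through intermediate steps and round only the final answer.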

Odds Ratio
- First calculate the odds of having a characteristic versus not having it:
  - Odds of being female among the admitted = 20/35 ≈ 0.57
  - Odds of being female among the rejected = 40/45 ≈ 0.89
- Then take the ratio of these odds:
  - Odds ratio = 0.89/0.57 ≈ 1.56
  - Not too close to the relative risk of 1.31, but sometimes the odds ratio can be close to the relative risk
- Odds ratio = (upper left x lower right)/(upper right x lower left)
- Sometimes you need to swap the numerator and denominator so that the ratio is greater than 1 (easier to interpret)
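A Python sketch of both formulations, showing that the ratio-of-odds and cross-product calculations agree (cell names are mine):

```python
# 2x2 table cells: rows admit/deny, columns male/female
admit_m, admit_f = 35, 20
deny_m, deny_f = 45, 40

# Odds of being female within each admission outcome
odds_admitted_f = admit_f / admit_m  # 20/35 ≈ 0.57
odds_denied_f = deny_f / deny_m      # 40/45 ≈ 0.89

odds_ratio = odds_denied_f / odds_admitted_f

# Cross-product form: (upper left * lower right) / (upper right * lower left)
cross_product = (admit_m * deny_f) / (admit_f * deny_m)

print(round(odds_ratio, 2), round(cross_product, 2))  # 1.56 1.56
```

The cross-product form makes one property obvious: swapping the two rows (or the two columns) simply inverts the odds ratio, which is why you are free to flip it so the value exceeds 1.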

Misleading items about Risk/Odds
- The baseline risk is missing
- The time period of the risk is not identified
- The reported risk is not necessarily your risk (relative risk vs. your risk)
- Retrospective vs. prospective study:
  - Prospective: take a random sample and record successes and failures as they occur in the future
  - Retrospective: take a random sample and record successes and failures that happened in the past
  - In a retrospective study you can meaningfully interpret the odds ratio, but not the individual odds

Simpson’s Paradox
- Lurking variable = a variable that changes the nature of the association between two other variables, possibly even reversing the direction of the relationship
- The nature of the association changes due to a lurking variable
- In our example we didn’t consider the type of program (major) as a variable. What happens if we do, and if we construct two separate tables, one for each major?

Example of Simpson’s Paradox
- Computer Science admits 50% of both male and female applicants
- English admits ¼ of both male and female applicants
- Now there doesn’t seem to be an association between sex and admission decision in either program
- Hence, type of program was a lurking variable

Computer Science                 English
        Male   Female                    Male   Female
Admit    30      10              Admit     5      10
Deny     30      10              Deny     15      30
Total    60      20              Total    20      40
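To see the paradox numerically, compare the within-program admission rates to the pooled rates. A Python sketch of that comparison (data structure is my own):

```python
# Per-program counts: {program: {sex: (admitted, denied)}}
programs = {
    "CS":      {"male": (30, 30), "female": (10, 10)},
    "English": {"male": (5, 15),  "female": (10, 30)},
}

def rate(admitted, denied):
    """Admission rate = admitted / total applicants."""
    return admitted / (admitted + denied)

# Within each program, males and females are admitted at the same rate:
for prog, cells in programs.items():
    print(prog, {sex: rate(*c) for sex, c in cells.items()})
# CS: 0.5 for both; English: 0.25 for both

# But pooled over programs, an apparent association emerges:
pooled = {sex: rate(sum(programs[p][sex][0] for p in programs),
                    sum(programs[p][sex][1] for p in programs))
          for sex in ("male", "female")}
print(pooled)  # male = 35/80 ≈ 0.44, female = 20/60 ≈ 0.33
```

The pooled gap arises only because women applied disproportionately to English, the program with the lower admission rate; within each program there is no association at all.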

Commands in SAS
- To create contingency tables, calculate the chi-square statistic, etc.: Statistics / Table Analysis
- To run the logistic regression: Statistics / Regression / Logistic

Next
- Lab Monday: categorical data
- Logistic regression: we will work through the lab together and learn about logistic regression
- Project II