Cross-Tabulation Tables: Tables in R and Computing Chi-Square


Kinds of Data
Nominal or ordinal (few categories)
Interval, if it is grouped
Some tests ignore the ordering of the categories (e.g. chi-square)
In R this means we are working with factors

Kinds of Tables
1. One line per observation, e.g. the Ernest Witte data, where each row is a single individual – table() and Rcmdr
2. One line per cell, with a column of numbers giving the count for that cell – xtabs()
3. A row for each category of the first variable and a column for each category of the second variable, with counts at the intersection of each row and column – Rcmdr (Enter table directly)
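
The three layouts can be converted into one another in base R. The sketch below is illustrative only: it rebuilds a small data set from the Child/Adult counts that appear on the Type 2 slide, and the object names (cells, obs, tab1, tab3) are made up for this example.

```r
# Type 2: one line per cell, with a count column
cells <- data.frame(
  Age   = factor(c("Child", "Adult", "Child", "Adult"),
                 levels = c("Child", "Adult")),
  Goods = factor(c("Absent", "Absent", "Present", "Present")),
  Freq  = c(18, 51, 19, 55)
)

# Type 2 -> Type 3: put the count column on the left side of the formula
tab3 <- xtabs(Freq ~ Age + Goods, data = cells)

# Type 2 -> Type 1: repeat each row Freq times (one line per observation)
obs <- cells[rep(seq_len(nrow(cells)), cells$Freq), c("Age", "Goods")]
rownames(obs) <- NULL

# Type 1 -> Type 3: table() counts the observations
tab1 <- table(obs$Age, obs$Goods, dnn = c("Age", "Goods"))

print(tab3)
print(nrow(obs))
```

Both routes should give the same 2 x 2 table of counts; only the starting layout differs.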

Type 1
> EWG2[sample(rownames(EWG2), 6), c("Age", "Goods")]
             Age   Goods
159 Middle Adult  Absent
126        Child Present
075        Child  Absent
156    Old Adult Present
095        Adult  Absent
157    Old Adult  Absent

Type 2
  Age   Goods Freq
Child  Absent   18
Adult  Absent   51
Child Present   19
Adult Present   55

Type 3
      Absent Present
Child     18      19
Adult     51      55

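Factors in R
Factors use integers to code categorical data
Each integer code is associated with a label, e.g. 1 could stand for “Absent” and 2 for “Present”
Before R 4.0, R created factors from character data columns by default; since R 4.0 (stringsAsFactors = FALSE) character columns stay character unless converted with factor()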

Factors
Regular factors are either equal or not equal (nominal)
Ordered factors can also be compared with <, ==, and >
Rcmdr makes it easy to convert a numeric variable to a factor, to change the factor labels, to change the order of the factor levels, and to make the factor ordered
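
A minimal base R sketch of the points above, without Rcmdr. The vectors goods and age are invented for illustration; the level names follow the Age categories seen in the Type 1 listing.

```r
# A regular (nominal) factor: labels are stored once, data as integer codes
goods <- factor(c("Present", "Absent", "Absent", "Present"))
print(levels(goods))      # labels, sorted alphabetically by default
print(as.integer(goods))  # the underlying integer codes

# An ordered factor: the level order is given explicitly
age <- factor(c("Child", "Old Adult", "Adult", "Child"),
              levels = c("Child", "Adult", "Middle Adult", "Old Adult"),
              ordered = TRUE)

# Comparisons with < and > are defined for ordered factors
younger_than_adult <- age < "Adult"
print(younger_than_adult)

# Levels with no observations are still kept in the table
print(table(age))
```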

Tables in R
Tables are basically matrices with labeling
Transferring between data.frames and tables is possible but can lead to unexpected results
Rcmdr does not recognize tables
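
One example of an "unexpected result" is sketched below, using the Child/Adult counts from the Type 2 slide. The object names tab, long, and wide are made up for this illustration.

```r
# A table is a matrix (array) of counts plus dimension labels
tab <- as.table(matrix(c(18, 51, 19, 55), nrow = 2,
                       dimnames = list(Age   = c("Child", "Adult"),
                                       Goods = c("Absent", "Present"))))
print(dim(tab))
print(dimnames(tab))

# Converting a table to a data frame does NOT keep the row-by-column layout:
# the result has one row per cell, with columns Age, Goods, and Freq
long <- as.data.frame(tab)
print(long)

# To keep the row-by-column layout, convert it as a matrix instead
wide <- as.data.frame.matrix(tab)
print(wide)
```

The long form is the "one line per cell" layout that xtabs() accepts with a count on the left of the formula, so the conversion is useful once it is expected.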

Key table commands in R
table() – create one-way and multi-way tables
xtabs() – uses formulas (and optionally weights/counts)
addmargins() – add row and column totals
prop.table() – create a table of proportions

Key commands (cont.)
ftable() – flatten a multidimensional table, but does not work with xtable()
print(xtable(), type="html") – print an HTML version of the table
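
A short sketch of these commands in action. It uses the built-in mtcars data as a stand-in (an assumption for this example, since the Ernest Witte data is not loaded here) and leaves out xtable(), which needs the separate xtable package.

```r
# Two-way table of cylinders by gears
tab <- table(mtcars$cyl, mtcars$gear, dnn = c("cyl", "gear"))
print(tab)

# Add row and column totals
with_totals <- addmargins(tab)
print(with_totals)

# Proportions of the grand total, then proportions within each row
print(prop.table(tab))
row_props <- prop.table(tab, 1)
print(row_props)

# Three-way table from a formula, flattened for printing
tab3 <- xtabs(~ cyl + gear + am, data = mtcars)
flat <- ftable(tab3, row.vars = c(1, 2), col.vars = 3)
print(flat)
```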

# Use Rcmdr to load ErnestWitte and create EWG2
# EWG2 <- subset(ErnestWitte, subset=Group==2)
table(EWG2$Age)
EWG2$Age <- factor(EWG2$Age)
Table1 <- table(EWG2$Age, EWG2$Goods, dnn=c("Age", "Goods"))
Table1
str(Table1)
Table2 <- xtabs(~Age+Goods, data=EWG2)
Table2
str(Table2)
DF1 <- data.frame(Table1)
DF1
names(DF1) <- c("Age", "Goods", "Freq")
DF1

Table3 <- xtabs(Freq~Age+Goods, data=DF1)
Table3
addmargins(Table1)
prop.table(Table1)
prop.table(Table1, 1)
prop.table(addmargins(Table1, 1), 1)
# Included in Rcmdr
rowPercents(Table1)
colPercents(Table1)

Table4 <- xtabs(~Adult+Goods+Pathology, data=EWG2)
Table4
str(Table4)
ftable(Table4, row.vars=c(1, 2), col.vars=3)
ftable(Table4, row.vars=c(3, 2), col.vars=1)
# tohtml() puts the HTML code for a table into the Windows clipboard,
# or into a file named "clipboard" on Mac OS X or Linux
library(xtable)  # provides xtable()
tohtml <- function(x) print(xtable(x), type="html", file="clipboard")
tohtml(Table1)
# Paste clipboard into Microsoft Excel

Null Hypothesis
The usual null hypothesis is that the row and column variables are independent of one another: knowing one does not help us predict the other
If the null hypothesis is false, the cell values will deviate from the expected values

E.g. Coin Flipping
If I flip a coin twice, the chance that the first flip comes up heads is .5
The chance that the second flip comes up heads is .5 as well
But what if the chance of getting a head changed depending on the first toss?
The probabilities would be conditional
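
The difference between independent and conditional probabilities can be worked through numerically. In this sketch the "dependent" probabilities are hypothetical numbers chosen only for illustration, and cond_prob_second_heads() is a helper defined here, not an R built-in.

```r
# All four outcomes of two flips; expand.grid() varies `first` fastest,
# so the rows are (H,H), (T,H), (H,T), (T,T)
flips <- expand.grid(first = c("H", "T"), second = c("H", "T"))

# P(second = H | first = H) = P(first = H and second = H) / P(first = H)
cond_prob_second_heads <- function(p) {
  both <- sum(p[flips$first == "H" & flips$second == "H"])
  both / sum(p[flips$first == "H"])
}

# Fair, independent flips: every outcome has probability .25
p_indep <- rep(0.25, 4)
marginal_indep <- sum(p_indep[flips$second == "H"])
conditional_indep <- cond_prob_second_heads(p_indep)

# Hypothetical dependent flips: the second flip tends to repeat the first
p_dep <- c(0.35, 0.15, 0.15, 0.35)
marginal_dep <- sum(p_dep[flips$second == "H"])
conditional_dep <- cond_prob_second_heads(p_dep)

print(c(marginal = marginal_indep, conditional = conditional_indep))
print(c(marginal = marginal_dep, conditional = conditional_dep))
```

Under independence the conditional probability equals the marginal one; in the dependent case the marginal chance of heads on the second flip is unchanged, but knowing the first flip changes the prediction.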

Expected Probabilities
Under the null hypothesis the expected value for a cell is
– (Row sum * Column sum) / Total count
Deviations of the actual counts from the expected values are measured as
– (Observed – Expected)^2 / Expected
Summing the deviations over all cells gives us a statistic that approximately follows a chi-square distribution when the null hypothesis is true
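
The two formulas can be checked by hand in R. The sketch below uses the Child/Adult counts from the Type 2 slide; the variable names are made up for this example. Note that chisq.test() applies a continuity correction to 2 x 2 tables by default, so correct=FALSE is needed to reproduce the hand calculation.

```r
observed <- matrix(c(18, 51, 19, 55), nrow = 2,
                   dimnames = list(Age   = c("Child", "Adult"),
                                   Goods = c("Absent", "Present")))

# Expected count for each cell: (row sum * column sum) / total count
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Sum of (observed - expected)^2 / expected over all cells
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
deg_free <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(chi_sq, deg_free, lower.tail = FALSE)

# Same calculation with the built-in test, continuity correction off
builtin <- chisq.test(observed, correct = FALSE)

print(expected)
print(c(by_hand = chi_sq, chisq.test = unname(builtin$statistic)))
print(c(by_hand = p_value, chisq.test = builtin$p.value))
```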

Chi-Square Test
Compares observed counts to expected counts based on independence
Rcmdr constructs the tables and computes the test, BUT deletes the results

Two Options
chisq.test()
– Saves results in multiple tables
– Performs chi-square test and simulation for the p-value
CrossTable() and crosstab() in descr
– SAS, SPSS style output with xtable()
– More formatting options
– Mosaic plot with crosstab()

Results <- chisq.test(xtabs(~Age+Pathology, data=EWG2), simulate.p.value=TRUE)

        Pearson's Chi-squared test with simulated p-value (based on 2000 replicates)

data:  xtabs(~Age + Pathology, data = EWG2)
X-squared = , df = NA, p-value =

str(Results)
Results$expected
Results$residuals
fisher.test(xtabs(~Sex+Goods, data=EWG2))
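
What chisq.test() saves can be explored without the EWG2 data. This sketch reuses the Child/Adult counts from the Type 2 slide as a stand-in table (an assumption for illustration, not the Age by Pathology table used above), and set.seed() is added only so the simulated p-value is repeatable.

```r
tab <- as.table(matrix(c(18, 51, 19, 55), nrow = 2,
                       dimnames = list(Age   = c("Child", "Adult"),
                                       Goods = c("Absent", "Present"))))

set.seed(1)  # the simulated p-value depends on random draws
res <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)

print(names(res))     # the components stored in the result
print(res$expected)   # expected counts under independence
print(res$residuals)  # Pearson residuals: (observed - expected) / sqrt(expected)
print(res$parameter)  # df is NA when the p-value is simulated
print(res$p.value)

# Fisher's exact test is an alternative for small tables
fisher <- fisher.test(tab)
print(fisher$p.value)
```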

library(descr)  # provides CrossTable() and crosstab()
with(EWG2, CrossTable(Age, Pathology))
with(EWG2, CrossTable(Age, Pathology, prop.c=FALSE, prop.t=FALSE))
with(EWG2, crosstab(Age, Pathology))
with(EWG2, crosstab(Age, Pathology, expected=TRUE, resid=TRUE))
with(EWG2, crosstab(Sex, Goods))