Three-dimensional tables (Please Read Chapter 3)

Analogy to factorial ANOVA, again
[Two tables of mean rot, one for sweet potatoes and one for white potatoes; in each table the rows are storage temperature (Cool, Warm) and the columns are bacteria type (1, 2, 3).]

Three Factors
- Storage temperature
- Bacteria type
- Potato type
Dependent variable is amount of rot – mean rot in each cell. In log-linear models, there is no quantitative dependent variable.

Main effects
- Main effect for Temperature means, averaging over Bacteria Type and Potato Type, there is a difference in average rot between cool and warm storage.
- Main effect for Bacteria Type means, averaging over Temperature and Potato Type, there is a difference in average rot among the 3 types of bacteria.
- Main effect for Potato Type means, averaging over Bacteria Type and Temperature, there is a difference in average rot between sweet potatoes and white potatoes.

Two-factor interactions
- A Bacteria Type by Temperature interaction means that, averaging over Potato Type, the pattern of differences (in mean rot) among bacteria types depends on storage temperature. Or equivalently, the difference in mean rot between cool and warm storage depends on bacteria type.
- Similar statements hold for the Potato Type by Bacteria Type and Potato Type by Temperature interactions.

3-Factor Interaction
- The nature of the Bacteria Type by Temperature interaction depends on Potato Type, or
- The nature of the Potato Type by Temperature interaction depends on Bacteria Type, or
- The nature of the Bacteria Type by Potato Type interaction depends on Temperature.
All three statements are equivalent.

Four Factors: A, B, C, D
- Four (sets of) main effects: A, B, C, D
- Six two-factor interactions: AxB, AxC, AxD, BxC, BxD, CxD
- Four three-factor interactions: AxBxC, AxBxD, AxCxD, BxCxD
- One four-factor interaction: AxBxCxD
The 4-factor interaction means that the nature of each 3-factor interaction depends on the level of the remaining factor (all equivalent).

Log-linear model for a k-dimensional table
- A model for the log of the expected frequencies.
- It looks like the model for a k-factor ANOVA, with log expected frequency playing the role of the cell mean.
- Main effects represent departure from equal marginal probabilities.
- Two-factor interactions represent a relationship (association, lack of independence) between variables in two-dimensional marginal tables.
- A three-factor interaction means the nature of that relationship depends on the value of the third variable. Etc.

Log-linear model for a 3-dimensional table
\log m_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(ij)} + u_{13(ik)} + u_{23(jk)} + u_{123(ijk)}
Here u is the mean of all log expected frequencies. Main effects are deviations of the marginal means from the grand mean, etc. Effects add to zero over any subscript in parentheses.
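Written out, the zero-sum constraints read as follows. This display is a reconstruction in standard notation, not text copied from the slide:

```latex
% Effects add to zero over any subscript in parentheses, e.g.
\sum_i u_{1(i)} = 0, \qquad
\sum_i u_{12(ij)} = \sum_j u_{12(ij)} = 0, \qquad
\sum_i u_{123(ijk)} = \sum_j u_{123(ijk)} = \sum_k u_{123(ijk)} = 0 .
```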

We will stick to hierarchical models
- If a higher-order term is in the model, all lower-order terms involving those variables must be in the model too.
- Non-hierarchical models are useful at times, but interpretation can be very tricky.

Florida Prison Data
1. Prisoner's Race (B-W)
2. Victim's Race (B-W)
3. Death Penalty (Y-N)

Bracket Notation
- Represent variables by numbers, or maybe letters, like (VR, PR, DP).
- Enclose the variables involved in each highest-order interaction in brackets.
- Main effects and lower-order interactions are implied, because the models are hierarchical.
- For example, [PR VR] [VR DP] means Prisoner's race and Victim's race are related, and Victim's race and Death penalty are related, but any relationship between Prisoner's race and Death penalty comes from the other two relationships. This is a model of conditional independence.

[PR VR] [VR DP] = [1 2] [2 3]
1. Prisoner's Race (B-W)
2. Victim's Race (B-W)
3. Death Penalty (Y-N)
Obtain estimated expected frequencies by maximum likelihood; test goodness of fit with X² or G², which are approximately chi-squared if the model is true.
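For reference, the two fit statistics have the standard forms below, with x_{ijk} the observed and \hat{m}_{ijk} the estimated expected frequency (this notation is assumed here; the slide itself does not define symbols):

```latex
% Pearson and likelihood-ratio goodness-of-fit statistics for a three-way table
X^2 = \sum_{i,j,k} \frac{(x_{ijk} - \hat{m}_{ijk})^2}{\hat{m}_{ijk}},
\qquad
G^2 = 2 \sum_{i,j,k} x_{ijk} \log\frac{x_{ijk}}{\hat{m}_{ijk}} .
% Both are referred to a chi-squared distribution with
% df = number of cells minus number of free parameters in the model.
```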

Conditional independence is important!
- [1 2] [2 3] means that variables 1 and 2 are related and variables 2 and 3 are related, but any connection between 1 and 3 appears only because they are both related to 2.
- Given (that is, conditionally upon) the value of variable 2, variables 1 and 3 are independent.
- Controlling for (allowing for) variable 2, there is no relationship between variables 1 and 3.
- Simpson's paradox: variables 1 and 3 seem to be related, but looking separately at each level of variable 2, the relationship disappears or even reverses direction.
- Kidney stones example: V1 = Treatment, V3 = Effectiveness, V2 = Size of stones.
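In symbols (again a reconstruction using the u-term notation above, not text from the slide), [1 2] [2 3] says that for every level j of variable 2:

```latex
% Conditional independence of variables 1 and 3 given variable 2
P(V_1 = i, V_3 = k \mid V_2 = j) = P(V_1 = i \mid V_2 = j)\, P(V_3 = k \mid V_2 = j),
% which corresponds to dropping the u_{13} and u_{123} terms from the log-linear model:
\log m_{ijk} = u + u_{1(i)} + u_{2(j)} + u_{3(k)} + u_{12(ij)} + u_{23(jk)} .
```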

It's like multiple regression
Suppose X1 and X2 are correlated, and the regression model for Y involves only X2, say E(Y) = \beta_0 + \beta_2 X_2. Are X1 and Y related? Yes!

But X1 and Y are independent conditionally on the value of X2
[X1 X2] [X2 Y]
"Controlling" for X2, Y is unrelated to X1.
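A small R simulation sketch of this situation (the sample size, coefficients, and seed are invented for illustration; nothing here comes from the course data):

```r
# Y depends only on X2, but X1 is correlated with X2, so X1 and Y are
# marginally related yet conditionally unrelated given X2.
set.seed(123)
n  <- 1000
x2 <- rnorm(n)
x1 <- 0.8 * x2 + rnorm(n, sd = 0.6)   # X1 correlated with X2
y  <- 1 + 2 * x2 + rnorm(n)           # Y generated from X2 only

summary(lm(y ~ x1))        # X1 alone looks strongly related to Y
summary(lm(y ~ x1 + x2))   # controlling for X2, the X1 coefficient is near zero
```

The second fit is the regression analogue of the bracket model [X1 X2] [X2 Y].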

Iterative proportional model fitting
- Maximum likelihood estimation is indirect: go straight to estimation of the expected frequencies.
- First, an example from the text: the no-three-factor-interaction model [1 2] [1 3] [2 3].
- Analytically, one obtains the likelihood equations: the fitted two-way margins must equal the observed two-way margins (\hat{m}_{ij+} = x_{ij+}, \hat{m}_{i+k} = x_{i+k}, \hat{m}_{+jk} = x_{+jk}), but there is no closed-form solution.

Try to make marginals match up. Update all cells at each step.

Now repeat the cycle

Keep repeating
- The adjustment ratios go to one, so the process converges (this has been proved). Stop when you get close enough.
- It converges to the right answer (also proved).
- The method can be extended to any model.
- This is typical of mature maximum likelihood estimation; it is usually not the direct, closed-form calculation you might expect.
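A minimal R sketch of the cycle for the model [1 2] [1 3] [2 3], assuming the observed counts sit in a three-way array x. The function name, starting values, and stopping rule are illustrative; in practice the loglin function introduced below does this for you:

```r
# Iterative proportional fitting for the model [1 2] [1 3] [2 3].
# x: observed 3-way table (array). Returns estimated expected frequencies.
ipf <- function(x, tol = 1e-6, maxit = 100) {
  mhat <- array(1, dim = dim(x))                  # start with all cells = 1
  for (iter in seq_len(maxit)) {
    old <- mhat
    for (marg in list(c(1, 2), c(1, 3), c(2, 3))) {
      # rescale so the fitted margin matches the observed margin
      ratio <- apply(x, marg, sum) / apply(mhat, marg, sum)
      mhat  <- sweep(mhat, marg, ratio, "*")
    }
    if (max(abs(mhat - old)) < tol) break         # ratios close to one: stop
  }
  mhat
}
```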

With the estimated expected frequencies, we can test H0: the model is correct, using df = number of cells – number of parameters in the model. Look at Table 3-4 again.

Getting the data into R
1. Put frequencies directly into tables.
2. Read a data frame with frequencies; number of rows = number of cells.
3. Read a raw data file; number of rows = N.

Method 1: Put frequencies directly into tables
For consistency with Table 3-5 in the text, make the dimensions 1 = Perch Height, 2 = Perch Diameter, 3 = Species.

Need Labels

Labels are the dimnames of the array: a list

This is Better
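A sketch of Method 1 in R. The eight counts and the level labels below are placeholders, not the actual frequencies or labels from Table 3-5:

```r
# Method 1 sketch: type the frequencies straight into a 3-dimensional array.
# Dimension 1 = Perch Height, 2 = Perch Diameter, 3 = Species.
# The eight counts here are made-up placeholders.
freqs <- array(c(10, 20, 30, 40,    # Species 1: height x diameter cells
                 15, 25, 35, 45),   # Species 2: height x diameter cells
               dim = c(2, 2, 2))

# Labels are the dimnames of the array: a list, one character vector per dimension
dimnames(freqs) <- list(Height   = c("High", "Low"),
                        Diameter = c("Narrow", "Wide"),
                        Species  = c("Species1", "Species2"))
freqs
```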

Method 2: Read a data frame

Read data frame from an external file, say a plain text file in your working directory

Berkeley data are on the class website

xtabs(Counts ~ Var1 + Var2 + ..., data = name of data frame): the frequency variable goes on the left of the ~, and the classification variables are separated by + signs.

Marginal Tables are Easy
A data frame plus xtabs is probably the easiest way to import data from a published table.
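A sketch of Method 2 using R's built-in UCBAdmissions table in place of the class-website file (the column names Admit, Gender, Dept, and Freq come from that built-in dataset, so the file on the class website may differ):

```r
# Method 2 sketch: a data frame with one row per cell and a frequency column,
# turned into a 3-way table with xtabs(). UCBAdmissions is built into R.
berkeley <- as.data.frame(UCBAdmissions)   # columns: Admit, Gender, Dept, Freq
head(berkeley)

# Frequency variable on the left, classification variables joined by + on the right
tab <- xtabs(Freq ~ Admit + Gender + Dept, data = berkeley)

# Marginal tables are easy: keep the dimensions you want, sum over the rest
margin.table(tab, c(1, 2))    # Admit by Gender, collapsing over Dept
```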

Method 3: External raw data file

Read the data into a data frame; number of rows = N.

The table function

3-Dimensional table
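A sketch of Method 3 with a tiny made-up data set standing in for the external raw data file (the variable names and values are illustrative only):

```r
# Method 3 sketch: raw data, one row per case (N rows), tabulated with table().
# In practice the data frame would come from read.table("file.txt", header = TRUE).
raw <- data.frame(
  Height   = c("High", "High",   "Low",  "Low",  "High",   "Low"),
  Diameter = c("Wide", "Narrow", "Wide", "Wide", "Narrow", "Narrow"),
  Species  = c("A",    "A",      "B",    "A",    "B",      "B")
)

# The table function counts cases; with three variables it gives a 3-dimensional table
tab3 <- with(raw, table(Height, Diameter, Species))
tab3
```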

Fitting and testing models with the loglin function
- Hierarchical models only.
- Very close to bracket notation: give it a table and a list of vectors.
- Each vector holds the variables in one bracket; for example, c(1, 2, 4) means [1 2 4].
- Uses iterative proportional model fitting.
- Returns the estimated expected frequencies as an option.

loglin(table, margin, fit = FALSE, param = FALSE)
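A sketch of a call, again using the built-in UCBAdmissions table as a stand-in for the course data, fitting the conditional-independence model [1 2] [2 3] (here 1 = Admit, 2 = Gender, 3 = Dept; this numbering is an illustrative choice, not the course's):

```r
# Fit [1 2] [2 3]: variables 1 and 3 conditionally independent given variable 2.
# margin is a list of vectors, one vector per bracket.
fit <- loglin(UCBAdmissions,
              margin = list(c(1, 2), c(2, 3)),
              fit    = TRUE,    # return estimated expected frequencies
              param  = TRUE)    # return parameter estimates

fit$lrt                          # G-squared (likelihood ratio statistic)
fit$pearson                      # X-squared (Pearson statistic)
fit$df                           # degrees of freedom
1 - pchisq(fit$lrt, fit$df)      # approximate p-value if the model is true
```

With fit = TRUE and param = TRUE, fit$fit holds the estimated expected frequencies and fit$param the parameter estimates referred to on the next slides.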

Some options

Parameter estimates

Two more models: See right side of Table 3-5, p. 42