2nd meeting: Multilevel modeling: intra class correlation

Presentation transcript:

2nd meeting: Multilevel modeling: intra class correlation

Subjects for today:
• Multilevel data base construction
• The difference between single level OLS regression and multilevel analysis
• Multilevel analysis: the intra class correlation (ICC)

What we learned in the first meeting: when we want to say something about higher level units, such as Indonesian districts or countries, it is best to use multilevel analysis, because it then uses the right standard error and the correct number of observations. We need a data file with a sample of districts and, within each district, a sample of individuals:

Individual  District    Body height
1           Bandung     150
2           Bandung     145
3           Bandung     156
4           Majalengka  118
5           Majalengka  174
6           Majalengka  156
7           Serang      167
8           Serang      153
9           Serang
            District X
            District X
            District X  188

We also want to include level 2 variables, to explain why districts (or countries) differ. The data file then looks like this (welfare included):

Individual  District    Body height  Welfare (in € per capita)
1           Bandung
            Bandung
            Bandung
            Majalengka
            Majalengka
            Majalengka
            Serang
            Serang
            Serang
            District X
            District X
            District X

HOW TO GET THAT RIGHT?

First, we have to get the welfare data for Indonesian districts. These can be found at the Central Bureau for Statistics or on other internet sites.

Second, we want to have them in a data file that SPSS can read (for instance an Excel file (Microsoft Office) or an SPSS file such as SAV or POR).

Third, we must connect the welfare data to the individual data:

Individual data (in SPSS called a file):
Individual  District
1           Bandung
2           Bandung
3           Bandung
4           Serang
5           Serang

Contextual data (in SPSS called a table):
District  Welfare
Bandung   300
Serang    200

SPSS syntax to construct multilevel data files:

GET FILE= "c:\multilevelmodeling\welfare.sav".
* Watch it: data MUST be sorted by district first!!.
sort cases by district.
SAVE OUTFILE= "c:\multilevelmodeling\welfare.sav".

GET FILE= "c:\multilevelmodeling\all_individuals.sav".
* Watch it: data MUST be sorted by district first!!.
sort cases by district.
SAVE OUTFILE= "c:\multilevelmodeling\all_individuals.sav".

* The table (welfare per district) is looked up for every case in the file (individuals).
match files table= "c:\multilevelmodeling\welfare.sav"
  /file= "c:\multilevelmodeling\all_individuals.sav"
  /by district.
EXECUTE.
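For readers without SPSS at hand, the logic of the table lookup can be sketched in plain Python. This is only an illustration of the idea, not a replacement for the SPSS syntax: the function name `attach_context` and the field names are made up for this example, and the five individuals and two welfare figures are the ones from the small example above.

```python
# Individual-level records (the "file"): one row per person.
individuals = [
    {"individual": 1, "district": "Bandung"},
    {"individual": 2, "district": "Bandung"},
    {"individual": 3, "district": "Bandung"},
    {"individual": 4, "district": "Serang"},
    {"individual": 5, "district": "Serang"},
]

# District-level records (the "table"): one value per district.
welfare_by_district = {"Bandung": 300, "Serang": 200}


def attach_context(rows, lookup, key, new_field):
    """Copy the level 2 value onto every level 1 row that shares the key.

    Hypothetical helper for illustration only.
    """
    merged = []
    for row in rows:
        combined = dict(row)  # copy, so the input rows stay untouched
        # .get returns None when a district has no contextual record.
        combined[new_field] = lookup.get(row[key])
        merged.append(combined)
    return merged


multilevel_rows = attach_context(
    individuals, welfare_by_district, key="district", new_field="welfare"
)

for row in multilevel_rows:
    print(row["individual"], row["district"], row["welfare"])
```

Every individual in the same district receives the same welfare value, which is exactly what makes welfare a level 2 variable.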

OK, we have our data ready. What is next? First we want to know whether there is variation WITHIN districts and variation BETWEEN districts.
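A quick descriptive look at within and between variation can be sketched as follows. The heights are the eight values that are legible in the example table of the first slides (the third Serang value is missing there, so it is left out). This is only a descriptive split into district means and deviations, not the maximum likelihood estimate that SPSS or MLwiN would give.

```python
from statistics import mean

# Body heights per district, taken from the example table (legible rows only).
heights = {
    "Bandung": [150, 145, 156],
    "Majalengka": [118, 174, 156],
    "Serang": [167, 153],
}

all_values = [h for values in heights.values() for h in values]
overall_mean = mean(all_values)

# One mean per district.
district_means = {d: mean(values) for d, values in heights.items()}

# BETWEEN: how far each district mean lies from the overall mean.
between_dev = {d: m - overall_mean for d, m in district_means.items()}

# WITHIN: how far each individual lies from the mean of their own district.
within_dev = {
    d: [h - district_means[d] for h in values] for d, values in heights.items()
}

print("overall mean:", overall_mean)
for d in heights:
    print(d, "mean:", district_means[d], "between:", between_dev[d])
    print("  within:", within_dev[d])
```

If all district means were equal, every between deviation would be zero; if everybody within a district had the same height, every within deviation would be zero.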

In MLwiN we have something similar. First we have an equation with the within variance:

$Y_{ij} = a + e_{ij}$

where $Y_{ij}$ = dependent variable, $a$ = intercept, $e_{ij}$ = within variation (the error in regression analysis), $i$ = individual, $j$ = level 2 unit (district).

Second we have an equation with the between variance:

$a = B_{0j} + u_{0j}$

where $B_{0j}$ = intercept, $u_{0j}$ = between variation.

Substituting $a$ in the first equation gives:

$Y_{ij} = B_{0j} + e_{ij} + u_{0j}$

A multilevel null model! So in plain words: every individual score ($Y_{ij}$) depends upon some figure ($B_{0j}$) + some individual variation + some level 2 variation.

$Y_{ij} = B_{0j} + e_{ij} + u_{0j}$, shown for two individuals in Bandung.

[Figure: Y plotted against X, marking the mean in Bandung and the overall mean across the population of districts.]

Now suppose that all districts have the same mean body height: then the between variance = 0.

Suppose that all individuals within a district have exactly the same height: then the within variance = 0.

In most research there is both within and between variance, in other words both level 1 and level 2 variance. The total variance is of course level 1 variance + level 2 variance.

Now suppose that all individuals are relatively closely clustered around their district (or group) means. Then the so-called intra class correlation is said to be high:

ICC = level 2 variance / total variance (total variance = level 1 variance + level 2 variance)

The ICC is always between 0 (only level 1 variance, no clustering) and 1 (only level 2 variance).
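The ICC formula is small enough to write out as a function. The sketch below assumes you already have the two variance estimates (for instance from SPSS or MLwiN output). The function name and the numbers in the calls are made up for illustration and are not taken from any data set in this course.

```python
def intraclass_correlation(level2_variance, level1_variance):
    """ICC = level 2 variance / (level 1 variance + level 2 variance)."""
    if level1_variance < 0 or level2_variance < 0:
        raise ValueError("variances cannot be negative")
    total_variance = level1_variance + level2_variance
    if total_variance == 0:
        raise ValueError("total variance is zero, so the ICC is undefined")
    return level2_variance / total_variance


# Hypothetical variance estimates, chosen only to show the two extremes
# and one case in between.
print(intraclass_correlation(0.0, 50.0))   # only level 1 variance
print(intraclass_correlation(50.0, 0.0))   # only level 2 variance
print(intraclass_correlation(25.0, 75.0))  # a quarter of the variance at level 2
```

The first call gives 0 (no clustering), the second gives 1, and the third gives 0.25.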

Now down to business. We have data (named SCHOOL23.sav, see our site; data used with kind permission from I. Kreft and J. De Leeuw, Introducing Multilevel Modeling, Sage Publications, 1998) from 23 schools with 519 pupils in total, and we have a math test as the Y variable. We want to know the between and within variances.

* SPSS syntax:
mixed math
  /random intercept | subject(school) covtype(un)
  /print solution g testcov
  /method ml.

ICC = level 2 variance / total variance = 24.85 / … = .23

Multilevel null model in MLwiN:

We can also use a Chi-square test to test whether the ICC is significant. This way of testing is recommended, because it does NOT rely on the normality assumption of a z-test. In MLwiN you can use Chi-square testing because the difference between two -2 loglikelihoods is Chi-square distributed. Say we have a model with only level 1 variance and a -2 loglikelihood of 1600, and the same model but now with both level 1 and level 2 variance parameters: its -2 loglikelihood will be equal or lower! So -2 loglikelihood figures are measures of fit: the lower the figure, the better the model fits the data. Because the difference in -2 loglikelihood between the models can only be zero or higher, the test probability must be divided by 2!

Note: on our site we included a brief instruction about statistical testing in MLwiN.

To test whether the ICC is significant, that is, whether the level 2 variance is significantly different from zero, we perform a Chi-square test: -2 loglikelihood of the 1-level model minus -2 loglikelihood of the 2-level model: … - … = 132 with 1 df, which is highly significant. This test is superior to the Z-test in SPSS because the latter uses an estimate for the standard error.

Note: we test one-sided because the outcome is always zero or higher.

Model with one level only:

Let us assume that the difference between the -2 loglikelihoods is 10. We have 1 df, because we added one extra parameter, which is the level 2 variance. The Chi-square distribution looks something like this:

In fact we must divide by 2 to get the correct p-value, but the original p is already very low. The conclusion is that, beyond reasonable doubt, there is level 2 variance, or ICC > 0!

Testing in MLwiN with Chi-square (more info in the document about testing, see ‘statistical testing in Mlwin.pdf’):

Type cprob 10 1 and press [Enter].

Note that the resulting probability must be divided by 2 for level 2 variance testing.
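The same probability can be cross-checked outside MLwiN. The sketch below relies on the fact that a Chi-square variable with 1 df is the square of a standard normal variable, so its upper tail probability can be written with the complementary error function from the Python standard library. The function names are made up for this example, and the formula holds for 1 df only.

```python
import math


def chi2_upper_tail_1df(x):
    """Upper tail probability P(X > x) for a Chi-square variable with 1 df."""
    if x < 0:
        raise ValueError("a Chi-square value cannot be negative")
    # X = Z**2 with Z standard normal, so P(X > x) = P(|Z| > sqrt(x)).
    return math.erfc(math.sqrt(x / 2.0))


def level2_variance_p_value(deviance_difference):
    """Halved p-value for testing one variance parameter against zero.

    deviance_difference is the -2 loglikelihood of the 1-level model
    minus the -2 loglikelihood of the 2-level model.
    """
    return chi2_upper_tail_1df(deviance_difference) / 2.0


p_two_sided = chi2_upper_tail_1df(10)      # the difference of 10 used above
p_halved = level2_variance_p_value(10)     # after dividing by 2

print("p before halving:", round(p_two_sided, 5))
print("p after halving: ", round(p_halved, 5))
```

For a difference of 10 the probability before halving is roughly 0.0016, so after halving it is below 0.001. For the difference of 132 found for the school data the probability is vanishingly small.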