HLM with Educational Large-Scale Assessment Data: Restrictions on Inferences due to Limited Sample Sizes

Sabine Meinck, International Association for the Evaluation of Educational Achievement (IEA), Data Processing and Research Center, Hamburg, Germany
Caroline Vandenplas, University of Lausanne, Switzerland

Introduction: Inferences and Sample Sizes

Usually, researchers are not interested in features of the sample itself; they want to infer from it to features of the population. Remember your statistics course: when inferring from sample data to populations,
- you only estimate a population feature;
- you need to indicate the "uncertainty", i.e. the precision, of your estimate;
- the related measure is the standard error (s.e.).
Using the standard error, you can build a confidence interval (CI) around the mean, and you can test for group differences. Look up your statistics textbook for the formula for estimating the s.e., or, when working with LSA data, consult the User Guides. A minimal sketch follows below.
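As a concrete illustration of these steps, here is a minimal Python sketch assuming a simple random sample of hypothetical test scores; the data and variable names are invented for illustration and are not from any LSA dataset (note that real LSA data require design-based methods such as the jackknife or BRR, as described in the User Guides):

```python
import numpy as np

# Hypothetical simple random sample of 500 student scores (illustration only).
rng = np.random.default_rng(42)
scores = rng.normal(loc=500, scale=100, size=500)

n = scores.size
mean = scores.mean()

# Standard error of the mean for a simple random sample: s / sqrt(n).
se = scores.std(ddof=1) / np.sqrt(n)

# 95% confidence interval around the mean (normal approximation),
# and the basis for testing group differences.
ci_low, ci_high = mean - 1.96 * se, mean + 1.96 * se
print(f"mean = {mean:.1f}, s.e. = {se:.2f}, 95% CI = [{ci_low:.1f}, {ci_high:.1f}]")
```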

Introduction: Inferences and Sample Sizes

Sometimes we cannot detect "significant" differences between the groups we want to compare. However, the fact that we cannot detect a difference does not mean there is none! Based on our data, we just don't know. How can we get more precise results? One possibility is to increase the sample size. How and why does this work?

The Sample is Picturing the Population…

What if a picture is our population? It has about 340,000 pixels (if I remember correctly…).

The Sample is Picturing the Population…

[Figure: the same picture reconstructed from samples of 1,000, 10,000, and 50,000 pixels.]

We can play with the sample size to change the precision of our picture. With increasing sample size, the sampling error is reduced. The measure for this precision is the standard error (s.e.).

Introduction: Inferences and Sample Sizes

This relationship between sample size and s.e. holds for any estimated parameter: percentages, correlation coefficients, regression coefficients, etc. It also holds for any estimated coefficient of a hierarchical model! The relationship is not linear, though, and can depend on many factors (see the sketch below).
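As a rough illustration of that point, the following Python sketch (a simulation under invented assumptions, not part of the original study) estimates the empirical standard error of a correlation coefficient for several sample sizes; the error shrinks roughly with the square root of n rather than linearly:

```python
import numpy as np

rng = np.random.default_rng(0)
true_corr = 0.4  # assumed population correlation (illustration only)
cov = [[1.0, true_corr], [true_corr, 1.0]]

for n in (100, 400, 1600, 6400):
    estimates = []
    for _ in range(2000):  # repeated samples from the same "population"
        x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
        estimates.append(np.corrcoef(x, y)[0, 1])
    # Empirical s.e. = spread of the estimate across repeated samples.
    print(f"n = {n:5d}: empirical s.e. of r = {np.std(estimates, ddof=1):.4f}")
```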

Connection to HLM

What is HLM? HLM stands for hierarchical linear modeling; a nice introduction is given, e.g., by Snijders & Bosker (1999). It is an analysis method that addresses the hierarchical structure of data and populations, and it is an extension of linear regression analysis. Almost all datasets from large-scale assessments (LSA) display a hierarchical structure, e.g., students nested in classes/schools (TIMSS, PIRLS, PISA, …), students nested in teachers (TIPI), or teachers nested in schools (ICCS, ICILS). With HLM, effects playing out at different levels of the hierarchy can be disentangled, and we can specify "fixed" and "random" effects. Example of a random effect: if we know that the slope of some parameter differs a lot between clusters, we may want to let it vary at random.

From Linear Regression to HLM

Assume we are trying to predict student math achievement from SES scores. Linear regression model:

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n$$

where $y_i$ is student achievement, $\beta_0$ and $\beta_1$ are unknown coefficients, $x_i$ is student SES, and $\varepsilon_i$ is the error term (residual variance). The subscript $i$ denotes students in the sample. All model parameters to be estimated are subject to sampling error. Note that this model confounds effects between and within groups (see the sketch after this slide).
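To make the confounding point concrete, here is a minimal hypothetical Python sketch (not from the original presentation): it simulates clustered data in which the within-school slope of achievement on SES is positive, but the between-school trend runs the other way, so a pooled OLS fit that ignores the clusters reports a very different slope. All numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_schools, n_students = 10, 50
within_slope = 20.0  # assumed within-school effect of SES (illustration only)

xs, ys = [], []
for j in range(n_schools):
    school_mean_ses = j / n_schools                 # schools differ in average SES
    school_intercept = 550 - 60 * school_mean_ses   # between-school trend runs the other way
    x = school_mean_ses + rng.normal(0, 0.1, n_students)
    y = school_intercept + within_slope * (x - school_mean_ses) + rng.normal(0, 10, n_students)
    xs.append(x)
    ys.append(y)

x_all, y_all = np.concatenate(xs), np.concatenate(ys)

# Pooled OLS slope (ignores clustering) mixes within- and between-school effects.
pooled_slope = np.polyfit(x_all, y_all, 1)[0]
print(f"true within-school slope: {within_slope:.1f}")
print(f"pooled OLS slope:         {pooled_slope:.1f}")
```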

Example: What if we have a case like this?

Example: What if we have a case like this? Linear regression, with no consideration of cluster effects.

Example – Individuals Belong to Clusters!

- Intercepts can vary ("random intercepts") while slopes stay fixed (parallel lines).
- Slopes can vary as well ("random slopes").

From Linear Regression to HLM

Extending the model allows us to disentangle the effects at different levels (= hierarchical linear model). For example, we can specify a model with one explanatory variable at level 1 and at level 2, where both the intercept and the slope are random. Example research question: Does the influence of SES on achievement vary between schools, i.e., does SES affect students' achievement in different schools with different magnitudes or even in different directions? We acknowledge that within each school we may have a different regression line. Residuals at both levels are assumed to follow normal distributions with zero means.

From Linear Regression to HLM

Hierarchical model (the slide shows the equations; a reconstruction follows below): the first equation is the normal regression equation. Specifying β0 (second equation), we acknowledge that each school has a different intercept AND we consider the average SES at the school level. Specifying β1 (last equation), we acknowledge that each school has a regression line with a different slope. U0, U1, and ε are all variances; residuals at both levels are assumed to follow normal distributions with zero means. The research question can be answered by examining U1 and its significance. All model parameters to be estimated are subject to sampling error.
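The equations themselves appear only as an image on the original slide, so the following LaTeX block is a reconstruction of the standard two-level formulation that matches the slide's description (level-1 regression, a random intercept that also depends on school-mean SES, and a random slope); the exact notation on the slide may differ:

```latex
\begin{align}
  \text{Level 1:}\quad & y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + \varepsilon_{ij} \\
  \text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01} \bar{x}_{j} + U_{0j} \\
                       & \beta_{1j} = \gamma_{10} + U_{1j}
\end{align}
% i indexes students, j indexes schools; \bar{x}_j is the average SES in school j.
% U_{0j}, U_{1j} and \varepsilon_{ij} are normally distributed residuals with zero means;
% their variances correspond to the U0, U1 and ε referred to on the slide.
```

Under this formulation, the research question about varying SES effects amounts to asking whether the variance of U_{1j} is larger than zero.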

Purpose of this Research

There is an increasing demand to apply hierarchical linear models to educational large-scale assessment (LSA) data. A frequently asked question from secondary researchers and survey designers is: "How many units do I need on the different levels of the hierarchy to do meaningful multilevel modeling?" Here, "meaningful" means achieving parameter estimates with certain precision levels.

Data and Methods

We utilized a Monte Carlo simulation study mimicking typical large-scale assessment data: samples were selected from a virtual population with two hierarchical levels. The varied parameters were:
- sample sizes at both levels,
- the intra-class correlation coefficient,
- selection probabilities of level-2 units,
- covariance distributions between levels, and
- model complexity.
In total, 288 sampling scenarios were explored, each with 6,000 replicates. A sketch of such a simulation loop follows below.
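To give a flavor of one cell of such a simulation, here is a heavily simplified, hypothetical Python sketch: the sample sizes and variance components are invented, it uses far fewer replicates than the 6,000 used in the study, assumes simple random sampling only, and fits the model with statsmodels rather than whatever software the authors used. It repeatedly draws two-level samples, fits a random-intercept, random-slope model, and summarizes the spread of the estimated fixed slope across replicates.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)

# Assumed population parameters for this illustrative cell (not the study's values).
GAMMA_00, GAMMA_10 = 500.0, 30.0   # fixed intercept and slope
SD_U0, SD_U1, SD_EPS = 30.0, 10.0, 50.0

def draw_sample(n_schools=30, n_students=20):
    """Draw one two-level sample: students (level 1) nested in schools (level 2)."""
    frames = []
    for j in range(n_schools):
        u0, u1 = rng.normal(0, SD_U0), rng.normal(0, SD_U1)
        x = rng.normal(0, 1, n_students)  # student SES
        y = (GAMMA_00 + u0) + (GAMMA_10 + u1) * x + rng.normal(0, SD_EPS, n_students)
        frames.append(pd.DataFrame({"school": j, "x": x, "y": y}))
    return pd.concat(frames, ignore_index=True)

slope_estimates = []
for _ in range(100):  # the study used 6,000 replicates per scenario
    data = draw_sample()
    # Random intercept and random slope for x; convergence warnings may
    # appear for individual replicates with small variance components.
    model = smf.mixedlm("y ~ x", data, groups=data["school"], re_formula="~x")
    slope_estimates.append(model.fit().fe_params["x"])

# Empirical sampling precision of the fixed slope under this scenario.
print(f"mean slope estimate: {np.mean(slope_estimates):.2f}")
print(f"empirical s.e.:      {np.std(slope_estimates, ddof=1):.2f}")
```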

Research Question

What is the association between sample sizes and the precision of the estimated parameters of hierarchical models under varying population conditions?

Results

The precision of all explored parameters increases with increasing sample size in all scenarios. However, in opposition to what is often suggested in the literature, no general rule of thumb concerning required sample sizes can be given. Rather, required sample sizes depend heavily on the parameter of interest:
- Sampling precision levels vary extremely across different model parameters.
- Sample size requirements differ widely between the estimation of fixed model parameters and the estimation of variances.
- If the research interest lies in macro-level regression coefficients, increase the number of sampled clusters.
- If the focus is on variance estimates, the level at which the sample size is increased is less important.

Results

With the sample sizes typically employed in LSA, some parameters cannot be estimated with sufficient precision (i.e., as significantly different from zero), even in relatively simple hierarchical models. With such imprecise parameters, group differences are even less likely to be detected. For example, the standard errors amounted to about 1% to 5% of the parameter for the mean of the random intercepts and for the residual variance, but to more than 100% for the slope of the random intercepts (the s.e. is bigger than the parameter itself).

Conclusions

Be mindful when phrasing your research questions and interpreting your results: Does the data at hand actually suit the type of analysis? What are the relevant group differences that you want to detect? Examine the standard errors of the explored model parameters closely. The full paper with practical guidelines and detailed results can be downloaded at http://www.ierinstitute.org/dissemination-area.html (IERI Monograph Series Special Issue 1, October 2012).

Thank you for your attention!

We are grateful to the National Center for Education Statistics (NCES), U.S. Department of Education, which funded the project.

Contact: sabine.meinck@iea-dpc.de, caroline.vandenplas@unil.ch