Overview of Main Survey Data Analysis and Scaling National Research Coordinators Meeting Madrid, February 2010.

Presentation transcript:

NRC Meeting Madrid February 2010 Content of presentation Scaling and analysis of test items Scaling and analysis of questionnaire items Data analysis for the reporting of ICCS data

NRC Meeting Madrid February 2010 Steps in analysis Preliminary analysis of first data sets received –Review at JMC data analysis meeting in Hamburg in July 2009 Analysis of clean and uncleaned data sets from almost all participating countries –Review at PAC meeting in Tallinn (Oct 2009) and JMC data analysis meeting in Hamburg in early December 2009 Final scaling and analysis with clean data from all 38 countries

NRC Meeting Madrid February 2010 Test item analysis Review of missing data Analysis of item dimensionality Review of item statistics (international) Analysis of differential item functioning by gender Analysis of item-by-country interaction –Measurement equivalence Item adjudication

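NRC Meeting Madrid February 2010 Scaling model Rasch one-parameter model: $P_i(\theta_n) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}$ Here, $P_i(\theta_n)$ is the probability for person n to score 1 on item i, $\theta_n$ is the estimated ability of person n and $\delta_i$ is the estimated location (difficulty) of item i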
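
As an illustration of the Rasch model on this slide, the following minimal sketch computes the probability of a correct response from an ability and an item location, both in logits. The function name `rasch_probability` and the example values are assumptions made for illustration; they do not come from the presentation or from ACER ConQuest.

```python
import math


def rasch_probability(theta: float, delta: float) -> float:
    """Probability of scoring 1 on an item under the Rasch model.

    theta: person ability in logits.
    delta: item location (difficulty) in logits.
    """
    # exp(theta - delta) / (1 + exp(theta - delta)), written in logistic form
    return 1.0 / (1.0 + math.exp(-(theta - delta)))


if __name__ == "__main__":
    # A person whose ability equals the item location has a 50% chance.
    print(rasch_probability(0.0, 0.0))
    # An easier item (lower delta) gives the same person a higher chance.
    print(rasch_probability(0.0, -1.0))
```

The probability depends only on the difference between ability and item location, which is why persons and items can be mapped on one common scale.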

NRC Meeting Madrid February 2010 Probability curves

NRC Meeting Madrid February 2010 Partial credit model For open-ended items (and questionnaire items) with more than two categories the Partial Credit model was used, in which $\tau_{ij}$ denotes an additional step parameter for step j of item i
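
The Partial Credit model can be sketched as follows. This is a minimal illustration that assumes the parameterisation in which the probability of category x is proportional to $\exp \sum_{k=1}^{x} (\theta_n - \delta_i + \tau_{ik})$, with the empty sum for category 0 equal to 0. The sign convention for the step parameters and the function name `pcm_probabilities` are assumptions, not details taken from the slide.

```python
import math


def pcm_probabilities(theta: float, delta: float, taus: list[float]) -> list[float]:
    """Category probabilities for one item under the Partial Credit model.

    theta: person ability in logits.
    delta: item location in logits.
    taus:  step parameters tau_1..tau_m (assumed convention: each step
           contributes theta - delta + tau_k to the cumulative sum).
    Returns probabilities for categories 0..m.
    """
    cumulative = [0.0]  # category 0: empty sum
    for tau in taus:
        cumulative.append(cumulative[-1] + (theta - delta + tau))
    weights = [math.exp(c) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical three-category item (scores 0, 1, 2).
    print(pcm_probabilities(theta=0.5, delta=0.0, taus=[0.4, -0.4]))
```

With a single step parameter of zero the function reduces to the Rasch model for a dichotomous item.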

NRC Meeting Madrid February 2010 Threshold curves

NRC Meeting Madrid February 2010 Response probabilities

NRC Meeting Madrid February 2010 Missing data issues Different categories of missing data Omitted responses –Somewhat higher percentages for open response items Invalid responses –Generally very low percentages Not reached responses –Omitted items at end of test booklets –Generally low, but more considerable in a few countries

NRC Meeting Madrid February 2010 Not reached % by region

NRC Meeting Madrid February 2010 Test characteristics Test items were generally a little easier than the average student abilities (pooled across countries) Test reliability was 0.84 (similar to CIVED assessment) Very high latent correlations between possible sub-dimensions –Decision not to pursue sub-scales

NRC Meeting Madrid February 2010 Mapping of test items to abilities

NRC Meeting Madrid February 2010 Review of item scaling properties Most items had excellent scaling properties –Weighted mean square item fit –Item-total correlation –Item characteristic curves Only one test item (CI2HRM2) was omitted from scaling

NRC Meeting Madrid February 2010 Item statistics

NRC Meeting Madrid February 2010 Item characteristic curves

NRC Meeting Madrid February 2010 Scoring reliabilities - 1 Open-ended items were scored according to international scoring guidelines Double-scoring of sub-samples On average, percentages of scorer agreement ranged between 84 and 92 across participating countries

NRC Meeting Madrid February 2010 Scoring reliabilities - 2 Only items accepted where scorer agreement was 70% or more Data for items where this criterion was not met were not included in scaling In two countries open-ended items were consistently easier than other items –Omitted from scaling and database
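
The agreement criterion above can be illustrated with a short sketch. It assumes that scorer agreement is the simple percentage of double-scored responses on which both scorers gave the same code; the function names and example codes are hypothetical.

```python
def percent_agreement(first_scores: list[int], second_scores: list[int]) -> float:
    """Percentage of double-scored responses with identical codes."""
    if len(first_scores) != len(second_scores):
        raise ValueError("both scorers must have scored the same responses")
    if not first_scores:
        raise ValueError("no double-scored responses supplied")
    matches = sum(a == b for a, b in zip(first_scores, second_scores))
    return 100.0 * matches / len(first_scores)


def item_accepted(first_scores: list[int], second_scores: list[int],
                  threshold: float = 70.0) -> bool:
    """Apply the 70% agreement criterion stated on the slide."""
    return percent_agreement(first_scores, second_scores) >= threshold


if __name__ == "__main__":
    scorer_1 = [0, 1, 2, 1]
    scorer_2 = [0, 1, 1, 1]
    print(percent_agreement(scorer_1, scorer_2), item_accepted(scorer_1, scorer_2))
```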

NRC Meeting Madrid February 2010 Gender DIF DIF estimates reflect the differences between item difficulties for males and females of equal ability –This may cause bias in favour of one group Generally, only a few items with gender DIF were found

NRC Meeting Madrid February 2010 Cross-national measurement equivalence Occurrence of item-by-country interaction –Items relatively much harder in some countries but much easier in others In ICCS, national item calibrations were compared with those for the international calibration sample Standard errors were adjusted for sample design effects and multiple comparisons

NRC Meeting Madrid February 2010 Example for CI2HRM2

NRC Meeting Madrid February 2010 Item-by-country interaction Generally, items tended to behave in a similar way Number of items with parameter variance –Sometimes due to translation errors –Often due to other factors (national context, curricula) Occurrence of some parameter variation across countries –Similar results as in other cross-national studies

NRC Meeting Madrid February 2010 Item adjudication Based on results from scaling analysis (item statistics, item curves, item-by-country interaction etc.) International item adjudication –Omission of CI2HRM2 from scaling National item adjudication –Re-verification for items with larger discrepancies in item difficulty –Omission of items with translation or scoring issues from national scaling

NRC Meeting Madrid February 2010 Calibration of items Based on international calibration sample with 500 randomly selected students from each of the 36 participating countries that met sampling requirements ACER ConQuest was used for estimation Booklet effects adjusted by including booklet as a facet in the scaling model

NRC Meeting Madrid February 2010 Scaling methodology Plausible values were generated as student ability estimates –More information at workshop! Dummy indicators for classroom and all student level variables (international and regional) were included in the conditioning model Scale scores set to international metric with mean of 500 and SD of 100 for equally weighted countries
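
Setting the scale to a mean of 500 and SD of 100 for equally weighted countries can be sketched as below. This is a simplified illustration that assumes one logit score per student and rescales the sampling weights so that every country contributes the same total weight. The operational procedure works with plausible values and may differ in detail; the function name and record layout are hypothetical.

```python
import math
from collections import defaultdict


def to_international_metric(records, target_mean=500.0, target_sd=100.0):
    """Linearly transform logits to the reporting metric.

    records: list of (country, sampling_weight, logit) tuples.
    Returns one scale score per record, in the same order.
    """
    totals = defaultdict(float)
    for country, weight, _ in records:
        totals[country] += weight
    # Rescaled weights: each country's weights sum to 1 (equal country weights).
    equal_w = [weight / totals[country] for country, weight, _ in records]
    w_sum = sum(equal_w)
    mean = sum(w * x for w, (_, _, x) in zip(equal_w, records)) / w_sum
    var = sum(w * (x - mean) ** 2 for w, (_, _, x) in zip(equal_w, records)) / w_sum
    sd = math.sqrt(var)
    return [target_mean + target_sd * (x - mean) / sd for _, _, x in records]


if __name__ == "__main__":
    # Two hypothetical countries with very different sampling weights.
    data = [("A", 1.0, -1.0), ("A", 1.0, 1.0), ("B", 10.0, 0.0), ("B", 10.0, 2.0)]
    print(to_international_metric(data))
```

Because the weights are rescaled within country, a large country does not pull the international mean towards its own average.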

NRC Meeting Madrid February 2010 Estimation of changes in cognitive knowledge Test items from CIVED included as an intact cluster 17 countries with comparable data –Three countries with grade 9 in CIVED and additional grade 9 samples in ICCS Small number of items in some countries had to be discarded due to translation errors or differences between ICCS and CIVED

NRC Meeting Madrid February 2010 Estimation of changes in cognitive knowledge - 2 Comparison of item parameters showed high similarity (correlation of 0.95) Slight positioning effect due to different test designs –CIVED: One booklet –ICCS: CIVED link cluster in each of the three positions CIVED items at beginning slightly easier, at end slightly harder than in ICCS

NRC Meeting Madrid February 2010 Estimation of changes in cognitive knowledge - 3

NRC Meeting Madrid February 2010 Estimation of changes in cognitive knowledge - 4 Framework broadened since CIVED –Re-scaling CIVED data to equate with ICCS not appropriate Selection of CIVED items not representative for overall CIVED test –Equating link items with CIVED scale (or sub-scale) also not appropriate Solution: Establish new comparison scale based only on 17 link items

NRC Meeting Madrid February 2010 Estimation of changes in cognitive knowledge - 5 Concurrent calibration of item parameters based on calibration samples with 34 samples from 17 countries (CIVED and ICCS) Establishing a metric with a mean of 500 and SD of 100 for equally weighted 17 CIVED countries For results in tables, weighted likelihood estimates were used –Usually unbiased for country averages

NRC Meeting Madrid February 2010 Questionnaire item analysis Missing data issues Item dimensionality and scaling review Item/scale adjudication Scaling procedures

NRC Meeting Madrid February 2010 Missing data - 1 On average about 3 percent of students have missing scale scores –Only in two countries are there higher percentages, of 18 and 12 percent In the teacher survey data, relatively low missing percentages were found (about 2 percent) Very low percentages of missing data in school questionnaire

NRC Meeting Madrid February 2010 Missing data - 2 Concerns about missing data for socio-economic indicators –Highest parental occupation: 5% –Highest parental education: 3% –Books at home: 1% However, in a few countries higher percentages of missing data were found (up to 15% for parental education)

NRC Meeting Madrid February 2010 Analysis of item dimensionality Exploratory and confirmatory factor analyses showed generally very similar results to those from the field trial These analyses will be described in detail in the ICCS technical report

NRC Meeting Madrid February 2010 Scaling analysis Scale reliabilities (Cronbach’s alpha) –Over 0.7 satisfactory internal consistency Item-total correlations: –Useful for reviewing translation errors Scaling with IRT Partial Credit Model –Item fit –Category characteristic curves
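
For reference, Cronbach's alpha can be computed as in the sketch below, using the standard formula $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_T^2}\right)$ for k items. The function name and the tiny data set are hypothetical, and complete (non-missing) responses are assumed.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix of scores.

    responses: one row per respondent, one column per item,
    with no missing values (an assumption of this sketch).
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Four hypothetical respondents answering three Likert-type items.
    data = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 4]]
    alpha = cronbach_alpha(data)
    print(round(alpha, 3), "satisfactory" if alpha > 0.7 else "below 0.7")
```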

NRC Meeting Madrid February 2010 Item and scale adjudication Only three scales with median scale reliabilities below 0.7 –Democratic value beliefs, civic participation in community and at school Adjudication for student, teacher, school and each regional questionnaire Some items were removed from scale In some cases, single-item reporting

NRC Meeting Madrid February 2010 Scaling procedures - 1 IRT scaling with Partial Credit Model So-called weighted likelihood estimates as scale scores International metric with mean of 50 and a standard deviation of 10

NRC Meeting Madrid February 2010 Scaling procedures - 2 Item parameter calibration with ACER ConQuest Calibration samples: –500 students per country –250 teachers per country –All school data with equal weights for each country Only data from countries that met sampling requirements (categories 1 or 2) included in calibration

NRC Meeting Madrid February 2010 Questionnaire scales Advantages of IRT scales –Inclusion of students with at least two item responses per scale –Possibility to describe scale From IRT Partial Credit Model it is possible to map scale scores to expected item responses Item maps will be provided in appendix to international report

NRC Meeting Madrid February 2010 Example of item map

NRC Meeting Madrid February 2010 Data analysis for reporting Estimation of sampling variance Estimation of measurement variance Reporting of differences

NRC Meeting Madrid February 2010 Estimation of sampling variance Data from cluster samples are not simple random samples –Standard formula for estimating sampling error not appropriate Jackknife repeated replication technique used for ICCS IDB Analyser, WESVAR or SPSS/SAS macros may be used for applying this methodology
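
The idea behind jackknife repeated replication can be sketched as follows. The example assumes the paired variant in which schools are grouped into zones and, for each zone's replicate, one school's weights are set to zero while the other's are doubled; the sampling variance is then the sum of squared differences between each replicate estimate and the full-sample estimate. Function and argument names are hypothetical and do not reproduce any of the macros named on the slide.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def jrr_standard_error(values, weights, zones, drop_flags):
    """Full-sample weighted mean and its JRR standard error.

    zones[i]:      jackknife zone of student i.
    drop_flags[i]: True if student i's school is zeroed in its zone's
                   replicate (the paired school is doubled).
    """
    full = weighted_mean(values, weights)
    sampling_variance = 0.0
    for zone in sorted(set(zones)):
        replicate = []
        for w, z, drop in zip(weights, zones, drop_flags):
            if z != zone:
                replicate.append(w)        # other zones keep their weight
            elif drop:
                replicate.append(0.0)      # dropped school
            else:
                replicate.append(2.0 * w)  # paired school counts twice
        sampling_variance += (weighted_mean(values, replicate) - full) ** 2
    return full, math.sqrt(sampling_variance)


if __name__ == "__main__":
    # Four hypothetical students, one per school, in two zones.
    est, se = jrr_standard_error(
        values=[10.0, 20.0, 30.0, 40.0],
        weights=[1.0, 1.0, 1.0, 1.0],
        zones=[1, 1, 2, 2],
        drop_flags=[True, False, True, False],
    )
    print(est, se)
```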

NRC Meeting Madrid February 2010 Estimation of measurement variance Using plausible values allows estimating the measurement error –The variation between the five PVs can be used for estimation IDB Analyser, WESVAR or SPSS macros (ACER replicates module) include features to do this More information will be provided at the training workshop on Wednesday
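
One common way to combine the five plausible values is sketched below, assuming the usual multiple-imputation rules: the statistic is computed once per plausible value, the final estimate is the average, the measurement variance is the variance between the five estimates inflated by (1 + 1/M), and the total error variance adds the sampling variance. The function name and example numbers are hypothetical.

```python
import math
from statistics import mean, variance


def combine_plausible_values(estimates, sampling_variances):
    """Combine per-PV estimates into one estimate and standard error.

    estimates:          the statistic computed with each plausible value.
    sampling_variances: the sampling variance (e.g. from JRR) for each one.
    """
    m = len(estimates)
    point = mean(estimates)
    sampling_var = mean(sampling_variances)
    # Between-PV variance reflects measurement error in the ability estimates.
    measurement_var = (1 + 1 / m) * variance(estimates)
    return point, math.sqrt(sampling_var + measurement_var)


if __name__ == "__main__":
    pv_means = [500.0, 502.0, 498.0, 501.0, 499.0]
    pv_sampling_vars = [9.0, 9.0, 9.0, 9.0, 9.0]
    print(combine_plausible_values(pv_means, pv_sampling_vars))
```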

NRC Meeting Madrid February 2010 Reporting of differences - 1 The following types of significance tests will be reported: –For differences in population estimates between countries –For differences between a country and the international average –For differences in population estimates between subgroups within countries –For differences between population estimates in ICCS and in CIVED (trend estimation)

NRC Meeting Madrid February 2010 Reporting of differences - 2 Adjustment for multiple comparisons with Dunn-Bonferroni method –Increasing the critical value (p < .05) from 1.96 to a higher, adjusted value SE for differences between samples: $SE_{dif\_ab} = \sqrt{SE_a^2 + SE_b^2}$ Estimation of SE for sub-group differences with JRR
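
The two ingredients of these tests can be sketched as follows. The example assumes a two-sided test on a normal reference distribution, a Bonferroni-type adjustment that divides the significance level by the number of comparisons, and independent samples so that the variances of the two estimates add. The number of comparisons and all numeric inputs are hypothetical.

```python
import math
from statistics import NormalDist


def bonferroni_critical_value(n_comparisons: int, alpha: float = 0.05) -> float:
    """Two-sided critical z value after dividing alpha by n_comparisons."""
    return NormalDist().inv_cdf(1 - alpha / (2 * n_comparisons))


def se_difference(se_a: float, se_b: float) -> float:
    """Standard error of a difference between two independent samples."""
    return math.sqrt(se_a ** 2 + se_b ** 2)


def is_significant(est_a, se_a, est_b, se_b, n_comparisons=1) -> bool:
    """Compare the standardised difference with the adjusted critical value."""
    z = abs(est_a - est_b) / se_difference(se_a, se_b)
    return z > bonferroni_critical_value(n_comparisons)


if __name__ == "__main__":
    print(round(bonferroni_critical_value(1), 2))  # unadjusted value
    print(bonferroni_critical_value(10))           # hypothetical 10 comparisons
    print(is_significant(520.0, 3.0, 500.0, 4.0, n_comparisons=10))
```

With a single comparison the function returns the familiar unadjusted value of about 1.96; more comparisons raise the threshold.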

NRC Meeting Madrid February 2010 Reporting of differences - 3 For the SE of trend differences it is important to take the equating error into account The estimation of SE for differences between CIVED and ICCS can be computed as $SE_{dif} = \sqrt{SE_{ICCS}^2 + SE_{CIVED}^2 + EqErr^2}$, where $EqErr$ is the equating error The equating error in the international metric is 3.31
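
A small sketch of the trend standard error, assuming the usual approach of adding the squared equating error to the squared standard errors of the two estimates. Only the equating error of 3.31 comes from the slide; the function name and the country-level standard errors in the example are hypothetical.

```python
import math

EQUATING_ERROR = 3.31  # equating error in the international metric (from the slide)


def se_trend_difference(se_iccs: float, se_cived: float,
                        equating_error: float = EQUATING_ERROR) -> float:
    """Standard error of an ICCS minus CIVED difference, including equating error."""
    return math.sqrt(se_iccs ** 2 + se_cived ** 2 + equating_error ** 2)


if __name__ == "__main__":
    # Hypothetical standard errors for one country's two cycle means.
    print(round(se_trend_difference(3.0, 4.0), 2))
```

Even when both sampling errors are small, the trend standard error cannot fall below the equating error itself.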

NRC Meeting Madrid February 2010 Multivariate analysis Multiple regression models were used for the tables in draft Chapter 7 –Bivariate regression –Multiple regression Multi-level models were used for the analysis in draft Chapter 8 –Students nested within classrooms –Classrooms mostly equivalent to schools

NRC Meeting Madrid February 2010 Questions or comments?