Technical Considerations in Alignment for Computerized Adaptive Testing Liru Zhang, Delaware DOE Shudong Wang, NWEA 2014 CCSSO NCSA New Orleans, LA, June 25-27, 2014.

Similar presentations
1 What Is The Next Step? - A review of the alignment results Liru Zhang, Katia Forêt & Darlene Bolig Delaware Department of Education 2004 CCSSO Large-Scale.

What is a CAT?. Introduction COMPUTER ADAPTIVE TEST + performance task.
Iowa Assessment Update School Administrators of Iowa November 2013 Catherine Welch Iowa Testing Programs.
Validity in Action: State Assessment Validity Evidence for Compliance with NCLB William D. Schafer, Joyce Wang, and Vivian Wang University of Maryland.
Designing Scoring Rubrics. What is a Rubric? Guidelines by which a product is judged Guidelines by which a product is judged Explain the standards for.
Advanced Topics in Standard Setting. Methodology Implementation Validity of standard setting.
Chapter Fifteen Understanding and Using Standardized Tests.
©2010 Prentice Hall Business Publishing, Auditing 13/e, Arens//Elder/Beasley Audit Sampling for Tests of Controls and Substantive Tests of Transactions.
Math-Science Subgroup Report Recommendations. APEC Context Members are keenly interested in collaborating to learn from each other how to provide 21 st.
1 Alignment of Alternate Assessments to Grade-level Content Standards Brian Gong National Center for the Improvement of Educational Assessment Claudia.
Chapter 7 Sampling Distributions
MCAS-Alt: Alternate Assessment in Massachusetts Technical Challenges and Approaches to Validity Daniel J. Wiener, Administrator of Inclusive Assessment.
Using Growth Models for Accountability Pete Goldschmidt, Ph.D. Assistant Professor California State University Northridge Senior Researcher National Center.
New Hampshire Enhanced Assessment Initiative: Technical Documentation for Alternate Assessments Alignment Inclusive Assessment Seminar Brian Gong Claudia.
BA 427 – Assurance and Attestation Services
Questions to check whether or not the test is well designed: 1. How do you know if a test is effective? 2. Can it be given within appropriate administrative.
1 ITEA Presentation of STANDARDS FOR TECHNOLOGICAL LITERACY William E. Dugger Jr., DTE Pam B. Newberry September, 2000.
A Comparison of Progressive Item Selection Procedures for Computerized Adaptive Tests Brian Bontempo, Mountain Measurement Gage Kingsbury, NWEA Anthony.
Copyright © 2007 Pearson Education Canada 1 Chapter 12: Audit Sampling Concepts.
Chapter 14 Understanding and Using Standardized Tests Viewing recommendations for Windows: Use the Arial TrueType font and set your screen area to at least.
Determining Sample Size
Analyzing Reliability and Validity in Outcomes Assessment (Part 1) Robert W. Lingard and Deborah K. van Alphen California State University, Northridge.
Ensuring State Assessments Match the Rigor, Depth and Breadth of College- and Career- Ready Standards Student Achievement Partners Spring 2014.
1 Alignment of Standards, Large-scale Assessments, and Curriculum: A Review of the Methodological and Empirical Literature Meagan Karvonen Western Carolina.
Alignment Powerful Tool for Focusing Instruction, Curricula, and Assessment.
Charteredaccountants.com.au/training Fundamentals of Auditing in 2007 Chartered Accountants Audit Conference ASA 530 – Audit Sampling and Other Means of.
Assessment Literacy Series 1 -Module 6- Quality Assurance & Form Reviews.
Classroom Assessments Checklists, Rating Scales, and Rubrics
Liru Zhang, Delaware DOE Shudong Wang, NWEA Presented at the 2015 NCSA Annual Conference, San Diego, CA 1.
CCSSO Criteria for High-Quality Assessments Technical Issues and Practical Application of Assessment Quality Criteria.
Issues in Comparability of Test Scores Across States Liru Zhang, Delaware DOE Shudong Wang, NWEA Presented at the 2014 CCSSO NCSA New Orleans, LA June.
Issues Related to Judging the Alignment of Curriculum Standards and Assessments Norman L. Webb Wisconsin Center for Education Research University of Wisconsin-Madison.
Standard Setting Results for the Oklahoma Alternate Assessment Program Dr. Michael Clark Research Scientist Psychometric & Research Services Pearson State.
SCOTT MARION CENTER FOR ASSESSMENT CCSSO JUNE 22, 2010 Ensuring, Evaluating, & Documenting Comparability of AA-AAS Scores.
Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc.. Chap 7-1 Chapter 7 Sampling Distributions Basic Business Statistics.
Pearson Copyright 2010 Some Perspectives on CAT for K-12 Assessments Denny Way, Ph.D. Presented at the 2010 National Conference on Student Assessment June.
Auditing: The Art and Science of Assurance Engagements Chapter 13: Audit Sampling Concepts Copyright © 2011 Pearson Canada Inc.
McGraw-Hill/Irwin © 2012 The McGraw-Hill Companies, Inc. All rights reserved. Obtaining Valid and Reliable Classroom Evidence Chapter 4:
Georgia will lead the nation in improving student achievement. 1 Georgia Performance Standards Day 3: Assessment FOR Learning.
NATIONAL CONFERENCE ON STUDENT ASSESSMENT JUNE 22, 2011 ORLANDO, FL.
Practical Issues in Computerized Testing: A State Perspective Patricia Reiss, Ph.D Hawaii Department of Education.
1 Scoring Provincial Large-Scale Assessments María Elena Oliveri, University of British Columbia Britta Gundersen-Bryden, British Columbia Ministry of.
How was LAA 2 developed?  Committee of Louisiana educators (general ed and special ed) Two meetings (July and August 2005) Facilitated by contractor.
End of Course Exams  In February, 2007 the Missouri State Board of Education approved End of Course (EOC) exams.  WHY?
Measuring Progress and Planning Learning William E. Dugger, Jr. Shelli D. Meade.
Establishing by the laboratory of the functional requirements for uncertainty of measurements of each examination procedure Ioannis Sitaras.
Sampling and Sampling Distribution
Designing Scoring Rubrics
Examining Achievement Gaps
What is a CAT?
Pre-Referral to Special Education: Considerations
VALIDITY by Barli Tambunan/
Moderator: Dr. Jan Barth Presenters: Dr. Gary Cook, Dr
Test Blueprints for Adaptive Assessments
Validity and Reliability
Item pool optimization for adaptive testing
Booklet Design and Equating
Analyzing Reliability and Validity in Outcomes Assessment Part 1
Considerations of Content Alignment in CAT
Standard Setting for NGSS
Aligned to Common Core State Standards
Brian Gong Center for Assessment
Mohamed Dirir, Norma Sinclair, and Erin Strauts
Shudong Wang, NWEA Liru Zhang, Delaware DOE G. Gage Kingsbury, NWEA
William D. Schafer, Joyce Wang, and Vivian Wang University of Maryland
Understanding and Using Standardized Tests
Analyzing Reliability and Validity in Outcomes Assessment
Innovative Approaches for Examining Alignment
Presentation transcript:

Technical Considerations in Alignment for Computerized Adaptive Testing Liru Zhang, Delaware DOE Shudong Wang, NWEA 2014 CCSSO NCSA New Orleans, LA June 25-27, 2014

Background of Alignment Alignment is an important attribute of standards-based educational reform. Peer Review requires states to submit evidence demonstrating alignment between the assessment and its content standards (Standards and Assessments Peer Review Guidance, 2009), in addition to validity evidence. For Next Generation assessments, alignment is defined as the degree to which the expectations specified in the Common Core State Standards and the assessment are in agreement and work in conjunction with one another to guide the system toward students learning what they are supposed to know and be able to do. With new and innovative technology and the great potential of online testing, computerized adaptive testing (CAT) has been increasingly implemented in K-12 assessment systems.

Alignment for Linear Tests Different approaches have been employed over the past decade to evaluate the alignment of state assessment programs. Their processes are similar in four ways: (1) they use the approved content standards and the implemented tests; (2) they review, item by item, a developed linear test form; (3) they rely on professional judgment about alignment; and (4) they evaluate the degree to which each item matches the claim, standard, objective and/or topic, and performance expectations (e.g., cognitive complexity and depth of knowledge) in the standards.

Linear Test vs. CAT Linear tests: Test form(s) are assembled prior to administration based on the test specifications, and the fixed form(s) are used for all students. Linear tests normally target a medium difficulty level. The degree of alignment is determined by comparing the content of the test form(s) with predetermined criteria. Adaptive tests: Many unique test forms are assembled during testing for individual students, and the difficulty of these unique forms varies greatly to match estimated student ability levels. How should the degree of alignment be determined for adaptive testing?

Alignment for CAT In adaptive testing, the item pool is the source from which test forms are assembled for individual students; therefore, alignment should be considered the relationship between the item pool and the standards it is intended to measure. Given this adaptive nature, technical issues must be considered in the design, process, and evaluation of alignment for CAT. What kinds of technical issues should be considered? Should the alignment be based on the entire item pool or on a sample of items? How should a representative sample of items be selected so that it supports a fair inference about the item pool? Is it appropriate to use the same procedures and criteria as for linear tests to evaluate the alignment of an adaptive test?

Technical Considerations To demonstrate the possible impact of factors such as the characteristics of the item pool, student proficiency levels, and item exposure rates on alignment results, samples of items and individual test forms were selected from the grade 3 mathematics assessment. Step One: Seven item samples (50 items each) were selected from the pool: two random samples; a weighted sample based on the four content strands and their relative proportions in the pool; two samples of typical items selected based on the frequency distributions of person and item parameters; and two samples based on item exposure rate, excluding never-used items.
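
As a rough illustration of Step One, the sketch below shows how item samples of this kind might be drawn from a pool. The pool itself, its column names (strand, b, exposure), and the sampling details are hypothetical stand-ins, not the authors' actual procedure or data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical pool: 1,200 Rasch-calibrated grade 3 math items
pool = pd.DataFrame({
    "item_id": np.arange(1200),
    "strand": rng.choice(["Numeric", "Algebraic", "Geometric", "Quantitative"],
                         size=1200, p=[0.40, 0.25, 0.20, 0.15]),
    "b": rng.normal(0.0, 1.2, size=1200),       # item difficulty in logits
    "exposure": rng.beta(1, 12, size=1200),     # share of examinees who saw the item
})
n = 50  # items per sample, as in Step One

# (1) Two simple random samples
random1 = pool.sample(n, random_state=1)
random2 = pool.sample(n, random_state=2)

# (2) A sample weighted by each strand's relative proportion in the pool
weighted = pd.concat([
    g.sample(max(1, round(n * len(g) / len(pool))), random_state=3)
    for _, g in pool.groupby("strand")
])

# (3) A "typical item" sample that mirrors the pool's difficulty distribution by logit bin
pool["b_bin"] = pd.cut(pool["b"], np.arange(-4, 5, 1.0))
typical1 = pd.concat([
    g.sample(min(len(g), max(1, round(n * len(g) / len(pool)))), random_state=4)
    for _, g in pool.groupby("b_bin", observed=True)
])

# (4) An exposure-based sample that excludes never-used items
expo1 = pool[pool["exposure"] > 0].sample(n, random_state=5)

print(weighted["strand"].value_counts(normalize=True).round(2))
```

The same stratified logic used for the typical-item sample could be reused for any other frequency distribution (e.g., person parameters) by changing the binning variable.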

Sample Selection [Table: N and % by logit bin for the person parameters, the item parameters, and the Typical 1 and Typical 2 item samples, with totals.]

Test Specifications and Constraints [Table: N and % by content strand (1. Numeric, 2. Algebraic, 3. Geometric, 4. Quantitative) in the test specifications and in the item pool; and N, %, and minimum/maximum constraints by item type (TE, MC, GR) and on-grade vs. off-grade status.]

Comparison 1: Content Balance [Table: N and % of items in each content strand (Numeric, Algebraic, Geometric, Quantitative) for the test specifications and for the Random 1, Random 2, Standards, Typical 1, Typical 2, Expo 1, and Expo 2 samples, with totals.]

Comparison 2: Balance of Item Type [Table: number of items by type (MC, TE, on-grade, off-grade; total of 50) in the test specifications and in the Sample 1, Sample 2, Standards, Typical 1, Typical 2, Expo 1, and Expo 2 samples.]

Comparison 3: Item Difficulty Distribution [Table: percentage of the item pool and of student ability by logit bin, with N and % for the Random 1, Random 2, and Standards samples, and totals.]

Comparison 3 (continued): Item-Parameter Distribution [Table: percentage of the item pool and of student ability by logit bin, with N and % for the Typical 1, Typical 2, Expo 1, and Expo 2 samples, and totals.]

Comparison 4: Item Exposure Rate [Table: for each sample (Random 1, Random 2, Standards, Typical 1, Typical 2, Expo 1, Expo 2), the number and percentage of under-exposed items (0% and ≤ 5%), over-exposed items (≥ 25%), their subtotal, normally exposed items, and the total.]
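
The exposure bands on this slide (0%, at or below 5%, at or above 25%) suggest a simple classification. The short sketch below is one hypothetical way to tabulate them; the rates are made up and this is not the study's code.

```python
from collections import Counter

def classify_exposure(rate: float) -> str:
    """Label an item's exposure rate (administrations / examinees) using the slide's bands."""
    if rate == 0.0:
        return "never used"
    if rate <= 0.05:
        return "under-exposed"
    if rate >= 0.25:
        return "over-exposed"
    return "normal"

# Toy exposure rates for a handful of items in one 50-item sample
rates = [0.0, 0.02, 0.08, 0.31, 0.12, 0.0, 0.27, 0.04]
print(Counter(classify_exposure(r) for r in rates))
```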

Technical Considerations 2 Step Two: Fourteen individual test forms were selected based on the frequency distribution of estimated student ability (theta) from the grade 3 mathematics assessment. [Table: N and % of student abilities (theta) and of the sampled forms by logit bin, with totals.]
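
A minimal sketch of how forms might be allotted to one-logit theta bins in proportion to the ability distribution, as in Step Two. The simulated thetas and bin edges are stand-ins, not the Delaware data.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(0.0, 1.0, size=5000)      # simulated ability estimates (theta)

bins = np.arange(-3.5, 4.0, 1.0)             # hypothetical one-logit bins
counts, _ = np.histogram(theta, bins=bins)
proportions = counts / counts.sum()

n_forms = 14
forms_per_bin = np.round(proportions * n_forms).astype(int)  # may need a +/- 1 adjustment to sum to 14
for lo, hi, k in zip(bins[:-1], bins[1:], forms_per_bin):
    print(f"theta in [{lo:+.1f}, {hi:+.1f}): select {k} form(s)")
```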

Comparison 1: Content and Item Types [Table: for each of the 14 forms, the theta level and the number of items by content strand (1-4) and by item type (MC, TE, on-grade, off-grade).]

Comparison 2: Item Exposure Rate [Tables: number of under-exposed, normally exposed, and over-exposed items in the item pool and across the 14 forms; and the pool utilization rate (used items as a share of the pool) and the overlap rate among forms (≥ 35% vs. ≤ 35%).]
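
Pool utilization and form overlap, as summarized on this slide, can be computed from the sets of items each form administered. The sketch below uses toy forms and a made-up pool size purely for illustration.

```python
from itertools import combinations

# Toy data: each form is the set of item IDs it administered; the pool size is hypothetical
forms = {
    "F01": {1, 2, 3, 4, 5},
    "F02": {3, 4, 5, 6, 7},
    "F03": {10, 11, 12, 13, 14},
}
pool_size = 40

# Pool utilization: share of pool items appearing on at least one form
used = set().union(*forms.values())
utilization = len(used) / pool_size

# Pairwise overlap: items shared by two forms as a share of the (equal) form length
overlap = {
    (a, b): len(forms[a] & forms[b]) / len(forms[a])
    for a, b in combinations(forms, 2)
}
print(f"pool utilization = {utilization:.0%}")
print({pair: f"{r:.0%}" for pair, r in overlap.items()})
```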

Comparison 3: Use of Off-Grade Items [Table: for each sample form, the theta level, the number of items, and the number and percentage of off-grade items from content strand 1 and from content strands 2, 3 & 4.]

Comparison 4: Content Balance of Item Pool [Table: N and % of items by content strand (1. Numeric, 2. Algebraic, 3. Geometric, 4. Quantitative) in the test specifications and in the item pool, with the difference in percentage, and totals.]

Comparison 5: Item Parameters in the Item Pool [Table: percentage of student abilities (theta) and of item difficulties (b) by logit bin, with the difference, and totals.]
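
Comparison 5 contrasts the distribution of student abilities with the distribution of item difficulties bin by bin. The sketch below reproduces that kind of tabulation with simulated theta and b values, not the operational data.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = rng.normal(0.0, 1.0, size=5000)   # simulated student abilities
b = rng.normal(0.5, 1.2, size=1200)       # simulated item difficulties

bins = np.arange(-4.0, 4.5, 1.0)          # one-logit bins
theta_pct = np.histogram(theta, bins=bins)[0] / theta.size * 100
b_pct = np.histogram(b, bins=bins)[0] / b.size * 100

for lo, hi, tp, bp in zip(bins[:-1], bins[1:], theta_pct, b_pct):
    print(f"[{lo:+.1f}, {hi:+.1f}): theta {tp:5.1f}%  items {bp:5.1f}%  diff {tp - bp:+6.1f}%")
```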

Mapping of Person- and Item-Parameters

Technical Issues in Alignment for CAT In adaptive testing, each successive item is chosen, subject to a set of constraints, to maximize information at the estimated ability or to minimize the deviation of the information from a target value. Among the many requirements for CAT, a sizeable and well-balanced item pool, with regard to both content and psychometric characteristics, is a fundamental condition for success. To realize the advantages of CAT, the item pool must contain high-quality items that match the criteria of the item-selection algorithm at many different levels of proficiency, so that adequate information can be provided. In addition to content constraints, a constraint on item exposure rate is essential for item utility and test security.
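
To make the item-selection idea concrete, here is a minimal sketch of maximum-information selection for a Rasch-calibrated pool with a simple content-strand quota. The pool structure and constraint handling are simplified illustrations, not the operational CAT algorithm.

```python
import math

def rasch_info(theta: float, b: float) -> float:
    """Fisher information of a Rasch item at ability theta: p * (1 - p)."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def pick_next_item(theta, pool, administered, strand_quota):
    """Return the unused item with maximum information whose strand still has quota left."""
    eligible = [item for item in pool
                if item["id"] not in administered and strand_quota[item["strand"]] > 0]
    return max(eligible, key=lambda item: rasch_info(theta, item["b"]), default=None)

# Toy pool and quota, purely for illustration
pool = [{"id": 1, "b": -1.0, "strand": "Numeric"},
        {"id": 2, "b": 0.2, "strand": "Algebraic"},
        {"id": 3, "b": 0.3, "strand": "Numeric"}]
quota = {"Numeric": 1, "Algebraic": 1, "Geometric": 0, "Quantitative": 0}
print(pick_next_item(theta=0.1, pool=pool, administered=set(), strand_quota=quota))
```

An operational algorithm would also apply exposure control and update the ability estimate after each response; this sketch only shows the maximum-information step under a content constraint.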

Technical Issues in Alignment for CAT Deficits and excesses of items for certain assessed content standards create under- and over-exposure of test items in the pool. Under-exposed items are often overrepresented in the pool or lack the desirable characteristics to meet the constraints for item selection; unused items in the pool are an unfortunate waste of resources. Over-exposed items not only jeopardize test security but may also result in positively biased ability estimates, which seriously threatens the validity of high-stakes assessments. In K-12 assessment, the large population, the wide range of proficiency levels among students, the broad content coverage, and the high-stakes nature of testing introduce tremendous technical challenges in the development and implementation of CAT.

Criteria Commonly Used in Alignment
Alignment Level | Categorical Concurrence | Depth of Knowledge | Range of Knowledge | Balance of Representation
Acceptable | at least 6 items per standard | 50% | 50% | .70
Weak | 6 items per standard | 40% - 49% | 40% - 49% | .60 - .69
Unacceptable | less than 6 items per standard | less than 40% | less than 40% | less than .60
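
For readers who want to see how such criteria translate into computation, the sketch below implements two of them. The balance index follows Webb's common formulation, 1 - sum(|1/O - I_k/H|) / 2; both functions are illustrations rather than the presenters' exact method.

```python
from collections import Counter

def categorical_concurrence(items_per_standard: dict, minimum: int = 6) -> dict:
    """Acceptable if a standard is hit by at least `minimum` items (6 in the table above)."""
    return {std: ("acceptable" if n >= minimum else "unacceptable")
            for std, n in items_per_standard.items()}

def balance_of_representation(objective_hits: list) -> float:
    """Balance index over the objectives hit within one standard.

    O = number of objectives hit, H = total items hit, I_k = items hitting objective k.
    """
    counts = Counter(objective_hits)
    O, H = len(counts), len(objective_hits)
    return 1 - sum(abs(1 / O - k / H) for k in counts.values()) / 2

# Toy inputs, purely for illustration
print(categorical_concurrence({"Numeric": 9, "Geometric": 4}))
print(round(balance_of_representation(["obj1", "obj1", "obj2", "obj3", "obj3", "obj3"]), 2))
```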

A Brief Summary 1. Technical issues, such as the test specifications, the item-selection algorithm and its constraints, and item exposure rates, must be taken into account in the design, item-review process, and evaluation of alignment for computerized adaptive testing. 2. Whether the alignment is based on the entire item pool or on a sample of items, inferences about the alignment of an adaptive test must be supported with evidence from both content/curriculum and technical perspectives. 3. The criteria commonly used to evaluate the alignment of linear tests may not be suitable for evaluating the alignment of computerized adaptive tests.