




1 Technical Considerations in Alignment for Computerized Adaptive Testing Liru Zhang, Delaware DOE Shudong Wang, NWEA 2014 CCSSO NCSA New Orleans, LA June 25-27, 2014

2 Background of Alignment Alignment is an important attribute in standards-based educational reform. Peer Review requires states to submit evidence demonstrating alignment between the assessment and its content standards (Standards and Assessments Peer Review Guidance, 2009) in addition to validity evidence. For the Next Generation Assessment, alignment is defined as the degree to which the expectations specified in the Common Core State Standards and the assessment are in agreement and serve in conjunction with one another to guide the system toward students learning what they are supposed to know and be able to do. With new and innovative technology and the great potential of online testing, computerized adaptive testing (CAT) has been increasingly implemented in K-12 assessment systems.

3 Alignment for Linear Tests Different approaches have been employed in the past decade to evaluate the alignment of state assessment programs. Their processes are similar in four ways: (1) they use the approved content standards and the implemented tests; (2) they review items one by one from a developed linear test form; (3) they rely on professional judgment; and (4) they evaluate the degree to which each item matches the claim, standard, objective and/or topic, and performance expectations (e.g., cognitive complexity and depth of knowledge) in the standards.

4 Linear Test vs. CAT Linear tests: Test form(s) are assembled prior to administration based on the test specifications. The same fixed form(s) are used for all students. Linear tests normally target a medium difficulty level. The degree of alignment is determined by comparing the content of the test form(s) with predetermined criteria. Adaptive tests: Many unique test forms are assembled during testing for individual students. The difficulty of these unique forms varies greatly to match estimated student ability levels. How, then, should the degree of alignment be determined for adaptive testing?

5 Alignment for CAT In adaptive testing, the item pool is the source from which test forms are assembled for individual students; therefore, alignment should be considered as the relationship between the item pool and the standards it is intended to measure. Given the adaptive nature of the test, technical issues must be considered in the design, process, and evaluation of alignment for CAT. What kinds of technical issues should be considered? Should the alignment be based on the entire item pool or on a sample of items? How should a representative sample of items be selected so that it supports a fair inference about the item pool? Is it appropriate to use the same procedures and criteria developed for linear tests to evaluate the alignment of an adaptive test?

6 Technical Considerations To demonstrate the possible impact of factors such as the characteristics of the item pool, student proficiency levels, and item exposure rates on alignment results, samples of items and individual test forms were selected from the grade 3 mathematics assessment. Step One: Seven item samples (50 items each) were selected from the pool: Two random samples A weighted sample based on the four content strands and their relative proportions in the pool Two samples selected, as typical items, based on the frequency distributions of person- and item-parameters Two samples based on item exposure rate, excluding never-used items
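The sampling designs in Step One can be sketched as follows. The item records (strand labels, difficulty parameters, exposure rates) are simulated for illustration, since the actual 364-item pool is not shown in the presentation; all names here are illustrative.

```python
import random

# Simulated stand-in for the grade 3 mathematics pool (364 items).
random.seed(1)
STRANDS = ["Numeric", "Algebraic", "Geometric", "Quantitative"]
pool = [{"id": i,
         "strand": random.choice(STRANDS),
         "b": random.uniform(-4.0, 2.0),        # item difficulty (logits)
         "exposure": random.random() * 0.4}      # simulated exposure rate
        for i in range(364)]

def random_sample(pool, n=50):
    """Simple random sample of n items from the pool."""
    return random.sample(pool, n)

def weighted_sample(pool, n=50):
    """Sample each strand in proportion to its share of the pool."""
    by_strand = {}
    for item in pool:
        by_strand.setdefault(item["strand"], []).append(item)
    sample = []
    for items in by_strand.values():
        k = round(n * len(items) / len(pool))
        sample.extend(random.sample(items, k))
    return sample[:n]

def exposure_sample(pool, n=50):
    """Sample only from items that have actually been administered."""
    used = [item for item in pool if item["exposure"] > 0]
    return random.sample(used, n)
```

The "typical items" samples would additionally condition on the frequency distributions of person- and item-parameters shown on the next slide.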

7 Sample Selection

Logit    Person Par.     Item Par.     Typical 1     Typical 2
         N       %       N      %      N      %      N      %
-4.00    5       0       16     4      0      0      0      0
-3.00    125     1       37     10     0      0      0      0
-2.00    849     9       84     23     0      0      0      0
-1.00    2289    24      110    30     21     42     23     46
0.00     5385    56      113    31     29     58     27     54
1.00     776     8       4      1      0      0      0      0
2.00     189     2       0      0      0      0      0      0
3.00     45      0       0      0      0      0      0      0
Total    9663    100     364    100    50     100    50     100

8 Test Specifications and Constraints

Content Strand     Test Specs.     Item Pool
                   N      %        N      %
1. Numeric         30     60       166    46
2. Algebraic       8      16       82     22
3. Geometric       10     20       72     20
4. Quantitative    2      4        44     12
Total              50     100      364    100

Item Type      Test Specs.      Item Pool
               Min.    Max.     N      %
MC             48      48       338    93
TE             2       2        26     7
Item-Grade
On-Grade       40      50       319    88
Off-Grade      0       10       45     12

9 Comparison 1: Content Balance

Strand          Test Specs.    Random1      Random2      Standards
                N      %       N     %      N     %      N     %
Numeric         30     60      18    36     16    32     23    46
Algebraic       8      16      11    22     19    38     11    22
Geometric       10     20      13    26     12    24     10    20
Quantitative    2      4       8     16     3     6      6     12
Total           50     100     50    100    50    100    50    100

Strand          Typical 1     Typical 2    Expo1        Expo2
                N     %       N     %      N     %      N     %
Numeric         21    42      23    46     25    50     28    56
Algebraic       12    24      12    24     11    22     11    22
Geometric       12    24      13    26     11    22     10    20
Quantitative    5     10      2     4      3     6      1     2
Total           50    100     50    100    50    100    50    100

10 Comparison 2: Balance of Item Type

Item Type    Test Specs.    Sample1    Sample2    Standards    Typical1    Typical2    Expo1    Expo2
MC           48             48         47         45           48          47          50       50
TE           2              2          3          5            2           3           0        0
On-Grade     40-50          44         42         46           37          40          47       47
Off-Grade    0-10           6          8          4            13          10          3        3
Total        50             50         50         50           50          50          50       50

11 Comparison 3: Item Difficulty Distribution

Logit    Frequency (%)      Random1      Random2      Standards
         Pool    Ability    N     %      N     %      N     %
-4.00    4       0          4     8      3     6      0     0
-3.00    10      1          11    22     7     14     10    20
-2.00    23      9          9     18     10    20     13    26
-1.00    30      24         13    26     8     16     13    26
0.00     31      56         12    24     21    42     13    26
1.00     1       8          1     2      1     2      1     2
2.00     0       2          0     0      0     0      0     0
3.00     0       0          0     0      0     0      0     0
Total    100     100        50    100    50    100    50    100

12 Comparison 3: Item-Parameter Distribution

Logit    Frequency (%)      Typical1     Typical2     Expo1        Expo2
         Pool    Ability    N     %      N     %      N     %      N     %
-4.00    4       0          0     0      0     0      0     0      1     2
-3.00    10      1          0     0      0     0      2     4      4     8
-2.00    23      9          0     0      0     0      12    24     12    24
-1.00    30      24         21    42     23    46     21    42     17    34
0.00     31      56         29    58     27    54     14    28     15    30
1.00     1       8          0     0      0     0      1     2      1     2
2.00     0       2          0     0      0     0      0     0      0     0
3.00     0       0          0     0      0     0      0     0      0     0
Total    100     100        50    100    50    100    50    100    50    100

13 Comparison 4: Item Exposure Rate

Sample       Under Expo       Over Expo    Sub-Total    Normal Expo    Total
             0%     ≤ 5%      ≥ 25%        N     %      N     %        N
Random1      12     9         12           33    66     17    34       50
Random2      10     11        6            27    54     23    46       50
Standards    15     11        6            32    64     18    36       50
Typical1     8      15        12           27    54     23    46       50
Typical2     14     21        11           32    64     18    36       50
Expo1        0      0         20           20    40     30    60       50
Expo2        0      0         21           21    42     29    58       50
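The exposure cut-points used in this comparison (0% for never-used items, under-exposed at or below 5%, over-exposed at or above 25%) can be expressed as a small helper; the function names are illustrative, not part of the study's software.

```python
def exposure_rate(times_administered, n_examinees):
    """Exposure rate: share of examinees who were administered the item."""
    return times_administered / n_examinees

def classify_exposure(rate):
    """Apply the cut-points used in the comparison above: 0% means never
    used, at or below 5% is under-exposed, at or above 25% is over-exposed,
    and everything in between is normal."""
    if rate == 0.0:
        return "never used"
    if rate <= 0.05:
        return "under-exposed"
    if rate >= 0.25:
        return "over-exposed"
    return "normal"
```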

14 Technical Considerations 2 Step Two: Fourteen individual test forms were selected based on the frequency distribution of estimated student ability (theta) from the grade 3 mathematics assessment.

Logit    Student Ability (Theta)    Sample Forms
         N       %                  N     %
-4.00    5       0                  0     0
-3.00    125     1                  0     0
-2.00    849     9                  2     14
-1.00    2289    24                 4     29
0.00     5385    56                 5     36
1.00     776     8                  3     21
2.00     189     2                  0     0
3.00     45      0                  0     0
Total    9663    100                14    100

15 Comparison 1: Content and Item Types

Form    Theta    Content Strand       Item Type
                 1     2    3    4    MC    TE    On-Grade    Off-Grade
1       -2.70    30    8    10   2    48    2     49          1
2       -2.17    30    9    9    2    48    2     47          3
3       -1.69    30    9    9    2    48    2     46          4
4       -1.59    30    8    9    3    48    2     44          6
5       -1.48    30    8    9    3    48    2     45          5
6       -1.19    30    8    9    3    48    2     44          6
7       -0.95    30    8    9    3    48    2     42          8
8       -0.40    30    8    9    3    48    2     43          7
9       -0.14    30    8    9    3    48    2     42          8
10      -0.11    30    8    9    3    48    2     43          7
11      0.16     30    9    9    2    48    2     43          7
12      1.26     30    8    10   2    48    2     43          7
13      1.51     30    8    9    3    48    2     43          7
14      1.65     30    8    9    3    48    2     43          7

16 Comparison 2: Item Exposure Rate

             Under Expo     Normal                           Over Expo               Total
Unit         0%     ≤ 5%    5.1-25    25.1-45    45.1-60    60.1-80    80.1-100     N
Item Pool    26     20      34        13         3          3          1            364
14 Forms     0      5       35        29         10         16         5            208
(cell values are percentages of items)

Unit        Item Pool Utilization Rate      Overlap Rate
            Pool    Used Items    Rate      ≥ 35%             ≤ 35%
14 Forms    364     208           57%       47 items (23%)    161 items (77%)
                                            range 36-93%      range 7-29%
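Pool utilization and form overlap as reported in this comparison reduce to simple set operations; a minimal sketch with illustrative names:

```python
def utilization_rate(pool_ids, used_ids):
    """Share of the pool that appeared on at least one administered form."""
    return len(set(pool_ids) & set(used_ids)) / len(set(pool_ids))

def overlap_rate(form_a, form_b):
    """Share of items two equal-length forms have in common."""
    return len(set(form_a) & set(form_b)) / len(set(form_a))
```

With 208 of the 364 pool items appearing on the fourteen sampled forms, `utilization_rate` gives roughly 57%, matching the table.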

17 Comparison 3: Use of Off-Grade Items

Sample Form    Theta    N. Items    Content Strand 1    Content Strands 2, 3 & 4
                                    N     %             N     %
1              -2.70    1           0     0             1     100
2              -2.17    3           2     67            1     33
3              -1.69    4           3     75            1     25
4              -1.59    6           3     50            3     50
5              -1.48    5           3     60            2     40
6              -1.19    6           5     83            1     17
7              -0.95    8           5     63            3     37
8              -0.40    7           5     71            2     29
9              -0.14    8           5     63            3     37
10             -0.11    7           5     71            2     29
11             0.16     7           4     57            3     43
12             1.26     7           5     71            2     29
13             1.51     7           5     71            2     29
14             1.65     7           5     71            2     29

18 Comparison 4: Content Balance of Item Pool

Content Strand     Test Specs.     Item Pool      Diff. (%)
                   N      %        N      %
1. Numeric         30     60       166    46      -14
2. Algebraic       8      16       82     22      +6
3. Geometric       10     20       72     20      0
4. Quantitative    2      4        44     12      +8
Total              50     100      364    100
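The Diff. column is simply each strand's share of the item pool minus its share of the test specifications; a minimal helper (names illustrative, with small discrepancies possible from rounding):

```python
def balance_diff(spec_counts, pool_counts):
    """Percentage-point difference between each strand's share of the item
    pool and its share of the test specifications (pool % minus spec %)."""
    spec_total = sum(spec_counts.values())
    pool_total = sum(pool_counts.values())
    return {strand: round(100 * pool_counts[strand] / pool_total)
                    - round(100 * spec_counts[strand] / spec_total)
            for strand in spec_counts}

# Figures from the table above: e.g. Numeric is 166/364 = 46% of the pool
# but 30/50 = 60% of the specifications, a difference of -14 points.
specs = {"Numeric": 30, "Algebraic": 8, "Geometric": 10, "Quantitative": 2}
pool = {"Numeric": 166, "Algebraic": 82, "Geometric": 72, "Quantitative": 44}
```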

19 Comparison 5: Item-Parameter in Item Pool

Logit    Student Ability (Theta)    Item Difficulty (b-parameter)    Diff. (%)
         N       %                  N      %
-4.00    5       0                  16     4                         +4
-3.00    125     1                  37     10                        +9
-2.00    849     9                  84     23                        +14
-1.00    2289    24                 110    30                        +6
0.00     5385    56                 113    31                        -25
1.00     776     8                  4      1                         -7
2.00     189     2                  0      0                         -2
3.00     45      0                  0      0                         0
Total    9663    100                364    100
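The frequency distributions compared above can be computed by binning ability estimates and item difficulties into one-logit intervals. This sketch assumes bins labeled by their lower bound, since the slide does not state its binning convention; the function names are illustrative.

```python
import math

def logit_distribution(values, lo=-4, hi=3):
    """Percent of values in each one-logit bin [k, k+1), labeled by the
    bin's lower bound; values outside [lo, hi] go to the end bins."""
    counts = {k: 0 for k in range(lo, hi + 1)}
    for v in values:
        k = min(max(math.floor(v), lo), hi)
        counts[k] += 1
    return {k: round(100 * c / len(values)) for k, c in counts.items()}

def distribution_gap(person_pct, item_pct):
    """Item-minus-person percentage gap per bin (the Diff. column)."""
    return {k: item_pct[k] - person_pct[k] for k in person_pct}
```

Applied to the table above, the gap is largest around 0 logits, where 56% of students but only 31% of items fall.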

20 Mapping of Person- and Item-Parameters

21 Technical Issues in Alignment for CAT In adaptive testing, each successive item is chosen by a set of constraints to maximize information at the estimated ability or to minimize the deviation of the information from a target value. Among the many requirements for CAT, a sizeable and well-balanced item pool with regard to content and psychometric characteristics is a fundamental condition for success. To realize the advantages of CAT, the item pool must contain high-quality items that match the criteria of the item-selection algorithm at many different proficiency levels so as to provide adequate information. In addition to content constraints, a constraint on item exposure rate is essential for item utility and test security.
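Assuming a Rasch model (consistent with the single b-parameter reported earlier), maximum-information selection reduces to picking the eligible item whose difficulty is closest to the current ability estimate. A minimal sketch with an exposure cap as the only non-content constraint; the data structure and names are illustrative, not an operational CAT engine:

```python
import math

def rasch_info(theta, b):
    """Fisher information of a Rasch item with difficulty b at ability
    theta: I(theta) = p * (1 - p), where p is the probability of a
    correct response."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def select_next_item(theta, items, administered, max_exposure=0.25):
    """Pick the not-yet-seen item with maximum information at the current
    ability estimate, skipping items whose exposure rate has reached the
    cap. `items` maps item_id -> (difficulty, exposure_rate)."""
    best_id, best_info = None, -1.0
    for item_id, (b, exposure) in items.items():
        if item_id in administered or exposure >= max_exposure:
            continue
        info = rasch_info(theta, b)
        if info > best_info:
            best_id, best_info = item_id, info
    return best_id
```

A real engine would also enforce content-strand and item-type constraints at each step, which is exactly why pool composition drives both alignment and exposure.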

22 Technical Issues in Alignment for CAT The deficit or excess of items for certain assessed content standard(s) in turn creates under- and over-exposure of test items in the pool. Under-exposed items are often overrepresented in the pool or lack the characteristics needed to meet the constraints for item selection. The presence of unused items in the pool is an unfortunate waste of resources. Over-exposed items not only jeopardize test security but may also result in positively biased ability estimates, which seriously threatens the validity of high-stakes assessments. In K-12 assessments, the large population, the wide range of proficiency levels among students, the broad content coverage, and the high-stakes nature of the tests introduce tremendous technical challenges for the development and implementation of CAT.

23 Criteria Commonly Used in Alignment

Alignment Level    Categorical Concurrence           Depth of Knowledge /    Balance of
                                                     Range of Knowledge      Representation
Acceptable         6 items per standard              50%                     .70
Weak               6 items per standard              40% - 49%               .60 - .69
Unacceptable       less than 6 items per standard    less than 40%           less than .60
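These cut-points can be applied programmatically. A sketch assuming the Depth of Knowledge criterion is a proportion and Balance of Representation is Webb's balance index; note the slide lists the same 6-item cut for both the acceptable and weak levels of categorical concurrence, so this sketch treats fewer than 6 items as unacceptable. All names are illustrative.

```python
def alignment_level(items_per_standard, dok_consistency, balance_index):
    """Apply the table's cut-points for one standard. dok_consistency is a
    proportion (e.g. 0.50); balance_index is the balance statistic."""
    def level(acceptable, weak):
        return "Acceptable" if acceptable else ("Weak" if weak else "Unacceptable")
    return {
        "categorical_concurrence": level(items_per_standard >= 6, False),
        "depth_of_knowledge": level(dok_consistency >= 0.50,
                                    0.40 <= dok_consistency < 0.50),
        "balance": level(balance_index >= 0.70,
                         0.60 <= balance_index < 0.70),
    }
```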

24 A Brief Summary 1. Technical issues, such as the test specifications, the algorithm and constraints for item selection, and item exposure rates, must be taken into account in the design, item-review process, and evaluation of alignment for computerized adaptive testing. 2. Whether the alignment is based on the entire item pool or on a sample of items, any inference about the alignment of an adaptive test must be supported with evidence from both content/curriculum and technical perspectives. 3. The criteria commonly used to evaluate the alignment of linear tests may not be suitable for evaluating the alignment of computerized adaptive tests.

