
1 Evolution of Alignment Studies: Technically Sound Independent Alignment Studies
Moderator: Dr. Jan Barth
Presenters: Dr. Gary Cook, Dr. Lynette Russell, Dr. Vaughn Rhudy, Dr. Liru Zhang
Date and time: June 20, :00 – 5:00 PM

2 This presentation will examine:
This presentation will provide past and current thinking around independent alignment studies that are technically sound as they apply to the alignment of state assessments to state-approved standards. This presentation will examine:
Acceptable methods/processes that provide evidence of adequate alignment
Past Peer Reviews
Current Peer Review
How states use these findings
Ways the alignment studies could better meet state needs
Lessons learned
Results of a Delaware study
Thoughts and questions about the future of alignments

3 2016 Peer Review Requirements
New Peer Review evidence (Critical Element 3.1) requires the State to include:
Results of an independent alignment study that is technically sound, with methods/processes, appropriate units of analysis, and clear criteria, that documents adequate alignment. Each assessment is aligned to the:
Test blueprint, and each blueprint is aligned to the full range of the State's academic content standards; or
Full range of the State's academic content standards, and procedures the State follows to ensure such alignment during test development
Description of a systematic process and timeline the State will implement to address any gaps or weaknesses identified in the alignment studies

4 Evolution of Alignment Studies: 2000 to 2016 Alignment Approaches
Evolution of Alignment Studies: 2000 to 2016 Alignment Approaches. Dr. Gary Cook, Research Scientist, Wisconsin Center for Education Research

5 Assessment Alignment Definition
A procedure meant to describe the strength of association between a source artifact and an associating artifact. This procedure is meant to describe the equivalency of artifacts. Equivalency = Align

6 Alignment Elements
Breadth: The range of connections one artifact displays in another
Depth: The level of complexity one artifact manifests in another
Match: The degree to which one artifact connects to another

7 Recent Alignment History
Timeline: IASA, NCLB, ESSA; 1995–2016; peer review process
1997: Webb methodology
2002: Achieve methodology
2002: Surveys of Enacted Curriculum (SEC)
2005: Webb variants (ELL, SWD, LAL, Standards)
For a good review see: Martone & Sireci, Review of Educational Research, December 2009, Vol. 79, No. 4, pp. 1332–1361; and Council of Chief State School Officers (CCSSO). (2002, September). Models for alignment analysis and assistance to states. Washington, DC: Author.

8 Alignment Approaches
Webb: Standards ↔ Assessment (Categorical Concurrence, DOK, Range, Balance)
Achieve: Standards ↔ Assessment (Content Centrality, Performance Centrality, Source, Level)
Survey of Enacted Curriculum: Standards ↔ Assessments ↔ Curriculum

9 Survey of Enacted Curriculum (SEC)
Alignment Approaches: Webb or Webb Variants / Achieve / Survey of Enacted Curriculum (SEC)
Key Features
Webb or Webb Variants: Independent, qualitative ratings by selected reviewers; typically, an inclusive, collaborative process; publicly available alignment software
Achieve: Qualitative ratings by selected subject matter experts (SMEs); a variety of alignment activities possible, including comparing exemplar standards
SEC: A common content-based matrix of “enacted curriculum”; flexible model that compares standards, assessments, and/or instruction
Process
Webb or Webb Variants: Empanel reviewers, assign DOKs to standards, and align assessment items (or other artifacts) to standards
Achieve: Take the test; conduct an in-depth review of each item; rate each item based on the four analysis dimensions
SEC: Identify educators and have them fill out SEC survey tools (classroom practices and instructional content)

10 Survey of Enacted Curriculum (SEC)
Webb or Webb Variants / Achieve / Survey of Enacted Curriculum (SEC)
Analyses
Webb or Webb Variants: Categorical Concurrence; DOK Consistency; Range-of-Knowledge Correspondence; Balance of Representation
Achieve: Content Centrality; Performance Centrality; Source of Challenge; Level Demand
SEC: Educator survey results aggregated into data maps showing relations between assessment, standards, and instruction; also creates an alignment index
Resources
Webb or Webb Variants: Facilitator to design the study and train raters; a large number of reviewers (~6 per subject/grade); access to a web-based computer lab
Achieve: Financial commitment to Achieve, Inc.; subject matter experts (2 to 4); Achieve alignment materials
SEC: Facilitator/trainer to set up surveys; a number of educators to complete the survey; access to a web-based survey tool

11 Survey of Enacted Curriculum (SEC)
Webb or Webb Variants / Achieve / Survey of Enacted Curriculum (SEC)
Training
Webb or Webb Variants: ½ day to train reviewers
Achieve: Has a pool of trained SME raters
Criteria
Webb or Webb Variants: Recommended values for each alignment element
Achieve: A priori and descriptive criteria based on Achieve, Inc.'s experience and SMEs' review
SEC: There are generally no a priori recommended criteria; judgment of criteria is based on the user
Adapted from:
Case, B. J., Jorgensen, M. A., and Zucker, S. (2004). Pearson Assessment Report: Alignment in educational assessment. San Antonio, TX: Pearson, Inc. Downloaded from
Council of Chief State School Officers (CCSSO). (2002). Models for alignment analysis and assistance to states. Washington, DC: Author. Downloaded from
Martone, A. and Sireci, S. (2009). Evaluating alignment between curriculum, assessment, and instruction. Review of Educational Research, Vol. 79, No. 4, pp. 1332–1361.

12 Wisconsin 2006 Alignment Study
Reading and Mathematics, Grades 3-8 and 10; Language/Writing and Science, Grades 4, 8, and 10
Dr. Lynette Russell, Assistant State Superintendent, Wisconsin Department of Public Instruction

13 Assessment Peer Review 2006 Issues
Peer Review Negative Issues:
Lack of transparency
Lack of consistency in what states were told around similar issues
Peer Review Positive Issues:
Led to states reviewing the technical quality of their assessments, and learning about that in the process
States had conversations with each other, and through comparison learned ways they might make improvements to their own tests

14 Alignment Study: Webb or SEC?
Survey of Enacted Curriculum (SEC): Compared state standards, test questions/blueprints, and surveys of what teachers were actually teaching, to look at where there was overlap and where there were gaps. Great for local use and for professional development.
Webb: We were unsure how the Department would view the SEC, and we liked the YES/NO clarity of the Webb tool . . . and Norm Webb was right down the street from us, and we liked working with him, so . . .

15 Webb Alignment Process
Four Alignment Criteria:
Categorical Concurrence
Depth-of-Knowledge Consistency
Range-of-Knowledge Correspondence
Balance of Representation
Report to Include:
Each area rated as acceptable, needs improvement, or unacceptable
Commentary about what would be required for each of the assessments to be fully aligned
Notes from reviewers, including how each person coded each item and standard

16 Webb Alignment Process
Before: Met to clarify exactly how the state intended to use the standards and assessments. For instance, some standards may be expected to be assessed in the classroom. Some may be expected to be emphasized to a greater degree in some grades. We had "grade band" standards with assessment benchmarks between, so we needed to talk about how we used those.
During (3-day): Eight reviewers per content area, 50/50 in-state vs. out-of-state. Reviewers coded standards by DOK, coded items by DOK, and assigned test items to up to 3 standards/objectives (individually, and as a group).
After: Written report used to plan improvements to the assessment

17 1. Categorical Concurrence
The analysis assumed that the assessment had to have at least six items for measuring content from a standard in order for an acceptable level of categorical concurrence to exist between the standard and the assessment. This involved some discussion about what a standard is, versus an objective, theme, topic, reporting category, cluster, or other term.

18 Depth of Knowledge Consistency
At least 50% of the items corresponding to a standard had to be at or above the level of knowledge of the standard. So if a standard was a DOK 3, at least half of the questions needed to be DOK 3 or above. Obviously, this should have been part of the test design, rather than something to be checked after the test was built. We became aware of the importance of our test vendor understanding the alignment requirements for peer review before creating our test.
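In code, these first two criteria (slides 17 and 18) reduce to per-standard tallies of the reviewers' item codings. The sketch below is only an illustration, not Webb's alignment software; the item list, standard labels, and function names are hypothetical, and only the thresholds (at least six items per standard, at least 50% of a standard's items at or above its DOK) come from the slides.

```python
# Illustrative sketch (not Webb's alignment tool): categorical concurrence and
# DOK consistency from hypothetical reviewer codings. Only the thresholds
# (>= 6 items per standard; >= 50% of items at or above the standard's DOK)
# come from the presentation.
from collections import defaultdict

# Hypothetical codings: each assessed item mapped to one standard with a DOK.
items = [
    {"standard": "Geometry", "dok": 2},
    {"standard": "Geometry", "dok": 3},
    {"standard": "Geometry", "dok": 2},
    {"standard": "Geometry", "dok": 1},
    {"standard": "Geometry", "dok": 3},
    {"standard": "Geometry", "dok": 2},
    {"standard": "Measurement", "dok": 1},
    {"standard": "Measurement", "dok": 2},
]
standard_dok = {"Geometry": 2, "Measurement": 2}  # reviewer-assigned DOK levels

def categorical_concurrence(items, min_items=6):
    """YES if at least `min_items` items measure content from the standard."""
    counts = defaultdict(int)
    for item in items:
        counts[item["standard"]] += 1
    return {std: ("YES" if n >= min_items else "NO") for std, n in counts.items()}

def dok_consistency(items, standard_dok, threshold=0.50):
    """YES if at least `threshold` of a standard's items are at/above its DOK."""
    at_or_above, totals = defaultdict(int), defaultdict(int)
    for item in items:
        std = item["standard"]
        totals[std] += 1
        if item["dok"] >= standard_dok[std]:
            at_or_above[std] += 1
    return {std: ("YES" if at_or_above[std] / totals[std] >= threshold else "NO")
            for std in totals}

print(categorical_concurrence(items))        # Geometry: 6 items -> YES; Measurement: 2 -> NO
print(dok_consistency(items, standard_dok))  # Geometry: 5/6 at or above DOK 2 -> YES
```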

19 Range of Knowledge Correspondence
This judges whether the span of knowledge a standard expects of students matches the span of knowledge students need in order to answer the assessment items correctly. 50% of the objectives for a standard had to have at least one related assessment item for the alignment to be acceptable. Picture a check mark by at least half of the objectives. (A state may choose to make the acceptable level on this criterion more rigorous.) If between 40% and 50%, the criterion was "weakly" met.

20 Balance of Representation
This criterion is used to indicate the degree to which one objective is given more emphasis on the assessment than another, with the goal of balanced representation across objectives within a standard. An index value of 1 signifies perfect balance. An index value less than .5 indicates that most items relate to one objective. An index value between .6 and .7 indicates this criterion has been "weakly" met.
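Range-of-knowledge and balance-of-representation (slides 19 and 20) both operate within a single standard, over the objectives its items were coded to. The sketch below is our illustration under assumed data (hypothetical objective labels and item counts); the 50% range rule and the reading of the index values come from the slides, while the balance formula in the comment is the one commonly reported for Webb-style studies rather than something stated here.

```python
# Sketch of range-of-knowledge and balance-of-representation for one standard.
# Hypothetical data; the balance index uses the formula commonly reported for
# Webb-style studies: 1 - (sum over hit objectives of |1/O - I_k/H|) / 2,
# where O = objectives hit, I_k = items coded to objective k, H = total items.
from collections import Counter

objectives = ["A.1", "A.2", "A.3", "A.4"]                  # objectives under the standard
item_codings = ["A.1", "A.1", "A.1", "A.2", "A.2", "A.4"]  # objective hit by each item

def range_of_knowledge(objectives, item_codings, threshold=0.50):
    """Share of objectives hit by at least one item, checked against the 50% rule."""
    hit = set(item_codings) & set(objectives)
    share = len(hit) / len(objectives)
    return share, share >= threshold

def balance_of_representation(item_codings):
    """Balance index over the objectives actually hit (1.0 = perfect balance)."""
    counts = Counter(item_codings)
    H = sum(counts.values())   # total items hit for this standard
    O = len(counts)            # number of objectives hit
    return 1 - sum(abs(1 / O - n / H) for n in counts.values()) / 2

share, acceptable = range_of_knowledge(objectives, item_codings)
print(f"Range of knowledge: {share:.0%} of objectives hit, acceptable={acceptable}")
print(f"Balance of representation index: {balance_of_representation(item_codings):.2f}")
```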

21 Summary Grade 7 Mathematics Alignment
Rows (standards): Mathematical Processes; Number Operations and Relationships; Geometry; Measurement; Statistics/Probability; Algebraic Relationships
Columns (criteria): Categorical Concurrence; DOK Consistency; Range of Knowledge; Balance of Representation
Cell ratings: YES / WEAK / NO

22 Next Steps to Improve the Test
Item Development: Mathematics, higher DOK
Careful scrutiny: ELA (and other content areas). What claims can we really make? Are items really measuring what we mean for them to measure? Are they mapped correctly to the standards we intend them to be assessing? How do we know they are, if 3 (or 4, or 2) out of 8 reviewers thought an item mapped to different standards? Are we sure the scores are meaningful, and that a change in the score from one year to the next is a true indicator of improvement (or decline) in that student's learning?

23 How the Alignment Process Could be Better
Design the assessment with alignment in mind. For instance, assign DOK levels to the standards and make sure 50% of items are at or above that DOK level. Apply the same thought to the other alignment criteria as the test is built.
Have an alignment model that makes it possible to align clusters of standards to complex tasks, not always single items to single standards. The single-item-to-single-standard model drives simplistic test design.
As we look toward innovative assessment models, we should be thinking about new ways to map out alignment that are not so flat and linear, staying open to new ideas.

24 West Virginia Story
West Virginia Story
Dr. Vaughn G. Rhudy, Executive Director, Office of Assessment, West Virginia Department of Education

25 West Virginia Story
West Virginia methods and processes that provide evidence of adequate alignment
Past Peer Reviews
Current Peer Review
How West Virginia used these findings
Lessons Learned
Ways the alignment studies could be improved to better meet the State's needs

26 Past Peer Review
Used the Webb methodology and processes for all of the state's alignment studies for the regular summative assessment:
R/LA
Math
Science
Social studies

27 Past Peer Review For our high school science (Biology and Chemistry) alignment study, we applied what we learned regarding the use of the findings to improve our system of standards and assessments.

28 Standards Review
Prior to writing any items, the CSOs (content standards and objectives) for Biology and Chemistry were examined by selected national and state content experts to assign DOK values as part of the RFP deliverables.
This eliminated the step where WV educators assigned the DOK level for the CSOs, only to revise items when Dr. Webb's DOK values were published.
It also streamlined the item-writing process, since there was a definitive DOK assignment to the CSOs.

29 Pre-Alignment Study Followed Webb’s methodology:
Task 1: Determining the DOK level of each CSO
Task 2: Taking the test
Task 3: Determining what each item measured and the DOK for each item
Task 4: Summarizing alignment criteria of the items
Task 5: Debriefing Questionnaire

30 Post-Alignment Study Again followed Webb’s methodology:
Task 1: Confirming the depth-of-knowledge (DOK) level of each CSO
Task 2: Taking the test
Task 3: Determining what each item measured and the DOK for each item
Task 4: Summarizing alignment criteria of the items
Task 5: Debriefing Questionnaire

31 Final Alignment Study Reports Provide …
Clear definitions and findings for the four categories:
Depth-of-Knowledge Consistency
Categorical Concurrence
Range-of-Knowledge Correspondence
Balance of Representation
Source-of-challenge issues and suggested edits for test questions
DOK Alignment Analysis
Reliability among Reviewers
Summary of Results and Conclusions
The sources of challenge may include such issues as questions containing misleading factual information, questions requiring prior knowledge, questions with possible clueing among distractors, and questions deemed by the reviewer as having two possible correct answers.

32 How West Virginia Used These Findings
Preliminary findings were used to tweak standards and assessment items to ensure stronger alignment in the post-alignment study.
Post-alignment findings were used to refresh items/forms.
Reviewed Source of Challenge to assist in understanding the issues and best revise items – and in some cases the standards.
Reviewed Source of Challenge to assist in selecting refreshed items/forms.

33 Lessons Learned
Saved time and money by NOT developing items to incorrect DOK levels, which traditionally were determined by WV educators
Streamlined process with minimal editing required for the science assessment
Required the contractor to be trained on Webb alignment for best results in training for item writing/form selection
Make sure the Curriculum and Instruction people attend the alignment studies; this knowledge is helpful when they rewrite or edit the State standards.
For example, if WV determined a standard to be a Level 3 DOK, the reviewers might label it a high Level 2. Then, you have to go back and revise/re-select items in your forms. To avoid these costs and staff time, WV required a DOK review of standards by the national team in the Science Alignment Study RFP, to avoid developing items to incorrect DOK levels.

34 Ways Alignment Study Could be Improved to Meet State’s Needs
Standards review/analysis of DOK by the national alignment study team as part of the early process
Once the alignment study team has determined a standard's DOK, it remains the same throughout the alignment process.
Having the national team review standards and analyze DOK levels early saves the state time and money. Again, WV believes this is cost effective since it results in less item editing caused by mismatched DOK assignments by state educators.

35 Current Peer Review
West Virginia is a governing member of the Smarter Balanced Assessment Consortium:
Smarter Balanced submitted common, coordinated evidence, including Smarter Balanced's alignment study, for peer review on behalf of member states.
States submitted separate submissions, referencing Smarter Balanced's submission and providing additional evidence where necessary.

36 Smarter Balanced Alignment Study
Five workshops to examine alignment connections between:
Content Specifications and the CCSS (Connection A)
Evidence Statements and Content Specifications (Connection B)
Content Specifications and Test Blueprints (Connection C)
Evidence Statements and Items (Connection D)
Items/Performance Tasks and Content Specifications (Connection G)

37 Smarter Balanced Alignment Study
Alignment Criteria:
Criterion 1: Content Representation
Criterion 2: Depth of Knowledge (DOK) Distribution
Criterion 3: DOK Consistency
Criterion 4: Agreement between Reviewers' and Content Specifications/Developers' Ratings
Criterion 5: Agreement among Reviewers' Ratings

38 Overall Smarter Alignment Summary
Alignment study included several elements and connections not usually included in typical alignment studies. Study found overall alignment results acceptable for ELA/L and mathematics. Reviewer agreement increased as they worked from standards through Content Specifications to evidence statements to items. Reviewer alignment agreement increased when the level of analysis was broadened, for example analyzing results at the cluster level instead of the target level.

39 Delaware Story: Alignment for Computerized Adaptive Tests
Liru Zhang, Delaware DOE; Shudong Wang, NWEA

40 Validity of Test Scores
Validity refers to the degree to which evidence and theory support the interpretations of test scores for intended use. The process of validation involves accumulating relevant evidence to support the proposed score interpretations (Standards, 2014).
Alignment is an important attribute in standards-based educational reform. It is the source of validity evidence to determine to what extent the assessment measures the content standards.
For decades, commonly used methods of alignment (e.g., Webb's procedure) have been curriculum-based approaches. Curriculum-based alignments share a similar process, in which content specialists review an assembled test form item by item and match each item to the claim, objective, or topic level of the content standards to determine the degree of alignment:
Webb's model
Survey of Enacted Curriculum
Achieve Assessment-to-Standards Alignment Protocol

41 Curriculum-based Alignment
Serious concerns have been raised about the process and the evaluation criteria of curriculum-based alignment. For instance:
Test specifications are excluded from the review process;
All content standards are expected to be measured in a test, and all with equal weight;
The process for alignment and the evaluation of alignment results are disconnected from the interpretations of test scores.
"Although it follows from a consideration of standard-based testing that assessments should be tightly aligned with content standards, it is usually impossible to comprehensively measure all the content standards using a single summative test" (Standards, 2014).
Content-oriented evidence of validation is at the heart of the alignment process: evaluating whether test content appropriately samples the domain set forth in curriculum standards.
Webb's alignment
Survey of Enacted Curriculum
Achieve Assessment-to-Standards Alignment Protocol

42 Alignment for Adaptive Tests
With advanced and innovative technology, computerized adaptive testing (CAT) has been increasingly implemented in K-12 assessments in recent years. Curriculum-based alignment is designed for linear tests (or fixed test forms).
Can we use the same approach for the alignment of adaptive tests?
How should the alignment be performed for CAT: using the item pool, a sample of items, or a sample of individual tests?
Are there any technical issues that must be considered?
In this presentation, (a) the differences between linear tests and adaptive tests are discussed, and (b) the potential impacts of some technical issues on the alignment results are explored.
It is inconvenient and time consuming to use the entire item pool for alignment, especially when the pool contains thousands of items.

43 Linear Tests vs. Adaptive Tests
Linear tests:
Test form(s) are assembled prior to operation based on the test specifications.
All students take the same test items.
Linear tests normally target a medium difficulty level.
The alignment is primarily on the basis of the pre-assembled test form(s).
Adaptive tests:
Many unique test forms are assembled during testing for individual students.
Items are selected to match (1) the test specifications and (2) the estimated student ability level.
Items on unique test forms may vary greatly.
How do we determine the degree of alignment in adaptive testing?

44 Test Specifications and Item Pool
Content Strand Test Specs. Item Pool Item Type N % Min. Max. 1. Numeric R. 30 60 166 46 TE 2 26 7 2. Algebraic R. 8 16 82 22 MC 48 338 93 3. Geometric R. 10 20 72 Item-GR 4. Quantitative R. 4 44 12 On 40 50 319 88 Total 100 364 Off 45
Comparison of test specs, item pool, and primary constraints for item selection.
The test is a fixed length of 50 items.
The item pool is imbalanced in content category and by item type.

45 Item Pool Analysis by Content
Content Strand: Test Specs (N, %) / Item Pool (N, %) / Diff. (%)
1. Numeric R.: specs 30 (60%); pool 166 (46%); diff. -14
2. Algebraic R.: specs 8 (16%); pool 82 (22%); diff. +6
3. Geometric R.: specs 10 (20%); pool 72 (20%); diff. 0
4. Quantitative R.: specs 2 (4%); pool 44 (12%); diff. +8
Total: specs 50 (100%); pool 364 (100%)
Here is one reason: the content-imbalanced item pool.

46 Item Pool Analysis by Item Difficulty
Logit: Student Ability (Theta) N, % / Item Difficulty (b-parameter) N, % / Diff.
-4.00: students 5; items 16 (4%); diff. +4
-3.00: students 125 (1%); items 37 (10%); diff. +9
-2.00: students 849 (9%); items 84 (23%); diff. +14
-1.00: students 2289 (24%); items 110 (30%); diff. +6
0.00: students 5385 (56%); items 113 (31%); diff. -25
1.00: students 776 (8%); diff. -7
2.00: students 189 (2%); diff. -2
3.00: students 45
Total: students 9663 (100%); items 364
This table shows the disparity between the item parameters in the pool and the person parameters, especially for high-achieving students.

47 Sample Items
Step One: To explore the possible impacts of the characteristics of an item pool, estimated student ability, and item exposure rate on the alignment, seven samples of items, with 50 items each, were selected from the grade 3 mathematics item pool:
Two random samples (samples 1a and 1b)
A weighted sample based on the relative proportions of the four content categories in the pool (sample 2)
Two samples based on the most typical person parameters and item parameters from the corresponding frequency distributions (samples 3a and 3b)
Two samples based on item exposure rate, excluding never-used items (samples 4a and 4b)
Random: 50 items are randomly selected from the item pool.
Standards: A weighted sample of 50 items is selected by content constraint based on the test specifications.
Typical: In grade 3, 2289 (24%) + 5385 (56%) = 7674 (80%) students were estimated to be at ability levels between -1.00 and 0.00, and 61% of the items in the pool are at the same logit levels.
Expo:
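The sampling step described above can be prototyped in a few lines. The sketch below illustrates only the first two strategies (a simple random sample and a sample allocated in proportion to the pool's content mix); the pool records, field layout, and function names are hypothetical stand-ins, not the study's actual code.

```python
# Illustration of two of the sampling strategies described on this slide:
# a simple random 50-item sample and a sample whose 50 slots are allocated in
# proportion to the pool's content mix. Pool records are hypothetical stand-ins.
import random
from collections import defaultdict

STRANDS = ["Numeric", "Algebraic", "Geometric", "Quantitative"]
rng = random.Random(20160620)

# Hypothetical pool: 364 records of (item_id, content_strand).
pool = [(i, rng.choice(STRANDS)) for i in range(364)]

def random_sample(pool, n=50, seed=1):
    """Samples 1a/1b style: n items drawn at random from the pool."""
    return random.Random(seed).sample(pool, n)

def content_weighted_sample(pool, n=50, seed=1):
    """Sample 2 style: allocate the n slots across strands in proportion to the
    pool, then draw within each strand. (Rounding can leave the total one item
    off; a real implementation would reconcile that.)"""
    r = random.Random(seed)
    by_strand = defaultdict(list)
    for item in pool:
        by_strand[item[1]].append(item)
    sample = []
    for strand, strand_items in by_strand.items():
        k = round(n * len(strand_items) / len(pool))
        sample.extend(r.sample(strand_items, min(k, len(strand_items))))
    return sample[:n]
```

A typical-parameter sample (3a/3b) would instead restrict the pool to the logit band where most examinees sit before drawing, and an exposure-based sample (4a/4b) would drop never-used items first.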

48 Samples 3a and 3b
Logit: Person Parameter N, % / Item Parameter N, % / Sample 3-a N, % / Sample 3-b N, %
-4.00: persons 5; items 16 (4%)
-3.00: persons 125 (1%); items 37 (10%)
-2.00: persons 849 (9%); items 84 (23%)
-1.00: persons 2289 (24%); items 110 (30%); sample 3-a 21 (42%); sample 3-b 23 (46%)
0.00: persons 5385 (56%); items 113 (31%); sample 3-a 29 (58%); sample 3-b 27 (54%)
1.00: persons 776 (8%)
2.00: persons 189 (2%)
3.00: persons 45
Total: persons 9663 (100%); items 364; 50 items per sample
Since 80% of the third graders were estimated to be at ability levels between -1.00 and 0.00, and 61% of the items in the pool fall in the same logit region, the two samples were selected as typical of, or representative of, the item pool. Rasch model: item and person parameters are on the same scale.

49 Comparison by Item Content
Strand Item Pool Sample 1a Sample 1b Sample 2 N % Numeric 166 46 18 36 16 32 23 Algebraic 82 22 11 19 38 Geometric 72 20 13 26 12 24 10 Quantitative 44 8 3 6 Total 364 100 50 Sample 3a Sample 3b Sample 4a Sample 4b 21 42 25 28 56 5 2 4 1
The resulting samples show varying combinations in terms of the four content constraints.
Samples 1a and 1b are random samples: different.
Sample 2 is selected by content category, so it matches the pool by content: matched well in content.
3. Samples 3a and 3b are selected by the most frequent cases at the median level: different.
4. Samples 4a and 4b are selected by item exposure rate, excluding never-used items: over-represented for strand 1; under-represented for strand 4.

50 Comparison by Item Type
Item Pool Sample 1a Sample 1b Sample 2 Sample 3a Sample 3b Sample 4a Sample 4b MC 338 (.93) 48 (.96) 47 (.94) 45 (.90) 50 (1.0) TE 26 (.07) 2 (.04) 3 (.06) 5 (.10) 0 (0) On-Grade 319 (.88) 44 (.88) 42 (.84) 46 (.92) 37 (.84) 40 (.80) Off-Grade 45 (.12) 6 (.12) 8 (.16) 4 (.08) 13 (.16) 10 (.20) Total 364 50
1. The resulting samples show varying combinations in terms of item types.
2. The test needs 96% MC and 4% GI; the pool has 93% MC (3% less) and 7% GI (4% more). The EXPO 1 and 2 samples indicate the unbalanced pool for the two item types.

51 Comparison by Item-Parameter
Logit Frequency (%) 1a 1b 2 3a 3b 4a 4b Item Person % -4.00 4 8 6 -3.00 10 1 22 14 20 -2.00 23 9 18 26 24 -1.00 30 16 42 46 34 0.00 31 56 58 54 28 1.00 2.00 3.00 Total 100
This table compares the distribution of item difficulties (b-parameters) in the pool with the distributions in the selected item samples.
1. All samples reflect the characteristics of the pool more than they reflect the distribution of student ability.

52 Comparison by Item Exposure Rate
Sample ≤ 5% 5.1 – 25% 25.1 – 100% Item Pool 26 20 34 Sample 1a 12 17 9 Sample 1b 10 11 23 46 6 Sample 2 15 18 36 Sample 3a 8 Sample 3b 14 21 Sample 4a 30 60 Sample 4b 29 58
This table shows the actual exposure rate of the items selected in each sample.
Except for the two EXPO samples, the under- and over-exposed items make up one-half to two-thirds of each sample.
More importantly, 8-15% of the 50 items were never used, and 11-21% of the 50 items were under-exposed.
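Exposure rates like those summarized above are simply the share of delivered tests on which each pool item appeared, binned into the slide's three bands. A minimal sketch under assumed inputs (a list of delivered forms as item-id lists); only the band edges are taken from the slides.

```python
# Minimal sketch: compute each pool item's exposure rate from delivered forms
# (share of tests on which the item appeared) and bin the rates into the
# bands used on this slide (<= 5%, 5.1-25%, 25.1-100%). Inputs are hypothetical.
def exposure_rates(delivered_forms, pool_ids):
    counts = {item_id: 0 for item_id in pool_ids}
    for form in delivered_forms:
        for item_id in set(form):      # count an item once per test
            counts[item_id] += 1
    n_tests = len(delivered_forms)
    return {item_id: c / n_tests for item_id, c in counts.items()}

def exposure_bands(rates):
    bands = {"<= 5%": 0, "5.1-25%": 0, "25.1-100%": 0}
    for rate in rates.values():
        if rate <= 0.05:
            bands["<= 5%"] += 1
        elif rate <= 0.25:
            bands["5.1-25%"] += 1
        else:
            bands["25.1-100%"] += 1
    return bands

# Hypothetical example: three delivered forms drawn from a five-item pool.
forms = [[1, 2, 3], [1, 3, 4], [1, 2, 4]]
print(exposure_bands(exposure_rates(forms, pool_ids=[1, 2, 3, 4, 5])))
```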

53 Impacts of Using Individual Tests
Logit: Student Ability (Theta) N, % / Sample Tests N, %
-4.00: students 5
-3.00: students 125 (1%)
-2.00: students 849 (9%); sample tests 2 (14%)
-1.00: students 2289 (24%); sample tests 4 (29%)
0.00: students 5385 (56%); sample tests 5 (36%)
1.00: students 776 (8%); sample tests 3 (21%)
2.00: students 189 (2%)
3.00: students 45
Total: students 9663 (100%); sample tests 14
Step Two: Fourteen individual tests were selected proportionally, based on the frequency distribution of estimated student ability (theta), from the operational data for grade 3 mathematics. The 14 individual test forms were selected based on the empirical data to reflect the distribution of theta.

54 Comparison by Content and Item Type
Form Theta Content Strand Item Type NR (30) AR (8) GR (10) QR (2) MC (48) TE (2) On-GR (40-50) Off-GR (0-10) 1 -2.70 30 8 10 2 48 49 -2.17 9 47 3 -1.69 46 4 -1.59 44 6 5 -1.48 45 -1.19 7 -0.95 42 -0.40 43 -0.14 -0.11 11 0.16 12 1.26 13 1.51 14 1.65
The 4 content strands, the number of on- and off-grade items, and the number of MC and GI items are the primary constraints for item selection.
The sample test forms show a perfect match to the constraints, within the acceptable ranges.

55 Comparison by Off-Grade Items
Sample Test Theta N. Items Content Strand 1 Content Strands 2, 3 & 4 N % 1 -2.70 100 2 -2.17 3 67 33 -1.69 4 75 25 -1.59 6 50 5 -1.48 60 40 -1.19 83 17 7 -0.95 8 63 37 -0.40 71 29 9 -0.14 10 -0.11 11 0.16 57 43 12 1.26 13 1.51 14 1.65
This table shows how off-grade items were used, and whether these items match students' estimated theta to improve measurement accuracy.
It was found that 50-83% of the off-grade items measure Content Strand 1, Numeric Reasoning. Why?

56 Comparison by Item Exposure Rate
Unit ≤ 5 5.1-25 Subtotal Item Pool 26 20 34 13 3 1 14 Forms 5 35 29 10 16 60
Item pool utilization and overlap: pool 364 items; used items 208 (utilization rate 57%); 47 used items (23%) with an overlap rate above 35% (36 – 93%); 161 used items (77%) with an overlap rate of 35% or less (7 – 29%).
1. This table shows the item exposure rate in the pool and for the 14 sample test forms.
2. The utilization rate is 57%, even though the 14 forms were selected from low to high proficiency levels.
3. 47 items out of the 208 (23%) used items show an overlap rate of 36-93%. Among them, 26 items are shared by 7 test forms (50%).

57 A Brief Summary Should we use the entire item pool for alignment or select samples of items or individual tests for alignment? The results of analyses in the current study show great variations of selected items, from sample to sample, in content categories, item types, item difficulties, and exposure rate in operation. The variations across samples suggest the influence of technical issues on the alignment results. In K-12 assessments, the large populations, wide range of proficiency levels among students, broader content coverage, and its high-stakes nature introduces tremendous technical challenges in the alignment for CAT.

58 A Brief Summary
(2) Technical issues must be taken into account in the design, the process for item review, and the evaluation of alignment for computerized adaptive testing, such as the test specifications, the algorithm and constraints for item selection, and item exposure rates.
(3) Whether the alignment is based on the entire item pool or on a sample of items, appropriate inferences about the alignment of an adaptive test must be supported with evidence from both content/curriculum and technical perspectives.
(4) The criteria commonly used in evaluating the alignment of linear tests may not be suitable for evaluating the alignment of computerized adaptive tests.
I am working on a paper on this topic, including some guidelines for technical considerations, process, and evaluation.

59 Alignment Approaches and Consideration for 2016
Dr. Gary Cook, Research Scientist, Wisconsin Center for Education Research

60 Some thoughts and questions about Peer Review and the future of alignment

61 Alignment & Peer Review
Critical Element 2.1: The State's test design and test development process is well-suited for the content, is technically sound, and aligns the assessments to the full range of the State's academic content standards.
Critical Element 3.1: Documentation of adequate alignment between the State's assessments and the academic content standards the assessments are designed to measure in terms of content (i.e., knowledge and process), the full range of the State's academic content standards, balance of content, and cognitive complexity.

62 Alignment & Peer Review
Critical Element 4.7: The State has a system for monitoring and maintaining, and improving as needed, the quality of its assessment system, including clear and technically sound criteria for the analyses of all of the assessments in its assessment system (i.e., general assessments and alternate assessments). [This includes independent alignment studies.]

63 Peer Review Takeaways
Alignment to standards should be a documented part of the test development process
Adequate alignment is required for content and alternate assessments
When reporting alignment findings, peer reviewers will be looking for adequate match, depth and breadth, and what the SEA is doing with findings
Alignment is not a once-and-done activity; there should be an ongoing process to monitor and adjust assessments based on standards alignment

64 Things to think about…
Methods for aligning adaptive assessments
Challenges with Alternate Assessments
Alignment (correspondence) of ELP standards to college and career content standards
Alignment of standards to "entrance requirements of public higher education… and relevant career and technical education standards" (s. 1111(b)(1)(D))

65 Adaptive Assessment Alignment Questions
How can you assure that items within the item bank are all aligned?
How can you assure that the selection algorithm adequately covers the match, depth, and breadth of the standards?
Two ideas:
Align all items in the bank and show how the selection algorithm provides adequate alignment
Align a representative set of forms based on the selection algorithm
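The second idea in the list above could be prototyped by simulating a representative set of administrations with the operational selection algorithm and then tallying how each simulated form covers the standards. A rough sketch under assumed data structures: the simulated forms would come from the real CAT engine, and the metadata fields and function names here are hypothetical.

```python
# Rough sketch of the second idea: tally, for each simulated form produced by
# the selection algorithm, how many items hit each standard and at what DOK.
# The forms would come from the operational CAT engine; metadata fields are
# hypothetical. Each form summary could then be screened with the same
# criteria used for a fixed form (e.g., at least six items per standard).
from collections import defaultdict

def coverage_report(simulated_forms, item_metadata):
    """simulated_forms: list of item-id lists, one per simulated examinee.
    item_metadata: {item_id: {"standard": str, "dok": int}}."""
    report = []
    for form in simulated_forms:
        items_per_standard = defaultdict(int)
        dok_per_standard = defaultdict(list)
        for item_id in form:
            meta = item_metadata[item_id]
            items_per_standard[meta["standard"]] += 1
            dok_per_standard[meta["standard"]].append(meta["dok"])
        report.append({"items_per_standard": dict(items_per_standard),
                       "dok_per_standard": dict(dok_per_standard)})
    return report

# Hypothetical example with two simulated forms and a three-item bank.
bank = {1: {"standard": "A", "dok": 2}, 2: {"standard": "B", "dok": 3},
        3: {"standard": "A", "dok": 3}}
print(coverage_report([[1, 2, 3], [1, 3]], bank))
```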

66 Association = Correspond
Correspondences: Since 2005, procedures have been developed to describe derivative or associative relationships between artifacts. These procedures are termed “Correspondences,” e.g., alternate assessment to content standards; ELP standards to content standards.
Source Artifact → Associating Artifact; Association = Correspond

67 Good Correspondence Resources
Links for Academic Learning; Flowers et al. (2007)
Webb-based SWD alignment; Roach et al. (2010)
ELPD Framework; CCSSO (2002)

68 Alignment to College? How might we think about alignment to entrance requirements of credit bearing courses? What methods should one apply to show this alignment? How should this information be documented? Communicated?

69 Questions to Ponder What is the purpose of the alignment?
Methodology → Findings → Actions
What is the purpose of the alignment?
What resources are available to conduct the alignment?
Does the methodology fit the alignment's purpose(s)?
How will the findings be communicated, and to whom?
What actions will take place as a result of the alignment study?
How will these actions be documented?

70 Questions And Thank You


Download ppt "Moderator: Dr. Jan Barth Presenters: Dr. Gary Cook, Dr"

Similar presentations


Ads by Google