Introduction to Accessible Reading Assessment June 14, 2008

CCSSO National Conference on Student Assessment Martha L. Thurlow

Today’s Purpose
Highlight the challenges in reading assessment for students with disabilities
Provide an overview of relevant research design and analysis
Present research findings from projects funded to research and develop accessible reading assessments
Identify implications of the research for you
Share Principles and Guidelines based on our research and other resources

National Accessible Reading Assessment Projects
Designing Accessible Reading Assessments (DARA) Partnership for Accessible Reading Assessment (PARA) Technology Assisted Reading Assessment (TARA) 3

NARAP Goals Develop a definition of reading proficiency
Research the assessment of reading proficiency
Develop research-based principles and guidelines for making large-scale reading assessments more accessible for students who have disabilities that affect reading
Develop and field trial a prototype reading assessment

Designing Accessible Reading Assessments (DARA)
Educational Testing Service (ETS)
Focuses on students with learning disabilities
Focuses on component approach to assessing reading skills
Primary focus areas are: Word Recognition, Reading Fluency, Vocabulary Knowledge, Comprehension

Partnership for Accessible Reading Assessments (PARA)
Collaboration of National Center on Educational Outcomes and U of MN Department of Curriculum and Instruction, CRESST, U of CA Davis, and Westat Focus on all disabilities that impact reading, particularly: Learning disabilities Speech or language impairments Mental retardation Deafness or hard of hearing 6

Technology Assisted Reading Assessment (TARA)
ETS, NCEO and Center for Applied Special Technology (CAST) Focus on students with visual impairments Focus on: Examining the performance of operational ELA tests for students with visual impairments Development of prototype Technology Assisted Reading Assessment Inclusion of VI students in NARAP field test 7

Background All projects focus on standards-based general assessments based on grade-level achievement standards – the regular assessment! Not focused on alternate assessments based on alternate achievement standards Not focused on alternate assessments based on modified achievement standards Still, work may sometimes be applicable to these too 8

Who We Are Martha Thurlow (PARA, TARA)
Cara Cahalan Laitusis (DARA, TARA) Linda Cook (DARA, TARA) David O’Brien (PARA) Jamal Abedi (PARA) Discussant – Peggy Carr 9

Who Are You? 10

Plan for Today 1:00 – 2:30 Introduction
Issues for Students with Disabilities Research Design and Analysis 2:30 – 2:45 BREAK 2:45 – 3:35 Identifying Less Accurately Measured Students Impact of Motivation and Engagement 3:35 – 3:50 BREAK 3:50 – 4:30 Segmented Reading Passages Principles and Guidelines 4:30 – 5:00 Peggy Carr, Discussant 11

Workshop Notebook
Notebook Tabs – Guide to Workshop Process
Agenda
Each Topic
Powerpoint presentation
Resource materials
Biographies for Presenters
Notepaper

Ground Rules Ask questions for clarification Interact with us!
Take care of own needs 13

Contact Information Martha L. Thurlow University of Minnesota 207 Pattee Hall 150 Pillsbury Drive SE Minneapolis, MN 14

Issues in Assessing Reading of Students with Disabilities June 14, 2008 CCSSO National Conference on Student Assessment Martha L. Thurlow 15

Poor performance of students with disabilities is a big indicator that there are issues in assessing their reading performance
Data from state reading assessments show that this is so

Gaps in Performance on Reading Assessments
Elementary School 3

Middle School 4

High School 5

Gaps exist in reading performance at all school levels:
They increase as the grade level increases They vary by state, but the variability seems to be more a function of the difficulty of the test than its accessibility (states with the lowest and highest average scores for students with disabilities have smaller gaps – probably due to ceiling and floor effects) 6 20

Disabilities affect reading in many ways – we explored the ways in which disabilities may affect reading for 7 categories and developed a report about each: Visual Impairments Deaf or Hard of Hearing Autism Learning Disabilities Mental Retardation Speech or Language Impairments Emotional or Behavioral Disabilities 7 21

Purpose To provide general information about specific disabilities and how they interact with reading, so that reading professionals and others who might contribute to the development of accessible reading assessments understand some of the challenges that need to be addressed. 8 22

Disclaimer Papers clearly state that the purpose is to begin a discussion of the issues surrounding reading and students with each disability. The papers were not intended to be comprehensive research reviews. We have clarified that the papers are for people who do not know the disabilities or for those who have not considered the interaction of disabilities with reading. 9

Overview Students Receiving Special Education Services 10 24

Reading and Students with Visual impairments
Most students with visual impairments are not blind. Tactile (braille) and auditory methods of accessing text are most common. Common classroom supports and accommodations may not be available for state assessment. 11

Reading and Students who are Deaf or Hard of Hearing
Age of onset of hearing loss and other factors shape educational and communication experiences. Many communication forms (e.g., American Sign Language, Manually Coded English, lip reading); cochlear implants have raised new issues. State assessment policies vary in whether they allow commonly used accommodations. 12

Reading and Students with Autism
Many students with Asperger Syndrome can decode words well, but may lack comprehension skills (Barnhill, 2004). Students with autism may find it difficult to screen out distractions. Accommodations are not as often designated toward this group. 13

Reading and Students with Specific Learning Disabilities
90% of students with learning disabilities identify reading as their primary difficulty (President’s Commission on Excellence, 2003). The read aloud accommodation is one of the most common and controversial accommodations provided for these students. 14 28

Reading and Students with Mental Retardation
Historically, educators often skipped academics (including reading) in favor of functional, social, or motor skills. Despite wide variety of characteristics that can influence reading (poor short-term memory, low-level meta-cognition), reading skills can be mastered by many students with mental retardation. Access to the general curriculum, broader accommodations, and alternate assessments are aspects of reading achievement for students with mental retardation. 15

Reading and Students with Speech or Language Impairments
Since reading is a language-based skill, students without strong language skills may be at-risk. Accommodations for these students reflect reading strategies used with them – read aloud, assistive augmentative communication devices, and frequent breaks during assessment. 16

Reading and Students with Emotional or Behavioral Disorders
Students need to compensate for lack of attention, distractibility, etc. Some accommodations that are needed are generally acceptable (breaks, quiet room), while others are questionable (motivational prompts, calming music).

Accommodations Accommodations are more of an issue for reading assessments than for other content areas. This occurs because many students use those accommodations that may produce invalid scores (known as modifications in most places). 18

Percentages of Students Using Certain Accommodations
Not all states publicly report on the use of accommodations, much less the specific accommodations or percentages of students. States that do include: Colorado North Carolina Based on data 19 33

The Challenge for Accountability
Students who use accommodations that produce invalid scores – modifications – will now count as nonparticipants in the assessment. 20

Accessibility An accessible assessment is one that reveals the knowledge and skills of students whose characteristics create barriers to accurate measurement of these on traditional reading assessments It measures the same knowledge and skills, at the same level It may reduce the need for accommodations 21 35

Implications
It is important to understand the characteristics of all students taking assessments, including those that may affect performance but are not what is being measured (e.g., short term memory)
Disabilities are not the “cause” of poor performance – most students with disabilities can perform at levels comparable to their peers – if we make sure they get access to the curriculum, instruction, accommodations, and accessible assessments!

Implications
Accommodations are an important part of accessibility – it is more important now than ever before to address them (which can be incorporated into the assessment, and which really produce invalid scores)
We need to explore innovative approaches to improving accessibility – things that in the end may benefit all students

Research Design and Analysis June 14, 2008
CCSSO National Conference on Student Assessment Cara Cahalan Laitusis and Linda Cook 39

Overview Types of Questions Research Designs and Analyses
Case Example from DARA project Can read aloud and standard scores be reported on the same scale? Questions and answers 2 40

Research Questions Are test scores from accommodated and non-accommodated tests: Psychometrically comparable? Measuring the same construct? Equally valid predictors of the construct? What changes to test items (or administration) can: Increase/decrease accessibility? Increase/decrease validity of scores? Engage students with disabilities? Provide useful feedback to teachers? 3 41

Research Studies Opinion Research Item Tryouts Experimental Studies
Analysis of Operational Test Data 4 42

Opinion Research
Types of Opinion Research: Surveys, Interviews, Focus Groups
Potential uses:
Explore the types of changes to test items (or administration) that may be worthy of additional research
Identify problems in assessment design or administration

Item Tryouts Cognitive Labs (Think Alouds) Pilot Testing Field Testing

Item Tryouts Cognitive Labs (Think Alouds) 9-20 students per subgroup
Requires one-on-one administration Qualitative analysis of responses 7

Item Tryouts Pilot Testing 20-40 students per subgroup
Group administration Qualitative and Quantitative analysis of responses 8

Item Tryouts Field Testing 100 students per subgroup
Group administration Quantitative analysis of responses 9

Experimental Studies Are test scores from accommodated and non-accommodated tests: Psychometrically comparable? Measuring the same construct? Equally valid predictors of the construct? What changes to test items (or administration) can: Increase/decrease accessibility? Increase/decrease validity of scores? Engage students with disabilities? Provide useful feedback to teachers? 10 48

Requirements of Experimental Studies
Large sample sizes Random assignment of students to experimental groups to eliminate Form effects Order effects (test form or accommodation) Examinations of change generally require: Two samples (students with and without disabilities) Two testing conditions (standard and test change) Two equated test forms 11 49

Pros and Cons of Experimental Design
Pros
Disentangle accommodation or test change from disability
Impact of test change on total test score can be directly measured
Impact of other effects (order, test form, disability-accommodation interactions) can be mitigated
Cons
Expensive
Time consuming
May not be able to simulate testing environment for high stakes testing

Differential Boost Data Collection Design
[Table: Groups 1-4 by Session 1 and Session 2, each cell giving Form and Accommodation/Modification (Standard or Accommodation); most cell values did not survive extraction.]
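The random assignment behind a design like this can be sketched in a few lines. This is only an illustration: the four group layouts below are an assumed counterbalancing of form order and condition order, not the cell values from the slide's table, and `assign_groups` is a hypothetical helper name.

```python
import random
from collections import Counter

# Assumed counterbalanced layout (illustrative, not taken from the slide):
# every group sees both forms and both conditions, with the order of forms
# and the order of conditions crossed across the four groups.
GROUPS = {
    1: (("Form 1", "standard"), ("Form 2", "accommodated")),
    2: (("Form 1", "accommodated"), ("Form 2", "standard")),
    3: (("Form 2", "standard"), ("Form 1", "accommodated")),
    4: (("Form 2", "accommodated"), ("Form 1", "standard")),
}


def assign_groups(student_ids, seed=None):
    """Shuffle students, then deal them round-robin into the four groups."""
    rng = random.Random(seed)
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    return {sid: (i % len(GROUPS)) + 1 for i, sid in enumerate(shuffled)}


# Made-up student IDs; a fixed seed keeps the assignment reproducible.
assignment = assign_groups(range(100), seed=1)
print(Counter(assignment.values()))
```

In practice the assignment would be run separately for students with and without disabilities, so that both samples are spread evenly across the four groups.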

Analyses
Does test change result in differential performance gains for students with disabilities?
Repeated Measures Analysis of Variance
Sample sizes vary based on degree of change expected
With power of .8 and a significance level of .05, the sample size needs to be 175 per group to detect an effect size of .20
Select a degree of change that is practically significant rather than just statistically significant
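To get a feel for how sample size depends on the expected degree of change, here is a rough sketch using the normal approximation for a simple two-group comparison of means. It assumes the effect size is a standardized mean difference and ignores the repeated-measures structure, so it is not expected to reproduce the 175-per-group figure above, which rests on design assumptions the slide does not state. `n_per_group` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-group mean comparison.

    Uses n = 2 * (z_(1 - alpha/2) + z_power)^2 / d^2, where d is the
    standardized mean difference (an assumption about the effect-size metric).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Smaller expected changes need much larger samples.
for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))
```

Because the required sample grows with the inverse square of the effect size, halving the change you want to detect roughly quadruples the sample.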

Analyses Which score is a better predictor of construct?
Collect alternative data on construct Teachers Ratings Grades Alternate test of same construct Future performance (if test predicts readiness) Analyze data using Regression Analyses Minimum sample size of 100 per group 15 53

Studies Using Operational Data
Using operational test data to study the validity and fairness of assessments for students with disabilities 16 54

Studies Based on Operational Test Data (Overview)
Using operational test data to study fairness and validity Studies that use differential item functioning Studies that use factor analysis 17 55

Using Operational Test Data to Study Fairness and Validity
Willingham’s definition of fairness and validity “It seems clear that the overriding issue is the comparability of tests administered to people with disabilities to those administered to others.” Marks of Comparability Reliability Factor structure Item functioning 18 56

Using Operational Test Data to Study Fairness and Validity
Effective program of research might use Willingham’s framework to compare psychometric properties and internal structure of a state test for: Students without disabilities who take the test under standard conditions; Students with disabilities who take the test under standard conditions; and Students with disabilities who take the test with accommodations and/or modifications 19 57

Pros and Cons of Using Operational Test Data
Pros
Readily available
Large sample sizes
Less expensive
Realistic
Cons
Disability may be poorly or inaccurately described
Accommodations are bundled; not always described accurately

Comparing the Internal Structure of an Assessment
Differential item functioning (DIF) Factor analysis 21 59

What is Differential Item Functioning
Reference Group Students whose performance is used as the standard for the DIF comparison Focal Group Students whose item performance is the focus of the study Test takers matched on proficiency level 22 60

Pros and Cons of Using Differential Item Functioning
Well established procedures Can be used with relatively small samples Analyses are simple and inexpensive 23 61

Pros and Cons of Using Differential Item Functioning
Cons/issues Results sometimes difficult to interpret Non-uniform DIF Differences in ability level of focal and reference groups 24 62

Pros and Cons of Using Differential Item Functioning (cont.)
Cons/issues Matching criterion Confounding of disability and accommodation 25 63

What is Factor Analysis
A statistical method used to explain relationships among variables Variables can be test scores or item scores Exploratory analyses Confirmatory analyses Single-group and multiple-group analysis 26 64

Pros and Cons of Using Factor Analysis
Reduction in number of variables Identification of groups of interrelated variables 27 65

Pros and Cons of Using Factor Analysis
Requires relatively large samples More than one interpretation can be made of the same data factor analyzed in the same way Factor analysis cannot identify causality 28 66

Case Example Designing Accessible Reading Assessment Project
29 67

DARA Project Primary Question:
Can we assess some components of reading (decoding and comprehension) in isolation using multi-stage test design? Primary Focus: Students with learning disabilities, particularly those who receive read aloud accommodations 30 68 68

Possible Solutions Increase the reliability of test scores for students scoring at the chance level (or below) on current state assessments Allow students with reading-based disabilities to receive scores on separate components of reading (potentially comprehension, vocabulary, decoding, fluency) as well as total test score for accountability purposes. Allows state to count scores of students that receive read aloud modification for AYP because it includes a separate measure of fluency and decoding. Holds teachers accountable for both decoding and comprehension instruction. 32 70

Research Questions Are test scores from standard and read aloud administrations psychometrically comparable? Does the read aloud administration offer an unfair advantage to test takers with disabilities? Is the “audio + fluency” route comparable to “standard administration” route in terms of predicting teacher’s ratings of reading comprehension? 33

Research Studies Experimentally designed differential boost study
RM ANOVA Regression DIF Factor Analysis Analysis of operational test data Simulation of multistage design Cognitive labs 34

Questions 1 Does the read aloud administration offer an unfair advantage to test takers with disabilities? 35

Differential Boost Data Collection Design
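[Table: Groups 1-4 by Session 1 and Session 2, each cell giving Form and Accommodation/Modification. Group 1 takes Form S under Standard conditions in Session 1 and Form T with Audio in Session 2; the cells for Groups 2-4 did not survive extraction.]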

Data Collected Primary Measure
2 Reading Comprehension Tests (Form S and T) Extra time Extra time with Read Aloud via CD Additional Measures 2 Fluency Measures 2 Decoding Measures (4th grade only) Student Survey Teacher Survey 37 75

Sample
1181 4th grade students
527 students with reading-based learning disabilities (RLD)
654 students without a disability (NLD)
855 8th grade students
394 RLD
461 NLD

Results of the Differential Boost Study
RM-ANOVA indicated that students with reading disabilities had a significantly larger boost from an audio (read-aloud) accommodation than students without disabilities (findings consistent with Fletcher et al. and Crawford & Tindal, 2004)
Other Findings
Controlling for other factors (e.g., reading fluency, decoding) using RM-ANCOVA does not change these findings
Controlling for ceiling effects does not change these findings
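The quantity at the heart of a differential boost comparison is the difference between the two groups' average gains. The sketch below shows only that descriptive contrast, with made-up scores; it is not the RM-ANOVA used in the study, which also tests whether the contrast is statistically significant. The function names are hypothetical.

```python
from statistics import mean


def boost(standard_scores, accommodated_scores):
    """Mean within-student gain from the standard to the accommodated condition."""
    return mean(a - s for s, a in zip(standard_scores, accommodated_scores))


def differential_boost(swd_std, swd_acc, non_std, non_acc):
    """Gain for students with disabilities minus gain for students without."""
    return boost(swd_std, swd_acc) - boost(non_std, non_acc)


# Made-up scores for illustration only; each position is one student.
swd_standard, swd_audio = [10, 12, 9, 11], [15, 16, 13, 14]
non_standard, non_audio = [20, 22, 19, 21], [21, 23, 20, 21]

print(differential_boost(swd_standard, swd_audio, non_standard, non_audio))
```

A positive value means the accommodation helped students with disabilities more than it helped their peers, which is the pattern the interaction term in the RM-ANOVA is testing for.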

Questions 2 Are test scores from standard and read aloud administrations psychometrically comparable? 40

Analyses of ELA Assessment Using Operational Data
Factor analyses Differential item functioning analyses and distractor analyses Groups of Interest Students with learning disabilities who took the test with and without a change in testing conditions Test Grade 4 and grade 8 English-language Arts (ELA) assessment Focus Determine if the test measures the same constructs for Examinees without disabilities Examinees with learning disabilities who took the test with and without a change in testing conditions 41 79

Differential Item Functioning (DIF) Analyses
The purpose of the DIF study was to examine whether or not the ELA assessment measured the same construct(s) for the groups in our study
Used Mantel-Haenszel procedure with total score as criterion
Mantel-Haenszel categorization
A: negligible DIF
B: slight to moderate DIF
C: moderate to large DIF
Direction of DIF flags
Negative favors reference group
Positive favors focal group
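For readers who have not seen the Mantel-Haenszel statistic computed, here is a minimal sketch for a single item. It assumes test takers have already been grouped into strata by total score, and it classifies by magnitude only, using the conventional cut points of 1.0 and 1.5 on the MH D-DIF scale; the full A/B/C rules also involve significance tests, which are omitted here. The function names and the example counts are hypothetical.

```python
import math


def mh_d_dif(strata):
    """MH D-DIF for one item.

    strata: iterable of (ref_correct, ref_wrong, focal_correct, focal_wrong),
    one tuple per matched total-score level.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        num += a * d / n
        den += b * c / n
    alpha_mh = num / den  # common odds ratio across score levels
    return -2.35 * math.log(alpha_mh)  # negative favors the reference group


def magnitude_category(d_dif):
    """Magnitude-only A/B/C label (significance tests deliberately omitted)."""
    size = abs(d_dif)
    if size < 1.0:
        return "A"
    if size < 1.5:
        return "B"
    return "C"


# Made-up counts for a single score level.
example = [(40, 10, 30, 20)]
value = mh_d_dif(example)
print(round(value, 2), magnitude_category(value))
```

With more than one score level, the same function pools the odds ratios across levels, which is what keeps differences in overall proficiency from being mistaken for item-level DIF.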

Results of the DIF Study
Fourth grade results 1 C DIF item, 10 B DIF items 5 B DIF items were reading items that favored students with disabilities who took test with read-aloud change in testing conditions 5 B DIF items (3 reading and 2 writing) favored students without disabilities 44 82

Results of the DIF Study
Eighth grade results 1 C DIF item, 7 B DIF items Five B DIF items (4 reading and 1 writing) favored students who took test with read-aloud change in testing conditions 45 83

Factor Analyses of ELA Assessment
Purpose: to examine whether or not the ELA assessment measured the same construct(s) for the groups in our study
Exploratory analyses (separately in each group): how many factors?
Confirmatory (multi-group)
Establish base-line model
Confirm number of factors needed to describe data across all groups

Results of Factor Analysis of ELA Assessment
Compared the internal structure of the grade 4 and grade 8 ELA assessment Students without disabilities Students with disabilities (no test conditions changes) Students with disabilities (504/IEP accommodations) Students with disabilities (read-aloud change in testing conditions) Results suggest test measures same single dimension for all groups 47 85

Questions 3 Is the “audio + fluency” route comparable to “standard administration” route in terms of predicting teacher’s ratings of reading comprehension? 48

Predictive Validity of Scores
Regression analyses were conducted to examine which test scores captured the most variance in teachers’ ratings of reading comprehension by grade and disability group.
Tested 4 models:
Standard
Standard + Fluency
Audio
Audio + Fluency
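Comparing models like these comes down to comparing the share of variance in teacher ratings that each set of predictors explains. The sketch below computes R-squared for one-predictor and two-predictor models from simple correlations. All scores are made up for illustration, the function names are hypothetical, and the study's own regression procedure may have differed.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def r2_one(y, x1):
    """R-squared for a regression of y on a single predictor."""
    return pearson(y, x1) ** 2


def r2_two(y, x1, x2):
    """R-squared for a regression of y on two predictors (not perfectly collinear)."""
    ry1, ry2, r12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    return (ry1 ** 2 + ry2 ** 2 - 2 * ry1 * ry2 * r12) / (1 - r12 ** 2)


# Made-up data: teacher ratings and three scores for eight students.
ratings = [2, 3, 3, 4, 5, 1, 4, 2]
standard = [12, 15, 14, 18, 22, 8, 19, 11]
audio = [16, 17, 18, 19, 23, 14, 18, 17]
fluency = [85, 90, 100, 105, 135, 70, 110, 65]

models = {
    "Standard": r2_one(ratings, standard),
    "Standard + Fluency": r2_two(ratings, standard, fluency),
    "Audio": r2_one(ratings, audio),
    "Audio + Fluency": r2_two(ratings, audio, fluency),
}
for name, r2 in models.items():
    print(f"{name}: R^2 = {r2:.3f}")
```

Adding a predictor can never lower R-squared, so the interesting comparison is across routes (for example, Audio + Fluency against Standard), not within a route.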

Predictive Validity of Scores
Tests taken with read-aloud do not predict teachers’ ratings of reading comprehension as well as tests taken under standard conditions. However, combining read-aloud scores with reading fluency scores results in equal (or better) predictions of teacher ratings than tests taken under standard conditions.

Lessons Learned and Implications
Reading fluency is an important element of reading comprehension based on teachers’ ratings
States should consider administering a reading fluency measure when read aloud is used on reading comprehension assessments
Scores from standard administration and read aloud condition are fairly comparable psychometrically
States should consider replicating these DIF and factor analysis studies to provide some justification for reporting read aloud and standard scores on the same scale

Lessons Learned and Implications
4th and 8th graders have no problems using individualized CD players States should consider this type of standardized administration instead of human readers Students were better than their teachers in predicting if scores would improve with read aloud Students should be included in the decision making process on the use of read aloud accommodations 52

Questions? Contact Information Cara Cahalan Laitusis Linda Cook 53

Identifying Less Accurately Measured Students June 14, 2008
CCSSO National Conference on Student Assessment Ross Moen, Martha Thurlow, Kristi Liu

Higher Scores for All? As was previously described, typical reading assessments have limitations for assessing the reading skills of students with disabilities. Is accessible reading assessment a way to increase test scores for all students with disabilities? 2

Interaction Hypothesis Might Suggest All SWD Scores Rise
Students Lacking Disabilities Students With Disabilities Typical Assessment Accessible Assessment 3

But Reality Is More Complicated
Some students with disabilities already score well despite their disabilities. Some students with disabilities truly cannot do what a State’s standards require. Regardless of where the fault lies - whether with the instruction, the student or elsewhere – assessments should show if a student cannot do what is required. 4

Scores Should Rise For Some Less Accurately Measured Students
Clear High Scores Assessment Less Accurately Measured Students (LAMS) Questionable Low Scores Clear Low Scores 5

Sources of Reduced Accuracy
                   Systematic Error                          Random Error
Too High Scores    Cheating, Narrow Teaching to the Test     Lucky Learning, Good Guesses
Too Low Scores     Bias, Inappropriate Obstacles             Bad Day, Bad Guesses, Test Taking Errors

How Can We Identify Potential LAMS?
Compare test results with (what?) other information Match Compare Mismatch ? Match 7

Compare Tests with Teacher Judgment?
? = 8

Teacher Nominations: Study Goals
How well can teachers identify LAMS? Do they say they can? Can they distinguish reasons for LAMS? Can they provide supporting evidence? Do brief supplemental examinations match teacher judgments? What can we learn from teachers’ LAMS? What do they say they need or want? What do we observe in assessment situations? 9

Teacher Nominations: Study Procedures
Teachers complete LAMS nomination questionnaire 4th and 8th grade classroom, reading, English/language arts, special education teachers Researchers meet with teachers Structured interview & examine supporting evidence Researchers meet with students 4th through 8th grade, native speakers of English Structured interview and differentiated assessment 10

Teacher Nominations: Study Results
Two phases separated by adjustments in meeting procedures 21 teachers at 10 sites completed LAMS nomination questionnaires on 77 students Average “misrepresentation” (1-5): 3.89 First phase, met with 2 teachers and 6 students Second phase, met with 7 teachers and 17 students – all elementary 11

Reasons for Identifying Students as LAMS
Reason                                                   Count*   Percentage*
Fluency Limitations Obscure Comprehension Skills           32       41.6%
Some Comprehension Limitations Obscure other Skills        22       28.6%
Test Fails to Reveal Non-Tested Strengths                  18       23.4%
Responds Poorly to Testing Circumstances or Materials      31       40.3%
Other                                                       5        6.5%
* Note duplicate counts on 77 students sum to a total count of 108 and total percentage of 140%
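The note about duplicate counts can be checked with a few lines of arithmetic. The counts and the 77-student base come from the table; the short reason labels are abbreviations introduced here for readability.

```python
# Counts from the table above; each of the 77 students could be nominated
# for more than one reason, so percentages are out of 77, not out of 108.
n_students = 77
counts = {
    "Fluency limitations": 32,
    "Comprehension limitations": 22,
    "Non-tested strengths": 18,
    "Testing circumstances or materials": 31,
    "Other": 5,
}

percentages = {reason: round(100 * c / n_students, 1) for reason, c in counts.items()}
total_count = sum(counts.values())

for reason, pct in percentages.items():
    print(f"{reason}: {counts[reason]} ({pct}%)")
print("Total reasons given:", total_count)
```

Because the base is students, not reasons, the percentages are free to add up to more than 100.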

Teacher Ratings of Hindrances to Student Performance
Columns: Hardly At All, A Little, Some, Quite a Bit, A Lot, Blank, Mean
[Several cell values are missing from the extracted table; each row lists only the values that survive.]
Fluency limitations 3 4 6 <0> 3.47 17.6% 0.0% 23.5% 35.3%
Comprehension limitations 1 5 7 3.82 5.9% 29.4% 41.2%
Low motivation for the test 2.65
Keeping attention focused on the test 2 2.71 11.8%
Getting worn out by the test
Anxiety <1> 2.44
Other: <7> 4.50

Student Attitudes Toward Reading and Tests
Columns: Hardly At All, A Little, Some, Quite a Bit, A Lot, Blank, Mean
[Several cell values are missing from the extracted table; each row lists only the values that survive.]
How much do you read not for school? 1 4 7 3 <1> 3.06 5.9% 23.5% 41.2% 17.6%
How much do you like reading? 9 3.63 0.0% 52.9%
How hard is reading for you? 2 2.75 11.8%
How well do tests show your reading? 6 5 <3> 3.57 35.3% 29.4%

Student Ratings of What Might Help
Columns: Hardly At All, A Little, Some, Quite a Bit, A Lot, Blank, Mean
[Several cell values are missing from the extracted table; each row lists only the values that survive.]
Shorter reading passages 2 4 7 1 <3> 3.50 0.0% 11.8% 23.5% 41.2% 5.9% 17.6%
More interesting passages 3 6 3.93 35.3%
Computer instead of paper and pencil <4> 3.54
Entire test read aloud by CD etc 3.36
Computer pronounces or explains words you pick 4.43
Other ideas you have 5 <10> 29.4% 58.8%

Qualitative Analysis: Teachers’ LAMS confirmed?
Off Target Indications that student is not a LAMS n = ? Seems Close Differ on why LAMS n = ? Seems Close Weak confirmation n = ? Clear Bulls Eye Consensus between researchers & teacher n = ? 16 107

For More Information, Contact:
Ross Moen 17

The Impact of Motivation and Engagement on Assessing Reading Comprehension June 14, 2008
CCSSO National Conference on Student Assessment Deborah R. Dillon and David G. O’Brien

Construct of Comprehension
Current assessments of reading comprehension are inadequate (Sweet, 2005) The current knowledge base on reading comprehension is sizeable but too “sketchy” to provide a foundation for a systematic instructional agenda (RAND Reading Study Group, 2002) 2

Relation of Motivation and Engagement to Comprehension
The RAND RSG initiated its work by generating this definition of reading comprehension: “Reading comprehension is the process of simultaneously extracting and constructing meaning through interaction and involvement with written language” (Snow, 2002, p.11). Guthrie and Wigfield (2005) built upon this definition, contending that involvement with written language connotes motivation 3

Motivation and Comprehension
Involvement assumes an active, intentional stance toward the text, enabling one both to persevere in getting information from text and to use both the textual information and cognitive processes to make meaning
Without motivation, specifically the intention and persistence to the goal of understanding texts for various purposes, there is little comprehension
Thus, Guthrie and Wigfield (2005) argued that definitions of reading comprehension should include motivation

Motivation and Comprehension
Guthrie and Wigfield (1999) outlined a motivational-cognitive model of reading with import for the development of accessible reading assessments. In the model they posit that both cognitive and motivational processes influence reading comprehension.
Wang and Guthrie (2004) found that intrinsic motivation for reading was highly predictive of reading comprehension test performance.
Beliefs about reading and perceptions of oneself as a reader impact whether students expect reading to be useful and whether they want to be effective readers (Guthrie & Wigfield, 2005; O’Brien & Dillon, 2008).

Motivation Constructs with Potential for Enhancing Accessibility
Guthrie and Wigfield (2005) argued that the validity of comprehension assessment can be improved by enabling students to: read with clearly defined purposes take positive stances that support self-efficacy exert autonomy through choice to better employ their cognitive competencies in the testing situation. 6

Interest and Choice Enhance Accessibility
Interest and choice are especially compelling: research indicates that text interest may be more important for lower achievers than for more proficient readers, and choice may have a greater impact on these readers’ comprehension (deSousa & Oakhill, 1996). However, Schiefele’s (1999) findings indicated that the “interest effect” is independent of prior knowledge or verbal ability.

This positive impact on comprehension is particularly true if readers perceive the texts to be attractive (appealing visual elements such as fonts, illustrations, and layout), full of interesting details, and of desirable length and difficulty level (Schraw, Bruning, & Svoboda, 1995). We acknowledge that prior knowledge, which is correlated with both situational interest and reading achievement, may be a confounding variable.

The NAEP Reader Study
The Effects of Choice in Reading Assessment: Results from The NAEP Reader Special Study of the 1994 National Assessment of Educational Progress (Campbell & Donahue, 1997)
U.S. Department of Education, Office of Educational Research and Improvement (NCES) special study
Examined the feasibility and measurement impact of offering grade 8 and grade 12 test takers a choice of reading material on an assessment of reading comprehension.

In the design, a group of readers who could exert choice in selecting from among seven stories to read as part of the 1994 NAEP, was compared with a group who were assigned stories. In the choice condition, the researchers found no significant effect for choice for twelfth graders and slightly lower performance in the choice condition for eighth graders 10

Our Expansion on the NAEP Reader Study
Assessment matching the goals of the 2009 NAEP Framework in terms of the text types and the cognitive targets assessed (Reader study assessed more generic reading comprehension constructs following all passages)
Using both literary-fiction and informational-exposition passages (Reader study used narrative only)
Items following the reading of passages match the content and key ideas of individual passages

Respondents exert choice “choose your own assessment” before assessment is administered (Reader study included reading passage summary and choice as part of testing time) 12

Calibration Study The purpose of the study is to scale or calibrate the measurement tools that will be used in a large-scale accessible reading assessment for students with disabilities. This process allows investigators to empirically determine the comparability of passages and items used in the reading assessment study by placing all passages and questions on a common IRT (item response theory)-based equal-interval measurement scale. 13

Research Questions What is the difficulty of each reading passage (based on a passage total score, which, in turn, is based on performance on all passage comprehension items/questions) and each comprehension item/question? How well can the reading passages be placed on a common interval measurement scale to allow scores from different passages (of equal or unequal difficulty) to be compared and equated? Based on IRT item fit statistics, what multiple choice items should be retained and which should be eliminated? Which reading passages do students prefer to read? 14

Design
Representative total sample of 1,200 students in 4th and 8th grade, spanning a range of reading ability and including students with disabilities. Selection of 40 passages, including 10 literary/fiction and 10 informational/exposition, administered in five forms. 10 items for each passage, using 2009 NAEP cognitive targets. 15

Analysis This preliminary item/passage psychometric calibration study allows for: the placement of all passages/questions on a common equal-interval measurement scale, the development of passage scoring tables by which to assign subjects reading “ability” scores, and provision of a mechanism for equating scores across different passages. An “item fit analysis” will determine which items are retained and which are eliminated. 16
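
To illustrate how a passage scoring table can follow from a common IRT scale, the sketch below assumes a one-parameter (Rasch) model. The slides do not say which IRT model the project uses, and the item difficulties here are hypothetical values, not study results. Under the Rasch assumption, the ability assigned to a raw passage score is the ability at which the expected passage score equals that raw score.

```python
import math


def p_correct(theta, difficulty):
    """Rasch (1PL) probability of a correct response (an assumed model)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def expected_score(theta, difficulties):
    """Expected number of items correct on a passage at ability theta."""
    return sum(p_correct(theta, b) for b in difficulties)


def ability_for_raw_score(raw, difficulties, lo=-6.0, hi=6.0, tol=1e-6):
    """Find, by bisection, the ability whose expected score equals `raw`."""
    if not 0 < raw < len(difficulties):
        raise ValueError("zero and perfect scores have no finite estimate")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # expected_score increases with theta, so move the bracket accordingly
        if expected_score(mid, difficulties) < raw:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def scoring_table(difficulties):
    """Map each non-extreme raw score to an ability estimate in logits."""
    return {
        raw: ability_for_raw_score(raw, difficulties)
        for raw in range(1, len(difficulties))
    }


# Hypothetical difficulties (in logits) for one 10-item passage.
HYPOTHETICAL_DIFFICULTIES = [-1.5, -1.0, -0.6, -0.3, 0.0, 0.2, 0.5, 0.9, 1.3, 1.8]

if __name__ == "__main__":
    for raw, theta in scoring_table(HYPOTHETICAL_DIFFICULTIES).items():
        print(f"raw score {raw:2d} -> ability {theta:+.2f} logits")
```

Because every passage's items would sit on the same scale, tables built this way for different passages return abilities in the same units, which is what makes scores from passages of unequal difficulty comparable.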

Motivation “Choice” Study
Purpose: To examine whether improving the motivational characteristics of a large-scale reading assessment increases its accessibility for students with disabilities, and in so doing provides a more valid assessment of these students’ reading proficiency due to their increased engagement. 17

Research Questions Is there an interaction effect between choice, type of text, and type of student? Is there a correlation between general motivation to read and performance on a large-scale comprehension assessment? Are participants who are more motivated to read more likely to benefit from the choice option on a large-scale assessment? Does the option of exercising choice improve comprehension for general education students and for students with disabilities? 18

Participants and Design
Students fluent in English (140 4th graders; 140 8th graders). Targeted samples representing a range of disability groups. Random assignment to treatment (motivation-choice) and control (no choice) conditions. Reading of 2 literary fiction and 2 informational-expository passages followed by 5-6 multiple-choice items. Untimed administration. Assessment of general and situational motivation. 19
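
Random assignment within one grade could look like the sketch below. It assumes an even split between the two conditions and uses made-up student IDs; the slides do not describe the actual assignment procedure or whether it was stratified by disability group.

```python
import random


def assign_conditions(student_ids, seed=None):
    """Randomly split students into choice and no-choice conditions."""
    rng = random.Random(seed)      # a fixed seed makes the assignment reproducible
    shuffled = list(student_ids)   # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "choice": sorted(shuffled[:half]),
        "no_choice": sorted(shuffled[half:]),
    }


# Hypothetical IDs for the 140 students in one grade.
grade4_ids = [f"G4-{i:03d}" for i in range(1, 141)]
groups = assign_conditions(grade4_ids, seed=2008)

if __name__ == "__main__":
    for condition, members in groups.items():
        print(condition, len(members))
```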

Design & Procedures cont.
Post-assessment interviews will be conducted with subsets of students from the control and experimental groups at both grade levels. Students from the various disability groups as well as regular education students will be selected for interviews (16 students from 4th grade and 16 from 8th grade). 20

Analysis
A split-plot design will be used, with two between-subjects factors (A = passage choice and B = disability status), one within-subjects factor (C = text type), one blocking variable (S = subject), and one covariate (X = motivation as assessed on the MRQ) at the between-subjects level. A, B, C, and X are fixed effects, and S is a random effect. 21

Analysis--contd.
Analysis of variance will be used to evaluate various effects; correlations of students’ performance on the comprehension test and responses on the MRQ and situated motivation questions will be calculated. Various analytic deduction approaches will also be used to analyze the post-assessment interview data, and a mixed-design approach will be used to integrate the overall quantitative and qualitative findings. 22
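
The correlation step can be sketched as a plain Pearson coefficient between comprehension scores and MRQ scores. The two score lists below are hypothetical placeholders, not study data, and the slides do not say which correlation coefficient the project will report.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for six students (illustration only).
comprehension = [12, 15, 9, 20, 17, 11]
mrq_motivation = [2.8, 3.1, 2.2, 3.6, 3.0, 2.9]

if __name__ == "__main__":
    print(f"r = {pearson_r(comprehension, mrq_motivation):.2f}")
```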

Implications
Motivation can lead to increased engagement, which can lead to higher, more valid comprehension performance on high-stakes assessments. In constructing “bias-free” assessments, test designers may eliminate engaging passages or design features either unwittingly or in the interest of psychometrics. Students with disabilities may be more likely to engage with, persevere with, and exert more effort with more motivating and engaging passages, defining these assessments as more accessible than typical reading assessments. 23

Contact Information
Deborah R. Dillon, University of Minnesota, Twin Cities, 330B Peik Hall, 159 Pillsbury Dr. S.E., Curriculum & Instruction Department, Minneapolis, MN 55455
David G. O’Brien, 125 Peik Hall, 159 Pillsbury Drive, Curriculum and Instruction Department, Minneapolis, MN 55455 24

Segmented Text Study June 14, 2008
CCSSO National Conference on Student Assessment Jamal Abedi, Seth Leon & Jenny Kao

Purpose To determine whether reducing the length of reading passages by segmenting them would affect the performance of students with disabilities. To examine the impact of segmenting on students’ non-cognitive domains such as anxiety, fatigue, frustration, and motivation. 2

Background of Study From “chunking” to “segmented text”
Chunking in past literature deals with working memory capacity, with the hypothesis that reading material chunked into meaningful units facilitates reading comprehension and efficiency. However, chunking in the literature refers to chunking sentences. We use “segmented text” to refer to how passage segments are grouped with their corresponding items on the test page. 3

Segmented Text Segmented text also serves as “built-in” test breaks, possibly reducing the need for accommodation 4

Participants
738 Grade 8 students from ten public schools in California: 620 non-SD (students without disabilities) and 117 SD (students with disabilities). Of the 117 SD: 107 specific learning disabilities; 2 deaf/hard of hearing; 3 autistic; 2 speech/language impairment; 4 other health impairments. 5

Reading Test Three reading comprehension passages were obtained from publicly released tests from two states. Two versions of the test were created: Original (version A) and Segmented (version B). The test was designed to be completed in one classroom period (approx. 50 min.). 6

Passages All passages were informational (i.e., not fiction or literature). First passage was 700 words, other two passages were about 550 words each. Each passage had 8 multiple-choice items with 4 possible answer choices (24 total MC items). 7

Process of Segmenting
Segments were grouped with corresponding test items. Each passage was broken down into 3 to 4 segments; each segment contained 1-3 questions. Inferential questions appeared at the end. Test items appeared in the same order in both versions. 8

Other Instruments
Teacher Ratings; Emotion/Mood Inventory; Motivation Scale 9

Teacher Ratings
Asked teachers to rate each of their students. Corresponds with Calif. (CST) proficiency levels. “In your opinion, how would you rate this student’s reading comprehension ability?” Response options: Advanced, Proficient, Basic, Below basic, Far below basic. 10

Emotion/Mood Inventory
Asked students after each passage: “How does taking the test make you feel? Please circle all the words that describe how you feel. There is no right or wrong answer. If none of these words describe how you feel, please circle NONE.” Words offered: good, tired, energetic, upset, bored, confident, frustrated, okay, happy, stressed, blanked out, interested, relaxed, bad, NONE. 11

Motivation Scale Post-test (printed at the end of the test booklets)
10-item, 4-point Likert-type, combining “importance” and “effort” questions 12

Research Questions
Accessibility: Segmented Text and Reliability; Segmented Text and Performance; Segmented Text and the correlation between teacher ratings, English language arts (ELA) achievement test level, and reading performance.
Affective Factors: Segmented Text and Motivation; Segmented Text and Emotion/Mood Inventory. 13

Segmented Text and Reliability Findings
The original version of the assessment was more reliable for non-SDs than for SDs. This reliability gap decreased on the segmented version (no longer a significant difference). This suggests the segmented version may be more accessible for SD students. Reliability limits validity, because r_xy < √(r_xx′) (Allen & Yen, p. 113).
Groups                     Reliability   Validity
SD/Original (n=53)         0.516         0.718
SD/Segment (n=62)          0.689         0.830
Non-SD/Original (n=312)    0.783         0.884
Non-SD/Segment (n=305)     0.788         0.888
14
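
The Validity column appears to be the ceiling implied by the inequality above, that is, the square root of each reliability coefficient. This reading is an inference from the reported numbers rather than something the slide states. A minimal sketch of the computation:

```python
import math

# Reliability coefficients as reported in the table above.
RELIABILITY = {
    "SD/Original": 0.516,
    "SD/Segment": 0.689,
    "Non-SD/Original": 0.783,
    "Non-SD/Segment": 0.788,
}


def max_validity(reliability):
    """Upper bound on a validity coefficient: the square root of reliability."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return math.sqrt(reliability)


if __name__ == "__main__":
    for group, r_xx in RELIABILITY.items():
        print(f"{group}: reliability {r_xx:.3f}, validity ceiling {max_validity(r_xx):.3f}")
```

The computed ceilings agree with the reported Validity values to within rounding of the published reliabilities.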

Segmented Text and Performance
No significant differences in reading performance of either group due to segmenting.
Groups             Mean    Std. dev.   n
SD/Original        9.94    3.32        52
SD/Segment         9.32    4.05        57
Non-SD/Original    13.89   4.58        301
Non-SD/Segment     13.88   4.67        292
15
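
One way to check the "no significant differences" statement from the summary statistics alone is Welch's two-sample t statistic. The slides do not say which test the authors ran, so this is an illustrative check under that assumption, not a reproduction of their analysis.

```python
import math

# (mean, standard deviation, n) as reported in the table above.
GROUPS = {
    "SD/Original": (9.94, 3.32, 52),
    "SD/Segment": (9.32, 4.05, 57),
    "Non-SD/Original": (13.89, 4.58, 301),
    "Non-SD/Segment": (13.88, 4.67, 292),
}


def welch_t(group_a, group_b):
    """Welch's t statistic and approximate degrees of freedom from summaries."""
    mean_a, sd_a, n_a = group_a
    mean_b, sd_b, n_b = group_b
    var_a = sd_a ** 2 / n_a   # squared standard error of each mean
    var_b = sd_b ** 2 / n_b
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (var_a + var_b) ** 2 / (var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1))
    return t, df


if __name__ == "__main__":
    for label in ("SD", "Non-SD"):
        t, df = welch_t(GROUPS[f"{label}/Original"], GROUPS[f"{label}/Segment"])
        print(f"{label}: t = {t:.2f}, df = {df:.0f}")
```

Both statistics come out well below 2 in absolute value, which is consistent with the finding that segmenting did not significantly change performance in either group.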

Motivation Results
Summary of descriptive analyses for the motivation section. No significant differences.
Group                                    Mean    Std. dev.   n
Students with disabilities, original     22.21   3.65        53
Students with disabilities, segmented    22.83   3.44        60
Students with disabilities, total        22.54   3.54        113
Non-disabled, original                   21.36   5.07        313
Non-disabled, segmented                  22.16   4.23        296
Non-disabled, total                      21.75   4.69        609
Original version, total                  21.48   4.89        366
Segmented version, total                 22.27   4.12        356
Total                                    21.87   4.54        722
16
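
The total rows can be reproduced from the four cell means as sample-size-weighted averages, which is a quick consistency check on the table. The sketch below uses only the reported means and group sizes; the dictionary layout and function names are illustrative.

```python
# (mean, n) for each cell, as reported in the table above.
CELLS = {
    ("SD", "original"): (22.21, 53),
    ("SD", "segmented"): (22.83, 60),
    ("Non-SD", "original"): (21.36, 313),
    ("Non-SD", "segmented"): (22.16, 296),
}


def pooled_mean(cells):
    """Sample-size-weighted mean of a list of (mean, n) pairs."""
    total_n = sum(n for _, n in cells)
    return sum(mean * n for mean, n in cells) / total_n


def marginal_mean(position, level):
    """Pooled mean over cells whose key matches `level` at `position`.

    position 0 selects by student group, position 1 by test version.
    """
    return pooled_mean([v for key, v in CELLS.items() if key[position] == level])


if __name__ == "__main__":
    print(f"SD total:        {marginal_mean(0, 'SD'):.2f}")
    print(f"Non-SD total:    {marginal_mean(0, 'Non-SD'):.2f}")
    print(f"Original total:  {marginal_mean(1, 'original'):.2f}")
    print(f"Segmented total: {marginal_mean(1, 'segmented'):.2f}")
    print(f"Grand total:     {pooled_mean(list(CELLS.values())):.2f}")
```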

Conclusions
Results suggest segmented text may be more accessible to students with disabilities. Segmenting did not affect performance of non-SD students; therefore, it did not alter the reading construct. Segmenting did not affect performance of SD students either. Segmenting improves psychometric characteristics of reading comprehension assessments. 17

Implications for large-scale reading assessments
The study could help states identify factors that affect the accessibility of reading assessments. It provides a methodological paradigm for the study of accessibility of reading. It examined important factors that affect the presentation of test items. Since segmenting passages improves the reliability of reading assessment without altering the construct, states can incorporate this feature into their assessments. This study encourages states and test publishers to look into test characteristics in a more comprehensive way to identify factors affecting accessibility of assessments for all students, particularly for students with disabilities. 18

Contact Information Jamal Abedi 19

Accessibility Principles for Reading Assessments June 14, 2008
CCSSO National Conference on Student Assessment Martha L. Thurlow

Purpose of the Accessibility Principles
To identify supported* principles and guidelines for making large-scale assessments of reading proficiency more accessible for students who have disabilities that affect reading, while maintaining a high level of validity for all students taking the assessments. *Support = research, standards, and theory. 2

“Accessibility” An accessible assessment is one that reveals the knowledge and skills of students whose characteristics create barriers to accurate measurement of these on traditional reading assessments. 3

Intended Use
Large-scale reading assessments: assessments focused on grade-level content standards based on grade-level achievement standards (regular or alternate); the reading portion of normative assessments included in state assessment programs; the reading component of English language proficiency assessments. 4

Review Process
NARAP Technical Advisory Committees; NARAP Principles Committee; workshop at the annual conference of the Association of Test Publishers; interactive session at the National Conference on Student Assessment*.
*Attend the interactive session on Monday, June 16 at 8:15 am (Florida Ballroom I). 5

Organization of Accessibility Principles
Principles are “rules” that define the overarching goals to achieve accessibility. A rationale is provided for each principle to justify why it is included. Guidelines under each principle address the implementation. Support is provided for each guideline via an annotated bibliography. 6

Overview of Principles
Reading assessments should be accessible to all students in the testing population, including students with disabilities. Reading assessments should be grounded in the field of reading. Reading assessments should be developed with accessibility as a goal throughout rigorous and well-documented test design, development, and implementation procedures. 7

Overview of Principles
Reading assessments should reduce the need for accommodations, yet be amenable to accommodations that are needed to make valid inferences about a student’s performance. Reporting of reading assessment results should be designed to be transparent to relevant audiences and to encourage valid interpretation and use. 8
