CRESST ONR/NETC Meetings, 17-18 July 2003, v1
ONR Advanced Distributed Learning: Impact of Language Factors on the Reliability and Validity of Assessment

ONR Advanced Distributed Learning

Impact of Language Factors on the Reliability and Validity of Assessment for ELLs

Jamal Abedi
University of California, Los Angeles
National Center for Research on Evaluation, Standards, and Student Testing (CRESST)

July 18, 2003

Classical Test Theory: Reliability

$\sigma^2_X = \sigma^2_T + \sigma^2_E$

X: Observed score
T: True score
E: Error score

$\rho_{XX'} = \sigma^2_T / \sigma^2_X = 1 - \sigma^2_E / \sigma^2_X$

Textbook examples of possible sources that contribute to measurement error:
- Rater
- Occasion
- Item
- Test form
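As a minimal sketch of this decomposition (the variance values are hypothetical, chosen only for illustration), one can simulate observed scores as true score plus independent error and recover the reliability ratio both ways:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical variances: true-score variance and error variance
var_t, var_e = 0.8, 0.2

t = rng.normal(0.0, np.sqrt(var_t), n)  # true scores T
e = rng.normal(0.0, np.sqrt(var_e), n)  # error scores E
x = t + e                               # observed scores X = T + E

# Reliability two ways: the definition and the error-based form
rho_def = t.var() / x.var()             # sigma^2_T / sigma^2_X
rho_err = 1 - e.var() / x.var()         # 1 - sigma^2_E / sigma^2_X
print(rho_def, rho_err)                 # both approximately 0.8 = 0.8 / (0.8 + 0.2)
```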

Generalizability Theory: Partitioning Error Variance into Its Components

$\sigma^2(X_{pro}) = \sigma^2_p + \sigma^2_r + \sigma^2_o + \sigma^2_{pr} + \sigma^2_{po} + \sigma^2_{ro} + \sigma^2_{pro,e}$

p: Person
r: Rater
o: Occasion

Are there any sources of measurement error that may specifically influence ELL performance?
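For a concrete sense of how these components combine, here is a minimal sketch of a generalizability (G) coefficient for relative decisions in a crossed person x rater x occasion design. The component values and facet sizes are invented for illustration, not estimates from the studies discussed below:

```python
# Hypothetical variance components for a fully crossed p x r x o design
var = {
    "p": 0.50,      # person (universe-score variance)
    "r": 0.03,      # rater main effect
    "o": 0.02,      # occasion main effect
    "pr": 0.06,     # person x rater
    "po": 0.05,     # person x occasion
    "ro": 0.01,     # rater x occasion
    "pro,e": 0.28,  # residual (p x r x o confounded with error)
}

n_r, n_o = 2, 2  # raters and occasions averaged over in the decision study

# Relative error variance: person-by-facet interactions shrink as facets grow
rel_error = var["pr"] / n_r + var["po"] / n_o + var["pro,e"] / (n_r * n_o)

# G coefficient: the G-theory analogue of reliability for relative decisions
g = var["p"] / (var["p"] + rel_error)
print(round(g, 3))
```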

Validity of Academic Achievement Measures

We will focus on construct and content validity approaches:

A test's content validity involves the careful definition of the domain of behaviors to be measured by a test and the logical design of items to cover all the important areas of this domain (Allen & Yen, 1979, p. 96).

A test's construct validity is the degree to which it measures the theoretical construct or trait that it was designed to measure (Allen & Yen, 1979, p. 108).

Examples:

A content-based achievement test has construct validity if it measures the content that it is supposed to measure. A content-based achievement test has content validity if the test content is representative of the content being measured.

Two major questions on the psychometrics of academic achievement tests for ELLs:

Are there any sources of measurement error that may specifically influence ELL performance?

Do achievement tests accurately measure ELLs' content knowledge?

Study #9: Impact of students' language background on content-based performance: analyses of extant data (Abedi & Leon, 1999). Analyses were performed on extant data, such as Stanford 9 and ITBS. SAMPLE: Over 900,000 students from four different sites nationwide.

Study #10: Examining ELL and non-ELL student performance differences and their relationship to background factors (Abedi, Leon, & Mirocha, 2001). Data were analyzed for the impact of language on the assessment and accommodation of ELL students. SAMPLE: Over 700,000 students from four different sites nationwide.

Findings:
- The higher the level of language demand of the test items, the larger the performance gap between ELL and non-ELL students.
- Large performance gap between ELL and non-ELL students on reading, science, and math problem solving (about 15 NCE score points).
- This performance gap was reduced to zero in math computation.
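A minimal sketch of the gap computation on mock data (the scores, group sizes, and column names below are synthetic stand-ins for illustration, not the study data):

```python
import pandas as pd

# Synthetic example records: NCE scores by student group and subscale
df = pd.DataFrame({
    "group": ["ELL", "ELL", "non-ELL", "non-ELL"] * 2,
    "subscale": ["reading"] * 4 + ["math_computation"] * 4,
    "nce": [35, 39, 52, 54, 48, 50, 49, 51],
})

# Mean NCE per group and subscale, then the non-ELL minus ELL gap
means = df.groupby(["subscale", "group"])["nce"].mean().unstack("group")
gap = means["non-ELL"] - means["ELL"]
print(gap)  # large for reading, near zero for math computation
```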

[Table: Normal Curve Equivalent (NCE) means and standard deviations in Reading, Science, and Math for Grade 10 and Grade 11 students in the Site 3 school district, reported separately for five groups: SD only, LEP only, LEP & SD, non-LEP & SD, and all students. Note. LEP = limited English proficient. SD = students with disabilities.]
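For readers unfamiliar with the metric: NCE scores are a normalized scale with mean 50 and standard deviation 21.06, chosen so that NCEs match percentile ranks at 1, 50, and 99. A one-function sketch of the conversion:

```python
from scipy.stats import norm

def percentile_to_nce(pct: float) -> float:
    """Convert a percentile rank (0-100, exclusive) to a Normal Curve
    Equivalent: mean 50, SD 21.06, matching percentiles at 1, 50, 99."""
    return 50 + 21.06 * norm.ppf(pct / 100)

print(percentile_to_nce(50))  # 50.0
print(percentile_to_nce(99))  # ~99
```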

The Disparity Index (DI) was an index of performance differences between LEP and non-LEP students.

[Table: Site 3 Disparity Index (DI), non-LEP/non-SD students compared to LEP-only students, by grade: Reading, Math Total, Math Calculation, and Math Analytical.]
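The slide does not reproduce the formula. One plausible formulation, consistent with how such disparity indices are typically defined (an assumption on our part, not stated in this presentation), is the percentage gap relative to the LEP group's mean:

```python
def disparity_index(mean_non_lep: float, mean_lep: float) -> float:
    """Hypothetical formulation: percentage by which the non-LEP group's
    mean exceeds the LEP group's mean (positive = non-LEP higher).
    The presentation does not give the exact formula; this is one
    plausible reading of 'index of performance differences'."""
    return 100 * (mean_non_lep - mean_lep) / mean_lep

print(disparity_index(52.0, 45.0))  # ~15.6
```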

Issues and problems in classification of students with limited English proficiency

Findings: the relationship between language proficiency test scores and LEP classification. Since LEP classification is based on students' level of language proficiency, and because the LAS is a measure of language proficiency, one would expect to find a perfect correlation between LAS scores and LEP status (LEP versus non-LEP). The results of the analyses instead indicated a weak relationship between language proficiency test scores and language classification codes (LEP categories).

[Table: Correlation between LAS rating and LEP classification for Site 4, by grade (G2-G12): Pearson r, two-tailed significance (.000), and N.]
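The statistic behind this check is a Pearson correlation between a continuous score and a binary classification code (equivalently, a point-biserial correlation). A sketch on invented data (the scores, cutoff, and misclassification rate are hypothetical):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)

# Hypothetical LAS scores and a binary LEP code (1 = LEP, 0 = non-LEP)
las = rng.normal(60, 10, 500)
# A noisy classification: mostly a cutoff at 55, with 25% misclassified
lep = ((las < 55) ^ (rng.random(500) < 0.25)).astype(int)

r, p = pearsonr(las, lep)
print(f"r = {r:.3f}, p = {p:.3g}")  # modest |r| signals inconsistent classification
```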

Correlation coefficients between LEP classification code and ITBS subscales for Site 1

                            Reading    Math Concepts    Math Problem    Math
                                       & Estimation     Solving         Computation
Grade 3   Pearson r
          Sig (2-tailed)    .000
          N                 36,006     35,981           35,948          36,000
Grade 6   Pearson r
          Sig (2-tailed)    .000
          N                 28,272     28,273           28,250          28,261
Grade 8   Pearson r
          Sig (2-tailed)    .000
          N                 25,362     25,336           25,333          25,342

Generalizability Theory: Language as an Additional Source of Measurement Error

$\sigma^2(X_{prl}) = \sigma^2_p + \sigma^2_r + \sigma^2_l + \sigma^2_{pr} + \sigma^2_{pl} + \sigma^2_{rl} + \sigma^2_{prl,e}$

p: Person
r: Rater
l: Language

Are there any sources of measurement error that may specifically influence ELL performance?
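Extending the earlier G-coefficient sketch: a nonzero person-by-language component inflates relative error variance and depresses reliability for ELL examinees. The component values below are hypothetical, chosen only to make the mechanism visible:

```python
# Hypothetical variance components for a p x r x l design, where the
# language-linked components are zero for fluent speakers but not for ELLs
def g_coefficient(var_p, var_pr, var_pl, var_resid, n_r=2, n_l=1):
    # Relative error variance for a decision study with n_r raters, n_l language conditions
    rel_error = var_pr / n_r + var_pl / n_l + var_resid / (n_r * n_l)
    return var_p / (var_p + rel_error)

# Same person and rater components; ELLs carry extra language-linked error
g_non_ell = g_coefficient(var_p=0.50, var_pr=0.06, var_pl=0.00, var_resid=0.28)
g_ell     = g_coefficient(var_p=0.50, var_pr=0.06, var_pl=0.10, var_resid=0.34)
print(round(g_non_ell, 3), round(g_ell, 3))  # reliability drops when language adds error
```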