CRT Dependability: Consistency for criterion-referenced decisions.

Similar presentations
Consistency in testing

The Research Consumer Evaluates Measurement Reliability and Validity
Reliability Definition: The stability or consistency of a test. Assumption: True score = obtained score +/- error Domain Sampling Model Item Domain Test.
© McGraw-Hill Higher Education. All rights reserved. Chapter 3 Reliability and Objectivity.
© 2006 The McGraw-Hill Companies, Inc. All rights reserved. McGraw-Hill Validity and Reliability Chapter Eight.
Psychometrics William P. Wattles, Ph.D. Francis Marion University.
Chapter 4 – Reliability Observed Scores and True Scores Error
Assessment Procedures for Counselors and Helping Professionals, 7e © 2010 Pearson Education, Inc. All rights reserved. Chapter 5 Reliability.
 A description of the ways a research will observe and measure a variable, so called because it specifies the operations that will be taken into account.
Reliability for Teachers Kansas State Department of Education ASSESSMENT LITERACY PROJECT1 Reliability = Consistency.
What is a Good Test Validity: Does test measure what it is supposed to measure? Reliability: Are the results consistent? Objectivity: Can two or more.
Part II Sigma Freud & Descriptive Statistics
1 Effective Use of Benchmark Test and Item Statistics and Considerations When Setting Performance Levels California Educational Research Association Anaheim,
Chapter 4 Validity.
REVIEW I Reliability Index of Reliability Theoretical correlation between observed & true scores Standard Error of Measurement Reliability measure Degree.
Measurement: Reliability and Validity For a measure to be useful, it must be both reliable and valid Reliable = consistent in producing the same results.
Session 3 Normal Distribution Scores Reliability.
Multiple Choice Test Item Analysis Facilitator: Sophia Scott.
Research Methods in MIS
Measurement Joseph Stevens, Ph.D. ©  Measurement Process of assigning quantitative or qualitative descriptions to some attribute Operational Definitions.
Classroom Assessment A Practical Guide for Educators by Craig A
Evaluating a Norm-Referenced Test Dr. Julie Esparza Brown SPED 510: Assessment Portland State University.
Classical Test Theory By ____________________. What is CCT?
LG675 Session 5: Reliability II Sophia Skoufaki 15/2/2012.
Classroom Assessment Reliability. Classroom Assessment Reliability Reliability = Assessment Consistency. –Consistency within teachers across students.
Psychometrics Timothy A. Steenbergh and Christopher J. Devers Indiana Wesleyan University.
Measurement and Data Quality
Reliability, Validity, & Scaling
Reliability and Validity what is measured and how well.
Data Analysis. Quantitative data: Reliability & Validity Reliability: the degree of consistency with which it measures the attribute it is supposed to.
Copyright © 2012 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 14 Measurement and Data Quality.
LECTURE 06B BEGINS HERE THIS IS WHERE MATERIAL FOR EXAM 3 BEGINS.
Standardization and Test Development Nisrin Alqatarneh MSc. Occupational therapy.
Test item analysis: When are statistics a good thing? Andrew Martin Purdue Pesticide Programs.
Reliability Chapter 3.  Every observed score is a combination of true score and error Obs. = T + E  Reliability = Classical Test Theory.
Reliability & Validity
1 Chapter 4 – Reliability 1. Observed Scores and True Scores 2. Error 3. How We Deal with Sources of Error: A. Domain sampling – test items B. Time sampling.
Tests and Measurements Intersession 2006.
Reliability & Agreement DeShon Internal Consistency Reliability Parallel forms reliability Parallel forms reliability Split-Half reliability Split-Half.
Item specifications and analysis
Cut Points ITE Section One n What are Cut Points?
6. Evaluation of measuring tools: validity Psychometrics. 2012/13. Group A (English)
Research methods in clinical psychology: An introduction for students and practitioners Chris Barker, Nancy Pistrang, and Robert Elliott CHAPTER 4 Foundations.
Designs and Reliability Assessing Student Learning Section 4.2.
Presented By Dr / Said Said Elshama  Distinguish between validity and reliability.  Describe different evidences of validity.  Describe methods of.
Copyright © 2008 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 17 Assessing Measurement Quality in Quantitative Studies.
SOCW 671: #5 Measurement Levels, Reliability, Validity, & Classic Measurement Theory.
Psychometrics. Goals of statistics Describe what is happening now –DESCRIPTIVE STATISTICS Determine what is probably happening or what might happen in.
©2011 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Technical Adequacy of Tests Dr. Julie Esparza Brown SPED 512: Diagnostic Assessment.
REVIEW I Reliability scraps Index of Reliability Theoretical correlation between observed & true scores Standard Error of Measurement Reliability measure.
Reliability a measure is reliable if it gives the same information every time it is used. reliability is assessed by a number – typically a correlation.
Chapter 7 Criterion-Referenced Measurement PoorSufficientBetter.
©2013, The McGraw-Hill Companies, Inc. All Rights Reserved Chapter 5 What is a Good Test?
Dr. Jeffrey Oescher 27 January 2014 Technical Issues  Two technical issues  Validity  Reliability.
Copyright © 2014 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 11 Measurement and Data Quality.
Chapter 2 Norms and Reliability. The essential objective of test standardization is to determine the distribution of raw scores in the norm group so that.
1 Measurement Error All systematic effects acting to bias recorded results: -- Unclear Questions -- Ambiguous Questions -- Unclear Instructions -- Socially-acceptable.
LangTest: An easy-to-use stats calculator Punjaporn P.
Questions What are the sources of error in measurement?
Assessment Theory and Models Part II
Classical Test Theory Margaret Wu.
Reliability & Validity
Human Resource Management By Dr. Debashish Sengupta
Week 3 Class Discussion.
Using statistics to evaluate your test Gerard Seinhorst
By ____________________
The first test of validity
REVIEW I Reliability scraps Index of Reliability
Presentation transcript:

CRT Dependability: Consistency for criterion-referenced decisions

Challenges for CRT dependability
Raw scores may not show much variation (skewed distributions)
CRT decisions are based on acceptable performance rather than relative position
A measure of the dependability of the classification (i.e., master / non-master) is needed

Approaches using cut-scores
Threshold loss agreement
–In a test-retest situation, how consistently are the students classified as master / non-master?
–All misclassifications are considered equally serious
Squared error loss agreement
–How consistent are the classifications?
–Misclassifying students far above or far below the cut-point is considered more serious
Berk, R. A. (1984). Selecting the index of reliability. In R. A. Berk (Ed.), A guide to criterion-referenced test construction (pp. ). Baltimore, MD: The Johns Hopkins University Press.

Issues with cut-scores
“The validity of the final classification decisions will depend as much upon the validity of the standard as upon the validity of the test content” (Shepard, 1984, p. 169)
“Just because excellence can be distinguished from incompetence at the extremes does not mean excellence and incompetence can be unambiguously separated at the cut-off.” (p. 171)
Shepard, L. A. (1984). Setting performance standards. In R. A. Berk (Ed.), A guide to criterion-referenced test construction (pp. ). Baltimore, MD: The Johns Hopkins University Press.

Methods for determining cut-scores
Method 1: expert judgments about the performance of hypothetical students on the test
Method 2: test performance of actual students

Setting cut-scores (Brown, 1996, p. 257)

Institutional decisions (Brown, 1996, p. 260)

Agreement coefficient (p_o) and kappa
p_o = (A + D) / N
p_o = (77 + 21) / 110 = .89
p_chance = [(A + B)(A + C) + (C + D)(B + D)] / N²
K = (p_o - p_chance) / (1 - p_chance)
K = (.89 - .63) / (1 - .63) = .70
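A minimal sketch in Python of the threshold-loss indices above. Only A = 77, D = 21, and N = 110 appear on the slide, so the off-diagonal counts B and C below are hypothetical values chosen to sum to the remaining 12 students.

```python
def agreement_and_kappa(a, b, c, d):
    """Threshold-loss agreement for a 2 x 2 master / non-master table.

    a = master on both administrations, d = non-master on both,
    b and c = the two kinds of inconsistent classification.
    """
    n = a + b + c + d
    p_o = (a + d) / n                                             # observed agreement
    p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2   # chance agreement
    kappa = (p_o - p_chance) / (1 - p_chance)                     # agreement beyond chance
    return p_o, p_chance, kappa

# B = 6 and C = 6 are hypothetical; A, D, and N come from the slide.
p_o, p_chance, kappa = agreement_and_kappa(77, 6, 6, 21)
print(round(p_o, 2), round(p_chance, 2), round(kappa, 2))
# p_o ≈ .89 and p_chance ≈ .63 as above; kappa ≈ .71 here,
# or .70 when computed from the rounded .89 and .63.
```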

Short-cut methods for one administration
Calculate an NRT reliability coefficient
–Split-half, KR-20, Cronbach alpha
Convert cut-score to standardized score
–z = (cut-score - .5 - mean) / SD
Use Table 7.9 to estimate agreement
Use Table 7.10 to estimate kappa
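A sketch of the two quantities the short-cut method needs before Tables 7.9 and 7.10 can be consulted, assuming dichotomously scored items; the response matrix and cut-score below are hypothetical.

```python
import numpy as np

def kr20(items):
    """KR-20 reliability for a persons x items matrix of 0/1 scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    p = items.mean(axis=0)                      # proportion correct per item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total raw scores
    return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total_var)

def standardized_cut(cut_score, raw_scores):
    """z = (cut-score - .5 - mean) / SD, as in the short-cut method above."""
    raw_scores = np.asarray(raw_scores, dtype=float)
    return (cut_score - 0.5 - raw_scores.mean()) / raw_scores.std(ddof=1)

# Hypothetical data: 6 examinees x 5 items, raw cut-score of 3.
data = np.array([[1, 1, 1, 0, 1],
                 [1, 0, 1, 1, 1],
                 [0, 0, 1, 0, 0],
                 [1, 1, 1, 1, 1],
                 [0, 1, 0, 0, 1],
                 [1, 1, 0, 1, 1]])
reliability = kr20(data)
z = standardized_cut(3, data.sum(axis=1))
# reliability and z are then looked up in Tables 7.9 and 7.10 to
# approximate the agreement and kappa coefficients.
```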

Estimate the dependability for the HELP Reading test
Assume a cut-point of 60%. What is the raw score? 27
z =
Look at Table 9.1. What is the approximate value of the agreement coefficient?
Look at Table 9.2. What is the approximate value of the kappa coefficient?

Squared-error loss agreement
Sensitive to degrees of mastery / non-mastery
Short-cut form of generalizability study
Classical Test Theory
–OS = TS + E
Generalizability Theory
–OS = TS + (E1 + E2 + … + Ek)
Brennan, Robert (1995). Handout from generalizability theory workshop.

Phi (lambda) dependability index
Computed from:
–Cut-point
–Number of items
–Mean of proportion scores
–Standard deviation of proportion scores
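A sketch of the single-administration short-cut for Φ(λ), built from exactly the four quantities listed above (the form usually attributed to Brennan); the proportion scores below are hypothetical, and the SD convention (population vs. sample) should be checked against the source.

```python
import numpy as np

def phi_lambda(prop_scores, cut, n_items):
    """Short-cut phi(lambda) dependability estimate from one administration.

    prop_scores : proportion-correct score for each examinee
    cut         : cut-point expressed as a proportion (e.g., 0.60)
    n_items     : number of items on the test
    """
    p = np.asarray(prop_scores, dtype=float)
    m = p.mean()                  # mean of proportion scores
    s2 = p.var(ddof=0)            # variance of proportion scores (population form assumed)
    error = (m * (1 - m) - s2) / (n_items - 1)
    return 1 - error / ((m - cut) ** 2 + s2)

# Hypothetical proportion scores for 10 examinees on a 30-item test, cut-point 60%.
scores = [0.93, 0.87, 0.80, 0.77, 0.73, 0.70, 0.63, 0.60, 0.53, 0.47]
print(round(phi_lambda(scores, 0.60, 30), 2))   # about .78 with these data
```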

Domain score dependability
Does not depend on a cut-point for calculation
“estimates the stability of an individual’s score or proportion correct in the item domain, independent of any mastery standard” (Berk, 1984, p. 252)
Assumes a well-defined domain of behaviors

Phi dependability index
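As a sketch of how Φ can be estimated for a crossed persons x items generalizability study of the kind referenced above, using the standard ANOVA-based variance components; the 0/1 response matrix below is hypothetical.

```python
import numpy as np

def phi_dependability(scores):
    """Phi dependability index for a crossed persons x items (p x i) design.

    scores : 2-D array, rows = persons, columns = items (0/1 scores).
    """
    x = np.asarray(scores, dtype=float)
    n_p, n_i = x.shape
    grand = x.mean()
    person_means = x.mean(axis=1)
    item_means = x.mean(axis=0)

    # ANOVA mean squares for the p x i design
    ms_p = n_i * ((person_means - grand) ** 2).sum() / (n_p - 1)
    ms_i = n_p * ((item_means - grand) ** 2).sum() / (n_i - 1)
    resid = x - person_means[:, None] - item_means[None, :] + grand
    ms_pi = (resid ** 2).sum() / ((n_p - 1) * (n_i - 1))

    # Variance components (negative estimates conventionally set to zero)
    var_p = max(0.0, (ms_p - ms_pi) / n_i)
    var_i = max(0.0, (ms_i - ms_pi) / n_p)
    var_pi = ms_pi

    # Absolute error for domain-score (criterion-referenced) decisions
    abs_error = (var_i + var_pi) / n_i
    return var_p / (var_p + abs_error)

# Hypothetical 0/1 responses: 5 persons x 4 items.
data = [[1, 1, 1, 0],
        [1, 1, 0, 0],
        [1, 1, 1, 0],
        [1, 0, 0, 0],
        [1, 1, 1, 1]]
print(round(phi_dependability(data), 2))   # about .48 with these data
```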

Confidence intervals
Analogous to the SEM for NRTs
Interpreted as a proportion-correct score rather than a raw score
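A minimal sketch of one way such a band can be formed, assuming the absolute-error term from the Φ(λ) short-cut above is used as the SEM-like quantity on the proportion-correct scale; this is an illustrative assumption rather than the source's exact procedure, and the numbers are hypothetical.

```python
import math

def crt_confidence_band(mean_prop, sd_prop, n_items):
    """SEM-like band on the proportion-correct scale.

    Assumption: reuses the absolute-error term from the phi(lambda)
    short-cut, sqrt([M(1 - M) - S^2] / (n - 1)); verify the exact
    procedure against the source text.
    """
    return math.sqrt((mean_prop * (1 - mean_prop) - sd_prop ** 2) / (n_items - 1))

# Hypothetical: mean proportion score .70, SD .14, 30 items.
band = crt_confidence_band(0.70, 0.14, 30)
# An examinee at .60 would be reported as roughly .60 +/- band.
print(round(band, 3))
```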

Reliability recap
Longer tests are better than short tests
Well-written items are better than poorly written items
Items with high discrimination (ID for NRTs, B-index for CRTs) are better
A test made up of similar items is better
CRTs: a test that is related to the objectives is better
NRTs: a test that is well-centered and spreads students out is better