Item specifications and analysis

Similar presentations
Item Analysis.
Standardized Scales.
Standardized Tests: What Are They? Why Use Them?
What is a Good Test Validity: Does test measure what it is supposed to measure? Reliability: Are the results consistent? Objectivity: Can two or more.
Gary D. Borich Effective Teaching Methods 6th Edition
Designing Scoring Rubrics. What is a Rubric? Guidelines by which a product is judged Guidelines by which a product is judged Explain the standards for.
Chapter Fifteen Understanding and Using Standardized Tests.
Some Practical Steps to Test Construction
Chapter 4 Validity.
Chapter 8 Developing Written Tests and Surveys Physical Fitness Knowledge.
Chapter Three: Determining Program Components by David Agnew Arkansas State University.
Lesson Seven Item Analysis. Contents Item Analysis Item Analysis Item difficulty (item facility) Item difficulty (item facility) Item difficulty Item.
Lesson Nine Item Analysis.
Multiple Choice Test Item Analysis Facilitator: Sophia Scott.
Lesson Thirteen Standardized Test. Yuan 2 Contents Components of a Standardized test Reasons for the Name “Standardized” Reasons for Using a Standardized.
Standardized Test Scores Common Representations for Parents and Students.
Understanding Validity for Teachers
LG675 Session 5: Reliability II Sophia Skoufaki 15/2/2012.
Universal Screening and Progress Monitoring Nebraska Department of Education Response-to-Intervention Consortium.
Quantitative Research
Formative and Summative Assessment
I want to test a wound treatment or educational program but I have no funding or resources, How do I do it? Implementing & evaluating wound research conducted.
Standardized Tests. Standardized tests are commercially published tests most often constructed by experts in the field. They are developed in a very precise.
1 Development of Valid and Reliable Case Studies for Teaching, Diagnostic Reasoning, and Other Purposes Margaret Lunney, RN, PhD Professor College of.
Topic 4: Formal assessment
Kaizen–What Can I Do To Improve My Program? F. Jay Breyer, Ph.D. Presented at the 2005 CLEAR Annual Conference September Phoenix,
TESTS © LOUIS COHEN, LAWRENCE MANION & KEITH MORRISON.
Induction to assessing student learning Mr. Howard Sou Session 2 August 2014 Federation for Self-financing Tertiary Education 1.
Standardization and Test Development Nisrin Alqatarneh MSc. Occupational therapy.
Classroom Assessments Checklists, Rating Scales, and Rubrics
What is Open response?.  A Situation Reading Open Response will have a story, a poem, or an article to read.  A Task Set of questions/prompt to answer.
CRT Dependability Consistency for criterion- referenced decisions.
Chapter 7 Item Analysis In constructing a new test (or shortening or lengthening an existing one), the final set of items is usually identified through.
EDU 385 Education Assessment in the Classroom
NRTs and CRTs Group members: Camila, Ariel, Annie, William.
Cut Points ITE Section One n What are Cut Points?
What is design? Blueprints of the instructional experience Outlining how to reach the instructional goals determined during the Analysis phase The outputs.
The Teaching Process. Problem/condition Analyze Design Develop Implement Evaluate.
Assessment and Testing
Developing Assessment Instruments
Using Data to Improve Student Achievement Summer 2006 Preschool CSDC.
Test Specification Purposes, structure, nature and use.
Guidelines for Critically Reading the Medical Literature John L. Clayton, MPH.
Common Terms in AP Essay Prompts Since this is a college course, you are going to see many terms (in addition to vocab) that you might not know. Sometimes.
Review: Alternative Assessments Alternative/Authentic assessment Real-life setting Performance based Techniques: Observation Individual or Group Projects.
THE ASSESSMENT CYCLE, ASSESSMENT DESIGN AND SPECIFICATIONS PROSET - TEMPUS1 Prepared by Maria Verbitskaya, Angelika Kalinina, Elena Solovova.
Tests and Measurements
REVIEW I Reliability scraps Index of Reliability Theoretical correlation between observed & true scores Standard Error of Measurement Reliability measure.
ASSESSMENT CRITERIA Jessie Johncock Mod. 2 SPE 536 October 7, 2012.
Assessment My favorite topic (after grammar, of course)
PSYCHOMETRICS. SPHS 5780, LECTURE 6: PSYCHOMETRICS, “STANDARDIZED ASSESSMENT”, NORM-REFERENCED TESTING.
Developing Assessment Instruments Instructional Design: Unit, 3 Design Phase.
Review: Performance-Based Assessments Performance-based assessment Real-life setting H.O.T.S. Techniques: Observation Individual or Group Projects Portfolios.
Principles of Instructional Design Assessment Recap Checkpoint Questions Prerequisite Skill Analysis.
©2013, The McGraw-Hill Companies, Inc. All Rights Reserved Chapter 5 What is a Good Test?
Assessing Student Performance Characteristics of Good Assessment Instruments (c) 2007 McGraw-Hill Higher Education. All rights reserved.
CEIT 225 Instructional Design Prof. Dr. Kürşat Çağıltay
©2013, The McGraw-Hill Companies, Inc. All Rights Reserved Chapter 7 Assessing and Grading the Students.
Lesson Thirteen Standardized Test. Contents Components of a Standardized test Reasons for the Name “Standardized” Reasons for Using a Standardized Test.
Using Data to Improve Student Achievement Summer 2006 Preschool CSDC.
Designing Scoring Rubrics
Classroom Assessments Checklists, Rating Scales, and Rubrics
Classroom Assessments Checklists, Rating Scales, and Rubrics
Classroom Assessment Ways to improve tests.
Standards and Assessment Alternatives
TOPIC 4 STAGES OF TEST CONSTRUCTION
Using statistics to evaluate your test Gerard Seinhorst
Analyzing test data using Excel Gerard Seinhorst
REVIEW I Reliability scraps Index of Reliability
EDUC 2130 Quiz #10 W. Huitt.
Presentation transcript:

CRT Development: Item specifications and analysis

Considerations for CRTs
- Unlike the items of an NRT, individual CRT items are not 'expendable': each has been written to assess a specific area of interest.
- "If a criterion-referenced test doesn't unambiguously describe just what it's measuring, it offers no advantage over norm-referenced measures." (Popham, 1984, p. 29)
Reference: Popham, W. J. (1984). Specifying the domain of content or behaviors. In R. A. Berk (Ed.), A guide to criterion-referenced test construction (pp. 29-48). Baltimore, MD: The Johns Hopkins University Press.

CRT score interpretation
[Figure contrasting score interpretation for a 'Good' CRT and a 'Bad' CRT] (Popham, 1984, p. 31)

Test specifications
- 'Blueprints' for creating test items
- Ensure that item content matches the objectives (or criteria) to be assessed
- Though usually associated with CRTs, specifications can also be useful in NRT development (Davidson & Lynch, 2002)
- Recent criticism: many CRT specifications (and the resulting tests) are too closely tied to specific item types and lead to 'narrow' learning

Specification components (Davidson & Lynch, 2002)
- General Description (GD): a brief statement of the focus of the assessment
- Prompt Attributes (PA): details what will be given to the test taker
- Response Attributes (RA): describes what should happen when the test taker responds to the prompt
- Sample Item (SI): an example item that conforms to the specification
- Specification Supplement (SS): other useful information regarding the item or its scoring
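To make the five components concrete, the sketch below records a single specification as a simple data structure. The objective, wording, and field names are hypothetical illustrations and are not taken from Davidson & Lynch (2002).

```python
# Hypothetical specification for one CRT objective, organised by the five
# components listed above. All content is illustrative only.
spec = {
    "general_description": "Identify the main idea of a short expository paragraph.",
    "prompt_attributes": ("An 80-120 word expository paragraph, a stem asking for the "
                          "main idea, and four options: one key and three distractors."),
    "response_attributes": "The test taker selects the one option that states the main idea.",
    "sample_item": "[paragraph] What is the main idea of the paragraph? A) ... B) ... C) ... D) ...",
    "specification_supplement": "Dichotomous scoring: 1 for the key, 0 for any other choice.",
}

for component, description in spec.items():
    print(f"{component}: {description}")
```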

Item-specification congruence
[Figure: see Brown, 1996, p. 78]

CRT statistical item analysis
- Based on criterion groups. To select groups, ask: who should be able to master the objectives, and who should not?
- Logical group comparisons:
  - Pre-instruction / post-instruction
  - Uninstructed / instructed
  - Contrasting groups
- The interpretation of the analysis will depend in part on the groups chosen.

Pre-instruction / post-instruction
Advantages:
- Individual as well as group gains can be measured
- Can give diagnostic information about progress and the program
Disadvantages:
- Requires a post-test
- Potential for a test effect
Reference: Berk, R. A. (1984). Conducting the item analysis. In R. A. Berk (Ed.), A guide to criterion-referenced test construction (pp. 97-143). Baltimore, MD: The Johns Hopkins University Press.

Uninstructed / instructed
Advantages:
- The analysis can be conducted at one point in time
- The test can be used immediately for mastery / non-mastery decisions
Disadvantages:
- Group identification might be difficult
- Group performance might be affected by a variety of factors (e.g., age, background)

Contrasting groups
Advantages:
- Does not equate instruction with mastery
- The sample of masters is proportional to the population
Disadvantages:
- Defining 'mastery' can be difficult
- Creating each group individually is time consuming
- Extraneous variables may affect group performance

Guidelines for selecting CRT items (characteristic: criterion; index value)
- Item-spec congruence: the item matches the objective being tested
- IF (difficulty): hard for the uninstructed group (UG), IF < .50; easy for the instructed group (IG), IF > .70
- Discrimination: positively discriminates between the criterion groups; high positive index value
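A minimal sketch of how the IF guidelines might be applied, assuming dichotomously scored (1/0) response data for hypothetical uninstructed and instructed groups; the .50 and .70 cut-offs are the values from the slide above.

```python
# Screen items against the IF guidelines: hard for the uninstructed group (UG),
# easy for the instructed group (IG). Response data are hypothetical:
# rows = test takers, columns = items, 1 = correct, 0 = incorrect.

def item_facility(responses, item):
    """IF: the proportion of a group answering the item correctly."""
    answers = [person[item] for person in responses]
    return sum(answers) / len(answers)

uninstructed = [[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 0, 0]]
instructed   = [[1, 1, 0], [1, 1, 1], [1, 0, 0], [1, 1, 1]]

for item in range(3):
    if_ug = item_facility(uninstructed, item)
    if_ig = item_facility(instructed, item)
    keep = if_ug < 0.50 and if_ig > 0.70
    print(f"Item {item + 1}: IF(UG) = {if_ug:.2f}, IF(IG) = {if_ig:.2f}, retain = {keep}")
```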

Item discrimination for groups
- Criterion groups: uninstructed (non-masters) vs. instructed (masters)
- DI = IF(masters) - IF(non-masters)
- Sometimes called DIFF (the difference score)
(Berk, 1984, p. 194)
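As a small follow-on to the IF sketch above, the difference index can be computed directly from the two group facilities; the function name below is an illustrative choice, not Berk's.

```python
def difference_index(if_masters, if_non_masters):
    """DI (a.k.a. DIFF): IF for the instructed/master group minus IF for the
    uninstructed/non-master group. Values near +1 indicate strong discrimination."""
    return if_masters - if_non_masters

# Hypothetical facilities for one item:
print(difference_index(if_masters=0.85, if_non_masters=0.30))   # 0.55
```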

[Figure: see Brown, 1996, p. 81]

Item analysis interpretation
[Figure: see Berk, 1984, p. 125]

Distractor efficiency analysis (Berk, 1984, p. 127)
1. Each distractor should be selected by more students in the uninstructed (or incompetent) group than in the instructed (or competent) group.
2. At least a few uninstructed (or incompetent) students (5-10%) should choose each distractor.
3. No distractor should receive as many responses from the instructed (or competent) group as the correct answer.
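A minimal sketch of the three checks above, assuming tallies of option choices from hypothetical uninstructed and instructed groups; the 5% floor implements the lower end of the 5-10% guideline.

```python
from collections import Counter

# Hypothetical option choices for one four-option item; 'A' is the key,
# 'B'-'D' are the distractors.
key = "A"
uninstructed = ["B", "C", "A", "D", "B", "C", "A", "D", "B", "C"]
instructed   = ["A", "A", "A", "B", "A", "A", "A", "A", "C", "A"]

ug, ig = Counter(uninstructed), Counter(instructed)
n_ug, n_ig = len(uninstructed), len(instructed)

for distractor in ["B", "C", "D"]:
    more_popular_with_ug = ug[distractor] / n_ug > ig[distractor] / n_ig   # check 1
    chosen_by_some_ug    = ug[distractor] / n_ug >= 0.05                   # check 2
    below_key_in_ig      = ig[distractor] < ig[key]                        # check 3
    print(f"Distractor {distractor}: {more_popular_with_ug}, {chosen_by_some_ug}, {below_key_in_ig}")
```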

The B-index
- A discrimination index calculated from a single test administration
- The criterion groups are defined by passing or failing the test itself; failing means falling below a predetermined cut score
- B = IF(passing group) - IF(failing group)
- The validity of the cut-score decision will affect the validity of the B-index
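A minimal sketch of the B-index as described above, assuming dichotomous item scores, hypothetical total scores, and an illustrative cut score; the function name is an assumption for the example.

```python
def b_index(item_scores, total_scores, cut_score):
    """IF among test takers at or above the cut score minus IF among those below it."""
    passers = [i for i, total in zip(item_scores, total_scores) if total >= cut_score]
    failers = [i for i, total in zip(item_scores, total_scores) if total < cut_score]
    return sum(passers) / len(passers) - sum(failers) / len(failers)

# Hypothetical data: 1/0 scores on one item and total test scores for eight test takers.
item_scores  = [1, 1, 0, 1, 0, 1, 1, 0]
total_scores = [18, 16, 9, 20, 11, 8, 17, 10]
print(b_index(item_scores, total_scores, cut_score=15))   # 1.00 - 0.25 = 0.75
```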

[Figure: see Brown, 1996, p. 83]