Test Design & Construction

Presentation transcript:

Test Design & Construction
RSCH 6109: Assessment & Evaluation Methods
- Purpose & Framework
- Test Specifications or Blueprint
- Item Construction
- Field Testing
- Evaluation & Revision

Evaluating the Value of a Test
- The value of a test can only be determined relative to its purpose and intended use.
- Value = reliability, validity, and cultural sensitivity.
- However, "value" is not a property of the test itself: reliability, validity, and cultural sensitivity are properties of the information the test provides in the context of administration, scoring, interpretation, and application to a particular population.
- Contrast proper and improper uses of the BDI (Beck Depression Inventory).

The Pilot Study
- Give the test to a small group drawn from the target audience.
- Ask them to complete the test and to give feedback about it:
  - How long did it take you to complete the test?
  - Were any items or wording ambiguous or confusing?
  - Is anything not covered? Any suggestions for additional questions?
- Talk-out-loud method / focus groups
- Open-ended questions about the construct

Descriptive Statistics – Item Analysis
- Item difficulty
- Distribution of item responses
- Item discrimination
- Correlations between items
- Begin the process of eliminating or modifying item content
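As an illustration of this item-analysis step, here is a minimal Python sketch of item difficulty (proportion correct) and item discrimination (corrected item-total correlation). The response data and item names are simulated assumptions, not taken from the slides.

```python
# Minimal item-analysis sketch: difficulty and discrimination for dichotomous items.
import numpy as np
import pandas as pd

# Rows = examinees, columns = items scored 1 (correct) or 0 (incorrect); random for illustration.
rng = np.random.default_rng(0)
scores = pd.DataFrame(rng.integers(0, 2, size=(200, 5)),
                      columns=[f"item{i}" for i in range(1, 6)])

# Item difficulty: proportion of examinees answering each item correctly.
difficulty = scores.mean()

# Item discrimination: corrected item-total correlation
# (each item vs. the total score with that item removed).
total = scores.sum(axis=1)
discrimination = pd.Series(
    {col: scores[col].corr(total - scores[col]) for col in scores.columns}
)

# With real response data, items with very low difficulty, very high difficulty,
# or near-zero / negative discrimination are candidates for revision or removal.
print(difficulty)
print(discrimination)
```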

Reliability Studies
- Internal consistency (consistency)
- Split-half reliability (consistency)
- Test-retest reliability (stability)
- Inter-rater reliability (agreement)
- Further the process of eliminating or modifying items
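A minimal sketch of two of the listed estimates, Cronbach's alpha (internal consistency) and odd-even split-half reliability stepped up with the Spearman-Brown formula. The response data are simulated assumptions.

```python
# Minimal sketch of internal-consistency and split-half reliability estimates.
import numpy as np

rng = np.random.default_rng(0)
# Simulated dichotomous responses: 200 examinees x 10 items driven by a common ability.
ability = rng.normal(size=(200, 1))
items = (ability + rng.normal(size=(200, 10)) > 0).astype(float)

def cronbach_alpha(x):
    """Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / variance of total)."""
    k = x.shape[1]
    return k / (k - 1) * (1 - x.var(axis=0, ddof=1).sum() / x.sum(axis=1).var(ddof=1))

def split_half_spearman_brown(x):
    """Odd-even split-half correlation stepped up with the Spearman-Brown prophecy formula."""
    odd, even = x[:, 0::2].sum(axis=1), x[:, 1::2].sum(axis=1)
    r = np.corrcoef(odd, even)[0, 1]
    return 2 * r / (1 + r)

print(f"alpha = {cronbach_alpha(items):.2f}")
print(f"split-half (Spearman-Brown) = {split_half_spearman_brown(items):.2f}")
```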

Construct Validity
- What is the test really measuring?
- How many sub-scales should there be?
- Exploratory factor analysis
- Define the subscales that emerge
- Consider eliminating or adding items
- Higher-order factors
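A hedged sketch of the exploratory factor analysis step using scikit-learn's FactorAnalysis (the varimax rotation argument assumes scikit-learn 0.24 or later). The two-factor item structure is simulated purely for illustration.

```python
# Exploratory factor analysis sketch on simulated item responses.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
n = 300
# Simulate two latent traits, each driving three items plus noise.
trait = rng.normal(size=(n, 2))
loadings = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                     [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
items = trait @ loadings.T + rng.normal(scale=0.5, size=(n, 6))

fa = FactorAnalysis(n_components=2, rotation="varimax")
fa.fit(items)

# Estimated loadings (items x factors); items loading on unexpected factors,
# or on no factor, are candidates for modification or removal.
print(np.round(fa.components_.T, 2))
```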

Content Validity
- Does the test really cover the content domain?
- Expert review
- Comparison to theory
- Consider additional items
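One common way to quantify expert review is a content validity index (CVI). The slides do not name a specific statistic, so this is an illustrative sketch with invented expert ratings.

```python
# Item-level content validity index (I-CVI): proportion of experts rating each item relevant.
import pandas as pd

# Rows = expert reviewers, columns = items; relevance rated on a 1-4 scale (invented data).
ratings = pd.DataFrame(
    [[4, 3, 2, 4],
     [3, 4, 2, 4],
     [4, 4, 1, 3],
     [4, 3, 2, 4],
     [3, 4, 2, 4]],
    columns=["item1", "item2", "item3", "item4"],
)

# I-CVI: proportion of experts giving the item a rating of 3 or 4.
i_cvi = (ratings >= 3).mean()
print(i_cvi)           # items with low I-CVI (a commonly cited cutoff is about .78) need revision
print(i_cvi.mean())    # scale-level CVI: the average of the item-level indices
```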

Further Validation
- Criterion validity
  - Concurrent validity / MTMM (multitrait-multimethod) matrix
  - Predictive validity
- Known-group differences validity
- Confirmatory factor analysis
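A minimal sketch of two of these checks: a concurrent-validity correlation with an external criterion, and a known-groups comparison. All scores and group labels are simulated assumptions.

```python
# Concurrent validity and known-groups validity on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
test_scores = rng.normal(50, 10, size=120)
criterion = 0.6 * test_scores + rng.normal(0, 8, size=120)   # established external measure

# Concurrent validity: correlation between the new test and the criterion measure.
r, p = stats.pearsonr(test_scores, criterion)
print(f"concurrent r = {r:.2f} (p = {p:.3f})")

# Known-groups validity: do groups expected to differ actually differ on the test?
focal_group = test_scores[:40] + 8        # first 40 examinees shifted upward for illustration
comparison_group = test_scores[40:]
t, p = stats.ttest_ind(focal_group, comparison_group)
print(f"known-groups t = {t:.2f} (p = {p:.3f})")
```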

Cultural Sensitivity
- Expert review
- DTF – Differential Test Functioning
- DIF – Differential Item Functioning
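DIF can be screened several ways; the sketch below uses one common approach, a logistic regression that conditions on total score and then tests whether group membership still predicts the item response. The data, group coding, and effect size are simulated assumptions.

```python
# Logistic-regression DIF check for one dichotomous item (simulated data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500
group = rng.integers(0, 2, size=n)             # 0 = reference group, 1 = focal group
ability = rng.normal(size=n)
total = ability * 4 + 20 + rng.normal(size=n)  # proxy for the total test score

# Item response depends on ability and, here, also on group (uniform DIF built in).
p_correct = 1 / (1 + np.exp(-(ability - 0.5 * group)))
item = rng.binomial(1, p_correct)

# Predict the item response from total score and group membership.
X = sm.add_constant(np.column_stack([total, group]))
fit = sm.Logit(item, X).fit(disp=False)
print(fit.summary())   # a significant group coefficient, after conditioning on total score, suggests DIF
```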

Final Stages
- Create norm tables
- Create norm tables for sub-groups
- Prepare the technical manual:
  - Guidelines for administration
  - Scoring
  - Interpretation
  - Technical properties
- Compare the reliability, validity, and cultural sensitivity evidence and the technical manual to the Standards for Educational and Psychological Testing
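A minimal sketch of building a norm table from a norming sample, mapping each raw score to a percentile rank and a T-score (mean 50, SD 10). The norming data are simulated assumptions.

```python
# Build a simple norm table (raw score -> percentile rank and T-score) from a norming sample.
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(4)
norm_sample = rng.normal(25, 5, size=1000).round().astype(int)   # simulated raw scores

raw_scores = np.arange(norm_sample.min(), norm_sample.max() + 1)
percentile = [stats.percentileofscore(norm_sample, s, kind="weak") for s in raw_scores]
z = (raw_scores - norm_sample.mean()) / norm_sample.std(ddof=1)

norm_table = pd.DataFrame({
    "raw": raw_scores,
    "percentile_rank": np.round(percentile, 1),
    "T_score": np.round(50 + 10 * z, 1),
})
print(norm_table.head())   # in practice, separate tables would be built for each norm sub-group
```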