~ Test Construction and Validation ~


~ Test Construction and Validation ~
Fundamental Points and Practices
Stephen J. Vodanovich, Ph.D.

~ Identifying the Item Domain ~
[a.k.a. Where do the questions come from?]

The item domain feeds the test. Sources of the item domain include:
- A specific, defined content area (e.g., course exam, training program)
- Expert opinion and observation (e.g., the professional literature)
- Job analysis (identification of major job tasks and duties)

Job Analysis Overview
For the job (or job category) in question:
1. Task identification (Task 1, Task 2, Task 3, Task 4)
2. KSA identification (KSA 1, KSA 2, KSA 3, KSA 4)
3. Rate the tasks and KSAs
4. Connect KSAs to tasks

~ Sample Task Rating Form ~

Frequency of use:
5 = almost all of the time; 4 = frequently; 3 = occasionally; 2 = seldom; 1 = not performed at all

Importance of performing successfully:
5 = extremely important; 4 = very important; 3 = moderately important; 2 = slightly important; 1 = of no importance

Importance for new hire:
5 = extremely important

Distinguishes between superior & adequate performance:
5 = a great deal; 4 = considerably; 3 = moderately; 2 = slightly; 1 = not at all

Damage if error occurs:
5 = extreme damage; 4 = considerable damage; 3 = moderate damage; 2 = very little damage; 1 = virtually no damage

(Each of tasks 1-7 is rated on these scales.)

~ Sample KSA Rating Form ~

Importance for acceptable job performance:
5 = extremely important; 4 = very important; 3 = moderately important; 2 = slightly important; 1 = of no importance

Importance for new hire

Distinguishes between superior & adequate performance:
5 = a great deal; 4 = considerably; 3 = moderately; 2 = slightly; 1 = not at all

(Each of KSAs A-G is rated on these scales.)

Sample Task -- KSA Matrix
To what extent is each KSA needed when performing each job task?
5 = Extremely necessary; the job task cannot be performed without the KSA
4 = Very necessary; the KSA is very helpful when performing the job task
3 = Moderately necessary; the KSA is moderately helpful when performing the job task
2 = Slightly necessary; the KSA is slightly helpful when performing the job task
1 = Not necessary; the KSA is not used when performing the job task

(KSAs A-H are rated against job tasks 1-7.)
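In practice, the SME ratings collected on forms like these are averaged and screened against a preset cutoff to decide which tasks (or KSAs) the test must cover. A minimal sketch of that screening step, with hypothetical data (the task names, ratings, and the 3.5 cutoff are illustrative, not taken from the slides):

```python
# Average SME importance ratings per task and retain tasks that meet
# a preset cutoff (here 3.5 on the 5-point scales above).

# ratings[task] = importance ratings (1-5), one per SME (hypothetical data)
ratings = {
    "Task 1": [5, 4, 5, 4],
    "Task 2": [2, 3, 2, 2],
    "Task 3": [4, 4, 3, 5],
}

CUTOFF = 3.5  # illustrative threshold, set by the test developer

def critical_tasks(ratings, cutoff=CUTOFF):
    """Return tasks whose mean SME rating meets or exceeds the cutoff."""
    return [task for task, r in ratings.items()
            if sum(r) / len(r) >= cutoff]

print(critical_tasks(ratings))  # ['Task 1', 'Task 3']
```

The same screening logic applies to KSA ratings, and the task-KSA matrix ratings can be filtered the same way to decide which KSA-task links are strong enough to justify test items.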

~ Writing Test Items ~
- Write a lot of questions
- Write more questions for the most critical KSAs
- Consider the reading level of the test takers

~ Selecting Test Items ~
Initial review by Subject Matter Experts (SMEs):
- Connect items to KSAs
- Assess the difficulty of items relative to job requirements
- Suggest revisions to items and answers

Sample Item Rating Form
- Connect each item to a KSA (or two)
- Rate the difficulty of each item (5-point scale) relative to the level of the KSA needed on the job

~ Statistical Properties of Items ~

Item difficulty levels. The goal is to keep items of moderate difficulty (e.g., p-values between .40 and .60). An item's "p-value" is the proportion of test takers answering that item correctly.

[Figure: normal curve with the horizontal axis running from -4 to +4 standard deviations around the mean]

R E L I A B I L I T Y   A N A L Y S I S  -  S C A L E  (A L L)

Item    Mean     Std Dev   Cases
Q1      .7167    .4525     120.0
Q2      .7583    .4299     120.0
Q3      .8167    .3886     120.0
Q4      .9333    .2505     120.0
Q5      .9583    .2007     120.0
Q6      .9000    .3013     120.0
Q7      .6333    .4839     120.0
Q8      .8750    .3321     120.0
Q9      .8000    .4017     120.0
Q10     .6167    .4882     120.0
Q11     .9750    .1568     120.0
Q12     .8083    .3953     120.0
Q13     .7583    .4299     120.0
Q14     .5083    .5020     120.0

Answers are scored correct = "1" or wrong = "0," so each item's mean is its p-value (difficulty level, i.e., the proportion of people answering it correctly). The slide's callouts flag items with means near 1.0 (e.g., Q5, Q11) as easy items and items with moderate means (e.g., Q14) as acceptable items.
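With 0/1 scoring, the p-value computation behind the table's Mean column can be sketched as follows (the response data here are hypothetical, not the 120-case dataset above):

```python
# Item difficulty (p-value) = proportion of examinees answering each
# item correctly. With 0/1 scoring, this is simply the item mean.

responses = [  # hypothetical data: each row = one examinee, each column = one item
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
]

def p_values(responses):
    """Proportion of examinees answering each item correctly."""
    n = len(responses)
    return [sum(row[i] for row in responses) / n
            for i in range(len(responses[0]))]

print(p_values(responses))  # [0.8, 0.6, 0.4, 1.0]
```

By the slide's rule of thumb, the third item (p = .40) sits at the edge of the moderate range, while the last item (p = 1.0) is too easy to discriminate among examinees.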

~ Statistical Properties of Items (cont.) ~

Internal consistency: how well the items correlate with each other. The goal is to select items that relate moderately to one another, or "hang together" reasonably well (e.g., item x total-score correlations between .40 and .60; "alpha if item deleted" information).

~ Item-Total Statistics ~

        Scale mean       Scale variance    Corrected item-    Alpha if
        if item deleted  if item deleted   total correlation  item deleted
Q1      43.3750          67.0599           .2285              .8356
Q2      43.3333          67.7031           .1513              .8370
Q3      43.2750          66.5708           .3527              .8335
Q4      43.1583          67.7814           .2700              .8354
Q5      43.1333          68.6711           .0741              .8374
Q6      43.1917          68.8117           .0111              .8385
Q7      43.4583          65.8302           .3685              .8327
Q8      43.2167          67.0283           .3346              .8341
Q9      43.2917          65.9562           .4353              .8319
Q10     43.4750          67.4952           .1526              .8373
Q11     43.1167          68.8938           .0152              .8378
Q12     43.2833          67.9022           .1381              .8371
Q13     43.3333          65.9216           .4085              .8322
Q14     43.5833          65.2871           .4214              .8315

Alpha = .8374
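The two key columns above, corrected item-total correlation and "alpha if item deleted," can be computed from raw 0/1 data as sketched below (hypothetical data, not the 120-case dataset behind the table; sample variances are used throughout):

```python
from statistics import variance, mean

def cronbach_alpha(items):
    """items: list of item-score lists, one per item, same examinee order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per examinee
    item_var = sum(variance(it) for it in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

def corrected_item_total(items, i):
    """Correlation of item i with the total score EXCLUDING item i."""
    rest = [sum(s) - it_i for s, it_i in zip(zip(*items), items[i])]
    x, y = items[i], rest
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def alpha_if_deleted(items, i):
    """Cronbach's alpha recomputed without item i."""
    return cronbach_alpha(items[:i] + items[i + 1:])

# Hypothetical demo: three items, five examinees
items = [
    [1, 0, 1, 0, 1],  # item 1
    [1, 0, 1, 0, 1],  # item 2: identical to item 1
    [0, 1, 0, 1, 0],  # item 3: reversed pattern (a "bad" item)
]
print(round(cronbach_alpha(items), 2))       # -3.0 (bad item drags alpha down)
print(round(alpha_if_deleted(items, 2), 2))  # 1.0 once the bad item is dropped
```

This mirrors how the table is read in practice: an item with a low corrected item-total correlation and an "alpha if item deleted" above the scale's overall alpha (e.g., Q6 above) is a candidate for removal.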

~ Legal Concerns ~
Kirkland v. Department of Correctional Services (1974):

"Without such an analysis (job analysis) to single out the critical knowledge, skills and abilities required by the job, their relative importance to each other, and the level of proficiency demanded as to each attribute, a test constructor is aiming in the dark and can only hope to achieve job relatedness by blind luck."

Implications:
- The KSAs tested for must be critical to successful job performance
- Portions of the exam should be accurately weighted to reflect the relative importance to the job of the attributes for which they test
- The level of difficulty of the exam material should match the level of difficulty of the job

~ Construct Validation: The Multitrait-Multimethod Matrix ~
Three traits (A, B, C) are each measured by three methods: Method 1 (Paper & Pencil), Method 2 (Clinical Interview), and Method 3 (Peer Observation). Correlations between measures that share a method are mono-method; correlations between measures obtained by different methods are hetero-method.

Traits: A (Boredom), B (Dep), C (Anxiety). Reliability figures appear in parentheses on the diagonal.

                  Method 1 (Paper & Pencil)   Method 2 (Clinical Interview)   Method 3 (Peer Observation)
                  A      B      C             A      B      C                 A      B      C
M1  A (Boredom)  (.89)
    B (Dep)       .49   (.91)
    C (Anxiety)   .33    .36   (.87)
M2  A             .55    .20    .08          (.92)
    B             .20    .46    .12           .54   (.93)
    C             .15    .15    .53           .62    .55   (.82)
M3  A             .55    .20    .15           .61    .35    .41              (.90)
    B             .21    .46    .13           .40    .54    .37               .49   (.93)
    C             .15    .15    .53           .31    .32    .66               .54    .52   (.87)

Key: diagonal values (in parentheses) = reliability figures; mono-trait/hetero-method values (e.g., .55, .46, .53) = the validity diagonals; hetero-trait/mono-method values (e.g., .49, .33, .36); hetero-trait/hetero-method values (e.g., .20, .08, .15).
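The logic behind reading this matrix (Campbell and Fiske's criteria) can be checked programmatically. The sketch below uses the lower-triangular correlations as transcribed from the slide and pulls out the mono-trait/hetero-method "validity diagonal" values, which should be high (convergent validity) and should exceed the hetero-trait values (discriminant validity):

```python
# Classify each cell of the MTMM matrix by trait/method overlap and
# extract the validity-diagonal (mono-trait, hetero-method) values.

labels = ["A1", "B1", "C1", "A2", "B2", "C2", "A3", "B3", "C3"]  # trait + method

rows = [  # lower-triangular correlations as transcribed from the slide
    [.89],
    [.49, .91],
    [.33, .36, .87],
    [.55, .20, .08, .92],
    [.20, .46, .12, .54, .93],
    [.15, .15, .53, .62, .55, .82],
    [.55, .20, .15, .61, .35, .41, .90],
    [.21, .46, .13, .40, .54, .37, .49, .93],
    [.15, .15, .53, .31, .32, .66, .54, .52, .87],
]

def classify(i, j):
    """Classify cell (i, j) of the matrix by trait/method overlap."""
    same_trait = labels[i][0] == labels[j][0]
    same_method = labels[i][1] == labels[j][1]
    if i == j:
        return "reliability"
    if same_trait and not same_method:
        return "mono-trait hetero-method"  # validity diagonal
    if same_method:
        return "hetero-trait mono-method"
    return "hetero-trait hetero-method"

validity = [rows[i][j] for i in range(9) for j in range(i)
            if classify(i, j) == "mono-trait hetero-method"]
print(sorted(validity))  # nine values, ranging from .46 to .66
```

In this matrix the validity-diagonal values (.46 to .66) exceed every hetero-trait/hetero-method value, which is the pattern of convergent and discriminant validity the MTMM approach looks for.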