Item Analysis.

Similar presentations
Assessing Student Performance

Alternate Choice Test Items
Assessment in Early Childhood Education Fifth Edition Sue C. Wortham
Test Development.
How to Make a Test & Judge its Quality. Aim of the Talk Acquaint teachers with the characteristics of a good and objective test See Item Analysis techniques.
FACULTY DEVELOPMENT PROFESSIONAL SERIES OFFICE OF MEDICAL EDUCATION TULANE UNIVERSITY SCHOOL OF MEDICINE Using Statistics to Evaluate Multiple Choice.
Measurement Concepts Operational Definition: is the definition of a variable in terms of the actual procedures used by the researcher to measure and/or.
Psychology Practical (Year 2) PS2001 Correlation and other topics.
Inferential Statistics
Chapter 4 – Reliability Observed Scores and True Scores Error
Reliability for Teachers Kansas State Department of Education ASSESSMENT LITERACY PROJECT1 Reliability = Consistency.
Using Test Item Analysis to Improve Students’ Assessment
Measurement Reliability and Validity
Item Analysis: A Crash Course Lou Ann Cooper, PhD Master Educator Fellowship Program January 10, 2008.
Test Construction Processes 1- Determining the function and the form 2- Planning( Content: table of specification) 3- Preparing( Knowledge and experience)
Item Analysis What makes a question good??? Answer options?
Lesson Seven Item Analysis. Contents: Item Analysis; Item difficulty (item facility).
Item Analysis Prof. Trevor Gibbs. Item Analysis After you have set your assessment: How can you be sure that the test items are appropriate?—Not too easy.
Lesson Nine Item Analysis.
Multiple Choice Test Item Analysis Facilitator: Sophia Scott.
Personality, 9e Jerry M. Burger
ANALYZING AND USING TEST ITEM DATA
Measurement Concepts & Interpretation. Scores on tests can be interpreted: By comparing a client to a peer in the norm group to determine how different.
Measurement and Data Quality
1 Item Analysis - Outline 1. Types of test items A. Selected response items B. Constructed response items 2. Parts of test items 3. Guidelines for writing.
Chapter 8 Measuring Cognitive Knowledge. Cognitive Domain Intellectual abilities ranging from rote memory tasks to the synthesis and evaluation of complex.
Copyright © 2012 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 14 Measurement and Data Quality.
LECTURE 06B BEGINS HERE THIS IS WHERE MATERIAL FOR EXAM 3 BEGINS.
Technical Adequacy Session One Part Three.
Induction to assessing student learning Mr. Howard Sou Session 2 August 2014 Federation for Self-financing Tertiary Education 1.
Field Test Analysis Report: SAS Macro and Item/Distractor/DIF Analyses
CHAPTER 6, INDEXES, SCALES, AND TYPOLOGIES
Chapter 7 Item Analysis In constructing a new test (or shortening or lengthening an existing one), the final set of items is usually identified through.
Techniques to improve test items and instruction
Group 2: 1. Miss. Duong Sochivy 2. Miss. Im Samphy 3. Miss. Lay Sreyleap 4. Miss. Seng Puthy 1 ROYAL UNIVERSITY OF PHNOM PENH INSTITUTE OF FOREIGN LANGUAGES.
1 Chapter 4 – Reliability 1. Observed Scores and True Scores 2. Error 3. How We Deal with Sources of Error: A. Domain sampling – test items B. Time sampling.
Lab 5: Item Analyses. Quick Notes Load the files for Lab 5 from course website –
Grading and Analysis Report For Clinical Portfolio 1.
RELIABILITY AND VALIDITY OF ASSESSMENT
Copyright © 2008 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 17 Assessing Measurement Quality in Quantitative Studies.
Measurement Theory in Marketing Research. Measurement What is measurement?  Assignment of numerals to objects to represent quantities of attributes Don’t.
Introduction to Item Analysis Objectives: To begin to understand how to identify items that should be improved or eliminated.
©2011 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Chapter 7 Measuring of data Reliability of measuring instruments The reliability* of instrument is the consistency with which it measures the target attribute.
Tests and Measurements
FIT ANALYSIS IN RASCH MODEL University of Ostrava Czech republic 26-31, March, 2012.
Chapter 6 - Standardized Measurement and Assessment
TEST SCORES INTERPRETATION - is a process of assigning meaning and usefulness to the scores obtained from classroom test. - This is necessary because.
Dan Thompson Oklahoma State University Center for Health Science Evaluating Assessments: Utilizing ExamSoft’s item-analysis to better understand student.
Psychometrics: Exam Analysis David Hope
Copyright © 2014 Wolters Kluwer Health | Lippincott Williams & Wilkins Chapter 11 Measurement and Data Quality.
Dept. of Community Medicine, PDU Government Medical College,
Norm Referenced Your score can be compared with others 75 th Percentile Normed.
Copyright © Springer Publishing Company, LLC. All Rights Reserved. DEVELOPING AND USING TESTS – Chapter 11 –
Items analysis Introduction Items can adopt different formats and assess cognitive variables (skills, performance, etc.) where there are right and.
Professor Jim Tognolini
ARDHIAN SUSENO CHOIRUL RISA PRADANA P.
Data Analysis and Standard Setting
Classroom Analytics.
Reliability & Validity
Test Development Test conceptualization Test construction Test tryout
Using statistics to evaluate your test Gerard Seinhorst
Classroom Assessment A Practical Guide for Educators by Craig A. Mertler Chapter 8 Objective Test Items.
Lies, Damned Lies & Statistical Analysis for Language Testing
Test construction 2.
Analyzing test data using Excel Gerard Seinhorst
Tests are given for 4 primary reasons.
Presentation transcript:

Item Analysis

Purpose of Item Analysis
Evaluates the quality of each item. Rationale: the quality of the items determines the quality of the test (i.e., its reliability and validity). Item analysis may suggest ways of improving the measurement of a test and can help explain why certain tests predict some criteria but not others.

Item Analysis
When analyzing the test items, we have several questions about the performance of each item. Some of these questions include:
- Are the items congruent with the test objectives?
- Are the items valid? Do they measure what they're supposed to measure?
- Are the items reliable? Do they measure consistently?
- How long does it take an examinee to complete each item?
- Which items are most difficult to answer correctly? Which items are easy?
- Are there any poorly performing items that need to be discarded?

Types of Item Analyses for CTT
Three major types:
1. Assess quality of the distractors
2. Assess difficulty of the items
3. Assess how well an item differentiates between high and low performers

DISTRACTOR ANALYSIS
A. Multiple-Choke
B. Multiply-Choice
C. Multiple-Choice
D. Multi-Choice
(Correct answer: C. Multiple-Choice)

Distractor Analysis
The first question of item analysis: how many people choose each response? If there is only one best response, then all other response options are distractors.
Example from an in-class assignment (N = 35): Which method has the best internal consistency? (number choosing each option)
a) projective test: 1
b) peer ratings: 1
c) forced choice: 21
d) differences n.s.: 12
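A minimal sketch of this tally step, assuming examinees' chosen options are stored as a list of letters (the data and variable names below are hypothetical, not taken from the slides):

```python
from collections import Counter

# Hypothetical chosen options for one item from 35 examinees
# (counts mirror the in-class example: 1 a, 1 b, 21 c, 12 d)
responses = ["a"] + ["b"] + ["c"] * 21 + ["d"] * 12

counts = Counter(responses)
for option in "abcd":
    print(option, counts.get(option, 0))
```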

Distractor Analysis (cont'd)
A perfect test item would have 2 characteristics:
1. Everyone who knows the item gets it right.
2. People who do not know the item have their responses equally distributed across the wrong answers.
It is not desirable to have one of the distractors chosen more often than the correct answer. This result indicates a potential problem with the question: the distractor may be too similar to the correct answer, and/or there may be something in either the stem or the alternatives that is misleading.

Distractor Analysis (cont'd)
Calculate the number of people expected to choose each of the distractors. If examinees who answer incorrectly guess at random, the same number is expected for each wrong response (Figure 10-1):
Expected number choosing each distractor = (N answering incorrectly) / (number of distractors) = 14 / 3 ≈ 4.7

Distractor Analysis (cont'd)
When the number of persons choosing a distractor significantly exceeds the number expected, there are 2 possibilities:
1. The choice may reflect partial knowledge.
2. The item is a poorly worded trick question.
An unpopular distractor may lower item and test difficulty because it is easily eliminated; an extremely popular distractor is likely to lower the reliability and validity of the test.
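A minimal sketch of this check, assuming the observed count for each distractor is already known; the 2x threshold used to flag an over-chosen distractor is my own rule of thumb, not a rule from the slides:

```python
def expected_per_distractor(n_incorrect: int, n_distractors: int) -> float:
    """Expected count per distractor if incorrect examinees guessed at random."""
    return n_incorrect / n_distractors

# Values from the slide: 14 incorrect answers spread over 3 distractors
expected = expected_per_distractor(14, 3)   # about 4.7

# Hypothetical observed counts for the three distractors
observed = {"a": 1, "b": 1, "d": 12}
for option, count in observed.items():
    if count > 2 * expected:                # informal flag, not a significance test
        print(f"Distractor {option}: {count} choices vs. {expected:.1f} expected -- inspect this item")
```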

Item Difficulty Analysis
Description and how to compute. Examples:
a) (6 × 3) + 4 = ?
b) 9[ln(-3.68) × (1 - ln(+3.68))] = ?
It is often difficult to explain or define difficulty in terms of some intrinsic characteristic of the item. The only common thread among difficult items is that individuals did not know the answer.

Item Difficulty
p = the percentage of test takers who respond correctly.
What if p = .00? What if p = 1.00?
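A minimal sketch of the computation, assuming item responses have already been scored 1 (correct) or 0 (incorrect); the data below are made up for illustration:

```python
def item_difficulty(item_scores):
    """p-value: proportion of examinees who answered the item correctly."""
    return sum(item_scores) / len(item_scores)

# Hypothetical scored responses (1 = correct, 0 = incorrect) for one item
print(item_difficulty([1, 0, 1, 1, 0, 1, 1, 1, 0, 1]))  # 0.7
```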

Item Difficulty
An item with a p value of .0 or 1.0 does not contribute to measuring individual differences and thus is certain to be useless. When comparing two test scores, we are interested in who had the higher score or in the differences between scores. Items with p values near .5 have the most variation, so we seek items in this range and remove those with extreme values. The p value can also be examined to determine the proportion answering in a particular way for items that do not have a "correct" answer.

Item Difficulty (cont.)
What is the best p-value? The optimal p-value is .50, which allows maximum discrimination between good and poor performers. Should we only choose items with p = .50? When shouldn't we?

Should we only choose items with p = .50?
Not necessarily...
When screening for the very top group of applicants (e.g., admission to university or medical school), cutoffs may be much higher.
When an institution only wants to verify a minimum level (e.g., a minimum reading level), cutoffs may be much lower.

Item Difficulty (cont.)
Interpreting the p-value. Example: 100 people take a test and 15 get question 1 right. What is the p-value? Is this an easy or a hard item?

Item Difficulty (cont.)
Interpreting the p-value. Example: 100 people take a test and 70 get question 1 right. What is the p-value? Is this an easy or a hard item?

Item Difficulty (cont'd)
General rules of item difficulty:
- p low (< .20): difficult item
- p moderate (.20 to .80): moderately difficult item
- p high (> .80): easy item
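A minimal sketch that applies these rules of thumb; how the boundary values .20 and .80 are handled is my own choice, since the slide does not say:

```python
def classify_difficulty(p: float) -> str:
    """Label an item's difficulty using the rules of thumb above."""
    if p < 0.20:
        return "difficult"
    if p <= 0.80:
        return "moderately difficult"
    return "easy"

for p in (0.15, 0.50, 0.70, 0.90):
    print(p, classify_difficulty(p))
```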

ITEM DISCRIMINATION
The extent to which an item differentiates people on the behavior that the test is designed to assess; computed as the difference between the percentage of high achievers and the percentage of low achievers who got the item right.

Item Discrimination (cont.)
Compares the performance of the upper group (high test scores) and the lower group (low test scores) on each item, using the percentage of test takers in each group who answered correctly.

Item Discrimination (cont'd): Discrimination Index (D)
Divide the sample into TOP half and BOTTOM half (or TOP and BOTTOM third).
Compute the Discrimination Index (D).

Item Discrimination
D = U - L
U = (# in the upper group with a correct response) / (total # in the upper group)
L = (# in the lower group with a correct response) / (total # in the lower group)
The higher the value of D, the more adequately the item discriminates (the highest value is 1.0).
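A minimal sketch of the D index, assuming the upper and lower groups have already been formed and their correct counts tallied (the example numbers are hypothetical):

```python
def discrimination_index(upper_correct: int, n_upper: int,
                         lower_correct: int, n_lower: int) -> float:
    """D = U - L: difference in proportion correct between upper and lower groups."""
    return upper_correct / n_upper - lower_correct / n_lower

# Hypothetical item: 18 of 20 high scorers correct, 7 of 20 low scorers correct
print(discrimination_index(18, 20, 7, 20))  # 0.55
```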

Item Discrimination
Seek items with high positive values of D (those who do well on the test tend to get the item correct). Items with negative values (lower scorers on the test are more likely to get the item correct) or low positive values (about the same proportion of low and high scorers get the item correct) do not discriminate well and are discarded.

Item Discrimination (cont'd): Item-Total Correlation
The correlation between each item (a correct response usually receives a score of 1 and an incorrect response a score of 0) and the total test score. To what degree do the item and the test measure the same thing?
- Positive: the item discriminates between high and low scorers
- Near 0: the item does not discriminate between high and low scorers
- Negative: scores on the item and scores on the test disagree
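A minimal sketch of the item-total correlation, assuming a small matrix of 0/1 item scores; the data are invented, and the total here includes the item itself (some texts correlate the item with the total minus that item instead):

```python
import numpy as np

# Hypothetical 0/1 score matrix: rows = examinees, columns = items
scores = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
])

total = scores.sum(axis=1)
for j in range(scores.shape[1]):
    r = np.corrcoef(scores[:, j], total)[0, 1]
    print(f"item {j + 1}: item-total r = {r:.2f}")
```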

Item Discrimination (cont'd): Item-Total Correlation
Item-total correlations are directly related to reliability. Why? Because the more each item correlates with the test as a whole, the more highly all items correlate with each other (= higher coefficient alpha, i.e., internal consistency).
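To make the link to internal consistency concrete, here is a minimal sketch of coefficient alpha computed from the same kind of 0/1 score matrix (the formula is the standard one; the data are invented):

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1)
    total_variance = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

# Hypothetical 0/1 score matrix: rows = examinees, columns = items
scores = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
])
print(round(cronbach_alpha(scores), 2))
```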

Quantitative Item Analysis
The inter-item correlation matrix displays the correlation of each item with every other item. It provides important information for increasing the test's internal consistency: each item should be highly correlated with every other item measuring the same construct and not correlated with items measuring a different construct.
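A minimal sketch of the inter-item correlation matrix for the same kind of hypothetical score matrix; np.corrcoef treats rows as variables, so the matrix is transposed first:

```python
import numpy as np

# Hypothetical 0/1 score matrix: rows = examinees, columns = items
scores = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
])

# Correlation of each item with every other item
inter_item = np.corrcoef(scores.T)
print(np.round(inter_item, 2))
```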

Quantitative Item Analysis
Items that are not highly correlated with the other items measuring the same construct can and should be dropped to increase internal consistency.

Item Discrimination (cont'd): Inter-item Correlation
Possible causes of a low inter-item correlation:
a. The item is badly written (revise it)
b. The item measures a different attribute than the rest of the test (discard it)
c. The item correlates with some items but not with others: the test measures two distinct attributes (create subtests or subscales)