Modeling Item Difficulty for Perceptual Speed and Accuracy Tests
Stephanie Taylor, MA; Alan Mead, PhD



Cognitive Ability Tests
Predictor of performance
– Measure job-related ability
Speeded component

Speeded vs. Power Tests
Speeded tests
– Items: trivially easy
– Score: number of incomplete items
– Lower cost
Power tests
– Items: more involved/time consuming
– Score: number of incorrect answers
– More time consuming (costly)

Speeded Tests Cont’d
Psychometric issues
– Test takers run out of time
– Guessing
– Serial position influences difficulty

Present study
Perceptual Speed and Accuracy (PSA) tests
Model item difficulty

Hypotheses
H1: Shorter stimuli will be easier than longer stimuli.
H2: (a) Stimuli with differences in the beginning position will be easiest. (b) Stimuli with differences in the last position will be more difficult. (c) Stimuli with differences in the middle position will be most difficult.
H3: Items with substituted characters will be hardest when the differing pair of characters has a high degree of visual similarity and easiest when the differing pair has low visual similarity. We are currently using the similarity ratings collected by Boles & Clifford (1989).

Research Questions
RQ1: How well can we predict the difficulty of PSA items?
RQ2: What are the relative weights of the three factors?
RQ3: Are there interactions among the three factors in predicting item difficulty?
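One way to make the three research questions concrete is a linear model of item difficulty. The slides do not state an analysis, so this is only a sketch under assumptions: the symbols b_i (difficulty of item i), L_i (stimulus length), P_i (position of the difference), and S_i (visual similarity) are introduced here for illustration, and each factor is treated as a single numeric predictor for simplicity, although factors with three or four levels could instead be dummy coded.

```latex
b_i = \beta_0 + \beta_L L_i + \beta_P P_i + \beta_S S_i
      + \beta_{LP} L_i P_i + \beta_{LS} L_i S_i + \beta_{PS} P_i S_i
      + \beta_{LPS} L_i P_i S_i + \varepsilon_i
```

Under this formulation, RQ1 corresponds to how much variance in b_i the model explains, RQ2 to the relative sizes of the main-effect coefficients, and RQ3 to whether the product terms contribute beyond the main effects.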

Method
Our design uses a repeated-measures approach. Each participant completes one item for each of the 48 combinations of our three factors (4 stimulus sizes × 4 positions × 3 levels of visual similarity), plus 48 comparable items whose pairs are identical (to balance the key), for a total test length of 96 item-pairs. Each item is individually timed to prevent the serial position effect. Our target sample size is 200 participants, recruited from our department subject pool. This research is scheduled to be completed during the Fall 2013 semester.
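The item counts in the design can be checked with a short enumeration. This is a minimal sketch, not the authors' materials: the level labels and the function name `build_item_plan` are hypothetical (the slides give only the number of levels), and matching the identical-pair items to the different-pair items on stimulus size alone is an assumption.

```python
from itertools import product

# Hypothetical level labels; the slides specify only how many levels each factor has.
STIMULUS_SIZES = ["size_1", "size_2", "size_3", "size_4"]
POSITIONS = ["pos_1", "pos_2", "pos_3", "pos_4"]
SIMILARITY_LEVELS = ["low", "medium", "high"]


def build_item_plan():
    """Return one record per item-pair in the 96-item test."""
    # One "different" item-pair for every size x position x similarity cell.
    different = [
        {"size": size, "position": pos, "similarity": sim, "key": "different"}
        for size, pos, sim in product(STIMULUS_SIZES, POSITIONS, SIMILARITY_LEVELS)
    ]
    # One "same" item-pair per "different" item to balance the key.
    # Assumption: identical pairs are matched on stimulus size only, because
    # position and similarity are undefined when the two stimuli are identical.
    same = [
        {"size": item["size"], "position": None, "similarity": None, "key": "same"}
        for item in different
    ]
    return different + same


plan = build_item_plan()
n_different = sum(1 for item in plan if item["key"] == "different")
n_same = sum(1 for item in plan if item["key"] == "same")
print(n_different, n_same, len(plan))
```

Keeping the plan as a list of plain records makes it easy to shuffle presentation order per participant or to export the design for later analysis.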

Questions for you
– How can we analyze the data?
– How do we address the issue of guessing?
– What other factors might predict difficulty?

Your Questions?

Thank you!