Kaizen–What Can I Do To Improve My Program? F. Jay Breyer, Ph.D. Presented at the 2005 CLEAR Annual Conference, September 15-17, Phoenix, Arizona

Presentation transcript:

Kaizen–What Can I Do To Improve My Program? F. Jay Breyer, Ph.D. Presented at the 2005 CLEAR Annual Conference, September 15-17, Phoenix, Arizona

2 Test Development Process (Where we have been)
 Content: found to be important for the job, as determined by a job analysis
 Sampling of content: How many items are needed in the test form to assess minimal competency?
 Importance of content domains: What is the emphasis on specific content domains?
 Based on the identified test specifications, select items that match the content domains
 Evaluate the total item bank
 Pretest new items
 Evaluate statistical parameters: verify appropriate performance of items
 Review and edit items to ensure correct grammatical structure and adherence to fairness and sensitivity guidelines
 Equate test forms following the standard setting to ensure comparability of test scores across different test forms
 Prepare test forms for administration: paper-and-pencil delivery or computer delivery
 Outcome: a valid and reliable test that is sound and defensible
 But wait!!! We can do something else … how can we change what we do to improve the testing program?
[Process diagram: Validity, Reliability & Defensibility; Content; Test Specifications; Item Type; Item Development; Item Writing; Statistical Analysis; Form Assembly; Edit & Fairness Review; Statistical Parameters; Test Modality]
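
The equating step listed above can be sketched as follows, assuming a simple mean-sigma linear equating under a random-groups design; the simulated score vectors and the helper name linear_equate are illustrative only, and an operational program would normally use a common-item design and dedicated equating software.

```python
import numpy as np

def linear_equate(x_scores, y_scores):
    """Mean-sigma linear equating: place scores from new form X onto the
    scale of old form Y so the two distributions share the same mean and SD."""
    mu_x, sd_x = np.mean(x_scores), np.std(x_scores, ddof=1)
    mu_y, sd_y = np.mean(y_scores), np.std(y_scores, ddof=1)
    slope = sd_y / sd_x
    intercept = mu_y - slope * mu_x
    return lambda x: slope * np.asarray(x) + intercept

# Illustrative random-groups data: 500 candidates per form.
rng = np.random.default_rng(0)
equate = linear_equate(rng.normal(60, 8, 500), rng.normal(63, 7, 500))
print(equate(62))   # a raw 62 on the new form expressed on the old form's scale
```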

3 After the Examination is Over…
It seems we would never get to this point, but here we are, and before the next test is created …
 What can we learn from this administration?
 What should we do to find out about the examination we just gave and reported?
Activities
 What is the size and quality of my item bank?
 Do I have sufficient numbers of items in each content area for the next examination form?
 Can I assemble the next form to content and statistical specifications?
 How do I find out what my statistical specifications are?
 What is the reliability of my test?
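
The bank-inventory questions above can be answered with a simple tally. The sketch below assumes each bank record carries a content domain and a pretest flag; the records, domain names, and blueprint counts are invented for illustration.

```python
from collections import Counter

# Hypothetical bank records; a real bank would hold hundreds of items.
bank = [
    {"id": "A101", "domain": "Pharmacology", "pretested": True},
    {"id": "A102", "domain": "Pharmacology", "pretested": False},
    {"id": "B201", "domain": "Law & Ethics", "pretested": True},
    {"id": "B202", "domain": "Law & Ethics", "pretested": True},
]

# Blueprint: items required per content domain on the next form (made-up counts).
blueprint = {"Pharmacology": 2, "Law & Ethics": 1}

available = Counter(item["domain"] for item in bank if item["pretested"])
for domain, needed in blueprint.items():
    have = available[domain]
    status = "OK" if have >= needed else f"SHORT by {needed - have}"
    print(f"{domain}: need {needed}, have {have} -> {status}")
```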

4 Challenges
 Determining appropriate psychometric approaches to item and test development
 What do you do if your test is
 Too long for the time allotted?
 Too hard or too easy for the population tested and the purpose?
 Not sufficiently reliable for the test’s purpose?
Approaches
 Item analysis of the test before scores are reported helps ensure validity
 Correct keys are used to grant points
 Items function as intended
 But test analyses after the test is reported can be useful for
 Construction of new test forms
 Evaluation of item creation techniques
 Changes that improve the testing program
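
One way to carry out the pre-reporting item analysis described above is the classical sketch below: proportion-correct difficulty (p-value) and point-biserial discrimination against the rest score for each item. The response matrix, key, and the function name item_analysis are illustrative; items with very low p-values or negative discrimination are the usual triggers for a key check.

```python
import numpy as np

def item_analysis(responses, key):
    """Classical item analysis for multiple-choice items.
    responses: candidates x items matrix of selected options (e.g., 'A'-'D');
    key: the keyed answer for each item.
    Returns (item index, p-value, point-biserial vs. rest score) per item."""
    scored = (np.asarray(responses) == np.asarray(key)).astype(float)
    total = scored.sum(axis=1)
    stats = []
    for j in range(scored.shape[1]):
        p = scored[:, j].mean()                       # difficulty
        rest = total - scored[:, j]                   # rest score avoids self-inflation
        r_pb = np.corrcoef(scored[:, j], rest)[0, 1]  # discrimination
        stats.append((j, round(p, 2), round(r_pb, 2)))
    return stats

# Tiny illustrative data set: 4 candidates, 3 items.
resp = [["A", "B", "C"], ["A", "C", "C"], ["B", "B", "C"], ["A", "B", "D"]]
print(item_analysis(resp, key=["A", "B", "C"]))
```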

5 Test Analyses
Help ensure quality for testing programs that wish to verify that appropriate test development and psychometric procedures are being used. These analyses help to verify that the program’s test development activities are psychometrically sound and provide directions for possible continuous improvement.
Analyses should
 Assure the public of meeting basic standards of Quality & Fairness and of Reliability
 Answer the question “How are my test development activities doing?”
Analyses should not
 Limit innovation or have a punitive function
 Be ignored

6 Item Analyses at Different Times
PIA – Preliminary Item Analysis
EIA – Early Item Analysis: IA after PINS but before the equating or cut score study
FIA – Final Item Analysis

7 PIA: Only Bad Items

8 PIA: Hard Item

9 PIA: Key Issue

10 FIA: Everything C 89.0

11 Post Test Administration Inquiry: A FAIR TEST
Item/Task Information
 Quality of items/tasks from the past test
 Difficulty
 Discrimination
 DIF
Total Score Information
 Reliability
 Score Distributions
 Descriptive Information
 Speededness
Subscore Information
 Reliability of reported subscores
 Score Distributions
 Descriptive Information
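
For the Total Score Information column above, a minimal sketch of the descriptive and speededness summaries; the score vector and the reached-last-item flags are invented, and the proportion reaching the final item is only one rough indicator of speededness.

```python
import numpy as np

# Illustrative total scores and whether each candidate answered the final item.
scores = np.array([62, 70, 55, 81, 66, 74, 59, 68, 77, 63])
reached_last_item = np.array([1, 1, 1, 1, 0, 1, 1, 1, 1, 1])

print("N:", scores.size)
print("Mean:", scores.mean(), " SD:", scores.std(ddof=1))
print("Min/Max:", scores.min(), scores.max())
print("Quartiles:", np.percentile(scores, [25, 50, 75]))

# Rough speededness screen: a form is usually treated as essentially unspeeded
# when nearly all candidates respond to the final items.
print("Proportion reaching the last item:", reached_last_item.mean())
```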

12 Score Information: Reliability and Validity
Reliability – Consistency & Accuracy
Validity – Score inferences, score meaning, score interpretations: what we can say about people

13 Score Information: Reliability
Reliability – Consistency and Accuracy
Credential Testing
– Refers to consistency of test scores across different test forms, given the content sampling: Alpha, Kuder-Richardson (K-R 20)
– Refers to consistency of passing and failing the same people as if they were able to take the test twice: Subkoviak, PF Consistency, RELCLASS
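
A minimal sketch of the internal-consistency index named above (KR-20 for dichotomously scored items; coefficient alpha is the generalization to other item types). The tiny 0/1 response matrix is invented. Decision (pass/fail) consistency estimates such as Subkoviak's are separate calculations that also require the cut score.

```python
import numpy as np

def kr20(scored):
    """KR-20 for a candidates x items matrix of 0/1 scores.
    (Coefficient alpha is the same formula with item variances in place of p*q.)"""
    scored = np.asarray(scored, dtype=float)
    k = scored.shape[1]
    p = scored.mean(axis=0)
    sum_pq = (p * (1 - p)).sum()
    total_var = scored.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - sum_pq / total_var)

# Tiny invented data set: 6 candidates, 5 items.
X = [[1, 1, 1, 1, 1],
     [1, 1, 1, 1, 0],
     [1, 1, 1, 0, 0],
     [1, 1, 0, 0, 0],
     [1, 0, 0, 0, 0],
     [0, 0, 0, 0, 0]]
print(round(kr20(X), 3))   # about 0.90 for this example
```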

14 Score Information: Reliability
Measurement Error – Refers to random fluctuations in a person’s score due to factors not related to the content of the test: SEM, CSEM
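
The two error indices above, sketched with illustrative numbers: the overall SEM computed from the score standard deviation and a reliability estimate, and a conditional SEM at a given raw score using Lord's binomial-error model, which is one of several possible CSEM approaches.

```python
import numpy as np

def sem(sd_total, reliability):
    """Overall standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd_total * np.sqrt(1 - reliability)

def csem_binomial(raw_score, n_items):
    """Conditional SEM at a raw score under Lord's binomial-error model."""
    return np.sqrt(raw_score * (n_items - raw_score) / (n_items - 1))

print(sem(sd_total=8.0, reliability=0.90))        # about 2.5 raw-score points
print(csem_binomial(raw_score=75, n_items=100))   # about 4.4; largest near mid-range scores
```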

15 Test Analyses: Score Information

16 Test Analyses: Score Information
Correlations can add to the understanding of score reliability
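
One illustration of how correlations inform this kind of review, using invented subscores and assumed subscore reliabilities: the observed correlation between two subscores is corrected for attenuation, and a disattenuated value near 1.0 suggests the subscores carry little distinct information worth reporting separately.

```python
import numpy as np

# Invented subscores for the same eight candidates.
sub_a = np.array([12, 15, 9, 14, 11, 13, 10, 16])
sub_b = np.array([18, 22, 17, 20, 21, 19, 16, 23])

r_obs = np.corrcoef(sub_a, sub_b)[0, 1]

# Correction for attenuation, with assumed subscore reliabilities: an estimate
# of how strongly the subscores would correlate without measurement error.
rel_a, rel_b = 0.85, 0.90
r_true = r_obs / np.sqrt(rel_a * rel_b)

print(round(r_obs, 2), round(r_true, 2))   # about 0.83 observed, 0.95 disattenuated
```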

17 Item Information: DIF & Sensitivity
Sensitivity
– How questions appear
– Review by a TD person removes words and phrases from a test that may be insulting, defamatory, or charged
Differential Item Functioning (DIF)
– How questions behave
– Searches for items with construct-irrelevant variance
– Tests differences in item difficulty for k groups when matched on proficiency
– Mantel-Haenszel

18 DIF
Impact is not DIF
– The assessment of group differences in test performance between unmatched focal and reference group members
– Confounding of item performance differences between focal and reference groups

19 DIF
How DIF is calculated
– The criterion is the total test score or construct
– The question DIF answers is: Is the meaning the same for the focal group as it is for the reference group?
– If the interpretation of the scores (the meaning) is different for subgroups, then DIF is present
DIF has to do with improving validity
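
A bare-bones sketch of the Mantel-Haenszel procedure named on the previous slide, matching candidates on total test score. The function name is illustrative; there is no continuity correction or significance test, and sparse score strata would need the pooling and safeguards that operational DIF software provides. As a rough guide, ETS practice flags items for review when |MH D-DIF| exceeds about 1.5 and the statistic is significant.

```python
import numpy as np

def mantel_haenszel_dif(item_correct, group, total_score):
    """Mantel-Haenszel DIF for one dichotomous item.
    item_correct: 0/1 per candidate; group: 'ref' or 'focal';
    total_score: matching criterion (here, the total test score).
    Returns the MH common odds ratio and MH D-DIF on the ETS delta scale."""
    item_correct = np.asarray(item_correct)
    group = np.asarray(group)
    total_score = np.asarray(total_score)

    num, den = 0.0, 0.0
    for s in np.unique(total_score):          # one 2x2 table per matched score level
        at = total_score == s
        a = np.sum(at & (group == "ref") & (item_correct == 1))
        b = np.sum(at & (group == "ref") & (item_correct == 0))
        c = np.sum(at & (group == "focal") & (item_correct == 1))
        d = np.sum(at & (group == "focal") & (item_correct == 0))
        t = a + b + c + d
        num += a * d / t
        den += b * c / t

    alpha_mh = num / den                      # >1: item favors the reference group
    mh_d_dif = -2.35 * np.log(alpha_mh)       # ETS delta metric; 0 means no DIF
    return alpha_mh, mh_d_dif
```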

20 In Summary
Statistical information following test administration can provide
– Item information: difficulty and suitability of the items/tasks for your candidate samples; DIF – potential sources of bias (invalidity)
– Decision Score Information: distributions, descriptive statistics, reliability information
– Subscore Information: reliability information, intercorrelations
These analyses help highlight areas for continuous improvement – Kaizen