Using IRT and Many-Facet Rasch Analysis for Test Improvement. “ALIGNING TRAINING AND TESTING IN SUPPORT OF INTEROPERABILITY”. Desislava Dimitrova, Dimitar Atanasov.

Similar presentations
Quality Control in Evaluation and Assessment
The Test of English for International Communication (TOEIC): necessity, proficiency levels, test score utilization and accuracy. Author: Paul Moritoshi.
Nükte Durhan METU, Northern Cyprus Campus, School of Foreign Languages (Ankara, 30 May 2012)
Measurement Concepts Operational Definition: is the definition of a variable in terms of the actual procedures used by the researcher to measure and/or.
Assessment Procedures for Counselors and Helping Professionals, 7e © 2010 Pearson Education, Inc. All rights reserved. Chapter 5 Reliability.
VALIDITY AND RELIABILITY
Testing What You Teach: Eliminating the “Will this be on the final
General Information --- What is the purpose of the test? For what population is the test designed? Is this population relevant to the people who will take your.
Issues of Technical Adequacy in Measuring Student Growth for Educator Effectiveness Stanley Rabinowitz, Ph.D. Director, Assessment & Standards Development.
Learning targets: Students will be better able to: ‘Unpack’ the standards. Describe the purpose and value of using a rubric Evaluate whether a rubric can.
VALIDITY AND TEST VALIDATION Prepared by Olga Simonova, Inna Chmykh, Svetlana Borisova, Olga Kuznetsova Based on materials by Anthony Green 1.
Item Response Theory. Shortcomings of Classical True Score Model Sample dependence Limitation to the specific test situation. Dependence on the parallel.
© 2008 McGraw-Hill Higher Education. All rights reserved. CHAPTER 16 Classroom Assessment.
Basic Issues in Language Assessment. Yuan Yun-pi (袁韻璧), Department of English, Fu Jen Catholic University. Contents: Introduction: relationship between teaching & testing.
Creating Effective Classroom Tests by Christine Coombe and Nancy Hubley 1.
© UCLES 2013 Assessing the Fit of IRT Models in Language Testing Muhammad Naveed Khalid Ardeshir Geranpayeh.
Chapter 9 Flashcards. measurement method that uses uniform procedures to collect, score, interpret, and report numerical results; usually has norms and.
Classroom Assessment A Practical Guide for Educators by Craig A
Questions to check whether or not the test is well designed: 1. How do you know if a test is effective? 2. Can it be given within appropriate administrative.
Principles of Language Assessment Ratnawati Graduate Program University State of Semarang.
Technical Issues Two concerns Validity Reliability
Chapter 1 Assessment in Elementary and Secondary Classrooms
6th semester Course Instructor: Kia Karavas.  What is educational evaluation? Why, what and how can we evaluate? How do we evaluate student learning?
Classroom Assessment and Grading
Educational Research: Competencies for Analysis and Application, 9th edition. Gay, Mills, & Airasian © 2009 Pearson Education, Inc. All rights reserved.
Quality in language assessment – guidelines and standards Waldek Martyniuk ECML Graz, Austria.
Induction to assessing student learning Mr. Howard Sou Session 2 August 2014 Federation for Self-financing Tertiary Education 1.
Principles of Test Construction
An Introduction to Language Testing: Fundamentals of Language Testing. Dr Abbas Mousavi, American Public University.
Measuring Mathematical Knowledge for Teaching: Measurement and Modeling Issues in Constructing and Using Teacher Assessments DeAnn Huinker, Daniel A. Sass,
“Interface” Validity: Investigating the potential role of face validity in content validation. Gábor Szabó, Robert Märcz, ECL Examinations. EALTA 9 - Innsbruck.
Student assessment: Assessment tools. AH Mehrparvar, MD, Occupational Medicine Department, Yazd University of Medical Sciences.
Reliability & Validity
Week 5 Lecture 4. Lecture’s objectives  Understand the principles of language assessment.  Use language assessment principles to evaluate existing tests.
Military Language Testing at the National Defence University and the Common European Framework BILC CONFERENCE BUDAPEST.
Automated Scoring is a Policy and Psychometric Decision Christina Schneider The National Center for the Improvement of Educational Assessment
USEFULNESS IN ASSESSMENT Prepared by Vera Novikova and Tatyana Shkuratova.
ASSESSING STUDENT ACHIEVEMENT Using Multiple Measures Prepared by Dean Gilbert, Science Consultant Los Angeles County Office of Education.
A COMPARISON METHOD OF EQUATING CLASSIC AND ITEM RESPONSE THEORY (IRT): A CASE OF IRANIAN STUDY IN THE UNIVERSITY ENTRANCE EXAM Ali Moghadamzadeh, Keyvan.
Validity Validity: A generic term used to define the degree to which the test measures what it claims to measure.
Presented By Dr / Said Said Elshama  Distinguish between validity and reliability.  Describe different evidences of validity.  Describe methods of.
Session 4 Performance-Based Assessment
Validity and Item Analysis Chapter 4. Validity Concerns what the instrument measures and how well it does that task Not something an instrument has or.
Relating examinations to the CEFR – the Council of Europe Manual and supplementary materials Waldek Martyniuk ECML, Graz, Austria.
NATIONAL CONFERENCE ON STUDENT ASSESSMENT JUNE 22, 2011 ORLANDO, FL.
Maurice Grinberg evaluation.nbu.bg ALTE meeting, Lisbon, November
Reliability: performance on language tests is also affected by factors other than communicative language ability. (1) Test method facets: they are systematic.
1 Scoring Provincial Large-Scale Assessments María Elena Oliveri, University of British Columbia Britta Gundersen-Bryden, British Columbia Ministry of.
Chapter 6 - Standardized Measurement and Assessment
VALIDITY, RELIABILITY & PRACTICALITY Prof. Rosynella Cardozo Prof. Jonathan Magdalena.
Michigan Assessment Consortium Common Assessment Development Series Module 16 – Validity.
WHS AP Psychology Unit 7: Intelligence (Cognition) Essential Task 7-3: Explain how psychologists design tests, including standardization strategies and.
EVALUATING EPP-CREATED ASSESSMENTS
BILC Seminar, Budapest, October 2016
Principles of Language Assessment
ECML Colloquium 2016: The experience of the ECML RELANG team
Reliability and Validity
Introduction to the Validation Phase
Stages of test construction
Validity and reliability of rating speaking and writing performances
Week 3 Class Discussion.
RELATING NATIONAL EXTERNAL EXAMINATIONS IN SLOVENIA TO THE CEFR LEVELS
PSY 614 Instructor: Emily Bullock, Ph.D.
Roadmap Towards a Validity Argument
From Learning to Testing
Presentation transcript:

Using IRT and Many-Facet Rasch Analysis for Test Improvement. “ALIGNING TRAINING AND TESTING IN SUPPORT OF INTEROPERABILITY”. Desislava Dimitrova, Dimitar Atanasov, New Bulgarian University. BILC Seminar, Varna, October 2010.

Outline
 Examination procedure
 Main concepts and observations
 The socio-cognitive test validation framework of Cyril Weir (2005) and its criteria
 Scoring validity for the listening and reading parts of the test
 Scoring validity for the essay

Test structure
1. Listening paper: two tasks
 15 MCQ items
2. Reading paper: five tasks
 6 matching items
 10 banked-cloze items
 10 open-cloze items
 16 short-answer items
 2 open-ended questions
 5 MCQ items
3. Essay: … words

Too much?
 The concept of communicative language ability (CEFR)
 The concept of test usefulness (Bachman)
 The concept of justifying the use of language assessments in the real world (Bachman)
 The concept of validity
 The Code of Practice (ALTE*, for example)
* Association of Language Testers in Europe

Statements
 The NBU exam is high-stakes.
 The NBU exam is criterion-oriented.
 The NBU exam is ‘independent’.
 Evidence for test validation had not been established, BUT there was a routine practice for test development and test administration.

The socio-cognitive framework for test validation, Cyril Weir (2005)
Test taker characteristics and:
 Context validity
 Theory-based validity
 Scoring validity
 Consequential validity
 Criterion-related validity

“Before-the-test event”: context validity, theory-based validity
“After-the-test event”: scoring validity, consequential validity, criterion-related validity

Scoring validity for the listening and reading parts of the test is established by:
 Item analysis
 Internal consistency
 Error of measurement
 Marker reliability
Not just looking at them! Investigate, discuss, learn and make decisions!
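As an illustration only (a minimal sketch, not the NBU exam's actual pipeline; the function name, toy data and all parameters are assumptions), the first three of these checks can be computed from a dichotomous score matrix like this:

```python
# Minimal sketch of classical item statistics: facility, corrected
# item-total discrimination, Cronbach's alpha, and the standard error
# of measurement (SEM). Toy data; not the NBU exam's real scores.
import numpy as np

def item_analysis(X):
    """X: (n_examinees, n_items) array of 0/1 item scores."""
    n_items = X.shape[1]
    total = X.sum(axis=1)

    facility = X.mean(axis=0)  # proportion correct per item
    # Discrimination: correlation of each item with the rest of the test
    discrimination = np.array(
        [np.corrcoef(X[:, i], total - X[:, i])[0, 1] for i in range(n_items)]
    )
    # Internal consistency: Cronbach's alpha
    alpha = (n_items / (n_items - 1)) * (
        1 - X.var(axis=0, ddof=1).sum() / total.var(ddof=1)
    )
    # Error of measurement: SEM = SD(total) * sqrt(1 - reliability)
    sem = total.std(ddof=1) * np.sqrt(1 - alpha)
    return facility, discrimination, alpha, sem

# Toy data: 200 simulated examinees on a 15-item listening paper
rng = np.random.default_rng(42)
X = ((rng.normal(size=(200, 1)) + rng.normal(size=(200, 15)))
     > rng.normal(size=15)).astype(int)
facility, discrimination, alpha, sem = item_analysis(X)
print(f"alpha = {alpha:.2f}, SEM = {sem:.2f}")
```

Marker reliability is checked separately, for example by correlating two markers' scores on the same scripts.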

Analysis: the 3-parameter IRT model
Advantages:
 Item parameter estimates are independent of the group of examinees used
 Test taker ability estimates are independent of the particular set of items used
 The estimated degree of difficulty and discrimination help to specify the items statistically and by content
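For reference (the equation is not on the slide, but this is the standard form of the model named above), the 3-parameter logistic model gives the probability of a correct answer to item i from an examinee with ability \theta as

    P_i(\theta) = c_i + (1 - c_i)\,\frac{e^{a_i(\theta - b_i)}}{1 + e^{a_i(\theta - b_i)}}

where b_i is the item's difficulty, a_i its discrimination, and c_i its guessing (lower-asymptote) parameter.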

Summer session, 2010

Item number | Version 1 difficulty | Version 2 difficulty | Version 3 difficulty | Version 4 difficulty
1 | -1.7 | -1.2 |  1.6 | -0.7
2 | -1.5 | -1.2 |  1.9 | -2.2
3 | -1.7 | -2.9 |  2.6 | -0.4
4 | -0.5 | -2.4 | -0.9 | …
… | …    | …    | …    | …
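The tabled values are difficulty estimates in logits (more negative = easier, so version 3 stands out as markedly harder). As a hedged illustration of where such numbers come from (a PROX-style first approximation on toy data, not the presenters' estimation method; all names and data are assumptions):

```python
# Hypothetical sketch: PROX-style first approximation of item
# difficulties in logits from a 0/1 score matrix (toy data).
import numpy as np

def prox_difficulty(X):
    """Log-odds of an incorrect response per item, centred at 0 logits."""
    p = X.mean(axis=0).clip(0.01, 0.99)  # guard against p = 0 or 1
    b = np.log((1 - p) / p)              # crude logit difficulty
    return b - b.mean()                  # centre the scale

rng = np.random.default_rng(1)
X = ((rng.normal(size=(300, 1)) + rng.normal(size=(300, 6)))
     > rng.normal(size=6)).astype(int)
print(np.round(prox_difficulty(X), 1))
```

A full 3PL fit refines this first approximation by estimating difficulty, discrimination and guessing jointly.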

Possible decisions
 Remedial procedures
 Classroom assessment
 Only certification decision

Scoring validity for writing is established by:
 Criteria / rating scale
 Rating procedures: rater training, standardization, rating conditions, rating, moderation
 Statistical analysis
 Raters
 Grading
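For reference, the many-facet Rasch model named in the presentation's title (not spelled out on this slide) ties these elements together. In Linacre's formulation, the log-odds that rater j awards examinee n category k rather than k−1 on task i are

    \log \frac{P_{nijk}}{P_{nij(k-1)}} = \theta_n - \delta_i - \lambda_j - \tau_k

where \theta_n is the examinee's ability, \delta_i the task's difficulty, \lambda_j the rater's severity, and \tau_k the difficulty of scale step k. Fitting this model separates rater severity from examinee ability, which speaks directly to the “score depends on the raters” problem noted in the conclusion below.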

Conclusion for the essay
Good:
 Two raters
 Analytic writing scale
 Rubrics and input
Negative:
 The score depends on the raters
 No task-specific scale
 No standardization

It is now a fact that we will continue our work on:
 item writers' training
 content and statistical specification of the items
 test review and test revision

Sharing:
 Investigation (small steps towards “strong” validity)
 Comparison (language ability of the same population at the same level)
 Cooperation (in research projects)

Thank you New Bulgarian University