RELIABILITY AND VALIDITY OF ASSESSMENT

ITEM ANALYSIS

Item analysis must be done before meaningful and scientific inferences about a test can be made in terms of its validity, reliability, objectivity and usability. It is the process of examining students' responses to individual test items in order to assess the quality of the items and of the test as a whole.

The tools include: item difficulty, item discrimination and item distractors.

THE PURPOSES OF ITEM ANALYSIS

To improve test items and identify unfair items. To reveal which questions were most difficult. To flag items in which a particular distractor is the most often chosen answer, since such an item must be examined. To identify common misconceptions among students about a particular concept.

To improve the quality of tests. If items are too hard, teachers can adjust the way they teach.

Item Difficulty

It is the percentage of students taking the test who answered the item correctly. The higher the value, the easier the item.

D = (R / N) × 100, where R is the number of pupils who answered the item correctly and N is the total number of pupils who attempted the item.

Example: number of pupils who answered the item correctly = 40; total number of pupils who attempted the item = 50; D = (40 / 50) × 100 = 80%.
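As a minimal sketch of this calculation (the function name and the use of the slide's worked figures are illustrative, not taken from any statistics package), the difficulty index can be computed directly:

```python
def item_difficulty(correct: int, attempted: int) -> float:
    """Difficulty index D = (R / N) x 100: the percentage of pupils answering correctly."""
    return correct / attempted * 100

# Worked example from the slide: 40 of 50 pupils answered the item correctly.
print(item_difficulty(40, 50))  # 80.0 -> an easy item (higher value = easier item)
```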

Ideal difficulty levels for multiple-choice items (ideal difficulty = percentage answering correctly):
Five-response multiple-choice – 70
Four-response multiple-choice – 74
Three-response multiple-choice – 77
True-false – 85

Item Discrimination

The ability of an item to differentiate among the students on the basis of how well they know the material being tested. A good item discriminates between those who do well on the test and those who do poorly. The higher the discrimination index, the better the item.

DI = (RU − RL) / (½ × N), where RU is the number of correct responses from the upper group, RL is the number of correct responses from the lower group, and N is the total number of pupils who attempted the item.

Example: total score – 60; total sample – 50; upper group – 25; lower group – 25; correct responses in the upper group – 22; correct responses in the lower group – 10. DI = (22 − 10) / (½ × 50) = 12 / 25 = 0.48.

Interpretation: 0.40 or higher – very good items; 0.30 to 0.39 – good items; 0.20 to 0.29 – fairly good items; 0.19 or less – poor items. So the item in the example, with DI = 0.48, is a very good item.
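A similarly hedged sketch of the discrimination calculation, reusing the interpretation bands above (the function names and the figures from the worked example are illustrative):

```python
def discrimination_index(upper_correct: int, lower_correct: int, total_pupils: int) -> float:
    """DI = (RU - RL) / (N / 2); the upper and lower groups each contain half of the N pupils."""
    return (upper_correct - lower_correct) / (total_pupils / 2)

def interpret(di: float) -> str:
    """Map a discrimination index onto the bands listed above."""
    if di >= 0.40:
        return "very good item"
    if di >= 0.30:
        return "good item"
    if di >= 0.20:
        return "fairly good item"
    return "poor item"

di = discrimination_index(22, 10, 50)
print(round(di, 2), interpret(di))  # 0.48 very good item
```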

Distractors

Analyzing the distractors (i.e., the incorrect alternatives) is useful in determining the relative usefulness of the decoys in each item. Alternatives that are rarely or never selected are probably totally implausible and therefore of little use as decoys in multiple-choice items.

One way to study responses to distractors is with a frequency table that shows the proportion of students who selected each distractor. Remove or replace distractors selected by few or no students, because students find them implausible.
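A small illustrative sketch of such a frequency table, assuming responses to one item are stored as a simple list of chosen options (the item, its key "B" and the response data are hypothetical):

```python
from collections import Counter

def distractor_table(responses: list[str]) -> dict[str, float]:
    """Proportion of students selecting each alternative (the key and the distractors alike)."""
    counts = Counter(responses)
    total = len(responses)
    return {option: count / total for option, count in counts.items()}

# Hypothetical responses of ten students to one item whose key is "B".
responses = ["B", "B", "A", "B", "C", "B", "A", "B", "B", "D"]
print(distractor_table(responses))
# {'B': 0.6, 'A': 0.2, 'C': 0.1, 'D': 0.1}
# A distractor with a proportion near zero is probably implausible and a candidate for replacement.
```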

RELIABILITY

Reliability is the degree to which an assessment tool produces stable and consistent results.

TYPES OF RELIABILITY

Test-retest reliability. Parallel forms reliability. Inter-rater reliability. Internal consistency. Form equivalence (alternate form).

Test-retest reliability

Obtained by administering the same test twice over a period of time to a group of individuals.  Scores from Time 1 and Time 2 can then be correlated to evaluate the test for stability. Also known as temporal stability.
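A minimal sketch of the test-retest correlation, assuming each examinee has one score per administration (the scores are hypothetical and the Pearson helper is written out by hand rather than taken from a statistics library):

```python
from math import sqrt

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two paired lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores of five examinees at Time 1 and Time 2.
time1 = [55, 62, 70, 48, 80]
time2 = [57, 60, 73, 50, 78]
print(round(pearson(time1, time2), 2))  # close to 1.0 -> temporally stable scores
```

The same computation applies to parallel (alternate) forms reliability: correlate scores on the two versions of the test instead of the Time 1 and Time 2 scores.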

Parallel forms reliability

It is obtained by administering different versions of an assessment tool to the same group of individuals.  Scores from the two versions can then be correlated to evaluate the consistency of results across alternate versions. 

Inter-rater reliability

Used to assess the degree to which different judges or raters agree in their assessment decisions.  Useful because human observers will not necessarily interpret answers the same way.
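One simple, illustrative way to express inter-rater agreement is the proportion of cases on which two raters make the same decision (percent agreement); more sophisticated indices exist, but this sketch keeps to the basic idea, and the raters and decisions shown are hypothetical:

```python
def percent_agreement(rater_a: list[str], rater_b: list[str]) -> float:
    """Proportion of cases on which two raters give the same judgement."""
    agreements = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return agreements / len(rater_a)

# Hypothetical pass/fail decisions by two examiners on the same eight scripts.
rater_a = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "fail"]
rater_b = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "fail"]
print(percent_agreement(rater_a, rater_b))  # 0.875 -> the examiners agree on 7 of 8 scripts
```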

Internal consistency reliability

It is used to evaluate the degree to which different test items that probe the same construct produce similar results. Two types are average inter-item correlation and split-half reliability.

Average inter-item correlation: obtained by taking all of the items on a test that probe the same construct, determining the correlation coefficient for each pair of items, and finally taking the average of all of these correlation coefficients.

Split-half reliability: all items of a test are “split in half” to form two “sets” of items. The total score for each “set” is computed, and the correlation between the two total “set” scores gives the split-half reliability.
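A hedged sketch of both internal-consistency procedures, assuming item scores are stored as a student-by-item matrix of 0/1 values (the data, the odd/even split and the hand-written correlation helper are illustrative choices, not prescriptions):

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two paired lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sqrt(sum((a - mx) ** 2 for a in x)) * sqrt(sum((b - my) ** 2 for b in y)))

# Hypothetical 0/1 item scores: rows are students, columns are four items probing one construct.
items = [
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 1, 1],
]
columns = list(zip(*items))  # one tuple of scores per item

# Average inter-item correlation: mean of the correlations over every pair of items.
pair_rs = [pearson(a, b) for a, b in combinations(columns, 2)]
avg_inter_item = sum(pair_rs) / len(pair_rs)

# Split-half reliability: correlate total scores on the two halves (odd- vs even-numbered items here).
half1 = [sum(row[0::2]) for row in items]
half2 = [sum(row[1::2]) for row in items]
split_half = pearson(half1, half2)

print(round(avg_inter_item, 2), round(split_half, 2))
```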

Form equivalence (Alternate form)

Also known as alternate form reliability. Two different forms of the test, based on the same content, are administered on one occasion to the same examinees. Reliability is expressed as the correlation between the scores on the two forms (Test 1 and Test 2).

VALIDITY

An indication of how well an assessment actually measures what it is supposed to measure. Refers to the accuracy of an assessment. It is the veracity of an assessment instrument.

TYPES OF VALIDITY

Face validity Construct validity Content validity Criterion related validity Formative validity Sampling validity

Face Validity

Measure of the extent to which an examination looks like an examination in the subject concerned and at the appropriate level. Candidates, teachers and the public have expectations as to what an examination looks like and how it is conducted.

Construct Validity

The extent to which an assessment corresponds to other variables, as predicted by some rationale or theory. It is also known as theoretical construct validity.

Content Validity

The extent to which a measure adequately represents all facets of a concept. It is the extent to which the content of the test matches the instructional objectives

Criterion-Related validity

The degree to which performance on a test (the predictor) correlates with performance on relevant criterion measures (concrete criteria in the “real” world).

Formative Validity

When applied to outcomes assessment it is used to assess how well a measure is able to provide information to help improve the program under study.

Sampling Validity

It is similar to content validity. It ensures that the measure covers the broad range of areas within the concept under study. 

FACTORS THAT CAN LOWER VALIDITY

Unclear directions Difficult reading vocabulary and sentence structure Ambiguity in statements Inadequate time limits Inappropriate level of difficulty

Cont’d Poorly constructed test items. Test items inappropriate for the outcomes being measured. Tests that are too short. Administration and scoring.

Cont’d Improper arrangement of items (e.g., from complex to easy). Identifiable patterns of answers. Teaching. Students. Nature of the criterion.

WAYS TO IMPROVE VALIDITY AND RELIABILITY

IMPROVING RELIABILITY First, calculate the item-test correlations and rewrite or reject any items whose correlations are too low. Second, look at the items that did correlate well and write more like them. The longer the test, the higher the reliability, up to a point.
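A minimal sketch of that first step, assuming dichotomous (0/1) item scores and using the correlation between each item and the total test score as the item-test correlation (the data and the 0.20 cut-off are purely illustrative):

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two paired lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sqrt(sum((a - mx) ** 2 for a in x)) * sqrt(sum((b - my) ** 2 for b in y)))

# Hypothetical 0/1 item scores: rows are students, columns are items.
items = [
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 1, 1],
]
totals = [sum(row) for row in items]

# Correlate each item with the total test score; flag weakly correlating items for
# rewriting or rejection (the 0.20 cut-off is illustrative only).
for i, item_scores in enumerate(zip(*items), start=1):
    r = pearson(list(item_scores), totals)
    print(f"item {i}: r = {r:.2f} ({'review' if r < 0.20 else 'keep'})")
```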

IMPROVING VALIDITY Make sure your goals and objectives are clearly defined and operationalized.  Expectations of students should be written down. Match your assessment measure to your goals and objectives.

Cont’d Have the test reviewed by faculty at other schools to obtain feedback from an outside party who is less invested in the instrument. Get students involved; have the students look over the assessment for troublesome wording or other problems.

RELATIONSHIP BETWEEN RELIABILITY AND VALIDITY

The two do not necessarily go hand-in-hand. We can illustrate it as follows. Reliable but not valid - an archer who always hits about the same place but not near the bullseye.

Valid but not reliable - archer who hits various places centered around the bullseye, but not very accurately. Neither reliable nor valid - an archer who hits various places all off to the same side of the bullseye.

Cont’d Both reliable and valid - archer who hits consistently close to the bullseye.    A valid assessment is always reliable, but a reliable assessment is not necessarily valid.

FACTORS IN RESOLVING CONFLICTS BETWEEN VALIDITY AND RELIABILITY

Validity is paramount. Validity will not damage educational effectiveness but excessive concern for reliability or costs may do so. Staff costs are limited by the credits in the workload planning system being used.

Cont’d Student time costs are limited by the planned learning hours allocated to them. Reliability cannot be 100% for any one assessment and may need to be compromised. Between-marker reliability can be improved by marker training and monitoring.

Cont’d Clear, detailed criteria will maximise examiner reliability and validity. Educationally effective coursework assessments are often simultaneously designed to prevent plagiarism.

Cont’d Where each student produces a number of similar assignments, they can be randomly sampled. Self and peer assessment can reduce staff costs and also serve as a learning activity. High-reliability assessment is costly and so should be used only where it is critical.

Cont’d Programme-wide design of assessment can avoid the worst of the conflicts. Designing good assessments is a creative, challenging task that demands expertise in the teaching of the subject and time, and is improved by peer support and review.

QUESTIONS