Validity and Reliability
(Figure labels: Neither Valid nor Reliable · Reliable but not Valid · Fairly Valid but not very Reliable · Valid & Reliable)
Think in terms of 'the purpose of tests' and the 'consistency' with which the purpose is fulfilled/met.

Validity
- Depends on the PURPOSE. E.g. a ruler may be a valid measuring device for length, but isn't very valid for measuring volume.
- Measuring what 'it' is supposed to measure
- A matter of degree (how valid?)
- Specific to a particular purpose!
- Must be inferred from evidence; cannot be directly measured
Learning outcomes:
1. Content coverage (relevance?)
2. Level & type of student engagement (cognitive, affective, psychomotor) – appropriate?

Reliability
- Consistency in the type of result a test yields – across time, settings, and participants
- Not a perfectly identical result, but 'very close to' similar
When someone says you are a 'reliable' person, what do they really mean? Are you a reliable person?

What do you think…?
1. Forced-choice assessment forms are high in reliability, but weak in validity. (true/false)
2. Performance-based assessment forms are high in both validity and reliability. (true/false)
3. A test item is said to be unreliable when most students answer the item wrongly. (true/false)
4. When a test contains items that do not represent the content covered during instruction, it is known as an unreliable test. (true/false)
5. Test items that do not successfully measure the intended learning outcomes (objectives) are invalid items. (true/false)
6. An assessment that does not represent student learning well enough is definitely invalid and unreliable. (true/false)
7. A valid test can sometimes be unreliable. (true/false)
8. If a test is valid, it is reliable! (a by-product)

Question… In the context of what you understand about VALIDITY and RELIABILITY, how do you go about establishing/ensuring them in your own test papers?

Indicators of quality
- Validity
- Reliability
- Utility
- Fairness
Question: how are they all inter-related?

Types of validity measures
- Face validity
- Construct validity
- Content validity
- Criterion validity (1. Predictive, 2. Concurrent)
- Consequential validity

Face Validity
Does it appear to measure what it is supposed to measure?
Example: let's say you are interested in measuring 'propensity towards violence and aggression'. By simply looking at the following items, state which ones qualify to measure the variable of interest:
- Have you been arrested?
- Have you been involved in physical fighting?
- Do you get angry easily?
- Do you sleep with your socks on?
- Is it hard to control your anger?
- Do you enjoy playing sports?

Construct Validity
Does the test measure the 'human' CHARACTERISTIC(s) it is supposed to?
Examples of constructs or 'human' characteristics:
- Mathematical reasoning
- Verbal reasoning
- Musical ability
- Spatial ability
- Mechanical aptitude
- Motivation
Applicable to PBA/authentic assessment.
Each construct is broken down into its component parts. E.g. 'motivation' can be broken down into:
- Interest
- Attention span
- Hours spent
- Assignments undertaken and submitted, etc.
All of these sub-constructs put together measure 'motivation'.
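To make the idea of combining sub-constructs concrete, here is a minimal sketch in Python; the sub-construct names follow the slide, but the 0–10 scale, the equal weighting, and the numbers are invented for illustration, not taken from the presentation.

```python
# Hypothetical measurement of the construct 'motivation' from the component
# parts listed on the slide. Scores are assumed to be on a common 0-10 scale;
# the equal weighting is an invented simplification.
sub_scores = {
    "interest": 7,
    "attention_span": 6,
    "hours_spent": 8,
    "assignments_submitted": 9,
}

# The construct score is taken here as the mean of its sub-construct scores.
motivation = sum(sub_scores.values()) / len(sub_scores)
print(f"Motivation (composite score): {motivation:.1f}")  # -> 7.5
```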

Content Validity
- How well do the elements of the test relate to the content domain?
- How closely does the content of the test questions relate to the content of the curriculum?
- Directly relates to instructional objectives and the fulfillment of the same!
- A major concern for achievement tests (where content is emphasized)
Can you test students on things they have not been taught?

How to establish Content Validity?
- Instructional objectives (looking at your list)
- Table of Specification
E.g. at the end of the chapter, the student will be able to do the following:
1. Explain what 'stars' are
2. Discuss the types of stars and galaxies in our universe
3. Categorize different constellations by looking at the stars
4. Differentiate between our star, the sun, and all other stars

Table of Specification (An Example)
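The original slide presents the example table as an image that is not reproduced in this transcript. As a stand-in, here is a minimal sketch of what such a test blueprint might look like, written as a small Python structure; the objectives echo the previous slide, while the cognitive levels and item counts are invented for illustration.

```python
# Hypothetical Table of Specification (test blueprint): each row is an
# instructional objective, each column a cognitive level, each cell the
# number of test items planned for that combination.
table_of_specification = {
    "Explain what 'stars' are":               {"Knowledge": 2, "Comprehension": 3, "Application": 0},
    "Discuss types of stars and galaxies":    {"Knowledge": 2, "Comprehension": 2, "Application": 1},
    "Categorize constellations":              {"Knowledge": 1, "Comprehension": 2, "Application": 2},
    "Differentiate the sun from other stars": {"Knowledge": 1, "Comprehension": 1, "Application": 3},
}

# Check that the blueprint adds up to the intended test length (say, 20 items).
total_items = sum(sum(cells.values()) for cells in table_of_specification.values())
print(f"Total items planned: {total_items}")  # -> 20

# Items per cognitive level, to see how the emphasis is distributed.
levels = {}
for cells in table_of_specification.values():
    for level, n in cells.items():
        levels[level] = levels.get(level, 0) + n
print(levels)  # -> {'Knowledge': 6, 'Comprehension': 8, 'Application': 6}
```

A blueprint like this makes it easy to check that the planned items cover every objective in roughly the proportions the instruction emphasized, which is the point of establishing content validity.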

Criterion Validity
- The degree to which scores on a test (the predictor) correlate with performance on a relevant criterion measure (a concrete criterion in the "real" world)
- If they do correlate highly, the test (predictor) is a valid one!
- E.g. if you taught skills relating to 'public speaking' and had students take a test on it, the test can be validated by looking at how it relates to students' actual public-speaking performance inside or outside the classroom

Two Types of Criterion Validity
- Concurrent criterion validity: how well performance on a test estimates current performance on some valued measure (criterion). E.g. a test of dictionary skills can estimate students' current skill in actual dictionary use, as checked by observation.
- Predictive criterion validity: how well performance on a test predicts future performance on some valued measure (criterion). E.g. a reading-readiness test might be used to predict students' later achievement in reading.
Both are only meaningful if the criterion measures themselves are valid.
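A minimal sketch of how a criterion-validity coefficient might be computed, continuing the public-speaking example above; the scores, ratings, and variable names are invented for illustration.

```python
import numpy as np

# Hypothetical data: scores on the public-speaking test (predictor) and
# instructor ratings of an actual speech (criterion) for the same students.
test_scores = np.array([55, 62, 70, 48, 81, 66, 74, 59, 90, 68])
criterion_ratings = np.array([3.1, 3.4, 4.0, 2.8, 4.6, 3.6, 4.2, 3.0, 4.8, 3.7])

# Pearson correlation between predictor and criterion = validity coefficient.
r = np.corrcoef(test_scores, criterion_ratings)[0, 1]
print(f"Criterion validity coefficient: r = {r:.2f}")

# The squared coefficient (coefficient of determination) tells how much of the
# variance in the criterion is accounted for by the test.
print(f"Coefficient of determination: r^2 = {r**2:.2f}")
```

A high positive coefficient would support the claim that the test is a valid predictor of the criterion; a coefficient near zero would not.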

Consequential Validity
The extent to which the assessment serves its intended purpose:
- Did the test improve performance? Motivation? Independent learning?
- Did it distort the focus of instruction?
- Did it encourage or discourage creativity? Exploration? Higher-order thinking?

Factors that can lower Validity
- Unclear directions
- Difficult reading vocabulary and sentence structure
- Ambiguity in statements
- Inadequate time limits
- Inappropriate level of difficulty
- Poorly constructed test items
- Test items inappropriate for the outcomes being measured
- Tests that are too short
- Improper arrangement of items (complex to easy?)
- Identifiable patterns of answers
- Teaching
- Administration and scoring
- Students
- Nature of the criterion

Reliability
- A measure of the consistency of test results from one administration of the test to the next
- Generalizability and consistency are interwoven concepts – if a test item is reliable, it can be correlated with other items to collectively measure a construct or content mastery
- A component of validity
- Length of assessment (a longer assessment is less distorted by chance factors)

Measuring Reliability
- Test–retest: give the same test twice to the same group, with a time interval between the two administrations
- Equivalent forms (similar in content, difficulty level, arrangement, type of assessment, etc.): give two forms of the test to the same group in close succession
- Split-half: the test has two equivalent halves; give the test once, then score the two halves (e.g. odd items vs. even items) separately and correlate them
- Cronbach's alpha (e.g. via SPSS): inter-item consistency – one test, one administration
- Inter-rater consistency (for subjective scoring): calculate the percent of exact agreement, or correlate the raters' scores using Pearson's product-moment correlation and examine the coefficient of determination (e.g. via SPSS)
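The slides point to SPSS for these calculations; as a rough illustration, here is a minimal sketch in Python (NumPy) of how split-half reliability, Cronbach's alpha, and an inter-rater correlation might be computed, using an invented item-score matrix.

```python
import numpy as np

# Hypothetical item-score matrix: rows = students, columns = test items
# (1 = correct, 0 = incorrect). Invented data for illustration only.
scores = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [1, 0, 1, 1, 1, 0, 1, 1],
    [1, 1, 1, 0, 1, 1, 0, 1],
])

# --- Split-half reliability ---------------------------------------------
# Score the odd-numbered and even-numbered items separately, correlate the
# two half-test totals, then step the correlation up with Spearman-Brown.
odd_totals = scores[:, 0::2].sum(axis=1)
even_totals = scores[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd_totals, even_totals)[0, 1]
split_half = (2 * r_half) / (1 + r_half)   # Spearman-Brown correction
print(f"Split-half reliability: {split_half:.2f}")

# --- Cronbach's alpha -----------------------------------------------------
# alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
k = scores.shape[1]
item_variances = scores.var(axis=0, ddof=1)
total_variance = scores.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha: {alpha:.2f}")

# --- Inter-rater consistency ----------------------------------------------
# Two raters score the same essays; correlate their marks and report r^2
# (the coefficient of determination mentioned on the slide).
rater_a = np.array([4, 3, 5, 2, 4, 3])
rater_b = np.array([4, 2, 5, 3, 4, 3])
r_raters = np.corrcoef(rater_a, rater_b)[0, 1]
print(f"Inter-rater r = {r_raters:.2f}, r^2 = {r_raters**2:.2f}")
```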

How to improve Reliability?
- Quality of items: concise statements, homogeneous wording (some sort of uniformity)
- Adequate sampling of the content domain; comprehensiveness of items
- A longer assessment is less distorted by chance factors
- Develop a scoring plan (especially for subjective items – rubrics)
- Ensure VALIDITY