
Reliability
Definition: The stability or consistency of a test.
Assumption: True score = obtained score ± error (every obtained score combines a true score with measurement error).
Domain Sampling Model: Item Domain → Test (the items on a test are a sample from a larger domain of possible items).
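
The assumption above can be made concrete with a small simulation (a sketch, not from the slides, assuming normally distributed true scores and errors): reliability is the share of obtained-score variance that is true-score variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Classical test theory: obtained score X = true score T + error E
true_scores = rng.normal(loc=50, scale=10, size=10_000)  # T, variance 100
errors = rng.normal(loc=0, scale=5, size=10_000)         # E, variance 25
obtained = true_scores + errors                          # X = T + E

# Reliability is the proportion of obtained-score variance that is
# true-score variance: var(T) / var(X)
print(f"reliability ≈ {true_scores.var() / obtained.var():.2f}")  # ≈ 0.80
```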

Overview of Reliability Techniques
1. Test-retest
2. Parallel/Alternate Forms
3. Split-half
4. Internal Consistency (KR-20, Coefficient Alpha)
[Slide diagram: pairs of scores from two administrations (T1/T2), two forms (A/B), and two test halves]

Test-Retest Method [error is due to changes occurring with the passage of time]
Some issues:
- Length of time between test administrations is crucial (generally, the longer the interval, the lower the reliability)
- Memory (examinees may recall their earlier answers)
- Stability of the construct being assessed
- Speed tests, sensory discrimination, psychomotor tests (possible fatigue factor)
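
A minimal sketch of the computation, assuming `time1` and `time2` hold hypothetical scores for the same people at two administrations; the test-retest estimate is simply the Pearson correlation between them.

```python
import numpy as np

# Hypothetical scores for the same five people at two administrations
time1 = np.array([12, 15, 9, 20, 17])
time2 = np.array([13, 14, 10, 19, 18])

# The test-retest reliability estimate is the Pearson correlation
# between the two administrations
r_tt = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest r = {r_tt:.2f}")
```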

Parallel/Alternate Forms [error due to test content and perhaps passage of time]
Two types:
1) Immediate (back-to-back administrations)
2) Delayed (a time interval between administrations)
Some issues:
- Need the same number and type of items on each form
- Item difficulty must be the same on each form
- Variability of scores must be the same on each form
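
A sketch of checking those requirements on hypothetical data for two forms: parallel-forms reliability is again a correlation, and the means (difficulty) and variances of the two forms should come out roughly equal.

```python
import numpy as np

# Hypothetical total scores on two forms for the same six people
form_a = np.array([24, 30, 18, 27, 21, 25])
form_b = np.array([25, 29, 19, 26, 22, 24])

# Parallel-forms reliability: the correlation between the two forms
r_ab = np.corrcoef(form_a, form_b)[0, 1]
print(f"r_AB = {r_ab:.2f}")

# Rough checks on the slide's requirements: the two forms should have
# about the same mean (difficulty) and the same variability of scores
print(f"means: {form_a.mean():.1f} vs. {form_b.mean():.1f}")
print(f"variances: {form_a.var(ddof=1):.1f} vs. {form_b.var(ddof=1):.1f}")
```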

Split-Half Reliability [error due to differences in item content between the halves of the test]
- Typically, responses on odd versus even items are employed
- Correlate total scores on the odd items with total scores on the even items
- The half-test correlation must then be adjusted with the Spearman-Brown correction formula:

$$r_{ttc} = \frac{n \, r_{12}}{1 + (n - 1)\, r_{12}}$$

where r_ttc = corrected r for the total test, r_12 = correlation between both parts of the test, and n = number of times the test is lengthened (n = 2 for split-half).

[Slide table: each person's total score on the odd items versus the even items]
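
A worked sketch of the whole procedure, assuming a hypothetical people-by-items matrix of 0/1 responses: split into odd and even halves, correlate the half scores, then apply the Spearman-Brown correction with n = 2.

```python
import numpy as np

# Hypothetical item-response matrix: rows = people, columns = items (1 = correct)
responses = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 0, 1, 1],
])

# Total scores on the odd items vs. the even items
odd = responses[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7
even = responses[:, 1::2].sum(axis=1)  # items 2, 4, 6, 8

# Correlation between the two half-tests
r_12 = np.corrcoef(odd, even)[0, 1]

# Spearman-Brown correction to project to the full-length test (n = 2)
n = 2
r_ttc = (n * r_12) / (1 + (n - 1) * r_12)
print(f"half-test r = {r_12:.2f}, corrected r = {r_ttc:.2f}")
```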

KR-20 and Coefficient Alpha [error due to item similarity]
- KR-20 is used with scales that have right and wrong responses (e.g., achievement tests)
- Alpha is used for scales that have a range of response options where there are no right or wrong responses (e.g., 7-point Likert-type scales)

$$r_{tt} = \frac{k}{k - 1}\left(1 - \frac{\sum p_i (1 - p_i)}{\sigma_y^2}\right) \quad \text{(KR-20)}$$

$$\alpha = \frac{k}{k - 1}\left(1 - \frac{\sum \sigma_i^2}{\sigma_y^2}\right) \quad \text{(Alpha)}$$

where k = number of items, σ_y² = variance of total test scores, p_i = proportion of examinees getting item i correct, and σ_i² = variance of scores on item i.
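
A sketch of both coefficients, assuming people-by-items score matrices; population variances (NumPy's default ddof = 0) are used throughout so that, on 0/1 data, the two functions agree exactly, since p(1 − p) is the population variance of a binary item.

```python
import numpy as np

def kr20(responses):
    """KR-20 for a people-by-items matrix of 0/1 (wrong/right) responses."""
    k = responses.shape[1]                   # number of items
    p = responses.mean(axis=0)               # proportion correct per item
    total_var = responses.sum(axis=1).var()  # variance of total test scores
    return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total_var)

def cronbach_alpha(scores):
    """Coefficient alpha for a people-by-items matrix of graded responses."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0)           # variance of each item
    total_var = scores.sum(axis=1).var()     # variance of total test scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)
```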

Item Selection
[Slide scatterplot: items plotted by their correlation with total test scores (x-axis) against their correlation with external criteria (y-axis), with a marked "selection zone"; it illustrates a possible problem with choosing test items based solely on their correlations with a criterion]
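
A sketch of the x-axis quantity, assuming the same kind of 0/1 response matrix as above; the corrected version correlates each item with the total of the remaining items, so an item does not inflate its own correlation.

```python
import numpy as np

def item_total_correlations(responses):
    """Corrected item-total correlation for each column of a
    people-by-items matrix (item vs. total of the remaining items)."""
    total = responses.sum(axis=1)
    corrs = []
    for i in range(responses.shape[1]):
        rest = total - responses[:, i]  # total score excluding item i
        corrs.append(np.corrcoef(responses[:, i], rest)[0, 1])
    return np.array(corrs)
```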

Factors Affecting Reliability
1) Variability of scores (generally, the more variability, the higher the reliability)
2) Number of items (the more questions, the higher the reliability)
3) Item difficulty (moderately difficult items lead to higher reliability, e.g., p-values of .40 to .60)
4) Homogeneity/similarity of item content (e.g., item × total score correlation; the more homogeneity, the higher the reliability)
5) Scale format/number of response options (the more options, the higher the reliability)
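
Factor 2 can be quantified with the Spearman-Brown formula from the split-half slide; this sketch projects the reliability of a test lengthened n-fold, under the assumption that the added items behave like the existing ones.

```python
def spearman_brown(r, n):
    """Projected reliability when a test is lengthened n-fold,
    assuming the added items behave like the existing ones."""
    return (n * r) / (1 + (n - 1) * r)

# Doubling a test with reliability .70 raises the projection to about .82
print(round(spearman_brown(0.70, 2), 2))  # 0.82
```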

Standard Error of Measurement [error that exists in an individual's test score]

$$\text{SEM} = \sigma \sqrt{1 - r}$$

where σ = standard deviation of test scores and r = reliability.

Examples:
σ = 10, r = .90 → SEM = 3.16
σ = 10, r = .60 → SEM = 6.32
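
A one-line sketch of the formula, reproducing the slide's two examples:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - r)."""
    return sd * math.sqrt(1 - reliability)

print(round(sem(10, 0.90), 2))  # 3.16
print(round(sem(10, 0.60), 2))  # 6.32
```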

-4  -3  -2  -1  Mean +1  +2  +3  +4  Normal Curve 68 % 95 % 99 % Actual z-score = 1.96 Actual z-score = x 1.96 = 6.19 (95% confidence) 3.16 x 2.58 = 8.15 (99% confidence)

Other Standard Errors

Standard error of the mean:
$$S_{\bar{X}} = \frac{s}{\sqrt{N}}$$
where s = standard deviation and N = number of observations (sample size).

Standard error of a proportion:
$$SE_p = \sqrt{\frac{p(1 - p)}{N}}$$
where p = proportion and N = sample size.

Standard error of a difference in proportions:
$$SE_{p_1 - p_2} = \sqrt{\frac{p_1(1 - p_1)}{N_1} + \frac{p_2(1 - p_2)}{N_2}}$$

Standard error of estimate (validity coefficient):
$$\sigma_{y'} = \sigma_y \sqrt{1 - r_{xy}^2}$$
where σ_y = standard deviation of y (the criterion) and r²_xy = squared correlation between x and y.
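
A sketch of these formulas as small helpers (the function names are mine, not from the slides; the difference-in-proportions version assumes independent samples):

```python
import math

def se_mean(s, n):
    """Standard error of the mean: s / sqrt(N)."""
    return s / math.sqrt(n)

def se_proportion(p, n):
    """Standard error of a proportion: sqrt(p(1 - p) / N)."""
    return math.sqrt(p * (1 - p) / n)

def se_diff_proportions(p1, n1, p2, n2):
    """Standard error of a difference between independent proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

def se_estimate(sd_y, r_xy):
    """Standard error of estimate: SD_y * sqrt(1 - r_xy^2)."""
    return sd_y * math.sqrt(1 - r_xy ** 2)
```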