Chapter 6 Norm-Referenced Reliability and Validity.


Topics for Discussion: Reliability (consistency, repeatability); Validity (truthfulness); Objectivity (inter-rater reliability)

Observed, Error, and True Scores: Observed Score = True Score + Error Score

Reliability: the proportion of observed score variance that is true score variance.
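The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's worked example: the blood pressure values are hypothetical (they are not the Table 6.1 data), and the function name `reliability` is an assumption made for this sketch. The errors are chosen to be uncorrelated with the true scores, so observed variance equals true variance plus error variance.

```python
from statistics import pvariance


def reliability(true_scores, error_scores):
    """Proportion of observed score variance that is true score variance."""
    # Observed Score = True Score + Error Score
    observed = [t + e for t, e in zip(true_scores, error_scores)]
    return pvariance(true_scores) / pvariance(observed)


# Hypothetical values for illustration only.
true_bp = [110, 120, 130, 140]
error_bp = [2, -2, -2, 2]  # mean zero, uncorrelated with true_bp

print(round(reliability(true_bp, error_bp), 3))
```

With no error at all the ratio is 1; as error variance grows relative to true score variance, the ratio falls toward 0.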

Table 6.1 Systolic Blood Pressure Recordings for 10 Subjects. Columns: Subject, Observed BP = True BP + Error BP. Summary rows: Sum (Σ), Mean (M), Std. Dev. (s), Variance (s²) 133.6 = (remaining values not recoverable)

Interclass Reliability (Pearson Product Moment): Test-retest (Trial 1, Trial 2); Equivalence (Form A, Form B); Split halves (Odd, Even)

Table 6.2 Sit-up Performance for 10 Subjects. Columns: Subject, Trial 1, Trial 2. Summary rows: Sum (Σ), Mean (M), Std. Dev. (s), Variance (s²). r_xx' = .927
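A test-retest (interclass) reliability coefficient of this kind is a Pearson product moment correlation between the two trials. The sketch below shows the computation on hypothetical sit-up counts; the individual scores from Table 6.2 did not survive in this transcript, so these numbers are assumptions and will not reproduce the reported .927.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / sqrt(sum_xx * sum_yy)


# Hypothetical sit-up counts for five subjects on two trials.
trial_1 = [45, 38, 52, 41, 60]
trial_2 = [47, 36, 55, 40, 58]

print(round(pearson_r(trial_1, trial_2), 3))
```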

Spearman-Brown Prophecy Formula: k = the number of items for which I WANT to estimate the reliability, divided by the number of items for which I HAVE a reliability estimate

Table 6.3 Odd-Even Scores for 10 Subjects. Columns: Subject, Odd, Even. Summary rows: Sum (Σ) 92, 86; Mean (M); Std. Dev. (s); Variance (s²). r_xx' = .639
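The odd-even correlation is the reliability of a half-length test, so it is usually stepped up to full length with the Spearman-Brown prophecy formula. The formula itself is not preserved in this transcript; the sketch below assumes the standard form, r_kk = (k × r) / (1 + (k − 1) × r), with k defined as on the Spearman-Brown slide. The function name is an assumption.

```python
def spearman_brown(r, k):
    """Estimated reliability when test length changes by a factor of k.

    r: reliability of the test you HAVE
    k: items WANTED divided by items you HAVE
    """
    return (k * r) / (1 + (k - 1) * r)


half_test_r = 0.639  # odd-even correlation reported with Table 6.3

# Doubling the half-length test (k = 2) gives the full-length estimate.
full_length = spearman_brown(half_test_r, 2)
print(round(full_length, 3))
```

For the .639 reported above, the stepped-up estimate comes out at roughly .78. Setting k = 1 returns the original reliability unchanged, which is a quick sanity check on the formula.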

Table 6.4 Values of r_kk From the Spearman-Brown Prophecy Formula. Table headings: r; k (change in test length)

Table 6.5 Effect of a Constant Change in Measures. Columns: Subject, Trial 1, Trial 2. Summary rows: Sum (Σ), Mean (M), Std. Dev. (s), Variance (s²). r_xx' = 1.00
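An r of 1.00 here does not mean the two trials produced the same scores. A Pearson correlation is unaffected when every score changes by the same constant, which the following sketch illustrates with hypothetical data (the Table 6.5 values are not preserved in this transcript).

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length lists."""
    mean_x, mean_y = mean(x), mean(y)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / sqrt(sum_xx * sum_yy)


trial_1 = [10, 12, 15, 18, 20]              # hypothetical scores
trial_2 = [score + 5 for score in trial_1]  # every subject changes by +5

r = pearson_r(trial_1, trial_2)
mean_shift = mean(trial_2) - mean(trial_1)
print(round(r, 2), mean_shift)
```

The correlation stays at 1.00 even though the mean shifted by 5 points, so an interclass coefficient alone cannot reveal a systematic change between trials.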

Intraclass Reliability: ANOVA model; Cronbach's alpha coefficient (alpha coefficient)

Intraclass (ANOVA) Reliabilities. Common terms you will encounter: Alpha reliability; Kuder-Richardson Formula 20 (KR20); Kuder-Richardson Formula 21 (KR21); ANOVA reliabilities

Table 6.6 Calculating the Alpha Coefficient. Columns: Subject, Trial 1, Trial 2, Trial 3, Total. Summary rows: ΣX, ΣX², s²

Calculating the Alpha Coefficient
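The formula shown on this slide did not survive in the transcript. A minimal sketch, assuming the standard form of Cronbach's alpha, alpha = (k / (k − 1)) × (1 − Σ s²_trial / s²_total), where k is the number of trials and the total is each subject's sum across trials. The data and the function name are hypothetical.

```python
from statistics import variance


def cronbach_alpha(trials):
    """Alpha coefficient from k trials.

    trials: list of k lists, one list per trial, one score per subject.
    """
    k = len(trials)
    # Each subject's total score across all trials.
    totals = [sum(subject_scores) for subject_scores in zip(*trials)]
    sum_of_trial_variances = sum(variance(trial) for trial in trials)
    return (k / (k - 1)) * (1 - sum_of_trial_variances / variance(totals))


# Hypothetical scores for four subjects on three trials.
trials = [
    [3, 5, 7, 9],   # Trial 1
    [4, 5, 8, 9],   # Trial 2
    [3, 6, 7, 10],  # Trial 3
]
print(round(cronbach_alpha(trials), 3))
```

The layout mirrors Table 6.6: a variance for each trial column, plus the variance of the Total column.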

Index of Reliability: the theoretical correlation between observed scores and true scores

Standard Error of Measurement: reflects the degree to which a person's observed score fluctuates as a result of errors of measurement
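Neither formula is preserved in this transcript. The sketch below assumes the standard classical test theory forms: the index of reliability is the square root of the reliability coefficient, and the standard error of measurement is s × √(1 − r_xx'), where s is the standard deviation of the observed scores. Function names and the example values are assumptions.

```python
from math import sqrt


def index_of_reliability(r_xx):
    """Theoretical correlation between observed and true scores."""
    return sqrt(r_xx)


def standard_error_of_measurement(sd, r_xx):
    """Expected spread of a person's observed scores around the true score."""
    return sd * sqrt(1 - r_xx)


# Hypothetical test: standard deviation of 10 points, reliability of .84.
print(round(index_of_reliability(0.84), 3))
print(round(standard_error_of_measurement(10, 0.84), 3))
```

A more reliable test has a smaller standard error of measurement for the same score spread; with perfect reliability it drops to zero.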

Factors Affecting Test Reliability: 1) Fatigue; 2) Practice; 3) Subject variability; 4) Time between testing; 5) Circumstances surrounding the testing periods; 6) Appropriate difficulty for testing subjects; 7) Precision of measurement; 8) Environmental conditions

Figure: Decline in Reliability for the Harvard Alumni Activity Survey as the Time Between Testing Periods Increases (x-axis: Months Between Test-Retest)

Validity Types: Content-related validity; Criterion-related validity (statistical or correlational), either concurrent or predictive; Construct-related validity

Standard Error of Estimate (also called the standard error or the standard error of prediction)

Standard Errors: SE of Measurement; SE of Estimate
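The two standard errors answer different questions: the SE of measurement relates to reliability, while the SE of estimate relates to validity (how far predicted criterion scores fall from actual ones). The slides do not preserve a formula, so this sketch assumes the standard form for the SE of estimate, s_y × √(1 − r_xy²), where s_y is the standard deviation of the criterion and r_xy is the validity coefficient. The values are hypothetical.

```python
from math import sqrt


def standard_error_of_estimate(sd_criterion, r_xy):
    """Typical size of prediction errors when predicting the criterion."""
    return sd_criterion * sqrt(1 - r_xy ** 2)


# Hypothetical: criterion standard deviation of 10, validity coefficient of .60.
print(round(standard_error_of_estimate(10, 0.60), 3))
```

As the validity coefficient approaches 1, the SE of estimate approaches 0; with no validity (r_xy = 0), it equals the criterion's own standard deviation.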

Methods of Obtaining a Criterion Measure: Actual participation (e.g., golf, archery); Perform the criterion, i.e., a known valid criterion (e.g., treadmill performance); Expert judges (panel of judges); Tournament participation (round robin); Known valid test

Table 6.7 Correlation Matrix for Development of a Golf Skills Test (From Green et al., 1987). Row and column variables: Playing golf, Long putt, Chip shot, Pitch shot, Middle distance shot, Drive shot. What are these? Concurrent validity coefficients

Table 6.8 Concurrent Validity Coefficients for Golf Test
2-item battery (Middle distance shot, Pitch shot): .72
3-item battery (Middle distance shot, Pitch shot, Long putt): .76
4-item battery (Middle distance shot, Pitch shot, Long putt, Chip shot): .77

Figure 6.1 Diagram of Validity and Reliability Terms

Interpreting the “r” You Obtain

Various Correlations. Matrix variables: Actual Golf Score (Criterion); Putting Test Version A (Trial 1); Putting Test Version A (Trial 2); Driving Test Version A (Trial 1); Driving Test Version A (Trial 2); Swing Form Test Version A (Rating 1); Swing Form Test Version A (Rating 2). Diagonal entries are 1.00. Cell labels: a test with the criterion is a Validity Coefficient (r_XY); Trial 1 with Trial 2 of the same test is a Reliability Coefficient (r_XX′); Rating 1 with Rating 2 is an Objectivity Coefficient (r_XX′); the remaining cells are Pearson Product Moment Correlation Coefficients (r)

Interpret These Correlations. Matrix variables: Actual golf score (criterion), Putting Trial 1, Putting Trial 2, Driving Trial 1, Driving Trial 2, Observer 1, Observer 2

Interpret These Correlations (each test with the criterion). What are these? Concurrent validity coefficients

Interpret These Correlations (Trial 1 with Trial 2 of the same test). What are these? Reliability coefficients

Interpret These Correlations (Observer 1 with Observer 2). What is this? Objectivity coefficient

Scatterplot: Two trials of Leg Press (showing the prediction line and the line of identity)

Correlation: Two trials of Leg Press

Concurrent Validity: This square represents variance in performance in a skill (e.g., golf)

Concurrent Validity: The different colors and patterns represent different parts of a skills test battery to measure the criterion (e.g., golf)

Concurrent Validity: The orange color represents ERROR, or unexplained variance, in the criterion (e.g., golf)

Concurrent Validity (batteries A, B, C, D): Consider the concurrent validity of the 4 possible skills test batteries

Concurrent Validity (batteries A, B, C, D): Which test battery would you be LEAST likely to use? Why? D: it has the MOST error and requires 4 tests to be administered

Concurrent Validity (batteries A, B, C, D): Which test battery would you be MOST likely to use? Why? C: it has the LEAST error, but it requires 3 tests to be administered

Concurrent Validity (batteries A, B, C, D): Which test battery would you use if you are limited in time? A or B: they require only 1 or 2 tests to be administered, but you lose some validity
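The trade-off between number of tests and unexplained variance can be made concrete with the validity coefficients from Table 6.8. This sketch assumes the usual interpretation that the squared validity coefficient is the proportion of criterion variance explained, so 1 − r² is the error (unexplained) proportion. The diagrams' batteries A to D are not the same as the Table 6.8 batteries; the table is used here only because its coefficients are available.

```python
def unexplained_variance(r_xy):
    """Proportion of criterion variance NOT explained by the test battery."""
    return 1 - r_xy ** 2


# Concurrent validity coefficients reported in Table 6.8.
batteries = {"2-item": 0.72, "3-item": 0.76, "4-item": 0.77}

for name, r in batteries.items():
    print(name, round(r ** 2, 4), round(unexplained_variance(r), 4))
```

Going from 2 to 3 items reduces the unexplained proportion noticeably more than going from 3 to 4, which is the kind of diminishing return that has to be weighed against administration time.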

PASW Examples