Chapter 6 Norm-Referenced Measurement

Topics for Discussion
Reliability: consistency, repeatability
Validity: truthfulness
Objectivity: inter-rater reliability

Observed, Error, and True Scores
Observed Score = True Score + Error Score

Reliability
Reliability is the proportion of observed score variance that is true score variance.
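As a minimal sketch of this definition, assuming error is uncorrelated with the true score (so observed variance is simply true variance plus error variance), reliability can be computed as a ratio of variances. The function name and the example variances are hypothetical; they are not values from Table 6-1.

```python
def reliability_from_variances(true_var: float, error_var: float) -> float:
    """Proportion of observed score variance that is true score variance.

    Assumes error is uncorrelated with the true score, so that
    observed variance = true variance + error variance.
    """
    observed_var = true_var + error_var
    if observed_var <= 0:
        raise ValueError("observed variance must be positive")
    return true_var / observed_var


# Hypothetical variances, for illustration only.
print(reliability_from_variances(true_var=80.0, error_var=20.0))  # 0.8
```

With no error variance the ratio is 1.0; as error variance grows relative to true variance, the ratio falls toward 0.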

Table 6-1. Systolic Blood Pressure Recordings for 10 Subjects
Columns: Subject, Observed BP = True BP + Error BP
Summary rows: Sum (Σ), Mean (M), Variance (S²), Standard deviation (S)

Interclass Reliability
Pearson product-moment correlation
Test-retest
Equivalence
Split halves

Table 6-2. Sit-up Performance for 10 Subjects
Columns: Subject, Trial 1, Trial 2
Summary rows: Sum (Σ), Mean, S, Variance (S²)
r_xx' = .927

Spearman-Brown Prophecy Formula
k = the number of items I WANT to estimate the reliability for, divided by the number of items I HAVE reliability for

Table 6-3. Odd and Even Scores for 10 Subjects
Columns: Subject, Odd, Even
Summary rows: Sum (Σ) = 92 (odd), 86 (even); Mean; S; Variance (S²)
r_xx' = .639

Table 6-4. Values of r_kk from the Spearman-Brown Prophecy Formula
Tabulated by r and k (change in test length)
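The Spearman-Brown prophecy formula itself is usually written r_kk = (k × r) / (1 + (k − 1) × r). A minimal Python sketch of that standard form follows; the function name is an assumption, and the input is the split-half correlation reported in Table 6-3, stepped up to the full test length (k = 2).

```python
def spearman_brown(r: float, k: float) -> float:
    """Estimated reliability of a test whose length changes by a factor k.

    r: reliability of the test you HAVE
    k: number of items you WANT divided by the number of items you HAVE
    """
    return (k * r) / (1 + (k - 1) * r)


# Split-half correlation from Table 6-3, stepped up to full length (k = 2).
print(round(spearman_brown(0.639, 2), 3))  # 0.78
```

Values of k below 1 estimate the reliability of a shortened test, and k = 1 returns r unchanged.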

Table 6-5. Effect of a Constant Change in Measures
Columns: Subject, Trial 1, Trial 2
Summary rows: Sum (Σ), Mean, S, Variance (S²)
r_xx' = 1.00
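The point of Table 6-5 can be sketched in a few lines of Python: adding the same constant to every score leaves the Pearson correlation at 1.00 even though the trial means differ. The scores below are hypothetical, not the table's values, and `pearson_r` is a hand-written helper rather than a library call.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores: every subject improves by exactly 5 on trial 2.
trial_1 = [20, 25, 30, 15, 35]
trial_2 = [score + 5 for score in trial_1]

print(round(pearson_r(trial_1, trial_2), 2))  # 1.0
print(mean(trial_2) - mean(trial_1))  # the constant shift that r does not detect
```

Because the interclass (Pearson) coefficient reflects only the ordering and spacing of scores, a systematic change between trials goes unnoticed.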

Intraclass Reliability
ANOVA model
Cronbach's alpha coefficient (alpha coefficient)

Intraclass (ANOVA) Reliabilities
Common terms you will encounter:
Alpha reliability
Kuder-Richardson Formula 20 (KR-20)
Kuder-Richardson Formula 21 (KR-21)
ANOVA reliabilities

Table 6-6. Calculating the Alpha Coefficient
Columns: Subject, Trial 1, Trial 2, Trial 3, Total
Summary rows: ΣX, ΣX², S²

Calculating the Alpha Coefficient
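The alpha coefficient is commonly written as alpha = (k / (k − 1)) × (1 − ΣS²_trials / S²_total), where k is the number of trials. A minimal sketch of that calculation follows, assuming sample variances (n − 1 denominator); the function name and the scores are hypothetical and are not the values from Table 6-6.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Alpha coefficient for a subjects-by-trials table of scores.

    scores: list of rows, one per subject, each holding k trial (or item) scores.
    Uses sample variances (n - 1 denominator) throughout.
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two trials or items")
    # Variance of each trial column across subjects.
    trial_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Variance of each subject's total score.
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(trial_vars) / total_var)


# Hypothetical data: 4 subjects, 3 trials each.
scores = [
    [3, 4, 3],
    [5, 5, 6],
    [7, 8, 8],
    [9, 9, 10],
]
print(round(cronbach_alpha(scores), 3))
```

When every subject scores identically on all trials, the trial variances sum to one k-th of the total variance and alpha equals 1.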

Index of Reliability
The theoretical correlation between observed scores and true scores
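In classical test theory this index is the square root of the reliability coefficient. The symbol r_{xt} for the observed-true correlation is notation assumed here; r_{xx'} is the reliability coefficient as used in the tables above.

```latex
\[
  r_{xt} = \sqrt{r_{xx'}}
\]
% Example using the test-retest coefficient from Table 6-2:
\[
  r_{xt} = \sqrt{.927} \approx .963
\]
```

Because a reliability coefficient lies between 0 and 1, the index is never smaller than the coefficient itself.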

Table 6-7. Student Scores on a 10-Item Multiple-Choice Quiz
Columns: Subject, Items, Total
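Right/wrong item data such as a multiple-choice quiz is the kind of data Kuder-Richardson Formula 20 is designed for. The sketch below assumes the usual form KR-20 = (k / (k − 1)) × (1 − Σpq / S²_total) with population variances (n denominator); textbooks differ on the variance convention, so treat that choice as an assumption. The answers are hypothetical and are not the data from Table 6-7.

```python
def kr20(items):
    """Kuder-Richardson Formula 20 for right/wrong (1/0) item scores.

    items: list of rows, one per subject, each holding k item scores (0 or 1).
    Uses population variances (n denominator) for items and totals alike.
    """
    n = len(items)
    k = len(items[0])
    # p = proportion answering each item correctly; q = 1 - p.
    p = [sum(row[j] for row in items) / n for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    totals = [sum(row) for row in items]
    mean_total = sum(totals) / n
    total_var = sum((t - mean_total) ** 2 for t in totals) / n
    return (k / (k - 1)) * (1 - sum_pq / total_var)


# Hypothetical answers: 4 subjects, 3 items.
answers = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(answers))  # 0.75
```

KR-20 is the special case of the alpha coefficient in which each item is scored 0 or 1.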

Standard Error of Measurement
Reflects the degree to which a person's observed score fluctuates as a result of errors of measurement
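The standard error of measurement is conventionally computed as SEM = S × sqrt(1 − r_xx'), where S is the standard deviation of the test scores. A minimal sketch follows; the function name and the example values are hypothetical.

```python
from math import sqrt


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = S * sqrt(1 - r_xx'), in the same units as the test scores."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1.0 - reliability)


# Hypothetical test: standard deviation of 10 points, reliability of .84.
sem = standard_error_of_measurement(sd=10.0, reliability=0.84)
print(round(sem, 2))  # 4.0
```

A perfectly reliable test (r_xx' = 1) has an SEM of 0, and the SEM approaches the full standard deviation as reliability approaches 0.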

Factors Affecting Test Reliability
1) Fatigue
2) Practice
3) Subject variability
4) Time between testing
5) Circumstances surrounding the testing periods
6) Appropriate difficulty for testing subjects
7) Precision of measurement
8) Environmental conditions

Figure: Decline in Reliability for the Harvard Alumni Activity Survey as the Time Between Testing Periods Increases
Horizontal axis: Months Between Test-Retest

Validity Types
Content-related validity
Criterion-related validity (statistical or correlational): concurrent, predictive
Construct-related validity

Standard Error of Estimate
Also called the standard error or the standard error of prediction
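For a single predictor, the standard error of estimate is conventionally computed as SEE = S_y × sqrt(1 − r²), where S_y is the standard deviation of the criterion and r is the validity coefficient. A minimal sketch follows; the function name and the example values are hypothetical.

```python
from math import sqrt


def standard_error_of_estimate(sd_criterion: float, validity_r: float) -> float:
    """SEE = S_y * sqrt(1 - r^2), in the units of the criterion."""
    if not -1.0 <= validity_r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return sd_criterion * sqrt(1.0 - validity_r ** 2)


# Hypothetical criterion: standard deviation of 10, validity coefficient of .60.
see = standard_error_of_estimate(sd_criterion=10.0, validity_r=0.6)
print(round(see, 2))  # 8.0
```

Note the contrast with the standard error of measurement: that one uses a reliability coefficient, unsquared, while the standard error of estimate uses a squared validity coefficient.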

Standard Errors
SE of Measurement
SE of Estimate

Methods of Obtaining a Criterion Measure
Actual participation (e.g., golf, archery)
Perform the criterion: known valid criterion (e.g., treadmill performance)
Expert judges: panel of judges
Tournament participation: round robin
Known valid test

Table 6-8. Correlation Matrix for Development of a Golf Skill Test (from Green et al., 1987)
Variables (rows and columns): Playing golf, Long putt, Chip shot, Pitch shot, Middle distance shot, Drive
What are these? Concurrent validity coefficients

Table 6-9. Concurrent Validity Coefficients for Golf Test
2-item battery (middle distance shot, pitch shot): .72
3-item battery (middle distance shot, pitch shot, long putt): .76
4-item battery (middle distance shot, pitch shot, long putt, chip shot): .77

Correlations Between IQs of Related or Unrelated Children as a Function of Genetic Similarity and Similarity of Environment
Identical twins, reared together: .88
Identical twins, reared apart: .75
Fraternal twins, same sex: .53
Fraternal twins, opposite sex: .53
Siblings, reared together: .49
Siblings, reared apart: .46
Parent with child: .52
Foster parent with child: .19
Unrelated, reared together: .16
From Glass & Stanley, 1970, p. 119

Figure 6.1 Diagram of Validity and Reliability Terms

Interpreting the “r” you obtain

Concurrent Validity This square represents variance in performance in a skill (e.g., golf)

Concurrent Validity The different colors and patterns represent different parts of a skills test battery to measure the criterion (e.g., golf)

Concurrent Validity The orange color represents ERROR, or unexplained variance, in the criterion (e.g., golf)

Concurrent Validity
Consider the concurrent validity of the four possible skills test batteries shown (A, B, C, D)

Concurrent Validity
Which test battery would you be LEAST likely to use? Why?
D: it has the MOST error and requires 4 tests to be administered

Concurrent Validity
Which test battery would you be MOST likely to use? Why?
C: it has the LEAST error, but it requires 3 tests to be administered

Concurrent Validity
Which test battery would you use if you are limited in time?
A or B: each requires only 1 or 2 tests to be administered, but you lose some validity
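The trade-off between validity and the number of tests can be made concrete with the squared validity coefficient, which is conventionally read as the proportion of criterion variance a battery explains; the remainder is error. The sketch below applies that reading to the coefficients in Table 6-9. The function name is an assumption, and the batteries A to D in the diagram are not necessarily these three.

```python
# Concurrent validity coefficients reported in Table 6-9.
batteries = {"2-item": 0.72, "3-item": 0.76, "4-item": 0.77}


def variance_split(r: float) -> tuple[float, float]:
    """Return (explained, unexplained) proportions of criterion variance."""
    explained = r ** 2
    return explained, 1.0 - explained


for name, r in batteries.items():
    explained, error = variance_split(r)
    print(f"{name}: explained {explained:.2f}, error {error:.2f}")
```

On these figures, moving from two tests to four raises the explained share from about .52 to about .59, which illustrates why a shorter battery can be a reasonable choice when time is limited.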

Interpret these correlations
Correlation matrix variables (rows and columns): Actual golf score (the criterion), Putting Trial 1, Putting Trial 2, Driving Trial 1, Driving Trial 2, Observer 1, Observer 2

What are these? Concurrent validity coefficients: the correlations of each test with the criterion (actual golf score)

What are these? Reliability coefficients: the correlations between Trial 1 and Trial 2 of the same test

What is this? Objectivity coefficient: the correlation between Observer 1 and Observer 2

Example of Reliability Study (Rikli et al., RQES, 1992)
Table layout: rows are distance (1/2 mile, 3/4 mile, 1 mile) by gender (M, F); columns are grade (K, 1, 2, 3, 4)

SPSS Examples