Part II Knowing How to Assess, Chapter 5: Minimizing Error (p. 115). Review of Appl 644 – Measurement Theory – Reliability – Validity. Assessment is a broader term.



Part II Knowing How to Assess
Chapter 5 Minimizing Error (p. 115)
Review of Appl 644:
– Measurement Theory
– Reliability
– Validity
Assessment is a broader term than Measurement. What does this mean?

Background
Quetelet (1835)
– Applied the normal distribution to human attributes
– Used by:
  Galton (measurement of genius)
  Binet et al.
  Munsterberg (employment testing)
  J. M. Cattell (perceptual and sensory tests)
Over time, the focus of measurement changed from reliability to validity.

Background: Measurement
Adolphe Quetelet (1835): conceived the homme moyen (“average man”) as the central value about which measurements of a human trait are grouped according to the normal distribution.
– Physical and mental attributes are normally distributed
– Errors of measurement are normally distributed
– The foundation for psychological measurement

RELIABILITY: CONCEPTS OF MEASUREMENT ERROR (p. 117)
Measurement error and error variance – Table 5.1: Reasons for differences in performance
I. Person characteristics: long-term, permanent
– Influence scores on all tests (e.g., language skills)
II. Person characteristics specific to the test
– E.g., the type of words on the test is more/less recognizable to some
III. Temporary characteristics
– Influence scores on any test (e.g., evaluation apprehension)
IV. Temporary characteristics specific to the test
– E.g., stumped by a particular word
V. Administration effects
– E.g., interaction between administrator and examinee
VI. Pure chance

Category II is of most interest; the others reflect unwanted sources of variance.
Classical theory: X = t + e
Assumptions (errors are truly random):
– Obtained score = the algebraic sum of t and e
– Uncorrelated:
  t scores and e scores (in one test)
  errors in different measures
  errors in one measure with true scores in another

Measurement Error
X = s + e (one individual's score)
– Why was t replaced with s?
σ_x² = σ_s² + σ_e²
– Total variance (across all scores) = variance from systematic causes + random error variance
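The variance decomposition follows from the zero-correlation assumptions on the previous slide. A minimal simulation makes it concrete (the component standard deviations here are made up for illustration, not from the chapter):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

s = rng.normal(loc=50, scale=8, size=n)   # systematic (stable) component
e = rng.normal(loc=0, scale=4, size=n)    # random error, independent of s
x = s + e                                  # observed scores: X = s + e

# Assumption check: systematic and error components are uncorrelated
print(np.corrcoef(s, e)[0, 1])             # ~ 0

# Variance decomposition: var(X) ~ var(s) + var(e) = 8**2 + 4**2 = 80
print(x.var(ddof=1))

# Reliability as the systematic share of total variance: ~ 64/80 = .80
print(s.var(ddof=1) / x.var(ddof=1))
```

The last ratio is the classical definition of reliability: the proportion of observed-score variance attributable to systematic causes.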

Reliability (figure-only slide)

Reliability (figure-only slide)

Reliability and Validity (figure-only slide)

Accuracy, Reliability, Validity
Accuracy ≠ reliability
– An inaccurate thermometer may still be consistent (reliable)
Accuracy ≠ validity
– An inaccurate thermometer may show validity (high correlation with a Bureau of Standards instrument)
– But it is still inaccurate (consistently lower for each paired observation)
Why is the concept of "accuracy" meaningless for psychological constructs?

RELIABILITY ESTIMATION (p. 125)
Coefficients of Stability
– Consistency over time
Coefficients of Equivalence
– Equivalent forms (e.g., Forms A and B)
Coefficients of Internal Consistency
– Kuder-Richardson estimates (assume item homogeneity)
  K-R 20 (preferred)
  Cronbach's alpha (α), a general version of K-R 20
– Where is this in SPSS?
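In SPSS, these internal-consistency estimates are under Analyze > Scale > Reliability Analysis. As a cross-check, coefficient alpha is simple to compute directly from the standard formula α = k/(k−1) · (1 − Σσᵢ²/σ_X²); the score matrix below is made up for illustration:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_examinees, k_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 5 examinees x 4 items
scores = np.array([
    [3, 4, 3, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [4, 5, 4, 4],
    [2, 2, 3, 2],
])
print(round(cronbach_alpha(scores), 3))   # → 0.959
```

When items are dichotomous (0/1), this same formula reduces to K-R 20.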

Reliability Estimation (cont'd)
Inter-rater agreement vs. reliability
– ICC (intraclass correlation)
– r_wg
– % agreement (kappa)
– See the Rosenthal & Rosnow table (handout)
Comparisons among reliability estimates
– Systematic variance must reflect stable characteristics of the examinee and of what is measured
Use estimates that make sense for the purpose:
– For re-testing, what is most appropriate?
– For production over a long period?
– An example of a job requiring stability of an attribute?
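Of the agreement indices listed, percent agreement and kappa are the easiest to sketch: kappa corrects raw agreement for the agreement two raters would reach by chance alone. The ratings below are hypothetical:

```python
import numpy as np

def cohen_kappa(r1, r2) -> float:
    """Chance-corrected agreement between two raters' categorical ratings."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    cats = np.union1d(r1, r2)
    p_obs = np.mean(r1 == r2)   # raw percent agreement
    # expected agreement if raters assigned categories independently
    p_exp = sum(np.mean(r1 == c) * np.mean(r2 == c) for c in cats)
    return (p_obs - p_exp) / (1 - p_exp)

rater_a = [1, 1, 2, 2, 2, 3, 3, 1, 2, 3]
rater_b = [1, 1, 2, 2, 3, 3, 3, 1, 2, 2]
print(np.mean(np.array(rater_a) == np.array(rater_b)))  # 0.8 raw agreement
print(round(cohen_kappa(rater_a, rater_b), 3))          # → 0.697 after chance correction
```

Note that high raw agreement can coexist with modest kappa when a few categories dominate, which is why kappa is reported alongside percent agreement.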


Interpretations of Reliability Coefficients (p. 133)
Important to remember: the size of the coefficient needed depends upon:
– The purpose for which it is used
– The history of the type of measure
  What would be acceptable for a GMA test? For an interview?
– The length of the test (how many items are needed?)
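The link between test length and reliability is usually quantified with the Spearman-Brown prophecy formula (standard classical test theory, though the slide does not name it). A sketch with made-up numbers:

```python
def spearman_brown(r_old: float, n: float) -> float:
    """Predicted reliability when test length is multiplied by factor n."""
    return (n * r_old) / (1 + (n - 1) * r_old)

def length_factor_needed(r_old: float, r_target: float) -> float:
    """Factor by which to lengthen a test to reach a target reliability."""
    return (r_target * (1 - r_old)) / (r_old * (1 - r_target))

# Example: a 20-item test has alpha = .70; how many items for alpha = .90?
n = length_factor_needed(0.70, 0.90)
print(round(20 * n, 1))   # → 77.1, i.e., roughly quadruple the length
```

The formula assumes the added items are parallel to the existing ones, which is why lengthening a test in practice rarely pays off exactly as predicted.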

VALIDITY: AN EVOLVING CONCEPT (p. 134)
Why is it important for I/O to distinguish between:
– A test that "purports to measure something"
– Validity as "the degree to which it measures what it purports to"
– Validity in "predicting to a criterion" (making inferences)
Three troublesome adjectives
– Content, criterion-related, construct
– Meaning vs. interpretation vs. inferences about a person
– What is troublesome, and what is more important?
Descriptive and relational inferences
– Descriptive inferences (about the score itself): a high IQ means the person is smart (trait)
– Relational inferences (about what can be predicted): a high scorer will perform well on the job (sign)

Psychometric Validity vs. Job-Relatedness
Psychometric validity:
– Confirms the meaning of the test intended by the test developer (examples?)
– Disconfirms plausible alternatives (examples?)
How does psychometric validity differ from job-relatedness?

VARIETIES OF PSYCHOMETRIC VALIDITY EVIDENCE (p. 137)
Evidence Based on Test Development
– Provide evidence for a test you plan to use
– Questions to guide evaluation (answer them for your job):
  Did the developer have a clear idea of the attribute?
  Are the mechanics of the measurement consistent with the concepts?
  Is the stimulus content appropriate?
  Was the test carefully and skillfully developed?
Evidence Based on Reliability
– Questions to guide evaluation (answer them for your job):
  Is the internal statistical evidence satisfactory?
  Are scores stable over time and consistent with alternative measures?

Evidence from Patterns of Correlates
– Confirmatory and disconfirmatory
Questions for evaluation (answer them for a test you will use):
– Does empirical evidence confirm logically expected relations with other variables?
– Does empirical evidence disconfirm alternative meanings of test scores?
– Are the consequences of the test consistent with the meaning of the construct being measured?

Beyond Classical Test Theory (p. 144)
Factor Analysis (identifies latent variables in a set of scores)
– EFA (exploratory factor analysis)
– CFA (confirmatory factor analysis)
– Which would be most likely to be used to develop a test?
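A minimal EFA-flavored sketch (the two-factor data are simulated, not from the text): generate test scores driven by two latent factors, then count eigenvalues of the correlation matrix above 1 (the Kaiser criterion, one common rule of thumb) to recover the number of factors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical: 6 test scores driven by 2 uncorrelated latent factors plus noise
factors = rng.normal(size=(n, 2))
loadings = np.array([[.8, 0], [.7, 0], [.6, 0],
                     [0, .8], [0, .7], [0, .6]])
scores = factors @ loadings.T + rng.normal(scale=0.5, size=(n, 6))

corr = np.corrcoef(scores, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
n_factors = int((eigvals > 1).sum())   # Kaiser criterion: eigenvalues > 1
print(n_factors)
```

This is the exploratory (EFA) logic: the number of factors is discovered from the data. CFA instead fixes the loading pattern in advance and tests how well it fits.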

GENERALIZABILITY THEORY
Can the validity of the test be generalized to:
– other times?
– other circumstances?
– other behavior samples?
– other test forms?
– other raters/interviewers?
– other geographical populations?
Give an example of a test that will not perform the same for applicants in different geographical locations.

ITEM RESPONSE THEORY (p. 148)
Classical test theory:
– A person's score on a test is interpreted relative to others
IRT:
– A person's score on a test reflects standing on the latent variable (i.e., "sample free")
– Computerized adaptive testing uses IRT
Analysis of Bias with Adverse Impact
– Differential item functioning
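As an illustration, here is the two-parameter logistic (2PL) model, one common IRT model (the chapter does not specify a particular model): the probability of answering an item correctly depends on the examinee's latent ability θ and the item's difficulty and discrimination, not on any norm group.

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL item characteristic curve: probability of a correct response
    given ability theta, item discrimination a, and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# When ability equals item difficulty, p = .50 regardless of discrimination
print(p_correct(0.0, a=1.5, b=0.0))   # → 0.5
```

Differential item functioning asks whether this curve differs across groups at the same θ; computerized adaptive testing uses it to pick the next item whose difficulty best matches the current ability estimate.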