Lecture 5: Reliability and validity of scales

Presentation transcript:

Lecture 5: Reliability and validity of scales
1. Describe the applications of the following types of measurement:
- Impairment, disability, handicap, quality of life, attitudes, behaviour
- Generic versus disease-specific health status and quality of life scales
2. Define the following terms, giving examples of each:
- Response bias
- Social desirability
3. In relation to scales, define the following terms:
- Test-retest reliability
- Inter-rater reliability
- Internal consistency

Scales
Single- vs multi-item scales
Items are intended to sample the content of the underlying construct
Items summarized in various ways:
–sum or average of responses to individual items
–item weighting or other algorithm
–profiles/sub-scale scores
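The three ways of summarizing items listed above can be sketched in a few lines of Python. This is only an illustration: the function names, the item weights, and the sub-scale groupings are hypothetical and do not come from any published scoring manual.

```python
from statistics import mean


def score_scale(responses, weights=None, method="sum"):
    """Summarize item responses into one score (sum or weighted mean)."""
    if weights is None:
        weights = [1.0] * len(responses)  # unweighted by default
    if len(weights) != len(responses):
        raise ValueError("one weight per item is required")
    weighted = [w * r for w, r in zip(weights, responses)]
    if method == "sum":
        return sum(weighted)
    if method == "mean":
        return sum(weighted) / sum(weights)
    raise ValueError(f"unknown method: {method}")


def subscale_profile(responses, subscales):
    """Return a profile: the mean item score for each named sub-scale."""
    return {name: mean(responses[i] for i in idx)
            for name, idx in subscales.items()}


# Hypothetical responses to a 4-item scale, each item scored 0-2.
answers = [2, 1, 0, 2]
total = score_scale(answers)                      # simple sum
average = score_scale(answers, method="mean")     # simple average
weighted_total = score_scale(answers, weights=[2, 1, 1, 1])
profile = subscale_profile(answers, {"basic": [0, 1], "instrumental": [2, 3]})
```

A summed or averaged score assumes every item contributes equally; weighting and profiles relax that assumption at the cost of a more complicated scoring rule.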

International Classification of Impairments, Disabilities, and Handicaps (ICIDH)
IMPAIRMENT:
–...loss or abnormality of psychological, physiological, or anatomical structure or function.
DISABILITY:
–...restriction or lack (resulting from an impairment) of ability to perform an activity ...
HANDICAP:
–...disadvantage... resulting from an impairment or disability, that limits or prevents the fulfillment of a role ... that is normal for that individual ...

Quality of life (QoL)
Definition
–“individuals’ perception of their position in life in the context of the culture and value systems in which they live and in relation to their goals, expectations, standards, and concerns” (WHO QOL group, 1995)
Domains
–physical, psychological, level of independence, social relationships, environment, and spirituality/religion/personal beliefs

Health-related quality of life (HRQoL)
Dimensions of QoL related to health
Related terms:
–health status
–functional status
Usually includes:
–physical health/function
–mental health/function
–social health/function

Selection of measures: Appropriateness
Purpose:
–describe health of population
–evaluate effects of interventions (change over time)
–compare groups at point in time
–predict outcomes
Areas of function covered
Level of health
Generic/global or specific

Generic vs specific
Generic
–comparisons across populations and problems
–robust and generalizable
–measurement properties better understood
Disease-specific
–shorter
–more relevant and appropriate
–sensitive to change

Practical considerations
Mode of administration
–self-administered (in-person, mail)
–interviewer (face-to-face, telephone)
–informant or proxy
Respondent burden

Example of single-item measure of HRQoL: the EuroQol “thermometer”
EITHER: visual analogue scale
OR: Now, to help people say how good or bad their health state is, let’s say the best state you can imagine is 100 and the worst state you can imagine is 0. In your opinion, how good or bad is your health today - please use a number.

Example of Disability Scale: OARS ADL scale
Measures basic and instrumental activities of daily living (ADL)
14 items: e.g., bathing, dressing, money management, house-cleaning
Based on self-report and/or judgement
Response scale:
–Completely independent (2)
–Needs some help (1)
–Completely dependent (0)

Example of measure of health status/HRQoL: SF-36
Generic measure of health status
36 items, self-report
Sample item:
Scoring:
–8 specific sub-scales (e.g., physical function, mental health, vitality)
–2 component summary scores: physical and mental health

During the past 2 weeks, did you have any of the following problems with your work or other regular daily activities as a result of your physical health?
                                                            NO   YES
3a. Accomplished less than you would like                    0     1
3b. Were limited in the kind of work or other activities     0     1

How much bodily pain did you have during the past 2 weeks? Was there no pain, very mild pain… (etc)
None         1
Very mild    2
Mild         3
Moderate     4
Severe       5
Very severe  6

Example of specific scale: Geriatric Depression Scale
15 or 30-item self-report scale
Response options: yes/no
Sample items:
–Do you feel happy most of the time?
–Do you feel that your life is empty?

Response bias
Examples:
–Recall
–Acquiescence
–Social desirability
Factors affecting response bias:
–Question wording/response scale
–Characteristics of subjects (age, education, etc)
–Mode of data collection (questionnaire, interview, telephone vs face-to-face)

Social desirability
Tendency to give answers to questions that are perceived to be more socially desirable than the true answer
Different from deliberate distortion (“faking good”)
Depends on:
–Individual characteristics (age, sex, cultural background)
–Specific question

Social desirability
Measures of social desirability (SD)
–SD scales (e.g., Jackson SD scale, Crowne & Marlowe SD scale)
–individual tendency to SD bias
Prevention
–phrasing of questions
–questionnaire mode
–training of interviewers

Reliability of scales
Internal consistency
Test-retest reliability
Inter-rater and intra-rater reliability

Example: Delirium Index (DI)
Delirium = acute confusional state
Characterized by acute onset and fluctuations
Risk factors:
–Predisposing: age, dementia, disability, comorbidity etc
–Precipitating: infections, medications, environment
DI: observer-rated measures of severity of 7 symptoms of delirium:
–inattention, disorganized thinking
–altered consciousness, disorientation
–memory impairment, perceptual disturbances
–psychomotor agitation or retardation

Administration and scoring
Administered by research assistant based on patient observation
Each symptom rated on 4-point scale:
0 = absent
1 = mild
2 = moderate
3 = severe
Total score: range from

Evaluation of performance of DI What aspects should be evaluated? How?

Internal consistency
Relevant to additive scales (that sum or average items)
Split-half reliability:
–correlation between scores on an arbitrary half of the measure and scores on the other half
Coefficient alpha (Cronbach)
–estimates the average split-half correlation over all possible ways of dividing the scale into halves
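Both statistics can be sketched in plain Python. The ratings below are invented for illustration (5 respondents, 4 items scored 0-3) and are not Delirium Index data; the odd/even split is just one arbitrary choice of halves, and alpha uses the standard formula k/(k-1) × (1 − sum of item variances / variance of total score).

```python
from math import sqrt
from statistics import variance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_r(rows):
    """Correlate the score on even-indexed items with the score on odd-indexed items."""
    half_a = [sum(row[0::2]) for row in rows]
    half_b = [sum(row[1::2]) for row in rows]
    return pearson_r(half_a, half_b)


def cronbach_alpha(rows):
    """Coefficient alpha; rows = one list of item scores per respondent."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 respondents x 4 items, each item scored 0-3.
ratings = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 2, 3, 3],
    [1, 0, 1, 1],
]
alpha = cronbach_alpha(ratings)
r_half = split_half_r(ratings)
```

For these made-up ratings alpha is about 0.93 and the odd/even split-half correlation is about 0.78. A different split would give a different correlation, which is the motivation for using alpha.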

Example Internal consistency of Delirium Index scale to measure symptoms of delirium: –Cronbach’s alpha for entire scale: –….without perceptual disturbance:

Test-retest reliability (stability)
Scale is repeated
–short-term for constructs that fluctuate, 2 weeks often used to reduce effects of memory and true change
–long-term for constructs that should not fluctuate (e.g., personality traits)
Some measure of variability vs stability of 2 scores is computed

Mean within-patient standard deviation in DI score during 1st week in hospital
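A mean within-patient standard deviation can be computed as sketched below. The patient labels and scores are invented for illustration and are not the study's data; the sketch assumes the sample standard deviation (n − 1 denominator) and skips patients with fewer than two ratings.

```python
from statistics import mean, stdev


def mean_within_patient_sd(scores_by_patient):
    """Average, over patients, of the SD of each patient's repeated scores."""
    sds = [stdev(scores)
           for scores in scores_by_patient.values()
           if len(scores) >= 2]  # an SD needs at least two ratings
    return mean(sds)


# Hypothetical repeated scores during the first week in hospital.
repeated_scores = {
    "patient_A": [4, 4, 4],   # perfectly stable: SD = 0
    "patient_B": [2, 4, 6],   # fluctuating: SD = 2
    "patient_C": [10, 8],     # two ratings only: SD = sqrt(2)
}
within_sd = mean_within_patient_sd(repeated_scores)
```

A small value indicates stable scores within patients. For a fluctuating condition such as delirium, some of the within-patient variability reflects true change rather than measurement error.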

Inter- and intra-rater reliability
Inter-rater reliability
–For scales requiring rater skill, judgment
–2 or more independent raters of same event
Intra-rater reliability
–Independent rating by same observer of same event

Measures of inter- and intra-rater reliability: continuous data
Measures of correlation
–Correlation graph (scatter diagram)
–Correlation coefficients
Measures of pairwise comparison

Correlation coefficients
Pearson’s r
–assesses linear association, not systematic differences between 2 sets of observations
–sensitive to range of values, especially outliers
Spearman r
–ordinal or rank order correlation
–less influenced by outliers
–doesn’t assess systematic differences
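The points above can be checked with a small sketch. The rater scores are invented for illustration: in the first pair, rater B always scores 3 points higher than rater A; in the second pair, one observation is an extreme outlier.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied block
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman_r(x, y):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical ratings: rater B is systematically 3 points higher.
rater_a = [2, 5, 7, 10, 12]
rater_b = [a + 3 for a in rater_a]
r_offset = pearson_r(rater_a, rater_b)            # 1.0 despite the offset
mean_diff = sum(b - a for a, b in zip(rater_a, rater_b)) / len(rater_a)

# Hypothetical ratings with one extreme value in y.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 50]
r_outlier = pearson_r(x, y)
rho_outlier = spearman_r(x, y)
```

Pearson's r is 1.0 for the first pair even though the raters never agree, which is why a mean paired difference (here 3) is worth reporting alongside it. In the second pair the single extreme value changes Pearson's r (about 0.73) but not the rank-based Spearman coefficient (0.8), which would be the same if 50 were replaced by any value above 4.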

Correlation coefficients
Intra-class correlation coefficient (ICC)
–Estimate of the proportion of total measurement variability that is due to differences between individuals (vs error variance)
–Equivalent to weighted kappa (with quadratic weights) and interpreted on the same range of values
–Reflects true agreement, including systematic differences
–Affected by range of values - if less variation between individuals, ICC will be lower
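There are several forms of the ICC and the slides do not say which one is meant. The sketch below assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), because that form penalizes systematic differences between raters. The ratings are invented for illustration.

```python
def icc_absolute_agreement(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: one row per subject, one column per rater (no missing values).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, raters, residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical ratings: rater B is always exactly 3 points above rater A.
rater_a = [2, 5, 7, 10, 12]
paired = [[a, a + 3] for a in rater_a]
icc_offset = icc_absolute_agreement(paired)

# Identical ratings from both raters give perfect agreement.
icc_identical = icc_absolute_agreement([[a, a] for a in rater_a])
```

With the constant 3-point offset, this ICC is about 0.78, whereas Pearson's r for the same pairs would be 1.0. That illustrates the point about systematic differences. Other ICC forms, such as consistency-type ICCs, would not penalize the offset.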

Inter-rater reliability
Intraclass correlation coefficient (ICC):
n = 26 patients (39 pairs of ratings)
ICC = 0.98 (SD 0.06)

Examples for discussion
What aspects of reliability should be measured for the following scales:
–EuroQol VAS
–SF-36
–Geriatric Depression Scale