Content-Related Validation
Dr Saharnaz Nedjat, MD, PhD, Assistant Professor, Department of Epidemiology and Biostatistics, School of Public Health, Tehran University of Medical Sciences
Goals for today:
- Content validity
- Construct validity: convergent validity, discriminant validity
- Responsiveness
- Pilot
- Reliability: continuous, categorical, internal consistency
What is content validity?
"Are we measuring what we think we are measuring?" Content validity refers to the extent to which an instrument actually measures what it alleges to measure.
“Content validity is established by showing that the test items are a sample of a universe in which the investigator is interested.” “Content validity is ordinarily to be established deductively, by defining a universe of items and sampling systematically within this universe to establish the test.”
Content validity permits us to answer two questions:
Does the test cover a representative sample of the specified skills and knowledge? Is test performance reasonably free from the influence of irrelevant variables?
Guard against any tendency to overgeneralize regarding the domain sampled by the test, and prevent the possible inclusion of irrelevant factors in the test scores.
Specific procedure:
- Thorough and systematic examination of the relevant domains and subjects in textbooks and on the internet
- Consultation with experts
Example questions: What is quality of life? What are the QoL domains? What are the objectives of QoL assessment?
With a content validity study, the researcher spends more resources initially but fewer on repeated revisions of the measure later. All measures still need to be evaluated repeatedly, but fewer revisions will be required.
The domain under consideration should be fully described in advance, rather than being defined after the test has been prepared.
Conducting a content validity study
Qualitative Evaluation of Content Validity
Content validity is assessed using qualitative techniques. In the development of multi-item rating scales, the content validity of the questionnaire items may be examined by using an expert panel, focus groups or in-depth interviews with respondents. Focus groups may be formed with a range of subjects representing typical extremes (for example very dissatisfied and very satisfied patients) and discussions should be guided by open-ended questions designed to elicit common and typical responses based on real experiences or perceptions by the subjects.
Quantifying content validity
Because a measurement instrument with no content validity will not operationalize the theoretical construct of interest, we should begin assessments of measurement quality by quantifying the content validity of the instruments used in our empirical research. Literature support is a necessary but insufficient condition for concluding content validity.
Select a panel of experts
The panel consists of:
- Content experts: people who have published or worked in the field (2-20, depending on the desired level of expertise and diversity of knowledge)
- Lay experts: people for whom the topic is most salient (potential research subjects); they comment on phrasing and unclear terms and recommend other important items
- An expert in measurement or a related field
Using at least three experts for each group is recommended. Criteria for selecting these experts include the number of publications or the work experience.
Solicit Experts’ Participation
Contact the experts by letter or telephone call at least one week in advance. The mailed cover letter should include the purpose of the study, the reason the expert was selected, a description of the measure and its scoring, and an explanation of the response form, written at a level appropriate for their education. Enclose the response forms, a self-addressed stamped return envelope, and a demographic questionnaire.
The response forms ask experts to rate:
- Representativeness of the content domain
- Clarity of the item
- Factor structure (if the measure consists of more than one factor): assign each item to a factor, rate how well the item measures that factor, and identify any factor that is not specified
- Comprehensiveness of the measure: experts consider the entire measure and suggest the addition or deletion of any item
See your handout.
Each item is rated from 1 (not representative or not clear) to 4 (representative or clear) for both representativeness and clarity; clarity is evaluated on the same scale as representativeness. Space is provided for suggestions.
Analyze the data. Three types of analyses can be performed:
- Reliability or inter-rater agreement
- Content validity index
- Factorial validity index
Reliability or inter-rater agreement (IRA)
IRA determines the extent to which the experts are consistent in their ratings. It should be calculated for both representativeness and clarity. The 1-4 scale is first dichotomized (1-2 vs. 3-4). IRA for each item: the agreement among the experts is calculated. IRA for the scale: the number of items considered reliable (100% agreement, or IRA of at least 0.8) is divided by the total number of items. If there are more than five experts, the less conservative criterion (0.8) may be used.
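As a minimal sketch of this computation in Python (the function names are mine, and item-level agreement is taken here as the proportion of experts in the larger dichotomized category, one common choice among several):

```python
def dichotomize(rating):
    # Collapse the 1-4 scale: 1-2 -> 0 (not representative/clear), 3-4 -> 1
    return 1 if rating >= 3 else 0

def item_ira(ratings):
    # Proportion of experts in the modal dichotomized category for one item
    d = [dichotomize(r) for r in ratings]
    return max(d.count(0), d.count(1)) / len(d)

def scale_ira(item_ratings, threshold=0.8):
    # Fraction of items whose inter-rater agreement meets the threshold
    # (0.8 is the less conservative criterion, used with > 5 experts)
    ok = sum(1 for ratings in item_ratings if item_ira(ratings) >= threshold)
    return ok / len(item_ratings)
```

For example, with five experts rating three items, `scale_ira([[4, 4, 3, 4, 4], [1, 2, 4, 4, 3], [3, 3, 4, 3, 4]])` returns 2/3, since the second item's agreement (0.6) falls below the 0.8 threshold.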
Content validity index
CVI is calculated from the representativeness ratings. CVI for each item: count the number of experts who rated the item 3 or 4 and divide by the total number of experts. CVI for the measure: the average CVI across the items. A CVI of at least 0.8 is recommended for a new measure.
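The two steps above translate directly into code; a sketch in Python (function names are mine):

```python
def item_cvi(ratings):
    # Proportion of experts rating the item 3 or 4 on representativeness
    return sum(1 for r in ratings if r >= 3) / len(ratings)

def scale_cvi(item_ratings):
    # Average item-level CVI across all items of the measure
    return sum(item_cvi(r) for r in item_ratings) / len(item_ratings)
```

For example, an item rated [4, 3, 2, 4, 4] by five experts has an item CVI of 0.8, just meeting the recommended cut-off.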
Content Validity Ratio (CVR)
In this approach, a panel of subject-matter experts (SMEs) is asked to indicate whether or not each measurement item in a set is "essential" to the operationalization of a theoretical construct. The SME input is then used to compute the CVR for each candidate item i in a measurement instrument:

CVR_i = (n_e - N/2) / (N/2)

where CVR_i is the CVR value for the i-th measurement item, n_e is the number of SMEs indicating the item is "essential," and N is the total number of SMEs on the panel.
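The CVR formula is a one-liner; a sketch in Python (the function name is mine):

```python
def cvr(n_essential, n_experts):
    # CVR_i = (n_e - N/2) / (N/2): -1 when no SME says "essential",
    # 0 when exactly half do, +1 when all do
    return (n_essential - n_experts / 2) / (n_experts / 2)
```

For example, if 8 of 10 SMEs rate an item "essential", `cvr(8, 10)` returns 0.6.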
Factorial validity index
FVI is calculated by counting the number of experts who correctly assigned the item to its factor and dividing that number by the total number of experts. An FVI of at least 0.8 is acceptable.
Revise the measure. The researcher must have specified that the study is not anonymous, so that the panel may be contacted for clarification or to examine a revised measure.
No degree of reliability and construct validity can compensate for lack of content validity.
Some limitations of content validity studies
Experts’ feedback is subjective and may be biased. This type of study does not eliminate the need for additional psychometric testing.
Construct-Related Validity
What Is a Construct? “A construct is some postulated attribute of people, assumed to be reflected in test performance.”
When? The construct validity of a questionnaire applies when a single content domain or a single criterion cannot be determined. Typical examples are ‘intelligence’, ‘personality’, ‘quality of life’, or ‘patient satisfaction’.
Construct Validity cont.
“Construct validity is ordinarily studied when the tester has no definite criterion measure of the quality with which he is concerned, and must use indirect measures.” “Here the trait or quality underlying the test is of central importance, rather than either the test behavior or the scores on the criteria.”
Construct validity is present when a measurement scale relates to other measures as predicted by theory or empirical observations.
Construct validity has two components:
convergent validity demonstrates association with measures that are or should be related, and divergent validity demonstrates a lack of association with measures that should not be related.
Convergent Validity: correlations between a new test and similar earlier tests are sometimes cited as evidence that the new test measures approximately the same general area of behavior as other tests designated by the same name. Such as …?
Unlike the correlations found in criterion-related validity, these correlations should be moderately high, but not too high. If the new test correlates too highly with an already available test, without such added advantages as brevity or ease of administration, then the new test represents needless duplication.
Divergent validity: we must show not only that a test correlates highly with other variables with which it should theoretically correlate, but also that it does not correlate significantly with variables from which it should differ. Such as?
One would expect a high correlation between the ‘physical functioning’ dimension of the SF-36 quality of life scale and the ‘physical mobility’ dimension of the Nottingham Health Profile. There should be much less correlation between ‘physical functioning’ in the SF-36 and ‘social isolation’ in the Nottingham scale.
Construct validity requires the gradual accumulation of information from a variety of sources.
Responsiveness: sensitivity to change, or the ability of a measure to reflect underlying change, is an important capability of instruments that evaluate differences resulting from an intervention, and is regarded as additional evidence of an instrument's longitudinal validity. Example: ??
Effect size statistics:
1. Standardized Response Mean: SRM = (mean_time2 - mean_time1) / SD_diff
2. Effect Size: ES = (mean_time2 - mean_time1) / SD_time1
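Both responsiveness statistics can be sketched in a few lines of Python (function names are mine; `stdev` is the sample standard deviation):

```python
from statistics import mean, stdev

def standardized_response_mean(time1, time2):
    # SRM = mean change / SD of the individual change scores
    diffs = [b - a for a, b in zip(time1, time2)]
    return mean(diffs) / stdev(diffs)

def effect_size(time1, time2):
    # ES = mean change / SD of the baseline (time 1) scores
    return (mean(time2) - mean(time1)) / stdev(time1)
```

With illustrative scores time1 = [10, 12, 14, 16] and time2 = [13, 15, 16, 20], the mean change is 3.0, giving an SRM of about 3.67 and an ES of about 1.16; the two statistics differ because they standardize the same mean change by different spreads.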
Content-, criterion-, and construct-related validation do not correspond to distinct or logically coordinate categories. On the contrary, construct-related validity is a comprehensive concept, which includes the other types.
[Convergent validity example: a correlation matrix between the WHOQOL-BREF domains (physical health, psychological health, social relationships, environment) and the SF-36 subscales (pain, general health, vitality, physical functioning, role limitation physical, mental health, role limitation emotional, social functioning), with starred entries significant; the matrix itself is not recoverable from this transcript.]
After completing the content validity study and revising the measure, a pilot study can be undertaken to identify coding errors, format problems, administration issues, etc.
Pilot: most defects remaining from the previous stages become evident to respondents, interviewers, or researchers at this stage. Try to include some respondents from all subgroups related to variables affecting the response (e.g. gender, age, education), reaching a trade-off between subgroup coverage and the number of questionnaires administered. There is no definite sample-size formula.
Pilot findings may prompt:
- Changes in question wording and/or number: the pilot may reveal ambiguity, wording that suggests unintended responses or is upsetting or embarrassing, or too many or too few questions on specific topics.
- Changes in sources of data: the pilot may reveal that some sources are not readily giving out data, or uncover readily available sources not considered before.
- Inadequate performance of some interviewers or supervisors: re-train or drop them.
Pilot: during piloting, take detailed notes on how participants react to both the general format of your instrument and the specific questions. How long do people take to complete it? Do any questions need to be repeated or explained? How do participants indicate that they have arrived at an answer? Do they show confusion or surprise at a particular response, and if so, why?
Reproducibility: the degree to which a measurement provides the same result each time it is performed on a given subject or specimen. (Validity, by contrast, is from the Latin meaning "strong.")
Why Care About Reproducibility?
σ²O = σ²T + σ²E
More measurement error means more variability in observed measurements. For example, when measuring height in a group of subjects: with no measurement error, the only variability would be the underlying variability in the subjects' true heights; with measurement error, an extra source of variability is added and the observed distribution of heights widens.
Impact of Reproducibility on Statistical Precision
observed value (O) = true value (T) + measurement error (E), where E is random and ~ N(0, σ²E).
When measuring a group of subjects, the variability of observed values is a combination of the variability in their true values and the variability in the measurement error:
σ²O = σ²T + σ²E
Why Care About Reproducibility?
σ²O = σ²T + σ²E
More variability of observed measurements has profound influences on statistical precision and power in every study design:
- Descriptive studies: wider confidence intervals, i.e. less certainty about the true value
- RCTs: reduced power to detect a treatment difference at a given sample size
- Observational studies: reduced power to detect an influence of a particular risk factor upon a given disease
Why Care About Reproducibility?
Impact on validity: mathematically, the upper limit of a measurement's validity is a function of its reproducibility, so poor reproducibility precludes optimal validity. Consider a study measuring height in the community with an imperfectly reproducible instrument: if we measure height twice on a given person we get two different values, so at least one of the two must be wrong (imperfect validity). If the study measures everyone only once, these errors, despite being random, will lead to biased inferences when the measurements are used (i.e. a lack of validity).
Mathematical Definition of Reproducibility
Reproducibility = σ²T / (σ²T + σ²E)
Reproducibility is the fraction of the total observed variance that is accounted for by variability in the true values. It varies from 0 (poor) to 1 (optimal): as σ²E approaches 0 (no error), the denominator approaches σ²T and reproducibility approaches 1. If all the variance you see is simply because the things you are measuring are that variable, you have complete (100%) reproducibility.
Test-Retest Reliability
Test-retest reliability = stability over time: the same test is administered at time 1 and again at time 2.
Parallel-Forms Reliability
Parallel-forms reliability = stability across forms: form A is administered at time 1 and form B at time 2.
Indices for Categorical Variables
Reliability or Reproducibility: is there good agreement between these two imperfect measurements?
Percent agreement: the proportion of subjects classified identically by both measurements.
Cohen’s kappa (reported in 1960) corrects for the chance agreement that would be expected to occur if the two classifications were completely unrelated.
Kappa
Definition: a chance-corrected measure of nominal-scale agreement among raters.
Assumptions:
- Subjects are independent
- Categories are independent, mutually exclusive, and exhaustive
- Raters operate independently
Kappa: K = (p_o - p_e) / (1 - p_e)
where p_o = observed proportion of agreement and p_e = proportion of agreement expected to occur by chance alone. Kappa varies from -1 to 1.
Example: two measurements classifying 986 subjects as "plaque" or "normal":

                          Measurement 2
                          Plaque   Normal   Total
Measurement 1   Plaque       140       52     192
                Normal        69      725     794
Total                        209      777     986
Percent agreement = (140 + 725)/986 = 87.7%
Chance agreement = [(192 × 209) + (777 × 794)] / 986² = 0.676
K = (0.877 - 0.676) / (1 - 0.676) = 0.62
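The worked example generalizes to any k × k agreement table; a sketch in Python (the function name is mine):

```python
def cohen_kappa(table):
    # table[i][j]: count of subjects put in category i by one measurement
    # and category j by the other
    n = sum(sum(row) for row in table)
    k = len(table)
    p_o = sum(table[i][i] for i in range(k)) / n            # observed agreement
    row = [sum(table[i]) for i in range(k)]                 # marginal totals
    col = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row[i] * col[i] for i in range(k)) / n ** 2   # chance agreement
    return (p_o - p_e) / (1 - p_e)
```

Applied to the table above, `cohen_kappa([[140, 52], [69, 725]])` reproduces the slide's result of approximately 0.62.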
Interpretation of Kappa
Various authors have developed classifications for the interpretation of a kappa value; see Altman (1991), Fleiss (1981), or Byrt (1996).
Advantages of kappa:
- “Adjustment” for chance agreement
- The most commonly used measure of agreement
- Many variants and generalizations of kappa exist
- Interpretability in qualitative as well as quantitative terms
Kappa and prevalence: a limitation of kappa when comparing the reliability of a diagnostic procedure in different populations is its dependence on the prevalence of true “positivity” in each population (from Szklo & Nieto, Epidemiology: Beyond the Basics).
So, for the same sensitivity and specificity of the observers, the kappa value is greater in the population in which the prevalence of positivity is higher.
Indices for Continuous Variables
Assessment by Simple Correlation
How about simple correlation? We plot measurement 1 on the x-axis and measurement 2 on the y-axis to make a scatterplot, adding a line of identity that shows where the points would lie if the two measurements were exactly the same. How would you numerically summarize this correlation?
Pearson Product-Moment Correlation Coefficient
Pearson's r (rho) ranges from -1 to +1 and describes the strength of the linear association between the two measurements. Squaring r gives r², the proportion of the variance of one variable that is accounted for by the other.
Examples of how r behaves: r is 1 when the data perfectly follow a straight line with a positive slope and no variation about the line, and -1 when the data follow a straight line with an inverse relationship. Data with a clear linear association but residual variation give an intermediate value such as r = 0.8. If there is no linear association, r = 0.0.
Limitations of Simple Correlation for Assessment of Reproducibility
r depends upon the range of the data. In the peak-flow example:
- r (full range of data) = 0.98
- r (peak flow < 450) = 0.97
- r (peak flow > 450) = 0.94
A smaller range of data means a lower correlation coefficient, yet it is absurd to say that reproducibility is worse in each subgroup than in the whole group combined. The only meaningful r is one computed on a random sample of the population you actually intend to study: evaluating an assay on subjects spanning an artificially large range of values yields a falsely high r compared with a study population with less variability. r also measures linear association only.
Graphs from the Green text illustrate the dependence of the correlation coefficient on the range of data; two panels in particular show how one large value can increase r from 0.8 to 0.95.
Avoid using the usual correlation coefficient (the Pearson correlation coefficient): it does not correct for systematic error!
Instead, calculate the intraclass correlation coefficient:
ICC = V_between-individuals / V_total
Intraclass Correlation Coefficient (ICC)
Say you have two raters. What if Rater 2 consistently overestimates the measurement when compared to Rater 1?
Evaluation of the Scatter Diagram
Strong linear association: the Pearson correlation coefficient is 0.99. However, the ICC is weaker: 0.89.
Pearson’s vs. ICC: the weaker concordance is due to the fact that the ICC takes into account the difference in the means, which for Rater 1 is 3.7 and for Rater 2 is 4.9.
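This penalty for a systematic rater offset can be sketched in Python, here assuming a one-way random-effects ICC (one common form consistent with the V_between/V_total idea above; function names are mine):

```python
from statistics import mean

def icc_oneway(data):
    # data: one row per subject, each row holding k repeated measurements
    # One-way random-effects ICC = (BMS - WMS) / (BMS + (k - 1) * WMS)
    n, k = len(data), len(data[0])
    grand = mean(v for row in data for v in row)
    row_means = [mean(row) for row in data]
    bms = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)   # between-subject MS
    wms = sum((v - m) ** 2                                          # within-subject MS
              for row, m in zip(data, row_means) for v in row) / (n * (k - 1))
    return (bms - wms) / (bms + (k - 1) * wms)
```

For example, with data = [[1, 2], [2, 3], [3, 4]] (Rater 2 always scores 1 unit higher), Pearson's r between the raters is a perfect 1.0, but `icc_oneway` returns only 0.6 because the systematic offset counts as disagreement.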
Conclusions
- Measurement reproducibility plays a key role in determining validity and statistical precision in all study designs.
- When assessing the reproducibility of interval-scale measurements, avoid correlation coefficients; use the intraclass correlation coefficient, or the coefficient of variation if the within-subject SD is proportional to the magnitude of the measurement.
- For categorical-scale measurements, use kappa.
- What counts as acceptable reproducibility depends upon the desired use: if you only seek to detect gross differences, reproducibility problems matter less.
- When assessing validity, use the same concepts and display the assessment as 95% limits of agreement.
- Many of you will use measurements made by others: don't reinvent the wheel, but at least be knowledgeable about how your measurements have been validated.
Split-half Reliability
Definition: randomly divide the test into two forms, calculate scores for forms A and B, and calculate Pearson's r between them as the index of reliability.
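The three steps translate directly into code; a sketch in Python (function names are mine, and the random split is seeded for reproducibility):

```python
import random
from statistics import mean

def pearson_r(x, y):
    # Pearson product-moment correlation between two score lists
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def split_half_r(items, seed=0):
    # items: one list of scores per item, same respondents in each list
    order = list(range(len(items)))
    random.Random(seed).shuffle(order)          # random split into two forms
    half = len(order) // 2
    n_resp = len(items[0])
    form_a = [sum(items[i][p] for i in order[:half]) for p in range(n_resp)]
    form_b = [sum(items[i][p] for i in order[half:]) for p in range(n_resp)]
    return pearson_r(form_a, form_b)            # correlation of half scores
```

Note that a single split-half r depends on which random split was drawn, which is exactly the problem Cronbach's alpha (below) addresses by averaging over all possible splits.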
Internal Consistency Reliability: Split-Half Correlations
[Diagram: a six-item test split into two halves (items 1, 3, 4 and items 2, 5, 6); the correlation between the two half scores is .87.]
Cronbach’s alpha & Kuder-Richardson-20
These measure the extent to which items on a test are homogeneous; conceptually, the mean of all possible split-half combinations.
- Kuder-Richardson-20 (KR-20): for dichotomous data
- Cronbach's alpha: for non-dichotomous data
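Rather than enumerating every split, alpha is usually computed from item and total-score variances; a sketch in Python using the standard variance formula (the function name is mine):

```python
from statistics import pvariance

def cronbach_alpha(items):
    # items: one list of scores per item, same respondents in each list
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
    k = len(items)
    total = [sum(scores) for scores in zip(*items)]    # per-respondent totals
    item_var = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var / pvariance(total))
```

For example, two items that move in lockstep across respondents, `cronbach_alpha([[1, 2, 3], [1, 2, 3]])`, give alpha = 1.0, while pairing the same item with a constant, uninformative item drives alpha toward 0.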
Internal Consistency Reliability: Cronbach's alpha (α)
[Diagram: each possible split-half of the six-item test yields its own correlation, e.g. SH1 = .87, SH2 = .85, SH3 = .91, SH4 = .83, SH5 = .86, ..., SHn = .85; their mean gives α = .85.]