Published by Jonathan Rook
Modified about 1 year ago
© Reliability and Validity Designs
© 2006 Evidence-based Chiropractic 2 Accurate and consistent measures are needed It is very important in research and clinical practice to be able to measure patient characteristics accurately and consistently Needed in clinical trials to effectively assess differences between groups Needed in practice to help make clinical decisions and to track patients’ progress
© 2006 Evidence-based Chiropractic 3 Reliability The ability of a test to provide consistent results when repeated –By the same examiner –Or by more than one examiner testing the same attribute on the same group of subjects Specific research designs are utilized to determine the degree to which tests are reliable
© 2006 Evidence-based Chiropractic 4 Validity The degree to which a test truly measures what it was intended to measure In valid tests, when the characteristic being measured changes, corresponding changes occur in the test measurement In contrast, tests with reduced validity do not reflect patient changes very well
© 2006 Evidence-based Chiropractic 5 Measurement error All measurements have some degree of error Thus, any given test score will consist of a true score plus an error component Observed score = True score + Error True score is a theoretical concept involving a measurement derived from a perfect instrument in an ideal environment
© 2006 Evidence-based Chiropractic 6 True score theory In a group of subjects, variation of observed scores occurs because of –Individual differences of the subjects –Plus an error component Consequently, group scores will always be variable, and the variability will result in a distribution of true scores plus error that conforms to a normal curve when the sample size is large enough
© 2006 Evidence-based Chiropractic 7 Random errors Errors that are attributable to the examiner, the subject, or the measuring instrument Have little effect on the group’s mean score because the errors are just as likely to be high as they are low For example, blood pressure which is variable depending on a number of factors
© 2006 Evidence-based Chiropractic 8 Systematic errors Errors that cause scores to move in only one direction in response to a factor that has a constant effect on the measurement system Considered to be a form of bias For example, a sphygmomanometer that is out of calibration and always generates high BP readings
© 2006 Evidence-based Chiropractic 9 Error components
© 2006 Evidence-based Chiropractic 10 Estimating reliability The ratio of true score variance to observed score variance True score variance –Real differences between subjects’ scores due to biologically different people Error variance –The portion of variability that is due to faults in measurement Observed score variance –True score variance plus error variance
© 2006 Evidence-based Chiropractic 11 Observed score variance
© 2006 Evidence-based Chiropractic 12 The reliability coefficient Becomes larger (increased reliability) as error variance gets smaller –Equals 1.0 when error variance is 0.0 Becomes smaller (decreased reliability) as error variance gets larger Reliability coefficient = True score variance / (True score variance + Error variance)
© 2006 Evidence-based Chiropractic 13 Interpretation of the reliability coefficient A reliability coefficient of 0.75 means that 75% of the variance in the scores is due to the true variance of the trait being measured and 25% is due to the error variance
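The ratio described on the preceding slides can be sketched in a few lines of Python. This is only an illustration: the function name and the variance values are hypothetical, chosen so that the result matches the 0.75 example.

```python
def reliability_coefficient(true_variance: float, error_variance: float) -> float:
    """Return true score variance / (true score variance + error variance)."""
    observed_variance = true_variance + error_variance
    if observed_variance <= 0:
        raise ValueError("observed score variance must be positive")
    return true_variance / observed_variance


# Hypothetical variances: 75 units of true variance, 25 units of error variance.
r = reliability_coefficient(true_variance=75.0, error_variance=25.0)
print(f"reliability coefficient = {r:.2f}")
```

With no error variance the function returns 1.0, and the value falls toward 0.0 as error variance grows.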
© 2006 Evidence-based Chiropractic 14 Interpretation of the reliability coefficient (cont.) Ranges from 0.0 to 1.0 –0.0 represents no reliability and 1.0 perfect reliability Implications –0.75 or greater good reliability –0.5 to 0.75 moderate reliability –<0.5 indicates poor reliability.
© 2006 Evidence-based Chiropractic 15 Inter-examiner reliability When 2 or more examiners test the same subjects for the same characteristic using the same measure, scores should match Inter-examiner reliability is the degree that their findings agree
© 2006 Evidence-based Chiropractic 16 Intra-examiner reliability Scores should also match when the same examiner tests the same subjects on two or more occasions Intra-examiner reliability is the degree that the examiner agrees with himself or herself
© 2006 Evidence-based Chiropractic 17 Quantifying inter-examiner and intra-examiner reliability Correlation –There should be a high degree of correlation between scores of 2 examiners testing the same group of subjects or 1 examiner testing the same group on 2 occasions –However, it is possible to have good correlation and concurrent poor agreement Occurs when 1 examiner consistently scores subjects higher or lower than the other examiner
© 2006 Evidence-based Chiropractic 18 Graphing reliability [Scatter plot: Examiner 1 scores vs. Examiner 2 scores, showing very good correlation]
© 2006 Evidence-based Chiropractic 19 Good correlation and concurrent poor agreement [Scatter plot: Examiner 1 scores vs. Examiner 2 scores] Examiner 1 = 10, Examiner 2 = 20; Examiner 1 = 20, Examiner 2 = 30; Examiner 1 = 30, Examiner 2 = 40; Examiner 1 = 40, Examiner 2 = 50 Good correlation, but no agreement
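The four score pairs on this slide can be checked numerically. The sketch below uses a hand-written Pearson correlation (the helper name is an assumption, not something from the presentation) and counts exact agreements between the two examiners.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / sqrt(spread_x * spread_y)


# Score pairs taken from the slide: Examiner 2 is always 10 points higher.
examiner_1 = [10, 20, 30, 40]
examiner_2 = [20, 30, 40, 50]

r = pearson_r(examiner_1, examiner_2)
exact_agreements = sum(a == b for a, b in zip(examiner_1, examiner_2))
print(f"r = {r:.2f}, exact agreements = {exact_agreements} of {len(examiner_1)}")
```

The correlation is perfect because the scores rise together, yet the examiners never record the same value, which is why correlation alone can overstate reliability.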
© 2006 Evidence-based Chiropractic 20 Test-retest reliability A test is administered to the same group of subjects on more than one occasion –Test scores should be consistent when repeated –Test scores should correlate well Test-retest reliability is used to assess self-administered questionnaires which are not directly controlled by the examiner
© 2006 Evidence-based Chiropractic 21 Test-retest reliability (cont.) It is assumed that the condition being considered has not changed between tests Conditions that noticeably change over time are not good candidates for test-retest reliability studies –e.g., pain and disability status
© 2006 Evidence-based Chiropractic 22 Test-retest reliability (cont.) [Figure: items 1 to 10 of the questionnaire at Time 1 compared with items 1 to 10 at Time 2; are the scores equal?]
© 2006 Evidence-based Chiropractic 23 Parallel forms reliability a.k.a. Alternate forms reliability Two versions of a questionnaire or test that measures the same construct are compared Both versions are administered to the same subjects Scores are compared to determine the level of correlation
© 2006 Evidence-based Chiropractic 24 Parallel forms reliability (cont.) [Figure: items 1 to 10 of Questionnaire Version 1 compared with items 1 to 10 of Version 2; are the scores equal?]
© 2006 Evidence-based Chiropractic 25 Internal consistency reliability The degree to which each of the items in a questionnaire measures the targeted construct All questions should measure various characteristics of the construct and nothing else
© 2006 Evidence-based Chiropractic 26 Internal consistency reliability (cont.) A questionnaire is administered to 1 group of subjects on 1 occasion The results are examined to see how well questions correlate If reliable, each question contributes in a similar way to the questionnaire’s overall score
© 2006 Evidence-based Chiropractic 27 Internal consistency reliability (cont.) Questionnaire 1 2 3 4 5 6 7 8 9 10 Total score____ Does - Q1 correlate well with Q8 Q1 with Q9 Q2 with Q7 Also Do - Q1, Q7, Q9, etc. correlate well with the total score ?
© 2006 Evidence-based Chiropractic 28 Cronbach’s coefficient alpha A measure of internal consistency that evaluates items in a questionnaire to determine the degree that they measure the same construct Is essentially the mean correlation between each of a set of items
© 2006 Evidence-based Chiropractic 29 Cronbach’s alpha (cont.) Values range from 1, representing perfect internal consistency, to less than zero when a questionnaire includes many negatively correlating items Alpha values ≥0.70 are generally considered to be acceptable
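A minimal sketch of how alpha might be computed from raw questionnaire responses follows. The slides do not give a formula, so this assumes the commonly used variance-based form, alpha = k/(k-1) × (1 - sum of item variances / variance of total scores); the function name and the response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per questionnaire item, same subjects in each."""
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    # Total questionnaire score for each subject.
    totals = [sum(subject_scores) for subject_scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses: 3 items answered by 5 subjects.
responses = [
    [2, 4, 3, 5, 1],
    [3, 4, 3, 5, 2],
    [2, 5, 4, 4, 1],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")
```

When every item ranks the subjects identically, alpha reaches 1; items that correlate negatively with the rest pull it down, possibly below zero.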
© 2006 Evidence-based Chiropractic 30 2 X 2 contingency table to compare results of examiners Useful to visualize the results of two examiners who are evaluating the same group of patients Inter-examiner reliability articles often present their findings in the form of a 2 X 2 contingency table –If not, they are fairly easy to create from the data presented in the article
© 2006 Evidence-based Chiropractic 31 2 X 2 contingency table (cont.)
                 Rater 2
Rater 1          Test +     Test -     Row Total
Test +           a          b          a+b
Test -           c          d          c+d
Column Total     a+c        b+d        a+b+c+d (Grand Total)
Agreements - a & d Disagreements - b & c
© 2006 Evidence-based Chiropractic 32 The kappa statistic (κ) Agreement between examiners evaluating the same patients can be represented by the percentage of agreement of paired ratings However, percentage of agreement does not account for agreement that would be expected to occur by chance
© 2006 Evidence-based Chiropractic 33 The kappa statistic (cont.) –Even using unreliable measures, a few agreements are expected to occur just by chance Only agreement that occurs beyond chance levels represents true agreement This is what is represented by the kappa statistic –It is appropriate for use with dichotomous or nominal data
© 2006 Evidence-based Chiropractic 34 The kappa statistic (cont.) Kappa = (observed agreement - chance agreement) / (1 - chance agreement) –Where observed agreement (P_O) is the total proportion of observations where there is agreement P_O = number of exact agreements / number of possible agreements = (a + d) / (a + b + c + d)
© 2006 Evidence-based Chiropractic 35 The kappa statistic (cont.) –Chance agreement (P_C) is the proportion of agreements that would be expected by chance P_C = number of expected agreements / number of possible agreements = (a expected + d expected) / (a + b + c + d) –a expected and d expected can be found using the same procedure used to calculate expected cell values in the chi square test –(For each of cells a and d, multiply the row total by the column total and then divide by the grand total)
© 2006 Evidence-based Chiropractic 36 The kappa statistic (cont.) –The values of P_O and P_C are then utilized in the following formula to calculate the kappa statistic Kappa = (P_O - P_C) / (1 - P_C) –When the amount of observed agreement exceeds chance agreement, kappa will be positive –The strength of agreement is determined by the magnitude of kappa –If negative, agreements are less than chance
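The kappa calculation described on the last three slides can be sketched as follows, using the cell labels a, b, c and d from the 2 X 2 contingency table. The function name and the counts are hypothetical and serve only to walk through the arithmetic.

```python
def kappa(a, b, c, d):
    """Kappa for two raters from the four cells of a 2 X 2 contingency table."""
    n = a + b + c + d                      # grand total
    p_o = (a + d) / n                      # observed agreement
    a_expected = (a + b) * (a + c) / n     # row total x column total / grand total
    d_expected = (c + d) * (b + d) / n
    p_c = (a_expected + d_expected) / n    # chance agreement
    return (p_o - p_c) / (1 - p_c)


# Hypothetical counts: the raters agree on 35 of 50 patients.
# P_O = 35/50 = 0.70, P_C = (15 + 10)/50 = 0.50, so kappa = 0.20/0.50.
print(f"kappa = {kappa(a=20, b=5, c=10, d=15):.2f}")
```

Note that 70% raw agreement shrinks considerably once the 50% agreement expected by chance is removed.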
© 2006 Evidence-based Chiropractic 37 Interpretation of kappa values Kappa values run from 0 (no agreement beyond chance) up to 1.0 In ascending order of kappa, agreement beyond chance is rated None, Slight, Fair, Moderate, Substantial, Almost perfect
© 2006 Evidence-based Chiropractic 38 Kappa example Reliability of McKenzie classification of patients with cervical or lumbar pain –50 spinal pain patients (25 lumbar and 25 cervical) were simultaneously assessed by 2 physical therapists (14 in total) to classify patients into syndromes and subsyndromes κ = 0.84 for syndrome classification κ = 0.87 for subsyndrome classification
© 2006 Evidence-based Chiropractic 39 Intraclass Correlation Coefficient (ICC) Another measure of inter-examiner reliability that is for use with continuous variables Can be used to evaluate 2 or more raters Pearson’s r can be used –But ICC is preferred when sample size is small (<15) or more than two tests are involved
© 2006 Evidence-based Chiropractic 40 ICC (Cont.) There are three models of ICC that may utilize one of two different forms –Thus, 6 possible types of ICC depending on how raters are chosen and how subjects are assigned The type of ICC used should always be presented in research papers –The first number represents the ICC model –The second represents the form used
© 2006 Evidence-based Chiropractic 41 ICC (Cont.) For example –Clare et al reported on the reliability of detection of lumbar lateral shift and found it to be moderate –ICC [2,1] values ranging from 0.48 to 0.64, where 2 is the model and 1 is the form
© 2006 Evidence-based Chiropractic 42 ICC is an index of reliability Can range from below 0.0 to +1.0 –With ≈0.0 indicating weak reliability and ≈1.0 strong reliability Suggested interpretation –ICC >0.75: excellent reliability –ICC 0.4 to 0.75: fair to good reliability –ICC <0.4: poor reliability –Some clinical measures require ≥ 0.90
© 2006 Evidence-based Chiropractic 43 ICC is based on variance ICC is the ratio of between-groups variance to total variance, where –Between-groups variance is due to different subjects having test scores that truly differ –Total variance is the between-groups variance plus the error variance, i.e., score differences resulting from inter-rater unreliability of two or more examiners rating the same person Two-way ANOVA is used to calculate ICC
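As a rough sketch of how an ICC can be derived from two-way ANOVA mean squares, the code below computes ICC [2,1]. The slides do not give the formula; this assumes the widely cited Shrout and Fleiss form, (MS_rows - MS_error) / (MS_rows + (k-1) MS_error + k (MS_cols - MS_error) / n), and the function name is hypothetical.

```python
def icc_2_1(ratings):
    """ICC [2,1] from a table with one row per subject and one column per rater."""
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Mean squares from a two-way ANOVA without replication.
    ms_rows = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand_mean) ** 2 for m in col_means) / (k - 1)
    ss_error = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand_mean) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Scores in which the second examiner is always 10 points higher than the first.
offset_scores = [[10, 20], [20, 30], [30, 40], [40, 50]]
print(f"ICC [2,1] = {icc_2_1(offset_scores):.2f}")
```

Unlike a plain correlation, this form of ICC drops below 1 when one examiner scores consistently higher than the other, because the between-rater mean square enters the denominator.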
© 2006 Evidence-based Chiropractic 44 Validity The ability of tests and measurements to in fact evaluate the traits that they were intended to evaluate –Vital in research, as well as in clinical practice The extent of a test’s validity depends on the degree to which systematic error has been controlled for
© 2006 Evidence-based Chiropractic 45 Validity (cont.) The greater the validity, the more likely test results will reflect true differences between scores and not systematic error It’s a matter of degrees, not black-and-white –Technically incorrect to say a test is “valid” or “invalid” –Better to use categories like highly valid, moderately valid, etc.
© 2006 Evidence-based Chiropractic 46 Validity (cont.) Test validity depends on its intended purpose –For example, a hand-grip dynamometer is valid to measure grip strength, but it is not valid to measure the qualities of hand tremor
© 2006 Evidence-based Chiropractic 47 Validity (cont.) An invalid test can still be reliable –For example, a test that used skull circumference to predict intelligence –Reliability would probably be excellent, but it would not be a valid predictor of intelligence But an unreliable test can never be considered valid
© 2006 Evidence-based Chiropractic 48 Methods to estimate the extent of test validity Can be divided into 3 major categories –Self-evident Does the test appear to measure what it is supposed to measure –Pragmatic Does the test actually work as hypothesized –Construct validity Does the test adequately measure the theoretical construct involved
© 2006 Evidence-based Chiropractic 49 Self-evident methods Face validity –Simply deciding whether a test appears to have merit based on “face value” e.g., if a headache questionnaire asked about the location of head pain it would have face validity If it asked about hair color, it probably would not –The lowest level of test validation –Often assessed when researchers are first exploring a topic
© 2006 Evidence-based Chiropractic 50 Self-evident methods (cont.) Content validity –The ability of a test to include or represent all of the content of a construct Another definition for content validity –The content of a test is compared to the literature that is already available on the topic –The test is said to have good content validity if it accurately reflects what is in the literature
© 2006 Evidence-based Chiropractic 51 Pragmatic methods Criterion-related validity –The degree a test corresponds with an external criterion that is an independent measure of the characteristic being tested A criterion is the standard by which a measure is judged –A valid test should correlate well with or predict some relevant criterion –Concurrent and predictive validity are subgroups of criterion-related validity
© 2006 Evidence-based Chiropractic 52 Pragmatic methods (cont.) Concurrent validity –The results of a new test are compared with an established test (gold standard) to see if they are well correlated –Both tests are given at the same time –For example, a study that compares a clinical test to detect spondylolisthesis with x-ray findings
© 2006 Evidence-based Chiropractic 53 Pragmatic methods (cont.) Gold standard test –a.k.a. reference standard –A test that is generally acknowledged to be the best available –The value of a concurrent validity trial depends greatly on the quality of the gold standard that is used
© 2006 Evidence-based Chiropractic 54 Pragmatic methods (cont.) Construct validity –The extent to which a test effectively measures a theoretical construct Like pain or disability –The characteristic is not observed directly –Rather, an abstraction of the characteristic that corresponds to the construct under consideration is observed e.g., a pain scale or disability questionnaire
© 2006 Evidence-based Chiropractic 55 Pragmatic methods (cont.) Construct validity can be thought of as the accumulation of evidence that points to the ability of a test to actually measure what it claims to measure It involves the accumulation of evidence by establishing some of the other types of validity –The validity of a test is supported if the results of these studies agree with one another
© 2006 Evidence-based Chiropractic 56 Pragmatic methods (cont.) Construct validity is determined by comparing a new test with other tests that measure a similar construct Another way to evaluate construct validity is to compare the new test with tests of different, though related, constructs, with which it should not correlate well
© 2006 Evidence-based Chiropractic 57 Pragmatic methods (cont.) Convergent validity –Has to do with the degree of correlation that exists between a new test and another measure of the same or similar constructs –A test that has good convergent validity correlates well with another measure of the same construct
© 2006 Evidence-based Chiropractic 58 Pragmatic methods (cont.) Discriminant validity –The opposite of convergent validity, where the new test is weakly related to or unrelated to another measure that it should in fact be different from –A test with good discriminant validity should be able to separate patients into different groups e.g., normal vs. abnormal
© 2006 Evidence-based Chiropractic 59 The concept of validity and reliability Can be compared with scores on a target Scores may be systematically off center –Results from bias –The test environment is faulty, causing all scores to be inaccurate –Scores miss the bull’s eye in one direction Scores may be randomly off center –Scores miss the bull’s eye in any direction
© 2006 Evidence-based Chiropractic 60 The concept of validity and reliability (cont.) –When test scores miss the bull’s eye in any direction, it is caused by random error –Some subjects are affected while others are not Accurate tests –Are free from bias Precise tests –Are free from random error
© 2006 Evidence-based Chiropractic 61 Accuracy and precision An accurate and precise test hits the bull’s eye and is tightly grouped An inaccurate test systematically misses the bull’s eye in one direction An imprecise test misses the bull’s eye randomly
© 2006 Evidence-based Chiropractic 62 Cutoff points Test results involving ordinal or continuous measures are often converted to a dichotomous scale (dichotomized) Achieved by establishing a cutoff point at a specified value –Scores above the specified value are considered positive –Scores below the value are negative
© 2006 Evidence-based Chiropractic 63 The ideal diagnostic test Would always correctly discriminate between those with and those without the condition –Always positive for those with the condition –Always negative for those without it
© 2006 Evidence-based Chiropractic 64 The ideal test Always positive for those with the condition Always negative for those without the condition
© 2006 Evidence-based Chiropractic 65 Real-world test [Figure labels: False negatives, False positives]
© 2006 Evidence-based Chiropractic 66 Sensitivity and Specificity Commonly used to assess the validity of tests Sensitivity –The ability of a test to correctly identify people who have the target disorder Specificity –The ability of a test to correctly identify people who do not have the target disorder
© 2006 Evidence-based Chiropractic 67 Sensitivity and Specificity (cont.) Expressed as a percentage –0% represents no sensitivity or specificity –100% is perfect sensitivity or specificity A 2 X 2 contingency table can be used to calculate these indices
© 2006 Evidence-based Chiropractic 68 2 X 2 contingency table
                 Condition (per “gold standard”)
Test Result      Present         Absent          Row Total
Positive         a (True +)      b (False +)     a+b
Negative         c (False -)     d (True -)      c+d
Column Total     a+c             b+d             a+b+c+d (Grand Total)
© 2006 Evidence-based Chiropractic 69 Sensitivity and Specificity (cont.) Sensitivity = a/(a+c) = true positives / (true positives + false negatives) Specificity = d/(b+d) = true negatives / (false positives + true negatives)
© 2006 Evidence-based Chiropractic 70 SnOUT (Sensitivity rules OUT) In tests that have very high sensitivity –A negative test will rule out the condition under consideration –This is because there are very few false negatives in tests with very high sensitivity –If a test with very high sensitivity is negative, it is very likely a true negative
© 2006 Evidence-based Chiropractic 71 SpIN (SPecificity rules IN) In tests that have very high specificity –A positive test will rule in the condition under consideration –This is because there are very few false positives in tests with very high specificity –If a test with very high specificity is positive, it is very likely a true positive
© 2006 Evidence-based Chiropractic 72 The cutoff point influences a test’s sensitivity & specificity Higher scores point to a worsening condition If the cutoff point is raised, specificity increases, but there are more false negatives [Figure labels: False negatives, False positives]
© 2006 Evidence-based Chiropractic 73 The cutoff point and sensitivity & specificity (cont.) If the cutoff point is lowered, sensitivity increases, but there are more false positives [Figure labels: False negatives, False positives]
© 2006 Evidence-based Chiropractic 74 The cutoff point and sensitivity & specificity (cont.) Because increasing sensitivity will decrease specificity, and increasing specificity will decrease sensitivity, the cutoff point that is set depends on –Whether it is best to maximize sensitivity at the expense of specificity, or –Whether it is best to maximize specificity at the expense of sensitivity
© 2006 Evidence-based Chiropractic 75 Receiver Operating Characteristic (ROC) curves Graphically depict the tradeoff between sensitivity and specificity In accurate tests –The curve closely follows the left-hand border and the top border of the ROC space In less accurate tests –The curve is closer to the 45-degree diagonal of the ROC space
© 2006 Evidence-based Chiropractic 76 ROC curves (cont.)
© 2006 Evidence-based Chiropractic 77 ROC curves (cont.) Cut-off low = high sensitivity, but more false positives Cut-off high = low sensitivity, but fewer false positives
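The tradeoff that the cutoff slides and the ROC curve describe can be illustrated by dichotomizing a small set of scores at two different cutoffs. Everything in this sketch is hypothetical: the scores, the function name, and the choice to treat a score exactly equal to the cutoff as negative.

```python
def sensitivity_specificity(with_condition, without_condition, cutoff):
    """Dichotomize scores at a cutoff: scores above it count as positive."""
    true_positives = sum(score > cutoff for score in with_condition)
    true_negatives = sum(score <= cutoff for score in without_condition)
    sensitivity = true_positives / len(with_condition)
    specificity = true_negatives / len(without_condition)
    return sensitivity, specificity


# Hypothetical test scores; higher scores point to a worsening condition.
scores_with_condition = [4, 6, 7, 8, 9]
scores_without_condition = [1, 2, 3, 5, 6]

for cutoff in (3, 6):
    sens, spec = sensitivity_specificity(
        scores_with_condition, scores_without_condition, cutoff
    )
    print(f"cutoff {cutoff}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Plotting sensitivity against 1 - specificity for every possible cutoff would trace out the ROC curve for these scores.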
© 2006 Evidence-based Chiropractic 78 Implications of sensitivity & specificity In tests with low sensitivity –People with the target disorder will be missed (false negatives) In tests with low specificity –People who do not actually have the target disorder will be identified as having it (false positives)
© 2006 Evidence-based Chiropractic 79 Implications of sensitivity & specificity (cont.) Tests with high sensitivity may be suitable when the consequences of reporting false positive findings to a patient are minor –e.g., incorrectly reporting to a patient that their triglycerides are elevated, which results in them shifting to a healthier lifestyle
© 2006 Evidence-based Chiropractic 80 Implications of sensitivity & specificity (cont.) Tests with high specificity are better when false positive findings lead to painful or expensive treatment –e.g., a test that leads to surgical intervention –In this case false positives must be minimized
© 2006 Evidence-based Chiropractic 81 Implications of sensitivity & specificity (cont.) Screening for rare conditions –Many false positives may result since very few cases have the potential to be detected, even when highly specific tests are used –Not a serious problem if positive screening leads to confirmatory testing Screening for common conditions –Many cases may be overlooked, even when a highly sensitive test is used
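The point about screening for rare conditions can be made concrete with a little arithmetic: the share of positive results that are true positives depends on how common the condition is. The sketch below is an illustration only; the function name and the sensitivity, specificity and prevalence figures are hypothetical.

```python
def true_positive_share(sensitivity, specificity, prevalence):
    """Proportion of all positive results that come from people with the condition."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical test with 99% sensitivity and 99% specificity.
rare = true_positive_share(0.99, 0.99, prevalence=0.001)    # 1 case per 1000
common = true_positive_share(0.99, 0.99, prevalence=0.5)    # 1 case per 2
print(f"rare condition:   {rare:.1%} of positives are true positives")
print(f"common condition: {common:.1%} of positives are true positives")
```

Even with a highly specific test, most positives for the rare condition are false, which is why confirmatory testing after a positive screen matters.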
© 2006 Evidence-based Chiropractic 82 What is an acceptable level of sensitivity and specificity? There is no general agreement; it depends on the clinical situation Is changeable when –The intent of the test or the setting changes –The prevalence of the condition is different in the group being tested –Alternate methods of testing are available
© 2006 Evidence-based Chiropractic 83 Predictive value of a test Positive predictive value –The probability that a positive test will correctly identify people who have the target disorder –a/(a+b)
                 Condition
Test result      Present    Absent
Positive         a          b
Negative         c          d
© 2006 Evidence-based Chiropractic 84 Predictive value of a test (cont.) Negative predictive value –The probability that a negative test will correctly identify people who do not have the target disorder –d/(c+d)
                 Condition
Test result      Present    Absent
Positive         a          b
Negative         c          d
© 2006 Evidence-based Chiropractic 85
                 Condition (per “gold standard”)
Test Result      Present    Absent     Row Total
Positive         a          b          a+b
Negative         c          d          c+d
Column Total     a+c        b+d        a+b+c+d (Grand Total)
Sensitivity = a/(a+c) Specificity = d/(b+d) Positive predictive value = a/(a+b) Negative predictive value = d/(c+d)
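The four indices summarized on this slide can be computed together from the cells of the table. This is a minimal sketch; the function name and the counts are hypothetical.

```python
def diagnostic_indices(a, b, c, d):
    """Indices from a 2 X 2 table: a = true +, b = false +, c = false -, d = true -."""
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "positive_predictive_value": a / (a + b),
        "negative_predictive_value": d / (c + d),
    }


# Hypothetical study: 100 people with the condition and 200 without it.
indices = diagnostic_indices(a=90, b=30, c=10, d=170)
for name, value in indices.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity are read down the columns of the table, while the predictive values are read across the rows.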
© 2006 Evidence-based Chiropractic 86 Likelihood ratio (LR) The probability that the results of a diagnostic test would be expected in a patient with the condition of interest (sensitivity) compared to the expected results of the same test in a patient without the condition (specificity) Applies to positive as well as negative tests
© 2006 Evidence-based Chiropractic 87 Likelihood ratio (cont.) LR of a positive test (LR+) –A ratio of the probability of a positive test in a person with the condition compared to the probability of a positive test in a person without the condition LR+ = [a/(a+c)] / [1 - d/(b+d)] = sensitivity / (1 - specificity)
© 2006 Evidence-based Chiropractic 88 Likelihood ratio (cont.) In a positive test –LR >1, the probability that the condition is present is increased –LR <1, the probability that the condition is present is decreased –LR =1, the probability that the condition is present versus not being present is the same
© 2006 Evidence-based Chiropractic 89 Likelihood ratio (cont.) LR of a negative test (LR-) –A ratio of the probability of a negative test in a person with the condition compared to the probability of a negative test in a person without the condition LR- = [1 - a/(a+c)] / [d/(b+d)] = (1 - sensitivity) / specificity
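Both likelihood ratios follow directly from sensitivity and specificity, as this short sketch shows. The function name and the example values are hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity.

    Both inputs are proportions between 0 and 1; a specificity of exactly
    0 or 1 would cause a division by zero and is not handled here.
    """
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Hypothetical test: 90% sensitivity, 80% specificity.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.90, specificity=0.80)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
```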
© 2006 Evidence-based Chiropractic 90 Likelihood ratio (cont.) LRs have been referred to as the most useful single indicator of a test’s diagnostic strength They can be used to help make decisions about the need for further testing and about the appropriate time to begin treatment
© 2006 Evidence-based Chiropractic 91 Meaning of LRs LR >10 or <0.1 –Generates large and conclusive changes in the probability of a given diagnosis LR in the range of 5 to 10 or 0.1 to 0.2 –Generates a moderate and usually important change in the probability of a given diagnosis LR in the range of 2 to 5 or 0.2 to 0.5 –Generates a small but sometimes important change in the probability of a given diagnosis LR in the range of 1 to 2 or 0.5 to 1 –Changes the probability of a given diagnosis to a small and rarely important degree
© 2006 Evidence-based Chiropractic 92 Meaning of LRs (cont.) LRs >10 indicate that the test can be used to rule the condition in LRs ~ 1 provide no useful information for ruling the condition in or out LRs <0.1 indicate that the test can be used to rule the condition out
© 2006 Evidence-based Chiropractic 93 Pre-test probability The probability that a patient has a condition before the test is carried out Is based on the clinician’s experience, the prevalence of the condition, and published literature May be modified up or down if the patient has risk factors
© 2006 Evidence-based Chiropractic 94 Post-test probability Is generated by combining a patient’s pre-test probability of having the condition with the test’s LR –A high pre-test probability coupled with a high LR produces a very high post-test probability –A low pre-test probability coupled with a low LR produces a very low post-test probability
© 2006 Evidence-based Chiropractic 95 Using LRs with Pre-test & Post-test probabilities A practitioner’s confidence about a correct diagnosis would be higher after positive results of a test with a high LR Especially if the pre-test probability was high Thus, clinicians can use them in making decisions about the need for further testing and when to begin treatment
© 2006 Evidence-based Chiropractic 96 Using LRs with Pre-test & Post-test probabilities (cont.) When the post-test probability is very high, the condition is very likely present and treatment should be initiated When it is very low, the condition can be ruled out and no further diagnostic or therapeutic action is necessary
© 2006 Evidence-based Chiropractic 97 Using a nomogram First, the pre-test probability is estimated Next, the test’s LR is obtained from an article Draw a line between the pre-test probability and the LR, extending to the post-test probability
© 2006 Evidence-based Chiropractic 98 Using LRs with Pre-test & Post-test probabilities (cont.) LRs and post-test probabilities can be used serially –The post-test probability resulting from one test can be used as a pre-test probability for the next one
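The nomogram performs graphically a calculation that can also be done by hand. The sketch below assumes the standard odds form of Bayes' theorem (post-test odds = pre-test odds × LR), which the slides do not spell out; the function name and the probabilities are hypothetical.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Combine a pre-test probability with a likelihood ratio via odds."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Hypothetical patient: 50% pre-test probability, first test positive with LR+ = 9.
after_first_test = post_test_probability(0.50, 9)

# Serial use: the first post-test probability becomes the next pre-test probability.
after_second_test = post_test_probability(after_first_test, 9)

print(f"after first test:  {after_first_test:.3f}")
print(f"after second test: {after_second_test:.3f}")
```

Chaining results this way treats the two tests as independent of each other, which is an additional assumption worth checking in practice.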
© 2006 Evidence-based Chiropractic 99 Clinical disagreement Practitioners can still disagree about clinical findings, even when valid and reliable tests are used 3 sources of clinical disagreement –The examiner (practitioner) –The examined (patient) –The examination
© 2006 Evidence-based Chiropractic 100 Clinical disagreement due to the examiner 1.Biological variations of senses –Many tests rely on the examiner’s abilities –Some people have better hearing, sight, more skill at palpation, etc. 2.Tendency to record inferences rather than evidence –Examiners may “pre-diagnose” patients based on visible cues before actual examination
© 2006 Evidence-based Chiropractic 101 Clinical disagreement due to the examiner (cont.) 3.Ensnarement by diagnostic classification schemes –Vague diagnostic criteria and the tendency to pigeon-hole patients 4.Entrapment by prior expectation –Tendency for examiners to find what they hope to find (e.g., chiropractors find back problems, urologists find kidney problems) 5.Examiner incompetence
© 2006 Evidence-based Chiropractic 102 Clinical disagreement due to the examined 1.Biological variation –Many conditions vary from day-to-day 2.Effects of illness and medications –A patient with severe pain is very difficult to examine –Pain medications may mask the true findings
© 2006 Evidence-based Chiropractic 103 Clinical disagreement due to the examined (cont.) 3.Memory and rumination –Chronic patients may include everything under the sun, or only what they think caused the problem (selective memory) –Recall bias 4.Toss-ups –Deals with conflicting ways to manage a patient’s condition
© 2006 Evidence-based Chiropractic 104 Clinical disagreement due to the examination 1.Disruptive environment –e.g., an athletic field or a child crying during a parent’s examination 2.Disruptive interactions between examiner and patient –Patients won’t confide in a doctor they don’t like or trust 3.Dysfunctional or incorrectly used diagnostic tools
© 2006 Evidence-based Chiropractic 105 Appraising reliability and validity articles First decide whether the purpose of the study is to assess the test’s reliability or validity (or both) –Reliability studies assess the consistency of tests within or between examiners or questionnaires –Validity studies compare test results with established tests, or assess how accurately the test predicts a future outcome
© 2006 Evidence-based Chiropractic 106 Appraising reliability and validity articles (cont.) Was the test adequately described? –Should mention how patients prepared for the test (e.g., fasting prior to a blood test) –What patients had to endure (e.g., drugs given for routine colonoscopy) Patient inconvenience, cost, and harm must be weighed against the need for information –How the results were analyzed and interpreted
© 2006 Evidence-based Chiropractic 107 Appraising reliability and validity articles (cont.) Did the study sample include a full range of subjects with and without the condition? –All types of patients should be included, like one would see in everyday clinical practice –If too many severely ill patients are included, there is a greater chance that those with the disease will test positive Such tests may be able to identify obviously ill patients, but not those who are only mildly ill
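The effect of case mix on measured sensitivity can be sketched with a little arithmetic. The example below is illustrative only: the function name and the detection rates (0.50 for mildly ill patients, 0.95 for severely ill patients) are assumptions, not figures from the presentation.

```python
def sample_sensitivity(fraction_severe, detect_severe, detect_mild):
    """Sensitivity observed in a diseased sample with a given case mix.

    The overall sensitivity is the average of the detection rates for
    severe and mild cases, weighted by how common each is in the sample.
    """
    if not 0.0 <= fraction_severe <= 1.0:
        raise ValueError("fraction_severe must be between 0 and 1")
    fraction_mild = 1.0 - fraction_severe
    return fraction_severe * detect_severe + fraction_mild * detect_mild


# Hypothetical detection rates, chosen only for illustration.
DETECT_SEVERE = 0.95
DETECT_MILD = 0.50

# A study sample dominated by severely ill patients ...
mostly_severe = sample_sensitivity(0.90, DETECT_SEVERE, DETECT_MILD)
# ... versus a sample closer to everyday clinical practice.
everyday_mix = sample_sensitivity(0.30, DETECT_SEVERE, DETECT_MILD)

print(round(mostly_severe, 3), round(everyday_mix, 3))
```

Under these assumed rates the same test appears to have a sensitivity of about 0.905 in the sample dominated by severe cases, but only about 0.635 in the mixed sample.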
© 2006 Evidence-based Chiropractic 108 Appraising reliability and validity articles (cont.) If the study utilized a gold standard for comparison, was it an acceptable one? –The credibility of a validity study depends on the soundness of the gold standard –It is often difficult to find an ideal gold standard since most tests do not have both high sensitivity and high specificity –Especially complex for spinal function tests
© 2006 Evidence-based Chiropractic 109 Appraising reliability and validity articles (cont.) Were the test results and the gold standard assessed independently in a blinded fashion? –Raters should be unaware of the results of previous testing, because this knowledge can greatly affect the interpretation of tests –Expectation bias When raters are influenced by knowledge of certain features of the case
© 2006 Evidence-based Chiropractic 110 Appraising reliability and validity articles (cont.) –Verification bias When the decision to carry out the gold standard test is influenced by the results of the test that is being evaluated –Be wary of studies that use more than one type of gold standard test e.g., some patients are biopsied, while others wait to see if the condition develops
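A small worked example may help show why verification bias matters. The counts and verification fractions below are hypothetical, chosen only to illustrate what happens when every positive test result is confirmed with the gold standard but only a fraction of the negative results are.

```python
def apparent_accuracy(tp, fp, fn, tn, verify_pos, verify_neg):
    """Sensitivity and specificity computed only from verified patients.

    verify_pos / verify_neg are the fractions of test-positive and
    test-negative patients who actually receive the gold standard test.
    """
    verified_tp = tp * verify_pos
    verified_fp = fp * verify_pos
    verified_fn = fn * verify_neg
    verified_tn = tn * verify_neg
    sensitivity = verified_tp / (verified_tp + verified_fn)
    specificity = verified_tn / (verified_tn + verified_fp)
    return sensitivity, specificity


# Hypothetical full population: 100 diseased, 900 not diseased.
TP, FN = 80, 20    # true sensitivity = 80 / 100 = 0.80
TN, FP = 810, 90   # true specificity = 810 / 900 = 0.90

# Everyone is verified: the true values are recovered.
full_sens, full_spec = apparent_accuracy(TP, FP, FN, TN, 1.0, 1.0)

# All test positives, but only 10% of test negatives, are verified.
biased_sens, biased_spec = apparent_accuracy(TP, FP, FN, TN, 1.0, 0.1)

print(round(full_sens, 3), round(full_spec, 3))
print(round(biased_sens, 3), round(biased_spec, 3))
```

With these assumed numbers, partial verification pushes the apparent sensitivity from 0.80 up to roughly 0.98 and the apparent specificity from 0.90 down to roughly 0.47, even though the test itself has not changed.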
© 2006 Evidence-based Chiropractic 111 Appraising reliability and validity articles (cont.) Do the results of this study apply to the patient before me? –The study’s population should be comparable to the patient on factors such as age, gender, and condition severity –Prevalence or severity of the condition may be higher in an academic environment As a result, the test’s sensitivity may be higher than if it were studied in the general population
© 2006 Evidence-based Chiropractic 112 Appraising reliability and validity articles (cont.) Will patients benefit as a result of being tested? –Is the new test really preferable to the old one? It may be less convenient, more expensive, and provide little or no added information Beware of studies on diagnostic tests that have commercial ties –Test results should benefit the patient and actually result in a change in the way their condition is managed
© 2006 Evidence-based Chiropractic 113 Appraising reliability and validity articles (cont.) –One must also consider the consequences of not performing the test For instance, a test that is designed to detect a condition that is potentially very harmful if left undiagnosed e.g., arterial dissection or abdominal aneurysm –The risk associated with the test should be proportional to the importance of the information to be gained
© 2006 Evidence-based Chiropractic 114 Appraising reliability and validity articles (cont.) Is the test reliable? –Coefficients of agreement should be within acceptable ranges –P values or confidence intervals should point to statistically significant findings
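One widely used coefficient of agreement for two examiners rating the same patients on a categorical scale is Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. The following is a minimal sketch, not the method of any particular study; the function name and the ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same subjects."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same non-empty set of subjects")
    n = len(rater_a)

    # Proportion of subjects on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1.0 - expected)


# Hypothetical findings from two examiners (1 = positive, 0 = negative).
examiner_1 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
examiner_2 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

kappa = cohen_kappa(examiner_1, examiner_2)
print(round(kappa, 2))  # prints 0.6
```

Here the examiners agree on 8 of 10 patients (0.80), chance agreement is 0.50, and kappa is therefore (0.80 - 0.50) / (1 - 0.50) = 0.60. Whether that value counts as acceptable depends on the benchmark scale the appraiser adopts.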
© 2006 Evidence-based Chiropractic 115 Appraising reliability and validity articles (cont.) Is the test valid? –P values or confidence intervals should be reported and should be significant –The gold standard should be a valid marker for what is being tested –Sensitivity and specificity should be sufficiently high Depends on the planned use of the test
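Sensitivity and specificity are calculated from the 2x2 table that compares the test with the gold standard. The sketch below uses hypothetical counts and a hypothetical function name purely to show the arithmetic.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from 2x2 table counts.

    tp: diseased, test positive      fn: diseased, test negative
    fp: not diseased, test positive  tn: not diseased, test negative
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased subject")
    sensitivity = tp / (tp + fn)   # proportion of diseased who test positive
    specificity = tn / (tn + fp)   # proportion of non-diseased who test negative
    return sensitivity, specificity


# Hypothetical study: 50 patients with the condition, 100 without.
sens, spec = sensitivity_specificity(tp=45, fp=10, fn=5, tn=90)
print(sens, spec)  # prints 0.9 0.9
```

Whether 0.90 is sufficiently high depends on the planned use of the test: a test meant to rule a condition out calls for high sensitivity, while one meant to confirm a condition calls for high specificity.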