EPIDEMIOLOGY 4: RELIABILITY AND VALIDITY (TRAINING AND CALIBRATION). Dr. Seyed Ebrahim Jabarifar. Date: 1388 / 2010. Associate Professor, Isfahan University of Medical Sciences, Department of Community Dentistry
RELIABILITY AND VALIDITY OF DATA Two main reasons for variability in scoring (WHO, 1997): 1. Difficulty in scoring the different levels of oral diseases, particularly dental caries and periodontal diseases. 2. Physical and psychological factors (fatigue, fluctuations in interest in the study, variations in visual acuity and tactile sense) that affect the judgement of examiners from time to time and to different degrees.
RELIABILITY AND VALIDITY OF DATA What is the principal problem/issue with the variability? To decide whether examiners are sufficiently close to each other in their interpretation and application of the clinical criteria. If they are, data from their samples can be pooled to provide area/district estimates whose variances reflect true inter-subject differences in oral health and not an inflation due to examiner differences (Pine et al. 1997).
RELIABILITY AND VALIDITY OF DATA Objectives of standardisation and calibration (WHO, 1997): To ensure uniform interpretation, understanding and application by all examiners of the codes and criteria for the various diseases and conditions to be observed and recorded. To ensure that each examiner can examine consistently.
RELIABILITY AND VALIDITY OF DATA How can this problem/issue be tackled? 1. Training of examiners and interviewers. 2. Calibration exercise. 3. Repeat examinations.
TRAINING EXERCISE What do we mean by a training exercise? The training exercise aims to teach the survey examiners, thoroughly and intensively, the logistics of the examination protocol and the agreed interpretation of the diagnostic criteria. In practical terms, the full range of diagnostic situations is presented and discussed in detail: a) on slides, b) on actual subjects. It takes place before the survey and requires at least 2 days of intensive work. It may be repeated at specific intervals during the survey.
CALIBRATION EXERCISE What do we mean by a calibration exercise? The calibration exercise completes the training and provides a formal measure of how well the examiner can interpret the criteria, compared to the "gold standard" set by the trainer. It takes place before the survey and may be repeated annually.
CALIBRATION EXERCISE How does this happen in practice? Some subjects are examined by some (or even all) examiners and by the gold-standard examiner, and the data are compared. The exercise is repeated annually to ensure consistency in the interpretation of criteria and familiarity with new measures. It should include a sufficient number of cases (20 subjects), on which a wide range of diagnostic decisions have to be made (i.e. treated and untreated caries, as well as caries-free subjects).
CALIBRATION EXERCISE What is the action taken? "Outlier" examiners and the specific areas of over- or under-scoring are identified. The issue is discussed and thoroughly clarified. A repeat calibration exercise should be undertaken. If the result is repeatedly unsatisfactory, the outlier may be excluded from the survey (practical difficulties). NB: The ability to standardise clinical examination results is not a measure of clinical skill (clarify in advance).
REPEAT EXAMINATIONS What do we mean by repeat examinations? The repeat examinations can be carried out: a) by the same examiner, aiming to monitor intra-examiner diagnostic consistency (single examiner), or b) by the gold-standard examiner, aiming to ensure inter-examiner diagnostic consistency (group of examiners). In practical terms, this implies performing duplicate examinations on 5-10% of the survey sample (25 subjects). They should take place at various stages of the survey (beginning, half-way, end).
TRAINING AND CALIBRATION OF EXAMINERS 1. Intensive training in the examination protocol and criteria, guided by gold-standard examiner(s). 2. Calibration exercise for key measures. 3. Identification of problems, clarification with respective examiners. 4. Final training session and meeting with interviewers before each wave of examinations (refresh knowledge, highlight key problematic areas). 5. Repeat examinations by the examiner (single examiner) or by the gold-standard examiner (group of examiners).
TRAINING OF INTERVIEWERS 1. Familiarisation with the procedure and appropriate order of clinical examination (gold-standard examiner). 2. Training in the administration of the questionnaire (explanation, instructions on the format and the administration of questions, practical exercises). 3. Final meeting with examiners before each wave of fieldwork (meet examiners, highlight key points, discuss issues raised during fieldwork in previous waves). 4. Re-training for interviewers who have not participated in the survey for a predefined period (e.g. 1 month).
ASSESSMENT OF REPRODUCIBILITY: METHODS 1. Use of master sheets. 2. Calculation of mean indices by examiner and the size and direction of deviation from the gold-standard examiner. 3. Calculation of group means and 95% confidence limits. 4. Assessment of percentage of agreement between examiner and gold-standard examiner. 5. Sensitivity and specificity estimations. 6. Dice's concordance index. 7. Kappa and weighted Kappa statistics.
DEVIATION FROM GOLD-STANDARD EXAMINER 1. Establishment of an arbitrary cut-off point for acceptable deviation from the gold-standard examiner (e.g. ±0.5 dmft/DMFT). 2. Calculation of mean dmft/DMFT for the gold-standard examiner. 3. Estimation of the size and direction of deviation from the gold-standard examiner for each examiner, and comparison with the chosen level of acceptance.
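The steps on this slide can be sketched in a few lines of Python. This is only an illustration: the function name, the DMFT scores, and the examiner labels are hypothetical, and the 0.5 cut-off is simply the example value quoted on the slide.

```python
from statistics import mean

def deviations_from_gold(scores_by_examiner, gold_scores, cutoff=0.5):
    """Return {examiner: (deviation, within_cutoff)} relative to the gold-standard mean.

    A positive deviation means the examiner over-scores on average,
    a negative one means under-scoring.
    """
    gold_mean = mean(gold_scores)
    result = {}
    for name, scores in scores_by_examiner.items():
        deviation = mean(scores) - gold_mean
        result[name] = (deviation, abs(deviation) <= cutoff)
    return result

# Hypothetical DMFT scores recorded on the same five subjects.
gold = [2, 0, 4, 1, 3]          # gold-standard examiner, mean 2.0
examiners = {
    "A": [2, 1, 4, 1, 3],       # mean 2.2, within the cut-off
    "B": [3, 1, 5, 2, 4],       # mean 3.0, over-scores by 1.0
}
report = deviations_from_gold(examiners, gold)
for name, (deviation, ok) in report.items():
    print(f"Examiner {name}: deviation {deviation:+.2f}, acceptable: {ok}")
```

Keeping the sign of the deviation, rather than only its size, shows the direction of the problem (over- or under-scoring), which is what the slide asks to be estimated.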
GROUP MEAN AND 95% CONFIDENCE LIMITS The basic concept is to identify the outliers, if any, whose mean scores fall outside the 95% confidence interval of the mean score for all examiners. The calculation of the group mean score excludes the gold-standard examiner. The value of t varies according to the number of examiners. The general formula for the 95% confidence limits is: group mean ± t(0.05, df = n-1) × SD, where n is the number of examiners.
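A minimal sketch of this calculation follows. It applies the slide's formula literally and assumes that the standard deviation is taken over the examiners' mean scores, which the slide does not state explicitly. The function names and the data are hypothetical; the t values are the standard two-sided 5% critical values, tabulated here only for up to 11 examiners.

```python
from statistics import mean, stdev

# Two-sided 5% critical values of Student's t for df = 1..10.
T_CRIT_05 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def group_limits(examiner_means):
    """Lower and upper limits: group mean +/- t(0.05, n-1) * SD."""
    n = len(examiner_means)
    t = T_CRIT_05[n - 1]                       # df = n - 1
    centre = mean(examiner_means)
    half_width = t * stdev(examiner_means)     # sample SD of the examiner means (assumption)
    return centre - half_width, centre + half_width

def outliers(means_by_examiner):
    """Names of examiners whose mean score falls outside the limits."""
    low, high = group_limits(list(means_by_examiner.values()))
    return [name for name, m in means_by_examiner.items() if not low <= m <= high]

# Hypothetical mean DMFT per examiner (gold-standard examiner excluded).
means_by_examiner = {"A": 1.8, "B": 2.0, "C": 2.2, "D": 2.0}
low, high = group_limits(list(means_by_examiner.values()))
print(f"limits: {low:.3f} to {high:.3f}; outliers: {outliers(means_by_examiner)}")
```

With very few examiners the limits are wide, because t is large for small df and a single deviant examiner also inflates the SD.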
PERCENTAGE OF AGREEMENT Estimated as the exact number of agreements expressed as a percentage of the total. Very simple. Takes no account of where in the table the agreement was. Some agreement is expected even by chance. Lacks accuracy when the prevalence of the disease or condition is rather low.
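The low-prevalence weakness is easy to see with a small sketch. The codes ("S" for sound, "C" for caries) and the data are hypothetical: an examiner who records every tooth as sound still reaches 95% agreement while missing the only carious tooth.

```python
def percent_agreement(codes_a, codes_b):
    """Exact agreements as a percentage of all paired observations."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty sequences of equal length")
    agreements = sum(x == y for x, y in zip(codes_a, codes_b))
    return 100.0 * agreements / len(codes_a)

# Hypothetical tooth-level codes: 19 sound teeth and 1 carious tooth.
gold = ["S"] * 19 + ["C"]
examiner = ["S"] * 20            # misses the single carious tooth
agreement = percent_agreement(gold, examiner)
print(agreement)
```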
SENSITIVITY AND SPECIFICITY Sensitivity refers to the ability to correctly identify the true positive cases. It is the proportion of true positive cases that test positive. Specificity refers to the ability to correctly identify the true negative cases. It is the proportion of true negative cases that test negative. Sensitivity = TP / (TP + FN), Specificity = TN / (TN + FP). Both are affected by disease experience and treatment provision (e.g. caries experience and proportion restored).
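As a worked example of the two formulas, the sketch below treats the gold-standard examiner's codes as the truth. The function name and the counts are hypothetical.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from counts against the gold standard."""
    sensitivity = tp / (tp + fn)   # true positives among all truly positive cases
    specificity = tn / (tn + fp)   # true negatives among all truly negative cases
    return sensitivity, specificity

# Hypothetical counts: 20 truly carious teeth, 80 truly sound teeth.
sens, spec = sensitivity_specificity(tp=18, fn=2, tn=72, fp=8)
print(sens, spec)
```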
DICE'S CONCORDANCE INDEX Appropriate when only one outcome is the object of interest (e.g. decayed teeth). Quick and easy. Does NOT use all available data. D = 2a / (2a + b + c), with the cells defined as:
                  Examiner B
                  +       -
Examiner A   +    a       b
             -    c       d
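A short sketch of the index, using hypothetical counts. Cell d (both examiners negative) never enters the calculation, which is why the index does not use all available data.

```python
def dice_concordance(a, b, c):
    """D = 2a / (2a + b + c); a = both positive, b and c = discordant cells."""
    return 2 * a / (2 * a + b + c)

# Hypothetical counts for "decayed": 12 teeth scored decayed by both examiners,
# 3 by examiner A only and 5 by examiner B only.
d_index = dice_concordance(a=12, b=3, c=5)
print(d_index)
```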
KAPPA (κ) STATISTIC Kappa (Cohen, 1960) is a measure of agreement that can be calculated between a pair of examiners (examiner and gold-standard examiner) and that takes chance agreement into account. It reflects the chance-corrected proportional agreement. It may involve a comparison at surface or tooth level, or even on aggregate indices (e.g. DMF). It may also include all possible codes for a condition, as well as different groupings of data (flexibility in application).
KAPPA CALCULATION
                        Examiner 1
                        Sound    Caries   Total
Examiner 2   Sound      a        b        a+b
             Caries     c        d        c+d
             Total      a+c      b+d      n
κ = (Po - Pe) / (1 - Pe), where Po = (a + d) / n and Pe = [(a + c) × (a + b) + (b + d) × (c + d)] / n². Po reflects the proportion of observed agreement and Pe the proportion of agreement that could be expected by chance.
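The calculation can be checked with a short sketch that uses the cell labels a, b, c, d from this slide and the standard Cohen's kappa formula, (Po - Pe) / (1 - Pe). The counts are hypothetical.

```python
def cohen_kappa_2x2(a, b, c, d):
    """Cohen's kappa for a 2x2 table (a, d = agreements; b, c = disagreements)."""
    n = a + b + c + d
    p_o = (a + d) / n                                      # observed agreement
    p_e = ((a + c) * (a + b) + (b + d) * (c + d)) / n**2   # agreement expected by chance
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)

# Hypothetical tooth-level counts: 40 sound/sound, 45 caries/caries, 15 disagreements.
kappa = cohen_kappa_2x2(a=40, b=5, c=10, d=45)
print(round(kappa, 3))
```

Here observed agreement is 0.85 and chance agreement is 0.50, so kappa is 0.70, noticeably lower than the raw 85% agreement.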
Kappa does NOT take into account the degree of disagreement. For ordinal variables, it is preferable to use the weighted Kappa, which assigns weights to disagreements according to the magnitude of the discrepancy (the closer to the diagonal, the better). Kappa and weighted Kappa represent the best approach to measuring variability; even so, "statistics cannot provide a simple substitute to clinical judgement" (Altman, 1991).
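A minimal sketch of a weighted Kappa is given below. The slide does not say which weighting scheme is meant; linear disagreement weights |i - j| are assumed here, and quadratic weights are another common choice. The function name and the ordinal codes (0, 1, 2) are hypothetical.

```python
def weighted_kappa(ratings_a, ratings_b, n_categories):
    """Weighted kappa with linear disagreement weights |i - j| (an assumption).

    Ratings are ordinal codes 0 .. n_categories - 1 for the same subjects.
    """
    n = len(ratings_a)
    k = n_categories
    observed = [[0] * k for _ in range(k)]
    for i, j in zip(ratings_a, ratings_b):
        observed[i][j] += 1
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    # Weighted observed disagreement versus weighted disagreement expected by chance.
    num = sum(abs(i - j) * observed[i][j] for i in range(k) for j in range(k))
    den = sum(abs(i - j) * row[i] * col[j] / n for i in range(k) for j in range(k))
    return 1 - num / den

gold = [0, 0, 1, 1, 2, 2]
near_miss = [0, 1, 1, 1, 2, 2]   # one disagreement, one category away
far_miss = [0, 2, 1, 1, 2, 2]    # one disagreement, two categories away
kw_near = weighted_kappa(gold, near_miss, 3)
kw_far = weighted_kappa(gold, far_miss, 3)
print(kw_near, kw_far)
```

Both examiners disagree with the gold standard exactly once, but the disagreement that is further from the diagonal gives the lower weighted Kappa. With only two categories the weights reduce to 0 and 1, and the result equals the unweighted Kappa.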
KAPPA INTERPRETATION (Landis and Koch, 1977)
Strength of agreement    Value of κ
Poor                     <0.20
Fair                     0.21-0.40
Moderate                 0.41-0.60
Good                     0.61-0.80
Very good                0.81-1.00
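The bands on this slide can be encoded as a simple lookup. The function name is hypothetical, and two assumptions are made because the printed bands leave small gaps: each upper bound is treated as inclusive, and a value of exactly 0.20 is counted as "Poor".

```python
def kappa_strength(kappa):
    """Map a kappa value to the strength-of-agreement label from the slide."""
    if kappa > 1.0:
        raise ValueError("kappa cannot exceed 1")
    bands = [(0.20, "Poor"), (0.40, "Fair"), (0.60, "Moderate"),
             (0.80, "Good"), (1.00, "Very good")]
    for upper, label in bands:
        if kappa <= upper:       # upper bounds treated as inclusive (assumption)
            return label

print(kappa_strength(0.70))
```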
CORRELATION Correlation is an expression of how much two variables vary together; it does not reflect their proximity to 1:1 correspondence. Correlation is a measure of the strength of the association between two variables, not of their agreement. Consequently, correlation should be avoided in the analysis of a calibration exercise.
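A small sketch makes the distinction concrete. The data are hypothetical: an examiner who scores every subject exactly two units above the gold standard is perfectly correlated with it, yet never agrees with it.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

gold = [0, 1, 2, 3, 4, 5]                 # hypothetical DMFT scores
examiner = [score + 2 for score in gold]  # systematic over-scoring by 2
r = pearson_r(gold, examiner)
exact_agreements = sum(g == e for g, e in zip(gold, examiner))
print(round(r, 6), exact_agreements)
```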
TRAINING AND CALIBRATION KEY POINTS Use the minimum number of examiners in surveys. Carry out the training and calibration exercise at baseline and repeat it at later stages. Follow standardised procedures and agreed criteria. Include a sufficient number of cases in calibration, so as to cover a wide range of diagnostic decisions. Determine the key clinical variables and the appropriate data grouping to be included in the calibration exercise. Calculate and interpret Kappa scores. Re-calibrate or exclude outliers. Plan repeat examinations during the survey.