Virginia Commonwealth University: Identifying Threats to Validity


1 Virginia Commonwealth University
Identifying Threats to Validity
Critical appraisal skills depend upon identifying threats to validity and whether appropriate remedies were employed.
Al Best, PhD
Perkinson 3100B

2 Goals
Be able to answer four questions:
Based on the study design, what is the level of evidence?
How were threats to validity addressed?
Based on the goals of the study, how do you describe the results?
To justify the conclusions, were comparisons done appropriately?

3 Overview
Threats to validity
– Bias
– Confounding
– Chance
– Multiplicity
Some solutions
– Study design
– Randomization
– Masking (AKA blinding)
– Analysis
Analysis
– Descriptive stats
– SD vs SE
– T-test and ANOVA
– Statistical significance vs clinical importance
– Ordinal data and nonparametric stats
– Correlation
– Survival analysis
Did the paper do the right stats?

4 Bias
Definition, Bias:
– Systematic distortion of the estimated intervention effect away from the “truth”
– Caused by inadequacies in the design, conduct, or analysis of a trial
Selection bias
Measurement bias

5 Selection bias
Definition: Bias from the use of a non-representative group as the basis of generalization to a broader population
Example: Estimate prognosis from patients newly diagnosed and infer to patients hospitalized with the disease
– Newly diagnosed patients have a much broader spectrum of outcomes

6 Selection bias
Selection bias?
– How are patients allocated to intervention groups?
– How are exposure groups identified?
Patients across time:
– Groups comparable at baseline?
– Similar follow-up? Similar dropout?
– ALL subjects analyzed? (NOT only the completers!)

7 Measurement (information) bias
Definition, Measurement bias:
– Systematic failure of a measurement process to accurately represent the measurement target
Examples:
– Different approaches to questioning when determining past exposures in a case-control study
– More complete medical history and physical examination of subjects who have been exposed to an agent suspected of causing a disease than of those who have not been exposed to the agent

8 Measurement bias?
NHANES III: “We estimate that at least 35% of the dentate US adults aged 30 to 90 have periodontitis” [1]
– Mesial and buccal surfaces
– Two randomly selected quadrants
– CAL ≥ 3 mm
Or: Full-mouth prevalence = 65% [2]
[1] JM Albandar, JA Brunelle, A Kingman (1999) "Destructive periodontal disease in adults 30 years of age and older in the United States, ". Journal of Periodontology 70 (1): 13–29.
[2] A Kingman & JM Albandar (2002) “Methodological aspects of epidemiological studies of periodontal diseases.” Periodontology , 11–30.

9 Confounding
Informal definition: distortion of the true biologic relation between an exposure and a disease outcome of interest
Usually due to a research design and analysis that fail to account for additional variables associated with both
– Such variables are referred to as confounders or as lurking variables
– Look for factors associated with the outcome and with the exposure

10 Confounding examples
Misidentified carcinogen
Prior to the discovery of HPV, HSV-2 was associated with cervical cancer
It is now well established that HPV is central to the pathogenesis of invasive cervical cancer, and HSV-2 appears to increase the risk
Hawes & Kiviat (2002) Are Genital Infections and Inflammation Cofactors in the Pathogenesis of Invasive Cervical Cancer? JNCI 94(21):

11 Perio and CVD
Cigarette smoking is associated with adult perio and CVD
This produces an association between perio and CVD
Control for smoking to see the perio-CVD relationship clearly
Scannapieco et al. (2003) Associations Between Periodontal Disease and Risk for Atherosclerosis, Cardiovascular Disease, and Stroke. A Systematic Review. Annals of Periodontology (8)38-53
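
The idea of "controlling for smoking" can be sketched with a stratified comparison. The counts below are entirely hypothetical (they are not from Scannapieco et al.) and were chosen so that smoking is linked to both perio and CVD; the helper name `risk_ratio` is also just an illustration. Under those assumptions the crude comparison shows an association that disappears within each smoking stratum.

```python
def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk of CVD with perio divided by risk of CVD without perio."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)

# HYPOTHETICAL counts per stratum:
# (CVD cases with perio, n with perio, CVD cases without perio, n without perio)
strata = {
    "smokers": (24, 80, 6, 20),
    "nonsmokers": (2, 20, 8, 80),
}

# Crude analysis: pool the strata and ignore smoking.
crude_counts = tuple(sum(counts[i] for counts in strata.values()) for i in range(4))
crude_rr = risk_ratio(*crude_counts)

# Stratified analysis: compare perio vs no perio within each smoking group.
stratum_rr = {name: risk_ratio(*counts) for name, counts in strata.items()}

print("crude RR:", round(crude_rr, 2))
for name, rr in stratum_rr.items():
    print(name, "RR:", round(rr, 2))
```

In real data the stratum-specific estimates rarely agree this neatly; the sketch only shows why a crude association and an adjusted one can differ.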

12 Chance
Begin by assuming: No relationship. No difference. No change. The intervention has no effect. The exposure changes nothing.
Ask: “I assume no effect; are the data consistent with this?”
– The p-value answers this question.
Decision rule: p-value < 0.05 means data this extreme would be unlikely if chance alone were operating.
– A license to make up a story
P-value > 0.05 means there is no story
– It does NOT mean that the study demonstrated no relationship, no difference, no change.

13 Multiplicity
Outcome: Caries
– Visual exam
– X-ray interpretation
– Fiber optic transillumination
– Electrical caries meter
– DiagnoDent
Outcome: Periodontology
– Alveolar bone loss
– Clinical attachment level
– Pocket depth
Clustered data in independent subjects
– Teeth
– Tooth surfaces
– Restorations
– Implants
Hannigan A, Lynch CD. Statistical methodology in oral and dental research: pitfalls and recommendations. J Dent May;41(5): pubmed/

14 Multiplicity effects
N=47 perio, N=20 healthy
Analyzed for the presence of 300 species

15 Identify multiplicity effects
Multiple outcomes: The proliferation of possible comparisons in a trial. Common sources of multiplicity are:
– Multiple outcome measures, assessment at several time points, subgroup analyses, or multiple intervention groups
Multiple comparisons: Performance of multiple analyses on the same data. Multiple statistical comparisons increase the probability of a type I error: “finding” an association when there is none.

16 Identify multiplicity effects
Analysis of the same variable at multiple time points after treatment initiation
Periodic analysis of accumulating partial results
Post hoc subgroup comparisons are especially likely not to be confirmed in following studies
Bottom line: With every additional comparison, the chance of at least one false positive goes up; for k independent comparisons at level α it is 1 − (1 − α)^k.
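
A minimal sketch of how quickly the false-positive risk grows, assuming the comparisons are independent and each is tested at the same alpha (real outcomes in one study are usually correlated, so this is only a rough guide). The function names are illustrative, and the Bonferroni threshold is shown as one common remedy that the slides themselves do not name.

```python
def familywise_error_rate(alpha: float, k: int) -> float:
    """Chance of at least one false positive across k independent tests."""
    return 1.0 - (1.0 - alpha) ** k

def bonferroni_threshold(alpha: float, k: int) -> float:
    """Per-test threshold that keeps the overall error rate at or below alpha."""
    return alpha / k

# 300 matches the number of species screened on the "Multiplicity effects" slide.
for k in (1, 5, 20, 300):
    print(k, round(familywise_error_rate(0.05, k), 3), bonferroni_threshold(0.05, k))
```

With 20 comparisons at alpha = 0.05, the chance of at least one spurious "finding" is already well over one half.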

17 Stats 3: Overview
Threats to validity
– Bias
– Confounding
– Chance
– Multiplicity
Some solutions
– Study design
– Randomization
– Masking (AKA blinding)
– Analysis

18 Study Design
“A justification for the sample size used in the study should be given. Baseline characteristics of the study groups should be compared and information given on non-response and dropouts.”
Hannigan A, Lynch CD. Statistical methodology in oral and dental research: pitfalls and recommendations. J Dent May;41(5): pubmed/

19 Power and Sample Size
But first, backing up.
What is the definition of significance level (alpha)?
– It is the probability of rejecting a true null hypothesis.
What is the definition of a p-value?
– The p-value is the probability of observing data at least as extreme as those actually seen, assuming the null hypothesis is true.
– The p-value is NOT the probability that the null hypothesis is true.
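
One way to make the p-value definition concrete is a permutation test: if the null hypothesis is true, group labels are interchangeable, so shuffling them shows how often chance alone gives a difference at least as large as the one observed. This is a sketch only; the function name, the measurements, and the number of shuffles are all assumptions made for illustration.

```python
import random

def permutation_p_value(group_a, group_b, n_shuffles=2000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def abs_mean_diff(values):
        a, b = values[:n_a], values[n_a:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = abs_mean_diff(pooled)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # relabel subjects as if group made no difference
        # Small tolerance guards against floating-point rounding in the sums.
        if abs_mean_diff(pooled) >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (extreme + 1) / (n_shuffles + 1)

# HYPOTHETICAL pocket-depth measurements (mm) for two groups.
p = permutation_p_value([3.1, 2.8, 3.5, 3.0, 2.9], [3.9, 4.2, 3.8, 4.4, 4.0])
print("permutation p-value:", p)
```

The result is a statement about the data under the "no effect" assumption, not the probability that the assumption is true.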

20 Trade offs
Alpha = Type I error = probability of rejecting a true null hypothesis
Truth | Conclusion: do not reject null hypothesis (p-value > .05) | Conclusion: reject null hypothesis (p-value < .05)
Null hypothesis (no difference) | correct | Type I error
Alternative hypothesis (difference) | Type II error | correct

21 Trade offs
Beta = Type II error = probability of not finding a true difference
Power = probability of rejecting H0 when it is false.
Power = probability of finding a true difference.
Truth | Conclusion: do not reject null hypothesis (p-value > .05) | Conclusion: reject null hypothesis (p-value < .05)
Null hypothesis (no difference) | correct | Type I error
Alternative hypothesis (difference) | Type II error | correct

22 Power
Power = probability of finding a true difference.
Power depends upon:
– The size of the difference

23 Power
Power = probability of finding a true difference.
Power depends upon:
– The size of the difference
– Measurement variability
– Sample size
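
The three ingredients of power can be seen in a rough calculation. The sketch below assumes a two-group comparison of means with equal group sizes, a common known SD, and a Normal approximation (it ignores the negligible chance of rejecting in the wrong direction); `approx_power` is an illustrative name, not a function from any statistics package.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(diff, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means."""
    z = NormalDist()  # standard Normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)      # about 1.96 when alpha = 0.05
    se_diff = sd * sqrt(2 / n_per_group)   # SE of the difference in means
    return z.cdf(abs(diff) / se_diff - z_crit)

# Vary one ingredient at a time (all values hypothetical).
print("baseline      :", round(approx_power(diff=0.5, sd=1.0, n_per_group=64), 2))
print("bigger effect :", round(approx_power(diff=1.0, sd=1.0, n_per_group=64), 2))
print("noisier data  :", round(approx_power(diff=0.5, sd=2.0, n_per_group=64), 2))
print("smaller sample:", round(approx_power(diff=0.5, sd=1.0, n_per_group=16), 2))
```

Power rises with a larger true difference and a larger sample, and falls as measurement variability grows.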

24 Randomization: What it is
Randomization: assignment of treatments to patients (equivalently, patients to treatments) based on chance
Can take many different forms, all acceptable
– The simplest is a coin-flip for each patient
Look for exactly HOW randomization happened
– An explicit description is required
– If the paper does not SAY random assignment was done, it wasn’t.
Note: Don’t confuse “random selection of subjects” with “random assignment to treatments”
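
Two of the "many different forms" can be sketched in a few lines. The coin-flip version is the one named on the slide; permuted-block randomization is added here only as an example of another form and is not mentioned in the slides. Function names, arm labels, and the block size are assumptions for illustration.

```python
import random

def simple_randomization(n_patients, seed=None):
    """Coin flip for each patient: arm A or arm B with equal probability."""
    rng = random.Random(seed)
    return [rng.choice(["A", "B"]) for _ in range(n_patients)]

def block_randomization(n_patients, block_size=4, seed=None):
    """Permuted blocks: each block holds equal numbers of A and B, shuffled."""
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for two arms")
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n_patients:
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        rng.shuffle(block)
        assignments.extend(block)
    return assignments[:n_patients]

print(simple_randomization(8, seed=1))
print(block_randomization(8, block_size=4, seed=1))
```

A coin flip can leave the arms unequal in size by chance, while permuted blocks keep them balanced after every completed block.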

25 Randomization: Why?
What it accomplishes
– Virtually eliminates opportunities for intentional or inadvertent skewing of patient allocation to favor a treatment
– Eliminates other selection biases of all sorts affecting treatment comparisons, period!
– Tends to protect against confounding
But
– Cannot assure comparable groups
– Randomize after recruitment and consent
– No effect on measurement bias or placebo effect

26 Blinding, AKA: Masking
Masking and Blinding refer to concealment of the randomized intervention received by a patient. Who may be blind:
– Case/patients/participants
– Interventionists, those treating participants
– Those measuring outcomes: Clinicians and technicians who do not treat case/patients, but are involved in evaluating their outcomes
– Investigators involved in decision-making about policies during the trial, and about statistical analyses to interpret the resulting data

27 Double? Blind
“Blinding is intended to prevent bias on the part of study personnel. The most common application is double-blinding, in which participants, caregivers, and outcome assessors are blinded to intervention assignment.”
Altman, et al. (2001) The revised CONSORT statement for reporting randomized trials: Explanation and elaboration. Annals of Internal Medicine, 134(8),

28 Does the paper say blinding occurred?
Needs to be explicit
– Which of the trial participants were masked, and how treatment was concealed
– Understand what the blinding accomplished
Blinded measurement directly and totally protects against “diagnostic suspicion bias,” a skewing by treatment-influenced expectations
Look for differential dropouts
– as “uncooperative” patients get less social support for returning for follow-up

29 Analysis: Overview
Threats to validity
– Bias
– Confounding
– Chance
– Multiplicity
Some solutions
– Study design
– Randomization
– Masking (AKA blinding)
– Analysis
Analysis
– Descriptive stats
– SD vs SE
– T-test and ANOVA
– Statistical significance vs clinical importance
– Ordinal data and nonparametric stats
– Correlation
– Survival analysis
Did the paper do the right stats?

30 Quantitative Data
Continuous type
– Age, duration of disease, roughness, level, color change
Discrete type (count data)
– dmfs, dmft, # involved surfaces, # bleaching treatments

31 Describe Quantitative Data
We describe numeric data by:
Measures of centrality
– AKA: typical value, location
– Mean, median
Measures of spread
– Standard deviation, range
Shape of distribution
– Normal
– Skewed

32 Descriptive Statistics
Both a measure of centrality and a measure of variability are required to describe a set of numeric data e.g. mean (SD=) or median (first quartile, third quartile). The standard deviation is only appropriate for use with the mean. The mean and the median should be routinely compared to investigate the impact of outliers.
Interpretations
– About 95% of the individuals are within 2 SD of the mean (when the data are approximately Normal)
– 50% of the individuals are between the 25th percentile and the 75th percentile
SD = square root (average squared deviation from the mean)
Hannigan A, Lynch CD. Statistical methodology in oral and dental research: pitfalls and recommendations. J Dent May;41(5): pubmed/
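
A short sketch of these summaries using Python's standard `statistics` module. The scores are hypothetical and include one extreme value so the mean and median can be compared. Note that `stdev` is the sample SD, which divides by n − 1 rather than n, and that quartile conventions differ slightly between software packages.

```python
from statistics import mean, median, quantiles, stdev

def describe(values):
    """Report centre and spread together, as the slide recommends."""
    q1, _, q3 = quantiles(values, n=4)  # three quartile cut points
    return {
        "mean": mean(values),
        "sd": stdev(values),  # sample SD (divides by n - 1)
        "median": median(values),
        "q1": q1,
        "q3": q3,
    }

# HYPOTHETICAL scores: the single value of 100 is an outlier.
scores = [1, 2, 3, 4, 100]
summary = describe(scores)
print(summary)
```

Here the mean (22) sits far from the median (3), which is the signal that an outlier or skew is driving the mean and that median with quartiles is the safer description.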

33 The median is little affected by extreme observations
[Figure: three distributions with the same median but different means.]


35 Example: Henson, et al.
The purpose of this study was to determine whether dental esthetics influenced the perceptions of teens when judging a peer’s athletic, social, leadership, and academic abilities.
Methods: The frontal-face smiling photographs of 10 teenage volunteers were each altered to create 1 image with an ideal arrangement of teeth and 1 with a nonideal arrangement. Two parallel surveys were constructed with 1 photo displaying either an ideal or a nonideal smile image of each subject. If the ideal smile image appeared in one survey, then the nonideal smile appeared in the other. N=221 peer evaluators rated the pictures.
(Am J Orthod Dentofacial Orthop 2011;140:389-95)

36 Influence of dental esthetics on social perceptions of adolescents judged by peers
ST Henson, SJ Lindauer, WG Gardner, B Shroff, E Tufekci, and AM Best

37 SE = standard error of the estimate
SE = SD/√n
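
The distinction between SD and SE can be sketched numerically: SD describes the spread of individuals, while SE describes the uncertainty in the estimated mean and shrinks as n grows. The ratings below are hypothetical, and the interval uses the Normal multiplier 1.96, which is only an approximation for small samples (a t multiplier would be wider).

```python
from math import sqrt
from statistics import mean, stdev

def standard_error(values):
    """SE of the mean: SD divided by the square root of the sample size."""
    return stdev(values) / sqrt(len(values))

def approx_ci95(values):
    """Approximate 95% confidence interval for the mean (Normal multiplier)."""
    m, se = mean(values), standard_error(values)
    return m - 1.96 * se, m + 1.96 * se

# HYPOTHETICAL ratings.
ratings = [2, 4, 4, 4, 5, 5, 7, 9]
print("SD:", round(stdev(ratings), 3))
print("SE:", round(standard_error(ratings), 3))
print("approx. 95% CI:", tuple(round(x, 2) for x in approx_ci95(ratings)))
```

A paper that reports SE where SD is expected makes the data look less variable than they are, so check which one an error bar shows.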

38 Describing Average Data
Fig 2. Ratings for perceived social characteristics between ideal and non-ideal smiles.

39 Describing Numeric Data
Distribution of data from the two parallel surveys.
Visual analog scale; 50=neutral, 0=disagree, 100=agree
“This person is a leader”
[Figure: boxplot showing the 75th percentile, median, 25th percentile, and whiskers.]

40 Test Statistic

41 Student’s T
The T distribution was discovered by William Gosset, who was employed by the Guinness brewery. He used the pseudonym “Student” in his paper describing his result because of the company policy prohibiting publication.

42 ANOVA
A t-test is used when comparing (only) two groups.
When more than two groups are compared, or comparisons are using multiple classification variables, use Analysis of Variance.
Example: in the AJODO paper we tested whether the mean VAS was different across:
– Evaluator’s sex, and race, and
– Picture’s sex, race, and “ideal smile vs. non-ideal”

43 Statistical Significance vs. Clinical Importance
Stats: The difference is larger than expected by chance.
Clinical: The difference is large enough to matter.
– Look at the CIs

44 Parametric testing vs. Nonparametric testing
Recall: Populations have parameters and we use sample data to estimate them
Parametric tests assume that the data are Normally distributed.
Nonparametric tests do not make this assumption. The data are treated as ranks (ordinal data) and the distributions are compared.

45 Normal Distribution?
Not every measure has a Normal distribution
Some are highly skewed (i.e., a few very large values)
Restricted range (e.g., no zero values)
Examples:
– Triglyceride
– Microbial counts
– dmfs/DMFS scores
– Shear strength (breaking strength)

46 Normal Distribution?
[Figure: CFU/ml Enterococcus faecalis. Panels: Control; Sodium Hypochlorite, 1 min. Green = Normal distribution, Red = log normal.]
JP Coudron (2012) MSD Thesis

47 How we Measure Statistical Associations
Associations are what we observe, as
– Differences or ratios of: means or medians; proportions, odds, rates
– Correlation, regression coefficients
– Slopes of trends in statistical models
Causation → association, but not the other way around
No measure of association, in itself, implies causation

48 Relationships
We visualize the relationship between two numeric variables using a scatterplot
We summarize the strength and direction of a linear relationship using a correlation
– Pearson’s correlation coefficient, r
– r = 0 means no linear relationship
– r = +1 means a perfect positive relationship
– r = –1 means a perfect negative relationship
– r has no units.
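
The properties listed for r can be checked with a small hand-written calculation. This is a sketch of the textbook formula with made-up numbers; `pearson_r` is an illustrative name.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4]
print(pearson_r(x, [2, 4, 6, 8]))                      # perfect positive line
print(pearson_r(x, [8, 6, 4, 2]))                      # perfect negative line
print(pearson_r([10 * v for v in x], [2, 4, 6, 8]))    # rescaling x leaves r unchanged
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))   # strong curve, no linear trend
```

The last case is the reason to look at the scatterplot as well: r near 0 rules out a linear relationship, not a relationship.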


50 Survival Analysis
OBJECTIVE: To assess the predictors of implant failure after grafted maxillary sinus (GMS). METHODS: A total of 1045 implants were inserted in 224 patients/347 GMS during a period of 14 years. Kaplan-Meier and Cox proportional hazards analysis were used to assess the following variates: …, auto/allo/xenogenic bone grafts, … RESULTS: Significant implant failure predictors were the graft material (HR = 4.7), with superior results for autogenic bone, …
In highly atrophic situations, autogenic bone grafts showed superiority; however, in less atrophic cases, nonautogenic bone-grafts are equivalent.
Zinser, et al. The predictors of implant failure after maxillary sinus floor augmentation and reconstruction: a retrospective study of 1045 consecutive implants. OOOO. (2013)115(5): pubmed/
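
A minimal sketch of the Kaplan-Meier idea named in the abstract: at each failure time, multiply the running survival probability by the fraction of units still at risk that did not fail, while censored units simply leave the risk set. The follow-up times below are hypothetical and this is not the analysis from Zinser et al.; it also treats every implant as independent, which ignores clustering of implants within patients.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each observed failure time.

    times  : follow-up time for each implant
    events : 1 if the implant failed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            failures += data[i][1]
            leaving += 1
            i += 1
        if failures > 0:
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # failed and censored units both leave the risk set
    return curve

# HYPOTHETICAL follow-up in years; 0 marks an implant still in place at last visit.
curve = kaplan_meier(times=[1, 2, 3, 4, 5], events=[1, 0, 1, 0, 1])
for t, s in curve:
    print(t, round(s, 3))
```

Censored implants still contribute information up to their last visit, which is why survival methods are preferred to a simple proportion of failures.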

51 Survival after auto/allo/xenogenic bone grafts
“In highly atrophic situations, autogenic bone grafts showed superiority; however, in less atrophic cases, nonautogenic bone-grafts are equivalent.”

52 Recall: Data classification
Type of data | Distinguishing characteristics | Examples
Discrete or qualitative | Observations grouped into distinct classes |
Nominal | Classes without a natural order or rank | Sex, treatment group, presence or absence
Ordinal | Classes with a predetermined or natural order | Disease severity, bone density, plaque accumulation, bleeding

53 Data classification
Type of data | Distinguishing characteristics | Examples
Continuous or quantitative (numeric) | Observation may assume any value on a continuous scale |
Interval | Numeric value with equal unit differences; arbitrary zero | Temperature, GPA
Time to event | Survival analysis, censored observations | Restoration survival time, implant success

54 Which statistical method?
How to decide if the correct statistical test was used? Questions are of the form:
For ___ response variable, is there a relationship with ___ predictor variable?
For ___ response variable, is there a difference between the groups identified by the ___ predictor variable?
See the “decision matrix” and presentation online.

55 Goals
Be able to answer four questions:
Based on the study design, what is the level of evidence?
How were threats to validity addressed?
Based on the goals of the study, how do you describe the results?
To justify the conclusions, were comparisons done appropriately?

56 “… significant linear correlation between chocolate consumption per capita and the number of Nobel laureates per 10 million persons …”
Messerli FH. Chocolate consumption, cognitive function, and Nobel laureates. N Engl J Med Oct 18;367(16): PubMed:

