Find the Joy in Stats ? ! ? Walt Senterfitt, Ph.D., PWA Los Angeles County Department of Public Health and CHAMP
Introduction: Stats are tools to help describe, understand and assess research results. Like other tools, they can be used properly and appropriately, or they can be misused. They have no meaning by themselves and are not a gold standard for determining truth or falsehood.
Some key terms and concepts: observational vs. experimental study; effect measures or estimates: Relative Risk (RR) and Odds Ratio (OR) (there are others!); Confidence Intervals (CI); crude vs. adjusted RR and OR; p-values and tests of statistical significance.
Observational (Epidemiologic) Study Designs: Individuals are described and observed, and certain outcomes are measured. No attempt is made to affect the outcome. Example: studies that observed that circumcised men were less likely to be HIV-infected than uncircumcised men. Results are associations; causation can only be inferred.
Experimental Studies: Individuals are divided into 2 or more groups, usually randomly, that receive different interventions or treatments, e.g. drug vs. placebo. Specified outcomes are measured and compared. Example: randomized controlled trial of circumcision for HIV prevention.
Effect Measures: Most research seeks to offer evidence (note I don’t say “show” or “prove”) for or against a hypothesized effect of an intervention or exposure on an outcome. This is usually expressed by comparing the outcome in the exposed or treatment group vs. the unexposed, placebo or control group.
Relative Risk or Risk Ratio (RR): The probability of an event or outcome occurring in the exposed (treatment or intervention) group divided by the probability in the unexposed (control) group. Expressed as a ratio or fraction. Example: HIV incidence in the circumcision group was 2.1% vs. 4.2% in the control group. The crude ratio 2.1 / 4.2 is 0.50; the risk ratio reported by the trial was 0.47, which corresponds to a 53% reduction in risk.
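The arithmetic behind a risk ratio and the matching "percent reduction" can be sketched in a few lines. This is only an illustration using the rounded incidences quoted on the slide; the function names are made up for the example. Note that the rounded figures give a crude ratio of 0.50, a little different from the trial's reported 0.47.

```python
def risk_ratio(risk_exposed, risk_unexposed):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return risk_exposed / risk_unexposed


def relative_risk_reduction(rr):
    """Proportional reduction in risk implied by a risk ratio below 1."""
    return 1.0 - rr


# Rounded 2-year incidences from the slide: 2.1% vs. 4.2%.
crude_rr = risk_ratio(0.021, 0.042)

# The trial's own reported risk ratio (taken from the slide, not recomputed here).
reported_rr = 0.47
reported_reduction = relative_risk_reduction(reported_rr)

print(f"crude RR from rounded incidences: {crude_rr:.2f}")
print(f"reduction implied by reported RR: {reported_reduction:.0%}")
```

A ratio below 1 means lower risk in the exposed group; a ratio above 1 means higher risk.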
Odds Ratio (OR): The ratio of the odds of an event or outcome occurring in the exposed group to the odds in the unexposed or control group. “Odds” is used a little differently from most ordinary language, but similarly to betting odds, say at a horse race: 3:2 odds means a 60% chance of winning and a 40% chance of not winning, so the odds are 0.6/0.4 = 1.5. The OR is the ratio of two such odds.
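Converting between odds and probability is a one-line formula in each direction. The sketch below uses the 3:2 horse-race example from the slide; the helper names are hypothetical, and exact fractions are used so the results are not blurred by rounding.

```python
from fractions import Fraction


def probability_from_odds(odds):
    """Odds of a:b in favour correspond to a probability of a / (a + b)."""
    return odds / (1 + odds)


def odds_from_probability(p):
    """Probability p corresponds to odds of p / (1 - p)."""
    return p / (1 - p)


betting_odds = Fraction(3, 2)                 # "3:2" odds
p_win = probability_from_odds(betting_odds)   # chance of winning
odds_back = odds_from_probability(p_win)      # round trip back to the odds

print(p_win, odds_back)
```

Note that 1.5 here is a single odds value; an odds ratio needs two of them, one per group.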
Odds Ratios vs. Risk Ratio: For both, a value of 1.0 means “no effect.” RR is more intuitive, like the way people think, and thus easier to explain. OR has some properties that make it easier to work with statistically, especially for adjustments to account for the influence of other variables.
More on RR vs OR: Both can be expressed exposed/unexposed or unexposed/exposed, so be sure you understand the framing. If an undesirable outcome has less risk or lower odds in the exposed (treatment) group, we can say that the treatment was “protective.” OR tends to be close to the RR if the outcome of interest is fairly rare (<10%, say) BUT …..
Risk/Odds of Death on Titanic, Males vs. Females: Risk Ratio vs. Odds Ratio
          Alive   Dead   Total
Female      308    154     462
Male        142    709     851
Total       450    863   1,313
…if the outcome is not rare: Risk of death for males was 709/851 = 0.83. Risk of death for females was 154/462 = 0.33. RR = 0.83/0.33 = 2.5. Odds of death for males = 709:142, about 5:1 (4.99). Odds of death for females = 154:308 = 1:2 (0.5). OR = 4.99/0.5 = 9.99. Lesson: an OR of about 10 does not mean males were 10 times as likely to die. The OR approximates the RR only when the outcome is rare, so make sure odds ratios are interpreted accordingly.
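The Titanic figures can be reproduced directly from the 2x2 table. The sketch below uses only the counts given on the slide; the dictionary layout and function names are choices made for this illustration.

```python
# Counts from the Titanic slide (alive / dead by sex).
titanic = {
    "female": {"alive": 308, "dead": 154},
    "male": {"alive": 142, "dead": 709},
}


def risk_of_death(row):
    """Deaths divided by everyone in the group."""
    return row["dead"] / (row["dead"] + row["alive"])


def odds_of_death(row):
    """Deaths divided by survivors in the group."""
    return row["dead"] / row["alive"]


rr = risk_of_death(titanic["male"]) / risk_of_death(titanic["female"])
odds_ratio = odds_of_death(titanic["male"]) / odds_of_death(titanic["female"])

print(f"RR (male vs. female) = {rr:.2f}")
print(f"OR (male vs. female) = {odds_ratio:.2f}")
```

Because death was a common outcome here (about two thirds of those counted), the two measures are far apart: roughly 2.5 versus roughly 10.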
Crude vs. adjusted RR and OR: In observational studies especially, there are other variables, besides the main one of interest, that may well affect the outcome. The effect of an exposure may be different in, say, certain age, gender, race/ethnic or social class groups … and the proportionate mix of these categories in a given sample may be incidental.
Crude vs. Adjusted (more): To isolate or highlight the effect of the main exposure variable of interest, we can statistically “adjust” the OR or RR to “control for the effect of” certain other variables. For instance, we can statistically adjust the observed results to see what the effect would have been if the overall sample were all of the same age; then what we have is an RR or OR “adjusted for age differences” or “controlling for variations in age.”
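One common way to adjust an odds ratio for a categorical variable such as age group is the Mantel-Haenszel estimator, which combines the 2x2 tables from each stratum. The slides do not say which adjustment method is meant, so this is only one possible illustration. The counts below are entirely hypothetical, chosen so that the exposure has no effect within either age group while the crude (pooled) table suggests a strong one.

```python
# HYPOTHETICAL counts, invented for illustration only.
# Each stratum: (exposed events, exposed non-events,
#                unexposed events, unexposed non-events)
strata = {
    "younger": (10, 90, 40, 360),
    "older": (200, 200, 50, 50),
}


def odds_ratio(a, b, c, d):
    """Odds ratio of a single 2x2 table."""
    return (a * d) / (b * c)


def crude_odds_ratio(tables):
    """Pool all strata into one table and ignore the stratifying variable."""
    a, b, c, d = (sum(col) for col in zip(*tables))
    return odds_ratio(a, b, c, d)


def mantel_haenszel_odds_ratio(tables):
    """Weighted combination of the stratum-specific odds ratios."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator


crude_or = crude_odds_ratio(strata.values())
adjusted_or = mantel_haenszel_odds_ratio(strata.values())

print(f"crude OR    = {crude_or:.2f}")
print(f"adjusted OR = {adjusted_or:.2f}")
```

In this made-up data the older group is both more exposed and more likely to have the outcome, so the crude OR is well above 1 even though the age-adjusted OR is 1.0.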
Confidence Intervals (CI): Effect measures are first expressed, as in the examples above, as “point estimates,” as in “the risk of HIV in the circumcision group was only 0.47 times as great as in the control group” or “there was a 53% reduction in risk of HIV in the circumcision group as compared with the control group.” But there is always random error in measurement, no matter how carefully a study is done. It is unlikely we would get the exact same results again.
Confidence Intervals (continued): Thus, effect estimates are also presented with an “interval estimate,” which is a range around the point estimate. The true value of the effect “most likely” lies within this range. This range is called the confidence interval, and the upper and lower ends of the range are called the confidence limits. The range is determined in part by the confidence level, most typically set arbitrarily at 95%.
CI: example “During the study, seroconversion occurred in 22 participants in the circumcision group and 47 of those in the control group. The 2-year HIV incidence was 2·1% (95% CI 1·2—3·0) in the circumcision group and 4·2% (3·0—5·4) in the control group (p=0·0065); combined, it was 3·1% (2·4—3·9)....The risk ratio of HIV acquisition in the circumcision group compared with the control group was 0·47 (95% CI 0·28—0·78), which corresponds to a reduction in the risk of acquiring an HIV infection in the circumcision group of 53% (22%—72%).”
CI Interpretation: Tricky to be technically accurate and still be understood by anyone but statisticians! But it is not *too* bad to say “statistically, there is a 95% chance that the true effect is not outside this range.” The most important point is to remember that the point estimate is probably not exact. The CI depends on the particular statistical test used, the amount of random variation in the data collection process and the sample size.
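To make the dependence on sample size concrete, here is a sketch of one standard way to put a 95% confidence interval around a risk ratio: the large-sample interval computed on the log scale. It is applied to the Titanic counts from the earlier slide, because the slides do not give the group sizes of the circumcision trial. The second call uses a hypothetical sample four times as large with the same proportions, purely to show the interval narrowing.

```python
import math
from statistics import NormalDist


def risk_ratio_ci(events_exp, n_exp, events_unexp, n_unexp, level=0.95):
    """Risk ratio with a large-sample confidence interval on the log scale."""
    rr = (events_exp / n_exp) / (events_unexp / n_unexp)
    # Standard error of log(RR) for two independent proportions.
    se_log_rr = math.sqrt(
        1 / events_exp - 1 / n_exp + 1 / events_unexp - 1 / n_unexp
    )
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Titanic counts from the slide: deaths among 851 males vs. 462 females.
rr, lower, upper = risk_ratio_ci(709, 851, 154, 462)

# HYPOTHETICAL: the same proportions in a sample four times as large.
rr_big, lower_big, upper_big = risk_ratio_ci(4 * 709, 4 * 851, 4 * 154, 4 * 462)

print(f"observed counts: RR {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
print(f"4x sample size:  RR {rr_big:.2f}, 95% CI {lower_big:.2f} to {upper_big:.2f}")
```

The point estimate is identical in both cases; only the width of the interval changes.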
“Statistical Significance” and p-values: “Statistically significant” does not mean a result or inference is necessarily true or accurate, and “non-significant” does not mean an effect is necessarily not real. Significance tests are associated with p-values; a particular p-value is calculated from a particular statistical test on the observed data.
P-values: The p-value is the probability of obtaining the observed result, or one more extreme, if no real effect exists. Back to the circumcision trial example, the calculated p-value (on the Z-test of the difference in 2-year HIV incidence in the circumcision group, 2.1%, vs. control, 4.2%) was 0.0065.
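A p-value of this kind can be sketched with a basic two-proportion z-test. The event counts 22 and 47 come from the trial quotation in these slides, but the group sizes of 1,000 each are an assumption made purely for illustration, since the slides do not give them. The trial's own analysis also differs from this simple test, so the result will not match the published 0.0065.

```python
import math


def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)  # risk if no real effect exists
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Probability of a difference at least this extreme in either direction.
    return math.erfc(abs(z) / math.sqrt(2))


# 22 and 47 events are from the slides; n = 1000 per group is HYPOTHETICAL.
p_value = two_proportion_p_value(22, 1000, 47, 1000)

print(f"two-sided p-value (illustrative group sizes): {p_value:.4f}")
```

The pooled proportion is what encodes the "if no real effect exists" part of the definition: both groups are treated as sharing one underlying risk.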
Statistical significance: “Statistically significant” is usually applied to results where the calculated p-value is less than .05 (sometimes set at a different level), the flip side of a 95% confidence level or confidence interval. Sometimes researchers say “significant at the .05 level” or “highly significant” at “.01” or “less than .001,” etc. Sometimes researchers say, for p-values greater than .05 but less than .10, that there was “a trend toward significance.”
Dangers of Significance Testing: Significance tests provide an arbitrary but accepted way to compare the relative strength of associations between exposure and observed effect across studies, and help make objective “decision rules.” But they are often misinterpreted and misused in ways that prevent us from applying critical judgment to make maximum use of information from studies.
Dangers, continued: A common misinterpretation of a p-value less than .05 is “that we can be 95% certain that the observed difference between groups, or the observed effect, is real and could not have happened by chance.” A p-value less than .05 allows us to say that there is strong evidence against there being no true effect (aka “for rejecting the null hypothesis”), but it does NOT prove that the alternative is true, i.e. that the observed effect is true and real.
More dangers: Statistical significance does not necessarily equal clinical or “real world” significance. Very large sample sizes can render almost any difference statistically significant. Focusing only on statistical significance can lead us to ignore some real effects and exaggerate others. Significance tests are only necessary when there’s a close call!
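The effect of sample size on significance can be shown with entirely hypothetical numbers: the same small difference (10.0% vs. 10.5%) is tested once with 1,000 people per group and once with 1,000,000 per group, using a simple pooled two-proportion z-test. Both the counts and the choice of test are assumptions made for this sketch.

```python
import math


def two_sided_p(events_a, n_a, events_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (events_a / n_a - events_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


# HYPOTHETICAL data: identical proportions (10.0% vs. 10.5%), different sizes.
p_small_study = two_sided_p(100, 1_000, 105, 1_000)
p_huge_study = two_sided_p(100_000, 1_000_000, 105_000, 1_000_000)

print(f"n = 1,000 per group:     p = {p_small_study:.3f}")
print(f"n = 1,000,000 per group: p = {p_huge_study:.3g}")
```

The half-point difference is the same in both studies. Whether it matters in the real world is a separate question that the p-value cannot answer.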
Conclusion: Question, question, question. Does the basic study design make sense? Is it asking an important question in the right way? Do the comparison groups seem equivalent? Are the observed differences likely to be significant in the real world? If the observed results are surprising, can you possibly explain the discrepancy or see what further studies are necessary? Replication and confirmation are usually key.