Subgroup Report 7/28/06
Our Aims Purpose of future work: write (at least) one paper describing the landscape of appropriate analytic options. Purpose of paper is to educate writers and readers of research papers with respect to assessing the ‘quality’ and validity of results. Purpose of this talk: elicit group feedback on our paper structure and concepts.
Not our aims: We are not discussing particular methodology, but rather general principles that provide a framework in which different methodologies can be incorporated. We don’t overlap with CONSORT (Consolidated Standards of Reporting Trials), which gives detailed advice for reporting randomized controlled trials. We deal with reporting on subgroup research more generally, not specifically on randomized trials.
Propose 3 levels of analysis People will look at their data in great detail no matter what we say. We are trying to tell them how to report what they find or interpret what other people report. So we suggest studies should be formulated at 3 levels of analysis: Primary, secondary, and tertiary. Investigators should clearly specify where they are in this system – to facilitate credibility.
Primary analyses There should be a very limited set of major outcomes (often only one) of primary concern. This level usually doesn’t include separate subgroup analyses – so is not really covered in our paper!
Secondary analyses Here we are specifically focusing on planned subgroup analyses. These may be carried out whether the primary analyses are significant or not. Multiplicity correction methods apply.
Tertiary analyses Unplanned analyses. We ask authors to explicitly identify when they move into this level of analysis! Examine data in many ways, a.k.a. data mining, data dredging, exploratory analyses. Alert readers that results are derived within an unplanned analysis format.
Reporting/interpreting: 1° Primary analysis(-es): sound methodology (for analyses). If the studies aren’t randomized, there may be different explanations for the results (e.g. confounding factors), but the results themselves have a defined statistical justification. Usual alpha allocation.
Reporting/interpreting: 2° Secondary analyses: we propose a second allotment of alpha ≤ primary’s alpha for this entire set of analyses (no matter how large a set!). Results will have some statistical justification, but should be considered promising, with independent replication strongly advised before the results are acted upon.
Reporting/interpreting: 3° Tertiary analyses: results must be considered speculative. Any reported p-values or effect sizes are purely descriptive, since they do not take the multiplicity of possible inferences into account. It may be possible to speculate on multiplicity adjustment, but this is usually problematic. Any findings here should include other related evidence to facilitate decisions on whether to pursue them (reporter’s [or reader’s] perspective) or believe them (reader’s perspective).
Summary of proposed continuum: Primary level: allow 5% for Type I error (or other specified value), as usual. Secondary level: allow some alpha ≤ the specified primary alpha. Tertiary level: presumably it is impossible to define ‘statistical significance’ here. It may be possible retrospectively, but this is unlikely. Authors can report individual p-values and/or effect sizes, although these are generally only descriptive.
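As a rough illustration of the secondary level, the sketch below splits one shared secondary alpha across a set of planned subgroup analyses. The slides do not say which multiplicity correction to use; a Bonferroni split is assumed here purely for simplicity, and the function name, subgroup labels, and p-values are all hypothetical.

```python
def bonferroni_secondary(p_values, secondary_alpha=0.05, primary_alpha=0.05):
    """Flag planned subgroup results under one shared secondary alpha.

    Assumption: Bonferroni is used to spread the secondary alpha over the
    whole set of planned analyses; the slides do not prescribe a method.
    """
    if not 0 < secondary_alpha <= primary_alpha:
        raise ValueError("secondary alpha must be positive and <= primary alpha")
    if not p_values:
        return {}
    # Every planned analysis gets an equal share, however large the set is.
    per_test = secondary_alpha / len(p_values)
    return {name: p <= per_test for name, p in p_values.items()}


# Hypothetical planned subgroup analyses and their p-values.
planned = {"age >= 65": 0.004, "prior CVD": 0.030, "BMI >= 30": 0.200}
flags = bonferroni_secondary(planned, secondary_alpha=0.05)
for name, significant in flags.items():
    print(f"{name}: {'promising' if significant else 'not significant'}")
```

With three planned analyses the per-test threshold is 0.05 / 3, so only the first hypothetical subgroup is flagged, and even that result would still call for independent replication. Tertiary results would not pass through a function like this at all; their p-values are reported descriptively.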
Summary of proposed paper(s) One paper is drafted: a fuller version to go to SMMR, and possibly a briefer version (see previous slide) to JAMA. The main reference study for the SMMR paper is from WHI, to provide a unifying context for explanations. We need other/additional exemplars too (see below).
What do we need? Input from you! References from you! A time line… to be announced. URL to be maintained at SAMSI! Alias to be maintained at SAMSI!
Input/ references? Input from SAMSI workshoppers: please check our draft at http://www.samsi.info/200506/multiplicity/workinggroup/sa/index.html Looking for examples of well-done, well-reported studies, even for null findings; specifically, exemplars of the types/situations described in the paper at the 3 levels. Researchers you know/respect/work with whose names will lead to excellent hits in Medline. Please consider these options; email us with anything you think of!!
SAMSI resources? URL to be maintained at SAMSI! Do we need to put publications (references we’ve been sharing) that are currently posted on the URL behind a password? Alias to be maintained at SAMSI! Return tickets?
What if you find something at exploratory level? When and how should follow-up studies be performed? Existence of prior published relevant results; “plausible” explanations; possible confounding factors; strength of evidence: p-values, effect sizes, posterior probabilities.
Other issues 1. A priori vs post hoc comparisons: the wrong thing to emphasize. The important issue is whether there is some multiple error control over the set of comparisons. E.g., post hoc comparisons are fine using Scheffé; planning 20 comparisons without multiplicity adjustment is bad.
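To see why 20 unadjusted planned comparisons are a problem, here is a small arithmetic sketch. It assumes the 20 tests are independent and all null hypotheses are true, which the slide does not state. It also uses a Bonferroni split for contrast rather than Scheffé, only because Bonferroni needs no further inputs.

```python
def familywise_error_rate(n_tests, per_test_alpha):
    """P(at least one false positive) across independent tests of true nulls."""
    return 1.0 - (1.0 - per_test_alpha) ** n_tests


# 20 planned comparisons, each tested at .05 with no adjustment.
unadjusted = familywise_error_rate(20, 0.05)       # roughly 0.64
# Same 20 comparisons with a Bonferroni per-test level of .05 / 20.
bonferroni = familywise_error_rate(20, 0.05 / 20)  # just under 0.05
print(f"unadjusted: {unadjusted:.3f}, Bonferroni: {bonferroni:.3f}")
```

Under these assumptions, planning the comparisons in advance does nothing to keep the familywise error rate near .05; only the adjustment does.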
2. Hierarchical analyses Can gain power by using analyses where you don’t continue unless something is significant. E.g., degree of a polynomial: don’t test for linear unless constant is significant, don’t test for quadratic unless linear is significant, etc. Every test can be at .05 and the familywise error rate is still .05.
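The stop-at-first-failure rule can be sketched as a fixed-sequence procedure. This is a minimal illustration, assuming the order of the hypotheses is fixed before looking at the data; the function name and the p-values are hypothetical.

```python
def fixed_sequence(ordered_p_values, alpha=0.05):
    """Test hypotheses in a pre-specified order, each at the full alpha.

    Testing stops at the first non-significant result; later hypotheses
    are not tested at all, which is what keeps the familywise rate at alpha.
    """
    rejected = []
    for name, p in ordered_p_values:
        if p > alpha:
            break  # stop: nothing further down the sequence may be claimed
        rejected.append(name)
    return rejected


# Hypothetical p-values for polynomial terms, in the order given on the slide.
terms = [("constant", 0.001), ("linear", 0.020), ("quadratic", 0.300), ("cubic", 0.010)]
print(fixed_sequence(terms))  # ['constant', 'linear']
```

Note that the hypothetical cubic term is not declared significant even though its p-value is below .05, because the sequence stopped at the quadratic term.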
Interaction: Qualitative interaction is more important than quantitative interaction. Either test hypothesis of no qualitative interaction (Gail and Simon, Biometrics 1985) or test hypothesis of qualitative interaction (Shaffer, Psychometrika, 1991). Which?
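For readers unfamiliar with the first option, the following is a rough sketch of a Gail and Simon style test of the null hypothesis of no qualitative interaction, written from the commonly cited form of the statistic and not taken from these slides. It assumes independent subgroup effect estimates with known standard errors; the function names and the example numbers are hypothetical, and the chi-square tail is computed with the standard library only.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Base cases: df = 1 (via erfc) or df = 2 (exponential tail).
    if df % 2 == 1:
        sf, k = math.erfc(math.sqrt(half)), 1
    else:
        sf, k = math.exp(-half), 2
    # Recurrence: sf_{k+2}(x) = sf_k(x) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
    while k < df:
        sf += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return sf


def gail_simon(effects, std_errors):
    """Return (Q, p) for H0: all subgroup effects share the same sign."""
    k = len(effects)
    q_plus = sum((d / s) ** 2 for d, s in zip(effects, std_errors) if d > 0)
    q_minus = sum((d / s) ** 2 for d, s in zip(effects, std_errors) if d < 0)
    q = min(q_plus, q_minus)
    if q == 0:
        return q, 1.0  # all estimates point the same way: no evidence at all
    # Null distribution: binomial mixture of chi-squares with 1..k-1 df.
    p = sum(math.comb(k - 1, h) * 0.5 ** (k - 1) * chi2_sf(q, h) for h in range(1, k))
    return q, p


# Hypothetical treatment effects in two subgroups with opposite signs.
q, p = gail_simon([0.40, -0.35], [0.15, 0.15])
print(f"Q = {q:.2f}, p = {p:.4f}")
```

A small p-value here is evidence that the effect truly reverses direction across subgroups, not merely that its size varies.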
Consider testing for differences in distributions of subgroups rather than differences in means. Sometimes leads to different tests, and often leads to different interpretations of tests.
Type III errors In some contexts directional errors are worse than false non-zero decisions. Ex: comparing medications. In some contexts directional errors are less important than false non-zero decisions. Ex: perhaps microarray analysis? (Shaffer, Psychological Methods 2002)
We can’t expect perfection Some results of good studies will be Type I errors (p = .05). Some results of good studies will be Type II errors (p = .20 with power of .80).