Token Swap Contingency Tables in Three Dimensions: Paradigm for Biomedical Data Analysis. G. William Moore, MD, PhD. Grover M. Hutchins, MD. Lawrence A. Brown, MD.
Disclaimer. United States Government Work, uncopyrighted, public-domain, DRAFT COPY ONLY. This document does not necessarily represent the views or policies of any United States Government agency. This document is provided "as is", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and non-infringement. In no event shall the authors be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, arising from, out of, or in connection with the document or the use or other dealings made with the document.
Abstract. Context: Contingency tables are commonly used for organizing frequency data in biomedical databases. Classical statistical methods applied to contingency tables include chisquare and Fisher exact methods, based upon squared-normal and binomial distributions. In the token swap method, patients, or tokens, in the contingency table are randomly swapped, to determine whether observed data deviate from a preset null hypothesis. Technology: Perl programming language, theory of statistics. Design: The simplest contingency table is a rectangular table, consisting of four cells, two rows by two columns, that measures association between row and column variables in a misclassification space. The null hypothesis predicts expected values for each cell; tokens are randomly swapped until they match observed values. More generally, a three-dimensional contingency table has rows, columns, and depths, with depth representing a variable for ultimate biomedical outcome. Results: The two- and three-dimensional token swap methods satisfy the Neyman-Pearson condition for power of the alternative hypothesis. Unlike classical methods, the token swap method supports a range of null hypotheses, including those with zero cell totals. Conclusion: The present model extends the range of existing contingency table analysis to incorporate additional clinicopathologic information, and to explore customized null hypotheses.
Contingency Table. 1. Commonly used for organizing biomedical frequency data. 2. Simplest contingency table: 2×2 table. 3. Rectangular table, 2 rows, 2 columns. 4. Φ: established/old test; Ψ: new test. 5. Determines statistical correlation between independent variables Φ and Ψ.
Contingency Table: Example. 1. Patients autopsied with sickle cell disease. 2. Patients with pain crisis: 9 deaths unexplained at autopsy (45%). 3. Patients without pain crisis: 4 deaths unexplained at autopsy (8.5%). 4. Φ: established/old test, i.e., death unexplained at autopsy. 5. Ψ: new test, i.e., clinical pain crisis.
Contingency Table: Example. 1. Is there a correlation between pain crisis and death unexplained at autopsy? 2. Chisquare method: χ² = , 1 d.f., p< 3. Fisher exact method: p= 4. Token swap method: p=
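The two classical tests named above can be sketched in a few lines of Python (standard library only). This is an illustration, not the authors' Perl program. The group sizes 20 and 47 are an assumption, back-calculated from 9/45% and 4/8.5%, because the transcript does not preserve the slide's own counts; the function names are hypothetical.

```python
from math import comb, erfc, sqrt

# ASSUMPTION: group sizes 20 and 47 are back-calculated from 9/45% and 4/8.5%.
table = [[9, 11],   # pain crisis:    unexplained, explained at autopsy
         [4, 43]]   # no pain crisis: unexplained, explained at autopsy


def chi_square_2x2(t):
    """Pearson chi-square (no continuity correction) and its 1-d.f. p-value."""
    (a, b), (c, d) = t
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(t):
        for j, obs in enumerate(row):
            exp = rows[i] * cols[j] / n          # cross-product of marginals
            stat += (obs - exp) ** 2 / exp
    # With 1 d.f. the statistic is a squared normal, so P(X > stat) = erfc(sqrt(stat/2)).
    return stat, erfc(sqrt(stat / 2))


def fisher_exact_2x2(t):
    """Two-sided Fisher exact p-value: sum of the hypergeometric probabilities
    that are no larger than the probability of the observed table."""
    (a, b), (c, d) = t
    r1, c1, n = a + b, a + c, a + b + c + d

    def prob(x):
        return comb(c1, x) * comb(n - c1, r1 - x) / comb(n, r1)

    lo, hi = max(0, r1 + c1 - n), min(r1, c1)
    p_obs = prob(a)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


stat, p_chi = chi_square_2x2(table)
p_fisher = fisher_exact_2x2(table)
print(f"chi-square = {stat:.3f}, p = {p_chi:.5f}; Fisher exact p = {p_fisher:.5f}")
```

Because the counts are assumed, the printed values need not match the figures shown on the original slide.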
Problems with Classical Methods. 1. Chisquare (χ²) method is unreliable if more than 20% of expected cell totals are small (less than 5). 2. Both methods assume random sampling, statistical independence. 3. Limited freedom to customize null hypothesis. 4. No distinction between established test and ultimate followup.
Misclassification Paradigm. 1. Classical statistics, cell frequencies: either entire population, or random sample of population. 2. Token swap method, cell frequencies: misclassifications, i.e., false negatives, false positives. 3. Classical null hypothesis: cross-products of marginal totals, i.e., statistical independence. 4. Token swap method: how many swaps to transform the observed into the expected cell frequencies?
Misclassification Paradigm. 1. Classical null hypothesis: statistical independence. 2. What if null hypothesis is zero false positives? 3. Trade-off Ratio: relative cost of false negatives versus false positives. 4. In a screening test, e.g., gynecologic cytology, a false negative (losing the patient to followup) is more costly than a false positive (an additional gynecologic cytology).
Token Swap Method. 1. Patients (tokens) randomly swapped in contingency table. 2. Determine whether observed data deviate from null hypothesis. 3. Null hypothesis: does not necessarily have statistical independence.
Token Swap Method: Usual Null Hypothesis. 1. Upper contingency table: expected table: cross-products of marginal totals: expected_a = (v×x)/z. expected_b = (v×y)/z. expected_c = (w×x)/z. expected_d = (w×y)/z. 2. Five swaps: transform expected table into observed table. 3. Each swap: move forward or fall back. 4. Token swap, p=
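The cross-product formulas above translate directly into code. The sketch below assumes cells a, b form the first row (total v) and c, d the second row (total w), with a, c in the first column (total x), which is the layout the four formulas imply. The helper `swaps_from_expected` and the rounding of the expected table to whole tokens are assumptions made for illustration, as are the observed counts (back-calculated from the stated percentages).

```python
def expected_table(a, b, c, d):
    """Expected 2x2 cells under independence: cross-products of marginal totals."""
    v, w = a + b, c + d          # row totals
    x, y = a + c, b + d          # column totals
    z = v + w                    # grand total
    return (v * x / z, v * y / z, w * x / z, w * y / z)


def swaps_from_expected(a, b, c, d):
    """Hypothetical helper: with all marginal totals fixed, one swap changes
    cell a by one token, so the swap count is the distance in cell a between
    the observed table and the expected table rounded to whole tokens."""
    expected_a = round(expected_table(a, b, c, d)[0])
    return abs(a - expected_a)


# ASSUMED observed counts (a, b, c, d), inferred from 9/45% and 4/8.5%.
observed = (9, 11, 4, 43)
print(expected_table(*observed))
print(swaps_from_expected(*observed))
```

With these assumed counts the helper returns 5, in line with the "Five swaps" stated on the slide.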
Token Swap Method: Customized Null Hypothesis, Trade-off Ratio. 1. Upper contingency table: customized expected table. 2. Three swaps: transform customized expected table into observed table. 3. Each swap: move forward or fall back. 4. Token swap, p=
Neyman-Pearson Condition: Definition. The Neyman-Pearson condition states that, for a hypothesis test between two point hypotheses H0: θ=θ0 and H1: θ=θ1, the likelihood-ratio test that rejects H0 in favor of H1 when Λ(x) = L(θ0|x)/L(θ1|x) < η, where P(Λ(X)<η | H0) = α, is the most powerful test of size α. Here L(θ0|x)/L(θ1|x) is the likelihood ratio; η is the threshold that defines the critical region for the test; α is the significance level, i.e., the probability of a Type I (false positive) error. A statistical method decreases β-error (Type II, false negative) only by increasing α-error.
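The trade-off between α and β can be made concrete with a small likelihood-ratio test. The sketch below is not taken from the presentation: it assumes a binomial model with two point hypotheses, and the values n=20, p0=0.2, p1=0.5 and the two thresholds are purely illustrative.

```python
from math import comb


def binom_pmf(x, n, p):
    """Probability of x successes in n trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def lr_test(n, p0, p1, eta):
    """Return (alpha, beta) for the test that rejects H0: p=p0 in favor of
    H1: p=p1 whenever Lambda(x) = L(p0|x) / L(p1|x) < eta."""
    alpha = beta = 0.0
    for x in range(n + 1):
        lam = binom_pmf(x, n, p0) / binom_pmf(x, n, p1)
        if lam < eta:
            alpha += binom_pmf(x, n, p0)   # H0 rejected although H0 is true
        else:
            beta += binom_pmf(x, n, p1)    # H0 retained although H1 is true
    return alpha, beta


# Illustrative values only: a larger threshold widens the critical region.
for eta in (0.1, 10.0):
    alpha, beta = lr_test(20, 0.2, 0.5, eta)
    print(f"eta={eta}: alpha={alpha:.4f}, beta={beta:.4f}")
```

Raising η enlarges the critical region, so α rises while β falls, which is the high-α/low-β versus low-α/high-β contrast.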
Neyman-Pearson Condition.
Neyman-Pearson Condition: high α, low β
Neyman-Pearson Condition: low α, high β.
Definition 1. Token swap distribution: T(a,k<0)=0 at swaps k<0; T(a,k=0)=1 and T(≠a,k=0)=0 at swap k=0; T(a+j,k>0) = T(a+j-1,k-1)×(((a+j-1)×(d+j-1))/(((a+j-1)×(d+j-1))+((c-j+1)×(b-j+1)))) + T(a+j+1,k-1)×(((a+j+1)×(d+j+1))/(((a+j+1)×(d+j+1))+((c-j-1)×(b-j-1)))) at swap k>0, where 0×(.../...) = 0.
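Definition 1 is a two-term recurrence, so it can be evaluated with a short memoized function. The sketch below transcribes the recurrence literally as written on the slide; it is not the authors' implementation. The name `make_token_swap` and the example table are hypothetical. The sketch does not restrict j to feasible tables and does not check normalization, so it is best used with k no larger than min(b, c), the point at which the swaps are said to terminate.

```python
from functools import lru_cache


def make_token_swap(a, b, c, d):
    """Return T(j, k), standing for T(a+j, k) in Definition 1, for a fixed table."""

    def weight(j):
        # Factor attached to a source state j in Definition 1:
        # (a+j)(d+j) / ((a+j)(d+j) + (c-j)(b-j)).
        num = (a + j) * (d + j)
        den = num + (c - j) * (b - j)
        return num / den if den else 0.0

    @lru_cache(maxsize=None)
    def T(j, k):
        if k < 0:
            return 0.0
        if k == 0:
            return 1.0 if j == 0 else 0.0
        total = 0.0
        for src in (j - 1, j + 1):          # the two terms of the recurrence
            prev = T(src, k - 1)
            if prev:                        # convention: 0 x (.../...) = 0
                total += prev * weight(src)
        return total

    return T


# Hypothetical table, for illustration only.
T = make_token_swap(5, 7, 6, 8)
print([round(T(j, 2), 4) for j in range(-3, 4)])
```

Memoization keeps the cost proportional to the number of (j, k) pairs visited rather than to the number of swap paths.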
Theorem 1: Step k, zero tail beyond k. 1a. T(a+j,k) = 0 when j>k; 1b. T(a-j,k) = 0 when j>k. Proof. 1a. By induction on k. At swap k=0, Definition 1 gives T(≠a,k=0)=0, so T(a+j,0) = 0 for j>0. Assume the claim is true for swap k-1; consider swap k with j>k. Since j-1 > k-1 and j+1 > k-1, then by the inductive hypothesis: T(a+j,k) = T(a+j-1,k-1)×(.../...) + T(a+j+1,k-1)×(.../...) = 0×(.../...) + 0×(.../...) = 0.
Theorem 2. 2a. T(a+k,k) = T(a+k-1,k-1)×(((a+k-1)×(d+k-1))/(((a+k-1)×(d+k-1))+((c-k+1)×(b-k+1)))). 2b. T(a-k,k) = T(a-k+1,k-1)×(((a-k+1)×(d-k+1))/(((a-k+1)×(d-k+1))+((c+k-1)×(b+k-1)))). Proof. 2a. Set j=k in Definition 1. By Theorem 1a, the second term is T(a+k+1,k-1)=0. Proof. 2b. Analogous to Proof 2a.
Theorem 3: Step k, kth tail; less than step (k-1), (k-1)th tail. 3a. T(a+k,k) < T(a+k-1,k-1). 3b. T(a-k,k) < T(a-k+1,k-1). Proof. 3a. Since the swaps terminate when either c=k or b=k, then c>(k-1), b>(k-1), (c-k+1)>0, (b-k+1)>0, and q = ((a+k-1)×(d+k-1))/(((a+k-1)×(d+k-1))+((c-k+1)×(b-k+1))) < 1. Then by Theorem 2, T(a+k,k) = T(a+k-1,k-1)×q < T(a+k-1,k-1). Proof. 3b. Analogous to Proof 3a.
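Theorems 2 and 3 together imply that the extreme upper tail T(a+k,k) is a product of factors, each smaller than 1. The sketch below assumes the factor at step i is ((a+i-1)×(d+i-1))/(((a+i-1)×(d+i-1))+((c-i+1)×(b-i+1))), i.e., the first-term factor of Definition 1 evaluated at j=i, and starts from T(a,0)=1. The function name and the example table are hypothetical.

```python
def leading_tail(a, b, c, d, k):
    """T(a+k, k) as the product q_1 * ... * q_k, unrolling Theorem 2a from T(a, 0) = 1."""
    value = 1.0
    for i in range(1, k + 1):
        num = (a + i - 1) * (d + i - 1)
        q = num / (num + (c - i + 1) * (b - i + 1))   # q < 1 while i <= min(b, c)
        value *= q
    return value


# Hypothetical table; k runs up to min(b, c), where the swaps terminate.
a, b, c, d = 5, 7, 6, 8
tails = [leading_tail(a, b, c, d, k) for k in range(min(b, c) + 1)]
print([round(t, 4) for t in tails])
```

Each entry of `tails` is smaller than the one before it, which is the strict decrease stated in Theorem 3a.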
Misclassification: Three Dimensions. 1. Some biomedical analyses involve: old test, Φ; new test, Ψ; ultimate test, Ω, for example, long-term follow-up or autopsy findings. 2. Tests Φ, Ψ: conceptually comparable; but ultimate test, Ω, wins over other two tests. 3. Token swaps: swaps that favor Φ versus swaps that favor Ψ.
Token Swap Method: Three Dimensions. 1. Cell a: true negative for Φ, Ψ: Φ, Ψ, Ω all false. 2. Cell b: false positive for Φ (Ω false, Φ true), but true negative for Ψ (Ψ, Ω both false), etc. 3. Status of all cells: a = Φ,Ψ true negative. b = Φ false positive. c = Ψ false positive. d = Φ,Ψ false positive. e = Φ,Ψ false negative. f = Ψ false negative. g = Φ false negative. h = Φ,Ψ true positive.
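The cell statuses listed above follow mechanically from comparing each test with the ultimate test Ω. The sketch below assumes the truth-value assignment implied by that list (for example, cell f is Φ true, Ψ false, Ω true); the names `CELLS`, `status`, and `describe` are hypothetical.

```python
# (Φ, Ψ, Ω) truth values for cells a..h, read off the status list (True = positive).
CELLS = {
    "a": (False, False, False),
    "b": (True, False, False),
    "c": (False, True, False),
    "d": (True, True, False),
    "e": (False, False, True),
    "f": (True, False, True),
    "g": (False, True, True),
    "h": (True, True, True),
}


def status(test_result, omega):
    """Classify one test result against the ultimate test Ω."""
    if test_result == omega:
        return "true positive" if omega else "true negative"
    return "false positive" if test_result else "false negative"


def describe(cell):
    """Status of both tests for one cell of the 2x2x2 table."""
    phi, psi, omega = CELLS[cell]
    return {"Φ": status(phi, omega), "Ψ": status(psi, omega)}


for name in CELLS:
    print(name, describe(name))
```

Cells where only one test is listed as misclassified (b, c, f, g) are correct for the other test.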
Constraints: 3D paired swaps. 1. No swaps across Ω-true, Ω-false. 2. No net gain or loss in marginal totals for Ω. 3. No net gain or loss in Φ-true, Φ-false, Ψ-true, Ψ-false, Ω-true, or Ω-false. Permitted swaps, to or from cell a, as follows: 1. a→b. 2. a→c. 3. a→d, c→b. 4. a→d, b→c. 5. b→a. 6. c→a. 7. d→a, b→c. 8. d→a, c→b.
3D Token Swaps: 1, 2, 3, 4.
3D Token Swaps: 5, 6, 7, 8.
3D Swap 1. If a→b, Then d→c, f→e, and g→h. Net: no changes.
3D Swap 2. If a→c, Then d→b, g→e, and f→h. Net: no changes.
3D Swap 3. If a→d and c→b, Then h→e and f→g. Net: +2 Φ false positive, +2 Φ false negative, Favors Ψ.
3D Swap 4. If a→d and b→c, Then h→e and g→f. Net: +2 Ψ false positive, +2 Ψ false negative, Favors Φ.
3D Swap 5. If b→a, Then c→d, e→f, and h→g. Net: no changes.
3D Swap 6. If c→a, Then b→d, e→g, and h→f. Net: no changes.
3D Swap 7. If d→a and b→c, Then e→h and g→f. Net: -2 Φ false positive, -2 Φ false negative, Favors Φ.
3D Swap 8. If d→a and c→b, Then e→h and f→g. Net: -2 Ψ false positive, -2 Ψ false negative, Favors Ψ.
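The eight compound swaps can be checked against the marginal constraints by simple bookkeeping. The sketch below encodes each cell by its (Φ, Ψ, Ω) truth values, as implied by the cell status list, and tallies what each swap does to the marginal totals and to each test's misclassification count. The names `CELLS`, `SWAPS`, `marginal_change`, and `error_change` are hypothetical.

```python
# (Φ, Ψ, Ω) truth values per cell, from the cell status list (1 = true).
CELLS = {"a": (0, 0, 0), "b": (1, 0, 0), "c": (0, 1, 0), "d": (1, 1, 0),
         "e": (0, 0, 1), "f": (1, 0, 1), "g": (0, 1, 1), "h": (1, 1, 1)}

# Each move is "source cell, destination cell", e.g. "ab" means a→b.
SWAPS = {
    1: ["ab", "dc", "fe", "gh"],
    2: ["ac", "db", "ge", "fh"],
    3: ["ad", "cb", "he", "fg"],
    4: ["ad", "bc", "he", "gf"],
    5: ["ba", "cd", "ef", "hg"],
    6: ["ca", "bd", "eg", "hf"],
    7: ["da", "bc", "eh", "gf"],
    8: ["da", "cb", "eh", "fg"],
}


def marginal_change(moves):
    """Net change in the Φ-true, Ψ-true, and Ω-true marginal totals."""
    delta = [0, 0, 0]
    for src, dst in moves:
        for i in range(3):
            delta[i] += CELLS[dst][i] - CELLS[src][i]
    return tuple(delta)


def error_change(moves):
    """Net change in misclassifications (false positives plus false negatives)
    for Φ and for Ψ, each judged against Ω."""
    def errors(cell):
        phi, psi, omega = CELLS[cell]
        return (int(phi != omega), int(psi != omega))

    d_phi = sum(errors(dst)[0] - errors(src)[0] for src, dst in moves)
    d_psi = sum(errors(dst)[1] - errors(src)[1] for src, dst in moves)
    return d_phi, d_psi


for number, moves in SWAPS.items():
    print(number, marginal_change(moves), error_change(moves))
```

Every swap leaves all three marginal totals unchanged. Swaps 1, 2, 5, and 6 also leave both error counts unchanged, while swaps 3, 4, 7, and 8 shift four misclassifications (two false positives and two false negatives) for exactly one of the two tests.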
Live Demonstration: 3D Token Swap. TOKENSWAP, p:
Summary, Conclusions. Two- and three-dimensional token swap methods satisfy the Neyman-Pearson condition, for power of alternative hypothesis. Unlike classical methods, the token swap method supports a range of null hypotheses, including those with zero cell totals. Present model extends range of existing contingency table analysis. Incorporates additional clinicopathologic information. Explores customized null hypotheses.
References. 1. Parfrey NA, Moore GW, Hutchins GM. Is pain crisis a cause of death in sickle cell disease? Am J Clin Pathol Aug;84(2). 2. Moore GW, Hutchins GM, Miller RE. Token swap test of significance for serial medical data bases. Am J Med Feb;80(2). 3. Moore GW, Hutchins GM, Miller RE. A new paradigm for hypothesis testing in medicine, with examination of the Neyman Pearson condition. Theor Med Oct;7(3). 4. Heckering PS. Token swap test revisited. Comput Methods Programs Biomed Mar;70(3).