BSHI 2002 Glasgow, Scotland STATISTICAL ANALYSIS OF HLA AND DISEASE ASSOCIATIONS M. Tevfik DORAK Department of Epidemiology University of Alabama at Birmingham.


BSHI 2002 Glasgow, Scotland STATISTICAL ANALYSIS OF HLA AND DISEASE ASSOCIATIONS M. Tevfik DORAK Department of Epidemiology University of Alabama at Birmingham, U.S.A. (2002)

This workshop will cover categorical data analysis for case-control design and some concepts in population genetics.

AIMS
- Familiarization with common statistical tests useful in HLA and disease association studies
- Clarification of several statistical concepts
- Discussion of common mistakes
- Interpretation of results

Why would you do an association study?
- Disease gene mapping and positional cloning
- Molecular profiling (to predict susceptibility, outcome, response, prognosis)
- Basic science (to learn about disease development and subsequently to design diagnostic tests or new treatment)

Meaning of an association
- Population stratification (confounding by ethnicity) or other spurious associations
- Linkage disequilibrium (confounding by locus)
- Direct involvement in the disease process

Cross-validation of results
- Replication (population level and/or family-based)
- Functional studies
- Split the sample into two random groups (if nothing else can be done!)

Failure to replicate
- False positive in the original study
- False negative in the second one
- Population specificity
- Population stratification

Considerations at the beginning
- Will you have enough power?
- Who are the controls? Unrelated or family-based? A subgroup vs another one (males vs females)?
- Prospective sequential sampling or retrospective convenience samples for cases?
- Remember that you will be testing whether the cases and controls are from the same population. The answer shouldn't be obvious at the beginning.

An example of power calculation
Proportion Difference Power / Sample Size Calculation
- Significance level (alpha): .05 (usually 0.05)
- Power (% chance of detecting): .80 (usually 80)
- First group population proportion: .40 (between 0.0 and 1.0)
- Second group population proportion: .60 (between 0.0 and 1.0)
- Relative sample sizes required: 2.0 (for equal samples, use 1.0)
- Sample size required: Group 1: 80, Group 2: 160
(Sample sizes become 115 : 231 for P = 0.01)

An example of power calculation
Proportion Difference Power / Sample Size Calculation
- Significance level (alpha): .01 (usually 0.05)
- Power (% chance of detecting): .80 (usually 80)
- First group population proportion: .05 (between 0.0 and 1.0)
- Second group population proportion: .10 (between 0.0 and 1.0)
- Relative sample sizes required: 2.0 (for equal samples, use 1.0)
- Sample size required: Group 1: 538, Group 2:
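The two power calculations above can be approximated in a few lines of Python. This is a minimal sketch, not the calculator used for the slides: it assumes the usual normal-approximation formula for comparing two proportions followed by the Fleiss continuity correction, with results rounded to the nearest integer. The function name `sample_sizes` and its parameters are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def sample_sizes(p1, p2, alpha=0.05, power=0.80, ratio=1.0):
    """Return (n1, n2) for a two-sided comparison of two proportions.

    ratio is n2 / n1. Assumed method: normal approximation plus the
    Fleiss continuity correction, rounded to the nearest integer.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    delta = abs(p1 - p2)
    # Pooled proportion, weighted by the relative group sizes
    p_bar = (p1 + ratio * p2) / (1 + ratio)
    # Uncorrected size of group 1
    n = (z_alpha * sqrt((1 + 1 / ratio) * p_bar * (1 - p_bar))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2) / ratio)) ** 2 / delta ** 2
    # Continuity-corrected size of group 1
    n1 = n / 4 * (1 + sqrt(1 + 2 * (ratio + 1) / (n * ratio * delta))) ** 2
    return round(n1), round(ratio * n1)


print(sample_sizes(0.40, 0.60, alpha=0.05, power=0.80, ratio=2.0))
print(sample_sizes(0.40, 0.60, alpha=0.01, power=0.80, ratio=2.0))
```

Under these assumptions the sketch gives 80 : 160 at alpha = 0.05 and 115 : 231 at alpha = 0.01 for proportions of 0.40 and 0.60, which agrees with the figures quoted on the first slide. Other calculators may round up or skip the continuity correction and will differ slightly.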

Beware of the following flaws and fallacies of epidemiologic studies
- confounders (known or unknown)
- selection bias
- response bias
- misclassification bias
- variable observer
- Hawthorne effect (changes caused by the observer in the observed values)
- diagnostic accuracy bias
- regression to the mean
- significance Turkey
- nerd of nonsignificance
- cohort effect
- ecologic fallacy
- Berkson bias (selection bias in hospital-based studies)
SEE:

Categorical Data Analysis
* 2x2 Table Analysis for Association
  - Chi-squared (Pearson, Yates)
  - Fisher
  - G-test
  - McNemar's test: TDT, HRR
  - (Logistic Regression)
* Odds Ratio - Relative Risk
  - Difference between OR and RR
  - Woolf-Haldane Modification
  - Comparison of two ORs
  - Adjusted OR
* Linkage Disequilibrium
  - Comparison of two LDs
* RxC (multicontingency) Table Analysis
  - Chi-squared
  - G-test
  - Exact Tests (needed for HWE)
  - Trend Test (frequently overlooked)
See
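To make the first group of tests concrete, the sketch below computes the Pearson, Yates-corrected and G statistics for a single 2x2 table using only the Python standard library. The function names and the example counts are hypothetical and are not taken from the workshop data. P values use the closed form for a chi-squared distribution with 1 degree of freedom.

```python
from math import erfc, log, sqrt


def chi2_p_1df(stat):
    """Upper-tail P value of a chi-squared statistic with 1 degree of freedom."""
    return erfc(sqrt(stat / 2))


def two_by_two_tests(a, b, c, d):
    """Pearson, Yates and G statistics (with P values) for [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    denom = rows[0] * rows[1] * cols[0] * cols[1]
    pearson = n * (a * d - b * c) ** 2 / denom
    # Yates: shrink |ad - bc| by n/2 before squaring (never below zero)
    yates = n * max(abs(a * d - b * c) - n / 2, 0) ** 2 / denom
    # G-test: 2 * sum(O * ln(O / E)), skipping empty cells
    g = 0.0
    for i, row in enumerate(((a, b), (c, d))):
        for j, observed in enumerate(row):
            if observed > 0:
                expected = rows[i] * cols[j] / n
                g += 2 * observed * log(observed / expected)
    stats = {"pearson": pearson, "yates": yates, "g": g}
    return {name: (value, chi2_p_1df(value)) for name, value in stats.items()}


# Hypothetical counts: marker present/absent (rows) by case/control (columns)
for name, (stat, p) in two_by_two_tests(10, 20, 30, 40).items():
    print(f"{name}: statistic = {stat:.4f}, P = {p:.4f}")
```

The three statistics are usually close for moderate cell counts; they diverge when expected counts are small, which is where an exact test becomes preferable.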

The SAS System: FREQ Procedure Output - I

Statistic (columns: DF, Value, Prob)
- Chi-Square
- Likelihood Ratio Chi-Square
- Continuity Adj. Chi-Square
- Mantel-Haenszel Chi-Square
- Phi Coefficient
- Contingency Coefficient
- Cramer's V

Fisher's Exact Test
- Cell (1,1) Frequency (F): 45
- Left-sided Pr <= F
- Right-sided Pr >= F
- Table Probability (P)
- Two-sided Pr <= P: 0.0066
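The four Fisher quantities listed in this kind of output (table probability, left-sided, right-sided and two-sided P) can be computed directly from the hypergeometric distribution. The following is a minimal standard-library sketch rather than a reproduction of the SAS run; `fisher_exact` is a hypothetical helper name, and the counts in the usage line are illustrative.

```python
from math import comb


def fisher_exact(a, b, c, d):
    """Fisher exact P values for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Probability of every table with the observed margins,
    # indexed by the value x of cell (1,1)
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / total
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    return {
        "table_probability": p_obs,
        "left_sided": sum(p for x, p in probs.items() if x <= a),
        "right_sided": sum(p for x, p in probs.items() if x >= a),
        # Two-sided: all tables no more probable than the observed one
        "two_sided": min(1.0, sum(p for p in probs.values()
                                  if p <= p_obs * (1 + 1e-9))),
    }


print(fisher_exact(3, 1, 1, 3))
```

The small tolerance in the two-sided sum guards against floating-point ties between equally probable tables.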

The SAS System: FREQ Procedure Output - II

Estimates of the Common Relative Risk (Row1/Row2)
(columns: Type of Study, Method, Value, 95% Confidence Limits)
- Case-Control (Odds Ratio): Mantel-Haenszel, Logit
- Cohort (Col1 Risk): Mantel-Haenszel, Logit
- Cohort (Col2 Risk): Mantel-Haenszel, Logit
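The distinction between the odds ratio (case-control) and the relative risk (cohort) in this output can be illustrated with a short sketch. It assumes a single unstratified 2x2 table, logit (Woolf) 95% confidence limits, and the Haldane modification implemented as adding 0.5 to every cell; function names and counts are hypothetical.

```python
from math import exp, log, sqrt

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def odds_ratio(a, b, c, d, haldane=False):
    """Odds ratio with Woolf (logit) 95% limits for [[a, b], [c, d]].

    haldane=True adds 0.5 to every cell, which keeps the estimate
    finite when a cell is zero.
    """
    if haldane:
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    estimate = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return estimate, exp(log(estimate) - Z95 * se), exp(log(estimate) + Z95 * se)


def relative_risk(a, b, c, d):
    """Column 1 risk ratio (row 1 vs row 2) with log-scale 95% limits."""
    estimate = (a / (a + b)) / (c / (c + d))
    se = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return estimate, exp(log(estimate) - Z95 * se), exp(log(estimate) + Z95 * se)


# Hypothetical counts: rows = marker present/absent, columns = diseased/healthy
print("OR:", odds_ratio(20, 80, 10, 90))
print("RR:", relative_risk(20, 80, 10, 90))
print("OR with a zero cell:", odds_ratio(10, 0, 5, 5, haldane=True))
```

With these counts the odds ratio is 2.25 and the relative risk is 2.0. The two measures move closer together as the outcome becomes rarer, and only the odds ratio is directly estimable from a case-control sample.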

[Figure: Parent-case trios in TDT/HRR. Transmitted alleles form the case group; non-transmitted alleles form the control group.]

- AN EXAMPLE OF TDT -
TRANSMISSION DISEQUILIBRIUM OF HLA-B62 TO THE PATIENTS WITH CHILDHOOD AML (Dorak et al, BSHI 2002)

Out of 13 parents heterozygous for B62, 12 transmitted B62 to the affected child and 1 did not.

                        Nontransmitted allele
Transmitted allele      B62       Other
B62                     x         12
Other                   1         y

McNemar's test results: P = (with continuity correction); odds ratio = 12.0, 95% CI = 1.8 to 513
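The TDT on this slide is McNemar's test applied to the two discordant cells (12 and 1). The sketch below shows one way to compute it: the statistic with continuity correction, and an exact interval for the odds ratio obtained from Clopper-Pearson limits on the transmission proportion. Whether the original analysis used exactly this interval method is an assumption; the function names are illustrative.

```python
from math import comb, erfc, inf, sqrt


def mcnemar_tdt(b, c):
    """McNemar chi-squared (continuity corrected) and P for discordant counts."""
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    return chi2, erfc(sqrt(chi2 / 2))  # 1 degree of freedom


def binom_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def exact_or_ci(b, c, alpha=0.05):
    """Exact limits for the odds b/c via Clopper-Pearson limits on b/(b+c)."""
    n = b + c

    def solve(k, target):
        # Bisection: binom_tail_ge(k, n, p) increases with p
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if binom_tail_ge(k, n, mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    p_lo = solve(b, alpha / 2) if b > 0 else 0.0
    p_hi = solve(b + 1, 1 - alpha / 2) if c > 0 else 1.0
    upper = p_hi / (1 - p_hi) if p_hi < 1 else inf
    return p_lo / (1 - p_lo), upper


chi2, p = mcnemar_tdt(12, 1)
low, high = exact_or_ci(12, 1)
print(f"chi2 = {chi2:.2f}, P = {p:.4f}, OR = {12 / 1:.1f}, 95% CI = {low:.1f} to {high:.0f}")
```

With 12 transmissions and 1 non-transmission this interval comes out at about 1.8 to 513, matching the limits quoted on the slide.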

Multiple comparisons
- Not needed if the study is not hypothesis driven (i.e., a fishing experiment)
- Not needed if the study is hypothesis driven ('Possible relevance of the HLA system' is not a valid hypothesis in this context. Those studies belong to the fishing experiments group)
- Therefore, it is not clear when it is needed in HLA association studies. Most frequently, it is an excuse for a busy reviewer to avoid a comprehensive review
- Best solution is to avoid facing this problem, ideally by replication and/or functional data to support the statistical association before it is dismissed as a spurious result of multiple comparisons

Common Mistakes in Statistical Evaluation of Association Study Results - I
- Confusion between corrections (Yates/Williams for continuity vs Bonferroni for multiple comparisons)
- Confusion between RR and OR (they are not the same)
- Confusion between expected and observed values in cells of a contingency table
- Small sample size issue
- Don't confuse a negative result with lack of power ("no significant difference between the two groups, so they were pooled" vs "the difference did not reach significance due to small sample size" are different interpretations of the same phenomenon, i.e., lack of power)
- Using Chi-squared test for small sample size (why not use Fisher all the time?)
- Using Chi-squared test for HWE (use exact test or G-test)
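As an illustration of the last point, here is a minimal G-test for Hardy-Weinberg equilibrium. It is a sketch under a simplifying assumption: the locus is collapsed to two alleles (the allele of interest versus all others), which gives three genotype classes and 1 degree of freedom. Multi-allelic HLA data would normally call for an exact test instead. The function name and counts are hypothetical.

```python
from math import erfc, log, sqrt


def hwe_g_test(n_aa, n_ab, n_bb):
    """G statistic and P value (1 df) for HWE at a two-allele locus."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # G = 2 * sum(O * ln(O / E)); empty genotype classes contribute nothing
    g = 2 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)
    g = max(g, 0.0)  # guard against tiny negative rounding error
    return g, erfc(sqrt(g / 2))


# Hypothetical control genotype counts: AA, AB, BB
print(hwe_g_test(25, 50, 25))  # exactly at HWE proportions
print(hwe_g_test(40, 20, 40))  # marked heterozygote deficit
```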

Common Mistakes in Statistical Evaluation of Association Study Results - II
- One-tailed and two-tailed P values (always use two-tailed)
- Trend test for a multicontingency table? (if appropriate, more powerful)
- Multiple comparison issue
- Failure to give the strength of the association (OR, RR, RH)
- Use of the word 'proof'. Does statistics prove anything? (A P value provides a sense of the strength of the evidence for or against the null hypothesis of no association)
- Reliance on large sample effect to achieve significance
- Showing P values as (this means P < 0.001)
- Confusion between association and linkage
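The trend test mentioned above can be sketched as a Cochran-Armitage test for a 2xk table, which is one common choice; the slides do not say which trend test was intended. Categories are assumed to be ordered (for example 0, 1 or 2 copies of an allele) with default scores 0, 1, 2, and so on. Names and counts are hypothetical.

```python
from math import erfc, sqrt


def trend_test(cases, controls, scores=None):
    """Cochran-Armitage chi-squared for trend (1 df) and its P value."""
    k = len(cases)
    scores = list(range(k)) if scores is None else list(scores)
    totals = [r + s for r, s in zip(cases, controls)]
    n, r = sum(totals), sum(cases)
    sum_nx = sum(t * x for t, x in zip(totals, scores))
    sum_nxx = sum(t * x * x for t, x in zip(totals, scores))
    sum_rx = sum(c * x for c, x in zip(cases, scores))
    chi2 = (n * (n * sum_rx - r * sum_nx) ** 2
            / (r * (n - r) * (n * sum_nxx - sum_nx ** 2)))
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical counts by number of copies of the marker allele (0, 1, 2)
chi2, p = trend_test(cases=[20, 30, 25], controls=[40, 30, 10])
print(f"trend chi2 = {chi2:.2f}, P = {p:.5f}")
```

Because the test spends a single degree of freedom on the ordered alternative, it tends to be more powerful than the general RxC chi-squared when risk really does rise with dose. With only two categories it reduces to the ordinary Pearson chi-squared.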

Association and Causality?
An association, however strong, does not necessarily mean causation. Several criteria have been proposed to assess the role of an associated marker in causation. Some of those are as follows:
1. Biological plausibility
2. Strength of association (this is not measured by the P value)
3. Dose response (are heterozygotes intermediate between the two homozygotes, or is homozygosity showing a stronger association than just having the marker?)
4. Time sequence (this is inherent in the germ-line nature of HLA genes)
5. Consistency (next slide lists reasons for inconsistency in HLA association studies)
6. Specificity of the association to the disease studied

Why Are There Inconsistencies? (I)
1. Mistakes in genotyping (lack of HWE in controls is usually an indication of problems with typing rather than selection, admixture, nonrandom mating or other reasons for departure from HWE)
2. Poor control selection (would your controls be in the case group if they had the disease, and would the cases be in your control group if they were free of the disease?)
3. Design problems including the statistical power issue (negative results due to lack of statistical power should be distinguished from truly negative results observed despite having sufficient power)
4. Publication bias (are there many more studies with negative results but we have never heard about them?)
5. Disease misclassification or misclassification bias

Why Are There Inconsistencies? (II)
6. Excessive type I errors (are the positive results due to using P < 0.05 as the threshold for statistical significance?)
7. Post hoc and subgroup analysis (are positive results due to fishing, i.e., data dredging?)
8. Unjustified multiple comparisons and subsequent type II error
9. Failure to consider the mode of inheritance in a genetic disease
10. Failure to account for the LD structure of the gene (only haplotype-tagging markers will show the association; other markers within the same gene may fail to show an association and generate background noise)
11. Likelihood that the gene studied accounts for a small proportion of the variability in risk

Further Information
Select 'Biostatistics' or 'Epidemiology' at or write to me at dorakmt :at: lycos.com [please do not add to your address book as it will change periodically]

I am grateful to the BSHI Organizing Committee for giving me the opportunity to run this workshop at BSHI 2002 in Glasgow. I particularly thank Nancy Henderson and Ian Galbraith for their hospitality.

BSHI AGM 5:15 pm. All members should attend.