Genetic Statistics Lectures (5) Multiple testing correction and population structure correction.

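Independence of tests
When all tests are mutually independent and the null hypothesis (no association) holds, the p value of each test is uniformly distributed between 0 and 1:
– the probability to observe P ≤ 0.01 is 0.01;
– the probability to observe P ≤ 0.05 is 0.05;
– the probability to observe P ≤ 0.5 is 0.5;
– the probability to observe P ≤ 0.05 and the probability to observe 0.05 < P ≤ 0.1 are the same, 0.05 each.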
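The uniform behavior of null p values can be checked with a small simulation. This is a minimal sketch that assumes a valid test, so null p values are drawn directly with `random.random()` instead of running a real association test.

```python
import random

def fraction_below(p_values, threshold):
    """Fraction of p values that are <= threshold."""
    return sum(p <= threshold for p in p_values) / len(p_values)

# Assumption: under the null hypothesis a valid test gives p ~ Uniform(0, 1),
# so we simulate null p values directly instead of running real tests.
rng = random.Random(1)
null_p = [rng.random() for _ in range(100_000)]

for t in (0.01, 0.05, 0.5):
    print(t, round(fraction_below(null_p, t), 3))  # each fraction is close to t

# P <= 0.05 and 0.05 < P <= 0.1 are equally likely (about 0.05 each).
low = fraction_below(null_p, 0.05)
mid = fraction_below(null_p, 0.1) - low
print(round(low, 3), round(mid, 3))
```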

When 100 independent tests are performed and the observed p values are sorted, the i-th smallest p value is expected to be i/(100+1).
[Figure: Q-Q plot of p values, observed p versus expected p.]
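The expected positions in such a Q-Q plot follow directly from i/(N+1). A minimal sketch, assuming simulated null p values (`1 - random.random()` keeps them in (0, 1] so the log is defined) and the common convention of plotting on a -log10 scale:

```python
import math
import random

def qq_points(p_values):
    """Return (expected, observed) pairs for a Q-Q plot of p values.

    The i-th smallest of N null p values is expected at i / (N + 1).
    """
    n = len(p_values)
    observed = sorted(p_values)
    expected = [i / (n + 1) for i in range(1, n + 1)]
    return list(zip(expected, observed))

rng = random.Random(0)
points = qq_points([1.0 - rng.random() for _ in range(100)])

# Q-Q plots of p values are usually drawn on a -log10 scale.
for exp_p, obs_p in points[:5]:
    print(f"expected {-math.log10(exp_p):.2f}  observed {-math.log10(obs_p):.2f}")
```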

One marker, one test
[Figure: marker genotypes of cases and controls, showing a strong association between phenotype and genotype.]

Multiple markers, multiple tests
[Figure: phenotype and two markers; the phenotype is associated with the first marker.]

Do you believe the association between the phenotype and the first marker?
[Figure: phenotype and markers.]

Do you still believe the association?
[Figure: phenotype and markers.]

Multiple testing correction
– Bonferroni's correction: when k hypotheses are tested, $p_c = p_n \times k$, where $p_c$ is the corrected p value and $p_n$ is the nominal p value (the p value before correction).
– Family-wise error rate (FWER): when k independent hypotheses are tested, the probability that the minimal p value among the k values is q or smaller is $1-(1-q)^k \approx q \times k$ (the approximation holds for small q).
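Both formulas are one-liners. In this sketch the function names are illustrative, and capping the Bonferroni value at 1 is a common convention that the slide does not state.

```python
def bonferroni(p_nominal, k):
    """Bonferroni correction p_c = p_n * k, capped at 1 so it stays a probability."""
    return min(1.0, p_nominal * k)

def prob_min_p_at_most(q, k):
    """Probability that the smallest of k independent null p values is <= q."""
    return 1.0 - (1.0 - q) ** k

for k in (2, 10, 100):
    q = 0.001
    # For k >= 2 the exact probability is slightly below the Bonferroni value q * k.
    print(k, bonferroni(q, k), prob_min_p_at_most(q, k))
```

For small q the two columns nearly agree, which is why q × k is used as the approximation.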

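FWER for two tests
For two independent hypotheses (Hypothesis 1 and Hypothesis 2), each tested at P ≤ 0.05:
– D (both tests give P ≤ 0.05): 0.05 × 0.05 = 0.0025
– B and C (only one of the two tests gives P ≤ 0.05): 0.05 − 0.0025 = 0.0475 each
– A (neither test gives P ≤ 0.05): 1 − B − C − D = 0.95 × 0.95 = 0.9025
The probability of P ≤ 0.05 for H1, H2, or both is B + C + D = 0.0975 = 1 − 0.95², close to the approximation 2 × 0.05 = 0.1.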
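The four regions can be checked by simulation. A minimal sketch, assuming two independent null tests whose p values are uniform; the labels B (only test 1 significant) and C (only test 2 significant) are an assumed assignment, since the slide only shows them as regions of a figure.

```python
import random

def two_test_regions(alpha=0.05, n=200_000, seed=0):
    """Monte Carlo estimate of regions A-D for two independent null tests."""
    rng = random.Random(seed)
    counts = {"A": 0, "B": 0, "C": 0, "D": 0}
    for _ in range(n):
        sig1 = rng.random() < alpha  # P1 <= alpha happens with probability alpha
        sig2 = rng.random() < alpha
        if sig1 and sig2:
            counts["D"] += 1
        elif sig1:
            counts["B"] += 1  # assumed: B = only test 1 significant
        elif sig2:
            counts["C"] += 1  # assumed: C = only test 2 significant
        else:
            counts["A"] += 1
    return {key: c / n for key, c in counts.items()}

def exact_regions(alpha=0.05):
    """Exact probabilities of the four regions for two independent null tests."""
    both = alpha * alpha
    return {"A": (1 - alpha) ** 2, "B": alpha - both, "C": alpha - both, "D": both}

simulated = two_test_regions()
exact = exact_regions()
for key in "ABCD":
    print(key, round(simulated[key], 4), round(exact[key], 4))
print("B + C + D =", round(exact["B"] + exact["C"] + exact["D"], 4))
```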

If the markers are mutually independent, the association is likely to be true. If the phenotype is associated with all the markers, the markers are dependent on each other; this happens when the markers are in linkage disequilibrium (LD).

When multiple hypotheses are dependent, Bonferroni's correction and the family-wise error rate (FWER) correction are too conservative, so different methods are necessary.

FWER for two tests
When the tests are dependent, the FWER calculation cannot be applied: the probability that both hypotheses give P ≤ 0.05 is no longer 0.05 × 0.05, the probability that neither does is no longer 0.95 × 0.95, and B + C + D is no longer 1 − 0.95².

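Multiple testing correction for dependent tests
[Figure: P1 versus P2 for 1000 pairs of dependent tests; the fractions Fraction(P1 < 0.1 or P2 < 0.1) shown on the slide include 137/1000 and 78/1000.]
For independent tests the expected fraction would be 1 − 0.9 × 0.9 = 0.19, i.e. 190/1000; dependence between the tests makes it smaller.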
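How dependence shrinks the fraction can be reproduced with correlated test statistics. This sketch assumes two z statistics from a bivariate normal with correlation `rho` (a hypothetical stand-in for real dependent tests) and two-sided p values.

```python
import math
import random

def two_sided_p(z):
    """Two-sided p value of a standard normal statistic: P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2))

def fraction_either_significant(rho, alpha=0.1, n=50_000, seed=0):
    """Fraction of pairs with P1 < alpha or P2 < alpha when corr(z1, z2) = rho."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # z2 is still N(0, 1) but has correlation rho with z1.
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        if two_sided_p(z1) < alpha or two_sided_p(z2) < alpha:
            hits += 1
    return hits / n

for rho in (0.0, 0.5, 0.9, 1.0):
    print(rho, round(fraction_either_significant(rho), 3))
```

With `rho = 0` the fraction is near 0.19; with `rho = 1` the two tests are identical and it falls to about 0.1.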

Examples of dependent tests
– Multiple tests for one SNP (2x3 genotype test, dominant model, recessive model, and trend test) are not mutually independent.
– Tests for markers in LD are not independent.
– A test for a SNP and a test for a haplotype containing that SNP are not mutually independent.
– When multiple mutually dependent phenotypes are tested, the tests are dependent.
– …and so on.
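To see why tests on one SNP are dependent, note that they are all computed from the same genotype counts. A minimal sketch with hypothetical counts, using the Pearson chi-square without continuity correction; the trend test is omitted, and the chi-square tail uses closed forms valid only for 1 and 2 degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("only df = 1 or 2 supported in this sketch")

def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_sums = [sum(r) for r in table]
    col_sums = [sum(c) for c in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_sums[i] * col_sums[j] / total
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical genotype counts (AA, Aa, aa) for one SNP.
cases = [30, 50, 20]
controls = [45, 45, 10]

tests = {
    "2x3 genotype": [cases, controls],
    "dominant (Aa+aa vs AA)": [[cases[1] + cases[2], cases[0]],
                               [controls[1] + controls[2], controls[0]]],
    "recessive (aa vs AA+Aa)": [[cases[2], cases[0] + cases[1]],
                                [controls[2], controls[0] + controls[1]]],
}
for name, table in tests.items():
    stat, df = pearson_chi2(table)
    print(f"{name}: chi2 = {stat:.2f}, df = {df}, p = {chi2_sf(stat, df):.4f}")
```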

Permutation test
A method that remains valid when the hypotheses are dependent is the permutation test. Under the assumption of no association between phenotype and markers, the phenotype labels of the samples can be exchanged. Shuffle the phenotype labels and test all the markers against the shuffled phenotype information, then compare the original test result with the results from the shuffled labels. If the original result is rare among the results from shuffled labels, you can conclude that it is rare under the assumption of no association.
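The procedure above can be sketched by comparing the observed minimal p value across all markers with the minimal p values obtained after shuffling labels. This is a minimal sketch under stated assumptions: `allelic_p` is a hypothetical helper using a 2x2 allele-count chi-square test, and the genotype data are simulated.

```python
import math
import random

def allelic_p(genotypes, is_case):
    """p value of a 2x2 allele-count chi-square test (df = 1) for one marker.

    genotypes: minor-allele counts (0, 1 or 2) per individual.
    is_case: True for cases, False for controls, in the same order.
    """
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case
    a_case = sum(g for g, c in zip(genotypes, is_case) if c)
    a_ctrl = sum(genotypes) - a_case
    table = [[a_case, 2 * n_case - a_case], [a_ctrl, 2 * n_ctrl - a_ctrl]]
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    total = rows[0] + rows[1]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            if expected == 0:
                return 1.0  # monomorphic marker: no evidence either way
            stat += (table[i][j] - expected) ** 2 / expected
    return math.erfc(math.sqrt(stat / 2))

def permutation_min_p(markers, is_case, n_perm=1000, seed=0):
    """Compare the observed minimal p value with minimal p values from shuffled labels."""
    rng = random.Random(seed)
    observed = min(allelic_p(g, is_case) for g in markers)
    labels = list(is_case)
    as_small = 0
    for _ in range(n_perm):
        # Shuffling labels breaks any phenotype-genotype link but keeps each
        # individual's genotypes together, so dependence between markers is kept.
        rng.shuffle(labels)
        if min(allelic_p(g, labels) for g in markers) <= observed:
            as_small += 1
    # Count the observed labelling as one permutation so the p value is never 0.
    return observed, (as_small + 1) / (n_perm + 1)

# Hypothetical data: 50 cases, 50 controls, one associated marker and 9 null markers.
rng = random.Random(42)
is_case = [True] * 50 + [False] * 50
causal = [rng.choice([1, 2]) if c else rng.choice([0, 1]) for c in is_case]
null_markers = [[rng.choice([0, 1, 2]) for _ in is_case] for _ in range(9)]
obs_p, corrected_p = permutation_min_p([causal] + null_markers, is_case, n_perm=500)
print(f"minimal nominal p = {obs_p:.2e}, permutation-corrected p = {corrected_p:.4f}")
```

Because whole genotype vectors move together, the permutation distribution of the minimal p value already reflects any LD between markers, which is what the Bonferroni correction ignores.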

Ways to perform permutation tests
Permutations of "123": "123", "132", "213", "231", "312", "321".
– When the sample size is small, you can try all permutations of the phenotype labels.
– When the sample size is not small enough, you should try a random sample of permutations instead (Monte Carlo permutation).
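The two strategies differ only in how the permutations are generated. In this sketch the statistic `mean_difference` is a hypothetical choice for illustration (mean of label-1 values minus mean of label-0 values), and the data are made up.

```python
import itertools
import random

def mean_difference(values, labels):
    """Hypothetical statistic: mean of label-1 values minus mean of label-0 values."""
    ones = [v for v, l in zip(values, labels) if l == 1]
    zeros = [v for v, l in zip(values, labels) if l == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)

def exact_permutation_p(values, labels, stat=mean_difference):
    """Try every permutation of the labels (feasible only for small samples)."""
    observed = stat(values, labels)
    perms = list(itertools.permutations(labels))
    hits = sum(stat(values, p) >= observed for p in perms)
    return hits / len(perms)

def monte_carlo_permutation_p(values, labels, n_perm=10_000, seed=0, stat=mean_difference):
    """Try a random sample of permutations instead of all of them."""
    rng = random.Random(seed)
    observed = stat(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if stat(values, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# The six orderings from the slide.
print(["".join(p) for p in itertools.permutations("123")])
# → ['123', '132', '213', '231', '312', '321']

values = [1, 2, 3, 10, 11, 12]
labels = [0, 0, 0, 1, 1, 1]
print(exact_permutation_p(values, labels))  # 36 of 720 permutations → 0.05
print(monte_carlo_permutation_p(values, labels))  # close to 0.05
```

With 6 samples there are only 720 permutations, but with 100 samples there are 100! of them, which is why random sampling of permutations is used in practice.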

Example
[Figure: cumulative probability of the minimal p value from Monte Carlo permutation attempts, on a log scale.]

Population structure
The population you sample from may not be homogeneous and randomly mating. It may instead consist of multiple small subpopulations, each of which might be in Hardy-Weinberg equilibrium (HWE). In this case, the population is "structured". When the sampled population is structured, case-control association tests tend to give small p values, so false positives increase.

Sampling from a structured population
[Figure: two sampling scenarios. In one, cases and controls are sampled with bias across subpopulations; in the other, cases and controls are sampled evenly (by luck).]
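The effect of biased sampling can be simulated with markers that have no association with disease inside either subpopulation. This sketch assumes two hypothetical subpopulations in HWE, per-subpopulation allele frequencies drawn uniformly from 0.05 to 0.95, and an 80%/20% case split to represent "biased" sampling.

```python
import math
import random

def allelic_chi2(case_count, ctrl_count, n_case, n_ctrl):
    """2x2 allele-count chi-square (df = 1); counts are minor alleles, n are individuals."""
    table = [[case_count, 2 * n_case - case_count],
             [ctrl_count, 2 * n_ctrl - ctrl_count]]
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(rows)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            exp = rows[i] * cols[j] / total
            if exp > 0:  # an empty column contributes nothing
                stat += (table[i][j] - exp) ** 2 / exp
    return stat

def sample_alleles(rng, freq, n_individuals):
    """Minor-allele count in n diploid individuals from one subpopulation in HWE."""
    return sum(rng.random() < freq for _ in range(2 * n_individuals))

def fraction_significant(case_share_pop1, ctrl_share_pop1, n_markers=300, n=200, seed=1):
    """Fraction of null markers with p < 0.05 when samples come from two subpopulations."""
    rng = random.Random(seed)
    n_case1 = round(n * case_share_pop1)
    n_ctrl1 = round(n * ctrl_share_pop1)
    hits = 0
    for _ in range(n_markers):
        f1, f2 = rng.uniform(0.05, 0.95), rng.uniform(0.05, 0.95)  # subpopulation frequencies
        case = sample_alleles(rng, f1, n_case1) + sample_alleles(rng, f2, n - n_case1)
        ctrl = sample_alleles(rng, f1, n_ctrl1) + sample_alleles(rng, f2, n - n_ctrl1)
        p = math.erfc(math.sqrt(allelic_chi2(case, ctrl, n, n) / 2))
        hits += p < 0.05
    return hits / n_markers

print("biased sampling:", fraction_significant(0.8, 0.2))
print("even sampling:  ", fraction_significant(0.5, 0.5))
```

With even sampling about 5% of the null markers reach p < 0.05, as expected; with biased sampling far more do, which is the false-positive inflation described above.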

[Figure: P values plotted in ascending order; x-axis: markers, y-axis: P value.]
Biased samples give many small p values.

In both cases below, the markers and the phenotype are associated, and the markers are dependent on each other.
– If the genotypes of each individual at the markers are associated with each other, the cause is LD.
– If the markers are dependent but the genotypes of each individual are not associated with each other, the cause is population structure.

[Figure: panels labeled Random, LD, and Structure.]

Genomic control method
When the population is structured, the variance of the test statistic is inflated.

When the population is structured, the i-th smallest p value tends to be smaller than its expected value i/(N+1).

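Genomic control method
$\lambda = \mathrm{median}(\chi^2_{\mathrm{obs}}) / \chi^2_{0.5}$, where $\chi^2_{\mathrm{obs}}$ are the observed chi-square values and $\chi^2_{0.5}$ is the chi-square value that gives p = 0.5 (about 0.455 for 1 degree of freedom).
Corrected chi-square: $\chi^2_{\mathrm{corrected}} = \chi^2_{\mathrm{obs}} / \lambda$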
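The two formulas can be applied directly. A minimal sketch, assuming 1-degree-of-freedom statistics (for example allelic tests); the chi-square value that gives p = 0.5 is found by bisection on `math.erfc` instead of a statistics library, and the input statistics are hypothetical.

```python
import math
import statistics

def chi2_df1_sf(x):
    """Upper-tail probability of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(x / 2))

def chi2_df1_median(tol=1e-12):
    """Chi-square value (df = 1) whose p value is 0.5, found by bisection."""
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_df1_sf(mid) > 0.5:
            lo = mid  # p is still above 0.5, so the median lies further right
        else:
            hi = mid
    return (lo + hi) / 2

def genomic_control(chi2_values):
    """Return lambda, corrected chi-square values and corrected p values."""
    lam = statistics.median(chi2_values) / chi2_df1_median()
    corrected = [x / lam for x in chi2_values]
    return lam, corrected, [chi2_df1_sf(x) for x in corrected]

m = chi2_df1_median()
lam, corrected, p = genomic_control([0.1, 2 * m, 5.0])  # hypothetical inflated statistics
print(round(m, 4), round(lam, 3))  # → 0.4549 2.0
```

Dividing every statistic by the same λ moves the whole Q-Q plot back toward y = x, which is also why every p value grows when λ > 1.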

The GC method corrects the Q-Q plot so that it fits the line y = x.

Genomic control method
With GC correction, all the p values become larger (when λ > 1), so the method is conservative.

EIGENSTRAT
A principal-component-based method. It identifies vectors (principal components) that describe the population structure, then assesses each SNP together with these vectors and recalculates the p value for case-control association.
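The idea can be illustrated with a heavily simplified, pure-Python sketch. It is not the EIGENSTRAT software: it is modeled loosely on the published description, finds only one axis by power iteration, removes that axis from both the genotype and the phenotype, and uses (N − K − 1) r² as a 1-df statistic. All data and helper names are hypothetical.

```python
import math
import random

def normalize_marker(g):
    """Center genotypes and scale by sqrt(p(1-p)), p = allele frequency."""
    mean = sum(g) / len(g)
    p = mean / 2
    scale = math.sqrt(p * (1 - p)) or 1.0  # avoid dividing by 0 for monomorphic markers
    return [(x - mean) / scale for x in g]

def top_axis(markers, n_iter=50, seed=0):
    """Leading eigenvector of the individual-by-individual covariance (power iteration)."""
    rows = [normalize_marker(g) for g in markers]
    n = len(rows[0])
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(n_iter):
        scores = [sum(r[j] * v[j] for j in range(n)) for r in rows]  # M v
        v = [sum(s * r[j] for s, r in zip(scores, rows)) for j in range(n)]  # M^T (M v)
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v

def centered(x):
    mean = sum(x) / len(x)
    return [a - mean for a in x]

def remove_axis(x, axis):
    """Center x, then subtract its projection onto the unit-length axis."""
    xc = centered(x)
    coef = sum(a * b for a, b in zip(xc, axis))
    return [a - coef * b for a, b in zip(xc, axis)]

def trend_chi2(g, phenotype, axis=None):
    """(N - K - 1) * r^2 between genotype and phenotype, with K = 0 or 1 axes removed."""
    if axis is None:
        ga, ya, k = centered(g), centered(phenotype), 0
    else:
        ga, ya, k = remove_axis(g, axis), remove_axis(phenotype, axis), 1
    sgy = sum(a * b for a, b in zip(ga, ya))
    sgg = sum(a * a for a in ga)
    syy = sum(b * b for b in ya)
    if sgg == 0 or syy == 0:
        return 0.0
    return (len(g) - k - 1) * sgy * sgy / (sgg * syy)

# Hypothetical structured sample: 40 individuals from each of two subpopulations,
# with most cases drawn from subpopulation 0 (biased sampling).
rng = random.Random(3)
pop = [0] * 40 + [1] * 40
phenotype = [1] * 32 + [0] * 8 + [1] * 8 + [0] * 32

def draw(freq):
    return sum(rng.random() < freq for _ in range(2))

background = []
for _ in range(100):
    freqs = (rng.uniform(0.1, 0.9), rng.uniform(0.1, 0.9))
    background.append([draw(freqs[p]) for p in pop])
# Unrelated to disease within each subpopulation, but frequencies differ between them.
test_marker = [draw(0.8 if p == 0 else 0.2) for p in pop]

axis = top_axis(background)
unadjusted = trend_chi2(test_marker, phenotype)
adjusted = trend_chi2(test_marker, phenotype, axis)
print("unadjusted chi2:", round(unadjusted, 2), "adjusted chi2:", round(adjusted, 2))
```

Unlike genomic control, which divides every statistic by the same λ, this adjustment acts marker by marker, so it can move different p values in different directions.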

EIGENSTRAT makes some nominal p values larger and others smaller.

Examples of dependent tests
– Multiple tests for one SNP (2x3 genotype test, dominant model, recessive model, and trend test) are not mutually independent.
– Tests for markers in LD are not independent.
– A test for a SNP and a test for a haplotype containing that SNP are not mutually independent.
– Markers far away from each other can be dependent when the sampled population is structured.
– When multiple mutually dependent phenotypes are tested, the tests are dependent.
– …and so on.