Genome-wide association studies Usman Roshan. SNP Single nucleotide polymorphism Specific position and specific chromosome.

1 Genome-wide association studies Usman Roshan

2 SNP
Single nucleotide polymorphism: a variation at a specific position on a specific chromosome

3 SNP genotype
Suppose this is the DNA on chromosome 1 starting from position 1. There is a C/G SNP at position 5, a C/T SNP at position 14, and a G/T SNP at position 21. This person is heterozygous at the first SNP and homozygous at the other two.
F: AACACAATTAGTACAATTATGAC
M: AACAGAATTAGTACAATTATGAC

4 SNP genotype representation
The example
F: AACACAATTAGTACAATTATGAC
M: AACAGAATTAGTACAATTATGAC
is represented as CGCCGG …
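The representation above can be sketched in a few lines of Python. This is only an illustration: the function name `snp_genotype` is hypothetical, and sorting the two alleles alphabetically within each pair is an assumption made so that the output matches the CGCCGG form shown on the slide.

```python
def snp_genotype(f_strand, m_strand, positions):
    """Concatenate the two alleles found at each 1-based SNP position."""
    pairs = []
    for pos in positions:
        # One allele from each strand; sort so a pair is always written the same way.
        alleles = sorted([f_strand[pos - 1], m_strand[pos - 1]])
        pairs.append("".join(alleles))
    return "".join(pairs)


F = "AACACAATTAGTACAATTATGAC"
M = "AACAGAATTAGTACAATTATGAC"

genotype = snp_genotype(F, M, [5, 14, 21])
print(genotype)  # CGCCGG
```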

5 SNP genotype
For several individuals:
      A/T  C/T  G/T  …
H0:   AA   TT   GG   …
H1:   AT   CC   GT   …
H2:   AA   CT   GT   …

6 SNP genotype encoding
If a SNP is A/B (alleles in alphabetical order), count the number of times B appears in the genotype. The previous example becomes:
      A/T  C/T  G/T  …         A/T  C/T  G/T  …
H0:   AA   TT   GG   …          0    2    0   …
H1:   AT   CC   GT   …    =>    1    0    1   …
H2:   AA   CT   GT   …          0    1    1   …
Now we have the data in numerical format.
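A minimal sketch of this encoding, assuming each genotype is given as a string with two letters per SNP and each SNP as an (A, B) pair in alphabetical order. The helper name `encode_genotype` is an assumption for illustration, not something from the slides.

```python
def encode_genotype(genotype, snps):
    """Return, for each SNP, how many copies of its second (B) allele appear.

    genotype: two letters per SNP, e.g. "AATTGG"
    snps:     list of (A, B) allele pairs, alphabetically ordered
    """
    codes = []
    for i, (_a, b) in enumerate(snps):
        pair = genotype[2 * i: 2 * i + 2]  # the two alleles at SNP i
        codes.append(pair.count(b))        # 0, 1, or 2 copies of B
    return codes


snps = [("A", "T"), ("C", "T"), ("G", "T")]
individuals = [("H0", "AATTGG"), ("H1", "ATCCGT"), ("H2", "AACTGT")]

encoded = {name: encode_genotype(g, snps) for name, g in individuals}
for name, codes in encoded.items():
    print(name, codes)
```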

7 Genome-wide association studies (GWAS)
Aim to identify which regions (or SNPs) in the genome are associated with a disease or other phenotype.
Design:
–Identify population structure
–Select case subjects (those with disease)
–Select control subjects (healthy)
–Genotype a million SNPs for each subject
–Determine which SNPs are associated.

8 Example GWAS
           A/T  C/G  A/G  …
Case 1     AA   CC   AA
Case 2     AT   CG   AA
Case 3     AA   CG   AA
Control 1  TT   GG   GG
Control 2  TT   CC   GG
Control 3  TA   CG   GG

9 Encoded data
        A/T  C/G  A/G         A/T  C/G  A/G
Case1   AA   CC   AA           0    0    0
Case2   AT   CG   AA           1    1    0
Case3   AA   CG   AA     =>    0    1    0
Con1    TT   GG   GG           2    2    2
Con2    TT   CC   GG           2    0    2
Con3    TA   CG   GG           1    1    2

10 Ranking SNPs
        SNP1 SNP2 SNP3        SNP1 SNP2 SNP3
        A/T  C/G  A/G         A/T  C/G  A/G
Case1   AA   CC   AA           0    0    0
Case2   AT   CG   AA           1    1    0
Case3   AA   CG   AA     =>    0    1    0
Con1    TT   GG   GG           2    2    2
Con2    TT   CC   GG           2    0    2
Con3    TA   CG   GG           1    1    2
A good ranking strategy would produce SNP3, SNP1, SNP2.
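To make the idea of a ranking concrete, here is one very simple scoring rule applied to the encoded data: the absolute difference between the mean encoded value in cases and in controls. This rule is an assumption chosen for illustration only; it is not the method the slides go on to recommend (the chi-square test), and the function name `rank_snps` is hypothetical.

```python
# Encoded genotypes from the example: one row per subject, one column per SNP.
cases = [[0, 0, 0], [1, 1, 0], [0, 1, 0]]
controls = [[2, 2, 2], [2, 0, 2], [1, 1, 2]]


def mean(values):
    return sum(values) / len(values)


def rank_snps(cases, controls):
    """Rank SNPs by |mean(case) - mean(control)|, largest difference first."""
    n_snps = len(cases[0])
    scores = []
    for j in range(n_snps):
        case_mean = mean([row[j] for row in cases])
        control_mean = mean([row[j] for row in controls])
        scores.append(abs(case_mean - control_mean))
    order = sorted(range(n_snps), key=lambda j: scores[j], reverse=True)
    return ["SNP%d" % (j + 1) for j in order]


ranking = rank_snps(cases, controls)
print(ranking)  # ['SNP3', 'SNP1', 'SNP2']
```

On this toy data the rule reproduces the ranking given on the slide, because SNP3 separates cases from controls perfectly while SNP2 barely differs between the groups.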

11 Chi-square test
The gold standard is the univariate non-parametric chi-square test with two degrees of freedom.
Search for SNPs that deviate from the independence assumption.
Rank SNPs by p-value.

12 Case-control example
Study of 100 people:
–Case: 50 subjects with cancer
–Control: 50 subjects without cancer
Count the number of alleles and form a 2x2 contingency table:
           #Allele1  #Allele2
Case          10        90
Control        2        98
Relative risk: RR = Pr(disease | one copy of risk allele) / Pr(disease | zero copies of risk allele) (Jewell ‘03)
Because subjects are sampled by disease status, we cannot estimate the relative risk from a case-control study.
But we can estimate the odds ratio.

13 Odds ratio
The odds of an event A are defined as Odds(A) = Pr(A)/Pr(~A).
An odds ratio is the ratio of two odds. For example, the ratio of the odds of A and B is
OR = Odds(A)/Odds(B) = [Pr(A)/Pr(~A)] / [Pr(B)/Pr(~B)]
The odds ratio of disease between the exposed (G=1) and unexposed (G=0) groups is
OR = Odds(D|G=1)/Odds(D|G=0)
   = [Pr(D|G=1)/Pr(~D|G=1)] / [Pr(D|G=0)/Pr(~D|G=0)]
   = [Pr(D|G=1)/Pr(D|G=0)] x [Pr(~D|G=0)/Pr(~D|G=1)]
   = RR x Pr(~D|G=0)/Pr(~D|G=1)

14 Symmetry in odds ratio
The odds ratio is symmetric in disease and genotype:
OR = Odds(D|G=1)/Odds(D|G=0) = Odds(G|D=1)/Odds(G|D=0)
Great! Because we can estimate P(G|D) from a case-control study, we can now use the OR as an estimate of one’s risk of disease. (By the previous slide, the OR is close to the RR when Pr(~D|G=0)/Pr(~D|G=1) is close to 1, as it is for a rare disease.)
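The symmetry can be checked numerically. The sketch below uses made-up joint counts of genotype and disease status (these numbers are hypothetical and do not come from the slides) and exact fractions, and computes the odds ratio both ways.

```python
from fractions import Fraction


def odds(p):
    """Odds of an event with probability p."""
    return p / (1 - p)


# Hypothetical population counts, for illustration only.
g1_d1, g1_d0 = 30, 70    # risk genotype (G=1): diseased, healthy
g0_d1, g0_d0 = 10, 190   # other genotype (G=0): diseased, healthy

# Conditioning on genotype: Odds(D|G=1) / Odds(D|G=0)
or_d_given_g = (odds(Fraction(g1_d1, g1_d1 + g1_d0))
                / odds(Fraction(g0_d1, g0_d1 + g0_d0)))

# Conditioning on disease status: Odds(G|D=1) / Odds(G|D=0)
or_g_given_d = (odds(Fraction(g1_d1, g1_d1 + g0_d1))
                / odds(Fraction(g1_d0, g1_d0 + g0_d0)))

print(or_d_given_g, or_g_given_d)  # 57/7 57/7
```

Both expressions reduce to the same cross-product ratio of the four counts, which is why the odds ratio does not depend on which variable was used for sampling.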

15 Example
           #Allele1 (risk)  #Allele2 (wildtype)
Case             10                90
Control           2                98
Odds of risk allele in case = (10/100)/(90/100) = 1/9
Odds of risk allele in control = (2/100)/(98/100) = 1/49
Odds ratio of risk allele = (1/9)/(1/49) = 49/9
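The arithmetic in this example can be reproduced with exact fractions. This is a small sketch; the function name `allelic_odds_ratio` is an assumption for illustration.

```python
from fractions import Fraction


def odds(p):
    """Odds of an event with probability p."""
    return p / (1 - p)


def allelic_odds_ratio(case_risk, case_wild, control_risk, control_wild):
    """Odds ratio of the risk allele, cases versus controls, from allele counts."""
    case_odds = odds(Fraction(case_risk, case_risk + case_wild))
    control_odds = odds(Fraction(control_risk, control_risk + control_wild))
    return case_odds / control_odds


odds_ratio = allelic_odds_ratio(10, 90, 2, 98)
print(odds_ratio)  # 49/9
```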

16 Statistical test of association (P-values)
P-value = probability of the observed data (or data more extreme) under the null hypothesis
Example:
–Suppose we are given a series of coin tosses
–We feel that a biased coin produced the tosses
–We can ask the following question: if the coin were fair, what is the probability of tosses at least as extreme as these?
–If this probability is very small then it is unlikely that a fair coin produced the observed tosses.
–In this example the null hypothesis is the fair coin and the alternative hypothesis is the biased coin

17 Binomial distribution
Bernoulli random variable:
–Two outcomes: success or failure
–Example: coin toss
Binomial random variable:
–Number of successes in a series of independent Bernoulli trials
Example:
–Probability of heads = 0.5
–Given four coin tosses, what is the probability of three heads?
–Possible outcomes: HHHT, HHTH, HTHH, THHH
–Each outcome has probability = 0.5^4
–Total probability = 4 * 0.5^4

18 Binomial distribution
Bernoulli trial: probability of success = p, probability of failure = 1-p
Given n independent Bernoulli trials, what is the probability of k successes?
Binomial applet: http://www.stat.tamu.edu/~west/applets/binomialdemo.html
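The question on this slide has a standard closed-form answer, the binomial probability mass function. It is written out here as a worked equation, with the four-toss example from the previous slide (n = 4, k = 3, p = 0.5) substituted in as a check:

```latex
P(X = k) = \binom{n}{k} \, p^{k} (1-p)^{n-k}
\qquad\Longrightarrow\qquad
P(X = 3) = \binom{4}{3} (0.5)^{3} (0.5)^{1} = 4 \cdot 0.5^{4} = 0.25
```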

19 Hypothesis testing under Binomial hypothesis
Null hypothesis: fair coin (probability of heads = probability of tails = 0.5)
Data: HHHHTHTHHHHHHHTHTHTH (20 tosses, 15 heads)
P-value under null hypothesis = probability that #heads >= 15
This probability is 0.021
Since it is below 0.05 we can reject the null hypothesis
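The p-value quoted above can be reproduced by summing the binomial probabilities for 15 or more heads in 20 fair tosses. A minimal sketch using only the standard library follows; the function name `binomial_tail` is an assumption for illustration.

```python
from math import comb


def binomial_tail(n, k_min, p=0.5):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))


tosses = "HHHHTHTHHHHHHHTHTHTH"
heads = tosses.count("H")  # observed number of heads

# One-sided p-value: probability of at least this many heads under a fair coin.
p_value = binomial_tail(len(tosses), heads)
print(heads, round(p_value, 3))  # 15 0.021
```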

20 Chi-square statistic
Define four random variables Xi, each of which is binomially distributed: Xi ~ B(n, pi), where n = c1+c2+c3+c4 is the total count in the table and pi is the probability of success of Xi.
Each variable Xi represents the count in one cell of the table: case or control, risk or wildtype allele.
The expected value is E(Xi) = n*pi since each Xi is binomial.
           #Allele1 (risk)  #Allele2 (wildtype)
Case          c1 (X1)            c2 (X2)
Control       c3 (X3)            c4 (X4)

21 Chi-square statistic
Define the statistic:
chi^2 = sum over i of (c_i - e_i)^2 / e_i
where
c_i = observed frequency for the i-th outcome
e_i = expected frequency for the i-th outcome
n = total number of outcomes (categories)
The probability distribution of this statistic is given by the chi-square distribution with n-1 degrees of freedom.
Proof can be found at http://ocw.mit.edu/NR/rdonlyres/Mathematics/18-443Fall2003/4226DF27-A1D0-4BB8-939A-B2A4167B5480/0/lec23.pdf
Great. But how do we use this to get a SNP p-value?

22 Null hypothesis for case control contingency table
           #Allele1 (risk)  #Allele2 (wildtype)
Case            c1                 c2
Control         c3                 c4
We have two random variables:
–D: disease status
–G: allele type.
Null hypothesis: the two variables are independent of each other (unrelated)
Under independence
–P(D,G) = P(D)P(G)
–P(D=case) = (c1+c2)/n
–P(G=risk) = (c1+c3)/n
Expected values
–E(X1) = P(D=case)P(G=risk)n
We can calculate the chi-square statistic for a given SNP and, from it, a p-value for the hypothesis that the SNP is independent of disease status. SNPs with very small p-values deviate significantly from the independence assumption and are therefore considered important.
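The steps above can be sketched for the earlier case-control table (10 and 90 alleles in cases, 2 and 98 in controls). Two assumptions are made here that the slides do not state: the test of independence on a 2x2 table is taken to have one degree of freedom, and the one-degree-of-freedom chi-square tail is computed as erfc(sqrt(x/2)). The function name `chi_square_2x2` is hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(c1, c2, c3, c4):
    """Chi-square statistic and p-value for a 2x2 table.

    Layout: rows are case/control, columns are risk/wildtype allele counts.
    Assumes every expected count is non-zero.
    """
    n = c1 + c2 + c3 + c4
    row_totals = (c1 + c2, c3 + c4)   # case, control
    col_totals = (c1 + c3, c2 + c4)   # risk, wildtype
    observed = ((c1, c2), (c3, c4))

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence: P(row) * P(column) * n
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected

    # Assumption: 1 degree of freedom, whose upper tail is erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


statistic, p_value = chi_square_2x2(10, 90, 2, 98)
print(round(statistic, 3), p_value)
```

Here every expected count comes from the row and column totals alone (6 and 94 in each row), so the statistic measures only how far the observed cells sit from what independence would predict.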

23 Chi-square statistic exercise
           #Allele1  #Allele2
Case          15        35
Control        2        48
Compute the expected values and the chi-square statistic.
Compute the chi-square p-value by referring to the chi-square distribution.

