Presentation is loading. Please wait.

Presentation is loading. Please wait.

Association mapping for mendelian, and complex disorders January 16Bafna, BfB.

Similar presentations


Presentation on theme: "Association mapping for mendelian, and complex disorders January 16Bafna, BfB."— Presentation transcript:

1 Association mapping for mendelian, and complex disorders January 16Bafna, BfB

2 UG Bioinformatics specialization at UCSD January 16Bafna, BfB

3 Abstraction of a causal mutation January 16Bafna, BfB

4 Looking for the mutation in populations January 16Bafna, BfB A possible strategy is to collect cases (affected) and control individuals, and look for a mutation that consistently separates the two classes. Next, identify the gene.

5 Looking for the causal mutation in populations January 16Bafna, BfB Case Control Problem 1: many unrelated common mutations, around one every 1000bp

6 Case Control January 16Bafna, BfB

7 Looking for the causal mutation in populations January 16Bafna, BfB Case Control Problem 2: We may not sample the causal mutation.

8 How to hunt for disease genes We are guided by two simple facts governing these mutations 1. Nearby mutations are correlated 2. Distal mutations are not January 16Bafna, BfB Case Control

9 This lecture 1. The bottom line: How do these facts help in finding disease genes? 2. The genetics: why should this happen? 3. The computation 4. Challenge of complex diseases. January 16Bafna, BfB Case Control 1. Nearby mutations are correlated 2. Distal mutations are not

10 The basics of association mapping Sample a population of individuals at variant locations across the genome. Typically, these variants are single nucleotide polymorphisms (SNPs). Create a new bi-allelic variant corresponding to cases and controls, and test for correlations. By our assumptions, only the proximal variants will be correlated. Investigate genes near the correlated variants. January 16Bafna, BfB Case Control 0 0 0 0 1 1 1 1

11 So, why should the proximal SNPs be correlated, and distal SNPs not? January 16Bafna, BfB

12 A bit of evolution Consider a fixed population (of chromosomes) evolving in time. Each individual arises from a unique, randomly chosen parent from the previous generation January 16Bafna, BfB Time

13 Genealogy of a chromosomal population Current (extant) population Time January 16Bafna, BfB

14 Adding mutations January 16Bafna, BfB Infinite sites assumption: A mutation occurs at most once at a site.

15 SNPs January 16Bafna, BfB The collection of acquired mutations in the extant population describe the SNPs

16 Fixation and elimination Not all mutations survive. Some mutations get fixed, and are no longer polymorphic January 16Bafna, BfB

17 Removing extinct genealogies January 16Bafna, BfB

18 Removing fixed mutations January 16Bafna, BfB

19 The coalescent January 16Bafna, BfB

20 Disease mutation January 16Bafna, BfB We drop the ancestral chromosomes, and place the mutations on the internal branches.

21 Disease mutation A causal mutation creates a clade of affected descendants. January 16Bafna, BfB

22 Disease mutation Note that the tree (genealogy) is hidden. However, the underlying tree topology introduces a correlation between each pair of SNPs January 16Bafna, BfB

23 What have we learnt? The underlying genealogy creates a correlation between SNPs. By itself, this is not sufficient, because distal SNPs might also be correlated. Fortunately, for us the correlation between distal SNPs is quickly destroyed. January 16Bafna, BfB

24 Recombination January 16Bafna, BfB

25 Recombination In our idealized model, we assume that each individual chromosome chooses two parental chromosomes from the previous generation January 16Bafna, BfB

26 Multiple recombination change the local genealogy January 16Bafna, BfB

27 A bit of evolution Proximal SNPs are correlated, distal SNPs are not. The correlation (Linkage disequilibirium) decays rapidly after 20-50kb January 16Bafna, BfB

28 BASIC STATISTICS January 16Bafna, BfB

29 Testing for correlation In the absence of correlation January 16Bafna, BfB

30 Testing for correlation When correlated January 16Bafna, BfB

31 Assigning confidence 22 22 January 16Bafna, BfB X X 40 04 X X Expected Observed

32 Assigning confidence 22 22 January 16Bafna, BfB X X 40 04 X X Expected Observed

33 Assigning confidence 2.5 1.5 January 16Bafna, BfB X 32 12 X Expected Observed X X

34 STATISTICAL TESTS OF ASSOCIATION January 16Bafna, BfB

35 Tests for association: Pearson Case-control phenotype: –Build a 3X2 contingency table –Pearson test (2df)= CasesControls mm Mm MM O1O1 O2O2 O3O3 O4O4 O6O6 O5O5 January 16Bafna, BfB

36 The χ 2 test CasesControls mm Mm MM O1O1 O5O5 O3O3 O4O4 O2O2 O6O6 The statistic behaves like a χ 2 distribution. A p-value can be computed directly January 16Bafna, BfB

37 Χ 2 distribution properties A related distribution is the F-distribution January 16Bafna, BfB

38 Likelihood ratio Another way to check the extremeness of the distribution is by computing a (log) likelihood ratio. We have two competing hypothesis. Let N be the total number of observations January 16Bafna, BfB

39 LLR An LLR value close to 0, implies that the null hypothesis is true. Asymptotically, the LLR statistic also follows the chi-square distribution. January 16Bafna, BfB

40 Exact test The chi-square test does not work so well when the numbers are small. How can we compute an exact probability of seeing a specific distribution of values in the cells? Remember: we know the marginals (# cases, # controls, January 16Bafna, BfB

41 Fischer exact test CasesControls mm Mm MM a e cd b f Num: #ways of getting configuration (a,b,c,d,e,f) Den: #ways of ensuring that the row sums and column sums are fixed January 16Bafna, BfB

42 Fischer exact test Remember that the probability of seeing any specific values in the cells is going to be small. To get a p-value, we must sum over all similarly extreme values. How? January 16Bafna, BfB

43 Test for association: Fisher exact test Here P is the probability of seeing the exact count. The actual significance is computed by summing over all such tables that are at least this extreme. CasesControls mm Mm MM a e cd b f January 16Bafna, BfB

44 Test for association: Fisher exact test CasesControls mm Mm MM a e cd b f January 16Bafna, BfB

45 Continuous outcomes Instead of discrete (Case/control) data, we have real-valued phenotypes –Ex: Diastolic Blood Pressure In this case, how do we test for association January 16Bafna, BfB

46 Continuous outcome ANOVA Often, the phenotypes are not offered as case- controls but like a continuous variable –Ex: blood-pressure measurements Question: Are the mean values of the two groups significantly different? MMmm January 16Bafna, BfB

47 Two-sided t-test For two categories, ANOVA is also known as the t-test Assume that the variables from the two sets are drawn from Normal distributions –Different means, equal variances Null hypothesis is that they are both from the same distribution January 16Bafna, BfB

48 t-test continued January 16Bafna, BfB

49 Two-sample t-test As the variance is not known, we use an estimate S, defined by The T-statistic is given by Significant deviations from 0 are used to reject the Null hypothesis January 16Bafna, BfB

50 Two-sample t-test (unequal variances) If the variances cannot be assumed to be equal, we use The T-statistic is given by Significant deviations from 0 are used to reject the Null hypothesis January 16Bafna, BfB

51 CONFOUNDING ASSOCIATION January 16Bafna, BfB

52 Confounding association Association tests can be confounded in many ways. We will explore a few of these, at a high level, and point to a few algorithmic problems. January 16Bafna, BfB

53 Confounding association with population substructure January 16Bafna, BfB If the cases and controls are from different subpopulations, then sites with differing allele frequencies will confound association

54 The algorithmic problem Given a collection of individual genotypes, separate them into sub-populations. Idea: take markers that are very far apart so that no LD is possible. LD indicates structure. Problem: Partition individuals into sub-populations so that all correlation across pairs of distant markers is minimized. Penalty for increasing sub-populations? January 16Bafna, BfB

55 Confounding associations with genotypes January 16Bafna, BfB A recombination event Distinct haplotypes can create identical genotypes confounding association

56 Confounding association with interactions Individually, the markers do not correlate. Together, they perfectly predict genes. Find interacting partners that associate with genes January 16Bafna, BfB

57 Confounding association with rare variants Not only can we have multiple interacting SNPs, each SNP individually occurs with very low frequency (< 1%). Can you detect associations with rare variants? January 16Bafna, BfB

58 Other problems January 16Bafna, BfB Can we reconstruct the phylogeny? Useful for computing recombination bounds.

59 Conclusion As individual genomes are sequenced, the association of variations with phenotypes presents many confounding challenges. Some of these challenges can be modeled as algorithmic problems. Population genetics should be part of a bioinformatics undergraduate curriculum. January 16Bafna, BfB

60 Thank you Homework (due Monday, March 15) –Describe an algorithm to detect associations of interacting, rare-variants with a complex disease phenotype, in the presence of population substructure in the case-control population. January 16Bafna, BfB


Download ppt "Association mapping for mendelian, and complex disorders January 16Bafna, BfB."

Similar presentations


Ads by Google