
Association Mapping by Local Genealogies
Thomas Mailund, Bioinformatics Research Center, University of Aarhus


1 Association Mapping by Local Genealogies
Thomas Mailund, Bioinformatics Research Center, University of Aarhus
http://www.birc.au.dk/~mailund
mailund@birc.au.dk

2 Disease mapping...
--A--------C--------A----G---X----T---C---A----
--T--------G--------A----G---X----C---C---A----
--A--------G--------G----G---X----C---C---A----
--A--------C--------A----G---X----T---C---A----
--T--------C--------A----G---X----T---C---A----
--T--------C--------A----T---X----T---A---A----
--A--------C--------A----G---X----T---C---A----
--A--------C--------A----G---X----T---C---G----
--T--------C--------A----T---X----T---C---A----
--A--------C--------A----G---X----T---C---A----
--A--------C--------G----T---X----C---A---A----
--A--------C--------A----G---X----C---C---G----
Cases (affected) / Controls (unaffected)
Locate disease locus
- Unlikely to be among our genotyped markers
- Use information from available markers

3 Indirect signal for causal locus
--A--------C--------A----G---X----T---C---A----
--T--------G--------A----G---X----C---C---A----
--A--------G--------G----G---X----C---C---A----
--A--------C--------A----G---X----T---C---A----
--T--------C--------A----G---X----T---C---A----
--T--------C--------A----T---X----T---A---A----
--A--------C--------A----G---X----T---C---A----
--A--------C--------A----G---X----T---C---G----
--T--------C--------A----T---X----T---C---A----
--A--------C--------A----G---X----T---C---A----
--A--------C--------G----T---X----C---A---A----
--A--------C--------A----G---X----C---C---G----
The markers are not independent
- Knowing one marker is partial knowledge of others
- This dependency decreases with distance
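The dependency between two markers is commonly quantified as linkage disequilibrium. The following is a minimal sketch, assuming biallelic markers and phased haplotypes given as strings; the function name `r_squared` and the toy haplotypes are illustrative and do not come from the talk.

```python
def r_squared(haplotypes, i, j):
    """Squared allelic correlation (r^2) between biallelic markers i and j.

    `haplotypes` is a list of equal-length strings, one phased haplotype each.
    """
    n = len(haplotypes)
    a = haplotypes[0][i]  # reference allele at marker i (arbitrary choice)
    b = haplotypes[0][j]  # reference allele at marker j (arbitrary choice)
    p_a = sum(h[i] == a for h in haplotypes) / n
    p_b = sum(h[j] == b for h in haplotypes) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # A monomorphic marker carries no information; report 0 in that case.
    return d * d / denom if denom > 0 else 0.0


# Two markers that always occur together: knowing one determines the other.
linked = ["AC", "AC", "TG", "TG"]
# Two markers in linkage equilibrium: knowing one says nothing about the other.
unlinked = ["AC", "AG", "TC", "TG"]
```

Here `r_squared(linked, 0, 1)` evaluates to 1 and `r_squared(unlinked, 0, 1)` to 0. Real data fall in between, with values tending to decrease as the distance between markers grows.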

4 The Ancestral Recombination Graph
Locally, the genealogy of a small genomic region is the Ancestral Recombination Graph (ARG) (Hudson 1990; Griffiths & Marjoram 1996)

5 The Ancestral Recombination Graph
[Figure labels: sampled sequences, MRCA] (Hudson 1990; Griffiths & Marjoram 1996)

6 The Ancestral Recombination Graph
[Figure labels: recombination, coalescence] (Hudson 1990; Griffiths & Marjoram 1996)

7 The Ancestral Recombination Graph
[Figure labels: ancestral material, non-ancestral material] (Hudson 1990; Griffiths & Marjoram 1996)

8 The Ancestral Recombination Graph
[Figure labels: mutations 1, 2, 3, 4] (Hudson 1990; Griffiths & Marjoram 1996)

9 The Ancestral Recombination Graph
The unknown ARG, mutation locus and disease status can be explored using statistical sampling methods (Larribe, Lessard and Schork, 2002)
This is very CPU demanding!

10 Local trees
For each “point” on the chromosome, the ARG determines a (local) tree.

14 Local trees
- Type 1: No change
- Type 2: Change in branch lengths
- Type 3: Change in topology
From Hein et al. 2005

15 Local trees
[Figure: tree measure as a function of recombination rate; the formula defining the tree measure is not preserved]
From Hein et al. 2005

16 Using the local trees
Tree genealogies
- Each site a different genealogy
- Nearby genealogies only slightly different
--T--------G--------A----G---X----C----C-----A--
--A--------G--------G----G---X----C----C-----A--
--A--------C--------A----G---X----T----C-----A--
--T--------C--------A----G---X----T----C-----A--
--T--------C--------A----T---X----T----A-----A--
--A--------C--------A----G---X----T----C-----A--
[Figure: local trees with leaves labelled by marker alleles]
A nearby tree is an imperfect local tree

17 Using the local trees
Tree at disease site (H = healthy, D = diseased):
- “Perfect” setup: HHHHHHHH DDDDD
- Incomplete penetrance: HHHHHHHH DDDHD
- Other disease causes: HDHHHDHH DDDHD
Templeton et al 1987

18 Using the local trees
At the disease site:
- A significant clustering of diseased/healthy
HDHHHDHH DDDHD
Templeton et al 1987

19 Using the local trees
--T--------G--------A----G---X----C----C-----A--
--A--------G--------G----G---X----C----C-----A--
--A--------C--------A----G---X----T----C-----A--
--T--------C--------A----G---X----T----C-----A--
--T--------C--------A----T---X----T----A-----A--
--A--------C--------A----G---X----T----C-----A--
[Figure: local trees with leaves labelled by marker alleles]
Tree at disease site resembles neighbours

20 Using the local trees
Near the disease site:
- A significant clustering of diseased/healthy
HDHHHDHH DDDHD
Zöllner & Pritchard 2005; Mailund et al 2006; Sevon et al 2006

21 Using the local trees
Approach:
- Infer trees over regions
- Score the regions with respect to their clustering
HDHHHDHH DDDHD
Zöllner & Pritchard 2005; Mailund et al 2006; Sevon et al 2006

22 Using the local trees
In the infinite sites model:
- Each mutation occurs only once
- Each mutation splits the sample in two
- A consistent tree can efficiently be inferred for a recombination-free region
Mailund et al 2006
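Why the splits fit together into a tree can be sketched in a few lines. This is an illustration only, not the BLOSSOC implementation: it assumes haplotypes coded as strings of `0` (ancestral) and `1` (derived), and the names `perfect_phylogeny` and `parent_of` are hypothetical.

```python
def perfect_phylogeny(haplotypes):
    """Return the clades (frozensets of haplotype indices) implied by the
    markers, largest first, or raise ValueError if they do not fit one tree.

    Assumption: '0' is the ancestral allele and '1' the derived allele.
    """
    n_markers = len(haplotypes[0])
    clades = set()
    for m in range(n_markers):
        # Each mutation splits the sample into carriers and non-carriers.
        carriers = frozenset(i for i, h in enumerate(haplotypes) if h[m] == "1")
        if carriers:
            clades.add(carriers)
    clades = sorted(clades, key=len, reverse=True)
    for x in range(len(clades)):
        for y in range(x + 1, len(clades)):
            a, b = clades[x], clades[y]
            # In a rooted tree, two clades are either nested or disjoint.
            if not (b <= a or a.isdisjoint(b)):
                raise ValueError("markers are incompatible with a single tree")
    return clades


def parent_of(clade, clades):
    """Smallest clade strictly containing `clade`, or None if there is none."""
    supersets = [c for c in clades if clade < c]
    return min(supersets, key=len) if supersets else None


haps = ["000", "100", "110", "111"]
clades = perfect_phylogeny(haps)
# clades: {1, 2, 3}, then {2, 3}, then {3}, each nested in the previous one
```

The pairwise check here is quadratic in the number of markers; it is kept simple for readability and says nothing about the efficiency of the published method.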

23 BLOck aSSOCiation (BLOSSOC)
Use the four-gamete test to find regions, around each locus, that can be explained by a tree
Mailund et al 2006
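The four-gamete test declares two biallelic markers compatible with a single tree when at most three of the four possible allele combinations are observed. A minimal sketch follows; the greedy window extension in `compatible_region` is an assumption made for illustration, since the slide does not state how regions are grown, and both function names are hypothetical.

```python
def compatible(haplotypes, i, j):
    """Four-gamete test: True if markers i and j show at most three of the
    four possible allele combinations (gametes)."""
    gametes = {(h[i], h[j]) for h in haplotypes}
    return len(gametes) < 4


def compatible_region(haplotypes, locus):
    """Grow a window around `locus` while all marker pairs in it pass the test.

    Assumption: the window is extended greedily, one marker to the right and
    one to the left per round, until neither side can be extended.
    """
    n_markers = len(haplotypes[0])
    lo = hi = locus

    def ok(new, lo, hi):
        # The new marker must be compatible with every marker in the window.
        return all(compatible(haplotypes, new, k) for k in range(lo, hi + 1))

    grew = True
    while grew:
        grew = False
        if hi + 1 < n_markers and ok(hi + 1, lo, hi):
            hi += 1
            grew = True
        if lo - 1 >= 0 and ok(lo - 1, lo, hi):
            lo -= 1
            grew = True
    return lo, hi


haps = ["0000", "1000", "1101", "1110", "0001"]
region = compatible_region(haps, 1)  # (0, 2): marker 3 conflicts with 0 and 1
```

Every marker pair inside the returned window passes the test, so under the infinite sites model the window can be explained by one tree.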

24 BLOck aSSOCiation (BLOSSOC)
Build a tree for each such region
Mailund et al 2006

28 BLOck aSSOCiation (BLOSSOC)
Score the tree, and assign the score to the locus
Mailund et al 2006

29 The tree construction is more complicated, but still possible and still efficient, for unphased sequence data
Ding et al 2007

30 Scoring trees
Red = cases, green = controls
Are the case chromosomes significantly overrepresented in some sub-trees?
Mailund et al 2006

31 Scoring trees
We can place “mutations” on the tree edges and partition chromosomes into “mutants” and “wild-types”...
[Figure labels: mutation, mutants, wild-types]
Mailund et al 2006; Ding et al 2007

32 Scoring trees
...and assign different risks based on the implied genotypes
[Figure labels: mutants, wild-types]
Likelihoods: haploid data, null model, diploid data [formulas not preserved]
Mailund et al 2006; Ding et al 2007

33 Scoring trees
Using an uninformative Beta prior, Beta(1,1), we can integrate the risk parameters out
[Figure labels: mutants, wild-types]
Marginal likelihoods: haploid data, null model, diploid data [formulas not preserved]
Mailund et al 2006; Ding et al 2007; Balding 2006; Waldron et al 2006
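Integrating a risk parameter out under a Beta(1,1) prior has a closed form. The sketch below gives the standard result for a single group of chromosomes that are each independently a case with probability p; it is not a reconstruction of the slide's formulas, and the function name `log_marginal` is hypothetical.

```python
from math import exp, lgamma


def log_marginal(cases, controls):
    """Log of the integral of p^cases * (1 - p)^controls over p in [0, 1].

    With a uniform Beta(1,1) prior on the risk p, this integral is the Beta
    function B(cases + 1, controls + 1) = cases! * controls! / (cases + controls + 1)!.
    Working on the log scale avoids underflow for large samples.
    """
    return lgamma(cases + 1) + lgamma(controls + 1) - lgamma(cases + controls + 2)


# One case and one control: 1! * 1! / 3! = 1/6
small = exp(log_marginal(1, 1))
```

For sample sizes such as 500 cases and 500 controls the marginal likelihood itself is far below floating-point range, which is why the function returns its logarithm.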

34 Scoring trees
For the tree, we take the mean score over all edges. The score is the Bayes factor of the tree likelihood vs the null model likelihood.
[Figure labels: mutants, wild-types]
Null model, tree model, score [formulas not preserved]
Mailund et al 2006; Ding et al 2007
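The scoring idea can be sketched for haploid data as follows. The assumptions are mine, not taken from the slide: one risk parameter for mutants and one for wild-types under the tree model, a single shared risk under the null model, and a Beta(1,1) prior on each. The names `edge_bayes_factor` and `tree_score` are hypothetical.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def edge_bayes_factor(mut_cases, mut_controls, wt_cases, wt_controls):
    """Bayes factor for one edge: separate risks for mutants and wild-types
    versus one shared risk, with Beta(1,1) priors integrated out."""
    log_tree = (log_beta(mut_cases + 1, mut_controls + 1)
                + log_beta(wt_cases + 1, wt_controls + 1))
    log_null = log_beta(mut_cases + wt_cases + 1, mut_controls + wt_controls + 1)
    return exp(log_tree - log_null)


def tree_score(is_case, clades):
    """Mean edge Bayes factor; each clade is the set of chromosome indices
    below one edge, i.e. the implied mutants for a mutation on that edge."""
    n_cases = sum(is_case)
    n_controls = len(is_case) - n_cases
    factors = []
    for clade in clades:
        mc = sum(is_case[i] for i in clade)  # cases among mutants
        mk = len(clade) - mc                 # controls among mutants
        factors.append(edge_bayes_factor(mc, mk, n_cases - mc, n_controls - mk))
    return sum(factors) / len(factors)


# Four chromosomes: 0 and 1 are cases, 2 and 3 are controls.
status = [True, True, False, False]
score = tree_score(status, [{0, 1}, {0, 2}])
```

Because the null likelihood is the same for every edge, averaging the per-edge Bayes factors equals dividing the mean edge likelihood by the null likelihood. In the toy example the edge separating cases from controls contributes 10/3 and the uninformative edge 5/6, for a mean of 25/12.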

35 Scoring trees
This generalises to several mutations (more complicated implied genotypes; computationally slower)
Through Bayes factors we can test for the number of mutations.
Mailund et al 2006; Ding et al 2007

36 Fine mapping example...
- 500 cases / 500 controls
- 100 SNPs on 100 kbp
- 2 mutations at same locus with same risk
- P(case|aa) = 5%; GRR = 2

37 Fine mapping example...

38

39 Localization accuracy
1 causal mutation
Max BF / min p-val used as point estimate

40 Localization accuracy
2 causal mutations
Max BF / min p-val used as point estimate

41 Power analysis
1 causal mutation
10 SNPs on 100 kbp
500 cases / 500 controls

42 Power analysis
2 causal mutations
10 SNPs on 100 kbp
500 cases / 500 controls

43 Power analysis
1 causal mutation
10 SNPs on 100 kbp
500 cases / 500 controls

44 Power analysis
2 causal mutations
10 SNPs on 100 kbp
500 cases / 500 controls

47 Implementation freely available
Homepage: www.birc.au.dk/~mailund/Blossoc
Command line and graphical user interface...
Current version supports only phased data
Unphased data in version 1.2.0 (expected in a few weeks)

49 Computational demands
Fast enough for genome wide association studies:
- 300K SNPs / 500 cases / 500 controls < 12 hours
- But depends somewhat on various parameters
  - Size of trees, number of mutations,...
Uses I/O efficient binary file format
- Low disk and RAM requirements
- http://www.birc.au.dk/~mailund/SNPfile/
This format from version 2.0 (expected spring 2007)

50 The end
Thank you!
More at http://www.birc.au.dk/~mailund/association-mapping/

51 References
A cladistic analysis of phenotypic associations with haplotypes inferred from restriction endonuclease mapping – A.R. Templeton, E. Boerwinkle, and C.F. Sing; Genetics 117 343-351 1987
Gene genealogies and the coalescent process – R.R. Hudson; Oxford Surveys in Evolutionary Biology 7 1-44 1990
Ancestral inference from samples of DNA sequences with recombination – R.C. Griffiths and P. Marjoram; J Comput Biol 3:4 479-502 1996
Gene mapping via the ancestral recombination graph – F. Larribe, S. Lessard, and N.J. Schork; Theor Popul Biol 62:2 215-229 2002
Gene genealogies, variation, and evolution – J. Hein, M.H. Schierup, and C. Wiuf; Oxford University Press 2005
Coalescent-based association mapping and fine mapping of complex trait loci – S. Zöllner and J.K. Pritchard; Genetics 169:2 1071-1092 2005
Fine mapping of disease genes via haplotype clustering – E.R.B. Waldron, J.C. Whittaker, and D.J. Balding; Genet Epidemiol 30:2 170-179 2006
Whole genome association mapping by incompatibilities and local perfect phylogenies – T. Mailund, S. Besenbacher, and M.H. Schierup; BMC Bioinformatics 7:454 2006
TreeDT: Tree pattern mining for gene mapping – P. Sevon, H. Toivonen, and V. Ollikainen; IEEE/ACM Transactions on Computational Biology and Bioinformatics 3 174-185 2006
A tutorial on statistical methods for population association studies – D.J. Balding; Nat Rev Genet 7:10 781-791 2006
Using unphased perfect phylogenies for efficient whole-genome association mapping – Z. Ding, T. Mailund, and Y.S. Song; In preparation 2007

