Computational Problems in Perfect Phylogeny Haplotyping: Xor-Genotypes and Tag SNPs
Tamar Barzuza, Jacques S. Beckmann, Ron Shamir, Itsik Pe’er

1 Computational Problems in Perfect Phylogeny Haplotyping: Xor-Genotypes and Tag SNPs
Tamar Barzuza (1), Jacques S. Beckmann (2,3), Ron Shamir (4), Itsik Pe’er (5)
(1) Computer Science and Applied Mathematics, Weizmann Institute of Science
(2) Molecular Genetics, Weizmann Institute of Science
(3) Génétique Médicale, Universitätsspital Lausanne
(4) School of Computer Science, Tel-Aviv University
(5) Medical and Population Genetics Group, Broad Institute

2 Overview
Introduction
Xor PPH
Theoretical outlines and results
Experimental results
Informative SNPs
Theoretical results
Summary and Future research

3 Chromosomes

4 SNP: Single nucleotide polymorphism
AATATATCGCTATCCGTATACCTAATTGGGGGTGTGTGTACGTAATGCTAGCACGCGCGCCAGGATTAGCTGCCACA
AATATATCGCTTTCCGTATACCTAATTTGGGGTGTGTGTACGTAATGCTAGCACGCGCGCCAGGATTAGCTGCCACA
AATATATCGCTTTCCGTATACCTAATTTGGGGTGTGTGTACGTACTGCTAGCACGCGCGCCAGGATTAGCTGCCACA
AATATATCGCTATCCGTATACCTAATTTGGGGTGTGTGTACGTACTGCTAGCACGCGCGCTAGGATTAGCTGCCACA
AATATATCGCTATCCGTATACCTAATTGGGGGTGTGTGTACGTACTGCTAGCACGCGCGCTAGGATTAGCTGCCACA

SNP alleles (rows are haplotypes, columns are the four SNPs):
A G A C
T T A C
T T C C
A T C T
A G C T
A G C T

5 SNP alleles (rows are haplotypes, columns are the four SNPs):
A G A C
T T A C
T T C C
A T C T
A G C T
A G C T

6 Haplotypes, Genotypes and XOR-Genotypes
SNP:           1    2    3    4
Haplotypes:    A    G    A    C
               T    T    A    C
Genotype:      A/T  T/G  A    C
XOR-Genotype:  Het  Het  Hom  Hom

Haplotype table (rows are haplotypes, columns are SNPs 1-4), as alleles and in binary:
A G A C    1 1 0 1
T T A C    0 0 0 1
T T C C    0 0 1 1
A T C T    1 0 1 0
A G C T    1 1 1 0
A G C T    1 1 1 0

7 Haplotypes, Genotypes and XOR-Genotypes
Haplotype table (rows are haplotypes, columns are SNPs 1-4), as alleles and in binary:
A G A C    1 1 0 1
T T A C    0 0 0 1
T T C C    0 0 1 1
A T C T    1 0 1 0
A G C T    1 1 1 0
A G C T    1 1 1 0

Haplotypes: 1101, 0001
Genotype: 2 2 0 1
XOR-Genotype: {1, 2}
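The relationship between the three encodings on this slide can be sketched in a few lines of Python. The function names are hypothetical, and the code assumes the convention that the slide's numbers suggest: a genotype entry of 2 marks a heterozygous SNP, and the xor-genotype is the set of heterozygous SNP positions, counted from 1.

```python
def genotype(h1, h2):
    """Genotype of two binary haplotypes: the shared allele where they
    agree, 2 where they differ (heterozygous)."""
    return [a if a == b else 2 for a, b in zip(h1, h2)]


def xor_genotype(h1, h2):
    """Xor-genotype: the set of 1-based SNP positions where the two
    haplotypes differ."""
    return {i for i, (a, b) in enumerate(zip(h1, h2), start=1) if a != b}


# The haplotype pair shown on the slide.
h1 = [1, 1, 0, 1]
h2 = [0, 0, 0, 1]
print(genotype(h1, h2))      # [2, 2, 0, 1]
print(xor_genotype(h1, h2))  # {1, 2}
```

The xor-genotype keeps only which SNPs are heterozygous, so it carries strictly less information than the genotype.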

8 Perfect Phylogeny
Haplotypes (SNPs only): 10000, 10010, 10100, 11000, 10011, 00010
[Figure: perfect phylogeny tree over these haplotypes; each edge is labeled by one SNP mutation: 4: 1 → 0, 1: 1 → 0, 5: 0 → 1, 2: 0 → 1, 3: 0 → 1]
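One way to check that a set of binary haplotypes is compatible with some perfect phylogeny is the standard four-gamete condition: no pair of SNPs may show all four combinations 00, 01, 10 and 11. The slides do not mention this test; the sketch below is an illustration with a hypothetical function name, applied to the six haplotypes listed on this slide.

```python
from itertools import combinations


def passes_four_gamete_test(haplotypes):
    """Return True if no pair of SNP columns contains all four
    gametes 00, 01, 10, 11 (root allele not assumed known)."""
    m = len(haplotypes[0])
    for a, b in combinations(range(m), 2):
        gametes = {(h[a], h[b]) for h in haplotypes}
        if len(gametes) == 4:
            return False
    return True


slide_haplotypes = ["10000", "10010", "10100", "11000", "10011", "00010"]
print(passes_four_gamete_test(slide_haplotypes))          # True
print(passes_four_gamete_test(["00", "01", "10", "11"]))  # False
```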

9 Previous work
Haplotyping: haplotypes from genotypes
Input: Genotypes G = {G_1, …, G_n} on SNPs S = {s_1, …, s_m}
Output: The haplotypes H = {H_1, …, H_2n} that gave rise to G
General heuristics:
  Clark ’90
  Excoffier+Slatkin ’95
PPH: Perfect phylogeny haplotyping (n genotypes, m SNPs):
  Gusfield 2002: O(nm α(n, m)) (via graph realization)
  Bafna et al. 2002: O(nm^2)
  Eskin et al. 2003: O(nm^2)

10 Previous work
The graph realization problem:
Input: A hypergraph H = ({1, …, m}, P), where P = {P_1, P_2, …, P_n} and P_i ⊆ {1, …, m}
Goal: A tree T = (V, E) with E = N such that every P_i labels a path in T
Algorithms:
  Tutte 1959: O(n^2 m)
  Gavril and Tamari 1983: O(nm^2)
  Bixby and Wagner 1988: O(nm α(n, m))
Example:
  Input: { {1, 2}, {2, 3} }
  Output: [Figure: a tree whose edges are labeled 1, 2, 3]
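Solving graph realization takes the algorithms cited above, but verifying a candidate tree is simple. The sketch below is only such a verifier, not a solver. It assumes the tree is given as a mapping from edge label to a pair of endpoint names (a hypothetical representation) and that the input really is a tree.

```python
from collections import Counter


def labels_a_path(tree_edges, labels):
    """True if the edges with the given labels form one simple path.
    In a tree, that holds iff every touched vertex has degree <= 2
    and the k edges touch exactly k + 1 vertices."""
    degree = Counter()
    for lab in labels:
        u, v = tree_edges[lab]
        degree[u] += 1
        degree[v] += 1
    return (bool(labels)
            and max(degree.values()) <= 2
            and len(degree) == len(labels) + 1)


def realizes(tree_edges, hyperedges):
    """True if every hyperedge labels a path in the tree."""
    return all(labels_a_path(tree_edges, p) for p in hyperedges)


hyperedges = [{1, 2}, {2, 3}]
path_tree = {1: ("a", "b"), 2: ("b", "c"), 3: ("c", "d")}   # edges in order 1, 2, 3
bad_tree = {1: ("a", "b"), 3: ("b", "c"), 2: ("c", "d")}    # edges in order 1, 3, 2
print(realizes(path_tree, hyperedges))  # True
print(realizes(bad_tree, hyperedges))   # False
```

A star with three edges also passes this check for the same input, which illustrates why counting the number of distinct realizations matters later in the talk.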

11 Overview
Introduction
Xor PPH
Theoretical outlines and results
Experimental results
Informative SNPs
Theoretical results
Summary and Future research

12 XPPH - Xor perfect phylogeny haplotyping
Xor-haplotyping: haplotypes from xor-genotypes
Input:
1. Xor-genotype data (can be obtained by DHPLC)
2. Three genotypes
Goal: Resolve the haplotypes and their perfect phylogeny

Xor-genotypes   Genotypes
{1, 2}          0/1 0/1 0 1
{2, 4}          0 0/1
{2, 3, 4}       0 0/1 0/1 0/1
{1, 2, 4}       0/1 0/1 0 0/1
{1}             0/1 1 0 0

Haplotypes (pairs): 1101 0101; 1101 0001; 0101 0000; 0111 0000; 1101 0000

13 XPPH - Xor perfect phylogeny haplotyping
Xor-haplotyping: haplotypes from xor-genotypes
Input:
1. Xor-genotype data (can be obtained by DHPLC)
2. Three genotypes
Goal: Resolve the haplotypes and their perfect phylogeny

Xor-genotypes   Genotypes
{1, 2}          0/1 0/1 0 1
{2, 4}          0 0/1
{2, 3, 4}       0 0/1 0/1 0/1
{1, 2, 4}       0/1 0 0/1 0/1
{1}             0/1 1 0 0

Haplotypes: ? ? ? ? ?

14 XPPH - Xor perfect phylogeny haplotyping
Strategy:
1. Input: Xor-genotype data
   Goal: Find the perfect phylogeny
2. Additional input: 3 genotypes
   Goal: Find haplotypes
Step 1:
Xor-genotype = {Het SNPs} = A path in the perfect phylogeny
⇒ Build a tree from its paths ⇒ Graph realization
Input reduction: Merge SNPs that are equivalent in the xor-data
Proof: Unique graph realization solution ⇒ A perfect phylogeny
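The input reduction can be illustrated with a short sketch. It assumes that two SNPs count as equivalent in the xor-data when they occur in exactly the same xor-genotypes; the slide does not spell out the definition, and the function name is hypothetical.

```python
from collections import defaultdict


def merge_equivalent_snps(xor_genotypes):
    """Group SNPs that occur in exactly the same xor-genotypes.
    Returns a sorted list of sorted groups of SNP labels."""
    signature = defaultdict(set)          # SNP -> indices of xor-genotypes containing it
    for i, x in enumerate(xor_genotypes):
        for snp in x:
            signature[snp].add(i)
    groups = defaultdict(list)            # signature -> SNPs sharing it
    for snp, sig in signature.items():
        groups[frozenset(sig)].append(snp)
    return sorted(sorted(g) for g in groups.values())


# SNPs 2 and 3 always appear together, so they collapse into one group.
print(merge_equivalent_snps([{1, 2, 3}, {2, 3}]))  # [[1], [2, 3]]
```

Each merged group can then be treated as a single edge label when the xor-genotypes are passed to graph realization.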

15 GREAL
We implemented Gavril & Tamari’s algorithm (’83) for graph realization: O(m^2 n)
- Find graph realization or determine that none exists
- Count the number of graph realization solutions for the data
- Stable and fast
- Available at http://www.cs.tau.ac.il/~rshamir/greal/
Simulations
- Simulate data of n individuals using Hudson 2002
- Remove all SNPs with <5% minor allele frequency
- Apply GREAL: Is there a single solution?
- Repeat 5000 times for each n
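The filtering step of the simulation can be sketched as follows. This is only an illustration of the 5% minor-allele-frequency cutoff on a binary haplotype matrix; the function name and data layout are assumptions, and neither the simulator nor GREAL itself is called here.

```python
def filter_by_maf(haplotypes, min_maf=0.05):
    """Keep SNP columns whose minor allele frequency is at least min_maf.
    haplotypes: list of equal-length lists of 0/1 alleles.
    Returns (kept column indices, filtered haplotypes)."""
    n = len(haplotypes)
    m = len(haplotypes[0])
    keep = []
    for s in range(m):
        ones = sum(h[s] for h in haplotypes)
        maf = min(ones, n - ones) / n     # frequency of the rarer allele
        if maf >= min_maf:
            keep.append(s)
    return keep, [[h[s] for s in keep] for h in haplotypes]


# Toy data with a higher threshold so the effect is visible on 4 haplotypes.
haps = [[0, 1, 1], [0, 1, 0], [0, 0, 0], [0, 0, 0]]
kept, filtered = filter_by_maf(haps, min_maf=0.3)
print(kept)      # [1]
print(filtered)  # [[1], [1], [0], [0]]
```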

16 Results The percentage of single solutions vs sample size

17 R.H. Chung and D. Gusfield 2003 Results

18 XPPH, Step 2: Perfect phylogeny → ? → Haplotypes
Xor-genotypes: {1, 2}, {1, 3}, {2, 3}
Two candidate haplotype sets (columns are SNPs 1 2 3):
0 0 0        1 0 0
1 1 0        0 1 0
1 0 1        0 0 1
Resolution up to bit flipping: gives the haplotype structure
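The bit-flipping ambiguity can be checked directly: complementing one SNP in every haplotype leaves all xor-genotypes unchanged. The sketch below uses the two 3-SNP haplotype sets from this slide; the helper names are hypothetical.

```python
from itertools import combinations


def xor_genotype(h1, h2):
    """Set of 1-based SNP positions where two haplotypes differ."""
    return {i for i, (a, b) in enumerate(zip(h1, h2), start=1) if a != b}


def flip_snp(haplotypes, snp):
    """Complement the allele at 1-based position snp in every haplotype."""
    return [tuple(1 - a if i == snp else a
                  for i, a in enumerate(h, start=1))
            for h in haplotypes]


def all_xor_genotypes(haplotypes):
    """Xor-genotypes of every pair of haplotypes, in a fixed order."""
    return [xor_genotype(a, b) for a, b in combinations(haplotypes, 2)]


original = [(0, 0, 0), (1, 1, 0), (1, 0, 1)]
flipped = flip_snp(original, 1)
print(flipped)                      # [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
print(all_xor_genotypes(original))  # [{1, 2}, {1, 3}, {2, 3}]
print(all_xor_genotypes(original) == all_xor_genotypes(flipped))  # True
```

This is why xor-data alone fixes only the haplotype structure, and additional genotypes are needed to fix the actual alleles.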

19 XPPH, Step 2: Perfect phylogeny → ? → Haplotypes
Xor-genotypes: {1, 2}, {1, 3}, {2, 3}
Genotype: 1 2 2
Haplotypes: 1 x x, 0 x x
SNP #1 homozygous ⇒ Can infer SNP #1 for all haplotypes
⇒ Need individuals with ∩ xor-genotypes (= ∩ {het SNPs}) = ∅

20 Theorem: ∩ xor-genotypes = ∅ ⇒ there are three xor-genotypes with empty intersection
Proof:
Key point: xor-genotypes are tree paths (otherwise: NP-hard)
(1) The intersection of two tree paths is an interval

21 (Proof) (2) Pick X_1 arbitrarily, take X_1 ∩ X_2, X_1 ∩ X_3, …, X_1 ∩ X_n
[Figure: X_1]

22 [Figure: X_1]

23 (3) X_L ends first, X_R begins last
[Figure: X_1, X_L, X_R]

24 (Proof) (2) Pick X_1 arbitrarily, take X_1 ∩ X_2, X_1 ∩ X_3, …, X_1 ∩ X_n
(3) X_L ends first, X_R begins last
[Figure: X_1, X_L, X_R]

25 (Proof) (2) Pick X_1 arbitrarily, take X_1 ∩ X_2, X_1 ∩ X_3, …, X_1 ∩ X_n
⇒ X_1 ∩ X_L ∩ X_R = ∅
[Figure: X_1, X_L, X_R]

26 Find 3 individuals to genotype in O(nm)
Resolve the haplotypes
[Figure: X_1, X_L, X_R]
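The selection of the three individuals can be sketched by following the proof: fix X_1, view every X_1 ∩ X_i as an interval along X_1, and pick the interval that ends first (X_L) and the one that begins last (X_R). The sketch assumes the order of X_1's SNPs along its tree path is already known, for example from the graph realization, and that each intersection is contiguous in that order. The function name and the return convention are hypothetical.

```python
def find_three_with_empty_intersection(x1_order, xor_genotypes):
    """x1_order: SNPs of X_1 in the order they appear along its tree path.
    xor_genotypes: list of sets, with xor_genotypes[0] == set(x1_order).
    Returns a tuple of at most three indices whose xor-genotypes have an
    empty common intersection, or None if all xor-genotypes share a SNP."""
    pos = {snp: k for k, snp in enumerate(x1_order)}
    best_end, left = len(x1_order) - 1, 0    # interval that ends first (X_L)
    best_start, right = 0, 0                 # interval that begins last (X_R)
    for i, x in enumerate(xor_genotypes):
        hits = [pos[s] for s in x if s in pos]
        if not hits:
            return (0, i)                    # X_1 and X_i are already disjoint
        start, end = min(hits), max(hits)
        if end < best_end:
            best_end, left = end, i
        if start > best_start:
            best_start, right = start, i
    if best_start > best_end:                # X_L ends before X_R begins
        return tuple(sorted({0, left, right}))
    return None


xors = [{1, 2, 3, 4}, {1, 2, 5}, {3, 4, 6}, {2, 3}]
print(find_three_with_empty_intersection([1, 2, 3, 4], xors))  # (0, 1, 2)
```

Each xor-genotype is scanned once, so the running time is linear in the total size of the xor-data, in line with the O(nm) bound on the slide.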

27 Overview
Introduction
Xor PPH
Theoretical outlines and results
Experimental results
Informative SNPs
Theoretical results
Summary and Future research

28 Informative SNPs (Bafna et al. 2003):
Input:
1. Haplotypes H = {H_1, …, H_n} on SNPs S = {s_1, …, s_m}
2. A set of interesting SNPs S'' ⊆ S
Output: Minimal set S' ⊆ S \ S'' that distinguishes the same haplotypes as S''

[Table: Haplotypes 4, 3, 2, 1 by SNPs 1-5; values in extraction order, five per row]
1 0 0 0 0
0 0 1 0 0
0 0 0 1 1
0 1 0 1 0

Not perfect phylogeny: NP-hard (MINIMUM TEST SET)
Perfect phylogeny, 1 interesting SNP: O(nm), Bafna et al. 2003

29 Informative SNPs (generalization of the previous definition):
Input:
1. Haplotypes H = {H_1, …, H_n} on SNPs S = {s_1, …, s_m}
2. A set of interesting SNPs S'' ⊆ S
3. A perfect phylogeny for H
4. A cost function C: S → R+
Output: S' ⊆ S \ S'' with minimal cost that distinguishes the same haplotypes as S''

[Table: Haplotypes 4, 3, 2, 1 by SNPs 1-5; values in extraction order, five per row]
1 0 0 0 0
0 0 1 0 0
0 0 0 1 1
0 1 0 1 0
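To make the problem statement concrete, here is a brute-force sketch that enumerates candidate SNP subsets. It is not the dynamic programming algorithm of the talk and runs in exponential time. It assumes one reading of "distinguishes the same haplotypes": every pair of haplotypes that differs on some interesting SNP must also differ on some chosen SNP. The data, costs and function names are hypothetical, and SNPs are indexed from 0.

```python
from itertools import combinations


def distinguished_pairs(haplotypes, snps):
    """Pairs of haplotype indices that differ on at least one of the SNPs."""
    return {(i, j)
            for i, j in combinations(range(len(haplotypes)), 2)
            if any(haplotypes[i][s] != haplotypes[j][s] for s in snps)}


def min_cost_informative(haplotypes, interesting, cost):
    """Cheapest subset of non-interesting SNPs (0-based indices) that
    distinguishes every pair the interesting SNPs distinguish.
    Returns (set of SNPs, total cost), or None if no subset works."""
    m = len(haplotypes[0])
    target = distinguished_pairs(haplotypes, interesting)
    candidates = [s for s in range(m) if s not in interesting]
    best = None
    for k in range(len(candidates) + 1):
        for subset in combinations(candidates, k):
            if target <= distinguished_pairs(haplotypes, subset):
                c = sum(cost[s] for s in subset)
                if best is None or c < best[1]:
                    best = (set(subset), c)
    return best


# Hypothetical data: SNP 0 has the same column as the interesting SNP 3.
haps = [(0, 0, 0, 0), (1, 0, 0, 1), (1, 1, 0, 1), (0, 0, 1, 0)]
print(min_cost_informative(haps, interesting={3}, cost=[5, 1, 1, 1]))  # ({0}, 5)
```

In this toy input the cheaper SNPs 1 and 2 cannot separate the first two haplotypes, so the more expensive SNP 0 is the only valid choice.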

30
- We find an informative SNPs set
    of minimal cost
    for any number of interesting SNPs
    in O(m)
- By a dynamic programming algorithm that climbs up the perfect phylogeny tree
- We prove that the definition of informative SNPs generalizes to a more practical definition
- Under the perfect phylogeny model, informative SNPs on genotypes and haplotypes are equivalent

31 Summary
Xor-haplotyping:
  Definition
  Resolve haplotypes given xor-data and 3 genotypes in O(nm α(m, n))
  Implementation
  Experimental results
Selection of tag SNPs:
  Generalize to arbitrary cost and many interesting SNPs
  Find optimal informative SNPs set in O(m) time
  Combinatorial observation allows practical uses

32 Future research
Relax the strong assumption of perfect phylogeny
Deal with data errors and missing data
Obtain empirical results for the theoretical work on informative SNPs
  Preliminary results show that blocks of up to 600 SNPs are distinguishable by ~20 informative SNPs

34 Theorem: All genotypes are distinct within a block
Proof: Assume to the contrary that two genotypes are equal:
[Figure: Haplotype Pair 1 and Haplotype Pair 2, shown with Genotype 1 and Genotype 2]

