Linear Reduction Method for Tag SNPs Selection Jingwu He Alex Zelikovsky.


1 Linear Reduction Method for Tag SNPs Selection. Jingwu He, Alex Zelikovsky

2 Outline: SNPs, haplotypes and genotypes; Haplotype tagging problem; Linear reduction method for tagging; Maximizing tagging separability; Conclusions & future work

3 Outline: SNPs, haplotypes and genotypes; Haplotype tagging problem; Linear reduction method for tagging; Maximizing tagging separability; Conclusions & future work

4 Human Genome and SNPs
Length of the human genome ≈ 3 × 10^9 base pairs
Difference between any two people ≈ 0.1% of the genome ≈ 3 × 10^6 SNPs
Total number of single nucleotide polymorphisms (SNPs) ≈ 1 × 10^7
SNPs are mostly bi-allelic, e.g., alleles A and C
Minor allele frequency should be considerable, e.g., > 1%
Diploid = two different copies of each chromosome
Haplotype = description of a single copy (0, 1)
Genotype = description of the mixed two copies (0 = 00, 1 = 11, 2 = 01)
[Figure: two haplotypes per individual and the resulting genotype for the individual]
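The genotype encoding on this slide (0 = 00, 1 = 11, 2 = 01) can be illustrated with a minimal sketch. The function name `genotype` and the example haplotypes are hypothetical and are not taken from the slide's figure.

```python
def genotype(h1, h2):
    """Combine two haplotypes (0/1 per SNP) into one genotype.

    Encoding from the slide: 0 = both copies 0, 1 = both copies 1,
    2 = the two copies differ (heterozygous site).
    """
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same SNPs")
    return [a if a == b else 2 for a, b in zip(h1, h2)]


# Hypothetical pair of haplotypes for one individual.
example = genotype([0, 1, 1, 0], [0, 1, 0, 1])
```

Note that the mapping is not invertible: a genotype with several 2s is consistent with more than one pair of haplotypes.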

5 Haplotype and Disease Association
Haplotypes/genotypes define our individuality
Genetically engineered athletes might win at the Beijing Olympics (Time, 07/2004)
Haplotypes contribute to risk factors of complex diseases (e.g., diabetes)
International HapMap project: http://www.hapmap.org
– Disease-causing SNPs are hidden among 10 million SNPs
– Searching all of them is too expensive
– HapMap tries to identify 1 million tag SNPs that provide almost as much mapping information as the entire 10 million SNPs

6 Outline: SNPs, haplotypes and genotypes; Haplotype tagging problem; Linear reduction method for tagging; Maximizing tagging separability; Conclusions & future work

7 Tagging Reduces Cost
Decrease SNP haplotyping cost:
– sequence only a small number of SNPs (the tag SNPs)
– infer the rest of the (certain) SNPs from the sequenced tag SNPs
Number of SNPs: m; number of tags: k
Cost-saving ratio = m / k (infinite population)
Traditional tagging based on linkage disequilibrium (LD) needs too many SNPs, so its cost-saving ratio is too small (≈ 2)
Proposed linear reduction method: cost-saving ratio ≈ 20

8 Haplotype Tagging Problem
Given the full pattern of all SNPs for a sample
Find the minimum number of tag SNPs that allow reconstructing the complete haplotype of each individual

9 Outline: SNPs, haplotypes and genotypes; Haplotype tagging problem; Linear reduction method for tagging; Maximizing tagging separability; Conclusions & future work

10 Linear Rank of Recombinations
Human haplotype evolution =
– Mutations: introduce SNPs
– Recombinations: propagate SNPs over the entire population
Replace notation (0, 1) with (–1, 1)
Theorem: A haplotype population generated from l haplotypes with recombinations at k spots has linear rank (l – 1)(k + 2)
This is much less than the number of all haplotypes = l^k
Conclusion: use only linearly independent SNPs as tags
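The low-rank claim can be checked numerically under one possible reading of the model. The sketch below is an assumption rather than the authors' construction: it picks l random founder haplotypes in (–1, 1) notation, cuts them at k fixed spots into k + 1 segments, and lets each segment come from any founder (which enumerates l^(k+1) haplotypes here). All names and parameter values are hypothetical.

```python
import itertools
import random
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(rows[0])):
        # Look for a pivot in column c at or below row r.
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            if rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def recombinant_population(founders, spots):
    """All haplotypes whose segments (split at `spots`) each copy one founder."""
    m = len(founders[0])
    bounds = [0] + list(spots) + [m]
    segments = list(zip(bounds[:-1], bounds[1:]))
    population = []
    for choice in itertools.product(range(len(founders)), repeat=len(segments)):
        hap = []
        for f, (start, end) in zip(choice, segments):
            hap.extend(founders[f][start:end])
        population.append(hap)
    return population


rng = random.Random(0)
l, k, m = 4, 3, 40  # founders, recombination spots, SNP sites (all hypothetical)
founders = [[rng.choice((-1, 1)) for _ in range(m)] for _ in range(l)]
population = recombinant_population(founders, spots=[10, 20, 30])
observed_rank = rank(population)
bound = (l - 1) * (k + 2)
print(len(population), observed_rank, bound)
```

In this reading the population has hundreds of haplotypes, yet its rank stays within the (l – 1)(k + 2) bound, which is the gap the tagging method exploits.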

11 Tag SNPs Selection
Tag selection algorithm:
– Using Gauss-Jordan elimination, find the row reduced echelon form (RREF) X of the sample matrix S
– Extract the basis T of sample S
– Factorize sample S = T × X
– Output the set of tags T
Fact: In the sample, each SNP is a linear combination of tag SNPs
Conjecture: In the entire population, each SNP is the same linear combination of tags as in the sample
[Figure: sample S = tags T × RREF X]
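The selection steps above can be sketched as follows. This is a minimal illustration, assuming rows of S are sampled haplotypes and columns are SNPs; the pivot columns of the RREF are taken as the tags. The helper names (`rref`, `select_tags`, `matmul`) and the tiny sample matrix are hypothetical.

```python
from fractions import Fraction


def rref(matrix):
    """Gauss-Jordan elimination: return (nonzero RREF rows, pivot columns)."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    pivots = []
    r = 0
    for c in range(len(rows[0])):
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        pivot_value = rows[r][c]
        rows[r] = [v / pivot_value for v in rows[r]]
        # Clear column c in every other row.
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == len(rows):
            break
    return rows[:r], pivots


def select_tags(sample):
    """Return (tag column indices, T, X) such that sample = T x X."""
    X, tags = rref(sample)
    T = [[row[c] for c in tags] for row in sample]
    return tags, T, X


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Hypothetical sample: 4 haplotypes x 5 SNPs in (-1, 1) notation.
S = [
    [1, 1, -1, 1, -1],
    [1, -1, -1, -1, 1],
    [-1, 1, 1, 1, -1],
    [-1, -1, 1, -1, 1],
]
tags, T, X = select_tags(S)
```

Exact rational arithmetic is used here to avoid tolerance choices; a floating-point implementation would need a threshold for deciding when an entry counts as zero.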

12 Haplotype Reconstruction
– Given the tags t of an unknown haplotype h and the RREF X of the sample matrix S
– Find the unknown haplotype h
– Predict h’ = t × X
– There may be errors, since the predicted h’ may not equal the unknown haplotype h; we assign –1 if a predicted value is negative and +1 otherwise (RLRP)
– Variant: randomly reshuffle SNPs before choosing tags (RLR)
[Figure: predicted haplotype h’ = tags of unknown haplotype h × RREF X]
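The prediction step can be sketched as a vector-matrix product followed by the sign rule stated above. The function name `reconstruct` and the matrix `X_hypothetical` are illustrative assumptions, not data from the presentation.

```python
def reconstruct(tag_values, X):
    """Predict a full haplotype from its tag values: h' = t x X, then take signs.

    Following the slide, a predicted value is mapped to -1 if it is negative
    and to +1 otherwise (so an exact 0 becomes +1).
    """
    raw = [sum(t * x for t, x in zip(tag_values, col)) for col in zip(*X)]
    return [-1 if v < 0 else 1 for v in raw]


# Hypothetical RREF with 2 tags and 4 SNPs; the last column mixes both tags.
X_hypothetical = [
    [1, 0, -1, 0.5],
    [0, 1, 0, 0.5],
]
predicted = reconstruct([1, -1], X_hypothetical)
```

In this example the last SNP evaluates to exactly 0 before the sign rule, which shows why a tie-breaking convention is needed and where prediction errors can arise.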

13 Results for Simulated Data
Cost-saving ratio for 2% error: 3.9 for LR and 13 for RLRP
P = 1000 different haplotypes
m = 25000 sites
Sample size = k (number of tag SNPs) = 50, 100, …, 750

14 Results for Real Data
Cost-saving ratio for 5% error: 2.1 for LR and 2.8 for RLRP
P = 158 different haplotypes (Daly et al.)
m = 103 sites
Sample size = k (number of tag SNPs) = 10, 15, 20, …, 90

15 Outline: SNPs, haplotypes and genotypes; Haplotype tagging problem; Linear reduction method for tagging; Maximizing tagging separability; Conclusions & future work

16 Tag Separability
Correlation between the number of zeros for SNPs in RREF X and the number of errors in the prediction column
Greedy heuristic gives a more separable basis
For 5% error, cost-saving ratio 2.8 vs 3.3 for RLRP

17 Conclusions and Future Work
Our contributions
– new SNP tagging problem formulation
– linear reduction method for SNP tagging
– enhancement of linear reduction using a separable basis
Future work
– application of tagging to genotype and haplotype disease association

18 Thank you

