SNP Haplotype Block Partition and tagSNP Finding


SNP Haplotype Block Partition and tagSNP Finding
Speaker: 孫嘉璘
Adviser: 楊昌彪

Outline
• Introduction to SNP
• Combinatorial problems arising from SNP
• Haplotype block forming
• Characteristics of blocks
• tagSNP finding and block partition strategies
• Conclusion

SNP (Single Nucleotide Polymorphism)
• Point mutation
• Occurs about once every 800 bp and is more stable than other markers

Measurement of the Variance within DNA Sequences
• DNA pooling method in a case-control design: ΔAIP (Allele Image Pattern)
• ΔAIP = Diff. / (Diff. + Comm.)

Combinatorial Problems Arising in SNP and Haplotype
• Haplotype Phasing Problem: infer haplotypes from a set of genotypes
• Haplotype Block Detection: detect haplotype blocks in a set of haplotype data
• Finding tagSNPs: use the minimum number of SNP sites to identify all haplotypes

Haplotype Block Forming

Recombination

Characteristics of Blocks
• Haplotype: the pattern of alleles along a single chromosome
• In every block, 2~5 common haplotypes account for 75~90% of the observed haplotypes
• Block length is strongly related to the extent of LD
• Etc.
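
The coverage claim above can be checked on data with a few lines of Python. This is only a sketch: the function name `haplotypes_needed`, the `target` parameter, and the ten toy haplotypes are illustrative assumptions and do not come from the slides.

```python
from collections import Counter

def haplotypes_needed(haps, target=0.9):
    """Number of most-frequent haplotypes whose combined share reaches target."""
    covered = 0
    for rank, (_, count) in enumerate(Counter(haps).most_common(), start=1):
        covered += count
        if covered / len(haps) >= target:
            return rank
    return 0  # only reached for an empty sample

# Toy block (made-up data): two common haplotypes and two rare ones.
block = ["1122"] * 5 + ["2211"] * 3 + ["1212", "2121"]
print(haplotypes_needed(block, target=0.8))  # 2
print(haplotypes_needed(block, target=0.9))  # 3
```

In this toy block, two haplotypes already cover 80% of the sample, which is the kind of pattern the slide describes.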

Cont. Regions with low levels of LD would require a denser SNP map to detect association than regions where LD is conserved over large physical distances.

Hardness of Finding tagSNP
• We are concerned with how many tagSNPs are required to tag a given set of haplotypes
• This question corresponds to the MINIMUM TEST COLLECTION problem (haplotypes are the items, SNP sites are the tests), which shows that the problem is NP-complete [2]
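
To make the tagging question concrete, here is a minimal exhaustive-search sketch. It assumes haplotypes are given as equal-length strings of allele symbols; the names `distinguishes` and `min_tag_snps` are hypothetical. Because the problem is NP-complete, this brute force is only practical for a handful of sites.

```python
from itertools import combinations

def distinguishes(haplotypes, sites):
    """True if the chosen sites give every distinct haplotype a unique pattern."""
    distinct = set(haplotypes)
    projected = {tuple(h[s] for s in sites) for h in distinct}
    return len(projected) == len(distinct)

def min_tag_snps(haplotypes):
    """Smallest set of site indices separating all distinct haplotypes."""
    n_sites = len(haplotypes[0])
    # Try subsets in order of increasing size; the full site set always works,
    # so the search is guaranteed to return.
    for size in range(n_sites + 1):
        for sites in combinations(range(n_sites), size):
            if distinguishes(haplotypes, sites):
                return list(sites)

# Made-up example: sites 0 and 1 are identical, as are sites 2 and 3.
haps = ["0000", "0011", "1100", "1111"]
print(min_tag_snps(haps))  # [0, 2]
```

Sites 0 and 1 carry the same information, so only one of them is needed, and likewise for sites 2 and 3.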

tagSNP Finding and Block Partition Strategies
• The Best Enumeration of SNP Tags (BEST) algorithm uses the concept of derivation [3]
• The dynamic programming method of Zhang et al. for block partition is based on the concept of coverage [4]
• We propose to use the concept of entropy to address this problem

BEST
• Apply the Boolean dependency to the initial H1
• Run the process recursively, drawing out the max ci into H1 and forming
  1. H2i = H1 ∪ ci
  2. C2i = S \ (H2i ∪ D2i)
• Repeat until set C is empty

Cont.

Zhang’s Dynamic Programming Method
• K haplotypes, each of length n (SNP sites)
• r_i(k) = 0, 1 or 2 (i = 1…n): the value of the i-th SNP in the k-th haplotype
• block(r_i, …, r_j) = 1 if at least x% of the unambiguous haplotypes occur more than once
• f(r_i, …, r_j): the minimum number of tagSNPs required for r_i … r_j within a block to distinguish at least x% of the unambiguous haplotypes

Cont.
• S_j: the minimum number of tagSNPs required for the first j SNPs
• S_j = min{ S_{i-1} + f(r_i, …, r_j) : 1 ≤ i ≤ j and block(r_i, …, r_j) = 1 }
• C_j: the minimum number of blocks among partitions of the first j SNPs that use S_j tagSNPs
• C_j = min{ C_{i-1} + 1 : 1 ≤ i ≤ j, block(r_i, …, r_j) = 1 and S_j = S_{i-1} + f(r_i, …, r_j) }
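
The two recurrences can be sketched in Python as follows. Several simplifications are assumed here and are not taken from the slides: there is no missing data, haplotypes are strings over the alleles 1 and 2, a segment counts as a block when at least a fraction `alpha` of the haplotypes show a pattern that appears more than once (the criterion used by Zhang et al. [4]), `f` is found by exhaustive search over the repeated patterns only, and the base cases are S_0 = C_0 = 0. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def common_patterns(haps, i, j):
    """Patterns on sites i..j (0-based, inclusive) that occur more than once."""
    counts = Counter(h[i:j + 1] for h in haps)
    return {p: c for p, c in counts.items() if c > 1}

def is_block(haps, i, j, alpha):
    """block(r_i..r_j): a fraction >= alpha of haplotypes show a repeated pattern."""
    return sum(common_patterns(haps, i, j).values()) >= alpha * len(haps)

def f(haps, i, j):
    """Fewest sites in i..j that tell the repeated patterns apart (brute force)."""
    patterns = list(common_patterns(haps, i, j))
    width = j - i + 1
    for size in range(width + 1):
        for sites in combinations(range(width), size):
            if len({tuple(p[s] for s in sites) for p in patterns}) == len(patterns):
                return size
    return width

def partition(haps, alpha=0.8):
    """Return (S_n, C_n): minimum tagSNPs, then fewest blocks attaining it."""
    n = len(haps[0])
    inf = float("inf")
    S = [0] + [inf] * n  # S[j]: min tagSNPs for the first j SNPs (S[0] = 0 assumed)
    C = [0] + [inf] * n  # C[j]: min blocks among partitions attaining S[j]
    for j in range(1, n + 1):
        for i in range(1, j + 1):
            if S[i - 1] == inf or not is_block(haps, i - 1, j - 1, alpha):
                continue
            cand = (S[i - 1] + f(haps, i - 1, j - 1), C[i - 1] + 1)
            if cand < (S[j], C[j]):  # lexicographic: tagSNPs first, then blocks
                S[j], C[j] = cand
    return S[n], C[n]

# Made-up data: four haplotype patterns, each observed twice.
haps = ["1111", "1111", "1122", "1122", "2211", "2211", "2222", "2222"]
print(partition(haps))  # (2, 1)
```

Comparing `(tagSNPs, blocks)` tuples lexicographically applies both recurrences in one pass: the block count is only used to break ties among partitions with the same number of tagSNPs.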

Drawbacks in Recent Methods
• BEST: minimum tagSNP coverage: yes; block structure: no
• DP: minimum tagSNP coverage: yes; block structure: yes
• The DP block definition does not match the observed biology (2~5 common haplotypes per block)

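Entropy Concept
• Entropy represents the degree of variation in the data
• Formula: E(s_i) = −Σ_a p_a log2 p_a, where p_a is the frequency of allele a at SNP site s_i
• E(s1) = −0.8 log2 0.8 − 0.2 log2 0.2 = 0.7219
• E(s2) = −0.4 log2 0.4 − 0.6 log2 0.6 = 0.9710
• E(s3) = −0.1 log2 0.1 − 0.9 log2 0.9 = 0.4690
• E(s4) = −1.0 log2 1.0 = 0
• E(s5) = −0.1 log2 0.1 − 0.9 log2 0.9 = 0.4690
• With base-10 logarithms the same sums are 0.2173, 0.2923, 0.1412, 0 and 0.1412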
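
The per-site values can be reproduced with a short Python sketch. The function name `site_entropy` and the ten-allele toy column are assumptions for illustration. The `base` argument matters: for frequencies 0.8 and 0.2, base 2 gives about 0.7219, while base 10 gives about 0.2173.

```python
import math
from collections import Counter

def site_entropy(column, base=2):
    """Shannon entropy of the allele frequencies at one SNP site."""
    n = len(column)
    return sum(-(c / n) * math.log(c / n, base) for c in Counter(column).values())

# Toy column: allele 1 at frequency 0.8, allele 2 at frequency 0.2.
s1 = "1111111122"
print(round(site_entropy(s1), 4))           # 0.7219
print(round(site_entropy(s1, base=10), 4))  # 0.2173
print(site_entropy("1111111111"))           # 0.0 for a site with a single allele
```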

Entropy Concept Method
• Joint entropy (chain rule): E(s1, s2, s3, s4, s5) = E(s1) + E(s2|s1) + E(s3|s2, s1) + E(s4|s3, s2, s1) + E(s5|s4, s3, s2, s1)
• Among partitions that satisfy the biological block criterion (2~5 common haplotypes per block), choose the one with maximum entropy
• Entropy is intended to indicate the number of tagSNPs needed in a block (work in progress)
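
The chain-rule decomposition can be checked numerically with the sketch below. It assumes haplotypes are equal-length strings, and the names `joint_entropy` and `conditional_entropy` are hypothetical. Each conditional term is computed as a difference of joint entropies, so the terms sum to the joint entropy by construction.

```python
import math
from collections import Counter

def joint_entropy(haps, sites):
    """Entropy (bits) of the joint allele pattern over the given site indices."""
    counts = Counter(tuple(h[s] for s in sites) for h in haps)
    n = len(haps)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

def conditional_entropy(haps, site, given):
    """E(s_site | given) = E(given and site together) - E(given)."""
    return joint_entropy(haps, given + [site]) - joint_entropy(haps, given)

# Made-up data: site 1 copies site 0, site 2 varies independently.
haps = ["111", "112", "221", "222"]
terms = [conditional_entropy(haps, s, list(range(s))) for s in range(3)]
print(terms)                            # [1.0, 0.0, 1.0]
print(joint_entropy(haps, [0, 1, 2]))   # 2.0
```

In this toy example the redundant site contributes zero conditional entropy, and the two informative sites are exactly the ones needed to tell the four haplotypes apart. This is consistent with the idea of relating entropy to the number of tagSNPs, but it is a single example and not a general result.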

References
[1] Bjarni V. Halldorsson et al., Combinatorial problems arising in SNP and haplotype analysis, Proc. DMTCS Conference (2003)
[2] Carsten Wiuf et al., Some notes on the combinatorial properties of haplotype tagging, Math. Biosci., Vol. 185 (2003)
[3] Paola Sebastiani et al., Minimal haplotype tagging, PNAS, Vol. 100, No. 17 (2003)
[4] Zhang et al., A dynamic programming algorithm for haplotype block partitioning, PNAS, Vol. 99, No. 11 (2002)

Thank You