National Taiwan University Department of Computer Science and Information Engineering Pattern Identification in a Haplotype Block * Kun-Mao Chao Department.


National Taiwan University Department of Computer Science and Information Engineering Pattern Identification in a Haplotype Block * Kun-Mao Chao Department of Computer Science and Information Engineering National Taiwan University, Taiwan * We thank Yao-Ting Huang for his technical assistance.

Genetic Variations
Genetic variations in DNA sequences (e.g., insertions, deletions, and mutations) have a major impact on genetic diseases and phenotypic differences.
- All humans share more than 99% of their DNA sequence.
- A genetic variation in a coding region may change a codon and thereby alter the amino acid sequence.

Single Nucleotide Polymorphism
A Single Nucleotide Polymorphism (SNP), pronounced "snip," is a genetic variation in which a single nucleotide (i.e., A, T, C, or G) is altered and the change is passed on through heredity.
- SNP: a single-base DNA variation found in at least 1% of the population.
- Mutation: a single-base DNA variation found in less than 1% of the population.
Example (SNP): C T T A G C T T (94%) vs. C T T A G T T T (6%)
Example (mutation): C T T A G C T T (99.9%) vs. C T T A G T T T (0.1%)
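The 1% rule above can be written as a small check. This is a minimal sketch; the function name `classify_variant` and its `threshold` parameter are illustrative and do not come from the slides.

```python
def classify_variant(minor_allele_frequency: float, threshold: float = 0.01) -> str:
    """Label a single-base variation using the 1% rule from the slide.

    The name and signature are hypothetical, chosen only for illustration.
    """
    if not 0.0 <= minor_allele_frequency <= 0.5:
        raise ValueError("minor allele frequency must lie in [0, 0.5]")
    # At or above the threshold the variation counts as a SNP, otherwise as a mutation.
    return "SNP" if minor_allele_frequency >= threshold else "mutation"


print(classify_variant(0.06))   # the 94% / 6% example
print(classify_variant(0.001))  # the 99.9% / 0.1% example
```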

Mutations and SNPs
[Figure: a timeline from a common ancestor to the present; the observed genetic variations are labeled as mutations and SNPs.]

Single Nucleotide Polymorphism
SNPs are the most frequent form of genetic variation.
- About 90% of human genetic variation comes from SNPs.
- SNPs occur about every 300 to 600 base pairs.
- Millions of SNPs have been identified (e.g., by HapMap and Perlegen).
SNPs have become the preferred markers for association studies because they are abundant and can be genotyped with high-throughput technologies.

Single Nucleotide Polymorphism
A SNP is usually assumed to be a binary variable.
- The probability of a repeated mutation at the same SNP locus is quite small.
- Tri-allelic cases are usually attributed to genotyping errors.
The nucleotide at a SNP locus is called
- the major allele if its allele frequency is above 50%, or
- the minor allele if its allele frequency is below 50%.
Example: A C T T A G C T T (94%) vs. A C T T A G C T C (6%). At the last position, T is the major allele and C is the minor allele.
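As a rough illustration of the major/minor distinction, the sketch below counts the alleles observed at one locus. The helper `major_minor` and the sample of 100 observations are assumptions made for the example; they mirror the 94% / 6% figure but are not data from the slides.

```python
from collections import Counter


def major_minor(alleles):
    """Return (major allele, its frequency, minor allele, its frequency).

    Hypothetical helper: assumes a bi-allelic locus, as the slide does.
    """
    counts = Counter(alleles)
    if len(counts) != 2:
        raise ValueError("expected exactly two alleles at a SNP locus")
    # most_common sorts by count, so the first entry is the major allele.
    (major, n_major), (minor, n_minor) = counts.most_common(2)
    total = n_major + n_minor
    return major, n_major / total, minor, n_minor / total


# Illustrative sample: 94 chromosomes carry T, 6 carry C.
sample = ["T"] * 94 + ["C"] * 6
print(major_minor(sample))
```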

Haplotypes
A haplotype is a set of linked SNPs on the same chromosome.
- Since each SNP is binary, a haplotype can be treated simply as a binary string.
Example (SNP 1, SNP 2, and SNP 3 are the 2nd, 5th, and 9th bases shown):
  Sequence               SNP 1  SNP 2  SNP 3
  -A C T T A G C T T-    C      A      T      (Haplotype 1)
  -A A T T T G C T C-    A      T      C      (Haplotype 2)
  -A C T T T G C T C-    C      T      C      (Haplotype 3)
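The haplotype example can be reproduced in a few lines. This is a sketch only: the 0-based positions are read off the three sequences on the slide, and the binary encoding (0 for the allele carried by the first haplotype, 1 for the other allele) is an arbitrary convention assumed here, not one stated in the slides.

```python
# The three sequences from the slide, without the flanking dashes.
sequences = ["ACTTAGCTT", "AATTTGCTC", "ACTTTGCTC"]
# 0-based indices of SNP 1, SNP 2, SNP 3 (the 2nd, 5th, and 9th bases).
snp_positions = [1, 4, 8]


def extract_haplotypes(seqs, positions):
    """Keep only the bases at the SNP loci."""
    return ["".join(seq[p] for p in positions) for seq in seqs]


def to_binary(haplotypes):
    """Encode each locus as 0/1 relative to the first haplotype (assumed convention)."""
    reference = haplotypes[0]
    return ["".join("0" if a == r else "1" for a, r in zip(h, reference))
            for h in haplotypes]


haps = extract_haplotypes(sequences, snp_positions)
print(haps)             # ['CAT', 'ATC', 'CTC']
print(to_binary(haps))  # ['000', '111', '011']
```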

Tag SNP Selection
[Overview diagram]
- Haplotype Inference: Maximum Parsimony, Perfect Phylogeny, Statistical Methods
- Tag SNP Selection: Haplotype block, LD bin, Prediction Accuracy
- SNP Database
- …

Problems of Using SNPs for Association Studies
The number of SNPs is too large for all of them to be used in association studies.
- There are millions of SNPs in the human genome.
- To reduce the SNP genotyping cost, we wish to use as few SNPs as possible.
Tag SNPs are a small subset of SNPs that suffices for association studies without losing the power of using all SNPs.
- We first study the definition of tag SNPs based on the haplotype-block model.

Haplotype Blocks and Tag SNPs
Some studies have shown that a chromosome can be partitioned into haplotype blocks separated by recombination hotspots.
- Within a haplotype block, little or no recombination has occurred.
- The SNPs within a haplotype block tend to be inherited together.
Within a haplotype block, a small subset of SNPs (called tag SNPs) is sufficient to distinguish each pair of haplotype patterns in the block.
- We only need to genotype the tag SNPs instead of all SNPs within the block.

Recombination Hotspots and Haplotype Blocks
[Figure: a chromosome partitioned into haplotype blocks separated by recombination hotspots. Within a block, haplotype patterns P1 to P4 are shown over SNP loci S1 to S12, with each cell marked as a major or minor allele.]

A Haplotype Block Example
Patil et al. (Science, 2001) partitioned human chromosome 21 into 4,135 haplotype blocks over 24,047 SNPs.
[Figure legend: blue box = major allele; yellow box = minor allele.]

Examples of Tag SNPs
Suppose we wish to identify an unknown haplotype sample. We can genotype all SNPs to identify it.
[Figure: haplotype patterns P1 to P4 over SNP loci S1 to S12, together with an unknown haplotype sample; each cell is marked as a major or minor allele.]

Examples of Tag SNPs
In fact, it is not necessary to genotype all SNPs. SNPs S3, S4, and S5 can form a set of tag SNPs.
[Figure: haplotype patterns P1 to P4 over SNP loci S1 to S12, and the same patterns restricted to S3, S4, and S5.]

Examples of Wrong Tag SNPs
SNPs S1, S2, and S3 cannot form a set of tag SNPs because P1 and P4 are identical at these three loci and therefore cannot be told apart.
[Figure: haplotype patterns P1 to P4 over SNP loci S1 to S12, and the same patterns restricted to S1, S2, and S3.]

Examples of Tag SNPs
SNPs S1 and S12 can form a set of tag SNPs. This set is a minimum solution in this example: a single binary SNP can separate at most two patterns, so four patterns need at least two SNPs.
[Figure: haplotype patterns P1 to P4 over SNP loci S1 to S12, and the same patterns restricted to S1 and S12.]

Problems of Finding Tag SNPs
The problem of finding a minimum set of tag SNPs is known to be NP-hard.
- It is an instance of the minimum test set problem.
- A number of methods have been proposed to find a minimum set of tag SNPs.
Here we illustrate how to recast the tag SNP selection problem as the set cover problem and as an integer linear programming problem.

Problem Formulation
Given h patterns, we have h(h-1)/2 pairs of patterns. For h = 4 these are (1,2), (1,3), (1,4), (2,3), (2,4), and (3,4).
The relation between SNPs and pairs of haplotype patterns can be formulated as a bipartite graph in which a SNP is connected to every pair it distinguishes.
- S1 can distinguish (P1, P3), (P1, P4), (P2, P3), and (P2, P4).
- S2 can distinguish (P1, P4), (P2, P4), and (P3, P4).
[Figure: haplotype patterns P1 to P4 over SNPs S1 to S4, and the bipartite graph between the SNPs and the six pattern pairs.]
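The bipartite relation can be computed directly from a pattern matrix. The figure with the actual alleles is not preserved in this transcript, so the 0/1 strings below are a hypothetical matrix chosen to reproduce the distinguishing relations listed on the slides; the function name is likewise illustrative.

```python
from itertools import combinations

# Hypothetical 0/1 patterns over S1..S4 (assumed; the original figure is lost).
patterns = {
    "P1": "0000",
    "P2": "0011",
    "P3": "1010",
    "P4": "1101",
}
snps = ["S1", "S2", "S3", "S4"]


def distinguishing_sets(patterns, snps):
    """Map each pair of patterns to the set of SNPs at which the two differ."""
    result = {}
    for (a, pa), (b, pb) in combinations(patterns.items(), 2):
        result[(a, b)] = {s for s, x, y in zip(snps, pa, pb) if x != y}
    return result


D = distinguishing_sets(patterns, snps)
for pair, snp_set in D.items():
    print(pair, sorted(snp_set))

# The edges of one SNP in the bipartite graph: the pairs that S1 distinguishes.
print([pair for pair, snp_set in D.items() if "S1" in snp_set])
```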

Set Cover
A set of SNPs can form a set of tag SNPs if each pair of patterns is connected to at least one of the selected SNPs by an edge.
- e.g., S1 and S3 can form a set of tag SNPs.
- e.g., S1 and S2 cannot, because neither of them distinguishes the pair (1,2).
[Figure: the bipartite graph between the SNPs and the six pattern pairs, with the edges of the selected SNPs highlighted.]
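The covering condition is easy to test in code. The sketch below uses the distinguishing sets given for this four-pattern example; the function name `is_tag_set` is an assumption made for illustration.

```python
# Distinguishing sets of the four-pattern example: pair (j, k) -> SNPs that separate Pj and Pk.
D = {
    (1, 2): {"S3", "S4"},
    (1, 3): {"S1", "S3"},
    (1, 4): {"S1", "S2", "S4"},
    (2, 3): {"S1", "S4"},
    (2, 4): {"S1", "S2", "S3"},
    (3, 4): {"S2", "S3", "S4"},
}


def is_tag_set(selected, D):
    """True if every pair of patterns is distinguished by at least one selected SNP."""
    selected = set(selected)
    return all(D[pair] & selected for pair in D)


print(is_tag_set({"S1", "S3"}, D))  # True
print(is_tag_set({"S1", "S2"}, D))  # False: pair (1, 2) stays uncovered
```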

A Greedy Algorithm
[Figure: the greedy algorithm on the bipartite graph of the example. The edges of S1 and S4 are highlighted, and selecting S1 and S4 covers all six pairs (1,2), (1,3), (1,4), (2,3), (2,4), and (3,4).]
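A minimal sketch of the usual greedy set-cover heuristic, applied to this example, is given below. The slides show only a figure, so the details here are assumptions: at each step the SNP covering the most still-uncovered pairs is chosen, and ties are broken by SNP name. With that tie-break the sketch returns S1 and S3 rather than the S1 and S4 of the figure; both cover all six pairs.

```python
# Distinguishing sets of the four-pattern example.
D = {
    (1, 2): {"S3", "S4"},
    (1, 3): {"S1", "S3"},
    (1, 4): {"S1", "S2", "S4"},
    (2, 3): {"S1", "S4"},
    (2, 4): {"S1", "S2", "S3"},
    (3, 4): {"S2", "S3", "S4"},
}


def greedy_tag_snps(D):
    """Greedy set cover: repeatedly pick the SNP that covers the most uncovered pairs."""
    if any(not snp_set for snp_set in D.values()):
        raise ValueError("some pair of patterns cannot be distinguished by any SNP")
    uncovered = set(D)
    chosen = []
    while uncovered:
        # Candidates in name order, so that max() breaks ties by the smallest name.
        candidates = sorted({s for pair in uncovered for s in D[pair]})
        best = max(candidates, key=lambda s: sum(s in D[p] for p in uncovered))
        chosen.append(best)
        uncovered = {p for p in uncovered if best not in D[p]}
    return chosen


print(greedy_tag_snps(D))  # ['S1', 'S3']
```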

Integer Linear Programming
Suppose there are n SNPs and h patterns. Let x_i be defined as follows.
- x_i = 1 if the i-th SNP is selected;
- x_i = 0 otherwise.
Let D(P_j, P_k) be the set of SNPs that can distinguish patterns P_j and P_k.
Integer programming formulation.
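The formula itself is not preserved in this transcript. A standard formulation consistent with the definitions of x_i and D(P_j, P_k) above, offered here as a reconstruction rather than as the slide's exact wording, is:

```latex
\begin{aligned}
\text{minimize}\quad   & \sum_{i=1}^{n} x_i \\
\text{subject to}\quad & \sum_{S_i \in D(P_j, P_k)} x_i \;\ge\; 1
                         \quad \text{for all } 1 \le j < k \le h, \\
                       & x_i \in \{0, 1\} \quad \text{for } i = 1, \dots, n.
\end{aligned}
```

The objective counts the selected SNPs, and each inequality requires that at least one selected SNP distinguishes the pair (P_j, P_k).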

Problem Formulation
- D(P1, P2) = {S3, S4}
- D(P1, P3) = {S1, S3}
- D(P1, P4) = {S1, S2, S4}
- D(P2, P3) = {S1, S4}
- D(P2, P4) = {S1, S2, S3}
- D(P3, P4) = {S2, S3, S4}
[Figure: haplotype patterns P1 to P4 over SNPs S1 to S4.]
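Each set D(P_j, P_k) above gives one covering inequality, for instance x3 + x4 >= 1 for the pair (P1, P2). Because this instance has only four variables, the resulting 0/1 program can be solved by trying all 16 assignments. The sketch below does exactly that; it is an illustration for this tiny example and not a general solver, and the function name is hypothetical.

```python
from itertools import product

# One constraint per pair of patterns: the indices i whose x_i must sum to at least 1.
constraints = [
    {3, 4},     # D(P1, P2)
    {1, 3},     # D(P1, P3)
    {1, 2, 4},  # D(P1, P4)
    {1, 4},     # D(P2, P3)
    {1, 2, 3},  # D(P2, P4)
    {2, 3, 4},  # D(P3, P4)
]


def minimum_solutions(n, constraints):
    """Enumerate all 0/1 vectors and return every feasible one of minimum weight."""
    feasible = [
        x for x in product((0, 1), repeat=n)
        if all(any(x[i - 1] for i in c) for c in constraints)
    ]
    smallest = min(sum(x) for x in feasible)
    return [x for x in feasible if sum(x) == smallest]


for x in minimum_solutions(4, constraints):
    print(x, [f"S{i + 1}" for i, bit in enumerate(x) if bit])
```

For this instance the search finds three minimum solutions of size two: {S3, S4}, {S1, S4}, and {S1, S3}.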

An Iterative LP-relaxation Algorithm
- Linear programming relaxation.
- Randomized rounding method.
- Repeat the steps for those unsatisfied inequalities until all of them are satisfied.
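The rounding step can be sketched as follows, under several stated assumptions. No LP solver is called: the fractional values are one optimal solution of the relaxation for the four-SNP example, worked out by hand (x1 = x3 = x4 = 0.5, x2 = 0, objective 1.5). The sketch also reuses the same fractional values in every round instead of re-solving the LP for the unsatisfied inequalities, which is a simplification of the procedure outlined above. All names are illustrative.

```python
import random

# Covering constraints of the four-SNP example (indices of SNPs per pattern pair).
constraints = [{3, 4}, {1, 3}, {1, 2, 4}, {1, 4}, {1, 2, 3}, {2, 3, 4}]

# Assumed fractional LP solution, computed by hand for this instance.
fractional = {1: 0.5, 2: 0.0, 3: 0.5, 4: 0.5}


def round_until_feasible(fractional, constraints, rng, max_rounds=1000):
    """Select SNP i with probability x_i; repeat for still-unsatisfied inequalities."""
    selected = set()
    for _ in range(max_rounds):
        unsatisfied = [c for c in constraints if not (c & selected)]
        if not unsatisfied:
            return selected
        # Only SNPs that appear in an unsatisfied inequality are rounded again.
        for i in sorted(set().union(*unsatisfied)):
            if rng.random() < fractional[i]:
                selected.add(i)
    if all(c & selected for c in constraints):
        return selected
    raise RuntimeError("no feasible rounding found within max_rounds")


rng = random.Random(0)  # fixed seed so that the run is repeatable
tags = round_until_feasible(fractional, constraints, rng)
print(sorted(f"S{i}" for i in tags))
```

The rounded solution is always feasible but need not be minimum, which is why such methods are analyzed as approximation algorithms.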

Discussion
In this chapter, we illustrate how to recast the tag SNP selection problem as the set cover problem and as an integer linear programming problem.
- Hard problems; approximation algorithms.
- Related topics: missing data, LD bins, a specified number of tag SNPs.