The Haplotype Blocks Problems Wu Ling-Yun 2003.10.10.


References
Daly, M. et al. High-resolution haplotype structure in the human genome. Nature Genetics 29, 2001.
Patil, N. et al. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294, 2001.
Gusfield, D. Haplotyping as perfect phylogeny: conceptual framework and efficient solutions. RECOMB 02, 2002.
Gabriel, S. B. et al. The structure of haplotype blocks in the human genome. Science 296, 2002.
Zhang, K. et al. A dynamic programming algorithm for haplotype block partitioning. Proc. Natl. Acad. Sci. USA 99(11), 2002.
Kimmel, G. et al. Computational Problems in Noisy SNP and Haplotype Analysis: Block Scores, Block Identification and Population Stratification. Working paper, 2003.

Methods
Hidden Markov Model (Daly, M. et al. 2001)
Perfect Phylogeny (Gusfield, D. 2002)
Linkage Disequilibrium (Gabriel, S. B. et al. 2002)
Greedy Algorithm (Daly, M. et al. 2001; Patil, N. et al. 2001)
Dynamic Programming (Zhang, K. et al. 2002; Kimmel, G. et al. 2003)

What's a SNP?
A genetic polymorphism is a difference in DNA sequence among individuals, groups, or populations.
A genetic mutation is a change in the nucleotide sequence of a DNA molecule. Genetic mutations are a kind of genetic polymorphism.
A Single Nucleotide Polymorphism (SNP, pronounced "snip") is a single-base mutation in DNA.
SNPs are the simplest form and the most common source of genetic polymorphism in the human genome (90% of all human DNA polymorphisms).

Types of SNPs
Two types of substitution result in SNPs:
Transition: a substitution between purines (A, G) or between pyrimidines (C, T). Transitions constitute two thirds of all SNPs.
Transversion: a substitution between a purine and a pyrimidine.
A non-synonymous SNP is a coding-region SNP in which the substitution alters the encoded amino acid. One half of all coding-sequence SNPs result in non-synonymous codon changes.
Common SNP: minor allele frequency above 5%.
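
The transition/transversion distinction above can be sketched in a few lines. The function name `classify_substitution` is a hypothetical helper introduced only for this illustration, and it assumes plain A/C/G/T input with no ambiguity codes.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Label a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError("expected two different bases from A, C, G, T")
    # Transition: both bases are purines, or both are pyrimidines.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"
```

For example, `classify_substitution("A", "G")` gives a transition, while `classify_substitution("A", "C")` gives a transversion.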

What's a Haplotype?
A genotype is an exact description of the genetic constitution of an individual.
A haplotype is a "haploid genotype": a particular pattern of sequential SNPs (or alleles) observed on a single chromosome.
We can associate a disease gene with SNPs because they are inherited together in a haplotype.
Haplotypes have been used successfully to identify genes for diseases.
The general properties of haplotypes in the human genome have remained unclear.

Haplotyping
Haplotyping involves grouping subjects by haplotypes, that is, by particular patterns of sequential SNPs found on a single chromosome.
With n biallelic SNPs there are 2^n possible haplotypes, but in reality only O(n) haplotypes are observed.
There are thought to be a small number of haplotype patterns for each chromosome.
Instead of finding haplotypes over the whole genome at once, we find them in small pieces and then piece them together.

Haplotype Blocks
Recombination of haplotypes occurs primarily in narrow regions called hot spots.
The haplotype regions between two neighboring hot spots are called blocks.
Limited haplotype diversity is observed within blocks.
A few representative SNPs (tag SNPs) from each block suffice to unambiguously distinguish the haplotypes in that block.

Block Properties
Little recombination within blocks.
High probability of exchange (recombination) between blocks.
Blocks do not have absolute boundaries and may be defined in different ways, depending on the specific application.

Daly's Model
HETobs = observed haplotypic heterozygosity
HETexp = expected haplotypic heterozygosity
Block score = HETobs / HETexp
A smaller score indicates lower haplotype diversity than expected.
Start from windows of five SNPs. Windows are expanded or contracted by adding or removing SNPs at the ends to find the longest window whose score is a local minimum.
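
The block score can be illustrated with a small sketch. The slide does not spell out how the two heterozygosities are computed, so the definitions here are assumptions: HETobs is taken as the probability that two randomly drawn chromosomes carry different haplotypes in the window, and HETexp as the same probability if the SNPs in the window were statistically independent. Haplotypes are assumed to be equal-length strings with no missing data, and all function names are hypothetical.

```python
from collections import Counter


def observed_heterozygosity(haps):
    """1 minus the sum of squared haplotype frequencies (assumed definition)."""
    n = len(haps)
    return 1.0 - sum((c / n) ** 2 for c in Counter(haps).values())


def expected_heterozygosity(haps):
    """Same quantity if every SNP were independent (assumed definition)."""
    n = len(haps)
    homozygosity = 1.0
    for column in zip(*haps):  # one column per SNP
        homozygosity *= sum((c / n) ** 2 for c in Counter(column).values())
    return 1.0 - homozygosity


def block_score(haps):
    """Ratio HETobs / HETexp for one window of SNPs."""
    het_exp = expected_heterozygosity(haps)
    if het_exp == 0.0:
        raise ValueError("window contains no polymorphic SNP")
    return observed_heterozygosity(haps) / het_exp
```

Under these assumptions a window in which every allele combination occurs equally often scores 1, while a window with only two perfectly correlated haplotypes scores well below 1.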

Haplotype Blocks at 5q31

LD on Haplotype Blocks

Patil's Model
Consider all possible blocks of one or more physically consecutive SNPs.
Select the block with the maximum ratio of the total number of SNPs in the block to the minimal number of SNPs required to uniquely discriminate the haplotypes represented more than once in the block.
Discard any remaining blocks that physically overlap the selected block.
Repeat until the selected blocks form a contiguous, non-overlapping set that covers the whole chromosome with no gaps, with every SNP assigned to a block.
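
The greedy selection described above can be sketched as follows. This is only an illustration under stated assumptions, not the published implementation: haplotypes are equal-length strings without missing data, the minimal discriminating SNP set is found by brute force (so it is only practical for short windows), a block needing zero discriminating SNPs is treated as needing one to avoid dividing by zero, and ties are broken toward the longer, then leftmost, block. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def min_tag_snps(haps, start, end):
    """Fewest SNPs in [start, end) that separate haplotypes seen more than once."""
    counts = Counter(h[start:end] for h in haps)
    common = [h for h, c in counts.items() if c > 1]
    width = end - start
    for k in range(width + 1):
        for cols in combinations(range(width), k):
            signatures = {tuple(h[c] for c in cols) for h in common}
            if len(signatures) == len(common):
                return k
    raise AssertionError("unreachable: all SNPs always separate distinct haplotypes")


def greedy_blocks(haps):
    """Repeatedly pick the best-ratio block and drop every block overlapping it."""
    n_snps = len(haps[0])
    candidates = []
    for start in range(n_snps):
        for end in range(start + 1, n_snps + 1):
            tags = max(1, min_tag_snps(haps, start, end))  # assumption: clamp at 1
            candidates.append(((end - start) / tags, start, end))
    chosen = []
    while candidates:
        # Assumed tie-break: higher ratio, then longer block, then leftmost.
        _, start, end = max(candidates, key=lambda c: (c[0], c[2] - c[1], -c[1]))
        chosen.append((start, end))
        candidates = [c for c in candidates if c[2] <= start or c[1] >= end]
    return sorted(chosen)
```

Because single-SNP blocks stay in the candidate list until something covers them, the loop always ends with every SNP assigned to exactly one block.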

Haplotype Blocks Figure

Gabriel's Model
A haplotype block is defined as a region over which a very small proportion (<5%) of comparisons among informative SNP pairs show strong evidence of historical recombination.

Types of Haplotypes
Common haplotypes: represented more than once; 70~90% of the haplotypes within a block; very few (2-5).
Rare haplotypes: represented only once; 10~30% of the haplotypes within a block.

Ambiguous Haplotypes
Two haplotypes are said to be compatible if their alleles are identical at all loci for which there are no missing data; otherwise they are incompatible.
A haplotype is ambiguous if it is compatible with two other haplotypes that are themselves incompatible.
H1 = (1, 1, ?, 0)
H2 = (1, 1, 0, ?)
H3 = (1, 1, 1, 0)
Here H1 is ambiguous: it is compatible with both H2 and H3, but H2 and H3 differ at the third locus.
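
The two definitions above translate directly into code. In this sketch haplotypes are written as strings with `?` marking missing data; the names `compatible` and `is_ambiguous` are hypothetical helpers chosen for the illustration.

```python
from itertools import combinations

MISSING = "?"


def compatible(h1, h2):
    """True if the haplotypes agree at every locus where neither is missing."""
    return all(a == b or MISSING in (a, b) for a, b in zip(h1, h2))


def is_ambiguous(hap, others):
    """True if hap is compatible with two haplotypes that are incompatible with each other."""
    partners = [o for o in others if compatible(hap, o)]
    return any(not compatible(a, b) for a, b in combinations(partners, 2))


H1, H2, H3 = "11?0", "110?", "1110"
```

With the slide's example, `is_ambiguous(H1, [H2, H3])` is true, while H2 and H3 are each compatible only with H1 and so are not ambiguous.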

Zhang's Model
Find a partition of the entire chromosome that minimizes the total number of representative SNPs required to distinguish at least a specified percentage of the unambiguous haplotypes in each block.
Among all block partitions with the minimum number of representative SNPs, minimize the number of blocks.
The problem of finding the minimum number of representative SNPs within a block that uniquely distinguish all of the haplotypes is known as the Minimum Test Set problem, which has been proven to be NP-complete.
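
A minimal sketch of a dynamic program with this two-level objective is given below. It is not the published algorithm, and it rests on several assumptions: haplotypes are equal-length strings with no missing data; a multi-SNP segment counts as a block only if haplotypes seen more than once make up at least a fraction `coverage` of the sample (the default 0.8 is an arbitrary illustrative value); single-SNP segments are always allowed so that a partition exists; and the per-block tag-SNP count is found by brute force, which is exponential in block width. The names `block_cost` and `partition` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def block_cost(haps, start, end, coverage=0.8):
    """Tag-SNP count for SNPs [start, end), or None if the segment is not a block."""
    counts = Counter(h[start:end] for h in haps)
    common = [h for h, c in counts.items() if c > 1]
    covered = sum(counts[h] for h in common) / len(haps)
    if end - start > 1 and covered < coverage:
        return None  # assumption: only multi-SNP segments can be rejected
    for k in range(end - start + 1):
        for cols in combinations(range(end - start), k):
            if len({tuple(h[c] for c in cols) for h in common}) == len(common):
                return k


def partition(haps, coverage=0.8):
    """Minimize (total tag SNPs, number of blocks) lexicographically."""
    n = len(haps[0])
    best = [(0, 0)] + [None] * n  # best[j]: optimum for the first j SNPs
    back = [0] * (n + 1)          # back[j]: start of the last block ending at j
    for j in range(1, n + 1):
        for i in range(j):
            cost = block_cost(haps, i, j, coverage)
            if cost is None:
                continue
            candidate = (best[i][0] + cost, best[i][1] + 1)
            if best[j] is None or candidate < best[j]:
                best[j], back[j] = candidate, i
    blocks, j = [], n
    while j > 0:
        blocks.append((back[j], j))
        j = back[j]
    return best[n], blocks[::-1]
```

Comparing `(tags, blocks)` tuples handles the tie-breaking rule directly: a partition with the same number of tag SNPs but fewer blocks always wins.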

Dynamic Programming

Coverage vs. Diversity
Another measure of haplotype quality in a block is the minimum total number of SNPs required to explain a specified percentage of the haplotype diversity in each block.
The haplotype block partition based on diversity can be found using the same dynamic programming method.

Kimmel's Model
Find a block partition that minimizes the total number of distinct haplotypes observed in all the blocks.
Minimizing the total number of haplotypes in blocks can be done in polynomial time if there are no data errors.
Several problems are studied:
Total Block Errors (TBE)
Local Block Errors (LBE)
Incomplete Haplotypes (IH)
Minimum Block Haplotypes (MBH)
A Probabilistic Model Block Scoring (PMBS) algorithm is also presented.
A simulated annealing algorithm is used to solve MBH.
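
For the error-free case, the objective stated above (minimize the sum over blocks of the number of distinct haplotypes) can be sketched with a simple dynamic program over block end points. This is an illustrative reading of the slide's objective, not the authors' algorithm; it assumes complete, error-free haplotypes given as equal-length strings, and the function name is hypothetical.

```python
def min_total_block_haplotypes(haps):
    """Return (minimum total of distinct haplotypes over blocks, list of blocks)."""
    n = len(haps[0])
    best = [0] + [float("inf")] * n  # best[j]: optimum for the first j SNPs
    back = [0] * (n + 1)             # back[j]: start of the last block ending at j
    for j in range(1, n + 1):
        for i in range(j):
            distinct = len({h[i:j] for h in haps})  # haplotypes seen on SNPs [i, j)
            if best[i] + distinct < best[j]:
                best[j], back[j] = best[i] + distinct, i
    blocks, j = [], n
    while j > 0:
        blocks.append((back[j], j))
        j = back[j]
    return best[n], blocks[::-1]
```

With two perfectly correlated haplotypes a single block is optimal, whereas a sample containing every allele combination is cheaper to describe when it is split into smaller blocks.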