Linkage Disequilibrium and Recent Studies of Haplotypes and SNPs

Speaker: Yao-Ting Huang
Advisor: Kun-Mao Chao
Algorithms and Computational Biology Lab, Dept. of Computer Science & Information Engineering, National Taiwan University

Good afternoon. This talk is about how to handle SNP genotyping with missing data. My name is Yao-Ting Huang, and my advisor is Kun-Mao Chao. We have two other coauthors, Prof. Zhang and Prof. Chen, who are not here today.

Variations in DNA Sequence
Variants in the human genome include single nucleotide polymorphisms (SNPs), deletions (e.g., loss of heterozygosity), and insertions. SNPs have become the preferred DNA markers for association studies because of their high abundance (roughly 1 SNP per 1,000 base pairs) and because high-throughput genotyping technology makes it possible to build large SNP databases (e.g., the International HapMap Project).

SNPs Arise from Mutations
[Figure: mutations accumulate over time from a common ancestor, producing the variations observed in the present population, including a disease mutation.]

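Haplotype
A haplotype is a set of closely linked SNPs located on one chromosome. The three aligned DNA sequences below differ at three SNP sites, which gives three haplotypes:

DNA sequence        Haplotype   Frequency
GATATTCGTACGGA-T    AG-         2/6
GATGTTCGTACTGAAT    GTA         3/6
GATATTCGTACGGAAT    AGA         1/6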
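
The haplotypes on this slide can be read off mechanically: keep only the alignment columns where the sequences disagree, then count each resulting string. A minimal Python sketch, assuming the sequences are already aligned, that the gap character "-" is treated as just another allele, and that the slide's frequencies 2/6, 3/6, 1/6 correspond to multiplicities 2, 3, and 1 (the function names are hypothetical):

```python
from collections import Counter
from fractions import Fraction


def polymorphic_sites(seqs):
    """Indices of alignment columns where the sequences do not all agree."""
    return [i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1]


def haplotype_frequencies(seqs):
    """Restrict each aligned sequence to the polymorphic sites and count haplotypes."""
    sites = polymorphic_sites(seqs)
    haplotypes = ["".join(s[i] for i in sites) for s in seqs]
    n = len(seqs)
    return sites, {h: Fraction(c, n) for h, c in Counter(haplotypes).items()}


# Multiplicities 2, 3, 1 are assumed from the frequencies shown on the slide.
seqs = (["GATATTCGTACGGA-T"] * 2
        + ["GATGTTCGTACTGAAT"] * 3
        + ["GATATTCGTACGGAAT"] * 1)
sites, freqs = haplotype_frequencies(seqs)
print(sites)  # [3, 11, 14]
print(freqs)  # AG- -> 1/3, GTA -> 1/2, AGA -> 1/6
```

Exact fractions are used so that the frequencies match the slide without rounding.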

Factors Affecting Haplotypes
Chromosomal recombination breaks up and reshuffles haplotypes. Closely linked SNPs tend to be inherited together as a haplotype, because there is less chance that recombination will occur between them. Linkage disequilibrium (LD) refers to the non-random association of alleles at linked loci.

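Linkage Disequilibrium
Consider only two SNPs, with alleles A/a at SNP 1 and B/b at SNP 2. There are four possible haplotypes: AB, Ab, aB, and ab. The table gives the probability of each haplotype, with the allele frequencies as row and column totals:

                  SNP 2
SNP 1          B       b       Total
A            P_AB    P_Ab    P_A
a            P_aB    P_ab    P_a
Total        P_B     P_b     1.0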
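
The 2×2 table can be filled in directly from a sample of two-SNP haplotypes. The Python sketch below is illustrative: the function names and the `allele_A` / `allele_B` parameters (which allele at each SNP is labelled A or B) are assumptions, and exact fractions are used so that the equilibrium check compares frequencies without rounding error. The sample haplotypes are the ones from the examples later in this talk.

```python
from collections import Counter
from fractions import Fraction


def haplotype_table(haplotypes, allele_A, allele_B):
    """Haplotype and allele frequencies for two SNPs.

    Each haplotype is a two-character string (SNP 1 allele, SNP 2 allele).
    allele_A / allele_B name the alleles labelled A and B; any other allele
    at that SNP is labelled a or b.
    """
    n = len(haplotypes)
    counts = Counter(("A" if h[0] == allele_A else "a") + ("B" if h[1] == allele_B else "b")
                     for h in haplotypes)
    p = {k: Fraction(counts[k], n) for k in ("AB", "Ab", "aB", "ab")}
    # Row and column totals of the table.
    p["A"], p["a"] = p["AB"] + p["Ab"], p["aB"] + p["ab"]
    p["B"], p["b"] = p["AB"] + p["aB"], p["Ab"] + p["ab"]
    return p


def in_linkage_equilibrium(p):
    """True when every haplotype frequency equals the product of its allele frequencies."""
    return all(p[h] == p[h[0]] * p[h[1]] for h in ("AB", "Ab", "aB", "ab"))


p = haplotype_table(["AG", "CG", "CC"], allele_A="A", allele_B="G")
print(p["AB"], p["Ab"], p["A"], p["B"])   # 1/3 0 1/3 2/3
print(in_linkage_equilibrium(p))          # False
print(in_linkage_equilibrium(haplotype_table(["AG", "CG", "CC", "AC"], "A", "G")))  # True
```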

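Linkage Equilibrium
The two SNPs are in linkage equilibrium when every haplotype frequency equals the product of the corresponding allele frequencies:
$P_{AB} = P_A P_B$, $P_{Ab} = P_A P_b = P_A(1-P_B)$, $P_{aB} = P_a P_B = (1-P_A)P_B$, $P_{ab} = P_a P_b = (1-P_A)(1-P_B)$.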
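
For two biallelic SNPs, the equilibrium condition for one haplotype already forces it for the other three, because each row and column of the haplotype table sums to an allele frequency ($P_A = P_{AB} + P_{Ab}$, $P_B = P_{AB} + P_{aB}$, $P_a = P_{aB} + P_{ab}$). Assuming $P_{AB} = P_A P_B$:

```latex
\begin{aligned}
P_{Ab} &= P_A - P_{AB} = P_A - P_A P_B = P_A(1 - P_B) = P_A P_b,\\
P_{aB} &= P_B - P_{AB} = P_B - P_A P_B = (1 - P_A) P_B = P_a P_B,\\
P_{ab} &= P_a - P_{aB} = (1 - P_A) - (1 - P_A) P_B = (1 - P_A)(1 - P_B) = P_a P_b.
\end{aligned}
```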

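Linkage Disequilibrium
The two SNPs are in linkage disequilibrium when the haplotype frequencies deviate from the products of the allele frequencies:
$P_{AB} \neq P_A P_B$, $P_{Ab} \neq P_A P_b = P_A(1-P_B)$, $P_{aB} \neq P_a P_B = (1-P_A)P_B$, $P_{ab} \neq P_a P_b = (1-P_A)(1-P_B)$.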

An Example of Linkage Disequilibrium
[Figure: chromosomes before and after a new mutation at SNP 2.]
Before mutation: haplotypes AG and CG; at SNP 1, $P_A = 1/2$ and $P_C = 1/2$; at SNP 2, $P_G = 1$.
After mutation: haplotypes AG, CG, and CC; at SNP 1, $P_A = 1/3$ and $P_C = 2/3$; at SNP 2, $P_G = 2/3$ and $P_C = 1/3$.
Only three haplotypes are observed: AG, CG, and CC. There is no AC haplotype, i.e., $P_{AC} = 0$. However, $P_A P_C = 1/3 \times 1/3 = 1/9$ (allele A at SNP 1, allele C at SNP 2), so $P_{AC} \neq P_A P_C$. These two SNPs are therefore in linkage disequilibrium.

An Example of Linkage Equilibrium
[Figure: chromosomes before and after recombination between the two SNPs.]
Before recombination: haplotypes AG, CG, and CC. After recombination, the AC haplotype also appears, giving four haplotypes: AG, CG, CC, and AC. At SNP 1, $P_A = 1/2$ and $P_C = 1/2$; at SNP 2, $P_G = 1/2$ and $P_C = 1/2$.
After recombination, $P_{AG} = P_A P_G = 1/4$, $P_{CG} = P_C P_G = 1/4$, $P_{CC} = P_C P_C = 1/4$, and $P_{AC} = P_A P_C = 1/4$ (in each product, the first factor refers to SNP 1 and the second to SNP 2). Thus, these two SNPs are in linkage equilibrium.

D Coefficient
We can measure the non-randomness of two loci by means of a deviation, D, defined as
$$D = P_{AB} - P_A P_B = P_{AB} P_{ab} - P_{Ab} P_{aB}.$$
The haplotype frequencies can then be written as
$$P_{AB} = P_A P_B + D,\quad P_{Ab} = P_A(1-P_B) - D,\quad P_{aB} = (1-P_A)P_B - D,\quad P_{ab} = (1-P_A)(1-P_B) + D.$$
The two SNPs are in linkage equilibrium if and only if $D = 0$.
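
The two expressions for D agree. Substituting the four expressions for the haplotype frequencies given on this slide, the constant terms and the $D^2$ terms cancel:

```latex
\begin{aligned}
P_{AB}P_{ab} - P_{Ab}P_{aB}
 &= (P_A P_B + D)\big((1-P_A)(1-P_B) + D\big) - \big(P_A(1-P_B) - D\big)\big((1-P_A)P_B - D\big)\\
 &= D\,\big[P_A P_B + (1-P_A)(1-P_B) + P_A(1-P_B) + (1-P_A)P_B\big]\\
 &= D\,\big[P_A + (1-P_A)\big] = D.
\end{aligned}
```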

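Standardization of D Coefficient
The D coefficient can be standardized in many ways. A common choice is $D' = D / D_{max}$, where $D_{max}$ is the maximal possible absolute value of D given the allele frequencies:
$$D_{max} = \begin{cases} \min(P_A P_b,\ P_a P_B) & \text{if } D > 0,\\ \min(P_A P_B,\ P_a P_b) & \text{if } D < 0. \end{cases}$$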
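
A minimal Python sketch of $D' = D / D_{max}$, assuming the usual bounds implied by non-negative haplotype frequencies ($D_{max} = \min(P_A P_b, P_a P_B)$ when $D > 0$, and $\min(P_A P_B, P_a P_b)$ when $D < 0$). The function name is hypothetical; the inputs are taken from the two worked examples in this talk. Note that the sign of D and D' depends on which alleles are labelled A and B.

```python
from fractions import Fraction


def d_and_d_prime(p_AB, p_A, p_B):
    """Return (D, D') for two biallelic SNPs.

    p_AB is the frequency of the AB haplotype; p_A and p_B are the
    frequencies of alleles A and B.
    """
    p_a, p_b = 1 - p_A, 1 - p_B
    d = p_AB - p_A * p_B
    if d > 0:
        d_max = min(p_A * p_b, p_a * p_B)   # keeps P_Ab and P_aB non-negative
    elif d < 0:
        d_max = min(p_A * p_B, p_a * p_b)   # keeps P_AB and P_ab non-negative
    else:
        return d, Fraction(0)
    return d, d / d_max


third = Fraction(1, 3)
# Haplotypes AG, CG, CC (1/3 each); A = allele A at SNP 1, B = allele G at SNP 2.
print(d_and_d_prime(p_AB=third, p_A=third, p_B=2 * third))    # D = 1/9, D' = 1
# Same data, labelling allele C at SNP 2 as B (so p_AB = P_AC = 0).
print(d_and_d_prime(p_AB=Fraction(0), p_A=third, p_B=third))  # D = -1/9, D' = -1
# Haplotypes AG, CG, CC, AC (1/4 each).
print(d_and_d_prime(Fraction(1, 4), Fraction(1, 2), Fraction(1, 2)))  # D = 0, D' = 0
```

In the mutation example one haplotype (AC) is missing, which is exactly the situation where |D'| reaches 1.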

Interpretation of D'
D' is constrained between -1 and +1.
D' = 1: perfect positive LD between SNP alleles
D' = 0: linkage equilibrium between SNP alleles
D' = -1: perfect negative LD between SNP alleles
D' = 0.87: strong positive LD between SNP alleles
D' = 0.12: weak positive LD between SNP alleles
Other LD measures include r² (also written Δ²), which is tied to the chi-square test of association between the two SNPs and its P value.
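
The standard definition is $r^2 = D^2 / (P_A P_a P_B P_b)$, and for a 2×2 table built from N haplotypes the Pearson chi-square statistic (1 degree of freedom) equals $N r^2$. A Python sketch of both, with a P value from the chi-square(1) tail $\mathrm{erfc}(\sqrt{\chi^2/2})$; the function names and the sample size N = 6 in the usage are assumptions for illustration:

```python
import math
from fractions import Fraction


def r_squared(p_AB, p_A, p_B):
    """r^2 = D^2 / (P_A P_a P_B P_b) for two biallelic SNPs."""
    d = p_AB - p_A * p_B
    return d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))


def chi_square(n_haplotypes, p_AB, p_A, p_B):
    """Pearson chi-square statistic for the 2x2 haplotype table: N * r^2."""
    return n_haplotypes * r_squared(p_AB, p_A, p_B)


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


third = Fraction(1, 3)
# Haplotypes AG, CG, CC (1/3 each): D' = 1, yet r^2 is only 1/4.
print(r_squared(third, third, 2 * third))                   # 1/4
print(chi_square(6, third, third, 2 * third))               # 3/2 (assuming N = 6)
print(p_value_1df(chi_square(6, third, third, 2 * third)))  # about 0.22
```

The example shows why the two measures are reported separately: a missing haplotype drives |D'| to 1 even when the allele frequencies are unequal and r² stays well below 1.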

Decay of LD over Time
Recombination decreases LD from generation to generation, so that, absent other forces, the two loci eventually reach linkage equilibrium.
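
In the standard textbook model (random mating in a very large population, with no mutation, selection, or drift), D shrinks by a factor $(1 - r)$ each generation, where r is the recombination fraction between the two SNPs, so $D_t = (1 - r)^t D_0$. A Python sketch under those assumptions (function names hypothetical):

```python
from fractions import Fraction


def ld_decay(d0, r, generations):
    """Return [D_0, D_1, ..., D_generations] with D_t = (1 - r)^t * D_0."""
    return [d0 * (1 - r) ** t for t in range(generations + 1)]


def generations_until(r, fraction):
    """Smallest t such that (1 - r)^t <= fraction, i.e. D has fallen to that share of D_0."""
    if not (0 < r <= Fraction(1, 2)) or not (0 < fraction < 1):
        raise ValueError("need 0 < r <= 1/2 and 0 < fraction < 1")
    t = 0
    while (1 - r) ** t > fraction:
        t += 1
    return t


print(ld_decay(Fraction(1, 4), Fraction(1, 2), 3))          # D halves every generation when r = 1/2
print(generations_until(Fraction(1, 100), Fraction(1, 2)))  # 69
```

Even for tightly linked SNPs (r = 1%), D drops to half its starting value after 69 generations, while for unlinked loci (r = 1/2) it halves every generation.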

Haplotype Blocks in the Human Genome
The human genome has been shown to contain regions of high LD separated by regions of low LD. Recombination occurs frequently in the low-LD regions (recombination hot spots), and the high-LD regions form haplotype blocks. The International HapMap Project aims to build a haplotype map across the human genome.
[Figure: a chromosome with haplotype blocks (high-LD regions) separated by recombination hot spots (low-LD regions).]
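
One simple, deliberately crude way to turn "regions of high LD" into blocks is a left-to-right scan that extends the current block while every pair of SNPs inside it keeps |D'| at or above a threshold. This is a hypothetical rule for illustration only, not the block definition used by the HapMap Project or by the studies cited in this talk; the threshold 0.8 and the D' matrix below are made-up assumptions.

```python
def greedy_ld_blocks(dprime, threshold=0.8):
    """Split SNPs 0..n-1 into consecutive blocks in which every pair has |D'| >= threshold.

    dprime is an n x n matrix (list of lists) of pairwise D' values.
    """
    n = len(dprime)
    blocks, start = [], 0
    for j in range(1, n + 1):
        # Close the current block at the end, or when SNP j is in weak LD with any SNP in it.
        if j == n or any(abs(dprime[i][j]) < threshold for i in range(start, j)):
            blocks.append((start, j - 1))
            start = j
    return blocks


# Made-up D' values: SNPs 0-1 and SNPs 2-3 are in strong LD, with weak LD between the pairs.
dprime = [
    [1.00, 0.95, 0.10, 0.20],
    [0.95, 1.00, 0.15, 0.10],
    [0.10, 0.15, 1.00, 0.90],
    [0.20, 0.10, 0.90, 1.00],
]
print(greedy_ld_blocks(dprime))  # [(0, 1), (2, 3)]
```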

Genotype Data vs. Haplotype Data
The use of the haplotype map has been limited by the fact that the human genome is diploid: genotyping yields genotype data rather than haplotype data. This leads to the phase problem, i.e., the loss of information about which chromosome each base lies on. For example, for a diploid individual with alleles G/T at SNP 1 and A/C at SNP 2, we don't know whether the two haplotypes are (GA, TC) or (GC, TA).
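
With k heterozygous sites, a genotype is consistent with $2^{k-1}$ unordered haplotype pairs (and exactly one pair when k = 0), which is why phase cannot be read directly off genotype data. A Python sketch that enumerates them; the function name and the list-of-allele-pairs input format are assumptions:

```python
from itertools import product


def compatible_phasings(genotype):
    """All unordered haplotype pairs consistent with an unphased genotype.

    genotype is a list of (allele, allele) pairs, one per SNP.
    """
    het = [i for i, (x, y) in enumerate(genotype) if x != y]
    phasings = []
    # Fix the orientation at the first heterozygous site so that (h1, h2)
    # and (h2, h1) are not both listed; keep or flip each remaining one.
    for flips in product((False, True), repeat=max(len(het) - 1, 0)):
        flipped = {i for i, f in zip(het[1:], flips) if f}
        h1, h2 = [], []
        for i, (x, y) in enumerate(genotype):
            if i in flipped:
                x, y = y, x
            h1.append(x)
            h2.append(y)
        phasings.append(("".join(h1), "".join(h2)))
    return phasings


print(compatible_phasings([("G", "T"), ("A", "C")]))
# [('GA', 'TC'), ('GC', 'TA')]
```

The number of candidates doubles with every additional heterozygous site, so haplotype inference methods need population or pedigree information to choose among them.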

Haplotype Reconstruction with Pedigree
Li and Jiang (2004) study haplotype reconstruction with pedigree data. They assume that no mutations occur within the pedigree, only recombination events. Given a pedigree and the genotype data of each member, the problem is to find a haplotype configuration for the pedigree that requires the minimum number of recombinants.
[Figure: an example pedigree annotated with each member's multi-allelic genotypes, e.g., 1|2, 3|2, and 1|3.]
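
A building block of the minimum-recombinant problem is counting how many crossovers are needed to explain one transmitted child haplotype as a mosaic of a parent's two haplotypes. The Python sketch below solves only that sub-step, by dynamic programming over SNPs, and assumes the parent's haplotypes are already phased; it is an illustration, not Li and Jiang's algorithm for the full pedigree problem. The function name and string encoding of alleles are hypothetical.

```python
import math


def min_recombinants(parent_haps, child_hap):
    """Minimum number of crossovers so that child_hap is copied from parent_haps[0]/[1].

    Returns None if some allele of child_hap is absent from the parent at that SNP.
    """
    # cost[s]: fewest crossovers for the prefix so far, currently copying parental haplotype s.
    cost = [0 if parent_haps[s][0] == child_hap[0] else math.inf for s in (0, 1)]
    for i in range(1, len(child_hap)):
        cost = [
            min(cost[s], cost[1 - s] + 1) if parent_haps[s][i] == child_hap[i] else math.inf
            for s in (0, 1)
        ]
    best = min(cost)
    return None if best == math.inf else best


print(min_recombinants(("1111", "2222"), "1122"))  # 1
print(min_recombinants(("1111", "2222"), "1212"))  # 3
```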

Haplotype Block Partition and Tag SNP Selection Using Genotype Data
Zhang et al. (2004) combine dynamic programming with an EM algorithm to partition haplotype blocks. The EM algorithm infers the haplotypes for a range of SNPs, and the dynamic programming algorithm minimizes the number of tag SNPs used in the block partition. Their experiments examine the factors that affect the block partition and the number of tag SNPs, including the number of haplotypes, SNP density, minor allele frequency, missing data, and genotyping error rate.
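
The block-partition step can be sketched as a dynamic program over SNP positions: `best[j]` is the fewest tag SNPs needed to cover the first j SNPs, and each candidate last block adds its own tag-SNP cost. In the Python sketch below, `block_cost` is a hypothetical placeholder for whatever per-block rule is used (for example, the number of tag SNPs needed inside a block whose haplotypes were inferred by EM, with infinity for a range that may not form a block); the toy rule in the usage line is purely illustrative.

```python
import math


def partition_blocks(n, block_cost):
    """Split SNPs 0..n-1 into consecutive blocks minimising the summed block_cost.

    block_cost(i, j) is the (hypothetical) number of tag SNPs needed for a block
    spanning SNPs i..j inclusive, or math.inf if i..j may not form a block.
    Returns (total_cost, [(first, last), ...]), or None if no valid partition exists.
    """
    best = [0] + [math.inf] * n      # best[j]: minimum cost for SNPs 0..j-1
    back = [0] * (n + 1)             # back[j]: start of the last block in that optimum
    for j in range(1, n + 1):
        for i in range(j):           # candidate last block: SNPs i..j-1
            cost = best[i] + block_cost(i, j - 1)
            if cost < best[j]:
                best[j], back[j] = cost, i
    if best[n] == math.inf:
        return None
    blocks, j = [], n
    while j > 0:                     # walk the back-pointers to recover the blocks
        blocks.append((back[j], j - 1))
        j = back[j]
    return best[n], blocks[::-1]


# Toy rule (an assumption): any run of at most 3 SNPs is a block needing 1 tag SNP.
print(partition_blocks(7, lambda i, j: 1 if j - i < 3 else math.inf))
# (3, [(0, 0), (1, 3), (4, 6)])
```

The loop is O(n²) calls to `block_cost`, so in practice the cost of evaluating each candidate block dominates.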

Thoughts
How can the tag SNP selection algorithm be modified to process genotype data? The naïve approach is to infer haplotypes with existing algorithms and then find tag SNPs. Is it possible to determine tag SNPs directly from genotype data?
Assume genotypes are coded as 0: homozygous wild type, 1: homozygous mutant, 2: heterozygous.

      P1  P2  P3  P4
S1     1   1   0   0
S2     1   0   1   0
S3     1   2   0   1
S4     1   2   0   1
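
One observation that needs no phasing is visible in the genotype table: SNPs whose genotype vectors across the samples are identical (here S3 and S4) carry the same information on this data, so one of them can stand in for the other. The Python sketch below groups such SNPs. It only illustrates that observation and is not a genotype-based tag SNP selection algorithm; the function name and dictionary layout are assumptions.

```python
def identical_genotype_groups(genotypes):
    """Group SNPs whose genotype vectors across all samples are identical.

    genotypes maps a SNP name to its list of codes (0: homozygous wild type,
    1: homozygous mutant, 2: heterozygous), one per sample.
    """
    groups = {}
    for snp, row in genotypes.items():
        groups.setdefault(tuple(row), []).append(snp)
    return list(groups.values())


genotypes = {
    "S1": [1, 1, 0, 0],
    "S2": [1, 0, 1, 0],
    "S3": [1, 2, 0, 1],
    "S4": [1, 2, 0, 1],
}
print(identical_genotype_groups(genotypes))  # [['S1'], ['S2'], ['S3', 'S4']]
```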

The Relation Between Minor Allele Frequency and Tag SNPs
The minor allele frequency (MAF) ranges from 0% to 50%. The higher a SNP's MAF, the more useful it is as a tag SNP. For example, across ten haplotypes, the SNP 0000000011 has an MAF of 20%, while 0010010011 has an MAF of 40%; the latter can distinguish more haplotype patterns. What is the relation between the minor allele frequency and the number of tag SNPs?
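
The claim about distinguishing power can be made concrete: a biallelic SNP whose allele "1" appears on k of n haplotypes separates exactly $k(n - k)$ haplotype pairs, a count that grows as the MAF approaches 50%. A Python sketch using the slide's two SNPs (function names hypothetical):

```python
from fractions import Fraction


def minor_allele_frequency(snp):
    """MAF of a biallelic SNP given as a string of 0/1 alleles, one per haplotype."""
    ones = snp.count("1")
    return Fraction(min(ones, len(snp) - ones), len(snp))


def pairs_distinguished(snp):
    """Number of haplotype pairs carrying different alleles at this SNP: k * (n - k)."""
    k = snp.count("1")
    return k * (len(snp) - k)


for snp in ("0000000011", "0010010011"):
    print(snp, minor_allele_frequency(snp), pairs_distinguished(snp))
# 0000000011 1/5 16
# 0010010011 2/5 24
```

Since $k(n-k)$ is maximized at $k = n/2$, a SNP with 50% MAF separates the most pairs, which is consistent with higher-MAF SNPs being more useful tags.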

Block-Free Selection of Tagging SNPs
Bafna et al. (2004) propose algorithms for selecting tag SNPs without considering haplotype block structure. They define a measure called "informativeness," which quantifies how well a set of SNPs can predict another set of SNPs. The goal is to find a subset of SNPs with maximum informativeness. With this approach, the total number of tag SNPs needed for a whole genome is smaller than with block-dependent approaches.
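
Informativeness can be made concrete in more than one way. The Python sketch below assumes one pair-based reading: the informativeness of a tag set S for a target SNP t is the fraction of haplotype pairs separated by t that are also separated by at least one SNP in S. Treat this definition, the function name, and the toy haplotypes as assumptions for illustration rather than the exact measure of Bafna et al.

```python
from fractions import Fraction
from itertools import combinations


def informativeness(haplotypes, tags, target):
    """Fraction of haplotype pairs differing at `target` that also differ at some tag SNP.

    haplotypes: equal-length strings, one character per SNP column.
    tags: column indices of the candidate tag SNPs; target: column index.
    """
    separated = [(x, y) for x, y in combinations(haplotypes, 2) if x[target] != y[target]]
    if not separated:
        return Fraction(1)  # nothing to predict: treated as fully informative
    covered = sum(1 for x, y in separated if any(x[s] != y[s] for s in tags))
    return Fraction(covered, len(separated))


haps = ["001", "011", "110", "100"]
print(informativeness(haps, tags=[0], target=2))  # 1
print(informativeness(haps, tags=[1], target=2))  # 1/2
```

Under this reading, SNP 0 fully predicts SNP 2 on these haplotypes, while SNP 1 separates only half of the pairs that SNP 2 separates.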