Haplotype inference and haplotype-based transmission disequilibrium test (Hap-TDT). Chinese Association of Animal Science and Veterinary Medicine, Information Technology Branch, 2009 Annual Meeting, Harbin.

Hello, everyone. Haplotype reconstruction is becoming more and more common because haplotypes carry more information than single loci; they are useful in QTL fine mapping, evolutionary studies, and so on. Daughter and granddaughter designs are two very powerful designs for QTL mapping in cattle. Here I will show you an algorithm for reconstructing haplotypes in daughter and granddaughter designs. Xiangdong Ding (丁向东), College of Animal Science and Technology, China Agricultural University

Outline
- Haplotype inference
  - Introduction of important methods: Parsimony (Clark, 1990); EM (Excoffier and Slatkin, 1995); Bayesian (Stephens and Donnelly, 2001)
  - Haplotype inference using family information
- Transmission Disequilibrium Test (TDT)
  - Haplotype-based TDT (Hap-TDT)

Genotype and Haplotype
Haplotype: a collection of alleles from the same chromosome.
[Figure: allele and genotype versus haplotype and diplotype. Unphased genotypes at loci Loc1 to Loc7 (chromosome phase is unknown) are turned by haplotype reconstruction into a pair of haplotypes (chromosome phase is known).]
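To make the phase ambiguity concrete, the following minimal sketch lists every haplotype pair (diplotype) that is consistent with one unphased genotype. The encoding (a list of two-allele tuples, one per locus) and the function name `compatible_diplotypes` are assumptions made for illustration; they do not come from the talk.

```python
from itertools import product

def compatible_diplotypes(genotype):
    """Return all unordered haplotype pairs consistent with an unphased genotype.

    genotype: list of (allele_a, allele_b) tuples, one per locus
    (hypothetical encoding chosen for this sketch).
    """
    pairs = set()
    # At each locus, choose which of the two alleles sits on the first chromosome.
    for choice in product((0, 1), repeat=len(genotype)):
        hap1 = tuple(g[c] for g, c in zip(genotype, choice))
        hap2 = tuple(g[1 - c] for g, c in zip(genotype, choice))
        # Sorting makes (hap1, hap2) and (hap2, hap1) the same pair.
        pairs.add(tuple(sorted((hap1, hap2))))
    return sorted(pairs)

# Two heterozygous loci and one homozygous locus.
genotype = [(2, 13), (1, 6), (9, 9)]
for hap1, hap2 in compatible_diplotypes(genotype):
    print(hap1, hap2)
```

A genotype with k heterozygous loci has 2^(k-1) compatible pairs when k is at least 1, which is why the number of candidates grows quickly with the number of markers.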

Algorithms for haplotype inference
- Statistical methods: Parsimony (Clark, 1990); EM (Excoffier and Slatkin, 1995; Qin et al., 2002); Bayesian (Stephens and Donnelly, 2001; Niu et al., 2002)
- Rule-based methods: minimum recombination principle (Qian and Beckmann, 2002; Baruch et al., 2006)
So far, there are two categories of algorithms for haplotype inference: statistical methods and rule-based methods. Statistical methods are usually very time-consuming; parsimony, maximum likelihood via the EM algorithm, and Bayesian methods are the three popular statistical approaches. Rule-based methods, on the other hand, infer haplotypes from reasonable biological assumptions such as the minimum recombination principle. Generally, rule-based approaches are very fast compared with statistical methods.

Parsimony (Clark, 1990)
Example genotypes: A1A1, A1A2, A2A3, A3A4, A3A5, A6A7
1. Start from a homozygote.
2. Resolve any other ambiguous sequence using the definitive haplotype from step 1.
3. Continue this procedure until all haplotypes are resolved or until no more new haplotypes can be found.
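The three steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the published implementation: SNPs are taken to be biallelic, genotypes are coded as the number of copies (0, 1 or 2) of allele 1 at each locus, and genotypes with a single heterozygous locus are treated as unambiguous alongside homozygotes. The function name `clark_phase` is hypothetical.

```python
def clark_phase(genotypes):
    """Sketch of Clark's parsimony rule for biallelic SNPs.

    genotypes: list of tuples holding 0/1/2 copies of allele 1 per locus
    (assumed encoding). Returns (resolved, known), where resolved maps a
    genotype index to its haplotype pair and known lists haplotypes found.
    """
    known = []      # haplotypes in order of discovery
    resolved = {}

    def add(hap):
        if hap not in known:
            known.append(hap)

    # Step 1: genotypes with at most one heterozygous locus are unambiguous.
    for i, g in enumerate(genotypes):
        if sum(1 for x in g if x == 1) <= 1:
            h1 = tuple(1 if x == 2 else 0 for x in g)
            h2 = tuple(1 if x >= 1 else 0 for x in g)
            resolved[i] = (h1, h2)
            add(h1)
            add(h2)

    # Steps 2 and 3: explain ambiguous genotypes with a known haplotype
    # plus its complement, until no further genotype can be resolved.
    progress = True
    while progress:
        progress = False
        for i, g in enumerate(genotypes):
            if i in resolved:
                continue
            for hap in known:
                complement = tuple(x - y for x, y in zip(g, hap))
                if all(c in (0, 1) for c in complement):
                    resolved[i] = (hap, complement)
                    add(complement)
                    progress = True
                    break
    return resolved, known

resolved, known = clark_phase([(2, 0, 2), (1, 1, 2), (1, 2, 1)])
print(resolved)
print(known)
```

The result can depend on the order in which genotypes and known haplotypes are visited, and some genotypes may remain unresolved if no starting haplotype explains them.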

EM algorithm
A numerical method for finding maximum likelihood estimates of parameters given incomplete data.
- Initial parameter values: haplotype frequencies
- Expectation step: compute the expected values of the missing data (the phase) given the observed genotypes and the current parameter values
- Maximization step: compute the MLE of the parameters from the completed data
- Repeat with the updated set of parameters until changes in the parameter estimates are negligible
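The EM steps above can be sketched for unrelated individuals as follows. This is a minimal illustration, assuming biallelic SNPs coded as 0/1/2 copies of allele 1, Hardy-Weinberg equilibrium, and a uniform starting point; the function names `expand` and `em_haplotype_freqs` are hypothetical and the code does not reproduce any particular published program.

```python
from itertools import product
from collections import defaultdict

def expand(genotype):
    """List the unordered haplotype pairs compatible with a 0/1/2-coded genotype."""
    het = [i for i, x in enumerate(genotype) if x == 1]
    base = [1 if x == 2 else 0 for x in genotype]
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = base[:], base[:]
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)

def em_haplotype_freqs(genotypes, n_iter=200, tol=1e-9):
    """Estimate haplotype frequencies by EM, assuming HWE."""
    expansions = [expand(g) for g in genotypes]
    haps = sorted({h for pairs in expansions for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # initial parameter values
    for _ in range(n_iter):
        counts = defaultdict(float)
        # E-step: posterior weight of each compatible pair under current freq.
        for pairs in expansions:
            weights = [freq[a] * freq[b] * (1 if a == b else 2) for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: frequencies are expected counts over 2N chromosomes.
        new = {h: counts[h] / (2 * len(genotypes)) for h in haps}
        change = max(abs(new[h] - freq[h]) for h in haps)
        freq = new
        if change < tol:
            break
    return freq

# Four 00/00 homozygotes, four 11/11 homozygotes, two double heterozygotes.
genotypes = [(0, 0)] * 4 + [(2, 2)] * 4 + [(1, 1)] * 2
for hap, f in em_haplotype_freqs(genotypes).items():
    print(hap, round(f, 4))
```

In this toy data set the homozygotes make haplotypes 00 and 11 common, so the EM iterations assign the double heterozygotes almost entirely to the 00/11 phasing.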

EM algorithm
- Efficiency: heavy computational burden with a large number of loci; partition-ligation algorithm (Niu et al., 2002); PL-EM (Qin et al., 2002)
- Accuracy and departures from HWE: most EM-based methods assume HWE; robust to departures from HWE (Fallin and Schork, 2000)

Bayesian method
- PHASE (Stephens and Donnelly, 2001): based on the coalescent model; uses Gibbs sampling; so far very accurate, but also complicated
- fastPHASE (Scheet and Stephens, 2006): revised version of PHASE; much faster; handles missing genotypes

Comparison of Parsimony, EM and PHASE
- PHASE performs better than parsimony and EM (Stephens and Donnelly, 2001)
- PHASE and EM-based methods exhibited similar performance (Zhang et al., 2001; Xu et al., 2002)

Haplotype inference using family data (1)
Haplotype inference based on close relatives reduces haplotype ambiguity and improves efficiency.
- Rohde and Fuerst (2001), EM algorithm: families with both parents and their children. The genotyped offspring reduce the number of potential haplotype pairs for both parents.
- Ding et al. (2006), EM algorithm: families with only one parent available.
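The way genotyped offspring shrink the set of candidate parental haplotype pairs can be illustrated with a small filter. This sketch is not the EM procedure of the cited papers: it assumes biallelic SNPs coded as 0/1/2 copies of allele 1, no recombination within the family, and no genotyping errors. The names `expand` and `consistent_parent_pairs` are hypothetical.

```python
from itertools import product

def expand(genotype):
    """List the unordered haplotype pairs compatible with a 0/1/2-coded genotype."""
    het = [i for i, x in enumerate(genotype) if x == 1]
    base = [1 if x == 2 else 0 for x in genotype]
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = base[:], base[:]
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)

def consistent_parent_pairs(father, mother, children):
    """Keep parental diplotype combinations that can produce every child.

    A child is explained if one paternal haplotype plus one maternal
    haplotype adds up to the child's genotype (no recombination assumed).
    """
    kept = []
    for f_pair, m_pair in product(expand(father), expand(mother)):
        explains_all = all(
            any(tuple(a + b for a, b in zip(fh, mh)) == child
                for fh in f_pair for mh in m_pair)
            for child in children
        )
        if explains_all:
            kept.append((f_pair, m_pair))
    return kept

father, mother = (1, 1), (1, 1)   # both parents doubly heterozygous
before = len(expand(father)) * len(expand(mother))
after = consistent_parent_pairs(father, mother, [(2, 2)])
print(before, len(after))
```

Here one child carrying two copies of allele 1 at both loci is enough to rule out every parental configuration except the one in which each parent carries the 11 haplotype.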

Haplotype inference using family data (2)
- Ding et al., EM algorithm: families with only sibs. Workflow: genotypes of sibs, then parental information, then haplotype pairs of sibs.
- Mixed family data: complete families and incomplete families (one parent, or only sibs).
The sibs are considered jointly as an independent pair: their parental information is inferred first, and the haplotype pairs of the sibs are then updated given the posterior parental information.

Comparison of four different strategies
Table columns: method name; using family information?; using LD?; handling incomplete families?
- Complete-family-EM (Rohde and Fuerst, 2001): YES, NO
- Incomplete-family-EM (Ding et al., 2006)
- GENEHUNTER (Kruglyak et al., 1996)
- PHASE (Stephens et al., 2003)
(Ding et al., 2006)

Result for incomplete families
[Figure: comparison of Incomplete-family-EM, PHASE and GENEHUNTER.]
Running time of PHASE: 3.5 h for the whole set of 100 datasets of 30 trios with 187 SNPs (Marchini et al., 2006). Running time will become prohibitive for large numbers of SNPs.

Rule-based method
- Minimum recombination principle (Qian and Beckmann, 2002; Baruch et al., 2006): genetic recombination is rare, so a haplotype configuration with fewer recombinants should be preferred in haplotype reconstruction.
- Rule 1, Rule 2, … (Wang et al., 2006)

Joint EM and rule-based algorithm for (grand-)daughter design
- Assuming no recombination: use the EM algorithm to construct diplotypes.
- Taking recombination into account: apply the minimum recombination principle, i.e. find the sire diplotype that minimizes the number of recombinations in the sire family.

Example: a sire family with 50 offspring; phase-unknown genotype of the sire: (12; 12; 12; 12; 12)

     Possible diplotypes    Recombination events
  1. 22222 / 11111           1
  2. 11222 / 22111          51
  3. 11122 / 22211          49
  4. 22221 / 11112          51
  5. 12222 / 21111          51
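The selection of the sire diplotype in this example can be sketched as below. The sketch assumes that the haplotype each offspring inherited from the sire is already known, and it uses a hypothetical set of 50 paternal haplotypes (25 copies of 22222, 24 copies of 11111 and one recombinant 22211) constructed only so that the totals agree with the table; the talk does not list the offspring data. The function names are also hypothetical.

```python
def recombinations(sire_diplotype, paternal_haplotype):
    """Minimum number of crossovers needed to build the transmitted haplotype
    as a mosaic of the sire's two haplotypes (infinity if impossible)."""
    inf = float("inf")
    cost = [0, 0]  # best cost when the current locus is copied from strand 0 or 1
    for locus, allele in enumerate(paternal_haplotype):
        new_cost = []
        for strand in (0, 1):
            if sire_diplotype[strand][locus] != allele:
                new_cost.append(inf)  # this strand cannot supply the allele
            else:
                # Either stay on the same strand or switch (one crossover).
                new_cost.append(min(cost[strand], cost[1 - strand] + 1))
        cost = new_cost
    return min(cost)

def total_recombinations(sire_diplotype, paternal_haplotypes):
    return sum(recombinations(sire_diplotype, h) for h in paternal_haplotypes)

# Hypothetical offspring data chosen to match the counts in the example.
offspring = ["22222"] * 25 + ["11111"] * 24 + ["22211"]
candidates = [
    ("22222", "11111"),
    ("11222", "22111"),
    ("11122", "22211"),
    ("22221", "11112"),
    ("12222", "21111"),
]
totals = [total_recombinations(d, offspring) for d in candidates]
best = candidates[totals.index(min(totals))]
print(totals, best)
```

The diplotype with the smallest total is kept as the sire's phase, which is the minimum recombination principle applied to one sire family.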

Results
10 sire families with 30 offspring each; 10 SNPs, 0.5 centiMorgan

                           HSHAP     PedPhase   SimWalk2
Error rate in sires        0.007c    0.682a     0.126b
Error rate in offspring    0.021c    0.0330b    0.0705a
Running time (min)         8.211     0.006      60.17

Results - incomplete data
10 sire families with 20 offspring each; 5 SNPs, 0.5 centiMorgan; 30% of individuals with one random missing locus

                           HSHAP     PedPhase   SimWalk2
Error rate in sires        0.0310c   0.8910a    0.0930b
Error rate in offspring    0.0159c   0.7607a    0.2164b
Running time (min)         0.01      0.002      10.65

Linkage vs Association
Linkage:
- Family-based
- Matching/ethnicity generally unimportant
- Few markers for genome coverage (300-400 STRs)
- Can be a weak design
- Good for initial detection; poor for fine-mapping
- Powerful for rare variants
Association:
- Families or unrelated individuals
- Matching/ethnicity crucial
- Many markers required for genome coverage (10^5 to 10^6 SNPs)
- Powerful design
- Poor for initial detection; good for fine-mapping
- Powerful for common variants; rare variants generally impossible

TDT (Transmission Disequilibrium Test)
Compares the distribution of transmitted and non-transmitted alleles by parents of affected offspring (Spielman et al., 1993).

                        Non-transmitted allele
Transmitted allele      M1       M2       Total
M1                      a        b        a+b
M2                      c        d        c+d
Total                   a+c      b+d      2n

If the marker is unlinked to the causative locus, we expect b = c; otherwise, one of the alleles will tend to be transmitted more often.
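The comparison of b and c is usually carried out with the McNemar-type statistic (b - c)^2 / (b + c), which is referred to a chi-square distribution with 1 degree of freedom. The statistic is not written out in the slide text, so the sketch below is a supplementary illustration: the function name `tdt` and the example counts are hypothetical.

```python
import math

def tdt(b, c):
    """TDT statistic from transmissions by heterozygous parents.

    b: parents transmitting M1 and not M2; c: parents transmitting M2 and
    not M1 (the off-diagonal cells of the table above).
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

chi2, p_value = tdt(60, 40)  # made-up counts for illustration
print(chi2, p_value)
```

Only heterozygous parents contribute to b and c, which is why the test is unaffected by the homozygous cells a and d.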

TDT (Transmission Disequilibrium Test)
- Good for fine-mapping, poor for initial detection
- Robust to population stratification/admixture
- Widely used in genome-wide association studies
- Extended to many kinds of data structure: multi-allelic markers (Sham and Curtis, 1995); multiple siblings (Spielman, 1998; Boehnke, 1998); missing parental data (Sun, 1999); extended pedigrees (Martin et al., 2000); quantitative traits (Allison, 1997; Ding et al., 2004)

Comparison of PDT, QTDT and PTDT
Quantitative trait

                                   Power                  Type I error
# ped   # alleles   QTL effect     PDT    QTDT   PTDT     PDT    QTDT   PTDT
10      2           0.1            59     18     57       4      5      1
                    0.3            74     61     82       7      3      8
                    0.5            85     81     92

Hap-TDT (Haplotype-based TDT)
- A haplotype is more informative than a single marker; the original TDT and most of its extensions consider one marker at a time.
- Two categories of haplotype-based TDT:
  - Haplotype reconstruction first: Sethuraman (1997); Wilson (1997); Clayton and Jones (1999); Zhao et al. (2000); Zhang et al. (2003)
  - Implicit haplotype reconstruction: Dudbridge (2003)

Hap-TDT vs TDT (Zhang et al., 2003)
[Figure panels: qualitative trait; quantitative trait (h^2 = 0.06).]

Hap-TDT vs TDT: 5 SNPs, excluding the causal variant SNP

Hap-TDT: problem of multiple comparisons
Increase in the degrees of freedom: more markers lead to a large number of haplotypes, hence a large number of degrees of freedom, which limits the power of the test.

Acknowledgement: Prof. Qin Zhang (China Agricultural University); Prof. Henner Simianer (University of Goettingen)