Haplotype inference and haplotype-based transmission disequilibrium test (Hap-TDT). Information Technology Branch of the Chinese Association of Animal Science and Veterinary Medicine, 2009 Annual Meeting, Harbin.


1 Haplotype inference and haplotype-based transmission disequilibrium test (Hap-TDT)
Information Technology Branch of the Chinese Association of Animal Science and Veterinary Medicine, 2009 Annual Meeting, Harbin
Xiangdong Ding, College of Animal Science and Technology, China Agricultural University
Hello, everyone. Haplotype reconstruction is becoming more and more common, because haplotypes carry more information than single loci; they are useful in QTL fine mapping, evolutionary studies, and so on. Daughter and granddaughter designs are two very powerful designs for QTL mapping in cattle. Here, I will show you an algorithm for reconstructing haplotypes in daughter and granddaughter designs.

2 Outline
Haplotype inference
- Introduction of important methods: Parsimony (Clark, 1990); EM (Excoffier and Slatkin, 1995); Bayesian (Stephens and Donnelly, 2001)
- Haplotype inference using family information
Transmission Disequilibrium Test (TDT)
- Transmission Disequilibrium Test (TDT)
- Haplotype-based TDT (Hap-TDT)

3 Genotype and Haplotype
Haplotype: a collection of alleles from the same chromosome
- Allele & genotype: chromosome phase is unknown
- Haplotype & diplotype: chromosome phase is known
Haplotype reconstruction goes from phase-unknown genotypes to phase-known haplotypes.
[Figure: example alleles at Loc1 to Loc7, shown as unphased genotypes and as phased haplotypes]

4 Algorithms for haplotype inference
Statistical methods
- Parsimony (Clark, 1990)
- EM: Excoffier and Slatkin (1995); Qin et al. (2002)
- Bayesian: Stephens and Donnelly (2001); Niu et al. (2002)
Rule-based methods
- Minimum recombination principle: Qian and Beckmann (2002); Baruch et al. (2006)
So far, there are two categories of algorithms for haplotype inference: statistical methods and rule-based methods. Statistical methods are usually very time-consuming. Parsimony, maximum likelihood via the EM algorithm, and Bayesian methods are the three popular statistical approaches. Rule-based methods, on the other hand, infer haplotypes by relying on reasonable biological assumptions, such as the minimum recombination principle. Generally, rule-based approaches are very fast compared with statistical methods.

5 Parsimony (Clark, 1990)
Example genotypes: A1A1, A1A2, A2A3, A3A4, A3A5, A6A7
1. Start from a homozygote, whose haplotype is known unambiguously.
2. Resolve any other ambiguous sequence using the definitive haplotypes from step 1.
3. Continue this procedure until all haplotypes are resolved or until no more new haplotypes can be found.
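The parsimony procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Clark's original program: genotypes are assumed to be biallelic and coded per locus as 0 or 1 (homozygous) or 2 (heterozygous), and the function names `complement` and `clark` are made up for this example.

```python
def complement(genotype, hap):
    """Return the haplotype that pairs with `hap` to give `genotype`, or None."""
    other = []
    for g, h in zip(genotype, hap):
        if g == 2:        # heterozygous site: the partner carries the other allele
            other.append(1 - h)
        elif g == h:      # homozygous site: both haplotypes carry allele g
            other.append(g)
        else:             # `hap` cannot be part of this genotype
            return None
    return tuple(other)


def clark(genotypes):
    """Greedy parsimony phasing. Returns (phased, known haplotypes)."""
    known = []    # resolved haplotypes, in order of discovery
    phased = {}   # individual index -> (haplotype, haplotype)

    # Step 1: individuals with at most one heterozygous site are unambiguous.
    for i, g in enumerate(genotypes):
        if g.count(2) <= 1:
            h1 = tuple(0 if x == 2 else x for x in g)
            h2 = complement(g, h1)
            phased[i] = (h1, h2)
            for h in (h1, h2):
                if h not in known:
                    known.append(h)

    # Steps 2 and 3: resolve ambiguous genotypes with known haplotypes
    # until a full pass over the data resolves nothing new.
    progress = True
    while progress:
        progress = False
        for i, g in enumerate(genotypes):
            if i in phased:
                continue
            for h in known:
                partner = complement(g, h)
                if partner is not None:
                    phased[i] = (h, partner)
                    if partner not in known:
                        known.append(partner)
                    progress = True
                    break
    return phased, known


phased, known = clark([(0, 0, 0), (2, 2, 0), (1, 2, 2)])
```

Genotypes that no known haplotype can explain simply stay unresolved, like A6A7 in the slide's example. The result also depends on the order in which individuals are visited, which is a known property of this greedy approach.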

6 EM algorithm
A numerical method for finding maximum likelihood estimates of parameters given incomplete data.
- Initial parameter values: haplotype frequencies
- Expectation step: compute the expected values of the missing data (the haplotype phases) given the observed data and the current parameter values
- Maximization step: compute the MLE of the parameters from the completed data
- Repeat with the updated parameters until changes in the parameter estimates are negligible.
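The EM steps above can be sketched for unrelated individuals as follows. This is a minimal illustration, not the implementation of any of the cited programs. It assumes biallelic loci coded 0 or 1 (homozygous) or 2 (heterozygous), Hardy-Weinberg equilibrium, and a uniform starting point; the names `compatible_pairs` and `em_haplotype_frequencies` are invented for the example.

```python
from itertools import product


def compatible_pairs(genotype):
    """All unordered haplotype pairs consistent with a 0/1/2-coded genotype."""
    het = [i for i, g in enumerate(genotype) if g == 2]
    base = [0 if g == 2 else g for g in genotype]
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for site, b in zip(het, bits):
            h1[site], h2[site] = b, 1 - b
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, tol=1e-8, max_iter=1000):
    pair_lists = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in pair_lists for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}   # initial parameter values

    for _ in range(max_iter):
        counts = dict.fromkeys(haps, 0.0)
        for pairs in pair_lists:
            # E-step: posterior probability of each compatible pair under HWE
            # (factor 2 for pairs made of two different haplotypes).
            weights = [(1 if a == b else 2) * freq[a] * freq[b] for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes.
        new_freq = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
        change = max(abs(new_freq[h] - freq[h]) for h in haps)
        freq = new_freq
        if change < tol:   # stop once the estimates no longer move
            break
    return freq


freq = em_haplotype_frequencies([(2, 2), (0, 0), (0, 0), (1, 1), (1, 1)])
```

In this toy data set the double heterozygote is phased almost entirely as 00/11, because those two haplotypes are also carried by the homozygous individuals. Enumerating every compatible pair grows exponentially with the number of heterozygous loci, which is why the approach becomes expensive for many loci.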

7 EM algorithm efficiency
Heavy computational burden with a large number of loci
- Partition-ligation algorithm (Niu et al., 2002)
- PL-EM (Qin et al., 2002)
Accuracy and departures from HWE
- Most EM-based methods assume HWE
- Robust to departures from HWE (Fallin and Schork, 2000)

8 Bayesian method
PHASE (Stephens and Donnelly, 2001)
- Based on the coalescent model
- Uses Gibbs sampling
- So far very accurate, but also complicated
fastPHASE (Scheet and Stephens, 2006)
- Revised version of PHASE
- Much faster; handles missing genotypes

9 Comparison of parsimony, EM and PHASE
- PHASE performs better than parsimony and EM (Stephens and Donnelly, 2001)
- PHASE and EM-based methods exhibited similar performance (Zhang et al., 2001; Xu et al., 2002)

10 Haplotype inference using family data (1)
Haplotype inference based on close relatives
- Reduces haplotype ambiguity and improves efficiency
Rohde and Fuerst (2001): EM algorithm
- Families with both parents and their children
- The genotyped offspring reduce the number of potential haplotype pairs for both parents.
Ding et al. (2006): EM algorithm
- Families with only one parent available

11 Haplotype inference using family data (2)
Ding et al.: EM algorithm
- Families with only sibs: genotypes of sibs → parental information → haplotype pairs of sibs. The sibs are considered jointly as an independent pair: their parental information is inferred first, and the haplotype pairs of the sibs are then updated given the posterior parental information.
- Mixed family data: complete families together with incomplete families (one parent, or only sibs)

12 Comparison of four different strategies
Columns: Method name; Using family information?; Using LD?; Handling incomplete families?
- Complete-family-EM (Rohde and Fuerst, 2001): YES NO
- Incomplete-family-EM (Ding et al., 2006)
- GENEHUNTER (Kruglyak et al., 1996)
- PHASE (Stephens et al., 2003)
(Ding et al., 2006)
[Table: remaining cell values not recovered]

13 Result for incomplete families
[Figure: comparison of Incomplete-family-EM, PHASE and GENEHUNTER]
Running time of PHASE: 3.5 hours for the whole 100 datasets of 30 trios with 187 SNPs (Marchini et al., 2006). Running time becomes prohibitive for large numbers of SNPs.

14 Rule-based method
Minimum recombination principle: Qian and Beckmann (2002); Baruch et al. (2006)
- Genetic recombination is rare
- Haplotypes with fewer recombinants should be preferred in a haplotype reconstruction
[Figure: Rule 1 and Rule 2 (Wang et al., 2006)]

15 Joint EM and rule-based algorithm for (grand-)daughter design
Assumption of no recombination
- EM algorithm to construct diplotypes
Taking recombination into account
- Minimum recombination principle
- Find the sire diplotype that minimizes the number of recombinations in the sire family

16 Example
A sire family with 50 offspring; phase-unknown genotype of sire: (12;12;12;12;12)
[Table: possible diplotypes of the sire and the number of recombination events for each]
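The minimum recombination step can be illustrated with a small Python sketch. It is not the HSHAP implementation: it assumes, for simplicity, that the paternally inherited haplotype of every offspring is already known, and the function names are hypothetical. For a sire that is heterozygous at all five loci there are 2^4 = 16 candidate diplotypes, and the one requiring the fewest recombinations across the family is kept.

```python
from itertools import product


def min_recombinations(sire_h1, sire_h2, paternal_hap):
    """Fewest switches between the sire's two haplotypes that produce
    `paternal_hap` (infinity if some allele matches neither haplotype)."""
    inf = float("inf")
    cost = [0, 0]   # cost[k]: best cost so far when copying from haplotype k
    for a1, a2, x in zip(sire_h1, sire_h2, paternal_hap):
        cost = [
            min(cost[0], cost[1] + 1) if a1 == x else inf,
            min(cost[1], cost[0] + 1) if a2 == x else inf,
        ]
    return min(cost)


def candidate_diplotypes(sire_genotype):
    """Yield every distinct phasing of per-locus allele pairs (a, b).
    The first heterozygous locus is fixed to avoid mirrored duplicates."""
    options = []
    seen_het = False
    for a, b in sire_genotype:
        if a == b or not seen_het:
            options.append([(a, b)])
            seen_het = seen_het or a != b
        else:
            options.append([(a, b), (b, a)])
    for choice in product(*options):
        yield tuple(c[0] for c in choice), tuple(c[1] for c in choice)


def best_sire_diplotype(sire_genotype, paternal_haps):
    """Return (total recombinations, diplotype) with the smallest total."""
    best = None
    for h1, h2 in candidate_diplotypes(sire_genotype):
        total = sum(min_recombinations(h1, h2, hap) for hap in paternal_haps)
        if best is None or total < best[0]:
            best = (total, (h1, h2))
    return best


sire = [(1, 2)] * 5   # genotype (12;12;12;12;12)
offspring = [(1, 1, 2, 2, 1)] * 3 + [(2, 2, 1, 1, 2)] * 3 + [(1, 1, 2, 1, 2)]
total, diplotype = best_sire_diplotype(sire, offspring)
```

With these made-up offspring haplotypes the chosen diplotype is 11221/22112 with a single recombination event, contributed by the last offspring.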

17 Results
10 sire families with 30 offspring each; 10 SNPs, 0.5 centiMorgan
                          HSHAP    PedPhase  SimWalk2
Error rate in sires       0.007c   0.682a    0.126b
Error rate in offspring   0.021c   0.0330b   0.0705a
Running time (m)          8.211    0.006     60.17

18 Results - incomplete data
10 sire families with 20 offspring each; 5 SNPs, 0.5 centiMorgan; 30% of individuals with one random missing locus
                          HSHAP    PedPhase  SimWalk2
Error rate in sires       0.0310c  0.8910a   0.0930b
Error rate in offspring   0.0159c  0.7607a   0.2164b
Running time (m)          0.01     0.002     10.65

19 Linkage vs Association
Linkage
- Family-based
- Matching/ethnicity generally unimportant
- Few markers for genome coverage (STRs)
- Can be weak design
- Good for initial detection; poor for fine-mapping
- Powerful for rare variants
Association
- Families or unrelateds
- Matching/ethnicity crucial
- Many markers required for genome coverage (10^5 – 10^6 SNPs)
- Powerful design
- Poor for initial detection; good for fine-mapping
- Powerful for common variants; rare variants generally impossible

20 TDT (Transmission Disequilibrium Test)
Compares the distribution of transmitted and non-transmitted alleles from the parents of affected offspring (Spielman et al., 1993)
                      Non-transmitted allele
Transmitted allele    M1     M2     Total
M1                    a      b      a+b
M2                    c      d      c+d
Total                 a+c    b+d    2n
If the marker is unlinked to the causative locus, we expect b = c; otherwise, one of the alleles will tend to be transmitted more often.
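As a worked illustration of the b = c comparison, the TDT statistic of Spielman et al. is the McNemar-type quantity (b - c)^2 / (b + c), referred to a chi-square distribution with 1 degree of freedom. The short Python sketch below computes it; the function name `tdt` and the example counts are made up for illustration.

```python
import math


def tdt(b, c):
    """TDT statistic and p-value from the two off-diagonal counts.

    b: heterozygous parents transmitting M1 and not M2
    c: heterozygous parents transmitting M2 and not M1
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = tdt(30, 10)   # chi2 = (30 - 10)^2 / 40 = 10.0
```

Only heterozygous parents contribute to b and c, so the diagonal cells a and d do not enter the statistic.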

21 TDT (Transmission Disequilibrium Test)
- Good for fine-mapping, poor for initial detection
- Robust to population stratification/admixture
- Widely used in genome-wide association studies
Extended to all kinds of data structures
- Multi-allelic markers (Sham and Curtis, 1995)
- Multiple siblings (Spielman, 1998; Boehnke, 1998)
- Missing parental data (Sun, 1999)
- Extended pedigrees (Martin et al., 2000)
- Quantitative traits (Allison, 1997; Ding et al., 2004)

22 Comparison of PDT, QTDT and PTDT
Quantitative trait
                                  Power               Type I error
# ped  # alleles  QTL effect  PDT  QTDT  PTDT     PDT  QTDT  PTDT
10     2          0.1         59   18    57       4    5     1
                  0.3         74   61    82       7    3     8
                  0.5         85   81    92

23 Hap-TDT (Haplotype-based TDT)
- A haplotype is more informative than a single marker
- The original TDT and most of its extensions consider one marker at a time
Two categories of haplotype-based TDT
- Haplotype reconstruction first: Sethuraman (1997); Wilson (1997); Clayton and Jones (1999); Zhao et al. (2000); Zhang et al. (2003)
- Implicit haplotype reconstruction: Dudbridge (2003)

24 Hap-TDT vs TDT (Zhang et al., 2003)
[Figures: qualitative trait; quantitative trait (h^2 = 0.06)]

25 Hap-TDT vs TDT
[Figure: 5 SNPs, excluding the causal variant SNP]

26 Hap-TDT: problem of multiple comparisons
Increase in the degrees of freedom
- More markers → a large number of haplotypes → a large number of degrees of freedom
- This limits the power of the test

27 Acknowledgement Prof. Qin Zhang (China Agricultural University) Prof. Henner Simianer (University of Goettingen)

