Ronnie A. Sebro Haplotype reconstruction BMI 374 10/21/2004.

Presentation transcript:

Ronnie A. Sebro Haplotype reconstruction BMI 374 10/21/2004

Mendelian Laws of Inheritance
Law of Segregation
–Alleles separate when gametes are formed
Law of Independent Assortment
–Allele pairs separate independently during formation of gametes
Mendelian Inheritance
–Each offspring receives one allele from the male parent, and the other from the female parent

Complex Diseases
Polygenic or multifactorial diseases
Run in families, but do not show Mendelian (monogenic) inheritance
Complex interaction between disease susceptibility genes and environmental factors
Examples: asthma, schizophrenia

Finding disease genes
Two common methods employed
Pedigree analysis
–Linkage analysis
–Affected individuals inherit/share the same portion of the genome
Case-control analysis
–Association analysis
–Affected individuals have different allele frequencies (higher or lower) than controls

Definitions
Marker – a small segment of DNA with specific features
Types of markers
–SNPs: AATAA vs. AACAA
–Microsatellites (STRs): -CAGCAGCAG- vs. -CAGCAGCAGCAGCAG-
Locus – physical position of a marker on a chromosome
Homozygous – when both alleles at a locus are the same
Heterozygous – when the alleles at a locus are different

Definitions
Haplotype
–All alleles, one from each locus, that are on the same chromosome
Recombinant
–An individual who inherited a haplotype not identical to that inherited by his/her parent
Phase
–Information about which alleles are inherited from each parent

Example: Genotypes vs. Haplotypes

Enumerating Haplotypes
Consider an individual heterozygous at 3 loci
Several possible haplotypes
Haplotype space can be potentially huge
For n SNPs – 2^n haplotypes
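
The enumeration above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `haplotype_pairs` and the genotype encoding (one unordered allele pair per locus) are hypothetical and do not come from the slides.

```python
from itertools import product

def haplotype_pairs(genotype):
    """Enumerate the unordered haplotype pairs compatible with a genotype.

    genotype: list of (allele, allele) tuples, one per locus (phase unknown).
    """
    het = [i for i, (a, b) in enumerate(genotype) if a != b]
    # Keep the first heterozygous locus fixed so that each pair is listed
    # once (h1/h2 and h2/h1 describe the same individual).
    free = het[1:]
    pairs = []
    for flips in product([False, True], repeat=len(free)):
        flipped = {locus for locus, flip in zip(free, flips) if flip}
        h1 = tuple(b if i in flipped else a for i, (a, b) in enumerate(genotype))
        h2 = tuple(a if i in flipped else b for i, (a, b) in enumerate(genotype))
        pairs.append((h1, h2))
    return pairs

# Hypothetical individual heterozygous (alleles 1 and 2) at 3 loci.
pairs = haplotype_pairs([(1, 2), (1, 2), (1, 2)])
for h1, h2 in pairs:
    print(h1, h2)
```

For this individual the sketch lists 4 haplotype pairs built from 2^3 = 8 distinct haplotypes; with k heterozygous loci (k at least 1) the number of pairs is 2^(k-1), which is why the search space grows so quickly.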

Finding disease genes
Both tests (association-based tests and pedigree linkage analysis tests) tentatively converge
Both require finding a haplotype/allele that is in tight association (LD) with the disease or is inherited by all affected individuals
Putative disease locus thereby identified

Why Haplotype?
Single allele vs. haplotype
Advantages of using haplotypes
–Improved power!
Disadvantages of using haplotypes
–Haplotypes aren’t readily known

Problem
Data generated from sequencer in the following format (SNPs)
Genotypes are known
Haplotypes are unknown
Pedigree

Haplotyping
Haplotyping can be done at the molecular level – whole genome derived haplotypes (ref. Douglas et al., 2001)
Algorithms preferred because
–Lower cost of genotyping
–Fast and accurate algorithms

Current Haplotyping Algorithms
Algorithms used for unphased data
Clark Algorithm (Andy Clark, Penn State)
E-M Algorithm (Stephens et al.)
Bayesian Haplotype Inference (Jun Liu et al.)

Clark Algorithm
Enumerates haplotypes that exist with certainty in the sample (individuals heterozygous at 0 or 1 loci)
Resolves ambiguous genotypes using haplotypes in the known list
Solutions depend on the order in which the individuals with unresolved haplotype phase are entered
The algorithm does not assume HW equilibrium
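
A minimal sketch of this kind of parsimony procedure is given below, assuming genotypes are encoded as one unordered allele pair per locus. The function names (`resolve_certain`, `complement`, `clark`) and the toy data are hypothetical; the sketch follows the description on this slide rather than any published implementation.

```python
def resolve_certain(genotype):
    """Return the haplotype pair if phase is certain (0 or 1 heterozygous loci)."""
    n_het = sum(1 for a, b in genotype if a != b)
    if n_het > 1:
        return None
    return tuple(a for a, _ in genotype), tuple(b for _, b in genotype)

def complement(genotype, hap):
    """Return the haplotype that pairs with `hap` to give `genotype`, else None."""
    other = []
    for (a, b), h in zip(genotype, hap):
        if h == a:
            other.append(b)
        elif h == b:
            other.append(a)
        else:
            return None  # hap is incompatible with this genotype
    return tuple(other)

def clark(genotypes):
    """Resolve genotypes in input order using a growing list of known haplotypes."""
    known, phased = [], {}
    for idx, g in enumerate(genotypes):
        pair = resolve_certain(g)
        if pair is not None:
            phased[idx] = pair
            for h in pair:
                if h not in known:
                    known.append(h)
    progress = True
    while progress:
        progress = False
        for idx, g in enumerate(genotypes):
            if idx in phased:
                continue
            for h in known:
                other = complement(g, h)
                if other is not None:
                    phased[idx] = (h, other)
                    if other not in known:
                        known.append(other)  # newly inferred haplotype joins the list
                    progress = True
                    break
    return phased  # individuals that could not be resolved are absent

# Toy data: one unambiguous individual and two ambiguous ones.
toy = [
    [(0, 0), (0, 0), (0, 0)],
    [(0, 1), (0, 1), (0, 0)],
    [(0, 1), (1, 1), (0, 1)],
]
result = clark(toy)
```

Because each resolved individual can add a new haplotype to the known list, entering the individuals in a different order can change which haplotype is used to resolve later genotypes, which is the order dependence noted on the slide.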

E-M Algorithm
Population haplotype probabilities are estimated via maximum likelihood: finding the values of the haplotype probabilities that maximize the probability of the observed data
This is a missing data problem
Assumption of HW equilibrium
Software: EH (Xie and Ott, 1993) and EH+ (Zhao, Curtis and Sham)
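
The idea can be sketched as a small expectation-maximization loop. This is an illustration only, not the EH or EH+ implementation: the function names, the genotype encoding, and the toy sample are hypothetical. The unknown phase is the missing data, and Hardy-Weinberg equilibrium is assumed so that the probability of a haplotype pair is proportional to the product of its two haplotype frequencies.

```python
from collections import defaultdict
from itertools import product

def compatible_pairs(genotype):
    """Unordered haplotype pairs consistent with an unphased genotype."""
    het = [i for i, (a, b) in enumerate(genotype) if a != b]
    free = het[1:]  # fix the first heterozygous locus to avoid duplicates
    pairs = []
    for flips in product([False, True], repeat=len(free)):
        flipped = {locus for locus, flip in zip(free, flips) if flip}
        h1 = tuple(b if i in flipped else a for i, (a, b) in enumerate(genotype))
        h2 = tuple(a if i in flipped else b for i, (a, b) in enumerate(genotype))
        pairs.append((h1, h2))
    return pairs

def em_haplotype_freqs(genotypes, n_iter=50):
    expansions = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in expansions for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting values
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in expansions:
            # E-step: posterior weight of each phase under HWE.
            weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by number of chromosomes.
        n_chrom = 2 * len(genotypes)
        freq = {h: counts[h] / n_chrom for h in haps}
    return freq

# Toy sample at two biallelic loci: four 00/00, four 11/11, two double heterozygotes.
sample = [[(0, 0), (0, 0)]] * 4 + [[(1, 1), (1, 1)]] * 4 + [[(0, 1), (0, 1)]] * 2
freq = em_haplotype_freqs(sample)
```

In this toy sample the double heterozygotes are ambiguous between 00/11 and 01/10; the homozygotes make 00 and 11 common, so the estimates for 01 and 10 shrink toward zero over the iterations.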

Bayesian Algorithm
A Dirichlet prior distribution is used for the haplotype frequencies
Uses a Gibbs sampler: enables handling of many SNP loci
Implemented in the program HAPLOTYPER
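
As a rough illustration of the general idea, and not of the HAPLOTYPER program itself, a two-block Gibbs sampler can alternate between drawing haplotype frequencies from a Dirichlet posterior and drawing each individual's phase given those frequencies. All names, the symmetric prior value, and the toy sample below are assumptions made for the sketch.

```python
import random
from itertools import product

def compatible_pairs(genotype):
    """Unordered haplotype pairs consistent with an unphased genotype."""
    het = [i for i, (a, b) in enumerate(genotype) if a != b]
    free = het[1:]
    pairs = []
    for flips in product([False, True], repeat=len(free)):
        flipped = {locus for locus, flip in zip(free, flips) if flip}
        h1 = tuple(b if i in flipped else a for i, (a, b) in enumerate(genotype))
        h2 = tuple(a if i in flipped else b for i, (a, b) in enumerate(genotype))
        pairs.append((h1, h2))
    return pairs

def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution via normalized gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def gibbs_haplotypes(genotypes, n_sweeps=200, prior=1.0, seed=0):
    rng = random.Random(seed)
    expansions = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in expansions for pair in pairs for h in pair})
    state = [rng.choice(pairs) for pairs in expansions]  # random initial phases
    mean = {h: 0.0 for h in haps}
    for _ in range(n_sweeps):
        # Block 1: frequencies given current phases (Dirichlet posterior).
        counts = {h: 0 for h in haps}
        for h1, h2 in state:
            counts[h1] += 1
            counts[h2] += 1
        theta = dict(zip(haps, sample_dirichlet([prior + counts[h] for h in haps], rng)))
        # Block 2: phases given frequencies (probability proportional to product).
        state = [
            rng.choices(pairs, weights=[theta[a] * theta[b] for a, b in pairs])[0]
            for pairs in expansions
        ]
        for h in haps:
            mean[h] += theta[h] / n_sweeps
    return mean  # average sampled frequency of each haplotype

sample = [[(0, 0), (0, 0)]] * 4 + [[(1, 1), (1, 1)]] * 4 + [[(0, 1), (0, 1)]] * 2
posterior_mean = gibbs_haplotypes(sample)
```

The sketch enumerates every compatible pair per individual, which is only feasible for a handful of loci; no burn-in is discarded, so the averages are purely illustrative.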

Errata in data
Genotyping errors
–(quite common, esp. with SNPs)
Missing data
–MCAR
–MAR
–Non-ignorable missingness
Marker order errors

Overview
Discuss paper dealing with estimation of haplotypes in pedigrees (i.e. some information about phase)
Minimum-Recombinant Haplotyping in Pedigrees (Qian & Beckmann)
Useful for the HapMap project!
Useful also for association analyses with the Transmission Disequilibrium Test (TDT)

Paper 1
Minimum-Recombinant Haplotyping in Pedigrees
–Notation
–Methods (Algorithm)
–Results
–Shortcomings of algorithm

Recombination Principles
Minimum-Recombination Principle
–In nature, recombination is a rare event
–The most probable haplotypes are those that minimize the total number of recombinations needed in the pedigree
Double-Recombinants
–Naturally these are even rarer events, especially over such short intervals (10 cM)
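
To make the counting concrete, the sketch below computes the minimum number of crossovers a parent with two known haplotypes needs in order to transmit a given haplotype. It is a small dynamic program written for illustration; the function name and the example haplotypes are hypothetical and it is not the rule-based algorithm of the paper.

```python
def min_recombinations(parent_hap1, parent_hap2, transmitted):
    """Minimum crossovers needed for the parent to transmit `transmitted`.

    At each locus the transmitted allele must come from one of the parent's
    two haplotypes; switching source between adjacent loci costs one
    recombination. Homozygous loci can be read from either haplotype.
    """
    inf = float("inf")
    cost = [0, 0]  # best cost so far when currently copying hap1 / hap2
    for a, b, t in zip(parent_hap1, parent_hap2, transmitted):
        allowed = [t == a, t == b]
        if not any(allowed):
            raise ValueError("transmitted allele not carried by the parent")
        cost = [
            min(cost[src], cost[1 - src] + 1) if allowed[src] else inf
            for src in (0, 1)
        ]
    return min(cost)

# Hypothetical parent with haplotypes 1-1-1-1 and 2-2-2-2.
one_crossover = min_recombinations((1, 1, 1, 1), (2, 2, 2, 2), (1, 1, 2, 2))
three_crossovers = min_recombinations((1, 1, 1, 1), (2, 2, 2, 2), (1, 2, 1, 2))
```

Under the minimum-recombination principle, a haplotype assignment that explains the child with one crossover is preferred over one that needs three.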

Notation
Consider a pedigree of J family members and a set of L linked marker loci
Individual – any family member
Parent – a family member with at least 1 child
Founder – a parent without his/her parents
Offspring – a family member with at least one parent

Notation
Define an individual as “genotyped” at locus l iff:
–The genotype at locus l is known (from DNA)
–The genotype data can be determined from first-degree relatives
Ungenotyped parent (other parent genotyped)
–Informative if both haplotypes transmitted
–Partially informative if only one haplotype transmitted
Genotyped offspring
–Informative if at least one genotyped parent

Notation
Parental source (PS) – allele that is maternally or paternally derived
Grandparental source (GS) – the parental source of each parental allele

Notation
For a nuclear family:
denote the alleles of parent 1
denote the alleles of parent 2
denote the alleles of offspring j
denote the paternal and maternal alleles of parent 1
denote the paternal and maternal alleles of parent 2
denote the paternal and maternal alleles of offspring j
denote the GS of paternal and maternal alleles of individual j
denote the minimum and maximum allele values, respectively

Notation
(ab) denotes PS-unknown genotype with alleles a and b
denotes PS-known haplotype with paternal allele A and maternal allele B
(ab) = (cd) denotes that genotypes (ab) and (cd) are equal
(ab) ≠ (cd) denotes that genotypes (ab) and (cd) are not equal
denotes that allele c is a constituent allele of genotype (ab)
denotes that allele c is not a constituent allele of genotype (ab)

Flexible Locus Type 1
If all members of a trio are heterozygotes, and at least 1 parent and the offspring are not haplotyped

Flexible Locus Type 2
If two alternative haplotype assignments at locus l in a founder result in an equal number of recombinant offspring

Flexible Locus Type 3
If two alternative haplotype assignments at locus l in offspring result in an equal number of recombinants

Rules
Divide pedigree into nuclear trios
Apply rules to each trio until all individuals are haplotyped, or no further inference is possible
Rule 1: Impute a missing genotype at each unambiguous locus in each parent, conditional on spouse and child genotypes
Rule 2: Assign a haplotype at each unambiguous locus in each offspring, conditional on parental genotypes
Rule 3: Assign haplotypes at each unambiguous locus in each founder, conditional on haplotypes in offspring and the criterion of minimum recombinants in each nuclear family

Rules
Rule 4: Assign haplotypes at each unambiguous locus in each offspring, conditional on haplotypes in parents and the criterion of minimum recombinants in each trio
Rule 5: Impute a missing genotype at each unambiguous locus in each parent, conditional on haplotypes in offspring and the criterion of minimum recombinants in each nuclear family
Rule 6: Locate a locus with at least one individual in a nuclear family that is flexible at this locus, enumerate the haplotype configuration into multiple configurations, retaining all configurations with the minimum recombinants

Implementation Raw genotype data

Implementation Rule 1: –Impute missing genotype at each unambiguous locus in each parent, conditional on genotypes in spouse and offspring

Implementation Rule 2: –Assign a haplotype at each unambiguous locus in each offspring, conditional on genotypes in parents in each parent-offspring trio
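
A hedged sketch of what Rule 2 amounts to at a single locus: given the two parental genotypes, the child's alleles are assigned a parental source only when exactly one assignment is consistent with Mendelian inheritance. The function name and the encoding (alleles as integers, genotypes as pairs) are hypothetical, and the paper's own rule may handle more cases than this.

```python
def assign_parental_source(father, mother, child):
    """Return (paternal_allele, maternal_allele) for the child at one locus.

    Returns None when both assignments are possible (ambiguous locus) and
    raises ValueError when no assignment is Mendelian-consistent.
    """
    x, y = child
    candidates = set()
    for paternal, maternal in ((x, y), (y, x)):
        if paternal in father and maternal in mother:
            candidates.add((paternal, maternal))
    if not candidates:
        raise ValueError("genotypes are not Mendelian-consistent")
    if len(candidates) == 1:
        return candidates.pop()
    return None  # ambiguous: cannot be phased from genotypes alone

# Father 1/1, mother 1/2, child 1/2: allele 2 must be maternal.
unambiguous = assign_parental_source((1, 1), (1, 2), (1, 2))
# All three heterozygous 1/2: phase cannot be determined at this locus.
ambiguous = assign_parental_source((1, 2), (1, 2), (1, 2))
```

The second call corresponds to the situation where every member of the trio is heterozygous, so genotype information alone cannot phase the child at that locus.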

Implementation Rule 3: Assign haplotypes at each unambiguous locus in each founder, conditional on haplotypes in offspring and the criterion of minimum recombinants in each nuclear family

Implementation Rule 4: Assign haplotypes at each unambiguous locus in each offspring, conditional on haplotypes in parents and the criterion of minimum recombinants in each trio

Implementation Rule 5: Impute a missing genotype at each unambiguous locus in each parent, conditional on the haplotypes in offspring and the criterion of minimum recombinants in each nuclear family

Implementation Second application of rules 2 and 3

Implementation
Rule 6: Locate a locus with at least one individual in a nuclear family that is flexible at this locus, enumerate the haplotype configuration into multiple configurations with alternative haplotype assignments at each flexible locus in these individuals. Retain all configurations with the minimum recombinants
Reapplication of rule 3

Results
A pedigree with episodic ataxia
–29 total individuals
–Genotyped at 9 polymorphic markers
–2 individuals not genotyped
Simulation study
Looped marriage structure in a pedigree with ataxia telangiectasia

Results
High degree of concordance with the maximum-likelihood method
Identical haplotype configuration obtained with GENEHUNTER (ML based) in >99% of pedigrees analyzed.

Simulation Results
Data set (FSize,Loci,Rec,T)   Both   Rule-Based alone   Fewer recombinants
(15,10,4,STR)                 90     2                  8
(15,25,4,STR)                 94     0                  6
(15,50,4,STR)                 91     7                  2
(29,10,4,STR)                 86     113
(29,25,4,STR)                 91     1                  8
(29,50,4,STR)                 89     0                  11
(15,10,4,SNP)                 82     9                  9
(29,10,4,SNP)                 76     8                  16
(15,10,0,SNP)                 100    0                  0
(29,10,0,SNP)                 97     3                  0
(17,10,0,Loop)                98     2                  0

Genotype Errors
Impact of genotype errors investigated
Generated genotype data on 1000 pedigrees, each pedigree containing one incorrect allele in a random individual at a random marker
Mean number of recombinants increased from 5 to 6.2 (1.2)
44% of these additional recombinants were double recombinants
All four correct MRHCs were reconstructed in 84% of pedigrees

Marker errors
The consequence of incorrect marker order on imputing haplotypes was investigated
Marker loci 2-7 (of the 9 loci involved for the EA study) were permuted (6! − 1 = 719 ways)
Of the 719 orderings
–None produced MRHCs with fewer than 5 recombinants
–Only 5% had the same number of recombinants as the correct ordering
–The chance of recovering all four MRHCs was 20% and 0% when 2 and 6 marker loci, respectively, were incorrectly ordered
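
The count of 719 incorrect orderings can be checked directly, and the orderings can be grouped by how many loci are out of place. The short sketch below does only this bookkeeping; variable names are illustrative, and it does not reproduce the haplotyping runs themselves.

```python
from collections import Counter
from itertools import permutations

correct = (2, 3, 4, 5, 6, 7)  # marker loci 2-7 in their correct order

# Every permutation of the six loci except the correct one: 6! - 1 orderings.
wrong_orderings = [p for p in permutations(correct) if p != correct]

def n_misplaced(order):
    """Number of positions whose locus differs from the correct order."""
    return sum(1 for o, c in zip(order, correct) if o != c)

by_misplaced = Counter(n_misplaced(p) for p in wrong_orderings)
print(len(wrong_orderings))          # total incorrect orderings
print(sorted(by_misplaced.items()))  # orderings grouped by misplaced loci
```

Of the 719 orderings, 15 swap exactly two loci and 265 misplace all six, which are the two extremes quoted on the slide.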

Conclusions
Both genotype errors and incorrect marker order can produce additional recombinants in reconstructing haplotypes
Sensitivity analyses suggest that incorrect marker orderings may have a larger impact than genotyping errors

Conclusions
This haplotyping method is applicable to both STR and SNP data
Total computational requirement due to enumeration in a pedigree with J family members and L loci is on the order O(J^2 L^3)
Computational requirements for SNP data are 3-10 times larger than for STRs (more flexible loci)

Shortcomings
A genotyped individual with neither genotyped parents nor genotyped offspring cannot be analyzed by this algorithm
The same problem arises even if multiple siblings and other relatives are genotyped
Likelihood-based methods are able to assign haplotypes to individuals who are uninformative under this rule-based method