High resolution detection of IBD Sharon R Browning and Brian L Browning Supported by the Marsden Fund.

Aim
- Detect short segments of identity by descent (IBD) in “unrelated” individuals or distant relatives
  - < 1 cM (or < 1 Mb)
  - Need dense SNP data
  - Account for linkage disequilibrium (LD)
- Various applications
  - IBD mapping in humans: midway between linkage mapping and association mapping.
  - Could be useful for QTL mapping in cows and sheep?

What is IBD?
- In a pedigree, IBD is defined in terms of pedigree founders:
  - Two haplotypes are IBD if they are copies of the same founder haplotype.
  - IBD regions are typically large (10+ cM) for small pedigrees.
[Figure: pedigree with a founder (grandmother) and half-cousins, who may share IBD through the grandmother]

IBD without a pedigree is nebulous
- Assuming no recurrent mutation, identical alleles are IBD
  - this definition leads to ordinary association tests
- Useful IBD for improvements in mapping
  - Extends beyond background LD
  - Due to non-ancient ancestry

What level of resolution is needed?
- Very long IBD stretches (5+ Mb)
  - are easy to detect
  - but are too rare.
- For IBD mapping
  - Expected size of IBD regions depends on when the mutation(s) entered the population.
  - Small IBD regions give better localization.

IBD Model Part I
- Uses Beagle model previously applied to
  - haplotype phase inference
  - imputation
  - multilocus association testing.
- No need to prune SNPs → greater power to detect short segments.
- Beagle LD model is computationally efficient.

Beagle model
- At each marker location, haplotypes are clustered.
- Number of clusters can vary, depending on LD structure.
- Approx. 100 clusters in a data set with 2000 individuals.
- The model is constructed to be Markov (in the haplotype clusters).

IBD Model Part II
- Markov model for IBD with two states
  - 0 or 1 pair of haplotypes shared IBD between a pair of individuals.
  - Need to check for homozygosity within individuals first.
- Transition probabilities specified by the user based on population history.
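
A minimal sketch of how a two-state IBD transition matrix between adjacent markers might be built. The slide only says the transition probabilities are user-specified; the continuous-time parameterization below, and the names `ibd_transition_matrix`, `rate_into_ibd` and `rate_out_of_ibd`, are assumptions made for illustration, not the authors' implementation.

```python
import math


def ibd_transition_matrix(distance_cm, rate_into_ibd, rate_out_of_ibd):
    """Return a 2x2 matrix P where P[i][j] = P(state j at next marker | state i).

    State 0 = not IBD, state 1 = IBD. The rates are hypothetical
    user-chosen values (per cM) for entering and leaving IBD.
    """
    total = rate_into_ibd + rate_out_of_ibd
    stationary_ibd = rate_into_ibd / total        # long-run share of genome in IBD
    decay = math.exp(-total * distance_cm)        # memory remaining after this distance
    p_enter = stationary_ibd * (1.0 - decay)      # non-IBD -> IBD
    p_leave = (1.0 - stationary_ibd) * (1.0 - decay)  # IBD -> non-IBD
    return [[1.0 - p_enter, p_enter],
            [p_leave, 1.0 - p_leave]]
```

With this form, markers at zero distance never change state, and widely separated markers are effectively independent draws from the long-run IBD proportion.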

IBD Model Part III
- Allow for some genotyping error
  - Computationally prohibitive to sum over all possible miscalled genotypes.
  - Instead allow for IBD when there is no IBS, with a penalty.
  - P(haplotypes | IBD) multiplied by error rate if haplotypes are not IBS at the position.
  - Used error rate = 0.01 or (depending on data quality).
  - Doesn’t correct for the haplotype errors caused by genotype error.
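
The penalty rule above can be sketched as a one-line adjustment to the IBD-state likelihood. The function name `ibd_likelihood` and its arguments are hypothetical; only the rule itself (multiply by the error rate when the haplotypes are not IBS) and the 0.01 value come from the slide.

```python
def ibd_likelihood(prob_haplotypes_given_ibd, is_ibs, error_rate=0.01):
    """Likelihood of the observed haplotypes at one marker under the IBD state.

    If the two haplotypes are not identical by state (IBS) at this marker,
    IBD is still allowed but the likelihood is multiplied by the error rate.
    """
    if is_ibs:
        return prob_haplotypes_given_ibd
    return prob_haplotypes_given_ibd * error_rate
```

A single mismatching marker therefore lowers the IBD likelihood by a factor of 100 at the default rate instead of ruling IBD out entirely.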

Estimation
- Build LD model using 10 iterations of stochastic EM.
- Simultaneous phasing and IBD detection.
  - Don’t have to worry about getting haplotypes wrong.
- Calculate IBD probabilities using the forward-backward algorithm for this model.
- Repeat with 3 restarts of LD model building, then average the IBD probabilities.
  - Model can get caught in a local maximum, leading to false positive IBD.
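
A simplified sketch of the last two steps: posterior state probabilities from the forward-backward algorithm, then averaging over restarts. It treats the model as a plain hidden Markov model with per-marker likelihoods already computed, which is an assumption; the actual model also sums over haplotype clusters and phase. The function names are illustrative.

```python
def forward_backward(initial, transitions, emissions):
    """Posterior state probabilities at each marker of a discrete HMM.

    initial:     list of K prior state probabilities at the first marker
    transitions: list of T-1 KxK matrices, transitions[t][i][j] = P(j at t+1 | i at t)
    emissions:   list of T lists, emissions[t][s] = P(data at marker t | state s)
    """
    n, k = len(emissions), len(initial)
    # Forward pass, rescaled at each marker to avoid underflow.
    fwd = []
    for t in range(n):
        if t == 0:
            cur = [initial[s] * emissions[0][s] for s in range(k)]
        else:
            cur = [emissions[t][j] * sum(fwd[t - 1][i] * transitions[t - 1][i][j]
                                         for i in range(k)) for j in range(k)]
        scale = sum(cur)
        fwd.append([c / scale for c in cur])
    # Backward pass, also rescaled.
    bwd = [[1.0] * k for _ in range(n)]
    for t in range(n - 2, -1, -1):
        cur = [sum(transitions[t][i][j] * emissions[t + 1][j] * bwd[t + 1][j]
                   for j in range(k)) for i in range(k)]
        scale = sum(cur)
        bwd[t] = [c / scale for c in cur]
    # Combine and normalize to get posteriors.
    posteriors = []
    for t in range(n):
        unnorm = [fwd[t][s] * bwd[t][s] for s in range(k)]
        z = sum(unnorm)
        posteriors.append([u / z for u in unnorm])
    return posteriors


def average_over_restarts(ibd_probs_per_restart):
    """Average per-marker IBD probabilities across independent restarts."""
    return [sum(vals) / len(vals) for vals in zip(*ibd_probs_per_restart)]
```

Averaging damps a spuriously high IBD probability that appears in only one restart, which is the motivation the slide gives for repeating the model building.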

Threshold for IBD
- We use a threshold of 0.99 on posterior IBD probability.
- Define length of IBD region as distance over which IBD probability > 0.5
  - but IBD probability must be ≥ 0.99 somewhere in the region.
[Figure: IBD probability along the chromosome, with the IBD region marked]
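
The two-threshold rule can be sketched as a single pass over the per-marker probabilities. The function name `call_ibd_regions` is hypothetical, and measuring a region from its first to its last marker above 0.5 is an assumption about how the boundaries are placed.

```python
def call_ibd_regions(positions_cm, ibd_probs, extend=0.5, call=0.99):
    """Return (start, end) positions of called IBD regions.

    A region is a maximal run of markers with IBD probability > extend;
    it is reported only if some marker in the run reaches probability >= call.
    """
    regions = []
    start = end = None
    peak = 0.0
    for pos, prob in zip(positions_cm, ibd_probs):
        if prob > extend:
            if start is None:
                start, peak = pos, prob     # open a new candidate region
            else:
                peak = max(peak, prob)
            end = pos
        else:
            if start is not None and peak >= call:
                regions.append((start, end))
            start = None                    # close the candidate region
    if start is not None and peak >= call:  # region running to the last marker
        regions.append((start, end))
    return regions


positions = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
probs = [0.1, 0.6, 0.995, 0.7, 0.2, 0.8, 0.9]
print(call_ibd_regions(positions, probs))  # [(0.1, 0.3)]
```

In this example the second run of markers (0.5 to 0.6 cM) stays above 0.5 but never reaches 0.99, so it is not reported.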

Data
- 1958 British Birth Cohort (1958BC)
  - Genotyped on Illumina 550K platform (Sanger) and Affymetrix 500K (WTCCC).
  - Genotypes re-called by Beagle (using LD) to improve accuracy.
  - 1400 individuals.

Detection of IBD – 1958BC
- Chromosome 22, non-monomorphic markers
  - Illumina: 8407 SNPs
  - Affymetrix: 5098 SNPs
- In 40,000 random pairs found
  - Illumina: 54 IBD regions (lengths 0.52 – 12.5 cM)
  - Affymetrix: 19 IBD regions (lengths 2.1 – 12.1 cM)
  - 58 regions total
- For the 4 regions found by Affymetrix but not by Illumina, Illumina had IBD probability ≥ 0.92
  - Various regions shown on next 3 slides.

0.5 cM region
[Figure legend: Illumina = solid black line; Affymetrix = dashed blue line]

Conclusions
- New, very dense genotype data provide a new opportunity to detect small IBD regions.
- Detection of short IBD regions will play an important role in various genetic analyses.
- Computation is challenging
  - Need a pre-filter?