Estimating Recombination Rates. LRH selection test, and recombination Recall that LRH/EHH tests for selection by looking at frequencies of specific haplotypes.

Slides:



Advertisements
Similar presentations
A New Recombination Lower Bound and The Minimum Perfect Phylogenetic Forest Problem Yufeng Wu and Dan Gusfield UC Davis COCOON07 July 16, 2007.
Advertisements

Exact Computation of Coalescent Likelihood under the Infinite Sites Model Yufeng Wu University of Connecticut DIMACS Workshop on Algorithmics in Human.
Efficient Computation of Close Upper and Lower Bounds on the Minimum Number of Recombinations in Biological Sequence Evolution Yun S. Song, Yufeng Wu,
Association Studies, Haplotype Blocks and Tagging SNPs Prof. Sorin Istrail.
Combinatorial Algorithms for Haplotype Inference Pure Parsimony Dan Gusfield.
Population Genetics, Recombination Histories & Global Pedigrees Finding Minimal Recombination Histories Global Pedigrees Finding.
1 EE5900 Advanced Embedded System For Smart Infrastructure Static Scheduling.
Recombination and genetic variation – models and inference
Preview What does Recombination do to Sequence Histories. Probabilities of such histories. Quantities of interest. Detecting & Reconstructing Recombinations.
Planning under Uncertainty
CSL758 Instructors: Naveen Garg Kavitha Telikepalli Scribe: Manish Singh Vaibhav Rastogi February 7 & 11, 2008.
Inference of Complex Genealogical Histories In Populations and Application in Mapping Complex Traits Yufeng Wu Dept. of Computer Science and Engineering.
Exact Computation of Coalescent Likelihood under the Infinite Sites Model Yufeng Wu University of Connecticut ISBRA
WABI 2005 Algorithms for Imperfect Phylogeny Haplotyping (IPPH) with a Single Homoplasy or Recombnation Event Yun S. Song, Yufeng Wu and Dan Gusfield University.
A coalescent computational platform for tagging marker selection for clinical studies Gabor T. Marth Department of Biology, Boston College
Association Mapping of Complex Diseases with Ancestral Recombination Graphs: Models and Efficient Algorithms Yufeng Wu UC Davis RECOMB 2007.
Algorithms to Distinguish the Role of Gene-Conversion from Single-Crossover recombination in populations Y. Song, Z. Ding, D. Gusfield, C. Langley, Y.
March 2006Vineet Bafna CSE280b: Population Genetics Vineet Bafna/Pavel Pevzner
L6: Haplotype phasing. Genotypes and Haplotypes Each individual has two “copies” of each chromosome. Each individual has two “copies” of each chromosome.
Haplotyping via Perfect Phylogeny Conceptual Framework and Efficient (almost linear-time) Solutions Dan Gusfield U.C. Davis RECOMB 02, April 2002.
CSB Efficient Computation of Minimum Recombination With Genotypes (Not Haplotypes) Yufeng Wu and Dan Gusfield University of California, Davis.
CSE 291: Advanced Topics in Computational Biology Vineet Bafna/Pavel Pevzner
CSE182-L17 Clustering Population Genetics: Basics.
March 2006Vineet Bafna CSE280b: Population Genetics Vineet Bafna/Pavel Pevzner
Estimating and Reconstructing Recombination in Populations: Problems in Population Genomics Dan Gusfield UC Davis Different parts of this work are joint.
L5: Estimating Recombination Rates. Review  m M : min. number of recombination events in any explanation of the haplotypes in M  Last time, we covered.
March 2006Vineet Bafna CSE280b: Population Genetics Vineet Bafna/Pavel Pevzner
Phylogenetic Networks of SNPs with Constrained Recombination D. Gusfield, S. Eddhu, C. Langley.
RECOMB Satellite Workshop, 2007 Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs Yufeng Wu UC Davis.
Estimation Error and Portfolio Optimization Global Asset Allocation and Stock Selection Campbell R. Harvey Duke University, Durham, NC USA National Bureau.
Ch. 8 & 9 – Linear Sorting and Order Statistics What do you trade for speed?
Haplotype Blocks An Overview A. Polanski Department of Statistics Rice University.
Case(Control)-Free Multi-SNP Combinations in Case-Control Studies Dumitru Brinza and Alexander Zelikovsky Combinatorial Search (CS) for Disease-Association:
Gil McVean Department of Statistics, Oxford Approximate genealogical inference.
National Taiwan University Department of Computer Science and Information Engineering Haplotype Inference Yao-Ting Huang Kun-Mao Chao.
CS177 Lecture 10 SNPs and Human Genetic Variation
Trees & Topologies Chapter 3, Part 1. Terminology Equivalence Classes – specific separation of a set of genes into disjoint sets covering the whole set.
National Taiwan University Department of Computer Science and Information Engineering Pattern Identification in a Haplotype Block * Kun-Mao Chao Department.
Getting Parameters from data Comp 790– Coalescence with Mutations1.
1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP.
Recombination based population genomics Jaume Bertranpetit Marta Melé Francesc Calafell Asif Javed Laxmi Parida.
Lecture 12: Linkage Analysis V Date: 10/03/02  Least squares  An EM algorithm  Simulated distribution  Marker coverage and density.
Linear Reduction Method for Tag SNPs Selection Jingwu He Alex Zelikovsky.
Association mapping for mendelian, and complex disorders January 16Bafna, BfB.
Linkage Disequilibrium and Recent Studies of Haplotypes and SNPs
Coalescent theory CSE280Vineet Bafna Expectation, and deviance Statements such as the ones below can be made only if we have an underlying model that.
Estimating Recombination Rates. Daly et al., 2001 Daly and others were looking at a 500kb region in 5q31 (Crohn disease region) 103 SNPs were genotyped.
Fixed Parameters: Population Structure, Mutation, Selection, Recombination,... Reproductive Structure Genealogies of non-sequenced data Genealogies of.
Lecture 11: Linkage Analysis IV Date: 10/01/02  linkage grouping  locus ordering  confidence in locus ordering.
National Taiwan University Department of Computer Science and Information Engineering An Approximation Algorithm for Haplotype Inference by Maximum Parsimony.
Recombination and Pedigrees Genealogies and Recombination: The ARG Recombination Parsimony The ARG and Data Pedigrees: Models and Data Pedigrees & ARGs.
The Haplotype Blocks Problems Wu Ling-Yun
Inferences on human demographic history using computational Population Genetic models Gabor T. Marth Department of Biology Boston College Chestnut Hill,
Yufeng Wu and Dan Gusfield University of California, Davis
Algorithms for estimating and reconstructing recombination in populations Dan Gusfield UC Davis Different parts of this work are joint with Satish Eddhu,
Hypotheses and test procedures
Of Sea Urchins, Birds and Men
Minimum Spanning Tree 8/7/2018 4:26 AM
L4: Counting Recombination events
Estimating Recombination Rates
Estimation Error and Portfolio Optimization
Haplotype Inference Yao-Ting Huang Kun-Mao Chao.
The coalescent with recombination (Chapter 5, Part 1)
Recombination, Phylogenies and Parsimony
Estimation Error and Portfolio Optimization
Haplotype Inference Yao-Ting Huang Kun-Mao Chao.
Outline Cancer Progression Models
Approximation Algorithms for the Selection of Robust Tag SNPs
Approximation Algorithms for the Selection of Robust Tag SNPs
Haplotype Inference Yao-Ting Huang Kun-Mao Chao.
Presentation transcript:

Estimating Recombination Rates

LRH selection test, and recombination Recall that LRH/EHH tests for selection by looking at frequencies of specific haplotypes. Clearly the test is dependent on the recombination rate. Higher recombination rate destroys homozygosity It turns out that recombination rates do vary a lot in the genome, and there are many regions with little or no recombination

Daly et al., 2001 Daly and others were looking at a 500kb region in 5q31 (Crohn disease region) 103 SNPs were genotyped in 129 trios. The direct approach is to do a case-control analysis using individual SNPs. Instead, they decided to focus on haplotypes to corect for local correlation. The study finds that large blocks (upto 100kb) show no evidence of recombination, and contain only 2-4 haplotypes There is some recombination across blocks

Daly et al, 2001

Recombination in human chromosome 22 (Mb scale) Q: Can we give a direct count of the number of the recombination events? Dawson et al. Nature 2002

Recombination hot-spots (fine scale)

Recombination rates (chimp/human) Fine scale recombination rates differ between chimp and human The six hot-spots seen in human are not seen in chimp

Estimating recombination rate Given population data, can you predict the scaled recombination rate  in a small region? Can you predict fine scale variation in recombination rates (across 2-3kb)?

Combinatorial Bounds for estimating recombination rate Recall that expected #recombinations =  log n Procedure Generate N random ARGs that results in the given sample Compute mean of the number of recombinations Alternatively, generate a summary statistic s from the population. For each , generate many populations, and compute the mean and variance of s (This only needs to be done once). Use this to select the most likely  What is the correct summary statistic? Today, we talk about the min. number of recombination events as a possible summary statistic. It is not the most natural, but it is the most interesting computationally.

The Infinite Sites Assumption & the 4 gamete condition Consider a history without recombination. No pair of sites shows all four gametes 00,01,10,11. A pair of sites with all 4 gametes implies a recombination event

Hudson & Kaplan Any pair of sites (i,j) containing 4 gametes must admit a recombination event. Disjoint (non-overlapping) sites must contain distinct recombination events, which can be summed! This gives a lower bound on the number of recombination events. Based on simulations, this bound is not tight.

Myers and Griffiths’03: Idea 1 Let B(i,j) be a lower bound on the number of recombinations between sites i and j. 1=i 1 i 2 i 3 i 4 i 5 i 6 i k =n Can we compute max P R(P) efficiently?

The R m bound

Improved lower bounds The R m bound also gives a general technique for combining local lower bounds into an overall lower bound. In the example, R m =2, but we cannot give any ARG with 2 recombination events. Can we improve upon Hudson and Kaplan to get better local lower bounds?

Myers & Griffiths: Idea 2 Consider the history of individuals. Let H t denote the number of distinct haplotypes at time t One of three things might happen at time t: –Mutation: H t increase by at most 1 –Recombination: H t increase by at most 1 –Coalescence: H t does not increase

The R H bound Ex: R>= 8-3-1=4

R H bound In general, R H can be quite weak: –consider the case when S>H However, it can be improved –Partitioning idea: sum R H over disjoint intervals –Apply to any subset of columns. Ex: Apply R H to the yellow columns (BB’05)

Computing the R H bound Goal: Compute –Max H’ R(H’) It is equivalent to the following: Find the smallest subset of columns such that every pair of rows is ‘distinguished’ by at least one column For example, if we choose columns 1, 8, rows 1,2, and rows 5,6 remain identical. If choose columns 1,8,15 all rows are distinct. 1: : : : : : : : (BB’05)

Computing R H A greedy heuristic: –Remove all redundant rows. –Set of columns, C=Ø –Set S = {all pairs of rows} –Iterate while (S<>Ø): Select a column c that separates maximum number of pairs P in S. C=C+{c} S=S-P –Return n-1-|C|

Computing R H How tight is R H ? Clearly, by removing a haplotype, R H decreases. However, the number of recombinations needed doesn’t really change

R s bound: Observation I abc 000::1000::1 a s  Non-informative column: If a site contains at most one 1, or one 0, then in any history, it can be obtained by adding a mutation to a branch.  EX: if a is the haplotype containing a 1, It can simply be added to the branch without increasing number of recombination events  R(M) = R(M-{s})

R s bound: Observation 2 Redundant rows: If two rows h 1 and h 2 are identical, then –R(M) = R(M-{h 1 }) r1r1 r2r2 c

Rs bound: Observation 3 Suppose M has no non- informative columns, or redundant rows. –Then, at least one of the haplotypes is a recombinant. –There exists h s.t. R(M) = R(M-{h})+1 –Which h should you choose?

R s bound (Procedural) Procedure Compute_R s (M) If  non-informative column s return (Compute_R s (M-{s})) Else if  redundant row h return (Compute_R s (M-{h})) Else return (1 + min h (Compute_R s (M-{h}))

Results

Additional results/problems Using dynamic programming, R s can be computed in 2^n poly(mn) time. Also, Rs can be augmented to handle intermediates. Are there poly. time lower bounds? –The number of connected components in the conflict graph is a lower bound (BB’04). Fast algorithms for computing ARGs with minimum recombination. –Poly. Time to get ARG with 0 recombination –Poly. Time to get ARGs that are galled trees (Gusfield’03)

Underperforming lower bounds Sometimes, R s can be quite weak An R I lower bound that uses intermediates can help (BB’05)

LPL data set 71 individuals, 9.7Kbp genomic sequence –R m =22, R h =70