Population Genetics, Recombination Histories & Global Pedigrees Finding Minimal Recombination Histories 1 23 4 1 23 4 1 2 3 4 Global Pedigrees Finding.

Slides:



Advertisements
Similar presentations
Exact Computation of Coalescent Likelihood under the Infinite Sites Model Yufeng Wu University of Connecticut DIMACS Workshop on Algorithmics in Human.
Advertisements

Preview What does Recombination do to Sequence Histories. Probabilities of such histories. Quantities of interest. Detecting & Reconstructing Recombinations.
Coalescent Module- Faro July 26th-28th 04 Monday H: The Basic Coalescent W: Forest Fire W: The Coalescent + History, Geography.
Inferring Local Tree Topologies for SNP Sequences Under Recombination in a Population Yufeng Wu Dept. of Computer Science and Engineering University of.
Bioinformatics Phylogenetic analysis and sequence alignment The concept of evolutionary tree Types of phylogenetic trees Measurements of genetic distances.
Recombination and genetic variation – models and inference
Sampling distributions of alleles under models of neutral evolution.
A New Model for Coalescent with Recombination Zhi-Ming Ma ECM2013 PolyU
Preview What does Recombination do to Sequence Histories. Probabilities of such histories. Quantities of interest. Detecting & Reconstructing Recombinations.
N-gene Coalescent Problems Probability of the 1 st success after waiting t, given a time-constant, a ~ p, of success 5/20/2015Comp 790– Continuous-Time.
Phylogenetic reconstruction
Molecular Evolution Revised 29/12/06
Forward Genealogical Simulations Assumptions:1) Fixed population size 2) Fixed mating time Step #1:The mating process: For a fixed population size N, there.
. Phylogeny II : Parsimony, ML, SEMPHY. Phylogenetic Tree u Topology: bifurcating Leaves - 1…N Internal nodes N+1…2N-2 leaf branch internal node.
D. Gusfield, V. Bansal (Recomb 2005) A Fundamental Decomposition Theory for Phylogenetic Networks and Incompatible Characters.
Inference of Complex Genealogical Histories In Populations and Application in Mapping Complex Traits Yufeng Wu Dept. of Computer Science and Engineering.
Exact Computation of Coalescent Likelihood under the Infinite Sites Model Yufeng Wu University of Connecticut ISBRA
WABI 2005 Algorithms for Imperfect Phylogeny Haplotyping (IPPH) with a Single Homoplasy or Recombnation Event Yun S. Song, Yufeng Wu and Dan Gusfield University.
Association Mapping of Complex Diseases with Ancestral Recombination Graphs: Models and Efficient Algorithms Yufeng Wu UC Davis RECOMB 2007.
Algorithms to Distinguish the Role of Gene-Conversion from Single-Crossover recombination in populations Y. Song, Z. Ding, D. Gusfield, C. Langley, Y.
March 2006Vineet Bafna CSE280b: Population Genetics Vineet Bafna/Pavel Pevzner
Haplotyping via Perfect Phylogeny Conceptual Framework and Efficient (almost linear-time) Solutions Dan Gusfield U.C. Davis RECOMB 02, April 2002.
CSB Efficient Computation of Minimum Recombination With Genotypes (Not Haplotypes) Yufeng Wu and Dan Gusfield University of California, Davis.
March 2006Vineet Bafna CSE280b: Population Genetics Vineet Bafna/Pavel Pevzner
Continuous Coalescent Model
Dispersal models Continuous populations Isolation-by-distance Discrete populations Stepping-stone Island model.
Inferring Evolutionary History with Network Models in Population Genomics: Challenges and Progress Yufeng Wu Dept. of Computer Science and Engineering.
My wish for the project-examination It is expected to be 3 days worth of work. You will be given this in week 8 I would expect 7-10 pages You will be given.
Phylogenetic Networks of SNPs with Constrained Recombination D. Gusfield, S. Eddhu, C. Langley.
Building Phylogenies Parsimony 1. Methods Distance-based Parsimony Maximum likelihood.
RECOMB Satellite Workshop, 2007 Algorithms for Association Mapping of Complex Diseases With Ancestral Recombination Graphs Yufeng Wu UC Davis.
Combinatorics & the Coalescent ( ) Tree Counting & Tree Properties. Basic Combinatorics. Allele distribution. Polya Urns + Stirling Numbers. Number.
Molecular phylogenetics
Phylogenetics Alexei Drummond. CS Friday quiz: How many rooted binary trees having 20 labeled terminal nodes are there? (A) (B)
Phylogenetic Analysis. General comments on phylogenetics Phylogenetics is the branch of biology that deals with evolutionary relatedness Uses some measure.
Phylogenetics and Coalescence Lab 9 October 24, 2012.
Bioinformatics 2011 Molecular Evolution Revised 29/12/06.
Parsimony-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2010.
Trees & Topologies Chapter 3, Part 1. Terminology Equivalence Classes – specific separation of a set of genes into disjoint sets covering the whole set.
What is of interest to calculate ? for open problems Semple and Steel.
More statistical stuff CS 394C Feb 6, Today Review of material from Jan 31 Calculating pattern probabilities Why maximum parsimony and UPGMA are.
Models and their benefits. Models + Data 1. probability of data (statistics...) 2. probability of individual histories 3. hypothesis testing 4. parameter.
Getting Parameters from data Comp 790– Coalescence with Mutations1.
1 Population Genetics Basics. 2 Terminology review Allele Locus Diploid SNP.
Estimating Recombination Rates. LRH selection test, and recombination Recall that LRH/EHH tests for selection by looking at frequencies of specific haplotypes.
Lecture 13: Linkage Analysis VI Date: 10/08/02  Complex models  Pedigrees  Elston-Stewart Algorithm  Lander-Green Algorithm.
5 Open Problems in Bioinformatics Pedigrees from Genomes Comparative Genomics of Alternative Splicing Viral Annotation Evolving Turing Patterns Protein.
Population genetics. coalesce 1.To grow together; fuse. 2.To come together so as to form one whole; unite: The rebel units coalesced into one army to.
Parsimony-Based Approaches to Inferring Phylogenetic Trees BMI/CS 576 Colin Dewey Fall 2015.
Maximum Likelihood Given competing explanations for a particular observation, which explanation should we choose? Maximum likelihood methodologies suggest.
Association mapping for mendelian, and complex disorders January 16Bafna, BfB.
By Mireya Diaz Department of Epidemiology and Biostatistics for EECS 458.
Coalescent theory CSE280Vineet Bafna Expectation, and deviance Statements such as the ones below can be made only if we have an underlying model that.
Estimating Recombination Rates. Daly et al., 2001 Daly and others were looking at a 500kb region in 5q31 (Crohn disease region) 103 SNPs were genotyped.
Restriction enzyme analysis The new(ish) population genetics Old view New view Allele frequency change looking forward in time; alleles either the same.
Probabilistic Approaches to Phylogenies BMI/CS 576 Sushmita Roy Oct 2 nd, 2014.
Fixed Parameters: Population Structure, Mutation, Selection, Recombination,... Reproductive Structure Genealogies of non-sequenced data Genealogies of.
Recombination and Pedigrees Genealogies and Recombination: The ARG Recombination Parsimony The ARG and Data Pedigrees: Models and Data Pedigrees & ARGs.
Minimal Recombinations Histories and Global Pedigrees Finding Minimal Recombination Histories Acknowledgements Yun Song - Rune Lyngsø - Mike Steel - Carsten.
Yufeng Wu and Dan Gusfield University of California, Davis
An Algorithm for Computing the Gene Tree Probability under the Multispecies Coalescent and its Application in the Inference of Population Tree Yufeng Wu.
Multiple Alignment and Phylogenetic Trees
L4: Counting Recombination events
Estimating Recombination Rates
ReCombinatorics The Algorithmics and Combinatorics of Phylogenetic Networks with Recombination Dan Gusfield U. Oregon , May 8, 2012.
The coalescent with recombination (Chapter 5, Part 1)
Recombination, Phylogenies and Parsimony
Trees & Topologies Chapter 3, Part 2
Trees & Topologies Chapter 3, Part 2
Outline Cancer Progression Models
Presentation transcript:

Population Genetics, Recombination Histories & Global Pedigrees Finding Minimal Recombination Histories Global Pedigrees Finding Common Ancestors NOW Population Genetics and Genealogies NOW Forward in time Backward in time

Wright-Fisher Model of Population Reproduction Haploid Model i. Individuals are made by sampling with replacement in the previous generation. ii. The probability that 2 alleles have same ancestor in previous generation is 1/2N Diploid Model Individuals are made by sampling a chromosome from the female and one from the male previous generation with replacement Assumptions 1.Constant population size 2.No geography 3.No Selection 4.No recombination

10 Alleles’ Ancestry for 15 generations Generate new generations according to the haploid model Sort generations forward in time, such that individual 1’s children in the next generation comes first, then the children of 2.. Pick individuals in the present and label edges to their ancestors back in time

Mean, E(X 2 ) = 2N. Ex.: 2N = , Generation time 30 years, E(X 2 ) = years. Waiting for most recent common ancestor - MRCA P(X 2 = j) = (1-(1/2N)) j-1 (1/2N) Distribution of time until 2 alleles had a common ancestor, X 2 ?: P(X 2 > j) = (1-(1/2N)) j P(X 2 > 1) = (2N-1)/2N = 1-(1/2N) 1 2N j j

P(k):=P{k alleles had k distinct parents} 1 2N 1 (2N) k k -> any 2N *(2N-1) *..* (2N-(k-1)) =: (2N) [k] k -> k Ancestor choices: k -> k-1 For k << 2N: k -> j S k,j - the number of ways to group k labelled objects into j groups.(Stirling Numbers of second kind.

Expected Height and Total Branch Length Expected Total height of tree: H k = 2(1-1/k) /(k-1)) ca= 2*ln(k-1) k 2 1 2/(k-1) Branch Lengths 1/3 1 Time Epoch i.Infinitely many alleles finds 1 allele in finite time. ii. In takes less than twice as long for k alleles to find 1 ancestors as it does for 2 alleles. Expected Total branch length in tree, L k : 2*(1 + 1/2 + 1/ /(k-1)) ca= 2*ln(k-1)

6 Realisations with 25 leaves Observations: Variation great close to root. Trees are unbalanced.

Sampling more sequences The probability that the ancestor of the sample of size n is in a sub-sample of size k is Letting n go to infinity gives (k-1)/(k+1), i.e. even for quite small samples it is quite large.

Three Models of Alleles and Mutations. i. Only identity, non- identity is determinable ii. A mutation creates a new type. i. Allele is represented by a line. ii. A mutation always hits a new position. i. Allele is represented by a sequence. ii. A mutation changes nucleotide at chosen position. Infinite Allele  Infinite Site  Finite Site acgtgctt acgtgcgt acctgcat tcctgcat acgtgctt acgtgcgt acctgcat tcctggct tcctgcat 

Final Aligned Data Set: Infinite Site Model

Recombination-Coalescence Illustration Copied from Hudson 1991 Intensities Coales. Recomb. 1 2  3 2  6 2  1 (1+b)  0  3 (2+b)  b Backward in time

Local Inference of Recombinations Four combinations Incompatibility: Myers-Griffiths (2002): Number of Recombinations in a sample, N R, number of types, N T, number of mutations, N M obeys: T... G T... C A... G A... C Recoding At most 1 mutation per column 0 ancestral state, 1 derived state

”Observing” Recombinations: Hudson & Kaplan’s R M If you equate R M with expected number of recombinations, this could be used as an estimator. Unfortunately, R M is a gross underestimate of the real number of recombinations

Minimal Number of Recombinations Last Local Tree Algorithm: L 21 Data 2 n i-1i 1 Trees The Kreitman data (1983): 11 sequences, 3200bp, 43(28) recoded, 9 different How many neighbors? Bi-partitions How many local trees? Unrooted Coalescent

Metrics on Trees based on subtree transfers. Pretending the easy problem (unrooted) is the real problem (age ordered), causes violation of the triangle inequality: Tree topologies with age ordered internal nodes Rooted tree topologies Unrooted tree topologies Trees including branch lengths

Observe that the size of the unit-neighbourhood of a tree does not grow nearly as fast as the number of trees Allen & Steel (2001) Song (2003+) Due to Yun Song Tree Combinatorics and Neighborhoods

Due to Yun Song

minARGs: Recombination Events & Local Trees True ARG Reconstructed ARG ((1,2),(1,2,3)) ((1,3),(1,2,3)) n=7,  =10,  =75 Minimal ARG True ARG Mutation information on only one side Mutation information on both sides 0 4 Mb Hudson- Kaplan Myers- Griiths Song-Hein n=8,  =40 n=8,  =15

Counting + Branch and Bound Algorithm ? Exact length Lower bound Upper Bound k-recombinatination neighborhood k Song, Y.S., Lyngsø, R.B. & Hein, J. (2006) Counting all possible ancestral configurations of sample sequences in population genetics. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 3(3), 239–251 Lyngsø, R.B., Song, Y.S. & Hein, J. (2008) Accurate computation of likelihoods in the coalescent with recombination via parsimony. Lecture Notes in Bioinformatics: Proceedings of RECOMB –477

BB & Heuristic minimal ancestral recombination graphs

Time versus Spatial 1: Coalescent-Recombination (Griffiths, 1981; Hudson, Wiuf & Hein, 1999) Temporal ProcessSpatial Process ii. The trees cannot be reduced to Topologies i. The process is non-Markovian * * =

Elston-Stewart (1971) -Temporal Peeling Algorithm: Lander-Green (1987) - Genotype Scanning Algorithm: Mother Father Condition on parental states Recombination and mutation are Markovian Mother Father Condition on paternal/maternal inheritance Recombination and mutation are Markovian Time versus Spatial 2: Pedigrees

Time versus Spatial 3: Phylogenetic Alignment Optimisation Algorithms indels of length 1 (David Sankoff, 1973) Spatial indels of length k (Bjarne Knudsen, 2003) Temporal Statistical Alignment Spatial: Temporal:

- recombination 27 ACs The Griffiths-Ethier-Tavare Recursions No recombination: Infinite Site Assumption Ancestral State Known History Graph: Recursions Exists No cycles Possible Histories without Recombination for simple data example + recombination 9*10 8 ACs

1st 2nd Ancestral configurations to 2 sequences with 2 segregating sites mid-point heuristic M M R C R C C

1st 2nd 0-ARG 5 states 1-ARG +15 states 2-ARG +10 states ,  = 2, 0 ,  = 1, 1 ,  = 2,

Likelihood Calculations on the  -ARG Example: 0-ARG 2-ARG1-ARG

For sites=7, sequences=6, then th of states visited Cut-off  -ARG,  =2  -ARG likelihood calculation

Combining Ancestral Individuals and the Coalescent Wiuf & Hein, Let T be the time, when somebody was everybody’s ancestor. Changs’ result: lim T*/log 2 (N) =1 prob. 1 Unify the two processes: I. Sample more individuals II. Let each have 2 parents with probabilty p. Result: A discontinuity at 1. For p<1 change log 2  log p Comment: Genetic Ancestors is a vanishing set within Genealogical Ancestors.

Reconstructing global pedigrees: Superpedigrees Steel and Hein, 2006 The gender-labeled pedigrees for all pairs defines global pedigree k Gender-unlabeled pedigrees don’t!! Steel, M. & Hein, J. (2006) Reconstructing pedigrees: a combinatorial perspective. Journal of Theoretical Biology, 240(3), 360–367

All embedded phylogenies are observable Do they determine the pedigree? Genomes with  and  --> infinity  recombination rate,  mutation rate Benevolent Mutation and Recombination Process Counter example: Embedded phylogenies:

Infinite Sequences: From ARG to Pedigree Given A/B/C/D/E above how much does that constrain set of pedigrees How many pedigrees are compatible with A/B/C/D/E varying over data? A. The ARG? What can you observe from data (infinite sequences)? C. Sequence of local trees? D. Set of local trees? E. Set/Sequences of local unrooted tree topologies? B. Sequence of neighbor pairs of local trees with recombination points Going to neighbor triples, quadruples,.., be more restrictive than pairs? F. Set/Sequences of local bipartitions? (neighbor pairs…

Infinite Sequences: From ARG to Pedigree A. The ARG? What can you observe from data (infinite sequences)? C. Sequence of local trees? D. Set of local trees? E. Set/Sequences of local unrooted tree topologies? B. Sequence of pairs of local trees with recombination points Going to triples, quadruples,.., be more restrictive than pairs? F. Set/Sequences of bipartitions