Outline Cancer Progression Models

Presentation transcript:

Outline: Cancer Progression Models; SNPs, Haplotypes, and Population Genetics: Introduction

Cancer: Mutation and Selection Clonal theory of cancer: Nowell (Science 1976)

Cancer Genomes Leukemia Breast

“Comparative Genomics” of Cancer [Figure: by mutation and selection, the human genome gives rise to tumor genomes 1 to 4] 1) Identify recurrent aberrations (Mitelman Database, >40,000 aberrations). 2) Reconstruct the temporal sequence of aberrations. Linear model: colorectal cancer (Vogelstein, 1988): -5q → 12p* → -17p → -18q. Tree model: (Desper et al. 1999). 3) Find age of tumor, time of clonal expansion.

Observing Cancer Progression Obtaining longitudinal (time-course) data, i.e. the same tumor at times t1, t2, t3, t4, is difficult. Latitudinal (cross-sectional) data from multiple patients is readily available. [Figure: by mutation and selection, the human genome gives rise to tumor genomes 1 to 4]

Multiple Mutations 4-step model for colorectal cancer, Vogelstein et al. (1988), New Engl. J. Med.: -5q → 12p* → -17p → -18q. Inferred from latitudinal data in 172 tumor samples.

Oncogenetic Tree models (Desper et al. JCB 1999, 2000) Given: measurements of chromosome gain/loss events in multiple tumor samples (CGH). Compute: rooted tree that best explains the temporal sequence of events. Example samples: {+1q}, {-8p}, {+Xq}, {+Xq, -8p}, {-8p, +1q}

Oncogenetic Tree models (Desper et al. JCB 1999, 2000) Given: measurements of chromosome gain/loss events in multiple tumor samples, e.g. {+1q}, {-8p}, {+Xq}, {+Xq, -8p}, {-8p, +1q}. L = set of chromosome alterations observed in all samples. Tumor samples give a probability distribution on 2^L (the subsets of L).

Oncogenetic Tree T = (V, E, r, p, L): rooted tree. V = vertices; E = edges; L = set of events (leaves); r = root; p: E → (0,1] assigns a probability to each edge. T gives a probability distribution on 2^L. [Figure: tree with edges e0, e1, e2, e3, e4]
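One way to read "T gives a probability distribution on 2^L" is sketched below. It assumes the usual reading of the model: each edge fires independently with its probability, and an event occurs only if every edge on the path from the root to it has fired. The tree shape, the event names A, B, C, and the probabilities are made up for illustration and do not come from the slides.

```python
def set_probability(children, p, root, observed):
    """Probability that exactly the events in `observed` occur.

    children: dict mapping a node to the list of its child nodes
    p:        dict mapping an edge (parent, child) to its probability in (0, 1]
    """
    observed = set(observed)
    reached = set()
    prob = 1.0
    stack = [root]
    while stack:
        u = stack.pop()
        for v in children.get(u, []):
            if v in observed:
                prob *= p[(u, v)]        # edge fired
                reached.add(v)
                stack.append(v)
            else:
                prob *= 1.0 - p[(u, v)]  # edge did not fire; subtree is cut off
    # An observed event whose parent did not occur cannot arise from the tree.
    return prob if reached == observed else 0.0


# Hypothetical tree: root -> A, root -> B, A -> C
children = {"root": ["A", "B"], "A": ["C"]}
p = {("root", "A"): 0.6, ("root", "B"): 0.3, ("A", "C"): 0.5}

# {A, C}: edges root->A and A->C fire, root->B does not: 0.6 * 0.5 * (1 - 0.3)
prob_ac = set_probability(children, p, "root", {"A", "C"})
# {C} alone is impossible here because C requires A
prob_c = set_probability(children, p, "root", {"C"})
```

Summing `set_probability` over all subsets of the events gives 1, which is a quick sanity check that the tree does define a distribution on 2^L.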

Results CGH of 117 cases of kidney cancer

Extensions Oncogenetic trees based on branching (Desper et al., JCB 1999). Maximum Likelihood Estimation (von Heydebreck et al., 2004). Mutagenetic trees: mixtures of trees (Beerenwinkel et al., JCB 2005).

Heterogeneity within a tumor Final tumor is clonal expansion of single cell lineage. Can we date the time of clonal expansion? Tsao, … Tavare, et al. Genetic reconstruction of individual colorectal tumor histories, PNAS 2000.

Estimating time of clonal expansion Microsatellite loci (MS), CA dinucleotides. In tumors with loss of mismatch repair (e.g. colorectal), MS change size.

Estimating time of clonal expansion For each MS locus i = 1, …, L, measure the mean m_i and variance s_i^2 of allele size. S^2_allele = average of s_1^2, …, s_L^2. S^2_loci = variance of m_1, …, m_L.
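The two summary statistics can be computed directly from a table of measured allele sizes. The sketch below assumes one list of sizes per locus and uses the population variance (dividing by n); the slides do not say which variance estimator was used, and the function name `ms_summary` is made up for this example.

```python
from statistics import mean, pvariance


def ms_summary(sizes_by_locus):
    """Return (S2_allele, S2_loci) for microsatellite size data.

    sizes_by_locus[i] is the list of allele sizes measured at locus i.
    Population variance is an assumption; swap in statistics.variance
    if the sample variance is wanted.
    """
    means = [mean(sizes) for sizes in sizes_by_locus]           # m_1 .. m_L
    variances = [pvariance(sizes) for sizes in sizes_by_locus]  # s_1^2 .. s_L^2
    s2_allele = mean(variances)   # average within-locus variance
    s2_loci = pvariance(means)    # variance of the locus means
    return s2_allele, s2_loci


# Two hypothetical loci with two measurements each
s2_allele, s2_loci = ms_summary([[10, 12], [20, 20]])
```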

Time to clonal expansion?

Simulation Estimates of Tumor Age [Figure: timeline with intervals Y1 and Y2] Y1 = time to clonal expansion. Tumor age = Y1 + Y2. Branching process simulation: each cell in the population gives birth to 0, 1 or 2 daughter cells, with a ±1 change in MS size (coalescent: forward, backward, forward simulation). Posterior estimate of Y1, Y2 by running simulations and accepting runs whose simulated values of S^2_allele, S^2_loci are close to the observed ones.
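The accept/reject step described above is a rejection sampler on summary statistics. The sketch below shows only that step. The simulator `toy_simulate` is a deliberately crude stand-in and NOT the branching process of Tsao et al.: it lets all cells share one random walk for Y1 divisions and then diverge independently for Y2 divisions. All function names, the mutation rate `mu`, and the default sizes are assumptions for illustration.

```python
import random
from statistics import mean, pvariance


def abc_rejection(sample_prior, simulate, observed, tol, n_draws, rng):
    """Keep the parameter draws whose simulated summaries fall within tol of observed."""
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)
        summary = simulate(theta, rng)
        if all(abs(s - o) <= tol for s, o in zip(summary, observed)):
            accepted.append(theta)
    return accepted


def random_walk(steps, mu, rng):
    """Net MS size change after `steps` divisions; each changes size by +-1 with probability mu."""
    return sum(rng.choice((-1, 1)) for _ in range(steps) if rng.random() < mu)


def toy_simulate(theta, rng, n_loci=25, n_cells=20, mu=0.01):
    """Toy stand-in simulator (assumption): shared drift for y1, independent drift for y2."""
    y1, y2 = theta
    means, variances = [], []
    for _ in range(n_loci):
        shared = random_walk(y1, mu, rng)
        sizes = [shared + random_walk(y2, mu, rng) for _ in range(n_cells)]
        means.append(mean(sizes))
        variances.append(pvariance(sizes))
    return mean(variances), pvariance(means)  # (S2_allele, S2_loci)


def uniform_prior(rng):
    """Hypothetical flat prior on (Y1, Y2), in cell divisions."""
    return rng.randint(0, 300), rng.randint(0, 300)
```

A call such as `abc_rejection(uniform_prior, toy_simulate, observed_stats, tol, n_draws, random.Random(0))` returns the accepted (Y1, Y2) pairs, which approximate the posterior under the toy model; the tolerance trades acceptance rate against accuracy.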

Results 15 patients, 25 MS loci. Estimate time since clonal expansion from observed S^2_allele, S^2_loci.

Cancer: Mutation and Selection Clonal theory of cancer: Nowell (Science 1976)

Population Genetics C.C. Maley: selective sweeps of mutations in tumor cell populations Chin and Gray: solid tumors

Genetics 101 Humans are diploid: two copies of each chromosome, maternal and paternal Locus: Region on a chromosome (gene, nucleotide, etc.) Allele: “Value” at a locus Genotype: Pair of alleles (maternal and paternal) at loci on a chromosome (homozygous, heterozygous) Haplotype: Alleles of loci on same chromosome (maternal or paternal)

Allele Measurement “Old days” (< 1970?): gene variants. More recently (1980’s-90’s): various sequence-based genetic markers: microsatellites, sequence tagged sites (STS), etc. Today: single nucleotide polymorphisms (SNPs)

Single Nucleotide Polymorphisms Infinite Sites Assumption: Each site mutates at most once 00000101011 10001101001 01000101010 01000000011 00011110000 00101100110 By convention, SNPs are biallelic: only two of four possible nucleotides present in population

Infinite Sites Assumption [Figure: tree of sequences. 00000000 mutates at site 3 to give 00100000, which mutates at site 8 to give 00100001 and, on a separate branch, at site 5 to give 00101000] The different sites are linked. A 1 in position 8 implies 0 in position 5, and vice versa. Each sequence has a single parent. The history of a population can be expressed as a tree. The tree can be constructed efficiently.

Infinite sites Assumption and Perfect Phylogeny Each site is mutated at most once in the history. All descendants must carry the mutated value, and all others must carry the ancestral value. [Figure: edge labelled i; 1 in position i below the edge, 0 in position i elsewhere]

Perfect Phylogeny Assume an evolutionary model in which only mutation takes place. The evolutionary history is explained by a tree in which every mutation is on an edge of the tree. Removing that edge splits the tree in two: all the species in one sub-tree contain a 0 at the mutated site, and all species in the other contain a 1. Such a tree is called a perfect phylogeny. How can one reconstruct such a tree?

The 4-gamete condition A column i partitions the set of species into two sets, i0 and i1. A column is homogeneous w.r.t. a set of species if it has the same value for all species in the set. Otherwise, it is heterogeneous. Example column i: A 0, B 0, C 0, D 1, E 1, F 1, so i0 = {A,B,C} and i1 = {D,E,F}; i is heterogeneous w.r.t. {A,D,E}.

4 Gamete Condition There exists a perfect phylogeny if and only if, for all pairs of columns (i,j), j is homogeneous w.r.t. i0 or w.r.t. i1. Equivalently: there exists a perfect phylogeny if and only if, for all pairs of columns (i,j), the 4 rows (0,0), (0,1), (1,0), (1,1) do not all occur.
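The second form of the condition translates almost directly into code. This is a minimal sketch that checks every pair of columns of a 0/1 matrix (rows are species or haplotypes); the function name is made up for this example, and the first matrix is the 5 x 5 example used later in the talk.

```python
from itertools import combinations


def four_gamete_violations(matrix):
    """Return the column pairs (0-based) in which all four gametes occur."""
    n_cols = len(matrix[0])
    bad = []
    for i, j in combinations(range(n_cols), 2):
        gametes = {(row[i], row[j]) for row in matrix}
        if len(gametes) == 4:  # (0,0), (0,1), (1,0), (1,1) all present
            bad.append((i, j))
    return bad


# Rows A..E of the example matrix: no violations, so a perfect phylogeny exists.
example = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0],
]
ok = four_gamete_violations(example)

# A two-column matrix containing all four gametes fails the test.
conflict = four_gamete_violations([[0, 0], [0, 1], [1, 0], [1, 1]])
```

The check takes time proportional to (number of column pairs) x (number of rows), which is slower than the construction algorithm below but is the simplest way to test the condition.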

4-gamete condition: proof (only if) Every perfect phylogeny satisfies the 4-gamete condition: depending on which edge the mutation j occurs on, either i0 or i1 must be homogeneous. (if) If the 4-gamete condition is satisfied, does a perfect phylogeny exist? [Figure: edge i separating i0 from i1]

An algorithm for constructing a perfect phylogeny We will consider the case where 0 is the ancestral state, and 1 is the mutated state. This restriction will be lifted later. In any tree, each node (except the root) has a single parent. It is sufficient to construct a parent for every node. In each step, we add a column and refine some of the nodes containing multiple children. Stop when all columns have been considered.

Inclusion Property For any pair of columns i, j: i < j if and only if i1 ⊇ j1. Note that if i < j then the edge containing i is an ancestor of the edge containing j. [Figure: edge i above edge j]

Example Initially, there is a single clade r, and each node (A, B, C, D, E) has r as its parent.
      1 2 3 4 5
    A 1 1 0 0 0
    B 0 0 1 0 0
    C 1 1 0 1 0
    D 0 0 1 0 1
    E 1 0 0 0 0

Sort columns Sort columns according to the inclusion property (note that the columns are already sorted here). This can be achieved by considering the columns as binary representations of numbers (most significant bit in row 1) and sorting in decreasing order.
      1 2 3 4 5
    A 1 1 0 0 0
    B 0 0 1 0 0
    C 1 1 0 1 0
    D 0 0 1 0 1
    E 1 0 0 0 0
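The sorting step can be sketched as follows: read each column top to bottom as a binary number and order the columns by decreasing value. The function name is made up for this example, and indices are 0-based.

```python
def sort_columns(matrix):
    """Return column indices (0-based) ordered by decreasing binary value.

    Each column is read top to bottom, so row 1 is the most significant bit.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])

    def value(j):
        return int("".join(str(matrix[r][j]) for r in range(n_rows)), 2)

    return sorted(range(n_cols), key=value, reverse=True)


example = [
    [1, 1, 0, 0, 0],  # A
    [0, 0, 1, 0, 0],  # B
    [1, 1, 0, 1, 0],  # C
    [0, 0, 1, 0, 1],  # D
    [1, 0, 0, 0, 0],  # E
]
# The column values are 21, 20, 10, 4, 2, so the columns are already in order.
order = sort_columns(example)
```

A column whose set of 1s strictly contains another column's set of 1s always has the larger binary value, so ancestors are placed before their descendants.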

Add first column In adding column i: check each edge and decide which side each node belongs to. Finally, add a node if you can resolve a clade. [Figure: after adding column 1, a new node u under r has children A, C, E; B and D remain children of r]
      1 2 3 4 5
    A 1 1 0 0 0
    B 0 0 1 0 0
    C 1 1 0 1 0
    D 0 0 1 0 1
    E 1 0 0 0 0

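Adding other columns Add other columns on edges using the ordering property. [Figure: final tree. Edge 1 from r leads to E and, via edge 2, to A and, via edge 4, to C; edge 3 from r leads to B and, via edge 5, to D]
      1 2 3 4 5
    A 1 1 0 0 0
    B 0 0 1 0 0
    C 1 1 0 1 0
    D 0 0 1 0 1
    E 1 0 0 0 0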
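The whole rooted construction can be sketched compactly. This version is not the node-refinement procedure of the slides but an equivalent check in the style of Gusfield's algorithm: after sorting, every row that carries mutation j must have the same preceding mutation, which becomes j's parent edge. The function name and return format are assumptions made for this example; indices are 0-based.

```python
def perfect_phylogeny(matrix, names):
    """Build a rooted perfect phylogeny (0 = ancestral state) or return None.

    Returns (parent, attach):
      parent[j]  = column whose edge lies directly above edge j (None = root)
      attach[s]  = deepest column edge on the path to species s (None = root)
    """
    n_rows, n_cols = len(matrix), len(matrix[0])

    def value(j):  # column read as a binary number, row 1 most significant
        return int("".join(str(matrix[r][j]) for r in range(n_rows)), 2)

    order = sorted(range(n_cols), key=value, reverse=True)
    parent, attach = {}, {}
    for r in range(n_rows):
        prev = None
        for j in order:
            if matrix[r][j] == 1:
                # First row to carry j fixes its parent; later rows must agree.
                if parent.setdefault(j, prev) != prev:
                    return None  # conflict: no perfect phylogeny exists
                prev = j
        attach[names[r]] = prev
    return parent, attach


example = [
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0],
]
tree = perfect_phylogeny(example, "ABCDE")
```

For the example matrix, the parent map puts columns 1 and 3 (indices 0 and 2) directly under the root, column 2 under column 1, column 4 under column 2, and column 5 under column 3, which agrees with the tree on the slide.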

Unrooted case Switch the values in each column, so that 0 is the majority element. Apply the algorithm for the rooted case
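The relabelling step for the unrooted case can be sketched as below: any column in which 1 is the majority value is complemented, and the result is passed to the rooted algorithm. Leaving tied columns unflipped is an assumption of this sketch, and the function name is made up.

```python
def to_majority_zero(matrix):
    """Complement every column in which 1 is the strict majority value.

    Returns (new_matrix, flipped), where flipped[j] is True if column j
    was complemented. Tied columns are left as they are (an assumption).
    """
    n_rows = len(matrix)
    flipped = [2 * sum(col) > n_rows for col in zip(*matrix)]
    new_matrix = [[v ^ f for v, f in zip(row, flipped)] for row in matrix]
    return new_matrix, flipped


# Column 1 is mostly 1s and gets flipped; column 2 is left alone.
relabelled, flipped = to_majority_zero([[1, 1], [1, 0], [1, 0]])
```

The `flipped` list records which columns were complemented so that the edge labels of the resulting tree can be mapped back to the original alleles.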

Summary: No recombination leads to correlation between sites [Figure: tree of sequences. 00000000 mutates at site 3 to give 00100000, which mutates at site 8 to give 00100001 and, on a separate branch, at site 5 to give 00101000] The different sites are linked. A 1 in position 8 implies 0 in position 5, and vice versa. The history of a population can be expressed as a tree. The tree can be constructed efficiently.

Haplotype Phasing Problem Most sequencing technologies measure genotypes, not haplotypes. Pair of haplotypes: 0101110 and 1100010. Genotype: 2102210 (2 = heterozygous). Given a set of genotypes, infer the haplotypes. Use parsimony assumption. Haplotypes satisfy perfect phylogeny (Gusfield). Find the minimum number of haplotypes that explain the observed genotypes.
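The 0/1/2 genotype coding on this slide, and the reason phasing is ambiguous, can be made concrete with a short sketch. It only enumerates the haplotype pairs consistent with one genotype; choosing among them (by parsimony or perfect phylogeny) is the actual phasing problem and is not attempted here. Function names are made up for this example.

```python
from itertools import product


def genotype(h1, h2):
    """Combine two haplotypes: equal alleles are kept, differing alleles become 2."""
    return [a if a == b else 2 for a, b in zip(h1, h2)]


def phasings(g):
    """List every unordered haplotype pair that explains genotype g."""
    het = [k for k, v in enumerate(g) if v == 2]
    pairs = []
    for bits in product((0, 1), repeat=len(het)):
        if het and bits[0] == 1:
            continue  # mirror image of a pair that is already listed
        h1, h2 = list(g), list(g)
        for k, b in zip(het, bits):
            h1[k], h2[k] = b, 1 - b
        pairs.append((h1, h2))
    return pairs


# The haplotype pair from the slide and the genotype it produces
h1 = [0, 1, 0, 1, 1, 1, 0]
h2 = [1, 1, 0, 0, 0, 1, 0]
g = genotype(h1, h2)
candidates = phasings(g)
```

A genotype with k heterozygous sites has 2^(k-1) candidate pairs (one pair when k = 0), so the slide's genotype with three heterozygous sites has four.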

Recombination 00000000 11111111 00011111

Recombination A tree is not sufficient, as a sequence may have 2 parents. Recombination leads to violation of the 4-gamete property. Recombination leads to loss of correlation between columns. [Example: parents 00000000 and 11111111 recombine to give 00011111]

Studying recombination A tree is not sufficient as a sequence may have 2 parents Recombination leads to loss of correlation between columns How can we measure recombination?

Linkage (Dis)-equilibrium (LD) [Two tables of sites A and B: A B 0 0 0 1 1 1 1 0 and A B 0 1 0 0 1 0] No recombination: Pr[A,B=(0,1)] = 0.25, linkage disequilibrium. Extensive recombination: Pr[A,B=(0,1)] = 0.125, linkage equilibrium.
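One common way to quantify the contrast on this slide is to compare the joint haplotype frequency with the product of the allele frequencies. The sketch below computes the standard statistics D = Pr[A=1,B=1] - Pr[A=1]·Pr[B=1] and r^2; these measures are not named on the slide and are brought in here as an assumption, and the haplotype samples are made up.

```python
def ld_stats(haplotypes):
    """Return (D, r2) for a list of (a, b) allele pairs, each allele 0 or 1."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # zero when the two sites are independent
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Hypothetical samples: complete linkage versus independent sites
linked = ld_stats([(0, 0), (0, 0), (1, 1), (1, 1)])
independent = ld_stats([(0, 0), (0, 1), (1, 0), (1, 1)])
```

Under linkage equilibrium the joint frequency factorises, so D and r^2 are 0; with no recombination between perfectly correlated sites, r^2 reaches 1.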