Gene an d genome duplication Nadia El-Mabrouk Université de Montréal Canada.

Slides:



Advertisements
Similar presentations
A Simpler 1.5-Approximation Algorithm for Sorting by Transpositions Tzvika Hartman Weizmann Institute.
Advertisements

Sorting by reversals Bogdan Pasaniuc Dept. of Computer Science & Engineering.
School of CSE, Georgia Tech
Greedy Algorithms CS 466 Saurabh Sinha. A greedy approach to the motif finding problem Given t sequences of length n each, to find a profile matrix of.
Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam.
GENE TREES Abhita Chugh. Phylogenetic tree Evolutionary tree showing the relationship among various entities that are believed to have a common ancestor.
Evaluating the Significance of Max-gap Clusters Rose Hoberman David Sankoff Dannie Durand.
Bioinformatics Chromosome rearrangements Chromosome and genome comparison versus gene comparison Permutations and breakpoint graphs Transforming Men into.
Current Approaches to Whole Genome Phylogenetic Analysis Hongli Li.
Greedy Algorithms And Genome Rearrangements
Genome Rearrangements CIS 667 April 13, Genome Rearrangements We have seen how differences in genes at the sequence level can be used to infer evolutionary.
The Statistical Significance of Max-gap Clusters Rose Hoberman David Sankoff Dannie Durand.
Introduction to Bioinformatics Algorithms Greedy Algorithms And Genome Rearrangements.
Haplotyping via Perfect Phylogeny Conceptual Framework and Efficient (almost linear-time) Solutions Dan Gusfield U.C. Davis RECOMB 02, April 2002.
Of Mice and Men Learning from genome reversal findings Genome Rearrangements in Mammalian Evolution: Lessons From Human and Mouse Genomes and Transforming.
Genome Rearrangements CSCI : Computational Genomics Debra Goldberg
Genomic Rearrangements CS 374 – Algorithms in Biology Fall 2006 Nandhini N S.
Transforming Cabbage into Turnip: Polynomial Algorithm for Sorting Signed Permutations by Reversals Journal of the ACM, vol. 46, No. 1, Jan 1999, pp
5. Lecture WS 2003/04Bioinformatics III1 Genome Rearrangements Compare to other areas in bioinformatics we still know very little about the rearrangement.
Genome Rearrangement SORTING BY REVERSALS Ankur Jain Hoda Mokhtar CS290I – SPRING 2003.
1 Genome Rearrangements João Meidanis São Paulo, Brazil December, 2004.
Genome projects and model organisms Level 3 Molecular Evolution and Bioinformatics Jim Provan.
Combinatorial and Statistical Approaches in Gene Rearrangement Analysis Jijun Tang Computer Science and Engineering University of South Carolina
Genome Rearrangements Tseng Chiu Ting Sept. 24, 2004.
1 A Simpler 1.5- Approximation Algorithm for Sorting by Transpositions Combinatorial Pattern Matching (CPM) 2003 Authors: T. Hartman & R. Shamir Speaker:
Lecture 25 - Phylogeny Based on Chapter 23 - Molecular Evolution Copyright © 2010 Pearson Education Inc.
Genomes and Their Evolution. GenomicsThe study of whole sets of genes and their interactions. Bioinformatics The use of computer modeling and computational.
Genome Rearrangements Anne Bergeron, Comparative Genomics Laboratory Université du Québec à Montréal Belle marquise, vos beaux yeux me font mourir d'amour.
16. Lecture WS 2004/05Bioinformatics III1 V16 – genome rearrangement Important information – contained in the order in which genes occur on the genomes.
CS CM124/224 & HG CM124/224 DISCUSSION SECTION (JUN 6, 2013) TA: Farhad Hormozdiari.
Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)
Genome Rearrangements [1] Ch Types of Rearrangements Reversal Translocation
Greedy Algorithms And Genome Rearrangements
Sorting by Cuts, Joins and Whole Chromosome Duplications
Ch. 21 Genomes and their Evolution. New approaches have accelerated the pace of genome sequencing The human genome project began in 1990, using a three-stage.
whole-genome duplications and large segmental duplications… …seem to be a common feature in eukaryotic genome evolution …play a crucial role in the evolution.
Greedy Algorithms CS 498 SS Saurabh Sinha. A greedy approach to the motif finding problem Given t sequences of length n each, to find a profile matrix.
Bioinformatic Tools for Comparative Genomics of Vectors Comparative Genomics.
Genome Analysis II Comparative Genomics Jiangbo Miao Apr. 25, 2002 CISC889-02S: Bioinformatics.
Comparative genomics Haixu Tang School of Informatics.
Table 8.3 & Alberts Fig.1.38 EVOLUTION OF GENOMES C-value paradox: - in certain cases, lack of correlation between morphological complexity and genome.
Significance Tests for Max-Gap Gene Clusters Rose Hoberman joint work with Dannie Durand and David Sankoff.
Comparative genomics of Gossypium and Arabidopsis: Unraveling the consequences of both ancient and recent polyploidy Junkang Rong, John E. Bowers, Stefan.
Genome Rearrangement By Ghada Badr Part I.
Introduction to Bioinformatics Algorithms Chapter 5 Greedy Algorithms and Genome Rearrangements By: Hasnaa Imad.
Genome Rearrangements. Turnip vs Cabbage: Look and Taste Different Although cabbages and turnips share a recent common ancestor, they look and taste different.
Genome Rearrangements. Turnip vs Cabbage: Look and Taste Different Although cabbages and turnips share a recent common ancestor, they look and taste different.
1 Genome Rearrangements (Lecture for CS498-CXZ Algorithms in Bioinformatics) Dec. 6, 2005 ChengXiang Zhai Department of Computer Science University of.
Lecture 4: Genome Rearrangements. End Sequence Profiling (ESP) C. Collins and S. Volik (UCSF Cancer Center) 1)Pieces of tumor genome: clones ( kb).
CSCI2950-C Lecture 9 Cancer Genomics
CSCI2950-C Genomes, Networks, and Cancer
Genome Rearrangement and Duplication Distance
Tao Jiang Department of Computer Science
Genomes and Their Evolution
SGN23 The Organization of the Human Genome
CSE 5290: Algorithms for Bioinformatics Fall 2009
Greedy (Approximation) Algorithms and Genome Rearrangements
Lecture 3: Genome Rearrangements and Duplications
By Chunfang Zheng and David Sankoff, 2014
Fig Figure 21.1 What genomic information makes a human or chimpanzee?
CSCI2950-C Lecture 4 Genome Rearrangements
Mattew Mazowita, Lani Haque, and David Sankoff
Greedy Algorithms And Genome Rearrangements
Gene Density and Noncoding DNA
CSCI2950-C Lecture 6 Genome Rearrangements and Duplications
FanChang Hao, Melvin Zhang, and Hon Wai Leong Review for TCBB
Greedy Algorithms And Genome Rearrangements
JAKUB KOVÁĆ, ROBERT WARREN, MARÍLIA D.V. BRAGA and JENS STOYE
MAGE: Models and Algorithms for Genome Evolution 2013
Rearrangement Phylogeny of Genomes in Contig form
Presentation transcript:

Gene an d genome duplication Nadia El-Mabrouk Université de Montréal Canada

Plan 1. Genome rearrangement and multigene families 2. Genome duplication 3. Duplication of chromosomal segments 4. Conclusion

Genome rearrangement Chromosomes evolved by insertion, deletion, movement of genes Genomic approach: Compare gene orders Hypothesis: Homologous genes are known Chromosome sequence of signed genes (or blocks) b -a d -e -c f

Multigene families In the human genome ~15% protein genes duplicated (Li, Wang, Nekrutenko, 2001) ~16% yeast, ~25% Arabidopsis (Wolfe, 2001) Compare sequences of signed genes allowing many copies of each gene b –a d a –e –c e f d a

Multigene families due to:  Single gene duplication;  Segment duplication: Tandem duplication or duplication transposition a b c d e f g a b c d e f b c d g  Horizontal gene transfer;  Genome-wide doubling event

Algorithms and models  Genome rearrangement with multigene families Exemplar approach, Sankoff 1999 Insertion, deletion, gene duplication, Marron,Swensen, Moret 2003  Reconciliation analysis, projecting gene tree on phylogenetic tree Hallett, Lagergren 2000, Page, Cotton 2000; Chen, Durand, Farach 2000, Sankoff, El-Mabrouk 2000  Probabilistic models for the generation of multigene families

Find the ancestor of a genome with multiple gene copies  Genome duplication N. El-Mabrouk and D. Sankoff, SIAM, J. Comp., 2003  Duplication of chromosomal segments N. El-Mabrouk, J. Comp. Sys. Sci., 2002  Genome duplication for unordered chromosomes N. El-Mabrouk and D. Sankoff 1998

Plan 1. Genome rearrangement and multigene families 2. Genome duplication 3. Duplication of chromosomal segments 4. Conclusion

Genome doubling Tetraploid = 4n chromosomes Evidence across the eukaryote spectrum; Two duplications in early vertebrate evolution (McLysaght et al. 2002) Particularly prevalent in plants (rice, oats, corn, wheat, soybeans, Arabidopsis…)

Wolfe, Shields 1997: Traces of duplication in Saccharomyces cerevisiae. 55 duplicated regions representing 50% of the genome From 8 to 16 chromosomes

Originally, duplicated genome = 2 identical copies of each chromosome After rearrangements, duplicated segments scattered among the genome Present–day genome: Signed gene sequences, 2 copies of each gene Reconstruct original gene order at time of duplication Minimum number of reversal and/or translocation

Inversion

Translocation Reciprocal translocation: Fusion: Fission:

Problem: Min. num. of inversion and/or translocation trans- forming G into H Multi-chromosomal case: H has an even number of chromosomes. Not necessarily the case for G Rearranged duplicated genome G : 1: +a +b –c +b -d 3: -e +g -f -d 2: -c -a +f 4: +h +e -g +h Unknown duplicated genome H: 1: +a +b -d 3: +h +c +f -g +e 2: +a +b -d 4: +h +c +f -g +e

The circular case

Method Genome rearrangement: Minimum number of rearrangements to transform one genome into another First polynomial algorithm by Hannenhalli and Pevzner for  Reversals only  Translocations only  Reversals and translocations Ancestral duplicated genome of G minimizing the HP formula

The breakpoint graph G 1 = G 2 = a th -a ht

Multichromosomal case G 1 : I: II: III: G 2 : I: II: III:

When G 1 =G 2, maximum number of cycles Perform reversals increasing nb of cycles Good component: Can be solved by good reversals Bad component: Requires bad reversals HP: RD ( G 1, G 2 ) = b – c + m + f b : red edges; c : cycles; m : bad components; f : Correction of 0, 1 or 2

Genome halving Partial graph for G: Set of valid black edges representing a duplicated genome Find a set of valid black edges minimizing HP

Decomposition into natural subgraphs Natural subgraphs of even size are completable Amalgamate natural graphs into completable supernatural graphs Example: Amalgamate S 2 and S 5 S 1, S 25, S 3, S 4

Upper bound on the number of cycles S e a supernatural graph of n edges, S e (  e ) a completed graph with c e cycles  If S e not amalgamated, c e ≤ n/2 +1  Otherwise, c e ≤ n/2

Maximizing cycles – Multichromosomal case Complete each subgraph separately Avoid to create circular fragments a 1 d 1 a 2 d 2 c 1 b 1 c 2 b 2 Bad graph:

Maximizing cycles 2 Avoid black edges creating circular fragments: A pair of edges that does not create a circular fragment nor a bad graph is called possible

Algorithm dedouble 2-edges graph:n-edges graph:

Algorithm dedouble 2 Linear time algorithm constructing a maximal completed graph with c cycles: c = n/2 +   n : number of red edges   : number of natural graphs (not amalgamated)

Bad components Related to subpermutations or conserved intervals minSP: SP not contained in any SP  Rearrangement by translocations: Bad components = minSPs  Rearrangement by invertions and/or translocations: Bad components subset of minSPs a -b c -d e -g h -f i

Bad components 2 Local SPs of G : Lemma: In a max completed graph, if there is minSP not a local SP, then correction to eliminate the minSP Corollary: If G does not contain local SPs, then duplicated genome H produced by the algorithm is such that: RO(G,H) minimal

Bad components 3 General case: RO(G) = n/2 –  (G) + m(G) +  (G) n : nb of red edges;  G  : nb of natural graphs  m (G) : nb of bad local SPs ;  G  : correction depending on local SPs Multichromosomal case: Exact algorithm Circular case: Uncertainty of up to 2 reversals

Application: Yeast genome Degenerate tetraploid, duplication 10 8 Sorting by translocations: 45 translocations Sorting by inversions and/or translocations: No local SPs, thus no reversal. Still 45 translocations years ago (Wolfe and Shield, 1997 ). 55 duplicated regions

A circular genome Mitochondrial genome of Marchantia polymorpha: many genes in 2 or 3 copies (Oda et al. 1992) Unlikely to be a tetraploid A map with 25 pairs of genes was extracted from the Genbank entry Sorting by reversals: minimum of 25 reversals Similar to a random distribution No trace of duplication

Plan 1. Genome rearrangement and multigene families 2. Genome duplication 3. Duplication of chromosomal segments 4. Conclusion

Duplication of chromosomal segments Duplication of entire regions from one location to another in the genome a b c d e f g a b c d e f b c d g Very recent segment duplication in the human genome (Eichler et al., 1999)

Data: A genome containing many copies of each gene Problem: An ancestral genome containing one copy of each gene, minimizing reversals + segment duplication D(G) : Number of repeats of G +a -b +c +x +d -e +e -d +a -b +c +y

At most two copies of each gene A reversal can decrease by at most two number of repeats of G Find I minimizing RD(G,I) = D(I) + R(G,I) Ignoring bad components minimize  (G) = D(I) +n(G)-c(G,I)

a 1 b 1 x h 1 f 1 e 1 g 1 -c 1 -a 2 -b 2 -z d 2 e 2 -g 2 -c 2 –f 2 y Genome: Natural graphs:  : Graphs of even size with only duplicated genes  (G) ≥ D(G) - |  |

Algorithm  For graphs not in , red edges = black edges;  For graphs in , similar to genome duplication BUT: Possibly more than one circular fragment. A correction is required Approximation algorithm with tight bounds in O(|  | n )

1, 2, 3, 4, 5, 6, 7, 8 1, {2, 3}, 4, 5, 6, {7, 8} 1, 2, 4, 5, 6, 7 {1, 2}, 4, 6, {5, 7} 1, 4, 6, 5 {1, 4}, {6, 5} 1, 6 1 Paralog pairings Remove one copy of each duplicate Paralog pairings

Conclusion First bioinformatics tools to reconstruct the evolutionary history of a single genome Genome duplication: A linear-time exact algorithm for reversals and/or translocations Segment duplication: A polynomial approximation algorithm with bounds for reversals Extension: Consider the centromere. Some translocations not allowed