Alternative splicing: A playground of evolution Mikhail Gelfand Research and Training Center for Bioinformatics Institute for Information Transmission.

Presentation transcript:

Alternative splicing: A playground of evolution Mikhail Gelfand Research and Training Center for Bioinformatics Institute for Information Transmission Problems RAS, Moscow, Russia October 2006

[Figure: % of alternatively spliced human and mouse genes by year of publication. Series: human (genome / random sample), human (individual chromosomes), mouse (genome / random sample). Gene sets: all genes, only multiexon genes, genes with high EST coverage.]

Plan
Evolution of alternative exon-intron structure
–mammals: human, mouse, dog
–dipteran insects: Drosophila melanogaster, D. pseudoobscura, Anopheles gambiae
Evolutionary rate in constitutive and alternative regions
–human / mouse
–D. melanogaster / D. pseudoobscura
–human-chimpanzee / human SNPs

Elementary alternatives
–cassette exon
–alternative donor site
–alternative acceptor site
–retained intron

Alternative exon-intron structure in the human, mouse and dog genomes
EDAS: a database of human alternative splicing (human genome + GenBank + EST data from RefSeq)
–considers cassette exons and alternative splicing sites
–functionality: potentially translated vs. NMD-inducing elementary alternatives
Human-mouse-dog triples of orthologous genes
We follow the fate of human alternative sites and exons in the mouse and dog genomes
Each human AS isoform is aligned (spliced alignment) to the mouse and dog genomes.
Definition of conservation:
–conservation of the corresponding region (a homologous exon is actually present in the considered genome);
–conservation of splicing sites (GT and AG)
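The second conservation criterion (GT and AG splicing sites) can be illustrated with a minimal sketch. It assumes an internal exon on the forward strand with 0-based, half-open coordinates; the function name and the toy sequences are hypothetical, and this is not the EDAS pipeline itself, which also relies on spliced alignment to locate the homologous region.

```python
def has_canonical_splice_sites(genome: str, exon_start: int, exon_end: int) -> bool:
    """Return True if an internal exon is flanked by AG (acceptor) and GT (donor).

    Assumptions (not from the slides): forward strand, 0-based half-open
    exon coordinates [exon_start, exon_end).
    """
    # An internal exon needs two intronic bases on each side.
    if exon_start < 2 or exon_end + 2 > len(genome):
        return False
    acceptor = genome[exon_start - 2:exon_start].upper()  # last two intron bases upstream
    donor = genome[exon_end:exon_end + 2].upper()          # first two intron bases downstream
    return acceptor == "AG" and donor == "GT"


# Toy example: lower case marks intronic sequence, upper case the exon.
conserved = "ccccagATGGCCgtaaaa"
donor_lost = "ccccagATGGCCgcaaaa"
print(has_canonical_splice_sites(conserved, 6, 12))
print(has_canonical_splice_sites(donor_lost, 6, 12))
```

A region that aligns well but has lost either dinucleotide would count as non-conserved under this criterion.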

Caveats
–we consider only the possibility of AS in mouse and dog: we do not require the actual existence of corresponding isoforms in known transcriptomes
–we do not consider situations where an alternative human exon (or site) is constitutive in mouse or dog
–of course, functionality assignments (translated / NMD-inducing) are not very reliable

Translated cassette exons constitutive

NMD-inducing cassette exons

Observations
Predominantly included exons are highly conserved irrespective of function
Predominantly skipped translated exons are more conserved than NMD-inducing ones
Numerous lineage-specific losses
–more in mouse than in dog
Still, ~40% of skipped (<1% inclusion) exons are conserved in at least one lineage

Alternative donor and acceptor sites: same trends
Higher conservation of ~uniformly used sites
Internal sites are more conserved than external ones (as expected)

Alternative exon-intron structure in fruit flies and the malarial mosquito
Same procedure (AS data from FlyBase)
–cassette exons, splicing sites
–also mutually exclusive exons, retained introns
Follow the fate of D. melanogaster exons in the D. pseudoobscura and Anopheles genomes
Technically more difficult:
–incomplete genomes
–the quality of alignment with the Anopheles genome is lower
–frequent intron insertion/loss (~4.7 introns per gene in Drosophila vs. ~3.5 introns per gene in Anopheles)

Conservation of coding segments
                                       constitutive segments   alternative segments
D. melanogaster – D. pseudoobscura     97%                     75-80%
D. melanogaster – Anopheles gambiae    77%                     ~45%

Conservation of D. melanogaster elementary alternatives in D. pseudoobscura genes
[Figure legend: blue – exact; green – divided exons; yellow – joined exons; orange – mixed; red – non-conserved]
retained introns are the least conserved (are all of them really functional?)
mutually exclusive exons are as conserved as constitutive exons

Conservation of D. melanogaster elementary alternatives in Anopheles gambiae genes
[Figure legend: blue – exact; green – divided exons; yellow – joined exons; orange – mixed; red – non-conserved]
~30% joined, ~10% divided exons (fewer introns in Aga)
mutually exclusive exons are conserved exactly
cassette exons are the least conserved

CG1517: cassette exon in Drosophila, alternative acceptor site in Anopheles [Figure panels: Dme, Dps; Aga]

CG31536: cassette exon in Drosophila, shorter cassette exon and alternative donor site in Anopheles [Figure panels: Dme, Dps; Aga]

Evolutionary rate in constitutive and alternative regions
Human and mouse orthologous genes
Estimation of the dN/dS ratio: a higher fraction of non-synonymous (amino acid-changing) substitutions => weaker stabilizing (or stronger positive) selection
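To make the dN/dS ratio concrete, here is a simplified sketch in the spirit of the Nei-Gojobori counting approach. It is an illustration under stated assumptions and not necessarily the estimator used for these slides: sequences are assumed to be aligned in frame without gaps, codons that differ at more than one position are skipped when counting differences, changes to stop codons are treated as non-synonymous, and no multiple-hit correction is applied. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon: str) -> float:
    """Expected number of synonymous sites in a codon (between 0 and 3)."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                sites += 1 / 3  # each of the 3 possible changes weighs 1/3
    return sites


def dn_ds(seq1: str, seq2: str):
    """Return (pN, pS, pN/pS) for two in-frame, gap-free aligned coding sequences."""
    syn_sites = nonsyn_sites = 0.0
    syn_diffs = nonsyn_diffs = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue  # skip codons with ambiguous characters
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        syn_sites += s
        nonsyn_sites += 3 - s
        # Simplification: only codons with exactly one mismatch are classified.
        if sum(a != b for a, b in zip(c1, c2)) == 1:
            if CODON_TABLE[c1] == CODON_TABLE[c2]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    p_n = nonsyn_diffs / nonsyn_sites if nonsyn_sites else 0.0
    p_s = syn_diffs / syn_sites if syn_sites else 0.0
    ratio = p_n / p_s if p_s else None  # undefined without synonymous differences
    return p_n, p_s, ratio


# Toy example: AAA->AAG is synonymous (Lys), GGT->GCT is non-synonymous (Gly->Ala).
print(dn_ds("AAAGGT", "AAGGCT"))
```

Computing the ratio separately for concatenated constitutive and alternative regions of the same gene would give the kind of comparison the slides describe.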

Concatenates of constitutive and alternative regions in all genes: different evolutionary rates
Columns (left-to-right) – (1) constitutive regions; (2–4) alternative regions: N-end, internal, C-end
Relatively more non-synonymous substitutions in alternative regions (higher dN/dS ratio)
Less amino acid identity in alternative regions

Individual genes: the ratio of non-synonymous to synonymous substitution rates, dN/dS, tends to be larger in alternative regions (vertical axis) than in constitutive regions (horizontal axis)

Non-symmetrical histogram of dN/dS(const) – dN/dS(alt)
Black: shadow of the left half.
In a larger fraction of genes dN/dS(const) < dN/dS(alt), especially for larger values

The same effect is seen in: N-terminal, internal, C-terminal parts

Drosophila: less selection in alternative regions?
[Figure categories: more mutations in alt. regions; similar level of mutations; more mutations in const. regions]
In a majority of genes, both synonymous and non-synonymous mutation rates are higher in alternative regions than in constitutive regions

Different behavior of N-terminal, internal and C-terminal alternatives
N-terminal alternatives: most genes have a higher synonymous substitution rate in alt. regions; most genes have stronger stabilizing selection in alt. regions
Internal alternatives: intermediate situation
C-terminal alternatives: more non-synonymous substitutions and fewer synonymous substitutions => weaker stabilizing selection in alternative regions

The McDonald-Kreitman test: evidence for positive selection in (minor isoform) alternative regions
Human and chimpanzee genome mismatches vs. human SNPs
Exons conserved in mouse and/or dog
Genes with at least 60 ESTs (median number)
Fisher's exact test for significance
[Table; numeric values lost in extraction. Columns: Pn/Ps (SNPs), Dn/Ds (genomes), diff., signif. Rows: Const, Major, Minor]
Minor isoform alternative regions:
–more non-synonymous SNPs: Pn(alt_minor) = 0.12% >> Pn(const) = 0.06%
–more non-synonymous mismatches: Dn(alt_minor) = 0.91% >> Dn(const) = 0.37%
Positive selection (as opposed to lower stabilizing selection): α = 1 – (Pn/Ps) / (Dn/Ds) ~ 25% positions
Similar results for all highly covered genes or all conserved exons
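The test on this slide can be sketched as follows. The code computes α = 1 – (Pn/Ps) / (Dn/Ds), where Pn and Ps are non-synonymous and synonymous polymorphisms (SNPs) and Dn and Ds are the corresponding human-chimpanzee mismatches, and adds a two-sided Fisher's exact test on the 2×2 table of counts. The function names and the example counts are hypothetical; the counts are not the values from the slide, whose table did not survive extraction.

```python
from math import comb


def mk_alpha(pn: int, ps: int, dn: int, ds: int) -> float:
    """Estimated fraction of non-synonymous divergence attributed to positive selection."""
    return 1.0 - (pn / ps) / (dn / ds)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more probable than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts for one class of regions (not data from the slides).
pn, ps, dn, ds = 10, 20, 40, 20
print("alpha =", mk_alpha(pn, ps, dn, ds))
print("p =", fisher_exact_two_sided(pn, ps, dn, ds))
```

A positive α together with a small p-value points to an excess of non-synonymous divergence relative to polymorphism, which is the pattern the slide interprets as positive selection rather than merely relaxed stabilizing selection.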

An attempt at integration
AS is often genome-specific
Young AS isoforms are often minor and tissue-specific
… but still functional
–although unique isoforms may result from aberrant splicing
AS regions show evidence for decreased negative selection
–excess non-synonymous codon substitutions
AS regions show evidence for positive selection
–excess non-synonymous SNPs
AS tends to shuffle domains and target functional sites in proteins
Thus AS may serve as a testing ground for new functions without sacrificing old ones

What next?
Multiple genomes
–many Drosophila spp.
–ENCODE data for many mammals
Estimate not only the rate of loss, but also the rate of gain (as opposed to aberrant splicing)
Control for:
–functionality: translated / NMD-inducing
–exon inclusion (or site choice) level: major / minor isoform
–tissue specificity pattern (?)
–type of alternative: N-terminal / internal / C-terminal
Evolution of regulation of AS
Splicing errors and mutations: retained introns, skipped exons, cryptic sites

Acknowledgements
Discussions
–Vsevolod Makeev (GosNIIGenetika)
–Eugene Koonin (NCBI)
–Igor Rogozin (NCBI)
–Dmitry Petrov (Stanford)
–Dmitry Frishman (GSF, TUM)
–Shamil Sunyaev (Harvard University Medical School)
Data
–King Jordan (NCBI)
Support
–Howard Hughes Medical Institute
–INTAS
–Russian Academy of Sciences (program “Molecular and Cellular Biology”)
–Russian Fund of Basic Research

Authors
Andrei Mironov (Moscow State University)
Ramil Nurtdinov (Moscow State University) – human/mouse/dog
Dmitry Malko (GosNIIGenetika) – drosophila/mosquito
Ekaterina Ermakova (Moscow State University, IITP) – Kn/Ks
Vasily Ramensky (Institute of Molecular Biology) – SNPs
Irena Artamonova (GSF/MIPS) – human/mouse, plots
Alexei Neverov (GosNIIGenetika) – functionality of isoforms