1 Evolution of alternative splicing. Mikhail Gelfand, Institute for Information Transmission Problems, Russian Academy of Sciences. Workshop “Gene Annotation Analysis and Alternative Splicing”, Berlin, December 2004

2 Overview: exon-intron structure of orthologous genes (human-mouse; Drosophila-Anopheles); sequence divergence in alternative and constitutive regions; evolution of splicing and regulatory sites; alternative splicing and protein structure

3 Alternative splicing of human (and mouse) genes

4 Exon-intron structure of orthologous genes (human-mouse; Drosophila-Anopheles); sequence divergence in alternative and constitutive regions; alternative splicing and protein structure

5 Data
– Known alternative splicing: HASDB (human, ESTs + mRNAs); ASMamDB (mouse, mRNAs + genes)
– Additional variants: UniGene (human and mouse EST clusters)
– Complete genes and genomic DNA: GenBank (full-length mouse genes); human genome

6 Methods
– TBLASTN: initial identification of orthologs (mRNAs against genomic DNA)
– BLASTN: human mRNAs against the genome
– Pro-EST: spliced alignment of ESTs and mRNAs against genomic DNA
– Pro-Frame: spliced alignment of proteins against genomic DNA
  – confirmation of orthology: same exon-intron structure and >70% identity over the entire protein length
  – analysis of conservation of alternative splicing: conservation of exons or parts of exons; conservation of sites
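The two orthology criteria listed under Methods can be expressed as a simple check. This is a minimal Python sketch, not the Pro-Frame procedure itself: it assumes a precomputed gapped protein alignment, measures identity over the full alignment length (gap columns count as mismatches), and encodes each exon-intron structure as hypothetical (alignment column, intron phase) pairs.

```python
from typing import List, Tuple

# Assumed encoding: (alignment column after which the intron sits, intron phase 0/1/2)
Intron = Tuple[int, int]

def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over the whole alignment; gap columns count as mismatches."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("expected two non-empty alignment rows of equal length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)

def same_exon_intron_structure(introns_a: List[Intron], introns_b: List[Intron]) -> bool:
    """Same structure = identical intron positions and phases in alignment coordinates."""
    return sorted(introns_a) == sorted(introns_b)

def confirm_orthology(aligned_a: str, aligned_b: str,
                      introns_a: List[Intron], introns_b: List[Intron],
                      min_identity: float = 70.0) -> bool:
    """Both criteria must hold: same structure and identity above the cut-off."""
    return (same_exon_intron_structure(introns_a, introns_b)
            and percent_identity(aligned_a, aligned_b) > min_identity)

# Toy data: 5 of 6 columns identical (83.3%), one shared phase-1 intron.
print(confirm_orthology("MKVLLA", "MKVLIA", [(2, 1)], [(2, 1)]))  # True
```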

7 166 gene pairs, by species with known alternative splicing: human only, 42; both species, 84; mouse only, 40 (126 human and 124 mouse genes with known alternative splicing)

8 Elementary alternatives: cassette exon, alternative donor site, alternative acceptor site, retained intron
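To make the four elementary alternatives concrete, here is a minimal Python sketch that compares two isoforms of one gene. It assumes isoforms are given as sorted lists of plus-strand exon coordinates (start, end), inclusive; the function names, and the choice to ignore differences at transcript termini, are illustrative assumptions rather than the study's Pro-EST/Pro-Frame procedure.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), inclusive, plus-strand coordinates (assumed)

def introns(exons: List[Exon]) -> List[Exon]:
    """Introns between consecutive exons of a sorted isoform."""
    return [(left[1] + 1, right[0] - 1) for left, right in zip(exons, exons[1:])]

def overlaps(x: Exon, y: Exon) -> bool:
    return x[0] <= y[1] and y[0] <= x[1]

def contains(outer: Exon, inner: Exon) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def elementary_alternatives(a: List[Exon], b: List[Exon]) -> List[Tuple[str, Exon]]:
    """Elementary alternatives present in isoform `a` relative to isoform `b`."""
    events = []
    b_introns = introns(b)
    for i, exon in enumerate(a):
        hits = [j for j, other in enumerate(b) if overlaps(exon, other)]
        if not hits:
            # An extra exon lying wholly inside an intron of `b`.
            if any(contains(intron, exon) for intron in b_introns):
                events.append(("cassette exon", exon))
            continue
        if any(contains(exon, intron) for intron in b_introns):
            events.append(("retained intron", exon))
            continue
        j = hits[0]  # exactly one overlapping exon of `b` remains
        other = b[j]
        # Only internal boundaries are splice sites; terminal ends are transcript start/end.
        # On the plus strand the acceptor is the exon start, the donor the exon end.
        if exon[0] != other[0] and i > 0 and j > 0:
            events.append(("alternative acceptor", exon))
        if exon[1] != other[1] and i < len(a) - 1 and j < len(b) - 1:
            events.append(("alternative donor", exon))
    return events

b = [(100, 200), (300, 400), (500, 600)]
a = [(100, 220), (250, 270), (300, 400), (500, 600)]
print(elementary_alternatives(a, b))
# [('alternative donor', (100, 220)), ('cassette exon', (250, 270))]
```

Calling the function in both directions (a against b, then b against a) covers alternatives present in either isoform.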

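9 Human genes: conserved (cons.) and non-conserved (non-cons.) elementary alternatives
Type | mRNA cons. | mRNA non-cons. | EST cons. | EST non-cons.
Cassette exons | 56 | 25 | 74 | 26
Alt. donors | 18 | 7 | 16 | 10
Alt. acceptors | 13 | 5 | 19 | 15
Retained introns | 4 | 3 | 5 | 0
Total | 96 | 30 | 114 | 51
Total genes | 45 | 28 | 41 | 44
Conserved elementary alternatives: 69% (EST) to 76% (mRNA). Genes with all isoforms conserved: 57 (45%).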

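10 Mouse genes: conserved (cons.) and non-conserved (non-cons.) elementary alternatives
Type | mRNA cons. | mRNA non-cons. | EST cons. | EST non-cons.
Cassette exons | 70 | 5 | 39 | 9
Alt. donors | 24 | 6 | 17 | 6
Alt. acceptors | 15 | 6 | 16 | 9
Retained introns | 8 | 7 | 10 | 4
Total | 117 | 24 | 82 | 28
Total genes | 68 | 22 | 30 | 26
Conserved elementary alternatives: 75% (EST) to 83% (mRNA). Genes with all isoforms conserved: 79 (64%).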
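The summary percentages on slides 9 and 10 follow directly from the table totals. The short Python sketch below recomputes them; the gene denominators (126 human and 124 mouse genes with known alternative splicing) are read from slide 7, and rounding to whole percent is an assumption about how the slides were rounded.

```python
# Conserved / non-conserved totals of elementary alternatives (slides 9 and 10).
COUNTS = {
    "human": {"mRNA": (96, 30), "EST": (114, 51), "genes_all_conserved": 57, "genes": 126},
    "mouse": {"mRNA": (117, 24), "EST": (82, 28), "genes_all_conserved": 79, "genes": 124},
}

def conservation_summary(counts):
    """Rounded percent of conserved elementary alternatives and of fully conserved genes."""
    summary = {}
    for species, c in counts.items():
        for source in ("mRNA", "EST"):
            conserved, non_conserved = c[source]
            summary[(species, source)] = round(100 * conserved / (conserved + non_conserved))
        summary[(species, "genes")] = round(100 * c["genes_all_conserved"] / c["genes"])
    return summary

for key, pct in conservation_summary(COUNTS).items():
    print(key, f"{pct}%")
```

The non-conserved shares quoted on slide 11 (24-31% for human, 17-25% for mouse) are the complements of the elementary-alternative values.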

11 Real or aberrant non-conserved AS?
– 24-31% of human vs. 17-25% of mouse elementary alternatives are not conserved
– 55% of human vs. 36% of mouse genes have at least one non-conserved variant
– Denser coverage of human genes by ESTs:
  – picks up rare (tissue- and stage-specific), and hence younger, variants
  – picks up aberrant (non-functional) variants
– 17-24% of mRNA-derived elementary alternatives are non-conserved (compared to 25-32% of EST-derived ones)

12 Smoothelin (figure: human and mouse isoforms; common isoform; human-specific donor site; mouse-specific cassette exon)

13 Autoimmune regulator (figure: human and mouse isoforms; common isoform; retained intron, with downstream exons read in two frames)

14 Na/K-ATPase gamma subunit (Fxyd2) (figure: human and mouse isoforms; (deleted) intron; common isoform; alternative acceptor site within (inserted) intron)

15 Comparison to other studies. Modrek and Lee, 2003: skipped exons
– 98% of constitutive exons are conserved
– 98% of major-form exons are conserved
– 28% of minor-form exons are conserved
– the inclusion level is a good predictor of conservation
– inclusion levels of conserved exons in human and mouse are highly correlated

16 Are non-conserved minor-form exons errors? No:
– minor-form exons are supported by multiple ESTs
– 28% of minor-form exons are upregulated in one specific tissue
– 70% of tissue-specific exons are not conserved
– splicing signals of conserved and non-conserved exons are similar

17 Thanaraj et al., 2003: extrapolation from EST comparisons
– 61% (47-86%) of alternative splice junctions are conserved
– 74% (71-78%) of constitutive splice junctions are conserved
– the former number is consistent with other studies, whereas the latter seems to be an underestimate

18 Regulation of alternative splicing: introns
– Brudno et al., 2001: UGCAUG is over-represented downstream of tissue-specific exons (brain, muscle).
– Sorek and Ast, 2003: enhanced conservation (between human and mouse) of intronic sequences flanking alternatively spliced exons; UGCAUG is over-represented in the conserved regions.
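Over-representation of an intronic motif such as UGCAUG can be illustrated by comparing its density in windows downstream of the exons of interest with its density in control windows. This Python sketch is a minimal illustration under stated assumptions: the caller supplies the window sequences (DNA or RNA alphabet; UGCAUG is matched as TGCATG in DNA), and the simple density ratio stands in for whatever statistics Brudno et al. and Sorek and Ast actually used.

```python
def count_motif(seq: str, motif: str = "UGCAUG") -> int:
    """Count overlapping occurrences of motif; RNA and DNA alphabets are both accepted."""
    seq = seq.upper().replace("U", "T")
    motif = motif.upper().replace("U", "T")
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)

def density_per_kb(windows, motif="UGCAUG"):
    """Occurrences per 1000 possible motif start positions, pooled over all windows."""
    k = len(motif)
    positions = sum(max(len(w) - k + 1, 0) for w in windows)
    hits = sum(count_motif(w, motif) for w in windows)
    return 1000.0 * hits / positions if positions else 0.0

def enrichment(target_windows, control_windows, motif="UGCAUG"):
    """Ratio of motif density downstream of target exons to density in control windows."""
    target = density_per_kb(target_windows, motif)
    control = density_per_kb(control_windows, motif)
    if control == 0:
        return float("inf") if target > 0 else float("nan")
    return target / control

# Hypothetical downstream-intron windows for tissue-specific vs. control exons.
print(enrichment(["TTGCATGCATG", "AAATGCATGA"], ["AAAAAAATGCATGAAAAAAA", "CCCCCCCCCC"]))
```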

19 Exon-intron structure of orthologous genes (human-mouse; Drosophila-Anopheles); sequence divergence in alternative and constitutive regions; alternative splicing and protein structure

20 Fruit fly and mosquito: technically more difficult than human-mouse:
– incomplete genomes
– difficulties in alignment, especially at gene termini
– changes in exon-intron structure irrespective of alternative splicing (~4.7 introns per gene in Drosophila vs. ~3.5 introns per gene in Anopheles)

21 Filtering of the dataset
– FlyBase: alternatively spliced fruit fly genes and all their protein isoforms
– Non-canonical sites: exclude the isoform
– Pro-Frame alignment of all isoforms with the fruit fly genome; frameshift or in-frame stop for at least one isoform: exclude the gene
– No constitutive segments inside the gene: exclude the gene
→ list of filtered fruit fly genes
– ENSEMBL: list of orthologous pairs (mosquito genes)
– Pro-Frame alignment of all fruit fly isoforms with the mosquito genome
– Similarity below 30% for all isoforms: exclude the orthologous pair
– Poly-N within the aligned region of the mosquito genome for at least one isoform: exclude the orthologous pair
→ set of filtered orthologous pairs
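The filtering cascade can be read as a sequence of predicates applied first to isoforms, then to genes, then to orthologous pairs. This Python sketch assumes the alignment-derived flags (canonical sites, frameshift or in-frame stop, similarity to the mosquito genome, poly-N in the aligned mosquito region) are already computed; all field names are hypothetical, and only the 30% similarity cut-off comes from the slide.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Isoform:
    name: str
    canonical_sites: bool        # all splice sites canonical
    frameshift_or_stop: bool     # frameshift or in-frame stop in the fly-genome alignment
    mosquito_similarity: float   # similarity of the alignment to the mosquito genome (0..1)
    poly_n: bool                 # poly-N inside the aligned mosquito region

@dataclass
class Gene:
    name: str
    isoforms: List[Isoform]
    has_constitutive_segment: bool   # assumed precomputed from the retained isoforms
    mosquito_ortholog: bool          # present in the ENSEMBL list of orthologous pairs

def filter_fly_genes(genes: List[Gene]) -> List[Gene]:
    kept = []
    for gene in genes:
        isoforms = [iso for iso in gene.isoforms if iso.canonical_sites]  # exclude isoform
        if not isoforms or any(iso.frameshift_or_stop for iso in isoforms):
            continue                                                      # exclude gene
        if not gene.has_constitutive_segment:
            continue                                                      # exclude gene
        kept.append(Gene(gene.name, isoforms, gene.has_constitutive_segment,
                         gene.mosquito_ortholog))
    return kept

def filter_pairs(genes: List[Gene], min_similarity: float = 0.30) -> List[str]:
    pairs = []
    for gene in filter_fly_genes(genes):
        if not gene.mosquito_ortholog:
            continue
        if all(iso.mosquito_similarity < min_similarity for iso in gene.isoforms):
            continue                                                      # exclude pair
        if any(iso.poly_n for iso in gene.isoforms):
            continue                                                      # exclude pair
        pairs.append(gene.name)
    return pairs
```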

22 Classification of exons and coding segments: for each pair of isoforms, define mutually Exclusive exon, Cassette exon, retained Intron, alternative Acceptor site, alternative Donor site; then merge these definitions over all pairs for a gene
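Merging the pairwise definitions can be sketched as collecting, for each coding segment, the type letters (E, C, I, A, D) it receives over all isoform pairs. This Python sketch assumes a caller-supplied, hypothetical `classify_pair` function that returns the letter of each alternative segment for one pair of isoforms; taking the union, and labelling never-alternative segments as constitutive, is one plausible merging rule, not necessarily the one used in the study.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, Iterable, List

def merge_over_pairs(isoforms: List[str],
                     segments: Iterable[str],
                     classify_pair: Callable[[str, str], Dict[str, str]]) -> Dict[str, str]:
    """Return segment -> merged type letters (e.g. "AD"), or "const" if never alternative."""
    letters = defaultdict(set)
    for x, y in combinations(isoforms, 2):          # every unordered pair of isoforms
        for segment, letter in classify_pair(x, y).items():
            letters[segment].add(letter)
    return {seg: "".join(sorted(letters[seg])) if letters[seg] else "const"
            for seg in segments}

# Hypothetical pairwise results for a gene with three isoforms and three segments.
pairwise = {("i1", "i2"): {"s2": "C"},
            ("i1", "i3"): {"s2": "C", "s3": "D"},
            ("i2", "i3"): {"s3": "A"}}
print(merge_over_pairs(["i1", "i2", "i3"], ["s1", "s2", "s3"], lambda x, y: pairwise[(x, y)]))
# {'s1': 'const', 's2': 'C', 's3': 'AD'}
```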

23 How to define conservation of fruit fly alternative exons. The alignment of an exon may depend on the isoform. In the cases shown in the figure, shorter exons are assumed to be conserved, whereas longer ones are considered missing. (Figure: isoforms 1 and 2, with segments marked “missing exon”; legend: similarity in alignments of all isoforms including this segment was less than 35%; similarity in alignment of at least one isoform including this segment was greater than 35%.)
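The 35% rule can be sketched as follows: a segment counts as conserved if at least one isoform containing it aligns to the mosquito genome with similarity above 35%, and as missing otherwise. This Python sketch assumes per-isoform, per-segment similarities are already available (the input layout is hypothetical) and treats a value of exactly 35%, which the slide leaves open, as missing. Because the shared part of an alternatively extended exon and its extension are separate segments, the shorter form can be conserved while the extension is missing.

```python
from typing import Dict

def segment_conservation(similarity: Dict[str, Dict[str, float]],
                         threshold: float = 0.35) -> Dict[str, str]:
    """similarity[isoform][segment] = similarity of that isoform's alignment over the segment."""
    best: Dict[str, float] = {}
    for per_segment in similarity.values():
        for segment, value in per_segment.items():
            best[segment] = max(value, best.get(segment, 0.0))  # best isoform decides
    return {seg: "conserved" if value > threshold else "missing" for seg, value in best.items()}

# Toy example: isoform 2 uses a longer form of the exon (core + extension).
print(segment_conservation({"isoform1": {"core": 0.62},
                            "isoform2": {"core": 0.55, "extension": 0.20}}))
# {'core': 'conserved', 'extension': 'missing'}
```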

24 Conservation of fruit fly coding segments in the mosquito genome. Small (curated) sample
Type of segment | Missing | Conserved | Total
Left marginal (alternative) | 46 (77%) | 14 (23%) | 60 (12%)
Internal alternative | 22 (55%) | 18 (45%) | 40 (8%)
Internal constitutive | 83 (24%) | 264 (76%) | 347 (69%)
Right marginal (alternative) | 31 (56%) | 24 (44%) | 55 (11%)
Total | 182 (36%) | 320 (64%) | 502 (100%)

25 Conservation of fruit fly coding segments in the mosquito genome. Large (non-curated) sample
Type of segment | Missing | Conserved | Total
Left marginal (alternative) | 858 (57%) | 639 (43%) | 1497 (23%)
Internal alternative | 215 (55%) | 178 (45%) | 393 (6%)
Internal constitutive | 903 (23%) | 2999 (77%) | 3902 (59%)
Right marginal (alternative) | 414 (53%) | 369 (47%) | 783 (12%)
Total | 2390 (36%) | 4185 (64%) | 6575 (100%)

26 Classification of slice events for fruit fly exons: divided exon, joined exon, exactly conserved exon, mixed

27 Different types of events for the same exon, depending on the isoform (figure: isoforms 1 and 2 in Dr (Drosophila) aligned with An (Anopheles); legend: slice, exon; event labels d, j, e)

28 Types of elementary alternatives and conservation of fruit fly exons in the mosquito genome. Large (non-curated) sample, internal exons
Type | missing | mixed | joined | divided | exact
constitutive | 728 (23%) | 212 (7%) | 754 (23%) | 407 (13%) | 1356 (42%)
Donor site | 229 (50%) | 21 (5%) | 52 (11%) | 47 (10%) | 130 (28%)
Acceptor site | 390 (43%) | 45 (5%) | 133 (15%) | 124 (14%) | 250 (28%)
retained Intron | 37 (70%) | 3 (6%) | 2 (4%) | 8 (15%) | 6 (11%)
Cassette exon | 90 (59%) | 4 (3%) | 9 (6%) | 6 (4%) | 50 (33%)
Exclusive exon | 10 (15%) | 1 (1%) | | | 55 (82%)

29 Types of elementary alternatives and conservation of fruit fly exons in the mosquito genome. Large (non-curated) sample, internal exons

30 Fruit fly and mosquito. The general results are the same as for the human-mouse comparison: constitutive segments are more conserved than alternative ones:
– 75% of constitutive and 45% of alternative segments are conserved
– constitutive exons: >50% conserved exactly, ~25% intron in Drosophila, ~8% intron in Anopheles
– conservation of alternatives: 36% of cassette exons, 51% of donor sites, 63% of acceptor sites, 83% of mutually exclusive exons

31 Exon-intron structure of orthologous genes (human-mouse; Drosophila-Anopheles); sequence divergence in alternative and constitutive regions; alternative splicing and protein structure

32 Concatenations of constitutive and alternative regions in all genes: different evolutionary rates. Columns (left to right): (1) constitutive regions; (2-4) alternative regions: N-end, internal, C-end. Relatively more non-synonymous substitutions in alternative regions (higher dN/dS ratio); less amino acid identity in alternative regions.

33 Genes with lengths of both constitutive and alternative regions > 80 nt. Horizontal axis: difference in dN/dS between constitutive and alternative regions; vertical axis: number of genes. Violet: dN/dS in constitutive regions > dN/dS in alternative regions; yellow: dN/dS in constitutive regions < dN/dS in alternative regions.
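The slides do not say how dN/dS was estimated. As an illustration only, this Python sketch assumes Nei-Gojobori-style inputs (numbers of non-synonymous and synonymous sites, N and S, and observed differences, Nd and Sd, already counted for each region) and applies the Jukes-Cantor correction d = -3/4 ln(1 - 4p/3) to pN = Nd/N and pS = Sd/S. A concatenation over genes is modelled by summing the counts before taking the ratio; the per-gene difference is taken as constitutive minus alternative, a sign convention assumed here.

```python
import math
from typing import Iterable, Tuple

Counts = Tuple[float, float, float, float]  # (N sites, S sites, N differences, S differences)

def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)

def dn_ds(counts: Counts) -> float:
    n_sites, s_sites, n_diffs, s_diffs = counts
    dn = jukes_cantor(n_diffs / n_sites)
    ds = jukes_cantor(s_diffs / s_sites)
    if ds == 0:
        raise ZeroDivisionError("no synonymous differences: dN/dS is undefined")
    return dn / ds

def concatenate(regions: Iterable[Counts]) -> Counts:
    """Concatenate regions across genes by summing site and difference counts."""
    n, s, nd, sd = (sum(column) for column in zip(*regions))
    return (n, s, nd, sd)

def dn_ds_difference(constitutive: Counts, alternative: Counts) -> float:
    """Per-gene difference (constitutive minus alternative; sign convention assumed)."""
    return dn_ds(constitutive) - dn_ds(alternative)

# Hypothetical counts for two genes' constitutive regions and one alternative region.
constitutive_regions = [(300, 100, 6, 20), (450, 150, 9, 30)]
alternative_regions = [(90, 30, 6, 6)]
print(dn_ds(concatenate(constitutive_regions)), dn_ds(concatenate(alternative_regions)))
```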

34 279 proteins from SwissProt+TrEMBL with “varsplic” features
Category | Constitutive | Alternative | % alternative of all
Length | 199270 | 66054 | 25%
All SNPs | 1126 | 368 | 25%
Synonymous | 576 (51%) | 167 (45%) | 22%
Benign | 401 (36%) | 141 (38%) | 26%
Damaging | 149 (13%) | 60 (16%) | 29%
Again, there is some evidence of positive selection towards diversity. This is not due to aberrant ESTs (only protein data are considered).
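The argument on this slide compares each SNP class's share in alternative regions with the share expected from length alone (about 25%). The Python sketch below recomputes the last column from the counts in the table; it is plain arithmetic and includes no significance test.

```python
# Counts from the table: constitutive vs. alternative regions of 279 proteins.
CONSTITUTIVE = {"length": 199270, "all SNPs": 1126, "synonymous": 576, "benign": 401, "damaging": 149}
ALTERNATIVE = {"length": 66054, "all SNPs": 368, "synonymous": 167, "benign": 141, "damaging": 60}

def alternative_share(category: str) -> float:
    """Percent of a category (length or SNP class) that falls into alternative regions."""
    alt = ALTERNATIVE[category]
    return 100.0 * alt / (alt + CONSTITUTIVE[category])

expected = alternative_share("length")  # share expected if SNPs were spread uniformly by length
for category in CONSTITUTIVE:
    share = alternative_share(category)
    print(f"{category:12s} {share:5.1f}%  ({share - expected:+.1f} vs. length share)")
```

Damaging SNPs come out at about 29% of the total against a length share of about 25%, while synonymous SNPs fall below it; this is the pattern the slide reads as evidence of positive selection.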

35 Exon-intron structure of orthologous genes (human-mouse; Drosophila-Anopheles); sequence divergence in alternative and constitutive regions; alternative splicing and protein structure

36 Data Alternatively spliced genes (proteins) from SwissProt –human –mouse Protein structures from PDB Domains from InterPro –SMART –Pfam –Prosite –etc.

37 Alternative splicing avoids disrupting domains (and non-domain units). Control: fix the domain structure and place alternative regions randomly.

38 … and this is not simply a consequence of the (disputed) exon-domain correlation
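The control on slide 37, keeping the domain structure fixed and placing alternative regions at random, can be sketched as a simple permutation test. In this Python sketch a region is assumed to disrupt a domain if it overlaps the domain without covering it entirely; coordinates are 1-based inclusive residue positions, each region keeps its length and is placed uniformly at random within the protein, and the function names are hypothetical.

```python
import random
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive residue positions (assumed)

def disrupts(region: Interval, domain: Interval) -> bool:
    """Assumed definition: the region overlaps the domain but does not cover it entirely."""
    overlap = region[0] <= domain[1] and domain[0] <= region[1]
    covers = region[0] <= domain[0] and domain[1] <= region[1]
    return overlap and not covers

def count_disrupting(regions: List[Interval], domains: List[Interval]) -> int:
    return sum(1 for r in regions if any(disrupts(r, d) for d in domains))

def random_control(protein_length: int, domains: List[Interval], regions: List[Interval],
                   trials: int = 1000, seed: int = 0):
    """Keep domains fixed; place each alternative region (same length) uniformly at random.

    Returns (observed, mean number under random placement, fraction of trials with no more
    disruptions than observed); the last value is an empirical one-sided p-value.
    """
    rng = random.Random(seed)
    observed = count_disrupting(regions, domains)
    null = []
    for _ in range(trials):
        placed = []
        for start, end in regions:
            length = end - start + 1
            new_start = rng.randint(1, protein_length - length + 1)
            placed.append((new_start, new_start + length - 1))
        null.append(count_disrupting(placed, domains))
    expected = sum(null) / trials
    p_value = sum(1 for x in null if x <= observed) / trials
    return observed, expected, p_value

# Hypothetical protein: two domains, two alternative regions lying between or after them.
print(random_control(300, [(20, 120), (150, 260)], [(121, 149), (270, 290)]))
```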

39 Positive selection towards domain shuffling (not simply avoidance of disrupting domains)

40 Short (<50 aa) alternative splicing events within domains target protein functional sites (figure, panel c: expected vs. observed numbers of Prosite patterns affected and unaffected, and of FT positions affected and unaffected)

41 An attempt at integration
– AS is often young (as opposed to degenerating)
– young AS isoforms are often minor and tissue-specific
– … but still functional, although unique isoforms may be the result of aberrant splicing
– AS regions show evidence for positive selection:
  – excess of damaging SNPs
  – excess of non-synonymous codon substitutions

42 What to do. Each isoform (alternative region) can be characterized:
– by conservation (between genomes)
– if conserved, by selection (positive vs. negative): human-mouse, also adding rat; comparison of species of Drosophila and of Caenorhabditis
– by the pattern of SNPs (synonymous, benign, damaging)
– by tissue specificity, in particular whether it is cancer-specific
– by the degree of inclusion (major/minor)
– by functionality (for isoforms): whether it generates a frameshift, and how bad it is (the distance between the stop codon and the last exon-exon junction)
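Two of the functionality criteria listed here are simple to compute from transcript coordinates. This Python sketch assumes the alternative region's length in nucleotides and the mature transcript's exon lengths are known, and it measures the distance from the 3' end of the stop codon to the last exon-exon junction (a convention chosen for illustration). The optional 50-nt threshold reflects the widely used rule of thumb that stop codons lying more than about 50-55 nt upstream of the last junction tend to trigger nonsense-mediated decay; the slide itself gives no cut-off.

```python
from typing import List, Optional

def is_frameshifting(alt_region_length: int) -> bool:
    """Including or skipping a region whose length is not a multiple of 3 shifts the frame."""
    return alt_region_length % 3 != 0

def stop_to_last_junction(exon_lengths: List[int], stop_start: int) -> Optional[int]:
    """Nucleotides between the end of the stop codon and the last exon-exon junction.

    exon_lengths: exon lengths of the mature transcript, 5' to 3'.
    stop_start: 0-based transcript position of the first base of the stop codon.
    Positive: stop codon upstream of the last junction; negative: inside the last exon.
    Returns None for single-exon transcripts (no junction).
    """
    if len(exon_lengths) < 2:
        return None
    last_junction = sum(exon_lengths[:-1])  # transcript position where the last exon begins
    return last_junction - (stop_start + 3)

def likely_nmd_target(exon_lengths: List[int], stop_start: int, threshold: int = 50) -> bool:
    """Rule-of-thumb check; the threshold is an assumption, not taken from the slide."""
    distance = stop_to_last_junction(exon_lengths, stop_start)
    return distance is not None and distance > threshold

# Toy transcript of three 100-nt exons; the last junction is at position 200.
print(stop_to_last_junction([100, 100, 100], 120))  # 77
print(likely_nmd_target([100, 100, 100], 120))      # True
```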

43 What to expect
– Cancer-specific isoforms will be less functional and more often non-conserved
– The set of non-conserved isoforms will contain a larger fraction of non-functional isoforms, and this may influence evolutionary conclusions at the sequence level
– Still, after removal of non-functional isoforms, one would see positive selection in alternative regions (more non-synonymous substitutions compared to constitutive regions, etc.), especially in tissue-specific ones

44 References
Nurtdinov RN, Artamonova II, Mironov AA, Gelfand MS (2003) Low conservation of alternative splicing patterns in the human and mouse genomes. Human Molecular Genetics 12: 1313-1320.
Kriventseva EV, Koch I, Apweiler R, Vingron M, Bork P, Gelfand MS, Sunyaev S (2003) Increase of functional diversity by alternative splicing. Trends in Genetics 19: 124-128.
Brudno M, Gelfand MS, Spengler S, Zorn M, Dubchak I, Conboy JG (2001) Computational analysis of candidate intron regulatory elements for tissue-specific alternative pre-mRNA splicing. Nucleic Acids Research 29: 2338-2348.
Mironov AA, Fickett JW, Gelfand MS (1999) Frequent alternative splicing of human genes. Genome Research 9: 1288-1293.

45 Acknowledgements
Discussions: Vsevolod Makeev (GosNIIGenetika), Eugene Koonin (NCBI), Igor Rogozin (NCBI), Dmitry Petrov (Stanford)
Support: Ludwig Institute for Cancer Research, Howard Hughes Medical Institute, Russian Fund of Basic Research, Russian Academy of Sciences

46 Authors
Andrei Mironov (Moscow State University): spliced alignment
Ramil Nurtdinov (Moscow State University): human/mouse comparison
Irena Artamonova (Institute of Bioorganic Chemistry, now Institute of Bioinformatics, GSF): human/mouse comparison, MAGEA family
Dmitry Malko (GosNIIGenetika): Drosophila/Anopheles comparison
Inna Dubchak (Lawrence Berkeley Lab): sites
Michael Brudno (UC Berkeley, now Stanford): sites
Ekaterina Ermakova (Moscow State University): evolution of alternative/constitutive regions
Vasily Ramensky (Institute of Molecular Biology): SNPs
Eugenia Kriventseva (EBI, now BASF): protein structure
Shamil Sunyaev (EMBL, now Harvard University Medical School): protein structure

