A Non-EST-Based Method for Exon-Skipping Prediction Rotem Sorek, Ronen Shemesh, Yuval Cohen, Ortal Basechess, Gil Ast and Ron Shamir Genome Research August.


A Non-EST-Based Method for Exon-Skipping Prediction Rotem Sorek, Ronen Shemesh, Yuval Cohen, Ortal Basechess, Gil Ast and Ron Shamir Genome Research August 2004 楊佳熒

Homologous human and mouse exons are, on average, 85% identical in sequence, but introns are more poorly conserved (Waterston et al., Nature, 2002). [Figure: segments and blocks >300 kb in size with conserved synteny in human, superimposed on the mouse genome.]

References
Sorek, R. and Ast, G. Intronic sequences flanking alternatively spliced exons are conserved between human and mouse. Genome Research, 2003.
Sorek, R., Shamir, R. and Ast, G. How prevalent is functional alternative splicing in the human genome? TRENDS in Genetics, 2004.
Sorek, R. et al. A non-EST-based method for exon-skipping prediction. Genome Research, 2004.

What is exon skipping? [Figure: ESTs from dbEST (est1 to est4) aligned to a gene with exons 1 to 6.]
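An EST provides evidence of exon skipping when it splices two exons together that are not adjacent in the gene. A minimal Python sketch of that test, assuming each EST has already been reduced to the ordered list of gene exon numbers it contains (this representation, the function names and the example ESTs are hypothetical, not the data in the original figure):

```python
from typing import Dict, List, Set


def skipped_exons(est_exon_chain: List[int]) -> Set[int]:
    """Return the exon numbers skipped by one EST.

    The EST is given as the ordered exon numbers it contains, so [2, 4]
    means exon 2 is spliced directly to exon 4 (hypothetical encoding).
    """
    skipped: Set[int] = set()
    for left, right in zip(est_exon_chain, est_exon_chain[1:]):
        # Consecutive exons in the EST that are not adjacent in the gene
        # imply that every exon between them was spliced out.
        skipped.update(range(left + 1, right))
    return skipped


def skipping_support(ests: Dict[str, List[int]]) -> Dict[int, List[str]]:
    """Map each skipped exon to the ESTs that show it being skipped."""
    support: Dict[int, List[str]] = {}
    for name, chain in ests.items():
        for exon in sorted(skipped_exons(chain)):
            support.setdefault(exon, []).append(name)
    return support


# Hypothetical ESTs aligned to a six-exon gene.
ests = {
    "est1": [1, 2, 3, 4],  # no skipping
    "est2": [2, 4, 5],     # skips exon 3
    "est3": [3, 4, 6],     # skips exon 5
    "est4": [1, 3, 4],     # skips exon 2
}
print(skipping_support(ests))  # {3: ['est2'], 5: ['est3'], 2: ['est4']}
```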

Intronic Sequences Flanking Alternatively Spliced Exons Are Conserved Between Human and Mouse Rotem Sorek and Gil Ast Genome Research July 2003

Objective and Result
Two datasets of exons were compared:
1. Alternatively spliced conserved exons: internal exons shown to be skipped by both human ESTs and mouse ESTs (243 exons).
2. Constitutively spliced conserved exons: of 7557 constitutively spliced human internal exons, those also constitutively spliced in mouse ESTs (1966 exons).
Fraction of exons with conserved flanking intronic sequence:
Flanking intron | Alternatively spliced conserved exons | Constitutively spliced conserved exons
Upstream | 223/243 = 92% | 886/1966 = 45%
Downstream | 199/243 = 82% | 691/1966 = 35%
Both upstream and downstream | 188/243 = 77% | 343/1966 = 17%

Per-position conservation near alternatively and constitutively spliced exons
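Per-position conservation can be pictured as the fraction of human/mouse pairs whose bases agree at each offset from the exon boundary. A minimal Python sketch, assuming gap-free, equal-length windows already anchored at the same splice site (the function name and the example windows are hypothetical; the slides do not describe the alignment procedure behind the plot):

```python
from typing import List, Tuple


def per_position_identity(pairs: List[Tuple[str, str]]) -> List[float]:
    """Fraction of (human, mouse) pairs with identical bases at each position.

    Assumes every sequence is gap-free, of equal length and anchored at the
    same exon boundary (a simplifying, hypothetical setup).
    """
    if not pairs:
        return []
    length = len(pairs[0][0])
    counts = [0] * length
    for human, mouse in pairs:
        if len(human) != length or len(mouse) != length:
            raise ValueError("all sequences must have the same length")
        for i, (h, m) in enumerate(zip(human.upper(), mouse.upper())):
            if h == m:
                counts[i] += 1
    return [c / len(pairs) for c in counts]


# Hypothetical 8-nt windows at the start of the downstream intron (GT donor).
windows = [("GTAAGTAT", "GTAAGCAT"), ("GTGAGTCC", "GTGAGTTC")]
print(per_position_identity(windows))  # [1.0, 1.0, 1.0, 1.0, 1.0, 0.5, 0.5, 1.0]
```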

Human KCND3 gene (exons 4 to 8), RefSeq: NM_004980

KCND3 gene exon information

KCND3 exon 6 sequence in bold (the alternatively spliced exon)

Comparison with the chimpanzee genome (NM_004980)

Comparison with the chimpanzee genome (NM_172198)

Review: finding exon-skipping events that are conserved between human and mouse. 243 conserved exon-skipping events (25%); 737 non-conserved exon-skipping events (75%).

How prevalent is functional alternative splicing in the human genome? Rotem Sorek, Ron Shamir and Gil Ast, TRENDS in Genetics Vol. 20, February 2004

Motivation
1. How many of these predicted splice variants are functional?
2. How many are the result of aberrant splicing (noise in the data)?

The influence of an alternatively spliced exon on the protein-coding sequence. [Figure: ... are peptide cassettes.]
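To make the idea of a peptide cassette concrete, here is a rough Python sketch that sorts a cassette exon into three outcomes. It takes a peptide cassette to mean an exon whose length is a multiple of three and that contains no in-frame stop codon, and it assumes the exon begins exactly at a codon boundary; both are simplifying assumptions, and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def classify_cassette_exon(exon_seq: str) -> str:
    """Rough effect of including a cassette exon in the coding sequence.

    Simplifying assumption: the exon starts at a codon boundary (phase 0),
    so its codons are read from position 0 in steps of 3.
    """
    seq = exon_seq.upper()
    if len(seq) % 3 != 0:
        # Including vs skipping the exon shifts the downstream reading frame.
        return "frameshift"
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    if any(codon in STOP_CODONS for codon in codons):
        return "in-frame stop"
    # Length divisible by 3 and no stop: adds or removes len/3 amino acids.
    return "peptide cassette"


for exon in ["ATGGCT", "ATGGC", "GCTTAAGGC"]:
    print(exon, classify_cassette_exon(exon))
```

In this sketch "ATGGCT" is a peptide cassette, "ATGGC" causes a frameshift, and "GCTTAAGGC" carries an in-frame TAA stop.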

Features differentiating between conserved alternatively spliced exons and non-conserved alternatively spliced exons
Feature | Conserved alternatively spliced exons | Non-conserved alternatively spliced exons
Average size (nt) | 87 | 116
Percentage of exons whose length is a multiple of three | 77% (147/191) | 40% (206/510)
Percentage of exons that are "peptide cassettes" | 73% (139/191) | 21% (109/510)
Percentage of exon insertions that result in a longer protein by a nearby stop codon | 61% (27/44) | 8% (25/304)
Percentage of exon insertions that result in a protein <100 amino acids | 9% (4/44) | 30% (91/304)
Average supporting expressed sequences: %62%

Conclusion
1. We show that conserved (functional) cassette exons possess unique characteristics in size, repeat content and influence on the protein.
2. By contrast, most non-conserved cassette exons do not share these characteristics.
3. We conclude that a portion of the exon-skipping evidence in EST databases is not functional and might result from aberrant rather than regulated splicing.

Review: Intronic Sequences Flanking Alternatively Spliced Exons Are Conserved Between Human and Mouse. Conserved intronic sequence upstream / downstream / on both sides: 92% / 82% / 77% of the 243 alternatively spliced conserved exons vs 45% / 35% / 17% of the 1966 constitutively spliced conserved exons.

Review: Features Differentiating Between Alternatively Spliced and Constitutively Spliced Exons
Feature | Alternatively spliced exons | Constitutively spliced exons
Average size (nt) | 87 | 128
Percent of exons whose length is a multiple of 3 | 73% (177/243) | 37% (642/1753)
Percent of exons with upstream intronic elements conserved in mouse | 92% (223/243) | 45% (788/1753)
Percent of exons with downstream intronic elements conserved in mouse | 82% (199/243) | 35% (611/1753)
Percent of exons with both upstream and downstream intronic elements conserved in mouse | 77% (188/243) | 17% (292/1753)

A Non-EST-Based Method for Exon-Skipping Prediction Rotem Sorek, Ronen Shemesh, Yuval Cohen, Ortal Basechess, Gil Ast and Ron Shamir Genome Research August 2004

Objective
1. Our goal was to find a combination of features that would detect a substantial fraction of the alternative exons.
2. The features we chose are:
 1) exon length;
 2) whether the length is divisible by 3;
 3) percent identity when aligned to the mouse counterpart;
 4) conservation of the upstream and downstream intronic sequences.
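The first three features can be computed directly from a human/mouse exon alignment. A minimal Python sketch, assuming the two exons are already aligned with '-' marking gaps and that gap columns count against identity (the paper's exact identity convention is not given on the slides; all names here are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class ExonFeatures:
    length: int             # human exon length in nucleotides
    divisible_by_3: bool    # True if length % 3 == 0
    mouse_identity: float   # percent identity with the aligned mouse exon


def percent_identity(aligned_human: str, aligned_mouse: str) -> float:
    """Percent of alignment columns with identical bases ('-' is a gap).

    Assumed convention: every column, including gap columns, is in the
    denominator, and gap-gap columns never count as matches.
    """
    if len(aligned_human) != len(aligned_mouse) or not aligned_human:
        raise ValueError("need two non-empty aligned sequences of equal length")
    matches = sum(
        1
        for h, m in zip(aligned_human.upper(), aligned_mouse.upper())
        if h == m and h != "-"
    )
    return 100.0 * matches / len(aligned_human)


def exon_features(aligned_human: str, aligned_mouse: str) -> ExonFeatures:
    length = len(aligned_human.replace("-", ""))  # ungapped human exon length
    return ExonFeatures(
        length=length,
        divisible_by_3=(length % 3 == 0),
        mouse_identity=percent_identity(aligned_human, aligned_mouse),
    )


features = exon_features("ATGGCTGAA", "ATGGCCGAA")  # one mismatch in 9 columns
print(features)
```

The fourth feature, intronic conservation, needs sequence flanking the exon and is handled separately by the rule on the next slide.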

Result
1. The best rule is:
 1) at least 95% identity with the mouse exon counterpart;
 2) exon size is a multiple of three;
 3) a best local alignment of at least 15 intronic nucleotides upstream of the exon with at least 85% identity;
 4) a perfect match of at least 12 intronic nucleotides downstream of the exon.
2. This combination of features identified 76 exons, 31% of the 243 alternatively spliced exons in the training set, whereas none of the 1753 constitutively spliced exons matched these features.
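The four-part rule can be expressed as a single predicate. This Python sketch assumes the upstream best local alignment (for example from a Smith-Waterman aligner) has already been computed and summarized as a length and a percent identity; the downstream perfect match is computed here as the longest common substring. The class, field and function names and the example values are hypothetical, and the slides do not say how long the flanking intronic windows were.

```python
from dataclasses import dataclass


def longest_exact_match(a: str, b: str) -> int:
    """Length of the longest substring shared by a and b (dynamic programming)."""
    a, b = a.upper(), b.upper()
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the match ending at a[i-2], b[j-2]
                best = max(best, cur[j])
        prev = cur
    return best


@dataclass
class CandidateExon:
    # Hypothetical fields; values would come from earlier alignment steps.
    length: int
    mouse_exon_identity: float    # percent identity with the mouse exon
    upstream_aln_length: int      # best local alignment length, upstream intron
    upstream_aln_identity: float  # percent identity of that local alignment
    downstream_human: str         # human intronic sequence downstream of the exon
    downstream_mouse: str         # mouse intronic sequence downstream of the exon


def matches_rule(c: CandidateExon) -> bool:
    """True if the exon satisfies all four conditions of the reported rule."""
    return (
        c.mouse_exon_identity >= 95.0
        and c.length % 3 == 0
        and c.upstream_aln_length >= 15
        and c.upstream_aln_identity >= 85.0
        and longest_exact_match(c.downstream_human, c.downstream_mouse) >= 12
    )


example = CandidateExon(
    length=87,
    mouse_exon_identity=97.7,
    upstream_aln_length=20,
    upstream_aln_identity=90.0,
    downstream_human="GTAAGTATCTTGCAAC",
    downstream_mouse="GTAAGTATCTTGGAAC",
)
print(matches_rule(example))  # True: the downstream sequences share a 12-nt exact match
```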

To test this classifier in a genome-wide manner (cont.)
1. For 453 (48%) of the 952 candidate alternative exons, ESTs/cDNAs showed the exon being skipped.
2. Only 17% of the 453 exons classified by our rule had their exon skipping supported by a single EST.
3. The rest were supported by two or more.
[Figure: applying these rules to the 108,983 human exons for which a mouse counterpart could be identified yielded candidate exons, ~1% of the set.]

To test this classifier in a genome-wide manner (cont.)
1. In comparison, skipping was supported by only a single EST for 46% of all 7495 exons with skipping evidence.
2. This suggests that our classification rule enriches for alternatively spliced exons that are more likely to be "real" than alternative exons supported by EST evidence alone.
[Figure: searching ESTs and cDNAs for all 108,983 human exons with an identifiable mouse counterpart found exon-skipping evidence for 7% (7495 exons) of the entire set.]
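The enrichment argument compares how often skipping rests on a single EST (17% for rule-classified exons vs 46% overall). A minimal Python sketch of that statistic, assuming each exon has already been assigned a count of ESTs/cDNAs showing it skipped; the function name and the counts below are hypothetical, not the study's data:

```python
from typing import Dict


def single_est_fraction(support_counts: Dict[str, int]) -> float:
    """Fraction of exons with skipping evidence that rest on exactly one EST/cDNA."""
    supported = [n for n in support_counts.values() if n > 0]
    if not supported:
        return 0.0
    return sum(1 for n in supported if n == 1) / len(supported)


# Hypothetical support counts (number of skipping ESTs per exon).
rule_classified = {"exonA": 3, "exonB": 1, "exonC": 2, "exonD": 5}
all_with_skipping = {**rule_classified, "exonE": 1, "exonF": 1}
print(single_est_fraction(rule_classified), single_est_fraction(all_with_skipping))  # 0.25 0.5
```

A lower single-EST fraction for the classified set, as reported on the slide, is what the enrichment claim rests on.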

To test this classifier in a genome-wide manner
1. For the remaining 499 candidate alternative exons, no EST/cDNA showing an exon-skipping event was found.
2. Checking these in the UCSC Genome Browser, we found that for 190 of them a human expressed sequence showed a pattern of alternative splicing other than exon skipping:
 1) Alternative donor/acceptor: 22%
 2) Intron retention: 17%
 3) Mutually exclusive exons: 7%
3. Thus, for 643 (68%) of the 952 candidate alternative exons identified by this method, there was independent evidence for alternative splicing in dbEST.
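The evidence accounting on these slides can be checked with a few lines of Python. The inputs (952, 453 and 190) come from the slides; 499 and 643 follow by subtraction and addition.

```python
candidates = 952               # candidate alternative exons found by the rule
with_skipping_evidence = 453   # ESTs/cDNAs show the exon being skipped
other_as_evidence = 190        # other alternative-splicing patterns in the UCSC browser

without_skipping_evidence = candidates - with_skipping_evidence  # 499
any_evidence = with_skipping_evidence + other_as_evidence        # 643

print(
    without_skipping_evidence,
    any_evidence,
    round(100 * with_skipping_evidence / candidates),  # percent with skipping evidence
    round(100 * any_evidence / candidates),            # percent with any AS evidence
)  # 499 643 48 68
```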

Conclusion
1. We show that a substantial fraction of the splice variants in the human genome could not be identified from current human EST or cDNA data.
2. In the future, we hope this method can be developed into a more general alternative-splicing predictor that identifies other types of alternative splicing.

Classification of alternative splicing
1. Skipped exons
2. Multiple skipped exons
3. Alternative donors/acceptors
4. Retained introns
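The four categories above can be illustrated by comparing the exon structures of two transcripts. The sketch below is a simplified, hypothetical classifier, not the method of the paper. It assumes plus-strand transcripts given as sorted, inclusive (start, end) exon coordinates. It calls an alternative donor or acceptor only for overlapping introns that share exactly one end, and it does not attempt complex events.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # inclusive (start, end); plus strand assumed


def introns(exons: List[Exon]) -> List[Exon]:
    """Introns implied by consecutive exons of a sorted transcript."""
    return [(a[1] + 1, b[0] - 1) for a, b in zip(exons, exons[1:])]


def _inside(inner: Exon, outer: Exon) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def _one_way(exons_x: List[Exon], introns_y: List[Exon]) -> List[str]:
    """Events where transcript X keeps sequence that transcript Y splices out."""
    events = []
    for intron in introns_y:
        skipped = [ex for ex in exons_x if _inside(ex, intron)]
        if len(skipped) == 1:
            events.append("skipped exon")
        elif len(skipped) > 1:
            events.append("multiple skipped exons")
        if any(_inside(intron, ex) for ex in exons_x):
            events.append("retained intron")
    return events


def classify_pair(tx_a: List[Exon], tx_b: List[Exon]) -> List[str]:
    ia, ib = introns(tx_a), introns(tx_b)
    events = _one_way(tx_a, ib) + _one_way(tx_b, ia)
    for x in ia:
        for y in ib:
            if x == y or x[1] < y[0] or y[1] < x[0]:
                continue  # identical or non-overlapping introns
            if any(_inside(ex, y) for ex in tx_a) or any(_inside(ex, x) for ex in tx_b):
                continue  # the difference is a skipped exon, already reported
            if x[0] == y[0]:
                events.append("alternative acceptor")  # same 5' end, different 3' end
            elif x[1] == y[1]:
                events.append("alternative donor")     # same 3' end, different 5' end
    return events


print(classify_pair([(1, 10), (20, 30), (40, 50)], [(1, 10), (40, 50)]))  # ['skipped exon']
```

Because each check is local, a pair of mutually exclusive exons would be reported here as two skipped exons; a fuller classifier would merge such patterns.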