EGASP 2005 Evaluation Protocol


EGASP 2005 Evaluation Protocol
Paul Flicek, EBI

Basics
The evaluations are probably wrong.
GTF is not standard.
There are hidden assumptions: filters, overlaps, clusters.
Terminology varies: genes, exons, etc.
EGASP 2005 Evaluations

Evaluation Measures
Levels: exons and introns; transcript; gene.
Measures: sensitivity (Sn); specificity (Sp); exon length; exons per transcript; transcript Sn / Sp; overlap; gene.
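The slides name sensitivity and specificity without giving formulas. The sketch below assumes the usual gene-prediction convention, in which Sn is the fraction of annotated features that were predicted exactly and Sp is the fraction of predicted features that are annotated exactly (what other fields call precision). The function name `sn_sp` and the exon coordinates are hypothetical, chosen only for illustration.

```python
def sn_sp(annotated, predicted):
    """Return (Sn, Sp) for two collections of features.

    Assumed convention (not stated on the slides):
      Sn = TP / (TP + FN) = matches / number of annotated features
      Sp = TP / (TP + FP) = matches / number of predicted features
    A feature counts as a match only if it is identical in both sets.
    """
    annotated, predicted = set(annotated), set(predicted)
    true_positives = len(annotated & predicted)
    sn = true_positives / len(annotated) if annotated else 0.0
    sp = true_positives / len(predicted) if predicted else 0.0
    return sn, sp


# Hypothetical exons as (start, end) pairs on one sequence.
annotated_exons = {(100, 200), (300, 400), (500, 600), (700, 800)}
predicted_exons = {(100, 200), (300, 400), (520, 600)}

exon_sn, exon_sp = sn_sp(annotated_exons, predicted_exons)
print(f"Exon Sn = {exon_sn:.2f}, Exon Sp = {exon_sp:.2f}")
```

The same function can be applied at the nucleotide or intron level by changing what a "feature" is, for example a set of base positions instead of exon intervals.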

Definitions

Definitions
Positive Transcript: correct translation start; correct translation stop; every splice site correct.
Positive Gene: at least one positive transcript.
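The two definitions above can be sketched as a pair of predicates. This is a minimal illustration, not the evaluation code used for EGASP: it assumes each transcript is given as a list of coding exons in `(start, end)` form on a single strand, and all function names are hypothetical.

```python
def coding_landmarks(cds_exons):
    """Return (translation start, translation stop, splice sites).

    Coordinates are taken in genomic order. On the reverse strand the roles
    of start and stop swap, but comparing landmarks for equality still works.
    """
    exons = sorted(cds_exons)
    start = exons[0][0]
    stop = exons[-1][1]
    # Each intron contributes two splice sites: the end of one exon
    # and the start of the next.
    splice_sites = [(left[1], right[0]) for left, right in zip(exons, exons[1:])]
    return start, stop, splice_sites


def is_positive_transcript(predicted, annotated_transcripts):
    """True if the prediction matches some annotated transcript on all landmarks."""
    target = coding_landmarks(predicted)
    return any(coding_landmarks(a) == target for a in annotated_transcripts)


def is_positive_gene(predicted_transcripts, annotated_transcripts):
    """True if at least one predicted transcript is a positive transcript."""
    return any(is_positive_transcript(p, annotated_transcripts)
               for p in predicted_transcripts)


# Hypothetical annotated gene with two transcripts.
annotated = [[(100, 200), (300, 400)],
             [(100, 200), (300, 400), (500, 600)]]
exact = [(100, 200), (300, 400)]
shifted_acceptor = [(100, 200), (310, 400)]  # one splice site is wrong

print(is_positive_transcript(exact, annotated))
print(is_positive_transcript(shifted_acceptor, annotated))
print(is_positive_gene([shifted_acceptor, exact], annotated))
```

Because the start, the stop, and every splice site must agree, a single shifted exon boundary is enough to make a transcript count as wrong.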

Examples
[Figure: annotation vs. prediction]
Trans Sn = 0.5, Trans Sp = 1.0, Gene Sn = 1.0, Gene Sp = 1.0

Examples
[Figure: annotation vs. prediction]
Trans Sn = 0.5, Trans Sp = 1.0, Gene Sn = 1.0, Gene Sp = 1.0

Examples
[Figure: annotation vs. prediction]
Trans Sn = 0.0, Trans Sp = 0.0, Gene Sn = 0.0, Gene Sp = 0.0

Examples
[Figure: annotation vs. prediction]
Trans Sn = 1.0, Trans Sp = 1.0, Gene Sn = 1.0, Gene Sp = 1.0

Examples
[Figure: annotation vs. prediction]
Trans Sn = 0.5, Trans Sp = 0.5, Gene Sn = 1.0, Gene Sp = 1.0

Examples
[Figure: annotation vs. prediction]
Trans Sn = 1.0, Trans Sp = 0.67, Gene Sn = 1.0, Gene Sp = 1.0
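The diagrams behind these examples are not preserved in the transcript, so the sketch below uses one hypothetical arrangement that is consistent with the last set of scores: a single annotated gene with two transcripts and a predicted gene with three transcripts, two of which match exactly. It assumes Sn is measured against the annotation and Sp against the prediction, and that a transcript matches only if its coding exons are identical. All names and coordinates are invented for illustration.

```python
def transcript_key(cds_exons):
    """Canonical, hashable form of a transcript's coding exons."""
    return tuple(sorted(cds_exons))


def ratio(numerator, denominator):
    return numerator / denominator if denominator else 0.0


def transcript_and_gene_scores(annotated_genes, predicted_genes):
    """Return (Trans Sn, Trans Sp, Gene Sn, Gene Sp).

    Both arguments map a gene id to a list of transcripts, each a list of
    (start, end) coding exons. Identical transcripts are counted once.
    """
    annot = {transcript_key(t) for ts in annotated_genes.values() for t in ts}
    pred = {transcript_key(t) for ts in predicted_genes.values() for t in ts}
    matched = annot & pred

    def positive_genes(genes):
        # A gene is positive if at least one of its transcripts is matched.
        return sum(any(transcript_key(t) in matched for t in ts)
                   for ts in genes.values())

    return (ratio(len(matched), len(annot)),
            ratio(len(matched), len(pred)),
            ratio(positive_genes(annotated_genes), len(annotated_genes)),
            ratio(positive_genes(predicted_genes), len(predicted_genes)))


# Hypothetical data: two annotated transcripts, three predicted transcripts.
annotated_genes = {"geneA": [[(100, 200), (300, 400)],
                             [(100, 200), (500, 600)]]}
predicted_genes = {"predA": [[(100, 200), (300, 400)],
                             [(100, 200), (500, 600)],
                             [(100, 200), (300, 450)]]}

scores = transcript_and_gene_scores(annotated_genes, predicted_genes)
print("Trans Sn = %.2f, Trans Sp = %.2f, Gene Sn = %.2f, Gene Sp = %.2f" % scores)
```

Under these assumptions the extra, unmatched predicted transcript lowers transcript Sp to 2/3 while leaving the gene-level scores at 1.0, since the gene still has at least one positive transcript.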

The winners are… (there are clear trends)
The most successful programs use expressed sequences.
Programs using evolutionary conservation are more successful than those that do not.
Exon and nucleotide measures are similar.
We are improving.

Spear Catching Time

EGASP 2005 Evaluations Block 1
Expressed Sequence Methods
Paul Flicek, EBI

Nucleotide

Exon

Intron

Gene

Number of Genes 1027 1389

Unique Exons

Summary

EGASP 2005 Evaluations Block 2
Evolutionary Conservation (Dual/Multiple Genome) Methods
Paul Flicek, EBI

Summary

EGASP 2005 Evaluations Block 3a
Ab initio (single genome) and Exon-only Methods
Paul Flicek, EBI

Summary

EGASP 2005 Evaluations Block 3b
Open (Any) Methods
Paul Flicek, EBI

Summary