Peptide-assisted annotation of the Mlp genome
Philippe Tanguay, Nicolas Feau, David Joly, Richard Hamelin

Presentation transcript:

Objective
Use peptide libraries to validate the in silico prediction of gene models (GM).
Mapping peptides onto a translated genome sequence provides the « correct frames of translation ».
Assumption: « if a peptide (protein) is detected, then there must be a gene that encodes it ».

Methodology (hardware)
Workflow: Urediniospores (3729) → Protein extraction → 1D SDS-PAGE → Gel slicing (64) → Trypsin digestion → LC-MS/MS → Bioinformatics
Instruments: Waters MassPREP station; LTQ (ThermoElectron)
Stage labels: Extraction, Slicing, Digestion, Elution, Peptide MS/MS data acquisition

Methodology (bioinformatics)
Protein databases built from:
– 6-frame translation of the genome
– Gene catalog (16694 GM)
Spectral identification by sequence database searching (Mascot and Sequest on each database)
Statistical validation of peptide identifications
1 - Comparison of results from both databases
2 - Comparison of peptides and GM (validation/correction of genome annotations)
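The 6-frame translation used to build one of the two protein databases can be illustrated with a short sketch. This is not the authors' pipeline: the function names are hypothetical, the standard genetic code is assumed, and any codon containing an ambiguous base is written as X.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame name ('+1'..'+3', '-1'..'-3') to protein."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")
print(frames["+1"])  # MA*
```

In a database built this way, stop codons (`*`) delimit the open reading frames against which spectra are searched.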

Mlp proteomic results so far
MS/MS spectra were obtained from the total proteins and searched against two databases:
– Gene catalog: Mascot + Sequest
– 6-frame translation: Mascot only
352 unique peptides obtained from the 6-frame translation db do not match any GM of the Gene catalog.
Unique peptides: false discovery rate below 1.6%.
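The comparison between the two searches, finding peptides identified from the 6-frame translation that match no GM of the Gene catalog, amounts to a membership test. The sketch below assumes exact substring matching against the predicted protein sequences; the function name and the toy sequences are hypothetical.

```python
def peptides_without_hit(peptides, gene_model_proteins):
    """Return the unique peptides that occur in no gene-model protein.

    peptides: iterable of peptide sequences (duplicates allowed).
    gene_model_proteins: dict mapping a GM identifier to its protein sequence.
    """
    unique = sorted(set(peptides))
    return [
        pep for pep in unique
        if not any(pep in protein for protein in gene_model_proteins.values())
    ]


# Toy data, invented for illustration only.
catalog = {"GM_a": "MKTAYIAKQRLLSDE", "GM_b": "MSEQNNTEMTFQIQR"}
identified = ["TAYIAK", "EMTFQIQR", "GHWLVPR", "TAYIAK"]
print(peptides_without_hit(identified, catalog))  # ['GHWLVPR']
```

For a catalog of this size a real implementation would likely index the proteins rather than scan them for every peptide; the linear scan is kept here for clarity.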

Peptide frequency distribution on GM
[Histogram: number of gene models against number of peptides per gene model]
Mean: 9 peptides covering 134 AA per GM
The peptides represent assignments for nearly 10% of the Gene catalog
e.g. GM
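A per-GM figure such as "9 peptides covering 134 AA" can be computed by marking every residue that falls under at least one matched peptide, so that overlapping peptides are not counted twice. The following is a minimal sketch under that assumption; the function name and toy sequences are hypothetical.

```python
def covered_residues(protein, peptides):
    """Count residues of `protein` covered by at least one peptide occurrence."""
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue  # ignore empty strings
        start = protein.find(pep)
        while start != -1:
            for i in range(start, start + len(pep)):
                covered[i] = True
            # Continue the search one position further to catch repeats.
            start = protein.find(pep, start + 1)
    return sum(covered)


# Toy example: two overlapping peptides on a 15-residue protein.
protein = "MKTAYIAKQRLLSDE"
print(covered_residues(protein, ["TAYIAK", "AKQR"]))  # 8
```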

Automated classification of peptides with no hit (352) on the Gene catalog
5’ extension of a predicted GM
– If peptide(s) located within the 1000 bp upstream of the predicted GM start codon
3’ extension of a predicted GM
– If peptide(s) located within the 1000 bp downstream of the predicted GM stop codon
5’ and 3’ extension of a predicted GM
– If peptides located within the 1000 bp upstream of the start codon and within the 1000 bp downstream of the predicted GM stop codon
Internal extension of a predicted GM
– If peptide(s) located in the GM
New GM
– If no predicted GM in the vicinity of the peptide(s)
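The classification rules above can be sketched as a small decision function. This is an illustration, not the authors' script: the `GeneModel` class, the function name, and the way a peptide that straddles a GM boundary is handled are assumptions, and "vicinity" is taken to mean the same 1000 bp window.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    start: int   # leftmost genomic coordinate of the coding region
    end: int     # rightmost genomic coordinate of the coding region
    strand: str  # "+" or "-"


def classify(peptide_spans, gm, window=1000):
    """Classify peptides (genomic (start, end) spans) relative to one GM."""
    left = right = inside = False
    for p_start, p_end in peptide_spans:
        # Starts before the GM and reaches to within `window` bp of it.
        if p_start < gm.start and p_end >= gm.start - window:
            left = True
        # Ends after the GM and starts within `window` bp of it.
        if p_end > gm.end and p_start <= gm.end + window:
            right = True
        # Lies entirely between the GM boundaries.
        if p_start >= gm.start and p_end <= gm.end:
            inside = True
    # On the minus strand the 5' side is at the higher coordinates.
    upstream, downstream = (left, right) if gm.strand == "+" else (right, left)
    if upstream and downstream:
        return "5' and 3' extension"
    if upstream:
        return "5' extension"
    if downstream:
        return "3' extension"
    if inside:
        return "internal extension"
    return "new GM"


gm = GeneModel(start=5000, end=8000, strand="+")
print(classify([(4500, 4530)], gm))  # 5' extension
print(classify([(1000, 1030)], gm))  # new GM
```

Flank hits take precedence over internal hits in this sketch; the slide does not say how mixed cases were resolved, so that ordering is a design choice made here.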

Corrections-Additions to the Gene catalog
Mapping of the peptides with no hit on the genome allowed the following modifications:

Modification               Number of GM
5’ extension               44
Internal exon extension    31
3’ extension               22
5’ and 3’ extension        5
New GM                     73
Total                      172

Manual curation- Internal extension

EuGene’s prediction is OK

Manual curation- New GM

Summary – Peptide-assisted genome annotation
With few resources ($6000 worth of materials and services, and a few weeks' worth of labour), our proteomic analysis:
– Validated 10% of the predicted GM
– Corrected/found > 170 GM
According to the manual curation accomplished so far, it appears that EuGene had predicted most of the > 170 corrected/found GM.

Perspectives
Analysing the Sequest output obtained from the 6-frame translation: 5051 peptides were identified with Mascot (352 with no hits on the Gene catalog); Sequest?
A quantitative proteomic approach (iTRAQ) will be used to compare urediniospores, germinated urediniospores and haustoria protein complexes.

Available material
Our set of peptide spectra from urediniospore proteins is available to validate new GM predictions.
The peptide GFF files will be made available to the Melampsora community.

Finding the peptides on the different model prediction sets

Model prediction set    Total GM    GM validated    %
Gene Catalog                                        ,9%
EuGene                                              ,9%
Genewise                                            ,9%
Genewise1Plus                                       ,4%
fgenesh1_pg                                         ,2%
fgenesh2_pg                                         ,7%

Do we need to perform a new spectra search on the whole model prediction sets?