Proteogenomics is a combination of the words proteomics and genomics. It commonly refers to studies that use proteomic information, often derived from mass spectrometry, to improve gene annotations.
Mass spectrometry is used to characterize protein sequences. The basic idea is to ionize proteins and let them "fly" in a vacuum chamber. The mass-to-charge (m/z) ratio of an ion can be deduced from its time of flight (TOF) to reach a detector, or from the frequency at which it circles in a magnetic field.
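The two ways of deducing m/z mentioned above can be sketched numerically. This is a minimal sketch that assumes an idealised linear TOF tube (all acceleration energy becomes kinetic energy, no initial velocity spread) and an ideal cyclotron motion. The function names, tube length, and voltage are illustrative assumptions, not parameters of any particular instrument.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, in coulombs
DALTON = 1.66053906660e-27   # one dalton, in kilograms


def mz_from_tof(flight_time_s, tube_length_m, accel_voltage_v):
    """m/z (Da per elementary charge) from time of flight in an ideal linear TOF.

    Energy balance: z*e*V = (1/2)*m*v^2 with v = L/t,
    so m/z = 2*e*V*t^2 / L^2.
    """
    m_over_z_kg = 2.0 * E_CHARGE * accel_voltage_v * flight_time_s ** 2 / tube_length_m ** 2
    return m_over_z_kg / DALTON


def tof_from_mz(mz_da, tube_length_m, accel_voltage_v):
    """Inverse of mz_from_tof: flight time in seconds for a given m/z."""
    m_over_z_kg = mz_da * DALTON
    return tube_length_m * math.sqrt(m_over_z_kg / (2.0 * E_CHARGE * accel_voltage_v))


def cyclotron_frequency_hz(mz_da, field_tesla):
    """Frequency of circular motion in a magnetic field: f = e*B / (2*pi*(m/z))."""
    return E_CHARGE * field_tesla / (2.0 * math.pi * mz_da * DALTON)


if __name__ == "__main__":
    # Hypothetical instrument: 1 m flight tube, 20 kV acceleration.
    t = tof_from_mz(1000.0, 1.0, 20000.0)
    print("flight time (s):", t)
    print("recovered m/z:", mz_from_tof(t, 1.0, 20000.0))
```

Heavier ions (larger m/z) arrive later in the TOF case and circle more slowly in the magnetic-field case, which is what lets the instrument tell them apart.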
Some mass spectrometry techniques ionize whole proteins, but the currently popular method is to chop a protein into peptides. The peptides are separated by their masses before ionization and sequenced independently. The peptide sequences are then mapped back to known protein sequences or used for de novo sequencing (very much like genome sequencing). According to the people I met, peptide lengths are around 7-15 amino acids.
Pros: It determines mass accurately. Assuming unambiguous mapping to a protein sequence, it can reliably indicate which proteins are translated in the cell, and hence which mRNAs get translated and which do not. It can be used to quantify the amounts of different proteins in a sample, as opposed to predicting them from mRNA levels measured with microarrays.
Pros (continued): It can identify post-translational modifications, i.e. whether proteins are phosphorylated (kinase related), methylated or acetylated (important in the histone code), or ubiquitinated (related to protein degradation). It can also detect programmed (ribosomal) frameshifts and alternative splicing events.
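A modification shows up in the spectrum as a characteristic mass offset on the peptide. The lookup below is a minimal sketch of that idea: it compares an observed peptide mass with the unmodified mass and reports which modification would explain the difference. The shift values are standard monoisotopic mass differences; the ubiquitination entry assumes the Gly-Gly remnant left on a lysine after tryptic digestion, which is an assumption not stated in the slides. The function name and tolerance are illustrative.

```python
# Monoisotopic mass shifts in daltons for the modifications named above.
PTM_SHIFTS = {
    "phosphorylation": 79.96633,
    "acetylation": 42.01057,
    "methylation": 14.01565,
    # Assumption: ubiquitin is observed as a Gly-Gly remnant after trypsin digestion.
    "ubiquitination (GlyGly remnant)": 114.04293,
}


def explain_mass_shift(observed_mass, unmodified_mass, tol=0.01):
    """Return the names of all single modifications whose shift matches the mass difference."""
    delta = observed_mass - unmodified_mass
    return [name for name, shift in PTM_SHIFTS.items() if abs(delta - shift) <= tol]


if __name__ == "__main__":
    # A hypothetical peptide of mass 1000.0 Da observed at 1079.9663 Da.
    print(explain_mass_shift(1079.9663, 1000.0))
```

Real search tools also have to decide which residue carries the modification and handle several modifications on one peptide; this sketch only covers the single-shift case.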
Cons: It is still expensive (although an expert at the RECOMB Satellite for Computational Proteomics said it is just as expensive as RNA-Seq). It is hard to distinguish amino acids with the same or very similar mass (most notably leucine and isoleucine, which are isomers). We do not have a reliable way to amplify proteins in a sample (a serious problem).
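The leucine/isoleucine problem is easy to see once peptide masses are computed from residue masses. The sketch below uses standard monoisotopic residue masses (rounded to five decimals) and a hypothetical helper `peptide_mass`. It shows that swapping L for I leaves the mass unchanged, while lysine and glutamine differ by only about 0.036 Da.

```python
# Monoisotopic residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added back for the free N- and C-termini


def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified peptide given as one-letter codes."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


if __name__ == "__main__":
    # Identical masses: the two peptides differ only by I versus L.
    print(peptide_mass("PEPTIDE"), peptide_mass("PEPTLDE"))
    # Nearly identical masses: K versus Q needs a high-accuracy instrument.
    print(peptide_mass("K") - peptide_mass("Q"))
```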
Accurate prediction of translation start sites. Accurate prediction of programmed frameshifts. Accurate prediction of post-translational modifications. Confirmation of whether a (pseudo)gene is actually translated. Observation: most current gene prediction algorithms are not based on proteomic data (because such data were not available).
For a novel protein, the task is to map the peptides from mass spectrometry experiments to exomes/genomes (a problem similar to read mapping in RNA-Seq). Currently they collect exomes (regions that are assumed to be exons) and translate them in 6 different frames (3 on each DNA strand). They also build an exon splice graph, which models the different splicing alternatives of a single gene.
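The six-frame translation step can be sketched in a few lines. This is a minimal illustration using the standard genetic code, not the pipeline used by any particular group; the function names and the toy DNA string are assumptions made for the example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate from the first base; unknown codons become 'X', stops become '*'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Return the translations for frames +1..+3 and -1..-3, keyed by frame label."""
    frames = {}
    reverse = reverse_complement(dna)
    for offset in range(3):
        frames["+%d" % (offset + 1)] = translate(dna[offset:])
        frames["-%d" % (offset + 1)] = translate(reverse[offset:])
    return frames


def frames_containing(dna, peptide):
    """List the frame labels whose translation contains the peptide exactly."""
    return sorted(label for label, protein in six_frame_translations(dna).items() if peptide in protein)


if __name__ == "__main__":
    for label, protein in sorted(six_frame_translations("ATGGCCAAATAA").items()):
        print(label, protein)
```

A peptide found in exactly one frame of one region is evidence that this region is translated in that frame, which is the information gene annotation needs.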
They developed a program called Inspect to search for a peptide in this graph. It can be found at http://proteomics.ucsd.edu/Inspect. In the graph, each box represents a single exon and the arrows represent possible combinations of exons in the translated protein product.
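To make the splice graph idea concrete, here is a brute-force sketch, which is not how Inspect works internally. It assumes each exon is already translated into amino acids (so exon boundaries fall between codons), enumerates every path through the graph, and checks whether the peptide occurs in the resulting protein. The exon names and sequences are invented for illustration.

```python
def enumerate_paths(graph, start):
    """Yield every path from `start` to an exon with no successors (graph must be acyclic)."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        successors = graph.get(path[-1], [])
        if not successors:
            yield path
        for nxt in successors:
            stack.append(path + [nxt])


def search_peptide(exon_seqs, graph, start, peptide):
    """Return the exon paths whose concatenated translation contains the peptide."""
    hits = []
    for path in enumerate_paths(graph, start):
        protein = "".join(exon_seqs[exon] for exon in path)
        if peptide in protein:
            hits.append(path)
    return hits


if __name__ == "__main__":
    # Hypothetical gene: exon E1, then either E2 or E3, then E4.
    exon_seqs = {"E1": "MKT", "E2": "AYIAK", "E3": "QRQISF", "E4": "VKSHF"}
    graph = {"E1": ["E2", "E3"], "E2": ["E4"], "E3": ["E4"], "E4": []}
    # "KTQR" spans the E1-E3 junction, so it supports the isoform that skips E2.
    print(search_peptide(exon_seqs, graph, "E1", "KTQR"))
```

Enumerating all paths grows exponentially with the number of alternative exons, so a practical tool would search the graph directly rather than expand every isoform.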
Revising gene models, and hence their annotations. Finding novel peptides that map to non-exonic regions: novel genes?
Nitin Gupta et al. Whole proteome analysis of post-translational modifications: applications of mass-spectrometry for proteogenomic annotation. Genome Res 2007.
Natalie Castellana. Proteogenomics: Annotating Genomes using the Proteome. Poster at RECOMB CP 2011. http://proteomics.ucsd.edu/recombcp2011/Posters/Poster_B19.pdf
Natalie Castellana. Tutorial: Proteogenomics. http://bix.ucsd.edu/projects/recombcp10_tutorials/RECOMBCP_Tutorial_Castellana.pdf
Most of the work is done by Pavel Pevzner and other groups at UC San Diego. Their website is http://proteomics.ucsd.edu/
Comparative proteogenomics is a branch of proteogenomics that compares proteomic data from multiple related species concurrently and exploits the homology between their proteins to improve annotations with higher statistical confidence. In a sense, this is the approximate peptide matching problem. However, it needs to take residue conservation in different parts of the proteins into account, e.g. sites that are post-translationally modified must be preserved to maintain function.
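A very small version of approximate peptide matching with a conservation constraint might look like the sketch below. It allows a limited number of substitutions (Hamming distance, no gaps) but refuses any mismatch at positions marked as conserved, such as a modified residue. This is an illustrative simplification, not the method of the papers cited in this talk; the function name and parameters are assumptions.

```python
def approximate_matches(peptide, protein, max_mismatches=1, conserved=()):
    """Find windows of `protein` within `max_mismatches` substitutions of `peptide`.

    `conserved` holds 0-based peptide positions that must match exactly,
    for example a post-translationally modified site.
    Returns a list of (start_index, matched_window) tuples.
    """
    hits = []
    k = len(peptide)
    for start in range(len(protein) - k + 1):
        window = protein[start:start + k]
        mismatched = [i for i in range(k) if window[i] != peptide[i]]
        if len(mismatched) > max_mismatches:
            continue
        if any(i in conserved for i in mismatched):
            continue  # a conserved site differs, so reject this window
        hits.append((start, window))
    return hits


if __name__ == "__main__":
    ortholog = "MMAKTPQRLL"  # hypothetical protein from a related species
    # One substitution (S -> T) is tolerated when no site is marked as conserved...
    print(approximate_matches("AKSPQR", ortholog, max_mismatches=1))
    # ...but rejected if position 2 (the serine) is a modified site that must be preserved.
    print(approximate_matches("AKSPQR", ortholog, max_mismatches=1, conserved=(2,)))
```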
Some work in comparative proteogenomics: Nitin Gupta et al. Comparative proteogenomics: Combining mass spectrometry and comparative genomics to analyze multiple genomes. Genome Res 2008. GenoMS (Castellana et al. MCP 2010): a program to map peptides to the genome of another related organism.
Metaproteomics (also community proteomics, environmental proteomics, or community proteogenomics) is the study of all proteins recovered directly from environmental samples. This involves simultaneously mapping peptides to all known genomes and proteomes to identify the different organisms present in a sample. An example of work in this field: Wilmes P, Bond PL. Metaproteomics: studying functional gene expression in microbial ecosystems. Trends Microbiol. 2006.
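The simultaneous mapping step can be illustrated with a toy example. The sketch below assumes exact substring matching against a small in-memory set of reference proteomes and counts only peptides that are unique to one organism as evidence for its presence. The organism names, sequences, and function names are all made up for illustration; real metaproteomic searches work against large databases with scoring and error control.

```python
from collections import defaultdict


def assign_peptides(peptides, proteomes):
    """Map each peptide to the sorted list of organisms whose proteins contain it.

    `proteomes` is a dict: organism name -> list of protein sequences.
    """
    assignments = {}
    for peptide in peptides:
        assignments[peptide] = sorted(
            organism
            for organism, proteins in proteomes.items()
            if any(peptide in protein for protein in proteins)
        )
    return assignments


def unique_peptide_counts(assignments):
    """Count, per organism, the peptides that map to that organism only."""
    counts = defaultdict(int)
    for organisms in assignments.values():
        if len(organisms) == 1:
            counts[organisms[0]] += 1
    return dict(counts)


if __name__ == "__main__":
    # Hypothetical reference proteomes for two organisms.
    proteomes = {
        "organism_A": ["MKTAYIAKQRQISFVK", "MSDNELLKR"],
        "organism_B": ["MKTAYLAKQWQISFVK", "MGHHPTREQ"],
    }
    peptides = ["AYIAKQR", "MKTAY", "HHPTR", "WWWWW"]
    assignments = assign_peptides(peptides, proteomes)
    print(assignments)
    print(unique_peptide_counts(assignments))
```

Peptides shared between organisms (here the one present in both proteomes) cannot identify a species on their own, which is why only uniquely mapping peptides are counted.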