Proteogenomics is a combination of the words proteomics and genomics. It commonly refers to studies that use proteomic information, often derived from mass spectrometry, to improve gene annotations.

Mass spectrometry is used to characterize protein sequences. The basic idea is to ionize proteins and let them “fly” in a vacuum chamber. The mass-to-charge (m/z) ratio of an ion can be deduced from its time of flight (TOF) to a detector, or from the frequency at which it circles in a magnetic field.
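
The time-of-flight relation can be sketched as follows, assuming an idealised linear TOF analyser in which an ion of charge z is accelerated through a potential U and then drifts over a field-free length L. Energy conservation, z·e·U = m·v²/2 with v = L/t, gives m/z = 2·e·U·t²/L². The function names and parameters are illustrative and are not from the slides.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, in coulombs
DALTON = 1.66053906660e-27   # one dalton (unified atomic mass unit), in kg


def mz_from_tof(flight_time_s, drift_length_m, accel_voltage_v):
    """m/z (daltons per elementary charge) from flight time in an ideal linear TOF."""
    mass_per_charge_kg = 2.0 * E_CHARGE * accel_voltage_v * flight_time_s ** 2 / drift_length_m ** 2
    return mass_per_charge_kg / DALTON


def tof_from_mz(mz, drift_length_m, accel_voltage_v):
    """Inverse relation: flight time in seconds for a given m/z."""
    return drift_length_m * math.sqrt(mz * DALTON / (2.0 * E_CHARGE * accel_voltage_v))
```

Heavier ions (larger m/z) arrive later, which is what lets the detector sort them by arrival time.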

Some mass spectrometry techniques ionize whole proteins, but the currently popular method is to chop a protein into peptides. The peptides are separated by mass before ionization and sequenced independently. The peptide sequences are then mapped back to known protein sequences or used for de novo sequencing (very much like genome sequencing). The peptide lengths, according to the people I met, are around 7-15 amino acids.
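
The chopping step can be imitated in silico. The slides do not say which enzyme is used. The sketch below assumes trypsin, with the commonly used rule "cleave after K or R unless the next residue is P", and then keeps only peptides in the 7-15 residue range mentioned above. The function names are hypothetical.

```python
def tryptic_digest(protein):
    """Split a protein after K or R, except when the next residue is P (assumed trypsin rule)."""
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal remainder
    return peptides


def filter_by_length(peptides, min_len=7, max_len=15):
    """Keep peptides in the length range quoted on the slide."""
    return [p for p in peptides if min_len <= len(p) <= max_len]
```

For example, `tryptic_digest("AAKCCRPDDKEE")` gives `["AAK", "CCRPDDK", "EE"]`: the R followed by P is not cleaved, and only `"CCRPDDK"` survives the length filter.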

Pros: It determines mass accurately. Assuming unambiguous mapping to a protein sequence, it can reliably identify the proteins that are translated in the cell, and hence which mRNAs get translated and which do not. It can be used to quantify the different proteins in the sample directly, as opposed to predicting their amounts from mRNA levels measured by microarray.

Pros (continued): It can identify post-translational modifications, i.e. whether proteins are phosphorylated (kinase related), methylated or acetylated (important in the histone code), or ubiquitinated (related to protein degradation). It can also detect programmed (ribosomal) frameshifts and alternative splicing events.
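
A modification shows up as a characteristic mass shift between the observed peptide and its unmodified sequence. The sketch below uses standard monoisotopic mass shifts. The function name and the tolerance are assumptions for illustration, and ubiquitination is represented by the GlyGly remnant left on the modified lysine, which assumes a tryptic digest.

```python
# Monoisotopic mass shifts in daltons for the modifications named on the slide.
PTM_MASS_SHIFTS = {
    "phosphorylation": 79.96633,
    "acetylation": 42.01057,
    "methylation": 14.01565,
    "ubiquitination (GlyGly remnant)": 114.04293,
}


def explain_shift(observed_mass, unmodified_mass, tolerance=0.01):
    """Return the modifications whose mass shift matches the observed difference."""
    delta = observed_mass - unmodified_mass
    return [name for name, shift in PTM_MASS_SHIFTS.items()
            if abs(delta - shift) <= tolerance]
```

A peptide observed about 79.966 Da heavier than expected would be reported as a phosphorylation candidate. Real search tools also have to consider combinations of modifications and which residue carries them.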

Cons: It is still expensive (although an expert at the RECOMB Satellite on Computational Proteomics said it is just as expensive as RNA-Seq). It is hard to distinguish amino acids, or groups of amino acids, with the same or similar total mass (most notably leucine and isoleucine). We do not have a reliable way to amplify the proteins in a sample (a serious problem).
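
The leucine/isoleucine problem is easy to see from the residue masses. This is a small sketch using standard monoisotopic residue masses. The helper name `peptide_mass` is hypothetical.

```python
# Monoisotopic residue masses in daltons for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added for the free N- and C-termini


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
```

`peptide_mass("PEPTIDEL")` and `peptide_mass("PEPTIDEI")` are identical, so mass alone cannot tell the two peptides apart. Lysine and glutamine differ by only about 0.036 Da, which needs a high-resolution instrument to resolve.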

Accurate prediction of translation start sites. Accurate prediction of programmed frameshifts. Accurate prediction of post-translational modifications. Confirmation of whether a (pseudo)gene is actually translated. Observation: most current gene prediction algorithms are not based on proteomic data (because such data were not available).

For a novel protein, the peptides from the mass spectrometry experiments have to be mapped to the exome or genome (a problem similar to RNA-Seq). Currently they collect the exome (the regions assumed to be exons) and translate it in six frames (three on each DNA strand). They also build an exon splice graph, which models the different splicing alternatives of a single gene.
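
Six-frame translation can be sketched as below, assuming the standard genetic code and plain A/C/G/T input. Codons containing any other symbol are translated as X. The function names and frame labels are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate consecutive complete codons; '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translate(dna):
    """Return the translations of frames +1..+3 and -1..-3."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

For `"ATGGCCTAA"`, frame +1 is `"MA*"` and frame -1 (read from the reverse complement `"TTAGGCCAT"`) is `"LGH"`. Observed peptides are then searched against all six translations.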

They developed a program, called Inspect, to search for a peptide in this graph. It is available online. Each box represents a single exon, and the arrows represent the possible combinations of exons in the translated protein product.
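
To make the graph search concrete, here is a toy sketch. It is not Inspect's algorithm. It assumes the graph is acyclic, that each exon is already translated in frame (so no codon is split across a junction), and it simply enumerates every path and looks for the peptide in the concatenated sequence. All names and sequences are hypothetical.

```python
def enumerate_paths(edges, node, prefix=()):
    """Yield every maximal path (tuple of exon names) starting at `node`."""
    path = prefix + (node,)
    successors = edges.get(node, [])
    if not successors:
        yield path
    for nxt in successors:
        yield from enumerate_paths(edges, nxt, path)


def find_peptide(exons, edges, peptide):
    """Return the splice paths whose translated product contains `peptide`."""
    has_predecessor = {n for targets in edges.values() for n in targets}
    roots = [name for name in exons if name not in has_predecessor]
    matches = []
    for root in roots:
        for path in enumerate_paths(edges, root):
            product_seq = "".join(exons[name] for name in path)
            if peptide in product_seq:
                matches.append(path)
    return matches


# Toy gene: exon e1 is followed by either e2 or e3, and both continue to e4.
exons = {"e1": "MKT", "e2": "AYIA", "e3": "GGLK", "e4": "QRQ"}
edges = {"e1": ["e2", "e3"], "e2": ["e4"], "e3": ["e4"]}
```

Here `find_peptide(exons, edges, "KTGG")` returns only the path `("e1", "e3", "e4")`, because the peptide spans the e1/e3 junction. A peptide like this is evidence for that particular splice variant. Enumerating paths is exponential in general, so a practical tool would search the graph directly.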

Revising gene models, and hence their annotations. Finding novel peptides that map to non-exonic regions: novel genes?

Nitin Gupta et al. Whole proteome analysis of post-translational modifications: applications of mass-spectrometry for proteogenomic annotation. Genome Res.
Natalie Castellana. Proteogenomics: Annotating Genomes using the Proteome. Poster in RECOMB CP (r_B19.pdf).
Natalie Castellana. Tutorial: Proteogenomics (MBCP_Tutorial_Castellana.pdf).
Most of the work is done by Pavel Pevzner and other groups at UC San Diego.

Comparative proteogenomics is a branch of proteogenomics that compares proteomic data from multiple related species concurrently and exploits the homology between their proteins to improve annotations with higher statistical confidence. In a sense, this is the approximate peptide matching problem. However, it needs to take residue conservation at different parts of the protein into account, e.g. sites that are post-translationally modified must be preserved to maintain function.
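
One simple way to picture conservation-aware approximate matching is a sliding-window comparison that tolerates a few substitutions but none at positions marked as conserved. This is only an illustration under that assumption, not the method used in the cited work. The function name and parameters are hypothetical.

```python
def approx_hits(peptide, target, max_mismatches, conserved_positions=frozenset()):
    """Offsets in `target` where `peptide` matches with few substitutions.

    Positions listed in `conserved_positions` (0-based, relative to the peptide)
    must match exactly, e.g. a residue that carries a modification.
    """
    hits = []
    for offset in range(len(target) - len(peptide) + 1):
        mismatches = 0
        ok = True
        for i, aa in enumerate(peptide):
            if aa != target[offset + i]:
                if i in conserved_positions:
                    ok = False  # a conserved site differs: reject this window
                    break
                mismatches += 1
        if ok and mismatches <= max_mismatches:
            hits.append(offset)
    return hits
```

With peptide `"AASPEK"`, one allowed mismatch and position 2 (the serine) marked as conserved, the ortholog `"MGAVSPEKLL"` gives a hit at offset 2, while `"MGAATPEKLL"` gives none because the substitution falls on the conserved site.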

Some work in comparative proteogenomics: Nitin Gupta et al. Comparative proteogenomics: Combining mass spectrometry and comparative genomics to analyze multiple genomes. Genome Res. GenoMS (Castellana et al. MCP 2010): a program to map peptides to the genome of a related organism.

Metaproteomics (also Community Proteomics, Environmental Proteomics, or Community Proteogenomics) is the study of all proteins recovered directly from environmental samples. This involves simultaneous mapping of peptides to all known genomes and proteomes to identify the different organisms present in a sample. An example of work in this field is Wilmes P, Bond PL. Metaproteomics: studying functional gene expression in microbial ecosystems. Trends Microbiol.
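
The simultaneous mapping can be sketched with exact substring matching against several reference proteomes at once. The sketch assumes each proteome is given as a list of protein sequences. The function names and the rule of counting only peptides unique to one organism are illustrative choices, not taken from the cited work.

```python
from collections import Counter


def assign_peptides(peptides, proteomes):
    """Map each peptide to the sorted list of organisms whose proteome contains it."""
    return {
        peptide: sorted(
            organism
            for organism, proteins in proteomes.items()
            if any(peptide in protein for protein in proteins)
        )
        for peptide in peptides
    }


def organism_evidence(assignments):
    """Count peptides that point to exactly one organism."""
    counts = Counter()
    for organisms in assignments.values():
        if len(organisms) == 1:
            counts[organisms[0]] += 1
    return counts
```

Peptides shared between organisms are ambiguous and are left out of the counts. Peptides that match nothing may come from organisms with no reference sequence yet.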

CSPS (Bandeira et al. Nat. Biot. 2009)

MassBank nt.html

I notice that Hoang’s problem (the one that may make it possible to store multiple reference genomes) is going to be very relevant. RNA-Seq - Mass Spectrometry = non-coding RNA? Anything else?