The Student Research and Scholarship Center Grove School of Engineering, And Pathways Bioinformatics Center, CCNY Present Winter Bioinformatics Workshop.


The Student Research and Scholarship Center, Grove School of Engineering, and Pathways Bioinformatics Center, CCNY present
Winter Bioinformatics Workshop
10am-6pm, January 20 – 21, 2009
Location: Marshak, Room MR-044
Contact info:
Invited Speaker: Cristina C. Clement, Ph.D., Albert Einstein College of Medicine, Pathology Department
Topic: Bioinformatics tools for protein identification from primary sequence databases using mass spectrometry data
January 21, Marshak Building, Room MR-044
1:00pm-2:00pm: Presentation
2:30pm-4:00pm: Online practice

Accuracy & Resolution in Mass Spectroscopy
When low molecular weight samples are being analysed using relatively low resolution mass spectrometers, it is common to work with "nominal" mass values, calculated from integer atomic weights. That is, H=1, C=12, N=14, O=16, etc. Nominal mass is rarely used in peptide and protein work because the cumulative error of approximating atomic weights with integers becomes unacceptable. The presence of isotopes at their natural abundances makes it essential to define whether an experimental mass value is an "average" value, equivalent to taking the centroid of the complete isotopic envelope, or a "monoisotopic" value, the mass of the first peak of the isotope distribution. For peptides and proteins, the difference between an average and a monoisotopic weight is approximately 0.06%. This is a significant difference when even the most modest instruments are capable of measuring the mass of a small peptide with an accuracy of a fraction of a Dalton. For example, the peptide HLKTEAEMK has distinct average and monoisotopic molecular weights.
[Figure: isotopic envelope of HLKTEAEMK at a mass resolution of 5000; x-axis: m/z]
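The difference between the two mass definitions can be checked with a short calculation. The sketch below is not part of the original slides: it assumes commonly tabulated monoisotopic and average residue masses (rounded, and listed only for the residues in HLKTEAEMK) and adds one water for the free termini.

```python
# Residue masses in Da. These are standard tabulated values, rounded;
# they are an assumption of this sketch, not numbers taken from the slides.
MONO = {"H": 137.05891, "L": 113.08406, "K": 128.09496, "T": 101.04768,
        "E": 129.04259, "A": 71.03711, "M": 131.04049}
AVG = {"H": 137.1411, "L": 113.1594, "K": 128.1741, "T": 101.1051,
       "E": 129.1155, "A": 71.0788, "M": 131.1926}
WATER_MONO = 18.01056
WATER_AVG = 18.01528


def peptide_mass(sequence, residue_masses, water):
    """Neutral peptide mass: sum of residue masses plus one water."""
    return sum(residue_masses[aa] for aa in sequence) + water


mono = peptide_mass("HLKTEAEMK", MONO, WATER_MONO)
avg = peptide_mass("HLKTEAEMK", AVG, WATER_AVG)
relative_difference_percent = 100.0 * (avg - mono) / mono

print(f"monoisotopic: {mono:.2f} Da")
print(f"average:      {avg:.2f} Da")
print(f"difference:   {relative_difference_percent:.3f} %")
```

The relative difference comes out close to the 0.06% quoted above.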

Accuracy & Resolution in Mass Spectroscopy
Mass resolution is the dimensionless ratio of the mass of the peak divided by its width. Usually, the peak width is taken as the full width at half maximum intensity (FWHM). However, this definition of peak width is only a convention, and you may also encounter data acquired on magnetic sector instruments where the resolution has been calculated using the peak width at 5% maximum intensity. To measure a monoisotopic molecular weight requires (i) sufficient mass resolution to resolve the isotopic distribution and (ii) sufficient signal to noise to be able to identify the first peak of the envelope with confidence. For a small peptide, the first peak (often referred to as the 12C peak) is also the most intense peak. This is not the case for larger molecules.
[Figures: isotopic envelopes for a small protein (insulin) and a larger protein (BSA); x-axis: m/z]
It would be extremely difficult to measure a monoisotopic mass for BSA, and it is far from routine to measure a monoisotopic mass for insulin. In practice, most instruments report monoisotopic molecular weights up to a certain cut-off point. Above this cut-off, isotopic envelopes are centroided as a whole to provide average mass values.
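The definition of resolution can be turned around to estimate the peak width an instrument delivers at a given m/z. The following is a minimal sketch, not taken from the slides. The function names are hypothetical, and the "resolved" criterion (isotope spacing of roughly 1/charge larger than the FWHM) is a simplifying assumption.

```python
def fwhm_from_resolution(mz, resolution):
    """Peak width (FWHM) implied by resolution = m / delta_m."""
    return mz / resolution


def isotopes_resolved(mz, charge, resolution):
    """Rough check (assumption): adjacent isotope peaks are about 1/charge
    apart in m/z and count as resolved when that spacing exceeds the FWHM."""
    return (1.0 / charge) > fwhm_from_resolution(mz, resolution)


# A singly charged peptide near m/z 1000 at resolution 5000.
print(fwhm_from_resolution(1000.0, 5000))
print(isotopes_resolved(1000.0, 1, 5000))
# A singly charged ion near m/z 66000 (illustrative value) at the same resolution.
print(isotopes_resolved(66000.0, 1, 5000))
```

Under this criterion the small peptide is comfortably resolved, while a very large singly charged ion is not, which is consistent with reporting average masses above a cut-off.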

Accuracy & Resolution in Mass Spectroscopy
The factor which complicates any general discussion of resolution optimisation is that some types of mass analyser have a trade-off between resolution and sensitivity, while others do not. Where a monoisotopic peak for a single molecular species can be resolved, mass accuracy tends to follow resolution. This is because the narrower the peak, the less the significance of errors due to variations in the peak shape. So, if unit mass resolution is possible, then the more resolution the better... unless there is a sensitivity trade-off. If unit mass resolution is not possible, then there is little benefit to exceeding the instrument resolution at which the isotopic envelope can be defined without significant broadening.
[Figure: molecular ion of glucagon at resolutions of 1000 (blue), 3000 (red), 10,000 (green) and 30,000 (black); x-axis: m/z]
For an average mass measurement, and where there is no trade-off between sensitivity and resolution, the accuracy at 3000 resolution (red) will be just as good as at higher resolution. On an instrument where a trade-off exists, using a resolution greater than 3000 is very likely to degrade mass accuracy.

In this MS/MS approach (depicted above), a first mass spectrometer (MS-1) that employs a quadrupole mass filter is tuned to transmit only the analyte ion of interest (e.g. red above). This ion is then passed into a collision cell where argon is used to fragment the analyte, and the so-called daughter ions are swept into a second, time-of-flight mass spectrometer (MS-2) where they are separated and detected.

The most common peptide fragments observed in low energy collisions are a, b and y ions, as described in the figure above. The b ions appear to extend from the amino terminus, sometimes called the N-terminus, and y ions appear to extend from the carboxyl terminus, or C-terminus. The a ions occur at a lower frequency and abundance than b ions, but they are readily observed and are often used as a diagnostic for b ions, so that a-b pairs are often seen in fragment spectra. The a-b pairs are separated by 28 u, the mass of the carbonyl, C=O. The fragment types listed above are the most common fragments observed with ion trap, triple quadrupole, and q-TOF mass spectrometers.

De novo sequencing of peptides
This is an MS/MS spectrum of the tryptic peptide GLSDGEWQQVLNVWGK. The data were collected on an ion trap mass spectrometer. This spectrum will be the subject of our first unblinded de novo sequencing example. The sequence of the peptide is determined by the mass differences between these peaks.
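Reading a sequence from peak spacings can be sketched in a few lines. This example is an illustration rather than the workshop's method: the residue masses are standard monoisotopic values, the peak list is computed from those masses for the first six b ions of the peptide (it is not measured data from the slide), and `read_ladder` and its tolerance are hypothetical.

```python
RESIDUES = {  # monoisotopic residue masses in Da; L and I share one entry
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L/I": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def read_ladder(peaks, tolerance=0.02):
    """Map the spacing between consecutive peaks of one ion series to residues.

    Returns one call per gap; ambiguous gaps list every candidate and
    unexplained gaps are reported as "?".
    """
    ordered = sorted(peaks)
    calls = []
    for lower, upper in zip(ordered, ordered[1:]):
        delta = upper - lower
        matches = [aa for aa, mass in RESIDUES.items()
                   if abs(mass - delta) <= tolerance]
        calls.append("/".join(matches) if matches else "?")
    return calls


# Calculated (not measured) b1..b6 values for a peptide starting GLSDGE.
b_peaks = [58.03, 171.11, 258.14, 373.17, 430.19, 559.24]
print(read_ladder(b_peaks))
# With a wide, ion-trap-like tolerance a 128 gap stays ambiguous.
print(read_ladder([500.0, 628.07], tolerance=0.5))
```

The first residue (G) is not recovered from spacings alone because there is no peak below b1 to measure against.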

The b fragment peaks are labeled from the amino to the carboxyl terminus. The fragment containing only the amino terminal amino acid is termed b1. The fragment containing the first two amino terminal amino acids is termed the b2 ion, and so forth. The nomenclature is very simple to follow. You can calculate the mass of any b ion: it is the mass of the shortened peptide (M) minus 17 (OH), or more simply M - 17 = b ion m/z.

This slide shows the first six b ions in a little more detail. The b ion m/z value is the mass of the shortened peptide minus OH, i.e. -17 u.
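The b ion series for the example peptide can be generated directly. This is a minimal sketch under stated assumptions: it uses standard monoisotopic residue masses and computes each singly charged b ion as the sum of its residues plus one proton, which agrees with the M - 17 rule above at nominal mass. The function name `b_ions` is hypothetical.

```python
MONO = {  # monoisotopic residue masses (Da) for the residues in the example
    "G": 57.02146, "L": 113.08406, "S": 87.03203, "D": 115.02694,
    "E": 129.04259, "W": 186.07931, "Q": 128.05858, "V": 99.06841,
    "N": 114.04293, "K": 128.09496,
}
PROTON = 1.00728


def b_ions(sequence):
    """Singly charged b ion m/z values, b1 first, up to b(n-1)."""
    running_total = PROTON
    series = []
    for aa in sequence[:-1]:
        running_total += MONO[aa]
        series.append(running_total)
    return series


series = b_ions("GLSDGEWQQVLNVWGK")
for index, mz in enumerate(series[:6], start=1):
    print(f"b{index}: {mz:.2f}")
```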

Loss of Ammonia and Water
y and b ion fragments containing the amino acid residues R, K, Q, and N may appear to lose ammonia, -17. y and b ion fragments containing the amino acid residues S, T, and E may appear to lose water, -18. In the case of glutamic acid, E must be at the N-terminus of the fragment for this observation to be made.
Spectral Intensity Rules
- b ion intensity will drop when the next residue is P or G, and also H, K, or R.
- Internal cleavages can occur at P and H residues. An internal cleavage fragment is a fragment that appears to be a shortened peptide with P and/or H at its amino terminus. For example, the peptide EFGLPGLQNK may display the b ions PGLQNK, PGLQN, PGLQ, etc. These are the result of a double cleavage event.
- A y ion will often be the most prominent peak in the spectrum.
- It is common for the b and y ion series to swap intensity when a P is encountered in a sequence. This can also be true when the basic residues H, K, or R are encountered in the sequence.
- When a cleavage appears before or after R, the -17 (loss of ammonia) peak can be more prominent than the corresponding y or b ion.
- When aspartic acid is encountered in a sequence, the ion series can die out.
Amino Acid Composition
It is possible to observe immonium ions at the low end of the spectrum that can give a clue to the amino acid composition of a peptide. One caveat: if you do not see an immonium ion for a particular amino acid, this does not mean that the amino acid is absent from the sequence.
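Immonium ion positions follow from the residue masses, so the low-mass region can be screened programmatically. The sketch below is an illustration, not part of the slides. It assumes standard monoisotopic residue masses and the usual relation immonium m/z = residue mass - CO + proton. `composition_hints` and its tolerance are hypothetical.

```python
MONO = {  # monoisotopic residue masses in Da
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
CO = 27.99491
PROTON = 1.00728


def immonium_mz(residue):
    """m/z of the singly charged immonium ion of one residue."""
    return MONO[residue] - CO + PROTON


def composition_hints(low_mass_peaks, tolerance=0.02):
    """Residues whose immonium ion matches one of the observed peaks.

    A missing immonium ion is not evidence of absence, so this only
    ever adds candidates; it never rules a residue out.
    """
    return sorted(
        aa for aa in MONO
        if any(abs(immonium_mz(aa) - peak) <= tolerance for peak in low_mass_peaks)
    )


# Illustrative peak list (not data from the slides).
print(composition_hints([86.10, 110.07, 300.00]))
```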

Isobaric Mass
- Leucine and isoleucine have isobaric masses and cannot be differentiated in a low energy collision. When we see this mass difference in a spectrum we will label it X or Lxx, adopting the Hunt nomenclature.
- Lysine and glutamine have nearly isobaric masses (both nominally 128). The small mass difference between them can be used to differentiate K from Q on a mass spectrometer capable of higher mass accuracy and resolution, such as a q-TOF mass spectrometer. Triple quadrupole and ion trap mass spectrometers are usually incapable of this feat. On a mass spectrometer with lower mass accuracy, an acetylation can be performed to shift the mass of lysine by 42 u. If you like to live dangerously, and we do not, you can assume that an internal 128 mass shift on a tryptic peptide is a glutamine unless it is followed by a proline or sometimes aspartic acid. Another case where internal lysines are left standing after a tryptic digest (this is our personal observation) is when double lysines occur in a sequence, so be careful.
- There are instances where two residues will nearly equal the mass of a single residue, or a modified residue will nearly equal the mass of another amino acid.
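Both kinds of ambiguity can be quantified from a residue mass table. The following sketch is not from the slides. It assumes standard monoisotopic residue masses, and the helper `pairs_matching_single_residue` and its 0.02 Da tolerance are hypothetical choices for illustration.

```python
from itertools import combinations_with_replacement

MONO = {  # monoisotopic residue masses in Da
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

# Mass gap between lysine and glutamine, and what it means in ppm for a
# fragment near m/z 1000 (illustrative m/z).
delta_kq = MONO["K"] - MONO["Q"]
ppm_at_1000 = delta_kq / 1000.0 * 1e6


def pairs_matching_single_residue(tolerance=0.02):
    """Residue pairs whose summed mass falls within tolerance of one residue."""
    hits = []
    for first, second in combinations_with_replacement(sorted(MONO), 2):
        pair_mass = MONO[first] + MONO[second]
        for single, mass in MONO.items():
            if abs(pair_mass - mass) <= tolerance:
                hits.append((first + second, single))
    return hits


print(f"K - Q = {delta_kq:.4f} Da ({ppm_at_1000:.1f} ppm at m/z 1000)")
for pair, single in pairs_matching_single_residue():
    print(f"{pair} ~ {single}")
```

Widening the tolerance to ion-trap-like values would add further coincidences, which is one reason low-accuracy de novo calls need extra care.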

More Rules
- When starting a de novo sequencing project, start at the high mass end of the spectrum; the smaller number of peaks at this end often makes it easier to start sequencing.
- The region 60 u below the parent mass can be confounded by multiple water and ammonia losses, so be careful. Remember that glycine may be your first amino acid and may fall in this region.
- To find out whether your tryptic peptide ends in K or R, look for the diagnostic y1 ions at the low end of the spectrum: you may observe 147 for K or 175 for R.
- The b1 fragment is seldom observed, making it difficult to determine the order of the first two N-terminal amino acids in a peptide sequence. Solutions to this problem include a one-step Edman degradation or an acetylation.
- Once you know the mass of a b or y ion, the corresponding y or b ion can be calculated using the following formulas:
  $y = (M+H)^{1+} - b + 1$
  $b = (M+H)^{1+} - y + 1$
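The complementary-ion formulas and the y1 diagnostic values can be verified numerically. This is a sketch under assumptions: it uses standard monoisotopic residue masses and replaces the nominal "+1" of the formulas with the proton mass. The function names are hypothetical.

```python
MONO = {  # monoisotopic residue masses (Da) needed for this example
    "G": 57.02146, "L": 113.08406, "S": 87.03203, "D": 115.02694,
    "E": 129.04259, "W": 186.07931, "Q": 128.05858, "V": 99.06841,
    "N": 114.04293, "K": 128.09496, "R": 156.10111,
}
WATER = 18.01056
PROTON = 1.00728


def mh_plus(sequence):
    """Singly protonated peptide mass, (M+H)1+."""
    return sum(MONO[aa] for aa in sequence) + WATER + PROTON


def complementary_ion(mh, ion_mz):
    """Given a singly charged b (or y) ion, return the matching y (or b) ion."""
    return mh - ion_mz + PROTON


def y1(c_terminal_residue):
    """m/z of the y1 ion: one residue plus water plus a proton."""
    return MONO[c_terminal_residue] + WATER + PROTON


peptide = "GLSDGEWQQVLNVWGK"
b2 = MONO["G"] + MONO["L"] + PROTON
y14_from_formula = complementary_ion(mh_plus(peptide), b2)
y14_direct = sum(MONO[aa] for aa in peptide[2:]) + WATER + PROTON

print(f"y1(K) = {y1('K'):.2f}, y1(R) = {y1('R'):.2f}")
print(f"y14 from b2: {y14_from_formula:.2f}, computed directly: {y14_direct:.2f}")
```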

Mascot Search Overview
Mascot is a powerful search engine which uses mass spectrometry data to identify proteins from primary sequence databases. While a number of similar programs are available, Mascot is unique in that it integrates all of the proven methods of searching. These different search methods can be categorised as follows:
- Peptide Mass Fingerprint, in which the only experimental data are peptide mass values.
- Sequence Query, in which peptide mass data are combined with amino acid sequence and composition information. A super-set of a sequence tag query.
- MS/MS Ion Search, using uninterpreted MS/MS data from one or more peptides.
The general approach for all types of search is to take a small sample of the protein of interest and digest it with a proteolytic enzyme, such as trypsin. The resulting digest mixture is analysed by mass spectrometry. Different types of mass spectrometer have different capabilities. A simple instrument will measure a set of molecular weights for the intact mixture of peptides. An instrument with MS/MS capability can additionally provide structural information by recording the fragment ion spectrum of a peptide. Usually, the digest mixture will be separated by chromatography prior to MS/MS analysis, so that MS/MS spectra from individual peptides can be measured. The experimental mass values are then compared with calculated peptide mass or fragment ion mass values, obtained by applying cleavage rules to the entries in a comprehensive primary sequence database. By using an appropriate scoring algorithm, the closest match or matches can be identified. If the "unknown" protein is present in the sequence database, then the aim is to pull out that precise entry. If the sequence database does not contain the unknown protein, then the aim is to pull out those entries which exhibit the closest homology, often equivalent proteins from related species.
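Applying cleavage rules to a database entry amounts to an in silico digest. The sketch below is an illustration only and does not reproduce Mascot's implementation: it assumes the commonly used trypsin rule (cleave C-terminal to K or R unless the next residue is P), allows no missed cleavages, and the function name is hypothetical. The test sequence is the complement C3f entry shown later in the slides.

```python
def tryptic_digest(protein):
    """Split a protein after every K or R that is not followed by P."""
    peptides = []
    start = 0
    for index, residue in enumerate(protein):
        next_residue = protein[index + 1] if index + 1 < len(protein) else ""
        if residue in ("K", "R") and next_residue != "P":
            peptides.append(protein[start:index + 1])
            start = index + 1
    if start < len(protein):
        # Trailing peptide that does not end in K or R.
        peptides.append(protein[start:])
    return peptides


print(tryptic_digest("SSKITHRIHWESASLLR"))
```

The calculated masses of these peptides, or of their fragments, are what a search engine would compare against the experimental values.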
The sequence databases that can be searched on this server are:
- MSDB: MSDB is a comprehensive, non-identical protein sequence database maintained by the Proteomics Department at the Hammersmith Campus of Imperial College London. MSDB is designed specifically for mass spectrometry applications.
- NCBInr: NCBInr is a comprehensive, non-identical protein database maintained by NCBI for use with their search tools BLAST and Entrez. The entries have been compiled from GenBank CDS translations, PIR, SWISS-PROT, PRF, and PDB.
- SwissProt: SwissProt is a high quality, curated protein database. Sequences are non-redundant, rather than non-identical, so you may get fewer matches for an MS/MS search than you would from a comprehensive database, such as MSDB or NCBInr. SwissProt is ideal for peptide mass fingerprint searches.
- dbEST: dbEST is the division of GenBank that contains "single-pass" cDNA sequences, or Expressed Sequence Tags, from a number of organisms. During a Mascot search, the nucleic acid sequences are translated in all six reading frames. dbEST is a very large database, and is divided into three sections: EST_human, EST_mouse, and EST_others. Even so, searches of these databases take far longer than a search of one of the non-redundant protein databases. You should only search an EST database if a search of a protein database has failed to find a match.
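Translating a nucleic acid sequence in all six reading frames can be sketched as follows. This is an illustration of the general idea, not Mascot's code. It assumes the standard genetic code, marks stop codons with `*` and unknown codons with `X`, and uses hypothetical function and frame names.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from the start of the string."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Translations for frames +1..+3 (forward) and -1..-3 (reverse strand)."""
    dna = dna.upper()
    frames = {}
    for sign, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = translate(strand[offset:])
    return frames


frames = six_frame_translations("ATGGCCAAA")
for name, protein in frames.items():
    print(name, protein)
```

Six translations per entry are one reason an EST search takes longer than a search of a protein database.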

Mascot search of MS/MS databases (Cristina C. Clement, unpublished results)
All matches to this query (columns: Score, Mr(calc), Delta, Sequence):
THRIHWESASLLR
SSNSSPSRXSLGQLSE
PRFCASLAGGAWLSGL
ETKEGTEGEGLQEEA
ETKEGTEGEGLQEEX
GGSLYEAPVSYTFSK
APVIPAPWEAKAGGSR
EEGTAASLADIMEIR
HVLAHSESINVIAQS
SSSILAMRDEQSNPA
Protein View
Match to: gi| Score: 46: complement C3f
Found in gi|226159, complement C3f
Nominal mass (Mr): 2020
NCBI BLAST search of gi| against nr
Unformatted sequence string for pasting into other applications
Taxonomy: Homo sapiens
No enzyme cleavage specificity
Sequence Coverage: 76%
Matched peptides shown in Bold Red
1 SSKITHRIHW ESASLLR
Peptide View
MS/MS Fragmentation of THRIHWESASLLR
Match to Query 1: from( ,3+)

Monoisotopic mass of neutral peptide Mr(calc):
Ions Score: 46
Expect: 5.4
Matches (Bold Red): 43/134 fragment ions using 51 most intense peaks
Fragment ion table columns: #, b, b++, b*, b*++, b0, b0++, Seq., y, y++, y*, y*++, y0, y0++, #
Seq.: T H R I H W E S A S L L R
Cristina C. Clement, unpublished results

LTQ (2+) Cristina C. Clement, unpublished results

All matches to this query (columns: Score, Mr(calc), Delta, Sequence):
HWESASLLR
HGKEMDLLR
GFTFSASDMH
AFSFSSALIR
HSTYSSLMSS
HGEEASSAIPT
HQGKLVFNR
HGEEGMGQGVV
HGEKEEELK
QSQKSSMDSC
Protein View
Match to: gi| Score: 42: complement C3f
Found in gi|226159, complement C3f
Nominal mass (Mr): 2020
NCBI BLAST search of gi| against nr
Unformatted sequence string for pasting into other applications
Taxonomy: Homo sapiens
No enzyme cleavage specificity
Sequence Coverage: 52%
Matched peptides shown in Bold Red
1 SSKITHRIHW ESASLLR
Peptide View
MS/MS Fragmentation of HWESASLLR
Match to Query 1: from( ,2+)
Cristina C. Clement, unpublished results

Monoisotopic mass of neutral peptide Mr(calc):
Ions Score: 45
Expect: 6.5
Matches (Bold Red): 26/70 fragment ions using 37 most intense peaks
Fragment ion table columns: #, b, b++, b0, b0++, Seq., y, y++, y*, y*++, y0, y0++, #
Seq.: H W E S A S L L R
SSKITHRIHW ESASLLR - 1st hit for complement C3f
1 SSKITHRIHW ESASLLR - 2nd hit for complement C3f
Cristina C. Clement, unpublished results
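The sequence coverage values reported in the Protein View for the two complement C3f hits (76% and 52%) can be reproduced from the sequences alone. The sketch below is an illustration, not Mascot's code: coverage is taken as the fraction of protein residues that fall inside at least one matched peptide, and truncating to a whole percent is an assumption that happens to reproduce both reported numbers.

```python
def sequence_coverage(protein, matched_peptides):
    """Fraction of protein residues covered by at least one matched peptide."""
    covered = [False] * len(protein)
    for peptide in matched_peptides:
        start = protein.find(peptide)
        while start != -1:
            for position in range(start, start + len(peptide)):
                covered[position] = True
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein)


c3f = "SSKITHRIHWESASLLR"
first_hit = sequence_coverage(c3f, ["THRIHWESASLLR"])
second_hit = sequence_coverage(c3f, ["HWESASLLR"])
print(f"1st hit: {int(first_hit * 100)}%")
print(f"2nd hit: {int(second_hit * 100)}%")
```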

Monoisotopic: 733.32, Charge = +2 (Cristina C. Clement, unpublished results)
Probability Based Mowse Score
Ions score is -10*Log(P), where P is the probability that the observed match is a random event. Individual ions scores > 56 indicate peptides with significant homology. Individual ions scores > 73 indicate identity or extensive homology (p<0.05). Protein scores are derived from ions scores as a non-probabilistic basis for ranking protein hits.
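The relation between an ions score and the underlying probability is easy to invert. This is a minimal sketch of the stated formula only, assuming that "Log" means the base-10 logarithm. The function names are hypothetical, and the significance thresholds quoted above depend on the search and are not derived here.

```python
import math


def ions_score(probability):
    """Score = -10 * log10(P), P being the chance that the match is random."""
    return -10.0 * math.log10(probability)


def probability_from_score(score):
    """Inverse of ions_score: P = 10 ** (-score / 10)."""
    return 10.0 ** (-score / 10.0)


for score in (20, 46, 73, 123):
    print(f"score {score:>3} -> P = {probability_from_score(score):.2e}")
```

Every 10 points of score corresponds to a tenfold drop in the probability that the match is a random event.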

Peptide View
MS/MS Fragmentation of DSGEGDFLAEGGGVR
Found in gi|229185, fibrinopeptide A
Match to Query 1: from( ,2+)
Data file: C:\Documents and Settings\Laura Santambrogio\Desktop\rksah062708\LTQ\LTQ- ccc\LII_3RAW071608\12446Raw_2+.txt
Monoisotopic mass of neutral peptide Mr(calc):
Ions Score: 123
Expect: 5.3e-07
Matches (Bold Red): 38/130 fragment ions using 34 most intense peaks
Fragment ion table columns: #, b, b++, b0, b0++, Seq., y, y++, y*, y*++, y0, y0++, #
Seq.: D S G E G D F L A E G G G V R
Cristina C. Clement, unpublished results

All matches to this query (columns: Score, Mr(calc), Delta, Sequence):
DSGEGDFLAEGGGVR
DSGEGDFLAEGGGVR
LDLCQDSFPGNPTG
EMYRNLAQGRNV
EMYRNLAQGRNV
CARGWAFDIWGQG
SSVGTEMIITKAGR
RSSGGETETTGQSAV
RSSGGETETTGQSAV
CARDQAFDIWGQG
Protein sequence:
  1 MFSMRIVCLV LSVVGTAWTA DSGEGDFLAE GGGVRGPRVV ERHQSACKDS
 51 DWPFCSDEDW NYKCPSGCRM KGLIDEVNQD FTNRINKLKN SLFEYQKNNK
101 DSHSLTTNIM EILRGDFSSA NNRDNTYNRV SEDLRSRIEV LKRKVIEKVQ
151 HIQLLQKNVR AQLVDMKRLE VDIDIKIRSC RGSCSRALAR EVDLKDYEDQ
201 QKQLEQVIAK DLLPSRDRQH LPLHSSLGDR ARLHLKTNKT AKKKKKKKKK
Cristina C. Clement, unpublished results

Bioinformatics Workshop project (01/21/2009)
Leading project supervisor: Cristina C. Clement, Ph.D.
- Set up the MS/MS Ion Search parameters using the Mascot algorithm.
- Import .txt files of MS/MS data at the Mascot server online (3 independent .txt files are provided per student).
- Analyze the Mascot results:
  - Analyze the Peptide View and understand the scoring functions.
  - Analyze the Protein View and understand the sequence coverage.
- Re-analyze the .txt MS/MS files using independent searching algorithms at the PROSPECT and PROFOUND web servers.