INF380 – Proteomics, Chapter 10 – Spectral Comparison


INF380 - Proteomics-101
INF380 – Proteomics Chapter 10 – Spectral Comparison
Spectral comparison means that an experimental spectrum is compared to theoretical spectra constructed from segments of database sequences. The segments are typically theoretical peptides from an in silico digestion.
The comparison methods can be characterized by the following:
– which fragment ion types are considered
– how the intensities in the theoretical spectrum are calculated
– how the comparison is performed (the algorithm) and scored
– whether modifications/mutations are taken into account, and how
Constructing a theoretical spectrum
– First one has to specify the fragment types expected in the experimental spectra, $\{\delta_i\}$.
– The segment is then processed by a theoretical fragmentation, producing ions of the specified types (typically singly charged).
Peak construction
– For each fragmentation site one must decide for which of the fragment types $\Delta = \{\delta_i\}$ there should be peaks.
– The simplest case is to construct all of them. This results in a complete spectrum.
– An alternative approach is to use a fragment type probability $p(\delta_i)$, an estimate of the probability that an ion of type $\delta_i$ is produced at a fragmentation site.
– In addition, peaks can be produced due to noise. If $q$ is the probability of a peak due to noise, then a peak should be constructed at the corresponding $m/z$ value with probability $p(\delta_i) + [1 - p(\delta_i)]q$.
– The $m/z$ values of the theoretical peaks are determined by the equations described in an earlier chapter.
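The peak construction step can be sketched in Python as follows. This is a minimal illustration, not the course's implementation: it assumes only singly charged b- and y-ions, a small table of monoisotopic residue masses, and hypothetical function names (`fragment_mz`, `inclusion_probability`, `theoretical_spectrum`). A peak is kept with probability p + (1 - p)q, and passing no probabilities builds the complete spectrum.

```python
import random

# Monoisotopic residue masses in Da for a few amino acids (extend as needed).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "L": 113.08406, "K": 128.09496, "E": 129.04259,
    "R": 156.10111,
}
PROTON = 1.00728   # mass of the charge-carrying proton
WATER = 18.01056   # y-ions carry an extra water

def fragment_mz(peptide):
    """Return (ion_type, ion_index, m/z) for singly charged b- and y-ions."""
    masses = [RESIDUE_MASS[a] for a in peptide]
    peaks = []
    for site in range(1, len(peptide)):          # one site between each residue pair
        b = sum(masses[:site]) + PROTON
        y = sum(masses[site:]) + WATER + PROTON
        peaks.append(("b", site, b))
        peaks.append(("y", len(peptide) - site, y))
    return peaks

def inclusion_probability(p_type, q_noise):
    """Probability of a peak: p(delta_i) + [1 - p(delta_i)] * q."""
    return p_type + (1.0 - p_type) * q_noise

def theoretical_spectrum(peptide, p=None, q=0.0, rng=None):
    """Build a theoretical spectrum.

    p maps ion type -> p(delta_i); p=None constructs every peak
    (the complete spectrum). rng is a random.Random instance.
    """
    rng = rng or random.Random(0)
    spectrum = []
    for ion_type, index, mz in fragment_mz(peptide):
        prob = 1.0 if p is None else inclusion_probability(p[ion_type], q)
        if rng.random() < prob:
            spectrum.append((ion_type, index, mz))
    return spectrum
```

For example, `theoretical_spectrum("GASP")` returns all six b- and y-peaks, while `theoretical_spectrum("GASP", p={"b": 0.6, "y": 0.9}, q=0.05)` returns a random subset.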

INF380 - Proteomics-102
Peak intensities
The determination of the peak intensities can be done at three levels, resulting in different types of theoretical spectra (or spectra at different levels):
– UT spectra (Uniform Theoretical): All peaks have the same height.
– FT spectra (Fragment Theoretical): The height of a peak depends on the fragment type, meaning for example that all b-ions get the same height, but a different height from the y-ions.
– RT spectra (Residue Theoretical): Different heights are given to each peak of the same fragment type, depending on information about position, length, sequence, mass, etc.
These different spectra types are reflected in different methods for spectral comparison and scoring.
– One can quickly construct UT spectra (or FT spectra) and compare them to the experimental spectrum. If a theoretical spectrum gives a high score, one goes on to a more sophisticated scoring, based on the same type of information as is used for constructing RT spectra. This can be considered a two-step procedure where the first step functions as a filter.
– One can spend more time constructing RT spectra before comparison.
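As a small illustration of the first two levels, the sketch below assigns heights to a list of theoretical peaks. The peak representation `(ion_type, m/z)`, the function names, and the example heights are all assumptions made for this sketch; RT spectra are left out because they need a model of position, sequence, and mass effects that the slides do not specify.

```python
def ut_spectrum(peaks, height=1.0):
    """Uniform Theoretical: every peak gets the same height."""
    return [(mz, height) for _ion_type, mz in peaks]

def ft_spectrum(peaks, type_height):
    """Fragment Theoretical: the height depends only on the fragment type."""
    return [(mz, type_height[ion_type]) for ion_type, mz in peaks]

# Hypothetical peaks for illustration: (ion type, m/z).
example_peaks = [("b", 58.03), ("y", 90.05), ("b", 129.07), ("y", 187.11)]

ut = ut_spectrum(example_peaks)
ft = ft_spectrum(example_peaks, {"b": 0.5, "y": 1.0})  # assumed heights
```

In the two-step procedure, spectra such as `ut` or `ft` would serve for the fast filtering pass, with the more detailed scoring reserved for the candidates that survive it.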

INF380 - Proteomics-103
Non-probabilistic scoring
In this context, comparing spectra means comparing an experimental and a theoretical spectrum, though several of the scoring schemes were originally developed for comparing experimental spectra. Note however that comparing two experimental spectra (for example, to examine whether they are generated from the same peptide) differs from comparing an experimental and a theoretical spectrum, as all expected fragment types are included in the theoretical spectra.
Two main methods are used in the comparison:
– Search for matching peaks.
– Divide the m/z axis into intervals, and compare the integrated intensities in corresponding intervals.
The latter avoids the process of finding corresponding peaks, but its disadvantages are a loss of precision and problems when an ion's mass lies on the border between two intervals.
Scoring schemes typically include a sum of the scores for each pair of matching peaks (or intervals), but often also a scoring component based on several matching peaks. Different variants of these methods are used, from quite simple to more advanced.
Number and intensities of matching peaks or intervals
The simplest procedure for comparison is to process the spectra in parallel, counting the number of matching peaks. This can easily be extended to take intensities into account (either only from the experimental spectrum, or from both the experimental and theoretical spectra). One possibility is to calculate a sum, over the matching peaks or intervals, of terms based on their intensities.
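Both comparison methods can be sketched briefly. The code below is an illustration under stated assumptions, not the scoring formula from the slides (which is not preserved here): `count_matches` walks two sorted m/z lists in parallel with an assumed tolerance, and `interval_score` bins both spectra and sums the product of the intensities in corresponding intervals, so that only intervals populated in both spectra contribute. The function names, tolerance, and bin width are hypothetical.

```python
def count_matches(exp_mz, theo_mz, tol=0.5):
    """Process two sorted m/z lists in parallel, counting matching peaks."""
    i = j = matches = 0
    while i < len(exp_mz) and j < len(theo_mz):
        diff = exp_mz[i] - theo_mz[j]
        if abs(diff) <= tol:      # peaks agree within the tolerance
            matches += 1
            i += 1
            j += 1
        elif diff < 0:            # experimental peak is too low, advance it
            i += 1
        else:                     # theoretical peak is too low, advance it
            j += 1
    return matches

def bin_spectrum(peaks, width=1.0):
    """Sum the intensities of (m/z, intensity) peaks per m/z interval."""
    bins = {}
    for mz, intensity in peaks:
        key = int(mz // width)
        bins[key] = bins.get(key, 0.0) + intensity
    return bins

def interval_score(exp_peaks, theo_peaks, width=1.0):
    """Sum of intensity products over intervals populated in both spectra."""
    e = bin_spectrum(exp_peaks, width)
    t = bin_spectrum(theo_peaks, width)
    return sum(e[k] * t[k] for k in e.keys() & t.keys())
```

The border problem is easy to reproduce: peaks at m/z 100.95 and 101.05 differ by only 0.1 but fall into different unit-width intervals, so `interval_score` gives them no credit while `count_matches` does.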

INF380 - Proteomics-104
Non-probabilistic scoring
We see that only intervals with intensities in both spectra affect the score. The scoring scheme above has two components: the number of matches, and the intensities of the matches. It assumes a linear increase in the score as the number of matches increases. This means (if all peaks have the same intensity) that a comparison with eight matches is twice as good as a comparison with four matches. This is unreasonable, since some of the matches may occur simply by chance. The underlying probability functions for the number of matches occurring by chance are typically exponential, which suggests that the score should increase exponentially as a function of the number of matches. This would not matter if all intensities were the same (the scores would arrange the segments in the same order), but it has an effect when the intensities vary.

INF380 - Proteomics-105
Non-probabilistic scoring
Spectral contrast angle
– A spectrum can be represented as an n-dimensional vector, where n is the number of considered m/z values (or intervals). The j'th component of the experimental spectrum is then $I_j^R$ (and $I_j^T$ for the theoretical spectrum).
– Two spectra can be compared by calculating the angle between the vector representations, called the spectral contrast angle.
– Two equal spectra have a contrast angle of zero, and 90 degrees indicates the maximum spectral difference.
– The spectral contrast angle is mainly used to identify spectra produced by the same peptide.
Cross-correlation
– A common function for calculating the correlation between any two signal series is the cross-correlation function.
– For spectra in our context it can be formulated as $C_\tau = \sum_j I_j^R I_{j+\tau}^T$,
– where $\tau$ is a relative displacement between the spectra.
– The simplest way to calculate the similarity (or correlation) is to use the correlation value for $\tau = 0$ (corresponding to the number of matching peaks when intensities are considered).
– It has however been found that subtracting the mean of the cross-correlation function over a range $-k < \tau < k$ from the $\tau = 0$ value gives better discrimination between similar spectra and is less sensitive to the purity of the samples.
– In \cite{Eng-JAM94} it was found empirically that 75 is a suitable value for k for MS/MS data.
– For efficient calculation of this function for many $\tau$, a Fourier transformation can be used. R and T are Fourier transformed, one of them is converted to its complex conjugate, and the two are multiplied. The result is then inverse Fourier transformed to get the values $C_\tau$ for many $\tau$.
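The two measures on this slide can be sketched in plain Python. The code assumes that both spectra are already binned into equal-length intensity vectors, that the cross-correlation takes the usual form C_tau = sum over j of R[j] * T[j + tau] with out-of-range terms treated as zero, and that the background mean runs over every tau with -k < tau < k, including tau = 0. The function names are hypothetical, and the sum is computed directly rather than through the Fourier route, which only matters for speed.

```python
import math

def contrast_angle(r, t):
    """Spectral contrast angle in degrees between two intensity vectors."""
    dot = sum(a * b for a, b in zip(r, t))
    norm = math.sqrt(sum(a * a for a in r) * sum(b * b for b in t))
    cos_theta = max(-1.0, min(1.0, dot / norm))   # guard against rounding
    return math.degrees(math.acos(cos_theta))

def cross_correlation(r, t, tau):
    """C_tau = sum_j r[j] * t[j + tau], ignoring out-of-range indices."""
    n = len(r)
    return sum(r[j] * t[j + tau] for j in range(n) if 0 <= j + tau < n)

def xcorr_score(r, t, k=75):
    """C_0 minus the mean of C_tau over -k < tau < k."""
    taus = range(-k + 1, k)
    background = sum(cross_correlation(r, t, tau) for tau in taus) / len(taus)
    return cross_correlation(r, t, 0) - background
```

For the toy vector `[1, 0, 2, 0]` compared with itself, the contrast angle is 0 degrees and `xcorr_score(..., k=3)` is 5 minus the mean of (2, 0, 5, 0, 2), that is 3.2.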

INF380 - Proteomics-106
Sequest scoring
Since several of the statistical and assessment analyses for MS/MS spectra use results from SEQUEST, we describe how SEQUEST scores its candidate segments when searching in a database. It has two types of scoring: first a preliminary scoring that is used to filter out segments with a small probability of being the correct peptide, and then a final scoring scheme that is applied to the remaining segments.
Preliminary scoring
– The preliminary scoring uses b- and y-type continuity, and the presence of immonium ions.
– The b- (y-) type continuity counts the matching peaks of type b-ion (y-ion) that also have a matching peak for the preceding b-ion (y-ion). If the total number of such peaks is C, a factor $1 + C\beta$ is included in the scoring, where $\beta$ is a fixed constant.
– The immonium ions are scored by calculating a value $\rho_a$ for each of the amino acids a which are considered for the presence of immonium ions.
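The continuity component can be illustrated with a short sketch. It assumes that the match status of each ion series is available as a list of booleans ordered by ion index (b1, b2, ... or y1, y2, ...). The function names are hypothetical, and the weight `beta` is left as a parameter because its value is not preserved in the slide text; the value used in the example is arbitrary.

```python
def continuity_count(matched):
    """Count matched ions whose preceding ion in the series also matched."""
    return sum(1 for prev, cur in zip(matched, matched[1:]) if prev and cur)

def continuity_factor(matched_b, matched_y, beta):
    """Factor 1 + C * beta, with C summed over the b- and y-series."""
    c = continuity_count(matched_b) + continuity_count(matched_y)
    return 1.0 + c * beta

# Example: b1, b2 matched; b3 missed; b4, b5, b6 matched -> b2, b5, b6 count.
b_series = [True, True, False, True, True, True]
y_series = [False, True, True, False, False, False]
factor = continuity_factor(b_series, y_series, beta=0.1)  # arbitrary beta
```

Here C is 3 for the b-series and 1 for the y-series, so the factor is 1 + 4 * 0.1.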

INF380 - Proteomics-107
Sequest scoring – Final scoring
In addition to cross-correlation values, several other scoring components are calculated for identified segments. The most important ones are given below.
– XCorr, the cross-correlation score.
– dCn, the delta correlation value. The cross-correlation values are normalized such that the highest correlation value is one. dCn for a candidate match is 1 minus the normalized correlation value. The dCn value for the second best match (second highest correlation value) is of special interest.
– Sp, the preliminary score.
– RSp, the rank that the segment under consideration got in the preliminary scoring.
– Ions, the number of matched peaks divided by the number of peaks in the theoretical spectrum.
– dM, the difference between the experimental precursor mass and the mass of the segment under consideration.
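The derived components follow directly from their definitions, as the sketch below shows. It is not SEQUEST code: the function names are hypothetical, and it assumes the best XCorr among the candidates is positive.

```python
def delta_cn(xcorrs):
    """dCn per candidate: 1 minus XCorr normalized by the best XCorr."""
    best = max(xcorrs)
    return [1.0 - x / best for x in xcorrs]

def ions_fraction(n_matched, n_theoretical):
    """Ions: matched peaks divided by peaks in the theoretical spectrum."""
    return n_matched / n_theoretical

def delta_mass(precursor_mass, segment_mass):
    """dM: experimental precursor mass minus the candidate segment mass."""
    return precursor_mass - segment_mass

# Three candidates sorted by XCorr; the second entry of the result is the
# dCn of the second best match.
dcn = delta_cn([4.0, 3.0, 1.0])
```

With these example values `dcn` is `[0.0, 0.25, 0.75]`; a large dCn for the second best match means the top candidate stands clearly apart from the rest.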

INF380 - Proteomics-108 Probabilistic scoring