Published by Horatio Parker. Modified over 9 years ago.
1
Proteomics Informatics – Signal processing I: analysis of mass spectra (Week 3)
2
Example data – MALDI-TOF: peptide intensity vs m/z
3
Example data – ESI-LC-MS/MS: peptide intensity vs m/z vs time (MS); fragment intensity vs m/z (MS/MS)
4
Sine: amplitude and wavelength (figure labels: a, b, c)
5
Sine and Cosine (figure labels: a, b, c)
6
Two Frequencies
7
Fourier Transform
8
from numpy import *
x = 2.0*pi*arange(1000.0)/100000.0
sin1 = sin(1000.0*x)
sin2 = 0.2*sin(10000.0*x)
sin12 = sin1 + sin2
fft12 = fft.rfft(sin12)
(plot axis: Frequency)
9
Inverse Fourier Transform (plot axis: Frequency)
10
Inverse Fourier Transform (plot axis: Frequency)
from numpy import *
x = 2.0*pi*arange(1000.0)/100000.0
sin1 = sin(1000.0*x)
sin2 = 0.2*sin(10000.0*x)
sin12 = sin1 + sin2
fft12 = fft.rfft(sin12)
sin12_ = fft.irfft(fft12, len(sin12))
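The round trip on this slide can be checked numerically. The sketch below is a minimal example that reuses the slide's signal but uses `import numpy as np` instead of the star import; the names `amplitude`, `top_bins`, and `round_trip_error` are introduced here for illustration only.

```python
import numpy as np

# Same signal as on the slide: two sines sampled at 1000 points.
n = 1000
x = 2.0 * np.pi * np.arange(n) / 100000.0
sin12 = np.sin(1000.0 * x) + 0.2 * np.sin(10000.0 * x)

# Forward real FFT: bin k holds the component with k cycles per record.
fft12 = np.fft.rfft(sin12)
amplitude = 2.0 * np.abs(fft12) / n   # scaled so a unit-amplitude sine gives 1.0

# The two largest bins are the two frequencies that were mixed.
top_bins = sorted(int(k) for k in np.argsort(amplitude)[-2:])

# Inverse transform: passing n restores the original length.
sin12_ = np.fft.irfft(fft12, n)
round_trip_error = float(np.max(np.abs(sin12 - sin12_)))
```

With these parameters the two sines complete 10 and 100 full cycles in the record, so their energy falls in exactly two bins and the inverse transform recovers the input up to rounding error.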
11
Inverse Fourier Transform (plot axis: Frequency)
12
A Peak: centroid, full width at half maximum (FWHM), area, height, maximum, mean, variance, skewness, kurtosis (plot axis: Intensity)
13
Mean and variance (equations for the mean and the variance of a peak, shown as images on the slide)
14
Skewness and kurtosis (equations for skewness and kurtosis, shown as images on the slide)
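The slide formulas for these four quantities are not preserved in this transcript. A common choice, assumed here, is to treat the peak as a distribution over x weighted by intensity and to report kurtosis as excess kurtosis (so that a Gaussian gives 0). The function name `peak_moments` is hypothetical.

```python
import numpy as np

def peak_moments(x, intensity):
    """Intensity-weighted mean, variance, skewness and excess kurtosis."""
    w = intensity / intensity.sum()          # weights that sum to 1
    mean = np.sum(w * x)
    var = np.sum(w * (x - mean) ** 2)
    skewness = np.sum(w * (x - mean) ** 3) / var ** 1.5
    kurtosis = np.sum(w * (x - mean) ** 4) / var ** 2 - 3.0   # excess kurtosis
    return mean, var, skewness, kurtosis

# Check on a Gaussian peak with sigma = 0.1 centered at 0.
x = np.linspace(-1, 1, 1000)
y = np.exp(-x ** 2 / (2 * 0.1 ** 2))
mean, var, skewness, kurtosis = peak_moments(x, y)
```

For this Gaussian the variance should come out close to 0.1**2, and both skewness and excess kurtosis close to zero.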
15
A Gaussian Peak (plot axis: Frequency)
from numpy import *
def gaussian(x, x0, s):
    return exp(-(x-x0)**2/(2*s**2))
x = linspace(-1, 1, 1000)
y = gaussian(x, 0, 0.1)
ffty = fft.rfft(y)
16
A Gaussian Peak: Skewness = 0, Kurtosis = 0 (plot axis: Frequency)
17
Peak with a longer tail (plot axis: Frequency)
18
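A skewed peak (plot axis: Frequency)
from numpy import *
from scipy.special import erf    # erf is not provided by numpy
def pdf(x):
    return 1/sqrt(2*pi) * exp(-x**2/2)
def cdf(x):
    return (1 + erf(x/sqrt(2))) / 2
def skew(x, e=0, w=1, a=0):
    # skew-normal density with location e, scale w and shape a
    t = (x-e) / w
    return 2 / w * pdf(t) * cdf(a*t)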
19
Normal noise (plot axis: Frequency)
x = linspace(-1, 1, 1000)
y = 0.2*random.normal(size=len(x))
If the noise is not normally distributed, try to find a transform that makes it normal.
20
Lognormal noise (plot axis: Frequency)
x = linspace(-1, 1, 1000)
y = 0.2*random.lognormal(size=len(x))
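Lognormal noise is a case where the earlier advice about finding a normalizing transform has a simple answer: taking the logarithm gives normally distributed values. The sketch below illustrates this by comparing sample skewness before and after the transform; the helper `sample_skewness`, the seed, and the sample size are choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.0, sigma=1.0, size=100000)

def sample_skewness(v):
    """Third standardized moment of the sample."""
    d = v - v.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

skew_raw = sample_skewness(y)           # strongly positive: long right tail
skew_log = sample_skewness(np.log(y))   # near zero: log(y) is normal
```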
21
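Skewed noise (plot axis: Frequency)
# Rejection sampling from the skewed peak shape; x is the 1000-point grid from the previous slides
x_test = random.uniform(-1.0, 1.0, size=10*len(x))
y = random.uniform(0.0, 1.0, size=10*len(x))
yskew = skew(x_test, -0.1, 0.2, 10)
yskew = yskew / max(yskew)
yn_skew = x_test[y < yskew][:len(x)]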
22
Gaussian peak with normal noise (plot axis: Frequency)
23
Removing High Frequencies (plot axis: Frequency)
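One way to remove high frequencies, sketched here under assumptions, is to transform the signal, set the coefficients above a cutoff to zero, and transform back. The slides do not show how the cutoff was chosen; `keep=20` below is an arbitrary value for this example, and `low_pass` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 1000)
clean = np.exp(-x ** 2 / (2 * 0.1 ** 2))          # Gaussian peak, sigma = 0.1
noisy = clean + 0.2 * rng.normal(size=len(x))     # normal noise as on the earlier slide

def low_pass(y, keep):
    """Zero every rfft coefficient with index >= keep, then invert."""
    spectrum = np.fft.rfft(y)
    spectrum[keep:] = 0.0
    return np.fft.irfft(spectrum, len(y))

smoothed = low_pass(noisy, keep=20)

# Root mean square error against the noise-free peak, before and after.
err_noisy = np.sqrt(np.mean((noisy - clean) ** 2))
err_smooth = np.sqrt(np.mean((smoothed - clean) ** 2))
```

The peak is broad, so its spectrum is concentrated in the lowest few bins, while white noise is spread evenly over all of them. Dropping the high bins therefore removes most of the noise and little of the peak.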
24
Convolution http://en.wikipedia.org/wiki/Convolution Describes the response of a linear and time-invariant system to an input signal. It equals the inverse Fourier transform of the pointwise product in frequency space.
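The statement about the pointwise product in frequency space can be verified directly. This is a minimal sketch with random test arrays; the zero-padding length `n` is needed because the FFT computes a circular convolution.

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.normal(size=64)
b = rng.normal(size=16)

# Direct linear convolution, length 64 + 16 - 1.
direct = np.convolve(a, b, mode="full")

# Zero-pad both inputs to the full output length so that the circular
# convolution computed through the FFT matches the linear one.
n = len(a) + len(b) - 1
via_fft = np.fft.irfft(np.fft.rfft(a, n) * np.fft.rfft(b, n), n)

max_diff = float(np.max(np.abs(direct - via_fft)))
```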
25
Smoothing by convolution
26
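Smoothing (plot axes: Frequency, Intensity)
w = ones(2*width+1, 'd')                     # moving-average window of 2*width+1 points
y_smooth = convolve(w/w.sum(), y, 'valid')   # 'valid' drops width points at each end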
27
Smoothing
29
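Adaptive Background Correction (unsharp masking) (panels: Original, Unsharp masking)
wi = linspace(1, window_len, window_len)
w = 1 / (2*r_[wi[::-1], 0, wi] + 1)          # weights fall off with distance from the window center
x_ = x - d*convolve(w/w.sum(), x, 'same')    # 'same' keeps the result the same length as x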
30
Adaptive Background Correction
31
Smoothing and Adaptive Background Correction
32
Savitzky-Golay smoothing (panels: Polynomial order = 3, 5, 7; Bin size = 25, 75, 150)
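Savitzky-Golay smoothing fits a low-order polynomial to each window by least squares and keeps the fitted value at the window center. The sketch below implements this with NumPy only, as an illustration rather than the code behind the slide; `savgol_coefficients` and `savgol_smooth` are hypothetical names. SciPy offers `scipy.signal.savgol_filter` for the same purpose.

```python
import numpy as np

def savgol_coefficients(window, order):
    """Weights that evaluate a least-squares polynomial fit at the window center."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    A = np.vander(offsets, order + 1, increasing=True)   # columns 1, t, t**2, ...
    return np.linalg.pinv(A)[0]    # row 0 yields the fitted constant term

def savgol_smooth(y, window, order):
    """Apply the weights as a convolution; the edges are zero-padded and distorted."""
    c = savgol_coefficients(window, order)
    return np.convolve(y, c[::-1], mode="same")

# A cubic is reproduced exactly (away from the edges) by a fit of order 3.
t = np.arange(200.0)
y = 0.5 + 0.1 * t - 2e-3 * t ** 2 + 1e-5 * t ** 3
smoothed = savgol_smooth(y, window=25, order=3)
interior = slice(12, -12)
```

Larger windows smooth more strongly, while higher polynomial orders follow narrow peaks more closely. That is the trade-off the slide panels vary.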
33
Background (plot axis: Frequency)
34
Background Subtraction Using Smoothing (panels: Smoothing, Background subtraction; Bin size = 100, 200, 300)
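A minimal sketch of the idea, assuming the background is estimated with a moving average much wider than the peak and then subtracted. The linear baseline, the window width, and the reflection padding at the edges are choices made for this example and are not taken from the slides.

```python
import numpy as np

def moving_average(y, width):
    """Moving average over 2*width+1 points; edges are handled by reflection."""
    padded = np.pad(y, width, mode="reflect")
    w = np.ones(2 * width + 1)
    return np.convolve(w / w.sum(), padded, mode="valid")

x = np.linspace(-1, 1, 1000)
background = 1.0 + 0.5 * x                     # slowly varying baseline (made up)
peak = np.exp(-x ** 2 / (2 * 0.01 ** 2))       # narrow Gaussian peak
y = background + peak

estimate = moving_average(y, width=100)        # wide window: mostly background
corrected = y - estimate

# Region away from both the peak and the edges.
far = (np.abs(x) > 0.3) & (np.abs(x) < 0.7)
```

The peak itself leaks into the background estimate, so the corrected peak height is slightly too low. A wider window reduces that bias but follows a curved background less closely.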
35
Root Mean Square Deviation (RMSD): computed in a moving window, the RMSD is often roughly constant over noise-only regions and larger over a peak, provided the window size is approximately the width of the peak.
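The moving-window RMSD can be sketched as follows, assuming the deviation is taken about the local mean of each window. The function name `moving_rmsd`, the noise level, and the window width are choices made for this example.

```python
import numpy as np

def moving_rmsd(y, width):
    """RMSD about the local mean in each window of 2*width+1 points."""
    w = np.ones(2 * width + 1) / (2 * width + 1)
    mean = np.convolve(y, w, mode="valid")
    mean_sq = np.convolve(y ** 2, w, mode="valid")
    # mean(y**2) - mean(y)**2 is the variance; clip tiny negative rounding errors.
    return np.sqrt(np.maximum(mean_sq - mean ** 2, 0.0))

rng = np.random.default_rng(6)
x = np.linspace(-1, 1, 1000)
y = 0.05 * rng.normal(size=len(x)) + np.exp(-x ** 2 / (2 * 0.01 ** 2))

width = 10                                   # window of 21 points, about the peak width
rmsd = moving_rmsd(y, width)
peak_index = int(np.argmax(rmsd)) + width    # index into y of the window center
```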
36
Background Subtraction using RMSD (panels: Bin size = 100, 200, 300; plot axes: RMSD, Intensity)
37
Convolution, Cross-correlation, and Autocorrelation http://en.wikipedia.org/wiki/Convolution Convolution describes the response of a linear and time-invariant system to an input signal. It equals the inverse Fourier transform of the pointwise product in frequency space. Cross-correlation is a measure of similarity of two signals. It can be used for finding a shift between two signals. Autocorrelation is the cross-correlation of a signal with itself. It can be used for finding periodic signals obscured by noise.
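Finding a shift with cross-correlation can be sketched as below. The random test signal and the shift of 7 samples are made up for this example. The lag axis follows the indexing of `np.correlate` in `"full"` mode, where output index `i` corresponds to lag `i - (len(v) - 1)`.

```python
import numpy as np

rng = np.random.default_rng(3)
n, true_shift = 500, 7
signal = rng.normal(size=n)

# Delay the signal by true_shift samples, filling the start with zeros.
shifted = np.concatenate([np.zeros(true_shift), signal[:-true_shift]])

# Cross-correlate and read off the lag with the largest overlap.
xcorr = np.correlate(shifted, signal, mode="full")
lags = np.arange(-(n - 1), n)
estimated_shift = int(lags[np.argmax(xcorr)])
```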
38
Cross-correlation and autocorrelation http://en.wikipedia.org/wiki/Convolution
39
Autocorrelation Signal Same signal
40
Cross-correlation Signal Shifted signal
41
Cross-correlation Signal Half of the peaks shifted
42
How similar are two signals? Dot product. Identical vectors: a·a = |a|² (1 if normalized). Perpendicular vectors: a·b = 0. The dot product is the same as the cross-correlation at zero lag.
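These properties of the dot product are easy to check numerically. In the sketch below, `normalized_dot` is a hypothetical helper that scales both vectors to unit length first, and the three small vectors are made-up test data.

```python
import numpy as np

def normalized_dot(a, b):
    """Dot product of a and b after scaling each to unit length."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 0.0, 3.0])
b = np.array([0.0, 0.0, 5.0, 0.0])   # non-zero only where a is zero
c = np.array([2.0, 1.0, 1.0, 0.0])

same = normalized_dot(a, a)            # identical vectors
perpendicular = normalized_dot(a, b)   # perpendicular vectors

# The zero-lag element of the full cross-correlation is the plain dot product.
xcorr = np.correlate(a, c, mode="full")
zero_lag = float(xcorr[len(c) - 1])
```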
43
What are the characteristics of the dot product? (figure: dot product for Signal+Noise and for Noise; S/N = 10, 3, 1, 0.3, 0.1; Dimensions = 10, 100, 1000)
44
Autocorrelation Signal Shifted signal Sum of signal and shifted signal
45
Coincidence – enhances the signal: The signal-to-noise ratio can be dramatically increased by measuring several independent signals of the same phenomenon and combining these signals. (panels: Ideal signal, Four measurements, Product of the four measurements)
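A minimal simulation of this effect, assuming the four measurements share the same peak, carry independent non-negative noise, and are combined by pointwise multiplication as in the slide panels. The noise model and the `peak_to_background` measure are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(-1, 1, 1000)
ideal = np.exp(-x ** 2 / (2 * 0.01 ** 2))

# Four independent measurements: the same peak plus independent non-negative noise.
measurements = [ideal + 0.3 * np.abs(rng.normal(size=len(x))) for _ in range(4)]
product = np.prod(measurements, axis=0)

def peak_to_background(y):
    """Value at the peak apex divided by the mean value far from the peak."""
    far = np.abs(x) > 0.1
    return float(y[np.argmax(ideal)] / np.mean(y[far]))

single = peak_to_background(measurements[0])
combined = peak_to_background(product)
```

Away from the peak the product multiplies four small independent values, so the background shrinks much faster than the apex, where all four measurements are large at once.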
46
Coincidence – suppresses and transforms the noise (panels: Original noise, Noise in product)
47
Coincidence – suppresses interference (panels: Ideal signal, Four measurements with interference, Product of the four measurements)
48
Peak Finding The derivative of a function is zero at its minima and maxima. The second derivative is negative at maxima and positive at minima.
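The derivative criterion can be sketched on a noise-free signal with two peaks. The peak positions and widths are made up for this example. On real data the derivatives amplify noise, so smoothing would normally come first.

```python
import numpy as np

x = np.linspace(-1, 1, 1000)
y = (np.exp(-(x + 0.4) ** 2 / (2 * 0.05 ** 2))
     + 0.5 * np.exp(-(x - 0.3) ** 2 / (2 * 0.05 ** 2)))

dy = np.gradient(y, x)      # first derivative
d2y = np.gradient(dy, x)    # second derivative

# A maximum: the first derivative changes sign from + to -
# and the second derivative is negative.
crossing = (dy[:-1] > 0) & (dy[1:] <= 0) & (d2y[:-1] < 0)
maxima = x[:-1][crossing]
```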
49
Peak Finding (plot axis: Intensity)
1. Characterize the signal and the noise
2. Make a model of the data
3. Select detection method
4. Select parameters using simulations
50
Peak Finding: Characterizing the noise. Let’s first try without removing the peaks. (plot axis: Intensity)
51
Peak Finding: Characterizing the noise. Removing the peaks by looking for outliers in the root mean square deviation (RMSD). (plot axes: Intensity, RMSD)
52
Peak Finding: Characterizing the peaks Intensity
53
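Peak Finding: Model of data (panels: S/N = 1, S/N = 2, S/N = 4)
points = 1000
x = linspace(-1, 1, points)
y = noise*random.normal(size=len(x))    # noise: standard deviation of the normal noise
y += signal*gaussian(x, 0, 0.01)        # signal: height of the Gaussian peak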
54
Peak Finding: Detection method (panels: S/N = 1, S/N = 2, S/N = 4). Peaks can be detected by finding maxima in the moving average with a window size similar to the peak width.
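Moving-average detection can be tried on the data model from the previous slide. This is a sketch under assumptions: S/N is taken to be the peak height divided by the noise standard deviation, the window half-width equals the peak sigma in points, and the seed and variable names are chosen for this example.

```python
import numpy as np

def gaussian(x, x0, s):
    return np.exp(-(x - x0) ** 2 / (2 * s ** 2))

rng = np.random.default_rng(5)
points = 1000
x = np.linspace(-1, 1, points)
noise, signal = 1.0, 4.0                     # S/N = 4
y = noise * rng.normal(size=points) + signal * gaussian(x, 0, 0.01)

width = 5                                    # half-width in points, about one peak sigma
w = np.ones(2 * width + 1)
smooth = np.convolve(w / w.sum(), y, mode="valid")
x_smooth = x[width:-width]                   # x value at each window center

detected_at = float(x_smooth[np.argmax(smooth)])
```

Averaging over a window of about the peak width reduces the noise by roughly the square root of the window length while keeping most of the peak height. At lower S/N the largest maximum is no longer reliably the peak, which is why the slides choose parameters by simulation.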
55
Peak Finding: Detection method – moving average (rows: S/N = 1, S/N = 2, S/N = 4; columns: Signal, Bin size = 5, Bin size = 20, Bin size = 80)
56
Peak Finding: Detection method – RMSD (rows: S/N = 1, S/N = 2, S/N = 4; columns: Signal, Bin size = 5, Bin size = 20, Bin size = 80)
57
Peak Finding: Information about the Peak: centroid (mean), full width at half maximum (FWHM), area, height, maximum, mean, variance, skewness, kurtosis (plot axis: Intensity)
58
Information about a Peak: centroid or mean (equation and peak definition shown as images on the slide). To calculate any of these measures we need to know where the peak starts and ends.
59
Where does a peak start and end?
60
Estimating peptide quantity: peak height, peak area, curve fitting (plot axes: m/z, Intensity)
61
Time dimension (plot axes: m/z, Intensity, Time)
62
Sampling (plot axes: Retention Time, Intensity)
63
Sampling: Acquisition time = 0.05 (figure annotations: 5%)
65
What is the best way to estimate quantity?
Peak height
- resistant to interference
- poor statistics
Peak area
- better statistics
- more sensitive to interference
Curve fitting
- better statistics
- needs to know the peak shape
- slow
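Peak height and peak area can be compared on a noise-free Gaussian, for which the area has the closed form height × sigma × sqrt(2π). The sketch below assumes the peak boundaries are the ends of the array and integrates with an explicitly written trapezoid rule; curve fitting is left out.

```python
import numpy as np

x = np.linspace(-1, 1, 1000)
sigma, height = 0.05, 2.0
y = height * np.exp(-x ** 2 / (2 * sigma ** 2))

# Height: a single point, so it uses only one measurement.
peak_height = float(y.max())

# Area: trapezoid rule over all points, so every measurement contributes.
peak_area = float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))
expected_area = height * sigma * np.sqrt(2 * np.pi)
```

Because the area sums many points, independent noise partly averages out. That matches the "better statistics" entry above, while any interfering signal inside the integration range is summed in as well.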
66
Homework: Background Subtraction Using Smoothing
67
Summary
Fourier transform - transformation to frequency space and back
Signal – how do we detect and characterize signals?
Noise – how do we characterize noise?
Modeling signal and noise
Simulation to select thresholds and parameters
Filters – filtering by low-pass (i.e. smoothing) and high-pass filters (e.g. adaptive background correction)
Detection methods based on moving average and RMSD
Convolution - describes the response of a linear and time-invariant system to an input signal
Cross-correlation is a measure of similarity of two signals
Autocorrelation can be used for finding periodic signals obscured by noise
The dot product can be used to determine how similar two signals are
Coincidence measurements enhance the signal and suppress noise
The quantity associated with a peak – height and area
Sampling – how often do we need to sample a peak to get a good estimate of its area?