Proteomics Informatics – Signal processing I: analysis of mass spectra (Week 3)

Presentation on theme: "Proteomics Informatics – Signal processing I: analysis of mass spectra (Week 3)"— Presentation transcript:

1 Proteomics Informatics – Signal processing I: analysis of mass spectra (Week 3)

2 Example data – MALDI-TOF: peptide intensity vs m/z

3 Example data – ESI-LC-MS/MS: peptide intensity vs m/z vs time (MS); fragment intensity vs m/z (MS/MS)

4 Sine: amplitude and wavelength [Figure labels: a, b, c]

5 Sine and Cosine [Figure labels: a, b, c]

6 Two Frequencies

7 Fourier Transform

8 Fourier Transform of two frequencies, code:
    from numpy import *
    x = 2.0*pi*arange(1000.0)/100000.0
    sin1 = sin(1000.0*x)
    sin2 = 0.2*sin(10000.0*x)
    sin12 = sin1+sin2
    fft12 = fft.rfft(sin12)
    [Plot axis: Frequency]
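To read the two frequencies off the transform, one option is to pair each bin with its frequency using `rfftfreq`. The sketch below rebuilds the same signal; the unit sample spacing (`d=1.0`) and the amplitude scaling `2/N` are choices made here for illustration, not taken from the slides.

```python
import numpy as np

n = 1000
x = 2.0 * np.pi * np.arange(n) / 100000.0
sin12 = np.sin(1000.0 * x) + 0.2 * np.sin(10000.0 * x)

spectrum = np.fft.rfft(sin12)
# Scale so that a pure sine of amplitude A gives a bin magnitude of A.
amplitude = 2.0 * np.abs(spectrum) / n
# Frequency axis in cycles per sample (d=1.0 is an assumed unit spacing).
freqs = np.fft.rfftfreq(n, d=1.0)

# Indices of the two largest bins, in increasing frequency order.
top_bins = np.sort(np.argsort(amplitude)[-2:])
for k in top_bins:
    print(f"bin {k}: {freqs[k]:.3f} cycles/sample, amplitude {amplitude[k]:.3f}")
```

Because both sines complete a whole number of cycles in the 1000 samples, each one falls in a single bin with no leakage into its neighbors.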

9 Inverse Fourier Transform Frequency

10 Inverse Fourier Transform, code:
    from numpy import *
    x = 2.0*pi*arange(1000.0)/100000.0
    sin1 = sin(1000.0*x)
    sin2 = 0.2*sin(10000.0*x)
    sin12 = sin1+sin2
    fft12 = fft.rfft(sin12)
    sin12_ = fft.irfft(fft12,len(sin12))
    [Plot axis: Frequency]

11 Inverse Fourier Transform Frequency

12 A Peak: centroid, full width at half maximum (FWHM), area, height, maximum, mean, variance, skewness, kurtosis [Plot axis: Intensity]

13 Mean and variance Mean Variance A peak is defined by and

14 Skewness and kurtosis Skewness Kurtosis
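As a rough sketch of how these four quantities can be computed for a sampled peak, the code below treats the normalized intensities as weights on the x values. The function name `peak_moments` and the choice to report excess kurtosis (kurtosis minus 3) are assumptions made for this example.

```python
import numpy as np

def peak_moments(x, y):
    """Intensity-weighted mean, variance, skewness and excess kurtosis."""
    w = y / y.sum()                      # normalize intensities to weights
    mean = np.sum(w * x)
    var = np.sum(w * (x - mean) ** 2)
    sd = np.sqrt(var)
    skewness = np.sum(w * (x - mean) ** 3) / sd ** 3
    kurtosis = np.sum(w * (x - mean) ** 4) / sd ** 4 - 3.0  # excess kurtosis
    return mean, var, skewness, kurtosis

x = np.linspace(-1, 1, 1000)
y = np.exp(-(x - 0.0) ** 2 / (2 * 0.1 ** 2))   # Gaussian peak, s = 0.1
mean, var, skewness, kurtosis = peak_moments(x, y)
```

For the Gaussian test peak the variance comes out close to s squared, and both skewness and excess kurtosis are close to zero.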

15 A Gaussian Peak, code:
    def gaussian(x,x0,s):
        return exp(-(x-x0)**2/(2*s**2))
    x = linspace(-1,1,1000)
    y = gaussian(x,0,0.1)
    ffty = fft.rfft(y)
    [Plot axis: Frequency]

16 A Gaussian Peak: skewness = 0, kurtosis = 0 (excess kurtosis, i.e. measured relative to the normal distribution's value of 3) [Plot axis: Frequency]

17 Peak with a longer tail Frequency

18 A skewed peak, code:
    from scipy.special import erf   # erf is not part of numpy
    def pdf(x):
        return 1/sqrt(2*pi) * exp(-x**2/2)
    def cdf(x):
        return (1 + erf(x/sqrt(2))) / 2
    def skew(x,e=0,w=1,a=0):
        t = (x-e) / w
        return 2 / w * pdf(t) * cdf(a*t)
    [Plot axis: Frequency]

19 Normal noise, code:
    x = linspace(-1,1,1000)
    y = 0.2*random.normal(size=len(x))
    If the noise is not normally distributed, try to find a transform that makes it normal.
    [Plot axis: Frequency]

20 Lognormal noise, code:
    x = linspace(-1,1,1000)
    y = 0.2*random.lognormal(size=len(x))
    [Plot axis: Frequency]
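For lognormal noise the transform that makes it normal is the logarithm. A small sketch that checks this by comparing the sample skewness before and after the transform; the seed, the sample size of 10000 and the helper `sample_skewness` are choices made for this example.

```python
import numpy as np

def sample_skewness(v):
    """Third standardized moment of a sample."""
    d = v - v.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

rng = np.random.default_rng(0)          # fixed seed so the run is repeatable
noise = 0.2 * rng.lognormal(size=10000)

skew_raw = sample_skewness(noise)           # strongly right-skewed
skew_log = sample_skewness(np.log(noise))   # close to zero after the transform
```

A skewness near zero is only one symptom of normality; a histogram or a quantile plot of the transformed values is a more complete check.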

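21 Skewed noise (rejection sampling from the skewed peak shape), code:
    x_test = random.uniform(-1.0,1.0,size=10*len(x))   # candidate values
    y = random.uniform(0.0,1.0,size=10*len(x))         # acceptance levels
    yskew = skew(x_test,-0.1,0.2,10)
    yskew = yskew/max(yskew)                            # scale so the maximum is 1
    yn_skew = x_test[y<yskew][:len(x)]                  # keep the accepted candidates
    [Plot axis: Frequency]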

22 Gaussian peak with normal noise Frequency

23 Removing High Frequencies [Plot axis: Frequency]
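One way to remove high frequencies, sketched here under assumptions, is to zero every Fourier bin above a cutoff and transform back. The cutoff bin of 20 and the noise level are illustrative values chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 1000)
clean = np.exp(-x ** 2 / (2 * 0.1 ** 2))          # Gaussian peak, s = 0.1
noisy = clean + 0.2 * rng.normal(size=len(x))     # add normal noise

spectrum = np.fft.rfft(noisy)
cutoff = 20                  # assumed cutoff bin; tune to the peak width
spectrum[cutoff:] = 0.0      # discard everything above the cutoff
filtered = np.fft.irfft(spectrum, len(noisy))

rms_noisy = np.sqrt(np.mean((noisy - clean) ** 2))
rms_filtered = np.sqrt(np.mean((filtered - clean) ** 2))
```

A wide peak has almost all of its spectrum in the lowest bins, so the cutoff removes mostly noise. A narrower peak needs a higher cutoff and therefore keeps more noise.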

24 Convolution http://en.wikipedia.org/wiki/Convolution Convolution describes the response of a linear and time-invariant system to an input signal. The convolution of two signals equals the inverse Fourier transform of the pointwise product of their Fourier transforms.
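The relation between convolution and the pointwise product in frequency space can be checked numerically. In this sketch the two random arrays are arbitrary test inputs; both are zero-padded to the full output length so that the circular convolution implied by the FFT matches the linear one.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=64)
b = rng.normal(size=16)

direct = np.convolve(a, b)                 # full linear convolution
n = len(a) + len(b) - 1                    # zero-pad to avoid wrap-around
via_fft = np.fft.irfft(np.fft.rfft(a, n) * np.fft.rfft(b, n), n)
```

The two results agree to floating-point precision, which `np.allclose(direct, via_fft)` confirms.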

25 Smoothing by convolution

26 Smoothing, code:
    w = ones(2*width+1,'d')
    convolve(w/w.sum(),y,'valid')
    [Plot axes: Frequency, Intensity]
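A self-contained version of this moving-average smoother might look as follows. The function name `moving_average` and the test data are assumptions for this example. Note that `'valid'` mode drops `width` samples at each end.

```python
import numpy as np

def moving_average(y, width):
    """Average over 2*width+1 points; the output is 2*width samples shorter."""
    w = np.ones(2 * width + 1, dtype=float)
    return np.convolve(w / w.sum(), y, mode="valid")

rng = np.random.default_rng(0)
y = 0.2 * rng.normal(size=1000)
smooth = moving_average(y, width=10)
```

Averaging 21 independent noise samples reduces the noise standard deviation by roughly the square root of 21.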

27 Smoothing

28

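29 Adaptive Background Correction (unsharp masking) [Plot labels: Original, Unsharp masking]
    wi = linspace(1,window_len,window_len)
    w = 1 / ( 2*r_[wi[::-1],0,wi] + 1 )
    # 'valid' returns 2*window_len fewer samples than its input, so pad the
    # input to convolve (or trim x) before subtracting
    x_ = x - d*convolve(w/w.sum(),x,'valid')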
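A runnable sketch of the same idea, with the length mismatch handled by padding. Padding with the edge values (`mode="edge"`) is an assumption made here, as are the function name `unsharp_mask` and the test signal.

```python
import numpy as np

def unsharp_mask(x, window_len, d=1.0):
    """Subtract d times a weighted local average from x (same length as x)."""
    wi = np.linspace(1, window_len, window_len)
    w = 1.0 / (2.0 * np.r_[wi[::-1], 0, wi] + 1.0)   # weights peak at the center
    # Edge padding is an assumption; it keeps the output aligned with x.
    padded = np.pad(x, window_len, mode="edge")
    background = np.convolve(w / w.sum(), padded, mode="valid")
    return x - d * background

t = np.linspace(-1, 1, 1000)
signal = np.exp(-t ** 2 / (2 * 0.01 ** 2)) + 2.0     # narrow peak on an offset
corrected = unsharp_mask(signal, window_len=100)
```

With `d=1.0` a constant offset is removed completely. The narrow peak survives, at reduced height, because part of it is counted in its own local average.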

30 Adaptive Background Correction

31 Smoothing and Adaptive Background Correction

32 Savitzky-Golay smoothing [Panels: Polynomial order = 3, 5, 7; Bin size = 25, 75, 150]
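Savitzky-Golay smoothing fits a polynomial by least squares in each window and keeps the fitted value at the window center. The NumPy-only sketch below derives the window weights from a pseudo-inverse. The function names are assumptions, and the window is given as a half-width so that it always has an odd number of points.

```python
import numpy as np

def savgol_coeffs(half_width, order):
    """Weights that evaluate a least-squares polynomial fit at the window center."""
    z = np.arange(-half_width, half_width + 1, dtype=float)
    A = np.vander(z, order + 1, increasing=True)   # columns: 1, z, z**2, ...
    return np.linalg.pinv(A)[0]                    # row for the constant term

def savgol_smooth(y, half_width, order):
    """Smooth y; the output is 2*half_width samples shorter than the input."""
    c = savgol_coeffs(half_width, order)
    return np.convolve(y, c[::-1], mode="valid")

t = np.linspace(-1, 1, 201)
cubic = 1.0 - 2.0 * t + 0.5 * t ** 3
smoothed = savgol_smooth(cubic, half_width=12, order=3)   # 25-point window
```

A cubic passes through an order-3 filter unchanged, which is a convenient sanity check. Larger windows smooth more, while higher orders follow narrow peaks more closely.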

33 Background Frequency

34 Background Subtraction Using Smoothing [Panels: Bin size = 100, 200, 300; rows: Smoothing, Background subtraction]

35 Root Mean Square Deviation (RMSD) The Root Mean Square Deviation (RMSD) is often constant for the noise and larger for the peak if the window size is approximately the size of the peak.
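A moving-window RMSD can be sketched with two moving averages, one of the signal and one of its square. The function name `rolling_rmsd`, the noise level and the peak height are assumptions for this example.

```python
import numpy as np

def rolling_rmsd(y, width):
    """Root mean square deviation from the local mean in a 2*width+1 window."""
    w = np.ones(2 * width + 1) / (2 * width + 1)
    mean = np.convolve(y, w, mode="valid")
    mean_sq = np.convolve(y ** 2, w, mode="valid")
    # Clip tiny negative values caused by floating-point round-off.
    return np.sqrt(np.clip(mean_sq - mean ** 2, 0.0, None))

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 1000)
y = 0.2 * rng.normal(size=len(x)) + 4.0 * np.exp(-x ** 2 / (2 * 0.01 ** 2))

width = 15                               # window of 31 points, about the peak width
rmsd = rolling_rmsd(y, width)
x_center = x[width:-width]               # x positions of the window centers
```

Away from the peak the RMSD stays near the noise level of 0.2. It rises sharply where the window covers the peak.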

36 Background Subtraction using RMSD [Panels: Bin size = 100, 200, 300; plot axes: RMSD, Intensity]

37 Convolution, Cross-correlation, and Autocorrelation http://en.wikipedia.org/wiki/Convolution Convolution describes the response of a linear and time-invariant system to an input signal. It equals the inverse Fourier transform of the pointwise product in frequency space. Cross-correlation is a measure of similarity of two signals. It can be used for finding a shift between two signals. Autocorrelation is the cross-correlation of a signal with itself. It can be used for finding periodic signals obscured by noise.
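Finding a shift with cross-correlation can be sketched as follows: the lag at which the cross-correlation peaks is the estimated shift. The peak list, the shift of 7 samples and the noise level are made-up values for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = np.arange(500)
centers, heights = [100, 230, 380], [1.0, 0.6, 0.8]   # made-up peak list
signal = sum(h * np.exp(-(n - c) ** 2 / (2 * 2.0 ** 2))
             for c, h in zip(centers, heights))

true_shift = 7
shifted = np.roll(signal, true_shift) + 0.02 * rng.normal(size=len(n))

# 'full' mode returns lags from -(len(signal) - 1) to len(shifted) - 1.
xcorr = np.correlate(shifted, signal, mode="full")
lags = np.arange(-(len(signal) - 1), len(shifted))
estimated_shift = lags[np.argmax(xcorr)]
```

Passing the same array twice, `np.correlate(signal, signal, mode="full")`, gives the autocorrelation. A periodic component then shows up as regularly spaced maxima away from zero lag.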

38 Cross-correlation and autocorrelation http://en.wikipedia.org/wiki/Convolution

39 Autocorrelation Signal Same signal

40 Cross-correlation Signal Shifted signal

41 Cross-correlation Signal Half of the peaks shifted

42 How similar are two signals? Dot product Identical vectors: Perpendicular vectors: The dot product is the same as the cross-correlation at zero lag:
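A small sketch of the dot product as a similarity score. Scaling each vector to unit length is an assumption made here so that identical vectors score 1 and perpendicular vectors score 0; the vectors themselves are arbitrary.

```python
import numpy as np

def similarity(a, b):
    """Dot product of a and b after scaling each to unit length."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 0.0, 3.0])
b = np.array([0.0, 0.0, 5.0, 0.0])      # shares no non-zero positions with a

same = similarity(a, a)                 # identical vectors
perpendicular = similarity(a, b)        # perpendicular vectors
# For equal-length inputs, 'valid' cross-correlation has a single zero-lag term.
zero_lag = np.correlate(a, b + 1.0, mode="valid")[0]
```

The last line shows the link to cross-correlation: for two arrays of equal length, the zero-lag term is the plain, unnormalized dot product.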

43 What are the characteristics of the dot product? [Figure: Signal+Noise vs Noise; S/N = 10, 3, 1, 0.3, 0.1; Dimensions = 10, 100, 1000]

44 Autocorrelation Signal Shifted signal Sum of signal and shifted signal

45 Coincidence – enhances the signal The signal to noise can be dramatically increased by measuring several independent signals of the same phenomenon and combining these signals. Ideal signal Product of the four measurements Four measurements
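The effect of combining measurements can be simulated by multiplying four noisy copies of the same ideal signal. The noise level, the seed and the helper `peak_to_noise` are assumptions for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 1000)
ideal = np.exp(-x ** 2 / (2 * 0.01 ** 2))              # ideal signal
# Four independent measurements of the same signal (noise level is assumed).
measurements = ideal + 0.2 * rng.normal(size=(4, len(x)))
product = np.prod(measurements, axis=0)

def peak_to_noise(y):
    """Value at the known peak position divided by the baseline spread."""
    baseline = y[np.abs(x) > 0.2]
    return y[np.argmax(ideal)] / baseline.std()

sn_single = peak_to_noise(measurements[0])
sn_product = peak_to_noise(product)
```

Away from the peak the product multiplies four small, independent noise values, so the baseline collapses toward zero. At the peak all four factors are close to the signal height.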

46 Coincidence – suppresses and transforms the noise [Panels: Original noise, Noise in product]

47 Coincidence – suppresses interference [Panels: Ideal signal, Four measurements with interference, Product of the four measurements]

48 Peak Finding The derivative of a function is zero at its minima and maxima. The second derivative is negative at maxima and positive at minima.
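On sampled data the derivative can be approximated by the first difference, and a maximum is where that difference changes sign from positive to negative. A minimal sketch, with the function name `local_maxima` and the `min_height` filter added as assumptions:

```python
import numpy as np

def local_maxima(y, min_height=0.0):
    """Indices where the first difference changes sign from + to - (or zero)."""
    d = np.diff(y)
    idx = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    return idx[y[idx] >= min_height]

x = np.linspace(0, 10, 1001)
y = np.exp(-(x - 3) ** 2 / (2 * 0.5 ** 2)) + 0.5 * np.exp(-(x - 7) ** 2 / (2 * 0.5 ** 2))
peaks = local_maxima(y, min_height=0.1)
```

On noisy data every noise wiggle is also a local maximum, so this test is normally applied after smoothing and combined with a height threshold.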

49 Peak Finding
    1. Characterize the signal and the noise
    2. Make a model of the data
    3. Select detection method
    4. Select parameters using simulations
    [Plot axis: Intensity]
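Step 4 can be sketched as follows: simulate many noise-only spectra, record the largest value the detector sees in each, and set the threshold at a high quantile of those maxima. The noise model, window width, number of simulations and 99% quantile are all assumptions chosen for illustration.

```python
import numpy as np

def smoothed_max(y, width):
    """Largest value of the moving average over 2*width+1 points."""
    w = np.ones(2 * width + 1) / (2 * width + 1)
    return np.convolve(y, w, mode="valid").max()

rng = np.random.default_rng(0)
points, noise, width = 1000, 0.2, 5      # assumed model parameters

# Step 4: simulate noise-only spectra and record the largest smoothed value.
null_max = np.array([smoothed_max(noise * rng.normal(size=points), width)
                     for _ in range(500)])
threshold = np.quantile(null_max, 0.99)  # about 1% false positives per spectrum

# Check the threshold on fresh noise-only spectra.
fresh = np.array([smoothed_max(noise * rng.normal(size=points), width)
                  for _ in range(200)])
false_positive_rate = np.mean(fresh > threshold)
```

The same loop, run on spectra with a simulated peak added, gives the detection rate at that threshold, so window size and threshold can be tuned together.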

50 Peak Finding: Characterizing the noise Intensity Let’s first try without removing the peaks

51 Peak Finding: Characterizing the noise Intensity Removing the peaks by looking for outliers in the root mean square deviation (RMSD) RMSD

52 Peak Finding: Characterizing the peaks Intensity

53 Peak Finding: Model of data [Panels: S/N=1, S/N=2, S/N=4]
    points = 1000
    x = linspace(-1,1,points)
    y = noise*random.normal(size=len(x))
    y += signal*gaussian(x,0,0.01)

54 Peak Finding: Detection method [Panels: S/N=1, S/N=2, S/N=4] Peaks can be detected by finding maxima in the moving average with a window size similar to the peak width.
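A sketch of this detection method on simulated data with S/N = 4. The noise level of 0.2, the seed and the 11-point window are assumptions; the peak width matches the `gaussian(x,0,0.01)` model, which spans about 5 samples per standard deviation on this grid.

```python
import numpy as np

def gaussian(x, x0, s):
    return np.exp(-(x - x0) ** 2 / (2 * s ** 2))

rng = np.random.default_rng(0)
points, noise, signal = 1000, 0.2, 0.8          # S/N = signal / noise = 4
x = np.linspace(-1, 1, points)
y = noise * rng.normal(size=len(x)) + signal * gaussian(x, 0, 0.01)

width = 5                                        # 11 points, about the peak width
w = np.ones(2 * width + 1) / (2 * width + 1)
smooth = np.convolve(y, w, mode="valid")
x_center = x[width:-width]                       # x positions of the window centers
detected_at = x_center[np.argmax(smooth)]
```

A window much narrower than the peak leaves too much noise, and one much wider dilutes the peak. Both make the maximum less reliable.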

55 Peak Finding: Detection method – moving average [Panels: S/N = 1, 2, 4; Signal, Bin size = 5, 20, 80]

56 Peak Finding: Detection method – RMSD [Panels: S/N = 1, 2, 4; Signal, Bin size = 5, 20, 80]

57 Peak Finding: Information about the Peak: centroid (mean), full width at half maximum (FWHM), area, height, maximum, mean, variance, skewness, kurtosis [Plot axis: Intensity]
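Of these measures, the FWHM is the least obvious to compute from samples. One possible sketch locates the two half-height crossings by linear interpolation. The function name `fwhm` is an assumption, and the code assumes a single peak that does not touch either end of the array.

```python
import numpy as np

def fwhm(x, y):
    """Full width at half maximum of a single peak, by linear interpolation."""
    half = y.max() / 2.0
    above = np.where(y >= half)[0]
    i0, i1 = above[0], above[-1]          # first and last samples above half height
    # Interpolate the crossing on each flank (assumes the peak is not at an edge).
    left = np.interp(half, [y[i0 - 1], y[i0]], [x[i0 - 1], x[i0]])
    right = np.interp(half, [y[i1 + 1], y[i1]], [x[i1 + 1], x[i1]])
    return right - left

x = np.linspace(-1, 1, 1000)
s = 0.1
y = np.exp(-x ** 2 / (2 * s ** 2))
width = fwhm(x, y)
expected = 2.0 * np.sqrt(2.0 * np.log(2.0)) * s   # Gaussian: FWHM = 2*sqrt(2 ln 2)*s
```

For a Gaussian the result should be close to 2.355 times the standard deviation.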

58 Information about a Peak Centroid or mean A peak is defined by To calculate any of these measures we need to know where the peak starts and ends.

59 Where does a peak start and end?

60 Estimating peptide quantity: peak height, peak area, curve fitting [Plot axes: m/z, Intensity]

61 Time dimension [Plot axes: m/z, Intensity, Time]

62 Sampling Retention Time Intensity

63 5% Acquisition time = 0.05  5% Sampling

64

65 What is the best way to estimate quantity?
    Peak height: resistant to interference, poor statistics
    Peak area: better statistics, more sensitive to interference
    Curve fitting: better statistics, needs to know the peak shape, slow
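The three estimates can be compared on a noise-free Gaussian, where the true values are known. This is only a sketch. The peak parameters are made up, and the curve fit uses the fact that the logarithm of a Gaussian is a parabola, which is a simple stand-in for a general nonlinear fit and does not handle noisy data well without weighting.

```python
import numpy as np

x = np.linspace(-1, 1, 1000)
s, h = 0.1, 3.0                                   # assumed peak width and height
y = h * np.exp(-x ** 2 / (2 * s ** 2))

height = y.max()
# Trapezoidal rule for the area under the sampled peak.
area = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))
expected_area = h * s * np.sqrt(2.0 * np.pi)      # analytic area of a Gaussian

# Curve fitting sketch: log of a Gaussian is a parabola, so fit a quadratic.
mask = y > 0.1 * height
c2, c1, c0 = np.polyfit(x[mask], np.log(y[mask]), 2)
s_fit = np.sqrt(-1.0 / (2.0 * c2))
x0_fit = -c1 / (2.0 * c2)
h_fit = np.exp(c0 - c1 ** 2 / (4.0 * c2))
```

The height uses a single sample and the area uses all of them, which is why the area has better statistics but also picks up any interfering signal under the peak.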

66 Homework: Background Subtraction Using Smoothing

67 Summary
    Fourier transform: transformation to frequency space and back
    Signal: how do we detect and characterize signals?
    Noise: how do we characterize noise?
    Modeling signal and noise
    Simulation to select thresholds and parameters
    Filters: filtering with low-pass (i.e., smoothing) and high-pass (e.g., adaptive background correction) filters
    Detection methods based on moving average and RMSD
    Convolution describes the response of a linear and time-invariant system to an input signal
    Cross-correlation is a measure of similarity of two signals
    Autocorrelation can be used for finding periodic signals obscured by noise
    The dot product can be used to determine how similar two signals are
    Coincidence measurements enhance the signal and suppress the noise
    The quantity associated with a peak: height and area
    Sampling: how often do we need to sample a peak to get a good estimate of its area?

68 Proteomics Informatics – Signal processing I: analysis of mass spectra (Week 3)

