Information-Theoretic Mass Spectral Library Search. Arvind Visvanathan. CSCE 990 Seminar in Multi-Dimensional Chromatography Systems, Informatics, and Applications.

Presentation transcript:

1 Information-Theoretic Mass Spectral Library Search
Arvind Visvanathan
CSCE 990 Seminar in Multi-Dimensional Chromatography Systems, Informatics, and Applications
Running footer: Information-Theoretic Mass Spectral Library Search, CSCE 990 – GCxGC Seminar
Outline: Introduction, Related Work, Method, Results and Discussion

2 Outline
Introduction
  –Mass spectrum search types
Related Work
  –Other techniques: NIST, PBM, DotMap
Method
  –Probability and Information
  –Normalized distribution function
Results
Conclusion

3 Introduction – Mass Spectrum
Section topics: Mass Spectrum, Search Algorithm, Search Types, Applications
[Figure: mass spectrum of decane, intensity plotted against m/z]

4 Introduction – Mass Spectrum Search
[Diagram: the unknown spectrum and the MS library are inputs to the search algorithm, which outputs potential matches]

5 Introduction – Search Types
Identity search
  –Unknown mass spectrum present in library
  –Looking for exact spectrum
Similarity search
  –Unknown mass spectrum not present in library
  –Looking for similar spectrum

6 Introduction – MS Search Applications
Steroid detection in athletes
Monitor patient breath during surgery
Composition of molecular species found in space
Honey adulterated with corn syrup
Locate oil deposits
Monitor fermentation process in the biotechnology industry
Detect dioxins in contaminated fish

7 Related Work – NIST MS-Search [Stein ‘94]
Section topics: MS Search, Probability Based Matching, DotMap
Pre-search the unknown spectrum in the library
  –Reduce search domain (160K → 4K compounds)
Compute match factor for each compound in the pre-search result
Match Factor (MF)
  –Range
  –Higher is better
Pre-search result sorted by MF value
Pick the topmost compounds as possible matches

8 Related Work – NIST MS-Search [Stein ‘94]
Match Factor Computation [Stein ‘94]
  –Term 1 (F1): mass-weighted normalized dot product
  –Term 2 (F2): relative intensities of adjacent peaks in both spectra
  –Combination of F1 & F2
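As a rough illustration of Term 1, the sketch below computes a mass-weighted normalized dot product between two spectra. The slide does not give the weighting, so the exponents `mass_power` and `intensity_power`, the dict-based spectrum representation, and the function name are all assumptions made for illustration; this is not the NIST implementation, and Term 2 is not covered.

```python
import math

def weighted_dot_product(unknown, library, mass_power=1.0, intensity_power=0.5):
    """Cosine similarity of two spectra after mass weighting.

    Spectra are dicts mapping m/z -> intensity. The exponents are
    illustrative assumptions, not values taken from the slides.
    """
    mzs = set(unknown) | set(library)
    # Weight each peak by (m/z)^a * intensity^b; missing peaks count as 0.
    wu = {mz: (mz ** mass_power) * (unknown.get(mz, 0.0) ** intensity_power) for mz in mzs}
    wl = {mz: (mz ** mass_power) * (library.get(mz, 0.0) ** intensity_power) for mz in mzs}
    dot = sum(wu[mz] * wl[mz] for mz in mzs)
    norm = math.sqrt(sum(v * v for v in wu.values())) * math.sqrt(sum(v * v for v in wl.values()))
    return dot / norm if norm > 0 else 0.0
```

Identical spectra score close to 1 and spectra with no shared peaks score 0; a scale factor could be applied to map the result onto whatever range the match factor uses.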

9 Related Work – NIST MS-Search [Stein ‘94]
[Tables: m/z and intensity values for two example spectra, C-1 and C-2]
[Table: F1, F2, and MF for "Compare C-1 & C-1" and "Compare C-1 & C-2"; surviving values are F1 = 999 and MF = 999, 925]

10 Related Work – Probability Based Matching [McLafferty et al. ‘75]
Confidence Value (K) instead of MF
Four components for each m/z
  –Term 1: U: based on the uniqueness of an m/z value
  –Term 2: A: intensity contribution to the confidence
  –Term 3: W: window factor (measure of agreement)
  –Term 4: D: dilution factor (measure of purity)
  –K → ∑ (U + A + W – D) for each m/z

11 Related Work – DotMap [Synovec et al. ‘04]
[Figure: DotMap examples for fumaric acid, adipic acid, and lactic acid]

12 Related Work – DotMap [Synovec et al. ‘04]
Inverse problem
DotMap computed across the image
Higher-valued areas indicate presence of the compound of interest
Multiple compounds of interest
  –Compute DotMap overlay
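The idea of computing a DotMap across the image can be sketched as follows: take the spectrum of the compound of interest and compute a dot product against the spectrum stored at every pixel of the GCxGC image. This is a minimal sketch only. The nested-list image layout, the cosine normalization, and the names `normalized_dot` and `dot_map` are assumptions for illustration, not details from the slides or the original DotMap paper.

```python
import math

def normalized_dot(a, b):
    """Normalized dot product of two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def dot_map(image, target):
    """Score every pixel spectrum against the target spectrum.

    image: 2D list of pixels, each pixel a list of intensities indexed by m/z bin.
    target: list of intensities for the compound of interest (same binning).
    Returns a 2D list of scores; higher values suggest the compound is present.
    """
    return [[normalized_dot(pixel, target) for pixel in row] for row in image]
```

For several compounds of interest, one map per target could be computed and the maps combined into an overlay; how the overlay is formed is not specified on the slide.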

13 Related Work – DotMap [Synovec et al. ‘04]

14 Related Work – DotMap [Synovec et al. ‘04]

15 Method – Motivation
Section topics: Motivation, Probability & Entropy, Distribution Function, Match Factor
NIST MS-Search [Stein ‘94]
  –No domain information utilized
PBM [McLafferty et al. ‘75]
  –Old technique (‘75)
  –Ad hoc domain information utilization
DotMap
  –No domain information utilized

16 Method – Entropy
Entropy-based approach
  –Entropy → measure of the amount of uncertainty
  –Based on probabilities
Include domain-based knowledge (information) in computing the match factor
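For readers unfamiliar with the terms, the sketch below shows the standard Shannon definitions of self-information and entropy computed from probabilities. It only illustrates the general concepts named on this slide; how the talk combines them into a match factor is not stated here, and the function names are illustrative.

```python
import math

def self_information(p):
    """Information content, in bits, of an event with probability p (0 < p <= 1)."""
    return -math.log2(p)

def entropy(probabilities):
    """Shannon entropy, in bits, of a discrete distribution.

    Zero-probability outcomes contribute nothing to the sum.
    """
    return -sum(p * math.log2(p) for p in probabilities if p > 0)
```

A rare event (small p) carries more information than a common one, which is the intuition behind weighting unusual peaks more heavily than peaks shared by most library compounds.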

17 Method – Distribution Function
Library
  –NIST EPA Library
  –163K compounds
Compute distribution function (DF)
  –2-dimensional array, m/z vs intensity
  –DF[i][j] = number of compounds in the library with intensity j at m/z = i
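A minimal sketch of building such a distribution function from a library is given below. It assumes each library spectrum is a dict mapping integer m/z to an integer intensity, and that a compound with no peak at a given m/z is counted under intensity 0 (which is consistent with each m/z row summing to the library size). The name `build_df` and the bounds `max_mz` and `max_intensity` are hypothetical.

```python
def build_df(library, max_mz, max_intensity):
    """Count, for every (m/z, intensity) pair, how many library compounds have it.

    library: list of spectra, each a dict {m/z (int): intensity (int)}.
    Returns df where df[i][j] is the number of compounds with intensity j at m/z i.
    """
    df = [[0] * (max_intensity + 1) for _ in range(max_mz + 1)]
    for spectrum in library:
        for mz in range(max_mz + 1):
            # Assumption: an absent peak is treated as intensity 0.
            df[mz][spectrum.get(mz, 0)] += 1
    return df
```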

18 Method – Distribution Function
[Figure: distribution function plotted over m/z and intensity]

19 Method – Normalized Distribution Function (NDF)
Normalized Distribution Function
  –NDF[mz][int] = DF[mz][int] / ∑_i DF[mz][i]
  –Where ∑_i DF[mz][i] = 163K
  –NDF → probabilities in [0, 1]
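The normalization can be sketched as a row-wise division: each m/z row of DF is divided by its own sum, so that the row becomes a probability distribution over intensity values. The function name `normalize_df` and the list-of-lists layout are assumptions for illustration.

```python
def normalize_df(df):
    """Turn counts df[mz][intensity] into probabilities ndf[mz][intensity].

    Each row is divided by its total; an all-zero row is left as zeros.
    """
    ndf = []
    for row in df:
        total = sum(row)
        ndf.append([count / total if total > 0 else 0.0 for count in row])
    return ndf
```

With the library on the slide, every row total would equal the number of compounds (163K), so each entry is the fraction of library compounds that show that intensity at that m/z.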

20 Method – Assumptions
Assumption: each m/z is treated independently when the match factor is computed from the normalized distribution function

21 Method – Match Factor

22 Results – Overview
Section topics: Overview, Additive Noise, Multiplicative Noise, Johnson Colored Noise, Random Spectrum Noise
Technique
  –Compound in library + noise
  –Search noisy compound in library
Evaluation metric: average rank
  –Rank = position of correct compound in hit list
  –Repeat the above 3000 times and take the average rank
Compared with
  –NIST
  –NISTDOT (first term in NIST algorithm)
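The evaluation metric can be sketched as below: for each trial, sort the library hits by score and record the position of the correct compound, then average over all trials. The data layout (a dict of scores per trial, with higher scores ranked first) and the function names are assumptions for illustration.

```python
def rank_of(scores, correct_id):
    """1-based position of the correct compound in the hit list.

    scores: dict {compound id: match score}, higher is better.
    Ties keep the dict's insertion order because sorted() is stable.
    """
    hit_list = sorted(scores, key=scores.get, reverse=True)
    return hit_list.index(correct_id) + 1

def average_rank(trials):
    """Mean rank over trials, each trial being a (scores, correct_id) pair."""
    ranks = [rank_of(scores, correct_id) for scores, correct_id in trials]
    return sum(ranks) / len(ranks)
```

An average rank of 1.0 means the correct compound was always the top hit; larger values mean noise pushed it further down the list.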

23 Results – Noise Models
Additive: A_U = A_L + G(0, σ)
Multiplicative: A_U = A_L + A_L * G(0, σ)
Johnson Colored: A_U = A_L + G(0, σ * √m)
Random spectrum: A_U = A_L + x * A_R
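The four noise models can be sketched with the standard library as follows. Here A_L is the library intensity, A_U the resulting noisy intensity, and G(0, σ) a zero-mean Gaussian sample. The sketch assumes that m in the Johnson model is the m/z value of the peak, that x is a fraction, and that spectra are dicts of m/z to intensity; negative results are not clipped because the slides do not say how they are handled.

```python
import math
import random

def additive(spectrum, sigma):
    """A_U = A_L + G(0, sigma) for each m/z in the library spectrum."""
    return {mz: a + random.gauss(0, sigma) for mz, a in spectrum.items()}

def multiplicative(spectrum, sigma):
    """A_U = A_L + A_L * G(0, sigma)."""
    return {mz: a + a * random.gauss(0, sigma) for mz, a in spectrum.items()}

def johnson_colored(spectrum, sigma):
    """A_U = A_L + G(0, sigma * sqrt(m)), taking m to be the m/z value (assumption)."""
    return {mz: a + random.gauss(0, sigma * math.sqrt(mz)) for mz, a in spectrum.items()}

def random_spectrum(spectrum, other, x):
    """A_U = A_L + x * A_R over every m/z present in either spectrum."""
    mzs = set(spectrum) | set(other)
    return {mz: spectrum.get(mz, 0.0) + x * other.get(mz, 0.0) for mz in mzs}
```

Calling `random.seed(...)` before generating noisy spectra makes an experiment repeatable.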

24 Results – Additive Noise
Compound = Compound + Additive noise
Additive Gaussian noise
  –Zero mean
  –Variable standard deviation
For each m/z in library spectrum: A_U = A_L + G(0, σ)

25 Results – Additive Noise (Example)

26 Results – Additive Noise (Performance)

27 Results – Multiplicative Noise
Compound = Compound + Multiplicative noise
Multiplicative Gaussian noise
  –Zero mean
  –Variable standard deviation
For each m/z in library spectrum: A_U = A_L + A_L * G(0, σ)

28 Results – Multiplicative Noise (Example)

29 Results – Multiplicative Noise (Performance)

30 Results – Johnson Colored Noise
Compound = Compound + Colored noise
Gaussian noise
  –Zero mean
  –Variable standard deviation
For each m/z in library spectrum: A_U = A_L + G(0, σ * √m)

31 Results – Johnson Colored Noise (Example)

32 Results – Johnson Colored Noise (Performance)

33 Results – Random Spectrum Noise
Compound = Compound + Random spectrum
Additive spectrum
  –Add x% of another random spectrum
For each m/z in library or random spectrum
  –A_U = A_L + x * A_R

34 Results – Random Spectrum Noise (Example)

35 Results – Random Spectrum Noise (Performance)

36 Results – Summary of Noise Models
Additive: A_U = A_L + G(0, σ)
Multiplicative: A_U = A_L + A_L * G(0, σ)
Johnson Colored: A_U = A_L + G(0, σ * √m)
Random Spectrum: A_U = A_L + x * A_R

37 Results – Summary of Noise Models

38 Results – Summary of Noise Models

39 Conclusion
MS library search algorithm
Information theoretic
  –Domain knowledge incorporated
Algorithm works well for various noise models
Future work
  –Must improve performance for the random spectrum noise case

40 Questions & Suggestions