Smart Templates for Chemical Identification in GCxGC-MS QingPing Tao 1, Stephen E. Reichenbach 2, Mingtian Ni 3, Arvind Visvanathan 2, Michael Kok 2, Luke.

Presentation transcript:

Smart Templates for Chemical Identification in GCxGC-MS
QingPing Tao 1, Stephen E. Reichenbach 2, Mingtian Ni 3, Arvind Visvanathan 2, Michael Kok 2, Luke L. Waltman 1
1 GC Image, LLC, Lincoln, NE
2 Computer Science & Engineering Dept., University of Nebraska-Lincoln
3 Microsoft Corp., Redmond, WA
Q. Tao, S. E. Reichenbach, et al. Multi-type Templates in GCxGC-MS

Introduction
Peak Template Matching
Smart Templates
Building Smart Multi-type Templates
Conclusions and Future Work

GCxGC
Two independent separations for the entire sample
GCxGC data can be processed as a digital image
Mass spectrometry (MS) data provides rich information for identification

GCxGC
[Figures: GC chromatogram; GCxGC total ion chromatogram]

Objective of GCxGC Analysis
Separate individual peaks from background
Quantify each peak
Identify the peaks for chemicals of interest

Chemical Identification with GCxGC
Automated approaches:
– Library search, such as the NIST Mass Spectral Library
– Rule-based techniques, such as constraint expressions based on retention times or mass spectra
– Pattern matching: markers, chemical retention patterns
Challenges:
– Data inconsistencies: retention time variations
– Data and task complexity: multiple dimensions, many peaks
Solution: a combined approach ("Smart Templates") that pairs peak template matching with constraint expressions

Outline
Peak template matching
Smart Templates
– Computer Language for Identifying Chemicals (CLIC)
– Smart Template: peak template matching with CLIC
Building Smart Multi-type Templates
– Interactive tools for building smart multi-type templates
– Automatically generating CLIC for peaks
– Automatically generating peak sets with markers
Conclusions and Future Work

Peak Template Matching
Identify compounds by matching previously known peaks to unidentified peaks
A peak template is a set of annotated peaks with:
– Computed features, such as peak location (2D retention times)
– Assigned information, such as compound name
A target peak set is a set of unannotated peaks that have only computed features
Peak template matching tries to establish as many correspondences as possible from template peaks to target peaks
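The template and target structures described on this slide could be represented roughly as follows. This is a minimal sketch: the class name `Peak`, the fields `rt1`, `rt2`, and `name`, the helper `apply_template`, and all retention times are assumptions for illustration, not GC Image data structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Peak:
    rt1: float                  # first-column retention time (computed feature)
    rt2: float                  # second-column retention time (computed feature)
    name: Optional[str] = None  # assigned compound name; None for target peaks


def apply_template(matches, template, target):
    """Copy assigned information from matched template peaks to target peaks.

    `matches` maps a template index to the index of its matched target peak.
    """
    for t_idx, g_idx in matches.items():
        target[g_idx].name = template[t_idx].name
    return target


# Hypothetical annotated template and unannotated target peaks.
template = [Peak(10.0, 1.2, "nonanal"), Peak(14.0, 2.0, "decane")]
target = [Peak(14.2, 2.1), Peak(10.1, 1.25)]
annotated = apply_template({0: 1, 1: 0}, template, target)
```

Target peaks that receive no match simply keep `name = None`, which mirrors the idea that only established correspondences carry information across.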

Chemical Identification Process
Construct a peak template:
– Created by a chemist through interactive annotation
Match template and target peaks
Apply the template: assign information from template peaks to matched target peaks

Fundamental Difficulty of Peak Template Matching
Peak pattern distortions:
– Peaks for the same compound may appear at different retention times in different images
Basic solution:
– Search all allowable matches from template peaks to target peaks within retention-time windows
– Find a transformation that maximizes the number of matched peaks
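The basic solution can be illustrated with a small brute-force search. The sketch below makes two simplifying assumptions that the slides do not state: the transformation is restricted to a pure translation, and a template peak counts as matched whenever any target peak falls inside its window (one-to-one assignment is not enforced). The function names and coordinates are hypothetical.

```python
def count_matches(template, target, shift, window):
    """Count template peaks that land within `window` of some target peak."""
    dx, dy = shift
    wx, wy = window
    count = 0
    for tx, ty in template:
        if any(abs(tx + dx - gx) <= wx and abs(ty + dy - gy) <= wy
               for gx, gy in target):
            count += 1
    return count


def best_translation(template, target, window):
    """Try every shift that aligns one template peak with one target peak."""
    candidates = [(gx - tx, gy - ty)
                  for tx, ty in template
                  for gx, gy in target]
    return max(candidates,
               key=lambda s: count_matches(template, target, s, window))


# Hypothetical (first-column, second-column) retention times.
template = [(10.0, 1.0), (20.0, 2.0), (30.0, 1.5)]
target = [(11.0, 1.25), (21.0, 2.25), (31.0, 1.75), (50.0, 3.0)]
shift = best_translation(template, target, window=(0.1, 0.05))
matched = count_matches(template, target, shift, window=(0.1, 0.05))
```

Even in this toy form the number of candidate shifts grows with the product of the template and target sizes, which hints at why searching a richer transformation space becomes expensive.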

Fundamental Difficulty of Peak Template Matching (continued)
Challenges:
– Searching for transformations in a large space is computationally expensive
– False identifications may happen when two target peaks are very close together
A solution: Smart Templates
[Figure: blobs B1, B2, B3 with template peak peak1; the same blobs with template peaks peak1 and peak2]

Smart Templates
Objective:
– Reduce the number of allowable matchings
– Reduce ambiguous matchings
– Reduce human intervention
Approach:
– Add constraints beyond retention time patterns
– Express those constraints in CLIC

Computer Language for Identifying Chemicals (CLIC)
CLIC [Reichenbach et al., 2004] is a language for expressing constraints on GCxGC retention times, peak characteristics, and mass spectral characteristics:
Functions of retention times
Functions of peak characteristics
Functions of mass spectra
Logical and arithmetic operators:
– Comparison operators: <, ≤, >, ≥, =, ≠
– Addition, subtraction, negation (-), and parentheses
– Logical operators: and (&), or (|), and negation (!)

Computer Language for Identifying Chemicals (CLIC)
Functions characterizing chromatographic properties and mass spectral characteristics are key features of CLIC.
– Retention(2) returns the retention time of the peak on the second column.
– Intensity(40) returns the intensity value of the indicated channel (m/z = 40) in the mass spectrum of the peak.
– Ordinal(40) returns the ordinal position of the indicated channel (m/z = 40) in the intensity-ordered multi-channel array of the peak.
– …
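To make the three functions concrete, here is a rough Python analogue. It is not CLIC itself: the dictionary layout of `peak`, the lowercase function names, and the spectrum values are assumptions, and `ordinal` is assumed to be 1-based with 1 denoting the most intense channel.

```python
def retention(peak, column):
    """Retention time of the peak on column 1 or 2."""
    return peak["rt"][column - 1]


def intensity(spectrum, mz):
    """Intensity of channel `mz`; 0.0 if the channel is absent."""
    return spectrum.get(mz, 0.0)


def ordinal(spectrum, mz):
    """1-based rank of channel `mz` when channels are sorted by
    decreasing intensity. Raises ValueError if `mz` is not in the spectrum."""
    ranked = sorted(spectrum, key=spectrum.get, reverse=True)
    return ranked.index(mz) + 1


# Hypothetical peak: retention times (column 1, column 2) and an
# m/z -> intensity mapping.
peak = {
    "rt": (12.5, 1.2),
    "spectrum": {40: 50.0, 57: 999.0, 71: 640.0, 85: 410.0},
}
```

With this data, `ordinal(peak["spectrum"], 57)` ranks m/z 57 first because it carries the highest intensity, while m/z 40 ranks last.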

Example CLIC Expressions
Alkanes
Criteria (Welthagen et al. 2003): base peak 57 or 71 and second-largest peak 71 or 57. No time rule is needed for this group, but a retention window of 1.0 to 1.5 s can be used.
CLIC expression:
– (Ordinal(57) ≤ 2) & (Ordinal(71) ≤ 2) & (Retention(2) ≥ 1.0) & (Retention(2) ≤ 1.5)
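The alkane expression translates almost directly into a Boolean predicate. The sketch below is a Python paraphrase rather than CLIC; the helper `ordinal` (assumed 1-based, with absent channels ranked after all present ones), the function `is_alkane`, and both example spectra are made up for illustration.

```python
def ordinal(spectrum, mz):
    """1-based intensity rank of channel `mz`; absent channels rank last."""
    ranked = sorted(spectrum, key=spectrum.get, reverse=True)
    return ranked.index(mz) + 1 if mz in spectrum else len(ranked) + 1


def is_alkane(rt2, spectrum):
    """(Ordinal(57) <= 2) & (Ordinal(71) <= 2) & (1.0 <= Retention(2) <= 1.5)."""
    return (ordinal(spectrum, 57) <= 2
            and ordinal(spectrum, 71) <= 2
            and 1.0 <= rt2 <= 1.5)


# Hypothetical spectra (m/z -> intensity).
alkane_like = {43: 700.0, 57: 999.0, 71: 800.0, 85: 500.0}
other = {43: 999.0, 57: 300.0, 71: 100.0}
```

Requiring both ordinals to be at most 2 covers either ordering of 57 and 71 as base peak and second-largest peak, which is why the criteria need only two rank tests.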

Peak Template Matching with CLIC
Each peak in the template has an optional CLIC expression
– First, find a set of possible matches by using the retention time pattern
– Then, prune the matching space based on CLIC expressions
– Finally, search for the best matching
The combination of chromatographic retention time patterns and CLIC expressions can improve both the speed and the accuracy of automated chemical identification for GCxGC
We call this combination a Smart Template
[Figure: blobs B1, B2, B3 with template peak peak1]
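The first two steps (window search, then pruning) might look like the following sketch. Everything here is hypothetical: the dictionary fields, the precomputed `match` score standing in for a spectral similarity, and the use of a Python callable in place of a parsed CLIC expression.

```python
def candidates_in_window(template_peak, targets, window):
    """Step 1: target peaks whose retention times fall inside the window."""
    wx, wy = window
    return [g for g in targets
            if abs(g["rt1"] - template_peak["rt1"]) <= wx
            and abs(g["rt2"] - template_peak["rt2"]) <= wy]


def prune(template_peak, candidates):
    """Step 2: keep candidates that satisfy the optional constraint."""
    rule = template_peak.get("clic")
    return [g for g in candidates if rule is None or rule(g)]


# Hypothetical template peak with a constraint on a precomputed score.
nonanal = {"name": "nonanal", "rt1": 20.0, "rt2": 1.5,
           "clic": lambda g: g["match"] > 800}
targets = [
    {"id": "B1", "rt1": 20.2, "rt2": 1.5, "match": 1000},
    {"id": "B2", "rt1": 19.9, "rt2": 1.6, "match": 600},
    {"id": "B3", "rt1": 35.0, "rt2": 2.5, "match": 400},
]
by_window = candidates_in_window(nonanal, targets, window=(0.5, 0.2))
by_clic = prune(nonanal, by_window)
```

Here the window alone leaves two nearby candidates, and the constraint removes the ambiguous one before the final search for the best matching runs.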

Building Smart Multi-type Templates
Smart Templates have been implemented in GC Image® software.
Interactive tools for building multi-type templates:
– Peaks, groups, graphics, text objects, and chemical structures
Interactive tools for building CLIC expressions:
– A calculator-like GUI for creating and testing CLIC expressions
– An interface for specifying a CLIC expression for each template peak
Interfaces to match and apply templates

Build Multi-type Template
Construct and edit a template with peaks and other structures:

Build Smart Template: Example with Grob Mix
1) Build a template from the original image

Build Smart Template: Example with Grob Mix
2) Wrong match because of peak pattern distortions:

Build Smart Template: Example with Grob Mix
3) Smart Template solution:
a) Create a CLIC expression for nonanal

Build Smart Template: Example with Grob Mix
b) Specify the CLIC expression for the nonanal template peak:

Build Smart Template: Example with Grob Mix
4) Matching with the Smart Template:

Auto CLIC
Automatically generate CLIC expressions for peaks:
CLIC function: Match()
– The NIST matching function returns the similarity between two mass spectra
– The value range is [0, 1000]
– The higher the value, the more similar the two mass spectra
The CLIC expression Match(“ ”) > threshold can be used to identify chemicals.
– The mass spectrum of the peak is
– Automatically generate the threshold

Auto CLIC
Automatically generate the matching threshold:
– The similarities between this peak and all other peaks should be less than the threshold
– The threshold should have a large margin from both this peak's own score and the scores of all other peaks
– For example, peak 1 represents blob B1:
  B1: Match(“ ”) = 1000
  B2: Match(“ ”) = 600
  B3: Match(“ ”) = 400
  Resulting expression: Match(“ ”) > 800
[Figure: blobs B1, B2, B3 with template peak peak1]
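One rule that reproduces the slide's example is to place the threshold midway between the peak's own score and the highest score of any other peak. The slides do not state the exact rule, so the midpoint choice and the function name `auto_threshold` are assumptions in this sketch.

```python
def auto_threshold(own_score, other_scores):
    """Midpoint between the peak's own score and the best competing score.

    The midpoint gives equal margins on both sides; this rule is an
    assumption that happens to reproduce the example on the slide.
    """
    best_other = max(other_scores)
    if best_other >= own_score:
        raise ValueError("no threshold separates this peak from the others")
    return (own_score + best_other) / 2


# Scores from the slide: B1 = 1000 (the peak itself), B2 = 600, B3 = 400.
threshold = auto_threshold(1000, [600, 400])
```

With the slide's scores the midpoint between 1000 and 600 is 800, matching the expression Match(“ ”) > 800.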

Auto CLIC
1) Set the Auto CLIC function

Auto CLIC
2) Generate CLIC

Auto chemical grouping with markers
Group peaks based on their retention time patterns:
– Build natural clusters of peaks
– Each cluster is represented by a polygon
– A template may consist of a set of groups (polygons)

Auto chemical grouping with markers
[Figure: 29 clusters created by a single-linkage hierarchical clustering algorithm]
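Single-linkage clustering merges the two clusters whose closest members are nearest to each other. The naive sketch below stops merging once that distance exceeds a cutoff; the cutoff `max_gap`, the use of unscaled Euclidean distance on the two retention axes, and the sample coordinates are all assumptions, since the slides do not describe the stopping rule or distance measure.

```python
import math


def single_linkage(points, max_gap):
    """Agglomerative clustering with the single-linkage criterion.

    Repeatedly merges the pair of clusters whose closest points are
    nearest, until that distance exceeds `max_gap`.
    """
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_gap:
            break
        clusters[i].extend(clusters.pop(j))
    return clusters


# Hypothetical peak locations (first-column, second-column retention).
peaks = [(1.0, 1.0), (1.5, 1.0), (2.0, 1.0), (8.0, 3.0), (8.5, 3.0)]
groups = single_linkage(peaks, max_gap=1.0)
```

In practice the two retention axes have very different scales, so some normalization would likely be needed before computing distances.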

Auto chemical grouping with markers
Automatically generate markers for groups (polygons):
– Template matching is based on peaks
– Each group needs markers
  A marker is a significant peak in the group
  Markers are at evenly distributed locations to represent the retention time pattern of the group
– Auto marker:
  Cluster all peaks in a group
  Pick the largest peak in each cluster as a marker
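The second Auto Marker step reduces to a per-cluster maximum. In this sketch "largest" is interpreted as largest peak volume, which is an assumption, as are the field names and values; the clusters are taken as already computed.

```python
def pick_markers(clusters):
    """Return the peak with the largest volume from each cluster."""
    return [max(cluster, key=lambda p: p["volume"]) for cluster in clusters]


# Hypothetical sub-clusters of one group, each a list of peaks.
clusters = [
    [{"id": "p1", "volume": 120.0}, {"id": "p2", "volume": 900.0}],
    [{"id": "p3", "volume": 300.0}],
    [{"id": "p4", "volume": 50.0}, {"id": "p5", "volume": 75.0}],
]
markers = [p["id"] for p in pick_markers(clusters)]
```

Because each cluster contributes exactly one marker, the markers end up spread across the group, which is consistent with the slide's goal of evenly distributed marker locations.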

Auto chemical grouping with markers
1) Set the Auto Marker function

Auto chemical grouping with markers
2) Generate markers


Conclusions and Future Work
Smart Template matching can be used to automatically identify chemicals in GCxGC-MS data
The combination of retention time patterns and constraints greatly reduces ambiguity and speeds processing
Future work:
– Enhanced GUI
– More CLIC functions
– Semi-automated or automated template construction; Auto CLIC and Auto Marker are only the first two steps

Current work is supported by NSF and NIH
Questions?
Software:
Licensing: