A novel interactive tool for multidimensional biological data analysis Zhaowen Luo, Xuliang Jiang Serono Research Institute, Inc.


2 Outline
1. Introduction
2. Methods
3. Applications and Examples
   - H2L decision making
   - Multiple-kinase inhibitor analysis

3 1. Introduction

4 Data representation in drug discovery
1. Two-dimensional view: chemistry vs. biology
   - Chemistry: different compound structures
   - Biology: different assay data (potency, selectivity profiles, ADME/PK, and toxicity)
2. A heat map is a graphical representation of data where the values taken by a variable in a two-dimensional map are represented as colors (Wikipedia).
A heat map meets the data-representation needs of drug discovery and could be a good decision support tool.

5 My first heat map
A 200 x 200 map
- Nice picture
- Some interesting patterns
- But what is that???

6 A heat map is not enough
1. Lack of interactivity
   - Difficult to retrieve information
   - Unable to display related information, such as structures
2. Static
   - Unable to manipulate data
   - Unable to do real-time analysis
Solution: a fully interactive heat map application. Only an application can satisfy all the needs of decision support.

7 2. Methods

8 An interactive heat map application

9 Methods

10 Features
- Pointing to any point in the heat map shows a tooltip box with the structure as well as the assay result.
- More details about the point are shown in the area marked on the slide.
- Double-clicking any point takes the user to the source of the original data.
- Drawing a box in the heat map creates a focus heat map for the area of interest.
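The focus-map feature can be pictured as cutting a rectangular block out of the full matrix. The sketch below is only an illustration in Python, not the tool's own Java code; the function name `focus_map`, the box convention (exclusive end indices), and the sample labels and values are all assumptions made for the example.

```python
def focus_map(values, row_labels, col_labels, box):
    """Return the sub-matrix and labels covered by a selection box.

    box = (row_start, row_end, col_start, col_end); end indices are exclusive.
    """
    r0, r1, c0, c1 = box
    sub = [row[c0:c1] for row in values[r0:r1]]
    return sub, row_labels[r0:r1], col_labels[c0:c1]


# Hypothetical data: three compounds (rows) by four assays (columns).
compounds = ["CPD-1", "CPD-2", "CPD-3"]
assays = ["Assay-A", "Assay-B", "Assay-C", "Assay-D"]
heat = [
    [0.9, 0.8, 0.1, 0.4],
    [0.7, 0.6, 0.2, 0.3],
    [0.1, 0.2, 0.9, 0.5],
]

# A box drawn over the top-left corner of the map.
sub, sub_rows, sub_cols = focus_map(heat, compounds, assays, (0, 2, 0, 2))
```

Keeping the labels together with the sliced values means the focus map can still be traced back to compound IDs and assay names.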

11 Example of focus map (axes: Assay Name, Compound ID)

12 More operations
- The color spectrum can be changed.
- The map orientation can be changed.
- Data analysis tools
  1. Data points can be rearranged based on analysis results.
  2. Analysis results can be exported.
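As a rough illustration of the first two operations, the Python sketch below maps a normalized value to a color on a two-color spectrum and flips the map orientation by transposing the matrix. It is not taken from the tool: the function names, the linear blend, and the default red-to-green spectrum are assumptions.

```python
def value_to_rgb(value, low_color=(255, 0, 0), high_color=(0, 128, 0)):
    """Linearly blend two RGB colors for a value clipped to [0, 1]."""
    v = min(1.0, max(0.0, value))
    return tuple(round(lo + (hi - lo) * v) for lo, hi in zip(low_color, high_color))


def transpose(matrix):
    """Swap rows and columns, e.g. compounds-by-assays to assays-by-compounds."""
    return [list(col) for col in zip(*matrix)]


# Changing the spectrum only means passing different end colors.
default_low = value_to_rgb(0.0)
default_high = value_to_rgb(1.0)
blue_high = value_to_rgb(1.0, low_color=(255, 255, 255), high_color=(0, 0, 255))

flipped = transpose([[1, 2, 3], [4, 5, 6]])
```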

13 Normalize biological endpoints
- Problem: comparing apples with oranges
- Solution: use a relative scale (MIN-MAX method)
  - Define a good end and a bad end for each endpoint.
  - Normalize each result relative to both ends.
- For different kinds of assays, we define different normalization methods.
- Users can customize their own normalization methods.

14 Normalization examples
- Potency, enzyme assay (logIC50): good -8, bad -5
- Potency, cell-based assay (logIC50): good -7, bad -4.5
- Cytochrome P450 inhibition (logIC50): good -4.5, bad -7
- Rat T1/2: good > 2 hours, bad < 0.25 hour
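One way to read the good-end/bad-end scheme is as a linear rescaling onto a unitless 0 to 1 range, with values beyond either end clipped. The slides do not give the exact formula, so the linear interpolation, the clipping, and every name in this Python sketch are assumptions; only the good and bad values come from the examples above.

```python
def normalize(value, good, bad):
    """Map a raw result onto [0, 1]: 1.0 at the good end, 0.0 at the bad end."""
    score = (value - bad) / (good - bad)
    return min(1.0, max(0.0, score))  # clip results beyond either end


# Good and bad ends from the slide; the dictionary keys are hypothetical.
ENDPOINTS = {
    "enzyme_logIC50": {"good": -8.0, "bad": -5.0},
    "cell_logIC50": {"good": -7.0, "bad": -4.5},
    "cyp450_logIC50": {"good": -4.5, "bad": -7.0},
    "rat_half_life_h": {"good": 2.0, "bad": 0.25},
}


def normalize_endpoint(name, value):
    ends = ENDPOINTS[name]
    return normalize(value, ends["good"], ends["bad"])


mid_potency = normalize_endpoint("enzyme_logIC50", -6.5)  # halfway between ends
strong_cyp = normalize_endpoint("cyp450_logIC50", -7.0)   # strong inhibition is bad
long_half_life = normalize_endpoint("rat_half_life_h", 3.0)
```

Because the formula depends only on which end is labelled good, the same function handles endpoints where lower is better (potency) and those where higher is better (CYP450 logIC50, half-life).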

15 Normalized results
Raw data (different units) -> normalized data (no unit), which yields two distance matrices:
- Distance matrix for compounds (assays as descriptors)
- Distance matrix for assays (compounds as descriptors)
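Both distance matrices can be derived from the same normalized table: rows serve as descriptor vectors for compounds, and columns for assays. The slides do not name the distance metric, so Euclidean distance is an assumption in this Python sketch, as are the function names and the sample values.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(vectors):
    """Symmetric matrix of pairwise distances between the given vectors."""
    return [[euclidean(a, b) for b in vectors] for a in vectors]


# Hypothetical normalized data: rows are compounds, columns are assays.
normalized = [
    [1.0, 0.0, 0.5],
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
]

# Compounds compared using assays as descriptors (rows as vectors).
compound_dist = distance_matrix(normalized)

# Assays compared using compounds as descriptors (columns as vectors).
assay_dist = distance_matrix([list(col) for col in zip(*normalized)])
```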

16 Data analysis
- Starting from the distance matrix: sorting, clustering, and similarity analysis
- Analysis can be done for both compounds and assays, based on biological assay results.
- Results can be exported to an Excel file for further analysis.
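The three analyses can be sketched in a few lines of Python. The slides do not say which algorithms the tool uses, so everything here is an assumption: sorting by mean normalized score, similarity as a nearest-neighbor ranking, and clustering as single linkage with a distance cutoff. The compound names and values are made up.

```python
import math


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sort_by_profile(names, matrix):
    """Order names by mean normalized score, best first."""
    order = sorted(range(len(names)),
                   key=lambda i: sum(matrix[i]) / len(matrix[i]),
                   reverse=True)
    return [names[i] for i in order]


def most_similar(query, names, matrix, top=2):
    """Rank the other entries by distance to the query entry."""
    q = matrix[names.index(query)]
    others = [(distance(q, row), n) for n, row in zip(names, matrix) if n != query]
    return [n for _, n in sorted(others)[:top]]


def threshold_clusters(matrix, cutoff):
    """Single-linkage clusters: entries within the cutoff share a label."""
    labels = list(range(len(matrix)))
    for i in range(len(matrix)):
        for j in range(i + 1, len(matrix)):
            if distance(matrix[i], matrix[j]) <= cutoff:
                old, new = labels[j], labels[i]
                labels = [new if lab == old else lab for lab in labels]
    return labels


names = ["A", "B", "C", "D"]
data = [[0.9, 0.8, 0.7], [0.2, 0.1, 0.3], [0.9, 0.9, 0.9], [0.5, 0.5, 0.5]]

ranking = sort_by_profile(names, data)
neighbors = most_similar("A", names, data)
labels = threshold_clusters(data, cutoff=0.3)
```

Passing the transposed matrix and the assay names to the same functions gives the assay-side analysis.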

17 Structural analysis
1. Clustering and sorting compounds by their structural similarity
   - Fingerprints are used to calculate the similarity between compounds.
2. Provides structure-activity representation and analysis
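A common way to compare fingerprints is the Tanimoto coefficient: the number of bits set in both fingerprints divided by the number set in either. The slides do not state which measure or fingerprint type the tool uses, so the Python sketch below is an assumption, and the bit positions are made up.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints: treated as dissimilar by convention here
    return len(a & b) / union


# Hypothetical fingerprints: positions of the bits that are set.
fp1 = {1, 2, 3, 4}
fp2 = {3, 4, 5, 6}

same = tanimoto(fp1, fp1)
partial = tanimoto(fp1, fp2)  # 2 shared bits out of 6 distinct bits
```

A structural distance for clustering can then be taken as one minus the similarity.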

18 Business considerations
1. Hide information
   - Use generic names for compounds and assays. For example, compounds use a prefix and a sequence number.
   - Use a generic structure, such as benzene, to hide the real structure.
   - Look-up table for symbol replacement
2. Offline (offsite) capability
   - Export and import the heat map as a binary file.
   - Re-import the map offline without connecting to the corporate database.
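Both ideas are simple to sketch: a look-up table that swaps real names for a prefix plus sequence number, and a binary round trip so a map can be reopened without the database. This Python illustration is not the tool's implementation; the name format, the function names, and the use of `pickle` as the binary format are assumptions.

```python
import pickle


def build_lookup(real_names, prefix):
    """Assign each real name a generic name: prefix plus sequence number."""
    return {name: f"{prefix}-{i:04d}" for i, name in enumerate(real_names, start=1)}


def mask(names, lookup):
    """Replace real names with their generic names."""
    return [lookup[n] for n in names]


def export_map(heat_map):
    """Serialize the heat map to bytes that can be written to a binary file."""
    return pickle.dumps(heat_map)


def import_map(blob):
    """Rebuild the heat map from bytes; only load files from a trusted source."""
    return pickle.loads(blob)


# Hypothetical names and values.
real_compounds = ["real-compound-a", "real-compound-b"]
lookup = build_lookup(real_compounds, "CPD")

masked_map = {
    "compounds": mask(real_compounds, lookup),
    "assays": ["ASSAY-0001", "ASSAY-0002"],
    "values": [[0.9, 0.4], [0.2, 0.7]],
}

blob = export_map(masked_map)   # what would be saved for offsite use
restored = import_map(blob)     # what an offline session would load
```

Keeping the look-up table out of the exported file is what keeps the real identities inside the company.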

19 Application development
1. Java™ JDK 1.5 (from Sun Microsystems)
2. ChimePro™ for Java from MDL
3. CDK
4. JDBC 1.4 from Oracle
Features:
1. Directly extracts structural and assay information from the Accord Enterprise database and the MDL ISIS/Host database.
2. Web deployed (Java Web Start)

20 3. Applications and Examples

21 H2L project data analysis and decision making
Heat map details: compounds from a list of Accord Enterprise assays in four assay groups
1. Potency
2. CYP450 inhibition
3. In vitro ADME
4. In vivo PK

22 Heat map

23 Heat map – after sorting by biological profile

24 Focus on the most active area (top area)
All top compounds are clinical or lead candidates.

25 Summary for H2L data analysis
- Brings together structural data and many biological assays for a discovery project.
- Multidimensional data analysis
- The most active compounds are not the top compounds in overall profile score.
- The heat map can pick out the drug candidates.
Problems:
- Missing data points: many compounds do not have in vivo data.
- Clustering analysis is not accurate in this case.

26 Kinase activity and selectivity analysis
Heat map details: positives in a multiple-kinase screen
Kinase assays: 4 kinase families
- AGC
- OTHER
- STE
- TK

27 Sorted by activity profile
Multiple-kinase inhibitors are ranked at the top.

28 Cluster by structural similarity
1. Compounds are colored by cluster.
2. Cluster 1: AGC_2, Other_3, and TK_10 inhibitors
3. Cluster 2: Other_3 inhibitors
4. Pan-inhibitors
Structurally similar compounds (in the same structural cluster) have similar kinase inhibitory behaviors.

29 Cluster by overall kinase inhibitory profile
- Multiple-inhibitor cluster
- Three singletons that exhibit different inhibitory patterns
- AGC_1 inhibitor cluster
- AGC_1, AGC_2, and TK_10 inhibitors

30 Cluster assays based on overall compound profile
The clusters of assays based on compound profiles are not the same as the phylogenetic tree.
Identify kinases with possible cross-interaction.

31 Summary of kinase inhibitors heat map
1. Identify pan-inhibitors.
2. Graphical structure-activity relationships
3. Identify kinase inhibitor activity patterns for selectivity analysis.
4. Cluster kinases based on compound profiles and identify possible cross-interaction groups.

32 Conclusion
1. Provides an interactive graphical tool for decision making in the drug discovery process.
   - Gets data directly from the corporate database
   - Interactive
   - Information-rich: structural and biological assay data in one place
   - One-stop shop for information analysis in drug discovery
2. Statistical analysis based on result codes provides a powerful tool for decision making.
   - Based on overall biological profile
   - Can pick winners in the H2L process
   - Provides useful SAR analysis for compounds
   - Provides selectivity profiles for biological targets

33 Acknowledgements
Ben Askew
Steve Arkinstall
Brian Healey