Analysis of High-Throughput Screening Data C371 Fall 2004.

1 Analysis of High-Throughput Screening Data C371 Fall 2004

2 Drug Discovery Process The key steps of drug discovery are: –research - average 2 to 3 years –pre-clinical testing - average 1 year –clinical trial testing (involving human patients) - average 10 years –regulatory approval - average 2 years

3 Drug Discovery Process: Web Sites

4 INTRODUCTION HTS allows hundreds of thousands of compounds to be assayed very quickly HTS data characterized by: –High volume –High level of noise –Diverse nature of the chemical classes involved –Possible presence of multiple binding modes

5 INTRODUCTION Select the most potent compounds to progress to the next stage Problems: –Functional groups that interfere with the assay (e.g., fluoresce) –Functional groups that react with biological systems –Catch these with substructure and “drug-likeness” filters

6 Techniques for Analysis of HTS Data Can’t use multiple linear regression or partial least squares as statistical tests –Data sets are too large Data visualization Data reduction Data mining (if activity data is known)

7 HTS Methodology Procedure: –Measure activity at different concentrations for a subset of compounds –Define IC50 (Inhibitory Concentration 50): the concentration of a material estimated to inhibit the biological endpoint of interest (e.g., cell growth, ATP levels) by 50% –Solid pure sample that tests positively gets structure determined (hits-to-leads phase)
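As an illustration of the IC50 definition above, the following minimal sketch estimates IC50 from a handful of dose-response measurements by interpolating linearly on a log10 concentration axis. The function name, the data, and the interpolation approach are assumptions made for illustration; a production workflow would more likely fit a sigmoidal curve to the measurements.

```python
import math


def estimate_ic50(concentrations, inhibitions):
    """Estimate IC50 by linear interpolation on a log10 concentration axis.

    concentrations: ascending, strictly positive concentrations (one unit)
    inhibitions:    percent inhibition measured at each concentration
    Returns None if the response never crosses 50% between two points.
    """
    points = list(zip(concentrations, inhibitions))
    for (c_lo, y_lo), (c_hi, y_hi) in zip(points, points[1:]):
        if y_lo <= 50.0 <= y_hi and y_hi > y_lo:
            # Fraction of the way from the lower to the upper measurement.
            frac = (50.0 - y_lo) / (y_hi - y_lo)
            log_lo, log_hi = math.log10(c_lo), math.log10(c_hi)
            return 10 ** (log_lo + frac * (log_hi - log_lo))
    return None


# Hypothetical dose-response data (concentrations in micromolar).
conc = [0.01, 0.1, 1.0, 10.0, 100.0]
inhib = [2.0, 10.0, 40.0, 60.0, 95.0]
ic50 = estimate_ic50(conc, inhib)
print(f"Estimated IC50: {ic50:.2f} uM")
```

Here the 50% level falls midway between the 1 and 10 micromolar readings, so the estimate is the geometric mean of those two concentrations.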

8 DATA VISUALIZATION Need to display simultaneously large data sets with many thousands of molecules and their properties Typical software packages: –Draw various kinds of graphs –Color selected properties –Calculate simple statistics HTS data sets may be divided into subsets to aid navigation

9 SpotFire DecisionSite DecisionSite Examples

10 Features of Data Visualization Often combined with structure searching to find compounds with certain features Unsupervised methods – don’t use activity data Supervised methods – incorporate activity data Use of molecular descriptors

11 Non-Linear Mapping Descriptors: –Physicochemical properties –Fingerprints: a Boolean array with the meaning of each bit not predefined List of patterns is generated for each –Atom, pair of adjacent atoms, bonds connecting them –Each group of atoms joined by longer pathways –Substructural fragments –Known activity against related targets
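The fingerprint idea above (patterns for atoms, bonded pairs, and longer pathways, hashed into bits with no predefined meaning) can be sketched as follows. This is a toy illustration, not the scheme of any particular software package: the molecule representation, the function names, the 64-bit length, and the use of CRC32 as the hash are all assumptions.

```python
import zlib


def enumerate_paths(atoms, bonds, max_atoms=3):
    """Return the set of linear atom/bond path patterns in a molecule.

    atoms: list of element symbols, indexed by atom number
    bonds: dict mapping (i, j) atom-index pairs to a bond symbol
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for (i, j), symbol in bonds.items():
        neighbours[i].append((j, symbol))
        neighbours[j].append((i, symbol))

    patterns = set()

    def walk(path, tokens):
        # Store one canonical spelling so a path and its reverse coincide.
        forward = "".join(tokens)
        backward = "".join(reversed(tokens))
        patterns.add(min(forward, backward))
        if len(path) == max_atoms:
            return
        for nxt, symbol in neighbours[path[-1]]:
            if nxt not in path:
                walk(path + [nxt], tokens + [symbol, atoms[nxt]])

    for start in range(len(atoms)):
        walk([start], [atoms[start]])
    return patterns


def fingerprint(atoms, bonds, n_bits=64, max_atoms=3):
    """Hash each path pattern to a bit position in a Boolean array."""
    bits = [False] * n_bits
    for pattern in enumerate_paths(atoms, bonds, max_atoms):
        bits[zlib.crc32(pattern.encode()) % n_bits] = True
    return bits


# Heavy atoms of ethanol (C-C-O), written by hand for this example.
ethanol_atoms = ["C", "C", "O"]
ethanol_bonds = {(0, 1): "-", (1, 2): "-"}
ethanol_patterns = enumerate_paths(ethanol_atoms, ethanol_bonds)
ethanol_fp = fingerprint(ethanol_atoms, ethanol_bonds)
```

Because several patterns can hash to the same position, a set bit does not identify a single substructure, which is what "not predefined" means in practice.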

12 Non-Linear Mapping (cont’d) Non-Linear Mapping takes multidimensional data to a lower space (2- or 3-dimensional) Multidimensional scaling –Generate initial set of coordinates in the low-dimensional space –Modify the coordinates using optimization procedures
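The two multidimensional scaling steps above (generate initial low-dimensional coordinates, then optimize them) might look like the sketch below. It minimizes the raw stress, the sum of squared differences between mapped and original distances, by plain gradient descent. The descriptor values, step size, iteration count, and choice of stress function are assumptions; published non-linear mapping methods use related but not identical error functions.

```python
import math
import random


def pairwise_distances(points):
    """Euclidean distance matrix for a list of coordinate tuples."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def stress(coords, target):
    """Sum of squared differences between mapped and target distances."""
    n = len(coords)
    return sum(
        (math.dist(coords[i], coords[j]) - target[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def nonlinear_map(target, dim=2, steps=500, lr=0.01, seed=0):
    """Map objects to `dim` dimensions by gradient descent on the stress."""
    rng = random.Random(seed)
    n = len(target)
    # Step 1: generate an initial set of low-dimensional coordinates.
    coords = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    initial = stress(coords, target)
    # Step 2: modify the coordinates to reduce the stress.
    for _ in range(steps):
        grads = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = math.dist(coords[i], coords[j]) or 1e-9  # avoid dividing by zero
                factor = 2.0 * (d - target[i][j]) / d
                for k in range(dim):
                    grads[i][k] += factor * (coords[i][k] - coords[j][k])
        for i in range(n):
            for k in range(dim):
                coords[i][k] -= lr * grads[i][k]
    return coords, initial, stress(coords, target)


# Hypothetical four-descriptor vectors for six molecules.
descriptors = [
    (0.1, 1.2, 0.5, 2.0),
    (0.2, 1.0, 0.4, 2.1),
    (1.5, 0.3, 1.9, 0.2),
    (1.4, 0.2, 2.0, 0.1),
    (0.8, 0.8, 1.0, 1.0),
    (2.0, 2.0, 0.1, 0.5),
]
target = pairwise_distances(descriptors)
coords, initial_stress, final_stress = nonlinear_map(target)
print(f"stress: {initial_stress:.3f} -> {final_stress:.3f}")
```

The resulting two-dimensional coordinates can then be plotted, with molecules that are close in descriptor space tending to appear close together.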

13 DATA MINING METHODS Construct models that enable the establishment of relationships between the structures and the observed activity Simple division of structures is desirable: –Active vs. inactive –High, medium, or low activity classes

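14 Data Mining Methods: Techniques Substructural analysis: weight each aspect of the structure according to a pre-assigned activity designation: $W_i = \frac{act_i}{act_i + inact_i}$, where $act_i$ and $inact_i$ are the numbers of active and inactive molecules that contain substructure $i$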
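A minimal sketch of the substructural analysis weight, computed as the fraction of molecules containing a fragment that are active (actives divided by actives plus inactives). The fragment names and activity labels are invented for illustration, and scoring a new molecule by the mean weight of its known fragments is an assumption; other combination rules are possible.

```python
from collections import Counter


def fragment_weights(molecules):
    """Compute act / (act + inact) for every fragment.

    molecules: iterable of (set_of_fragments, is_active) pairs
    """
    act, inact = Counter(), Counter()
    for fragments, is_active in molecules:
        for fragment in fragments:
            if is_active:
                act[fragment] += 1
            else:
                inact[fragment] += 1
    all_fragments = set(act) | set(inact)
    return {f: act[f] / (act[f] + inact[f]) for f in all_fragments}


def score_molecule(fragments, weights):
    """Mean weight of the fragments seen in training (an assumed scoring rule)."""
    known = [weights[f] for f in fragments if f in weights]
    return sum(known) / len(known) if known else 0.0


# Hypothetical screening results: (fragments present, active?).
training = [
    ({"nitro", "phenyl"}, True),
    ({"nitro", "amide"}, True),
    ({"phenyl", "amide"}, False),
    ({"phenyl"}, False),
]
weights = fragment_weights(training)
new_score = score_molecule({"nitro", "amide"}, weights)
```

In this toy set the nitro fragment appears only in actives and so receives weight 1.0, while phenyl appears in one active and two inactives and receives 1/3.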

15 Data Mining Techniques Discriminant Analysis: aims to separate the molecules into constituent classes –In the simplest case, linear discriminant analysis works with two variables and two activity classes: a straight line separates the data into two areas so that the maximum number of molecules is classified correctly
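For the two-variable, two-class case, the separating line can be sketched with Fisher's linear discriminant, one common way of constructing it. The descriptor values below are invented, and placing the threshold midway between the projected class means is an assumption that ignores class sizes.

```python
def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, centre):
    """Within-class scatter terms (sxx, sxy, syy) about the class mean."""
    sxx = sum((p[0] - centre[0]) ** 2 for p in points)
    syy = sum((p[1] - centre[1]) ** 2 for p in points)
    sxy = sum((p[0] - centre[0]) * (p[1] - centre[1]) for p in points)
    return sxx, sxy, syy


def fit_lda(actives, inactives):
    """Return (w, threshold); a point x is called active if w.x > threshold."""
    m1, m0 = mean(actives), mean(inactives)
    a = scatter(actives, m1)
    b = scatter(inactives, m0)
    sxx, sxy, syy = a[0] + b[0], a[1] + b[1], a[2] + b[2]
    det = sxx * syy - sxy * sxy
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # w = inverse(pooled scatter matrix) * (m1 - m0), written out for 2x2.
    w = ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)
    mid = ((m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2)
    threshold = w[0] * mid[0] + w[1] * mid[1]
    return w, threshold


def predict(point, w, threshold):
    """True if the point falls on the active side of the line."""
    return w[0] * point[0] + w[1] * point[1] > threshold


# Hypothetical two-descriptor data for each activity class.
actives = [(2.0, 3.0), (3.0, 3.5), (2.5, 4.0)]
inactives = [(0.5, 1.0), (1.0, 0.5), (1.5, 1.5)]
w, threshold = fit_lda(actives, inactives)
```

The line itself is the set of points where `w[0] * x + w[1] * y` equals `threshold`; everything on one side is predicted active.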

16 Data Mining Techniques Neural Networks – need a training set of data Once trained, the program predicts values for new molecules Examples: feed-forward network and Kohonen network (self-organizing map) Problem: over-training gives excellent results on the training data, but poor results on unseen data

17 Data Mining Techniques Decision Trees –Rules associate specific molecular features and/or descriptor values with the activity or property of interest –Start with the entire data set and identify the descriptor or variable that gives the best split –Follow the procedure until no more splits are possible or desirable –Some consider multiple splits at each node
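The step "identify the descriptor or variable that gives the best split" can be sketched as an exhaustive search over descriptors and thresholds. Using Gini impurity as the measure of split quality is an assumption (the slides do not name a criterion), and the descriptor values and labels are invented.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 activity labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels):
    """Return (impurity, descriptor_index, threshold) of the best binary split.

    rows:   list of descriptor tuples, one per molecule
    labels: 1 for active, 0 for inactive
    """
    best = None
    n = len(rows)
    for d in range(len(rows[0])):
        values = sorted({row[d] for row in rows})
        # Candidate thresholds lie midway between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[d] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[d] > threshold]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or impurity < best[0]:
                best = (impurity, d, threshold)
    return best


# Hypothetical descriptors: (logP, molecular weight).
rows = [(1.2, 250), (3.5, 310), (2.8, 420), (4.1, 280), (0.9, 390), (3.9, 450)]
labels = [0, 1, 0, 1, 0, 1]
impurity, descriptor, threshold = best_split(rows, labels)
```

A full tree would apply `best_split` again to each resulting subset until no further split is possible or worthwhile, as the slide describes.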

18 SUMMARY Much interest and research on HTS analysis New techniques being applied (e.g., support vector machines) Analysis of large diverse data sets needs the most work Results need to feed into subsequent analysis

