Predicting patterns of biological performance using chemical substructure features Diego Borges-Rivera 08/04/08.


Introduction
- cheminformatics allows us to describe similarity computationally
- synthetic chemists describe similarity through visual inspection
- we will describe compounds by the presence of chemical substructures
- we will attempt to identify sets of substructures that predict biological performance

Previous work
- Clemons/Kahne/Wagner et al.: disaccharide profiling in multiple cell states
- found sets of substructures relevant to biological activity patterns
- substructures highly specific to disaccharides
[Figure: substructures]

Biological performance profile
- 400 compounds, 8 assays in duplicate
- tested for cell proliferation in 8 different cell lines
- class labels are active (A) or inactive (I)
[Figure: active compound]

What are fingerprints?
- compound collection fed into commercial software
- each substructure = 1 bit
- the fingerprint shows which substructures are present
[Figure: substructure #1725, substructure #886, substructure #7017]
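The "one bit per substructure" idea can be sketched as below. This is a minimal illustration, not the commercial software's output format: the function names, the 8-bit length, and the substructure indices are all hypothetical.

```python
def make_fingerprint(present, n_bits):
    """Return a list of 0/1 bits; bit i is 1 when substructure i is present.

    `present` is any iterable of substructure indices (hypothetical numbering).
    """
    present = set(present)
    return [1 if i in present else 0 for i in range(n_bits)]


def substructures_present(fingerprint):
    """Read a fingerprint back as the list of substructure indices it contains."""
    return [i for i, bit in enumerate(fingerprint) if bit]


# Toy compound containing substructures 1, 4 and 6 out of 8 possible ones.
fp = make_fingerprint({1, 4, 6}, n_bits=8)
print(fp)
print(substructures_present(fp))
```

In the presentation the fingerprints are far longer (7700 substructures before filtering), but the representation would work the same way.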

Overview of cheminformatic methods
- produced fingerprints
  - 7700 total substructures
- filtered the set
  - 2166 substructures left

Overview of computational methods
- two steps, independent of each other:
  - feature (substructure) selection to find predictive subsets
  - evaluation of methods for predictive value

ReliefF: substructure selection
[Figure: weights; panels: Top 5, Bottom 5]
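As a rough illustration of how ReliefF-style weights arise, the sketch below implements a simplified two-class variant: for every compound it finds the k most similar compounds of the same class (hits) and of the other class (misses), then rewards bits that differ from misses and penalizes bits that differ from hits. It is an assumption-laden sketch, not the implementation used in the presentation: it visits every compound rather than sampling, omits the class-prior weighting of full multi-class ReliefF, and all names and toy data are hypothetical. Tanimoto similarity is used for the neighbor search because the slides state it is used in ReliefF.

```python
def tanimoto_bits(a, b):
    """Tanimoto similarity of two equal-length 0/1 lists."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def relieff_weights(fingerprints, labels, k=1):
    """Simplified two-class ReliefF: one weight per bit, higher = more predictive."""
    n = len(fingerprints)
    n_bits = len(fingerprints[0])
    weights = [0.0] * n_bits
    for i in range(n):
        # Rank all other compounds from most to least similar to compound i.
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: tanimoto_bits(fingerprints[i], fingerprints[j]),
            reverse=True,
        )
        hits = [j for j in ranked if labels[j] == labels[i]][:k]
        misses = [j for j in ranked if labels[j] != labels[i]][:k]
        for f in range(n_bits):
            for j in hits:    # differing from a same-class neighbor lowers the weight
                weights[f] -= abs(fingerprints[i][f] - fingerprints[j][f]) / (n * k)
            for j in misses:  # differing from an other-class neighbor raises it
                weights[f] += abs(fingerprints[i][f] - fingerprints[j][f]) / (n * k)
    return weights


# Toy data: bit 0 tracks the class, bit 1 is noise, bit 2 is always set.
fps = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 1]]
labels = ["A", "A", "I", "I"]
weights = relieff_weights(fps, labels, k=1)
print(weights)
```

On this toy set the class-tracking bit ends up with the highest weight and the noise bit with the lowest, which is the kind of ordering a "Top 5 / Bottom 5" ranking would be read from.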

K nearest neighbors (knn): predictive accuracy
- Examples: k = 2, 5
[Figure: compound being classified = ?]
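A k-nearest-neighbors vote over fingerprints might look like the following sketch. It assumes fingerprints are stored as sets of substructure indices, uses Tanimoto similarity (which the slides say is used in knn), and breaks ties in favor of the label seen first among the nearest neighbors; the function names, tie rule, and toy compounds are hypothetical.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto similarity of two sets of substructure indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knn_predict(query, training, k):
    """Majority vote among the k training compounds most similar to `query`.

    `training` is a list of (fingerprint_set, label) pairs.
    """
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set with active (A) and inactive (I) compounds.
training = [
    ({1, 2, 3}, "A"),
    ({1, 2}, "A"),
    ({7, 8}, "I"),
    ({8, 9}, "I"),
    ({1, 9}, "I"),
]
query = {1, 2, 4}
print(knn_predict(query, training, k=2))
print(knn_predict(query, training, k=5))
```

With this toy data the prediction changes between k = 2 and k = 5, which is why the number of neighbors is treated as a parameter to tune.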

Similarity between compounds
- similarity between two fingerprints
- Tanimoto coefficient
- this is used twice: (1) in ReliefF (2) in knn
[Figure: Compound a, Compound b]
Example: Tanimoto coefficient = 1 / 2 = 0.5
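The Tanimoto coefficient of two fingerprints is the number of bits set in both divided by the number of bits set in either. A minimal sketch follows; because the example fingerprints on the slide are not preserved in the transcript, the two compounds below are hypothetical ones chosen to reproduce the slide's 1 / 2 = 0.5, and returning 0.0 for two empty fingerprints is an assumed convention.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of set-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    if union == 0:
        return 0.0  # assumed convention when neither fingerprint has any bit set
    return len(a & b) / union


# Hypothetical fingerprints: one shared substructure, two substructures overall.
compound_a = {3}
compound_b = {3, 7}
print(tanimoto(compound_a, compound_b))  # 1 shared / 2 total
```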

Cross-validation: predictive accuracy
- 10 subsets
- test set: one of the subsets
- training set: the remaining subsets
[Figure: test set, training set]
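The 10-subset scheme can be sketched as below: each subset serves once as the test set while the other nine form the training set, and accuracy is the fraction of held-out compounds predicted correctly. The shuffling, the seed, and the `fit_predict` callable are assumptions for illustration; the slides do not say how the subsets were drawn.

```python
import random


def k_fold_splits(n_items, n_folds=10, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validated_accuracy(samples, labels, fit_predict, n_folds=10, seed=0):
    """Fraction of samples predicted correctly when each is held out once.

    `fit_predict(train_data, sample)` is a hypothetical callable that returns
    a predicted label given a list of (sample, label) training pairs.
    """
    correct = 0
    for train, test in k_fold_splits(len(samples), n_folds, seed):
        train_data = [(samples[i], labels[i]) for i in train]
        for i in test:
            if fit_predict(train_data, samples[i]) == labels[i]:
                correct += 1
    return correct / len(samples)


# Baseline check: a predictor that always answers "A" on 30 actives, 10 inactives.
toy_labels = ["A"] * 30 + ["I"] * 10
toy_samples = list(range(40))
baseline = cross_validated_accuracy(toy_samples, toy_labels, lambda train, s: "A")
print(baseline)
```

A constant predictor makes a useful sanity baseline: any real classifier should beat the accuracy obtained by always guessing the majority class.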

Picking parameters for methods
- which parameters produce the best predictive accuracies?
  - number of neighbors used in ReliefF {1, 2, 4, etc}
  - number of neighbors used in knn {1, 2, 4, etc}
  - number of ReliefF substructures used to predict classes in knn {1, 20, 100, etc}
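One plausible way to search these three parameters is an exhaustive grid, sketched below. The grids contain only the values named on the slide (the "etc" is left unfilled), and `score` stands in for whatever returns a cross-validated accuracy for a parameter triple; both the function name and the toy scoring function are hypothetical.

```python
from itertools import product


def grid_search(score, relieff_neighbors, knn_neighbors, n_substructures):
    """Try every parameter combination and return the best one with its score.

    `score(relieff_k, knn_k, n_sub)` is a hypothetical callable returning a
    predictive accuracy (higher is better).
    """
    best_params, best_score = None, float("-inf")
    for params in product(relieff_neighbors, knn_neighbors, n_substructures):
        current = score(*params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


# Toy scoring function that peaks at (2, 4, 20); a real one would run
# feature selection plus cross-validated knn for each combination.
def toy_score(relieff_k, knn_k, n_sub):
    return -(abs(relieff_k - 2) + abs(knn_k - 4) + abs(n_sub - 20))


best_params, best_score = grid_search(
    toy_score,
    relieff_neighbors=[1, 2, 4],
    knn_neighbors=[1, 2, 4],
    n_substructures=[1, 20, 100],
)
print(best_params, best_score)
```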

Picking number of substructures
[Figure axes: predictive accuracy; number of substructures used to predict (including "all")]
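Varying the number of substructures presumably means keeping only the n highest-weighted bits before classification. A small sketch under that assumption, with hypothetical names and toy weights:

```python
def top_substructures(weights, n):
    """Indices of the n substructures with the highest selection weights."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return ranked[:n]


def project(fingerprint, keep):
    """Reduce a 0/1 fingerprint to the selected substructure positions."""
    return [fingerprint[i] for i in keep]


toy_weights = [0.1, 0.9, -0.3, 0.5]
keep = top_substructures(toy_weights, n=2)
print(keep)
print(project([1, 0, 1, 1], keep))
```

Repeating the classification for several values of n and recording the accuracy each time would give the kind of curve the slide's figure describes.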

Group of substructures best able to predict

Future work
- multi-class
- different feature selection

Acknowledgements
Computational Chemical Biology: Joshua Gilbert, Paul Clemons, Hyman Carrinski
Summer Research Program in Genomics: Shawna Young, Lucia Vielma, Maura Silverstein