MayaChemTools: An open source package for computational discovery
Manish Sud
COMP Poster #306, 243rd ACS National Meeting & Exposition, March 25-29, 2012

Presentation transcript:

MayaChemTools: An open source package for computational discovery
Manish Sud
COMP Poster #306, 243rd ACS National Meeting & Exposition, March 25-29, 2012, San Diego, CA

Introduction
A growing collection of Perl scripts, modules and classes to support day-to-day computational drug discovery needs
Freely available under the terms of the LGPL license

Introduction
Manipulation and analysis of data in SD, CSV/TSV, sequence/alignment, PDB and fingerprints files
Properties of periodic table elements, amino acids and nucleic acids
Calculation of physicochemical properties such as hydrogen bond donors and acceptors, SLogP and topological polar surface area
Generation of fingerprints corresponding to atom neighborhoods, atom types, E-state indices, extended connectivity, MACCS keys, path lengths, topological atom pairs/triplets/torsions and topological pharmacophore atom pairs/triplets
Similarity searching and calculation of similarity matrices
An extensive set of modules and classes available for custom development

Software architecture
[Diagram labels: bin, lib; out-of-the-box scripts, custom scripts; modules & packages, classes, data files; third party: Jmol; lib/data, lib/Jmol]

Physicochemical properties profiling
Name | Description
Molecular Weight | Sum of atomic weights
Heavy Atoms | Number of non-hydrogen atoms
Rings, Aromatic Rings | Number of rings and aromatic rings (aromaticity detection using Hückel's rule)
Rotatable Bonds | Number of non-ring single bonds involving only non-hydrogen atoms, with the option to exclude terminal bonds, bonds attached to triple bonds, and amide, thioamide and sulfonamide bonds
van der Waals Molecular Volume | Sum of atomic volumes corresponding to van der Waals atomic radii, with adjustments for number of bonds and for aromatic and non-aromatic rings
Hydrogen Bond Donors & Acceptors | Type1 - Donor: any N and O with implicit/explicit H; Acceptor: any N without implicit/explicit H and any O. Type2 - Donor: any N and O with implicit/explicit H; Acceptor: any N and O
LogP & Molar Refractivity (SLogP & SMR) | Sum of atomic contributions from pre-defined atom types corresponding to specific structure fragments
Topological Polar Surface Area (TPSA) | Sum of atomic contributions from pre-defined N and O atom types corresponding to specific structure fragments, with the option to include P and S atom types
Fraction of SP3 Carbons (FSP3Carbons) | Number of SP3 carbons divided by the total number of carbons
Molecular Complexity | Number of bits set or unique keys in 2D fingerprints. Supported fingerprints: atom types, extended connectivity, MACCS keys, path lengths, topological atom pairs/triplets/torsions and topological pharmacophore atom pairs/triplets
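
The Type1/Type2 hydrogen bond rules and the FSP3Carbons ratio in the table above are simple counting rules, so a small sketch may help. The Python below is an illustration only, not the MayaChemTools implementation: the `Atom` class, the function names and the hand-written acetamide atom list are all hypothetical, and hybridization is supplied by the caller rather than perceived from a structure.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    """Hypothetical minimal atom record used only for this sketch."""
    symbol: str            # element symbol, e.g. "N"
    h_count: int           # implicit + explicit hydrogens on the atom
    is_sp3: bool = False   # hybridization flag supplied by the caller


def count_hbond_donors(atoms):
    """Donor (same for Type1 and Type2): any N or O with at least one H."""
    return sum(1 for a in atoms if a.symbol in ("N", "O") and a.h_count > 0)


def count_hbond_acceptors(atoms, rule="Type1"):
    """Type1: any N without H plus any O. Type2: any N or O."""
    if rule == "Type1":
        return sum(1 for a in atoms
                   if a.symbol == "O" or (a.symbol == "N" and a.h_count == 0))
    if rule == "Type2":
        return sum(1 for a in atoms if a.symbol in ("N", "O"))
    raise ValueError(f"unknown rule: {rule}")


def fraction_sp3_carbons(atoms):
    """Number of SP3 carbons divided by the total number of carbons."""
    carbons = [a for a in atoms if a.symbol == "C"]
    if not carbons:
        return 0.0  # assumption: report 0 when there are no carbons
    return sum(1 for a in carbons if a.is_sp3) / len(carbons)


# Heavy atoms of acetamide, CH3-C(=O)-NH2, written out by hand.
acetamide = [
    Atom("C", 3, is_sp3=True),
    Atom("C", 0),
    Atom("O", 0),
    Atom("N", 2),
]

print(count_hbond_donors(acetamide))
print(count_hbond_acceptors(acetamide, "Type1"))
print(count_hbond_acceptors(acetamide, "Type2"))
print(fraction_sp3_carbons(acetamide))
```

For acetamide the two acceptor rules differ only in whether the NH2 nitrogen is counted, which is the distinction the table draws between Type1 and Type2.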

Physicochemical properties profiling
Workflow: SD files -> CalculatePhysicochemicalProperties.pl -> Analyze data & generate plots

Physicochemical properties profiling
[Figure: distribution of physicochemical properties for a subset (7447) of the NCGC pharmaceutical collection data set]
Scripts used: FilterSDFiles.pl, ExtractFromSDFiles.pl, ExtractFromTextFiles.pl, CalculatePhysicochemicalProperties.pl, Rscript
Data set URL: tripod.nih.gov/npc


2D Fingerprints
Type | Values Type | Key Default Parameters/Description
Atom Neighborhoods | Vector | Values: Alphanumerical vector; MinNeighborhoodRadius: 0; MaxNeighborhoodRadius: 2; AtomIdentifierType: AtomicInvariants (AS, X, BO, H, FC)
Atom Types | Bit-vector or vector | Values: Numerical vector; AtomIdentifierType: AtomicInvariants (AS, X, BO, H, FC)
E-state Indices | Vector | Values: Numerical vector; EStatAtomTypesSetSize: Arbitrary
Extended Connectivity | Bit-vector or vector | Values: Alphanumerical vector; NeighborhoodRadius: 2; AtomIdentifierType: AtomicInvariants (AS, X, BO, H, FC, MN)
MACCS Keys | Bit-vector or vector | Values: Bit-vector; Size: 166; Available sizes: 166 and 322; Keys count available
Path Lengths | Bit-vector or vector | Values: Bit-vector; Size: 1024; AtomIdentifierType: AtomicInvariants (AS); MinPathLength: 1; MaxPathLength: 8; Paths count available
… | … | …

Atom identifier atom types: Atomic invariants, Functional class, DREIDING, EState, MMFF94, SLogP, SYBYL, TPSA and UFF
Atomic invariants: AS (Atom symbol), X (Num of heavy atom neighbors), BO (Sum of bond orders to heavy atoms), LBO (Largest bond order to heavy atoms), SB (Num of single bonds to heavy atoms), DB (Num of double bonds to heavy atoms), TB (Num of triple bonds to heavy atoms), H (Num of implicit and explicit hydrogens), Ar (Aromatic), RA (Ring atom), FC (Formal charge), MN (Mass number), SM (Spin multiplicity)
Functional class: HBD (Hydrogen bond donor), HBA (Hydrogen bond acceptor), PI (Positively ionizable), NI (Negatively ionizable), Ar (Aromatic), Hal (Halogen), H (Hydrophobic), RA (RingAtom), CA (ChainAtom)
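
The atomic invariants AS, X, BO and H defined above can be derived from a heavy-atom connection table. The following Python sketch shows one way to do that for a hand-written acetaldehyde graph. It is not the MayaChemTools code: the data layout, the function names and the dotted label format are assumptions made for illustration.

```python
# Heavy-atom graph of acetaldehyde, CH3-CH=O, written out by hand.
# index -> (atom symbol, number of implicit + explicit hydrogens)
ATOMS = {1: ("C", 3), 2: ("C", 1), 3: ("O", 0)}
# (atom index, atom index, bond order)
BONDS = [(1, 2, 1), (2, 3, 2)]


def atomic_invariants(atoms, bonds):
    """Compute AS, X, BO and H for every heavy atom.

    AS = atom symbol
    X  = number of heavy atom neighbors
    BO = sum of bond orders to heavy atoms
    H  = number of implicit and explicit hydrogens
    """
    neighbors = {index: 0 for index in atoms}
    bond_order_sum = {index: 0 for index in atoms}
    for first, second, order in bonds:
        for index in (first, second):
            neighbors[index] += 1
            bond_order_sum[index] += order
    return {
        index: {"AS": symbol, "X": neighbors[index],
                "BO": bond_order_sum[index], "H": hydrogens}
        for index, (symbol, hydrogens) in atoms.items()
    }


def invariant_label(invariants):
    """Join the invariants into one string (illustrative format only)."""
    return "{AS}.X{X}.BO{BO}.H{H}".format(**invariants)


labels = [invariant_label(inv)
          for _, inv in sorted(atomic_invariants(ATOMS, BONDS).items())]
print(labels)
```

The carbonyl carbon has two heavy neighbors but a bond order sum of three, which is why X and BO are kept as separate invariants.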

2D Fingerprints (continued)
Type | Values Type | Key Default Parameters/Description
… | … | …
Topological Atom Pairs | Vector | Values: Numerical vector; AtomIdentifierType: AtomicInvariants (AS, X, BO, H, FC); MinDistance: 1; MaxDistance: 10
Topological Atom Triplets | Vector | Values: Numerical vector; AtomIdentifierType: AtomicInvariants (AS, X, BO, H, FC); MinDistance: 1; MaxDistance: 10; TriangleInequality: No
Topological Atom Torsions | Vector | Values: Numerical vector; AtomIdentifierType: AtomicInvariants (AS, X, BO, H, FC)
Topological Pharmacophore Atom Pairs | Vector | Values: Numerical vector; AtomTypes: HBD, HBA, PI, NI, H; MinDistance: 1; MaxDistance: 10; AtomTypesWeight: None; Normalization: None; FuzzifyAtomPairsCount: No
Topological Pharmacophore Atom Triplets | Vector | Values: Numerical vector; AtomTypes: HBD, HBA, PI, NI, H, Ar; MinDistance: 1; MaxDistance: 10; DistanceBinSize: 2; TriangleInequality: Yes
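
To make the MinDistance/MaxDistance parameters of the topological atom pairs concrete, here is a minimal Python sketch that counts atom pairs by bond distance using a breadth-first search. It is a simplified illustration under stated assumptions, not the MayaChemTools algorithm: atom types are plain element symbols instead of atomic invariants, and the `C-D1-C` key format and function names are hypothetical.

```python
from collections import Counter, deque


def bond_distances(n_atoms, bonds, source):
    """Breadth-first search: number of bonds from source to each reachable atom."""
    adjacency = {index: [] for index in range(n_atoms)}
    for first, second in bonds:
        adjacency[first].append(second)
        adjacency[second].append(first)
    distance = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    return distance


def topological_atom_pairs(atom_types, bonds, min_distance=1, max_distance=10):
    """Count (type, distance, type) pairs within the distance window."""
    counts = Counter()
    n_atoms = len(atom_types)
    for i in range(n_atoms):
        distance = bond_distances(n_atoms, bonds, i)
        for j in range(i + 1, n_atoms):
            d = distance.get(j)
            if d is not None and min_distance <= d <= max_distance:
                # Sort the two types so A-B and B-A share one key.
                first, second = sorted((atom_types[i], atom_types[j]))
                counts[f"{first}-D{d}-{second}"] += 1
    return counts


# Heavy atoms of 1-propanol, C-C-C-O, as a linear chain.
pairs = topological_atom_pairs(["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
for key in sorted(pairs):
    print(key, pairs[key])
```

Lowering `max_distance` to 2 drops the single `C-D3-O` pair, which is how the distance window controls the size of the resulting vector.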

2D Fingerprints
Workflow: SD files -> Generate fingerprints -> 2D fingerprints (SD, FP, CSV/TSV)
Scripts: MACCSKeysFingerprints.pl, ExtendedConnectivityFingerprints.pl, PathLengthFingerprints.pl, TopologicalPharmacophoreAtomPairs.pl, … … …

Fingerprints comparisons
Fingerprints bit-vectors:
Name | Formula
Baroni Urbani & Buser | (SQRT(Nc*Nd) + Nc)/(SQRT(Nc*Nd) + Nc + (Na - Nc) + (Nb - Nc))
Cosine & Ochiai | Nc/SQRT(Na*Nb)
Dice | 2*Nc/(Na + Nb)
Dennis | (Nc*Nd - ((Na - Nc)*(Nb - Nc)))/SQRT(Nt*Na*Nb)
Forbes | (Nt*Nc)/(Na*Nb)
Fossum | (Nt*((Nc - 0.5)**2))/(Na*Nb)
Hamann | ((Nc + Nd) - (Na - Nc) - (Nb - Nc))/Nt
Jaccard & Tanimoto | Nc/((Na - Nc) + (Nb - Nc) + Nc) = Nc/(Na + Nb - Nc)
Kulczynski 1 | Nc/(Na + Nb - 2*Nc)
Kulczynski 2 | 0.5*(Nc/Na + Nc/Nb)

Na = Num of bits set to "1" in A
Nb = Num of bits set to "1" in B
Nc = Num of bits set to "1" in both A and B
Nd = Num of bits set to "0" in both A and B
Nt = Num of bits set to "1" or "0" in A or B = Na + Nb - Nc + Nd
Na - Nc = Num of bits set to "1" in A but not in B
Nb - Nc = Num of bits set to "1" in B but not in A
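
The counts Na, Nb, Nc, Nd and Nt defined above are all that the bit-vector coefficients need. The Python sketch below derives them from two equal-length lists of 0/1 values and evaluates a few of the listed formulas. The function names are hypothetical, and no guard is included for denominators that become zero (for example, two all-zero vectors).

```python
from math import sqrt


def bit_counts(a, b):
    """Return (Na, Nb, Nc, Nd, Nt) for two equal-length 0/1 lists."""
    if len(a) != len(b):
        raise ValueError("bit-vectors must have the same length")
    na = sum(a)                                           # bits set to 1 in A
    nb = sum(b)                                           # bits set to 1 in B
    nc = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)  # 1 in both
    nd = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)  # 0 in both
    nt = na + nb - nc + nd                                # equals len(a)
    return na, nb, nc, nd, nt


def tanimoto(a, b):
    na, nb, nc, _, _ = bit_counts(a, b)
    return nc / (na + nb - nc)


def dice(a, b):
    na, nb, nc, _, _ = bit_counts(a, b)
    return 2 * nc / (na + nb)


def cosine(a, b):
    na, nb, nc, _, _ = bit_counts(a, b)
    return nc / sqrt(na * nb)


def hamann(a, b):
    na, nb, nc, nd, nt = bit_counts(a, b)
    return ((nc + nd) - (na - nc) - (nb - nc)) / nt


def forbes(a, b):
    na, nb, nc, _, nt = bit_counts(a, b)
    return (nt * nc) / (na * nb)


A = [1, 1, 0, 1, 0, 0, 1, 0]
B = [1, 0, 0, 1, 1, 0, 1, 0]

print(bit_counts(A, B))
print(tanimoto(A, B), dice(A, B), cosine(A, B), hamann(A, B), forbes(A, B))
```

Coefficients that use Nd (Hamann, Forbes via Nt) depend on the fingerprint length, while Tanimoto, Dice and Cosine depend only on the bits that are set.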

Fingerprints comparisons (continued)
Fingerprints bit-vectors:
Name | Formula
Matching | (Nc + Nd)/Nt
McConnaughey | (Nc**2 - (Na - Nc)*(Nb - Nc))/(Na*Nb)
Pearson | ((Nc*Nd) - ((Na - Nc)*(Nb - Nc)))/SQRT(Na*Nb*(Na - Nc + Nd)*(Nb - Nc + Nd))
Rogers Tanimoto | (Nc + Nd)/(Na + Nb - 2*Nc + Nt)
Russell Rao | Nc/Nt
Simpson | Nc/MIN(Na, Nb)
Sokal Sneath 1 | Nc/(2*Na + 2*Nb - 3*Nc)
Sokal Sneath 2 | (2*Nc + 2*Nd)/(Nc + Nd + Nt)
Sokal Sneath 3 | (Nc + Nd)/(Na + Nb - 2*Nc)
Tversky | Nc/(alpha*(Na - Nb) + Nb)
Yule | ((Nc*Nd) - ((Na - Nc)*(Nb - Nc)))/((Nc*Nd) + ((Na - Nc)*(Nb - Nc)))
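
The one-parameter Tversky form listed above, Nc/(alpha*(Na - Nb) + Nb), interpolates between familiar coefficients: alpha = 0.5 gives Dice, alpha = 1 gives Nc/Na and alpha = 0 gives Nc/Nb. The short Python check below works directly on the counts; the function names and the example counts are made up for illustration.

```python
def tversky(na, nb, nc, alpha):
    """Tversky similarity in the single-parameter form from the table."""
    return nc / (alpha * (na - nb) + nb)


def dice(na, nb, nc):
    return 2 * nc / (na + nb)


def matching(nc, nd, nt):
    return (nc + nd) / nt


def rogers_tanimoto(na, nb, nc, nd, nt):
    return (nc + nd) / (na + nb - 2 * nc + nt)


# Hypothetical counts for two bit-vectors A and B.
na, nb, nc, nd = 5, 3, 2, 4
nt = na + nb - nc + nd  # total number of bits

print(tversky(na, nb, nc, 0.5), dice(na, nb, nc))   # identical values
print(tversky(na, nb, nc, 1.0), nc / na)            # fraction of A shared with B
print(tversky(na, nb, nc, 0.0), nc / nb)            # fraction of B shared with A
print(matching(nc, nd, nt), rogers_tanimoto(na, nb, nc, nd, nt))
```

Because alpha weights A and B differently, Tversky is asymmetric for any alpha other than 0.5, which can be useful when A is a substructure-like query.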

Fingerprints comparisons
Fingerprints vectors containing ordered numerical, numerical or alphanumerical values:
Name | Algebraic Form | Binary Form
City Block, Hamming & Manhattan Distance | SUM(ABS(Xai - Xbi)) | Na + Nb - 2*Nc
Cosine & Ochiai Similarity | SUM(Xai*Xbi)/SQRT(SUM(Xai**2)*SUM(Xbi**2)) | Nc/SQRT(Na*Nb)
Czekanowski, Dice & Sorenson Similarity | (2*SUM(Xai*Xbi))/(SUM(Xai**2) + SUM(Xbi**2)) | 2*Nc/(Na + Nb)
Euclidean Distance | SQRT(SUM((Xai - Xbi)**2)) | SQRT(Na + Nb - 2*Nc)
Jaccard & Tanimoto Similarity | SUM(Xai*Xbi)/(SUM(Xai**2) + SUM(Xbi**2) - SUM(Xai*Xbi)) | Nc/(Na + Nb - Nc)
Soergel Distance | SUM(ABS(Xai - Xbi))/SUM(MAX(Xai, Xbi)) | (Na + Nb - 2*Nc)/(Na + Nb - Nc)

Na = Num of bits set to "1" in A = SUM(Xai)
Nb = Num of bits set to "1" in B = SUM(Xbi)
Nc = Num of bits set to "1" in both A and B = SUM(Xai*Xbi)
Nd = Num of bits set to "0" in both A and B = SUM(1 - Xai - Xbi + Xai*Xbi)
Xa = Values of vector A
Xai = Value of ith element in A
Xb = Values of vector B
Xbi = Value of ith element in B
SetIntersectionXaXb = SUM(MIN(Xai, Xbi))
SetDifferenceXaXb = SUM(Xai) + SUM(Xbi) - SUM(MIN(Xai, Xbi))
N = Num of values
SUM = Sum over values
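
For vectors that contain only 0 and 1, each algebraic form in the table collapses to the corresponding binary form, because Xai**2 = Xai and Xai*Xbi is 1 only where both bits are set. The Python sketch below computes both columns and compares them. The function names and dictionary keys are hypothetical, and zero denominators are not handled.

```python
from math import isclose, sqrt


def algebraic_forms(xa, xb):
    """Algebraic forms from the table, valid for any numerical vectors."""
    dot = sum(a * b for a, b in zip(xa, xb))
    sq_a = sum(a * a for a in xa)
    sq_b = sum(b * b for b in xb)
    abs_diff = sum(abs(a - b) for a, b in zip(xa, xb))
    return {
        "CityBlock": abs_diff,
        "Cosine": dot / sqrt(sq_a * sq_b),
        "Dice": 2 * dot / (sq_a + sq_b),
        "Euclidean": sqrt(sum((a - b) ** 2 for a, b in zip(xa, xb))),
        "Tanimoto": dot / (sq_a + sq_b - dot),
        "Soergel": abs_diff / sum(max(a, b) for a, b in zip(xa, xb)),
    }


def binary_forms(xa, xb):
    """Binary forms from the table, valid only for 0/1 vectors."""
    na = sum(xa)
    nb = sum(xb)
    nc = sum(a * b for a, b in zip(xa, xb))
    return {
        "CityBlock": na + nb - 2 * nc,
        "Cosine": nc / sqrt(na * nb),
        "Dice": 2 * nc / (na + nb),
        "Euclidean": sqrt(na + nb - 2 * nc),
        "Tanimoto": nc / (na + nb - nc),
        "Soergel": (na + nb - 2 * nc) / (na + nb - nc),
    }


xa = [1, 1, 0, 1, 0]
xb = [1, 0, 1, 1, 0]
algebraic = algebraic_forms(xa, xb)
binary = binary_forms(xa, xb)
for name in algebraic:
    print(name, algebraic[name], binary[name],
          isclose(algebraic[name], binary[name]))
```

The algebraic forms also accept count vectors such as `[2, 0, 1]`, where the binary forms no longer apply.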

Fingerprints comparisons (continued)
Fingerprints vectors containing ordered numerical, numerical or alphanumerical values:
Name | Set Theoretic Form
City Block, Hamming & Manhattan Distance | SUM(Xai) + SUM(Xbi) - 2*SUM(MIN(Xai, Xbi))
Cosine & Ochiai Similarity | SUM(MIN(Xai, Xbi))/SQRT(SUM(Xai)*SUM(Xbi))
Czekanowski, Dice & Sorenson Similarity | 2*SUM(MIN(Xai, Xbi))/(SUM(Xai) + SUM(Xbi))
Euclidean Distance | SQRT(SUM(Xai) + SUM(Xbi) - 2*SUM(MIN(Xai, Xbi)))
Jaccard & Tanimoto Similarity | SUM(MIN(Xai, Xbi))/(SUM(Xai) + SUM(Xbi) - SUM(MIN(Xai, Xbi)))
Soergel Distance | (SUM(Xai) + SUM(Xbi) - 2*SUM(MIN(Xai, Xbi)))/(SUM(Xai) + SUM(Xbi) - SUM(MIN(Xai, Xbi)))
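
The set theoretic forms treat each vector as a multiset, with SUM(MIN(Xai, Xbi)) playing the role of the intersection size. A minimal Python sketch for non-negative count vectors follows. The function name and dictionary keys are hypothetical, and the example vectors are invented.

```python
from math import isclose, sqrt


def set_theoretic_forms(xa, xb):
    """Set theoretic forms from the table for non-negative count vectors."""
    sum_a = sum(xa)
    sum_b = sum(xb)
    intersection = sum(min(a, b) for a, b in zip(xa, xb))  # SetIntersectionXaXb
    difference = sum_a + sum_b - intersection              # SetDifferenceXaXb
    return {
        "CityBlock": sum_a + sum_b - 2 * intersection,
        "Cosine": intersection / sqrt(sum_a * sum_b),
        "Dice": 2 * intersection / (sum_a + sum_b),
        "Euclidean": sqrt(sum_a + sum_b - 2 * intersection),
        "Tanimoto": intersection / difference,
        "Soergel": (sum_a + sum_b - 2 * intersection) / difference,
    }


# Two small count vectors, e.g. occurrence counts of three atom-pair keys.
forms = set_theoretic_forms([2, 0, 1], [1, 1, 1])
for name, value in forms.items():
    print(name, value)

# In this form the Soergel distance is the complement of Tanimoto similarity.
print(isclose(forms["Soergel"], 1 - forms["Tanimoto"]))
```

The complement relation follows directly from the two formulas, since both share the denominator SUM(Xai) + SUM(Xbi) - SUM(MIN(Xai, Xbi)).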

Similarity matrices
Workflow: Fingerprints (SD, FP, CSV/TSV) -> SimilarityMatricesFingerprints.pl -> Similarity matrix: full, upper or lower (CSV/TSV)

Similarity matrices
Scripts used: ExtendedConnectivityFingerprints.pl, SimilarityMatricesFingerprints.pl, TextFilesToHTML.pl
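
The difference between a full, upper and lower similarity matrix can be sketched in a few lines of Python. This is an illustration of the idea only, not the output format of SimilarityMatricesFingerprints.pl: fingerprints are represented as sets of on-bit positions, Tanimoto is used as the coefficient, and the function names, compound IDs and CSV layout are all assumptions.

```python
import csv
import io


def tanimoto(a, b):
    """Tanimoto similarity for fingerprints stored as sets of on-bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0  # assumption: empty vs empty -> 0


def similarity_matrix(fingerprints, mode="full"):
    """Build a table of similarity values; unused triangle cells are left empty."""
    if mode not in ("full", "upper", "lower"):
        raise ValueError(f"unknown mode: {mode}")
    ids = list(fingerprints)
    table = [["ID"] + ids]
    for i, row_id in enumerate(ids):
        row = [row_id]
        for j, col_id in enumerate(ids):
            keep = (mode == "full"
                    or (mode == "upper" and j >= i)
                    or (mode == "lower" and j <= i))
            value = tanimoto(fingerprints[row_id], fingerprints[col_id])
            row.append(f"{value:.2f}" if keep else "")
        table.append(row)
    return table


def to_csv(table):
    """Serialize the table as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(table)
    return buffer.getvalue()


# Hypothetical fingerprints for three compounds.
fps = {"m1": {1, 2, 3}, "m2": {2, 3, 4}, "m3": {7}}
print(to_csv(similarity_matrix(fps, "upper")))
```

Since the coefficient is symmetric, the upper or lower triangle carries the same information as the full matrix in roughly half the space.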

Similarity searching
Workflow: Reference fingerprints + Database fingerprints (SD, FP, CSV/TSV) -> SimilaritySearchingFingerprints.pl -> Neighbors of reference compounds

Similarity searching
Scripts used: PathLengthFingerprints.pl, SimilaritySearchingFingerprints.pl, SDFilesToHTML.pl
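
Finding the neighbors of a reference compound amounts to scoring every database fingerprint against the reference fingerprint and keeping the best hits. The Python sketch below shows that idea with Tanimoto similarity on sets of on-bit positions. The function names, the `k` and `cutoff` parameters and the toy database are assumptions for illustration, not options of SimilaritySearchingFingerprints.pl.

```python
def tanimoto(a, b):
    """Tanimoto similarity for fingerprints stored as sets of on-bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0  # assumption: empty vs empty -> 0


def find_neighbors(reference, database, k=2, cutoff=0.0):
    """Return up to k (name, similarity) pairs with similarity >= cutoff."""
    scored = [(tanimoto(reference, fingerprint), name)
              for name, fingerprint in database.items()]
    hits = [(score, name) for score, name in scored if score >= cutoff]
    # Highest similarity first; ties broken by name for a stable order.
    hits.sort(key=lambda item: (-item[0], item[1]))
    return [(name, score) for score, name in hits[:k]]


# Hypothetical reference and database fingerprints.
reference = {1, 2, 3, 4}
database = {
    "a": {1, 2, 3, 4},
    "b": {1, 2, 3},
    "c": {1, 9},
    "d": {8, 9},
}

print(find_neighbors(reference, database, k=2))
print(find_neighbors(reference, database, k=10, cutoff=0.1))
```

Combining a hit count limit with a similarity cutoff keeps result lists short for dense databases while still excluding weak matches in sparse ones.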

File data info, manipulation & analysis
Input files | Operations | Output files
SD | Analyze, Extract, Filter, Info, Join, Merge, Modify, ToHTML, ToMOL, Sort, Split | SD, CSV/TSV text or HTML
CSV/TSV text | Analyze, Extract, Info, Join, Merge, Modify, Sort, Split, ToHTML, ToSD | CSV/TSV text or HTML
Sequence & alignment | Analyze, Extract, Info | Sequence & alignment
PDB | Extract, Info, Modify | PDB
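
Operations such as Split, Extract and Filter on SD files rest on two features of the format: records end with a `$$$$` line, and data fields appear as a `> <FieldName>` header followed by value lines and a blank line. The Python sketch below parses those two features and filters records on a field value. It is not a MayaChemTools script: the function names are hypothetical, and the sample records carry empty molecule blocks (no atoms or bonds) to keep the example short.

```python
def split_sd_records(text):
    """Split SD file text into records on the '$$$$' delimiter line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):  # trailing record without delimiter
        records.append("\n".join(current))
    return records


def data_fields(record):
    """Collect '> <Name>' data fields of one record into a dict."""
    fields = {}
    lines = record.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith(">") and "<" in line and ">" in line[1:]:
            name = line[line.index("<") + 1:line.rindex(">")]
            i += 1
            values = []
            while i < len(lines) and lines[i].strip():
                values.append(lines[i])
                i += 1
            fields[name] = "\n".join(values)
        else:
            i += 1
    return fields


# Two records with empty molecule blocks; only the data fields matter here.
SAMPLE = "\n".join([
    "ethanol", "", "", "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END",
    "> <MolWt>", "46.07", "", "$$$$",
    "benzene", "", "", "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END",
    "> <MolWt>", "78.11", "", "$$$$", "",
])

records = split_sd_records(SAMPLE)
light = [r.splitlines()[0] for r in records
         if float(data_fields(r)["MolWt"]) < 50.0]
print(len(records), light)
```

The same record and field split is enough to convert SD data fields to CSV/TSV rows, since each record becomes one row and each field name one column.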

Data retrieval from databases
Workflow: database (via Perl DBI) -> DBSQLToTextFiles.pl, DBSchemaTablesToTextFiles.pl, DBTablesToTextFiles.pl -> CSV/TSV text files
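
The database scripts export query or table contents to CSV/TSV text through Perl DBI. As a rough analogue of that flow, the Python sketch below runs a SQL statement and writes the result set as delimited text. It swaps in Python's standard `sqlite3` and `csv` modules for Perl DBI, and the `compounds` table, its rows and the function name are invented for the example.

```python
import csv
import io
import sqlite3


def query_to_text(connection, sql, delimiter=","):
    """Run a SQL query and return the rows as delimited text with a header."""
    cursor = connection.execute(sql)
    header = [column[0] for column in cursor.description]
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(cursor)  # each fetched row is a tuple of column values
    return buffer.getvalue()


# In-memory database with a hypothetical table, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compounds (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO compounds VALUES (?, ?)",
                 [(1, "ethanol"), (2, "benzene")])

csv_text = query_to_text(conn, "SELECT id, name FROM compounds ORDER BY id")
tsv_text = query_to_text(conn, "SELECT id, name FROM compounds ORDER BY id",
                         delimiter="\t")
print(csv_text)
```

Taking the column names from the cursor keeps the export generic, so the same function serves an arbitrary SQL statement or a whole-table dump.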

Information for periodic table elements
Script: InfoPeriodicTableElements.pl
Input: Name, symbol, number, group name/number, group label, period number
Example output: Atomic number: 6; Element symbol: C; Element name: Carbon; Atomic weight: … … …

Information for amino acids
Script: InfoAminoAcids.pl
Input: One letter code, three letter code, name
Example output: Three letter code: Glu; One letter code: E; Name: Glutamic acid; Molecular weight: …

Information for nucleic acids
Script: InfoNucleicAcids.pl
Input: Code, name, type
Example output: Code: Ado; Other codes: A; Name: Adenosine; Type: Nucleoside; Molecular weight: … …

Your feedback is welcome:

The End