Validation and Standardization of Molecular Structures in General and Sugars in Particular: a Case Study Colin Batchelor, Ken Karapetyan, Valery Tkachenko,

Presentation transcript:

Validation and Standardization of Molecular Structures in General and Sugars in Particular: a Case Study. Colin Batchelor, Ken Karapetyan, Valery Tkachenko, Antony Williams. 6th Joint Sheffield Conference on Chemoinformatics.

Overview: Open PHACTS and chemical validation and standardization; RDF for chemoinformatics calculations; general case study: ChEMBL and DrugBank; sugar case study: perspective perception.

Who is involved? 28 consortium members; >45 associated partners. 3-year European project funded by: European Pharmaceutical Industry, Innovative Medicines Initiative. Open PHACTS API. Applications using the Open PHACTS API. dev.openphacts.org. Explorer.

How do we fit in? We integrate and standardize the chemical compound collection underpinning Open PHACTS and provide regular updates and ongoing data curation. The validation and standardization rules have been derived from the FDA structure guidelines and adapted for consistency and in response to input from members of EFPIA.

"Open PHACTS provides an integrated platform of publicly available pharmacological and physicochemical data." Data accessible via: free application programming interface (API) at dev.openphacts.org; third-party applications built to use the API; the Open PHACTS app ecosystem.

How does Open PHACTS work?

Currently integrated databases
Database                  Millions of triples
ACD Labs / ChemSpider     161.3
ChEBI                     0.9
ChEMBL                    146.1
ConceptWiki               3.7
DrugBank                  0.5
Enzyme                    0.1
Gene Ontology             0.9
SwissProt                 156.6
WikiPathways              0.1
TOTAL                     470.2

CVSP and the OPS CRS. Standardization workflows (CVSP, FDA, OPS, custom) using modules such as: SMIRKS transformations; layout (GGA); canonical tautomers (ChemAxon); sugar interpretation (RSC).

Overview: Open PHACTS and chemical validation and standardization; RDF for chemoinformatics calculations; general case study: ChEMBL and DrugBank; sugar case study: perspective perception.

RDF and Open PHACTS The underlying language of Open PHACTS is RDF. There are few constraints as such, only guidelines for which classes of identifier to use and accounts of best practice. This RDF goes into the data cache and we access the results through user interfaces built on RESTful JSON web services.

What does RDF look like? In the Turtle format below, each line is a triple, in which a binary predicate links a subject and an object.
:CSID1execution obo:OBO_ :CSID1prop11.
:CSID1prop11 obo:IAO_ ops:OPS1.
:CSID1prop11 rdf:type cheminf:CHEMINF_
:CSID1prop11 qudt:numericValue "1.049E-17"^^xsd:double.
:CSID1prop11 qudt:unit obo:UO_
There is also RDF/XML, which is less human-readable.

Royal Society of Chemistry data in Open PHACTS: 1. Molecule synonyms and identifiers. 2. Linksets between ChEBI, ChEMBL, DrugBank and OPS identifiers. 3. Molecule–molecule relations ("parent–child") of interest for drug discovery. 4. Calculated physicochemical properties for compounds (both molecular and macroscopic).

Calculated physicochemical properties (ACD 12.0): log P; log D (at pH 5.5, at pH 7.4); bioconcentration factor; K_OC (at pH 5.5, at pH 7.4); index of refraction; polar surface area; molar refractivity; molar volume; polarizability; surface tension; density at STP; boiling point at 1 atm; flash point at 1 atm; enthalpy of vaporization at STP; vapour pressure at STP.

RDF for calculated properties: vocabularies. Two dozen calculated properties for each of >10^6 molecules. CHEMINF ontology for kinds of calculation and chemical data; QUDT for results; OPS IDs for molecules; OBI and IAO to connect calculations to results.

RDF for calculated properties: schema. [Diagram. Nodes: benzene's connection table (rdf:type CHEMINF connection table); OPS benzene; calculation process (rdf:type CHEMINF execution of ACD/Labs PhysChem software library version); calculation result (rdf:type CHEMINF calculated log P); QUDT dimensionless quantity. Edge labels: IAO is about; OBI has specified input; OBI has specified output; QUDT has value; QUDT has standard uncertainty; QUDT has unit. Literals: "2.17"^^xsd:float, "0.234"^^xsd:float.]
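One plausible reading of the schema above can be sketched as a handful of triples in plain Python. The `ex:` names are hypothetical placeholders, not real CHEMINF, OBI, IAO or QUDT identifiers, and the assignment of 2.17 as the value and 0.234 as the standard uncertainty is an assumption about the diagram.

```python
# Hypothetical sketch: every ex: name is a placeholder, not a real ontology term.

def literal(value, datatype):
    """Format a typed literal in Turtle syntax."""
    return f'"{value}"^^{datatype}'


def calculated_property_triples(molecule, prop_class, value, uncertainty):
    """Link a molecule, a calculation process and its result, following the
    edge labels on the slide (is about, has specified input/output, has value,
    has standard uncertainty)."""
    calc = f"{molecule}_calculation"
    result = f"{molecule}_result"
    table = f"{molecule}_connection_table"
    return [
        (calc, "rdf:type", "ex:SoftwareExecution"),
        (calc, "ex:hasSpecifiedInput", table),
        (calc, "ex:hasSpecifiedOutput", result),
        (result, "rdf:type", prop_class),
        (result, "ex:isAbout", molecule),
        (result, "ex:hasValue", literal(value, "xsd:float")),
        (result, "ex:hasStandardUncertainty", literal(uncertainty, "xsd:float")),
    ]


def to_turtle(triples):
    """Serialize triples one per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = calculated_property_triples("ex:benzene", "ex:CalculatedLogP", "2.17", "0.234")
turtle = to_turtle(triples)
print(turtle)
```

Keeping the calculation process and its result as separate nodes is what lets the same molecule carry many calculated properties, each traceable to the software run that produced it.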

Overview: Open PHACTS and chemical validation and standardization; RDF for chemoinformatics calculations; general case study: ChEMBL and DrugBank; sugar case study: perspective perception.

ChEMBL and DrugBank analysed. Taking ChEMBL 16 ( which contains distinct molecules, CVSP found something to say about of them (35%). DrugBank 3.0 ( contains 6510 distinct molecules, and CVSP found something to say about 662 of them (10%). (We haven't done all of ChemSpider yet; we will.)

Potentially serious things
ChEMBL n  ChEMBL %  DrugBank n  DrugBank %  Issue
[...]     [...]     [...]       [...]       Not an overall neutral system
[...]     [...]     21          0.32%       Forbidden-valence atoms
44        -         0           -           Has adjacent atoms with like charges
4         -         0           -           Has more than one radical centre
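Two of the checks in this table, overall neutrality and adjacent like charges, are simple enough to sketch on a toy molecular graph. The data layout (a dict of formal charges keyed by atom index plus a list of bonded pairs) and the function names are assumptions for illustration, not CVSP's actual representation.

```python
# Hypothetical sketch: atoms are integer indices, charges are formal charges.

def total_charge(charges):
    """Sum of formal charges; non-zero means 'not an overall neutral system'."""
    return sum(charges.values())


def like_charged_neighbours(charges, bonds):
    """Return bonds whose two atoms carry non-zero charges of the same sign."""
    return [(a, b) for a, b in bonds if charges[a] * charges[b] > 0]


# Charge-separated nitro group on a carbon: C(0)-N+(1)(-O-(2))=O(3).
nitro_charges = {0: 0, 1: 1, 2: -1, 3: 0}
nitro_bonds = [(0, 1), (1, 2), (1, 3)]

# A deliberately bad fragment: two bonded cations.
bad_charges = {0: 1, 1: 1}
bad_bonds = [(0, 1)]

print(total_charge(nitro_charges), like_charged_neighbours(nitro_charges, nitro_bonds))
print(total_charge(bad_charges), like_charged_neighbours(bad_charges, bad_bonds))
```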

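Aesthetics
ChEMBL n  ChEMBL %  DrugBank n  DrugBank %  Issue
[...]     [...]     [...]       [...]       Uneven-length bonds
[...]     [...]     [...]       [...]       Congested layout
[...]     [...]     [...]       [...]       Containing not-quite-linear cyano groups
[...]     [...]     1           -           Zero-dimensional structures
[...]     [...]     0           -           Containing not-quite-linear isocyano groups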
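Several of these aesthetic checks reduce to elementary 2D geometry. The sketch below shows one way they might be expressed; the tolerance of 1 degree, the bond-length ratio of 1.5 and all function names are assumptions, since the slide does not state the thresholds CVSP uses.

```python
import math


def angle_degrees(a, b, c):
    """Angle a-b-c in degrees, measured at the middle atom b."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def not_quite_linear(a, b, c, tolerance=1.0):
    """True if a group that should be drawn at 180 degrees deviates by more
    than the (assumed) tolerance, e.g. the C-C-N angle of a cyano group."""
    return abs(180.0 - angle_degrees(a, b, c)) > tolerance


def uneven_bonds(lengths, max_ratio=1.5):
    """True if the longest drawn bond exceeds the shortest by the assumed ratio."""
    return max(lengths) / min(lengths) > max_ratio


def zero_dimensional(coords):
    """True if a multi-atom structure has every atom at the same position."""
    return len(coords) > 1 and len(set(coords)) == 1


print(not_quite_linear((0, 0), (1, 0), (2, 0.1)))
```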

Artwork molecules
ChEMBL  DrugBank  Issue
0       0         Cyclobutane
8       0         Ethane molecules in the structure
6       0         Sulfur atoms with no explicit bonds
4       0         Boron atoms with no explicit bonds
1       0         Ethyne molecule (in the ChEMBL case it actually is acetylene)
3       0         Stray methane molecules

FDA tautomer and metal rules
ChEMBL n  ChEMBL %  DrugBank n  DrugBank %  Issue
[...]     [...]     80          1.29%       In enol form (or chalcogenoenol form)
[...]     [...]     4           0.07%       N=C–OH tautomer of a carbonyl compound
2         -         1           -           Nitroso-form oximes
[...]     [...]     6           [...]       Metal–nitrogen bond
[...]     [...]     10          0.15%       Non-metal–transition-metal bond
[...]     [...]     10          0.15%       Metal–oxygen bond
3         -         2           -           Aluminium–non-metal bond
2         -         0           -           Metal–fluorine bond

Stereochemistry
ChEMBL n  ChEMBL %  DrugBank n  DrugBank %  Issue
[...]     [...]     39          0.60%       G2-4: Has a single unknown stereocentre and no defined stereocentres: probably a racemate
[...]     [...]     13          0.20%       G2-42: Has more than one unknown stereocentre and no defined stereocentres: probably problematic. Could indicate relative stereochemistry?
[...]     [...]     27          0.44%       G2-44: At least one defined stereocentre, and one stereocentre is undefined or unknown: probably an epimer or mixture of anomers
[...]     [...]     11          0.17%       G2-46: Has more than one unknown stereocentre and more than one defined stereocentre: probably problematic again
[...]     [...]     13          0.20%       Unknown double bond arrangement
[...]     [...]     1           -           At least one ring containing stereobonds
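The stereocentre rules in this table depend only on two counts, so they can be sketched as a small classifier. This is an illustrative reading of the slide, not CVSP's implementation: the function name and return strings are assumptions, and combinations the slide does not mention (for example one defined and several unknown stereocentres) are returned as not covered rather than guessed.

```python
def classify_stereo(n_defined, n_unknown):
    """Map counts of defined and unknown stereocentres to the slide's verdicts."""
    if n_unknown == 0:
        return "no unknown stereocentres"
    if n_defined == 0:
        if n_unknown == 1:
            return "probably a racemate"
        return "probably problematic (relative stereochemistry?)"  # G2-42
    if n_unknown == 1:
        return "probably an epimer or mixture of anomers"  # G2-44
    if n_defined > 1:
        return "probably problematic"  # G2-46
    return "not covered by the rules on the slide"


for counts in [(0, 1), (0, 3), (4, 1), (2, 2), (1, 2)]:
    print(counts, classify_stereo(*counts))
```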

Overview: Open PHACTS and chemical validation and standardization; RDF for chemoinformatics calculations; general case study: ChEMBL; sugar case study: perspective perception.

Sugar depiction challenges: stereochemistry is not stored in V2000 format (though it is present in .cdx).

Consequences

Sugar questions: ChEMBL (19275), DrugBank (153)
ChEMBL n  ChEMBL %  DrugBank n  DrugBank %  Issue
[...]     [...]     [...]       [...]       At least one L-pyranose ring (often antibiotics contain these)
[...]     [...]     0           -           At least one perspective chair
[...]     [...]     0           -           At least one Haworth ring
[...]     [...]     0           -           At least one perspective boat or twist boat

Sugar ring redepiction algorithm: 1. Identify perspective conformation (boat, chair, Haworth). 2. Determine perspective stereo. 3. Assign wedge or hash to bonds accordingly. 4. Reconstruct sugar ring so as to minimize disruption to the rest of molecule. 5. Tidy.

Take the x-axis as parallel to the line through the top two chair atoms or through the bottom two chair atoms. Δy positive: wedge. Δy negative: hash. Then remap chair to homotropous hexagon.
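The chair rule above amounts to measuring each substituent bond's Δy in a frame whose x-axis runs along the two reference chair atoms. A minimal sketch follows, assuming 2D coordinates with y increasing up the page; the function name, the `tolerance` parameter and the "plain" fallback for bonds lying along the axis are assumptions added for illustration.

```python
import math


def chair_bond_style(ring_atom, substituent, ref_a, ref_b, tolerance=1e-6):
    """Return 'wedge', 'hash' or 'plain' for the bond ring_atom -> substituent.

    ref_a and ref_b are the top two (or bottom two) chair atoms that define
    the local x-axis. Coordinates are (x, y) with y increasing up the page.
    """
    ax, ay = ref_b[0] - ref_a[0], ref_b[1] - ref_a[1]
    norm = math.hypot(ax, ay)
    if norm == 0:
        raise ValueError("reference atoms coincide")
    # Orient the axis left to right so that 'up' is unambiguous.
    if ax < 0 or (ax == 0 and ay < 0):
        ax, ay = -ax, -ay
    bx, by = substituent[0] - ring_atom[0], substituent[1] - ring_atom[1]
    # Component of the bond along the rotated y-axis (-ay, ax) / norm.
    dy = (ax * by - ay * bx) / norm
    if dy > tolerance:
        return "wedge"
    if dy < -tolerance:
        return "hash"
    return "plain"


print(chair_bond_style((0, 0), (0, 1), (0, 0), (1, 0)))
```

Working in the rotated frame rather than with raw page coordinates means a chair drawn at a tilt is treated the same as one drawn level.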

In the boat case, the substituent further up the page is the wedge, while the one further down the page is the hash, regardless of whether bridgehead or not.
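The boat rule compares only the page heights of the two substituents. A minimal sketch, assuming (x, y) coordinates with y increasing up the page; the function name is hypothetical, and raising an error for level substituents is an assumption, since the slide does not say what happens in that case.

```python
def boat_bond_styles(first, second):
    """Return the bond styles for two substituents given their (x, y) positions.

    The one further up the page gets the wedge, the other the hash,
    regardless of whether the ring atom is a bridgehead.
    """
    if first[1] == second[1]:
        raise ValueError("substituents are level; the rule does not decide")
    return ("wedge", "hash") if first[1] > second[1] else ("hash", "wedge")


print(boat_bond_styles((1.0, 2.0), (1.5, 0.5)))
```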

Depiction: 1. Identify mean bond length and chair centroid. 2. Snap ring atoms to a regular-hexagonal grid. 3. Remove superfluous hydrogen atoms. 4. Only mark stereo on a single substituent if they are paired (cf. Grice).
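The first two depiction steps can be sketched as follows. This is one interpretation, not the RSC code: "snapping to a regular-hexagonal grid" is taken here to mean replacing the six ring atoms with a regular hexagon centred on the chair centroid whose side equals the mean bond length. Keeping the first atom's direction and the ring's winding order is an added assumption aimed at minimizing disruption.

```python
import math


def mean_bond_length(ring):
    """Average distance between consecutive ring atoms (ring is closed)."""
    n = len(ring)
    return sum(math.dist(ring[i], ring[(i + 1) % n]) for i in range(n)) / n


def centroid(ring):
    n = len(ring)
    return (sum(x for x, _ in ring) / n, sum(y for _, y in ring) / n)


def snap_to_hexagon(ring):
    """Return regular-hexagon coordinates for a six-membered ring."""
    if len(ring) != 6:
        raise ValueError("expected a six-membered ring")
    cx, cy = centroid(ring)
    radius = mean_bond_length(ring)  # circumradius equals side length
    # Start in the direction of the first atom as seen from the centroid.
    start = math.atan2(ring[0][1] - cy, ring[0][0] - cx)
    # Twice the signed area: positive for anticlockwise atom order.
    area2 = sum(
        ring[i][0] * ring[(i + 1) % 6][1] - ring[(i + 1) % 6][0] * ring[i][1]
        for i in range(6)
    )
    step = math.copysign(math.pi / 3, area2)
    return [
        (cx + radius * math.cos(start + i * step),
         cy + radius * math.sin(start + i * step))
        for i in range(6)
    ]


# A chair-like perspective drawing, atoms listed anticlockwise.
chair = [(0.0, 0.0), (1.0, -0.4), (2.0, 0.0), (3.0, 1.0), (2.0, 1.4), (1.0, 1.0)]
hexagon = snap_to_hexagon(chair)
print(hexagon)
```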

Tidying: desiderata. This is a different problem from structure layout in general: the structure we end up with is, in many important respects, fine. Preserve drawing conventions, such as aglycones being on the top right-hand side.

Next steps Stable user-facing URI for CVSP (currently but subject to change) Apply CVSP to all of ChemSpider. Investigate fused rings.

Acknowledgements. In particular: Jon Steele (RSC), David Sharpe (RSC), John Blunt (Canterbury, NZ).

Any