The collection, curation and modeling of Open Melting Point measurements. August 26, 2011. 5th Meeting on U.S. Government Chemical Databases and Open Chemistry.

Presentation transcript:

The collection, curation and modeling of Open Melting Point measurements. August 26, 2011, 5th Meeting on U.S. Government Chemical Databases and Open Chemistry.
Jean-Claude Bradley, Department of Chemistry, Drexel University
Andrew Lang, Department of Mathematics, Oral Roberts University
Antony Williams, ChemSpider, Royal Society of Chemistry

The Problem of Data Quality in Chemistry: lack of provenance, and reliance on a system of “trusted sources” (CRC Handbook, Merck Index, chemical vendor catalogs such as Sigma-Aldrich, peer-reviewed journals). In the case of melting points:

Strategy for the curation of melting points: using technology, we can begin to replace the “trusted source” model with one based on transparency and provenance.
1. Rely on redundancy when possible.
2. Provide the maximum level of provenance when necessary (Open Notebook Science).
3. Adhere to Open Data, Open Descriptors and Open Algorithms for measurements and modeling.
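The redundancy and provenance points in this strategy can be illustrated with a minimal sketch. It assumes each measurement is a `(compound_id, source, mp_celsius)` tuple; the function name, the record layout, and the 5 degree agreement window are all hypothetical choices for illustration, not values taken from the slides.

```python
from collections import defaultdict
from statistics import median


def curate_by_redundancy(measurements, tolerance_c=5.0):
    """Group melting points (deg C) by compound and flag disagreement.

    measurements: iterable of (compound_id, source, mp_celsius) tuples.
    tolerance_c: hypothetical agreement window, not taken from the slides.
    """
    by_compound = defaultdict(list)
    for compound_id, source, mp in measurements:
        by_compound[compound_id].append((source, mp))

    report = {}
    for compound_id, rows in by_compound.items():
        values = [mp for _, mp in rows]
        sources = {source for source, _ in rows}
        if len(sources) < 2:
            status = "single source"   # no redundancy to rely on
        elif max(values) - min(values) <= tolerance_c:
            status = "consistent"      # independent sources agree
        else:
            status = "needs review"    # sources disagree
        report[compound_id] = {
            "status": status,
            "median_c": median(values),
            "provenance": sorted(sources),  # keep every source visible
        }
    return report
```

Keeping the list of sources next to each aggregated value is the point of the sketch: a reader can see where a number came from rather than trusting a single reference.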

The Chemical Information Validation Sheet: 567 curated and referenced measurements from the Fall 2010 Chemical Information Retrieval course

Investigating the m.p. inconsistencies of EGCG

Most popular data sources

Alfa Aesar donates melting points to the public

Open Melting Point Explorer

Outliers: MDPI dataset and EPA/PhysProp (which also donated all its data to the public)

Outliers for ethanol: Alfa Aesar and Oxford MSDS

Inconsistencies and SMILES problems within MDPI dataset

MDPI Dataset labeled with High Trust Level

EPA/PHYSPROP Structure Errors (Incorrect Valence): 2315 out of [total missing] structures contained pentavalent nitrogens
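A valence check of this kind can be sketched without a cheminformatics toolkit. The example below assumes a hypothetical, simplified structure representation (a list of atoms with formal charges and a list of bonds with integer orders) and flags neutral nitrogens whose bond orders sum to five. It is an illustration only, not the procedure used on the EPA/PHYSPROP file.

```python
def find_pentavalent_nitrogens(atoms, bonds):
    """Return indices of neutral nitrogen atoms whose bond orders sum to 5.

    atoms: list of (element_symbol, formal_charge) tuples.
    bonds: list of (atom_index_a, atom_index_b, bond_order) tuples.
    Both layouts are hypothetical and chosen only for this sketch.
    """
    valence = [0] * len(atoms)
    for a, b, order in bonds:
        valence[a] += order
        valence[b] += order
    return [
        i
        for i, (element, charge) in enumerate(atoms)
        if element == "N" and charge == 0 and valence[i] == 5
    ]


# Nitromethane heavy atoms drawn with two N=O double bonds (pentavalent N).
atoms_bad = [("C", 0), ("N", 0), ("O", 0), ("O", 0)]
bonds_bad = [(0, 1, 1), (1, 2, 2), (1, 3, 2)]

# The same group drawn charge-separated: N+ with one N=O and one N-O(-).
atoms_ok = [("C", 0), ("N", 1), ("O", 0), ("O", -1)]
bonds_ok = [(0, 1, 1), (1, 2, 2), (1, 3, 1)]
```

Here `find_pentavalent_nitrogens(atoms_bad, bonds_bad)` flags atom 1, while the charge-separated drawing passes.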

EPA/PHYSPROP Errors: Structure displayed is for the neutral compound dopamine but the associated CAS Number and chemical name in the file are for the hydrobromide salt.

Common errors in datasets:
1. Multiple melting points for the same compound in the same database
2. Stereochemistry issues
3. Sign inversion
4. Conversion errors (Kelvin/Celsius, Fahrenheit/Celsius)
5. Bad SMILES (non-rendering)
6. Salts associated with SMILES for the free base
7. Boiling point used in place of melting point
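Two of the errors listed above, sign inversion and unit conversion, leave a recognizable numeric signature when a suspect value is compared with a second measurement of the same compound. The sketch below shows one way to test for them; the function name and the 1 degree tolerance are assumptions made for illustration.

```python
def classify_discrepancy(value_c, reference_c, tol=1.0):
    """Guess why value_c disagrees with reference_c.

    Both numbers are nominally in degrees Celsius. tol is a hypothetical
    tolerance in degrees, chosen only for this sketch.
    """
    if abs(value_c - reference_c) <= tol:
        return "agrees"
    if abs(value_c + reference_c) <= tol:
        return "sign inversion"
    if abs(value_c - (reference_c + 273.15)) <= tol:
        return "Kelvin reported as Celsius"
    if abs(value_c - (reference_c * 9 / 5 + 32)) <= tol:
        return "Fahrenheit reported as Celsius"
    return "unexplained"
```

For example, `classify_discrepancy(122.0, 50.0)` reports a Fahrenheit value entered as Celsius. A match is only a hint for a curator to follow up, since unrelated errors can land on the same number by chance.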

Open melting point datasets
Double+ validated: 2706 compounds (7413 highly curated measurements; range: [values missing] C). Compounds that had at least one chiral center, possessed cis/trans isomerism, or were inorganic or a salt were removed.
Entire dataset: [count missing] unique compounds (27684 measurements; no inorganics or salts).
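The structural exclusions described for the double+ validated set can be approximated at the text level of a SMILES string. This is a rough sketch under stated assumptions: it only sees stereochemistry that is explicitly annotated (`@`, `/`, `\`) and treats any multi-component SMILES (`.`) as a salt or mixture. Detecting unannotated chiral centers or inorganic compounds would need a real toolkit such as the CDK, and the function name here is hypothetical.

```python
def passes_structure_filter(smiles):
    """Heuristic text-level filter on a SMILES string.

    Rejects records with annotated stereocentres ('@'), annotated
    double-bond geometry ('/' or backslash), or multiple components ('.').
    It does not detect unannotated stereocentres or inorganic compounds.
    """
    has_stereocentre = "@" in smiles
    has_cis_trans = "/" in smiles or "\\" in smiles
    is_multicomponent = "." in smiles
    return not (has_stereocentre or has_cis_trans or is_multicomponent)
```

With this filter, `"CCO"` passes, while `"C[C@H](N)C(=O)O"`, `"F/C=C/F"` and `"[Na+].[Cl-]"` are rejected.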

Open Models with Open Data Using Open Descriptors (CDK)

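Modeling Results (table; numeric values not captured in the transcript)
Columns: Model, Training set, Test set (TS), Descriptors, TS AAE, TS RMSE, TS R
Descriptors column residue: D, D/3D, D, D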
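The test-set columns in this table are standard regression error measures. As a reference, the sketch below computes average absolute error (AAE) and root-mean-square error (RMSE) from paired observed and predicted values. It also returns the coefficient of determination, on the assumption that the last column reports a fit statistic of that kind; the transcript does not confirm which one.

```python
from math import sqrt


def regression_metrics(observed, predicted):
    """Return AAE, RMSE and R2 for paired observed/predicted values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    aae = sum(abs(e) for e in errors) / n          # average absolute error
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    rmse = sqrt(ss_res / n)                        # root-mean-square error
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1 - ss_res / ss_tot                       # assumed fit statistic
    return {"AAE": aae, "RMSE": rmse, "R2": r2}
```

RMSE penalizes large misses more heavily than AAE, so a gap between the two on a test set usually points to a few badly predicted compounds rather than uniform error.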

Melting point prediction service

Melting point predictions and measurements on iPhone/iPad (Alex Clark)

Publication of double+ validated melting point dataset to Nature Precedings and LuLu

For all Formats of ONS Projects

Open Melting Point Datasets Currently 20,000 compounds with Open MPs

Some melting points can’t be resolved from the literature alone: 4-benzyltoluene

Motivation: Faster Science, Better Science

Open Lab Notebook page measuring the melting point of 4-benzyltoluene

Using melting point for temperature dependent solubility prediction

Crowdsourcing Solubility Data

Integration of Multiple Web Services to Recommend Solvents for Reactions

All ONS web services

Google Apps Scripts web services

Google Apps Scripts for conveniently exploring melting point data

Straight chain carboxylic acids from 1 to 10 carbons; straight chain alcohols from 1 to 10 carbons. Comparison of model with triple validated measurements.

Cyclic primary amines from 3 to 6 carbons (cyclobutylamine flagged for validation – only single source available)

Google Apps Scripts for planning reactions and creating schemes

Open Melting Points in Supplementary Data Pages of Wikipedia (Martin Walker)

Conclusions: For science to progress quickly, there is great benefit in moving away from a “trusted source” model to one based on transparency and data provenance. Open Notebook Science offers an efficient way to make research transparent and discoverable.