Delivering an online service for validating and standardizing chemical structure files using the ChemSpider platform.

Overview
Introduction
  – Why do we need to validate/standardise data?
  – Examples of problems in general
  – Examples of problems in ChemSpider
  – Why InChI is not enough
  – FDA rules

What are we trying to achieve?
Everyone wants high-quality data
The ChemSpider team is building a reputation on data quality
Many data sources have errors
We need to identify:
  – Errors
  – Inconsistencies
  – Data duplication/inappropriate separation of data
This requires a process of validation and standardization

What do we mean by Validation and Standardisation?
Validated
  – Check for hypervalency, charge balance, missing stereo
  – Name-structure relationships, etc.
Standardized
  – Use standard rules to “standardize” compounds: nitro groups, O-metal bonds, tautomers, etc.
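
The validation checks named on this slide can be illustrated with a minimal Python sketch. It is not the CVSP implementation: the `Atom` class, the `validate` function, and the simplified `MAX_VALENCE` table are all assumptions made for illustration. It assumes explicit hydrogens, only tests neutral atoms for hypervalency, and does not cover stereochemistry or name-structure checks.

```python
from dataclasses import dataclass

# Simplified maximum valences for neutral atoms (an illustrative assumption,
# not the table used by any real validation platform).
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "F": 1, "Cl": 1}


@dataclass
class Atom:
    element: str
    charge: int = 0


def validate(atoms, bonds):
    """Return warning strings for a molecule.

    atoms: list of Atom (hydrogens explicit).
    bonds: list of (i, j, order) tuples using atom indices.
    """
    warnings = []

    # Sum the bond orders attached to each atom.
    bond_order_sum = [0] * len(atoms)
    for i, j, order in bonds:
        bond_order_sum[i] += order
        bond_order_sum[j] += order

    # Hypervalency: only neutral atoms with a known limit are checked.
    for index, atom in enumerate(atoms):
        limit = MAX_VALENCE.get(atom.element)
        if atom.charge == 0 and limit is not None and bond_order_sum[index] > limit:
            warnings.append(
                f"atom {index} ({atom.element}): "
                f"valence {bond_order_sum[index]} exceeds {limit}"
            )

    # Charge balance: the formal charges of the whole record should sum to zero.
    net_charge = sum(atom.charge for atom in atoms)
    if net_charge != 0:
        warnings.append(f"unbalanced charge: net charge is {net_charge:+d}")

    return warnings


# A carbon drawn with five hydrogens, and a lone sodium cation.
five_bond_carbon = [Atom("C")] + [Atom("H") for _ in range(5)]
bonds = [(0, k, 1) for k in range(1, 6)]
print(validate(five_bond_carbon, bonds))
print(validate([Atom("Na", +1)], []))
```

Returning a list of warnings rather than raising on the first problem lets one record report several issues at once, which suits a curation workflow.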

Where will CVSP (the Chemical Validation and Standardisation Platform) be useful?
Currently, a standalone system
In the future, validation/standardisation routines will be used:
  – Built in to our deposition system
  – At registration for new compounds
  – To improve existing data in ChemSpider (a pass through the ChemSpider backfile)
Potential to offer an optional checking service to authors

What we want to avoid

What do we do now?
Currently, ChemSpider uses structures (as InChIs) as the database key
Structures are needed for depositions
Two steps:
  – Pre-processing prior to deposition
  – InChI algorithm: provides standardisation and mapping
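
Using the InChI as the database key means that two depositions producing the same InChI map to the same compound. The sketch below shows that idea only; the record IDs and the function names `group_by_inchi` and `find_duplicates` are hypothetical and say nothing about ChemSpider's actual schema.

```python
from collections import defaultdict


def group_by_inchi(records):
    """Group deposited record IDs by their InChI string.

    records: iterable of (record_id, inchi) pairs.
    """
    groups = defaultdict(list)
    for record_id, inchi in records:
        groups[inchi].append(record_id)
    return dict(groups)


def find_duplicates(records):
    """Return only the InChIs that more than one record maps to."""
    return {
        inchi: ids
        for inchi, ids in group_by_inchi(records).items()
        if len(ids) > 1
    }


# Hypothetical depositions: two drawings of ethanol and one of methane.
records = [
    ("dep-1", "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"),
    ("dep-2", "InChI=1S/CH4/h1H4"),
    ("dep-3", "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"),
]
print(find_duplicates(records))
```

The same grouping also shows the limit of the approach: records that should be one compound but were drawn differently enough to give different InChIs stay separate, which is why pre-processing comes first.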

What are the common errors?
Records without a structure
Incorrect valences
Atom labels

What are the common errors?
Unbalanced charge
Name-structure errors
Salts
Polymers/organometallics
Missing stereochemistry

Side Effects of InChI on ChemSpider: Sort of helpful

Side Effects of InChI on ChemSpider
Advantages and disadvantages
  – The depictions are meant to represent the same molecule
  – Not easy to pick out “bad” representations

Substance Registry System
How do you decide your standardisation rules?
Avoid standards in isolation
m-UniqueIngredientIdentifierUNII/ucm pdf
Note: This document is only a starting point

Salt and Ionic Bonds

Nitro groups

Ammonium salts

Validation rules
In XML: code generated dynamically from the rule set.
Indigo API used behind the scenes.
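
One way to picture "code generated dynamically from the rule set" is to compile each XML rule into a small check function. The sketch below is an assumption throughout: the XML element and attribute names are invented for illustration and are not the CVSP rule format, and the structure properties are passed in as a plain dictionary rather than computed with Indigo.

```python
import operator
import xml.etree.ElementTree as ET

# Hypothetical rule format; the real CVSP schema is not shown in the slides.
RULES_XML = """
<rules>
  <rule id="has-atoms" property="atom_count" op="gt" value="0"
        severity="error" message="Record has no structure"/>
  <rule id="neutral" property="net_charge" op="eq" value="0"
        severity="warning" message="Unbalanced charge"/>
</rules>
"""

OPS = {"eq": operator.eq, "gt": operator.gt, "lt": operator.lt}


def compile_rules(xml_text):
    """Turn each <rule> element into a callable check."""
    checks = []
    for node in ET.fromstring(xml_text.strip()).findall("rule"):
        compare = OPS[node.get("op")]
        prop = node.get("property")
        expected = int(node.get("value"))
        result = (node.get("severity"), node.get("message"))

        # Default arguments bind this rule's values to this closure.
        def check(props, compare=compare, prop=prop,
                  expected=expected, result=result):
            # A rule passes when the comparison holds; otherwise report it.
            return None if compare(props[prop], expected) else result

        checks.append(check)
    return checks


def run_checks(checks, props):
    """Apply every compiled check and collect the failures."""
    return [hit for hit in (check(props) for check in checks) if hit is not None]


checks = compile_rules(RULES_XML)
print(run_checks(checks, {"atom_count": 0, "net_charge": 1}))
print(run_checks(checks, {"atom_count": 5, "net_charge": 0}))
```

Keeping the rules as data means a new rule set can be loaded without changing the checking code, which is the property a user-configurable validator needs.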

Standardization rules
Corrections stored in database: SMIRKS-based corrections and also proximity-based metal–non-metal reconnection.
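
The proximity-based reconnection mentioned here can be sketched as a distance test between metal and non-metal atoms. This is a guess at the general technique, not the CVSP algorithm: the `METALS` set, the `max_distance` threshold, and the function name `reconnect_metals` are assumptions, and the sketch neither adjusts formal charges nor applies SMIRKS transformations.

```python
import math

# Small illustrative set of metals; a real list would be much longer.
METALS = {"Li", "Na", "K", "Mg", "Ca", "Fe", "Cu", "Zn"}


def reconnect_metals(atoms, bonds, max_distance=2.5):
    """Add a bond between each metal and any nearby non-metal heavy atom.

    atoms: list of (element, (x, y, z)) tuples.
    bonds: set of frozenset({i, j}) index pairs.
    max_distance: hypothetical cutoff, in the same units as the coordinates.
    """
    new_bonds = set(bonds)
    for i, (element_i, position_i) in enumerate(atoms):
        if element_i not in METALS:
            continue
        for j, (element_j, position_j) in enumerate(atoms):
            # Skip other metals and hydrogens as bonding partners.
            if element_j in METALS or element_j == "H":
                continue
            if math.dist(position_i, position_j) <= max_distance:
                new_bonds.add(frozenset((i, j)))
    return new_bonds


# A sodium drawn next to an oxygen that is bonded to a carbon.
atoms = [
    ("Na", (0.0, 0.0, 0.0)),
    ("O", (2.0, 0.0, 0.0)),
    ("C", (3.3, 0.0, 0.0)),
]
bonds = {frozenset((1, 2))}
print(sorted(sorted(bond) for bond in reconnect_metals(atoms, bonds)))
```

In this example only the oxygen falls within the cutoff, so the sodium is connected to it and not to the carbon.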

Case study: DrugBank
DrugBank, maintained by David Wishart
Database contains 6711 structures
Widely regarded as a well-curated, high-quality dataset
DrugBank 3.0: a comprehensive resource for 'omics' research on drugs. Knox C, Law V, Jewison T, Liu P, Ly S, Frolkis A, Pon A, Banco K, Mak C, Neveu V, Djoumbou Y, Eisner R, Guo AC, Wishart DS., Nucleic Acids Res., 2011, 39, Jan, D

ChemSpider Standardization
The entire ChemSpider database will be standardized using a modified FDA rule set
Original Molfiles will be standardized and all properties (predicted properties, SMILES, InChIs, names) will be regenerated
Standardization procedures will be applied automatically to all future depositions

CVSP as a Flexible System
There will be various rule sets
  – Rigid pre-defined rules: e.g. meeting FDA specifications as written, the Open PHACTS modified rule set, etc.
  – Flexible user-defined rules: users upload their rules in our custom format (XML)
  – The Open PHACTS rule set will be open to the community to reuse

Incorporating CVSP into data processing platforms: KNIME
The workflow includes:
  – SDF reader
  – Indigo nodes
  – Calls to ChemSpider validation Web services
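
The first step of such a workflow, reading an SDF, can be sketched outside KNIME in a few lines of Python. SD files separate records with a `$$$$` line and end each molblock with `M  END`; everything else here (the function names `read_sdf` and `_split_record`, and the abbreviated sample records) is an assumption for illustration. The Indigo processing and the call to the validation Web service are deliberately left out, since their interfaces are not described in the slides.

```python
def _split_record(lines):
    """Split one SDF record into its molblock text and its data fields."""
    molblock, fields = [], {}
    index = 0
    # The molblock runs up to and including the "M  END" line.
    while index < len(lines):
        molblock.append(lines[index])
        index += 1
        if molblock[-1].startswith("M  END"):
            break
    name = None
    for line in lines[index:]:
        if line.startswith(">"):
            # Data header such as "> <NAME>": keep the text between < and >.
            start, end = line.find("<"), line.rfind(">")
            name = line[start + 1:end] if start != -1 and end > start else None
            if name is not None:
                fields[name] = []
        elif not line.strip():
            name = None  # A blank line closes the current data field.
        elif name is not None:
            fields[name].append(line)
    return "\n".join(molblock), {k: "\n".join(v) for k, v in fields.items()}


def read_sdf(text):
    """Yield (molblock, data_fields) for each record in SDF text."""
    record = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            yield _split_record(record)
            record = []
        else:
            record.append(line)
    if any(line.strip() for line in record):
        yield _split_record(record)  # Final record without a delimiter.


# Abbreviated sample records (atom lines shortened for readability).
SAMPLE = """methane
  sketch

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0
M  END
> <NAME>
methane

$$$$
water
  sketch

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0
M  END
> <NAME>
water

$$$$
"""

for molblock, fields in read_sdf(SAMPLE):
    print(fields["NAME"], len(molblock.splitlines()))
```

Each yielded molblock could then be passed to a toolkit or a validation service one record at a time, so a single bad record does not stop the whole file.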

Incorporating CVSP into data processing platforms: KNIME
A warning is returned as a result of processing

Summary
Will release back the DrugBank results
Alpha version of CVSP available
Will be a resource for the community
Will improve ChemSpider
Still a long way to go…

Thank you
Twitter: ChemSpider