Approaches for extraction and “digital chromatography” of chemical data: A perspective from the RSC.


1 Approaches for extraction and “digital chromatography” of chemical data: A perspective from the RSC

2 Overview Introduction – What data can we consider? – What are the challenges? – What data and sources does the RSC have? – Experimental Data Checker Case Studies: – Project Prospect – Chair forms of sugars/cyclohexanes

3 Traditional Chromatography Images taken from: http://www.sciencemadness.org/talk/viewthread.php?tid=3960&page=3 and http://en.wikipedia.org/wiki/Column_chromatography

4 Why Digital Chromatography? Useable information is mixed in with description and analysis – Makes it difficult to find Despite our best efforts – still lots of ambiguous or plain wrong/unusable chemical information Why? – Human error – Processing errors – Incorrect usage of data generation/extraction – Style over meaning – Data not generated with reuse in mind – Data generated for humans

5 Style/Layout Vs Meaning Structures drawn to illustrate more than just the identity Data not generated with reuse in mind Author practices Mixed 2D and perspective representations Unintentional definition of stereochemistry

6 Data generated for humans Separated/Orphaned information inc. Markush structures, information passed by reference

7

8 What chemical data can we consider? Chemistry is especially challenging: it has a wide range of data types – Numeric data – Names – Structures – Terminology These span very different fields: organic, inorganic, physical – Meanings/interpretations are not perfectly aligned Application of standards can be challenging Drawing conventions – are documented but not used

9 What chemical data and sources does the RSC have?

10 A beginning: helping chemists review their own work Amphidinoketide I To a solution of……. …. Amphidinoketide I was isolated as a …….. [α]D25 −17.6 (c 0.085, CH2Cl2); Rf = 0.61 (1:1 hexane:ethyl acetate); νmax (CHCl3)/cm−1 1707.2 (CO), 1686.9 (CO), 1632.4 (CO), 1618.9 (CC), 1458.1; 1H NMR (CD2Cl2, 500 MHz) δH 6.08 (1H, t, J = 1.3 Hz, 3-CHC), 5.82 (1H, ddt, J = 16.9, 10.2, 6.7 Hz, 19-CHCH2), 4.99 (1H, m (17.1 Hz), 20-CHA), 4.92 (1H, m (10.2 Hz), 20-CHB), 3.05 (1H, dd, J = 17.9, 9.3 Hz, 8-CHA), 3.00–2.90 (3H, m, 9-CHCH3, 11-CHA, 12-CHCH3), 2.72–2.64 (2H, m, 5-CHA, 6-CHA), 2.62–2.55 (2H, m, 5-CHB, 6-CHB), 2.51–2.45 (3H, m, 8-CHB, 11-CHB, 14-CHA), 2.33 (1H, dd, J = 16.9, 7.4 Hz, 14-CHB), 2.09 (3H, s, 21-CH3), 2.05–1.99 (2H, m, 18-CH2), 1.99–1.96 (1H, m, 15-CHCH3), 1.88 (3H, s, 1-CH3), 1.39–1.25 (3H, 17-CH2, 16-CHA), 1.14–1.10 (1H, m, 16-CHB), 1.07 (3H, d, J = 7.0 Hz, 22-CH3), 1.05 (3H, d, J = 7.2 Hz, 23-CH3), 0.87 (3H, d, J = 6.7 Hz, 24-CH3); 13C NMR (CD2Cl2, 125 MHz) δC 213.15 (13-CO), 212.08 (10-CO), 208.40 (7-CO), 198.76 (4-CO), 155.40 (2-CCH), 138.41 (19-CHCH2), 123.54 (3-CHC), 114.19 (20-CH2C), 48.81 (14-CH2), 45.93 (11-CH2), 44.50 (8-CH2), 41.43 (9-CHCH3), 41.01 (12-CHCH3), 37.74 (5-CH2), 36.55 (16-CH2), 36.27 (6-CH2), 34.18 (18-CH2), 28.74 (15-CHCH3), 27.57 (1-CH3), 26.60 (17-CH2), 20.63 (21-CH3), 19.77 (24-CH3), 16.65 (22 or 23-CH3), 16.62 (22 or 23-CH3); HRMS (ESI) Calculated for C24H38O4 413.2668, found 413.26600 (MNa+). (9R, 12R, 15S)-1 had [α]D25 +11 (c 0.245, CH2Cl2). http://www.rsc.org/is/journals/checker/run.htm
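As an illustration of the kind of consistency check a tool like the Experimental Data Checker could run on the characterisation text above, here is a minimal Python sketch. It is not the RSC's implementation: the function names are hypothetical, and it assumes that integrations are written as "(nH, ...)" and that the quoted HRMS value is the sodium adduct with no electron-mass correction.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
                "O": 15.99491461956, "Na": 22.9897692809}

def parse_formula(formula):
    """Turn 'C24H38O4' (spaces tolerated) into {'C': 24, 'H': 38, 'O': 4}."""
    counts = {}
    compact = formula.replace(" ", "")
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", compact):
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)
    return counts

def monoisotopic_mass(counts):
    """Sum isotope masses; no electron-mass correction is applied (assumption)."""
    return sum(MONOISOTOPIC[element] * n for element, n in counts.items())

def count_protons(h_nmr_text):
    """Sum integrations written as '(1H, ...' or '(3H, ...' in a 1H NMR listing."""
    return sum(int(n) for n in re.findall(r"\((\d+)H\b", h_nmr_text))

def ppm_error(calculated, found):
    """Signed mass error in parts per million."""
    return (found - calculated) / calculated * 1e6

formula = parse_formula("C24H38O4")
sodiated = dict(formula, Na=1)          # [M+Na]+ adduct, as reported on the slide
calc_mass = monoisotopic_mass(sodiated)

# Short excerpt only; the full 1H listing can be passed in the same way.
excerpt = "δH 6.08 (1H, t, J = 1.3 Hz), 3.00–2.90 (3H, m), 2.09 (3H, s)"

print("calculated MNa+:", round(calc_mass, 4))
print("protons in excerpt:", count_protons(excerpt))
print("HRMS error (ppm):", round(ppm_error(calc_mass, 413.26600), 1))
```

Run on the full 1H NMR listing, the integration count can be compared with the hydrogen count of the molecular formula (38 for C24H38O4); a mismatch would flag the entry for the author to review.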

11 Case study 1: Project Prospect

12 What is Prospect? Diagram of the Prospect pipeline. Layers: Input layer, Tool layer, Information layer, Output layer. Components: RSC XML, Author CDX files, OSCAR, Ontologies, InChI–name pairs (from ChemSpider), Enhanced RSC XML, RSS, Enhanced HTML, Prospect database, InChI–name pairs (in ChemSpider), Better ontologies, Visible output.

13 People and machines People Can understand narratives. Can interpret pictures. Can reason about three- dimensional objects. Can do a high-quality job. Machines Can’t understand narratives. Can’t interpret pictures. Not able to infer 3D structure from 2D without cues. Can do a lower-quality, but still useful job.

14 Case study 2: The chair representation issue InChI=1S/C6H12O6/c7-1-2-3(8)4(9)5(10)6(11)12-2/h2-11H,1H2 InChIKey: WQZGKKKJIJFFOK-UHFFFAOYSA-N 5 stereocentres = 2^5 isomers = 32 structures
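The InChI on this slide carries no stereochemistry, which is why it is consistent with all 32 structures. A small Python sketch of how that could be detected from the identifier strings alone is given below. The function names are hypothetical, and the check relies on two conventions of standard InChI: stereo information lives in layers prefixed b, t, m or s, and the second InChIKey block reads UHFFFAOYSA when no such layers are present.

```python
# Layer prefixes that carry stereochemistry in a standard InChI:
# b = double-bond, t = tetrahedral, m and s = stereo-type qualifiers.
STEREO_PREFIXES = ("b", "t", "m", "s")

def stereo_layers(inchi):
    """Return the stereo layers of an InChI string (empty list if none)."""
    layers = inchi.split("/")[2:]   # skip the version prefix and the formula
    return [layer for layer in layers if layer.startswith(STEREO_PREFIXES)]

def key_lacks_stereo(inchikey):
    """True if the second InChIKey block is the 'nothing beyond connectivity' hash."""
    return inchikey.split("-")[1] == "UHFFFAOYSA"

def max_stereoisomers(n_centres):
    """Upper bound on stereoisomers for n independent stereocentres."""
    return 2 ** n_centres

inchi = "InChI=1S/C6H12O6/c7-1-2-3(8)4(9)5(10)6(11)12-2/h2-11H,1H2"
inchikey = "WQZGKKKJIJFFOK-UHFFFAOYSA-N"

print("stereo layers:", stereo_layers(inchi))
print("key lacks stereo:", key_lacks_stereo(inchikey))
print("possible isomers:", max_stereoisomers(5))
```

An empty list of stereo layers means the record identifies only the connectivity of the hexopyranose, not which of the possible isomers was drawn.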

15 Case study 2: Chair forms of hexacycles: what could go wrong?

16 How do we “fix” chair representations? How we normalize them: 1. Identify 6-membered rings (Indigo) 2. Identify what sort of ring it is 3. Map atoms onto a standard structure (e.g. beta-D-glucopyranose) 4. Tidy
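Steps 1 and 2 of this normalisation can be sketched in a few lines of Python. The slide names Indigo for ring detection; the sketch below swaps in a plain depth-first search over a bond list so that it runs with the standard library only, and its function names and ring categories are illustrative assumptions rather than the RSC's code. The example bond list follows the atom numbering of the hexopyranose InChI from case study 2 (atoms 1 to 6 are carbon, 7 to 12 are oxygen).

```python
from collections import Counter

def build_adjacency(bonds):
    """Build an undirected adjacency map from (atom, atom) pairs."""
    adjacency = {}
    for a, b in bonds:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def six_membered_rings(adjacency):
    """Step 1: find every simple cycle of exactly six atoms."""
    rings = set()

    def extend(path):
        if len(path) == 6:
            if path[0] in adjacency[path[-1]]:   # path closes back on its start
                rings.add(frozenset(path))
            return
        for neighbour in adjacency[path[-1]]:
            if neighbour not in path:
                extend(path + [neighbour])

    for start in adjacency:
        extend([start])
    return sorted(sorted(ring) for ring in rings)

def classify_ring(ring, elements):
    """Step 2: decide what sort of ring it is from its element composition."""
    composition = Counter(elements[atom] for atom in ring)
    if composition == Counter({"C": 6}):
        return "cyclohexane-like"
    if composition == Counter({"C": 5, "O": 1}):
        return "pyranose-like"
    return "other"

# Heavy-atom bonds read from the connectivity layer c7-1-2-3(8)4(9)5(10)6(11)12-2
bonds = [(7, 1), (1, 2), (2, 3), (3, 8), (3, 4), (4, 9),
         (4, 5), (5, 10), (5, 6), (6, 11), (6, 12), (12, 2)]
elements = {i: "C" for i in range(1, 7)}
elements.update({i: "O" for i in range(7, 13)})

rings = six_membered_rings(build_adjacency(bonds))
for ring in rings:
    print(ring, classify_ring(ring, elements))
```

For this input the search finds the single ring of atoms 2, 3, 4, 5, 6 and 12 and labels it pyranose-like. Mapping those atoms onto a standard template and tidying the depiction (steps 3 and 4) need 2D coordinates and are not attempted here.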

17 The future: “The digester” Ability to: – Reconnect R-groups – Expand abbreviations – Expand brackets – Link structures with reference IDs

18 Other examples that we didn’t mention in case studies CIF data importer Structure Validation and Standardisation – (Thurs Aug 23, 9:15 am, Marriott Downtown, Franklin Hall 6) Work on creation of ontologies, RXNO, CMO – Also collaborating on: ChEBI ontology, GO, SO Collaboration with Utopia to enable Prospect mark-up of PDFs

19 Summary Many data sharing practices are based on: – Traditional print articles – Consumption of data by humans only This poses issues for publishers and users alike The RSC is developing innovative solutions to address some of these problems – Chemical structures are challenging – There are limits to what machine methods can achieve – Need to educate authors to think differently

20 Acknowledgements Colin Batchelor - Development and Technical work Jeff White & Aileen Day Richard Kidd, Graham McCann and Will Russell RSC ICT staff

21 Thank you Email: chemspider@rsc.org Twitter: @ChemSpider http://www.chemspider.com

