Smart Storage for Physical Properties Or How on Earth do we Store this Stuff? Kieron Taylor with Jeremy Frey and Jonathan Essex.


1 Smart Storage for Physical Properties Or How on Earth do we Store this Stuff? Kieron Taylor with Jeremy Frey and Jonathan Essex

2 What makes up chemical data? ● Numbers - big, small, precise and vague ● Circumstances - How hot? What pressure? ● Assumptions – This is pretty pure, let's say it's pure – Standard conditions? More or less – That peak on the spectrum isn't important

3 Using the Data: QSPR ● Take lots of data ● Magical statistics occur ● Validate results ● Predictive model

4 So What is Real Data like? Bad: take the commercial Physprop database. Can we handle these melting points?

5 Let's Make a Database ● One data source is not enough ● Good(?) data isn't free ● Different sources have varied styles of content ● Most database software is not suited to data mining ● We cannot simply plumb these varied sources for data; we must reconcile them to produce sensible statistics

6 Relational Design For one molecule: Cyclohexanone

Property       Value  Units
Solubility     2500   mg/L
Melting point  -31    C
Boiling point  155.4  C

Property       Value  Error   Units  Source
Solubility     2500   +/-50   mg/L   Physprop
               2650   +/-60   mg/L   Our lab
Melting point  -31    +/-0.1  C      Detherm
Boiling point  155.4  +/-0.5  C      Merck Index

Property       Value  Error   Units  Source       Method      Author
Solubility     2500   +/-50   mg/L   Physprop     Laboratory  ...
               2650   +/-60   mg/L   Southampton  Simulation  Me
Melting point  -31    +/-0.1  C      Detherm      Laboratory  ...
Boiling point  155.4  +/-0.5  C      Merck Index  Laboratory  ...

Arbitrary numbers of points are hard to store in relational databases.

We're not done yet: we still have to account for multiple experimental conditions, statements of validity and molecules.

Provenance = Senary relational model?

Property       Value  Error   Units  Source       Method        Author  Note
Solubility     2500   +/-50   mg/L   Physprop     Laboratory    ...
               2650   +/-60   mg/L   Southampton  Simulation    Me      Superseded
               2599   +/-25   mg/L   Southampton  Simulation B  Me
Melting point  -31    +/-0.1  C      Detherm      Laboratory    ...
Boiling point  155.4  +/-0.5  C      Merck Index  Laboratory    ...     Decomposing
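The way the table keeps widening can be sketched with Python's built-in sqlite3 module. This is only an illustration of the problem, not the schema used in the talk: the table name `property` and the column names are assumptions, and the values are the cyclohexanone solubility figures from the slide.

```python
import sqlite3

# Hypothetical single-table design; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE property (molecule TEXT, name TEXT, value REAL, units TEXT)"
)
conn.execute(
    "INSERT INTO property VALUES ('Cyclohexanone', 'Solubility', 2500, 'mg/L')"
)

# Every new kind of provenance means altering the schema for all rows.
for column in ("error REAL", "source TEXT", "method TEXT", "author TEXT", "note TEXT"):
    conn.execute(f"ALTER TABLE property ADD COLUMN {column}")

conn.execute(
    "INSERT INTO property VALUES "
    "('Cyclohexanone', 'Solubility', 2650, 'mg/L', 60, "
    "'Southampton', 'Simulation', 'Me', 'Superseded')"
)

# Column names after the schema has grown, and the two stored points.
columns = [row[1] for row in conn.execute("PRAGMA table_info(property)")]
rows = conn.execute("SELECT value, source FROM property ORDER BY value").fetchall()
print(columns)
print(rows)
```

The first row ends up with empty provenance columns it never needed, and experimental conditions would widen the table further still.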

7 RDF Triplestore is the Solution ● RDF describes trees and networks of entities ● Data of this complexity lends itself well to a tree representation ● RDF trees enable additional clever things ● Triplestores provide persistent RDF models
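As a rough sketch of the idea, each data point can be written as a set of (subject, predicate, object) statements hanging off a measurement node. The code below uses a plain Python list rather than a real triplestore such as 3store, and every identifier with the `ex:` prefix is an assumption made up for illustration; only the numbers and source names come from the cyclohexanone example.

```python
# Toy in-memory triple list; all "ex:" identifiers are hypothetical.
triples = [
    ("ex:cyclohexanone", "ex:hasMeasurement", "ex:m1"),
    ("ex:m1", "ex:property", "Solubility"),
    ("ex:m1", "ex:value", 2500),
    ("ex:m1", "ex:error", 50),
    ("ex:m1", "ex:units", "mg/L"),
    ("ex:m1", "ex:source", "Physprop"),
    ("ex:cyclohexanone", "ex:hasMeasurement", "ex:m2"),
    ("ex:m2", "ex:property", "Solubility"),
    ("ex:m2", "ex:value", 2650),
    ("ex:m2", "ex:error", 60),
    ("ex:m2", "ex:units", "mg/L"),
    ("ex:m2", "ex:source", "Southampton"),
    ("ex:m2", "ex:method", "Simulation"),
]


def objects(subject, predicate):
    """Return every object stated for a subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def describe(subject):
    """Collect the statements about one node into a dict."""
    return {p: o for s, p, o in triples if s == subject}


# A new kind of statement is just one more triple; no schema change.
triples.append(("ex:m2", "ex:note", "Superseded"))

print(objects("ex:cyclohexanone", "ex:hasMeasurement"))
print(describe("ex:m2"))
```

A molecule can carry any number of measurement nodes, and each node can carry as many or as few provenance statements as are known.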

8

9

10 What can we do with this? ● Store almost any chemical data as normal ● Track the where, when and how of each and every data point ● Filter values by whether they are real or simulated, old or new, from a particular source, or produced by a particular person ● Bolt on RDF schemas such as FOAF and our units system
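Filtering by provenance might look something like the following. It is a minimal sketch over a plain list of triples, not the query interface of the actual system; the `ex:` predicates and the `select` helper are assumptions invented for this example, with values taken from the cyclohexanone solubility rows.

```python
# Toy triples for three solubility measurements; "ex:" names are hypothetical.
triples = [
    ("ex:m1", "ex:value", 2500),
    ("ex:m1", "ex:source", "Physprop"),
    ("ex:m1", "ex:method", "Laboratory"),
    ("ex:m2", "ex:value", 2650),
    ("ex:m2", "ex:source", "Southampton"),
    ("ex:m2", "ex:method", "Simulation"),
    ("ex:m2", "ex:author", "Me"),
    ("ex:m3", "ex:value", 2599),
    ("ex:m3", "ex:source", "Southampton"),
    ("ex:m3", "ex:method", "Simulation B"),
    ("ex:m3", "ex:author", "Me"),
]


def select(triples, criteria):
    """Return subjects whose statements match every predicate/object pair."""
    facts = {}
    for s, p, o in triples:
        facts.setdefault(s, {})[p] = o
    return sorted(
        s
        for s, props in facts.items()
        if all(props.get(p) == o for p, o in criteria.items())
    )


from_southampton = select(triples, {"ex:source": "Southampton"})
lab_only = select(triples, {"ex:method": "Laboratory"})
mine_sim_b = select(triples, {"ex:author": "Me", "ex:method": "Simulation B"})
print(from_southampton, lab_only, mine_sim_b)
```

In a real triplestore the same selection would be expressed in its query language rather than in Python.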

11 What have we done with this? http://green.chem.soton.ac.uk/triangle/query.html

12 Thanks to: ● AKT and Steve Harris for 3store ● Rob Gledhill for web tech and discussion ● Perl for s/ / /g

