Smart Storage for Physical Properties Or How on Earth do we Store this Stuff? Kieron Taylor with Jeremy Frey and Jonathan Essex.

What makes up chemical data?
● Numbers - big, small, precise and vague
● Circumstances - How hot? What pressure?
● Assumptions
  – This is pretty pure, let's say it's pure
  – Standard conditions? More or less
  – That peak on the spectrum isn't important

Using the Data: QSPR (quantitative structure-property relationships)
1. Take lots of data
2. Magical statistics occur
3. Validate results
4. Predictive model

So What is Real Data like?
Bad - take the commercial Physprop Database
Can we handle these melting points?

Let's Make a Database
● One data source is not enough
● Good(?) data isn't free
● Different sources have varied styles of content
● Most database software is not suited to data mining
● We cannot simply plumb these varied sources for data; we must reconcile them to produce sensible statistics

Relational Design
For one molecule: Cyclohexanone

Property       Value  Units
Solubility     2500   mg/L
Melting point  -31    °C
Boiling point  155.4  °C

Property       Value  Error   Units  Source
Solubility     2500   +/-50   mg/L   Physprop
               2650   +/-60   mg/L   Our lab
Melting point  -31    +/-0.1  °C     Detherm
Boiling point  155.4  +/-0.5  °C     Merck Index

Property       Value  Error   Units  Source       Method      Author
Solubility     2500   +/-50   mg/L   Physprop     Laboratory  ...
               2650   +/-60   mg/L   Southampton  Simulation  Me
Melting point  -31    +/-0.1  °C     Detherm      Laboratory  ...
Boiling point  155.4  +/-0.5  °C     Merck Index  Laboratory  ...

Arbitrary numbers of points are hard to store in relational databases.
We're not done yet: we still have to account for multiple experimental conditions, statements of validity, and multiple molecules.
Provenance = Senary relational model?

Property       Value  Error   Units  Source       Method        Author  Note
Solubility     2500   +/-50   mg/L   Physprop     Laboratory    ...
               2650   +/-60   mg/L   Southampton  Simulation    Me      Superseded
               2599   +/-25   mg/L   Southampton  Simulation B  Me
Melting point  -31    +/-0.1  °C     Detherm      Laboratory    ...
Boiling point  155.4  +/-0.5  °C     Merck Index  Laboratory    ...     Decomposing
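The way the table keeps widening can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustration under assumptions: the table name `measurement` and its column names are hypothetical, not the schema used in the talk, and the rows are the cyclohexanone values from the tables above. Every new kind of provenance (here a note) forces a schema change.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE measurement (
        molecule TEXT, property TEXT, value REAL, error REAL,
        units TEXT, source TEXT, method TEXT
    )
    """
)

# One row per data point, using the cyclohexanone values from the slides.
rows = [
    ("cyclohexanone", "solubility", 2500, 50, "mg/L", "Physprop", "Laboratory"),
    ("cyclohexanone", "solubility", 2650, 60, "mg/L", "Southampton", "Simulation"),
    ("cyclohexanone", "melting point", -31, 0.1, "C", "Detherm", "Laboratory"),
    ("cyclohexanone", "boiling point", 155.4, 0.5, "C", "Merck Index", "Laboratory"),
]
conn.executemany("INSERT INTO measurement VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Recording that a value was superseded needs yet another column.
conn.execute("ALTER TABLE measurement ADD COLUMN note TEXT")
conn.execute(
    "UPDATE measurement SET note = 'Superseded' "
    "WHERE source = 'Southampton' AND value = 2650"
)

# Column names after the change, and all solubility values on record.
columns = [row[1] for row in conn.execute("PRAGMA table_info(measurement)")]
solubilities = conn.execute(
    "SELECT value, source FROM measurement "
    "WHERE property = 'solubility' ORDER BY value"
).fetchall()
print(columns)
print(solubilities)
```

Experimental conditions and statements of validity would each need further columns or further tables, which is the growth the slide is pointing at.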

RDF Triplestore is the Solution
● RDF describes trees and networks of entities
● Data of this complexity lends itself well to a tree representation
● RDF trees enable additional clever things
● Triplestores provide persistent RDF models
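A rough sketch of the tree idea, written in plain Python rather than against a real triplestore such as 3store. Every identifier here (`mol:cyclohexanone`, `m:1`, `hasMeasurement` and the other predicate names) is a hypothetical stand-in, not the vocabulary used in the project; the values are two of the solubility rows from the relational slide.

```python
from collections import defaultdict

# Each triple is (subject, predicate, object). All names are hypothetical.
triples = [
    ("mol:cyclohexanone", "hasMeasurement", "m:1"),
    ("m:1", "property", "solubility"),
    ("m:1", "value", 2500),
    ("m:1", "error", 50),
    ("m:1", "units", "mg/L"),
    ("m:1", "source", "Physprop"),
    ("m:1", "method", "Laboratory"),
    ("mol:cyclohexanone", "hasMeasurement", "m:2"),
    ("m:2", "property", "solubility"),
    ("m:2", "value", 2650),
    ("m:2", "error", 60),
    ("m:2", "units", "mg/L"),
    ("m:2", "source", "Southampton"),
    ("m:2", "method", "Simulation"),
    ("m:2", "note", "Superseded"),
]


def index_by_subject(triples):
    """Group (predicate, object) pairs under their subject."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index


def describe(node, index):
    """Expand a node into a nested dict; assumes the graph has no cycles."""
    if node not in index:
        return node  # a literal value or a leaf entity
    tree = defaultdict(list)
    for predicate, obj in index[node]:
        tree[predicate].append(describe(obj, index))
    return dict(tree)


index = index_by_subject(triples)
tree = describe("mol:cyclohexanone", index)
print(tree)
```

Adding a note to one measurement is just one more triple; the other measurements and the "schema" are untouched.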

What can we do with this?
● Store almost any chemical data as normal
● Track the where, when and how of each and every data point
● Filter values by whether they are real or simulated, old or new, from a particular source, or produced by a particular person
● Bolt on RDF schemas such as FOAF (Friend of a Friend) and our units system
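Filtering by provenance can be sketched in the same spirit. This is an assumption-laden toy, not the project's query interface: the measurement identifiers, the predicate names and the helper `find_measurements` are all hypothetical, and a real triplestore would answer this with a query language rather than a Python loop. The values are the three solubility figures from the relational slide.

```python
# Hypothetical identifiers; values are the cyclohexanone solubilities
# from the slides.
triples = [
    ("m:1", "value", 2500),
    ("m:1", "source", "Physprop"),
    ("m:1", "method", "Laboratory"),
    ("m:2", "value", 2650),
    ("m:2", "source", "Southampton"),
    ("m:2", "method", "Simulation"),
    ("m:2", "note", "Superseded"),
    ("m:3", "value", 2599),
    ("m:3", "source", "Southampton"),
    ("m:3", "method", "Simulation B"),
]


def find_measurements(triples, **criteria):
    """Return the subjects whose predicates match every given criterion."""
    facts = {}
    for subject, predicate, obj in triples:
        facts.setdefault(subject, {})[predicate] = obj
    return sorted(
        subject
        for subject, props in facts.items()
        if all(props.get(key) == wanted for key, wanted in criteria.items())
    )


from_southampton = find_measurements(triples, source="Southampton")
superseded = find_measurements(triples, note="Superseded")
# Southampton values that have not been superseded.
current = [m for m in from_southampton if m not in superseded]
print(from_southampton, superseded, current)
```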

What have we done with this?

Thanks to:
● AKT and Steve Harris for 3store
● Rob Gledhill for web tech and discussion
● Perl for s/ / /g