Data provenance in biomedical discovery Donald Dunbar Queen’s Medical Research Institute University of Edinburgh Workshop on Principles of Provenance in.


1 Data provenance in biomedical discovery Donald Dunbar Queen’s Medical Research Institute University of Edinburgh Workshop on Principles of Provenance in Databases May 21st 2008

2 Background: biomedical research; basic & clinical science; animal, cell models, patients; genes, proteins, pathways; data analysis & mining; publication

3 Biomedical discovery. Looking for contribution to – human health and disease. In-house experiments – data workflows – knowledge capture. Use public databases – many data types – integration is a problem.

4 Databases we use: sequence, structure, function, expression, domain specific

5 Data workflows [diagram labels: experiment 1; experiment 2; raw data; spreadsheet; calculations; processed data; database; publication]

6 Data workflows. IN: copy and paste, open from file. ‘algorithm’. OUT: copy and paste, save to file. BUT: web services; automated tools & databases; bioinformatics workflows

7 Bioinformatics workflows

8 Is our field changing? databases, experiments, knowledge, knowledgebase

9 Knowledge capture

10

11 What provenance do we need? Example: Gene expression in a transgenic animal. gene annotation; gene expression measurements; public databases; output from machine; processing; integration; where, when; which identifiers; how; when, what, how; data mining; what and how did we select genes …

12 What provenance do we need? Example: Curated protein database. expert data; database links; curator input; archive; contributor, date; verify, add, delete, modify; source, identifiers, dates; Curated database; versions, dates; development; schema & interface changes
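One way to picture the curator-input provenance listed on this slide (contributor, date, and the verify/add/delete/modify actions) is an append-only log of curation events. The sketch below is only an illustration under that assumption: the names CurationEvent and CurationLog, the entry identifiers, the contributors and the dates are all hypothetical and are not taken from the talk or from any real curated database.

```python
from dataclasses import dataclass, field
from datetime import date

# The four curator actions named on the slide.
ALLOWED_ACTIONS = {"verify", "add", "delete", "modify"}


@dataclass(frozen=True)
class CurationEvent:
    """One immutable record of who did what to which entry, and when."""
    entry_id: str
    action: str
    contributor: str
    on: date
    source: str = ""  # e.g. "expert data" or "database link"

    def __post_init__(self):
        # Reject anything that is not one of the recognised actions.
        if self.action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown curation action: {self.action}")


@dataclass
class CurationLog:
    """Append-only list of events; nothing is ever edited in place."""
    events: list = field(default_factory=list)

    def record(self, event):
        self.events.append(event)

    def history(self, entry_id):
        """Return every event for one entry, in the order recorded."""
        return [e for e in self.events if e.entry_id == entry_id]


# Hypothetical entries, contributors and dates, for illustration only.
log = CurationLog()
log.record(CurationEvent("PROT_1", "add", "curator_a", date(2008, 1, 10), "expert data"))
log.record(CurationEvent("PROT_1", "verify", "curator_b", date(2008, 2, 3)))
log.record(CurationEvent("PROT_2", "add", "curator_a", date(2008, 1, 11), "database link"))

for event in log.history("PROT_1"):
    print(event.on, event.contributor, event.action)
```

Keeping the log append-only means the history of an entry can be replayed later, which is the point of recording provenance at all.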

13 What do we do now (for provenance)? We trust the main data providers a lot! – a pragmatic approach. We use tools and note the settings – rarely fully. We put extra fields in our databases – source, modify date. We deposit our data in public repositories – but only when we need to.
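The "extra fields" approach mentioned on this slide (a source and a modify date on each record) could look roughly like the following. This is a minimal sketch using Python's built-in sqlite3 module; the table name, column names, gene identifier and source string are hypothetical and chosen only for illustration.

```python
import sqlite3
from datetime import datetime, timezone

# In-memory database so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE gene_expression (
        gene_id     TEXT NOT NULL,
        expression  REAL NOT NULL,
        source      TEXT NOT NULL,  -- provenance: where the value came from
        modified_at TEXT NOT NULL   -- provenance: when the row was written
    )
    """
)


def insert_measurement(conn, gene_id, expression, source):
    """Insert a row, stamping it with its source and the current UTC time."""
    modified_at = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO gene_expression VALUES (?, ?, ?, ?)",
        (gene_id, expression, source, modified_at),
    )
    return modified_at


# Hypothetical gene and source, for illustration only.
insert_measurement(conn, "GENE_A", 2.5, "in-house array, experiment 1")

rows = conn.execute(
    "SELECT gene_id, source, modified_at FROM gene_expression"
).fetchall()
for gene_id, source, modified_at in rows:
    print(gene_id, source, modified_at)
```

Two extra columns record where a value came from and when it was written, but not how it was processed, which is why the slides call this kind of provenance rudimentary.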

14 What might we do next? Use workflow tools like Taverna – capture workflow provenance. Build a provenance tool & database – widely applicable. Make provenance more visible to biologists – so they value and use it.
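To make "capture workflow provenance" concrete, here is a small sketch of the general idea in plain Python. It does not use Taverna or its API. It assumes each workflow step is a function, and wraps it so that the step name, its settings, digests of its input and output, and a timestamp are logged on every run. All names (tracked, PROVENANCE_LOG, normalise, select_above) and the sample values are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from functools import wraps

# Every tracked step appends one record here when it runs.
PROVENANCE_LOG = []


def _digest(value):
    """Short, stable fingerprint of a JSON-serialisable value."""
    payload = json.dumps(value, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def tracked(step):
    """Wrap a step so each call records what ran, with which settings, on what data."""
    @wraps(step)
    def wrapper(data, **params):
        result = step(data, **params)
        PROVENANCE_LOG.append({
            "step": step.__name__,
            "params": params,
            "input_digest": _digest(data),
            "output_digest": _digest(result),
            "ran_at": datetime.now(timezone.utc).isoformat(),
        })
        return result
    return wrapper


@tracked
def normalise(values, scale=1.0):
    """Rescale values so they sum to `scale`."""
    total = sum(values)
    return [scale * v / total for v in values]


@tracked
def select_above(values, threshold=0.0):
    """Keep only the values above the threshold."""
    return [v for v in values if v > threshold]


raw = [1.0, 3.0, 4.0]  # made-up measurements
selected = select_above(normalise(raw), threshold=0.3)

for record in PROVENANCE_LOG:
    print(record["step"], record["params"], record["input_digest"], "->", record["output_digest"])
```

Because the output digest of one step matches the input digest of the next, the log can be read back as a chain from raw data to the selected values, including the settings used at each step.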

15 Conclusions. In biology we don’t do provenance well (yet). We use databases and manual workflows. We implement rudimentary provenance. We should build useful provenance tools. We need to make provenance visible.

