Why is this a problem?
Hard drives explode (more often than you think...)
How is *your* “My Documents” filing system?
–Most of us live in folder chaos!
How well does your hard drive integrate with your lab book?
–Well, generally not at all… you might be able to match things on dates if you’re lucky!
Big data is EXPENSIVE to generate
It makes sense to get the most value out of it
Your funding bodies know this!
MRC “The MRC expects valuable data arising from MRC-funded research to be made available to the scientific community with as few restrictions as possible. Such data must be shared in a timely and responsible manner.”
BBSRC “BBSRC expects research data generated as a result of BBSRC support to be made available with as few restrictions as possible in a timely and responsible manner to the scientific community for examination and use.”
Even more pointedly, BBSRC also suggests that IP and commercialisation concerns should NOT preclude you from releasing data in a timely fashion.
Opens up a new problem
How do we make sure that we can exchange and understand the data that we share with other researchers?
Standardised formats for reporting certain experimental data types have been developed
These were pre-dated by massive open access biological sequence databases (GenBank, DDBJ, EMBL, PDB, UniProtKB etc.), but those suffer from the fact that there are 20+ ‘standards’ for representing DNA or protein sequence data
A new set of data standards has emerged for modern biological data
Often called ‘MI’ data standards
They capture the ‘minimum information’ metadata (data about data) required to comprehend and share scientific data
Particularly for high-throughput data
It all started with MIAME (Minimum Information About a Microarray Experiment)
Now extends to proteomics, neurophysiology, genome sequences and even gel electrophoresis
If you are going to publish a microarray experiment, it is very likely that the journal you publish in will MANDATE that the data is annotated to MIAME standards AND deposited in a recognised repository for that data
–GEO
–ArrayExpress
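The ‘minimum information’ idea can be sketched in a few lines of Python: keep a small metadata record next to each data file, and refuse to save it until the required fields are filled in. This is only an illustration. The field names in `REQUIRED_FIELDS`, the `.meta.json` naming and the helper functions are all assumptions made for this example; they are not the official MIAME checklist, which should be taken from the standard itself.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical field names for illustration only; NOT the official MIAME checklist.
REQUIRED_FIELDS = (
    "experiment_title",
    "researcher",
    "date",
    "organism",
    "protocol",
    "lab_book_page",
)


def missing_fields(metadata):
    """Return the required fields that are absent or blank in `metadata`."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = metadata.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def write_sidecar(data_path, metadata):
    """Write '<data file>.meta.json' next to the data file and return its path.

    Raises ValueError if any required field is missing, so incomplete
    records are caught when the data is saved rather than years later.
    """
    data_path = Path(data_path)
    missing = missing_fields(metadata)
    if missing:
        raise ValueError("missing metadata: " + ", ".join(missing))

    record = dict(metadata)
    record["data_file"] = data_path.name
    # A checksum ties the record to one exact version of the data file.
    record["sha256"] = hashlib.sha256(data_path.read_bytes()).hexdigest()

    sidecar = data_path.with_name(data_path.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2, sort_keys=True))
    return sidecar
```

Calling `write_sidecar("scan_01.csv", {...})` with a complete record would leave a `scan_01.csv.meta.json` file beside the data. Including the lab book page in the record is one possible way to link the hard drive back to the paper notebook.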
Why stop at data?
Whilst the UK research councils (RCUK) are moving to policies where data is openly deposited, other scientific information is also being openly released
Open Access publication: a new paradigm for journals (they charge no subscription fees)
Scientists are beginning to really utilise the internet to share data and ideas, and to foster collaborations
But why?
–The realisation that the data in your lab books is ‘entombed’. Unless you’re going to commercialise it, or it’s going to win you a Nobel... why not share?
But still people argue about sharing
“I don’t want to be scooped!”
“My data isn’t very good”
“I am hoping to commercialise this some day”
Open notebook science
A new concept being pioneered by some scientists
Using ‘Web 2.0’ tools (i.e. user-generated content)
A combination of:
–Blogs: even if you’re not sharing data, why not share some ideas?
–Wikis: like lab books on steroids; you can link them to all kinds of external resources and open them up to the world
–Other collaborative tools
To sum up
Be aware of your funding body’s expectations for releasing your data to the public
The more metadata you capture about your work, the easier it will be to comply with data standards later
Don’t be afraid to use technology: keeping track of science is hard, and there’s no way to Google a lab book!
Engage with online communities: many a collaboration has been formed via a blog post!
Want to talk about how best to analyse and store your digital data? Come and talk to us!