Well-oiled cogs meshing perfectly (would be nice)
How well are things working?
— Cue the Tower of Babel analogy…
— The situation is improving with respect to standards
— But few tools, fewer carrots (though some sticks)
Why do we care about that?
— Data exchange / deposition
— Comprehensibility (/quality) of work
— Scope for reuse (parallel or orthogonal)
“Publicly-funded research data are a public good, produced in the public interest.”
“Publicly-funded research data should be openly available to the maximum extent possible.”
Modelling the biosciences (inefficiently)
Technologically-delineated views of the world: A: transcriptomics; B: proteomics; C: metabolomics
…and…
Biologically-delineated views of the world: A: plant biology; B: epidemiology; C: microbiology
…and…
Generic features (‘common core’)
— Description of source biomaterial
— Experimental design components
[Diagram labels: Arrays, Scanning, Columns, Gels, MS, FTIR, NMR]
‘Omics’ is about as useful as a chocolate teapot
— Assay: omics and miscellaneous techniques
— Investigation: medical syndrome, environmental effect, etc.
— Study: toxicology, environmental science, etc.
Reporting guidelines — a case in point
MIAME, MIAPE, MIAPA, MIACA, MIARE, MIFACE, MISFISHIE, MIGS, MIMIx, MIQAS, MIRIAM, (MIAFGE, MIAO), My Goodness…
‘MI’ checklists are usually developed independently, by groups working within particular biological or technological domains
— Difficult to obtain an overview of the full range of checklists
— Tracking the evolution of a single checklist is non-trivial
— Checklists are inevitably partially redundant with one another
— Where they overlap, arbitrary decisions on wording and sub-structuring make integration difficult
This creates significant difficulties for those who routinely combine information from multiple biological domains and technology platforms
— Example: an investigation looking at the impact of toxins on a sentinel species using proteomics (‘eco-toxico-proteomics’)
— Which reporting standard(s) should they be using?
Drafting MIBBI Foundry modules
The analytical approach proved ‘challenging’
— Cross-analyses were either too coarse or too depressing
— Conclusion: there is no ‘perfect’ solution…
If in doubt, hack (a.k.a. ‘iterative development’)
— Start with one set of guidelines, breaking it into ‘paragraphs’
— Add another set, breaking it up similarly (by ‘shared subject’)
— Where there are overlaps, seek to resolve them:
—— If similar, aim for an ‘average’ module
—— If distinct, use core and extension modules
—— Record dependencies in a matrix (for reference)
— ‘Normalise’ (look for efficiencies, up to a point)
Validation
— Asking for something like MIxxx should get something like MIxxx
— Weigh the conflicts/compromises; re-examine extensions, etc.
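The iterative merge above can be sketched in a few lines of Python. This is a toy illustration, not the actual MIBBI tooling: the checklist contents and the dict-based structures are hypothetical, and real reconciliation of overlapping paragraphs is a manual, editorial step.

```python
# Toy sketch of the Foundry merge step (hypothetical data structures,
# not the real MIBBI tooling). Each checklist is split into 'paragraphs'
# keyed by shared subject; overlaps collect under one candidate module,
# and a matrix records which checklist depends on which module.

def merge_checklists(checklists):
    """checklists: dict of checklist name -> {subject: paragraph text}."""
    modules = {}        # subject -> [(source checklist, paragraph), ...]
    dependencies = {}   # (checklist, subject) -> True  (the reference matrix)
    for name, paragraphs in checklists.items():
        for subject, text in paragraphs.items():
            modules.setdefault(subject, []).append((name, text))
            dependencies[(name, subject)] = True
    return modules, dependencies

# Illustrative inputs: two guidelines sharing a 'sample' subject.
miame = {"sample": "Describe the source biomaterial.",
         "arrays": "Describe the array design."}
miape = {"sample": "Describe the sample and its preparation.",
         "ms": "Describe the mass spectrometer settings."}

modules, deps = merge_checklists({"MIAME": miame, "MIAPE": miape})
# modules["sample"] now holds two paragraphs to reconcile into one
# 'average' module, or a core module plus extensions if truly distinct.
```

The human judgement (average versus core-plus-extension) happens on the collected paragraphs; the matrix is what lets validation check that asking for something like MIxxx still returns something like MIxxx.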
Current coverage: Portal versus Foundry
Checklists covered to date (x): MIGS/MIMS, MIAPE, MIFlowCyt, MIARE, ‘Env’ extensions
Modules developed to date: 35 (set to rise rapidly)…
— Context: Investigation; Study design; Study overview
— Source material: Organism; Organismal genetic component; Cell culture; Environment; Geographic location; Sampling event; Sample description; Biological sample description; Sample size; Sample processing
— Techniques: Column chromatography; Nucleic acid sequencing; Mass spectrometry; Capillary electrophoresis; Flow cytometry; Gel electrophoresis; RNAi assay
— Informatics: Nucleic acid sequencing data processing; Mass spectrometry informatics; Flow cytometry data analysis; Gel informatics; RNAi assay data analysis
— General: Person or responsible role; Date; Time; Data set; RNAi assay data set; Reagent; Fluorescence reagent; Publication; Organization; Database record
— Project-specific extensions: MIGS Investigation; MIGS Organism
‘Pedro’ tool → XML → (via XSLT) Wiki code (etc.)
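The XML-to-wiki step of this pipeline can be illustrated in Python. Stdlib Python has no XSLT engine, so this sketches the same mapping directly with `xml.etree.ElementTree`; the element names (`module`, `item`) are hypothetical, as the real Pedro output schema is not shown here.

```python
# Sketch of the XML -> wiki-text transform (plain Python standing in
# for XSLT; element names are hypothetical, not the Pedro schema).
import xml.etree.ElementTree as ET

def xml_to_wiki(xml_text):
    root = ET.fromstring(xml_text)
    lines = []
    for module in root.iter("module"):
        lines.append(f"== {module.get('name')} ==")   # wiki section heading
        for item in module.iter("item"):
            lines.append(f"* {item.text}")            # wiki bullet
    return "\n".join(lines)

example = """<checklist>
  <module name="Sample description">
    <item>Source biomaterial</item>
    <item>Sampling event</item>
  </module>
</checklist>"""

print(xml_to_wiki(example))
# == Sample description ==
# * Source biomaterial
# * Sampling event
```

In the actual pipeline an XSLT stylesheet plays this role, which keeps the mapping declarative and lets the same XML feed several output formats.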
Future directions for MICheckout?
Current status: a very simple interface
— Pick what you want, in the order you want
— Download or view it in the format you want
Issues with the current interface
— Pick what you want, in the order you want (= anarchy)
— No way to work out everything that you need (fiddly bits)
Different approaches
1. Wizard-based Q&A for normal users, plus an ‘advanced’ interface
— Simple ordered (ISA) questions for users; high-level concepts
— Advanced interface similar to the current one
2. Domain-specific MI-based concepts as keys/shortcuts
— “I normally get MIxxx — please give me the equivalent”
— Advanced access similar to #1
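Approach #2 amounts to a lookup from a familiar checklist name to the equivalent set of Foundry modules. A minimal sketch, assuming an illustrative mapping (the module lists below are not the real catalogue):

```python
# Sketch of approach #2: a familiar 'MI' checklist name acts as a
# shortcut to the equivalent Foundry modules. The mapping here is
# illustrative only, not the actual MICheckout catalogue.

MODULE_MAP = {
    "MIAPE": ["Investigation", "Sample description",
              "Mass spectrometry", "Mass spectrometry informatics"],
    "MIFlowCyt": ["Investigation", "Sample description",
                  "Flow cytometry", "Flow cytometry data analysis"],
}

def equivalent_modules(checklist):
    """'I normally get MIxxx - please give me the equivalent.'"""
    try:
        return MODULE_MAP[checklist]
    except KeyError:
        raise ValueError(f"No Foundry mapping registered for {checklist!r}")

print(equivalent_modules("MIAPE"))
```

The same mapping could back the wizard in approach #1: the user's answers to the ordered ISA questions select checklist keys, which resolve to modules behind the scenes.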
The International Conference on Systems Biology (ICSB), 22–28 August 2008. Susanna-Assunta Sansone, www.ebi.ac.uk/net-project
Example of guiding the experimentalist to search for and select a term from the EnvO ontology to describe the habitat of a sample. (Ontologies are accessed in real time via the Ontology Lookup Service and BioPortal.)
BII @ the NERC Environmental Bioinformatics Centre
ISA Software Users
Several groups have now begun to use all or part of the ISA software suite
— Easy to get going using the data entry tool alone (ISAcreator)
— Power users can reconfigure ISAcreator to meet local needs (ISAconfigurator)
— Some skill is required to install the full suite (back-end components)
This satisfies two needs:
1. Internal data management
2. The requirement to share data
The BioSharing project provides stable web-based catalogues and a user forum. The project seeks to:
— Build links between journals, funders and well-constituted standardization efforts in the biosciences; e.g., BMC: http://is.gd/WIMqz3
— Expedite the production of an integrated standards-based framework for the biosciences
Coming soon:
— IDs/DOIs for all items
— Domain-specific views of standards; feedback required: http://is.gd/biosharing_feedback (@ISMB 2011: http://is.gd/biosharing_ISMB_2011)
MIBBI and BioSharing: proposals to PSI
BioSharing
— Provide/maintain up-to-date information (content)
— Offer feedback on the site’s functionality as it matures
MIBBI: three options
1. Maintain the status quo: MIBBI (and BioSharing) scrape information
— Passive participation only; no real impact (or additional benefit)
— Draw on MIBBI for the description of sample and study context only
2. Use the MIBBI Portal as the source for the most current MIAPE (+?)
— MIBBI XML can be transformed into several output types
— MIBBI and BioSharing sites are increasingly visible to users
3. Participate in the MIBBI Foundry activity (as well as the Portal)
— Maintain ‘independent’ MIAPE documents (Portal), but…
— Take (joint) ownership of the appropriate Foundry modules
— Use the Foundry to re-engineer MIAPE+ where necessary
— Show support for integrated cross-domain reporting
Acknowledgements
MIBBI: Chris Taylor (EBI, NEBC), Susanna-Assunta Sansone (U. Oxford), Dawn Field (NEBC), with contributions from participants in MIBBI-registered projects.
BioSharing: Susanna-Assunta Sansone (U. Oxford), Dawn Field (NEBC), Philippe Rocca-Serra (U. Oxford), Annapaola Santarsiero (Mario Negri Institute; U. Oxford), Eamonn Maguire (U. Oxford), Chris Taylor (EBI, NEBC), with contributions from numerous communities and individuals.
ISA Infrastructure: Susanna-Assunta Sansone, Philippe Rocca-Serra, Eamonn Maguire (U. Oxford); Chris Taylor, Marco Brandizi, Gabriella Rustici, Nataliya Sklyar, Manon Delahaye, Richard Evan (EBI); Kimberly Begley, Dorothy Reilly, Oliver Hofmann, Winston Hide (Harvard School of Public Health); Hong Fang, Joshua Xu, Martin Jackson, Jie Zhang, Stephen Harris, Weida Tong (FDA Center for Bioinformatics); Tim Booth, Bela Tiwari, Norman Morrison, Dawn Field (NEBC); Steffen Neumann (Leibniz Institute of Plant Biochemistry); Peter Sterk, Jack Gilbert, Folker Meyer, Linda Amaral-Zettler, Dawn Field (GSC); Alain Zasadzinski, Marie-Christine Jacquemot, Florian Mazur, Damien Fleury, Yahia Berchi, Morad Mercheref, Claude Niederlander, Magali Roux (CNRS Institute of Biological Sciences); Audrey Kauffman (Bergonie Cancer Institute); Miroslaw Dylag (Mentor Software Ltd.).
Funding: NEBC, NERC, BBSRC.
The objections to fuller reporting
“Why should I dedicate resources to providing data to others?”
— Pro bono arguments have no impact (altruism is a myth)
— Sticks wielded by funders and publishers get only the bare minimum
— No traceability in most contexts (intellectual property = ?)
— Loss of competitive advantage (both direct and indirect)
“This is just a ‘make-work’ scheme for bioinformaticians”
— Bioinformaticians get a buzz out of having big databases
— Parasites benefitting from others’ work (mutualism..?)
“I don’t trust anyone else’s data — I’d rather repeat the work”
— Problems of quality, which are justified to an extent
— But what of people lacking resources or specific expertise?
“How on earth am I supposed to do this anyway?”
— Perception that there is no money to pay for this
— No mature free tools; Excel sheets are no good for high-throughput work
— Worries about vendor support and legacy systems (business models)
Credit where credit’s due
Data sharing is more or less a given now, and tools are emerging
— Lots of sticks, but they get only the bare minimum
— How do we get the best out of data generators?
— We need standards- and user-friendly tools, and meaningful credit
Central registries of data sets that can record deposit and reuse
— Well-presented, detailed papers get cited more frequently
— The same principle should apply to data sets (metadata, etc.)
— ORCIDs for people (orcid.org), DOIs for data (datacite.org)
Side benefits and challenges
— Would also clear up problems around paper authorship
— Would enable other kinds of credit (training, curation, etc.)
— Community policing: researchers ‘own’ their credit portfolio (an enforcement body would be useful, but reviewers are the most likely enforcers)
— The problem of ‘micro data sets’ and legacy data remains