David De Roure Creating Research Objects that contain collections of data, papers and research workflows.


1 David De Roure Creating Research Objects that contain collections of data, papers and research workflows

2 “Web as carrier pigeon”

3

4 BioEssays, 26(1):99–105, January 2004 http://research.microsoft.com/en-us/collaboration/fourthparadigm/

5 Social Complexity Compute & Data Complexity

6 http://force11.org/

7 Outline
1. The myExperiment experiment
2. Workflow Forever
3. Science fiction about science facts

8 Workflows are the new rock and roll
Data Analysis Pipelines
Machinery for coordinating the execution of services and linking together resources
Repetitive and mundane boring stuff made easier
E. Science laboris

9 Reuse, Recycling, Repurposing
Paul writes workflows for identifying biological pathways implicated in resistance to Trypanosomiasis in cattle.
Paul meets Jo. Jo is investigating Whipworm in mouse.
Jo reuses one of Paul’s workflows without change.
Jo identifies the biological pathways involved in sex dependence in the mouse model, believed to be involved in the ability of mice to expel the parasite.
Previously a manual two-year study by Jo had failed to do this.

10 Kepler Triana BPEL Taverna Trident Meandre Galaxy

11

12 mySpace for scientists! Facebook Not too open! too passé!

13 “Facebook for Scientists”...but different to Facebook!
• A repository of research methods
• A community social network of people and things
• A Social Virtual Research Environment
• A probe into researcher behaviour
• Open source (BSD) Ruby on Rails app
• REST and SPARQL interfaces, supports Linked Data
• Influenced BioCatalogue, MethodBox and SysMO-SEEK
myExperiment currently has 309 groups, 2553 workflows, 651 files and 264 packs - see wiki.myexperiment.org

14 http://www.myexperiment.org/

15 [Diagram: Paul’s Pack / Paul’s Research Object. Nodes: Workflow 16, Workflow 13, Common pathways, QTL, Results, Logs, Metadata, Paper, Slides. Edge labels: feeds into, produces, included in, published in.]

16 data method

17 SELECT ?pack ?contrib
WHERE {
  ?pack rdf:type mepack:Pack .
  ?pack ore:aggregates ?contrib .
}

SELECT ?wf ?uri
WHERE {
  ?wf mebase:has-current-version ?v .
  ?v mecomp:executes-dataflow ?d .
  ?d mecomp:has-component ?c .
  ?c rdf:type mecomp:WSDLProcessor .
  ?c mecomp:processor-uri ?uri .
}
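A minimal sketch of how the first query on this slide might be packaged for a SPARQL endpoint over HTTP. It only builds the request URL and sends nothing. The endpoint address and the `mepack:` namespace URI are assumptions, not taken from the slides. The `rdf:` and `ore:` namespaces are the standard ones, and the helper names are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumption: the endpoint address is illustrative; the slides do not give it.
ENDPOINT = "http://rdf.myexperiment.org/sparql"

# rdf: and ore: are the standard namespaces.
# Assumption: the mepack: namespace URI is a guess and should be checked
# against the myExperiment ontology documentation.
PACK_QUERY = """\
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ore: <http://www.openarchives.org/ore/terms/>
PREFIX mepack: <http://rdf.myexperiment.org/ontologies/packs/>

SELECT ?pack ?contrib
WHERE {
  ?pack rdf:type mepack:Pack .
  ?pack ore:aggregates ?contrib .
}
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Return a GET URL carrying the query in the 'query' parameter,
    as described by the SPARQL Protocol."""
    return endpoint + "?" + urlencode({"query": query})


def extract_query(url: str) -> str:
    """Recover the SPARQL text from a URL built by build_query_url."""
    return parse_qs(urlsplit(url).query)["query"][0]


if __name__ == "__main__":
    # Print the URL; fetching it is left to the reader.
    print(build_query_url(ENDPOINT, PACK_QUERY))
```

The second query on the slide could be passed to `build_query_url` in the same way once `mebase:` and `mecomp:` prefix declarations are added.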

18 Pack analysis
Workflow - pack contains a number of workflows
Presentation - encapsulation of a single presentation
Collection - a number of things (workflows/presentations/papers)
Heterogeneous - where the workflows do not appear to have a clear common purpose
Homogeneous - workflows appear to be designed to work together
Paper - source for a paper
Tutorial - tutorial material
Data - collection of data files
Derived data - results of workflow
Benchmark - benchmarking data
Supplementary - stuff associated with a paper
Noise - tests, tryouts, rubbish
Oddity - none of the above
Analysis by Sean Bechhofer

19 The R dimensions
Reusable. The key tenet of Research Objects is to support the sharing and reuse of data, methods and processes.
Repurposeable. Reuse may also involve the reuse of constituent parts of the Research Object.
Repeatable. There should be sufficient information in a Research Object to be able to repeat the study, perhaps years later.
Reproducible. A third party can start with the same inputs and methods and see if a prior result can be confirmed.
Replayable. Studies might involve single investigations that happen in milliseconds or protracted processes that take years.
Referenceable. If research objects are to augment or replace traditional publication methods, then they must be referenceable or citeable.
Revealable. Third parties must be able to audit the steps performed in the research in order to be convinced of the validity of results.
Respectful. Explicit representations of the provenance, lineage and flow of intellectual property.
“Replacing the Paper: The Twelve Rs of the e-Research Record” on http://blogs.scilogs.com/eresearch/

20 http://wf4ever.github.com/ro-primer/

21 http://www.wf4ever-project.org/

22

23 [Mock-up of a Research Object summary page. Annotated regions: title and basic facts; abstract (250 chars max.); simple status indicators; key resources inside; resources diagram; popularity; users’ opinion; collapsed tabs. Sample content: “20 items in this RO, including 3 big workflows and a small pack”; fields for Research Object, Last execution, Stability, Decay, Annotations, Evolution; Aggregated resources (20); Reused by 4 users; Cited by 3 users; Liked by 13 users.]

24 Q. Are we locking into the paper process?
Publish then filter – put everything out there, then see what sticks
Web-Particle duality – versioning, conservation, preservation

25 [Diagram: REPRODUCE OR REPEAT? Elements: machine, software, workflow, paper, Research Record. Source: blogs.scilogs.com/eresearch/]

26 openresearchsoftware.metajnl.com www.scfbm.org

27 The Executable Thesis new data new results executable thesis PhD Student

28 Notifications and automatic re-runs
Machines are users too
Autonomic Curation
Self-repair
New research? New computer science?

29 Luna De Ferrari

30 Knowledge Infrastructures
“Knowledge infrastructures comprise robust networks of people, artifacts, and institutions that generate, share, and maintain specific knowledge about the human and natural worlds”
“Rethinking knowledge now that the facts aren't the facts, experts are everywhere, and the smartest person in the room is the room”

31 Discussion
Automation versus assistance
  – Letting humans get on with what they’re best at
Role of narrative and visualisation
  – The last mile to the brain
Data quality and uncertainty
  – Data wrangling is a significant task today
  – Provenance, peer-to-peer review?
Responsible Innovation
  – Who owns the intellectual property?
  – Who is responsible for damage?
Enabling or preventing a paradigm shift?
  – Encoding a research paradigm in the infrastructure?

32 david.deroure@oerc.ox.ac.uk www.oerc.ox.ac.uk/people/dder blogs.scilogs.com/eresearch @dder http://www.myexperiment.org/packs/329

33 Links
myExperiment project wiki http://wiki.myexperiment.org/
Workflow Forever project (Wf4Ever) http://www.wf4ever-project.org/
Future of Research Communication (FORCE11) http://force11.org/
Fourth Paradigm http://research.microsoft.com/en-us/collaboration/fourthparadigm/

34 Jun Zhao, Jose Manuel Gomez-Perez, Khalid Belhajjame, Graham Klyne, Esteban Garcia-Cuesta, Aleix Garrido, Kristina Hettne, Marco Roos, David De Roure, Carole Goble, "Why Workflows Break - Understanding and Combating Decay in Taverna Workflows", accepted for eScience 2012, Chicago, October 2012.
Khalid Belhajjame, Oscar Corcho, Daniel Garijo, Jun Zhao, Paolo Missier, David Newman, Raul Palma, Sean Bechhofer, Esteban Garcia-Cuesta, Jose Manuel Gomez-Perez, Graham Klyne, Kevin Page, Marco Roos, Jose Enrique Ruiz, Stian Soiland-Reyes, Lourdes Verdes-Montenegro, David De Roure and Carole A. Goble, "Workflow-Centric Research Objects: First Class Citizens in Scholarly Discourse", SePublica2012 at ESWC2012, Greece, May 2012.
Carole A. Goble, David De Roure and Sean Bechhofer, "Accelerating scientists’ knowledge turns". In press for publication in Lecture Notes in Computer Science.
Sean Bechhofer, Iain Buchan, David De Roure, Paolo Missier, John Ainsworth, Jiten Bhagat, Philip Couch, Don Cruickshank, Mark Delderfield, Ian Dunlop, Matthew Gamble, Danius Michaelides, Stuart Owen, David Newman, Shoaib Sufi, Carole Goble, "Why linked data is not enough for scientists", Future Generation Computer Systems.
De Roure, D., Goble, C. and Stevens, R. (2009) The Design and Realisation of the myExperiment Virtual Research Environment for Social Sharing of Workflows. Future Generation Computer Systems 25, pp. 561-567. doi:10.1016/j.future.2008.06.010
Goble, C.A., Bhagat, J., Aleksejevs, S., Cruickshank, D., Michaelides, D., Newman, D., Borkum, M., Bechhofer, S., Roos, M., Li, P., and De Roure, D.: myExperiment: a repository and social network for the sharing of bioinformatics workflows, Nucl. Acids Res., 2010

