Workflow discovery in e-science Antoon Goderis Peter Li Carole Goble University of Manchester, UK www.cs.man.ac.uk/~goderisa.

Agenda
– Web services in science
– Workflow re-use
– Workflow discovery
   – Is workflow discovery a new problem?
   – How do people match up workflows?
   – Can we replicate the behaviour with tools?
– Conclusions

Workflows and Web services
– Web services: SOAP, WSDL description; readily invoked
– Workflows: BPEL, SCUFL, MoML, VDL … descriptions; run by a workflow engine that orchestrates (Web) services; a workflow can itself be published as a Web service

Science is highly distributed and connected

The Web has revolutionised science

Web services about to do the same?

Manchester, CS dept; Manchester, Biology dept; Newcastle, CS dept

Scientific workflows
e-science = supporting scientists to encode, enact, explain and share experimental procedures
Case study: bioinformatics
– Understanding the DNA to life link
– 3000 bio-services via the Taverna workflow editor
– Re-use and repurposing of workflow fragments
– About 200 Taverna workflows shared at …

One + three questions
– Can't we just do it with …? Keyword search doesn't seem to cut it.
1. Is workflow discovery a new problem?
2. How do people match up workflows?
3. Can we replicate the behaviour with tools?

My current workflow → myExperiment.org ?

1. Is workflow discovery a new problem? Yes

                  | Service discovery         | Workflow discovery
Discovery goal    | Encapsulate found service | Edit found workflow
Matching process  | Match over signature      | Match over signature and content (data and service flow)
Starting context  | Service or data           | Service or data or workflow

Source: survey of 21 myGrid/Taverna users
Workflow discovery subsumes service discovery.
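
The difference in matching process can be illustrated with a small Python sketch. It assumes a hypothetical workflow model (input and output types plus the set of services inside) and uses Jaccard set overlap with an arbitrary 50/50 weighting as a stand-in similarity; none of this is the authors' method. A service-discovery-style matcher sees only the signature, while a workflow-discovery-style matcher also looks at the content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    """Hypothetical minimal workflow model used only for this illustration."""
    name: str
    inputs: frozenset
    outputs: frozenset
    services: frozenset = frozenset()


def jaccard(a: frozenset, b: frozenset) -> float:
    """Set overlap in [0, 1]; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def signature_score(query: Workflow, candidate: Workflow) -> float:
    """Service-discovery style: only the inputs and outputs are compared."""
    return (jaccard(query.inputs, candidate.inputs)
            + jaccard(query.outputs, candidate.outputs)) / 2


def content_score(query: Workflow, candidate: Workflow, weight: float = 0.5) -> float:
    """Workflow-discovery style: signature plus the inner services (assumed weighting)."""
    return ((1 - weight) * signature_score(query, candidate)
            + weight * jaccard(query.services, candidate.services))


# Hypothetical example: both candidates share the query's signature.
query = Workflow("query", frozenset({"probeSetid"}), frozenset({"Results_Blastx"}),
                 frozenset({"AffyMapper_seq", "Blastx"}))
cand_a = Workflow("cand_a", frozenset({"probeSetid"}), frozenset({"Results_Blastx"}),
                  frozenset({"AffyMapper_seq", "Blastn"}))
cand_b = Workflow("cand_b", frozenset({"probeSetid"}), frozenset({"Results_Blastx"}),
                  frozenset({"AffyMapper_seq", "Blastx"}))

print(signature_score(query, cand_a), signature_score(query, cand_b))  # 1.0 1.0
print(round(content_score(query, cand_a), 3), content_score(query, cand_b))  # 0.667 1.0
```

Both candidates look identical to the signature matcher; only the content score separates them, which is one way to read "workflow discovery subsumes service discovery".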

2. How do people match up workflows?

3. Can we replicate the behaviour with tools?

A user experiment with bioinformatics workflows

Workflow discovery task
Can I sensibly adapt an existing experimental procedure (workflow) using another one?
– Extend
– Replace

Workflow corpus
– 6 similar workflows for Graves' disease, all by a single author (1 input workflow + 5 others)
– Shown as workflow diagrams only: no documentation, no annotation

By the experts, for the experts: 9 bioinformaticians and 4 developers at a Taverna training day

Matching strategies: matching the input workflow with 5 others

Human online matching strategies!
– Traits
– Scores of attraction
– Yes or no

Matching strategy: traits (from an analysis of profiles)

Men want                                      | Women want
Short-term relationship                       | Long-term relationship
Slim                                          | Tall
Students, artists, musicians, veterinarians   | Lawyers, financial execs, firemen
Blonde                                        | Hair or shaved
Medium income                                 | High income

Matching strategy: scoring
– Confidence level
– Score
– Percentile

Matching strategy: yes or no

Traits and score
Predictive traits:
– Biological subtask
– Biological supertask
– Shared inputs + outputs
– Same service type
– Shared service compositions
– Shared path between intermediary input and output
Score of similarity, usefulness and confidence, e.g. from 1 (identical) to 9 (not similar)

The gold standard
– The collection of workflow similarity assessments
– Predictive traits, possibly interacting
– Traits/score

2. How do people match up workflows?
Difficulty of task
– Biological relationship: very difficult for 6 out of 9
– Shape similarity: difficult for 4 out of 13
– Medium confidence
Consistency
– Participants disagreed on how to order biological similarity and shape similarity [Spearman rank-order test]
Predictive traits
– No single trait dominates, either between or within participants [Levene homogeneity-of-variance test]
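
For readers unfamiliar with the Spearman rank-order test, here is a minimal Python sketch of the coefficient it is built on, using the no-ties formula rho = 1 - 6·Σd² / (n(n² - 1)). The scores are hypothetical (two participants rating five candidate workflows on the 1 to 9 scale) and do not reproduce the study's data; real ratings on a 9-point scale will often tie, in which case a tie-aware implementation such as `scipy.stats.spearmanr` is the safer choice.

```python
def ranks(scores):
    """Rank positions, 1 = lowest score (most similar on a 1 to 9 scale); assumes no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman_rho(a, b):
    """Spearman's rank correlation via 1 - 6*sum(d^2) / (n(n^2 - 1)); valid only without ties."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length score lists with at least 2 items")
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d_squared = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical scores from two participants for the same five candidate workflows.
participant_1 = [2, 5, 1, 8, 4]
participant_2 = [3, 4, 1, 9, 6]
print(spearman_rho(participant_1, participant_2))  # 0.9
```

A value near 1 means the two participants order the candidates the same way; values near 0 or below indicate the kind of disagreement reported on the slide.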

Can we do better?
– Simpler tasks and workflows
– Users experienced with Taverna
– Workflow documentation and annotation
– Other factors in use, e.g. size difference
– Fix the allowed factors
– Adopt a black-box approach: yes/no matching

Automated discovery technique
– Unattributed graph matcher implementation by Messmer and Bunke
   – Subgraph isomorphism detection; exponential time complexity
   – DAGs, and optimization for a repository of graphs
– Workflows parsed as graphs
   – Workflow inputs, workflow outputs and intermediate services as nodes
   – Data links as edges
   – Example nodes: probeSetid, AffyMapper_seq, databaseid, Blastx, Results_Blastx
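
The graph view can be sketched in Python. Data links are turned into a directed graph whose nodes are labelled by input, output or service name, and a plain backtracking search looks for an embedding of a query fragment in a repository workflow. This is a naive stand-in, not the Messmer and Bunke decision-tree matcher named on the slide, although it shares the exponential worst case. The link list is hypothetical: the slide shows only the node names, not how they are connected.

```python
def graph_from_links(links):
    """Build (labels, edges) from (source, sink) data links; node ids double as labels here."""
    labels, edges = {}, set()
    for source, sink in links:
        labels[source] = source
        labels[sink] = sink
        edges.add((source, sink))
    return labels, edges


def find_embedding(pattern, target):
    """Return a label-preserving injective node mapping that keeps every pattern edge, or None."""
    p_labels, p_edges = pattern
    t_labels, t_edges = target
    p_nodes = list(p_labels)
    mapping, used = {}, set()

    def consistent(p, t):
        if p_labels[p] != t_labels[t]:
            return False
        for u, v in p_edges:
            # Every pattern edge between p and an already-mapped node must exist in the target.
            # (Workflow graphs are DAGs, so self-loops are not checked.)
            if u == p and v in mapping and (t, mapping[v]) not in t_edges:
                return False
            if v == p and u in mapping and (mapping[u], t) not in t_edges:
                return False
        return True

    def extend(i):
        if i == len(p_nodes):
            return True
        p = p_nodes[i]
        for t in t_labels:
            if t not in used and consistent(p, t):
                mapping[p] = t
                used.add(t)
                if extend(i + 1):
                    return True
                del mapping[p]  # backtrack
                used.discard(t)
        return False

    return dict(mapping) if extend(0) else None


# Hypothetical repository workflow built from the node names on the slide.
repo_wf = graph_from_links([
    ("probeSetid", "AffyMapper_seq"),
    ("databaseid", "AffyMapper_seq"),
    ("AffyMapper_seq", "Blastx"),
    ("Blastx", "Results_Blastx"),
])
fragment = graph_from_links([("AffyMapper_seq", "Blastx"), ("Blastx", "Results_Blastx")])
print(find_embedding(fragment, repo_wf) is not None)  # True
print(find_embedding(graph_from_links([("Blastx", "AffyMapper_seq")]), repo_wf))  # None
```

The second query fails because the data link runs the wrong way, which is the kind of structural difference a keyword search would miss.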

Automated discovery technique: ranking based on
– shared nodes
– difference in size between the input graph and each repository graph
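
The slide names two ranking criteria but not how they are combined. A minimal Python sketch, assuming a lexicographic order (most shared nodes first, then smallest size difference) and comparing node labels as sets rather than via graph matching, could look like this. The repository entries are hypothetical.

```python
def rank_repository(query_nodes, repository):
    """Order repository workflows best-first: most shared nodes, then smallest size difference.

    Nodes are compared as label sets here, a simplification of graph-based matching.
    """
    def sort_key(item):
        name, nodes = item
        shared = len(query_nodes & nodes)
        size_difference = abs(len(query_nodes) - len(nodes))
        return (-shared, size_difference, name)  # name breaks remaining ties deterministically

    return [name for name, _ in sorted(repository.items(), key=sort_key)]


query = {"probeSetid", "AffyMapper_seq", "databaseid", "Blastx", "Results_Blastx"}
repository = {  # hypothetical workflows
    "wf_a": {"probeSetid", "AffyMapper_seq", "Blastx", "Results_Blastx"},
    "wf_b": {"probeSetid", "AffyMapper_seq", "databaseid", "Blastx", "Results_Blastx",
             "Blastn", "Results_Blastn"},
    "wf_c": {"AffyMapper_seq", "Blastx", "Results_Blastx"},
    "wf_d": {"probeSetid", "AffyMapper_seq", "Blastx", "Results_Blastx", "Blastn"},
}
print(rank_repository(query, repository))  # ['wf_b', 'wf_d', 'wf_a', 'wf_c']
```

Here wf_d and wf_a share the same number of nodes with the query, so the size difference decides their order; a weighted sum of the two criteria would be an equally plausible design.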

3. Can we replicate the behaviour with tools? Kind of.
– Average similarity assessments across participants
– Traits/score

Current work
– Yes/no matching
– Text clustering
– OWL workflow ontology
– Precision / recall
– Graph matching
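
Precision and recall are the standard way to score yes/no matching against a gold standard. A minimal Python sketch, with hypothetical workflow names and judgements, looks like this:

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant (0.0 when a set is empty)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical: a tool says "yes" to three workflows; the gold standard marks two as matches.
tool_yes = ["wf_b", "wf_d", "wf_a"]
gold_yes = ["wf_b", "wf_c"]
print(precision_recall(tool_yes, gold_yes))  # (0.3333333333333333, 0.5)
```

The hard part, as the take-home slide notes, is not the arithmetic but building a trustworthy gold_yes list in the first place.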

Take home
– Scientists compose Web services for real, and share their results
– Workflow discovery is a real problem, which subsumes service discovery
– A range of matching strategies and techniques apply
– Evaluation is a challenge: gold standards are hard to build
– Come and play at myExperiment.org
References at …