ISWC 2005, Galway
Seven Bottlenecks to Workflow Reuse and Repurposing
Antoon Goderis, Ulrike Sattler, Phillip Lord, Carole Goble
University of Manchester
Take home message
New problem
– Workflow reuse and repurposing is happening: how do we make it scale?
Data: survey of 6 e-Science middleware projects
Requirements analysis: 7 bottlenecks
– Creating a pool of process knowledge
– Accessing this pool
e-Science
Support for sharing and collaboratories in science
The world of distributed web services
– A boom in services: e.g. 1800+ bio services in the myGrid project
Pulled together as in silico experiments
– Scientist-friendly workflow languages
– Hard to build (>1 year!)
– A boom in workflows? 100 workflows in myGrid, with up to 50 services each
Evolving e-Science to a Web of Science?
In silico experiments as commodities and know-how
Share, reuse, repurpose: authoring time, quality and provenance collection
(Diagram: Manchester CS; Manchester Biology; Newcastle CS)
Reuse and repurposing cycle (diagram)
Actors: scientists & developers; 3rd-party annotation providers
– Discover existing work
– Edit workflow (repurposing actions)
– Try out workflow
– Register and annotate workflow and new services for reuse
– Deploy workflow
– Workflow by example
– Maintain reuse/repurpose history
Wroe, Goble, Goderis, Lord et al. Recycling workflows and services through discovery and reuse. CCPE 2005
Analyze this
#scientists × #workflows × #versions × #runs
Workflow vs. Web service
Workflow
– Describes a process
– Different workflow languages: BPEL, Scufl, etc.
– Orchestration/choreography of Web and web services
– Executable with a workflow enactor
– Can be published as a web or Web service
Web service
– SOAP/WSDL interface
– Participant in a workflow
– Executable
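The contrast between a workflow (a process description run by an enactor) and a service (an executable participant) can be illustrated with a toy enactor. This is a minimal sketch, assuming a workflow is a dict of named steps, each a local Python callable plus the names of the steps feeding it; `enact`, `as_service` and the two toy services are hypothetical stand-ins, not Scufl or BPEL semantics, and real steps would be remote web service calls.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, List, Tuple

# A step is (service callable, names of upstream steps whose outputs feed it).
Step = Tuple[Callable[..., object], List[str]]


def enact(workflow: Dict[str, Step], inputs: Dict[str, object]) -> Dict[str, object]:
    """Run each step once all its upstream results exist (data-flow order)."""
    results: Dict[str, object] = dict(inputs)  # workflow inputs act as finished steps
    graph = {name: set(deps) for name, (_, deps) in workflow.items()}
    for name in TopologicalSorter(graph).static_order():
        if name in results:  # a workflow input: nothing to execute
            continue
        service, deps = workflow[name]
        results[name] = service(*(results[d] for d in deps))
    return results


def as_service(workflow: Dict[str, Step], output: str) -> Callable[..., object]:
    """'Publish' a workflow behind a single call interface, like a service."""
    return lambda **kwargs: enact(workflow, kwargs)[output]


# Hypothetical toy services standing in for remote bio services.
def reverse_complement(seq: str) -> str:
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq))


def gc_content(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


wf = {
    "revcomp": (reverse_complement, ["sequence"]),
    "gc": (gc_content, ["revcomp"]),
}
out = enact(wf, {"sequence": "ACAAGATGCCATTGT"})
gc_service = as_service(wf, "gc")
```

Here the workflow is editable data (steps can be swapped or rewired), while `gc_service` is an encapsulated callable, which mirrors the two columns above.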
Workflow reuse vs. Web service reuse
Workflow reuse
– Reuse of editable processes
– Repurpose / build on other people's work
– Hackable: change data/control flow
– Discovery based on data/control flow
– Measures of aggregated task similarity and flow similarity
Web service reuse
– Reuse of encapsulated processes
– Incorporate other people's work
– Parametrisable operations
– Discovery based on WSDL operations
– Measures of task similarity
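The difference between the two similarity measures can be sketched as follows, assuming each task is described by a set of annotation terms and a workflow is a dict of tasks plus a set of data links between task names. The Jaccard overlap, best-match averaging and the weight `alpha` are illustrative choices, not measures proposed in the talk; all names are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

# A workflow: task name -> annotation terms, plus data links (source, target).
Workflow = Tuple[Dict[str, FrozenSet[str]], Set[Tuple[str, str]]]


def jaccard(a: Set, b: Set) -> float:
    """Overlap of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def task_similarity(t1: FrozenSet[str], t2: FrozenSet[str]) -> float:
    """Service-level measure: overlap of the terms describing two tasks."""
    return jaccard(set(t1), set(t2))


def workflow_similarity(w1: Workflow, w2: Workflow, alpha: float = 0.5) -> float:
    """Workflow-level measure: aggregated task similarity plus flow similarity."""
    tasks1, links1 = w1
    tasks2, links2 = w2
    if tasks1 and tasks2:
        # Match each task of w1 to its most similar task in w2, then average.
        best = [max(task_similarity(t, u) for u in tasks2.values())
                for t in tasks1.values()]
        aggregated = sum(best) / len(best)
    else:
        aggregated = 0.0
    flow = jaccard(links1, links2)  # compares data links by task name
    return alpha * aggregated + (1 - alpha) * flow


w1: Workflow = ({"blast": frozenset({"sequence", "alignment"}),
                 "render": frozenset({"alignment", "image"})},
                {("blast", "render")})
w2: Workflow = (w1[0], set())  # same tasks, but no data link between them
```

The service measure only sees the tasks; the workflow measure also penalises `w2` for lacking the data link, so the two scores diverge exactly where data/control flow matters.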
Repurposing, discovery and composition
Discovery
– The process of finding, ranking and selecting existing resources
Composition
– The process of combining resources into a new working assembly
– (Auto-)discovery + (auto-)integration
Repurposing
– Auto-discovery + manual integration
– Needs techniques for composition-oriented discovery: discovery that supports integration through rankings
A field report of six projects
www.myGrid.org.uk
– Reuse by collaborators
– Personal reuse (versioning)
www.kepler-project.org
– 10 complex workflows
– Reuse of distributed execution models
www.inforsense.com
– Intranet exchanges within large pharmas
www.geodise.org
– 150 Matlab functions, 10 scripts
– Reuse of function combinations
No support for comparing workflows! No third-party reuse!
7 bottlenecks to reuse & repurposing
– Service availability
– Workflow interoperability
– Workflow rigidity
– Discovery model
– Process knowledge acquisition (KA)
– IP rights
– Ranking
We are here
Step 1: Collect as many workflows as possible
Step 2: Make this collection usable
e-Science community
Semantic Web community?
Wanted: technology providers
The bottlenecks, in more detail
1. Service availability
– Web services: Kepler actors, myGrid processors, InforSense services
– Local services: Web-enable, encode, repository
2. Intellectual property rights
– Anonymization; journal policies
3. Workflow rigidity
– Evolution and adaptation: parametrisation
4. The nice thing about workflow standards…
Workflow languages abound
– Out of 6 projects, 5 do not use BPEL
– Behavioural semantics left implicit, as a feature
Repurposing in case of multiple workflow systems
– Outside system boundaries
– And across them
(Figure: Benesh notation; Labanotation)
4. The nice thing about workflow standards…
Bring out the behavioural semantics
– Comparing 3 projects through workflow patterns, e.g. simple merge
– Scientific workflows use functional programming patterns
– How do these combine into different distributed execution models?
– WSMO/SWSI/OWL-S?
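To make the two kinds of pattern concrete, here is a minimal sketch in plain Python, assuming steps are local functions: an exclusive choice followed by a simple merge (alternative branches converge without synchronisation, since only one of them is active), and a functional "map" that applies one service to every item of a list. Function names are hypothetical; how each workflow system actually implements these patterns is exactly the implicit behavioural semantics discussed above.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def choice_then_simple_merge(x: T,
                             condition: Callable[[T], bool],
                             branch_a: Callable[[T], R],
                             branch_b: Callable[[T], R],
                             after: Callable[[R], object]) -> object:
    # Exclusive choice: exactly one branch is executed.
    y = branch_a(x) if condition(x) else branch_b(x)
    # Simple merge: the downstream step fires once, on whichever branch
    # completed; no synchronisation is needed because only one was active.
    return after(y)


def map_step(service: Callable[[T], R], items: List[T]) -> List[R]:
    # Functional "map" pattern: the same service applied to each list item.
    return [service(item) for item in items]


even = choice_then_simple_merge(4, lambda n: n % 2 == 0,
                                lambda n: n // 2, lambda n: 3 * n + 1,
                                lambda n: n + 100)
odd = choice_then_simple_merge(3, lambda n: n % 2 == 0,
                               lambda n: n // 2, lambda n: 3 * n + 1,
                               lambda n: n + 100)
lengths = map_step(len, ["ACGT", "AC"])
```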
5. What belongs in the discovery model?
How to retrieve existing scientific workflows?
– Scientists & developers facing distributed programs
For scientists? Data flow discovery, phrased in domain jargon, largely abstracting from control
– e.g. data such as ACAAGATGCCATTGT
For developers? Control flow discovery, largely abstracting from data
– Workflow patterns, Kepler distributed execution models
– Process networks, process algebra, Petri nets…
5. What belongs in the discovery model?
For scientists
– WSMO Capability and OWL-S Profile clearly not intended for data flow-based queries
– OWL DL: A-Box based workflow queries [Goderis+DL'05]
For developers
– Workflow patterns, Kepler distributed execution models
– Pattern example based retrieval
– An early table of combined execution models
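A data flow-based query of the kind scientists would ask ("I have this data, I want that kind of result") can be sketched without any control flow, assuming each workflow is summarised by the concepts of its inputs and outputs and a tiny concept hierarchy stands in for an ontology. This is a plain-Python stand-in for illustration, not OWL DL reasoning or the A-Box approach cited above; `HIERARCHY`, `REPO` and the workflow names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical domain hierarchy (concept -> parent), standing in for an ontology.
HIERARCHY: Dict[str, Optional[str]] = {
    "biological_sequence": None,
    "nucleotide_sequence": "biological_sequence",
    "protein_sequence": "biological_sequence",
    "report": None,
    "alignment_report": "report",
}


def is_a(concept: Optional[str], ancestor: str) -> bool:
    """True if concept equals ancestor or lies below it in the hierarchy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = HIERARCHY.get(concept)
    return False


def find_by_data_flow(repository: Dict[str, Tuple[List[str], List[str]]],
                      have: List[str], want: str) -> List[str]:
    """Workflows whose every input can be fed from `have` and that output `want`."""
    hits = []
    for name, (inputs, outputs) in repository.items():
        consumes = all(any(is_a(h, i) for h in have) for i in inputs)
        produces = any(is_a(o, want) for o in outputs)
        if consumes and produces:
            hits.append(name)
    return sorted(hits)


# Hypothetical repository: workflow name -> (input concepts, output concepts).
REPO = {
    "blast_wf": (["biological_sequence"], ["alignment_report"]),
    "translate_wf": (["nucleotide_sequence"], ["protein_sequence"]),
    "render_wf": (["alignment_report"], ["image"]),
}
```

A query for a report starting from a nucleotide sequence, `find_by_data_flow(REPO, ["nucleotide_sequence"], "report")`, matches `blast_wf` only, because a nucleotide sequence is a kind of biological sequence and an alignment report is a kind of report.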
6. New challenges in Knowledge Acquisition
Who does the annotation?
What should be in the annotation?
– Workflow fragments
– Task aggregation/prediction
– "Service decomposition"
– The things that went wrong!
6. New challenges in Knowledge Acquisition
Who does the annotation?
– Updated service ontology learning and automated service annotation techniques
What should be in the annotation?
– Workflow fragments
– "Service decomposition"
– Cutting up service webs
» Social network analysis (services as users!)
– The things that went wrong
» Web site usability mining
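"Services as users" can be read as building a co-usage network: services are nodes, linked whenever they appear in the same workflow, and frequently co-used pairs are candidate workflow fragments. The sketch below assumes workflows are given as sets of service names and uses simple co-occurrence counts and degree; the `min_support` threshold and all names are hypothetical, and real social network analysis would go further (clustering, centrality, community detection).

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def co_usage(workflows: Dict[str, Set[str]]) -> Counter:
    """Count, for each pair of services, how many workflows use both."""
    pairs: Counter = Counter()
    for services in workflows.values():
        for a, b in combinations(sorted(services), 2):
            pairs[(a, b)] += 1
    return pairs


def degree(pairs: Counter) -> Counter:
    """Number of distinct services each service has been co-used with."""
    deg: Counter = Counter()
    for a, b in pairs:
        deg[a] += 1
        deg[b] += 1
    return deg


def candidate_fragments(pairs: Counter, min_support: int) -> List[Tuple[str, str]]:
    """Service pairs co-used in at least `min_support` workflows."""
    return sorted(p for p, n in pairs.items() if n >= min_support)


# Hypothetical workflow collection.
workflows = {
    "w1": {"fetch", "blast", "render"},
    "w2": {"fetch", "blast"},
    "w3": {"fetch", "translate"},
}
pairs = co_usage(workflows)
```

Here `fetch` and `blast` co-occur in two workflows, so they surface as a candidate fragment worth annotating once and reusing.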
7. Ranking workflow relevance
Repurposing: measuring integration effort
Ranking data flow (in jargon)
– Structural edit distance, e.g. the services to remove/add/replace to make 2 workflows equal
– For an OWL workflow ontology, this needs abduction or off-line processing
Ranking control flow
– Relationship between control flow constructs
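The "services to remove/add/replace" idea can be approximated by comparing workflows as multisets of services and ignoring their links, then ranking candidates by that count. This is a deliberately crude stand-in for a structural edit distance, assuming unit cost per operation; a full graph edit distance would also count changes to data links. Function and workflow names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def service_edit_ops(source: Iterable[str], target: Iterable[str]) -> Dict[str, int]:
    """Unit-cost operations turning source's services into target's (links ignored)."""
    src, tgt = Counter(source), Counter(target)
    only_src = sum((src - tgt).values())  # services present only in source
    only_tgt = sum((tgt - src).values())  # services needed only by target
    replace = min(only_src, only_tgt)     # pair each removal with an addition
    return {"replace": replace,
            "remove": only_src - replace,
            "add": only_tgt - replace}


def edit_distance(source: Iterable[str], target: Iterable[str]) -> int:
    return sum(service_edit_ops(source, target).values())


def rank_by_effort(query: List[str], repository: Dict[str, List[str]]) -> List[str]:
    """Candidates needing the fewest edits to become the query come first."""
    return sorted(repository,
                  key=lambda name: (edit_distance(repository[name], query), name))


query = ["fetch", "blast", "render"]
repository = {
    "w1": ["fetch", "blast"],               # one service to add
    "w2": ["fetch", "clustalw", "render"],  # one service to replace
    "w3": ["translate"],                    # one replace, two adds
}
ranking = rank_by_effort(query, repository)
```

Counting a paired removal and addition as one replacement keeps the score close to the integration effort a scientist would face when repurposing the candidate.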
Take home message
Problem: workflow reuse and repurposing is happening; how do we make it scale?
Data: survey of 6 e-Science middleware projects
Requirements analysis: 7 bottlenecks
– Creating a pool of process knowledge: workflow interoperability
– Accessing this pool of knowledge: workflow discovery, KA and ranking
Acknowledgements
This work is supported by the UK e-Science programme EPSRC GR/R67743. The authors would like to acknowledge the myGrid team. Hannah Tipney developed the Williams' syndrome workflow and is supported by The Wellcome Foundation (G/R:1061183). We thank the survey interviewees for their contribution: Chris Wroe, Mark Greenwood and Peter Li (myGrid), Ilkay Altintas (Kepler), Vasa Curcin (InforSense), Ian Wang (Triana), Colin Puleston (Geodise) and Ben Butchart (Sedna). Sean Bechhofer provided useful comments on the draft.