Workflow discovery in e-science Antoon Goderis Peter Li Carole Goble University of Manchester, UK www.cs.man.ac.uk/~goderisa.


1 Workflow discovery in e-science Antoon Goderis Peter Li Carole Goble University of Manchester, UK www.cs.man.ac.uk/~goderisa

2 Agenda Web services in science Workflow re-use Workflow discovery –Is workflow discovery a new problem? –How do people match up workflows? –Can we replicate the behaviour with tools? Conclusions

3 Workflows | Web services
BPEL, SCUFL, MOML, VDL … descriptions | SOAP, WSDL description
Workflow engine | Readily invoked
Orchestrates (Web-) services |
Can be published as Web service |

4 Science is highly distributed and connected

5 The Web has revolutionised science

6 Web services about to do the same?

7 Scientific workflows e-science = supporting scientists to encode, enact, explain and share experimental procedures featuring lots of specialised data Case study: bioinformatics –Understanding the DNA to behaviour link –3000 bio-services via the Taverna workflow editor http://mygrid.org.uk/taverna –Re-use and repurposing of workflows –+/- 200 Taverna workflows shared at fffff

8

9 Scientific workflows e-science = supporting scientists to encode, enact, explain and share experimental procedures Case study: bioinformatics –Understanding the DNA to life link –3000 bio-services via the Taverna workflow editor http://mygrid.org.uk/taverna –Re-use and repurposing of workflow fragments –+/- 200 Taverna workflows shared at fffff

10 Manchester, CS dept Manchester Biology dept Newcastle, CS dept

11 Scientific workflows e-science = supporting scientists to encode, enact, explain and share experimental procedures Case study: bioinformatics –Understanding the DNA to life link –3000 bio-services via the Taverna workflow editor http://mygrid.org.uk/taverna –Re-use and repurposing of workflow fragments –+/- 200 Taverna workflows shared at www.myExperiment.org

12

13 One + Three questions 1. Can’t we just do it with ? Keyword search doesn’t seem to cut it 1. Is workflow discovery a new problem? 2. How do people match up workflows? 3. Can we replicate the behaviour with tools?

14 my current workflow myExperiment.org

15 my current workflow myExperiment.org ?

16 1. Is workflow discovery a new problem?
 | Service discovery | Workflow discovery
Discovery goal | Encapsulate found service | Edit found workflow
Matching process | Match over signature | Match over signature and content (data and service flow)
Starting context | Service or data | Service or data or workflow
Source: survey of 21 myGrid/Taverna users

17 1. Is workflow discovery a new problem? Yes
 | Service discovery | Workflow discovery
Discovery goal | Encapsulate found service | Edit found workflow
Matching process | Match over signature | Match over signature and content (data and service flow)
Starting context | Service or data | Service or data or workflow
Workflow discovery subsumes service discovery

18 2. How do people match up workflows? ?

19 3. Can we replicate the behaviour with tools? ? + 1 2 3... 1 2 3

20 A user experiment with bioinformatics workflows ? +

21 Workflow discovery task Can I sensibly adapt an existing experimental procedure (workflow) with another one? Extend Replace + ?

22 Workflow corpus 66 similar workflows for Graves’ disease done by single author 1 + 5 workflows Workflow diagram No documentation No annotation 1 + 5

23 By the experts, for the experts 9 bioinformaticians and 4 developers at a Taverna training day

24 Matching strategies Matching input workflow with 5 others 1 2 3 4 5 ?

25 Human on-line matching strategies! Traits Scores of attraction Yes or no

26 Matching strategy: traits
Men want.. | Women want..
Short term relationship | Long term relationship
Slim | Tall
Students, artists, musicians, veterinarians | Lawyers, financial execs, firemen
Blonde | Hair or shaved
Medium income | High income
From an analysis of 30 000 profiles

27 Matching strategy: scoring Confidence level Score Percentile www.AmIHotOrNot.com

28 Matching strategy: yes or no

29 Traits Predicted trait Biological subtask Biological supertask Shared inputs + outputs Same service type Shared service compositions Shared path between intermediary input and output

30 Traits and score Predicted trait Score of similarity, usefulness and confidence E.g. [1 Identical – 9 Not similar] Biological subtask Biological supertask Shared inputs + outputs Same service type Shared service compositions Shared path between intermediary input and output

31 The gold standard ? The collection of workflow similarity assessments Predictive traits, possibly interacting 1 + 5 Traits/score

32 2. How do people match up workflows? Difficulty of task –Biological relationship very difficult for 6 out of 9 –Shape similarity difficult for 4 out of 13 –Medium confidence Consistency –Inter participant disagreement on how to order biological similarity and shape similarity [Spearman rank order test] Predictive traits –No one trait dominant between and within participants [Levene homogeneity of variance test]
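The consistency check named on this slide, a Spearman rank order test, can be illustrated with a minimal sketch. The two score lists below are hypothetical (they are not the study's data), and the function names are illustrative only; the slide does not say how the test was actually run.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical similarity scores (1 = identical, 9 = not similar)
# given by two participants to the same five candidate workflows.
participant_a = [2, 5, 3, 8, 6]
participant_b = [7, 3, 8, 2, 4]
rho = spearman(participant_a, participant_b)
print(rho)
```

A rho near +1 would mean the two participants order the workflows alike; values near 0 or below indicate the kind of disagreement the slide reports.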

33 Can we do better? Simpler tasks and workflows Taverna experienced users Workflow documentation and annotation Other factors in use, e.g. size difference –Fix allowed factors –Adopt black box approach: yes/no matching

34 Automated discovery technique Unattributed graph matcher implementation by Messmer and Bunke –Sub-isomorphism detection; exponential time complexity –DAGs and optimization for repository of graphs Workflows parsed as graphs –Workflow input, workflow output and intermediate services as nodes –Data links as edges [Figure: example workflow graph with nodes probeSetid, AffyMapper_seq, databaseid, Blastx, Results_Blastx]
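As a rough illustration of treating a workflow as an unattributed graph and testing for sub-isomorphism, here is a brute-force sketch. It is not the Messmer and Bunke algorithm, which is optimized for a repository of graphs; it only shows the idea and why the cost grows exponentially. The node names come from the slide's example, but the edges between them are an assumption, and the function name is illustrative.

```python
from itertools import permutations


def is_subgraph_isomorphic(small, large):
    """Return True if `small` maps injectively into `large`, preserving edges.

    Each graph is a (nodes, edges) pair; edges are (source, target) tuples.
    Node labels are ignored, as in an unattributed matcher. Extra edges in
    `large` are allowed (non-induced matching).
    """
    s_nodes, s_edges = small
    l_nodes, l_edges = large
    l_edge_set = set(l_edges)
    if len(s_nodes) > len(l_nodes):
        return False
    # Try every injective assignment of small nodes to large nodes.
    for image in permutations(l_nodes, len(s_nodes)):
        mapping = dict(zip(s_nodes, image))
        if all((mapping[a], mapping[b]) in l_edge_set for a, b in s_edges):
            return True
    return False


# Workflow inputs, outputs and services become nodes; data links become edges.
# The wiring below is assumed for illustration only.
workflow = (
    ["probeSetid", "AffyMapper_seq", "databaseid", "Blastx", "Results_Blastx"],
    [
        ("probeSetid", "AffyMapper_seq"),
        ("AffyMapper_seq", "Blastx"),
        ("databaseid", "Blastx"),
        ("Blastx", "Results_Blastx"),
    ],
)

# Hypothetical query fragment: input -> service -> output.
fragment = (["in", "svc", "out"], [("in", "svc"), ("svc", "out")])
found = is_subgraph_isomorphic(fragment, workflow)
print(found)
```

Enumerating all injective mappings is what makes the naive approach exponential, which is why a repository-scale matcher needs the kind of optimization the slide mentions.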

35 Automated discovery technique Ranking based on –shared nodes –difference in size between input graph and repository graphs
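The two ranking criteria can be sketched as follows. The slide does not say how they are combined, so the ordering used here (more shared nodes first, then smaller size difference) is an assumption, as are the repository contents and the function name.

```python
def rank_workflows(query_nodes, repository):
    """Rank repository workflows against a query workflow.

    Assumed ordering: more shared nodes first; a smaller difference in node
    count breaks ties; the name breaks any remaining ties deterministically.
    """
    query = set(query_nodes)
    scored = []
    for name, nodes in repository.items():
        node_set = set(nodes)
        shared = len(query & node_set)
        size_gap = abs(len(query) - len(node_set))
        scored.append((name, shared, size_gap))
    return sorted(scored, key=lambda t: (-t[1], t[2], t[0]))


# Hypothetical repository of workflows, each reduced to its node names.
repository = {
    "wf_a": ["Blastx", "Results_Blastx"],
    "wf_b": ["Blastx", "Results_Blastx", "x", "y", "z"],
    "wf_c": ["other"],
}
ranking = rank_workflows(["probeSetid", "Blastx", "Results_Blastx"], repository)
for name, shared, size_gap in ranking:
    print(name, shared, size_gap)
```

Here nodes are compared by name for simplicity; a matcher working on unattributed graphs would instead count nodes covered by a structural match.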

36 3. Can we replicate the behaviour with tools? Kind of.. Average similarity assessments across participants ? + 1 + 66 Traits/score

37 Current work ? + 1 2 3... 1 2 3 12 + 21 Yes/no Text clustering OWL workflow ontology Precision / recall Graph matching
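For the yes/no matching setup, precision and recall could be computed along these lines. This is a minimal sketch: the workflow identifiers and both sets are hypothetical, standing in for a tool's "yes" answers and the participants' "yes" answers.

```python
def precision_recall(predicted, relevant):
    """Precision and recall of a tool's 'yes' matches against human 'yes' matches."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: workflows the tool matched vs. those the users accepted.
tool_yes = ["w1", "w2", "w3", "w4"]
human_yes = ["w1", "w3", "w5"]
precision, recall = precision_recall(tool_yes, human_yes)
print(precision, recall)
```

Precision asks how many of the tool's matches the users agreed with; recall asks how many of the users' matches the tool found.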

38 Take home Scientists compose Web services for real – and share their results Workflow discovery is a real problem, which subsumes service discovery A range of matching strategies and techniques apply Evaluation is a challenge - gold standards hard to build Come and play at myExperiment.org References at www.cs.man.ac.uk/~goderisa

