1 Creating and Sharing Re-usable Workflows in Cardiovascular Research: Lessons Learned Using Taverna. Ravi Madduri, University of Chicago / Argonne National Laboratory

2 About me: Research Fellow at the Computation Institute, University of Chicago; lead architect for workflow technologies in the caBIG project; Workflow Working Group chair and a key person in the BIRN project. Interests: informatics, and applications of high-throughput data transfer and computing in biomedical informatics.

3 And...

4 Agenda: introduce Service-Oriented Science (SoS); introduce caBIG as an example of SoS; introduce caGrid as an enabler of the SoS vision; introduce workflow concepts; describe our implementation using Taverna; show a few Taverna workflows, including the AutoQRS workflow from CVRG; lessons learned and future directions.

5 Service-Oriented Science: People create services (data, code, instruments), which I discover (and decide whether to use) and compose to create a new function, and then publish as a new service. I find someone else to host services, so I don't have to become an expert in operating services and computers! I hope that this someone else can manage security, reliability, scalability, …!! ("Service-Oriented Science," Science, 2005)

6 caBIG Goal and Vision: caBIG is a virtual web of interconnected data, individuals, and organizations that redefines how research is conducted, how care is provided, and how patients and participants interact with the biomedical enterprise. Goals: connect the cancer research community through a shareable, interoperable infrastructure; deploy and extend standard rules and a common language to share information more easily; build or adapt tools for collecting, analyzing, integrating, and disseminating information associated with cancer research and care.

7 caBIG function dimensions, connected by caGrid: Clinical Data and Trials Management; Biospecimen Management; In Vivo Imaging; Molecular Characterization.

8 What is caGrid? Biomedical applications that share data all have common needs for syntactic and semantic interoperability. caGrid is a software toolkit aimed at software developers creating Grid applications.

9 caGrid provides: metadata services that add semantic information to all Grid services; the GAARDS toolkit, a standard security platform; Introduce, "the Eclipse for services development"; and the Index Service, a service registry for advertising and discovering capabilities.

10 caGrid: nuts and bolts

11 A scientific workflow precisely defines a multi-step procedure that seamlessly integrates and streamlines local and remote heterogeneous computational and data resources to perform in silico scientific exploration.
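
To make the definition concrete, here is a minimal Python sketch of a data-flow workflow: steps are linked only by the data they exchange, and each step runs once all of its inputs exist. The `run_workflow` helper, the step names, and the dictionary layout are hypothetical; Taverna schedules processors from its own workflow definition, not from code like this. It assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter
from typing import Any, Dict

# Each hypothetical step lists the upstream names it consumes ("after")
# and a function that turns those values into the step's own output.
Step = Dict[str, Any]

def run_workflow(steps: Dict[str, Step], inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Run steps in dependency order; workflow inputs seed the results."""
    graph = {name: set(step["after"]) for name, step in steps.items()}
    results: Dict[str, Any] = dict(inputs)
    for name in TopologicalSorter(graph).static_order():
        if name in results:  # a workflow input, already available
            continue
        step = steps[name]
        args = [results[dep] for dep in step["after"]]
        results[name] = step["fn"](*args)
    return results

# Two illustrative steps: normalize a series, then average it.
steps = {
    "normalize": {"after": ["raw"], "fn": lambda xs: [x / max(xs) for x in xs]},
    "mean": {"after": ["normalize"], "fn": lambda xs: sum(xs) / len(xs)},
}
out = run_workflow(steps, {"raw": [2.0, 4.0]})
```

Because dependencies are declared by data rather than by explicit control flow, adding a step only means naming which outputs it consumes.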

12 Workflow Requirements: service discovery; data access; service interaction; security enforcement; knowledge sharing.

13 Overview of caGrid Workflow (diagram: caGrid virtualizes data, instruments, and computation resources and provides security and connectivity; the workflow cycle covers discovery, composition, orchestration, and analysis, and generates community reuse). Workflow as consumer: easily reuse services for complex experiments. Workflow as contributor: workflows as best practices, wrapped as services. Workflows provide the return on investment (RoI) for SOA.

14 caGrid Workflow Suite: service discovery; data access; service interaction; security enforcement; knowledge sharing.

15 The caBIG Workflow System (diagram: discovery, composition, execution, and reuse on top of caGrid, generating community reuse). Discovery: service discovery based on cancer research metadata. Composition: a data-flow modeling flavor; the caGrid activity provides state management (WSRF), security (GSI), and implicit iteration to handle parallel execution. Execution: workflow execution with WSRF and GSI enforcement; service workflows in the caGrid Portal. Reuse: a Facebook for caGrid workflows.
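
Implicit iteration means a step written for single values can be fed lists, and the engine invokes it once per element (or per combination of elements). The Python sketch below illustrates the idea under simplifying assumptions: every port receives a list, and two combination strategies are offered, a cross product (which, as we understand it, is Taverna's default) and a dot product (pairing elements by position). The `iterate` and `describe` names are hypothetical, and the thread pool only stands in for the engine's parallel invocation.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Any, Callable, Dict, List

def iterate(fn: Callable[..., Any], ports: Dict[str, List[Any]],
            strategy: str = "cross", max_workers: int = 4) -> List[Any]:
    """Invoke a single-valued step once per combination of list inputs.

    "cross": every combination (Cartesian product) of port values.
    "dot":   pair the i-th value of every port (lists must be equal length).
    """
    names = list(ports)
    if strategy == "cross":
        combos = list(product(*(ports[n] for n in names)))
    elif strategy == "dot":
        if len({len(ports[n]) for n in names}) > 1:
            raise ValueError("dot product needs lists of equal length")
        combos = list(zip(*(ports[n] for n in names)))
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    # Invocations are independent, so they can run in parallel;
    # pool.map still returns results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda combo: fn(**dict(zip(names, combo))), combos))

def describe(patient: str, lead: str) -> str:
    """Hypothetical single-valued service: one patient, one ECG lead."""
    return f"{patient}:{lead}"

cross = iterate(describe, {"patient": ["p1", "p2"], "lead": ["I", "II"]})
dot = iterate(describe, {"patient": ["p1", "p2"], "lead": ["I", "II"]}, strategy="dot")
```

With two patients and two leads, the cross product yields four calls while the dot product yields two.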

16 Semantic Service Discovery: semantic search queries the Index Service for registered caGrid services that match various search criteria, such as service name, inputs, outputs, research center, class names, and concept codes.

17 Semantic Service Discovery: queries run against the service metadata. Types of query: string-based, property-based, and semantic-based.
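
The three query types can be illustrated with an in-memory stand-in for the Index Service. In this Python sketch, the `ServiceMetadata` fields, the registry entries, and the `CODE-…` concept codes are all hypothetical placeholders; the real Index Service is queried through caGrid's discovery client, not through functions like these.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class ServiceMetadata:
    """Hypothetical subset of the metadata a caGrid service advertises."""
    name: str
    research_center: str
    input_classes: List[str]
    output_classes: List[str]
    concept_codes: Set[str] = field(default_factory=set)

# Illustrative entries only; names, centers, and codes are placeholders.
REGISTRY: List[ServiceMetadata] = [
    ServiceMetadata("MicroarrayDataService", "Center A",
                    ["ExperimentId"], ["MicroarrayData"], {"CODE-MICROARRAY"}),
    ServiceMetadata("LymphomaPredictionService", "Center B",
                    ["MicroarrayData"], ["Prediction"], {"CODE-LYMPHOMA"}),
    ServiceMetadata("ECGDataService", "Center A",
                    ["PatientId"], ["WaveformRecord"], {"CODE-ECG"}),
]

def string_query(text: str) -> List[str]:
    """String-based: case-insensitive substring match on the service name."""
    return [s.name for s in REGISTRY if text.lower() in s.name.lower()]

def property_query(prop: str, value: str) -> List[str]:
    """Property-based: exact match on scalar fields, membership for list fields."""
    hits = []
    for s in REGISTRY:
        attr = getattr(s, prop)
        matched = (value in attr) if isinstance(attr, list) else (attr == value)
        if matched:
            hits.append(s.name)
    return hits

def semantic_query(concept_code: str) -> List[str]:
    """Semantic-based: match services annotated with a given concept code."""
    return [s.name for s in REGISTRY if concept_code in s.concept_codes]
```

A property query on `input_classes` with another service's output class is one way such metadata can suggest which services compose, e.g. `property_query("input_classes", "MicroarrayData")` finds a consumer for the microarray data service's output.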

18 caBIG services palette: as a result of a semantic search or of adding them directly, caBIG services appear in Taverna's Service Panel, ready to be dragged and dropped into caGrid workflows.

19 Data access: CQL Builder

20 Service interaction: managing state

21 Security enforcement. Authentication: the ability to invoke services secured by the Grid Security Infrastructure (GSI); integration of the caGrid security framework (GAARDS) with Taverna's Credential Manager; transport-level security. Authorization: performed on the service side by inspecting the user's credentials. Credential Delegation Service integration.

22 Secure Grid services: Taverna can invoke secure Grid services that require the user to log in to caGrid. Taverna interacts with caGrid's GAARDS infrastructure to obtain the user's proxy: it authenticates the user with the user's affiliated Authentication Service, then obtains the user's proxy from the Dorian Service. The default proxy lifetime is 12 hours.

23 Using secure caGrid services involves: 1. discovering a secure caGrid service from Taverna; 2. logging on to the selected caGrid to obtain a proxy certificate; 3. saving and managing caGrid proxies, usernames, and passwords.

24 Configuring secure services (1/2): the Authentication Service and Dorian Service URLs are required to obtain the user's proxy. They can be configured globally for all services from the same caGrid (in Preferences), or individually for a particular caGrid service, which overrides the configuration from Preferences.
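
The override rule (a per-service setting wins, otherwise the grid-wide preference applies) is a simple two-level lookup. This Python sketch assumes a hypothetical `SecurityConfig` record and placeholder `example.org` URLs; it does not reflect how Taverna actually stores these preferences.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class SecurityConfig:
    """The two URLs needed to obtain a user's proxy (hypothetical structure)."""
    authentication_service_url: str
    dorian_service_url: str

def resolve_security_config(
    service_url: str,
    grid_name: str,
    grid_preferences: Dict[str, SecurityConfig],
    service_overrides: Dict[str, SecurityConfig],
) -> SecurityConfig:
    """Per-service settings win; otherwise fall back to the grid-wide preference."""
    if service_url in service_overrides:
        return service_overrides[service_url]
    if grid_name in grid_preferences:
        return grid_preferences[grid_name]
    raise LookupError(f"no security configuration for {service_url} on {grid_name}")

# Placeholder URLs; real caGrid deployments publish their own endpoints.
training = SecurityConfig("https://auth.example.org/authn", "https://dorian.example.org/dorian")
special = SecurityConfig("https://auth2.example.org/authn", "https://dorian2.example.org/dorian")
prefs = {"training-grid": training}
overrides = {"https://svc-b.example.org/services/B": special}

cfg_a = resolve_security_config("https://svc-a.example.org/services/A", "training-grid", prefs, overrides)
cfg_b = resolve_security_config("https://svc-b.example.org/services/B", "training-grid", prefs, overrides)
```

Service A inherits the training-grid preferences, while service B uses its own override.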

25 Configuring secure services (2/2): view a secure service's details; configure the service's security properties.

26 Logging on to caGrid: the user is prompted for a caGrid username and password the first time any secure service is invoked from a workflow.

27 Credential management: Taverna obtains a proxy for the user from the Dorian Service using the user's caGrid username and password. Proxies are saved and managed by the Credential Manager, and the caGrid username and password can also be remembered.
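
Saving proxies pays off because a proxy can be reused until it expires (12 hours by default, per the slides on secure Grid services). The Python sketch below shows that caching idea with an injectable clock. `CredentialCache` and `obtain_proxy` are hypothetical; `obtain_proxy` stands in for the Authentication Service and Dorian round trip, and this is not the Credential Manager's real API.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, Tuple

DEFAULT_PROXY_LIFETIME = timedelta(hours=12)

class CredentialCache:
    """Reuse a user's proxy until it expires, then fetch a fresh one."""

    def __init__(self, obtain_proxy: Callable[[str, str], str],
                 clock: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
                 lifetime: timedelta = DEFAULT_PROXY_LIFETIME):
        self._obtain_proxy = obtain_proxy  # hypothetical login + Dorian call
        self._clock = clock
        self._lifetime = lifetime
        self._proxies: Dict[str, Tuple[str, datetime]] = {}  # user -> (proxy, expiry)

    def get_proxy(self, username: str, password: str) -> str:
        now = self._clock()
        cached = self._proxies.get(username)
        if cached is not None and now < cached[1]:
            return cached[0]  # still valid: no new login needed
        proxy = self._obtain_proxy(username, password)
        self._proxies[username] = (proxy, now + self._lifetime)
        return proxy

# Usage with a fake proxy source and a controllable clock.
calls = []
def fake_obtain_proxy(username: str, password: str) -> str:
    calls.append(username)
    return f"proxy-{len(calls)}"

now = [datetime(2011, 1, 1, tzinfo=timezone.utc)]
cache = CredentialCache(fake_obtain_proxy, clock=lambda: now[0])
first = cache.get_proxy("alice", "secret")    # fetched
again = cache.get_proxy("alice", "secret")    # reused from the cache
now[0] += timedelta(hours=13)
renewed = cache.get_proxy("alice", "secret")  # expired, fetched again
```

Injecting the clock keeps the expiry logic testable without waiting 12 hours.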

28 Workflow execution service: the Taverna Workflow Service wraps the Taverna execution engine in a WS-Resource and exposes operations such as createResource, startWorkflow, getStatus, and getOutput for user-submitted workflows. (Diagram: the Client API, the Taverna Workbench, and the Workflow Portlet call createResource, startWorkflow, getStatus, and getOutput on the Workflow Service; the service holds stateful resources (resource properties) referenced by an EPR, and the Taverna Engine invokes data services, analytical services, and caGrid and other services.)

29 Workflow execution service: the Taverna Workflow Service provides stateful resources that execute the workflows, supports the caGrid security architecture (GSI security), and allows programmatic submission of workflows.
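
The four operations (createResource, startWorkflow, getStatus, getOutput) imply a create, start, poll, fetch life cycle on the client side. This Python sketch mirrors that loop against a hypothetical `WorkflowService` interface and an in-memory fake. The snake_case method names, the "Active"/"Done"/"Failed" status strings, and representing the EPR as a plain string are assumptions, not the actual caGrid client API, which talks to a WSRF service.

```python
import time
from typing import Any, Dict, Protocol

class WorkflowService(Protocol):
    """Hypothetical client view of the four operations on the slide."""
    def create_resource(self, workflow_definition: str) -> str: ...
    def start_workflow(self, epr: str, inputs: Dict[str, Any]) -> None: ...
    def get_status(self, epr: str) -> str: ...
    def get_output(self, epr: str) -> Dict[str, Any]: ...

def run_remote(service: WorkflowService, workflow_definition: str,
               inputs: Dict[str, Any], poll_seconds: float = 1.0,
               timeout_seconds: float = 3600.0) -> Dict[str, Any]:
    """Create a stateful resource, start it, poll its status, then fetch output."""
    epr = service.create_resource(workflow_definition)
    service.start_workflow(epr, inputs)
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = service.get_status(epr)
        if status == "Done":
            return service.get_output(epr)
        if status == "Failed":
            raise RuntimeError(f"workflow {epr} failed")
        if time.monotonic() > deadline:
            raise TimeoutError(f"workflow {epr} still {status}")
        time.sleep(poll_seconds)

class FakeWorkflowService:
    """In-memory stand-in: reports "Done" on the third status poll."""
    def __init__(self) -> None:
        self._runs: Dict[str, Dict[str, Any]] = {}

    def create_resource(self, workflow_definition: str) -> str:
        epr = f"epr-{len(self._runs) + 1}"
        self._runs[epr] = {"definition": workflow_definition, "polls": 0, "inputs": None}
        return epr

    def start_workflow(self, epr: str, inputs: Dict[str, Any]) -> None:
        self._runs[epr]["inputs"] = inputs

    def get_status(self, epr: str) -> str:
        run = self._runs[epr]
        run["polls"] += 1
        return "Done" if run["polls"] >= 3 else "Active"

    def get_output(self, epr: str) -> Dict[str, Any]:
        return {"echo": self._runs[epr]["inputs"]}

result = run_remote(FakeWorkflowService(), "<workflow/>", {"experimentId": "exp-1"},
                    poll_seconds=0.0)
```

Keeping state on the server (the resource behind the EPR) is what lets the same run be checked later from the Workbench, the portlet, or a script.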

30 Access Taverna workflows via the caGrid Portal: the Taverna Workflow Portlet is deployed in the caGrid Portal on the training Grid. URL: The portlet currently lists a few workflows, with their descriptions, that can be browsed from the above URL. Users can select the workflow they are interested in running. View: 1

31 Access Taverna workflows via the caGrid Portal. URL: Based on the number of input ports in the workflow, the portlet prompts the user to enter the input values in text boxes. For example, the Lymphoma workflow takes only one input: an Experiment ID that identifies the experiment that caArray uses for data collection. Click Submit after entering the data. View: 2

32 Access Taverna workflows via the caGrid Portal. URL: The portlet stores user-submitted workflows in the current portal session, and users can view all active and completed workflows in the session. Clicking the Output button shows the output of the workflow. The portlet provides workflow-specific view resolvers to render the outputs; for example, the Lymphoma workflow currently displays its output in an HTML table. Views: 3, 4, and 5
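
A session-scoped list of submissions plus a table of workflow-specific view resolvers is enough to reproduce the behavior described above. The Python sketch below is loosely modeled on that description; `PortalSession`, `Submission`, `VIEW_RESOLVERS`, the "Active"/"Completed" statuses, and the sample output row are all hypothetical, and the real portlet is a Java web component.

```python
from dataclasses import dataclass, field
from html import escape
from typing import Any, Callable, Dict, List

def table_view(rows: List[Dict[str, Any]]) -> str:
    """Render a list of same-keyed dicts as an HTML table."""
    if not rows:
        return "<table></table>"
    headers = list(rows[0])
    head = "".join(f"<th>{escape(str(h))}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(row[h]))}</td>" for h in headers) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"

def default_view(output: Any) -> str:
    """Fallback for workflows without a specific resolver."""
    return f"<pre>{escape(repr(output))}</pre>"

# Workflow-specific view resolvers, keyed by (hypothetical) workflow name.
VIEW_RESOLVERS: Dict[str, Callable[[Any], str]] = {"Lymphoma": table_view}

@dataclass
class Submission:
    workflow_name: str
    status: str = "Active"  # set to "Completed" when the run finishes
    output: Any = None

@dataclass
class PortalSession:
    """Workflows submitted during one portal session."""
    submissions: List[Submission] = field(default_factory=list)

    def submit(self, workflow_name: str) -> Submission:
        run = Submission(workflow_name)
        self.submissions.append(run)
        return run

    def active(self) -> List[Submission]:
        return [s for s in self.submissions if s.status == "Active"]

    def completed(self) -> List[Submission]:
        return [s for s in self.submissions if s.status == "Completed"]

    def render_output(self, run: Submission) -> str:
        resolver = VIEW_RESOLVERS.get(run.workflow_name, default_view)
        return resolver(run.output)

session = PortalSession()
run = session.submit("Lymphoma")
run.status, run.output = "Completed", [{"sample": "S1", "prediction": "type-A"}]
html = session.render_output(run)
```

A resolver table keeps rendering decisions out of the execution path: supporting a new workflow's output format means registering one function.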

33 Knowledge Sharing: search for cabig in myExperiment, or type rkflows&query=cabig

34 Discovery using myExperiment

35 Lymphoma Prediction Workflow: microarray data from tumor tissue, then microarray preprocessing, then lymphoma prediction.

36 Lymphoma type prediction. Acknowledgement: Juli Klemm, Xiaopeng Bian, Rashmi Srinivasa (NCI); Jared Nedzel (MIT).

37 AutoQRS Analysis Workflow (diagram). Components: WFDB data service; AutoQRS Analytical Service, invoked through a JSDL service; AutoQRS Output Data Service. Steps shown: store the WFDB binary and patient ID; retrieve the WFDB patient record; invoke processing (analysis execution); record the AutoQRS XML results.
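
The orchestration in the AutoQRS diagram can be sketched as four calls chained by data. In this Python sketch every service is a hypothetical in-memory stand-in: `InMemoryDataService` plays both data services, and `submit_analysis_job` replaces the JSDL-submitted AutoQRS analysis. It does not detect QRS complexes and only reports the record size as XML. The step order follows the labels in the diagram, as we read them.

```python
import xml.etree.ElementTree as ET
from typing import Dict

class InMemoryDataService:
    """Stand-in for a caGrid data service: a simple key-value store."""
    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def store(self, key: str, value: bytes) -> None:
        self._items[key] = value

    def retrieve(self, key: str) -> bytes:
        return self._items[key]

def submit_analysis_job(record: bytes, patient_id: str) -> str:
    """Placeholder for the AutoQRS analysis job; emits a tiny XML result."""
    root = ET.Element("AutoQRSResult", patientId=patient_id)
    ET.SubElement(root, "recordBytes").text = str(len(record))
    return ET.tostring(root, encoding="unicode")

def autoqrs_workflow(wfdb_binary: bytes, patient_id: str,
                     wfdb_store: InMemoryDataService,
                     output_store: InMemoryDataService) -> str:
    wfdb_store.store(patient_id, wfdb_binary)                    # store WFDB
    record = wfdb_store.retrieve(patient_id)                     # retrieve patient record
    result_xml = submit_analysis_job(record, patient_id)         # invoke processing
    output_store.store(patient_id, result_xml.encode("utf-8"))   # record XML results
    return result_xml

wfdb_store = InMemoryDataService()
output_store = InMemoryDataService()
xml = autoqrs_workflow(b"\x00\x01\x02\x03", "P001", wfdb_store, output_store)
```

Swapping any stand-in for a real service client leaves the workflow function unchanged, which is the reuse argument the talk makes for service-based workflows.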

38 The Taverna workflow

39 The result in MS Excel

40 Accomplishments: the Lymphoma workflow is among the 20 most viewed/downloaded workflows in myExperiment, which is more impressive given that it was uploaded much later than the other workflows. Our BMC Bioinformatics article, "caGrid Workflow Toolkit: A Taverna based workflow tool for cancer Grid," was rated "Highly Accessed" relative to its age. We are part of the CVRG project, which was recently renewed.

41 Lessons Learned: lower the barriers to entry for sharing data and analytics. Software is surprisingly hard to use for end users, more so if the benefit is not clear. The return on investment of an SOA lies in creating reusable workflows (LEGO blocks). Workflows are only as good as the services we create. The traditional software development life cycle (SDLC) does not always work in favor of the end users. Keep it simple (KISS).

42 Goals of the Workflow Project in CVRG: deploy existing technology on the CVRG that can store and execute workflows created locally with the Taverna Workbench; develop new technology that allows non-expert users to compose and execute workflows graphically through a web interface; extend the Taverna engine to support invocation of REST-style services, so that users can annotate workflow inputs and outputs with ontology terms from NCBO BioPortal and other ontology repositories; develop specifications describing how workflows should be designed, validated, and documented, and support users in developing workflows; extend the technology so that workflows can be executed in a cloud computing environment.

43 Suggested Direction: hosted workflow solutions, i.e., SaaS workflow tools such as Globus Online and Galaxy.

44 Acknowledgements: Univ. of Chicago / ANL: Ian Foster, Dinanath Sulakhe, Bo Liu. Univ. of Manchester, UK: Carole Goble, Stian Soiland-Reyes, Alexandra Nenadic. Inventrio: Shannon Hastings, Stephen Langella, Scott Oster. Other colleagues from Ohio State University, the National Cancer Institute, JHU, …

47 Contact information: Ravi Madduri, Computation Institute, University of Chicago
