VBI Web Services Workshop 26-27 May 2005 Performing In silico Experiments in a Service Based Architecture: Solutions and Issues Chris Wroe, Phillip Lord,


1 VBI Web Services Workshop 26-27 May 2005 Performing In silico Experiments in a Service Based Architecture: Solutions and Issues Chris Wroe, Phillip Lord, Robert Stevens & Carole Goble The University of Manchester, UK http://www.mygrid.org.uk

2 EPSRC funded UK eScience Program Pilot Project. Thanks to the other members of the Taverna project, http://taverna.sf.net

3 Core: Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizaukaus, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Jan Humble, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Ian Roberts, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson, Jimi Worthington and Chris Wroe. Users: Simon Pearce and Claire Jennings (Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK); Hannah Tipney, May Tassabehji and Andy Brass (St Mary’s Hospital, Manchester, UK); Steve Kemp (Liverpool, UK). Postgraduates: Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire. Industrial: Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM); Robin McEntire (GSK). Collaborators: Keith Decker.

4 Bioinformatics Services. A typical HAD environment: Distributed, Autonomous and very, very Heterogeneous. No standard API or calling mechanisms. Complex types are often implicit: everything is a String. No domain typing: everything is a String. Numerous services, and growing. Close the world: controlled, but constrained. Open the world: uncontrolled, but versatile.

5 In silico Bioinformatics. Bioinformatics experiments chain together anywhere from 1 to N services. The ultimate result is the goal, and some or all of the intermediates are part of that goal. Intermediates are necessary for evidence gathering. Experiments often need to be repeated and often need to be re-purposed. Workflows offer a suitable model for bioinformatics experiments.
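
The idea of chaining services while keeping every intermediate can be sketched in a few lines of Python. This is only an illustration: `reverse_complement`, `gc_content` and `run_workflow` are hypothetical local stand-ins for remote services and for a workflow engine, not part of Taverna or myGrid.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for remote services. Like many real
# bioinformatics services, each one maps a string to a string.
def reverse_complement(seq: str) -> str:
    pairs = {"a": "t", "t": "a", "c": "g", "g": "c"}
    return "".join(pairs[base] for base in reversed(seq))

def gc_content(seq: str) -> str:
    gc = sum(1 for base in seq if base in "gc")
    return f"{gc / len(seq):.2f}"

def run_workflow(steps: List[Tuple[str, Callable[[str], str]]],
                 data: str) -> Dict[str, str]:
    """Run the steps in order and keep every intermediate result."""
    results = {"input": data}
    for name, service in steps:
        data = service(data)      # output of one step feeds the next
        results[name] = data      # intermediates are kept as evidence
    return results

results = run_workflow(
    [("reverse_complement", reverse_complement), ("gc_content", gc_content)],
    "acatttctac",
)
```

Because the step list is plain data, the same experiment can be re-run on a new input or re-purposed by swapping a step.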

6 Williams-Beuren Syndrome. Contiguous sporadic gene deletion disorder. 1/20,000 live births, caused by unequal crossover (homologous recombination) during meiosis. Haploinsufficiency of the region results in the phenotype. [Figure: physical map. Labels: Chr 7, ~155 Mb, ~1.5 Mb, 7q11.23, WBS, SVAS, patient deletions, CTA-315H11, CTB-51J22, ‘Gap’.]

7 1. Identify new, overlapping sequence of interest. 2. Characterise the new sequence at nucleotide and amino acid level. Cutting and pasting between numerous web-based services, e.g. BLAST, InterProScan etc.
12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt
12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt
12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg
12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga
12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc
12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa
12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa

8 Filling a genomic gap in silico. Frequently repeated: info is rapidly added to public databases. Time consuming and mundane. Don’t always get results. A huge amount of interrelated data is produced, handled in notebooks and files saved to local hard drive. Much knowledge remains undocumented: the bioinformatician does the analysis. Advantages: specialist human intervention at every step, quick and easy access to distributed services. Disadvantages: labour intensive, time consuming, highly repetitive and error prone process; tacit procedure, so difficult to share both protocol and results.

9 The individual scientist doodling. Workflows & distributed queries to link up your own and others’ resources. Data intensive, upstream pipelines. Reuse: sharing and adapting workflows & resources, and their outcomes. Semantic descriptions for discovery, validation & linkage. Whole experiment lifecycle, including logging provenance. Middleware for data intensive in silico biology by bioinformaticians. [Diagram labels: Discovering and reusing experiments and resources; Managing lifecycle, provenance and results; Sharing services & experiments; Personalisation; Forming experiments; Executing & monitoring experiments.]

10 An Open World. Open source. Open domain services and resources. Open community. Open application: nothing specific to biology but oriented to. Open model and open data: no prescribed typing or domain data model; a layered information model. Open architecture: Service Oriented Architecture; loosely coupled; Web services based; assemble your own components; designed to work together. [Diagram labels: Taverna, Freefluo, Grimoire Registry, Event Notification, mIR, Pedro Annotation, Feta Discovery, Info. Model, Soaplab, Gowlab, BioNanny, Mediator, Portal, LSIDs, KAVE, DQP.]

11 Stakeholders: Biologists, Bioinformaticians, Service Providers.

12 Jam today. Important for take up and community building. Take up leads to much better understanding. Energy of bioinformaticians and service providers. Dealing with lots of legacy remote services. Incorporating my bits and pieces. Networking effects. Added value with added effort. [Diagram labels: Activation Energy, Cost, Benefit.]

13 Scufl: Simple Conceptual Unified Flow Language. Taverna: writing and running workflows & examining results. SOAPLAB: makes applications available. Freefluo: workflow engine to run workflows. [Diagram labels: Freefluo; SOAPLAB Web Service; Any Application; Web Service, e.g. DDBJ BLAST; Taverna; SeqHound Service; Special processor.] http://taverna.sourceforge.net/

14 [Screenshot labels: Viewer plug-ins; Service failure protocol.]

15 Life Science Identifiers. Model Driven Approach. OWL & RDFS Ontologies: to annotate and classify entities with a common vocabulary based on a common understanding. RDF. Knowledge Added Value to Experiment. Information Repository and Common Information model for e-Science.
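
As a rough illustration of what a Life Science Identifier carries, the sketch below splits one into its parts. It assumes the usual LSID layout `urn:lsid:authority:namespace:object` with an optional trailing revision; `parse_lsid` and the example identifier are made up for this note and are not taken from the slides or from any myGrid library.

```python
from typing import NamedTuple, Optional

class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]

def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its components, or raise ValueError."""
    parts = text.split(":")
    prefix_ok = [p.lower() for p in parts[:2]] == ["urn", "lsid"]
    if len(parts) not in (5, 6) or not prefix_ok:
        raise ValueError(f"not an LSID: {text!r}")
    if any(not part for part in parts[2:]):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)

# Hypothetical identifier for a stored sequence, revision 1.
example = parse_lsid("urn:lsid:example.org:sequence:12345:1")
```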

16 Williams-Beuren Workflows. Characterisation of nucleotide sequence. Identification of overlapping sequence. Characterisation of protein sequence.

17 WBS Workflow Experience. Correct and biologically meaningful results: found all expected results, plus a previously unnoticed pseudogene. Automation: saved time, increased productivity. Sharing: other people have used and want to develop the workflows, notably for mouse and chicken.

18 Gene annotation pipelines. Microarray analysis pipelines. Find differentially expressed genes, e.g. NF-kappa beta inhibitor protein. Autoimmune disease of the thyroid in which the immune system of an individual attacks cells in the thyroid gland, resulting in hyperthyroidism. Graves Disease.

19 Trypanosomiasis in cattle. Chicken genome. Mouse genome. Reuse: adapting and sharing best practice and know-how across a community. Chris Wroe, Carole Goble, Antoon Goderis, Phillip Lord, Simon Miles, Juri Papay, Pinar Alper, Luc Moreau. Recycling workflows and services through discovery and reuse. Concurrency and Computation: Practice and Experience, accepted for publication.

20 [Architecture diagram labels. Layers: Applications; Core Services; External Services. Third-party tools: Utopia, Haystack, LSID Launchpad. myGrid information model. Service & workflow discovery: Feta semantic discovery; GRIMOIRES registry. Web portals. Taverna e-Science workbench. Workflow enactment: Taverna-Freefluo workflow engine. Metadata Management: KAVE metadata store; ProQA provenance manager; myGrid ontology. Soaplab; Gowlab; Termino; Lexical mark-up. Legacy applications; Web Services; OGSA-DAI databases; Web Sites; OGSA-DQP service. e-Science coordination: e-Science mediator; e-Science process patterns; e-Science events. LSID support. Data Management: mIR (myGrid information repository). Web Service (Grid Service) communication fabric. Notification service. Pedro semantic publication. Java applications. Executable codes with an IDL. Custom databases.]

21 Taverna currently ships with access to over 1000 services. But it wasn’t always the case! Lack of available services, at least at first. A lot of activation energy is needed, which hopefully lessens as services get pooled. Service partnerships and network effects. If your service ain’t there, that’s an obstacle. First, catch your service.

22 Soaplab and Gowlab wrappers (http://industry.ebi.ac.uk/soaplab/). WSDL scavenging. Processor abstraction over stereotypical invocation patterns of service families. Many services are not plain WSDL. API consumer in Taverna 1.1. Service Bootstrapping.

23 Classes and Interfaces presented here. User selects appropriate methods to be exposed within Taverna. API Consumer Interface. Interoperate existing APIs with SOAP services, SoapLab, BioMoby, SeqHound, caBIG, BioJava, etc. Refine complex APIs to sets of task-centric functionality. Take advantage of the myGrid infrastructure (monitoring, result browsing, provenance etc.) and apply it to your APIs. Taverna 1.1 onwards: download API consumer and toolset at http://taverna.sf.net

24 Import into Taverna. A previously created API definition is imported: methods and constructors appear as components alongside other services.

25 Invocation Heterogeneity. WSDL: a single Web Service operation described in a WSDL file. Local Java or Beanshell function. Soaplab: CORBA-like stateful protocol of the Web Service operations. Nested workflow: implemented by a Scufl workflow. BioMOBY processor. SeqHound: a Representational State Transfer style interface. BioMart: directly accesses queries over a relational database. Styx: executes a workflow subgraph containing streamed services using P2P data transfer based on the Styx Grid service protocol. [Diagram labels: BLAST; createJob(), setProgram(), run(), getResults(), setDatabase(), setE_value(); blastQuery(); IBM Life Sciences BLAST service; SOAPLAB BLAST service; Processors.]
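
A processor abstraction over different invocation styles might look roughly like the following Python sketch. It assumes two styles suggested by the slide: a single-call operation in the manner of `blastQuery()`, and a stateful job that is created, configured, run and then asked for results. All class and function names here (`Processor`, `FakeStatefulJob`, `fake_blast_query` and so on) are hypothetical and run entirely in memory; none of them is Taverna's actual API.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional

class Processor(ABC):
    """Uniform interface the workflow sees, whatever the service style."""
    @abstractmethod
    def invoke(self, inputs: Dict[str, str]) -> Dict[str, str]: ...

class SingleCallProcessor(Processor):
    """Wraps a service that exposes one operation taking all inputs."""
    def __init__(self, operation: Callable[..., str]) -> None:
        self.operation = operation

    def invoke(self, inputs: Dict[str, str]) -> Dict[str, str]:
        return {"result": self.operation(**inputs)}

class FakeStatefulJob:
    """Hypothetical in-memory stand-in for a create/set/run/get job."""
    def __init__(self) -> None:
        self.settings: Dict[str, str] = {}
        self.output: Optional[str] = None

    def set(self, name: str, value: str) -> None:
        self.settings[name] = value

    def run(self) -> None:
        ordered = sorted(self.settings.items())
        self.output = "ran with " + ", ".join(f"{k}={v}" for k, v in ordered)

    def get_results(self) -> str:
        if self.output is None:
            raise RuntimeError("job has not been run")
        return self.output

class StatefulJobProcessor(Processor):
    """Drives the multi-call protocol behind the same invoke() method."""
    def __init__(self, create_job: Callable[[], FakeStatefulJob]) -> None:
        self.create_job = create_job

    def invoke(self, inputs: Dict[str, str]) -> Dict[str, str]:
        job = self.create_job()
        for name, value in inputs.items():
            job.set(name, value)
        job.run()
        return {"result": job.get_results()}

def fake_blast_query(program: str, database: str, query: str) -> str:
    """Hypothetical single-call service returning a comparable string."""
    return f"ran with database={database}, program={program}, query={query}"

inputs = {"program": "blastn", "database": "nt", "query": "acgt"}
single = SingleCallProcessor(fake_blast_query).invoke(inputs)
stateful = StatefulJobProcessor(FakeStatefulJob).invoke(inputs)
```

The workflow layer only ever calls `invoke()`, so a service family can change its protocol without the workflow changing.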

26 [Diagram labels: Freefluo workflow enactor; Scufl + Workflow Object Model; Taverna Workbench; Enactor; processors for Plain Web Service, Soaplab, Local App, BioMOBY, SeqHound, BioMART.] Three tiered abstraction. Application data flow layer: Scufl graph + service introspection. Execution flow layer: list management; implicit iteration mechanism; MIME & semantic type decoration; fault management; service alternates. Processor invocation layer. Workflow Execution.
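
Two of the execution-layer ideas named above, implicit iteration over lists and falling back to service alternates, can be sketched as follows. This is a guess at the general behaviour rather than Freefluo's implementation; `implicit_iteration`, `invoke_with_alternates` and the two toy services are hypothetical names.

```python
from typing import Any, Callable, List

Service = Callable[[str], str]

def invoke_with_alternates(services: List[Service], value: str) -> str:
    """Try each service in order and return the first successful result."""
    errors = []
    for service in services:
        try:
            return service(value)
        except Exception as exc:   # a real enactor would be more selective
            errors.append(exc)
    raise RuntimeError(f"all {len(services)} services failed: {errors}")

def implicit_iteration(services: List[Service], value: Any) -> Any:
    """Apply a single-item service across arbitrarily nested lists."""
    if isinstance(value, list):
        return [implicit_iteration(services, item) for item in value]
    return invoke_with_alternates(services, value)

def flaky_primary(seq: str) -> str:
    """Toy service that rejects sequences containing 'n'."""
    if "n" in seq:
        raise ValueError("ambiguous base")
    return seq.upper()

def replica(seq: str) -> str:
    """Toy alternate that accepts anything and marks its output."""
    return "replica:" + seq.upper()

result = implicit_iteration([flaky_primary, replica], ["acgt", ["ggn", "tt"]])
```

The workflow author wires up a service that takes one sequence; the list handling and the fail-over are supplied by the layer underneath.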

27 Architecture Confusagram. Tom Oinn, Mark Greenwood, Matthew Addis, M. Nedim Alpdemir, Justin Ferris, Kevin Glover, Carole Goble, Duncan Hull, Darren Marvin, Peter Li, Phillip Lord, Matthew R. Pocock, Martin Senger, Robert Stevens, Anil Wipat and Chris Wroe. Taverna: Lessons in creating a workflow environment for the life sciences. Concurrency and Computation: Practice and Experience, in press.

28 [Screenshot labels: Soaplab Service; WSDL Web Service; BioMOBY Service; Local Java Service.]

29 Workflows are not the only game. [Diagram labels: Workflows; OGSA-DQP; Applications; e-Science coordination; e-Science mediator; e-Science process patterns; e-Science events; Notification service; Mediator; Protein Phosphatases.]

30 How to select among 1000+ services? Mostly inputs & outputs are “string”. Domain specific descriptions of capabilities. Selection is part of workflow assembly by bioinformaticians. Selection of alternates for failure is also generally user defined; alternates are usually replicas, but need not be. So many services, so poorly described.

31 Semantic discovery. Publish and find services (and workflows) with descriptions that use an ontology (in OWL/RDF). Define domain types for objects passed around and a set of dimensions with which service capabilities can be defined using processor abstraction. Bootstrapping descriptions. Mining and maintaining descriptions. The Expert Annotator. GRIMOIRE / WebDAV directory. Tie into BioMOBY central. http://phoebus.cs.man.ac.uk:8100/feta-beta/mygrid/descriptions/ Phillip Lord, Pinar Alper, Chris Wroe, and Carole Goble. Feta: A light-weight architecture for user oriented semantic service discovery. In Proc. of 2nd European Semantic Web Conference, Crete, June 2005.

32 [Diagram labels: Web Interface; Processor API; Generic Schema for Service (part of Information model); Specific Application Ontology, e.g. caCORE; Semantic Web Services; Layered model.] Wroe C, Goble CA, Greenwood M, Lord P, Miles S, Papay J, Payne T, Moreau L. Automating Experiments Using Semantic Data on a Bioinformatics Grid. IEEE Intelligent Systems, Jan/Feb 2004. We don’t describe WSDL, we describe operations and processors. We are classifying for people not machines, so don’t be too clever!

33 [Diagram: service description model.] Operation: name, description, task, method, resource, application. Service: name, description, author, organisation. Parameter: name, description, semantic type, format, transport type, collection type, collection format. Relations: hasInput, hasOutput, subclass. Other labels: WSDL based Web service; WSDL based operation; Soaplab service; bioMoby service; workflow; Local Java code.
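
A cut-down version of such a description model, and a lookup over it, could be sketched like this. Only some of the fields listed on the slide are kept, matching is by exact semantic-type string rather than by ontology reasoning, and every service, type name and function here is invented for illustration; it is not the Feta schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Parameter:
    name: str
    semantic_type: str        # domain type, e.g. "nucleotide_sequence"
    format: str = "string"    # what actually travels on the wire

@dataclass
class Operation:
    name: str
    task: str
    inputs: List[Parameter] = field(default_factory=list)
    outputs: List[Parameter] = field(default_factory=list)

@dataclass
class Service:
    name: str
    organisation: str
    operations: List[Operation] = field(default_factory=list)

def find_operations(services: List[Service],
                    input_type: Optional[str] = None,
                    output_type: Optional[str] = None) -> List[str]:
    """Return 'service.operation' names whose parameters match the types."""
    matches = []
    for service in services:
        for op in service.operations:
            has_in = input_type is None or any(
                p.semantic_type == input_type for p in op.inputs)
            has_out = output_type is None or any(
                p.semantic_type == output_type for p in op.outputs)
            if has_in and has_out:
                matches.append(f"{service.name}.{op.name}")
    return matches

# Hypothetical registry contents.
registry = [
    Service("blast_service", "example.org", [
        Operation("search", "similarity_search",
                  [Parameter("query", "nucleotide_sequence")],
                  [Parameter("report", "blast_report")]),
    ]),
    Service("translate_service", "example.org", [
        Operation("translate", "translation",
                  [Parameter("dna", "nucleotide_sequence")],
                  [Parameter("protein", "protein_sequence")]),
    ]),
]

hits = find_operations(registry, input_type="nucleotide_sequence",
                       output_type="protein_sequence")
```

Even though every parameter is a plain string on the wire, the semantic type lets a person narrow a large registry to the operations that fit a gap in the workflow.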

34 Service hassles. Workflows are only as good as the services they link together. Licensing models. Instability and unreliability. BioNanny + QoS registry description. Configurable fault tolerance and fail-over strategies for graceful failure. Few alternates and genuine replica services.

35 Type management: Shims. [Diagram labels: Sequence, i.e. last known 3000bp; Mask; BLAST; Identify new sequences and determine their degree of identity; Sequence database entry; Fasta format sequence; Genbank format sequence; Alignment of full query sequence V full ‘new’ sequence; Old BLAST result; Simplify and Compare; Lister; Retrieve; BLAST2.] ‘I want to identify new sequences which overlap with my query sequence and determine if they are useful’. The fiddly bits necessitated by not having a common type system or object model, or building elaborate wrappers. Adding functionality to Web Services. Shim libraries; automatic deployment at workflow assembly. Beanshell scripts for quick and dirty scripting.
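
A shim of the kind described, sitting between a service that emits a Genbank format sequence and one that expects Fasta format, might be as small as the sketch below. It assumes the input is just the numbered sequence lines (a position followed by blocks of bases) with no other record fields; `genbank_origin_to_fasta` is a hypothetical name, not one of the myGrid shim library functions.

```python
def genbank_origin_to_fasta(origin_text: str, identifier: str,
                            width: int = 60) -> str:
    """Convert numbered GenBank-style sequence lines to a FASTA record."""
    bases = []
    for line in origin_text.splitlines():
        for token in line.split():
            if not token.isdigit():        # drop the leading position number
                bases.append(token.lower())
    sequence = "".join(bases)
    # Wrap the sequence to a fixed line width, as FASTA files usually are.
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + identifier + "\n" + "\n".join(wrapped) + "\n"

fasta = genbank_origin_to_fasta(
    "12181 acatttctac caacagtgga\n12241 cagtctttta",
    "seq1",
    width=20,
)
```

The shim adds no science of its own; it exists only because the two services disagree about how a sequence is written down.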

36 Put the workflow together to duplicate how they did the linking without duplicating how they did the on-the-fly integration. Post hoc analysis: don’t analyse data piece by piece; receive all data all at once. Service interoperability but fragmented results, because integration needs smarter workflows and smart thinking about data types. Close the world with shims or services and build domain objects. Smarter ways of visualising and linking intermediate results using provenance graphs. Custom visualisation application. [Diagram labels: Provenance Record; Result; Input.] Workflow Practices.
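
Linking intermediate results through a provenance graph can be illustrated with a small in-memory log. This is a sketch under simple assumptions (each data item is produced by exactly one step and identified by a string); `ProvenanceLog` and the step and data names are hypothetical and do not reflect the KAVE store or the myGrid provenance format.

```python
from typing import Dict, List, Set, Tuple

class ProvenanceLog:
    """Records which step produced each data item, and from what."""
    def __init__(self) -> None:
        self._derived_from: Dict[str, Tuple[str, List[str]]] = {}

    def record(self, step: str, inputs: List[str], output: str) -> None:
        self._derived_from[output] = (step, list(inputs))

    def lineage(self, data_id: str) -> Set[str]:
        """Return every upstream data item the given item depends on."""
        seen: Set[str] = set()
        stack = [data_id]
        while stack:
            current = stack.pop()
            if current not in self._derived_from:
                continue                  # a workflow input: nothing upstream
            _, inputs = self._derived_from[current]
            for parent in inputs:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

log = ProvenanceLog()
log.record("mask", ["query_seq"], "masked_seq")
log.record("blast", ["masked_seq"], "blast_report")
log.record("simplify_and_compare", ["blast_report", "old_blast_report"],
           "new_hits")

upstream = log.lineage("new_hits")
```

With the links recorded at run time, fragmented results can be regrouped afterwards by asking which outputs share an ancestor.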

37 Gene annotation pipeline workflow. Integration and visualisation of GD annotation workflow results. [Diagram labels: Provenance Record; Custom Data Model; Input; Result; Integrated results.]

38 Integration and interoperation. [Diagram labels, shown for each of two stacks: e-Science Semantics; Configuration; Invocation model; Interface; Data format; Domain Semantics. Other labels: Syntax; Provenance Annotation; Service & Data Annotation; App & Shim Services; Information Model; Data identity; Ontologies; Custom Data Objects; LSID; Workflows; Processors; Shims.] Information model is a container for domain semantics. Linking stuff together is Integration Lite.

39 Take Homes. Our apps are providing real scientific results, or at least the hypotheses… The problem is not really gathering and coordinating services, but gathering and coordinating the results. Are you interoperating or integrating? Careful thought has to go into the abstractions we apply to services for finding them and running them. Activation energy vs reusability of service: ROI and altruism. We need more services, more replicas of services, better service interfaces and better reliability and stability. Most of our services turn out not to be vanilla WSDL. Light touch vs added value.

40 Performing In silico Experiments in a Service Based Architecture: Solutions and Issues. Chris Wroe, Phillip Lord, Robert Stevens & Carole Goble, The University of Manchester, UK. http://www.mygrid.org.uk

