A myGrid Project Tutorial

Dr Mark Greenwood, University of Manchester
With considerable help from Justin Ferris, Peter Li, Phil Lord, Chris Wroe, Carole Goble and the rest of the myGrid team.

Open Source Upper Middleware for Bioinformatics
  (Web) Service-based architecture
  Targeted at Tool Developers, Bioinformaticians and Service Providers
  Sites: Newcastle, Sheffield, Manchester, Nottingham, Hinxton, Southampton

myGrid People
  Core: Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.
  Users:
    Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK
    Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK
  Postgraduates: Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire
  Industrial: Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM); Robin McEntire (GSK)
  Collaborators: Keith Decker

Roadmap: start
  [Diagram labels: services, data]

Tenet I
  High-level middleware services for data-intensive resource interoperation for bioinformatics
  Information Grid, not computational Grid
  Exploratory, ad hoc
  For individuals
  In silico experiment as workflow
  Distributed query processing
  Information management

Tenet II
  High-level services for e-Science experimental management:
    Provenance
    Event notification
    Personalisation
  Sharing knowledge and sharing components
  Scientific discovery is personal & global
  Federated third party registries for workflows and services
  Workflow and service discovery for reuse and repurposing
  [Diagram: Registry, with the operations Register, Annotate and Find]
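The Register / Annotate / Find cycle around a registry can be pictured with a small in-memory sketch. This is an illustration only, not the myGrid registry: the `Registry` class, its method names, the example endpoints and the annotation terms are all hypothetical.

```python
class Registry:
    """Hypothetical in-memory registry: register, annotate, find."""

    def __init__(self):
        self._entries = {}

    def register(self, name, endpoint):
        # A provider publishes a service or workflow under a name.
        self._entries[name] = {"endpoint": endpoint, "annotations": set()}

    def annotate(self, name, *terms):
        # Anyone (including a third party) attaches descriptive terms.
        self._entries[name]["annotations"].update(terms)

    def find(self, term):
        # Discovery: every registered name annotated with the term.
        return sorted(
            name
            for name, entry in self._entries.items()
            if term in entry["annotations"]
        )


registry = Registry()
registry.register("blastp", "http://example.org/blastp")        # made-up endpoint
registry.register("repeatmasker", "http://example.org/rmasker")  # made-up endpoint
registry.annotate("blastp", "similarity_search", "protein")
registry.annotate("repeatmasker", "masking", "nucleotide")

print(registry.find("similarity_search"))
```

Keeping annotations separate from registration is the point of the sketch: the party that describes a service need not be the party that published it.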

Tenet III
  Open Source and Open Services
  No control or influence over service providers
  Open to third party metadata and services
  Open extensible architecture
  Assemble your own components
  Designed to work together
  Toolkit: Freefluo WfEE, Taverna, View, UDDI registry, Event Notification, mIR, Pedro, Semantic Discovery, Info. Model, Soaplab, Gateway & Portal, LSID, Haystack Provenance Browser
  Openness:
    open source
    open world of services
    open extensible technology
    open to wider eScience context
    open to user feedback
    open to third party metadata
  Collection of components for assembly: pick and mix

Tenet IV
  (Web) Service architecture
    Publication, discovery, interoperation, composition, decommissioning of myGrid services
    WS-I -> OGSA / WSRF
  Metadata driven
    Ontologies
    Common information model
    Semantic Web technologies: RDF, OWL

Tenet V
  Middleware for:
    Tool Developers
    Bioinformaticians
    Service Providers
  Biologists are indirectly supported by the portals and apps that these groups develop.
  In silico experiments are workflows (see other talks)

Roadmap
  [Diagram labels: services, workflows, data; discover services, run workflows, data management]

Data-intensive bioinformatics
  Bioinformatics analyses typically involve visiting many data resources and analytical tools
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE    116    116       BINDS PEP (BY SIMILARITY).
FT   CONFLICT    374    374       S -> A (IN REF. 3).
SQ   SEQUENCE   429 AA;  46016 MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
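Records like the one above are line-oriented: each line starts with a two-letter code (ID, DE, GN, OS, KW, ...) followed by its content. As a rough sketch of why such data needs handling before tools can be chained, the snippet below groups an excerpt of that entry by line code. The function name `parse_flat_entry` is an assumption made for illustration; a real pipeline would use an established parser rather than this simplification.

```python
from collections import defaultdict

# Excerpt of the entry shown on the slide (feature and sequence lines omitted).
ENTRY = """\
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
"""


def parse_flat_entry(text):
    """Group the content of each line by its two-letter line code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if len(line) < 3 or line.startswith(" "):
            continue  # skip blank lines and indented sequence lines
        code, _, value = line.partition(" ")
        fields[code].append(value.strip())
    # Lines sharing a code (e.g. several DE lines) are joined with a space.
    return {code: " ".join(values) for code, values in fields.items()}


record = parse_flat_entry(ENTRY)
entry_name = record["ID"].split()[0]
keywords = [k.strip() for k in record["KW"].rstrip(".").split(";")]
print(entry_name, record["OS"], keywords)
```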

Use Scenarios
  Graves’ Disease
    Autoimmune disease of the thyroid
    Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle
    Discover all you can about a gene
    Annotation pipelines and gene expression analysis
    Services from Japan, Hong Kong, various sites in UK
  Williams-Beuren Syndrome
    Microdeletion of 1.55 Mbases on Chromosome 7
    Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK
    Characterise an unknown gene
    Annotation pipelines and gene expression analysis
    Services from USA, Japan, various sites in UK

Manually filling a genomic gap
  Two major steps:
    Extend into the gap: similarity searches; RepeatMasker, BLAST
    Characterise the new sequence: NIX, Interpro, etc.
  Numerous web-based services (e.g. BLAST, RepeatMasker)
  Cutting and pasting between screens
  Large number of steps
  Frequently repeated: information is now rapidly added to public databases
  Don’t always get results
  Time consuming
  Huge amount of interrelated data is produced, handled in lab book and files saved to local hard drive
  Mundane
  Much knowledge remains undocumented
  Bioinformatician does the analysis

WBS Workflows
  [Workflow diagram. Colour key: pink = outputs/inputs of a service; purple = tailor-made services; green = Emboss Soaplab services; yellow = Manchester Soaplab services; grey = unknowns]
  First stage: query nucleotide sequence -> RepeatMasker -> ncbiBlastWrapper
  Input: GenBank Accession No (URL inc GB identifier); Seqret; GenBank Entry; nucleotide seq (Fasta)
  Services and their labelled results:
    prettyseq: translation/sequence file, good for records and publications
    sixpack: 6 ORFs
    transeq: ORFs
    epestfind: identifies PEST seq
    pscan: identifies FingerPRINTS
    pepstats: MW, length, charge, pI, etc.
    pepcoil: predicts coiled-coil regions
    Pepwindow? Octanol?: hydrophobic regions
    restrict: restriction enzyme map
    cpgreport: CpG island locations and %
    RepeatMasker: repetitive elements
    GenScan: coding sequence
    SignalP, TargetP, PSORTII: predict cellular location
    InterPro, PFAM, Prosite, Smart: identify functional and structural domains/motifs
    ncbiBlastWrapper: tblastn vs nr, est, est_mouse, est_human databases; Blastp vs nr; Blastn vs nr, est databases
  Other labels: amino acid translation; sort for appropriate sequences only
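A workflow of this kind is a dependency graph: each service runs once the outputs it consumes are available. The sketch below orders the first stage (query sequence, RepeatMasker, ncbiBlastWrapper) with Python's standard `graphlib` module. It is not how Freefluo or Taverna enact workflows; `run_step` is a hypothetical placeholder that only records what each step received, and no real service is called.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose output it consumes.
dependencies = {
    "RepeatMasker": {"query_sequence"},
    "ncbiBlastWrapper": {"RepeatMasker"},
}


def run_step(name, inputs):
    """Hypothetical stand-in for a service call: record the inputs received."""
    return f"{name}({', '.join(inputs)})"


def enact(dependencies, initial):
    """Run every step after the steps it depends on."""
    results = dict(initial)
    for step in TopologicalSorter(dependencies).static_order():
        if step in results:
            continue  # workflow inputs are already available
        inputs = [results[dep] for dep in sorted(dependencies[step])]
        results[step] = run_step(step, inputs)
    return results


results = enact(dependencies, {"query_sequence": "ACGT"})
print(results["ncbiBlastWrapper"])
```

Adding a branch, for example a second consumer of the RepeatMasker output, only means adding one more entry to `dependencies`; the ordering logic stays the same.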

Graves’ Disease Bioinformatics
  Peter Li (1), Claire Jennings (2), Simon Pearce (2) and Anil Wipat (1) (2003)
  (1) School of Computing Science and (2) Institute of Human Genetics, University of Newcastle-upon-Tyne
  [Diagram: a candidate gene pool feeds three components]
  Annotation Pipeline: What is known about my candidate gene?
  Genotype Assay Design System: Is this SNP present in my samples?
    SNP query
    Primer design: Emboss Eprimer application in SoapLab
    Use primers designed by myGrid to amplify region flanking SNP on the gene
    Selection of restriction enzyme: Emboss Restrict in SoapLab
    Restriction Fragment Length Polymorphism experiment
  3D Protein Structure: What is the structure of the protein product encoded by my candidate gene?
    Query PDB & display protein structure
    Obtain information about protein & extract information about active site
    Determine whether coding SNP affects the active site of the protein
  Resources and tools labelled: Medline, Gene ID, PDB, EMBL, GO, Swiss-Prot, AMBIT, Interpro, OMIM, BLAST, Talisman, DQP

Experiment life cycle
  Forming experiments
  Personalisation
  Discovering and reusing experiments and resources
  Executing and monitoring experiments
  Sharing services & experiments
  Managing lifecycle, provenance and results of experiments

(e-)Scientists…
  …Experiment
    Can workflow be used as an experimental method?
    How many times has this experiment been run?
  …Analyze
    How do we manage the results to draw conclusions from them?
    How reliable are these results?
  …Collaborate
    Can we share workflows, results, metadata etc?
  …Publish
    Can we link to these workflows and results from our papers?
  …Review
    Can I find, comprehend and review your work?
    How was that result derived?
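Two of these questions, "How many times has this experiment been run?" and "How was that result derived?", can be answered from a log of service invocations. The following is a minimal sketch under that assumption; the `Invocation` record, the file names and the run identifiers are invented for illustration and do not reflect the myGrid provenance model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invocation:
    """Hypothetical provenance record for one service call."""
    run_id: str
    service: str
    inputs: tuple
    output: str


log = [
    Invocation("run-1", "RepeatMasker", ("query.fasta",), "masked.fasta"),
    Invocation("run-1", "ncbiBlastWrapper", ("masked.fasta",), "hits.txt"),
    Invocation("run-2", "RepeatMasker", ("query.fasta",), "masked-2.fasta"),
]


def run_count(log):
    """How many distinct runs appear in the log?"""
    return len({inv.run_id for inv in log})


def derivation(log, item):
    """Services that led to `item`, most recent first."""
    produced_by = {inv.output: inv for inv in log}
    chain, pending = [], [item]
    while pending:
        inv = produced_by.get(pending.pop())
        if inv is not None:
            chain.append(inv.service)
            pending.extend(inv.inputs)
    return chain


print(run_count(log), derivation(log, "hits.txt"))
```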

Collections of Tasks
  [Diagram]
  Roles: Service Providers, Bioinformaticians, Scientists, Annotation providers
  Tasks: Building, Description, Finding, Querying, Enactment, Storage
  Domain tasks: Workflow Enactment, Provenance, Service Discovery, Data Management

[Architecture diagram]
  Actors: Bioinformaticians, Scientists, Service Providers, Annotation providers, Others
  Components: Registry, Taverna WF Builder, Discovery View, FreeFluo Enactor, Pedro Annotation tool, mIR, Haystack Provenance Browser, Ontology Store, Soaplab, WSDL, Vocabulary
  Interactions: querying/sharing/federating/registering; query & retrieve; workflow execution; annotation/description; invoking; interface description; store data/knowledge; data descriptions

myGrid Service Stack
  [Stack diagram, top to bottom]
  Audiences: Bioinformaticians, Tool Providers, Service Providers
  Applications: Workbench, Taverna, Web Portal, Talisman
  Gateway
  Core services: Personalisation; Registries; Service and Workflow Discovery; Provenance; Event Notification; Ontologies; Ontology Mgt; Views; Metadata Mgt; myGrid Information Repository; FreeFluo Workflow Enactment Engine; OGSA-DQP Distributed Query Processor
  Web Service (Grid Service) communication fabric
  External services: SoapLab, GowLab, Native Web Services, AMBIT Text Extraction Service, Legacy apps
  Notes:
    services that are the tools that will constitute the experiments, that is: specialised services such as AMBIT text extraction [6], and external third party services such as databases, computational analyses, simulations etc., wrapped as web services, possibly by SoapLab [2];
    services for forming and executing experiments, that is: workflow management services [3], information management services, distributed database query processing [5];
    semantic services for discovering services and workflows, and managing metadata, such as: third party service registries and federated personalised views over those registries [8], ontologies and ontology management [13];
    services for supporting the e-Science scientific method and best practice found at the bench but often neglected at the workstation, specifically: provenance management [7] and change notification [9].
  There are multiple ways of cutting the architecture.
  All the services are web services. Some are grid services (mIR and the DQP) and most will be migrated to grid services (WFEE and event notification).
  Bioservices and external services themselves: web services currently; they will become grid services, especially those that are stateful through Soaplab. Some bio services are rendered as web services via Soaplab, others are not (i.e. those from Newcastle). AMBIT is an external web service.
  Top layer: apps (Workbench, Taverna, Portal and Talisman). They use the middle layers through a gateway (fib) which has an overarching view of the myGrid information model.
  The User Proxy and Gateway are logically parts of the same thing (i.e. the persistent bit that stands between a particular user client, such as an instance of the workbench, and the rest of myGrid), and the User Proxy is still used to support notifications from the workbench and user service selection during workflows (not yet supported in the workbench UI).
  The Gateway per se may not be used by the current version of the workbench, since it has yet to be extended to support the newer mIR and enactor designs; I guess that this upgrade will happen between now and All Hands.
  The details of abstract job invocation are currently up in the air, since we had our own proprietary job/workflow description schema in the past, but are now trying to unify this with Chris W's emerging stuff, a possible Scufl extension and broader use of XML Schema.
  Middle layers are the core myGrid services, in 5 clusters:
    1. Workflow enactment engine
    2. DQP
    3. mIR
    4. The e-Science support services: personalisation, provenance and event notification
    5. Semantics: service/workflow discovery, ontology mgt, metadata mgt, plus registries and ontologies. Registries and ontologies could be external.
  Bottom layer: of interest to service providers
  Middle layer: of interest to tool providers
  Top layer: of interest to bioinformaticians, who build apps for biologists

Two+ Paths
  Innovative work
    Service and workflow registration
    Semantic discovery
    Provenance management
    Text mining
  Core functionality
    Services: Soaplab and Gowlab
    Workflow enactment engine: Freefluo
    Workflow workbench: Taverna
    Data integration: OGSA-DQP
    Information model & management
  In between
    Event notification
    Gateway
  Concentric circles? Reflects on the state of play

It's turned into a workbench. Find services dynamically.

Run the Workflow
  Viewing intermediate results

Run the Workflow

Drilling Down: myGrid and Semantics
  Workflow and service discovery
    Prior to and during enactment
    Semantic registration
  Workflow assembly
    Semantic service typing of inputs and outputs
  Provenance of workflows and other entities
    Experimental metadata glue
  Use of RDF, RDFS, DAML+OIL/OWL
    Instance store, ontology server, reasoner
    Materialised vs at point of delivery reasoning
  myGrid Information Model
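Semantic typing of inputs and outputs lets discovery ask "which services can consume this kind of data?" rather than match on names. The sketch below swaps the ontology reasoner for a plain child-to-parent lookup table, which is far weaker than DAML+OIL/OWL reasoning; the concept names, the service table and the helper functions are all invented for illustration.

```python
# Toy "is-a" hierarchy (child -> parent); concept names are invented.
PARENT = {
    "protein_sequence": "sequence",
    "nucleotide_sequence": "sequence",
    "sequence": "data",
}

# Hypothetical service descriptions: name -> (input concept, output concept).
SERVICES = {
    "blastp": ("protein_sequence", "blast_report"),
    "blastn": ("nucleotide_sequence", "blast_report"),
    "seq_length": ("sequence", "integer"),
}


def is_a(concept, ancestor):
    """True if `concept` equals `ancestor` or is one of its descendants."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def services_accepting(data_type):
    """Services whose declared input type subsumes `data_type`."""
    return sorted(
        name
        for name, (accepts, _produces) in SERVICES.items()
        if is_a(data_type, accepts)
    )


print(services_accepting("protein_sequence"))
```

A protein sequence matches both the protein-specific service and the one declared over any sequence, which is the behaviour a name-based lookup would miss.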

Semantic Discovery
  Pedro data capture tool
  Drag a workflow entry into the explorer pane and the workflow loads.
  Drag a service/workflow to the scavenger window for inclusion into the workflow
  View annotations on workflow

Tutorial focus
  Innovative work
    Service and workflow registration
    Semantic discovery
    Provenance management
    Text mining
  Core functionality
    Services: Soaplab and Gowlab
    Workflow enactment engine: Freefluo
    Workflow workbench: Taverna
    Data integration: OGSA-DQP
    Information model & management
  In between
    Event notification
    Gateway
  Concentric circles? Reflects on the state of play

Roadmap
  1. Describe services
  2. Discover services
  3. Write & run workflows
  4. Provenance & data management
  [Diagram components: Registry, Taverna workbench, LSID authorities; resources: services, workflows, data]
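The LSID authorities in this roadmap resolve Life Science Identifiers, which are URNs of the form urn:lsid:authority:namespace:object with an optional trailing revision. As a rough sketch of what a client does before contacting an authority, the snippet below splits an LSID into its parts. The `LSID` tuple and `parse_lsid` function are hypothetical names, the example identifier is made up, and resolution itself is not shown.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split urn:lsid:authority:namespace:object[:revision] into its parts."""
    parts = text.split(":")
    prefix = [p.lower() for p in parts[:2]]  # the prefix is case-insensitive
    if len(parts) not in (5, 6) or prefix != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    if any(not p for p in parts[2:5]):
        raise ValueError(f"empty component in LSID: {text!r}")
    return LSID(*parts[2:])


# Made-up identifier under the reserved example.org domain.
lsid = parse_lsid("urn:lsid:example.org:proteins:p12345:2")
print(lsid.authority, lsid.namespace, lsid.object_id, lsid.revision)
```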

Sessions on Details
  Workflows: hands on with Taverna
  Semantics
  Timetable (split sessions)
    Session 1
      Group 1: hands on (Swanson)
      Group 2: semantics (Newhaven)
    Tea break (short)
    Session 2
      Group 1: semantics (Newhaven)
      Group 2: hands on (Swanson)
    Discussions and Conclusions

Questions?
  http://www.mygrid.org.uk
  http://taverna.sf.net
  http://freefluo.sf.net/