A myGrid Project Tutorial


1 A myGrid Project Tutorial
Dr Mark Greenwood, University of Manchester. With considerable help from Justin Ferris, Peter Li, Phil Lord, Chris Wroe, Carole Goble and the rest of the myGrid team.

2 Open Source Upper Middleware for Bioinformatics
(Web) Service-based architecture.
Targeted at Tool Developers, Bioinformaticians and Service Providers.
Sites: Newcastle, Sheffield, Manchester, Nottingham, Hinxton, Southampton.

3 myGrid People
Core: Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.
Users: Simon Pearce and Claire Jennings (Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK); Hannah Tipney, May Tassabehji and Andy Brass (St Mary’s Hospital, Manchester, UK).
Postgraduates: Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire.
Industrial: Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM); Robin McEntire (GSK).
Collaborators: Keith Decker.

4 Roadmap - start services data

5 Tenet I
High-level middleware services for data-intensive resource interoperation for bioinformatics.
Information Grid, not computational Grid.
Exploratory, ad hoc, for individuals.
In silico experiment as workflow.
Distributed query processing.
Information management.

6 Tenet II
High-level services for e-Science experimental management:
Provenance; Event notification; Personalisation.
Sharing knowledge and sharing components: scientific discovery is personal & global.
Federated third-party registries for workflows and services.
Workflow and service discovery for reuse and repurposing.
Registry operations: Register, Annotate, Find.

7 Tenet III
Open Source and Open Services.
No control or influence over service providers.
Open to third-party metadata and services.
Open extensible architecture: assemble your own components, designed to work together.
Toolkit (a collection of components for assembly, pick and mix): Freefluo WfEE, Taverna, View, UDDI registry, Event Notification, mIR, Pedro, Semantic Discovery, Info. Model, Soaplab, Gateway & Portal, LSID, Haystack Provenance Browser.
Openness: open source; open world of services; open extensible technology; open to wider eScience context; open to user feedback; open to third-party metadata.

8 Tenet IV
(Web) Service architecture: publication, discovery, interoperation, composition and decommissioning of myGrid services; WS-I -> OGSA / WSRF.
Metadata driven: ontologies; common information model; Semantic Web technologies (RDF, OWL).

9 Tenet V
Middleware for Tool Developers, Bioinformaticians and Service Providers.
Biologists are indirectly supported by the portals and apps these develop.
In silico experiments are workflows (see other talks).

10 Roadmap
Services: discover services.
Workflows: run workflows.
Data: data management.

11 Data-intensive bioinformatics
Bioinformatics analyses typically involve visiting many data resources and analytical tools.
ID MURA_BACSU STANDARD; PRT; AA.
DE PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE (EC ) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE ENOLPYRUVYL TRANSFERASE) (EPT).
GN MURA OR MURZ.
OS BACILLUS SUBTILIS.
OC BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC BACILLUS.
KW PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT ACT_SITE BINDS PEP (BY SIMILARITY).
FT CONFLICT S -> A (IN REF. 3).
SQ SEQUENCE AA; MW; C5C CRC32;
MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
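The Swiss-Prot entry on this slide is a flat-file record in which each line starts with a two-letter line code (ID, DE, GN, OS, ...) and the sequence follows the SQ line. As a rough illustration of what a tool has to do before it can pass such a record to another service, here is a minimal Python sketch that groups lines by code. The function name `parse_entry` and the abridged sample entry are assumptions made for this example; real Swiss-Prot files carry more fields and stricter column rules than this handles.

```python
# Minimal sketch: group the lines of one Swiss-Prot-style entry by line code.
# Assumption: sequence lines are indented (no line code), as in real flat files.

SAMPLE_ENTRY = """
ID   MURA_BACSU
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
SQ   SEQUENCE
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP
//
"""

def parse_entry(text):
    """Return (fields, sequence) for a single flat-file entry."""
    fields = {}
    residues = []
    for line in text.splitlines():
        # Skip blank lines and the "//" entry terminator.
        if not line.strip() or line.startswith("//"):
            continue
        code, value = line[:2].strip(), line[2:].strip()
        if code:
            # A line code may repeat (e.g. several DE lines), so keep a list.
            fields.setdefault(code, []).append(value)
        else:
            # Indented lines hold sequence data in space-separated blocks.
            residues.append(value.replace(" ", ""))
    return fields, "".join(residues)

fields, sequence = parse_entry(SAMPLE_ENTRY)
print(fields["ID"], fields["OS"], len(sequence))
```

Keeping each code's values in a list preserves multi-line fields such as DE and OC without guessing how they should be joined.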

12 Use Scenarios
Graves’ Disease
Autoimmune disease of the thyroid.
Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle.
Discover all you can about a gene.
Annotation pipelines and gene expression analysis.
Services from Japan, Hong Kong and various sites in the UK.
Williams-Beuren Syndrome
Microdeletion of 1.55 Mbases on Chromosome 7.
Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK.
Characterise an unknown gene.
Annotation pipelines and gene expression analysis.
Services from the USA, Japan and various sites in the UK.

13 Manually filling a genomic gap
Two major steps:
1. Extend into the gap: similarity searches; RepeatMasker, BLAST.
2. Characterise the new sequence: NIX, Interpro, etc.
Numerous web-based services (e.g. BLAST, RepeatMasker).
Cutting and pasting between screens.
Large number of steps.
Frequently repeated: info is now rapidly added to public databases.
Don’t always get results.
Time consuming.
Huge amount of interrelated data is produced, handled in a lab book and in files saved to the local hard drive.
Mundane.
Much knowledge remains undocumented.
Bioinformatician does the analysis.

14 WBS Workflows
Colour key: Pink: outputs/inputs of a service. Purple: tailor-made services. Green: Emboss Soaplab services. Yellow: Manchester Soaplab services. Grey: unknowns.
Inputs: query nucleotide sequence; GenBank Accession No; GenBank Entry; URL inc GB identifier; Seqret; nucleotide seq (Fasta).
Nucleotide-level services:
RepeatMasker: repetitive elements.
ncbiBlastWrapper: Blastn vs nr, est databases.
GenScan: coding sequence.
restrict: restriction enzyme map.
cpgreport: CpG island locations and %.
sixpack: 6 ORFs.
transeq: amino acid translation.
prettyseq: translation/sequence file. Good for records and publications.
Sort for appropriate sequences only.
Protein-level services:
ncbiBlastWrapper: tblastn vs nr, est, est_mouse, est_human databases; Blastp vs nr.
epestfind: identifies PEST seq.
pscan: identifies FingerPRINTS.
pepstats: MW, length, charge, pI, etc.
pepcoil: predicts coiled-coil regions.
SignalP, TargetP, PSORTII: predict cellular location.
InterPro, PFAM, Prosite, Smart: identify functional and structural domains/motifs.
Pepwindow? Octanol?: hydrophobic regions.
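Several of the services in this workflow (sixpack, transeq) turn a nucleotide sequence into candidate protein sequences before the protein-level analyses run. To make that step concrete, here is a small Python sketch of six-frame translation using the standard genetic code. It is an illustration of the idea only, not the EMBOSS implementation; the function names `translate` and `six_frames` and the frame labels are assumptions for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order (NCBI table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(seq):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )

def six_frames(seq):
    """Return translations for the three forward and three reverse frames."""
    seq = seq.upper()
    reverse = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames

for name, protein in sorted(six_frames("ATGGCCTAA").items()):
    print(name, protein)
```

In a workflow, each of these six translations would then be passed on to the protein-level services, which is the kind of fan-out that is tedious to do by cutting and pasting between web pages.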

15 Graves’ Disease Bioinformatics
Peter Li (1), Claire Jennings (2), Simon Pearce (2) and Anil Wipat (1) (2003). (1) School of Computing Science and (2) Institute of Human Genetics, University of Newcastle-upon-Tyne.
Starting point: candidate gene pool.
Annotation Pipeline: What is known about my candidate gene? Resources shown: Medline, Gene ID, EMBL, GO, Swiss-Prot, Interpro, OMIM, BLAST, AMBIT, SNP query.
Genotype Assay Design System: Is this SNP present in my samples?
Primer design (Emboss Eprimer application in SoapLab): use primers designed by myGrid to amplify the region flanking the SNP on the gene.
Restriction Fragment Length Polymorphism experiment: selection of restriction enzyme (Emboss Restrict in SoapLab).
3D Protein Structure: What is the structure of the protein product encoded by my candidate gene?
Query PDB & display protein structure.
Obtain information about the protein & extract information about the active site (Swiss-Prot, AMBIT).
Determine whether a coding SNP affects the active site of the protein.
Other components shown: Talisman, DQP.

16 Experiment life cycle
Forming experiments
Personalisation
Discovering and reusing experiments and resources
Executing and monitoring experiments
Sharing services & experiments
Managing lifecycle, provenance and results of experiments

17 (e-)Scientists…
…Experiment: Can workflow be used as an experimental method? How many times has this experiment been run?
…Analyze: How do we manage the results to draw conclusions from them? How reliable are these results?
…Collaborate: Can we share workflows, results, metadata etc?
…Publish: Can we link to these workflows and results from our papers?
…Review: Can I find, comprehend and review your work? How was that result derived?
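Two of the questions on this slide ("How many times has this experiment been run?" and "How was that result derived?") are provenance queries. The following Python sketch shows one way such questions could be answered from a log of subject-predicate-object statements. It is a toy under stated assumptions: the class `ProvenanceLog`, the predicate names `enacted`, `used` and `derivedBy`, and the run and file identifiers are all hypothetical and are not myGrid's provenance schema.

```python
class ProvenanceLog:
    """Toy provenance store holding (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = []

    def record_run(self, run_id, workflow, inputs, outputs):
        # One statement for the run itself, one per input and per output.
        self.triples.append((run_id, "enacted", workflow))
        for item in inputs:
            self.triples.append((run_id, "used", item))
        for item in outputs:
            self.triples.append((item, "derivedBy", run_id))

    def run_count(self, workflow):
        """How many times has this workflow been run?"""
        return sum(1 for _, p, o in self.triples if p == "enacted" and o == workflow)

    def derivation(self, item):
        """How was this item derived? Returns (run, workflow, inputs) steps."""
        steps, frontier, seen = [], [item], set()
        while frontier:
            current = frontier.pop()
            for s, p, run in list(self.triples):
                if s == current and p == "derivedBy" and run not in seen:
                    seen.add(run)
                    workflow = next(
                        w for r, q, w in self.triples if r == run and q == "enacted"
                    )
                    inputs = [i for r, q, i in self.triples if r == run and q == "used"]
                    steps.append((run, workflow, inputs))
                    frontier.extend(inputs)  # keep tracing back through inputs
        return steps

log = ProvenanceLog()
log.record_run("run1", "repeat_mask", ["seq.fasta"], ["masked.fasta"])
log.record_run("run2", "blast_search", ["masked.fasta"], ["hits.txt"])
log.record_run("run3", "blast_search", ["other.fasta"], ["hits2.txt"])
print(log.run_count("blast_search"))
print(log.derivation("hits.txt"))
```

Recording statements as triples is a deliberate choice here because it mirrors the RDF style of metadata mentioned elsewhere in the tutorial, but a real system would use proper identifiers and a triple store rather than a Python list.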

18 Collections of Tasks
Roles: Service Providers, Bioinformaticians, Scientists, Annotation providers.
Domain tasks: Building, Description, Service Discovery, Finding, Querying, Workflow Enactment, Provenance, Storage, Data Management.

19 Registry
Roles: Bioinformaticians, Scientists, Service Providers, Annotation providers.
Components: Registry, Discovery View, Taverna WF Builder, FreeFluo Enactor, Pedro Annotation tool, mIR, Haystack Provenance Browser, Ontology Store, Soaplab, others.
Interactions: querying/sharing/federating/registering; query & retrieve; workflow execution; invoking; annotation/description; interface description (WSDL); store data/knowledge; vocabulary; data descriptions.

20 myGrid Service Stack
Audiences: Bioinformaticians, Tool Providers, Service Providers.
Applications: Workbench, Taverna, Web Portal, Talisman.
Gateway and Personalisation.
Core services: Registries; Views; Service and Workflow Discovery; Ontologies; Ontology Mgt; Metadata Mgt; Provenance; Event Notification; myGrid Information Repository (mIR); FreeFluo Workflow Enactment Engine; OGSA-DQP Distributed Query Processor.
Web Service (Grid Service) communication fabric.
External services: SoapLab, GowLab, Native Web Services, AMBIT Text Extraction Service, legacy apps.

The services fall into four groups:
Services that are the tools that will constitute the experiments: specialised services such as AMBIT text extraction [6], and external third-party services such as databases, computational analyses and simulations, wrapped as web services, possibly by SoapLab [2].
Services for forming and executing experiments: workflow management services [3], information management services, distributed database query processing [5].
Semantic services for discovering services and workflows, and managing metadata: third-party service registries and federated personalised views over those registries [8], ontologies and ontology management [13].
Services for supporting the e-Science scientific method and best practice found at the bench but often neglected at the workstation: provenance management [7] and change notification [9].

There are multiple ways of cutting the architecture. All the services are web services. Some are grid services (mIR and the DQP) and most will be migrated to grid services (WfEE and event notification).
Bioservices and external services: these are web services currently and will become grid services, especially those that are stateful through Soaplab. Some bio services are rendered as web services via Soaplab; others are not (e.g. those from Newcastle). AMBIT is an external web service.
Top layer: apps (Workbench, Taverna, Portal and Talisman). They use the middle layers through a gateway (fib) which has an overarching view of the myGrid information model. The User Proxy and Gateway are logically parts of the same thing (i.e. the persistent bit that stands between a particular user client, such as an instance of the workbench, and the rest of myGrid), and the User Proxy is still used to support notifications from the workbench and user service selection during workflows (not yet supported in the workbench UI). The Gateway per se may not be used by the current version of the workbench, since it has yet to be extended to support the newer mIR and enactor designs; I guess that this upgrade will happen between now and All Hands. The details of abstract job invocation are currently up in the air, since we had our own proprietary job/workflow description schema in the past but are now trying to unify this with Chris W's emerging stuff, a possible Scufl extension and broader use of XML Schema.
Middle layers: the core myGrid services, in 5 clusters:
1. Workflow enactment engine.
2. DQP.
3. mIR.
4. The e-Science support services: personalisation, provenance and event notification.
5. Semantics: service/workflow discovery, ontology mgt, metadata mgt, plus registries and ontologies. Registries and ontologies could be external.
The bottom layer is of interest to service providers, the middle layer to tool providers, and the top layer to bioinformaticians, who build apps for biologists.

21 Two+ Paths
Innovative work: service and workflow registration; semantic discovery; provenance management; text mining.
Core functionality: services – Soaplab and Gowlab; workflow enactment engine – Freefluo; workflow workbench – Taverna; data integration – OGSA-DQP; information model & management.
In between: event notification; Gateway.
Concentric circles? Reflects on the state of play.

22 myGrid Service Stack

23 It's turned into a workbench.
Find services dynamically.

24 Run the Workflow
Viewing intermediate results

25 Run the Workflow

26 Drilling Down: myGrid and Semantics
Workflow and service discovery: prior to and during enactment; semantic registration.
Workflow assembly: semantic service typing of inputs and outputs.
Provenance of workflows and other entities: experimental metadata glue.
Use of RDF, RDFS, DAML+OIL/OWL.
Instance store, ontology server, reasoner.
Materialised vs at-point-of-delivery reasoning.
myGrid Information Model.
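Semantic typing of inputs and outputs means a service can be found by what kind of data it consumes and produces, not just by its name. The Python sketch below illustrates the idea with a tiny hand-written concept hierarchy standing in for an ontology and reasoner. Everything in it is an assumption for illustration: the concept names, the `IS_A` table, the annotations attached to `blastp`, `blastn` and `transeq`, and the functions `subsumes` and `discover` are not taken from the myGrid ontology or registry API.

```python
# Hypothetical mini-ontology: each concept maps to its parent concept.
IS_A = {
    "protein_sequence": "sequence",
    "nucleotide_sequence": "sequence",
    "sequence": "data",
    "blast_report": "report",
    "report": "data",
}

# Hypothetical registry entries annotated with input and output concepts.
SERVICES = [
    {"name": "blastp", "input": "protein_sequence", "output": "blast_report"},
    {"name": "blastn", "input": "nucleotide_sequence", "output": "blast_report"},
    {"name": "transeq", "input": "nucleotide_sequence", "output": "protein_sequence"},
]

def subsumes(general, specific):
    """True if `specific` is `general` itself or one of its descendants."""
    while specific is not None:
        if specific == general:
            return True
        specific = IS_A.get(specific)  # walk up to the parent concept
    return False

def discover(have, want):
    """Names of services that accept `have` and produce some kind of `want`."""
    return [
        s["name"]
        for s in SERVICES
        if subsumes(s["input"], have) and subsumes(want, s["output"])
    ]

print(discover("nucleotide_sequence", "report"))
print(discover("nucleotide_sequence", "sequence"))
```

The point of the subsumption check is that a query for a general concept such as "report" still finds a service annotated with the more specific "blast_report", which a plain keyword match on names would miss.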

27 Semantic Discovery
Pedro data capture tool.
Drag a workflow entry into the explorer pane and the workflow loads.
Drag a service/workflow to the scavenger window for inclusion into the workflow.
View annotations on workflow.

28 Tutorial focus
Innovative work: service and workflow registration; semantic discovery; provenance management; text mining.
Core functionality: services – Soaplab and Gowlab; workflow enactment engine – Freefluo; workflow workbench – Taverna; data integration – OGSA-DQP; information model & management.
In between: event notification; Gateway.
Concentric circles? Reflects on the state of play.

29 Roadmap
1. Describe services
2. Discover services
3. Write & run workflows
4. Provenance & data management
Components: Registry (services), Taverna workbench (workflows), LSID authorities (data).
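The data step of the roadmap relies on LSID authorities, which resolve Life Science Identifiers. An LSID is a URN of the form `urn:lsid:<authority>:<namespace>:<object>` with an optional trailing `:<revision>`. As a small illustration of how a client might split one apart before contacting the authority, here is a Python sketch; the `parse_lsid` function, the `LSID` tuple and the example identifier under `example.org` are made up for this example and do not refer to any real myGrid authority.

```python
from typing import NamedTuple, Optional

class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]

def parse_lsid(text):
    """Split an LSID URN into its parts, raising ValueError if malformed."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts: {text!r}")
    # The "urn:lsid" prefix is matched case-insensitively.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID URN: {text!r}")
    if any(not part for part in parts[2:5]):
        raise ValueError(f"empty authority, namespace or object: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)

# Hypothetical identifier for a workflow result.
print(parse_lsid("urn:lsid:example.org:workflow_results:run42:1"))
```

The authority part tells the client which resolver to ask, so separating it out is the first thing any LSID-aware tool needs to do.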

30 Sessions on Details
Workflows: hands on with Taverna.
Semantics.
Timetable (split sessions):
Session 1: Group 1 – hands on (Swanson); Group 2 – semantics (Newhaven).
Tea break (short).
Session 2: Group 1 – semantics (Newhaven); Group 2 – hands on (Swanson).
Discussions and Conclusions.

31 Questions? http://www.mygrid.org.uk http://taverna.sf.net

