SARS Analysis on the Grid: Discovery Net in Bioinformatics
Copyright Discovery Net, Imperial College, 2001-2004



Overview
Introduction to Discovery Net
SARS project
Demo
Conclusion


Structure of Discovery Net
A compositional Grid
[Architecture diagram. Labels shown: Workflow Authoring (composing services); Workflow Management; Collaborative Knowledge Management; Workflow Warehousing; Workflow Deployment (Grid service and portal); Workflow Execution; Resource Mapping; Service Abstraction; Component Design/Integration. Back ends shown: Condor-G, native MPI, OGSA service, Web service, Unicore, Oracle 10g, Web wrapper, Sun Grid Engine.]

Workflow Representation
Workflow = discovery planning by service/component composition
Internal representation in Discovery Process Mark-up Language (DPML)
Enables:
  – Collaboration between the researchers involved (who owns which part of the analysis)
  – Transparency of component location in the analysis
  – End-user empowerment
[Figure: D-Net workflow for genome annotation, 16 services executing across the Internet]
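The idea of a workflow as a composition of services can be sketched as a small dependency graph that is executed in topological order. This is only an illustration in Python: it is not DPML (whose syntax the slides do not show), and the step names and toy sequence are made up for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(steps, depends_on):
    """Run each step after its upstream steps and collect all results.

    steps:      {step name: function taking {upstream name: result}}
    depends_on: {step name: set of upstream step names}
    """
    results = {}
    for name in TopologicalSorter(depends_on).static_order():
        upstream = {dep: results[dep] for dep in depends_on.get(name, ())}
        results[name] = steps[name](upstream)
    return results


# Hypothetical four-step workflow on a toy sequence.
steps = {
    "fetch": lambda up: "ATGGCC",
    "length": lambda up: len(up["fetch"]),
    "gc": lambda up: sum(base in "GC" for base in up["fetch"]),
    "report": lambda up: f"{up['gc']}/{up['length']} GC",
}
depends_on = {
    "length": {"fetch"},
    "gc": {"fetch"},
    "report": {"length", "gc"},
}

results = run_workflow(steps, depends_on)
print(results["report"])
```

Because each step only sees the results of its declared upstream steps, a step could in principle run anywhere, which is the location transparency the slide refers to.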

Component model
Components
  – Nodes
  – Basic units of composition
  – Contain compositional, integrity and execution logic
Component frameworks
  – Groups of related nodes (e.g. sequence alignment)
  – Common object model (inputs/outputs are typed)
Component architectures
  – Grouping of related frameworks (e.g. bioinformatics)

Three levels of a component
Connectivity:
  – What are my inputs?
  – What are my outputs?
Metadata:
  – What are my logical constraints?
  – How do I verify myself?
  – What will I produce?
Execution:
  – What do I actually do?
[Diagram labels: input types, input metadata, input data; output types, result metadata, result]
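A minimal Python sketch of the three levels, assuming a component can be modelled as typed ports plus a constraint check plus a run function. The class and field names are hypothetical and are not taken from the Discovery Net SDK.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Data = Dict[str, Any]


@dataclass
class Component:
    """Hypothetical node carrying the three levels named on the slide."""
    name: str
    input_types: Dict[str, type]         # connectivity: what are my inputs?
    output_types: Dict[str, type]        # connectivity: what are my outputs?
    constraint: Callable[[Data], bool]   # metadata: logical constraints
    run: Callable[[Data], Data]          # execution: what do I actually do?

    def verify(self, inputs: Data) -> bool:
        """Metadata level: check input types, then the logical constraint."""
        for key, expected in self.input_types.items():
            if not isinstance(inputs.get(key), expected):
                return False
        return self.constraint(inputs)

    def execute(self, inputs: Data) -> Data:
        """Execution level: run only verified inputs and type-check the result."""
        if not self.verify(inputs):
            raise ValueError(f"{self.name}: inputs failed verification")
        result = self.run(inputs)
        for key, expected in self.output_types.items():
            if not isinstance(result.get(key), expected):
                raise TypeError(f"{self.name}: output '{key}' has the wrong type")
        return result


# Illustrative node: GC fraction of a non-empty DNA string.
gc_content = Component(
    name="gc_content",
    input_types={"sequence": str},
    output_types={"gc": float},
    constraint=lambda d: len(d["sequence"]) > 0 and set(d["sequence"]) <= set("ACGT"),
    run=lambda d: {"gc": sum(b in "GC" for b in d["sequence"]) / len(d["sequence"])},
)

print(gc_content.execute({"sequence": "GGCA"}))
```

Separating `verify` from `execute` means a workflow editor could reject an invalid connection before any data is processed.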

Construction of a component
Through the Software Development Kit, for new algorithms
Using template nodes for web services and command-line tools
With specialised IDEs to produce customised components
The idea is to hide the complexity of component construction from the user as far as possible

Workflow Warehousing and Provenance
Workflows/services record their history:
  – Discovery Net records the full authoring information
  – Users may annotate workflows
  – All information is stored in DPML
Shared IP for a Virtual Organisation
Users can browse for services based on properties
Users can browse for existing workflows and workflow templates
Users can see the full project history for each service
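As a rough illustration of history recording and property-based browsing, the sketch below keeps an append-only log per workflow and filters a warehouse by a property. All class, field and workflow names are hypothetical; the real system stores this information in DPML, which is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HistoryEntry:
    author: str
    action: str       # e.g. "created", "edited", "annotated"
    note: str = ""


@dataclass
class WorkflowRecord:
    """Hypothetical warehouse entry: properties plus an append-only history."""
    name: str
    properties: Dict[str, str] = field(default_factory=dict)
    history: List[HistoryEntry] = field(default_factory=list)

    def record(self, author: str, action: str, note: str = "") -> None:
        self.history.append(HistoryEntry(author, action, note))

    def authors(self) -> List[str]:
        """Contributors in order of first appearance."""
        seen: List[str] = []
        for entry in self.history:
            if entry.author not in seen:
                seen.append(entry.author)
        return seen


def browse(warehouse: List[WorkflowRecord], **wanted: str) -> List[str]:
    """Names of workflows whose properties match every requested value."""
    return [w.name for w in warehouse
            if all(w.properties.get(k) == v for k, v in wanted.items())]


align = WorkflowRecord("align_genomes", {"domain": "genomics"})
align.record("alice", "created")
align.record("bob", "edited", "changed gap penalty")
align.record("alice", "annotated", "used for strain comparison")
annotate = WorkflowRecord("annotate_proteins", {"domain": "proteomics"})
annotate.record("carol", "created")

warehouse = [align, annotate]
print(browse(warehouse, domain="genomics"), align.authors())
```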

Publishing of workflows
Parameterisation of a workflow
Defining the black box that is offered to the end user
Once deployed, a workflow is accessible as:
  – Web service
  – Grid service
  – Command-line tool
  – Web page
Workflows are combined in personalised portals
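Parameterisation can be pictured as fixing most of a workflow's settings and exposing only a chosen few to the end user. The sketch below is a plain-Python analogy under that assumption; `publish` and `find_motif` are hypothetical names, and the deployment targets listed above (web service, Grid service and so on) are not modelled.

```python
def find_motif(sequence, motif, allow_overlap):
    """Toy 'workflow': start positions of motif in sequence."""
    positions = []
    step = 1 if allow_overlap else len(motif)
    i = sequence.find(motif)
    while i != -1:
        positions.append(i)
        i = sequence.find(motif, i + step)
    return positions


def publish(workflow, exposed, fixed):
    """Wrap a workflow as a black box that accepts only the exposed parameters."""
    exposed = set(exposed)

    def black_box(**kwargs):
        unknown = set(kwargs) - exposed
        missing = exposed - set(kwargs)
        if unknown or missing:
            raise TypeError(
                f"unknown: {sorted(unknown)}, missing: {sorted(missing)}")
        return workflow(**fixed, **kwargs)

    return black_box


# The designer fixes the motif and overlap policy; the end user supplies a sequence.
find_start_codons = publish(
    find_motif,
    exposed=["sequence"],
    fixed={"motif": "ATG", "allow_overlap": False},
)

print(find_start_codons(sequence="ATGCCATGA"))
```

The end user never sees the fixed parameters, which matches the "black box" wording on the slide.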

Discovery Net users
Component developers
  – IT-literate to an extent
Analysis designers
  – Domain experts with an understanding of the research problem
End users
  – Scientists with no interest in IT or in coding/assembling their own software
The line between these roles does get blurry!

Discovery Net Application Examples
Environmental modelling
  – High-throughput dispersed air sensing technology
Life sciences
  – High-throughput genomics and proteomics
Real-time geo-hazard modelling
  – Earthquake modelling through satellite imagery
GM crop trial studies
  – Simulating the effects of GM crops on the surrounding ecosystem


SARS Basic Facts
Appeared first in January 2003 in Guangdong province, China
SARS coronavirus (SARS-CoV) identified as the cause
China started a major research initiative to investigate the biology of the virus and predict its behaviour

SARS project
Collaboration between Discovery Net and SCBIT (Shanghai Center for Bioinformation Technology)
Annotation of SARS genomes obtained from different patient samples
Analysis of the mutation patterns of the SARS virus
Discovery Net provides the IT platform to organise the analysis

Work done
Data
  – Research performed on 33 samples of the SARS virus, sequenced from Chinese patients
  – Combined with publicly available data from NCBI
Goal
  – Deeper understanding of the mutation patterns of the SARS virus
Analysis
  – Examining the variability of the virus at both the genomic and the proteomic level
  – Providing insight into the significance of changes in the nucleotide sequence of the virus

Genomic analysis
Alignment: data intensive, performed on the Grid
Retrieval of publicly available knowledge
Examining the variations between different strains
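Once sequences are aligned, examining variation between strains amounts to finding alignment columns where the strains disagree. The sketch below shows that step only, on made-up toy sequences (not SARS data); the function name is hypothetical and the alignment itself, which the slides say ran on the Grid, is assumed to have been done already.

```python
def variable_sites(alignment):
    """Return {position: {strain: base}} for columns where strains differ.

    alignment: {strain name: aligned sequence}, all sequences of equal length.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    sites = {}
    for pos in range(lengths.pop()):
        column = {strain: seq[pos] for strain, seq in alignment.items()}
        if len(set(column.values())) > 1:
            sites[pos] = column
    return sites


# Toy aligned sequences, invented for illustration.
alignment = {
    "strain_a": "ACGTAC",
    "strain_b": "ACGTTC",
    "strain_c": "ACCTAC",
}

sites = variable_sites(alignment)
print(sorted(sites))
```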

Phylogenetic view
[Figure labels: SARS genomes taken from Hong Kong patients; SARS genomes taken from Beijing patients; SARS genomes taken from Singapore patients]

Proteomic analysis
Isolating interesting genomic regions
Identifying relevant protein sequences
Observing the variations in the resulting protein
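Observing variations in the resulting protein can be illustrated by translating a reference and a variant coding region and listing the codons whose amino acid changes. The sketch uses the standard genetic code and toy sequences; the function names are hypothetical, and this is not the SCBIT group's own variation-mining algorithm.

```python
# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a coding sequence codon by codon ('*' marks a stop codon)."""
    usable = len(dna) - len(dna) % 3   # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def protein_changes(reference, variant):
    """List (codon index, reference amino acid, variant amino acid) where they differ."""
    ref_protein, var_protein = translate(reference), translate(variant)
    return [(i, r, v)
            for i, (r, v) in enumerate(zip(ref_protein, var_protein))
            if r != v]


print(protein_changes("ATGGCC", "ATGGCT"))  # silent change: GCC and GCT both encode A
print(protein_changes("ATGGCC", "ATGGAC"))  # GCC (A) becomes GAC (D)
```

A nucleotide change that leaves the list empty is synonymous, which is one simple way to separate changes that matter for the protein from those that do not.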

Proteomic annotation
Parallel annotation with multiple sequence analysis tools
Framework first used at Supercomputing 2002
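Parallel annotation can be sketched as submitting the same sequence to several independent tools at once and merging their outputs. Below, a local thread pool stands in for Grid execution, and the two "tools" are trivial placeholder functions invented for the example, not real sequence analysis programs.

```python
from concurrent.futures import ThreadPoolExecutor


def length_tool(protein):
    """Placeholder tool: sequence length."""
    return {"length": len(protein)}


def hydrophobic_tool(protein):
    """Placeholder tool: fraction of residues in a simple hydrophobic set."""
    hydrophobic = set("AVILMFWY")
    return {"hydrophobic_fraction": sum(r in hydrophobic for r in protein) / len(protein)}


def annotate(protein, tools):
    """Run every tool on the same sequence concurrently; tools must be non-empty."""
    with ThreadPoolExecutor(max_workers=len(tools)) as pool:
        futures = {name: pool.submit(tool, protein) for name, tool in tools.items()}
        return {name: future.result() for name, future in futures.items()}


tools = {"length": length_tool, "hydrophobicity": hydrophobic_tool}
annotations = annotate("MKVL", tools)
print(annotations)
```

Since the tools do not depend on each other, the total time is bounded by the slowest tool rather than the sum of all of them.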

Annotation editor

SCBIT Analysis Portal


Next step
Portal technology used to build thematic portals concentrating on particular research areas
Goal: to construct a number of public portals for the needs of the UK e-Science community and make them accessible to all


Discovery Net Advantages…
Rapid component integration through the SDK or generic connectors:
  – Grid services
  – Web services
  – Command-line tools etc.
Intuitive research assembly and management
  – Graphical workflow assembly
Provenance of analysis
  – Within the server warehouse
Personalised end-user environments
  – Discovery Portal

… applied to SCBIT research
Integrated:
  – Existing tools (EMBOSS, alignment applications)
  – In-house data stores (with SARS sequence data)
  – Original algorithms for mining variation information
Workflows assembled by the whole research group
Research history tracked through the project change information
SCBIT Portal creating a common platform for multidisciplinary users

Summary
IT platform supporting urgent discovery research
Access to data within a scalable knowledge creation infrastructure
Exploitation and annotation of biological information using multiple sources, data types and locations
Integration of external applications within a unified environment
Sharing of methods, results and data views across the Virtual Organisation

Credits and further info
Discovery Net team, especially Moustafa Ghanem, Jameel Syed and Stuart Hassard
Exhibiting at the EPSRC and LESC stands
Demo today, 13:15 – 14:45, at the EPSRC stand