Copyright Discovery Net Imperial College 2001-2004 SARS Analysis on the Grid Discovery Net in Bioinformatics.

Presentation transcript:

1 SARS Analysis on the Grid: Discovery Net in Bioinformatics

2 Overview: Introduction to Discovery Net; SARS project; Demo; Conclusion

3 Overview: Introduction to Discovery Net; SARS project; Demo; Conclusion

4 Structure of Discovery Net
A compositional GRID Workflow Management
Collaborative Knowledge Management
Diagram labels: Workflow Authoring; Composing services; Workflow Warehousing; Workflow Deployment: Grid Service and Portal; Workflow Execution; Resource Mapping; Service Abstraction; Component Design/Integration
Resources shown: Condor-G, Native MPI, OGSA-service, Web Service, Unicore, Oracle 10g, Web Wrapper, Sun Grid Engine

5 Workflow Representation
Workflow = Discovery Planning by Service/Component Composition
Internal representation in Discovery Process Mark-up Language (DPML)
Enables:
–Collaboration between researchers involved (who owns which part of analysis)
–Transparency of component location in the analysis
–End-user empowerment
D-Net Workflow for Genome Annotation: 16 services executing across the Internet

6 Component model
Components
–Nodes
–Basic units of composition
–Contain compositional, integrity and execution logic
Component frameworks
–Groups of related nodes (sequence alignment)
–Common object model (inputs/outputs are typed)
Component architectures
–Grouping of related frameworks (bioinformatics)

7 Three levels of a component
Connectivity:
–What are my inputs?
–What are my outputs?
Metadata:
–What are my logical constraints?
–How do I verify myself?
–What will I produce?
Execution:
–What do I actually do?
Diagram labels: Input types, Input metadata, Input data; Output types, Result metadata, Result
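
The three levels above can be illustrated with a minimal Python sketch. This is not the Discovery Net SDK: the `Component` class, its field names, and the `revcomp` example node are all hypothetical, chosen only to show how connectivity (typed ports) and metadata (logical constraints) might be checked before the execution level touches any data.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class Component:
    """Hypothetical node exposing the three levels named on the slide."""
    name: str
    input_types: Dict[str, type]                   # connectivity: what are my inputs?
    output_types: Dict[str, type]                  # connectivity: what are my outputs?
    constraint: Callable[[Dict[str, Any]], bool]   # metadata: logical constraints
    body: Callable[..., Dict[str, Any]]            # execution: what do I actually do?

    def check(self, inputs: Dict[str, Any]) -> None:
        """Verify connectivity first, then the logical constraint."""
        for port, expected in self.input_types.items():
            if port not in inputs:
                raise ValueError(f"{self.name}: missing input '{port}'")
            if not isinstance(inputs[port], expected):
                raise TypeError(f"{self.name}: input '{port}' must be {expected.__name__}")
        if not self.constraint(inputs):
            raise ValueError(f"{self.name}: logical constraint violated")

    def run(self, **inputs: Any) -> Dict[str, Any]:
        """Check the inputs, execute, then check the declared output types."""
        self.check(inputs)
        result = self.body(**inputs)
        for port, expected in self.output_types.items():
            if not isinstance(result.get(port), expected):
                raise TypeError(f"{self.name}: output '{port}' must be {expected.__name__}")
        return result


# Illustrative node: reverse-complement a DNA sequence made of A, C, G, T only.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

revcomp = Component(
    name="reverse_complement",
    input_types={"sequence": str},
    output_types={"sequence": str},
    constraint=lambda inp: set(inp["sequence"]) <= set("ACGT"),
    body=lambda sequence: {"sequence": sequence.translate(_COMPLEMENT)[::-1]},
)
```

Calling `revcomp.run(sequence="ATGC")` executes the node, while a non-string input or a sequence containing other characters is rejected before the body runs.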

8 Construction of a component
Through the Software Development Kit, for new algorithms
Using template nodes for web services and command-line tools
With specialised IDEs to produce customised components
The idea is to keep the complexity of component construction away from the user as far as possible

9 Workflow Warehousing and Provenance
Workflows/Services record their history:
Discovery Net records the full authoring information
Users may annotate workflows
All information stored in DPML
Shared IP for a Virtual Organisation
Users can browse for services based on properties
Users can browse for existing workflows and workflow templates
Users can see full project history for each service
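
As a rough illustration of a workflow that records its own authoring history, here is a minimal Python sketch. The slide states that Discovery Net stores this information in DPML; the in-memory `Workflow` and `HistoryEntry` classes below are hypothetical stand-ins and say nothing about the real DPML schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class HistoryEntry:
    """One authoring event: who did what, and when (hypothetical structure)."""
    user: str
    action: str
    timestamp: datetime


@dataclass
class Workflow:
    """Hypothetical workflow that logs every authoring change it receives."""
    name: str
    nodes: List[str] = field(default_factory=list)
    history: List[HistoryEntry] = field(default_factory=list)

    def _record(self, user: str, action: str) -> None:
        self.history.append(HistoryEntry(user, action, datetime.now(timezone.utc)))

    def add_node(self, user: str, node: str) -> None:
        self.nodes.append(node)
        self._record(user, f"added node '{node}'")

    def annotate(self, user: str, note: str) -> None:
        self._record(user, f"annotation: {note}")

    def contributions(self, user: str) -> List[str]:
        """Answer 'who owns which part of the analysis' for one user."""
        return [entry.action for entry in self.history if entry.user == user]
```

Because every change goes through `_record`, the full project history can be browsed later without relying on authors to document their edits by hand.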

10 Publishing of workflows
Parameterisation of a workflow
Defining the black box that is offered to the end-user
Once deployed, workflow is accessible as:
–Web service
–Grid service
–Command line tool
–Web page
Workflows combined in personalised portals
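
Parameterisation can be pictured as fixing the analysis designer's settings and exposing only a chosen set of parameters to the end user. The sketch below assumes a workflow can be modelled as a plain Python function; `publish`, `count_motif` and `motif_service` are hypothetical names, and the actual deployment as a web service, Grid service, command-line tool or web page is not shown.

```python
from typing import Any, Callable, Dict, Iterable


def publish(workflow: Callable[..., Any],
            fixed: Dict[str, Any],
            exposed: Iterable[str]) -> Callable[..., Any]:
    """Wrap a workflow as a black box that accepts only the exposed parameters."""
    exposed_names = frozenset(exposed)
    overlap = exposed_names.intersection(fixed)
    if overlap:
        raise ValueError(f"parameters both fixed and exposed: {sorted(overlap)}")

    def black_box(**params: Any) -> Any:
        unknown = set(params) - exposed_names
        if unknown:
            raise TypeError(f"not an exposed parameter: {sorted(unknown)}")
        return workflow(**fixed, **params)

    return black_box


def count_motif(sequence: str, motif: str, case_sensitive: bool) -> int:
    """Toy workflow: count non-overlapping occurrences of a motif."""
    if not case_sensitive:
        sequence, motif = sequence.upper(), motif.upper()
    return sequence.count(motif)


# The designer fixes case handling; the end user sees only sequence and motif.
motif_service = publish(count_motif,
                        fixed={"case_sensitive": False},
                        exposed=["sequence", "motif"])
```

An end user can call `motif_service(sequence=..., motif=...)` but cannot override `case_sensitive`, which is what makes the published workflow a black box.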

11 Discovery Net users
Component developers
–IT-literate to an extent
Analysis designers
–Domain experts with understanding of the research problem
End users
–Scientists with no interest in IT and coding/assembling their software
Line does get blurry!

12 Discovery Net Application Examples
Environmental Modelling
–High throughput dispersed air sensing technology
Life Sciences
–High throughput genomics and proteomics
Real time geo-hazard modelling
–Earthquake modelling through satellite imagery
GM Crop trial studies
–Simulating the effects of GM crops on the surrounding ecosystem

13 Overview: Introduction to Discovery Net; SARS project; Demo; Conclusion

14 SARS Basic Facts
Appeared first in January 2003, Guangdong province, China
SARS Coronavirus (SARS-CoV) identified as the cause
China started a major research initiative to investigate the biology of the virus and predict its behaviour

15 SARS project
Collaboration between Discovery Net and SCBIT (Shanghai Center for Bioinformation Technology)
Annotation of SARS genomes obtained from different patient samples
Analysis of mutation patterns of SARS virus
Discovery Net providing the IT platform to organize the analysis

16 Work done
Data
–Research performed on 33 samples of SARS virus, sequenced from Chinese patients
–Combined with publicly available data from NCBI
Goal
–Deeper understanding of the mutation patterns of the SARS virus
Analysis
–Examining the variability of the virus on both genomic and proteomic level
–Providing full insight into the significance of changes in the nucleic structure of the virus

17 Genomic analysis
Alignment: data intensive, performed on the Grid
Retrieval of publicly available knowledge
Examining the variations in different strains
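
Once strains are aligned, examining their variation amounts to scanning the alignment column by column. The following Python sketch shows that step only; the function name `variable_sites` and the toy sequences are invented for illustration and are not SARS data or the project's actual algorithm. It assumes the alignment has already been computed, with `-` marking gaps.

```python
from collections import Counter
from typing import Dict, List, Tuple


def variable_sites(alignment: Dict[str, str]) -> List[Tuple[int, Dict[str, int]]]:
    """Return 1-based alignment columns where the strains disagree,
    together with the count of each symbol seen in that column."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    sites = []
    for position, column in enumerate(zip(*alignment.values()), start=1):
        counts = Counter(column)
        if len(counts) > 1:          # more than one symbol: a variable site
            sites.append((position, dict(counts)))
    return sites


# Toy alignment, invented for illustration only.
toy_alignment = {
    "strain_A": "ATGCCGTA",
    "strain_B": "ATGCTGTA",
    "strain_C": "ATG-CGTA",
}
```

For the toy alignment, columns 4 (a gap in one strain) and 5 (a C/T substitution) are reported as variable.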

18 Phylogenetic view
Figure labels: SARS Genome taken from Hong Kong Patients; SARS Genome taken from Beijing Patients; SARS Genome taken from Singapore Patients

19 Proteomic analysis
Isolating interesting genomic regions
Identifying relevant protein sequences
Observing the variations in the resulting protein
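
A small Python sketch of the last step, observing whether a nucleotide change alters the resulting protein. It uses the standard genetic code and assumes the region is already in reading frame and that the two sequences have equal length; `translate`, `protein_changes` and the example sequences are illustrative, not the project's own tooling.

```python
from itertools import product
from typing import List, Tuple

# Standard genetic code, with codons enumerated in T, C, A, G order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA region; unknown codons become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3      # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, usable, 3))


def protein_changes(reference: str, variant: str) -> List[Tuple[int, str, str]]:
    """List (1-based residue, reference amino acid, variant amino acid)
    for every residue that differs between the two translations."""
    ref_protein, var_protein = translate(reference), translate(variant)
    return [(i, r, v)
            for i, (r, v) in enumerate(zip(ref_protein, var_protein), start=1)
            if r != v]
```

For example, comparing `"ATGGCCTTA"` with `"ATGGCTTTC"` reports only the third residue (L to F): the GCC to GCT change is silent, while TTA to TTC alters the protein.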

20 Proteomic annotation
Parallel annotation with multiple sequence analysis tools
Framework first used at Supercomputing 2002
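
Running several analysis tools over the same sequence in parallel can be sketched with Python's standard `concurrent.futures` module. The three "tools" below are trivial stand-in functions, not the sequence analysis tools used in the project, and `annotate_in_parallel` is a hypothetical helper; in a Grid setting each call would instead be dispatched to a remote service.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def gc_content(seq: str) -> float:
    """Fraction of G and C bases, rounded to three decimals."""
    return round((seq.count("G") + seq.count("C")) / len(seq), 3)


def sequence_length(seq: str) -> int:
    return len(seq)


def start_codons(seq: str) -> List[int]:
    """0-based positions of every ATG in the sequence."""
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


def annotate_in_parallel(seq: str,
                         tools: Dict[str, Callable[[str], Any]]) -> Dict[str, Any]:
    """Submit every tool at once, then collect the results by tool name."""
    with ThreadPoolExecutor(max_workers=max(1, len(tools))) as pool:
        futures = {name: pool.submit(tool, seq) for name, tool in tools.items()}
        return {name: future.result() for name, future in futures.items()}


annotations = annotate_in_parallel(
    "ATGCATGC",
    {"gc": gc_content, "length": sequence_length, "starts": start_codons},
)
```

Threads suit this pattern when each tool mostly waits on a remote call; for CPU-bound local tools a process pool would be the more usual choice.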

21 Annotation editor

22 SCBIT Analysis Portal

23 Overview: Introduction to Discovery Net; SARS project; Demo; Conclusion

24 Next step
Portal technology used to build thematic portals concentrating on particular research areas
Goal: to construct a number of public portals for the needs of the UK eScience community and make them accessible to all

25 Overview: Introduction to Discovery Net; SARS project; Demo; Conclusion

26 Discovery Net Advantages…
Rapid component integration through SDK or generic connectors:
–Grid services
–Web services
–Command-line tools etc.
Intuitive research assembly and management
–Graphical workflow assembly
Provenance of analysis
–Within the server warehouse
Personalised end-user environments
–Discovery Portal

27 … applied to SCBIT research
Integrated
–Existing tools (EMBOSS, alignment apps)
–In-house data stores (with SARS sequence data)
–Original algorithms for mining variation info
Workflows assembled by the whole research group
Research history tracked through the project change information
SCBIT Portal creating a common platform for multidisciplinary users

28 Summary
IT platform supporting urgent discovery research
Access to data within a scalable knowledge creation infrastructure
Exploitation and annotation of biological information using multiple sources, data types and locations
Integration of external applications within a unified environment
Sharing of methods, results and data views across the Virtual Organisation

29 Credits and further info
Discovery Net team, especially Moustafa Ghanem, Jameel Syed and Stuart Hassard
http://www.discovery-on-the.net
Exhibiting at EPSRC and LESC stands
Demo today at 13:15 – 14:45 at EPSRC stand

