SARS Analysis on the Grid
Discovery Net in Bioinformatics
Copyright Discovery Net Imperial College

Overview
Introduction to Discovery Net
SARS project
Demo
Conclusion

Structure of Discovery Net
[Figure: layered architecture diagram. Labels, in source order: Workflow Execution; A compositional GRID; Workflow Management; Collaborative Knowledge Management; Workflow Deployment: Grid Service and Portal; Workflow Warehousing; Resource Mapping; Service Abstraction; Workflow Authoring; Composing services; Component Design/Integration]
[Resources shown: Condor-G, Native MPI, OGSA-service, Web Service, Unicore, Oracle 10g, Web Wrapper, Sun Grid Engine]

Workflow Representation
Workflow = Discovery Planning by Service/Component Composition
Internal representation in Discovery Process Mark-up Language (DPML)
Enables:
– Collaboration between researchers involved (who owns which part of analysis)
– Transparency of component location in the analysis
– End-user empowerment
[Figure: D-Net workflow for genome annotation, 16 services executing across the Internet]

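The idea of a workflow as a composition of services can be sketched in a few lines of Python. This is an illustration only, not DPML and not the Discovery Net API: the `Workflow` class, its `add` and `run` methods, and the toy services are all hypothetical names introduced here. Each service is a plain function, and the workflow runs them in dependency order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Workflow:
    """Hypothetical toy workflow: named services wired into a directed acyclic graph."""

    def __init__(self):
        self.services = {}  # service name -> callable
        self.inputs = {}    # service name -> list of upstream service names

    def add(self, name, func, after=()):
        """Register a service and the services whose results it consumes."""
        self.services[name] = func
        self.inputs[name] = list(after)
        return self

    def run(self):
        """Execute every service once all of its upstream services have finished."""
        results = {}
        # static_order() yields each node only after all of its predecessors.
        for name in TopologicalSorter(self.inputs).static_order():
            args = [results[dep] for dep in self.inputs[name]]
            results[name] = self.services[name](*args)
        return results


# Toy services standing in for remote tools (assumed for illustration).
wf = (
    Workflow()
    .add("fetch", lambda: "atggcc")
    .add("upper", lambda seq: seq.upper(), after=["fetch"])
    .add("length", lambda seq: len(seq), after=["upper"])
    .add("report", lambda seq, n: f"{seq} ({n} bp)", after=["upper", "length"])
)
results = wf.run()
print(results["report"])
```

Because the composition is declared as data (names and dependencies) rather than hard-coded calls, the same description could in principle be stored, shared and annotated, which is the role the slide assigns to DPML.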
Component model
Components
– Nodes
– Basic units of composition
– Contain compositional, integrity and execution logic
Component frameworks
– Groups of related nodes (sequence alignment)
– Common object model (inputs/outputs are typed)
Component architectures
– Grouping of related frameworks (bioinformatics)

Three levels of a component
Connectivity:
– What are my inputs?
– What are my outputs?
Metadata:
– What are my logical constraints?
– How do I verify myself?
– What will I produce?
Execution:
– What do I actually do?
[Figure labels: Input types, Input metadata, Input data; Output types, Result metadata, Result]

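The three levels can be made concrete with a small sketch. The `Component` class below is hypothetical and is not the Discovery Net SDK; it only shows how connectivity (typed inputs and outputs), metadata (constraints checked before any data is touched) and execution could be kept separate in one node.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Component:
    """Hypothetical node exposing the three levels named on the slide."""
    name: str
    input_types: Dict[str, type]       # connectivity: what are my inputs?
    output_type: type                  # connectivity: what are my outputs?
    check: Callable[[dict], bool]      # metadata: what are my logical constraints?
    describe: Callable[[dict], dict]   # metadata: what will I produce?
    execute: Callable[..., object]     # execution: what do I actually do?

    def verify(self, metadata: dict) -> dict:
        """Validate input metadata without touching data; return result metadata."""
        if not self.check(metadata):
            raise ValueError(f"{self.name}: input metadata violates constraints")
        return self.describe(metadata)

    def run(self, **inputs):
        """Type-check the inputs, execute, then type-check the result."""
        for key, expected in self.input_types.items():
            if not isinstance(inputs.get(key), expected):
                raise TypeError(f"{self.name}: input '{key}' must be {expected.__name__}")
        result = self.execute(**inputs)
        if not isinstance(result, self.output_type):
            raise TypeError(f"{self.name}: result must be {self.output_type.__name__}")
        return result


# Toy component (assumed for illustration): GC fraction of a DNA sequence.
gc_content = Component(
    name="gc_content",
    input_types={"sequence": str},
    output_type=float,
    check=lambda meta: meta.get("alphabet") == "dna",
    describe=lambda meta: {"kind": "fraction", "range": (0.0, 1.0)},
    execute=lambda sequence: sum(base in "GC" for base in sequence.upper()) / len(sequence),
)

result_metadata = gc_content.verify({"alphabet": "dna"})  # metadata level only
result = gc_content.run(sequence="ATGC")                  # execution level
print(result_metadata, result)
```

Separating `verify` from `run` means a workflow can be checked end to end on metadata alone before any expensive step is executed.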
Construction of a component
Through the Software Development Kit, for new algorithms
Using template nodes, for web services and command-line tools
With specialised IDEs, to produce customised components
The idea is to keep the complexity of component construction away from the user as far as possible

Workflow Warehousing and Provenance
Workflows/Services record their history:
– Discovery Net records the full authoring information
– Users may annotate workflows
– All information is stored in DPML
Shared IP for a Virtual Organisation:
– Users can browse for services based on properties
– Users can browse for existing workflows and workflow templates
– Users can see the full project history for each service

Publishing of workflows
Parameterisation of a workflow
Defining the black box that is offered to the end-user
Once deployed, a workflow is accessible as:
– Web service
– Grid service
– Command-line tool
– Web page
Workflows combined in personalised portals

Discovery Net users
Component developers
– IT-literate to an extent
Analysis designers
– Domain experts with understanding of the research problem
End users
– Scientists with no interest in IT and coding/assembling their software
Line does get blurry!

Discovery Net Application Examples
Environmental Modelling
– High throughput dispersed air sensing technology
Life Sciences
– High throughput genomics and proteomics
Real time geo-hazard modelling
– Earthquake modelling through satellite imagery
GM Crop trial studies
– Simulating the effects of GM crops on the surrounding ecosystem

SARS Basic Facts
First appeared in January 2003 in Guangdong province, China
SARS Coronavirus (SARS-CoV) identified as the cause
China started a major research initiative to investigate the biology of the virus and predict its behaviour

SARS project
Collaboration between Discovery Net and SCBIT (Shanghai Center for Bioinformation Technology)
Annotation of SARS genomes obtained from different patient samples
Analysis of mutation patterns of the SARS virus
Discovery Net providing the IT platform to organize the analysis

Work done
Data
– Research performed on 33 samples of the SARS virus, sequenced from Chinese patients
– Combined with publicly available data from NCBI
Goal
– Deeper understanding of the mutation patterns of the SARS virus
Analysis
– Examining the variability of the virus at both the genomic and the proteomic level
– Providing full insight into the significance of changes in the nucleic structure of the virus

Genomic analysis
Alignment: data intensive, performed on the Grid
Retrieval of publicly available knowledge
Examining the variations in different strains

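Once strains have been aligned, examining their variations comes down to comparing each strain with a reference, column by column. The sketch below assumes the alignment has already been done (gaps are written as `-`) and uses short made-up sequences, not SARS data; `find_variants` is a hypothetical helper, not part of Discovery Net.

```python
def find_variants(reference, strains):
    """Return {strain name: [(position, reference base, strain base), ...]}.

    Positions are 1-based alignment columns. All sequences must already be
    aligned to the same length; a gap is represented by '-'.
    """
    variants = {}
    for name, seq in strains.items():
        if len(seq) != len(reference):
            raise ValueError(f"{name}: sequences must be aligned to the same length")
        variants[name] = [
            (i + 1, ref, alt)
            for i, (ref, alt) in enumerate(zip(reference, seq))
            if ref != alt
        ]
    return variants


# Made-up aligned sequences, for illustration only.
reference = "ATGGCTAGCT"
strains = {
    "strain_A": "ATGGCTAGCT",  # identical to the reference
    "strain_B": "ATGACTAGCT",  # one substitution
    "strain_C": "ATGGCT-GCA",  # one deletion and one substitution
}
variants = find_variants(reference, strains)
for name, changes in variants.items():
    print(name, changes)
```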
Phylogenetic view
[Figure: phylogenetic tree. Legend: SARS genomes taken from Hong Kong patients; SARS genomes taken from Beijing patients; SARS genomes taken from Singapore patients]

Proteomic analysis
Isolating interesting genomic regions
Identifying relevant protein sequences
Observing the variations in the resulting protein

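Whether a nucleotide change matters at the protein level can be checked by translating the affected region and comparing codons. The following is a minimal sketch under stated assumptions: the two sequences are in-frame coding regions of equal length, the standard genetic code applies, and the example sequences are made up. `translate` and `classify_changes` are hypothetical helpers.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA sequence; trailing incomplete codons are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def classify_changes(ref_dna, var_dna):
    """List (codon number, ref amino acid, variant amino acid, kind) per changed codon."""
    if len(ref_dna) != len(var_dna):
        raise ValueError("sequences must have equal length")
    ref_dna, var_dna = ref_dna.upper(), var_dna.upper()
    changes = []
    for i in range(0, len(ref_dna) - len(ref_dna) % 3, 3):
        ref_codon, var_codon = ref_dna[i:i + 3], var_dna[i:i + 3]
        if ref_codon != var_codon:
            ref_aa, var_aa = CODON_TABLE[ref_codon], CODON_TABLE[var_codon]
            kind = "synonymous" if ref_aa == var_aa else "non-synonymous"
            changes.append((i // 3 + 1, ref_aa, var_aa, kind))
    return changes


# Made-up coding sequences, for illustration only.
ref_protein = translate("ATGGCTTCA")
var_protein = translate("ATGGCCTTA")
changes = classify_changes("ATGGCTTCA", "ATGGCCTTA")
print(ref_protein, var_protein, changes)
```

In this toy case the second codon changes without altering the amino acid, while the third codon changes the residue, which is the kind of distinction the proteomic step is meant to surface.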
Proteomic annotation
Parallel annotation with multiple sequence analysis tools
Framework first used in Supercomputing 2002

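Running several analysis tools over the same sequence in parallel and merging their results is a simple fan-out pattern. The sketch below uses Python's standard `concurrent.futures` module; the three "tools" are trivial stand-ins assumed for illustration, where a real deployment would call remote services or command-line programs.

```python
from concurrent.futures import ThreadPoolExecutor


# Stand-in annotation tools (hypothetical); each takes a sequence and returns a value.
def sequence_length(seq):
    return len(seq)


def gc_fraction(seq):
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def starts_with_atg(seq):
    return seq.upper().startswith("ATG")


TOOLS = {
    "length": sequence_length,
    "gc_fraction": gc_fraction,
    "starts_with_atg": starts_with_atg,
}


def annotate(sequence, tools=TOOLS):
    """Submit every tool at once and collect the results under the tool's name."""
    with ThreadPoolExecutor(max_workers=len(tools)) as pool:
        futures = {name: pool.submit(tool, sequence) for name, tool in tools.items()}
        # result() blocks until that tool has finished and re-raises its errors.
        return {name: future.result() for name, future in futures.items()}


annotation = annotate("ATGGCC")
print(annotation)
```

Threads suit this pattern when the tools spend their time waiting on external services; CPU-bound tools would be better served by separate processes or separate machines.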
Annotation editor

SCBIT Analysis Portal

Next step
Portal technology used to build thematic portals concentrating on particular research areas
Goal: to construct a number of public portals for the needs of the UK eScience community and make them accessible to all

Discovery Net Advantages…
Rapid component integration through SDK or generic connectors:
– Grid services
– Web services
– Command-line tools etc.
Intuitive research assembly and management
– Graphical workflow assembly
Provenance of analysis
– Within the server warehouse
Personalised end-user environments
– Discovery Portal

… applied to SCBIT research
Integrated
– Existing tools (EMBOSS, alignment apps)
– In-house data stores (with SARS sequence data)
– Original algorithms for mining variation info
Workflows assembled by the whole research group
Research history tracked through the project change information
SCBIT Portal creating a common platform for multidisciplinary users

Summary
IT platform supporting urgent discovery research
Access to data within a scalable knowledge creation infrastructure
Exploitation and annotation of biological information using multiple sources, data types and locations
Integration of external applications within a unified environment
Sharing of methods, results and data views across the Virtual Organisation

Credits and further info
Discovery Net team, especially Moustafa Ghanem, Jameel Syed and Stuart Hassard
Exhibiting at EPSRC and LESC stands
Demo today at 13:15 – 14:45 at EPSRC stand