Sys-Bio Talk, 24th Feb 2005: Towards Grid-Based System Biology. Dr Richard Sinnott, Technical Director, National e-Science Centre; Deputy Director (Technical), Bioinformatics Research Centre, University of Glasgow

1 Towards Grid-Based System Biology
Dr Richard Sinnott
Technical Director, National e-Science Centre
Deputy Director (Technical), Bioinformatics Research Centre
University of Glasgow
24th February 2005

2 Grids? E-Science? E-Research?
[Figure labels: sensor nets, shared data archives, computers, software, colleagues, instruments]
Grid methodologies are transforming science, engineering, medicine and business
- driven by exponential growth in data and compute demands
- enabling a whole-system approach

3 NeSC in the UK
[Map labels: Cambridge, Newcastle, Edinburgh, Oxford, Glasgow, Manchester, Cardiff, Southampton, London, Belfast, Daresbury Lab, RAL, Hinxton. Legend: NeSC Core, National Grid Service, White Rose Grid, HPCx, CSAR]
Previous work on UK e-Science Grid based on GT2
Demonstrated broad set of applications across it
- Monte Carlo simulations of ionic diffusion through radiation-damaged crystal structures
- Integrated Earth system modelling
- BLAST on the Grid
- Grid Integration Test Script Suite
- …
Transition to WSRF/OGSA under discussion
Two UK OGSA Test Grid projects started in January
- UCL, Imperial College, Universities of Edinburgh and Newcastle
- Universities of Portsmouth, Reading, Manchester, Westminster and CCLRC
There are still issues to be resolved
- OGSA definition and delivery
  - Standards: OGSI, WSRF, …
  - …and technologies: GT3, GT4, …
- Hosting environments and platforms
- Combinations of services supported
- Material and grids to support adopters
Challenges/Opportunities? The next Grid software

4 Life Sciences
- Extensive research community
  - >1000 per research university
- Extensive applications
  - Many people care about them: Health, Food, Environment
- Interacts with virtually every discipline
  - Physics, Chemistry, Maths/Stats, Nano-engineering, …
- 450+ databases relevant to bioinformatics (and growing!)
  - Heterogeneity, Interdependence, Complexity, Change, …

5 Systems Biology?
[Figure labels: Nucleotide sequences, Nucleotide structures, Gene expressions, Protein structures, Protein functions, Protein-protein interaction (pathways), Cell, Cell signalling, Tissues, Organs, Physiology, Organisms, Populations]
+ links to plant/crops, environmental, health, … information sources

6 More genomes …
Arabidopsis thaliana, mouse, rat, Caenorhabditis elegans, Drosophila melanogaster, Mycobacterium leprae, Man, Plasmodium falciparum, Mycobacterium tuberculosis, Neisseria meningitidis Z2491, Helicobacter pylori, Xylella fastidiosa, Borrelia burgdorferi, Rickettsia prowazekii, Bacillus subtilis, Archaeoglobus fulgidus, Campylobacter jejuni, Aquifex aeolicus, Thermotoga maritima, Chlamydia pneumoniae, Pseudomonas aeruginosa, Ureaplasma urealyticum, Buchnera sp. APS, Escherichia coli, Saccharomyces cerevisiae, Yersinia pestis, Salmonella enterica, Thermoplasma acidophilum

7 Distributed and Heterogeneous data
LPSYVDWRSA GAVVDIKSQG ECGGCWAFSA IATVEGINKI TSGSLISLSE QELIDCGRTQ NTRGCDGGYI TDGFQFIIND GGINTEENYP YTAQDGDCDV
[Figure labels: Sequence, Structure, Function, Gene expression, Morphology]

8 Database Growth
[Figure: PDB Content Growth]
DBs growing exponentially!!!
- Bibliographic (MedLine, …)
- Amino Acid Seq (SWISS-PROT, …)
- 3D Molecular Structure (PDB, …)
- Nucleotide Seq (GenBank, EMBL, …)
- Biochemical Pathways (KEGG, WIT, …)
- Molecular Classifications (SCOP, CATH, …)
- Motif Libraries (PROSITE, Blocks, …)

9 Is Grid the Answer?
Some key problems to be addressed
- Tools that simplify access to and usage of data
  - Internet hopping is not ideal!
- Tools that simplify access to and usage of large-scale HPC facilities
  - qsub [-a date_time] [-A account_string] [-c interval] [-C directive_prefix] [-e path] [-h] [-I] [-j join] [-k keep] [-l resource_list] [-m mail_options] [-M user_list] [-N name] [-o path] [-p priority] [-q destination] [-r c] [-S path_list] [-u user_list] [-v variable_list] [-V] [-W additional_attributes] [-z] [script]
- Tools designed to aid understanding of complex data sets and the relationships between them
  - e.g. through visualisation
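
As a rough illustration of what a tool that hides the qsub option list might look like, the Python sketch below assembles a qsub argument list from a few named parameters. The helper name build_qsub_command and its parameter names are hypothetical and not part of the talk. Only the flags -N, -q, -l, -o and -e are taken from the usage string above, and joining resource requests with commas is an assumption about how the local batch system expects -l to be written. The function only builds the command; it does not submit anything.

```python
from typing import List, Optional


def build_qsub_command(
    script: str,
    name: Optional[str] = None,
    queue: Optional[str] = None,
    resources: Optional[List[str]] = None,
    stdout_path: Optional[str] = None,
    stderr_path: Optional[str] = None,
) -> List[str]:
    """Return an argument list for qsub (hypothetical helper; nothing is submitted)."""
    cmd = ["qsub"]
    if name is not None:
        cmd += ["-N", name]            # job name
    if queue is not None:
        cmd += ["-q", queue]           # destination queue
    if resources:
        # Assumption: resource requests are passed as one comma-separated list.
        cmd += ["-l", ",".join(resources)]
    if stdout_path is not None:
        cmd += ["-o", stdout_path]     # where standard output is written
    if stderr_path is not None:
        cmd += ["-e", stderr_path]     # where standard error is written
    cmd.append(script)                 # the job script always comes last
    return cmd


if __name__ == "__main__":
    # The job script and queue names here are made up for illustration.
    print(" ".join(build_qsub_command("blast_job.sh", name="blast", queue="short")))
```

A portal or client could pass the returned list to a process launcher on the submission host, so that the user only ever chooses a job name and a queue.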

10 Access to and Usage of Data
- Grid technology should make it possible to hide heterogeneity, provide location transparency, address security concerns, …
- Data Access and Integration Specification (DAIS) being defined by GGF
  - OGSA-DAI and DAIT projects play a key role in shaping these standards
- Other commercial solutions
  - IBM Information Integrator, …

11 Access to and Usage of HPC facilities
Consider whole genome-genome (2*3*10^9 bp) comparisons between two species
- Current strategy essentially chops up one genome, fires searches for those fragments in the other, then re-assembles the results
  - messy approximate matching: re-assembly is difficult
  - important correlations can be lost
    - to make this tractable, so-called junk DNA is ignored
    - chopping may introduce artefacts or hide phenomena
- Better to put both full genomes in memory and perform a useful complete comparison
  - Only possible with very high-end machines (available via grids)
Should not have to be a script writer/Linux sys-admin to use these facilities
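
The chop, search and re-assemble strategy can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the method used by any real tool: the function names chop, find_all, fragment_search and merge_hits are hypothetical, the sequences are made up, and exact string matching stands in for the approximate matching that a tool such as BLAST performs.

```python
from typing import List, Tuple

Hit = Tuple[int, int, int]  # (offset in query genome, offset in target genome, length)


def chop(genome: str, size: int) -> List[Tuple[int, str]]:
    """Split a genome into consecutive (offset, fragment) pieces of at most `size` bases."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [(i, genome[i:i + size]) for i in range(0, len(genome), size)]


def find_all(text: str, pattern: str) -> List[int]:
    """Return every start position of `pattern` in `text`, overlapping hits included."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        start = text.find(pattern, start + 1)
    return positions


def fragment_search(query: str, target: str, size: int) -> List[Hit]:
    """Search each fragment of `query` in `target` (exact matching, an assumption)."""
    hits = []
    for offset, fragment in chop(query, size):
        for pos in find_all(target, fragment):
            hits.append((offset, pos, len(fragment)))
    return hits


def merge_hits(hits: List[Hit]) -> List[Hit]:
    """Re-assemble hits that are adjacent in both genomes into longer matches."""
    merged: List[Hit] = []
    for q, t, n in sorted(hits):
        for i, (pq, pt, pn) in enumerate(merged):
            if q == pq + pn and t == pt + pn:   # continues an earlier match
                merged[i] = (pq, pt, pn + n)
                break
        else:
            merged.append((q, t, n))
    return merged


if __name__ == "__main__":
    target = "TTACGTACGGTTCAGG"
    # Identical region: the three fragments are re-assembled into one match.
    print(merge_hits(fragment_search("ACGTACGGTTCA", target, 4)))  # [(0, 2, 12)]
    # One changed base: the middle fragment is lost and the match falls apart.
    print(merge_hits(fragment_search("ACGTACAGTTCA", target, 4)))  # [(0, 2, 4), (8, 10, 4)]
```

The second call shows the kind of artefact the slide refers to: a single difference inside a fragment removes that whole fragment from the results, so the re-assembled picture has a gap that says nothing about how similar the two regions really are.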

12 Cognitive aspects of Data
Life science data can be “ugly”
- Raw data sets messy
- Requires significant effort to understand
- Schemas/data models evolving
- …
Tools needed to
- Simplify understanding
- Improve analysis
- Navigate through potentially huge data sets
  - e.g. to find genes of interest in chromosomes of different species

13 [Figure labels: Nucleotide sequences, Nucleotide structures, Gene expressions, Protein structures, Protein functions, Protein-protein interaction (pathways), Cell, Cell signalling, Tissues, Organs, Physiology, Organisms, Populations. Projects shown: BRIDGES, SBRN, VOTES, DyVOSE, GHI, JDSS]

14 Overview of BRIDGES
Biomedical Research Informatics Delivered by Grid Enabled Services (BRIDGES)
- NeSC (Edinburgh and Glasgow) and IBM
- Started October 2003
Supporting project for CFG project
- Generating data on hypertension
- Rat, Mouse, Human genome databases
Variety of tools used
- BLAST, BLAT, Gene Prediction, visualisation, …
Variety of data sources and formats
- Microarray data, genome DBs, project partner research data, …
Aim is integrated infrastructure supporting
- Data federation
- Security

15 Bridges Project
[Architecture diagram labels: Synteny Service, Information Integrator, OGSA-DAI, MagnaVista Service, VO Authorisation, BLAST]

16 JDSS Project
Public data resources openness
- Often cannot query directly
- Often not easy/possible to find schemas
Joint Data Standards Study investigating this
- Started on 1st June and involves
  - Digital Archiving Consultancy
  - Bioinformatics Research Centre (Glasgow)
  - NeSC (Edinburgh and Glasgow)
- Look at technical, political, social, ethical, etc. issues involved in accessing and using public life science resources
  - Interview relevant scientists, data curators/providers
- 8-month project with final report due imminently
  - Funded by MRC, BBSRC, Wellcome Trust, JISC, NERC, DTI

17 DyVOSE Project
Dynamic Virtual Organisations for e-Science Education (DyVOSE) project
- Two-year project started 1st May 2004, funded by JISC
- Exploring advanced authorisation infrastructures for security
  - … in Grid Computing Module as part of advanced MSc at Glasgow
    - Provide insight into rolling Grid out to the masses!

18 DyVOSE Phase 2/3
[Diagram labels: ScotGrid; PERMIS-based authorisation checks/decisions; Glasgow Education VO policies; Condor pool; Edinburgh Education VO policies; Shibboleth; Blue Dwarf; Glasgow; Edinburgh; dynamically established VO resources/users; delegated VO policies]

19 Scottish Bioinformatics Research Network
Four-year proposal expected to start imminently
- Funded (£2.4M) by Scottish Enterprise, Scottish Higher Education Funding Council, Scottish Executive Environment and Rural Affairs Department
- Involves Glasgow, Dundee, Edinburgh, Scottish Bioinformatics Forum
Aim to provide bioinformatics infrastructure for Scottish health, agriculture and industry
- Infrastructure support at Dundee, Edinburgh and Glasgow to support first-rate research in bioinformatics at each academic institute
- Infrastructure support at three institutes, to support inter-institutional sharing of compute and data resources through application of Grid computing
- Outreach and training activities mediated by the Scottish Bioinformatics Forum

20 VOTES
Virtual Organisations for Trials and Epidemiological Studies
- 3-year MRC-funded (£2.8M) project expected to start imminently
- Plans to develop Grid infrastructure to address key components of clinical trial/observational study
  - Recruitment of potentially eligible participants
  - Data collection during the study
  - Study administration and coordination
- Involves Glasgow, Oxford, Leicester, Nottingham, Manchester

21 Genetics and Healthcare Initiative
Five (2+3) year proposal (£4.4M) expected to start imminently
- Funded by Health Department and Department for Enterprise and Lifelong Learning
- Involves Glasgow, Dundee, Edinburgh, Aberdeen
  - focus on genetics as applied to healthcare
  - first two years: emphasis on providing a platform for research into the genetic basis of common complex diseases in Scotland
    - Mental health, cardiovascular, …
    - Plan to establish a 15,000-strong, family-based, intensively phenotyped cohort recruited from the East and West of Scotland
  - basis for neutralising heritable (genetic) risk factors in disease surveillance, treatment optimisation, avoidance of adverse drug events and prediction of response to therapy, health care planning and drug discovery, …

22 Systems Biology?
Once we have (securely) connected all relevant data sets, simplified access to and usage of HPC resources, and wrapped your favourite bioinformatics applications as Grid services... what questions would you like to ask?
- How does a cell work?
- Why do people who eat less tend to live longer?
- How many people across Scotland who had a heart attack in the last 5 years took drug X, and of those that did, were genes A or B influenced by this drug?
- Who has performed an experiment similar to mine, and were their results similar?
- …

23 www.nesc.ac.uk

24 www.nesc.ac.uk

25 Bridges Portal

26 www.nesc.ac.uk MagnaVista

27 MagnaVista

28 QTL upload

29 QTL upload

30 QTL browsing

31 Grid Blast Client
- Allows ‘genome scale’ blasting
- Uses ScotGrid and idle compute resources of training lab Condor pool
