Dr Richard Sinnott Technical Director National e-Science Centre ||| Deputy Director Technical Bioinformatics Research Centre University of Glasgow

1 BRIDGES Status Report
Dr Richard Sinnott
Technical Director, National e-Science Centre
Deputy Director (Technical), Bioinformatics Research Centre
University of Glasgow
ros@dcs.gla.ac.uk
18th March 2004

2 Overview
- Review goals of the BRIDGES project
- Briefly summarise technical approach
- Outline achievements thus far
- Demonstration
- Plans for the future

3 BRIDGES Goals
- High blood pressure affects 25% of adults in western societies
- The Cardiovascular Functional Genomics (CFG) project is investigating this through physiological models of hypertension in the rat
- BRIDGES is a supporting project to CFG and will provide Grid infrastructure to facilitate scientific research
- CFG project partners are distributed but need to access and integrate various software and, especially, data resources
- The main aims of BRIDGES are to develop re-useable infrastructure that provides data federation and takes appropriate account of security concerns

4 CFG Partner Distribution
[Diagram: partner sites at Glasgow, Edinburgh, Leicester, Oxford, London and the Netherlands, each holding private data, together with shared data and public curated data.]

5 Problems to be addressed
BRIDGES will address the following problems facing CFG biologists:
- How to integrate data with multiple levels of security, including public data, project-only data and private data?
- How to search multiple distributed databases through single optimised queries?
- How to use multiple tools in a coordinated (and automated) manner, e.g. how to develop re-useable workflows for the CFG scientists?
- How to integrate a range of bioinformatics analysis and visualisation tools, e.g. BLAST, genome browsers, etc.?
- How to deal with inconsistencies of online databases and possible “dirty data”?
- How to get more “up to date” data?
- How to make it all user friendly: portals, hidden infrastructure (e.g. security authorisation)

6 Planned Approach
BRIDGES will address these problems through:
- Development of re-useable Grid services based upon GT3 technologies
- Virtualisation of multiple distributed data sets to provide a single virtual data set for use by the biologists, exploiting IBM’s DiscoveryLink
- Developing a collection of data on a well-managed platform, including copies of extracts of relevant public data, all project data, and the required software tools (administered using DB2 and DiscoveryLink)
- Access to and integration of multiple distributed data sets in a Grid environment using results from the OGSA-DAI/DAIT projects
- A secure environment offering authentication and authorisation, building on results of the PERMIS security authorisation project
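The "single virtual data set" idea on the Planned Approach slide can be illustrated with a small sketch. This is not DiscoveryLink or DB2: SQLite's ATTACH DATABASE is used here only as a stand-in federation layer, and every schema, table, column and row below (public_copy.gene, project.qtl, GeneA, QTL1, and so on) is an invented placeholder rather than real BRIDGES or CFG data.

```python
import sqlite3

# One connection plays the role of the federation layer. SQLite stands in for
# DB2/DiscoveryLink here; all names and rows are hypothetical placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS public_copy")  # extract of public data
conn.execute("ATTACH DATABASE ':memory:' AS project")      # shared project data

conn.execute(
    "CREATE TABLE public_copy.gene (symbol TEXT, chromosome TEXT, start_bp INTEGER)"
)
conn.execute(
    "CREATE TABLE project.qtl (name TEXT, chromosome TEXT, from_bp INTEGER, to_bp INTEGER)"
)
conn.executemany(
    "INSERT INTO public_copy.gene VALUES (?, ?, ?)",
    [("GeneA", "1", 1500), ("GeneB", "1", 9000), ("GeneC", "2", 1200)],
)
conn.executemany(
    "INSERT INTO project.qtl VALUES (?, ?, ?, ?)",
    [("QTL1", "1", 1000, 2000), ("QTL2", "2", 1000, 3000)],
)


def genes_in_qtl(connection, qtl_name):
    """Return gene symbols lying inside the named QTL, using one query
    that spans both attached databases."""
    rows = connection.execute(
        """
        SELECT g.symbol
        FROM project.qtl AS q
        JOIN public_copy.gene AS g
          ON g.chromosome = q.chromosome
         AND g.start_bp BETWEEN q.from_bp AND q.to_bp
        WHERE q.name = ?
        ORDER BY g.symbol
        """,
        (qtl_name,),
    ).fetchall()
    return [row[0] for row in rows]
```

The caller writes one query and does not need to know which store holds which table, which is the property the slide attributes to the virtualised data set.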

7 BRIDGES team
- Project Management: Richard Sinnott, Dave Berry
- Database Design/Development: Derek Houghton
- Grid Services Developers: Micha Bayer, Magnus Ferrier
- Technical Input: David White, Jean-Christophe Mestres, Andy Knox, Emmanuel Guyonnet (IBM), Ela Hunt (Glasgow), Neil Hanlon (Glasgow)
- Profs David Gilbert, Malcolm Atkinson, Anna Dominiczak

8 Achievements
- Web site and project portal established: http://europa.nesc.gla.ac.uk/wps/portal
- Engaged with the CFG consortium
- Staff trained in relevant technologies: GT3, DiscoveryLink, Condor
- Initial version of local repository developed
  - Populated with data that cannot be federated, e.g. public data sets with no programmatic interface: Ensembl/EMBL-EBI, NCBI (GENBANK, REFSEQ, Gene Expression Omnibus), UCSC, SwissProt/TrEMBL, UniSTS/dbSTS, UNIGENE, LOCUSLINK, GENMAPP, OMIM, Sanger, dbSNP, dbEST, InterPro, Pfam, Prints, Cath, SCOP, ProSite, Weizmann Institute, PDB, RIKEN, Rat Genome DB, Mouse Atlas, Affymetrix, …
  - Includes shared data sets of CFG scientists: QTL DB, …

9 Achievements …ctd
- GT3 based Grid services offered that make use of these local data sets
- Grid enabled BLAST services produced
  - Offer access to large e-Science infrastructures at Glasgow (ScotGrid)
- SyntenyVista tool extended to allow Grid enabled visual navigation of genomic data sets
  - Planned front end for many other tools
- Externally
  - Poster at AHM 2003
  - Tutorial submitted to ISMB/ECCB (the major bioinformatics conference)
  - Liaising with other projects: eDIKT, myGrid, GeneGrid, PERMIS, ...

10 Achievements …ctd Demonstration of some of the achievements

11 Plans
- Refine and extend requirements
  - Further refinement of use cases & scenarios
  - More data sets (public, shared, private, …)
- Implementation and realisation of further use cases, e.g. extended query services for microarray data interpretation, workflows for probe set mapping, …
- Security realisation and roll-out
  - We can only help share CFG data sets if we can get SECURE access to them; following up with CFG sites
  - Authorisation with PERMIS coming
  - GSI based authentication
- Investigate application of replication manager (RLS)
  - Should support the illusion of data from each site being available to all other sites
- Further Grid based data visualisation services accessible via SyntenyVista
- Ensure that we keep track of relevant developments (WSRF, GT4, …)

12 Future Vision of Tools via Portal
[Diagram: drill-down functions to sequence, to multiple alignment and to tabular summaries.]

13 Questions?

14 Other Scenarios to be Realised
Manual microarray data interpretation:
- For each probe, visit Affymetrix web pages
- For each probe, follow links to OMIM, HUGO, Ensembl, PubMed, GeneCards, …
- For each probe, look up map positions
- For each probe, select papers from PubMed and print out the papers
- Examine the data sets at RIKEN, ArrayExpress, or other expression databases
- Examine any other data
- Correlate other results with our data
Use of BRIDGES technologies will allow us to:
- link multiple remote/local data sets and run queries over those data sets
- automate processes for dealing with responses, e.g. workflows…

15 Other Scenarios …ctd
Design PCR probes for a large number of microarray hits:
- Query Affymetrix database
  - For each probe name, find the target sequence (the sequence from which the probes were designed)
  - For each probe, BLAST at www.ensembl.org for rat, human, mouse genes, and rat, human, mouse genomic sequence
- Record map locations
- Record real gene sequences
- Takes 5-10 minutes per probe: human intensive
- Relies on shared resources: Ensembl, Affymetrix
- Local resources (ScotGrid, others) available for BLASTing
- Local data repository for most up to date/relevant CFG data sets
- Automated processes to realise these (and other) scenarios
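The manual per-probe steps on this slide (probe name, then target sequence, then BLAST against several databases, then record the results) could be automated roughly as sketched below. The function and class names are hypothetical, and the two lookups are passed in as plain callables because the slides do not describe the real Affymetrix or BLAST service interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class ProbeResult:
    """What the manual procedure records for one probe."""
    probe: str
    target_sequence: str
    hits: Dict[str, List[str]] = field(default_factory=dict)


def run_probe_workflow(
    probes: Iterable[str],
    fetch_target: Callable[[str], str],      # hypothetical: probe name -> target sequence
    blast: Callable[[str, str], List[str]],  # hypothetical: (sequence, database) -> hits
    databases: Iterable[str],
) -> List[ProbeResult]:
    """Apply the same lookup and BLAST steps to every probe, in input order."""
    database_list = list(databases)
    results = []
    for probe in probes:
        sequence = fetch_target(probe)
        result = ProbeResult(probe=probe, target_sequence=sequence)
        for database in database_list:
            result.hits[database] = blast(sequence, database)
        results.append(result)
    return results
```

Injecting the lookups keeps the loop independent of where the work runs, so the same workflow could call a remote service or a local BLAST installation.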

16 Initial Deadlines and Deliverables
Started October 2003 (when full team on board!)
WP1 - ends M3
- Hold 2-day workshop with all participants (CFG leaders, IBM specialists, bioinformaticians, team members)
- Agree on team training
- Schedule installation of software infrastructure
- Develop an architecture
- Choose initial set of use cases and identify test data sets/analysis tools used to establish that the system is functional
Outcome of WP1 is an initiated UP with initial architecture, set of use cases and initial system design
- D1.1 List of documented use cases
- D1.2 Architecture Definition
- D1.3 Plan and prototypes for UP cycle

17 Initial Deadlines and Deliverables …ctd
WP2 - starts M3, ends M6
- Develop collection of data on well-managed platform, including copies of extracts of relevant public data, all project data, and required software tools
  - Administer this using DB2 and DiscoveryLink
- Use Grid technology to farm out workloads
  - Should make use of large e-Science computational infrastructures (ScotGrid, …)
  - Data manager will work with the CFG researchers to migrate the relevant subset of their data into the required form
Outcome of WP2 is an enlarged set of refined use cases and an operational system, with a subset of research data organised so that it can be used by bioinformaticians
- D2.1 Updated list of documented use cases
- D2.2 Working system at Glasgow and Edinburgh
- D2.3 Report on cycle 1: experience, lessons & issues
- D2.4 Plan and base system for UP cycle 2
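"Farming out workloads" usually means splitting a large set of inputs into batches and running one job per batch. The sketch below shows only that splitting and collecting pattern. A local thread pool stands in for a Grid scheduler, and the names split_into_batches and farm_out are illustrative assumptions, not part of any BRIDGES, ScotGrid or Globus interface.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_batches(items, batch_size):
    """Cut a list into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def farm_out(items, batch_size, job, max_workers=4):
    """Run job(batch) once per batch and return the per-item results,
    flattened back into the original input order.

    job is a hypothetical callable that takes a list of inputs and returns
    a list of results of the same length. The thread pool is a local
    stand-in for submitting each batch to a remote compute resource.
    """
    batches = split_into_batches(list(items), batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        batch_results = list(pool.map(job, batches))  # map preserves batch order
    return [result for batch in batch_results for result in batch]
```

For example, farm_out(sequences, 50, blast_batch) would hand 50 sequences at a time to a caller-supplied blast_batch function.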

18 System Usage Scenario: Usage of Extended SyntenyVista and BLAST service
[Diagram: a client at Site X reaches the BRIDGES Portal, which gives secure access for the CFG VO to personalised services (BLAST, Smith W, SV) and, through DL and OGSA-DAI, to the Data Repository and shared/private data sets (QTL DB). Annotations: authorisation per user, per site; data remote?; browser based clients; Java App downloaded (via WebStart); relevant data sets copied onto ScotGrid and correctly formatted; CONDOR POOL???; export interesting data.]

19 Security
- Authentication via X.509 certificate based PKI
  - Embedded in browsers
- Authorisation via PERMIS
  - PERMIS working with the Globus team to define a Security Assertion Markup Language (SAML) interface to GT3
  - PERMIS SAML interface already implemented; now waiting for GT3 to support this interface
  - Likely early April (Von Welch)
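The split between authentication (who the user is, established by an X.509 certificate) and authorisation (what that user may do, decided per user and per site) can be sketched as a small role-based check. This is not the PERMIS API or its policy format: the Policy class, the role names and the example distinguished name are all hypothetical, and the subject name is assumed to have been verified already by the authentication step.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple


@dataclass(frozen=True)
class Policy:
    """Hypothetical role-based policy: roles are assigned per (subject, site),
    and each role grants access to some data classes."""
    role_assignments: Dict[Tuple[str, str], Set[str]]  # (subject DN, site) -> roles
    role_permissions: Dict[str, Set[str]]              # role -> permitted data classes

    def is_authorised(self, subject_dn: str, site: str, data_class: str) -> bool:
        """Deny by default; allow only if one of the subject's roles at this
        site grants the requested data class."""
        roles = self.role_assignments.get((subject_dn, site), set())
        return any(
            data_class in self.role_permissions.get(role, set())
            for role in roles
        )


# Example policy with invented names: a project member at one site may read
# public and shared data, but not another group's private data.
example_policy = Policy(
    role_assignments={("CN=Example User", "glasgow"): {"cfg_member"}},
    role_permissions={
        "cfg_member": {"public", "shared"},
        "data_owner": {"public", "shared", "private"},
    },
)
```

Keeping the decision in one function that defaults to "deny" mirrors the usual design of an authorisation decision point placed in front of each data service.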

20 Where we are today!
- Web site and project portal established: http://europa.nesc.gla.ac.uk/wps/portal
- DiscoveryLink (Information Integrator) DB repository established and being populated
  - … with public data sets (data warehousing)
  - … links to Ensembl (federated data)
  - … with local CFG VO shared data sets (QTL DB)
- Grid services developed (BLAST, …)
  - Through the pain barrier of GT3!
  - General usage of ScotGrid
  - OpenPBS job submission from client with data staging
- Extended SyntenyVista to work with remote data sets
- Gaining experience with security technologies
  - Setting up policies with PERMIS etc.

