EBI is an Outstation of the European Molecular Biology Laboratory. Rodrigo Lopez Head of EMBL-EBI/ES Andrew Lyall ELIXIR PM. ELIXIR and the integration.

1 EBI is an Outstation of the European Molecular Biology Laboratory. Rodrigo Lopez, Head of EMBL-EBI/ES; Andrew Lyall, ELIXIR PM. ELIXIR and the integration of biomolecular data in life sciences information systems.

2 Summary: Definitions, Challenges, Technologies, Solutions, Community, Conclusions. 03.07.2016

3 Definitions
Biomolecular: computer representation of living molecules and bio-active compounds (e.g. gene, transcript, gene expression, protein structure, function, drugs). Representations have a structure (i.e. computer-readable formats).
"Web Service" vs. web service: see http://www.ebi.ac.uk/Tools/webservices. Architectures: SOAP (Simple Object Access Protocol), REST (Representational State Transfer) and DAS (Distributed Annotation System).
Cloud: cloud computing is the delivery of computing as a service rather than a product (e.g. leasing storage space instead of buying physical hard disks).
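As a rough illustration of the difference between the REST and SOAP architectures named above, the sketch below builds (but does not send) one request in each style. The service root `https://example.org/services`, the resource name `entry`, the operation `fetchEntry` and the identifier `P12345` are all hypothetical placeholders, not the actual EBI interfaces; only the SOAP 1.1 envelope namespace is standard.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical service root, used only for illustration.
BASE_URL = "https://example.org/services"
# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def rest_request_url(resource: str, identifier: str, **params: str) -> str:
    """REST style: the URL itself names the resource; it is fetched with HTTP GET."""
    query = f"?{urlencode(params)}" if params else ""
    return f"{BASE_URL}/rest/{resource}/{identifier}{query}"


def soap_request_body(operation: str, **arguments: str) -> str:
    """SOAP style: the operation and its arguments travel in an XML envelope (HTTP POST)."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, operation)
    for name, value in arguments.items():
        ET.SubElement(call, name).text = value
    return ET.tostring(envelope, encoding="unicode")


# The same hypothetical lookup expressed in both styles.
url = rest_request_url("entry", "P12345", format="fasta")
xml_body = soap_request_body("fetchEntry", id="P12345", format="fasta")
print(url)
print(xml_body)
```

In the REST form the whole request fits in a URL that can be bookmarked or cached; in the SOAP form the same information is wrapped in a typed XML message, which is more verbose but easier to describe formally.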

4 ELIXIR – What is it?
An EU Framework 7 Preparatory Phase Project.
Coordinated by Prof Janet Thornton, Director, EMBL-EBI.
To construct a plan for the operation of a sustainable infrastructure for biological information in Europe.
€4.5 million grant awarded May 2007, three-year term.
32-member consortium engaging many of Europe's main bioinformatics funding agencies and research institutes.
Deliverables are memoranda of understanding to fund the implementation phase, which could cost €500 million.
Interested parties should register as stakeholders via the ELIXIR website: www.elixir-europe.org

5 Challenges: Biomolecular diversity (databases)
Genomes: Ensembl, Ensembl Genomes, EGA
Nucleotide sequence: ENA
Functional genomics: ArrayExpress, Expression Atlas
Protein sequences: UniProt
Protein families + motifs: InterPro
Macromolecular structure: PDBe
Protein expression: PRIDE
Chemical entities: ChEBI
Interactions + pathways: IntAct, Reactome
Literature and ontologies: CiteXplore, UKPMC, (GO)
Chemogenomics: ChEMBL
Systems: BioModels

6 Challenges: Growth of core biomolecular data
a. Nucleotide sequences in the European Nucleotide Archive
b. Genomes in Ensembl & Ensembl Genomes
c. Gene expression: hybridisations in the ArrayExpress Archive
d. Protein sequences in UniParc
e. Macromolecular structures in PDBe
f. Protein families, motifs and domains from entries in InterPro

7 Challenges: Disk storage at EMBL-EBI: 7 petabytes, Dec 2011.

8 Challenges: Storing data
1000 Genomes will produce 1 TB of data but will require 100 TB of raw storage to get there (before NGS). …9 PB at present.
Your NGS (Illumina, 454, etc.) analysis strategy directly affects your data storage needs. Are you doing whole-genome or exome sequencing?
An updated diagram for "Moore's Law": http://www.nature.com/news/2011/110727/full/475435a/box/1.html

9 Technologies: Genomics (omicmaps.com)

10 Technologies: Data management (ENA)
Sources: http://www.ebi.ac.uk/ena & http://www.ncbi.nlm.nih.gov
RNA-Seq, ChIP-Seq and epigenomic data are submitted to GEO and ArrayExpress.
Genomic and transcriptomic assemblies are submitted to INSDC (EMBL-Bank, GenBank and DDBJ).
16S ribosomal RNA data associated with metagenomics are submitted to INSDC.

11 Technologies: Proteomics
The PRIDE database currently contains:
21,731 experiments
8,897,573 identified proteins
51,246,134 identified peptides
4,896,394 unique peptides
292,341,092 spectra
Proteome Commons annotations: 22 Tb
gpmDB statistics for Tue May 22 11:51:50 2012 UTC (#3030):
models = 197,292
proteins = 63,651,028
distinct proteins = 1,476,612
protein redundancy = 43.1×
peptides = 539,375,566
distinct peptides = 4,003,692
peptide redundancy = 134.7×
residues = 7,551,257,924
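The redundancy figures in the gpmDB statistics can be checked with a short calculation. The sketch below assumes that redundancy means total identifications divided by distinct identifications. That definition is an assumption, not stated on the slide, but it does reproduce the quoted values.

```python
# Counts copied from the gpmDB statistics quoted above.
proteins, distinct_proteins = 63_651_028, 1_476_612
peptides, distinct_peptides = 539_375_566, 4_003_692


def redundancy(total: int, distinct: int) -> float:
    """Assumed definition: average number of identifications per distinct entry."""
    return total / distinct


protein_redundancy = round(redundancy(proteins, distinct_proteins), 1)
peptide_redundancy = round(redundancy(peptides, distinct_peptides), 1)

print(f"protein redundancy = {protein_redundancy}x")  # 43.1x
print(f"peptide redundancy = {peptide_redundancy}x")  # 134.7x
```

In other words, each distinct peptide has been observed on average more than a hundred times, which is one reason the raw spectra volume grows much faster than the set of distinct identifications.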

12 Europe 2020: The Grand Societal Challenges
Europe has an ageing population and an unsustainable food supply, and is facing increasing threats from environmental destruction, bioterrorism and emerging pandemics. Business and commerce are challenged by competitive pressures from globalization. The future well-being and prosperity of our citizens will depend absolutely on innovation to tackle these Grand Challenges as well as to create new products and services in life sciences and ICT. Innovation has been placed at the heart of the Europe 2020 Strategy for Growth and Jobs, and indeed the Innovation Union explicitly highlights the biological and medical challenges as providing the opportunity for economic recovery and growth.

13 Challenges: Access to data
Download data to local facilities from central repositories:
ftp.ebi.ac.uk (306 TB compressed data/year to >1500 institutes/core facilities worldwide)
fasp.sra.ebi.ac.uk; fasp.era.ebi.ac.uk (110 TB compressed data/year to 20 core genomic centres)
Management challenge: data is generated more quickly than it can be analysed. Many users never finish downloading a data set before a new one is ready.
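To put the yearly transfer volumes above into network terms, the following sketch converts them to an average sustained rate and estimates how long one data set takes to download. It assumes decimal terabytes (1 TB = 10^12 bytes) and a 365-day year; the 1 TB data set and the 100 Mbit/s link in the last line are hypothetical figures chosen only for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s
SECONDS_PER_DAY = 24 * 3600


def average_rate_mbit_per_s(terabytes_per_year: float) -> float:
    """Average sustained rate implied by a yearly volume (decimal TB assumed)."""
    bits_per_year = terabytes_per_year * 1e12 * 8
    return bits_per_year / SECONDS_PER_YEAR / 1e6


def download_days(dataset_tb: float, link_mbit_per_s: float) -> float:
    """Days needed to move a data set over a fully used link of the given speed."""
    seconds = dataset_tb * 1e12 * 8 / (link_mbit_per_s * 1e6)
    return seconds / SECONDS_PER_DAY


ftp_rate = average_rate_mbit_per_s(306)   # volume quoted for ftp.ebi.ac.uk
fasp_rate = average_rate_mbit_per_s(110)  # volume quoted for the fasp servers
print(f"ftp:  {ftp_rate:.1f} Mbit/s on average")   # about 77.6
print(f"fasp: {fasp_rate:.1f} Mbit/s on average")  # about 27.9

# Hypothetical example: a 1 TB data set over a dedicated 100 Mbit/s link.
print(f"{download_days(1, 100):.2f} days")  # about 0.93
```

These are averages over a whole year; real transfers are bursty, and a shared institutional link will deliver only a fraction of its nominal speed to any one download.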

14 Challenges: Disruptive technologies
“A technology becomes disruptive when the rate at which it improves exceeds the rate at which users can adapt to the new performance.” Clayton M. Christensen, The Innovator's Dilemma. Harvard Business School Press, 1997.

15 Solutions – a timeline
Web Services: Service Oriented Architecture under the TEMBLOR, EMBRACE, FELICS and SLING EU-funded NoE projects.
Cloud storage (Amazon S3, EMC Atmos, Google Cloud Storage, iCloud, Windows Azure, etc.).
Enterprise Service Bus (integration via interoperability).
Focus: giving scientists the possibility to bring their analysis software closer to the big data (ELIXIR, BioMedBridges, EGI.eu, Helix Nebula, etc.).
Years marked on the timeline: 2004, 2011, 2012, 2013.

16 Solutions: Community/Hybrid Cloud architectures. Source: www.vmware.com

17 Solutions: EBI - SaaS Web Services (SOAP/REST)

18 Solutions: EBI - IaaS

19 Solutions: PaaS (common platforms)

20 Community: ELIXIR: An e-Infrastructure for biological and medical research. e-Infrastructure: GÉANT, DANTE, EGI.eu, PRACE, etc.

21 Community: Visits during consultation phase.

22 Sites of ELIXIR survey data providers

23 Community: UXD

24 Conclusions… so far
Interoperability is more important than integration.
Measurements, not only of compute variables but also of work-pattern metrics, are extremely important.
User engagement and outreach are important, also at the grass-roots level.
The challenge of BIG data is how to get it closer to the users.
The community is [a big] part of the solution.
ELIXIR is already delivering and setting a pace for research and development investment in bioinformatics (e.g. beyond the feasibility phase: Web Services, Identity Federation for EGA, and Distributed Search and Retrieval for several trillion biological data objects).

25 Technical Hub @ EBI: ground-breaking ceremony, 13th June 2012 (abell nepp)

26 Thanks
TERENA 2012
EMBL
EU, WT, BBRC, EPO, Data & Scientific Content Providers and many collaborators
http://www.ebi.ac.uk/
http://www.elixir-europe.org
Director General of EMBL (Iain Mattaj) and Director of EMBL-EBI (Janet Thornton) will visit Iceland towards the end of May 2012.

