ELIXIR activities in Norway (and Europe)

ELIXIR activities in Norway (and Europe) Lars Ailo Bongo (ELIXIR-NO, UiT) Gard Thomassen (ELIXIR-NO, UiO) NorduGrid 2017, 29 June 2017, Tromsø, Norway

Outline: ELIXIR (background, platforms, use cases); META-pipe pipeline and backend; ELIXIR-Norway (services, Norwegian e-Infrastructure for Life Sciences (NeLS))

ELIXIR

ELIXIR’s mission: To build a sustainable European infrastructure for biological information, supporting life science research and its translation to medicine, environment, bioindustries, and society.

Data growth in the life sciences Data growth at EMBL-EBI: (A) data accumulation by data type, for example mass spectrometry (MS); (B) data accumulation by dedicated resource, for example PRIDE. The y-axis is log-scale, and the slope of the dashed lines indicates a 12-month doubling time. Data continue to grow in all data types and all data resources at EMBL-EBI. In all data resources shown, growth rates are predicted to continue increasing, with notable sustained exponential growth in PRIDE, the European Genome-phenome Archive (EGA), and MetaboLights, all of which have doubling times of around 12 months. EGA: European Genome-phenome Archive; PRIDE: PRoteomics IDEntifications database. Source: Charles E. Cook et al. Nucl. Acids Res. 2016;44:D20-D26
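
To make the 12-month doubling time concrete, here is a minimal sketch of the growth arithmetic, assuming steady exponential growth. The function names and example volumes are illustrative and do not come from the slides or the cited paper.

```python
import math


def projected_volume(current: float, months: float,
                     doubling_months: float = 12.0) -> float:
    """Volume after `months`, assuming it doubles every `doubling_months`."""
    return current * 2 ** (months / doubling_months)


def months_to_reach(current: float, target: float,
                    doubling_months: float = 12.0) -> float:
    """Months until `current` grows to `target` under the same assumption."""
    return doubling_months * math.log2(target / current)


if __name__ == "__main__":
    # Hypothetical archive holding 1 PB today.
    print(projected_volume(1.0, 36))   # volume in PB after three years
    print(months_to_reach(1.0, 8.0))   # months until the archive holds 8 PB
```

Under this assumption an archive grows eightfold in three years, which is why the slides go on to argue for distributing storage and analysis services.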

The data challenge: Geographic spread http://www.illumina.com/systems/sequencing-platforms.html http://omicsmaps.com

Summary: Large amounts of biological data are produced. Analysis services need to be distributed across Europe. ELIXIR is the solution.

ELIXIR: An international distributed infrastructure for biological data. Technical platforms: Data, Standards, Tools, Compute, Training. User communities: Marine metagenomics, Crop and forest plants, Human data, Rare diseases.

Platforms Compute platform: Services to store, share, and analyze large datasets. Interoperability platform: Standards to describe life science data. Training platform: Organize training workshops. Data platform: Identify key data resources, link data with literature. Tools platform: Help researchers find the best tools for their data. https://www.elixir-europe.org/platforms

ELIXIR Compute Platform Authentication and authorization infrastructure: single login for all ELIXIR services. Cloud and compute: standardized way to set up a backend for analysis services; set up analysis environments in secure platforms. Storage and data transfer: replicate reference databases. Infrastructure services registry. Help desk. https://drive.google.com/file/d/0B0KXZdVao0kqUE9BbXVrc3ZLY1E/view

Scientific use cases: Marine metagenomics, Human data, Rare diseases, Plant sciences, (Training). https://www.elixir-europe.org/use-cases

Marine metagenomics Define a comprehensive metagenomic data standards environment: "The metagenomic data life-cycle: standards and best practices", GigaScience 2017. Create marine reference databases: the Marine Metagenomics Portal (MMP). Implement pipelines for marine metagenomics analyses: EBI EMG and UiT META-pipe (used to generate data for MMP). Provide training and workshops: metagenomics training using META-pipe on the CSC cPouta cloud.

META-pipe: marine metagenomics analysis pipeline

META-pipe: architecture https://github.com/uit-no/elixir-excelerate/blob/master/meta-pipe.md

META-pipe physical architecture

META-pipe: cloud execution Pipeline tools & reference DBs: mostly 3rd-party binaries; hundreds of GB of reference DBs; packaged in META-pipe Jenkins server; not in a container/VM (no benefits for now). Ongoing: standardize provenance data reporting. Spark program: regular Spark program plus abstractions/interfaces for running 3rd-party binaries. Ongoing: better error detection, logging, and handling. TODO: more secure execution. TODO: accounting and payment.
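
The slide mentions abstractions for running 3rd-party binaries, with better error detection and logging as ongoing work. The sketch below illustrates that general idea in plain Python using only the standard library. It is not META-pipe's code (the slides describe the backend as a Spark program), and `run_tool`, `ToolError`, and the timeout default are hypothetical names chosen for illustration.

```python
import logging
import subprocess
import sys
from typing import Sequence

log = logging.getLogger("pipeline")


class ToolError(RuntimeError):
    """Raised when a wrapped third-party tool cannot run or fails."""


def run_tool(cmd: Sequence[str], timeout_s: float = 3600.0) -> str:
    """Run an external binary and return its stdout.

    Missing binaries, timeouts, and non-zero exit codes are all turned
    into ToolError so the caller handles one exception type.
    """
    log.info("running: %s", " ".join(cmd))
    try:
        result = subprocess.run(
            list(cmd), capture_output=True, text=True, timeout=timeout_s
        )
    except FileNotFoundError as exc:
        raise ToolError(f"binary not found: {cmd[0]}") from exc
    except subprocess.TimeoutExpired as exc:
        raise ToolError(f"{cmd[0]} timed out after {timeout_s}s") from exc
    if result.returncode != 0:
        log.error("%s failed (exit %d): %s",
                  cmd[0], result.returncode, result.stderr.strip())
        raise ToolError(f"{cmd[0]} exited with code {result.returncode}")
    return result.stdout


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # The Python interpreter stands in for a real pipeline tool here.
    print(run_tool([sys.executable, "-c", "print('ok')"]))
```

In a distributed job, a wrapper of this kind could be called once per input partition, so that a failing tool surfaces as a logged, typed error instead of a silent empty result.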

META-pipe: cloud execution Spark, NFS execution environment: standalone Spark; NFS, since some tools need a shared file system. Ongoing: optimize execution environments. Ongoing: test scalability. Ongoing: test AWS. cPouta Ansible playbook: set up Spark and NFS execution environment on cPouta OpenStack. Set up execution environment on CESNET OpenNebula. Ongoing: testing setup on EGI Federated Clouds (OCCI).

MMG EOSC Pilot: Marine metagenomics use case, ELIXIR Compute Platform, EGI ELIXIR Competency Center. Aims: Evaluate the performance of META-pipe and EMG at scale using EOSC resources. Cost-optimize the analyses on EOSC. Evaluate the use of elasticity in EOSC for execution of job queues. Develop a full-service delivery model and potential business model between the stakeholders and entities involved. Status: not funded. Next step: Nordic Open Science Cloud? https://docs.google.com/document/d/124x5ygyE5xIUVHJOq94TwoqLxHgABxGhmrawEmXdN5w/edit#

ELIXIR Norway Bioinformatics services for Norwegian users: tools; pipelines; compute resources; storage resources (project & archive); sensitive data storage and analysis; common Galaxy interface; user profile management.

ELIXIR Norway and Norwegian Bioinformatics Platform

ELIXIR Norway: Data life cycle management

ELIXIR-Norway 2 work packages: WP1 Project Management; WP2 NeLS; WP3 Microbial Genomics; WP4 Non-human Genomics; WP5 Biomedicine; WP6 Systems Biology; WP7 Help Desk; WP8 ELIXIR Europe deliverables. Other figure labels: Sigma2, TSD.

Tryggve2 project: Collaboration for sensitive biomedical data. The project aims to strengthen biomedical research by facilitating the use of sensitive data in cross-border projects. Partners and funders are NeIC and the ELIXIR Nodes in Denmark, Finland, Norway and Sweden. 3-year project with a volume of ca. 200 PMs/year (starts 2017). The project builds on strong existing capacities and resources in the Nordic countries.

European Genome-phenome Archive (EGA) The EGA was created in 2008 by the EBI. Project goal: To transform the EGA into a joint project (in the context of ELIXIR Europe) to have a real impact on the development of personalized medicine.

The EGA contains a growing amount of data: >3.5 PB* in ~760,000 files* (chart range: July 2010 to Oct 2016). * Files encrypted in different formats are counted only once.

Summary ELIXIR: distributed infrastructure for life science data analysis. Marine metagenomics is a demonstrator for the ELIXIR platforms. META-pipe: marine metagenomics analysis pipeline; Spark-based backend; portable execution on different clouds. ELIXIR-Norway provides services for Norwegian users: Galaxy analysis pipelines and project management; access to storage and compute; sensitive data in TSD, Tryggve, and Local EGA; end-to-end solution for Norwegian life scientists.