For EGI/EUDAT EMBL/ELIXIR use-cases Tony Wildish www.ebi.ac.uk.

What is EMBL-EBI?
- Europe’s home for biological data services, research and training
- A trusted data provider for the life sciences
- Part of the European Molecular Biology Laboratory, an intergovernmental research organisation
- International: 570 members of staff from 57 nations
- Home of the ELIXIR Technical hub

A distributed data infrastructure for Europe
- EMBL-EBI is a founding member of ELIXIR: Europe’s distributed research infrastructure for biological information
- Mission: to support life science research and its translation to medicine, the environment, the bioindustries and society
- ELIXIR Nodes represent centres of excellence throughout Europe

Data resources available from EMBL-EBI
- Genes, genomes & variation: European Nucleotide Archive, European Variation Archive, European Genome-phenome Archive, Ensembl, Ensembl Genomes, GWAS Catalog, Metagenomics portal
- Gene, protein & metabolite expression: RNAcentral, ArrayExpress, Expression Atlas, MetaboLights, PRIDE
- Protein sequences, families & motifs: InterPro, Pfam, UniProt
- Molecular structures: Protein Data Bank in Europe, Electron Microscopy Data Bank
- Chemical biology: ChEMBL, ChEBI
- Reactions, interactions & pathways: IntAct, Reactome, MetaboLights
- Systems: BioModels, Enzyme Portal, BioSamples
- Literature & ontologies: Europe PubMed Central, Gene Ontology, Experimental Factor Ontology

ELIXIR: Driven by 4 scientific use-cases
- Marine Metagenomics
- Genomic & Phenotypic data for Crop and Forest plants
- Rare Diseases
- Human Genetic Data
  - Will not start with human data due to security constraints
=> All scientific use cases require either private or public data sets to be replicated from the source or between analysis sites

Use-case characteristics
- Data volumes from tens to several hundreds of GB monthly
  - Human data likely to be the largest volume/traffic
- Replication between a handful of sites
- Periodic updates to reference datasets
  => metadata handling to describe datasets consistently
- Download smaller subsets for individual analyses
- End-users widely distributed

Use-case characteristics
- Metadata replication not a target for the pilot
  - Complex, domain-specific, well established
  - No clear gain in replicating it at this time
- Decouple dataset-description metadata from file-location and transfer metadata
  - Allow file-distribution to be explored and understood without digging into details of what the data is about
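
One way to picture the decoupling is a transfer-side record that knows where a file is but nothing about what it means. The following is a minimal sketch under that assumption; the class name `FileReplica`, its fields, the site labels and the ID `ds-0001` are all hypothetical and do not come from the pilot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileReplica:
    """Transfer-side record: where a file lives, nothing about its meaning."""
    dataset_id: str   # opaque ID (hypothetical), interpreted only by the source archive
    path: str         # file path relative to the dataset root
    size_bytes: int
    site: str         # hypothetical site label, e.g. "source" or "site-a"


def files_at(replicas, dataset_id, site):
    """Return the sorted paths of one dataset held at one site."""
    return sorted(r.path for r in replicas
                  if r.dataset_id == dataset_id and r.site == site)


def missing_at(replicas, dataset_id, source, destination):
    """Paths present at the source but not yet at the destination."""
    have = set(files_at(replicas, dataset_id, destination))
    return [p for p in files_at(replicas, dataset_id, source) if p not in have]


# Illustrative data only: two files at the source, one already replicated.
replicas = [
    FileReplica("ds-0001", "a.dat", 10, "source"),
    FileReplica("ds-0001", "b.dat", 20, "source"),
    FileReplica("ds-0001", "a.dat", 10, "site-a"),
]
print(missing_at(replicas, "ds-0001", "source", "site-a"))
```

Nothing in these functions reads domain metadata, so the question of which files still need to move can be answered without knowing what the data describes.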

Use-case characteristics
- Subscription-based model
  - Datasets subscribed to a destination, new versions distributed automatically as they become available
  - Need to understand metadata requirements to allow this
- Need an opaque ID for data that can be shared between EBI and EUDAT/EGI to identify dataset versions
  - Rely on EBI source archive for determining what the ID represents
- File-transfer system needs to handle overlapping datasets (partial updates to existing datasets)
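
The subscription model and the handling of overlapping versions could be sketched roughly as follows. This is an illustration only: it assumes each dataset version is described as a mapping from file path to checksum, and the names `plan_update`, `Subscriptions`, `ds-0001` and `site-a` are hypothetical rather than part of any EBI, EGI or EUDAT interface.

```python
from collections import defaultdict


def plan_update(old_files, new_files):
    """Compare two versions given as {path: checksum} mappings.

    Returns (to_transfer, to_remove): files that are new or changed,
    and files that no longer belong to the dataset.
    """
    to_transfer = sorted(p for p, c in new_files.items()
                         if old_files.get(p) != c)
    to_remove = sorted(p for p in old_files if p not in new_files)
    return to_transfer, to_remove


class Subscriptions:
    """Maps an opaque dataset ID to the destinations subscribed to it."""

    def __init__(self):
        self._destinations = defaultdict(set)
        self._delivered = {}   # (dataset_id, destination) -> {path: checksum}

    def subscribe(self, dataset_id, destination):
        self._destinations[dataset_id].add(destination)

    def publish(self, dataset_id, files):
        """Plan the transfers needed at each destination for a new version."""
        plans = {}
        for dest in sorted(self._destinations[dataset_id]):
            old = self._delivered.get((dataset_id, dest), {})
            plans[dest] = plan_update(old, files)
            self._delivered[(dataset_id, dest)] = dict(files)
        return plans


subs = Subscriptions()
subs.subscribe("ds-0001", "site-a")
first = subs.publish("ds-0001", {"chr1.fa": "aa", "chr2.fa": "bb"})
second = subs.publish("ds-0001", {"chr1.fa": "aa", "chr2.fa": "cc", "chr3.fa": "dd"})
print(first)
print(second)
```

In this sketch the second version shares `chr1.fa` with the first, so only the changed and added files are planned for transfer, which is the partial-update case the slide describes.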

Initial prototype
- Standalone prototype, first investigate metadata issues
  - Provide a flat list of files to transfer
  - Use globus-connect endpoints & CLI to perform transfer
- Side-step issues with dependency on AAI
  - Switch to using AAI as soon as possible (ELIXIR, EGI, EUDAT)
  - Currently works on EBI-Embassy, CESNET, and Amazon
- Integrate with ELIXIR portal
  - Allow data-discovery followed by subscription to ELIXIR/EGI/EUDAT destinations
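
The "flat list of files" step might look something like the sketch below. The slides do not specify the list format, so the one-pair-per-line layout, the function name `flat_transfer_list` and the example paths are assumptions, not the format of any particular transfer tool.

```python
import posixpath


def flat_transfer_list(paths, source_root, destination_root):
    """Build one 'source destination' line per file, in a stable order.

    Assumes paths contain no spaces; a real list format would need quoting.
    """
    lines = []
    for rel in sorted(set(paths)):          # drop duplicates, keep order stable
        src = posixpath.join(source_root, rel)
        dst = posixpath.join(destination_root, rel)
        lines.append(f"{src} {dst}")
    return lines


# Hypothetical dataset layout, for illustration only.
for line in flat_transfer_list(["b/x.dat", "a.dat", "a.dat"],
                               "/archive/ds", "/replica/ds"):
    print(line)
```

A list like this carries only file locations, so it can be handed to whichever transfer mechanism is in use without exposing anything about the dataset's content.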

Summary
- Initial pilot to investigate issues
  - Data-description metadata out of scope for pilot
- File-distribution based on AAI from multiple providers
  - Start with globus-connect for simplicity, move to GridFTP once AAI is in place
- File-replica metadata to be handled by prototype
  - TBD: how to do this, tools, technologies…
- Integrate with ELIXIR cloud portal (under development)
- Early days, lots to learn...

Questions?