Research Data Archive - technology


Research Data Archive - technology
Infrastructure and services for reliable long-term storage of scientific and cultural data

Long-term storage versus Archive: positioning of the archive service

A. Project storage - long-term storage
- personal access
- primary and cold experiment data
- sharing within the local group or community
- simple upload with accepted protocols

B. Good scientific practice store
Like A. above, and additionally:
- guaranteed storage for at least 10 years
- reliable storage with integrity assurance
- minimal set of descriptive metadata and PID assignment
- Open Access

C. Archive conforming - Repository
Like B. above, and additionally:
- forever, or as long as funding sustains
- data curation
- extended metadata search capabilities, OAI-PMH interface
- conforming with standards, e.g. OAIS, DSA, ISO 16363

06/10/2016 BWDA - Archive infrastructure for scientific data

Portfolio

Data protection
- hardware-based checksums between server memory and all storage media (T10 PI)
- trusted encrypted network transfers
- per-file checksum comparison when data is read (MD5)
- data copies at distinct physical locations
- user-verifiable checksum (of published data) - EUDAT -

Data storage
- support for large volumes (> 10 PetaByte) and large file sizes (> 1 TB)
- on-line directory metadata and recently used content (through disk cache)
- storage classes: single, dual (on divergent technologies), triple/shelf

Data access
- easy access with common protocols and clients (sftp, gridftp)
- federated authentication and local user db management
- account chaining: access to data remains after a change of home organisation
- supporting rules for good scientific practice (retention of data for at least 10 years)
- simple publication of data - in development (bwDIM) -

Miscellaneous
- sharing of data - in development -
- scheduled deletion - in development -
- features for distinct user groups, communities and projects that require long-term storage for specific applications

Further details and information at https://www.rda.kit.edu
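The per-file MD5 comparison mentioned above can be sketched in a few lines of Python: a checksum is recorded when a file is ingested and recomputed in streaming fashion whenever the file is read back. This is a minimal sketch; the function names are illustrative, not the service's actual interface:

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 checksum of a file without loading it all into memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_on_read(path, expected_md5):
    """Compare a freshly computed checksum with the one recorded at ingest."""
    actual = md5_of_file(path)
    if actual != expected_md5:
        raise ValueError(f"checksum mismatch for {path}: {actual} != {expected_md5}")
    return True
```

The same checksum can be handed to users of published data, so they can verify a download independently of the archive.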

HPSS hardware
[Diagram of test and production systems: core DB servers, admin nodes, GridKa data movers, metadata storage 1 and 2, cache storage 1 and 2, and test nodes]

HPSS as archive platform
"It's the database, stupid!"
- scalable to billions of objects
- HPC users like the throughput scalability
- end-to-end data protection (IEEE T10 DIF)
- user-definable checksums
- tape-optimised reads
- POSIX access (incl. ACLs)
- OpenStack Swift tape container
- detailed API

[Diagram: front-end nodes (sftp, gridftp, FUSE mount), standby and production core servers, HPSS mover nodes on a SAN with 2 x disk cache; Data Centre A with IBM TS1140 drives, Data Centre B with Oracle 10kD drives]
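The "tape optimised reads" point can be illustrated with a sketch: before recalling files, requests are grouped by tape volume and sorted by on-tape position, so each cartridge is mounted once and read front to back instead of seeking back and forth. The `(path, volume, position)` metadata tuple here is hypothetical; HPSS tracks and applies this internally:

```python
from collections import defaultdict

def order_recalls(requests):
    """Order recall requests for efficient tape access.

    `requests` is a list of (path, volume, position) tuples, where
    `volume` names the tape cartridge and `position` is the file's
    offset on that tape (hypothetical metadata for illustration).
    Returns (volume, path) pairs: one mount per volume, read in order.
    """
    by_volume = defaultdict(list)
    for path, volume, position in requests:
        by_volume[volume].append((position, path))
    ordered = []
    for volume in sorted(by_volume):
        for position, path in sorted(by_volume[volume]):
            ordered.append((volume, path))
    return ordered
```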

Scalability of the FUSE interface

Varying the number of parallel streams:
- 1, 2, 4, 6, 8, 12, 16 parallel streams per file (chunks of a single file, sent by opening multiple network sockets)
- fixed at 16 concurrent connections (i.e. 16 files transferred at the same time)

Varying the number of concurrent connections:
- 2, 4, 8, 16, 32 concurrent connections (2, 4, 8, 16, 32 files transferred at the same time, by opening multiple control connections)
- fixed at 8 parallel streams

[Plots: throughput for 2-16 parallel streams and for 2-32 concurrent connections]
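The two knobs measured above can be sketched with local file copies standing in for network transfers: "parallel streams" move chunks of one file through several workers at once, while "concurrent connections" move several files at once. This is a sketch only, with threads standing in for the extra sockets and control connections:

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor

def transfer_parallel_streams(src, dst, streams=8):
    """Copy one file in `streams` chunks, each handled by its own worker
    (stands in for opening multiple network sockets for a single file)."""
    size = os.path.getsize(src)
    chunk = (size + streams - 1) // streams
    with open(dst, "wb") as f:
        f.truncate(size)  # pre-allocate so workers can write at their offsets

    def copy_range(i):
        with open(src, "rb") as fin, open(dst, "r+b") as fout:
            fin.seek(i * chunk)
            fout.seek(i * chunk)
            fout.write(fin.read(chunk))

    with ThreadPoolExecutor(max_workers=streams) as pool:
        list(pool.map(copy_range, range(streams)))

def transfer_concurrent(pairs, connections=16):
    """Copy several (src, dst) pairs at the same time
    (stands in for opening multiple control connections)."""
    with ThreadPoolExecutor(max_workers=connections) as pool:
        list(pool.map(lambda p: shutil.copyfile(*p), pairs))
```

Which knob pays off depends on where the bottleneck sits: parallel streams help a single large file saturate the link, concurrent connections help when per-file overhead dominates.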

Archive usage

Pilot from KIT and HLRS

Data protection
- hardware-based checksums between server memory and all storage media (T10 PI)
- trusted encrypted network transfers
- per-file checksum comparison when data is read (MD5)
- data copies at distinct physical locations
- user-verifiable checksum (of published data) - in development (EUDAT) -

Data storage
- support for large volumes (> 10 PetaByte) and large file sizes (> 100 GigaBytes)
- on-line directory metadata and recently used content (through disk cache)
- storage classes: single, dual (on different technology)

Data access
- easy access with common protocols and clients
- federated authentication of bwIDM in BW
- supporting rules for good scientific practice (retention of data for at least 10 years)
- simple publication of data - in development (bwDIM) -
- published data is automatically set read-only (immutable data) - in development -

Miscellaneous
- sharing of data - in development -
- scheduled deletion - in development -
- features for distinct user groups, communities and projects that require long-term storage for specific applications

Support for end users and projects (e.g. RADAR, EUDAT)
As of 1.10.2016: over 2 PB of data stored, ca. 50 users and projects

Go live
- 24.10: production test run (Version 1)
- 22.11: production start

Next steps
- IDPs must set the "Archive" entitlement; others must contact us
  - per person, per institute, per group, per ... it all comes down to money
- Formalities are here to stay
  - responsibilities and costs must be cleared beforehand
  - storage service contracts (with KIT procurement, financial and legal departments)
  - security concept descriptions
- Planning Version 2 features with bwDIM
  - publication light
  - AAI features
- Migration of other existing data: GridKa, LSDF, ForHLR