Research Data Archive - technology

1 Research Data Archive - technology
Infrastructure and services for reliable long-term storage of scientific and cultural data

2 Long-term storage versus archive: positioning of the archive service
A. Project storage - long-term storage
- Personal access
- Primary and cold experiment data
- Sharing within the local group or community
- Simple upload with accepted protocols
B. Good scientific practice store
Like A above, and additionally:
- Guaranteed storage for at least 10 years
- Reliable storage with integrity assurance
- Minimal set of descriptive metadata and PID assignment
- Open Access
C. Archive conforming - repository
Like B above, and additionally:
- Forever, or as long as funding sustains
- Data curation
- Extended metadata search capabilities, OAI-PMH interface
- Conforming with standards, e.g. OAIS, DSA, ISO 16363
06/10/2016 BWDA - Archive infrastructure for scientific data
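As an illustration of the OAI-PMH interface listed under level C, the following minimal sketch builds the kind of request URL a metadata harvester would send. The base URL is a hypothetical placeholder and not an endpoint of this archive; the `verb` and `metadataPrefix` arguments and the `oai_dc` format come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://archive.example.org/oai"


def oai_pmh_url(verb, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb, **arguments}
    return BASE_URL + "?" + urlencode(query)


# Ask the repository for all records in unqualified Dublin Core.
print(oai_pmh_url("ListRecords", metadataPrefix="oai_dc"))
```

A harvester would fetch this URL over HTTP and parse the XML response; that part is left out here.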

3 Portfolio
Further details and information at https://www.rda.kit.edu
Data protection
- hardware based checksums between server memory and all storage media (T10 PI)
- trusted encrypted network transfers
- per file checksum comparison when data is read (MD5)
- data copies at distinct physical locations
- user verifiable checksum (of published data) - EUDAT -
Data storage
- support for large volumes (> 10 PetaByte) and large file sizes (> 1 TB)
- on-line directory meta-data and recently used content (through disk cache)
- storage classes: single, dual (on divergent technologies), triple/shelf
Data access
- easy access with common protocols and clients (sftp, gridftp)
- using federated authentication and local user db management
- account chaining: access to data remains after change of home organisation
- supporting rules for good scientific practice (retention of data for at least 10 years)
- simple publication of data - in development (bwDIM) -
Miscellaneous
- sharing of data - in development -
- scheduled deletion - in development -
- features for distinct user groups, communities and projects that require long-term storage for specific applications
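The per-file checksum comparison on read can be pictured with the following minimal sketch. It is not the archive's implementation: the function names are hypothetical, and the sketch assumes only that an MD5 digest was stored when the file was written and is recomputed when the file is read back.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_stored_checksum(stream, stored_md5):
    """Compare the digest of the data just read with the stored digest."""
    return md5_of_stream(stream) == stored_md5.strip().lower()


# Illustration with in-memory data standing in for an archived file.
payload = b"hello"
stored = hashlib.md5(payload).hexdigest()  # digest recorded at write time
print(matches_stored_checksum(io.BytesIO(payload), stored))   # intact copy
print(matches_stored_checksum(io.BytesIO(b"hellp"), stored))  # corrupted copy
```

Reading in chunks keeps memory use constant, which matters for the large file sizes mentioned above.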

4 HPSS Hardware
[Diagram of the HPSS hardware. Labels: test systems, Core/Mover, DB/Cache, MD storage 1, MD storage 2, core, DB servers, cache storage 1, cache storage 2, cache storage, admin nodes, GridKa data movers, data movers, Test Nodes]

5 HPSS as archive platform
- scalable to billions of objects
- It's the DataBase stupid!
- HPC users like the throughput scalability
- end-to-end data protection (T10 DIF)
- user definable checksums
- tape optimised reads
- POSIX access (incl. ACLs)
- OpenStack Swift
- tape container
- detailed API
[Diagram. Labels: core servers (standby, production), front-end nodes (sftp, gridftp), FUSE mount, hpss mover nodes, SAN, 2 x disk cache, Data Centre A: IBM TS1140 drives, Data Centre B: Oracle 10kD drives]

6 Scalability of FUSE interface
Varying number of parallel streams
- 1, 2, 4, 6, 8, 12, 16 parallel streams per file (chunks of a single file are transferred by opening multiple network sockets)
- 16 concurrent connections (i.e. 16 files are transferred at the same time)
Varying number of concurrent connections
- 2, 4, 8, 16, 32 concurrent connections (2, 4, 8, 16, 32 files are transferred at the same time by opening multiple control connections)
- 8 parallel streams
[Two charts. Axis labels: parallel streams, concurrent connections]
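The difference between the two test series can be sketched as follows. Parallel streams split one file into chunks, while concurrent connections move several files at once. The slide does not say how a file is divided among streams, so the even split in `stream_ranges` is an assumption, and all names are hypothetical.

```python
def stream_ranges(file_size, streams):
    """Split file_size bytes into contiguous (offset, length) ranges,
    one per parallel stream, as evenly as possible (assumed policy)."""
    base, extra = divmod(file_size, streams)
    ranges, offset = [], 0
    for index in range(streams):
        length = base + (1 if index < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges


# Series 1: vary the streams per file, hold 16 concurrent connections.
series_streams = [(streams, 16) for streams in (1, 2, 4, 6, 8, 12, 16)]
# Series 2: vary the concurrent connections, hold 8 streams per file.
series_connections = [(8, connections) for connections in (2, 4, 8, 16, 32)]

for streams, connections in series_streams + series_connections:
    total = streams * connections
    print(f"{connections} files at once, {streams} streams each: {total} data streams")
```

For example, `stream_ranges(10, 4)` assigns the byte ranges `(0, 3)`, `(3, 3)`, `(6, 2)` and `(8, 2)` to the four streams.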

7 Archive Usage - Pilot from KIT and HLRS
Data protection
- hardware based checksums between server memory and all storage media (T10 PI)
- trusted encrypted network transfers
- per file checksum comparison when data is read (MD5)
- data copies at distinct physical locations
- user verifiable checksum (of published data) - in development (EUDAT) -
Data storage
- support for large volumes (> 10 PetaByte) and large file sizes (> 100 GigaBytes)
- on-line directory meta-data and recently used content (through disk cache)
- storage classes: single, dual (on different technology)
Data access
- easy access with common protocols and clients
- using federated authentication of bwIDM in BW
- supporting rules for good scientific practice (retention of data for at least 10 years)
- simple publication of data - in development (bwDIM) -
- published data is automatically set read-only (immutable data) - in development -
Miscellaneous
- sharing of data - in development -
- scheduled deletion - in development -
- features for distinct user groups, communities and projects that require long-term storage for specific applications
Support for end users and projects (e.g. RADAR, EUDAT)
As of , over 2 PB data stored, ca. 50 users i.c. projects

8 Go Live
- 24.10 Production test run (Version 1)
- 22.11 Production start

9 Next steps
- IDPs must set "Archive" entitlement; others must contact us
- per person, per institute, per group, per ...: it all comes down to money
- Formalities are here to stay
- Responsibilities and costs must be cleared beforehand
- Storage service contracts (with KIT procurement, financial, legal dept.)
- Security concept descriptions
- Planning Version 2 features with 'bwDIM': publication light, AAI features
- Migration of other existing data: GridKa, LSDF, ForHLR

