1 Storage Resource Broker
San Diego Supercomputer Center & National Partnership for Advanced Computational Infrastructure
Reagan W. Moore
San Diego Supercomputer Center
moore@sdsc.edu
http://www.npaci.edu/DICE/

2 Data Management Objectives
Automate all aspects of data management
–Discovery (without knowing the file name)
–Access (without knowing its location)
–Retrieval (using your preferred API)
–Control (without having a personal account at the remote storage system)
–Performance (use latency management mechanisms to minimize impact of wide-area-networks)

3 Collections Replicated via SRB onto TeraGrid
2MASS
–10 TBs, 5 million images
DPOSS
–3 TBs, 6000 images
USNO-B
–In progress
SDSS
–In progress
MACHO
–In negotiation

4 SRB Implementations
Data collecting
–Sensor systems, object ring buffers and portals
Data organization
–Collections, manage data context
Data sharing
–Data grids, manage heterogeneity
Data publication
–Digital libraries, support discovery
Data preservation
–Persistent archives, manage technology evolution
Data analysis
–Processing pipelines, manage knowledge extraction

5 NSF Infrastructure Projects Using SRB
Partnership for Advanced Computational Infrastructure - PACI
–Data grid - Storage Resource Broker
Distributed Terascale Facility - DTF/ETF
–Compute, storage, network resources
Digital Library Initiative, Phase II - DLI2
–Publication, discovery, access
Information Technology Research projects - ITR
–SCEC Southern California Earthquake Center
–GEON GeoSciences Network
–SEEK Science Environment for Ecological Knowledge
–GriPhyN Grid Physics Network
–NVO National Virtual Observatory
National Science Digital Library - NSDL
–Support for education curricula modules

6 Federal Infrastructure Projects Using SRB
NASA
–Information Power Grid - IPG
–Advanced Data Grid - ADG
–Data Management System - Data Assimilation Office
    Integration of DODS with Storage Resource Broker data grid
–Earth Observing Satellite EOS data pools
–Consortium of Earth Observing Satellites CEOS data grid
Library of Congress
–National Digital Information Infrastructure and Preservation Program - NDIIPP
National Archives and Records Administration and National Historical Public Records Commission
–Prototype persistent archives
NIH
–Biomedical Informatics Research Network data grid
DOE
–Particle Physics Data Grid - Babar, CMS

7 SDSC Collaborations
Hayden Planetarium Simulation & Visualization
Knowledge Network for BioComplexity (NSF)
Mol Science – JCSG, AfCS
Visual Embryo Project (NLM)
RoadNet (NSF)
Earth System Sciences – CEED, Bionome, SIO Explorer
Hyper LTER
Grid Portal (NPACI)
Tera Scale Computing (NSF)
Long Term Archiving Project (NARA)
Education – Transana (NPACI)
NSDL – National Science Digital Library (NSF)
Digital Libraries – ADL, Stanford, UMichigan, UBerkeley, CDL
… 31 additional collaborations

8 Approach
Use collections to organize digital entities
–Digital entity - file, URL, SQL, directory, table, …
Create logical name space
–Location independent naming convention
–Map state information created by data access services to the logical name space
–Manage consistency constraints on the metadata update
Build an interoperability mechanism
–Map from storage repository protocols to preferred APIs

9 Basic Concepts
Logical name space
–Map administrative, descriptive, authenticity, consistency metadata onto the logical name
Storage repository abstraction
–Standard operations performed at remote storage
Information repository abstraction
–Standard operations to manage collection in a database
Access abstraction
–Standard operations supported for metadata and data access
Authentication abstraction
–Collection-owned data, ACLs for data and metadata
Latency management mechanisms

10 SDSC Storage Resource Broker & Meta-data Catalog (architecture diagram)
Application
Access APIs: C, C++, Libraries; Unix Shell; Java, NT Browsers; OAI; WSDL; GridFTP; Linux I/O; DLL / Python
Servers: Logical Name Space; Latency Management; Data Transport; Metadata Transport; Consistency Management / Authorization-Authentication
Prime Server
Catalog Abstraction: Databases (DB2, Oracle, Postgres, SQLServer, Informix)
Storage Abstraction: Archives (HPSS, ADSM, UniTree, DMF); Databases (DB2, Oracle, Postgres); File Systems (Unix, NT, Mac OSX); HRM; ORB

11 Production Data Grid
SDSC Storage Resource Broker
–Federated client-server system, managing
    Over 70 TBs of data at SDSC
    Over 10 million files
–Manages data collections stored in
    Archives (HPSS, UniTree, ADSM, DMF)
    Hierarchical Resource Managers
    Tapes, tape robots
    File systems (Unix, Linux, Mac OS X, Windows)
    FTP sites
    Databases (Oracle, DB2, Postgres, SQLserver, Sybase, Informix)
    Virtual Object Ring Buffers

12 Federated SRB server model (diagram)
Components: Read Application, SRB agents, SRB servers, MCAT, resources R1 and R2
Request made by Logical Name Or Attribute Condition; the request path is numbered 1 to 6 in the diagram
MCAT functions:
1. Logical-to-Physical mapping
2. Identification of Replicas
3. Access & Audit Control
Other labels: Peer-to-peer Brokering; Server(s) Spawning; Data Access; Parallel Data Access (steps 5/6)
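The three catalog functions listed on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SRB's implementation: `Catalog`, `AccessDenied`, `read`, the server names, and the paths are all hypothetical stand-ins, and real servers are replaced by in-memory dictionaries.

```python
class AccessDenied(Exception):
    """Raised when the user is not on the access list for a logical name."""


class Catalog:
    """Hypothetical stand-in for the metadata catalog (MCAT)."""

    def __init__(self):
        self.replicas = {}  # logical name -> list of (server, physical path)
        self.acl = {}       # logical name -> set of users allowed to read
        self.audit = []     # (user, logical name, allowed) records

    def lookup(self, user, logical_name):
        # Access and audit control: check the ACL and record the attempt.
        allowed = user in self.acl.get(logical_name, set())
        self.audit.append((user, logical_name, allowed))
        if not allowed:
            raise AccessDenied(logical_name)
        # Logical-to-physical mapping and identification of replicas.
        return list(self.replicas[logical_name])


def read(catalog, servers, user, logical_name):
    """Return the bytes of the first reachable replica.

    servers maps a server name to a dict {physical path: bytes},
    or to None when that server is unavailable.
    """
    for server, path in catalog.lookup(user, logical_name):
        store = servers.get(server)
        if store is not None and path in store:
            return store[path]
    raise OSError("no replica reachable for " + logical_name)
```

The caller only ever supplies the logical name; which server answers is decided by the catalog's replica list, so a replica on an unavailable server is skipped without the application noticing.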

13 Logical Name Space Example - Hayden Planetarium
Generate fly-through of the evolution of the solar system
Access data distributed across multiple administration domains
Gigabyte files, total data size was 7 TBytes
Very tight production schedule - 3 months

15 Hayden Data Flow (diagram)
Sites: NCSA, SDSC, AMNH NYC, UVa, NY, CalTech, BIRN
Systems and storage: GPFS 7.5 TB, IBM SP2, SGI, HPSS 7.5 TB, 2.5 TB UniTree
Flow labels: Production parameters, movies, images; data simulation; visualization

16 Logical Name Space
Global, location-independent identifiers for digital entities
–Organized as collection hierarchy
–Attributes mapped to logical name space
Attributes managed in a database
Types of system metadata
–Physical location of file
–Owner, size, creation time, update time
–Access controls

17 Mappings on Name Space
Define logical resource name
–List of physical resources
Replication
–Write to a logical resource completes when all physical resources have a copy
Load balancing
–Write to a logical resource completes when a copy exists on the next physical resource in the list
Fault tolerance
–Write to a logical resource completes when copies exist on k of n physical resources
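The three write-completion rules on this slide differ only in how many physical copies must exist before a write counts as done. The sketch below is one way to express that, assuming a hypothetical `LogicalResource` class and a caller-supplied `store(resource, name)` function that returns True on success; neither is an SRB API.

```python
class LogicalResource:
    """A logical resource name mapped onto a list of physical resources."""

    def __init__(self, resources, policy="replication", k=None):
        self.resources = list(resources)
        self.policy = policy  # "replication", "load_balancing" or "fault_tolerance"
        self.k = k            # copies required under "fault_tolerance"
        self._next = 0        # round-robin cursor used for load balancing

    def write(self, name, store):
        """Write `name` and return the physical resources now holding a copy."""
        if self.policy == "load_balancing":
            # Done once the next resource in the list has a copy.
            target = self.resources[self._next % len(self.resources)]
            self._next += 1
            if not store(target, name):
                raise OSError("write to " + target + " failed")
            return [target]

        # This sketch tries every resource; a real system could stop after k.
        written = [r for r in self.resources if store(r, name)]
        if self.policy == "replication":
            needed = len(self.resources)  # all physical resources
        else:
            needed = self.k               # k of n physical resources
        if len(written) < needed:
            raise OSError("only %d of %d required copies written" % (len(written), needed))
        return written
```

With three physical resources and one of them down, a replication write fails while a fault-tolerant write with k=2 still completes, which is the practical difference between the two rules.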

18 Latency Management Example - Digital Sky Project
2MASS (2 Micron All Sky Survey):
–Bruce Berriman, IPAC, Caltech; John Good, IPAC, Caltech; Wen-Piao Lee, IPAC, Caltech
NVO (National Virtual Observatory):
–Tom Prince, Caltech; Roy Williams, CACR, Caltech; John Good, IPAC, Caltech
SDSC – SRB:
–Arcot Rajasekar, Mike Wan, George Kremenek, Reagan Moore

19 Digital Sky - 2MASS
http://www.ipac.caltech.edu/2mass
The input data was originally written to DLT tapes in the order seen by the telescope
–10 TBytes of data, 5 million files
Ingestion took nearly 1.5 years - almost daily reading of tapes, one at a time
Images aggregated into 147,000 containers by SRB

20 Digital Sky Data Ingestion (diagram)
Sites: SDSC, IPAC CALTECH
Labels: input tapes from telescopes; star catalog; Informix; SUN; SRB; SUN E10K; Data Cache; HPSS; 800 GB; 10 TB

22 SRB Latency Management (diagram)
Mechanisms: Replication; Server-initiated I/O; Streaming; Parallel I/O; Caching; Client-initiated I/O; Remote Proxies, Staging; Data Aggregation, Containers; Prefetch
Diagram endpoints: Source, Network, Destination

23 Containers
Images sorted by spatial location
–Retrieving one container accesses related images
Minimizes impact on archive name space
–HPSS stores 680 Tbytes in 17 million files
Minimizes distribution of images across tapes
Bulk unload by transport of containers
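Container aggregation amounts to sorting images by sky position and packing neighbours together, so that one archive retrieval brings back related images. The sketch below assumes a hypothetical image record of (name, right ascension, declination) and a made-up sort key of one-degree declination bands; the transcript does not say which spatial ordering was actually used. The last line only does the arithmetic implied by the 2MASS figures quoted earlier in the deck (5 million files in 147,000 containers).

```python
import math


def pack_into_containers(images, per_container):
    """Group images into containers of neighbouring sky positions.

    images: list of (name, ra_deg, dec_deg) tuples (hypothetical layout).
    Sort key (an assumption): one-degree declination band, then right ascension.
    """
    ordered = sorted(images, key=lambda im: (math.floor(im[2]), im[1]))
    return [ordered[i:i + per_container]
            for i in range(0, len(ordered), per_container)]


# Average container occupancy implied by the 2MASS numbers in this deck.
images_per_container = 5_000_000 / 147_000  # about 34 images per container
```

Packing roughly 34 images into each container means the archive tracks 147,000 names instead of 5 million, which is the "minimizes impact on archive name space" point above.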

24 SRB Development
Peer-to-peer federation
–Support multiple independent MCAT catalogs
–Replicate metadata
mySQL/BerkeleyDB port
OGSA/OGSI compliant interface
GridFTP interfaces
–Waiting for next release of the software (4thQ)

25 MySRB Features
Data & File Management
Collection Creation and Management
Collection of Varied Objects
–Files, SQL Objects, Databases, URLs, directories, archives, …
Metadata Handling
Browsing & Querying Interface
Access Control
Version Control (soon)
Support proxy (remote) operations

26 MySRB
Web-based Access to the SRB
Secure HTTP
Uses Cookies for Session Control
Self Registration of Users Supported
–Currently limited to SDSC users
Self Registration of Resources (soon)
Access to Both Data and Metadata

27 Data Management
Browse in Hierarchical Collections
Registration of (remote) Legacy Files & Directories
Registration of SQL Objects
Registration of URLs
Data Movement Operations
–Ingest & Re-Ingest, Delete, Unlink
–Replicate, Copy, Move, S-Link
Access Control Operations
–Read, Write, Own, Curate, Annotate, …
–Ticket-based Access
Version Control Operations (soon)
–Read Lock, Write Lock, Unlock
–Check In, Check Out

28 Types of Metadata
System-level Metadata
–Size, resource, owner, date, access control, …
User-defined Metadata
–for data & collections
– triples
–No limit on the number of metadata items
–Support for Collection-level schemas
    Comments, default values, drop-down lists
–Support for Standardized Schemas (e.g. Dublin Core)
Annotations
–Supports textual annotations
–Annotator, date, context also registered

29 Metadata Management
Insert, Update and Delete of Metadata
Access Control for Metadata (soon in mySRB)
Querying across system-level, user-defined metadata and annotations
–Query under collections & across collections
Browsing on user-defined metadata
Metadata supported for legacy files & directories
Extract Metadata (using proxy operations)
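The insert, delete, and query operations listed here can be illustrated with a small in-memory catalog. This is a sketch under stated assumptions: the transcript does not spell out the fields of a metadata triple, so (attribute, value, units) is assumed, and `MetadataCatalog` with its methods is a hypothetical name, not the MySRB interface.

```python
class MetadataCatalog:
    """User-defined metadata stored as triples per logical name (hypothetical)."""

    def __init__(self):
        # logical name -> list of (attribute, value, units); field layout is assumed
        self._triples = {}

    def insert(self, name, attribute, value, units=""):
        self._triples.setdefault(name, []).append((attribute, value, units))

    def delete(self, name, attribute):
        kept = [t for t in self._triples.get(name, []) if t[0] != attribute]
        self._triples[name] = kept

    def query(self, attribute, value, under="/"):
        """Logical names whose metadata match, restricted to one collection.

        under="/" searches across all collections; a deeper path such as
        "/home/sky" searches only beneath that collection.
        """
        prefix = under.rstrip("/") + "/"
        return sorted(
            name for name, triples in self._triples.items()
            if name.startswith(prefix)
            and any(t[0] == attribute and t[1] == value for t in triples)
        )
```

Because the collection hierarchy is encoded in the logical name, "query under a collection" and "query across collections" are the same operation with a different prefix.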

