Applying Data Grids to Support Distributed Data Management Storage Resource Broker Reagan W. Moore Ian Fisk Bing Zhu University of California, San Diego.


1 Applying Data Grids to Support Distributed Data Management
Storage Resource Broker
Reagan W. Moore, Ian Fisk, Bing Zhu
University of California, San Diego
moore@sdsc.edu
http://www.npaci.edu/DICE/

2 Data Management Systems
Data sharing - data grids
–Federation across administration domains
–Latency management
–Sustained data transfers
Data publication - digital libraries
–Discovery
–Organization
Data preservation - persistent archives
–Technology management
–Authenticity

3 Consistent Data Environments
Storage Resource Broker combines the functionality of data grids, digital libraries, and persistent archives within a single data environment
SRB provides
–Metadata consistency
–Latency management functions
–Technology evolution management

4 Metadata Consistency
Storage Resource Broker uses a logical name space to assign global identifiers to digital entities
–Files, SQL command strings, database tables, URLs
State information that characterizes the result of operations on the digital entities is mapped onto the logical name space
Consistency of state information is managed as update constraints on the mapping
–Write locks, synchronization flags, schema extension
SRB state information is managed in the MCAT metadata catalog
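The mapping described on this slide can be illustrated with a small in-memory sketch. It is not SRB or MCAT code: the class names, method names, resource names, and paths below are all hypothetical, and a real catalog would live in a database. The sketch only shows the idea of a logical name mapped to physical replicas, with a write lock and a per-replica synchronization flag acting as update constraints.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Replica:
    resource: str          # hypothetical storage resource name
    physical_path: str     # where the bytes actually live
    dirty: bool = False    # synchronization flag: True means out of date


@dataclass
class CatalogEntry:
    replicas: List[Replica] = field(default_factory=list)
    write_lock_owner: Optional[str] = None


class LogicalNameSpace:
    """Toy stand-in for a metadata catalog keyed by logical name."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def register_replica(self, logical_name: str, resource: str, physical_path: str) -> None:
        entry = self._entries.setdefault(logical_name, CatalogEntry())
        entry.replicas.append(Replica(resource, physical_path))

    def acquire_write_lock(self, logical_name: str, user: str) -> None:
        entry = self._entries[logical_name]
        if entry.write_lock_owner not in (None, user):
            raise PermissionError(f"{logical_name} is locked by {entry.write_lock_owner}")
        entry.write_lock_owner = user

    def release_write_lock(self, logical_name: str, user: str) -> None:
        entry = self._entries[logical_name]
        if entry.write_lock_owner == user:
            entry.write_lock_owner = None

    def record_write(self, logical_name: str, user: str, resource: str) -> None:
        """Record a write to one replica; every other replica becomes dirty."""
        entry = self._entries[logical_name]
        if entry.write_lock_owner != user:
            raise PermissionError("a write requires holding the write lock")
        for replica in entry.replicas:
            replica.dirty = replica.resource != resource

    def current_replicas(self, logical_name: str) -> List[str]:
        """Physical paths of the replicas that are still up to date."""
        return [r.physical_path for r in self._entries[logical_name].replicas if not r.dirty]
```

In this sketch a write by the lock holder marks all other replicas as needing synchronization, so a later read can be steered to an up-to-date copy.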

5 SRB Latency Management
[Diagram: latency management mechanisms between source, network, and destination. Labels: Replication; Server-initiated I/O; Streaming; Parallel I/O; Caching; Client-initiated I/O; Remote Proxies, Staging; Data Aggregation, Containers; Prefetch]

6 SRB 2.0 - Parallel I/O
Client-directed parallel I/O - Client/Server
–Thread-safe client
–Client decides the number of threads to use
–Each thread is responsible for a data segment and connects to the server independently
–Utilities srbpput and srbpget
Sustains 80% to 90% of available bandwidth using 4 parallel I/O streams and a window size of 800 kBytes
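The client-directed pattern above, where each thread owns one data segment and opens its own connection, can be sketched with local files standing in for the network. This is an illustration only and does not use any SRB API: `plan_segments`, `copy_segment`, and `parallel_copy` are hypothetical names, and each worker opening its own file handles is an assumed stand-in for "connects to the server independently".

```python
import os
from concurrent.futures import ThreadPoolExecutor


def plan_segments(total_size, num_threads):
    """Split total_size bytes into up to num_threads contiguous (offset, length) segments."""
    base, extra = divmod(total_size, num_threads)
    segments = []
    offset = 0
    for i in range(num_threads):
        length = base + (1 if i < extra else 0)
        if length == 0:
            continue  # fewer bytes than threads: skip empty segments
        segments.append((offset, length))
        offset += length
    return segments


def copy_segment(src_path, dst_path, offset, length):
    """One worker: open private handles and move exactly one segment."""
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        src.seek(offset)
        dst.seek(offset)
        dst.write(src.read(length))


def parallel_copy(src_path, dst_path, num_threads=4):
    """Client decides the thread count; each thread transfers its own segment."""
    total_size = os.path.getsize(src_path)
    with open(dst_path, "wb") as dst:
        dst.truncate(total_size)  # pre-size the destination so workers can seek into it
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [
            pool.submit(copy_segment, src_path, dst_path, offset, length)
            for offset, length in plan_segments(total_size, num_threads)
        ]
        for future in futures:
            future.result()  # re-raise any worker error
```

The default of four threads mirrors the four parallel I/O streams quoted on the slide; over a real wide-area link the benefit comes from each stream having its own transfer window.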

7 SRB 2.0 - Parallel I/O (cont1)
Server-directed parallel I/O - Client/Server
–Server plans and decides number of threads to use
–Separate “control” and “data transfer” sockets
–Client listens on the “control” socket and spawns threads to handle data transfer
–Always a one-hop data transfer between client and server
–Similar to HPSS
Works seamlessly with HPSS Mover protocol
Also works for other file systems

8 SRB 2.0 - Parallel I/O (cont2)
Parallel I/O - Server/Server
–Copy, replication, and staging operations
–Always used in third-party transfer operations
Server/server data transfer, client not involved
–Uses up to 4 threads depending on file size
–7-10 times improvement for large files across the country
–Up to 39 MB/sec across campus (PC RAID disk, Gbit Ethernet)

9 Federated SRB server model
[Diagram: an application issues a Read by logical name or attribute condition; numbered steps 1-6 connect the application, two SRB servers with their SRB agents, the MCAT, and replicas R1 and R2. Labels: 1. Logical-to-Physical mapping; 2. Identification of Replicas; 3. Access & Audit Control; Peer-to-peer Brokering; Server(s) Spawning; Data Access; Parallel Data Access (5/6)]

10 SRB 2.0 - Bulk operations
Uploading and downloading large numbers of small files
–Multi-threaded
–Bulk registration – 500 files in one call
–Fill 8 MB buffer before sending
–Use of container
New Sbload and Sbunload utilities
–Over 100 files per second registration
–3-10+ times speedup
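The batching idea behind these bulk operations can be sketched as follows. This is not how Sbload is implemented; it is a minimal illustration, with the hypothetical function `batch_files`, of grouping many small files so that one call carries at most 500 registrations and at most one 8 MB buffer of data. Treating "8 MB" as 8 × 1024 × 1024 bytes is an assumption.

```python
MAX_FILES_PER_CALL = 500             # from the slide: 500 files in one call
MAX_BUFFER_BYTES = 8 * 1024 * 1024   # from the slide: 8 MB buffer (binary MB assumed)


def batch_files(files, max_files=MAX_FILES_PER_CALL, max_bytes=MAX_BUFFER_BYTES):
    """Group (name, size) pairs into batches bounded by file count and total bytes.

    A batch is closed when adding the next file would exceed either limit.
    A single file larger than max_bytes ends up in a batch of its own.
    """
    batches = []
    current = []
    current_bytes = 0
    for name, size in files:
        if current and (len(current) >= max_files or current_bytes + size > max_bytes):
            batches.append(current)
            current = []
            current_bytes = 0
        current.append(name)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # 1200 hypothetical files of 10 kB each: the file-count limit closes each batch.
    small_files = [(f"file{i:04d}.dat", 10_000) for i in range(1200)]
    for batch in batch_files(small_files):
        print(len(batch), "files in one bulk call")
```

The speedup on the slide comes from replacing one round trip per file with one round trip per batch, which matters most when files are small and latency dominates.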

11 SDSC Storage Resource Broker & Meta-data Catalog
[Architecture diagram, layers from top to bottom]
Application
Access APIs: C, C++, Libraries; Unix Shell; Linux I/O; DLL / Python; Java, NT Browsers; OAI; WSDL; GridFTP
Servers: Consistency Management / Authorization-Authentication; Prime Server; Logical Name Space; Latency Management; Data Transport; Metadata Transport
Catalog Abstraction: Databases (DB2, Oracle, Sybase, SQLServer)
Storage Abstraction: Archives (HPSS, ADSM, UniTree, DMF); Databases (DB2, Oracle, Postgres); File Systems (Unix, NT, Mac OSX); HRM
Technology Management

12 SRB Archival Tape Library System
SRB archival storage system in addition to HPSS, UniTree, ADSM
–A distributed pool of disk caches for front end
–A tape library system back end
STK silo for tape storage and tape mount
3590 tape drives
I/O always performed on disk cache
–Always stage data to cache

13 CMS Experiment
Ian Fisk - user level application
–Installed SRB servers at CERN, Fermi Lab, UCSD under a user account
Remotely invoked data replication
–From UCSD, invoked data replication from CERN to Fermi Lab, and to UCSD
–Data transfers automatically used four parallel I/O streams, default window size of 800 kBytes
Observed
–Sustained data transfer at 80% to 90% of available bandwidth
–Transferred over 1 TB of data per day using multiple sessions
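The figures on this slide can be put in context with a short calculation. The sketch below converts 1 TB per day into an average rate and applies the standard rule that a windowed stream cannot exceed window size divided by round-trip time. The round-trip time of 100 ms is an assumed value for illustration, not a measurement from the experiment, and decimal units (1 TB = 10^12 bytes, 800 kBytes = 800,000 bytes) are also assumptions. The function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(bytes_per_day):
    """Average transfer rate in bytes per second for a given daily volume."""
    return bytes_per_day / SECONDS_PER_DAY


def window_limited_rate(streams, window_bytes, rtt_seconds):
    """Upper bound on throughput when each stream sends one window per round trip."""
    return streams * window_bytes / rtt_seconds


def required_window(target_bytes_per_s, streams, rtt_seconds):
    """Per-stream window needed to reach a target rate at a given round-trip time."""
    return target_bytes_per_s * rtt_seconds / streams


if __name__ == "__main__":
    rate = average_rate(1e12)  # 1 TB per day, decimal units assumed
    print(f"1 TB/day averages {rate / 1e6:.1f} MB/s ({rate * 8 / 1e6:.0f} Mbit/s)")

    assumed_rtt = 0.100  # seconds; hypothetical transatlantic round-trip time
    bound = window_limited_rate(streams=4, window_bytes=800_000, rtt_seconds=assumed_rtt)
    print(f"4 streams x 800 kB window at {assumed_rtt * 1000:.0f} ms RTT: at most {bound / 1e6:.0f} MB/s")
```

Under these assumptions, 1 TB per day corresponds to roughly 11.6 MB/s on average, and four streams with 800 kByte windows allow up to about 32 MB/s, which is why adding streams or enlarging the window raises the ceiling on long-latency paths.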

14 Future plans
SRB 2.1 - Grid-oriented features, SRB-G (5/31/03)
–Add GridFTP driver – Access data through GridFTP server
–Upgrade to GSI 2.2 (GSI 1.1 in current version)
–Provide encrypted data transfer facility, using GSI encryption, between servers and between server and client. Explore network encryption as a digital entity property
–WSDL Services interface for SRB including data movement, replication, access control, metadata ingestion and retrieval and container support
SRB 2.2 – Federated MCATs (8/30/03)
–Peer-to-peer MCATs
–Mount point like interface - /sdsc/…, /caltech/…
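One way to picture the mount-point-like interface is as routing on the first path component. The sketch below is purely illustrative and makes no claim about how SRB 2.2 resolves names: the functions `split_mount_point` and `route`, and the catalog addresses in the usage example, are hypothetical.

```python
def split_mount_point(logical_path):
    """Split '/zone/rest/of/path' into ('zone', '/rest/of/path')."""
    parts = logical_path.strip("/").split("/", 1)
    zone = parts[0]
    if not zone:
        raise ValueError("logical path must start with a zone, e.g. /sdsc/...")
    remainder = "/" + parts[1] if len(parts) > 1 else "/"
    return zone, remainder


def route(logical_path, catalogs):
    """Pick the catalog responsible for a logical path from a zone-to-catalog mapping."""
    zone, remainder = split_mount_point(logical_path)
    if zone not in catalogs:
        raise KeyError(f"no catalog registered for zone {zone!r}")
    return catalogs[zone], remainder


if __name__ == "__main__":
    # Hypothetical catalog addresses, for illustration only.
    catalogs = {"sdsc": "mcat.sdsc.example", "caltech": "mcat.caltech.example"}
    print(route("/sdsc/home/demo/run42.dat", catalogs))
```

In a peer-to-peer arrangement each site would answer queries for its own prefix and forward the rest, so users see one name space even though several catalogs hold the state.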

15 Next CMS Experiments
Sustained transfer
–Use 4 MB window size
Bulk data registration
–In tests with DOE ASCI project, sustained registration of 400 files per second
Peer-to-peer federation
–Prototype of ability to initiate data and metadata exchanges between MCAT catalogs

16 For More Information
Reagan W. Moore
San Diego Supercomputer Center
moore@sdsc.edu
http://www.npaci.edu/DICE
http://www.npaci.edu/DICE/SRB/index.html
http://www.npaci.edu/dice/srb/mySRB/mySRB.html

