The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets
Shiung Jiun-Kuei
Outline
1. Introduction
2. Data Grid Design
3. Core Data Grid Services
4. Higher-Level Components
5. Implementation
National Kaohsiung University of Applied Sciences, Electrical Engineering
Introduction
- Problems faced: huge storage capacity requirements, data dispersed across geographic locations, and the analysis of large scientific datasets.
- Earlier resolution: supercomputers. Proposed resolution: the Data Grid.
- Prior work: the digital library community, the Storage Resource Broker (SRB), and the High Performance Storage System (HPSS).
Data Grid Design
- Mechanism neutrality: support heterogeneous systems.
- Policy neutrality: decisions with performance implications are exposed to the user rather than hidden inside the implementation.
- Compatibility with Grid infrastructure: reuse existing Grid tools (e.g., authentication, resource management, and information services).
- Uniformity of information infrastructure: uniform and convenient access to all information, including metadata.
Data Grid Design (cont.)
[Figure: data grid architecture diagram; recoverable labels: "policy implementation", "high performance"]
Core Data Grid Services
- Storage systems
  - Operate on file instances.
  - File names are organized in a hierarchical directory structure.
- Data access
  - Support for controlling remote files.
  - Detect and report errors.
- Metadata service, covering three types of metadata:
  - Application metadata
  - Replica metadata
  - System configuration metadata
- Other services: authorization, reservation, co-allocation, and performance measurement.
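The three metadata categories above can be pictured as separate namespaces inside one lookup service. The sketch below is a minimal in-memory illustration under stated assumptions: the class name `MetadataService`, the category strings, and attribute names such as `variable` and `protocol` are hypothetical and do not come from the paper or from any Globus API.

```python
from collections import defaultdict

# Hypothetical category names for the three metadata types on the slide.
CATEGORIES = ("application", "replica", "system")


class MetadataService:
    """Toy in-memory metadata service: category -> object name -> attributes."""

    def __init__(self):
        self._store = {c: defaultdict(dict) for c in CATEGORIES}

    def put(self, category, name, **attrs):
        # Reject unknown categories explicitly instead of silently storing them.
        if category not in self._store:
            raise ValueError(f"unknown metadata category: {category!r}")
        self._store[category][name].update(attrs)

    def get(self, category, name):
        # Return a copy so callers cannot mutate the stored record.
        return dict(self._store[category].get(name, {}))

    def find(self, category, **criteria):
        # Names whose attributes match every key/value pair in `criteria`.
        return sorted(
            name
            for name, attrs in self._store[category].items()
            if all(attrs.get(k) == v for k, v in criteria.items())
        )


md = MetadataService()
md.put("application", "Jan 1998", variable="CO2", year=1998)
md.put("replica", "Jan 1998", locations=["sprite.llnl.gov"])
md.put("system", "sprite.llnl.gov", protocol="ftp")
print(md.find("application", variable="CO2"))  # ['Jan 1998']
```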
Higher-Level Components
- Replica management
  - Maintains a repository, or catalog, of replicas.
  - Provides a set of services for registering files in the replica catalog, publishing files to locations, and adding or removing replicas at other locations.
  - Locates and selects replicas of files.
  - Uses the Replica Catalog and GridFTP.
- Replica selection and data filtering
  - A matchmaker selects replicas using properties such as transfer time and cost.
  - Transfer only the subset of data that is actually needed.
  - Exploit Grid-enabled servers.
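To illustrate how a matchmaker might trade time against cost, the sketch below ranks replicas by a weighted score of estimated transfer time and cost. This is not the matchmaking mechanism described in the paper; the `Replica` fields, the latency-plus-size-over-bandwidth time model, the weights, and the `example.org` hosts are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    host: str
    bandwidth_mb_s: float  # assumed achievable bandwidth to this site, MB/s
    latency_s: float       # assumed fixed start-up cost per transfer, seconds
    cost_per_mb: float     # hypothetical charge or priority cost per MB


def estimated_time(r: Replica, size_mb: float) -> float:
    # Simple model: start-up latency plus size divided by bandwidth.
    return r.latency_s + size_mb / r.bandwidth_mb_s


def select_replica(replicas, size_mb, time_weight=1.0, cost_weight=0.0, max_time=None):
    """Pick the replica with the lowest weighted time/cost score."""
    candidates = [
        r for r in replicas
        if max_time is None or estimated_time(r, size_mb) <= max_time
    ]
    if not candidates:
        raise LookupError("no replica satisfies the constraints")
    return min(
        candidates,
        key=lambda r: time_weight * estimated_time(r, size_mb)
        + cost_weight * r.cost_per_mb * size_mb,
    )


replicas = [
    Replica("a.example.org", bandwidth_mb_s=10.0, latency_s=0.5, cost_per_mb=0.02),
    Replica("b.example.org", bandwidth_mb_s=50.0, latency_s=2.0, cost_per_mb=0.10),
]
print(select_replica(replicas, 100).host)                   # b.example.org (4.0 s vs 10.5 s)
print(select_replica(replicas, 100, cost_weight=1.0).host)  # a.example.org (12.5 vs 14.0)
```

With time alone the faster site wins; once cost is weighted in, the cheaper but slower site is preferred. A real selector would take bandwidth and latency from measurements or forecasts rather than constants.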
Replica Management
Implementation Experiences
- LDAP implementation: the catalog is organized as an LDAP Directory Information Tree (DIT).
- Climate modeling: each time step or variable is mapped to a specific logical file, and the data are moved using URLs.
- Data visualization: a case study using FLASH.
- Result: LDAP entries bind to URLs, which are combined with server selection and metric matching.
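One way to realize the time-step-to-logical-file mapping is sketched below. It follows the naming pattern of the climate example in the replica catalog (collections such as "CO2 measurements 1998" with one logical file per month, such as "Jan 1998"), but the one-file-per-month granularity, the function names, and the DN attribute types `lf`, `lc`, and `rc` are assumptions for illustration, not the paper's actual DIT schema.

```python
from datetime import date

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def logical_names(variable: str, step: date) -> tuple:
    """Map a monthly time step of `variable` to (logical collection, logical file)."""
    collection = f"{variable} measurements {step.year}"
    logical_file = f"{MONTHS[step.month - 1]} {step.year}"
    return collection, logical_file


def escape_dn_value(value: str) -> str:
    # Simplified RFC 4514 escaping: backslash-escape the characters that are
    # always special in a DN value (leading/trailing spaces and '#' not handled).
    return "".join("\\" + ch if ch in '"+,;<>\\' else ch for ch in value)


def logical_file_dn(variable: str, step: date, catalog_base: str) -> str:
    # Hypothetical DIT layout: logical file under its collection under the catalog.
    collection, logical_file = logical_names(variable, step)
    return (f"lf={escape_dn_value(logical_file)},"
            f"lc={escape_dn_value(collection)},{catalog_base}")


print(logical_file_dn("CO2", date(1998, 1, 15), "rc=Climate,o=Example"))
# lf=Jan 1998,lc=CO2 measurements 1998,rc=Climate,o=Example
```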
Status of the Data Grid Implementation
- What does LDAP do? It stores attribute information, collects logical files into collections, and builds the catalog.
- API functions: create, delete, open, close, read, and write.
- A storage API provides uniform access over HTTP, FTP, and DPSS (Distributed Parallel Storage Server).
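The uniform storage API can be sketched as one abstract interface with a driver per protocol, chosen from the URL scheme. In this hypothetical Python version only a local `file://` driver is implemented; HTTP, FTP, and DPSS drivers would register in the same table. The class and function names are illustrative and are not the paper's API.

```python
import io
import os
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class StorageDriver(ABC):
    """Hypothetical protocol-neutral storage interface (one driver per scheme)."""

    @abstractmethod
    def create(self, path: str) -> None: ...

    @abstractmethod
    def delete(self, path: str) -> None: ...

    @abstractmethod
    def open(self, path: str, mode: str = "rb"):
        """Return a file-like handle supporting read(), write() and close()."""


class LocalFileDriver(StorageDriver):
    def create(self, path):
        io.open(path, "xb").close()  # fail if the file already exists

    def delete(self, path):
        os.remove(path)

    def open(self, path, mode="rb"):
        return io.open(path, mode)


# Drivers for "http", "ftp" or "dpss" would register here; none are implemented.
DRIVERS = {"file": LocalFileDriver()}


def _resolve(url):
    # Pick the driver from the URL scheme; the path is used as-is (no decoding).
    parts = urlparse(url)
    driver = DRIVERS.get(parts.scheme)
    if driver is None:
        raise ValueError(f"no storage driver for scheme {parts.scheme!r}")
    return driver, parts.path


def storage_create(url):
    driver, path = _resolve(url)
    driver.create(path)


def storage_delete(url):
    driver, path = _resolve(url)
    driver.delete(path)


def storage_open(url, mode="rb"):
    driver, path = _resolve(url)
    return driver.open(path, mode)
```

Callers never branch on protocol: supporting a new storage system would change only the `DRIVERS` table.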
Replica Management APIs
- globus_ftp_control provides access to low-level GridFTP control and data channel operations.
- globus_ftp_client provides typical GridFTP client operations.
- globus_gass_copy provides the ability to start and manage multiple data transfers using GridFTP, HTTP, local file, and memory operations. The globus-url-copy program is a thin wrapper around this API.
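globus_gass_copy and globus-url-copy are C interfaces; the Python sketch below only mirrors the idea of a protocol-independent, streaming URL-to-URL copy. It relies on the standard library's `urllib.request.urlopen`, which handles `file:`, `http(s):`, and plain `ftp:` sources but not `gsiftp:`, and it restricts destinations to local files. The name `url_copy` is hypothetical.

```python
import shutil
import urllib.request
from urllib.parse import urlparse


def url_copy(src_url: str, dst_url: str, chunk_size: int = 1 << 20) -> None:
    """Stream src_url into a local file:// destination, chunk_size bytes at a time."""
    dst = urlparse(dst_url)
    if dst.scheme != "file":
        raise ValueError("this sketch only writes to file:// destinations")
    # urlopen understands file:, http(s): and plain ftp: URLs, but not gsiftp:.
    with urllib.request.urlopen(src_url) as src, \
            open(urllib.request.url2pathname(dst.path), "wb") as out:
        shutil.copyfileobj(src, out, chunk_size)
```

Streaming in fixed-size chunks keeps memory use bounded regardless of file size, which matters for the large datasets the Data Grid targets.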
Replica Management APIs
- globus_replica_catalog provides basic Replica Catalog operations.
- globus_replica_management combines GridFTP and the Replica Catalog to manage replicated datasets.
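The combination that globus_replica_management provides (a transfer plus a catalog update) can be sketched as follows. The in-memory `ReplicaCatalog`, the `replicate` helper, and the injected `transfer` callable are hypothetical stand-ins, not the Globus API; the `example.org` hosts are placeholders.

```python
from collections import defaultdict
from typing import Callable, Dict


class ReplicaCatalog:
    """Toy replica catalog: logical file name -> {location: physical URL}."""

    def __init__(self):
        self._entries = defaultdict(dict)

    def register(self, lfn: str, location: str, url: str) -> None:
        self._entries[lfn][location] = url

    def unregister(self, lfn: str, location: str) -> None:
        replicas = self._entries.get(lfn, {})
        replicas.pop(location, None)
        if not replicas:
            self._entries.pop(lfn, None)  # drop logical files with no replicas left

    def locate(self, lfn: str) -> Dict[str, str]:
        return dict(self._entries.get(lfn, {}))


def replicate(catalog: ReplicaCatalog, lfn: str, src_location: str,
              dst_location: str, dst_url: str,
              transfer: Callable[[str, str], None]) -> None:
    """Copy an existing replica, then register the copy only if the transfer succeeded."""
    replicas = catalog.locate(lfn)
    if src_location not in replicas:
        raise KeyError(f"{lfn!r} has no replica at {src_location!r}")
    transfer(replicas[src_location], dst_url)  # e.g. a GridFTP transfer
    catalog.register(lfn, dst_location, dst_url)


copied = []
cat = ReplicaCatalog()
cat.register("Jan 1998", "a.example.org", "gsiftp://a.example.org/data/jan1998")
replicate(cat, "Jan 1998", "a.example.org", "b.example.org",
          "gsiftp://b.example.org/data/jan1998",
          transfer=lambda src, dst: copied.append((src, dst)))
print(sorted(cat.locate("Jan 1998")))  # ['a.example.org', 'b.example.org']
```

Registering only after the transfer returns keeps the catalog from advertising a replica that was never completely written.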
Thank you!
Replica Catalog Structure: A Climate Modeling Example
- Logical collections: "CO2 measurements 1998" and "CO2 measurements 1999".
- The 1998 collection lists Filename: Jan 1998, Filename: Feb 1998, …
- Logical file entries (Logical File Jan 1998, Logical File Feb 1998) point to their parent collection and carry attributes such as Size: 1468762.
- Location jupiter.isi.edu: Filename: Mar 1998, Jun 1998, Oct 1998; Protocol: gsiftp; UrlConstructor: gsiftp://jupiter.isi.edu/nfs/v6/climate
- Location sprite.llnl.gov: Filename: Jan 1998 … Dec 1998; Protocol: ftp; UrlConstructor: ftp://sprite.llnl.gov/pub/pcmdi
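The structure above implies that a physical URL is obtained by combining a location's UrlConstructor with a filename. The sketch below encodes the slide's two locations and resolves a logical filename to URLs. The `Location` class, the join rule, and percent-encoding the space in names such as "Mar 1998" are assumptions; the months the slide elides with "…" are deliberately left out.

```python
from dataclasses import dataclass, field
from urllib.parse import quote


@dataclass
class Location:
    host: str
    protocol: str
    url_constructor: str
    filenames: set = field(default_factory=set)

    def url_for(self, filename: str) -> str:
        # Assumption: physical URL = UrlConstructor + "/" + percent-encoded filename.
        return self.url_constructor.rstrip("/") + "/" + quote(filename)


# Values copied from the slide; months elided there with "…" are left out.
co2_1998 = [
    Location("jupiter.isi.edu", "gsiftp", "gsiftp://jupiter.isi.edu/nfs/v6/climate",
             {"Mar 1998", "Jun 1998", "Oct 1998"}),
    Location("sprite.llnl.gov", "ftp", "ftp://sprite.llnl.gov/pub/pcmdi",
             {"Jan 1998", "Dec 1998"}),
]


def urls_for(locations, filename):
    """All physical URLs for `filename` across the locations that hold it."""
    return sorted(loc.url_for(filename) for loc in locations if filename in loc.filenames)


print(urls_for(co2_1998, "Mar 1998"))
# ['gsiftp://jupiter.isi.edu/nfs/v6/climate/Mar%201998']
```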