The Data Grid: Towards an architecture for Distributed Management


1 The Data Grid: Towards an Architecture for Distributed Management and Analysis of Large Scientific Datasets
Shiung Jiun-Kuei

2 Outline
1. Introduction
2. Data Grid Design
3. Core Data Grid Services
4. Higher-Level Components
5. Implementation
National Kaohsiung University of Applied Sciences, Electrical Engineering

4 Introduction
Problems faced: huge storage capacity requirements; users and resources constrained by geography; analysis of large scientific datasets
Resolution: supercomputers; the Data Grid
Earlier work: digital library community; Storage Resource Broker (SRB); High Performance Storage System (HPSS)

5 Data Grid Design
Mechanism Neutrality: support heterogeneous systems
Policy Neutrality: decisions that affect performance are exposed to the users
Compatibility with Grid Infrastructure: use existing Grid tools (e.g. authentication, resource management, and information services)
Uniformity of Information Infrastructure: uniform and convenient access to information about resources

6 Data Grid Design (cont.)
Policy implementation
High performance

7 Core Data Grid Services
Storage Systems: operate on file instances; names form a hierarchical directory
Data Access: supports control of remote files; detects and reports errors
Metadata Service, by data type: application metadata; replica metadata; system configuration metadata
Other: authorization, reservation, co-allocation, performance measurement

8 Higher-Level Components
Replica management
Maintains a repository or catalog.
A set of services for registering files in the replica catalog, publishing files to locations, and adding/removing replicas at other locations.
Locates and selects replicas of files.
Uses the Replica Catalog and GridFTP.
Replica Selection & Data Filtering
A matchmaker uses properties such as time or cost.
Retrieve only the needed subset of the data.
Exploit Grid-enabled servers.
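The replica management and selection steps above can be sketched in a few lines of Python. This is a minimal illustration and not the Globus implementation: the `Replica` fields, the `ReplicaCatalog` class, the `select_replica` rule (cheapest-enough, then fastest), and the `example.org` hosts are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Replica:
    """One physical copy of a logical file (fields are illustrative)."""
    url: str
    est_transfer_seconds: float  # hypothetical "time" property
    cost: float                  # hypothetical "cost" property


class ReplicaCatalog:
    """Toy in-memory catalog mapping logical file names to replicas."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[Replica]] = {}

    def register(self, logical_name: str, replica: Replica) -> None:
        """Add a replica of a logical file at some location."""
        self._entries.setdefault(logical_name, []).append(replica)

    def remove(self, logical_name: str, url: str) -> None:
        """Drop the replica with the given URL, if present."""
        kept = [r for r in self._entries.get(logical_name, []) if r.url != url]
        self._entries[logical_name] = kept

    def locate(self, logical_name: str) -> List[Replica]:
        """Return all known replicas of a logical file."""
        return list(self._entries.get(logical_name, []))


def select_replica(replicas: List[Replica], max_cost: float) -> Optional[Replica]:
    """Assumed matchmaking rule: among replicas within the cost limit,
    pick the one with the shortest estimated transfer time."""
    eligible = [r for r in replicas if r.cost <= max_cost]
    if not eligible:
        return None
    return min(eligible, key=lambda r: r.est_transfer_seconds)


catalog = ReplicaCatalog()
catalog.register("jan-1998", Replica("gsiftp://site-a.example.org/climate/jan-1998", 12.0, 5.0))
catalog.register("jan-1998", Replica("ftp://site-b.example.org/pub/jan-1998", 40.0, 1.0))

best = select_replica(catalog.locate("jan-1998"), max_cost=10.0)
print(best.url if best else "no replica satisfies the request")
```

A real matchmaker would take its time and cost estimates from Grid information services rather than from fixed fields, but the shape of the decision is the same: look up candidates in the catalog, filter by the requester's constraints, then rank.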

9 Replica management

10 Implementation Experiences
LDAP implementation: Directory Information Tree (DIT)
Climate Modeling: a time step or variable maps to a specific logical file; data is moved using URLs
Data Visualization: case study with FLASH
Result? LDAP binds URLs, combined with server selection and metric matching

12 Status of the Data Grid Implementation
What does LDAP do? It stores attribute information, collects logical files, and builds the catalog.
API functions: create, delete, open, close, read, and write
Storage API for uniform access: HTTP, FTP, DPSS (DPSS = Distributed Parallel Storage System)
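The idea of a storage API with uniform access can be sketched as one interface with the six operations named above, with the backend chosen by URL scheme. This is a hedged illustration only: `StorageBackend`, `InMemoryBackend`, `StorageClient`, and the `mem` scheme are hypothetical names, and the in-memory backend stands in for real HTTP, FTP, or DPSS drivers.

```python
from abc import ABC, abstractmethod
from typing import Dict, List
from urllib.parse import urlparse


class StorageBackend(ABC):
    """The six operations from the slide, as an abstract interface."""

    @abstractmethod
    def create(self, name: str) -> None: ...
    @abstractmethod
    def delete(self, name: str) -> None: ...
    @abstractmethod
    def open(self, name: str) -> int: ...
    @abstractmethod
    def close(self, handle: int) -> None: ...
    @abstractmethod
    def read(self, handle: int, size: int) -> bytes: ...
    @abstractmethod
    def write(self, handle: int, data: bytes) -> int: ...


class InMemoryBackend(StorageBackend):
    """Stand-in backend; a real one would speak HTTP, FTP, or DPSS."""

    def __init__(self) -> None:
        self._files: Dict[str, bytearray] = {}
        self._handles: Dict[int, List] = {}  # handle -> [name, position]
        self._next_handle = 0

    def create(self, name: str) -> None:
        if name in self._files:
            raise FileExistsError(name)
        self._files[name] = bytearray()

    def delete(self, name: str) -> None:
        if name not in self._files:
            raise FileNotFoundError(name)
        del self._files[name]

    def open(self, name: str) -> int:
        if name not in self._files:
            raise FileNotFoundError(name)
        handle = self._next_handle
        self._next_handle += 1
        self._handles[handle] = [name, 0]
        return handle

    def close(self, handle: int) -> None:
        del self._handles[handle]

    def read(self, handle: int, size: int) -> bytes:
        name, pos = self._handles[handle]
        data = bytes(self._files[name][pos:pos + size])
        self._handles[handle][1] = pos + len(data)
        return data

    def write(self, handle: int, data: bytes) -> int:
        name, pos = self._handles[handle]
        # Overwrite in place, extending the file when writing at the end.
        self._files[name][pos:pos + len(data)] = data
        self._handles[handle][1] = pos + len(data)
        return len(data)


class StorageClient:
    """Picks a backend from the URL scheme so callers use one API."""

    def __init__(self, backends: Dict[str, StorageBackend]) -> None:
        self._backends = backends

    def backend_for(self, url: str) -> StorageBackend:
        scheme = urlparse(url).scheme
        if scheme not in self._backends:
            raise ValueError(f"no backend registered for scheme {scheme!r}")
        return self._backends[scheme]


client = StorageClient({"mem": InMemoryBackend()})
backend = client.backend_for("mem://scratch/run1.dat")
backend.create("run1.dat")
h = backend.open("run1.dat")
backend.write(h, b"timestep-0")
backend.close(h)
```

Calling code sees only `create`, `delete`, `open`, `close`, `read`, and `write`; supporting another protocol means registering one more backend under its scheme.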

13 Replica Management APIs
globus_ftp_control: provides access to low-level GridFTP control and data channel operations.
globus_ftp_client: provides typical GridFTP client operations.
globus_gass_copy: provides the ability to start and manage multiple data transfers using GridFTP, HTTP, local file, and memory operations. The globus-url-copy program is a thin wrapper around this API.

14 Replica Management APIs
globus_replica_catalog: provides basic Replica Catalog operations.
globus_replica_management: combines GridFTP and the Replica Catalog to manage replicated datasets.

15 Thank you!

16 Replica Catalog Structure: A Climate Modeling Example
Logical Collection: CO2 measurements 1998
  Filenames: Jan 1998, Feb 1998
  Location: jupiter.isi.edu
    Filenames: Mar 1998, Jun 1998, Oct 1998
    Protocol: gsiftp
    UrlConstructor: gsiftp://jupiter.isi.edu/nfs/v6/climate
  Location: sprite.llnl.gov
    Filenames: Jan 1998, Dec 1998
    Protocol: ftp
    UrlConstructor: ftp://sprite.llnl.gov/pub/pcmdi
  Logical File Parent
    Logical File: Jan 1998
    Logical File: Feb 1998
    Size:
Logical Collection: CO2 measurements 1999
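The catalog structure in this example can be modeled directly as nested records. The sketch below uses the hostnames, protocols, UrlConstructor values, and filenames from the slide; the class names and the rule that a physical URL is the UrlConstructor followed by `/` and the percent-encoded filename are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List
from urllib.parse import quote


@dataclass
class Location:
    """A storage site holding some files of a logical collection."""
    hostname: str
    protocol: str
    url_constructor: str
    filenames: List[str] = field(default_factory=list)

    def url_for(self, filename: str) -> str:
        # Assumed rule: UrlConstructor + "/" + percent-encoded filename.
        if filename not in self.filenames:
            raise KeyError(f"{filename!r} is not stored at {self.hostname}")
        return self.url_constructor.rstrip("/") + "/" + quote(filename)


@dataclass
class LogicalCollection:
    """A named group of logical files and the locations that hold them."""
    name: str
    locations: List[Location] = field(default_factory=list)

    def urls_for(self, filename: str) -> List[str]:
        """All physical URLs at which the logical file can be found."""
        return [loc.url_for(filename)
                for loc in self.locations
                if filename in loc.filenames]


co2_1998 = LogicalCollection(
    name="CO2 measurements 1998",
    locations=[
        Location(
            hostname="jupiter.isi.edu",
            protocol="gsiftp",
            url_constructor="gsiftp://jupiter.isi.edu/nfs/v6/climate",
            filenames=["Mar 1998", "Jun 1998", "Oct 1998"],
        ),
        Location(
            hostname="sprite.llnl.gov",
            protocol="ftp",
            url_constructor="ftp://sprite.llnl.gov/pub/pcmdi",
            filenames=["Jan 1998", "Dec 1998"],
        ),
    ],
)

for url in co2_1998.urls_for("Jan 1998"):
    print(url)
```

A logical file that no location lists simply yields an empty result, which is the signal a replica selection step would use to report that no copy is available.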

