The Data Grid: Towards an architecture for Distributed Management

Presentation transcript:

The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets
Shiung Jiun-Kuei

Outline
1. Introduction
2. Data Grid Design
3. Core Data Grid Services
4. Higher-Level Components
5. Implementation
National Kaohsiung University of Applied Sciences, Electrical Engineering


Introduction
Problems faced: huge storage capacity requirements; data and users separated by geography; analysis of large scientific datasets.
Resolution: supercomputers; the Data Grid.
Earlier work: the digital library community; Storage Resource Broker (SRB); High Performance Storage System (HPSS).

Data Grid Design
Mechanism neutrality: support heterogeneous systems.
Policy neutrality: performance decisions are exposed to the users.
Compatibility with Grid infrastructure: use existing Grid tools (e.g. authentication, resource management, and information services).
Uniformity of information infrastructure: uniform and convenient access to information.

Data Grid Design (cont.)
Policy implementation
High performance

Core Data Grid Services
Storage systems: operate on file instances; file instance names are organized as a hierarchical directory.
Data access: support remote control of file operations; detect and report errors.
Metadata service, by data type: application metadata; replica metadata; system configuration metadata.
Other services: authorization, reservation, co-allocation, performance measurement.

Higher-Level Components
Replica management: maintains a repository or catalog. Provides a set of services for registering files in the replica catalog, publishing files to locations, and adding or removing replicas at other locations. Locates and selects replicas of files. Uses the Replica Catalog and GridFTP.
Replica selection and data filtering: a matchmaker uses properties such as time or cost. Retrieves only the needed subset of the data. Exploits Grid-enabled servers.
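The replica selection step on this slide can be illustrated with a minimal Python sketch. It is not the Globus matchmaker. The `Replica` fields, the transfer-time formula, and the `max_cost` filter are all assumptions made for illustration, and the URLs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One physical copy of a logical file (all fields are illustrative)."""
    url: str
    bandwidth_mbps: float  # assumed estimate of throughput to the client
    latency_ms: float      # assumed estimate of round-trip latency
    cost: float            # abstract access cost, in arbitrary units


def estimated_transfer_seconds(replica, size_mb):
    """Estimate transfer time as latency plus size divided by bandwidth."""
    return replica.latency_ms / 1000.0 + (size_mb * 8.0) / replica.bandwidth_mbps


def select_replica(replicas, size_mb, max_cost=None):
    """Return the replica with the lowest estimated transfer time.

    Replicas whose cost exceeds max_cost are filtered out first.
    None is returned when no replica qualifies.
    """
    candidates = [r for r in replicas if max_cost is None or r.cost <= max_cost]
    if not candidates:
        return None
    return min(candidates, key=lambda r: estimated_transfer_seconds(r, size_mb))


# Hypothetical replicas of the same logical file at two sites.
FAST_SITE = Replica("gsiftp://site-a.example.org/data/file1", 100.0, 50.0, 5.0)
CHEAP_SITE = Replica("ftp://site-b.example.org/data/file1", 10.0, 10.0, 1.0)
```

With these sample values, `select_replica([FAST_SITE, CHEAP_SITE], size_mb=100)` prefers the faster site, while adding `max_cost=2` restricts the choice to the cheaper one. A real matchmaker would draw these properties from the metadata and information services rather than from hard-coded values.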

Replica management

Implementation Experiences
LDAP implementation: Directory Information Tree (DIT).
Climate modeling: a time step or variable maps to a specific logical file; the data is moved using a URL.
Data visualization: the FLASH case.
Result? LDAP URL binding, combined with server selection and metric matching.


Status of the Data Grid Implementation
What does LDAP do? It stores attribute information, collects logical files, and builds the catalog.
API functions: create, delete, open, close, read, and write.
Storage API for uniform access: HTTP, FTP, DPSS (Distributed-Parallel Storage System).
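The idea of a storage API for uniform access, with the six operations named on this slide, might be sketched in Python as follows. This is not the Globus storage API. The class names, the handle-based signatures, and the in-memory backend are hypothetical and stand in for real HTTP, FTP, or DPSS backends.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Uniform interface over one storage system (hypothetical names)."""

    @abstractmethod
    def create(self, name): ...

    @abstractmethod
    def delete(self, name): ...

    @abstractmethod
    def open(self, name): ...

    @abstractmethod
    def close(self, handle): ...

    @abstractmethod
    def read(self, handle, size=-1): ...

    @abstractmethod
    def write(self, handle, data): ...


class InMemoryBackend(StorageBackend):
    """Toy backend that keeps file contents in a dictionary."""

    def __init__(self):
        self._files = {}     # name -> bytearray with the file contents
        self._handles = {}   # handle -> [name, current position]
        self._next_handle = 1

    def create(self, name):
        if name in self._files:
            raise FileExistsError(name)
        self._files[name] = bytearray()

    def delete(self, name):
        if name not in self._files:
            raise FileNotFoundError(name)
        del self._files[name]

    def open(self, name):
        if name not in self._files:
            raise FileNotFoundError(name)
        handle = self._next_handle
        self._next_handle += 1
        self._handles[handle] = [name, 0]
        return handle

    def close(self, handle):
        self._handles.pop(handle)

    def read(self, handle, size=-1):
        name, pos = self._handles[handle]
        data = self._files[name]
        end = len(data) if size < 0 else min(len(data), pos + size)
        self._handles[handle][1] = end  # advance the read position
        return bytes(data[pos:end])

    def write(self, handle, data):
        name, pos = self._handles[handle]
        self._files[name][pos:pos + len(data)] = data
        self._handles[handle][1] = pos + len(data)
        return len(data)
```

Client code written against `StorageBackend` would not need to change when a different backend is substituted, which is the point of a uniform access layer.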

Replica Management APIs
globus_ftp_control provides access to low-level GridFTP control and data channel operations.
globus_ftp_client provides typical GridFTP client operations.
globus_gass_copy provides the ability to start and manage multiple data transfers using GridFTP, HTTP, local file, and memory operations. The globus-url-copy program is a thin wrapper around this API.

Replica Management APIs (cont.)
globus_replica_catalog provides basic Replica Catalog operations.
globus_replica_management combines GridFTP and the Replica Catalog to manage replicated datasets.

Thank you!

Replica Catalog Structure: A Climate Modeling Example
Logical Collection: CO2 measurements 1998
  Filename: Jan 1998
  Filename: Feb 1998
  …
  Location: jupiter.isi.edu
    Filename: Mar 1998
    Filename: Jun 1998
    Filename: Oct 1998
    Protocol: gsiftp
    UrlConstructor: gsiftp://jupiter.isi.edu/nfs/v6/climate
  Location: sprite.llnl.gov
    Filename: Jan 1998
    …
    Filename: Dec 1998
    Protocol: ftp
    UrlConstructor: ftp://sprite.llnl.gov/pub/pcmdi
  Logical File Parent
    Logical File: Jan 1998
    Logical File: Feb 1998
    Size: 1468762
Logical Collection: CO2 measurements 1999
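The catalog structure on this slide can be modelled with a small Python sketch. It is not the `globus_replica_catalog` API. The class names and the rule for building a physical URL (UrlConstructor, then a slash, then the percent-encoded filename) are assumptions for illustration. Only filenames that appear on the slide are entered; the entries the slide elides are left out.

```python
from dataclasses import dataclass, field
from urllib.parse import quote


@dataclass
class Location:
    """One storage site holding part of a logical collection."""
    host: str
    protocol: str
    url_constructor: str
    filenames: set = field(default_factory=set)

    def url_for(self, filename):
        # Assumed rule: append the percent-encoded filename to the UrlConstructor.
        return self.url_constructor.rstrip("/") + "/" + quote(filename)


@dataclass
class LogicalCollection:
    """A named collection of logical files and the locations that hold them."""
    name: str
    locations: list = field(default_factory=list)

    def locate(self, filename):
        """Return a physical URL for every location that holds the file."""
        return [loc.url_for(filename)
                for loc in self.locations
                if filename in loc.filenames]


# Entries copied from the slide; filenames the slide elides are not filled in.
CO2_1998 = LogicalCollection(
    name="CO2 measurements 1998",
    locations=[
        Location("jupiter.isi.edu", "gsiftp",
                 "gsiftp://jupiter.isi.edu/nfs/v6/climate",
                 {"Mar 1998", "Jun 1998", "Oct 1998"}),
        Location("sprite.llnl.gov", "ftp",
                 "ftp://sprite.llnl.gov/pub/pcmdi",
                 {"Jan 1998", "Dec 1998"}),
    ],
)
```

Calling `CO2_1998.locate("Mar 1998")` walks the locations and returns one URL per site that lists the file. A replica selection step would then choose among those URLs.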