UMIACS PAWN, LPE, and GRASP data grids Mike Smorul.

Slides:



Advertisements
Similar presentations
National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Data Grids for Collection Federation Reagan W. Moore University.
Advertisements

WP2: Data Management Gavin McCance University of Glasgow.
Data Management Expert Panel - WP2. WP2 Overview.
FP7-INFRA Enabling Grids for E-sciencE EGEE Induction Grid training for users, Institute of Physics Belgrade, Serbia Sep. 19, 2008.
ASCR Data Science Centers Infrastructure Demonstration S. Canon, N. Desai, M. Ernst, K. Kleese-Van Dam, G. Shipman, B. Tierney.
Data Grid: Storage Resource Broker Mike Smorul. SRB Overview Developed at San Diego Supercomputing Center. Provides the abstraction mechanisms needed.
Connect. Communicate. Collaborate Click to edit Master title style MODULE 1: perfSONAR TECHNICAL OVERVIEW.
Lightweight Preservation Environment Gary Jackson.
Chronopolis: Preserving Our Digital Heritage David Minor UC San Diego San Diego Supercomputer Center.
ADAPT An Approach to Digital Archiving and Preservation Technology Principal Investigator: Joseph JaJa Lead Programmers: Mike Smorul and Mike McGann Graduate.
PAWN: Producer-Archive Workflow Network University of Maryland Institute for Advanced Computer Studies Joseph Ja’Ja, Mike Smorul, Mike McGann.
May Archiving PAWN: A Policy-Driven Software Environment for Implementing Producer- Archive Interactions in Support of Long Term Digital.
Tools and Services for the Long Term Preservation and Access of Digital Archives Joseph JaJa, Mike Smorul, and Sangchul Song Institute for Advanced Computer.
Producer-Archive Workflow Network (PAWN) Goals Consistent with the Open Archival Information System (OAIS) model Use of web/grid technologies and platform.
Supporting Customized Archival Practices Using the Producer-Archive Workflow Network (PAWN) Mike Smorul, Mike McGann, Joseph JaJa.
Data Grids: Globus vs SRB. Maturity SRB  Older code base  Widely accepted across multiple communities  Core components are tightly integrated Globus.
Brief Overview of Major Enhancements to PAWN. Producer – Archive Workflow Network (PAWN) Distributed and secure ingestion of digital objects into the.
Robust Tools for Archiving and Preserving Digital Data Joseph JaJa, Mike Smorul, and Mike McGann Institute for Advanced Computer Studies Department of.
PAWN: A Novel Ingestion Workflow Technology for Digital Preservation
Tools and Services for the Long Term Preservation and Access of Digital Archives Joseph JaJa, Mike Smorul, and Sangchul Song Institute for Advanced Computer.
PAWN: Producer-Archive Workflow Network University of Maryland Institute for Advanced Computer Studies Joseph JaJa, Mike Smorul, Mike McGann.
Mike Smorul Saurabh Channan Digital Preservation and Archiving at the Institute for Advanced Computer Studies University of Maryland, College Park.
Data Grid: GRASP Mike Smorul. Grid Retrieval and Search Platform Based on concepts developed in the Earth Science Data Interface (ESDI) developed at the.
Robust Technologies for Automated Ingestion and Long-Term Preservation of Digital Information PI: Joseph JaJa Co-PIs: Allison Druin and Doug Oard Major.
The Open Grid Service Architecture (OGSA) Standard for Grid Computing Prepared by: Haoliang Robin Yu.
PAWN: A Novel Ingestion Workflow Technology for Digital Preservation Mike Smorul, Joseph JaJa, Yang Wang, and Fritz McCall.
Archival Prototypes and Lessons Learned Mike Smorul UMIACS.
Magda – Manager for grid-based data Wensheng Deng Physics Applications Software group Brookhaven National Laboratory.
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
San Diego Supercomputer CenterUniversity of California, San Diego Preservation Research Roadmap Reagan W. Moore San Diego Supercomputer Center
Data Management Kelly Clynes Caitlin Minteer. Agenda Globus Toolkit Basic Data Management Systems Overview of Data Management Data Movement Grid FTP Reliable.
A Metadata Catalog Service for Data Intensive Applications Presented by Chin-Yi Tsai.
DISTRIBUTED COMPUTING
Rule-Based Data Management Systems Reagan W. Moore Wayne Schroeder Mike Wan Arcot Rajasekar {moore, schroede, mwan, {moore, schroede, mwan,
1 School of Computer, National University of Defense Technology A Profile on the Grid Data Engine (GridDaEn) Xiao Nong
The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Dataset Caitlin Minteer & Kelly Clynes.
Miguel Branco CERN/University of Southampton Enabling provenance on large-scale e-Science applications.
Production Data Grids SRB - iRODS Storage Resource Broker Reagan W. Moore
CYBERINFRASTRUCTURE FOR THE GEOSCIENCES Data Replication Service Sandeep Chandra GEON Systems Group San Diego Supercomputer Center.
Virtual Data Grid Architecture Ewa Deelman, Ian Foster, Carl Kesselman, Miron Livny.
Part Four: The LSC DataGrid Part Four: LSC DataGrid A: Data Replication B: What is the LSC DataGrid? C: The LSCDataFind tool.
Grid Architecture William E. Johnston Lawrence Berkeley National Lab and NASA Ames Research Center (These slides are available at grid.lbl.gov/~wej/Grids)
National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Persistent Management of Distributed Data Reagan W. Moore.
Policy Based Data Management Data-Intensive Computing Distributed Collections Grid-Enabled Storage iRODS Reagan W. Moore 1.
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
CLRC and the European DataGrid Middleware Information and Monitoring Services The current information service is built on the hierarchical database OpenLDAP.
CGW 04, Stripped replication for the grid environment as a web service1 Stripped replication for the Grid environment as a web service Marek Ciglan, Ondrej.
The Global Land Cover Facility is sponsored by NASA and the University of Maryland.The GLCF is a founding member of the Federation of Earth Science Information.
OAIS Rathachai Chawuthai Information Management CSIM / AIT Issued document 1.0.
How to Implement an Institutional Repository: Part II A NASIG 2006 Pre-Conference May 4, 2006 Technical Issues.
DGC Paris WP2 Summary of Discussions and Plans Peter Z. Kunszt And the WP2 team.
Replica Management Kelly Clynes. Agenda Grid Computing Globus Toolkit What is Replica Management Replica Management in Globus Replica Management Catalog.
1 Overall Architectural Design of the Earth System Grid.
System/SDWG Update Management Council Face-to-Face Flagstaff, AZ August 22-23, 2011 Sean Hardman.
Super Computing 2000 DOE SCIENCE ON THE GRID Storage Resource Management For the Earth Science Grid Scientific Data Management Research Group NERSC, LBNL.
Rights Management for Shared Collections Storage Resource Broker Reagan W. Moore
Physical Oceanography Distributed Active Archive Center THUANG June 9-13, 20089th GHRSST-PP Science Team Meeting GHRSST GDAC and EOSDIS PO.DAAC.
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
Collection-Based Persistent Archives Arcot Rajasekar, Richard Marciano, Reagan Moore San Diego Supercomputer Center Presented by: Preetham A Gowda.
Preservation Data Services Persistent Archive Research Group Reagan W. Moore October 1, 2003.
The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How) gLite Grid Introduction Salma Saber Electronic.
Data Grids, Digital Libraries and Persistent Archives: An Integrated Approach to Publishing, Sharing and Archiving Data. Written By: R. Moore, A. Rajasekar,
PAWN: Producer-Archive Workflow Network
The Open Grid Service Architecture (OGSA) Standard for Grid Computing
Policy-Based Data Management integrated Rule Oriented Data System
Joseph JaJa, Mike Smorul, and Sangchul Song
GSAF Grid Storage Access Framework
Implementing an Institutional Repository: Part II
How to Implement an Institutional Repository: Part II
Presentation transcript:

UMIACS PAWN, LPE, and GRASP data grids Mike Smorul

Overall Principles (PAWN) Distributed, secure ingestion Use of web/grid technologies – platform independent Minimal client-side requirements Ease of integration with data grid systems. Designed to satisfy data integrity requirements of scientific collections and digital preservation

Distributed Ingestion (PAWN)

Each Producer registers and arranges files locally prior to transport. Multiple distributed archival receiving stations. X.509 based authentication between sites. Independent Certificate Authorities at each Producer. Data Grid is geographically distributed with multiple entrance points

Ingestion Workflow (PAWN) 1. Negotiate Submission Agreement. 2. Workflow Initialization and Submission Information Packet (SIP) creation. 3. Transfer of SIPs to Data Grid site. 4. Validation of SIP transfer 5. Organization of data into collections and transfer into Data Grid.

Component Overview (PAWN)

Communication & Security (PAWN) Web service communication between all components using SOAP. All remote calls are authenticated and encrypted. Components are scalable by using standard web server load balancing technology. Secure communication with Data Grid using GSI.

The Lightweight Preservation Environment is an archival system based on a modular design, primarily consisting of existing components. The current implementation relies mostly on Globus technologies. Primarily, we’ve focused on wrapping logic around those components. Lightweight Preservation Environment (LPE)

Globus Components We Use (LPE) Grid Security Infrastructure (GSI) for authentication Open Grid Services Infrastructure (OGSI) for middleware Metadata Catalog Service (MCS) for descriptive metadata Replica Location Service (RLS) for locating files GridFTP for file access

Data Manager (DM): Organizes data and queries between the user and the other components Policy Manager (PM): Ensures that a minimum number of copies exist for any given file Transformation Manager (TM): Executes specific transformations on a named file on a given storage node and returns the results Developed Components (LPE)

We have developed several interfaces to the LPE, for demonstration purposes Command Line Perl CGI Servlet Interfaces (LPE)

All components use GSI LPE Organization

SRB Grid Testbed (original) Modified the SRB to hold spatial data Contributed Informix port of the SRB (v3.2) Linked three ESIP sites Tested replication between GMU and UMD. Remote registration at UNH

Grid Retrieval and Search Platform (GRASP) Based on concepts developed in the Earth Science Data Interface (ESDI) developed at the UMIACS GLCF. Provides a graphical interface into data grid holdings. Access to entire GLCF holdings through the Storage Resource Broker(SRB)

GRASP Architecture

GRASP uses a data grid as an abstract storage repository. Metadata in the grid is mined from the grid itself or from external sources and published into a browsable form. Data grids may allow for platform independent metadata, but may not be optimal for access

GRASP Screenshot

Grid Holdings Registered GLCF holdings Over 338,000 registered files 4.4Tb total size Granular permissions on registered holdings No need to move all data into grid, registered pre-existing holdings in place.