Collection-Based Persistent Archives Arcot Rajasekar, Richard Marciano, Reagan Moore San Diego Supercomputer Center Presented by: Preetham A Gowda.

Presentation transcript:

Collection-Based Persistent Archives Arcot Rajasekar, Richard Marciano, Reagan Moore San Diego Supercomputer Center Presented by: Preetham A Gowda

Slide 2 (CS791 - Spr05): Introduction
- At present, archival storage systems support the storage of data but provide little or no support for managing the information needed to interpret or discover archived data.
- As the volume of data grows, so does the need for an efficient infrastructure that can provide automated means of information management, querying and access.

Slide 3: Data Collections
- There is a strong relationship between digital objects and other data sets from the same discipline.
- Data collections are created by identifying common attributes.
- The common attributes then serve as meta-data for the data collection.
- The collections are organized as an object-oriented database (OODB) or a relational database (RDB).
- In the RDB case, the schema consists of the common attributes, information about how the collection has been organized, …, and the information required to federate two collections.
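The idea of deriving a collection schema from common attributes can be illustrated with a minimal Python sketch. The function names, the dictionary layout and the sample objects are all hypothetical and are not taken from the paper. The sketch only shows attributes shared by every object becoming the columns of a relational-style collection.

```python
from typing import Any


def common_attributes(objects: list[dict[str, Any]]) -> set[str]:
    """Return the attribute names shared by every digital object."""
    if not objects:
        return set()
    shared = set(objects[0])
    for obj in objects[1:]:
        shared &= set(obj)
    return shared


def build_collection(name: str, objects: list[dict[str, Any]]) -> dict[str, Any]:
    """Use the shared attributes as the schema; each object becomes one row."""
    schema = sorted(common_attributes(objects))
    rows = [tuple(obj[attr] for attr in schema) for obj in objects]
    return {"name": name, "schema": schema, "rows": rows}


# Hypothetical digital objects; only attributes present on both survive.
images = [
    {"id": "img-001", "format": "TIFF", "instrument": "telescope-A", "exposure_s": 30},
    {"id": "img-002", "format": "TIFF", "instrument": "telescope-B"},
]
collection = build_collection("sky-survey", images)
print(collection["schema"])
```

Attributes that appear on only some objects (here `exposure_s`) are left out of the schema in this sketch. A real collection would more likely keep them as optional, object-specific meta-data.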

Slide 4: Persistent Collections
Persistent collections are based on the concept that both the original digital objects and the information required to assemble them into a data collection must be archived.
- Archive digital objects as members of the data collection.
- Dynamically build the data collection from the individual data objects stored in the archive.
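A rough sketch of this concept, assuming the archive is a simple key-value store: the collection description is archived next to the objects, so the collection can be rebuilt later from the archive alone. The storage layout, the `collection.json` name and both function names are illustrative assumptions, not the format used by the authors.

```python
import json

# Stand-in for an archival storage system: path -> bytes (an assumption).
archive: dict[str, bytes] = {}


def archive_collection(name: str, schema: list[str],
                       members: dict[str, tuple[bytes, dict]]) -> None:
    """Archive every object plus the description needed to reassemble them."""
    for object_id, (payload, _attrs) in members.items():
        archive[f"{name}/objects/{object_id}"] = payload
    description = {
        "name": name,
        "schema": schema,
        "members": {oid: attrs for oid, (_payload, attrs) in members.items()},
    }
    archive[f"{name}/collection.json"] = json.dumps(description).encode("utf-8")


def rebuild_collection(name: str) -> list[dict]:
    """Dynamically rebuild a discovery index using only archived content."""
    description = json.loads(archive[f"{name}/collection.json"].decode("utf-8"))
    index = []
    for object_id, attrs in description["members"].items():
        payload = archive[f"{name}/objects/{object_id}"]
        row = {"id": object_id, "size": len(payload)}
        row.update({key: attrs[key] for key in description["schema"]})
        index.append(row)
    return index


archive_collection("demo", ["format"], {"a": (b"xyz", {"format": "TIFF"})})
print(rebuild_collection("demo"))
```

Because the description travels with the objects, the rebuild step does not depend on the database that originally held the collection.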

Slide 5
Integrating collections requires the ability to interpret how a collection is organized and the ability to dynamically build an information discovery interface into the new collection.
- Persistent collection: integration of two collections in time (the same collection instantiated on two different sets of technology)
- Federated collection: integration of two collections in space

Slide 6: Information Architecture
The technologies available for building an information infrastructure are:
- Archives: to manage data sets distributed across tertiary storage systems
- Databases: to organize information about the data sets
- Data-handling systems: to provide APIs for access to the data collections
- Digital libraries: to provide services for manipulating and presenting the data collections
Integrating these technologies leads to a collection-based persistent archive.
The National Partnership for Advanced Computational Infrastructure is developing an information infrastructure architecture to support the creation of scientific data collections using these technologies. As a first step towards this goal, an information infrastructure called DICE is being set up at the San Diego Supercomputer Center.

Slide 7: "Data Intensive Computing Environment" (DICE)
- A general digital library system is currently being set up for ingesting, managing, archiving, and accessing several collections of scientific data.
- It will contain documents, images, field-generated data and simulation results for disciplines ranging from astronomy and earth systems science to social science, ecology, and neuroscience.
- Information in the archived digital libraries should be available through the web as well as through APIs for processing on supercomputing platforms.
- The system should provide a means of interaction between the disciplines and their collections. This requires a meta-data catalog for schema-level attributes such as discipline-specific ontologies and semantics.
- Because the data stored in the system evolves quickly, the plan is to migrate the ontologies forward in time.
- Not only the digital objects must be migrated forward, but also the methods and procedures needed to access them.
- The aim is to go beyond storing preservation-level meta-data for the objects and also consider preservation-level meta-data for methods and APIs.

Slide 8: DICE (Contd.)
DICE is built around a Meta-Data Catalog (MCAT) developed at SDSC. MCAT is a repository that handles three levels of meta-data:
- Digital object meta-data about type, formats, lineage (creation characteristics), ingestion protocols, usage methods, and domain-specific data set attributes; typically created for every data collection in order to support information discovery.
- System-level meta-data about audit trails, authentication, access control, and replication and partitioning of data sets; used to provide location transparency, access transparency and protocol transparency.
- Schema-level meta-data including ontology information; used to provide a way to migrate the collection to new technology and to federate data collections.

Slide 9: MCAT Architecture
- Application-dependent meta-data, which provides information specific to particular data sets and their collections (e.g., Dublin Core values for text objects)
- System-level meta-data, which provides operational information, including information about resources, users, methods and data objects
- Schema-level meta-data
The first two types of meta-data are extensible. MCAT provides APIs for creating, modifying and deleting the above structures.
The MCAT is a database-based catalog that provides a repository of meta-information about digital objects.
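To make the three meta-data levels and the create/modify/delete operations more concrete, here is a small in-memory Python sketch. It is not the MCAT API: the class names, method names and attribute keys are invented for illustration, and a real catalog would sit on top of a database.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Meta-data kept for one digital object (hypothetical layout)."""
    application: dict = field(default_factory=dict)  # e.g. Dublin Core values
    system: dict = field(default_factory=dict)       # e.g. resource, owner


class MetadataCatalog:
    """Toy catalog: one schema-level record plus extensible per-object entries."""

    def __init__(self, schema: dict):
        self.schema = schema  # schema-level meta-data for the whole collection
        self._entries: dict[str, CatalogEntry] = {}

    def create(self, name: str, application: dict, system: dict) -> None:
        if name in self._entries:
            raise KeyError(f"{name} is already registered")
        self._entries[name] = CatalogEntry(dict(application), dict(system))

    def modify(self, name: str, level: str, key: str, value) -> None:
        if level not in ("application", "system"):
            raise ValueError("level must be 'application' or 'system'")
        getattr(self._entries[name], level)[key] = value

    def delete(self, name: str) -> None:
        del self._entries[name]

    def find(self, **conditions) -> list[str]:
        """Attribute-based discovery over application-level meta-data."""
        return sorted(
            name for name, entry in self._entries.items()
            if all(entry.application.get(k) == v for k, v in conditions.items())
        )


catalog = MetadataCatalog(schema={"attribute_clusters": ["dublin_core"]})
catalog.create("report.txt", {"creator": "Moore", "format": "text"},
               {"resource": "tape-archive"})
catalog.create("notes.txt", {"creator": "Rajasekar", "format": "text"},
               {"resource": "disk-cache"})
catalog.modify("notes.txt", "system", "resource", "tape-archive")
text_objects = catalog.find(format="text")
catalog.delete("report.txt")
remaining = catalog.find(format="text")
```

Keeping application and system meta-data in open dictionaries mirrors the statement that those two types are extensible, while the schema-level record is fixed per collection in this sketch.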

Slide 10: Schema-level meta-data
This includes:
- Logical structure
- Attribute clusters
- Token attributes
- Linkages

Slide 11: MCAT Architecture (Contd.)
Figure from original paper

Slide 12: Data collection federation
Figure from original paper

Slide 13: Storage Resource Broker
- Provides a uniform API for access to heterogeneous archival storage systems.
- Deals with federation of storage sites and replication of data objects.
- The MCAT information catalog systems play a vital role in publishing authenticated information, and in storing and disseminating the information.

Slide 14: SRB-MCAT System
The SRB-MCAT system is a data integration environment that provides:
- uniform access APIs across heterogeneous file systems, databases, and archival storage
- protocol transparency and location transparency when accessing distributed systems
- a uniform name space abstraction over the file systems that are being brokered
- meta-data-based access to files
- facilities for replication, copying or moving files across heterogeneous systems
- resource-level operations (proxy operations) performed on data before delivery to the client
- an integrated encryption and authentication system that can range from no security to fully encrypted and fully authenticated data transfer, including protection against man-in-the-middle attacks

Slide 15
Users need uniform access to diverse storage resources in a heterogeneous computing environment because:
- The data sets under consideration can be very large, making it appropriate to store them directly in archival tape systems.
- The data sets may be too numerous to be stored in a single file system.
- The number of data sets may grow, with many of them being used only sparsely after some initial period of time.

Slide 16
- The SDSC Storage Resource Broker is the middleware that provides distributed clients with uniform access to diverse storage resources in a heterogeneous computing environment.
- The SRB presents clients with a logical view of data sets stored in the SRB.
- Similar to the file name in the file system paradigm, each data set stored in SRB has a logical name, which may be used as a handle for data operations.
Figure from original paper
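The logical-name idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SRB client API: `StorageDriver`, `InMemoryDriver`, `Broker` and the resource names are hypothetical, and the in-memory driver merely stands in for a real file system, database or tape archive.

```python
from abc import ABC, abstractmethod
from itertools import count


class StorageDriver(ABC):
    """Uniform interface every brokered storage system must implement."""

    @abstractmethod
    def put(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, path: str) -> bytes: ...


class InMemoryDriver(StorageDriver):
    """Stand-in for a real file system, database, or tape archive."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, path: str, data: bytes) -> None:
        self._blobs[path] = data

    def get(self, path: str) -> bytes:
        return self._blobs[path]


class Broker:
    """Maps logical names to (resource, physical path) pairs."""

    def __init__(self, drivers: dict[str, StorageDriver]) -> None:
        self._drivers = drivers
        self._locations: dict[str, tuple[str, str]] = {}
        self._ids = count()

    def write(self, logical_name: str, data: bytes, resource: str) -> None:
        physical_path = f"/{resource}/{next(self._ids):06d}"
        self._drivers[resource].put(physical_path, data)
        self._locations[logical_name] = (resource, physical_path)

    def read(self, logical_name: str) -> bytes:
        # The client never sees which resource or path holds the data.
        resource, physical_path = self._locations[logical_name]
        return self._drivers[resource].get(physical_path)


broker = Broker({"unix-fs": InMemoryDriver(), "tape": InMemoryDriver()})
broker.write("/home/demo/small.txt", b"hello", resource="unix-fs")
broker.write("/home/demo/big.dat", b"\x00" * 1024, resource="tape")
print(broker.read("/home/demo/small.txt"))
```

The client code reads both data sets the same way, which is the location and protocol transparency the slides describe. Only `write` names a resource in this sketch.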

Slide 17: Collections in SRB
Data sets in the SRB are grouped into a logical (hierarchical) structure called collections. The collection provides an abstraction for:
- placing similar objects (possibly physically distributed) under one collection (e.g., image collections of a museum)
- placing dissimilar objects that have a common connection under one abstraction (e.g., all the text paragraphs, images, figures, and tables of a document)

Slide 18: Data Replication in SRB
An object can be replicated in two ways:
- during object creation or modification, by using Logical Storage Resources
- afterwards, by using the off-line replication facility to replicate an existing data set
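The two replication paths can be contrasted with a short Python sketch. It assumes a logical storage resource is simply a named group of physical resources, so writing to the group stores a copy on every member. The `ReplicaManager` class, its methods and the resource names are hypothetical and do not reflect actual SRB commands.

```python
class ReplicaManager:
    """Toy model of replication on creation and off-line replication."""

    def __init__(self, resources: dict[str, dict[str, bytes]],
                 logical_resources: dict[str, list[str]]) -> None:
        self.resources = resources                  # physical resource -> blobs
        self.logical_resources = logical_resources  # logical resource -> members
        self.replicas: dict[str, list[str]] = {}    # object -> resources with a copy

    def create(self, name: str, data: bytes, logical_resource: str) -> None:
        """Replicate at creation time: write to every member of the group."""
        members = self.logical_resources[logical_resource]
        for resource in members:
            self.resources[resource][name] = data
        self.replicas[name] = list(members)

    def replicate(self, name: str, target: str) -> None:
        """Off-line replication: copy an existing object to one more resource."""
        if target in self.replicas[name]:
            return
        source = self.replicas[name][0]
        self.resources[target][name] = self.resources[source][name]
        self.replicas[name].append(target)

    def read(self, name: str, unavailable: frozenset = frozenset()) -> bytes:
        """Return the first replica that is not on an unavailable resource."""
        for resource in self.replicas[name]:
            if resource not in unavailable:
                return self.resources[resource][name]
        raise LookupError(f"no reachable replica of {name}")


manager = ReplicaManager(
    resources={"disk-sdsc": {}, "tape-sdsc": {}, "disk-remote": {}},
    logical_resources={"sdsc-pair": ["disk-sdsc", "tape-sdsc"]},
)
manager.create("survey/img-001", b"pixels", "sdsc-pair")
manager.replicate("survey/img-001", "disk-remote")
```

The `unavailable` argument of `read` is only there to show why replicas matter: the object stays readable even if both original resources are offline.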

Slide 19
- SRB provides a facility for resource-side proxy operations.
- SRB also provides authentication and encryption facilities, access control lists and ticket-based access, and auditing capabilities.

Slide 20: SRB Process Model
Figure from original paper

Slide 21: Summary
- SRB-MCAT supports the federation of data objects.
- It provides the infrastructure to support a collection-based persistent archive distributed across multiple sites.

Slide 22: Thank You
Some of the phrases and lines of text used in this presentation are direct excerpts from the original paper.