Preservation Data Services
Persistent Archive Research Group
Reagan W. Moore
October 1, 2003
OGSA cross WG discussion template

Outline
Requirements
Key concepts/functionality
Architecture/Model (if any)
Services/portTypes (if any)
Relation with invited groups

Requirements
Need variety of interfaces
  – Support access through archivist-selected API: C library, C++ library, Shell command, Perl, Python, Windows DLL, Mac DLL, Windows browser, Web browsers, OAI, WSDL, Linux I/O redirection, Java, GridFTP
Manage consistency between context (state information resulting from services) and content (digital entities)
Support transformative migrations between data types
  – DFDL-based description of component structure
  – METS-based description of compound document
Manage authenticity
  – Digital signatures, audit trails, collection-owned data, procedures for validating digital entity/collection
Support persistent archive
  – Logical name space for infrastructure-independent name
  – Manage technology evolution (standards, encoding format, software)
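
The authenticity requirement above (signatures, audit trails, a validation procedure) can be illustrated with a minimal sketch. It is not the archive's actual mechanism: the class `AuditedEntity` and its methods are hypothetical names, and a keyed HMAC-SHA256 digest stands in for a real digital signature scheme.

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditedEntity:
    """Hypothetical digital entity (content) with authenticity metadata (context)."""
    logical_name: str
    content: bytes
    signature: str = ""
    audit_trail: list = field(default_factory=list)

    def _log(self, user: str, action: str) -> None:
        # Each audit record is (UTC timestamp, user, action).
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_trail.append((stamp, user, action))

    def sign(self, key: bytes, user: str) -> None:
        # Assumption: a keyed digest stands in for a digital signature.
        self.signature = hmac.new(key, self.content, hashlib.sha256).hexdigest()
        self._log(user, "sign")

    def validate(self, key: bytes, user: str) -> bool:
        # Recompute the digest and compare it in constant time.
        expected = hmac.new(key, self.content, hashlib.sha256).hexdigest()
        ok = hmac.compare_digest(expected, self.signature)
        self._log(user, "validate:ok" if ok else "validate:failed")
        return ok
```

Every operation, including a failed validation, appends to the audit trail, so the context records what was done to the content and by whom.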

Key concepts
Automation of all archival processes
  – Logical name spaces for data, resources, users, applications
      Four identifiers: Unique handle, Descriptive metadata, Logical name, Physical file name
  – Build a persistent service that survives across technology evolution
  – Support collection-owned data, specify roles for each user and access controls on each digital entity
  – Manage logical name space as a collection hierarchy, and manage consistency of state information mapped onto the logical name space …
  – Provide bulk operations (registration, load, unload, metadata update, …)
Archival processes to generate archival context
  – Build upon a standard set of operations that can be performed on the metadata, collection, data, and storage systems
  – Collection tokens that define restricted semantics
  – Operations for interacting with catalogs in a database, digital entities in a storage repository
  – Bulk operations to improve performance
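
As a rough illustration of the four identifiers and the collection hierarchy, the sketch below keeps a record per digital entity and lets the physical file name change while the handle and logical name stay fixed. All names (`DigitalEntity`, `LogicalNameSpace`, `migrate`, and so on) are hypothetical and chosen only for this example.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DigitalEntity:
    handle: str                # unique handle
    logical_name: str          # infrastructure-independent name
    physical_file_name: str    # current location in a storage repository
    metadata: Dict[str, str] = field(default_factory=dict)  # descriptive metadata


class LogicalNameSpace:
    """Hypothetical catalog organised as a collection hierarchy of path-like names."""

    def __init__(self) -> None:
        self._by_name: Dict[str, DigitalEntity] = {}

    def register(self, logical_name: str, physical_file_name: str,
                 **metadata: str) -> DigitalEntity:
        if logical_name in self._by_name:
            raise ValueError(f"already registered: {logical_name}")
        entity = DigitalEntity(uuid.uuid4().hex, logical_name,
                               physical_file_name, dict(metadata))
        self._by_name[logical_name] = entity
        return entity

    def lookup(self, logical_name: str) -> DigitalEntity:
        return self._by_name[logical_name]

    def list_collection(self, collection: str) -> List[str]:
        # A collection is every logical name under the given path prefix.
        prefix = collection.rstrip("/") + "/"
        return sorted(n for n in self._by_name if n.startswith(prefix))

    def find(self, **conditions: str) -> List[str]:
        # Attribute-based discovery over descriptive metadata.
        return sorted(
            e.logical_name for e in self._by_name.values()
            if all(e.metadata.get(k) == v for k, v in conditions.items())
        )

    def migrate(self, logical_name: str, new_physical_file_name: str) -> None:
        # Technology evolution: only the physical file name changes.
        self._by_name[logical_name].physical_file_name = new_physical_file_name
```

Because clients address entities by logical name or metadata, moving data to new storage touches a single field of the catalog record.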

Federated Client Server
[Figure: diagram of federated SRB servers. Recoverable labels: Application; Read; Logical Name or Attribute Condition; SRB server; SRB agent; MCAT; Peer-to-peer Brokering; Server(s) Spawning; Data Access; Parallel Data Access; R1, R2.]
Numbered items shown in the figure:
1. Logical-to-physical mapping
2. Identification of replicas
3. Access & audit control
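
The three numbered functions in the figure can be sketched as a single catalog lookup followed by a read that falls back across replicas. This is an illustrative guess at the flow, not SRB code: `Catalog`, `resolve`, and `read` are hypothetical names, and a plain dictionary stands in for the storage repositories.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Catalog:
    """Hypothetical metadata catalog; not the MCAT schema."""
    replicas: Dict[str, List[str]] = field(default_factory=dict)   # logical -> physical names
    readers: Dict[str, Set[str]] = field(default_factory=dict)     # logical -> allowed users
    audit: List[Tuple[str, str, str]] = field(default_factory=list)

    def resolve(self, user: str, logical_name: str) -> List[str]:
        # 3. Access and audit control: every request is recorded.
        allowed = user in self.readers.get(logical_name, set())
        self.audit.append((user, logical_name, "granted" if allowed else "denied"))
        if not allowed:
            raise PermissionError(f"{user} may not read {logical_name}")
        # 1. Logical-to-physical mapping and 2. identification of replicas.
        return list(self.replicas.get(logical_name, []))


def read(catalog: Catalog, storage: Dict[str, bytes],
         user: str, logical_name: str) -> bytes:
    """Return the first replica that is reachable; a missing key simulates an outage."""
    for physical_name in catalog.resolve(user, logical_name):
        data: Optional[bytes] = storage.get(physical_name)
        if data is not None:
            return data
    raise FileNotFoundError(logical_name)
```

The application only ever supplies a user and a logical name; which replica actually serves the bytes is decided behind the catalog.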

Modular Architecture (Add new APIs, new Storage Repositories, new Information Repositories)
[Figure: layered architecture diagram; recoverable labels are listed below.]
Access APIs
  – C, C++, Libraries
  – Shell / Perl / Python
  – Java, NT Browsers
  – OAI
  – WSDL
  – GridFTP
  – http
  – Linux I/O
  – Mac DLL / Windows DLL
Servers
  – MCAT Enabled Server
  – Consistency Management / Authorization-Authentication
  – Logical Name Space
  – Latency Management
  – Data Transport
  – Metadata Transport
Storage Abstraction
  – Archives: HPSS, ADSM, UniTree, DMF
  – Databases: DB2, Oracle, SQLserver, Postgres, mySQL
  – File Systems: Unix, NT, Mac OSX
  – Application: HRM, ORB
Catalog Abstraction
  – Databases: DB2, Oracle, Sybase, Postgres, mySQL
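
One way to picture the storage abstraction is a small driver interface that each new repository type implements, so that server code never depends on a particular back end. The sketch below is an assumption about how such a layer might look; `StorageDriver`, `MemoryDriver`, `FileSystemDriver`, and `copy_entity` are hypothetical names, not part of the system described.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict


class StorageDriver(ABC):
    """Hypothetical minimal interface every storage repository would implement."""

    @abstractmethod
    def put(self, name: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, name: str) -> bytes: ...


class MemoryDriver(StorageDriver):
    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, name: str, data: bytes) -> None:
        self._blobs[name] = data

    def get(self, name: str) -> bytes:
        return self._blobs[name]


class FileSystemDriver(StorageDriver):
    def __init__(self, root: str) -> None:
        self.root = root

    def put(self, name: str, data: bytes) -> None:
        with open(os.path.join(self.root, name), "wb") as f:
            f.write(data)

    def get(self, name: str) -> bytes:
        with open(os.path.join(self.root, name), "rb") as f:
            return f.read()


def copy_entity(name: str, source: StorageDriver, target: StorageDriver) -> None:
    # Works for any pair of drivers, including ones added later.
    target.put(name, source.get(name))


def demo() -> bytes:
    """Move one entity from memory to a temporary directory and read it back."""
    memory = MemoryDriver()
    memory.put("report.txt", b"archived bytes")
    with tempfile.TemporaryDirectory() as root:
        disk = FileSystemDriver(root)
        copy_entity("report.txt", memory, disk)
        return disk.get("report.txt")
```

Adding a new storage repository then means writing one more driver class; `copy_entity` and any other code written against `StorageDriver` stays unchanged.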

Basic Interaction Mechanisms
Access mechanisms that require remote operations
  – Byte level access
  – Latency management mechanisms
  – Object oriented access
  – Heterogeneous system access (database operations, ORB operations, HRM operations)
  – See “Recommendation for Standard Operations at Remote Sites”
An access mechanism is any operation that may require interaction between manipulation and transport
  – Example - streaming of partial results as process is executed
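
The streaming example can be sketched with a generator, where manipulation (filtering or transforming records) and transport (handing batches to the caller) are interleaved instead of run one after the other. The function name `stream_partial_results` and its parameters are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def stream_partial_results(records: Iterable[T],
                           manipulate: Callable[[T], Optional[R]],
                           batch_size: int = 2) -> Iterator[List[R]]:
    """Yield batches of results while the manipulation is still running.

    `manipulate` returns None for records that should be dropped.
    """
    batch: List[R] = []
    for record in records:
        result = manipulate(record)
        if result is not None:
            batch.append(result)
        if len(batch) >= batch_size:
            yield batch          # transport a partial result immediately
            batch = []
    if batch:
        yield batch              # flush whatever remains at the end
```

A caller can start consuming the first batch before the last record has been processed, which is one simple form of latency management.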

Consistency between Context and Content
Consistent management of state information generated by services. Examples:
  – Management of replicas
  – Synchronization of replicas
  – Aggregation of data in containers
  – Write locks on containers
  – Replication of containers
  – Authenticity metadata - audit trails
Consistency on bulk operations
  – Roll-back on partial completion
  – Synchronization across storage repository outages
  – Load leveling vs. fault tolerance vs. replication
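
Roll-back on partial completion can be shown with a minimal sketch of a bulk registration that either adds every entry or leaves the catalog exactly as it was. A plain dictionary stands in for the catalog, and `bulk_register` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def bulk_register(catalog: Dict[str, str],
                  entries: List[Tuple[str, str]]) -> None:
    """Register (logical name, physical name) pairs as one all-or-nothing operation."""
    added: List[str] = []
    try:
        for logical_name, physical_name in entries:
            if logical_name in catalog:
                raise ValueError(f"already registered: {logical_name}")
            catalog[logical_name] = physical_name
            added.append(logical_name)
    except Exception:
        # Partial completion: undo the entries added so far, newest first.
        for logical_name in reversed(added):
            del catalog[logical_name]
        raise
```

If the second of two entries fails, the first is removed again before the error is reported, so context and content never disagree about which entities exist.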

GGF / Standards Interactions
Data Format Description Language - digital ontology for describing structure
Data Transport - remote operations vs transport
Grid File System - remote operations vs consistency
Grid Protocol Architecture - consistency between context and content
Semantic Web: OWL - Web Ontology Language
Digital Library Federation: METS - Metadata Encoding and Transmission Standard
NSF Digital Library Initiative: OAI - Open Archives Initiative
NASA/NARA: OAIS - Open Archival Information System