2 iRODS federates major collections
From Ken Arnold, SHAMAN project
A unified Web interface for browsing or searching. The user sees a single hierarchy.
Flickr: file system /flickr/commons/, using the flickr API, a RESTful web API
YouTube: media accessible through API
New service: mountable file system (Hulu, photobucket, etc.)
Each /flickr/commons/Institution “folder” translates to the result of one or two calls to the flickr API, presented to iRODS as if it were a file system.
For a collection to integrate, it would need to have some remote API that we could write a driver for, and one or more ways to map that collection into a tree.
Each mountable service is made into a resource with all relevant info (location, resource type, etc.).
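The idea of presenting a remote API as a directory tree can be sketched roughly as follows. This is only an illustration, not the actual iRODS driver: the class `MountedService`, the `FAKE_API` table, and its call names are hypothetical, and the institution and photo names are made-up sample data standing in for real API responses.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a remote web API. A real driver would issue
# HTTP requests to the service; here each "call" just returns sample names.
FAKE_API: Dict[str, Callable[..., List[str]]] = {
    "list_institutions": lambda: ["Library of Congress", "Smithsonian"],
    "list_photos": lambda institution: [
        f"{institution}-photo-{i}.jpg" for i in range(1, 3)
    ],
}


class MountedService:
    """Presents a remote collection as a directory tree (illustrative only)."""

    def __init__(self, mount_point: str, api: Dict[str, Callable[..., List[str]]]):
        self.mount_point = mount_point.rstrip("/")
        self.api = api

    def listdir(self, path: str) -> List[str]:
        path = path.rstrip("/")
        inside = path == self.mount_point or path.startswith(self.mount_point + "/")
        if not inside:
            raise FileNotFoundError(path)
        # Path components below the mount point decide which API call to make.
        parts = [p for p in path[len(self.mount_point):].split("/") if p]
        if not parts:
            # Top level: one API call lists the institution "folders".
            return self.api["list_institutions"]()
        if len(parts) == 1:
            # One level down: one API call lists that institution's items.
            return self.api["list_photos"](parts[0])
        raise NotADirectoryError(path)


svc = MountedService("/flickr/commons", FAKE_API)
print(svc.listdir("/flickr/commons"))
print(svc.listdir("/flickr/commons/Smithsonian"))
```

Each directory listing costs one call to the remote service, which matches the "one or two calls" per folder described on the slide.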
3 User With Client Views & Manages Data
iRODS shows a unified “Virtual Collection”; the user sees a single “Virtual Collection”.
My Data: disk, tape, database, filesystem, etc.
Partner’s Data: remote disk, tape, filesystem, etc.
The iRODS Data System can install in a “layer” over existing or new data, letting you view, manage, and share part or all of diverse data in a unified Collection.
4 Accessing Data in the iRODS System
User with iRODS client searches the catalog to find and get data.
“I need data!” “Finds the data.” “Gets data to user.”
iRODS Data System:
iRODS Metadata Catalog: keeps track of data
Data Server: disk, tape, database, filesystem, etc.
Users can search for, access, add/extract metadata, annotate, analyze & process, replicate, copy, share data, manage & track access, subscribe, and more.
5 Overview of iRODS Components
User Interface: Web or GUI client to access and manage data & metadata*
iRODS Server: data on disk
iRODS Metadata Catalog: database; tracks state of data
iRODS Rule Engine: implements policies
*Access data with: Web-based browser, iRODS GUI, command-line clients, DSpace, Fedora, Kepler workflow, WebDAV, user-level file system, etc.

About iRODS and DICE
The Data Intensive Cyber Environments (DICE) group leads core development of the open source iRODS (Integrated Rule-Oriented Data System). With more than a decade of award-winning research that harnesses the power of cybertechnologies for managing, sharing, publishing, and preserving digital data, the group is based at the School of Information and Library Science and the Renaissance Computing Institute (RENCI) at the University of North Carolina at Chapel Hill, and the Institute for Neural Computation at the University of California, San Diego. Development of the core iRODS data grid system is funded by the National Science Foundation and the National Archives and Records Administration, with a growing open source iRODS community participating in development worldwide, based in the nonprofit Data Intensive Cyberinfrastructure Foundation. For more information see
6 "Layers" in iRODS: From Users to Storage
Policies: express goals for data access, sharing, preservation, etc.
Community: decides how to manage shared Collection(s)
Administrator/User: applies Rules
Rules: implement Policies in computer-actionable form
Micro-services: operate on remote data
iRODS Server: executes Micro-services
7 Under the hood - a glimpse
Diagram: iRODS servers, each with a Rule Engine, at NC State, Duke, and Chapel Hill, plus a Metadata Catalog (DB).
User asks for data (using logical properties)
Data request goes to 1st server
Server looks up information in catalog
Catalog reports that a 2nd federated server has the data
1st server asks 2nd server for data
2nd server applies Rules and serves data
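The request flow above can be sketched in a few lines. This is a toy model under stated assumptions, not iRODS code: the `Catalog` and `Server` classes, the `allow` callback standing in for the rule engine, and the sample path and user names are all hypothetical.

```python
class Catalog:
    """Maps logical names to the server holding the data (catalog stand-in)."""

    def __init__(self):
        self._location = {}

    def register(self, logical_name, server_name):
        self._location[logical_name] = server_name

    def locate(self, logical_name):
        return self._location[logical_name]


class Server:
    """A server with local storage and a rule hook (rule engine stand-in)."""

    def __init__(self, name, catalog, allow=lambda user, logical_name: True):
        self.name = name
        self.catalog = catalog
        self.allow = allow
        self.storage = {}
        self.peers = {}

    def put(self, logical_name, data):
        self.storage[logical_name] = data
        self.catalog.register(logical_name, self.name)

    def request(self, user, logical_name):
        # The receiving server looks up the catalog, then asks the holder.
        holder = self.catalog.locate(logical_name)
        target = self if holder == self.name else self.peers[holder]
        return target.serve(user, logical_name)

    def serve(self, user, logical_name):
        # The server that holds the data applies its own rules before serving.
        if not self.allow(user, logical_name):
            raise PermissionError(f"{user} may not read {logical_name}")
        return self.storage[logical_name]


catalog = Catalog()
first = Server("first", catalog)
second = Server("second", catalog, allow=lambda user, logical_name: user == "alice")
first.peers["second"] = second
second.put("/zone/report.txt", b"report")

print(first.request("alice", "/zone/report.txt"))
```

The user only ever talks to the first server and names the data logically; which server physically holds it is resolved through the catalog.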
8 Policies in iRODS
Policies: express community goals for data access and sharing, management, long-term preservation, uses, etc.
Policy Examples
Run a particular workflow when a “set of files” is ingested into a collection (e.g. make thumbnails of images, post to website).
Automatically replicate a file added to a collection into 3 geographically distributed sites.
Automatically extract metadata for a file of a certain type and store in metadata catalog.
Periodically check integrity of files in a Collection and repair/replace if needed/possible.
Automatically pick a certain storage location based on user or collection or size or type.
Let a user access a collection only if using certificate-based login.
Send a notification when a certain file is ingested.
etc.
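Two of the example policies (replicate on ingest, notify on ingest) can be pictured as functions triggered by an event. The sketch below is plain Python, not the iRODS rule language; the `on` decorator, the `fire` function, the site names, and the dictionary used as a data object are all hypothetical.

```python
from collections import defaultdict

# Event name -> list of rules (plain functions) to run when it fires.
RULES = defaultdict(list)

# Hypothetical names for three geographically distributed sites.
SITES = ["site-a", "site-b", "site-c"]


def on(event):
    """Register the decorated function as a rule for the given event."""
    def register(rule):
        RULES[event].append(rule)
        return rule
    return register


@on("ingest")
def replicate_to_three_sites(obj):
    # Policy: replicate a newly added file to 3 distributed sites.
    obj["replicas"] = list(SITES)


@on("ingest")
def notify(obj):
    # Policy: send a notification when a file is ingested.
    obj.setdefault("notifications", []).append(f"ingested {obj['name']}")


def fire(event, obj):
    """Run every rule registered for the event, in registration order."""
    for rule in RULES[event]:
        rule(obj)
    return obj


result = fire("ingest", {"name": "scan-001.tif"})
print(result["replicas"], result["notifications"])
```

Keeping each policy as a separate rule lets a community add or remove one without touching the others.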
9 Policies, Services, Interoperability, Mashups: Richard Marciano, SILS
11 e-Legacy Demo
Workflow diagram labels: Appraisal; Subscribe to RSS; Review Received Entry; Share and Tag; Description; Arrangement; Preservation; Meet Preservation Criteria; Yes; Preserve to iRODS.
12 National Library of France: Distributed Archiving & Preservation System (SPAR)
13 BNF: French National Library
Three rules:
Import
  Import an input document into iRODS
  Add import date and checksum as AVU-triplet metadata
  Replicate to other resources
Get
  Locate a copy of the record
  Return if physical checksum .eq. stored checksum
  If not, delete replica, copy a good one over it
Audit
  Locate all replicas of a data object
  Compute a physical checksum using system’s MD5
  Compare the result with the checksum stored in user metadata
  All stale copies are removed and then replicated from another good copy
  When all copies are audited, a clean copy is staged onto a specific FS directory
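The three rules can be sketched with in-memory replicas and Python's `hashlib`. This is a simplified illustration, not the BNF implementation: the `Record` class, the function names, and the resource names are hypothetical, replicas are held as byte strings, and the final staging step of the audit is omitted.

```python
import hashlib
from datetime import date


def md5(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


class Record:
    def __init__(self):
        self.replicas = {}  # resource name -> content
        self.metadata = {}  # attribute -> value (stand-in for AVU metadata)


def import_record(data: bytes, resources):
    """Import: store checksum and import date, then replicate to all resources."""
    rec = Record()
    rec.metadata["md5"] = md5(data)
    rec.metadata["import_date"] = date.today().isoformat()
    for resource in resources:
        rec.replicas[resource] = data
    return rec


def good_copy(rec: Record) -> bytes:
    """Return the content of any replica whose checksum matches the stored one."""
    for data in rec.replicas.values():
        if md5(data) == rec.metadata["md5"]:
            return data
    raise IOError("no intact replica left")


def get_record(rec: Record, resource: str) -> bytes:
    """Get: return the replica if intact, else overwrite it with a good copy."""
    if md5(rec.replicas[resource]) != rec.metadata["md5"]:
        rec.replicas[resource] = good_copy(rec)
    return rec.replicas[resource]


def audit(rec: Record):
    """Audit: check every replica, repair stale ones, return their names."""
    stale = [r for r, d in rec.replicas.items() if md5(d) != rec.metadata["md5"]]
    if stale:
        fresh = good_copy(rec)
        for resource in stale:
            rec.replicas[resource] = fresh
    return stale


rec = import_record(b"page scan", ["res1", "res2", "res3"])
rec.replicas["res2"] = b"corrupted"
print(audit(rec))
```

Because the reference checksum lives in metadata rather than next to any one replica, any single intact copy is enough to repair all the others.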
15 BNF: French National Library
Micro-Services
Add metadata to an iRODS object
Import an object into iRODS, compute MD5 checksum and validate against the supplied one. Once validated, add MD5SUM and import date as metadata. If invalid, content is removed from iRODS
Return the value of an iRODS object metadata attribute
Prepare to retrieve a metadata attribute for a resource
Prepare to retrieve a metadata attribute for an object
Get the input resources belonging to a zone name
Get iCAT results regarding location info for a record
Execute MD5SUM on the physical content and return value
Return a pseudo random string of specified length
Delete a stale replica and replicate over it from another fresh copy
Stale replica replacement can be eager (synchronous execution) or lazy (delayed execution)
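The difference between eager and lazy replacement of a stale replica can be shown with a small queue of deferred repairs. This is a sketch only: `replace_stale`, `run_pending`, and the in-memory `replicas` dictionary are hypothetical and do not reflect how the micro-services schedule delayed execution.

```python
from collections import deque

# Repairs waiting for delayed execution.
pending = deque()


def replace_stale(replicas, stale, fresh, lazy=False):
    """Overwrite the stale replica from a fresh one, now or later."""
    def repair():
        replicas[stale] = replicas[fresh]

    if lazy:
        pending.append(repair)  # lazy: queue the repair for later
    else:
        repair()                # eager: repair synchronously


def run_pending():
    """Execute all queued repairs and return how many were run."""
    count = 0
    while pending:
        pending.popleft()()
        count += 1
    return count


replicas = {"res1": b"good", "res2": b"stale"}
replace_stale(replicas, "res2", "res1", lazy=True)
print(replicas["res2"])   # still the stale content
run_pending()
print(replicas["res2"])   # now repaired
```

Eager repair keeps every read consistent at the cost of slower requests; lazy repair returns quickly and fixes the replica in the background.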
20 RENCI Federated Data Projects Leesa Brieger, RENCI
21 RENCI VO Data Grid
Diagram: iRODS servers at Duke, NCSU, ECU, UNC-A, UNC-CH, and RENCI (Europa Center), with a Metadata Catalog (iCAT) database.
Client asks for data
Data request goes to iRODS server
Server looks up information in iCAT
iCAT tells which iRODS server has data
Data is retrieved from physical location and delivered to client
22 Federation of Seven Independent Data Grids
National Archives and Records Administration Transcontinental Persistent Archive Prototype (TPAP)
Sites, each with its own iCAT: NARA II, Georgia Tech, Rocket Center, NARA I, UNC, UMD, UCSD.
Extensible environment: can federate with additional research and education sites.
Each data grid uses different vendor products.
24 TUCASI Infrastructure Project (TIP) Goals
Leverage data resources for competitive research and leadership
Support research and education efforts in a wide range of disciplines and domains
National leadership in next-generation data management
Model for long term campus storage
Architecture and design; hardware, software
Operations and support
Data policies
Selection and retention
Ingest, curation and preservation
Collections and repository management
25 A Test: Classroom content on a DICE/RENCI data grid
Panopto, Elluminate
27 Goals
Make integration simple by creating a clear, familiar service API.
Make IRODS a familiar, easy-to-use resource for mid-tier Java developers.
Develop a REST/SOAP service model for common use cases using mature tools.
Create an out-of-the-box web interface that makes IRODS easy for administrators and archivists.
28 Currently...
Jargon is a pure-Java API that talks to IRODS over Java sockets.
Jargon is fairly low-level and can be tricky at first.
Used in multiple projects, including a WebDAV interface, as well as integration with the Fedora repository via the irodsfedora library.
29 Jargon (next...)
Jargon-core: Jargon refactored. High-level service API, POJOs, Spring-friendly. Emphasis on testability.
Jargon-akubra: implementation of an Akubra module for IRODS via Jargon.
Jargon-lingo: application of mature open-source tools over Jargon-core to provide RESTful, SOAP, and Web interfaces to IRODS.
30 Conceptual Diagram
Diagram labels: IRODS Service Model; SOAP/REST; Web; DuraSpace; custom code (Java, Groovy, Jython, JRuby, etc.); Frameworks; Jargon-lingo; Jargon-akubra; Jargon-core; IRODS Grid.