National Science Foundation Cooperative Agreement: OCI-0940841.

DFC Vision
Enable collaborative research
– Sharing of data, information, and knowledge
Build national data cyberinfrastructure
– Federation of existing data management systems
Support reproducible data-driven research
– Encapsulate knowledge in shared workflows
Enable student participation in research
– Policy-controlled access to “live” data
Compute Resources – HPC centers, institutional clusters
DFC Collaboration Environment – Data Grid
Community Resources – Repository, Catalog

Data Driven Science and Engineering Collaboration Environments
– Oceanography – Ocean Observatory Initiative
  Archiving climatic data records from real-time sensor data streams
– Engineering – CIBER-U
  Engineering Digital Library: Curating civil engineering data, materials data, archaeology data, student training materials
– Hydrology – EarthCube
  Automating hydrology research workflows (data retrieval, transformation, analysis)
Engineering Representation

Collaboration Environments
Plant Biology
– Data sharing
– Resource federation
Cognitive Science
– IRB policies
– Archive
Social Science
– Dataverse federation
– Management policies

National Infrastructure
Research Environment - Portals, Applications, Workflows
DFC Collaboration Environment – Data Grid middleware
Community Resource: Data Repository, Information Catalog, Web Service
Existing infrastructure: XSEDE, OOI, TDLC, iPlant, CUAHSI, NCDC, GeoBrain, DataONE, NCSA Polyglot

Interoperability Mechanisms Information Collection Registration Information Exchange Soft Links Message Queue Information Manipulation Database Query Policies control execution of each interoperability mechanism Data Data Access Data Manipulation Micro-services Storage Driver Knowledge Knowledge Creation Analysis Workflows Knowledge Management Procedures : Micro-services

DataNet Interoperability
Research Environment - Portals, Applications, Workflows
DFC Collaboration Environment
Diagram labels: Message Queue, Web Service, DataONE Member Node, DataONE Coordinating Node, TerraPop Server, SEAD Portal (VIVO), SEAD Engagement Center, SEAD Data, DFC Data Grid (shown four times)

DFC Interoperability Layers
Authentication: PAM / GSSAPI (InCommon, GSI, Kerberos, Shibboleth, LDAP)
Workflows: Micro-Services (Kepler, NCSA Cyberintegrator, Taverna, NCSA Polyglot)
Data Manipulation: Format Drivers (NetCDF, HDF5, THREDDS, ERDDAP)
Networks: Network Drivers (HTTPS, TCP/IP, Parallel TCP/IP, RBUDP)
Data Access: Micro-Services (DataONE, Data Conservancy, CUAHSI, NCDC)
Clients: OpenSocial (Web browsers, Web Services, Workflows, FUSE, Synchronization, MediaWiki)
Vocabulary: Micro-Services (HIVE, (Cheshire))
Messaging: Micro-Services (AMQP, iRODS Xmsg)
Management: Policies ((RDA Policies), (ISO Criteria))
Storage Access: Storage Drivers (File Systems, Tape Archives, Object Stores, Cloud Storage)

Interoperability Mechanisms
Drivers – middleware executed at remote site
– Encapsulate knowledge to support partial I/O, parsing of formats, manipulation of data structures
– Authentication, format, storage
Micro-services – procedures executed in collaboration environment
– Encapsulate knowledge needed to interact with an external system or with a data set
– Data access, external workflows, semantics, messaging
Policies – control mechanisms executed at both locations
– Encapsulate knowledge needed for management functions
– Federation control, administrative tasks, validation checks

Policy Based Data Management
Integrated Rule Oriented Data System – iRODS
Reagan W. Moore (DICE-UNC)
Arcot Rajasekar (DICE-UNC)

iRODS
Integrated Rule Oriented Data System
– DICE group – Reagan Moore
– Concepts – Arcot Rajasekar
– Architect – Mike Wan (retired)
– Security / metadata / production – Wayne Schroeder
– Rule engine – Hao Xu
– User interface (Java) – Mike Conway
– Applications – Antoine de Torcy (moved to RENCI)
– Administration – Sheau-Yen Chen
iRODS Consortium
– RENCI / Max Planck / DICE Center / DDN

Policy-Based Data Environments
Purpose – Reason a collection is assembled
Properties – Attributes needed to ensure the purpose
Policies – Controls for enforcing desired properties
– mapped to computer actionable rules
Procedures – Functions that implement the policies
– mapped to computer executable workflows
Persistent state information – Results of applying the procedures
– mapped to system metadata
Property verification – Validation that state information conforms to the desired purpose
– mapped to periodically executed policies
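The chain on this slide (purpose, properties, policies, procedures, persistent state, verification) can be illustrated with a minimal Python sketch. It is not iRODS code: the class names `Policy` and `Collection`, the helper functions, and the choice of SHA-256 as the integrity check are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, dict]  # persistent state information, keyed by object name


@dataclass
class Policy:
    """A control that enforces one desired property (hypothetical model)."""
    name: str
    procedure: Callable[[State], None]  # function that implements the policy
    verify: Callable[[State], bool]     # periodic check of the property


@dataclass
class Collection:
    purpose: str
    state: State = field(default_factory=dict)
    policies: List[Policy] = field(default_factory=list)

    def ingest(self, name: str, data: bytes) -> None:
        # Applying each procedure updates the persistent state information.
        self.state[name] = {"data": data}
        for policy in self.policies:
            policy.procedure(self.state)

    def assess(self) -> Dict[str, bool]:
        # Property verification: does the state still conform to each policy?
        return {p.name: p.verify(self.state) for p in self.policies}


def record_checksums(state: State) -> None:
    for attrs in state.values():
        attrs.setdefault("checksum", hashlib.sha256(attrs["data"]).hexdigest())


def checksums_match(state: State) -> bool:
    return all(
        hashlib.sha256(a["data"]).hexdigest() == a.get("checksum")
        for a in state.values()
    )


archive = Collection(
    purpose="preserve project data",
    policies=[Policy("integrity", record_checksums, checksums_match)],
)
archive.ingest("a.txt", b"hello")
print(archive.assess())                      # integrity holds after ingest
archive.state["a.txt"]["data"] = b"corrupt"  # simulate silent corruption
print(archive.assess())                      # the periodic check now fails
```

The point of the sketch is the separation of roles: the procedure writes state once, and the verification step can be rerun at any time against that stored state.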

Policy-based Data Management (concept graph built up over seven slides)
Base graph: Collection, Purpose, Property, Policy, Procedure, Persistent State Information, Periodic Assessment Criteria; relationships Defines, Controls, Updates, SubType
Collection adds: Attribute, Digital Object; relationships Has, Isa
Collection Properties adds: Completeness, Correctness, Consensus, Consistency, Integrity, Authenticity, Access control; relationship HasFeature
Collection Policies adds: Replication Policy, Checksum Policy, Quota Policy, Data Type Policy
Collection Procedures adds: Workflow, Function, Operation; relationship Chains; GetUserACL, SetDataType, SetQuota, DataObjRepl, SysChksumDataObj
Persistent State adds: DATA_ID, DATA_REPL_NUM, DATA_CHECKSUM
Policy Enforcement adds: Client Action, Policy Enforcement Point; relationship Invokes

Policy-based Data Management Concept Graph

Science and Engineering Domains using the iRODS Policy-based data management system
Astrophysics: Auger supernova search
Atmospheric science: NASA Langley Atmospheric Sciences Center
Biology: Phylogenetics at CC IN2P3
Climate: NOAA National Climatic Data Center
Cognitive Science: Temporal Dynamics of Learning Center
Computer Science: GENI experimental network
Cosmic Ray: AMS experiment on the International Space Station
Dark Matter Physics: Edelweiss II
Earth Science: NASA Center for Climate Simulations
Ecology: CEED Caveat Emptor Ecological Data
Engineering: CIBER-U
High Energy Physics: BaBar / Stanford Linear Accelerator
Hydrology: Institute for the Environment, UNC-CH; Hydroshare
Genomics: Broad Institute, Wellcome Trust Sanger Institute, NGS
Medicine: Sick Kids Hospital
Neuroscience: International Neuroinformatics Coordinating Facility
Neutrino Physics: T2K and dChooz neutrino experiments
Oceanography: Ocean Observatories Initiative
Optical Astronomy: National Optical Astronomy Observatory
Particle Physics: Indra multi-detector collaboration at IN2P3
Plant genetics: the iPlant Collaborative
Quantum Chromodynamics: IN2P3
Radio Astronomy: Cyber Square Kilometer Array, TREND, BAOradio
Seismology: Southern California Earthquake Center
Social Science: Odum, TerraPop
Digital Library: French National Library, Texas Digital Libraries
Indexing: Cheshire
Institutional repository: Carolina Digital Repository
Preservation: Adonis
Reference collections: SILS LifeTime Library

User Interfaces
Web browser – iDrop-web
User level file system – FUSE, WebDAV
Unix shell commands – iCommands
Synchronization interface – iDrop
Load libraries – Java, Python
I/O libraries – C, C++, Fortran
Workflows – Kepler, Taverna, Pegasus
Digital libraries – DSpace, Fedora
Presentation tools – VIVO, MediaWiki
Grid tools – GridFTP, SAGA
Portals – EnginFrame
Administration tools – Scotty

iDrop-Web Cloud Browser
Supports
– Browsing
– File starring
– File upload, download
– Metadata
– Rule execution (for files that end in “.r”)
– Sharing
– Tagging
– Tickets

FUSE
User level file system
– Mount the data grid as a local directory on your computer
– You can then run your laptop utilities directly on the files within the data grid
Has been used to run Fedora on an iRODS data grid collection

Shared Collections – Data Grid File System
Client: 50 clients (web browser, unix shell command, …)
Data Grid: data grid middleware provides global name, single sign-on, policy enforcement, metadata, replication
Tape Archive: multiple types of systems can be used to store data

Policy-based Data Management
Client
iRODS-server (shown four times), each with: Rule engine, Rule base, Workflows
Storage
Logical Collection (data grid)
Consensus on Policies and Procedures controls the Data Collection
Virtualize collection
Virtualize workflow

Data Workflow Virtualization
Diagram labels: Storage System, Storage Protocol, Access Interface, Policy Enforcement Points, Standard Micro-services, Standard I/O Operations, Data Grid
– Trap actions requested by the client at multiple policy enforcement points.
– Map from policy to standard micro-services.
– Map from micro-services to standard POSIX I/O operations.
– Map standard I/O operations to the protocol supported by the storage system.
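The four mappings on this slide can be sketched as a small dispatch chain in Python. Every name here (`InMemoryDriver`, `store_object`, `replicate_object`, the `"put"` action, the `.replica` suffix) is hypothetical and chosen for illustration; none of them is an iRODS identifier.

```python
from typing import Callable, Dict, List


class InMemoryDriver:
    """Storage driver: maps standard I/O operations onto one storage protocol.

    A dict stands in for the storage system; a real driver would speak the
    protocol of a file system, tape archive, or object store.
    """

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self.blobs[path] = data

    def read(self, path: str) -> bytes:
        return self.blobs[path]


# Micro-services: each one is expressed only in terms of standard I/O calls.
def store_object(driver: InMemoryDriver, path: str, data: bytes, log: List[str]) -> None:
    log.append(f"write {path}")
    driver.write(path, data)


def replicate_object(driver: InMemoryDriver, path: str, data: bytes, log: List[str]) -> None:
    replica = path + ".replica"
    log.append(f"read {path}")
    log.append(f"write {replica}")
    driver.write(replica, driver.read(path))


MicroService = Callable[[InMemoryDriver, str, bytes, List[str]], None]

# Policy enforcement points: a client action is trapped and mapped to the
# chain of micro-services that the policy prescribes for it.
POLICY_ENFORCEMENT_POINTS: Dict[str, List[MicroService]] = {
    "put": [store_object, replicate_object],
}


def handle(action: str, driver: InMemoryDriver, path: str, data: bytes) -> List[str]:
    """Run the micro-service chain for one client action; return the I/O log."""
    log: List[str] = []
    for micro_service in POLICY_ENFORCEMENT_POINTS[action]:
        micro_service(driver, path, data, log)
    return log


driver = InMemoryDriver()
print(handle("put", driver, "/zone/a.dat", b"payload"))
```

Because the micro-services only call `read` and `write`, swapping the driver changes the storage protocol without touching the policy table.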

iRODS Distributed Data Management

Building Community Resources
Digital libraries use collections to define context
– Provenance information
– Descriptive information
– Administrative information
Policy-based data management uses procedures to encapsulate domain knowledge
– Workflows for generation of data
– Workflows for administration of data
– Workflows for enforcement of management policies
– Workflows for verifying collection properties

Computer Actionable Knowledge
Data | objects | bits | POSIX I/O
Information | names | metadata | Relational database
Knowledge | relationships between names | procedures | Workflows
Wisdom | relationships between relationships | policy points | Rule engine

Sharing Domain Knowledge
Reproducible science
– Register workflows
– Automate provenance management
Collaboration environments
– Share data
– Share workflows
Reference collections
– Build community resources of shared data and workflows

New Development
Active objects
– Soft link - Micro-service structured object
  Registers a remote object into the shared collection
  Clicking on the object invokes the required protocol for retrieving the object
  Can cache a local copy
– Can create soft links to
  Web sites
  FTP sites
  Z39.50
  SRB data grid
  iRODS data grid

New Development
Active Collections
– Mounted collection
  Can register a remote directory into the collection
  Can then view contents, list files, retrieve files
– Tar collection
  Can view contents of a tar file
– Time-series collection
  Can request data stream for arbitrary time interval
– Workflow collection
  Can automate capture of workflow provenance

Automating Time Series Data Access
Diagram labels: Client (requests time period), Time-Series Collection, Time Index, NetCDF file
Data grid automatically generates a time index into all files deposited into the collection.
Each access defines the desired time period, and the data grid retrieves data from the relevant files.
Being developed for iRODS 3.3 for use by OOI
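A minimal Python sketch of the idea: index each deposited file by the time span it covers, then answer a request by selecting the files whose span overlaps the requested period. The class `TimeSeriesCollection`, its methods, and the file names are hypothetical. The sketch also assumes the caller supplies each file's start and end time, whereas the slide describes the data grid deriving the index automatically from the deposited files.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IndexedFile:
    path: str
    start: float  # first timestamp covered by the file (seconds)
    end: float    # last timestamp covered by the file (seconds)


class TimeSeriesCollection:
    """Hypothetical stand-in for a time-series collection and its time index."""

    def __init__(self) -> None:
        self._index: List[IndexedFile] = []

    def deposit(self, path: str, start: float, end: float) -> None:
        # Depositing a file adds an entry to the time index.
        self._index.append(IndexedFile(path, start, end))
        self._index.sort(key=lambda f: f.start)

    def files_for(self, start: float, end: float) -> List[str]:
        # A file is relevant when its span overlaps the requested period
        # (boundaries are treated as inclusive).
        return [f.path for f in self._index if f.start <= end and f.end >= start]


DAY = 86400.0
collection = TimeSeriesCollection()
collection.deposit("day2.nc", 1 * DAY, 2 * DAY)
collection.deposit("day1.nc", 0 * DAY, 1 * DAY)
collection.deposit("day3.nc", 2 * DAY, 3 * DAY)

# A request that straddles the boundary between the first two files.
print(collection.files_for(80000.0, 90000.0))
```

A linear scan is enough for a sketch; a production index over many files would more likely use a sorted structure or a database query.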

Capturing Workflow Provenance
eCWkflow.mss – Workflow file
/earthCube/eCWkflow – Directory holding all input and output files associated with workflow file (mounted collection that is linked to the workflow file)
eCWkflow.mpf – Input parameter file, lists parameters and input and output file names
/earthCube/eCWkflow/eCWkflow.runDir0 – Directory holding all output files generated from invocation of eCWkflow.run, the version number is incremented for each execution
eCWkflow.run – Automatically generated run file for executing each input file
Outfile – Output files created for eCWkflow.mpf
Second example in the diagram: eCWkflow2.mpf, eCWkflow2.run, /earthCube/eCWkflow/eCWkflow2.runDir0, Newfile
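The slide states that the run directory's version number is incremented for each execution. The Python sketch below shows one way such a name could be derived. It assumes the trailing digit in names like `eCWkflow.runDir0` is that version number, which the slide implies but does not spell out; the function `next_run_dir` is hypothetical.

```python
import re
from typing import Iterable


def next_run_dir(run_file: str, existing: Iterable[str]) -> str:
    """Return the run-directory name for the next execution of run_file.

    Assumption: directories are named <stem>.runDir<N>, with N starting at 0
    and increasing by one per execution.
    """
    stem = run_file.rsplit(".", 1)[0]
    pattern = re.compile(re.escape(stem) + r"\.runDir(\d+)$")
    versions = [int(m.group(1)) for name in existing if (m := pattern.match(name))]
    return f"{stem}.runDir{max(versions) + 1 if versions else 0}"


# First execution: no run directory exists yet.
print(next_run_dir("eCWkflow.run", []))
# A later execution: only directories for the same run file are counted.
print(next_run_dir("eCWkflow.run", ["eCWkflow.runDir0", "eCWkflow2.runDir0"]))
```

Keeping every execution in its own directory is what lets the outputs of each run stay attached to the parameter file that produced them.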

Eco-Hydrology
RHESSys workflow to develop a nested watershed parameter file (worldfile) containing a nested ecogeomorphic object framework, and full, initial system state.
For each box, create a micro-service to automate task, and chain into a workflow
Diagram labels: Choose gauge or outlet (HIS); Extract drainage area (NHDPlus); Digital Elevation Model (DEM); Slope; Aspect; Streams (NHD); Roads (DOT); Land Use; Leaf Area Index; Phenology; Soil Data; NLCD (EPA); Landsat TM; MODIS; USDA; Strata; Hillslope; Patch; Basin; Stream network; Nested watershed structure; Soil and vegetation parameter files; Worldfile; Flowtable; RHESSys

Cyberintegrator Integration
iRODS integration with Cyberintegrator (CI) kicks off a workflow in CI, monitors its status, and retrieves its output when the workflow finishes.
1: Invoke integration code (potentially from workflow)
2: Call CI Web service to kick off workflow and get ID
3: Loop until workflow finished, calling CI Web service to get status
4: Call CI Web service to get workflow output and register into iRODS
Cyberintegrator (CI) Public Web Services: Submit_workflow (), Get_workflow_status (), Get_workflow_output ()
iRODS Microservices: msiExecCmd, msiGetStdoutInExecCmdOut, msiSubstr, msiGetStdoutInExecCmdOut, msiDataObjCreate, msiDataObjWrite, msiDataObjClose, msiSplitPath, msiAddSelectFieldToGenQuery, msiAddConditionToGenQuery, msiExecGenQuery, msiGetValByKey
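The four numbered steps on this slide (invoke, submit, poll, retrieve and register) follow a common submit-and-poll pattern, sketched below in Python. The method names mirror the three web services listed on the slide, but their signatures, the `"finished"` status string, the `FakeService` stand-in, and the dict used as a catalog are assumptions; the real integration is written with iRODS micro-services, not Python.

```python
import time
from typing import Dict, Protocol


class WorkflowService(Protocol):
    """Assumed shape of the remote workflow service (signatures hypothetical)."""

    def submit_workflow(self, definition: str) -> str: ...
    def get_workflow_status(self, workflow_id: str) -> str: ...
    def get_workflow_output(self, workflow_id: str) -> bytes: ...


def run_and_register(
    service: WorkflowService,
    definition: str,
    catalog: Dict[str, bytes],
    logical_path: str,
    poll_seconds: float = 0.0,
    max_polls: int = 100,
) -> str:
    # Step 2: kick off the workflow and keep its ID.
    workflow_id = service.submit_workflow(definition)
    # Step 3: loop until the service reports completion.
    for _ in range(max_polls):
        if service.get_workflow_status(workflow_id) == "finished":
            # Step 4: fetch the output and register it under a logical path.
            catalog[logical_path] = service.get_workflow_output(workflow_id)
            return workflow_id
        time.sleep(poll_seconds)
    raise TimeoutError(f"workflow {workflow_id} did not finish in {max_polls} polls")


class FakeService:
    """In-memory stand-in so the sketch runs without a network."""

    def __init__(self, polls_until_done: int = 2) -> None:
        self._remaining = polls_until_done

    def submit_workflow(self, definition: str) -> str:
        return "wf-1"

    def get_workflow_status(self, workflow_id: str) -> str:
        self._remaining -= 1
        return "finished" if self._remaining <= 0 else "running"

    def get_workflow_output(self, workflow_id: str) -> bytes:
        return b"result"


catalog: Dict[str, bytes] = {}
print(run_and_register(FakeService(), "demo workflow", catalog, "/zone/out.dat"))
print(catalog)
```

The `max_polls` bound is a design choice added here so that a stalled remote workflow cannot block the caller indefinitely.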

CUAHSI/HIS Integration
iRODS integration with CUAHSI allows querying of the HIS WaterOneFlow Web service. This mechanism is used by USC to retrieve streamflow data to calibrate VIC models.
1: Invoke (e.g., from workflow) waterOneFlow.r
   irule -F waterOneFlow.r
2: Call WaterOneFlow Web service and retrieve result
3: Register result file into iRODS
CUAHSI/HIS Public Web Services: WaterOneFlow ()

DataONE Integration
iRODS integration with DataONE allows free text queries against the collection of repositories exposed through the Coordinating Node network.
1: Invoke (e.g., from workflow) dataOne.r
   irule -F dataOneTest.r "'river'"
2: Call DataONE Web service and retrieve list of IDs from search
3: Create new collection for result IDs
4: Call DataONE Web service to obtain file for each matching ID and register it into the new collection
DataONE Public Web Services: CN: search (), CN: object ()

Integration Micro-services msiExecCmd msiGetStdoutInExecCmdOut msiSubstr msiGetStdoutInExecCmdOut msiDataObjCreate msiDataObjWrite msiDataObjClose msiSplitPath msiAddSelectFieldToGenQuery msiAddConditionToGenQuery msiExecGenQuery msiGetValByKey

iRODS - Open Source Software
– Distributed under BSD license
Current version is iRODS 3.3
– Typically have three releases per year
Scale of capabilities:
– 338 system attributes (users, files, collections, resources, rules)
– 354 basic functions (micro-services)
– 70 policy enforcement points
– 22 basic storage operations (POSIX I/O plus staging)
– 10 storage system drivers
– More than 50 clients
Downloads
– 39 countries
– 62 US academic institutions

Examples of “National” Infrastructure
Data Grids (data sharing)
– National Optical Astronomy Observatory
– Ocean Observatories Initiative
– The iPlant Collaborative
– BaBar High Energy Physics
– Broad Institute genomics data grid
– Wellcome Trust Sanger Institute genomics data grid
Digital Libraries (data publication)
– French National Library
– Texas Digital Library
– UNC-CH SILS LifeTime Library
Repositories / Archives (data preservation)
– NASA Center for Climate Simulation
– Carolina Digital Repository

Community-based Collection Life Cycle
Project Collection | Private | Local Policy
Data Grid | Shared | Distribution Policy
Digital Library | Published | Description Policy
Data Processing Pipeline | Analyzed | Service Policy
Reference Collection | Preserved | Representation Policy
Federation | Sustained | Re-purposing Policy
Stages correspond to addition of new policies for a broader community
Virtualize the stages of the collection life cycle through policy evolution
The driving purpose changes at each stage of the data life cycle
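A short Python sketch of "policy evolution": each stage adds one policy to those already in force. The stage and policy names come from this slide. The function `policies_in_force` is hypothetical, and the reading that policies accumulate from stage to stage (rather than replace one another) is an interpretation of "addition of new policies".

```python
from typing import List, Tuple

# (stage, policy added at that stage), in life cycle order, from the slide.
LIFE_CYCLE: List[Tuple[str, str]] = [
    ("Project Collection", "Local Policy"),
    ("Data Grid", "Distribution Policy"),
    ("Digital Library", "Description Policy"),
    ("Data Processing Pipeline", "Service Policy"),
    ("Reference Collection", "Representation Policy"),
    ("Federation", "Re-purposing Policy"),
]


def policies_in_force(stage: str) -> List[str]:
    """Return every policy added up to and including the given stage."""
    policies: List[str] = []
    for name, policy in LIFE_CYCLE:
        policies.append(policy)
        if name == stage:
            return policies
    raise KeyError(stage)


print(policies_in_force("Digital Library"))
```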

Rule to count metadata values
myTestRule {
#Input parameters are:
#  Null
#Output parameter is:
#  Result string
  *Query = select count(META_DATA_ATTR_VALUE), order(META_DATA_ATTR_NAME), META_DATA_ATTR_NAME where COLL_NAME like '/lifelibZone/home/rwmoore%';
  foreach(*row in *Query) {
    *Count = *row.META_DATA_ATTR_VALUE;
    *Name = *row.META_DATA_ATTR_NAME;
    writeLine("stdout", "Metadata value *Name appears *Count times");
  }
}
INPUT null
OUTPUT ruleExecOut
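For readers less familiar with the rule language, the aggregation that the rule's query performs (count the metadata values per attribute name, for collections whose name starts with a given prefix, ordered by attribute name) can be mirrored in plain Python. This is an illustration of the logic only: the function name and the sample rows are made up, and no iRODS catalog is queried.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Row = Tuple[str, str, str]  # (collection name, attribute name, attribute value)


def count_values_by_attribute(rows: Iterable[Row], prefix: str) -> List[Tuple[str, int]]:
    """Count values per attribute name for collections under prefix."""
    # startswith() plays the role of  COLL_NAME like '<prefix>%'.
    counts = Counter(name for coll, name, _value in rows if coll.startswith(prefix))
    # sorted() plays the role of  order(META_DATA_ATTR_NAME).
    return sorted(counts.items())


# Invented sample rows, for illustration only.
sample: List[Row] = [
    ("/lifelibZone/home/rwmoore/photos", "camera", "Nikon"),
    ("/lifelibZone/home/rwmoore/photos", "camera", "Canon"),
    ("/lifelibZone/home/rwmoore/audio", "artist", "Unknown"),
    ("/lifelibZone/home/other", "camera", "Leica"),
]

for name, count in count_values_by_attribute(sample, "/lifelibZone/home/rwmoore"):
    print(f"Metadata value {name} appears {count} times")
```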

SILS LifeTime Library
Student digital libraries
– Enable students to build collections of
  Photographs
  MP3 audio files
  Class documents
  Video
  Web site archive
Resources provided by School of Information and Library Science at UNC-CH
– Student collections range from 2 GBytes to 150 GBytes
– Number of files from 2000 to 12,000

SILS LifeTime Library Policies
Library management
– Replication
– Checksums
– Versioning
– Strict access controls
– Quotas
– Metadata catalog replication
– Installation environment archiving
Ingestion
– Automated synchronization of student directory with LifeTime Library
– Automated loading of MP3 metadata

Carolina Digital Repository
Policy-Driven Repository Infrastructure project funded by the Institute of Museum and Library Services

Carolina Digital Repository Ingest Workflow

Publications
Rajasekar, A., M. Wan, R. Moore, W. Schroeder, S.-Y. Chen, L. Gilbert, C.-Y. Hou, C. Lee, R. Marciano, P. Tooby, A. de Torcy, B. Zhu, “iRODS Primer: Integrated Rule-Oriented Data System”, Morgan & Claypool,
Ward, R., M. Wan, W. Schroeder, A. Rajasekar, A. de Torcy, T. Russell, H. Xu, R. Moore, “The integrated Rule-Oriented Data System (iRODS 3.0) Micro-service Workbook”, DICE Foundation, November 2011, ISBN: , Amazon.com

iRODS Consortium Technical Roadmap
Timeline tracks: iRODS Community, iRODS Enterprise, iRODS Consortium
Dates shown: 09/11, 03/12, 10/12, 03/13, 07/13, 11/13, 03/14
Plus, independent plugin releases
iRODS Consortium Development Team - Supercomputing

Plugins under Development
Released as separate packages
Authentication
– Kerberos
– GSI
Resource
– S3
– HPSS 7.3 and 7.4
– HDFS
– Load Balancing
– WOS
iRODS Consortium Development Team - Supercomputing 2013

More Information Datanet Federation Consortium – Integrated Rule Oriented Data System – Reagan W. Moore