integrated Rule Oriented Data System Tutorial: iRODS Capabilities


1 integrated Rule Oriented Data System Tutorial: iRODS Capabilities

2 Outline
- Introduction to iRODS capabilities
- Data-driven science and full Data Life Cycle
- Policy-based Management of Distributed Data
- Scaling: petabytes, 100s of millions of files
- Enabling unified sharable "virtual" collections
- Enabling data grids (sharing), digital libraries (publishing), persistent archives (preservation)
- Unified Data Space: Interoperate via Federation

3 Introduction to iRODS Capabilities

4 Data Driven Science
- Enable new science through collaborative research on shared data collections
- Management of entire scientific data life cycle, from data analysis pipelines to long-term sustainability of reference collections
- Implement national scale data cyber-infrastructure
  - Federation of exemplar data management technologies in exemplar research initiatives
- Creation of production data management systems
  - Proven technology implemented in extant data grids
- Integrate “live” research data collections into education initiatives
  - Policy-based data management across distributed data
[Diagram: Data Life Cycle. Labels: Project; Shared Collection; Processing Pipeline; Digital Library; Reference Collection; Federation]

5 Data are Inherently Distributed
- Distributed sources: projects span multiple institutions, nations
- Distributed analysis platforms: Grid computing
- Distributed data storage: minimize risk of data loss, optimize access
- Distributed users: caching of data near user
- Multiple stages of data life cycle: data repurposing for use in broader context

6 [Diagram labels] Cloud Storage; Institutional Repositories; Federal Repositories; Carolina Digital Repository; Texas Digital Library; National Climatic Data Center; National Optical Astronomy Observatory

7 [Diagram labels] Data Processing Pipelines; Preservation Environment; Ocean Observatories Initiative; NARA Transcontinental Persistent Archive Prototype; Carolina Digital Repository; Large Synoptic Survey Telescope; Digital Library; Texas Digital Library; French National Library; Data Grid; Teragrid; Temporal Dynamics of Learning Center; Australian Research Collaboration Service; Taiwan National Archive

8 Data Life Cycle
  Project Collection         Private     Local Policy
  Data Grid                  Shared      Distribution Policy
  Digital Library            Published   Description Policy
  Data Processing Pipeline   Analyzed    Service Policy
  Reference Collection       Preserved   Representation Policy
  Federation                 Sustained   Re-purposing Policy
- Each stage adds new policies for a broader community
- Virtualize the stages of data life cycle through evolution of policies
- Interoperability across data life cycle representations
- Each stage of the data life cycle re-purposes the original collection

9 Tracing the Data Life Cycle
- Collection Creation using a Data Grid: Data manipulation / Data ingestion
- Processing Pipelines: Pipeline processing / Environment administration
- Data Grid: Policy display / Micro-service display / State information display / Replication
- Digital Library: Access / Containers / Metadata browsing / Visualization
- Preservation Environment: Validation / Audit / Federation / Deep Archive / SHAMAN

10 Goal - Generic Infrastructure
- Manage all stages of the data life cycle
  - Data organization
  - Data processing pipelines
  - Collection creation
  - Data sharing
  - Data publication
  - Data preservation
- Create reference collection against which future information and knowledge is compared
- Each stage uses similar storage, arrangement, description, and access mechanisms

11 Concept Roadmap
- Purpose - reason a collection is assembled
- Properties - attributes needed to ensure the purpose
- Policies - enforce and maintain required properties
- Procedures - computer functions to implement Policies
- State information - results of applying procedures (iCAT)
- Assessment criteria - validate that state information conforms to desired purpose
- Federation - interoperate with shared logical name spaces
These are the required elements for data life cycle virtualization

12 Policy-based Management
- Each data life cycle stage is driven by extensions of management policies to address broader user communities
  - Data arrangement: Project policies
  - Data analysis: Processing pipeline standards
  - Data sharing: Research collaborations
  - Data publication: Discipline standards
  - Data preservation: Reference collection
- Reference collections need to be preserved and interpretable by future generations, the most stringent standard
- Data grids - integrated Rule Oriented Data System

13 iRODS - Policy-based Management
- Turn Policies into computer-actionable Rules
- Compose Rules by chaining Micro-services
- Manage state information (in iCAT metadata catalog) as attributes on namespaces:
  - Files / collections / users / resources / rules
- Validate assessment criteria
  - Queries on state information, parsing audit trails
- Automate administrative functions
- Enable scaling to today's massive collections
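
A minimal sketch of "compose Rules by chaining Micro-services", assuming a rule is a condition plus an ordered chain of functions that record state. None of these names (`Rule`, `record_checksum`, `record_replica`, the state keys) come from iRODS; they are hypothetical stand-ins, and the real rule engine and iCAT catalog are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-ins for illustration only; not the iRODS API.
State = Dict[str, str]                 # stands in for state information kept in a catalog
MicroService = Callable[[State], None]


@dataclass
class Rule:
    """A policy made computer-actionable: a condition plus a chain of micro-services."""
    name: str
    condition: Callable[[State], bool]
    chain: List[MicroService] = field(default_factory=list)

    def apply(self, state: State) -> bool:
        # The rule fires only when its condition holds for this object.
        if not self.condition(state):
            return False
        # Each micro-service runs in order and records its result as state.
        for micro_service in self.chain:
            micro_service(state)
        return True


def record_checksum(state: State) -> None:
    state["checksum"] = "recorded"


def record_replica(state: State) -> None:
    state["replicas"] = "2"


archive_rule = Rule(
    name="on_put_into_archive",
    condition=lambda s: s["collection"].startswith("/zone/archive"),
    chain=[record_checksum, record_replica],
)

archived = {"collection": "/zone/archive/run42"}
scratch = {"collection": "/zone/scratch/tmp"}
applied_to_archived = archive_rule.apply(archived)
applied_to_scratch = archive_rule.apply(scratch)

# An assessment criterion is then a query on the recorded state information.
meets_criterion = archived.get("replicas") == "2"
print(applied_to_archived, applied_to_scratch, meets_criterion)
```

The point of the design is that the policy lives in data (the rule and its chain), so it can be changed without touching the functions that do the work.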

14 Overview of iRODS Architecture
- User w/ Client: can search, access, add and manage data & metadata
- Access distributed data with Web-based Browser or iRODS GUI or Command Line clients
- iRODS Data System
  - iRODS Data Server: Disk, Tape, etc.
  - iRODS Metadata Catalog: tracks information
  - iRODS Rule Engine: tracks Policies

15 iput ../src/irm.c - Checks 10 Policy hooks when file put into iRODS
  brick14:10900:ApplyRule#116:: acChkHostAccessControl
  brick14:10900:GotRule#117:: acChkHostAccessControl
  brick14:10900:ApplyRule#118:: acSetPublicUserPolicy
  brick14:10900:GotRule#119:: acSetPublicUserPolicy
  brick14:10900:ApplyRule#120:: acAclPolicy
  brick14:10900:GotRule#121:: acAclPolicy
  brick14:10900:ApplyRule#122:: acSetRescSchemeForCreate
  brick14:10900:GotRule#123:: acSetRescSchemeForCreate
  brick14:10900:execMicroSrvc#124:: msiSetDefaultResc(demoResc,null)
  brick14:10900:ApplyRule#125:: acRescQuotaPolicy
  brick14:10900:GotRule#126:: acRescQuotaPolicy
  brick14:10900:execMicroSrvc#127:: msiSetRescQuotaPolicy(off)
  brick14:10900:ApplyRule#128:: acSetVaultPathPolicy
  brick14:10900:GotRule#129:: acSetVaultPathPolicy
  brick14:10900:execMicroSrvc#130:: msiSetGraftPathScheme(no,1)
  brick14:10900:ApplyRule#131:: acPreProcForModifyDataObjMeta
  brick14:10900:GotRule#132:: acPreProcForModifyDataObjMeta
  brick14:10900:ApplyRule#133:: acPostProcForModifyDataObjMeta
  brick14:10900:GotRule#134:: acPostProcForModifyDataObjMeta
  brick14:10900:ApplyRule#135:: acPostProcForCreate
  brick14:10900:GotRule#136:: acPostProcForCreate
  brick14:10900:ApplyRule#137:: acPostProcForPut
  brick14:10900:GotRule#138:: acPostProcForPut
  brick14:10900:GotRule#139:: acPostProcForPut
  brick14:10900:GotRule#140:: acPostProcForPut
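
The trace on this slide can be checked mechanically. The sketch below parses the trace text as it appears on the slide and lists the distinct policy hooks and the micro-services that ran. The field layout in the regular expression (host, a number, event name, sequence number, then the rule or micro-service) is an assumption read off the lines themselves, not a documented log format.

```python
import re

# Trace copied from the slide (a file put into iRODS with iput).
TRACE = """\
brick14:10900:ApplyRule#116:: acChkHostAccessControl
brick14:10900:GotRule#117:: acChkHostAccessControl
brick14:10900:ApplyRule#118:: acSetPublicUserPolicy
brick14:10900:GotRule#119:: acSetPublicUserPolicy
brick14:10900:ApplyRule#120:: acAclPolicy
brick14:10900:GotRule#121:: acAclPolicy
brick14:10900:ApplyRule#122:: acSetRescSchemeForCreate
brick14:10900:GotRule#123:: acSetRescSchemeForCreate
brick14:10900:execMicroSrvc#124:: msiSetDefaultResc(demoResc,null)
brick14:10900:ApplyRule#125:: acRescQuotaPolicy
brick14:10900:GotRule#126:: acRescQuotaPolicy
brick14:10900:execMicroSrvc#127:: msiSetRescQuotaPolicy(off)
brick14:10900:ApplyRule#128:: acSetVaultPathPolicy
brick14:10900:GotRule#129:: acSetVaultPathPolicy
brick14:10900:execMicroSrvc#130:: msiSetGraftPathScheme(no,1)
brick14:10900:ApplyRule#131:: acPreProcForModifyDataObjMeta
brick14:10900:GotRule#132:: acPreProcForModifyDataObjMeta
brick14:10900:ApplyRule#133:: acPostProcForModifyDataObjMeta
brick14:10900:GotRule#134:: acPostProcForModifyDataObjMeta
brick14:10900:ApplyRule#135:: acPostProcForCreate
brick14:10900:GotRule#136:: acPostProcForCreate
brick14:10900:ApplyRule#137:: acPostProcForPut
brick14:10900:GotRule#138:: acPostProcForPut
brick14:10900:GotRule#139:: acPostProcForPut
brick14:10900:GotRule#140:: acPostProcForPut
"""

# Assumed layout: <host>:<number>:<event>#<sequence>:: <rule or micro-service>
LINE = re.compile(
    r"^(?P<host>[^:]+):(?P<num>\d+):(?P<event>\w+)#(?P<seq>\d+):: (?P<body>.+)$"
)


def parse(trace):
    """Return (sequence, event, body) tuples for every line that matches."""
    events = []
    for line in trace.splitlines():
        match = LINE.match(line.strip())
        if match:
            events.append((int(match["seq"]), match["event"], match["body"]))
    return events


events = parse(TRACE)

# Distinct policy hooks, in the order they were first applied.
hooks = []
for _, event, body in events:
    if event == "ApplyRule" and body not in hooks:
        hooks.append(body)

# Micro-services executed by those hooks.
micro_services = [body for _, event, body in events if event == "execMicroSrvc"]

print(len(hooks), "policy hooks")
for name in hooks:
    print("  ", name)
print("micro-services:", micro_services)
```

Counting the `ApplyRule` entries this way gives the 10 policy hooks named in the slide title; three of them go on to execute a micro-service.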

16 Scale of iRODS Data Grid
- Number of files: desktop to 10s to 100s of millions of files
- Size of data: desktop to 100s of terabytes to petabytes of data
- Number of policy enforcement points: 64 actions define when policies are checked
- System state information: 112 metadata attributes of system information per file
- Number of functions: 185 composable iRODS Micro-services
- Number of storage systems that are linked: desktop to 10s to 100 storage resources
- Number of data grids that can interoperate: Federation of 10s of data grids

17 iRODS Shows Unified “Virtual Collection”
- User With Client Views & Manages Data
- User Sees Single “Virtual Collection”
  - My Data: Disk, Tape, Database, Filesystem, etc.
  - Project Data: Disk, Tape, Database, Filesystem, etc.
  - Reference Data: Remote Disk, Tape, Filesystem, etc.
- The iRODS Data System can install in a “layer” over existing or new data, letting you view, manage, and share part or all of diverse data in a unified Collection.

18 Organize Distributed Data into a Sharable "Virtual" Collection
- Project repository: MotifNet - manage collection of analysis products
- Institutional repository: Carolina Digital Repository for UNC collections
- Regional collaboration: RENCI Data Grid linking resources across North Carolina
- National collaboration: NSF Temporal Dynamics of Learning Center; Australian Research Collaboration Service
- National Library: French National Library
- National Archive: NARA Transcontinental Persistent Archive Prototype; Taiwan
- International collaboration: BaBar High Energy Physics (SLAC-IN2P3); National Optical Astronomy Observatory (Chile-US)

19 Infrastructure Independence
- Manage properties of the collection independently of the choice of technology
  - Access, authentication, authorization, description, location, distribution, replication, integrity, retention
- Enforce policies globally at all storage locations
  - Rule Engine resident at each storage site
- Apply procedures at each remote storage site
  - Chain encapsulated operations into workflows
- Infrastructure independence enables evolution to new technology without interruption
  - Integrate new access methods, new storage systems, new network protocols, new authentication systems

20 Data Virtualization
[Diagram layers: Access Interface; Standard Micro-services; Standard Operations; Storage Protocol; Storage System. The Data Grid spans the middle layers.]
- Map from actions requested by access method to standard set of iRODS Micro-services.
- Map standard Micro-services to standard operations.
- Map the operations to protocol supported by operating system.
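
The three mappings on this slide can be sketched as three lookup tables applied in sequence. Every name below (the `ms_*` micro-services, the operation names, the `unix_fs` and `object_store` back ends and their protocol calls) is a hypothetical placeholder chosen for illustration, not the actual iRODS micro-service set or driver interface.

```python
# Hypothetical placeholder tables; none of these names are taken from iRODS.

# 1. Action requested by an access method -> standard micro-services.
ACTION_TO_MICROSERVICES = {
    "put": ["ms_create", "ms_write", "ms_close"],
    "get": ["ms_open", "ms_read", "ms_close"],
}

# 2. Standard micro-service -> standard operation.
MICROSERVICE_TO_OPERATION = {
    "ms_create": "create",
    "ms_open": "open",
    "ms_read": "read",
    "ms_write": "write",
    "ms_close": "close",
}

# 3. Standard operation -> protocol call of a particular storage system.
OPERATION_TO_PROTOCOL = {
    "unix_fs": {
        "create": "creat()",
        "open": "open()",
        "read": "read()",
        "write": "write()",
        "close": "close()",
    },
    "object_store": {
        "create": "begin upload",
        "open": "locate object",
        "read": "ranged download",
        "write": "upload part",
        "close": "complete transfer",
    },
}


def resolve(action, storage):
    """Translate a client action into the protocol calls of one storage system."""
    calls = []
    for micro_service in ACTION_TO_MICROSERVICES[action]:
        operation = MICROSERVICE_TO_OPERATION[micro_service]
        calls.append(OPERATION_TO_PROTOCOL[storage][operation])
    return calls


put_on_fs = resolve("put", "unix_fs")
put_on_objects = resolve("put", "object_store")
print(put_on_fs)
print(put_on_objects)
```

Under these assumptions, supporting a new storage system means adding one entry to the third table; the client actions and the micro-services are unchanged.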

21 Data Grid Security
- Manage global name spaces for: {users, files, storage}
- Assign access controls as constraints imposed between two logical name spaces
  - Access controls remain invariant as files are moved within the data grid
  - Controls on: Files / Storage systems / Metadata
- Authenticate each user access
  - PKI, Kerberos, challenge-response, Shibboleth
  - Use internal or external identity management system
- Authorize all operations
  - ACLs (Access Control Lists) on users and groups
  - Separate condition for execution of each Rule
  - Internal approval flags (e.g. IRB) within a Rule
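
A small sketch of why access controls stay invariant when files move, assuming the controls are keyed by logical user name and logical file name while the physical location is tracked separately. The `DataGrid` class, its methods, and the example names are hypothetical and do not mirror the iCAT schema.

```python
from dataclasses import dataclass, field


@dataclass
class DataGrid:
    # Logical file name -> storage resource currently holding the file.
    location: dict = field(default_factory=dict)
    # (logical user name, logical file name) -> permission.
    acl: dict = field(default_factory=dict)

    def put(self, owner, logical_name, resource):
        self.location[logical_name] = resource
        self.acl[(owner, logical_name)] = "own"

    def grant(self, user, logical_name, permission):
        self.acl[(user, logical_name)] = permission

    def move(self, logical_name, new_resource):
        # Only the physical location changes; the ACL table is untouched.
        self.location[logical_name] = new_resource

    def can_read(self, user, logical_name):
        return self.acl.get((user, logical_name)) in {"read", "write", "own"}


grid = DataGrid()
path = "/zoneA/home/alice/survey.dat"   # hypothetical logical file name
grid.put("alice", path, "disk1")
grid.grant("bob", path, "read")

before_move = grid.can_read("bob", path)
grid.move(path, "tape1")                # migrate to a different storage system
after_move = grid.can_read("bob", path)

print(before_move, after_move, grid.location[path])
```

Because the constraint is expressed between two logical name spaces (users and files), nothing in it refers to the storage system, so a migration cannot invalidate it.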

22 NOAO Zone Architecture
[Diagram labels: Archive; Telescope]

23 Ocean Observatories Initiative
[Diagram labels: Sensors; Cloud Computing; External Repositories; Cloud Storage Cache; Message Bus; Aggregate sensor data in cache; SuperComputer; Event Detection; Remote locations; Simulations; Digital Library; Archive; Clients; Remote Users; iRODS Data Grid; Multiple Protocols]
Large-scale workflows from real-time data to steerable instruments and digital library.

24 Access: Data Grid Clients

25 iRODS Distributed Data Management

26 Towards a Unified Data Space
- Sharing data across Space
  - Organize data as a shared "virtual" Collection
  - Define unifying properties for the Collection
- Sharing data across Time
  - Preservation is communication with the future
  - Preservation validates communication from the past
- Managing full Data Life Cycle
  - Evolution of the Policies that govern a data Collection at each stage of the life cycle
  - From data creation, to collection, to publication, to reference collection, to analysis pipeline

27 Intellectual Property
- Given generic infrastructure, intellectual property resides in the Policies and procedures that manage the Collection
  - Consistency of the Policies
  - Capabilities of the procedures
  - Automation of internal Policy assessment
  - Validation of desired Collection properties
  - Automation of administrative tasks
- Interacting with DataDirect Networks, HP, IBM, Microsoft on commercial application of open source technology.

28 Societal Impact
- Many communities are assembling digital holdings that represent an emerging consensus:
  - Common meaning associated with the data
  - Common interpretation of the data
  - Common data manipulation mechanisms
- The development of a consensus is described as Socialization of Collections
- An example is Trans-border Urban Planning

29 Federation of Collections
- Social consensus for sharing data, policies, methods, practice
- Each community controls their own collection Policies
  - Policies enforced at each storage location
- Explicit computer-actionable rules control type of federation interactions
  - e.g. peer-to-peer, central archive, master-slave data distribution, chained data grids, deep archives
- Interoperability mechanisms support technology integration
  - Community specific clients
  - Bulk data export / import
  - Cross registration of data
  - Structured information resource drivers

30 Data Grid Federation
- Motivation
  - Improve performance, scalability, and independence
- To initiate Federation, each Data Grid administrator establishes trust and creates a remote user
  - iadmin mkzone B remote Host:Port
  - iadmin mkuser rods#B rodsuser
- Use cases
  - Chained data grids - National Optical Astronomy Observatory
  - Master-slave data grids - NIH BIRN
  - Central archive - UK e-Science
  - Deep archive - NARA TPAP
  - Replication - NSF Teragrid
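
The two `iadmin` commands on this slide can be generated per remote zone. The sketch below only builds and prints the command lines in the form shown on the slide; it does not run them. The function name, the host name, and the port value are placeholders introduced here, and the administrator of zone B would run the mirror-image pair naming zone A.

```python
import shlex


def federation_commands(remote_zone, remote_host, remote_port, remote_user="rods"):
    """Build the two iadmin command lines from the slide for one remote zone."""
    return [
        # Register the remote zone and where to reach it.
        ["iadmin", "mkzone", remote_zone, "remote", f"{remote_host}:{remote_port}"],
        # Create a local account for a user whose home is the remote zone.
        ["iadmin", "mkuser", f"{remote_user}#{remote_zone}", "rodsuser"],
    ]


# Placeholder host and port for zone B; substitute the real values.
commands = federation_commands("B", "icat.zone-b.example.org", 1247)
for argv in commands:
    print(shlex.join(argv))
```

Keeping each command as an argument list rather than a single string leaves shell quoting to `shlex.join`, which matters for the `#` in the user name.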

31 Accessing Data in Federated iRODS
- Federated irodsUser (use iRODS clients)
- Federated irodsUsers can upload, download, replicate, share, manage & track access to some or all data (depending on access permissions) in either zone.
- Two federated iRODS data grids
  - iRODS/ICAT system at University of North Carolina at Chapel Hill (renci zone)
  - iRODS/ICAT system at University of Texas at Austin (tacc zone)
[Diagram labels: “Finds the data”; “With access permissions”; “Gets data to user”]

32 Development Team
DICE team
- Arcot Rajasekar - iRODS Development Lead
- Mike Wan - iRODS Chief Architect
- Wayne Schroeder - iRODS Product Mgr., Sr. Developer
- Bing Zhu - Fedora, Windows
- Mike Conway - Java (Jargon)
- Paul Tooby - Documentation, Foundation
- Sheau-Yen Chen - Data Grid Administration
- Reagan Moore - PI
Preservation
- Richard Marciano - Preservation Development Lead
- Chien-Yi Hou - Preservation Micro-services
- Antoine de Torcy - Preservation Micro-services

33 Foundation
- Data Intensive Cyber Environments Foundation
  - Nonprofit open source software development
  - Promotes use of iRODS technology
  - Supports standards efforts, intellectual property
  - Coordinates international development efforts
    - IN2P3 - quota and monitoring system
    - King’s College London - Shibboleth
    - Australian Research Collaboration Services - WebDAV
    - Academia Sinica - SRM interface
- More information: http://diceresearch.org

34 iRODS Wiki
- More information: http://irods.diceresearch.org
  - Descriptions, tutorials, documentation
  - Publications / presentations
  - Download of iRODS open source software
  - Performance tests
  - irods-chat page

