© 2012 IBM Corporation 1 ENSURE: Enabling kNowledge Sustainability, Usability and Recovery for Economic value Presenter: Michael Factor


1 ENSURE: Enabling kNowledge Sustainability, Usability and Recovery for Economic value
Presenter: Michael Factor
The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/ ) under grant agreement n°

2 Enabling kNowledge Sustainability, Usability and Recovery for Economic value
A 3-year IP project started Feb 2011
INNOVATIONS
• EVALUATE Cost and Value
• AUTOMATE Preservation Lifecycle
• SCALE using ICT innovations
• PROTECT Content-aware data protection
USE CASES
• Healthcare
• Clinical Studies
• Financial Services

3 ENSURE: Key Technical Innovations
Evaluate → Automate → Scale → Protect
[Diagram labels: Requirements, Evaluate, Automate, Scale, Protect, Access, Deploy, External Events, Flow, Events, Ontology, Cost, Value, Quality, Cloud, Virtual appliance, Anonymization]

5 Evaluate Cost and Value
[Two screenshot panels: "Evaluate Cost and Value – Input" and "Evaluate Cost and Value – Output"]

6 Evaluate Cost and Value – Process
[Diagram labels: Configurator, Economic Performance Engine, Preservation Plan Optimizer, Translation Rules, Quality Engine, Cost/risk Engine, Data Repositories, Configuration Selection, Administrator, Requirements, (Re)Deploy Solution, ENSURE, Automate]

7 Evaluate cost and value: Preservation Plan Optimizer
• A genetic algorithm generates candidate solutions based upon the engines' results
• The plot shows two dimensions (quality and cost); the real problem has n dimensions
• The user chooses a solution from the Pareto frontier
• On the frontier, no dimension can be improved without degrading at least one other dimension
[Plot: Quality vs. Cost; labels: COE, QOE]
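
The Pareto-frontier idea on this slide can be illustrated with a small sketch. This is not the ENSURE optimizer (which the slide says uses a genetic algorithm over n dimensions); it is a brute-force filter over two dimensions, and the plan names, costs and quality scores are invented for illustration.

```python
from typing import List, Tuple

# (name, cost, quality): lower cost is better, higher quality is better.
Plan = Tuple[str, float, float]


def dominates(a: Plan, b: Plan) -> bool:
    """True if plan a is no worse than b on both axes and strictly better on one."""
    _, cost_a, quality_a = a
    _, cost_b, quality_b = b
    no_worse = cost_a <= cost_b and quality_a >= quality_b
    strictly_better = cost_a < cost_b or quality_a > quality_b
    return no_worse and strictly_better


def pareto_frontier(plans: List[Plan]) -> List[Plan]:
    """Keep only the plans that no other plan dominates."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


# Hypothetical candidate preservation plans (made-up numbers).
candidates = [
    ("tape-only", 10.0, 0.60),
    ("disk+tape", 25.0, 0.85),
    ("disk-only", 30.0, 0.80),     # dominated by disk+tape
    ("multi-cloud", 40.0, 0.95),
    ("single-cloud", 40.0, 0.70),  # dominated by disk+tape
]

frontier = pareto_frontier(candidates)
frontier_names = [name for name, _, _ in frontier]
print(frontier_names)
```

Every plan left in `frontier` is a defensible choice; picking among them is the trade-off the slide leaves to the user.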

9 Automate Preservation Lifecycle: Preservation Data Aware Lifecycle Management (PDALM) Workflow Engine
• PDALM: Controls system activities
  – Manage workflow of the information being preserved
  – Execute preservation plan (built by the Configurator)
  – Handle notifications and interaction with the administrator
Example: Workflow for ingest

10 Automate Preservation Lifecycle: Event engine
Event Engine
• Manages concurrency, priority and impact/severity of events
• Listens for preservation-related events
• Notifies relevant ENSURE components
[Diagram labels: Configurator, PDALM, Scale, Feeds, Monitored system behavior, Economic, Data/format, Regulatory, Standards]
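
A minimal sketch of the listen / prioritize / notify pattern described on this slide, assuming a single-threaded engine and an integer severity scale where higher means more urgent. The class name, method names, event kinds and payloads are hypothetical and are not taken from the ENSURE code base; concurrency handling is left out.

```python
import heapq
import itertools
from collections import defaultdict


class EventEngine:
    """Toy event engine: queue events, dispatch the most severe first."""

    def __init__(self):
        self._queue = []                     # heap of (-severity, seq, kind, payload)
        self._seq = itertools.count()        # tie-breaker keeps arrival order
        self._listeners = defaultdict(list)  # event kind -> callbacks

    def subscribe(self, kind, listener):
        """Register a component callback for one kind of event."""
        self._listeners[kind].append(listener)

    def publish(self, kind, severity, payload):
        """Queue an event; severity is negated because heapq is a min-heap."""
        heapq.heappush(self._queue, (-severity, next(self._seq), kind, payload))

    def dispatch_all(self):
        """Deliver queued events in severity order; return the kinds delivered."""
        delivered = []
        while self._queue:
            _, _, kind, payload = heapq.heappop(self._queue)
            for listener in self._listeners.get(kind, ()):
                listener(kind, payload)
            delivered.append(kind)
        return delivered


engine = EventEngine()
log = []
engine.subscribe("format-obsolete", lambda k, p: log.append(("PDALM", k, p)))
engine.subscribe("regulation-change", lambda k, p: log.append(("Configurator", k, p)))

engine.publish("format-obsolete", 2, "format X no longer supported")
engine.publish("regulation-change", 5, "retention period extended")
engine.publish("cost-change", 1, "storage price update")  # nobody subscribed

order = engine.dispatch_all()
print(order)
print(log)
```

The regulatory event is delivered first even though it arrived second, which is the prioritization behavior the slide attributes to the event engine.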

11 Automate preservation lifecycle: ontology update
1. Select ontology to update
2. Upload a new version and display potential system impacts
3. Apply new ontology and update system

13 Scale: What is a cloud, why is it interesting, and what are the issues?
“Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources … that can be rapidly provisioned and released with minimal management effort or service provider interaction.”
–US National Institute of Standards and Technology, Information Technology Laboratory
Benefits
• Cost Savings
  – Economies of scale, utilization improvement and standardization
• Speed and Agility
• Pay-as-you-go for usage
Issues for preservation
• Limited support for rich metadata, e.g., no search
• Differences in security models
• Encryption may limit preservation actions
• Compute near the storage (storlets)
• Logical connections among objects in the same and different clouds
• Standards
[Diagram: Cloud Delivery Models. Labels: Enterprise Data Center, Private Cloud; Community Cloud Services (Enterprise A, Enterprise B, Enterprise C); Public Cloud Services (User A, User B, User C, User D, User E)]

14 Scale: Mapping Data to Multiple Clouds
• Map OAIS AIPs and the links among AIPs to the cloud data model
• Manage objects' inter-relationships and referential integrity
• Map objects to one or more clouds
[Diagram labels: Cloud A, Cloud B, Protect]
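
One way to picture the referential-integrity concern on this slide is a catalog that records, for each AIP, which other AIPs it links to and which clouds hold a copy. The sketch below is an assumption-laden illustration: the `AIP` class, its field names and the identifiers are hypothetical, and it is not the ENSURE data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class AIP:
    aip_id: str
    links: List[str] = field(default_factory=list)  # ids of related AIPs
    clouds: Set[str] = field(default_factory=set)   # clouds holding a copy


def dangling_links(catalog: Dict[str, AIP]) -> List[Tuple[str, str]]:
    """Return (source, target) pairs whose target is unknown or stored nowhere."""
    broken = []
    for aip in catalog.values():
        for target in aip.links:
            if target not in catalog or not catalog[target].clouds:
                broken.append((aip.aip_id, target))
    return broken


# Hypothetical catalog: one study record linked to two other packages.
catalog = {
    "study-001": AIP("study-001", links=["consent-001", "image-007"],
                     clouds={"cloud-a", "cloud-b"}),
    "consent-001": AIP("consent-001", clouds={"cloud-b"}),
    # "image-007" was never ingested, so the link to it dangles.
}

broken = dangling_links(catalog)
print(broken)
```

A check like this would need to run whenever an object is moved or deleted in any one cloud, since links may cross cloud boundaries.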

15 Scale: Accessing Content with a Virtual Appliance (VA)
1. Request to access content with VA
2. Instantiate VA
3. Extract content into VA
4. Give user access to VA with content
[Diagram labels: Compute Cloud, Private Application Library, Storage Cloud, ENSURE]

17 Content-aware data protection: Masked/Anonymized Data
• Data Owner Requirement:
  – Data should be anonymized so that it cannot be associated with a specific individual
• Example:
  – Living people from London who fought in WWII are becoming more and more identifiable
[Diagram labels: Data Owners, Data Receivers, hospital, bank, factory, Telco, Medical Research, Software testing, Statistical Analysis, Pharma Research, Full data, Masked data, Masking Services]
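
The WWII example hints at why masking has to be re-evaluated over time: a group that was once large enough to hide an individual can shrink. The sketch below illustrates this with a simple generalization step and a group-size check in the spirit of k-anonymity. It is not the ENSURE masking service; the records, field names and the choice of k are invented.

```python
from collections import Counter
from typing import Dict, List, Tuple

Record = Dict[str, str]


def mask(record: Record) -> Tuple[str, str]:
    """Drop the name and generalize the remaining quasi-identifiers."""
    decade = record["birth_year"][:3] + "0s"  # 1921 -> 1920s
    district = record["postcode"].split()[0]  # keep only the outward code
    return (decade, district)


def risky_groups(records: List[Record], k: int) -> List[Tuple[str, str]]:
    """Masked groups with fewer than k members, i.e. still too identifiable."""
    counts = Counter(mask(r) for r in records)
    return sorted(group for group, n in counts.items() if n < k)


# Made-up records for illustration only.
records = [
    {"name": "person-1", "birth_year": "1921", "postcode": "N1 9GU"},
    {"name": "person-2", "birth_year": "1923", "postcode": "N1 4QP"},
    {"name": "person-3", "birth_year": "1925", "postcode": "N1 0XX"},
    {"name": "person-4", "birth_year": "1922", "postcode": "SW1A 1AA"},
]

flagged = risky_groups(records, k=2)
print(flagged)
```

As members of a group leave the data set, more groups fall below k, so data that was safely masked at ingest may need coarser generalization later.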

18 Summary
• Architect and build the next generation preservation system, ensuring knowledge is sustained and can be recovered for future value
• Key Innovations:
  – Evaluate Cost and Value supporting business decisions
  – Automate Preservation Lifecycle
  – Scale using ICT innovations
  – Content-aware data protection
• Three use cases to demonstrate future preservation
  – Healthcare, clinical trials, and finance
• Status
  – Initial end-to-end demo of two use cases in the first year
  – Emphasis on evolution over time for the second year

19 Thank You

20 Backup

21 Open Archival Information System (OAIS) ISO 14721:2002
• SIP = Submission Information Package
• AIP = Archival Information Package
• DIP = Dissemination Information Package
[Figures: Functional Model; Information Model; Archival Information Package]

22 What is a cloud and why is it interesting?
“Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”
–US National Institute of Standards and Technology, Information Technology Laboratory
Key features:
• On-demand
• Shared
• Automated
• Network access
Benefits
• Cost Savings
  – Economies of scale, utilization improvement and standardization
• Speed and Agility
• Pay-as-you-go for usage
[Chart: Investment per GB vs. Quantity of Information]

23 Survey question: How Much of a Concern are the Following Business Risks Posed by Cloud Business Services to your Business Function, Compared to Your Existing Risks for Non-Cloud Business Services?
Security, privacy, lack of control in data placement, lock-in and compliance are key concerns with cloud
Source: “Cloud will Transform Business as We Know It: The Secret’s in the Source”, HfS Research and the London School of Economics, December 2010

24 IBM’s five cloud delivery models
Models shown: Private Cloud (Enterprise Data Center); Managed Private Cloud (Enterprise Data Center, IBM operated); Hosted Private Cloud (IBM owned and operated); Shared Cloud Services (Enterprise A, Enterprise B, Enterprise C); Public Cloud Services (User A, User B, User C, User D, User E)
Characteristics, in slide order:
• Enterprise owned; either enterprise operation or 3rd party; fixed price or time and materials services; internal network; dedicated assets
• 3rd party owned and operated; centralized, secure delivery center; fixed price, time and materials, or pay as you go; internal network; dedicated assets
• Mix of shared and dedicated resources; shared facility and staff; pay as you go; VPN access or public internet
• Shared resources; elastic scaling; pay as you go; public internet
Community Clouds should be considered by memory institutions

25 Scale: Cloud Gap Analysis
• Clouds considered
  – Amazon S3 and EC2 (enterprise)
  – OpenStack Swift and Nova (open source)
  – VISION Cloud (EC research)
• Some common shortcomings for long-term preservation
  – Limited support of user metadata
  – Lack of support for searches on metadata
  – Differences in supported security models
  – Encryption models limit preservation actions
  – Lack of support for compute near the storage
  – Lack of support for logical connections among objects in the same and different clouds

26 Scale: Computational Storage
• Cloud storage generally:
  – Uses server-based storage with powerful CPUs
  – Serves big data accessed from anywhere over the WAN
  – ==> add computational modules (storlets) to the cloud storage
• What is a storlet?
  – A restricted module executed in the storage, close to the data
• Why/when to use storlets?
  – Reduce bandwidth
  – Security: reduce exposure of private data
  – Preservation: data in storage may change and be more up-to-date
  – Expose generic functions that can be used by many applications
• Example storlets:
  – Transformation
  – Anonymization
  – Data Mining
  – Fixity check
  – Encryption/Secure delete
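
To make the fixity-check example concrete, here is a sketch of the kind of computation a storlet might run next to the data: it streams the object, hashes it, and returns only a small report, so the object itself never crosses the WAN. The function name, its signature and the report format are assumptions for illustration; this is not the storlet API of any particular cloud.

```python
import hashlib
import io
from typing import BinaryIO, Dict, Union


def fixity_storlet(stream: BinaryIO, expected_sha256: str,
                   chunk_size: int = 64 * 1024) -> Dict[str, Union[int, str, bool]]:
    """Hash an object in chunks and compare with the digest recorded at ingest."""
    digest = hashlib.sha256()
    size = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        size += len(chunk)
    actual = digest.hexdigest()
    return {"bytes_read": size, "sha256": actual, "intact": actual == expected_sha256}


# Simulate a stored object and the digest that was recorded for it.
data = b"archival payload" * 1000
expected = hashlib.sha256(data).hexdigest()

report = fixity_storlet(io.BytesIO(data), expected)
corrupted = fixity_storlet(io.BytesIO(data[:-1] + b"X"), expected)
print(report["intact"], corrupted["intact"])
```

Only the few bytes of the report leave the storage node, which is the bandwidth argument made on the slide.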

27 Scale: Use of Open Standards and Open Source
• jClouds (open source) to access multiple clouds
• Cloud Data Management Interface (CDMI) (standard interface) for cloud access and management
  – Contribute CDMI support to jClouds
• OpenStack Swift (open source) as private cloud infrastructure

28 Content-aware data protection: Vocabulary of an Access Policy
• Who are the actors (doctor, nurse, gynecologist, ...)
• What are the actions they can take (create, read, append, update, ...)
• What are the data objects that are subject to access policies (PHR, GI,
• What are the purposes for which access is given (treatment, research, billing, ...)
• What are the types of conditions mentioned in the access rules (time, place, consent, ...)
• What types of obligations must be fulfilled before access is granted (external: notify, consent, ...; data-related: anonymize, ...)
Rule form: Actor has permission to take action on data object for the purpose under the conditions with obligations.
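
The rule form on this slide maps naturally onto a small data structure. The sketch below is one possible encoding, assuming exact matching on actor, action, data object and purpose, and treating conditions as a set that must all be satisfied. The `Rule` class, the `decide` function and the sample rules are hypothetical and do not reproduce the ENSURE policy engine.

```python
from dataclasses import dataclass
from typing import AbstractSet, FrozenSet, List, Optional


@dataclass(frozen=True)
class Rule:
    actor: str
    action: str
    data_object: str
    purpose: str
    conditions: FrozenSet[str] = frozenset()   # must all hold for access
    obligations: FrozenSet[str] = frozenset()  # must be fulfilled on access


def decide(rules: List[Rule], actor: str, action: str, data_object: str,
           purpose: str, satisfied: AbstractSet[str]) -> Optional[FrozenSet[str]]:
    """Return the obligations of the first matching rule, or None if access is denied."""
    request = (actor, action, data_object, purpose)
    for rule in rules:
        key = (rule.actor, rule.action, rule.data_object, rule.purpose)
        if key == request and rule.conditions <= satisfied:
            return rule.obligations
    return None


# Illustrative rules built from the slide's vocabulary.
rules = [
    Rule("doctor", "read", "PHR", "treatment",
         conditions=frozenset({"consent"}), obligations=frozenset({"notify"})),
    Rule("doctor", "read", "PHR", "research",
         obligations=frozenset({"anonymize"})),
]

with_consent = decide(rules, "doctor", "read", "PHR", "treatment", {"consent"})
without_consent = decide(rules, "doctor", "read", "PHR", "treatment", set())
for_research = decide(rules, "doctor", "read", "PHR", "research", set())
print(with_consent, without_consent, for_research)
```

Returning the obligations rather than a plain yes/no lets the caller enforce data-related obligations, such as anonymizing the record, before the data is released.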

29 Content-aware data protection: Compromise
• Share data with changes:
• Data Owner Requirement:
  – Data should be anonymized so that it cannot be associated with a specific individual
• Example:
  – Living people from London who fought in WWII are becoming identifiable as the years pass.
[Diagram labels: Data Owner, hospital, bank, factory, De-Identification, Data Receiver]

