
1 Accessing HTRC Data

2 What is the HathiTrust Research Center? A collaborative research center launched jointly by Indiana University and the University of Illinois, along with the HathiTrust Digital Repository. It provides computational access for non-profit and educational users to published works stored within HathiTrust. It develops cutting-edge software tools and cyberinfrastructure to enable advanced computational access to massive amounts of digital text.

3 HathiTrust Corpus at a Glance. Public domain: 27%. Copyrighted: 73%.

4 Corpus Usage Patterns

5 Corpus Usage Patterns (cont’d) [Figure: six volumes, each shown with its Table of Contents; callouts for Chapter 1 and Page IV]

6 [Architecture diagram; component labels:] Agent framework; Agent instance; Page/volume tree (file system); Authoritative volume store (Cassandra); Replicated volume stores; Solr indexes; SEASR analytics service; Desktop SEASR client; Meandre Orchestration; Web portal; Programmatic access; Task deployment; WSO2 registry (services, collections, data capsule images); WSO2 Enterprise Service Bus; CILogon (NCSA); Access control (e.g. Grouper); Non-consumptive data capsules; HathiTrust corpus (rsync from University of Michigan); FutureGrid; NCSA local resources; NCSA HPC resources; Penguin on Demand.

7 Initial secure capsule VM instance. The HTRC Corpus storage is not mounted. The computation result storage is not mounted. The network is open, allowing any outgoing traffic from the VM. This is a preview of the Secure Capsule design; the project started in October 2011. [Diagram labels: Secure Capsule VM; HathiTrust Corpus; Computation Result Storage; Trusted Snapshot Storage; Network Open; Trusted Hypervisor (KVM); OpenStack Nova-compute]

8 Before deploying the analysis algorithm on the user’s behalf, the trusted hypervisor takes a snapshot of the VM instance and stores the snapshot in trusted storage (accessible only to the hypervisor). [Diagram labels: Secure Capsule VM; HathiTrust Corpus; Computation Result Storage; Trusted Snapshot Storage; Network Open; Trusted Hypervisor (KVM); OpenStack Nova-compute; Take VM snapshot; Store VM snapshot; VM snapshot]

9 The user algorithm is deployed in the secure capsule VM. The trusted hypervisor restricts outgoing network traffic to authorized and trusted destinations only. [Diagram labels: Secure Capsule VM; HathiTrust Corpus; Computation Result Storage; Trusted Snapshot Storage; Network Restricted; VM snapshot; User algorithm; Trusted Hypervisor (KVM); OpenStack Nova-compute]
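The network restriction on this slide can be pictured as an allowlist check on each outgoing connection. The following is only a minimal sketch under stated assumptions: the function name `egress_allowed`, the `TRUSTED_NETWORKS` list, and the example addresses (taken from reserved documentation ranges) are all hypothetical, and the slides do not say how the hypervisor actually enforces the rule.

```python
import ipaddress

# Hypothetical allowlist of "authorized and trusted destinations".
# 192.0.2.0/24 is a documentation-only range, used here as a stand-in.
TRUSTED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def egress_allowed(dest_ip: str, restricted: bool) -> bool:
    """Return True if the capsule VM may open a connection to dest_ip."""
    if not restricted:
        # Network Open: any outgoing traffic is allowed.
        return True
    # Network Restricted: only destinations on the allowlist pass.
    addr = ipaddress.ip_address(dest_ip)
    return any(addr in net for net in TRUSTED_NETWORKS)


open_mode = egress_allowed("198.51.100.9", restricted=False)
trusted_dest = egress_allowed("192.0.2.15", restricted=True)
untrusted_dest = egress_allowed("198.51.100.9", restricted=True)
```

In this sketch the same untrusted destination is reachable while the network is open and blocked once the restriction is applied.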

10 The trusted hypervisor mounts the HTRC Corpus storage to the secure capsule VM. The trusted hypervisor mounts the computation result storage to the secure capsule VM. [Diagram labels: Secure Capsule VM; HathiTrust Corpus; Computation Result Storage; Trusted Snapshot Storage; Network Restricted; VM snapshot; User algorithm; Trusted Hypervisor (KVM); OpenStack Nova-compute]

11 As the user algorithm executes, it reads text from the HTRC Corpus and writes the computation result out to the result storage. The restricted network interface prevents the user algorithm from leaking any confidential data. [Diagram labels: Secure Capsule VM; HathiTrust Corpus; Computation Result Storage; Trusted Snapshot Storage; Network Restricted; VM snapshot; User algorithm; Read text; Write result; Trusted Hypervisor (KVM); OpenStack Nova-compute]

12 After the user algorithm finishes, the trusted hypervisor unmounts the HTRC Corpus storage and the computation result storage from the VM instance. [Diagram labels: Secure Capsule VM; HathiTrust Corpus; Computation Result Storage; Trusted Snapshot Storage; Network Restricted; VM snapshot; User algorithm; Trusted Hypervisor (KVM); OpenStack Nova-compute]

13 The trusted hypervisor retrieves the snapshot from the trusted storage and restores the VM from it. This effectively erases any data that may be lingering in the VM from executing the user algorithm. The trusted hypervisor then lifts the restriction on the network interface, so any outgoing traffic is allowed again. [Diagram labels: Secure Capsule VM; HathiTrust Corpus; Computation Result Storage; Trusted Snapshot Storage; Network Open; VM snapshot; Retrieve VM snapshot; Restore VM snapshot; Trusted Hypervisor (KVM); OpenStack Nova-compute]
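The capsule lifecycle walked through on slides 7 to 13 can be summarized as a small state model. What follows is a toy sketch, not the HTRC implementation: the classes `CapsuleVM` and `TrustedHypervisor`, the `scratch` field standing in for data left behind in the VM, and the `word_count` algorithm are all hypothetical names introduced for illustration, and no real KVM or OpenStack calls are involved.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class CapsuleVM:
    """Toy stand-in for a secure capsule VM (hypothetical)."""
    network_open: bool = True
    corpus_mounted: bool = False
    results_mounted: bool = False
    scratch: List[str] = field(default_factory=list)  # data lingering in the VM


class TrustedHypervisor:
    """Toy stand-in for the trusted hypervisor (hypothetical)."""

    def __init__(self) -> None:
        self.snapshot_storage: List[CapsuleVM] = []  # only the hypervisor uses this
        self.result_storage: List[int] = []

    def run(self, vm: CapsuleVM, algorithm: Callable[[str], int],
            corpus: Iterable[str]) -> CapsuleVM:
        # Slide 8: take a snapshot and store it in trusted storage.
        self.snapshot_storage.append(copy.deepcopy(vm))
        # Slide 9: restrict the network before the user algorithm runs.
        vm.network_open = False
        # Slide 10: mount corpus and result storage.
        vm.corpus_mounted = True
        vm.results_mounted = True
        # Slide 11: the algorithm reads text and writes results.
        for text in corpus:
            vm.scratch.append(text)  # corpus text now lingers inside the VM
            self.result_storage.append(algorithm(text))
        # Slide 12: unmount both storages.
        vm.corpus_mounted = False
        vm.results_mounted = False
        # Slide 13: restore the snapshot (dropping lingering data), reopen network.
        restored = self.snapshot_storage.pop()
        restored.network_open = True
        return restored


def word_count(text: str) -> int:
    return len(text.split())


hypervisor = TrustedHypervisor()
vm = hypervisor.run(CapsuleVM(), word_count,
                    ["call me Ishmael", "it was the best of times"])
```

After `run` returns, the restored VM holds no corpus text in `scratch`, both storages are unmounted, and the network is open again, while the word counts remain in `result_storage` outside the VM.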

