

Accessing HTRC Data

What is the HathiTrust Research Center?
A collaborative research center launched jointly by Indiana University and the University of Illinois, along with the HathiTrust Digital Repository.
Computational access for non-profit and educational users to published works stored within HathiTrust.
Cutting-edge software tools and cyberinfrastructure to enable advanced computational access to massive amounts of digital text.

HathiTrust Corpus at a Glance
[Chart: public domain 27%, copyrighted 73%]

Corpus Usage Patterns

Corpus Usage Patterns (cont’d)
[Figure: panels labeled "Chapter 1", "Page IV", and six copies of "Table of Contents"]

[Architecture diagram. Labeled components: Agent framework; Page/volume tree (file system); Authoritative volume store (Cassandra); SEASR analytics service; Web portal; Desktop SEASR client; Task deployment; WSO2 registry services, collections, data capsule images; Solr indexes; HathiTrust corpus; rsync; WSO2 Enterprise service bus; Future Grid; NCSA local resources; Penguin on Demand; Replicated volume stores; Programmatic access; CI logon (NCSA); Access control (e.g. Grouper); University of Michigan; Meandre Orchestration; Agent instance; Non-consumptive Data capsules; NCSA HPC resources]

Initial secure capsule VM instance:
HTRC Corpus storage is not mounted.
Computation result storage is not mounted.
The network is open, allowing any outgoing traffic from the VM.
This is a preview of the Secure Capsule design. The project started in October 2011.
[Diagram labels: Secure Capsule VM; HathiTrust Corpus; Computation Result Storage; Trusted Snapshot Storage; Network: Open; Trusted Hypervisor (KVM); OpenStack Nova-compute]

Before deploying the analysis algorithm on the user’s behalf:
The trusted hypervisor takes a snapshot of the VM instance.
The trusted hypervisor stores the snapshot in trusted storage (accessible only to the hypervisor).
[Diagram labels: Secure Capsule VM; HathiTrust Corpus; Computation Result Storage; Trusted Snapshot Storage; Network: Open; Trusted Hypervisor (KVM); OpenStack Nova-compute; Take VM snapshot; Store VM snapshot; VM snapshot]

The user algorithm is deployed in the secure capsule VM.
The trusted hypervisor restricts outgoing network traffic to authorized and trusted destinations only.
[Diagram labels: Secure Capsule VM; HathiTrust Corpus; Computation Result Storage; Trusted Snapshot Storage; Network: Restricted; VM snapshot; User algorithm; Trusted Hypervisor (KVM); OpenStack Nova-compute]
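The idea of limiting outgoing traffic to authorized destinations can be illustrated with a small allowlist check. This is only a sketch: the slides do not say how the hypervisor enforces the restriction, and the function name `egress_allowed`, the `ALLOWED_NETWORKS` list, and the addresses (taken from reserved documentation ranges) are all assumptions made for illustration.

```python
import ipaddress

# Hypothetical allowlist of destinations the capsule may still reach
# while restricted (192.0.2.0/24 is a reserved documentation range).
ALLOWED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def egress_allowed(destination: str, restricted: bool) -> bool:
    """Return True if outgoing traffic to `destination` should be permitted."""
    if not restricted:
        # Open mode: any outgoing traffic from the VM is allowed.
        return True
    addr = ipaddress.ip_address(destination)
    # Restricted mode: only destinations inside an allowlisted network pass.
    return any(addr in net for net in ALLOWED_NETWORKS)
```

In a real deployment this decision would be made outside the VM (for example in firewall rules managed by the hypervisor host), so that code running inside the capsule cannot change it.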

The trusted hypervisor mounts the HTRC Corpus storage to the secure capsule VM.
The trusted hypervisor mounts the computation result storage to the secure capsule VM.
[Diagram labels: Secure Capsule VM; HathiTrust Corpus; Computation Result Storage; Trusted Snapshot Storage; Network: Restricted; VM snapshot; User algorithm; Trusted Hypervisor (KVM); OpenStack Nova-compute]

As the user algorithm executes, it reads text from the HTRC Corpus and writes the computation result to the result storage.
The restricted network interface prevents the user algorithm from leaking any confidential data.
[Diagram labels: Secure Capsule VM; HathiTrust Corpus; Computation Result Storage; Trusted Snapshot Storage; Network: Restricted; VM snapshot; User algorithm; Read text; Write result; Trusted Hypervisor (KVM); OpenStack Nova-compute]

After the user algorithm finishes, the trusted hypervisor unmounts the HTRC Corpus storage and the computation result storage from the VM instance.
[Diagram labels: Secure Capsule VM; HathiTrust Corpus; Computation Result Storage; Trusted Snapshot Storage; Network: Restricted; VM snapshot; User algorithm; Trusted Hypervisor (KVM); OpenStack Nova-compute]

The trusted hypervisor retrieves the snapshot from the trusted storage and restores the VM from it. This effectively erases any data that may be lingering in the VM from executing the user algorithm.
The trusted hypervisor lifts the restriction on the network interface, so any outgoing traffic is allowed again.
[Diagram labels: Secure Capsule VM; HathiTrust Corpus; Computation Result Storage; Trusted Snapshot Storage; Network: Open; VM snapshot; Restore VM snapshot; Retrieve VM snapshot; Trusted Hypervisor (KVM); OpenStack Nova-compute]
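The capsule lifecycle described across these slides (snapshot, restrict, mount, run, unmount, restore, reopen) can be sketched as a toy in-memory model. Nothing here reflects the actual HTRC implementation or any KVM/OpenStack API: the `Capsule` class, `run_in_capsule`, and the sample `word_count` algorithm are hypothetical names used only to show the ordering of the steps and why restoring the snapshot erases leftover data.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Capsule:
    """Toy stand-in for a secure capsule VM (hypothetical, for illustration)."""
    disk: dict = field(default_factory=dict)   # stands in for VM state
    mounts: set = field(default_factory=set)   # storage currently mounted
    network_restricted: bool = False


def run_in_capsule(capsule, algorithm, corpus):
    """Walk the capsule through the steps in the order the slides give them."""
    snapshot = copy.deepcopy(capsule.disk)         # take and store a VM snapshot
    capsule.network_restricted = True              # restrict outgoing traffic
    capsule.mounts |= {"corpus", "results"}        # mount corpus and result storage
    try:
        results = algorithm(corpus, capsule.disk)  # read text, write result
    finally:
        capsule.mounts -= {"corpus", "results"}    # unmount both stores
        capsule.disk = copy.deepcopy(snapshot)     # restore snapshot, erasing residue
        capsule.network_restricted = False         # reopen the network
    return results


def word_count(corpus, scratch):
    """Sample user algorithm that leaves corpus text behind in the VM."""
    scratch["cached_text"] = " ".join(corpus)
    return sum(len(page.split()) for page in corpus)


capsule = Capsule(disk={"os": "base image"})
total = run_in_capsule(capsule, word_count, ["call me Ishmael", "some years ago"])
```

After the run, `capsule.disk` no longer contains the `cached_text` entry the algorithm wrote, which mirrors the point of the final slide: only the result written to result storage survives, not anything left inside the VM.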