UT Research Data Repository
Chris Jordan, UT Research Cyberinfrastructure Storage Committee Chair

2 Outline
- UTRC introduction and current status
- Research data requirements
- Current TACC storage infrastructure (Corral)
- New UTRC capabilities
- External services and partnerships
- Research and the UTRC future

3 UT Research Cyberinfrastructure
- Collaborative effort initiated by Dr. Ken Shine, Vice Chancellor for Health
- Co-chairs: Jay Boisseau (TACC) and Brian Herman (UTHSCSA)
- Assessment of research CI needs across UT System campuses
- Data storage emerged as the highest priority and the biggest unmet need

4 UTRC Proposal
- Approved by the UT Board of Regents in November 2010
- Expanded Lonestar 4 for HPC needs
- Establish a dedicated 10 Gb/s research network to all campuses
- Develop a replicated, 5 PB Research Data Repository

5 Storage Committee Activities
- Proposed an iterative approach, with a pilot deployment in late 2011
- First half of 2011 spent on requirements and architecture development
- RFP released in June
- Vendor selected in August
- Installation in October
- Initial users ~December

6 Sidebar: Why the Cloud Is Not the Answer
- Cloud storage costs run to $1000s/TB/year
- Often not as reliable as advertised (Google and Amazon have both had major issues)
- Restrictive interfaces and a lack of high-performance access
- Issues with institutional control, security integration, etc.
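The following arithmetic puts the sidebar's price range in perspective. It is a minimal sketch that assumes decimal units (1 PB = 1,000 TB) and the low end of the quoted range ($1,000/TB/year). The function name `cloud_cost_per_year` is hypothetical.

```python
def cloud_cost_per_year(capacity_pb: float, usd_per_tb_year: float) -> float:
    """Annual bill for keeping `capacity_pb` petabytes in storage priced per TB per year.

    Assumes decimal units: 1 PB = 1,000 TB.
    """
    capacity_tb = capacity_pb * 1000
    return capacity_tb * usd_per_tb_year


# Low end of the sidebar's "$1000s/TB/year" range, applied to the planned 5 PB.
print(cloud_cost_per_year(5, 1000))  # 5000000
```

Under those assumptions, keeping the planned 5 PB in a cloud service would cost about $5 million per year.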

7 Pilot UTRDR Deployment
- 5 PB raw storage in each of two installations
- Main installation at TACC, added to the existing data infrastructure
- Mirror installation at Arlington for replication
- High level of redundancy within each installation
  - From power supplies to storage controllers and servers

8 Research Data Requirements
- Persistent storage is just the beginning
- High reliability/availability is key
- Complex, evolving security needs
- Importance of collaboration
- Data applications and services
- Data management and analysis
- Also, it has to be cheap (or free)

9 Research Data Security
- HIPAA compliance is a major goal of the UTRDR effort
- But HIPAA is just the beginning
- Intellectual property and research confidentiality issues are more fine-grained
- Long-term issues of availability/usability
- Tiers of access that change over time
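One way to model "tiers of access that change over time" is to give each collection a schedule of access tiers keyed by effective date. An example is an embargo that relaxes from private, to collaborators, to public. This is a minimal Python sketch under that assumption. The tier and role names, `AccessSchedule`, and `can_read` are hypothetical and do not describe how UTRDR actually enforces access.

```python
from dataclasses import dataclass
from datetime import date
from enum import IntEnum


class Role(IntEnum):
    """Hypothetical requester roles, from least to most trusted."""
    PUBLIC = 0
    COLLABORATOR = 1
    OWNER = 2


class Tier(IntEnum):
    """Hypothetical collection tiers; the value is the minimum role needed to read."""
    PUBLIC = 0
    COLLABORATORS = 1
    PRIVATE = 2


@dataclass
class AccessSchedule:
    """A collection's access tier as (effective_date, tier) steps."""
    steps: list[tuple[date, Tier]]

    def tier_on(self, day: date) -> Tier:
        # Take the latest step already in effect on `day`; before the
        # first step, fall back to the most restrictive tier.
        current = Tier.PRIVATE
        for effective, tier in sorted(self.steps):
            if effective <= day:
                current = tier
        return current


def can_read(schedule: AccessSchedule, role: Role, day: date) -> bool:
    """A requester may read if their role meets the tier in effect that day."""
    return int(role) >= int(schedule.tier_on(day))


# Example: private until 2012, shared with collaborators, public from 2015.
embargo = AccessSchedule([
    (date(2012, 1, 1), Tier.COLLABORATORS),
    (date(2015, 1, 1), Tier.PUBLIC),
])
print(can_read(embargo, Role.COLLABORATOR, date(2013, 6, 1)))  # True
print(can_read(embargo, Role.PUBLIC, date(2013, 6, 1)))        # False
```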

10 Example Application Areas
- Biology
  - Biodiversity (natural history collections)
  - Phylogenetics
- Health sciences
  - Medical imaging
  - High-throughput sequencing
- Social sciences
  - Economic and social analysis

11 TACC Corral Architecture
- Emphasis on large-scale storage and a highly flexible service infrastructure
- Fast networks + heterogeneous systems = malleable service and storage platform
- Allows integration of UTRC hardware into an existing infrastructure
- Near-transparent migration for existing users
- Expansion improves reliability and availability

12 Corral Hardware and Services
- 1.2 PB of DataDirect SATA disk and 16 Dell servers
- ~300 TB of heterogeneous disks and servers
- High-performance parallel file system, multiple databases, iRODS data management, replication to tape archive
- Multiple levels of access control
- Supports almost any imaginable data need

13 iRODS at TACC
- Distributed/replicated data management across Corral, Ranch, and offsite storage systems
- Extensible metadata support
- Policy/rule-based automation and enforcement
- Used for sophisticated data management needs
- Provides a wide variety of interfaces
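iRODS expresses policies as rules that fire at policy enforcement points, for instance after a file is uploaded. The Python sketch below only imitates that idea. It is not iRODS code or its rule language. A hypothetical `on_put` hook records the copy on the landing resource, then applies every `ReplicationRule` whose path pattern matches. The rules and the resource names "corral", "ranch", and "offsite" are illustrative assumptions.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class ReplicationRule:
    """Hypothetical policy: objects whose path matches `pattern` need copies on `resources`."""
    pattern: str
    resources: tuple[str, ...]


@dataclass
class DataObject:
    path: str
    replicas: set[str] = field(default_factory=set)


def on_put(obj: DataObject, landing_resource: str,
           rules: list[ReplicationRule]) -> DataObject:
    """Post-upload hook: record the landing copy, then enforce each matching rule."""
    obj.replicas.add(landing_resource)
    for rule in rules:
        if fnmatchcase(obj.path, rule.pattern):
            # A real system would copy the bytes here; this sketch only records
            # which resources should hold a replica.
            obj.replicas.update(rule.resources)
    return obj


rules = [
    ReplicationRule("/utrdr/*/imaging/*", ("ranch",)),  # copy imaging data to the archive
    ReplicationRule("*.dcm", ("offsite",)),             # extra offsite copy of DICOM files
]
scan = on_put(DataObject("/utrdr/lab1/imaging/scan1.dcm"), "corral", rules)
print(sorted(scan.replicas))  # ['corral', 'offsite', 'ranch']
```

Keeping the rules as data, separate from the hook, mirrors the point on the slide: policies can be added or changed without touching the code that handles uploads.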

14 Current Corral Usage
- >30 data allocations and collections
- 350 users at TACC and UT
- >500 external users accessing collections
- >500 TB of research and reference data
- Data of all types and disciplines:
  - Plant specimens and omics, MRI, GIS, simulations, fish and pottery, economics and medicine

15 Added Capabilities with UTRDR
- Synchronous replication
- Very high availability (weather, comet strikes)
- Tiers of storage and data management
- Huge performance boost (>80 GB/s)
- Accessibility from all UT System campuses
- HIPAA compliance
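Synchronous replication means a write is not acknowledged until every site holds the new data, so the TACC and Arlington copies match after each acknowledged write. The minimal Python sketch below shows that contract with in-memory stand-ins. The `Site` class, the site names, and the roll-back-on-failure behavior are assumptions for illustration, not a description of the UTRDR storage software.

```python
class Site:
    """Hypothetical storage site that keeps objects in memory."""

    def __init__(self, name: str, online: bool = True):
        self.name = name
        self.online = online
        self.objects: dict[str, bytes] = {}

    def store(self, key: str, data: bytes) -> None:
        if not self.online:
            raise OSError(f"{self.name} is unreachable")
        self.objects[key] = data


def synchronous_write(key: str, data: bytes, sites: list[Site]) -> bool:
    """Return True only once every site holds `data` under `key`.

    If any site fails, restore the sites already written so all copies match.
    """
    previous = {}  # site name -> prior contents (None if the key was new)
    written = []
    try:
        for site in sites:
            previous[site.name] = site.objects.get(key)
            site.store(key, data)
            written.append(site)
    except OSError:
        # Undo partial writes so no site is left ahead of the others.
        for site in written:
            old = previous[site.name]
            if old is None:
                site.objects.pop(key, None)
            else:
                site.objects[key] = old
        return False
    return True


tacc, arlington = Site("tacc"), Site("arlington")
print(synchronous_write("scan1", b"v1", [tacc, arlington]))  # True
arlington.online = False
print(synchronous_write("scan1", b"v2", [tacc, arlington]))  # False
print(tacc.objects["scan1"])  # b'v1'
```

A real system would also have to cope with a site failing during the rollback itself. This sketch ignores that case.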

16 UTRDR Pilot Access
- Accelerated access for early adopters
- Allows us to shake out bugs and assess readiness for production
- Helps develop present and future requirements
- Research network performance assessment
- Expect to open to all UT System researchers in early 2012

17 UTRDR Long-Term Sustainability
- After the pilot phase, storage will be free to all PIs up to some small limit (5 TB?)
- Additional storage will be available for a per-TB cost-recovery fee
- Currently only trying to recoup costs on an annual basis
- Long-term preservation costs are TBD but are of major interest
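The proposed sustainability model reduces to a simple formula: storage is free up to a small per-PI limit, and beyond that a per-TB cost-recovery fee is charged annually. This Python sketch assumes the tentative 5 TB limit from the slide and a purely hypothetical rate of $100 per TB per year. No actual UTRDR rate is stated.

```python
def annual_storage_fee(allocated_tb: float, free_tb: float = 5.0,
                       rate_per_tb: float = 100.0) -> float:
    """Yearly cost-recovery charge for one PI's allocation.

    The first `free_tb` terabytes are free; each additional terabyte is billed
    at `rate_per_tb` per year. Both defaults are placeholders: the 5 TB limit
    is tentative on the slide, and the $100/TB/year rate is invented.
    """
    if allocated_tb < 0:
        raise ValueError("allocation cannot be negative")
    billable_tb = max(0.0, allocated_tb - free_tb)
    return billable_tb * rate_per_tb


print(annual_storage_fee(3))   # 0.0: within the free limit
print(annual_storage_fee(12))  # 700.0: 7 TB over the limit at $100/TB
```

Because the rate is fixed per TB per year, a multi-year budget line is the annual figure times the number of years.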

18 Fee-Based Research Storage
- Two major types of service:
  - Simple storage (iSCSI, SCP/FTP) based on per-TB/year costs
  - Application services (databases, web applications, data management, etc.)
- Provides fixed, relatively low costs that can be written into grant proposals
- Can include both disk and tape + offsite storage
- Long-term model for UTRDR

19 Existing/Upcoming Partnerships
- University of Alaska
- UC Berkeley
- University of North Texas Libraries
- Texas Digital Library
- University of Florida
- Indiana University
- NSF XSEDE (15 institutions)

20 UTRC Plan 2012-2013
- Initial production in early 2012
- Design assessment and adjustment based on initial experiences
- Expansion proposal mid-2012
- Significant expansion likely late 2012/early 2013
- Ongoing assessment and design adjustments integral to the process

21 TACC Storage Research
- Data upload and ingest processes
- Storage reliability and management
- Data integrity/long-term planning
- Automated data management applications
- Wide-area storage and replication efforts in the NSF XSEDE project
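A common building block for the data-integrity work listed above is a fixity audit. A checksum is recorded at ingest, then periodically recomputed for every replica, and any mismatch is flagged. The Python sketch below assumes SHA-256 and in-memory replica contents. The function names and replica labels are hypothetical, and a production audit would stream file contents in chunks rather than hold them in memory.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Checksum recorded at ingest and recomputed during audits."""
    return hashlib.sha256(data).hexdigest()


def audit_replicas(expected: str, replicas: dict[str, bytes]) -> list[str]:
    """Return, sorted, the names of replicas whose checksum no longer matches."""
    return sorted(name for name, data in replicas.items()
                  if sha256_hex(data) != expected)


original = b"herbarium specimen image"
recorded = sha256_hex(original)
replicas = {
    "corral": original,
    "ranch": original,
    "offsite": b"herbarium specimen imagf",  # simulated silent corruption
}
print(audit_replicas(recorded, replicas))  # ['offsite']
```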

22 Acknowledgements
- Dr. Ken Shine, UT System
- Dr. Patricia Hurn, UT System
- Jay Boisseau and Brian Herman
- Jerry York, UTHSCSA
- UTRC Storage Committee: Brian Grimm, Kevin Granhold, Huapei Chen, Wayne Mueller, Bill Sanns
- And many, many others

23 Q&A
