© Copyright 2009-2013, Cambridge Computer Services, Inc. – All Rights Reserved www.CambridgeComputer.com – 781-250-3000 End to End Life Cycle Management.

1 End to End Life Cycle Management for Research Data
Capturing Metadata Throughout the Research Pipeline and Facilitating the Handoff to Formal Curation
Jacob Farmer, CTO, Cambridge Computer

2 A Little Background on Cambridge Computer

3
Founded in 1991 as a boutique integrator for backup and archive solutions
Approximately 75 employees nationwide
Clients of all shapes and sizes across all industries
Particularly strong in research and higher ed
Industry-wide reputation for defining best practices for enterprise-class data protection and for early adoption of next-generation storage solutions
A unique business model that allows us to straddle the fence between academia and industry

4 End to End Life Cycle Management for Research – PASIG 2013
Seminars and Workshops Through the USENIX Association
Tiered Storage and Archiving: Best Practices for Data Life Cycle Management and Digital Preservation
  – Cornell, Dartmouth, Duke, Harvard, Penn
LISA Data Storage Day (Storage Virtualization; Application Acceleration with Solid State; A Crash Course in Object Storage)
  – LISA Conference, Broad Institute, Georgia State, University of Maryland, Davenport, Princeton

5 Our Product: Starfish

6 Our Project – Defining Best Practices for File Management
Inspiration for our project comes from SRB/iRODS
Bring parts of the SRB/iRODS vision to reality:
  – Define a general-purpose feature set
  – Intuitive user interface
  – Simplified API
Inspiration also comes from numerous homegrown solutions in our client base
The paradigm:
  – stat() your file systems
  – Make database records for each file and/or directory
  – Relate metadata to the file and directory records
  – Report and/or take action
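The four-step paradigm on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Starfish's implementation. The SQLite schema, the function names (`build_catalog`, `tag`, `bytes_by_tag`, `demo`) and the `grant` metadata key are all hypothetical.

The sketch does four things:
  – Walks a tree with `os.lstat()`, so it reads only inode metadata and never file contents.
  – Stores one row per file or directory.
  – Relates user-specified key/value pairs to those rows.
  – Reports total bytes per tag value.

```python
import os
import sqlite3
import stat
import tempfile

# Hypothetical schema: one row per file/directory, plus a key/value metadata table.
SCHEMA = """
CREATE TABLE entries (
    id     INTEGER PRIMARY KEY,
    path   TEXT UNIQUE NOT NULL,
    is_dir INTEGER NOT NULL,
    size   INTEGER NOT NULL,
    mtime  REAL NOT NULL
);
CREATE TABLE metadata (
    entry_id INTEGER NOT NULL REFERENCES entries(id),
    key      TEXT NOT NULL,
    value    TEXT NOT NULL,
    PRIMARY KEY (entry_id, key)
);
"""


def build_catalog(root: str, db: sqlite3.Connection) -> None:
    """Steps 1 and 2: lstat() every entry under root and store one row each."""
    db.executescript(SCHEMA)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # metadata only; file contents are never read
            db.execute(
                "INSERT INTO entries (path, is_dir, size, mtime) VALUES (?, ?, ?, ?)",
                (path, int(stat.S_ISDIR(st.st_mode)), st.st_size, st.st_mtime),
            )
    db.commit()


def tag(db: sqlite3.Connection, path: str, key: str, value: str) -> None:
    """Step 3: relate a user-specified key/value pair to an existing record."""
    row = db.execute("SELECT id FROM entries WHERE path = ?", (path,)).fetchone()
    if row is None:
        raise KeyError(path)
    db.execute(
        "INSERT OR REPLACE INTO metadata (entry_id, key, value) VALUES (?, ?, ?)",
        (row[0], key, value),
    )
    db.commit()


def bytes_by_tag(db: sqlite3.Connection, key: str) -> dict:
    """Step 4: report total file bytes grouped by the value of one metadata key."""
    rows = db.execute(
        """SELECT m.value, SUM(e.size)
           FROM entries AS e JOIN metadata AS m ON m.entry_id = e.id
           WHERE m.key = ? AND e.is_dir = 0
           GROUP BY m.value""",
        (key,),
    ).fetchall()
    return dict(rows)


def demo() -> dict:
    """Build a throwaway tree, catalog it, tag each file with a grant, report usage."""
    files = [("proj_a", "x.dat", 100, "grant-123"),
             ("proj_a", "y.dat", 50, "grant-123"),
             ("proj_b", "z.dat", 7, "grant-456")]
    with tempfile.TemporaryDirectory() as root:
        for sub, name, size, _ in files:
            os.makedirs(os.path.join(root, sub), exist_ok=True)
            with open(os.path.join(root, sub, name), "wb") as fh:
                fh.write(b"\0" * size)
        db = sqlite3.connect(":memory:")
        build_catalog(root, db)
        for sub, name, _, grant in files:
            tag(db, os.path.join(root, sub, name), "grant", grant)
        report = bytes_by_tag(db, "grant")
        db.close()
    return report
```

Calling `demo()` returns `{'grant-123': 150, 'grant-456': 7}`. At the scale described in this talk, a real catalog would scan incrementally and in parallel rather than fill one SQLite table in a single pass.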

7 Starfish – *FS Virtual Global File System
It's not really a file system, but it looks like one and serves as a hierarchical catalog of files
Like a file system:
  – CIFS and POSIX permissions
  – File system attributes and extended attributes
But more:
  – User-specified metadata
  – Persistent addresses
  – Versioning
  – Point-in-time collections
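One way to model persistent addresses, versioning and point-in-time collections in a hierarchical catalog is the following:
  – Key each entry by an immutable identifier.
  – Keep a list of timestamped versions for each entry.
  – Store in each version the path the entry had at that moment.

The sketch below assumes exactly that design. `Catalog`, `Entry` and `Version` are hypothetical, as is the use of random UUIDs as persistent addresses. This illustrates the idea; it does not describe how Starfish stores its catalog.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Version:
    recorded_at: float   # catalog time at which this state was observed
    path: str            # path at that time; paths may change, the persistent id does not
    size: int
    user_meta: Tuple[Tuple[str, str], ...] = ()   # user-specified key/value pairs


@dataclass
class Entry:
    pid: str = field(default_factory=lambda: str(uuid.uuid4()))   # persistent address
    versions: List[Version] = field(default_factory=list)

    def as_of(self, t: float) -> Optional[Version]:
        """Most recent version recorded at or before time t, if any."""
        past = [v for v in self.versions if v.recorded_at <= t]
        return max(past, key=lambda v: v.recorded_at) if past else None


class Catalog:
    def __init__(self) -> None:
        self.entries: Dict[str, Entry] = {}

    def record(self, t: float, path: str, size: int,
               pid: Optional[str] = None, **meta: str) -> str:
        """Append a version; a new entry (and persistent id) is created when pid is None."""
        if pid is None:
            entry = Entry()
            self.entries[entry.pid] = entry
        else:
            entry = self.entries[pid]   # KeyError for an unknown persistent address
        entry.versions.append(Version(t, path, size, tuple(sorted(meta.items()))))
        return entry.pid

    def resolve(self, pid: str) -> str:
        """Current path behind a persistent address."""
        return self.entries[pid].as_of(float("inf")).path

    def collection_as_of(self, prefix: str, t: float) -> Dict[str, Tuple[str, int]]:
        """Point-in-time collection: entries whose path at time t starts with prefix."""
        snapshot = {}
        for pid, entry in self.entries.items():
            v = entry.as_of(t)
            if v is not None and v.path.startswith(prefix):
                snapshot[v.path] = (pid, v.size)
        return snapshot
```

Because each version stores its own path, moving a file changes what `resolve()` returns but leaves the persistent address unchanged. `collection_as_of()` rebuilds a directory as it looked at an earlier time.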

8 Basic Starfish Topology

9 Targeted Use Cases
1) Data life cycle management for unstructured data at very large scale
  – Scientific research data
  – Media / entertainment workflows
  – Engineering data
2) Storage middleware for digital asset management systems at very large scale
  – Fixity automation
  – Backup/restore
  – Tiered storage
  – Persistent file addresses / links
  – Cloud interface
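Fixity automation usually works like this: record a cryptographic digest for each file, then periodically recompute it to detect silent corruption or loss. The sketch below assumes SHA-256 and a plain in-memory manifest. The function names are hypothetical. A real deployment would keep the manifest in the catalog and schedule the verification pass.

```python
import hashlib
import os
import tempfile
from typing import Dict, List, Tuple


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def make_manifest(root: str) -> Dict[str, str]:
    """Record a digest for every file under root, keyed by relative path."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            manifest[os.path.relpath(path, root)] = sha256_of(path)
    return manifest


def verify(root: str, manifest: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Re-hash every manifest entry; return (missing, changed) relative paths."""
    missing, changed = [], []
    for rel, expected in sorted(manifest.items()):
        path = os.path.join(root, rel)
        if not os.path.isfile(path):
            missing.append(rel)
        elif sha256_of(path) != expected:
            changed.append(rel)
    return missing, changed


def demo() -> Tuple[List[str], List[str]]:
    """Create three files, take a manifest, damage two of them, then verify."""
    with tempfile.TemporaryDirectory() as root:
        for name, data in [("a.txt", b"alpha"), ("b.txt", b"beta"), ("c.txt", b"gamma")]:
            with open(os.path.join(root, name), "wb") as fh:
                fh.write(data)
        manifest = make_manifest(root)
        with open(os.path.join(root, "b.txt"), "ab") as fh:
            fh.write(b"!")                      # simulate silent corruption
        os.remove(os.path.join(root, "c.txt"))  # simulate a lost file
        return verify(root, manifest)
```

Here `demo()` returns `(['c.txt'], ['b.txt'])`: one file is missing and one has a changed digest.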

10 Typical Content Management “Stack”

11 Inserting File System Middleware

12 Simple Storage Workflow While Mirroring File Systems to Object Store

13 Metadata is the Great Enabler
Collaboration
  – How else would researchers know what to do with one another’s data?
  – How can data be organized to meet different groups’ needs?
Storage management policies
  – How does a storage management system know what to do with your files?
  – File system attributes are not descriptive enough.
Preservation / retrieval / provenance
  – How do you know what to keep?
  – How do you find it again?
  – How do you know what it was used for and when?
Reporting / chargeback
  – File system permissions are not descriptive enough.
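Here is a concrete case of file system attributes not being descriptive enough. A policy based only on age would move an old but still-active dataset off primary storage; a user-supplied tag can override that. The same tags can drive chargeback reporting that permissions alone cannot. This sketch assumes hypothetical tag names (`status`, `retention`, `grant`), tier names and a 90-day threshold. It is not a policy the slide prescribes.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class FileRecord:
    path: str
    size: int                 # bytes
    age_days: int             # days since last modification (a file system attribute)
    meta: Dict[str, str] = field(default_factory=dict)   # user-specified metadata


def choose_tier(rec: FileRecord, max_idle_days: int = 90) -> str:
    """Hypothetical placement policy combining an attribute (age) with user tags."""
    if rec.meta.get("retention") == "permanent":
        return "archive"                      # curated data goes to the archive tier
    if rec.meta.get("status") == "active":
        return "primary"                      # the tag overrides age: still in use
    return "primary" if rec.age_days < max_idle_days else "nearline"


def chargeback(records: Iterable[FileRecord], key: str = "grant") -> Dict[str, int]:
    """Sum bytes per value of a metadata key; untagged data is reported separately."""
    totals: Dict[str, int] = {}
    for rec in records:
        owner = rec.meta.get(key, "unassigned")
        totals[owner] = totals.get(owner, 0) + rec.size
    return totals
```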

14 What Would a Metadata System for Research Data Look Like?
Very flexible
  – Allows scientists to work the way they want to work
Out of the data path
  – The system cannot introduce latency to file I/O
Enormous scale
  – Billions of files, petabytes of capacity, 1000s of file systems
Device / vendor independence
  – Must work with all storage devices, object stores, clouds, etc.
API driven

15 The Real Trick – Getting the Metadata
The Golden Rule of Data Preservation: “Preserve at the time of creation”
  – Translation: capture metadata throughout the research pipeline
Perhaps capture metadata when storage is provisioned
  – This presumes that there is a structured process for provisioning storage
Capture metadata through an API
  – This requires a simple API that anyone can use
Programmatically extract metadata from file headers, tags, and content
Capture metadata through a GUI
  – Try to create incentives for users to key in metadata
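Extracting metadata from file headers can be as simple as the following:
  – Match the first bytes of a file against known format signatures.
  – Decode a few fixed-position fields.

The sketch below recognizes a handful of formats common in research pipelines. For PNG, it also reads width and height from the IHDR chunk. The signature table and function names are illustrative assumptions. A fuller extractor would use format-specific libraries.

```python
import struct
from typing import Dict, Union

# (leading bytes, format name). HDF5 may also put its signature at offset
# 512, 1024, 2048, ... after a user block; this sketch checks offset 0 only.
SIGNATURES = [
    (b"\x89HDF\r\n\x1a\n", "hdf5"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"SIMPLE  =", "fits"),
    (b"%PDF-", "pdf"),
    (b"\x1f\x8b", "gzip"),
    (b"PK\x03\x04", "zip"),
]


def extract_header_metadata(header: bytes) -> Dict[str, Union[str, int]]:
    """Identify the format from magic bytes and decode a few fixed header fields."""
    meta: Dict[str, Union[str, int]] = {"format": "unknown"}
    for magic, fmt in SIGNATURES:
        if header.startswith(magic):
            meta["format"] = fmt
            break
    # PNG: 8-byte signature, 4-byte chunk length, b"IHDR", then big-endian width and height.
    if meta["format"] == "png" and len(header) >= 24 and header[12:16] == b"IHDR":
        width, height = struct.unpack(">II", header[16:24])
        meta["width"] = width
        meta["height"] = height
    return meta


def extract_from_file(path: str, nbytes: int = 64) -> Dict[str, Union[str, int]]:
    """Read only the first nbytes of the file; the rest of the content is untouched."""
    with open(path, "rb") as fh:
        return extract_header_metadata(fh.read(nbytes))
```

Because only the first few dozen bytes are read, this kind of extraction can run during a catalog scan without reading whole files.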

16 Getting from Here to There

17 Problem Statements for Research Data Management
Scientists don’t want to enter metadata
No one wants to pay for long-term storage
Data management planning disconnect between grant applicants and their institutions
There are more pressing problems related to storing data:
  – Collaboration
  – Cost control: chargeback, showback, tiering
  – Backup
Organizational gridlock
  – Conflicting priorities
  – Unspecific mandates

18 Yes, We Too Have a Triangle!

19 Where it Starts: Scalable and Flexible Backup/Archive
[Diagram: backup clients, NAS or file server, disk-based object storage, tape archive, cloud service]

20 How to Play

21 Looking for Collaborators
The ideal collaborator:
Has an immediate need that is within our current feature set and scale
  – This tells us that you can/will invest time with us
Has additional needs that will put us to the test
Is an existing client of Cambridge Computer, or
  – Is willing to become one, or
  – Is able to contribute some funds, or
  – Is able to make a meaningful investment in time
If not now, maybe next year!
Email me: jfarmer@CambridgeComputer.com

