Dave Cavena, HPA Tech Retreat, February 16, 2012. 15 min = 3 Points: Longevity; Accuracy - Readability; Cost.

Presentation transcript:

Dave Cavena HPA Tech Retreat February 16, 2012

15 min = 3 Points
• Longevity
• Accuracy - Readability
• Cost

Longevity
• Film has proved an excellent medium on which to archive images
• The archive is the only part of the workflow that has not been digitized
• We are not archiving exactly what we create and watch

Longevity
• “the data is the master record” – Leon Silverman, Opening Remarks, JTS2004
• “All polymers are subject to decay.” – W. Mark Ritchie, KINEMA, 1995
• “Traditional analog methods of archiving cannot keep up with … the loss of analog process expertise.” – Dan Rosen, VP, WB, May 2007

Longevity
• In a “perfect” archive, the archive would retain the content … forever
• Nothing in the physical world could cause its deterioration
• This cannot happen if content is attached to a physical carrier.
• Only digital technology provides disintermediation

Accuracy - Readability
• Digital technology – HW and SW – changes often
• YCM 3-strip lasts a long time
– … recombination issues
– … generational loss

Accuracy
• Content integrity over extended time durations requires a managed archive
• A Digital Archive without an automated management SYSTEM is not an archive – it’s sheet metal, plastic & media

Model Overview
• Automated, rules-based software archive system
• Automated computer tape libraries
• Servers & front-end disk
• Why tape?
– Holo is too slow, disk too expensive
– When tape is replaced the robots & price point will survive

Model Overview
• Location 1: Movie Copy 1, Movie Copy 2
• Location 2: Movie Copy 3, Movie Copy 4
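The layout on this slide (four copies split across two locations) is the kind of condition a rules-based archive system could check automatically. A minimal sketch in Python, assuming a hypothetical `Copy` record and a hypothetical `satisfies_placement_rule` function; only the thresholds (four copies, two locations) come from the slide, and this is not the rule engine the talk describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One stored copy of a title (hypothetical record, for illustration only)."""
    title: str
    location: str


def satisfies_placement_rule(copies, min_copies=4, min_locations=2):
    """Return True if there are enough copies spread over enough distinct locations."""
    locations = {c.location for c in copies}
    return len(copies) >= min_copies and len(locations) >= min_locations


copies = [
    Copy("Movie", "Location 1"),
    Copy("Movie", "Location 1"),
    Copy("Movie", "Location 2"),
    Copy("Movie", "Location 2"),
]
print(satisfies_placement_rule(copies))      # True
print(satisfies_placement_rule(copies[:2]))  # False: two copies, one location
```

A real system would also track media type, library, and audit dates per copy; those fields are left out here to keep the rule itself visible.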

Accuracy
• Irreplaceable content
• Multiple copies
• Multiple libraries
• Automated audit, copy
• Algorithmic assurance of bit integrity
• Error Correction Codes (ECC)
• Bit Error Detection
• Bit Error Correction

Accuracy
• ECC
– Standard on tape drives
– COTS technology
• Bit Error Rates
– Bit Error Rates (BER) differ by manufacturer
– ECC undetected BER =
– Four copies =
– ECC uncorrected BER =
– Four copies =

Accuracy
• How big is the archive object?
• DCDM + Trims & Outs … 20TB?
– 2 x 10^13 Bytes
– 2 x 10^14 bits
• OCN … 100TB?
– 10^14 Bytes
– 10^15 bits
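The size arithmetic on this slide can be checked directly. A small Python sketch, assuming decimal terabytes (10^12 bytes); the function names are illustrative and not from the talk.

```python
BYTES_PER_TB = 10**12  # assumption: decimal terabytes, not binary tebibytes
BITS_PER_BYTE = 8


def tb_to_bytes(size_tb):
    """Convert a size in terabytes to bytes."""
    return size_tb * BYTES_PER_TB


def tb_to_bits(size_tb):
    """Convert a size in terabytes to bits."""
    return tb_to_bytes(size_tb) * BITS_PER_BYTE


for size_tb in (20, 100):
    print(f"{size_tb} TB = {tb_to_bytes(size_tb):.1e} bytes = {tb_to_bits(size_tb):.1e} bits")
# 20 TB = 2.0e+13 bytes = 1.6e+14 bits
# 100 TB = 1.0e+14 bytes = 8.0e+14 bits
```

The exact bit counts are 1.6 x 10^14 and 8 x 10^14; for order-of-magnitude error estimates, rounding them up makes little difference.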

Accuracy
• ECC BER
• 20 TB archive object
– 2 x bits = one bit lost in 2 x DCDM
• 100 TB archive object
– bits = one bit lost in objects
• Generational loss?
– Minimum 20 generations of migration
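The relationship this slide relies on is simple: expected lost bits = object size in bits x bit error rate x number of passes. A hedged Python sketch follows. The slide's own BER figures did not survive in this transcript, so `ASSUMED_BER` is a placeholder (10^-19, the order of magnitude implied by the later "N*10 -19" approximation), and the function names are hypothetical.

```python
from fractions import Fraction

ASSUMED_BER = Fraction(1, 10**19)  # placeholder value, NOT a figure taken from the slides


def expected_bit_errors(bits, ber, generations=1):
    """Expected number of bad bits after `generations` full passes over `bits` bits."""
    return bits * ber * generations


def objects_per_lost_bit(bits, ber):
    """Average number of archive objects processed per single lost bit."""
    return 1 / (bits * ber)


bits_20tb = 20 * 10**12 * 8  # 20 TB in bits, assuming decimal terabytes

print(float(objects_per_lost_bit(bits_20tb, ASSUMED_BER)))                 # 62500.0
print(float(expected_bit_errors(bits_20tb, ASSUMED_BER, generations=20)))  # 0.00032
```

Using `Fraction` keeps the arithmetic exact; with a different BER the same two functions apply unchanged.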

Accuracy
• For this application it doesn’t matter how many times the data is accessed or how many generations of rewrite it goes through
• ECC undetected = … on one copy of four in the archive
• The probability of failure one or more times during N accesses is 1 - (1 - 10^-19)^N
• For N less than 10^19, this is approximated by N * 10^-19
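The approximation on this slide can be checked numerically. A naive `1 - (1 - p)**N` evaluates to zero in double precision because `1 - 1e-19` rounds to exactly 1.0, so the sketch below uses `math.log1p` and `math.expm1` instead. The per-access probability `p = 1e-19` is taken from the slide's approximation; the function name is illustrative.

```python
import math


def prob_at_least_one_failure(p, n):
    """Probability of one or more failures in n independent accesses: 1 - (1 - p)**n.

    Computed as -expm1(n * log1p(-p)) so that tiny p does not round away.
    """
    return -math.expm1(n * math.log1p(-p))


p = 1e-19  # per-access failure probability implied by the slide's approximation
for n in (10**3, 10**6, 10**19):
    print(n, prob_at_least_one_failure(p, n), n * p)
```

For N = 10^3 and N = 10^6 the exact value and N * p agree to many digits. At N = 10^19 the linear approximation gives 1 while the exact value is about 0.63, which is why the slide restricts the approximation to N less than 10^19.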

Accuracy
• Example – one archived copy
– Assume a movie accessed 10^3 times
– The chance of an uncorrectable bit error per read is * =
• Model has four copies
– 10^3 * (2 * ) = 2 * … one bit lost in 2 * (10TB)
– 10^3 * = … one bit lost in (100TB)

Accuracy
• Secure Hash Algorithm – SHA-256
• Checksum failure probability (approximately )
• Four-copy BER =
• One undetected bit loss in movies
• Voting bit-by-bit
• Can make a 10TB DCDM into 40 x 1TB files, 31 of which would have to be damaged to preclude rebuilding the original 10TB file
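Two ideas on this slide lend themselves to a short sketch: voting across copies, and the 40-piece example. The Python below is illustrative only. It votes byte-by-byte rather than bit-by-bit (a simplification), and it assumes the 40 x 1TB scheme lets any 10 pieces rebuild the 10TB original, which is what the "31 damaged" figure implies but the slide does not state. Function names are hypothetical.

```python
from collections import Counter


def vote_bytes(copies):
    """Rebuild data by strict majority vote, position by position, across copies."""
    if len({len(c) for c in copies}) != 1:
        raise ValueError("all copies must have the same length")
    out = bytearray()
    for column in zip(*copies):
        value, count = Counter(column).most_common(1)[0]
        if count * 2 <= len(copies):
            raise ValueError("no strict majority at some position")
        out.append(value)
    return bytes(out)


def losses_to_prevent_rebuild(total_pieces, pieces_needed):
    """Pieces that must be damaged before fewer than `pieces_needed` remain."""
    return total_pieces - pieces_needed + 1


copies = [b"MOVIE", b"MOVIE", b"MOXIE", b"MOVIE"]  # third copy has one damaged byte
print(vote_bytes(copies))                 # b'MOVIE'
print(losses_to_prevent_rebuild(40, 10))  # 31
```

With four copies a 2-2 split has no majority, so the sketch raises an error instead of guessing; in practice a per-copy checksum would break such ties.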

Accuracy
• Software Reliability
• A digital archive is a managed archive
• Audit every bit every 6 months
• Read in old format, rewrite in new format
• Image data is not changed, only the file wrapper, application accessing the file, or operating system running the app.
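The periodic audit described here amounts to re-reading every file and comparing a fresh digest against the one recorded at ingest. A minimal Python sketch using the standard-library `hashlib` and SHA-256 (the algorithm named in the talk); `sha256_of_file` and `audit` are hypothetical names, and the temporary file stands in for an archived object.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Stream a file through SHA-256 so large objects never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(path, recorded_hex):
    """Return True if the file still matches the digest recorded at ingest."""
    return sha256_of_file(path) == recorded_hex


# Demo with a small temporary file standing in for an archived object.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"image data")
    demo_path = tmp.name

recorded = sha256_of_file(demo_path)   # digest stored at ingest time
print(audit(demo_path, recorded))      # True: content unchanged

with open(demo_path, "ab") as handle:  # simulate damage to the stored file
    handle.write(b"\x00")
print(audit(demo_path, recorded))      # False: audit flags the copy for repair

os.remove(demo_path)
```

Because only the wrapper, application, or operating system changes during migration, the digest of the image data itself can serve as the stable reference across rewrites.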

Digital Archive
• Longevity
• Accuracy
• Cost

Digital Cost
• Conservative pricing model
• List price, no depreciation or salvage
• HW, SW, Maintenance, Media
• 150% uplift (Avg $728K/yr 20TB; $826K/yr 100TB)
– Labor
– Floor space
– Electricity

What are the Film costs? *approximately

What are the Digital costs?
• 20 Features per year, 100 years – Average per-Century Cost
• Includes uplift for Floor Space, Labor, Electricity
Archive Object Size (TB) | 150% Uplift | Average Annual Uplift | 200% Uplift | Average Annual Uplift
100                      | $75,000     | $826,000              | $83,000     | $1,102,000
20                       | $54,000     | $729,000              | $65,000     | $972,000

Comparison – 100TB – 150% Uplift
                     | Per-Title, Per-Century Archive Cost | 2,000-Title (i.e. FULL) Century Archive Cost
Film                 | $130,000                            | $260,000,000
Digital              | $75,000                             | $137,700,000
Delta (Digital Less) | $55,000                             | $122,300,000
Redundancy Observation – Redundant copies of the Archive Object: Film: Zero; Digital: Three

100TB Archive Object, 150% Uplift: Labor, Electricity, Floor Space. Avg per-year Uplift: $826,000

Comparison – 100TB – 200% Uplift *
                     | Per-Title, Per-Century Archive Cost | 2,000-Title (i.e. FULL) Century Archive Cost
Film                 | $130,000                            | $260,000,000
Digital              | $83,000                             | $165,000,000
Delta (Digital Less) | $47,000                             | $95,000,000
Redundancy Observation – Redundant copies of the Archive Object: Film: Zero; Digital: Three

100TB Archive Object, 200% Uplift: Labor, Electricity, Floor Space. Avg per-year Uplift: $1,102,000

Comparison – 20TB – 150% Uplift *
                     | Per-Title, Per-Century Archive Cost | 2,000-Title (i.e. FULL) Century Archive Cost
Film                 | $130,000                            | $260,000,000
Digital              | $54,000                             | $121,000,000
Delta (Digital Less) | $76,000                             | $139,000,000
Redundancy Observation – Redundant copies of the Archive Object: Film: Zero; Digital: Three

20TB Archive Object, 150% Uplift: Labor, Electricity, Floor Space. Avg per-year Uplift: $729,000

Comparison – 20TB – 200% Uplift *
                     | Per-Title, Per-Century Archive Cost | 2,000-Title (i.e. FULL) Century Archive Cost
Film                 | $130,000                            | $260,000,000
Digital              | $65,000                             | $146,000,000
Delta (Digital Less) | $65,000                             | $114,000,000
Redundancy Observation – Redundant copies of the Archive Object: Film: Zero; Digital: Three

20TB Archive Object, 200% Uplift: Labor, Electricity, Floor Space. Avg per-year Uplift: $972,000

Comparison (100TB, 200% Uplift): 2,000 Titles + Repurposing
        | Per-Title Per-Century Archive Cost | 2,000-Title Century Archive Cost | Per-Title Repurposing Cost | One Repurposing per Title per Century | Total: Archive + Repurpose
Film    | $130,000                           | $260,000,000                     | $65,000                    | $130,000,000                          | $390,000,000
Digital | $83,000                            | $166,000,000                     | $7,000                     | $14,000,000                           | $180,000,000
Delta   | $47,000                            | $94,000,000                      | $58,000                    | $116,000,000                          | $210,000,000
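The totals in this comparison follow from the per-title figures multiplied by 2,000 titles. A short Python sketch that reproduces the slide's arithmetic; the per-title inputs are the slide's, while the function name `century_totals` is illustrative.

```python
TITLES = 2_000  # 20 features per year for 100 years


def century_totals(per_title_archive, per_title_repurpose, titles=TITLES):
    """Return (archive cost, repurposing cost, combined total) for a full century."""
    archive = per_title_archive * titles
    repurpose = per_title_repurpose * titles
    return archive, repurpose, archive + repurpose


film = century_totals(130_000, 65_000)
digital = century_totals(83_000, 7_000)
delta = tuple(f - d for f, d in zip(film, digital))

print(film)     # (260000000, 130000000, 390000000)
print(digital)  # (166000000, 14000000, 180000000)
print(delta)    # (94000000, 116000000, 210000000)
```

The same function can be rerun with the 20TB or 150% uplift per-title figures; note that some of the other comparison slides report century totals that are not a straight multiple of the rounded per-title cost.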

Digital Archive
• Longevity
• Accuracy
• Cost

                | Per-Title Per-Century Archive Cost | 2,000-Title Century Archive Cost | Per-Title Repurposing Cost | One Repurposing per Title per Century | Total: Archive + Repurpose
Film            | $130,000                           | $260,000,000                     | $65,000                    | $130,000,000                          | $390,000,000
Digital (100TB) | $83,000                            | $165,000,000                     | $7,000                     | $14,000,000                           | $179,000,000
Delta           | $47,000                            | $95,000,000                      | $58,000                    | $116,000,000                          | $211,000,000
Longevity
• In a “perfect” archive, the archive would retain the content … forever
• Nothing in the physical world could cause its deterioration
• This cannot happen if content is attached to a physical carrier.
• Disintermediation
Accuracy
• Example – one archived copy
– Assume a movie accessed 10^6 times
– The chance of an uncorrectable bit error per read is
– The chance of an uncorrectable bit error on any one of 10^6 reads is 10^3 * =
• Model has four copies
– 10^3 * (2 * ) = 2 * … one bit lost in 2 * (10TB)
– 10^3 * = … one bit lost in (100TB)

If you remember only ONE thing… “Traditional analog methods of archiving cannot keep up with … the loss of analog process expertise.” – Dan Rosen, VP, WB, May 2007

Model Overview
• Location 1: Movie Copy 1, Movie Copy 2
• Location 2: Movie Copy 3, Movie Copy 4

Thank You
For further discussion:
• Google: “Archiving Movies in a Digital World” (2008)
• Grab me in the hall
• Breakfast Roundtable tomorrow