1 Introduction to Archiving Movies in a Digital World Dave Cavena, Sun Microsystems January, 2007.

Presentation transcript:

1 Introduction to Archiving Movies in a Digital World Dave Cavena, Sun Microsystems January, 2007

2 Agenda: Overview; Archiving; Archived content integrity; Proposed model; Costs; Alternatives?; Summary; Conclusion

3 Overview: Has the time come to begin archiving movies digitally? Only archiving remains reliant on film. Digital image archive technology is mature. There is a viable, scalable, cost-effective COTS (commercial off-the-shelf) model. What are the alternatives?

4 Archiving: The stories of an Age. Fiduciary responsibility. A Digital Content Archive can store these assets without degradation forever.

5 Archiving: Any movie archived in 1907 is playable in 2007. Will a celluloid movie archived in 2007 be playable in 2107? Is it time to start digital archiving of this irreplaceable content?

6 Archiving: Will the Archive be the only time the story exists on film? What are celluloid archive and repurposing costs? A Digital Content Archive provides image and cost advantages over celluloid. It can be accomplished with COTS technology.

7 Archived Content Integrity: Irreplaceable content. Multiple copies. Multiple libraries. Automated audit, copy. Algorithmic assurance of bit integrity. Error Correction Codes (ECC): bit error detection, bit error correction.

8 Archived Content Integrity ECC Standard on tape drives COTS technology Bit Error Rates* Bit Error Rates (BER) differ by manufacturer ECC undetected BER = Four copies = ECC uncorrected BER = Four copies = TB Digital Intermediate = bits One uncorrectable bit error in movies ( * ) * Sun T10000 drive

9 Archived Content Integrity: Generational data integrity. 20 generations of compute/disk front-end. 5 generations of libraries. Unknown generations of application file formats. At least 12 rewrites of the content onto new media. What is the generational impact on the algorithmic BER?

10 Archived Content Integrity: For this application it doesn’t matter how many times the data is accessed or how many generations of rewrite it goes through. The probability that the ECC will fail to correct damage during any given access is $10^{-19}$. The probability that it will fail one or more times during $N$ accesses is 1 minus the probability that it will succeed $N$ times in a row: $1 - (1 - 10^{-19})^N$. For $N$ less than $10^{19}$, this is well approximated by $N \times 10^{-19}$.
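The failure probability on this slide can be checked numerically. The following is an illustrative sketch, not part of the original presentation: it assumes a per-access failure probability of 10^-19 (the value implied by the slide's approximation N*10^-19), and the function name is hypothetical. It uses `math.log1p` and `math.expm1` because the direct expression `1 - (1 - p)**n` evaluates to exactly 0.0 in double precision when p is this small.

```python
import math


def failure_probability(p: float, n: int) -> float:
    """Probability of at least one uncorrected error in n accesses: 1 - (1 - p)^n.

    Computed as -expm1(n * log1p(-p)) so that very small p is not rounded away.
    """
    return -math.expm1(n * math.log1p(-p))


# Assumption: per-access failure probability implied by the slide's N * 10^-19.
P_PER_ACCESS = 1e-19
ACCESSES = 10**6  # one million accesses, as in the example on the next slide

exact = failure_probability(P_PER_ACCESS, ACCESSES)
approx = ACCESSES * P_PER_ACCESS           # the slide's linear approximation
naive = 1.0 - (1.0 - P_PER_ACCESS) ** ACCESSES  # loses all precision

if __name__ == "__main__":
    print(f"exact  = {exact:.6e}")
    print(f"approx = {approx:.6e}")
    print(f"naive  = {naive:.6e}")
```

For one million accesses the exact value and the linear approximation agree to many significant digits, which is the point the slide makes about N being far below 10^19.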

11 Archived Content Integrity: Example. Assume a movie accessed one million times. The chance of an uncorrectable bit error per read is $10^{-19}$. The chance of an uncorrectable bit error on any one of $10^6$ reads is $10^6 \times 10^{-19} = 10^{-13}$, for a single copy. It reasonably can be assumed for the purposes of this application that the ability to detect and correct errors in transcription is perfect.

12 Archived Content Integrity Other Strategies Secure Hashing Algorithm, SHA-256* Checksum failure probability of , or approximately Four-copy BER = One undetected bit loss in movies Birthday collisions don’t apply; not defending against traffic analysis, just using it as a good checksum Voting bit-by-bit Can make a 10TB DCDM into 40 1TB files, 31 of which would have to be damaged to preclude rebuilding the original * Developed by the NSA, publicly available, peer-reviewed, easy to implement
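Two of the strategies named on this slide, a SHA-256 checksum and bit-by-bit voting across copies, can be sketched in a few lines. This is an illustration only and not the presenter's implementation: the function names are hypothetical, the copies are held in memory as byte strings rather than read from tape, and the tie handling (raising an error when the copies split evenly) is an assumption that the slide does not specify.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string, used here as a checksum."""
    return hashlib.sha256(data).hexdigest()


def vote_bitwise(copies: list[bytes]) -> bytes:
    """Rebuild content by majority vote on each bit position across all copies.

    Assumption: every copy has the same length. An even split on any bit
    cannot be resolved by voting alone, so it raises ValueError.
    """
    if not copies:
        raise ValueError("at least one copy is required")
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("all copies must have the same length")

    n = len(copies)
    rebuilt = bytearray(length)
    for i in range(length):
        byte = 0
        for bit in range(8):
            ones = sum((c[i] >> bit) & 1 for c in copies)
            if 2 * ones == n:
                raise ValueError(f"tie at byte {i}, bit {bit}")
            if 2 * ones > n:
                byte |= 1 << bit
        rebuilt[i] = byte
    return bytes(rebuilt)


if __name__ == "__main__":
    original = b"archive object"
    reference_digest = sha256_hex(original)

    damaged = bytearray(original)
    damaged[3] ^= 0x10  # flip one bit in one of the four copies

    copies = [original, bytes(damaged), original, original]
    repaired = vote_bitwise(copies)
    print(sha256_hex(repaired) == reference_digest)
```

A stored digest can confirm that a repaired or rewritten copy matches the original content, while voting needs no stored reference but depends on most copies agreeing at each bit.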

13 Archive Model: Enterprise-class tape library. Front-end server and disk: ingest and prepare the Archive Object for writing to the tape library. Hierarchical Storage Manager (HSM). Two complete and identical systems, geographically separate. Two copies of each movie on each library.

14 Archive Model: Computers and disk front-ends reach EOSL (end of service life): 5-yr replacement. Tape drives reach EOSL: 10-yr replacement. Libraries reach EOSL: 20-yr replacement. Tape media has a finite lifetime*: replace tapes every 10 years. Audit every tape every six months. Re-write from pristine copies as necessary. *National Media Lab, IBM, Sun and others publish 30 years as a viable tape media lifetime.
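The replacement intervals on this slide translate directly into the number of hardware and media generations an archive passes through. The sketch below is illustrative: the 100-year horizon is an assumption taken from the time span used elsewhere in the presentation, and the names are hypothetical.

```python
# Assumption: a 100-year archive horizon, matching the cost discussion.
HORIZON_YEARS = 100

# Replacement intervals as stated on the slide.
REPLACEMENT_INTERVAL_YEARS = {
    "compute/disk front-end": 5,
    "tape drives": 10,
    "libraries": 20,
    "tape media": 10,
}

AUDITS_PER_YEAR = 2  # "audit every tape every six months"


def generations(horizon_years: int, interval_years: int) -> int:
    """Number of complete replacement cycles within the horizon."""
    return horizon_years // interval_years


def schedule(horizon_years: int = HORIZON_YEARS) -> dict[str, int]:
    """Generations per component over the given horizon."""
    return {
        component: generations(horizon_years, interval)
        for component, interval in REPLACEMENT_INTERVAL_YEARS.items()
    }


if __name__ == "__main__":
    for component, count in schedule().items():
        print(f"{component}: {count} generations")
    print(f"audits per tape slot: {HORIZON_YEARS * AUDITS_PER_YEAR}")
```

Under these assumptions the front-end goes through 20 generations and the libraries through 5, which agrees with the generational counts given earlier in the presentation.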

15 Archive Model: Application software and file formats. The proposed archive model HSM uses an open tarball format, readable even without the application. When a tape is audited, rewritten or copied, the new copy can be created in the new file format. This is feasible because the underlying data format remains digitally fixed; only the file format and/or storage medium change.
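The practical value of an open tarball format is that any standard tar reader can recover the content without the archiving application. The sketch below illustrates this with Python's standard `tarfile` module. It is an assumption-laden example: the slide does not describe the HSM's exact on-tape layout, the member names are invented, and the archive is held in memory rather than written to tape.

```python
import io
import tarfile


def pack(files: dict[str, bytes]) -> bytes:
    """Write the given name -> content mapping into an uncompressed tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def unpack(blob: bytes) -> dict[str, bytes]:
    """Read every regular file back out of a tar archive."""
    contents = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile():
                contents[member.name] = tar.extractfile(member).read()
    return contents


if __name__ == "__main__":
    # Hypothetical archive object: frame data plus a small metadata file.
    archive_object = {
        "movie/reel1/frame000001.dat": b"\x00\x01\x02\x03",
        "movie/metadata.txt": b"title=example\n",
    }
    blob = pack(archive_object)
    print(unpack(blob) == archive_object)
```

Because the reader depends only on the tar format, a later migration can unpack the old archive and repack the same bytes into whatever container is current at the time.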

16 Archive Model: Institutional memory must be created. Two or more sites are required, geographically separate. No network connectivity. Archive content in the clear: same as current model; a lost key or algorithm will render the archive useless. Can be encrypted for transport (tape drive HW encryption is becoming the norm). When copying tapes, send old ones to another location.

17 Archive Model: Oil & Gas has been archiving digital images for decades. Medical is doing this with far higher transaction rates. Library of Congress is doing it now: "Storing National Treasures", "Sun Rises at the Library of Congress".

18 Costs: Can digital compete with celluloid? Film archiving cost: $100K / 100 years / feature; 2,000 movies = $200M. 10TB archive object, 20 objects/year, 100 years: $45,000/movie (list); $16,000/movie (Archive pricing); 2,000 movies = $32M. 100TB archive object: $67,000/movie (Archive pricing); 2,000 movies = $79M.

19 Costs 10TB Archive Object – List price [Chart: Dollars versus Movies in Archive; data labels $2,601,160, $408,631, $114,218, $73,094, $57,330, $45,493]

20 Costs 10TB Archive Object – List price
Cost breakdown (both libraries, two copies/movie/library):
  Description                        Cost           Share
  Compute/Disk + maintenance         $3,200,000     4%
  Library and Drives + maintenance   $16,112,000    18%
  Media                              $2,699,000     3%
  SAM License + maintenance          $68,974,400    75%
  Total                              $90,985,400
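The list-price figures on this slide can be tied back to the per-movie cost quoted in the cost comparison. The sketch below is illustrative: the component costs are copied from the slide, the 2,000-movie archive size and the $100K film cost per feature come from the earlier cost slide, and the variable names are hypothetical.

```python
# Component costs from the slide (10TB archive object, list price).
LIST_PRICE_COSTS = {
    "Compute/Disk + maintenance": 3_200_000,
    "Library and Drives + maintenance": 16_112_000,
    "Media": 2_699_000,
    "SAM License + maintenance": 68_974_400,
}

MOVIES_IN_ARCHIVE = 2_000       # archive size used in the cost comparison
FILM_COST_PER_MOVIE = 100_000   # film archiving cost per feature over 100 years

digital_total = sum(LIST_PRICE_COSTS.values())
digital_per_movie = digital_total / MOVIES_IN_ARCHIVE
film_total = FILM_COST_PER_MOVIE * MOVIES_IN_ARCHIVE

if __name__ == "__main__":
    print(f"digital total:     ${digital_total:,}")
    print(f"digital per movie: ${digital_per_movie:,.0f}")
    print(f"film total:        ${film_total:,}")
```

The components sum to the slide's total of $90,985,400, and dividing by 2,000 movies gives roughly the $45,000 per movie quoted at list price.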

21 Costs 10TB Archive Object – Archive price

22 Costs 10TB Archive Object – Archive price [Chart: Dollars versus Movies in Archive; data labels $2,005,850, $236,852, $54,072, $29,706, $21,259, $16,265]

23 Costs 100TB Archive Object – List price [Chart: Dollars versus Movies in Archive; data labels $6,488,535, $5,006,303, $3,791,715, $3,180,640, $2,641,742, $2,121,252]

24 Costs 100TB Archive Object – List price

25 Costs 100TB Archive Object – Archive price

26 Costs 100TB Archive Object – Archive price [Chart: Dollars versus Movies in Archive; data labels $1,462,466, $505,009, $170,432, $112,159, $85,527, $67,171]

27 Alternatives: An unaddressed question… Does celluloid have a future – at all? Replaced by commercial photographers globally. Precipitous drop in market share and manufacturer jobs. Environmentally unfriendly to manufacture and process. Celluloid may not be an option. Film may not even exist in 100 years. Film infrastructure (labs, chemicals, workers, etc.) may not exist.

28 Summary: The technology required to store and maintain irreplaceable digital image content for archive durations is mature, proven and in use today. A Digital Content Archive will extend the quick responsiveness of a studio’s Library to the Archive. The return on these increasingly expensive assets easily can be extended – forever … all using COTS technology.

29 Conclusion: The pivotal and immutable point is that this can be done beginning today. The experience Sun brings to the project already has been recognized by, and is being broadened by, the Library of Congress and other locations around the world undertaking the digitization of their media assets using solutions from Sun Microsystems. The time is now to begin serious efforts to test and implement studio Digital Content Archives.

30 Thank you. Dave Cavena