Dave Cavena HPA Tech Retreat February 16, 2012
15 min = 3 Points Longevity Accuracy - Readability Cost
Longevity Film has proved an excellent medium on which to archive images The archive is the only part of the workflow that has not been digitized We are not archiving exactly what we create and watch
Longevity “the data is the master record” – Leon Silverman, Opening Remarks, JTS2004 “All polymers are subject to decay.” – W. Mark Ritchie, KINEMA, 1995 “Traditional analog methods of archiving cannot keep up with … the loss of analog process expertise.” – Dan Rosen, VP, WB, May 2007
Longevity In a “perfect” archive, the archive would retain the content … forever Nothing in the physical world could cause its deterioration This cannot happen if content is attached to a physical carrier. Only digital technology provides disintermediation
Accuracy - Readability Digital technology – HW and SW – changes often YCM 3-strip lasts a long time … recombination issues … generational loss
Accuracy Content integrity over extended time durations requires a managed archive A Digital Archive without an automated management SYSTEM is not an archive – it’s sheet metal, plastic & media
Model Overview Automated, rules-based software archive system Automated computer tape libraries Servers & front-end disk Why tape? Holo is too slow, disk too expensive When tape is replaced the robots & price point will survive
Model Overview Location 1 Movie Copy 1 Movie Copy 2 Location 2 Movie Copy 3 Movie Copy 4
Accuracy Irreplaceable content Multiple copies Multiple libraries Automated audit, copy Algorithmic assurance of bit integrity Error Correction Codes (ECC) Bit Error Detection Bit Error Correction
Accuracy ECC Standard on tape drives COTS technology Bit Error Rates Bit Error Rates (BER) differ by manufacturer ECC undetected BER = Four copies = ECC uncorrected BER = Four copies =
Accuracy How big is the archive object? DCDM + Trims & Outs … 20TB? 2 x Bytes 2 x bits OCN … 100TB? Bytes bits
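As a rough check on the sizes above, the conversion from terabytes to bytes and bits can be sketched as follows. This assumes decimal (SI) terabytes and 8 bits per byte; the function name `object_size` is illustrative and not from the talk, and the slide's own figures may have been rounded differently.

```python
# Assumption: decimal (SI) units, 1 TB = 10**12 bytes, 8 bits per byte.
BYTES_PER_TB = 10**12
BITS_PER_BYTE = 8


def object_size(terabytes):
    """Return (bytes, bits) for an archive object of the given size in TB."""
    n_bytes = terabytes * BYTES_PER_TB
    return n_bytes, n_bytes * BITS_PER_BYTE


# The two archive-object sizes named on the slide.
for label, tb in [("DCDM + Trims & Outs", 20), ("OCN", 100)]:
    n_bytes, n_bits = object_size(tb)
    print(f"{label}: {tb} TB = {n_bytes:.1e} bytes = {n_bits:.1e} bits")
```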
Accuracy ECC BER 20 TB archive object 2 x bits = one bit lost in 2 x DCDM 100 TB archive object bits = one bit lost in objects Generational loss? Minimum 20 generations of migration
Accuracy For this application it doesn’t matter how many times the data is accessed or how many generations of rewrite occur ECC undetected = … on one copy of four in the archive The probability of failure one or more times during N accesses is 1 - (1 - 10^-19)^N For N less than 10^19, this is approximated by N * 10^-19
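The relationship between the exact failure probability and its linear approximation can be sketched numerically. The per-access rate of 10^-19 below is an assumption, taken from the approximation quoted on the slide; the function name is illustrative. The computation goes through `log1p` and `expm1` because 1 - 10^-19 rounds to exactly 1.0 in double-precision floating point, which would make the naive formula return zero.

```python
import math


def prob_at_least_one_failure(p, n):
    """P(one or more failures in n independent accesses) = 1 - (1 - p)**n.

    Evaluated as -expm1(n * log1p(-p)) so that very small p is not lost
    to floating-point rounding.
    """
    return -math.expm1(n * math.log1p(-p))


# Assumption: per-access rate implied by the slide's N * 10^-19 approximation.
P_PER_ACCESS = 1e-19

for n in (10**3, 10**6, 10**12):
    exact = prob_at_least_one_failure(P_PER_ACCESS, n)
    approx = n * P_PER_ACCESS
    print(f"N = {n:.0e}: exact = {exact:.6e}, N*p = {approx:.6e}")
```

For any N far below 10^19 the two columns agree to many digits, which is why the number of accesses or rewrite generations has so little effect on the result.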
Accuracy Example – one archived copy Assume a movie accessed 10^3 times The chance of an uncorrectable bit error per read is * = Model has four copies 10^3 * (2 * ) = 2 * … one bit lost in 2 * (10TB) 10^3 * = … one bit lost in (100TB)
Accuracy Secure Hash Algorithm – SHA-256 Checksum failure probability (approximately ) Four-copy BER = One undetected bit loss in movies Voting bit-by-bit Can make a 10TB DCDM into 40 x 1TB files, 31 of which would have to be damaged to preclude rebuilding the original 10TB file
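One way the checksum and bit-by-bit voting ideas could fit together is sketched below. This is a minimal illustration, not the archive system described in the talk: the functions `majority_vote`, `recover` and `flip_bit` are hypothetical names, the copies are small in-memory byte strings rather than tape files, and a tie among four copies is resolved arbitrarily to 0. Only `hashlib.sha256` is a real library call.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def majority_vote(copies):
    """Rebuild a byte string by majority vote on every bit position.

    A bit is set only if more than half the copies have it set, so a
    2-2 tie among four copies resolves to 0 (an arbitrary choice here).
    """
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("all copies must have the same length")
    half = len(copies) / 2
    voted = bytearray(length)
    for i in range(length):
        for bit in range(8):
            ones = sum((c[i] >> bit) & 1 for c in copies)
            if ones > half:
                voted[i] |= 1 << bit
    return bytes(voted)


def recover(copies, expected_digest):
    """Return an intact copy if one exists, else a voted rebuild whose
    digest matches the stored one, else None."""
    for c in copies:
        if sha256_hex(c) == expected_digest:
            return c
    voted = majority_vote(copies)
    return voted if sha256_hex(voted) == expected_digest else None


def flip_bit(data: bytes, byte_index: int, bit: int) -> bytes:
    """Hypothetical helper: simulate a single-bit error in one copy."""
    damaged = bytearray(data)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)


original = b"archive object payload"
digest = sha256_hex(original)
# Each of the four copies carries a different single-bit error,
# so no individual copy passes the checksum.
damaged_copies = [flip_bit(original, i * 3, i) for i in range(4)]
restored = recover(damaged_copies, digest)
print("restored matches original:", restored == original)
```

The stored digest is what lets the voted result be trusted: voting proposes a rebuild, and the checksum confirms it. The 40 x 1TB figure on the slide is consistent with a scheme in which any 10 of the 40 pieces suffice (40 - 10 + 1 = 31), but that scheme is not sketched here.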
Accuracy Software Reliability A digital archive is a managed archive Audit every bit every 6 months Read in old format, rewrite in new format Image data is not changed, only the file wrapper, application accessing the file, or operating system running the app.
Digital Archive Longevity Accuracy Cost
Digital Cost Conservative pricing model List price, no depreciation or salvage HW, SW, Maintenance, Media 150% uplift (Avg $729K/yr 20TB; $826K/yr 100TB) Labor Floor space Electricity
What are the Film costs? *approximately
What are the Digital costs?
20 Features per year, 100 years – Average per-Century Cost
Includes uplift for Floor Space, Labor, Electricity

Archive Object   150% Uplift   Average Annual     200% Uplift   Average Annual
Size (TB)        (per title)   Uplift (150%)      (per title)   Uplift (200%)
100              $75,000       $826,000           $83,000       $1,102,000
20               $54,000       $729,000           $65,000       $972,000
Comparison – 100TB – 150% Uplift

                       Per-Title, Per-Century   2,000-Title (i.e. FULL)
                       Archive Cost             Century Archive Cost
Film                   $130,000                 $260,000,000
Digital                $75,000                  $137,700,000
Delta (Digital Less)   $55,000                  $122,300,000

Redundancy Observation – Redundant copies of the Archive Object: Film: Zero; Digital: Three
100TB Archive Object 150% Uplift Labor Electricity Floor Space Avg per-year Uplift: $826,000
Comparison – 100TB – 200% Uplift *

                       Per-Title, Per-Century   2,000-Title (i.e. FULL)
                       Archive Cost             Century Archive Cost
Film                   $130,000                 $260,000,000
Digital                $83,000                  $165,000,000
Delta (Digital Less)   $47,000                  $95,000,000

Redundancy Observation – Redundant copies of the Archive Object: Film: Zero; Digital: Three
100TB Archive Object 200% Uplift Labor Electricity Floor Space Avg per-year Uplift: $1,102,000
Comparison – 20TB – 150% Uplift *

                       Per-Title, Per-Century   2,000-Title (i.e. FULL)
                       Archive Cost             Century Archive Cost
Film                   $130,000                 $260,000,000
Digital                $54,000                  $121,000,000
Delta (Digital Less)   $76,000                  $139,000,000

Redundancy Observation – Redundant copies of the Archive Object: Film: Zero; Digital: Three
20TB Archive Object 150% Uplift Labor Electricity Floor Space Avg per-year Uplift: $729,000
Comparison – 20TB – 200% Uplift *

                       Per-Title, Per-Century   2,000-Title (i.e. FULL)
                       Archive Cost             Century Archive Cost
Film                   $130,000                 $260,000,000
Digital                $65,000                  $146,000,000
Delta (Digital Less)   $65,000                  $114,000,000

Redundancy Observation – Redundant copies of the Archive Object: Film: Zero; Digital: Three
20TB Archive Object 200% Uplift Labor Electricity Floor Space Avg per-year Uplift: $972,000
Comparison (100TB, 200% Uplift) 2,000 Titles + Repurposing

          Per-Title Per-Century   2,000-Title Century   Per-Title          One Repurposing per   Total: Archive
          Archive Cost            Archive Cost          Repurposing Cost   Title per Century     + Repurpose
Film      $130,000                $260,000,000          $65,000            $130,000,000          $390,000,000
Digital   $83,000                 $166,000,000          $7,000             $14,000,000           $180,000,000
Delta     $47,000                 $94,000,000           $58,000            $116,000,000          $210,000,000
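The arithmetic behind this comparison can be reproduced with a short sketch. The per-title figures are the ones on the slide; the dictionary layout and the function name `century_totals` are illustrative assumptions, and the calculation simply multiplies per-title costs by 2,000 titles with one repurposing per title per century.

```python
TITLES = 2_000  # 20 features per year for 100 years

# Per-title, per-century figures from the 100TB, 200% uplift comparison.
per_title = {
    "Film": {"archive": 130_000, "repurpose": 65_000},
    "Digital": {"archive": 83_000, "repurpose": 7_000},
}


def century_totals(costs, titles=TITLES):
    """Return (archive, repurpose, total) cost for the full century archive."""
    archive = costs["archive"] * titles
    repurpose = costs["repurpose"] * titles
    return archive, repurpose, archive + repurpose


film = century_totals(per_title["Film"])
digital = century_totals(per_title["Digital"])
# Delta is Film minus Digital, column by column.
delta = tuple(f - d for f, d in zip(film, digital))

for name, row in (("Film", film), ("Digital", digital), ("Delta", delta)):
    print(f"{name:8s} archive ${row[0]:,}  repurpose ${row[1]:,}  total ${row[2]:,}")
```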
                  Per-Title Per-Century   2,000-Title Century   Per-Title          One Repurposing per   Total: Archive
                  Archive Cost            Archive Cost          Repurposing Cost   Title per Century     + Repurpose
Film              $130,000                $260,000,000          $65,000            $130,000,000          $390,000,000
Digital (100TB)   $83,000                 $165,000,000          $7,000             $14,000,000           $179,000,000
Delta             $47,000                 $95,000,000           $58,000            $116,000,000          $211,000,000

Longevity In a “perfect” archive, the archive would retain the content … forever Nothing in the physical world could cause its deterioration This cannot happen if content is attached to a physical carrier. Disintermediation

Accuracy Example – one archived copy
– Assume a movie accessed 10^6 times
– The chance of an uncorrectable bit error per read is
– The chance of an uncorrectable bit error on any one of 10^6 reads is 10^3 * =
Model has four copies
– 10^3 * (2 * ) = 2 * … one bit lost in 2 * (10TB)
– 10^3 * = … one bit lost in (100TB)
If you remember only ONE thing… “Traditional analog methods of archiving cannot keep up with … the loss of analog process expertise.” – Dan Rosen, VP, WB, May 2007
Thank You For further discussion: Google: “Archiving Movies in a Digital World” (2008) Grab me in the hall Breakfast Roundtable tomorrow