
1 Auditing PLNs: Preliminary Results and Next Steps Prepared for PLN 2012, UNC, Chapel Hill, October 2012. Micah Altman, Director of Research, MIT Libraries; Non-Resident Senior Fellow, The Brookings Institution. Jonathan Crabtree, Assistant Director of Computing and Archival Research, H.W. Odum Institute for Research in Social Science, UNC

2 Collaborators* Nancy McGovern; Tom Lipkis & the LOCKSS team. Research support: thanks to the Library of Congress, the National Science Foundation, IMLS, the Sloan Foundation, the Harvard University Library, the Institute for Quantitative Social Science, and the Massachusetts Institute of Technology. (* And co-conspirators)

3 Related Work Reprints available from: micahaltman.com. M. Altman & J. Crabtree, “Using the SafeArchive System: TRAC-Based Auditing of LOCKSS”, Proceedings of Archiving 2011, Society for Imaging Science and Technology. M. Altman, B. Beecher & J. Crabtree, “A Prototype Platform for Policy-Based Archival Replication”, Against the Grain 21(2), 2009.

4 Preview Why audit? Theory & Practice – Round 0: Setting up the Data-PASS PLN – Round 1: Self-Audit – Round 2: Compliance (almost) – Round 3: Auditing Other Networks What’s next?

5 Why audit?

6 Short Answer: Why the heck not? “Don’t believe in anything you hear, and only half of what you see” – Lou Reed. “Trust, but verify.” – Ronald Reagan

7 Slightly Longer Answer: Things Go Wrong – Insider & External Attacks – Physical & Hardware – Software – Media – Curatorial Error – Organizational Failure

8 Full Answer: It’s our responsibility

9 OAIS Model Responsibilities Accept appropriate information from information producers. Obtain sufficient control of the information to ensure long-term preservation. Determine which groups should become the Designated Community (DC) able to understand the information. Ensure that the preserved information is independently understandable to the DC. Ensure that the information can be preserved against all reasonable contingencies. Ensure that the information can be disseminated as authenticated copies of the original, or as traceable back to the original. Make the preserved data available to the DC.

10 OAIS Basic Implied Trust Model The organization is axiomatically trusted to identify designated communities. The organization is engineered with the goal of: – Collecting appropriate, authentic documents – Reliably delivering authentic documents, in understandable form, at a future time Success depends upon: – Reliability of storage systems: e.g., LOCKSS network, Amazon Glacier – Reliability of organizations: e.g., MetaArchive, Data-PASS, Digital Preservation Network – Document contents and properties: formats, metadata, semantics, provenance, authenticity

11 Reflections on the OAIS Trust Model A specific bundle of trusted properties. Complete neither instrumentally nor ultimately.

12 Trust Engineering Approaches
Incentive-based approaches: rewards, penalties, incentive-compatible mechanisms
Modeling and analysis: statistical quality control & reliability estimation, threat modeling and vulnerability assessment
Portfolio theory: diversification (financial, legal, technical…), hedges
Over-engineering approaches: safety margins, redundancy
Informational approaches: transparency (release of information needed to directly evaluate compliance); cryptographic signatures, fingerprints, common knowledge, non-repudiation
Social engineering: recognized practices; shared norms; social evidence; reduce provocations; remove excuses
Regulatory approaches: disclosure; review; certification; audits; regulations & penalties
Security engineering: increase effort (harden the target, reduce vulnerability, increase technical/procedural controls); increase risk (surveillance, detection, likelihood of response); design patterns (minimal privileges, separation of privileges); reduce reward (deny benefits, disrupt markets, identify property, remove/conceal targets)

13 Audit [aw-dit]: An independent evaluation of records and activities to assess a system of controls. Fixity mitigates risk only if used for auditing.
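To make the definition concrete, here is a minimal sketch of a manifest-based fixity audit in Python. The manifest format and the SHA-256 choice are illustrative assumptions, not a description of any particular system:

```python
import hashlib
import json
from pathlib import Path

def sha256(path: Path) -> str:
    """Compute a SHA-256 fixity value for one file."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

def audit(root: Path, manifest_file: Path) -> list:
    """Compare current fixity values against a stored manifest
    ({relpath: hash}). Returns files that are missing or changed --
    fixity only mitigates risk because this comparison happens."""
    manifest = json.loads(manifest_file.read_text())
    failures = []
    for relpath, expected in manifest.items():
        target = root / relpath
        if not target.exists() or sha256(target) != expected:
            failures.append(relpath)
    return failures
```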

14 Functions of Storage Auditing Detect corruption or deletion of content; verify compliance with storage/replication policies; prompt repair actions.

15 Bit-Level Audit Design Choices Audit regularity and coverage: on demand (manually); on object access; on event; randomized sample; scheduled/comprehensive. Fixity check & comparison algorithms. Auditing scope: integrity of object; integrity of collection; integrity of network; policy compliance; public/transparent auditing. Trust model. Threat model.
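One way to read this list is as the parameter space of an auditor. A hypothetical configuration object (the field names are illustrative inventions, not the SafeArchive schema) makes each choice explicit:

```python
from dataclasses import dataclass

@dataclass
class AuditPolicy:
    # Regularity and coverage: "on_demand", "on_access", "on_event",
    # "random_sample", or "scheduled_full"
    schedule: str = "random_sample"
    sample_fraction: float = 0.10      # coverage per audit cycle
    fixity_algorithm: str = "sha256"   # fixity check algorithm
    # Scope: "object", "collection", "network", or "policy"
    scope: str = "collection"
    # Trust model: trust local hashes, or require peer agreement?
    require_peer_agreement: bool = True
    min_agreeing_replicas: int = 3
```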

16 Repair: Key Design Elements Repair granularity. Repair trust model. Repair latency: detection to start of repair; repair duration. Repair algorithm. Auditing mitigates risk only if used for repair.
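A hedged sketch of the repair path, assuming a hypothetical fetch_from_peer/write_local interface to replicas that passed the audit; the names are stand-ins, not a real API:

```python
import hashlib
import time

def sha256_bytes(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def repair(relpath, expected_hash, peers, fetch_from_peer, write_local):
    """Try trusted peers until one supplies a copy matching the audited
    fixity value; returns detection-to-completion latency in seconds."""
    started = time.monotonic()
    for peer in peers:                       # repair trust model: audited peers only
        data = fetch_from_peer(peer, relpath)
        if data is not None and sha256_bytes(data) == expected_hash:
            write_local(relpath, data)       # repair granularity: one object
            return time.monotonic() - started
    raise RuntimeError("no peer could supply a good copy of " + relpath)
```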

17 LOCKSS Auditing & Repair Decentralized, peer-to-peer, tamper-resistant replication & repair
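A deliberately simplified sketch of the polling idea behind this. Real LOCKSS polls use nonces and sampled content hashing to resist tampering, which this omits; it only shows majority agreement among peers:

```python
from collections import Counter

def poll(au_hashes):
    """au_hashes: peer name -> that peer's hash of an AU.
    Returns (winning hash or None, peers that should repair)."""
    tally = Counter(au_hashes.values())
    winner, votes = tally.most_common(1)[0]
    if votes <= len(au_hashes) // 2:         # no majority: inconclusive poll
        return None, []
    return winner, [p for p, h in au_hashes.items() if h != winner]

# poll({"boxX": "ab12", "boxY": "ab12", "boxZ": "ffff"})
# -> ("ab12", ["boxZ"])
```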

18 Auditing & Repair TRAC-aligned policy auditing as an overlay network

19 Theory vs. Practice Round 0: Setting up the Data-PASS PLN “Looks ok to me” – PHB Motto

20 Theory Expose content (through OAI+DDI+HTTP) → Install LOCKSS (on 7 servers) → Harvest content (through OAI plugin) → Set up PLN configurations → LOCKSS magic → Done

21 Practice (Year 1) OAI plugin extensions required: – Non-DC metadata – Large metadata – Alternate authentication method – Save metadata record – Support for OAI sets – Non-fatal error handling OAI provider required: – Authentication extensions – Performance handling for delivery – Performance handling for errors – Metadata validation PLN configuration required: – Stabilization around LOCKSS versions – Coordination around plugin repository – Coordination around AU definition
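For orientation, a minimal sketch of the harvesting loop this plugin work revolves around: OAI-PMH ListRecords with resumptionToken paging. The oai_ddi prefix is an assumption standing in for whatever DDI metadata format the provider exposes:

```python
import time
import xml.etree.ElementTree as ET
import requests

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def harvest(base_url, metadata_prefix="oai_ddi", set_spec=None):
    """Yield all records from an OAI-PMH provider via ListRecords,
    following resumptionTokens across pages."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec             # OAI set support
    while True:
        resp = requests.get(base_url, params=params, timeout=60)
        resp.raise_for_status()
        root = ET.fromstring(resp.content)
        for record in root.iter(OAI + "record"):
            yield record                     # caller saves the metadata record
        token = root.find(".//" + OAI + "resumptionToken")
        if token is None or not (token.text or "").strip():
            return
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
        time.sleep(1)                        # be polite to the provider
```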

22 Theory vs. Practice Round 1: Self-Audit “A mere matter of implementation” - PHB Motto

23 Theory Gather information from each replica → Integrate information into a map of network state → Compare current network state to policy → State == Policy? If YES: success. If NO: add a replica and repeat.
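A minimal sketch of this loop, assuming a get_replica_state helper that reports which AUs a replica holds with verified content; the helper name and the policy shape are assumptions, not the SafeArchive API:

```python
def audit_network(replicas, get_replica_state, policy):
    """One audit pass: gather per-replica state, integrate it into a
    map of the network, and compare that map against policy."""
    # Gather: ask every replica what it holds
    states = {r: get_replica_state(r) for r in replicas}

    # Integrate: map each AU to the replicas holding a verified copy
    network_map = {}
    for replica, au_list in states.items():
        for au in au_list:
            network_map.setdefault(au, set()).add(replica)

    # Compare: policy here is just {"min_copies": N}
    failures = {au: holders for au, holders in network_map.items()
                if len(holders) < policy["min_copies"]}
    return network_map, failures

# If failures is non-empty, the theory says: add a replica and re-audit.
```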

24 Implementation

25 Practice (Year 2) Gathering information required: – Permissions – Reverse-engineering UIs (with help) – Network magic Integrating information required: – Heuristics for lagged information – Heuristics for incomplete information – Heuristics for aggregated information Comparing the map to policy required: a mere matter of implementation. Adding replicas: uh-oh, most policies failed, and adding replicas wasn’t going to resolve most issues.

26 Theory vs. Practice Round 2: Compliance (almost) “How do you spell ‘backup’? R-E-C-O-V-E-R-Y”

27 Practice (and adjustment) makes perfekt? Timings (e.g. crawls, polls): – Understand – Tune – Parameterize heuristics, reporting – Track trends over time Collections: – Change partitioning to AUs at source – Extend mapping to AUs in plugin – Extend reporting/policy framework to group AUs Diagnostics: – When things go wrong, information to inform adjustment
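A toy sketch of the “track trends over time” item: a rolling per-AU agreement rate that makes tuning observable. The interface is invented for illustration:

```python
from collections import deque

class AgreementTrend:
    """Rolling window over poll outcomes for one AU: 1 = agreed,
    0 = disagreed. A falling rate flags a timing or content problem."""
    def __init__(self, window=20):
        self.outcomes = deque(maxlen=window)

    def record(self, agreed: bool):
        self.outcomes.append(1 if agreed else 0)

    def rate(self) -> float:
        # Defaults to 1.0 (no evidence of trouble) before any polls
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 1.0
```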

28 Theory vs. Practice Round 3: Auditing Other PLNs “In theory, theory and practice are the same – in practice, they differ.”

29 Theory Gather information from each replica → Integrate information into a map of network state → Compare current network state to policy → State == Policy? If YES: success. If NO: already adjusted? If not, adjust AU sizes and polling intervals and re-audit; if so, add a replica.

30 Practice (Year 3) 100% of what? Diagnostic inference

31 100% of what? Of LOCKSS boxes? No. Of AUs? No. Of policy overall? Almost. Of policy for a specific collection? Yes. Of files? Maybe. Of bits in a file? Maybe.

32 What you see: Boxes X, Y, Z all agree on AU A. What you can conclude: Boxes X, Y, Z have the same content, and the content is good. Assumption: failures on file harvest are independent, and the number of harvested files is large.
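Why the independence assumption carries so much weight: a back-of-envelope calculation, with an illustrative failure rate:

```python
# If each box independently mangles a given file with probability p,
# three boxes can only "agree on bad content" if all three fail,
# and fail identically.
p = 0.01          # per-box, per-file failure rate (illustrative)
k = 3             # number of replicas that agree
p_all_fail = p ** k
print(p_all_fail)  # 1e-06 -- before even requiring the three
                   # failures to match bit-for-bit
# So unanimous agreement across independent boxes is strong evidence
# that the shared content is what the source actually served.
```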

33 What you see: Boxes X, Y, Z don’t agree. What can you conclude?

34 Hypothesis 1: Disagreement is real, but doesn’t really matter. Non-substantive AU differences (arising from dynamic elements in AUs that have no bearing on the substantive content):
1.1 Individual URLs/files that are dynamic and non-substantive (e.g., logo images, plugins, Twitter feeds) cause content changes (this is common in the GLN).
1.2 Dynamic content embedded in substantive content (e.g., a customized per-client header page embedded in the PDF of a journal article).
Hypothesis 2: Disagreement is real, but doesn’t really matter in the longer run (even if the disagreement persists over the long run!):
2.1 Temporary AU differences: versions of objects temporarily out of sync (e.g., if harvest frequency << source update frequency, but harvest times across boxes vary significantly).
2.2 Objects temporarily missing (e.g., recently added objects are picked up by some replicas, not by others).
Hypothesis 3: Disagreement is real, and matters. Substantive AU differences:
3.1 Content corruption (e.g., from corruption in storage, or during transmission/harvesting).
3.2 Objects persistently missing from some replicas (e.g., because of permissions problems at the provider, technical failures during harvest, or plugin problems).
3.3 Versions of objects persistently missing or out of sync on some replicas (e.g., harvest frequency > source update frequency, leading to different boxes harvesting different versions of the content). Note that later “agreement” signifies that a particular version was verified, not that all versions have been replicated and verified.
Hypothesis 4: AUs really do agree, but we think they don’t:
4.1 Appearance of disagreement caused by incomplete diagnostic information: poll data are missing as a result of system reboot, daemon updates, or another cause.
4.2 Poll data are lagging (from different periods), or polls fail but contain information about agreement that is ignored.

36 Design Challenge Create more sophisticated algorithms, and instrument PLN data collection, such that observed behavior allows us to distinguish between hypotheses 1-4.
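As one illustration of what such an algorithm might look like, a rule-of-thumb classifier over instrumented poll observations. Every field name here is hypothetical, and real discrimination needs the richer instrumentation discussed in the following talk:

```python
def classify_disagreement(obs):
    """obs is a dict of instrumented facts about one disagreeing AU.
    Returns the most plausible hypothesis (H1-H4) from slide 34."""
    if obs.get("polls_missing_or_lagged"):        # stale or absent poll data
        return "H4: apparent disagreement, diagnostic artifact"
    if obs.get("diff_limited_to_dynamic_urls"):   # logos, feeds, per-client headers
        return "H1: real but non-substantive difference"
    if obs.get("diff_only_on_recent_versions"):   # boxes harvested at different times
        return "H2: temporary version skew, should converge"
    return "H3: substantive difference -- corruption or persistent gap"
```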

37 Approaches to the Design Challenge [Tom Lipkis’s Talk]

38 What’s Next? “It’s tough to make predictions, especially about the future” – attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston Churchill, Confucius, Disreali [sic], Freeman Dyson, Cecil B. DeMille, Albert Einstein, Enrico Fermi, Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan Lamport, Groucho Marx, Dan Quayle, George Bernard Shaw, Casey Stengel, Will Rogers, M. Taub, Mark Twain, Kerr L. White, and others

39 Short Term Complete round 3 data collection. Refinements of current auditing algorithms: – More tunable parameters (yeah?!) – Better documentation – Simple health metrics Reports and dissemination.

40 Longer Term Health metrics, diagnostics, decision support. Additional audit standards. Support additional replication networks. Audit other policy sets.

41 Bibliography (Selected) B. Schneier, Liars and Outliers, John Wiley & Sons, 2012. H.M. Gladney & J.L. Bennett, “What Do We Mean by Authentic?”, D-Lib Magazine 9(7/8), 2003. K. Thompson, “Reflections on Trusting Trust”, Communications of the ACM 27(8), August 1984, pp. 761-763. D.S.H. Rosenthal, T.S. Robertson, T. Lipkis, V. Reich & S. Morabito, “Requirements for Digital Preservation Systems: A Bottom-Up Approach”, D-Lib Magazine 11(11), November 2005. CCSDS, Reference Model for an Open Archival Information System (OAIS), CCSDS 650.0-B-1, Blue Book, January 2002.

42 Questions? Web: micahaltman.com

