
1 Overview of LOCKSS

2 Session Learning Objectives
- Provide an overview of the LOCKSS architecture.
- Describe the LOCKSS polling process.
- Describe how LOCKSS private networks differ.
- Provide a vocabulary of technical terms used frequently with LOCKSS networks.

3 Architectural Components
- Provider Sites (digital collections)
- LOCKSS nodes (aka “peers”)
- Plugins / Plugin Repository
- Cache Manager
- Title Database / Conspectus Database

4 Provider Sites
- Prepare a digital collection so that it is web accessible to the preservation nodes
- Expose a “manifest” web page for each collection, according to LOCKSS specifications. The manifest page:
  - Grants permission for LOCKSS to crawl
  - Gives the starting point for the crawl
- Provide information sufficient to create a LOCKSS plugin for the collection (or else create the plugin themselves and deposit that plugin with the LOCKSS network)

5 LOCKSS Peer Nodes
- Data caches for harvested content
- Caches organized into archival units (AUs)
- Nodes can select which AUs to crawl and preserve
- There must be >= 6 copies of an AU in order for the polling process to work properly

6 Plugins / Plugin Repository
- Tell LOCKSS where, how and how often to crawl a provider site for AUs
- Plugins are Java based
- Distinct from core LOCKSS software

7 Cache Manager
- Distributed separately from LOCKSS
- Can remotely inspect and manage the caches on the various peer nodes

8 Title / Conspectus Databases
- The Title database on each node describes and manages which AUs to preserve on that node
- The Conspectus Database, designed for the MetaArchive Project, provides more extensive metadata about the preserved digital collections and feeds the Title database with entries

9 [Diagram. Labels: Digital Collection 1; Digital Collection 2; Web Site; Source Code; SQL Dump; Manifest page; AU 1, AU 2, AU 3; Plugin Repository; Private LOCKSS Network Nodes holding copies of DC1 and DC2]

10 The Polling Process

11 Polling Process resulting in “landslide loss”, AU repair
- Node 5 calls a poll on AU 1 of Digital Collection 2 (DC2-AU1).
- Node 5 invites some recently encountered peers to vote. (Each node maintains a reference list of recently encountered peers.) Those invited are the “inner circle” for this opinion poll.
- Invited nodes create a fresh SHA1 digest of the AU.
- Messages exchanged: Invitation, PollChallenge, PollProof.
- An affirmative PollChallenge response allows that inner circle node to participate in the poll.
- The Poll Effort Proof (PollProof) is cryptographically derived and sent in reply to the affirmative voters’ challenges.
- Node 9 nominates Nodes 7 and 8. Nominated Nodes 7 and 8 belong to the “outer circle” and can be invited to subsequent voting rounds by Node 5. Node 5 discovers new peers through this nomination process.
- Diagram legend: valid vote agrees; valid vote disagrees.
- There is a “landslide” of valid, disagreeing votes against Node 5’s SHA1 digest of DC2-AU1.
- Since agreeing votes are below threshold, Node 5 picks a random disagreeing voter from the inner circle and sends it an encrypted RepairRequest message. The repair is made.
- Once the repair is completed, Node 5 immediately calls a new poll, which effectively verifies, or invalidates and corrects, the repair.
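The digest comparison and repair-source choice on this slide can be sketched roughly as follows. This is a simplified illustration, not the LOCKSS implementation: it omits the Invitation, PollChallenge and PollProof exchange entirely, and the function names, the in-memory AU representation and the peer identifiers are all hypothetical.

```python
import hashlib
import random


def au_digest(files):
    """Return a SHA-1 hex digest of an AU given as {relative_path: content_bytes}.

    Assumption: the AU is small enough to hold in memory; a real cache
    would stream content from disk.
    """
    h = hashlib.sha1()
    for path in sorted(files):  # fixed order so every peer hashes identically
        for field in (path.encode("utf-8"), files[path]):
            # Length-prefix each field so adjacent fields cannot run together.
            h.update(len(field).to_bytes(8, "big"))
            h.update(field)
    return h.hexdigest()


def tally(poller_digest, votes):
    """Split valid votes {peer_id: digest} into agreeing and disagreeing peers."""
    agreeing = sorted(p for p, d in votes.items() if d == poller_digest)
    disagreeing = sorted(p for p, d in votes.items() if d != poller_digest)
    return agreeing, disagreeing


def pick_repair_source(disagreeing, rng=random):
    """Pick a random disagreeing inner-circle voter to ask for a repair."""
    return rng.choice(disagreeing)
```

For example, if the poller's copy differs from six otherwise identical copies, `tally` returns no agreeing peers and six disagreeing ones, and `pick_repair_source` selects one of the six as the repair source.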

12 Polling Refresh Timer
- A peer sets a refresh timer for a given AU to determine the interval between successive polls
- System parameter R is the mean of the random values generated for the refresh timer
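A minimal sketch of drawing a randomized refresh interval with mean R. The slide gives only the mean; the uniform distribution, the `spread` parameter and the function name are assumptions made for illustration.

```python
import random


def next_refresh_interval(mean_r, spread=0.5, rng=random):
    """Draw a poll interval uniformly from [R*(1-spread), R*(1+spread)].

    Assumption: a symmetric uniform range is used so that the mean of the
    drawn values is R; the actual distribution is not stated on the slide.
    """
    if mean_r <= 0 or not 0 <= spread < 1:
        raise ValueError("mean_r must be positive and spread must be in [0, 1)")
    return rng.uniform(mean_r * (1 - spread), mean_r * (1 + spread))
```

Randomizing the interval keeps peers from polling the same AU in lockstep while the average gap between polls stays at R.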

13 System Parameter – ‘Quorum’
- Q = number of valid inner circle votes required to conclude a poll successfully
- Q = 6 is the thoroughly tested value in use
- If votes < Q, the poller invites additional peers, or else aborts the opinion poll
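The quorum rule can be read as a small decision function. This is a sketch under assumptions: the function name, the return labels and the idea of counting "peers still available to invite" are hypothetical; only Q = 6 comes from the slide.

```python
QUORUM = 6  # Q, the value the slide describes as thoroughly tested


def quorum_step(valid_votes, peers_left_to_invite, quorum=QUORUM):
    """Decide the poller's next action from the number of valid inner circle votes."""
    if valid_votes >= quorum:
        return "tally"        # enough valid votes to evaluate the poll
    if peers_left_to_invite > 0:
        return "invite_more"  # below quorum, but more peers can be invited
    return "abort"            # below quorum and nobody left to invite
```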

14 Polling Outcome – ‘Landslide Win’
- The poller considers its current copy to have integrity
- This is the only scenario in which an opinion poll concludes successfully
- The poller updates its reference list and then waits until the next polling period (determined by the refresh timer)

15 Reference List Update
- Happens only after a successful poll
- Poller removes the inner circle peers that cast valid votes in the last opinion poll
- Culls peers it has not been able to contact for some time
- Adds outer circle peers whose votes were valid and eventually agreeing
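The three update steps map naturally onto set operations. The sketch below assumes peers are represented by hashable identifiers and that the caller already knows which peers are unreachable; the function and parameter names are hypothetical.

```python
def update_reference_list(reference, inner_valid_voters, unreachable, outer_agreeing):
    """Return the reference list to use after a successful poll.

    reference          -- peers on the current reference list
    inner_valid_voters -- inner circle peers that cast valid votes in this poll
    unreachable        -- peers the poller has not been able to contact for some time
    outer_agreeing     -- outer circle peers whose votes were valid and eventually agreeing
    """
    updated = set(reference)
    updated -= set(inner_valid_voters)  # remove peers that just voted
    updated -= set(unreachable)         # cull peers that cannot be contacted
    updated |= set(outer_agreeing)      # add newly discovered, agreeing peers
    return updated
```

Using the node numbers from the polling slide: with a reference list of nodes 1, 2, 3, 4, 6, 9 and 10, valid inner circle votes from 1, 2, 3, 4, 6 and 9, node 10 unreachable, and outer circle nodes 7 and 8 agreeing, the updated list holds nodes 7 and 8.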

16 Polling Outcome - Inconclusive
- D = maximum allowed number of “minority” votes
- If Agreeing Votes > D, and
- Agreeing Votes < Total valid votes – D,
- Then the poll is inconclusive and raises an alarm
- Human intervention is needed to determine if nodes have been compromised
- Peers that voted in agreement with a known bad copy are blacklisted if the peer can’t be identified or won’t cooperate
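The inconclusive condition, together with its two complements, can be sketched as a single classifier. The landslide win and loss boundaries below are inferred as the complement of the slide's inconclusive range rather than quoted from it, the function name and labels are hypothetical, and the sketch assumes total valid votes > 2D so the two landslide ranges do not overlap.

```python
def classify_poll(agreeing, total_valid, d):
    """Classify a poll from the agreeing-vote count and the minority limit D."""
    if not 0 <= agreeing <= total_valid:
        raise ValueError("agreeing must be between 0 and total_valid")
    if total_valid <= 2 * d:
        raise ValueError("sketch assumes total_valid > 2 * d")
    if agreeing <= d:
        return "landslide_loss"  # at most D agree: poller's copy is suspect, repair
    if agreeing >= total_valid - d:
        return "landslide_win"   # at most D disagree: poller's copy has integrity
    return "inconclusive"        # D < agreeing < total - D: raise an alarm
```

With six valid votes and an illustrative D = 1 (the slides do not give a value for D), five or six agreeing votes is a landslide win, zero or one is a landslide loss, and two to four is inconclusive.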

17 Further Details on Polling Process
- Petros Maniatis, Mema Roussopoulos, TJ Giuli, David S. H. Rosenthal, Mary Baker, and Yanto Muliadi, "LOCKSS: A Peer-to-Peer Digital Preservation System", ACM Transactions on Computer Systems (TOCS), 2005. TOCS2005.pdf
- See also LOCKSS related publications at

18 The LOCKSS Private Network Difference
- More flexible (not appliance based)
  - Can run on any operating system that supports Java
- LOCKSS Team maintains rpm packages for Linux installations
  - Peer Node administrators have greater discretion in configuring access and customizing functionality, e.g. altering system parameters

19 The LOCKSS Private Network Difference (cont.)
- Can extend LOCKSS core functionality with supplemental tools and methods to fit new use cases
- E.g. the MetaArchive Conspectus database

20 Vocabulary
- (Please refer to the workshop binder for terminology and definitions)

21 Overview of LCAP version 3

