Using Algebraic Signatures in Storage Applications Thomas Schwarz, S.J. Associate Professor, Santa Clara University Associate, SSRC UCSC Storage Systems.


1 Using Algebraic Signatures in Storage Applications Thomas Schwarz, S.J. Associate Professor, Santa Clara University Associate, SSRC UCSC Storage Systems Research Center, University of California, Santa Cruz, Retreat, June 1-2, 2004

2 Signatures Small strings that characterize objects. Calculated from the object. Distinct signatures ⇒ objects different. Same signatures ⇒ objects same, with high probability. Error probability is 2^(-f), where f is the length of the signature in bits. A.k.a. checksums, hashes, fingerprints, condensed representations, …

3 Signatures Examples Tripwire: Protection against malware. Maintains the signatures of all system libraries in a secure location. Before a library module is called, verify signature of module.

4 Signatures Examples Remote comparison of files: Problem arose out of first prototypes of replicated databases. Divide records into pages. Calculate and compare signatures for all pages. Do this efficiently by combining signatures of a set of pages into a super-signature.
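The page-and-super-signature idea on this slide can be sketched as follows. This is a minimal illustration, not the scheme from the original prototypes: SHA-1 from Python's hashlib stands in for the signature function, and the names PAGE_SIZE, page_signatures, super_signature and differing_pages are hypothetical.

```python
import hashlib

PAGE_SIZE = 4096  # hypothetical page size, chosen only for the example


def page_signatures(data: bytes, page_size: int = PAGE_SIZE) -> list:
    """Split data into pages and return one signature per page."""
    return [hashlib.sha1(data[i:i + page_size]).digest()
            for i in range(0, len(data), page_size)]


def super_signature(sigs: list) -> bytes:
    """Combine a set of page signatures into a single signature."""
    return hashlib.sha1(b"".join(sigs)).digest()


def differing_pages(local: bytes, remote_sigs: list, remote_super: bytes) -> list:
    """Return the indices of pages that differ from the remote copy."""
    local_sigs = page_signatures(local)
    if super_signature(local_sigs) == remote_super:
        return []  # one comparison settles the common, unchanged case
    n = max(len(local_sigs), len(remote_sigs))
    return [i for i in range(n)
            if i >= len(local_sigs) or i >= len(remote_sigs)
            or local_sigs[i] != remote_sigs[i]]
```

Only the super-signature crosses the network when the copies agree; the per-page signatures are exchanged only on a mismatch.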

5 Signatures Integrity check for archival storage Keep two copies of archived data. Maintain the signatures of tape contents. Periodically “scrub” tapes.

6 Signatures Similarity Measurement between Files. Similarity of web-pages. Similarity of files in Deep-Store.

7 Signatures For Scalable Distributed Data Structures SDDS implement a large file of records in buckets distributed over a network. SDDS operations (insert, update, delete, read, scan) have execution times independent of SDDS file size. Use signatures of blocks to decide which portions of the bucket need to be backed up. More secure than dirty bit. Litwin, W., Mokadem, R., Schwarz, T.: Disk Backup through algebraic signatures in scalable and distributed data structures. Proc. 5th Workshop on Distributed Data and Structures, Thessaloniki, June 2003 (WDAS 2003).

8 Signatures For Scalable Distributed Data Structures Use signatures of records to test whether they have been changed. Leads to a read-verification based concurrency scheme. Read the record. Process the record. Verify that record has not changed by signature. Litwin, W., Mokadem, R., Schwarz, T.: Disk Backup through algebraic signatures in scalable and distributed data structures. Proc. 5th Workshop on Distributed Data and Structures, WDAS’03, Thessaloniki, June 2003. Schwarz, T., Holliday, J.: A Signature Based Concurrency Scheme for Scalable Distributed Data Structures. Workshop on Distributed Data and Structures, WDAS'04, Lausanne, 2004.
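The read, process, verify cycle on this slide might look like the sketch below. It is an illustration under assumptions, not the protocol of the cited papers: the Bucket class and its methods are hypothetical, and a 4-byte BLAKE2b digest stands in for the record signature.

```python
import hashlib


def signature(record: bytes) -> bytes:
    # Stand-in signature (assumption): 4-byte BLAKE2b digest,
    # not an algebraic signature.
    return hashlib.blake2b(record, digest_size=4).digest()


class Bucket:
    """Hypothetical in-memory record store used only for this sketch."""

    def __init__(self):
        self.records = {}

    def read(self, key):
        """Return the record together with its signature at read time."""
        rec = self.records[key]
        return rec, signature(rec)

    def write_if_unchanged(self, key, new_value, sig_at_read):
        """Write only if the stored record still has the signature seen at read."""
        if signature(self.records[key]) != sig_at_read:
            return False  # someone else changed the record in the meantime
        self.records[key] = new_value
        return True


def update(bucket, key, process, max_retries=10):
    """Read the record, process it, verify by signature, retry on conflict."""
    for _ in range(max_retries):
        rec, sig = bucket.read(key)
        if bucket.write_if_unchanged(key, process(rec), sig):
            return True
    return False
```

No lock is held while the record is processed; a conflicting update is caught at write time by the signature comparison.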

9 Cryptographically Secure Signatures Computationally infeasible to find an object with the same signature. Protects against malicious attacks. Used to protect data integrity or to sign data: compute the object's signature, apply the private key K to obtain K(signature), and store the encrypted signature K(signature) with the object.

10 Cryptographically Secure Signatures MD5: Rivest, 1992, 16 B. SHA-1: NSA, 1995, FIPS 180 / ANSI X9.30, 20 B. … Each implements a one-way hash.

11 Signatures with Algebraic Properties Composable signatures*: Capable of calculating object signatures from component objects. Updatable signatures: Calculate the new signature of a changed object from the old signature and the signature and location of the change. * Suel, T., Noel, P., and Trendafilov, D.: Improved File Synchronization for Maintaining Large Replicated Collections over Slow Networks. In Proc. 20th Int. Conf. on Data Engineering, ICDE, Boston, 2004, p. 153-164. Litwin, W., Schwarz, T.: Algebraic Signatures for Scalable Distributed Data Structures. Proc. of the 20th International Conference on Data Engineering (ICDE), Boston, 2004, p. 412-423.

12 Signatures with Algebraic Properties Algebraic properties prevent cryptographic security. Fundamental Design Trade-off.

13 Algebraic Signatures Karp-Rabin signatures over Galois fields. A Galois field defines addition, multiplication, subtraction, division, etc. over bit strings of length f. Same mathematical rules as for rational numbers, real numbers, complex numbers, etc. Single and compound signature of P = (p_1, p_2, …). Karp, R. and Rabin, M.: Efficient randomized pattern-matching algorithms. In IBM Journal of Research and Development, Vol. 31, No. 2, March 1987. Schwarz, T., Bowdidge, R. and Burkhard, W.: Low Cost Comparison of File Copies. In Proc. Intern. Conf. on Distributed Computing Systems, Paris, Fr., 1990, (ICDCS 5 Proceedings), p. 196-202.
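The formulas for the single and compound signature did not survive in this transcript. A minimal sketch under stated assumptions: the single signature is taken as sig_α(P) = p_1 + p_2·α + p_3·α^2 + …, evaluated in GF(2^f), and the compound signature as (sig_α(P), sig_α²(P), …, sig_α^m(P)). The field size f = 8, the polynomial 0x11D, the element α = 2 and all function names are choices made for this example; the index convention in the cited papers may differ.

```python
F = 8              # symbol length in bits (assumption for this sketch)
PRIM_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1, a primitive polynomial
ALPHA = 2          # primitive element of GF(2^8) for this polynomial


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^8) by shift-and-add."""
    r = 0
    while b:
        if b & 1:
            r ^= a          # addition in GF(2^f) is XOR
        a <<= 1
        if a & (1 << F):
            a ^= PRIM_POLY  # reduce modulo the field polynomial
        b >>= 1
    return r


def gf_pow(a: int, n: int) -> int:
    """Raise a to the n-th power by repeated multiplication."""
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r


def sig(page: bytes, alpha: int = ALPHA) -> int:
    """Single signature: sum of p_i * alpha^(i-1) over the page symbols."""
    s, power = 0, 1
    for p in page:
        s ^= gf_mul(p, power)
        power = gf_mul(power, alpha)
    return s


def compound_sig(page: bytes, m: int) -> tuple:
    """Compound signature: single signatures for alpha, alpha^2, ..., alpha^m."""
    return tuple(sig(page, gf_pow(ALPHA, j)) for j in range(1, m + 1))
```

For example, compound_sig(bytes([0, 1]), 2) evaluates to (2, 4): the only non-zero symbol sits at the second position and is weighted by α and by α^2.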

14 Algebraic Signatures Properties of compound signature: Size is mf bits. Detects for sure any change of up to m symbols. A symbol is a GF element, i.e. a bit string of length f. Collision probability is 2^(-fm).
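The detection guarantee can be illustrated with a small experiment. This is a sketch under the same assumptions as any GF(2^8) toy setup (f = 8, polynomial 0x11D, α = 2, pages shorter than 255 symbols, hypothetical function names). It shows a two-symbol change that a single signature (m = 1) misses and that a compound signature with m = 2 catches, then spot-checks random changes of up to m symbols.

```python
import random

F, PRIM_POLY, ALPHA = 8, 0x11D, 2  # assumptions for this sketch


def gf_mul(a, b):
    """Multiply in GF(2^8) by shift-and-add."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & (1 << F):
            a ^= PRIM_POLY
        b >>= 1
    return r


def sig(page, alpha):
    """Single signature: sum of p_i * alpha^(i-1)."""
    s, power = 0, 1
    for p in page:
        s ^= gf_mul(p, power)
        power = gf_mul(power, alpha)
    return s


def compound_sig(page, m):
    """Signatures for alpha, alpha^2, ..., alpha^m."""
    out, a = [], 1
    for _ in range(m):
        a = gf_mul(a, ALPHA)
        out.append(sig(page, a))
    return tuple(out)


def always_detected(m, trials, length=32, seed=0):
    """Change 1..m symbols of random pages and compare compound signatures."""
    rng = random.Random(seed)
    for _ in range(trials):
        page = bytearray(rng.randrange(256) for _ in range(length))
        changed = bytearray(page)
        for pos in rng.sample(range(length), rng.randint(1, m)):
            changed[pos] ^= rng.randrange(1, 256)  # non-zero delta: a real change
        if compound_sig(page, m) == compound_sig(changed, m):
            return False
    return True


# Two symbols changed: invisible to m = 1, visible to m = 2.
old_page, new_page = bytes([0, 0]), bytes([2, 1])
print(compound_sig(old_page, 1), compound_sig(new_page, 1))  # (0,) (0,)
print(compound_sig(old_page, 2), compound_sig(new_page, 2))  # (0, 0) (0, 6)
```

The random check is only a spot test; the guarantee itself follows from the fact that the powers of α at distinct positions are distinct as long as the page is shorter than 2^f - 1 symbols.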

15 Algebraic Signatures Properties Can update the signature after a simple change. Discovers changes from a cut-and-paste operation.
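The update formula itself is missing from this transcript. One form consistent with a signature defined as the sum of p_i·α^(i-1) is: new signature = old signature + α^r·(old symbol + new symbol), where r is the zero-based position of the change and addition is XOR. The sketch below assumes f = 8, polynomial 0x11D and α = 2; updated_sig is a hypothetical name.

```python
F, PRIM_POLY, ALPHA = 8, 0x11D, 2  # assumptions for this sketch


def gf_mul(a, b):
    """Multiply in GF(2^8) by shift-and-add."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & (1 << F):
            a ^= PRIM_POLY
        b >>= 1
    return r


def gf_pow(a, n):
    """Raise a to the n-th power by repeated multiplication."""
    r = 1
    for _ in range(n):
        r = gf_mul(r, a)
    return r


def sig(page, alpha=ALPHA):
    """Single signature: sum of p_i * alpha^(i-1)."""
    s, power = 0, 1
    for p in page:
        s ^= gf_mul(p, power)
        power = gf_mul(power, alpha)
    return s


def updated_sig(old_sig, pos, old_sym, new_sym, alpha=ALPHA):
    """New signature from the old one plus the location and content of the change."""
    delta = old_sym ^ new_sym                  # difference of the two symbols
    return old_sig ^ gf_mul(gf_pow(alpha, pos), delta)


page = bytes(range(1, 21))
changed = bytearray(page)
changed[7] = 200
# The incremental update agrees with recomputing over the whole page.
print(updated_sig(sig(page), 7, page[7], 200) == sig(bytes(changed)))  # True
```

The update touches only the changed symbol, so its cost does not depend on the page length apart from computing the power of α.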

16 Algebraic Signatures Properties Can calculate the signature of a parity object from the signatures of the data objects. Holds for normal parity (RAID Level 5), but also for some forms of generalized parity: Reed-Solomon codes, convolutional array codes. Thomas Schwarz, S.J.: Verification of Parity Data in Large Scale Storage Systems, PDPTA 2004, Las Vegas.
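For normal XOR parity the property reads: the signature of the parity block equals the XOR of the signatures of the data blocks. A minimal sketch, again assuming f = 8, polynomial 0x11D, α = 2 and hypothetical helper names; generalized parity (Reed-Solomon, array codes) is not covered here.

```python
from functools import reduce

F, PRIM_POLY, ALPHA = 8, 0x11D, 2  # assumptions for this sketch


def gf_mul(a, b):
    """Multiply in GF(2^8) by shift-and-add."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & (1 << F):
            a ^= PRIM_POLY
        b >>= 1
    return r


def sig(page, alpha=ALPHA):
    """Single signature: sum of p_i * alpha^(i-1)."""
    s, power = 0, 1
    for p in page:
        s ^= gf_mul(p, power)
        power = gf_mul(power, alpha)
    return s


def xor_blocks(blocks):
    """RAID-5 style parity: symbol-wise XOR of equally long blocks."""
    return bytes(reduce(lambda x, y: x ^ y, symbols) for symbols in zip(*blocks))


def parity_sig_from_data_sigs(sigs):
    """Signature of the parity block, computed from data signatures alone."""
    return reduce(lambda x, y: x ^ y, sigs)


data = [bytes(range(i, i + 16)) for i in (1, 50, 99)]
parity = xor_blocks(data)
print(sig(parity) == parity_sig_from_data_sigs([sig(d) for d in data]))  # True
```

The equality holds because the signature is a linear function of the page symbols over GF(2^f).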

17 Algebraic Signatures in Large Scale Storage Systems Data-Parity Coherency: If we miss an update to parity data, then we can no longer reconstruct data. [Diagram: D1 D2 D3 D4 D5 P → D1' D2 D3 D4 D5 P → D1' D2 D3 D4 D5 P ?]

18 Protecting Data in a Large Archival Storage System. Disk-Based Archival Storage System Data is cold: Power down disks between accesses. Data on disk storage systems is lost because of: Device Failure. Block Failure. Periodically check whether we can access disks. Periodically check whether we can still read all data on disks.

19 Protecting Data in a Large Archival Storage System. Since we need to read all the data anyway, and since we also need to be concerned about software failures: check the signatures of data.

20 Protecting Data in a Large Archival Storage System. Divide disks into scrubbing blocks. Assume that the redundancy scheme creates generalized parity blocks for scrubbing blocks. Maintain a map of the signatures of the scrubbing blocks. [Figure: disk layout with data scrubbing blocks (D1, D9, D15, D23, D12, D2, D20, D22, …) interleaved with generalized parity blocks (P 2,5,13; P 1,4,7; P 2,25,31; P 9,12,25; P 1,8,22; P 15,3,19).]

21 Protecting Data in a Large Archival Storage System. When data in the scrubbing block is updated, change its signature. This happens rarely. When we scrub, check whether the actual signature of the block coincides with the signature in metadata. If not: Something bad has happened. Typically software error, but occasionally data corruption. Comes at almost no cost. We need to read anyway.
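A scrubbing pass along these lines can be sketched as below. The signature map is modeled as a plain dictionary and the disk as a read_block callable; both, like the function names and the GF(2^8) parameters (polynomial 0x11D, α = 2, m = 2), are assumptions for illustration.

```python
F, PRIM_POLY, ALPHA = 8, 0x11D, 2  # assumptions for this sketch


def gf_mul(a, b):
    """Multiply in GF(2^8) by shift-and-add."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & (1 << F):
            a ^= PRIM_POLY
        b >>= 1
    return r


def compound_sig(page, m=2):
    """Compound signature for alpha, alpha^2, ..., alpha^m."""
    out, a = [], 1
    for _ in range(m):
        a = gf_mul(a, ALPHA)
        s, power = 0, 1
        for p in page:
            s ^= gf_mul(p, power)
            power = gf_mul(power, a)
        out.append(s)
    return tuple(out)


def scrub(read_block, sig_map):
    """Read every scrubbing block and report those whose signature
    no longer matches the signature kept in metadata."""
    suspect = []
    for block_id, stored in sig_map.items():
        if compound_sig(read_block(block_id)) != stored:
            suspect.append(block_id)
    return suspect
```

The signature is computed on data that the scrub reads anyway, which is why the slide describes the check as nearly free.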

22 Protecting Data in a Large Archival Storage System. Periodically check whether parity blocks and data blocks cohere. Access signatures of data blocks. Calculate signature of parity block(s). Compare with actual signature on file.
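The coherence check can run on signature metadata alone, without reading the data blocks. The sketch below assumes plain XOR parity, a dictionary as the signature map, and the same hypothetical GF(2^8) setup (polynomial 0x11D, α = 2, m = 2). It also shows how a missed parity update is detected.

```python
F, PRIM_POLY, ALPHA = 8, 0x11D, 2  # assumptions for this sketch


def gf_mul(a, b):
    """Multiply in GF(2^8) by shift-and-add."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & (1 << F):
            a ^= PRIM_POLY
        b >>= 1
    return r


def compound_sig(page, m=2):
    """Compound signature for alpha, alpha^2, ..., alpha^m."""
    out, a = [], 1
    for _ in range(m):
        a = gf_mul(a, ALPHA)
        s, power = 0, 1
        for p in page:
            s ^= gf_mul(p, power)
            power = gf_mul(power, a)
        out.append(s)
    return tuple(out)


def xor_blocks(blocks):
    """XOR parity over equally long blocks."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, symbol in enumerate(blk):
            out[i] ^= symbol
    return bytes(out)


def coherent(sig_map, data_ids, parity_id):
    """Compute the parity signature from the data signatures and
    compare it with the parity signature on file."""
    expected = (0,) * len(sig_map[parity_id])
    for d in data_ids:
        expected = tuple(x ^ y for x, y in zip(expected, sig_map[d]))
    return expected == sig_map[parity_id]


blocks = {"D1": bytes(range(16)), "D2": bytes(range(16, 32)), "D3": bytes(16)}
blocks["P"] = xor_blocks([blocks["D1"], blocks["D2"], blocks["D3"]])
sig_map = {k: compound_sig(v) for k, v in blocks.items()}
print(coherent(sig_map, ["D1", "D2", "D3"], "P"))  # True

# D1 is updated and its signature refreshed, but the parity update is missed.
blocks["D1"] = b"\xff" + blocks["D1"][1:]
sig_map["D1"] = compound_sig(blocks["D1"])
print(coherent(sig_map, ["D1", "D2", "D3"], "P"))  # False
```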

23 Protecting Data in a Large Archival Storage System. Conclusion Low cost scheme. Protects against data corruption and parity / data incoherence.

