Maximally Recoverable Local Reconstruction Codes

Presentation transcript:

Maximally Recoverable Local Reconstruction Codes
Sivakanth Gopi, Venkatesan Guruswami, Sergey Yekhanin

The Cloud
How big is "The Cloud"?
Distributed storage: ≈ 0.5 zettabytes of data (1 ZB = 10^9 TB) sit in gigantic data centers, ready to be accessed
≈ $100 to rent 1 TB for 1 year; the costs run into ≈ $50 billion per year!
Data is partitioned and stored on individual servers with small storage capacity (a few TB)
Servers don't respond (erasures): a server may be rebooting, behind a network bottleneck, or completely crashed; we will refer to all of these as 'erasures'

Classical erasure coding
(k, k+h)-Reed-Solomon code
Pros:
Can recover from any h erasures
1 + h/k storage cost; k is chosen large to get good efficiency
Field size is linear in the number of servers
Cons:
Need to read from k servers even in the case of a single erasure
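To make the (k, k+h) recovery guarantee concrete, here is a minimal pure-Python sketch of Reed-Solomon-style erasure coding over a small prime field; the Vandermonde encoding matrix, the prime 257, and all parameter values are illustrative choices, not the talk's construction.

```python
P = 257  # a small prime field F_q; illustrative, real systems often use GF(2^8)

def vandermonde(rows, cols, p=P):
    # G[i][j] = (i+1)^j mod p: any `cols` of the rows are invertible,
    # because distinct evaluation points give a nonsingular Vandermonde minor
    return [[pow(i + 1, j, p) for j in range(cols)] for i in range(rows)]

def mat_vec(M, v, p=P):
    return [sum(a * b for a, b in zip(row, v)) % p for row in M]

def solve_mod_p(A, b, p=P):
    # Gauss-Jordan elimination over F_p: recover the message from any k symbols
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] % p)
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], p - 2, p)  # Fermat inverse, valid since p is prime
        M[col] = [x * inv % p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] % p:
                f = M[r][col]
                M[r] = [(x - f * y) % p for x, y in zip(M[r], M[col])]
    return [row[-1] for row in M]

k, h = 4, 2
msg = [10, 20, 30, 40]
G = vandermonde(k + h, k)
codeword = mat_vec(G, msg)      # one symbol per server, k + h servers total
alive = [0, 2, 3, 5]            # servers 1 and 4 erased (any h = 2 can fail)
decoded = solve_mod_p([G[i] for i in alive], [codeword[i] for i in alive])
```

Note the "con" from the slide is visible here: even to recover a single erased symbol, the decoder reads k = 4 surviving servers.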

Locality
To recover an erased server, we only need to read from 'r' other servers [Gopalan, Huang, Simitci, Yekhanin'11]
Example: a code with locality '2' can recover any erased coordinate from 2 others
Locally Decodable Codes [Katz, Trevisan'00] have locality even when a constant fraction of coordinates is erased, but they need a lot of redundancy and are not practical for small-length codes!
Note that locality, not total bandwidth, is the bottleneck when a user requests a small amount of data from a nonresponsive server
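A toy sketch of the locality-2 example: one XOR parity per local group lets us rebuild any single crashed symbol by reading only the 2 other symbols in its group. The values and group size below are made up for illustration.

```python
group = [5, 9]       # r = 2 data symbols in one local group
parity = 0
for x in group:
    parity ^= x      # local parity symbol, stored on a third server

# server holding group[0] crashes: XOR the 2 surviving symbols to rebuild it
recovered = group[1] ^ parity
```

Contrast with Reed-Solomon, where the same single-erasure repair would read k servers regardless of where the crash happened.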

Separate normal and worst case
Efficiency: low storage overhead
Reliability (worst case): should be able to recover from as many erasure patterns as possible; this recovery can be slow
Fast recovery (normal case): locality, i.e. recover crashed data by reading only a few other servers in the case of a single crash
Maximally Recoverable Local Reconstruction Codes (MR LRCs) are designed for this!
They are already deployed in Microsoft distributed storage systems, outperforming traditional Reed-Solomon-based systems

Maximally Recoverable LRCs
a, h are constants; r = n^ε
An (n, r, h, a, q)-MR LRC is a linear code of length n over F_q with:
'h' global parity checks
'a' local parity checks per local group of size 'r'
If there are only 'a' crashes in a local group, the group can be recovered locally
It corrects every erasure pattern that is possible to correct!

Correctable erasure patterns
What are the correctable erasure patterns?
'a' erasures per local group + 'h' additional erasures anywhere
"Beyond minimum distance": the minimum distance is 'a + h'
The minimum distance is achievable with q = O(n) [Tamo, Barg'14]

(n=12, r=6, h=2, a=2, q)-MR LRC

(n=12, r=6, h=2, a=2, q)-MR LRC
Can correct up to a=2 crashes in a local group by reading any r−a=4 other servers in the same group

(n=12, r=6, h=2, a=2, q)-MR LRC
Can correct a=2 erasures per local group + h=2 more anywhere

Correctable erasure patterns
What are the correctable erasure patterns?
'a' erasures per local group + 'h' additional erasures anywhere
Why are these the only correctable patterns, and why do MR LRCs exist?

Parity check view of codes
A (k, n)-code can be specified by the ℓ = n − k linearly independent parity-check equations that its codewords satisfy
An ℓ-erasure pattern is correctable if the corresponding ℓ columns of the parity-check matrix are full rank, i.e. the corresponding ℓ×ℓ minor is non-zero
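The full-rank criterion above is easy to check mechanically. Below is a minimal sketch: an erasure pattern is correctable exactly when the erased columns of the parity-check matrix H are linearly independent over F_q. The prime 101 and the single-parity-check example are illustrative, not from the talk.

```python
P = 101  # illustrative prime field

def rank_mod_p(M, p=P):
    """Row-reduce M over F_p and return its rank."""
    M = [row[:] for row in M]
    rank = 0
    for c in range(len(M[0]) if M else 0):
        piv = next((r for r in range(rank, len(M)) if M[r][c] % p), None)
        if piv is None:
            continue  # no pivot in this column
        M[rank], M[piv] = M[piv], M[rank]
        inv = pow(M[rank][c], p - 2, p)  # Fermat inverse, p prime
        M[rank] = [x * inv % p for x in M[rank]]
        for r in range(len(M)):
            if r != rank and M[r][c] % p:
                f = M[r][c]
                M[r] = [(x - f * y) % p for x, y in zip(M[r], M[rank])]
        rank += 1
    return rank

def correctable(H, erased, p=P):
    # the erased columns must have full column rank
    sub = [[row[j] for j in erased] for row in H]
    return rank_mod_p(sub, p) == len(erased)

# single parity check across 4 coordinates: any 1 erasure is correctable,
# but any 2 erasures hit a rank-deficient pair of columns
H = [[1, 1, 1, 1]]
```

For instance, `correctable(H, [2])` holds while `correctable(H, [0, 3])` does not, matching the ℓ×ℓ-minor view on the slide.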

MR LRC parity check matrix
Some minors are identically zero (trivial minors): uncorrectable
Some minors are non-zero (non-trivial minors): correctable

Non-trivial minors
What are the non-trivial minors?
'a' columns in each local group + 'h' more columns
Is there an assignment of F_q values to the ∗'s such that all non-trivial minors are non-zero?

Does random work?
All non-trivial minors should be non-zero
Assign the ∗'s randomly from F_q
By Schwartz-Zippel + a union bound, all the required minors are non-zero with high probability if q ≫ n^{an/r + h}
Large fields make encoding/decoding extremely slow!
Ideally we want q = O(n), like Reed-Solomon codes, or polynomial in n
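The random construction can be sanity-checked on a toy instance: fill the ∗'s of the global row with random field elements and verify that every pattern of a erasures per group plus h extra erasures is correctable. The parameters (n=6, r=3, a=1, h=1, q=10007) and the all-ones local checks are illustrative assumptions, not the talk's.

```python
import itertools
import random

P = 10007  # prime; Schwartz-Zippel says random entries work w.h.p. for large q

def rank_mod_p(M, p=P):
    # Gaussian elimination over F_p
    M = [row[:] for row in M]
    rank = 0
    for c in range(len(M[0]) if M else 0):
        piv = next((r for r in range(rank, len(M)) if M[r][c] % p), None)
        if piv is None:
            continue
        M[rank], M[piv] = M[piv], M[rank]
        inv = pow(M[rank][c], p - 2, p)
        M[rank] = [x * inv % p for x in M[rank]]
        for r2 in range(len(M)):
            if r2 != rank and M[r2][c] % p:
                f = M[r2][c]
                M[r2] = [(x - f * y) % p for x, y in zip(M[r2], M[rank])]
        rank += 1
    return rank

n, r, a, h = 6, 3, 1, 1
g = n // r
# one local parity row per group (all-ones on that group's coordinates)...
H = [[1 if gi * r <= j < (gi + 1) * r else 0 for j in range(n)] for gi in range(g)]
# ...plus h=1 global row: the ∗'s, filled with distinct random field elements
random.seed(0)
H.append(random.sample(range(1, P), n))

# every pattern with a=1 erasure in each group + h=1 extra must be correctable
all_ok = True
for j1, j2 in itertools.product(range(0, r), range(r, 2 * r)):
    for extra in range(n):
        if extra in (j1, j2):
            continue
        cols = sorted({j1, j2, extra})
        sub = [[row[j] for j in cols] for row in H]
        if rank_mod_p(sub) != len(cols):
            all_ok = False
```

Sampling the global row without repetition keeps the toy check deterministic; the slide's point is that plain uniform sampling already succeeds w.h.p. once q is huge, which is exactly what makes encoding/decoding slow.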

A general class of questions
Given a matrix with 0's and ∗'s, what is the smallest q such that there is an assignment of F_q-values to the ∗'s which makes all the non-trivial minors non-zero?
GM-MDS: If all the k×k minors are non-trivial, one can take q = O(n)
Proved by Lovett'18 and independently by Yildiz and Hassibi'18
If the n coordinates are arranged in a matrix with row checks, column checks, and one global parity check, then one needs q ≥ exp(√n) [Kane, Lovett, Rao'17]

Field size
What is the smallest field size q such that (n, r, h, a, q)-MR LRCs exist?
There are explicit constructions achieving q = n^{O(ah)} [GHJY'14, GYBS'17]
Can we get q = O(n)?

Our results
Lower Bound: when g ≥ h, the lower bound is q = Ω_{a,h}(n · r^{min(a, h−2)})
This is the first super-linear lower bound when r is growing
When a ≈ h and r = n^{Ω(1)}, the lower bound is n^{Ω(h)} and the best upper bounds are n^{O(h^2)}
Lower Bound (general form): an (n, r, h, a, q)-MR LRC with g = n/r local groups needs q = Ω_{a,h}(n · r^α) where α = min(a, h − 2⌈h/g⌉)/⌈h/g⌉

Some upper bounds
Practical deployments of MR LRCs typically have h = 2, 3
Lower bound: q = Ω_{a,h}(n · r^{min(a, h−2)})
When h = 2, the lower bound is Ω(n); when h = 3, the lower bound is Ω(nr)
Thm: There exist (n, r, h=2, a, q)-MR LRCs with q = O(n) for every a, r
Thm: There exist (n, r, h=3, a, q)-MR LRCs with q = O(n^3) for every a, r

Proof sketch of Lower Bound

Lower bound
An (n, r, h, a, q)-MR LRC needs q = Ω_{a,h}(n · r^{min(a, h−2)})
Special case: an (n, r, h=3, a=1, q)-MR LRC needs q = Ω(n·r)

Parity check matrix One column per local group + any 3 more columns are linearly independent

Two claims
Define V_i = {v_i^1, v_i^2, …, v_i^r} ⊂ F_q^3
Define H_i ⊂ F_q^3 as the difference set H_i = V_i − V_i = {v_i^j − v_i^k : j, k ∈ [r], j < k}
Claim 1: No two points in H_i are multiples of each other
So we can think of H_i as a subset of the projective plane PF_q^2, with |H_i| = C(r, 2)
Claim 2: If a ∈ H_i, b ∈ H_j, c ∈ H_k where i, j, k are distinct, then a, b, c are linearly independent
Viewed as points in PF_q^2, a, b, c are non-collinear
This also implies that the H_i's are mutually disjoint

Proof of Claim 2
We want to show that a ∈ H_i, b ∈ H_j, c ∈ H_k are linearly independent
Say a = v_i^2 − v_i^1, b = v_j^4 − v_j^3, c = v_k^6 − v_k^5
Recall: one column per local group + any 3 more columns are linearly independent
Claim 1 can be proved similarly, by making 4 erasures in a single group

Claims 1 and 2
We have disjoint subsets H_1, H_2, …, H_{n/r} of the plane PF_q^2, each of size ≈ r^2, such that any three points in different subsets are non-collinear
We want to show that q = Ω(nr)
Suppose q ≪ nr; we will show that a random line L intersects 3 of the sets H_1, …, H_{n/r} w.h.p., giving three collinear points in different subsets, a contradiction

Final step*
Suppose q ≪ nr; we will show that a random line L intersects 3 of the sets H_1, …, H_{n/r} w.h.p.
Let Z_i = |L ∩ H_i|
E[Z_i] ≈ |H_i|/q and E[Z_i^2] ≈ |H_i|/q + |H_i|^2/q^2
Pr[Z_i > 0] ≥ E[Z_i]^2 / E[Z_i^2] ≈ |H_i|/q
So L intersects (|H_1| + … + |H_{n/r}|)/q ≈ (n/r)·r^2/q = nr/q ≫ 1 of the sets H_1, …, H_{n/r} in expectation
*Simplified based on a suggestion of Madhu Sudan
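The second-moment step above is the standard Paley-Zygmund inequality; written out as one chain, with ≈ hiding constant factors and assuming |H_i| ≤ q:

```latex
\Pr[Z_i > 0] \;\ge\; \frac{\mathbb{E}[Z_i]^2}{\mathbb{E}[Z_i^2]}
\;\approx\; \frac{(|H_i|/q)^2}{|H_i|/q + |H_i|^2/q^2}
\;\approx\; \frac{|H_i|}{q},
\qquad
\sum_{i=1}^{n/r} \Pr[Z_i > 0] \;\approx\; \frac{n}{r}\cdot\frac{r^2}{q}
\;=\; \frac{nr}{q} \;\gg\; 1 \quad\text{when } q \ll nr .
```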

Upper Bounds

A determinantal identity

Open Questions, a lot of them!
For fixed constants r, h, a, can we get (n, r, h, a, q)-MR LRCs with q = O_{r,h,a}(n)?
The lower bound we show is q ≥ Ω_{h,a}(n · r^{min(h−2, a)})
The upper and lower bounds are still pretty far apart when a ≥ h − 2 are constants:
Upper bound: q = n^{O(ah)}
Lower bound: q = n · r^{h−2}
Thank you!