Alex Dimakis, based on collaborations with Dimitris Papailiopoulos, Viveck Cadambe and Kannan Ramchandran. USC. Tutorial on Distributed Storage Problems and Regenerating Codes.



Overview. Storing information using codes. The repair problem. Exact repair: the state of the art. The role of interference alignment. Simple regenerating codes. Future directions: security through coding.

Massive distributed data storage. Numerous disk failures per day: failures are the norm rather than the exception. Must introduce redundancy for reliability. Replication or erasure coding?

How to store using erasure codes. File or data object: A, B (k=2). With n=3: A, B, A+B is a (3,2) MDS code (single parity), used in RAID 5. With n=4: A, B, A+B, A+2B is a (4,2) MDS code, tolerates any 2 failures, used in RAID 6.
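A minimal sketch of the (4,2) code above, assuming arithmetic modulo the prime 257 (the slide does not name a field; any prime of at least 3 keeps every pair of rows independent). The names `P`, `GENERATOR`, `encode` and `decode` are illustrative only.

```python
from itertools import combinations

P = 257  # assumed prime field size, not taken from the slides

# Node i stores g[0]*A + g[1]*B (mod P): A, B, A+B, A+2B.
GENERATOR = [(1, 0), (0, 1), (1, 1), (1, 2)]


def encode(a, b):
    """Return the four stored symbols for the data pair (A, B)."""
    return [(ga * a + gb * b) % P for ga, gb in GENERATOR]


def decode(i, yi, j, yj):
    """Recover (A, B) from the symbols held by any two distinct nodes i and j."""
    (p, q), (r, s) = GENERATOR[i], GENERATOR[j]
    det = (p * s - q * r) % P
    inv = pow(det, -1, P)  # modular inverse, Python 3.8+
    a = ((s * yi - q * yj) * inv) % P
    b = ((p * yj - r * yi) * inv) % P
    return a, b


stored = encode(42, 200)
# Try every pair of surviving nodes; all of them should give back (42, 200).
recovered = {
    decode(i, stored[i], j, stored[j]) for i, j in combinations(range(4), 2)
}
```

Solving a 2x2 system per pair is enough here because k=2; larger codes would need a general linear solve over the field.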

Erasure codes are reliable. File or data object: A, B. (4,2) MDS erasure code (any 2 suffice to recover): A, B, A+B, A+2B. Versus replication: A, A, B, B.

Storing with an (n,k) code. An (n,k) erasure code provides a way to take k packets and generate n packets of the same size such that any k out of n suffice to reconstruct the original k. Optimal reliability for that given redundancy. Well known and used frequently, e.g. Reed-Solomon codes, array codes, LDPC and Turbo codes. Assume that each packet is stored at a different node, distributed in a network.

How much redundancy is there in current systems? Most distributed storage systems use replication. Gmail uses 21x replication(!). Some companies are investigating or using Reed-Solomon and other codes (e.g. NetApp, IBM, Google, MSR, Cleversafe).

The promise: coding is much more reliable. A 1 GB file stored as 21 copies uses 21 GB. With a (33,10) code the file is split into 10 packets of 0.1 GB and 33 encoded packets are stored: 33 × 0.1 = 3.3 GB. Replication uses roughly six times the storage (21/3.3 ≈ 6.4) for the same reliability.

Coding + storage networks = new open problems. Issues: communication; update complexity; repair communication (network traffic); repair bits read; number of nodes accessed for repair (d).

(4,2) MDS codes: EVENODD. M. Blaum and J. Bruck (IEEE Trans. Comp., Vol. 44, Feb 95). k=2, n=4, binary MDS code used in RAID systems. Total data object size = 4 GB. Node 1: a, b. Node 2: c, d. Node 3: a+c, b+d. Node 4: b+c, a+b+d.

We can reconstruct after any 2 failures. Stored blocks (1 GB each): a, b | c, d | a+c, b+d | b+c, a+b+d. For example, from the nodes holding (a, b) and (a+c, b+d): c = a + (a+c), d = b + (b+d).
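The "any 2 failures" claim can be checked mechanically. The sketch below assumes the blocks are grouped as node 1 = (a, b), node 2 = (c, d), node 3 = (a+c, b+d), node 4 = (b+c, a+b+d), and that "+" means XOR. Each block is written as a bitmask over (a, b, c, d), and the code tests that every pair of nodes has rank 4 over GF(2). `gf2_rank` is an illustrative helper, not something from the talk.

```python
from itertools import combinations

# Bitmask encoding of the data blocks: bit 0 = a, bit 1 = b, bit 2 = c, bit 3 = d.
A, B, C, D = 1, 2, 4, 8

NODES = [
    [A, B],               # node 1: a, b
    [C, D],               # node 2: c, d
    [A ^ C, B ^ D],       # node 3: a+c, b+d
    [B ^ C, A ^ B ^ D],   # node 4: b+c, a+b+d
]


def gf2_rank(vectors):
    """Rank over GF(2) of vectors given as integer bitmasks."""
    pivots = {}  # highest set bit -> basis vector with that leading bit
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]  # cancel the leading bit and keep reducing
    return len(pivots)


# The four data blocks are recoverable from a pair of nodes iff the
# four stored combinations are linearly independent.
pair_ranks = {
    (i + 1, j + 1): gf2_rank(NODES[i] + NODES[j])
    for i, j in combinations(range(4), 2)
}
any_two_suffice = all(rank == 4 for rank in pair_ranks.values())
```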


The repair problem. Ok, great, we can tolerate n-k disk failures without losing data. If we have 1 failure, however, how do we rebuild the redundancy in a new disk e? Naïve repair: send k blocks. File size B, B/k per block.

The repair problem (continued). Do I need to reconstruct the whole data object to repair one failure?

The repair problem (continued). Functional repair: the new block e can be different from the lost block a, as long as the any-k-out-of-n reliability property is maintained. Exact repair: e is exactly equal to a.

The repair problem (continued). Theorem: it is possible to functionally repair a code by communicating only [expression lost in extraction] bits, as opposed to the naïve repair cost of B bits (regenerating codes).
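The expression in the theorem did not survive in the transcript. A plausible reading, assuming it is the minimum-storage point of the regenerating-codes tradeoff with d = n-1 helpers, is (n-1)B / (k(n-k)); this matches the 3 GB figure in the (4,2), 4 GB example elsewhere in the deck. The function names below are illustrative, and the formula is stated as that assumption.

```python
from fractions import Fraction


def naive_repair_cost(B, n, k):
    """Naive repair downloads k blocks of size B/k, i.e. the whole file."""
    return Fraction(B)


def msr_repair_cost(B, n, k):
    """Assumed bound: minimum-storage regenerating point with d = n - 1 helpers."""
    return Fraction(B) * (n - 1) / (k * (n - k))


# (n, k) = (4, 2) with a 4 GB object: 3 GB instead of 4 GB.
example = (naive_repair_cost(4, 4, 2), msr_repair_cost(4, 4, 2))
```

The saving grows with n-k: for (n, k) = (14, 10) the same formula gives 13B/40, about a third of the file.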

Exact repair with 3 GB. The node holding a, b is lost. Download d, b+d and a+b+d (1 GB each): a = (b+d) + (a+b+d), b = d + (b+d).

Systematic repair with 1.5GB. a = (b+d) + (a+b+d), b = d + (b+d). Reconstructing all the data: 4 GB. Repairing a single node: 3 GB. The 3 equations were aligned, solvable for a, b.

Repairing the last node (b+c, a+b+d): b+c = (c+d) + (b+d), a+b+d = a + (b+d).
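Both repairs can be replayed on random data. This sketch assumes "+" is bytewise XOR and that the nodes hold (a, b), (c, d), (a+c, b+d) and (b+c, a+b+d); 16-byte blocks stand in for the 1 GB blocks. Note that in the second repair the node holding (c, d) sends the combination c+d, so each helper still sends a single block.

```python
import os


def xor(p, q):
    """Bytewise XOR of two equal-length blocks."""
    return bytes(u ^ v for u, v in zip(p, q))


BLOCK = 16  # bytes per block, a stand-in for 1 GB
a, b, c, d = (os.urandom(BLOCK) for _ in range(4))

node1 = (a, b)
node2 = (c, d)
node3 = (xor(a, c), xor(b, d))
node4 = (xor(b, c), xor(xor(a, b), d))

# Repair node 1: download d, b+d and a+b+d (three blocks in total).
got_d, got_bd, got_abd = node2[1], node3[1], node4[1]
node1_rebuilt = (xor(got_bd, got_abd),   # a = (b+d) + (a+b+d)
                 xor(got_d, got_bd))     # b = d + (b+d)

# Repair node 4: download a, c+d (computed by node 2) and b+d.
got_a, got_cd, got_bd2 = node1[0], xor(node2[0], node2[1]), node3[1]
node4_rebuilt = (xor(got_cd, got_bd2),   # b+c = (c+d) + (b+d)
                 xor(got_a, got_bd2))    # a+b+d = a + (b+d)
```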

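Proof sketch: information flow graph. [Figure: source S; storage nodes a, b, c, d, each storing α = 2 GB; newcomer e downloads β from each surviving node; the data collector connects over infinite-capacity edges.] Min cut: 2 + 2β ≥ 4 GB, so β ≥ 1 GB. Total repair communication ≥ 3 GB.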

Proof sketch: reduction to multicasting. [Figure: information flow graph with source S, nodes a, b, c, d, newcomer e and several data collectors.] Repairing a code = multicasting on the information flow graph. Sufficient iff the minimum of the min cuts is at least the file size M (Ahlswede et al., Koetter & Médard, Ho et al.).

The infinite graph for repair. [Figure: information flow graph with nodes x1, x2, …, xn, each of storage capacity α; every repair downloads β from each of d nodes; each data collector connects to k nodes.]

Storage-communication tradeoff. Theorem 3: for any (n,k) code where each node stores α bits and a repair contacts d existing nodes and downloads dβ = γ bits in total, the feasible region of (α, γ) pairs is bounded by a piecewise linear function.
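The formula that followed the theorem is missing from the transcript. As far as it can be reconstructed from the cited IT Transactions paper, the feasibility condition and the two end points of the tradeoff are the following (M is the file size; this is a reconstruction, not the slide's own typesetting):

```latex
% Feasibility: the min cut to any data collector must carry the file.
\sum_{i=0}^{k-1} \min\{\alpha,\ (d-i)\beta\} \;\ge\; M, \qquad \gamma = d\beta .

% Minimum-storage (MSR) and minimum-bandwidth (MBR) end points.
(\alpha_{\mathrm{MSR}},\ \gamma_{\mathrm{MSR}}) =
  \left(\frac{M}{k},\ \frac{M d}{k\,(d-k+1)}\right),
\qquad
(\alpha_{\mathrm{MBR}},\ \gamma_{\mathrm{MBR}}) =
  \left(\frac{2 M d}{2kd-k^{2}+k},\ \frac{2 M d}{2kd-k^{2}+k}\right).
```

For the running example (M = 4 GB, k = 2, d = 3, α = 2 GB) the condition reads 2 + 2β ≥ 4, i.e. β ≥ 1 GB and γ ≥ 3 GB.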

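Storage-communication tradeoff. [Figure: tradeoff curve between storage per node α and repair bandwidth γ = βd; its end points are the min-storage regenerating code and the min-bandwidth regenerating code.] (Dimakis, Godfrey, Wu, Wainwright, Ramchandran, IT Transactions (2010))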
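A small sketch for exploring the tradeoff numerically. It assumes the cut-set condition from the cited paper, sum over i = 0..k-1 of min(α, (d-i)β) ≥ M, and the standard closed forms for the two end points; the function names are illustrative.

```python
from fractions import Fraction


def min_cut(alpha, beta, k, d):
    """Smallest cut separating the source from a data collector that reads k nodes."""
    return sum(min(alpha, (d - i) * beta) for i in range(k))


def msr_point(M, k, d):
    """(alpha, gamma) at the minimum-storage end of the tradeoff."""
    M = Fraction(M)
    return M / k, M * d / (k * (d - k + 1))


def mbr_point(M, k, d):
    """(alpha, gamma) at the minimum-bandwidth end of the tradeoff."""
    gamma = Fraction(M) * 2 * d / (2 * k * d - k * k + k)
    return gamma, gamma


def feasible(alpha, gamma, M, k, d):
    """True if (alpha, gamma) satisfies the cut-set condition for file size M."""
    return min_cut(alpha, Fraction(gamma) / d, k, d) >= M


M, k, d = 4, 2, 3                  # the 4 GB, (4,2) example with d = n - 1 = 3
msr = msr_point(M, k, d)           # storage 2 GB per node, 3 GB repair traffic
mbr = mbr_point(M, k, d)           # storage and repair traffic both 2.4 GB
```

Both end points sit exactly on the boundary: lowering γ at either one makes `feasible` return False.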


Key problem: exact repair (e = a). From Theorem 1, an (n,k) MDS code can be repaired by communicating [expression lost in extraction]. What if we require perfect reconstruction?

Repair vs exact repair. [Figure: information flow graph with nodes x1, x2, …, xn of capacity α, repair edges of capacity β from d nodes, data collectors reading k nodes, and a newcomer that must reproduce x1.] Functional repair = multicasting. Exact repair = multicasting with intermediate nodes having (overlapping) requests. The cut-set region might not be achievable. Linear codes might not suffice (Dougherty et al.).

Exact storage-communication tradeoff? [Figure: tradeoff curve of α against γ = βd.] Is exact repair feasible on it?

What is known about exact repair. For (n, k=2), E-MSR repair can match the cut-set bound [Wu, Dimakis, ISIT'09]. An (n=5, k=3) E-MSR systematic code exists (Cullina, Dimakis, Ho, Allerton'09). For k/n <= 1/2, E-MSR repair can match the cut-set bound [Rashmi, Shah, Kumar, Ramchandran (2010)]. E-MBR for all n, k, for d=n-1, matches the cut-set bound [Suh, Ramchandran (2010)].

What is known about exact repair: what can be done for high rates? Recently the symbol extension technique (Cadambe, Jafar, Maleki, and independently Suh, Ramchandran) was shown to approach the cut-set bound for E-MSR, for all (k,n,d). (However, it requires enormous field size and sub-packetization.) This shows that linear codes suffice to approach the cut-set region for exact repair, for the whole range of parameters. Tamo et al., Papailiopoulos et al. and Cadambe et al. are presenting the first constructions of high-rate exact regenerating codes at ISIT.

Exact storage-communication tradeoff? [Figure: tradeoff curve of α against γ = βd.] E-MSR point: min-storage regenerating code (no known practical codes for high rates). E-MBR point: min-bandwidth regenerating code (practical).


Interference alignment. Imagine getting three linear equations in four variables. In general none of the variables is recoverable (only a subspace). A1 + 2A2 + B1 + B2 = y1; 2A1 + A2 + B1 + B2 = y2; B1 + B2 = y3. Here the coefficients of some variables lie in a lower-dimensional subspace and can be canceled out. How to form codes that have multiple alignments at the same time?
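In the three equations above, B1 and B2 always appear as the same combination B1 + B2, so they occupy a single dimension and can be subtracted out, leaving two equations in A1 and A2. A minimal sketch over the rationals (the slides work over a finite field; exact fractions are used here only to keep the example short, and `recover_a` is an illustrative name):

```python
from fractions import Fraction


def recover_a(y1, y2, y3):
    """Solve for (A1, A2) given y1, y2, y3 from the aligned system."""
    r1 = Fraction(y1) - y3   # A1 + 2*A2, interference removed
    r2 = Fraction(y2) - y3   # 2*A1 + A2, interference removed
    a1 = (2 * r2 - r1) / 3
    a2 = (2 * r1 - r2) / 3
    return a1, a2


def observe(a1, a2, b1, b2):
    """The three received equations for given unknowns."""
    return (a1 + 2 * a2 + b1 + b2, 2 * a1 + a2 + b1 + b2, b1 + b2)


# B1 and B2 stay unknown individually, yet A1 and A2 come out exactly.
decoded = recover_a(*observe(5, 7, 11, -3))
```

Any other (B1, B2) with the same sum gives identical observations, which is why only the A variables are recoverable.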

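Exact repair: (4,2) example (Wu and Dimakis, ISIT 2009). Node 1: x1, x2 (lost, to be repaired). Node 2: x3, x4. Node 3: x1+x3, x2+x4. Node 4: x1+2x3, 2x2+3x4. [Figure: the three combinations downloaded for repair; their coefficients are garbled in the transcript.]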

Connecting storage and wireless. Storage: given an error-correcting code, find the repair coefficients that reduce communication (over a field). Wireless: given some channel matrices, find the beamforming matrices that maximize the DoF (Cadambe and Jafar; Suh and Tse). Both problems reduce to rank minimization subject to full-rank constraints. There is a polynomial reduction from one to the other (Papailiopoulos & Dimakis, Asilomar 2010).

Storage codes through alignment techniques. The symbol extension alignment technique of [Cadambe and Jafar] leads to exact regenerating codes. Exact repair is a non-multicast problem where the cut-set region is achievable but needs alignment. It is an improbable match made in heaven (unfortunately not practical). Should ergodic alignment have a storage code equivalent? Does real alignment have a finite-field equivalent?


Simple regenerating codes. The file is separated into m blocks. An MDS code produces T blocks. Each coded block is stored in r nodes. Each of the n storage nodes stores d coded blocks. The placement is the adjacency matrix of an expander graph: every k right nodes (storage nodes) are adjacent to m left nodes (coded blocks).

The ring code: n=5, k=3. Any 3 nodes must suffice to recover the data. Set x5 = x1 + x2 + x3 + x4.

The ring code: n=5, k=3. Any 3 nodes know m=4 packets. An MDS code produces T=5 blocks. Each coded block is stored in r=2 nodes.

The ring code: m=4, n=5. An MDS code produces T blocks.

Simple regenerating codes. Claim 1: this code has the (n,k) recovery property.

Simple regenerating codes. Claim 1: this code has the (n,k) recovery property. Choose any k right nodes: they must know m left nodes, and any m blocks of the MDS code suffice to recover the file.

Simple regenerating codes. Claim 2: I can do easy lookup repair [Rashmi et al. 2010, El Rouayheb & Ramchandran 2010]. When a node fails, d packets are lost, but each packet is replicated r times: find a copy in another node.


The ring code: lookup repair. n=5, k=3. Node 1 fails: just read from d=2 other nodes. The total disk IO is proportional to d, so minimizing d minimizes disk IO.
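A toy version of the ring code with n=5, k=3, m=4. It assumes that "+" is XOR, that block e is placed on ring edge e (so it lives on nodes e and e+1 mod 5), and that blocks are small integers; the function names are illustrative. It checks that any 3 nodes recover the data and that a failed node is rebuilt by copying from its 2 neighbours.

```python
from itertools import combinations

N = 5  # storage nodes on the ring; T = 5 coded blocks, m = 4 data blocks


def encode(x1, x2, x3, x4):
    """Single-parity (5,4) MDS code: x5 = x1 + x2 + x3 + x4 (XOR)."""
    return [x1, x2, x3, x4, x1 ^ x2 ^ x3 ^ x4]


def place(blocks):
    """Store block e on nodes e and (e + 1) % N, i.e. r = 2 copies."""
    nodes = [dict() for _ in range(N)]
    for e, blk in enumerate(blocks):
        nodes[e][e] = blk
        nodes[(e + 1) % N][e] = blk
    return nodes


def recover(nodes, alive):
    """Rebuild x1..x4 from the nodes listed in `alive`."""
    known = {}
    for i in alive:
        known.update(nodes[i])
    if len(known) < N - 1:
        raise ValueError("fewer than m = 4 distinct blocks available")
    if len(known) == N - 1:
        # All five blocks XOR to zero, so the missing one is the XOR of the rest.
        missing = (set(range(N)) - set(known)).pop()
        value = 0
        for blk in known.values():
            value ^= blk
        known[missing] = value
    return [known[e] for e in range(N - 1)]


def lookup_repair(nodes, failed):
    """Copy each lost block from the neighbour sharing that ring edge."""
    left, right = (failed - 1) % N, (failed + 1) % N
    return {left: nodes[left][left], failed: nodes[right][failed]}


data = [3, 14, 15, 92]
nodes = place(encode(*data))
all_trios_ok = all(
    recover(nodes, trio) == data for trio in combinations(range(N), 3)
)
repaired = lookup_repair(nodes, 0)
```

No decoding happens during repair: the two lost blocks are read verbatim, one from each neighbour.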

Simple regenerating codes. Great. Now everything depends on which graph I use and how much expansion it has.

Simple regenerating codes. Rashmi et al. used the edge-vertex bipartite graph of the complete graph. Vertices = storage nodes. Edges = coded packets. d=n-1, r=2. Expansion: every k nodes are adjacent to m = kd - (k choose 2) edges. Remarkably, this matches the cut-set bound for the E-MBR point.
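Why the expansion matches the E-MBR point can be seen in two lines, assuming the cut-set condition of the storage-communication tradeoff and measuring everything in packets (β = 1 packet per helper, so each node stores α = dβ = d packets):

```latex
% At the MBR point \alpha = d\beta, so every term of the cut is (d-i)\beta:
M_{\mathrm{MBR}}
  = \sum_{i=0}^{k-1} \min\{\alpha,\ (d-i)\beta\}
  = \sum_{i=0}^{k-1} (d-i)\,\beta
  = \Bigl(kd - \tbinom{k}{2}\Bigr)\beta .

% In the complete graph, k vertices touch kd edge endpoints and the
% \binom{k}{2} edges inside the chosen set are counted twice:
m = kd - \binom{k}{2}, \qquad d = n-1 .
```

With β = 1 packet the two expressions coincide, so the m distinct packets seen by any k nodes carry exactly the largest file size the bound allows.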

Simple regenerating codes. In cloud storage practice the number of nodes (d) is more important than the number of bits read or transferred. Lookup repair is great. The ring code has the smallest d=2. If we wanted to repair from ANY d nodes, we could not make d smaller than k.

Two excellent expanders to try at home. The Petersen graph: n=10, T=15 edges. Every k=7 nodes are adjacent to m=13 (or more) edges, i.e. left nodes. The ring: n vertices and n edges. Maximum girth. Minimizes d, which is important for some applications.
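The expansion figures quoted for these graphs are small enough to verify by brute force. The sketch below counts, for every choice of k vertices, how many edges touch at least one chosen vertex, and reports the minimum. The graph constructors and `min_edges_covered` are illustrative helpers; the Petersen graph is built in the usual way as an outer 5-cycle, five spokes and an inner pentagram.

```python
from itertools import combinations


def min_edges_covered(edges, n, k):
    """Smallest number of edges touched by any k of the n vertices."""
    return min(
        sum(1 for u, v in edges if u in chosen or v in chosen)
        for chosen in map(set, combinations(range(n), k))
    )


def ring(n):
    """Cycle on n vertices: n edges, every vertex has degree 2."""
    return [(i, (i + 1) % n) for i in range(n)]


def complete(n):
    """Complete graph on n vertices: every vertex has degree n - 1."""
    return list(combinations(range(n), 2))


def petersen():
    """Petersen graph: 10 vertices, 15 edges, 3-regular."""
    outer = [(i, (i + 1) % 5) for i in range(5)]
    spokes = [(i, i + 5) for i in range(5)]
    inner = [(5 + i, 5 + (i + 2) % 5) for i in range(5)]
    return outer + spokes + inner


petersen_m = min_edges_covered(petersen(), 10, 7)   # claimed: at least 13
ring5_m = min_edges_covered(ring(5), 5, 3)          # ring code with n=5, k=3
ring22_m = min_edges_covered(ring(22), 22, 19)      # ring with n=22, k=19
complete5_m = min_edges_covered(complete(5), 5, 3)  # kd - C(k,2) with d=4
```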

Example ring RC. Every k nodes are adjacent to at least k+1 edges. Example: pick k=19, n=22 and use a ring of 22 nodes. The file is separated into m=20 blocks. An MDS code produces T blocks. Each coded block is stored in r=2 nodes. Each of the n=22 storage nodes stores d coded blocks.

Ring RC vs RS, k=19, n=22.

Ring RC: assume B = 20 MB. Each node stores d=2 packets, α = 2 MB. Total storage = 44 MB, so 1/rate = 44/20 = 2.2 storage overhead. Can tolerate 3 node failures. For one failure, d=2 surviving nodes are used for exact repair. Communication to repair γ = 2 MB. Disk IO to repair = 2 MB.

Reed-Solomon with naïve repair: assume B = 20 MB. Each node stores α = 20 MB / 19 ≈ 1.05 MB. Total storage ≈ 23.2 MB, so 1/rate = 22/19 ≈ 1.16 storage overhead. Can tolerate 3 node failures. For one failure, d=19 surviving nodes are used for exact repair. Communication to repair γ = 19 × 20/19 = 20 MB. Disk IO to repair = 20 MB.

Roughly double the storage, 10 times fewer resources to repair.
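The comparison can be reproduced for other parameters with a few lines. This is a sketch under the assumptions used in the example: the ring code splits the file into m = k+1 blocks and stores 2 blocks per node, and naive Reed-Solomon repair reads one block of size B/k from each of k nodes. All names are illustrative.

```python
from fractions import Fraction


def ring_rc(B, n, k):
    """Costs of a ring regenerating code storing two blocks of size B/(k+1) per node."""
    block = Fraction(B, k + 1)
    return {
        "alpha": 2 * block,
        "total_storage": 2 * n * block,
        "overhead": Fraction(2 * n, k + 1),
        "repair_nodes": 2,
        "repair_traffic": 2 * block,
    }


def rs_naive(B, n, k):
    """Costs of an (n,k) Reed-Solomon code repaired by decoding from k nodes."""
    block = Fraction(B, k)
    return {
        "alpha": block,
        "total_storage": n * block,
        "overhead": Fraction(n, k),
        "repair_nodes": k,
        "repair_traffic": k * block,
    }


rc = ring_rc(20, 22, 19)   # B = 20 MB
rs = rs_naive(20, 22, 19)
storage_ratio = rc["total_storage"] / rs["total_storage"]   # 19/10
traffic_ratio = rs["repair_traffic"] / rc["repair_traffic"]  # 10
```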

How to get high rate? In cloud storage practice the number of nodes (d) is more important than the number of bits read or transferred. Lookup repair is great. We need high rate = low storage overhead. There is no fractional repetition code or MBR code that has true rate above 1/2.

Extending fractional repetition. Lookup repair allows very easy uncoded repair and modular designs. Random matrices and Steiner systems were proposed by [El Rouayheb et al.]. Note that for d < n-1 it is possible to beat the previous E-MBR bound. This is because lookup repair does not require every set of d surviving nodes to suffice to repair. The E-MBR region for lookup repair remains open. r ≥ 2, since two copies of each packet are required for easy repair. In practice higher rates are desirable for cloud storage. This corresponds to a repetition code! Let's replace it with a sparse intermediate code.

Simple regenerating codes with an intermediate code. The file is separated into m blocks. A code (possibly an MDS code) produces T blocks. Each coded block is stored in r=1.5 nodes. Each of the n storage nodes stores d coded blocks. The placement is the adjacency matrix of an expander graph: every k right nodes are adjacent to m left nodes. [Figure: some stored blocks are XORs (+) of coded blocks.]

Simple regenerating codes. Claim: I can still do easy lookup repair when d packets are lost. [Figure: the lost packets are rebuilt through the XOR (+) nodes.]

Simple regenerating codes (SRC). Claim: I can still do easy lookup repair when d packets are lost, with 2d disk IO and communication [Papailiopoulos et al., to be submitted].

High rate SRCs

Simple regenerating codes. If XORs (forks) of degree 2 are used, these SRCs can have true rate approaching 2/3. A rate k/n → f/(f+1) can be achieved with higher-degree XORs, but this requires more nodes to be accessed. We think this is the minimal d for lookup repair.
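The slides do not spell out a concrete layout for the degree-2 case. One layout, assumed here from the later SRC paper by Papailiopoulos et al. rather than from this deck, keeps two coded streams x and y plus their XOR s, and has node i store x[i], y[i+1] and s[i+2]. Every lost chunk is then the XOR of two chunks that sit on other nodes. The outer MDS code is omitted: x and y are treated as already-coded chunks, and all names are illustrative.

```python
import os


def xor(p, q):
    """Bytewise XOR of two equal-length chunks."""
    return bytes(u ^ v for u, v in zip(p, q))


SHIFT = {"x": 0, "y": 1, "s": 2}  # chunk (kind, j) is stored on node j - SHIFT[kind]


def src_layout(x, y):
    """Node i stores x[i], y[i+1] and s[i+2], with s[j] = x[j] XOR y[j]."""
    n = len(x)
    chunk = {"x": x, "y": y, "s": [xor(x[j], y[j]) for j in range(n)]}
    return [
        {(kind, (i + shift) % n): chunk[kind][(i + shift) % n]
         for kind, shift in SHIFT.items()}
        for i in range(n)
    ]


def repair(nodes, failed):
    """Rebuild the three chunks of `failed` from chunks held by other nodes."""
    n = len(nodes)

    def fetch(kind, j):
        holder = (j - SHIFT[kind]) % n
        return nodes[holder][(kind, j)]

    i, j1, j2 = failed, (failed + 1) % n, (failed + 2) % n
    return {
        ("x", i): xor(fetch("y", i), fetch("s", i)),
        ("y", j1): xor(fetch("x", j1), fetch("s", j1)),
        ("s", j2): xor(fetch("x", j2), fetch("y", j2)),
    }


n = 6
x = [os.urandom(8) for _ in range(n)]
y = [os.urandom(8) for _ in range(n)]
nodes = src_layout(x, y)
all_repaired = all(repair(nodes, i) == nodes[i] for i in range(n))
```

Under this layout a repair reads 6 chunks from 4 neighbouring nodes, and the storage is 3n chunks for 2k data chunks, i.e. rate (2/3)(k/n), consistent with the 2/3 ceiling mentioned above.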


Security through coding. Startup Cleversafe is introducing data security through distributed coding.

Coding allows secret sharing. Four coded blocks a, b, c, d are stored with four different cloud storage providers. Any two can be used to recover the data. No single cloud storage provider learns anything about the data [Shamir, Blakley 1979]. Distributed coding theory problems?
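The four-provider setting above corresponds to a 2-out-of-4 threshold scheme. A minimal sketch in the style of Shamir's construction, assuming a prime field of size 2^61 - 1 (the field and all function names are choices made for this example): the secret is the constant term of a random degree-1 polynomial, each provider gets one evaluation, and any two evaluations determine the line.

```python
import secrets
from itertools import combinations

P = 2**61 - 1  # a Mersenne prime; assumed field size


def share(secret, n=4, t=2):
    """Split `secret` into n shares so that any t of them reconstruct it."""
    coeffs = [secret % P] + [secrets.randbelow(P) for _ in range(t - 1)]

    def f(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation mod P
            acc = (acc * x + c) % P
        return acc

    return [(x, f(x)) for x in range(1, n + 1)]


def reconstruct(points):
    """Lagrange interpolation at 0 from t distinct shares."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % P
                den = (den * (xi - xj)) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret


shares = share(123456789)
results = {reconstruct(list(pair)) for pair in combinations(shares, 2)}
```

A single share is one point on a line with a uniformly random slope, so by itself it is consistent with every possible secret.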

Security during repair? [Figure: nodes a, b, c, d and newcomer e; one helper sends incorrect linear equations.] Repair bandwidth in the presence of Byzantine adversaries?

Open problems in distributed storage. Does the cut-set region match the exact repair region? Repairing codes with a small finite field limit? Dealing with bit errors (security) and privacy? (Dikaliotis, Dimakis, Ho, ISIT'10) What is the role of (non-trivial) network topologies? Cooperative repair (Shum et al.). Lookup repair region? Disk IO region? What are the limits of interference alignment techniques? Repairing existing codes used in storage (e.g. EvenOdd, B-Code, Reed-Solomon etc.)? Real-world implementation, benefits over HDFS for MapReduce?

Coding for Storage wiki

fin


Exact repair: interference alignment. [Figure: alignment equations for the repair vectors v2, v3, v4.]

Exact repair: interference alignment. [Figure: the same alignment equations.] [Cadambe-Jafar 2008, Cadambe-Jafar-Maleki 2010]

Exact repair: interference alignment. [Figure: alignment equations; one block must be full rank, the other must lie in the span of V'.] Choose the same V' and V. Make all A diagonal, iid.

Exact repair: interference alignment. We have to choose V, V' so that all the rows of one matrix are contained in the row span of the other [matrices lost in extraction]. The A matrices are assumed iid diagonal; no assumption is used other than that they commute.

Exact repair: interference alignment. Ok. Let's start by choosing V' to be one vector w. [Figure: the products that must then be in the row span.]

Exact repair: interference alignment. And fold it back in…

Exact repair: interference alignment. And fold it back in… And again fold it back in…

Extending this idea. Lookup repair allows very easy uncoded repair and modular designs. Random matrices and Steiner systems were proposed by [El Rouayheb et al.]. Note that for d < n-1 it is possible to beat the previous E-MBR bound. This is because lookup repair does not require every set of d surviving nodes to suffice to repair. The E-MBR region for lookup repair remains open. r ≥ 2, since two copies of each packet are required for easy repair. In practice higher rates are more attractive. This corresponds to a repetition code! Let's replace it with a sparse intermediate code.