Batch Codes and Their Applications Y.Ishai, E.Kushilevitz, R.Ostrovsky, A.Sahai Preliminary version in STOC 2004.

1 Batch Codes and Their Applications Y.Ishai, E.Kushilevitz, R.Ostrovsky, A.Sahai Preliminary version in STOC 2004

2 Talk Outline Batch codes Amortized PIR –via hashing –via batch codes Constructing batch codes Concluding remarks

3 A Load-Balancing Scenario [Figure: the data string x]

4 What’s wrong with a random partition? Good on average for “oblivious” queries. However: –Can’t balance adversarial queries –Can’t balance few random queries –Can’t relieve “hot spots” in multi-user setting

5 Example 3 devices, 50% storage overhead. By how much can the maximal load be reduced? –Replicating bits is no good: ∃ device s.t. 1/6 of the bits can only be found at this device. –Factor 2 load reduction is possible: [Figure: three devices holding L, R and L⊕R, where L and R are the two halves of x]
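
The factor-2 reduction can be sketched in Python. This is a minimal illustration, assuming the three devices hold L, R and L⊕R (the two halves of x and their bitwise XOR); the names `encode` and `fetch_two` are hypothetical and not from the talk.

```python
def encode(x):
    """Split x (even length) into halves L, R and add the parity bucket L xor R."""
    half = len(x) // 2
    L, R = x[:half], x[half:]
    parity = [a ^ b for a, b in zip(L, R)]
    return [L, R, parity]

def fetch_two(buckets, i, j):
    """Return (x[i], x[j]) for i != j, probing each bucket at most once."""
    half = len(buckets[0])
    probes = [0, 0, 0]

    def read(bucket, pos):
        probes[bucket] += 1
        return buckets[bucket][pos]

    side_i, pos_i = divmod(i, half)  # side 0 = L, side 1 = R
    side_j, pos_j = divmod(j, half)
    xi = read(side_i, pos_i)
    if side_i != side_j:
        # The two bits live in different halves: read both directly.
        xj = read(side_j, pos_j)
    else:
        # Same half: rebuild x[j] from the other half and the parity bucket.
        xj = read(1 - side_i, pos_j) ^ read(2, pos_j)
    assert max(probes) <= 1
    return xi, xj
```

Any pair of distinct positions is served with one probe per device, whereas storing the two halves on two devices could force two probes on the same device.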

6 Batch Codes (n,N,m,k) batch code: [Figure: x of length n is encoded into m buckets y_1, y_2, …, y_m of total length N; a query is a set {i_1,…,i_k}] Notes –Rate = n / N –By default, insist on minimal load per bucket ⇒ m ≥ k. –Load measured by # of probes. Generalizations –Allow t probes per bucket –Larger alphabet Σ

7 Multiset Batch Codes (n,N,m,k) multiset batch code: [Figure: x of length n is encoded into m buckets y_1, y_2, …, y_m of total length N] Motivation –Models multiple users (with off-line coordination) –Useful as a building block for standard batch codes Nontrivial even for multisets of the form […]

8 Examples Trivial codes –Replication: N = kn, m = k. Optimal m, bad rate. –One bit per bucket: N = m = n. Optimal rate, bad m. (L, R, L⊕R) code (multiset): rate = 2/3, m = 3, k = 2. Goal: simultaneously obtain –High rate (close to 1) –Small m (close to k)

9 Private Information Retrieval (PIR) Goal: allow user to query database while hiding the identity of the data-items she is after. Motivation: patent databases, web searches,... Paradox(?): imagine buying in a store without the seller knowing what you buy. Note: Encrypting requests is useful against third parties; not against server holding the data.

10 Modeling Database: n-bit string x User: wishes to –retrieve x_i and –keep i private

11 [Figure: the User retrieves x_i from the Server; the Server's view of i is marked "?"]

12 Some “Solutions” 1. User downloads entire database. Drawback: n communication bits (vs. log n + 1 w/o privacy). Main research goal: minimize communication complexity. 2. User masks i with additional random indices. Drawback: gives a lot of information about i. 3. Enable anonymous access to database. Note: addresses the different security concern of hiding user’s identity, not the fact that x_i is retrieved. Fact: PIR as described so far requires Ω(n) communication bits.

13 Two Approaches Computational PIR [KO97, CMS99,...] –Computational privacy –Based on cryptographic assumptions Information-Theoretic PIR [CGKS95,Amb97,...] –Replicate database among s servers –Unconditional privacy against t servers –Default: t=1
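
To make the information-theoretic setting concrete, here is a minimal sketch of the basic two-server XOR scheme with t=1. It uses linear communication, so it is only an illustration of the model, not one of the low-communication protocols cited in the talk; the function names are hypothetical.

```python
import secrets

def query(n, i):
    """User: two selection vectors that differ only at position i.

    Each vector on its own is uniformly random, so a single server
    learns nothing about i (privacy against t = 1 server).
    """
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = q1.copy()
    q2[i] ^= 1
    return q1, q2

def answer(x, q):
    """Server: XOR of the database bits selected by q (a linear scan of x)."""
    a = 0
    for bit, selected in zip(x, q):
        a ^= bit & selected
    return a

def reconstruct(a1, a2):
    """User: the two answers differ exactly by x[i]."""
    return a1 ^ a2
```

Each server touches every database bit to answer a single query, which is the server-side cost that amortization tries to reduce.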

14 Communication Upper Bounds Computational PIR –O(n^ε), polylog(n), O(κ log n), O(κ + log n) [KO97,CMS99,…] Information-theoretic PIR –2 servers, O(n^{1/3}) [CGKS95] –s servers, O(n^{1/c(s)}) where c(s) = Ω(s log s / log log s) [CGKS95,Amb97,BIKR02] –O(log n / log log n) servers, polylog(n)

15 Time Complexity of PIR Given low-communication protocols, efficiency bottleneck shifts to servers’ time complexity. –Protocols require (at least) linear time per query. –This is an inherent limitation! Possible workarounds: –Preprocessing –Amortize cost over multiple queries

16 Previous Results [BIM00] PIR with preprocessing –s-server protocols with O(n^ε) communication and O(n^{1/s+ε}) work per query, requiring poly(n) storage. –Disadvantages: Only work for multi-server PIR. Storage typically huge. Amortized PIR –Slight savings possible using fast matrix multiplication –Require a large batch of queries and high communication –Apply also to queries originating from different users. This work: –Assume a batch of k queries originates from a single user. –Allow preprocessing (not always needed). –Nearly optimal amortization

17 Model [Figure: the User retrieves x_{i_1}, x_{i_2}, …, x_{i_k} from the Server/s; the Server/s' view of the indices is marked "?"]

18 Amortized PIR via Hashing Let P be a PIR protocol. Hashing-based amortized PIR: –User picks h ∈_R H, defining a random partition of x into k buckets of size ≈ n/k, and sends h to Server/s. Except for 2^{−σ} failure probability, at most t = O(σ log k) queries fall in each bucket. –P is applied t times for each bucket. Complexity: –Time ≈ kt · T(n/k) ≤ t · T(n) –Communication ≈ kt · C(n/k) –Asymptotically optimal up to “polylog factors”
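
A small sketch of the hashing step, under stated assumptions: the hash family H is replaced by an independent uniform bucket choice per index (a stand-in, not the family used in the talk), and `bucket_loads` and `hashing_time` are hypothetical helper names. The largest load is the number t of times P must be run on each bucket.

```python
import random
from collections import Counter

def bucket_loads(queries, k, seed=0):
    """Assign each queried index to one of k buckets and count queries per bucket."""
    rng = random.Random(seed)
    h = {}  # stand-in for h: index -> bucket, chosen uniformly at random
    loads = Counter()
    for i in queries:
        if i not in h:
            h[i] = rng.randrange(k)
        loads[h[i]] += 1
    return loads

def hashing_time(n, k, t, T):
    """Server time when P (with per-query time T) runs t times on each of k buckets."""
    return k * t * T(n // k)
```

For a linear-time P, `hashing_time(n, k, t, lambda size: size)` equals t · n, so k queries cost about t times one plain query instead of k times.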

19 So what’s wrong? Not much… Still: –Not perfect: introduces either error or privacy loss –Useless for small k: the t = O(σ log k) overhead dominates –Cannot hash “once and for all”: ∀h ∃ bad k-tuple of queries. Sounds familiar?

20 Amortized PIR via Batch Codes Idea: use batch-encoding instead of hashing. Protocol: –Preprocessing: Server/s encode x as y = (y_1, y_2, …, y_m). –Based on i_1,…,i_k, User computes the index of the bit it needs from each bucket. –P is applied once for each bucket. Complexity –Time ≈ Σ_{1≤j≤m} T(N_j) ≤ T(N) –Communication ≈ Σ_{1≤j≤m} C(N_j) ≤ m · C(n) Trivial batch codes imply trivial protocols. (L, R, L⊕R) code: 2 queries, 1.5× time, 3× communication
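
The figures quoted for the three-bucket code built from the two halves of x and their XOR can be checked with a short calculation. It assumes a PIR protocol whose server time T is linear in the database size and whose communication C is non-decreasing; both are assumptions made here for illustration.

```latex
% Three buckets, each of size N_j = n/2, and P is applied once per bucket.
\begin{align*}
\text{Time:} \quad & \sum_{j=1}^{3} T(N_j) = 3\,T(n/2) = \tfrac{3}{2}\,T(n)
  && \text{(linear } T\text{)} \\
\text{Communication:} \quad & \sum_{j=1}^{3} C(N_j) = 3\,C(n/2) \le 3\,C(n) \\
\text{Baseline, 2 independent queries:} \quad & 2\,T(n) \text{ time}, \quad 2\,C(n) \text{ communication}
\end{align*}
```

So two queries cost 1.5 times the server time of a single query rather than twice, at the price of up to three times the communication of a single query.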

21 Constructing Batch Codes

22 Overview Recall notion: [Figure: x of length n is encoded into m buckets y_1, y_2, …, y_m of total length N; query i_1,…,i_k] Main qualitative questions: 1. Can we get arbitrarily high constant rate (n/N = 1 − ε) while keeping m feasible in terms of k (say m = poly(k))? 2. Can we insist on nearly optimal m (say m = Õ(k)) and still get close to a constant rate? Several incomparable constructions answer both questions affirmatively.

23 Batch Codes from Unbalanced Expanders [Figure: bipartite graph with n left vertices (data bits) and m right vertices (buckets)] By Hall’s theorem, the graph represents an (n, N=|E|, m, k) batch code iff every set S containing at most k vertices on the left has at least |S| neighbors on the right. Fully captures replication-based batch codes.
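
The expansion condition can be checked by brute force on small graphs. This is a sketch only (exponential in k), assuming the graph is given as a list `neighbors` where `neighbors[i]` is the set of buckets holding a copy of bit i; the function name is hypothetical.

```python
from itertools import combinations

def is_replication_batch_code(neighbors, k):
    """Check Hall's condition for every set of at most k left vertices."""
    n = len(neighbors)
    for size in range(1, k + 1):
        for S in combinations(range(n), size):
            reachable = set().union(*(neighbors[i] for i in S))
            if len(reachable) < size:
                return False  # these bits cannot be sent to distinct buckets
    return True
```

When the check passes, Hall's theorem guarantees that any k requested bits can be matched to k distinct buckets, i.e. one probe per bucket suffices.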

24 Parameters Non-explicit: N = dn, m = O(k · (nk)^{1/(d−1)}) –d = 3: rate = 1/3, m = O(k^{3/2} n^{1/2}). –d = log n: rate = 1/log n, m = O(k) ⇒ Settles Q2 Explicit (using [TUZ01], [CRVW02]) –Nontrivial, but quite far from optimal Limitations: –Rate < ½ (unless m = Ω(n)) –For const. rate, m must also depend on n. –Cannot handle multisets.

25 The Subcube Code Generalize (L, R, L⊕R) example in two ways –Trade better rate for larger m: (Y_1, Y_2, …, Y_s, Y_1⊕…⊕Y_s), still k = 2 –Handle larger k via composition

26 Geometric Interpretation [Figure: data blocks A, B, C, D arranged in a 2×2 square; buckets A, B, C, D, A⊕B, C⊕D, A⊕C, B⊕D, A⊕B⊕C⊕D]

27 Parameters N ≈ k^{log(1+1/s)} · n, m ≈ k^{log(s+1)} –s = O(log k) gives an arbitrary constant rate with m = k^{O(log log k)}. ⇒ “almost” resolves Q1 Advantages: –Arbitrary constant rate –Handles multisets –Very easy decoding Asymptotically dominated by subsequent construction.
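
These parameters follow from counting what one level of composition does. The derivation below is a sketch that assumes each level splits every block into s parts plus their XOR and doubles the number k of supported queries, with logarithms taken base 2.

```latex
% One level: s blocks plus one parity block, handling k = 2.
%   buckets:       m -> (s+1) m
%   total length:  N -> (1 + 1/s) N
% After \log_2 k levels:
\begin{align*}
m &= (s+1)^{\log_2 k} = k^{\log_2 (s+1)}, \\
N &= \left(1 + \tfrac{1}{s}\right)^{\log_2 k} n = k^{\log_2 (1 + 1/s)} \cdot n .
\end{align*}
% Example: s = 2, k = 4 gives m = 3^2 = 9 buckets and N = (3/2)^2 n = 2.25\,n.
```

With s = 2 and k = 4 this gives nine buckets, matching a two-level composition of the three-bucket code.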

28 The Gadget Lemma From now on, we can choose a “convenient” n and get the same rate and m(k) for arbitrarily larger n. [Figure: primitive multiset batch code]

29 Batch Codes vs. Smooth Codes Def. A code C: Σ^n → Σ^m is q-smooth if there exists a (randomized) decoder D such that –D(i) decodes x_i by probing q symbols of C(x). –Each symbol of C(x) is probed w/prob ≤ q/m. Smooth codes are closely related to locally decodable codes [KT00]. Two-way relation with batch codes: –q-smooth code ⇒ primitive multiset batch code with k = m/q^2 (ideally would like k = m/q). –Primitive multiset batch code ⇒ (expected) q-smooth for q = m/k. Batch codes and smooth codes are very different objects: –Relation breaks when relaxing “multiset” or “primitive” –Gap between m/q and m/q^2 is very significant for high rate case: best known smooth codes with rate > 1/2 require q > n^{1/2}. These codes are provably useless as batch codes.

30 Batch Codes from RM Codes (s,d) Reed-Muller code over F –Message viewed as s-variate polynomial p over F of total degree (at most) d. –Encoded by the sequence of its evaluations on all points in F^s. –Case |F| > d is useful due to a “smooth decoding” feature: p(z) can be extrapolated from the values of p on any d+1 points on a line passing through z.
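
The smooth decoding feature can be sketched for s=2 over a prime field. The field size `P = 97`, the coefficient-dictionary representation and the function names are assumptions made for this illustration. The sketch picks the d+1 points at parameters t = 1, …, d+1 on the line and never reads z itself, so it needs P ≥ d+2.

```python
P = 97  # hypothetical prime field size; this sketch needs P >= d + 2

def evaluate(coeffs, x, y):
    """Evaluate a bivariate polynomial; coeffs[(a, b)] is the coefficient of x^a y^b."""
    return sum(c * pow(x, a, P) * pow(y, b, P) for (a, b), c in coeffs.items()) % P

def decode_on_line(oracle, z, direction, d):
    """Recover p(z) from the values of p at z + t*direction for t = 1..d+1."""
    ts = range(1, d + 2)
    values = [oracle((z[0] + t * direction[0]) % P, (z[1] + t * direction[1]) % P)
              for t in ts]
    # The restriction g(t) = p(z + t*direction) has degree <= d, so Lagrange
    # interpolation through d+1 points determines g(0) = p(z).
    result = 0
    for tj, vj in zip(ts, values):
        num, den = 1, 1
        for tm in ts:
            if tm != tj:
                num = num * (0 - tm) % P
                den = den * (tj - tm) % P
        result = (result + vj * num * pow(den, -1, P)) % P
    return result
```

Different directions through z give different sets of codeword positions, which is what lets several queries be answered from different buckets. The modular inverse via `pow(den, -1, P)` requires Python 3.8 or later.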

31 s = 2, d ≈ (2n)^{1/2} [Figure: data items x_1, x_2, …, x_n] Two approaches for handling conflicts: 1. Replicate each point t times 2. Use redundancy to “delete” intersections. Slightly increases field size, but still allows constant rate.

32 Parameters Rate = 1/s! − ε, m = k^{1+1/(s−1)+o(1)} –Multiset codes with constant rate (< ½) Rate = Ω(1/k^ε), m = Õ(k) ⇒ resolves Q2 for multiset codes as well. Main remaining challenge: resolve Q1

33 The Subset Code Choose s, d such that n ≤ (s choose d). Each data bit i ∈ [n] is associated with a set T ∈ ([s] choose d), i.e. a d-subset of [s]. Each bucket j ∈ [m] is associated with a set S ∈ ([s] choose ≤d), i.e. a subset of [s] of size at most d. Primitive code: y_S = ⊕_{T ⊇ S} x_T

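34 Batch Decoding the Subset Code Lemma: For each T’ ⊆ T, x_T can be decoded from all y_S such that S ∩ T = T’. –Let L_{T,T’} denote the set of such S. –Note: {L_{T,T’} : T’ ⊆ T} defines a partition of the set of buckets ([s] choose ≤d). [Figure: x_T with T = 0011110000 as a characteristic vector; the buckets y_S in L_{T,T’} are those matching the pattern **0110****]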
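
The lemma can be checked exhaustively on a small instance. The sketch below assumes data bits are indexed by the d-subsets T of {0,…,s−1}, buckets by the subsets S of size at most d, and that each bucket stores the XOR of x_T over all T containing S; the function names are hypothetical.

```python
from itertools import combinations

def subsets_up_to(s, d):
    """All subsets of range(s) of size at most d (one per bucket)."""
    return [frozenset(c) for r in range(d + 1) for c in combinations(range(s), r)]

def encode(x, s, d):
    """x maps each d-subset T to a bit; bucket S holds the XOR of x[T] over T containing S."""
    return {S: sum(bit for T, bit in x.items() if S <= T) % 2
            for S in subsets_up_to(s, d)}

def decode(y, T, T_prime, s, d):
    """XOR of y[S] over all buckets S whose intersection with T equals T_prime."""
    return sum(y[S] for S in subsets_up_to(s, d) if S & T == T_prime) % 2
```

The decoding works because a bit x_U with U ≠ T is counted once for every subset of U \ T, an even number of times, while x_T is counted exactly once.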

35 Batch Decoding the Subset Code (contd.) Goal: Given T_1,…,T_k, find subsets T’_1,…,T’_k such that the sets L_{T_i,T’_i} are pairwise disjoint. –Easy if all T_i are distinct or if all T_i are the same. Attempt 1: T’_i is a random subset of T_i –Problem: if T_i, T_j are disjoint, L_{T_i,T’_i} and L_{T_j,T’_j} intersect w.h.p. Attempt 2: greedily assign to T_i the largest T’_i such that L_{T_i,T’_i} does not intersect any previous L_{T_j,T’_j} –Problem: adjacent sets may “block” each other. Solution: pick random T’_i with bias towards large sets. [Figure: example with items x_1, x_2, x_3]

36 Parameters Allows arbitrary constant rate with m=poly(k)  Settles Q1 Both the subcube code and the subset code can be viewed as sub-codes of the binary RM code. –The full binary RM code cannot be batch decoded when the rate>1/2.

37 Concluding Remarks: Batch Codes A common relaxation of very different combinatorial objects –Expanders –Locally-decodable codes Problem makes sense even for small values of m, k. –For multiset codes with m = 3, k = 2, rate 2/3 is optimal. –Open for m ≥ k+2. Useful building block for “distributed data structures”.

38 Concluding Remarks: PIR Single-user amortization is useful in practice only if PIR is significantly more efficient than download. –Certainly true for multi-server PIR –Most likely true also for single-server PIR. Killer app for lattice-based cryptosystems? [Figure: grid of settings, single user / multiple users versus adaptive / non-adaptive queries, with "?" marks]

