Presentation is loading. Please wait.

Presentation is loading. Please wait.

(Implementations of) verifiable computation and succinct arguments: survey and wishlist Michael Walfish NYU.

Similar presentations


Presentation on theme: "(Implementations of) verifiable computation and succinct arguments: survey and wishlist Michael Walfish NYU."— Presentation transcript:

1 (Implementations of) verifiable computation and succinct arguments: survey and wishlist
Michael Walfish NYU

2 The motivation is 3rd party computing: cloud, etc. Ideal requirements:
“f”, x client (verifier) server (prover) y, short proof without executing f, can check that: “y = f(x)” more generally: “prover knows w s.t. y = f(x,w)” gmr85 bcc88 Kilian92 Micali94 bg02 gos06 iko07 gkr08 ggp10 Groth10 Lipmaa12 ggpr12 bcct13 The motivation is 3rd party computing: cloud, etc. Ideal requirements: Efficiency (client CPU, communication, server CPU) Privacy of w (desirable in some applications) Practicality (as real people understand the term)

3 Good news: Bad news: Running code; cost reductions of 1020 vs. theory
sbw11 cmt12 smbw12 trmp12 svpbbw12 sbvbpw13 vsbw13 pghr13 Thaler13 bcgtv13 bfrsbw13 bfr13 dfkp13 bctv14a bctv14b bcggmtv14 fl14 kppsst14 fgv14 bbfr14 wsrhbw15 cfhkknpz15 ctv15 Running code; cost reductions of 1020 vs. theory Compilers from C to verifiable computations Stateful computations; remote inputs, outputs Concretely efficient verifiers SOSP (!) Bad news: Small computations, extreme expense, etc. Useful only for special-purpose applications

4 This talk Summary of state of the art (2) Reality check (3) Next steps

5 Common framework in state of the art systems:
front-end (program translator) back-end (probabilistic proof protocol) main(){ ... } verifier central construction: QAPs [ggpr12] x y, π C program arithmetic circuit (non-det. input) prover interactive argument [iko07] non-interactive argument [Groth10, Lipmaa12, ggpr12] interactive proof [gkr08]

6 Attempt 1: Use PCPs that are asymptotically short
[almss92, AS92] [bghsv05, bghsv06, Dinur07, bs08, Meir12, bcgt13] verifier prover ... ... ... “short” PCP accept/reject This does not meet the efficiency requirements (because |PCP| > running time of f).

7 prover verifier ... Attempt 2: Use arguments or CS proofs
[Kilian92, Micali94] verifier prover hash tree digest ... “short” PCP accept/reject But the constants seem too high …

8 prover verifier v ... Attempt 3: Use long PCPs interactively z ⊗ z z
[iko07, smbw12, svpbbw12] commit request verifier prover commit response L() = <,v> z ⊗ z queries: q1, q2, q3, … ... v accept/reject Hadamard encoding of v z q1v q2v q3v Achieves simplicity, with good constants … … but pre-processing is required (because |qi|=|v|) … and prover’s work is quadratic; address that shortly

9 v … z ⊗ z z prover creates vector v, which encodes an execution trace:
Hadamard PCP [almss92] prover creates vector v, which encodes an execution trace: v MUL ADD x0 y0 ADD MUL z ⊗ z y1 MUL ADD wn z

10 prover verifier v ... Attempt 4: Use long PCPs non-interactively z ⊗ z
[bciop13] verifier prover E(q1), E(q2), E(q3)… L() = <,v> z ⊗ z ... v accept/reject Hadamard encoding of v z E(q1v) E(q2v) E(q3v) Query process now happens “in the exponent” … pre-processing still required (again because |qi|=|v|) … prover’s work still quadratic; addressing that soon

11 Recap efficient (short) PCPs arguments, CS proofs
arguments w/ preprocessing SNARGs w/ preprocessing who almss92, as92, bgshv, Dinur, … Kilian92, Micali94 iko07, smbw12, svpbbw12 ggpr12, bciop13, … what classical PCP commit to PCP by hashing commit to long PCP using linearity encrypt queries to a long PCP security unconditional CRHFs linearly HE knowledge-of-exponent why/why not not efficient for V constants are unfavorable simple simple, non-interactive (Thanks to Rafael Pass.)

12 prover v Final attempt: apply linear query structure to GGPR’s QAPs
[Groth10, Lipmaa12, ggpr12] prover L() { return <,v>; } z ⊗ z queries h v z z Addresses the issue of quadratic costs. PCP structure implicit in GGPR. Made explicit in [bciop13, sbvbbw13].

13 [high-degree polynomials]
QAPs [ggpr12] , x, y D(t), Px,y(t, Z) [high-degree polynomials] Z=z is satisfying assignment ⟺ ∃ H(t) such that D(t)  H(t)=Px,y(t, z) This can be checked probabilistically. Choose random τ. Then check: D(τ)  H(τ) = Px,y(τ, z) ? Px,y(τ,z) can be evaluated by querying M() = <,z> Px,y(τ, z) = (<A(τ), z> + L1)(<B(τ), z> + L2) – (<C(τ), z> + L3)

14 π()=<,w> client server PCP verifier
commit request w commit response PCP verifier w q1, q2, …, qu queries π(q1), …, π(qu) responses (z, h) linearity test quad corr. test circuit test |w|=|Z|2 (z, z ⊗ z) π()=<,w> |w|=|Z|+|C| [ggpr Eurocrypt 2013] new quad.test Any computation has a linear PCP whose proof vector is (quasi)linear in the computation size. Shows connection between PCPs, QAPs (Also shown by [bciop tcc13].)

15 All recent systems based on ggpr
“Zaatar” [sbvbbw13] “Pinocchio,” “libsnark” [pghr13, bctv14a] plaintext queries queries in exponent interactive argument [iko07] snarg, zk-snark with pre-processing [Groth10, bcct12, ggpr12] pre-processing avoidable in theory [bcct13, bctv14b, ctv15] QAPs standard assumptions amortize over batch interactive non-falsifiable assumptions amortize indefinitely non-interactive, ZK, … All recent systems based on ggpr sbvbpw13, pghr13, bfrsbw13, bcgtv13, bcggmtv14, bbfr14, bctv14a, bctv14b, fl14, kppsst14, wsrbw15, cfhkknpz15, ctv15

16 Common framework in state of the art systems:
front-end (program translator) back-end (argument variants) verifier main(){ ... } x y, π QAPs C program arithmetic circuit (non-det. input) [ggpr12] prover

17 … State of the art front-ends “CPU” Verbose circuits (costly)
fetch-decode-execute “CPU” Verbose circuits (costly) Good amortization Great programmability C prog MIPS .exe CPU state circuit is unrolled CPU execution [bcgtv13, bctv14a, bctv14b, ctv15] “ASIC” C prog Concise circuits Amortization worse How is programmability? each line translates to gates [sbvbpw13, vsbw13, pghr13, bfrsbw13, bcggmtv14, bbfr14, fl14, kppsst14, wsrbw15, cfhkknpz15]

18 applicable computations
Front-ends trade off performance and expressiveness applicable computations concrete costs special-purpose pure stateful general loops function pointers lower Thaler crypto13 CMT, TRMP itcs, Hotcloud12 Pepper, Ginger smbw ndss12 svpbbw Security12 Trueset, Zerocash kppsst Security14, bcggmtv Oakland15 Zaatar sbvbpw Eurosys13 Pinocchio pghr Oakland13 Geppetto cfhkknpz Oakland15 Pantry bfrsbw sosp13 Buffet wsrhbw ndss15 higher highest (still theory) better [Your work here! ] ASIC ? BCTV Security14 BCGTV crypto13 CPU Proof-carrying data bctv crypto14, ctv Eurocrypt15

19 Front-ends trade-off performance and expressiveness
applicable computations setup costs special-purpose straightline pure stateful general loops none (fast prover) Thaler [crypto13] CMT, TRMP [itcs,Hotcloud12] low Allspice [Oakland13] medium Pepper [ndss12] Ginger [Security12] Zaatar [Eurosys13], Pinocchio Pantry [sosp13] Buffet [ndss15] high BCTV [security14] very high BCGTV better better crypto properties: ZK, non-interactive, etc.

20 … Summary of common framework: verifier x y, π QAPs prover front-end
(program translator) back-end (argument variants) “CPU” verifier main(){ ... } x y, π QAPs [ggpr12] prover “ASIC”

21 Summary of state of the art
(2) Reality check (3) Next steps

22 Quick performance study
Back-end: BCTV’s optimized GGPR/Pinocchio implementation Front-ends: implementations or re-implementations of Zaatar (ASIC) [sbvbpw Eurosys13] BCTV (CPU) [Security14] Buffet (ASIC) [wsrhbw ndss15]

23 applicable computations
Landscape of front-ends (again) applicable computations concrete costs special-purpose pure stateful general loops function pointers lower Thaler crypto13 CMT, TRMP itcs, Hotcloud12 Pepper, Ginger smbw ndss12 svpbbw Security12 Trueset, Zerocash kppsst Security14, bcggmtv Oakland15 Zaatar sbvbpw Eurosys13 Pinocchio pghr Oakland13 Geppetto cfhkknpz Oakland15 Pantry bfrsbw sosp13 Buffet wsrhbw ndss15 higher highest (still theory) better BCTV Security14 BCGTV crypto13 Proof-carrying data bctv crypto14, ctv Eurocrypt15

24 Quick performance study
Back-end: BCTV’s optimized GGPR/Pinocchio implementation Front-ends: implementations or re-implementations of: Zaatar (ASIC) [sbvbpw Eurosys13] BCTV (CPU) [Security14] Buffet (ASIC) [wsrhbw ndss15] Evaluation platform: cluster at Texas Advanced Computing Center (tacc) Each machine runs Linux on an Intel Xeon 2.7 GHz with 32gb of ram.

25 What are the verifier’s costs?
What are the prover’s costs? Proof length 288 bytes V per-instance 6 ms + (|x| + |y|)・3 µs V pre-processing |C|・180 µs P per-instance |C|・60 µs +|C|log |C|・0.9µs P’s memory requirements O(|C|log|C|) (|C|: circuit size) How do the front-ends compare to each other? Are the constants good or bad?

26 Extrapolated prover execution time, normalized to Buffet
How does the prover’s cost vary with the choice of front-end? Extrapolated prover execution time, normalized to Buffet

27 Extrapolated prover execution time, normalized to native execution
All of the front-ends have terrible concrete performance Extrapolated prover execution time, normalized to native execution

28 The maximum input size is far too small to be called practical
Zaatar BCTV Buffet approach ASIC CPU m × m mat. mult 215 7 merge sort m elements 256 32 512 KMP str len: m substr len: k m=320, k=32 m=160, k=16 m=2900, k=256 The data reflect a “gate budget” of ≈107 gates. Pre-processing costs minutes; proving costs 8-13 minutes

29 Summary of concrete performance
Front-end: generality brings a concrete price (but better in theory) Verifier’s “variable costs”: genuinely inexpensive Verifier’s “pre-processing”: depends on your perspective Prover’s computational costs: near-total disaster Memory: creates scaling limit for verifier and prover Alternatives? Proof-carrying data [bcct13, bctv14b, ctv15]: better in theory, but most concrete costs are orders of magnitude worse GKR-derived systems [cmt12, vsbw13, …]: great verifier performance; some expressivity limitations

30 Where do we go from here? One option: target domains where prover cost is tolerable Anonymity for Bitcoin: Zerocash [bcggmtv Oakland14] Location-private tolling [pbb Security09]: Pantry [bfrsbw sosp13] Another option: try to motivate theoretical advances

31 Summary of state of the art
(2) Reality check (3) Next steps

32 1000x improvements in prover performance …
What we need: 1000x improvements in prover performance … … and verifier’s pre-processing costs (ideally) Inexpensive handling of state … … and more generally, “the outside world” Standard assumptions (ideally) What we are less worried about: Non-interactivity (depending on the application) Pre-processing (if costs are reasonable) Succinctness (proof can be bigger, verifier can work harder)

33 Wishlist (1) Probabilistic proof protocols that do not require circuits Variant: special-purpose algorithms for outsourcing pieces of computations, which integrate with circuit verification More efficient reductions from programs to circuits

34 Wishlist (2) Lots of other ideas needed; we don’t know what they are!
Construct short PCPs that are efficient and simple Endow IKO’s arguments with more properties or lower costs Reuse the pre-processing work beyond a batch Make the protocol zero knowledge Enhance efficiency via stronger assumptions and/or analysis Lots of other ideas needed; we don’t know what they are! Improve GGPR’s QAPs or the cryptography used to query it (disclaimer: the status quo represents massive progress!) Apply and possibly develop special-purpose proof systems for cryptographic operations (to accelerate verifier via offloading)

35 Conclusion and take-aways
Exciting area, with good news and bad news: Lots of progress, but … … “it’s been implemented” != “it’s plausibly deployable” Overhead rooted in circuit representation and QAPs Theoretical breakthroughs are needed Incentive: the potential is huge! (http://www.pepper-project.org)

36 Appendix Slides

37 Prior work on verifiable computation
Makes usage assumptions or targets particular computations: Replication [Castro-Liskov, Malkhi-Reiter], trusted hardware [Chiesa-Tromer ICS10, Sadeghi et al. TRUST10], auditing [Haeberlen et al. SOSP07, Monrose et al. NDSS99] Special-purpose [Freivalds MFCS79, Golle-Mironov RSA01, Sion VLDB05, Michalakis et al. NSDI 07, Karame et al. ICIS09, Benabbas et al. CRYPTO11, Boneh- Freeman EUROCRYPT11, Fiore-Gennaro CCS12, Blanton et al. TISSEC13, ….] General but not geared toward practice: Use fully homomorphic encryption [Gennaro et al., Chung et al. CRYPTO10] Proof-based verifiable computation [Babai85, GMR85, BCC88, BGKW88, BFL90, BFLS91, FGLSS91, LFKN91, Shamir91 ALMSS92, AS92, Kilian92, Micali00, BGHSV04, BGSHV05, BS06, IKO07, GKR08, BCCT12, BCCT13, …]

38 A number of projects have produced experimental results
Pepper, Ginger, Zaatar, Allspice, Pantry, Buffet hotos11 ndss12 security12 eurosys13 oakland13 sosp13 ndss15 CMT, Thaler cmt itcs12 Thaler et al. hotcloud12 Thaler crypto13 GGPR, Pinocchio ggpr eurocrypt13 Parno et al. oakland13 BCGTV, BCTV bcgtv crypto13 bctv security14

39 Zaatar incorporates PCPs but not like this:
“f”, x client server y ... accept/reject ... The proof is too long to be transferred. We move out of the PCP setting: We will make computational assumptions We will allow # of query bits to be superconstant We will be loose about # of random bits

40 response scalars: q1w(j), q2w(j), …
client server “f” , x(j) y(j) commit request commit response w(j) query vectors: q1, q2, q3, … checks queries response scalars: q1w(j), q2w(j), … accept/reject

41 Programs compile to constraints over a finite field (Fp).
dec-by-three.c f(X) { Y = X − 3; return Y; } compiler 0 = Z − X, 0 = Z – 3 – Y Input/output pair correct ⟺ constraints satisfiable. As an example, suppose X = 7. if Y = 4 … if Y = 5 … 0 = Z – 7 0 = Z – 3 – 4 0 = Z – 7 0 = Z – 3 – 5 … there is a solution … there is no solution

42 Constraints can represent various program structures:
Z2 ← Z1 0 = Z1 – Z2 0 = (Z1 – Z2 )  M – Z3, 0 = (1 – Z3)  (Z1 – Z2 ) Z3 ← (Z1 != Z2) “Z1 < Z2” log |Fp| constraints loops unrolled Our compiler is derived from Fairplay [mnps security04]; it turns the program into list of assignments (SSA). We replaced the back-end (now it is constraints), and later the front-end (now it is C, inspired by [Parno et al. oakland13]).

43 The proof vector now encodes the assignment that satisfies the constraints.
1 1 1 = (Z1 – Z2 )  M 0 = Z3 − Z4 0 = Z3Z5 + Z6 − 5 219 2047 1 1 1013 1 1 1 1 Z1=23, Z2=187, …, 805 1 187 23 The savings from the change are enormous.

44 (1) If verification work is performed on a CPU
mat. mult. (m=150) Floyd-Warshall (m=25) root finding (m=256, L=8) PAM clustering (m=20, d=128) cross-over 25,000 inst. 43,000 inst. 210 inst. 22,000 inst. client CPU 21 mins. 5.9 mins. 2.7 mins. 4.5 mins. server CPU 12 months 8.9 months 22 hours 4.2 months (2) If we had free crypto hardware for verification … cross-over 4,900 inst. 8,000 inst. 40 inst. 5,000 inst. client CPU 4 mins. 1.1 mins. 31 secs. 61 secs. server CPU 2 months 1.7 months 4.2 hours 29 days

45 front-end (program translator)
back-end (argument) GGPR encoding client 0 = X + Z1 0 = Y + Z2 0 = Z1Z3 − Z2 …. F(){ [subset of C] } arith. circuit server constraints No state No RAM No dynamic control flow Design question: what can we put in the constraints so that satisfiability implies correct storage interaction?

46 A naive approach Variables M0, …, Msize hold state of storage or memory. B = M0 + (A − 0)  F0 B = M1 + (A − 1)  F1 B = M2 + (A − 2)  F2 B = Msize + (A − size)  Fsize B = read(A) Requires two variables for every possible address!

47 Recall: untrusted storage systems from crhfs
[SFSRO (Fu et al. osdi00); SUNDR (Mazières-Shasha podc02, Li et al. osdi04)] digest <H(b), b> cli. serv. block To compute with a remote input named by digest, the client: arranges to get block as input performs integrity check: hash(block)=digest executes the computation ? The steps above can happen remotely and verifiably.

48 Example: substring search over a remote data block
digest, pattern data block B client server “yes”/“no” server’s input client’s input substr (digest d, pattern p) { B = block whose digest is d if (p is a substring of B) return “yes”; else return “no”; } constraints that check hash(B)=d constraints that compute substring search using B, p The client is using the untrusted storage paradigm but is not handling the blocks.

49 (This is intuition, but we also have security proofs.)
client’s input: p, d constraints that check hash(B)=d server’s input: B constraints that compute substring search using B, p The idea: first set of constraints has multiple solutions, but it is computationally infeasible for the server to find them. (This is intuition, but we also have security proofs.) Nothing on this slide is a contribution Pantry’s contribution: bringing the folklore to life

50 Some system design questions
(1) What hash function should we use? We use an algebraic hash function that has an efficient representation as constraints [Ajtai96, ggh96] (2) What do applications look like in this framework?

51 The client is assured that a MapReduce job was performed correctly—without ever touching the data.
map(), reduce(), in_digests client Mi Ri out_digests The two phases are handled separately: mappers reducers in = vget(in_digest); out = map(in); for r=1,…,R: d[r] = vput(out[r]) for m=1,…,M: in[m] = vget(e[m]); out = reduce(in); out_digest = vput(out); in_digests

52 Hidden state applications under Pantry
lookup( ) client server “yes”, proof accept/reject list of faces Key idea: Pantry’s storage plus zero knowledge variant of Pinocchio [Parno et al., oakland13] Requires small adjustments in protocol (b/c digests don’t hide state) Other applications: tolling, regression analysis, etc. Upshot: write C programs, get powerful guarantees

53 Evaluation questions When does the Pantry client save resources relative to locally executing the computation? What are the costs of supporting hidden state?

54 Compiler pipeline implemented in C++, Java, Go, Python
Server is distributed; communication uses OpenMPI Evaluation platform: a cluster at Texas Advanced Computing Center (tacc) Each machine runs Linux on an Intel Xeon 2.7 GHz with 32gb of ram.

55 Pantry’s client saves resources with sufficiently large inputs
baseline CPU time (minutes) Pantry number of nucleotides in the input dataset (billions) Nucleotide substring search: A mapper gets 600k nucleotides and outputs matching locations. One reducer per 10 mappers. The graph is an extrapolation but is nonetheless encouraging.

56 Cost of supporting hidden state applications
Server holds 128 face fingerprints (hidden state: 15KB) The good news: Proof size: 288 bytes Client’s CPU time: 7 ms The bad news: Network cost (setup) and storage cost (ongoing): 170MB Server’s CPU time: 7.8 minutes

57 Pantry Buffet C Si C S C S C S map(), reduce(), input filenames
output filenames Pantry [sosp13] query, digest C S DB result f f, x v = *x; w = *v; C S y Buffet f [ndss15] f, x for (…) if (…) break; C S y

58 front-end (program translator)
back-end (argument) GGPR encoding client 0 = X + Z1 0 = Y + Z2 0 = Z1Z3 − Z2 …. F(){ [subset of C] } arith. circuit server constraints No state No RAM No dynamic control flow

59 … Alternate approach to front-end: simulate a CPU
[bcgtv crypto13] [bctv usenix Security14] fetch-decode-execute 0 fetch-decode-execute 1 fetch-decode-execute T CPU state: pc, regs, … CPU state: pc, regs, … CPU state: pc, regs, … Builds on (optimized) Pinocchio backend Front-end has great programmability, but …

60 … Alternate approach to front-end: simulate a CPU
[bcgtv crypto13] [bctv usenix Security14] CPU state: pc, regs, … fetch-decode-execute 1 fetch-decode-execute 0 fetch-decode-execute T CPU state: pc, regs, … CPU state: pc, regs, … Builds on (optimized) Pinocchio backend Front-end has great programmability, but …

61 state[1].instr = load(state[1].pc)
decode_instr(&op, &d, &a1, &a2); switch (op): case ADD: state[2].regs[d] = a1+a2; next_fl = (a1+a2) > NREG; break; case JMP:

62 Alternate approach to front-end: simulate a CPU
[bcgtv crypto13] [bctv usenix Security14] Can we incorporate RAM and dynamic control flow without representing a CPU’s execution in constraints? fetch-decode-execute 0 fetch-decode-execute 1 fetch-decode-execute T CPU state: pc, regs, … CPU state: pc, regs, … CPU state: pc, regs, … Builds on (optimized) Pinocchio backend Front-end has great programmability, but … … costs orders of magnitude more than other approaches

63 Can we incorporate RAM and dynamic control flow without representing a CPU’s execution in constraints? Yes! Our Buffet system: [NDSS15] Adapts and refines BCTV’s RAM technique Supports data-dependent control flow, via loop flattening

64 Inspired by parallelizing compilers
state = dummy = 0 while dummy < MAXITER: if state == 0: if j < MAX1: <BODY 1> limit = get_limit(j) i = 0 state = 1 else: state = 3 if state == 1: if i < limit: <BODY 2> i++ state = 2 if state == 2: <BODY 3> state = 0 dummy++ Inspired by parallelizing compilers [gf ppopp95, knd ishpc05, Knijnenburg cpc98, Polychronopoulos icpp97, ycfvegh08] Generalizes to break, arbitrary nesting, sequential loops, etc. while (j < MAX1): <BODY 1> // data dependent limit = get_limit(j) for i in [0, limit): <BODY 2> <BODY 3> This is done via source-to-source translation.

65 Can we incorporate RAM and dynamic control flow without representing a CPU’s execution in constraints? Yes! Our Buffet system: [NDSS15] Adapts and refines BCTV’s RAM technique Supports data-dependent control flow, via loop flattening Upshot: Buffet efficiently supports all of C, except function pointers and goto.


Download ppt "(Implementations of) verifiable computation and succinct arguments: survey and wishlist Michael Walfish NYU."

Similar presentations


Ads by Google