Verifying remote executions: from wild implausibility to near practicality Michael Walfish NYU and UT Austin.

Acknowledgment Andrew J. Blumberg (UT), Benjamin Braun (UT), Ariel Feldman (UPenn), Richard McPherson (UT), Nikhil Panpalia (Amazon), Bryan Parno (MSR), Zuocheng Ren (UT), Srinath Setty (UT), and Victor Vu (UT).

Problem statement: verifiable computation

The motivation is third-party computing: cloud, volunteers, etc. The client sends “f” and x to the server; the server returns y and aux. The client wants to check whether y = f(x), without computing f(x).

We want this to be:
1. Unconditional, meaning no assumptions about the server
2. General-purpose, meaning arbitrary f
3. Practical, or at least conceivably practical soon

Theory can help. Consider the theory of Probabilistically Checkable Proofs (PCPs). [ALMSS 92, AS 92]

[Diagram: the client sends “f” and x; the server returns y (or an incorrect y’) together with a proof; the client outputs ACCEPT or REJECT.]

But the constants are outrageous:
- Under a naive PCP implementation, verifying multiplication of 500×500 matrices would cost 500+ trillion CPU-years
- This does not save work for the client

We have reduced the costs of a PCP-based argument system [IKO CCC 07] by 20 orders of magnitude.

We have refined several strands of theory. We have implemented the refinements. (HOTOS 11, NDSS 12, SECURITY 12, EUROSYS 13, OAKLAND 13, SOSP 13)

This research area is thriving. (CMT ITCS 12, TRMP HOTCLOUD 12, BCGT ITCS 13, GGPR EUROCRYPT 13, PGHR OAKLAND 13, Thaler CRYPTO 13, BCGTV CRYPTO 13, …)

We predict that PCP-based machinery will be a key tool for building secure systems.

(1) Zaatar: a PCP-based efficient argument [NDSS 12, SECURITY 12, EUROSYS 13]
(2) Pantry: extending verifiability to stateful computations [SOSP 13]
(3) Landscape and outlook

Zaatar incorporates PCPs but not like this:

[Diagram: the client sends “f” and x; the server returns y and a PCP; the client outputs ACCEPT / REJECT.]

- The proof is not drawn to scale: it is far too long to be transferred.
- Even the asymptotically short PCPs seem to have high constants. [BGHSV CCC 05, BGHSV SIJC 06, Dinur JACM 07, Ben-Sasson & Sudan SIJC 08]

We move out of the PCP setting: we make computational assumptions. (And we allow # of query bits to be superconstant.)

Zaatar uses an efficient argument [Kilian CRYPTO 92, 95]:

Instead of transferring the PCP … [IKO CCC 07]
1. The client sends a commit request; the server returns a commit response.
2. The client sends queries q_1, q_2, q_3, …; the server answers each one with PCPQuery(q), returning q_1 · w, q_2 · w, q_3 · w, ….
3. The client runs efficient checks [ALMSS 92] on the answers and outputs ACCEPT / REJECT.

The server’s vector w encodes an execution trace of f(x). [ALMSS 92]

What is in w?
(1) An entry for each wire; and
(2) An entry for the product of each pair of wires.

[Diagram: a circuit for f with inputs x_0, x_1, …, x_n and outputs y_0, y_1, …; w is derived from its wire values.]
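The encoding described on this slide can be sketched in a few lines of Python. This is only a toy illustration: the modulus `P`, the wire values, and the helper names `build_proof_vector` and `pcp_query` are assumptions made for the example, not names from the talk.

```python
# Toy illustration only: P, the wire values, and all helper names are assumed.
P = 2**31 - 1  # a small prime standing in for the field F_p


def build_proof_vector(z):
    """Return w = (z, z (x) z): one entry per wire, then one per pair of wires."""
    pairs = [(a * b) % P for a in z for b in z]
    return [v % P for v in z] + pairs


def pcp_query(q, w):
    """Server-side answer to a linear query: the inner product of q and w, mod P."""
    return sum(qi * wi for qi, wi in zip(q, w)) % P


z = [3, 5, 7]             # hypothetical wire values from one execution of f(x)
w = build_proof_vector(z)
q = [1] * len(w)          # a sample query vector chosen by the client
answer = pcp_query(q, w)
```

With n wires the vector has n + n² entries, which is the quadratic blow-up that the later slides work to remove.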

Zaatar uses an efficient argument [Kilian CRYPTO 92, 95] [IKO CCC 07]:

[Diagram repeated: commit request and commit response; queries q_1, q_2, q_3, … answered through PCPQuery(q) as q_1 · w, q_2 · w, q_3 · w; efficient checks [ALMSS 92]; ACCEPT / REJECT.]

This is still too costly (by a factor of …), but it is promising.

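Zaatar incorporates refinements to [IKO CCC 07], with proof.

[Diagram: the client sends “f” and x; the server returns y. After a commit request and commit response concerning w, the client sends query vectors q_1, q_2, q_3, …; the server returns response scalars q_1 · w, q_2 · w, …; the client runs its checks and outputs ACCEPT / REJECT.]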

The client amortizes its overhead by reusing queries over multiple runs. Each run has the same f but different input x.

[Diagram: the client issues the same query vectors q_1, q_2, q_3, … against the server’s vectors w^(1), w^(2), w^(3).]

[Diagram repeated for instance j: “f”, x^(j) and y^(j); commit request and commit response on w^(j); query vectors q_1, q_2, q_3, …; response scalars q_1 · w^(j), q_2 · w^(j), …; checks; ACCEPT / REJECT. A check mark (✔) flags the component refined so far.]

[Diagram: the same operation on inputs a and b shown in three computational models: Boolean circuit (“something gross”), arithmetic circuit (× gates), and arithmetic circuit with concise gates.]

Unfortunately, this computational model does not really handle fractions, comparisons, logical operations, etc.

Programs compile to constraints over a finite field (F_p).

dec-by-three.c: f(X) { Y = X − 3; return Y; }
compiler output: 0 = Z − X, 0 = Z − 3 − Y

Input/output pair correct if and only if constraints satisfiable.

As an example, suppose X = 7.
- if Y = 4, the constraints are 0 = Z − 7, 0 = Z − 3 − 4, and there is a solution
- if Y = 5, the constraints are 0 = Z − 7, 0 = Z − 3 − 5, and there is no solution
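The dec-by-three example can be checked mechanically. The sketch below is a minimal illustration, assuming a small prime `P` for F_p and a hypothetical helper `satisfiable`; it hard-codes the two constraints from the slide rather than compiling anything.

```python
# Minimal sketch; P and the helper name are assumptions for illustration.
P = 2**31 - 1  # assumed prime modulus for F_p


def satisfiable(x, y):
    """Check whether {0 = Z - X, 0 = Z - 3 - Y} has a solution Z in F_p."""
    z = x % P                     # the first constraint forces Z = X
    return (z - 3 - y) % P == 0   # the second constraint must then hold as well


correct_pair = satisfiable(7, 4)   # claimed output matches f(7)
wrong_pair = satisfiable(7, 5)     # claimed output does not match f(7)
```

Because the first constraint pins Z to X, no search is needed here; general constraint sets do not have this convenient structure.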

How concise are constraints?

Our compiler is derived from Fairplay [MNPS SECURITY 04]; it turns the program into a list of assignments (SSA). We replaced the back-end (now it is constraints), and later the front-end (now it is C, inspired by [Parno et al. OAKLAND 13]).

- Z_3 ← (Z_1 != Z_2): 0 = (Z_1 − Z_2) · M − Z_3, 0 = (1 − Z_3) · (Z_1 − Z_2)
- “Z_1 < Z_2”: log |F_p| constraints
- loops: unrolled

if (X_1 < X_2) Y ← 3 else Y ← 4

compiles to:

M · {C_<}, 0 = M · (Y − 3), (1 − M) · {C_≥}, 0 = (1 − M) · (Y − 4)

where

C_< = { 0 = B_0 · (1 − B_0), 0 = B_1 · (2 − B_1), …, 0 = B_{N−2} · (2^{N−2} − B_{N−2}), X_1 − X_2 = (p − 2^{N−1}) + (B_0 + … + B_{N−2}) }

Recall that the constraints are over F_p. C_< is satisfiable iff X_1 − X_2 is in the “negative” range [p − 2^{N−1}, p).
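To see why C_< captures “less than”, it may help to construct the B_i values by hand. The following sketch assumes a prime `P`, a bit width `N = 8`, and inputs in [0, 2^(N−1)); the helper names are hypothetical. Each B_i is either 0 or 2^i, exactly as the constraints 0 = B_i · (2^i − B_i) require.

```python
# Sketch under assumptions: P, N, and the helper names are illustrative only.
P = 2**31 - 1   # assumed prime modulus for F_p
N = 8           # assumed bit width; inputs are taken from [0, 2**(N-1))


def less_than_witness(x1, x2):
    """Return [B_0, ..., B_{N-2}] satisfying C_<, or None if x1 >= x2."""
    diff = (x1 - x2) % P
    offset = (diff - (P - 2**(N - 1))) % P
    if offset >= 2**(N - 1):
        return None                      # diff is outside the "negative" range
    return [offset & (1 << i) for i in range(N - 1)]   # each B_i is 0 or 2**i


def c_lt_holds(x1, x2, bits):
    """Evaluate every constraint in C_< for the given assignment."""
    for i, b in enumerate(bits):
        if (b * (2**i - b)) % P != 0:
            return False
    return (x1 - x2) % P == (P - 2**(N - 1) + sum(bits)) % P


bits = less_than_witness(3, 10)
```

When X_1 ≥ X_2 the difference falls outside [p − 2^(N−1), p), so no choice of B_i values can satisfy the last constraint.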

program excerpt : constraints
Z_1 ← Z_2 : 0 = Z_1 − Z_2
Z_1 ← Z_2 + Z_3 · Z_4 : 0 = Z_1 − Z_2 − Z_3 · Z_4
Y ← X_1 NAND X_2 : 0 = 1 − X_1 · X_2 − Y
Y ← X_1 OR X_2 : 0 = (1 − X_1) · (X_2 − 1) + 1 − Y
Y ← (X_1 != X_2) : 0 = (X_1 − X_2) · M − Y, 0 = (1 − Y) · (X_1 − X_2)
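A few of these encodings can be sanity-checked directly. The sketch below assumes a prime `P` and uses hypothetical helper names; the auxiliary variable M is chosen as the inverse of X_1 − X_2 when the inputs differ, which is one way a prover could satisfy the first constraint.

```python
# Sanity check of the table's encodings; P and helper names are assumptions.
P = 2**31 - 1  # assumed prime modulus for F_p


def neq_witness(x1, x2):
    """Return (Y, M) satisfying the != constraints for inputs x1, x2."""
    d = (x1 - x2) % P
    if d == 0:
        return 0, 0
    return 1, pow(d, P - 2, P)   # M is the inverse of d, by Fermat's little theorem


def neq_holds(x1, x2, y, m):
    d = (x1 - x2) % P
    return (d * m - y) % P == 0 and ((1 - y) * d) % P == 0


def or_holds(x1, x2, y):
    return ((1 - x1) * (x2 - 1) + 1 - y) % P == 0


def nand_holds(x1, x2, y):
    return (1 - x1 * x2 - y) % P == 0


y, m = neq_witness(23, 187)
```

The second != constraint is what prevents a prover from claiming Y = 0 for unequal inputs, since (1 − Y) · (X_1 − X_2) would then be nonzero.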

The proof vector now encodes the assignment that satisfies the constraints.

[Diagram: w holds the assignment Z_1 = 23, Z_2 = 187, …, for constraints such as … = (Z_1 − Z_2) · M, 0 = Z_3 − Z_4, 0 = Z_3 Z_5 + Z_6 − 5.]

The savings from the change are enormous.

[Diagram repeated for instance j: “f”, x^(j) and y^(j); commit request and commit response on w^(j); query vectors q_1, q_2, q_3, …; response scalars q_1 · w^(j), q_2 · w^(j), …; checks; ACCEPT / REJECT. Check marks (✔) flag the components refined so far.]

We (mostly) eliminate the server’s PCP-based overhead.
- before: # of entries in w quadratic in computation size
- after: # of entries in w linear in computation size

The client and server reap tremendous benefit from this change. Now, the server’s overhead is mainly in the cryptographic machinery and the constraint model itself.

Any computation has a linear PCP whose proof vector is (quasi)linear in the computation size. This resolves a conjecture of Ishai et al. [IKO CCC07]. (Also shown by [BCIOP TCC 13].)

- before: π(·) is built from (z, z ⊗ z), so |w| = |Z|^2; the PCP verifier runs a linearity test, a quad corr. test, and a circuit test
- after [GGPR Eurocrypt 2013]: π(·) is built from (z, h), so |w| = |Z| + |C|; the PCP verifier uses a new quad. test

[Diagram: the client’s PCP verifier sends a commit request and queries q_1, q_2, …, q_u; the server returns a commit response and responses π(q_1), …, π(q_u).]

[Diagram repeated for instance j: “f”, x^(j) and y^(j); commit request and commit response on w^(j); query vectors q_1, q_2, q_3, …; response scalars q_1 · w^(j), q_2 · w^(j), …; checks; ACCEPT / REJECT. Check marks (✔) flag the components refined so far.]

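We strengthen the linear commitment primitive of [IKO CCC07]. This saves orders of magnitude in cryptographic costs.

Per-query commitment:
- the client sends Enc(r_i); the server returns Enc(π(r_i))
- the client sends (q_i, t_i), where t_i = r_i + α_i · q_i; the server returns (π(q_i), π(t_i))
- the client checks: π(t_i) = π(r_i) + α_i · π(q_i) ?

Single commitment covering all queries:
- the client sends Enc(r); the server returns Enc(π(r))
- the client sends (q_1, …, q_u, t), where t = r + α_1 · q_1 + … + α_u · q_u; the server returns (π(q_1), …, π(q_u), π(t))
- the client checks: π(t) = π(r) + α_1 · π(q_1) + … + α_u · π(q_u) ?

The PCP verifier then runs the PCP tests on π(q_1), …, π(q_u).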
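The consistency check with a single t can be simulated without any cryptography. The sketch below is a simplification under stated assumptions: the homomorphic encryption step (Enc) is omitted entirely, `P`, the vector sizes, and all function names are hypothetical, and the "server" is just a Python function π. It shows only that a linear π passes the check, while a π that is not linear is caught.

```python
# Simplified simulation: no encryption, toy sizes, hypothetical names throughout.
import random

P = 2**31 - 1            # assumed prime modulus for F_p
rng = random.Random(0)   # fixed seed so the sketch is reproducible


def inner(q, w):
    return sum(a * b for a, b in zip(q, w)) % P


def consistency_check(pi, n, u):
    """Client side of the batched check against a server function pi."""
    r = [rng.randrange(P) for _ in range(n)]
    pi_r = pi(r)   # in the protocol this travels as Enc(r) -> Enc(pi(r)); omitted here
    qs = [[rng.randrange(P) for _ in range(n)] for _ in range(u)]
    alphas = [rng.randrange(1, P) for _ in range(u)]
    # t = r + alpha_1 * q_1 + ... + alpha_u * q_u, computed entry by entry
    t = [(r[k] + sum(a * q[k] for a, q in zip(alphas, qs))) % P for k in range(n)]
    answers = [pi(q) for q in qs]
    expected = (pi_r + sum(a * ans for a, ans in zip(alphas, answers))) % P
    return pi(t) == expected


w = [4, 8, 15, 16]   # the server's committed vector (toy values)


def honest(q):
    return inner(q, w)


def cheating(q):
    return (inner(q, w) + 1) % P   # not a linear function of q


honest_passes = consistency_check(honest, n=len(w), u=3)
cheater_passes = consistency_check(cheating, n=len(w), u=1)
```

In the real protocol the server never sees r in the clear, which is what stops it from tailoring its answers to t; that protection is exactly the part this sketch leaves out.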

[Diagram repeated for instance j: “f”, x^(j) and y^(j); commit request and commit response on w^(j); query vectors q_1, q_2, q_3, …; response scalars q_1 · w^(j), q_2 · w^(j), …; checks; ACCEPT / REJECT. Check marks (✔) now flag every refined component.]

Our implementation of the server is massively parallel; it is threaded, distributed, and accelerated with GPUs.

Some details of our evaluation platform:
- It uses a cluster at Texas Advanced Computing Center (TACC)
- Each machine runs Linux on an Intel Xeon 2.53 GHz with 48 GB of RAM.

Amortized costs for multiplication of 256×256 matrices:

                | Under the theory, naively applied | Under Zaatar
client CPU time | >100 trillion years               | 1.2 seconds
server CPU time | >100 trillion years               | 1 hour

However, this assumes a (fairly large) batch.

1. What are the cross-over points?
2. What is the server’s overhead versus native execution?
3. At the cross-over points, what is the server’s latency?

The cross-over point is high but not totally ridiculous.

[Figure: verification cost (minutes of CPU time) versus instances of 150 × 150 matrix multiplication. Lines: native (slope: 50 ms/inst) and Zaatar (slope: 33 ms/inst).]
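The cross-over point follows from a simple linear cost model: the client pays a fixed setup cost once per batch plus a per-instance verification cost, and breaks even when that total drops below the cost of native execution. The sketch below assumes this linear model. The setup cost is NOT given in the transcript; the value used is hypothetical, picked so that the result lands on the 25,000-instance cross-over listed for matrix multiplication (m=150) in the results table.

```python
# Worked arithmetic under an assumed linear cost model; the setup cost is hypothetical.
import math


def cross_over(setup_ms, native_ms_per_inst, verify_ms_per_inst):
    """Smallest batch size at which verifying costs no more than native execution."""
    saving = native_ms_per_inst - verify_ms_per_inst
    if saving <= 0:
        return None   # verification never pays off
    return math.ceil(setup_ms / saving)


HYPOTHETICAL_SETUP_MS = 425_000   # assumption, not a number from the talk

batch = cross_over(HYPOTHETICAL_SETUP_MS, native_ms_per_inst=50, verify_ms_per_inst=33)
native_minutes_at_cross_over = batch * 50 / 60_000
```

Under these assumptions the native cost at the cross-over is about 20.8 minutes of CPU time, in line with the roughly 21 minutes of client CPU reported for that benchmark.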

The server’s costs are unfortunately very high.

(1) If verification work is performed on a CPU

           | mat. mult. (m=150) | Floyd-Warshall (m=25) | root finding (m=256, L=8) | PAM clustering (m=20, d=128)
cross-over | 25,000 inst.       | 43,000 inst.          | 210 inst.                 | 22,000 inst.
client CPU | 21 mins.           | 5.9 mins.             | 2.7 mins.                 | 4.5 mins.
server CPU | 12 months          | 8.9 months            | 22 hours                  | 4.2 months

(2) If we had free crypto hardware for verification …

           | mat. mult. (m=150) | Floyd-Warshall (m=25) | root finding (m=256, L=8) | PAM clustering (m=20, d=128)
cross-over | 4,900 inst.        | 8,000 inst.           | 40 inst.                  | 5,000 inst.
client CPU | 4 mins.            | 1.1 mins.             | 31 secs.                  | 61 secs.
server CPU | 2 months           | 1.7 months            | 4.2 hours                 | 29 days

Parallelizing the server results in near-linear speedup.

[Figure: server speedup with 4 cores, 20 cores, 60 cores, and 60 cores (ideal) for matrix mult. (m=150), Floyd-Warshall (m=25), root finding (m=256, L=8), and PAM clustering (m=20, d=128).]

Zaatar is encouraging, but it has limitations:
(1) The server’s burden is too high, still.
(2) The client requires batching to break even.
(3) The computational model is stateless (and does not allow external inputs or outputs!).

(1) Zaatar: a PCP-based efficient argument [NDSS 12, SECURITY 12, EUROSYS 13]
(2) Pantry: extending verifiability to stateful computations [SOSP 13]
(3) Landscape and outlook

Pantry creates verifiability for real-world computations

before: C sends F, x to S; S returns y
- C supplies all inputs
- F is pure (no side effects)
- All outputs are shipped back

after:
- C sends query, digest to S, which holds the DB; S returns result
- C sends F, x to S, which has RAM; S returns y
- C sends map(), reduce(), input filenames to servers S_i; they return output filenames

[Diagram repeated for instance j: “f”, x^(j) and y^(j); commit request and commit response on w^(j); query vectors q_1, q_2, q_3, …; response scalars q_1 · w^(j), q_2 · w^(j), …; checks; ACCEPT / REJECT.]

The compiler pipeline decomposes into two phases.

Phase 1: F(){ [subset of C] } is compiled to constraints (E), for example 0 = X + Z_1, 0 = Y + Z_2, 0 = Z_1 Z_3 − Z_2, ….
“If E(X=x, Y=y) is satisfiable, computation is done right.”

Phase 2: the client sends F, x; the server returns y and argues, via the arith. circuit or GGPR encoding, that “E(X=x, Y=y) has a satisfying assignment”.

Design question: what can we put in the constraints so that satisfiability implies correct storage interaction?

How can we represent storage operations?

Representing “load(addr)” explicitly would be horrifically expensive.

Straw man: variables M_0, …, M_size contain state of memory. Then B = load(A) becomes:

B = M_0 + (A − 0) · F_0
B = M_1 + (A − 1) · F_1
B = M_2 + (A − 2) · F_2
…
B = M_size + (A − size) · F_size

Requires two variables for every possible memory address!
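The straw man can be made concrete with a toy memory. In the sketch below, `P`, the memory contents, and the helper names are assumptions for illustration. For the address actually being read, the factor (A − i) is zero, so the constraint forces B = M_A; for every other address the prover picks F_i to absorb the difference.

```python
# Toy version of the straw-man encoding; P, names, and values are assumptions.
P = 2**31 - 1  # assumed prime modulus for F_p


def load_witness(mem, a):
    """Return (B, [F_0, ..., F_size]) satisfying B = M_i + (A - i) * F_i for all i."""
    b = mem[a] % P
    f = []
    for i, m_i in enumerate(mem):
        if i == a:
            f.append(0)   # (A - i) is zero here, so F_i is unconstrained
        else:
            inv = pow((a - i) % P, P - 2, P)      # inverse of (A - i) in F_p
            f.append(((b - m_i) * inv) % P)
    return b, f


def load_constraints_hold(mem, a, b, f):
    return all((m_i + (a - i) * f_i - b) % P == 0
               for i, (m_i, f_i) in enumerate(zip(mem, f)))


mem = [10, 20, 30, 40]
b, f = load_witness(mem, 2)
variables_needed = len(mem) + len(f)
```

Every load touches every M_i and needs a fresh F_i per address, which is why the cost scales with the size of memory rather than with the work done.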

How can we represent storage operations? (2)

Consider content hash blocks: [Merkle CRYPTO 87, Fu et al. OSDI 00, Mazières & Shasha PODC 02, Li et al. OSDI 04]
- They bind references to values
- They provide a substrate for verifiable RAM, file systems, …

[Diagram: the client holds a digest; the server supplies a block; check: hash(block) = digest ?]

Key idea: encode the hash checks in constraints
- This can be done (reasonably) efficiently

Folklore: “this should be doable.” (Pantry’s contribution: “it is.”)

We augment the subset of C with the semantics of untrusted storage
- block = vget(digest): retrieves block that must hash to digest
- hash(block) = vput(block): stores block; names it with its hash

Example program: add_indirect(digest d, value x) { value z = vget(d); y = z + x; return y; }
Resulting constraints: d = hash(Z), y = Z + x

Server is obliged to supply the “correct” Z (meaning something that hashes to d).
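The semantics of vput and vget can be mimicked outside the constraint setting. The following sketch is only an analogy: it runs the hash check in ordinary Python rather than inside constraints, it assumes SHA-256 as the hash (the slides do not name one), and it models the untrusted server as a plain dictionary.

```python
# Analogy only: SHA-256 and the dict-based store are assumptions, not Pantry's design.
import hashlib


def vput(store, block: bytes) -> str:
    """Store a block and name it with its hash."""
    digest = hashlib.sha256(block).hexdigest()
    store[digest] = block
    return digest


def vget(store, digest: str) -> bytes:
    """Retrieve a block; reject it unless it hashes to the requested digest."""
    block = store[digest]
    if hashlib.sha256(block).hexdigest() != digest:
        raise ValueError("block does not hash to digest")
    return block


def add_indirect(store, d: str, x: int) -> int:
    z = int.from_bytes(vget(store, d), "big")
    return z + x


store = {}                                   # stands in for the untrusted server
d = vput(store, (40).to_bytes(8, "big"))
result = add_indirect(store, d, 2)
```

If the store later returns different bytes under the same name, vget raises an error, which mirrors the obligation on the server to supply a Z that hashes to d.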

Putting the pieces together

[Diagram: layers of the pipeline: C with RAM, search trees, and map(), reduce() [Merkle 87]; subset of C + {vput, vget}; constraints (E); circuit; QAP. The client sends F, x; the server returns y.]

- recall: “I know a satisfying assignment to E(X=x, Y=y)”
- satisfying assignment identified implies checks-of-hashes pass
- checks-of-hashes pass implies storage interaction is correct
- storage abstractions can be built from {vput(), vget()}
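As one example of a storage abstraction built only from vput() and vget(), here is a sketch of a read-only RAM backed by a Merkle tree. It is an illustration under assumptions: SHA-256 as the hash, a dictionary as the untrusted store, a power-of-two number of cells, and hypothetical names `build` and `load`. The client needs to remember only the root digest.

```python
# Illustrative Merkle-tree RAM on top of vput/vget; hash choice and names are assumed.
import hashlib

store = {}   # stands in for the untrusted server's block store


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def vput(block: bytes) -> bytes:
    digest = h(block)
    store[digest] = block
    return digest


def vget(digest: bytes) -> bytes:
    block = store[digest]
    if h(block) != digest:
        raise ValueError("block does not hash to digest")
    return block


def build(cells):
    """Store a binary Merkle tree over the cells; return the root digest."""
    level = [vput(b"L" + cell) for cell in cells]          # leaf blocks
    while len(level) > 1:
        level = [vput(b"N" + level[i] + level[i + 1])      # interior blocks
                 for i in range(0, len(level), 2)]
    return level[0]


def load(root: bytes, addr: int, depth: int) -> bytes:
    """Walk from the root to the leaf at addr, checking a hash at every step."""
    digest = root
    for bit in range(depth - 1, -1, -1):
        children = vget(digest)[1:]                        # two 32-byte digests
        digest = children[32:] if (addr >> bit) & 1 else children[:32]
    return vget(digest)[1:]


root = build([b"a", b"b", b"c", b"d"])
value = load(root, addr=2, depth=2)
```

Each load performs depth + 1 hash checks, so the cost grows with the logarithm of the memory size instead of with the memory size itself.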

How can we represent storage operations? (Hint: consider content hash blocks: blocks named by a cryptographic hash, or digest, of their contents.) Srinath will tell you how.

The client is assured that a MapReduce job was performed correctly, without ever touching the data.

[Diagram: the client sends map(), reduce(), in_digests to the mappers M_i and reducers R_i, and receives out_digests.]

The two phases are handled separately:

mappers: in = vget(in_digest); out = map(in); for r=1,…,R: d[r] = vput(out[r])

reducers: for m=1,…,M: in[m] = vget(e[m]); out = reduce(in); out_digest = vput(out);
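The data flow through digests can be followed in a small simulation. The sketch below is not Pantry itself: it shows only how mappers and reducers pass digests instead of data, with SHA-256, JSON-encoded blocks, a dictionary store, and a toy map/reduce pair all chosen as assumptions. In the real system each mapper and reducer run would itself be a verified computation.

```python
# Data-flow sketch only; hash, encoding, and the toy map/reduce are assumptions.
import hashlib
import json

store = {}   # stands in for untrusted storage shared by mappers and reducers


def vput(obj) -> str:
    block = json.dumps(obj).encode()
    digest = hashlib.sha256(block).hexdigest()
    store[digest] = block
    return digest


def vget(digest: str):
    block = store[digest]
    if hashlib.sha256(block).hexdigest() != digest:
        raise ValueError("block does not hash to digest")
    return json.loads(block)


def run_mapper(in_digest, map_fn, R):
    out = map_fn(vget(in_digest), R)          # out[r] is the partition for reducer r
    return [vput(out[r]) for r in range(R)]


def run_reducer(e, reduce_fn):
    ins = [vget(d) for d in e]                # one input block per mapper
    return vput(reduce_fn(ins))


def map_fn(nums, R):                          # toy map: partition numbers by n mod R
    parts = [[] for _ in range(R)]
    for n in nums:
        parts[n % R].append(n)
    return parts


def reduce_fn(ins):                           # toy reduce: sum everything received
    return sum(sum(part) for part in ins)


R = 2
in_digests = [vput([1, 2, 3]), vput([4, 5, 6])]
d = [run_mapper(x, map_fn, R) for x in in_digests]
out_digests = [run_reducer([d[m][r] for m in range(len(d))], reduce_fn)
               for r in range(R)]
results = [vget(x) for x in out_digests]
```

The client in this picture handles only in_digests and out_digests; the intermediate digests d[m][r] are what tie each reducer's input to a specific mapper's output.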

Example: for a DNA subsequence search, the client saves work, relative to performing the computation locally.
- A mapper gets 600k nucleotides and outputs matching locations
- One reducer per 10 mappers
- The graph is an extrapolation

[Figure: CPU time (minutes) versus number of nucleotides in the input dataset (billions), for Pantry and the baseline.]
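The job shape implied by these parameters is easy to work out. The sketch below uses the two numbers from the slide (600k nucleotides per mapper, one reducer per 10 mappers); the dataset size of one billion nucleotides is a hypothetical input chosen for the example, and the function name is an assumption.

```python
# Worked arithmetic; the 1-billion-nucleotide input is a hypothetical example.
NUCLEOTIDES_PER_MAPPER = 600_000
MAPPERS_PER_REDUCER = 10


def job_shape(total_nucleotides):
    """Return (mappers, reducers) needed for a dataset of the given size."""
    mappers = -(-total_nucleotides // NUCLEOTIDES_PER_MAPPER)   # ceiling division
    reducers = -(-mappers // MAPPERS_PER_REDUCER)
    return mappers, reducers


mappers, reducers = job_shape(1_000_000_000)
```

Under these assumptions a one-billion-nucleotide dataset needs 1,667 mappers and 167 reducers.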

Pantry applies fairly widely
- Our implementation works with Zaatar and Pinocchio [Parno et al. OAKLAND 13]
- Our implemented applications include:
  - Verifiable queries in (highly restricted) subset of SQL [Diagram: the client sends query, digest; the server, which holds the DB, returns result]
  - Privacy-preserving facial recognition

(1) Zaatar: a PCP-based efficient argument [NDSS 12, SECURITY 12, EUROSYS 13]
(2) Pantry: extending verifiability to stateful computations [SOSP 13]
(3) Landscape and outlook

We describe the landscape in terms of our three goals.

Gives up being unconditional or general-purpose:
- Replication [Castro & Liskov TOCS02], trusted hardware [Chiesa & Tromer ICS10, Sadeghi et al. TRUST10], auditing [Haeberlen et al. SOSP07, Monrose et al. NDSS99]
- Special-purpose [Freivalds MFCS79, Golle & Mironov RSA01, Sion VLDB05, Michalakis et al. NSDI 07, Benabbas et al. CRYPTO11, Boneh & Freeman EUROCRYPT11]

Unconditional and general-purpose but not geared toward practice:
- Use fully homomorphic encryption [Gennaro et al., Chung et al. CRYPTO10]
- Proof-based verifiable computation [GMR85, Ben-Or et al. STOC88, BFLS91, Kilian STOC92, ALMSS92, AS92, GKR STOC08, Ben-Sasson et al. STOC13, Bitansky et al. STOC13, Bitansky et al. ITCS12]

Experimental results are now available from four projects.
- Pepper, Ginger, Zaatar, Allspice, Pantry: HOTOS 11, NDSS 12, SECURITY 12, EUROSYS 13, OAKLAND 13, SOSP 13
- CMT, Thaler: CMT ITCS 12, Thaler et al. HOTCLOUD 12, Thaler CRYPTO 13
- Pinocchio: GGPR EUROCRYPT 13, Parno et al. OAKLAND 13
- BCGTV: BCGTV CRYPTO 13, BCGT ITCS 13, BCIOP TCC 13

A key trade-off is performance versus expressiveness.

Applicable computations (columns, toward more expressive): “regular”; straightline; pure, no RAM; stateful, RAM; general loops.

Systems by setup costs (rows, toward better crypto properties: ZK, non-interactive, etc.):
- none (fast prover): Thaler [CRYPTO 13]
- none: CMT, TRMP [ITCS, Hotcloud 12]
- low: Allspice [Oakland13]
- medium: Pepper [NDSS 12], Ginger [Security12], Zaatar [Eurosys13], Pantry [SOSP 13]
- high: Pinocchio [Oakland13], Pantry [SOSP 13]
- very high: BCGTV [CRYPTO 13] (listed under two columns)

Axis annotations: lower cost, less crypto (better); more expressive (better).

Quick performance comparison
- Data are from our re-implementations and match or exceed published results.
- All experiments are run on the same machines (2.7 GHz, 32 GB RAM). Average of 3 runs (experimental variation is minor).
- Benchmarks: 150×150 matrix multiplication and clustering algorithm

The cross-over points can sometimes improve, at the cost of expressiveness.

The server’s costs are high across the board.

Summary of performance in this area
- None of the systems is at true practicality
- Server’s costs still a disaster (though lots of progress)
- Client approaches practicality, at the cost of generality
  - Otherwise, there are setup costs that must be amortized
- (We focused on CPU; network costs are similar.)

Research questions:
- Can we design more efficient constraints or circuits?
- Can we apply cryptographic and complexity-theoretic machinery that does not require a setup cost?
- Can we provide comprehensive secrecy guarantees?
- Can we extend the machinery to handle multi-user databases (and a system of real scale)?

Summary and take-aways
- We have reduced the costs of a PCP-based argument system [Ishai et al., CCC07] by 20 orders of magnitude
- We broaden the computational model, handle stateful computations (MapReduce, etc.), and include a compiler
- There is a lot of exciting activity in this research area
- This is a great research opportunity:
  - There are still lots of problems (prover overhead, setup costs, the computational model)
  - The potential is large, and goes far beyond cloud computing

Appendix Slides

PERFORMANCE COMPARISON

Experimental setup and ground rules
- A system is included iff it has published experimental results.
- Data are from our re-implementations and match or exceed published results.
- All experiments are run on the same machines (2.7 GHz, 32 GB RAM). Average of 3 runs (experimental variation is minor).
- For a few systems, we extrapolate from detailed microbenchmarks
- Measured systems:
  - General-purpose: IKO, Pepper, Ginger, Zaatar, Pinocchio
  - Special-purpose: CMT, Pepper-tailored, Ginger-tailored, Allspice
- Benchmarks: 150×150 matrix multiplication and clustering algorithm (others in our papers)

Verification cost sometimes beats (unoptimized) native execution.

[Figure: verification cost (ms of CPU time) for 150×150 matrix multiplication, comparing Ishai et al. (PCP-based efficient argument), Pepper, Ginger, Zaatar, and Pinocchio; reference marks at 50 ms and 5 ms.]

Some of the general-purpose protocols have reasonable cross-over points.

[Figure: cross-over points in instances of 150×150 matrix multiplication; one point is annotated 1.6 days.]

The cross-over points can sometimes improve with special-purpose protocols.

[Figure: cross-over point for matrix multiplication (m=150) and PAM clustering (m=20, d=128), comparing Ginger, Zaatar, Pinocchio, Ginger-tailored, Allspice, and CMT; some entries are N/A.]

When Allspice is applicable, it has low cross-over points.

[Figure: cross-over points of Zaatar and Allspice for mat. mult. (m=128), poly. eval (m=512), and root finding (m=256, L=8).]

But, of our benchmarks, CMT-improved does not apply to:
- PAM clustering
- Longest common subsequence
- Floyd-Warshall

The server’s costs are pretty much preposterous.

[Figure: server’s cost normalized to native C, for matrix multiplication (m=150) and PAM clustering (m=20, d=128), comparing native C, Pepper, Ginger, Pinocchio, Zaatar, Ginger-tailored, Pepper-tailored, Allspice, and CMT; some entries are N/A.]

The server’s costs are pretty much preposterous.

[Figure: server’s cost normalized to native C, for matrix multiplication (m=150) and PAM clustering (m=20, d=128), comparing native C, IKO, Pepper, Ginger, Pinocchio, Zaatar, Ginger-tailored, Pepper-tailored, Allspice, and CMT; some entries are N/A.]