+ Accelerating Fully Homomorphic Encryption on GPUs Wei Wang, Yin Hu, Lianmu Chen, Xinming Huang, Berk Sunar ECE Dept., Worcester Polytechnic Institute.

Presentation transcript:

+ Accelerating Fully Homomorphic Encryption on GPUs Wei Wang, Yin Hu, Lianmu Chen, Xinming Huang, Berk Sunar ECE Dept., Worcester Polytechnic Institute

+ Fully Homomorphic Encryption: Introduced by Gentry in 2009. Powerful: arbitrary-depth circuits are evaluated on fixed-size ciphertexts. Impractical, for now: very slow (~30 sec for re-encryption) and large public keys (hundreds of MB). Lampson (CryptDB): “I don’t think we’ll see anyone using Gentry’s solution in our lifetimes.” (Forbes, Dec 2011)

+ If history teaches us anything: RSA was introduced in 1978. The Intel 8086 was introduced (4-10 MHz). A 1024-bit RSA encryption would have taken at least 10 minutes (est.). An RSA circuit was laid out on an MIT basketball court (Shamir & Rivest).

+ Today: RSA is used in >90% of secure connections (Intel whitepaper). It runs in hundreds of msec on cell phones. Moore’s Law and algorithmic improvements! Question: Can we expect the same for FHE?

+ What is FHE?
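
As a rough illustration of what "homomorphic" means, the sketch below uses a toy symmetric scheme over the integers. It is not the Gentry-Halevi scheme discussed in these slides, it is only somewhat homomorphic, and it is insecure at these sizes. The names `P`, `encrypt`, and `decrypt`, and all parameter sizes, are assumptions chosen for the example.

```python
import secrets

# Hypothetical secret key: any large odd integer works for this toy scheme.
P = (1 << 61) - 1

def encrypt(bit, p=P):
    """Hide a bit as bit + 2*noise + p*q (toy scheme, not Gentry-Halevi)."""
    noise = secrets.randbelow(1 << 8)       # small random noise
    q = secrets.randbelow(1 << 40) + 1      # random multiple of the key
    return bit + 2 * noise + p * q

def decrypt(c, p=P):
    """Strip the multiple of p, then the even noise, leaving the bit."""
    return (c % p) % 2

a, b = 1, 0
ca, cb = encrypt(a), encrypt(b)
xor_bit = decrypt(ca + cb)   # adding ciphertexts XORs the plaintext bits
and_bit = decrypt(ca * cb)   # multiplying ciphertexts ANDs the plaintext bits
```

Each operation grows the noise term, and decryption fails once the noise exceeds `P`. A fully homomorphic scheme needs a way to reset that noise, which is the role of the expensive recryption step in the timings reported here.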

+ The Gentry-Halevi FHE Scheme

+

+ Parameters of Gentry’s Homomorphic Scheme
Dimension d    Encrypt    Decrypt     Recrypt
…              … sec      … sec       …
…              … sec      0.02 sec    32 sec
…              … sec      0.13 sec    2.8 min
…              … min      0.66 sec    31 min
Gentry’s implementation ran on an IBM System x3500 server with a 64-bit quad-core Intel Xeon E5450 processor at 3 GHz, 12 MB of L2 cache, and 24 GB of RAM.

+ CPU vs. GPU Hardware: GPUs are ideal for FHE. They offer multiple ALUs, fast onboard memory, and high throughput on parallel tasks.

+ Fast Multiplications on GPUs

+ Size in K bits    CPU       GPU
1024 x …            … ms      0.765 ms
2048 x …            … ms      1.483 ms
4096 x …            … ms      3.201 ms
CPU: Intel Xeon X5650 processor running at 2.67 GHz with 24 GB RAM, built with NTL/GMP.
GPU: NVIDIA Tesla C2050, 448 CUDA cores, 1.15 GHz, 3 GB GDDR5* memory.

+ Modular Multiplication

+ GPU Implementation of FHE: The Decrypt process. The most computation-intensive part is the large-number modular multiplication. Applying the FFT-based Strassen algorithm and Barrett reduction results in a significant speedup.
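
The combination named on this slide, a transform-based multiplication followed by Barrett reduction, can be sketched on the CPU as follows. This is a minimal illustration, not the authors' GPU code. It swaps in an exact number-theoretic transform (NTT) for a floating-point FFT, and the prime 998244353, the base-256 digit split, and every function name are assumptions made for the example.

```python
MOD = 998244353   # NTT-friendly prime 119 * 2**23 + 1 (assumed for this sketch)
ROOT = 3          # a primitive root modulo MOD

def ntt(a, invert=False):
    """In-place iterative number-theoretic transform; len(a) is a power of two."""
    n = len(a)
    j = 0
    for i in range(1, n):                    # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:                       # butterfly passes
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u, v = a[k], a[k + half] * w % MOD
                a[k], a[k + half] = (u + v) % MOD, (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        for i in range(n):
            a[i] = a[i] * n_inv % MOD

def to_digits(x):
    """Little-endian base-256 digits of a non-negative integer."""
    return list(x.to_bytes((x.bit_length() + 7) // 8 or 1, "little"))

def fft_multiply(x, y):
    """Multiply two non-negative integers by convolving their digits."""
    a, b = to_digits(x), to_digits(y)
    n = 1
    while n < len(a) + len(b):
        n <<= 1
    # Every convolution coefficient must stay below MOD to be exact.
    assert n * 255 * 255 < MOD, "operands too large for this sketch"
    a += [0] * (n - len(a))
    b += [0] * (n - len(b))
    ntt(a)
    ntt(b)
    c = [u * v % MOD for u, v in zip(a, b)]
    ntt(c, invert=True)
    result = 0
    for coeff in reversed(c):                # carry propagation in base 256
        result = (result << 8) + coeff
    return result

def barrett_setup(m):
    """Precompute k = bit length of m and mu = floor(2**(2k) / m)."""
    k = m.bit_length()
    return k, (1 << (2 * k)) // m

def barrett_reduce(x, m, k, mu):
    """Return x mod m for 0 <= x < m*m using two multiplications and shifts."""
    q = fft_multiply(x >> (k - 1), mu) >> (k + 1)   # never above the true quotient
    r = x - fft_multiply(q, m)
    while r >= m:                                   # at most two corrections
        r -= m
    return r

def mod_mul(a, b, m, k, mu):
    """Modular multiplication: transform-based product, then Barrett reduction."""
    return barrett_reduce(fft_multiply(a, b), m, k, mu)
```

With these parameters the sketch handles operands whose combined length is at most 8192 bytes. The appeal for a GPU is that the butterfly passes and the pointwise products are independent across elements, and that Barrett reduction replaces a long division by a fixed modulus with multiplications and shifts.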

+ GPU Implementation of FHE

+ Performance
FHE Primitives    CPU          GPU         Speedup
Encryption        1.69 sec     0.22 sec    x7.7
Decryption        18.5 msec    2.5 msec    x7.5
Recryption        … sec        4.2 sec     x6.6
CPU platform: Intel Xeon X5650 processor running at 2.67 GHz with 24 GB RAM, built with NTL/GMP.
GPU platform: NVIDIA Tesla C2050, 448 CUDA cores, 1.15 GHz, 3 GB GDDR5* memory.
*Based on small setting (dimension n=2048).
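
The speedup column can be checked with a few lines of arithmetic. The sketch below assumes both encryption timings are in seconds, which is what the reported x7.7 implies. The recryption CPU time is not legible in the transcript, so the value computed for it is only what the reported speedup and GPU time imply, not a measured figure. All names are hypothetical.

```python
def speedup(cpu_seconds, gpu_seconds):
    """Speedup factor of the GPU run over the CPU run."""
    return cpu_seconds / gpu_seconds

# Timings as read from the slide (assumed units: seconds).
encryption = speedup(1.69, 0.22)        # about 7.68, reported as x7.7
decryption = speedup(18.5e-3, 2.5e-3)   # about 7.4, reported as x7.5

# CPU recryption time implied by the reported x6.6 and the 4.2 sec GPU time.
implied_recrypt_cpu = 6.6 * 4.2         # about 27.7 sec (inferred, not measured)
```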

+ Thanks!