Implementation of the RSA Algorithm on a Dataflow Architecture

Slides:



Advertisements
Similar presentations
Acceleration of Cooley-Tukey algorithm using Maxeler machine
Advertisements

Fourier Transform Fourier transform decomposes a signal into its frequency components Used in telecommunications, data compression, digital signal processing,
CDA 3101 Fall 2010 Discussion Section 08 CPU Performance
School of Engineering & Technology Computer Architecture Pipeline.
TIE Extensions for Cryptographic Acceleration Charles-Henri Gros Alan Keefer Ankur Singla.
Fast Modular Reduction
Cryptography and Network Security Chapter 5 Fifth Edition by William Stallings Lecture slides by Lawrie Brown.
22C:19 Discrete Structures Integers and Modular Arithmetic
Capstone Project Presentation A Tool for Cryptography Problem Generation CSc 499 Mark Weston Winter 2006.
22C:19 Discrete Math Integers and Modular Arithmetic Fall 2010 Sukumar Ghosh.
1 EFFICIENT ADDERS TO SPEEDUP MODULAR MULTIPLICATION FOR CRYPTOGRAPHY Adnan Gutub Hassan Tahhan Computer Engineering Department KFUPM, Dhahran, SAUDI ARABIA.
Abdullah Sheneamer CS591-F2010 Project of semester Presentation University of Colorado, Colorado Springs Dr. Edward RSA Problem and Inside PK Cryptography.
Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow Wilson W. L. Fung Ivan Sham George Yuan Tor M. Aamodt Electrical and Computer Engineering.
1. RSA basics 2. Key generation 3. What it would take to break RSA
Lecture 2 Aug 28 goals: Introduction to recursion examples of recursive programs.
CS 584. Logic The art of thinking and reasoning in strict accordance with the limitations and incapacities of the human misunderstanding. The basis of.
Design of a Reconfigurable Hardware For Efficient Implementation of Secret Key and Public Key Cryptography.
VLSI Design Spring03 UCSC By Prof Scott Wakefield Final Project By Shaoming Ding Jun Hu
An Expandable Montgomery Modular Multiplication Processor Adnan Abdul-Aziz GutubAlaaeldin A. M. Amin Computer Engineering Department King Fahd University.
Technion – Israel Institute of Technology Department of Electrical Engineering High Speed Digital Systems Lab Written by: Haim Natan Benny Pano Supervisor:
IHP Im Technologiepark Frankfurt (Oder) Germany IHP Im Technologiepark Frankfurt (Oder) Germany ©
M. Interleaving Montgomery High-Radix Comparison Improvement Adders CLA CSK Comparison Conclusion Improving Cryptographic Architectures by Adopting Efficient.
CSE 246: Computer Arithmetic Algorithms and Hardware Design Numbers: RNS, DBNS, Montgomory Prof Chung-Kuan Cheng Lecture 3.
Technion – Israel Institute of Technology Department of Electrical Engineering High Speed Digital Systems Lab Written by: Haim Natan Benny Pano Supervisor:
1 Montgomery Multiplication David Harris and Kyle Kelley Harvey Mudd College Claremont, CA {David_Harris,
 Introduction  Requirements for RSA  Ingredients for RSA  RSA Algorithm  RSA Example  Problems on RSA.
Montgomery multiplication Algorithm Mohammad Farmani Under supervision of : Dr. S. Bayat-sarmadi 2 nd. Semister, Sharif University of Technology.
Digital signature using MD5 algorithm Hardware Acceleration
1 AN EFFICIENT METHOD FOR FACTORING RABIN SCHEME SATTAR J ABOUD 1, 2 MAMOUN S. AL RABABAA and MOHAMMAD A AL-FAYOUMI 1 1 Middle East University for Graduate.
CS 312: Algorithm Analysis Lecture #3: Algorithms for Modular Arithmetic, Modular Exponentiation This work is licensed under a Creative Commons Attribution-Share.
1/18 Lattice Boltzmann for Blood Flow: A Software Engineering Approach for a DataFlow SuperComputer Nenad Korolija, Tijana Djukic,
Public Key Encryption and the RSA Public Key Algorithm CSCI 5857: Encoding and Encryption.
Chapter 8 Problems Prof. Sin-Min Lee Department of Mathematics and Computer Science.
Implementing RSA Encryption in Java
MA/CSSE 473 Day 10 Primality testing summary Data Encryption RSA.
Greatest Common Divisor Jordi Cortadella Department of Computer Science.
Some Perspectives on Smart Card Cryptography
1. 2 Table 4.1 Key characteristics of six passenger aircraft: all figures are approximate; some relate to a specific model/configuration of the aircraft.
Acceleration of the SAT Problem
Lecture 11, Advance Digital Design
Elliptic Curve Cryptography
Implementing algorithms for advanced communication systems -- My bag of tricks Sridhar Rajagopal Electrical and Computer Engineering This work is supported.
Ch1 - Algorithms with numbers Basic arithmetic Basic arithmetic Addition Addition Multiplication Multiplication Division Division Modular arithmetic Modular.
Implementation Issues for Public Key Algorithms
Lecture5 – Introduction to Cryptography 3/ Implementation Rice ELEC 528/ COMP 538 Farinaz Koushanfar Spring 2009.
Introduction Gray code is a binary numeral system where two successive values differ in only one bit (binary digit). Today, Gray codes are widely used.
Fermat’s Little Theorem The RSA Cryptosystem will require exponentiation to decrypt messages. Exponentiation Notation Example 1: Compute Exponentials Example.
Miloš Kotlar 2012/115 Single Layer Perceptron Linear Classifier.
Implementation of Public Key Encryption Algorithms
Application of Addition Algorithms Joe Cavallaro.
Introduction to Elliptic Curve Cryptography CSCI 5857: Encoding and Encryption.
CSCI-256 Data Structures & Algorithm Analysis Lecture Note: Some slides by Kevin Wayne. Copyright © 2005 Pearson-Addison Wesley. All rights reserved. 16.
Breaking Cryptosystems Joshua Langford University of Texas at Tyler Fall 2007 Advisor: Dr. Ramona Ranalli Alger.
Copyright © Zeph Grunschlag, RSA Encryption Zeph Grunschlag.
1 The RSA Algorithm Rocky K. C. Chang February 23, 2007.
Classification of parallel computers Limitations of parallel processing.
Efficient Montgomery Modular Multiplication Algorithm Using Complement and Partition Techniques Speaker: Te-Jen Chang.
Performance Lecture notes from MKP, H. H. Lee and S. Yalamanchili.
Gray Codes.
Fast Truncated Multiplication for Cryptographic Applications
Number Theory (Chapter 7)
IMPLEMENTATION OF MULTIPLE-PRECISION MODULAR MULTIPLICATION ON GPU
Lecture 20 Guest lecturer: Neal Gupta
STUDY AND IMPLEMENTATION
EFFICIENT ADDERS TO SPEEDUP MODULAR MULTIPLICATION FOR CRYPTOGRAPHY
Binary Math Basic operations.
Reading: Study Chapter (including Booth coding)
Montek Singh Mon, Mar 28, 2011 Lecture 11
Cryptography Lecture 16.
RSA Cryptosystem 電機四 B 游志強 2019/8/25.
Presentation transcript:

Implementation of the RSA Algorithm on a Dataflow Architecture Nikola Bežanić nbezanic@gmail.com Advisors: Veljko Milutinović, Jelena Popović-Božović, and Ivan Popović School of Electrical Engineering, University of Belgrade, 2013.

Introduction Case study area: Public key cryptography acceleration Problem: RSA implementation on Maxeler Existing problem solutions: None Summary: Under review (IPSI) Approach: Accelerate multiplications Analyze usability Conclusions: Multiplication speedup: 70% (28% total) Usability: Picture encryption 2/10

RSA ------------------------------------ bits 1 bit Montgomery method: n -> r r=2sw -> power of 2 Montgomery product (MonPro): modulo r arithmetic ------------------------------------ ms-1 . . . m1 m0 bits function ModExp(M, e, n) {n is odd} Step 1. Compute n’. Step 2. Mm := M ∙ r mod n Step 3. xm := 1 ∙ r mod n Step 4. for i = k – 1 down to 0 do Step 5. xm := MonPro(xm, xm) Step 6. if ei = 1 then xm := MonPro(Mm, xm) Step 7. x := MonPro(xm, 1) Step 8. return x ek-1 . . . e1 e0 1 bit 3/10

Montgomery product function MonPro(a, b) Step 1. t := a ∙ b Step 2. m := t ∙ n’ mod r Step 3. u := (t + m ∙ n) / r Step 4. if u ≥ n then return u – n else return u a and b are big numbers Breaking them to digits: bs-1…b1b0 as-1…a1a0 Processing on a word basis for i = 0 to s-1 C := 0 for j = 0 to s-1 (C, S) := t[i + j] + a[j]∙b[i] + C t[i + j] := S t[i + S] := C 4/10

Montgomery product: Step 1 for i = 0 to s-1 C := 0 for j = 0 to s-1 (C, S) := t[i + j] + a[j]∙b[i] + C t[i + j] := S t[i + S] := C a[j] b[i] X CPU Product: hi low 32 bits 5/10

Dataflow multiplier CPU Dataflow engine (DFE) a[j] b[i] X hi low Product: hi low 32 bits Stream a Constant b0 Dataflow engine (DFE) X Stream y Stream x 6/10

Dataflow multiplier: Pipeline problem Next iteration (next constant b1) => new DFE run New DFE run => new pipeline fill-up overhead 1024-bits key requires only 32 digits (32 bits each) Not enough to fill-up the pipeline Result: CPU time < DFE time ! Solution: Work on blocks of data Do not use constants, rather use a stream Stream has redundant values: acts as a const. b0 b1 a0 a2 a1 a3 a2 a3 a0 a1 X Stream y Stream x 7/10

Dataflow multiplier: Blocks of data b0 x a < = > Block 0 Block 1 Big streams for each run Block z-1 8/10

Results Using blocks pipeline is full Using one multiplier speed up is 10% for RSA Speedup is 70% for multiplication using 4 multiplers It leads to 28% for complete RSA (Amdahl’s law) Future work Deal with carry at DFE or Overlap carry propagation at CPU and multiplication at DFE 9/10

The End function ModExp(M, e, n) { n is odd } Step 1. Compute n’. Step 2. Mm := M ∙ r mod n Step 3. xm := 1 ∙ r mod n Step 4. for i = k – 1 down to 0 do Step 5. xm := MonPro(xm, xm) Step 6. if ei = 1 then xm := MonPro(Mm, xm) Step 7. x := MonPro(xm, 1) Step 8. return x The End 10/10