Running OpenSSL Crypto Algorithms in Simplescalar


1 Running OpenSSL Crypto Algorithms in Simplescalar
Piyush Ranjan Satapathy Department of Computer Science & Engineering University of California Riverside

2 Outline
- What Are Crypto Algorithms?
- Why Run Them on SimpleScalar?
- Previous Work
- Introducing OpenSSL 0.9.7e
- Introducing SimpleScalar Version 2.0
- Selecting the Crypto Algorithms from OpenSSL
- Simulation Settings and Parameters
- Results & Discussions
- An Interesting Comparison
- Demo
- Conclusion
- Acknowledgement and References
- Q&A
11/12/2018 CS213: "Parallel processing Architecture" By Dr Laxmi Narayan Bhuyan (Winter 2005) University of California Riverside

3 What Are Crypto Algorithms?
- Algorithms meant for network security provide:
  1. Authentication
  2. Secrecy
  3. Nonrepudiation
  4. Integrity control
- Kinds of crypto algorithms that address these needs:
  1. Public key algorithms (e.g., RSA, DSS, LUC…)
  2. Secret key algorithms (e.g., AES, DES, RC4, SEAL…)
  3. Cryptographic hash functions (e.g., MD5, SHA-1…)
  4. Random number generators (e.g., PGP, Noiz, SSH…)
- Secret key algorithms:
  1. Block ciphers (e.g., IDEA, DES, AES, Blowfish…)
  2. Stream ciphers (e.g., RC4, SEAL, A5)
Many commonly used ciphers (e.g., IDEA, DES, Blowfish) are block ciphers: they take a fixed-size block of data (64 bits for IDEA, DES and Blowfish; 128 bits for AES) and transform it into another block of the same size using a function selected by the key. For a 64-bit block cipher, each key therefore defines a one-to-one mapping (a permutation) on the set of 64-bit integers. Applied block by block, a block cipher always produces the same ciphertext for a given plaintext block under the same key. A stream cipher instead works on smaller plaintext units (typically bytes or bits), and the transformation applied to each unit varies depending on when it is encountered during encryption.
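To make the block/stream distinction concrete, here is a minimal C sketch of RC4, one of the stream ciphers listed above. It is written from the published algorithm, not taken from OpenSSL (whose own interface is `RC4_set_key`/`RC4`), and the struct and function names are illustrative only. Each plaintext byte is XORed with the next keystream byte, so the same byte encrypts differently depending on its position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative RC4 state: a permutation S of 0..255 plus two indices. */
struct rc4 {
    uint8_t S[256];
    uint8_t i, j;
};

/* Key-scheduling algorithm (KSA): permute S under control of the key. */
static void rc4_init(struct rc4 *st, const uint8_t *key, size_t keylen) {
    assert(keylen > 0); /* key[k % keylen] below needs a non-empty key */
    for (int k = 0; k < 256; k++)
        st->S[k] = (uint8_t)k;
    uint8_t j = 0;
    for (int k = 0; k < 256; k++) {
        j = (uint8_t)(j + st->S[k] + key[k % keylen]);
        uint8_t t = st->S[k];
        st->S[k] = st->S[j];
        st->S[j] = t;
    }
    st->i = st->j = 0;
}

/* Pseudo-random generation (PRGA): XOR each input byte with the next
   keystream byte. Encryption and decryption are the same operation. */
static void rc4_crypt(struct rc4 *st, const uint8_t *in, uint8_t *out, size_t n) {
    for (size_t k = 0; k < n; k++) {
        st->i = (uint8_t)(st->i + 1);
        st->j = (uint8_t)(st->j + st->S[st->i]);
        uint8_t t = st->S[st->i];
        st->S[st->i] = st->S[st->j];
        st->S[st->j] = t;
        uint8_t ks = st->S[(uint8_t)(st->S[st->i] + st->S[st->j])];
        out[k] = in[k] ^ ks;
    }
}
```

With key "Key" and plaintext "Plaintext", the widely published RC4 test vector gives ciphertext BB F3 16 E8 D9 40 AF 0A D3. The two 't' characters encrypt to different bytes (0x40 and 0xD3), whereas a block cipher applied block by block maps identical blocks to identical ciphertext under the same key.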

4 Why Run on SimpleScalar?
- Architectural analysis of crypto algorithms: to design the best network processor, we need an architectural analysis of crypto algorithms with cycle-level accuracy.
- SimpleScalar: easy to simulate with, and the simulation is fast, flexible and accurate. SimpleScalar provides cycle-level accurate simulation of a MIPS-like processor.
- This work is not concerned with parallel programming; otherwise Simics could have been used.

5 Previous Work on Architectural Analysis of Crypto Algorithms:
- Analysis of widely available crypto algorithms by Haiyong Xie et al. (referred to as "Average" in this talk), compared against SPECint and CommBench.
- Performance of SSL crypto algorithms (Li Zhao et al.).
- But there is no architectural analysis of the OpenSSL crypto algorithms. OpenSSL has become the standard benchmark for crypto engines, so an architectural analysis of these algorithms helps in understanding what modern network processors that handle cryptography need.

6 Introducing OpenSSL 0.9.7e
- A widely used open-source library of crypto algorithms (I used the most recent version).
- OpenSSL is a cryptography toolkit implementing the Secure Sockets Layer (SSL v2/v3) and Transport Layer Security (TLS v1) network protocols and the related cryptography standards they require.
- The openssl program is a command-line tool for using the various cryptography functions of OpenSSL's crypto library from the shell. It can be used for:
  - Creation of RSA, DH and DSA key parameters
  - Creation of X.509 certificates, CSRs and CRLs
  - Calculation of message digests
  - Encryption and decryption with ciphers
  - SSL/TLS client and server tests
  - Handling of S/MIME signed or encrypted mail
- I used the library to port the crypto algorithms to SimpleScalar.

7 Introducing SimpleScalar 2.0
Compiling: sslittle-na-sstrix-gcc foo.c -o foo
Running: sim-outorder foo

8 Selecting OpenSSL Crypto Algorithms:
- Secret (private) key algorithms
  - Block cipher mode
    - AES (key length: 128 bits; block size: 16 bytes = 128 bits)
    - DES (key length: 56 bits, stored as 64 bits with parity; block size: 8 bytes = 64 bits)
    - 3DES (key length: 168 bits; block size: 8 bytes)
    - IDEA (key length: 128 bits; block size: 8 bytes)
  - Stream cipher mode
    - RC4 (key length: 128 bits)
- Hash functions
  - MD5 (block size: 512 bits; digest size: 128 bits)
  - SHA-1 (block size: 512 bits; digest size: 160 bits)
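The block sizes above determine how much data each algorithm actually processes, which matters because most readings use a 1-byte input. The following C sketch computes this under two assumptions: MD5 and SHA-1 pad every message with a 0x80 byte plus an 8-byte length field up to a multiple of 64 bytes (the standard Merkle-Damgård padding), and the block ciphers use PKCS#7-style padding, which OpenSSL's EVP cipher interface applies by default. The slides do not say which OpenSSL interface the modules used, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Number of 64-byte (512-bit) compression-function calls that MD5 or SHA-1
   makes for an n-byte message: 1 byte of 0x80, zero fill, and an 8-byte
   length field, rounded up to a multiple of 64 bytes. */
static size_t md_blocks(size_t n) {
    return (n + 9 + 63) / 64;
}

/* Bytes a block cipher processes for an n-byte input, assuming PKCS#7-style
   padding: at least one padding byte is always added, so an input that is
   already a multiple of the block size gains a whole extra block. */
static size_t padded_len(size_t n, size_t block_bytes) {
    assert(block_bytes > 0);
    return (n / block_bytes + 1) * block_bytes;
}
```

For example, a 1-byte input costs one full 64-byte compression for MD5/SHA-1, one 8-byte block for DES, 3DES or IDEA, and one 16-byte block for AES, while a 56-byte input already needs two MD5/SHA-1 blocks because the length field no longer fits.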

9 Simulation Settings & Parameters
- A separate module was written for each algorithm using the OpenSSL crypto library. Each module is compiled with the SimpleScalar gcc cross-compiler, and the resulting binary is run in the simulator with a file as input.
- Input file length varies from 1 byte to 256 KB.
- Most readings are taken with a 1-byte input file.
- Different SimpleScalar parameters are changed on the command line and the readings are observed.
Parameters used (parameter: values):
- ALU count: 1, 2, 4, 8
- IFQ size: 1, 2, 4, …, 32
- ILP: ALU count and IFQ size changed at the same time
- Branch prediction type: not taken, taken, 2lev, bimodal, combinational
- L1 instruction and data caches (L1I & L1D): cache size 4, 8, …, 256 KB; line size 8, 16, …, 64 bytes; associativity (sets) 1, 2, 4, 8, 16; replacement policy l, r, f (LRU, random, FIFO)
- Unified L2 cache (UL2): cache size 4, 8, …, 2048 KB; replacement policy l, f, r

10 Results & Discussions: (1)
1. Instruction set characteristics:
- Comparison with Average, SPECint & CommBench
- "Average" represents Li's work
- SSLcrypto represents the average over all the OpenSSL algorithms I considered.
Observations:
* SSLcrypto algorithms have a significant fraction of memory references (~40%).
* They are arithmetic-intensive, but less so than Average.
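Instruction-mix percentages such as the ~40% memory-reference figure are ratios of committed-instruction counts. A minimal C sketch, assuming counts taken from sim-outorder's `sim_num_insn`, `sim_num_refs` (loads plus stores) and `sim_num_branches` statistics; the struct and function names are hypothetical, and a finer split into arithmetic classes would need a profiler such as sim-profile rather than these three counters.

```c
#include <assert.h>

/* Hypothetical container for three sim-outorder statistics. */
struct sim_counts {
    unsigned long long insn;     /* sim_num_insn: committed instructions */
    unsigned long long refs;     /* sim_num_refs: committed loads + stores */
    unsigned long long branches; /* sim_num_branches: committed branches */
};

struct mix {
    double mem_pct, branch_pct, other_pct;
};

/* Percentage of memory references, branches, and everything else. */
static struct mix instruction_mix(struct sim_counts c) {
    struct mix m = {0.0, 0.0, 0.0};
    assert(c.refs + c.branches <= c.insn); /* sanity check on the counters */
    if (c.insn == 0)
        return m;
    m.mem_pct = 100.0 * (double)c.refs / (double)c.insn;
    m.branch_pct = 100.0 * (double)c.branches / (double)c.insn;
    m.other_pct = 100.0 - m.mem_pct - m.branch_pct;
    return m;
}
```

With hypothetical counts of 1000 instructions, 400 references and 100 branches, this yields 40% memory references, 10% branches and 50% other instructions.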

11 Results & Discussions: (2,3)
2. Comparison of instruction mix:
- All the block, stream and hash ciphers are plotted for instruction mix.
Observations:
- DES and 3DES have a high fraction of memory references.
- IDEA has a significant fraction of branch instructions.
3. Cycles per byte of computation:
- 3DES takes more cycles because it processes each block three times with three different keys.
- Block ciphers require more cycles than stream ciphers and hash functions.
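The 3DES cost follows from its structure: OpenSSL's three-key 3DES (DES-EDE3) computes C = E_k3(D_k2(E_k1(P))), three DES operations per block instead of one. The C sketch below shows only that encrypt-decrypt-encrypt composition. To stay short it substitutes a hypothetical toy 64-bit "cipher" (XOR, rotate, add) for DES; the toy is not DES and provides no security.

```c
#include <assert.h>
#include <stdint.h>

/* Rotations on 64-bit words; r must be in 1..63 to avoid undefined shifts. */
static uint64_t rotl64(uint64_t x, unsigned r) {
    assert(r > 0 && r < 64);
    return (x << r) | (x >> (64 - r));
}
static uint64_t rotr64(uint64_t x, unsigned r) {
    assert(r > 0 && r < 64);
    return (x >> r) | (x << (64 - r));
}

/* Hypothetical toy block cipher standing in for DES (illustration only). */
static uint64_t toy_encrypt(uint64_t block, uint64_t key) {
    return rotl64(block ^ key, 13) + key;
}
static uint64_t toy_decrypt(uint64_t block, uint64_t key) {
    return rotr64(block - key, 13) ^ key;
}

/* EDE triple encryption: three primitive calls per 64-bit block. */
static uint64_t ede_encrypt(uint64_t p, uint64_t k1, uint64_t k2, uint64_t k3) {
    return toy_encrypt(toy_decrypt(toy_encrypt(p, k1), k2), k3);
}

/* Inverse: undo the three steps in reverse order. */
static uint64_t ede_decrypt(uint64_t c, uint64_t k1, uint64_t k2, uint64_t k3) {
    return toy_decrypt(toy_encrypt(toy_decrypt(c, k3), k2), k1);
}
```

The middle decryption step is what makes EDE backward compatible: with k1 = k2 = k3 the inner encrypt and decrypt cancel, and EDE reduces to a single encryption, yet the work per block is still three primitive calls.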

12 Results & Discussions (4,5)
4. IPC vs ALU count:
- IPC increases by 26%, 37%, and 40% for block, stream and hash algorithms respectively when the number of ALUs increases from 1 to 2.
- It increases by 6%, 10%, and 5% when the number of ALUs increases from 2 to 4.
- Beyond 4 ALUs, IPC increases by less than 1%.
5. IPC vs IFQ size:
- IPC increases by 26%, 37%, and 40% for block, stream and hash algorithms respectively when the instruction fetch queue size changes from 1 to 2.
- It increases by 6%, 10%, and 5% when the IFQ size changes from 2 to 4.
- Beyond that, it changes by less than 2%.
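The percentages above are relative IPC changes between consecutive points of a sweep. A minimal C sketch of that arithmetic, assuming IPC is taken as committed instructions divided by cycles (what sim-outorder reports as `sim_IPC`); the function names and the sample IPC values in the test are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* IPC = committed instructions / simulated cycles. */
static double ipc(unsigned long long insn, unsigned long long cycles) {
    return cycles ? (double)insn / (double)cycles : 0.0;
}

/* For a sweep (e.g. ALU count 1, 2, 4, 8), store in gain_pct[k] the percentage
   IPC change from point k to point k+1. gain_pct must hold n - 1 entries. */
static void step_gains(const double *ipc_vals, size_t n, double *gain_pct) {
    assert(n == 0 || (ipc_vals != NULL && gain_pct != NULL));
    for (size_t k = 0; k + 1 < n; k++)
        gain_pct[k] = ipc_vals[k] > 0.0
            ? 100.0 * (ipc_vals[k + 1] - ipc_vals[k]) / ipc_vals[k]
            : 0.0;
}
```

For instance, hypothetical IPC values of 1.0, 1.26, 1.3356 and 1.34 for 1, 2, 4 and 8 ALUs give step gains of 26%, 6% and about 0.3%, the shape reported for block ciphers.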

13 Results & Discussions: (6)
6. IPC vs ILP:
- ILP 4 means 4 ALUs and an IFQ size of 4 (both changed together).
- An ILP of 4 is enough to get the best IPC.

14 Results & Discussions: (7)
7. Branch prediction hit rate:
- Bimodal and combinational predictors give a better hit rate.
- 2-level prediction gives an almost equally good hit rate.
- Simple static taken or not-taken prediction does not do well.
- So more complex branch predictors need to be considered.
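Why a bimodal predictor beats static taken/not-taken prediction can be seen in a small model. The C sketch below keeps one 2-bit saturating counter per table entry, indexed by PC bits, so each branch learns its own bias. The table size, the weakly-taken initial state and the `pc >> 3` indexing (8-byte instructions, as in SimpleScalar's PISA target) are assumptions for illustration, not a copy of SimpleScalar's predictor code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIMOD_SIZE 2048 /* assumed number of 2-bit counters (power of two) */

/* Counter values 0-1 predict not taken, 2-3 predict taken. */
struct bimod {
    uint8_t ctr[BIMOD_SIZE];
};

static void bimod_init(struct bimod *p) {
    assert((BIMOD_SIZE & (BIMOD_SIZE - 1)) == 0); /* masking below needs this */
    for (size_t k = 0; k < BIMOD_SIZE; k++)
        p->ctr[k] = 2; /* assumed initial state: weakly taken */
}

/* Index by PC bits above the (assumed) 8-byte instruction size. */
static size_t bimod_index(uint32_t pc) {
    return (pc >> 3) & (BIMOD_SIZE - 1);
}

/* Predict, then train on the actual outcome; returns 1 if the prediction
   was correct. */
static int bimod_predict_update(struct bimod *p, uint32_t pc, int taken) {
    uint8_t *c = &p->ctr[bimod_index(pc)];
    int predicted = *c >= 2;
    int actual = taken != 0;
    if (actual && *c < 3)
        (*c)++;
    if (!actual && *c > 0)
        (*c)--;
    return predicted == actual;
}
```

With one always-taken and one never-taken branch interleaved 50 times each, this model mispredicts only once (the first not-taken outcome) for 99 of 100 hits, while static "always taken" gets only 50. For a loop branch taken three times then not taken, it still mispredicts once per iteration, which is where 2-level predictors with branch history can help.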

15 Results & Discussions: (8,9)
8. L1 instruction cache size behavior:
- Cache size is varied with a fixed 64-byte line size, 4-way associativity and LRU (l) replacement.
- 128 KB is enough to reach the best performance level.
9. L1 instruction cache line size:
- Line size is varied with a fixed 256 KB cache size, 4-way associativity and LRU replacement.
- A 32-byte line size is enough to reach the lowest possible miss rate.

16 Results & Discussions: (10,11)
10. L1 instruction cache associativity behavior:
- Set associativity is varied with a fixed 256 KB cache size, 32-byte line size and LRU replacement.
- 2-way set associativity is enough to reach a miss rate below 5%.
11. L1 instruction cache replacement policy behavior:
- Replacement policy is varied with a fixed 256 KB cache size, 32-byte line size and 4-way associativity.
- LRU and FIFO give the same performance, so either one can be chosen.
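The size, line size, associativity and replacement parameters swept on these slides can be explored with a small set-associative cache model. The C sketch below is a simplified stand-in for SimpleScalar's cache simulator, not its actual code. It supports LRU and FIFO replacement; the static size limits and names are assumptions. LRU refreshes a line's timestamp on every hit, while FIFO only records when the line was filled, which is the only difference between the two policies here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_SETS 1024 /* assumed static limits for this sketch */
#define MAX_WAYS 16

enum repl { REPL_LRU, REPL_FIFO };

struct cache {
    unsigned nsets, assoc, bsize;
    enum repl policy;
    int valid[MAX_SETS][MAX_WAYS];
    uint64_t tag[MAX_SETS][MAX_WAYS];
    unsigned long stamp[MAX_SETS][MAX_WAYS]; /* last use (LRU) or fill time (FIFO) */
    unsigned long clock, accesses, misses;
};

/* Returns 0 on success, -1 if the geometry exceeds the static limits. */
static int cache_init(struct cache *c, unsigned nsets, unsigned assoc,
                      unsigned bsize, enum repl policy) {
    if (nsets == 0 || nsets > MAX_SETS || assoc == 0 || assoc > MAX_WAYS || bsize == 0)
        return -1;
    memset(c, 0, sizeof *c);
    c->nsets = nsets;
    c->assoc = assoc;
    c->bsize = bsize;
    c->policy = policy;
    return 0;
}

/* Simulates one access to byte address addr; returns 1 on a hit, 0 on a miss. */
static int cache_access(struct cache *c, uint64_t addr) {
    assert(c->nsets > 0 && c->bsize > 0); /* cache_init must succeed first */
    uint64_t block = addr / c->bsize;
    unsigned set = (unsigned)(block % c->nsets);
    uint64_t tag = block / c->nsets;
    unsigned victim = 0;

    c->clock++;
    c->accesses++;
    for (unsigned w = 0; w < c->assoc; w++) {
        if (c->valid[set][w] && c->tag[set][w] == tag) {
            if (c->policy == REPL_LRU)
                c->stamp[set][w] = c->clock; /* LRU: refresh on every hit */
            return 1;
        }
    }
    c->misses++;
    /* Victim: the first invalid way, else the way with the oldest stamp. */
    for (unsigned w = 0; w < c->assoc; w++) {
        if (!c->valid[set][w]) {
            victim = w;
            break;
        }
        if (c->stamp[set][w] < c->stamp[set][victim])
            victim = w;
    }
    c->valid[set][victim] = 1;
    c->tag[set][victim] = tag;
    c->stamp[set][victim] = c->clock; /* fill time (also the last use for LRU) */
    return 0;
}
```

On a single 2-way set with 32-byte lines, the toy trace A B A C A (addresses 0, 32, 0, 64, 0) gives 3 misses under LRU but 4 under FIFO, because FIFO evicts A even though it was just reused. On real crypto workloads the two can behave the same, as the slides observed; the toy trace is constructed to show where they can diverge.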

17 Results & Discussions:(12,13)
12. L1 data cache size behavior:
- Cache size is varied with a fixed 64-byte line size, direct-mapped (1-way) organization and LRU replacement.
- 32 KB is enough to reach the best performance level.
13. L1 data cache line size:
- Line size is varied with a fixed 256 KB cache size, direct-mapped (1-way) organization and LRU replacement.
- A 32-byte line size is enough to reach the lowest possible miss rate.

18 Results & Discussions: (14,15)
14. L1 data cache associativity behavior:
- Set associativity is varied with a fixed 256 KB cache size, 32-byte line size and LRU replacement.
- 2-way set associativity is enough for block and stream ciphers, but hash functions need 4-way.
15. L1 data cache replacement policy behavior:
- Replacement policy is varied with a fixed 256 KB cache size, 32-byte line size and 4-way associativity.
- LRU and FIFO give the same performance, so either one can be chosen.

19 Results & Discussions: (16,17)
16. Unified L2 (UL2) cache size behavior:
- Cache size is varied with a fixed 64-byte line size, direct-mapped (1-way) organization and LRU replacement.
- 512 KB is enough to reach the best performance level.
17. UL2 cache replacement policy behavior:
- Replacement policy is varied with a fixed 512 KB cache size, 64-byte line size and 4-way associativity.
- LRU and FIFO give the same performance, so either one can be chosen.

20 An Interesting Comparison:
Observations (Li's analysis of widely available crypto algorithms vs. my analysis of OpenSSL crypto algorithms):
- Instruction mix: Li: 23% memory references, 60% arithmetic computations. Mine: 40-45% memory references, 68% arithmetic computations.
- Cycles per byte of computation: Li: block (no value given), stream 20, hash 18. Mine: block (no value given), stream 55, hash 30.
- ALU vs IPC, IFQ vs IPC, ILP vs IPC: Li: best with 4 ALUs, best with IFQ size 4, best with ILP 4. Mine: best with IFQ size 4, best with ILP 8.
- Branch prediction technique: Li: simple (taken or not taken). Mine: complex (bimodal or combinational).
- L1 instruction cache parameters: Li: 16 KB cache, 8-byte lines, 4-way, LRU replacement. Mine: 128 KB cache, 32-byte lines, 2-way, LRU replacement.
- L1 data cache parameters: Li: 32 KB cache, 8-byte lines, 2-way, LRU replacement. Mine: 32 KB cache, 64-byte lines, 2-way, LRU replacement.
- UL2 unified cache parameters: Li: 64 KB cache, LRU replacement. Mine: 512 KB cache, LRU replacement.

21 Demo Time

22 Conclusion: For better architectural performance, crypto engines running OpenSSL crypto algorithms should have:
* 128 KB L1 instruction cache
* 32 KB L1 data cache
* 512 KB UL2 cache
* 2-way set associativity
* LRU (l) replacement policy
* ILP of 8
* Advanced branch prediction schemes

23 Acknowledgement & References:
A big thanks to Li Zhao.
References:
- SimpleScalar Tool Set
- OpenSSL
- Haiyong Xie et al., "Architectural Analysis of Cryptographic Applications for Network Processors"
- Li Zhao, Ravi Iyer, Srihari Makineni, Laxmi Bhuyan, "Anatomy and Performance of SSL Processing"

24 Q&A

