Final presentation Encryption/Decryption on embedded system Supervisor: Ina Rivkin students: Chen Ponchek Liel Shoshan Winter 2013 Part A.


1 Final presentation Encryption/Decryption on embedded system Supervisor: Ina Rivkin students: Chen Ponchek Liel Shoshan Winter 2013 Part A

2 Motivation Nowadays there are many portable storage devices with large memories that contain valuable data (such as disk-on-key drives and tablets). There is therefore a concrete need for portable cryptography systems suited to such devices. Our project aims to provide a system that answers this need.

3 Project Goal Main goal: implementing an embedded data-cryptography system using the AES algorithm, and finding a suitable architecture for a portable system.

4 Project Specifications Implementing on a Zynq SoPC by Xilinx. Suitable for portable systems (Disk-on-Key, tablets, etc.) - low-power system. Transparent system (while storing/loading files) - the cryptography system won’t create traffic bottlenecks. Finding the best architecture according to the requirements above: Profiling the AES algorithm. Finding the balance between using the ARM processor and using the FPGA (the hardware accelerator needs more power).

5 AES algorithm The Advanced Encryption Standard, also known as “Rijndael”, is a block cipher. The cipher is iterative, fast and convenient to implement in both software and hardware, and it doesn’t have high memory requirements. Most of the AES calculations are made over 10 rounds (the round count for a 128-bit key). At each stage the data block (the state) is described as a 2D, 4x4 array of bytes. Each round uses a “Round Key” produced by the key-expansion process. Each round consists of 4 steps (the final round omits MixColumns): 1. SubBytes 2. ShiftRows 3. MixColumns 4. AddRoundKey
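The state layout and two of the four round steps can be sketched in C as follows. This is a minimal illustration, not the project's code: the function names `shift_rows` and `add_round_key` and the flat 16-byte, column-major layout (byte at row r, column c stored at index r + 4*c) are assumptions made for the example. SubBytes and MixColumns are left out because they need lookup tables or field arithmetic.

```c
#include <stdint.h>

/* Assumed layout: the 4x4 state is a flat array of 16 bytes,
 * column-major, so row r / column c lives at index r + 4*c. */

/* ShiftRows: rotate row r left by r positions (row 0 is unchanged). */
void shift_rows(uint8_t s[16])
{
    for (int r = 1; r < 4; r++) {
        uint8_t tmp[4];
        for (int c = 0; c < 4; c++)
            tmp[c] = s[r + 4 * ((c + r) % 4)];
        for (int c = 0; c < 4; c++)
            s[r + 4 * c] = tmp[c];
    }
}

/* AddRoundKey: XOR each state byte with the matching round-key byte.
 * Applying it twice with the same key restores the state. */
void add_round_key(uint8_t s[16], const uint8_t rk[16])
{
    for (int i = 0; i < 16; i++)
        s[i] ^= rk[i];
}
```

Because AddRoundKey is a plain XOR, the same function serves both encryption and decryption; ShiftRows needs a separate inverse that rotates to the right.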

6 System Block Diagram (figure labels: ZedBoard, Zynq, PS, PL, UART, RS232, DDR, BRAM, encrypted data, decrypted data)

7 System Block Diagram (figure labels: ZedBoard, Zynq, PS, PL, UART, RS232, In Memory, Out Memory, BRAM, encrypted data, decrypted data)

8 Zedboard Block Diagram

9 Tools and development environment PlanAhead - hardware design (VHDL), simulation and synthesis tool. XPS/EDK - configuring the embedded system. SDK - software development kit. Visual Studio. ZedBoard - including the Zynq SoPC.

10 System Block Diagram, project part A: Implementation of the AES algorithm on the ARM and code optimization. (figure labels: ZedBoard, Zynq, PS, PL, UART, RS232, DDR, BRAM, AES in software, encrypted data, decrypted data)

11 Software Engineering Each step is implemented as a separate function. Each function is independent of the other functions. The program can encrypt and decrypt the data.

12 Software Engineering The input data is entered by the user via a PuTTY terminal. The program’s output is the data after encryption, followed by the encrypted data after decryption.

13 Encryption Process

14 Development stages XPS/EDK - Configuring the ARM system: Creation of the ARM processor interface to the RS-232 UART. Addition of the BRAM and BRAM Controller IP and connection to the AXI Interconnect.

15

16 Development stages PlanAhead - Creation of the top-level entity in VHDL code. Generation of the bitstream. Exporting hardware to SDK.

17 Development stages SDK - Generating the software platform project: Creating the Board Support Package (BSP). Selection of memory - DDR vs. BRAM. Test in hardware: Downloading the application to the ARM processor. Running and profiling the application.
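The profiling step amounts to timing a workload over several repetitions and averaging. The sketch below uses the standard C `clock()` function so it can run on a host machine; on the board a hardware timer would typically be read instead, and that substitution is an assumption, since the slides do not say which timer was used. The names `time_call_ms` and `dummy_workload` are hypothetical.

```c
#include <time.h>

typedef void (*workload_fn)(void);

/* Placeholder workload standing in for one encrypt + decrypt pass. */
void dummy_workload(void)
{
    volatile unsigned acc = 0;
    for (unsigned i = 0; i < 100000u; i++)
        acc += i;
}

/* Average time of one call to fn, in milliseconds, over `repeats` runs.
 * Uses processor time as reported by clock(). */
double time_call_ms(workload_fn fn, int repeats)
{
    clock_t start = clock();
    for (int i = 0; i < repeats; i++)
        fn();
    clock_t end = clock();
    return 1000.0 * (double)(end - start) / (double)CLOCKS_PER_SEC / repeats;
}
```

Averaging over several runs reduces the effect of timer resolution, which matters when a single pass takes well under a millisecond.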

18 Profiling BRAM vs. DDR Encryption and decryption of 10x16 bytes: DDR 2.754 ms, BRAM 111.54 ms

19 Software optimization #1 The MixColumns and InvMixColumns functions take around 65%-70% of the whole execution time. Improving them will significantly reduce the delay.

20 Software optimization #1 We will implement the MixColumns function using LUTs (lookup tables) instead of arithmetic operations and if/else statements. This should speed up the calculations.
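The slides' code images are not part of this transcript, so the following is only a sketch of the idea: MixColumns on one column, first with a branching multiply-by-2 in GF(2^8), then with precomputed tables for multiplication by 2 and 3. All names (`xtime_branch`, `mul2_lut`, `mul3_lut`, `mix_lut_init`, `mix_column_arith`, `mix_column_lut`) are hypothetical.

```c
#include <stdint.h>

/* Multiply by 2 in GF(2^8) with the AES reduction polynomial (if/else form). */
uint8_t xtime_branch(uint8_t x)
{
    if (x & 0x80)
        return (uint8_t)((x << 1) ^ 0x1B);
    else
        return (uint8_t)(x << 1);
}

/* Lookup tables for multiplication by 2 and by 3, filled once at start-up. */
static uint8_t mul2_lut[256];
static uint8_t mul3_lut[256];

void mix_lut_init(void)
{
    for (int i = 0; i < 256; i++) {
        mul2_lut[i] = xtime_branch((uint8_t)i);
        mul3_lut[i] = (uint8_t)(mul2_lut[i] ^ i);   /* 3*x = 2*x XOR x */
    }
}

/* MixColumns on one column, computed arithmetically. */
void mix_column_arith(uint8_t col[4])
{
    uint8_t a[4], out[4];
    for (int i = 0; i < 4; i++)
        a[i] = col[i];
    for (int i = 0; i < 4; i++) {
        uint8_t b = a[(i + 1) % 4];
        out[i] = (uint8_t)(xtime_branch(a[i]) ^ (xtime_branch(b) ^ b)
                           ^ a[(i + 2) % 4] ^ a[(i + 3) % 4]);
    }
    for (int i = 0; i < 4; i++)
        col[i] = out[i];
}

/* Same transformation, with the multiplications replaced by table reads. */
void mix_column_lut(uint8_t col[4])
{
    uint8_t a[4], out[4];
    for (int i = 0; i < 4; i++)
        a[i] = col[i];
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(mul2_lut[a[i]] ^ mul3_lut[a[(i + 1) % 4]]
                           ^ a[(i + 2) % 4] ^ a[(i + 3) % 4]);
    for (int i = 0; i < 4; i++)
        col[i] = out[i];
}
```

With the commonly cited test column db 13 53 45, both versions produce 8e 4d a1 bc. The table version trades 512 bytes of memory for the removal of the data-dependent branch.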

21 MixColumns initial implementation

22 MixColumns improved implementation

23 Profiling BRAM vs. DDR With an improved MixColumns implementation: DDR 2.626 ms, BRAM 88.06 ms

24 Software optimization #1 BRAM: The total execution time decreased from 111.5 msec to 88 msec, a decrease of 21%. DDR: The total execution time decreased from 2.754 msec to 2.626 msec, a decrease of 5%.

25 Software optimization #2 We will implement the MixColumns and InvMixColumns functions using LUTs and without for loops. This should speed up the calculations even more.
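A loop-free version might look like the sketch below, which is an illustration under assumptions rather than the code shown on the slides. The six tables hold multiplication by 2, 3, 9, 11, 13 and 14 in GF(2^8); the table names, the helper `gf_mul`, and the column-major 16-byte state layout are all hypothetical. Loops appear only in the one-time table initialisation, not in the per-block path.

```c
#include <stdint.h>

static uint8_t m2[256], m3[256], m9[256], m11[256], m13[256], m14[256];

/* General GF(2^8) multiply (AES polynomial); used only to fill the tables. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1B : 0x00));
        b >>= 1;
    }
    return p;
}

void mixcol_tables_init(void)
{
    for (int i = 0; i < 256; i++) {
        m2[i]  = gf_mul((uint8_t)i, 2);
        m3[i]  = gf_mul((uint8_t)i, 3);
        m9[i]  = gf_mul((uint8_t)i, 9);
        m11[i] = gf_mul((uint8_t)i, 11);
        m13[i] = gf_mul((uint8_t)i, 13);
        m14[i] = gf_mul((uint8_t)i, 14);
    }
}

/* One column of MixColumns, fully unrolled. */
static void mix_one_column(uint8_t *c)
{
    const uint8_t a0 = c[0], a1 = c[1], a2 = c[2], a3 = c[3];
    c[0] = (uint8_t)(m2[a0] ^ m3[a1] ^ a2 ^ a3);
    c[1] = (uint8_t)(a0 ^ m2[a1] ^ m3[a2] ^ a3);
    c[2] = (uint8_t)(a0 ^ a1 ^ m2[a2] ^ m3[a3]);
    c[3] = (uint8_t)(m3[a0] ^ a1 ^ a2 ^ m2[a3]);
}

/* One column of InvMixColumns, fully unrolled. */
static void inv_mix_one_column(uint8_t *c)
{
    const uint8_t a0 = c[0], a1 = c[1], a2 = c[2], a3 = c[3];
    c[0] = (uint8_t)(m14[a0] ^ m11[a1] ^ m13[a2] ^ m9[a3]);
    c[1] = (uint8_t)(m9[a0] ^ m14[a1] ^ m11[a2] ^ m13[a3]);
    c[2] = (uint8_t)(m13[a0] ^ m9[a1] ^ m14[a2] ^ m11[a3]);
    c[3] = (uint8_t)(m11[a0] ^ m13[a1] ^ m9[a2] ^ m14[a3]);
}

/* Whole state (column c at bytes 4c..4c+3), four explicit calls, no loop. */
void mix_columns_unrolled(uint8_t s[16])
{
    mix_one_column(s);
    mix_one_column(s + 4);
    mix_one_column(s + 8);
    mix_one_column(s + 12);
}

void inv_mix_columns_unrolled(uint8_t s[16])
{
    inv_mix_one_column(s);
    inv_mix_one_column(s + 4);
    inv_mix_one_column(s + 8);
    inv_mix_one_column(s + 12);
}
```

Applying `inv_mix_columns_unrolled` after `mix_columns_unrolled` returns the original state, which is a convenient self-check when replacing an existing implementation.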

26 MixColumns optimized implementation

27 InvMixColumns optimized implementation

28 Profiling BRAM vs. DDR With optimized MixColumns and InvMixColumns implementations: DDR 1.145 ms, BRAM 47.427 ms

29 Software optimization #2 BRAM: The total execution time decreased from 111.5 msec to 47.427 msec, a decrease of 57%. DDR: The total execution time decreased from 2.754 msec to 1.145 msec, a decrease of 58%.

30 Hardware optimization The ARM processor clock: At first, we used the default clock rate, which was 160MHz. We will now set the clock rate to 225MHz (the maximum clock rate).

31 Hardware optimization The ARM processor uses LogiCORE IP AXI Interconnect (v1.06.a) with AXI4 protocol. At first we set the AXI’s clock rate to 160MHz. We will now set the clock rate to be 225MHz.

32 Profiling BRAM vs. DDR With a higher clock rate (160 MHz → 225 MHz): DDR 0.819 ms, BRAM 34.798 ms

33 Hardware optimization BRAM: The total execution time decreased from 111.5 msec to 34.8 msec, a decrease of 69%. DDR: The total execution time decreased from 2.754 msec to 0.82 msec, a decrease of 70%.

34 Optimizations Execution time improvement

35 Optimizations Execution speed improvement

36 Every optimization that we made decreased the total time and improved the speed. The most significant improvement came from the 2nd SW optimization. Both DDR and BRAM speeds were eventually increased by a factor of 3 or more.
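The percentages and the "factor of 3" can be rechecked from the measured times quoted on the profiling slides. The helper names `percent_decrease` and `speedup` below are hypothetical; the inputs in the usage note are the figures from the slides.

```c
/* Percentage by which the execution time dropped. */
double percent_decrease(double before_ms, double after_ms)
{
    return 100.0 * (before_ms - after_ms) / before_ms;
}

/* How many times faster the final version is than the initial one. */
double speedup(double before_ms, double after_ms)
{
    return before_ms / after_ms;
}
```

For BRAM, `percent_decrease(111.5, 34.8)` is about 69% and `speedup(111.54, 34.798)` is about 3.2; for DDR, `percent_decrease(2.754, 0.82)` is about 70% and `speedup(2.754, 0.819)` is about 3.4.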

37 BRAM vs. DDR In every optimization, running the application from BRAM was significantly slower than running from DDR. This is due to: – DDR has its own dedicated bus. – The DDR clock rate is 550 MHz, while the BRAM clock rate is 160 MHz. – DDR works on both the rising and the falling clock edge.

38 Transmission rate The typical maximum data rate in USB is 1.5 MB/s (typical rates are around 0.5 MB/s). The encryption rate we were able to achieve in the end is 323 KB/s, about 1.5 times slower than the typical rate. Conclusion: A hardware accelerator is needed.
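The "1.5 times slower" figure follows from dividing the typical USB rate by the achieved encryption rate. The sketch below assumes 1 MB = 1000 KB, which the slides do not specify, and the function name `slowdown_factor` is hypothetical.

```c
/* How many times slower the achieved rate is than a reference rate.
 * Both rates must be in the same unit (here KB/s). */
double slowdown_factor(double reference_kb_per_s, double achieved_kb_per_s)
{
    return reference_kb_per_s / achieved_kb_per_s;
}
```

Against the typical rate, `slowdown_factor(500.0, 323.0)` is about 1.5; against the quoted 1.5 MB/s maximum, `slowdown_factor(1500.0, 323.0)` is about 4.6, so the gap to be closed by a hardware accelerator depends on which reference is chosen.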

39 Project Specifications Implementing on a Zynq SoPC by Xilinx. Suitable for portable systems (Disk-on-Key, tablets, etc.) - low-power system. Finding the best architecture according to the requirements above: Profiling the AES algorithm.

40 Demonstration

