SHA-3 Candidate Evaluation
FPGA Benchmarking - Phase 1

- 14 Round-2 SHA-3 candidates implemented by 33 graduate students following the same design methodology (each function implemented independently by 2-3 students)
- Uniform input/output interface
- Uniform generic testbench
- Optimization for maximum throughput-to-cost ratio
- Benchmarking on multiple FPGA platforms from Xilinx and Altera using ATHENa
- Comparison vs. optimized implementations of SHA-1 and SHA-2
- Compressing all results into one single ranking
Division into Datapath and Controller

[Block diagram: the Datapath (Execution Unit) receives Data Inputs and produces Data Outputs; the Controller (Control Unit) receives Control & Status Inputs and produces Control & Status Outputs; the Controller drives the Datapath with Control Signals and receives Status Signals back.]
Design Methodology

[Diagram: starting from the Specification and the Interface, the Execution Unit is designed via a block diagram and the Control Unit via an Algorithmic State Machine; both are then translated into VHDL code.]
Steps of the Design Process (1)

Given:
1. Specification
2. Interface

Completed:
3. Pseudocode
4. Detailed block diagram of the Datapath
5. Interface with the division into the Datapath and the Controller
6. Timing and area analysis, architectural-level optimizations
7. RTL VHDL code of the Datapath, and corresponding testbenches
Steps of the Design Process (2)

Remaining to be done:
8. ASM chart of the Controller
9. RTL VHDL code of the Controller, and the corresponding testbench
10. Integration of the Datapath and the Controller
11. Testing using the uniform generic testbench (developed by Ice)
12. Source code optimizations
13. Performance characterization using ATHENa
14. Documentation and final report
FPGA Benchmarking - Phase 2

- Extending source codes to cover all hash function variants
- Padding in hardware
- Applying additional architectural optimizations
- Extended benchmarking (Actel FPGAs, multiple tools, adaptive optimization strategies, etc.)
- Reconciling differences with other available rankings
- Preparing the codes for ASIC evaluation
How to compress all results into a single ranking?
Single Ranking (1)

Select several representative FPGA platforms with significantly different properties, e.g.:
- vendor: Xilinx vs. Altera
- process: 90 nm vs. 65 nm
- LUT size: 4-input vs. 6-input
- optimization target: low-cost vs. high-performance

Use ATHENa to characterize all SHA-3 candidates and SHA-2 on these platforms in terms of the target performance metrics (e.g., throughput/area ratio).
Single Ranking (2)

- Calculate the ratio of SHA-3 candidate performance vs. SHA-2 performance (for the same security level)
- Calculate the geometric mean of these ratios over multiple platforms
FPGA and ASIC Performance Measures
The common ground is vague

Hardware performance: cycles per block, cycles per byte, latency (cycles), latency (ns), throughput for long messages, throughput for short messages, throughput at 100 KHz, clock frequency, clock period, critical path delay, Modexp/s, PointMul/s

Hardware cost: slices, slices occupied, LUTs, 4-input LUTs, 6-input LUTs, FFs, gate equivalents (GE), size on ASIC, DSP blocks, BRAMs, number of cores, CLBs, MULs, XOR, NOT, AND

Hardware efficiency: hardware performance / hardware cost
Our Favorite Hardware Performance Metrics:
- Mbit/s for throughput
- ns for latency

These allow easy cross-comparison among implementations in software (microprocessors), FPGAs (various vendors), and ASICs (various libraries).
But how to define and measure throughput and latency for hash functions?

Time to hash N blocks of message:
Htime(N, T_CLK) = Initialization Time(T_CLK) + N · Block Processing Time(T_CLK) + Finalization Time(T_CLK)

Latency = time to hash ONE block of message:
Latency = Htime(1, T_CLK) = Initialization Time + Block Processing Time + Finalization Time

Throughput (for long messages):
Throughput = Block size / (Htime(N+1, T_CLK) − Htime(N, T_CLK)) = Block size / Block Processing Time(T_CLK)
But how to define and measure throughput and latency for hash functions?

Initialization Time(T_CLK) = cycles_I · T_CLK
Block Processing Time(T_CLK) = cycles_P · T_CLK
Finalization Time(T_CLK) = cycles_F · T_CLK

Where the inputs come from:
- Block size: from the specification
- Cycle counts (cycles_I, cycles_P, cycles_F): from analysis of the block diagram and/or functional simulation
- T_CLK: from the place & route report (or experiment)
How to compare hardware speed vs. software speed?

EBASH reports (http://bench.cr.yp.to/results-hash.html):
- In graphs: Time(n) = time in clock cycles vs. message size in bytes, for n-byte messages with n = 0, 1, 2, 3, ..., 2048, 4096
- In tables: performance in cycles/byte for n = 8, 64, 576, 1536, 4096, and long messages

Performance for long messages = (Time(4096) − Time(2048)) / 2048
How to compare hardware speed vs. software speed?

Throughput [Gbit/s] = 8 [bits/byte] · clock frequency [GHz] / performance for long messages [cycles/byte]
How to measure hardware cost in FPGAs?

1. Stand-alone cryptographic core on an FPGA: cost of the smallest FPGA that can fit the core. Unit: USD [FPGA vendors would need to publish the MSRP (manufacturer's suggested retail price) of their chips - not very likely], or the size of the chip in mm² - easy to obtain.
2. Part of an FPGA System-on-Chip: resource vector - (CLB slices, BRAMs, MULs, DSP units) for Xilinx; (LEs, memory bits, PLLs, MULs, DSP units) for Altera.
3. FPGA prototype of an ASIC implementation: force the implementation to use only reconfigurable logic (no DSPs or multipliers, distributed memory instead of BRAM); use CLB slices as the metric [LEs for Altera].
How to measure hardware cost in ASICs?

1. Stand-alone cryptographic core: cost = f(die area, pin count); tables/formulas available from semiconductor foundries.
2. Part of an ASIC System-on-Chip: cost ~ circuit area. Units: μm² or GE (gate equivalent = the size of a NAND2 cell).