George Mason University Modern FPGA Devices ATHENa - Automated Tool for Hardware EvaluatioN ECE 545 Lecture 11.

Slides:



Advertisements
Similar presentations
© 2003 Xilinx, Inc. All Rights Reserved Course Wrap Up DSP Design Flow.
Advertisements

Commercial FPGAs: Altera Stratix Family Dr. Philip Brisk Department of Computer Science and Engineering University of California, Riverside CS 223.
Reconfigurable Computing (EN2911X, Fall07) Lecture 04: Programmable Logic Technology (2/3) Prof. Sherief Reda Division of Engineering, Brown University.
Internal Logic Analyzer Final presentation-part B
A Survey of Logic Block Architectures For Digital Signal Processing Applications.
EELE 367 – Logic Design Module 2 – Modern Digital Design Flow Agenda 1.History of Digital Design Approach 2.HDLs 3.Design Abstraction 4.Modern Design Steps.
Fast FPGA Resource Estimation Paul Schumacher & Pradip Jha Xilinx, Inc.
Comprehensive environment for benchmarking using FPGAs: ATHENa - Automated Tool for Hardware EvaluatioN 1.
Graduate Computer Architecture I Lecture 15: Intro to Reconfigurable Devices.
1 Performed By: Khaskin Luba Einhorn Raziel Einhorn Raziel Instructor: Rivkin Ina Spring 2004 Spring 2004 Virtex II-Pro Dynamical Test Application Part.
Lecture 26: Reconfigurable Computing May 11, 2004 ECE 669 Parallel Computer Architecture Reconfigurable Computing.
The Spartan 3e FPGA. CS/EE 3710 The Spartan 3e FPGA  What’s inside the chip? How does it implement random logic? What other features can you use?  What.
Configurable System-on-Chip: Xilinx EDK
Programmable logic and FPGA
ECE Department: University of Massachusetts, Amherst Lab 1: Introduction to NIOS II Hardware Development.
George Mason University ECE 448 – FPGA and ASIC Design with VHDL Overview of Modern FPGAs ECE 448 Lecture 14.
ECE 699: Lecture 2 ZYNQ Design Flow.
Implementation of DSP Algorithm on SoC. Mid-Semester Presentation Student : Einat Tevel Supervisor : Isaschar Walter Accompaning engineer : Emilia Burlak.
Basic Adders and Counters Implementation of Adders in FPGAs ECE 645: Lecture 3.
GallagherP188/MAPLD20041 Accelerating DSP Algorithms Using FPGAs Sean Gallagher DSP Specialist Xilinx Inc.
© 2011 Xilinx, Inc. All Rights Reserved This material exempt per Department of Commerce license exception TSU Xilinx Tool Flow.
EKT303/4 PRINCIPLES OF PRINCIPLES OF COMPUTER ARCHITECTURE (PoCA)
ISE. Tatjana Petrovic 249/982/22 ISE software tools ISE is Xilinx software design tools that concentrate on delivering you the most productivity available.
Lecture #3 Page 1 ECE 4110– Sequential Logic Design Lecture #3 Agenda 1.FPGA's 2.Lab Setup Announcements 1.No Class Monday, Labor Day Holiday 2.HW#2 assigned.
George Mason University FPGA Memories ECE 448 Lecture 13.
Ch.9 CPLD/FPGA Design TAIST ICTES Program VLSI Design Methodology Hiroaki Kunieda Tokyo Institute of Technology.
ASIC/FPGA design flow. FPGA Design Flow Detailed (RTL) Design Detailed (RTL) Design Ideas (Specifications) Design Ideas (Specifications) Device Programming.
Electronics in High Energy Physics Introduction to Electronics in HEP Field Programmable Gate Arrays Part 1 based on the lecture of S.Haas.
Lecture #3 Page 1 ECE 4110– Sequential Logic Design Lecture #3 Agenda 1.FPGA's 2.Lab Setup Announcements 1.No Class Monday, Labor Day Holiday 2.HW#2 assigned.
© 2003 Xilinx, Inc. All Rights Reserved For Academic Use Only Xilinx Design Flow FPGA Design Flow Workshop.
SHA-3 Candidate Evaluation 1. FPGA Benchmarking - Phase Round-2 SHA-3 Candidates implemented by 33 graduate students following the same design.
FPGA (Field Programmable Gate Array): CLBs, Slices, and LUTs Each configurable logic block (CLB) in Spartan-6 FPGAs consists of two slices, arranged side-by-side.
VHDL Project Specification Naser Mohammadzadeh. Schedule  due date: Tir 18 th 2.
Tools - Implementation Options - Chapter15 slide 1 FPGA Tools Course Implementation Options.
Introduction to FPGA Created & Presented By Ali Masoudi For Advanced Digital Communication Lab (ADC-Lab) At Isfahan University Of technology (IUT) Department.
Field Programmable Gate Arrays (FPGAs) An Enabling Technology.
1ECE 545 – Introduction to VHDL Project Deliverables.
George Mason University ATHENa - Automated Tool for Hardware EvaluatioN Modern FPGA Families ECE 545 Lecture 12.
Lecture 16: Reconfigurable Computing Applications November 3, 2004 ECE 697F Reconfigurable Computing Lecture 16 Reconfigurable Computing Applications.
Lecture #3 Page 1 ECE 4110–5110 Digital System Design Lecture #3 Agenda 1.FPGA's 2.Lab Setup Announcements 1.HW#2 assigned Due.
ECE 545 Project 2 Specification. Schedule of Projects (1) Project 1 RTL design for FPGAs (20 points) Due date: Tuesday, November 22, midnight (firm) Checkpoints:
ACCESS IC LAB Graduate Institute of Electronics Engineering, NTU FPGA Design with Xilinx ISE Presenter: Shu-yen Lin Advisor: Prof. An-Yeu Wu 2005/6/6.
ECE 545 Project 2 Specification. Project 2 (15 points) – due Tuesday, December 19, noon Application: cryptography OR digital signal processing optimized.
Introductory project. Development systems Design Entry –Foundation ISE –Third party tools Mentor Graphics: FPGA Advantage Celoxica: DK Design Suite Design.
EKT303/4 PRINCIPLES OF PRINCIPLES OF COMPUTER ARCHITECTURE (PoCA)
Timing and Constraints “The software is the lens through which the user views the FPGA.” -Bill Carter.
A Physical Resource Management Approach to Minimizing FPGA Partial Reconfiguration Overhead Heng Tan and Ronald F. DeMara University of Central Florida.
Introduction to FPGA Tools
Tools - Design Manager - Chapter 6 slide 1 Version 1.5 FPGA Tools Training Class Design Manager.
Tools - LogiBLOX - Chapter 5 slide 1 FPGA Tools Course The LogiBLOX GUI and the Core Generator LogiBLOX L BX.
© 2003 Xilinx, Inc. All Rights Reserved Course Wrap Up DSP Design Flow.
ESS | FPGA for Dummies | | Maurizio Donna FPGA for Dummies Basic FPGA architecture.
& FPGA Embedded Resources
Introduction to Field Programmable Gate Arrays Lecture 1/3 CERN Accelerator School on Digital Signal Processing Sigtuna, Sweden, 31 May – 9 June 2007 Javier.
DDRIII BASED GENERAL PURPOSE FIFO ON VIRTEX-6 FPGA ML605 BOARD PART B PRESENTATION STUDENTS: OLEG KORENEV EUGENE REZNIK SUPERVISOR: ROLF HILGENDORF 1 Semester:
Teaching Digital Logic courses with Altera Technology
Survey of Reconfigurable Logic Technologies
George Mason University ECE 448 – FPGA and ASIC Design with VHDL FPGA Devices ECE 448 Lecture 5.
George Mason University FPGA Memories ATHENa - Automated Tool for Hardware EvaluatioN ECE 545 Lecture 10.
Delivered by.. Love Jain p08ec907. Design Styles  Full-custom  Cell-based  Gate array  Programmable logic Field programmable gate array (FPGA)
George Mason University ATHENa - Automated Tool for Hardware EvaluatioN ECE 545 Lecture 12.
A Brief Introduction to FPGAs
ATHENa - Automated Tool for Hardware EvaluatioN
Introduction to Programmable Logic
Head-to-Head Xilinx Virtex-II Pro Altera Stratix 1.5v 130nm copper
ECE 4110–5110 Digital System Design
Project Deliverables ECE 545 – Introduction to VHDL.
Basic Adders and Counters Implementation of Adders
THE ECE 554 XILINX DESIGN PROCESS
THE ECE 554 XILINX DESIGN PROCESS
Presentation transcript:

George Mason University Modern FPGA Devices ATHENa - Automated Tool for Hardware EvaluatioN ECE 545 Lecture 11

2 Required Reading Xilinx, Inc. Virtex-5 FPGA Family Virtex-5 FPGA User Guide Chapter 5: Configurable Logic Blocks (CLBs)

3 Required Reading Altera, Inc. Stratix III FPGA Family Stratix III Device Handbook 1. Stratix III Device Family Overview 2. Logic Array Blocks and Adaptive Logic Modules in Stratix III Devices

TechnologyLow-costHigh- performance 120/150 nmVirtex 2, 2 Pro 90 nmSpartan 3Virtex 4 65 nmVirtex 5 45 nmSpartan 6 40 nmVirtex 6 Xilinx FPGA Devices

Altera FPGA Devices TechnologyLow-costMid-rangeHigh- performanc e 130 nmCycloneStratix 90 nmCyclone IIStratix II 65 nmCyclone IIIArria IStratix III 40 nmCyclone IVArria IIStratix IV

ECE 448 – FPGA and ASIC Design with VHDL High-Performance Xilinx FPGAs

Virtex 5 Arrangement of Slices within the CLB

Row and Column Relationship between CLBs and Slices

Major Differences between Xilinx Families Number of CLB slices per CLB Number of LUTs per CLB slice Look-Up Tables Spartan 3 Virtex 4 Virtex 5, Virtex 6, Spartan 6 4-input6-input

Distributed RAM Configurations

64 x 1 Single Port

64 x 1 Dual Port

64 x 1 Quad Port

64 x 3 Simple Dual Port

ROM Configurations

32-bit Shift Register, SRL

32-bit Shift Register

Dual 16-bit Shift Register

64-bit Shift Register

96-bit Shift Register

Fast Carry Logic Path

Major Differences between Xilinx Families Maximum Shift Register Size per LUT Maximum Single-Port Memory Size per LUT Number of adder stages per CLB slice Spartan 3 Virtex 4 Virtex 5, Virtex 6, Spartan 6 16 x 164 x 1 16 bits 2 32 bits 4

ECE 448 – FPGA and ASIC Design with VHDL Low-cost Altera FPGAs

Altera Cyclone III Logic Element (LE) – Normal Mode

Altera Cyclone III Logic Element (LE) – Arithmetic Mode

ECE 448 – FPGA and ASIC Design with VHDL High-Performance Altera FPGAs

Stratix III Logic Array Blocks (LABs)

High-Level Block Diagram of the Stratix III ALM

Altera Stratix III Adaptive Logic Modules (ALM) – Normal Mode

4 × 2 Crossbar Switch Example

Register Packing

Template for Seven-Input Functions Supported in Extended LUT Mode

Altera Stratix III, Stratix IV Adaptive Logic Modules (ALM) – Arithmetic Mode

R = (X < Y) ? Y : X Performing Operation

Three Operand Addition Utilizing Shared Arithmetic Mode

LUT-Register Mode

Register Chain

Example of Resource Utilization Report (1) ; Fitter Resource Usage Summary ; ; Resource ; Usage ; ; ALUTs Used ; 415 / 38,000 ( 1 % ) ; ; -- Combinational ALUTs ; 415 / 38,000 ( 1 % ) ; ; -- Memory ALUTs ; 0 / 19,000 ( 0 % ) ; ; -- LUT_REGs ; 0 / 38,000 ( 0 % ) ; ; Dedicated logic registers ; 136 / 38,000 ( < 1 % ) ; ; ; ; ; Combinational ALUT usage by number of inputs ; ; ; -- 7 input functions ; 0 ; ; -- 6 input functions ; 287 ; ; -- 5 input functions ; 0 ; ; -- 4 input functions ; 24 ; ; -- <=3 input functions ; 104 ; ; ; ; ; Combinational ALUTs by mode ; ; ; -- normal mode ; 335 ; ; -- extended LUT mode ; 0 ; ; -- arithmetic mode ; 80 ; ; -- shared arithmetic mode ; 0 ;

Example of Resource Utilization Report (2) ; Logic utilization ; 701 / 38,000 ( 2 % ) ; ; -- Difficulty Clustering Design ; Low ; ; -- Combinational ALUT/register pairs used in final Placement ; 476 ; ; -- Combinational with no register ; 340 ; ; -- Register only ; 61 ; ; -- Combinational with a register ; 75 ; ; -- Estimated pairs recoverable by pairing ALUTs and registers as design grows ; -54 ; ; -- Estimated Combinational ALUT/register pairs unavailable ; 279 ; ; -- Unavailable due to Memory LAB use ; 0 ; ; -- Unavailable due to unpartnered 7 LUTs ; 0 ; ; -- Unavailable due to unpartnered 6 LUTs ; 279 ; ; -- Unavailable due to unpartnered 5 LUTs ; 0 ; ; -- Unavailable due to LAB-wide signal conflicts ; 0 ; ; -- Unavailable due to LAB input limits ; 0 ;

Example of Resource Utilization Report (3) ; Total registers* ; 136 ; ; -- Dedicated logic registers ; 136 / 38,000 ( < 1 % ) ; ; -- I/O registers ; 0 / 2,752 ( 0 % ) ; ; -- LUT_REGs ; 0 ; ; ALMs: partially or completely used ; 360 / 19,000 ( 2 % ) ; ; Total LABs: partially or completely used ; 42 / 1,900 ( 2 % ) ; ; -- Logic LABs ; 42 / 42 ( 100 % ) ; ; -- Memory LABs ; 0 / 42 ( 0 % ) ; ; ; ; ; User inserted logic elements ; 0 ; ; Virtual pins ; 0 ; ; I/O pins ; 20 / 488 ( 4 % ) ; ; -- Clock pins ; 5 / 16 ( 31 % ) ; ; -- Dedicated input pins ; 0 / 12 ( 0 % ) ; ; Global signals ; 2 ; ; M9K blocks ; 0 / 108 ( 0 % ) ; ; M144K blocks ; 0 / 6 ( 0 % ) ; ; Total MLAB memory bits ; 0 ; ; Total block memory bits ; 0 / 1,880,064 ( 0 % ) ; ; Total block memory implementation bits ; 0 / 1,880,064 ( 0 % ) ; ; DSP block 18-bit elements ; 0 / 216 ( 0 % ) ; ; PLLs ; 0 / 4 ( 0 % ) ; ; Global clocks ; 2 / 16 ( 13 % ) ;

George Mason University ATHENa

42 Resources ATHENa website

43 ATHENa – Automated Tool for Hardware EvaluatioN Supported in part by the National Institute of Standards & Technology (NIST)

ATHENa Team Venkata “Vinny” MS CpE student Ekawat “Ice” PhD CpE student Marcin PhD ECE student Rajesh PhD ECE student Michal PhD exchange student from Slovakia John MS CpE student

ATHENa – A utomated T ool for H ardware E valuatio N 45 Benchmarking open-source tool, written in Perl, aimed at an AUTOMATED generation of OPTIMIZED results for MULTIPLE hardware platforms Currently under development at George Mason University.

Why Athena? 46 "The Greek goddess Athena was frequently called upon to settle disputes between the gods or various mortals. Athena Goddess of Wisdom was known for her superb logic and intellect. Her decisions were usually well-considered, highly ethical, and seldom motivated by self-interest.” from "Athena, Greek Goddess of Wisdom and Craftsmanship"

ATHENa Server FPGA Synthesis and Implementation Result Summary + Database Entries 2 3 HDL + scripts + configuration files 1 Database Entries Download scripts and configuration files8 Designer 4 HDL + FPGA Tools User Database query Ranking of designs 5 6 Basic Dataflow of ATHENa 0 Interfaces + Testbenches 47

48 synthesizable source files configuration files testbench constraint files result summary (user-friendly) result summary (user-friendly) database entries (machine- friendly) database entries (machine- friendly)

ATHENa Major Features (1) synthesis, implementation, and timing analysis in batch mode support for devices and tools of multiple FPGA vendors: generation of results for multiple families of FPGAs of a given vendor automated choice of a best-matching device within a given family 49

ATHENa Major Features (2) automated verification of designs through simulation in batch mode support for multi-core processing automated extraction and tabulation of results several optimization strategies aimed at finding – optimum options of tools – best target clock frequency – best starting point of placement OR 50

51 batch mode of FPGA tools ease of extraction and tabulation of results Text Reports, Excel, CSV (Comma-Separated Values) optimized choice of tool options GMU_optimization_1 strategy Generation of Results Facilitated by ATHENa vs.

52 Relative Improvement of Results from Using ATHENa Virtex 5, 256-bit Variants of Hash Functions Ratios of results obtained using ATHENa suggested options vs. default options of FPGA tools

53 Other (Somewhat) Similar Tools ExploreAhead (part of PlanAhead) Design Space Explorer (DSE) Boldport Flow EDAx10 Cloud Platform

54 Distinguishing Features of ATHENa Support for multiple tools from multiple vendors Optimization strategies aimed at the best possible performance rather than design closure Extraction and presentation of results Seamless integration with the ATHENa database of results

Read the Tutorial! Install the Required Tools (see Tutorial - Part 1 – Tools Installation) Run ATHENa_setup How To Start Working With ATHENa? One-Time Tasks Download and unzip ATHENa

Modify design.config.txt + possibly other configuration files Run ATHENa How To Start Working With ATHENa? Repetitive Tasks Prepare or modify your source files & source_list.txt

design.config.txt Your Design # directory containing synthesizable source files for the project SOURCE_DIR = # A file list containing list of files in the order suitable for synthesis and implementation # low level modules first, top level entity last SOURCE_LIST_FILE = source_list.txt # project name # it will be used in the names of result directories PROJECT_NAME = SHA256 # name of top level entity TOP_LEVEL_ENTITY = sha256 # name of top level architecture TOP_LEVEL_ARCH = rs_arch # name of clock net CLOCK_NET = clk

design.config.txt Timing Formulas #formula for latency LATENCY = TCLK*65 #formula for throughput THROUGHPUT = 512/(TCLK*65)

design.config.txt Application & Optimization Target # OPTIMIZATION_TARGET = speed | area | balanced OPTIMIZATION_TARGET = speed # OPTIONS = default | user OPTIONS = default # APPLICATION = single_run | exhaustive_search | placement_search | frequency_search | # GMU_Optimization_1 | GMU_Xilinx_optimization_1 APPLICATION = single_run # TRIM_MODE = off | zip | delete TRIM_MODE = zip

design.config.txt FPGA Families # commenting the next line removes all families of Xilinx FPGA_VENDOR = xilinx #commenting the next line removes a given family FPGA_FAMILY = spartan3 # FPGA_DEVICES = | best_match | all FPGA_DEVICES = best_match SYN_CONSTRAINT_FILE = default IMP_CONSTRAINT_FILE = default REQ_SYN_FREQ = 120 REQ_IMP_FREQ = 100 MAX_SLICE_UTILIZATION = 0.8 MAX_BRAM_UTILIZATION = 0.8 MAX_MUL_UTILIZATION = 1 MAX_PIN_UTILIZATION = 0.9 END FAMILY END VENDOR

design.config.txt FPGA Families # commenting the next line removes all families of Altera FPGA_VENDOR = altera #commenting the next line removes a given family FPGA_FAMILY = Stratix III # FPGA_DEVICES = | best_match | all FPGA_DEVICES = best_match SYN_CONSTRAINT_FILE = default IMP_CONSTRAINT_FILE = default REQ_IMP_FREQ = 120 MAX_LOGIC_UTILIZATION = 0.8 MAX_MEMORY_UTILIZATION = 0.8 MAX_DSP_UTILIZATION = 0 MAX_MUL_UTILIZATION = 0 MAX_PIN_UTILIZATION = 0.8 END FAMILY END VENDOR

Library Files device_lib/xilinx_device_lib.txt device_lib/altera_device_lib.txt Files created during ATHENa setup Characterize FPGA families and devices available in the version of Xilinx and Altera tools installed on your computer Currently supported tool versions: – Xilinx WebPACK 9.1, 9.2, 10.1, 11.1, 11.5, 12.1, 12.2, 12.3 – Xilinx Design Suite11.1, 12.1, 12.2, 12.3 – Altera Quartus II Web Edition8.1, 8.2, 9.0, 9.1, 10.0 – Altera Quartus II Subscription Edition9.1, 10.0 In case a library for a given version not available yet, use a library from the closest available version

Library Files device_lib/xilinx_device_lib.txt VENDOR = Xilinx #Device, Total Slices, Block RAMs, DSP, Dedicated Multipliers, Maximum User I/O Pins ITEM_ORDER = SLICE, BRAM, DSP, MULT, IO FAMILY = spartan3 xc3s50pq208-5, 768,4, 0, 4, 124 xc3s200ft256-5, 1920, 12, 0, 12, 173 xc3s400fg456-5, 3584, 16, 0, 16, 264 xc3s1000fg676-5, 7680, 24, 0, 24, 391 xc3s1500fg676-5, 13312, 32, 0, 32, 487 END_FAMILY FAMILY = virtex5 xc5vlx30ff676-3, 4800, 32, 32, 0, 400 xc5vfx30tff665-3, 5120, 68, 64, 0, 360 xc5vlx30tff665-3, 4800, 36, 32, 0, 360 xc5vlx50ff1153-3, 7200, 48, 48, 0, 560 xc5vlx50tff1136-3, 7200, 60, 48, 0, 480 END_FAMILY

Result Files report_resource_utilization.txt xilinx : spartan | GENERIC | DEVICE | RUN | LUTs | % | SLICES | % | BRAMs | % | MULTs | % | DSPs | % | IO | % | | default | xc3s200ft256-5* | 1 | 142 | 3 | 74 | 3 | 4 | 33 | 7 | 58 | 0 | 0 | 20 | 11 | xilinx : spartan | GENERIC | DEVICE | RUN | LUTs | % | SLICES | % | BRAMs | % | MULTs | % | DSPs | % | IO | % | | default | xc6slx9csg324-3* | 1 | 41 | 1 | 22 | 1 | 4 | 6 | 0 | 0 | 9 | 56 | 20 | 10 | xilinx : virtex | GENERIC | DEVICE | RUN | LUTs | % | SLICES | % | BRAMs | % | MULTs | % | DSPs | % | IO | % | | default | xc5vlx20tff323-2* | 1 | 101 | 1 | 56 | 1 | 4 | 15 | 0 | 0 | 9 | 37 | 20 | 11 | xilinx : virtex | GENERIC | DEVICE | RUN | LUTs | % | SLICES | % | BRAMs | % | MULTs | % | DSPs | % | IO | % | | default | xc6vlx75tff784-3* | 1 | 44 | 1 | 21 | 1 | 4 | 1 | 0 | 0 | 9 | 3 | 20 | 5 |

Result Files report_timing.txt REQ SYN FREQ- Requested synthesis clk freq.SYN FREQ – Achieved synthesis clk. freq. REQ SYN TCLK- Requested synthesis clk periodSYN TCLK – Achieved synthesis clk. period REQ IMP FREQ- Requested implement. clk freq.IMP FREQ – Achieved implement. clk. freq. REQ IMP TCLK- Requested implement. clk periodIMP TCLK – Achieved implement clk. period LATENCY- Latency [ns]THROUGHPUT – Throughput [Mbits/s] TP/Area - Throughput/Area [(Mbits/s)/CLB slicesLatency*Area – Latency*Area [ns*CLB slices] xilinx : spartan | GENERIC | DEVICE | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area | Latency*Area | | default | xc3s200ft256-5* | 1 | default | | default | | default | | default | | | | | | xilinx : spartan | GENERIC | DEVICE | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area | Latency*Area | | default | xc6slx9csg324-3* | 1 | default | | default | | default | | default | | | | | | xilinx : virtex | GENERIC | DEVICE | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area | Latency*Area | | default | xc5vlx20tff323-2* | 1 | default | | default | | default | | default | | | | | | xilinx : virtex | GENERIC | DEVICE | RUN | REQ SYN FREQ | SYN FREQ | REQ SYN TCLK | SYN TCLK | REQ IMP FREQ | IMP FREQ | REQ IMP TCLK | IMP TCLK | LATENCY | THROUGHPUT | TP/Area | Latency*Area | | default | xc6vlx75tff784-3* | 1 | default | | default | | default | | default | | | | | |

Result Files report_options.txt xilinx : spartan | GENERIC | DEVICE | RUN | COST TABLE | Synthesis Options | Map Options | PAR Options | | default | xc3s200ft256-5* | 1 | 1 | -opt_level 1 -opt_mode speed | -c 100 -pr b -cm speed | -w -ol std | xilinx : spartan | GENERIC | DEVICE | RUN | COST TABLE | Synthesis Options | Map Options | PAR Options | | default | xc6slx9csg324-3* | 1 | 1 | -opt_level 1 -opt_mode speed | -c 100 -pr b | -w -ol std | xilinx : virtex | GENERIC | DEVICE | RUN | COST TABLE | Synthesis Options | Map Options | PAR Options | | default | xc5vlx20tff323-2* | 1 | 1 | -opt_level 1 -opt_mode speed | -c 100 -pr b -cm speed | -w -ol std | xilinx : virtex | GENERIC | DEVICE | RUN | COST TABLE | Synthesis Options | Map Options | PAR Options | | default | xc6vlx75tff784-3* | 1 | 1 | -opt_level 1 -opt_mode speed | -c 100 -pr b | -w -ol std | COST TABLE - parameter determining the starting point of placement Synthesis Options – options of the synthesis tool Map Options – Options of the mapping tool PAR Options – Options of the place & route tool

Result Files report_execution_time.txt xilinx : spartan | GENERIC | DEVICE | RUN | Synthesis Time | Implementation Time | Elapsed Time | | default | xc3s200ft256-5* | 1 | 0d 0h:0m:12s | 0d 0h:0m:36s | 0d 0h:0m:48s | xilinx : spartan | GENERIC | DEVICE | RUN | Synthesis Time | Implementation Time | Elapsed Time | | default | xc6slx9csg324-3* | 1 | 0d 0h:0m:21s | 0d 0h:1m:13s | 0d 0h:1m:34s | xilinx : virtex | GENERIC | DEVICE | RUN | Synthesis Time | Implementation Time | Elapsed Time | | default | xc5vlx20tff323-2* | 1 | 0d 0h:0m:39s | 0d 0h:1m:50s | 0d 0h:2m:29s | xilinx : virtex | GENERIC | DEVICE | RUN | Synthesis Time | Implementation Time | Elapsed Time | | default | xc6vlx75tff784-3* | 1 | 0d 0h:0m:22s | 0d 0h:3m:22s | 0d 0h:3m:44s | Synthesis Time- Time of Synthesis Implementation Time- Time of Implementation Elapsed Time - Total Time

design.config.txt Functional Simulation (1) # FUNCTIONAL_VERFICATION_MODE = FUNCTIONAL_VERIFICATION_MODE = # directory containing source files of the testbench VERIFICATION_DIR = # A file containing a list of testbench files in the order suitable for compilation; # low level modules first, top level entity last. # Test vector files should be located in the same directory and listed # in the same file, unless fixed path is used. Please refer to tutorial for more detail. VERIFICATION_LIST_FILE = # name of testbench's top level entity TB_TOP_LEVEL_ENTITY = # name of testbench's top level architecture TB_TOP_LEVEL_ARCH =

design.config.txt Functional Simulation (2) # MAX_TIME_FUNCTIONAL_VERIFICATION = #supported unit are : ps, ns, us, and ms #if blank, simulation will run until it finishes = # = no changes in signals, i.e., clock is stopped and no more inputs coming in. MAX_TIME_FUNCTIONAL_VERIFICATION = <> # Perform only verification (synthesis and implementation parameters are ignored) # VERIFICATION_ONLY = VERIFICATION_ONLY =

ATHENa Example including embedded FPGA resources test_circuit :

design.config.txt Global Generics GLOBAL_GENERICS_BEGIN # Number of stages # n is currently set to the default value i.e n=16 # for other values of n, modify the formulas for Latency and Throughput accordingly n = 16 # Memory type: 0 = MEM_DISTRIBUTED, 1= MEM_EMBEDDED mem_type = 0, 1 # Adder type: 0 = ADD_SCCA_BASED (Simple Carry Chain Adder, "+" in VHDL), # 1 = ADD_DSP_BASED # Multiplier type: 0 = MUL_LOGIC_BASED (multiplier based on configurable logic), # 1 = MUL_DSP_BASED # Allowed combinations of adder and multiplier types (adder_type, multiplier_type) = (0, 0), (1, 1) GLOBAL_GENERICS_END

design.config.txt FPGA Family Specific Generics FPGA_FAMILY = Cyclone II GENERICS_BEGIN # FPGA vendor: 0 = XILINX, 1 = ALTERA vendor = 1 # Memory block size: 0 = M512, 1 = M4K, 2 = M9K, 3 = M20K, # 4 = MLAB, 5 = MRAM, 6 = M144K mem_block_size = 1 GENERICS_END FPGA_DEVICES = best_match REQ_IMP_FREQ = 120 MAX_LOGIC_UTILIZATION = 0.8 MAX_MEMORY_UTILIZATION = 0.8 MAX_DSP_UTILIZATION = 1 MAX_MUL_UTILIZATION = 1 MAX_PIN_UTILIZATION = 0.8 END FAMILY

73 ATHENa – Database of Results ATHENa – Database of Results

74 ATHENa Database

75 ATHENa Database – Result View Algorithm parameters Design parameters  Optimization target  Architecture type  Datapath width  I/O bus widths  Availability of source code  Platform  Vendor, Family, Device  Timing  Maximum clock frequency  Maximum throughput  Resource utilization  Logic blocks (Slices/LEs/ALUTs)  Multipliers/DSP units  Tools  Names & versions  Detailed options  Credits  Designers & contact information

76 ATHENa Database – Compare Feature Matching fields in grey Non-matching fields in red and blue

77 Currently in the Database 20 hash functions ( 14 Round 2 SHA Round 3 SHA-3 + SHA-2 ) x 2 variants ( 256-bit output & 512-bit output ) x 11 FPGA families = 440 combinations (440-not_fitting) = 423 optimized results Hash Functions in FPGAs GMU Results for

78 Coming soon! GMU results for Hash Functions in FPGAs  Folded & unrolled architectures  Pipelined architectures  Lightweight architectures  Architectures based on embedded resources Other Groups’ results for Hash Functions in FPGAs Other Groups’ results for Hash Functions in ASICs Modular Arithmetic (basis of public key cryptography) in FPGAs & ASICs

79 Possible Future Customizations The same basic database can be customized and adapted for other domains, such as Digital Signal Processing Bioinformatics Communications Scientific Computing, etc.

80 ATHENa - Website

81 ATHENa Website Download of ATHENa Tool Links to related tools SHA-3 Competition in FPGAs & ASICs Specifications of candidates Interface proposals RTL source codes Testbenches ATHENa database of results Related papers & presentations

82 First batch of GMU Source Codes for all Round 3 SHA-3 Candidates & SHA-2 made available at the ATHENa website at: Included in this release: Basic architectures Folded architectures Unrolled architectures Each code supports two variants: with 256-bit and 512-bit output. Each source code accompanied by comprehensive hierarchical block diagrams GMU Source Codes and Block Diagrams

83 ATHENa Result Replication Files Scripts and configuration files sufficient to easily reproduce all results (without repeating optimizations) Automatically created by ATHENa for all results generated using ATHENa Stored in the ATHENa Database In the same spirit of Reproducible Research as: Patrick Vandewalle 1, Jelena Kovacevic 2, and Martin Vetterli 1 ( 1 EPFL, 2 CMU) Reproducible research in signal processing - what, why, and how. IEEE Signal Processing Magazine, May J. Claerbout (Stanford University) “Electronic documents give reproducible research a new meaning,” in Proc. 62nd Ann. Int. Meeting of the Soc. of Exploration Geophysics, 1992,

84 Benchmarking Goals Facilitated by ATHENa 1.cryptographic algorithms 2.hardware architectures or implementations of the same cryptographic algorithm 3.hardware platforms from the point of view of their suitability for the implementation of a given algorithm, (e.g., choice of an FPGA device or FPGA board) 4.tools and languages in terms of quality of results they generate (e.g. Verilog vs. VHDL, Synplicity Synplify Premier vs. Xilinx XST, ISE v vs. ISE v. 12.3) Comparing multiple: