Hardware Implementation of a Memetic Algorithm for VLSI Circuit Layout Stephen Coe MSc Engineering Candidate Advisors: Dr. Shawki Areibi Dr. Medhat Moussa.

Presentation transcript:


Topic Overview
• Introduction
• Background
 – Circuit Partitioning (CP)
 – Handel-C vs. VHDL
 – Memetic Algorithm
• Research Challenges
• Hardware Approach
• Current Status and Future Work

Introduction
• Today's technology allows billions of transistors to be integrated into a single circuit
• As these transistors become smaller, interconnect delay becomes the limiting factor in computer execution speed
• These factors place increasing importance on CAD tools that minimize interconnect length
• As FPGAs become larger and faster, new methods for improving algorithm performance become available
[Figure: delay (ns) versus minimum feature size (2.0 µ, 1.5 µ, 1.0 µ, 0.8 µ, 0.5 µ, 0.35 µ), comparing typical gate delay with interconnect delay]

Circuit Partitioning
• Method of splitting complex designs into smaller subsystems
• Attempts to minimize the connections between subsystems
• The objective is to maximize the number of uncut nets
 – The longer the interconnects between modules, the longer the delay within the circuit
[Figure: modules M0 to M5 connected by Net 1 to Net 5]
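The objective above can be made concrete with a small C sketch. It assumes a net is stored as a short list of module indices and a partition as an array giving each module's block; the `Net` type, `MAX_PINS` and the function names are illustrative and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PINS 4   /* assumed upper bound on modules per net */

/* One net: the indices of the modules it connects. */
typedef struct {
    int pins[MAX_PINS];
    int num_pins;
} Net;

/* Returns 1 if every module on the net lies in the same block, else 0. */
static int net_is_uncut(const Net *net, const int *block_of)
{
    assert(net->num_pins >= 1 && net->num_pins <= MAX_PINS);
    for (int i = 1; i < net->num_pins; i++) {
        if (block_of[net->pins[i]] != block_of[net->pins[0]]) {
            return 0;
        }
    }
    return 1;
}

/* Objective value: the number of nets that do not cross the partition. */
int count_uncut_nets(const Net *nets, size_t num_nets, const int *block_of)
{
    int uncut = 0;
    for (size_t n = 0; n < num_nets; n++) {
        uncut += net_is_uncut(&nets[n], block_of);
    }
    return uncut;
}
```

For a hypothetical netlist with nets {M0,M1}, {M1,M2}, {M2,M3,M4}, {M3,M5} and {M4,M5}, placing M0 to M2 in block 0 and M3 to M5 in block 1 cuts only the three-module net, so the objective is 4.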

Development Tools: Celoxica DK Design Suite
• High-level language based on ISO/ANSI-C for the implementation of algorithms in hardware
• Allows software engineers to design hardware without retraining
• Can generate VHDL code or an EDIF file
• Supports many Actel, Altera and Xilinx devices
• Uses third-party placement and routing programs to generate bit files
[Figure: design flow. Handel-C source files, compile, generate EDIF (netlist) or generate VHDL/Verilog, simulate and netlist, place and route tools, bitstream generation]

Handel-C & ISO C: Similarities and Differences
• Similarities
 – Preprocessor directives (#define, #ifdef, etc.)
 – Casting between variable types
 – Function declarations are the same
 – Registers are declared as variables (e.g. int, unsigned)
 – for, while and do loops
• Differences
 – No float or double in Handel-C
 – Variables in Handel-C can be declared with undefined widths
 – No recursive function calls
 – Inline functions generate entirely new hardware
 – No malloc or free (hardware cannot allocate memory dynamically)
 – Data can be read in for simulation only
 – Parallelism exists

Memory in Handel-C
• Memory Access Advantages
 – Memory is accessed as an array: MemoryData[1024] = WriteData;
 – Allows multi-dimensional memory access
 – Type of memory is easily distinguishable: block RAM, external RAM, logic RAM
 – Memory data is accessed within 1 clock
 – No specific timing required
• Memory Access Disadvantage
 – Divides the operating clock frequency by 4
[Figure: timing diagram with External Clock, Handel-C Clock, Write Enable and Data signals]

Parallel Execution in Handel-C
• Parallel Execution
 – par{ } command
 – [Figure: two branches over Clock 1 to Clock 4; one branch waits for the right-hand branch to finish executing]
• Channel Communication
 – Allows parallel components to talk to each other
 – [Figure: two parallel components joined by a channel]
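A rough C model of the timing shown on this slide may help. It assumes the usual Handel-C rule that a par{ } block completes only when its slowest branch completes, so shorter branches wait. The cycle counts and function names are illustrative, not measurements from the design.

```c
#include <assert.h>
#include <stddef.h>

/* Cycles when the branches run one after another: the sum of their lengths. */
unsigned sequential_cycles(const unsigned *branch_cycles, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++) {
        total += branch_cycles[i];
    }
    return total;
}

/* Cycles when the branches start together inside a par block: the block
 * ends with its longest branch, and the shorter branches wait. */
unsigned parallel_cycles(const unsigned *branch_cycles, size_t n)
{
    assert(n > 0);
    unsigned longest = branch_cycles[0];
    for (size_t i = 1; i < n; i++) {
        if (branch_cycles[i] > longest) {
            longest = branch_cycles[i];
        }
    }
    return longest;
}
```

With one branch of 2 cycles and one of 4 cycles, the sequential version takes 6 cycles and the parallel version takes 4, with the shorter branch idle for the last 2.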

Memetic Algorithm
A genetic/evolutionary algorithm which includes a non-genetic local search to improve solutions
• Genetic Algorithm
 – Population-based heuristic technique modelled on the biological reproductive system
 – Operates on the theory of "survival of the fittest"
 – Good at exploring the solution space
• Local Search
 – Iterative improvement algorithms
 – Often get trapped in suboptimal solutions
 – Good at exploiting the solution space
 – Success is dependent on good starting solutions
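The combination described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the design from the talk: the chromosome is a 16-bit string, the fitness is a toy "count the 1 bits" objective, and the operators (binary tournament, one-point crossover, single-bit mutation, replace-worst) and all constants are common textbook choices picked for brevity.

```c
#include <assert.h>

#define N_BITS   16   /* chromosome length (assumed) */
#define POP_SIZE 8    /* population size (assumed) */
#define N_GENS   20   /* number of generations (assumed) */

typedef unsigned (*FitnessFn)(unsigned chrom);

/* Toy objective for the demo only: the number of 1 bits. */
unsigned count_ones(unsigned chrom)
{
    unsigned ones = 0;
    for (int b = 0; b < N_BITS; b++) {
        ones += (chrom >> b) & 1u;
    }
    return ones;
}

/* Small deterministic generator so that runs are repeatable. */
static unsigned next_rand(unsigned *state)
{
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7fffu;
}

/* Local search: keep any single-bit flip that improves the fitness and
 * stop when a full pass finds no improvement. */
unsigned local_search(unsigned chrom, FitnessFn fit)
{
    int improved = 1;
    while (improved) {
        improved = 0;
        for (int b = 0; b < N_BITS; b++) {
            unsigned trial = chrom ^ (1u << b);
            if (fit(trial) > fit(chrom)) {
                chrom = trial;
                improved = 1;
            }
        }
    }
    return chrom;
}

/* Binary tournament: pick two random individuals and return the fitter. */
static unsigned tournament(const unsigned *pop, FitnessFn fit, unsigned *rng)
{
    unsigned a = pop[next_rand(rng) % POP_SIZE];
    unsigned b = pop[next_rand(rng) % POP_SIZE];
    return fit(a) >= fit(b) ? a : b;
}

/* Memetic loop: genetic operators explore, local search refines. */
unsigned memetic_search(FitnessFn fit, unsigned seed)
{
    unsigned pop[POP_SIZE];
    unsigned rng = seed;
    unsigned mask = (1u << N_BITS) - 1u;

    for (int i = 0; i < POP_SIZE; i++) {
        unsigned hi = next_rand(&rng);
        unsigned lo = next_rand(&rng);
        pop[i] = ((hi << 15) | lo) & mask;
    }
    for (int g = 0; g < N_GENS; g++) {
        unsigned p1 = tournament(pop, fit, &rng);
        unsigned p2 = tournament(pop, fit, &rng);
        /* One-point crossover: low bits from p1, high bits from p2. */
        unsigned cut = 1u + next_rand(&rng) % (N_BITS - 1);
        unsigned low = (1u << cut) - 1u;
        unsigned child = (p1 & low) | (p2 & ~low & mask);
        /* Mutation: flip one random bit. */
        child ^= 1u << (next_rand(&rng) % N_BITS);
        /* The non-genetic step that makes the algorithm memetic. */
        child = local_search(child, fit);
        /* Replacement: overwrite the worst individual if the child is better. */
        int worst = 0;
        for (int i = 1; i < POP_SIZE; i++) {
            if (fit(pop[i]) < fit(pop[worst])) worst = i;
        }
        if (fit(child) > fit(pop[worst])) pop[worst] = child;
    }
    unsigned best = pop[0];
    for (int i = 1; i < POP_SIZE; i++) {
        if (fit(pop[i]) > fit(best)) best = pop[i];
    }
    return best;
}
```

Calling `memetic_search(count_ones, 1u)` returns the all-ones string for this toy objective, because the local search alone can solve it. Harder objectives such as circuit partitioning are where the genetic operators matter, since they supply new starting points once the local search stalls.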

[Figure labels: Not Global Minimum, Genetic Algorithm, Local Search]

Research Challenges
• Memetic Algorithms
 – Improve the computational performance (CPU time) of the algorithm
 – Exploit the inherently parallel nature of genetic algorithms
• Hardware Development Languages
 – Determine the impact of high-level languages vs. low-level languages

Approach
• Explore the most efficient design for implementing memetic algorithms on a single FPGA chip
• Achieve increased performance through pipelining and parallelization
 – Divide the tasks into separate but concurrent components
[Figure: FPGA chip divided among the different tasks of the algorithm]

Genetic Algorithm in Hardware
[Figure: block diagrams of two architectures built from Selection, Crossover, Mutation, Repair, Fitness and Replacement modules. In the first, duplicated Mutation, Repair and Fitness modules handle Offspring 1 and Offspring 2 side by side. In the second (pipelined approach), Offspring 1, Offspring 2 and Offspring 3 occupy successive stages at the same time.]
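The benefit of the pipelined approach can be estimated with a short C sketch. It assumes, purely for illustration, that each of the six modules (Selection, Crossover, Mutation, Repair, Fitness, Replacement) takes one equal time slot; the slides give no per-module timings, and the function names are hypothetical.

```c
#include <assert.h>

/* Time slots needed when each offspring must pass through every stage
 * before the next offspring may start. */
unsigned sequential_slots(unsigned stages, unsigned offspring)
{
    assert(stages > 0);
    return stages * offspring;
}

/* Time slots needed when a new offspring enters the first stage in every
 * slot: the first result appears after `stages` slots, then one per slot. */
unsigned pipelined_slots(unsigned stages, unsigned offspring)
{
    assert(stages > 0);
    if (offspring == 0) {
        return 0;
    }
    return stages + offspring - 1;
}
```

For six stages and three offspring this gives 18 slots without pipelining and 8 with it. As the number of offspring grows, the pipelined throughput approaches one offspring per slot.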

Local Search Algorithm
• Objective value = number of uncut nets
• Moves force specific nets within one block
[Figure: modules M0 to M5 and their nets divided between Block 0 and Block 1, with the module data for each block and the objective value]
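A simple iterative-improvement pass over a two-block partition might look like the C sketch below. Note the swapped-in technique: it uses single-module moves with a balance limit, which is a generic hill climb and not necessarily the net-forcing strategy named on the slide. The bitmask encoding, `NUM_MODULES`, `min_size` and the function names are assumptions made for the example.

```c
#include <assert.h>

#define NUM_MODULES 6   /* matches the six-module example (assumed) */

/* Number of set bits in x. */
static int popcount(unsigned x)
{
    int n = 0;
    while (x) {
        n += (int)(x & 1u);
        x >>= 1;
    }
    return n;
}

/* Each net is a bitmask of the modules it connects, and `part` has bit m
 * set when module m is in block 1. A net is uncut when all of its modules
 * are in block 0 or all of them are in block 1. */
int uncut_nets(const unsigned *nets, int num_nets, unsigned part)
{
    int uncut = 0;
    for (int n = 0; n < num_nets; n++) {
        unsigned in_block1 = nets[n] & part;
        uncut += (in_block1 == 0u) || (in_block1 == nets[n]);
    }
    return uncut;
}

/* Hill climbing with single-module moves. A move is accepted only if it
 * raises the objective and leaves at least `min_size` modules per block. */
unsigned improve_partition(const unsigned *nets, int num_nets,
                           unsigned part, int min_size)
{
    assert(min_size >= 0 && 2 * min_size <= NUM_MODULES);
    int improved = 1;
    while (improved) {
        improved = 0;
        for (int m = 0; m < NUM_MODULES; m++) {
            unsigned trial = part ^ (1u << m);
            int size1 = popcount(trial);
            if (size1 < min_size || NUM_MODULES - size1 < min_size) {
                continue;   /* move would unbalance the partition */
            }
            if (uncut_nets(nets, num_nets, trial) >
                uncut_nets(nets, num_nets, part)) {
                part = trial;
                improved = 1;
            }
        }
    }
    return part;
}
```

On a hypothetical netlist {0x03, 0x06, 0x1C, 0x28, 0x30}, starting from the partition 0x15 with `min_size` 2, the search stops at 0x14 with 2 uncut nets, although the partition 0x38 has 4. This is the trapping behaviour that motivates pairing local search with a genetic algorithm.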

Sequential Issues
[Figure: the Select Next Move, Copy Solution and Update Net Info steps, their loops (Loop1, Loop2, Loop3), and Block RAM]

Preliminary Results of GA
• Software results (Sun Blade 1000) and hardware results (59 MHz / 4), each tabulated by Benchmark, Modules, Nets, Best, Worst, Mean, Std Dev and Time
• Speedup and quality per benchmark:

Benchmark     Speedup   Quality
prim1.dat     290%      -16.8%
prim2.dat     370%      -32.8%
struct.dat    342%      -25.5%
ind1.dat      369%      -27.3%
pcb1.dat      266%      0.8%
chip1.dat     247%      -8.8%
chip4.dat     264%      1.1%
fract.dat     253%      8.4%

Handel-C vs VHDL for Local Search Designs (xcv1000-4bg560)
Designs compared: Handel-C and VHDL prototype. The two value columns follow the order of the slide text.

Usage summary               First design          Second design
Number of GCLKs             1/4 (25%)             2/4 (50%)
Number of 4-input LUTs      3,349/24,576 (13%)    3,333/24,576 (13%)
Number of slice registers   2,193/24,576 (8%)     1,709/24,576 (6%)
Number of slices            2,204/12,288 (17%)    2,573/12,288 (20%)
Total equivalent gates      42,192                42,898

Speed (ns): average connection delay for this design, maximum delay, average delay on the 10 worst nets

Current Status and Future Work
• Current Status
 – Completed VHDL local search prototype
  • Verified through simulation
 – Completed Handel-C local search design
  • Verified and implemented on RC1000
 – Completed Handel-C genetic algorithm design
  • Currently in testing stages
• Future Work
 – Complete VHDL local search design and implementation
 – Analyze the performance difference between the hardware-based memetic algorithm and the software algorithm
