Hardware Implementation of a Memetic Algorithm for VLSI Circuit Layout Stephen Coe MSc Engineering Candidate Advisors: Dr. Shawki Areibi Dr. Medhat Moussa.


1 Hardware Implementation of a Memetic Algorithm for VLSI Circuit Layout
Stephen Coe, MSc Engineering Candidate
Advisors: Dr. Shawki Areibi, Dr. Medhat Moussa

2 Topic Overview
• Introduction
• Background
  – Circuit Partitioning (CP)
  – Handel-C vs. VHDL
  – Memetic Algorithm
• Research Challenges
• Hardware Approach
• Current Status and Future Work

3 Introduction
• Today's technology allows billions of transistors to be implemented in a single circuit
• As these transistors become smaller, interconnect delay becomes the limiting factor in execution speed
• These factors place increasing importance on CAD tools that minimize interconnect length
• As FPGAs become larger and faster, new methods for improving algorithm performance become available
[Figure: delay (ns, 0.1 to 10) versus minimum feature size (2.0 µ down to 0.35 µ), comparing typical gate delay with interconnect delay]

4 Circuit Partitioning
• Method of splitting complex designs into smaller subsystems
• Attempts to minimize the connections between subsystems
• The objective is to maximize the number of uncut nets
  – The longer the interconnects between modules, the longer the delay within the circuit
[Figure: modules M0 to M5 connected by Nets 1 to 5]

5 Development Tools
Celoxica DK Design Suite
• High-level language based on ISO/ANSI-C for the implementation of algorithms in hardware
• Allows software engineers to design hardware without retraining
• Can generate VHDL code or an EDIF file
• Support for many Actel, Altera and Xilinx devices
• Uses third-party place-and-route programs to generate bit files
[Figure: Design Flow, with boxes Handel-C Source Files; Compile; Generate EDIF (netlist); Generate VHDL/Verilog; Simulate & netlist; Place & Route Tools; BitStream Generation]

6 Similarities of Handel-C & ISO C
• Similarities
  – #define, #ifdef, etc.
  – Casting between different variable types
  – Function declarations are the same
  – Registers stored as variables (e.g. int, unsigned, etc.)
  – for, while and do loops
• Differences
  – No float, double in Handel-C
  – Variables in Handel-C are of undefined widths
  – No recursive function calls
  – Inline functions generate entirely new hardware
  – No malloc, free (hardware cannot allocate memory dynamically)
  – Data can be read in for simulation only
  – Parallelism exists

7 Memory of Handel-C
• Memory is accessed as an array
  – MemoryData[1024] = WriteData;
• Type of memory is easily distinguishable
  – Block RAM
  – External RAM
  – Logic RAM
• Memory access advantages
  – Memory data is accessed within 1 clock
  – No specific timing required
  – Allows multi-dimensional memory access
• Memory access disadvantage
  – Divides operating clock frequency by 4
[Figure: timing diagram with External Clock, Handel-C Clock, Write Enable and Data signals]

8 Parallel Execution in Handel-C
• Parallel execution
  – par{ } command
• Channel communication
  – Allows parallel components to talk to each other
[Figure: two branches of a par{ } block over Clocks 1 to 4, with one branch waiting for the right-hand branch to finish; a second diagram shows a channel between parallel components]

9 Memetic Algorithm
A genetic/evolutionary algorithm that includes a non-genetic local search to improve solutions
• Genetic Algorithm
  – Population-based heuristic technique based on the biological reproductive system
  – Operates on the theory of “survival of the fittest”
  – Good at exploring the solution space
• Local Search
  – Iterative improvement algorithms
  – Often get trapped in sub-optimal solutions
  – Good at exploiting the solution space
  – Success is dependent on good starting solutions
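The combination described on this slide can be sketched as a small steady-state loop in Python. This is a minimal illustration under stated assumptions, not the implementation from the presentation: the netlist `NETS`, the exact 3/3 balance rule, binary tournament selection, one-point crossover, the swap-based local search and all parameter values are hypothetical choices. Only the stage names (selection, crossover, mutation, repair, fitness, replacement) come from the slides.

```python
import random

NETS = [(0, 1), (1, 2, 3), (2, 4), (3, 5), (4, 5)]  # hypothetical netlist
N_MODULES = 6


def fitness(sol):
    """Number of uncut nets (higher is better)."""
    return sum(1 for net in NETS if len({sol[m] for m in net}) == 1)


def repair(sol, rng):
    """Move random modules out of the larger block until both blocks hold 3."""
    sol = list(sol)
    while sol.count(1) != N_MODULES // 2:
        big = 1 if sol.count(1) > N_MODULES // 2 else 0
        m = rng.choice([i for i, b in enumerate(sol) if b == big])
        sol[m] ^= 1
    return sol


def local_search(sol):
    """Swap module pairs across blocks while a swap raises fitness."""
    sol = list(sol)
    improved = True
    while improved:
        improved = False
        for i in range(N_MODULES):
            for j in range(i + 1, N_MODULES):
                if sol[i] == sol[j]:
                    continue  # same block, swapping changes nothing
                cand = list(sol)
                cand[i], cand[j] = cand[j], cand[i]
                if fitness(cand) > fitness(sol):
                    sol, improved = cand, True
    return sol


def memetic(pop_size=8, generations=20, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    pop = [
        repair([rng.randint(0, 1) for _ in range(N_MODULES)], rng)
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Selection: one binary tournament per parent.
        p1 = max(rng.sample(pop, 2), key=fitness)
        p2 = max(rng.sample(pop, 2), key=fitness)
        # Crossover: one-point.
        cut = rng.randrange(1, N_MODULES)
        child = p1[:cut] + p2[cut:]
        # Mutation: flip each bit with a small probability.
        child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
        # Repair restores balance; local search is the non-genetic step.
        child = local_search(repair(child, rng))
        # Replacement: overwrite the worst individual if the child is no worse.
        worst = min(range(pop_size), key=lambda k: fitness(pop[k]))
        if fitness(child) >= fitness(pop[worst]):
            pop[worst] = child
    return max(pop, key=fitness)


best = memetic()
print(best, fitness(best))
```

The genetic operators spread the population across the solution space, while `local_search` polishes each offspring before it competes for a place in the population.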

10 [Figure: search landscape contrasting Genetic Algorithm and Local Search, with one minimum labeled "Not Global Minimum"]

11 Research Challenges
• Memetic Algorithms
  – Increase computational performance of the algorithm (CPU time)
  – Exploit the inherent parallel nature of Genetic Algorithms
• Hardware Development Languages
  – Determine the impact of high-level languages vs low-level languages

12 Approach
• Explore the most efficient design to implement memetic algorithms on a single FPGA chip
• Achieve increased performance through pipelining and parallelization
  – Divide the tasks into separate but concurrent components
[Figure: FPGA chip divided among the different tasks of the algorithm]

13 Genetic Algorithm in Hardware
[Figure: block diagrams of the GA datapath built from Selection, Crossover, Mutation, Repair, Fitness and Replacement modules. One arrangement duplicates the Mutation, Repair and Fitness modules to process Offspring 1 and Offspring 2 side by side; the pipelined approach passes Offspring 1, 2 and 3 through successive stages]

14 Local Search Algorithm
[Figure: modules M0 to M5 and Nets 1 to 5 split between Block 0 and Block 1; module data for modules 0 to 5 shown as the bit string 011010; objective value = number of uncut nets; the search proceeds by forcing specific nets within one block]
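An iterative-improvement search over this kind of encoding can be sketched as follows. The starting bit string 011010 is taken from the slide; everything else is an assumption made for illustration: the netlist `nets`, the single-module move, the balance tolerance `max_imbalance`, and the greedy acceptance rule. The slide's own method ("forcing specific nets within one block") is not reproduced here.

```python
def uncut_nets(nets, block_of):
    """Objective value: number of nets whose modules all share one block."""
    return sum(1 for mods in nets if len({block_of[m] for m in mods}) == 1)


def local_search(nets, block_of, max_imbalance=2):
    """Greedy hill climbing: apply the best improving single-module move."""
    block_of = list(block_of)
    best = uncut_nets(nets, block_of)
    improved = True
    while improved:
        improved = False
        best_move = None
        for m in range(len(block_of)):
            block_of[m] ^= 1  # tentatively move module m to the other block
            size0 = block_of.count(0)
            balanced = abs(size0 - (len(block_of) - size0)) <= max_imbalance
            value = uncut_nets(nets, block_of)
            block_of[m] ^= 1  # undo the tentative move
            if balanced and value > best:
                best, best_move = value, m
        if best_move is not None:
            block_of[best_move] ^= 1  # commit the best move of this pass
            improved = True
    return block_of, best


# Hypothetical pin lists for Nets 1..5 over modules M0..M5.
nets = [(0, 1), (1, 2, 3), (2, 4), (3, 5), (4, 5)]
start = [0, 1, 1, 0, 1, 0]  # bit string 011010 from the slide
print(local_search(nets, start))
```

On this made-up netlist the search stops at 3 uncut nets, although the balanced partition `[0, 0, 1, 1, 1, 1]` leaves 4 uncut. That is the kind of sub-optimal trap that motivates pairing local search with a genetic algorithm.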

15 Sequential Issues
[Figure: Select Next Move, Copy Solution and Update Net Info steps, with repeated Loop1/Loop2/Loop3 sequences accessing Block RAM]

16 Preliminary Results of GA

Software Results (Sun Blade 1000)
Benchmark    Modules  Nets   Best     Worst    Mean     Std Dev  Time
prim1.dat    833      902    795.4    767.2    786.4    5.642    30.6
prim2.dat    3014     3029   2580.6   2504.4   2546.6   14.539   122.1
struct.dat   1952     1920   1713.2   1671.2   1694.6   8.252    73.1
ind1.dat     2271     2192   1947.6   1887.8   1919.6   12.134   87.9
pcb1.dat     24       32     25       19.2     24.7     1.073    0.8
chip1.dat    300      294    253.2    241.2    251.1    2.703    8.4
chip4.dat    224      221    186.6    175.4    184.6    2.361    6.6
fract.dat    149      147    107.6    96.2     107.4    2.480    4.3

Hardware Results (@ 59 MHz / 4)
Benchmark    Modules  Nets   Best     Worst    Mean     Std Dev  Time   Speedup  Quality
prim1.dat    833      902    661.4    645.2    657.2    3.775    10.3   290%     -16.8%
prim2.dat    3014     3029   1732.0   1703.0   1723.8   7.041    33.0   370%     -32.8%
struct.dat   1952     1920   1275.4   1246.8   1266.2   6.705    21.4   342%     -25.5%
ind1.dat     2271     2192   1415.0   1390.0   1407.8   6.138    23.8   369%     -27.3%
pcb1.dat     24       32     25.2     22.0     25.2     0.333    0.3    266%     0.8%
chip1.dat    300      294    230.8    221.2    229.8    1.883    3.4    247%     -8.8%
chip4.dat    224      221    188.8    182.0    188.2    1.316    2.5    264%     1.1%
fract.dat    149      147    116.6    112.0    116.3    0.661    1.7    253%     8.4%
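The Speedup and Quality columns are not defined on the slide. The short Python check below assumes that speedup is software time divided by hardware time and that quality is the relative change in the best uncut-net count; both formulas are inferences, and the function names are hypothetical. The numbers are copied from the two result sets for three of the benchmarks.

```python
# (best uncut nets, run time) per benchmark, copied from the results above.
SOFTWARE = {"prim2": (2580.6, 122.1), "chip1": (253.2, 8.4), "fract": (107.6, 4.3)}
HARDWARE = {"prim2": (1732.0, 33.0), "chip1": (230.8, 3.4), "fract": (116.6, 1.7)}


def speedup_percent(sw_time, hw_time):
    """Assumed definition: software time over hardware time, in percent."""
    return 100.0 * sw_time / hw_time


def quality_percent(sw_best, hw_best):
    """Assumed definition: relative change of the best result, in percent."""
    return 100.0 * (hw_best - sw_best) / sw_best


for name in SOFTWARE:
    sw_best, sw_time = SOFTWARE[name]
    hw_best, hw_time = HARDWARE[name]
    print(
        f"{name}: speedup {speedup_percent(sw_time, hw_time):.1f}%, "
        f"quality {quality_percent(sw_best, hw_best):+.1f}%"
    )
```

Under these assumed definitions the reported percentages are reproduced to within rounding for these benchmarks. For prim1 the rounded times give roughly 297% rather than the reported 290%, which may reflect rounding of the listed times.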

17 Handel-C vs VHDL for Local Search Designs (xcv1000-4bg560)
Designs compared: Handel-C and VHDL Prototype. The two value columns below follow the order in which the values appear in the transcript.

Total equivalent gate count: 42,192 and 42,898

Usage Summary                         First column         Second column
Number of GCLKs                       1/4 (25%)            2/4 (50%)
Number of 4 input LUTs                3,349/24,576 (13%)   3,333/24,576 (13%)
Number of Slice Registers             2,193/24,576 (8%)    1,709/24,576 (6%)
Number of Slices                      2,204/12,288 (17%)   2,573/12,288 (20%)

Speed                                 First column         Second column
Average Delay on the 10 Worst Nets    11.612 ns            11.309 ns
Maximum Delay                         15.768 ns            11.979 ns
Average Connection Delay              2.921 ns             2.775 ns

18 Current Status and Future Work
• Current Status
  – Completed VHDL Local Search Prototype
    • Verified through simulation
  – Completed Handel-C Local Search Design
    • Verified and implemented on RC1000
  – Completed Handel-C Genetic Algorithm Design
    • Currently in testing stages
• Future Work
  – Complete VHDL Local Search Design and Implementation
  – Analyze the performance difference between the hardware-based memetic algorithm and the software algorithm


