PipeRoute: A Pipelining-Aware Router for FPGAs Akshay Sharma, Carl Ebeling* and Scott Hauck Electrical Engineering / *Computer Science & Engineering University.

Presentation transcript:

PipeRoute: A Pipelining-Aware Router for FPGAs
Akshay Sharma, Carl Ebeling* and Scott Hauck
Electrical Engineering / *Computer Science & Engineering
University of Washington, Seattle, WA 98195

2 Pipelined FPGA Architectures
- FPGAs and flexible computing
- But, max clock frequency?
- Examples of pipelined FPGAs
  - RaPiD (Ebeling et al., 1996)
  - HSRA (Tsu et al., 1999)
  - UCSB (Singh et al., 2001)
- A few prominent features
  - A fraction of (or all) switch-points are registered
  - Registered LUT inputs
  - Netlists heavily pipelined and retimed

3 Pipelined Routing
- PipeRoute: routes netlists on pipelined FPGAs
- The pipelined netlist provides information about register separation
- The FPGA routing graph consists of R-nodes and D-nodes
- The cost of using an R-node or D-node in a route is the same as in Pathfinder
- The pipelined routing problem differs from normal FPGA routing
[Figure: source S with sinks T1 and T2]

4-8 Normal Routing – Two Terminal
- Dijkstra's shortest-path algorithm for two-terminal routing
[Figure, built up over slides 4-8: routing graph with source S and sink T]
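For readers who want the two-terminal case spelled out, here is a minimal sketch of Dijkstra's algorithm on a routing graph. It assumes costs sit on nodes (a route pays for every node it enters), which is in the spirit of Pathfinder-style routing; the names `shortest_route`, `adj`, `cost` and the tiny demo graph are illustrative and not taken from the slides.

```python
import heapq

def shortest_route(adj, cost, source, sink):
    """Return (total_cost, path) of the cheapest source->sink route, or None.

    adj maps a node to the nodes it drives; cost maps a node to the
    non-negative cost of using it. The source itself is free.
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        c, node = heapq.heappop(heap)
        if c > dist[node]:
            continue  # stale heap entry
        if node == sink:
            break
        for nbr in adj.get(node, ()):
            nc = c + cost[nbr]
            if nc < dist.get(nbr, float("inf")):
                dist[nbr] = nc
                prev[nbr] = node
                heapq.heappush(heap, (nc, nbr))
    if sink not in dist:
        return None
    path = [sink]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[sink], path

# Hypothetical four-node routing graph.
demo_adj = {"S": ["a", "b"], "a": ["T"], "b": ["T"], "T": []}
demo_cost = {"a": 1, "b": 3, "T": 1}
print(shortest_route(demo_adj, demo_cost, "S", "T"))
```

In the demo the route through `a` is chosen because entering `a` is cheaper than entering `b`.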

9 Pipeline Routing – Two Terminal
- Find the shortest route that goes through N registers (hereafter "registers" will be called "delays")
- Traveling Salesman: find the shortest route that goes through all nodes in a graph
- NP-complete
[Figure: routing graph with source S and sink T]

10-16 Two Terminal 1-Delay Router
- Optimal routing for 1-delay routes can be done via Dijkstra
[Figure, built up over slides 10-16: routing graph with source S and sink T]
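One way to see why a single delay can still be handled by Dijkstra is to search over (node, phase) pairs, where the phase records whether the delay has been picked up yet. The sketch below is an assumption about how such a search could be organised, not the authors' implementation: `d_nodes` is a hypothetical set of D-nodes, costs are per node, and the sketch does not stop a node from being used once in each phase.

```python
import heapq

def one_delay_route(adj, cost, d_nodes, source, sink):
    """Cheapest source->sink route through exactly one D-node, or None.

    Phase 0 means no delay picked up yet; entering a D-node moves the
    search to phase 1, where further D-nodes are not entered.
    """
    start, goal = (source, 0), (sink, 1)
    dist = {start: 0}
    prev = {}
    heap = [(0, source, 0)]
    while heap:
        c, node, phase = heapq.heappop(heap)
        if c > dist[(node, phase)]:
            continue  # stale heap entry
        if (node, phase) == goal:
            break
        for nbr in adj.get(node, ()):
            if nbr in d_nodes:
                if phase == 1:
                    continue  # a second delay would overshoot
                nxt = (nbr, 1)
            else:
                nxt = (nbr, phase)
            nc = c + cost[nbr]
            if nc < dist.get(nxt, float("inf")):
                dist[nxt] = nc
                prev[nxt] = (node, phase)
                heapq.heappush(heap, (nc, nxt[0], nxt[1]))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], [n for n, _ in reversed(path)]

# Hypothetical graph: the direct route S-r1-T is cheaper but has no delay.
demo_adj = {"S": ["r1", "r2"], "r1": ["T"], "r2": ["D"], "D": ["r3"], "r3": ["T"], "T": []}
demo_cost = {"r1": 1, "r2": 1, "D": 1, "r3": 1, "T": 1}
print(one_delay_route(demo_adj, demo_cost, {"D"}, "S", "T"))
```

The search ignores the cheaper zero-delay route and returns the route through `D`, because only the state (T, phase 1) counts as arrival.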

17-22 Two Terminal N-Delay Router
- Greedy approximation via the 1-delay router
- Find a 1-delay route
- While there is not enough delay on the route:
  - Replace any 0-delay segment with the cheapest 1-delay replacement
[Figure, built up over slides 17-22: routing graph with source S and sink T]
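The greedy loop can be sketched as follows. This is a simplified reading of the slide under stated assumptions: costs are per node, `d_nodes` is a hypothetical set of D-nodes, `n` is at least 1, each round tries every 0-delay segment and keeps the replacement that gives the cheapest whole route, and nodes used elsewhere on the route are blocked during a replacement. The helper names are illustrative.

```python
import heapq

def one_delay(adj, cost, d_nodes, src, dst, blocked=frozenset()):
    """Cheapest src->dst route with exactly one D-node strictly in between."""
    dist, prev = {(src, 0): 0}, {}
    heap = [(0, src, 0)]
    while heap:
        c, node, ph = heapq.heappop(heap)
        if c > dist[(node, ph)]:
            continue  # stale heap entry
        if (node, ph) == (dst, 1):
            path = [(dst, 1)]
            while path[-1] != (src, 0):
                path.append(prev[path[-1]])
            return c, [n for n, _ in reversed(path)]
        for nbr in adj.get(node, ()):
            if nbr in blocked or nbr == src:
                continue
            if nbr == dst:
                if ph == 0:
                    continue  # arriving without a delay is not allowed
                nxt = (dst, 1)
            elif nbr in d_nodes:
                if ph == 1:
                    continue  # a second delay would overshoot
                nxt = (nbr, 1)
            else:
                nxt = (nbr, ph)
            nc = c + cost[nbr]
            if nc < dist.get(nxt, float("inf")):
                dist[nxt], prev[nxt] = nc, (node, ph)
                heapq.heappush(heap, (nc, nxt[0], nxt[1]))
    return None

def n_delay_route(adj, cost, d_nodes, src, dst, n):
    """Greedy n-delay route (n >= 1): start with 1 delay, add one per round."""
    found = one_delay(adj, cost, d_nodes, src, dst)
    if found is None:
        return None
    route = found[1]
    while sum(1 for x in route[1:-1] if x in d_nodes) < n:
        # 0-delay segments run between consecutive cut points:
        # the source, every D-node on the route, and the sink.
        cuts = [0] + [i for i in range(1, len(route) - 1)
                      if route[i] in d_nodes] + [len(route) - 1]
        best = None
        for a, b in zip(cuts, cuts[1:]):
            others = set(route[:a]) | set(route[b + 1:])
            rep = one_delay(adj, cost, d_nodes, route[a], route[b], others)
            if rep is None:
                continue
            cand = route[:a] + rep[1] + route[b + 1:]
            total = sum(cost[x] for x in cand[1:])
            if best is None or total < best[0]:
                best = (total, cand)
        if best is None:
            return None  # no segment can absorb another delay
        route = best[1]
    return sum(cost[x] for x in route[1:]), route

# Hypothetical graph with two D-nodes, all node costs equal to 1.
demo_adj = {"S": ["w1"], "w1": ["D1", "w2"], "D1": ["w2"],
            "w2": ["D2", "T"], "D2": ["w3"], "w3": ["T"], "T": []}
demo_cost = {k: 1 for k in ("w1", "w2", "w3", "D1", "D2", "T")}
print(n_delay_route(demo_adj, demo_cost, {"D1", "D2"}, "S", "T", 2))
```

Being greedy, the loop can miss the true optimum: the first 1-delay route fixes part of the path before the remaining delays are considered.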

23-24 Normal Routing – Multi-Terminal
- Do two-terminal routing
- Use all of the previous route(s) as the source for the next route
[Figure: source S with sinks T1 and T2]
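Using all previously routed nodes as the source amounts to a multi-source Dijkstra search that is restarted for each sink. A minimal sketch, assuming per-node costs and a caller-supplied sink order (the names `route_net`, `tree` and `parent` are illustrative):

```python
import heapq

def route_net(adj, cost, source, sinks):
    """Route one net to several sinks; returns {node: parent} for the route tree."""
    tree = {source}   # every node already on the net's routing
    parent = {}
    for sink in sinks:
        # Seed the search with the whole existing tree at cost 0.
        dist = {n: 0 for n in tree}
        prev = {}
        heap = [(0, n) for n in tree]
        heapq.heapify(heap)
        while heap:
            c, node = heapq.heappop(heap)
            if c > dist[node]:
                continue  # stale heap entry
            if node == sink:
                break
            for nbr in adj.get(node, ()):
                nc = c + cost[nbr]
                if nc < dist.get(nbr, float("inf")):
                    dist[nbr] = nc
                    prev[nbr] = node
                    heapq.heappush(heap, (nc, nbr))
        if sink not in dist:
            return None
        # Walk back from the sink until the existing tree is reached.
        node = sink
        while node not in tree:
            parent[node] = prev[node]
            tree.add(node)
            node = prev[node]
    return parent

# Hypothetical graph: T2 is cheaper to reach by branching off the T1 route.
demo_adj = {"S": ["a", "c"], "a": ["b", "T1"], "b": ["T2"], "c": ["T2"], "T1": [], "T2": []}
demo_cost = {"a": 1, "b": 1, "c": 3, "T1": 1, "T2": 1}
print(route_net(demo_adj, demo_cost, "S", ["T1", "T2"]))
```

In the demo the second sink branches off at `a` rather than starting again from `S`, because the already-routed node `a` costs nothing to reuse.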

25-34 Multi-Terminal Router
- Sinks are considered in increasing order of delay separation
- Example: T1 is 2 delays away from S, and T2 is 3 delays away from S
- Accumulate 1 delay at a time
- When routing to delay I, start from all existing routing at delays I and I-1
[Figure, built up over slides 25-34: source S with sinks T1 and T2; routing steps labeled 1, 2 and 3]
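The bookkeeping behind the multi-terminal router can be illustrated by tagging every routed node with its delay level. The sketch below is a deliberately simplified stand-in, not the slide's procedure: instead of accumulating one delay at a time from routing at delays I and I-1, it seeds a layered (node, level) Dijkstra search from every routed node at its own level. Per-node costs, the `d_nodes` set and all names are assumptions.

```python
import heapq

def route_pipelined_net(adj, cost, d_nodes, source, sink_delays):
    """sink_delays maps sink -> required delay count from the source.

    Returns (parent, level), where level[n] is the number of delays
    between the source and routed node n, or None if a sink is unreachable.
    """
    level = {source: 0}
    parent = {}
    # Sinks in increasing order of delay separation.
    for sink, want in sorted(sink_delays.items(), key=lambda kv: kv[1]):
        dist = {(n, l): 0 for n, l in level.items()}
        prev = {}
        heap = [(0, n, l) for n, l in level.items()]
        heapq.heapify(heap)
        goal = (sink, want)
        while heap:
            c, node, l = heapq.heappop(heap)
            if c > dist[(node, l)]:
                continue  # stale heap entry
            if (node, l) == goal:
                break
            for nbr in adj.get(node, ()):
                nl = l + 1 if nbr in d_nodes else l
                if nl > want or nbr in level:
                    continue  # too many delays, or node already routed
                nc = c + cost[nbr]
                if nc < dist.get((nbr, nl), float("inf")):
                    dist[(nbr, nl)] = nc
                    prev[(nbr, nl)] = (node, l)
                    heapq.heappush(heap, (nc, nbr, nl))
        if goal not in dist:
            return None
        # Commit the new branch, recording each node's delay level.
        state = goal
        while state in prev:
            parent[state[0]] = prev[state][0]
            level[state[0]] = state[1]
            state = prev[state]
    return parent, level

# Hypothetical graph matching the slide's example: T1 needs 2 delays, T2 needs 3.
demo_adj = {"S": ["D1"], "D1": ["w1"], "w1": ["D2"], "D2": ["w2"],
            "w2": ["T1", "D3"], "D3": ["T2"], "T1": [], "T2": []}
demo_cost = {n: 1 for n in ("D1", "w1", "D2", "w2", "D3", "T1", "T2")}
parent, level = route_pipelined_net(demo_adj, demo_cost, {"D1", "D2", "D3"},
                                    "S", {"T2": 3, "T1": 2})
print(level["T1"], level["T2"], parent["D3"])
```

In the demo T2 reuses the delay-2 routing built for T1 and only adds the final delay `D3`, which is the sharing that routing sinks in increasing delay order is meant to enable.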

35 Benchmark Architecture
- Modified RaPiD architecture
- 1-D datapath of 16-bit ALUs, multipliers, registers and memories
- Pipelined interconnect structure
- Long and short tracks
- Bus connectors used to pick up delay

36 Testing
- Benchmark RaPiD netlists
- Pipelining-aware placement tool
- For each netlist:
  - Treat the netlist as unpipelined and determine the smallest RaPiD architecture that routes it (Zl)
  - Determine the smallest RaPiD architecture needed to route the pipelined netlist (Zp)
  - Pipelining cost = Zp/Zl
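As a small worked example of the metric, the ratio can be computed directly. The sizes below are made-up placeholders chosen only to show the arithmetic; they are not measurements from the presentation.

```python
def pipelining_cost(z_pipelined, z_unpipelined):
    """Pipelining cost = Zp / Zl, the area overhead of routing the pipelined netlist."""
    if z_unpipelined <= 0:
        raise ValueError("unpipelined architecture size must be positive")
    return z_pipelined / z_unpipelined

# Hypothetical sizes: the pipelined netlist needs 24 units, the unpipelined one 16.
print(pipelining_cost(24, 16))  # 1.5
```

A cost of 1.0 would mean pipelining needs no extra architecture; larger values mean a larger architecture is required to route the pipelined netlist.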

37 Results
- Average pipelining cost incurred = 1.74

38 Results
- Effect of netlist size on pipelining cost
[Figure: pipelining cost normalized to unpipelined netlist area]

39 Results
- Effect of % pipelined signals on pipelining cost
[Figure: pipelining cost normalized to unpipelined circuit area]

40 The Future
- Delay-driven PipeRoute
  - Currently under development
- Sophisticated pipelining-aware placement algorithms
- Fast pipelined routing algorithms
- Use PipeRoute to explore pipelined FPGA architectures
  - Number and location of registered switch-points