Balancing Interconnect and Computation in a Reconfigurable Array
Why you don't really want 100% LUT utilization
Dr. André DeHon, BRASS Project, University of California at Berkeley

Question
How much interconnect do I need for my computing/programmable array?
Problem(?): too little interconnect → won't be able to use all the gates/LUTs
Typical subgoal: how much interconnect does it take to use (almost) all the LUTs?

Wrong Subgoal
Observation:
– interconnect is the dominant area on FPGAs
– more important to use interconnect efficiently than to use LUTs efficiently
Different question/subgoal:
– What level of interconnect gives the least implementation area for applications?

Does LUT Utilization Predict Area?

Outline
– Question: how much interconnect?
– Teaser: less than 100% LUT utilization
– Model
– Application characteristics
– Compose
– Conclusions

Model: Interconnect Requirements and Richness
Recursively partition (bisect) the design
– Look at the I/O from each partition (subtree)
[Figure: N gates bisected into two halves of N/2; the nets crossing the cut are the cut size]
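
To make the measurement concrete, here is a minimal Python sketch of the profiling loop, assuming a netlist given as tuples of gate names. It is an illustration, not the talk's tool: a real flow would use a proper min-cut bisection heuristic (e.g., KLFM, covered in the companion partitioning lectures), whereas this sketch just halves a given ordering.

```python
def subtree_io(nets, members):
    """I/O of a subtree: nets touching both the subtree and its outside."""
    inside = set(members)
    return sum(1 for net in nets
               if any(v in inside for v in net) and any(v not in inside for v in net))

def profile(nodes, nets, depth=0, out=None):
    """Recursively bisect `nodes` (naively: split the ordering in half)
    and record each subtree's I/O at its hierarchy depth."""
    if out is None:
        out = {}
    out.setdefault(depth, []).append(subtree_io(nets, nodes))
    if len(nodes) > 1:
        half = len(nodes) // 2
        profile(nodes[:half], nets, depth + 1, out)
        profile(nodes[half:], nets, depth + 1, out)
    return out

# Toy example: a 4-LUT chain a-b-c-d
print(profile(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")]))
# {0: [0], 1: [1, 1], 2: [1, 2, 2, 1]}
```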

Regularizing Growth
How do bisection bandwidths shrink (grow) at different levels of the bisection hierarchy?
Basic assumption: geometric
– 1
– 1/α
– 1/α²

Rent's Rule
Long-standing empirical relationship
– IO = C · N^P
– 0 ≤ P ≤ 1.0
Embodies the geometric assumption
– Two parameters (C, P):
  C: base of growth
  P: captures growth rate (α = 2^P)
Captures the notion of locality
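
As a worked example, a small sketch of the rule, using the benchmark-average parameters (C = 5, P = 0.72) that appear later in the talk:

```python
def rent_io(c, p, n):
    """Expected I/O count for a partition containing n LUTs: IO = C * N^P."""
    return c * n ** p

for n in (16, 64, 256, 1024):
    print(n, round(rent_io(5, 0.72, n)))   # 37, 100, 271, 735

# The per-level growth factor from the previous slide follows directly:
p = 0.72
alpha = 2 ** p   # ~1.65: bandwidth grows by alpha each time the subtree doubles
print(alpha)
```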

Step 1: Build Architecture Model
Assume geometric growth
Build an architecture we can tune
– F, C
– α, p

Tree of Meshes
A tree with restricted internal bandwidth
Can match it to the model

Parameterize C

Parameterize Growth
– (2 1)* ⇒ α = √2 = 2^(1/2)
– (2 2 1)* ⇒ α = (2·2)^(1/3) = 2^(2/3)
– (2 2 2 1)* ⇒ α = 2^(3/4)
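
A quick way to check the schedule arithmetic above: α is the geometric mean of one period of the repeating stage-growth pattern. A small sketch:

```python
from math import prod

def schedule_alpha(period):
    """Growth rate alpha implied by one period of a repeating wire schedule."""
    return prod(period) ** (1.0 / len(period))

print(schedule_alpha((2, 1)))        # 1.414... = 2^(1/2)
print(schedule_alpha((2, 2, 1)))     # 1.587... = 2^(2/3)
print(schedule_alpha((2, 2, 2, 1)))  # 1.681... = 2^(3/4)
```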

Step 2: Area Model
Need to know the effect of the architecture parameters on area (costs)
– focus on the dominant components:
  wires (saw on Thursday)
  switches
  logic blocks(?)

Area Parameters
– A_logic = … λ²
– A_sw = … λ²
– Wire pitch = 8λ
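
Only the 8λ wire pitch is legible in the transcript, so the sketch below fills in hypothetical placeholder constants for A_logic and A_sw; the point is the structure of the cost composition, not the numbers.

```python
# Hypothetical area composition (units: lambda^2). A_LOGIC and A_SW are
# placeholder values, NOT the talk's numbers; only the 8-lambda wire pitch
# appears in the transcript.
A_LOGIC = 40_000      # assumed area of one LUT plus its configuration
A_SW = 2_500          # assumed area of one programmable switch
WIRE_PITCH = 8        # lambda per wire track (from the slide)

def array_area(n_luts, n_switches, wire_track_length):
    """Sum the three dominant terms: logic, switches, and wiring
    (track length in lambda times the 8-lambda pitch)."""
    return (n_luts * A_LOGIC
            + n_switches * A_SW
            + wire_track_length * WIRE_PITCH)

# A made-up 1024-LUT example where switches dominate; logic is a small
# slice of the total, echoing the 3% LUT-area figure on the larger-cartoon slide.
print(array_area(n_luts=1024, n_switches=200_000, wire_track_length=2_000_000))
```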

Switchbox Population
Full population is excessive (see next week)
Hypothesis: linear population is adequate
– still to be (dis)proven

“Cartoon” VLSI Area Model (Example artificially small for clarity)

Larger "Cartoon"
[Figure: 1024-LUT network, P = 0.67; LUT area is ~3% of the total]

Effects of P on Area
[Figure: LUT area comparison for several values of P]

Effects of P on Capacity

Step 3: Characterize Application Requirements
Identify representative applications
– Today: IWLS93 logic benchmarks
How much structure is there?
How much variation is there among applications?

Application Requirements Max: C=7, P=0.68 Avg: C=5, P=0.72

Application Requirements: Benchmark Wide (MCNC)

Benchmark Parameters Interconnect requirements vary across applications.

Complication
Interconnect requirements vary among applications
Interconnect richness has a large effect on area
What is the effect of an architecture/application mismatch?
– Interconnect too rich?
– Interconnect too poor?

Network: Fixed Wire Schedule
The network will have a fixed wiring schedule
Applications have varying requirements
To assess the impact of mismatch:
– map to network schedules
– look at the area required

Interconnect Mismatch in Theory

Step 4: Assess Resource Impact
Map designs to the parameterized architecture
Identify the architectural resources required

Mapping to Fixed Wire Schedule
Easy if the design needs fewer wires than the network provides
If it needs more wires than the network, we must depopulate (leave LUTs unused) to meet the interconnect limitations
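
A sketch of the depopulation idea, under an assumed formulation (not the talk's exact mapper): a height-l subtree offers 2^l LUT slots but only C·α^l root bandwidth, so we shrink the population until the Rent's-rule I/O demand fits.

```python
def populated_luts(level, c_net, alpha_net, c_design, p_design):
    """LUTs that actually fit in a height-`level` subtree, limited by both
    the 2**level slots and the network's c * alpha**level root bandwidth."""
    slots = 2 ** level
    io_available = c_net * alpha_net ** level
    n = slots
    while n > 1 and c_design * n ** p_design > io_available:
        n -= 1                    # depopulate: leave a LUT slot empty
    return n

# Wire-poor network (p = 0.5, alpha = 2**0.5) vs. the average benchmark:
for level in range(4, 11, 2):
    print(2 ** level, populated_luts(level, 5, 2 ** 0.5, 5, 0.72))
```

With these assumed parameters, only 6 of 16 LUT slots survive at level 4: utilization well below 100%, which is the talk's punchline.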

Mapping to Fixed Wire Schedule
Better results if we "reassociate" rather than keeping the original subtrees

Observation
We don't really want a "bisection" of LUTs
– a subtree is filled to capacity by either its LUTs or its root bandwidth
– it may be profitable to cut at some place other than the midpoint, i.e. not require the "balance" condition
– "bisection" should account for both LUT and wiring limitations

Challenge
We don't know where to cut the design
– without knowing when wires will limit subtree capacity

Brute Force Solution
Explore all cuts (see the sketch below)
– start with all LUTs in one group
– consider "all" balances
– try a cut
– recurse
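
A sketch of that search, for intuition only; `cost` is a hypothetical stand-in for evaluating a group against the wire schedule:

```python
def explore(group, cost):
    """Exhaustively try every unordered two-way split of `group`,
    recursing on both sides. Exponential work: illustration only."""
    if len(group) <= 1:
        return cost(group)
    best = float("inf")
    n = len(group)
    # Fix group[0] on the left so each unordered split is tried once.
    for mask in range(1 << (n - 1)):
        left = [group[0]] + [g for i, g in enumerate(group[1:]) if mask >> i & 1]
        if len(left) == n:
            continue                       # right side must be non-empty
        right = [g for g in group if g not in left]
        best = min(best, explore(left, cost) + explore(right, cost))
    return best + cost(group)

# Toy cost = group size: balanced splits minimize total subtree size.
print(explore(list("abcd"), len))          # 12
```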

Brute Force
Too expensive: exponential work
…viable only if we can share solutions to repeated subproblems

Simplification
Use a single linear ordering
Partitions = pick a split point on the ordering
Reduces to finding the cost of [start, end] ranges (subtrees) within the linear ordering
Only n² such subproblems
Can solve with dynamic programming

Dynamic Programming
Start with base ranges of size 1
Compute all splits of size n from solutions to all problems of size n−1 or smaller
Done when we compute where to split [0, N−1]
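
A minimal sketch of the dynamic program, assuming a hypothetical `range_cost(i, j)` that evaluates one subtree (its LUT count and wiring needs) against the architecture. The memoized recursion visits the same size-1-upward order as the slide and touches only the O(n²) contiguous ranges:

```python
from functools import lru_cache

def best_tree_cost(n, range_cost):
    """Best cost of a tree over positions 0..n-1 of a fixed linear ordering:
    each [i, j] range pays its own cost plus the best split into children."""
    @lru_cache(maxsize=None)
    def solve(i, j):
        if i == j:
            return range_cost(i, j)               # a single-LUT leaf
        best = min(solve(i, k) + solve(k + 1, j)  # try every split point
                   for k in range(i, j))
        return best + range_cost(i, j)            # plus this subtree's cost
    return solve(0, n - 1)

# Toy cost: every subtree costs 1, so we just count tree nodes.
print(best_tree_cost(8, lambda i, j: 1))          # 15 = 2*8 - 1
```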

Dynamic Programming
Just one possible "heuristic" solution to this problem
– not optimal
– dependent on the ordering
– sacrifices the ability to reorder on splits in order to avoid exponential problem size
Opportunity to find a better solution here...

Ordering LUTs
Another problem:
– lay out gates in a 1D line
– minimize the sum of squared wire lengths (tends to cluster connected gates together)
– solvable mathematically for the optimum: an eigenvector of the connectivity matrix
Use this 1D ordering as our linear ordering (see the sketch below)
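
A sketch of the spectral ordering, assuming the standard graph-Laplacian reading of "eigenvector of the connectivity matrix": the eigenvector with the second-smallest eigenvalue (the Fiedler vector) solves a relaxation of the squared-wire-length objective, and sorting gates by its entries gives the 1D order.

```python
import numpy as np

def spectral_order(adjacency):
    """1D gate ordering from the Fiedler vector of the graph Laplacian."""
    a = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(a.sum(axis=1)) - a
    _, eigvecs = np.linalg.eigh(laplacian)   # eigenvalues in ascending order
    return np.argsort(eigvecs[:, 1])         # sort by the Fiedler vector

# Toy: gates wired in a chain 0-1-2-3 come out in chain order (or reversed).
adj = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
print(spectral_order(adj))
```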

Mapping Results

Step 5: Apply Area Model
Assess the impact of the resource results

Resources + Area Model ⇒ Area

Net Area

Picking Network Design Point

What about a single design?

Does LUT Utilization Predict Area?

Summary
Interconnect area dominates logic block area
Interconnect requirements vary
– among designs
– within a single design
To minimize area:
– focus on using the dominant resource (interconnect)
– may underuse non-dominant resources (LUTs)

Methodology
Architecture model (parameterized)
Cost model
Important task characteristics
Mapping algorithm
– map to determine resources
Apply cost model
Digest results
– find optimum (multiple?)
– understand conflicts (avoidable?)

Dr. André DeHon Berkeley Reconfigurable Architectures Software and Systems (BRASS)