1
Packet-Switched vs. Time-Multiplexed FPGA Overlay Networks
Kapre et al.
RC Reading Group – 3/29/2006
Presenter: Ilya Tabakh
2
Agenda
Introduction
Background
Topology
Packet Switched
Time Multiplexed
Application
Methodology
Results
Conclusions
Wrap-up
Questions
4
Introduction
Dedicated spatial interconnect links on a configured FPGA network can be inefficient for sparse communication patterns
Overlaying virtual networks on top of the physical network can help address this issue
5
Time-Multiplexed
Pros
–Can take advantage of global route information
Cons
–Offline computation can be compute intensive
–Must allocate resources for the communication schedule and for all possible communication between operators
6
Packet-Switched
Pros
–No offline setup, and no resources needed for storing a communication schedule
–Routes are made only for operators that are actually communicating
Cons
–Switches are more complex
–Routes can be less efficient
7
Novel Contributions of the Work
Demonstration of efficient and scalable static and dynamic FPGA overlay networks
Quantification of the difference between offline scheduling and online routing
Quantification of the performance impact of balancing interconnect and compute
Characterization of area and performance tradeoffs between time-multiplexed and packet-switched networks
Quantification of the performance difference between time-multiplexed and packet-switched networks under varying application communication loads
9
NoC
Early days – on-chip buses
Later it became necessary to investigate scalable, high-performance, low-overhead on-chip networks
Networks are required since buses scale poorly
As the number of PEs increases, communication increases and more bandwidth is needed
10
Communication Patterns
Need to know the application's communication pattern in order to choose which network to use
Configured switching is inefficient for apps that underutilize links
Circuit switching is efficient for larger messages on shorter networks
Need to know these characteristics in order to make an appropriate choice
11
Packet Switched
How the authors improve on past work in FPGA-based overlay networks
–Allow arbitrary topologies
–Use real applications and realistic PE architectures to generate traffic payloads
–Network runs much faster, at 166 MHz, compared to most earlier networks running at 25-50 MHz
12
Time Multiplexed
Uses a greedy router similar to the one used in the Virtual Wires project
Virtual Wires overcame pin limitations by time-sharing each physical wire among logical wires and pipelining
This paper attempts to explore the entire design space, as opposed to a single system size or configuration
14
Performance Analysis
Several important quantities of the network have to be defined
PE Input Serialization
–A bound on the cycle count due to PE input
PE Output Serialization
–A bound on the cycle count due to PE output
Network Bisection
–Maximum number of messages that can cross the network on a given cycle
Network Latency
–Number of cycles required to cross the network
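One way to read these four quantities is that each gives its own lower bound on the cycles needed for a communication phase, so the largest of them is also a lower bound. The sketch below is only an illustration under assumptions not stated on the slide: each PE sends and receives at most one message per cycle, and `cycle_lower_bound`, `bisection_width`, and `crosses_bisection` are hypothetical names.

```python
from collections import Counter
from math import ceil

def cycle_lower_bound(messages, bisection_width, latency, crosses_bisection):
    """Return a lower bound on the cycles needed to deliver all messages.

    messages: list of (src_pe, dst_pe) pairs (hypothetical representation).
    bisection_width: messages that can cross the bisection per cycle.
    latency: cycles required to cross the network.
    crosses_bisection: function (src, dst) -> bool, assumed to be given.
    """
    # Assumption: a PE injects and accepts at most one message per cycle.
    sent = Counter(src for src, _ in messages)
    received = Counter(dst for _, dst in messages)
    output_serialization = max(sent.values(), default=0)
    input_serialization = max(received.values(), default=0)

    # Messages that must cross the bisection share its limited width.
    crossing = sum(1 for src, dst in messages if crosses_bisection(src, dst))
    bisection_bound = ceil(crossing / bisection_width)

    return max(output_serialization, input_serialization,
               bisection_bound, latency)
```

For example, with eight PEs split into halves 0-3 and 4-7, five crossing messages of which three target the same PE, a bisection width of 2, and a latency of 2, the bound is 3 cycles.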
15
Butterfly Fat Trees
Most FPGA NoCs have focused on meshes
BFTs achieve higher performance at equivalent chip size
Routing functions programmed in the split primitives determine the path
A single address bit is used to make the routing decision at each switch
The time-multiplexed merge contains a context memory which stores the computed routing
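The one-bit-per-switch routing decision can be illustrated with a deliberately simplified model. The sketch below assumes a plain binary tree traversed downward from the root (a real BFT has several parents per switch and an upward phase, which this ignores); `downward_path` and `route_down` are hypothetical names, not taken from the paper.

```python
def downward_path(dest, levels):
    """List the child chosen (0 = left, 1 = right) at each split, root first.

    Assumption: the split at depth `level` inspects one bit of the
    destination address, most significant bit first.
    """
    return [(dest >> (levels - 1 - level)) & 1 for level in range(levels)]

def route_down(dest, levels):
    """Follow the per-level bits from the root and return the leaf reached."""
    leaf = 0
    for bit in downward_path(dest, levels):
        leaf = (leaf << 1) | bit
    return leaf
```

With three levels (eight leaves), destination 5 is binary 101, so the splits choose right, left, right, and the packet arrives at leaf 5.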
18
Packet Switched
Primitives have input queues
Split primitives compute the routing decision in a single cycle based on the destination address
Arbitration is done by selecting packets based on input queue occupancies
A network with floorplanned and pipelined primitives can operate as high as 180 MHz
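Occupancy-based arbitration can be sketched as a merge primitive that forwards, each cycle, the head of whichever input queue currently holds the most packets. This is only an illustration: the class name `Merge`, the two-input default, and the tie-break toward the lower port number are assumptions, not details from the slide.

```python
from collections import deque

class Merge:
    """Toy merge primitive with one FIFO queue per input port."""

    def __init__(self, num_inputs=2):
        self.queues = [deque() for _ in range(num_inputs)]

    def push(self, port, packet):
        """Enqueue a packet arriving on the given input port."""
        self.queues[port].append(packet)

    def step(self):
        """Forward one packet from the fullest queue, or None if all are empty.

        Assumption: ties go to the lowest-numbered port, because max()
        returns the first maximal element.
        """
        fullest = max(range(len(self.queues)),
                      key=lambda i: len(self.queues[i]))
        if not self.queues[fullest]:
            return None
        return self.queues[fullest].popleft()
```

Favoring the fuller queue drains the more congested input first; other policies such as round-robin would fit the same interface.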
20
Time Multiplexed
Statically scheduled prior to runtime
Switching primitives contain context memory
Context memory requires 1 bit of storage per cycle
Network capable of operating at 166 MHz
Greedy routing algorithm used
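A rough sketch of both ideas follows: a greedy offline scheduler that gives each message the earliest start cycle at which none of its links is already reserved, and a two-input merge that replays one select bit per cycle from its context memory. The slide does not specify the paper's greedy router, so `greedy_schedule`, its one-hop-per-cycle timing, and `TimeMultiplexedMerge` are hypothetical illustrations only.

```python
def greedy_schedule(paths):
    """Assign each message the earliest start cycle with no link conflicts.

    paths: one list of link names per message, in hop order.
    Assumption: a message occupies the link at hop h during cycle start + h.
    Returns a list of start cycles, one per message, in input order.
    """
    busy = set()  # (link, cycle) pairs already reserved
    starts = []
    for path in paths:
        start = 0
        # Slide the message later until every hop finds its link free.
        while any((link, start + hop) in busy
                  for hop, link in enumerate(path)):
            start += 1
        busy.update((link, start + hop) for hop, link in enumerate(path))
        starts.append(start)
    return starts

class TimeMultiplexedMerge:
    """Two-input merge driven by a context memory of one bit per cycle."""

    def __init__(self, context):
        self.context = list(context)  # select bit for each scheduled cycle
        self.cycle = 0

    def step(self, in0, in1):
        """Forward the input selected by this cycle's context bit."""
        select = self.context[self.cycle % len(self.context)]
        self.cycle += 1
        return in1 if select else in0
```

Because messages are placed one at a time and never moved afterwards, the result depends on message order, which is what makes the scheduler greedy rather than optimal.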
21
Area and Latency of Switching
23
Application
A real-life application was mapped onto the networks
ConceptNet – common-sense reasoning knowledge base represented as a graph
Start with an initial set of nodes, send activation from each node to its neighbors along weighted edges
Time-multiplexed network run at 100% activity; packet-switched network run at activity levels between 1% and 100%
Limitations
–Nodes limited to 128 edges of fanout or fanin
–Can only process a single edge per cycle
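The activation step described here can be sketched as a single pass over the outgoing edges of the currently active nodes. How activations are combined is not given on the slide, so summing weighted contributions is an assumption, and `spread_activation` and the example node names are hypothetical.

```python
def spread_activation(edges, active):
    """Run one spreading-activation step over a weighted graph.

    edges: dict mapping node -> list of (neighbor, weight) pairs.
    active: dict mapping node -> current activation value.
    Returns a dict of the activation received by each neighbor.
    Assumption: contributions arriving at the same node are summed.
    """
    received = {}
    for node, value in active.items():
        # Each outgoing edge carries one message in the overlay network.
        for neighbor, weight in edges.get(node, []):
            received[neighbor] = received.get(neighbor, 0.0) + value * weight
    return received
```

Each (node, neighbor) pair visited in the inner loop corresponds to one message, so the fraction of nodes that are active in a step is what drives the activity level seen by the network.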
25
Methodology
Java-based infrastructure
–Simulates the packet-switched network
–Computes schedules for the time-multiplexed network
Used the smallest set of ConceptNet predicates
Java infrastructure generates a VHDL netlist
Hand-coded VHDL for the ConceptNet PEs
Created custom multipliers instead of using the on-board multipliers, for speed
26
Methodology (cont.)
Synthesis and place-and-route using the Synplicity compiler v8.0 and Xilinx ISE v8.1i to obtain operating frequency and slice count
Long wires that constrain performance are further pipelined based on post-place-and-route timing analysis
Lots of manual intervention needed to prepare the system
28
Results
Three quantitative comparisons are provided to characterize the tradeoffs between packet-switched and time-multiplexed networks
–Routing of identical topologies
–Impact of area, under identical area constraints
–Performance while varying activity level (Activity Factors)
29
Routing identical topologies
Small numbers of PEs induce a light communication load
As the number of PEs increases, communication increases and offline routing starts to outperform online routing
Online routing requires up to 63% more cycles than offline routing for larger networks
30
Impact of Area
A couple of things to consider when talking about area
–PE vs. Interconnect Tradeoff
–Area-Time Tradeoff
31
PE vs. Interconnect Tradeoff
Sometimes the system performs better with fewer PEs but more capacity in the network
32
Area-Time Tradeoff
Packet-switched and time-multiplexed networks may use significantly different amounts of area due to differences in switch sizes
At smaller areas, time multiplexing requires more cycles
At higher cycle counts, time multiplexing requires more area for context memory
Performance is limited by the 128-edge fanin or fanout limit
33
Activity Factors
Packet-switching takes 8x as many cycles to route
At some activity factor below 100%, packet-switching should be able to outperform time-multiplexing for the same area
35
Conclusions
Demonstrated implementations of packet-switched and time-multiplexed FPGA overlay networks operating at 166 MHz
Offline scheduling offers up to a 63% performance increase over online scheduling for equivalent topologies
Packet-switching is up to 2x faster for small areas
Time-multiplexing is up to 8x faster for large areas
36
Conclusions (cont.)
Packet switching offers better performance for activity factors below 30% at 32K slices and below 5% at 100K slices
37
Future Work
Mapping larger communication graphs with smaller fanout limitations to fully test the networks
Compress context memory for time-multiplexing
Improve efficiency of packet switching
Extend work to multiple-chip networks
39
Wrap-up
Paper takes a look at the trade-offs involved in FPGA overlay networks
Thought it was a good look at design decisions and gave actual guidance to the designer
Describes an interesting alternative to mesh networks (BFTs)