Network-on-Chip: Communication Synthesis Department of Computer Science Texas A&M University.


1 Network-on-Chip: Communication Synthesis Department of Computer Science Texas A&M University

2 Spring 2003 Texas A&M Topics of this Lecture
Motivations for a new communication template
Methodology for Synthesis
Clustering - Preliminary Analysis
Network Architecture
NoCSIM - Basic Simulator
Core Network Interface
Implementation of NoCs
Future Work and Conclusion

3 Motivation for a New Template
Throughput
  –Networking example
    OC768 Networking Standard
    Input data rate of 40 Gbps or 114 million packets per second (48-byte packets)
    Up to 70 memory accesses per packet for classification
    Nearly 7G memory accesses per second (64 bits)
    Shared bus needs to run at tens of GHz
We need scalable and high-performance on-chip communication architectures
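
The throughput figures on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the constant and function names are hypothetical, and the inputs are the numbers quoted on the slide. Plain division of 40 Gbps by 48-byte packets gives roughly 104 million packets per second, a little below the quoted 114 million, so the slide may assume a different effective packet size or framing that it does not state. The memory-access rate it implies (about 7.3G per second) agrees with the "nearly 7G" figure.

```python
# Figures quoted on the slide (names are illustrative, not from the source).
LINE_RATE_BPS = 40e9        # OC768 input data rate, bits per second
PACKET_BYTES = 48           # packet size used on the slide
ACCESSES_PER_PACKET = 70    # upper bound on memory accesses for classification
ACCESS_WIDTH_BITS = 64      # width of each memory access


def packets_per_second(line_rate_bps, packet_bytes):
    """Packets arriving per second at the given line rate."""
    return line_rate_bps / (packet_bytes * 8)


def memory_accesses_per_second(pps, accesses_per_packet):
    """Memory accesses per second needed to classify every packet."""
    return pps * accesses_per_packet


pps = packets_per_second(LINE_RATE_BPS, PACKET_BYTES)
accesses = memory_accesses_per_second(pps, ACCESSES_PER_PACKET)
memory_bandwidth_bps = accesses * ACCESS_WIDTH_BITS

print(f"packets/s:         {pps / 1e6:.1f} million")
print(f"memory accesses/s: {accesses / 1e9:.2f} G")
print(f"memory bandwidth:  {memory_bandwidth_bps / 1e9:.0f} Gbps")
```

A single shared bus would have to sustain all of these accesses serially, which is the slide's argument for a scalable on-chip network.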

4 A Commercial example TNETV3010

5 Current Trend
Explicitly parallel SoC architectures
Integrating huge amounts of Memory in chip designs
Distributed Shared Memory Environments
Support GALS for Power and Performance
Should allow Interconnection-centric design flow and better predictability
  –Physical design Closure
  –Wire delay dominates gate delay

6 Wire Delay vs Generation

7 Motivation for Communication Synthesis
SoCs are application specific
Synthesis should be based on the application’s requirements
  –2.5X Improvement based on architecture
  –6X Improvement based on traffic
“EFFECTIVE SYNTHESIS METHODOLOGY IS NEEDED” [1]

8 Network on Chip Architecture
Packet switched Network for Communication concurrency and throughput
Regular layout with small wire lengths
Reduced Wire, Area and Power Complexity
  –Mesh
  –Torus
    Reduces number of hops
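
To make the hop-count claim concrete, the sketch below compares the average minimal hop count between distinct nodes in a k x k mesh and a k x k torus. It is an illustration under simple assumptions (2-D topology, minimal routing, uniform traffic between all node pairs); the function names are hypothetical and do not come from the slides.

```python
from itertools import product


def mesh_hops(a, b, k):
    """Minimal hop count between nodes a and b in a k x k mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def torus_hops(a, b, k):
    """Minimal hop count in a k x k torus, where each row and column wraps around."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return min(dx, k - dx) + min(dy, k - dy)


def average_hops(k, hops):
    """Average hop count over all ordered pairs of distinct nodes."""
    nodes = list(product(range(k), repeat=2))
    pairs = [(a, b) for a in nodes for b in nodes if a != b]
    return sum(hops(a, b, k) for a, b in pairs) / len(pairs)


for k in (4, 8):
    print(k, round(average_hops(k, mesh_hops), 3), round(average_hops(k, torus_hops), 3))
```

The wrap-around links are what shorten the paths; the cost is longer wires, which a folded layout spreads more evenly.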

9 Weird Folded Torus
Wire latency is distributed evenly

10 Architecture Template
Each node could be a cluster of cores.

11 Topics to be discussed
Motivations for a new communication template
Methodology for Synthesis
Clustering - Analysis
Network Architecture
NoCSIM - Simulator
Core Network Interface
Initial Implementation of NoCs
Future Work and Conclusion

12 Overall Synthesis Methodology
[Flow diagram]
Inputs: Hardware/Protocol Library; System Specification/Process Profile
Synthesis: Clustering; Intra-Cluster Communication Synthesis; Profile Annotation; Inter-Cluster/Core Communication Synthesis; Mapping
Outputs: Partitioned & Synthesized System

13 Topics of this Lecture
Motivations for a new communication template
Methodology for Synthesis
Clustering - Preliminary Analysis
Network Architecture
NoCSIM - Basic Simulator
Core Network Interface
Implementation of NoCs
Future Work and Conclusion

14 Clustering: Motivations
Low Latency, Low Bandwidth & Synchronous Communications
High Connectivity - Low Fanout
Fixed Interfaces & Ease of Design
Hierarchical approach to Communication architecture Exploration
Estimation of Costs ahead of time

15 Clustering Flow

16 Clustering Algorithm
Inputs
  –Resource & Connection Constraints
    Core type, area, latency, throughput, traffic type, etc.
  –Implementation Constraints
    Parameters of a process technology, such as wire pitch, R and C values of interconnects, buffers, etc.
    Operating frequencies, signaling overheads
  –User-defined Constraints
    Binary variable that gives the designer control over the convergence of clustering

17 Clustering Algorithm (Cont.)
Simulated Annealing
Initialize
Move
  –Remove from Cluster and Add to Cluster
Parameters
  –Cost function: a linear function of Wire complexity, Latency and Data-rate satisfactions with high biases, and Bandwidth and Area deviations with small biases
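
The annealing loop outlined on this slide might look roughly like the sketch below. It is not the author's implementation: the cost function is a simplified stand-in with only two terms (inter-cluster traffic as a proxy for wire complexity, weighted heavily, and cluster-area deviation, weighted lightly), and every name, weight and schedule parameter here is an assumption chosen for illustration. It expects at least two clusters.

```python
import math
import random


def cost(assignment, traffic, area, n_clusters, w_wire=10.0, w_area=1.0):
    """Linear cost: heavily weighted inter-cluster traffic (stand-in for wire
    complexity) plus lightly weighted deviation of cluster area from the mean."""
    wire = sum(t for (a, b), t in traffic.items() if assignment[a] != assignment[b])
    cluster_area = [0.0] * n_clusters
    for core, cluster in assignment.items():
        cluster_area[cluster] += area[core]
    mean = sum(cluster_area) / n_clusters
    area_dev = sum(abs(x - mean) for x in cluster_area)
    return w_wire * wire + w_area * area_dev


def anneal(cores, traffic, area, n_clusters, t0=10.0, alpha=0.95, steps=2000, seed=0):
    """Simulated annealing over core-to-cluster assignments (n_clusters >= 2)."""
    rng = random.Random(seed)
    # Initialize: random assignment of cores to clusters.
    current = {c: rng.randrange(n_clusters) for c in cores}
    current_cost = cost(current, traffic, area, n_clusters)
    best, best_cost = dict(current), current_cost
    t = t0
    for _ in range(steps):
        # Move: remove one core from its cluster and add it to another.
        core = rng.choice(cores)
        old = current[core]
        current[core] = rng.choice([c for c in range(n_clusters) if c != old])
        new_cost = cost(current, traffic, area, n_clusters)
        delta = new_cost - current_cost
        # Accept improvements always, and worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current_cost = new_cost
            if current_cost < best_cost:
                best, best_cost = dict(current), current_cost
        else:
            current[core] = old  # reject the move
        t = max(t * alpha, 1e-3)  # geometric cooling with a small floor
    return best, best_cost


cores = ["cpu", "dsp", "mem0", "mem1"]
traffic = {("cpu", "mem0"): 5.0, ("dsp", "mem1"): 5.0, ("cpu", "dsp"): 1.0}
area = {c: 1.0 for c in cores}
best, best_cost = anneal(cores, traffic, area, n_clusters=2)
print(best, best_cost)
```

Latency and data-rate satisfaction terms, which the slide weights heavily, would enter the same linear sum as further weighted terms.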

18 Simulated Annealing Convergence

19 Analysis Models [2]

20 Architecture Template

21 Analysis Models (Cont.)
Hop Latency

22 Results from Clustering
Clock Power decrease
Interconnect Power decrease
Interconnect Area increase

23 Results from Clustering
Causes: Hop latencies and lack of user constraints

24 Synthesis and Verification Methodology

25 Topics to be discussed
Motivations for a new communication template
New Methodology for Synthesis
Clustering - Analysis phase
Network Architecture
NoCSIM - Simulation Phase
Core Network Interface
Gate Level Implementation of NoCs
Future Work and Conclusion

26 Network Architecture
Packet Switched - Lower latency for less correlated communications
Virtual Channel Flow-control - Bandwidth guarantees and higher saturation throughput
Source based Routing - Simple and Fast
Credit based traffic flow control - Reliable delivery
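
Credit-based flow control, as listed on this slide, can be illustrated with a minimal model of one link: the sender holds one credit per free downstream buffer slot, spends a credit per flit sent, and regains it when the receiver drains a slot. This is a generic sketch of the technique, not the design from the slides; the class and method names are hypothetical.

```python
from collections import deque


class CreditLink:
    """One sender-to-receiver link with credit-based flow control."""

    def __init__(self, buffer_slots):
        self.credits = buffer_slots  # sender-side count of free downstream slots
        self.buffer = deque()        # receiver-side input buffer

    def try_send(self, flit):
        """Send a flit only if a credit is available; never overflows the buffer."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.buffer.append(flit)
        return True

    def drain_one(self):
        """Receiver forwards one flit and returns a credit to the sender."""
        if not self.buffer:
            return None
        flit = self.buffer.popleft()
        self.credits += 1
        return flit


link = CreditLink(buffer_slots=2)
print(link.try_send("head"), link.try_send("data"), link.try_send("tail"))
print(link.drain_one(), link.try_send("tail"))
```

Because a flit is sent only when a slot is known to be free, nothing is dropped for lack of buffer space, which is the sense in which the scheme gives reliable delivery.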

27 Architecture Template

28 Packet Format
Type: Head, Data, Tail and Complete
VCID: Virtual Channel Identifier
Route: ‘N’ bit route field with last 2 bits specifying the Route to be used in the next controller
  00 - Left
  01 - Right
  10 - Straight
  11 - Extract
Data: Actual Data field
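
The route field described above can be sketched as follows. The 2-bit direction codes are taken from the slide; the rest is an assumption made for illustration: that "last 2 bits" means the least significant bits, and that each controller shifts the field right by two bits after reading its own direction. The function names are hypothetical.

```python
# Direction codes from the slide.
LEFT, RIGHT, STRAIGHT, EXTRACT = 0b00, 0b01, 0b10, 0b11


def encode_route(hops):
    """Pack a list of per-hop directions so the first hop sits in the low 2 bits."""
    route = 0
    for hop in reversed(hops):
        route = (route << 2) | hop
    return route


def next_hop(route):
    """Return (direction for this controller, route field for the next one)."""
    return route & 0b11, route >> 2


route = encode_route([STRAIGHT, LEFT, EXTRACT])
while True:
    direction, route = next_hop(route)
    print(format(direction, "02b"))
    if direction == EXTRACT:
        break
```

With source-based routing of this kind, each controller only inspects two bits and shifts, which is why the slides describe it as simple and fast.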

29 Routing Example

30 Network working - Output Controller

31 Network working - Input Controller

32 Topics to be discussed
Motivations for a new communication template
New Methodology for Synthesis
Clustering - Analysis phase
Network Architecture and Working
NoCSIM - Simulation Phase
Core Network Interface
Gate Level Implementation of NoCs
Future Work and Conclusion

33 NoCSIM - SystemC based Network Simulator
SystemC
  –SystemC Ports, Signals, Clocks, Processes and Modules
1000X Faster than RTL
System Level Exploration for Architectural Synthesis

34 Simulator Features
Flow Control – Dynamic & Static
Routing – Source-based, Dynamic & Multicast
Switching – Packet switched
Topology – K-ary-n-cube & Arbitrary topological extensions
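
For readers unfamiliar with the k-ary n-cube family mentioned under Topology, the sketch below enumerates a node's neighbors: nodes are n-digit coordinates in base k, and each dimension forms a ring of k nodes. This is a generic illustration of the topology, not code from NoCSIM; the function name is hypothetical.

```python
def neighbors(node, k):
    """Neighbors of `node` (a tuple of n coordinates, each in range(k))
    in a k-ary n-cube, where every dimension wraps around."""
    result = []
    for dim in range(len(node)):
        for step in (-1, 1):
            coords = list(node)
            coords[dim] = (coords[dim] + step) % k
            candidate = tuple(coords)
            # For k == 2 both steps reach the same node; list it only once.
            if candidate != node and candidate not in result:
                result.append(candidate)
    return result


print(neighbors((0, 0), 4))     # 4-ary 2-cube: a 4 x 4 torus
print(neighbors((0, 0, 0), 2))  # 2-ary 3-cube: a hypercube
```

A 2-D torus and a hypercube are both members of this family, so one parameterized generator covers several of the regular topologies a simulator needs.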

35 Class Hierarchy

36 Simulation Results

37 Simulation Results (cont.)

38 Simulation Results (cont.)

39 Topics to be discussed
Motivations for a new communication template
New Methodology for Synthesis
Clustering - Analysis phase
Network Architecture and Working
NoCSIM - A Simulation Phase
Core Network Interface
Gate Level Implementation of NoCs
Future Work and Conclusion

40 Core Network Interface
Two implementations

41 Interface Estimations
Type of Implementation: RTL (HW) implementation (on Xtensa core)
  Area: Additional register and logic to packetize
  Latency: 2 cycles
  Complexity: Additional registers and logic and an increase in instruction set
  Flexibility: Requires programmable cores or development of modified cores
Type of Implementation: Wrapper RTL (HW) implementation (off core)
  Area: Additional control, registers and logic to packetize
  Latency: 3 cycles expected
  Complexity: Additional control, registers and logic. Ability to understand core operation
  Flexibility: Can use existing cores. Modify wrappers for plug-and-play into different networks

42 Topics to be discussed
Motivations for a new communication template
New Methodology for Synthesis
Clustering - An analysis phase
Network Architecture and Working
NoCSIM - A Simulation Phase
Core Network Interface
Gate Level Implementation of NoCs
Future Work and Conclusion

43 Gate Level Implementation of NoCs
Synthesis Library Used
  –0.18 µm library
  –Vdd = 1.62 V
  –Synopsys Design Compiler, Power Compiler
  –No dynamic power annotations

44 Gate Level Implementation of NoCs (cont.)

45 Gate Level Implementation of NoCs (cont.)
Module               Area (128)    Area (256)
Output Controller    78800         154100
Input Controller     470000        610000
VCs, Buffers         Area estimate in square microns
1, 8                 338000
2, 4                 354000
2, 8                 431000
4, 4                 470000

46 Performance Evaluation
A configuration of 6 buffers and 4 virtual channels with 16 such tiles (network logic clock = 400 MHz) can
  –Support aggregate data rates of 600 Gbps
  –Occupy around 5% of the chip area, assuming a 3 cm × 3 cm chip
  –Dissipate around 1.6 W
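
The headline numbers on this slide can be broken down per tile with simple arithmetic. The sketch below uses only the figures quoted on the slide; the variable names are illustrative, and the even split across tiles is an assumption, since the slide reports aggregate values only.

```python
# Figures quoted on the slide (names are illustrative).
TILES = 16
CLOCK_HZ = 400e6
AGGREGATE_BPS = 600e9
CHIP_SIDE_CM = 3.0
AREA_FRACTION = 0.05
POWER_W = 1.6

# Assumes bandwidth and power are shared evenly across tiles.
per_tile_bps = AGGREGATE_BPS / TILES
bits_per_cycle_per_tile = per_tile_bps / CLOCK_HZ
chip_area_mm2 = (CHIP_SIDE_CM * 10) ** 2
network_area_mm2 = AREA_FRACTION * chip_area_mm2
power_per_tile_w = POWER_W / TILES

print(f"per-tile data rate:  {per_tile_bps / 1e9:.1f} Gbps")
print(f"bits per cycle/tile: {bits_per_cycle_per_tile:.2f}")
print(f"network area:        {network_area_mm2:.0f} mm^2 of {chip_area_mm2:.0f} mm^2")
print(f"power per tile:      {power_per_tile_w * 1e3:.0f} mW")
```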

47 A Roadmap
Area Cost will Decrease with process
Latency will be dominated by wire delay
Power reductions will become more important
MOVE WILL BE TOWARDS MORE CLUSTERS AND CONCURRENCY

48 Topics to be discussed
Motivations for a new communication template
New Methodology for Synthesis
Clustering - Analysis phase
Network Architecture and Working
NoCSIM - Simulation Phase
Core Network Interface
Gate Level Implementation of NoCs
Future Work

49 Future Work
Network Architecture
  –Application Layer
    Wrapper design, protocols and services
  –Logical Network Layer
    Flow-control, routing, switching and topologies
  –Physical Network Layer
    Signaling, Wire characterization and Prediction
System Level Design and Design Flow
  –Power Management of Network
  –Integration to Codesign Environment
    Network aware partitioning and mapping
    Integration with Clustering phase

50 References
[1] N. Swaminathan, MS Thesis, Texas A&M Univ., Summer 2002.
[1] K. Lahiri et al., “Fast performance Analysis of bus based on-chip communication architecture”, in Proc. of ICCAD, 1999.
[2] A. Hemani et al., “Lowering Power Consumption in clock by using Globally Asynchronous Design Style”, in Proc. of DAC, USA, 1999, pp. 873-879.

