Presentation is loading. Please wait.

Presentation is loading. Please wait.

Communication Architecture Synthesis Alessandro Pinto, University of California at Berkeley

Similar presentations


Presentation on theme: "Communication Architecture Synthesis Alessandro Pinto, University of California at Berkeley"— Presentation transcript:

1 Communication Architecture Synthesis Alessandro Pinto, University of California at Berkeley apinto@eecs.berkeley.edu

2 Introduction: what Communication synthesis What is the “best way” of transferring information between entities Problem Formulation Who wants to communicate with whom How he aspects the communication mechanism to behave What is possible to built today How to build a communication architecture that is cheap and satisfies the entities constraints while being feasible

3 Introduction: why On-Chip application Lots of components connected together Distance is becoming important There isn’t a formal approach to the problem We build tools which are the glue in the platform-based design approach More or less everything has been done in the design of “functions”

4 Outline Overview Internet vs. On-chipnet Standard communication structures VSIA + Standard comm. Platform Constraint-Driven Communication Synthesis A theoretic approach (A.Pinto,L.P.Carloni.ASV) Current research Conclusion

5 Wire Scaling “As designs scale to newer technologies, they get smaller and their wires get shorter, and the relative change in the speed of wires to the speed of gates is modest.” M.A.Horowitz et.al “The future of wires” “The real wire problem arises with increasing chip complexity and global communication costs.” M.A.Horowitz et.al “The future of wires” Scale Local Wire Global Wire

6 What is going to happen From few closed connected components to a lot of distributed IP’s A global net is most likely to be traveled in more that one clock cycle High bandwidth and low power requirements Fault tolerance requirement There is need for a correct by construction methodology

7 Internet Network Highly distributed Fault Tolerant Every node must reachable

8 Networks vs. On-Chip Networks Internet major concern was to guarantee connectivity in case of attack Cost is mostly static Buffers memory is not a problem (512MB is still cheap) Number and complexity of protocol layer is important only for performance issues Full connectivity is not the major concern Now cost is mostly dynamic (power) Registers are expensive The protocol must be very light In the future error correction might become a reality

9 The beginning of internet Back in 1969: the ARPAnet SRI UCLA UCSB UTAH

10 The beginning of On-Chipnet A lot of effort on on-chip communication One of GSRC main theme: Communication- based design Special section in conferences Different approaches Not always really networks Buses are still widely used

11 Effect of standardization 1983: TCP/IP was introduced 1987: Commercialization of internet TCP/IP technology is transferred form Inventors to vendors

12 Standardization VSIA: Virtual Socket Interface Alliance Specifies Virtual Component Interface Peripheral VCI Basic VCI Advanced VCI VCI InitiatorVCI Target VCI Initiator Bus MasterBus Slave Any Bus AB http://www.vsi.org

13 Standard “Bus” Architecture ARM (ARM Ltd.)* CoreFrame (Palmchip) CoreConnect (IBM) WichBone (Silicore) SiliconBackPlane (Sonics)*

14 AMBA Bus Hierarchy AHB APB HP ARM Core On Chip RAM DMA Off-Chip RAM Bridge Keypad UART AHB: Advanced High-Performance Bus AB: Advanced Peripheral Bus

15 SiliconBackplane It’s possible to specify constraints for each point to point communication It’s possible to specify different OPC Synthesis of the network that guarantees performances

16 A Today’s Platform Example Processor USB, MMC, IrDA, Bluetooth PCMCIA, SDRAM CPU PCMCIA SDRAM Bluetooth MMC USB IrDA Bridge Based on Intel XScale

17 Where are we headed? Communication is specified in terms of point to point Constraints A target implementation technology is abstracted into a Library A synthesis procedure generates a communication architecture That satisfies the constraints That uses only the library elements

18 A Platform Integrator Environment ArchPltBuilder RTOS μC1 HW MEM μC2 DSP μC1 μC2 DSP MEM RTOS HW MEM To Silicon

19 Communication Topology Design Platform-based-design in communication

20 The Problem M 1 M 2 M 3 M 4 M 5

21 The Problem M 1 M 2 M 3 M 4 M 5

22 Previous Work S.Dey et al., “System-Level Performance Analysis for Designing On-Chip Communication Architectures”, TCAD, Vol. 20, NO. 6, June 2001 S.Dey et al., “Efficient Exploration of the SOC Communication Architecture Design Space”,Int. Conf. On CAD, August 1999 J.M. Daveau et al., “Protocol Selection and Interface Generation for HW-SW Codesign”, TVLSI, Vol. 5, NO.1, March 1997 J.M. Daveau et al., “Synthesis of system level communication by an allocation based approach”, Int. Symposium on System Synthesis, 1995

23 Our Approach System modules communicate by means of point-to-point unidirectional channels Each channel is characterized by a set of communication constraints (distance, minimum bandwidth) The specific functionality of each module is “abstracted away” Abstract Model

24 The Goal – Overview of the Method Constraint Propagation Performance Abstraction

25 Abstraction (Intuition) cost Length b1 b2 clk Relay Station Switches C r (b) C s (b) This is the basic library of Communication Components

26 Communication-Constraint Graph G=(V,A)  v  V, p(v) is the vertex position in our coordinate system  a  A: The arc distance d(a) The arc bandwidth b(a) M 1 M 5 v1v2 a

27 Library  = L  N, communication links and communication nodes  l  L: d(l) is the link length b(l) is the link bandwidth c(l) link cost  n  N c(n) is the cost of the communication node l1 l2 l3 n  r

28 Communication Implementation (1,3) (b(a),d(a))=(4,5) (1,2) (2,2)  nr G=(V,A)

29 Communication Implementation Arc Matching (1,3) (4,5) (1,2) (2,2)  nr

30 Communication Implementation Arc Segmentation (4,5) (1,2) (2,2)  nr

31 Communication Implementation Arc Duplication + Arc Segmentation (1,2) (2,2)  nr

32 Communication Implementation (1,3) Arc Merging (1,2) (2,2)  nr

33 Implementation Graph G’(G,  ) = (V’  N’, A’)  v’  V’, v’  V  n’  N’, n’ is an instance of an element of N  a’  A’, a’ is an instance of an element of L  a(u,v)  A,  P(a) set of paths s.t. P(a) connects u’ with v’ without passing through any other computational vertex P(a) satisfies the bandwidth constraint of a The cost of and implementation graph is

34 TheOptimization Problem

35 Assumptions Assumption on the Library d(a 1 ) > d(a 2 )  b(a 1 ) > b(a 2 )  c(a 1 ) > c(a 2 ) K-Way Merging structure q*

36 Candidate Implementations Generate only k-way mergings that could be part of the final implementation Derive mergeability conditions based only on positions and bandwidth of the constraint graph and the library

37 2-Way Mergeablility u1u1 u2u2 v1v1 v2v2 u1u1 u2u2 v1v1 v2v2

38 u1u1 u2u2 v1v1 v2v2 u1u1 u2u2 v1v1 v2v2

39 u1u1 u2u2 v1v1 v2v2 u1u1 u2u2 v1v1 v2v2

40 u1u1 u2u2 v1v1 v2v2 a1a1 a2a2 d(a 1 ) + d(a 2 )  || p(u 1 ) – p(u 2 )|| || p(v 1 ) – p(v 2 )|| || p(u 1 ) – p(u 2 )|| + || p(v 1 ) – p(v 2 )||  {a 1, a 2 } not mergeable 2-Way Mergeablility

41  {a 1… a k } not mergeable u1u1 ukuk v1v1 vkvk a1a1 akak (k-1)d(a k ) + d(a i )  || p(u 1 ) – p(u k )|| || p(v 1 ) – p(v k )|| || p(u i ) – p(u k )|| + || p(v i ) – p(v k )|| 2-Way Mergeablility

42 b(a i )  b(a l ) + b(a j )  {a 1… a k } not mergeable u1u1 ukuk v1v1 vkvk n-Way Mergeablility

43 Expansion Rule If a  A is not mergeable with any set of k-1 arcs in A, then a is not mergeable with any set of k arcs in A A a A kmax

44 Summary of Pruning Conditions K-way mergeablility condition over distance K-way mergeablility condition over bandwidth K  k+1 mergeability condition 1 2 3 All the pruning conditions are based on Positions of ports and Properties of arcs in the Constraint Graph and maximum bandwidth in the Library

45 Solving the Optimization Problem

46 Constrained Distance Sum Matrix D ij= d(a i ) + d(a j ) ajaj akak amam Dkj + Dmj = d(a k ) + d(a j )+ d(a m ) + d(a j ) = 2d(a j ) + d(a k ) + d(a m ) = (k-1)d(a j ) + d(a i )

47 Merging Distance Sum Matrix  ij= ||p(u i ) – p(u j )|| + ||p(v i ) - p(v j )|| ajaj akak amam  kj +  mj = ||p(u k ) – p(u j )|| + ||p(v k ) - p(v j )|| + ||p(u m ) – p(u j )|| + ||p(v m ) - p(v j )||= = || p(u i ) – p(u k )|| + || p(v i ) – p(v k )||

48 Algorithm K  1; foundcanditatemapping  TRUE; while (foundcanditatemapping) { for all column j  col(  ) kWayMergingNotFound  TRUE; for all subset of row R={i1…ik}  row(  ) if sum(D,R) < sum( ,R) if  l  L s.t. b(l) + bmin(R,j) < sum(B,R) + B[j] { S  S  kmergingimplementation(R,j); kWayMergingNotFound  FALSE } if (kWayMergingNotFound) col(  )  col(  ) \ j; if col(  ) =  foundcanditatemapping  FALSE; else k  k+1;} 1 2 3

49 A Simple Example a1a1 a3a3 a2a2 (0,0)(0.4,0)(1.4,0) (0,1)(0.4,1)(1.4,1) 22 2 a1a1 a1a1 a2a2 a2a2 a3a3 a3a3 D 0.82.8 2 a1a1 a1a1 a2a2 a2a2 a3a3 a3a3  111 a1a1 a2a2 a3a3 B b(l)=2 d(l)=1 c(l)=1 L

50 A Simple Example a1a1 a3a3 a2a2 (0,0)(0.4,0)(1.4,0) (0,1)(0.4,1)(1.4,1) 22 2 a1a1 a1a1 a2a2 a2a2 a3a3 a3a3 D 0.82.8 2 a1a1 a1a1 a2a2 a2a2 a3a3 a3a3  111 a1a1 a2a2 a3a3 B b(l)=2 d(l)=1 c(l)=1 L

51 A Simple Example a1a1 a3a3 a2a2 (0,0)(0.4,0)(1.4,0) (0,1)(0.4,1)(1.4,1) 22 2 a1a1 a1a1 a2a2 a2a2 a3a3 a3a3 0.82.8 2 a1a1 a1a1 a2a2 a2a2 a3a3 a3a3  111 a1a1 a2a2 a3a3 B b(l)=2 d(l)=1 c(l)=1 L  

52 Examples Lib1 3 3-way mergings 1 9-way merging

53 Lib2 All p-to-p 1 9-way merging Examples

54 Limitations u1u1 u2u2 u3u3 v1v1 v2v2 v3v3 Alternation u1u1 u2u2 u3u3 v1v1 v2v2 v3v3 Bi-directionality

55 Library Characterization Main concerns (costs) Area Power Costs are affected by Geometric and physic properties of wires (our basic communication primitives) Communication Speed

56 Wires Geometry and Physics l crit buffer s opt Depends only on the technology, not on the metal layer

57 An Example Based on Intel 0.13um Energy consumption for a series of lcrit and buffer sopt Optimum buffer size in terms of minimum size

58 Wires trade off  crit is constant for the same length, different metal layers carry different bandwidth A wire in M(n) is going to cost more in terms of area and power then a wire in M(n-1) Buffers are bigger The library is convex

59 Conlcusion A chip will look like a network of components Highly distributed Multiclock cycle nets CAD tools for On-Chipnet design must Allow specification of communication constraints Find the minimum cost network. Costs are: Power Area (statefull/stateless repeaters) Blocking time (queues) Ensure Network reliability and correctness Constraint-Driven Communication Synthesis is our approach to the problem

60 Future Work (Available Projects) Ad hoc flow control protocol for low power, minimum buffering On-Chipnet Implementation and performance/cost abstraction of fast low power On-Chip routers Interface with Metropolis On-Chipnet VHDL netlist generator


Download ppt "Communication Architecture Synthesis Alessandro Pinto, University of California at Berkeley"

Similar presentations


Ads by Google