Presentation is loading. Please wait.

Presentation is loading. Please wait.

Intelligent Interconnects for Multicore SoCs Drew Wingard, CTO, Sonics, Inc. OCIN06: December 6, 2006.

Similar presentations


Presentation on theme: "Intelligent Interconnects for Multicore SoCs Drew Wingard, CTO, Sonics, Inc. OCIN06: December 6, 2006."— Presentation transcript:

1 Intelligent Interconnects for Multicore SoCs Drew Wingard, CTO, Sonics, Inc. OCIN06: December 6, 2006

2 OCIN06: Intelligent Interconnects for Multicore SoCs2Dec. 6, 2006 Agenda SoC Background Interconnect Architecture Application Areas

3 OCIN06: Intelligent Interconnects for Multicore SoCs3Dec. 6, 2006 The Goal Create a system-on-a-chip, comprising ten million gates, that satisfies rapidly-evolving market requirements for: –Speed –Power –Area –Application Performance –Time to Market Using the minimum resources with maximum predictability

4 OCIN06: Intelligent Interconnects for Multicore SoCs4Dec. 6, 2006 SoC Architecture Trends Massive feature integration –Driven largely by Moores Law (supply) and convergence (demand) Continued movement of complexity to software Distributed architectures –Higher scalability (and independence?) Multiple processors –CPU –DSP –Special purpose (MPEG, packet, …) Distributed DMA –Removes centralized DMA bottleneck –Simplifies driver software integration CPUMPEGDSP 3D GFX MAC Video I/O CommI/O DRAM Controller System On Chip

5 OCIN06: Intelligent Interconnects for Multicore SoCs5Dec. 6, 2006 Why SoCs Are Difficult Improvement by feature integration has been practiced for decades –Why is SoC any different? Benefits of an SoC –Higher performance –Smaller footprint –Lower power –Lower cost Size, power, and cost benefits derive from sharing –Control software on CPU (interrupts, etc.) –Memory resources (on-chip and off-chip) – single external DRAM Building predictable systems with so much sharing is hard –Dozens to hundreds of interrupt sources –Memory bandwidth bottlenecks + lots of real-time traffic stress Unified Memory Architecture (UMA)

6 OCIN06: Intelligent Interconnects for Multicore SoCs6Dec. 6, 2006 Limits of Tightly Coupled Design Tightly coupled design has been dominant –Assumes largely synchronous, instantaneous and free communication –Widely practiced, and supported in design flows BUT –Delivering clocks is problematic –Wire delay is dominant –Routing area can cost more than gates –Too many constraints, from too many blocks Cannot afford lowest common denominator design

7 OCIN06: Intelligent Interconnects for Multicore SoCs7Dec. 6, 2006 Network DRAM Controller CPU DMA DRAM ControllerCPU Our Approach: Active Decoupling Separation Abstraction Optimization Independence Master Slave Master Socket MasterSlave MasterSlave Bus Adapter Core Function Communication Core Function Communication Core Function Communication Agent Internal Fabric SMART Interconnect

8 OCIN06: Intelligent Interconnects for Multicore SoCs8Dec. 6, 2006 Layers of Decoupling Degree of blocking QoS Addressing Source identification Bursting, pipelining Threading Data, address, event widths Handshaking / flow control Signal timing and capacitance Clock frequency Performance Identification Transfer Protocol Signaling Electrical Higher Abstraction

9 OCIN06: Intelligent Interconnects for Multicore SoCs9Dec. 6, 2006 Evolution of Design Abstractions Abstraction minimizes the number of objects that the designer must manage Gates Wires Level of Abstraction TIME Blocks Buses n Tiles Networks Interconnect Cores Functional Cores Need to be here! Most designs are here

10 OCIN06: Intelligent Interconnects for Multicore SoCs10Dec. 6, 2006 Abstraction Enables Higher Complexity p 100M+ Gates Technology Push THE PAST BlocksBuses PRESENT Cores Interconnects 5M+ Gates DEVELOPMENT TIME & COST FUTURE TilesNetworks Log(GATE COUNT)

11 OCIN06: Intelligent Interconnects for Multicore SoCs11Dec. 6, 2006 The OCP Socket Owned/advanced by OCP-IP (www.ocpip.org)www.ocpip.org –Over 150 member companies Interconnect-neutral Defines data flow, control flow, test signaling Highly configurable to match core needs Simple Master/Slave request/response protocols –With many options Fully synchronous, point-to-point Optional pipelining, bursting, threading Flexible handshaking and other flow control

12 OCIN06: Intelligent Interconnects for Multicore SoCs12Dec. 6, 2006 Agenda SoC Background Interconnect Architecture –Advanced Fabrics –Intelligent Agents Application Areas

13 OCIN06: Intelligent Interconnects for Multicore SoCs13Dec. 6, 2006 SoC Data Flow Requirements Connect dozens of initiators to several memories –While meeting a wide range of latency and throughput requirements Connect processors and DMA to control ports and peripherals –With predictably low latencies Example SoC requirements (interconnect view): MetricModeTypeRequirement CPU – peripheral interrupt serviceAllWorst-case< 10 cycles CPU – DDR cache miss latencyAllAverageAs low as feasible H.264 decode – DDR bandwidthHD vid.Worst-case1.2 GBytes/sec H.264 decode – service jitterHD vid.Worst-case< 6 macro-blocks Video refresh – DDR bandwidthSD/NIWorst-case56 MBytes/sec ………… DDR data bus bandwidthHD vid.Sustained> 80%

14 OCIN06: Intelligent Interconnects for Multicore SoCs14Dec. 6, 2006 Interconnect Fabric Options SoC data flow requirements must be satisfied by internal interconnect fabric –Big challenge in current SoC designs! Choices in interconnect fabric design –Unified vs. split transactions –Shared vs. separate physical links –Combinational vs. pipelined –Single vs. multiple outstanding transactions (transaction pipelining) –In-order vs. out-of-order completion and response –Blocking vs. non-blocking flow control

15 OCIN06: Intelligent Interconnects for Multicore SoCs15Dec. 6, 2006 Blocking vs. Non-blocking Flow Control Sharing in SoCs creates many opportunities for contention –Arbitration determines who wins –Flow control determines when the winner gets to go Blocking flow control systems allow resource shortages along some paths to prevent other paths from progressing Non-blocking flow control systems ensure that points of sharing never stall if any data flow could progress Locally non-blocking flow control (aka virtual channels) improves efficiency and allows more resource sharing End-to-end non-blocking flow control offers greater predictability –Provides basis for QoS guarantees Our Approach

16 OCIN06: Intelligent Interconnects for Multicore SoCs16Dec. 6, 2006 SMX SonicsMX Basic Architecture Hybrid topologies –Full / partial cross-bar –Shared bus Pipelined, multi-threaded, non-blocking fabric Fully split (dual) request / response Distributed QoS arbiter –Spans cycle, frequency, and data width boundaries –Supports flexible thread merging tree topologies I SMX CPU DSP GFX ROM SRAM Flash Ctl. DRAM Ctl. IIII T T

17 OCIN06: Intelligent Interconnects for Multicore SoCs17Dec. 6, 2006 Global Interconnect Responsibilities Routing –Getting requests, responses and data to the desired destination Access control –Managing contention for shared resources (ensuring QoS) –Ensuring requested access is allowed (security and protection) Error management –Detection, reporting, and SW recovery support Power management –Activity detection, clock and voltage removal support Connectivity –Protocol conversion –Data width / clock frequency conversion Spanning distance –Connecting endpoints at required frequency and latency

18 OCIN06: Intelligent Interconnects for Multicore SoCs18Dec. 6, 2006 Fabric The Intelligence is in the Agents Agents provide… –Protocol conversion Agent adapts to IP core –Decoupling of IP cores from fabric Provide local, isolated environment –Layered services Proven technology –Over 100 million ICs shipped so far Agent services –Power management –Security management –Error management –QoS –Burst, width, and command conversion I T I T I T I T I T INITIATOR SOCKETS TARGET SOCKETS Target Agents (TA) Initiator Agents (IA)

19 OCIN06: Intelligent Interconnects for Multicore SoCs19Dec. 6, 2006 Sonics Interconnect Products ProductSiliconBackplaneSonics3220SonicsMX Introduction ConnectsProcessors, memories Peripherals, registers Processors, memories Socket(s)OCP 1OCP 1/2, APBOCP 1/2, AHB, AXI DecouplingHighMediumVery High FabricDistributed, shared bus Multi-branch shared bus Hybrid cross-bar / shared bus ApplicationsWLAN, HDTV, PVR, … Handsets, consumer, … Handsets, HDTV, DSC, OA, … Volume Production Over 100 million ICs shipped 2005

20 OCIN06: Intelligent Interconnects for Multicore SoCs20Dec. 6, 2006 Interconnect Performance Many standard peak measurements pretty meaningless when considering SoC interconnects –Bandwidth is cheap (wires are cheaper than pins) –Usable bandwidth mostly dependent on attached cores… –… and application scenario Sonics products engineered to span very wide range of operating points –Latencies: 0-10 clock cycles –Bandwidth: limited only by targets –Frequency: keep up with DRAM –IP core count: –No process-specific limitations –Standard ASIC-style design flow

21 OCIN06: Intelligent Interconnects for Multicore SoCs21Dec. 6, 2006 Product Deliverables Configuration tools –Aid in capture, checking, generation, and verification of Sonics IP Configured Register Transfer Language (RTL) code –Virtual hardware input to logic synthesis and place & route tools Configured verification environment –Tests customer configuration of Sonics IP Logic synthesis scripts & floorplan interface –Bridges to physical design domain Electronic System Level (ESL) models (in SystemC) –Helps architectural exploration and firmware development Analysis tools and models –Aids customer in building and analyzing prototypes of SoC

22 OCIN06: Intelligent Interconnects for Multicore SoCs22Dec. 6, 2006 Design Issues Addressed By Our Approach Methodology & Automation Scalable Fabrics Intelligen t Agents Complex Memory Hierarchies Signal Integrity Distributed Processing High Peripheral Count Perf. Verification Arch. Modeling SW Development Virtual Prototyping Parallel IP Creation Design Re-use Timing Closure Variable Clock Freq. Voltage Isolation Power Management Error Management Access Security Data Width Conversion Mixed Endianness Protocol Conversion Guaranteed BW QoS Pipelining

23 OCIN06: Intelligent Interconnects for Multicore SoCs23Dec. 6, 2006 Design Timeline Benefits of Our Approach

24 OCIN06: Intelligent Interconnects for Multicore SoCs24Dec. 6, 2006 Agenda SoC Background Interconnect Architecture Application Areas

25 OCIN06: Intelligent Interconnects for Multicore SoCs25Dec. 6, 2006 Inside a Tile-based Multicore SoC Example: TI OMAP 2420 Heterogeneous processors –General-purpose CPU (e.g. ARM 11) –DSP (e.g. TI C5x) –2D/3D graphics (e.g. Imagination PowerVR MBX) –Video accelerator (e.g. MPEG4 codec) Multiple levels of shared memory –External DRAM and Flash –Internal SRAM and ROM Very complex peripheral subsystems Security management (content, service, stability) Aggressive power management

26 OCIN06: Intelligent Interconnects for Multicore SoCs26Dec. 6, 2006 Sources: (from OMAP 5910) IVA

27 OCIN06: Intelligent Interconnects for Multicore SoCs27Dec. 6, 2006 Multicore Architecture Advantages Avner Goren TI EPF 2004 What is needed

28 OCIN06: Intelligent Interconnects for Multicore SoCs28Dec. 6, 2006 Application Processor Interconnect Example S3220 CPU Tile PP P T P T P T P T P T P T P T P T P T P T P T P T P T P T P T P T P T P T P T P T P T P T P T P T P T P T Flash Controller SDRAM Controller 2D/3D Graphics Tile Embedded SRAM DSP Tile SMX I T T T I I MPEG4 Codec Tile MP3USB 2.0 LCD Controller Camera Interface TT TT T P T II II DMA II IT IT TI Complex Socket 128 DSP Core Inst. Cache X RAM Y RAM MMU DMA Data Cache Shared Bus Fabric T P T P T P T Simple Socket 16 Agent Regs Socket I/F Fabric I/F Decoupling Buffer SM P T P T P T TTTT I Partial XBar Fabric

29 OCIN06: Intelligent Interconnects for Multicore SoCs29Dec. 6, 2006 SoC Application Requirements (1/2) Market DRAM Efficiency Real- time Feature- driven Power- optimized Flexible Platform Application Processor DTV/STB Gaming Multi-function Printers WLAN

30 OCIN06: Intelligent Interconnects for Multicore SoCs30Dec. 6, 2006 SoC Application Requirements (2/2) Market DRAM Efficiency Real- time Feature- driven Power- optimized Flexible Platform Cellular Baseband DSC PC Automotive Storage

31 OCIN06: Intelligent Interconnects for Multicore SoCs31Dec. 6, 2006 Challenges With Textbook NoC SoC applications do not offer lots of network- level concurrency –Many are dominated by a single DRAM target NoC packetization and serialization overhead too high –Communication (wires) vs. computation (router, NI) cost trade-offs different for SoC NoC latency is unacceptable for current processors –Should improve as reality sets in…

32 OCIN06: Intelligent Interconnects for Multicore SoCs32Dec. 6, 2006 Research Opportunities Sonics is interested in facilitating research in the following areas: –NoC benchmarking (via OCP-IP) –Distributed, heterogeneous cache and I/O coherence –Performance constraint capture –Static performance analysis (SPA) –Implications of 3D packaging on SoC architecture Interested? –Please contact me:

33 OCIN06: Intelligent Interconnects for Multicore SoCs33Dec. 6, 2006 Summary Active decoupling is essential for managing heterogeneous designs Non-blocking interconnect fabrics offer higher efficiency and better QoS Intelligent agents centralize key services High volume multicore SoC applications need advanced on-chip interconnects now Please contact me for references


Download ppt "Intelligent Interconnects for Multicore SoCs Drew Wingard, CTO, Sonics, Inc. OCIN06: December 6, 2006."

Similar presentations


Ads by Google