Published by Destin Blaker
Modified about 1 year ago
On-Chip Communication Architectures
Introduction
ICS 295
Sudeep Pasricha and Nikil Dutt
Slides based on book chapters 1, 2
© 2008 Sudeep Pasricha & Nikil Dutt

Outline
Introduction to SoC Design Trends
Significance of on-chip communication architectures
Bus-based communication architectures
  ◦ Terminology
  ◦ Physical structure
  ◦ Clocking
  ◦ Arbitration and decoding
  ◦ Topology types
  ◦ Data transfer modes
  ◦ Physical implementation issues
  ◦ DSM effects

Designer Productivity Gap
SoC designs today are complex, characterized by more and more IPs being integrated on a single chip, and a shrinking time-to-market.

Emerging Application Requirements

Coping with SoC Complexity
Practicing IP based Design and Reuse
  ◦ Raising the reuse factor from standard cells to IP blocks, e.g. predesigned hardware IPs for processors (ARM, PowerPC), communication (AMBA, CoreConnect), memories (Samsung SDRAMs, Denali SRAMs), I/O (UART, USB) etc.
  ◦ IPs not just for hardware, but for software (device drivers, OS) too
  ◦ Substantial reduction in SoC design and verification time
  ◦ Requires initial investment to create reusable cores but productivity improves with reuse
IP Interfacing Standards
  ◦ IP based design needs to handle incompatible IP interfaces
  ◦ Assembling heterogeneous IPs for SoC design can take months!
  ◦ Need for unified standard to quickly connect IPs, e.g. OCP-IP, VSIA VCI etc.

Coping with SoC Complexity
Platform based Design
  ◦ Design paradigm which promotes IP based design and reuse
  ◦ A meet-in-the-middle approach maps application to a customizable design consisting of HW, SW IPs
  ◦ Platforms use IPs, but will soon become the IPs of the future, e.g. Philips Nexperia, ARM PrimeXsys platform etc.
Decouples the application development process from the architectural implementation process.

Data Flow Replacing Data Processing As Major SoC Design Challenge
[Figure: block diagrams comparing SoCs circa 2002 and SoCs circa 2008; labels include I/O bus, main bus, memory bus, µP subsystem, cores 1 to N, and DRAMC]
SoCs circa 2002: critical decision was µP choice
SoCs circa 2008: critical decision is interconnect choice
Exploding core counts requiring more advanced interconnects
EDA cannot solve this architectural problem easily
Complexity too high to hand craft (and verify!)
Communication architecture design and verification becoming highest priority in contemporary SoC design!
Source: SONICS Inc.

Examples of On-chip Communication Architectures
IBM Cell ring bus communication architecture

Ideal ESL Design Flow

Need for Communication-centric Design Flow
Communication is THE most critical aspect affecting system performance
Communication architecture consumes up to 50% of total on-chip power
Ever increasing number of wires, repeaters, bus components (arbiters, bridges, decoders etc.) increases system cost
Communication architecture design, customization, exploration, verification and implementation takes up the largest chunk of a design cycle
Communication architectures in today's complex systems significantly affect performance, power, cost and time-to-market!

Technology Scaling Trends: Total Interconnect Length on a Chip
Highlights importance of interconnect design in future technologies

Technology Scaling Trends: Interconnect Performance
Relative delay comparison of wires vs. process technology
Increasing wire delay limits achievable performance

Bus based On-Chip Communication Architectures
Buses are the simplest and most widely used SoC interconnection networks
Bus:
  ◦ a collection of signals (wires) to which one or more IP components (which need to communicate data with each other) are connected
Only one IP component can transfer data on the shared bus at any given time
[Figure: microcontroller, digital signal processor, input/output device, and memory connected to a shared bus]

Bus Terminology

Bus Terminology
Master (or Initiator)
  ◦ IP component that initiates a read or write data transfer
Slave (or Target)
  ◦ IP component that does not initiate transfers and only responds to incoming transfer requests
Arbiter
  ◦ Controls access to the shared bus
  ◦ Uses arbitration scheme to select master to grant access to bus
Decoder
  ◦ Determines which component a transfer is intended for
Bridge
  ◦ Connects two buses
  ◦ Acts as slave on one side and master on the other

Bus signal lines
A bus typically consists of three types of signal lines
  ◦ Address
    Carry address of destination for which transfer is initiated
    Can be shared or separate for read, write data
  ◦ Data
    Carry information between source and destination components
    Can be shared or separate for read, write data
    Choice of data width critical for application performance
  ◦ Control
    Requests and acknowledgements
    Specify more information about type of data transfer
    Byte enable, burst size, cacheable/bufferable, write-back/through, …
[Figure: bus with address lines, data lines, and control lines]

Types of Bus Topologies
Shared bus

Types of Bus Topologies
Hierarchical shared bus
Improves system throughput
Multiple ongoing transfers on different buses

Types of Bus Topologies
Split bus
Reduces impact of capacitance across two segments
Reduces contention and energy

Types of Bus Topologies
Full crossbar/matrix bus (point to point)

Types of Bus Topologies
Partial crossbar/matrix bus

Types of Bus Topologies
Ring bus

Bus Physical Structure
Tri-state buffer based bidirectional signals
Commonly used in off-chip/backplane buses
  ◦ + take up fewer wires, smaller area footprint
  ◦ - higher power consumption, higher delay, hard to debug

Bus Physical Structure
MUX based signals
Separate read, write channels

Bus Physical Structure
AND-OR based signals

Bus Clocking
Synchronous Bus
  ◦ Includes a clock in control lines
  ◦ Fixed protocol for communication that is relative to clock
  ◦ Involves very little logic and can run very fast
  ◦ Requires frequency converters across frequency domains

Bus Clocking
Asynchronous Bus
  ◦ Not clocked
  ◦ Requires a handshaking protocol
    Performance not as good as that of synchronous bus
    No need for frequency converters, but does need extra lines
  ◦ Does not suffer from clock skew like the synchronous bus

Decoding and Arbitration
Decoding
  ◦ determines the target for any transfer initiated by a master
Arbitration
  ◦ decides which master can use the shared bus if more than one master requests bus access simultaneously
Decoding and Arbitration can either be
  ◦ centralized
  ◦ distributed

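The decoding step described above can be sketched as a lookup in an address map. This is a minimal illustration, not the decoder of any particular bus standard: the address ranges, the slave names, and the `decode` helper are all assumptions made up for the example.

```python
from typing import Optional

# Hypothetical address map: (base address, size in bytes, slave name).
# These ranges are illustrative and not taken from any real SoC.
ADDRESS_MAP = [
    (0x0000_0000, 0x1000_0000, "memory"),
    (0x4000_0000, 0x0000_1000, "uart"),
    (0x4000_1000, 0x0000_1000, "usb"),
]


def decode(address: int) -> Optional[str]:
    """Return the name of the slave whose address range contains `address`."""
    for base, size, slave in ADDRESS_MAP:
        if base <= address < base + size:
            return slave
    return None  # no slave is mapped at this address


print(decode(0x4000_0004))  # uart
```

In a centralized scheme a single block like this would serve all masters; in a distributed scheme each slave would check only its own range.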
Centralized Decoding and Arbitration
Minimal change is required if new components are added to the system

Distributed Decoding and Arbitration
  ◦ + requires fewer signals compared to the centralized approach
  ◦ - more hardware duplication, more logic/area, less scalable

Arbitration Schemes
Random
  ◦ Randomly select master to grant bus access to
Static priority
  ◦ Masters assigned static priorities
  ◦ Higher priority master request always serviced first
  ◦ Can be pre-emptive (AMBA2) or non-preemptive (AMBA3)
  ◦ May lead to starvation of low priority masters
Round robin (RR)
  ◦ Masters allowed to access bus in a round-robin manner
  ◦ No starvation: every master guaranteed bus access
  ◦ Inefficient if masters have vastly different data injection rates
  ◦ High latency for critical data streams

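The contrast between static priority and round robin can be sketched in a few lines. This is a simplified model under stated assumptions: one grant per cycle, index 0 is the highest static priority, and the function names are hypothetical rather than part of any bus specification.

```python
from typing import List, Optional


def static_priority(requests: List[bool]) -> Optional[int]:
    """Grant the requesting master with the lowest index (index 0 = highest priority)."""
    for master, requesting in enumerate(requests):
        if requesting:
            return master
    return None


def round_robin(requests: List[bool], last_granted: int) -> Optional[int]:
    """Grant the next requesting master after `last_granted`, wrapping around."""
    n = len(requests)
    for offset in range(1, n + 1):
        master = (last_granted + offset) % n
        if requests[master]:
            return master
    return None


# Masters 0 and 2 request the bus on every cycle; master 1 is idle.
requests = [True, False, True]

# Static priority: master 2 is starved as long as master 0 keeps requesting.
print([static_priority(requests) for _ in range(4)])  # [0, 0, 0, 0]

# Round robin: the two requesting masters alternate.
grants, last = [], -1
for _ in range(4):
    last = round_robin(requests, last)
    grants.append(last)
print(grants)  # [0, 2, 0, 2]
```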
Arbitration Schemes
TDMA
  ◦ Time division multiple access
  ◦ Assign slots to masters based on BW requirements
  ◦ If a master does not have anything to read/write during its time slots, leads to low performance
  ◦ Choice of time slot length and number critical
TDMA/RR
  ◦ Two-level scheme
  ◦ If master does not need to utilize its time slot, second level RR scheme grants access to another waiting master
  ◦ Better bus utilization
  ◦ Higher implementation cost for scheme (more logic, area)

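A small sketch can show why the two-level TDMA/RR scheme improves utilization over plain TDMA. The slot table, the request pattern, and the helper names below are assumptions chosen for illustration; a real arbiter would also account for slot length and transfer duration.

```python
from typing import List, Optional, Tuple

# Hypothetical slot table: master 0 owns two of every four slots.
SLOT_TABLE = [0, 0, 1, 2]


def tdma(slot_owner: int, requests: List[bool]) -> Optional[int]:
    """Plain TDMA: the slot is wasted if its owner has nothing to transfer."""
    return slot_owner if requests[slot_owner] else None


def tdma_rr(slot_owner: int, requests: List[bool],
            rr_last: int) -> Tuple[Optional[int], int]:
    """Two-level TDMA/RR: fall back to round robin when the slot owner is idle.

    Returns the granted master (or None) and the updated round-robin pointer.
    """
    if requests[slot_owner]:
        return slot_owner, rr_last
    n = len(requests)
    for offset in range(1, n + 1):
        master = (rr_last + offset) % n
        if requests[master]:
            return master, master
    return None, rr_last


# Master 0 is idle; masters 1 and 2 always have data to transfer.
requests = [False, True, True]

plain = [tdma(owner, requests) for owner in SLOT_TABLE]
print(plain)  # [None, None, 1, 2]

two_level, rr_last = [], len(requests) - 1
for owner in SLOT_TABLE:
    grant, rr_last = tdma_rr(owner, requests, rr_last)
    two_level.append(grant)
print(two_level)  # [1, 2, 1, 2]
```

In this run plain TDMA leaves half of the slots unused, while the second-level round robin hands them to the waiting masters.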
Arbitration Schemes
Dynamic priority
  ◦ Dynamically vary priority of master during application execution
  ◦ Gives masters with higher injection rates a higher priority
  ◦ Requires additional logic to analyze traffic at runtime
  ◦ Adapts to changing data traffic profiles
  ◦ High implementation cost (several registers to track priorities and traffic profiles)
Programmable priority
  ◦ Simpler variant of dynamic priority scheme
  ◦ Programmable register in arbiter allows software to change priority

Bus Data Transfer Modes
Single Non-pipelined Transfer
  ◦ Simplest transfer mode
    First request access to bus from arbiter
    On being granted access, set address and control signals
    Send/receive data in subsequent cycles

Bus Data Transfer Modes
Pipelined Transfer
  ◦ Overlap address and data phases
    Only works if separate address and data buses are present

Bus Data Transfer Modes
Non-pipelined Burst Transfer
  ◦ Send multiple data items, with only a single arbitration for entire transaction
    Master must indicate to arbiter it intends to perform burst transfer
    Saves time spent requesting arbitration

Bus Data Transfer Modes
Pipelined Burst Transfer
  ◦ Useful when separate address and data buses available
  ◦ Reduces data transfer latency

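The benefit of burst and pipelined transfers can be made concrete with a rough cycle count. The model below is an assumption for illustration only: arbitration, the address phase, and each data phase take one cycle, and slaves insert no wait states. Real protocols differ in these details.

```python
# Assumed cost of each phase, in bus clock cycles (illustrative values).
ARBITRATION, ADDRESS, DATA = 1, 1, 1


def single_non_pipelined(items: int) -> int:
    """Every item pays for arbitration, an address phase, and a data phase."""
    return items * (ARBITRATION + ADDRESS + DATA)


def non_pipelined_burst(items: int) -> int:
    """One arbitration for the whole burst, then address + data per item."""
    return ARBITRATION + items * (ADDRESS + DATA)


def pipelined_burst(items: int) -> int:
    """One arbitration; each later address phase overlaps the previous data phase.

    Valid for this model because an address phase is no longer than a data phase.
    """
    return ARBITRATION + ADDRESS + items * DATA


for mode in (single_non_pipelined, non_pipelined_burst, pipelined_burst):
    print(mode.__name__, mode(4))
# single_non_pipelined 12
# non_pipelined_burst 9
# pipelined_burst 6
```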
Bus Data Transfer Modes
Split Transfer
  ◦ If slaves take a long time to read/write data, it can prevent other masters from using the bus
  ◦ Split transfers improve performance by 'splitting' a transaction
    Master sends read request to slave
    Slave relinquishes control of bus as it prepares data
    Arbiter can grant bus access to another waiting master
    Allows utilizing otherwise idle cycles on the bus
    When slave is ready, it requests bus access from arbiter
    On being granted access, it sends data to master
  ◦ Explicit support for split transfers required from slaves and arbiters (additional signals, logic)

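The effect of a split transfer on bus utilization can be sketched with a per-cycle timeline. This is a deliberately simplified model: the slave latency is a made-up number, other masters are assumed to always have work, and the cost of the slave re-arbitrating for the bus is ignored.

```python
from typing import List


def bus_timeline(slave_latency: int, split: bool) -> List[str]:
    """Return what the bus is doing in each cycle of one read by master M1."""
    timeline = ["M1: read request"]
    # While the slave prepares data, the bus is either released or held idle.
    filler = "other master" if split else "idle (bus held for M1)"
    timeline.extend([filler] * slave_latency)
    timeline.append("slave: data to M1")
    return timeline


def utilization(timeline: List[str]) -> float:
    """Fraction of cycles in which the bus carries useful traffic."""
    busy = sum(1 for cycle in timeline if not cycle.startswith("idle"))
    return busy / len(timeline)


without_split = bus_timeline(slave_latency=4, split=False)
with_split = bus_timeline(slave_latency=4, split=True)
print(utilization(without_split) < utilization(with_split))  # True
```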
Bus Data Transfer Modes
Out-of-Order Transfer
  ◦ Allows multiple transfers from different masters, or even from the same master, to be SPLIT by a slave and be in progress simultaneously on a single bus
  ◦ Masters can initiate data transfers without waiting for earlier data transfers to complete
  ◦ Allows better parallelism, performance in buses
  ◦ Additional signals are needed to transmit IDs for every data transfer in the system
  ◦ Master interfaces need to be extended to handle data transfer IDs and be able to reorder received data
  ◦ Slave interfaces have out-of-order buffers for reads, writes, to keep track of pending transactions, plus logic for processing IDs
    Any application typically has a limited buffer size beyond which performance doesn't increase

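The reordering that a master interface has to perform can be sketched as a small buffer keyed by transfer ID. The function name and the data values are hypothetical; the sketch assumes every issued ID eventually receives exactly one response.

```python
from typing import Dict, List, Tuple


def reorder(issued_ids: List[int], responses: List[Tuple[int, str]]) -> List[str]:
    """Deliver response data in issue order, buffering early arrivals by ID."""
    pending: Dict[int, str] = {}  # responses that arrived ahead of their turn
    next_index = 0                # position in issued_ids we are waiting for
    delivered: List[str] = []
    for transfer_id, data in responses:
        pending[transfer_id] = data
        # Release every buffered response that is now next in issue order.
        while next_index < len(issued_ids) and issued_ids[next_index] in pending:
            delivered.append(pending.pop(issued_ids[next_index]))
            next_index += 1
    return delivered


# Requests were issued with IDs 0, 1, 2 but the slave answered 1, 2, 0.
print(reorder([0, 1, 2], [(1, "B"), (2, "C"), (0, "A")]))  # ['A', 'B', 'C']
```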
Bus Data Transfer Modes
Broadcast Transfer
  ◦ Every time a data item is transmitted over a bus, it is physically broadcast to every component on the bus
  ◦ Useful for snooping and cache coherence protocols
  ◦ Example: when several components on the bus each have a private cache fed from a single memory, a problem arises when one component writes a cache line to memory, because the other caches may still hold the old value
  ◦ It is essential that private caches of the components on the bus invalidate (or update) their cache entries to prevent reading incorrect values
  ◦ Broadcasting allows address of the memory location (or cache line) being updated to be transmitted to all the components on the bus, so they can invalidate (or update) their local copies

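The invalidate-on-broadcast idea can be sketched with two private caches on one bus. This is a toy write-through, write-invalidate model: the `Cache` and `Bus` classes, the component names, and the address are assumptions for illustration, not a full coherence protocol.

```python
from typing import Dict, List


class Cache:
    """Private cache of one bus component (address -> cached value)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.lines: Dict[int, int] = {}

    def snoop_invalidate(self, address: int) -> None:
        """Drop the local copy when another component updates this address."""
        self.lines.pop(address, None)


class Bus:
    """Shared bus on which every write address is broadcast to all caches."""

    def __init__(self, memory: Dict[int, int], caches: List[Cache]) -> None:
        self.memory = memory
        self.caches = caches

    def read(self, reader: Cache, address: int) -> int:
        if address not in reader.lines:       # miss: fetch from memory
            reader.lines[address] = self.memory[address]
        return reader.lines[address]

    def write(self, writer: Cache, address: int, value: int) -> None:
        self.memory[address] = value          # write through to memory
        writer.lines[address] = value
        for cache in self.caches:             # broadcast seen by all snoopers
            if cache is not writer:
                cache.snoop_invalidate(address)


cpu, dsp = Cache("cpu"), Cache("dsp")
bus = Bus(memory={0x10: 1}, caches=[cpu, dsp])

bus.read(cpu, 0x10)
bus.read(dsp, 0x10)        # both caches now hold the value 1
bus.write(cpu, 0x10, 5)    # dsp's copy is invalidated by the broadcast
print(bus.read(dsp, 0x10)) # 5
```

Without the invalidation loop in `write`, the final read would return the stale value 1 from the DSP's cache.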
Physical implementation issues for bus wires
Bus wires are implemented as long metal lines on a silicon wafer
  ◦ transmitting data using electromagnetic waves (finite speed limit)
As application performance requirements increase, clock frequencies are also increasing
  ◦ Greater bus clock frequency = shorter bus clock period
    100 MHz = 10 ns; 500 MHz = 2 ns
Time allowed for a signal on a bus to travel from source to destination in a single bus clock cycle is decreasing
Can take multiple cycles to send a signal across a chip
  ◦ 6-10 bus clock cycles @ 50 nm
  ◦ unpredictability in signal propagation time has serious consequences for performance and correct functioning of synchronous digital circuits

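The frequency-to-period figures above follow from period = 1 / frequency, and the same arithmetic gives the number of clock cycles a signal needs to cross a wire. In the sketch below the 7 ns wire delay is a hypothetical value chosen only to show the calculation.

```python
import math


def clock_period_ns(frequency_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / frequency_mhz


def cycles_to_cross(wire_delay_ns: float, frequency_mhz: float) -> int:
    """Whole bus clock cycles needed for a signal with the given wire delay."""
    return math.ceil(wire_delay_ns / clock_period_ns(frequency_mhz))


print(clock_period_ns(100))  # 10.0
print(clock_period_ns(500))  # 2.0

# Hypothetical 7 ns end-to-end wire delay: fits in one cycle at 100 MHz,
# but needs four cycles at 500 MHz.
print(cycles_to_cross(7.0, 100))  # 1
print(cycles_to_cross(7.0, 500))  # 4
```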
Physical implementation issues for bus wires
Partition long bus wires into shorter ones
  ◦ Hierarchical or split bus communication architectures
  ◦ Register slices or buffers to pipeline long bus wires
    Enable signal to traverse a segment in a single clock cycle
Asynchronous buses
  ◦ No clock signal
Low level techniques
  ◦ Inserting repeaters or using fat wires

Buses in the DSM era
With CMOS process technology scaling below 90 nm, SoCs have entered the DSM era
  ◦ High levels of component integration
  ◦ High clock frequencies
  ◦ Low signal voltages
Buses significantly impacted by DSM effects
  ◦ Signal integrity issues
    Scenario where the received signal at the destination is different from the transmitted signal at the source driver
    Noise caused by the following factors: crosstalk, external electromagnetic interference, transmission line effects, soft errors

DSM Effects
Crosstalk
  ◦ Phenomenon of noise being caused on a signal A due to the coupling with another signal B
    Due to the close proximity of bus wires, near-field electromagnetic coupling causes inductive and capacitive crosstalk on bus signals
  ◦ Even when wires are far apart, crosstalk can still occur due to coupling facilitated by common substrate, shared power supply or ground, or a shared signal return path
  ◦ As wires become narrower with scaling and clock frequencies increase, fringing field effects and inductance effects become larger for wires
    Higher inductive and capacitive crosstalk

DSM Effects
Electromagnetic interference (EMI)
  ◦ Phenomenon of large external electric and magnetic fields coupling into circuits and creating unwanted noise
  ◦ EMI due to external and internal coupling is expected to increase with evolving technology
    As highly integrated, portable wireless communication SoCs increasingly consist of analog, RF, and digital circuits
  ◦ Long on-chip buses in particular will be the sources and receptors of EMI noise

DSM Effects
Transmission line effects
  ◦ When a wire is longer than 1/10 of the wavelength of the signal frequency component that is transmitted, the wave nature of the propagated signal must be modeled, otherwise significant errors may result
  ◦ Wires will thus have to be modeled as transmission lines to avoid errors during signal analysis
  ◦ Discontinuities in these transmission lines can result in impedance mismatches due to various factors such as capacitive loads, vias, wire bends, package pins, crossover wires, and non-ideal receivers
  ◦ Such mismatches will create noise as a result of signal reflections at the discontinuities

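The 1/10-wavelength rule can be turned into a rough wire-length threshold. The sketch below assumes the signal propagates at c / sqrt(ε_r) and uses ε_r ≈ 3.9 (silicon dioxide) as the dielectric; both are simplifying assumptions, and the relevant frequency is the highest significant component of the signal rather than simply the clock rate.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def critical_length_mm(frequency_hz: float,
                       relative_permittivity: float = 3.9) -> float:
    """Wire length (mm) equal to 1/10 of the wavelength at `frequency_hz`.

    Assumes propagation velocity c / sqrt(relative_permittivity); the default
    permittivity is an assumed value for a silicon dioxide dielectric.
    """
    velocity = SPEED_OF_LIGHT_M_PER_S / math.sqrt(relative_permittivity)
    wavelength_m = velocity / frequency_hz
    return wavelength_m / 10 * 1000


# For a 10 GHz frequency component the threshold is roughly 1.5 mm,
# so a longer wire would need a transmission line model under this rule.
print(round(critical_length_mm(10e9), 2))
```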
DSM Effects
Soft Errors
  ◦ Phenomenon of spurious pulses and interference with signals on buses
  ◦ Caused by collision of
    thermal neutrons produced by the decay of cosmic ray showers
    alpha particles produced by impurities in the substrate
  ◦ Highly integrated SoCs will be particularly susceptible to soft errors

DSM Effects
Harder to guarantee error-free data transfers on buses
Reduced signal swings in DSM technologies will result in a further reduction of voltage noise margins
  ◦ Greater probability of transmission errors in the presence of even the smallest sources of noise
Many other factors will further complicate bus design
  ◦ increasing wire resistance due to skin effect at high frequencies
  ◦ increasing number of metal layers that increase cross-layer coupling
  ◦ timing errors due to jitter
Emerging tools and methodologies for on-chip communication architecture design must predict and address DSM issues early in the design flow
  ◦ instead of finding and fixing the problems in post-layout

Summary
SoC complexity is increasing rapidly, due to
  ◦ Digital convergence
  ◦ Process technology shrinking into DSM era
On-chip communication architectures are critical components in SoC designs
  ◦ To meet power, performance, cost, reliability constraints
  ◦ Also rapidly increasing in complexity with increasing number of cores
Reviewed basic concepts of (widely used) bus-based communication architectures
Open Problems
  ◦ Designing communication architectures to satisfy diverse and complex application constraints
  ◦ Predicting and estimating DSM issues early in a design flow