Published by Jared Ruddick
On-Chip Communication Architectures
Standards
ICS 295
Sudeep Pasricha and Nikil Dutt
Slides based on book chapter 3
© 2008 Sudeep Pasricha & Nikil Dutt
Outline
- Why Standards?
- On-chip standard bus architectures
  - AMBA 2.0/3.0
  - IBM CoreConnect
  - STMicroelectronics STBus
  - Sonics Smart Interconnect
- Socket based on-chip bus interface standards
  - OCP-IP
Why Standards?
- SoC components (IPs) have an interface to the outside world consisting of a set of pins responsible for sending/receiving addresses, data, and control signals
- Number and functionality of pins must adhere to a specific interface standard
- Important for seamless integration of SoC IPs; helps avoid integration mismatches
  - e.g. 1: connecting an IP with 32 data pins to a 30-bit data bus
  - e.g. 2: connecting an IP supporting data bursts to a bus with no burst support
- Mismatches require development of logic wrappers at IP interfaces to ensure correct data transfers
  - wrappers are time consuming to create, reduce performance, and take up area
Why Standards?
- Interface standards define a specific data transfer protocol
  - decide number and functionality of pins at IP interfaces
  - make it easy to connect diverse IPs quickly
- Two categories of standards for SoC communication:
  - Standard bus architectures
    - define interface between IPs and bus architecture
    - define (at least some) specifics of bus architecture that implements data transfer protocol
  - Socket based bus interface standards
    - define interface between IPs and bus architecture
    - freedom w.r.t. choice and implementation of bus architecture
- Ideally, designers want one standard to interconnect all IPs
- In reality, several competing standards have emerged
Standard Bus Architectures
- AMBA 2.0, 3.0 (ARM)
- CoreConnect (IBM)
- Sonics Smart Interconnect (Sonics)
- STBus (STMicroelectronics)
- Wishbone (Opencores)
- Avalon (Altera)
- PI Bus (OMI)
- MARBLE (Univ. of Manchester)
- CoreFrame (PalmChip)
- … widely used
AMBA 2.0
AHB Basic Transfer
- Split ownership of address and data bus
AHB Basic Transfer
- Data transfer with slave wait states
AHB Pipelining
- Transaction pipelining increases bus bandwidth
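A rough cycle count illustrates why pipelining raises bandwidth. The sketch below is a simplified model, not taken from the slides: it assumes a one-cycle address phase, a data phase of one cycle plus any slave wait states, and no arbitration overhead. The function names are hypothetical.

```python
def cycles_unpipelined(n_transfers: int, wait_states: int = 0) -> int:
    """Address phase and data phase run back to back for every transfer."""
    return n_transfers * (1 + 1 + wait_states)


def cycles_pipelined(n_transfers: int, wait_states: int = 0) -> int:
    """Address phase of transfer i+1 overlaps the data phase of transfer i."""
    if n_transfers == 0:
        return 0
    # One leading address cycle, then one data phase per transfer.
    return 1 + n_transfers * (1 + wait_states)


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(n, cycles_unpipelined(n), cycles_pipelined(n))
```

Under these assumptions, the pipelined case approaches one transfer per cycle as the number of back-to-back transfers grows.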
AHB Architecture
- Centralized arbitration / decode
- 1 unidirectional address bus (HADDR)
- 2 unidirectional data buses (HWDATA, HRDATA)
- At any time only 1 data bus is active
AHB Arbitration
- Figure labels: HBREQ_M1, HBREQ_M2, HBREQ_M3, Arbiter
- Arbitration protocol is specified, but not the arbitration policy
Cost of Arbitration in AHB
- Time for arbitration
- Time for handshaking
AHB Pipelined Burst Transfers
- Bursts cut down on arbitration and handshaking time, improving performance
AHB Burst Types
- Fixed length bursts
- Incremental bursts access sequential locations
  - e.g. 0x64, 0x68, 0x6C, 0x70 for INCR4, transferring 4-byte data
- Wrapping bursts wrap around the address if the starting address is not aligned to the total no. of bytes in the transfer
  - e.g. 0x64, 0x68, 0x6C, 0x60 for WRAP4, transferring 4-byte data
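The two address sequences above can be reproduced with a few lines of arithmetic. This is a minimal sketch; the function name and parameters are hypothetical. A wrapping burst stays inside a block whose size is (number of beats × bytes per beat) and wraps back to the start of that block.

```python
def burst_addresses(start: int, beats: int, size_bytes: int, wrap: bool = False) -> list[int]:
    """Return the address of every beat in an incrementing or wrapping burst."""
    if not wrap:
        return [start + i * size_bytes for i in range(beats)]
    total = beats * size_bytes            # bytes covered by the whole burst
    base = (start // total) * total       # aligned start of the wrap block
    offset = start - base
    return [base + (offset + i * size_bytes) % total for i in range(beats)]


if __name__ == "__main__":
    print([hex(a) for a in burst_addresses(0x64, 4, 4)])             # INCR4
    print([hex(a) for a in burst_addresses(0x64, 4, 4, wrap=True)])  # WRAP4
```

For WRAP4 with 4-byte data the wrap block is 16 bytes, so a burst starting at 0x64 wraps at the 0x70 boundary back to 0x60.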
AHB Control Signals
- Transfer direction
  - HWRITE: write transfer when high, read transfer when low
- Transfer size
  - HSIZE[2:0] indicates the size of the transfer
AHB Control Signals
- Protection control
  - HPROT[3:0] provides additional information about a bus access
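As a small illustration of the transfer size field: in the AMBA 2.0 AHB encoding, HSIZE[2:0] = n selects a transfer of 2^n bytes (000 = byte, 010 = 32-bit word, up to 111 = 1024 bits). The helper names below are hypothetical, and the sketch only decodes the two control signals discussed above.

```python
def hsize_to_bytes(hsize: int) -> int:
    """Decode the 3-bit HSIZE field into a transfer size in bytes (2**HSIZE)."""
    if not 0 <= hsize <= 7:
        raise ValueError("HSIZE is a 3-bit field")
    return 1 << hsize


def describe_transfer(hwrite: int, hsize: int) -> str:
    """Summarize direction (HWRITE) and size (HSIZE) of a transfer."""
    direction = "write" if hwrite else "read"
    return f"{direction} of {hsize_to_bytes(hsize)} byte(s)"


if __name__ == "__main__":
    print(describe_transfer(hwrite=1, hsize=0b010))
```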
AHB Split Transfers
- Improves bus utilization
- May cause deadlocks if not carefully implemented
AHB Bus Matrix Topology
- In addition to shared bus and hierarchical bus, AHB can be implemented as a bus matrix
APB State Diagram
- When AHB wants to drive a transfer
- One cycle penalty for APB peripheral address decoding
- Transfer occurs here
- No (multi-cycle) bursts or pipelined transfers
AHB-APB Bridge
- AHB signals
- High performance
- Low power (and performance)
AMBA 3.0
- Introduces AXI high performance protocol
- Support for separate read address, write address, read data, write data, and write response channels
- Out of order (OO) transaction completion
- Fixed mode burst support
  - Useful for I/O peripherals
- Advanced system cache support
  - Specify if transaction is cacheable/bufferable
  - Specify attributes such as write-back/write-through
- Enhanced protection support
  - Secure/non-secure transaction specification
- Exclusive access (for semaphore operations)
- Register slice support for high frequency operation
AHB vs. AXI Burst
- AHB Burst
  - Address and data are locked together (single pipeline stage)
  - HREADY controls intervals of address and data
- AXI Burst
  - One address for entire burst
AHB vs. AXI Burst
- AXI Burst
  - Simultaneous read and write transactions
  - Better bus utilization
AXI Out of Order Completion
- With AHB
  - If one slave is very slow, all data is held up
  - SPLIT transactions provide very limited improvement
- With AXI
  - Multiple outstanding addresses, out of order (OO) completion allowed
  - Fast slaves may return data ahead of slow slaves
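The contrast above can be made concrete with a toy timing model. This is a sketch under stated assumptions, not a cycle-accurate model of either protocol: the blocking case serializes transfers so each one waits for the previous slave, while the out-of-order case issues one address per cycle and lets every slave respond independently (contention on the shared read data channel is ignored). All names are hypothetical.

```python
def blocking_completion(latencies: list[int]) -> list[int]:
    """Completion cycle of each transfer when a slow slave holds up the bus."""
    done, t = [], 0
    for lat in latencies:
        t += lat
        done.append(t)
    return done


def out_of_order_completion(latencies: list[int]) -> list[tuple[int, int]]:
    """(finish_cycle, transfer_index) pairs, sorted by finish time.

    Transfer i is issued at cycle i and completes after its slave's latency.
    """
    return sorted((i + lat, i) for i, lat in enumerate(latencies))


if __name__ == "__main__":
    slave_latencies = [10, 2, 3]  # first slave is slow
    print(blocking_completion(slave_latencies))
    print(out_of_order_completion(slave_latencies))
```

With the example latencies, the two fast slaves finish before the slow one in the out-of-order model, whereas in the blocking model they wait behind it.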
Register Slices for Max Frequency
- Register slices can be applied across any channel
- Allows maximum frequency of operation by matching channel latency to channel delay
- Allows system topology to be matched to performance requirements
- Figure labels (write data channel signals): WREADY, WID, WDATA, WSTRB, WLAST, WVALID
Summary: AHB vs. AXI
IBM CoreConnect
- PLB: pipelined, burst modes, split transactions, multiple masters
- OPB: low bandwidth, burst mode, multiple masters
- DCR: low throughput, 1 r/w = 2 cycles, ring type data bus
Processor Local Bus (PLB)
- High performance synchronous bus
- Shared address, separate read and write data buses
- Support for 32-bit address; 16, 32, 64, and 128-bit data bus widths
- Dynamic bus sizing: byte, half-word, word, and double-word transfers
- Up to 16 masters and any number of slaves
- AND–OR implementation structure
- Variable or fixed length (16-64 byte) burst transfers
- Pipelined transfers
- SPLIT transfer support
- Overlapped read and write transfers (up to 2 transfers per cycle)
- Centralized arbiter
- Locked transfer support for atomic accesses
PLB Transfer Phases
- Address and data phases are decoupled
Overlapped PLB Transfers
- PLB allows address and data buses to have different masters at the same time
PLB Arbiter
- Bus Control Unit
  - each master drives a 2-bit signal that encodes 4 priority levels
  - in case of a tie, arbiter uses static or round-robin (RR) scheme
- Timer
  - pre-empts long burst masters
  - ensures high priority requests are served with low latency
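The priority-plus-tie-break rule can be sketched as follows. This is an illustrative model only: it assumes that a larger 2-bit value means higher priority and that ties are broken round-robin starting after the last granted master; the function and parameter names are hypothetical, and the pre-emption timer is not modeled.

```python
def plb_arbitrate(requests: dict[int, int], last_granted: int, num_masters: int):
    """Pick a master from {master_id: priority (0..3)}; return None if idle.

    Highest priority wins; among equal priorities, the first master after
    `last_granted` in round-robin order wins.
    """
    if not requests:
        return None
    top = max(requests.values())
    tied = [m for m, prio in requests.items() if prio == top]
    # Round-robin distance from the master granted last time.
    return min(tied, key=lambda m: (m - last_granted - 1) % num_masters)


if __name__ == "__main__":
    # Masters 1 and 2 tie at priority 3; master 1 was granted last, so 2 wins.
    print(plb_arbitrate({0: 1, 1: 3, 2: 3}, last_granted=1, num_masters=4))
```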
On-chip Peripheral Bus (OPB)
- Synchronous bus to connect low performance peripherals and reduce capacitive loading on PLB
- Shared address bus, multiple data buses
- Up to a 64-bit address bus width
- 32- or 64-bit read, write data bus width support
- Support for multiple masters
- Bus parking (or locking) for reduced transfer latency
- Sequential address transfers (burst mode)
- Dynamic bus sizing: byte, half-word, word, double-word transfers
- MUX-based (or AND–OR) structural implementation
- Single cycle data transfer between OPB masters and slaves
- Timeout capability to guarantee low latency for high priority transfers
Device Control Register (DCR) Bus
- Low speed synchronous bus, used for on-chip device configuration purposes
  - meant to off-load the PLB from lower performance status and control read and write transfers
- 10-bit, up to 32-bit address bus
- 32-bit read and write data buses
- 4-cycle minimum read or write transfers
- Slave bus timeout inhibit capability
- Multi-master arbitration
- Privileged and non-privileged transfers
- Daisy-chain (serial) or distributed-OR (parallel) bus topologies
Sonics Smart Interconnect
- Consists of 3 synchronous bus-based interconnect specifications
  - SonicsMX: high performance interconnect fabric
  - SonicsLX: high performance interconnect fabric, but with less advanced features
  - Synapse 3220: peripheral interconnect designed to connect slower peripheral components
SonicsMX
- High performance synchronous bus fabric
- Pipelined, non-blocking, multi-threaded communication support
- Split/outstanding transactions for high performance
- Configurable data bus width: 32, 64, or 128 bits
- Socket-based connection support, using native OCP 2.0 interface
- Bandwidth and latency-based arbitration schemes to obtain desired quality of service (QoS) for threads
- Register points (RPs) for pipelining long interconnects and providing timing isolation
- Protection mode support
- Advanced error handling support
- Fine-grained power management support
SonicsMX Topology
- SonicsMX supports full crossbar, partial crossbar, and shared bus topologies
SonicsMX Arbitration
- Weighted QoS
  - available bandwidth distributed among masters based on ratio of bandwidth weights configured for each master
- Priority QoS
  - extends bandwidth-based scheme above
  - 1-2 threads are assigned a static priority (guaranteed service)
  - other threads are assigned bandwidth weights (best effort)
- Controlled QoS
  - dynamically switches between three arbitration schemes based on traffic characteristics
    - Static priority (guaranteed service)
    - Bandwidth weighted scheme (best effort)
    - Guaranteed bandwidth allocation (guaranteed service)
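The weighted QoS rule is simple proportional sharing. As a minimal sketch (the master names and weights below are made up for illustration, and this is not Sonics' implementation), each master's share of the available bandwidth is its weight divided by the sum of all weights:

```python
def bandwidth_shares(weights: dict[str, int]) -> dict[str, float]:
    """Fraction of available bandwidth per master, proportional to its weight."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one master needs a non-zero weight")
    return {master: w / total for master, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical masters: the CPU gets twice the weight of the other two.
    print(bandwidth_shares({"cpu": 2, "dma": 1, "video": 1}))
```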
SonicsLX
- High performance synchronous bus fabric
- Subset of SonicsMX feature set
  - pipelined, multithreaded, non-blocking communication support
  - weighted and priority QoS modes
  - SPLIT transactions
Synapse 3220
- Synchronous bus targeted at low bandwidth, physically dispersed peripheral slave cores
Synapse 3220 Features
- Up to 4 masters and 63 slaves
- Up to 24-bit configurable address bus
- Configurable data bus widths: 8, 16, 32 bits
- Fair arbitration scheme, with high priority allowed for a single initiator thread
- Power management interface
- Exclusive (semaphore) access support
- Error detection and recovery: watchdog timer to identify unresponsive peripherals
- Protection mode support
STBus
- Consists of 3 synchronous bus-based interconnect specifications
  - Type 1
    - Simplest protocol, meant for peripheral access
  - Type 2
    - More complex protocol
    - Pipelined, SPLIT transactions
  - Type 3
    - Most advanced protocol
    - OO transactions, transaction labeling/hints
Type 1
- Simple handshake mechanism
- 32-bit address bus
- Data bus sizes of 8, 16, 32, 64 bits
- Similar to IBM CoreConnect DCR bus
Type 2
- Supports all Type 1 functionality
- Pipelined transfers
- SPLIT transactions
- Data bus sizes up to 256 bits
- Compound operations
  - READMODWRITE: returns read data and locks slave till same master writes to location
  - SWAP: exchanges data value between master and slave
  - FLUSH/PURGE: ensure coherence between local and main memory
  - USER: reserved for user defined operations
Type 3
- Supports all Type 2 functionality
- OO transaction completion
- Requires only single response/ACK for multiple data transfers (burst mode)
STBus
- All types have MUX-based implementation
- Shared, partial or full crossbar implementation
STBus Arbitration
- Static priority
  - Non-preemptive
- Programmable priority
- Latency based
  - Each master has a register with max. allowed latency (in clock cycles)
    - If value is 0, master must be granted bus access as soon as it requests it
  - Each master also has a counter, loaded with max. latency value when master makes request
  - Master counters are decremented at every subsequent cycle
  - Arbiter grants access to master with lowest counter value
  - In case of a tie, static priority is used
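The latency-based scheme described above can be sketched as a small model. The class and function names are hypothetical, and the sketch assumes that a larger `static_priority` value wins ties and that counters stop at zero; it is an illustration of the rule, not STMicroelectronics' implementation.

```python
from dataclasses import dataclass


@dataclass
class Master:
    name: str
    max_latency: int        # value of the max. allowed latency register (cycles)
    static_priority: int    # assumption: larger value wins ties
    counter: int = 0
    requesting: bool = False

    def request(self) -> None:
        """Raise a request and load the counter with the max. latency value."""
        self.requesting = True
        self.counter = self.max_latency

    def tick(self) -> None:
        """Advance one clock cycle while the request is pending."""
        if self.requesting and self.counter > 0:
            self.counter -= 1


def grant(masters: list[Master]):
    """Grant the pending master with the lowest counter; ties use static priority."""
    pending = [m for m in masters if m.requesting]
    if not pending:
        return None
    winner = min(pending, key=lambda m: (m.counter, -m.static_priority))
    winner.requesting = False
    return winner.name


if __name__ == "__main__":
    a, b = Master("a", 4, 1), Master("b", 2, 0)
    a.request()
    for _ in range(3):      # "a" has waited 3 cycles before "b" requests
        a.tick()
    b.request()
    print(grant([a, b]))    # "a" now has the lower counter
```

A master whose latency register is 0 starts with a counter of 0, so it always has the lowest possible value and is served as soon as it requests.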
STBus Arbitration
- Bandwidth based
  - Similar to TDMA/RR scheme
- STB
  - Hybrid of latency based and programmable priority schemes
  - In normal mode, programmable priority scheme is used
  - Masters have max. latency registers, counters (latency based scheme)
  - Each master also has an additional latency-counter-enable bit
    - If this bit is set, and counter value is 0, master is in panic state
  - If one or more masters are in panic state, programmable priority scheme is overridden, and panic state masters are granted access
- Message based
  - Pre-emptive static priority scheme
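The STB hybrid rule can be sketched in a few lines. The dictionary keys and function name are hypothetical, and one detail is an assumption: the slides do not say how to choose among several panic-state masters, so this sketch falls back to programmable priority within that group.

```python
def stb_grant(masters: list[dict]):
    """Return the name of the granted master, or None if nobody is requesting.

    Each master is a dict with keys: name, priority, counter,
    latency_enable, requesting.
    """
    pending = [m for m in masters if m["requesting"]]
    if not pending:
        return None
    # Panic state: latency-counter-enable bit set and counter has reached 0.
    panic = [m for m in pending if m["latency_enable"] and m["counter"] == 0]
    pool = panic or pending   # panic masters override the normal scheme
    return max(pool, key=lambda m: m["priority"])["name"]


if __name__ == "__main__":
    masters = [
        {"name": "cpu", "priority": 3, "counter": 5, "latency_enable": True, "requesting": True},
        {"name": "dma", "priority": 1, "counter": 0, "latency_enable": True, "requesting": True},
    ]
    print(stb_grant(masters))  # "dma" is in panic state and overrides priority
```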
Socket-based Interface Standards
- Define the interface of components
- Do not define bus architecture implementation
- Shield IP designer from knowledge of interconnection system, and enable same IP to be ported across different systems
- Require adaptor components to interface with implementation
Socket-based Interface Standards
- Must be generic, comprehensive, and configurable
  - to capture basic functionality and advanced features of a wide array of bus architecture implementations
- Adaptor (or translational) logic component
  - Must be created only once for each implementation (e.g. AMBA)
  - (-) adds area, performance penalties, more design time
  - (+) enhances reuse, speeds up design time across many designs
- Commonly used socket-based interface standards
  - Open Core Protocol (OCP) ver 2.0
    - Most popular; used in Sonics Smart Interconnect
  - VSIA Virtual Component Interface (VCI)
    - Subset of OCP
  - DTL
    - Proprietary
OCP 2.0
- Point-to-point synchronous interface
- Bus architecture independent
- Configurable data flow (address, data, control) signals for area-efficient implementation
- Configurable sideband signals to support additional communication requirements
- Pipelined transfer support
- Burst transfer support
- OO transaction completion support
- Multiple threads
OCP 2.0 Signals
- Dataflow
  - Basic signals
  - Simple extensions
    - e.g. byte enables, data byte parity, error correction codes, etc.
  - Burst extensions
    - e.g. length, type (WRAP/INCR), pack/unpack, ACK requirements, etc.
  - Tag extensions
    - Assign IDs to transactions for reordering support
  - Thread extensions
    - Assign IDs to threads for multi-threading support
- Sideband (optional)
  - Not part of the dataflow process
  - Convey control and status information such as reset, interrupt, error, and core-specific flags
- Test (optional)
  - Add support for scan, clock control, and IEEE (JTAG)
OCP 2.0 Protocol Hierarchy
- Data flow signals are combined into groups of request signals, response signals, and data handshake signals
- Groups map one-to-one to their corresponding protocol phases (request, response, handshaking)
- Different combinations of protocol phases are used by different types of transfers (e.g. single request/multiple data burst)
- Burst transactions comprise a set of transfers linked together, having a defined address sequence and no. of transfers
OCP 2.0 Profiles
- OCP 2.0 specifies pre-defined configurations of the interface, called profiles
  - consist of OCP interface signals, specific protocol features, and application guidelines
- Two sets of profiles are provided
  - Profiles for new IP cores implementing native OCP interfaces
    - Block data flow
    - Sequential undefined length data flow (streaming access)
    - Register access
  - Profiles for designers of bridges between OCP & other bus protocols
    - Simple H-bus
    - X-bus packet write
    - X-bus packet read
Example: SoC with Mixed Profiles
Summary
- Standards are important for seamless integration of SoC IPs
  - avoid costly integration mismatches
- Two categories of standards for SoC communication:
  - Standard bus architectures
    - define interface between IPs and bus architecture
    - define (at least some) specifics of bus architecture that implements data transfer protocol
    - e.g. AMBA 2.0/3.0, CoreConnect, Sonics Smart Interconnect, STBus
  - Socket based bus interface standards
    - define interface between IPs and bus architecture
    - do not define bus architecture implementation specifics
    - e.g. OCP 2.0
- Open Issue: Robust standards for DSM-aware communication