
1 Commercial and Open SoC busses
AMBA bus, PI bus, IBM CoreConnect, ST bus, Wishbone, OCP-IP. Updated for 2014.

2 Outline
Introduction to bus architectures
AMBA bus
PI bus
IBM CoreConnect
ST bus
Wishbone
OCP-IP
Virtual Socket Interface Alliance (VSIA)
Nios II Avalon Bus
Extra material: Sonics Smart Interconnect (Sonics), Avalon (Altera), MARBLE (University of Manchester), CoreFrame (palmChip), ...

3 Introduction
Bus = wires shared by multiple communicating units
Connection logic to avoid electrical conflicts
Arbitration to determine bus ownership in time
Protocols – sets of rules for transmitting information between multiple units
Buses designed for PCs and PCBs are not suitable for SoCs:
Designed for backplanes
Limited speed
Limited number of signals
A bus can be thought of as a corridor connecting multiple rooms: if someone is moving from room A to room B, all other doors have to be closed (a minimal model follows below).
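To make the corridor analogy concrete, here is a minimal Python sketch (all names are illustrative, not taken from any bus standard) of a shared bus that only one master may own at a time: the arbiter's grant closes the "doors" for everyone else.

```python
# Minimal sketch of a shared bus with exclusive ownership (illustrative only).
class SharedBus:
    def __init__(self):
        self.owner = None  # at most one master drives the shared wires

    def request(self, master):
        # Arbitration: grant only if the bus is free (first come, first served).
        if self.owner is None:
            self.owner = master
            return True
        return False  # master must retry; "all other doors stay closed"

    def transfer(self, master, addr, data):
        assert self.owner is master, "only the bus owner may drive the wires"
        print(f"{master}: write {data:#x} to {addr:#x}")

    def release(self, master):
        if self.owner is master:
            self.owner = None

bus = SharedBus()
if bus.request("CPU"):
    bus.transfer("CPU", 0x1000, 0xCAFE)
    bus.release("CPU")
```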

4 Bus protocol
A bus protocol defines:
the type and order of data being sent;
how the sending device indicates that it has finished sending the information;
the data compression method used, if any;
how the receiving device acknowledges successful reception of the information;
how arbitration is performed to resolve contention on the bus, and in what priority; and
the type of error checking to be used.
[Flynn 2011] Computer System Design: System-on-Chip, Michael J. Flynn and Wayne Luk, Wiley 2011

5 Bus-based approach
Possible hierarchy of buses to optimize system-level performance and cost
Address, data, control
A bus bridge is a module that connects two buses, which are not necessarily of the same type. It provides:
Format conversion
Segmentation of traffic (concurrent usage of buses)
Buffering of transactions between buses (allowing the next transaction to proceed faster)
A bus might also deliver power to peripherals!
Cores or IPs [Flynn 2011]

6 Bus architectures
Unified or split (address and data)
Simple bus with request-acknowledgement signals
Bus with arbitration support
Tenured split bus (the bus is occupied only during the associated address or data cycles)
Local buffers to record address(es)
Memory access time for the first word [Flynn 2011]

7 Bus Architectures

Technology:           AMBA AXI (AMBA 3) | CoreConnect | Smart Interconnect IP | Nexus
Company:              ARM | IBM | Sonics | Fulcrum**
Core type:            Soft/hard | Soft | Hard
Architecture:         Bus | Unidirectional channels | NOC using direct switch
Bus width (bits):     32/64/128 | 16
Frequency:            200 MHz / 400 MHz* | … MHz | 300 MHz | 1 GHz
Maximum BW (GB/s):    3 / 6.4* | 4.8 | 72
Minimum latency (ns): 5 / 2.5* | 15 | n/a | 2

AXI (Advanced eXtensible Interface) was introduced with AMBA 3; a burst-mode protocol intended to:
• be suitable for high-bandwidth and low-latency designs
• enable high-frequency operation without using complex bridges
• meet the interface requirements of a wide range of components
• be suitable for memory controllers with high initial access latency
• provide flexibility in the implementation of interconnect architectures
• be backward-compatible with existing AHB and APB interfaces
* As implemented in the ARM PL330 high-speed controller.
** Fulcrum was acquired by Intel in 2011.
[Flynn 2011]

8 Outline
AMBA bus
PI bus
IBM CoreConnect
ST bus
Wishbone
OCP-IP
Virtual Socket Interface Alliance (VSIA)
Nios II Avalon Bus

9 Advanced Microcontroller Bus Architecture (AMBA)
Developed by ARM in 1996
Distinct buses in the AMBA specification:
Advanced High-performance Bus (AHB)
Advanced Peripheral Bus (APB)
[Advanced System Bus (ASB) – designed for lower-performance systems, outdated]
AXI – Advanced eXtensible Interface (since AMBA 3)
[Wikipedia] AMBA was introduced by ARM in 1996. The first AMBA buses were the Advanced System Bus (ASB) and the Advanced Peripheral Bus (APB). In its second version, AMBA 2, ARM added the AMBA High-performance Bus (AHB), which is a single clock-edge protocol. In 2003, ARM introduced the third generation, AMBA 3, including AXI to reach even higher-performance interconnect and the Advanced Trace Bus (ATB) as part of the CoreSight on-chip debug and trace solution. In 2010 the AMBA 4 specifications were introduced, starting with AMBA 4 AXI4, then in 2011 extending system-wide coherency with AMBA 4 ACE. In 2013 the AMBA 5 CHI (Coherent Hub Interface) specification was introduced, with a re-designed high-speed transport layer and features designed to reduce congestion.
AXI is used in ARM Cortex-A processors. These protocols are today the de facto standard for 32-bit embedded processors because they are well documented and can be used without royalties. [Flynn 2011]

10 AMBA AHB
High-performance on-chip backbone bus
Connecting: processors, on-chip and off-chip memory interfaces
DMA capability
Bridge to the APB bus
Features: burst transfers, split transactions, single-cycle bus master handover
Single clock edge (rising) operation and non-tristate (central multiplexer) implementation
Central arbiter
Tenured (an address phase can occur during the previous data phase)
Why is a multiplexer better than tristate?

11 AMBA AHB bus transaction steps
1. Bus master obtains access to the bus; the arbiter resolves simultaneous requests
2. Bus master initiates the transfer, driving the signals: address, width, direction, burst options
3. Bus slave provides a response: success | need for delay (wait states) | error
Cycling in the idle state or through a transfer – between setup (address decoding) and enable (actual transfer); a sketch of the steps follows below
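A hedged Python sketch of these three steps, with simplified stand-ins for the AHB signal names (HADDR, HWRITE, HREADY, HRESP); the slave model and its two wait states are assumptions for illustration, not the full protocol:

```python
# Simplified model of an AHB-style transfer: address phase, then a data phase
# that the slave may stretch with wait states (HREADY low). Illustrative only.
def ahb_transfer(slave, haddr, hwrite, hwdata=None, max_waits=8):
    # Step 2: master drives address, direction (size/burst options omitted).
    slave.address_phase(haddr, hwrite)
    # Step 3: data phase; slave signals success, wait states, or an error.
    for cycle in range(max_waits):
        hready, hresp, hrdata = slave.data_phase(hwdata)
        if hresp == "ERROR":
            raise IOError(f"slave error at {haddr:#x}")
        if hready:                      # transfer completes this cycle
            return hrdata
        # hready == False: wait state, master holds its signals one more cycle
    raise TimeoutError("slave inserted too many wait states")

class SlowSlave:
    """Toy slave that needs two wait states before responding."""
    def __init__(self): self.mem, self.waits = {}, 0
    def address_phase(self, addr, write):
        self.addr, self.write, self.waits = addr, write, 2
    def data_phase(self, wdata):
        if self.waits: self.waits -= 1; return (False, "OKAY", None)
        if self.write: self.mem[self.addr] = wdata; return (True, "OKAY", None)
        return (True, "OKAY", self.mem.get(self.addr, 0))

s = SlowSlave()
ahb_transfer(s, 0x40, True, 0x1234)        # write with two wait states
print(hex(ahb_transfer(s, 0x40, False)))   # read back -> 0x1234
```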

12 AMBA APB bus cycle
The APB bus is optimised for minimal power and low complexity (lower performance)
Used to interface to peripherals, which are low-bandwidth
Three-state operating diagram: Idle – Setup – Enable (actual transfer cycles); a sketch follows below
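A minimal sketch of the three-state cycle in Python; the state names follow the slide, while the single-transfer-per-request transition logic is an assumption:

```python
# Toy FSM for the APB bus cycle: IDLE -> SETUP (address decode) ->
# ENABLE (actual transfer) -> back to SETUP or IDLE. Illustrative only.
def apb_cycle(transfers):
    state, pending = "IDLE", list(transfers)
    while pending or state != "IDLE":
        if state == "IDLE":
            state = "SETUP"                 # a transfer is pending
        elif state == "SETUP":              # PSEL asserted, address decoded
            state = "ENABLE"
        elif state == "ENABLE":             # PENABLE asserted, data moves
            addr, data = pending.pop(0)
            print(f"APB write {data:#x} -> {addr:#x}")
            # Back-to-back transfers go straight to SETUP, otherwise IDLE.
            state = "SETUP" if pending else "IDLE"

apb_cycle([(0x100, 0xA5), (0x104, 0x5A)])
```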

13 AMBA Advanced eXtensible Interface (AXI)
AXI4 – the AMBA 4th-generation interface for high performance and high frequency
Features:
Unaligned data transfers using byte strobes
Burst-based transactions
Backward compatible with AHB and APB interfaces
Separate address/control and data phases
Separate read and write data channels (providing low-cost DMA)
Two channels for address (read, write) and control signals
Additional write response channel for signaling completion of write transactions
The AXI protocol supports several types of bursts: normal memory access, wrapping cache line, streaming data to peripheral FIFOs
Power management features
ACE (AXI Coherency Extensions)
Out-of-order transaction completion
5 channels + advanced cache support + exclusive access (semaphores) + register slicing (maximizes operating frequency by matching channel latency to channel delay)
AMBA AXI and ACE Protocol Specification: 306 pages.

14 AXI protocol
Channel architecture of reads
Channel architecture of writes
Data channels can be 8, 16, 32, …, 1024 bits wide
Three system topologies:
Shared address and data buses
Shared address buses and multiple data buses
Multilayer, with multiple address and data buses

15 AXI handshake: VALID-READY signals
Each channel (address, data, response) has its own handshake signal pair
A transfer happens at the clock edge when both signals are asserted: at T3 (Fig. 1 and 2) or, when both are ready earlier, at T2 (fastest, Fig. 3)
There are more control signals in all channels!
It is even possible that write data is ready before the address (the address might pass through mode registers before the bus) – alignment is necessary.
Suggested doc: AMBA AXI Protocol Specification
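The rule can be stated in a few lines of Python: information moves on the first rising clock edge at which both VALID and READY are high. The waveform values below are invented for illustration and reproduce the T3 case from Fig. 1 and 2:

```python
# AXI-style VALID/READY handshake: a transfer occurs on the clock edge
# at which both signals are high. The waveforms below are made up.
valid = [0, 1, 1, 1]   # source asserts VALID and must hold it...
ready = [0, 0, 0, 1]   # ...until the destination asserts READY (here at T3)
for t, (v, r) in enumerate(zip(valid, ready)):
    if v and r:
        print(f"transfer at clock edge T{t}")   # -> T3, as in Fig. 1 and 2
```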

16 AXI burst transfer
FIXED – the address is the same for every transfer in the burst (e.g. loading or unloading a FIFO)
INCR – the address of each transfer is an increment of the previous transfer's address (the increment depends on the transfer size)
WRAP – similar to INCR, but the address wraps around to a lower address when the upper address limit is reached (used for cache line accesses)
Example: address 0x00, aligned, burst length – 4 transfers, transfer size – 32 bits
A burst is limited (max. 16 transfers); transfer sizes of 1…128 bytes are defined by the ARSIZE values
Unaligned transfers
Write strobes – indicate the valid bytes in a transfer
Narrow transfers – transfers narrower than the data bus:
fixed lanes (FIXED)
different byte lanes (INCR or WRAP)
A sketch of the three address sequences follows below.
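A hedged Python sketch of the three address sequences; size_bytes and length only loosely echo the ARSIZE/ARLEN fields, which real AXI encodes differently:

```python
# Generate the address sequence of an AXI-style burst. Illustrative sketch:
# size_bytes plays the role of ARSIZE (bytes per transfer), length of ARLEN+1.
def burst_addresses(start, size_bytes, length, mode):
    if mode == "FIXED":                     # e.g. loading/unloading a FIFO
        return [start] * length
    if mode == "INCR":                      # normal sequential memory access
        return [start + i * size_bytes for i in range(length)]
    if mode == "WRAP":                      # cache-line accesses
        span = size_bytes * length          # wrap boundary = total burst size
        base = (start // span) * span       # aligned lower boundary
        return [base + (start - base + i * size_bytes) % span
                for i in range(length)]
    raise ValueError(mode)

# Slide example: address 0x00, aligned, 4 transfers of 32 bits (4 bytes):
print([hex(a) for a in burst_addresses(0x00, 4, 4, "INCR")])
# -> ['0x0', '0x4', '0x8', '0xc']
# WRAP starting mid-line wraps back to the aligned lower boundary:
print([hex(a) for a in burst_addresses(0x08, 4, 4, "WRAP")])
# -> ['0x8', '0xc', '0x0', '0x4']
```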

17 AXI burst and handshake examples
Overlapping read bursts
BVALID – write response valid (the channel is signalling)
BREADY – response ready (the master can accept a write response)
Write transaction handshake dependencies:
one signal, the other, or both may come first
both have to be present

18 AXI on Cortex-A microprocessor
64/128-bit configurable AXI bus
Dedicated read, write, and address channels
Up to 23 outstanding transactions with out-of-order completion
AMBA® Designer interconnect design tool
AXI4 protocol specification – 306 pages
ETM – Embedded Trace Macrocell
Cortex-A8

19 Zynq-7000 AP SoC interfaces and signals
The Zynq-7000 AP SoC contains a large number of fixed and flexible I/O
A constant 128 pins are dedicated to memory interfaces (DDR I/O)
MIO – Multiplexed I/O
Zynq-7000: …k programmable logic cells
GP – General Purpose
HP – High Performance
ACP – 64-bit Accelerator Coherency Port for asynchronous cache-coherent access
[Crockett, Zynq book 2014, p. 192]

20 Zynq-7000
Zynq-7000 devices are equipped with dual-core ARM Cortex-A9 processors integrated with 28 nm Artix-7 or Kintex®-7 based programmable logic [Xilinx Web]

21 Zynq-7000 AP SoC [Zynq-7000 All Programmable SoC Technical Reference Manual UG585 (v1.11) September 27, 2016]

22 ARM CoreLink™ System (IP)
Components and methodology for systems based on AMBA:
DMC: DRAM Controller
MMU: Memory Management Unit for hardware-assisted core virtualisation (handling privileged and shared accesses from hypervisors)
NIC: hierarchical low-power and low-latency interconnect
ARM® CoreLink™ interconnect provides the components and the methodology for designers to build SoCs based on the AMBA® specifications. There are three families of interconnect products:
CoreLink CCN Cache Coherent Network – designed for infrastructure applications; for up to twelve CPU clusters (48 cores), supporting up to 32 MB of L3 cache for highest compute density
CoreLink CCI Cache Coherent Interconnect – optimized for mobile; for coherency with up to six clusters, including big.LITTLE and future fully coherent GPUs; includes performance and efficiency benefits from an integrated snoop filter
CoreLink NIC Network Interconnect – highly configurable for SoC-wide connectivity and multiple applications; optimised to build the lowest-latency, lowest-area interconnect for AMBA 4, AMBA 3 and AMBA 2
ARM big.LITTLE is a heterogeneous computing architecture developed by ARM Holdings, coupling relatively battery-saving and slower processor cores (LITTLE) with relatively more powerful and power-hungry ones (big). Typically, only one "side" or the other will be active at once, but since all the cores have access to the same memory regions, workloads can be swapped between big and LITTLE cores on the fly.
CoreLink CCI-500 Cache Coherent Interconnect: for coherency with up to four clusters, including big.LITTLE and coherent accelerators, with higher performance and efficiency from an integrated snoop filter. Optimized for mobile.
Source: ARM.com

23 ARM CoreLink™ System
Components and methodology for systems based on AMBA:
The ARM® CoreLink™ interconnect family from the home of AMBA® is the lowest-risk solution for on-chip communication. Designed and tested with ARM Cortex and Mali processors, CoreLink interconnect from ARM provides balanced service for both low-latency and high-bandwidth data streams.
Source: ARM.com

24 ARM CoreLink™ System
Security features:
Source: ARM.com

25 ARM CoreSight™ System
Debug interface:
Embedded Trace Macrocell (ETM)
Instrumentation Trace Macrocell (ITM)
System Trace Macrocell (STM)
Trace Memory Controller (TMC)
Source: ARM.com

26 ARM CoreSight™ System – Debug & Trace
CoreSight™ debug interface:
Source: ARM.com

27 AMBA 5 AXI5, ACE5
Design&Reuse blog by Phil Dworsky, Synopsys, Feb. 08: "Synopsys supports launch of Arm AMBA 5 AXI5, ACE5 protocols with 1st source code test suite and VIP"

28 Outline
AMBA bus
PI bus
IBM CoreConnect
ST bus
Wishbone
OCP-IP
Virtual Socket Interface Alliance (VSIA)
Nios II Avalon Bus

29 Peripheral Interconnect (PI) bus
1994, Rev. 03d. Copyright: Siemens AG 1994. Source:

30 Peripheral Interconnect (PI) bus
Open on-chip bus standard
Defined by the Open Microprocessor Systems Initiative (ARM, SGS-Thomson, TEMIC-Matra MHS, Philips, Siemens)
Synchronous and processor-independent shared bus system
Memory-mapped data transfers
Multiple masters, multiple slaves
A bus arbiter periodically analyses requests from the masters
Free VHDL code
September 11th – Five of Europe's major semiconductor companies – Advanced RISC Machines (ARM), Philips Semiconductors, SGS-THOMSON Microelectronics, Siemens and Temic/Matra MHS – announced an agreement to licence a jointly developed on-chip bus protocol to other companies. The bus, known as the Peripheral Interconnect Bus (PI-Bus), was developed by the five partners within the framework of a 3-year European Union ESPRIT Open Microprocessor Initiative (OMI) project. It is particularly suitable for use in very large scale integrated circuits using deep submicron technologies and modular architectures.

31 Peripheral Interconnect (PI) bus
Processor independent
Demultiplexed operation
Clock-synchronous
Peak transfer rate of 200 Mbytes/s (at a 50 MHz bus clock; see the calculation below)
Address and data buses scalable (up to 32 bits)
8-/16-/32-bit data accesses
Broad range of transfer types, from single to multiple data transfers
Multimaster capability
The PI-Bus does not provide:
Cache coherency support
Broadcasts
Dynamic bus sizing
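The peak rate follows directly from the bus width and the clock: a 32-bit bus moves 4 bytes per cycle, and 4 bytes × 50 MHz = 200 Mbytes/s.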

32 Outline
AMBA bus
PI bus
IBM CoreConnect
ST bus
Wishbone
OCP-IP
Virtual Socket Interface Alliance (VSIA)
Nios II Avalon Bus

33 IBM CoreConnect
Designed around the IBM PowerPC core (adaptable to other cores)
Slave-centric (in contrast to most other master-centric buses)
Multiple shared slave segments
Three buses:
Processor Local Bus (PLB), 128 bits, high bandwidth, low latency – processor cores, external memory interfaces, DMA controllers
On-Chip Peripheral Bus (OPB), 32 bits, multiple masters and slaves
Device Control Register bus (DCR)
Decoupled address, read and write data buses (concurrent read and write transfers)
Address pipelining
No-fee, no-royalty licensing
IBM makes the CoreConnect bus available as a no-fee (since 1999), no-royalty architecture to tool vendors, core IP companies, and chip-development companies. As such it is licensed by over 1500 electronics companies, such as Cadence, Ericsson, Lucent, Nokia, Siemens and Synopsys. E.g. the Xilinx MicroBlaze™ processor uses the same bus for peripherals as the IBM PowerPC® processor. CoreConnect has bridging capabilities to the competing AMBA bus architecture (similar to AMBA 2.0). Source:

34 CoreConnect vs AMBA

                  | IBM CoreConnect PLB                             | AMBA 2.0 AHB
Bus architecture  | 32, 64, 128, extendable to 256 bits             | 32, 64, and 128 bits
Data buses        | Separate read and write                         | Separate or three-state
Key capabilities  | Multiple bus masters; four-deep read pipeline,  | Pipelining
                  | two-deep write pipeline; split transactions;    |
                  | burst transfers; line transfers                 |

                  | OPB (on-chip peripheral bus)                    | AMBA APB
Masters supported | Multiple masters                                | Single master: the APB bridge

35 Outline
AMBA bus
PI bus
IBM CoreConnect
ST bus
Wishbone
OCP-IP
Virtual Socket Interface Alliance (VSIA)
Nios II Avalon Bus

36 ST Microelectronics STBus
Dedicated to consumer applications (set-top boxes, digital HDTV, digital cameras)
A set of architectures, interfaces and protocols
Each operation consists of one or several request/response pairs
3 types of STBus protocols:
Type 1: simple protocol for peripheral register access (no pipelining; acts as a request/grant protocol)
Type 2: Type 1 + additional pipelining features and operation codes for ordered transactions
Type 3: Type 2 + advanced protocol implementing split transactions (splitting into a "request transaction" and a "reply transaction"); allows out-of-order transaction completion
Type 1 is similar to the IBM CoreConnect DCR bus
All types have a MUX-based implementation, with shared, partial-crossbar or full-crossbar interconnect
Dedicated to the dual-HDTV market, the 65 nm chip recently developed by STMicroelectronics integrates one host CPU, two video decoders enabling the decoding of MPEG-2, H.264 and VC-1 video frames, dedicated microprocessors for audio decoding and many peripherals for internal or external exchanges. The STBus interconnect supports multiple clock domains. All clocks are considered fully asynchronous, even if there is an integer ratio between some of them. Source:

37 STBus arbitration
Static priority (non-preemptive)
Programmable priority
Latency-based (a sketch follows below):
Masters have a counter register, loaded with the maximum tolerable latency at request time
Counters are decreased at each access cycle
The arbiter grants access to the master with the lowest counter register value
In case of a draw, static priorities are used
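A minimal Python sketch of the latency-based scheme; the counter handling and tie-break follow the slide, while the rest of the arbiter is assumed for illustration:

```python
# Latency-based arbitration sketch: each master loads a counter with its
# maximum tolerable latency when it requests the bus; counters count down
# every cycle, and the bus is granted to the requesting master with the
# lowest counter (most urgent). Static priority (lower = higher) breaks ties.
class Master:
    def __init__(self, name, static_prio):
        self.name, self.prio, self.counter = name, static_prio, None
    def request(self, max_latency):
        self.counter = max_latency          # loaded at request time

def arbitrate(masters):
    requesting = [m for m in masters if m.counter is not None]
    if not requesting:
        return None
    winner = min(requesting, key=lambda m: (m.counter, m.prio))
    for m in requesting:                    # counters decrease each cycle
        m.counter -= 1
    return winner

cpu, dma = Master("CPU", 0), Master("DMA", 1)
cpu.request(max_latency=8)
dma.request(max_latency=3)                  # more urgent despite lower priority
print(arbitrate([cpu, dma]).name)           # -> DMA
```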

38 Outline
AMBA bus
PI bus
IBM CoreConnect
ST bus
Wishbone
OCP-IP
Virtual Socket Interface Alliance (VSIA)
Nios II Avalon Bus

39 Wishbone
Public version available since 2002 via OpenCores, updated in 2010 (Wishbone Rev. B4)
A "logic bus" – does not specify electrical information or topology
Aim: support the use of open cores
Scalable bus architecture based on simple master/slave handshake communication (a sketch follows below)
Configurable bit widths: 8, 16, 32
Supported interconnection topologies:
Direct point-to-point (master to slave)
Dataflow interconnection based on systolic array architectures
Shared bus and crossbar switch interconnection (most commonly used in SoCs)
WISHBONE itself is not an IP core... it is a specification for creating IP cores.
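A hedged Python sketch of the Wishbone classic single-write handshake (CYC and STB asserted by the master, ACK returned by the slave, as in the Wishbone specification); the zero-wait-state slave model is an assumption:

```python
# Wishbone classic cycle sketch: the master asserts CYC and STB together
# with address/data; the slave terminates the transfer by asserting ACK.
def wishbone_write(slave, adr, dat, timeout=16):
    cyc, stb = 1, 1                         # master starts the cycle
    for _ in range(timeout):
        ack = slave.clock(cyc, stb, adr, dat, we=1)
        if ack:                             # slave accepted the data
            return
    raise TimeoutError("no ACK from slave")

class WbSlave:
    def __init__(self): self.mem = {}
    def clock(self, cyc, stb, adr, dat, we):
        if cyc and stb:                     # qualified strobe
            if we: self.mem[adr] = dat
            return 1                        # zero-wait-state ACK
        return 0

s = WbSlave()
wishbone_write(s, 0x10, 0xBEEF)
print(hex(s.mem[0x10]))                     # -> 0xbeef
```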

40 Wishbone
Examples of Wishbone interconnection topologies:
Shared bus. Source: wikipedia.org
Pipeline. Source: wikipedia.org
Crossbar switch. Source: wikipedia.org

41 Outline
AMBA bus
PI bus
IBM CoreConnect
ST bus
Wishbone
OCP-IP
Virtual Socket Interface Alliance (VSIA)
Nios II Avalon Bus

42 Open Core Protocol International Partnership (OCP-IP)
Socket (bus wrapper) interface for SoC design
Configurable and highly scalable interface for on-chip subsystem communications
Non-profit, open-industry standards body
OCP phases (a sketch follows below):
Request
Response
Data Handshake
Recent specification: OCP 3.0 (… pages)
OCP-IP Specification Working Group members: MIPS Technologies Inc., Nokia, Sonics Inc., Texas Instruments Incorporated, Toshiba Corporation Semiconductor Company, Cadence
Source: Prashant D. Karandikar, Texas Instruments Inc.
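A minimal sketch of the three phases named above as they might order a write transaction; the transaction object and method names are illustrative stand-ins, not the OCP signal set (DVA is OCP's "data valid/accept" response code):

```python
# Sketch of an OCP-style transaction broken into phases. A write uses a
# request phase (command + address), an optional data-handshake phase,
# and the slave's response phase. Names are illustrative only.
def ocp_write(slave, addr, data):
    tag = slave.request(cmd="WR", addr=addr)     # request phase
    slave.data_handshake(tag, data)              # data handshake phase
    return slave.response(tag)                   # response phase

class OcpSlave:
    def __init__(self): self.mem, self.pending, self.next_tag = {}, {}, 0
    def request(self, cmd, addr):
        tag = self.next_tag; self.next_tag += 1
        self.pending[tag] = (cmd, addr)
        return tag
    def data_handshake(self, tag, data):
        cmd, addr = self.pending[tag]
        self.mem[addr] = data
    def response(self, tag):
        del self.pending[tag]
        return "DVA"                             # data valid/accept

s = OcpSlave()
print(ocp_write(s, 0x20, 7), hex(s.mem[0x20]))   # -> DVA 0x7
```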

43 Open Core Protocol International Partnership (OCP-IP)
Multiple requests with multiple data transfers possible
Reduced chip area: only the features needed by the communicating cores are configured into the OCP interfaces
Simplified system verification and testing through built-in test mechanisms
Protocols for cache coherence
Source: Prashant D. Karandikar, Texas Instruments Inc. [OCP Specification 3.0]

44 Outline
AMBA bus
PI bus
IBM CoreConnect
ST bus
Wishbone
OCP-IP
Virtual Socket Interface Alliance (VSIA)
Nios II Avalon Bus
Extra material

45 Virtual Socket Interface Alliance (VSIA)
VSI enables system-level interaction on a chip using predesigned blocks called virtual components (VCs):
Hard VC (placed and routed)
Soft VC (HDL description)
Firm VC (in the form of generators or partially placed library blocks)
Other buses can interface over VCs following VSI standard protocols
VSIA was founded in 1996 and dissolved in 2008. All documents have been given to the public domain – legacy documents.

46 Nios II – Avalon Bus
Switch fabric to connect CPU, DMA, memory and memory-mapped peripherals on an FPGA
Master–slave modules
Master addresses are byte addresses; slave addresses are word addresses (see the sketch below)
Synchronous, rising clock edge
Separate data in and out
Data path of 8, 16, …, 1024 bits (word size)
Slave-side arbitration, multiple simultaneous masters (each master–slave pair has a dedicated connection between them)
Pipelined read transfers, burst transfers
Avalon does address translation and multiple accesses when needed
Avalon-MM pipelined read transfers increase throughput for synchronous slave devices that require several cycles to return data for the first access; such devices can typically return one data value per cycle for some time thereafter. New pipelined read transfers can start before the readdata of previous transfers is returned. Write transfers cannot be pipelined. A burst executes multiple transfers as a unit, rather than treating every word independently.
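The byte-address/word-address distinction fits in a few lines of Python (a hedged sketch; the fabric's real translation logic also handles narrow and unaligned cases):

```python
# Avalon-style address translation sketch: masters issue byte addresses,
# slaves see word addresses. For a 32-bit (4-byte) slave word size, the
# fabric effectively drops the low two address bits. Illustrative only.
WORD_BYTES = 4                       # assumed 32-bit slave data width

def to_slave_address(byte_addr):
    return byte_addr // WORD_BYTES   # word address presented to the slave

for byte_addr in (0x00, 0x04, 0x0C):
    print(f"master byte addr {byte_addr:#04x} -> slave word addr "
          f"{to_slave_address(byte_addr):#04x}")
# -> 0x00 -> 0x00, 0x04 -> 0x01, 0x0c -> 0x03
```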

47 Avalon multi-mastering
Example multi-master system that permits bus transfers between two masters and two slaves
Avalon bus control signals:
Master Request Slave (MRS)
Master Select Granted (MSG)
Wait
[Simultaneous Multi-Mastering with the Avalon Bus, Application Note 184, Altera 2002]

48 Summary
PCI Express x4, x16, x1 vs. conventional 32-bit PCI
Four PCI Express bus card slots (from top to 2nd from bottom: x4, x16, x1 and x16), compared to a 32-bit conventional PCI bus card slot (very bottom)

