ECE 720T5 Fall 2012 Cyber-Physical Systems Rodolfo Pellizzoni.


Assignments – Research Track
Saturday Oct 13, 8:00 AM: Project proposal
– Maximum 2-page document.
– Describe what you want to do, why it is relevant, what the contribution will be, and a brief summary of your work plan.
– Please pick a title for the project.
– I would suggest using an ACM/IEEE double-column conference format. This way, it is easier for you to reuse the proposal text when you create the final report.
– Please send me the proposal in PDF or Word format.
If you want to further discuss the project, I will be available this afternoon, tomorrow morning and Friday morning this week.

Topic Today: Interconnects
On-chip bandwidth wall.
– We need scalable communication between cores in a multi-core system.
– How can we provide isolation?
Delay on the interconnect compounds cache/memory access delay.
Interconnect links are a shared resource: tasks suffer timing interference.

Interconnect Types
Shared bus
– Single resource: each data transaction interferes with every other transaction
– Not scalable
Crossbar
– N input ports, M output ports
– Each input connected to each output
– Usually employs virtual input buffers
– Problem: still scales poorly. Wire delay increases with N, M.

Interconnect Types
Network-on-Chip
– The interconnect comprises on-chip routers connected by (usually full-duplex) links
– Topologies include linear, ring, 2D mesh, 2D torus

Off-Chip vs On-Chip Networks
Several key differences…
Synchronization
– It is much easier to synchronize on-chip routers.
Link Width
– Wires are relatively inexpensive in on-chip networks, so links are typically fairly wide.
– On the other hand, many off-chip networks (e.g., PCI Express, SATA) moved to serial connections years ago.
Buffers
– Buffers are relatively inexpensive in off-chip networks (compared to other elements).
– On the other hand, buffers are the main cost (area and power) in on-chip networks.

Other Details
Wormhole routing (flit switches)
– Instead of buffering the whole packet, buffer only part of it
– Break the packet into blocks (flits), usually of size equal to the link width
– Flits propagate in sequence through the network
Virtual Channels
– Problem: a packet now occupies multiple flit switches
– If the packet becomes blocked due to contention, all the switches it occupies are blocked
– Solution: implement multiple flit buffers (virtual channels) inside each router
– Then assign different packets to different virtual channels
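The two ideas above can be illustrated with a minimal sketch. It assumes a byte-level packet, a link width given in bytes, and a toy input port that scans its virtual channels in index order; the names `to_flits` and `InputPort` are hypothetical and do not come from the slides or from any real router design.

```python
from collections import deque


def to_flits(packet: bytes, link_width_bytes: int) -> list[bytes]:
    """Split a packet into flits no larger than the link width (assumed in bytes)."""
    if link_width_bytes <= 0:
        raise ValueError("link width must be positive")
    return [packet[i:i + link_width_bytes]
            for i in range(0, len(packet), link_width_bytes)]


class InputPort:
    """Toy router input port with one flit queue per virtual channel (VC)."""

    def __init__(self, num_vcs: int):
        self.vcs = [deque() for _ in range(num_vcs)]

    def enqueue(self, vc: int, flit: bytes) -> None:
        self.vcs[vc].append(flit)

    def forward(self, blocked_vcs: set[int]):
        """Forward one flit from the first non-empty VC that is not blocked.

        Returns (vc, flit), or None if every non-empty VC is blocked.
        """
        for vc, queue in enumerate(self.vcs):
            if queue and vc not in blocked_vcs:
                return vc, queue.popleft()
        return None


# Packet A sits on VC 0 and is blocked downstream; packet B on VC 1 still moves.
port = InputPort(num_vcs=2)
for flit in to_flits(b"AAAAAAAA", 4):
    port.enqueue(0, flit)
for flit in to_flits(b"BBBBBB", 4):
    port.enqueue(1, flit)
print(port.forward(blocked_vcs={0}))  # (1, b'BBBB')
```

With a single queue, the blocked packet A would stall the link for packet B as well; the per-VC queues are what let B make progress.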

AEthereal Network on Chip

AEthereal
Real interconnect architecture implemented by Philips (now NXP Semiconductors).
Key idea: the NoC comprises both Best Effort (BE) and Guaranteed Service (GS) routers.
GS routers are contentionless
– Synchronize the routers
– Divide time into fixed-size slots
– A table dictates routing in each time slot
– Tables are built so that blocks never wait: one-block queuing
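A minimal sketch of contention-free slot tables follows. It assumes that a block advances exactly one hop per slot, so a connection that leaves its first router in slot s needs slot s+1 at the second router, and so on modulo the table length. That timing rule, the `SlotTables` class, and the router and port names are illustrative assumptions, not the AEthereal implementation.

```python
class SlotTables:
    """Toy global view of the per-router slot tables of a GS network."""

    def __init__(self, num_slots: int):
        self.num_slots = num_slots
        # (router, output_port, slot) -> connection id
        self.table = {}

    def try_reserve(self, conn: str, path, start_slot: int) -> bool:
        """Reserve consecutive slots along the path, or change nothing on conflict.

        path is a list of (router, output_port) pairs in hop order.
        """
        needed = [(router, port, (start_slot + hop) % self.num_slots)
                  for hop, (router, port) in enumerate(path)]
        if any(key in self.table for key in needed):
            return False  # some link is already used in the slot we would need
        for key in needed:
            self.table[key] = conn
        return True

    def reserve_first_free(self, conn: str, path):
        """Try each start slot in turn; return the one that worked, or None."""
        for slot in range(self.num_slots):
            if self.try_reserve(conn, path, slot):
                return slot
        return None


tables = SlotTables(num_slots=2)
print(tables.reserve_first_free("A", [("R0", "east"), ("R1", "east")]))  # 0
print(tables.reserve_first_free("B", [("R1", "east")]))                  # 0
print(tables.reserve_first_free("C", [("R1", "east")]))                  # None
```

Because no two connections ever hold the same link in the same slot, a block never waits behind another one, which is why a single block of buffering per hop suffices in this model.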

Routing Table

Combined GS-BE Router

Alternative: Centralized Model
A central scheduling node receives requests for channel creation.
The central scheduler updates the transmission tables in the network interfaces (end node -> NoC).
Packet injection is regulated only by the network interfaces: there is no scheduling table in the router.

Centralized Model Router

Results: Buffers are Expensive

The Big Issue
How do you compute the scheduling table? No clear idea in the paper!
– In the distributed model, you can request slots until successful.
– In the centralized model, the central scheduler should run a proper admission control + scheduling algorithm!
– How do you decide the length (number of slots) of the routing tables?
Simple idea: treat the network as a single resource.
– Problem: this cannot exploit NoC parallelism.

Computing the Schedule
Real-Time Communication for Multicore Systems with Multi-Domain Ring Buses.
Scheduling for the ring bus implemented in the Cell BE processor
– 12 flit switches
– Full-duplex
– SPE units use a scratchpad with a programmable DMA unit
Main assumptions:
– Scheduling is controlled by software on the SPEs
– Transfers move large data chunks (unit transactions) using DMA
– All switches on the path are considered occupied during the unit transfer
– Periodic data transactions with deadline = period.

Transaction Sets And Linearization

Results
Overlap set: maximal set of overlapping transactions.
– Two overlapping transactions cannot transmit at the same time…
If the periods are all the same, then U <= 1 for each overlap set is a necessary and sufficient schedulability condition.
Otherwise, U <= (L-1)/L is a sufficient condition (where L is the GCD of the periods, expressed in unit transactions).
The implementation transfers 10 KB in a time unit of 537.5 ns; if periods are multiples of ms, L is large, so the bound is close to 1.
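The two utilization conditions can be checked mechanically. The sketch below assumes each transaction is given as a (units per period, period) pair with the period measured in time units, that the overlap sets have already been identified, and that L is the GCD over all periods in the system. The function name `passes_utilization_test` is hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def passes_utilization_test(overlap_sets) -> bool:
    """Apply the utilization bound to every overlap set.

    overlap_sets: list of overlap sets, each a list of (units, period) pairs.
    With equal periods the test is exact (U <= 1). With different periods it is
    only sufficient (U <= (L-1)/L), so False then means "not proven schedulable".
    """
    periods = [period for s in overlap_sets for _, period in s]
    if len(set(periods)) == 1:
        bound = Fraction(1)
    else:
        L = reduce(gcd, periods)
        bound = Fraction(L - 1, L)
    return all(sum(Fraction(units, period) for units, period in s) <= bound
               for s in overlap_sets)


# Same periods: U = 1/4 + 2/4 = 3/4 <= 1.
print(passes_utilization_test([[(1, 4), (2, 4)]]))  # True
# Different periods: L = gcd(4, 8) = 4, bound 3/4, U = 2/4 + 3/8 = 7/8.
print(passes_utilization_test([[(2, 4), (3, 8)]]))  # False
```

Exact fractions avoid floating-point rounding when a set sits right at the bound.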

Same Periods – Greedy Algorithm

Different Periods
Divide time into intervals of length L.
Define the lag for a job of task i as: U_i * t - #units_executed
– Schedulable if the lag at the deadline = 0.
– Lag of an overlap set: sum of the lags of the tasks in the set.
Key idea: compute the number of time units that each job executes in the interval such that:
– The number of time units for each overlap set is not greater than L (this makes it schedulable in the interval)
– The lag of the job is always > -1 and < 1 (this means the job meets the deadline)
How is it done? Complex graph-theoretical proof.
– Solve a max flow problem at each interval.
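The lag definition and the two per-interval constraints are easy to state in code. This sketch only evaluates them for a given allocation; it does not reproduce the max-flow step that finds the allocation. The function names are hypothetical, and t is assumed to be measured in time units.

```python
from fractions import Fraction


def lag(utilization: Fraction, t: int, units_executed: int) -> Fraction:
    """Lag of a job at time t: ideal progress U_i * t minus actual progress."""
    return utilization * t - units_executed


def lag_within_bounds(utilization: Fraction, t: int, units_executed: int) -> bool:
    """Per-job constraint from the slide: the lag must stay strictly in (-1, 1)."""
    return -1 < lag(utilization, t, units_executed) < 1


def overlap_set_lag(jobs, t: int) -> Fraction:
    """Lag of an overlap set: sum of the lags of its (utilization, units) jobs."""
    return sum(lag(u, t, done) for u, done in jobs)


def interval_fits(units_per_job, interval_length: int) -> bool:
    """Per-interval constraint: an overlap set gets at most L time units."""
    return sum(units_per_job) <= interval_length


u = Fraction(3, 8)              # a task needing 3 units every 8 time units
print(lag(u, 4, 1))             # 1/2
print(lag(u, 8, 3))             # 0
print(lag_within_bounds(u, 4, 0))  # False, the lag would be 3/2
```

A lag of 0 at t = 8 corresponds to the job having received all 3 of its units by its deadline.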

What about mesh networks?
A Slot-based Real-time Scheduling Algorithm for Concurrent Transactions in NoC
Same result as before, but usable on 2D mesh networks.
Unfortunately, it requires some weird assumptions on the transaction configuration…

NoC Predictability: Other Directions
Fixed-Priority Arbitration
– Let packets contend at each router, but arbitrate according to strict fixed priority
– Then build a schedulability analysis for all flows
– Issue #1: not really composable
– Issue #2: do we have enough priorities (i.e., do we have enough buffers)?
Routing
– So far we have assumed that routes are predetermined
– In practice, we can optimize the routes to reduce contention
– Many general-purpose networks use on-line rerouting
– Off-line route optimization is probably more suitable for real-time systems.

Putting Everything Together…
In practice, timing interference in a multicore system depends on all shared resources:
– Caches
– Interconnects
– Main Memory
A predictable architecture should consider the interplay among all such resources
– Arbitration: the order in which cores access one resource will have an effect on the next resource in the chain
– Latency: access latency for a slower resource can effectively hide the latency for access to a faster resource
Let’s see some examples…

HW Support for WCET Analysis of Hard Real-Time Multicore Systems

Intra-Core and Inter-Core Arbiters

Timing Interference

WCET Using Different Cache Banks

Bankization vs Columnization (Cache-Way Partitioning)

Non-Real Time Tasks

Optimizing the Bus Schedule
The previous paper assumed round-robin (RR) inter-core arbitration. Can we do better? Yes!
Bus scheduling optimization
– Use TDMA instead of RR: same worst-case behavior
– Analyze the tasks
– Determine the optimal TDMA schedule
– Ex: Predictable Implementation of Real-Time Applications on Multiprocessor Systems-on-Chip
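To see why the choice of TDMA schedule matters, the following sketch computes, for one core, the longest distance in slots from the start of one of its slots to the start of its next one. It assumes a cyclic schedule given as a list of core ids and that a request which just misses the start of its own slot must wait for the next one. The function name and the example schedules are hypothetical and not taken from the cited paper.

```python
def worst_case_wait_slots(schedule, core) -> int:
    """Longest cyclic gap, in slots, between consecutive slot starts owned by core."""
    owned = [i for i, owner in enumerate(schedule) if owner == core]
    if not owned:
        raise ValueError(f"core {core!r} has no slot in the schedule")
    n = len(schedule)
    # (next - cur - 1) % n + 1 maps a gap of zero to a full cycle of n slots,
    # which is the right answer when the core owns a single slot.
    return max((owned[(k + 1) % len(owned)] - owned[k] - 1) % n + 1
               for k in range(len(owned)))


uniform = [0, 1, 2, 3]   # one slot per core, like a fixed rotation
skewed = [0, 1, 0, 2]    # core 0 is given two slots per cycle
print(worst_case_wait_slots(uniform, 0))  # 4
print(worst_case_wait_slots(skewed, 0))   # 2
print(worst_case_wait_slots(skewed, 1))   # 4
```

Giving a core more slots, or spacing them evenly, shrinks its bound without changing the other cores' bounds in this example, which is the kind of trade-off a schedule optimizer can exploit once the tasks have been analyzed.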

Example