Predictable Implementation of Real-Time Applications on Multiprocessor Systems-on-Chip Alexandru Andrei Embedded Systems Laboratory Linköping University,

Presentation transcript:

Predictable Implementation of Real-Time Applications on Multiprocessor Systems-on-Chip Alexandru Andrei Embedded Systems Laboratory Linköping University, Sweden

2 GSM Phone: Search, Radio Link Control, Talking. MP3 player. Digital Camera: Take Photo, Restore Photo. ... • High performance • Low power • Predictable

3 Design Flow [Figure: design flow. Inputs: Hardware platform (CPU0, CPU1, ASIC0, Bus) and Software Application(s). Steps: Extract Task Graph, Extract Task Parameters (worst-case execution times, task power), Optimize, Formal, Simulation, Implement. The task graph has a deadline dl. Sample application code: for (i=0;i<99;i++) x=x+a[i]; for (j=0;j<100;j++) y=y+b[i]; if (x<y) z=y;]

4 Application Model [Figure: task graph with deadline dl]

5 Hardware Architecture [Figure labels: Bus, CPU, CACHE, Private Memory (three instances), Shared Memory, Interrupt Device, Semaphore Device]

6 Execution Model [Figure: CPU 1 and CPU 2 with Cache, Private Mem 1, Private Mem 2, the BUS, and Shared Mem. Original TG: τ1 → τ2. Instructions of τ1: comp(x); copy(x,s). Instructions of τ2: copy(s,y); use(y). Data labels: x, s, y]

7 Task Model [Figure: the original TG with tasks τi and τj, and the extended TG in which the explicit communication tasks τwi and τrj are added between τi and τj]

8 Motivational Example WCET: τ1 = 60; τ2 = 25; τw2 = 12. τ1 and τ2 have a deadline at time 63. [Figure: tasks τ1, τ2, and τwi on a platform with CPU 1, CPU 2, PMem 1, PMem 2, ShMem, and the Bus]

9 Motivational Example (2) [Figure: Gantt chart over CPU 1, CPU 2, and BUS showing τ1, τ2, the explicit communication τw2, and the implicit communication labelled M1 to M5 and I1 to I5; dl=63]

10 Motivational Example (3) Using a FCFS bus arbiter: deadline violation! [Figure: Gantt chart over CPU 1, CPU 2, and BUS showing τ1, τ2, τw2, M1 to M5, and I1 to I5 against the deadline dl]

11 Motivational Example (4) Using a bus schedule. [Figure: Gantt chart over CPU 1, CPU 2, and BUS showing τ1, τ2, τw2, M1 to M4, and I1 to I5 against the deadline dl]

12 Motivational Example: Message • In multiprocessor systems, the WCET depends on the bus load! • In multiprocessor systems, the WCET depends on the schedule! • In multiprocessor systems, the schedule depends on the WCET!

13 Implicit Communication
Benchmark   Bus Utilization   Impl. Communication
GSM 1)      12%               39%
MP3 2)      26%               42%
MP3 3)      49%               86%
Setup: ARM7 cores, ST bus protocol
1) Icache: 4096b, Dcache: 1024b
2) Icache: 4096b, Dcache: 1024b
3) Icache: 16b, Dcache: 256b

14 WCET Analysis • Difficult for both single-processor and multiprocessor systems • Single-processor tools: SymTA/P, AbsInt aiT • Handle instruction and data caches • Basic idea: enumerate all possible paths through the program's control flow graph (CFG) and always consider the longest one
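
The "longest path" idea on this slide can be illustrated with a minimal sketch. It is not the SymTA/P or aiT algorithm. It assumes the CFG has already been made acyclic (loops unrolled or collapsed using their bounds) and that blocks are numbered in topological order. Instead of enumerating paths one by one, it computes the same maximum with a single dynamic-programming pass. The type `Cfg`, the function `wcet_longest_path`, and all costs are hypothetical names chosen for illustration.

```c
#include <stddef.h>

#define MAX_NODES 16

/* A small acyclic CFG. edge[u][v] != 0 means control can flow from block u
   to block v. Assumption: u < v for every edge (topological numbering). */
typedef struct {
    size_t n;                                  /* number of basic blocks */
    unsigned cost[MAX_NODES];                  /* cycles per block (hypothetical) */
    unsigned char edge[MAX_NODES][MAX_NODES];  /* adjacency matrix */
} Cfg;

/* Cost of the longest path from block 0 (entry) to block n-1 (exit).
   Returns 0 if the graph is empty, too large, or the exit is unreachable. */
unsigned wcet_longest_path(const Cfg *g)
{
    unsigned best[MAX_NODES] = {0};            /* longest cost to reach each block */
    unsigned char reachable[MAX_NODES] = {0};

    if (g->n == 0 || g->n > MAX_NODES)
        return 0;

    best[0] = g->cost[0];
    reachable[0] = 1;

    for (size_t v = 1; v < g->n; v++) {
        for (size_t u = 0; u < v; u++) {
            if (g->edge[u][v] && reachable[u]) {
                unsigned cand = best[u] + g->cost[v];
                if (!reachable[v] || cand > best[v])
                    best[v] = cand;
                reachable[v] = 1;
            }
        }
    }
    return reachable[g->n - 1] ? best[g->n - 1] : 0;
}
```

For a diamond-shaped CFG with block costs 2, 10, 4, and 3, the function returns 15, the cost of the path through the more expensive branch.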

15 WCET Analysis Flow [Figure: analysis flow. Labels: source files; binary file; abstract syntax tree generation; data dependency analysis; data flow extraction; data flow analysis; data address analysis; instr. address extraction; program segment simulation; CFG construction; instr. cache analysis; instruction cache; data cache; annotated CFG; WCET]

16 WCET Analysis: Example void foo() { int i, temp; for (i=0; i<100; i++) { temp=a[i]; a[temp]=0; } }

17 WCET Analysis: CFG
1: void foo() {
2:   int i, temp;
3:   for (i=0;
4:        i<N;
5:        i++) {
6:     temp=a[i];
7:     a[temp]=0;
8:   }
9: }
[Figure: CFG with nodes id 2; id 17 (Lno: 3,4,9); id 12 (Lno: 3,4,6); id 4; id 13 (Lno: 6,7,5,4,6); id 16 (Lno: 6,7,5,4,8); id 11]

18 WCET Analysis: CFG Control nodes: 2, 4, 11. Basic blocks: 12, 17, 13, 16. Node id 4 carries the loop bound (for example, N=100). [Figure: CFG with nodes id 2; id 17 (Lno: 3,4); id 12 (Lno: 3,4); id 4; id 13 (Lno: 6,7,5,4,6); id 16 (Lno: 6,7,5,4,8); id 11]

19 WCET Analysis with Instruction Cache • Generate the address traces for each program block • Always assume a miss at the beginning of each block • Use a cache simulator to get the cache hit/miss ratio for each block • We can do better
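
The per-block cache simulation on this slide can be sketched as follows. This is a minimal illustration, not the simulator used by the authors. It assumes a direct-mapped cache that starts empty for every block (the "always a miss at the beginning" assumption) and a constant miss penalty, which holds only in the single-processor case. `NUM_LINES`, `LINE_BYTES`, `count_misses`, and `trace_cycles` are hypothetical names and values.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_LINES  4   /* hypothetical number of cache lines */
#define LINE_BYTES 16  /* hypothetical line size in bytes */

/* Replays an address trace on an initially empty direct-mapped cache
   and returns the number of misses. */
size_t count_misses(const uint32_t *trace, size_t len)
{
    uint32_t tag[NUM_LINES] = {0};        /* memory block held by each line */
    unsigned char valid[NUM_LINES] = {0};
    size_t misses = 0;

    for (size_t k = 0; k < len; k++) {
        uint32_t block = trace[k] / LINE_BYTES;  /* memory block number */
        uint32_t idx = block % NUM_LINES;        /* line it maps to */
        if (!valid[idx] || tag[idx] != block) {
            misses++;                            /* cold or conflict miss */
            valid[idx] = 1;
            tag[idx] = block;
        }
    }
    return misses;
}

/* Execution time of a trace if every access costs hit_cycles and every
   miss adds a constant miss_penalty. */
unsigned long trace_cycles(const uint32_t *trace, size_t len,
                           unsigned hit_cycles, unsigned miss_penalty)
{
    return (unsigned long)len * hit_cycles
         + (unsigned long)count_misses(trace, len) * miss_penalty;
}
```

With the trace 0x00, 0x04, 0x10, 0x00, 0x40, 0x00, the sketch reports four misses: two cold misses and two conflict misses on line 0.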

20 WCET Analysis with ICache: Unrolled CFG
1: void foo() {
2:   int i, temp;
3:   for (i=0;
4:        i<100;
5:        i++) {
6:     temp=a[i];
7:     a[temp]=0;
8:   }
9: }
[Figure: unrolled CFG with nodes id 2; id 17 (Lno: 3,4); id 12 (Lno: 3,4); id 4; id 13 (Lno: 6,7,5,4,6); id 16 (Lno: 6,7,5,4,8); id 11; id 104; and a second copy of id 13 (Lno: 6,7,5,4,6)]

21 WCET Analysis with ICache: Unrolled CFG [Figure: the unrolled CFG of slide 20 annotated with cache-miss traces, where (i) marks an instruction cache miss and (d) a data cache miss. Trace 1: miss lno 3 (i), miss lno 3 (d), lno 3, miss lno 4 (i), lno 4. Trace 2: miss lno 6 (i), lno 6, miss lno 7 (i), miss lno 7 (d), lno 7, miss lno 5 (i), lno 5, 4. Trace 3: miss lno 6 (d), lno 6, miss lno 7 (d), lno 7, 5, 4, miss lno 6 (d)]

22 WCET Analysis: Multiprocessor • The cache miss penalty is constant in the single-processor case • The cache miss penalty is variable in the multiprocessor case

23 Predictable MPSoC Bus Access • Partition the bus period into bus slots (TDMA) • Assign bus slots to the processors • The bus arbiter grants the bus to a processor only during its allocated slots • Eliminates bus interference • Not flexible: an idle bus slot cannot be used by another processor
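
A minimal sketch of how a TDMA slot table turns the cache miss penalty into a computable, time-dependent quantity. It assumes that a transfer must fit entirely inside one slot owned by the requesting CPU and that the table repeats every period. Both are assumptions of this sketch rather than details stated on the slide. `BusSlot`, `BusSchedule`, and `bus_transfer_end` are hypothetical names.

```c
#include <stddef.h>

typedef struct {
    unsigned start;   /* slot start, relative to the beginning of the period */
    unsigned owner;   /* CPU that may use the bus during this slot */
} BusSlot;

typedef struct {
    const BusSlot *slots;  /* sorted by start; slot i ends where slot i+1 starts */
    size_t n_slots;
    unsigned period;       /* the table repeats every `period` time units */
} BusSchedule;

/* Computes when a transfer of `len` time units, requested by `cpu` at time t,
   completes. Writes the completion time to *end and returns 1, or returns 0
   if no slot owned by `cpu` is long enough. */
int bus_transfer_end(const BusSchedule *bs, unsigned cpu,
                     unsigned t, unsigned len, unsigned *end)
{
    if (bs->period == 0 || len == 0)
        return 0;

    unsigned base = (t / bs->period) * bs->period;  /* start of current period */

    /* Two periods suffice: a slot that is too short now is full-length
       in the next period. */
    for (int round = 0; round < 2; round++, base += bs->period) {
        for (size_t i = 0; i < bs->n_slots; i++) {
            if (bs->slots[i].owner != cpu)
                continue;
            unsigned s = base + bs->slots[i].start;
            unsigned e = base + (i + 1 < bs->n_slots ? bs->slots[i + 1].start
                                                     : bs->period);
            unsigned begin = t > s ? t : s;  /* cannot start before the request */
            if (begin < e && e - begin >= len) {
                *end = begin + len;
                return 1;
            }
        }
    }
    return 0;
}
```

With a period of 10 and slots starting at 0 (CPU 1), 4 (CPU 2), and 7 (CPU 1), a 2-unit transfer requested by CPU 2 completes at time 6 if requested at time 1, but only at time 16 if requested at time 6. The penalty `end - t` therefore depends on when the miss occurs, yet it is fully determined by the table.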

24 Analysis & Bus Access [Figure: the annotated unrolled CFG of slide 21, with its instruction (i) and data (d) cache misses, aligned against a bus schedule of alternating slots: CPU1, CPU2, CPU1, CPU2, ...]

25 Multiprocessor Analysis and Optimization • In multiprocessor systems, the WCET depends on the schedule! • In multiprocessor systems, the schedule depends on the WCET!

26 Overall Approach [Figure: task graph with tasks τ1 to τ5 and a Gantt chart over CPU 1, CPU 2, CPU 3, and BUS. Mapping: CPU 1: τ1, τ4; CPU 2: τ2; CPU 3: τ3, τ5]

27 Overall Approach [Flowchart: New task to schedule → Schedule new task at time t ≥ θ → Ψ is the set of all tasks that are active at time t → Bus schedule optimization: select bus schedule B for the time interval starting at t; determine the WCET of the tasks from set Ψ → θ is the earliest time a task from set Ψ finishes]

28 Overall Approach [Flowchart repeated from slide 27]

29 Bus Schedule: BSA1 [Figure: bus schedule table over a period with columns slot_start (t0, t1, t2, ...) and owner (CPU 1, CPU 2, CPU 1, ...), and a timeline with slots starting at t0 to t4]

30 Bus Schedule: BSA2 [Figure: bus schedule table over a period, divided into segments. Segment 1: seg_start t0, owners 1, 2, seg_size 12, per-owner slot sizes 1 and 3 (CPU 1, CPU 2). Segment 2: seg_start t4, owners 2, 1, seg_size 7, per-owner slot sizes 2 and 5 (CPU 1, CPU 2). Timeline with slots starting at t0 to t6]

31 Bus Schedule: BSA3 [Figure: bus schedule table over a period, divided into segments. Segment 1: seg_start t0, owners 1, 2, slot_size 3. Segment 2: seg_start t4, owners 2, ... Timeline with slots starting at t0 to t6]
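
The BSA3 table can be read as a compact encoding: each segment stores a start time, an owner order, and a single slot size. The sketch below shows one way to look up the bus owner inside such a segment. It assumes that the owner order repeats round-robin for as long as the segment lasts, which is consistent with the slide labels but not spelled out on them. It also ignores where the segment ends. `Bsa3Segment` and `bsa3_owner_at` are hypothetical names, and CPU ids are assumed to start at 1.

```c
#include <stddef.h>

typedef struct {
    unsigned seg_start;      /* time at which the segment begins */
    unsigned slot_size;      /* common length of every slot in the segment */
    const unsigned *owners;  /* order in which the CPUs receive slots */
    size_t n_owners;
} Bsa3Segment;

/* Returns the CPU that owns the bus at time t, or 0 if t lies before the
   segment or the segment description is malformed. */
unsigned bsa3_owner_at(const Bsa3Segment *seg, unsigned t)
{
    if (t < seg->seg_start || seg->slot_size == 0 || seg->n_owners == 0)
        return 0;

    /* Index of the slot that contains t, counted from the segment start. */
    unsigned slot_index = (t - seg->seg_start) / seg->slot_size;

    /* Owners take turns in the stored order (assumed round-robin). */
    return seg->owners[slot_index % seg->n_owners];
}
```

For a segment starting at time 0 with owners 1, 2 and slot_size 3, CPU 1 owns the bus during times 0 to 2, CPU 2 during times 3 to 5, and CPU 1 again from time 6.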

32 Experimental Results [Figure: normalized schedule length versus number of CPUs for BSA 1, BSA 2, BSA 3, and BSA 4]

33 Experimental Results [Figure: normalized schedule length versus number of CPUs]

34 Real-life Example • Smart phone • GSM voice codec (encoder + decoder) and MP3 player • 64 tasks, between lines of C code per task • 4 ARM7 processors, interconnected via a bus

35 Real-life Example [Table columns: BSA_1, BSA_2, BSA_3, BSA_] • GSM + MP3 • 64 tasks • 4 ARM7 processors

36 Conclusions • Realistic model for MPSoC • WCET analysis must be integrated into system scheduling • Tool for system-level scheduling and WCET analysis • Tested on real applications

37 ARTIST: LiU, TU Braunschweig, U. of Bologna. Original SymTA/P code. Bus controller implementation.