Safely Exploiting Multithreaded Processors to Tolerate Memory Latency

Slides:



Advertisements
Similar presentations
Feedback EDF Scheduling Exploiting Dynamic Voltage Scaling Yifan Zhu and Frank Mueller Department of Computer Science Center for Embedded Systems Research.
Advertisements

EE5900 Advanced Embedded System For Smart Infrastructure
November 23, 2005 Egor Bondarev, Michel Chaudron, Peter de With Scenario-based PA Method for Dynamic Component-Based Systems Egor Bondarev, Michel Chaudron,
CPE555A: Real-Time Embedded Systems
Courseware Scheduling of Distributed Real-Time Systems Jan Madsen Informatics and Mathematical Modelling Technical University of Denmark Richard Petersens.
Harini Ramaprasad, Frank Mueller North Carolina State University Center for Embedded Systems Research Tightening the Bounds on Feasible Preemption Points.
THE UNIVERSITY of TEHRAN Mitra Nasri Sanjoy Baruah Gerhard Fohler Mehdi Kargahi October 2014.
Introduction and Background  Power: A Critical Dimension for Embedded Systems  Dynamic power dominates; static /leakage power increases faster  Common.
Real-Time Scheduling CIS700 Insup Lee October 3, 2005 CIS 700.
Microarchitectural Approaches to Exceeding the Complexity Barrier © Eric Rotenberg 1 Microarchitectural Approaches to Exceeding the Complexity Barrier.
Sporadic Server Scheduling in Linux Theory vs. Practice Mark Stanovich Theodore Baker Andy Wang.
Latency Tolerance: what to do when it just won’t go away CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley.
CS 3013 & CS 502 Summer 2006 Scheduling1 The art and science of allocating the CPU and other resources to processes.
Multithreaded ASC Kevin Schaffer and Robert A. Walker ASC Processor Group Computer Science Department Kent State University.
1 Center for Embedded Systems Research (CESR) Department of Computer Science North Carolina State University Frank Mueller Timing Analysis: In Search of.
1 of 14 1 Fault-Tolerant Embedded Systems: Scheduling and Optimization Viacheslav Izosimov, Petru Eles, Zebo Peng Embedded Systems Lab (ESLAB) Linköping.
NC STATE UNIVERSITY Anantaraman © 2004RTSS–25 Enforcing Safety of Real-Time Schedules on Contemporary Processors using a Virtual Simple Architecture (VISA)
1 of 14 1 Analysis and Synthesis of Communication-Intensive Heterogeneous Real-Time Systems Paul Pop Computer and Information Science Dept. Linköpings.
Courseware Basics of Real-Time Scheduling Jan Madsen Informatics and Mathematical Modelling Technical University of Denmark Richard Petersens Plads, Building.
Embedded Systems Exercise 3: Scheduling Real-Time Periodic and Mixed Task Sets 18. May 2005 Alexander Maxiaguine.
Wk 2 – Scheduling 1 CS502 Spring 2006 Scheduling The art and science of allocating the CPU and other resources to processes.
By Group: Ghassan Abdo Rayyashi Anas to’meh Supervised by Dr. Lo’ai Tawalbeh.
Spring 2002Real-Time Systems (Shin) Rate Monotonic Analysis Assumptions – A1. No nonpreemptible parts in a task, and negligible preemption cost –
Misconceptions About Real-time Computing : A Serious Problem for Next-generation Systems J. A. Stankovic, Misconceptions about Real-Time Computing: A Serious.
DBMSs On A Modern Processor: Where Does Time Go? by A. Ailamaki, D.J. DeWitt, M.D. Hill, and D. Wood University of Wisconsin-Madison Computer Science Dept.
1 Previous lecture review n Out of basic scheduling techniques none is a clear winner: u FCFS - simple but unfair u RR - more overhead than FCFS may not.
Probabilistic Preemption Control using Frequency Scaling for Sporadic Real-time Tasks Abhilash Thekkilakattil, Radu Dobrin and Sasikumar Punnekkat.
Chapter 2 Parallel Architecture. Moore’s Law The number of transistors on a chip doubles every years. – Has been valid for over 40 years – Can’t.
Scheduling policies for real- time embedded systems.
DESIGNING VM SCHEDULERS FOR EMBEDDED REAL-TIME APPLICATIONS Alejandro Masrur, Thomas Pfeuffer, Martin Geier, Sebastian Drössler and Samarjit Chakraborty.
A Mixed Time-Criticality SDRAM Controller MeAOW Sven Goossens, Benny Akesson, Kees Goossens COBRA – CA104 NEST.
Real-Time Systems Mark Stanovich. Introduction System with timing constraints (e.g., deadlines) What makes a real-time system different? – Meeting timing.
BFair: An Optimal Scheduler for Periodic Real-Time Tasks
Real-Time Scheduling CS 3204 – Operating Systems Lecture 20 3/3/2006 Shahrooz Feizabadi.
The Global Limited Preemptive Earliest Deadline First Feasibility of Sporadic Real-time Tasks Abhilash Thekkilakattil, Sanjoy Baruah, Radu Dobrin and Sasikumar.
Zheng Wu. Background Motivation Analysis Framework Intra-Core Cache Analysis Cache Conflict Analysis Optimization Techniques WCRT Analysis Experiment.
NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Electrical & Computer Engineering North Carolina State University Ali El-Haj-Mahmoud and.
Hard Real-Time Scheduling for Low- Energy Using Stochastic Data and DVS Processors Flavius Gruian Department of Computer Science, Lund University Box 118.
NC STATE UNIVERSITY 1 Feedback EDF Scheduling w/ Async. DVS Switching on the IBM Embedded PowerPC 405 LP Frank Mueller North Carolina State University,
Undergraduate course on Real-time Systems Linköping 1 of 45 Autumn 2009 TDDC47: Real-time and Concurrent Programming Lecture 5: Real-time Scheduling (I)
CprE 458/558: Real-Time Systems (G. Manimaran)1 CprE 458/558: Real-Time Systems RMS and EDF Schedulers.
F A S T Frequency-Aware Static Timing Analysis
NC STATE UNIVERSITY Center for Embedded Systems Research (CESR) Department of Electrical & Computer Eng’g North Carolina State University Ali El-Haj-Mahmoud,
CSE 522 Real-Time Scheduling (2)
Real-Time Scheduling CS 3204 – Operating Systems Lecture 13 10/3/2006 Shahrooz Feizabadi.
CSCI1600: Embedded and Real Time Software Lecture 23: Real Time Scheduling I Steven Reiss, Fall 2015.
Real-Time Systems, Events, Triggers. Real-Time Systems A system that has operational deadlines from event to system response A system whose correctness.
Multimedia Computing and Networking Jan Reduced Energy Decoding of MPEG Streams Malena Mesarina, HP Labs/UCLA CS Dept Yoshio Turner, HP Labs.
Harini Ramaprasad, Frank Mueller North Carolina State University Center for Embedded Systems Research Bounding Preemption Delay within Data Cache Reference.
An Overview of Distributed Real- Time Systems Research By Brian Demers March 24, 2003 CS 535, Spring 2003.
1 Adapted from UC Berkeley CS252 S01 Lecture 17: Reducing Cache Miss Penalty and Reducing Cache Hit Time Hardware prefetching and stream buffer, software.
Determining Optimal Processor Speeds for Periodic Real-Time Tasks with Different Power Characteristics H. Aydın, R. Melhem, D. Mossé, P.M. Alvarez University.
Introduction Goal: connecting multiple computers to get higher performance – Multiprocessors – Scalability, availability, power efficiency Job-level (process-level)
ECE 692 Power-Aware Computer Systems Final Review Prof. Xiaorui Wang.
1 Software Thread Integration for Concurrency and ILP in Embedded Systems Alex Dean Center for Embedded Systems Research Department.
Lecture 6: Real-Time Scheduling
Real-Time Operating Systems RTOS For Embedded systems.
Networked Embedded Control System - Integration of control and computing Moonju Park Dept. of Computer Science & Engineering University of Incheon 1.
Chien-Chung Shen CIS, UD
Wayne Wolf Dept. of EE Princeton University
EEE 6494 Embedded Systems Design
Lecture 24: Process Scheduling Examples and for Real-time Systems
Hyperthreading Technology
Sanjoy Baruah The University of North Carolina at Chapel Hill
Latency Tolerance: what to do when it just won’t go away
Processes and operating systems
Linköping University, IDA, ESLAB
Aravindh Anantaraman*, Kiran Seth†, Eric Rotenberg*, Frank Mueller‡
FAST: Frequency-Aware Static Timing Analysis
Research Topics Embedded, Real-time, Sensor Systems Frank Mueller moss
Presentation transcript:

Safely Exploiting Multithreaded Processors to Tolerate Memory Latency in Real-Time Systems Title Slide Ali El-Haj-Mahmoud and Eric Rotenberg Center for Embedded Systems Research (CESR) Electrical & Computer Engineering North Carolina State University

Embedded Processor Trends More demanding applications and user expectations Higher frequency ARM-11 (0.13µ): 500 MHz ARM-11 (0.10µ): 1 GHz Processor-memory speed gap El-Haj-Mahmoud © 2004 CASES 2004

How to capitalize on higher frequency? Memory Wall How to capitalize on higher frequency? El-Haj-Mahmoud © 2004 CASES 2004

Multithreading Switch-on-event coarse-grain multithreading Multiple register contexts for fast switching Switch to alternate task when current task accesses memory Overlap memory accesses with computation But what about multithreading in hard-real-time? Require analyzability Cannot rely on dynamic schemes El-Haj-Mahmoud © 2004 CASES 2004

Exploiting Multithreading in Hard-Real-Time Systems Safety Guarantee all tasks meet deadlines (worst-case) Statically bound overlap under all scenarios Tractability Confirm/disconfirm schedulability mathematically using closed-form tests Consider each task individually El-Haj-Mahmoud © 2004 CASES 2004

Classic Real-Time Scheduling Example: Earliest Deadline First (EDF) periodA time Task A A1 A2 release A1 deadline A1 release A2 periodB Task B B1 B2 B3 B4 release B1 deadline B1 release B2 EDF schedulability test EDF B1 A1 B2 B3 A2 B4 Utilization-based schedulability test No need to construct the schedule a priori El-Haj-Mahmoud © 2004 CASES 2004

Performance vs. Tractability Performance: Memory overlap Tractability: Closed-form schedulability test Only basic parameters of tasks are known WCET = C + M (from conventional WCET analysis) Period = deadline A B M El-Haj-Mahmoud © 2004 CASES 2004

EDF (Classic Real-Time): Low Performance but Tractable DEADLINE MISSED Memory overlap: No Closed-form test: Yes El-Haj-Mahmoud © 2004 CASES 2004

Dynamic Switch on Memory Access: Possibly High Performance but Intractable DEADLINE MISSED Memory overlap: Possible (unfair for low priority) Closed-form test: No Must examine memory positioning! El-Haj-Mahmoud © 2004 CASES 2004

Deterministic Switching: High Performance and Tractable Memory overlap: Yes (fair and bounded) Closed-form test: Yes El-Haj-Mahmoud © 2004 CASES 2004

Tractability through Deterministic Switching Fully decouple independent tasks by forcing periodic switches Every task gets a chance to initiate/overlap memory accesses No scheduling dependences No specificity regarding memory positioning El-Haj-Mahmoud © 2004 CASES 2004

Weighted-Round-Robin (WRR) Round i Round i+1 Round i+2 duty cycle Pipeline T1 T2 T3 T4 T1 T2 T3 T4 T1 T2 T3 T4 time round = memory latency Aggregate computation is the same regardless of context switching Virt. Proc. 1 T1 T1 T1 Virt. Proc. 2 T2 T2 T2 Virt. Proc. 3 T3 T3 T3 Virt. Proc. 4 T4 T4 T4 T4 memory transfer operation forced pre-emption dilate WCET El-Haj-Mahmoud © 2004 CASES 2004

Analytical Framework for WRR d: duty cycle 0 < d  1 P: period = deadline WCET: Worst-Case Execution Time WCET = C + M C: aggregate computation time M: aggregate memory time WCET’: dilated Worst-Case Execution Time WCET’ = (C/d) + M El-Haj-Mahmoud © 2004 CASES 2004

Schedulability Test 1. Dilated task meets deadline on its virtual processor 2. Sum of all duty cycles less than or equal to 1 El-Haj-Mahmoud © 2004 CASES 2004

Generalized Analytical Framework Multiple tasks per VP: El-Haj-Mahmoud © 2004 CASES 2004

Modeling Bus and Memory System Addressed in detail in paper Analysis accounts for: Worst-case task serialization on memory bus DRAM bank conflicts Multiple VPs sharing single DRAM bank El-Haj-Mahmoud © 2004 CASES 2004

Results mem component 28% 50% El-Haj-Mahmoud © 2004 CASES 2004

Novel Memory overlap Formalism (safety & tractability) Classic real-time No Yes Classic multithreading Our real-time multithreading framework El-Haj-Mahmoud © 2004 CASES 2004

Useful Fully capitalize on high-frequency embedded microprocessors Exceed schedulability limit of conventional real-time theory for uniprocessors by analytically bounding WCET overlap El-Haj-Mahmoud © 2004 CASES 2004

Deployable Software-only solution Use Ubicom IP3023 8-thread embedded microprocessor Analytical framework + scheduling policy El-Haj-Mahmoud © 2004 CASES 2004

Summary Safely expose multithreading to hard-real-time schedulability analysis Bound computation / memory overlap Offline closed-form schedulability test Safe Tractable Scale “Memory Wall” in embedded systems Expose full benefits of high-frequency embedded processors El-Haj-Mahmoud © 2004 CASES 2004

Questions? El-Haj-Mahmoud © 2004 CASES 2004