A Unified WCET Analysis Framework for Multi-core Platforms Sudipta Chattopadhyay, Chong Lee Kee, Abhik Roychoudhury National University of Singapore Timon.

Presentation transcript:

A Unified WCET Analysis Framework for Multi-core Platforms
Sudipta Chattopadhyay, Chong Lee Kee, Abhik Roychoudhury (National University of Singapore)
Timon Kelter, Peter Marwedel (TU Dortmund, Germany)
Heiko Falk (Ulm University, Germany)
RTAS 2012, Beijing

Timing Analysis
▪ Hard real-time systems require absolute timing guarantees
➢ System-level analysis
➢ Single-task analysis
➢ Worst-case execution time (WCET) analysis
▪ An upper bound on execution time for all possible inputs
▪ A sound over-approximation is obtained by static analysis

WCET Analysis
[Diagram labels: program; control flow graph; micro-architectural modeling; WCET of basic blocks; constraints (infeasible path constraints, loop bound); path analysis (IPET)]
IPET = Implicit Path Enumeration Technique
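
For readers unfamiliar with IPET, its usual textbook form can be written as the integer linear program below. The symbols are notation introduced here and do not appear on the slides: x_B is the execution count of basic block B, c_B its WCET from micro-architectural modeling, d_e the traversal count of CFG edge e, and n a loop bound. Virtual entry and exit edges are assumed so that flow conservation also holds for the first and last block.

```latex
\begin{align*}
\text{maximize}\quad & \sum_{B} c_B \, x_B \\
\text{subject to}\quad
  & x_B = \sum_{e \in \mathrm{in}(B)} d_e = \sum_{e \in \mathrm{out}(B)} d_e
      && \text{for every basic block } B \\
  & d_{\mathrm{entry}} = 1 \\
  & x_{\mathrm{header}(L)} \le n_L \sum_{e \in \mathrm{entry}(L)} d_e
      && \text{for every loop } L \text{ with bound } n_L \\
  & x_B,\ d_e \in \mathbb{Z}_{\ge 0}
\end{align*}
```

Infeasible path constraints would appear as additional linear inequalities over the same variables.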

Architecture
[Diagram labels: Core 1 to Core n; L1 cache; shared L2 cache; shared bus; memory]

Micro-architectural Modeling
Single core: pipeline, cache, branch predictor, and their interactions
Multi-core: shared cache, shared bus
Cited work: Rosen et al., RTSS'07; Li et al., RTSS'09; Chattopadhyay et al., SCOPES'10; Kelter et al., ECRTS'11
Unified multi-core timing analysis

Timing Anomaly (Shared Cache)
[Figure: alternative execution paths labeled with cache hit and miss outcomes; one path is annotated "May not be the worst-case path"]

Timing Anomaly (Shared Bus)
[Figure: alternative execution paths labeled with minimum and maximum bus delay; one path is annotated "May not be the worst-case path"]

Background
▪ Representing each pipeline stage as a timing interval
Pipeline stages: IF, ID, EX, WB, CM
[Figure: execution graph for the instructions R1 := R2 + 5; R5 := R1 * R7; R3 := R5 * 5, with edges labeled "structural dependency" and "contention"]
A fixed-point analysis derives the timing of each stage as an interval, for example start [3,7], latency [1,3], finish [4,10]
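
The interval arithmetic behind the example on this slide (start [3,7] plus latency [1,3] gives finish [4,10]) can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes the stage nodes are given in topological order and ignores contention, so a single pass suffices where the slides use a fixed-point iteration. All function and node names are hypothetical.

```python
def interval_add(a, b):
    """Add two [lo, hi] intervals component-wise."""
    return (a[0] + b[0], a[1] + b[1])


def interval_max(a, b):
    """A stage can start only after both predecessors finish."""
    return (max(a[0], b[0]), max(a[1], b[1]))


def propagate(nodes, latency, preds):
    """Compute start/finish intervals for pipeline-stage nodes.

    nodes   -- stage names in topological order (assumption)
    latency -- dict: node -> (min_latency, max_latency)
    preds   -- dict: node -> list of nodes it depends on
    Contention is NOT modeled here.
    """
    start, finish = {}, {}
    for node in nodes:
        ready = (0, 0)
        for p in preds.get(node, ()):
            ready = interval_max(ready, finish[p])
        start[node] = ready
        finish[node] = interval_add(ready, latency[node])
    return start, finish


if __name__ == "__main__":
    # The slide's example: start [3,7] + latency [1,3] = finish [4,10]
    print(interval_add((3, 7), (1, 3)))
```

Modeling contention would add edges whose effect depends on the intervals themselves, which is why the analysis on the slide iterates to a fixed point.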

Shared Cache + Pipeline
Abstract interpretation classifies each access at L1 and L2 as hit, miss or unclear
Timing interval update:
L1 hit: T := T + [1, 1]
L1 miss, L2 hit: T := T + [miss1 + 1, miss1 + 1]
L1 miss, L2 unclear: T := T + [miss1 + 1, miss1 + miss2 + 1]
L1 unclear: T := T + [1, miss1 + miss2 + 1]
hit latency = 1 cycle; miss1 = L1 cache miss penalty; miss2 = L2 cache miss penalty (shared)
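
The four interval updates on this slide follow one rule: take the minimum and maximum latency over every outcome the classification still allows. A minimal sketch of that rule is given below. The function names and the penalty values in the usage line are hypothetical, and the "L1 miss, L2 miss" case is derived from the same rule rather than taken from the slide.

```python
HIT, MISS, UNCLEAR = "hit", "miss", "unclear"


def possible_hits(classification):
    """Outcomes allowed by a classification (True means hit)."""
    if classification == UNCLEAR:
        return (True, False)
    return (classification == HIT,)


def access_latency(l1, l2, miss1, miss2, hit_latency=1):
    """Latency interval [lo, hi] of one memory access.

    l1, l2 -- HIT, MISS or UNCLEAR from the cache analysis
    miss1  -- L1 miss penalty; miss2 -- L2 (shared) miss penalty
    """
    latencies = []
    for l1_hit in possible_hits(l1):
        if l1_hit:
            latencies.append(hit_latency)
            continue
        # L2 is consulted only when L1 misses.
        for l2_hit in possible_hits(l2):
            extra = 0 if l2_hit else miss2
            latencies.append(hit_latency + miss1 + extra)
    return (min(latencies), max(latencies))


if __name__ == "__main__":
    # Hypothetical penalties: miss1 = 6 cycles, miss2 = 30 cycles
    print(access_latency(MISS, UNCLEAR, miss1=6, miss2=30))
```

The stage's timing interval T is then advanced by adding the returned interval component-wise.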

Shared Bus Analysis
▪ Time Division Multiple Access (TDMA)
▪ Offset abstraction
[Figure: TDMA rounds with alternating Core 0 and Core 1 slots. A request from core 1 at time T has an offset within the round and incurs a delay; a request from core 0 at time T' has delay = 0]
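
How an offset within the TDMA round translates into a bus delay can be sketched as below. The schedule model is an assumption made for illustration: all slots have equal length, core i owns the i-th slot of every round, and an access must complete inside the owner's slot. The function and parameter names are hypothetical.

```python
def tdma_delay(time, core, n_cores, slot_len, access_len):
    """Cycles a bus request issued at `time` by `core` has to wait.

    Assumed schedule: round = n_cores slots of slot_len cycles,
    core i owns cycles [i * slot_len, (i + 1) * slot_len) of the round.
    """
    if access_len > slot_len:
        raise ValueError("access does not fit into a slot")
    round_len = n_cores * slot_len
    offset = time % round_len          # position inside the round
    slot_start = core * slot_len
    slot_end = slot_start + slot_len
    if slot_start <= offset and offset + access_len <= slot_end:
        return 0                       # served immediately
    # Otherwise wait for the next start of the core's own slot.
    return (slot_start - offset) % round_len


if __name__ == "__main__":
    # Two cores, 10-cycle slots, 4-cycle accesses (hypothetical values)
    print(tdma_delay(3, core=0, n_cores=2, slot_len=10, access_len=4))
    print(tdma_delay(3, core=1, n_cores=2, slot_len=10, access_len=4))
```

Because the delay depends only on `time % round_len`, the analysis can track offsets instead of absolute times, which is the point of the offset abstraction.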

Shared bus + pipeline
[Figure: pipeline stages IF1, ID1, IF2, ID2, IF3, ID3; O1 and O2 are candidate bus offsets and O_in is the incoming offset]
ID1 → IF2 (IF2 finishes after ID1): O_in = O1
IF2 → ID1 (ID1 finishes after IF2): O_in = O2
Order of IF2 and ID1 unknown (approximate timing by static analysis): O_in = O1 ∪ O2
Property: Offset content monotonically decreases over different iterations
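
When static analysis cannot order two stages, the slide merges the candidate offsets with a set union. A minimal sketch of that merge, and of the resulting delay interval, is shown below. It assumes offsets are represented as sets of integers and that some function mapping a single offset to a delay is available; `delay_of` and the other names are hypothetical.

```python
def join_offsets(*offset_sets):
    """O_in = O1 ∪ O2 ∪ ...: keep every offset that may occur."""
    merged = set()
    for offsets in offset_sets:
        merged |= set(offsets)
    return frozenset(merged)


def delay_interval(offsets, delay_of):
    """[min, max] bus delay over all offsets in the set."""
    delays = [delay_of(o) for o in offsets]
    return (min(delays), max(delays))


if __name__ == "__main__":
    o_in = join_offsets({0, 3}, {3, 7})
    # Hypothetical delay model: wait until the next multiple of 10.
    print(sorted(o_in), delay_interval(o_in, lambda o: (10 - o) % 10))
```

A larger offset set gives a wider delay interval, so the union is sound but may lose precision.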

Loop Construct
[Figure: loop iterations unrolled into bus contexts C1, C2, C3, ..., C100]
Unrolling loop iterations is EXPENSIVE
Ci = bus context of the loop body at the i-th iteration

Loop Construct
[Figure: bus context flow graph with nodes C1, C2, C3, C4, C5]
C5 ≡ C3
Property: If Ci ≡ Cj, then Ci+k ≡ Cj+k for any k > 0
How do we define bus context?
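
The property on this slide means the sequence of bus contexts becomes periodic as soon as one context repeats, so the flow graph can be built without unrolling the loop. A minimal sketch follows, assuming contexts are hashable values and that a function `step` (hypothetical) computes the context of the next iteration from the current one.

```python
def build_context_graph(initial, step, max_contexts=1000):
    """Iterate contexts until one repeats.

    Returns (contexts, back_to): the distinct contexts in order of first
    occurrence and the index the last context loops back to.
    """
    contexts = [initial]
    index = {initial: 0}
    while len(contexts) <= max_contexts:
        nxt = step(contexts[-1])
        if nxt in index:
            # Repetition found: all later iterations revisit known contexts.
            return contexts, index[nxt]
        index[nxt] = len(contexts)
        contexts.append(nxt)
    raise RuntimeError("no repeating bus context found")


if __name__ == "__main__":
    # Mirrors the slide: C1 -> C2 -> C3 -> C4, and C5 is equivalent to C3.
    successor = {"C1": "C2", "C2": "C3", "C3": "C4", "C4": "C3"}
    print(build_context_graph("C1", successor.get))
```

In the usage example the graph has four nodes and a back edge from C4 to C3, instead of one node per loop iteration.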

Loop Construct
How do we define bus context?
Bus offsets of all pipeline stages of all instructions? There could be thousands of nodes
[Figure: bus context flow graph with nodes C1, C2, C3, C4]

Loop Construct
How do we define bus context?
[Figure: pipeline stages IF, ID, EX, WB, CM of the previous iteration and the current iteration, connected by cross-iteration edges]
Property: If the bus offsets of the cross-iteration edges do not change, the WCET of the loop iteration cannot change

Loop Construct
[Figure: bus context flow graph with nodes C1, C2, C3, C4]
Compute WCET for each bus context
Generate ILP flow constraints:
E(C1) + E(C2) + E(C3) + E(C4) ≤ loop bound
E(C1) ≥ E(C2)
where E(C1) = number of times context C1 is executed
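
To see what the flow constraints on this slide bound, the sketch below walks a bus context flow graph for one loop entry and counts how often each context executes. This is a direct walk, not the ILP used in the slides, where E(Ci) are variables solved together with the rest of the path analysis. The graph shape (a chain whose last node loops back to index `back_to`), the per-context WCET values and all names are hypothetical.

```python
def context_counts(n_contexts, back_to, loop_bound):
    """Execution count E(Ci) of each context for one loop entry."""
    counts = [0] * n_contexts
    ctx = 0
    for _ in range(loop_bound):
        counts[ctx] += 1
        # Follow the chain; the last context loops back.
        ctx = ctx + 1 if ctx + 1 < n_contexts else back_to
    return counts


def loop_wcet(context_wcets, back_to, loop_bound):
    """Sum of per-context WCET weighted by execution counts."""
    counts = context_counts(len(context_wcets), back_to, loop_bound)
    return sum(c * w for c, w in zip(counts, context_wcets))


if __name__ == "__main__":
    # C1..C4 with C4 looping back to C3, loop bound 100
    wcets = [50, 46, 44, 44]           # hypothetical cycles per context
    print(context_counts(4, 2, 100), loop_wcet(wcets, 2, 100))
```

The counts produced this way satisfy both constraints from the slide: they sum to at most the loop bound, and E(C1) ≥ E(C2).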

Branch prediction + Cache
[Figure: memory blocks m and m' conflict in the cache. Branch correctly predicted: the access to m is a cache hit. Branch incorrectly predicted: m is evicted from the cache and the access to m is a cache miss]

Branch prediction + Cache
[Figure labels: branch location; maximum number of speculated instructions; memory blocks m and m'; JOIN of cache contents; unclear cache access]
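
One way to picture the JOIN on this slide is with a standard LRU must-cache analysis for a single cache set: the state at the branch is joined with the states reached after each possible number of wrong-path accesses, up to the maximum number of speculated instructions. The sketch below is an illustration under those assumptions and is not taken from the authors' tool; all names are hypothetical.

```python
def must_update(state, block, assoc):
    """LRU must-cache update for one set; state maps block -> max age."""
    old_age = state.get(block, assoc)
    new_state = {block: 0}
    for other, age in state.items():
        if other == block:
            continue
        new_age = age + 1 if age < old_age else age
        if new_age < assoc:            # still guaranteed to be cached
            new_state[other] = new_age
    return new_state


def must_join(a, b):
    """Keep blocks cached in both states, with their larger age."""
    return {blk: max(age, b[blk]) for blk, age in a.items() if blk in b}


def speculative_state(state, wrong_path, max_spec, assoc):
    """Join the state with every prefix of the wrong-path accesses."""
    joined = dict(state)
    current = dict(state)
    for block in wrong_path[:max_spec]:
        current = must_update(current, block, assoc)
        joined = must_join(joined, current)
    return joined


def classify(state, block):
    return "hit" if block in state else "unclear"


if __name__ == "__main__":
    # Direct-mapped set holding m; the wrong path fetches conflicting m'.
    after = speculative_state({"m": 0}, ["m_prime"], max_spec=4, assoc=1)
    print(classify({"m": 0}, "m"), classify(after, "m"))
```

In the usage example the access to m is a guaranteed hit without speculation and becomes unclear once the mispredicted path is taken into account, matching the scenario drawn on the slides.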

Overall Picture
Micro-architectural modeling: pipeline, cache, branch predictor; shared cache and shared bus (multi-core)
Output of modeling: WCET of basic blocks
Constraints: infeasible path constraints, loop bound, bus context constraints
Path analysis: IPET

Experimental Setup (Chronos Toolkit)
[Diagram labels: C source; GCC (SimpleScalar); binary code; CFG; micro-architectural modeling (private cache, pipeline, branch prediction, shared cache, shared bus); micro-architectural constraints; flow constraints; ILP; WCET]

Cache Sharing vs Cache Partitioning
[Figure: a cache shared between 2 cores, with dimensions labeled 8 and 4; a vertical partition between Core 1 and Core 2, labeled 8; a horizontal partition between Core 1 and Core 2, labeled 4]

Evaluation (Cache + pipeline)
[Charts: results for the benchmarks jfdctint and statemate; annotation: imprecision of shared cache analysis]

Evaluation (Cache + pipeline + Speculation)
[Charts; annotation: imprecision of modeling speculation]

Evaluation (Bus + pipeline)
[Charts; annotations: imprecision of shared bus analysis; imprecision of path analysis]

Evaluation (Bus + pipeline + Speculation)
[Charts; annotations: imprecision of shared bus analysis; imprecision of path analysis]

Conclusion
▪ A unified WCET analysis framework
▪ Handles the interaction of shared cache and bus with pipeline and branch prediction
▪ Timing anomalies are possible; state explosion is handled by the timing interval abstraction
▪ Detailed information about the tool and extensive results are available at:

Questions
Thank You