CSE 522 WCET Analysis Computer Science & Engineering Department Arizona State University Tempe, AZ 85287 Dr. Yann-Hang Lee (480) 727-7507.

Presentation transcript:

Some of the slides were based on the lecture by G. Fainekos (ASU).

Execution Time – WCET & BCET
[Figure from R. Wilhelm et al., ACM Trans. Embed. Comput. Syst., 2007]

The WCET Problem
- Given
  - the code for a software task
  - the platform (OS + hardware) that it will run on
- Determine the WCET of the task.
- Why is this problem important?
  - The WCET is central to the design of real-time systems
- Can the WCET always be found?
  - In general, not a decidability problem, but a complexity problem (real-time code is written with bounded loops and recursion, so the obstacle is the number of paths and hardware states, not termination)
  - Compute bounds for the execution times of instructions and basic blocks and determine a longest path in the basic-block graph of the program.

Components of Execution Time Analysis
- Program path (control flow) analysis
  - Want to find the longest path through the program
  - Identify feasible paths through the program
  - Find loop bounds
  - Identify dependencies amongst different code fragments
- Processor behavior analysis
  - For small code fragments (basic blocks), generate bounds on run-times on the platform
  - Model details of the architecture, including cache behavior, pipeline stalls, branch prediction, etc.
- Outputs of both analyses feed into each other

Program Path Analysis: Overall Approach (1)
- Construct the Control-Flow Graph (CFG) for the task
  - Nodes represent basic blocks of the task
    - Basic block: a sequence of consecutive program statements with no branch into or out of the sequence except at its first and last statement
    - We have a single entry and a single exit node
  - Edges represent flow of control (jumps, branches, calls, …)
- The problem is to identify the longest path in the CFG
  - Note: the CFG can have loops, so we need to infer loop bounds and unroll them
  - This gives us a directed acyclic graph (DAG). How do we find the longest path in this DAG?
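One standard answer to the closing question is to visit the nodes in topological order and keep, for each node, the cost of the most expensive path reaching it. The sketch below assumes the loops have already been unrolled; the function name, the four-block graph in the usage note, and all cycle costs are made up for illustration and do not come from the slides.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def longest_path(succ, cost, entry, exit_):
    """Return (total cost, path) of the most expensive entry-to-exit path.

    succ maps each node of a DAG to its successors; cost maps each node
    to its worst-case cost (hypothetical cycle counts in this sketch).
    """
    # TopologicalSorter expects node -> predecessors, so invert succ.
    preds = {n: set() for n in succ}
    for n, outs in succ.items():
        for m in outs:
            preds.setdefault(m, set()).add(n)

    best = {entry: cost[entry]}  # best[n] = cost of the longest path entry..n
    back = {}                    # back[n] = predecessor of n on that path
    for n in TopologicalSorter(preds).static_order():
        if n not in best:
            continue             # not reachable from the entry node
        for m in succ.get(n, ()):
            candidate = best[n] + cost[m]
            if candidate > best.get(m, float("-inf")):
                best[m] = candidate
                back[m] = n

    # Walk the predecessor links back from the exit node.
    path = [exit_]
    while path[-1] != entry:
        path.append(back[path[-1]])
    return best[exit_], path[::-1]
```

For a diamond-shaped graph with `succ = {"B1": ["B2", "B3"], "B2": ["B4"], "B3": ["B4"], "B4": []}` and costs `{"B1": 2, "B2": 5, "B3": 3, "B4": 1}`, the call `longest_path(succ, cost, "B1", "B4")` selects the branch through B2. Each edge is examined once, so the running time is linear in the size of the unrolled graph; the price is that unrolling itself can make the graph very large.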

Program Path Analysis: Overall Approach (2)
- In a CFG
  - $B_i$ = basic block $i$
  - $x_i$ = number of times block $B_i$ is executed
  - $d_j$ = number of times edge $j$ is executed
  - $c_i$ = worst-case running time of block $B_i$
- Objective: find $\max \sum_i c_i x_i$
- How to get $x_i$?
  - Structural constraints
  - Functionality constraints
  - Loop bounds -- need to be known

CFG Example
(Example due to Y.T. Li and S. Malik)

    N = 10;
    q = 0;
    while (q < N)
        q++;
    q = r;

Basic blocks (execution counts $x_1$ to $x_4$, edges $d_1$ to $d_6$; the edges leaving B2 carry the branch labels 1 and 0):
    B1: N = 10; q = 0;
    B2: while (q < N)
    B3: q++;
    B4: q = r;

Want to maximize $\sum_i c_i x_i$ subject to the constraints
    $x_1 = d_1 = d_2$
    $d_1 = 1$
    $x_2 = d_2 + d_4 = d_3 + d_5$
    $x_3 = d_3 = d_4 = 10$
    $x_4 = d_5 = d_6$
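To see what the constraints on this slide determine, they can be checked by exhaustive search over small edge counts. This is only an illustration: a real tool would hand the problem to an ILP solver, and the block costs used in the usage note are invented numbers, not measurements. The function name and the `bound` parameter are likewise assumptions of this sketch.

```python
from itertools import product

def wcet_by_enumeration(c, bound=10):
    """Maximise sum(c[i] * x[i]) under the flow constraints of the example.

    c maps block number (1..4) to a hypothetical worst-case cost.
    Returns (best total, execution counts x) or None if nothing is feasible.
    """
    best = None
    d1 = 1  # d1 = 1: the task is entered exactly once
    for d2, d3, d4, d5, d6 in product(range(bound + 2), repeat=5):
        feasible = (
            d1 == d2                 # x1 = d1 = d2
            and d2 + d4 == d3 + d5   # x2 = d2 + d4 = d3 + d5
            and d3 == d4 == bound    # x3 = d3 = d4 = 10 (loop bound)
            and d5 == d6             # x4 = d5 = d6
        )
        if not feasible:
            continue
        x = {1: d1, 2: d2 + d4, 3: d3, 4: d5}
        total = sum(c[i] * x[i] for i in x)
        if best is None or total > best[0]:
            best = (total, x)
    return best
```

With the made-up costs `c = {1: 2, 2: 3, 3: 4, 4: 1}` the search finds exactly one feasible assignment, `x = {1: 1, 2: 11, 3: 10, 4: 1}`: the loop test runs once more than the loop body, and the bound is 2 + 3·11 + 4·10 + 1 = 76 cycles. For this small example the structural constraints plus the loop bound already pin down every count, so the maximization is trivial; it becomes meaningful once branches inside the loop leave several feasible assignments.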

CFG – Another example

    /* k >= 0 */
    s = k;
    while (k < 10) {
        if (ok)
            j++;
        else {
            j = 0;
            ok = true;
        }
        k++;
    }
    r = j;

Basic blocks (execution counts $x_1$ to $x_7$, edges $d_1$ to $d_{10}$):
    B1: s = k;
    B2: while (k < 10)
    B3: if (ok)
    B4: j++;
    B5: j = 0; ok = true;
    B6: k++;
    B7: r = j;

Functionality Constraints

    check_data()
    {
    x1      int i, morecheck, wrongone;
    x2      morecheck = 1; i = 0; wrongone = -1;
    x3      while (morecheck) {
    x4          if (data[i] < 0) {
    x5              wrongone = i; morecheck = 0;
                }
                else
    x6              if (++i >= 10)
    x7                  morecheck = 0;
            }
    x8      if (wrongone >= 0)
    x9          return 0;
            else
    x10         return 1;
    }

Constraints:
    $x_2 \le x_4$
    $x_4 \le 10\,x_2$
    $(x_5 = 0 \;\&\; x_7 = 1) \mid (x_5 = 1 \;\&\; x_7 = 0)$
    $x_5 = x_9$
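The functionality constraints can be sanity-checked by running an instrumented copy of the routine. The sketch below is a Python transcription of `check_data` with a counter per labelled line; it reads the two loop constraints as x2 ≤ x4 and x4 ≤ 10·x2 (the loop body runs at least once and at most ten times per call), which is an interpretation of the slide. The function names and the choice to pass `data` as an argument are assumptions made for the example.

```python
def check_data_counts(data):
    """Run check_data on a 10-element list; return (result, counts x1..x10)."""
    x = dict.fromkeys(range(1, 11), 0)
    x[1] += 1                                   # declarations
    x[2] += 1                                   # initialisation
    morecheck, i, wrongone = 1, 0, -1
    while True:
        x[3] += 1                               # loop test
        if not morecheck:
            break
        x[4] += 1                               # if (data[i] < 0)
        if data[i] < 0:
            x[5] += 1
            wrongone, morecheck = i, 0
        else:
            x[6] += 1                           # if (++i >= 10)
            i += 1
            if i >= 10:
                x[7] += 1
                morecheck = 0
    x[8] += 1                                   # if (wrongone >= 0)
    if wrongone >= 0:
        x[9] += 1
        result = 0
    else:
        x[10] += 1
        result = 1
    return result, x

def satisfies_constraints(x):
    """Check the four functionality constraints listed on the slide."""
    return (
        x[2] <= x[4] <= 10 * x[2]
        and ((x[5] == 0 and x[7] == 1) or (x[5] == 1 and x[7] == 0))
        and x[5] == x[9]
    )
```

Calling `satisfies_constraints(check_data_counts(d)[1])` for an all-positive list and for lists whose first negative entry sits at each of the ten positions covers every distinct control-flow behavior of the routine. Passing such a check does not prove the constraints, but a failure would show that a constraint cuts off a feasible path and could make the WCET bound unsafe.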

Micro-architectural Modeling -- Cache
- Modify the cost function (cache hit and miss have different costs)
- Add linear constraints to describe the relationship between cache hits and misses
- Basic idea
  - Basic blocks are assumed to be smaller than the entire cache
  - Subdivide instruction counts ($x_i$) into counts of cache hits ($x_i^{hit}$) and misses ($x_i^{miss}$)
  - A line-block (or l-block) is a contiguous sequence of code within the same basic block that is mapped to the same cache line in the instruction cache
  - Either all hit or all miss in an l-block
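The l-block definition can be made concrete with a small splitter for a direct-mapped instruction cache. This is a sketch under stated assumptions: the function names, the block addresses and sizes, the 16-byte line size, and the 4-set cache in the usage note are all invented for illustration. It also assumes that two l-blocks conflict when they fall into the same cache set but come from different memory lines.

```python
def split_into_lblocks(blocks, line_size, num_sets):
    """Split basic blocks into l-blocks for a direct-mapped instruction cache.

    blocks maps a block name to (start_address, size_in_bytes).
    Returns a list of (lblock_name, memory_line, cache_set) tuples.
    """
    lblocks = []
    for name, (start, size) in blocks.items():
        first_line = start // line_size
        last_line = (start + size - 1) // line_size
        # One l-block per memory line that the basic block touches.
        for k, line in enumerate(range(first_line, last_line + 1), start=1):
            lblocks.append((f"{name}.{k}", line, line % num_sets))
    return lblocks

def conflicts(a, b):
    """Two l-blocks conflict if they share a cache set but not a memory line."""
    return a[2] == b[2] and a[1] != b[1]
```

With `blocks = {"B1": (0, 40), "B2": (40, 24), "B3": (64, 32)}`, `line_size=16` and `num_sets=4`, the splitter yields B1.1, B1.2, B1.3, B2.1, B2.2, B3.1 and B3.2. In that layout B1.1 and B3.1 conflict because they compete for set 0, whereas B1.3 and B2.1 share one memory line and therefore never evict each other.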

Basic Blocks to Line Blocks (Direct-mapped cache)
[Figure: basic blocks B1, B2, B3 split into l-blocks B1.1, B1.2, B1.3, B2.1, B2.2, B3.1, B3.2, colored by the cache set they map to]
Cache Constraints:
- No conflicting l-blocks: only the first execution has a miss
- Two nonconflicting l-blocks are mapped to the same cache line
- Conflicting blocks: affected by the sequence
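The slide names the cases without spelling out formulas. One plausible formalization, in the spirit of the Li, Malik, and Wolfe paper cited in the references, is sketched below. The cost symbols $c^{hit}$ and $c^{miss}$ and the generic indices $k.l$ and $m.n$ are notation introduced here, not taken from the slide.

```latex
% Each execution of l-block B_{i.j} is either a hit or a miss
x_i = x_{i.j}^{hit} + x_{i.j}^{miss}

% Modified objective: hits and misses are charged separately
\max \sum_i \sum_j \left( c_{i.j}^{hit}\, x_{i.j}^{hit} + c_{i.j}^{miss}\, x_{i.j}^{miss} \right)

% Cache set with a single l-block B_{k.l}: only the first execution can miss
x_{k.l}^{miss} \le 1

% Two nonconflicting l-blocks B_{k.l}, B_{m.n} sharing one cache line:
% whichever executes first loads the line for both
x_{k.l}^{miss} + x_{m.n}^{miss} \le 1
```

The third case, conflicting l-blocks, cannot be captured by a constant bound because the miss count depends on the order in which the blocks execute.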

Cache Conflict Graph
- For every cache set containing two or more conflicting l-blocks
  - start node, end node, and a node $B_{k.l}$ for every l-block in the cache set
  - Edge from $B_{k.l}$ to $B_{m.n}$: control can pass between them without passing through any other l-blocks of the same cache set.
  - $p_{(i.j,u.v)}$: the number of times that control passes through that edge.
[Figure: cache conflict graph with nodes start, $B_{k.l}$, $B_{m.n}$, end and edge counts $p_{(s,k.l)}$, $p_{(s,m.n)}$, $p_{(s,e)}$, $p_{(k.l,k.l)}$, $p_{(k.l,m.n)}$, $p_{(m.n,k.l)}$, $p_{(m.n,m.n)}$, $p_{(k.l,e)}$, $p_{(m.n,e)}$]
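The edge counts become tangible when they are computed from a concrete execution. The sketch below takes the sequence of l-blocks of one cache set in the order the program touches them, counts the traversals of each conflict-graph edge, and compares them with a simulation of the single cache line of that set. It assumes a direct-mapped cache that does not hold any of these l-blocks on entry; the function names and the trace are invented for the example.

```python
from collections import Counter

def ccg_edge_counts(trace):
    """Count how often each cache-conflict-graph edge is taken.

    trace lists, in execution order, the l-blocks of ONE cache set that
    the program touches; l-blocks of other sets are simply left out.
    """
    path = ["start", *trace, "end"]
    return Counter(zip(path, path[1:]))

def direct_mapped_hits(trace):
    """Simulate the single line of that set (initially empty); count hits."""
    hits = Counter()
    resident = None
    for lblock in trace:
        if lblock == resident:
            hits[lblock] += 1    # same l-block as last time: still cached
        resident = lblock        # otherwise it is (re)loaded, evicting the other
    return hits
```

For the trace `["Bk.l", "Bk.l", "Bm.n", "Bk.l", "Bk.l", "Bk.l"]` the self-edge count `p[("Bk.l", "Bk.l")]` equals the number of hits of `Bk.l` reported by the simulation. Under the assumptions above this holds in general, since an l-block hits exactly when the previous access to its cache set was the same l-block, which is what lets the hit counts be expressed linearly in the edge counts.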

Cache Constraints Example (1)
[Figure: CFG with one l-block per basic block, execution counts $x_1$ to $x_7$, edges $d_1$ to $d_{10}$]
    B1.1: s = k;
    B2.1: while (k < 10)
    B3.1: if (ok)
    B4.1: j++;
    B5.1: j = 0; ok = true;
    B6.1: k++;
    B7.1: r = j;

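Cache Constraints Example (2)
[Figure: two cache conflict graphs]
- Graph for l-blocks B4.1 and B5.1 (nodes S, B4.1, B5.1, E), with edge counts $p_{(s,4.1)}$, $p_{(s,5.1)}$, $p_{(s,e)}$, $p_{(4.1,4.1)}$, $p_{(4.1,5.1)}$, $p_{(5.1,4.1)}$, $p_{(5.1,5.1)}$, $p_{(4.1,e)}$, $p_{(5.1,e)}$
- Graph for l-blocks B1.1 and B6.1 (nodes S, B1.1, B6.1, E), with edge counts $p_{(s,1.1)}$, $p_{(1.1,6.1)}$, $p_{(1.1,e)}$, $p_{(6.1,6.1)}$, $p_{(6.1,e)}$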
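The constraints these two graphs contribute can be sketched by applying flow conservation at every l-block node, so that the inflow equals the outflow and both equal the block's execution count, and by letting exactly one unit of flow leave the start node. The equations below are a reconstruction from the edge labels, not a copy of the slide. They assume a direct-mapped cache that holds none of these l-blocks on entry, and they use the fact that each basic block here has a single l-block, so $x_{i.1} = x_i$.

```latex
\begin{align*}
% Graph for B_{4.1} and B_{5.1}
p_{(s,4.1)} + p_{(s,5.1)} + p_{(s,e)} &= 1 \\
x_4 = p_{(s,4.1)} + p_{(4.1,4.1)} + p_{(5.1,4.1)} &= p_{(4.1,4.1)} + p_{(4.1,5.1)} + p_{(4.1,e)} \\
x_5 = p_{(s,5.1)} + p_{(5.1,5.1)} + p_{(4.1,5.1)} &= p_{(5.1,5.1)} + p_{(5.1,4.1)} + p_{(5.1,e)} \\
% Graph for B_{1.1} and B_{6.1}
p_{(s,1.1)} &= 1 \\
x_1 = p_{(s,1.1)} &= p_{(1.1,6.1)} + p_{(1.1,e)} \\
x_6 = p_{(1.1,6.1)} + p_{(6.1,6.1)} &= p_{(6.1,6.1)} + p_{(6.1,e)} \\
% Hits are the self-loop traversals
x_{4.1}^{hit} = p_{(4.1,4.1)}, \quad x_{5.1}^{hit} = p_{(5.1,5.1)}, \quad
x_{6.1}^{hit} &= p_{(6.1,6.1)}, \quad x_{1.1}^{hit} = 0
\end{align*}
```

B1.1 has no self-loop because B1 executes only once, so its single execution is a miss.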

Progress During the Past 10 Years
[Figure: over-estimation versus cache-miss penalty for Lim et al., Thesing et al., and Souyris et al.; over-estimation values shown: 20-30%, 15%, 30-50%, 10%, 25%]
The explosion of penalties has been compensated by a reduction of uncertainties!

Open Problems
- Architectures are getting much more complex.
  - Can we create processor behavior models without the pain?
  - Can we change the architecture to make timing analysis easier?
- Small changes to code and/or architecture require completely re-doing the WCET computation
  - Use robust techniques that learn about processor/platform behavior
- Need more reliable ways to measure execution time
- References:
  - Li, Malik, and Wolfe, “Cache Modeling for Real-Time Software: Beyond Direct Mapped Instruction Caches”
  - Wilhelm, “Determining bounds on execution times,” Handbook on Embedded Systems, CRC Press,