CSCI1600: Embedded and Real Time Software Lecture 33: Worst Case Execution Time Steven Reiss, Fall 2015.


1 CSCI1600: Embedded and Real Time Software Lecture 33: Worst Case Execution Time Steven Reiss, Fall 2015

2 Worst Case Execution Time  What is it?  Longest time a task can take  Why do we need it?  Scheduling algorithms assume it is known  Can’t say anything about real time without it  What is the goal?  Manually check each task to get its maximum run time  Automatically get the run time of a task using a tool

3 What is the Problem  This should be easy  Knuth volume 1 does this for a variety of algorithms  Just count the number of instructions  What are the problems?  The halting problem  Almost anything you want to know about a real program is undecidable  Need to understand and limit control flows  Need to understand the hardware  Need to understand the execution model

4 Control Flow  To compute WCET, the control flow must be limited  Control flow can be modeled as a graph  Graph of basic blocks  Basic block: code with no branches  Once started, will execute to completion  Suppose we could compute the WCET of each block  How could we compute the run time of the program?

5 Control Flow Graphs  Loops have to be bounded  Bounds can be fixed  Can be based on input  Need to determine the bounds  Nested loops  Fixed, based on input  Based on index of outer loop
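
To make the nested-loop case concrete, here is a minimal sketch of a loop nest whose inner bound is the index of the outer loop. The function names and the triangular loop shape are assumptions chosen for illustration; they are not from the lecture. It compares the exact iteration count with the safe but pessimistic bound obtained by multiplying the two loops' maximum bounds.

```python
def inner_iterations_exact(n: int) -> int:
    """Count inner-body executions when the inner bound is the outer index."""
    count = 0
    for i in range(n):        # outer loop: fixed bound n
        for _ in range(i):    # inner loop: bound depends on the outer index i
            count += 1
    return count


def inner_iterations_naive_bound(n: int) -> int:
    """Pessimistic bound: outer bound times the largest inner bound (n - 1)."""
    return n * max(n - 1, 0)


exact = inner_iterations_exact(10)        # 0 + 1 + ... + 9
naive = inner_iterations_naive_bound(10)  # 10 * 9
```

Under these assumptions the naive product is twice the exact count, which suggests why an analysis that understands the dependence on the outer index can give a tighter bound.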

6 Reducible Control Structures  Can you compute the time for an arbitrary graph?  Can be difficult  But programs don’t produce arbitrary graphs  Clean programs produce reducible graphs  A reducible graph allows you to cluster nodes  WCET of a cluster can be computed  The cluster can be replaced with a single node

7 Reducible  A graph is reducible iff repeated application of the following actions yields a graph with only one node:  Replace a self loop with a single node  Replace a sequence of nodes, such that all the incoming edges go to the first node and all the outgoing edges leave from the last node, with a single node

8 Reducible Example  [Figure: successive reductions of a six-block graph: B1, {B2,B3,B4,B5}, B6, then the single node {B1,B2,B3,B4,B5,B6}]

9 WCET On Reducible Graphs  Assume you have WCET for each block  This should be easy – sequence of instructions  Can compute the WCET for each reduced block  Loops are bounded  Self loop = WCET(block) * loop count  Others can’t have loops  Compute MAX(WCET for each path) from start to finish
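
The computation described on this slide can be sketched as follows, assuming self loops have already been folded into their blocks so that the remaining graph is acyclic. The graph shape, block names, cycle counts, and loop bound are all hypothetical values for illustration, not the lecture's example.

```python
from functools import lru_cache

# Hypothetical per-block WCETs, in cycles.
block_wcet = {"entry": 10, "test": 5, "loop": 20, "alt": 8, "join": 4, "exit": 3}

# Hypothetical iteration bound for each block that had a self loop.
loop_bound = {"loop": 100}

# Successor edges of the acyclic graph left after folding self loops.
succ = {
    "entry": ["test"],
    "test": ["loop", "alt"],
    "loop": ["join"],
    "alt": ["join"],
    "join": ["exit"],
    "exit": [],
}


def node_cost(block: str) -> int:
    """Self loop = WCET(block) * loop count; other blocks run once."""
    return block_wcet[block] * loop_bound.get(block, 1)


@lru_cache(maxsize=None)
def wcet_from(block: str) -> int:
    """Maximum cost over all paths from this block to the end of the graph."""
    tail = max((wcet_from(s) for s in succ[block]), default=0)
    return node_cost(block) + tail


total = wcet_from("entry")
```

With these numbers the worst path goes through the bounded loop, so the loop bound dominates the result.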

10 Basic Block WCET  Each instruction takes k cycles  Count the number of cycles  Multiply by the clock period (equivalently, divide by the clock frequency)  If only it were that simple  Processor timing can depend on many factors  Pipelining, out-of-order execution  Memory behavior needs to be considered  Caching
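
As a sketch of the simple model only (fixed cycles per instruction, no pipeline or cache effects), the time of a block is its cycle count times the clock period, which is the same as dividing by the clock frequency. The cycle counts and the 16 MHz clock below are assumed values for illustration.

```python
def block_time_ns(cycle_counts, clock_hz):
    """Time of a basic block in nanoseconds under a fixed-cycles-per-instruction model."""
    total_cycles = sum(cycle_counts)
    # cycles * (1 / clock_hz) seconds, converted to nanoseconds
    return total_cycles * 1_000_000_000 / clock_hz


# Hypothetical block: five instructions taking 1, 1, 2, 3 and 1 cycles.
cycles = [1, 1, 2, 3, 1]
time_ns = block_time_ns(cycles, clock_hz=16_000_000)  # assumed 16 MHz clock
```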

11 Speculation-Based CPU Anomalies  Instruction A does conditional branch followed by B or C  Speculate B rather than C, but execute C  C is in the cache  If A is in the cache, there is time to prefetch B  B drives C out of the cache => Longer time  If A is not in the cache, then the overall time is faster

12 Scheduling-Based CPU Anomalies  Instructions A-B-C-D-E  B depends on A, D depends on C, E depends on D  A, D, E use resource 1 (CPU unit)  B, C use resource 2  Resource 2 initially in use  A is run first  If A is quick, then B is run followed by C,D,E  This is linear time, with no overlap  If A is slow, then C can start (resource 2 freed)  B and D can then overlap  Result is faster
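
The anomaly on this slide can be reproduced with a small simulation. This is a sketch under stated assumptions: the instruction durations, the time at which resource 2 becomes free, and the greedy policy (each free resource starts the earliest ready instruction in program order) are all hypothetical; the slide gives only the dependencies and resource usage.

```python
ORDER = ["A", "B", "C", "D", "E"]                            # program order
DEPS = {"A": [], "B": ["A"], "C": [], "D": ["C"], "E": ["D"]}
UNIT = {"A": 1, "B": 2, "C": 2, "D": 1, "E": 1}              # resource used


def simulate(durations, r2_busy_until):
    """Return the total completion time under greedy in-order issue per resource."""
    free_at = {1: 0, 2: r2_busy_until}   # time at which each resource is free
    finish = {}                          # started instruction -> finish time
    t = 0
    while len(finish) < len(ORDER):
        for unit in (1, 2):
            if free_at[unit] > t:
                continue                 # resource still busy
            for ins in ORDER:
                if ins in finish or UNIT[ins] != unit:
                    continue
                # Ready only if every dependency has already completed.
                if all(d in finish and finish[d] <= t for d in DEPS[ins]):
                    finish[ins] = t + durations[ins]
                    free_at[unit] = finish[ins]
                    break
        t += 1
    return max(finish.values())


# Assumed durations in cycles; only A differs between the two runs.
QUICK_A = {"A": 2, "B": 3, "C": 2, "D": 3, "E": 2}
SLOW_A = {"A": 3, "B": 3, "C": 2, "D": 3, "E": 2}

total_quick = simulate(QUICK_A, r2_busy_until=2)
total_slow = simulate(SLOW_A, r2_busy_until=2)
```

With these assumed numbers, the quick A lets B claim resource 2 first and the rest runs serially, while the slow A lets C go first so that B and D overlap. The run with the slower A finishes earlier, so the locally worst case for A is not the globally worst case.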

13 Memory Behavior  Caching can change timings considerably  Both instruction and data caching  Why not just assume worst-case time / instruction  What is the cost of an I-cache miss  Can be several orders of magnitude  Can’t afford to do this for each instruction  Need to maintain a complex model of processor and cache state  Assume start state is unknown  Determining worst case input can be difficult  Need to handle preemption  This could change the processor and cache states at any time  But the number of preemptions can be limited
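
A rough sketch of why charging every instruction the worst-case fetch time is too pessimistic. The hit and miss costs, the loop size, and the assumption that the loop body fits in the I-cache and is never evicted (so only the first iteration misses) are all hypothetical.

```python
HIT_CYCLES = 1      # assumed cost of an I-cache hit
MISS_CYCLES = 100   # assumed cost of an I-cache miss


def all_miss_bound(instructions: int, iterations: int) -> int:
    """Bound that charges a miss for every instruction fetch."""
    return instructions * iterations * MISS_CYCLES


def first_iteration_miss_bound(instructions: int, iterations: int) -> int:
    """Bound assuming misses on the first iteration and hits afterwards."""
    first = instructions * MISS_CYCLES
    rest = instructions * (iterations - 1) * HIT_CYCLES
    return first + rest


pessimistic = all_miss_bound(20, 1000)
cache_aware = first_iteration_miss_bound(20, 1000)
```

Under these assumptions the all-miss bound is roughly 90 times larger, which is the motivation for tracking cache state rather than assuming the worst for every fetch.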

14 Approaches to WCET  We need to compute WCET  To handle real time scheduling  To understand real time limits  What can we do with real problems  Measurement-based approaches  Code-analysis based approaches  Hybrid approaches

15 Measurement-Based Approaches  Why not just run the code  On multiple inputs, multiple times  Recording the time it takes  Get a graph of execution times  Best, worst, distribution
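
A minimal sketch of the measurement-based approach: run a task on several inputs, several times each, and record the elapsed times. The task (an insertion sort), the random inputs, and the helper name `measure` are assumptions for illustration. Note that the largest observed time is only a lower bound on the true WCET, since the worst-case input may not be among the samples.

```python
import random
import time


def insertion_sort(values):
    """Example task whose run time depends strongly on its input."""
    for i in range(1, len(values)):
        key = values[i]
        j = i - 1
        while j >= 0 and values[j] > key:
            values[j + 1] = values[j]
            j -= 1
        values[j + 1] = key
    return values


def measure(task, inputs, repeats=3):
    """Return elapsed times in nanoseconds for every (input, repeat) pair."""
    samples = []
    for data in inputs:
        for _ in range(repeats):
            work = list(data)                  # fresh copy for each run
            start = time.perf_counter_ns()
            task(work)
            samples.append(time.perf_counter_ns() - start)
    return samples


rng = random.Random(0)                         # fixed seed for repeatability
inputs = [[rng.randrange(1000) for _ in range(100)] for _ in range(10)]
samples = measure(insertion_sort, inputs)
best, worst = min(samples), max(samples)       # observed, not guaranteed, extremes
```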

16 Execution Time Distribution

17 Practical Measurements  Break the program into subtasks  Input distribution can be better controlled  Get measurements of the time for each subtask  Put these together to get total time  This can be a bit better but still not safe

18 How to Get Measurements  Getting Measurements  Clock time, CPU cycle counters, etc. are available  On real hardware, probes might change processor states  Simulation  Assumes you know everything about the hardware  On real hardware using hardware probes  External triggers on hardware lines  Picking inputs  Randomly (from what space, what distribution)  From sample data (how representative)  Manually (can be difficult)

19 Static WCET Analysis  Compiler technology can be used  Much of the same type of work that compilers do in the optimization process  Compilers need to understand control flow  Compilers want to understand loop bounds  Compilers need to understand processor state  Model the processor when generating instructions  We can use this to compute WCET

20 Static WCET Analysis

21 Static Analysis for WCET  Build the program model  Control flow graph with connected basic blocks  Include information on path dependencies  Might require programmer annotations  Compute the loop bounds  Have the programmer provide them for you  Deduce through symbolic execution and constraints  Hybrid approaches

22 Static Analysis for WCET  Estimate the time for each basic block  Using a model of the CPU/Memory/etc.  Tracking processor/cache states  Known X, Known not X, unknown  Produce a range instead of a single number  Typically take into account I-cache, not D-cache  Can be done using measurement  Put the result back together  Using reducible control structures  Can be formulated as linear programming  Still have to handle calls, …
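
The "Known X, Known not X, unknown" idea can be sketched as a three-way classification of each instruction fetch, producing a time range for the block rather than a single number. The labels, the hit and miss costs, and the function name are assumptions for illustration, and the sketch ignores timing anomalies, where a local miss is not necessarily the global worst case.

```python
HIT_CYCLES = 1      # assumed cost when the line is known to be in the I-cache
MISS_CYCLES = 100   # assumed cost when the line is known not to be in the I-cache


def block_time_range(classifications):
    """Return (best, worst) cycles for a block from per-fetch cache classifications."""
    best = worst = 0
    for label in classifications:
        if label == "hit":            # known to be cached
            best += HIT_CYCLES
            worst += HIT_CYCLES
        elif label == "miss":         # known not to be cached
            best += MISS_CYCLES
            worst += MISS_CYCLES
        elif label == "unknown":      # could be either: widen the range
            best += HIT_CYCLES
            worst += MISS_CYCLES
        else:
            raise ValueError(f"unexpected classification: {label}")
    return best, worst


# Hypothetical block of five instruction fetches.
fetches = ["hit", "hit", "miss", "unknown", "hit"]
best_cycles, worst_cycles = block_time_range(fetches)
```

Each unknown fetch widens the range, so the more cache state the analysis can pin down, the tighter the resulting bound.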

23 Other Techniques for WCET  Partition the task into subtasks and analyze them  Partitioning can be heuristic or programmer-defined  Generally, the smaller the unit, the easier it is to analyze  Hybrid approaches  Use measurements for small units  Do both measurement and static analysis to get a better approximation  Use dynamics to determine possible initial states

24 State-of-the-Art Tools  Tools exist to do this work  Using programmer annotations and assistance  Tools aren’t perfect  Don’t handle preemption and scheduling  Don’t handle data caching  Don’t have the most accurate models of the CPU  Models aren’t necessarily correct  Other tools  Languages, compilers and system design for time-prediction

25 Next Time  Guest Lecture on Security: Vasilis Kemerlis  Project Presentations Start FRIDAY  Mechanics: Order, volunteers, …

