Performance Optimization for Embedded Software


1 Performance Optimization for Embedded Software
Presented by: Yingjun Lyu

2 What is Software Optimization?
The process of modifying a software system so that it works more efficiently or uses fewer resources.

3 Do You Optimize Your Program?

4 When to Optimize?
A better approach: design first, code from the design, then profile the code. Keep performance goals in mind.
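A minimal sketch of the "profile the code" step using the standard C `clock()` function. The names `workload` and `time_us` are hypothetical, not from the slides, and on a bare-metal target a hardware cycle counter or a vendor profiler would more likely replace `clock()`.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical workload standing in for the code being profiled. */
static uint32_t workload(uint32_t n)
{
    uint32_t acc = 0;
    for (uint32_t i = 0; i < n; i++)
        acc += i * i;              /* unsigned arithmetic wraps, no UB */
    return acc;
}

/* Average CPU time of fn(arg) in microseconds over `reps` calls (reps > 0).
 * clock() is coarse, so short functions need many repetitions. The volatile
 * sink keeps the compiler from discarding calls whose result is unused. */
static double time_us(uint32_t (*fn)(uint32_t), uint32_t arg,
                      unsigned reps, volatile uint32_t *sink)
{
    clock_t start = clock();
    for (unsigned r = 0; r < reps; r++)
        *sink = fn(arg);
    clock_t end = clock();
    return (double)(end - start) * 1e6 / (double)CLOCKS_PER_SEC / (double)reps;
}
```

Timing before and after each change, rather than guessing, is what keeps the optimization effort focused on code that the profile shows is actually hot.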

5 Levels of Optimization
- Design level
- Algorithms and data structures
- Source code level (e.g., while(1) vs. for(;;))
- Build level
- Compile level
- Assembly level
- Run time

6 The Code Optimization Process
Build -> Optimize -> Check outputs
Build -> Generate tests -> Optimize -> Check outputs
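The second flow, which generates tests before optimizing, can be sketched as a reference-versus-optimized comparison. The functions `sum_ref`, `sum_opt`, and `check_outputs` are hypothetical, and the two-accumulator rewrite only stands in for whatever optimization is being checked.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference implementation: simple and trusted. */
static int32_t sum_ref(const int16_t *x, size_t n)
{
    int32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Optimized candidate (illustrative): two independent accumulators. */
static int32_t sum_opt(const int16_t *x, size_t n)
{
    int32_t s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        s0 += x[i];
        s1 += x[i + 1];
    }
    if (i < n)                  /* odd element left over */
        s0 += x[i];
    return s0 + s1;
}

/* Generate tests, then check outputs: fill inputs of every length 0..max_n
 * with a deterministic pseudo-random sequence (a linear congruential
 * generator) and require the optimized result to match the reference.
 * Returns 1 if all cases pass, 0 on the first mismatch. */
static int check_outputs(size_t max_n)
{
    int16_t buf[256];
    uint32_t seed = 12345u;
    if (max_n > 256)
        max_n = 256;
    for (size_t n = 0; n <= max_n; n++) {
        for (size_t i = 0; i < n; i++) {
            seed = seed * 1664525u + 1013904223u;
            buf[i] = (int16_t)((int32_t)(seed >> 16) - 32768);
        }
        if (sum_opt(buf, n) != sum_ref(buf, n))
            return 0;
    }
    return 1;
}
```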

7 Basic C Optimization Techniques
Choose the right data type. Example: a processor does not support 32-bit multiplication, so using a 32-bit type in a multiply produces a sequence of 16-bit operations. What if only 16-bit precision is needed? Solution: use intrinsics to leverage embedded processor features.
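A portable C sketch of the data-type point, assuming a core with a native 16x16 multiplier. Vendor intrinsics differ from processor to processor, so none is shown here, and `mac16` is a hypothetical name.

```c
#include <stdint.h>

/* 16 x 16 -> 32-bit multiply-accumulate. The int16_t operands tell the
 * compiler that only 16-bit precision is needed, so a core with a native
 * 16x16 multiplier can use one multiply per element instead of a
 * multi-instruction 32-bit sequence. The cast makes each product a 32-bit
 * value (it always fits); n is assumed small enough that acc cannot overflow. */
static int32_t mac16(const int16_t *a, const int16_t *b, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i];
    return acc;
}
```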

8 An intrinsic function is a function available for use in a given programming language whose implementation is handled specially by the compiler.
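As one concrete compiler built-in, GCC and Clang provide `__builtin_clz`, which they typically lower to a single count-leading-zeros instruction when the target has one. The sketch below, with hypothetical function names, compares it with the plain C loop it replaces; it assumes `unsigned int` is 32 bits wide.

```c
#include <stdint.h>

/* Leading zero bits of a 32-bit value using the GCC/Clang built-in.
 * __builtin_clz(0) is undefined, so zero is handled explicitly. */
static unsigned leading_zeros32(uint32_t x)
{
    return x ? (unsigned)__builtin_clz(x) : 32u;
}

/* Plain C equivalent: tests one bit per loop iteration. */
static unsigned leading_zeros32_loop(uint32_t x)
{
    unsigned n = 0;
    for (uint32_t mask = 0x80000000u; mask != 0 && !(x & mask); mask >>= 1)
        n++;
    return n;
}
```

Counting leading zeros is a common building block for normalizing fixed-point values in DSP code, which is where the single-instruction form pays off.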

9 Function Calling Conventions
Definition: an implementation-level (low-level) scheme for how callees receive parameters from their caller and how they return a result.
Stack-based or register-based?

10 Restrict and Pointer Aliasing
When the compiler knows that pointers do not alias, it can exploit parallelism.
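A minimal C99 illustration of `restrict`. The name `vec_add` is hypothetical, and whether the compiler actually vectorizes or reorders this loop depends on the target and optimization level.

```c
#include <stddef.h>

/* Without restrict, the compiler must assume `out` may overlap `a` or `b`,
 * so each store could change a later load and iterations must stay in order.
 * restrict promises the arrays do not overlap, which lets the compiler
 * reorder loads and stores and process several iterations at once.
 * Passing overlapping arrays is undefined behavior. */
static void vec_add(int *restrict out, const int *restrict a,
                    const int *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
```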

11 Loops
Communicate loop count information: specify the loop count bounds to the compiler.
Example: hardware loops, which keep the loop body in a buffer or use prefetching.
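Vendor toolchains often provide pragmas for this (for example, TI's `MUST_ITERATE`). As a sketch for GCC and Clang, `__builtin_unreachable()` can state that `n` is a nonzero multiple of 4; `sum_mult4` is hypothetical, and how much the optimizer exploits the hint varies by compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* The caller guarantees n is a nonzero multiple of 4. Marking the other case
 * unreachable hands that fact to the optimizer, which may then unroll by 4
 * without a remainder loop or drop a "zero iterations" check before a
 * hardware loop. Violating the guarantee is undefined behavior. */
static int32_t sum_mult4(const int16_t *x, size_t n)
{
    if (n == 0 || (n % 4) != 0)
        __builtin_unreachable();
    int32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}
```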

12 General Loop Transformation
- Loop unrolling
- Multisampling
- Partial summation
- Software pipelining

13 Loop unrolling: A loop body is duplicated one or more times. The loop count is then reduced by the same factor to compensate.
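Loop unrolling by a factor of 4, written by hand as an illustration (`scale_unrolled4` is hypothetical). Compilers can apply the same transformation themselves, for example with GCC's `-funroll-loops`.

```c
#include <stddef.h>
#include <stdint.h>

/* Loop body duplicated 4 times: the main loop runs n / 4 times instead of n,
 * so the loop-control overhead (increment, compare, branch) drops by the
 * same factor. A remainder loop handles the last n % 4 elements. */
static void scale_unrolled4(int32_t *x, size_t n, int32_t k)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        x[i]     *= k;
        x[i + 1] *= k;
        x[i + 2] *= k;
        x[i + 3] *= k;
    }
    for (; i < n; i++)
        x[i] *= k;
}
```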

14 Multisampling: Computing, in the same loop, independent output values whose input source data overlap.
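A sketch of multisampling on a hypothetical FIR filter (`fir_multisample2`, 16-bit samples and coefficients assumed). Two adjacent outputs are computed together, so most input samples loaded in the inner loop are used by both of them.

```c
#include <stddef.h>
#include <stdint.h>

/* y[j] = sum over i of h[i] * x[j + i], for j = 0 .. nout-1.
 * x must hold nout + taps - 1 samples. */
static void fir_multisample2(const int16_t *x, const int16_t *h, int32_t *y,
                             size_t nout, size_t taps)
{
    size_t j = 0;
    for (; j + 2 <= nout; j += 2) {      /* two outputs per iteration */
        int32_t y0 = 0, y1 = 0;
        int16_t x0 = x[j];
        for (size_t i = 0; i < taps; i++) {
            int16_t x1 = x[j + i + 1];   /* one new sample load per tap */
            y0 += (int32_t)h[i] * x0;
            y1 += (int32_t)h[i] * x1;
            x0 = x1;                     /* reused by y0 on the next tap */
        }
        y[j] = y0;
        y[j + 1] = y1;
    }
    for (; j < nout; j++) {              /* leftover output when nout is odd */
        int32_t acc = 0;
        for (size_t i = 0; i < taps; i++)
            acc += (int32_t)h[i] * x[j + i];
        y[j] = acc;
    }
}
```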

15 Partial Summation: The computation for one output sum is divided into multiple smaller, or partial, sums.
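Partial summation applied to a dot product, as a sketch with hypothetical names: four independent accumulators replace the single running sum.

```c
#include <stddef.h>
#include <stdint.h>

/* The four partial sums do not depend on each other, so their
 * multiply-accumulates can issue in parallel instead of each waiting for the
 * previous addition. They are combined once at the end. Assuming no
 * overflow, the integer result equals the sequential loop's; with floating
 * point, the changed summation order can change rounding. */
static int32_t dot_partial4(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += (int32_t)a[i]     * b[i];
        s1 += (int32_t)a[i + 1] * b[i + 1];
        s2 += (int32_t)a[i + 2] * b[i + 2];
        s3 += (int32_t)a[i + 3] * b[i + 3];
    }
    for (; i < n; i++)                   /* leftovers go into the first sum */
        s0 += (int32_t)a[i] * b[i];
    return (s0 + s1) + (s2 + s3);
}
```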

16 Software pipelining: A sequence of instructions is transformed into a pipeline of several copies of that sequence, so that successive loop iterations overlap in execution.
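Compilers usually perform software pipelining themselves (modulo scheduling), but the prologue, kernel, and epilogue structure can be shown by hand. In this hypothetical two-stage sketch, the load for the next element is issued in the same loop body as the arithmetic for the current one.

```c
#include <stddef.h>
#include <stdint.h>

/* y[i] = a * x[i] + b, split into two stages: load, then multiply-add-store.
 * On an in-order or VLIW core, the load latency of iteration i+1 can overlap
 * the arithmetic of iteration i. */
static void axpb_pipelined(const int32_t *x, int32_t *y, size_t n,
                           int32_t a, int32_t b)
{
    if (n == 0)
        return;
    int32_t cur = x[0];                  /* prologue: first load */
    for (size_t i = 0; i + 1 < n; i++) { /* kernel */
        int32_t next = x[i + 1];         /* stage 1 of iteration i+1 */
        y[i] = a * cur + b;              /* stage 2 of iteration i */
        cur = next;
    }
    y[n - 1] = a * cur + b;              /* epilogue: drain the last stage */
}
```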

17 Is there any cost for performance optimization?

18 Example: Loop Unrolling

19 Code Size Optimization
Why? Code size is the amount of space in memory the code occupies at run time; reducing it can also reduce the amount of instruction cache the device needs.

20 Compiler flags (configure the compiler)
Optimize code size. Example: the command-line option -Os in the GNU GCC compiler.
Optimize performance. Example: -O3.
-O3 or -Os? Critical code is optimized for speed, and the bulk of the code may be optimized for size.
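With GCC or Clang, one way to mix the two goals is to build with -Os and annotate individual functions. Per the GCC manual, functions marked `hot` are optimized more aggressively, and `cold` functions are optimized for size and grouped away from hot code; exactly how `hot` interacts with -Os is compiler-dependent. The function names below are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed build: the whole file is compiled with -Os.
 * Critical-path helper: `hot` asks the compiler to optimize it more
 * aggressively. */
__attribute__((hot))
int32_t filter_sample(int32_t acc, int16_t in, int16_t coeff)
{
    return acc + (int32_t)in * coeff;
}

/* Rarely executed error path: `cold` marks it unlikely to run, so it is
 * optimized for size and kept away from hot code. */
__attribute__((cold))
void report_config_error(int code)
{
    fprintf(stderr, "configuration error %d\n", code);
}
```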

21 “Premium encodings”: The most commonly used instructions can be represented in a reduced binary footprint. Example: integer add instructions on a 32-bit device are represented with a premium 16-bit encoding. Drawback: performance degradation.

22 Tuning the ABI for Code Size
ABI: application binary interface, an interface between a given program and the OS, system libraries, etc.
To reduce code size, there are two areas of interest: calling convention and alignment.

23 Calling Convention
Fewer instructions are required to set up parameters passed in registers than parameters passed on the stack.
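A sketch of the calling-convention point, assuming the 32-bit Arm AAPCS, where the first four integer arguments are passed in r0-r3 and any further ones on the stack. The names are hypothetical, and whether the struct version actually shrinks the binary depends on how many call sites there are and how the parameter block is filled.

```c
#include <stdint.h>

/* Six integer arguments: under the assumed ABI, e and f go on the stack,
 * so every call site needs extra store instructions. */
static int32_t mix6(int32_t a, int32_t b, int32_t c,
                    int32_t d, int32_t e, int32_t f)
{
    return a + b + c + d + e + f;
}

/* Same data grouped in a struct passed by pointer: a single register
 * argument, and a parameter block filled once can be reused across calls. */
struct mix_params {
    int32_t a, b, c, d, e, f;
};

static int32_t mix_p(const struct mix_params *p)
{
    return p->a + p->b + p->c + p->d + p->e + p->f;
}
```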

24 Space-time Tradeoff
Depends on the unrolling factor: unrolling can increase cache misses and register pressure.

25 Space-time Tradeoff

26 Improve Performance through Memory Layout Optimization
Vectorization of loops: computation performed across multiple loop iterations can be combined into single vector instructions.
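Vectorization done by hand with the GCC/Clang `vector_size` extension, as an illustration only: four iterations of an element-wise add become one 128-bit vector add. Normally the compiler's auto-vectorizer does this (e.g. at -O3); `add_vec4` is hypothetical, and on targets without SIMD the compiler lowers the vector type to scalar operations.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Four int32 lanes in one 128-bit vector (GCC/Clang extension). */
typedef int32_t v4si __attribute__((vector_size(16)));

/* memcpy is used for the loads and stores so the arrays need not be 16-byte
 * aligned; compilers turn these calls into plain vector moves. */
static void add_vec4(int32_t *out, const int32_t *a, const int32_t *b, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        v4si va, vb, vr;
        memcpy(&va, a + i, sizeof va);
        memcpy(&vb, b + i, sizeof vb);
        vr = va + vb;                    /* one vector add for four elements */
        memcpy(out + i, &vr, sizeof vr);
    }
    for (; i < n; i++)                   /* scalar remainder */
        out[i] = a[i] + b[i];
}
```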

27 An important concern for vectorizing:
Loop dependence analysis: array accesses, data modification, conditional statements, etc.
Challenge: pointer aliasing. Solution: use the restrict keyword.
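Two hypothetical loops contrasting what dependence analysis looks for: a conditional statement with no cross-iteration dependence, which compilers can if-convert and vectorize, and a loop-carried dependence, which forces iterations to run in order.

```c
#include <stddef.h>

/* Conditional, but each iteration touches only its own elements: the branch
 * can be turned into a select (e.g. a vector min) and the loop vectorized. */
static void clamp_upper(int *restrict out, const int *restrict x,
                        int limit, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (x[i] > limit) ? limit : x[i];
}

/* Loop-carried dependence: iteration i reads a[i-1], written by iteration
 * i-1, so the loop cannot be vectorized as written. */
static void prefix_sum(int *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        a[i] = a[i - 1] + a[i];
}
```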

28 Array-of-Structures or Structure-of-Arrays?
Hint: memory is most efficiently accessed sequentially.
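A sketch of both layouts with hypothetical particle types (int32_t fields for simplicity, no padding assumed). A loop that reads only the x values walks memory sequentially in the structure-of-arrays layout but with a stride in the array-of-structures layout.

```c
#include <stddef.h>
#include <stdint.h>

#define NPART 256   /* hypothetical particle count */

/* Array-of-structures: x, y, z of one particle are adjacent in memory. */
struct particle {
    int32_t x, y, z;
};

/* Structure-of-arrays: all x values contiguous, then all y, then all z. */
struct particles {
    int32_t x[NPART], y[NPART], z[NPART];
};

/* Strided access: consecutive x values are sizeof(struct particle) bytes
 * apart, so about two thirds of each fetched cache line is unused y, z data. */
static int32_t sum_x_aos(const struct particle *p, size_t n)
{
    int32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i].x;
    return s;
}

/* Unit-stride access: every byte fetched is an x value, and the loop is easy
 * to vectorize. n must not exceed NPART. */
static int32_t sum_x_soa(const struct particles *p, size_t n)
{
    int32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += p->x[i];
    return s;
}
```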

29 Source Code Level Optimization
Performance bug: a bug that causes significant performance degradation.
PerfChecker: a static-analysis tool for detecting performance bugs in mobile applications.

32 GUI lagging is the dominant bug type (75.7%)
Long-running operations in main threads

33 View holder design pattern

34 References
[1] Oshana and Kraeling. Software Engineering for Embedded Systems: Methods, Practical Techniques, and Applications. Chapter 11: Optimizing Embedded Software for Performance.
[2] Oshana and Kraeling. Software Engineering for Embedded Systems: Methods, Practical Techniques, and Applications. Chapter 12: Optimizing Embedded Software for Memory.
[3] Heydemann, K., Bodin, F., Knijnenburg, P. M. W. and Morin, L. (2006). UFS: a global trade-off strategy for loop unrolling for VLIW architectures. Concurrency Computat.: Pract. Exper., 18: 1413– doi: /cpe.1014
[4] Yepang Liu, Chang Xu, and Shing-Chi Cheung. Characterizing and detecting performance bugs for smartphone applications. In Proceedings of the 36th International Conference on Software Engineering (ICSE 2014). ACM, New York, NY, USA, DOI=
[5]

