Estimating Flight CPU Utilization of Algorithms in a Desktop Environment Using Xprof Dr. Matt Wette 2013 Flight Software Workshop.

Presentation transcript:

Estimating Flight CPU Utilization of Algorithms in a Desktop Environment Using Xprof Dr. Matt Wette 2013 Flight Software Workshop Dec , 2013 Pasadena, CA v131202b

2013 FSW Workshop, M. Wette

Outline
✤ Flight Algorithm Development Process
✤ The Challenge of Generating CPU Utilization Estimates
✤ The Xprof Approach
✤ An Example
✤ How Valgrind / Xprof Works
✤ Critique
✤ Summary

Development Process
A computationally intensive component of flight software is developed by domain experts. Examples: image processing, hazard avoidance, descent control.
The algorithm development is performed on desktop workstations in the C or C++ language. Note: the tool covered in this talk operates on the compiled binary, so the source language is not relevant.
The development environment includes:
✤ Flight algorithms
✤ Simulation models
✤ Glue code to run these together
The flight algorithm code is delivered to the FSW/Avionics group for integration into the real-time flight load. The model code is delivered to the real-time simulation group for integration into the test environment.

The Challenge of Generating CPU Loading Estimates
Early in development, the avionics group needs to size the flight computing resources. The required computer capacity can be driven by computationally intensive algorithms developed by the domain experts.
✤ Trades may be required: algorithm capability versus availability/cost of a computer.
✤ This trade should happen prior to procurement, not during integration and test.
✤ The domain experts should provide CPU utilization estimates prior to computer procurement.
Accurate estimates of component CPU utilization can be challenging to generate.
✤ The flight code is difficult, if not impossible, to run in isolation.
✤ FLOP count estimates, performed by code analysis, can be labor intensive and inaccurate.
  - Example: division is more expensive than addition or multiplication. On PowerPC implementations, division takes about 16 times as long as addition, subtraction, or multiplication.
  - Compiler code optimization can distort estimates based on source code.

The Xprof Approach
Copy xprofreq.h to the working directory.
Add to glue code: #include "xprofreq.h"
No libraries to link. No special compiler arguments (unless needed to locate xprofreq.h).
Instrument the code.
✤ Add counter variables: one for each section of code to trace, repeated for different modes of operation (e.g., acquisition and tracking modes).
✤ Add xprof instructions before and after the code fragments to be measured:
    XP_CLRCTR1();
    ...
    cnt += XP_GETCTR1();
✤ Print results at end of execution:
    if (cnt > 0) printf("cnt/RTI = %lu/%lu\n", cnt, nrti);
Nominally, run without estimating CPU load; the code runs without overhead.
    $ ./algsim
    results...
Periodically run under Valgrind/Xprof to get clock cycle estimates.
    $ valgrind -q --tool=xprof ./algsim
    results...
    cnt/RTI = /500

An Example
Objectives:
✤ Estimate clock cycles for flight code executing in each of two modes.
✤ Compare double vs. float.
✤ Compare no optimization vs. "-O3".
Build:
    gcc -O0 -DFLT_TYPE=float demo1.c
    gcc -O0 -DFLT_TYPE=double demo1.c
    gcc -O3 -DFLT_TYPE=float demo1.c
    gcc -O3 -DFLT_TYPE=double demo1.c
Run each build under valgrind-xprof:
    $ valgrind -q --tool=xprof ./a.out
Results (clock cycles vs. options for 10k iterations):
    options    mode 1    mode 2    sim.
    flt,-O
    dbl,-O
    flt,-O
    dbl,-O

    #include <stdio.h>
    #include <math.h>
    #include "xprofreq.h"

    typedef struct { int mode; FLT_TYPE x[3]; } fc_t;

    void fsw_run(fc_t *fc, FLT_TYPE y, FLT_TYPE *u)
    {
        FLT_TYPE *x = fc->x;
        switch (fc->mode) {
        case 1:
            if (fabs(y) > 0.01) fc->mode = 2;
            x[0] = 0.1*x[1];
            x[1] = 2.36*x[1] - 0.1*y;
            *u = 72.8*x[0] - 3.4e5*x[1] *y;
            break;
        case 2:
            if (fabs(y) < [...]) fc->mode = 1;
            x[0] = 0.0;
            x[1] = 2.36*x[1] - 0.1*y;
            *u = 72.8*x[0] - 3.4e5*x[1] *y;
            break;
        }
    }

    typedef struct { FLT_TYPE x[3]; } sc_t;

    void sim_run(sc_t *fs, FLT_TYPE u, FLT_TYPE *y)
    {
        FLT_TYPE *x = fs->x;
        if (fabs(x[0]) < 0.01) u *= -10.0;
        x[0] = 0.1*x[1];
        x[1] = u/4.62e4;
        *y = x[1];
    }

    int main()
    {
        unsigned long c[2], cs;
        FLT_TYPE y, u;
        int i, n = 10000;
        fc_t fc = { 1, { 0.1, 0.2, 0.3 }};
        sc_t sc = { { 0.1, 0.2, 0.3 }};
        u = y = 0.0;
        c[0] = c[1] = cs = 0;
        for (i = 0; i < [...]
        [...] > 0) printf("c[0]=%lu c[1]=%lu cs=%lu n=%d\n", c[0], c[1], cs, n);
    }

How Valgrind-Xprof Works
Valgrind steps:
✤ Break code into superblocks
✤ Instrument code blocks
✤ Execute
Instrumenting code blocks:
✤ Convert guest machine code to IR
✤ Convert tree IR to flat IR
✤ Instrument IR (Xprof instrumentation is executed)
✤ Optimize IR
✤ Convert flat IR to tree IR
✤ Convert tree IR to host assembly code
✤ Allocate registers
✤ Generate machine code
The Xprof tool contains a table that maps IR instructions and operations to cycle counts.
The default table can be dumped.
    $ valgrind --tool=xprof --dump-op-table=dump.tbl /bin/true
A user-defined table can be loaded.
    $ valgrind --tool=xprof --load-op-table=load.tbl ./algsim
For each superblock, xprof counts the instructions from entry to each exit and adds a Valgrind "dirty call" to update an internal counter.
Notes:
✤ Valgrind translates all code down to system calls. Execution of system calls is not counted.
✤ IR = Intermediate Representation

Critique
Xprof looks promising.
✤ Simple to use, very cheap way to get approximate CPU clock counts.
✤ Evaluates almost all code (only missing system calls).
Drawbacks:
✤ Mapping IR operations to the target processor is not perfect.
  - Calibration and tuning should help.
  - The tool may work better for relative tuning: "Please reduce your CPU load by 25%."
✤ Does not (currently) deal with cache. (Valgrind does have the "cachegrind" tool.) One common approach is to multiply cycle counts by a cache factor to get time.
✤ Does not capture pipelining.
✤ Does not (currently) work on multithreaded code.
✤ No guarantees of over- or under-bounding execution time on the target computer.

Summary
For the challenge of sizing CPU capability early in the development of computationally intensive flight algorithms, Valgrind-Xprof shows promise.
The tool is easy to use and cheap to implement.
Work remains to test, tune, and calibrate.
References:
✤ provides source code and documentation for the tool suite
✤ provides code for xprof patch
✤ Nethercote, N., and Seward, J., "Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation." Proceedings of ACM SIGPLAN 2007 Conference on Programming Language Design and Implementation (PLDI 2007), San Diego, California, USA, June 2007.