Qin Zhao (1), Joon Edward Sim (2), Weng-Fai Wong (1,2); (1) Singapore-MIT Alliance, (2) Department of Computer Science, National University of Singapore


1 DEP: Detailed Execution Profile. Qin Zhao (1), Joon Edward Sim (2), Weng-Fai Wong (1,2), {zhaoqin,esim,wongwf}@comp.nus.edu.sg; (1) Singapore-MIT Alliance, (2) Department of Computer Science, National University of Singapore. Larry Rudolph, rudolph@csail.mit.edu; Singapore-MIT Alliance, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology. Chine-Cheng Wu, PAS Lab, CSIE, NTU.

2 Introduction: Previous work on profiling needs a large amount of memory and incurs a large slowdown. DEP (Detailed Execution Profile) captures the complete dynamic control flow, data dependencies, and memory references at the same time. The profile size is significantly reduced. DEP is collected with the DynamoRIO binary instrumentation framework, in an infrastructure called Adept (A Dynamic Execution Profiling Tool).

3 DEP Advantages: DEP gives complete coverage of the program, including shared libraries. Multi-threaded applications can be profiled by collecting independent DEPs. Collection is very efficient, incurring a 5 times slowdown. The profile contains both memory reference and control flow information.

4 Control Flow Profile: DEPc. The traditional way records each basic block entry using 4 bytes. DEP uses 2 bytes for each entry, plus an extra 2 bytes if needed: an H-tag for the high 2 bytes of the address and an L-tag for the low 2 bytes. This compression does not guarantee optimal space usage.
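The H-tag/L-tag idea can be sketched as follows. This is a minimal illustration, not the layout used by Adept: the `ESCAPE` marker, the `encode`/`decode` helpers, and the choice to spend two extra 16-bit words whenever the H-tag changes are all assumptions made here so that the stream can be decoded unambiguously. The sample addresses are illustrative.

```python
ESCAPE = 0x0000  # assumed marker meaning "an H-tag and an L-tag follow"

def encode(addresses):
    """Encode 32-bit basic block addresses as a list of 16-bit words."""
    words = []
    current_h = None
    for addr in addresses:
        h, l = addr >> 16, addr & 0xFFFF
        # Emit the H-tag only when it changes, or when the L-tag would
        # collide with the escape marker.
        if h != current_h or l == ESCAPE:
            words += [ESCAPE, h, l]
            current_h = h
        else:
            words.append(l)
    return words

def decode(words):
    """Rebuild the 32-bit addresses from the 16-bit word stream."""
    addresses = []
    current_h = 0
    i = 0
    while i < len(words):
        if words[i] == ESCAPE:
            current_h, l = words[i + 1], words[i + 2]
            i += 3
        else:
            l = words[i]
            i += 1
        addresses.append((current_h << 16) | l)
    return addresses

trace = [0x0804FFA4, 0x0804FFB0, 0x0804FFC8, 0x08050000, 0x08050010]
words = encode(trace)
print(len(words) * 2, "bytes encoded vs", len(trace) * 4, "bytes raw")
```

The saving depends on how often consecutive blocks share the same high 16 bits; a trace that keeps switching H-tags can end up larger than the plain 4-byte encoding, which is consistent with the slide's caveat about space.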

5 Memory References Profile: DEPm. A memory reference is {pc, addr, size, type}: the PC of the memory reference instruction, the address of the memory reference, the size of the data being accessed, and whether it is a read or a write. Storing only the necessary values that
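For reference, a complete {pc, addr, size, type} record can be sketched as a packed structure. The field widths (32-bit pc and addr, one byte each for size and type), the `READ`/`WRITE` constants, and the sample values are assumptions for illustration only; the slides do not give a byte layout.

```python
import struct
from dataclasses import dataclass

READ, WRITE = 0, 1          # assumed encoding of the access type
RECORD_FORMAT = "<IIBB"     # assumed layout: pc, addr, size, type

@dataclass(frozen=True)
class MemRef:
    pc: int      # address of the instruction making the reference
    addr: int    # address being accessed
    size: int    # number of bytes accessed
    type: int    # READ or WRITE

    def pack(self) -> bytes:
        """Serialize the record into its packed byte form."""
        return struct.pack(RECORD_FORMAT, self.pc, self.addr, self.size, self.type)

    @staticmethod
    def unpack(data: bytes) -> "MemRef":
        """Rebuild a record from its packed byte form."""
        return MemRef(*struct.unpack(RECORD_FORMAT, data))

# Hypothetical reference: a 4-byte write to a stack slot.
ref = MemRef(pc=0x0804FFA4, addr=0xBFFFF004, size=4, type=WRITE)
raw = ref.pack()
print(len(raw), "bytes per full record")
```

Under this assumed layout every reference costs 10 bytes when recorded in full, which is the baseline a more compact DEPm has to improve on.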

6 Memory Reference: The example contains three memory references: push ebp; mov 0 -> [esp+4]; mov 0 -> [esp+8].

7 BB_pc+Mem_addr Compared to DEP: DEP triggers fewer analyzer calls than BB_pc+Mem_addr because its profile data is smaller, so the buffer overflow that signals the analyzer happens less often. The penalty includes stealing and restoring registers, address calculation, storage of the address, and updating the profile counter. Extra overhead: checking for H-tag changes, and checking and updating register status.

8 DynamoRIO: Runs on IA-32 under both Linux and Windows. DynamoRIO executes applications by copying user code into a code cache and then executing it from there. The cached code is the same as the original except that control transfer operations return to DynamoRIO. The trace cache caches code for indirect branch lookup.

9 ADEPT : A Dynamic Execution Profiling Tool

10 Control Flow : Obtaining DEPc If the L-tag is 0x0000

11 Memory References: Obtaining DEPm. Each register variable has two states: UPDATED and RECORDED.
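One way to read the two register states is sketched below. The slide only names the states; the transition rules here (a register becomes UPDATED when written, and its value is stored once, at the next memory reference that uses it, after which it is RECORDED) are an assumption, as are the `RegisterTracker` class and the sample stack pointer value.

```python
UPDATED, RECORDED = "UPDATED", "RECORDED"

class RegisterTracker:
    """Store a register's value only the first time it is used after a write."""

    def __init__(self, registers):
        # Every register starts as UPDATED: its value is not yet in the profile.
        self.state = {reg: UPDATED for reg in registers}
        self.profile = []

    def on_write(self, reg):
        """The instruction stream modified reg, so its stored value is stale."""
        self.state[reg] = UPDATED

    def on_mem_ref(self, reg, value):
        """A memory reference used reg as its base register."""
        if self.state[reg] == UPDATED:
            self.profile.append((reg, value))
            self.state[reg] = RECORDED
        # If already RECORDED, the address can be recomputed later from the
        # stored value and the instruction's constant offset.

tracker = RegisterTracker(["esp"])
tracker.on_write("esp")                 # e.g. push ebp changes esp
tracker.on_mem_ref("esp", 0xBFFFF000)   # mov 0 -> [esp+4]: esp is recorded
tracker.on_mem_ref("esp", 0xBFFFF000)   # mov 0 -> [esp+8]: nothing new stored
print(tracker.profile)
```

In this sketch the two stack stores share a single recorded register value, which is the kind of saving that motivates tracking register status instead of storing every address.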

12 Profile Buffer: Stores the collected profile for future analysis. There is one buffer for each thread. Using a large buffer reduces the number of analyzer invocations. The profile buffer has two separate parts, one for DEPc and one for DEPm; a split of 20% for DEPc and 80% for DEPm works well. The analyzer is triggered when the buffer is full, using the OS signal raised by a page segmentation fault.
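The effect of the 20/80 split and of the buffer size on analyzer invocations can be sketched with simple arithmetic. The helper names, the byte counts, and the model itself (profile data produced at a steady rate, and the analyzer run whenever either region fills) are assumptions for illustration; the signal-based trigger is not modeled.

```python
def split_buffer(total_bytes, depc_percent=20):
    """Divide one thread's profile buffer into DEPc and DEPm regions."""
    depc_bytes = total_bytes * depc_percent // 100
    return depc_bytes, total_bytes - depc_bytes

def analyzer_invocations(total_bytes, depc_written, depm_written, depc_percent=20):
    """Count buffer-full events, assuming whichever region fills first
    triggers the analyzer and both regions are then emptied."""
    depc_cap, depm_cap = split_buffer(total_bytes, depc_percent)
    return max(depc_written // depc_cap, depm_written // depm_cap)

# Hypothetical run that produces 2000 bytes of DEPc and 8000 bytes of DEPm.
small = analyzer_invocations(1000, 2000, 8000)
large = analyzer_invocations(10000, 2000, 8000)
print(small, "invocations with a 1000-byte buffer,", large, "with 10000 bytes")
```

When the split matches the ratio in which the two profiles grow, neither region fills early and the buffer is used fully before each analyzer call.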

13 Optimizing DEPc: Basic block 0x0804ffa4 branches to 0x08050000.

14 Optimizing DEPm (figure label: Optimized)

15 Evaluation. Platform: dual-core 3.2 GHz Intel Pentium D 840, 2 GB of RAM. OS: Linux Fedora Core 4 and Windows XP SP2. Benchmarks: SPEC CPU2000 integer benchmarks for Linux, SysMark 2004SE for Windows (running Access, PowerPoint and Word). Compiler: gcc with the -O3 flag.

16 Execution Time

17 Relative slowdown

18 Profile Frameworks. Pin: counts the number of basic blocks executed and the number of memory references. Valgrind: Cachegrind is a cache profiler, used to capture basic block counts and memory reference counts. eWPP (Extended Whole Program Paths): records control flow and dependence information using a two-phase profiling approach; the first phase identifies all memory dependences, and the second phase is the collection phase.

19 Profile Size and Compressibility * CF_bit uses bits and 4-byte target addresses for indirect branches

20 Normalized by uncompressed BB_pc size. Normalized by uncompressed Mem_addr size. CF_bit does not compress well.

21 Related Work. Whole Execution Traces (WET): simulation environment. Extended Whole Program Paths (eWPP): encodes trace information in WPP. Whole Program Paths (WPP). These approaches have difficulty supporting multi-threaded applications.

22 Conclusion: DEP captures the major aspects of program execution: control flow and memory references. DEP is collected by Adept, which can perform on-line or off-line analysis. Adept builds the mapping between the collected information and the original application. Experimental results show a 5 times slowdown and a 40% space saving compared to traditional profiles. A complete trace that recovers the whole program execution is not necessary; a particular segment can be reproduced for simulation or replay.

23 Back-up Slides

24 Recovering the memory reference trace: using a naive approach to recover the memory reference trace from a DEP.

25 Recovering Memory References. Scenario 1: complete memory reference profile {pc, addr, size, type}. Scenario 2: DEP collected by Adept. Scenario 2 takes almost triple the native execution time. Tradeoff.

