1 Compression Techniques to Simplify the Analysis of Large Execution Traces Abdelwahab Hamou-Lhadj and Dr. Timothy C. Lethbridge {ahamou,

Presentation transcript:

1 Compression Techniques to Simplify the Analysis of Large Execution Traces Abdelwahab Hamou-Lhadj and Dr. Timothy C. Lethbridge {ahamou, University of Ottawa - Canada IWPC Paris

2 Introduction
Execution traces are important for understanding the behavior, and sometimes the structure, of a software system. Execution traces tend to be very large and need to be compressed. In this presentation, we present techniques for compressing traces of procedure calls. We also show the results of our techniques when applied to two different software systems.

3 Why Traces of Procedure Calls?
Many of today’s legacy systems were developed using the procedural paradigm. The flow of procedure calls can be useful for comprehending the execution of a particular software feature. The level of abstraction of procedure-call traces tends to be neither too low nor too high. Traces of method invocations become crucial when it comes to understanding the behavior of object-oriented systems.

4 Traditional Compression Techniques
There are two types of compression techniques: lossy and lossless. In information theory, most compression algorithms are based on the same principle (David Salomon, 2000): compressing data by removing redundancy. These techniques produce good results; however, the information, once compressed, is no longer readable by humans. Such algorithms will not help in program comprehension.

5 Trace Compression Steps
1. Preprocess the trace by removing the contiguous redundancies due to loops and recursion.
2. Represent the trace as a rooted, ordered, labeled tree.
3. Detect the non-contiguous redundancies and represent them only once. This problem is also known as the common subexpression problem and can be solved in linear time.
4. Analyze the compressed version and estimate the gain.
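The tree representation mentioned above can be sketched as follows. This is only an illustration: the slides do not give the trace format, so the `(nesting_level, procedure_name)` pairs, the `Node` class, and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One procedure call; children are kept in call order (ordered tree)."""
    label: str
    children: list["Node"] = field(default_factory=list)


def build_call_tree(trace):
    """Turn (nesting_level, procedure_name) pairs into a rooted ordered labeled tree.

    Assumed (hypothetical) format: the first entry is the root at level 0 and
    every entry is at most one level deeper than the entry before it.
    """
    root = None
    stack = []  # stack[i] is the most recent node seen at nesting level i
    for level, label in trace:
        node = Node(label)
        if level == 0:
            if root is not None:
                raise ValueError("trace has more than one root")
            root = node
        else:
            if level > len(stack):
                raise ValueError(f"gap in nesting levels before {label!r}")
            stack[level - 1].children.append(node)
        del stack[level:]  # forget calls that have already returned
        stack.append(node)
    return root


def count_nodes(node):
    """Size of the tree, i.e. the number of calls in the trace."""
    return 1 + sum(count_nodes(child) for child in node.children)


# A made-up trace with 9 calls, used only to exercise the sketch.
example_trace = [
    (0, "A"), (1, "E"), (2, "D"), (3, "C"), (3, "B"),
    (2, "C"), (1, "D"), (2, "C"), (2, "B"),
]
example_tree = build_call_tree(example_trace)
```

Keeping a stack of the most recent call at each level makes the construction a single pass over the trace.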

6 Preprocessing Stage
Redundant calls caused by loops and recursion tend to encumber the trace and should be removed; the number of occurrences is stored so that the original trace can be reconstructed. Removing the redundant calls is one form of compression that could make the trace more readable. If the trace is viewed as a tree, removing contiguous redundancies reduces the depth of the tree and the degree of its nodes.
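A minimal sketch of this idea, under stated assumptions: it only handles the loop case (identical subtrees that appear as contiguous siblings) and does not attempt the recursion case, whose treatment the slides do not detail. The `Call` class, its `count` field, and the function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    label: str
    children: list["Call"] = field(default_factory=list)
    count: int = 1  # how many contiguous repetitions this node stands for


def same_shape(a, b):
    """True when two already-collapsed subtrees match, ignoring their own counts."""
    return (
        a.label == b.label
        and len(a.children) == len(b.children)
        and all(
            x.count == y.count and same_shape(x, y)
            for x, y in zip(a.children, b.children)
        )
    )


def collapse_repeats(node):
    """Return a copy of the tree with contiguous identical siblings merged.

    The repetition count is kept on the surviving node so the original
    trace can still be reconstructed.
    """
    merged = []
    for child in node.children:
        child = collapse_repeats(child)  # collapse bottom-up first
        if merged and same_shape(merged[-1], child):
            merged[-1].count += child.count
        else:
            merged.append(child)
    return Call(node.label, merged, node.count)


def size(node):
    """Number of nodes in the tree."""
    return 1 + sum(size(child) for child in node.children)
```

For example, a `main` node whose children are three identical `f -> g` subtrees followed by `h` collapses to `main` with children `f` (count 3) and `h`, which reduces the degree of `main` from 4 to 2.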

7 The Common Subexpression Problem
Introduced by P. J. Downey, R. Sethi and R. E. Tarjan. “Any tree can be represented in a maximally compact form as a directed acyclic graph where common subtrees are factored and shared, being represented only once” - Flajolet, Sipala and Steyaert. The process of compacting the tree is known as the common subexpression problem, also called “subtree factoring”. If we consider trees with a finite number of nodes so that the degrees are bounded by some constant... “The compacted form of a tree can be computed in expected time O(n) using a top-down recursive procedure in conjunction with hashing...” - Flajolet, Sipala and Steyaert

8 Example
[Figure: an input tree with node labels A, B, C, D, E (9 nodes and 8 links) and its compressed form (5 nodes and 6 links).]

9 The Algorithm
Introduced by P. Flajolet, P. Sipala and J.-M. Steyaert, and improved by G. Valiente. The algorithm assigns a positive number, called a certificate, to each node. Two nodes have the same certificate if, and only if, the trees rooted at them are isomorphic. The certificate of a node n is obtained by building a sequence [L(n), a_1, ..., a_m], called the signature of the node, where L(n) is the label of the node and a_1, ..., a_m are the certificates of its children. The certificates and signatures are stored in a global table.

10 Example
Node | Signature | Certificate
A | [A, 4, 3] | 5
E | [E, 3, 2] | 4
D | [D, 2, 1] | 3
C | [C, 0, 0] | 2
B | [B, 0, 0] | 1
[Figure: the tree from the previous example, with node labels A, B, C, D, E.]

11 The Algorithm Steps (iterative version)
The algorithm performs a bottom-up traversal of the tree using a queue:
1. For each node n
2.   Build a signature for n
3.   If the signature already exists in the global table then
4.     Return the corresponding certificate
     Else
5.     Create a new certificate
6.     Update the table
7.   Assign the certificate to the node
If the degree of the tree is bounded by a constant and a hash table is used to store the certificates, then this algorithm runs in linear time.
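The steps above can be sketched in Python as follows. This is an illustration rather than the authors' implementation: it uses an explicit stack for a post-order (children first) traversal instead of the queue mentioned on the slide, a `dict` as the global hash table, and leaf signatures without padding. The `(label, children)` tuple representation and all names are assumptions, and the example tree is one reading of the signature table shown in the earlier example. Certificate numbers depend on visiting order, so they need not match that table exactly.

```python
def assign_certificates(root):
    """Assign a certificate to every node of a (label, children) tree.

    Returns (root_certificate, table), where table maps each signature
    (label, cert_1, ..., cert_m) to its certificate. Two subtrees receive
    the same certificate exactly when they are isomorphic.
    """
    table = {}  # signature -> certificate (the "global table")
    cert = {}   # id(node) -> certificate of that node
    stack = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        label, children = node
        if not children_done:
            # Revisit this node after all of its children have certificates.
            stack.append((node, True))
            for child in reversed(children):
                stack.append((child, False))
        else:
            signature = (label, *(cert[id(c)] for c in children))
            if signature not in table:
                table[signature] = len(table) + 1  # create a new certificate
            cert[id(node)] = table[signature]
    return cert[id(root)], table


def dag_size(table):
    """Nodes and links of the compacted form: one node per distinct signature."""
    nodes = len(table)
    links = sum(len(signature) - 1 for signature in table)
    return nodes, links


# Example tree with 9 nodes: A(E(D(C, B), C), D(C, B)).
C = ("C", [])
B = ("B", [])
D = ("D", [C, B])
E = ("E", [D, C])
A = ("A", [E, D])

root_certificate, table = assign_certificates(A)
nodes, links = dag_size(table)  # 5 nodes and 6 links for this tree
```

Each node is pushed and popped a constant number of times and each signature lookup is an expected constant-time hash operation when the degree is bounded, which is where the linear running time comes from.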

12 Experiment
We experimented with traces of the following systems: XFIG (a drawing system under UNIX) and a real-world telecommunication system. We are interested in the following results: the initial size of the trace, n; the size of the trace after preprocessing, n1; the compression ratio r1 = n1 / n; the size of the trace after applying the common subexpression algorithm, n2; and the compression ratio r2 = n2 / n.
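As a small worked example of these ratios, the sketch below computes r1 and r2 as percentages. The function name is hypothetical, and the numbers are taken from the 9-node example tree of the earlier slides (assuming it contains no contiguous repetitions, so preprocessing leaves it unchanged), not from the experiment itself.

```python
def compression_ratios(n, n1, n2):
    """Return (r1, r2) in percent, with r1 = n1 / n and r2 = n2 / n.

    n  : initial size of the trace
    n1 : size after the preprocessing stage
    n2 : size after the common subexpression algorithm
    """
    if n <= 0:
        raise ValueError("the initial trace size must be positive")
    return 100 * n1 / n, 100 * n2 / n


# Illustrative values only: 9 nodes initially, 9 after preprocessing,
# 5 after subtree factoring.
r1, r2 = compression_ratios(9, 9, 5)
```

With these definitions a lower ratio means stronger compression; here r2 is about 55.6%.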

13 Results of the Experiment (XFIG System)
[Table columns: Trace, n, n1, r1 (%), n2, r2 (%)]

14 Some Considerations Regarding the Telecommunication System
It is a large legacy system. The traces are generated using an internal mechanism. The traces tend to be incomplete, which shows up as inconsistencies in the nesting levels of the trace. Our solution to this problem is to complete the trace by filling the gaps with virtual procedure calls and to estimate the error ratio e, the ratio of the number of missing calls, g, to the size of the original trace including those calls: e = g / (g + n).
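One way to picture the gap filling is sketched below. The slides do not describe how gaps are detected, so this assumes a trace of `(nesting_level, procedure_name)` pairs in which a gap is any jump of more than one level between consecutive entries. The placeholder label and the function names are hypothetical.

```python
def fill_gaps(trace, virtual_label="<virtual>"):
    """Insert virtual calls wherever the nesting level jumps by more than one.

    Returns (completed_trace, g), where g is the number of virtual calls added.
    """
    completed = []
    g = 0
    previous_level = -1  # so that a trace starting below level 0 is also completed
    for level, label in trace:
        # A real call can be at most one level deeper than the entry before it.
        for missing_level in range(previous_level + 1, level):
            completed.append((missing_level, virtual_label))
            g += 1
        completed.append((level, label))
        previous_level = level
    return completed, g


def error_ratio(g, n):
    """e = g / (g + n): missing calls relative to the completed trace size."""
    return g / (g + n)


# Made-up incomplete trace: levels jump from 0 to 2 and from 1 to 4.
incomplete = [(0, "main"), (2, "read"), (1, "parse"), (4, "scan")]
completed, g = fill_gaps(incomplete)
e = error_ratio(g, len(incomplete))
```

In this made-up trace three virtual calls are inserted, so e = 3 / (3 + 4).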

15 Results of the Experiment (Telecom. System)
[Table columns: Trace, n, g, e (%), n1, r1 (%), n2, r2 (%)]

16 Variation of the degrees of the tree according to depth (3 traces of XFIG)
[Figure panels: before the preprocessing step; after the preprocessing step]

17 Variation of the degrees of the tree according to depth (3 traces of the telecom. system)
[Figure panels: before the preprocessing step; after the preprocessing step]

18 Discussion
Procedure-call traces can be compressed considerably in a way that preserves the ability of humans to understand them. A possible improvement is to look for procedures that are not of great interest to software engineers and remove them before the compression process. The preprocessing stage can be very useful for reducing the trace size and for increasing the performance of the common subexpression algorithm.

19 Conclusions and future directions
The results shown in this presentation can help build better tools based on execution traces. We intend to conduct more experiments with this framework to see how helpful it is to software engineers. Future directions should focus on lossy compression. Types of information eliminated can include the number of repetitions, the order of calls, and some lower-level utility procedures. The non-contiguous redundancies can be used to determine other features of the system.


21 Results of the Experiment (XFIG System) with procedures and files
[Table columns: Trace, n, n1, r1 (%), n2, r2 (%), # Proc., # Files]

22 Results of the Experiment (Telecom. System) with procedures and files
[Table columns: Trace, n, g, e (%), n1, r1 (%), n2, r2 (%), # Proc., # Files]