Program Analysis and Transformation


1 Program Analysis and Transformation

2 Program Analysis
Extracting information, in order to present abstractions of, or answer questions about, a software system
Static Analysis: Examines the source code
Dynamic Analysis: Examines the system as it is executing
8-Dec-18 COSC6431

3 What are we looking for?
Depends on our goals and the system
In almost any language, we can find out information about variable usage
In an OO environment, we can find out which classes use other classes, which are the base of an inheritance structure, etc.
We can also find blocks of code that can never be executed when the program runs (dead code)
Typically, the information extracted is in terms of entities and relationships

4 Entities
Entities are individuals that live in the system, and attributes associated with them. Some examples:
Classes, along with information about their superclass, their scope, and where in the code they exist.
Methods/functions, and what their return type or parameter list is, etc.
Variables, and what their types are, whether or not they are static, etc.

5 Relationships
Relationships are interactions between the entities in the system. Relationships include:
Classes inheriting from one another.
Methods in one class calling the methods of another class, and methods within the same class calling one another.
A method referencing an attribute.

6 Information format
Many different formats in use
Simple but effective: RSF
    inherit TRIANGLE SHAPE
TA is an extension of RSF that includes a schema
    $INSTANCE SHAPE Class
GXL is an XML-like extension of TA
    Blow-up factor of 10 or more makes it rather cumbersome
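The RSF triples above are simple enough to read with a few lines of code. The following is a minimal sketch, assuming whitespace-separated "relation source target" lines; the names parse_rsf, group_by_relation and SAMPLE are hypothetical, and the third sample line is an illustrative addition rather than part of the slide.

```python
from collections import defaultdict

def parse_rsf(text):
    """Parse whitespace-separated 'relation source target' lines into triples."""
    facts = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:  # skip blank or malformed lines
            facts.append(tuple(parts))
    return facts

def group_by_relation(facts):
    """Index the triples by relation name: {relation: {(source, target), ...}}."""
    index = defaultdict(set)
    for relation, source, target in facts:
        index[relation].add((source, target))
    return dict(index)

# The first two lines come from the slide; the third is a hypothetical extra fact.
SAMPLE = """
inherit TRIANGLE SHAPE
$INSTANCE SHAPE Class
$INSTANCE TRIANGLE Class
"""

facts = parse_rsf(SAMPLE)
relations = group_by_relation(facts)
```

Grouping by relation name gives one set of pairs per relation, which is the shape that relational tools operate on.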

7 Static Analysis
Involves parsing the source code
Usually creates an Abstract Syntax Tree
Borrows heavily from compiler technology but stops before code generation
Requires a grammar for the programming language
Can be very difficult to get right

8 CppETS
CppETS is a benchmark for C++ extractors
It consists of a collection of C++ programs that pose various problems commonly found in parsing and reverse engineering
Static analysis research tools typically get about 60% of the problems right

9 Example program
    #include <iostream>

    class Hello {
    public:
        Hello();
        ~Hello();
    };

    Hello::Hello() { std::cout << "Hello, world.\n"; }
    Hello::~Hello() { std::cout << "Goodbye, cruel world.\n"; }

    int main() {
        Hello h;
        return 0;
    }

10 Example Q&A
How many member methods are in the Hello class?
Answer: Two, the constructor (Hello::Hello()) and destructor (Hello::~Hello()).
Where are these member methods used?
Answer: The constructor is called implicitly when an instance of the class is created. The destructor is called implicitly when the execution leaves the scope of the instance.

11 Static analysis in IDEs
High-level languages lend themselves better to static analysis needs
EiffelStudio automatically creates BON diagrams of the static structure of Eiffel systems
Rational Rose does the same with UML and Java
Unfortunately, most legacy systems are not written in either of these languages

12 Static analysis pipeline
(Diagram. Components: Source code, Parser, Abstract Syntax Tree, Fact extractor, Fact base, Clustering algorithm, Visualizer, Metrics tool)
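The parse, AST, fact extractor steps of this pipeline can be sketched with Python's standard ast module. This is an analogy only: it parses Python rather than C++, the fact names ($INSTANCE, inherit, call) are borrowed from the RSF/TA style used in these slides, and SOURCE and extract_facts are hypothetical names.

```python
import ast

# A small hypothetical subject program (Python, standing in for C++).
SOURCE = '''
class Shape:
    def area(self):
        return 0

class Triangle(Shape):
    def area(self):
        return helper()

def helper():
    return 1
'''

def extract_facts(source):
    """Walk the AST and emit (relation, source, target) triples."""
    facts = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            facts.append(("$INSTANCE", node.name, "Class"))
            for base in node.bases:
                if isinstance(base, ast.Name):
                    facts.append(("inherit", node.name, base.id))
        elif isinstance(node, ast.FunctionDef):
            facts.append(("$INSTANCE", node.name, "Function"))
            # Simplification: only calls to plain names are recorded, and
            # method names are not qualified by their class.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    facts.append(("call", node.name, inner.func.id))
    return facts

facts = extract_facts(SOURCE)
```

A real extractor also has to resolve names across files and scopes, which is where most of the difficulty lies.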

13 Dynamic Analysis
Provides information about the run-time behaviour of software systems, e.g.
Component interactions
Event traces
Concurrent behaviour
Code coverage
Memory management
Can be done with a profiler or a debugger

14 Instrumentation
Augments the subject program with code that transmits events to a monitoring application, or writes relevant information to an output file
A profiler can be used to examine the output file and extract relevant facts from it
Instrumentation affects the execution speed and storage space requirements of the system

15 Instrumentation process
(Diagram. Components: Source code, Annotation script, Annotator, Annotated program, Compiler, Instrumented executable)
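Instrumentation of this kind can be illustrated in miniature. In the sketch below a decorator plays the role of the annotator, an in-memory list stands in for the output file, and call_facts plays the role of the profiler that turns the trace into facts. All names (EVENTS, instrument, call_facts, foo, bar) are hypothetical.

```python
import functools

EVENTS = []  # stands in for the output file written by the instrumented program

def instrument(func):
    """Wrap func so that entering and leaving it is recorded as an event."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        EVENTS.append(("enter", func.__name__))
        try:
            return func(*args, **kwargs)
        finally:
            EVENTS.append(("exit", func.__name__))
    return wrapper

@instrument
def bar():
    return 1

@instrument
def foo():
    return bar() + bar()

def call_facts(events):
    """Derive ('call', caller, callee) facts from an enter/exit trace."""
    stack, facts = [], set()
    for kind, name in events:
        if kind == "enter":
            if stack:
                facts.add(("call", stack[-1], name))
            stack.append(name)
        else:
            stack.pop()
    return facts

result = foo()
facts = call_facts(EVENTS)
```

Each call now appends two events, which is the speed and storage overhead the slide refers to.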

16 Dynamic analysis pipeline
(Diagram. Components: Instrumented executable, CPU, Dynamic analysis data, Profiler, Fact base, Clustering algorithm, Visualizer, Metrics tool)

17 Non-instrumented approach
One can also use debugger log files to obtain dynamic information
Disadvantage: Limited amount of information provided
Advantage: Less intrusive approach, more accurate performance measurements

18 Dynamic analysis issues
Ensuring good code coverage is a key concern
A comprehensive test suite is required to ensure that all paths in the code will be exercised
Results may not generalize to future executions
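The coverage concern can be made concrete with a small line tracer built on Python's sys.settrace. This is a sketch under stated assumptions, not a replacement for a coverage tool; classify and covered_lines are hypothetical names.

```python
import sys

def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"

def covered_lines(func, *args):
    """Run func(*args); return the executed line offsets relative to its 'def' line."""
    code = func.__code__
    seen = set()

    def tracer(frame, event, arg):
        if frame.f_code is code and event == "line":
            seen.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return seen

only_positive = covered_lines(classify, 5)
with_negative = only_positive | covered_lines(classify, -1)
```

With only the input 5, the line that returns "negative" is never observed, so any fact derived from that run says nothing about it.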

19 Static vs. Dynamic
Static:
Reasons over all possible behaviours (general results)
Conservative and sound
Challenge: Choose good abstractions
Dynamic:
Observes a small number of behaviours (specific results)
Precise and fast
Challenge: Select representative test cases

20 SWAGKit
SWAGKit is used to generate software landscapes from source code
Based on a pipeline architecture with three phases
    Extract (cppx)
    Manipulate (prep, linkplus, layoutplus)
    Present (lsedit)
Currently usable for programs written in C/C++

21 The SWAGKit Pipeline
Source Code -> cppx -> prep -> linkplus -> layoutplus -> lsedit -> Landscape

22 The SWAGKit Pipeline
Function    Filter      Input       Output
Extract     cppx        source      .ta
Manipulate  prep        .ta         .o.ta
            linkplus    *.o.ta      out.ln.ta
            layoutplus  out.ln.ta   out.ls.ta
Present     lsedit      out.ls.ta   picture

23 cppx & prep
cppx: C/C++ fact extractor based on gcc
Extracts facts from one source file at a time
Facts represent program information as a series of triples
    $INSTANCE x integer == x is an integer
    inherit Student Person == Student inherits from Person
    call foo bar == foo calls bar
Produces .c.ta files, one per source file
Use -g option for gcc parameters

24 cppx & prep
Prep is a series of scripts written in Grok
Its function is to "clean up" facts from cppx so that they are in a form usable by the rest of the pipeline
Produces one .o.ta for each .ta
Can replace "manual" use of cppx & prep with gce
    Edit makefile, replace gcc with gce
    Type make

25 Grok
A simple scripting language
A relational algebraic calculator
Powerful in manipulating binary relations
Widely used in architecture transformation
Online documentation

26 Grok Features
Set operations
    Union (+), intersection (^), subtraction (-), cross-product (X)
Binary relation operations
    Union (+), intersection (^), subtraction (-), composition (o, *), projection (.), domain (dom), range (rng), identity (id), inverse (inv), entity (ent), transitive closure (+), and reflexive transitive closure (*)

27 Grok Features Cont.
Programming constructs
    if, else
    for, while
Arithmetic, comparison, logical operators
    +, -, *, /, %
    <, <=, ==, >=, >, !=
    !, &&, ||

28 Grok Scripts (1)
    $ Grok
    >> cat := {"Garfield", "Fluffy"}
    >> mouse := {"Mickey", "Nancy"}
    >> cheese := {"Roquefort", "Swiss"}
    >> animals := cat + mouse
    >> food := mouse + cheese
    >> animalsWhichAreFood := animals ^ food
    >> animalsWhichAreNotFood := animals - food
    >> animalsWhichAreFood
    Mickey
    Nancy
    >> animals - food
    Garfield
    Fluffy
    >> #food
    4
    >> mouse <= food
    True
    >>
    >> chase := cat X mouse
    >> chase
    Garfield Mickey
    Garfield Nancy
    Fluffy Mickey
    Fluffy Nancy
    >>
    >> eat := chase + mouse X cheese
    >> eat
    Garfield Mickey
    Garfield Nancy
    Fluffy Mickey
    Fluffy Nancy
    Mickey Roquefort
    Mickey Swiss
    Nancy Roquefort
    Nancy Swiss

29 Grok Scripts (2)
    >> {"Mickey"} . eat
    Roquefort
    Swiss
    >> eat . {"Mickey"}
    Garfield
    Fluffy
    >>
    >> eater := dom eat
    >> food := rng eat
    >> chasedBy := inv chase
    >> topOfFoodChain := dom eat - rng eat
    >> bottomOfFoodChain := rng eat - dom eat
    >> bothEatAndChase := eat ^ chase
    >> eatButNotChase := eat - chase
    >> chaseButNotEat := chase - eat
    >> secondOrderEat := eat o eat
    >> anyOrderEat := eat+
Programming constructs
    if expression { statements } else { statements }
    while expression { statements }
    for variable in expression { statements }
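For readers without Grok at hand, the relational operators used in these scripts can be approximated with Python sets of pairs. This is a minimal sketch, not Grok's implementation; the helper names (cross, compose, inverse, dom, rng, transitive_closure) are hypothetical.

```python
def cross(a, b):
    """Cross product of two sets, as a binary relation (Grok's X)."""
    return {(x, y) for x in a for y in b}

def compose(r, s):
    """Relational composition (Grok's o): x r y and y s z gives (x, z)."""
    return {(x, z) for (x, y1) in r for (y2, z) in s if y1 == y2}

def inverse(r):
    return {(y, x) for (x, y) in r}

def dom(r):
    return {x for (x, _) in r}

def rng(r):
    return {y for (_, y) in r}

def transitive_closure(r):
    """Smallest transitive relation containing r (Grok's postfix +)."""
    closure = set(r)
    while True:
        bigger = closure | compose(closure, r)
        if bigger == closure:
            return closure
        closure = bigger

cat = {"Garfield", "Fluffy"}
mouse = {"Mickey", "Nancy"}
cheese = {"Roquefort", "Swiss"}

chase = cross(cat, mouse)
eat = chase | cross(mouse, cheese)

top_of_food_chain = dom(eat) - rng(eat)
second_order_eat = compose(eat, eat)
any_order_eat = transitive_closure(eat)
```

Here second_order_eat relates each cat to each cheese, and any_order_eat is eat together with those pairs.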

30 A real example
Input: A containment tree
Output: A flattened version of the containment tree
    containFacts := $1
    getdb containFacts
    d := dom contain
    r := rng contain
    e := ent contain
    root := d - r
    leaves := r - d
    rootChildren := root . contain
    toKeep := leaves + rootChildren
    toDelete := e - toKeep
    cc := contain+
    delset toDelete
    delrel contain
    contain := cc
    relToFile contain $2
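The flattening script can be mirrored step by step in Python. This sketch assumes that delset toDelete removes every tuple mentioning a deleted entity, which is an interpretation of the script rather than a documented guarantee; flatten, transitive_closure and the sample tree are hypothetical.

```python
def transitive_closure(r):
    """Smallest transitive relation containing r (the script's contain+)."""
    closure = set(r)
    while True:
        step = {(x, z) for (x, y1) in closure for (y2, z) in r if y1 == y2}
        bigger = closure | step
        if bigger == closure:
            return closure
        closure = bigger

def flatten(contain):
    """Keep only the root's children and the leaves, linked directly."""
    d = {parent for (parent, _) in contain}
    r = {child for (_, child) in contain}
    root = d - r                       # contained by nothing
    leaves = r - d                     # contain nothing
    root_children = {c for (p, c) in contain if p in root}
    to_keep = leaves | root_children
    cc = transitive_closure(contain)
    # Assumed effect of 'delset toDelete': drop tuples touching deleted entities.
    return {(p, c) for (p, c) in cc if p in to_keep and c in to_keep}

# Hypothetical containment tree: system > {A, B}, A > A1 > {f1, f2}, B > f3.
contain = {
    ("system", "A"), ("system", "B"),
    ("A", "A1"), ("A1", "f1"), ("A1", "f2"),
    ("B", "f3"),
}
flat = flatten(contain)
```

The intermediate node A1 disappears and its files hang directly under A.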

31 linkplus
Function is to "link" all facts into one large graph
    Combine graphs from .o.ta files
    Resolve inter-compilation unit relationships
    Merge header files together
    Do some cleanup to shrink final graph
Usage: linkplus list_of_files_to_link
Produces out.ln.ta

32 layoutplus
Adds
    Clustering of facts based on contain.rsf (created manually or from a clustering algorithm)
    Layout information so that graph can be displayed
    Schema information
Usage: layoutplus contain_file out.ln.ta
Produces out.ls.ta

33 lsedit
View software landscape produced by previous parts of the pipeline
Can make changes to landscape and save them
Usage: lsedit out.ls.ta

34 Program Representation
Fundamental issue in re-engineering
Provides means to generate abstractions
Provides input to a computational model for analyzing and reasoning about programs
Provides means for translation and normalization of programs

35 Key questions
What are the strengths and weaknesses of various representations of programs?
What levels of abstraction are useful?

36 Abstract Syntax Trees
A translation of the source text in terms of operands and operators
Omits superficial details, such as comments, whitespace
All necessary information to generate further abstractions is maintained

37 AST production
Four necessary elements to produce an AST:
Lexical analyzer (turn input strings into tokens)
Grammar (turn tokens into a parse tree)
Domain Model (defines the nodes and arcs allowable in the AST)
Linker (annotates the AST with global information, e.g. data types, scoping etc.)

38 AST example
Input string: 1 + /* two */ 2
Parse Tree: + node with children 1 and 2
AST (without global info): Add node with arcs arg1 and arg2, each leading to an int node (values 1 and 2)
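The same idea can be observed with Python's standard ast module. As an assumption for illustration, a Python trailing comment stands in for the slide's /* two */ comment; describe is a hypothetical helper that prints the tree in a shape close to the slide's Add/int picture.

```python
import ast

# Python's '#' comment stands in for the slide's /* two */ comment.
tree = ast.parse("1 + 2  # two", mode="eval")
expr = tree.body  # root expression node

def describe(node):
    """Return a compact nested-tuple view of a small arithmetic AST."""
    if isinstance(node, ast.BinOp):
        return (type(node.op).__name__, describe(node.left), describe(node.right))
    if isinstance(node, ast.Constant):
        return (type(node.value).__name__, node.value)
    raise ValueError("unsupported node: " + type(node).__name__)

summary = describe(expr)
```

The comment leaves no trace in the tree: only the operator and its two typed operands remain.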

39 Program Transformation
A program is a structured object with semantics
Structure allows us to transform a program
Semantics allow us to compare programs and decide on the validity of transformations

40 Program Transformation
The act of changing one program into another (from a source language to a target language)
Used in many areas of software engineering:
    Compiler construction
    Software visualization
    Documentation generation
    Automatic software renovation

41 Application examples
Converting to a new language dialect
Migrating from a procedural language to an object-oriented one, e.g. C to C++
Adding code comments
Requirement upgrading, e.g. using 4 digits for years instead of 2 (Y2K)
Structural improvements, e.g. changing GOTOs to control structures
Pretty printing

42 Simple program transformation
Modify all arithmetic expressions to reduce the number of parentheses using the formula:
    (a+b)*c = a*c + b*c
    x := (2+5)*3
becomes
    x := 2*3 + 5*3
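This transformation can be sketched as an AST rewrite using Python's ast.NodeTransformer (ast.unparse requires Python 3.9 or later). The example works on Python source, so the assignment is written with = rather than :=; Distribute and distribute are hypothetical names, and only the left-distribution case from the formula is handled.

```python
import ast
import copy

class Distribute(ast.NodeTransformer):
    """Rewrite (a + b) * c into a * c + b * c."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite sub-expressions first
        if (isinstance(node.op, ast.Mult)
                and isinstance(node.left, ast.BinOp)
                and isinstance(node.left.op, ast.Add)):
            a, b, c = node.left.left, node.left.right, node.right
            return ast.BinOp(
                left=ast.BinOp(left=a, op=ast.Mult(), right=c),
                op=ast.Add(),
                # c now appears twice, so the second use gets its own copy
                right=ast.BinOp(left=b, op=ast.Mult(), right=copy.deepcopy(c)),
            )
        return node

def distribute(source):
    """Parse source, apply the rewrite everywhere, and print it back."""
    tree = Distribute().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)

before = "x = (2 + 5) * 3"
after = distribute(before)
```

Because the rewrite happens on the tree, the parentheses vanish on output simply because the new structure no longer needs them.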

43 Two types of transformations
Translation
    Source and target language are different
    Semantics remain the same
Rephrasing
    Source and target language are the same
    Goal is to improve some aspect of the program such as its understandability or performance
    Semantics might change

44 Translation
Program synthesis
    Lowers the level of abstraction, e.g. compilation
Program migration
    Transform to a different language
Reverse Engineering
    Raises the level of abstraction, e.g. create architectural descriptions from the source code
Program Analysis
    Reduces the program to one aspect, e.g. control flow

45 Translation taxonomy

46 Rephrasing
Program normalization
    Decreases syntactic complexity (desugaring), e.g. algebraic simplification of expressions
Program optimization
    Improves performance, e.g. inlining, common-subexpression and dead code elimination

47 Rephrasing
Program refactoring
    Improves the design by restructuring while preserving the functionality
Program obfuscation
    Deliberately makes the program harder to understand
Software renovation
    Fixes bugs such as Y2K

48 Transformation tools
There are many transformation tools
Program-Transformation.org lists 90 of them
Most are based on term rewriting
Other solutions use functional programming, lambda calculus, etc.

49 Term rewriting
The process of simplifying symbolic expressions (terms) by means of a Rewrite System, i.e. a set of Rewrite Rules.
A Rewrite Rule is of the form
    lhs → rhs
where lhs and rhs are term patterns

50 Example Rewrite System
    0 + x → x
    s(x) + y → s(x + y)
    (x + y) + z → x + (y + z)
Under these rewrite rules, the term ((s(s(a)) + s(b)) + c) will be rewritten as s(s(a + s(b + c))). Reaching s(s(s(a + (b + c)))) would additionally require a rule such as x + s(y) → s(x + y).
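A small rewriter makes the three rules executable. In this sketch terms are nested tuples, ('+', l, r) and ('s', x), with atoms as strings; step and normalize are hypothetical names, and the strategy (normalize arguments first, then the root) is one choice among several. Run on the slide's term, it stops at s(s(a + s(b + c))), because none of the three rules matches a + s(...); moving the last s outward would need an extra rule such as x + s(y) → s(x + y), which is an assumption and not part of the system above.

```python
def step(term):
    """Apply one rewrite rule at the root; return None if no rule matches."""
    if isinstance(term, tuple) and term[0] == "+":
        _, left, right = term
        if left == "0":                                   # 0 + x -> x
            return right
        if isinstance(left, tuple) and left[0] == "s":    # s(x) + y -> s(x + y)
            return ("s", ("+", left[1], right))
        if isinstance(left, tuple) and left[0] == "+":    # (x + y) + z -> x + (y + z)
            return ("+", left[1], ("+", left[2], right))
    return None

def normalize(term):
    """Rewrite arguments first, then the root, until no rule applies."""
    if isinstance(term, tuple):
        term = (term[0],) + tuple(normalize(arg) for arg in term[1:])
    rewritten = step(term)
    return normalize(rewritten) if rewritten is not None else term

def show(term):
    """Render a term in the slide's notation."""
    if isinstance(term, tuple) and term[0] == "+":
        return "(" + show(term[1]) + " + " + show(term[2]) + ")"
    if isinstance(term, tuple) and term[0] == "s":
        return "s(" + show(term[1]) + ")"
    return term

# ((s(s(a)) + s(b)) + c)
start = ("+", ("+", ("s", ("s", "a")), ("s", "b")), "c")
normal_form = normalize(start)
```

The same normal form is reached if the outermost sum is rewritten first.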

51 TXL
A generalized source-to-source translation system
Uses a context-free grammar to describe the structures to be transformed
Rule specification uses a by-example style
Has been used to process billions of lines of code for Y2K purposes

52 TXL programs
TXL programs consist of two parts:
    Grammar for the input language
    Transformation Rules
Let's look at some examples…

53 Calculator.Txl - Grammar
    % Part I. Syntax specification
    define program
        [expression]
    end define

    define expression
        [term]
      | [expression] [addop] [term]
    end define

    define term
        [primary]
      | [term] [mulop] [primary]
    end define

    define primary
        [number]
      | ( [expression] )
    end define

    define addop
        '+ | '-
    end define

    define mulop
        '* | '/
    end define

54 Calculator.Txl - Rules
    % Part 2. Transformation rules
    rule main
        replace [expression]
            E [expression]
        construct NewE [expression]
            E [resolveAddition] [resolveSubtraction] [resolveMultiplication] [resolveDivision] [resolveParentheses]
        where not
            NewE [= E]
        by
            NewE
    end rule

    rule resolveAddition
        replace [expression]
            N1 [number] + N2 [number]
        by
            N1 [+ N2]
    end rule

    rule resolveSubtraction …
    rule resolveMultiplication …
    rule resolveDivision …

    rule resolveParentheses
        replace [primary]
            ( N [number] )
        by
            N
    end rule

55 DotProduct.Txl
    % Form the dot product of two vectors,
    % e.g., (1 2 3).(3 2 1) => 10
    define program
        ( [repeat number] ) . ( [repeat number] )
      | [number]
    end define

    rule main
        replace [program]
            ( V1 [repeat number] ) . ( V2 [repeat number] )
        construct Zero [number]
            0
        by
            Zero [addDotProduct V1 V2]
    end rule

    rule addDotProduct V1 [repeat number] V2 [repeat number]
        deconstruct V1
            First1 [number] Rest1 [repeat number]
        deconstruct V2
            First2 [number] Rest2 [repeat number]
        construct ProductOfFirsts [number]
            First1 [* First2]
        replace [number]
            N [number]
        by
            N [+ ProductOfFirsts] [addDotProduct Rest1 Rest2]
    end rule

56 Sort.Txl
    % Sort.Txl - simple numeric bubble sort
    define program
        [repeat number]
    end define

    rule main
        replace [repeat number]
            N1 [number] N2 [number] Rest [repeat number]
        where
            N1 [> N2]
        by
            N2 N1 Rest
    end rule
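The sorting rule works because a TXL rule is reapplied until it no longer matches anywhere. The Python sketch below imitates that behaviour on a list: it keeps swapping any adjacent out-of-order pair until a full pass finds none. rewrite_sort is a hypothetical name, and the scan order is an assumption, not TXL's actual search strategy.

```python
def rewrite_sort(numbers):
    """Swap adjacent out-of-order pairs until no such pair remains."""
    items = list(numbers)
    changed = True
    while changed:          # repeat until the 'rule' matches nowhere
        changed = False
        for i in range(len(items) - 1):
            if items[i] > items[i + 1]:      # where N1 [> N2]
                items[i], items[i + 1] = items[i + 1], items[i]  # by N2 N1 Rest
                changed = True
    return items

sorted_numbers = rewrite_sort([3, 1, 2, 5, 4])
```

The fixed point of the rule is exactly a list with no adjacent inversion, which is a sorted list.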

57 Other TXL constructs
    compounds
        -> :=
    end compounds

    keys
        var procedure exists inout out
    end keys

    function isAnAssignmentTo X [id]
        match [statement]
            X := Y [expression]
    end function

58 www.txl.ca
Guided Tour
Many examples
Reference manual
Download TXL for many platforms

59 Example uses
HTML Pretty Printing of Source Code
Language to Language Translation
Design Recovery from Source
Improvement of security problems
Program instrumentation and measurement
Logical formula simplification and interpretation

