Reverse Engineering State Machines by Interactive Grammar Inference
Neil Walkinshaw, Kirill Bogdanov, Mike Holcombe, Sarah Salahuddin

Presentation transcript:

State Machines
Used to model software behaviour. Uses:
– Documentation
– Inspection / review
– Model-based testing
– Model checking
[Figure: example state machine with transitions load, edit, save as, ok, close, exit]
Only useful if complete and up to date. This is usually not the case, due to time constraints and software evolution.

Reverse Engineering State Machines
Static analysis
– Analysis of source code
– Symbolic execution, flow analyses, ...
– Inevitably considers executions that are infeasible in practice
Dynamic analysis
– Infers the model from sample executions
– Favoured for accuracy
– States are considered equal if their subsequent traces are similar
– Variants of the k-tails algorithm [Biermann, Feldman 1972] are the most common reverse-engineering algorithms

Traditional Approach
For any point in a trace, its k-tail is the sequence of k events or functions that follows it.
– Point x is considered equivalent to point y if their k-tails are equal
[Figure: with k=2, a trace over the events load, edit, save_as and ok is folded into a state machine by merging equivalent points; non-determinism is then removed]
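
The k-tail equivalence described above can be sketched in a few lines of Python. This is a simplified reading of the slide's definition, not the authors' tool: each point in a trace is identified by the next k events, points with the same k-tail share a state, and tails near the end of a trace are simply shorter than k. The function name `k_tails` and the sample trace are illustrative assumptions.

```python
from collections import defaultdict

def k_tails(traces, k=2):
    """Merge trace points whose next k events are identical (simplified k-tails)."""
    states = defaultdict(list)       # k-tail -> list of (trace index, position)
    transitions = defaultdict(set)   # (k-tail, event) -> set of successor k-tails
    for t, trace in enumerate(traces):
        for pos in range(len(trace) + 1):
            tail = tuple(trace[pos:pos + k])
            states[tail].append((t, pos))
            if pos < len(trace):
                nxt = tuple(trace[pos + 1:pos + 1 + k])
                transitions[(tail, trace[pos])].add(nxt)
    return states, transitions

# Illustrative trace (an assumption, not taken from the slides).
trace = ["load", "edit", "edit", "edit", "save_as", "ok"]
states, transitions = k_tails([trace], k=2)

# Positions 1 and 2 are both followed by (edit, edit), so they are merged.
merged = states[("edit", "edit")]
# The merged state has two successors on "edit": this is the non-determinism
# that the final step of the traditional approach has to remove.
ambiguous = transitions[(("edit", "edit"), "edit")]
```

With k=2 the seven points of the sample trace collapse into six states, and the merged state is non-deterministic on `edit`, which is why a determinisation step follows the merge.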

Problems
Too expensive if the result is to be correct and complete:
– Needs a complete set of executions up to a certain length
– Passive: all executions need to be presented at once
If the provided traces are only partial (probable for a non-trivial system), the resulting model is untrustworthy:
– Difficult to tell how complete the model is; what is missing?
[Figure: two state machines, one with transitions load, save_as, edit, ok and one with load, exit, close, edit, save as, ok]

Regular Grammar Inference
Given a set of valid and (optionally) invalid sentences from a language, infer its grammar.
Regular grammars can be represented as deterministic finite state machines.
The problem of regular grammar inference is therefore equivalent to that of reverse engineering state machines.
Several sophisticated grammar inference techniques exist.
– They effectively address many problems that arise with current reverse-engineering approaches

Benefits of Adapting Grammar Inference Techniques
Active techniques
– Do not require the set of executions to be presented at once
– Interact with an oracle to identify missing information
More efficient
– Can efficiently process large sample sets
Reasonably accurate given sparse sets of executions
– More sophisticated heuristics to accurately identify equivalent states

Query-Driven State Merging (QSM)
Devised by Dupont et al.
Combines the benefits mentioned on the previous slide
– Active, efficient, reasonably accurate for sparse sets of sample executions
Guaranteed to produce the correct machine if the set of sample executions is characteristic:
– Must cover every transition in the target grammar
– Must contain enough positive and negative samples to differentiate between different states (to prevent false merges)
Questions aim to elicit a characteristic sample from the oracle.

Query-Driven State Merging (QSM)
– Generate a “Prefix Tree Acceptor” from the sample executions
– Attempt a merge
– Produce questions (executions valid in the merged machine, but not in the unmerged version)
– If all questions are answered yes, merge the nodes
– Otherwise, add the questions answered no to the graph as negative traces
[Figure: prefix tree acceptor over the events load, edit, save_as, ok, close, exit, showing a candidate merge and the sequence close, edit]
Active. Efficient. Accepts negative information about the model.
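
The first steps of this loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the QSM implementation: the class `PTA`, the helper `questions_for_merge`, and the sample traces are hypothetical, traces are assumed to be prefix-closed (every prefix of a valid trace is valid), and questions are approximated as "path to the kept state + observed continuations of the merged-away state" that have not been seen yet.

```python
class PTA:
    """Prefix tree acceptor: one state per distinct trace prefix."""

    def __init__(self):
        self.children = [{}]    # state id -> {event: child state id}
        self.label = [True]     # True = valid, False = invalid, None = unknown
        self.prefix = [()]      # the unique path from the root to each state

    def add(self, trace, positive=True):
        state = 0
        for i, event in enumerate(trace):
            child = self.children[state].get(event)
            if child is None:
                child = len(self.children)
                self.children.append({})
                self.label.append(None)
                self.prefix.append(tuple(trace[:i + 1]))
                self.children[state][event] = child
            state = child
            if positive:
                self.label[state] = True   # assumption: prefix-closed language
        if not positive:
            self.label[state] = False      # only the full trace is rejected
        return state

    def lookup(self, trace):
        state = 0
        for event in trace:
            state = self.children[state].get(event)
            if state is None:
                return None
        return state

    def suffixes(self, state):
        """Event sequences leading from `state` to a state known to be valid."""
        out, stack = [], [(state, ())]
        while stack:
            s, path = stack.pop()
            for event, child in self.children[s].items():
                new_path = path + (event,)
                if self.label[child]:
                    out.append(new_path)
                stack.append((child, new_path))
        return out

def questions_for_merge(pta, keep, merge_away):
    """Unseen executions that would become valid if the two states were merged."""
    candidates = {pta.prefix[keep] + s for s in pta.suffixes(merge_away)}
    return sorted(c for c in candidates if pta.lookup(c) is None)

pta = PTA()
pta.add(["load", "edit", "save_as", "ok"])
pta.add(["load", "close"])
after_load = pta.lookup(["load"])
after_edit = pta.lookup(["load", "edit"])
questions = questions_for_merge(pta, after_load, after_edit)

# If the oracle answers "no" to a question, it is stored as a negative trace.
rejected = pta.add(questions[0], positive=False)
```

Here merging the state after `load` with the state after `load edit` would make `load save_as` valid, so that sequence is put to the oracle; a "no" answer is recorded in the tree and blocks the merge.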

Implementation
Use Eclipse TPTP to record traces
– Sequence of method calls
Questions can be answered either manually or by running them as tests directly against the system
– Can vary the number of questions generated
The QSM component accepts simple text files of strings (prefixed with “+” and “-”)
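
The slides do not specify the file layout beyond the “+” and “-” prefixes. The sketch below assumes one trace per line, with the sign first and the events separated by whitespace; both the layout and the function name `parse_traces` are assumptions made for illustration.

```python
def parse_traces(text):
    """Split a '+'/'-' prefixed trace listing into positive and negative traces.

    Assumed layout (not confirmed by the slides): one trace per line,
    sign first, events separated by whitespace.
    """
    positive, negative = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        sign, events = line[0], line[1:].split()
        if sign == "+":
            positive.append(events)
        elif sign == "-":
            negative.append(events)
        else:
            raise ValueError(f"line must start with '+' or '-': {raw!r}")
    return positive, negative

sample = "+ load edit save_as ok\n- load save_as\n"
positive, negative = parse_traces(sample)
```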

Evaluation
Used traces to generate the JHotDraw case study
– Described in the paper
Generated random state machines
– Subject to certain constraints: minimal, deterministic, etc.
– Three sets of 10 random machines (5, 25, 50 states)
– Random paths over these machines = initial set of traces
– Measured the accuracy of the final machine and the number of questions required
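
Drawing random paths over a known machine to obtain an initial trace set can be sketched as below. The small editor machine, the function names, and the parameters (`count`, `max_len`, `seed`) are assumptions for illustration; the slides do not say how the random machines or paths were actually generated.

```python
import random

def random_traces(transitions, start, count, max_len, seed=0):
    """Draw `count` random walks of at most `max_len` events from `start`."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    traces = []
    for _ in range(count):
        state, trace = start, []
        for _ in range(rng.randint(1, max_len)):
            options = sorted(transitions.get(state, {}).items())
            if not options:
                break  # dead end: no outgoing transitions
            event, state = rng.choice(options)
            trace.append(event)
        traces.append(trace)
    return traces

def accepts(transitions, start, trace):
    """True if the deterministic machine can follow every event of the trace."""
    state = start
    for event in trace:
        state = transitions.get(state, {}).get(event)
        if state is None:
            return False
    return True

# Hypothetical target machine, loosely modelled on the editor example.
machine = {
    "init": {"load": "loaded", "exit": "done"},
    "loaded": {"edit": "edited", "close": "init"},
    "edited": {"edit": "edited", "save_as": "saving", "close": "init"},
    "saving": {"ok": "loaded"},
}
traces = random_traces(machine, "init", count=20, max_len=8)
```

Every walk is a valid execution of the target machine by construction, so these traces serve as the positive samples; the `accepts` helper can later be reused to compare an inferred machine against the target on fresh walks.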

Current and Future Work
Identify data constraints associated with states
– Can use tools such as Daikon
Automatically answer queries
– Static analysis: using call graph analysis to automatically propose negative / impossible executions
– Automated test generation
Heuristics
– Can certain questions be safely ignored?

Conclusions
Preliminary results show the technique is reasonably accurate and efficient
Can potentially be almost entirely automated
– Automatically generates tests (questions), many of which can be eliminated by static analysis anyway
Grammar inference is a useful source of ideas for dynamic analysis and reverse engineering