Reverse Engineering State Machines by Interactive Grammar Inference Neil Walkinshaw, Kirill Bogdanov, Mike Holcombe, Sarah Salahuddin.


1 Reverse Engineering State Machines by Interactive Grammar Inference Neil Walkinshaw, Kirill Bogdanov, Mike Holcombe, Sarah Salahuddin

2 State Machines
Used to model software behaviour
[Diagram: state machine with transitions load, edit, save_as, ok, close, exit]
– Documentation
– Inspection / review
– Model-based testing
– Model checking

3 State Machines (continued)
Only useful if complete and up to date
Usually not the case, due to time constraints and software evolution

4 Reverse Engineering State Machines
Static analysis
– Analysis of source code
– Symbolic execution, flow analyses, ...
– Inevitably considers executions that are infeasible in practice
Dynamic analysis
– Infer model from sample executions
– Favoured for accuracy
– States considered equal if their subsequent traces are similar
– Variants of the k-tails algorithm [Biermann, Feldman 1972] are the most common reverse engineering algorithms

5 Traditional Approach
For any point in a trace, its k-tail is the sequence of the next k events or functions
– Point x is considered equivalent to point y if their k-tails are equal
[Diagram: trace load, edit, save_as, ok, edit]

6 Traditional Approach (continued)
K=2

7 Traditional Approach (continued)
K=2
[Diagram: resulting machine with transitions load, edit, save_as, edit, ok]

8 Traditional Approach (continued)
Remove non-determinism
[Diagram: resulting machine with transitions load, save_as, edit, ok]
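
The k-tail definition above can be sketched in a few lines of Python. This is a minimal illustration of the simplified definition on the slide (one tail per trace point), not the full Biermann and Feldman algorithm; the function names `k_tail` and `k_tails_machine` are hypothetical, and each state is simply identified by its k-tail.

```python
from collections import defaultdict


def k_tail(trace, i, k):
    """Return the next k events after point i (shorter near the end of the trace)."""
    return tuple(trace[i:i + k])


def k_tails_machine(traces, k):
    """Treat points with equal k-tails as one state and collect the transitions.

    Returns a dict mapping (state, event) to the set of target states.
    A set with more than one element signals non-determinism.
    """
    transitions = defaultdict(set)
    for trace in traces:
        for i, event in enumerate(trace):
            source = k_tail(trace, i, k)      # state before the event
            target = k_tail(trace, i + 1, k)  # state after the event
            transitions[(source, event)].add(target)
    return dict(transitions)


trace = ["load", "edit", "save_as", "ok", "edit"]
machine = k_tails_machine([trace], k=1)
```

With k=1 the two points that precede an `edit` event share the same tail, so they collapse into one state whose `edit` transition has two possible targets. That is the kind of non-determinism the final step on the slide removes.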

9 Problems
Too expensive if the result is to be correct and complete:
– Need complete set of executions up to a certain length
– Passive: all executions need to be presented at once
If the provided traces are only partial (probable for a non-trivial system), the resulting model is untrustworthy
– Difficult to tell how complete the model is: what’s missing?
[Diagram: inferred machine (load, save_as, edit, ok) compared with the actual machine (load, exit, close, edit, save_as, ok)]

10 Regular Grammar Inference
Given a set of valid and (optionally) invalid sentences from a language, infer its grammar
Regular grammars can be represented as deterministic finite state machines
The problem of regular grammar inference is equivalent to that of reverse engineering state machines
Several sophisticated grammar inference techniques exist
– They effectively address many problems that arise with current reverse-engineering approaches

11 Benefits of Adapting Grammar Inference Techniques
Active techniques
– Do not require the set of executions to be presented at once
– Interact with an oracle to identify missing information
More efficient
– Can efficiently process large sample sets
Reasonably accurate given sparse sets of executions
– More sophisticated heuristics to accurately identify equivalent states

12 Query-Driven State Merging (QSM)
Devised by Dupont et al.
Combines the benefits mentioned on the previous slide
– Active, efficient, reasonably accurate for sparse sets of sample executions
Guaranteed to produce the correct machine if the set of sample executions is characteristic:
– Must cover every transition in the target grammar
– Enough positive and negative samples to differentiate between different states (to prevent false merges)
– Questions aim to elicit a characteristic sample from the oracle

13 Query-Driven State Merging (QSM)
Generate “Prefix Tree Acceptor”
[Diagram: prefix tree with transitions load, close, exit, edit, save_as, ok, close, exit, edit, close, exit]
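
A prefix tree acceptor can be built by walking each trace from the root and adding a new state whenever a prefix has not been seen before. The sketch below is one possible construction, not the authors' implementation. It assumes that every prefix of a valid trace is itself valid and that only the last event of an invalid trace is rejected; the name `build_pta` and the example traces are illustrative.

```python
def build_pta(positive, negative=()):
    """Build a prefix tree acceptor from lists of events.

    Returns (transitions, accepting) where transitions maps (state, event)
    to a child state and accepting maps each state to True (valid) or
    False (invalid). State 0 is the root.
    """
    transitions = {}
    accepting = {0: True}

    def add(trace, valid):
        state = 0
        for event in trace:
            key = (state, event)
            if key not in transitions:
                # Unseen prefix: create a fresh state for it.
                new_state = len(accepting)
                transitions[key] = new_state
                accepting[new_state] = True
            state = transitions[key]
        # Only the state reached by the whole trace takes the trace's label.
        accepting[state] = valid

    for trace in positive:
        add(trace, True)
    for trace in negative:
        add(trace, False)
    return transitions, accepting


transitions, accepting = build_pta(
    positive=[["load", "close", "exit"],
              ["load", "edit", "save_as", "ok", "close", "exit"]],
)
```

Both example traces start with `load`, so they share the first transition and branch afterwards, which gives the tree shape shown on the slide.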

14 Query-Driven State Merging (QSM)
Attempt merge
Produce questions (executions valid in this machine, but not in the unmerged version)
[Diagram: prefix tree with transitions load, close, exit, edit, save_as, ok, close, exit, edit, close, exit; candidate merge marked “?”]

15 Query-Driven State Merging (QSM)
Attempt merge
Produce questions (executions valid in this machine, but not in the unmerged version)
If all questions are answered yes, merge the nodes
Else add the negatively answered questions to the graph
[Diagram: prefix tree with transitions load, close, exit, edit, save_as, ok, close, exit, edit, close, exit; close, edit]
– Active
– Efficient
– Accepts negative information about the model
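
The merge-and-ask step can be sketched as follows. This is a simplified illustration under stated assumptions, not the QSM algorithm as published: questions are found by enumerating all traces up to a hypothetical bound `max_len` rather than by the question-generation rule of Dupont et al., the oracle is any function that returns True for a valid trace, and state 0 is assumed to be the start state and is never dropped. All function names are illustrative.

```python
def merge(transitions, keep, drop):
    """Merge state `drop` into `keep`, folding successors to stay deterministic."""
    alias = {}

    def find(state):
        while state in alias:
            state = alias[state]
        return state

    pending = [(keep, drop)]
    while pending:
        a, b = (find(s) for s in pending.pop())
        if a == b:
            continue
        alias[b] = a
        folded = {}
        for (src, event), dst in transitions.items():
            key, dst = (find(src), event), find(dst)
            if key in folded and folded[key] != dst:
                # Two targets for one event: merge the targets as well.
                pending.append((folded[key], dst))
            else:
                folded[key] = dst
        transitions = folded
    return transitions


def traces_up_to(transitions, max_len, start=0):
    """All event sequences of length <= max_len the machine can follow."""
    found, frontier = set(), [((), start)]
    for _ in range(max_len):
        next_frontier = []
        for trace, state in frontier:
            for (src, event), dst in transitions.items():
                if src == state:
                    longer = trace + (event,)
                    found.add(longer)
                    next_frontier.append((longer, dst))
        frontier = next_frontier
    return found


def try_merge(transitions, keep, drop, oracle, max_len=4):
    """Attempt one merge; return (machine, rejected questions)."""
    candidate = merge(transitions, keep, drop)
    # Questions: executions valid in the merged machine but not the unmerged one.
    questions = traces_up_to(candidate, max_len) - traces_up_to(transitions, max_len)
    rejected = [q for q in sorted(questions) if not oracle(q)]
    if rejected:
        return transitions, rejected  # keep the unmerged machine
    return candidate, []
```

A caller would add each rejected question to the graph as a negative trace before trying the next pair of states, which is how the negative information mentioned on the slide accumulates.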

16 Implementation
Use Eclipse TPTP to record traces
– Sequence of method calls
Questions can either be answered manually
– OR run as tests directly against the system
– Can vary the number of questions generated
QSM component accepts simple text files of strings (prefixed with “+” and “-”)
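
Reading such a trace file might look like the sketch below. The slide only states that strings are prefixed with “+” and “-”, so the rest of the layout is an assumption: one trace per line, with events separated by whitespace. The function name `parse_traces` is hypothetical.

```python
def parse_traces(text):
    """Split a trace file into positive and negative traces.

    Assumed layout (not confirmed by the slides): one trace per line,
    a leading "+" or "-" sign, then whitespace-separated event names.
    """
    positive, negative = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        sign, events = line[0], line[1:].split()
        if sign == "+":
            positive.append(events)
        elif sign == "-":
            negative.append(events)
        else:
            raise ValueError(f"trace must start with '+' or '-': {line!r}")
    return positive, negative


sample = """
+ load edit save_as ok
+ load close exit
- load exit
"""
positive, negative = parse_traces(sample)
```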

17

18 Evaluation
Used traces to generate a JHotDraw case study
– Described in the paper
Generated random state machines
– Subject to certain constraints: minimal, deterministic, etc.
– Three sets of 10 random machines (5, 25, 50 states)
– Random paths over these machines = initial set of traces
– Measured accuracy of the final machine, and the number of questions required
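
The random-machine setup can be approximated as in the sketch below. It is only an illustration of the idea: the slides do not give the generator, so the parameters `out_degree` and `length`, the fixed seed, and the event names are all assumptions. The machine is deterministic by construction, but the sketch does not enforce minimality or reachability.

```python
import random


def random_machine(n_states, alphabet, out_degree, rng):
    """Random deterministic machine: each state gets `out_degree` distinct events."""
    transitions = {}
    for state in range(n_states):
        for event in rng.sample(alphabet, out_degree):
            transitions[(state, event)] = rng.randrange(n_states)
    return transitions


def random_path(transitions, length, rng, start=0):
    """Random walk over the machine, returned as a list of events."""
    state, path = start, []
    for _ in range(length):
        options = [(e, d) for (s, e), d in transitions.items() if s == state]
        if not options:
            break  # dead end: no outgoing transitions
        event, state = rng.choice(options)
        path.append(event)
    return path


rng = random.Random(0)  # fixed seed so the experiment is repeatable
alphabet = ["load", "edit", "save_as", "ok", "close", "exit"]
machine = random_machine(5, alphabet, out_degree=2, rng=rng)
traces = [random_path(machine, length=6, rng=rng) for _ in range(10)]
```

The resulting `traces` play the role of the initial set of positive traces, and the generating machine can serve as the oracle when questions are asked.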

19

20

21 Current and Future Work
Identify data constraints associated with states
– Can use tools such as Daikon
Automatically answer queries
– Static analysis: using call graph analysis to automatically propose negative / impossible executions
– Automated test generation
Heuristics
– Can certain questions be safely ignored?

22 Conclusions
Preliminary results show the technique is reasonably accurate and efficient
Can potentially be almost entirely automated
– Automatically generates tests (questions), many of which can be eliminated by static analysis anyway
Grammar inference is a useful source of ideas for dynamic analysis and reverse engineering

