Generating Program Inputs for Database Application Testing. Kai Pan, Xintao Wu (University of North Carolina at Charlotte); Tao Xie (North Carolina State University).

Presentation transcript:

Generating Program Inputs for Database Application Testing. Kai Pan, Xintao Wu (University of North Carolina at Charlotte); Tao Xie (North Carolina State University). 26th IEEE/ACM International Conference on Automated Software Engineering, Nov 11, 2011, Lawrence, Kansas

2 Background: Functional Testing, Test Generation, Program Inputs

3 Background: Functional Testing, Test Generation, Program Inputs, Database States

4 An Example: Program Inputs, Database

Motivation 5

6 Benefits of using an existing database state: it represents real-world objects’ characteristics, helping detect faults that could cause failures in real-world settings; it reduces the cost of generating new database records.

7 Dynamic Symbolic Execution (DSE): Execute the program both concretely and symbolically (also called concolic testing). Collect constraints along the executed path as the path condition. Negate part of the path condition and solve the new path condition to reach a new path. DSE tools exist for various programming languages, e.g., Pex for .NET from Microsoft Research.
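The explore-negate-solve loop described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two branch predicates, the tiny input domain, and the brute-force `solve` function are hypothetical stand-ins (a real DSE tool such as Pex tracks symbolic expressions and calls a constraint solver instead of enumerating candidates).

```python
from itertools import product

# Hypothetical branch predicates over the inputs (type, zip); not the slides' program.
BRANCHES = [
    lambda type_, zip_: type_ == 0,
    lambda type_, zip_: zip_ + 1 == 27695,
]

def execute(inputs):
    """Concrete run: return the executed path as a tuple of branch outcomes."""
    return tuple(pred(*inputs) for pred in BRANCHES)

def solve(target, domain):
    """Brute-force stand-in for a constraint solver: find inputs whose
    first len(target) branch outcomes equal `target`."""
    for candidate in domain:
        if execute(candidate)[:len(target)] == target:
            return candidate
    return None

def explore(seed, domain):
    """Flip one branch outcome at a time until no new path is found."""
    covered, worklist = {}, [seed]
    while worklist:
        inputs = worklist.pop()
        path = execute(inputs)
        if path in covered:
            continue
        covered[path] = inputs
        for i in range(len(path)):
            # Keep the prefix, negate branch i, and ask the "solver" for inputs.
            target = path[:i] + (not path[i],)
            new_inputs = solve(target, domain)
            if new_inputs is not None:
                worklist.append(new_inputs)
    return covered

DOMAIN = list(product([0, 1], [0, 27694]))
paths = explore((0, 0), DOMAIN)
for path, inputs in paths.items():
    print(path, inputs)
```

The sketch only works because the domain already contains a useful zip value; when the right value lives in a database rather than in the program text, that assumption fails.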

Motivation 8 Path Condition: C1: Query construction constraints

Motivation 9 Path Condition: C1: Query construction constraints C2: Query/DB constraints

Motivation 10 Path Condition: C1: Query construction constraints C2: Query/DB constraints C3: Result manipulation constraints

Motivation 11 Path Condition: C1: Query construction constraints C2: Query/DB constraints C3: Result manipulation constraints C1 ^ C2 ^ C3

Motivation 12 Path Condition: C1: Query construction constraints C2: Query/DB constraints C3: Result manipulation constraints C1 ^ C2 ^ C3 A hard part

Motivation 13 How to derive high-covering program input values based on a given database state?

Outline Background Approach Evaluation Conclusion and future work 14

SQL query forms Fundamental structure: SELECT, FROM, WHERE, GROUP BY, and HAVING clauses. SELECT select-list FROM from-list WHERE qualification (GROUP BY grouping-list) (HAVING group-qualification) 15

16 SQL query forms (cont’d) Nested query: a query with another query embedded within it. A nested query can be unnested into an equivalent single-level canonical query by transformation rules.
A nested query: SELECT S.sname FROM Sailors S WHERE EXISTS (SELECT * FROM Reserves R WHERE R.bid=103 AND R.sid=S.sid)
Its canonical form: SELECT S.sname FROM Sailors S, Reserves R WHERE R.sid=S.sid AND R.bid=103
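One way to sanity-check the unnesting is to run both forms on a small database. The sketch below uses Python's built-in sqlite3 module with made-up Sailors/Reserves rows; the rows and the helper name `names` are assumptions, not taken from the slides. The join form can return the same sailor more than once, so the comparison is made on sets of names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER, sname TEXT);
    CREATE TABLE Reserves (sid INTEGER, bid INTEGER);
    -- Hypothetical rows for illustration only.
    INSERT INTO Sailors VALUES (1, 'Dustin'), (2, 'Lubber'), (3, 'Horatio');
    INSERT INTO Reserves VALUES (1, 103), (1, 103), (2, 101), (3, 103);
""")

NESTED = """
    SELECT S.sname FROM Sailors S
    WHERE EXISTS (SELECT * FROM Reserves R
                  WHERE R.bid = 103 AND R.sid = S.sid)
"""
CANONICAL = """
    SELECT S.sname FROM Sailors S, Reserves R
    WHERE R.sid = S.sid AND R.bid = 103
"""

def names(query):
    """Return the distinct sailor names produced by a query."""
    return {row[0] for row in conn.execute(query)}

nested_names = names(NESTED)
canonical_names = names(CANONICAL)
print(nested_names == canonical_names)
```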

SQL query forms of focus WHERE clause consisting of a disjunction of conjunctions SELECT C1, C2,..., Ch FROM from-list WHERE (A11 AND... AND A1n) OR... OR (Am1 AND... AND Amn) 17

Outline Background Approach Evaluation Conclusion and future work 18

Illustrative example 19

20 Apply DSE on the existing database. Step 1: DSE chooses “type=0, zip=0” → executed query Q1: SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M WHERE M.year=15 AND C.zipcode=1 AND C.SSN=M.SSN. Executing Q1 returns zero records, so the loop body is not covered.

21 Apply DSE on the existing database (cont’d). Step 2: DSE flips “type == 0” to “type != 0” → “type=1, zip=0” → executed query Q2: SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M WHERE M.year=30 AND C.zipcode=1 AND C.SSN=M.SSN. Executing Q2 returns zero records, so the loop body is still not covered.

22 Apply DSE on the existing database (cont’d). However, an input like “type=0, zip=27694” → executed query Q3: SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M WHERE M.year=15 AND C.zipcode=27695 AND C.SSN=M.SSN. Executing Q3 returns one record {C.SSN = 001, C.income = 50000, M.balance = 20000}, covering Line14=true and Line18=false.

23 Apply DSE on the existing database (cont’d). Furthermore, an input like “type=0, zip=28222” → executed query Q4: SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M WHERE M.year=15 AND C.zipcode=28223 AND C.SSN=M.SSN. Executing Q4 returns one record {C.SSN = 002, C.income = , M.balance = 30000}, covering Line14=true and Line18=true.
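The four runs above can be reproduced on a two-row database. This is a minimal sketch, assuming a schema inferred from the queries; the function name `run_query` is hypothetical, and the income of customer 002 is a made-up value because the transcript elides it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (SSN TEXT, zipcode INTEGER, income INTEGER);
    CREATE TABLE mortgage (SSN TEXT, year INTEGER, balance INTEGER);
    INSERT INTO customer VALUES ('001', 27695, 50000);
    INSERT INTO customer VALUES ('002', 28223, 150000);  -- income is hypothetical
    INSERT INTO mortgage VALUES ('001', 15, 20000), ('002', 15, 30000);
""")

def run_query(type_, zip_):
    """Build and execute the query the program issues for these inputs."""
    year = 15 if type_ == 0 else 30   # type == 0 gives year 15, otherwise 30
    fzip = zip_ + 1                   # fzip = zip + 1, as on the slides
    sql = ("SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M "
           "WHERE M.year = ? AND C.zipcode = ? AND C.SSN = M.SSN")
    return conn.execute(sql, (year, fzip)).fetchall()

for inputs in [(0, 0), (1, 0), (0, 27694), (0, 28222)]:
    print(inputs, len(run_query(*inputs)))
```

The default inputs chosen by DSE return no rows; only zip values that match the stored zip codes make the loop body reachable.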

Assist DSE to generate program inputs 24 How to derive high-covering program input values based on a given database state?

25 Our idea: construct auxiliary queries. Auxiliary query: SELECT C.zipcode FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN. E.g., the result set includes “fzip=27695”; from “fzip=zip+1”, we derive “zip=27694”.

26 Our idea: construct auxiliary queries (cont’d). Auxiliary query: SELECT C.zipcode FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN. E.g., the result set includes “fzip=27695”; from “fzip=zip+1”, we derive “zip=27694”. This input covers Line14=true and Line18=false.

27 Our idea: construct auxiliary queries (cont’d). Auxiliary query: SELECT C.zipcode FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN. E.g., the result set includes “fzip=27695”; from “fzip=zip+1”, we derive “zip=27694”. This input covers Line14=true and Line18=false. The auxiliary query acts like a “constraint solver” for program constraints + DB state constraints.

Approach Collect query construction constraints on program variables used in the executed queries from the program code 28

Approach (cont’d) Collect query construction constraints on program variables used in the executed queries from the program code Collect result manipulation constraints on comparing with record values in the query’s result set (such as “if (diff>100000)” ) 29

30 Construct auxiliary queries. For path “Line04=true, Line14=true”, construct the abstract query: SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M WHERE M.year=15 AND C.zipcode=‘fzip’ AND C.SSN=M.SSN

31 Construct auxiliary queries. For path “Line04=true, Line14=true”, construct the abstract query: SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M WHERE M.year=15 AND C.zipcode=‘fzip’ AND C.SSN=M.SSN. Our target

32 Construct auxiliary queries. Abstract query: SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M WHERE M.year=15 AND C.zipcode=‘fzip’ AND C.SSN=M.SSN. Construct auxiliary query: SELECT C.zipcode

33 Construct auxiliary queries. Abstract query: SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M WHERE M.year=15 AND C.zipcode=‘fzip’ AND C.SSN=M.SSN. Construct auxiliary query: SELECT C.zipcode FROM customer C, mortgage M

34 Construct auxiliary queries. Abstract query: SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M WHERE M.year=15 AND C.zipcode=‘fzip’ AND C.SSN=M.SSN. Construct auxiliary query: SELECT C.zipcode FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN
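The construction on these slides (select the column compared with the program variable, keep the FROM list, keep the remaining WHERE conjuncts) can be sketched as a small string transformation. The function name `build_auxiliary_query` and the triple representation of conjuncts are assumptions made for this illustration; the slides do not show the authors' implementation.

```python
def build_auxiliary_query(from_list, conjuncts, input_vars):
    """Derive an auxiliary query from an abstract query's WHERE conjuncts.

    Each conjunct is a (column, operator, operand) triple. Conjuncts whose
    operand is a program input variable are dropped from the WHERE clause,
    and their columns become the SELECT list.
    """
    select_cols, kept = [], []
    for column, op, operand in conjuncts:
        if operand in input_vars:
            select_cols.append(column)
        else:
            kept.append(f"{column}{op}{operand}")
    return (f"SELECT {', '.join(select_cols)} FROM {', '.join(from_list)} "
            f"WHERE {' AND '.join(kept)}")

# WHERE clause of the abstract query for path Line04=true, Line14=true.
abstract_where = [
    ("M.year", "=", "15"),
    ("C.zipcode", "=", "fzip"),
    ("C.SSN", "=", "M.SSN"),
]
aux = build_auxiliary_query(["customer C", "mortgage M"], abstract_where, {"fzip"})
print(aux)
```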

35 Generate program input values. Run auxiliary query: SELECT C.zipcode FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN → fzip: 27695 or 28223

36 Generate program input values. Run auxiliary query: SELECT C.zipcode FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN → fzip: 27695 or 28223 → zip: 27694 or 28222

37 Generate program input values. Input combinations: type: 0 or !0 × zip: 27694 or 28222. “type=0, zip=27694” covers Line04=true, Line14=true, but Line18=false.
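Running the auxiliary query and inverting “fzip=zip+1” is enough to produce candidate inputs. The sketch below assumes a two-row database matching the records shown on the earlier slides; the value 1 is used as an arbitrary representative of “type != 0”.

```python
import sqlite3
from itertools import product

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (SSN TEXT, zipcode INTEGER);
    CREATE TABLE mortgage (SSN TEXT, year INTEGER, balance INTEGER);
    INSERT INTO customer VALUES ('001', 27695), ('002', 28223);
    INSERT INTO mortgage VALUES ('001', 15, 20000), ('002', 15, 30000);
""")

AUX = ("SELECT C.zipcode FROM customer C, mortgage M "
       "WHERE M.year = 15 AND C.SSN = M.SSN")

# Values of fzip for which the original query returns at least one record.
fzips = sorted(row[0] for row in conn.execute(AUX))

# Invert the query construction constraint fzip = zip + 1.
zips = [fzip - 1 for fzip in fzips]

# Combine with the two outcomes of the "type == 0" branch (1 stands for "!0").
candidates = list(product([0, 1], zips))
print(fzips, zips, candidates)
```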

Approach (cont’d) Not enough! Program variables in branch condition after executing the query may be data-dependent on returned record values. How to cover Line18 true branch? 38

Approach (cont’d) To cover path Line04=true, Line14=true, Line18=true We need to extend previous auxiliary query 39 true

40 Construct auxiliary queries. We extend the WHERE clause: SELECT C.zipcode FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN (----how to extend?----)

41 Construct auxiliary queries. We extend the WHERE clause: SELECT C.zipcode FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN (----how to extend?----)

42 Construct auxiliary queries. We extend the WHERE clause: SELECT C.zipcode FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN AND C.income * M.balance >

43 Generate program input values. Run auxiliary query: SELECT C.zipcode FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN AND C.income * M.balance > → fzip=28223

44 Generate program input values. Run auxiliary query: SELECT C.zipcode FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN AND C.income * M.balance > → fzip=28223 → zip=28222
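Extending the WHERE clause with a result manipulation constraint can be sketched as follows. The transcript elides the exact expression and threshold, so the predicate `C.income - M.balance > 100000` and the income of customer 002 are hypothetical stand-ins chosen only so that one of the two records satisfies the condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (SSN TEXT, zipcode INTEGER, income INTEGER);
    CREATE TABLE mortgage (SSN TEXT, year INTEGER, balance INTEGER);
    INSERT INTO customer VALUES ('001', 27695, 50000);
    INSERT INTO customer VALUES ('002', 28223, 150000);  -- income is hypothetical
    INSERT INTO mortgage VALUES ('001', 15, 20000), ('002', 15, 30000);
""")

BASE_AUX = ("SELECT C.zipcode FROM customer C, mortgage M "
            "WHERE M.year = 15 AND C.SSN = M.SSN")

# Hypothetical stand-in for the constraint that makes the Line18 branch true.
LINE18_TRUE = "C.income - M.balance > 100000"

extended_aux = f"{BASE_AUX} AND {LINE18_TRUE}"
fzips = [row[0] for row in conn.execute(extended_aux)]
zips = [fzip - 1 for fzip in fzips]   # invert fzip = zip + 1
print(fzips, zips)
```

Only the record whose values satisfy the added conjunct survives, so the derived zip value steers execution into the branch that the shorter auxiliary query could not target.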

Other issues (aggregate calculation) Extend auxiliary query with GROUP BY and HAVING clauses. 45 Involve multiple records

Other issues (aggregate calculation) SELECT C.zipcode, sum(M.balance) FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN AND C.income * M.balance > GROUP BY C.zipcode HAVING sum(M.balance) >

Other issues (cardinality constraints) SELECT C.zipcode FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN AND C.income * M.balance > GROUP BY C.zipcode HAVING COUNT(*) >= 3. Use a special DSE technique for dealing with input-dependent loops: P. Godefroid and D. Luchaup. Automatic partial loop summarization in dynamic test generation. In ISSTA
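A cardinality constraint of this kind asks for a zip code whose query result has at least three records. The sketch below shows the GROUP BY / HAVING COUNT(*) part on made-up rows; the data is hypothetical, and the elided income/balance conjunct from the slide is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (SSN TEXT, zipcode INTEGER);
    CREATE TABLE mortgage (SSN TEXT, year INTEGER, balance INTEGER);
    -- Hypothetical rows: three matching customers in 28223, one in 27695.
    INSERT INTO customer VALUES ('001', 27695), ('002', 28223),
                                ('003', 28223), ('004', 28223);
    INSERT INTO mortgage VALUES ('001', 15, 20000), ('002', 15, 30000),
                                ('003', 15, 10000), ('004', 15, 40000);
""")

CARDINALITY_AUX = """
    SELECT C.zipcode FROM customer C, mortgage M
    WHERE M.year = 15 AND C.SSN = M.SSN
    GROUP BY C.zipcode
    HAVING COUNT(*) >= 3
"""

# Zip codes for which the original query would return at least 3 records.
fzips = [row[0] for row in conn.execute(CARDINALITY_AUX)]
print(fzips)
```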

Outline Background Approach Evaluation Conclusion and future work 48

Research questions RQ1 (Effectiveness): What is the percentage increase in code coverage by the program inputs generated by Pex with our approach’s assistance? RQ2 (Cost): What is the cost of our approach’s assistance? 49

Evaluation subjects Two open source database applications RiskIt 4.3K LOC, database: 13 tables, 57 attributes, and >1.2 million records 17 DB-interacting methods selected for testing UnixUsage 2.8K LOC, database: 8 tables, 31 attributes, and >0.25 million records 28 DB-interacting methods selected for testing 50

51 Evaluation setup. Measurement: test generation effectiveness by code coverage; cost by number of runs/paths and execution time. Procedure: run Pex without our approach’s assistance, then apply our algorithms to generate additional test inputs.

52 Evaluation results: RiskIt. Higher code coverage.

53 Evaluation results: RiskIt. Low additional cost. Pex (only) timeout: 120 seconds. Even given longer time, no new coverage was observed for Pex (only).

54 Evaluation results: RiskIt. Pex (only) timeout: 120 seconds. Even given longer time, no new coverage was observed for Pex (only).

Evaluation results: UnixUsage

56 Summary of evaluation results
RQ1 (Effectiveness): RiskIt: 26% higher block coverage than Pex only. UnixUsage: 35% higher block coverage than Pex only.
RQ2 (Cost): RiskIt: 131 more runs/paths on top of 1135 (Pex); 517 more seconds of execution time on top of 1781 (Pex). UnixUsage: 93 more runs/paths on top of 1197 (Pex); 580 more seconds of execution time on top of 1718 (Pex).
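The cost figures are easier to compare as relative overheads. The short calculation below uses only the numbers reported on this slide; the dictionary layout and the function name `overhead_pct` are illustrative choices.

```python
# (additional cost with the approach, baseline cost of Pex alone)
results = {
    "RiskIt":    {"runs/paths": (131, 1135), "seconds": (517, 1781)},
    "UnixUsage": {"runs/paths": (93, 1197),  "seconds": (580, 1718)},
}

def overhead_pct(extra, base):
    """Additional cost as a percentage of the Pex-only baseline."""
    return 100.0 * extra / base

overheads = {
    app: {metric: round(overhead_pct(*pair), 1) for metric, pair in metrics.items()}
    for app, metrics in results.items()
}
for app, metrics in overheads.items():
    print(app, metrics)
```

On these numbers the extra runs/paths amount to roughly 8 to 12 percent of the baseline, and the extra execution time to roughly 29 to 34 percent.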

Outline Background Approach Evaluation Conclusion 57

58 Conclusion. A new approach that formulates auxiliary queries to bridge the gap between program constraints and DB constraints; the auxiliary queries act like a “constraint solver” for program constraints + DB constraints. Empirical evaluations on 2 open source DB applications show that our approach can assist DSE in generating program inputs effectively, achieving higher code coverage at low additional cost.

59 Future Work. Construct auxiliary queries directly from embedded complex queries (e.g., nested queries), rather than from their transformed normal forms. Handle complex program contexts such as multiple queries.

Acknowledgment: This work was supported in part by U.S. National Science Foundation under CCF for Kai Pan and Xintao Wu, and under CCF for Tao Xie. Thank you! Questions? 60

Related Work All previous related work addresses a different problem: constructing both program inputs and database states (from scratch) M. Emmi, R. Majumdar, and K. Sen. Dynamic test input generation for database applications. In ISSTA, K. Taneja, Y. Zhang, and T. Xie. MODA: Automated test generation for database applications via mock objects. In ASE,