Massive Stochastic Testing of SQL Don Slutz Microsoft Research Presented By Manan Shah.

Presentation transcript:

Massive Stochastic Testing of SQL Don Slutz Microsoft Research Presented By Manan Shah

Testing a Database Two Broad Aspects: Testing the correctness of the SQL Engine. Testing the correctness of the SQL output. Focus of this paper: Testing the correctness of the SQL Engine.

Test Coverage Problem [Figure: diagram of regions labeled "All possible SQL", "Used by customers", "SQL test library", and "Detectable software bugs"]

Background SQL test groups focus on deterministic testing to cover individual features of the language. Typical SQL test libraries require an estimated ½ person-hour per statement to compose. Software engineering requires a substantial investment. Commercial SQL vendors with tight schedules tend to use a more ad hoc process.

Motivation Deterministic testing of SQL database systems is human intensive. The input domain, all SQL statements, from any number of users, with all states of the database, is gigantic. These test libraries cover an important, but tiny, fraction of the SQL input domain. Large increases in test coverage must come from automating the generation of tests.

Our Aim [Figure: diagram of regions labeled "All possible SQL", "Used by customers", "SQL test library", and "Detectable software bugs"]

Introduction This paper describes a method to rapidly create a very large number of SQL statements without human intervention. The SQL statements are generated stochastically. Stochastic testing has the advantage that the quality of the tests improves as the test size increases. The paper also introduces the idea of cross-validating query results against existing SQL implementations. RAGS (Random Generation of SQL) is currently used by the Microsoft SQL Server testing group.

SQL Grammar
SELECT selectpart FROM frompart WHERE expression
selectpart: [TOP term] [DISTINCT | ALL] selectExpression [,...]
selectExpression: * | expression [[AS] columnAlias] | tableAlias.*
frompart: tableExpression [,...]
tableExpression: {[schemaName.] tableName} [{{LEFT | RIGHT} [OUTER] | [INNER] | CROSS | NATURAL} JOIN tableExpression]
expression: andcondition [OR andcondition]
condition: operand [conditionRightHandSide] | NOT condition | EXISTS (select)

Example
Emp_no | Name | Sex | Dept  | Job   | Salary | Comm
1      | John | M   | R&D   | Engg  |        |
       | Mary | F   | Sales | Clerk |        |
       | Andy | M   | R&D   | Mgr   |        |
       | Sam  | M   | Sales | Mgr   |        |
       | Pete | M   | R&D   | Clerk |        |

Stochastic generation [Figure: parse tree for the statement SELECT name, salary + comm FROM Emp WHERE salary > 10000 AND Dept = "sales"]

SQL Statement Generation RAGS follows the semantic rules of SQL. It carries state information and directives on its walk down the tree, and the results of stochastic outcomes as it walks back up. RAGS makes all its stochastic decisions at the last possible moment.
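The tree walk described on this slide can be illustrated with a minimal sketch. This is not RAGS itself: the schema, the probabilities, and the function names (numeric_expr, predicate, select_statement) are assumptions chosen for illustration. The required operand type acts as a directive passed down the tree, the generated text is returned on the way up, and each random choice is made only when its node is reached.

```python
import random

# Hypothetical schema modeled on the Emp example; the column types are assumptions.
COLUMNS = {"Emp_no": "int", "Name": "text", "Sex": "text",
           "Dept": "text", "Job": "text", "Salary": "int", "Comm": "int"}
NUMERIC = [c for c, t in COLUMNS.items() if t == "int"]
TEXT = [c for c, t in COLUMNS.items() if t == "text"]


def numeric_expr(rng, depth):
    """Build a numeric expression; whether to recurse is decided at this node."""
    if depth > 0 and rng.random() < 0.4:
        op = rng.choice(["+", "-", "*"])
        left = numeric_expr(rng, depth - 1)
        right = numeric_expr(rng, depth - 1)
        return f"({left} {op} {right})"
    if rng.random() < 0.7:
        return rng.choice(NUMERIC)
    return str(rng.randint(0, 20000))


def predicate(rng, depth):
    """Build a boolean condition from comparisons joined by AND / OR."""
    if depth > 0 and rng.random() < 0.4:
        op = rng.choice(["AND", "OR"])
        left = predicate(rng, depth - 1)
        right = predicate(rng, depth - 1)
        return f"({left} {op} {right})"
    if rng.random() < 0.5:
        cmp_op = rng.choice(["<", ">", "="])
        return f"{numeric_expr(rng, depth)} {cmp_op} {numeric_expr(rng, depth)}"
    column = rng.choice(TEXT)
    literal = rng.choice(["Sales", "R&D"])
    return f"{column} = '{literal}'"


def select_statement(rng, max_depth=2):
    """Generate one random SELECT over the hypothetical Emp table."""
    items = []
    for _ in range(rng.randint(1, 3)):
        if rng.random() < 0.5:
            items.append(numeric_expr(rng, max_depth))
        else:
            items.append(rng.choice(list(COLUMNS)))
    sql = "SELECT " + ", ".join(items) + " FROM Emp"
    if rng.random() < 0.8:
        sql += " WHERE " + predicate(rng, max_depth)
    return sql


print(select_statement(random.Random(1)))
```

Because arithmetic is only ever requested over numeric columns, every generated statement is type-correct for this schema, which is one simple way to respect semantic rules during generation.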

The RAGS System The RAGS approach is: 1. Greatly enlarge the shaded circle in the figure by stochastic SQL statement generation. 2. Make all aspects of the generated SQL statements configurable. 3. Experiment with configurations to maximize the bug detection rate.

The RAGS System cont… RAGS is an experiment to see how effective a million-fold increase in the size of a SQL test library can be. RAGS can be used to drive a SQL system and look for observable errors. The output of successful Select statements can be saved for regression testing.

The RAGS System [Figure: flow of a RAGS run. Read configuration; connect to the DBMSs; read the table schema; loop (generate SQL statement, execute on SYS A, execute on SYS B, compare results, record errors); print report. Sample configuration file: DBMS1=SYS A, DBMS2=SYS B, statements 65% Select and 35% Update. Sample report: statements executed; Stmt 156: error in SYS A; Stmt 765: error in SYS B.]

Components of RAGS System The configuration file has several parameters: 1. Frequency of occurrence of different statements 2. Limits (max no. of tables in a join, entries in group by, etc.) 3. Frequency of occurrence of features (group by, order by) 4. Execution parameters (no. of rows to fetch per query) When the RAGS program is started, it first reads the configuration file. It then uses ODBC to connect to the first DBMS and read the schema information.
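A rough sketch of how the four parameter groups might drive generation follows. The Config class, its field names, and every default except the 65% Select / 35% Update split (which appears in the system diagram) are assumptions, not the actual RAGS configuration format.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Config:
    # 1. Frequency of statement kinds (65/35 split taken from the system diagram).
    statement_weights: dict = field(
        default_factory=lambda: {"SELECT": 65, "UPDATE": 35})
    # 2. Limits (hypothetical values).
    max_join_tables: int = 3
    max_group_by: int = 2
    # 3. Probability that an optional feature appears (hypothetical values).
    feature_prob: dict = field(
        default_factory=lambda: {"GROUP BY": 0.2, "ORDER BY": 0.3})
    # 4. Execution parameter (hypothetical value).
    rows_to_fetch: int = 100


def pick_statement_kind(cfg, rng):
    """Draw a statement kind in proportion to its configured weight."""
    kinds = list(cfg.statement_weights)
    weights = [cfg.statement_weights[k] for k in kinds]
    return rng.choices(kinds, weights=weights, k=1)[0]


def pick_features(cfg, rng):
    """Independently decide whether each optional clause is included."""
    return [name for name, p in cfg.feature_prob.items() if rng.random() < p]


cfg = Config()
rng = random.Random(0)
print(pick_statement_kind(cfg, rng), pick_features(cfg, rng))
```

Keeping every frequency and limit in one object makes it easy to experiment with configurations, which is the third point of the RAGS approach.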

Execution of RAGS RAGS loops to generate SQL statements and optionally execute them. If the statement is executed on more than one system, the execution results are compared. At the end of the run, RAGS produces a report. A utility is provided that compares the reports from several runs and summarizes the differences. The comparison can be between different vendors or different versions of the same system (regression testing).

Validation Issues If a SQL Select executes without errors, there is no easy method to validate the returned values. The authors’ approach is to execute the same query on multiple vendors’ DBMSs and then compare the results. First, the number of rows returned is compared. Then, to avoid sorts, a special checksum over all the column values in all the rows is compared. The method only works for SQL statements that will execute on more than one vendor’s database.
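The comparison step can be sketched as follows. The slides do not say how the RAGS checksum is computed, so the per-row CRC sum here is an assumption; two in-memory SQLite databases stand in for the two vendors' systems, and the salary values are invented for illustration.

```python
import sqlite3
import zlib


def result_signature(conn, sql):
    """Return (row_count, checksum); the checksum does not depend on row order."""
    count, checksum = 0, 0
    for row in conn.execute(sql):
        count += 1
        # Addition is commutative, so rows may arrive in any order (no sort needed).
        checksum = (checksum + zlib.crc32(repr(row).encode("utf-8"))) & 0xFFFFFFFF
    return count, checksum


def make_db(rows):
    """Create a small in-memory stand-in for one DBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Emp (Name TEXT, Dept TEXT, Salary INTEGER)")
    conn.executemany("INSERT INTO Emp VALUES (?, ?, ?)", rows)
    return conn


# Illustrative data only; the salaries are made up.
rows = [("John", "R&D", 100), ("Mary", "Sales", 200), ("Andy", "R&D", 300)]
sys_a = make_db(rows)
sys_b = make_db(list(reversed(rows)))  # same data, different insertion order

query = "SELECT Name, Salary FROM Emp WHERE Salary > 150"
same = result_signature(sys_a, query) == result_signature(sys_b, query)
print("results agree:", same)
```

In a real cross-vendor comparison, values would need to be normalized before hashing (for example, numeric formatting and trailing blanks), since different systems may render the same value differently.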

Example RAGS Query

Execution Facts On a 200 MHz Pentium, RAGS can generate 833 moderate-size SQL statements per second. In one hour, RAGS can generate 3 million different SQL statements. The starting random seed for a RAGS run can be specified in the configuration file. If the starting seed is not specified, RAGS obtains a seed by hashing the time of day.
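The seeding behavior is what makes a stochastic run repeatable. A minimal sketch, assuming a SHA-256 hash of the clock (the slides do not name the hash RAGS uses) and hypothetical function names:

```python
import hashlib
import random
import time


def choose_seed(configured_seed=None):
    """Use the configured seed if given; otherwise hash the current time."""
    if configured_seed is not None:
        return configured_seed
    digest = hashlib.sha256(str(time.time_ns()).encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big")


def statement_stream(seed, n):
    """Toy stand-in for a generator run: the same seed yields the same statements."""
    rng = random.Random(seed)
    return [f"SELECT {rng.randint(0, 9999)}" for _ in range(n)]


# Throughput check for the figures on the slide: 833 statements/s for one hour.
statements_per_hour = 833 * 3600  # 2,998,800, i.e. about 3 million

seed = choose_seed()
print("seed for this run:", seed)
print(statement_stream(seed, 3) == statement_stream(seed, 3))
```

Recording the seed in the run report is enough to regenerate any failing statement later, without storing the millions of statements themselves.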

Testing Experiences Each of the 10 clients executed 2500 SQL statements in transactions that contained an average of 5 statements. 86.1% of the statements executed without error, 13.8% had expected errors, and 0.07% indicated possible bugs.

Comparison Tests

Automatic Statement Simplification When a RAGS-generated statement caused an error, the debugging process was difficult if the statement was complex. The offending statement can usually be vastly simplified by hand. The simplification process itself was tedious, so RAGS was extended to simplify the statement automatically. RAGS walks a parse tree for the statement and tries to remove terms in expressions and certain clauses (Where and Having).
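The idea of removing terms while the error persists can be sketched with a greedy reduction loop. RAGS works on a parse tree; here a flat list of WHERE conjuncts stands in for the tree, and fake_failure is a hypothetical oracle that simulates the system under test reporting an error.

```python
def simplify(terms, still_fails):
    """Greedily drop terms as long as the reduced list still triggers the failure."""
    current = list(terms)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if still_fails(candidate):
                current = candidate
                changed = True
                break  # restart the scan on the shorter list
    return current


def render(conjuncts):
    """Turn a list of conjuncts back into a SELECT statement."""
    sql = "SELECT Name FROM Emp"
    return sql + (" WHERE " + " AND ".join(conjuncts) if conjuncts else "")


def fake_failure(conjuncts):
    """Hypothetical bug: the error appears only when both of these terms are present."""
    return "Salary > 10000" in conjuncts and "Comm IS NULL" in conjuncts


original = ["Dept = 'Sales'", "Salary > 10000", "Sex = 'M'", "Comm IS NULL"]
reduced = simplify(original, fake_failure)
print(render(reduced))
```

In practice still_fails would re-execute the candidate statement and check that the same error is reported, so each removal costs one extra execution.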

Visualization To investigate the relationship between two metrics, a set of sample pairs is collected and analyzed. RAGS presents an opportunity to scale up the size of such samples by several orders of magnitude. It allows one to plot the sample points and visualize the relationship.

[Figure: plot of sample points; axes: Execution time of V1 (sec) and Execution time of V2 (sec)]

Summary RAGS is an experiment in massive stochastic testing of SQL systems. Its main contribution is to generate entire SQL statements stochastically. The problem of validating outputs remains a tough issue. Output comparisons across different vendors’ database systems proved to be extremely useful, but only for the small set of common SQL. The outcome of the experiments was encouraging, since RAGS could steadily generate errors in released SQL products.

Future Work SQL coverage can be extended to more data types, more DDL, stored procedures, utilities, etc. Robustness tests can be performed by stochastically generating a family of equivalent SQL statements and comparing their outputs. Testing with equivalent statements has the important advantage of providing a method to help validate the outputs. The optimizer's estimates of execution metrics can be compared with the measured execution metrics.
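Testing with equivalent statements can be sketched on a single system. The three rewrites below (operand swap and De Morgan's law) are fixed examples chosen for illustration rather than a stochastic family, and the table contents are invented; an in-memory SQLite database stands in for the system under test.

```python
import sqlite3


def equivalent_forms(a, b):
    """Return statements that should all produce the same rows."""
    return [
        f"SELECT Name FROM Emp WHERE {a} AND {b}",
        f"SELECT Name FROM Emp WHERE {b} AND {a}",
        f"SELECT Name FROM Emp WHERE NOT (NOT ({a}) OR NOT ({b}))",
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (Name TEXT, Dept TEXT, Salary INTEGER)")
# Illustrative rows; the NULL salary exercises three-valued logic.
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?)",
    [("John", "R&D", 100), ("Mary", "Sales", 200), ("Sam", "Sales", None)],
)

queries = equivalent_forms("Dept = 'Sales'", "Salary > 150")
# Sort each result so the comparison ignores row order.
outputs = [sorted(conn.execute(q).fetchall()) for q in queries]
all_agree = all(o == outputs[0] for o in outputs)
print("equivalent statements agree:", all_agree)
```

Any disagreement between the outputs points to a bug without needing a second vendor's system as a reference.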

Thank You!