1 Application of Metamorphic Testing to Supervised Classifiers
Xiaoyuan Xie, Tsong Yueh Chen (Swinburne University of Technology)
Christian Murphy, Gail Kaiser (Columbia University)
Joshua Ho (University of Sydney)
Baowen Xu (Nanjing University)

2 Background
Many applications in the field of scientific computing depend on machine learning (ML) algorithms
ML applications often do not have test oracles that indicate whether the output is correct for arbitrary input
Applications without test oracles are called “non-testable programs”

3 Problem Statement
Oracles may exist for a limited subset of the input domain, and gross errors (e.g. crashes) can be detected with certain inputs or techniques
However, it is difficult to detect subtle (computational) errors for arbitrary inputs

4 Testing ML Applications
There has been much research into applying ML techniques to software testing, but not the other way around
Reusable real-world data sets and frameworks are available for checking that an ML algorithm predicts well, but not for checking that an implementation works correctly

5 Observation
If there is no oracle in the general case, we cannot know the expected relationship between a particular input and its output
However, it may be possible to know relationships between a set of inputs and the corresponding set of outputs
“Metamorphic Testing” [Chen et al. ’98] is such an approach

6 Metamorphic Testing
An approach for creating follow-on test cases based on previous test cases
If input x produces output f(x), then the function’s “metamorphic properties” are used to guide a transformation function t, which is applied to produce a new test case input, t(x)
We can then predict the expected value of f(t(x)) based on the value of f(x) obtained from the actual execution
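
The definition above can be sketched as a small harness. This is only an illustration: the names `metamorphic_check` and `predict` are hypothetical, and Python's built-in `sum` stands in for the function under test.

```python
from typing import Any, Callable


def metamorphic_check(
    f: Callable[[Any], Any],
    t: Callable[[Any], Any],
    predict: Callable[[Any], Any],
    x: Any,
) -> bool:
    """Return True if f(t(x)) matches the value predicted from f(x)."""
    original_output = f(x)        # f(x), obtained from an actual execution
    follow_on_output = f(t(x))    # f(t(x)), the follow-on test case
    return follow_on_output == predict(original_output)


data = [3, 1, 4, 1, 5]

# Property 1: reversing a list of integers should not change its sum.
reversal_holds = metamorphic_check(
    f=sum, t=lambda xs: list(reversed(xs)), predict=lambda y: y, x=data
)

# Property 2: doubling every element should double the sum.
doubling_holds = metamorphic_check(
    f=sum, t=lambda xs: [2 * v for v in xs], predict=lambda y: 2 * y, x=data
)

print(reversal_holds, doubling_holds)
```

Exact equality is safe here because the inputs are integers; for floating-point outputs a tolerance-based comparison would be the more careful choice.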

7 Metamorphic Testing without an Oracle
When a test oracle exists, we can know whether f(t(x)) is correct
– Because we have an oracle for f(x)
– So if f(t(x)) is as expected, then it is correct
When there is no test oracle, f(x) acts as a “pseudo-oracle” for f(t(x))
– If f(t(x)) is as expected, it is not necessarily correct
– However, if f(t(x)) is not as expected, either f(x) or f(t(x)) (or both) is wrong
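
A small sketch of why a pseudo-oracle can only reveal failures, never confirm correctness. Both functions below are deliberately faulty, hypothetical implementations of a mean (they are not from the study): one satisfies both relations while still being wrong, the other is exposed by the permutation relation.

```python
import math


def mean_off_by_one(xs):
    """Faulty mean: divides by len(xs) + 1 instead of len(xs)."""
    return sum(xs) / (len(xs) + 1)


def mean_drops_last(xs):
    """Faulty mean: forgets to add the last element."""
    return sum(xs[:-1]) / len(xs)


def permutation_holds(f, xs):
    """Reversing the input should leave the mean unchanged."""
    return math.isclose(f(xs), f(list(reversed(xs))))


def scaling_holds(f, xs):
    """Multiplying every input by 10 should multiply the mean by 10."""
    return math.isclose(f([10 * v for v in xs]), 10 * f(xs))


scores = [70, 80, 90]

# Both relations hold, yet the result (60.0) is not the true mean (80.0).
print(permutation_holds(mean_off_by_one, scores), scaling_holds(mean_off_by_one, scores))

# The permutation relation is violated, so a defect has been revealed.
print(permutation_holds(mean_drops_last, scores))
```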

8 Metamorphic Testing Example
Consider a program that reads a text file of test scores for students in a class, and computes the averages and the standard deviation of the averages
If we permute the values in the text file, the results should stay the same
If we multiply each score by 10, the final results should all be multiplied by 10 as well
These metamorphic properties can be used to create a “pseudo-oracle” for the application
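
The two properties of this example can be checked roughly as follows. The sketch makes several assumptions: scores are held in memory as one list per student rather than read from a text file, the population standard deviation is used, and "permuting" is interpreted as reordering both the students and each student's scores.

```python
import math
import statistics


def summarize(scores_by_student):
    """Return (per-student averages, standard deviation of those averages)."""
    averages = [sum(row) / len(row) for row in scores_by_student]
    return averages, statistics.pstdev(averages)


def all_close(xs, ys):
    """Element-wise floating-point comparison of two equally long lists."""
    return len(xs) == len(ys) and all(math.isclose(a, b) for a, b in zip(xs, ys))


scores = [[90, 80, 70], [60, 75, 85], [100, 95, 90]]
averages, spread = summarize(scores)

# Relation 1: permuting the input should leave the results unchanged
# (averages are compared as a sorted list because the row order moved).
permuted = [list(reversed(row)) for row in reversed(scores)]
perm_averages, perm_spread = summarize(permuted)
permutation_ok = (
    all_close(sorted(averages), sorted(perm_averages))
    and math.isclose(spread, perm_spread)
)

# Relation 2: multiplying each score by 10 should multiply all results by 10.
scaled = [[10 * s for s in row] for row in scores]
scaled_averages, scaled_spread = summarize(scaled)
scaling_ok = (
    all_close([10 * a for a in averages], scaled_averages)
    and math.isclose(10 * spread, scaled_spread)
)

print(permutation_ok, scaling_ok)
```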

9 Approach
To apply Metamorphic Testing to such ML applications, we first enumerate the metamorphic relations based on the expected behaviors of a given machine learning algorithm
We then utilize these relations to conduct metamorphic testing on the implementation

10 Verification & Validation
The scope of which metamorphic properties are necessary may differ between various problems in the domain
Properties that are necessary can be used for verification: “Is the implementation of the algorithm correct?”
Other properties can be used for validation: “Is the algorithm appropriate for solving this problem?”

11 Research Questions
What are the metamorphic properties of supervised ML classification algorithms?
– Which can be used for verification?
– Which can be used for validation?
Can metamorphic testing detect defects in real-world ML applications?

12 Machine Learning Fundamentals
Data sets consist of a number of samples, each of which has attributes and a label
In the first phase (“training”), a model is generated that attempts to generalize how attributes relate to the label
In the second phase, the model is applied to a previously-unseen data set (“testing” data) with unknown labels to produce a classification of each sample

13 Algorithms Investigated
k-Nearest Neighbors (kNN)
– Samples in the testing data are classified by using Euclidean distance to find the k nearest samples in the training data
– Classification is then done by majority rule
Naïve Bayes Classifier (NBC)
– For a given sample in the testing data, computes the probability of that sample belonging to each class, assuming conditional independence between the attributes
– Chooses the class that is most likely
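
As a rough illustration of the kNN description above (not the Weka implementation that was tested), a minimal classifier might look like this. The function name, the data layout, and the sample values are assumptions made for the sketch.

```python
import math
from collections import Counter


def knn_classify(training, sample, k):
    """Classify `sample` by majority vote among its k nearest training samples.

    `training` is a list of (attributes, label) pairs; distance is Euclidean.
    """
    nearest = sorted(training, key=lambda pair: math.dist(pair[0], sample))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


training = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.9), "a"),
    ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"),
    ((5.1, 4.8), "b"),
]

print(knn_classify(training, (1.0, 1.0), k=3))
print(knn_classify(training, (5.0, 4.9), k=3))
```

How ties in the vote are resolved is left to `Counter.most_common` here; that detail matters for the validation issues discussed later in the talk.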

14 Metamorphic Relations
We identified 11 properties that we would expect all classification algorithms to have
– Affine transformation of attributes
– Permutation of labels or attributes
– Addition of informative or uninformative attributes
– Addition of classes by duplicating or re-labeling samples
– Removal of classes or samples
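
Three of these relation families can be sketched against a toy 1-nearest-neighbor classifier. Everything here is an assumption for illustration: the classifier, the data, and the specific transformations (x -> 2x + 1 applied to every attribute, reversing the column order, swapping the two labels) are not taken from the study's test suite.

```python
import math


def nearest_neighbor(training, sample):
    """1-NN: return the label of the closest training sample (Euclidean)."""
    return min(training, key=lambda pair: math.dist(pair[0], sample))[1]


def classify_all(training, tests):
    return [nearest_neighbor(training, s) for s in tests]


def map_attributes(training, tests, fn):
    """Apply the same attribute transformation to training and testing data."""
    return [(fn(a), label) for a, label in training], [fn(s) for s in tests]


training = [
    ((1.0, 2.0, 0.5, 3.0), "a"),
    ((1.5, 1.8, 0.7, 2.9), "a"),
    ((6.0, 7.0, 5.5, 8.0), "b"),
    ((6.2, 6.8, 5.9, 7.7), "b"),
]
tests = [(1.2, 2.1, 0.4, 3.1), (5.8, 7.2, 5.6, 8.1)]
baseline = classify_all(training, tests)

# Affine transformation of attributes: predictions should not change.
affine = classify_all(
    *map_attributes(training, tests, lambda a: tuple(2 * v + 1 for v in a))
)

# Permutation of attributes: predictions should not change.
permuted = classify_all(
    *map_attributes(training, tests, lambda a: tuple(reversed(a)))
)

# Permutation of labels: predictions should be relabeled the same way.
swap = {"a": "b", "b": "a"}
relabeled = classify_all([(a, swap[label]) for a, label in training], tests)

affine_ok = affine == baseline
permutation_ok = permuted == baseline
relabel_ok = relabeled == [swap[label] for label in baseline]
print(affine_ok, permutation_ok, relabel_ok)
```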

15 Experimental Setup
Applied the approach to implementations in the Weka 3.5.7 toolkit
Initial test cases:
– Randomly generated values
– Four attributes (“columns”)
– 20-50 samples (“rows”)
Metamorphic relations were applied to create 20-300 follow-on test cases
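
A generator for initial test cases of this shape could look like the sketch below. The slides do not state the value range, the label set, or how randomness was seeded, so the range 0 to 100, the labels "a"/"b"/"c", and the fixed seed are all assumptions; `permute_attributes` shows one hypothetical way to derive a follow-on test case.

```python
import random


def generate_data_set(rng, num_attributes=4, min_rows=20, max_rows=50,
                      labels=("a", "b", "c")):
    """Randomly generate (attributes, label) rows.

    The value range and label set are assumed, not taken from the study.
    """
    num_rows = rng.randint(min_rows, max_rows)
    return [
        (
            tuple(rng.uniform(0.0, 100.0) for _ in range(num_attributes)),
            rng.choice(labels),
        )
        for _ in range(num_rows)
    ]


def permute_attributes(data_set, order):
    """Follow-on test case: reorder the attribute columns of every row."""
    return [(tuple(attrs[i] for i in order), label) for attrs, label in data_set]


rng = random.Random(0)  # fixed seed so a failing case can be reproduced
initial = generate_data_set(rng)
follow_on = permute_attributes(initial, order=(3, 2, 1, 0))
print(len(initial), len(follow_on))
```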

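16 Results
Percentage of test cases in which each property was violated:

Property    k-Nearest Neighbors    Naïve Bayes Classifier
0           0                      7.4
1.1         15.9                   0.3
1.2         0                      0
2.1         0                      0.6
2.2         4.1                    0
3.1         0                      0
3.2         0                      0
4.1         25.3                   0
4.2         0                      3.9
5.1         5.9                    5.6
5.2         2.8                    2.8

The original slide also had a “Necessary?” column for each classifier; its marks are not preserved in this transcript.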

17 Analysis: kNN
No necessary properties were violated
Issues related to validation:
– Labels that are non-existent in the training data have a non-zero chance of being selected in classification
– If two labels are equally likely, the “first” one that is listed is chosen
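
The tie-breaking issue can be illustrated with a hypothetical vote counter (this is not Weka's code): when two labels receive the same number of votes, the result depends only on the order in which the labels are listed, so permuting the label list changes the classification.

```python
def majority_label(votes, labels):
    """Return the label with the most votes; ties go to the label listed first."""
    # max() returns the first maximal element it encounters, so the order of
    # `labels` decides the outcome when two labels have equal votes.
    return max(labels, key=lambda label: votes.get(label, 0))


votes = {"spam": 2, "ham": 2}  # two labels are equally likely

first = majority_label(votes, ["spam", "ham"])
second = majority_label(votes, ["ham", "spam"])
print(first, second)
```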

18 Analysis: Naïve Bayes
Four necessary properties were violated, indicating defects in the implementation
– Loss of precision related to use of the “double” datatype in Java
– Laplace Accuracy used to determine probabilities; thus, labels that did not appear in training data have non-zero probability
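
The second point can be illustrated with class priors, assuming that “Laplace Accuracy” on the slide refers to an add-one (Laplace) estimate. The function `class_priors` and the label names are hypothetical; the sketch only shows why a label that is declared but never seen in training ends up with a non-zero probability.

```python
from collections import Counter
from fractions import Fraction


def class_priors(training_labels, declared_labels, smoothing=1):
    """Estimate P(label) with add-`smoothing` counts over all declared labels."""
    counts = Counter(training_labels)
    total = len(training_labels) + smoothing * len(declared_labels)
    return {
        label: Fraction(counts[label] + smoothing, total)
        for label in declared_labels
    }


training_labels = ["a", "a", "a", "b", "b"]
declared_labels = ["a", "b", "c"]  # "c" never appears in the training data

smoothed = class_priors(training_labels, declared_labels)
unsmoothed = class_priors(training_labels, declared_labels, smoothing=0)

print(smoothed["c"], unsmoothed["c"])
```

Exposing `smoothing` as a parameter mirrors the idea of making the estimate optional rather than always on.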

19 Suggestions
We suggest using the “BigDecimal” class instead of the “double” datatype
Laplace Accuracy is appropriate for the attributes but not for the labels
Use of Laplace Accuracy should be set as an option
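
The slides do not show where precision was lost, so the following is only a generic illustration of the underlying effect, using Python's `float` (a binary double) and `decimal.Decimal` as rough analogues of Java's `double` and `BigDecimal`. With binary doubles the result of a computation can depend on the order of operations, which is exactly the kind of difference a permutation-based metamorphic relation would flag.

```python
from decimal import Decimal

# Binary floating point: regrouping the same three terms changes the result.
float_left = (0.1 + 0.2) + 0.3
float_right = 0.1 + (0.2 + 0.3)
float_order_matters = float_left != float_right

# Decimal arithmetic: these values are represented exactly, so the grouping
# does not matter for this sum.
a, b, c = Decimal("0.1"), Decimal("0.2"), Decimal("0.3")
decimal_order_matters = ((a + b) + c) != (a + (b + c))

print(float_order_matters, decimal_order_matters)
```

Note that `Decimal` works at a configurable, finite precision (28 significant digits by default), so it removes representation error for short decimal values rather than all rounding.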

20 Future Work
Apply the testing approach to other domains that depend on ML, such as scientific computing
Further investigation of testing “non-testable programs”
Measure the effectiveness of the approach in empirical studies

21 Summary
Metamorphic testing is easy to implement and automate
We were able to devise fault-revealing properties even with just a basic understanding of the ML algorithms
Metamorphic testing can be used for both verification and validation

23 Related Work
Applying MT to non-testable programs in other domains
General properties for use in MT
