Properties of Machine Learning Applications for Use in Metamorphic Testing Chris Murphy, Gail Kaiser, Lifeng Hu, Leon Wu Columbia University.


1 Properties of Machine Learning Applications for Use in Metamorphic Testing Chris Murphy, Gail Kaiser, Lifeng Hu, Leon Wu Columbia University

2 Introduction
- We are investigating the quality assurance of Machine Learning (ML) applications
- Machine Learning applications fall into a class for which it can be said that there is “no reliable test oracle”

3 Introduction
- Previously we have investigated approaches to testing such applications by considering properties of their data sets and by using random testing
- In this work, we seek to adapt Metamorphic Testing [Chen ’98] to these applications and consider their Metamorphic Properties

4 Contribution
- Our contribution is a set of Metamorphic Properties that define the expected relationships between inputs and outputs, so that Metamorphic Testing can be used as a general approach to testing machine learning applications

5 Overview
- Background
- Testing Approach
- Findings and Results
- Future Work and Conclusion

6 Metamorphic Testing
- General technique for creating follow-up test cases based on existing ones, particularly those that have not revealed any failure
  - [Chen ’98, Gotlieb COMPSAC’03, Chen STEP’04, Zhou ISFST’04]
- Use a function’s Metamorphic Properties to predict the output for a particular input, given the known output for another input
  - For example, if we know sin(x) = y, then we know sin(x + 2π) = y and sin(-x) = -y
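The sin(x) example can be sketched in Python as follows. This is a minimal illustration, not code from the presentation: the helper name `check_sin_metamorphic` and the tolerance are assumptions chosen for the example. The point is that neither check needs to know the correct value of sin(x).

```python
import math

def check_sin_metamorphic(x, tol=1e-9):
    """Check two metamorphic relations of sin without an oracle for sin(x)."""
    y = math.sin(x)  # source test case; its correctness is not assumed
    # Follow-up case 1: sin(x + 2*pi) should equal sin(x)
    periodic = math.isclose(math.sin(x + 2 * math.pi), y, abs_tol=tol)
    # Follow-up case 2: sin(-x) should equal -sin(x)
    odd = math.isclose(math.sin(-x), -y, abs_tol=tol)
    return periodic and odd

# Any input can serve as a source test case.
results = [check_sin_metamorphic(x) for x in (0.0, 0.5, 1.0, 2.5, -3.0)]
```

If either relation fails for any input, the implementation is defective, even though no single expected output was ever specified.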

7 Related Work
- Applying metamorphic testing to situations in which there is no test oracle [Chen IST’02]
- There has been much research into applying Machine Learning techniques to software testing, but little into applying software testing to Machine Learning applications
- Testing of intrusion detection systems has typically addressed quantitative measurements but does not seek to ensure that the implementation is free of defects

8 Machine Learning Fundamentals
- Data sets consist of a number of examples, each of which has attributes and a label
- In the first phase (“training”), a model is generated that attempts to generalize how the attributes relate to the labels (if labels exist)
- In the second phase, the model is applied to a previously unseen data set with unknown labels to produce a classification (or, in some cases, a ranking)

9 Sample Data Set
For supervised machine learning. Each row is an example; the first seven columns are attributes and the last column is the label.

27,81,88,59,42,16,88, 0
82, 6,51,47, 5, 4, 1, 0
22,72,11,84,96,24,44, 1
 4,77,91,86,89,77,61, 1
76,11, 4,51,43, 2,79, 0
 6,33,44,18,52,63,94, 0
77,36,91,81,47, 3,85, 1
39,17,15, 2,90,70,13, 0
 8,58,42,41,74,87,68, 1
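As a rough sketch of how such a data set might be read, the snippet below splits each comma-separated row into its attributes and its label. The function name `parse_examples` and the assumption that the label is always the last column are illustrative choices, not part of the presentation.

```python
SAMPLE = """\
27,81,88,59,42,16,88, 0
82, 6,51,47, 5, 4, 1, 0
22,72,11,84,96,24,44, 1
4,77,91,86,89,77,61, 1
76,11, 4,51,43, 2,79, 0
6,33,44,18,52,63,94, 0
77,36,91,81,47, 3,85, 1
39,17,15, 2,90,70,13, 0
8,58,42,41,74,87,68, 1
"""

def parse_examples(text):
    """Split each row into (attributes, label); the label is assumed to be the last column."""
    examples = []
    for line in text.strip().splitlines():
        values = [int(v) for v in line.split(",")]  # int() ignores surrounding spaces
        examples.append((values[:-1], values[-1]))
    return examples

examples = parse_examples(SAMPLE)
```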

10 Applications Investigated
- MartiRank
  - Specifically designed for potential future experimental use in predicting impending electrical device failures by ranking them according to likelihood of failure
  - Seeks to find the combination of segmenting and sorting the data that produces the best result
- Support Vector Machines (SVM)
  - Seeks to find a hyperplane that separates examples from different classes
  - SVM-Light has a ranking mode based on the distance from the hyperplane
- PAYL
  - Anomaly-based intrusion detection system (IDS)
  - Builds a model of “normal” network traffic based on byte distribution, and reports any anomalies

11 Approach
- Previously, we tested such applications by analysis of the data sets and algorithms, and by using equivalence partitions to guide random testing
- In this work, we use our knowledge of MartiRank to devise a set of Metamorphic Properties, and then see if they also apply to SVM and PAYL
- We then use these properties to guide testing of these applications

12 MartiRank Metamorphic Properties
- Additive
  - If each value in the data set is increased by a constant, the final ranking should be unchanged
- Multiplicative
  - If each value in the data set is multiplied by a positive constant, the final ranking should be unchanged
- Permutative
  - If the order of the data is permuted, the final ranking should be unchanged (assuming distinct values in the data set)
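These three properties can be checked mechanically. The sketch below uses a deliberately simple stand-in ranker (ordering examples by the sum of their attributes); it is not MartiRank, and the names `rank` and `transform` and the example ids are hypothetical.

```python
import random

def rank(examples):
    """Toy ranker (hypothetical stand-in): example ids by descending attribute sum."""
    ordered = sorted(examples, key=lambda e: sum(e[1]), reverse=True)
    return [eid for eid, _ in ordered]

def transform(examples, f):
    """Apply f to every attribute value, keeping the example ids."""
    return [(eid, [f(v) for v in attrs]) for eid, attrs in examples]

data = [("a", [27, 81, 88]), ("b", [82, 6, 51]),
        ("c", [22, 72, 11]), ("d", [4, 77, 91])]
baseline = rank(data)

# Additive: adding a constant to every value keeps the ranking
additive = rank(transform(data, lambda v: v + 10)) == baseline

# Multiplicative: scaling by a positive constant keeps the ranking
multiplicative = rank(transform(data, lambda v: v * 3)) == baseline

# Permutative: reordering the input keeps the ranking (scores are distinct here)
shuffled = data[:]
random.Random(0).shuffle(shuffled)
permutative = rank(shuffled) == baseline
```

Each check compares two runs of the same program, so no expected ranking has to be known in advance.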

13 MartiRank Metamorphic Properties
- Invertive
  - If each value in the data set is multiplied by a negative constant, the final ranking should be in the reverse order
- Inclusive
  - In the “testing phase”, if the model is already known, it should be possible to create an example in the testing data such that it is guaranteed to be at the top of the ranking
- Exclusive
  - If an example is removed from the testing data, the relative order of the remaining examples in the final ranking should be unchanged
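The invertive, inclusive, and exclusive properties can be sketched the same way. Again the ranker is a toy stand-in (descending attribute sum), not MartiRank, and all names are hypothetical.

```python
def rank(examples):
    """Toy ranker (hypothetical stand-in): example ids by descending attribute sum."""
    ordered = sorted(examples, key=lambda e: sum(e[1]), reverse=True)
    return [eid for eid, _ in ordered]

data = [("a", [27, 81, 88]), ("b", [82, 6, 51]),
        ("c", [22, 72, 11]), ("d", [4, 77, 91])]
baseline = rank(data)

# Invertive: multiplying every value by a negative constant reverses the ranking
negated = [(eid, [-2 * v for v in attrs]) for eid, attrs in data]
invertive = rank(negated) == baseline[::-1]

# Inclusive: knowing the model (here, the scoring rule), craft an example
# that is guaranteed to land at the top of the ranking
top_score = max(sum(attrs) for _, attrs in data)
crafted = ("new", [top_score + 1, 0, 0])
inclusive = rank(data + [crafted])[0] == "new"

# Exclusive: removing one example leaves the order of the others unchanged
removed = "d"
remaining = rank([e for e in data if e[0] != removed])
exclusive = remaining == [eid for eid in baseline if eid != removed]
```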

14 Testing MartiRank
- Its invertive property should hold for the labels in the training data, too
  - Multiplying the labels by -1 should yield a model that, when applied to the same testing data, will result in the reverse ordering
- Negative labels were not considered by the developer, and a defect was revealed through Metamorphic Testing
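A minimal sketch of this label-inversion check is shown below. The trainer is a hypothetical toy that learns only a sort direction from the sign of the attribute/label covariance; it is not MartiRank's algorithm, and the data values are made up for illustration.

```python
def train(examples):
    """Toy trainer (hypothetical): learn a sort direction from (attribute, label) pairs."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    return 1 if cov >= 0 else -1

def rank(direction, values):
    """Rank testing values, highest predicted score first."""
    return sorted(values, key=lambda v: direction * v, reverse=True)

training = [(10, 0), (20, 0), (30, 1), (40, 1)]
testing = [15, 35, 25, 5]

original = rank(train(training), testing)

# Follow-up case: the same training data with every label multiplied by -1
negated_training = [(x, -y) for x, y in training]
negated = rank(train(negated_training), testing)

label_invertive = negated == original[::-1]
```

An implementation that silently mishandles negative labels would produce a ranking that is not the exact reverse, and the check would flag it.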

15 Applying Approach to SVM
- SVM exhibits all six Metamorphic Properties
- A defect was found in SVM-Light by using its permutative property
  - Permuting the input data led to different models (and then different rankings)
  - Caused by “chunking” data for use by an approximating variant of the optimization algorithm
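The following sketch illustrates how a permutative check can expose order dependence in a learner. Both trainers are hypothetical toys: `train_chunked` simply ignores everything after the first chunk, which is an exaggerated assumption for illustration and not a description of what SVM-Light actually does.

```python
import random

def train_full(examples):
    """Order-independent toy model: the mean of all training values."""
    return sum(examples) / len(examples)

def train_chunked(examples, chunk_size=3):
    """Order-dependent toy model: only the first chunk is used (a crude approximation)."""
    chunk = examples[:chunk_size]
    return sum(chunk) / len(chunk)

def is_permutative(train, examples, trials=20, seed=0):
    """Return True if every shuffled copy of the training data yields the same model."""
    rng = random.Random(seed)
    baseline = train(examples)
    for _ in range(trials):
        shuffled = examples[:]
        rng.shuffle(shuffled)
        if abs(train(shuffled) - baseline) > 1e-9:
            return False
    return True

data = [27, 82, 22, 4, 76, 6, 77, 39, 8]
full_ok = is_permutative(train_full, data)
chunked_ok = is_permutative(train_chunked, data)
```

Here the check passes for the order-independent trainer and fails for the chunked one, without any knowledge of what the correct model should be.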

16 Applying Approach to PAYL
- PAYL exhibits all six Metamorphic Properties
  - Even though it is unsupervised ML
- Two defects were found by using its exclusive property
  - Removing a value from the training data did not cause it to be considered anomalous later on
  - It also caused other values to be considered anomalous
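The two expectations behind the exclusive property can be sketched with a toy anomaly detector. The model below is only the set of byte values seen in training, a hypothetical simplification; PAYL itself models byte distributions, and the payloads here are invented for the example.

```python
def train(payloads):
    """Toy model (hypothetical): the set of byte values seen in 'normal' traffic."""
    return {b for p in payloads for b in p}

def is_anomalous(model, payload):
    """Flag a payload if it contains any byte value the model has never seen."""
    return any(b not in model for b in payload)

normal = [b"GET", b"POST", b"HEAD"]
held_out = b"HEAD"

full = train(normal)
reduced = train([p for p in normal if p != held_out])

# Expectation 1: a value removed from training should now be reported as anomalous
becomes_anomalous = (not is_anomalous(full, held_out)) and is_anomalous(reduced, held_out)

# Expectation 2: verdicts for the other values should not change
others_unchanged = all(
    is_anomalous(reduced, p) == is_anomalous(full, p)
    for p in normal if p != held_out
)
```

The two defects described above correspond to a violation of each expectation in turn.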

17 Future Work and Conclusion
- We have identified six Metamorphic Properties that we believe exist in many machine learning applications:
  - additive, multiplicative, permutative, invertive, inclusive, and exclusive
- These properties were used to find new defects in the ML applications of interest
- Further investigation could involve applying these properties to other, larger ML applications, and looking to classify other properties

18 Properties of Machine Learning Applications for Use in Metamorphic Testing Leon Wu leon@cs.columbia.edu Columbia University

