Properties of Machine Learning Applications for Use in Metamorphic Testing Chris Murphy, Gail Kaiser, Lifeng Hu, Leon Wu Columbia University.

Presentation transcript:


Introduction
- We are investigating the quality assurance of Machine Learning (ML) applications
- Machine Learning applications fall into a class for which it can be said that there is “no reliable test oracle”

Introduction
- Previously we have investigated approaches to testing such applications by considering properties of their data sets and by using random testing
- In this work, we seek to adapt Metamorphic Testing [Chen ’98] to these applications and consider their Metamorphic Properties

Contribution
- Our contribution is a set of Metamorphic Properties that can be used to define these relationships so that Metamorphic Testing can be used as a general approach to testing machine learning applications

Overview
- Background
- Testing Approach
- Findings and Results
- Future Work and Conclusion

Metamorphic Testing
- General technique for creating follow-up test cases based on existing ones, particularly those that have not revealed any failure
  - [Chen ’98, Gotlieb COMPSAC’03, Chen STEP’04, Zhou ISFST’04]
- Use a function’s Metamorphic Properties to predict the output for a particular input, given the known output for another input
  - For example, if we know sin(x) = y, then we know sin(x + 2π) = y and sin(-x) = -y
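The sin example can be checked mechanically without knowing the "correct" value of sin(x) in advance. The following is a minimal sketch in Python; the function name `check_sin_properties` and the tolerance are illustrative assumptions, not part of the original slides.

```python
import math

def check_sin_properties(x, tol=1e-9):
    """Return True if both metamorphic relations for sin hold at x.

    No oracle for sin(x) is needed: we only compare outputs
    of related inputs against each other.
    """
    y = math.sin(x)
    # Relation 1: sin(x + 2*pi) should equal sin(x).
    periodic = math.isclose(math.sin(x + 2 * math.pi), y, abs_tol=tol)
    # Relation 2: sin(-x) should equal -sin(x).
    odd = math.isclose(math.sin(-x), -y, abs_tol=tol)
    return periodic and odd

# Arbitrary sample inputs (assumed for illustration).
results = [check_sin_properties(x) for x in (0.0, 0.5, 1.0, 2.5, -3.0)]
```

A failing relation would indicate a defect even though the expected value of sin(x) itself was never specified.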

Related Work
- Applying metamorphic testing to situations in which there is no test oracle [Chen IST’02]
- There has been much research into applying Machine Learning techniques to software testing, but not much the other way around
- Testing of intrusion detection systems has typically addressed quantitative measurements but does not seek to ensure that the implementation is free of defects

Machine Learning Fundamentals
- Data sets consist of a number of examples, each of which has attributes and a label
- In the first phase (“training”), a model is generated that attempts to generalize how the attributes relate to the labels (if labels exist)
- In the second phase, the model is applied to a previously unseen data set with unknown labels to produce a classification (or, in some cases, a ranking)

Sample Data Set
For supervised machine learning. Each row is an example; the first seven columns are attributes and the last column is the label.

27,81,88,59,42,16,88, 0
82, 6,51,47, 5, 4, 1, 0
22,72,11,84,96,24,44, 1
 4,77,91,86,89,77,61, 1
76,11, 4,51,43, 2,79, 0
 6,33,44,18,52,63,94, 0
77,36,91,81,47, 3,85, 1
39,17,15, 2,90,70,13, 0
 8,58,42,41,74,87,68, 1
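To make the example/attribute/label layout concrete, the sample data can be split into attribute vectors and labels. This is a small Python sketch; the names `SAMPLE` and `parse_examples` are hypothetical, and it assumes the last value on each row is the label.

```python
# The nine rows of the sample data set, one example per line.
SAMPLE = """\
27,81,88,59,42,16,88, 0
82, 6,51,47, 5, 4, 1, 0
22,72,11,84,96,24,44, 1
 4,77,91,86,89,77,61, 1
76,11, 4,51,43, 2,79, 0
 6,33,44,18,52,63,94, 0
77,36,91,81,47, 3,85, 1
39,17,15, 2,90,70,13, 0
 8,58,42,41,74,87,68, 1
"""

def parse_examples(text):
    """Split each row into its attributes (all but last value) and its label (last value)."""
    attributes, labels = [], []
    for line in text.strip().splitlines():
        values = [int(v) for v in line.split(",")]  # int() ignores the padding spaces
        attributes.append(values[:-1])
        labels.append(values[-1])
    return attributes, labels

attributes, labels = parse_examples(SAMPLE)
```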

Applications Investigated
- MartiRank
  - Specifically designed for potential future experimental use in predicting impending electrical device failures by ranking them according to likelihood of failure
  - Seeks to find the combination of segmenting and sorting the data that produces the best result
- Support Vector Machines (SVM)
  - Seeks to find a hyperplane that separates examples from different classes
  - SVM-Light has a ranking mode based on the distance from the hyperplane
- PAYL
  - Anomaly-based intrusion detection system (IDS)
  - Builds a model of “normal” network traffic based on byte distribution, and reports any anomalies

Approach
- Previously, we tested such applications by analysis of the data sets and algorithms, and by using equivalence partitions to guide random testing
- In this work, we use our knowledge of MartiRank to devise a set of Metamorphic Properties, and then see if they also apply to SVM and PAYL
- We then use these properties to guide testing of these applications

MartiRank Metamorphic Properties
- Additive
  - If each value in the data set is increased by a constant, the final ranking should be unchanged
- Multiplicative
  - If each value in the data set is multiplied by a positive constant, the final ranking should be unchanged
- Permutative
  - If the order of the data is permuted, the final ranking should be unchanged (assuming distinct values in the data set)
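The three properties above can be expressed as checks against any ranking function. The sketch below uses a deliberately simple stand-in ranker (`toy_rank`, which orders examples by the sum of their attributes); it is a hypothetical illustration and is not MartiRank. The data values and constants are likewise assumed.

```python
import random

def toy_rank(examples):
    """Hypothetical stand-in ranker: example ids ordered by attribute sum, highest first."""
    ordered = sorted(examples.items(), key=lambda kv: sum(kv[1]), reverse=True)
    return [ex_id for ex_id, _ in ordered]

def transform(examples, fn):
    """Apply fn to every attribute value in the data set."""
    return {ex_id: [fn(v) for v in attrs] for ex_id, attrs in examples.items()}

# Small data set with distinct attribute sums (assumed for illustration).
examples = {"a": [27, 81, 88], "b": [82, 6, 51], "c": [22, 72, 11], "d": [4, 77, 91]}
baseline = toy_rank(examples)

# Additive: add a constant to every value.
additive_ok = toy_rank(transform(examples, lambda v: v + 10)) == baseline

# Multiplicative: multiply every value by a positive constant.
multiplicative_ok = toy_rank(transform(examples, lambda v: v * 3)) == baseline

# Permutative: present the same examples in a different order.
items = list(examples.items())
random.Random(0).shuffle(items)
permutative_ok = toy_rank(dict(items)) == baseline
```

For a real application, `toy_rank` would be replaced by a call that trains and applies the system under test; any check that evaluates to `False` points to a possible defect.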

MartiRank Metamorphic Properties
- Invertive
  - If each value in the data set is multiplied by a negative constant, the final ranking should be in the reverse order
- Inclusive
  - In the “testing phase”, if the model is already known, it should be possible to create an example in the testing data such that it is guaranteed to be at the top of the ranking
- Exclusive
  - If an example is removed from the testing data, the final ranking should be unchanged
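A sketch of how these three properties might be checked, again in Python. The linear scoring model, its weights, and the data are hypothetical stand-ins rather than MartiRank's actual model. The exclusive property is read here as "the relative order of the remaining examples is unchanged", which is an interpretation of the slide.

```python
def rank(weights, examples):
    """Hypothetical known model: rank example ids by a weighted sum, highest first."""
    def score(attrs):
        return sum(w * a for w, a in zip(weights, attrs))
    return sorted(examples, key=lambda ex_id: score(examples[ex_id]), reverse=True)

weights = [2, -1, 1]  # assumed "already known" model
testing = {"a": [27, 81, 88], "b": [82, 6, 51], "c": [22, 72, 11], "d": [4, 77, 91]}
baseline = rank(weights, testing)

# Invertive: multiplying every value by -1 should reverse the ranking.
negated = {k: [-v for v in attrs] for k, attrs in testing.items()}
invertive_ok = rank(weights, negated) == list(reversed(baseline))

# Inclusive: knowing the model, build an example that must rank first by
# exceeding every existing value in the direction each weight rewards.
columns = list(zip(*testing.values()))
winner = [max(col) + 1 if w >= 0 else min(col) - 1 for w, col in zip(weights, columns)]
inclusive_ok = rank(weights, {**testing, "new": winner})[0] == "new"

# Exclusive: removing one example should not reorder the remaining ones.
without_a = {k: v for k, v in testing.items() if k != "a"}
exclusive_ok = rank(weights, without_a) == [k for k in baseline if k != "a"]
```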

Testing MartiRank
- Its invertive property should hold for the labels in the training data, too
  - Multiplying the labels by -1 should yield a model that, when applied to the same testing data, will result in the reverse ordering
- Negative labels were not considered by the developer and a defect was revealed through Metamorphic Testing
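The label-negation check can be sketched as a train-then-rank round trip run twice. The trainer below is a hypothetical toy (it weights each attribute by its covariance with the label) and is not MartiRank's algorithm; the training and testing rows are made up for illustration.

```python
def train(rows, labels):
    """Hypothetical toy trainer: weight each attribute by its covariance with the label."""
    n = len(rows)
    mean_label = sum(labels) / n
    weights = []
    for j in range(len(rows[0])):
        mean_attr = sum(r[j] for r in rows) / n
        cov = sum((r[j] - mean_attr) * (y - mean_label) for r, y in zip(rows, labels)) / n
        weights.append(cov)
    return weights

def rank(weights, rows):
    """Return row indices ordered by model score, highest first."""
    scores = [sum(w * a for w, a in zip(weights, r)) for r in rows]
    return sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)

training_rows = [[1, 5], [2, 3], [3, 4], [4, 1]]
training_labels = [0, 0, 1, 1]
testing_rows = [[1, 1], [5, 2], [2, 6], [3, 3]]

original = rank(train(training_rows, training_labels), testing_rows)
# Follow-up test case: same data, every label multiplied by -1.
negated_labels = [-y for y in training_labels]
follow_up = rank(train(training_rows, negated_labels), testing_rows)

# The metamorphic relation: the follow-up ranking is the original reversed.
label_invertive_ok = follow_up == list(reversed(original))
```

An implementation that silently assumes non-negative labels would be expected to fail this comparison.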

Applying Approach to SVM
- SVM exhibits all six Metamorphic Properties
- A defect was found in SVM-Light by using its permutative property
  - Permuting the input data led to different models (and then different rankings)
  - Caused by “chunking” data for use by an approximating variant of optimization algorithm
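To illustrate how a permutative check can expose order dependence, the sketch below compares two toy 1-D learners. `fit_chunked` is a hypothetical learner that only looks at the first few examples; it merely imitates the symptom described above and does not reproduce SVM-Light's optimization or its actual chunking behaviour.

```python
def fit_threshold(points):
    """Toy 1-D 'separator': midpoint between the means of the two classes."""
    pos = [x for x, y in points if y == 1]
    neg = [x for x, y in points if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def fit_chunked(points, chunk_size=4):
    """Order-dependent variant: fits on the first chunk only (hypothetical defect)."""
    return fit_threshold(points[:chunk_size])

def violates_permutative(fit, points, permuted, tol=1e-9):
    """True if training on a permutation of the data yields a different model."""
    return abs(fit(points) - fit(permuted)) > tol

# (value, label) pairs, assumed for illustration.
points = [(1, 0), (2, 0), (8, 1), (9, 1), (3, 0), (10, 1)]
permuted = list(reversed(points))

full_violation = violates_permutative(fit_threshold, points, permuted)
chunked_violation = violates_permutative(fit_chunked, points, permuted)
```

Here the learner that uses all the data passes the check, while the chunked one produces a different model for the permuted input.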

Applying Approach to PAYL
- PAYL exhibits all six Metamorphic Properties
  - Even though it is unsupervised ML
- Two defects were found by using its exclusive property
  - Removing a value from the training data did not cause it to be considered anomalous later on
  - It also caused other values to be considered anomalous
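The two expectations behind the exclusive property can be written as a pair of checks. The detector below is a heavily simplified, hypothetical stand-in: it treats the set of byte values seen in training as "normal", whereas PAYL models byte distributions. The payloads are invented for illustration.

```python
def build_model(payloads):
    """Toy model of 'normal' traffic: the set of byte values seen in training."""
    return {byte for payload in payloads for byte in payload}

def is_anomalous(model, payload):
    """Flag a payload if it contains any byte value absent from the model."""
    return any(byte not in model for byte in payload)

training = [b"GET /index", b"POST /form", b"\x00\x01\x02"]
removed = b"\x00\x01\x02"

full_model = build_model(training)
reduced_model = build_model([p for p in training if p != removed])

# Expectation 1: once removed from training, the value should become anomalous.
removed_now_anomalous = (
    not is_anomalous(full_model, removed) and is_anomalous(reduced_model, removed)
)

# Expectation 2: the values still in the training data should stay normal.
others_still_normal = all(
    not is_anomalous(reduced_model, p) for p in training if p != removed
)
```

The two defects reported above correspond to these two checks failing in the real system.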

Future Work and Conclusion
- We have identified six Metamorphic Properties that we believe exist in many machine learning applications:
  - additive, multiplicative, permutative, invertive, inclusive, and exclusive
- These properties were used to find new defects in the ML applications of interest
- Further investigation could involve applying these properties to other, larger ML applications, and looking to classify other properties

Properties of Machine Learning Applications for Use in Metamorphic Testing Leon Wu Columbia University