Mining Binary Constraints in Feature Models: A Classification-based Approach. Yi Li, 2011.10.10.

Presentation transcript:

Mining Binary Constraints in Feature Models: A Classification-based Approach Yi Li

Outline
– Approach Overview
– Approach in Detail
– The Experiments

Basic Idea
If we focus on binary constraints…
– Requires
– Excludes
…we can classify a feature-pair as:
– Non-constrained
– Require-constrained
– Exclude-constrained

Approach Overview
[Pipeline diagram] Training & test FM(s) → Make Pairs → training & test pair(s) → Vectorize (using the Stanford Parser) → training & test vectors → Optimize & Train → trained classifier → Test → classified test pairs.

Outline
– Approach Overview
– Step 1: Make Pairs
– The Experiment

Rules of Making Pairs
Unordered
– This means that if (A, B) is a "requires-pair", then A requires B, or B requires A, or both.
– Why? Because "non-constrained" and "excludes" are unordered; if we used ordered pairs, there would be redundant pairs for the "non-constrained" and "excludes" classes.
Cross-Tree Only
– Pair (A, B) is valid if and only if A and B have no "ancestor/descendant" relation.
– Why? "Excludes" between an ancestor and a descendant is an error, and "requires" between them is better expressed by optionality.
A minimal pair-generation sketch is shown below.
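The slides contain no code; the following is a minimal Python sketch of the two pairing rules. The feature-tree representation (a child-to-parent map) and all names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: generate unordered, cross-tree-only feature pairs
# from a feature tree given as a child -> parent map.
from itertools import combinations

def ancestors(feature, parent_of):
    """Return the set of ancestors of a feature in the feature tree."""
    result = set()
    while feature in parent_of:
        feature = parent_of[feature]
        result.add(feature)
    return result

def make_pairs(features, parent_of):
    """Unordered pairs (A, B) such that neither feature is an ancestor of the other."""
    pairs = []
    for a, b in combinations(features, 2):   # combinations -> unordered, no duplicates
        if b in ancestors(a, parent_of) or a in ancestors(b, parent_of):
            continue                          # skip ancestor/descendant pairs
        pairs.append((a, b))
    return pairs

# Toy example: root "Calendar" with children "Events" and "Export", grandchild "PDF"
parent_of = {"Events": "Calendar", "Export": "Calendar", "PDF": "Export"}
features = ["Calendar", "Events", "Export", "PDF"]
print(make_pairs(features, parent_of))
# [('Events', 'Export'), ('Events', 'PDF')]  -- Calendar is an ancestor of every feature
```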

Outline
– Approach Overview
– Step 2: Vectorize the Pairs
– The Experiment

Vectorization: Text to Numbers
A pair contains 2 features' names and descriptions (i.e. textual attributes).
To work with a classifier, a pair must be represented as a group of numerical attributes.
We calculate 4 numerical attributes for a pair (A, B):
– Similarity(A, B) = Pr(A.description == B.description)
– Overlap(A, B) = Pr(A.objects == B.objects)
– Target(A, B) = Pr(A.name == B.objects)
– Target(B, A) = Pr(B.name == A.objects)

Reasons for Choosing the Attributes
Constraints indicate some kind of dependency or interaction between features:
– Similar feature descriptions
– Overlapping objects
– A feature is targeted by another
These phenomena increase the chance that a dependency or interaction exists.

Use the Stanford Parser to Find Objects
The Stanford Parser can perform grammatical analysis on sentences in many languages, including English and Chinese.
For English sentences, we extract objects (direct, indirect, prepositional) and any adjectives modifying those objects (a sketch of this extraction follows the examples below).
The parser works well even for incomplete sentences (common in feature descriptions).

Examples
– "Add web links, document files, image files and notes to any event." (annotated on the slide with its direct objects and a prepositional object)
– "Use a PDF driver to output or publish web calendars so anyone on your team can view scheduled events." (annotated with its direct objects and an adjective modifier)
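As a hedged illustration of this extraction step, here is a sketch using the Stanford NLP group's stanza Python library in place of the original Java Stanford Parser. The Universal Dependencies relation labels (obj, iobj, obl, conj, amod) are my approximation of "direct, indirect, prepositional objects and their adjectives", not necessarily the labels the original tool produced.

```python
# Hedged sketch: object extraction from a feature description with stanza.
import stanza

# stanza.download('en')  # one-time model download
nlp = stanza.Pipeline(lang='en', processors='tokenize,pos,lemma,depparse')

OBJECT_RELS = {'obj', 'iobj', 'obl'}  # direct, indirect and prepositional objects (UD labels)

def extract_objects(text):
    objects = []
    for sent in nlp(text).sentences:
        words = sent.words
        kept = {w.id for w in words if w.deprel in OBJECT_RELS}
        # conjuncts of an object ("links, files ... and notes") count as objects too
        for w in words:
            if w.deprel == 'conj' and w.head in kept:
                kept.add(w.id)
        # keep adjectives modifying a kept object ("scheduled events")
        adjectives = [w for w in words if w.deprel == 'amod' and w.head in kept]
        objects += [w.lemma for w in words if w.id in kept]
        objects += [w.lemma for w in adjectives]
    return objects

print(extract_objects("Add web links, document files, image files and notes to any event."))
```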

Calculate the Attributes
Each of the 4 attributes follows the general form Pr(Text_A == Text_B), where Text is either a description, an object list, or a name.
To calculate:
– Stem the words in each Text and remove stop words.
– Compute the tf-idf (term frequency, inverse document frequency) value v_i for each word i. Thus Text = (v_1, v_2, …, v_n), where n is the total number of distinct words in Text_A and Text_B.
– Pr(Text_A == Text_B) = (Text_A · Text_B) / (|Text_A| · |Text_B|), i.e. the cosine similarity of the two tf-idf vectors.
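A minimal sketch of this computation follows. The stop-word list, the idf smoothing, and the choice to compute idf over all feature descriptions are my assumptions; the slides only specify stemming, stop-word removal, tf-idf, and the normalized dot product (stemming is omitted here for brevity).

```python
import math
import re
from collections import Counter

STOP_WORDS = {"a", "an", "the", "to", "of", "and", "or", "in", "on", "so", "any", "can", "your"}

def tokens(text):
    # lowercase word tokens with stop words removed (stemming omitted for brevity)
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def make_idf(corpus):
    # smoothed idf computed over all feature descriptions (an assumption)
    n = len(corpus)
    df = Counter(w for doc in corpus for w in set(tokens(doc)))
    return lambda w: math.log((1 + n) / (1 + df[w])) + 1

def tfidf_vector(text, idf):
    tf = Counter(tokens(text))
    return {w: count * idf(w) for w, count in tf.items()}

def similarity(text_a, text_b, idf):
    # Pr(Text_A == Text_B): cosine of the two tf-idf vectors
    va, vb = tfidf_vector(text_a, idf), tfidf_vector(text_b, idf)
    dot = sum(va[w] * vb.get(w, 0.0) for w in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

descriptions = ["Add web links and notes to any event.",
                "Publish web calendars so your team can view scheduled events."]
idf = make_idf(descriptions)
print(similarity(descriptions[0], descriptions[1], idf))
```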

Outline
– Approach Overview
– Step 3: Optimize and Train the Classifier
– The Experiment

The Support Vector Classifier
A (binary) classification technique that has shown promising empirical results in many practical applications.
Basic Idea
– Data = points in k-dimensional space (k is the number of attributes)
– Classification = find a hyperplane (a line in 2-D space) to separate these points

Find the Line in 2D
[Scatter plot of two classes over Attribute 1 and Attribute 2] There are infinitely many separating lines available.

SVC: Find the Best Line
Best = maximum margin.
[Plot showing the margins on either side of the separating line] A larger margin gives fewer prediction errors; the points defining the margin are called "support vectors".

LIBSVM: A Practical SVC
By Chih-Chung Chang and Chih-Jen Lin, National Taiwan University (see the LIBSVM website).
Key features of LIBSVM:
– Easy to use
– Integrated support for cross-validation (discussed later)
– Built-in support for multi-class problems (more than 2 classes)
– Built-in support for unbalanced classes (there are far more non-constrained pairs than the others)
A usage sketch is shown below.
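A hedged usage sketch: scikit-learn's SVC wraps LIBSVM, so it stands in here for direct LIBSVM calls. The toy attribute vectors and the label encoding (0 = non-constrained, 1 = requires, 2 = excludes) are illustrative assumptions, not values from the slides.

```python
import numpy as np
from sklearn.svm import SVC

# Each row holds the 4 attributes of one pair: [similarity, overlap, target(A,B), target(B,A)]
X_train = np.array([[0.82, 0.40, 0.10, 0.05],
                    [0.12, 0.03, 0.00, 0.01],
                    [0.55, 0.61, 0.30, 0.25]])
y_train = np.array([1, 0, 2])   # hypothetical labels: requires, non-constrained, excludes

clf = SVC(kernel='rbf', C=1.0, gamma='scale',
          class_weight='balanced')   # counteracts the many non-constrained pairs
clf.fit(X_train, y_train)
print(clf.predict([[0.70, 0.35, 0.20, 0.10]]))
```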

LIBSVM: Best Practices
1. Optimize (find the best SVC parameters)
– Run cross-validation to compute classification accuracy.
– Apply an optimization algorithm to find the best accuracy and the corresponding parameters.
2. Train with the best parameters.

Cross-Validation (k-Fold)
Divide the training data set into k equal-sized subsets.
Run the classifier k times.
– During each run, one subset is chosen for testing, and the others for training.
Compute the average accuracy: accuracy = number of correctly classified pairs / total number of pairs.
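A small sketch of the cross-validation accuracy computation, again via scikit-learn rather than LIBSVM's built-in CV; the function name and the k = 5 default are assumptions.

```python
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def cv_accuracy(X, y, C, gamma, k=5):
    """Average k-fold cross-validation accuracy for one (C, gamma) setting."""
    clf = SVC(kernel='rbf', C=C, gamma=gamma, class_weight='balanced')
    scores = cross_val_score(clf, X, y, cv=k, scoring='accuracy')  # one score per fold
    return scores.mean()
```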

The Optimization Algorithm
Basic concepts
– Solution: a set of parameters to be optimized.
– Cost function: a function that returns higher values for worse solutions.
– Optimization tries to find a solution with the lowest cost.
For the classifier
– Cost = 1 - accuracy
We use a genetic algorithm for the optimization.

Genetic Algorithm
Basic idea
– Start with random solutions (the initial population).
– Produce the next generation from the top elites of the current population.
Mutation: slightly change an elite solution, e.g. [0.3, 2, 5] → [0.4, 2, 5]
Crossover (breeding): combine random parts of 2 elite solutions into a new one, e.g. [0.3, 2, 5] and [0.5, 3, 3] → [0.3, 3, 3]
– Repeat until the stop condition has been reached.
– The best solution of the last generation is taken as the result.
A sketch of this search over the SVC parameters is shown below.
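A hedged sketch of the genetic search over the SVC parameters, using the cv_accuracy helper from the cross-validation sketch as the fitness. The choice of (C, gamma) as the solution, the parameter ranges, population size, mutation width, and generation count are all illustrative, not the values used in this work.

```python
import random

def evolve(X, y, pop_size=20, elite=5, generations=10):
    # initial population: random (C, gamma) pairs on a log scale
    population = [(10 ** random.uniform(-2, 3), 10 ** random.uniform(-4, 1))
                  for _ in range(pop_size)]
    for _ in range(generations):
        # lower cost (1 - CV accuracy) is better
        scored = sorted(population, key=lambda p: 1 - cv_accuracy(X, y, *p))
        elites = scored[:elite]
        population = list(elites)
        while len(population) < pop_size:
            if random.random() < 0.5:                      # mutation: perturb one elite
                c, g = random.choice(elites)
                population.append((c * random.uniform(0.5, 2), g * random.uniform(0.5, 2)))
            else:                                          # crossover: mix parts of two elites
                (c1, _), (_, g2) = random.sample(elites, 2)
                population.append((c1, g2))
    return min(population, key=lambda p: 1 - cv_accuracy(X, y, *p))
```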

Outline
– Overview
– Details
– The Experiments

Preparing Data
We need
– 2 feature models, with constraints already added.
We use 2 feature models from the SPLOT Feature Model Repository:
– Graph Product Line, by Don Batory
– Weather Station, by Pure-Systems
Most of the features are terms defined in Wikipedia; we use the first paragraph of the definition as the feature's description.

Experiment Settings
There are 2 types of experiments:
– Without feedback: generate the training and test sets, then optimize, train, and test to obtain the result.
– With limited feedback: generate the initial training and test sets; after each optimize/train/test round, check a few results, add the checked results to the training set, remove them from the test set, and repeat.

Experiment Settings
For each type of experiment, we compare 4 train/test methods (widely used in the data mining field):
1. Training set = FM1; test set = FM2
2. Training set = FM1 + a small part of FM2; test set = the rest of FM2
3. Training set = a small part of FM2; test set = the rest of FM2
4. The same as 3, but with iterated LU training

What Are the Experiments For?
Comparison of the 4 methods: can a trained classifier be applied to different feature models (domains)?
– Or: do the constraints in different domains follow the same pattern?
Comparison of the 2 experiment types: does limited feedback (an expected practice in the real world) improve the results?

Preliminary Results
(Found a bug in the implementation of Methods 2 to 4, so only Method 1 was run.)
Feedback strategy: constraints and higher similarity first.

Test Model = Graph Product Line
  Without Feedback: 83.95%
  Feedback (5):     86.85%
  Feedback (10):    88.73%
  Feedback (15):    95.45%
  Feedback (20):    98.36%

Test Model = Weather Station
  Without Feedback: 97.84%
  Feedback (5):     99.44%
  Feedback (10):    99.44%
  Feedback (15):    99.44%
  Feedback (20):    99.44%

Outline
– Overview
– Preparing Data
– Classification
– Cross Validation & Optimization
– The Experiment
– What's Next

Future Work
– More FMs for the experiments
– Use the Stanford Parser for Chinese to integrate constraint mining into CoFM