

Extending Propositional Satisfiability to Determine Minimal Fuzzy-Rough Reducts
Richard Jensen (Aberystwyth University, UK), Andrew Tuson (City University, UK) and Qiang Shen (Aberystwyth University, UK)

Outline
- The importance of feature selection
- Rough set theory
- Fuzzy-rough feature selection (FRFS)
- FRFS-SAT
- Experimentation
- Conclusion

Why dimensionality reduction / feature selection?
- Growth of information: a need to manage it effectively
- Curse of dimensionality: a problem for machine learning
(Diagram: high-dimensional data is intractable for the processing system; dimensionality reduction, e.g. feature selection, yields low-dimensional data it can process.)

Rough set theory
Rx is the set of all points that are indiscernible from point x in terms of feature subset B.
(Diagram labels: set A, its lower and upper approximations, and an equivalence class Rx.)
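As a concrete reading of the approximations (a minimal sketch, not from the slides; the function names and toy partition are illustrative): the lower approximation collects objects whose equivalence class lies entirely inside A, the upper approximation those whose class overlaps A at all.

```python
# Rough-set lower/upper approximations over an equivalence relation.
def approximations(universe, equiv_class, A):
    """equiv_class(x) returns Rx, the set of objects indiscernible from x."""
    lower = {x for x in universe if equiv_class(x) <= A}   # Rx is a subset of A
    upper = {x for x in universe if equiv_class(x) & A}    # Rx intersects A
    return lower, upper

# Toy partition {1,2} {3,4} {5,6}; target concept A = {2, 3, 4}
blocks = [{1, 2}, {3, 4}, {5, 6}]
ec = lambda x: next(b for b in blocks if x in b)
print(approximations({1, 2, 3, 4, 5, 6}, ec, {2, 3, 4}))
# ({3, 4}, {1, 2, 3, 4})
```

Object 2 is in the upper but not the lower approximation because its class {1, 2} straddles the boundary of A; this boundary region is exactly the uncertainty rough sets model.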

Discernibility approach
Decision-relative discernibility matrix:
- Compare objects pairwise, examining their attribute values
- If the objects' decision values differ, enter into the matrix the attributes on which they differ
- Otherwise leave the entry blank
From the matrix, construct the discernibility function.
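The function itself appears as an image on the slide; the standard decision-relative discernibility function it denotes can be reconstructed as:

```latex
f_{\mathcal{C}}(a_1,\dots,a_m) \;=\; \bigwedge \Bigl\{\, \bigvee c_{ij} \;\Bigm|\; 1 \le j < i \le |U|,\; c_{ij} \ne \emptyset \,\Bigr\}
```

where $c_{ij}$ is the matrix entry for objects $i$ and $j$ (the set of attributes that discern them, when their decision values differ). The reducts of the system correspond to the prime implicants of this function, as the next slide's example shows.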

Example
After removing duplicates:
f C (a,b,c,d) = {a ⋁ b ⋁ c ⋁ d} ⋀ {a ⋁ c ⋁ d} ⋀ {b ⋁ c} ⋀ {d} ⋀ {a ⋁ b ⋁ c} ⋀ {a ⋁ b ⋁ d} ⋀ {b ⋁ c ⋁ d} ⋀ {a ⋁ d}
After removing superset clauses (absorption):
f C (a,b,c,d) = {b ⋁ c} ⋀ {d}
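The two simplification steps on this slide can be sketched directly, treating each clause as a set of attributes (a hypothetical illustration; variable names are mine):

```python
# Clauses of the slide's discernibility function, each set a disjunction.
clauses = [
    {"a", "b", "c", "d"}, {"a", "c", "d"}, {"b", "c"}, {"d"},
    {"a", "b", "c"}, {"a", "b", "d"}, {"b", "c", "d"}, {"a", "d"},
]

def absorb(clauses):
    """Remove duplicate clauses, then any clause that is a strict
    superset of another (absorption law: X AND (X OR Y) = X)."""
    unique = []
    for c in clauses:
        if c not in unique:
            unique.append(c)
    return [c for c in unique if not any(other < c for other in unique)]

print(sorted(sorted(c) for c in absorb(clauses)))
# [['b', 'c'], ['d']]
```

Only {b ⋁ c} and {d} survive: every other clause contains d or contains both b and c, so it is absorbed.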

Finding reducts
It is usually too expensive to search exhaustively for reducts of minimal cardinality. Reducts are instead found by:
- converting the function from CNF to DNF (expensive)
- hill-climbing search using the clauses (non-optimal)
- other search methods, e.g. genetic algorithms (non-optimal)
RSAR-SAT: solve directly in the SAT formulation. The DPLL approach is both fast and guarantees minimal reducts.
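To make the search target concrete, here is a smallest-first exhaustive search for a minimal reduct over the clause set from the example. This is not the DPLL solver the slide refers to (DPLL adds unit propagation and pruning), just an illustration of the object being sought, workable at toy scale:

```python
from itertools import combinations

def minimal_reduct(attributes, clauses):
    """Return a smallest attribute subset that satisfies every clause,
    i.e. intersects every disjunction of the discernibility function."""
    for size in range(1, len(attributes) + 1):       # smallest first
        for subset in combinations(attributes, size):
            if all(set(subset) & clause for clause in clauses):
                return set(subset)
    return None

clauses = [{"b", "c"}, {"d"}]                        # simplified function
print(minimal_reduct(["a", "b", "c", "d"], clauses))  # {'b', 'd'}
```

Because sizes are tried in increasing order, the first hit is guaranteed minimal; {b, d} satisfies both clauses ({b, c} via b and {d} via d), and no single attribute does.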

Fuzzy discernibility matrices
An extension of the crisp approach: previously, attributes had {0,1} membership to clauses; now they have membership in [0,1]. This allows real-valued as well as nominal data. Fuzzy discernibility matrices can be used to find fuzzy-rough reducts.

Formulation: fuzzy satisfiability
In crisp SAT, a clause is fully satisfied if at least one variable in the clause has been set to true. In the fuzzy case, clauses may be satisfied to a certain degree, depending on which variables have been assigned the value true.
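A minimal sketch of that degree of satisfaction, assuming the max s-norm as the disjunction (a common choice; the slide does not specify the connective, so treat this as illustrative):

```python
def clause_satisfaction(memberships, assigned_true):
    """Degree to which a fuzzy clause is satisfied.
    memberships: dict mapping attribute -> its membership in the clause
    assigned_true: set of attributes currently assigned true."""
    return max((memberships[a] for a in assigned_true if a in memberships),
               default=0.0)

clause = {"a": 0.3, "b": 0.9, "c": 0.6}
print(clause_satisfaction(clause, {"a", "c"}))  # 0.6
print(clause_satisfaction(clause, {"b"}))       # 0.9
```

With crisp {0,1} memberships this collapses to ordinary SAT clause satisfaction, which is exactly the sense in which the formulation extends the crisp one.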

Experimentation: setup
- 9 benchmark datasets (10 to 39 features; 120 to 690 objects)
- Methods used: FRFS-SAT; greedy hill-climbing with fuzzy dependency, fuzzy boundary region and fuzzy discernibility; evolutionary algorithms, namely genetic algorithms (GA) and particle swarm optimization (PSO), using fuzzy dependency
- 10x10-fold cross-validation: FS performed on the training folds, test folds reduced using the discovered reducts
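The key methodological point above is that selection happens inside each training fold, never on the held-out data. A schematic sketch of that protocol (function names are placeholders, not the authors' code):

```python
def cross_validate(objects, k, select_features, classify):
    """k-fold CV where feature selection is re-run on each training fold
    and the test fold is reduced with that fold's reduct, avoiding
    selection bias from peeking at test data."""
    folds = [objects[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [o for j, f in enumerate(folds) if j != i for o in f]
        reduct = select_features(train)        # FS on training data only
        scores.append(classify(train, test, reduct))
    return sum(scores) / k
```

Running this whole procedure 10 times with reshuffled folds gives the 10x10-fold estimate reported on the next slide.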

Experimentation: results
(Results presented as a table in the original slide; not transcribed.)

Conclusion
- Extended propositional satisfiability to enable the search for fuzzy-rough reducts: a new framework for fuzzy satisfiability, a new DPLL algorithm, and fuzzy clause simplification
- Future work: non-chronological backtracking, better heuristics, unsupervised FS, and other extensions in propositional satisfiability

WEKA implementations of all fuzzy-rough feature selectors and classifiers can be downloaded from:

Feature selection
Feature selection (FS) is a DR technique that preserves data semantics (the meaning of the data).
- Subset generation: forwards, backwards, random…
- Evaluation function: determines the 'goodness' of subsets
- Stopping criterion: decides when to stop the subset search
(Diagram: generation → evaluation → stopping criterion → validation.)
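The generation/evaluation/stopping loop above can be sketched as a greedy forward selection skeleton (an illustrative sketch, not the paper's algorithm; `evaluate` stands in for any subset 'goodness' measure, e.g. fuzzy-rough dependency):

```python
def forward_select(features, evaluate):
    """Grow a subset one feature at a time (generation), scoring each
    candidate (evaluation), stopping when no addition improves the score
    (stopping criterion)."""
    subset, best = set(), evaluate(set())
    while True:
        gains = {f: evaluate(subset | {f}) for f in features - subset}
        if not gains:
            break                                  # all features used
        f, score = max(gains.items(), key=lambda kv: kv[1])
        if score <= best:
            break                                  # no improvement: stop
        subset.add(f)
        best = score
    return subset

# Toy measure: 'goodness' is how much of the ideal subset is covered.
print(forward_select({"a", "b", "c", "d"}, lambda s: len(s & {"b", "d"})))
# {'b', 'd'}
```

Backward elimination and random generation fit the same skeleton by changing how candidate subsets are generated; the evaluation function and stopping criterion are unchanged.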

Algorithm
(Algorithm given as a figure in the original slide; not transcribed.)

Example
(Example given as a figure in the original slide; not transcribed.)