Computational Intelligence: Methods and Applications


Computational Intelligence: Methods and Applications
Lecture 17: WEKA/RapidMiner. Knowledge extraction from the simplest decision trees.
Włodzisław Duch, Dept. of Informatics, UMK. Google: W Duch

A few data mining packages

A large number of data mining packages that include many CI models for data analysis are available; long lists of DM software, including large commercial packages, can be found online.
- GhostMiner, from Fujitsu (created by our group); please get it.
- WEKA started the trend of collecting many algorithms in one system.
- RapidMiner, formerly YALE: initially a better front-end to WEKA; includes all WEKA models, free source; please get it.
New interesting projects (see my list of software):
- Orange, component-based data mining software; includes visualizations and SOM/MDS modules.
- KNIME, based on the Eclipse platform: a modular data exploration platform with visual data flows; includes WEKA and R scripts.
- R project, a language for statistical computing and graphics.

WEKA Project

Machine learning algorithms in Java: I.H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, 1999.
Project web page: www.cs.waikato.ac.nz/ml/weka
One of the most popular packages; essentially a collection of Java class libraries implementing various computational intelligence algorithms.
ARFF data format: data in CSV form (comma-separated, exportable from spreadsheets) plus additional information about the data, such as the type of each feature.
CLI, or command-line interface (only for Unix lovers), e.g.:

    java weka.classifiers.j48.J48 -t data/weather.arff

calls one of the methods (here J48) from the library; in recent WEKA versions the class is weka.classifiers.trees.J48.
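For concreteness, an ARFF file pairs a header describing the attributes with CSV-style data rows. The weather data used throughout the WEKA book ships as weather.arff and looks roughly as follows (only the first data rows are shown here):

    @relation weather

    @attribute outlook {sunny, overcast, rainy}
    @attribute temperature numeric
    @attribute humidity numeric
    @attribute windy {TRUE, FALSE}
    @attribute play {yes, no}

    @data
    sunny,85,85,FALSE,no
    sunny,80,90,TRUE,no
    overcast,83,86,FALSE,yes
    ...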

WEKA Software

"Explorer" GUI for basic calculations, recently much improved. "Experimenter" and "Knowledge Flow" environments are provided for more complex experiments; they allow averaging over crossvalidation results or combining different models. On-line documentation covers the library classes but gives little description of the methods.
WEKA/RM software contains:
- preprocessing filters, supervised and unsupervised;
- many classification models;
- rule-based models for knowledge discovery;
- association rules (one method);
- regression, or numerical prediction, models;
- 3 clusterization (unsupervised learning) methods;
- scatterogram (2D) visualization;
- a collection of simple sample problems (from the UCI repository).
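What the Experimenter automates can also be scripted directly against the class libraries. A minimal sketch (assuming a WEKA 3.x jar on the classpath and the sample data/weather.arff file) that cross-validates a J48 tree:

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class CrossValidateJ48 {
        public static void main(String[] args) throws Exception {
            // Load a dataset; by convention the last attribute is the class.
            Instances data = DataSource.read("data/weather.arff");
            data.setClassIndex(data.numAttributes() - 1);

            // 10-fold cross-validation of a J48 (C4.5) tree.
            J48 tree = new J48();
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(tree, data, 10, new Random(1));
            System.out.println(eval.toSummaryString());
            System.out.println(eval.toMatrixString());
        }
    }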


WEKA strong/weak points

Strong points: platform independent (Java!), with many projects created around it (listed here). Free, and contains a large collection of filters and algorithms; may be extended by a serious user.
But...
- Java programs are not as stable as native Windows programs, and there are problems with some Java versions.
- Visualization of data and results is rather poor; RapidMiner is a big improvement.
- The simple user interface has been improved recently; the Knowledge Flow GUI changes this.
- Performing experiments requires tedious programming; RM is easier.
- Algorithms are not described in detail in the documentation or in the book (only in the class libraries).

RapidMiner

Like WEKA: the same models plus a few more, and easier to use. Free, contains a large collection of filters and algorithms; may be extended by a serious user. Download RapidMiner, start it, and read the tutorial!
Includes 20 visual data exploration methods: scatter, scatter matrix, interactive 3D scatter, parallel coordinates, 2D density, RadViz (radial), GridViz, SOM (U-distance and P-density).
Unfortunately the algorithms are not described in detail in the documentation: you have to study the class libraries to understand what exactly GridViz or RadViz does, or read the original papers to understand what the U-, U*- and P-matrix SOM visualizations are. Check the much better descriptions of methods in Orange!

Knowledge representation

Knowledge representation is an important subject in Artificial Intelligence; here only simple forms of knowledge are considered.
Decision rules:
- propositional rules: IF (all conditions are true) THEN facts;
- M-of-N rules: IF (M of N conditions are true) THEN facts;
- fuzzy rules: IF (conditions are true to some degree) THEN facts are true to some degree.
Linguistic variables: favorite-colors, low-noise-level, young-age, etc.:
- subsets of nominal or discrete values;
- intervals of numerical values, e.g. teenager = {T if age < 20};
- constrained subsets of numerical values.
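The three rule forms differ only in how the truth values of the conditions are aggregated. A schematic, WEKA-independent Java sketch (all names here are hypothetical, for illustration only):

    import java.util.List;
    import java.util.function.Predicate;
    import java.util.function.ToDoubleFunction;

    public class RuleForms {
        // A propositional rule fires only when ALL conditions hold.
        static <T> boolean propositional(List<Predicate<T>> conds, T x) {
            return conds.stream().allMatch(c -> c.test(x));
        }

        // An M-of-N rule fires when at least m of the n conditions hold.
        static <T> boolean mOfN(List<Predicate<T>> conds, int m, T x) {
            return conds.stream().filter(c -> c.test(x)).count() >= m;
        }

        // A fuzzy rule returns a degree of truth in [0,1], here the
        // minimum (a t-norm) of the condition membership degrees.
        static <T> double fuzzy(List<ToDoubleFunction<T>> memberships, T x) {
            return memberships.stream()
                    .mapToDouble(f -> f.applyAsDouble(x)).min().orElse(1.0);
        }

        public static void main(String[] args) {
            List<Predicate<double[]>> conds = List.of(
                x -> x[0] > 0.5, x -> x[1] < 0.2, x -> x[2] == 1.0);
            double[] example = {0.9, 0.1, 0.0};
            System.out.println(propositional(conds, example)); // false: 3rd fails
            System.out.println(mOfN(conds, 2, example));       // true: 2 of 3 hold
        }
    }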

WEKA/RM filters

Many filters can be applied to attributes (features) or to instances (vectors, samples); some are specific to signal/time-series data. They are divided into supervised/unsupervised, and attribute or instance filters. Examples:
- create a new attribute from existing ones using algebraic operations;
- remove instances with attribute values in some range, for example missing values;
- delete attributes of a specific type (e.g. binary);
- change nominal values to binary combinations, e.g. Xi ∈ {a,b,c,d} => (Xi1, Xi2) ∈ {0,1}×{0,1};
- rank the usefulness of attributes (several schemes);
- evaluate the usefulness of subsets of features (several schemes);
- perform PCA;
- normalize features in many ways;
- discretize attributes, defining simple bins or looking for a more natural discretization, for example bins created by the Minimum Description Length (MDL) principle (the "use Kononenko" option);
- many others...
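As an example of a supervised attribute filter, a short sketch (assuming WEKA 3.x) that applies MDL-based discretization with the "use Kononenko" option turned on:

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.supervised.attribute.Discretize;

    public class DiscretizeExample {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data/weather.arff");
            data.setClassIndex(data.numAttributes() - 1);

            // Supervised, MDL-based discretization; setUseKononenko(true)
            // selects Kononenko's MDL criterion (the "use Kononenko" option).
            Discretize disc = new Discretize();
            disc.setUseKononenko(true);
            disc.setInputFormat(data); // must be called before filtering
            Instances discretized = Filter.useFilter(data, disc);
            System.out.println(discretized);
        }
    }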

Classification algorithms

Divided into:
- Bayes: versions of probabilistic Bayesian methods;
- Functions: parameterized functions, linear and non-linear;
- Lazy: no parameter learning, all work done at classification time;
- Meta: committees, voting, boosting, stacking... metamodels;
- Misc: untypical models such as fuzzy lattice, hyperpipes, voting features;
- Tree-building models: recursive partitioning;
- Rule-learning models.
These algorithms enable knowledge discovery, or data mining (trees, rules), and predictive modeling in classification or regression tasks.
See the detailed WEKA presentation: http://prdownloads.sourceforge.net/weka/weka.ppt

Decision rules

Algorithms for knowledge discovery, or data mining: about 10 rule-based and 10 tree-based methods, providing knowledge in the form of logical rules.
- ZeroR, predicting the majority class (or the mean value);
- OneR, the simplest one-level (one-attribute) decision tree;
- Decision stump, a one-level tree;
- C4.5, called here J48, since this is a Java implementation of version 8 of the C4.5 decision tree algorithm;
- M5' model tree learner;
- Naive Bayes tree classifier;
- PART rule learner (covering algorithm);
- prototype-based algorithms: instance-based learners (IB1, IBk), i.e. nearest-neighbor methods;
- decision tables.

Regression algorithms

Regression (function) and classification algorithms include:
- Naive Bayes (2 versions);
- Linear Regression, or LDA;
- Additive regression;
- Logistic regression;
- LWR, Locally Weighted Regression;
- MLP (multi-layer perceptron) neural network;
- VPN, voted perceptron network;
- SMO, the Support Vector Machine algorithm;
- K*, a similarity-based system with algorithmic complexity minimization.
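All of these share WEKA's common Classifier type (an abstract class in older versions, an interface in newer ones), so they can be swapped freely in an experiment. A brief sketch (assuming WEKA 3.x and the numeric weather data) comparing three function-type classifiers:

    import java.util.Random;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.Logistic;
    import weka.classifiers.functions.MultilayerPerceptron;
    import weka.classifiers.functions.SMO;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class CompareFunctions {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data/weather.arff");
            data.setClassIndex(data.numAttributes() - 1);

            // The common Classifier type lets models be swapped freely.
            Classifier[] models = {
                new Logistic(), new SMO(), new MultilayerPerceptron() };
            for (Classifier model : models) {
                Evaluation eval = new Evaluation(data);
                eval.crossValidateModel(model, data, 10, new Random(1));
                System.out.println(model.getClass().getSimpleName()
                        + ": " + eval.pctCorrect() + "% correct");
            }
        }
    }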

Other algorithms

Statistical algorithms for model improvement (meta-algorithms): bagging; boosting (AdaBoost, LogitBoost); stacking.
Clusterization: k-means, Expectation Maximization, Cobweb.
Association: finding relations between attributes.
Visualization of 2D scatterograms.
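The clusterers can be driven from code as well. A minimal k-means sketch (assuming WEKA 3.x; the class attribute is removed first, since clustering is unsupervised):

    import weka.clusterers.SimpleKMeans;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.Remove;

    public class KMeansExample {
        public static void main(String[] args) throws Exception {
            // Clustering is unsupervised: drop the class attribute first.
            Instances data = DataSource.read("data/weather.arff");
            Remove remove = new Remove();
            remove.setAttributeIndices("last");
            remove.setInputFormat(data);
            Instances unlabeled = Filter.useFilter(data, remove);

            SimpleKMeans kmeans = new SimpleKMeans();
            kmeans.setNumClusters(3);
            kmeans.buildClusterer(unlabeled);
            System.out.println(kmeans); // prints the cluster centroids
        }
    }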

WEKA/RM example

Contact lenses: do I need hard, soft, or none? A very small data set, 24 instances: contact-lens.arff.
What is in the database?
1. age of the patient: (1) young, (2) pre-presbyopic, (3) presbyopic
2. spectacle prescription: (1) myope, (2) hypermetrope
3. astigmatic: (1) no, (2) yes
4. tear production rate: (1) reduced, (2) normal
Class distribution:
- hard contact lenses: 4
- soft contact lenses: 5
- no contact lenses: 15
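For reference, WEKA distributes this dataset as contact-lenses.arff; its header looks approximately like this (attribute and value names as in the shipped file, shown here from memory):

    @relation contact-lenses

    @attribute age {young, pre-presbyopic, presbyopic}
    @attribute spectacle-prescrip {myope, hypermetrope}
    @attribute astigmatism {no, yes}
    @attribute tear-prod-rate {reduced, normal}
    @attribute contact-lenses {soft, hard, none}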

ZeroR

The zero-rule method: for a small number of classes (a categorical class variable), predict the majority class; for numerical outputs (regression problems), predict the average. Useful to establish the base rate; zero variance, large bias: if any method obtains results worse than ZeroR, serious overfitting of the data occurs.
For contact-lenses the confusion matrix is:

    === Confusion Matrix ===
      a  b  c   <-- classified as
      0  0  5 |  a = soft
      0  0  4 |  b = hard
      0  0 15 |  c = none

15 instances classified correctly, 62.5% on the whole data. What happens in 10-fold CV?
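That question can be checked directly. A minimal sketch (assuming WEKA 3.x; the file may be named contact-lenses.arff in your distribution) that evaluates ZeroR under 10-fold cross-validation:

    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.rules.ZeroR;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class ZeroRBaseline {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data/contact-lenses.arff");
            data.setClassIndex(data.numAttributes() - 1);

            // ZeroR: the base-rate classifier (majority class / mean value).
            ZeroR baseline = new ZeroR();
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(baseline, data, 10, new Random(1));
            System.out.println(eval.toSummaryString("ZeroR, 10-fold CV:\n", false));
            System.out.println(eval.toMatrixString());
        }
    }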

DT - idea

[Figure: a toy decision tree. Class: {cancer, healthy}; features: cell body {gray, stripes}, nuclei {1, 2}, tails {1, 2}. The four leaves are labeled cancer, healthy, healthy, cancer.]

More ambitious tree

[Figure: a larger decision tree; only the slide title survives in the transcript.]

1R

1R: the simplest useful tree (Holte 1993); sometimes the results are good. A one-level tree for nominal attributes.
1R algorithm:

    for every attribute X:
        for every attribute value Xi:
            count the class frequencies N(Xi, wj)
            find the most frequent class: c = arg max_j N(Xi, wj)
            create a rule (majority classifier): IF X = Xi THEN wc
        calculate the accuracy of this set of rules
    select the rules (attribute) with the highest accuracy

A missing value "?" is treated as just another nominal value.
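The counting loop above translates almost line by line into code. A minimal sketch of 1R for nominal attributes (assumptions: WEKA 3.x is used only for data handling, and missing values are simply skipped here rather than treated as an extra value):

    import weka.core.Attribute;
    import weka.core.Instance;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class OneRSketch {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data/weather.nominal.arff");
            data.setClassIndex(data.numAttributes() - 1);

            int bestAttr = -1, bestCorrect = -1;
            for (int a = 0; a < data.numAttributes(); a++) {
                if (a == data.classIndex()) continue;
                Attribute attr = data.attribute(a);
                // counts[value][class] holds the class frequencies N(Xi, wj).
                int[][] counts = new int[attr.numValues()][data.numClasses()];
                for (int i = 0; i < data.numInstances(); i++) {
                    Instance inst = data.instance(i);
                    if (inst.isMissing(a)) continue; // real 1R: "?" is an extra value
                    counts[(int) inst.value(a)][(int) inst.classValue()]++;
                }
                // One rule per value: IF X = Xi THEN majority class; sum the hits.
                int correct = 0;
                for (int[] row : counts) {
                    int max = 0;
                    for (int n : row) max = Math.max(max, n);
                    correct += max;
                }
                if (correct > bestCorrect) { bestCorrect = correct; bestAttr = a; }
            }
            System.out.println("1R selects " + data.attribute(bestAttr).name()
                    + " (" + bestCorrect + "/" + data.numInstances() + " correct)");
        }
    }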

1R example

Example taken from the WEKA book: weather conditions and the decision to play a game of tennis; 14 examples are given. Task: find the decision rule (weather.nominal.arff).
Attribute: Outlook
- Outlook = Sunny: 3 examples with No and 2 with Yes;
- Outlook = Overcast: 4 examples with Yes;
- Outlook = Rainy: 3 examples with Yes and 2 with No.
Optimal rules: using only Outlook, or only Humidity. The dataset is too small to evaluate accuracy, but the rules are reasonable.

1R continuous

How should continuous values be treated? Divide the range of a continuous attribute into intervals Ii(X) (discretize the attribute) and treat the intervals as nominal values, i.e. write X = Ii if X ∈ Ii(X).
For each attribute X, sort all cases according to increasing values of X; define the intervals Ii(X) where one class wc dominates: max_c N(Ii(X), wc). This should decrease the number of errors of the 1R algorithm.
Problem: if the data is noisy or some examples are quite untypical, such rules should not be created! A simpler solution: use buckets, i.e. intervals with a minimum number of elements, admitting some "impurities".

1R example with continuous variables

Discretization of temperature, instead of the nominal hot, mild, cool. To avoid noise, intervals containing no fewer than 4 elements are used.
WEKA implementation: bucket size B = minimum number of elements in an interval. A slightly more accurate solution is found with B = 2-4, but B = 1 gives only 1 error: is that good or bad? Still, it makes no sense...

1R example with continuous variables

Discretization of humidity, instead of the nominal high/normal (in the original slide, values belonging to the "no" class were set in a smaller font):

    65 70 70 70 75 80 80 85 86 90 90 91 95 96

To avoid noise, buckets containing B ≥ 4 elements are used:

    65 70 70 70 75 80 80 | 85 86 90 90 91 95 | 96

This is rather naive: elements are added to a bucket until a non-majority-class sample is found.
WEKA implementation: bucket size B = minimum number of elements in an interval. A slightly more accurate solution is found with B = 2-4, but B = 1 gives only 1 error: is that good or bad? Still, it makes no sense...
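In WEKA this B parameter is OneR's minBucketSize option. A short sketch (assuming WEKA 3.x and the numeric weather.arff data) that compares several bucket sizes:

    import weka.classifiers.rules.OneR;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class OneRBuckets {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("data/weather.arff"); // numeric version
            data.setClassIndex(data.numAttributes() - 1);

            // minBucketSize is the B parameter: the minimum number of
            // instances per interval when discretizing numeric attributes.
            for (int b : new int[] {1, 2, 4, 6}) {
                OneR oneR = new OneR();
                oneR.setMinBucketSize(b);
                oneR.buildClassifier(data);
                System.out.println("B = " + b + ":\n" + oneR);
            }
        }
    }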

Netflix $1M Prize

The Netflix Prize, announced in October 2006: an award of $1 million to the first person or team to achieve certain accuracy goals when recommending movies based on personal preferences. The company made 100 million anonymous movie ratings available to contestants for learning. Details for registering and competing for the Netflix Prize are at http://www.netflixprize.com. All members of the CI/machine learning community are invited to participate!
A 10% improvement is needed; 9.65% had been achieved by April 2009 (required RMSE <= 0.8563; the best achieved was 0.8596).