Waikato Environment for Knowledge Analysis

Contents
- What is WEKA?
- The Explorer: preprocess data, data visualization, classification, clustering, association rules, attribute selection
- References and Resources

What is WEKA?
Waikato Environment for Knowledge Analysis. It is a data mining/machine learning tool developed by the Department of Computer Science, University of Waikato, New Zealand. The weka is also a bird found only on the islands of New Zealand.

Getting started with WEKA
- Free: GNU General Public License
- Multi-platform (Java): Windows, Mac, Linux
- Easy to install
- Example datasets
- Documentation, tutorials and a MOOC

What is WEKA?
- A data mining tool, used through its user interface or integrated into your Java code
- Data filters
- Classification
- Clustering
- Visualization

Weka

Input data
- Formats: ARFF (Attribute-Relation File Format), CSV, SQL database
- An ARFF file contains the name of the dataset, the attributes' names, types and values, and the data
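To make the ARFF layout concrete, here is a minimal Python sketch of a reader for a tiny, hypothetical ARFF file. It is an illustration under simplifying assumptions (no quoted values, sparse rows or date attributes), not WEKA's own ARFF loader; the function name parse_arff and the toy-weather data are made up for this example.

```python
def parse_arff(text):
    """Minimal ARFF reader sketch: returns (relation, attributes, rows).

    attributes is a list of (name, kind) where kind is "numeric", "string",
    or a list of nominal values. Assumes unquoted, comma-separated data rows.
    """
    relation = None
    attributes = []
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and % comments
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (_name, kind), v in zip(attributes, values):
                if v == "?":
                    row.append(None)  # "?" marks a missing value
                elif kind == "numeric":
                    row.append(float(v))
                else:
                    row.append(v)
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, rest = line.split(None, 1)
            name, kind = rest.split(None, 1)
            kind = kind.strip()
            if kind.startswith("{"):
                # Nominal attribute: {value1, value2, ...}
                attributes.append((name, [v.strip() for v in kind.strip("{}").split(",")]))
            elif kind.lower() in ("numeric", "real", "integer"):
                attributes.append((name, "numeric"))
            else:
                attributes.append((name, "string"))  # other types kept as raw text
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


SAMPLE = """% toy dataset (hypothetical)
@relation toy-weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}
@data
sunny, 85, no
overcast, 83, yes
rainy, ?, yes
"""

relation, attrs, rows = parse_arff(SAMPLE)
print(relation)  # toy-weather
print(attrs[0])  # ('outlook', ['sunny', 'overcast', 'rainy'])
print(rows[2])   # ['rainy', None, 'yes']
```

The header (@relation, @attribute) is parsed first so that each data value can be converted according to its declared attribute type.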

The Explorer
- Preprocessing
- Visualization
- Classification
- Clustering
- Finding associations
- Attribute selection

Explorer: Preprocessing
- 49 different filters
- Supervised and unsupervised
- Applied to attributes or to instances
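As an illustration of one unsupervised attribute filter, the sketch below rescales numeric columns into [0, 1] (min-max normalization), which is what WEKA's Normalize filter does with its default settings. The Python function min_max_normalize and its column-index interface are hypothetical; this is not WEKA code.

```python
def min_max_normalize(rows, columns):
    """Rescale the given numeric columns of rows into [0, 1].

    Unsupervised: the class attribute is never consulted. Missing values (None)
    are left untouched; a constant column maps to 0.0. Assumes each listed
    column has at least one non-missing value. Returns new rows.
    """
    out = [list(r) for r in rows]  # copy so the input is not modified
    for c in columns:
        present = [r[c] for r in rows if r[c] is not None]
        lo, hi = min(present), max(present)
        span = hi - lo
        for r in out:
            if r[c] is not None:
                r[c] = (r[c] - lo) / span if span else 0.0
    return out


rows = [["sunny", 85.0], ["overcast", 83.0], ["rainy", 70.0]]
result = min_max_normalize(rows, columns=[1])
print(result)  # temperature 85 -> 1.0, 83 -> 13/15, 70 -> 0.0
```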

Demo: Filters

Demo: Visualization

Explorer: Classifier
- 76 different classification algorithms
- Decision trees, instance-based classifiers, support vector machines (SVM), Bayesian networks, …
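To show what "instance-based" means, here is a minimal 1-nearest-neighbour sketch in Python: it keeps the training instances and labels a query with the class of the closest one under Euclidean distance. WEKA's instance-based learners (e.g. IBk) offer more, such as k > 1 neighbours; the function name and toy data below are hypothetical.

```python
def nearest_neighbour(train, query):
    """1-NN: return the class label of the training instance closest to query.

    train is a list of (feature_tuple, label) pairs with numeric features.
    """
    def sq_dist(a, b):
        # Squared Euclidean distance (same ordering as Euclidean distance).
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _features, label = min(train, key=lambda fl: sq_dist(fl[0], query))
    return label


train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((8.0, 8.0), "B"), ((9.0, 8.5), "B")]
print(nearest_neighbour(train, (2.0, 1.5)))  # A
print(nearest_neighbour(train, (7.5, 9.0)))  # B
```

There is no training phase beyond storing the data; all the work happens at prediction time.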

Demo: Classifier

Explorer: clustering data
- WEKA contains “clusterers” for finding groups of similar instances in a dataset
- Implemented schemes include k-Means, EM, Cobweb, X-means and FarthestFirst
- Clusters can be visualized and compared to “true” clusters (if given)
- Evaluation is based on log-likelihood if the clustering scheme produces a probability distribution

K-Means Clustering (contd.) Example

The K-Means Clustering Method
Given k, the k-means algorithm is implemented in four steps:
1. Partition the objects into k nonempty subsets.
2. Compute seed points as the centroids of the clusters of the current partition (the centroid is the center, i.e., the mean point, of the cluster).
3. Assign each object to the cluster with the nearest seed point.
4. Go back to Step 2; stop when no new assignments are made.
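The four steps can be sketched in Python as follows. Step 1 here uses a simple round-robin initial partition so the example is deterministic; that choice is an assumption for illustration, and real implementations (including WEKA's) usually pick their starting points differently, for example at random. The function names and the toy points are hypothetical.

```python
def mean(points):
    """Component-wise mean (centroid) of a non-empty list of tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """k-means following the four steps; returns (labels, centroids).

    Assumes 1 <= k <= len(points).
    """
    # Step 1: round-robin initial partition, so all k subsets are nonempty.
    labels = [i % k for i in range(len(points))]
    centroids = [None] * k
    for _ in range(max_iter):
        # Step 2: recompute each centroid as the mean point of its cluster;
        # a cluster that has become empty keeps its previous centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean(members)
        # Step 3: assign every object to the nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Step 4: stop when no assignment changes, otherwise repeat from Step 2.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)     # [0, 0, 0, 1, 1, 1]
print(centroids)
```

Like any k-means variant, this converges to a local optimum that depends on the initial partition.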

Demo: Clustering Data

Explorer: Finding associations
- WEKA contains an implementation of the Apriori algorithm for learning association rules
- Works only with discrete data
- Can identify statistical dependencies between groups of attributes: milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)
- Apriori can compute all rules that have a given minimum support and exceed a given confidence
9/21/2018
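The level-wise idea behind Apriori can be sketched in Python: frequent k-itemsets are joined into (k+1)-item candidates, candidates with an infrequent subset are pruned, and the survivors are counted against the data. The example uses the five transactions from the frequent-pattern slides with an absolute minimum support of 3 (minsup = 50% of 5 transactions, rounded up). This is a teaching sketch, not WEKA's implementation; the function name apriori is our own.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset(itemset): support count} for every frequent itemset.

    min_count is an absolute support threshold (number of transactions).
    """
    baskets = [frozenset(t) for t in transactions]
    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)
    k = 1
    while current:
        # Join step: unite pairs of frequent k-itemsets into (k+1)-itemsets.
        # Prune step: drop candidates with an infrequent k-subset
        # (every subset of a frequent itemset must itself be frequent).
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in current for sub in combinations(union, k)
            ):
                candidates.add(union)
        # Count the surviving candidates in one pass over the data.
        next_level = {}
        for cand in candidates:
            count = sum(1 for basket in baskets if cand <= basket)
            if count >= min_count:
                next_level[cand] = count
        frequent.update(next_level)
        current = next_level
        k += 1
    return frequent


transactions = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]
frequent = apriori(transactions, min_count=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The pruning step is what keeps the search small: a candidate is counted only if all of its subsets were already found to be frequent.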

Basic Concepts: Frequent Patterns
Tid | Items bought
10 | Beer, Nuts, Diaper
20 | Beer, Coffee, Diaper
30 | Beer, Diaper, Eggs
40 | Nuts, Eggs, Milk
50 | Nuts, Coffee, Diaper, Eggs, Milk
- itemset: a set of one or more items
- k-itemset: X = {x1, …, xk}
- (absolute) support, or support count, of X: the frequency or number of occurrences of itemset X
- (relative) support, s: the fraction of transactions that contain X (i.e., the probability that a transaction contains X)
- An itemset X is frequent if X's support is no less than a minsup threshold
[Figure: Venn diagram of customers who buy diaper, customers who buy beer, and customers who buy both]

Basic Concepts: Association Rules
Find all the rules X ⇒ Y with minimum support and confidence:
- support, s: the probability that a transaction contains X ∪ Y
- confidence, c: the conditional probability that a transaction having X also contains Y
Tid | Items bought
10 | Beer, Nuts, Diaper
20 | Beer, Coffee, Diaper
30 | Beer, Diaper, Eggs
40 | Nuts, Eggs, Milk
50 | Nuts, Coffee, Diaper, Eggs, Milk
Let minsup = 50%, minconf = 50%.
Frequent patterns: Beer:3, Nuts:3, Diaper:4, Eggs:3, {Beer, Diaper}:3
Association rules (many more!):
- Beer ⇒ Diaper (60%, 100%)
- Diaper ⇒ Beer (60%, 75%)
[Figure: Venn diagram of customers who buy diaper, customers who buy beer, and customers who buy both]
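The support and confidence values of the two rules above can be checked with a short Python sketch that counts transactions directly. The names support_count and rule_metrics are illustrative, not part of any library.

```python
TRANSACTIONS = {
    10: {"Beer", "Nuts", "Diaper"},
    20: {"Beer", "Coffee", "Diaper"},
    30: {"Beer", "Diaper", "Eggs"},
    40: {"Nuts", "Eggs", "Milk"},
    50: {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
}


def support_count(itemset, transactions):
    """Absolute support: number of transactions containing every item of itemset."""
    return sum(1 for items in transactions.values() if itemset <= items)


def rule_metrics(lhs, rhs, transactions):
    """Return (support, confidence) of the rule lhs => rhs."""
    both = support_count(lhs | rhs, transactions)
    support = both / len(transactions)                       # P(X ∪ Y)
    confidence = both / support_count(lhs, transactions)     # P(Y | X)
    return support, confidence


print(rule_metrics({"Beer"}, {"Diaper"}, TRANSACTIONS))  # (0.6, 1.0)
print(rule_metrics({"Diaper"}, {"Beer"}, TRANSACTIONS))  # (0.6, 0.75)
```

Both rules share the same support because they cover the same itemset {Beer, Diaper}; only the denominator of the confidence changes (3 transactions contain Beer, 4 contain Diaper).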

Demo: Association

Explorer: attribute selection
- Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
- Attribute selection methods contain two parts:
  - A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  - An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
- Very flexible: WEKA allows (almost) arbitrary combinations of these two
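One combination from the list above, an information-gain evaluation paired with a ranking search, can be sketched in Python for nominal attributes: each attribute is scored by how much knowing its value reduces the entropy of the class, and attributes are sorted by that score. This is roughly the idea behind WEKA's InfoGainAttributeEval with the Ranker search, but the code below is our own illustration; the function names and the four-row toy dataset are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(rows, attr, cls):
    """Class entropy minus the weighted class entropy after splitting on attr."""
    base = entropy([r[cls] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[cls])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def rank_attributes(rows, attrs, cls):
    """Ranking search: evaluate each attribute alone, best first."""
    return sorted(((info_gain(rows, a, cls), a) for a in attrs), reverse=True)


rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "yes"},
]
ranked = rank_attributes(rows, ["outlook", "windy"], cls="play")
print(ranked)  # outlook has gain 1.0 bit, windy has gain 0.0
```

A ranking search scores attributes one at a time and so ignores interactions between them; subset searches such as best-first evaluate combinations instead, at higher cost.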

Demo: Attribute Selection

Different ways to use it
- Explorer: preprocessing, clustering, classification, regression analysis, visualization
- Experimenter: analysis and comparison of classifiers

Different ways to use it
- Simple command-line interface: type instructions directly
- Can be integrated into your Java code

Weka’s Advantages
- Contains a lot of algorithms
- Free (many commercial data mining tools are expensive)
- Open source, so adapting it to your own needs is possible
- Constantly under development (not only by the original designers)

Drawbacks
- Limited possibilities to interface with other software
- Performance is often sacrificed in favor of portability, design transparency, etc.
- Memory limitation, because the data has to be loaded completely into main memory

Conclusion
+ Easy to use
+ No programming skills needed
- Visualization and statistical tools are limited