Introduction to Weka and NetDraw


Introduction to Weka and NetDraw MIS510 Spring 2009

Outline Weka: Introduction; Weka Tools/Functions; How to use Weka? (Weka Data File Format (Input); Weka for Data Mining; Sample Output from Weka (Output)); Conclusion. NetDraw: How to use NetDraw? (NetDraw Input Data File Format; Draw Networks using NetDraw).

Weka

Introduction to Weka (Data Mining Tool) Weka was developed at the University of Waikato in New Zealand. http://www.cs.waikato.ac.nz/ml/weka/ Weka is an open-source data mining tool written in Java. It is used for research, education, and applications. It runs on Windows, Linux, and Mac.

What can Weka do? Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset (using the GUI) or called from your own Java code (using the Weka Java library). Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well suited for developing new machine learning schemes.

Weka Tools/Functions Tools (or functions) in Weka include: Data preprocessing (e.g., Data Filters), Classification (e.g., BayesNet, KNN, C4.5 Decision Tree, Neural Networks, SVM), Regression (e.g., Linear Regression, Isotonic Regression, SVM for Regression), Clustering (e.g., Simple K-means, Expectation Maximization (EM)), Association rules (e.g., Apriori Algorithm, Predictive Accuracy, Confirmation Guided), Feature Selection (e.g., Cfs Subset Evaluation, Information Gain, Chi-squared Statistic), and Visualization (e.g., View different two-dimensional plots of the data).

Weka’s Role in the Big Picture Input: raw data. Data mining by Weka: pre-processing, classification, regression, clustering, association rules, visualization. Output: result.

How to use Weka? Weka Data File Format (Input); Weka for Data Mining; Sample Output from Weka (Output)

Weka Data File Format (Input) The most popular input format for Weka is “arff” (“arff” is also the file extension of the input data file).
FILE FORMAT
    @relation RELATION_NAME
    @attribute ATTRIBUTE_NAME ATTRIBUTE_TYPE
    @data
    DATAROW1
    DATAROW2
    DATAROW3

Example of “arff” Input File
    @relation heart-disease-simplified
    @attribute age numeric
    @attribute sex { female, male}
    @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
    @attribute cholesterol numeric
    @attribute exercise_induced_angina { no, yes}
    @attribute class { present, not_present}
    @data
    63,male,typ_angina,233,no,not_present
    67,male,asympt,286,yes,present
    67,male,asympt,229,yes,present
    38,female,non_anginal,?,no,not_present
    ...
Slide callouts: “numeric attribute” (declared with the keyword numeric, e.g. age) and “nominal attribute” (declared with its allowed values in braces, e.g. sex).
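
To make the file layout concrete, here is a minimal sketch of reading the simple ARFF subset shown above using only the Java standard library. It is not Weka's own parser: the class name ArffSketch and its fields are hypothetical, and it assumes unquoted attribute names, comma-separated values with no embedded commas, and "?" as the missing-value marker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper, not part of Weka: reads the simple ARFF subset shown above. */
final class ArffSketch {
    String relation = "";
    final List<String> attributeNames = new ArrayList<>();
    final List<String> attributeTypes = new ArrayList<>();
    final List<String[]> rows = new ArrayList<>();

    static ArffSketch parse(String text) {
        ArffSketch result = new ArffSketch();
        boolean inData = false;
        for (String rawLine : text.split("\\R")) {
            String line = rawLine.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and ARFF comment lines
            }
            String lower = line.toLowerCase(Locale.ROOT);
            if (inData) {
                // Every line after @data is one comma-separated data row.
                String[] cells = line.split(",");
                for (int i = 0; i < cells.length; i++) {
                    cells[i] = cells[i].trim();
                }
                result.rows.add(cells);
            } else if (lower.startsWith("@relation")) {
                result.relation = line.substring("@relation".length()).trim();
            } else if (lower.startsWith("@attribute")) {
                // Assumption: unquoted names, so the name ends at the first whitespace.
                String[] parts = line.substring("@attribute".length()).trim().split("\\s+", 2);
                if (parts.length == 2) {
                    result.attributeNames.add(parts[0]);
                    result.attributeTypes.add(parts[1]);
                }
            } else if (lower.startsWith("@data")) {
                inData = true;
            }
        }
        return result;
    }

    /** A "?" cell marks a missing value. */
    static boolean isMissing(String cell) {
        return "?".equals(cell);
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "@relation heart-disease-simplified",
                "@attribute age numeric",
                "@attribute sex { female, male}",
                "@attribute cholesterol numeric",
                "@data",
                "63,male,233",
                "38,female,?");
        ArffSketch parsed = parse(sample);
        System.out.println(parsed.relation + ": " + parsed.attributeNames.size()
                + " attributes, " + parsed.rows.size() + " rows");
        System.out.println("cholesterol missing in last row: " + isMissing(parsed.rows.get(1)[2]));
    }
}
```

In a real project the Weka library does this loading for you; the sketch only illustrates how the header section and the data section relate to each other.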

Weka for Data Mining There are two main ways to use Weka for your data mining tasks. (1) Use the Weka graphical user interface (GUI). The GUI is straightforward and easy to use, but it is not flexible and cannot be called from your own application. (2) Import the Weka Java library into your own Java application. Developers can use the Weka Java library to develop software, or modify the source code to meet special requirements. This approach is more flexible and advanced, but it is not as easy to use as the GUI.

Weka GUI [Screenshot of the Weka GUI with three callouts: the different analysis tools/functions; the different attributes to choose from; and the value set of the chosen attribute with the number of input items for each value.]

Weka GUI Classification Algorithms

Import Weka Java library to your own Java application Three sets of classes you may need to use when developing your own application: classes for loading data, classes for classifiers, and classes for evaluation.

Classes for Loading Data Related Weka classes: weka.core.Instances, weka.core.Instance, weka.core.Attribute. How to load an input data file into Instances? Every data row becomes an Instance, every attribute becomes an Attribute, and the whole dataset becomes an Instances object.
    // Load a file as Instances
    FileReader reader = new FileReader(path);
    Instances instances = new Instances(reader);

Classes for Loading Data An Instances object contains Attribute and Instance objects. How to get every Instance within the Instances? How to get an Attribute?
    // Get an Instance
    Instance instance = instances.instance(index);
    // Get the Instance count
    int instanceCount = instances.numInstances();
    // Get an Attribute (its name is attribute.name())
    Attribute attribute = instances.attribute(index);
    // Get the Attribute count
    int attributeCount = instances.numAttributes();

Classes for Loading Data How to get the Attribute value of each Instance? Class index (very important!): it marks which attribute is the class to be predicted.
    // Get a value, by attribute index or by Attribute object
    instance.value(index);
    instance.value(attribute);
    // Get the class index
    instances.classIndex();
    instances.classAttribute().index();
    // Set the class index
    instances.setClass(attribute);
    instances.setClassIndex(index);

Classes for Classifiers Weka classes for C4.5, Naïve Bayes, and SVM. Classifier: all classes which extend weka.classifiers.Classifier (an abstract class in the Weka 3.6 line; an interface in later releases). C4.5: weka.classifiers.trees.J48. NaiveBayes: weka.classifiers.bayes.NaiveBayes. SVM: weka.classifiers.functions.SMO. How to build a classifier?
    // Build a C4.5 classifier
    Classifier c = new weka.classifiers.trees.J48();
    c.buildClassifier(trainingInstances);
    // Build an SVM classifier
    Classifier e = new weka.classifiers.functions.SMO();
    e.buildClassifier(trainingInstances);

Classes for Evaluation Related Weka classes: weka.classifiers.CostMatrix, weka.classifiers.Evaluation. How to use the evaluation classes?
    // Use the classifier c built earlier to classify every test instance
    CostMatrix costMatrix = null;
    Evaluation eval = new Evaluation(testingInstances, costMatrix);
    for (int i = 0; i < testingInstances.numInstances(); i++) {
        eval.evaluateModelOnceAndRecordPrediction(c, testingInstances.instance(i));
    }
    // Print the results once, after all predictions have been recorded
    System.out.println(eval.toSummaryString(false));
    System.out.println(eval.toClassDetailsString());
    System.out.println(eval.toMatrixString());
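
To show the kind of bookkeeping an evaluation step performs, here is a minimal stdlib-only sketch that records (actual, predicted) label pairs in a confusion matrix and derives accuracy from it. It is not Weka's Evaluation class: the name ConfusionMatrixSketch, its methods, and the rows-are-actual, columns-are-predicted layout are assumptions made for this illustration.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Weka: counts actual vs. predicted class labels. */
final class ConfusionMatrixSketch {
    final List<String> labels;
    final int[][] counts; // counts[actual][predicted]

    ConfusionMatrixSketch(List<String> labels) {
        this.labels = labels;
        this.counts = new int[labels.size()][labels.size()];
    }

    /** Records one prediction; both labels must be known class values. */
    void record(String actual, String predicted) {
        int a = labels.indexOf(actual);
        int p = labels.indexOf(predicted);
        if (a < 0 || p < 0) {
            throw new IllegalArgumentException("Unknown label: " + actual + " / " + predicted);
        }
        counts[a][p]++;
    }

    /** Fraction of recorded predictions on the diagonal (correct ones). */
    double accuracy() {
        int correct = 0;
        int total = 0;
        for (int a = 0; a < counts.length; a++) {
            for (int p = 0; p < counts.length; p++) {
                total += counts[a][p];
                if (a == p) {
                    correct += counts[a][p];
                }
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    public static void main(String[] args) {
        ConfusionMatrixSketch matrix =
                new ConfusionMatrixSketch(Arrays.asList("present", "not_present"));
        matrix.record("present", "present");
        matrix.record("present", "not_present");
        matrix.record("not_present", "not_present");
        System.out.println("accuracy = " + matrix.accuracy());
        System.out.println(Arrays.deepToString(matrix.counts));
    }
}
```

Printing results only after every test instance has been recorded matters here for the same reason as in the Weka loop: a summary computed mid-loop describes only the instances seen so far.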

Classes for Evaluation Cross Validation In cross-validation, we split a single dataset into N equal shares (folds). In each of N rounds, N-1 shares are used as the training dataset and the remaining share is used as the testing dataset, so every share serves as the test set exactly once. The most widely used setting is 10-fold cross-validation.

Classes for Evaluation How to obtain the training dataset and the testing dataset? Set the class index before calling stratify, because stratification is based on the class attribute.
    Random random = new Random(seed);
    instances.randomize(random);
    instances.stratify(N);
    for (int i = 0; i < N; i++) {
        Instances train = instances.trainCV(N, i, random);
        Instances test = instances.testCV(N, i);
    }
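
The N-fold split described above can also be sketched without Weka, which may help when reasoning about what each round trains and tests on. This is a plain, unstratified split over row indices; the class KFoldSketch and its method names are hypothetical, and unlike instances.stratify(N) it makes no attempt to balance class values across folds.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper, not part of Weka: unstratified N-fold split of row indices. */
final class KFoldSketch {
    /** Shuffles the row indices and deals them into n folds of near-equal size. */
    static List<List<Integer>> folds(int numRows, int n, long seed) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numRows; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < n; f++) {
            folds.add(new ArrayList<>());
        }
        for (int pos = 0; pos < numRows; pos++) {
            folds.get(pos % n).add(indices.get(pos)); // deal round-robin
        }
        return folds;
    }

    /** Training indices for one round: every fold except the test fold. */
    static List<Integer> trainingIndices(List<List<Integer>> folds, int testFold) {
        List<Integer> train = new ArrayList<>();
        for (int f = 0; f < folds.size(); f++) {
            if (f != testFold) {
                train.addAll(folds.get(f));
            }
        }
        return train;
    }

    public static void main(String[] args) {
        List<List<Integer>> folds = folds(25, 10, 42L);
        for (int i = 0; i < folds.size(); i++) {
            System.out.println("round " + i
                    + ": train=" + trainingIndices(folds, i).size()
                    + " test=" + folds.get(i).size());
        }
    }
}
```

Fixing the seed keeps the split reproducible, which is the same reason the Weka snippet constructs its Random from an explicit seed.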

Sample Output from Weka

Conclusion about Weka In sum, the overall goal of Weka is to build a state-of-the-art facility for developing machine learning (ML) techniques and to allow people to apply them to real-world data mining problems. Detailed documentation about the different functions provided by Weka can be found on the Weka website. WEKA is available at: http://www.cs.waikato.ac.nz/ml/weka

NetDraw

Introduction to NetDraw (Visualization Tool) NetDraw is a free program written by Steve Borgatti of Analytic Technologies. It is often used for visualizing both 1-mode and 2-mode social network data. You can download it from: http://www.analytictech.com/downloadnd.htm (Compared to Weka, it is much easier to use :P)

What can NetDraw do? NetDraw can: handle multiple relations at the same time, and use node attributes to set colors, shapes, and sizes of nodes. Pictures can be saved in metafile, jpg, gif and bitmap formats. Two basic kinds of layouts are implemented: a circle and an MDS based on geodesic distance. You can also rotate, flip, shift, resize and zoom configurations.

How to use NetDraw? NetDraw Input Data File Format; Draw Networks using NetDraw

NetDraw Input Data File Format “vna” Data Format The VNA data format (“vna” is also the file extension of the input data file) allows users to store not only network data but also attributes of the nodes, along with information about how to display them (color, size, etc.).
    *node data
    "ID", num
    "$10 Gift Card off REGIS SALON (SALON SERVICES) + E" 2
    "$10 iTunes Gift Certificate exp 9/2008" 2
    "$10 STARBUCKS gift CARD CERTIFICATE" 3
    "$10 Target Gift Card" 3
    "$10.00 iTunes Music Gift Card - Free Shipping" 2
    "$100 Best Buy Gift Card" 15
    "$100 Gap Gift Card - FREE Shipping" 9
    …
    *Tie data
    FROM TO "Strength"
    "Home Depot Gift Card $500." "$100 Home Depot Gift Card Accepted Nationwide" 1
    "** $250 Best Buy GiftCard Gift Card Gift Certifica" "$25 Best Buy Gift Card for Store or Online!" 1
    "$50 Bed Bath & Beyond Gift Card - FREE SHIPPING!" "$200 Cost Plus World Market Gift Card 4 Jewelry Be" 1
    "$500.00 Best Buy gift certificate" "$15 Best Buy Gift Card *Free Shipping*" 1
    "$25 Best Buy Gift Card for Store or Online!" "$15 Best Buy Gift Card *Free Shipping*" 1
    "Bath and Body Works $25 Gift Card" "$200 Cost Plus World Market Gift Card 4 Jewelry Be" 1
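
If your node and tie data live in a Java program, a file in the layout shown above can be generated with a few lines of code. The following is a minimal sketch, assuming the two sections and header lines exactly as they appear in the sample; the class VnaSketch, the Tie type, and the method toVna are hypothetical names, and the sketch does not escape double quotes inside labels.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: writes node and tie data in the layout of the sample above. */
final class VnaSketch {
    /** One tie (edge) between two node labels. */
    static final class Tie {
        final String from;
        final String to;
        final int strength;

        Tie(String from, String to, int strength) {
            this.from = from;
            this.to = to;
            this.strength = strength;
        }
    }

    /** Wraps a label in double quotes (assumes the label contains none itself). */
    static String quote(String value) {
        return "\"" + value + "\"";
    }

    /** Builds the file content: a node section followed by a tie section. */
    static String toVna(Map<String, Integer> nodes, List<Tie> ties) {
        StringBuilder out = new StringBuilder();
        out.append("*node data\n");
        out.append("\"ID\", num\n"); // header copied from the sample file
        for (Map.Entry<String, Integer> node : nodes.entrySet()) {
            out.append(quote(node.getKey())).append(' ').append(node.getValue()).append('\n');
        }
        out.append("*Tie data\n");
        out.append("FROM TO \"Strength\"\n"); // header copied from the sample file
        for (Tie tie : ties) {
            out.append(quote(tie.from)).append(' ')
               .append(quote(tie.to)).append(' ')
               .append(tie.strength).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> nodes = new LinkedHashMap<>();
        nodes.put("$10 Target Gift Card", 3);
        nodes.put("$100 Best Buy Gift Card", 15);
        List<Tie> ties = new ArrayList<>();
        ties.add(new Tie("$100 Best Buy Gift Card", "$10 Target Gift Card", 1));
        System.out.print(toVna(nodes, ties));
    }
}
```

Check the NetDraw documentation for the full set of sections and column names the format supports before relying on generated files.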

Draw Networks using NetDraw [Screenshot of the NetDraw window with three callouts: the different functions; the display setup of the nodes and relations; and the network itself, with nodes representing the individuals and links representing the relations.]

Analysis Example: Hot Item Analysis based on gift card selling information from eBay. Each circle in the graph represents an active item in the database. The label of the circle is the item title. The bigger the circle and its label, the hotter the item. Items are clustered together based on brand information. [Two network plots: Hot Topics during April 15 – April 22, 2007; Hot Topics during April 22 – April 29, 2007]

Conclusion In sum, NetDraw can be used for social network visualization. There are a lot of parameters to play with in the tool. The results can be saved as EMF, WMF, BMP and JPG files. NetDraw is available at: http://www.analytictech.com/downloadnd.htm The website also provides detailed documentation. If you are interested, you may also try other visualization tools such as JUNG (http://jung.sourceforge.net/) and GraphViz (http://www.graphviz.org/).

Some Suggestions Carefully prepare your data according to the input format required by each tool. Read the documentation of each tool that you decide to use and understand its functionality. Think about how it can be applied to your project. Download and play with the tools. You cannot learn anything unless you try them yourself!

Thanks! Good luck with your projects!