Evaluation of MineSet 3.0 By Rajesh Rathinasabapathi S Peer Mohamed Raja Guided By Dr. Li Yang.


MineSet
- Introduction
- Problems in existing analytical tools
- MineSet Client/Server Architecture
- MineSet Enterprise Manager

INTRODUCTION
Product of Silicon Graphics Inc.
Supported platforms and requirements:
- Windows NT 4.0 (server and client)
- Windows 95 and 98 (client)
- Memory varies with the size of the data
- 64 MB RAM
- 1024 × 768 resolution with 65K colors
- IRIX 6.4 and above (for server parallelization)

MineSet
Helps you pinpoint and understand the complex patterns, relationships, and anomalies that are implicitly present in your data.

Problems in Existing Tools
You must directly specify any relationships between data elements.
Example: a query for all sales by region presupposes that you already suspect that sales vary by region.
Relationships may be uncovered that you did not know existed.

MineSet

MineSet Client/Server Architecture
The client and server can run on the same system or on different systems.
Server responsibilities:
- Accessing data files
- Data transformations (Data Mover)
- Running mining operations (classification, association, etc.)
- Generating visualization files

MineSet Client/Server Architecture
Client responsibilities:
- Providing the GUI
- Integration with other systems
  - Support for ODBC-compliant databases, for example SQL Server, DB2, Oracle, and Sybase
  - Open architecture that lets MineSet coexist with other tools, for example SAS
  - Integration with the web using hotlinks
  - Custom algorithms

MineSet Enterprise Manager
- MineSet Tool Manager
- MineSet 3D Visualizer
- MineSet Cluster Visualizer
- MineSet Record Visualizer
- MineSet Statistics Visualizer

MineSet Tool Manager Data Access and Data Transformation. Data Destinations.

MineSet Tool Manager

Basic Transformations
- Adding new columns
- Removing existing columns
- Aggregation
- Filtering
- Sampling
- Binning
- Apply Classifier

Adding New Columns
New columns can be added to the existing dataset. An added column can be derived from existing columns by using expressions.
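As a rough illustration of a derived column, the following Python sketch computes a new column from two existing ones. It is not MineSet's expression language; the helper `add_column` and the column names `units`, `unit_price`, and `revenue` are hypothetical.

```python
# Hypothetical records; column names are illustrative and not taken from MineSet.
records = [
    {"units": 3, "unit_price": 2.50},
    {"units": 10, "unit_price": 1.20},
]


def add_column(rows, name, expression):
    """Return new rows with an extra column computed from the existing columns."""
    return [{**row, name: expression(row)} for row in rows]


# Derive "revenue" from two existing columns; the input rows are left unchanged.
with_revenue = add_column(records, "revenue", lambda r: r["units"] * r["unit_price"])
```

The original rows are copied rather than modified, so the transformation can be undone simply by discarding the result.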

Removing Existing Columns
Remove columns that are not pertinent, are redundant, or contain obvious, uninteresting predictors.

Aggregation Grouping records together and finding the sum, maximum, minimum, or average.
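The grouping step can be sketched in plain Python as follows. This is only an illustration of the idea, not MineSet's implementation; the function `aggregate` and the sample columns `region` and `amount` are assumptions.

```python
from collections import defaultdict


def aggregate(rows, group_key, value_key):
    """Group rows by one column and summarize another column per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {
        key: {
            "sum": sum(values),
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
        }
        for key, values in groups.items()
    }


# Hypothetical sales records.
sales = [
    {"region": "East", "amount": 100},
    {"region": "East", "amount": 300},
    {"region": "West", "amount": 50},
]
summary = aggregate(sales, "region", "amount")
```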

Filtering
Filter a visualization to view only the strongest rules or the most profitable customer segments.

Sampling Sampling the data to get a random subset of the data

Binning
Breaking up a continuous range of data into discrete segments.
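One simple way to bin is to cut the range into equal-width segments. The sketch below assumes that strategy; the slides do not say which binning methods MineSet offers, and `equal_width_bins` is a hypothetical helper.

```python
def equal_width_bins(values, n_bins):
    """Map each continuous value to a bin index in the range 0 .. n_bins - 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    bins = []
    for v in values:
        index = int((v - lo) / width) if width > 0 else 0
        # The maximum value would fall just past the last bin, so clamp it.
        bins.append(min(index, n_bins - 1))
    return bins


ages = [18, 25, 33, 47, 52, 68]
age_bins = equal_width_bins(ages, 5)  # five bins, each 10 years wide
```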

Apply Classifier

Data Mining Tools Association. Classification. Cluster. Regression. Column Importance.

Column Importance
Column importance helps you discover which columns are most important in predicting the values of a label column that you choose. Unlike clustering, it lets you decide which label is used to determine the importance of the columns.

Column Importance
Options when finding column importance:
- The number of columns to find.
- Whether or not to use weights.
- The weight.
- The number of additional important columns.
- The purity of the columns present on the right or left.
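To make the idea of ranking columns against a chosen label concrete, the sketch below scores each column by information gain. This is a stand-in measure chosen for illustration; the slides do not specify how MineSet computes purity, and the data and column names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of label values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, column, label):
    """Reduction in label entropy after splitting the rows on one column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[label])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[label] for row in rows]) - remainder


# Hypothetical dataset: "play" is the label column chosen by the user.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
ranking = sorted(
    ["outlook", "windy"],
    key=lambda column: information_gain(rows, column, "play"),
    reverse=True,
)
```

Here `outlook` separates the label values perfectly while `windy` carries no information about the label, so `outlook` is ranked first.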

Association Rules
Options:
- Confidence (1-100)
- Support (1-100)
- Whether or not to use weights
- Unlimited items per rule, or a fixed number of items per rule
- Height - Bars
- Height - Disks
- Color - Bars
- Color - Disks
- Label - Bars

Association Rules

Interpreting association rules in the Scatter Visualizer:
- One axis represents the items on the left-hand side (LHS) of a rule.
- The other axis represents the items on the right-hand side (RHS).
- Bar height corresponds to support.
- Bar color represents lift.
- Pointing at a bar displays the details of that bar.
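The quantities shown for a rule can be computed from raw transactions using the standard definitions of support, confidence, and lift. The sketch below returns them as fractions rather than on the 1-100 scale used in the options; `rule_metrics` and the sample baskets are hypothetical.

```python
def rule_metrics(transactions, lhs, rhs):
    """Return (support, confidence, lift) for the rule lhs -> rhs."""
    n = len(transactions)
    lhs, rhs = set(lhs), set(rhs)
    n_lhs = sum(1 for t in transactions if lhs <= t)
    n_rhs = sum(1 for t in transactions if rhs <= t)
    n_both = sum(1 for t in transactions if (lhs | rhs) <= t)
    support = n_both / n                                 # P(LHS and RHS)
    confidence = n_both / n_lhs if n_lhs else 0.0        # P(RHS | LHS)
    lift = confidence / (n_rhs / n) if n_rhs else 0.0    # confidence / P(RHS)
    return support, confidence, lift


# Hypothetical market baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support, confidence, lift = rule_metrics(baskets, ["bread"], ["milk"])
```

A lift below 1, as in this example, means the right-hand side is slightly less likely when the left-hand side is present than it is overall.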

Clustering
- Single K-means (the default method): you specify the number of clusters.
- Iterative K-means: you specify the minimum and maximum number of clusters.
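A single K-means run with a fixed number of clusters can be sketched as below. This is the textbook algorithm with squared Euclidean distance, not MineSet's code; the parameter names `iterations` and `seed` are assumptions that mirror the options a user would set.

```python
import random


def kmeans(points, k, iterations=10, seed=0):
    """Plain K-means on tuples of floats; returns (centers, assignment)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centers, assignment


# Two well-separated groups of one-dimensional points.
points = [(1.0,), (1.5,), (2.0,), (10.0,), (10.5,), (11.0,)]
centers, assignment = kmeans(points, k=2)
```

Which group receives label 0 depends on the random seed, but with groups this far apart the partition itself does not.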

Clustering
Options available when creating clusters:
- The distance measure (Euclidean or Manhattan).
- The number of iterations.
- The random seeds.
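The two distance measures differ only in how per-dimension differences are combined. A minimal sketch using the standard definitions (the function names are illustrative):

```python
import math


def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block distance: sum of the absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


d_euclidean = euclidean((0, 0), (3, 4))
d_manhattan = manhattan((0, 0), (3, 4))
```

The Manhattan distance is never smaller than the Euclidean distance for the same pair of points, so the choice can change which cluster center a record is nearest to.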

Clustering

- The order in which attributes are displayed represents the importance of the attributes.
- The population shows the default settings.
- Each column represents a different cluster.
- Clicking a column at the top shows its attribute importance.
- Each box represents the maximum, minimum, median, and deviation of the values in it.

Classifier
Classification is the task of assigning a discrete label value to an unlabeled record.
Different modes:
- Classifier and Error
- Classifier Only
- Estimate Error
- Learning Curve

Classifier
Classifier Only
This mode uses all the available data to build the classifier. It is useful when you are not concerned with error estimation.
Classifier and Error
This mode uses holdout error estimation. Instead of using all the data to build the model, you hold out part of the data as a test set and induce the classifier from the remainder. The mode automatically partitions the dataset into independent training and test subsets.
Options: holdout ratio and random seed.
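The holdout partition can be sketched as a seeded shuffle followed by a cut. In this sketch the ratio is treated as the fraction of records kept for training, which is an assumption; the slides do not define the holdout ratio precisely, and the helper names are hypothetical.

```python
import random


def holdout_split(records, train_fraction=2 / 3, seed=0):
    """Shuffle with a fixed seed, then split into (training, test) subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def error_rate(classify, test_records):
    """Fraction of (record, label) pairs in the test subset that are misclassified."""
    wrong = sum(1 for record, label in test_records if classify(record) != label)
    return wrong / len(test_records)


# Hypothetical record identifiers standing in for real rows.
train, test = holdout_split(range(30))
```

Because the test records play no part in building the classifier, the error measured on them estimates the error on unseen data.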

Classifier
Estimate Error
This mode uses cross-validation error estimation. Cross-validation is a method for getting a more precise estimate of error. It is used when building the final classifier or for small datasets.
Learning Curve
The learning curve shows the error of the classifier generated by an inducer as a function of the number of records used to create the classifier.
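Cross-validation can be sketched as follows: the records are divided into k folds, each fold serves once as the test set, and the fold errors are averaged. The majority-class inducer below is a deliberately trivial stand-in for MineSet's inducers, and all names are hypothetical.

```python
from collections import Counter


def k_fold_splits(n_records, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = [list(range(i, n_records, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


def majority_inducer(train_labels):
    """Build a classifier that always predicts the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda record: most_common


def cross_validation_error(labels, k):
    """Average test error over k folds, using the majority-class inducer."""
    fold_errors = []
    for train_idx, test_idx in k_fold_splits(len(labels), k):
        classify = majority_inducer([labels[i] for i in train_idx])
        wrong = sum(1 for i in test_idx if classify(None) != labels[i])
        fold_errors.append(wrong / len(test_idx))
    return sum(fold_errors) / k


labels = ["yes"] * 8 + ["no"] * 2
estimated_error = cross_validation_error(labels, k=5)
```

Every record is used for testing exactly once, which is why the estimate tends to be more stable than a single holdout split on a small dataset.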

Classifier

The classifier can be induced by the following methods:
- Decision Tree
- Option Tree
- Evidence
- Decision Table