Visual Data Mining: Concepts, Frameworks and Algorithm Development Student: Fasheng Qiu Instructor: Dr. Yingshu Li.

Slides:

Advertisements

Similar presentations

Pseudo-Relevance Feedback For Multimedia Retrieval By Rong Yan, Alexander G. and Rong Jin Mwangi S. Kariuki

Advertisements

Heuristic Search techniques

Systems Analysis and Design in a Changing World, Fifth Edition

DECISION TREES. Decision trees  One possible representation for hypotheses.

Random Forest Predrag Radenković 3237/10

CPSC 502, Lecture 15Slide 1 Introduction to Artificial Intelligence (AI) Computer Science cpsc502, Lecture 15 Nov, 1, 2011 Slide credit: C. Conati, S.

Hunt’s Algorithm CIT365: Data Mining & Data Warehousing Bajuna Salehe

Decision Tree Approach in Data Mining

Information Systems in Business

Software Modeling SWE5441 Lecture 3 Eng. Mohammed Timraz

Traditional Approach to Design

Chapter 7 – Classification and Regression Trees

SLIQ: A Fast Scalable Classifier for Data Mining Manish Mehta, Rakesh Agrawal, Jorma Rissanen Presentation by: Vladan Radosavljevic.

Chapter 12: Expert Systems Design Examples

Software Testing and Quality Assurance

Aki Hecht Seminar in Databases (236826) January 2009

Software Metrics II Speaker: Jerry Gao Ph.D. San Jose State University URL: Sept., 2001.

1 Integrating User Feedback Log into Relevance Feedback by Coupled SVM for Content-Based Image Retrieval 9-April, 2005 Steven C. H. Hoi *, Michael R. Lyu.

Basic Data Mining Techniques

1 Visualization and Data Mining techniques By- Group number- 14 Chidroop Madhavarapu( ) Deepanshu Sandhuria( ) Data Mining CSE 634 Prof.

Mapping Techniques and Visualization of Statistical Indicators Haitham Zeidan Palestinian Central Bureau of Statistics IAOS 2014 Conference.

© Prentice Hall1 DATA MINING Introductory and Advanced Topics Part II Margaret H. Dunham Department of Computer Science and Engineering Southern Methodist.

Visualization and Data Mining Daniel A. Keim Professor and Head of Data Mining and Information Visualization University of Constance Konstanz, Germany.

SpaceTree: Supporting Exploration in Large Node Link Tree, Design Evolution and Empirical Evaluation Catherine Plaisant, Jesse Grosjean, Benjamin B.Bederson.

Rotation Forest: A New Classifier Ensemble Method 交通大學電子所蕭晴駿 Juan J. Rodríguez and Ludmila I. Kuncheva.

Chapter 10: Architectural Design

WPI Center for Research in Exploratory Data and Information Analysis From Data to Knowledge: Exploring Industrial, Scientific, and Commercial Databases.

Software Product Lines Krishna Anusha, Eturi. Introduction: A software product line is a set of software systems developed by a company that share a common.

Quality-Aware Collaborative Question Answering: Methods and Evaluation Maggy Anastasia Suryanto, Ee-Peng Lim, Aixin Sun, and Roger H. L. Chiang. In Proceedings.

LÊ QU Ố C HUY ID: QLU OUTLINE  What is data mining ?  Major issues in data mining 2.

PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning RASTOGI, Rajeev and SHIM, Kyuseok Data Mining and Knowledge Discovery, 2000, 4.4.

Chapter 10 Architectural Design

Overview of the Database Development Process

USpan: An Efficient Algorithm for Mining High Utility Sequential Patterns Authors: Junfu Yin, Zhigang Zheng, Longbing Cao In: Proceedings of the 18th ACM.

Issues with Data Mining

1 Mining with Noise Knowledge: Error Awareness Data Mining Xindong Wu Department of Computer Science University of Vermont, USA; Hong Kong Polytechnic.

Mining Optimal Decision Trees from Itemset Lattices Dr, Siegfried Nijssen Dr. Elisa Fromont KDD 2007.

Software System Engineering: A tutorial

Perception-Based Classification (PBC) System Salvador Ledezma April 25, 2002.

Taxonomies of Visualization Techniques CMPT 455/826 - Week 12, Day 2 w12d2 Sept-Dec

Presented by: Apeksha Khabia Guided by: Dr. M. B. Chandak

1 ENTROPY-BASED CONCEPT SHIFT DETECTION PETER VORBURGER, ABRAHAM BERNSTEIN IEEE ICDM 2006 Speaker: Li HueiJyun Advisor: Koh JiaLing Date:2007/11/6 1.

Intelligent Database Systems Lab Advisor ： Dr. Hsu Graduate ： Chien-Shing Chen Author ： Satoshi Oyama Takashi Kokubo Toru lshida 國立雲林科技大學 National Yunlin.

Visual Analysis of Hierarchical Management Data Zhao Geng 1, Gaurav Gathania 2, Robert S.Laramee 1 and ZhenMin Peng 1 1 Visual Computing Group, Computer.

GATree: Genetically Evolved Decision Trees 전자전기컴퓨터공학과 데이터베이스 연구실 G 김태종.

1 Introduction to Software Engineering Lecture 1.

Anomaly Detection in Data Mining. Hybrid Approach between Filtering- and-refinement and DBSCAN Eng. Ştefan-Iulian Handra Prof. Dr. Eng. Horia Cioc ârlie.

Detecting Group Differences: Mining Contrast Sets Author: Stephen D. Bay Advisor: Dr. Hsu Graduate: Yan-Cheng Lin.

CS690L Data Mining: Classification

Chapter 4 Decision Support System & Artificial Intelligence.

1 AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery Advisor ： Dr. Koh Jia-Ling Speaker ： Tu Yi-Lang Date ： Hong.

Digital Libraries1 David Rashty. Digital Libraries2 “A library is an arsenal of liberty” Anonymous.

Visualization Four groups Design pattern for information visualization

Presented by: Sandeep Chittal Minimum-Effort Driven Dynamic Faceted Search in Structured Databases Authors: Senjuti Basu Roy, Haidong Wang, Gautam Das,

An Introduction Student Name: Riaz Ahmad Program: MSIT( ) Subject: Data warehouse & Data Mining.

Data Mining and Decision Support

Concepts and Realization of a Diagram Editor Generator Based on Hypergraph Transformation Author: Mark Minas Presenter: Song Gu.

Using decision trees to build an a framework for multivariate time- series classification 1 Present By Xiayi Kuang.

An Interval Classifier for Database Mining Applications Rakes Agrawal, Sakti Ghosh, Tomasz Imielinski, Bala Iyer, Arun Swami Proceedings of the 18 th VLDB.

Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,

Dr. Chen, Data Mining  A/W & Dr. Chen, Data Mining Chapter 3 Basic Data Mining Techniques Jason C. H. Chen, Ph.D. Professor of MIS School of Business.

Data Mining CH6 Implementation: Real machine learning schemes(2) Reporter: H.C. Tsai.

Chapter 3 Data Mining: Classification & Association Chapter 4 in the text box Section: 4.3 (4.3.1),

Federal Land Manager Environmental Database (FED) Overview and Update June 6, 2011 Shawn McClure.

DATA MINING TECHNIQUES (DECISION TREES ) Presented by: Shweta Ghate MIT College OF Engineering.

Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 Intelligent Exploration for Genetic Algorithms Using Self-Organizing.

PostGraduate Research Excellence Symposium (PGReS) 2015

DEFECT PREDICTION : USING MACHINE LEARNING

Visualization and Data Mining techniques

Evolutionary Ensembles with Negative Correlation Learning

Presentation transcript:

Visual Data Mining: Concepts, Frameworks and Algorithm Development Student: Fasheng Qiu Instructor: Dr. Yingshu Li

Outline Introduction to VDM VDM frameworks Case study of a specific framework VQLBCI Conclusions References

Introduction to VDM The use of visualization techniques to allow data miners to evaluate, monitor, and guide the inputs, products and process of data mining. It can help introduce user insights, preferences, and biases in the earlier stages of the data mining life-cycle to reduce its overall computation complexity and reduce the set of uninteresting patterns in the product. The new insights may facilitate the development of better algorithms and processes for data mining.

Why Visual Data Mining VDM takes advantage of both, The power of automatic calculations, and The capabilities of human processing. – Human perception is involved

Why Visual Data Mining It is useful because 1.Early user feedback about level of interest can filter out uninteresting patterns early in the process. 2.User feedback can improve the selection of appropriate learning algorithms based on the application domain. 3.Visual inspection of datasets can provide direct clues towards interesting patterns in the data. 4.Visualization of the data mining process can provide new insights and may result in the design of better algorithms for data mining.

The current research in VDM Mainly focus on the framework of integrating the information visualization paradigms and data mining techniques. For example, 1.A Flexible Approach for Visual Data Mining. 2.A framework for visual data mining of structures. 3.A Visual Data Mining Framework for Convenient Identification of Useful knowledge. 4.A Visual Data Mining Framework by loose-coupling of databases and visualization systems.

A VDM framework-VQLBCI The following slides present a VDM framework and the application of the VDM framework towards designing new algorithms that can learn decision trees by manually refining some of the decisions made by well known algorithms such as C4.5.

Components of VQLBCI The three major components of VQLBCI are Visual Representations, Computations and Events.

Visual Development of Algorithms Most interesting use of visual data mining is the development of new insights and algorithms. The figure below shows the ER diagram for learning classification decision trees. This model allows the user to monitor the quality and impact of decisions made by the learning procedure. Learning procedure can be refined interactively via a visual interface.

ER diagram for the search space of decision tree learning algorithm

General Process Learning a classification decision tree from a training data set can be regarded as a process of searching for the best decision tree that meets user-provided goal constraints. The problem space of this search process consists of Model Candidates, Model Candidate Generator and Model Constraints. Many existing classification-learning algorithms like C4.5 and CDP fit nicely within this search framework. New learning algorithms that fit user’s requirements can be developed by defining the components of the problem space.

General Process Model Candidate corresponds to the partial classification decision tree. Each node of the decision tree is a Model Atom. Search process is the process of finding a final model candidate such that it meets user goal specifications. Model Candidate Generator transforms the current model candidate into a new model candidate by selecting one model atom to expand from the expandable leaf model atoms. Model Constraints (used by Model Candidate Generator) provide controls and boundaries to the search space.

Example of Search Process 13

Example of Search Process

Acceptability Constraint Model Constraints consist of Acceptability constraints, Expandability constraints and a Data-Entropy calculation function. Acceptability constraint predicate specifies when a model candidate is acceptable and thus allows search process to stop. EX: – A1) Total # of expandable leaf model atoms = 0. (Default) – A2) Overall error rate of the model candidate <= acceptable error rate. – A3) Total number of model atoms in the model candidate>= maximal allowable tree size. A1 is used in C4.5 and CDP

Expandability Constraint An Expandability constraint predicate specifies whether a leaf model atom is expandable or not. EX: – C4.5 uses E1 and E2 – CDP uses E2 and E3

Traversal Strategy Traversal strategy ranks expandable leaf model atoms based on the model atom attributes. EX: – Breadth-first – Depth-first – Orders based on other model atom attributes. (Best-first)

Steps in Visual Algorithm Development No single algorithm is the best all the time, performance is highly data dependent. By changing different predicates of model constraints, users can construct new classification-learning algorithm. This enables users to find an algorithm that works the best on a given data set. Two algorithms are developed : BF based on Best First search idea and CDP+ which is a modification of CDP

BF This algorithm is based on the Best-First search idea. For Acceptability criteria, it includes A1 and A2 with a user specified acceptable error rate. The Traversal strategy chosen is T3 In Best-First, expandable leaf model atoms are ranked according to the decreasing order of the number of misclassified training cases. (local error rate * size of subset training data set) The traversal strategy will expand a model atom that has the most misclassified training cases, thus reducing the overall error rate the most.

CDP + CDP+ is a modification of CDP CDP has dynamic pruning using expandability constraint E3. Here, the depth is modified according to the size of the training data set of the model atom. We set b is the branching factor of the decision tree, t is the size of training data set belonging to model atom, T is the whole training data set.

Comparison of different classification learning algorithms

Experiment The new BF and CDP+ algorithms are compared with the C4.5 and CDP algorithms. Various metrics are selected to compare the efficiency, accuracy and size of final decision trees of the classification algorithm. The generation efficiency of the nodes is measured in terms of the total number of nodes generated. To compare accuracy of the various algorithms, the mean classification error on the test data sets have been computed.

Classification error for 10 data sets

Nodes generated for 10 data sets

Final decision tree size

Results CDP has accuracy comparable to C4.5 while generating considerably fewer nodes. CDP+ has accuracy comparable to C4.5 while generating considerably fewer nodes. CDP+ outperformed CDP in error rate and number of nodes generated. Considering all performance metrics together, CDP+ is the best overall algorithm. Considering classification accuracy alone, C4.5P is the winner.

Results Different datasets require different algorithms for best results. Diverse user requirements put different constraints on the final decision tree. The experiment shows that Visual Data Mining Framework can help find the most suitable algorithm for a given data set and group of user requirements.

Conclusions VDM is useful in enhancing the data mining process. Different VDM frameworks are provided. VDM can help the design of better algorithms. 28

References Matthias Kreuseler and Heidrun Schumann. A Flexible Approach for Visual Data Mining. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 8, NO. 1, JANUARY-MARCH 2002 M. Kreuseler, T.Nocke, etc. A History Mechanism for Visual Data Mining. IEEE Symposium on Information Visualization 2004 October 10-12, Austin, Texas, USA Kaidi Zhao, Bing Liu, etc. A Visual Data Mining Framework for Convenient Identification of Useful Knowledge. Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM’05) Hans-J¨org Schulz, Thomas Nocke, etc. A Framework for Visual Data Mining of Structures. Twenty-Ninth Australasian Computer Science Conference (ACSC2006), Hobart, Tasmania, Australia, January Alexander Kort. Visual Data Mining and Zoomable Interfaces. IUI’04, January 13–16, 2004, Madeira, Funchal, Portugal. ACM /04/0001 Koji Kato, Tomoyuki Shibata. Visual Data Mining using Omni-directional Sensor. IEEE Conference on Multisensor Fusion and Integration for Intelligent Systems 2003 Daniel A. Keim. Information Visualization and Visual Data Mining. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, VOL. 7, NO. 1, JANUARY-MARCH 2002 M.Ganesh, Eui-Hong Han, etc. Visual Data Mining: Framework and Algorithm Development. Technical Report, March 12, 1996

That’s it Questions? Comments? Thank you!