Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology Advisor : Dr. Hsu Presenter : Keng-Wei Chang Author: Yehuda.

Presentation transcript:

Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology Advisor: Dr. Hsu Presenter: Keng-Wei Chang Authors: Yehuda Koren and David Harel A Two-Way Visualization Method for Clustered Data ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Outline
- Motivation
- Objective
- Introduction
- Basic Notions
- Computing the x-Coordinates
- Computing the y-Coordinates
- Result
- Related Work
- Conclusions
- Personal Opinion

Motivation
- A number of technological developments have led to an explosion of raw data that has to be analyzed.
- We are especially interested in two families of tools in this domain: clustering algorithms and data visualization methods.

Objective
- In this paper, we integrate the two approaches:
- hierarchical clustering, depicted as a dendrogram
- low-dimensional embedding

Introduction
- A number of technological developments have led to an explosion of raw data that has to be analyzed.
- We are especially interested in two families of tools in this domain: clustering algorithms and data visualization methods.
- Clustering methods can be broadly classified as hierarchical or partitional.

Introduction
- Our main interest here is hierarchical clustering.
- The clustering hierarchy is often visualized as a dendrogram, which is a full binary tree.
- The dendrogram has a significant disadvantage: it does not provide an exploratory visual representation of the data itself.
- Another issue is that of cluster validity.

Introduction
- We are particularly interested in methods for achieving a low-dimensional embedding of the data:
- principal component analysis (PCA)
- multidimensional scaling (MDS)
- force-directed placement
- These methods overcome some limitations of the dendrogram, but they cannot utilize external clustering information.

Introduction
- A demonstration of the relative merits of the two approaches: a dendrogram vs. a low-dimensional embedding.

Introduction
- In this paper, we integrate the two approaches:
- hierarchical clustering, depicted as a dendrogram
- low-dimensional embedding

Basic Notions
- We are given data about n elements {1, …, n}.
- Relationships between pairs of elements are described by distances d_ij ≥ 0 or similarities w_ij ≥ 0.
- A 2-dimensional embedding of the data is defined by two vectors x, y ∈ R^n.
- The coordinates of element i are (x_i, y_i).

Computing the x-Coordinates
- The embedding must place each element exactly below its corresponding leaf in the dendrogram.
- This means that the x-coordinate of each element must equal that of its corresponding leaf in the dendrogram.
- We therefore face the problem of computing the x-coordinates of the dendrogram leaves in a way that preserves the relationships among the data as much as possible.

Computing the x-Coordinates
- Rather than exhausting all the existing methods, we opt for a twofold process:
- Find the best orientation of the dendrogram. This step determines the ordering of the leaves.
- Decide on the exact gaps between consecutive leaves in the ordering.

Dendrogram orientation
- A dendrogram with n leaves has 2^(n-1) different orientations, because each of its n-1 internal nodes can be flipped independently.
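
The count of orientations can be checked on a toy tree. The following is a minimal sketch, assuming a dendrogram stored as nested 2-tuples with integer leaves; this representation and the name `leaf_orders` are illustrative choices, not taken from the paper.

```python
def leaf_orders(tree):
    """Yield every left-to-right leaf order obtainable by flipping internal nodes."""
    if not isinstance(tree, tuple):       # a leaf: only one possible order
        yield [tree]
        return
    left, right = tree
    for lo in leaf_orders(left):
        for ro in leaf_orders(right):
            yield lo + ro                 # children kept as drawn
            yield ro + lo                 # children flipped


tree = ((0, 1), (2, 3))                   # n = 4 leaves, 3 internal nodes
orders = list(leaf_orders(tree))
print(len(orders))                        # 2 ** (4 - 1) = 8
```

Each internal node contributes a factor of two, which is where 2^(n-1) comes from.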

Dendrogram orientation
- One way of formally defining what should be considered a "good" ordering is to associate a cost function with the dendrogram, such that finding the best ordering is equivalent to optimizing this function.
- The cost function used is that of the classical minimum linear arrangement problem, which minimizes the similarity-weighted distances between the positions of elements in the ordering.
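
In its standard form, the minimum linear arrangement objective is the sum over all pairs i < j of w_ij · |x_i − x_j|, where x_i is the position of element i in the ordering. A minimal sketch of that cost follows, assuming leaves labeled 0..n-1 and similarities held in a symmetric list of lists; the function name and the toy matrix are hypothetical.

```python
def linear_arrangement_cost(w, order):
    """Sum of w[i][j] * |pos(i) - pos(j)| over all unordered pairs of elements."""
    # Positions are 1-based, matching the slides' x(L) = {1, ..., l} convention.
    pos = {leaf: p for p, leaf in enumerate(order, start=1)}
    n = len(order)
    return sum(w[i][j] * abs(pos[i] - pos[j])
               for i in range(n) for j in range(i + 1, n))


# Toy similarities: elements 0 and 1 are strongly related.
w = [[0, 5, 1],
     [5, 0, 1],
     [1, 1, 0]]
print(linear_arrangement_cost(w, [0, 1, 2]))  # 5*1 + 1*2 + 1*1 = 8
print(linear_arrangement_cost(w, [0, 2, 1]))  # 5*2 + 1*1 + 1*1 = 12
```

Keeping the two similar elements adjacent gives the lower cost, which is the behavior the ordering step is after.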

Dendrogram orientation
- In our particular problem we are also faced with an ordering task: finding a permutation of {1, …, n}.
- However, here we should not consider all n! possible permutations, but only the 2^(n-1) that agree with the dendrogram's structure.
- Using dynamic programming, the running time is exponential in the dendrogram's height, not in its size.
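
To make the restricted search space concrete, here is a brute-force sketch that tries every orientation of a small dendrogram and keeps the cheapest one. It is not the dynamic program mentioned on the slide, only an exhaustive baseline that is practical for tiny trees; the nested-tuple tree format, the function names, and the similarity matrix are all assumptions made for illustration.

```python
def leaf_orders(tree):
    """Yield each leaf order reachable by flipping internal nodes."""
    if not isinstance(tree, tuple):
        yield [tree]
        return
    left, right = tree
    for lo in leaf_orders(left):
        for ro in leaf_orders(right):
            yield lo + ro
            yield ro + lo


def arrangement_cost(w, order):
    """Minimum-linear-arrangement cost of one ordering (1-based positions)."""
    pos = {leaf: p for p, leaf in enumerate(order, start=1)}
    leaves = sorted(pos)
    return sum(w[i][j] * abs(pos[i] - pos[j])
               for a, i in enumerate(leaves) for j in leaves[a + 1:])


def best_orientation(tree, w):
    """Exhaustively search the 2^(n-1) orientations for the lowest cost."""
    return min(leaf_orders(tree), key=lambda order: arrangement_cost(w, order))


tree = ((0, 1), (2, 3))
# Elements 0 and 3 sit in different clusters but are highly similar.
w = [[0, 9, 1, 8],
     [9, 0, 1, 1],
     [1, 1, 0, 9],
     [8, 1, 9, 0]]
best = best_orientation(tree, w)
print(best, arrangement_cost(w, best))
```

The winning orientation places leaves 0 and 3 next to each other across the cluster boundary; its mirror image has the same cost.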

Dendrogram orientation
- We introduce an additional form of the cost function, one that is maximized rather than minimized.

Dendrogram orientation
- Given an ordered dendrogram T and a node v:
- Leaves(v) is the set of leaves in the subtree rooted at v.
- x is the ordering on the leaves.
- Let S be Leaves(v), L the set of leaves to the left of S, and R the set of leaves to the right of S.
- If |L| = l and |S| = s, we have x(L) = {1, …, l}, x(S) = {l+1, …, l+s}, x(R) = {l+s+1, …, n}.
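
The L, S, R notation can be illustrated with a few lines of code. This is a minimal sketch, assuming the leaf ordering is a Python list and the subtree is given by its set of leaves; `split_around` is a hypothetical helper name.

```python
def split_around(order, subtree_leaves):
    """Split an ordering into (L, S, R) around the leaves of one subtree."""
    s = set(subtree_leaves)
    idx = [p for p, leaf in enumerate(order) if leaf in s]
    first, last = idx[0], idx[-1]
    # In an ordered dendrogram the leaves of any subtree are contiguous.
    if last - first + 1 != len(s):
        raise ValueError("subtree leaves are not contiguous in this ordering")
    return order[:first], order[first:last + 1], order[last + 1:]


order = ["a", "b", "c", "d", "e"]
L, S, R = split_around(order, {"c", "d"})
l, s = len(L), len(S)
positions_S = list(range(l + 1, l + s + 1))   # x(S) = {l+1, ..., l+s}
print(L, S, R, positions_S)
```

With l = 2 and s = 2 the subtree occupies positions 3 and 4, matching x(S) = {l+1, …, l+s}.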

Dendrogram orientation
- A key concept of the algorithm is the local arrangement cost, defined as:
- If |L| = l and |S| = s, we have x(L) = {1, …, l}, x(S) = {l+1, …, l+s}, x(R) = {l+s+1, …, n}.

Dendrogram orientation
- Two additional related terms will be used.
- Another term will be used in the algorithm.

Determining coordinates of the leaves
- The second step computes the exact gap between each two consecutive leaves.

Determining coordinates of the leaves
- A better approach is to take a weighted average over all influenced leaf pairs.

Computing the y-Coordinates
- Principal component analysis
- Classical multidimensional scaling
- Eigen-projection
- Stress minimization

Result
- Odors dataset
- Consists of 30 volatile odorous pure chemicals.
- Contains 262 elements; natural clusters: 30.
- UPGMA agglomerative clustering is used to construct the dendrogram.
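
UPGMA is average-linkage agglomerative clustering: at each step the two clusters with the smallest mean pairwise distance are merged. The sketch below is a small pure-Python version for illustration only, assuming a symmetric distance matrix as a list of lists; it recomputes averages from scratch, so it is suited to toy inputs rather than datasets of the sizes in the slides, and the 4×4 matrix is made up.

```python
def upgma(d):
    """Build a nested-tuple dendrogram by repeated average-linkage merging."""
    # Each cluster id maps to (subtree, list of member leaves).
    clusters = {i: (i, [i]) for i in range(len(d))}
    next_id = len(d)

    def avg_dist(a, b):
        # Unweighted mean of all leaf-to-leaf distances between two clusters.
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        ids = sorted(clusters)
        pairs = ((p, q) for k, p in enumerate(ids) for q in ids[k + 1:])
        a, b = min(pairs, key=lambda pq: avg_dist(clusters[pq[0]][1],
                                                  clusters[pq[1]][1]))
        tree_a, members_a = clusters.pop(a)
        tree_b, members_b = clusters.pop(b)
        clusters[next_id] = ((tree_a, tree_b), members_a + members_b)
        next_id += 1

    tree, _ = next(iter(clusters.values()))
    return tree


# Made-up distances: {0, 1} and {2, 3} form two tight groups.
d = [[0, 1, 6, 7],
     [1, 0, 5, 6],
     [6, 5, 0, 2],
     [7, 6, 2, 0]]
print(upgma(d))   # ((0, 1), (2, 3))
```

The resulting nested tuples are exactly the kind of binary dendrogram whose orientation is then optimized.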

Result
- Iris dataset
- An example of discriminant analysis.
- Contains 150 elements; natural clusters: 3.

Result
- Gene expression data: CDC15-synchronized cell cycle
- A much larger dataset of gene-expression data.
- Contains 6113 elements.

Related Work
- TreeView: a dendrogram over a color-coded matrix.

Discussion
- The method succeeds in integrating two key methods in exploratory data analysis: cluster analysis and low-dimensional embedding.
- It has two unique properties:
- Guaranteed separation between any kind of given clusters.
- The ability to deal with a predefined hierarchical clustering.

Personal Opinion
- Advantages
- Successfully integrates the two methods (clustering and low-dimensional embedding).
- Makes analysis more intuitive.
- Application
- Real data for clustering and analysis.
- May solve the problem of lacking clustering information.
- Limitation
- Cannot show the real shape of clusters.