Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology: A hierarchical clustering algorithm for categorical sequence data

Presentation transcript:

Slide 1. A hierarchical clustering algorithm for categorical sequence data. Intelligent Database Systems Lab, 國立雲林科技大學 (National Yunlin University of Science and Technology). Advisor: Dr. Hsu. Graduate: Wen-Hsiang Hu. Authors: Seung-Joon Oh*, Jae-Yearn Kim (Republic of Korea). 2003 Elsevier B.V. All rights reserved.

Slide 2. Outline: Motivation; Objective; Introduction; The proposed similarity measure and hierarchical clustering algorithm; Experimental results; Conclusions; Personal opinion.

Slide 3. Motivation: Some existing similarity measures (e.g. edit distance, sequence alignment) have defects.

Slide 4. Objective: generate better-quality clusters; lower the computational time.

Slide 5. Introduction: We study how to cluster sequence datasets, such as protein sequences, retail transactions, and web logs. We propose a new similarity measure and develop a hierarchical clustering algorithm for categorical sequence data.

Slide 6. Measure of similarity between sequences
The similarity between data must be decided before clustering the data.
1. We are given a database D of sequences.
2. A sequence S = (x_1 x_2 ... x_i ... x_j ... x_n) is an ordered list of items, where x_i is an item having a categorical value.
3. A sequence element e_k is a pair of items x_i x_j (i < j) in sequence S.
4. E = (e_1, e_2, ..., e_k, ...) is the collection of sequence elements e_k.
5. The number of elements in E is referred to as the size of E and is denoted by |E|.
Example: the similarity between S1 = A B C D and S2 = A C D E is calculated using the pairs of items in
S1 => (AB, AC, AD, BC, BD, CD) = E1
S2 => (AC, AD, AE, CD, CE, DE) = E2
The pairs of identical items are AC, AD, CD => [similarity formula not preserved in the transcript]
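The definitions on this slide can be tried out with a short Python sketch. It is only an illustration: the function names `sequence_elements` and `common_elements` are hypothetical, items are assumed to be single characters, and the two collections are compared as sets, which the slide does not state explicitly.

```python
from itertools import combinations


def sequence_elements(seq):
    """Return E: every pair x_i x_j with i < j, in order of appearance."""
    # combinations() keeps the original item order, so i < j always holds.
    return ["".join(pair) for pair in combinations(seq, 2)]


def common_elements(s1, s2):
    """Return the pairs that occur in both E1 and E2, without repeats.

    Comparing the collections as sets is an assumption of this sketch.
    """
    e2 = set(sequence_elements(s2))
    seen = set()
    common = []
    for element in sequence_elements(s1):
        if element in e2 and element not in seen:
            seen.add(element)
            common.append(element)
    return common


E1 = sequence_elements("ABCD")
E2 = sequence_elements("ACDE")
print(E1, len(E1))   # the six pairs listed on the slide, |E1| = 6
print(E2, len(E2))
print(common_elements("ABCD", "ACDE"))   # AC, AD, CD
```

For a sequence of n items, |E| = n(n − 1)/2, which is why the example sequences of length 4 each yield 6 elements.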

Slide 7. Proposed method to compute the dissimilarity between sequences
Our proposed similarity measure between sequences Si and Sj can be converted into dissimilarities by using a simple transformation such as: [formula not preserved in the transcript]
The measure d is called a semimetric if it fulfills the conditions: [conditions not preserved in the transcript]
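Since the transformation and the semimetric conditions are missing from the transcript, the following Python sketch rests entirely on assumptions: it uses d = 1 − sim as the transformation, a stand-in similarity (shared pairs divided by all distinct pairs) that is not necessarily the authors' measure, and the commonly listed semimetric conditions (non-negativity, d(a, a) = 0, symmetry). All function names are hypothetical.

```python
from itertools import combinations


def pair_set(seq):
    """Set of item pairs (x_i, x_j), i < j, of a sequence."""
    return set(combinations(seq, 2))


def similarity(s1, s2):
    """Stand-in similarity in [0, 1] (an assumption, not the slide's formula)."""
    e1, e2 = pair_set(s1), pair_set(s2)
    union = e1 | e2
    # Two sequences without any pairs are treated as identical.
    return len(e1 & e2) / len(union) if union else 1.0


def dissimilarity(s1, s2):
    """Assumed transformation: d = 1 - sim."""
    return 1.0 - similarity(s1, s2)


def looks_like_semimetric(seqs, d=dissimilarity):
    """Check non-negativity, d(a, a) = 0 and symmetry on a sample of sequences."""
    return all(
        d(a, b) >= 0 and d(a, b) == d(b, a) and d(a, a) == 0
        for a in seqs
        for b in seqs
    )


print(dissimilarity("ABCD", "ACDE"))
print(looks_like_semimetric(["ABCD", "ACDE", "AFCH", "ABCFZA"]))
```

A check on a sample does not prove the conditions in general; it only shows that this particular stand-in does not violate them on the given sequences.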

Slide 8. The proposed hierarchical clustering algorithm
First, each of the n × (n − 1)/2 pairs of possible merges is evaluated, and the two clusters that have the maximum value of the criterion function [Eq. (2)] are merged. Eq. (2) is derived from Eq. (3) of Zhao and Karypis [10], where n_r is the number of sequences (objects) in cluster C_r and k is the number of clusters.
[Equations (2) and (3) not preserved in the transcript.]
[Figure: sequences S1 to S6 and a new cluster C_new; S1 = ABCD => (AB, AC, AD, BC, BD, CD) = E1.]
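The merge loop described on this slide can be sketched in Python as follows. Because Eq. (2) is not preserved, the sketch substitutes a stand-in criterion (mean pairwise similarity across two clusters) and a stand-in similarity (shared pairs over all distinct pairs); both are assumptions, and the names `merge_score` and `agglomerate` are hypothetical. It assumes k >= 1.

```python
from itertools import combinations


def pair_set(seq):
    return set(combinations(seq, 2))


def pair_similarity(a, b):
    """Stand-in similarity (assumption): shared pairs / all distinct pairs."""
    ea, eb = pair_set(a), pair_set(b)
    union = ea | eb
    return len(ea & eb) / len(union) if union else 1.0


def merge_score(c1, c2, sim):
    """Stand-in criterion, NOT Eq. (2): mean similarity over cross-cluster pairs."""
    return sum(sim(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def agglomerate(sequences, k, sim=pair_similarity):
    """Merge the best-scoring pair of clusters until k clusters remain."""
    clusters = [[s] for s in sequences]
    while len(clusters) > k:
        # Evaluate every one of the m * (m - 1) / 2 candidate merges.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: merge_score(clusters[ij[0]], clusters[ij[1]], sim),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


print(agglomerate(["ABCD", "XYZW", "ABCE", "XYZV"], k=2))
```

Each pass re-evaluates all remaining candidate merges, which is the costly step that the later slides on efficient similarity computation aim to reduce.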

Slide 9. The proposed hierarchical clustering algorithm (cont.)
Our proposed agglomerative hierarchical clustering algorithm is presented as follows: [algorithm listing not preserved in the transcript]
[Figure: sequences S1 to S6 and a new cluster C_new.]

Slide 10. An efficient algorithm for measuring similarity
Step 1 requires a large computational time.
S1 = 〈a1...ai...ck...an〉
S2 = 〈b1...bj...cl...bm〉
Items ai are exclusive to sequence S1; items bj are exclusive to S2; and items ck and cl are common to S1 and S2. However, the common items ck (taken in their order in S1) and cl (taken in their order in S2) may or may not form the same sequence (see Example 2). The sequence of items ck is called S3 and the sequence of items cl is called S4. Let E1, E2, E3, and E4 be the collections of sequence elements in S1, S2, S3, and S4, respectively.
The similarity between sequences S1 and S2 is defined as: [formula not preserved in the transcript]

Slide 11. An efficient algorithm for measuring similarity (cont.)
S1 = 〈A B C F Z A〉 is compared with S2 = 〈A F C H〉; each identical item is inserted into S3 = 〈A C F A〉 => E3 = (AC, AF, AA, CF, CA, FA)
S2 = 〈A F C H〉 is compared with S1 = 〈A B C F Z A〉; each identical item is inserted into S4 = 〈A F C〉 => E4 = (AF, AC, FC)
For comparison, using all pairs of items in S1 = 〈A B C F Z A〉 and S2 = 〈A F C H〉:
S1 => (AB, AC, AF, AZ, AA, BC, BF, BZ, BA, CF, CZ, CA, FZ, FA, ZA) = E1
S2 => (AF, AC, AH, FC, FH, CH) = E2
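The example can be reproduced with a small Python sketch. It only builds S3, S4, E3 and E4 and compares the shared pairs; the similarity formula itself is not preserved in the transcript and is not reconstructed here. The function names are hypothetical and items are assumed to be single characters.

```python
from itertools import combinations


def elements(seq):
    """All pairs x_i x_j with i < j, in order of appearance."""
    return ["".join(pair) for pair in combinations(seq, 2)]


def common_subsequence(seq, other):
    """Keep, in their original order, the items of seq that also occur in other."""
    shared = set(other)
    return "".join(item for item in seq if item in shared)


S1, S2 = "ABCFZA", "AFCH"
S3 = common_subsequence(S1, S2)   # common items in S1's order
S4 = common_subsequence(S2, S1)   # common items in S2's order
E3, E4 = elements(S3), elements(S4)

# Shared pairs computed from the short sequences and from the full ones.
shortcut = set(E3) & set(E4)
brute_force = set(elements(S1)) & set(elements(S2))

print(S3, E3)
print(S4, E4)
print(sorted(shortcut), shortcut == brute_force)
```

Any pair that appears in both E1 and E2 consists of two items common to both sequences, so it also appears in E3 and E4. In this example that reduces the work from 15 + 6 pairs to 6 + 3 pairs.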

Slide 12. Experimental results
Algorithms compared (similarity measure; hierarchical clustering algorithm):
Algorithm 1: edit distance; proposed hierarchical clustering algorithm
Algorithm 2: edit distance; complete linkage method
Proposed clustering algorithm: (similarity-measure cell not preserved, presumably the proposed similarity measure); proposed hierarchical clustering algorithm
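For reference, the edit distance used by the two baseline algorithms can be sketched as the standard Levenshtein dynamic program. Unit costs for insertion, deletion and substitution are an assumption here; the slides do not say which cost scheme the experiments used.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit costs (cost scheme is an assumption)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if x == y else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


print(edit_distance("ABCD", "ACDE"))
```

For S1 = ABCD and S2 = ACDE from the earlier example, deleting B and appending E turns one sequence into the other.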

Slide 13. Experimental results: Splice dataset
The splice dataset contains 767 EI (exon/intron) sequences and 768 IE (intron/exon) sequences.

Slide 14. Experimental results: Synthetic dataset
Four different datasets were used: DS1, DS2, DS3, and DS4. Each dataset was a market basket database.
[Table: number of misclassified transactions; values not preserved in the transcript.]

Slide 15. We generated synthetic datasets (2000 transactions).
[Table: number of misclassified transactions; values not preserved in the transcript.]

Slide 16. Conclusions: We developed a hierarchical clustering algorithm and presented an efficient method for determining the similarity measure.

Slide 17. Personal opinion: 2-dimension => multi-dimension