
A cluster validity measure with a hybrid parameter search method for the support vector clustering algorithm
Presenter: Lin, Shu-Han
Authors: Jeen-Shing Wang, Jen-Chieh Chiang
Pattern Recognition (2008)

Outline
- Introduction of SVC
- Motivation
- Objective
- Methodology
- Experiments
- Conclusion
- Comments

SVC
SVC is derived from SVMs.
- SVMs are a supervised learning technique: fast convergence, good generalization performance, robustness to noise.
- SVC is an unsupervised approach:
  - Data points are mapped to a high-dimensional feature space using a Gaussian kernel.
  - Look for the smallest sphere that encloses the data.
  - Map the sphere back to the data space to form a set of contours.
  - The contours are treated as the cluster boundaries.

SVC - Sphere Analysis
To find the minimal enclosing sphere with soft margin: here ||·|| is the Euclidean norm, a is the center of the sphere, the ξj are slack variables that loosen the constraints to allow some data points to lie outside the sphere, C is a constant, and CΣξj is a penalty term. To solve the optimization problem in (1), it is convenient to introduce the Lagrangian function. The first and last terms are the objective function to be minimized; the second term represents the inequality constraints associated with the slack variables; the third term is the result of the non-negativity requirements on the values of the ξj.
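The equations on this slide were rendered as images and are absent from the transcript; the following reconstruction of the soft-margin minimal enclosing sphere problem and its Lagrangian follows the standard SVC formulation (Ben-Hur et al.) and matches the term-by-term description above:

```latex
\min_{R,\,a,\,\xi} \; R^2 + C\sum_j \xi_j
\quad \text{s.t.} \quad
\|\Phi(x_j) - a\|^2 \le R^2 + \xi_j, \qquad \xi_j \ge 0 \;\; \forall j
\tag{1}
```

```latex
L = R^2
  - \sum_j \beta_j \left( R^2 + \xi_j - \|\Phi(x_j) - a\|^2 \right)
  - \sum_j \mu_j \xi_j
  + C\sum_j \xi_j,
\qquad \beta_j \ge 0,\; \mu_j \ge 0
```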

SVC - Sphere Analysis
To minimize the Lagrangian, we differentiate it with respect to R, a, and ξj separately and set each derivative to zero; the Lagrange multipliers are the unknowns.
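The resulting conditions are also missing from the transcript; under the standard formulation, setting each partial derivative of L to zero gives:

```latex
\frac{\partial L}{\partial R} = 0 \;\Rightarrow\; \sum_j \beta_j = 1, \qquad
\frac{\partial L}{\partial a} = 0 \;\Rightarrow\; a = \sum_j \beta_j \Phi(x_j), \qquad
\frac{\partial L}{\partial \xi_j} = 0 \;\Rightarrow\; \beta_j = C - \mu_j
```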

SVC - Sphere Analysis
Karush-Kuhn-Tucker complementarity: based on (3)–(7), we can classify each data point as 1) an internal point, 2) an external point, or 3) a boundary point in the feature space. Point xj is classified as an internal point if βj = 0. When 0 < βj < C, the data point xj is denoted as an SV. SVs lying on the surface of the feature-space sphere are the so-called boundary points; these SVs can be used to describe the cluster contours in the input space. When βj = C, the data points located outside the feature-space sphere are defined as the external points, or BSVs (bounded support vectors). Note that if C ≥ 1, no external points will exist. Hence, the value of C can be used to control the existence of external points, namely outliers, during the clustering process.
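A minimal sketch of this classification rule in Python, assuming the multipliers β have already been obtained from a QP solver; the array names and the numerical tolerance are illustrative:

```python
import numpy as np

def classify_points(beta, C, tol=1e-7):
    """Label each point by its Lagrange multiplier: 'internal' (beta == 0),
    'sv' (0 < beta < C, lies on the sphere surface), or
    'bsv' (beta == C, outside the sphere, i.e. an outlier)."""
    labels = np.empty(len(beta), dtype=object)
    labels[beta <= tol] = "internal"
    labels[(beta > tol) & (beta < C - tol)] = "sv"
    labels[beta >= C - tol] = "bsv"
    return labels
```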

SVC - Sphere Analysis: the Wolfe dual optimization problem
Using the above conditions, the soft-margin minimal enclosing sphere problem (1) can be rewritten as a Wolfe dual optimization problem. C regulates how many outliers are allowed: the larger C is, the fewer the outliers; once C is equal to (or greater than) 1, no outliers are allowed at all. (Figure: the enclosing sphere with center a; bounded SVs appear as outliers.)
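A reconstruction of the dual, again following the standard SVC formulation, where K(xi, xj) = Φ(xi) · Φ(xj):

```latex
\max_{\beta} \; W = \sum_j \beta_j K(x_j, x_j) - \sum_{i,j} \beta_i \beta_j K(x_i, x_j)
\quad \text{s.t.} \quad 0 \le \beta_j \le C, \qquad \sum_j \beta_j = 1
```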

SVC - Sphere Analysis: the kernel
Mercer kernel: Gaussian. The distance (similarity) between x and the center a is expressed through the kernel, and q controls the number of clusters and the smoothness/tightness of the cluster boundaries. By using the above conditions, (1) can be turned into a Wolfe dual optimization problem with only the variables βj.
A radial basis function (RBF) is a radially symmetric scalar function, usually defined as a monotonic function of the Euclidean distance between any point x and a center xc, written k(||x − xc||); its effect is typically local, i.e., the function value is small when x is far from xc.
Note that q arises only because the Gaussian is chosen as the kernel function; with another kernel there would be no q, although whichever kernel is chosen, the value of C must still be selected. q is the reciprocal of the variance: a larger variance corresponds to a larger R and hence looser clusters; conversely, the larger q is, the tighter the clusters.
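The kernel and kernel-space distance formulas on this slide were images; reconstructed from the standard SVC formulation, they are:

```latex
K(x_i, x_j) = e^{-q\,\|x_i - x_j\|^2}, \qquad
R^2(x) = \|\Phi(x) - a\|^2
       = K(x, x) - 2\sum_j \beta_j K(x_j, x) + \sum_{i,j} \beta_i \beta_j K(x_i, x_j)
```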

Motivation
Drawbacks of cluster validation measures:
- Compactness: fails for clusters of different densities or sizes, and decreases monotonically as the number of clusters increases.
- Separation: fails for irregular cluster structures.

Motivation
Their previous study can handle different sizes, different densities, and arbitrary shapes. But…

Objectives – A cluster validity method and a parameter search algorithm for SVC
Automatically determine the two parameters so as to identify the optimal cluster structure:
- q: increasing q leads to an increasing number of clusters (q is the reciprocal of the kernel variance, so a larger q yields tighter clusters).
- C: regulates the existence of outliers and overlapping clusters; C represents the tolerance for outliers and overlap.

Methodology - Idea
- q is related to the densities of the clusters.
- Each cluster structure corresponds to an interval of q.
- Identifying the optimal structure is therefore equivalent to finding the largest such interval.
(Figure example: N = 64, maximum # of clusters = 8.)

Methodology - Problems
- How to locate the overall search range of q
- How to detect outliers/noise
- How to identify the largest interval

Methodology – Locate the range of q
- Lower bound: set by the overall range of the data set; if the data set's range is large, qmin is small, i.e., the search must start from a smaller value.
- Upper bound: employ K-means to obtain clusters and compute the variance vi of each cluster, with clusters sorted by size in ascending order; if the within-cluster variances are large, qmax is small, i.e., when the clusters are all large there is no need to search over as many values of q. Here n = 3: the variances of the 3 biggest clusters are used.
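The bound formulas on this slide were images and are missing from the transcript. The sketch below uses the common SVC choice qmin = 1 / max_ij ||xi − xj||² for the lower bound; for the upper bound it follows the slide's description (K-means, variances of the n largest clusters), but the exact combination used here (the reciprocal of the smallest of those variances) is an assumption, not the paper's formula:

```python
import numpy as np
from sklearn.cluster import KMeans

def q_range(X, k=10, n=3, seed=0):
    """Estimate a search range [q_min, q_max] for the Gaussian kernel width q."""
    # Lower bound: reciprocal of the squared diameter of the data set,
    # so a wider data range yields a smaller q_min (as the slide notes).
    diffs = X[:, None, :] - X[None, :, :]
    q_min = 1.0 / np.sum(diffs ** 2, axis=-1).max()

    # Upper bound: variances of the n biggest K-means clusters.
    labels = KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(X)
    sizes = np.bincount(labels, minlength=k)
    biggest = np.argsort(sizes)[-n:]          # indices of the n largest clusters
    variances = [X[labels == c].var(axis=0).sum() for c in biggest]
    q_max = 1.0 / min(variances)              # assumed combination of the variances
    return q_min, q_max
```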

Methodology – Outlier Detection
Set q = qmax, the tightest setting of q, and obtain Copt; data points that form singleton clusters at this setting are treated as outliers and removed.
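A sketch of the removal step, assuming cluster labels at q = qmax and C = Copt have already been produced by some SVC labeling routine (not shown in the slides):

```python
import numpy as np

def remove_singletons(X, labels):
    """Drop points whose cluster contains only themselves (treated as outliers).

    `labels` are integer cluster assignments obtained at q = q_max with
    C = C_opt; how they are computed is assumed here."""
    counts = np.bincount(labels)
    keep = counts[labels] > 1        # True where the point's cluster has >1 member
    return X[keep], labels[keep]
```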

Methodology – the largest interval
(Figure: cluster-structure intervals over q; qopt is taken from the largest interval.)

Methodology – the largest interval
- Fibonacci search: locate the interval of q over which the cluster structure stays the same.
- Bisection search: refine the interval boundaries (n = number of iterations).
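A simplified sketch of the bisection part: it narrows down the value of q at which the cluster structure changes, assuming a callable num_clusters(q) that runs SVC at width q and counts the resulting clusters (that callable, and the fixed iteration count, are assumptions):

```python
def structure_boundary(num_clusters, q_lo, q_hi, iters=20):
    """Bisection sketch: locate the q where the cluster structure changes
    within [q_lo, q_hi], assuming num_clusters(q_lo) != num_clusters(q_hi)."""
    k_lo = num_clusters(q_lo)
    for _ in range(iters):             # each iteration halves the interval
        q_mid = 0.5 * (q_lo + q_hi)
        if num_clusters(q_mid) == k_lo:
            q_lo = q_mid               # boundary lies above the midpoint
        else:
            q_hi = q_mid               # boundary lies at or below the midpoint
    return 0.5 * (q_lo + q_hi)
```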

Methodology – Overview
1. Locate the range of q.
2. Detect and remove outliers.
3. Identify the largest interval.
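Tying the sketches above together, an end-to-end driver; svc_cluster_labels is an assumed stand-in for the SVC clustering-and-labeling routine, which the slides do not spell out, and a coarse grid scan stands in for the paper's Fibonacci/bisection search:

```python
import numpy as np

def find_optimal_q(X, svc_cluster_labels, C_opt, k=10, n=3):
    """End-to-end sketch of the hybrid parameter search described above.

    `svc_cluster_labels(X, q, C)` is an assumed callable returning an integer
    cluster label per point; `q_range` and `remove_singletons` are the
    sketches from the previous slides."""
    q_min, q_max = q_range(X, k=k, n=n)                        # 1. search range
    labels = svc_cluster_labels(X, q_max, C_opt)               # 2. tightest setting...
    X_clean, _ = remove_singletons(X, np.asarray(labels))      # ...drop singletons

    # 3. Scan [q_min, q_max] and keep the widest interval over which the
    # number of clusters stays constant; its midpoint is returned as q_opt.
    grid = np.linspace(q_min, q_max, 50)
    counts = [len(set(svc_cluster_labels(X_clean, q, C_opt))) for q in grid]
    best_width, q_opt, start = -1.0, grid[0], 0
    for i in range(1, len(grid) + 1):
        if i == len(grid) or counts[i] != counts[start]:
            width = grid[i - 1] - grid[start]
            if width > best_width:                             # wider constant interval
                best_width = width
                q_opt = 0.5 * (grid[start] + grid[i - 1])
            start = i
    return q_opt
```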

Experiments - Benchmark and Artificial Examples
(Figures: clustering results on the benchmark and artificial data sets.)

Experiments - Outlier Detection
(Figure: outliers detected at Copt.)


Conclusions
A new measure, inspired by observations of q, determines the optimal cluster structure together with its corresponding ranges of q and C.

Comments
- Advantage: inspired by direct observation of the parameters' behavior.
- Drawback: …
- Application: SVC; analogously, DBSCAN's MinPts / Eps.