Intelligent Database Systems Lab N.Y.U.S.T. I. M. A Cluster Validity Measure With Outlier Detection for Support Vector Clustering Presenter : Lin, Shu-Han.


1 Intelligent Database Systems Lab N.Y.U.S.T. I. M. A Cluster Validity Measure With Outlier Detection for Support Vector Clustering. Presenter: Lin, Shu-Han. Authors: Jeen-Shing Wang, Jen-Chieh Chiang. IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS (2008)

2 Outline
- Introduction of SVC
- Motivation
- Objective
- Methodology
- Experiments
- Conclusion
- Comments

3 SVC
SVC is derived from SVMs.
SVMs are a supervised classification technique:
- Fast convergence
- Good generalization performance
- Robustness to noise
SVC is an unsupervised approach:
1. Map the data points to a high-dimensional feature space using a Gaussian kernel.
2. Look for the smallest sphere that encloses the data.
3. Map the sphere back to data space, where it forms a set of contours.
4. Treat the contours as the cluster boundaries.

4 SVC - Sphere Analysis
The minimal enclosing sphere with soft margin, centered at a, is found by solving a constrained optimization problem through its Lagrangian function.
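For reference, the soft-margin sphere problem as it is usually written in the SVC literature is sketched below. The symbols Φ (feature map), R (sphere radius), ξ_j (slack variables), and β_j, μ_j (Lagrange multipliers) are assumptions borrowed from that standard formulation; only a and C appear on the slides.

```latex
\begin{align}
  &\min_{R,\,a,\,\xi}\; R^2 + C \sum_j \xi_j
  \quad \text{s.t.} \quad
  \|\Phi(x_j) - a\|^2 \le R^2 + \xi_j, \qquad \xi_j \ge 0, \\
  &L = R^2 - \sum_j \bigl(R^2 + \xi_j - \|\Phi(x_j) - a\|^2\bigr)\beta_j
       - \sum_j \xi_j \mu_j + C \sum_j \xi_j,
  \qquad \beta_j \ge 0,\; \mu_j \ge 0 .
\end{align}
```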

5 SVC - Sphere Analysis
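A sketch of the usual next step, under the same assumed notation (R the sphere radius, a its center, Φ the feature map, ξ_j the slack variables, β_j and μ_j the Lagrange multipliers of the Lagrangian L): setting the derivatives of L with respect to R, a, and ξ_j to zero gives

```latex
\begin{align}
  \frac{\partial L}{\partial R} = 0 &\;\Rightarrow\; \sum_j \beta_j = 1, \\
  \frac{\partial L}{\partial a} = 0 &\;\Rightarrow\; a = \sum_j \beta_j\, \Phi(x_j), \\
  \frac{\partial L}{\partial \xi_j} = 0 &\;\Rightarrow\; \beta_j = C - \mu_j .
\end{align}
```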

6 SVC - Sphere Analysis
The Karush-Kuhn-Tucker complementarity conditions identify the bounded support vectors (BSVs), which are treated as outliers.
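In the standard SVC formulation, the complementarity conditions take the form sketched below. The notation is an assumption carried over from that formulation: R is the sphere radius, Φ the feature map, ξ_j the slack variables, and β_j, μ_j the Lagrange multipliers with β_j = C − μ_j.

```latex
\begin{align}
  \xi_j\, \mu_j &= 0, \\
  \bigl(R^2 + \xi_j - \|\Phi(x_j) - a\|^2\bigr)\, \beta_j &= 0 .
\end{align}
% Consequences:
%   \xi_j > 0  =>  \mu_j = 0  =>  \beta_j = C   : bounded SV (outside the sphere, an outlier)
%   0 < \beta_j < C                              : SV (on the sphere surface)
%   \beta_j = 0                                  : point inside the sphere
```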

7 SVC - Sphere Analysis
The minimal enclosing sphere with soft margin, centered at a, is found by solving the Wolfe dual optimization problem.
C: the soft-margin constant, which determines whether outliers are allowed.
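A sketch of the Wolfe dual in its usual SVC form, assuming β_j denotes the Lagrange multiplier of point x_j and K the kernel that replaces the inner product Φ(x_i)·Φ(x_j):

```latex
\begin{align}
  \max_{\beta}\; W &= \sum_j K(x_j, x_j)\, \beta_j
                     - \sum_{i,j} \beta_i \beta_j\, K(x_i, x_j) \\
  \text{s.t.}\quad & 0 \le \beta_j \le C, \qquad \sum_j \beta_j = 1 .
\end{align}
```

Because the β_j sum to 1, setting C = 1 leaves no room for a point to reach the upper bound β_j = C unless it carries all the weight, which is why C = 1 corresponds to allowing no outliers.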

8 SVC - Sphere Analysis
The distance between x and the sphere center a is computed with a Mercer kernel; the kernel used is the Gaussian function.
q: the Gaussian kernel parameter, which controls the number of clusters and the smoothness/tightness of the cluster boundaries.
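A minimal sketch of this distance, assuming the standard SVC expressions K(x, y) = exp(−q‖x − y‖²) and R²(x) = K(x, x) − 2 Σ_j β_j K(x_j, x) + Σ_{i,j} β_i β_j K(x_i, x_j). The function names and the weights `beta` (the dual solution) are illustrative assumptions, not taken from the slides.

```python
import math


def gaussian_kernel(x, y, q):
    """Gaussian kernel K(x, y) = exp(-q * ||x - y||^2)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-q * sq_dist)


def squared_distance_to_center(x, points, beta, q):
    """Squared feature-space distance between x and the sphere center a.

    The center is a = sum_j beta_j * Phi(x_j), so the distance can be
    written with kernel evaluations only.
    """
    k_xx = gaussian_kernel(x, x, q)  # always 1 for the Gaussian kernel
    cross = sum(b * gaussian_kernel(p, x, q) for p, b in zip(points, beta))
    center_norm = sum(
        bi * bj * gaussian_kernel(pi, pj, q)
        for pi, bi in zip(points, beta)
        for pj, bj in zip(points, beta)
    )
    return k_xx - 2.0 * cross + center_norm


# Usage: two training points with equal weights (assumed values).
points = [(0.0, 0.0), (1.0, 0.0)]
beta = [0.5, 0.5]
near = squared_distance_to_center((0.5, 0.0), points, beta, q=1.0)
far = squared_distance_to_center((5.0, 0.0), points, beta, q=1.0)
```

A point between the two training points lies closer to the center than a point far from both, so `near` is smaller than `far`.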

9 Motivation
Traditional cluster validity measures, such as
- Partition coefficient (PC)
- Separation measures
are based on fuzzy membership grades and the centroids of clusters.
The SVC algorithm generates cluster boundaries of arbitrary shape and provides no fuzzy membership grades.
Which clustering is better?

10 Objectives
Optimal cluster number
- Cluster validity measure
- Outlier-detection algorithm
- Cluster-merging mechanism
Figure panels: outlier detection; cluster merging

11 Methodology - Overview
- Cluster validity measure for the SVC algorithm
- Outlier detection
- Cluster-merging mechanism
C = 1: no outliers are allowed

12 Methodology – Cluster Validity Measure for the SVC Algorithm
- Compactness (intra-cluster)
- Separation (inter-cluster)
- Cluster validity measure (ratio) for SVC: min
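To illustrate the idea of a compactness-to-separation ratio, here is a generic sketch. It is not the paper's definition: the choices of compactness (largest pairwise distance inside a cluster), separation (smallest distance between two clusters), and "smaller is better" are all assumptions made for the example, as are the function names.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def compactness(cluster):
    """Assumed intra-cluster measure: largest pairwise distance in the cluster."""
    return max((euclidean(a, b) for a in cluster for b in cluster), default=0.0)


def separation(c1, c2):
    """Assumed inter-cluster measure: smallest distance between the two clusters."""
    return min(euclidean(a, b) for a in c1 for b in c2)


def validity_ratio(clusters):
    """Worst compactness divided by worst separation; needs at least two clusters."""
    worst_compactness = max(compactness(c) for c in clusters)
    worst_separation = min(
        separation(clusters[i], clusters[j])
        for i in range(len(clusters))
        for j in range(i + 1, len(clusters))
    )
    return worst_compactness / worst_separation


# Usage: the same four points grouped well and grouped badly.
good = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 1.0), (10.0, 1.0)]]
```

Under these assumptions the tight, well-separated grouping `good` gets a lower ratio than `bad`.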

13 Methodology – Outlier Detection
In SVC, outliers (BSVs) are the data in boundary regions.
Figure panels: q = 1; q = 4; q = 2; q = 1.8; C = 0.02; singleton

14 Methodology – Outlier Detection
C
- If C = 1, the resulting clusters are smooth but not desirable
BSV (outlier)
- All outliers are SVs
- Some outliers are far away from the other data in clusters
SVs
- Too many SVs make the boundaries too tight to fit the data
q
- Increasing q makes clusters more compact
Singleton
- Important criterion
Figure panels: q = 1; q = 4; q = 2; q = 1.8; C = 0.02; singleton

15 Methodology – Outlier Detection
Outlier Existence Criterion
Desirable Cluster Criterion
- The number of singleton clusters cannot exceed a threshold
- The percentage of data points that are SVs cannot be greater than a threshold (suggested: 50%)
- Recursively adjust C to satisfy these two criteria
Suggested γ = 2

16 Methodology – Cluster-Merging Mechanism
Similarity: overlapping degree, measured with a Gaussian function.
Figure labels: P_C = 0; P_A > 0

17 Methodology – Cluster-Merging Mechanism
1) Agglomerative outliers/noises: identification
For all ci [...] 0, merge cluster i and cluster j. Otherwise, discard cluster i. Set K ← K − 1.}
2) Compatible clusters: combination (similarity)
Sort the remaining K clusters by size in ascending order, so that cK = max(ci), ∀ i ∈ K.
For each i, i = 1, ..., K, perform {
  Set x ← mi.
  For each j, j = i + 1, ..., K, compute pj(x).
  Find l = arg max_{i+1≤j≤K} pj(x), where arg max_a denotes the value of a at which the expression that follows is maximized.
  If pl > 0, merge cluster i with cluster l. Set K ← K − 1 and repeat 2) until no further combination.
}
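Step 2) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: m_i is assumed to be the mean of cluster i, and the overlap function p_j(x) is passed in as a callable because its exact form is not given in the transcript. The `truncated_gaussian` helper, its `sigma` and `cutoff` parameters, and all function names are hypothetical.

```python
import math


def mean(cluster):
    """Coordinate-wise mean of a cluster (assumed meaning of m_i)."""
    n = len(cluster)
    return tuple(sum(p[d] for p in cluster) / n for d in range(len(cluster[0])))


def merge_compatible(clusters, overlap):
    """Repeat step 2) until no cluster overlaps a larger one.

    overlap(cluster, x) is a hypothetical stand-in for p_j(x).
    """
    clusters = [list(c) for c in clusters]
    merged = True
    while merged and len(clusters) > 1:
        merged = False
        clusters.sort(key=len)  # ascending size, largest cluster last
        for i in range(len(clusters) - 1):
            x = mean(clusters[i])  # x <- m_i
            scores = [overlap(clusters[j], x) for j in range(i + 1, len(clusters))]
            best = max(range(len(scores)), key=scores.__getitem__)
            if scores[best] > 0:
                l = i + 1 + best
                clusters[l].extend(clusters[i])  # merge cluster i into cluster l
                del clusters[i]                  # K <- K - 1
                merged = True
                break                            # re-sort and repeat step 2)
    return clusters


def truncated_gaussian(cluster, x, sigma=1.0, cutoff=3.0):
    """Hypothetical p_j(x): Gaussian of the distance to the cluster mean,
    set to zero beyond cutoff * sigma so that distant clusters do not overlap."""
    m = mean(cluster)
    d = math.sqrt(sum((mi - xi) ** 2 for mi, xi in zip(m, x)))
    return math.exp(-d * d / (2.0 * sigma * sigma)) if d <= cutoff * sigma else 0.0


# Usage: two nearby fragments and one distant cluster.
fragments = [
    [(0.0, 0.0), (0.2, 0.0)],
    [(0.5, 0.0), (0.6, 0.0), (0.4, 0.0)],
    [(10.0, 10.0), (10.1, 10.0), (9.9, 10.0), (10.0, 10.1)],
]
result = merge_compatible(fragments, truncated_gaussian)
```

With these inputs the two nearby fragments are combined and the distant cluster is left alone, so two clusters remain.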

18 Methodology – Summary
1) Initialize a small value of q, and set C = 1 and γ = 2.
2) Perform the SVC algorithm and get |clusters|.
3) If |clusters| < 2, increase q and go to 2).
4) If the outlier-detection criterion holds, decrease C, keep q fixed, and go to 2). Otherwise, go to 5).
5) If |SVs| < 50% of the data points, go to 6). Otherwise, decrease C and go to 2).
6) Compute the validity measure index V(m).
7) If |clusters| < √N, increase q and go to 2). Otherwise, stop the SVC.
8) Use the cluster-merging mechanism to identify an ideal |clusters|. Output |clusters|.

19 Experiments - Benchmark and Artificial Examples
Figure: Bensaid Data Set

20 Experiments - Benchmark and Artificial Examples
Figure: Five-Cluster Data Set and Five-Cluster Data Set With Noise

21 Experiments - Benchmark and Artificial Examples
Figure: Five-Cluster Data Set With Noise, after cluster merging

22 Experiments - Benchmark and Artificial Examples
Figure: Crescent Data Set

23 Experiments - IRIS Data Set
Figure label: Misclassification

24 Conclusions
This paper integrated, for SVC:
- Cluster validity measure
- Outlier detection
- Merging mechanism
Automatically determines suitable values for
- Kernel parameter
- Soft-margin constant
Clustering with
- Compact and smooth arbitrary-shaped cluster contours
- Increased robustness to outliers and noise

25 Comments
Advantage
- Provides a cluster validity index for a clustering method
Drawback
- …
Application
- SVC

