Clustering Agglomerative Clustering (Hierarchical clustering)

Outline
● Algorithm Introduction
● Bonus: Model Preview
● Code Review
● Live Demo (Using InAnalysis)
● Conclusion
● Reference

Algorithm Introduction

Hierarchical clustering
Builds nested clusters by successively merging (agglomerative) or splitting (divisive) them. The hierarchy of clusters is represented as a tree (dendrogram). Root: the single cluster that contains all the samples. Leaves: clusters that contain exactly one sample each.
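The root/leaf structure can be inspected in code. The following is a minimal sketch, assuming SciPy is available; the six sample points are made up for illustration and do not come from the slides.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree

# Six made-up 2-D samples (illustrative data only).
X = np.array([
    [0.0, 0.0], [0.1, 0.2],
    [4.0, 4.0], [4.2, 3.9],
    [9.0, 0.5], [9.1, 0.4],
])

# Agglomerative merge history: each row of Z records one merge,
# so n samples always produce n - 1 rows.
Z = linkage(X, method="average", metric="euclidean")

# Convert the merge history into an explicit tree.
root = to_tree(Z)
leaf_ids = sorted(root.pre_order())  # pre_order() returns the leaf ids

print("number of merges:", Z.shape[0])
print("samples under the root:", root.get_count())
print("leaf ids:", leaf_ids)
```

The root covers every sample, and each leaf corresponds to a single original observation.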

Agglomerative Clustering
Bottom-up: each observation starts in its own cluster, and clusters are successively merged together. Two choices determine which clusters get merged:
● Distance Metric
● Linkage Method
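The bottom-up procedure can be sketched in a few lines of plain Python. This is a naive illustration rather than the scikit-learn implementation; the function names `single_linkage` and `agglomerate` and the sample points are assumptions made for this example, and single linkage is used only because it is the simplest linkage to write down.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def single_linkage(a, b):
    """Cluster distance = distance between the closest pair of points."""
    return min(dist(p, q) for p in a for q in b)


def agglomerate(points, n_clusters, linkage=single_linkage):
    # Bottom-up start: every observation is its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        # Merge cluster j into cluster i (i < j, so index i stays valid).
        clusters[i] = clusters[i] + clusters.pop(j)
    return clusters


# Made-up sample points (illustrative only).
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
result = agglomerate(points, n_clusters=3)
print(result)
```

Swapping in a different `linkage` function changes which pair counts as "closest" without touching the merge loop.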

Distance Metric

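Linkage method
Ward minimizes the sum of squared differences within all clusters. It is a variance-minimizing approach: at each step it merges the pair of clusters whose merge gives the smallest increase in within-cluster variance.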
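The Ward criterion can be checked by hand on a tiny example. The sketch below uses plain Python and made-up points; the helper names are hypothetical. It computes the increase in within-cluster sum of squared errors (SSE) caused by a merge, both directly and with the equivalent closed form |A||B| / (|A| + |B|) * ||mean(A) - mean(B)||^2.

```python
from math import isclose


def centroid(cluster):
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def sse(cluster):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in cluster)


def ward_cost(a, b):
    """Increase in total within-cluster SSE caused by merging a and b."""
    return sse(a + b) - sse(a) - sse(b)


def ward_cost_closed_form(a, b):
    ca, cb = centroid(a), centroid(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * d2


# Made-up clusters (illustrative only).
a = [(0.0, 0.0), (0.0, 2.0)]
b = [(4.0, 0.0), (4.0, 2.0)]
c = [(0.0, 10.0)]

cost_ab = ward_cost(a, b)
cost_ac = ward_cost(a, c)
print(cost_ab, cost_ac)
```

Ward linkage would merge `a` with `b` here, because that merge adds less within-cluster variance than merging `a` with `c`.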

Linkage method

Bonus: Model Preview
seaborn clustermap: a heatmap whose rows and columns are reordered by hierarchical clustering.
● Linkage method: method='average'
● Distance metric: metric='euclidean'
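A minimal sketch of such a preview, assuming seaborn, pandas and matplotlib are installed. The random data and the column names `f1`..`f4` are made up for illustration; only the `method` and `metric` settings come from the slide.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line for interactive use

import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data: 12 samples x 4 features (illustrative only).
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(12, 4)), columns=["f1", "f2", "f3", "f4"])

# Heatmap with row and column dendrograms, using the slide's settings.
grid = sns.clustermap(data, method="average", metric="euclidean")

# Order in which the clustering arranged the rows and columns of the heatmap.
row_order = grid.dendrogram_row.reordered_ind
col_order = grid.dendrogram_col.reordered_ind
print(row_order, col_order)
```

Calling `grid.savefig(...)` with a file name of your choice writes the figure to disk.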

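Code Review
-> Implemented in each algorithm subclass
● The ParamsDefinition class holds the data for a single parameter
○ Attribute: parameter name (name), parameter type (type), parameter range (range), default value (default_value), parameter description (description)
○ Method: get_params_definition() returns the data of all attributes as a dictionary (key-value pairs)
● ParamsDefinitionSet is composed of ParamsDefinition objects
○ Attribute: params_definition_set packs all the parameter data into a list
○ Method: get_params_definition_set() returns a list containing all the parameter data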
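The two classes described above might look roughly like the sketch below. The class, attribute and method names follow the slide; the constructor signatures, the choice to return a list of dictionaries, and the example parameter values are assumptions, since the original source code is not shown.

```python
class ParamsDefinition:
    """Data describing a single parameter."""

    # The argument names `type` and `range` mirror the slide's attribute names.
    def __init__(self, name, type, range, default_value, description):
        self.name = name
        self.type = type
        self.range = range
        self.default_value = default_value
        self.description = description

    def get_params_definition(self):
        # Return every attribute as a key-value pair.
        return {
            "name": self.name,
            "type": self.type,
            "range": self.range,
            "default_value": self.default_value,
            "description": self.description,
        }


class ParamsDefinitionSet:
    """A collection of ParamsDefinition objects."""

    def __init__(self, *definitions):
        self.params_definition_set = list(definitions)

    def get_params_definition_set(self):
        # Assumption: each entry is returned as its dictionary form.
        return [d.get_params_definition() for d in self.params_definition_set]


# Hypothetical usage with example values.
params = ParamsDefinitionSet(
    ParamsDefinition("n_clusters", "int", "2,100", 2, "Number of clusters to find"),
    ParamsDefinition("linkage", "enum", "ward,complete,average", "ward", "Linkage method"),
)
definitions = params.get_params_definition_set()
print(definitions)
```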

Code Review
Inherits the ParamsDefinitionSet class from algo_component.
affinity='euclidean' linkage='ward'
If linkage is "ward", only "euclidean" is accepted.
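The ward/euclidean constraint can be enforced before the model is built. The helper below is a hypothetical sketch, not part of the project or of scikit-learn; note that recent scikit-learn releases name this parameter `metric` instead of `affinity`.

```python
def check_linkage_affinity(linkage="ward", affinity="euclidean"):
    """Validate the parameter pair (hypothetical helper for illustration)."""
    if linkage == "ward" and affinity != "euclidean":
        raise ValueError(
            'linkage="ward" only accepts affinity="euclidean", '
            f"got affinity={affinity!r}"
        )
    return {"linkage": linkage, "affinity": affinity}


ok = check_linkage_affinity("ward", "euclidean")
also_ok = check_linkage_affinity("average", "manhattan")

try:
    check_linkage_affinity("ward", "manhattan")
    rejected = False
except ValueError:
    rejected = True

print(ok, also_ok, rejected)
```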

Code Review
Build the sklearn algorithm object from the parameters, fit the model on the data, then package the trained model and return it.
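That flow could be sketched as follows, assuming scikit-learn and NumPy are installed. The function name `train_model`, the shape of the `params` dictionary and the returned package are assumptions for illustration; the sample data is made up.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def train_model(params, data):
    """Hypothetical helper: build, fit and package the clustering model."""
    # 1. Build the sklearn algorithm object from the parameters.
    model = AgglomerativeClustering(
        n_clusters=params["n_clusters"],
        linkage=params["linkage"],
    )
    # 2. Fit the model on the data.
    model.fit(data)
    # 3. Package the trained model together with its cluster labels.
    return {"model": model, "labels": model.labels_.tolist()}


# Made-up data: two well-separated groups of three points each.
data = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
result = train_model({"n_clusters": 2, "linkage": "ward"}, data)
print(result["labels"])
```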

Code Review

Code Review
Algorithm names and classes are defined in the Algorithm enum class, and the algo_factory() function is used to obtain an algorithm object.
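One way such an enum-plus-factory arrangement can be written is sketched below. Only the names `Algorithm` and `algo_factory()` come from the slide; the placeholder algorithm classes, the member names and the string keys are assumptions for illustration.

```python
from enum import Enum


class KMeansAlgo:
    """Placeholder standing in for a real algorithm class."""


class AgglomerativeClusteringAlgo:
    """Placeholder standing in for a real algorithm class."""


class Algorithm(Enum):
    # Each member pairs an algorithm name with the class that implements it.
    KMEANS = ("k-means", KMeansAlgo)
    AGGLOMERATIVE = ("agglomerative-clustering", AgglomerativeClusteringAlgo)

    def __init__(self, algo_name, algo_class):
        self.algo_name = algo_name
        self.algo_class = algo_class


def algo_factory(algo_name):
    """Look up the algorithm by name and return a new instance of its class."""
    for algo in Algorithm:
        if algo.algo_name == algo_name:
            return algo.algo_class()
    raise ValueError(f"Unknown algorithm: {algo_name!r}")


instance = algo_factory("agglomerative-clustering")
print(type(instance).__name__)
```

Registering a new algorithm then only requires adding one enum member; callers keep using `algo_factory()` unchanged.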

Code Review

Live Demo (Using InAnalysis)
Wholesale customers Data Set (Wholesale_customers_nolabel_data.csv)
Attributes:
1) FRESH: annual spending (m.u.) on fresh products (Continuous)
2) MILK: annual spending (m.u.) on milk products (Continuous)
3) GROCERY: annual spending (m.u.) on grocery products (Continuous)
4) FROZEN: annual spending (m.u.) on frozen products (Continuous)
5) DETERGENTS_PAPER: annual spending (m.u.) on detergents and paper products (Continuous)
6) DELICATESSEN: annual spending (m.u.) on delicatessen products (Continuous)
7) CHANNEL: customers' channel, Horeca (Hotel/Restaurant/Café) or Retail (Nominal)

K-means vs. Agglomerative Clustering
5 Features: Milk, Grocery, Detergents Paper, Delicassen, Channel
n_clusters = 3

K-means vs. Agglomerative Clustering
3 Features: Milk, Grocery, Detergents Paper
n_clusters = 4
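A comparison of this kind can be reproduced in outline with scikit-learn. The sketch below uses synthetic three-feature blobs as a stand-in, because the demo CSV is not included here; the blob centers, sample count and random seeds are assumptions, and only n_clusters = 4 mirrors the slide.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in data: 3 features, 4 well-separated groups.
centers = np.array([
    [0.0, 0.0, 0.0],
    [10.0, 10.0, 10.0],
    [-10.0, 10.0, 0.0],
    [10.0, -10.0, 5.0],
])
X, _ = make_blobs(n_samples=200, centers=centers, cluster_std=1.0, random_state=0)

n_clusters = 4
kmeans_labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
agglo_labels = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward").fit_predict(X)

# Adjusted Rand index: 1.0 means both algorithms produced the same partition
# (cluster ids may differ), values near 0 mean agreement no better than chance.
agreement = adjusted_rand_score(kmeans_labels, agglo_labels)
print("agreement between K-means and agglomerative clustering:", agreement)
```

On real data such as the wholesale customers set, the spending features have very different scales, so standardizing them before clustering is usually worth trying.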

Conclusion
● Scikit-learn
● Unit test
● Visualization
● Object-oriented programming
● Real data set

Reference
http://seaborn.pydata.org/generated/seaborn.clustermap.html
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html#sklearn.cluster.AgglomerativeClustering

Thank you
