Published by Gerald Connolly. Modified over 2 years ago.

1
Information Retrieval, Lecture 7
Introduction to Information Retrieval (Manning et al. 2007), Chapter 17
For the MSc Computer Science Programme
Dell Zhang, Birkbeck, University of London

2
Yahoo! Hierarchy (http://dir.yahoo.com/science)
[Figure: topic tree. Top-level categories: agriculture, biology, physics, CS, space, ... Subcategories shown: dairy, crops, agronomy, forestry, AI, HCI, craft, missions, botany, evolution, cell, magnetism, relativity, courses, … (30) ...]

3
Hierarchical Clustering
Build a tree-like hierarchical taxonomy (dendrogram) from a set of unlabeled documents.
Divisive (top-down): Start with all documents belonging to the same cluster. Eventually each node forms a cluster on its own. Proceeds by recursive application of a (flat) partitional clustering algorithm, e.g., k-means with k = 2 (bisecting k-means).
Agglomerative (bottom-up): Start with each document being a single cluster. Eventually all documents belong to the same cluster.
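The divisive (top-down) strategy can be illustrated with a minimal sketch of bisecting k-means. Several things here are assumptions made for illustration and are not in the slides: documents are represented as small numeric tuples, Euclidean distance is used instead of cosine similarity, the seeding rule is deterministic (first point plus the point farthest from it), and the function names `two_means` and `bisect_tree` are hypothetical.

```python
import math

def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))

def two_means(points, iters=20):
    """Split points into two groups with k-means (k = 2)."""
    # Assumption: seed with the first point and the point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: math.dist(p, c0))
    left, right = list(points), []
    for _ in range(iters):
        left = [p for p in points if math.dist(p, c0) <= math.dist(p, c1)]
        right = [p for p in points if math.dist(p, c0) > math.dist(p, c1)]
        if not left or not right:
            break
        c0, c1 = mean(left), mean(right)
    return left, right

def bisect_tree(points):
    """Top-down clustering: split recursively until each leaf is one point."""
    if len(points) == 1:
        return points[0]
    left, right = two_means(points)
    if not left or not right:
        # Points that cannot be separated (e.g. duplicates) stay together.
        return tuple(points)
    return (bisect_tree(left), bisect_tree(right))

# Hypothetical 2-D "documents": two well-separated groups of two points.
docs = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
tree = bisect_tree(docs)
```

The nested tuples in `tree` play the role of the dendrogram: the first split separates the two groups, and each group is then split into single points.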

4
Dendrogram
A clustering is obtained by cutting the dendrogram at a desired level: each connected component forms a cluster. The number of clusters k is not required in advance.
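Cutting a dendrogram can be sketched as keeping only the merges at or above a chosen similarity level and reading off the connected components. The merge-history format below (pairs of document indices plus a similarity), the function name `cut_dendrogram`, and the similarity values are all hypothetical and chosen only for illustration. The sketch assumes merge similarities never increase as merging proceeds, which holds for single-link.

```python
def cut_dendrogram(n_docs, merges, min_sim):
    """Return the clusters obtained by cutting at similarity level min_sim.

    merges: list of (doc_a, doc_b, sim), where doc_a and doc_b are any
    members of the two clusters that were merged at similarity sim.
    """
    parent = list(range(n_docs))  # union-find forest over documents

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, sim in merges:
        if sim >= min_sim:  # merges below the cut level are ignored
            parent[find(a)] = find(b)

    clusters = {}
    for d in range(n_docs):
        clusters.setdefault(find(d), []).append(d)
    return sorted(clusters.values())

# Hypothetical history for five docs (indices 0..4); similarities are made up.
merges = [(0, 1, 0.9), (3, 4, 0.8), (2, 3, 0.6), (0, 2, 0.3)]
high_cut = cut_dendrogram(5, merges, 0.7)
low_cut = cut_dendrogram(5, merges, 0.5)
```

Different cut levels give different numbers of clusters from the same tree, which is why k does not have to be fixed beforehand.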

5
Dendrogram – Example: Clusters of News Stories (Reuters RCV1)

6
Dendrogram – Example: Clusters of Things that People Want (ZEBO)

7
HAC: Hierarchical Agglomerative Clustering
Start with each doc in a separate cluster.
Repeat until there is only one cluster: among the current clusters, determine the pair of clusters, c_i and c_j, that are most similar (single-link, complete-link, etc.), then merge c_i and c_j into a single cluster.
The history of merging forms a binary tree or hierarchy.
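The HAC loop can be written down as a naive sketch over a precomputed doc-doc similarity matrix. The function name `hac`, the `linkage` parameter, and the similarity values are assumptions made for illustration; the code recomputes cluster similarities from scratch on every pass, so it favors readability over efficiency.

```python
from itertools import combinations

def hac(sim, linkage=max):
    """Naive HAC over a symmetric doc-doc similarity matrix.

    linkage=max scores a cluster pair by its strongest link (single-link);
    linkage=min scores it by its weakest link (complete-link).
    Returns the merge history as (cluster_i, cluster_j, similarity) tuples.
    """
    clusters = [(i,) for i in range(len(sim))]  # each doc starts alone
    history = []

    def cluster_sim(pair):
        ci, cj = pair
        return linkage(sim[x][y] for x in ci for y in cj)

    while len(clusters) > 1:
        # Pick the most similar pair among the current clusters.
        ci, cj = max(combinations(clusters, 2), key=cluster_sim)
        history.append((ci, cj, cluster_sim((ci, cj))))
        # Replace the pair with the merged cluster.
        clusters = [c for c in clusters if c not in (ci, cj)]
        clusters.append(tuple(sorted(ci + cj)))
    return history

# Hypothetical similarities for five docs (indices 0..4).
sim = [
    [1.0, 0.9, 0.3, 0.1, 0.1],
    [0.9, 1.0, 0.2, 0.1, 0.1],
    [0.3, 0.2, 1.0, 0.6, 0.5],
    [0.1, 0.1, 0.6, 1.0, 0.8],
    [0.1, 0.1, 0.5, 0.8, 1.0],
]
history = hac(sim)
```

Each entry of `history` is one internal node of the binary merge tree, listed from the first (most similar) merge to the last.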

8
Single-Link
The similarity between a pair of clusters is defined by the single strongest link (i.e., maximum cosine similarity) between their members:
sim(c_i, c_j) = max_{x ∈ c_i, y ∈ c_j} sim(x, y)
After merging c_i and c_j, the similarity of the resulting cluster to another cluster, c_k, is:
sim(c_i ∪ c_j, c_k) = max(sim(c_i, c_k), sim(c_j, c_k))
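A small sketch can check that the single-link update after a merge agrees with recomputing the strongest link directly. The helper names `single_link` and `merged_sim` and the similarity values are hypothetical; clusters are assumed to be tuples of document indices into a symmetric similarity matrix.

```python
def single_link(sim, ci, cj):
    """Strongest link between any member of ci and any member of cj."""
    return max(sim[x][y] for x in ci for y in cj)

def merged_sim(sim_ik, sim_jk):
    """Single-link similarity of the merged cluster (ci + cj) to ck,
    computed from the two similarities known before the merge."""
    return max(sim_ik, sim_jk)

# Hypothetical similarities for five docs (indices 0..4).
sim = [
    [1.0, 0.9, 0.3, 0.1, 0.1],
    [0.9, 1.0, 0.2, 0.1, 0.1],
    [0.3, 0.2, 1.0, 0.6, 0.5],
    [0.1, 0.1, 0.6, 1.0, 0.8],
    [0.1, 0.1, 0.5, 0.8, 1.0],
]

ci, cj, ck = (3,), (4,), (2,)
direct = single_link(sim, ci + cj, ck)            # recompute from members
updated = merged_sim(single_link(sim, ci, ck),    # reuse earlier values
                     single_link(sim, cj, ck))
```

Because the update needs only the two earlier values, an implementation can maintain a cluster-cluster similarity table instead of revisiting every pair of documents after each merge.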

9
HAC – Example

11
[Figure: documents d1, d2, d3, d4, d5 merged step by step into {d1, d2}, {d4, d5}, and then {d3, d4, d5}.]
As clusters agglomerate, docs are likely to fall into a dendrogram.

12
HAC – Example: Single-Link

13
Take Home Message: Single-Link HAC; Dendrogram