Information Retrieval Lecture 7 Introduction to Information Retrieval (Manning et al. 2007) Chapter 17 For the MSc Computer Science Programme Dell Zhang.


Information Retrieval Lecture 7 Introduction to Information Retrieval (Manning et al. 2007) Chapter 17 For the MSc Computer Science Programme Dell Zhang Birkbeck, University of London

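Yahoo! Hierarchy
[Figure: tree of Yahoo! directory categories. Top-level categories: agriculture, biology, physics, CS, space, ... Lower-level labels shown: dairy, crops, agronomy, forestry, AI, HCI, craft, missions, botany, evolution, cell, magnetism, relativity, courses, ... (30)]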

Hierarchical Clustering
Build a tree-like hierarchical taxonomy (dendrogram) from a set of unlabeled documents.
• Divisive (top-down)
  Start with all documents belonging to the same cluster. Eventually each document forms a cluster on its own.
  Recursively apply a (flat) partitional clustering algorithm, e.g., k-means with k = 2 (bisecting k-means).
• Agglomerative (bottom-up)
  Start with each document being a single cluster. Eventually all documents belong to the same cluster.
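The divisive strategy can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the seeding rule (first point plus the point farthest from it), the choice to always split the largest cluster, and the stopping criterion (a target number of clusters `k` rather than splitting down to single documents) are all assumptions made for the example, as are the function names.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def two_means(points, iterations=10):
    """Split points into two groups with k-means, k = 2.

    Assumption: seeds are the first point and the point farthest from it.
    """
    seeds = [points[0], max(points, key=lambda p: sq_dist(p, points[0]))]
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for p in points:
            nearest = 0 if sq_dist(p, seeds[0]) <= sq_dist(p, seeds[1]) else 1
            groups[nearest].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split, e.g. all points identical
        seeds = [mean(groups[0]), mean(groups[1])]
    return groups


def bisecting_kmeans(points, k):
    """Top-down clustering: repeatedly bisect the largest cluster (assumed heuristic)."""
    clusters = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=len)
        biggest = clusters.pop()
        left, right = two_means(biggest)
        if not left or not right:
            clusters.append(biggest)  # cannot split further
            break
        clusters += [left, right]
    return clusters


# Made-up 2-D points standing in for document vectors.
demo_points = [[0, 0], [0, 1], [10, 10], [10, 11], [30, 0]]
demo_clusters = bisecting_kmeans(demo_points, 3)
```

Recording each split as a parent-child pair instead of only keeping the leaves would yield the top-down tree described on the slide.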

Dendrogram
A clustering is obtained by cutting the dendrogram at a desired level: each connected component forms a cluster. The number of clusters k is not required in advance.
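Cutting a dendrogram can be illustrated with a small sketch. It assumes a hypothetical merge-list representation, `(left_members, right_members, similarity)`, and assumes merge similarities never increase up the tree (which holds for single-link). The "level" of the cut is a similarity threshold: only merges at or above it are kept, and the connected components that remain are the clusters. The similarity values below are invented for illustration.

```python
def cut_dendrogram(n_docs, merges, min_similarity):
    """Return the clusters left after keeping only merges with
    similarity >= min_similarity.

    merges: list of (left_members, right_members, similarity) tuples,
            a hypothetical format chosen for this example.
    """
    parent = list(range(n_docs))  # union-find forest over document ids

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for left, right, sim in merges:
        if sim >= min_similarity:
            parent[find(left[0])] = find(right[0])

    groups = {}
    for doc in range(n_docs):
        groups.setdefault(find(doc), []).append(doc)
    return sorted(groups.values())


# Invented merge history over five documents (ids 0..4).
demo_merges = [
    ((0,), (1,), 0.99),
    ((3,), (4,), 0.95),
    ((2,), (3, 4), 0.60),
    ((0, 1), (2, 3, 4), 0.20),
]
high_cut = cut_dendrogram(5, demo_merges, 0.9)
low_cut = cut_dendrogram(5, demo_merges, 0.5)
```

Moving the threshold changes the number of clusters without rerunning the clustering, which is why k need not be fixed in advance.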

Dendrogram – Example
Clusters of News Stories: Reuters RCV1

Dendrogram – Example
Clusters of Things that People Want: ZEBO

HAC
Hierarchical Agglomerative Clustering
• Start with each doc in a separate cluster.
• Repeat until there is only one cluster:
  Among the current clusters, determine the pair of clusters, c_i and c_j, that are most similar (Single-Link, Complete-Link, etc.).
  Then merge c_i and c_j into a single cluster.
• The history of merging forms a binary tree or hierarchy.
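The loop above can be written as a naive Python sketch. It is an illustration under stated assumptions rather than the lecture's implementation: documents are plain lists of numbers, cluster similarity is passed in as a function (single-link is used as the default here), ties are broken by taking the first best pair found, and all names are hypothetical. The brute-force pair search is easy to follow but slow on large collections.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def single_link(docs, ci, cj):
    """Similarity of the closest pair of members across the two clusters."""
    return max(cosine(docs[a], docs[b]) for a in ci for b in cj)


def hac(docs, cluster_sim=single_link):
    """Return the merge history as (cluster_a, cluster_b, similarity) tuples."""
    clusters = [(i,) for i in range(len(docs))]  # each doc starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the most similar pair among the current clusters.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = cluster_sim(docs, clusters[a], clusters[b])
                if best is None or s > best[0]:
                    best = (s, a, b)
        s, a, b = best
        merged = tuple(sorted(clusters[a] + clusters[b]))
        merges.append((clusters[a], clusters[b], s))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


# Made-up document vectors: two near the x-axis, two near the y-axis.
demo_docs = [[1, 0], [1, 0.1], [0, 1], [0.3, 1]]
demo_history = hac(demo_docs)
```

Each entry of the returned history corresponds to one internal node of the binary tree, so n documents always produce n - 1 merges.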

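Single-Link
The similarity between a pair of clusters is defined by the single strongest link (i.e., the maximum cosine similarity) between their members:
\[ \mathrm{sim}(c_i, c_j) = \max_{x \in c_i,\; y \in c_j} \mathrm{sim}(x, y) \]
After merging c_i and c_j, the similarity of the resulting cluster to another cluster, c_k, is:
\[ \mathrm{sim}(c_i \cup c_j, c_k) = \max\bigl(\mathrm{sim}(c_i, c_k),\ \mathrm{sim}(c_j, c_k)\bigr) \]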
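The merge rule means cluster similarities can be updated in place after each merge instead of being recomputed from the documents. The sketch below assumes a hypothetical symmetric dict-of-dicts similarity table keyed by cluster id, with the merged cluster keeping the id `i`; the numbers are invented.

```python
def merge_single_link(sim, i, j):
    """Merge cluster j into cluster i, updating the table in place.

    sim: symmetric dict of dicts, sim[a][b] = similarity of clusters a and b
         (a hypothetical layout chosen for this example).
    Single-link rule: sim(i+j, k) = max(sim(i, k), sim(j, k)).
    """
    for k in sim:
        if k not in (i, j):
            best = max(sim[i][k], sim[j][k])
            sim[i][k] = best
            sim[k][i] = best
            del sim[k][j]  # cluster j no longer exists on its own
    del sim[i][j]
    del sim[j]


# Invented similarities between three clusters.
demo_sim = {
    "a": {"b": 0.9, "c": 0.2},
    "b": {"a": 0.9, "c": 0.5},
    "c": {"a": 0.2, "b": 0.5},
}
merge_single_link(demo_sim, "a", "b")
```

Each merge touches one row and one column of the table, so the update costs time linear in the number of remaining clusters.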

HAC – Example

[Figure: dendrogram over documents d1, d2, d3, d4, d5 with internal nodes {d1,d2}, {d4,d5} and {d3,d4,d5}]
As clusters agglomerate, docs are likely to fall into a dendrogram.

HAC – Example Single-Link

Take Home Message
• Single-Link HAC
• Dendrogram