CSCI 5417 Information Retrieval Systems Jim Martin Lecture 16 10/18/2011

Today
- Review clustering: K-means
- Review naïve Bayes
- Unsupervised classification: EM
- Naïve Bayes/EM for text classification
- Topic models: intuition

K-Means. Assumes documents are real-valued vectors. Clusters are based on the centroids (aka the center of gravity, or mean) of the points in a cluster c (see the formula below). Instances are iteratively reassigned to clusters based on their distance to the current cluster centroids. (Or one can equivalently phrase it in terms of similarities.)
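The centroid formula on this slide did not survive the transcript; the standard definition it refers to is (a reconstruction, not copied from the slide):

$$\vec{\mu}(c) = \frac{1}{|c|} \sum_{\vec{x} \in c} \vec{x}$$

i.e., the component-wise mean of the vectors currently assigned to cluster c.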

K-Means Algorithm
Select K random docs {s1, s2, ..., sK} as seeds.
Until the stopping criterion is met:
  For each doc di: assign di to the cluster cj such that dist(di, sj) is minimal.
  For each cluster cj: sj = m(cj), the centroid of cj.
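A minimal Python sketch of this loop, assuming the documents have already been converted to dense vectors (e.g., tf-idf rows of a NumPy array); the function and variable names are illustrative, not from the course code:

```python
import numpy as np

def kmeans(docs, k, max_iters=100, seed=0):
    """Cluster the row vectors in `docs` (n_docs x n_dims) into k clusters."""
    rng = np.random.default_rng(seed)
    # Select K random docs as seeds (initial centroids).
    centroids = docs[rng.choice(len(docs), size=k, replace=False)].astype(float)
    assignments = None
    for _ in range(max_iters):
        # Assignment step: each doc goes to the nearest current centroid (Euclidean distance).
        dists = np.linalg.norm(docs[:, None, :] - centroids[None, :, :], axis=2)
        new_assignments = dists.argmin(axis=1)
        if assignments is not None and np.array_equal(new_assignments, assignments):
            break  # doc partition unchanged -> stop
        assignments = new_assignments
        # Update step: recompute each centroid as the mean of its assigned docs.
        for j in range(k):
            members = docs[assignments == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return assignments, centroids

# Example usage on random data:
labels, cents = kmeans(np.random.rand(100, 5), k=2)
```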

K-Means Example (K=2): pick seeds, assign clusters, compute centroids, reassign clusters, compute centroids, reassign clusters, converged. (The slide shows this as an animated 2-D plot; only the step labels survive in the transcript.)

Termination conditions. Several possibilities: a fixed number of iterations; the doc partition is unchanged; the centroid positions don't change.

Convergence. Why should the K-means algorithm ever reach a fixed point, i.e., a state in which the clusters don't change? K-means is a special case of a general procedure known as the Expectation Maximization (EM) algorithm, and EM is known to converge. The number of iterations could be large, but in practice it usually isn't. (Sec. 16.4)

Naïve Bayes: Learning
From the training corpus, extract the Vocabulary.
Calculate the required P(cj) and P(xk | cj) terms:
For each cj in C do
  docsj ← the subset of documents whose target class is cj
  P(cj) ← |docsj| / |total # of documents|
  Textj ← a single document containing all of docsj
  For each word xk in Vocabulary:
    nk ← number of occurrences of xk in Textj
    P(xk | cj) ← (nk + 1) / (n + |Vocabulary|), where n is the total number of word occurrences in Textj (add-one smoothing, as in the example below)

Multinomial Model
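The formula on this slide was an image and is not in the transcript. A standard statement of the multinomial naïve Bayes model it refers to (a reconstruction, not the slide's exact notation) is:

$$P(c \mid d) \;\propto\; P(c) \prod_{k=1}^{n_d} P(t_k \mid c)$$

where t_1, ..., t_{n_d} are the word tokens of document d.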

Naïve Bayes: Classifying
positions ← all word positions in the current document that contain tokens found in Vocabulary.
Return cNB, where:
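The argmax expression referenced by "Return cNB, where:" was an image on the slide; the standard naïve Bayes decision rule, consistent with the worked examples that follow, is:

$$c_{NB} = \operatorname*{argmax}_{c_j \in C} \; P(c_j) \prod_{i \in \text{positions}} P(x_i \mid c_j)$$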

Apply Multinomial

Naïve Bayes Example

Doc   Words                       Category
D1    {China, soccer}             Sports
D2    {Japan, baseball}           Sports
D3    {baseball, trade}           Sports
D4    {China, trade}              Politics
D5    {Japan, Japan, exports}     Politics

Word       Sports (.6)   Politics (.4)
baseball   3/12          1/11
China      2/12          2/11
exports    1/12          2/11
Japan      2/12          3/11
soccer     2/12          1/11
trade      2/12          2/11

Using add-one (+1) smoothing; |V| = 6; token count |Sports| = 6; |Politics| = 5.
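A short Python sketch that reproduces these tables and the worked examples on the next slides, assuming multinomial counts with add-one smoothing as described above (the names and layout are illustrative, not the course's reference code):

```python
from collections import Counter
from fractions import Fraction

docs = [
    (["China", "soccer"], "Sports"),
    (["Japan", "baseball"], "Sports"),
    (["baseball", "trade"], "Sports"),
    (["China", "trade"], "Politics"),
    (["Japan", "Japan", "exports"], "Politics"),
]

vocab = sorted({w for words, _ in docs for w in words})
classes = sorted({c for _, c in docs})

# Priors: fraction of training documents in each class.
priors = {c: Fraction(sum(1 for _, cl in docs if cl == c), len(docs)) for c in classes}

# Conditional probabilities with add-one smoothing: (n_k + 1) / (n + |V|).
cond = {}
for c in classes:
    counts = Counter(w for words, cl in docs if cl == c for w in words)
    n = sum(counts.values())
    cond[c] = {w: Fraction(counts[w] + 1, n + len(vocab)) for w in vocab}

print(priors)                       # Politics: 2/5, Sports: 3/5
print(cond["Sports"]["baseball"])   # 3/12, i.e. 1/4
print(cond["Politics"]["Japan"])    # 3/11

def classify(words):
    # Score each class as P(c) * prod P(w|c), as in the slides' worked examples.
    scores = {}
    for c in classes:
        score = priors[c]
        for w in words:
            if w in vocab:          # tokens not in Vocabulary are ignored
                score *= cond[c][w]
        scores[c] = score
    return max(scores, key=scores.get), scores

print(classify(["Japan", "soccer"]))  # Sports wins: 3/5 * 2/12 * 2/12 vs 2/5 * 3/11 * 1/11
print(classify(["China", "trade"]))   # Sports wins, although D4 = {China, trade} is labeled Politics
```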

Naïve Bayes Example. Classifying "soccer" (as a one-word doc): P(soccer | Sports) = .167, P(soccer | Politics) = .09. Sports > Politics.

Example 2. How about "Japan soccer"?
Sports: P(Japan|Sports) · P(soccer|Sports) · P(Sports) = .166 · .166 · .6 = .0166
Politics: P(Japan|Politics) · P(soccer|Politics) · P(Politics) = .27 · .09 · .4 ≈ .0097
Sports > Politics

Break. No class Thursday; work on the HW. No office hours either. HW questions? The format of the test docs will be the same as the current docs, minus the .M field, which will be removed. How should you organize your development efforts?

Example 3. What about "China trade"?
Sports: .166 · .166 · .6 = .0166
Politics: .1818 · .1818 · .4 = .0132
Again, Sports > Politics. (Probabilities from the tables above: P(China|Sports) = P(trade|Sports) = 2/12; P(China|Politics) = P(trade|Politics) = 2/11.)

Problem? Naïve Bayes doesn't remember the training data; it just extracts statistics from it. There's no guarantee that those numbers will generate correct answers for all members of the training set: in Example 3, the model classifies "China trade" as Sports even though the training document D4 = {China, trade} is labeled Politics.

What if? What if we just have the documents but no class assignments? Assume, though, that we do know the number of classes involved. Can we still use probabilistic models? In particular, can we use naïve Bayes? Yes, via EM (Expectation Maximization).

EM
1. Given some model, like NB, make up some class assignments randomly.
2. Use those assignments to generate the model parameters P(class) and P(word|class).
3. Use those model parameters to re-classify the training data.
4. Go to 2.
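A sketch of this loop in Python, in the same style as the hypothetical training code above. This is the "hard" (0/1 assignment) variant of EM described here; the names are illustrative, and the prior is lightly smoothed as a defensive choice (a small deviation from the unsmoothed prior on the learning slide):

```python
import random
from collections import Counter
from fractions import Fraction

def hard_em_nb(docs, classes, iterations=10, seed=0):
    """docs: list of token lists; classes: class labels (the number of classes is given)."""
    rng = random.Random(seed)
    vocab = sorted({w for words in docs for w in words})
    # Step 1: random initial class assignments.
    labels = [rng.choice(classes) for _ in docs]
    for _ in range(iterations):
        # Step 2: estimate P(class) and P(word|class) from the current assignments
        # (add-one smoothing on word counts; prior smoothed to avoid empty-class zeros).
        priors, cond = {}, {}
        for c in classes:
            member_docs = [d for d, l in zip(docs, labels) if l == c]
            priors[c] = Fraction(len(member_docs) + 1, len(docs) + len(classes))
            counts = Counter(w for d in member_docs for w in d)
            n = sum(counts.values())
            cond[c] = {w: Fraction(counts[w] + 1, n + len(vocab)) for w in vocab}
        # Step 3: re-classify the training data with the new parameters.
        new_labels = []
        for d in docs:
            scores = {}
            for c in classes:
                score = priors[c]
                for w in d:
                    score *= cond[c][w]
                scores[c] = score
            new_labels.append(max(scores, key=scores.get))
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels  # Step 4: go back to step 2
    return labels

docs = [["China", "soccer"], ["Japan", "baseball"], ["baseball", "trade"],
        ["China", "trade"], ["Japan", "Japan", "exports"]]
print(hard_em_nb(docs, ["ClassA", "ClassB"]))
```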

Naïve Bayes Example (EM)

Doc   Category
D1    ?
D2    ?
D3    ?
D4    ?
D5    ?

Naïve Bayes Example (EM)

Doc   Words                       Category
D1    {China, soccer}             Sports
D2    {Japan, baseball}           Politics
D3    {baseball, trade}           Sports
D4    {China, trade}              Politics
D5    {Japan, Japan, exports}     Sports

Word       Sports (.6)   Politics (.4)
baseball   2/13          2/10
China      2/13          2/10
exports    2/13          1/10
Japan      3/13          2/10
soccer     2/13          1/10
trade      2/13          2/10

Naïve Bayes Example (EM) Use these counts to reassess the class membership for D1 to D5. Reassign them to new classes. Recompute the tables and priors. Repeat until happy

Topics

Doc                          Category
{China, soccer}              Sports
{Japan, baseball}            Sports
{baseball, trade}            Sports
{China, trade}               Politics
{Japan, Japan, exports}      Politics

What's the deal with "trade"?

Topics

Doc                                 Category
{China_1, soccer_2}                 Sports
{Japan_1, baseball_2}               Sports
{baseball_2, trade_2}               Sports
{China_1, trade_1}                  Politics
{Japan_1, Japan_1, exports_1}       Politics

{basketball_2, strike_3}

Topics. So let's propose that instead of assigning documents to classes, we assign each word token in each document to a class (topic). Then we can define some new probabilities to associate with words, topics and documents: the distribution of topics in a doc; the distribution of topics overall; the association of words with topics.

Topics Example. A document like {basketball_2, strike_3} can be said to be .5 about topic 2, .5 about topic 3, and 0 about the rest of the possible topics (we may want to worry about smoothing later). For the collection as a whole, we can get a topic distribution (prior) by summing the words tagged with a particular topic and dividing by the number of tagged tokens.
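Written out explicitly (a reconstruction of the quantities described on the two slides above; these formulas do not appear on the slides themselves):

$$\hat{P}(t \mid d) = \frac{\#\{\text{tokens in } d \text{ tagged with topic } t\}}{\#\{\text{tokens in } d\}}, \qquad
\hat{P}(t) = \frac{\#\{\text{tokens in the collection tagged with } t\}}{\#\{\text{tagged tokens in the collection}\}}, \qquad
\hat{P}(w \mid t) = \frac{\#\{\text{tokens of word } w \text{ tagged with } t\}}{\#\{\text{tokens tagged with } t\}}$$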

Problem. With "normal" text classification, the training data associates a document with one or more topics. Now we need to associate topics with the (content) words in each document. This is a semantic tagging task, not unlike part-of-speech tagging and word-sense tagging. It's hard, slow and expensive to do right.

Topic modeling. Do it without the human tagging: given a set of documents and a fixed number of topics, find the statistics that we need.