HCC class lecture 14 comments John Canny 3/9/05

Administrivia

Clustering: LSA again The input is a matrix M:
– Rows represent text blocks (sentences, paragraphs, or documents)
– Columns are distinct terms
– Matrix elements are term counts (× tf-idf weight)
The idea is to "factor" this matrix as M = A D B:
[Diagram: M (text blocks × terms) = A (text blocks × themes) · D · B (themes × terms)]

LSA again A encodes the representation of each text block in a space of themes. B encodes each theme with term weights. It can be used to explicitly describe the theme.
[Diagram: M (text blocks × terms) = A (text blocks × themes) · D · B (themes × terms)]
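To make the factorization concrete, here is a minimal sketch (not from the lecture) of LSA on a toy term-count matrix using numpy's SVD; the example counts and the rank-2 truncation are illustrative assumptions.

```python
import numpy as np

# Toy text-block x term count matrix M (rows = text blocks, columns = terms).
# The specific counts are made up for illustration.
M = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
], dtype=float)

# SVD gives M = A D B with A, B orthogonal and D diagonal.
A, d, B = np.linalg.svd(M, full_matrices=False)

# Keep the k strongest "themes" (largest singular values).
k = 2
A_k, d_k, B_k = A[:, :k], d[:k], B[:k, :]

# Each row of A_k is a text block expressed in theme space;
# each row of B_k weights the terms that make up a theme.
print("text blocks in theme space:\n", A_k * d_k)
print("theme term weights:\n", B_k)

# The rank-k product approximates the original counts.
print("approximation error:", np.linalg.norm(M - A_k @ np.diag(d_k) @ B_k))
```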

LSA limitations LSA has a few assumptions that don't make much sense:
– If documents really do comprise different "themes" there shouldn't be negative weights in the LSA matrices.
– LSA implicitly models Gaussian random processes for theme and word generation. Actual document statistics are far from Gaussian.
– SVD forces themes to be orthogonal in the A and B matrices. Why should they be?

Non-negative Matrix Factorization NMF deals with non-negativity and orthogonality (the first and third problems above), but still uses Gaussian statistics:
– If documents really do comprise different "themes" there shouldn't be negative weights in the LSA matrices.
– LSA implicitly models Gaussian random processes for theme and word generation. Actual document statistics are far from Gaussian.
– SVD forces themes to be orthogonal in the A and B matrices. Why should they be?

LSA again The consequences are:
– LSA themes are not meaningful beyond the first few (the ones with the strongest singular values).
– LSA is largely insensitive to the choice of semantic space (most 300-dimensional spaces will do).

NMF The corresponding properties:
– NMF components track themes well (up to 30 or more).
– The NMF components can be used directly as topic markers, so the choice is important.

NMF NMF is an umbrella term for several algorithms. The one in this paper uses least squares to match the original term matrix, i.e. it minimizes the sum of squared entries Σ_ij (M − AB)_ij². Another natural metric is the KL or Kullback-Leibler divergence. The KL-divergence between two probability distributions p and q is Σ_i p_i log(p_i / q_i). Another natural version of NMF uses the KL-divergence between M and its approximation A B.
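As a concrete illustration, here is a minimal sketch of least-squares NMF using the standard multiplicative update rules; this is not necessarily the exact algorithm from the paper, and the toy matrix, random initialization, and iteration count are arbitrary choices. As noted below, these updates only find a locally optimal factorization.

```python
import numpy as np

def nmf_least_squares(M, k, iters=200, eps=1e-9):
    """Factor a non-negative matrix M (text blocks x terms) as M ~ A @ B
    with A, B >= 0 by minimizing sum((M - A @ B)**2) using the standard
    multiplicative update rules. Converges only to a local optimum."""
    rng = np.random.default_rng(0)
    n, m = M.shape
    A = rng.random((n, k))
    B = rng.random((k, m))
    for _ in range(iters):
        # Update B, then A; eps avoids division by zero.
        B *= (A.T @ M) / (A.T @ A @ B + eps)
        A *= (M @ B.T) / (A @ B @ B.T + eps)
    return A, B

# Toy term-count matrix with two obvious "themes" (made-up data).
M = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
], dtype=float)

A, B = nmf_least_squares(M, k=2)
print("squared error:", np.sum((M - A @ B) ** 2))
print("theme term weights (rows of B):\n", B)
```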

NMF KL-divergence is usually a more accurate way to compare probability distributions. However, in clustering applications, the quality of fit to the probability distribution is secondary to the quality of the clusters. KL-divergence NMF performs well for smoothing (extrapolation) tasks, but not as well as least-squares for clustering. The reasons are not entirely clear, but it may simply be an artifact of the basic NMF recurrences, which find only locally-optimal matches.

A Simpler Text Summarizer A simpler text summarizer based on inter-sentence analysis did as well as any of the custom systems on the DUC-2002 dataset (Document Understanding Conference). This algorithm, called "TextRank," was based on an analysis of the similarity graph between sentences in the text.

A Simpler Text Summarizer Vertices in the graph represent sentences; edge weights are the similarity between sentences.
[Diagram: a weighted graph on sentences S1 … S7]

TextRank TextRank computes vertex strength using a variant of Google's PageRank. It gives the probability of being at a vertex during a long random walk on the graph.
[Diagram: the same weighted graph on sentences S1 … S7]
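A minimal sketch of this idea (a simplified version, not the authors' exact implementation): build a sentence-similarity matrix, normalize it into random-walk transition probabilities, and iterate the damped PageRank update. The toy similarity weights and the damping factor of 0.85 are illustrative assumptions.

```python
import numpy as np

# Toy symmetric sentence-similarity matrix (made-up weights for 4 sentences);
# in practice these come from word overlap or cosine similarity.
S = np.array([
    [0.0, 0.5, 0.1, 0.0],
    [0.5, 0.0, 0.4, 0.2],
    [0.1, 0.4, 0.0, 0.6],
    [0.0, 0.2, 0.6, 0.0],
])

n = S.shape[0]
d = 0.85                                 # usual PageRank damping factor
P = S / S.sum(axis=1, keepdims=True)     # row-normalize into transition probabilities

# Power iteration: r approximates the stationary probability of a long
# random walk on the similarity graph (with occasional random jumps).
r = np.full(n, 1.0 / n)
for _ in range(100):
    r = (1 - d) / n + d * (P.T @ r)

# The highest-ranked sentences form the summary.
print("sentence scores:", r)
print("ranking:", np.argsort(-r))
```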

TextRank The highest-ranked vertices comprise the summary. TextRank achieved the same summary performance as the best single-sentence summarizers at DUC. (TextRank appeared in ACL 2004.)

Discussion Topics T1: The best text analysis algorithms for a variety of tasks seem to use numerical representations (BOW or graphical models) of texts. Discuss what information these representations capture and why they might be effective.