LexRank: Graph-based Lexical Centrality as Salience in Text Summarization — Presentation transcript

1 LexRank: Graph-based Lexical Centrality as Salience in Text Summarization
Presented by Nick Janus

2 Background
Text summarization
Extractive summarization
Chooses a subset of the original document's sentences
Gives better results
Abstractive summarization
Complicated: requires semantic inference and language generation
Often uses extractive summarization as a pre-processor
Intuitively, what sentences do we want?
This presentation is a mix of both!

3 Problem Statement
Multi-document text summarization
Documents in a cluster mostly share an unknown topic
The cluster of documents is represented by a network
Nodes in the center of the network are more salient to the topic
How are edges defined? How is centrality computed?
Topical clustering of documents is often noisy

4 Degree Centrality
Top-degree nodes are the most important
Use a bag-of-words model with N words; each sentence/node is encoded as an N-dimensional vector
Cosine similarity is used to calculate edge weights (the idf-modified cosine):
\mathrm{idf\text{-}cosine}(x,y) = \frac{\sum_{w \in x,y} \mathrm{tf}_{w,x}\,\mathrm{tf}_{w,y}\,(\mathrm{idf}_w)^2}{\sqrt{\sum_{x_i \in x} (\mathrm{tf}_{x_i,x}\,\mathrm{idf}_{x_i})^2} \times \sqrt{\sum_{y_i \in y} (\mathrm{tf}_{y_i,y}\,\mathrm{idf}_{y_i})^2}}
A threshold is used to eliminate insignificant relationships
The pairwise values form a similarity matrix; thresholding it results in an undirected graph
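A minimal sketch of this construction (my own illustration, not code from the talk); the tokenization, the idf lookup with a default of 1.0, and the threshold value of 0.1 are assumptions for the example:

```python
import math
from collections import Counter

def idf_modified_cosine(x, y, idf):
    """idf-modified cosine between two tokenized sentences x and y."""
    tf_x, tf_y = Counter(x), Counter(y)
    num = sum(tf_x[w] * tf_y[w] * idf.get(w, 1.0) ** 2 for w in tf_x if w in tf_y)
    den_x = math.sqrt(sum((tf_x[w] * idf.get(w, 1.0)) ** 2 for w in tf_x))
    den_y = math.sqrt(sum((tf_y[w] * idf.get(w, 1.0)) ** 2 for w in tf_y))
    return num / (den_x * den_y) if den_x and den_y else 0.0

def degree_centrality(sentences, idf, threshold=0.1):
    """Count, for each sentence, the neighbors whose similarity clears the threshold."""
    degrees = [0] * len(sentences)
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if idf_modified_cosine(sentences[i], sentences[j], idf) > threshold:
                degrees[i] += 1  # undirected graph: the edge counts for both ends
                degrees[j] += 1
    return degrees
```

Sentences are then ranked by degree, so the choice of threshold directly determines which edges survive and which sentences look central.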

5 Degree Centrality Example
The threshold has a considerable impact on graph structure and ranking.

6 LexRank with threshold
So far, all nodes hold equal votes
More important sentences should confer greater centrality p(u) on their neighbors:
p(u) = \sum_{v \in \mathrm{adj}(u)} \frac{p(v)}{\deg(v)}
Normalizing each row of the adjacency matrix by its degree yields a stochastic matrix
To guarantee convergence to a stationary distribution, use a damping factor d, giving us PageRank:
p(u) = \frac{d}{N} + (1 - d) \sum_{v \in \mathrm{adj}(u)} \frac{p(v)}{\deg(v)}
For the random walk to converge, the Markov chain must be irreducible and aperiodic
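A small sketch (assumed, not from the slides) of how the damping factor mixes the degree-normalized adjacency matrix B with a uniform jump matrix U, which makes the chain irreducible and aperiodic; the jump probability d = 0.15 is an illustrative choice:

```python
import numpy as np

def damped_kernel(adjacency, d=0.15):
    """Build the Markov kernel M = dU + (1 - d)B from a binary adjacency matrix."""
    adjacency = adjacency.astype(float).copy()
    n = adjacency.shape[0]
    # Self-loops keep every row nonzero: each sentence is maximally similar
    # to itself, so thresholding always leaves the diagonal set.
    np.fill_diagonal(adjacency, 1.0)
    B = adjacency / adjacency.sum(axis=1, keepdims=True)  # row-stochastic matrix
    U = np.full((n, n), 1.0 / n)                          # uniform jump kernel
    return d * U + (1 - d) * B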

8 LexRank with threshold (cont.)
So how do we calculate LexRank? Algorithm redux:
Build the same cosine similarity matrix as before
Binarize the matrix values with the threshold
Normalize each row by the node's degree
Apply the power method until the score vector converges
The power method returns an eigenvector which contains the scores of all the sentences
In matrix form, the walk follows M = [dU + (1 − d)B], where U is a square matrix with all elements equal to 1/N and B is the degree-normalized adjacency matrix. The transition kernel [dU + (1 − d)B] of the resulting Markov chain is a mixture of the two kernels U and B: a random walker on this chain chooses one of the adjacent states of the current state with probability 1 − d, or jumps to any state in the graph, including the current state, with probability d.
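Continuing the sketch above, a minimal power-method implementation: iterate p ← Mᵀp from the uniform distribution until the change falls below a tolerance; the fixed point is the eigenvector whose entries are the sentence scores. The tolerance and iteration cap are illustrative choices:

```python
import numpy as np

def power_method(M, eps=1e-6, max_iter=1000):
    """Return the stationary distribution of the row-stochastic kernel M."""
    n = M.shape[0]
    p = np.full(n, 1.0 / n)      # start from the uniform distribution
    for _ in range(max_iter):
        p_next = M.T @ p         # one step of the Markov chain
        if np.linalg.norm(p_next - p, 1) < eps:
            return p_next
        p = p_next
    return p
```

The summary is then extracted by taking the highest-scoring sentences.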

9 Continuous LexRank
What's the problem with LexRank?
Discretizes node relationships with a threshold
Throws out information about the relative similarity of sentences
Instead, keep the data and use the cosine similarity values, normalized to form a stochastic matrix:
p(u) = \frac{d}{N} + (1 - d) \sum_{v \in \mathrm{adj}(u)} \frac{\mathrm{cosine}(u,v)}{\sum_{z \in \mathrm{adj}(v)} \mathrm{cosine}(z,v)} \, p(v)
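A sketch of the continuous variant under the same assumptions as before: skip the thresholding, row-normalize the raw cosine matrix, and run the same damped power iteration:

```python
import numpy as np

def continuous_lexrank(similarity, d=0.15, eps=1e-6, max_iter=1000):
    """similarity: NxN cosine matrix, with cosine(u, u) = 1 on the diagonal."""
    n = similarity.shape[0]
    B = similarity / similarity.sum(axis=1, keepdims=True)  # row-stochastic
    M = d / n + (1 - d) * B  # adding d/n to every entry equals dU + (1 - d)B
    p = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        p_next = M.T @ p
        if np.linalg.norm(p_next - p, 1) < eps:
            return p_next
        p = p_next
    return p
```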

10 Performance
Baseline: centroid-based summarizer
Compares sentences with a centroid meta-sentence containing the high-idf-scoring words from the document
Evaluation setting:
Implemented with the MEAD summarization toolkit
Document Understanding Conference (DUC) data sets: model summaries and document clusters
ROUGE metric: measures unigram co-occurrence
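For concreteness, a minimal sketch of the unigram co-occurrence statistic behind ROUGE-1 recall (real ROUGE has many variants and options; this shows only the basic idea named on the slide):

```python
from collections import Counter

def rouge1_recall(system_tokens, reference_tokens):
    """Fraction of the reference summary's unigrams found in the system summary."""
    sys_counts = Counter(system_tokens)
    ref_counts = Counter(reference_tokens)
    overlap = sum(min(count, sys_counts[w]) for w, count in ref_counts.items())
    return overlap / sum(ref_counts.values())
```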

12 DUC Data Sets: results on curated document sets

13 Noisy Data Sets: results on document sets with 17% noise

14 Summing Up
Centrality methods may be more resilient when dealing with noisy document sets
Multi-document case
Difficult evaluation
Better performance
Relation to PageRank

