Graph-based Text Summarization


1 Graph-based Text Summarization
Lin Ziheng NUS WING Group Meeting

2 Aims
Build a graph that models the development (for writers) and consumption (for readers) of ideas in text through time
Use rhetorical relations to help recognize the important sentences in text

3 Random Walk
Depends on current state
Convergence
Google PageRank: 0 < d < 1, usually d = 0.85
[Figure: example graph with nodes 1 to 5]
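The random walk behind PageRank can be sketched as a power iteration. This is a minimal illustration, assuming the commonly used normalized form PR(v) = (1 - d)/N + d * (sum of PR(u)/outdeg(u) over pages u linking to v). The function name `pagerank`, the treatment of nodes without outlinks, and the edges of the five-node example graph are assumptions; the edges of the slide's figure are not recoverable from the transcript.

```python
def pagerank(out_links, d=0.85, tol=1e-10, max_iter=200):
    """Power iteration for PageRank.

    out_links maps every node to the list of nodes it links to
    (every node must appear as a key, even with an empty list).
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(max_iter):
        # Assumption: rank held by nodes without outlinks is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1.0 - d) / n + d * dangling / n for v in nodes}
        for u in nodes:
            for v in out_links[u]:
                new[v] += d * rank[u] / len(out_links[u])
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:  # the walk has converged
            break
    return rank


# Hypothetical five-node graph, for illustration only.
graph = {1: [2, 3], 2: [3], 3: [1], 4: [3, 5], 5: [4]}
ranks = pagerank(graph)
```

The ranks form a probability distribution over nodes, so they sum to 1; the damping factor d controls how often the walker follows a link rather than jumping to a random node.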

4 Citation Network New papers can cite old papers
Old papers are not updated
[Figure: new paper, old papers]

5 The Internet
A new page must have at least one incoming link and may link to existing pages
Old pages can update their links
[Figure: new page, old pages]

6 Graph-based summarization: LexRank
Nodes = sentences
Edges = cosine similarity
Fully connected
Undirected
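A LexRank-style sentence graph can be sketched as follows. This is a simplified illustration, not the LexRank implementation: it assumes whitespace tokenization and raw term counts (LexRank itself uses an idf-weighted cosine), and the names `cosine`, `similarity_graph`, and the example sentences are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity of two sentences over raw term counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
        math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def similarity_graph(sentences, threshold=0.0):
    """Undirected weighted graph as {(i, j): weight} with i < j.

    With threshold=0.0 every pair is kept (fully connected);
    a positive threshold prunes weak edges.
    """
    weights = {}
    for i, j in combinations(range(len(sentences)), 2):
        w = cosine(sentences[i], sentences[j])
        if w >= threshold:
            weights[(i, j)] = w
    return weights


# Hypothetical sentences, for illustration only.
sentences = [
    "graph methods rank sentences",
    "graph methods rank web pages",
    "the weather is sunny today",
]
full = similarity_graph(sentences)
pruned = similarity_graph(sentences, threshold=0.1)
```

Here `full` keeps all three sentence pairs, while `pruned` keeps only the pair of sentences that share vocabulary.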

7 Graph-based summarization: TextRank
Nodes = sentences
Edges = similarity
Backward links
Directed
[Figure: sentences s1 to s4; new sentence, old sentences]

8 Writing/Reading Process
Assumption
Readers read from the beginning towards the end
Writers write from the beginning towards the end

9 Blog Network

10 Building Graph
Out-degree: proportional to how long the sentence has been in the graph (e.g., 1st sentence: 3, 2nd: 2, 3rd: 1)
In-degree: importance
Edges: cosine similarity, co-occurrence, longest common subsequence, etc.
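One way to read the construction above is sketched below. It is an interpretation of the slide, not the author's code: it assumes that at each timestep one sentence per document enters the graph, and that every sentence already in the graph then adds a fixed number of outlinks to the most similar sentences it is not yet linked to. The names `build_graph` and `outlinks_per_step`, the raw-count cosine, and the example documents are all hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two sentences over raw term counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
        math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def build_graph(docs, outlinks_per_step=1):
    """Return {node: set of targets}; a node is (doc index, sentence index)."""
    nodes, edges = [], {}
    for t in range(max(len(d) for d in docs)):
        # A new sentence from each document enters the graph.
        for di, doc in enumerate(docs):
            if t < len(doc):
                nodes.append((di, t))
                edges[(di, t)] = set()
        # Every sentence in the graph adds outlinks at this timestep.
        for src in nodes:
            for _ in range(outlinks_per_step):
                candidates = [n for n in nodes
                              if n != src and n not in edges[src]]
                if not candidates:
                    break
                # Assumption: link to the best candidate even if similarity is 0.
                best = max(candidates, key=lambda n: cosine(
                    docs[src[0]][src[1]], docs[n[0]][n[1]]))
                edges[src].add(best)
    return edges


def in_degrees(edges):
    """Count incoming links for every node."""
    counts = Counter({node: 0 for node in edges})
    for targets in edges.values():
        counts.update(targets)
    return counts


# Hypothetical cluster of three documents with three sentences each.
docs = [
    ["storm hits the coast", "thousands lose power", "repairs begin monday"],
    ["the storm hits coast towns", "power lines are down", "schools stay closed"],
    ["coast towns brace for storm", "residents lose power", "aid arrives monday"],
]
edges = build_graph(docs)
indeg = in_degrees(edges)
```

Under these assumptions a sentence that entered at the first timestep ends with three outlinks, one from the second timestep with two, and one from the third with one, which matches the example on the slide.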

11 [Figure: doc1, doc2, doc3]

12 Sentence Extraction
In degree
Run PageRank
Unbiased
Biased towards query
d1s1: 2  d2s1: 3  d3s1: 3
d1s2: 1  d2s2: 4  d3s2: 0
d1s3: 4  d2s3: 1  d3s3: 0
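Selecting sentences by in-degree can be sketched as a simple sort. This example assumes the numbers on the slide are the in-degrees of the nine sentences in the example graph; the function name `top_sentences` and the alphabetical tie-break are hypothetical choices.

```python
# Values as listed on the slide, assumed to be in-degrees.
in_degree = {
    "d1s1": 2, "d2s1": 3, "d3s1": 3,
    "d1s2": 1, "d2s2": 4, "d3s2": 0,
    "d1s3": 4, "d2s3": 1, "d3s3": 0,
}


def top_sentences(scores, k):
    """Return the k highest-scoring sentence ids (ties broken by id)."""
    return sorted(scores, key=lambda s: (-scores[s], s))[:k]


summary = top_sentences(in_degree, 2)
```

The same function works with PageRank scores in place of in-degrees, since it only needs a mapping from sentence id to score.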

13 Evaluation 1
Dataset: DUC’04 Task 2
[Table labels: in degree, pagerank, LexRank; t = 1, 0.9, 0.7, 0.5, 0.3, 0.2, 0.1; node start rank = 1, rank = cosine]
[Metrics: ROUGE-1, ROUGE-2, ROUGE-L; average R, P, F for each]
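The recall (R), precision (P), and F scores reported by ROUGE-N are based on n-gram overlap between a system summary and a reference summary. The sketch below is a simplified illustration, not the official ROUGE toolkit: it assumes a single reference, whitespace tokenization, and no stemming or stopword removal. The function name `rouge_n` and the example strings are hypothetical.

```python
from collections import Counter


def ngrams(text, n):
    """Multiset of word n-grams in a text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (recall, precision, f1) from clipped n-gram overlap."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # counts clipped to the reference
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    if recall + precision == 0:
        return recall, precision, 0.0
    return recall, precision, 2 * recall * precision / (recall + precision)


# Hypothetical summaries, for illustration only.
r1 = rouge_n("the cat sat", "the cat sat on the mat", n=1)
r2 = rouge_n("the cat sat", "the cat sat on the mat", n=2)
```

A short candidate that only uses words from the reference gets perfect precision but low recall, which is why the averages for R, P, and F are reported separately.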

14 Evaluation 2
Dataset: DUC’06
[Table columns: Unbiased / Biased; Rearranging doc length (no, yes); # outlinks per sent per timestep (1, 2, 5, 10); ROUGE-2; ROUGE-SU4]

15 Conclusion from Evaluation 2
DUC’06 is query-based, so biased PageRank gives better results
Rearranging doc length is not necessary if there is no extremely long document in the cluster
The number of outlinks is important: different numbers of outlinks give different inlink densities
We need to look at how the dimension of the graph (D * L) is related to the inlink density: F(D, L) => #outlinks
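A query-biased PageRank can be sketched by replacing the uniform random jump with a jump distribution that favours query-relevant sentences. This is a common formulation and an assumption here; the slides do not state how the bias was implemented. The function name `biased_pagerank`, the `bias` parameter, and the three-node example are hypothetical.

```python
def biased_pagerank(out_links, bias=None, d=0.85, iters=100):
    """PageRank whose random jump follows `bias` instead of a uniform choice.

    out_links maps every node to its list of targets. bias maps nodes to
    non-negative weights (e.g. similarity to the query); None means unbiased.
    """
    nodes = list(out_links)
    if bias is None:
        bias = {v: 1.0 for v in nodes}
    total = sum(bias.get(v, 0.0) for v in nodes)
    if total == 0:  # no sentence matches the query: fall back to uniform
        bias, total = {v: 1.0 for v in nodes}, float(len(nodes))
    jump = {v: bias.get(v, 0.0) / total for v in nodes}
    rank = dict(jump)
    for _ in range(iters):
        # Rank held by nodes without outlinks re-enters through the jump.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1.0 - d + d * dangling) * jump[v] for v in nodes}
        for u in nodes:
            for v in out_links[u]:
                new[v] += d * rank[u] / len(out_links[u])
        rank = new
    return rank


# Hypothetical three-sentence cycle; the query matches only s3.
links = {"s1": ["s2"], "s2": ["s3"], "s3": ["s1"]}
unbiased = biased_pagerank(links)
biased = biased_pagerank(links, bias={"s1": 0.0, "s2": 0.0, "s3": 1.0})
```

In this symmetric example the unbiased ranks are equal, while the biased run shifts rank towards the query-relevant sentence `s3`.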

