A Survey on Automatic Text Summarization Dipanjan Das André F. T. Martins Tolga Çekiç 24.3.2014 1.

1 A Survey on Automatic Text Summarization Dipanjan Das André F. T. Martins Tolga Çekiç 24.3.2014

2 INTRODUCTION, SINGLE-DOCUMENT SUMMARIZATION, MULTI-DOCUMENT SUMMARIZATION, OTHER APPROACHES, EVALUATION, CONCLUSION

3 INTRODUCTION Summary: “a text that is produced from one or more texts, that conveys important information in the original text(s), and that is no longer than half of the original text(s) and usually significantly less than that”. Main techniques: extraction, abstraction, fusion, compression.

4 Single-Document Summarization Some parts of a text are more important than others. Most work relies on extracting sentences. The earliest work (Luhn, 1958) computes a significance factor for each sentence based on word frequency; (Baxendale, 1958) focuses on sentence position.
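To make the word-frequency idea concrete, here is a minimal sketch in the spirit of Luhn's significance factor. It is a simplification, not Luhn's exact formula: each sentence is scored by the average document frequency of its content words. The stopword list and the function names are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "on"}

def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def score_sentences(sentences):
    """Score each sentence by the average frequency of its content words."""
    freq = Counter(
        w for s in sentences for w in tokenize(s) if w not in STOPWORDS
    )
    scores = []
    for s in sentences:
        words = [w for w in tokenize(s) if w not in STOPWORDS]
        scores.append(sum(freq[w] for w in words) / len(words) if words else 0.0)
    return scores

def top_sentence(sentences):
    """Return the highest-scoring sentence (a one-sentence extract)."""
    scores = score_sentences(sentences)
    best = max(range(len(sentences)), key=scores.__getitem__)
    return sentences[best]
```

Averaging rather than summing keeps long sentences from winning on length alone; a real system would also need proper sentence splitting and stemming.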

5 Single-Document Summarization Machine learning methods came into use in the 1990s. Naive Bayes classifiers were applied with features such as sentence length, tf, and idf. NLP tools were also used, for example to handle synonyms.
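The Naive Bayes idea can be sketched as a binary classifier that decides whether a sentence belongs in the summary. This is an illustrative sketch only: the binary feature encoding (for example "sentence is long", "sentence contains a high-tf word") and all function names are assumptions, not the setup of any particular paper.

```python
import math

def train_naive_bayes(X, y):
    """Fit a Bernoulli Naive Bayes model.

    X: list of equal-length tuples of 0/1 features, one tuple per sentence.
    y: list of labels, 1 if the sentence is in the summary, else 0.
    """
    n_features = len(X[0])
    model = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        # Add-one smoothing on both the prior and the feature likelihoods.
        prior = (len(rows) + 1) / (len(X) + 2)
        probs = [
            (sum(r[j] for r in rows) + 1) / (len(rows) + 2)
            for j in range(n_features)
        ]
        model[label] = (prior, probs)
    return model

def log_posterior(model, x, label):
    """Unnormalized log P(label | x), assuming independent features."""
    prior, probs = model[label]
    lp = math.log(prior)
    for xj, p in zip(x, probs):
        lp += math.log(p if xj else 1.0 - p)
    return lp

def predict(model, x):
    """Return 1 if the sentence is more likely to be a summary sentence."""
    return 1 if log_posterior(model, x, 1) > log_posterior(model, x, 0) else 0
```

The product over features inside `log_posterior` is exactly the independence assumption that the decision-tree work on the next slide moves away from.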

6 Single-Document Summarization Decision trees were another machine learning method, breaking away from the assumption that features are independent (Lin, 1999). Summaries were based on queries. Decision trees generally performed better than Naive Bayes classifiers. The results showed the importance of word frequencies.

7 Single-Document Summarization Hidden Markov Models were also used (Conroy, 2001).

8 Single-Document Summarization The DUC2001 task showed that the most important sentences in a text are usually its first few sentences, so most methods could not perform well against the baseline. A neural network system using search engine queries and Wikipedia entries performed well.

9 Multi-Document Summarization Extracting a single summary from multiple documents has gained popularity since the mid-1990s. Research has mostly used the news domain. The field was pioneered by SUMMONS (McKeown, 1995).

10 Multi-Document Summarization (McKeown, 1995) claimed that extractive techniques would not work for multi-document summarization. SUMMONS is an abstractive method. Instead of processing raw text, it reads from a database created by another tool that processes the raw text. It decides which information to put in the summary, then selects the right words for cohesion and grammatical correctness.

11 Multi-Document Summarization (Carbonell and Goldstein, 1998) used the Maximal Marginal Relevance (MMR) measure. It is used in topic-driven summarization and measures both query relevance and information novelty.
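The trade-off between query relevance and novelty can be sketched as a greedy selection loop. At each step the candidate maximizing lam * sim(candidate, query) - (1 - lam) * max sim(candidate, already selected) is picked. The Jaccard word-overlap similarity below is a stand-in chosen to keep the example dependency-free; the function names and the default `lam` value are assumptions.

```python
def jaccard(a, b):
    """Word-set overlap between two strings (stand-in similarity measure)."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def mmr_select(query, candidates, k, lam=0.7, sim=jaccard):
    """Greedily pick up to k candidates that are relevant to the query
    but not redundant with what has already been selected."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr(c):
            # Redundancy is the similarity to the closest selected item.
            redundancy = max((sim(c, s) for s in selected), default=0.0)
            return lam * sim(c, query) - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lam=1.0` the loop reduces to plain relevance ranking; lowering `lam` pushes near-duplicate sentences down the list.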

12 Multi-Document Summarization (Mani and Bloedorn, 1997) use a graph-based approach. No textual summary is generated; instead, concepts and relations are identified. A graph is created for the documents, with each node representing a single occurrence of a word. In the end, important nodes are highlighted.

13 Multi-Document Summarization Centroid-based techniques are an extractive approach (Radev, 2004). They do not use language generation techniques. Documents are modeled as bags of words. Clustering is done at the sentence level, and the sentences closest to the centroids are chosen.
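A minimal sketch of the centroid idea, under simplifying assumptions: sentences are represented by raw word counts rather than weighted terms, all sentences form a single cluster, and closeness is measured by cosine similarity. The function names are hypothetical.

```python
import math
from collections import Counter

def bow(sentence):
    """Bag-of-words vector: word -> count."""
    return Counter(sentence.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse word-weight mappings."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    """Average of the bag-of-words vectors in one cluster."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {w: c / len(vectors) for w, c in total.items()}

def rank_by_centroid(sentences):
    """Order sentences from closest to farthest from the cluster centroid."""
    vectors = [bow(s) for s in sentences]
    c = centroid(vectors)
    return sorted(sentences, key=lambda s: cosine(bow(s), c), reverse=True)
```

Taking the first few sentences of `rank_by_centroid(...)` gives an extractive summary; no text is generated, which matches the slide's point that language generation is not involved.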

14 Multi-Document Summarization Some work has also been done on multilingual multi-document summarization. (Evans, 2005) created a system to summarize English and Arabic texts. When sentences in different languages carry similar information, the English one is chosen.

15 OTHER APPROACHES Short Summaries: (Witbrock and Mittal, 1999) designed a system to find headlines for documents, using machine learning techniques. It had 0.89 overlap for 4-word headlines.

16 EVALUATION Evaluation is one of the biggest challenges in summarization. The main problem is that agreement between human summarizers is low, so it is hard to find an ideal summary to compare against.

17 EVALUATION (Lin, 2004) introduced ROUGE, which calculates scores based on n-grams (ROUGE-N), longest common subsequences (ROUGE-L), and more specialized variants (ROUGE-W, ROUGE-S). These measures performed well on DUC2001 and 2002 texts. Multi-document performance was not as high as single-document performance.
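The two core ROUGE ideas can be sketched in a few lines. This is a simplified, recall-only version against a single reference summary, with whitespace tokenization; it is not the official ROUGE toolkit, and the function names are assumptions.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams that also appear in the candidate
    (counts are clipped so a repeated n-gram is not over-credited)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[g]) for g, count in ref.items())
    return overlap / total

def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l_recall(candidate, reference):
    """LCS length divided by the reference length."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    return lcs_length(cand, ref) / len(ref) if ref else 0.0
```

For the reference "the cat sat on the mat" and the candidate "the cat lay on the mat", five of six reference unigrams and three of five reference bigrams are matched, and the longest common subsequence has five words.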

18 EVALUATION A recent approach proposes an information-theoretic method, usable for both single-document and multi-document summarization.

