1 Reference Julian Kupiec, Jan Pedersen, Francine Chen, “A Trainable Document Summarizer”, SIGIR’95, Seattle, WA, USA, 1995. Xiaodan Zhu, Gerald Penn, “Evaluation of Sentence Selection for Speech Summarization”, Proceedings of the 2nd International Conference on Recent Advances in Natural Language Processing (RANLP-05), Borovets, Bulgaria, September 2005. C.D. Paice, “Constructing literature abstracts by computer: Techniques and prospects”, Information Processing and Management, 26, 1990.

2 A Trainable Document Summarizer Julian Kupiec, Jan Pedersen and Francine Chen Xerox Palo Alto Research Center

3 Outline Introduction A Trainable Summarizer Experiments and Evaluation Discussion and Conclusions

4 Introduction
–To summarize is to reduce in complexity, and hence in length, while retaining some of the essential qualities of the original
–This paper focuses on document extracts, a particular kind of computed document summary
–Document extracts consisting of roughly 20% of the original can be as informative as the full text of a document, which suggests that even shorter extracts may be useful indicative summaries
–Titles, key-words, tables-of-contents and abstracts might all be considered forms of summary
–Extract selection is approached as a statistical classification problem
–This framework provides a natural evaluation criterion: the classification success rate (precision)
–It does require a “training corpus” of documents with labelled extracts

5 A Trainable Summarizer Features
–Paice groups sentence scoring features into seven categories: the frequency-keyword approach, the title-keyword method, the location method, syntactic criteria, the cue method, the indicator-phrase method, and relational criteria
–Frequency-keyword heuristics
–The title-keyword heuristic
–Location heuristics
–Indicator phrase (e.g., “this report…”)
–Related heuristic: involves cue words
  Two sets of words which are positively and negatively correlated with summary sentences
  Bonus: e.g., greatest and significant
  Stigma: e.g., hardly and impossible

6 A Trainable Summarizer Features
–Sentence Length Cut-off Feature: given a threshold (e.g., 5 words), the feature is true for all sentences longer than the threshold, and false otherwise
–Fixed-phrase Feature: this feature is true for sentences that contain any of 26 indicator phrases, or that follow section heads containing specific key words
–Paragraph Feature
–Thematic Word Feature: the most frequent content words are defined as thematic words; this feature is binary, depending on whether a sentence is present in the set of highest scoring sentences
–Uppercase Word Feature
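A minimal sketch of how these binary sentence features might be computed. The indicator phrases, the thematic-word set size, and the top-K cutoff below are illustrative assumptions, not values taken from the paper (which uses 26 indicator phrases and its own thresholds).

```python
import re
from collections import Counter

# Illustrative values (assumptions), not the paper's actual settings.
INDICATOR_PHRASES = ["this report", "in conclusion", "in summary"]
LENGTH_CUTOFF = 5      # sentence-length cut-off in words
NUM_THEMATIC = 25      # number of thematic (most frequent content) words -- assumption
TOP_K = 10             # size of the "highest scoring sentences" set -- assumption

def tokenize(sentence):
    return [w.lower() for w in re.findall(r"[A-Za-z']+", sentence)]

def sentence_features(sentences):
    counts = Counter(w for s in sentences for w in tokenize(s))
    thematic = {w for w, _ in counts.most_common(NUM_THEMATIC)}

    # Score sentences by how many thematic words they contain,
    # then mark the top-K as having the thematic-word feature.
    scores = [sum(w in thematic for w in tokenize(s)) for s in sentences]
    top = set(sorted(range(len(sentences)), key=lambda i: -scores[i])[:TOP_K])

    feats = []
    for i, s in enumerate(sentences):
        words = re.findall(r"[A-Za-z']+", s)
        feats.append({
            "length":       len(words) > LENGTH_CUTOFF,                      # Sentence Length Cut-off
            "fixed_phrase": any(p in s.lower() for p in INDICATOR_PHRASES),  # Fixed-phrase
            "thematic":     i in top,                                        # Thematic Word
            "uppercase":    any(w.isupper() and len(w) > 1 for w in words),  # Uppercase Word
        })
    return feats
```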

7 A Trainable Summarizer Classifier
–For each sentence s, compute the probability that it will be included in a summary S given the k features F_1, …, F_k, which can be expressed using Bayes’ rule as follows:
$$P(s \in S \mid F_1, \ldots, F_k) = \frac{P(F_1, \ldots, F_k \mid s \in S)\, P(s \in S)}{P(F_1, \ldots, F_k)}$$
–Assuming statistical independence of the features:
$$P(s \in S \mid F_1, \ldots, F_k) = \frac{\prod_{j=1}^{k} P(F_j \mid s \in S)\, P(s \in S)}{\prod_{j=1}^{k} P(F_j)}$$
–$P(F_j)$ is a constant, and $P(F_j \mid s \in S)$ and $P(s \in S)$ can be estimated directly from the training set by “counting occurrences”
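A minimal sketch of this Naive-Bayes-style sentence scorer over binary features. The add-one smoothing is an assumption added here so that probabilities are never zero; it is not stated on the slide.

```python
import math

def train(feature_dicts, labels):
    """Estimate P(s in S), P(F_j | s in S) and P(F_j) by counting occurrences
    over sentences with binary feature dicts and 0/1 summary labels."""
    n = len(labels)
    n_pos = sum(labels)
    prior = n_pos / n                                   # P(s in S)
    p_f_given_s, p_f = {}, {}
    for name in feature_dicts[0]:
        pos = sum(f[name] for f, y in zip(feature_dicts, labels) if y == 1)
        tot = sum(f[name] for f in feature_dicts)
        p_f_given_s[name] = (pos + 1) / (n_pos + 2)     # add-one smoothing (assumption)
        p_f[name] = (tot + 1) / (n + 2)
    return prior, p_f_given_s, p_f

def score(features, prior, p_f_given_s, p_f):
    """Log of P(s in S | F_1..F_k), dropping terms constant across sentences."""
    s = math.log(prior)
    for name, value in features.items():
        p1, p2 = (p_f_given_s[name], p_f[name]) if value else (1 - p_f_given_s[name], 1 - p_f[name])
        s += math.log(p1) - math.log(p2)
    return s

# Usage: rank sentences by score() and keep the top-scoring ones as the extract.
```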

8 Experiments and Evaluation The corpus –There are 188 document/summary pairs, sampled from 21 publications in the scientific/technical domain –The average number of sentences per document is 86 –Each document was “normalized” so that the first line of each file contained the document title

9 Experiments and Evaluation The corpus –Sentence Matching: direct sentence match (verbatim or minor modification), direct join (two or more sentences), unmatchable, incomplete (some overlap: includes a sentence from the original document, but also contains other information) –The correspondences were produced in two passes –79% of the summary sentences have direct matches

10 Experiments and Evaluation The corpus

11 Experiments and Evaluation Evaluation
–A cross-validation strategy is used for evaluation
–Unmatchable and incomplete sentences were excluded from both training and testing, yielding a total of 498 unique sentences
–Performance, first way (the highest performance): a sentence produced by the summarizer is counted as correct if it is a direct sentence match or a direct join
  Of the 568 sentences, 195 direct sentence matches and 6 direct joins were correctly identified, for a total of 201 correctly identified summary sentences: 35%
–Performance, second way: measured against the 498 match-able sentences: 42%

12 Experiments and Evaluation Evaluation –The best combination is (paragraph + fixed-phrase + sentence-length) –Addition of the frequency-keyword features (thematic and uppercase word features) results in a slight decrease in overall performance –As a baseline, selecting sentences from the beginning of a document (the sentence length cut-off feature alone) gives 24% (121 sentences correct)

13 Experiments and Evaluation –Figure 3 shows the performance of the summarizer (using all features) as a function of summary size –Edmundson cites a sentence-level performance of 44% –By analogy, 25% of the average document length (86 sentences) in our corpus is about 20 sentences –Reference to the table indicates performance at 84%

14 Discussion and Conclusions The trends in the results are in agreement with those of Edmundson, who used a subjectively weighted combination of features as opposed to training the feature weights on a corpus Frequency-keyword features also gave the poorest individual performance in evaluation They have, however, retained these features in the final system for several reasons –The first is robustness –Secondly, as the number of sentences in a summary grows, more dispersed informative material tends to be included

15 Discussion and Conclusions The goal is to provide a summarization program that is of general utility; two issues arise –The first concerns robustness –The second concerns presentation and other forms of summary information

16 Reference Julian Kupiec, Jan Pedersen, Francine Chen, “A Trainable Document Summarizer”, SIGIR’95, Seattle, WA, USA, 1995. Xiaodan Zhu, Gerald Penn, “Evaluation of Sentence Selection for Speech Summarization”, Proceedings of the 2nd International Conference on Recent Advances in Natural Language Processing (RANLP-05), Borovets, Bulgaria, September 2005.

17 Evaluation of Sentence Selection for Speech Summarization Xiaodan Zhu and Gerald Penn Department of Computer Science University of Toronto

18 Outline Introduction Speech Summarization by Sentence Selection Evaluation Metrics Experiments Conclusions

19 Introduction This paper considers whether ASR-inspired evaluation metrics produce different results than those taken from text summarization The goal of speech summarization is to distill important information from speech data The focus of this paper is on sentence-level extraction

20 Speech Summarization by Sentence Selection
–“LEAD”: select the first N% of sentences from the beginning of the transcript
–“RAND”: random selection
–Knowledge-based approach, “SEM”: to calculate semantic similarity between a given utterance and the dialogue, the noun portion of WordNet is used as a knowledge source, with semantic distance between senses computed using normalized path length
  The performance of this system is reported as better than the LEAD, RAND and TF*IDF based methods
  Senses are not manually disambiguated; Brill’s POS tagger is applied to extract the nouns
  A semantic similarity package is used (a sketch of the path-length idea follows below)
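A minimal sketch of the path-length similarity idea. The slide does not name the exact package; using NLTK's WordNet interface, and averaging each utterance noun's best similarity to the dialogue's nouns, are both assumptions made here for illustration.

```python
from itertools import product
from nltk.corpus import wordnet as wn  # requires the NLTK WordNet data to be downloaded

def noun_similarity(w1, w2):
    """Best path-length similarity over all noun senses of two words (no manual disambiguation)."""
    pairs = product(wn.synsets(w1, pos=wn.NOUN), wn.synsets(w2, pos=wn.NOUN))
    return max((s1.path_similarity(s2) or 0.0 for s1, s2 in pairs), default=0.0)

def sem_score(utterance_nouns, dialogue_nouns):
    """Average, over the utterance's nouns, of the best similarity to any noun in the dialogue."""
    if not utterance_nouns:
        return 0.0
    return sum(max((noun_similarity(u, d) for d in dialogue_nouns), default=0.0)
               for u in utterance_nouns) / len(utterance_nouns)
```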

21 Speech Summarization by Sentence Selection
MMR-based approach, “MMR”: a sentence is scored by trading off
–whether it is more similar to the whole dialogue, and
–whether it is less similar to the sentences that have so far been selected (a sketch follows below)
Classification-based approaches
–Formulate sentence selection as a binary classification problem
–The best two classifiers have consistently been SVM and logistic regression
–SVM (OSU-SVM package): SVM seeks an optimal separating hyperplane, where the margin is maximal; the decision function has the standard form $f(x) = \operatorname{sign}\big(\sum_i \alpha_i y_i K(x_i, x) + b\big)$
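A minimal sketch of MMR-style greedy selection under these two criteria. The bag-of-words cosine similarity and the trade-off parameter lam are assumptions; the slide does not specify the similarity measure or its weighting.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse bag-of-words vectors (dicts of word -> weight)."""
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def mmr_select(sentence_vecs, dialogue_vec, k, lam=0.7):
    """Greedily pick k sentences: reward similarity to the whole dialogue,
    penalize similarity to sentences already selected (lam is an assumed trade-off)."""
    selected = []
    remaining = list(range(len(sentence_vecs)))
    while remaining and len(selected) < k:
        def mmr(i):
            redundancy = max((cosine(sentence_vecs[i], sentence_vecs[j]) for j in selected),
                             default=0.0)
            return lam * cosine(sentence_vecs[i], dialogue_vec) - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected
```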

22 Speech Summarization by Sentence Selection Features

23 Speech Summarization by Sentence Selection Classification-Based Approaches –Logistic Regression: “LOG” models the posterior probabilities of the class label with linear functions, where X is the feature set and Y is the class label (the standard form is given below)
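The slide's formula image is not reproduced in the transcript; the standard binary logistic regression posterior it refers to is:

$$\log\frac{P(Y=1 \mid X=x)}{P(Y=0 \mid X=x)} = \beta_0 + \beta^{\top}x, \qquad P(Y=1 \mid X=x) = \frac{1}{1 + e^{-(\beta_0 + \beta^{\top}x)}}$$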

24 Evaluation Metrics Precision/Recall –When evaluated on binary annotations and using precision/recall metrics, sys1 and sys2 achieve 50% and 0% Relative Utility –For the above example, if using relative utility, sys1 gets 18/19 and sys2 gets 15/19 –The values obtained are higher than with P/R, but they are higher for all of the systems evaluated
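For reference, relative utility is typically computed as the sum of the judge-assigned utility scores of the selected sentences, divided by the maximum score achievable by any extract of the same length; this general form (not a formula shown on the slide) is:

$$\mathrm{RU} = \frac{\sum_{s \in E_{\text{sys}}} u(s)}{\max_{E:\,|E| = |E_{\text{sys}}|} \sum_{s \in E} u(s)}$$

where $u(s)$ is the utility score assigned to sentence $s$, which matches the 18/19 and 15/19 values cited above.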

25 Evaluation Metrics
Word Error Rate
–Applied at the sentence level and the word level
–The sum of insertion, substitution and deletion errors of words, divided by the number of all these errors plus the number of correct words
Zechner’s Summarization Accuracy
–The summarization accuracy is defined as the sum of the relevance scores of all the words in the automatic summary, divided by the maximum achievable relevance score with the same number of words
ROUGE
–Measures overlapping units such as n-grams, word sequences and word pairs
–ROUGE-N and ROUGE-L are used (the formulas below restate these definitions)
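Restating these definitions in symbols (note that this WER variant is normalized by errors plus correct words, as stated above, rather than by reference length, and the ROUGE-N formula is the standard n-gram recall form):

$$\mathrm{WER} = \frac{I + S + D}{I + S + D + C}, \qquad \mathrm{SA} = \frac{\sum_{w \in \text{auto summary}} r(w)}{\displaystyle\max_{|W'| = |W_{\text{auto}}|} \sum_{w \in W'} r(w)}$$

$$\mathrm{ROUGE\text{-}N} = \frac{\sum_{g \in \text{ref } n\text{-grams}} \min\big(\mathrm{count}_{\text{sys}}(g),\, \mathrm{count}_{\text{ref}}(g)\big)}{\sum_{g \in \text{ref } n\text{-grams}} \mathrm{count}_{\text{ref}}(g)}$$

where $I$, $S$, $D$ are insertion, substitution and deletion errors, $C$ is the number of correct words, and $r(w)$ is the relevance score of word $w$.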

26 Experiments Corpus: the SWITCHBOARD dataset (a corpus of open-domain spoken dialogue) 27 spoken dialogues are randomly selected from SWITCHBOARD Three annotators are asked to assign 0/1 labels to indicate whether a sentence is in the summary or not (each is required to select around 10% of the sentences into the summary) Each judge’s annotation is evaluated relative to the others’ (F-scores)

27 Experiments Precision/Recall –One standard marks a sentence as in the summary only when all three annotators agree –LOG and SVM have similar performance and outperform the others, with MMR following, and then SEM and LEAD –A second standard marks a sentence as in the summary when at least two of the three judges include it

28 Experiments Precision/Recall –A third standard marks a sentence as in the summary when any of the three annotators includes it Relative Utility –Each of the three human judges assigns a number between 0 and 9 to each sentence, indicating the confidence that this sentence should be included in the summary

29 Experiments Relative Utility –The performance ranks of the five summarizers are the same here as they are in the three P/R evaluations, for several possible reasons: first, the P/R agreement among annotators is not low; second, the redundancy in the data is much less than in multi-document summarization tasks; third, the summarizers compared might tend to select the same sentences

30 Experiments Word Error Rate and Summarization Accuracy

31 Experiments Word Error Rate and Summarization Accuracy

32 Experiments ROUGE

33 Conclusion Five summarizers were evaluated on three text-summarization-inspired metrics, precision/recall (P/R), relative utility (RU), and ROUGE, as well as on two ASR-inspired evaluation metrics, word error rate (WER) and summarization accuracy (SA) The preliminary conclusion is that considerably greater caution must be exercised when using ASR-based measures than has been witnessed to date in the speech summarization literature