
1 A Machine Learning Approach to Sentence Ordering for Multidocument Summarization and Its Evaluation
D. Bollegala, N. Okazaki and M. Ishizuka
The University of Tokyo
IJCNLP 2005

2 2/21 Abstract
Ordering information is a difficult but important task for natural language generation applications
This paper proposes an algorithm that learns orderings from a set of human-ordered texts
Our model consists of a set of ordering experts, each of which gives its precedence preference between two sentences
Our experimental results show that the proposed algorithm outperforms the existing methods in all evaluation metrics

3 3/21 Introduction
The task of ordering sentences arises in many fields
–Multidocument summarization (MDS), question answering, etc.
Two stages of MDS
–The source documents are analyzed and a set of sentences is extracted
–A coherent summary is created from the extracted sentences
Sentence ordering has received less attention compared to the first stage
–Chronological ordering (Barzilay et al., 2002)
–Topical relevance ordering (Barzilay et al., 2002)
–Probabilistic model of text structuring (Lapata, 2003)
–Precedence relations among sentences (Okazaki, 2005)

4 4/21 Introduction
The appropriate way to combine these different methods to obtain more robust and coherent text remains unknown
In this paper, we learn the optimum linear combination of these heuristics that maximizes the readability of a summary, using a set of human-made orderings
We propose two new evaluation metrics
–Weighted Kendall Coefficient
–Average Continuity

5 5/21 Method
To decide the order among sentences, we implement five ranking experts
–Chronological
–Probabilistic
–Topical relevance
–Precedent
–Succedent
Each expert e generates a pair-wise preference function defined as follows (Cohen, 1999)
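The preference-function equation on this slide is not included in the transcript. As a hedged reconstruction of the framework the slide cites (Cohen, 1999): each expert e maps a pair of extracted sentences u, v, together with the partially ordered summary Q built so far, to a value in [0, 1], where values near 1 mean that e prefers u to precede v; the learned ordering then uses a weighted combination of the experts (the weights are learned, see slide 13). The exact weight normalization is not shown here.

    PREF_e(u, v, Q) \in [0, 1]
    PREF_{total}(u, v, Q) = \sum_{e} w_e \, PREF_e(u, v, Q)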

6 6/21 Chronological Expert
The chronological expert emulates conventional chronological ordering
T(u): publication date of sentence u
D(u): the unique identifier of the document to which u belongs
N(u): the line number of sentence u in the original document
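The expert's decision rule on this slide was shown as an equation that is not in the transcript. The Python sketch below is a reconstruction from the T, D, N definitions above: prefer the earlier-published sentence, break ties within the same document by original line number, and give no preference (0.5) when the dates coincide across different documents. The exact case analysis in the paper may differ.

    # Hedged sketch of the chronological expert's pairwise preference,
    # reconstructed from the T, D, N definitions on this slide.
    def pref_chronological(u, v):
        # u, v carry .T (publication date), .D (document id),
        # .N (line number in the original document)
        if u.T < v.T:
            return 1.0                  # u was published earlier
        if u.D == v.D and u.N < v.N:
            return 1.0                  # same document, u appears first
        if u.T == v.T and u.D != v.D:
            return 0.5                  # same date, different documents: no preference
        return 0.0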

7 7/21 Probabilistic Expert
Probabilistic model (Lapata, 2003)

8 8/21 Probabilistic Expert
Only nouns and verbs are used as features
Back-off smoothing
Probabilistic expert
r: the most recently ordered sentence in Q
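The equations on these two slides are not in the transcript. A hedged reconstruction of how Lapata's (2003) model is used here: the probability of an ordering factorizes over adjacent sentence pairs, each transition probability is estimated from co-occurrence counts of noun and verb features with back-off smoothing, and the expert prefers whichever of u, v has the higher transition probability from r, the most recently ordered sentence. The exact factorization over feature pairs and the form of the preference value are assumptions:

    P(s_1, \ldots, s_n) = P(s_1) \prod_{i=2}^{n} P(s_i \mid s_{i-1})
    PREF_{prob}(u, v, Q) =
      \begin{cases} 1 & P(u \mid r) > P(v \mid r) \\ 1/2 & P(u \mid r) = P(v \mid r) \\ 0 & \text{otherwise} \end{cases}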

9 9/21 Topical Relevance Expert
Grouping extracted sentences that belong to the same topic improves the readability of the summary
–Use cosine similarity
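The expert's formula is not in the transcript. The Python sketch below is one plausible reading of the slide, with an explicit assumption: each candidate is scored by its maximum cosine similarity to the sentences already placed in Q, and the higher-scoring candidate is preferred; the paper's exact definition may differ.

    # Hedged sketch of the topical relevance expert (assumption: score each
    # candidate by its maximum cosine similarity to the ordered block Q).
    import math
    from collections import Counter

    def cosine(a, b):
        # a, b: lists of content-word tokens (e.g. nouns and verbs)
        ca, cb = Counter(a), Counter(b)
        dot = sum(ca[w] * cb[w] for w in ca)
        na = math.sqrt(sum(v * v for v in ca.values()))
        nb = math.sqrt(sum(v * v for v in cb.values()))
        return dot / (na * nb) if na and nb else 0.0

    def pref_topical(u, v, Q):
        score = lambda s: max((cosine(s, q) for q in Q), default=0.0)
        su, sv = score(u), score(v)
        if su > sv:
            return 1.0
        if su < sv:
            return 0.0
        return 0.5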

10 10/21 Precedent Expert
When placing a sentence in the summary, it is important to check whether the preceding sentences convey the necessary background information
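The formula on this slide is not in the transcript. A hedged reading of the idea stated above: for a candidate sentence u, take the block of sentences that precede u in its original document, measure how well the summary ordered so far (Q) covers that background (e.g. by cosine similarity), and prefer the candidate whose background is better covered. The aggregation below (a maximum over sentence pairs) is an assumption; the paper may average instead.

    pre(u) = \max_{p \in pred(u),\; q \in Q} sim(p, q)
    PREF_{pre}(u, v, Q) =
      \begin{cases} 1 & pre(u) > pre(v) \\ 1/2 & pre(u) = pre(v) \\ 0 & \text{otherwise} \end{cases}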

11 11/21 Succedent Expert

12 12/21 Ordering Algorithm
Finding the optimal order for a given total preference function is NP-complete
Greedy algorithm (Cohen, 1999)
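The greedy procedure itself is not reproduced on this slide. The Python sketch below follows the standard form of Cohen et al.'s (1999) greedy approximation: each remaining sentence is scored by its total outgoing preference minus its total incoming preference, the highest-scoring sentence is placed next, and the scores are recomputed. Here pref_total is assumed to be the learned weighted combination of the five experts.

    # Hedged sketch of the greedy ordering step (Cohen et al., 1999).
    def greedy_order(sentences, pref_total):
        remaining = list(sentences)
        ordered = []                    # Q: the partial ordering built so far
        while remaining:
            def score(u):
                out = sum(pref_total(u, v, ordered) for v in remaining if v is not u)
                inc = sum(pref_total(v, u, ordered) for v in remaining if v is not u)
                return out - inc
            best = max(remaining, key=score)
            ordered.append(best)
            remaining.remove(best)
        return ordered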

13 13/21 Learning Algorithm
Weighted allocation algorithm (Cohen, 1999)
Initial weights: w_i^1 = 0.2 (uniform over the five experts)
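The update rule on this slide is not in the transcript. The sketch below follows the Hedge-style multiplicative update used in Cohen et al. (1999) for learning a linear combination of preference functions; the loss definition (fraction of sentence pairs on which an expert disagrees with the human ordering) and the value of beta are assumptions.

    # Hedged sketch of a Hedge-style multiplicative weight update
    # (Cohen et al., 1999). Assumptions: beta in (0, 1); losses[i] is the
    # loss of expert i on one human-ordered training text.
    def update_weights(weights, losses, beta=0.5):
        new = [w * (beta ** loss) for w, loss in zip(weights, losses)]
        total = sum(new)
        return [w / total for w in new]   # renormalize so the weights sum to 1

    # Five experts start with uniform weights w_i^1 = 0.2
    weights = [0.2] * 5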

14 14/21 Evaluation
In addition to Kendall’s coefficient and Spearman’s rank correlation coefficient, we use sentence continuity and two newly proposed metrics
–Weighted Kendall
–Average Continuity
Human evaluation

15 15/21 Evaluation
Kendall’s coefficient
–Major drawback: it does not take into consideration the relative distance d between the sentences of a discordant pair
Weighted Kendall Coefficient
d: the number of sentences that lie between the sentences of a discordant pair
Q: the number of discordant pairs
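The formulas on this slide are not in the transcript. For reference, Kendall's coefficient over n sentences penalizes each of the Q discordant pairs by a constant amount:

    \tau = 1 - \frac{2Q}{\binom{n}{2}}

As described above, the Weighted Kendall Coefficient replaces the constant per-pair penalty with a weight h(d) that depends on the distance d between the two sentences of the discordant pair; the specific weighting function and its normalization are not shown here.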

16 16/21 Evaluation
The continuity metric expresses the continuity of the sentences (readability)
Average Continuity
–Sentence block (sentence n-gram)
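The Average Continuity formula is not in the transcript. A hedged reconstruction, assuming a BLEU-style geometric mean over continuous sentence n-grams: P_n is the proportion of length-n sentence blocks in the evaluated ordering that also appear as continuous blocks in the reference ordering, and a small constant α keeps the logarithm defined when P_n = 0. The n range (2 to 4) and the use of α are assumptions:

    AC = \exp\!\left( \frac{1}{3} \sum_{n=2}^{4} \log (P_n + \alpha) \right)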

17 17/21 Results
3rd Text Summarization Challenge (TSC) corpus
–News articles from the Mainichi and Yomiuri newspapers
–30 different topics
–30 reference summaries were created by ordering the TSC-3 extraction data by hand
–10-fold cross-validation
–Compared orderings: Random Ordering (RO), Probabilistic Ordering (PO), Chronological Ordering (CO), Learned Ordering (LO), Human-made Ordering (HO)

18 18/21 Results

19 19/21 Results

20 20/21 Result of Human Evaluation
Two human judges

21 21/21 Conclusion
Our method integrates all the existing approaches to sentence ordering
The results reveal that our sentence ordering algorithm does contribute to summary readability
In the future,
–We plan to study the sentence ordering problem further
–We plan to extend our algorithm to other natural language generation applications

