Multi-document Summarization Sandeep Sripada Venu Gopal Kasturi Gautam Kumar Parai.


1 Multi-document Summarization Sandeep Sripada Venu Gopal Kasturi Gautam Kumar Parai

2 Problem and Data Task: Multi-document summarization on DUC 2004 data. DUC 2004 has 50 topics with 10 documents for each topic. Generate a summary of about 100 words. Evaluation based solely on ROUGE scores. Used DUC 2002 data for training.

3 Similarity. Similarity measures, used to remove redundancy of information. Vector-based similarities: cosine similarity, Jaccard similarity, tf-idf similarity. Semantic similarities: WordNet-based similarity measure (using Lesk), Resnik.
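The vector-based measures listed on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the tokenizer, the smoothed idf formula, and all function names are assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphanumeric runs (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard_similarity(a, b):
    """|A & B| / |A | B| over the sets of word types in two sentences."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(vec_a, vec_b):
    """Cosine between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * vec_b.get(t, 0.0) for t, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def tf_vector(text):
    """Raw term-frequency vector of one sentence."""
    return dict(Counter(tokenize(text)))


def tfidf_vectors(sentences):
    """Tf-idf vectors; idf = log((1 + N) / (1 + df)) + 1 is an assumed smoothing."""
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
```

Tf-idf similarity is then `cosine_similarity` applied to two vectors returned by `tfidf_vectors`. The semantic measures (Lesk, Resnik) need WordNet and are not sketched here.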

4 Similarity – Examples (a) Opposition parties lodged no confidence motions Wednesday against Prime Minister Mesut Yilmaz after allegations he interfered in the privatization of a bank and helped a businessman linked to a mobster. (b) Following charges that he interfered in a privatization contract and helped a businessman with mob ties, Turkish Prime Minister Mesut Yilmaz was forced to resign.

5 Sentence Importance Score. Importance score: “a weighted combination of various feature values.” Feature weights were learnt by training a model on DUC 2002 data. Features considered: tf-idf score of the sentence (sum of the scores of its words); named entity count; sentence position in the document; POS counts of nouns, verbs, and adjectives; length of the sentence; counts of number literals and upper-case words.
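As a rough sketch of the "weighted combination" described on this slide, the score can be written as a dot product between a feature dictionary and a weight dictionary. The feature names and weight values below are hypothetical placeholders; the slides state that the actual weights were learnt from DUC 2002 data and do not list them.

```python
# Hypothetical feature names and weights, for illustration only.
WEIGHTS = {
    "tfidf": 0.4,
    "named_entities": 0.2,
    "position": 0.2,
    "nouns": 0.1,
    "length": 0.1,
}


def importance_score(features, weights):
    """Weighted sum of feature values; features without a weight count as 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def rank_sentences(feature_rows, weights):
    """Return sentence indices sorted from most to least important."""
    scores = [importance_score(row, weights) for row in feature_rows]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

For example, `rank_sentences(rows, WEIGHTS)` with one feature dictionary per sentence gives the order in which sentences would be considered for the summary.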

6 Feature Normalization. Sentence length: feature values normalized by the length of the sentence; gave us better results than no normalization. Sigmoid using Z-scores: better than the length-based normalization. All results mentioned use the sigmoid normalization.
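One plausible reading of the two normalizations on this slide is sketched below. The slides do not give the formulas, so the use of the population standard deviation and the handling of constant features are assumptions.

```python
import math
import statistics


def length_normalize(value, sentence_length):
    """Divide a raw feature value by the sentence length in words."""
    return value / sentence_length if sentence_length > 0 else 0.0


def sigmoid_zscore(values):
    """Map each value of one feature to (0, 1) via sigmoid((v - mean) / std)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0.0:
        # A constant feature carries no information; map it to the midpoint.
        return [0.5 for _ in values]
    return [1.0 / (1.0 + math.exp(-(v - mean) / std)) for v in values]
```

The sigmoid variant is applied per feature across all sentences, so a value at the mean maps to 0.5 and outliers saturate towards 0 or 1 instead of dominating the weighted sum.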

7 Approaches. Stack decoder based method: closer to the optimal solution; uses sentence similarity to remove redundancy. Clustering based method: use K-means to cluster sentences; choose the clusteroids as representative sentences; form the summary by selecting sentences based on importance scores. Graph decoder based method (novel): build a sentence dissimilarity graph; calculate maximal cliques and form the summary after ordering the nodes based on importance scores.
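The stack decoder idea can be illustrated with a small beam-style search. This is a sketch under stated assumptions, not the authors' decoder: one stack per summary length in words, a fixed number of hypotheses kept per stack, and a Jaccard word overlap standing in for the similarity measure. All names and default parameter values are hypothetical.

```python
import heapq


def word_jaccard(a, b):
    """Stand-in similarity: Jaccard overlap of whitespace-separated words."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def stack_decode(sentences, scores, similarity=word_jaccard,
                 max_words=100, stack_size=10, sim_threshold=0.5):
    """Search for a high-scoring, non-redundant sentence set within a word budget."""
    lengths = [len(s.split()) for s in sentences]
    # stacks[k] holds (total_score, chosen_indices) for summaries of exactly k words.
    stacks = [[] for _ in range(max_words + 1)]
    stacks[0].append((0.0, ()))
    for used in range(max_words + 1):
        # Prune each stack to its best hypotheses before extending them.
        stacks[used] = heapq.nlargest(stack_size, stacks[used])
        for total, chosen in stacks[used]:
            # Only add later sentences so each subset is generated once.
            start = chosen[-1] + 1 if chosen else 0
            for i in range(start, len(sentences)):
                new_len = used + lengths[i]
                if lengths[i] == 0 or new_len > max_words:
                    continue
                # Redundancy filter: skip sentences too similar to a chosen one.
                if any(similarity(sentences[i], sentences[j]) >= sim_threshold
                       for j in chosen):
                    continue
                stacks[new_len].append((total + scores[i], chosen + (i,)))
    best = max((h for stack in stacks for h in stack), key=lambda h: h[0])
    return [sentences[i] for i in best[1]]
```

Because each stack is pruned to `stack_size` entries, the search is approximate; a larger stack moves it closer to the optimal subset at higher cost.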

8 Sentence Dissimilarity Graph. Nodes: top K sentences. Edge: present if the similarity of a pair of sentences is less than the similarity threshold. Candidates to be included in the summary come from a clique.
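A minimal sketch of the graph construction and clique enumeration follows. The basic Bron-Kerbosch algorithm is used here as one standard way to list maximal cliques; the slides do not say which algorithm the authors used. Picking the clique with the highest total importance score is also an assumption, as are the function names and the stand-in similarity.

```python
def word_jaccard(a, b):
    """Stand-in similarity: Jaccard overlap of whitespace-separated words."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def build_dissimilarity_graph(sentences, similarity, threshold):
    """Connect two sentences when their similarity is below the threshold."""
    adj = {i: set() for i in range(len(sentences))}
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if similarity(sentences[i], sentences[j]) < threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return cliques


def best_clique_summary(sentences, scores, similarity=word_jaccard, threshold=0.5):
    """Pick the clique with the highest total score, ordered by importance."""
    adj = build_dissimilarity_graph(sentences, similarity, threshold)
    best = max(maximal_cliques(adj), key=lambda c: sum(scores[i] for i in c))
    ordered = sorted(best, key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in ordered]
```

Every pair of sentences inside a clique is mutually dissimilar, which is why a clique is a natural source of non-redundant summary candidates.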

9 Results. ROUGE (1.5.5) scores with 95% confidence intervals. * = hand-set parameter weights.
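For readers unfamiliar with the metric, ROUGE-N recall is the fraction of reference n-grams that also appear in the candidate summary, with counts clipped. The sketch below is a simplified illustration only: it assumes a single reference and a naive tokenizer, and it omits the stemming, stopword, and confidence-interval options of the ROUGE 1.5.5 toolkit used for the reported scores.

```python
import re
from collections import Counter


def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram overlap divided by the number of reference n-grams."""

    def ngrams(text):
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```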

10 ISS!

11 Sample Summary
Oracle: Once completed, the 16-nation space station will have a mass of 1 million pounds, be longer than a football field, and house up to seven astronauts and cosmonauts. Endeavour and its astronauts closed in Sunday to capture the first piece of the international space station, the Russian-made Zarya control module that had to be connected to the Unity chamber aboard the shuttle. A last-minute alarm forced NASA to halt Thursday’s launching of the space shuttle Endeavour, on a mission to start assembling the international space station. The 36-foot, 25,000-pound Unity will serve as a connecting passageway, or vestibule, for future modules.
Stack: The unmanned launch, set for 9:40 Moscow time 0640 GMT Friday from Baikonur in Kazakhstan, has been delayed for one year mainly because of the cash-strapped Russian space agency's failure to complete another part of the station. For the second time this week, Jerry Ross and James Newman floated out the hatch of the shuttle Endeavour to work on the seven-story, 35 ton station taking shape in the open cargo bay. The planned seven-hour spacewalk was not nearly as difficult or crucial as Monday night's outing, during which Ross and Newman hooked up 40 electrical connections between Zarya and Unity.

