UNC-CH at DUC2007: Query Expansion, Lexical Simplification, and Sentence Selection Strategies for Multi-Document Summarization Catherine Blake Julia Kampov.


1 UNC-CH at DUC2007: Query Expansion, Lexical Simplification, and Sentence Selection Strategies for Multi-Document Summarization
Catherine Blake, Julia Kampov, Andreas Orphanides, David West, Cory Lown

2 Goals in 2007
- Get a system up and running
- Components
  - Query Expansion: WordNet
  - Lexical Compression: linguistically motivated pruning
  - Sentence Selection: clustering

3 System Architecture

4 Query Expansion - Approach
Goal: Increase responsiveness
Approach:
- A, Weak Baseline: any term in topic or query
- B, Baseline: remove stop words, including a small set of tailored terms
- C, Weak WordNet: WordNet synsets from terms in B
- D, WordNet: synsets from C + synonyms
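The four term sets can be sketched roughly as follows. This is a minimal illustration, not the UNC-CH implementation: the stop-word list, the tailored terms, and the `TOY_SYNONYMS` table are all hypothetical stand-ins (the slides do not list them), and the toy table replaces real WordNet lookups, so strategies C and D are collapsed into a single expansion step.

```python
# Hypothetical stop list; the slides do not give the actual words.
STOP_WORDS = {"the", "of", "in", "a", "an", "and", "to", "what", "are", "is"}
# Hypothetical "tailored terms": task words that carry no topical content.
TAILORED_TERMS = {"describe", "discuss", "identify"}

# Toy synonym table standing in for WordNet synset lookups (an assumption).
TOY_SYNONYMS = {
    "car": {"auto", "automobile"},
    "theft": {"larceny", "stealing"},
}


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,;:?!\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def strategy_a(topic, query):
    """A, weak baseline: any term in the topic or query."""
    return set(tokenize(topic)) | set(tokenize(query))


def strategy_b(topic, query):
    """B, baseline: A without stop words and tailored terms."""
    return strategy_a(topic, query) - STOP_WORDS - TAILORED_TERMS


def strategy_expanded(topic, query, synonyms=TOY_SYNONYMS):
    """Stand-in for C/D: B plus the synonyms found for each B term."""
    base = strategy_b(topic, query)
    expanded = set(base)
    for term in base:
        expanded |= synonyms.get(term, set())
    return expanded


if __name__ == "__main__":
    print(sorted(strategy_b("Car theft", "Describe the theft of a car.")))
    print(sorted(strategy_expanded("Car theft", "Describe the theft of a car.")))
```

Swapping the toy table for a real lexical resource would only change the `synonyms` argument; the set arithmetic stays the same.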

5 Query Expansion - Evaluation
- Query selection: rank 2006 queries by overall responsiveness
- Relevance: 3 annotators identified sentences with “information pertinent to the topic” for 9 topics
- For evaluation, a sentence was identified when a term from A, B, C, or D appeared in a gold standard sentence
- Inter-rater reliability
  - Topics 6 and 34 had fair to moderate agreement
  - Annotators reached consensus for topics 6 and 34
  - Annotators then reworked the other topics
  - Annotators didn’t know how the system would summarize text, but knew that the task was going to be automated
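The evaluation idea can be illustrated with two small helpers. Both are sketches under assumptions: `matched_gold_fraction` is one plausible way to count gold-standard sentences "identified" by a term set, and `cohen_kappa` is a pairwise agreement measure chosen here for illustration, since the slides do not say which reliability statistic was used for the three annotators.

```python
def matched_gold_fraction(gold_sentences, terms):
    """Fraction of gold-standard sentences containing at least one term."""
    if not gold_sentences:
        return 0.0
    hits = 0
    for sentence in gold_sentences:
        words = {w.strip(".,;:?!\"'()") for w in sentence.lower().split()}
        if words & terms:
            hits += 1
    return hits / len(gold_sentences)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' binary (0/1) sentence labels."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    p_a = sum(labels_a) / n  # share of sentences annotator A marked relevant
    p_b = sum(labels_b) / n  # share of sentences annotator B marked relevant
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # chance agreement
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    gold = ["Car theft rose.", "Police responded."]
    print(matched_gold_fraction(gold, {"car"}))
    print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
```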

6 Query Selection

7 Query Expansion – Evaluation

8 Lexical Simplification Decision: No WordNet Query Expansion

9 Lexical Simplification
Goal: Increase linguistic quality
Approach:
- Representation type: Dependency Tree (de Marneffe et al., 2006)
- Stanford Parser Version 1.5 (Klein & Manning, 2002; 2003)
- Identify short, stand-alone sentences
- Prune both original and short sentences using
  - Parser tags
  - Cue phrases identified in previous DUC submissions

10 Short Stand-Alone Sentences Sub-Sentences

11 Pruning
- Noun Appositive: "For nearly a decade, Queen Latifah, the first lady of hip-hop, has been bobbing and weaving questions about …"
- Participial Modifier: "Indeed, some people reading this report could get the impression that Amnesty believes violence can be a legitimate instrument, the statement said"

12 Pruning
- Lead Adverbials: 15 cue phrases from previous DUCs
- Attribution
  - Parser tags
  - Cue phrases: said, according
Example: "Separately, the report said that the murder rate by Indians in 1996 was 4 per 100,000, below the national average …"
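A rough string-level sketch of lead-adverbial and attribution pruning is shown below. It is only an approximation: the system described here relied on parser tags as well as cue phrases, whereas this sketch uses regular expressions alone. The `LEAD_ADVERBIALS` tuple is a hypothetical cue list (the 15 actual cue phrases are not given in the slides; "separately" and "indeed" are taken from the example sentences), and both attribution patterns are assumptions.

```python
import re

# Hypothetical cue list drawn from the example sentences on the slides.
LEAD_ADVERBIALS = ("separately", "indeed")

# Leading attribution such as "the report said that ..." (assumed pattern).
LEADING_ATTRIBUTION = re.compile(
    r"^(?:[\w'-]+\s+){1,4}?said\s+that\s+", re.IGNORECASE
)
# Trailing attribution such as "..., the statement said" (assumed pattern).
TRAILING_ATTRIBUTION = re.compile(
    r",\s+(?:[\w'-]+\s+){1,4}?said\.?$", re.IGNORECASE
)


def prune(sentence):
    """Drop a leading cue adverbial and simple 'said' attributions."""
    text = sentence.strip()
    for cue in LEAD_ADVERBIALS:
        prefix = cue + ","
        if text.lower().startswith(prefix):
            text = text[len(prefix):].lstrip()
            break
    text = LEADING_ATTRIBUTION.sub("", text)
    text = TRAILING_ATTRIBUTION.sub("", text)
    # Re-capitalize, since the new first word was mid-sentence before pruning.
    return text[:1].upper() + text[1:]


if __name__ == "__main__":
    print(prune("Separately, the report said that the murder rate was "
                "below the national average"))
    print(prune("Indeed, some people could get the wrong impression, "
                "the statement said"))
```

Regular expressions like these will misfire on sentences where "said" is the main verb of the content itself, which is one reason a parser-based approach is the safer choice in practice.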

13 Lexical Simplification

14 Sentence Selection - Settings
- No WordNet query expansion; original + base form
- Percentage of Topic/Query Terms = (num stemmed terms in query) / (num stemmed terms in sentence)
- Percentage of Unique Terms = (num stemmed terms in new sentence that are not in selected sentences) / (num stemmed terms in sentence)
- Weighted Term Frequency * IDF
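The two percentage features can be sketched as below. This is one reading of the slide, not a confirmed definition: the numerator of the first feature is taken to be the sentence tokens that also occur among the topic/query terms, tokens are assumed to be stemmed already, and every token occurrence is counted. The function names are made up for illustration.

```python
def pct_topic_terms(sentence_stems, query_stems):
    """Share of the sentence's stemmed tokens that occur in the topic/query."""
    if not sentence_stems:
        return 0.0
    matches = sum(1 for stem in sentence_stems if stem in query_stems)
    return matches / len(sentence_stems)


def pct_new_terms(sentence_stems, selected_stems):
    """Share of the sentence's stemmed tokens not yet in selected sentences."""
    if not sentence_stems:
        return 0.0
    new = sum(1 for stem in sentence_stems if stem not in selected_stems)
    return new / len(sentence_stems)


if __name__ == "__main__":
    sentence = ["murder", "rate", "fall"]
    print(pct_topic_terms(sentence, {"murder", "rate"}))
    print(pct_new_terms(sentence, {"murder"}))
```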

15 Sentence Selection - Settings
Weighted Term Frequency (tottf)

Feature                      Weight
Stopword or punctuation      0
Topic/Query ∧ ¬Summary       1
Topic/Query ∧ Summary        0.5
¬Topic/Query ∧ ¬Summary      0.01
¬Topic/Query ∧ Summary       0.001
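The weight table translates directly into a lookup, sketched below. The weights come from the slide; how they are aggregated does not, so summing the weight of every token in a sentence is an assumption, as are the `STOP_WORDS` list and the function names.

```python
import string

# Hypothetical stop list; the slides do not give the actual words.
STOP_WORDS = {"the", "of", "in", "a", "an", "and", "to", "is", "was"}


def term_weight(term, query_terms, summary_terms):
    """Weight of one term, following the feature/weight table."""
    if term in STOP_WORDS or all(ch in string.punctuation for ch in term):
        return 0.0
    in_query = term in query_terms
    in_summary = term in summary_terms  # already covered by the summary so far
    if in_query:
        return 0.5 if in_summary else 1.0
    return 0.001 if in_summary else 0.01


def tottf(tokens, query_terms, summary_terms):
    """Assumed aggregation: sum the weights of all tokens in the sentence."""
    return sum(term_weight(tok, query_terms, summary_terms) for tok in tokens)


if __name__ == "__main__":
    tokens = ["the", "murder", "rate", "fell", "."]
    print(tottf(tokens, {"murder", "rate"}, {"rate"}))
```

Halving the weight of query terms that the summary already contains rewards sentences that add new query-relevant material rather than repeating it.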

16 Sentence Selection
- Clustering
  - Oracle clustering tool
  - K-means, 1000 iterations
  - Removed determiners, prepositions, etc.
- Favor
  - Sentences from different clusters
  - Popular clusters, i.e. lots of sentences
  - How representative the sentence is of the cluster
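The three preferences above can be sketched once cluster labels are available. The clustering itself is not reproduced here; the labels and per-sentence representativeness values are assumed to come from an external K-means run. The cluster-weight formula (cluster share times representativeness) and the two-pass selection are illustrative assumptions, since the slides do not define how the cluster weight was computed.

```python
from collections import Counter


def cluster_weights(labels, representativeness):
    """Assumed CW: share of sentences in the cluster * representativeness."""
    sizes = Counter(labels)
    total = len(labels)
    return [sizes[lab] / total * rep
            for lab, rep in zip(labels, representativeness)]


def select(scores, labels, k):
    """Pick up to k sentence indices, covering distinct clusters first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen, used_clusters = [], set()
    # First pass: the best-scoring sentence from each unseen cluster.
    for i in order:
        if len(chosen) == k:
            break
        if labels[i] not in used_clusters:
            chosen.append(i)
            used_clusters.add(labels[i])
    # Second pass: fill any remaining slots by score alone.
    for i in order:
        if len(chosen) == k:
            break
        if i not in chosen:
            chosen.append(i)
    return chosen


if __name__ == "__main__":
    labels = [0, 0, 0, 1]
    weights = cluster_weights(labels, [0.9, 0.8, 0.5, 0.6])
    print(weights)
    print(select(weights, labels, 2))
```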

17 Sentence Selection – Evaluation (ROUGE-1 Score)

ID  Description               DUC06   DUC07
I   tottf/numWdSent * CW      0.3981  0.4212
F   %WdTopic * CW             0.3979  0.4171
E   tottf * CW                0.3977  0.4183
B   CW                        0.3947  0.4169
D   Tfidf                     0.3912  0.4086
G   tottf                     0.3904  0.4109
H   tottf/numWdSent           0.3754  0.3913
A   %WdTopic + %WdNew + CW    0.3749  0.3963
C   %WdTopic + %WdNew         0.3623  0.3786

18 Sentence Selection – Evaluation

19 Official DUC 2007 Evaluation
UNC-CH = System 22
- Automatic Evaluation: ROUGE-2 score 0.10329 (13th)
- Manual Evaluation
  - Responsiveness = 2.956 (7th)
  - Linguistic Quality = 2.987 (24th)

20 What we have learned so far
- Sentence selection
  - Optimal strategy: weighted term frequency / sentence length * cluster weight
  - Clustering really helps
- Lexical simplification
  - Rework sub-sentences
  - Pronoun resolution
- Query expansion had negligible effect
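The best-performing score is simple arithmetic, sketched below with made-up numbers. The function name and the candidate tuples are hypothetical; the weighted term frequency and cluster weight are taken as given inputs, and sentence length is assumed to be a word count.

```python
def selection_score(weighted_tf, num_words, cluster_weight):
    """Weighted term frequency / sentence length * cluster weight."""
    if num_words == 0:
        return 0.0
    return weighted_tf / num_words * cluster_weight


if __name__ == "__main__":
    # (sentence id, weighted tf, word count, cluster weight): made-up values.
    candidates = [("s1", 3.0, 10, 0.5), ("s2", 2.0, 5, 0.5), ("s3", 4.0, 20, 0.9)]
    ranked = sorted(candidates,
                    key=lambda c: selection_score(c[1], c[2], c[3]),
                    reverse=True)
    print([c[0] for c in ranked])
```

Dividing by sentence length keeps long sentences from winning purely because they contain more terms.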

21 Next Steps
- Alternative Query Expansion
  - Error analysis of medical questions underway
  - Concept representation: Unified Medical Language System (UMLS)
- Tune sentence selection strategy
- Lexical simplification
  - Rework sub-sentences
  - Add basic pronoun resolution
- Sentence Re-Ordering
  - Combine with lexical simplification

22 Acknowledgements
- The organizers for running this conference and providing manual summaries
- Previous DUC paper authors for making their system designs explicit
- Monica Sanchez and Stephanie Haas for earlier discussions
- Thom Hailey, Scott Krauss and Toshiba Burns-Johnson for annotating queries

