807 - TEXT ANALYTICS Massimo Poesio Lecture 10: Summarization.


1 807 - TEXT ANALYTICS Massimo Poesio Lecture 10: Summarization

2 What is summarization? To take an information source, extract content from it, and present the most important content to the user in a condensed form, in a manner sensitive to the user's application needs.

3 Single-document summarization: Flu stopper. A new compound is set for human testing (Times). Runny nose. Raging fever. Aching joints. Splitting headache. Are there any poor souls suffering from the flu this winter who haven't longed for a pill to make it all go away? Relief may be in sight. Researchers at Gilead Sciences, a pharmaceutical company in Foster City, California, reported last week in the Journal of the American Chemical Society that they have discovered a compound that can stop the influenza virus from spreading in animals. Tests on humans are set for later this year. The new compound takes a novel approach to the familiar flu virus. It targets an enzyme, called neuraminidase, that the virus needs in order to scatter copies of itself throughout the body. This enzyme acts like a pair of molecular scissors that slices through the protective mucous linings of the nose and throat. After the virus infects the cells of the respiratory system and begins replicating, neuraminidase cuts the newly formed copies free to invade other cells. By blocking this enzyme, the new compound, dubbed GS 4104, prevents the infection from spreading.

5 Application: Headline news

6 Application: TV guides

7 Application: Abstracts of papers

8 Multi-document summarization Multi-document summarization (producing a single summary from a large number of news items) is a particularly popular application.


11 Human summarization and abstracting What professional abstractors do. Ashworth: "To take an original article, understand it and pack it neatly into a nutshell without loss of substance or clarity presents a challenge which many have felt worth taking up for the joys of achievement alone. These are the characteristics of an art form."

12 Cremmins 82, 96 Original version: "There were significant positive associations between the concentrations of the substance administered and mortality in rats and mice of both sexes. There was no convincing evidence to indicate that endrin ingestion induced any of the different types of tumors which were found in the treated animals." Edited version: "Mortality in rats and mice of both sexes was dose related. No treatment-related tumors were found in any of the animals."

13 Computational approach: basics Top-down: "I know what I want! Don't confuse me with drivel!" User needs: only certain types of info. System needs: particular criteria of interest, used to focus search. Bottom-up: "I'm dead curious: what's in the text?" User needs: anything that's important. System needs: generic importance metrics, used to rate content.

14 Query-driven vs. text-driven focus Top-down: query-driven focus – Criteria of interest encoded as search specs. – System uses specs to filter or analyze text portions. – Examples: templates with slots with semantic characteristics; term lists of important terms. Bottom-up: text-driven focus – Generic importance metrics encoded as strategies. – System applies strategies over a representation of the whole text. – Examples: degree of connectedness in semantic graphs; frequency of occurrence of tokens.

15 Paradigms: NLP/IE vs. IR/word level NLP/IE: Approach: try to understand the text; re-represent content using a deeper notation, then manipulate that. Need: rules for text analysis and manipulation, at all levels. Strengths: higher quality; supports abstracting. Weaknesses: speed; still needs to scale up to robust open-domain summarization. IR/word level: Approach: operate at the lexical level; use word frequency, collocation counts, etc. Need: large amounts of text. Strengths: robust; good for query-oriented summaries. Weaknesses: lower quality; inability to manipulate information at abstract levels.

16 Types of summaries Extracts – Sentences from the original document are displayed together to form a summary. Abstracts – Material is transformed: paraphrased, restructured, shortened.

17 Ideal stages of summarization Analysis – Input representation and understanding Transformation – Selecting important content Realization – Generating novel text corresponding to the gist of the input

18 What current systems do Most work bottom-up Typically use shallow analysis methods – Rather than full understanding Work by sentence extraction – Identify important sentences and piece them together to form a summary More advanced work: move towards more abstractive summarization

19 Shallow approaches Rely on features of the input documents that can be easily computed by statistical analysis: Word statistics Cue phrases Section headers Sentence position

20 What is the input? News, or clusters of news – a single article or several articles on a related topic Email and email threads Scientific articles Health information: for patients and doctors Meetings Video

21 What is the output? Keywords Highlighted information in the input Chunks of text or speech taken directly from the input, or text that paraphrases and aggregates the input in novel ways Modality: text, speech, video, graphics

22 Supervised methods Ask people to select sentences. Use these as training examples for machine learning. – Each sentence is represented as a number of features. – Based on the features, distinguish sentences that are appropriate for a summary from sentences that are not. Run on new inputs.

23 Edmundson 69 Cue method: – stigma words (hardly, impossible) – bonus words (significant) Key method: – similar to Luhn Title method: – title + headings Location method: – sentences under headings – sentences near beginning or end of document and/or paragraphs (also [Baxendale 58])

24 Edmundson 69 Linear combination of four features: α1·C + α2·K + α3·T + α4·L Manually labelled training corpus. Key not important! [Figure: performance (%) of Random, Key, Title, Cue, and Location methods and of the combinations C + K + T + L and C + T + L.]
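The linear combination above can be sketched as follows. The weights and feature values are illustrative, not Edmundson's actual parameters; the Key weight defaults to zero, reflecting the "Key not important" finding:

```python
# Sketch of Edmundson-style sentence scoring: a weighted linear
# combination of Cue, Key, Title, and Location feature scores.
# Weights (alphas) and feature values here are made up for illustration.

def edmundson_score(cue, key, title, location,
                    a1=1.0, a2=0.0, a3=1.0, a4=1.0):
    """Score = a1*C + a2*K + a3*T + a4*L.
    a2 defaults to 0, reflecting the finding that Key adds little."""
    return a1 * cue + a2 * key + a3 * title + a4 * location

# Rank two hypothetical sentences by their combined score.
s1 = edmundson_score(cue=0.8, key=0.2, title=1.0, location=1.0)
s2 = edmundson_score(cue=0.1, key=0.9, title=0.0, location=0.0)
```

Sentences are then ranked by this score and the top ones are extracted.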

25 Kupiec et al. 95 Extracts of roughly 20% of original text. Feature set: – sentence length (|S| > 5) – fixed phrases (26 manually chosen) – paragraph (sentence position in paragraph) – thematic words (presence of the most frequent content words; binary) – uppercase words (not common acronyms) Label: whether the sentence is included in the manual extract. Corpus: 188 document + summary pairs from scientific journals.

26 Kupiec et al. 95 Uses a Bayesian classifier: P(s ∈ S | F1, …, Fk) = P(F1, …, Fk | s ∈ S) · P(s ∈ S) / P(F1, …, Fk). Assuming statistical independence of the features: P(s ∈ S | F1, …, Fk) = Πj P(Fj | s ∈ S) · P(s ∈ S) / Πj P(Fj).
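A minimal sketch of such a naive Bayes scorer. The probability tables below are made up for illustration; the real system estimated them from the 188-document training corpus:

```python
# Sketch of a Kupiec-style naive Bayes extract classifier:
# P(s in S | F1..Fk) = prior * product_j P(Fj | s in S) / P(Fj),
# under the feature-independence assumption.

def naive_bayes_score(features, prior, likelihoods, evidence):
    """features: dict name -> observed value.
    likelihoods[name][value] = P(Fj = value | s in S)
    evidence[name][value]    = P(Fj = value)"""
    score = prior
    for name, value in features.items():
        score *= likelihoods[name][value] / evidence[name][value]
    return score

# Illustrative (made-up) probabilities for two binary features.
prior = 0.2                                   # P(s in S)
likelihoods = {"long": {True: 0.9, False: 0.1},
               "cue":  {True: 0.6, False: 0.4}}
evidence    = {"long": {True: 0.5, False: 0.5},
               "cue":  {True: 0.3, False: 0.7}}
score = naive_bayes_score({"long": True, "cue": True},
                          prior, likelihoods, evidence)
```

Sentences are ranked by this posterior and the top-scoring ones form the extract.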

27 Kupiec et al. 95 Performance: – For 25% summaries, 84% precision – For smaller summaries, 74% improvement over Lead

28 A typical modern supervised summarization system Or, what you could do if asked to do one …

29 Features Location – Absolute location of the sentence – Section structure: first sentence, last sentence, other – Paragraph structure What section the sentence appeared in – Introduction, implementation, example, conclusion, result, evaluation, experiment, etc.

30 More features Sentence length – Very long and very short sentences are unusual Title word overlap Tf.idf word content – Binary feature: yes if the sentence contains one of the 18 most important words, no otherwise

31 More features Presence and type of citation Formulaic expressions – "in traditional approaches", "a novel method for"


34 Important lessons Vector representation of sentences – Can be words – But can also be other features! The probability of a sentence belonging to a class can be computed. Complex distinctions can be accurately predicted using simple features.

35 Problems with supervised methods for summarization Annotation is expensive – Here: relevance and rhetorical status judgments People don't agree – So more annotators are necessary – And/or more training of the annotators

36 IR-based systems E.g., Steinberger

37 Unsupervised methods for (extractive) summarization: basic idea Compute word probability from input Compute sentence weight as function of word probability Pick best sentence

38 Sentence ranking options Based on word probability: Weight(S) = (P(w1) + … + P(wn)) / n – S is a sentence of length n – P(wi) is the probability of the i-th word in the sentence Alternatively, based on word tf.idf.
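The word-probability weighting can be sketched as follows; the tiny three-sentence input is made up for illustration:

```python
from collections import Counter

# Sketch of word-probability sentence weighting: P(w) is the word's
# relative frequency in the input; a sentence's weight is the mean
# probability of its words, Weight(S) = sum_i P(w_i) / n.

def sentence_weights(sentences):
    words = [w for s in sentences for w in s.lower().split()]
    counts = Counter(words)
    total = len(words)
    p = {w: c / total for w, c in counts.items()}
    weights = []
    for s in sentences:
        toks = s.lower().split()
        weights.append(sum(p[t] for t in toks) / len(toks))
    return weights

docs = ["the virus spreads", "the virus spreads quickly", "aspirin helps"]
w = sentence_weights(docs)
# the sentence made of the most frequent words ranks highest
best = docs[max(range(len(w)), key=w.__getitem__)]
```

Extraction then simply picks the highest-weighted sentences until the length budget is filled.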

39 Centrality measures How representative is a sentence of the overall content of a document? – The more similar a sentence is to the document, the more representative it is.
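A minimal sketch of this centrality idea, using bag-of-words cosine similarity between each sentence and the whole document (the example sentences are made up):

```python
import math
from collections import Counter

# Sketch of centrality scoring: represent the document and each
# sentence as bag-of-words count vectors and rank sentences by
# cosine similarity to the vector of the whole document.

def cosine(a, b):
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centrality(sentences):
    doc = Counter(w for s in sentences for w in s.lower().split())
    return [cosine(Counter(s.lower().split()), doc) for s in sentences]

sents = ["mars is cold", "mars weather is cold and dry",
         "unrelated words here"]
c = centrality(sents)
```

The off-topic sentence gets the lowest centrality, so it would be extracted last.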

40 Data-driven approach Unsupervised: no information about what constitutes a desirable choice. How can supervised approaches be used? – For example, the scientific article summarization paper from last week.

41 Beyond word-based sentence extraction Discourse information – Resolve anaphora, use text structure Use external lexical resources – WordNet, adjective polarity lists, opinion Use machine learning

42 The role of discourse structure Claim: the multi-sentence coherence structure of a text can be constructed, and the centrality of the textual units in this structure reflects their importance. Tree-like representation of texts in the style of Rhetorical Structure Theory (Mann and Thompson, 88). Use the discourse representation in order to determine the most important textual units. Attempts: – (Ono et al., 94) for Japanese. – (Marcu, 97) for English.

43 Rhetorical parsing (Marcu, 97) [With its distant orbit {– 50 percent farther from the sun than Earth –} and slim atmospheric blanket, 1 ] [Mars experiences frigid weather conditions. 2 ] [Surface temperatures typically average about –60 degrees Celsius (–76 degrees Fahrenheit) at the equator and can dip to –123 degrees C near the poles. 3 ] [Only the midday sun at tropical latitudes is warm enough to thaw ice on occasion, 4 ] [but any liquid water formed that way would evaporate almost instantly 5 ] [because of the low atmospheric pressure. 6 ] [Although the atmosphere holds a small amount of water, and water-ice clouds sometimes develop, 7 ] [most Martian weather involves blowing dust or carbon dioxide. 8 ] [Each winter, for example, a blizzard of frozen carbon dioxide rages over one pole, and a few meters of this dry-ice snow accumulate as previously frozen carbon dioxide evaporates from the opposite polar cap. 9 ] [Yet even on the summer pole, {where the sun remains in the sky all day long,} temperatures never warm enough to melt frozen water. 10 ]

44 Rhetorical parsing (3) [Figure: RST tree over the ten text units, with relations such as Background, Justification, Elaboration, Contrast, Evidence, Cause, Concession, Antithesis, and Example.] Summarization = selection of the most important units: 2 > 8 > 3, 10 > 1, 4, 5, 7, 9 > 6.

45 Argumentative zoning What is the purpose of the sentence? To communicate – Background – Aim – Basis (related work) How can we know which sentence serves each aim?

46 Argumentative zones


48 Selecting important sentences (relevance) How well can it be performed by people? – Rather subjective; depends on prior knowledge and interests. – Even the same person will select about 50% different sentences when performing the task at different times. – Still, judgments can be solicited from several people to mitigate the problem. Task: for each sentence in an article, say whether it is important and interesting enough to be included in a summary.

49 Multi-document summarization Very useful for presenting and organizing search results – Many results are very similar, and grouping closely related documents helps cover more event facets – Summarizing similarities and differences between documents

50 Standard approaches Salient information = similarities – Pairwise similarity between all sentences – Cluster sentences using the similarity score (themes) – Generate one sentence for each theme: sentence extraction (one sentence per cluster), or sentence fusion (intersect sentences within a theme, choose the repeated phrases, and generate a sentence from the phrases) Salient information = important words – Important words are simply the most frequent in the document set – SumBasic simply chooses sentences with the most frequent words; Conroy expands on this
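The SumBasic loop can be sketched as follows; the update step that squares the probabilities of already-chosen words is what discourages redundancy (the input sentences are made up):

```python
from collections import Counter

# Sketch of SumBasic: score sentences by average word probability,
# pick the best, then square the probabilities of the chosen words
# so repeated content is penalized on later picks.

def sumbasic(sentences, n_pick=2):
    words = [w for s in sentences for w in s.lower().split()]
    p = {w: c / len(words) for w, c in Counter(words).items()}
    summary = []
    pool = list(sentences)
    for _ in range(min(n_pick, len(pool))):
        best = max(pool,
                   key=lambda s: sum(p[w] for w in s.lower().split())
                                 / len(s.split()))
        summary.append(best)
        pool.remove(best)
        for w in best.lower().split():
            p[w] **= 2          # damp words already covered
    return summary

sents = ["the virus spreads", "the virus spreads quickly",
         "aspirin helps patients"]
summary = sumbasic(sents, 2)
```

After the first pick, "the virus spreads quickly" is no longer attractive because its words were damped, so the second pick covers new content.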

51 MEAD (Radev et al., 00) Centroid-based. Based on sentence utility. Builds on the Topic Detection and Tracking initiative [Allan et al., 98; Wayne, 98].

52 1. Algerian newspapers have reported that 18 decapitated bodies have been found by authorities in the south of the country. 2. Police found the ``decapitated bodies of women, children and old men,with their heads thrown on a road'' near the town of Jelfa, 275 kilometers (170 miles) south of the capital Algiers. 3. In another incident on Wednesday, seven people -- including six children -- were killed by terrorists, Algerian security forces said. 4. Extremist Muslim militants were responsible for the slaughter of the seven people in the province of Medea, 120 kilometers (74 miles) south of Algiers. 5. The killers also kidnapped three girls during the same attack, authorities said, and one of the girls was found wounded on a nearby road. 6. Meanwhile, the Algerian daily Le Matin today quoted Interior Minister Abdul Malik Silal as saying that ``terrorism has not been eradicated, but the movement of the terrorists has significantly declined.'' 7. Algerian violence has claimed the lives of more than 70,000 people since the army cancelled the 1992 general elections that Islamic parties were likely to win. 8. Mainstream Islamic groups, most of which are banned in the country, insist their members are not responsible for the violence against civilians. 9. Some Muslim groups have blamed the army, while others accuse ``foreign elements conspiring against Algeria. 1. Eighteen decapitated bodies have been found in a mass grave in northern Algeria, press reports said Thursday, adding that two shepherds were murdered earlier this week. 2. Security forces found the mass grave on Wednesday at Chbika, near Djelfa, 275 kilometers (170 miles) south of the capital. 3. It contained the bodies of people killed last year during a wedding ceremony, according to Le Quotidien Liberte. 4. The victims included women, children and old men. 5. Most of them had been decapitated and their heads thrown on a road, reported the Es Sahafa. 6. 
Another mass grave containing the bodies of around 10 people was discovered recently near Algiers, in the Eucalyptus district. 7. The two shepherds were killed Monday evening by a group of nine armed Islamists near the Moulay Slissen forest. 8. After being injured in a hail of automatic weapons fire, the pair were finished off with machete blows before being decapitated, Le Quotidien d'Oran reported. 9. Seven people, six of them children, were killed and two injured Wednesday by armed Islamists near Medea, 120 kilometers (75 miles) south of Algiers, security forces said. 10. The same day a parcel bomb explosion injured 17 people in Algiers itself. 11. Since early March, violence linked to armed Islamists has claimed more than 500 lives, according to press tallies. [ARTICLE 18854: ALGIERS, May 20 (UPI); ARTICLE 18853: ALGIERS, May 20 (AFP)]

53 MEAD INPUT: cluster of d documents with n sentences (compression rate = r). OUTPUT: the (n * r) sentences from the cluster with the highest values of SCORE: SCORE(si) = wc·Ci + wp·Pi + wf·Fi
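A minimal sketch of this scoring scheme; the feature values (centroid score Ci, position score Pi, first-sentence overlap Fi) and weights below are illustrative, not MEAD's trained parameters:

```python
# Sketch of the MEAD sentence score: a weighted sum of the centroid
# value C_i, the position value P_i, and the first-sentence overlap
# F_i. Weights and feature values are made up for illustration.

def mead_score(c, p, f, w_c=1.0, w_p=1.0, w_f=1.0):
    """SCORE(s_i) = w_c*C_i + w_p*P_i + w_f*F_i"""
    return w_c * c + w_p * p + w_f * f

def summarize(scored_sentences, rate):
    """Keep the top n*rate sentences by MEAD score.
    scored_sentences: list of (sentence, (C_i, P_i, F_i))."""
    k = max(1, int(len(scored_sentences) * rate))
    ranked = sorted(scored_sentences,
                    key=lambda sf: mead_score(*sf[1]), reverse=True)
    return [s for s, _ in ranked[:k]]
```

With a compression rate of 1/3 over three sentences, only the single highest-scoring sentence survives.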

54 Scientific article summarization Not only what the article is about, but also how it relates to the work it cites. Determine which approaches are criticized and which are supported. – Automatic genre-specific summaries are more useful than original paper abstracts.

55 Other uses Document indexing for information retrieval Automatic essay grading, topic identification module

56 Evaluating summarization: the problem Which human summary makes a good gold standard? Many summaries are good At what granularity is the comparison made? When can we say that two pieces of text match?

57 Evaluation Many measures for extractive summarization – E.g., ROUGE New ones for abstractive summarization – E.g., Pyramids

58 Radev: Cluster-Based Sentence Utility (CBSU) [Table: per-sentence utility judgments (S1–S10) for an ideal summary and two systems, with the sentences each summarizer selected marked (+).] CBSU(system, ideal) = % of ideal utility covered by the system summary.

59 Interjudge agreement

60 Relative utility RU = (total utility of the sentences the system selected) / (maximum utility achievable by selecting the same number of sentences)

61 Relative utility In the example, the maximum achievable utility is 17.

62 Relative utility RU = 13 / 17 = 0.765
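The relative utility computation can be sketched as follows; the per-sentence utility scores are illustrative, chosen so that the result reproduces the 13/17 = 0.765 example:

```python
# Sketch of relative utility: judges assign each sentence a utility
# score; a system summary's RU is the utility of its chosen sentences
# divided by the utility of the best possible choice of the same size.

def relative_utility(utilities, chosen):
    """utilities: per-sentence judge scores; chosen: indices picked."""
    got = sum(utilities[i] for i in chosen)
    best = sum(sorted(utilities, reverse=True)[:len(chosen)])
    return got / best if best else 0.0

# Made-up utilities: the system picked sentences worth 10 and 3,
# while the best pair available was worth 10 and 7.
ru = relative_utility([10, 7, 3, 2], [0, 2])
```

Unlike exact sentence matching, RU gives partial credit for picking a good-but-not-ideal sentence.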

63 ROUGE: Recall-Oriented Understudy for Gisting Evaluation N-gram co-occurrence metrics measuring content overlap: ROUGE-n = (count of n-gram overlaps between candidate and model summaries) / (total n-grams in the model summaries)
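The ROUGE-n computation can be sketched as clipped n-gram recall against the model (reference) summaries; this is a simplified sketch, not the official scorer:

```python
from collections import Counter

# Sketch of ROUGE-N: overlapping n-grams between the candidate and
# the reference summaries (clipped by reference counts), divided by
# the total number of n-grams in the references.

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n])
                   for i in range(len(tokens) - n + 1))

def rouge_n(candidate, references, n=1):
    cand = ngrams(candidate.lower().split(), n)
    overlap = total = 0
    for ref in references:
        r = ngrams(ref.lower().split(), n)
        overlap += sum(min(c, r[g]) for g, c in cand.items() if g in r)
        total += sum(r.values())
    return overlap / total if total else 0.0
```

Because the denominator counts reference n-grams, the measure is recall-oriented: a short candidate that misses reference content is penalized.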

64 ROUGE Experimentation with different units of comparison: unigrams, bigrams, longest common subsequence, skip-bigrams, basic elements. Automatic and thus easy to apply. Important to consider confidence intervals when determining differences between systems – Scores falling within the same interval are not significantly different – ROUGE scores place systems into large groups: it can be hard to definitively say one is better than another Sometimes results are unintuitive: – Multilingual scores as high as English scores – Use in speech summarization shows no discrimination Good for training regardless of intervals: can see trends

65 Pyramids Human evaluation of content: Nenkova & Passonneau (2004), based on the distribution of content in a pool of summaries. Summarization Content Units (SCUs): – fragments from summaries – identification of similar fragments across summaries: "13 sailors have been killed" ~ "rebels killed 13 people" SCUs have an id, a weight, a natural-language description, and a set of contributors. SCU1 (w=4) (all similar/identical content): – A1 - two Libyans indicted – B1 - two Libyans indicted – C1 - two Libyans accused – D2 - two Libyan suspects were indicted

66 Pyramids A pyramid of SCUs of height n is created for n gold-standard summaries. Each SCU in tier Ti of the pyramid has weight i, with highly weighted SCUs at the top (w=n at the top, down to w=1 at the bottom). The best summary is one which contains all units of level n, then all units from n-1, and so on. If Di is the number of SCUs in a summary which appear in tier Ti, then the weight of summary D is: D = Σi=1..n i · Di

67 Pyramid score Let X be the total number of units in a summary; the score divides the summary's observed weight D by the maximum weight achievable with X units. It is shown that more than 4 ideal summaries are required to produce reliable rankings.
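The pyramid score can be sketched as follows; the example pyramid (tier sizes and SCU weights) is made up for illustration:

```python
# Sketch of the pyramid score. Each SCU in tier T_i has weight i.
# A summary's observed weight is D = sum_i i * D_i, where D_i is the
# number of its SCUs found in tier T_i; the score divides D by the
# maximum weight achievable with the same number of SCUs.

def pyramid_score(scu_weights, tier_sizes):
    """scu_weights: weights of the SCUs found in the summary.
    tier_sizes[i] = number of SCUs with weight i+1 in the pyramid."""
    d = sum(scu_weights)
    x = len(scu_weights)                  # number of SCUs in the summary
    # enumerate every SCU weight available in the pyramid
    available = [w for w, size in enumerate(tier_sizes, start=1)
                 for _ in range(size)]
    # an ideal summary of size x takes the x heaviest SCUs
    max_d = sum(sorted(available, reverse=True)[:x])
    return d / max_d if max_d else 0.0
```

For a pyramid with two weight-1 SCUs, one weight-2 SCU, and one weight-3 SCU, a summary expressing SCUs of weights 3 and 1 scores 4/5 of the ideal.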

68 Human performance / best system
           Pyramid     Modified    Resp      ROUGE-SU4
Human      B: 0.5472   B: 0.4814   A: 4.895  A: 0.1722
Human      A: 0.4969   A: 0.4617   B: 4.526  B: 0.1552
Best sys   14: 0.2587  10: 0.2052  4: 2.85   15: 0.139
Best system ~50% of human performance on manual metrics. Best system ~80% of human performance on ROUGE.

69 Acknowledgments Many slides borrowed from Ani Nenkova (Penn), Drago Radev (University of Michigan) and Daniel Marcu (ISI).

