
2 Automated Text Summarization – IJCNLP-2005 Tutorial Chin-Yew Lin Natural Language Group Information Sciences Institute University of Southern California cyl@isi.edu http://www.isi.edu/~cyl

3 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 How Much Information? (2003) Print, film, magnetic, and optical storage media produced about 5 exabytes (10^18 bytes) of new information in 2002. 92% of the new information was stored on magnetic media, mostly in hard disks. –If digitized with full formatting, the 17 million books in the Library of Congress contain about 136 terabytes of information; –5 exabytes of information is equivalent in size to the information contained in 37,000 new libraries the size of the Library of Congress book collections. The amount of new information stored on paper, film, magnetic, and optical media has about doubled during 1999 – 2002. Information flows through electronic channels – telephone, radio, TV, and the Internet – contained almost 18 exabytes of new information in 2002, three and a half times more than is recorded in storage media (98% are telephone calls). *Source: How Much Information? 2003 *http://www.sims.berkeley.edu/research/projects/how-much-info-2003/execsum.htm#summary

4 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Table of contents 1. Motivation and Introduction. 2. Approaches and paradigms. 3. Summarization methods. 4. Evaluating summaries. 5. The future.

5 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Which School is Better?

6 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Is this Shoe Good or Bad?

7 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 What’s in the News?

8 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 How to Get from A to B?

9 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Get a Job! *About.com

10 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Which Movie for Friday Night?

11 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Browse a Summarized Web * Stanford PowerBrowser Project, Buyukkokten et al. WWW10, 2001

12 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 The Big & Bigger Search Engines Billions Of Textual Documents Indexed (6, 2002 - 9, 2003) “As those of you who follow this blog know, I recently posted a weather report alerting you to material changes to our index. Since that post, we've seen some discussion from webmasters who have noticed more of their documents in our index. As it turns out we have grown our index and just reached a significant milestone at Yahoo! Search — our index now provides access to over 20 billion items. While we typically don't disclose size (since we've always said that size is only one dimension of the quality of a search engine), for those who are curious this update includes just over 19.2 billion web documents, 1.6 billion images, and over 50 million audio and video files. Note that as with all index updates we are still tuning things so you'll continue to see some fluctuation in ranking over the next few weeks.” 8/8/2005 Yahoo! Search Blog

13 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 The Bottom Line

14 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Solution: Summarization Informed decision through summarization What is summarization? –A summary is a concise restatement of the topic and main ideas of its source. Concise – giving a lot of information clearly and in a few words. (Oxford American Dictionary) Restatement – in your own words. Topic – what is the source about? Main ideas – important facts or arguments about the topic.

15 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Kinds of Summaries Based on two different perspectives –The author’s point of view Paraphrase summary (indicative vs. informative) –In the fall of 1978, Fran Tate wrote some hot checks to finance the opening of a Mexican restaurant in Barrow, Alaska. All the banks she had applied to for financing had turned her down. But Fran was sure that Barrow lusted for a Mexican restaurant, so she went ahead with her plans. (“In Alaska: Where the Chili is Chilly”, by Gregory Jaynes) –The summary writer’s point of view Analytical summary –For Gregory Jaynes, Fran Tate embodies the astonishing independence and toughness that is typical of Alaska at its best. Tate has opened a Mexican restaurant – an unlikely enterprise – in Barrow, Alaska. — From “Reaching Across the Curriculum”, © 1997, Eva Thury and Carl Drott.

16 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 How to Summarize? Topic Identification –Find the most important information – the topic. The Egyptian civilization was one of the most important cultures of the ancient world. This civilization flourished along the rich banks and delta of the Nile River for many centuries, from 3200 B.C. until the Roman conquest in 30 B.C. … * –Find main ideas that support the topic and show how they are related. Among the many contributions made by the Egyptian culture is the hieroglyphic writing system they invented. This is one of the earliest writing systems known and it was used from about 3200 B.C. until 350 A.D. Hieroglyphics are a form of picture writing, in which each symbol stands for either a single object or a single sound. The symbols can be combined into long strings to form words and sentences. Other cultures, such as the Hittites, the Cretans, and the Mayans, also developed picture writing, but these systems are not related to the Egyptian system, nor to one another.* Topic Interpretation (I) –Combine several main ideas into a single sentence. Among the many contributions made by the Egyptian culture is the hieroglyphic writing system they invented. This is one of the earliest writing systems known and it was used from about 3200 B.C. until 350 A.D. Hieroglyphics are a form of picture writing, in which each symbol stands for either a single object or a single sound. The symbols can be combined into long strings to form words and sentences. Other cultures, such as the Hittites, the Cretans, and the Mayans, also developed picture writing, but these systems are not related to the Egyptian system, nor to one another.* → The Egyptians invented hieroglyphics, a form of picture writing, which is one of the earliest writing systems known.* (* Examples adapted from Summary Street ® http://lsa.colorado.edu/summarystreet)

17 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 How to Summarize? (cont.) Topic Interpretation (II) –Substitute a general form for lists of items or events. List of items: John bought some milk, bread, fruit, cheese, potato chips, butter, hamburger meat and buns.* → John bought some groceries.* List of events: A lot of children came and brought presents. They played games and blew bubbles at each other. A magician came and showed them some magic. Later Jennifer opened her presents and blew out the candles on her cake.* → Jennifer had a birthday party.* –Remove trivial and redundant information We have gotten word of an enormous deal in the publishing deal. Hillary Clinton, senator-elect has accepted an $8 million dollars advance to publish her book. That is a figure, surpassed only by the $8 1/2 million the pope received for his book. Senator-elect Hillary Rodham Clinton Friday night agreed to sell Simon & Schuster a memoir of her years as first lady, for the near-record advance of about $8 million. → Senator-elect Hillary Rodham Clinton agreed to sell Simon & Schuster a memoir of her years as First Lady, for the near-record advance of about $8 million. (* Examples adapted from Summary Street ® http://lsa.colorado.edu/summarystreet) Generation (Writing Summaries)

18 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Aspects of Summarization Input (Sparck Jones 97,Hovy and Lin 99) –Single-document vs. multi-document...fuse together texts? –Domain-specific vs. general...use domain-specific techniques? –Genre...use genre-specific (newspaper, report…) techniques? –Scale and form…input large or small? Structured or free-form? –Monolingual vs. multilingual...need to cross language barrier? Purpose –Situation...embedded in larger system (MT, IR) or not? –Generic vs. query-oriented...author’s view or user’s interest? –Indicative vs. informative...categorization or understanding? –Background vs. just-the-news…does user have prior knowledge? Output –Extract vs. abstract...use text fragments or re-phrase content? –Domain-specific vs. general...use domain-specific format? –Style…make informative, indicative, aggregative, analytic...

19 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Table of contents 1. Motivation and Introduction. 2. Approaches and paradigms. 3. Summarization methods. 4. Evaluating summaries. 5. The future.

20 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Two Psycholinguistic Studies Coarse-grained summarization protocols from professional summarizers (Kintsch and van Dijk, 78) : –Delete material that is trivial or redundant. –Use superordinate concepts and actions. –Select or invent topic sentence. 552 finely-grained summarization strategies from professional summarizers (Endres-Niggemeyer, 98) : –Self control: make yourself feel comfortable. –Processing: produce a unit as soon as you have enough data. –Info organization: use “Discussion” section to check results. –Content selection: the table of contents is relevant.

21 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Computational Approach: Basics Top-Down: I know what I want! — don’t confuse me with nonsense! User wants only certain types of info. System needs particular criteria of interest, used to focus search. Bottom-Up: I’m dead curious: what’s in the text? User wants anything that’s important. System needs generic importance metrics, used to rate content.

22 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Typical 3 Stages of Summarization 1. Topic Identification: find/extract the most important material 2. Topic Interpretation: compress it 3. Summary Generation: say it in your own words …as easy as that!

23 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Top-Down: Info. Extraction (IE) IE task: Given a form and a text, find all the information relevant to each slot of the form and fill it in. Summ-IE task: Given a query, select the best form, fill it in, and generate the contents. Questions: 1. IE works only for very particular forms; can it scale up? 2. What about info that doesn’t fit into any form—is this a generic limitation of IE?

24 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Bottom-Up: Info. Retrieval (IR) IR task: Given a query, find the relevant document(s) from a large set of documents. Summ-IR task: Given a query, find the relevant passage(s) from a set of passages (i.e., from one or more documents). Questions: 1. IR techniques work on large volumes of data; can they scale down accurately enough? 2. IR works on words; do abstracts require abstract representations?

25 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Paradigms: IE vs. IR IE: Approach: try to ‘understand’ text—transform content into ‘deeper’ notation; then manipulate that. Need: rules for text analysis and manipulation, at all levels. Strengths: higher quality; supports abstracting. Weaknesses: speed; still needs to scale up to robust open-domain summarization. IR: Approach: operate at word level—use word frequency, collocation counts, etc. Need: large amounts of text. Strengths: robust; good for query-oriented summaries. Weaknesses: lower quality; inability to manipulate information at abstract levels.

26 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Deep and Shallow, Down and Up... Today: Increasingly, techniques hybridize: people use word-level counting techniques to fill IE forms’ slots, and try to use IE-like discourse and quasi-semantic notions in the IR approach. Thus: You can use either deep or shallow paradigms for either top-down or bottom-up approaches!

27 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Toward the Final Answer... Problem: What if neither IR-like nor IE-like methods work? Solution: –semantic analysis of the text (NLP), –using adequate knowledge bases that support inference (AI). Mrs. Coolidge: “What did the preacher preach about?” Coolidge: “Sin.” Mrs. Coolidge: “What did he say?” Coolidge: “He’s against it.” –sometimes counting and forms are insufficient, –and then you need to do inference to understand. (Figure: a scale ranging from word counting to inference.)

28 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 The Optimal Solution... Combine strengths of both paradigms…...use IE/NLP when you have suitable form(s),...use IR when you don’t… …but how exactly to do it?

29 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Table of contents 1. Motivation and Introduction. 2. Approaches and paradigms. 3. Summarization methods. 4. Evaluating summaries. 5. The future.

30 Topic Identification Classic Methods

31 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Overview of Identification Methods General method: score each sentence; combine scores; choose best sentence(s) Scoring techniques: –Position in the text: lead method; optimal position policy; title/heading method –Cue phrases: in sentences –Term frequencies: throughout the text –Cohesion: semantic links among words; lexical chains; coreference; word co-occurrence –Discourse structure of the text –Information Extraction: parsing and analysis

32 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Position-Based Method Claim: Important sentences occur at the beginning (and/or end) of texts. Lead method: just take first sentence(s)! Experiments: –In 85% of 200 individual paragraphs the topic sentences occurred in initial position and in 7% in final position (Baxendale 58). –Only 13% of the paragraphs of contemporary writers start with topic sentences (Donlan 80). –First 100-word baseline achieved 80% (B: 32.24% vs. H: 40.23% in average coverage) of human performance in DUC 2001 single-doc sum task.
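To make the lead baseline concrete, here is a minimal sketch (not from the tutorial; the function name and word budget are illustrative) that simply takes leading sentences until a word budget is exhausted:

```python
def lead_baseline(sentences, max_words=100):
    """Return the leading sentences of a document, up to a word budget."""
    summary, count = [], 0
    for sent in sentences:
        words = sent.split()
        if summary and count + len(words) > max_words:
            break
        summary.append(sent)
        count += len(words)
    return " ".join(summary)

# Toy usage: a "first N words" summary of a three-sentence document.
doc = ["The first sentence states the topic.",
       "The second sentence adds supporting detail.",
       "The third sentence digresses."]
print(lead_baseline(doc, max_words=12))
```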

33 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Optimum Position Policy (OPP) Claim: Important sentences are located at positions that are genre-dependent; these positions can be determined automatically through training: –Corpus: 13,000 newspaper articles (ZIFF corpus). –Step 1: For each article, enumerate sentence positions (counting both forward from the start and backward from the end). –Step 2: For each sentence, determine yield (= overlap between sentences and the index terms for the article). –Step 3: Create partial ordering over the locations where sentences containing important words occur: Optimal Position Policy (OPP). (Lin and Hovy, 97) OPP is genre dependent –For example: lead sentence is a very good baseline in news genre.
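A hedged sketch of the training step above, assuming each article is given as a list of sentence word-sets plus its index terms; the real OPP ranks (paragraph, sentence) positions and produces a partial ordering rather than the flat ranking shown here:

```python
from collections import defaultdict

def opp_ranking(articles):
    """Rank sentence positions by their average yield over a training corpus.
    articles: list of (sentences, index_terms), where sentences is a list of
    word sets and index_terms is the set of topic keywords for that article."""
    yields = defaultdict(list)
    for sentences, index_terms in articles:
        for pos, words in enumerate(sentences):
            yields[pos].append(len(words & index_terms))  # yield = overlap
    return sorted(yields, key=lambda p: -(sum(yields[p]) / len(yields[p])))
```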

34 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 OPP (cont.) –OPP for ZIFF corpus: (T) > (P2,S1) > (P3,S1) > (P2,S2) > {(P4,S1),(P5,S1),(P3,S2)} > … (T=title; P=paragraph; S=sentence) –OPP for Wall Street Journal: (T) > (P1,S1) > ... –Results: testing corpus of 2,900 articles: Recall=35%, Precision=38%. –Results: 10%-extracts cover 91% of the salient words.

35 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Position: Title-Based Method Claim: Words in titles and headings are positively relevant to summarization. Shown to be statistically valid at 99% level of significance (Edmundson, 68). Empirically shown to be useful in summarization systems.
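A minimal illustration of using title words as a scoring signal; the stopword list and scoring function below are illustrative, not taken from Edmundson's experiments:

```python
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "for"}

def title_overlap_score(sentence, title):
    """Count how many content words of the title appear in the sentence."""
    title_words = {w.lower() for w in title.split() if w.lower() not in STOPWORDS}
    sent_words = {w.lower().strip(".,;:!?") for w in sentence.split()}
    return len(title_words & sent_words)

print(title_overlap_score("Pinochet was arrested in London.", "The Arrest of Pinochet"))  # 2
```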

36 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Examples: DUC 2003 Topic 30003

37 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Examples: Yahoo! News

38 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Cue-Phrase-Based Method Claim 1: Important sentences contain ‘bonus phrases’, such as significantly, In this paper we show, and In conclusion, while non-important sentences contain ‘stigma phrases’ such as hardly and impossible. Claim 2: These phrases can be compiled semi-automatically (Edmundson 69; Paice 80; Kupiec et al. 95; Teufel and Moens 97). Method: Add to sentence score if it contains a bonus phrase, penalize if it contains a stigma phrase.
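A hedged sketch of the scoring rule; the bonus and stigma lists below are tiny illustrative samples, whereas the cited work compiled much larger lists (semi-)automatically:

```python
BONUS_PHRASES = ["in this paper we show", "in conclusion", "significantly"]
STIGMA_PHRASES = ["hardly", "impossible"]

def cue_phrase_score(sentence, bonus_weight=1.0, stigma_weight=1.0):
    """Add for each bonus phrase found, subtract for each stigma phrase."""
    s = sentence.lower()
    score = sum(bonus_weight for p in BONUS_PHRASES if p in s)
    score -= sum(stigma_weight for p in STIGMA_PHRASES if p in s)
    return score

print(cue_phrase_score("In conclusion, the effect is hardly visible."))  # +1 bonus, -1 stigma = 0.0
```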

39 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Examples: “In this paper, we …” — From WAS 2004 Workshop papers

40 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Term Significance: TF vs. TFIDF Term Rankings DUC 2003 Topic 30003 Top 30 Normalized TF & TFIDF Terms TF: Term Frequency IDF: Inverse document frequency TFIDF: TF*log(N/DF) N: # of docs in a training corpus DF: # of docs in which term i occurs
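A small sketch of the slide's TFIDF weighting, TF*log(N/DF), with normalized term frequency; the fallback for unseen terms is an assumption, not part of the slide:

```python
import math
from collections import Counter

def tfidf_weights(term_counts, doc_freq, num_docs):
    """term_counts: Counter over the text being summarized.
    doc_freq: term -> number of training documents containing the term.
    Returns term -> normalized_tf * log(num_docs / df)."""
    total = sum(term_counts.values())
    return {t: (c / total) * math.log(num_docs / doc_freq.get(t, 1))
            for t, c in term_counts.items()}

counts = Counter("pinochet arrest arrest immunity".split())
print(tfidf_weights(counts, {"pinochet": 20, "arrest": 300, "immunity": 150}, num_docs=30000))
```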

41 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Term Significance: Term Frequency-Based Method (Luhn 59)

42 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Sentence Significance: Luhn’s Method (Luhn 59). A sentence is scored by its best bracket of consecutive words delimited by significant words; in the slide’s example, a 7-word bracket containing 4 significant words scores 4^2/7 ≈ 2.286.
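A hedged reconstruction of Luhn's sentence significance factor as usually described (significant-word brackets with a bounded gap; the gap limit of 4 non-significant words is an assumption): the score of a bracket is the squared number of significant words divided by the bracket length.

```python
def luhn_score(words, significant, max_gap=4):
    """Best bracket score: brackets start and end at significant words and may
    contain gaps of at most max_gap non-significant words between them."""
    best, i, n = 0.0, 0, len(words)
    while i < n:
        if words[i] in significant:
            j, last_sig, count = i, i, 0
            while j < n and j - last_sig <= max_gap:
                if words[j] in significant:
                    last_sig, count = j, count + 1
                j += 1
            best = max(best, count * count / (last_sig - i + 1))
            i = last_sig + 1
        else:
            i += 1
    return best

# 4 significant words (S) in a 7-word bracket -> 16/7 ≈ 2.286, as on the slide.
print(round(luhn_score(["S", "w", "S", "S", "w", "w", "S"], {"S"}), 3))
```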

43 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Web Search: Text Snippet Significance

44 Topic Identification Cohesion-Based

45 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Cohesion-Based Methods Claim: Important sentences/paragraphs are the highest connected entities in more or less elaborate semantic structures. Cohesion: Semantic relations that make the text hang together and provide the texture of the text. Types of Cohesion –Grammatical Reference: anaphora, cataphora, and exophora. Substitution: nominal (one, ones; same), verbal (do), and clausal (so, not). Ellipsis: substitution by zero; something left unsaid but understood. Conjunction: additive, adversative, causal, and temporal. –Lexical Lexical Cohesion: reiteration (repetition, synonym, hypernym/hyponym, general word) and collocation. –Please see “Cohesion in English”, by Halliday & Hasan for more details. (Related techniques: anaphora resolution, discourse parsing, lexical chains.)

46 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Cohesion Examples The Cat only grinned when it saw Alice. ‘Come, it’s pleased so far,’ thought Alice, and she went on. ‘Would you tell me, please, which way I ought to go from here?’ ‘That depends a good deal on where you want to get to,’ said the Cat. ‘I don’t much care where —’ said Alice. ‘Then it doesn’t matter which way you go,’ said the Cat. ‘— so long as I get somewhere,’ Alice added as an explanation. ‘Oh, you’re sure to do that,’ said the Cat, ‘if you only walk long enough.’ From: “Alice's Adventures in Wonderland” by Lewis Carroll — Example taken from: “Cohesion in English” by Halliday & Hasan 1976

47 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Cohesion-Based Semantic Network (Skorochod’ko 1971) Sentences are linked by semantic relations. Sentence significance, Fi, is determined by: –the degree (# of edges) of a node. –the total number of nodes in a semantic network. –the maximum number of connected nodes after removal of node i. Word significance is determined by the same formula but nodes represent words instead of sentences. Summaries are generated by first selecting the most significant sentences then compressing by removing the least significant words. (Figure: example network topologies: chain, ring, monolith, piecewise.)

48 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Cohesion: Lexical Chains (1) Lexical chains: sequences of semantically related words linked together by lexical cohesion. Morris & Hirst (91) proposed the first computational model of lexical chains. Lexical cohesion was defined based on categories, index entries, and pointers in Roget’s thesaurus (e.g., synonyms, antonyms, hyper-/hyponyms, and mero-/holonyms). Not an automatic algorithm. Covers over 90% of intuitive lexical relations. Automated later by Hirst & St-Onge (98) using WordNet. Lexical chains were first applied to summarization by Barzilay & Elhadad (B&E 97). Similar to Hirst & St-Onge (98) but differed in: Use a POS tagger and a shallow parser to identify noun compounds vs. use straight WordNet nouns. Identify chains in each text segment, then connect only strong chains across segments. Try to link words to all possible chains, then choose the best chain at the end vs. just link to the best currently available chain.

49 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Lexical Chains-Based Method (2) Chain score is computed as follows (B&E 97): Score(C) = Length(C) - #DistinctOccurrences(C) Strong chain criterion: Score(C) > Average(Scores)+2*SD(Scores) Assumes that important sentences are those that are ‘traversed’ by strong chains. –For each chain, choose the first sentence that is traversed by the chain and that uses a representative set of concepts from that chain.
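A minimal sketch of the B&E 97 chain scoring and strong-chain criterion quoted above, with chains represented simply as lists of member word occurrences; building the chains themselves via WordNet relations is the hard part and is not shown:

```python
import statistics

def chain_score(chain):
    """Score(C) = Length(C) - #DistinctOccurrences(C); chain is a list of words."""
    return len(chain) - len(set(chain))

def strong_chains(chains):
    """Keep chains whose score exceeds Average(Scores) + 2 * SD(Scores)."""
    scores = [chain_score(c) for c in chains]
    threshold = statistics.mean(scores) + 2 * statistics.pstdev(scores)
    return [c for c, s in zip(chains, scores) if s > threshold]

chains = [["Pinochet", "senator", "Pinochet", "Pinochet", "senator"],
          ["immunity", "immunity"], ["police", "magistrate"]]
print(strong_chains(chains))  # only the first chain passes the threshold
```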

50 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Lexical Chains Examples (DUC 2003 APW19981017.0477)

51 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Cohesion: Coreference-Based Method Build coreference chains (noun/event identity, part-whole relations) between –query and document –title and document –sentences within document Important sentences are those traversed by a large number of chains: –a preference is imposed on chains (query > title > doc) Evaluation: 67% F-score for relevance (SUMMAC, 98). (Baldwin and Morton, 98)

52 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Cohesion: Graph-Based Method (1) Map texts into graphs: –The nodes of the graph are the words of the text. –Arcs represent same, adjacency, phrase, name, coreference, and alpha link (synonym and hypernym) relations. Seed the text graphs with word TFIDF weights. Given a query (or topic), a spreading activation algorithm is used to re-weight the initial graphs. Sentences with high activation weights are assumed to be important. (in Multi-document summarization; Mani and Bloedorn 97)

53 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Cohesion: Graph-Based Method (2) (in Multi-document summarization; Mani and Bloedorn 97) Extrinsic information retrieval task evaluation –4 TREC topics were used as testing queries: 204, 207, 210, and 215 –Corpus: Subset of TREC collection indexed by SMART Retain the top 75 documents returned for each query. A query-biased summary was created by taking the top 5 highest-weighted sentences for every returned document. –4 human subjects were asked to rate the relevancy of the returned documents and their summaries.

54 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Cohesion: Graph-Based Method (3) (in Multi-document summarization; Mani and Bloedorn 97) Adjacent links taking into account collocations

55 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Cohesion: Hyperlink-Based Method (1) Paragraph Relation Map (PRM): –Apply automatic inter-document hypertext link generation methods from IR to generate intra-document paragraph or sentence links. (Salton et al. 91; Salton & Allan 93; Allan 96) –Compute cosine similarity between every paragraph or sentence pair of a document. –Paragraphs or sentences with similarity scores higher than an empirically determined cutoff threshold are linked. Summaries are generated in three ways: –Bushy* path: take the topmost N bushy nodes on the map. –Depth-first path: start from the bushiest node, then visit the most similar node at each next step. –Segmented bushy path: segment the text at the least bushy nodes, then apply the bushy path strategy to each segment. (Figure: a paragraph relation map over paragraphs P1–P9.) *Bushiness: # of links connecting a node to other nodes on the PRM.
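A hedged sketch of the Paragraph Relation Map and the bushy-path summary: paragraphs are linked when their cosine similarity exceeds a cutoff, and the bushiest nodes are returned in text order; the bag-of-words vectors and the threshold value are simplifications:

```python
import math
from collections import Counter

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def bushy_path_summary(paragraphs, threshold=0.1, top_n=3):
    """Link paragraph pairs above the similarity cutoff, then return the
    top_n bushiest (most-linked) paragraphs in their original order."""
    vecs = [Counter(p.lower().split()) for p in paragraphs]
    degree = [0] * len(paragraphs)
    for i in range(len(paragraphs)):
        for j in range(i + 1, len(paragraphs)):
            if cosine(vecs[i], vecs[j]) > threshold:
                degree[i] += 1
                degree[j] += 1
    bushiest = sorted(range(len(paragraphs)), key=lambda i: -degree[i])[:top_n]
    return [paragraphs[i] for i in sorted(bushiest)]
```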

56 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Cohesion: Hyperlink-Based Method (2) Evaluation (Salton et al. 99) –50 articles from the Funk and Wagnalls Encyclopedia (Funk & Wagnalls 79). –100 extracts (2 per article) were created by 9 human subjects. On average, 46% overlap was found between two manual extracts. –Four summary scoring scenarios: Optimistic: take the higher overlap between two human extracts. Pessimistic: take the lower overlap between two human extracts. Intersection*: take the overlap with the intersection of the two human extracts. Union*: take the overlap with the union of the two human extracts. *intersection: strict precision; *union: lenient precision

57 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Hyperlink-Based Method: Example (DUC 2003 APW19981017.0477) (S03) No reason for the dates was given. (S08) No hearing date has been set. (S22) He has a pacemaker and hearing aid, but is generally in good health. (S11) In a statement issued in Porto, Portugal, where President Eduardo Frei is attending the Ibero American summit, the Chilean government said it is "filing a formal protest with the British government for what it considers a violation of the diplomatic immunity which Sen. Pinochet enjoys." (S14) Spanish Foreign Minister Abel Matutes, also attending the Ibero American summit, said his government "respects the decisions taken by courts." (S06) Prime Minister Tony Blair's office said it was "a matter for the magistrates and the police." (S23) Baltasar Garzon, one of two Spanish magistrates handling probes into human rights violations in Chile and Argentina, filed a request to question Pinochet on Wednesday, a day after another judge, Manuel Garcia Castellon, filed a similar petition. (S04) Chile said it would protest to British authorities, arguing that the 82-year-old senator-for-life has diplomatic immunity. (S27) Pinochet, the son of a customs clerk who ousted elected President Salvador Allende in a bloody 1973 coup, remained commander-in-chief of the Chilean army until March, when he was sworn in as a senator-for-life, a post established for him in a constitution drafted by his regime. (S19) Richard Bunting of the human rights group Amnesty International, which has frequently criticized Pinochet, said the British government was "under obligation to take legal action" against him. (S28) While in power he also pushed through an amnesty covering crimes committed before 1978, when most of his human rights abuses allegedly took place.
(S04) Chile said it would protest to British authorities, arguing that the 82-year-old senator-for-life has diplomatic immunity. (S05) But Britain said Pinochet did not have diplomatic immunity. (S15) British law recognizes two types of immunity: state immunity, which covers heads of state and government members on official visits, and diplomatic immunity for persons accredited as diplomats. (S23) Baltasar Garzon, one of two Spanish magistrates handling probes into human rights violations in Chile and Argentina, filed a request to question Pinochet on Wednesday, a day after another judge, Manuel Garcia Castellon, filed a similar petition. (S24) Castellon's probe into murder, torture and disappearances in Chile during Pinochet's regime began in 1996. (S26) Pinochet is implicated in Garzon's probe through his involvement in "Operation Condor," in which military regimes in Chile, Argentina and Uruguay coordinated anti-leftist campaigns.

58 Topic Identification Discourse-Based

59 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Discourse-Based Method Claim: The multi-sentence coherence structure of a text can be constructed, and the ‘centrality’, typically the nuclei, of the textual units in this structure reflects their importance. Tree-like representation of texts in the style of Rhetorical Structure Theory (RST; Mann and Thompson 88). Use the discourse (RST) representation in order to determine the most important textual units. Attempts: –Ono et al. (1994) for Japanese. –Marcu (1997, 2000) for English. –(see http://www.isi.edu/~marcu/discourse for more information).

60 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 RST: A Very Short Overview Rhetorical Relation (Effective Speaking or Writing) –A relation holds between two non-overlapping text spans called nucleus and satellite. –Nucleus expresses what is more essential to the writer’s purpose than satellite. –The nucleus is comprehensible independent of the satellite but not vice versa. –Constraints on RST relations and their overall effect enforce text coherence. (Mann and Thompson 88).

61 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Rhetorical Parsing (1) [Energetic and concrete action has been taken in Colombia during the past 60 days against the mafiosi of the drug trade,] 1 [but it has not been sufficiently effective,] 2 [because, unfortunately, it came too late.] 3 [Ten years ago, the newspaper El Espectador,] 4 [of which my brother Guillermo was editor,] 5 [began warning of the rise of the drug mafias and of their leaders' aspirations] 6 [to control Colombian politics, especially the Congress.] 7 [Then,] 8 [when it would have been easier to resist them,] 9 [nothing was done] 10 [and my brother was murdered by the drug mafias three years ago.] 11 … (Marcu 97) (Figure: relations such as background and evidence link these units.) http://www.isi.edu/~marcu/discourse

62 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Rhetorical Parsing (Marcu 97) [With its distant orbit — 50 percent farther from the sun than Earth — and slim atmospheric blanket, 1 ] [Mars experiences frigid weather conditions. 2 ] [Surface temperatures typically average about –60 degrees Celsius (–76 degrees Fahrenheit) at the equator and can dip to –123 degrees C near the poles. 3 ] [Only the midday sun at tropical latitudes is warm enough to thaw ice on occasion, 4 ] [but any liquid water formed that way would evaporate almost instantly 5 ] [because of the low atmospheric pressure. 6 ] [Although the atmosphere holds a small amount of water, and water-ice clouds sometimes develop, 7 ] [most Martian weather involves blowing dust or carbon dioxide. 8 ] [Each winter, for example, a blizzard of frozen carbon dioxide rages over one pole, and a few meters of this dry-ice snow accumulate as previously frozen carbon dioxide evaporates from the opposite polar cap. 9 ] [Yet even on the summer pole, where the sun remains in the sky all day long, temperatures never warm enough to melt frozen water. 10 ]

63 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Rhetorical Parsing (2) Summarization = selection of the most important units + determination of a partial ordering on them: 2 > 8 > 3, 10 > 1, 4, 5, 7, 9 > 6. (Figure: RST tree for the Mars text, with relations such as Evidence, Cause, Contrast, Elaboration, Background, Justification, Concession, Antithesis, and Example linking units 1–10.)

64 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Discourse Method: Evaluation (Marcu WVLC-98: using a combination of heuristics for rhetorical parsing disambiguation)

65 Topic Identification Combining the Evidence

66 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Summary of Identification Methods General method: score each sentence; combine scores; choose best sentence(s) Scoring techniques: –Position in the text: lead method; optimal position policy; title/heading method –Cue phrases: in sentences –Term frequencies: throughout the text –Cohesion: semantic links among words; lexical chains; coreference; word co-occurrence –Discourse structure of the text

67 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Combining the Evidence Problem: which extraction methods to use? Answer: assume they are independent, and combine their evidence: merge individual sentence scores. Studies: –(Edmundson 69): linear combination. –(Kupiec et al. 95; Aone et al. 97; Teufel & Moens 97): Naïve Bayes. –(Mani and Bloedorn 98): SCDF, C4.5, inductive learning. –(Lin 99): C4.5, linear combination. –(Marcu 98): rhetorical parsing tuning.

68 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Linear Combination (1) Edmundson 69: –Combination of cue phrase (C), key method (K), title method (T), and position method (P). –A total of 17 experiments were performed to adjust the weights a1–a4 in the combined sentence score a1·C + a2·K + a3·T + a4·P. –Co-selection ratio between the automatic and the target extracts was used as the evaluation metric.
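A minimal sketch of the Edmundson-style combination; the weights are placeholders to be tuned (the original work adjusted them over 17 experiments), not the values actually used:

```python
def combined_score(C, K, T, P, weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of cue (C), key (K), title (T), and position (P) scores."""
    a1, a2, a3, a4 = weights
    return a1 * C + a2 * K + a3 * T + a4 * P

def rank_sentences(scored_sentences, weights=(1.0, 1.0, 1.0, 1.0)):
    """scored_sentences: list of (sentence, (C, K, T, P)) tuples."""
    return sorted(scored_sentences,
                  key=lambda item: combined_score(*item[1], weights=weights),
                  reverse=True)
```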

69 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Linear Combination (2) (Edmundson 69)

70 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Naïve Bayes (1) Kupiec, Pedersen, and Chen 95 –Features Sentence length cut-off (binary long-sentence feature: only include sentences longer than 5 words) Fixed-phrase (binary cue phrase feature: such as “this letter …”, total 26 fixed phrases) Paragraph (binary position feature: the first 10 and the last 5 paragraphs in a document) Thematic Word (binary key feature: whether the sentence contains the most frequent content words) Uppercase Word (binary orthographic feature: frequent capitalized words that do not occur in sentence-initial position) –Corpus: 188 OCRed documents and their manual summaries sampled from 21 scientific/technical domains.

71 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Naïve Bayes (2) Summarization as a sentence ranking process based on the probability that a sentence s is included in the summary S given its features F1, …, Fk: P(s ∈ S | F1, …, Fk) = P(s ∈ S) ∏j P(Fj | s ∈ S) / ∏j P(Fj) –Assuming statistical independence of the features. –P(s ∈ S) is a constant (assuming a uniform distribution for all s). –P(Fj | s ∈ S) and P(Fj) can be estimated from the training set. (Kupiec et al. 95)
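A toy sketch of the ranking score above, assuming binary features and probability tables already estimated from training extracts; the numbers below are made up for illustration:

```python
def naive_bayes_score(feature_values, p_in_summary, p_feat_given_summary, p_feat):
    """P(s in S | F1..Fk) ∝ P(s in S) * Π_j P(Fj | s in S) / Π_j P(Fj)."""
    score = p_in_summary
    for j, value in enumerate(feature_values):
        score *= p_feat_given_summary[j][value] / p_feat[j][value]
    return score

# Two binary features, e.g. long-sentence and fixed-phrase (illustrative numbers).
p_f_given_s = [{True: 0.9, False: 0.1}, {True: 0.3, False: 0.7}]
p_f = [{True: 0.6, False: 0.4}, {True: 0.1, False: 0.9}]
print(naive_bayes_score((True, True), 0.2, p_f_given_s, p_f))  # 0.2 * 1.5 * 3.0 = 0.9
```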

72 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Naïve Bayes (3) (Kupiec et al. 95)

73 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Performance of Individual Factors (Lin CIKM 99)

74 Topic Interpretation

75 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Topic Interpretation From extract to abstract: topic interpretation or concept fusion. Experiment (Marcu, 99): –Got 10 newspaper texts, with human abstracts. –Asked 14 judges to extract corresponding clauses from texts, to cover the same content. –Compared word lengths of extracts to abstracts: extract_length ≈ 2.76 × abstract_length !!

76 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Some Types of Interpretation Concept Generalization: Sue ate apples, pears, and bananas → Sue ate fruit Meronymy Replacement (Part-Whole): Both wheels, the pedals, saddle, chain… → the bike Script Identification: (Schank and Abelson, 77) He sat down, read the menu, ordered, ate, paid, and left → He ate at the restaurant Metonymy: A spokesperson for the US Government announced that… → Washington announced that...

77 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 General Aspects of Interpretation Interpretation occurs at the conceptual level... …words alone are polysemous (bat: the animal vs. the sports instrument) and combine for meaning (an alleged murderer is not necessarily a murderer). For interpretation, you need world knowledge... …the fusion inferences are not in the text! Little work so far: (Lin, 95; Radev and McKeown, 98; Reimer and Hahn, 97; Hovy and Lin, 98).

78 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Form-Based Operations Claim: Using IE systems, can aggregate forms by detecting interrelationships. 1. Detect relationships (contradictions, changes of perspective, additions, refinements, agreements, trends, etc.). 2. Modify, delete, aggregate forms using rules (Radev and McKeown, 98): Given two forms, if (the location of the incident is the same and the time of the first report is before the time of the second report and the report sources are different and at least one slot differs in value) then combine the forms using a contradiction operator.

79 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Inferences in Terminological Logic ‘Condensation’ operators (Reimer & Hahn 97). 1.Parse text, incrementally build a terminological rep. 2.Apply condensation operators to determine the salient concepts, relationships, and properties for each paragraph (employ frequency counting and other heuristics on concepts and relations, not on words). 3.Build a hierarchy of topic descriptions out of salient constructs. Conclusion: No evaluation.

80 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Concept Generalization: Wavefront Claim: Can perform concept generalization, using WordNet (Lin, 95). Find the most appropriate summarizing concept: 1. Count word occurrences in text; score WN concepts. 2. Propagate scores upward. 3. R = Max{scores} / Σ scores. 4. Move downward until no child clearly dominates (R drops below a cutoff Rt). (Figure: example taxonomy: Computer with children PC, Mainframe, Calculator, Cash register; PC with children Dell, Mac, IBM; nodes annotated with occurrence counts.)
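A minimal sketch of the wavefront idea over a hand-built toy taxonomy (not WordNet, and not the original implementation); the counts and the descend-while-a-child-dominates rule follow one reading of the slide's description:

```python
TAXONOMY = {"computer": ["PC", "mainframe", "calculator"],
            "PC": ["Dell", "Mac", "IBM"]}
COUNTS = {"Dell": 6, "Mac": 5, "IBM": 5, "mainframe": 2, "calculator": 0}

def score(node):
    """Occurrences of the node itself plus all of its descendants."""
    return COUNTS.get(node, 0) + sum(score(c) for c in TAXONOMY.get(node, []))

def generalize(node, r_t=0.67):
    """Move downward while one child clearly dominates (R = max/sum > Rt)."""
    children = TAXONOMY.get(node, [])
    if not children:
        return node
    child_scores = {c: score(c) for c in children}
    total = sum(child_scores.values())
    best = max(child_scores, key=child_scores.get)
    if total and child_scores[best] / total > r_t:
        return generalize(best, r_t)
    return node

print(generalize("computer"))  # "PC": no single PC brand dominates below it
```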

81 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Wavefront Evaluation 200 BusinessWeek articles about computers: –typical length 750 words (1 page). –human abstracts, typical length 150 words (1 paragraph). –several parameters; many variations tried. Rt = 0.67; StartDepth = 6; Length = 20%: Conclusion: need more elaborate taxonomy.

82 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Topic Signatures (1) Claim: Can approximate script identification at lexical level, using automatically acquired ‘word families’ (Lin and Hovy COLING 2000). Idea: Create topic signatures: each concept is defined by the frequency distribution of its related words (concepts): signature = {head (c1,f1) (c2,f2)...} restaurant → waiter + menu + food + eat... (the inverse of query expansion in IR.)

83 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Example Signatures Download topic signatures for all WordNet 1.6 noun senses: http://ixa.si.ehu.es/Ixa/resources/sensecorpus Agirre et al. LREC 2004

84 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Topic Signatures (2) Experiment: created 30 signatures from 30,000 Wall Street Journal texts, 30 categories: –Used tf.idf to determine uniqueness in category. –Collected the most frequent 300 terms per topic. Evaluation: classified 2204 new texts: –Created document signature and matched against all topic signatures; selected best match. Results: Precision ≈ 69.31%; Recall ≈ 75.66% –90%+ for top 1/3 of categories; rest lower, because less clearly delineated (overlapping signatures).

85 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Evaluating Signature Quality Test: perform text categorization task: 1. match new text’s ‘signature’ against topic signatures 2. measure how correctly texts are classified by signature Document Signature (DSi): [(ti1, wi1), (ti2, wi2), …, (tin, win)] Similarity measure: –cosine similarity, cos θ = (TSk · DSi) / (|TSk| |DSi|)
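A hedged sketch that ties the two topic-signature slides together: build a signature as the top tf.idf terms of a topic's documents, then classify a new document by the best cosine match; the tokenization and fallbacks are simplifications, not the original implementation:

```python
import math
from collections import Counter

def topic_signature(topic_docs, doc_freq, num_docs, top_n=300):
    """Top_n (term, weight) pairs by tf.idf over a topic's documents."""
    counts = Counter(w for doc in topic_docs for w in doc.lower().split())
    weights = {t: c * math.log(num_docs / doc_freq.get(t, 1)) for t, c in counts.items()}
    return dict(sorted(weights.items(), key=lambda kv: -kv[1])[:top_n])

def best_topic(doc_signature, topic_signatures):
    """Assign the document to the topic signature with the highest cosine match."""
    def cos(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0
    return max(topic_signatures, key=lambda k: cos(doc_signature, topic_signatures[k]))
```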

86 Summary Generation

87 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 NL Generation for Summaries Level 1: no separate generation –Produce extracts, verbatim from input text. Level 2: simple sentences –Assemble portions of extracted clauses together. Level 3: full NLG 1. Sentence Planner: plan sentence content, sentence length, theme, order of constituents, words chosen... (Hovy and Wanner, 96) 2. Surface Realizer: linearize input grammatically (Elhadad, 92; Knight and Hatzivassiloglou, 95).

88 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Full Generation Example Challenge: Pack content densely! Example (Radev and McKeown, 98): –Traverse templates and assign values to ‘realization switches’ that control local choices such as tense and voice. –Map modified templates into a representation of Functional Descriptions (input representation to Columbia’s NL generation system FUF). –FUF maps Functional Descriptions into English.

89 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Generation Example (Radev and McKeown, 98) NICOSIA, Cyprus (AP) – Two bombs exploded near government ministries in Baghdad, but there was no immediate word of any casualties, Iraqi dissidents reported Friday. There was no independent confirmation of the claims by the Iraqi National Congress. Iraq’s state-controlled media have not mentioned any bombings. Multiple sources and disagreement Explicit mentioning of “no information”

90 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Fusion at Syntactic Level General Procedure: 1. Identify sentences with overlapping/related content, 2. Parse these sentences into syntax trees, 3. Apply fusion operators to compress the syntax trees, 4. Generate sentence(s) from the fused tree(s). (Figure: two syntax trees with shared subtrees are merged into a single fused tree.)

91 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Syntax Fusion (1) (Barzilay, McKeown, and Elhadad, 99) Parse tree: simple syntactic dependency notation DSYNT, using Collins parser. Tree paraphrase rules derived through corpus analysis cover 85% of cases: –sentence part reordering, –demotion to relative clause, –coercion to different syntactic class, –change of grammatical feature: tense, number, passive, etc., –change of part of speech, –lexical paraphrase using synonym, etc. Compact trees mapped into English using FUF. Evaluate the fluency of the output.

92 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Syntax Fusion (2) (Mani, Gates, and Bloedorn, 99) Elimination of syntactic constituents. Aggregation of constituents of two sentences on the basis of referential identity. Smoothing: –Reduction of coordinated constituents. –Reduction of relative clauses. Reference adjustment. Evaluation: –Informativeness. –Readability.

93 Text Summarization Information Extraction-Based

94 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Information Extraction Method Idea: content selection using forms (templates) –Predefine a form, whose slots specify what is of interest. –Use a canonical IE system to extract from a (set of) document(s) the relevant information; fill the form. –Generate the content of the form as the summary. Previous IE work: –FRUMP (DeJong, 79): ‘sketchy scripts’ of terrorism, natural disasters, political visits... –McFRUMP (Mauldin, 89): forms for conceptual IR. –SCISOR (Rau and Jacobs, 90): forms for business. –SUMMONS (McKeown and Radev, 95): forms for news. –GisTexter (Harabagiu et al., 03): Traditional IE + dynamic templates based on WordNet.

95 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 IE-Based Summarization Topic Identification –A template defines the topic of interest and important facts associated with the topic. This also defines the coverage of IE- based summarizer. –Problem: how to select a relevant template? (i.e. the frame initiation problem) Topic Interpretation –Problem: how to fill template slots and merge multiple templates? Generation –Problem: how to generate summaries from the filled and merged templates?

96 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 FRUMP (DeJong 1979) Domain: General news Input: UPI news stories Output: summaries (single doc) Template: 60 sketchy scripts that contain important events in a situation. –Meeting, earthquake, theft, war, … –Represented as conceptual dependency (CD) Encode constraints on script variables, between script variables, and between scripts. Ex: John went to New York. (ACTOR (*JOHN*) (*PTRANS*) OBJECT (*JOHN*) TO (*NEW YORK*))

97 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 FRUMP (cont.) Vocabulary size: 2,300 root words plus morphological rules. Interpretation Mechanism –PREDICTOR initiates candidate scripts. –SUBSTANTIATOR validates the script slots. Evaluation –A 6-day run from April 10 to 15, 1980. –Lenient: recall 63%, precision 80% –Strict: recall 43%, precision 54%

98 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 FRUMP (cont.) Three ways to activate sketchy scripts: –Explicit: a word or phrase that identifies an entire script can activate the script. Ex: ARREST –Implicit: invocation of a script activates a related script. Ex: CRIME => ARREST –Event-inducing: a central event of a script can activate the script. (key request) Ex: police apprehended someone => ARREST

99 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 McFRUMP (Mauldin 1989) Domain: FRUMP + Astronomy (5 scripts) Input: UseNet news; Output: case frame (CD) *Mauldin 91

100 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 McFRUMP (cont.) Vocabulary size: 3,369 nouns, 2,148 names, and 376 basic verbs + synonyms from Webster’s 7 (28,323 word senses) + near synonyms, and proper name rules. Interpretation Mechanism –PREDICTOR initiates candidate scripts. –SUBSTANTIATOR validates the script slots. Evaluation –Use as a component in FERRET, a conceptual information retrieval system. –Measured on a test corpus of 1,065 astronomy texts with recall 52% and precision 48%.

101 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 SCISOR (Rau & Jacobs 1990) Domain: Merger & Acquisition (M&A) Input: Dow Jones News + Queries Output: Answers Template: M&A (1?) Vocabulary size: 10,000 word roots + 75 affixes and a core concept hierarchy of 1,000 nodes. Interpretation Mechanism –TRUMPET top down processor matches bottom up output with expectations. –TRUMP bottom up processor parses input into frame-like semantic representation.

102 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 SCISOR (cont.) Evaluation: NA

103 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 SUMMONS (McKeown & Radev 95) Domain: Terrorist (MUC-4) Input: Filled IE templates + Sources + Times Output: Automatically generated English summary of a few sentences (abstract) Template: Terrorist (1?, 25 fields) Vocabulary size: Large? (based on FUF, Functional Unification Formalism, Elhadad 91) Interpretation Mechanism –Summarization is achieved through a series of template merging using 7+1 major summary operators.

104 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 SUMMONS (cont.) Evaluation: NA Goal: Generate paragraph summaries describing the change of a single event over time or a series of related events. –Assume all the input templates are about the same event. (a strong assumption!) –Focus on similarity and difference of sources, times, and slot values. Input: Cluster of templates T1 … Tm. (Architecture figure: a conceptual combiner with planning operators merges the input templates; a paragraph planner, sentence planner, lexical chooser (lexicon, domain ontology), sentence generator, and linguistic realizer (SURGE) then produce the output base summary.)

105 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 SUMMONS Operators (Topic Interpretation) *TxSi: slot i of template x; ∪: union; ⊂: specialization

106 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 GisTexter (Harabagiu et al. 03) (An End-to-end IE-based Summarizer) Domain: General Input: General News Output: summaries (extract) with sentence reduction Template: Natural Disasters (1?) + ad hoc templates from WordNet topical relations. (match 40 out of 59 DUC 2002 topics) Vocabulary size: WordNet 2.0 + topical relations extracted from WordNet Interpretation Mechanism –Apply traditional IE to fill templates –Map filled template slots to text snippets –Resolve coreference chains in the text snippets –Generate summaries through a 5-step incremental process –If a pre-encoded template cannot be found, then ad hoc templates are automatically generated.

107 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Architecture of GisTexter *Harabagiu et al. 2003

108 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 GisTexter (cont.) No concepts of time, source, different perspective, addition, contradiction, superset, refinement, or trend as in SUMMONS. Only the agreement operator is used, based on slot-filler frequency. Link template slots to text snippets through extraction patterns. –Natural Disasters: [Casualty-expression {to|from} $Number {from|because of} Disaster-word] → “the death toll to 40 from last week’s tornado” The most frequent slot-filler across all templates is called the dominant event and is treated as the main topic of the input texts. (ex: Hurricane Andrew)

109 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 GisTexter Summarization Procedure 1. Select the most important template T0. Importance: the sum of all frequency counts of all slots of a template. 2. Select sentences covered by the text snippets mapped from T0. 3. Select sentences from other templates about the dominant event that are not selected in 2. 4. Select sentences from other templates that are not about the dominant event from the texts covered by 1 and 2. 5. Select sentences from the rest of the templates. (* always add antecedent sentences if anaphoric expressions exist)

110 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 GisTexter Ad Hoc Templates Based on WordNet 2.0 (Miller et al. 95) gloss and synset relation (isa, has-part, …) hierarchy + automatic template slot creation method inspired by Autoslog-TS (Riloff 96). *Harabagiu et al. 2003

111 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 GisTexter Evaluation DUC02 multi-doc: 3rd in precision and 1st in recall. Limited coverage of existing templates (about 2/3). –Didn’t state how difficult it is to create a new template. –Didn’t compare systems using only existing templates vs. using only ad hoc templates.

112 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Frame Semantic Frame –A script-like conceptual structure that describes a particular type of situation, object, or event and the participants and props involved in it.

113 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Frame Relations

114 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Lexical Units A pairing of a word with a meaning (i.e. word sense). [Austrian police] authority [on June 3] time ARRESTED [seven people] suspect [in Vienna] location [on charges of carrying radioactive material consisting of 55 grammes of feebly enriched uranium concealed in metallic discs] charge.

115 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 FrameNet II An on-line lexical resource for English, based on frame semantics. The goal is to document the range of semantic and syntactic combinatory possibilities (valences) of each word in each of its senses. It contains more than 8,900 lexical units in more than 625 semantic frames. http://framenet.icsi.berkeley.edu

116 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Future: Frame-Based Summarization Topic Identification –Frame identification: ARREST –Frame Element identification ARREST::Authorities, ARREST::Suspect, ARREST::Offense, ARREST::Charges, … Topic Interpretation –Frame relation identification –Frame merging through summarization operators –Frame ranking according to task requirement Summary Generation –Sentence extraction (with sentence reduction or compression) –Sentence generation from Frames Automatic Frame Database Creation –Automatic frame creation –Automatic frame element creation –Automatic frame relation creation

117 Recent Development & Applications

118 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Headline or Very Short Summary Generation + Compression Headline or title generation –Witbrock & Mittal (SIGIR99): generative probabilistic model. –Dorr, Zajic & Schwartz (DUC 2003): hybrid statistical content generation and linguistic rules. –Zhou & Hovy (DUC 2003): position + headline word model + automatic post-generation patching. Sentence Compression –Knight & Marcu (2002 AI Journal): noisy channel model and decision tree model. –Riezler, King, Crouch & Zaenen (NAACL 2003): stochastic lexical functional grammar (LFG). –Turner & Charniak (ACL2005): improved noisy channel model based on K&M’s work.

119 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Summarization Models Berger & Mittal (SIGIR 2000): –Bag-of-words gisting model –Expanded-lexicon gisting model –Readable gisting model Kraaij, Spitters & van der Heijden (DUC 2001): –Mixture language model (content salience) + naïve Bayes model (surface salience) Jing (CL 2002): –Cut-and-paste model based on HMM. Barzilay & Lee (NAACL-HLT 2004): –Probabilistic content model based on HMM. Barzilay & Lapata (ACL 2005): –Entity-based local coherence model. Daumé III & Marcu (ACL MSE 2005): –Bayesian query-focused model.

120 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Applications Berger & Mittal (SIGIR 2000): web page summarization. Zechner & Waibel (COLING 2000): dialog summarization. Buyukkokten, Garcia-Molina & Paepcke (WWW10 2001): summarization for web browsing on handheld devices. Muresan, Tzoukermann & Klavans (CoNLL 2001): email summarization. Zhou, Ticrea & Hovy (EMNLP 2004): biography summarization. Buist, Kraaij & Raaijmakers (CLIN 2005): meeting data summarization. Zhou & Hovy (ACL 2005): online discussion summarization. Liu, Hu & Cheng (WWW14 2005): opinion summarization.

121 ISI – DARPA Surprise Language Exercise 2003 (Leuski et al. 03)

122 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Table of contents 1. Motivation and Introduction. 2. Approaches and paradigms. 3. Summarization methods. 4. Evaluating summaries. 5. The future.

123 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Progress in Speech Recognition Mukund Padmanabhan and Michael Picheny, HLT Group, IBM Research, ASR Workshop, Paris, Sep 2000. Common automatic metric + common corpora → reduced error rate.

124 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 MT and Summarization Evaluations Machine Translation –Inputs: reference translation, candidate translation –Methods: manually compare two translations in adequacy, fluency, and informativeness; auto evaluation using BLEU/NIST scores Auto Summarization –Inputs: reference summary, candidate summary –Methods: manually compare two summaries in content overlap and linguistic qualities; auto evaluation using ROUGE

125 Summarization Evaluation

126 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Document Understanding Conference (DUC) Part of US DARPA TIDES Project DUC 01 - 04 (http://duc.nist.gov) –Tasks Single-doc summarization (DUC 01 and 02: 30 topics) Single-doc headline generation (DUC 03: 30 topics, 04: 50 topics) Multi-doc summarization –Generic 10, 50, 100, 200 (2002), and 400 (2001) words summaries –Short summaries of about 100 words in three different tasks in 2003 »focused by an event (30 TDT clusters) »focused by a viewpoint (30 TREC clusters) »in response to a question (30 TREC Novelty track clusters) –Short summaries of about 665 bytes in three different tasks in 2004 »focused by an event (50 TDT clusters) »focused by an event but documents were translated into English from Arabic (24 topics) »in response to a “who is X?” question (50 persons) –Participants 15 systems in DUC 2001, 17 in DUC 2002, 21 in DUC 2003, and 25 in DUC 2004 A new roadmap was discussed this year at DUC 2005 in Vancouver.

127 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Snapshot of An Evaluation Session Overall Candidate Quality SEE – Summary Evaluation Environment (Lin 2001) http://www.isi.edu/~cyl/SEE

128 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Snapshot of An Evaluation Session Measuring Content Coverage Single Reference! Coarse Judgment!

129 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Inconsistency of Human Judgment Single Document Task –Total of 5,921 judgments –Among them 1,076 (18%) contain different judgments for the same pair (from the same assessor) –143 of them have three different coverage grades (2.4%) Multi-Document Task –Total of 6,963 judgments –Among them 528 (7.6%) contain different judgments for the same pair (from the same assessor) –27 of them have three different coverage grades Lin and Hovy (DUC Workshop 2002)

130 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 DUC 2003 Human vs. Human (1) Nenkova and Passonneau (HLT/NAACL 2004)

131 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 DUC 2003 Human vs. Human (2) 1.Can we get consensus among humans? 2.If yes, how many humans do we need to get consensus? 3.Single reference or multiple references?

132 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 DUC 2003 Human vs. Human (3) Can we get stable estimation of human or system performance? How many samples do we need to achieve this?

133 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Summary of Research Issues How to accommodate human inconsistency? Can we obtain stable evaluation results despite using only a single reference summary per evaluation? Will inclusion of multiple summaries make evaluation more or less stable? How can multiple references be used in improving stability of evaluations? How is stability of evaluation affected by sample size?

134 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Recent Results Van Halteren and Teufel (2003) –A stable consensus factoid summary could be obtained if 40 to 50 reference summaries were considered. 50 manual summaries of one text. Nenkova and Passonneau (2003) –A stable consensus summary of semantic content units (SCUs) could be obtained if at least 5 reference summaries were used. 10 manual multi-doc summaries for three DUC 2003 topics. Hori et al. (2003) –Using multiple references improves evaluation stability if the metric takes consensus into account. 50 utterances in Japanese TV broadcast news, each with 25 manual summaries. Lin and Hovy (2003), Lin (2004) –ROUGE, an automatic summarization evaluation method used in DUC 2003.

135 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Automatic Evaluation of Summarization Using ROUGE ROUGE summarization evaluation package –The current version (v1.5.4) includes the following automatic evaluation methods (Lin, Text Summarization Branches Out workshop 2004): ROUGE-N: N-gram based co-occurrence statistics ROUGE-L: LCS-based statistics ROUGE-W: Weighted LCS-based statistics that favor consecutive LCSes (see ROUGE note) ROUGE-S: Skip-bigram-based co-occurrence statistics ROUGE-SU: Skip-bigram plus unigram-based co-occurrence statistics –Free download for research purposes at: http://www.isi.edu/~cyl/ROUGE

136 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 ROUGE-N N-gram co-occurrences between a reference and a candidate summary. –Similar to BLEU in MT (Papineni et al. 2001) Higher-order ROUGE-N (n-gram length greater than 1) estimates the fluency of summaries. Example: 1. police killed the gunman 2. police kill the gunman 3. the gunman kill police ROUGE-N: S2=S3 (“police”, “the gunman”)
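As a quick illustration of the ROUGE-N recall computation, here is a minimal sketch; it is not the released ROUGE-1.5.4 scorer, which also handles multiple references, stemming, and stopword removal.

```python
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(reference, candidate, n):
    """Clipped n-gram matches divided by the number of n-grams in the reference."""
    ref, cand = ngrams(reference.split(), n), ngrams(candidate.split(), n)
    overlap = sum(min(count, cand[g]) for g, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

reference = "police killed the gunman"
print(rouge_n_recall(reference, "police kill the gunman", 2))  # 0.333 (1/3)
print(rouge_n_recall(reference, "the gunman kill police", 2))  # 0.333 -> S2 = S3
```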

137 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 ROUGE-L Longest Common Subsequence (LCS) –Given two sequences X and Y, a longest common subsequence of X and Y is a common subsequence of maximum length. –Intuition: the longer the LCS of two summaries is, the more similar the two summaries are. (Saggion et al. 2002, MEAD) –Score: use an LCS-based recall score (ROUGE-L) to estimate the similarity between two summaries. (see paper for more details)

138 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 ROUGE-L Example Example: 1.police killed the gunman 2.police kill the gunman 3.the gunman kill police ROUGE-N: S2=S3 (“police”, “the gunman”) ROUGE-L: –S2=3/4 (“police the gunman”) –S3=2/4 (“the gunman”) –S2>S3
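A minimal LCS-based recall sketch that reproduces the 3/4 and 2/4 figures above; this simplifies the full ROUGE-L, which combines recall and precision into an F-measure.

```python
def lcs_len(x, y):
    """Length of the longest common subsequence, by dynamic programming."""
    dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_recall(reference, candidate):
    ref, cand = reference.split(), candidate.split()
    return lcs_len(ref, cand) / len(ref)

reference = "police killed the gunman"
print(rouge_l_recall(reference, "police kill the gunman"))  # 0.75 (3/4)
print(rouge_l_recall(reference, "the gunman kill police"))  # 0.5  (2/4) -> S2 > S3
```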

139 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 ROUGE-W Weighted Longest Common Subsequence –Example: X: [A B C D E F G] Y1: [A B C D H I K] Y2: [A H B K C I D] ROUGE-L(Y1) = ROUGE-L(Y2) –ROUGE-W favors strings with consecutive matches. –It can be computed efficiently using dynamic programming.
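The weighted-LCS dynamic program can be sketched as below, following the published ROUGE description with weight function f(k) = k^alpha; treat it as an illustrative reimplementation rather than the released scorer, and the alpha value and normalization step are assumptions taken from Lin (2004).

```python
def wlcs(x, y, alpha=1.2):
    """Weighted LCS score with weight function f(k) = k**alpha, which credits
    runs of consecutive matches more than the same matches scattered apart."""
    f = lambda k: k ** alpha
    c = [[0.0] * (len(y) + 1) for _ in range(len(x) + 1)]  # WLCS score so far
    w = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]    # current run length
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                k = w[i - 1][j - 1]
                c[i][j] = c[i - 1][j - 1] + f(k + 1) - f(k)
                w[i][j] = k + 1
            else:
                c[i][j] = max(c[i - 1][j], c[i][j - 1])
                w[i][j] = 0
    return c[-1][-1]

def rouge_w_recall(reference, candidate, alpha=1.2):
    # normalize with the inverse weight function, as described in Lin (2004)
    return (wlcs(reference, candidate, alpha) / len(reference) ** alpha) ** (1 / alpha)

X  = "A B C D E F G".split()
Y1 = "A B C D H I K".split()
Y2 = "A H B K C I D".split()
print(wlcs(X, Y1) > wlcs(X, Y2))  # True: Y1's consecutive A B C D run scores higher
print(rouge_w_recall(X, Y1), rouge_w_recall(X, Y2))
```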

140 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 ROUGE-S Skip-Bigram –Any pair of words in their sentence order, allowing for arbitrary gaps. –Intuition: considers long-distance dependencies; allows gaps in matches as LCS does, but counts all in-sequence pairs, whereas LCS only counts the longest subsequence. –Score: use a skip-bigram-based recall score (ROUGE-S) to estimate the similarity between two summaries. (see paper for more details)

141 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 ROUGE-S Example Example: 1.police killed the gunman 2.police kill the gunman 3.the gunman kill police 4.the gunman police killed ROUGE-N: S4>S2=S3 ROUGE-L: S2>S3=S4 ROUGE-S: –S2=3/6 (“police the”, “police gunman”, “the gunman”) –S3=1/6 (“the gunman”) –S4=2/6 (“the gunman”, “police killed”) –S2>S4>S3
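A small sketch of skip-bigram recall that reproduces the 3/6, 1/6, and 2/6 values above; ROUGE-SU would additionally count unigram matches, and the skip-distance limit used in the released package is omitted here.

```python
from collections import Counter
from itertools import combinations

def skip_bigrams(tokens):
    """All in-order word pairs, with arbitrary gaps allowed."""
    return Counter(combinations(tokens, 2))

def rouge_s_recall(reference, candidate):
    ref, cand = skip_bigrams(reference.split()), skip_bigrams(candidate.split())
    overlap = sum(min(count, cand[p]) for p, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

reference = "police killed the gunman"
print(rouge_s_recall(reference, "police kill the gunman"))    # 0.5   (3/6)
print(rouge_s_recall(reference, "the gunman kill police"))    # 0.167 (1/6)
print(rouge_s_recall(reference, "the gunman police killed"))  # 0.333 (2/6) -> S2 > S4 > S3
```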

142 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Evaluation of ROUGE Corpora –DUC 01, 02, and 03 evaluation data –Including human and systems summaries Seven task formats –Single doc 10 and 100 words, multi-doc 10, 50, 100, 200, and 400 words Three versions –CASE: the original summaries –STEM: the stemmed version of summaries –STOP: STEM plus removal of stopwords Number of references –Single and different numbers of multiple references Quality criterion –Pearson’s product moment correlation coefficients between systems’ average ROUGE scores and their human assigned mean coverage score Metrics –17 ROUGE metrics: ROUGE-N with N = 1 to 9, ROUGE-L, ROUGE-W, ROUGE-S and ROUGE-SU (with maximum skip-distance of 0, 4, and 9) Statistical significance –95% confidence interval estimated using bootstrap resampling
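The quality criterion above amounts to computing Pearson's product-moment correlation over per-system score pairs. A toy sketch with made-up numbers follows; the actual DUC system scores are not reproduced here.

```python
import numpy as np

# Hypothetical per-system score pairs (one entry per system): average ROUGE
# score and human-assigned mean coverage.
avg_rouge     = np.array([0.21, 0.25, 0.28, 0.31, 0.35, 0.38])
mean_coverage = np.array([0.18, 0.22, 0.27, 0.29, 0.36, 0.37])

# Pearson's product-moment correlation coefficient between the two score vectors
rho = np.corrcoef(avg_rouge, mean_coverage)[0, 1]
print(f"Pearson's rho = {rho:.3f}")
```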

143 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 100 Words Single-Doc Task

144 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 10 Words Single-Doc Task

145 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 100 Words Multi-Doc Task

146 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Multi-Doc Task of Different Summary Sizes

147 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Summary of Results Overall –Using multiple references achieved better correlation with human judgment than just using a single reference. –Using more samples achieved better correlation with human judgment (DUC 02 vs. other DUC data). –Stemming and removing stopwords improved correlation with human judgment. –The single-doc task had better correlation than the multi-doc task. Specific –ROUGE-S4, S9, and ROUGE-W1.2 were the best in the 100 words single-doc task, but were statistically indistinguishable from most other ROUGE metrics. –ROUGE-1, ROUGE-L, ROUGE-SU4, ROUGE-SU9, and ROUGE-W1.2 worked very well in the 10 words headline-like task (Pearson’s ρ ~ 97%). –ROUGE-1, 2, and ROUGE-SU* were the best in the 100 words multi-doc task but were statistically equivalent to other ROUGE-S and SU metrics.

148 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Effect of Sample Sizes Examine the effect of sample size on: –Human assigned mean coverage (C) –ROUGE –Correlation between C and ROUGE Experimental setup –DUC 2001 100 words single- and multi-doc data. –Applied stemming and stopword removal. –Ran bootstrap resampling at sample sizes from 1 to 142 for the single-doc task and from 1 to 26 for the multi-doc task.
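A hedged sketch of this kind of bootstrap study is shown below: documents are resampled with replacement at a given sample size and the ROUGE-vs-coverage correlation over system averages is recomputed each time, yielding a 95% confidence interval. The data here are synthetic; only the shapes (142 documents, a dozen systems) echo the DUC 2001 single-doc setting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-document scores: rows = documents, columns = systems.
# Real DUC 2001 scores are not reproduced here.
n_docs, n_systems = 142, 12
rouge_per_doc    = rng.beta(2, 5, size=(n_docs, n_systems))
coverage_per_doc = rouge_per_doc + rng.normal(0, 0.05, size=(n_docs, n_systems))

def bootstrap_correlation(sample_size, n_boot=1000):
    """Resample documents with replacement and measure how stable the
    ROUGE-vs-coverage correlation (over system averages) is at this sample size."""
    rs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_docs, size=sample_size)
        r = np.corrcoef(rouge_per_doc[idx].mean(axis=0),
                        coverage_per_doc[idx].mean(axis=0))[0, 1]
        rs.append(r)
    lo, hi = np.percentile(rs, [2.5, 97.5])
    return np.mean(rs), lo, hi  # mean correlation and 95% confidence interval

for size in (10, 50, 142):
    mean_r, lo, hi = bootstrap_correlation(size)
    print(f"n={size:3d}  rho={mean_r:.3f}  95% CI [{lo:.3f}, {hi:.3f}]")
```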

149 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Pearson’s ρ ROUGE-SU4 vs. Mean Coverage (DUC 2001)

150 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Effect of Sample Size 100 Words Single-Doc Task DUC 2001 –Pearson’s ρ: ROUGE-SU4 vs. mean coverage

151 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Effect of Sample Size 100 Words Multi-Doc Task DUC 2001 –Pearson’s ρ: ROUGE-SU4 vs. mean coverage

152 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Summary of Results (Sample Size) Reliability of performance estimation in human assigned coverage score and ROUGE improved when the number of samples increased (tighter gray box). Reliability of correlation analysis improved when the number of samples increased. –Critical numbers to reach significant results in Pearson’s ρ: at least 86 documents in the single-doc task; at least 18 topics in the multi-doc task. –The numbers of documents and topics in DUC were well above these critical numbers. Using multiple references achieved better correlation between ROUGE and mean coverage. Sample size had a big effect on the reliability of estimation.

153 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Conclusions and Future Work ROUGE worked well in evaluations of single document summarization tasks (100 words and 10 words), with Pearson’s ρ above 85%. ROUGE worked fine in multi-doc summarization evaluations, but how to make its metrics consistently achieve Pearson’s ρ above 85% in all test cases is still an open research topic. If human inconsistency is unavoidable, we need to quantify its effect using statistical significance analysis and minimize it using a large number of samples.

154 Basic Elements – ROUGE+ Toward Automatic Evaluation of MT and Summarization in Semantic Space

155 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Snapshot of An Evaluation Session Measuring Content Coverage Single Reference! Coarse Judgment!

156 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 What Is the Right Span of Information Unit? Information Retrieval –Document and passage Question Answering –Factoid, paragraph, document, … Summarization –Word, phrase, clause (EDU), sentence, paragraph, …

157 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 The Factoid Method Van Halteren & Teufel (2003, 2004) Factoids –Atomic semantic units that represent sentence meaning (FOPL style). –“Atomic” means that a semantic unit is used as a whole across multiple summaries. –Each factoid may carry information varying from a single word to a clause. Example: –The police have arrested a white Dutch man. A suspect was arrested. The police did the arresting. The suspect is white. The suspect is Dutch. The suspect is male.

158 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 The Pyramid Method Nenkova and Passonneau (2003) Pyramid –A weighted inventory of factoids or summarization content units (SCUs) A: “Unable to make payments on a $2.1 billion debt” B: “made payments on PAL's $2 billion debt impossible” C: “with a rising $2.1 billion debt” D: “PAL is buried under a $2.2 billion dollar debt it cannot repay” SCU –F1: PAL has a $2.1 billion debt (All) –F2: PAL can't make payments on the debt (Most)
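In the pyramid method, each SCU is weighted by the number of reference summaries that express it, and a peer summary is scored against the best score achievable with the same number of SCUs. A minimal sketch of that score follows; the SCU identifiers are illustrative labels, not the official annotation.

```python
def pyramid_score(pyramid_weights, peer_scus):
    """Pyramid score sketch: observed SCU weight divided by the maximum weight
    achievable with the same number of SCUs drawn from the top of the pyramid.

    pyramid_weights: dict SCU id -> weight (how many reference summaries contain it)
    peer_scus: set of SCU ids expressed by the peer summary
    """
    observed = sum(pyramid_weights.get(scu, 0) for scu in peer_scus)
    top = sorted(pyramid_weights.values(), reverse=True)[:len(peer_scus)]
    return observed / sum(top) if top else 0.0

# Toy pyramid built from the four reference summaries A-D on the slide:
weights = {"F1_PAL_debt": 4, "F2_cannot_pay": 3}
print(pyramid_score(weights, {"F1_PAL_debt"}))  # 4 / 4 = 1.0
```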

159 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Problems with Factoid and SCU Each factoid may carry a very different amount of information –How to assign a fair information value to a factoid? –No predetermined size of factoids or SCUs → “counting matches” and “scoring” would be problematic. The inventory of factoids grows as more summaries are added to the reference pool –Old factoids tend to break apart to create new factoids The interdependency of factoids is ignored Creation has been entirely manual so far and has only been tested on very small data sets –Factoid: 2 documents –SCU+Pyramid: 3 sets of multi-doc topics How to automate?

160 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Basic Elements (BE) Definition –A head, modifier and relation triple: BE::<HEAD, MOD, REL> –BE::HEAD is the head of a major syntactic constituent (noun, verb, adjective or adverbial phrase). –BE::MOD is a single dependent of BE::HEAD with a relation, BE::REL, between them. –BE::REL can be a syntactic relation, a semantic relation, or NIL. Example –“Two Libyans were indicted for the Lockerbie bombing in 1991.” → BE triples such as (libyans | two | nn), (indicted | libyans | obj), (bombing | lockerbie | nn)

161 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Research Issues How can BEs be created automatically? –Extract dependency triples from automatic parse trees. BE-F: Minipar triples* (Lin 95) BE-L: Charniak parse trees + automatic semantic role tagging* What score should each BE have? –Equal weight*, tf-idf, information value, … When do two BEs match? –Lexical*, lemma*, synonym, distributional similarity, … How should an overall summary score be derived from the individual matched BEs’ scores? –Consensus of references*
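A hedged sketch of the simplest of these choices (equal weights, lexical matching, consensus over references) is below; the triple representation and the example BEs are illustrative, not output of the released BE breakers.

```python
def be_score(reference_be_lists, candidate_bes, match_on=("head", "mod", "rel")):
    """Consensus sketch: the candidate gets credit for each reference BE it matches,
    with every reference summary weighted equally."""
    idx = {"head": 0, "mod": 1, "rel": 2}
    keep = tuple(idx[f] for f in match_on)
    project = lambda triple: tuple(triple[i] for i in keep)

    cand = {project(t) for t in candidate_bes}
    refs = [{project(t) for t in ref} for ref in reference_be_lists]

    covered = sum(len(ref & cand) for ref in refs)  # matched reference BEs
    total = sum(len(ref) for ref in refs)           # all reference BEs
    return covered / total if total else 0.0

# Illustrative (head, modifier, relation) triples, not real BE-breaker output.
refs = [
    [("indicted", "libyans", "obj"), ("bombing", "lockerbie", "nn")],
    [("indicted", "libyans", "obj"), ("indicted", "1991", "in")],
]
cand = [("indicted", "libyans", "obj")]
print(be_score(refs, cand))                   # 0.5: 2 of 4 reference BEs matched
print(be_score(refs, cand, ("head", "mod")))  # match only on HEAD and MOD
```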

162 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Current Status The first version, BE 1.0, was released to the research community on April 13, 2005. –The package includes: BE-F (Minipar) BE breakers and the ROUGE-1.5.5 scorer –One of the three official automatic evaluation metrics for Multilingual Summarization Evaluation 2005 (MSE 2005). –It is also used in DUC 2005. –Free download for research purposes at: http://www.isi.edu/~cyl/BE

163 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Evaluation Measurement –Examine the Pearson’s correlation between human assigned mean coverage (C) and BE. –Compare results with ROUGE 1-4, S4, and SU4. Experimental setup –Use DUC 2002 (10 systems) and 2003 (18 systems) 100 words multi doc data. –Compare single vs. multiple references. –Applied stemming and stopword removal.

164 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Correlation Analysis (DUC 2002)

165 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Correlation Analysis (DUC 2003)

166 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Oracle Experiment We would like to answer the following questions: 1. How well does the topic identification module of a summarization system work? 2. How much information do we lose by matching only at the lexical level? 3. How much do humans agree with each other? Experimental setup –Use DUC 2003 multi-doc data with 4 human refs. –Use BE ranking to simulate topic identification. We compare three BE ranking methods: LR (log likelihood ratio), DF (document frequency with LR), and RawDF (raw document frequency). –Use BE-F as the counting unit. –Compute recall and precision of the different BE ranking methods vs. a reference summary BE ranking based on raw DF.
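To make the setup concrete, here is a hedged sketch of the RawDF ranking and a recall-at-k check against reference-summary BEs; the toy BEs are illustrative, and the LR/DF variants on the slide would only change the scoring step.

```python
from collections import Counter

def rank_bes_by_df(doc_be_sets):
    """Rank BEs by raw document frequency (RawDF): in how many input documents
    each BE occurs. LR or DF-with-LR ranking would replace this scoring step."""
    df = Counter()
    for bes in doc_be_sets:
        df.update(set(bes))
    return [be for be, _ in df.most_common()]

def recall_at_k(system_ranking, reference_bes, k):
    """Fraction of reference-summary BEs recovered among the top-k system BEs."""
    top_k = set(system_ranking[:k])
    return len(top_k & set(reference_bes)) / len(set(reference_bes))

# Illustrative per-document BE sets and reference-summary BEs.
docs = [
    {("indicted", "libyans", "obj"), ("bombing", "lockerbie", "nn")},
    {("indicted", "libyans", "obj"), ("suspects", "two", "nn")},
    {("bombing", "lockerbie", "nn"), ("indicted", "libyans", "obj")},
]
ref = [("indicted", "libyans", "obj"), ("bombing", "lockerbie", "nn")]
ranking = rank_bes_by_df(docs)
print(recall_at_k(ranking, ref, k=2))  # 1.0: both reference BEs are in the top 2
```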

167 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Oracle Exp: BE-F Recall Lexical Gap! Better Ranking Algorithm!

168 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Oracle Exp: BE-F Precision Better Ranking Algorithm! Lexical Gap & Human Disagreement

169 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Conclusions BE-F consistently achieves over 90% Pearson’s correlation with human judgments in all testing categories. –BE-F with stemming and matching only on BE::HEAD and BE::MOD (HM & HM1) has the best correlation. BE-L has over 90% correlation when both BE::HEAD and BE::MOD are considered in the matching. It also works better with multiple references. BE-F and BE-L are more stable than ROUGE across corpora. (DUC’02 R2 Org vs. DUC’03 R3 Stop) Need to go beyond lexical matching. Need to develop better BE ranking algorithms. Need to address the issue of human disagreement: –Better summary writers? –Better domain knowledge? –Better task definition …

170 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Future Directions BE breaking –Use FrameNet II frame elements in BE relations. BE matching –Paraphrases, synonyms, and distributional similarity. BE ranking –Prioritize BEs in a given application context. –Assign weights according to BE’s information content. –Utilize inter-BE dependency. Application –Develop summarization methods based on BE.

171 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Table of contents 1. Motivation and Introduction. 2. Approaches and paradigms. 3. Summarization methods. 4. Evaluating summaries. 5. The future.

172 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 What’s Next? Robust information extraction on general domains based on FrameNet, PropBank, WordNet and other knowledge sources. Automatic frame acquisition and integration to increase coverage, and development of better template/frame merging and linking methods. Design of better summarization models (statistical or not). Construction of large common summarization corpora and design of better evaluation methods to facilitate faster and more accurate develop-and-test cycles.

173 Chin-Yew LIN IJCNLP-05 Jeju Island Republic of Korea October 10, 2005 Good Bye! Please check: www.summaries.net for the latest version of this tutorial and reference information.

