Summarizing Encyclopedic Term Descriptions on the Web from Coling 2004 Atsushi Fujii and Tetsuya Ishikawa Graduate School of Library, Information and Media.

1 Summarizing Encyclopedic Term Descriptions on the Web. From COLING 2004. Atsushi Fujii and Tetsuya Ishikawa, Graduate School of Library, Information and Media Studies, University of Tsukuba

2 Motivation. Existing encyclopedias often lack new terms and new definitions for existing terms. The Web contains an enormous volume of up-to-date information and is a source for obtaining new term descriptions. Using existing search engines for this purpose has many problems.

3 Search engines? They often retrieve extraneous pages that do not describe the submitted term. A user has to identify the page fragments that describe the term. Descriptions in multiple pages are independent of each other. Word senses are not distinguished for ambiguous terms.

4 They propose a summarization method that produces a concise and condensed term description from multiple paragraphs. In this paper, they focus on Japanese technical terms in the computer domain.

5 Overview of CYCLONE

6

7 Summarization Method. Given a set of paragraph-style descriptions for a single term in a specific domain, their summarization method produces a concise text describing the term from different viewpoints. There are 12 viewpoints in the computer domain: definition, abbreviation, exemplification, purpose, synonym, reference, product, advantage, drawback, history, component, function.

8 Four steps. Identification: recognize the language unit associated with a viewpoint. Classification: merge units with the same viewpoint into a single group. Selection: determine one or more representative units for each group. Presentation: produce a summary in a specific format.

9 Identification. A sentence is often associated with multiple viewpoints, e.g. "XML is an abbreviation for eXtensible Markup Language, and is a markup language." Japanese sentences are segmented into simple sentences; zero pronoun detection and anaphora resolution can be used for this. The example yields "XML is an abbreviation for eXtensible Markup Language" (abbreviation viewpoint) and "XML is a markup language" (definition viewpoint).

10 Four steps. Identification: recognize the language unit associated with a viewpoint. Classification: merge units with the same viewpoint into a single group. Selection: determine one or more representative units for each group. Presentation: produce a summary in a specific format.

11 Classification. 12 viewpoints. 36 linguistic patterns, each typically used to describe a term from a specific viewpoint, are used. A simple sentence that matches patterns for multiple viewpoints is classified into each matching viewpoint group.
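
The pattern-based classification on slide 11 might be sketched as follows. This is only an illustration: the slides do not list the 36 patterns, which target Japanese, so the three English regular expressions and the name `classify` are hypothetical stand-ins, and returning every matching viewpoint is an assumed reading of the slide.

```python
import re

# Hypothetical English stand-ins for the linguistic patterns; the original
# 36 patterns are for Japanese and are not given on the slides.
PATTERNS = {
    "abbreviation": [re.compile(r"\bis an abbreviation (for|of)\b", re.I)],
    "definition": [re.compile(r"\bis (a|an|the)\b(?! abbreviation)", re.I)],
    "purpose": [re.compile(r"\bis used (to|for)\b", re.I)],
}


def classify(sentence):
    """Return every viewpoint whose patterns match the sentence.

    An empty list means the sentence matched no pattern and is left
    for the similarity-based step.
    """
    matched = []
    for viewpoint, patterns in PATTERNS.items():
        if any(p.search(sentence) for p in patterns):
            matched.append(viewpoint)
    return matched


if __name__ == "__main__":
    print(classify("XML is an abbreviation for eXtensible Markup Language"))
    print(classify("XML is a markup language"))
```

A real system would apply such patterns to the simple sentences produced by the identification step, not to raw paragraphs.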

12 Classification (cont). What about sentences that do not match any pattern? Each remaining sentence is classified into the group to which its most similar sentence belongs. The similarity between an unclassified sentence and each of the classified sentences is computed with the Dice coefficient. Sentences that still fit no viewpoint go into a “miscellaneous” group.
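
A minimal sketch of the similarity-based fallback on slide 12, assuming sentences are compared as sets of words. The Dice coefficient of two sets A and B is 2|A∩B| / (|A| + |B|). Whitespace tokenization, the `min_similarity` cutoff for the miscellaneous group, and the function names are assumptions made for illustration; Japanese text would need a morphological analyzer instead of `split()`.

```python
def tokens(sentence):
    # Assumption: lowercase whitespace tokens stand in for real word segmentation.
    return set(sentence.lower().split())


def dice(a, b):
    """Dice coefficient between two sets: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def assign_group(sentence, classified, min_similarity=0.2):
    """Pick the group containing the most similar already-classified sentence.

    `classified` maps a viewpoint name to its list of sentences.
    `min_similarity` is a hypothetical cutoff: below it, the sentence
    goes to the "miscellaneous" group.
    """
    target = tokens(sentence)
    best_group, best_score = "miscellaneous", 0.0
    for group, sentences in classified.items():
        for s in sentences:
            score = dice(target, tokens(s))
            if score > best_score:
                best_group, best_score = group, score
    return best_group if best_score >= min_similarity else "miscellaneous"


if __name__ == "__main__":
    groups = {
        "definition": ["XML is a markup language"],
        "purpose": ["XML is used to exchange data"],
    }
    print(assign_group("HTML is a markup language too", groups))
```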

13 example

14 Four steps. Identification: recognize the language unit associated with a viewpoint. Classification: merge units with the same viewpoint into a single group. Selection: determine one or more representative units for each group. Presentation: produce a summary in a specific format.

15 Selection. The number of sentences selected from each group depends on the desired size of the resultant summary. A score is computed for each sentence, and the sentences with the highest scores in each group are selected. Three factors are used: the number of common words included (W), where sentences including frequent words are preferred; the rank order in CYCLONE (R); and the number of characters included (C), where short sentences are preferred. Each factor is normalized, and the final score is a weighted average of the three factors (W>R>C).
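
The scoring on slide 15 could look roughly like the sketch below. The slides give only the three factors and the ordering W>R>C, so everything else here is an assumption: min-max normalization within a group, counting a word as "common" when it occurs in at least two sentences of the group, the weights 0.5/0.3/0.2, and the names `Candidate`, `score_group` and `select`.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    rank: int  # rank of the source paragraph in the retrieval results (1 = best)


def _minmax(values, higher_is_better=True):
    """Scale values to [0, 1]; invert when smaller raw values are preferred."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def score_group(group, weights=(0.5, 0.3, 0.2)):
    """Weighted average of normalized W, R and C for each candidate in a group."""
    token_sets = [set(cand.text.lower().split()) for cand in group]
    doc_freq = Counter(t for ts in token_sets for t in ts)
    # W: number of words shared with at least one other sentence in the group.
    w = _minmax([sum(1 for t in ts if doc_freq[t] >= 2) for ts in token_sets])
    # R: a better (smaller) rank gives a higher score.
    r = _minmax([cand.rank for cand in group], higher_is_better=False)
    # C: a shorter sentence gives a higher score.
    c = _minmax([len(cand.text) for cand in group], higher_is_better=False)
    ww, wr, wc = weights
    total = ww + wr + wc
    return [(ww * a + wr * b + wc * d) / total for a, b, d in zip(w, r, c)]


def select(group, k=1, weights=(0.5, 0.3, 0.2)):
    """Return the k highest-scoring candidates of a group."""
    scores = score_group(group, weights)
    ranked = sorted(zip(scores, group), key=lambda pair: pair[0], reverse=True)
    return [cand for _, cand in ranked[:k]]
```

Here `k` plays the role of the number of representative sentences per group, which the slide ties to the desired summary size.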

16 Selection (cont). For the miscellaneous group, they select the sentence that is most dissimilar to the representative sentences selected from the regular groups.
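
One way to read the rule on slide 16 is sketched below: choose the miscellaneous sentence whose closest representative is the least similar. Using the Dice coefficient over whitespace tokens and taking the maximum similarity over representatives are both assumptions (the slide does not say how dissimilarity is aggregated), and `most_dissimilar` is a hypothetical name.

```python
def tokens(sentence):
    # Assumption: lowercase whitespace tokens stand in for real word segmentation.
    return set(sentence.lower().split())


def dice(a, b):
    """Dice coefficient between two sets: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def most_dissimilar(misc_sentences, representatives):
    """Return the miscellaneous sentence least similar to any representative."""
    rep_tokens = [tokens(r) for r in representatives]

    def closeness(sentence):
        st = tokens(sentence)
        # Similarity to the nearest representative; 0.0 if there are none.
        return max((dice(st, rt) for rt in rep_tokens), default=0.0)

    return min(misc_sentences, key=closeness)


if __name__ == "__main__":
    reps = ["XML is a markup language"]
    misc = ["XML is a popular markup language", "Parsers are freely available"]
    print(most_dissimilar(misc, reps))
```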

17 Presentation

18 Presentation (cont). Top 50 paragraphs for the term “XML”. Only one sentence was selected from each group. Each viewpoint label or sentence is hyperlinked to the associated group or the source paragraph.

19 Evaluation. Summarization evaluation can be classified into intrinsic and extrinsic approaches. Intrinsic: the quality of the text itself, such as its informativeness. Extrinsic: whether a summary improves the efficiency of a specific task.

20 Evaluation (cont). 15 Japanese terms are used as test inputs. To calculate the coverage, for each of the 15 terms, two students annotated each simple sentence in the top 50 paragraphs of the CYCLONE results with one or more viewpoints. They defined 28 viewpoints, including the 12 viewpoints used by the method. Compression ratio and coverage were calculated over the top 50 paragraphs.
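
The two measures on slide 20 can be illustrated with a short sketch. The slide does not spell out the formulas, so the definitions below are assumptions: compression ratio as summary characters divided by characters in the source paragraphs, and coverage as the fraction of annotated source viewpoints that also appear in the summary. The function names are hypothetical.

```python
from typing import Iterable, Set


def compression_ratio(summary: str, source_paragraphs: Iterable[str]) -> float:
    """Characters in the summary divided by characters in the source paragraphs."""
    total = sum(len(p) for p in source_paragraphs)
    return len(summary) / total if total else 0.0


def coverage(summary_viewpoints: Set[str], source_viewpoints: Set[str]) -> float:
    """Fraction of viewpoints annotated in the source that the summary retains."""
    if not source_viewpoints:
        return 0.0
    return len(summary_viewpoints & source_viewpoints) / len(source_viewpoints)


if __name__ == "__main__":
    print(compression_ratio("abcd", ["abcdefgh", "ijklmnop"]))
    print(coverage({"definition", "purpose"},
                   {"definition", "purpose", "history", "product"}))
```

Under these definitions a lower compression ratio and a higher coverage are both desirable, which is the trade-off the results slide compares.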

21 Results. #Reps: the number of representative sentences selected from each viewpoint group. #Chars: the number of characters in a summary. They select five sentences from the miscellaneous group. VBS: the viewpoint-based summarization method. Lead: a baseline that systematically extracts the top N characters from the CYCLONE results.
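
The Lead baseline on slide 21 is simple enough to sketch directly. Concatenating the paragraphs in rank order without a separator before cutting at N characters is an assumption, and `lead_baseline` is a hypothetical name.

```python
def lead_baseline(ranked_paragraphs, n_chars):
    """Return the first n_chars characters of the ranked results.

    Assumption: paragraphs are joined in rank order with no separator.
    """
    return "".join(ranked_paragraphs)[:n_chars]


if __name__ == "__main__":
    print(lead_baseline(["XML is a markup language. ", "It was derived from SGML."], 24))
```

Setting `n_chars` to the length of the VBS summary would make the two methods comparable at the same compression ratio.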

22 Conclusion. To compile encyclopedic term descriptions from the Web, they introduced a summarization method. They identify simple sentences, classify those sentences into viewpoint groups, select representative sentences from each group, and present them. VBS achieved a good compression ratio, and its coverage is better than that of the baseline. Future work includes generating a coherent text and performing an extrinsic evaluation.

