NYU/CRL system for DUC and Prospect for Single Document Summaries Satoshi Sekine (New York University) Chikashi Nobata (CRL – Japan) September 14, 2001.


2 NYU/CRL system for DUC and Prospect for Single Document Summaries Satoshi Sekine (New York University) Chikashi Nobata (CRL – Japan) September 14, 2001 DUC2001 Workshop

3 Objective Use IE technologies for Summarization –Named Entity –Automatic pattern discovery Find important phrases (patterns) of the domain Combine with Summarization technologies –Important Sentence Extraction Sentence position, length, TF/IDF, Headline

4 Important Sentence Extraction Combining 5 scores –Sentence position –Sentence length –TF/IDF –Similarity to Headline –Pattern Optimize functions/weights on training data
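A minimal sketch of how the five scores might be combined, assuming a weighted linear sum (the slide does not state the combining function). The names `combine_scores` and `top_sentences` and the feature keys are hypothetical, chosen for illustration only.

```python
from typing import Dict, List


def combine_scores(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted linear sum of one sentence's feature scores (assumed form)."""
    return sum(weights[name] * value for name, value in scores.items())


def top_sentences(
    sentence_scores: List[Dict[str, float]],
    weights: Dict[str, float],
    k: int,
) -> List[int]:
    """Return indices of the k highest-scoring sentences, in document order."""
    ranked = sorted(
        range(len(sentence_scores)),
        key=lambda i: combine_scores(sentence_scores[i], weights),
        reverse=True,
    )
    return sorted(ranked[:k])
```

Returning the selected indices in document order keeps the extracted sentences in their original sequence.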

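5 Alternative scores for Sentence position
Three candidate functions of the sentence position i (n = number of sentences, T = threshold):
–Score = 1 (i<T), 0 (otherwise)
–Score = 1/i
–Score = max(1/i, 1/(n-i+1))
[Figure: Score plotted against Sentence position]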
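The three position functions on this slide can be sketched as follows, assuming i is the 1-based position of the sentence, n is the number of sentences in the document, and T is a cutoff position. The function names are hypothetical.

```python
def position_threshold(i: int, T: int) -> float:
    """Score 1 for sentences before the cutoff T, 0 otherwise."""
    return 1.0 if i < T else 0.0


def position_reciprocal(i: int) -> float:
    """Score decays as 1/i from the start of the document."""
    return 1.0 / i


def position_both_ends(i: int, n: int) -> float:
    """Score is high near either the start or the end of the document."""
    return max(1.0 / i, 1.0 / (n - i + 1))
```

The third variant rewards closing sentences as well as opening ones, which the first two ignore.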

6 Alternative scores for Sentence length & TF/IDF
Sentence length (L = sentence length, C = a fixed threshold)
1. Score = L
2. Score = L (if L>C), L - C (otherwise)
TF/IDF: three alternatives for the TF term
–TF = tf(w)
–TF = (tf(w)-1)/tf(w)
–TF = tf(w)/(tf(w)+1)
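A short sketch of the length and TF alternatives listed on this slide. It assumes `length` is the sentence length, `c` is the threshold C, and `tf` is the raw frequency of a word in the document (at least 1). The function names are hypothetical, and the IDF term is omitted because the slide does not define it.

```python
def length_plain(length: int) -> float:
    """Alternative 1: the score is the sentence length itself."""
    return float(length)


def length_penalized(length: int, c: int) -> float:
    """Alternative 2: sentences no longer than c get a negative score."""
    return float(length) if length > c else float(length - c)


def tf_raw(tf: int) -> float:
    """TF = tf(w)."""
    return float(tf)


def tf_shifted(tf: int) -> float:
    """TF = (tf(w) - 1) / tf(w); zero for words that occur once."""
    return (tf - 1) / tf


def tf_saturating(tf: int) -> float:
    """TF = tf(w) / (tf(w) + 1); approaches 1 as tf grows."""
    return tf / (tf + 1)
```

Both non-raw TF variants stay below 1, so a single very frequent word cannot dominate a sentence's score.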

7 Alternative scores for Headline
–TF/IDF ratio between the words in the sentence that also appear in the headline and all words in the sentence
–TF ratio between the Named Entities (NEs) in the sentence that also appear in the headline and all NEs in the sentence, with TF = tf(e)/(1+tf(e))
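Both headline scores are ratios of weighted overlap to total weight, so one helper can illustrate them. This is a sketch under assumptions: `weight` maps each word (or Named Entity) to its TF/IDF or TF value, items missing from `weight` count as zero, and the name `headline_ratio` is hypothetical.

```python
from typing import Dict, Iterable, Set


def headline_ratio(
    sentence_items: Iterable[str],
    headline_items: Set[str],
    weight: Dict[str, float],
) -> float:
    """Weight of sentence items found in the headline over weight of all items."""
    items = list(sentence_items)
    total = sum(weight.get(item, 0.0) for item in items)
    if total == 0.0:
        return 0.0  # no weighted items in the sentence
    overlap = sum(weight.get(item, 0.0) for item in items if item in headline_items)
    return overlap / total
```

For the Named Entity variant, `weight` would hold tf(e)/(1+tf(e)) for each entity e instead of a TF/IDF value.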

8 Pattern Assumption –Patterns (phrases) that appear often in the domain are important Strategy –Intended to use IR to find a larger set of documents in the domain, but used the given document set –NEs were treated as classes rather than as literal strings

9 Pattern discovery Procedure –Analyze sentences (NE, dependency) –Extract all sub-trees from the dependency trees in the domain –Score the trees based on frequency of the tree and TF/IDF of the words –High score trees are regarded as important patterns
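The sub-tree extraction and frequency counting steps can be sketched as below. This is an illustration under assumptions, not the system's implementation: a dependency tree is represented as a nested `(label, [children])` pair, labels are words or NE class names, and each pattern is counted once per sentence. The TF/IDF part of the tree score is left out because the slide does not give the formula. All function names are hypothetical.

```python
from collections import Counter
from itertools import product


def rooted_subtrees(node):
    """All connected sub-trees whose root is `node`, as hashable nested tuples."""
    label, children = node
    # Each child is either dropped (None) or contributes one of its own sub-trees.
    options = [[None] + rooted_subtrees(child) for child in children]
    result = []
    for choice in product(*options):
        kept = tuple(sorted(c for c in choice if c is not None))
        result.append((label, kept))
    return result


def all_subtrees(node):
    """Sub-trees rooted at every node of the tree."""
    found = rooted_subtrees(node)
    for child in node[1]:
        found.extend(all_subtrees(child))
    return found


def count_patterns(trees):
    """Number of sentences (trees) in which each sub-tree pattern occurs."""
    counts = Counter()
    for tree in trees:
        counts.update(set(all_subtrees(tree)))
    return counts
```

The number of sub-trees grows exponentially with the branching factor, so a practical version would likely cap the sub-tree size.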

10 Optimal weight
Optimal weights are found on the training set.
Contribution
Score       weight * std. dev.
Position    277
Length      8
TF/IDF      96
Headline    18
Pattern     2
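The contribution figure (weight * std. dev.) can be computed as in the sketch below. It assumes the population standard deviation of each score over the training sentences; the slide does not say which estimator was used, and the name `contributions` is hypothetical.

```python
from statistics import pstdev
from typing import Dict, List


def contributions(
    feature_values: Dict[str, List[float]],
    weights: Dict[str, float],
) -> Dict[str, float]:
    """Weight times standard deviation of each score across sentences."""
    return {
        name: weights[name] * pstdev(values)
        for name, values in feature_values.items()
    }
```

Scaling by the standard deviation makes weights comparable across scores with different ranges: a large weight on a score that barely varies contributes little to the ranking.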

11 Evaluation Result
Subjective evaluation (V; out of 12), average over all documents
                 System      Lead    Average
Grammaticality   3.711 (5)   3.236   3.580
Cohesion         3.054 (1)   2.926   2.676
Organization     3.215 (1)   3.081   2.870
Total            9.980 (1)   9.243   9.126

12 Prospect for Single Document Summaries Important Sentence Extraction CAN be Summarization but Summarization is NOT Important Sentence Extraction

13 DUC We are aiming for Document understanding How can understanding be instantiated? –Make summary –Extract essential points, principal relations –Answer questions –Comprehension test

14 Example Earthquake jolts Los Angeles area LOS ANGELES (AP) — An earthquake shook the greater Los Angeles area Sunday, but there were no immediate reports of damage or injuries. The quake had a preliminary magnitude of 4.2 and was centered about one mile southeast of West Hollywood, said Lucy Jones of the U.S. Geological Survey. The quake was felt in downtown Los Angeles where it rolled for about four seconds and also shook in the suburban areas of Van Nuys, Whittier and Glendale.

15 Essential points Event (Earthquake) –When: Sunday, September 9, 2001 –Where: greater Los Angeles area –Magnitude: 4.2 –Injury: No –Death: No –Damage: No
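The essential points on this slide amount to a filled IE template. A minimal sketch of such a template as a data structure, where the class name `EarthquakeEvent` and its field names are hypothetical and the values are taken from the slide:

```python
from dataclasses import dataclass


@dataclass
class EarthquakeEvent:
    """Hypothetical template for the essential points of an earthquake report."""
    when: str
    where: str
    magnitude: float
    injury: bool
    death: bool
    damage: bool


event = EarthquakeEvent(
    when="Sunday, September 9, 2001",
    where="greater Los Angeles area",
    magnitude=4.2,
    injury=False,
    death=False,
    damage=False,
)
```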

16 How can we make it IE is a hint (a step) IE is a version of document understanding limited to a specific domain and task which are given in advance Document understanding can be achieved by upgrading IE technologies by deleting “specific” and “given in advance”

17 Our approach Essential points can be found by searching frequently mentioned patterns in the same domain Strategy –Given a document, find its domain by IR –Find frequently mentioned patterns –Extract information matching those patterns

18 Single Document Summarization Has to be continued –To pursue research on “Understanding” –To find something more than sentence extraction –To observe humans in the summarization task –To have newcomers (like us)

