NYU/CRL system for DUC and Prospect for Single Document Summaries
Satoshi Sekine (New York University)
Chikashi Nobata (CRL, Japan)
September 14, 2001, DUC2001 Workshop

Objective
Use IE technologies for summarization:
– Named Entity recognition
– Automatic pattern discovery: find important phrases (patterns) of the domain
Combine with summarization technologies:
– Important Sentence Extraction: sentence position, length, TF/IDF, headline

Important Sentence Extraction
Combining 5 scores (a weighted combination is sketched below):
– Sentence position
– Sentence length
– TF/IDF
– Similarity to headline
– Pattern
Optimize functions/weights on training data.
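A minimal sketch of the combination step, assuming each sentence already has its five evidence scores computed; the function names, example scores, and weights below are illustrative placeholders, not the actual NYU/CRL implementation.

```python
from typing import Dict, List

def combine_scores(per_sentence: List[Dict[str, float]],
                   weights: Dict[str, float]) -> List[float]:
    """Weighted sum of the individual evidence scores for each sentence."""
    return [sum(weights[name] * scores.get(name, 0.0) for name in weights)
            for scores in per_sentence]

def select_sentences(per_sentence: List[Dict[str, float]],
                     weights: Dict[str, float], k: int = 3) -> List[int]:
    """Indices of the k highest-scoring sentences, kept in document order."""
    combined = combine_scores(per_sentence, weights)
    top = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)[:k]
    return sorted(top)

# Hypothetical scores for a four-sentence document.
scores = [
    {"position": 1.00, "length": 0.8, "tfidf": 0.4, "headline": 0.6, "pattern": 0.1},
    {"position": 0.50, "length": 0.9, "tfidf": 0.7, "headline": 0.2, "pattern": 0.0},
    {"position": 0.33, "length": 0.6, "tfidf": 0.9, "headline": 0.1, "pattern": 0.3},
    {"position": 0.25, "length": 0.7, "tfidf": 0.2, "headline": 0.0, "pattern": 0.0},
]
weights = {"position": 1.0, "length": 0.2, "tfidf": 0.8, "headline": 0.4, "pattern": 0.1}
print(select_sentences(scores, weights, k=2))  # -> [0, 1]
```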

Alternative scores for sentence position (i = 1-based sentence position, n = number of sentences, T = a cutoff):
– Score = 1/i
– Score = max(1/i, 1/(n−i+1))
– Score = 1 if i < T, 0 otherwise
(The original slide plotted each alternative as score versus sentence position.)
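The same three alternatives written out as small functions (a sketch; the function names are ours):

```python
def position_reciprocal(i: int) -> float:
    """Score = 1/i: earlier sentences score higher."""
    return 1.0 / i

def position_both_ends(i: int, n: int) -> float:
    """Score = max(1/i, 1/(n-i+1)): favors both the start and the end."""
    return max(1.0 / i, 1.0 / (n - i + 1))

def position_lead(i: int, T: int) -> float:
    """Score = 1 if i < T, 0 otherwise."""
    return 1.0 if i < T else 0.0
```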

Alternative scores for sentence length and TF/IDF
Sentence length (L = sentence length, C = a length threshold):
1. Score = L
2. Score = L if L > C, L − C otherwise
TF/IDF: alternative TF components for a word w are tf(w), (tf(w)−1)/tf(w), and tf(w)/(tf(w)+1)
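The same alternatives as small helper functions (a sketch under the reading above; the names are ours, and the IDF factor would be supplied separately from a background corpus):

```python
def length_raw(length: int) -> float:
    """Alternative 1: Score = sentence length."""
    return float(length)

def length_thresholded(length: int, C: int) -> float:
    """Alternative 2: Score = length if it exceeds the threshold C,
    otherwise length - C (a negative penalty for short sentences)."""
    return float(length) if length > C else float(length - C)

def tf_raw(tf: int) -> float:
    """TF component: tf(w)."""
    return float(tf)

def tf_damped(tf: int) -> float:
    """TF component: (tf(w) - 1) / tf(w)."""
    return (tf - 1) / tf

def tf_saturating(tf: int) -> float:
    """TF component: tf(w) / (tf(w) + 1)."""
    return tf / (tf + 1)
```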

Alternative scores for headline similarity
– TF/IDF ratio between the words of the sentence that also occur in the headline and all words in the sentence
– TF ratio between the Named Entities (NEs) of the sentence that also occur in the headline and all NEs in the sentence, with TF = tf(e)/(1+tf(e))
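A sketch of the two headline scores, assuming precomputed TF/IDF values and NE counts are available; the function names and data structures are illustrative, not the system's API.

```python
from typing import Dict, Set

def headline_word_score(sentence_words: Set[str], headline_words: Set[str],
                        tfidf: Dict[str, float]) -> float:
    """TF/IDF mass of sentence words that also occur in the headline,
    divided by the TF/IDF mass of all words in the sentence."""
    overlap = sum(tfidf.get(w, 0.0) for w in sentence_words & headline_words)
    total = sum(tfidf.get(w, 0.0) for w in sentence_words) or 1.0
    return overlap / total

def headline_ne_score(sentence_nes: Dict[str, int], headline_nes: Set[str]) -> float:
    """Same idea for named entities, using TF = tf(e) / (1 + tf(e))."""
    def tf(count: int) -> float:
        return count / (1 + count)
    overlap = sum(tf(c) for e, c in sentence_nes.items() if e in headline_nes)
    total = sum(tf(c) for c in sentence_nes.values()) or 1.0
    return overlap / total
```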

Pattern
Assumption: patterns (phrases) that appear often in the domain are important.
Strategy:
– We intended to use IR to find a larger set of documents in the domain, but used the given document set instead.
– NEs were treated as classes rather than literal strings.
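To illustrate what treating NEs as classes means, here is a small sketch (not the actual preprocessing) that replaces NE mentions with their class labels, so that a pattern like "<ORGANIZATION> acquired <ORGANIZATION>" generalizes across entities:

```python
from typing import List, Tuple

def generalize_nes(tokens: List[str], nes: List[Tuple[int, int, str]]) -> List[str]:
    """nes: (start, end, label) token spans; end is exclusive."""
    out, i = [], 0
    spans = sorted(nes)
    while i < len(tokens):
        span = next((s for s in spans if s[0] == i), None)
        if span:
            out.append(f"<{span[2]}>")  # replace the whole NE span with its class
            i = span[1]
        else:
            out.append(tokens[i])
            i += 1
    return out

tokens = "IBM acquired Lotus last year".split()
print(generalize_nes(tokens, [(0, 1, "ORGANIZATION"), (2, 3, "ORGANIZATION")]))
# ['<ORGANIZATION>', 'acquired', '<ORGANIZATION>', 'last', 'year']
```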

Pattern discovery
Procedure (a simplified sketch follows):
– Analyze the sentences (NE tagging, dependency parsing)
– Extract all sub-trees from the dependency trees of the domain documents
– Score each tree by its frequency and the TF/IDF of its words
– High-scoring trees are regarded as important patterns
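A deliberately simplified sketch of the scoring step: it counts head-dependent pairs rather than full dependency sub-trees, and the scoring formula (frequency times average word TF/IDF) is only an approximation of what the slide describes.

```python
from collections import Counter
from typing import Dict, List, Tuple

Dependency = Tuple[str, str]   # (head word, dependent word)
Sentence = List[Dependency]

def discover_patterns(domain_sentences: List[Sentence],
                      tfidf: Dict[str, float],
                      top_k: int = 10) -> List[Tuple[Dependency, float]]:
    """Return the top_k most frequent dependency pairs, scored by
    frequency * average TF/IDF of the two words."""
    freq = Counter(dep for sent in domain_sentences for dep in sent)
    scored = [(dep, count * (tfidf.get(dep[0], 0.0) + tfidf.get(dep[1], 0.0)) / 2)
              for dep, count in freq.items()]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]
```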

Optimal weights
Optimal weights are found on the training set (one possible search procedure is sketched below).
Contribution of each score (weight × standard deviation):
– Position: 277
– Length: 8
– TF/IDF: 96
– Headline: 18
– Pattern: 2
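One way such a weight search could be realized is an exhaustive grid search over a small set of candidate weights; this is a sketch, not the actual tuning procedure, and `evaluate` stands in for whatever training-set objective was used (e.g. overlap with reference summaries).

```python
from itertools import product
from typing import Callable, Dict, Sequence

def tune_weights(evaluate: Callable[[Dict[str, float]], float],
                 names: Sequence[str] = ("position", "length", "tfidf",
                                         "headline", "pattern"),
                 grid: Sequence[float] = (0.0, 0.5, 1.0, 2.0)) -> Dict[str, float]:
    """Try every combination of candidate weights and keep the best one."""
    best, best_score = {}, float("-inf")
    for combo in product(grid, repeat=len(names)):
        weights = dict(zip(names, combo))
        score = evaluate(weights)
        if score > best_score:
            best, best_score = weights, score
    return best

# Hypothetical usage with a dummy objective that simply prefers the position weight:
best = tune_weights(lambda w: w["position"] - 0.1 * sum(w.values()))
```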

Evaluation Result
Subjective evaluation (out of 12), averaged over all documents. Our system's scores, with the rank among systems in parentheses (the original slide also compared against the Lead baseline and the all-system average):
– Grammaticality: 3.711 (5)
– Cohesion: 3.054 (1)
– Organization: 3.215 (1)
– Total: 9.980 (1)

Prospect for Single Document Summaries
Important Sentence Extraction CAN be summarization, but summarization is NOT important sentence extraction.

DUC
We are aiming for document understanding. How can understanding be instantiated?
– Make a summary
– Extract essential points and principal relations
– Answer questions
– Comprehension test

Example
"Earthquake jolts Los Angeles area"
LOS ANGELES (AP) — An earthquake shook the greater Los Angeles area Sunday, but there were no immediate reports of damage or injuries. The quake had a preliminary magnitude of 4.2 and was centered about one mile southeast of West Hollywood, said Lucy Jones of the U.S. Geological Survey. The quake was felt in downtown Los Angeles where it rolled for about four seconds and also shook in the suburban areas of Van Nuys, Whittier and Glendale.

Essential points
Event: Earthquake
– When: Sunday, September 9, 2001
– Where: greater Los Angeles area
– Magnitude: 4.2
– Injury: No
– Death: No
– Damage: No

How can we make it?
IE is a hint (a step): IE is a version of document understanding limited to a specific domain and task that are given in advance. Document understanding can be achieved by upgrading IE technologies to remove the "specific" and "given in advance" restrictions.

Our approach
Essential points can be found by searching for patterns that are frequently mentioned in the same domain.
Strategy (a toy sketch follows):
– Given a document, find its domain by IR
– Find frequently mentioned patterns in that domain
– Extract the information that matches those patterns
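A toy, end-to-end sketch of this strategy with heavily simplified stand-ins: the "domain" is found by word overlap instead of a real IR engine, "patterns" are frequent word bigrams instead of dependency sub-trees, and "extraction" just keeps the sentences that contain a pattern.

```python
from collections import Counter
from typing import List, Tuple

def retrieve_domain(document: str, corpus: List[str], k: int = 3) -> List[str]:
    """Step 1: crude IR -- rank corpus documents by word overlap with the input."""
    query = set(document.lower().split())
    return sorted(corpus,
                  key=lambda d: len(query & set(d.lower().split())),
                  reverse=True)[:k]

def frequent_patterns(docs: List[str], top_k: int = 5) -> List[Tuple[str, str]]:
    """Step 2: 'patterns' here are just the most frequent word bigrams in the domain."""
    bigrams = Counter()
    for d in docs:
        toks = d.lower().split()
        bigrams.update(zip(toks, toks[1:]))
    return [bg for bg, _ in bigrams.most_common(top_k)]

def extract_matches(document: str, patterns: List[Tuple[str, str]]) -> List[str]:
    """Step 3: keep the sentences of the input that contain one of the patterns."""
    hits = []
    for sent in document.split("."):
        toks = sent.lower().split()
        pairs = list(zip(toks, toks[1:]))
        if any(bg in pairs for bg in patterns):
            hits.append(sent.strip())
    return hits
```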

Single Document Summarization
Has to be continued:
– To pursue research on "understanding"
– To find something more than sentence extraction
– To observe humans performing the summarization task
– To attract newcomers (like us)