Exploiting Timelines to Enhance Multi-document Summarization Jun-Ping Ng, Yan Chen, Min-Yen Kan, Zhoujun Li DSO National Laboratories National University.

Presentation transcript:

Exploiting Timelines to Enhance Multi-document Summarization
Jun-Ping Ng, Yan Chen, Min-Yen Kan, Zhoujun Li
DSO National Laboratories, National University of Singapore, Beihang University

Outline
Overview
Approach
Experiments and Results
Discussion

OVERVIEW

Multi-document Summarization

Extractive Summarization
Find the most salient sentences in the source collection
The top-k sentences are extracted to compose the final summary
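The top-k selection step can be sketched in a few lines. This is a minimal illustration, not the system described in the talk: `extract_top_k` and `toy_score` are hypothetical names, and the length-based score is only a stand-in for a real salience model.

```python
from typing import Callable, List


def extract_top_k(sentences: List[str], score: Callable[[str], float], k: int) -> List[str]:
    """Return the k highest-scoring sentences, kept in their original order."""
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:k])  # restore source order for readability
    return [sentences[i] for i in chosen]


def toy_score(sentence: str) -> float:
    # Placeholder salience (assumption): longer sentences score higher.
    return float(len(sentence.split()))
```

Any scoring function with the same signature can be passed in place of `toy_score`.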

Two Storms
(1) A fierce cyclone packing extreme winds and torrential rain smashed into Bangladesh’s southwestern coast Thursday, wiping out homes and trees in what officials described as the worst storm in years.
(2) More than 100,000 coastal villagers have been evacuated before the cyclone made landfall.
(3) The storm matched one in 1991 that sparked a tidal wave that killed an estimated 138,000 people, Karmakar told AFP.

Timeline

APPROACH

Merging Timelines Into Summarization

Temporal Processing
Based on TimeML (Pustejovsky et al., 2003)
Basic temporal units: events and timexes
Three steps
  – Event-timex temporal relation classification
  – Event-event temporal relation classification
  – Timex normalization
Merge to obtain timelines
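The final merge step can be pictured as grouping events under the normalized time they are anchored to. The sketch below is an assumption-laden simplification: `build_timeline` and its input shapes are hypothetical, it takes the classifier outputs as given, and it leaves out the event-event relations entirely.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_timeline(
    event_timex: List[Tuple[str, str]],  # (event, timex mention) pairs judged to be related
    normalized: Dict[str, str],          # timex mention -> normalized value, e.g. "1991"
) -> List[Tuple[str, List[str]]]:
    """Group events by normalized time span and return the spans in sorted order."""
    spans: Dict[str, List[str]] = defaultdict(list)
    for event, timex in event_timex:
        if timex in normalized:          # skip timexes that could not be normalized
            spans[normalized[timex]].append(event)
    # Assumption: normalized values are ISO-style strings, so string order follows time order.
    return sorted(spans.items())
```

Each entry of the result pairs one time span with the events placed in it, which is the shape the later timeline features operate on in these sketches.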

Timelines

Summarization: SWING

Sentence Scoring
Time span importance
Contextual time span importance
Sentence temporal coverage density

Defining Timeline Features

Time Span Importance (TSI)
Time spans that contain many events are more salient
Sentences that reference events in these time spans are thus better candidates for a summary

Scoring TSI
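The scoring formula itself does not survive in this transcript. As a rough sketch of the idea only, assuming a span's importance is its event count scaled by the busiest span, and a sentence takes the score of its most important span (both are assumptions, and all names are hypothetical):

```python
from typing import Dict, List

Timeline = Dict[str, List[str]]  # time span -> events placed in that span


def time_span_importance(timeline: Timeline) -> Dict[str, float]:
    """Score each span by its event count, scaled so the busiest span scores 1.0."""
    busiest = max((len(events) for events in timeline.values()), default=0)
    if busiest == 0:
        return {span: 0.0 for span in timeline}
    return {span: len(events) / busiest for span, events in timeline.items()}


def sentence_tsi(sentence_events: List[str], timeline: Timeline) -> float:
    """Take the importance of the most salient span that the sentence's events fall in."""
    tsi = time_span_importance(timeline)
    span_of = {event: span for span, events in timeline.items() for event in events}
    scores = [tsi[span_of[e]] for e in sentence_events if e in span_of]
    return max(scores, default=0.0)
```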

Contextual Time Span Importance (CTSI)
Time spans near “important” time spans may also be important

Scoring CTSI
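The CTSI formula is likewise missing from the transcript. One plausible way to express "nearby spans inherit importance" is to let each span borrow a discounted score from its neighbours. The `window` and `decay` parameters below are hypothetical, as is the choice of taking the maximum over neighbours.

```python
from typing import List


def contextual_tsi(tsi: List[float], window: int = 1, decay: float = 0.5) -> List[float]:
    """For each span (in chronological order), borrow importance from nearby spans.

    The score is the best neighbouring TSI within `window` positions,
    discounted by `decay` for every step of distance.
    """
    scores = []
    for i in range(len(tsi)):
        best = 0.0
        for j in range(max(0, i - window), min(len(tsi), i + window + 1)):
            if j != i:  # a span does not contribute to its own context
                best = max(best, tsi[j] * decay ** abs(i - j))
        scores.append(best)
    return scores
```

With scores `[0.2, 1.0, 0.1]`, the two spans next to the busy middle span each receive 0.5, while the middle span itself gains little from its quiet neighbours.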

Sentence Temporal Coverage Density (TCD)
The number of sentences in a summary is limited
Favour sentences which
  – contain more events
  – cover a wide variety of time spans

Scoring TCD
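The exact TCD definition is not recoverable from the transcript. The sketch below captures only the coverage intuition, assuming the score is the fraction of the timeline's spans touched by a sentence's events; `temporal_coverage` and its arguments are hypothetical.

```python
from typing import Dict, List


def temporal_coverage(sentence_events: List[str], span_of: Dict[str, str], total_spans: int) -> float:
    """Fraction of the timeline's spans touched by the events in one sentence."""
    if total_spans <= 0:
        return 0.0
    # Events missing from the timeline are ignored rather than treated as errors.
    covered = {span_of[e] for e in sentence_events if e in span_of}
    return len(covered) / total_spans
```

Under this assumption, a sentence whose events all fall in one span scores lower than a sentence with the same number of events spread over several spans.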

Sentence Re-ordering
SWING uses the Maximal Marginal Relevance (MMR) algorithm to identify redundancies in selected sentences
MMR is heavily biased towards lexical and surface similarities
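For reference, a generic greedy MMR loop looks roughly like the following. This is a textbook-style sketch, not SWING's implementation: the Jaccard word overlap, the `lam` trade-off value, and the function names are all assumptions chosen to show why the penalty depends on surface wording.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Surface similarity: shared words over all words."""
    return len(a & b) / len(a | b) if a | b else 0.0


def mmr_select(relevance: Dict[str, float], k: int, lam: float = 0.7) -> List[str]:
    """Greedily pick k sentences, trading relevance against similarity to earlier picks."""
    words = {s: set(s.lower().split()) for s in relevance}
    selected: List[str] = []
    candidates = list(relevance)
    while candidates and len(selected) < k:
        def mmr(s: str) -> float:
            # Redundancy is the highest word overlap with any sentence already chosen.
            redundancy = max((jaccard(words[s], words[t]) for t in selected), default=0.0)
            return lam * relevance[s] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected
```

Because the penalty is computed from shared words, two sentences that describe the same moment in different vocabulary are not flagged as redundant.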

Beyond Lexical Penalties
An official in Barisal, 120 kilometres south of Dhaka, spoke of severe destruction as the 500 kilometre-wide mass of cloud passed overhead.
“Many trees have been uprooted and houses and schools blown away,” Mostofa Kamal, a district relief and rehabilitation officer, told AFP by telephone.
“Mud huts have been damaged and the roofs of several houses blown off,” said the state’s relief minister, Mortaza Hossain.

TimeMMR
Novel dimension to redundancy detection
Beyond lexical similarities, identify sentences which contain substantial time span overlaps
Candidate sentences which share many time spans with selected sentences are penalised
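The time span penalty can be sketched as a score adjustment applied to each remaining candidate. The overlap measure, the `weight` value, and the names below are hypothetical; the talk does not specify how the penalty is combined with the lexical one.

```python
from typing import Dict, List, Set


def span_overlap(a: Set[str], b: Set[str]) -> float:
    """Share of the candidate's time spans already covered by a selected sentence."""
    return len(a & b) / len(a) if a else 0.0


def time_penalised(
    relevance: Dict[str, float],
    spans: Dict[str, Set[str]],  # sentence id -> time spans its events fall in
    selected: List[str],
    weight: float = 0.5,         # hypothetical penalty strength
) -> Dict[str, float]:
    """Lower each unselected candidate's score by its worst overlap with the selected set."""
    adjusted = {}
    for sid, rel in relevance.items():
        if sid in selected:
            continue
        worst = max(
            (span_overlap(spans.get(sid, set()), spans.get(t, set())) for t in selected),
            default=0.0,
        )
        adjusted[sid] = rel - weight * worst
    return adjusted
```

A candidate whose events sit in the same spans as an already selected sentence loses score even when the two share no words.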

EXPERIMENTS AND RESULTS

Results
TAC-2010 data set used to train the regression model
TAC-2011 data set used for testing
Using timelines leads to better summaries!
System        ROUGE-2
SWING
Timelines     0.1394*
+ TimeMMR

Overcoming Errors
Timelines contain errors
  – Errors from underlying temporal processing systems
  – Simplifying assumptions made in timeline construction
  – Lack of consistency checking and validation

Reliability Filtering
Identify timelines which potentially contain more errors
Exclude these when performing summarization

Length as a Metric
Use the length of a timeline as a gauge of its “accuracy”
Drop timelines that are shorter than the average length, computed over the whole input document collection
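The length filter is simple enough to state directly. The sketch assumes a timeline is a list of entries keyed by a hypothetical document identifier, and that "length" means the number of entries; the transcript does not say which unit is counted.

```python
from typing import Dict, List


def filter_by_length(timelines: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep only timelines at least as long as the collection-wide average."""
    if not timelines:
        return {}
    average = sum(len(t) for t in timelines.values()) / len(timelines)
    # Timelines shorter than the average are treated as unreliable and dropped.
    return {doc: t for doc, t in timelines.items() if len(t) >= average}
```

Documents whose timelines are dropped would then be summarized without timeline features.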

Results
Experiments repeated with reliability filtering
Significant improvement obtained
After filtering, timelines are used in 21 out of 44 document sets
System                      ROUGE-2
SWING
Timelines                   0.1394*
+ Timelines + Filtering     **
+ TimeMMR
TimeMMR + Filtering         **

DISCUSSION

Text Example
The Army’s surgeon general criticized stories in The Washington Post disclosing problems at Walter Reed Army Medical Center, saying the series unfairly characterized the living conditions and care for soldiers recuperating from wounds at the hospital’s facilities.
Defense Secretary Robert Gates says people found to have been responsible for allowing substandard living conditions for soldier outpatients at Walter Reed Army Medical Center in Washington will be “held accountable,” although so far no one in the Army chain of command has offered to resign.
A top Army general vowed to personally oversee the upgrading of Walter Reed Army Medical Center’s Building 18, a dilapidated former hotel that houses wounded soldiers as outpatients.
Top Army officials visited Building 18, the decrepit former hotel housing more than 80 recovering soldiers, outside
“I’m not sure it was an accurate representation,” Lt. Gen. Kevin Kiley, chief of the Army Medical Command which oversees Walter Reed and all Army health care, told reporters during a news conference.
Columns: Timelines Used, SWING

Future Work
Study the use of alternative evaluation metrics, especially for TimeMMR
Look at better metrics for reliability filtering
Expand the scope of the timelines that are used for more flexibility

Conclusion
Temporal information is useful for summarization!
Sentence Scoring
  – Derive features from a timeline
  – Combine features with a supervised learning summarization framework
Sentence Re-ordering
  – Use overlapping time spans to identify redundancies

Thank you!