Aspect-driven summarization
Unit for Global Security and Crisis Management

Background and motivation
We have developed multilingual text mining software for EMM (Europe Media Monitor) that automatically:
- gathers 100K news articles per day in 50 languages from about 2.5K news sources (cf. Fig. 1);
- clusters all these articles into major news stories and tracks them over time;
- detects events (NEXUS): event type, victims, perpetrators, etc.
This creates a need for multilingual multi-document and update summarization.

Figure 1. News clusters in EMM's NewsExplorer and extracted events in NEXUS.

Our summarization approach within the new task
Guided task: a summary must include the aspects defined for its category.
Our idea: the summary should contain the topics frequently mentioned within the cluster, but it should also be rich in the aspects.
The summarizer is based on co-occurrence (LSA) and aspect coverage. LSA input: lexical features and named entities (as in TAC09).
1st step: creation of the SVD input matrix A and the aspect matrix P.
2nd step: singular value decomposition $A = U S V^T$.
3rd step: sentence selection based on the values in $F = S V^T$ and P (Fig. 2). Iteratively select the sentence with the largest overall score, then subtract its information from F and P, so that later choices favour more diverse information and redundancy is avoided.

Figure 2. Sentence score computation from co-occurrence (LSA) and aspect information.

Information extraction for aspect capturing
- Entity recognition and disambiguation: used in the LSA input representation and for capturing person/organization-related aspects.
- Event extraction system (NEXUS). Example sentence: "All the 20 people taken hostage by armed pirates were safe." Extracted slots: event type (kidnapping), victims (20 people), perpetrator (pirates).
- Automatically learnt lexica (Ontopopulis). Sample from the lexicon for countermeasures: operation, rescue operation, rescue, evacuation, treatment, assistance, relief, military operation, police operation, security operation, aid.

Overall results
Run ID | Overall responsiveness | Linguistic quality | Pyramid score
16 (best run in overall responsiveness) | 3.17 (1) | 3.46 (2) | 0.40 (4)
22 (best run in Pyramid score) | 3.13 (2) | 3.11 (13) | 0.43 (1)
25 (co-occurrence + aspects) | 2.98 (10) | 3.35 (4) | 0.37 (18)
31 (co-occurrence only) | 2.89 (19) | 3.28 (6) | 0.38 (13)
2 (baseline: MEAD) | 2.50 (27) | 2.72 (29) | 0.30 (26)
1 (baseline: LEAD) | 2.17 (32) | 3.65 (1) | 0.23 (32)
Table 1. Results of initial summaries: score (rank, out of 43 runs).

Category-focused results
Run 25: co-occurrence + aspect coverage. Run 31: co-occurrence only.
Category | Overall responsiveness | Linguistic quality | Pyramid score
1. Disasters | 3.00 (23) (2) | 3.43 (3) (5) | 0.38 (23) (10)
2. Attacks | 3.71 (3) (22) | 3.29 (4) (16) | 0.56 (6) (18)
3. Health | 2.75 (6) (21) | 3.33 (6) (9) | 0.30 (9) (7)
4. Resources | 2.50 (25) (21) | 3.60 (3) (6) | 0.24 (29) (23)
5. Investigations | 3.20 (6) (2) | 3.10 (10) (2) | 0.45 (14) (5)
Table 2. Scores and ranks of our runs for each category (run 25 – run 31): positive = top 6, negative = average or worse.

Results focussed on IE-based aspects
Aspect | IE tool | Run 25 | Run 31 | Best
1.1 WHAT | NEXUS | 0.60 (24) | 0.79 (3) |
WHO AFFECTED | NEXUS | 0.36 (25) | 0.41 (23) |
DAMAGES | ONTOPOPULIS | 0.13 (26) | 0.38 (10) |
COUNTERMEASURES | ONTOPOPULIS | 0.34 (7) | 0.19 (29) |
WHAT | NEXUS | 0.74 (21) | 0.79 (12) |
PERPETRATORS | NEXUS | 0.48 (18) | 0.34 (24) |
WHO AFFECTED | NEXUS | 0.65 (2) | 0.54 (11) |
DAMAGES | ONTOPOPULIS | 0.50 (4) | 0 (30) |
COUNTERMEASURES | ONTOPOPULIS | 0.34 (18) | 0.20 (32) |
WHAT | NEXUS | 0.33 (17) | 0.36 (14) |
WHO AFFECTED | NEXUS | 0.29 (6) | 0.31 (4) |
COUNTERMEASURES | ONTOPOPULIS | 0.31 (1) | 0.24 (7) |
WHAT | ONTOPOPULIS | 0.49 (19) | 0.46 (25) |
COUNTERMEASURES | ONTOPOPULIS | 0.36 (5) | 0.29 (12) |
WHO | NEXUS | 0.67 (17) | 0.65 (19) |
REASONS | NEXUS | 0.46 (19) | 0.59 (6) |
CHARGES | ONTOPOPULIS | 0.33 (27) | 0.47 (11) | 0.72
Table 3. Pyramid scores and ranks of our runs for each aspect: positive score or influence of the IE tool, negative score or influence of the IE tool.

Conclusions
- The approach can easily be applied to many languages (multilingual entity disambiguation and latent semantic analysis).
- The IE-based run gives strong results for criminal/terrorist attacks, the central topic of the event extraction system.
- NEXUS detects too many event aspects, including those of past events (background information). We thus need to work on distinguishing the main event from mentions of past events, through temporal analysis or by preferring the first event mention.
- Co-occurrence alone works well.
- Aspects treated by lexical learning with Ontopopulis give good results.
- Event aspect information helps if it is of high quality.

Contributors: Josef Steinberger, Ralf Steinberger, Hristo Tanev, Mijail Kabadjov
European Commission, Joint Research Centre, Institute for the Protection and Security of the Citizen
© European Communities, 2011

Related publications
- Steinberger, Ralf, Bruno Pouliquen & Erik van der Goot (2009). An Introduction to the Europe Media Monitor Family of Applications. In: Information Access in a Multilingual World: Proceedings of the SIGIR 2009 Workshop (SIGIR-CLIR'09).
- Steinberger, Josef, Mijail Kabadjov, Bruno Pouliquen, Ralf Steinberger & Massimo Poesio. WB-JRC-UT's Participation in TAC 2009: Update Summarization and AESOP Tasks. In: Proceedings of the Text Analysis Conference (TAC09).
- Steinberger, Ralf & Bruno Pouliquen. Cross-lingual Named Entity Recognition. In: Named Entities: Recognition, Classification and Use, Benjamins Current Topics, Volume 19.
- Tanev, Hristo, Jakub Piskorski & Martin Atkinson. Real-time News Event Extraction for Global Crisis Monitoring. In: Proceedings of the 13th International Conference on Applications of Natural Language to Information Systems (NLDB08).
- Tanev, Hristo & Bernardo Magnini. Weakly Supervised Approaches for Ontology Population. In: Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL06).
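The three selection steps described under "Our summarization approach within the new task" can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The binary term-by-sentence matrix A, the binary aspect matrix P built from a trigger-word lexicon, the reduced rank r, the weighted sum that combines the LSA and aspect scores (the exact computation in Fig. 2 is not recoverable here), and the use of projection removal for "subtract its information" are all assumptions. The function names build_matrices, remove_projection and select_sentences, and the parameters rank and aspect_weight, are hypothetical.

```python
import re

import numpy as np


def build_matrices(sentences, aspect_lexicon):
    """Step 1 (assumed form): binary term-by-sentence matrix A and
    binary aspect-by-sentence matrix P from a trigger-word lexicon."""
    tokenized = [set(re.findall(r"[a-z0-9]+", s.lower())) for s in sentences]
    vocab = sorted(set().union(*tokenized))
    row = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, words in enumerate(tokenized):
        for w in words:
            A[row[w], j] = 1.0
    aspects = sorted(aspect_lexicon)
    P = np.zeros((len(aspects), len(sentences)))
    for a, name in enumerate(aspects):
        for j, words in enumerate(tokenized):
            if words & aspect_lexicon[name]:  # sentence mentions a trigger word
                P[a, j] = 1.0
    return A, P


def remove_projection(M, j):
    """Subtract from every column of M its projection onto column j (in place).
    Column j itself becomes zero; columns similar to it shrink."""
    col = M[:, j].copy()
    norm_sq = col @ col
    if norm_sq > 0:
        M -= np.outer(col, col @ M) / norm_sq


def select_sentences(sentences, aspect_lexicon, k=2, rank=2, aspect_weight=1.0):
    A, P = build_matrices(sentences, aspect_lexicon)
    # Step 2: thin SVD, A = U S V^T.
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = min(rank, len(s))
    # F = S V^T restricted to the r strongest latent topics.
    F = s[:r, None] * Vt[:r, :]
    selected = []
    for _ in range(min(k, len(sentences))):
        # Step 3 (assumed combination): LSA vector length + weighted aspect vector length.
        scores = np.linalg.norm(F, axis=0) + aspect_weight * np.linalg.norm(P, axis=0)
        scores[selected] = -np.inf  # never pick the same sentence twice
        best = int(np.argmax(scores))
        selected.append(best)
        # Subtract the chosen sentence's information from F and P (redundancy control).
        remove_projection(F, best)
        remove_projection(P, best)
    return [sentences[j] for j in sorted(selected)]  # keep original sentence order
```

Raising aspect_weight pushes sentences that mention category aspects (for example, countermeasure triggers taken from the Ontopopulis sample above) ahead of purely co-occurrence-driven choices; with aspect_weight=0 the sketch reduces to co-occurrence only, loosely analogous to run 31.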
TAC10 Guided Summarization Task
Goal: produce a 100-word summary of a set of 10 newswire articles on a given topic, where the topic falls into one of a predefined set of categories. Participants were given a list of important aspects for each category, and the summaries should cover them. Example aspects for the category Attacks: 2.1 WHAT, 2.2 WHEN, 2.3 WHERE, 2.4 PERPETRATORS, 2.5 WHY, 2.6 WHO AFFECTED, 2.7 DAMAGES, 2.8 COUNTERMEASURES.
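The two task constraints (a 100-word limit and category-specific aspects) can be illustrated with a small Python sketch. Assumptions: words are counted as whitespace-separated tokens, sentences arrive already ranked by some summarizer, and aspect coverage is approximated by hypothetical trigger-word sets. The names fit_word_budget, covered_aspects and ATTACK_ASPECTS are illustrative only. This is a rough self-check, not the TAC evaluation, which scores summaries manually (Pyramid score, overall responsiveness and linguistic quality, as in the results tables).

```python
import re

# Aspect IDs for the "attacks" category, as listed in the task description.
ATTACK_ASPECTS = {
    "2.1": "WHAT", "2.2": "WHEN", "2.3": "WHERE", "2.4": "PERPETRATORS",
    "2.5": "WHY", "2.6": "WHO AFFECTED", "2.7": "DAMAGES", "2.8": "COUNTERMEASURES",
}


def fit_word_budget(ranked_sentences, max_words=100):
    """Greedily keep ranked sentences while the total stays within max_words;
    a sentence that would overflow is skipped and shorter ones are still tried."""
    summary, used = [], 0
    for sentence in ranked_sentences:
        n = len(sentence.split())  # assumed word count: whitespace tokens
        if used + n <= max_words:
            summary.append(sentence)
            used += n
    return summary


def covered_aspects(summary, triggers):
    """Return the aspect IDs whose (hypothetical) trigger words occur in the summary."""
    words = set(re.findall(r"[a-z0-9]+", " ".join(summary).lower()))
    return sorted(aid for aid, trig in triggers.items() if words & trig)
```

For a summary mentioning pirates and a rescue operation, triggers such as {"2.4": {"pirates"}, "2.8": {"rescue"}} would flag PERPETRATORS and COUNTERMEASURES via ATTACK_ASPECTS; real trigger sets would come from the IE tools (NEXUS, Ontopopulis) rather than hand-written lists.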