Published by Amir Callaham
Modified over 2 years ago
Aspect-driven summarization
Unit for Global Security and Crisis Management

Background and motivation
We have developed multilingual text mining software for EMM (Europe Media Monitor) that automatically:
- gathers 100K news articles per day in 50 languages from about 2.5K news sources (cf. Fig. 1),
- clusters these articles into major news stories and tracks them over time,
- detects events (NEXUS): event type, victims, perpetrators, etc.
This creates a need for multilingual multi-document/update summarization.

Information extraction for aspect capturing
- Entity Recognition and Disambiguation: used in the LSA input representation and in capturing person/organization-related aspects.
- Event extraction system (NEXUS). Example sentence: "All the 20 people taken hostage by armed pirates were safe." Extracted slots: event type (kidnapping), victims (20 people), perpetrator (pirates).
- Automatically learnt lexica (Ontopopulis). Sample from the lexicon for countermeasures: operation, rescue operation, rescue, evacuation, treatment, assistance, relief, military operation, police operation, security operation, aid.

© European Communities, 2011

Related publications
Steinberger, Ralf, Bruno Pouliquen & Erik van der Goot (2009). An Introduction to the Europe Media Monitor Family of Applications. In: Information Access in a Multilingual World - Proceedings of the SIGIR 2009 Workshop (SIGIR-CLIR'09).
Steinberger, Josef, Mijail Kabadjov, Bruno Pouliquen, Ralf Steinberger & Massimo Poesio. WB-JRC-UT's Participation in TAC 2009: Update Summarization and AESOP Tasks. In: Proceedings of the Text Analysis Conference (TAC09).
Steinberger, Ralf & Bruno Pouliquen. Cross-lingual Named Entity Recognition. In: Named Entities - Recognition, Classification and Use, Benjamins Current Topics, Volume 19.
Tanev, Hristo, Jakub Piskorski & Martin Atkinson.
Real-time News Event Extraction for Global Crisis Monitoring. In: Proceedings of the 13th International Conference on Applications of Natural Language to Information Systems (NLDB08).
Tanev, Hristo & Bernardo Magnini. Weakly Supervised Approaches for Ontology Population. In: Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL06).

Our summarization approach within the new task
Guided task: a summary must include the aspects defined for its category.
Our idea: the summary should contain the topics frequently mentioned within the cluster, but it should also be rich in the aspects.
Summarizer based on co-occurrence (LSA) and aspect coverage. LSA input: lexical features and named entities (as in TAC09).
1st step: creation of the SVD input matrix A and the aspect matrix P.
2nd step: Singular Value Decomposition, A = USV^T.
3rd step: sentence selection based on the values in F = SV^T and P (Fig. 2). Iteratively, select the sentence with the largest overall score, then subtract its information from F and P, so that later selections add more diverse information and avoid redundancy.

Figure 1. News clusters in EMM's NewsExplorer and extracted events in NEXUS.

Overall results
Run ID                            Overall responsiveness   Linguistic quality   Pyramid score
16 (best run in overall resp.)    3.17 (1)                 3.46 (2)             0.40 (4)
22 (best run in Pyramid score)    3.13 (2)                 3.11 (13)            0.43 (1)
25 (co-occurrence + aspects)      2.98 (10)                3.35 (4)             0.37 (18)
31 (co-occurrence only)           2.89 (19)                3.28 (6)             0.38 (13)
2 (baseline - MEAD)               2.50 (27)                2.72 (29)            0.30 (26)
1 (baseline - LEAD)               2.17 (32)                3.65 (1)             0.23 (32)
Table 1. Results of initial summaries: score (rank, out of 43 runs).

Conclusions
- The approach can easily be applied to many languages (multilingual entity disambiguation and latent semantic analysis).
- Co-occurrence alone works well.
- The IE-based run performs very well on the central topic of the event extraction system, criminal/terrorist attacks.
- NEXUS detects too many event aspects, including those of past events (background information).
- We thus need to work on distinguishing the main event from mentions of past events, through temporal analysis or by preferring the first event mention.
- Good results for aspects treated by lexical learning with Ontopopulis.
- Event aspect information helps if it is of high quality.

Contributors: Josef Steinberger, Ralf Steinberger, Hristo Tanev, Mijail Kabadjov
European Commission, Joint Research Centre, Institute for the Protection and the Security of the Citizen

Category-focused results
Format: run 25 (co-occurrence + aspect coverage), then run 31 (only co-occurrence).
Category            Overall responsiveness   Linguistic quality   Pyramid score
1. Disasters        3.00 (23) (2)            3.43 (3) (5)         0.38 (23) (10)
2. Attacks          3.71 (3) (22)            3.29 (4) (16)        0.56 (6) (18)
3. Health           2.75 (6) (21)            3.33 (6) (9)         0.30 (9) (7)
4. Resources        2.50 (25) (21)           3.60 (3) (6)         0.24 (29) (23)
5. Investigations   3.20 (6) (2)             3.10 (10) (2)        0.45 (14) (5)
Table 2. Scores and ranks of our runs for each category (run 25, run 31): positive = top 6, negative = average or worse.

Figure 2. Sentence score computation from co-occurrence (LSA) and aspect info.

Results focussed on IE-based aspects
Category            Aspect            IE tool       Run 25      Run 31      Best
1. Disasters        1.1 WHAT          NEXUS         0.60 (24)   0.79 (3)
1. Disasters        WHO AFFECTED      NEXUS         0.36 (25)   0.41 (23)
1. Disasters        DAMAGES           ONTOPOPULIS   0.13 (26)   0.38 (10)
1. Disasters        COUNTERMEASURES   ONTOPOPULIS   0.34 (7)    0.19 (29)
2. Attacks          WHAT              NEXUS         0.74 (21)   0.79 (12)
2. Attacks          PERPETRATORS      NEXUS         0.48 (18)   0.34 (24)
2. Attacks          WHO AFFECTED      NEXUS         0.65 (2)    0.54 (11)
2. Attacks          DAMAGES           ONTOPOPULIS   0.50 (4)    0 (30)
2. Attacks          COUNTERMEASURES   ONTOPOPULIS   0.34 (18)   0.20 (32)
3. Health           WHAT              NEXUS         0.33 (17)   0.36 (14)
3. Health           WHO AFFECTED      NEXUS         0.29 (6)    0.31 (4)
3. Health           COUNTERMEASURES   ONTOPOPULIS   0.31 (1)    0.24 (7)
4. Resources        WHAT              ONTOPOPULIS   0.49 (19)   0.46 (25)
4. Resources        COUNTERMEASURES   ONTOPOPULIS   0.36 (5)    0.29 (12)
5. Investigations   WHO               NEXUS         0.67 (17)   0.65 (19)
5. Investigations   REASONS           NEXUS         0.46 (19)   0.59 (6)
5. Investigations   CHARGES           ONTOPOPULIS   0.33 (27)   0.47 (11)   0.72
Table 3. Pyramid scores and ranks of our runs for each aspect: positive score or influence of the IE tool, negative score or influence of the IE tool.
TAC10 Guided Summarization Task
The task is to produce a 100-word summary for a set of 10 newswire articles on a given topic, where the topic falls into a predefined set of categories. Participants were given a list of important aspects for each category, and the summaries should cover them.
Example, aspects for the category "attacks": 2.1 WHAT, 2.2 WHEN, 2.3 WHERE, 2.4 PERPETRATORS, 2.5 WHY, 2.6 WHO AFFECTED, 2.7 DAMAGES, 2.8 COUNTERMEASURES.
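
The three-step summarizer described on this poster (input matrix A and aspect matrix P, decomposition A = USV^T, iterative selection on F = SV^T and P) can be illustrated with a minimal Python sketch. The exact scoring of Figure 2 is not recoverable from the poster text, so several choices here are assumptions rather than the authors' method: the overall score is taken to be the length of a sentence's column in F plus a weighted count of the aspects it still covers, "subtracting information" is modelled as projecting the chosen sentence's direction out of F and zeroing the aspect rows it covered, and the names select_sentences, aspect_weight and rank are hypothetical.

```python
import numpy as np


def select_sentences(A, P, k, aspect_weight=1.0, rank=None):
    """Pick k sentence indices from co-occurrence (LSA) and aspect coverage.

    A: terms x sentences matrix (step 1, SVD input).
    P: aspects x sentences matrix; P[i, j] > 0 if sentence j covers aspect i.
    The scoring formula below is an assumption, not the poster's exact one.
    """
    A = np.asarray(A, dtype=float)
    P = np.asarray(P, dtype=float).copy()
    n_sentences = A.shape[1]

    # Step 2: singular value decomposition A = U S V^T.
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    if rank is not None:
        s, Vt = s[:rank], Vt[:rank]

    # F = S V^T: one column per sentence in the latent topic space.
    F = s[:, None] * Vt

    available = np.ones(n_sentences, dtype=bool)
    selected = []
    for _ in range(min(k, n_sentences)):
        # Assumed overall score: topic salience plus weighted aspect coverage.
        scores = np.linalg.norm(F, axis=0) + aspect_weight * P.sum(axis=0)
        scores = np.where(available, scores, -np.inf)
        best = int(np.argmax(scores))
        selected.append(best)
        available[best] = False

        # Step 3: subtract the chosen sentence's information from F by
        # projecting its direction out of every sentence column.
        f = F[:, best].copy()
        norm_sq = float(f @ f)
        if norm_sq > 0.0:
            F -= np.outer(f, f @ F) / norm_sq

        # Aspects covered by the chosen sentence earn no further credit.
        P[P[:, best] > 0, :] = 0.0
    return selected


if __name__ == "__main__":
    # Toy input: sentence 1 is a scaled copy of sentence 0, sentence 2 is new.
    A = [[2.0, 1.8, 0.0],
         [1.0, 0.9, 0.0],
         [0.0, 0.0, 1.0]]
    P = [[0, 0, 0]]  # no aspect information in this toy case
    print(select_sentences(A, P, k=2))
```

In the toy input, the second sentence is a near-duplicate of the first, so once the first is chosen its projection leaves almost nothing of the second and the selection moves on to the third. Raising aspect_weight shifts the balance from frequently mentioned topics toward aspect-rich sentences, which is the trade-off the poster contrasts between run 31 and run 25.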