Aspect-driven summarization
Unit for Global Security and Crisis Management

Background and motivation

We have developed multilingual text mining software for EMM (Europe Media Monitor; http://emm.jrc.it/overview.html) that automatically:
- gathers about 100K news articles per day in 50 languages from about 2.5K news sources (cf. Fig. 1),
- clusters all these articles into major news stories and tracks them over time,
- detects events with NEXUS: event type, victims, perpetrators, etc.
Hence the need for multilingual multi-document/update summarization.

Figure 1. News clusters in EMM's NewsExplorer and extracted events in NEXUS.

TAC10 Guided Summarization Task

Produce a 100-word summary for a set of 10 newswire articles on a given topic, where the topic falls into a predefined set of categories. The participants were given a list of important aspects for each category, and the summaries should cover them. Example aspects for the category Attacks: 2.1 WHAT, 2.2 WHEN, 2.3 WHERE, 2.4 PERPETRATORS, 2.5 WHY, 2.6 WHO AFFECTED, 2.7 DAMAGES, 2.8 COUNTERMEASURES.

Our summarization approach within the new task

In the guided task a summary must include the aspects defined for its category. Our idea: the summary should contain the topics frequently mentioned within the cluster, but it should also be rich in those aspects. The summarizer is therefore based on sentence co-occurrence (LSA) and aspect coverage; the LSA input uses lexical features and named entities (as in TAC09).
- Step 1: creation of the SVD input matrix A and the aspect matrix P.
- Step 2: singular value decomposition A = USV^T.
- Step 3: sentence selection based on the values in F = SV^T and in P (Fig. 2): iteratively select the sentence with the largest overall score and subtract its information from F and P, so that more diverse information is selected and redundancy is avoided (see the sketch below).

Figure 2. Sentence score computation from co-occurrence (LSA) and aspect information.

Information extraction for aspect capturing

- Entity recognition and disambiguation: used in the LSA input representation and in capturing person/organization-related aspects.
- Event extraction system (NEXUS). Example sentence: "All the 20 people taken hostage by armed pirates were safe." Extracted slots: event type (kidnapping), victims (20 people), perpetrator (pirates).
- Automatically learned lexica (Ontopopulis). Sample from the lexicon for countermeasures: operation, rescue operation, rescue, evacuation, treatment, assistance, relief, military operation, police operation, security operation, aid.
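The selection loop of Steps 1-3 can be sketched as follows. This is a minimal illustration only, assuming plain token counts, single-token term matching and a single aspect_weight parameter; the function name summarize, its parameters and the exact scoring formula are illustrative assumptions, not the implementation behind the submitted runs (which, as noted above, also feeds named entities into the LSA representation).

import numpy as np

def summarize(sentences, vocabulary, aspect_terms, n_select=3, aspect_weight=1.0):
    """Select n_select sentences using combined LSA and aspect-coverage scores."""
    tokenized = [s.lower().split() for s in sentences]

    # Step 1: SVD input matrix A (terms x sentences) and aspect matrix P
    # (aspects x sentences); here plain occurrence counts, single tokens only.
    A = np.array([[toks.count(t) for toks in tokenized] for t in vocabulary], float)
    P = np.array([[sum(toks.count(t) for t in terms) for toks in tokenized]
                  for terms in aspect_terms], float)

    # Step 2: singular value decomposition A = U S V^T; F = S V^T holds the
    # strength of each latent topic in each sentence.
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    F = np.diag(S) @ Vt

    # Step 3: iteratively take the sentence with the largest overall score and
    # subtract its information from F and P to keep the summary non-redundant.
    selected = []
    for _ in range(min(n_select, len(sentences))):
        scores = np.linalg.norm(F, axis=0) + aspect_weight * P.sum(axis=0)
        scores[selected] = -np.inf                      # never re-select a sentence
        best = int(np.argmax(scores))
        selected.append(best)
        # Remove the chosen sentence's latent-topic component from every column of F.
        pivot = F[:, best]
        F -= np.outer(pivot, pivot @ F / (pivot @ pivot + 1e-12))
        # Aspects already covered by the chosen sentence stop contributing to scores.
        P[P[:, best] > 0, :] = 0.0
    return [sentences[i] for i in sorted(selected)]

In such a sketch, vocabulary would come from the cluster's lexical features and aspect_terms would play the role of the NEXUS slots and Ontopopulis lexica described above; the actual score combination is the one depicted in Figure 2.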
Overall results

Run ID                           | Overall Responsiveness | Linguistic quality | Pyramid score
16 (best run in Overall resp.)   | 3.17 (1)               | 3.46 (2)           | 0.40 (4)
22 (best run in Pyramid score)   | 3.13 (2)               | 3.11 (13)          | 0.43 (1)
25 (co-occurrence + aspects)     | 2.98 (10)              | 3.35 (4)           | 0.37 (18)
31 (co-occurrence only)          | 2.89 (19)              | 3.28 (6)           | 0.38 (13)
2 (baseline: MEAD)               | 2.50 (27)              | 2.72 (29)          | 0.30 (26)
1 (baseline: LEAD)               | 2.17 (32)              | 3.65 (1)           | 0.23 (32)

Table 1. Results of initial summaries: score (rank, out of 43 runs).

Category-focused results

Run 25: co-occurrence + aspect coverage. Run 31: co-occurrence only.

Category          | Overall Responsiveness | Linguistic quality   | Pyramid score
1. Disasters      | 3.00 (23) / 3.57 (2)   | 3.43 (3) / 3.29 (5)  | 0.38 (23) / 0.43 (10)
2. Attacks        | 3.71 (3) / 2.86 (22)   | 3.29 (4) / 3.00 (16) | 0.56 (6) / 0.49 (18)
3. Health         | 2.75 (6) / 2.42 (21)   | 3.33 (6) / 3.25 (9)  | 0.30 (9) / 0.31 (7)
4. Resources      | 2.50 (25) / 2.60 (21)  | 3.60 (3) / 3.40 (6)  | 0.24 (29) / 0.27 (23)
5. Investigations | 3.20 (6) / 3.30 (2)    | 3.10 (10) / 3.40 (2) | 0.45 (14) / 0.47 (5)

Table 2. Scores and ranks of our runs for each category (run 25 / run 31): positive = top 6, negative = average or worse.

Results focused on IE-based aspects

Aspect              | IE tool     | Run 25    | Run 31    | Best
1.1 WHAT            | NEXUS       | 0.60 (24) | 0.79 (3)  | 0.89
1.5 WHO AFFECTED    | NEXUS       | 0.36 (25) | 0.41 (23) | 0.68
1.6 DAMAGES         | ONTOPOPULIS | 0.13 (26) | 0.38 (10) | 1.25
1.7 COUNTERMEASURES | ONTOPOPULIS | 0.34 (7)  | 0.19 (29) | 0.39
2.1 WHAT            | NEXUS       | 0.74 (21) | 0.79 (12) | 0.88
2.4 PERPETRATORS    | NEXUS       | 0.48 (18) | 0.34 (24) | 0.69
2.6 WHO AFFECTED    | NEXUS       | 0.65 (2)  | 0.54 (11) | 0.66
2.7 DAMAGES         | ONTOPOPULIS | 0.50 (4)  | 0 (30)    | 0.75
2.8 COUNTERMEASURES | ONTOPOPULIS | 0.34 (18) | 0.20 (32) | 0.65
3.1 WHAT            | NEXUS       | 0.33 (17) | 0.36 (14) | 0.58
3.2 WHO AFFECTED    | NEXUS       | 0.29 (6)  | 0.31 (4)  | 0.39
3.5 COUNTERMEASURES | ONTOPOPULIS | 0.31 (1)  | 0.24 (7)  | 0.31
4.1 WHAT            | ONTOPOPULIS | 0.49 (19) | 0.46 (25) | 0.81
4.4 COUNTERMEASURES | ONTOPOPULIS | 0.36 (5)  | 0.29 (12) | 0.50
5.1 WHO             | NEXUS       | 0.67 (17) | 0.65 (19) | 0.96
5.3 REASONS         | NEXUS       | 0.46 (19) | 0.59 (6)  | 0.67
5.4 CHARGES         | ONTOPOPULIS | 0.33 (27) | 0.47 (11) | 0.72

Table 3. Pyramid scores and ranks of our runs for each aspect: positive or negative score and influence of the IE tool.

Conclusions

- The approach can easily be applied to many languages (multilingual entity disambiguation and latent semantic analysis).
- Great results of the IE-based run for the central topic of the event extraction system: criminal/terrorist attacks.
- NEXUS detects too many event aspects, including those of past events (background information); co-occurrence alone works well there. We thus need to work on distinguishing the main event from mentions of past events, through temporal analysis or by preferring the first event mention.
- Good results for the aspects treated by lexical learning with Ontopopulis.
- Event aspect information helps if it is of high quality.

Related publications

Steinberger, Ralf, Bruno Pouliquen & Erik van der Goot (2009). An Introduction to the Europe Media Monitor Family of Applications. In: Information Access in a Multilingual World, Proceedings of the SIGIR 2009 Workshop (SIGIR-CLIR'09).
Steinberger, Josef, Mijail Kabadjov, Bruno Pouliquen, Ralf Steinberger & Massimo Poesio (2010). WB-JRC-UT's Participation in TAC 2009: Update Summarization and AESOP Tasks. In: Proceedings of the Text Analysis Conference (TAC 2009).
Steinberger, Ralf & Bruno Pouliquen (2009). Cross-lingual Named Entity Recognition. In: Named Entities: Recognition, Classification and Use, Benjamins Current Topics, Volume 19.
Tanev, Hristo, Jakub Piskorski & Martin Atkinson (2008). Real-time News Event Extraction for Global Crisis Monitoring. In: Proceedings of the 13th International Conference on Applications of Natural Language to Information Systems (NLDB 2008).
Tanev, Hristo & Bernardo Magnini (2006). Weakly Supervised Approaches for Ontology Population. In: Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2006).

Contributors: Josef Steinberger, Ralf Steinberger, Hristo Tanev, Mijail Kabadjov
European Commission, Joint Research Centre, Institute for the Protection and the Security of the Citizen
Tel. +39 0332 785648, Fax +39 0332 785154
E-mail format: Firstname.Lastname@jrc.ec.europa.eu

© European Communities, 2011