ISHER: Integrated Social History Environment for Research Sophia Ananiadou National Centre for Text Mining School of Computer Science.

Presentation transcript:

ISHER: Integrated Social History Environment for Research Sophia Ananiadou National Centre for Text Mining School of Computer Science

Aims

The partners
Natural Language Processing/Text Mining
– UK, NaCTeM, University of Manchester (Sophia Ananiadou)
– US, Cognitive Computation Group, UIUC (Dan Roth)
– NL, Radboud University, Nijmegen (Antal van den Bosch)
Users
– NL, International Institute of Social History
– US, Cline Center for Democracy, Univ. Illinois
– UK, BBC

Enriching text for search
Search system
– End-users: social history researchers
– News reports on social unrest
  New York Times; 1.8 million articles ( )
– ACE2005 corpus
  NE and event annotations on news articles
Develop and evaluate analytics
– Access to corpora and searchable data sources
– Analysis engines for entities and events
Extract discourse information for social history events

Interoperable Text Mining Infrastructure
Leverage and expand an interoperable infrastructure environment
– Used for annotation and creating text mining workflows
  UIMA-based U-Compare, Argo and brat

Text mining pipelines in Argo
Collection reader
One or more analytic engines
Annotation editor – to correct/augment
A ‘consumer’ to collect annotations for indexing
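The four pipeline stages can be illustrated in plain Python. This is only a sketch of the data flow: it does not use Argo or UIMA, and every name in it (`Document`, `collection_reader`, `capitalised_token_engine`, `drop_sentence_initial`, `index_consumer`, `run_pipeline`) is hypothetical. The toy engine and the automatic "editor" stand in for a real analytic engine and for manual correction.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


@dataclass
class Annotation:
    begin: int   # character offset where the span starts
    end: int     # character offset just past the span
    label: str


@dataclass
class Document:
    doc_id: str
    text: str
    annotations: List[Annotation] = field(default_factory=list)


def collection_reader(texts: Dict[str, str]) -> Iterator[Document]:
    """Stage 1: turn raw texts into documents."""
    for doc_id, text in texts.items():
        yield Document(doc_id, text)


def capitalised_token_engine(doc: Document) -> Document:
    """Stage 2 (toy analytic engine): label capitalised tokens."""
    offset = 0
    for token in doc.text.split():
        begin = doc.text.index(token, offset)
        end = begin + len(token)
        if token[0].isupper():
            doc.annotations.append(Annotation(begin, end, "CANDIDATE_NAME"))
        offset = end
    return doc


def drop_sentence_initial(doc: Document) -> Document:
    """Stage 3 (stand-in for manual correction): drop the annotation at offset 0."""
    doc.annotations = [a for a in doc.annotations if a.begin != 0]
    return doc


def index_consumer(docs: Iterable[Document]) -> Dict[str, List[Tuple[str, str]]]:
    """Stage 4: collect annotations into a label -> [(doc_id, covered text)] index."""
    index: Dict[str, List[Tuple[str, str]]] = {}
    for doc in docs:
        for a in doc.annotations:
            index.setdefault(a.label, []).append((doc.doc_id, doc.text[a.begin:a.end]))
    return index


def run_pipeline(
    texts: Dict[str, str],
    engines: List[Callable[[Document], Document]],
    editor: Callable[[Document], Document],
) -> Dict[str, List[Tuple[str, str]]]:
    """Chain reader -> engines -> editor -> consumer."""
    def processed() -> Iterator[Document]:
        for doc in collection_reader(texts):
            for engine in engines:
                doc = engine(doc)
            yield editor(doc)
    return index_consumer(processed())


index = run_pipeline(
    {"d1": "Several tanks advanced into Gaza"},
    engines=[capitalised_token_engine],
    editor=drop_sentence_initial,
)
print(index)
```

Each stage only reads or extends the shared `Document`, so engines can be swapped or chained without changing the reader or the consumer.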

NER annotations

ACE: Data set for training
ACE 2005 Multilingual Training Corpus
– 599 English articles
  5,349 events: 8 types, 33 subtypes (e.g., Life-Marry)
  9,793 roles: 35 role types (e.g., Agent)
  61,321 entities/values (~ event arguments)
  – 54,824 entities: 7 types, 45 subtypes (e.g., ORG-Media)
  – 5,469 timex2 elements: 1 type
  – 1,028 values: 5 types, 5 subtypes (e.g., Numeric-Money)
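As a quick consistency check on the figures above, the three sub-counts should add up to the reported 61,321 entities/values. The short Python sketch below verifies this and derives two ratios that the slide does not report (events per article and roles per event). The variable names are illustrative only.

```python
# Counts copied from the slide.
articles = 599
events = 5_349
roles = 9_793
entities = 54_824
timex2 = 5_469
values = 1_028

# The breakdown should sum to the reported total of entities/values.
entities_and_values = entities + timex2 + values
assert entities_and_values == 61_321

# Derived ratios (not stated on the slide).
events_per_article = events / articles
roles_per_event = roles / events
print(f"events per article: {events_per_article:.2f}")
print(f"roles per event:    {roles_per_event:.2f}")
```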

ACE 2005: Events
“Several hours later, dozens of Israeli tanks advanced into the northern Gaza Strip backed by helicopters which fired at least three rockets in the Jabaliya area, Palestinian security sources said.”
The marking denotes an event associated with the following details:
Type: Movement
Subtype: Transport
Modality: Asserted
Polarity: Positive
Genericity: Specific
Tense: Past
Event arguments:
– Entity: “dozens of Israeli tanks”, role: Vehicle
– Entity: “the northern Gaza Strip”, role: Destination
Anchor (or in other words trigger): “advanced”
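The annotated event can be written down as a small data structure. The sketch below is an assumption about one convenient in-memory form, not the ACE file format. The class names `Argument` and `Event` and the helper `find_span` are hypothetical, and the attribute values are the ones listed on the slide.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Argument:
    text: str   # surface string of the entity mention
    role: str   # role the entity plays in the event


@dataclass(frozen=True)
class Event:
    trigger: str
    type: str
    subtype: str
    modality: str
    polarity: str
    genericity: str
    tense: str
    arguments: Tuple[Argument, ...]


SENTENCE = (
    "Several hours later, dozens of Israeli tanks advanced into the northern "
    "Gaza Strip backed by helicopters which fired at least three rockets in "
    "the Jabaliya area, Palestinian security sources said."
)

event = Event(
    trigger="advanced",
    type="Movement",
    subtype="Transport",
    modality="Asserted",
    polarity="Positive",
    genericity="Specific",
    tense="Past",
    arguments=(
        Argument("dozens of Israeli tanks", "Vehicle"),
        Argument("the northern Gaza Strip", "Destination"),
    ),
)


def find_span(sentence: str, phrase: str) -> Tuple[int, int]:
    """Return (begin, end) character offsets of the first match of phrase."""
    begin = sentence.find(phrase)
    if begin < 0:
        raise ValueError(f"{phrase!r} not found in sentence")
    return begin, begin + len(phrase)


# Every trigger and argument string should be locatable in the sentence.
for phrase in [event.trigger] + [a.text for a in event.arguments]:
    begin, end = find_span(SENTENCE, phrase)
    print(f"{begin:3d}-{end:3d}  {phrase}")
```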

Event Extraction Text Mining Workflow
Pipeline system with machine-learning-based modules
Trigger detection
Edge detection
Event detection
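The three stages can be sketched as functions that feed one another. The slide only names the stages, so what each one does here is an assumption: trigger detection finds event-denoting words, edge detection links triggers to candidate arguments, and event detection assembles the links into event structures. The machine-learning modules are replaced by toy rules (a hypothetical lexicon and a position heuristic) purely to keep the example runnable.

```python
from typing import Dict, List, Tuple

# Hypothetical trigger lexicon; a trained classifier would replace this lookup.
TRIGGER_LEXICON: Dict[str, Tuple[str, str]] = {"advanced": ("Movement", "Transport")}

Trigger = Tuple[int, str, str]   # (token index, event type, subtype)
Entity = Tuple[int, int, str]    # (first token, one past last token, text)
Edge = Tuple[int, str, str]      # (trigger token index, role, entity text)


def detect_triggers(tokens: List[str]) -> List[Trigger]:
    """Stage 1: mark tokens that anchor an event."""
    triggers = []
    for i, token in enumerate(tokens):
        key = token.lower()
        if key in TRIGGER_LEXICON:
            etype, subtype = TRIGGER_LEXICON[key]
            triggers.append((i, etype, subtype))
    return triggers


def detect_edges(triggers: List[Trigger], entities: List[Entity]) -> List[Edge]:
    """Stage 2: link triggers to entities.

    Toy rule: an entity before the trigger is a Vehicle, one after it a Destination.
    """
    edges = []
    for idx, _etype, _subtype in triggers:
        for start, end, text in entities:
            if end <= idx:
                edges.append((idx, "Vehicle", text))
            elif start > idx:
                edges.append((idx, "Destination", text))
    return edges


def detect_events(triggers: List[Trigger], edges: List[Edge]) -> List[dict]:
    """Stage 3: group the edges of each trigger into one event structure."""
    events = []
    for idx, etype, subtype in triggers:
        arguments = [(role, text) for t_idx, role, text in edges if t_idx == idx]
        events.append(
            {"trigger_index": idx, "type": etype, "subtype": subtype, "arguments": arguments}
        )
    return events


tokens = "dozens of Israeli tanks advanced into the northern Gaza Strip".split()
entities = [(0, 4, "dozens of Israeli tanks"), (6, 10, "the northern Gaza Strip")]

triggers = detect_triggers(tokens)
edges = detect_edges(triggers, entities)
events = detect_events(triggers, edges)
print(events)
```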

Interpretation of events
An event may:
– represent a fact, a specific event, an opinion/analysis, a hypothetical situation, recommendation, etc.
– be presented as the author’s own knowledge/point of view, or that of a third party
– be expressed together with a level of certainty (e.g., speculation)
– represent a situation in the past, an ongoing situation, or something that will happen in the future
– be negated, i.e., there is an indication that the event did not happen
These aspects of interpretation are the event’s meta-knowledge

Enriching event annotation
Enriching event annotation with meta-knowledge information allows more sophisticated event extraction systems to be trained
Additional search criteria can be specified, e.g.
– Find only events that represent known facts
– Find only events which are reported with high or complete certainty
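Such search criteria amount to filtering events on their meta-knowledge attributes. The sketch below assumes events are stored as dictionaries and that "known facts" corresponds to a modality value of `Asserted`. That mapping, the sample records, and the `search` helper are all illustrative assumptions.

```python
from typing import Dict, List

# Hypothetical event records carrying meta-knowledge attributes.
EVENTS: List[Dict[str, str]] = [
    {"id": "e1", "modality": "Asserted", "source_type": "Author", "polarity": "Positive"},
    {"id": "e2", "modality": "Speculated", "source_type": "Third Party", "polarity": "Positive"},
    {"id": "e3", "modality": "Asserted", "source_type": "Third Party", "polarity": "Negative"},
]


def search(events: List[Dict[str, str]], **criteria: str) -> List[Dict[str, str]]:
    """Return the events whose attributes match every given criterion."""
    return [
        event
        for event in events
        if all(event.get(name) == value for name, value in criteria.items())
    ]


# Events stated as fact (assumed here to mean modality == "Asserted").
asserted = search(EVENTS, modality="Asserted")
# Asserted events that are also not negated.
asserted_positive = search(EVENTS, modality="Asserted", polarity="Positive")

print([e["id"] for e in asserted])
print([e["id"] for e in asserted_positive])
```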

Meta-knowledge examples
A past event
A past event reported by a third party

Meta-knowledge examples (2)
A future event
A planned event reported by a third party
A negated, hypothetical event

Meta-knowledge examples (3)
A hypothetical event
A future event with high, but not complete confidence

Meta-knowledge Scheme for newswire
Class / Type: Event (centred on an event trigger)
Participants: Attacker, Place, etc.
Modality: Asserted, Presupposed, Speculated, Other
Genericity: Specific, Generic
Source-Type: Author, Involved, Third Party
Polarity: Positive, Negative
Tense: Past, Present, Future, Unspecified
Subjectivity: Positive, Negative, Neutral, Multi-valued
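The scheme can be encoded as a mapping from each attribute to its permitted values, which makes annotations easy to validate. The grouping below assumes the reading Modality = Asserted/Presupposed/Speculated/Other, Genericity = Specific/Generic, Source-Type = Author/Involved/Third Party, and so on, recovered from the slide diagram. The names `META_KNOWLEDGE_SCHEME` and `validate` are hypothetical.

```python
from typing import Dict, List, Set

# Attribute -> permitted values (grouping assumed from the slide diagram).
META_KNOWLEDGE_SCHEME: Dict[str, Set[str]] = {
    "Modality": {"Asserted", "Presupposed", "Speculated", "Other"},
    "Genericity": {"Specific", "Generic"},
    "Source-Type": {"Author", "Involved", "Third Party"},
    "Polarity": {"Positive", "Negative"},
    "Tense": {"Past", "Present", "Future", "Unspecified"},
    "Subjectivity": {"Positive", "Negative", "Neutral", "Multi-valued"},
}


def validate(annotation: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the annotation is valid."""
    problems = []
    for attribute, allowed in META_KNOWLEDGE_SCHEME.items():
        if attribute not in annotation:
            problems.append(f"missing attribute: {attribute}")
        elif annotation[attribute] not in allowed:
            problems.append(f"invalid value for {attribute}: {annotation[attribute]}")
    for attribute in annotation:
        if attribute not in META_KNOWLEDGE_SCHEME:
            problems.append(f"unknown attribute: {attribute}")
    return problems


good = {
    "Modality": "Asserted",
    "Genericity": "Specific",
    "Source-Type": "Third Party",
    "Polarity": "Positive",
    "Tense": "Past",
    "Subjectivity": "Neutral",
}
bad = dict(good, Tense="Yesterday")

print(validate(good))
print(validate(bad))
```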

ACE Annotation Summary
Meta-knowledge information for all 5,349 events in the ACE corpus reviewed/updated to reflect changes and extensions to the meta-knowledge scheme
“Clue” words and phrases annotated for each attribute type
Substantial agreement reached between 2 different annotators (average of 0.76 Kappa)
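For two annotators, agreement of this kind is commonly measured with Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. The slide does not say which kappa variant was used, so the sketch below is an assumption. The function name `cohen_kappa` and the toy labels are illustrative and are not data from the project.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of the annotators' label proportions, summed over labels.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        counts_a[label] * counts_b[label] for label in counts_a.keys() | counts_b.keys()
    ) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy Modality labels for four events (not project data).
annotator_1 = ["Asserted", "Asserted", "Speculated", "Speculated"]
annotator_2 = ["Asserted", "Asserted", "Speculated", "Asserted"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

An average such as the one reported would then be the mean of the per-attribute kappa values, assuming kappa was computed separately for each attribute.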

ACE meta-knowledge analysis
Analysis revealed varying meta-knowledge characteristics in different parts of the ACE corpus
– Speculated events are more than twice as likely in newsgroups discussing news stories as in the news reports themselves
– Asserted, definite events tend to decrease with the formality of the text type/setting
– General topics, rather than specific events, are 2-3 times more likely in discussions about news than in news reports

ACE meta-knowledge analysis
Events with negative subjectivity are more than twice as common as those with positive polarity
“Bad news sells better than good news”
Strongly negative words are often chosen over more neutral words to help “sensationalise” stories
– e.g. terrorism, genocide, fierce, bloody
Attribution of events to sources other than the author/reporter is most common in newswire (35% of all events)
– Most common for such information to come from eyewitnesses

Events