COLING 2012 Extracting and Normalizing Entity-Actions from Users’ comments Swapna Gottipati, Jing Jiang School of Information Systems, Singapore Management.

Slides:

Advertisements

Similar presentations

Query Chain Focused Summarization Tal Baumel, Rafi Cohen, Michael Elhadad Jan 2014.

Advertisements

Specialized models and ranking for coreference resolution Pascal Denis ALPAGE Project Team INRIA Rocquencourt F Le Chesnay, France Jason Baldridge.

A Machine Learning Approach to Coreference Resolution of Noun Phrases By W.M.Soon, H.T.Ng, D.C.Y.Lim Presented by Iman Sen.

Entity-Centric Topic-Oriented Opinion Summarization in Twitter Date : 2013/09/03 Author : Xinfan Meng, Furu Wei, Xiaohua, Liu, Ming Zhou, Sujian Li and.

Linking Entities in #Microposts ROMIL BANSAL, SANDEEP PANEM, PRIYA RADHAKRISHNAN, MANISH GUPTA, VASUDEVA VARMA INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY,

Sequence Classification: Chunking Shallow Processing Techniques for NLP Ling570 November 28, 2011.

Problem Semi supervised sarcasm identification using SASI

MINING FEATURE-OPINION PAIRS AND THEIR RELIABILITY SCORES FROM WEB OPINION SOURCES Presented by Sole A. Kamal, M. Abulaish, and T. Anwar International.

1 Relational Learning of Pattern-Match Rules for Information Extraction Presentation by Tim Chartrand of A paper bypaper Mary Elaine Califf and Raymond.

Sequence Clustering and Labeling for Unsupervised Query Intent Discovery Speaker: Po-Hsien Shih Advisor: Jia-Ling Koh Source: WSDM’12 Date: 1 November,

Playing the Telephone Game: Determining the Hierarchical Structure of Perspective and Speech Expressions Eric Breck and Claire Cardie Department of Computer.

Gimme’ The Context: Context- driven Automatic Semantic Annotation with CPANKOW Philipp Cimiano et al.

1 CSC 594 Topics in AI – Applied Natural Language Processing Fall 2009/ Shallow Parsing.

Learning to Advertise. Introduction Advertising on the Internet = $$$ –Especially search advertising and web page advertising Problem: –Selecting ads.

A Framework for Named Entity Recognition in the Open Domain Richard Evans Research Group in Computational Linguistics University of Wolverhampton UK

Extracting Interest Tags from Twitter User Biographies Ying Ding, Jing Jiang School of Information Systems Singapore Management University AIRS 2014, Kuching,

Finding Advertising Keywords on Web Pages Scott Wen-tau YihJoshua Goodman Microsoft Research Vitor R. Carvalho Carnegie Mellon University.

Learning Table Extraction from Examples Ashwin Tengli, Yiming Yang and Nian Li Ma School of Computer Science Carnegie Mellon University Coling 04.

Yin Yang (Hong Kong University of Science and Technology) Nilesh Bansal (University of Toronto) Wisam Dakka (Google) Panagiotis Ipeirotis (New York University)

Mining and Summarizing Customer Reviews

Webpage Understanding: an Integrated Approach

Title Extraction from Bodies of HTML Documents and its Application to Web Page Retrieval Microsoft Research Asia Yunhua Hu, Guomao Xin, Ruihua Song, Guoping.

Recognition of Multi-sentence n-ary Subcellular Localization Mentions in Biomedical Abstracts G. Melli, M. Ester, A. Sarkar Dec. 6, 2007

Empirical Methods in Information Extraction Claire Cardie Appeared in AI Magazine, 18:4, Summarized by Seong-Bae Park.

1 A study on automatically extracted keywords in text categorization Authors:Anette Hulth and Be´ata B. Megyesi From:ACL 2006 Reporter: 陳永祥 Date:2007/10/16.

1 Wikification CSE 6339 (Section 002) Abhijit Tendulkar.

Reyyan Yeniterzi Weakly-Supervised Discovery of Named Entities Using Web Search Queries Marius Pasca Google CIKM 2007.

Automatic Detection of Tags for Political Blogs Khairun-nisa Hassanali and Vasileios Hatzivassiloglou Human Language Technology Research Institute The.

AnswerBus Question Answering System Zhiping Zheng School of Information, University of Michigan HLT 2002.

Illinois-Coref: The UI System in the CoNLL-2012 Shared Task Kai-Wei Chang, Rajhans Samdani, Alla Rozovskaya, Mark Sammons, and Dan Roth Supported by ARL,

Automatic Detection of Tags for Political Blogs Khairun-nisa Hassanali Vasileios Hatzivassiloglou The University.

Open Information Extraction using Wikipedia

Incident Threading for News Passages (CIKM 09) Speaker: Yi-lin,Hsu Advisor: Dr. Koh, Jia-ling. Date:2010/06/14.

Fine-Grained Location Extraction from Tweets with Temporal Awareness Date:2015/03/19 Author:Chenliang Li, Aixin Sun Source:SIGIR '14 Advisor:Jia-ling Koh.

Recognizing Names in Biomedical Texts: a Machine Learning Approach GuoDong Zhou 1,*, Jie Zhang 1,2, Jian Su 1, Dan Shen 1,2 and ChewLim Tan 2 1 Institute.

Relevance Detection Approach to Gene Annotation Aid to automatic annotation of databases Annotation flow –Extraction of molecular function of a gene from.

Using a Named Entity Tagger to Generalise Surface Matching Text Patterns for Question Answering Mark A. Greenwood and Robert Gaizauskas Natural Language.

1 Learning Sub-structures of Document Semantic Graphs for Document Summarization 1 Jure Leskovec, 1 Marko Grobelnik, 2 Natasa Milic-Frayling 1 Jozef Stefan.

A Cross-Lingual ILP Solution to Zero Anaphora Resolution Ryu Iida & Massimo Poesio (ACL-HLT 2011)

Exploiting Context Analysis for Combining Multiple Entity Resolution Systems -Ramu Bandaru Zhaoqi Chen Dmitri V.kalashnikov Sharad Mehrotra.

A Systematic Exploration of the Feature Space for Relation Extraction Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois,

Summarization Focusing on Polarity or Opinion Fragments in Blogs Yohei Seki Toyohashi University of Technology Visiting Scholar at Columbia University.

A Scalable Machine Learning Approach for Semi-Structured Named Entity Recognition Utku Irmak(Yahoo! Labs) Reiner Kraft(Yahoo! Inc.) WWW 2010(Information.

Date : 2013/03/18 Author : Jeffrey Pound, Alexander K. Hudek, Ihab F. Ilyas, Grant Weddell Source : CIKM’12 Speaker : Er-Gang Liu Advisor : Prof. Jia-Ling.

Automatic Identification of Pro and Con Reasons in Online Reviews Soo-Min Kim and Eduard Hovy USC Information Sciences Institute Proceedings of the COLING/ACL.

Alignment of Bilingual Named Entities in Parallel Corpora Using Statistical Model Chun-Jen Lee Jason S. Chang Thomas C. Chuang AMTA 2004.

Department of Software and Computing Systems Research Group of Language Processing and Information Systems The DLSIUAES Team’s Participation in the TAC.

An Iterative Approach to Extract Dictionaries from Wikipedia for Under-resourced Languages G. Rohit Bharadwaj Niket Tandon Vasudeva Varma Search and Information.

2015/12/121 Extracting Key Terms From Noisy and Multi-theme Documents Maria Grineva, Maxim Grinev and Dmitry Lizorkin Proceeding of the 18th International.

Using a Named Entity Tagger to Generalise Surface Matching Text Patterns for Question Answering Mark A. Greenwood and Robert Gaizauskas Natural Language.

1 Masters Thesis Presentation By Debotosh Dey AUTOMATIC CONSTRUCTION OF HASHTAGS HIERARCHIES UNIVERSITAT ROVIRA I VIRGILI Tarragona, June 2015 Supervised.

Creating Subjective and Objective Sentence Classifier from Unannotated Texts Janyce Wiebe and Ellen Riloff Department of Computer Science University of.

1 Generating Comparative Summaries of Contradictory Opinions in Text (CIKM09’)Hyun Duk Kim, ChengXiang Zhai 2010/05/24 Yu-wen,Hsu.

Relational Duality: Unsupervised Extraction of Semantic Relations between Entities on the Web Danushka Bollegala Yutaka Matsuo Mitsuru Ishizuka International.

4. Relationship Extraction Part 4 of Information Extraction Sunita Sarawagi 9/7/2012CS 652, Peter Lindes1.

Shallow Parsing for South Asian Languages -Himanshu Agrawal.

Answer Mining by Combining Extraction Techniques with Abductive Reasoning Sanda Harabagiu, Dan Moldovan, Christine Clark, Mitchell Bowden, Jown Williams.

Acquisition of Categorized Named Entities for Web Search Marius Pasca Google Inc. from Conference on Information and Knowledge Management (CIKM) ’04.

Discovering Relations among Named Entities from Large Corpora Takaaki Hasegawa *, Satoshi Sekine 1, Ralph Grishman 1 ACL 2004 * Cyberspace Laboratories.

From Words to Senses: A Case Study of Subjectivity Recognition Author: Fangzhong Su & Katja Markert (University of Leeds, UK) Source: COLING 2008 Reporter:

Chunk Parsing. Also called chunking, light parsing, or partial parsing. Method: Assign some additional structure to input over tagging Used when full.

Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:

Learning Event Durations from Event Descriptions Feng Pan, Rutu Mulkar, Jerry R. Hobbs University of Southern California ACL ’ 06.

Semi-Supervised Recognition of Sarcastic Sentences in Twitter and Amazon -Smit Shilu.

Project Deliverable-1 -Prof. Vincent Ng -Girish Ramachandran -Chen Chen -Jitendra Mohanty.

Language Identification and Part-of-Speech Tagging

INAGO Project Automatic Knowledge Base Generation from Text for Interactive Question Answering.

Automatic Detection of Causal Relations for Question Answering

Text Mining & Natural Language Processing

Text Mining & Natural Language Processing

Presentation transcript:

COLING 2012 Extracting and Normalizing Entity-Actions from Users’ comments Swapna Gottipati, Jing Jiang School of Information Systems, Singapore Management University, Singapore

COLING 2012 ► Introduction ► ► Task Definition ► ► Nature of actionable comments   Sentence level study   Phrase level study ► ► Solution Method   Entity-action extraction   Entity-action normalization ► ► Dataset ► ► Experiments ► ► Conclusion

COLING 2012 Introduction ► ► Defining an actionable comment as an expression with an entity such as person or organization and a suggestion that can be acted upon.

COLING 2012 ► ► Following comments are in response to the news about a car accident.   [C1]The government should lift diplomatic immunity of the ambassador.   [C2]Govt must inform the romanian government of what happened immediately.   [C3]SG government needs to cooperate closely with romania in persecuting this case.   [C4]Hope the government help the victims by at least paying the legal fees.   [C5]I believe that goverment will help the victims for legal expenses. ► ► All sentences consist of an action and the corresponding entity who should take the action.

COLING 2012 ► ► Observing entities in all the above sentences refer to the same entity, Government, but expressed in various forms. EntityAction governmentlift diplomatic immunity of the ambassador and get him to face.. governmentinform the romanian government of what happened immediately.. governmentcooperate closely with romania in persecuting.. governmenthelp victims by at least paying the legal fees

COLING 2012 ► ► Introduction ► Task Definition ► ► Nature of actionable comments   Sentence level study   Phrase level study ► ► Solution Method   Entity-action extraction   Entity-action normalization ► ► Dataset ► ► Experiments ► ► Conclusion

COLING 2012 Task Definition ► ► Goal: Extract and normalize actionable comments from user generated content in response to a news article.

COLING 2012 ► ► The actionable comments will be represented as an entity-action pair. ► ► Giving a news article A and corresponding candidate comments C = {c 1, c 2,..., c n } extracted using the keywords, our goal is to detect pairs of {ne i, na i } where ne i is a normalized entity and na i is a normalized action.

COLING 2012 ► ► Introduction ► ► Task Definition ► Nature of actionable comments  Sentence level study   Phrase level study ► ► Solution Method   Entity-action extraction   Entity-action normalization ► ► Dataset ► ► Experiments ► ► Conclusion

COLING 2012 Sentence level study ► ► To understand how frequently a user writes an actionable comment, we randomly selected 500 sentences from AsiaOne.com ► ► 13.6% of the sentences are actionable comments.

COLING 2012 ► ► 88.3% of the actionable comments use the keywords listed in the right Table KeywordFrequency Should hope believe may be have to ought to be suggest suppose to need to must advise needs to request 54.24% 8.47% 3.39% 5.08% 1.69% 3.39% 1.69% 3.39% 1.69%

COLING 2012 ► ► Introduction ► ► Task Definition ► Nature of actionable comments   Sentence level study  Phrase level study ► ► Solution Method   Entity-action extraction   Entity-action normalization ► ► Dataset ► ► Experiments ► ► Conclusion

COLING 2012 Phrase level study ► ► Entity extraction: Identify the correct entity in the actionable comment. ► ► Normalization: Normalize the entity mentions to their canonical form. ► ► Redundancy: Normalize similar actions to aid in redundancy elimination.

COLING 2012 ► ► Introduction ► ► Task Definition ► ► Nature of actionable comments   Sentence level study   Phrase level study ► Solution Method  Entity-action extraction   Entity-action normalization ► ► Dataset ► ► Experiments ► ► Conclusion

COLING 2012 Solution Method ► ► Entity-action extraction   Based on CRF model ► ► Normalization model   Based on the clustering techniques for entity and action normalization.

COLING 2012 Entity-action extraction ► ► A comment sentence x = (x 1, x 2,..., x n ) where each x i is a single token. We need to assign a sequence of labels or tags y = ( y 1, y 2,..., y n ) to x. We define our tag set as {BE, IE, BA, IA,O}

COLING 2012 Features ► ► POS features ► ► Positional features ► ► Dependency tree features

COLING 2012 POS features ► ► POS tags using the Stanford POS tagger, combine POS features of neighboring words in [-2, +2] window.

COLING 2012 Positional features ► ► Finding the position of each word, x i with respect to the keyword in the given sentence. ► ► Positive numbers for words preceding the keyword and negative numbers for words succeeding the keyword in the sentence. We do the same for neighboring words in [- 2, +2] window.

COLING 2012 Dependency tree features ► ► For each word x i, we check if it is nominal subject in the sentence and represent it by nsubj. The dependency tree features can be extracted using Stanford dependencies tool. ► ► The output is S = {e i, a i }, a set of entity- action pairs.

COLING 2012 ► ► Introduction ► ► Task Definition ► ► Nature of actionable comments   Sentence level study   Phrase level study ► Solution Method   Entity-action extraction  Entity-action normalization ► ► Dataset ► ► Experiments ► ► Conclusion

COLING 2012 Entity-action normalization ► ► Given S = {e i, a i }, a set of entity-action pairs, the goal is to generate NS = {ne i, na i }, a set of normalized entity-action pairs.

COLING 2012 Entity normalization ► ► Using agglomerative clustering which is a hierarchical clustering method which works bottom-up (Olson, 1995). ► ► Expanding the entity with the features from Google and Semantic-Similarity Sieves adopted from Stanford coreference algorithm (Raghunathan et al.2010).

COLING 2012 Features ► ► Alias features ► ► Semantic-similarity features

COLING 2012 Alias features ► ► Giving an entity mention, it is first expanded with the title of the news article and this query is fed to the Google API. Google outputs the ranked matching outputs.   EX: Alias features for “ Ionescu + title ” are Dr.Ionescu, Silvia Ionescu, Romanian Diplomat Ionescu etc.

COLING 2012 Semantic-similarity features ► ► Following steps from Stanford coreference resolution tool for both named and unnamed entities:   (a) Remove the text following the mention head word.   (b) Select the lowest noun phrase (NP) in the parse tree that includes the mention head word.   (c) Using the longest proper noun (NNP*) sequence that ends with the head word.   (d) Select the head word.

COLING 2012 Action normalization ► ► To remove the redundant actions. We choose clustering same as above to normalize the actions associated with same normalized entity. ► ► The feature set for this task is simply bag- of-words with stop word removal. The representative action is also chosen similar to the above.

COLING 2012 ► ► Introduction ► ► Task Definition ► ► Nature of actionable comments   Sentence level study   Phrase level study ► ► Solution Method   Entity-action extraction   Entity-action normalization ► Dataset ► ► Experiments ► ► Conclusion

COLING 2012 Dataset ► ► Dataset consists of 5 contentious news articles and comments from Asiaone.com. ► ► Using the keywords listed to extract the candidate sentences from all the comments (each comment has 1 or more sentences) in 5 news articles. ► ► We use randomly 110 candidate sentences from each article and in total 550 candidates for experiments. ► ► Agreement level using Cohen ’ s kappa is

COLING 2012 ► ► Introduction ► ► Task Definition ► ► Nature of actionable comments   Sentence level study   Phrase level study ► ► Solution Method   Entity-action extraction   Entity-action normalization ► ► Dataset ► Experiments ► ► Conclusion

COLING 2012 Experiments ► ► To prepare the ground truth, we engaged two annotators to label 550 candidate sentences. ► ► Label the entity with   BE (beginning of an entity)   IE (inside an entity). ► ► Label the action with:   BA (beginning of an action)   IA (inside an action)   The others are labeled as O (other). ► ► If both entity and action are found, sentence is a valid suggestion. Label it as 1. Otherwise, label it as 0.

Baseline ► Performing 10-fold cross validation for all our experiments, and use this pattern matching technique as a baseline. COLING 2012

Actionable knowledge detection results ► ► Precision of 88.26%, recall of 93.12% and F-score of 90.63% in classifying actionable comments. ► ► failed in detecting the actionable comments when the sentences have poor grammatical structure.   “ Don ’ t need to call the helpline.. ”

COLING 2012 Entity extraction results Exact MatchOverlap Match MetricsBaselineCRFBaselineCRF Recall Precision F-score

COLING 2012 Action extraction results Exact MatchOverlap Match MetricsBaselineCRFBaselineCRF Recall Precision F-score

COLING 2012 Experiments on entity-action normalization ► ► Single Link and complete link, which technique is more suitable for this problem? ► ► How does the clustering-based solution perform in normalizing the entity-action pairs?

COLING 2012 Single Link Vs Complete Link Single LinkComplete Link ArticlePreRecallFPreRecallF A A A A A

COLING 2012 Entity-Action Normalization Results ► ► We asked a human judge to validate the normalized entity-action pairs. ► ► Only if both entity and action are normalized, the pair is labeled as valid.

COLING 2012 ► ► Introduction ► ► Task Definition ► ► Nature of actionable comments   Sentence level study   Phrase level study ► ► Solution Method   Entity-action extraction   Entity-action normalization ► ► Dataset ► ► Experiments ► Conclusion

COLING 2012 Related Work ► ► Actionable content:   More focussed towards manufacturing applications in which the problems are identified to aid the designers in the product design improvements. ► ► To the best of our knowledge, our problem of extracting and normalizing entity-action pairs from users ’ comments is not studied.

COLING 2012 Conclusion ► ► EX: Obama ’ s state union address ► ► Apart from political and news forums, the public was asked to express opinions on Twitter using specific hashtags. This triggers the need for gathering actionable content in micro blogs. In the same line, diagnostic opinion detection that talks about what could have happened, who should be blamed, etc., is also an interesting problem.

THE END COLING 2012

F-score ► Precision = tp / (tp + fp) ► Recall = tp / (tp + fn) ► F-score=2 * (pre * recall) / (pre +recall) F-score COLING 2012