1 Adaptive Subjective Triggers for Opinionated Document Retrieval (WSDM 09’) Kazuhiro Seki, Kuniaki Uehara Date: 11/02/09 Speaker: Hsu, Yu-Wen Advisor:

Slides:



Advertisements
Similar presentations
Date : 2013/05/27 Author : Anish Das Sarma, Lujun Fang, Nitin Gupta, Alon Halevy, Hongrae Lee, Fei Wu, Reynold Xin, Gong Yu Source : SIGMOD’12 Speaker.
Advertisements

Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.
1 Entity Ranking Using Wikipedia as a Pivot (CIKM 10’) Rianne Kaptein, Pavel Serdyukov, Arjen de Vries, Jaap Kamps 2010/12/14 Yu-wen,Hsu.
Explorations in Tag Suggestion and Query Expansion Jian Wang and Brian D. Davison Lehigh University, USA SSM 2008 (Workshop on Search in Social Media)
Search Engines and Information Retrieval
1 LM Approaches to Filtering Richard Schwartz, BBN LM/IR ARDA 2002 September 11-12, 2002 UMASS.
Introduction to Language Models Evaluation in information retrieval Lecture 4.
Scalable Text Mining with Sparse Generative Models
(ACM KDD 09’) Prem Melville, Wojciech Gryc, Richard D. Lawrence
Multi-Style Language Model for Web Scale Information Retrieval Kuansan Wang, Xiaolong Li and Jianfeng Gao SIGIR 2010 Min-Hsuan Lai Department of Computer.
Adaptive Subjective Triggers for Opinionated Document Retrieval Kazuhiro Seki Organization of Advanced Science & Technology Kobe University Kuniaki Uehara.
1 Opinion Spam and Analysis (WSDM,08)Nitin Jindal and Bing Liu Date: 04/06/09 Speaker: Hsu, Yu-Wen Advisor: Dr. Koh, Jia-Ling.
On Sparsity and Drift for Effective Real- time Filtering in Microblogs Date : 2014/05/13 Source : CIKM’13 Advisor : Prof. Jia-Ling, Koh Speaker : Yi-Hsuan.
Search Engines and Information Retrieval Chapter 1.
Leveraging Conceptual Lexicon : Query Disambiguation using Proximity Information for Patent Retrieval Date : 2013/10/30 Author : Parvaz Mahdabi, Shima.
1 Entity Discovery and Assignment for Opinion Mining Applications (ACM KDD 09’) Xiaowen Ding, Bing Liu, Lei Zhang Date: 09/01/09 Speaker: Hsu, Yu-Wen Advisor:
IR Evaluation Evaluate what? –user satisfaction on specific task –speed –presentation (interface) issue –etc. My focus today: –comparative performance.
A Comparative Study of Search Result Diversification Methods Wei Zheng and Hui Fang University of Delaware, Newark DE 19716, USA
1 Retrieval and Feedback Models for Blog Feed Search SIGIR 2008 Advisor : Dr. Koh Jia-Ling Speaker : Chou-Bin Fan Date :
Newsjunkie: Providing Personalized Newsfeeds via Analysis of Information Novelty Gabrilovich et.al WWW2004.
1 A Unified Relevance Model for Opinion Retrieval (CIKM 09’) Xuanjing Huang, W. Bruce Croft Date: 2010/02/08 Speaker: Yu-Wen, Hsu.
CIKM’09 Date:2010/8/24 Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen 1.
Exploring Online Social Activities for Adaptive Search Personalization CIKM’10 Advisor : Jia Ling, Koh Speaker : SHENG HONG, CHUNG.
Chapter 6: Statistical Inference: n-gram Models over Sparse Data
Bayesian Extension to the Language Model for Ad Hoc Information Retrieval Hugo Zaragoza, Djoerd Hiemstra, Michael Tipping Presented by Chen Yi-Ting.
Estimating Topical Context by Diverging from External Resources SIGIR’13, July 28–August 1, 2013, Dublin, Ireland. Presenter: SHIH, KAI WUN Romain Deveaud.
Retrieval Models for Question and Answer Archives Xiaobing Xue, Jiwoon Jeon, W. Bruce Croft Computer Science Department University of Massachusetts, Google,
ON THE SELECTION OF TAGS FOR TAG CLOUDS (WSDM11) Advisor: Dr. Koh. Jia-Ling Speaker: Chiang, Guang-ting Date:2011/06/20 1.
Context-Sensitive Information Retrieval Using Implicit Feedback Xuehua Shen : department of Computer Science University of Illinois at Urbana-Champaign.
CS 533 Information Retrieval Systems.  Introduction  Connectivity Analysis  Kleinberg’s Algorithm  Problems Encountered  Improved Connectivity Analysis.
Web Image Retrieval Re-Ranking with Relevance Model Wei-Hao Lin, Rong Jin, Alexander Hauptmann Language Technologies Institute School of Computer Science.
BioSnowball: Automated Population of Wikis (KDD ‘10) Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/11/30 1.
Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.
LANGUAGE MODELS FOR RELEVANCE FEEDBACK Lee Won Hee.
PSEUDO-RELEVANCE FEEDBACK FOR MULTIMEDIA RETRIEVAL Seo Seok Jun.
LOGO Summarizing Conversations with Clue Words Giuseppe Carenini, Raymond T. Ng, Xiaodong Zhou (WWW ’07) Advisor : Dr. Koh Jia-Ling Speaker : Tu.
1 Using The Past To Score The Present: Extending Term Weighting Models with Revision History Analysis CIKM’10 Advisor : Jia Ling, Koh Speaker : SHENG HONG,
Chapter 8 Evaluating Search Engine. Evaluation n Evaluation is key to building effective and efficient search engines  Measurement usually carried out.
1 Modeling Long Distance Dependence in Language: Topic Mixtures Versus Dynamic Cache Models Rukmini.M Iyer, Mari Ostendorf.
Positional Relevance Model for Pseudo–Relevance Feedback Yuanhua Lv & ChengXiang Zhai Department of Computer Science, UIUC Presented by Bo Man 2014/11/18.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology Mining Logs Files for Data-Driven System Management Advisor.
Probabilistic Latent Query Analysis for Combining Multiple Retrieval Sources Rong Yan Alexander G. Hauptmann School of Computer Science Carnegie Mellon.
Finding Experts Using Social Network Analysis 2007 IEEE/WIC/ACM International Conference on Web Intelligence Yupeng Fu, Rongjing Xiang, Yong Wang, Min.
Gravitation-Based Model for Information Retrieval Shuming Shi, Ji-Rong Wen, Qing Yu, Ruihua Song, Wei-Ying Ma Microsoft Research Asia SIGIR 2005.
A Word Clustering Approach for Language Model-based Sentence Retrieval in Question Answering Systems Saeedeh Momtazi, Dietrich Klakow University of Saarland,Germany.
A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong, Brian D. Davison Lehigh University (SIGIR ’ 09) Speaker: Cho,
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 Improving the performance of personal name disambiguation.
Relevance Language Modeling For Speech Recognition Kuan-Yu Chen and Berlin Chen National Taiwan Normal University, Taipei, Taiwan ICASSP /1/17.
NTNU Speech Lab Dirichlet Mixtures for Query Estimation in Information Retrieval Mark D. Smucker, David Kulp, James Allan Center for Intelligent Information.
More Than Relevance: High Utility Query Recommendation By Mining Users' Search Behaviors Xiaofei Zhu, Jiafeng Guo, Xueqi Cheng, Yanyan Lan Institute of.
Date: 2012/11/29 Author: Chen Wang, Keping Bi, Yunhua Hu, Hang Li, Guihong Cao Source: WSDM’12 Advisor: Jia-ling, Koh Speaker: Shun-Chen, Cheng.
Relevance Models and Answer Granularity for Question Answering W. Bruce Croft and James Allan CIIR University of Massachusetts, Amherst.
DISTRIBUTED INFORMATION RETRIEVAL Lee Won Hee.
A Generation Model to Unify Topic Relevance and Lexicon-based Sentiment for Opinion Retrieval Min Zhang, Xinyao Ye Tsinghua University SIGIR
Date: 2013/9/25 Author: Mikhail Ageev, Dmitry Lagun, Eugene Agichtein Source: SIGIR’13 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Improving Search Result.
A Framework to Predict the Quality of Answers with Non-Textual Features Jiwoon Jeon, W. Bruce Croft(University of Massachusetts-Amherst) Joon Ho Lee (Soongsil.
{ Adaptive Relevance Feedback in Information Retrieval Yuanhua Lv and ChengXiang Zhai (CIKM ‘09) Date: 2010/10/12 Advisor: Dr. Koh, Jia-Ling Speaker: Lin,
1 ICASSP Paper Survey Presenter: Chen Yi-Ting. 2 Improved Spoken Document Retrieval With Dynamic Key Term Lexicon and Probabilistic Latent Semantic Analysis.
Bringing Order to the Web : Automatically Categorizing Search Results Advisor : Dr. Hsu Graduate : Keng-Wei Chang Author : Hao Chen Susan Dumais.
Discriminative n-gram language modeling Brian Roark, Murat Saraclar, Michael Collins Presented by Patty Liu.
Wen Chan 1 , Jintao Du 1, Weidong Yang 1, Jinhui Tang 2, Xiangdong Zhou 1 1 School of Computer Science, Shanghai Key Laboratory of Data Science, Fudan.
Short Text Similarity with Word Embedding Date: 2016/03/28 Author: Tom Kenter, Maarten de Rijke Source: CIKM’15 Advisor: Jia-Ling Koh Speaker: Chih-Hsuan.
Using Blog Properties to Improve Retrieval Gilad Mishne (ICWSM 2007)
University Of Seoul Ubiquitous Sensor Network Lab Query Dependent Pseudo-Relevance Feedback based on Wikipedia 전자전기컴퓨터공학 부 USN 연구실 G
QUERY-PERFORMANCE PREDICTION: SETTING THE EXPECTATIONS STRAIGHT Date : 2014/08/18 Author : Fiana Raiber, Oren Kurland Source : SIGIR’14 Advisor : Jia-ling.
LEARNING IN A PAIRWISE TERM-TERM PROXIMITY FRAMEWORK FOR INFORMATION RETRIEVAL Ronan Cummins, Colm O’Riordan (SIGIR’09) Speaker : Yi-Ling Tai Date : 2010/03/15.
Bayesian Extension to the Language Model for Ad Hoc Information Retrieval Hugo Zaragoza, Djoerd Hiemstra, Michael Tipping Microsoft Research Cambridge,
Multimedia Information Retrieval
INF 141: Information Retrieval
A Neural Passage Model for Ad-hoc Document Retrieval
Presentation transcript:

1 Adaptive Subjective Triggers for Opinionated Document Retrieval (WSDM 09’) Kazuhiro Seki, Kuniaki Uehara Date: 11/02/09 Speaker: Hsu, Yu-Wen Advisor: Dr. Koh, Jia-Ling

2 Outline Introduction Opinion Retrieval by a Trigger Model Empirical Evaluation Conclusion

3 Introduction User-generated contents (UGC) typically contains subjective opinions of individual authors which are difficult to find in the conventional mass media  magazines and newspapers. This paper focuses on IR aspects, specifically, opinionated document (blog post) retrieval.

4 The approaches explored by the track participants and others can be roughly categorized into lexicon-based and classification based.  uses a manually or automatically compiled list of words, such as “like” and “fantastic”  assumes the existence of those words in a document as an indicator of opinionatedness.

5 considering the proximity of pronouns and subjective expressions to objects improves opinion retrieval statistical language models  capturing such characteristic patterns often seen in opinionated documents. explore the use of the trigger models  originally proposed for dealing with long distance word dependencies.

6 Opinion Retrieval by a trigger model *Motivation To judge whether a given document contains subjective opinions  look for subjective words in the document  lexicon-based approaches: possibly be an opinion but is rather objective. i.e. like.  classic n-gram language models: increasing n will cause data sparseness and result in unreliable parameter estimation.

7 Opinion Retrieval by a trigger model *Subjective Trigger Models n-gram language models have been successfully applied to many NLP-related problems.  long-distance dependencies  trigger-based language model  trigger : a pair of words, one tends to bring about the occurrence of the other

8 A trigger model incorporating such trigger pairs is used to enhance a baseline n-gram language model by linearly interpolating the two  : a word  : a history (a set of words preceding )  : the interpolation parameter

9 To build a trigger model, we first need to identify significant triggering and triggered word pairs. a criterion to consider word w as a potential triggered word only when an n- gram model without smoothing (different from ) gives “poor” estimation for.

10 (low level triggers) given input text represented as a word sequence, the difference is computed as follows.  :based on the log-likelihood difference between an n-gram language model and a mixture model enhanced only by the pair “a→b” under consideration greatest log-likelihood difference to build the final trigger model.

11 Capturing the characteristics of subjective opinions based on two assumptions:  a subjective opinion usually contains two essential components: triggering words: the subject of the opinion (e.g., “I”) or the object that the opinion is about (e.g., “this movie”) triggered words: a subjective expression (e.g., “like” and “best”).  the triggering word typically appears as a pronoun.

12 Opinion Retrieval by a Trigger Model *Building a Subjective Trigger Model we identified trigger pairs potentially representing subjective opinions. As potential triggering words,14 pronouns:  I, my, you, it, its, he, his, she, her, we, our, they, their, and this identified up to 10,000 trigger pairs using the low level triggers criterion. we limited the history h to the prior context (preceding words) in the same sentence.

13 function words appear in many context, which sometimes leads to lower than threshold t. not directly evaluate the frequency of history,.  needs to be sufficiently high in order to obtain reasonable estimate for.

14 modified the criterion:  :defined as the ratio of the frequency of to that of i.e. prevent (mainly) function words from being identified as triggered words.

15 two advantages over such approaches.  not require NLP tools and relies only on word occurrences easily applicable to other languages as long as similar assumptions apply  not consider dependency relations it is capable of discovering not directly dependent word pairs.

16 For each identified trigger pair, their association,, is calculated as normalized maximum likelihood estimate based on their collocations  : estimated by averaging the association scores,, for every combination of and, where is a history word of.

17 Opinion Retrieval by a Trigger Model *Model Adaptation similar to pseudo-relevance feedback (PRF) difference:  do not modify the original query  updates the language model for better estimating the opinionatedness of a given blog post

18 1. Carry out initial search by a choice of an IR model for a given topic I. 2. Among the top k blog posts retrieved, identify trigger pairs,, and compute their associations, in the same way.  one could use the given topic as potential triggering words in addition to the predefined set of 14 pronouns. 3. Estimate the trigger model using either (the original term associations learned offine) or (learned in the previous step) with a greater value.

19 Empirical Evaluation Data sets: TREC Blog million posts Relevance judgment categories:  irrelevant (labeled as 0),  relevant and not opinionated (1),  relevant and only negatively opinionated (2),  relevant and both positively and negatively opinionated (3),  relevant and only positively opinionated (4).  consider only the labels 2–4 as relevant in the context of opinion retrieval.

20 *Evaluation of the Language Model perplexity: a measure commonly used for evaluating language models. defined as :cross entropy of language model on document. concatenated all the opinionated blog posts labeled 2–4 to create a single very long document and another document from all the non-opinionated blog posts labeled 1

21

22 Opinion Retrieval with the Subjective Trigger Model Initial Retrieval  initial search results returned by a general IR model. we tested two alternatives: a) VSM, b) INM( inference network model combined with a language modeling approach).

23 Integration of a Subjective Trigger Model  :a given d is relevant to user’s information need I.   m: document length (word count)  : controlling the effect of the language model enhanced by subjective triggers.

24

25 Analysis on Individual Queries

26

27 Zyrtec (AP) :  (+47.7%) Basque (AP):  (-18.8%)

28 Trigger: original (not adapted) subjective trigger model

29 Conclusion This paper discussed an application of a focused trigger model to opinion retrieval by way of reranking initial search results. the identified triggers did capture the characteristics of opinions as compared with a baseline trigram model  contributing to discriminating opinionated posts from the non-opinionated.