Using Blog Properties to Improve Retrieval Gilad Mishne (ICWSM 2007)


2 Introduction  In 2006, TREC asked participants to locate blog posts expressing an opinion about a given topic in a large collection of posts.  This paper uses three aspects involved in locating opinionated blog posts: topical relevance, opinion expression, and post quality.

3 Introduction  Topical relevance: the degree to which a post deals with the given topic.  Opinion expression: identifying whether a post contains an opinion.  Post quality: an estimate of the quality of a blog post, under the assumption that higher-quality posts are more likely to contain meaningful opinions and are preferred by users. This also includes detecting spam in blogs.

4 Improving Retrieval using Blog Properties  Temporal Relevance Feedback  The blogspace is a dynamic medium that responds quickly to ongoing events.  A substantial number of blog search queries relate to specific events, in many cases news-oriented ones.  The distribution of dates in relevant documents for these queries is concentrated around the short period during which the event took place.

5 Improving Retrieval using Blog Properties  Temporal Relevance Feedback

6 Improving Retrieval using Blog Properties  Temporal Relevance Feedback

7 Improving Retrieval using Blog Properties  Temporal Relevance Feedback  The distribution of dates in the highly ranked documents serves as an estimate of the distribution of dates in relevant documents.  We assume that blog posts published near the time of an event are more likely to be relevant to it.

8 Improving Retrieval using Blog Properties  Temporal Relevance Feedback  This also lets us identify relevant documents that do not contain the query words.  Each document's temporal prior likelihood is then linearly combined with its topical retrieval score to rerank the topical-retrieval-only results.
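The reranking step described on this slide can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the `top_k` cutoff, the interpolation weight `lam`, and the assumption that topical scores are normalized to [0, 1] are all hypothetical. It only rescores already-retrieved posts.

```python
from collections import Counter
from datetime import date


def temporal_prior(top_ranked_dates):
    """Estimate a date distribution from the dates of the top-ranked posts."""
    counts = Counter(top_ranked_dates)
    total = sum(counts.values())
    return {d: c / total for d, c in counts.items()}


def rerank(results, top_k=3, lam=0.7):
    """Linearly combine topical score and temporal prior, then re-sort.

    results: list of (post_id, topical_score, post_date), sorted by
    topical_score in descending order. top_k and lam are assumed values.
    """
    prior = temporal_prior([d for _, _, d in results[:top_k]])
    rescored = [
        (post_id, lam * score + (1 - lam) * prior.get(d, 0.0))
        for post_id, score, d in results
    ]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    results = [
        ("a", 0.90, date(2006, 1, 10)),
        ("b", 0.80, date(2006, 1, 10)),
        ("c", 0.70, date(2006, 1, 11)),
        ("d", 0.65, date(2006, 3, 1)),
        ("e", 0.60, date(2006, 1, 10)),
    ]
    for post_id, score in rerank(results, top_k=3, lam=0.5):
        print(post_id, round(score, 3))
```

In this toy input, post "e" shares its date with most of the top-ranked posts, so it moves ahead of "c" and "d" after reranking.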

9 Improving Retrieval using Blog Properties  Using Comment Information  The presence of comments in a blog post indicates that some discussion is taking place regarding the contents of the post.  Extracting blog comments is a laborious task, involving parsing the HTML contents and identifying the comments in it.

10 Improving Retrieval using Blog Properties  Using Comment Information  Since we estimate dispute by the presence of comments rather than by their content, we can limit ourselves to identifying the number of comments instead of actually extracting them.

11 Improving Retrieval using Blog Properties  Using Comment Information  One of the types of content present in HTML but not in XML is the comments of the post.  The amount of non-comment overhead involved (permalinks, blogger profile) is fixed over multiple posts in the same blog.  The only component that changes substantially is the comment data.

12 Improving Retrieval using Blog Properties  Using Comment Information  The average HTML to XML ratio for a blog.

13 Improving Retrieval using Blog Properties  Using Comment Information

14 Improving Retrieval using Blog Properties  Using Comment Information  We use the discussion level as a static prior for each post, assuming that posts which attract many responses are more likely to express an opinion.  This approach assigns higher importance to longer comments, which typically indicate a more involved discussion.
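One way to turn the HTML-to-XML comparison into a per-post discussion level is sketched below. The slides do not give the exact formula, so dividing each post's ratio by the blog's average ratio is an assumption, as are the function name and the input layout (character lengths per post, with non-empty XML content).

```python
from statistics import mean


def discussion_levels(posts):
    """Estimate a discussion level for each post of a single blog.

    posts: dict mapping post_id -> (html_length, xml_length), where
    xml_length > 0. Each post's HTML-to-XML ratio is divided by the blog's
    average ratio, so fixed per-blog overhead (permalinks, profile) cancels
    out. This normalization is an assumed choice, not the paper's formula.
    """
    ratios = {pid: html / xml for pid, (html, xml) in posts.items()}
    blog_average = mean(ratios.values())
    return {pid: ratio / blog_average for pid, ratio in ratios.items()}


if __name__ == "__main__":
    blog = {
        "p1": (3000, 1000),  # little beyond the post itself
        "p2": (3000, 1000),
        "p3": (9000, 1000),  # much extra HTML, presumably comments
    }
    for pid, level in discussion_levels(blog).items():
        print(pid, round(level, 2))
```

Because the measure is based on content length rather than a comment count, longer comments raise the level more than short ones, which matches the behaviour described on the slide.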

15 Improving Retrieval using Blog Properties  Query-dependent Spam Filtering  Spam blogs degrade the performance of blog search and analytics.  We implemented a spam filter for blogs based on SVM classification of the contents of the blogs, and used it to rerank retrieval results.

16 Improving Retrieval using Blog Properties  Query-dependent Spam Filtering  We observed that the contribution of the spam filter to different queries varied substantially.

17 Improving Retrieval using Blog Properties  Query-dependent Spam Filtering  If we could predict the degree to which a query is commercially oriented, we might be able to improve average performance by increasing the importance of spam filtering for highly commercial queries and decreasing it for non-commercial ones.

18 Improving Retrieval using Blog Properties  Query-dependent Spam Filtering  To estimate a query's commercial intent, we simply count the number of documents in which the query terms co-occur with a term highly correlated with commercial activity (we use the word “shop”).  The query commercial intent value is used as a query-specific weight for spam reranking, instead of a fixed, query-independent weight.
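The co-occurrence count and its use as a query-specific weight might look like the sketch below. Only the counting step and the marker word "shop" come from the slide. The whitespace tokenization, the normalization by the number of documents matching the query, and the way the weight discounts the topical score are assumptions made for illustration.

```python
def cooccurrence_count(query_terms, documents, marker="shop"):
    """Count documents containing every query term and the marker word."""
    terms = {t.lower() for t in query_terms}
    count = 0
    for doc in documents:
        tokens = set(doc.lower().split())
        if marker in tokens and terms <= tokens:
            count += 1
    return count


def commercial_weight(query_terms, documents, marker="shop"):
    """Normalize the count by the number of documents matching the query.

    The normalization is an assumed choice so the weight falls in [0, 1].
    """
    terms = {t.lower() for t in query_terms}
    matching = sum(1 for doc in documents if terms <= set(doc.lower().split()))
    if matching == 0:
        return 0.0
    return cooccurrence_count(query_terms, documents, marker) / matching


def spam_adjusted_score(topical_score, spam_probability, weight):
    """Discount a post's score by its spam probability, scaled per query.

    spam_probability stands in for the output of a spam classifier;
    this combination rule is hypothetical.
    """
    return topical_score * (1.0 - weight * spam_probability)


if __name__ == "__main__":
    docs = [
        "cheap laptop shop online",
        "laptop review and opinion",
        "shop for laptop deals",
        "election debate opinion",
    ]
    print(commercial_weight(["laptop"], docs))
    print(commercial_weight(["election"], docs))
```

With a weight of zero the spam filter has no effect on the ranking, so non-commercial queries are left close to their topical-only ordering.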

19 Improving Retrieval using Blog Properties  Query-dependent Spam Filtering

20 Evaluation  All our experiments are based on the data used at the TREC 2006 Blog Track (including both HTML and XML content).  The track used 50 queries.  The evaluation metrics are the standard TREC measures: mean average precision (MAP), R-Precision, and precision at 10.
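For reference, the three measures can be computed as in the sketch below, using their standard definitions. The function names and the toy ranking are illustrative only and are not taken from the track's evaluation tooling.

```python
def average_precision(ranked, relevant):
    """Mean of the precision values at the rank of each relevant document."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranked[:r] if doc in relevant) / r


def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


if __name__ == "__main__":
    ranked = ["d1", "d2", "d3", "d4"]
    relevant = {"d1", "d3"}
    print(average_precision(ranked, relevant))
    print(r_precision(ranked, relevant))
    print(precision_at_k(ranked, relevant, k=10))
```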

21 Evaluation

22 Conclusions  The techniques presented correspond to different aspects of opinionated retrieval:  To improve topical retrieval, we incorporated temporal information into the retrieval model.  To improve identification of opinionated content, we estimated the level of discussion in a blog post using the ratio of HTML to XML content in it.  Finally, we found that some queries require more rigorous spam filtering, and we assigned varying levels of spam filtering according to a query's commercial context.