Finding similar items by leveraging social tag clouds Speaker: Po-Hsien Shih Advisor: Jia-Ling Koh Source: SAC 2012’ Date: October 4, 2012.

Similar presentations

Answering Approximate Queries over Autonomous Web Databases Xiangfu Meng, Z. M. Ma, and Li Yan College of Information Science and Engineering, Northeastern.

Google News Personalization: Scalable Online Collaborative Filtering

Processing XML Keyword Search by Constructing Effective Structured Queries Jianxin Li, Chengfei Liu, Rui Zhou and Bo Ning Swinburne University of Technology,

Diversity Maximization Under Matroid Constraints Date : 2013/11/06 Source : KDD’13 Authors : Zeinab Abbassi, Vahab S. Mirrokni, Mayur Thakur Advisor :

Date : 2013/05/27 Author : Anish Das Sarma, Lujun Fang, Nitin Gupta, Alon Halevy, Hongrae Lee, Fei Wu, Reynold Xin, Gong Yu Source : SIGMOD’12 Speaker.

Personalized Query Classification Bin Cao, Qiang Yang, Derek Hao Hu, et al. Computer Science and Engineering Hong Kong UST.

WWW 2014 Seoul, April 8 th SNOW 2014 Data Challenge Two-level message clustering for topic detection in Twitter Georgios Petkos, Symeon Papadopoulos, Yiannis.

Linking Named Entity in Tweets with Knowledge Base via User Interest Modeling Date : 2014/01/22 Author : Wei Shen, Jianyong Wang, Ping Luo, Min Wang Source.

Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.

Query Chains: Learning to Rank from Implicit Feedback Paper Authors: Filip Radlinski Thorsten Joachims Presented By: Steven Carr.

Sequence Clustering and Labeling for Unsupervised Query Intent Discovery Speaker: Po-Hsien Shih Advisor: Jia-Ling Koh Source: WSDM’12 Date: 1 November,

Bring Order to Your Photos: Event-Driven Classification of Flickr Images Based on Social Knowledge Date: 2011/11/21 Source: Claudiu S. Firan (CIKM’10)

Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.

1 Entity Ranking Using Wikipedia as a Pivot (CIKM 10’) Rianne Kaptein, Pavel Serdyukov, Arjen de Vries, Jaap Kamps 2010/12/14 Yu-wen,Hsu.

INFO 624 Week 3 Retrieval System Evaluation

Link Analysis, PageRank and Search Engines on the Web

(ACM KDD 09’) Prem Melville, Wojciech Gryc, Richard D. Lawrence

On Sparsity and Drift for Effective Real- time Filtering in Microblogs Date ： 2014/05/13 Source ： CIKM’13 Advisor ： Prof. Jia-Ling, Koh Speaker ： Yi-Hsuan.

Tag Clouds Revisited Date : 2011/12/12 Source : CIKM’11 Speaker : I- Chih Chiu Advisor : Dr. Koh. Jia-ling 1.

1 Context-Aware Search Personalization with Concept Preference CIKM’11 Advisor ： Jia Ling, Koh Speaker ： SHENG HONG, CHUNG.

Leveraging Conceptual Lexicon ： Query Disambiguation using Proximity Information for Patent Retrieval Date : 2013/10/30 Author : Parvaz Mahdabi, Shima.

Date: 2012/10/18 Author: Makoto P. Kato, Tetsuya Sakai, Katsumi Tanaka Source: World Wide Web conference (WWW "12) Advisor: Jia-ling, Koh Speaker: Jiun.

By : Garima Indurkhya Jay Parikh Shraddha Herlekar Vikrant Naik.

Mehdi Kargar Aijun An York University, Toronto, Canada Keyword Search in Graphs: Finding r-cliques.

1 Formal Models for Expert Finding on DBLP Bibliography Data Presented by: Hongbo Deng Co-worked with: Irwin King and Michael R. Lyu Department of Computer.

Beyond Co-occurrence: Discovering and Visualizing Tag Relationships from Geo-spatial and Temporal Similarities Date : 2012/8/6 Resource : WSDM’12 Advisor.

A Simple Unsupervised Query Categorizer for Web Search Engines Prashant Ullegaddi and Vasudeva Varma Search and Information Extraction Lab Language Technologies.

1 Applying Collaborative Filtering Techniques to Movie Search for Better Ranking and Browsing Seung-Taek Park and David M. Pennock (ACM SIGKDD 2007)

Ruirui Li, Ben Kao, Bin Bi, Reynold Cheng, Eric Lo Speaker: Ruirui Li 1 The University of Hong Kong.

UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.

When Experts Agree: Using Non-Affiliated Experts To Rank Popular Topics Meital Aizen.

CIKM’09 Date:2010/8/24 Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen 1.

A Probabilistic Graphical Model for Joint Answer Ranking in Question Answering Jeongwoo Ko, Luo Si, Eric Nyberg (SIGIR ’ 07) Speaker: Cho, Chin Wei Advisor:

Universit at Dortmund, LS VIII

New and Improved: Modeling Versions to Improve App Recommendation Date: 2014/10/2 Author: Jovian Lin, Kazunari Sugiyama, Min-Yen Kan, Tat-Seng Chua Source:

Retrieval Models for Question and Answer Archives Xiaobing Xue, Jiwoon Jeon, W. Bruce Croft Computer Science Department University of Massachusetts, Google,

Date: 2013/8/27 Author: Shinya Tanaka, Adam Jatowt, Makoto P. Kato, Katsumi Tanaka Source: WSDM’13 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Estimating.

ON THE SELECTION OF TAGS FOR TAG CLOUDS (WSDM11) Advisor: Dr. Koh. Jia-Ling Speaker: Chiang, Guang-ting Date:2011/06/20 1.

Detecting Dominant Locations from Search Queries Lee Wang, Chuang Wang, Xing Xie, Josh Forman, Yansheng Lu, Wei-Ying Ma, Ying Li SIGIR 2005.

Web Image Retrieval Re-Ranking with Relevance Model Wei-Hao Lin, Rong Jin, Alexander Hauptmann Language Technologies Institute School of Computer Science.

Contextual Ranking of Keywords Using Click Data Utku Irmak, Vadim von Brzeski, Reiner Kraft Yahoo! Inc ICDE 09’ Datamining session Summarized.

BioSnowball: Automated Population of Wikis (KDD ‘10) Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/11/30 1.

Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.

Next Generation Search Engines Ehsun Daroodi 1 Feb, 2003.

Date : 2013/03/18 Author : Jeffrey Pound, Alexander K. Hudek, Ihab F. Ilyas, Grant Weddell Source : CIKM’12 Speaker : Er-Gang Liu Advisor : Prof. Jia-Ling.

1 Blog site search using resource selection 2008 ACM CIKM Advisor ： Dr. Koh Jia-Ling Speaker ： Chou-Bin Fan Date ：

Finding Experts Using Social Network Analysis 2007 IEEE/WIC/ACM International Conference on Web Intelligence Yupeng Fu, Rongjing Xiang, Yong Wang, Min.

Retroactive Answering of Search Queries Beverly Yang Glen Jeh.

Date: 2013/10/23 Author: Salvatore Oriando, Francesco Pizzolon, Gabriele Tolomei Source: WWW’13 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang SEED:A Framework.

A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong, Brian D. Davison Lehigh University (SIGIR ’ 09) Speaker: Cho,

Date: 2012/08/21 Source: Zhong Zeng, Zhifeng Bao, Tok Wang Ling, Mong Li Lee (KEYS’12) Speaker: Er-Gang Liu Advisor: Dr. Jia-ling Koh 1.

LINDEN : Linking Named Entities with Knowledge Base via Semantic Knowledge Date : 2013/03/25 Resource : WWW 2012 Advisor : Dr. Jia-Ling Koh Speaker : Wei.

Ranking of Database Query Results Nitesh Maan, Arujn Saraswat, Nishant Kapoor.

Chapter. 3: Retrieval Evaluation 1/2/2016Dr. Almetwally Mostafa 1.

Date: 2012/5/28 Source: Alexander Kotov. al(CIKM’11) Advisor: Jia-ling, Koh Speaker: Jiun Jia, Chiou Interactive Sense Feedback for Difficult Queries.

Topical Clustering of Search Results Date : 2012/11/8 Resource : WSDM’12 Advisor : Dr. Jia-Ling Koh Speaker : Wei Chang 1.

A DDING S TRUCTURE TO T OP -K: F ORM I TEMS TO E XPANSIONS Date : Source : CIKM’ 11 Speaker : I-Chih Chiu Advisor : Dr. Jia-Ling Koh 1.

Date: 2013/9/25 Author: Mikhail Ageev, Dmitry Lagun, Eugene Agichtein Source: SIGIR’13 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Improving Search Result.

Refined Online Citation Matching and Adaptive Canonical Metadata Construction CSE 598B Course Project Report Huajing Li.

Predicting Short-Term Interests Using Activity-Based Search Context CIKM’10 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh.

CiteData: A New Multi-Faceted Dataset for Evaluating Personalized Search Performance CIKM’10 Advisor : Jia-Ling, Koh Speaker : Po-Hsien, Shih.

INFORMATION RETRIEVAL MEASUREMENT OF RELEVANCE EFFECTIVENESS 1Adrienn Skrop.

Queensland University of Technology

Large-Scale Content-Based Audio Retrieval from Text Queries

IR Theory: Evaluation Methods

Enriching Taxonomies With Functional Domain Knowledge

Date: 2012/11/15 Author: Jin Young Kim, Kevyn Collins-Thompson,

Connecting the Dots Between News Article

Presentation transcript:

Finding similar items by leveraging social tag clouds Speaker: Po-Hsien Shih Advisor: Jia-Ling Koh Source: SAC 2012’ Date: October 4, 2012

Outline: Introduction; Challenges (Missing Tag Effect, Popularity Bias); Approaches (Balanced Voting Model, One-Class Probabilistic Model); Experiment; Conclusion

Introduction: Suppose I search for “Outstanding universities in California”. What results do I get? Strangely, Stanford University is missing.

Cont. How can this be solved? A potential solution is to provide a query-by-example interface, where users supply a few examples to improve the quality of the results. Example: issue the query “UC Berkeley”; the results include UCLA, Stanford University, etc.

Cont. What is the major challenge? Identifying and ranking entities that are similar to the user-provided examples, based on tag information. The uncontrolled nature of user-generated metadata often causes imprecision and ambiguity. This leads to two challenges: the missing tag effect and popularity bias.

Cont. Goal: create a function R that measures the similarity between an entity x_i in the dataset and the query X_Q. Tag generation and data model: Identifier: the title. Entity: one Wiki page or entry. Tag: a label that users assign to the page, or a category name.

Intersection-Driven Approach: T_Q∩ = T_1 ∩ T_2 ∩ T_3 ∩ ... Example: X_Q = {x_1, x_2}, R(x_4, X_Q) = 1, R(x_3, X_Q) = 2. What is the problem? We do not know whether the user wants to search for a city, a capital, or both.
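The intersection-driven scoring can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `intersection_score` is hypothetical, R is assumed to count how many tags of the query intersection an entity carries (which is consistent with the integer scores on the slide), and the tag sets are the ones listed on the later Balanced Voting Model slide, used here with a different query than the slide's.

```python
# Tag sets copied from the Balanced Voting Model slide of this transcript.
TAGS = {
    "Beijing": {"City", "Capital", "Asia", "Summer Olympic", "China"},
    "London": {"City", "Europe", "Summer Olympic", "Object"},
    "Lyon": {"City", "Europe", "Object"},
}


def intersection_score(entity, query, tags=TAGS):
    """Count the tags of `entity` that every query entity also carries.

    Assumption: R(x_i, X_Q) = |T_i ∩ T_Q∩|, a reading of the slide,
    not a formula stated in the transcript.
    """
    common = set.intersection(*(tags[q] for q in query))  # T_Q∩
    return len(tags[entity] & common)


# Illustrative query {London, Lyon}: T_Q∩ = {City, Europe, Object}.
print(intersection_score("Beijing", ["London", "Lyon"]))
```

With this query, Beijing only matches on City, so any tag missing from one query entity is silently dropped from the comparison.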

Missing tag effect. When does it occur? A newly created entity might not be well tagged until its editors finish revising its content. This can cause the system to misinterpret the user's intent. How does it happen, and how can it be solved?

How does it happen? X_Q = {x_2: Washington D.C., x_3: London}. The intersection-driven approach yields T_Q∩ = {t_8: Object}. It therefore treats the entity Beijing as irrelevant and returns other entities that carry the tag Object.

Solving the Missing Tag Effect: One way is called Partial Weighting Generalization. With X_Q = {x_2, x_3} and T_Q∩ = {Object}, we can assign real-valued scores to tags instead of only 0 or 1. For example, with X_Q = {x_2, x_3} we assign 0.5 points to the tags that are not in T_Q∩: R(x_4, X_Q) = 1.5, R(x_5, X_Q) = 1.5, R(x_6, X_Q) = 2. If the system already returns satisfactory results, generalization is not applied unless the user asks for more results.
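A rough Python sketch of partial weighting follows. It rests on assumptions that the slide does not spell out: a tag in T_Q∩ is worth 1 point, a tag carried by only some query entities is worth 0.5, and all other tags are worth 0. The function name `partial_weight_score` and the `partial` parameter are hypothetical, and the tag sets are borrowed from the Balanced Voting Model slide rather than from this slide's own example.

```python
# Tag sets copied from the Balanced Voting Model slide of this transcript.
TAGS = {
    "Beijing": {"City", "Capital", "Asia", "Summer Olympic", "China"},
    "London": {"City", "Europe", "Summer Olympic", "Object"},
    "Lyon": {"City", "Europe", "Object"},
}


def partial_weight_score(entity, query, tags=TAGS, partial=0.5):
    """Score `entity` with full credit for shared tags and partial credit otherwise.

    Assumption: tags in the query intersection score 1, tags that appear in
    only some query entities score `partial`, everything else scores 0.
    """
    tag_sets = [tags[q] for q in query]
    common = set.intersection(*tag_sets)        # T_Q∩
    partial_tags = set.union(*tag_sets) - common  # tags missing from some example
    score = 0.0
    for tag in tags[entity]:
        if tag in common:
            score += 1.0
        elif tag in partial_tags:
            score += partial
    return score


# Illustrative query {Beijing, London}: T_Q∩ = {City, Summer Olympic}.
# Lyon carries City (1 point) plus Europe and Object (0.5 each).
print(partial_weight_score("Lyon", ["Beijing", "London"]))
```

Compared with the strict intersection, Lyon still earns credit for Europe and Object even though Beijing lacks both tags.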

Popularity Bias: The number of tags associated with an entity follows a power-law distribution. |T_i| is the popularity of an entity x_i, measured by the number of tags associated with x_i. If X_Q = {x_1: Beijing, x_6: Lyon}, the entity Beijing may contribute more to the score, so the results are biased toward entities like Beijing. A popular tag is also probably not the concept the user intends to search for (for example, the tag Object).

What do we want? We need to refine the intersection-driven approach to address the two challenges above. A popular entity in a query should not bias the results. Even if a few tags are missing from the input examples, the system has to identify relevant entities based on tags associated with a subset of the input examples.

Balanced Voting Model

Example: X_Q = {x_1: Beijing, x_6: Lyon}; candidate entity: x_3: London. Tag sets: x_1: Beijing -> T_1 = {City, Capital, Asia, Summer Olympic, China}; x_3: London -> T_3 = {City, Europe, Summer Olympic, Object}; x_6: Lyon -> T_6 = {City, Europe, Object}. Tag scores: R(t_1: City, X_Q) = 0.53; R(t_2: Capital, X_Q) = 0.2; R(t_3: Summer Olympic, X_Q) = 0.53. Entity score: R(x_3, X_Q) = 1.39. This compensates for the bias caused by a popular entity in a query, and the non-zero assignment alleviates the missing tag effect.
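The intermediate arithmetic is not preserved in the transcript, so the following Python sketch reconstructs one plausible voting rule rather than the paper's definition. It assumes that each query entity holds a single vote split equally among its tags, that a tag's score is the sum of the shares it receives, and that an entity's score is the sum of its tags' scores. The names `tag_votes` and `balanced_voting_score` are hypothetical. This rule reproduces City = 1/5 + 1/3 ≈ 0.53, Capital = 0.2, and a London score of 1.4, which matches the slide's 1.39 when the per-tag values are rounded to two decimals before adding. Under the same rule, Summer Olympic (carried only by Beijing in the listed tag sets) would come out as 0.2 rather than the 0.53 shown, so the rule should be read as an assumption.

```python
# Tag sets copied from the slide above.
TAGS = {
    "Beijing": {"City", "Capital", "Asia", "Summer Olympic", "China"},
    "London": {"City", "Europe", "Summer Olympic", "Object"},
    "Lyon": {"City", "Europe", "Object"},
}


def tag_votes(query, tags=TAGS):
    """Give each query entity one vote, split equally among its tags (assumed rule)."""
    votes = {}
    for q in query:
        share = 1.0 / len(tags[q])  # an entity with many tags gives less per tag
        for tag in tags[q]:
            votes[tag] = votes.get(tag, 0.0) + share
    return votes


def balanced_voting_score(entity, query, tags=TAGS):
    """Sum the vote totals of every tag the candidate entity carries."""
    votes = tag_votes(query, tags)
    return sum(votes.get(tag, 0.0) for tag in tags[entity])


query = ["Beijing", "Lyon"]
votes = tag_votes(query)
print(round(votes["City"], 2), round(votes["Capital"], 2))
print(balanced_voting_score("London", query))
```

Because Beijing's five tags each receive only a fifth of a vote, the heavily tagged entity cannot dominate the score, and tags outside the strict intersection still contribute a non-zero amount.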

One-Class Probabilistic Model: Consider how people create a query for finding similar items. The user first has some desired property in mind, and then tries to recall other properties based on their knowledge.

Cont. Assume that a user's intent corresponds to one tag t_k in the dataset. Because the intent is unpredictable, we assume the user selects |X_Q| entities from ε(t_k), where ε(t_k) is the set of all entities associated with t_k. The tag t_k is expected to be associated with all entities in the query.

Cont. Assumptions: there are no missing tags in the dataset; all tags are independent of each other; t_k stands for the user's intent.

Cont. The model first defines the probability of X_Q being the query given that t_k is the desired tag, P(X_Q | t_k). The probabilities are then summed to obtain the probability that an entity x_i is a similar entity. This alleviates the popular-tag bias, because the system assigns a low value to P(X_Q | t_k) for a popular tag.
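The formula itself is not preserved in the transcript, so the Python sketch below fills it in with explicitly assumed choices and should not be read as the paper's definition. It assumes that the user draws |X_Q| entities uniformly at random from ε(t_k), which gives P(X_Q | t_k) = 1 / C(|ε(t_k)|, |X_Q|) when every query entity carries t_k and 0 otherwise; it assumes a uniform prior over tags; and it scores an entity by summing the resulting posterior over the entity's tags. All function names are hypothetical, and the tag sets are borrowed from the Balanced Voting Model slide.

```python
from math import comb

# Tag sets copied from the Balanced Voting Model slide of this transcript.
TAGS = {
    "Beijing": {"City", "Capital", "Asia", "Summer Olympic", "China"},
    "London": {"City", "Europe", "Summer Olympic", "Object"},
    "Lyon": {"City", "Europe", "Object"},
}


def entities_with(tag, tags=TAGS):
    """ε(t_k): the set of all entities associated with the tag."""
    return {entity for entity, tag_set in tags.items() if tag in tag_set}


def likelihood(query, tag, tags=TAGS):
    """Assumed P(X_Q | t_k): uniform choice of |X_Q| entities from ε(t_k)."""
    holders = entities_with(tag, tags)
    if not set(query) <= holders:  # no-missing-tag assumption
        return 0.0
    return 1.0 / comb(len(holders), len(query))


def tag_posterior(query, tags=TAGS):
    """Assumed P(t_k | X_Q) under a uniform prior over tags."""
    all_tags = set().union(*tags.values())
    like = {tag: likelihood(query, tag, tags) for tag in all_tags}
    total = sum(like.values())
    return {tag: (value / total if total else 0.0) for tag, value in like.items()}


def similarity(entity, query, tags=TAGS):
    """Sum the posterior of every tag the candidate entity carries."""
    posterior = tag_posterior(query, tags)
    return sum(posterior[tag] for tag in tags[entity])


query = ["London", "Lyon"]
print(likelihood(query, "City"), likelihood(query, "Europe"))
print(similarity("Beijing", query))
```

In this toy data, City is carried by all three entities, so its likelihood (1/3) is lower than that of Europe (1), which illustrates how a popular tag receives a smaller P(X_Q | t_k).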

Cont. Dealing with the missing tag effect: ε^c(t_k) returns the entities that are relevant to the tag t_k but whose tag-entity relation is missing (for example, ε^c(t_8: Object) = {x_1: Beijing}). m_k is the number of entities in a query that are missing the tag t_k. u is the number of all entities in the dataset. Since the ratio of missing tags is unknown, the paper assumes that 50% of the tag-entity relations are missing.

Experiment: To evaluate the ranking algorithms, we built a search engine and observed how well users perceive the new ranking results. We downloaded a Wikipedia dataset and created a search interface on top of it to collect user surveys.

Effectiveness evaluation: The paper collected 600 valid questionnaires from 69 students at UCLA to create a benchmark for evaluating user satisfaction. A satisfaction score is then computed from these questionnaires.

Comparison with Google Sets. Query: {Beijing, Atlanta} -> Olympic!

Conclusion: This paper introduced three approaches, built a search engine on top of them, and created a benchmark for evaluation. It explains two important challenges of using tag information, popularity bias and the missing tag effect, and shows how to overcome them. The framework not only finds similar items but also shows the potential of social tag information. It shows that the task can be completed by providing a query consisting of entities and using only tag information, even though the tag information is uncontrolled and noisy.