CUBELSI: AN EFFECTIVE AND EFFICIENT METHOD FOR SEARCHING RESOURCES IN SOCIAL TAGGING SYSTEMS Bin Bi, Sau Dan Lee, Ben Kao, Reynold Cheng The University of Hong Kong

Presentation transcript:

CUBELSI: AN EFFECTIVE AND EFFICIENT METHOD FOR SEARCHING RESOURCES IN SOCIAL TAGGING SYSTEMS
Bin Bi, Sau Dan Lee, Ben Kao, Reynold Cheng
The University of Hong Kong {bbi, sdlee, kao,

SOCIAL TAGGING SYSTEMS
Tags

SEARCH IN SOCIAL TAGGING SYSTEMS
Two problems:
1. Tag inconsistency
2. A multitude of aspects

TAG INCONSISTENCY
car? automobile?
[Figure: example resources tagged variously with car, automobile, Benz, and Audi]

A MULTITUDE OF ASPECTS
Example tags, grouped by aspect:
moon, worm moon, Perigee moon, lunar
cherry blossoms, Sakura, cherry blossom
Nikon, astrophotography, D40

SOLUTION
LSI (Latent Semantic Indexing): built on SVD (Singular Value Decomposition)
CubeLSI: built on Tucker decomposition, with taggers added as a further dimension
Analyzing semantic relations among tags by taking into account the role of taggers

PROPOSED RANKING FRAMEWORK
CubeLSI algorithm:
Input: tag assignments
Output: pairwise tag semantic distances

CONCEPT DISTILLATION
Tags with pairwise distances: mp3, music, photo, photos, video, movie
Concepts/Clusters: {photo, photos}, {music, mp3}, {video, movie}
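
The transcript does not say which clustering algorithm distills concepts from the pairwise distances. As a minimal sketch, assuming plain single-linkage grouping under a distance threshold (the function name `distill_concepts`, the threshold, and the toy distances are all illustrative assumptions, not the authors' method), tags that are close to each other end up in the same concept:

```python
def distill_concepts(tags, distance, threshold):
    """Group tags into concepts: two tags share a concept if they are linked
    by a chain of pairs whose distance is below `threshold` (single linkage).
    Illustrative stand-in only; the talk's clustering method is not given."""
    parent = {tag: tag for tag in tags}

    def find(tag):
        # Follow parent pointers to the representative of the tag's group.
        while parent[tag] != tag:
            parent[tag] = parent[parent[tag]]  # path halving
            tag = parent[tag]
        return tag

    for i, a in enumerate(tags):
        for b in tags[i + 1:]:
            if distance[frozenset((a, b))] < threshold:
                parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for tag in tags:
        groups.setdefault(find(tag), set()).add(tag)
    return sorted(groups.values(), key=lambda group: sorted(group))


# Toy distances (made up): near-synonyms are close, everything else is far.
tags = ["mp3", "music", "photo", "photos", "video", "movie"]
close_pairs = {
    frozenset(pair)
    for pair in [("mp3", "music"), ("photo", "photos"), ("video", "movie")]
}
distance = {
    frozenset((a, b)): (0.1 if frozenset((a, b)) in close_pairs else 0.9)
    for i, a in enumerate(tags)
    for b in tags[i + 1:]
}
concepts = distill_concepts(tags, distance, threshold=0.5)
```

With these toy distances the six tags collapse into the three concepts shown on the slide.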

BAG-OF-CONCEPTS REPRESENTATION
Distilled concepts
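
A bag-of-concepts vector can be sketched by mapping each of a resource's tags to its distilled concept and counting. The weighting scheme is not preserved in the transcript, so raw counts are an assumption here, as are the `bag_of_concepts` name and the concept numbering:

```python
from collections import Counter


def bag_of_concepts(tags, tag_to_concept, num_concepts):
    """Count how many of a resource's tag occurrences fall into each concept.
    Tags without a known concept are ignored (an assumption of this sketch)."""
    counts = Counter(tag_to_concept[t] for t in tags if t in tag_to_concept)
    return [counts.get(concept, 0) for concept in range(num_concepts)]


# Hypothetical concept ids: 0 = {photo, photos}, 1 = {music, mp3}, 2 = {video, movie}
tag_to_concept = {
    "photo": 0, "photos": 0,
    "music": 1, "mp3": 1,
    "video": 2, "movie": 2,
}
vector = bag_of_concepts(["mp3", "music", "video"], tag_to_concept, num_concepts=3)
```

Because "mp3" and "music" map to the same concept, a resource tagged with either one contributes to the same coordinate, which is how this representation sidesteps tag inconsistency.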

RANKING SEARCH RESULTS
[Figure: the query, Resource 1, and Resource 2 drawn as vectors in a space with axes x, y, z]
Search results are sorted in descending order of their cosine similarity scores.
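
The ranking step can be sketched as follows, assuming the query and each resource are already available as equal-length concept vectors. The function names and the toy vectors are hypothetical:

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_resources(query_vec, resource_vecs):
    """Return (score, name) pairs sorted by descending cosine similarity."""
    scored = [(cosine(query_vec, vec), name) for name, vec in resource_vecs.items()]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy concept vectors (made up for illustration).
resources = {"Resource 1": [2, 1, 0], "Resource 2": [0, 1, 3]}
ranking = rank_resources([1, 0, 0], resources)
```

Here Resource 1 ranks first because its vector points closest to the query's direction; vector length does not affect the score.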

CUBELSI
Tensor: second-order tensor vs. third-order tensor

REPRESENTING DATA AS A THIRD-ORDER TENSOR
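
The slide's formula is not preserved in the transcript. One plausible reading, sketched below, is a binary tensor with one entry per (user, tag, resource) triple that is 1 when that user assigned that tag to that resource. The binary encoding, the mode order (user, tag, resource), and the `build_tensor` name are assumptions of this sketch:

```python
import numpy as np


def build_tensor(assignments):
    """Build a dense binary tensor indexed as [user, tag, resource].
    `assignments` is an iterable of (user, tag, resource) triples."""
    users = sorted({u for u, _, _ in assignments})
    tags = sorted({t for _, t, _ in assignments})
    resources = sorted({r for _, _, r in assignments})
    u_idx = {u: i for i, u in enumerate(users)}
    t_idx = {t: i for i, t in enumerate(tags)}
    r_idx = {r: i for i, r in enumerate(resources)}

    tensor = np.zeros((len(users), len(tags), len(resources)))
    for u, t, r in assignments:
        tensor[u_idx[u], t_idx[t], r_idx[r]] = 1.0
    return tensor, users, tags, resources


# Toy tag assignments (made up for illustration).
assignments = [
    ("alice", "car", "photo1"),
    ("alice", "benz", "photo1"),
    ("bob", "automobile", "photo1"),
    ("bob", "car", "photo2"),
]
tensor, users, tags, resources = build_tensor(assignments)
```

A dense array is only workable for toy data; real tagging data is extremely sparse and would need a sparse representation.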

PAIRWISE TAG DISTANCE
Two sources of noise:
1. A zero entry may not result from the user considering the tag to be irrelevant to the resource.
2. Tagging is a casual and ad hoc activity.

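TUCKER DECOMPOSITION
[Figure: the original tensor, with tag, resource, and user axes, is decomposed into a core tensor and factor matrices; multiplying them back together yields the purified tensor]
Purified tag distance: the Frobenius norm of the difference between the two tags' slices of the purified tensor.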
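
The transcript does not say which algorithm computes the Tucker decomposition. As a hedged sketch, the truncated higher-order SVD (HOSVD) below is one simple way to obtain a core tensor and factor matrices; it is swapped in for illustration and may differ from what the authors use. The mode order (user, tag, resource), the ranks, and all function names are assumptions:

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize: bring `mode` to the front and flatten the remaining modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """n-mode product: multiply `tensor` by `matrix` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)


def truncated_hosvd(tensor, ranks):
    """Return (core, factors); each factor has orthonormal columns."""
    factors = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, factor in enumerate(factors):
        core = mode_product(core, factor.T, mode)
    return core, factors


def reconstruct(core, factors):
    """Multiply the core by every factor matrix to get the purified tensor."""
    purified = core
    for mode, factor in enumerate(factors):
        purified = mode_product(purified, factor, mode)
    return purified


def tag_distance(purified, i, j):
    """Frobenius norm of the difference between two tag slices (tag = axis 1)."""
    return float(np.linalg.norm(purified[:, i, :] - purified[:, j, :]))


# Toy binary tensor: 4 users x 3 tags x 5 resources (random, for illustration).
rng = np.random.default_rng(0)
tensor = rng.integers(0, 2, size=(4, 3, 5)).astype(float)
core, factors = truncated_hosvd(tensor, ranks=(2, 2, 2))
purified = reconstruct(core, factors)
```

Keeping fewer components than the original dimensions is what smooths out noise: the purified tensor has the same shape as the original but is a low-rank approximation of it.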

SPACE & TIME COSTS
Last.fm dataset (3897 users, 3326 tags, 2849 resources)
Full tensor: 36.9 billion entries. One tag slice: 11.1 million entries.
Computing the Frobenius norm for each tag pair requires 11.1 million subtractions, squarings, and additions.
There are a total of 5.5 million tag pairs for 3326 tags.
The amount of computation needed would be prohibitively large.
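
The figures on this slide follow directly from the Last.fm dataset dimensions, as this short check shows (variable names are illustrative):

```python
users, tags, resources = 3897, 3326, 2849

tensor_entries = users * tags * resources   # every (user, tag, resource) cell
slice_entries = users * resources           # cells in one tag's slice
tag_pairs = tags * (tags - 1) // 2          # unordered pairs of distinct tags
total_entry_ops = tag_pairs * slice_entries # slice cells touched over all pairs

print(f"tensor entries:  {tensor_entries:,}")
print(f"slice entries:   {slice_entries:,}")
print(f"tag pairs:       {tag_pairs:,}")
print(f"entry-level ops: {total_entry_ops:.2e}")
```

The last figure, on the order of 6 x 10^13 entry-level operations, is why evaluating every pairwise distance on the purified tensor directly is impractical.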

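SHORT-CUT TO EVALUATING THE PURIFIED TAG DISTANCE
Evaluating the distance directly on the purified tensor is impractical.
The new formula depends only on the core tensor and a factor matrix.
There is no need to compute any entries of the purified tensor.
The relatively low dimensions of the core tensor and the factor matrix mean that far fewer computations are needed.
The formula uses a matrix that can be readily computed from the core tensor.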
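
The slide's formula is not preserved in the transcript, so the following is a reconstruction under stated assumptions rather than the authors' exact expression. If the user and resource factor matrices have orthonormal columns, they drop out of the Frobenius norm, and the distance between tags i and j reduces to sqrt(d Sigma d^T), where d is the difference of rows i and j of the tag factor matrix and Sigma is the tag-mode unfolding of the core tensor times its own transpose. The names `sigma`, `direct_distance`, and `shortcut_distance`, and the toy sizes, are hypothetical. The sketch checks numerically that both routes agree:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 6 users, 5 tags, 4 resources; core tensor is 3 x 2 x 2.
# QR gives factor matrices with orthonormal columns, as Tucker methods produce.
user_f, _ = np.linalg.qr(rng.standard_normal((6, 3)))
tag_f, _ = np.linalg.qr(rng.standard_normal((5, 2)))
res_f, _ = np.linalg.qr(rng.standard_normal((4, 2)))
core = rng.standard_normal((3, 2, 2))

# Direct route: materialize the purified tensor, then compare tag slices.
purified = np.einsum("abc,ua,tb,rc->utr", core, user_f, tag_f, res_f)


def direct_distance(i, j):
    return float(np.linalg.norm(purified[:, i, :] - purified[:, j, :]))


# Short-cut: a small matrix computed once from the core tensor alone.
core_tag = np.moveaxis(core, 1, 0).reshape(core.shape[1], -1)  # tag-mode unfolding
sigma = core_tag @ core_tag.T


def shortcut_distance(i, j):
    diff = tag_f[i] - tag_f[j]
    return float(np.sqrt(diff @ sigma @ diff))
```

In this form each distance costs a handful of operations on vectors whose length is the tag-mode rank of the core tensor, instead of a pass over a full users-by-resources slice.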

EXPERIMENTAL RESULTS
Dataset statistics (table columns: #users, #tags, #resources, #records)

SAMPLE TAG CLUSTERS

OTHER RANKING METHODS
Freq: Resources are ranked in descending order of the number of users who annotate the resource with query tags.
BOW (Bag-of-Words): Uses standard IR; each resource is a document and each tag is a word.
FolkRank [Hotho et al. 2006]: A modified version of PageRank. It follows the assumption that votes cast by important users with important tags make the annotated resources important.
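
The Freq baseline is simple enough to sketch directly. The version below assumes a user counts once per resource if they used any of the query tags on it, and breaks ties alphabetically; both choices, and the `freq_rank` name, are assumptions not stated on the slide:

```python
def freq_rank(assignments, query_tags):
    """Rank resources by the number of distinct users who annotated them
    with at least one query tag (descending), ties broken by resource name."""
    query = set(query_tags)
    annotators = {}
    for user, tag, resource in assignments:
        if tag in query:
            annotators.setdefault(resource, set()).add(user)
    return sorted(annotators, key=lambda r: (-len(annotators[r]), r))


# Toy tag assignments as (user, tag, resource) triples, made up for illustration.
assignments = [
    ("alice", "car", "photo1"),
    ("bob", "car", "photo1"),
    ("bob", "car", "photo2"),
    ("carol", "automobile", "photo3"),
]
ranked = freq_rank(assignments, ["car"])
```

Note that "photo3" is never returned for the query "car" even though "automobile" means the same thing, which is the tag inconsistency problem the talk sets out to address.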

OTHER RANKING METHODS
LSI: This method projects the third-order tensor onto a 2D tag-resource matrix, and then applies traditional LSI on the tag-resource matrix using SVD.
CubeSim: This method is similar to CubeLSI except that it computes the distance between two tags directly from the original tensor rather than from the purified tensor.
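
The LSI baseline can be sketched as follows. How the tensor is projected onto a tag-resource matrix is not spelled out in the transcript, so summing over the user mode is an assumption, as are the mode order (user, tag, resource), the Euclidean distance between tag embeddings, and the function name:

```python
import numpy as np


def lsi_tag_distances(tensor, rank):
    """Collapse the user mode, take a rank-`rank` SVD of the tag-resource
    matrix, and return pairwise Euclidean distances between tag embeddings."""
    tag_resource = tensor.sum(axis=0)  # shape: (tags, resources)
    u, s, _ = np.linalg.svd(tag_resource, full_matrices=False)
    embedding = u[:, :rank] * s[:rank]  # one row per tag
    diff = embedding[:, None, :] - embedding[None, :, :]
    return np.linalg.norm(diff, axis=-1)


# Toy binary tensor: 4 users x 3 tags x 5 resources (random, for illustration).
rng = np.random.default_rng(1)
tensor = rng.integers(0, 2, size=(4, 3, 5)).astype(float)
distances = lsi_tag_distances(tensor, rank=2)
```

Collapsing the user mode discards the information about who assigned each tag, which is the information CubeLSI keeps.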

RANKING QUALITY
Evaluation metric: Normalized Discounted Cumulative Gain (NDCG)
NDCG rewards relevant resources that are top-ranked more heavily than those that appear lower down in the list.
NDCG@N denotes that the metric is evaluated only on the resources that are ranked in the top N of the list. It is computed from the relevance level of the resource at each rank in the list, together with a normalization factor chosen so that the optimal ranking's NDCG score is 1.
Test queries came from users, each proposing 8 queries.
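
The slide's NDCG formula is not preserved in the transcript. The sketch below uses one widely used form, with gain 2^rel - 1 and discount log2(rank + 1); the talk may use a different gain or discount, so treat these choices and the function names as assumptions:

```python
import math


def dcg_at_n(relevances, n):
    """Discounted cumulative gain over the top `n` relevance levels."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:n], start=1)
    )


def ndcg_at_n(relevances, n):
    """DCG divided by the DCG of the ideal ordering of the same judgments,
    so the best possible ranking scores 1.0."""
    ideal = dcg_at_n(sorted(relevances, reverse=True), n)
    return dcg_at_n(relevances, n) / ideal if ideal > 0 else 0.0


# Relevance levels of the returned resources, in ranked order (made up).
best_first = ndcg_at_n([3, 2, 1, 0], n=4)
worst_first = ndcg_at_n([0, 1, 2, 3], n=4)
```

Placing the most relevant resources first yields a score of 1, while the reversed list scores lower because its gains are discounted more heavily.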

RANKING QUALITY (Delicious)

RANKING QUALITY (Bibsonomy)

RANKING QUALITY (Last.fm)

EFFICIENCY
Offline: pre-processing times (hours)
Online: query processing times (seconds)
Storage size

RELATED WORK
Matrix Factorization (MF): Our work differs from MF in two ways. We aim at capturing semantic relations among tags. We deal with a three-dimensional tensor.
Hotho et al.: Our work differs from FolkRank in that our approach performs offline semantic analysis, which allows online query processing to be done efficiently.
Wu et al.: Our approach is technically different from that work.
Bi et al.: Our approach scales to large social tagging databases, which the previous work is unable to handle.

CONCLUSIONS
We introduce a novel tag-based framework for searching resources in social tagging systems.
We study the role of taggers in search quality for social tagging systems.
We propose CubeLSI, a 3D extension of LSI, for semantic analysis over the third-order tensor of resources, taggers, and tags.
We present a comprehensive empirical evaluation of CubeLSI against a number of ranking methods on real datasets.

THANK YOU!
Bin Bi, Sau Dan Lee, Ben Kao, Reynold Cheng
The University of Hong Kong {bbi, sdlee, kao,