Social Search and Discovery Using a Unified Approach Einat Amitay et al. IBM Research Lab in Haifa, Israel HT’09 18 March 2011 IDB Lab Seminar.

Social Search and Discovery Using a Unified Approach
Einat Amitay et al., IBM Research Lab in Haifa, Israel (HT’09)
18 March 2011, IDB Lab Seminar
IDB Tagging Team, School of CSE, SNU
Presented by Kangpyo Lee

A Variety of Web Search Types
• Social Search
• Personalized Search
• Unified Search
• Universal Search
• Multi-entity Search
• Faceted Search
• Multi-faceted Search
• Exploratory Search
• Vertical Search

Outline
• Introduction
• Related Work
• Implementation
• Social Search within the Enterprise
• User Study
• Summary

Introduction
• Recent Web 2.0 applications (e.g., web logs, collaborative bookmarking systems, and social networks) introduce new entities & relations in addition to regular web pages
• Web 2.0 entities relate to each other in several ways
  – Documents may relate to other documents by referencing each other
  – A user may relate to a document as its author, as a tagger, or by being mentioned in the page’s content
  – A user may relate to other users through social relations
  – A tag relates to the bookmark it is associated with, and also to the tagger
• These entities & relations may prove valuable in enhancing the search experience
  – By serving as potential search results
  – By influencing ranking algorithms

Introduction
• We present and evaluate novel methods for leveraging social information to enhance search results and discover relations between Web 2.0 applications
• Our approach leverages a unified representation of the entities and their relations
• We then use this intricate heterogeneous collection to establish an all-encompassing social search solution

Introduction
• Social search solution
  – Allows users to query for specific entities and retrieve results of all relevant types
  – In addition to standard search results, the system returns users related to the query, as well as tags associated with relevant documents
  – These tags can be further used to categorize the search results and to refine the searcher’s information need
• We use the term social search engine to describe this multi-entity search system based on “social” data
• To our knowledge, our social search system is the only one that provides a unified approach for searching and retrieving entities of all types

Introduction - Unified Approach
• Our social data include records of users’ public activity with documents
  – such as bookmarking, tagging, rating, or comments made on other public Web 2.0 entities
• Our system allows searching for any object type (e.g., document, person, or tag) and retrieving entities of all types
• The system supports
  – Standard textual queries
  – Entity queries
  – Any combination of the two

Introduction - Unified Approach
• The social search engine is based on the unified search approach
• Unified search
  – A.k.a. heterogeneous interrelated entity search
  – An emerging paradigm within IR
  – The search space is expanded to represent heterogeneous information about objects that may relate to each other in several ways
    • Direct relations
    • Indirect relations
• The system must be scalable and responsive, and must reflect the rapid update patterns typical of Web 2.0 systems

Introduction - Unified Approach
• We present a novel realization of the unified search paradigm based on multifaceted search
  – Represents each of the system’s entities by a retrievable document
  – Direct relations between entities are represented by marking one of the elements as a “facet” of its counterpart
  – The strength of the relationship between the two objects is represented by the strength of the document-facet relationship
[Figure: a direct relation between entities A and B; A is one of B’s facets, and B is one of A’s facets]
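The document-plus-facets representation can be pictured as a small data structure. The sketch below is a minimal illustration that stores each entity as a document holding a map from facet to relationship strength. Several parts are hypothetical and are not taken from the authors' implementation: the class `EntityDoc`, the helper `add_direct_relation`, the `type:id` naming, and the choice to accumulate strength by counting repeated relations.

```python
from dataclasses import dataclass, field


@dataclass
class EntityDoc:
    """One retrievable document per entity (web page, user, or tag)."""
    entity_id: str
    entity_type: str
    text: str = ""
    # Direct relations only: facet entity id -> relationship strength.
    facets: dict[str, float] = field(default_factory=dict)


def add_direct_relation(index, a, b, weight=1.0):
    """Mark each entity as a facet of the other, accumulating strength."""
    index[a].facets[b] = index[a].facets.get(b, 0.0) + weight
    index[b].facets[a] = index[b].facets.get(a, 0.0) + weight


# Usage with hypothetical ids: the same user-page relation is recorded twice.
index = {
    "user:alice": EntityDoc("user:alice", "user"),
    "page:p1": EntityDoc("page:p1", "page", "unified search over social data"),
}
add_direct_relation(index, "user:alice", "page:p1")
add_direct_relation(index, "user:alice", "page:p1")
```

Only direct relations are stored. Indirect ones are left to query time.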

Introduction - Unified Approach
• An efficient mechanism for updating relations between objects, as well as efficient search over the heterogeneous data
  – Only direct relations between objects need to be updated when new entities are added
  – Indirect relations are dynamically induced from the direct relations and computed on the fly at query execution time
  – Directly related objects are retrieved and scored at run time using the search engine’s regular scoring mechanisms
  – Indirectly related entities are retrieved and scored using an implementation of faceted search

Related Work
• Social search
  – The set of annotations provided by the public can be used to enrich the page content
  – The number of annotations of a web page can be used as additional evidence of document quality for improved ranking of search results
  – Social data enables users to search for other people with whom they maintain relationships in the network
• Social ranking
  – Ranking all entities retrieved by the social search engine
  – FolkRank and SocialPageRank
  – The cost of PageRank-like computation depends heavily on the graph size, so it is expected to be very slow
  – Different entity types provide different retrieval value to the searcher; hence they should be ranked according to their own characteristics

Related Work - Multi-Entity Search
• Multi-entity search
  – Extending basic search functionality by answering user queries with many types of entities
  – Usually based on analysis of the relationship between entities and documents relevant to the query
• Searching over a multi-entity graph
  – Nodes are entities (terms, documents, persons, annotations)
  – Edges are the relations between the entities
• SimFusion uses a Unified Relationship Matrix (URM) to represent the multi-entity graph

Related Work - Multi-Entity Search
• Unified Relationship Matrix (URM)
  – Relations between two object types are represented via a relationship matrix $M_{ij}$
  – The $(k, l)$ entry of matrix $M_{ij}$ represents the strength of the relation between the object pair $(o_k, o_l)$ of types $O_i$ and $O_j$, respectively
  – The URM matrix $U$
    • Encapsulates all these matrices to provide a unified representation of the unified search space
    • Provides the relationship strength between any two directly related entities, along with a theoretically elegant way to calculate indirect relations through matrix multiplication
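Assembling the URM from the per-type matrices $M_{ij}$ and deriving indirect relations by matrix multiplication can be sketched in plain Python. The sketch assumes every relation is symmetric, so that $M_{ji} = M_{ij}^T$ and $U$ is symmetric. The helpers `build_urm` and `matmul` are hypothetical. The sketch is meant for tiny examples only, since a dense matrix over hundreds of thousands of entities would be impractical.

```python
def build_urm(objects, relation_matrices):
    """Assemble a dense unified relationship matrix U over all objects.

    objects: list of object ids of all types; fixes the row/column order.
    relation_matrices: {(type_i, type_j): {(o_k, o_l): strength}}, i.e. the
    non-zero entries of each M_ij. Entries are mirrored, so U is symmetric.
    """
    pos = {o: n for n, o in enumerate(objects)}
    size = len(objects)
    U = [[0.0] * size for _ in range(size)]
    for entries in relation_matrices.values():
        for (ok, ol), w in entries.items():
            U[pos[ok]][pos[ol]] = w
            U[pos[ol]][pos[ok]] = w
    return U


def matmul(A, B):
    """Plain-Python matrix product: (A B)[r][c] = sum_k A[r][k] * B[k][c]."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


objects = ["user:u1", "page:p1", "tag:t1"]
relations = {
    ("user", "page"): {("user:u1", "page:p1"): 1.0},
    ("tag", "page"): {("tag:t1", "page:p1"): 2.0},
}
U = build_urm(objects, relations)
U2 = matmul(U, U)  # order-two (indirect) relation strengths
```

Here `U2[0][2]` is 2.0. The user and the tag have no direct relation, but they become related through the page they share.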

Implementation
• Our solution to unified search represents each object in the system in two ways
  – (1) as a retrievable document
  – (2) as a facet (category) of all the objects to which it relates
• A unified representation of a collaborative bookmarking system
  – Three object types: web pages, users, and tags
  – Each object type is associated with a corresponding document: a web page document, a user document, and a tag document
  – Three relationship types
    • A user-type facet between a user & the tagged web page
    • A tag-type facet between a tag & the associated web page
    • A user-type facet between a user & a tag used for bookmarking
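For the bookmarking example, the three relationship types can be derived from raw bookmark records. The sketch below is minimal and rests on three assumptions: records arrive as `(user, url, tags)` tuples, every relation is stored symmetrically, and strength is the number of bookmarks supporting a relation. The function `index_bookmarks` and its weighting are hypothetical and do not describe the deployed system.

```python
from collections import defaultdict


def index_bookmarks(bookmarks):
    """Turn (user, url, tags) bookmark records into per-entity facet maps.

    Returns facets[entity][related_entity] = strength. Strength counts the
    bookmarks supporting the relation (a hypothetical weighting choice).
    """
    facets = defaultdict(lambda: defaultdict(float))

    def relate(a, b):
        # Symmetric direct relation: each entity is a facet of the other.
        facets[a][b] += 1.0
        facets[b][a] += 1.0

    for user, url, tags in bookmarks:
        u, p = f"user:{user}", f"page:{url}"
        relate(u, p)  # user <-> tagged web page
        for tag in tags:
            t = f"tag:{tag}"
            relate(t, p)  # tag <-> associated web page
            relate(u, t)  # user <-> tag used for bookmarking
    return facets


facets = index_bookmarks([
    ("alice", "https://a.example.org", ["search", "ir"]),
    ("bob", "https://a.example.org", ["search"]),
])
```

Two users tagged the page "search", so the page-tag strength for `tag:search` is 2.0.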

Implementation - Scoring Indirectly Related Objects
• The strength of the indirect relation between objects $o_1$ & $o_2$ (Eq. 1):
  $U^2(o_1, o_2) = \sum_{o} U(o_1, o)\, U(o, o_2)$
  – $U(o, o')$: the corresponding entry in the URM matrix
  – Equivalent to squaring the URM matrix
    • Provides the relationship strength of order two between any two objects
• Eq. 1 can be generalized to score objects based on their indirect relations with any query (Eq. 2):
  $s_1(q) = U\, s_0(q)$
  – The score vector $s_0(q)$ provides the direct scores of all N objects in the system to the query
  – The score vector $s_1(q)$ provides the indirect scores of all objects
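If $U$ is held sparsely, scoring every object's indirect relation to a query reduces to one sparse matrix-vector product, $s_1(q) = U\, s_0(q)$. This is the form implied by the slide's description of $s_0(q)$ and $s_1(q)$. The minimal sketch below assumes dict-of-dict storage of the non-zero entries. The name `indirect_scores` is hypothetical.

```python
def indirect_scores(U, s0):
    """Compute s1 = U s0: s1[o] = sum over o2 of U[o][o2] * s0[o2].

    U: dict of dicts holding only the non-zero entries U[o][o2].
    s0: direct query scores of the objects that matched the query.
    Objects whose indirect score is zero are left out of the result.
    """
    s1 = {}
    for o, row in U.items():
        score = sum(w * s0.get(o2, 0.0) for o2, w in row.items())
        if score:
            s1[o] = score
    return s1


# Only the rows needed for the example are listed.
U = {
    "tag:t1": {"page:p1": 2.0, "page:p2": 1.0},
    "user:u1": {"page:p1": 1.0},
    "tag:t2": {"page:p9": 1.0},  # related only to a page that did not match
}
s0 = {"page:p1": 0.5, "page:p2": 0.25}
s1 = indirect_scores(U, s0)
```

`tag:t1` scores 2.0 × 0.5 + 1.0 × 0.25 = 1.25, and `tag:t2` drops out because none of its pages matched.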

Implementation - Scoring Indirectly Related Objects
• In addition, objects can be scored according to their relative popularity, or authority
  – FolkRank or SocialPageRank can be used
  – Inverse entity frequency (ief) score
    • N: the number of all objects in the system
    • $N_o$: the number of objects directly related to $o$
    • Penalizes objects that are related to many objects in general
• The final score of object o for a query q
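The slide omits both the ief formula and the final scoring formula. The sketch below assumes an idf-style definition, $\mathrm{ief}(o) = \log(N / N_o)$, which matches the slide's description of penalizing objects related to many others. Purely for illustration, it also assumes a final score that multiplies the indirect score by ief. The paper's actual combination may differ; for example, it may also include a FolkRank or SocialPageRank popularity term. Both function names are hypothetical.

```python
import math


def ief(n_total, n_related):
    """Inverse entity frequency, assumed idf-style: log(N / N_o)."""
    if n_related <= 0:
        raise ValueError("an object must have at least one direct relation")
    return math.log(n_total / n_related)


def final_score(s1_o, n_total, n_related):
    """Hypothetical combination: the indirect score damped by ief."""
    return s1_o * ief(n_total, n_related)


# A tag related to 10 of 1000 objects outweighs one related to 500 of 1000.
rare, common = ief(1000, 10), ief(1000, 500)
```

An object related to every other object gets an ief of 0, so it cannot dominate the results no matter how high its raw indirect score is.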

Implementation - Multifaceted Search
• Multifaceted search aims to combine the two main search approaches:
  – Direct search
  – Navigational search: offering navigational refinement of the results by categorizing the search results into predefined facets, along with the counts of results per facet
• Multifaceted search has become the prevailing user interaction mechanism in e-commerce sites
  – Now being extended to deal with semi-structured data, continuous dimensions, and folksonomies
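The navigational half of multifaceted search, listing facets alongside the number of results carrying each one, can be sketched as a simple count over the result set. The function `facet_counts` is hypothetical, as is the `type:id` naming used to select facets of a single type.

```python
from collections import Counter


def facet_counts(results, facets_of, facet_type):
    """Count how many results in the result set carry each facet of one type."""
    counts = Counter()
    for doc in results:
        for facet in facets_of.get(doc, ()):
            if facet.startswith(facet_type + ":"):
                counts[facet] += 1
    return counts


facets_of = {
    "page:1": ["tag:ir", "user:alice"],
    "page:2": ["tag:ir", "tag:web"],
    "page:3": ["tag:web"],
}
tag_counts = facet_counts(["page:1", "page:2"], facets_of, "tag")
```

Calling `tag_counts.most_common()` then yields the refinement list in descending order of result count. Selecting a facet narrows the result set to the documents that carry it.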

Implementation - Multifaceted Search
• The scores of directly related objects are equivalent to the scores as represented by $s_0(q)$
• The score of an indirectly related object $o$ is computed by aggregating its relationship strength with all matching documents, multiplied by their direct scores:
  $s_1(o, q) = \sum_{o_i} w(o, o_i)\, s_0(o_i, q)$, summing over the documents $o_i$ that match $q$
  – $w(o, o_i)$: the relationship strength between the document $o_i$ & its facet $o$
  – Equivalent to Eq. 2 since $w(o, o_i) = U(o, o_i)$
• Indirectly related objects are represented by accumulating all facets of the same type
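The facet-based scoring described here can be sketched as a single pass over the matching documents. Each facet accumulates $w(o, o_i) \cdot s_0(o_i, q)$ into a bucket for its entity type. When $w(o, o_i) = U(o, o_i)$, the per-facet sums equal the corresponding entries of $U\, s_0(q)$, even though no matrix is ever built. This is a minimal sketch under the hypothetical `type:id` naming; `aggregate_facets` is not the authors' code.

```python
from collections import defaultdict


def aggregate_facets(matches, facet_weights):
    """Score indirectly related objects by walking the facets of matching docs.

    matches: {doc o_i: direct score s0(o_i, q)} for documents that matched q.
    facet_weights: {doc o_i: {facet o: w(o, o_i)}}.
    Returns {facet type: {facet o: sum_i w(o, o_i) * s0(o_i, q)}}, so facets
    of the same type are accumulated together.
    """
    by_type = defaultdict(lambda: defaultdict(float))
    for doc, s0 in matches.items():
        for facet, w in facet_weights.get(doc, {}).items():
            facet_type = facet.split(":", 1)[0]
            by_type[facet_type][facet] += w * s0
    return by_type


matches = {"page:1": 0.5, "page:2": 0.25}
facet_weights = {
    "page:1": {"tag:ir": 2.0, "user:alice": 1.0},
    "page:2": {"tag:ir": 1.0},
}
scores = aggregate_facets(matches, facet_weights)
```

Only facets of documents that actually matched are touched. This is why indirect relations can be computed on the fly at query time.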

Implementation - Efficiency Factors
• Two issues regarding use of the URM matrix for social search
  – (1) efficient computation of indirect relations
  – (2) efficient dynamic updates
• The universal query (q = ‘*’), which retrieves all objects indexed by the system as well as all objects related to them, runs in less than four seconds
• Dynamic updates are handled by a mechanism that stores the changes in an external database
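The slide gives no detail on the update mechanism beyond changes being stored externally. One plausible shape is an overlay: a static facet index plus a delta of pending changes to direct relations, merged at query time. The sketch below is entirely hypothetical. The class `FacetStore` and the convention of marking deletions with weight 0.0 are illustrative only. A plain dict stands in for the external database.

```python
class FacetStore:
    """Hypothetical overlay: a static facet index plus pending direct-relation changes."""

    def __init__(self, base):
        self.base = base   # {doc: {facet: weight}} from the last index build
        self.delta = {}    # pending changes; a stand-in for the external database

    def update(self, doc, facet, weight):
        """Record a changed direct relation; weight 0.0 marks a deletion."""
        self.delta.setdefault(doc, {})[facet] = weight

    def facets(self, doc):
        """Merge base and delta at query time so updates are visible at once."""
        merged = dict(self.base.get(doc, {}))
        for facet, w in self.delta.get(doc, {}).items():
            if w == 0.0:
                merged.pop(facet, None)
            else:
                merged[facet] = w
        return merged


store = FacetStore({"page:1": {"tag:ir": 1.0}})
store.update("page:1", "tag:web", 1.0)  # new bookmark tag
store.update("page:1", "tag:ir", 0.0)   # tag removed
```

Because indirect relations are induced at query time, only these direct-relation records ever need to change.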

Social Search within the Enterprise
[Figure: example result pages for a textual query and for an entity query]

Social Search within the Enterprise - Social Data & Social Search Application
• Web 2.0 services of IBM
  – Dogear: a collaborative bookmarking service (373,821 bookmarks, 234,856 web pages)
  – BlogCentral: a central blog service (77,930 blog threads)
  – BluePages: the enterprise directory and employee profile application (15,779 IBMers)
  – About 700,000 unique entities
  – Cow Search: the social search application available to all users of IBM’s intranet

User Study
• Our goal was to measure the quality of both the returned document set and the related users and tags
  – The evaluation methodologies for documents are well known and have standard measures
  – There are no standard ways of measuring the quality of related users or tags
• A user study was therefore used
  – The retrieved documents were examined and marked with three relevance levels (0: not relevant, 1: marginally relevant, 2: highly relevant)
  – The quality of search results was measured by the normalized discounted cumulative gain (NDCG) measure
  – To evaluate the effectiveness of the related people, we asked 612 randomly selected users to rate them on a Likert scale of 1 to 5
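The slide does not say which DCG variant was used. The sketch below assumes the common form with linear gains, using the 0/1/2 relevance levels directly, and a $\log_2(i + 1)$ discount for rank $i$. It also computes the ideal ranking from the same judged list. Other variants, such as gains of $2^{rel} - 1$, give different numbers. The function names are hypothetical.

```python
import math


def dcg(gains):
    """DCG with linear gains and a log2 discount: sum of g_i / log2(i + 1), ranks from 1."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(gains, k):
    """NDCG@k: DCG of the top k divided by the DCG of the ideal top-k ordering."""
    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal > 0 else 0.0


# Relevance levels of a hypothetical ranked list: 0, 1, or 2 per result.
perfect = ndcg_at_k([2, 1, 0], 3)
swapped = ndcg_at_k([0, 2], 2)
```

A ranking that already places the most relevant results first scores 1.0. Pushing the highly relevant result from rank 1 to rank 2 lowers the score to $1 / \log_2 3 \approx 0.63$.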

User Study - Results
• Social data contribution to enterprise search
  – We measure the quality of search results using manual assessments of the top-k search results for the 50 chosen queries

User Study - Results
[Figures: evaluation results for related users and for related tags]

Summary
• Social data is valuable
  – 1. The high precision of the top retrieved documents demonstrates that user feedback identifies high-quality content in the corpus
  – 2. User comments and tags are highly beneficial in general: they augment the description of system entities while providing additional evidence of object popularity
• Future research
  – Exploiting personal social networks for search personalization
  – Document and tag recommendation
  – Quantifying the contribution of social objects to the effectiveness of the search system

Thank You! Any Questions or Comments?