Ontology Based Personalized Search
Zhang Tao, The University of Seoul (UOS)

Zhang Tao, Data Mining

Contents
1. Overview
2. Determining the content of documents
3. User Profiles
4. Improving Search Results
5. Conclusions and Future Work

Overview
- Proposing a problem
  - With the exponentially growing amount of information available on the Internet, retrieving documents of interest has become increasingly difficult.
  - People have two ways to find the data they are looking for: searching and browsing.
  - In terms of searching, about one half of all retrieved documents have been reported to be irrelevant. Why?
  - Resulting question: what does an effective personalization system look like?

Overview
- The study of this paper
  - This paper studies ways to model a user's interests and shows how these profiles can be deployed for more effective information retrieval and filtering.
  - A user profile is created over time by analyzing surfed pages.
  - The paper shows how the profiles can be used to improve search performance.
- Introducing the OBIWAN project
  - The goal of OBIWAN is to investigate a novel content-based approach to distributed information retrieval.
  - Websites are clustered into regions.

Overview
- The architecture is a hierarchy of regions.
- The text classifier is a core component not only of the entire OBIWAN project, but also of the presented personalization method.
- Related Work
  - Personalization is a broad field of very active ongoing research.
  - Applications include personalized access to certain resources and filtering/rating systems.
  - SmartPush is currently the only system to store profiles as concept hierarchies.

Determining the content of documents
- Importance
  - User interests are inferred by analyzing the web pages the user visits.
  - For this purpose, it is necessary to determine, or characterize, the content of these surfed pages.
- A hierarchy of concepts
  - The ontology is based on a publicly accessible browsing hierarchy.
  - Each node is associated with a set of documents; all documents for a node are merged into a superdocument.
  - Documents as well as superdocuments are represented as weighted keyword vectors.

Determining the content of documents
- The keyword vector of a surfed page is compared with the keyword vectors associated with every node to calculate similarities.
- The nodes with the top matching vectors are assumed to be most related to the content of the surfed page.
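
The matching step above can be sketched in a few lines. This is a minimal illustration, not the classifier used in OBIWAN: it assumes raw term frequencies as keyword weights and cosine similarity as the similarity measure, since the slides name neither. The function names and the `superdocuments` dictionary are hypothetical.

```python
import math
from collections import Counter


def keyword_vector(text):
    """Build a term-frequency vector (assumed weighting; the slides do not specify one)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse keyword vectors."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_categories(page_text, superdocuments, k=3):
    """Return the k ontology nodes whose superdocument best matches the page."""
    page = keyword_vector(page_text)
    scored = [
        (cosine(page, keyword_vector(text)), name)
        for name, text in superdocuments.items()
    ]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:k]]
```

Calling `top_categories(page_text, superdocuments, k=2)` with a dictionary that maps node names to superdocument text returns the two best-matching nodes together with their similarity scores.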

User Profiles
- Introduction
  - User profiles store approximations of the interests of a given user.
  - User profiles have three features:
    - hierarchically structured, not just a list of keywords
    - generated automatically, without explicit user feedback
    - dynamic
- Creation and Maintenance
  - Profiles are generated by analyzing the surfing behavior of a user. "Surfing behavior" here refers to the length of the visited pages and the time spent on them.

User Profiles
- Four different combinations of time, length, and subject discriminators have been investigated.
- In the following functions, time refers to the time a user spent on a given page, length refers to the length of the page, γ(d, c_i) is the strength of the match between the content of document d and category c_i, and ΔL(c_i) represents the change in the interest L in a category c_i.
- [Functions (1) and (2): not preserved in the transcript]
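
To make the roles of time, length, and γ(d, c_i) concrete, here is a minimal sketch of a profile update. The two adjustment functions are hypothetical stand-ins written for illustration only; they are not functions (1) and (2) from the slide, whose exact form is not available here. All names are assumptions.

```python
import math


def adjust_time_over_length(time_spent, length, match):
    """Hypothetical adjustment: reading time normalised by page length, scaled by the match."""
    if length <= 0:
        raise ValueError("length must be positive")
    return (time_spent / length) * match


def adjust_log_time(time_spent, length, match):
    """Hypothetical adjustment that ignores page length and dampens long visits."""
    return math.log(1.0 + time_spent) * match


def update_profile(profile, matches, time_spent, length, adjust):
    """Add the interest change for every matched category to the profile (in place).

    profile: {category: accumulated interest}
    matches: {category: match strength between the page and that category}
    """
    for category, match in matches.items():
        delta = adjust(time_spent, length, match)
        profile[category] = profile.get(category, 0.0) + delta
    return profile
```

Passing the adjustment function as a parameter makes it easy to compare several combinations of time and length on the same browsing history.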

User Profiles
- Profile Evaluation: Convergence
  - The evaluation of the user profiles consists of two parts:
    - A notion of convergence is introduced, with respect to which 16 actual user profiles are discussed.
    - The relationship between the calculated user interests and the actual user interests is examined.
  - Figure 1 shows a sample profile (adjustment function 2); it consists of roughly 75 non-zero categories.
  - Figure 2 shows the number of non-zero categories for five sample profiles created using the same interest adjustment function.

User Profiles
- On average, convergence corresponds to roughly 320 pages, or 17 days of surfing.
- Table 1 summarizes the convergence properties.

User Profiles
- Comparison with actual user interests
  - Although convergence is a desirable property, it does not measure the accuracy of the generated profiles.
  - The sixteen users were shown the top twenty subjects in their profiles in random order and asked how appropriately these inferred categories reflected their interests.
  - Table 2 shows the users' answers for the top 20 and the top 10 categories, respectively.

Improving Search Results
- A problem with search results
  - The amount of information available on the web is too large for a user to survey.
  - The top-ranked documents, which are the ones a user actually looks at, are often not relevant to this user.
  - There are three common approaches to address this problem:
    - Re-ranking: the algorithms apply a function to the ranking numbers that have been returned by the search engine.
    - Filtering: filtering systems determine which documents in the result sets are relevant and which are not.
    - Query expansion: if a query can be expanded with the user's interests, the search results are likely to be more narrowly focused.

Improving Search Results
- Re-Ranking
  - Given a query, re-ranking is done by modifying the ranking that was returned by a publicly accessible search engine, ProFusion in this case.
  - The idea is to characterize each of the returned documents and, by referring to the user profile, to determine how much the user is interested in the matching categories.
  - The following function is the adjustment function of the re-ranking method.
  - [Adjustment function: not preserved in the transcript]
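
The re-ranking idea can be sketched as follows. The scoring formula is an assumption chosen for illustration (engine score boosted by the profile interest in the document's categories); it is not the adjustment function from the slide, and the `weight` parameter and data layout are hypothetical.

```python
def rerank(results, profile, weight=0.5):
    """Re-order search results using a user profile.

    results: list of (doc_id, engine_score, {category: match strength})
    profile: {category: interest}
    weight:  hypothetical knob for how strongly the profile influences the order
    """
    rescored = []
    for doc_id, engine_score, matches in results:
        # Interest in a document: profile interest weighted by how well it matches each category.
        interest = sum(profile.get(cat, 0.0) * strength for cat, strength in matches.items())
        rescored.append((engine_score * (1.0 + weight * interest), doc_id))
    # Python's sort is stable, so ties keep the original engine order.
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in rescored]
```

With an empty profile every document keeps its engine score, so the original ranking is returned unchanged.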

Improving Search Results
- Evaluation
  - The results produced by the different re-ranking systems must be evaluated.
  - The eleven-point precision average is the preferred measure.
  - The eleven-point precision average evaluates ranking performance in terms of recall and precision:
    - Recall = (number of relevant items retrieved) / (number of relevant items in collection)
    - Precision = (number of relevant items retrieved) / (total number of items retrieved)
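
As a worked illustration of the measure, the sketch below computes the eleven-point average using the standard interpolated definition: at each recall level 0.0, 0.1, ..., 1.0, take the highest precision observed at that recall or higher, then average the eleven values. The slides do not say which interpolation was used, so that part is an assumption, as are the function and parameter names.

```python
def eleven_point_average_precision(ranked_relevance, total_relevant):
    """Eleven-point interpolated average precision for one ranked result list.

    ranked_relevance: list of booleans, True where the document at that rank is relevant
    total_relevant:   number of relevant documents in the whole collection
    """
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")

    # Recall and precision after each rank position.
    points = []
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
        points.append((hits / total_relevant, hits / rank))

    # Interpolated precision at each of the eleven recall levels.
    interpolated = []
    for step in range(11):
        level = step / 10
        candidates = [precision for recall, precision in points if recall >= level]
        interpolated.append(max(candidates, default=0.0))

    return sum(interpolated) / 11
```

For the ranking relevant, irrelevant, relevant, irrelevant with two relevant documents in the collection, precision is 1.0 up to recall 0.5 and 2/3 beyond it, so the average is (6 × 1.0 + 5 × 2/3) / 11.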

Improving Search Results
- Figure 3 shows the recall-precision graphs for one of the interest adjustment functions.
- Figure 4 shows the results for the remaining set of 16 queries, which were evaluated using this function.

Improving Search Results
- Filtering
  - To filter a set of result documents means to exclude some of them.
  - Filtering was done by applying thresholds to the above ranking functions to decide which documents were relevant and which were not.
  - Figures 5 and 6 show the performance of the filter for the training and the testing set, respectively.
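
Threshold filtering reduces to a one-line selection once every document has a personalized score. The sketch below assumes such scores are already available; the function name and the example threshold are hypothetical, and the slides do not state how the thresholds were chosen.

```python
def filter_results(scored_results, threshold):
    """Keep only documents whose personalized score reaches the threshold.

    scored_results: list of (doc_id, score) in ranked order
    threshold:      hypothetical cut-off, e.g. tuned on a training set of queries
    """
    return [doc_id for doc_id, score in scored_results if score >= threshold]
```

Raising the threshold excludes more documents, which tends to trade recall for precision.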

Conclusion and Future Work
- Conclusion
  - These profiles have been shown to converge and to reflect actual user interests quite well.
  - With the presented approach, the length of a surfed page can be neglected when the interest in a page is inferred.
- Future work
  - Future work includes the integration of the system into a web browser.
  - Other areas of profile deployment are conceivable.