UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.

1 UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul

2 Zhang Tao Data Mining: Contents
1. Overview
2. Determining the content of documents
3. User Profiles
4. Improving Search Results
5. Conclusions and Future Work

3 Overview
• Proposing a problem
  - With the exponentially growing amount of information available on the Internet, the task of retrieving documents of interest has become increasingly difficult.
  - People have two ways to find the data they are looking for: searching and browsing.
  - In searching, about one half of all retrieved documents have been reported to be irrelevant. Why?
• Resulting question: what would an effective personalization system look like?

4 Overview
• The study of this paper
  - This paper studies ways to model a user's interests and shows how these profiles can be deployed for more effective information retrieval and filtering.
  - A user profile is created over time by analyzing the pages the user has surfed.
  - This paper shows how the profiles can be used to improve search performance.
• Introducing the OBIWAN project
  - The goal of OBIWAN is to investigate a novel content-based approach to distributed information retrieval.
  - Websites are clustered into regions.

5 Overview
• The architecture is a hierarchy of regions.
• The text classifier is a core component not only of the entire OBIWAN project, but also of the presented personalization method.
• Related Work
  - Personalization is a broad field of very active ongoing research.
  - Applications include personalized access to certain resources and filtering/rating systems.
  - SmartPush is currently the only system to store profiles as concept hierarchies.

6 Determining the content of documents
• Importance
  - User interests are inferred by analyzing the web pages the user visits.
  - For this purpose, it is necessary to determine the content of these surfed pages, that is, to characterize them.
• A hierarchy of concepts
  - The ontology is based on a publicly accessible browsing hierarchy.
  - Each node is associated with a set of documents; all documents for a node are merged into a superdocument.
  - Documents as well as superdocuments are represented as weighted keyword vectors.

7 Determining the content of documents
• The keyword vector of a surfed page is compared with the keyword vectors associated with every node to calculate similarities.
• The nodes with the top matching vectors are assumed to be most related to the content of the surfed page.
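
The classification step described on slides 6 and 7 can be sketched roughly as follows. This is only an illustration under stated assumptions: the slides do not say how keywords are weighted or which similarity measure is used, so raw term counts and cosine similarity stand in for both, and the names keyword_vector, build_superdocument, cosine and top_categories are hypothetical.

```python
import math
from collections import Counter


def keyword_vector(text):
    """Build a keyword vector from text.

    Assumption: raw term counts stand in for the unspecified keyword weights.
    """
    return Counter(text.lower().split())


def build_superdocument(documents):
    """Merge all documents of one node into a single superdocument vector."""
    return keyword_vector(" ".join(documents))


def cosine(a, b):
    """Cosine similarity between two sparse keyword vectors (assumed measure)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_categories(page_text, node_vectors, k=3):
    """Return the k nodes whose vectors best match the surfed page."""
    page = keyword_vector(page_text)
    scored = [(cosine(page, vec), name) for name, vec in node_vectors.items()]
    # Highest similarity first; ties are broken by node name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(name, score) for score, name in scored[:k]]


if __name__ == "__main__":
    # Toy hierarchy nodes with made-up training documents.
    nodes = {
        "Sports": build_superdocument(["football match goal", "tennis match serve"]),
        "Cooking": build_superdocument(["recipe oven bake", "pasta sauce recipe"]),
    }
    print(top_categories("football goal score", nodes, k=2))
```

In this sketch a page is assigned to whichever nodes score highest; a real deployment would need a proper weighting scheme and a much larger concept hierarchy.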

8 User Profiles
• Introduction
  - User profiles store approximations of the interests of a given user.
  - User profiles have three features:
    - hierarchically structured, not just a list of keywords
    - generated automatically, without explicit user feedback
    - dynamic
• Creation and Maintenance
  - Profiles are generated by analyzing the surfing behavior of a user. "Surfing behavior" here refers to the length of the visited pages and the time spent on them.

9 User Profiles
• Four different combinations of time, length, and subject discriminators have been investigated.
• In the following functions, time refers to the time a user spent on a given page, length refers to the length of the page, γ(d, c_i) is the strength of the match between the content of document d and category c_i, and ΔL(c_i) represents the adjustment of the interest L in category c_i.
• [Equations (1) and (2): interest adjustment functions, not reproduced in the transcript]

10 User Profiles
• Profile Evaluation: Convergence
  - The evaluation of the user profiles consists of two parts:
    - A notion of convergence is introduced, with respect to which 16 actual user profiles are discussed.
    - The relationship between the calculated user interests and the actual user interests is examined.
  - Figure 1 shows a sample profile (adjustment function 2); it consists of roughly 75 non-zero categories.
  - Figure 2 shows the numbers of non-zero categories for five sample profiles with 100-150 categories, created using the same interest adjustment function.

11 User Profiles

12 User Profiles

13 User Profiles
• On average, convergence corresponds to roughly 320 pages, or 17 days, of surfing. Table 1 summarizes the convergence properties.

14 User Profiles
• Comparison with actual user interests
  - Although convergence is a desirable property, it does not measure the accuracy of the generated profiles.
  - The sixteen users were shown the top twenty subjects in their profiles in random order and asked how appropriately these inferred categories reflected their interests.
  - Table 2 shows the users' answers for the top 20 and top 10 categories, respectively.

15 User Profiles

16 Improving Search Results
• A problem with search results
  - The wealth of information available on the web is too large for a user to examine in full.
  - The top-ranked documents, which are the ones a user is likely to look at, are often not relevant to that user.
  - There are three common approaches to address this problem:
    - Re-ranking: The algorithms apply a function to the ranking numbers that have been returned by the search engine.
    - Filtering: Filtering systems determine which documents in the result sets are relevant and which are not.
    - Query expansion: If a query can be expanded with the user's interests, the search results are likely to be more narrowly focused.

17 Improving Search Results
• Re-Ranking
  - Given a query, re-ranking is done by modifying the ranking that was returned by a publicly accessible search engine, ProFusion (www.profusion.com) in this case.
  - The idea is to characterize each of the returned documents and, by referring to the user profile, to determine how much the user is interested in these categories.
  - The following function is the adjustment function of the re-ranking method.
  - [Re-ranking function: not reproduced in the transcript]

18 Improving Search Results
• Evaluation
  - The results produced by the different re-ranking systems must be evaluated.
  - The eleven-point precision average is used as the evaluation measure.
  - The eleven-point precision average evaluates ranking performance in terms of recall and precision:
    - Recall = (number of relevant items retrieved) / (number of relevant items in collection)
    - Precision = (number of relevant items retrieved) / (total number of items retrieved)
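
As a rough illustration of the measure named on slide 18, the sketch below computes an eleven-point precision average from a ranked list of relevance judgments. It assumes the common interpolated variant (at each recall level 0.0, 0.1, ..., 1.0, take the highest precision observed at that recall or beyond); the slides do not state which variant was used, and the function names are hypothetical.

```python
def precision_recall_points(ranked_relevance, total_relevant):
    """Return (recall, precision) at every rank where a relevant item appears.

    ranked_relevance: list of booleans, True if the item at that rank is relevant.
    total_relevant: number of relevant items in the whole collection.
    """
    if total_relevant <= 0:
        raise ValueError("total_relevant must be positive")
    points = []
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            points.append((hits / total_relevant, hits / rank))
    return points


def eleven_point_average(ranked_relevance, total_relevant):
    """Average interpolated precision over recall levels 0.0, 0.1, ..., 1.0."""
    points = precision_recall_points(ranked_relevance, total_relevant)
    interpolated = []
    for i in range(11):
        level = i / 10
        # Interpolation: best precision at any recall >= this level
        # (0.0 if that recall level is never reached).
        candidates = [p for r, p in points if r >= level - 1e-12]
        interpolated.append(max(candidates, default=0.0))
    return sum(interpolated) / 11


if __name__ == "__main__":
    # Relevant items at ranks 1 and 3, two relevant items in the collection.
    print(eleven_point_average([True, False, True, False], total_relevant=2))
```

A ranking that places all relevant documents first scores 1.0, so a re-ranking method that moves relevant documents upward raises this average.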

19 Improving Search Results
• Figure 3 shows the recall-precision graphs for one of the interest adjustment functions.
• Figure 4 shows the results for the remaining set of 16 queries, which were evaluated using this function.

20 Improving Search Results

21 Improving Search Results

22 Improving Search Results
• Filtering
  - To filter a set of result documents means to exclude some documents.
  - Filtering was done by applying thresholds to the scores of the above ranking functions to decide which documents were relevant and which were not.
  - Figures 5 and 6 show the performance of the filter for the training and the testing set, respectively.
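
Threshold-based filtering as described on slide 22 might look like the following minimal sketch. The scores are assumed to come from some ranking function; the transcript does not reproduce the actual function or threshold values, so the numbers and the name filter_results are purely illustrative.

```python
def filter_results(scored_docs, threshold):
    """Keep only documents whose score reaches the threshold.

    scored_docs: list of (document_id, score) pairs in ranked order.
    threshold: hypothetical cut-off; documents scoring below it are excluded.
    """
    return [(doc, score) for doc, score in scored_docs if score >= threshold]


if __name__ == "__main__":
    # Made-up scores for three result documents.
    results = [("doc_a", 0.9), ("doc_b", 0.2), ("doc_c", 0.5)]
    print(filter_results(results, threshold=0.5))
```

Raising the threshold excludes more documents, which tends to trade recall for precision; a suitable value would be tuned on a training set and checked on a separate testing set.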

23 Improving Search Results

24 Conclusion and Future Work
• Conclusion
  - The generated user profiles have been shown to converge and to reflect actual user interests quite well.
  - With the presented approach, the length of a surfed page can be neglected when the interest in a page is inferred.
• Future work
  - Future work includes the integration of the system into a web browser.
  - Other areas of profile deployment are conceivable.

