Generating Intelligent Links to Web Pages by Mining Access Patterns of Individuals and the Community Benjamin Lambert Omid Fatemieh CS598CXZ Spring 2005.

1 Generating Intelligent Links to Web Pages by Mining Access Patterns of Individuals and the Community. Benjamin Lambert, Omid Fatemieh. CS598CXZ, Spring 2005.

2 Outline Motivation The main idea of the project Accomplished tasks Remaining tasks Discussion

3 The Problem The problem we would like to solve is: –How can we best assist a person browsing the Web by providing links to the pages they are looking for? There are many reasons we might want to do this (e.g. pages hidden in a large Web site, broken links, seminar announcements, etc.).

4 Previous Work This problem has been studied extensively, using many approaches. The two main families of solutions are: –Modeling user behavior (Markov models, HMMs, etc.) –Data mining for common browsing patterns. Despite all the work that has been done, many other techniques have not been tried.

5 Markov Model Approaches These primarily model a user well enough to suggest which link on the current page they should click next. This is not useful unless there are many links on a page (

6 Data-Mining Approaches These are better able to find pages that are several links away from the current page. –If a sequence of requests for pages A, B, C, D, E occurs frequently, we may consider adding a shortcut from A to E.
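
The shortcut idea on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method of any particular prior work: the function name `shortcut_candidates` and the parameters `min_support` (how many sessions must contain the pair) and `min_hops` (how many clicks apart the two pages must be) are hypothetical names introduced here.

```python
from collections import Counter


def shortcut_candidates(sessions, min_support=2, min_hops=3):
    """Find (start, end) page pairs that could deserve a shortcut link.

    sessions: list of page lists, each in click order (hypothetical input shape).
    A pair qualifies if `end` appears at least `min_hops` clicks after `start`
    in at least `min_support` different sessions.
    """
    counts = Counter()
    for pages in sessions:
        seen = set()
        for i, start in enumerate(pages):
            # Only pages at least `min_hops` positions later are far enough
            # away for a shortcut to save any clicks.
            for end in pages[i + min_hops:]:
                if start != end:
                    seen.add((start, end))
        counts.update(seen)  # each session counts at most once per pair
    return {pair: n for pair, n in counts.items() if n >= min_support}
```

With two sessions that both walk A, B, C, D, E and `min_hops=4`, the only candidate is the pair (A, E), matching the example on the slide.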

7 New Ideas for Solving This Problem Using recent activity to make recommendations. Using the contents of Web pages to make recommendations. Combining data mining and user modeling approaches. Using a machine learning approach

8 Data Data: Web server logs –CS department Web logs from Dec 6, 2004, to Feb 28, 2005 (thanks to Chuck Thompson) –NASA Kennedy Space Center logs collected over July and August 1995 (freely available online). The logs are long lists of Web page requests. Each request is represented by: –The requester’s IP address –The time and date requested –The page requested –Etc.
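
As a rough sketch of how one such request line could be turned into a record, the example below assumes the Common Log Format, which the public NASA logs use; whether the CS department logs use the same layout is an assumption. The names `LOG_LINE` and `parse_line`, and the sample host in the usage note, are made up for illustration.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Return a dict for one log line, or None if the line does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    return {
        "host": m.group("host"),
        # Example timestamp layout: 01/Jul/1995:00:00:01 -0400
        "time": datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "method": m.group("method"),
        "path": m.group("path"),
        "status": int(m.group("status")),
    }
```

For a made-up line such as `host1.example.com - - [01/Jul/1995:00:00:01 -0400] "GET /index.html HTTP/1.0" 200 1024`, `parse_line` returns the host, a timezone-aware timestamp, the method `GET`, the path `/index.html`, and the status `200`.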

9 Data Cleaning First, for privacy reasons, the data had to be “sanitized”: the actual IP addresses were removed before we could access it. The following requests should be discarded: –Requests for .gif, .jpg, .css, etc. files. Looking only at the extension of the requested file is not enough, e.g. "GET /research/areas.php?area=proglang HTTP/1.1" does not end with a file extension. –Requests from crawlers (robots.txt). –Unsuccessful GETs (keep code 200 only, not 404). –Refreshes (consecutive requests for the same page).
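
The cleaning rules above might be sketched as a single filter. This is a guess at one reasonable implementation, not the authors' Perl scripts: the record shape (dicts with `host`, `path`, `status`), the `STATIC_EXTENSIONS` set, and the rule "any host that requested /robots.txt is a crawler" are all assumptions made here. The `host` field stands for whatever anonymized identifier replaced the IP address.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed list of static-asset extensions to drop; extend as needed.
STATIC_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico"}


def clean(requests):
    """Filter a time-ordered list of request dicts down to real page views."""
    # Assumption: a host that ever fetched /robots.txt is treated as a crawler.
    crawlers = {r["host"] for r in requests
                if urlsplit(r["path"]).path == "/robots.txt"}
    last_page = {}  # host -> last kept page, used to detect refreshes
    kept = []
    for r in requests:
        # Strip the query string before looking at the extension, so that
        # /areas.php?area=proglang is examined as /areas.php.
        path = urlsplit(r["path"]).path
        ext = posixpath.splitext(path)[1].lower()
        if r["host"] in crawlers:
            continue
        if r["status"] != 200:
            continue
        if ext in STATIC_EXTENSIONS:
            continue
        if last_page.get(r["host"]) == r["path"]:
            continue  # consecutive request for the same page: a refresh
        last_page[r["host"]] = r["path"]
        kept.append(r)
    return kept
```

Because the refresh check compares against the last kept page, a reload that is separated from the original view only by image or stylesheet requests is still treated as a refresh.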

10 Recommendations by a First Order Markov Model We wrote Perl scripts to parse and store the clean data. We implemented a recommending model using simple first-order Markov models. –This provides the user with the links that are most frequently clicked from the current page.
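
A first-order Markov recommender of this kind can be sketched in a few lines. The original implementation was in Perl and is not shown in the slides, so the class name `FirstOrderMarkov` and its methods are hypothetical; the sketch simply counts page-to-page transitions and recommends the most frequent successors.

```python
from collections import Counter, defaultdict


class FirstOrderMarkov:
    """Recommend the pages most often visited right after the current page."""

    def __init__(self):
        # transitions[current][next] = number of times next followed current
        self.transitions = defaultdict(Counter)

    def train(self, sessions):
        """sessions: list of page lists, each in click order."""
        for pages in sessions:
            for current, nxt in zip(pages, pages[1:]):
                self.transitions[current][nxt] += 1

    def recommend(self, current, k=1):
        """Return up to k most frequent successors of `current`."""
        counts = self.transitions.get(current)
        if not counts:
            return []
        return [page for page, _ in counts.most_common(k)]
```

The model depends only on the current page, which is what makes it first order: the pages visited earlier in the session do not affect the recommendation.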

11 Results for First Order Markov Model Evaluation was performed on the existing logs. If the next click in a browsing session is the recommended page, it is a hit; otherwise it is a miss. Hit ratio when only one page is recommended: –CS logs: approx. 500,000 testing records; hit ratio 18.7%. –NASA logs: approx. 2 million testing records for one month; hit ratio 30%. Other researchers have performed evaluation similarly. In some cases, a hit is considered to be when any recommended page is browsed to.
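
The hit/miss evaluation described here can be written down as a small function. This is a sketch of the metric only; the function name `hit_ratio` and the convention that `recommend` is a callable returning a list of pages are assumptions. With a single recommendation it matches the strict definition, and with several it matches the looser "any recommended page" variant.

```python
def hit_ratio(recommend, test_sessions):
    """Fraction of clicks whose target was among the recommended pages.

    recommend: callable taking the current page and returning a list of pages.
    test_sessions: list of page lists, each in click order.
    """
    hits = 0
    total = 0
    for pages in test_sessions:
        for current, actual_next in zip(pages, pages[1:]):
            total += 1
            if actual_next in recommend(current):
                hits += 1
    return hits / total if total else 0.0
```

For example, if a recommender always suggests B after A and C after B, then the sessions A, B, C and A, C give two hits out of three clicks.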

12 Using Recent Activity Suppose there is an important event somewhere in the Siebel Center at 4pm. –Many people might go to to find the location between 3:45 and 4:05! –It would be good to automatically discover this and generate the link for users

13 Dynamic Markov Model To model such recent browsing activity, we need a more sophisticated model that more heavily weights recent browsing activity. To do this, we implemented an “online” recommending model using “dynamic first order Markov Models” We set a threshold t – Only the requests within the past t minutes affect the model
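
One way to realize the threshold t is a sliding window over timestamped transitions, sketched below. This is an assumption about how such a model could be built, not the authors' code: the class name `DynamicMarkov`, the use of timestamps in seconds, and the requirement that observations arrive in non-decreasing time order are all choices made for this illustration.

```python
from collections import Counter, defaultdict, deque


class DynamicMarkov:
    """First-order Markov counts restricted to the past t minutes."""

    def __init__(self, window_minutes):
        self.window = window_minutes * 60  # threshold t, in seconds
        self.events = deque()  # (timestamp, current, next), oldest first
        self.transitions = defaultdict(Counter)

    def _expire(self, now):
        # Drop transitions older than the window and undo their counts.
        while self.events and now - self.events[0][0] > self.window:
            _, cur, nxt = self.events.popleft()
            self.transitions[cur][nxt] -= 1
            if self.transitions[cur][nxt] == 0:
                del self.transitions[cur][nxt]

    def observe(self, timestamp, current, nxt):
        """Record that a user went from `current` to `nxt` at `timestamp`."""
        self._expire(timestamp)
        self.events.append((timestamp, current, nxt))
        self.transitions[current][nxt] += 1

    def recommend(self, timestamp, current, k=1):
        """Return up to k most frequent recent successors of `current`."""
        self._expire(timestamp)
        counts = self.transitions.get(current)
        if not counts:
            return []
        return [page for page, _ in counts.most_common(k)]
```

A hard cutoff like this treats every request inside the window equally and ignores everything outside it; a decaying weight would be an alternative way to favor recent activity, but the slide describes only the threshold.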

14 Dynamic Markov Model Results This is too simplistic to work. Most successful recommendations are for major browsing patterns that do not change over time: –/info/prospective.php -> /graduate/admissions.php. Accuracy decreases as t decreases. We would need to recognize that the user is looking for ephemeral pages.

15 Using the Web Page Contents (To Do) Can we use the content of the previously browsed pages to recommend some links to the user? –E.g., if the last 10 pages the user has browsed contain the word IR, recommend Prof. Zhai’s web page. Perhaps we can use a machine learning algorithm to cast this as a multi-class classification problem.

16 Hybrid approaches (To Do) How to combine user-modeling with pattern mining? How to best combine individual user patterns (personalizations) with collective patterns (recommender systems)?

17 Other Things To Do Incorporate pattern mining Experimentally evaluate new models and combinations Actual Implementation (CGI scripts and cookies) Higher order Markov Models

18 Other Paradigms for Making Recommendations (Future Work) Recommendations as: –An AI planning problem? –An optimization problem? –Others?

19 Discussion Ideas about the model? Other paradigms to consider? How can we incorporate content? Suggestions?

20 Thank You.
