Generating Intelligent Links to Web Pages by Mining Access Patterns of Individuals and the Community
Benjamin Lambert, Omid Fatemieh
CS598CXZ, Spring 2005

Outline
– Motivation
– The main idea of the project
– Accomplished tasks
– Remaining tasks
– Discussion

The Problem
The problem we would like to solve is:
– How can we best assist a person browsing the Web by providing links to the pages they are looking for?
There are many reasons we might want to do this (e.g., pages hidden in a large Web site, broken links, seminar announcements).

Previous Work
This problem has been studied extensively, using many different approaches.
The two main ways of solving it are:
– Modeling user behavior (Markov models, HMMs, etc.)
– Data mining for common browsing patterns
Despite all the work that has been done, many other techniques have not yet been tried.

Markov Model Approaches
These primarily model the user just well enough to suggest which link on the current page they should click next.
This is not useful unless there are many links on a page (e.g.

Data-Mining Approaches
These are better able to find pages that are several links away from the current page.
– Suppose the sequence of requests for pages A, B, C, D, E occurs frequently; then we may consider adding a shortcut from A to E.
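The shortcut idea can be sketched in Python as follows. This is a minimal sketch, not the method used in the project: it assumes sessions are already split into lists of page requests, it counts only contiguous runs of a fixed length (real sequential-pattern miners also allow gaps), and the names `find_shortcuts`, `length`, and `min_support` are hypothetical.

```python
from collections import Counter


def find_shortcuts(sessions, length=5, min_support=2):
    """Hypothetical helper: count every contiguous run of `length` requests
    across all sessions and propose a shortcut (first page -> last page)
    for each run seen at least `min_support` times."""
    counts = Counter()
    for session in sessions:
        for i in range(len(session) - length + 1):
            counts[tuple(session[i:i + length])] += 1
    return {(run[0], run[-1]) for run, n in counts.items() if n >= min_support}


# Hypothetical example sessions.
sessions = [
    ["A", "B", "C", "D", "E"],
    ["X", "A", "B", "C", "D", "E"],
    ["A", "B", "F"],
]
print(find_shortcuts(sessions))  # -> {('A', 'E')}
```

Requiring a minimum support keeps one-off browsing paths from producing shortcuts; lowering `min_support` to 1 would also propose X -> D from the second session.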

New Ideas for Solving This Problem
– Using recent activity to make recommendations
– Using the contents of Web pages to make recommendations
– Combining data mining and user modeling approaches
– Using a machine learning approach

Data
Data: Web server logs
– CS department Web logs from Dec 6, 2004, to Feb 28, 2005 (thanks to Chuck Thompson)
– NASA Kennedy Space Center logs, collected over July and August 1995 (freely available online)
The logs are long lists of Web page requests; each request is represented by:
– The requester's IP address
– The time and date requested
– The page requested
– Etc.
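A minimal Python sketch of turning one log line into such a request record, assuming the logs use the Common Log Format (the NASA logs do; the format of the sanitized CS logs is an assumption). The `Request` type and `parse_line` function are hypothetical, and the sample line is illustrative rather than taken from either data set.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


class Request(NamedTuple):  # hypothetical record type
    host: str
    time: datetime
    method: str
    path: str
    status: int


def parse_line(line: str) -> Optional[Request]:
    """Parse one Common Log Format line; return None if it does not match."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    parts = m.group("request").split()
    if len(parts) < 2:  # malformed request strings do occur in real logs
        return None
    when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return Request(m.group("host"), when, parts[0], parts[1], int(m.group("status")))


# Illustrative line (documentation IP address, made-up timestamp and size).
line = ('192.0.2.1 - - [06/Dec/2004:15:45:12 -0600] '
        '"GET /research/areas.php?area=proglang HTTP/1.1" 200 5120')
req = parse_line(line)
print(req.path, req.status)  # -> /research/areas.php?area=proglang 200
```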

Data Cleaning
First, for privacy reasons, the data had to be "sanitized": the actual IP addresses were removed before we were given access to it.
We then discard:
– Requests for .gif, .jpg, .css, etc. files. Looking only at the extension of the requested file is not enough; e.g., "GET /research/areas.php?area=proglang HTTP/1.1" has no extension at the end, because of the query string.
– Requests from crawlers (robots.txt).
– Unsuccessful GETs (keep status code 200 only; drop e.g. 404).
– Refreshes (consecutive requests for the same page).
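The cleaning rules above might look like this in Python. This is a sketch under stated assumptions, not the project's Perl code: requests arrive as hypothetical `(client, path, status)` tuples in time order, the extension list is illustrative, any client that fetches `/robots.txt` is treated as a crawler, and the query string is stripped before checking the extension, which is one way to handle the `areas.php?area=proglang` case.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical list of non-page resources to drop; extend as needed.
IGNORED_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico"}


def is_page(path):
    """Judge a request by the extension of its path component only, so that
    /research/areas.php?area=proglang is treated as a .php page."""
    ext = posixpath.splitext(urlsplit(path).path)[1].lower()
    return ext not in IGNORED_EXTENSIONS


def clean(requests):
    """requests: list of (client, path, status) tuples in time order."""
    # Assumption: any client that fetched robots.txt is a crawler.
    crawlers = {c for c, path, _ in requests if urlsplit(path).path == "/robots.txt"}
    cleaned = []
    last_page = {}  # client -> last kept page, used to drop refreshes
    for client, path, status in requests:
        if client in crawlers or status != 200 or not is_page(path):
            continue
        if last_page.get(client) == path:  # consecutive repeat = refresh
            continue
        last_page[client] = path
        cleaned.append((client, path, status))
    return cleaned


# Hypothetical example requests.
requests = [
    ("u1", "/index.php", 200),
    ("u1", "/logo.gif", 200),
    ("u1", "/index.php", 200),  # refresh
    ("u1", "/research/areas.php?area=proglang", 200),
    ("u1", "/missing.html", 404),
    ("bot", "/robots.txt", 200),
    ("bot", "/index.php", 200),
]
print(clean(requests))
# -> [('u1', '/index.php', 200), ('u1', '/research/areas.php?area=proglang', 200)]
```

Because images are filtered before the refresh check, a page, an embedded image, and the same page again also count as a refresh here; whether that is desirable depends on how the logs interleave requests.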

Recommendations by a First Order Markov Model
We wrote Perl scripts to parse and store the cleaned data.
We implemented a recommendation model using simple first-order Markov models:
– This provides the user with links to the pages most frequently clicked from the current page.
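The slides mention Perl scripts for parsing; the following is a Python sketch of the modeling idea, not the project's code: a first-order Markov model that counts page-to-page transitions within sessions and recommends the most frequent next pages. The class name, method names, and example paths are hypothetical.

```python
from collections import Counter, defaultdict


class FirstOrderMarkov:
    """Hypothetical sketch: estimates P(next page | current page) from
    transition counts and recommends the most frequent successors."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # current page -> Counter of next pages

    def train(self, sessions):
        for session in sessions:
            for current, nxt in zip(session, session[1:]):
                self.counts[current][nxt] += 1

    def recommend(self, page, k=1):
        """Return up to k pages most frequently requested right after `page`."""
        return [p for p, _ in self.counts.get(page, Counter()).most_common(k)]


# Hypothetical example sessions.
sessions = [
    ["/", "/info/prospective.php", "/graduate/admissions.php"],
    ["/", "/info/prospective.php", "/graduate/admissions.php"],
    ["/", "/people/"],
]
model = FirstOrderMarkov()
model.train(sessions)
print(model.recommend("/"))  # -> ['/info/prospective.php']
```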

Results for First Order Markov Model
Evaluation was performed on the existing logs.
If the next click in a browsing session is the recommended page, it is a hit; otherwise it is a miss.
Hit ratio when only one page is recommended:
– CS logs: approx. 500,000 testing records; hit ratio 18.7%
– NASA logs: approx. 2 million testing records for one month; hit ratio 30%
Other researchers have performed evaluation similarly, although in some cases a hit is counted when the user browses to any of the recommended pages.
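The hit-ratio evaluation described above can be sketched as a function that replays each test session and checks whether the actual next page is among the top-k recommendations: with `k=1` this is the single-recommendation hit ratio, and with `k>1` it is the looser "any recommended page" variant. `hit_ratio` and the toy recommender are hypothetical illustrations, not the project's evaluation code.

```python
def hit_ratio(test_sessions, recommend, k=1):
    """Fraction of (current page, next page) transitions in the test sessions
    for which the next page is among the k recommended pages.
    `recommend(page, k)` is any function returning a list of pages."""
    hits = total = 0
    for session in test_sessions:
        for current, nxt in zip(session, session[1:]):
            total += 1
            if nxt in recommend(current, k):
                hits += 1
    return hits / total if total else 0.0


# Hypothetical toy recommender backed by a fixed ranking table.
toy_table = {"A": ["B", "C"], "B": ["C"]}


def toy_recommend(page, k):
    return toy_table.get(page, [])[:k]


toy_sessions = [["A", "B", "C"], ["A", "C"]]
print(hit_ratio(toy_sessions, toy_recommend, k=1))  # -> 0.6666666666666666
print(hit_ratio(toy_sessions, toy_recommend, k=2))  # -> 1.0
```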

Using Recent Activity
Suppose there is an important event somewhere in the Siebel Center at 4pm.
– Many people might visit the same page to find the location between 3:45 and 4:05!
– It would be good to discover this automatically and generate the link for users.

Dynamic Markov Model
To model such recent browsing activity, we need a more sophisticated model that weights recent browsing activity more heavily.
To do this, we implemented an "online" recommendation model using "dynamic first-order Markov models":
– We set a threshold t.
– Only the requests within the past t minutes affect the model.
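One way to realize the threshold t is a sliding window over time-stamped transitions that evicts anything older than t minutes and undoes its counts. This Python sketch assumes transitions arrive in time order; `WindowedMarkov` and the example paths are hypothetical, and the project's actual implementation may differ.

```python
from collections import Counter, defaultdict, deque
from datetime import datetime, timedelta


class WindowedMarkov:
    """Hypothetical sketch: first-order Markov counts restricted to
    transitions observed within the last t minutes."""

    def __init__(self, t_minutes):
        self.window = timedelta(minutes=t_minutes)
        self.events = deque()  # (time, current, next) in arrival order
        self.counts = defaultdict(Counter)

    def observe(self, when, current, nxt):
        self.events.append((when, current, nxt))
        self.counts[current][nxt] += 1
        self._expire(when)

    def _expire(self, now):
        # Drop transitions older than the window and undo their counts.
        while self.events and now - self.events[0][0] > self.window:
            _, cur, nxt = self.events.popleft()
            self.counts[cur][nxt] -= 1
            if self.counts[cur][nxt] == 0:
                del self.counts[cur][nxt]

    def recommend(self, page, now, k=1):
        self._expire(now)
        return [p for p, _ in self.counts.get(page, Counter()).most_common(k)]


# Hypothetical example with t = 15 minutes.
t0 = datetime(2005, 2, 1, 15, 0)
m = WindowedMarkov(15)
m.observe(t0, "/", "/news/old.php")
m.observe(t0 + timedelta(minutes=45), "/", "/events/seminar.php")
m.observe(t0 + timedelta(minutes=50), "/", "/events/seminar.php")
print(m.recommend("/", t0 + timedelta(minutes=50)))  # -> ['/events/seminar.php']
```

A smaller t reacts faster to bursts like the 4pm event but keeps fewer transitions, which is consistent with the observation that accuracy drops as t decreases.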

Dynamic Markov Model Results
This approach is too simplistic to work.
Most successful recommendations are for major browsing patterns that do not change over time:
– /info/prospective.php -> /graduate/admissions.php
Accuracy decreases as t decreases.
We would need to recognize when the user is looking for ephemeral pages.

Using the Web Page Contents (To Do)
Can we use the content of the previously browsed pages to recommend links to the user?
– E.g., if the last 10 pages the user has browsed contain the word "IR", recommend Prof. Zhai's Web page.
Perhaps we can use machine learning algorithms to cast this as a multi-class classification problem.
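As one possible instance of the multi-class framing, the sketch below uses multinomial naive Bayes (our choice of classifier; the slide names no algorithm). Each training example pairs the text of recently browsed pages with the page the user went to next, and the classes are target pages. The class name, the label `prof_zhai_page`, and the smoothing parameter `alpha` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesLinkClassifier:
    """Hypothetical sketch of multinomial naive Bayes: classes are target
    pages, features are words of the pages the user browsed recently."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing
        self.class_counts = Counter()            # target page -> #examples
        self.word_counts = defaultdict(Counter)  # target page -> word counts
        self.vocab = set()

    def fit(self, examples):
        """examples: iterable of (recent_page_text, target_page)."""
        for text, target in examples:
            words = tokenize(text)
            self.class_counts[target] += 1
            self.word_counts[target].update(words)
            self.vocab.update(words)

    def predict(self, recent_text):
        """Return the target page with the highest log posterior."""
        words = tokenize(recent_text)
        total = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for target, n in self.class_counts.items():
            denom = sum(self.word_counts[target].values()) + self.alpha * v
            score = math.log(n / total)
            for w in words:
                score += math.log((self.word_counts[target][w] + self.alpha) / denom)
            if score > best_score:
                best, best_score = target, score
        return best


# Hypothetical training examples.
clf = NaiveBayesLinkClassifier()
clf.fit([
    ("information retrieval IR search engines text mining", "prof_zhai_page"),
    ("IR ranking retrieval models", "prof_zhai_page"),
    ("graduate admissions application deadline", "/graduate/admissions.php"),
])
print(clf.predict("notes on IR and retrieval"))  # -> prof_zhai_page
```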

Hybrid Approaches (To Do)
How can we combine user modeling with pattern mining?
How can we best combine individual user patterns (personalization) with collective patterns (recommender systems)?
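A simple baseline for the second question, assumed here rather than taken from the project, is linear interpolation: score each candidate next page by λ·P_user + (1 − λ)·P_community. In the Python sketch below, `blended_recommend` and the example paths are hypothetical, and the inputs are next-page counts from the current page for one user and for all users.

```python
from collections import Counter


def normalize(counts):
    total = sum(counts.values())
    return {page: c / total for page, c in counts.items()} if total else {}


def blended_recommend(user_counts, community_counts, lam=0.5, k=1):
    """Hypothetical sketch: rank candidate next pages by
    lam * P_user(next | current) + (1 - lam) * P_community(next | current)."""
    p_user = normalize(user_counts)
    p_comm = normalize(community_counts)
    scores = {page: lam * p_user.get(page, 0.0) + (1 - lam) * p_comm.get(page, 0.0)
              for page in set(p_user) | set(p_comm)}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))[:k]


# Hypothetical next-page counts from the current page.
user = Counter({"/courses/cs598cxz/": 3})
community = Counter({"/graduate/admissions.php": 90, "/courses/cs598cxz/": 10})
print(blended_recommend(user, community, lam=0.5))  # -> ['/courses/cs598cxz/']
print(blended_recommend(user, community, lam=0.1))  # -> ['/graduate/admissions.php']
```

The weight λ controls how much personalization overrides the community pattern; it would have to be tuned experimentally.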

Other Things To Do
– Incorporate pattern mining
– Experimentally evaluate the new models and combinations
– Actual implementation (CGI scripts and cookies)
– Higher-order Markov models
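For the higher-order Markov model item, a minimal Python sketch conditions on the last n pages and, as an added assumption not stated in the slides, backs off to shorter histories when a context was never seen in training. `HigherOrderMarkov` and the example paths are hypothetical.

```python
from collections import Counter, defaultdict


class HigherOrderMarkov:
    """Hypothetical sketch of an order-n Markov model: the next page is
    predicted from the last n pages, backing off to shorter histories
    when a context was never seen in training."""

    def __init__(self, order=2):
        self.order = order
        self.counts = defaultdict(Counter)  # context tuple -> Counter of next pages

    def train(self, sessions):
        for session in sessions:
            for i in range(1, len(session)):
                # Record the transition under every history length 1..order.
                for n in range(1, self.order + 1):
                    if i - n < 0:
                        break
                    context = tuple(session[i - n:i])
                    self.counts[context][session[i]] += 1

    def recommend(self, history, k=1):
        # Try the longest available context first, then shorter ones.
        for n in range(min(self.order, len(history)), 0, -1):
            context = tuple(history[-n:])
            if context in self.counts:
                return [p for p, _ in self.counts[context].most_common(k)]
        return []


# Hypothetical sessions: from /people/ alone, faculty.php is more common,
# but users arriving from /courses/ go on to grads.php.
s1 = ["/", "/people/", "/people/faculty.php"]
s2 = ["/courses/", "/people/", "/people/grads.php"]
model = HigherOrderMarkov(order=2)
model.train([s1, s1, s2])
print(model.recommend(["/courses/", "/people/"]))  # -> ['/people/grads.php']
print(model.recommend(["/", "/people/"]))          # -> ['/people/faculty.php']
```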

Other Paradigms for Making Recommendations (Future Work)
Recommendations as:
– An AI planning problem?
– An optimization problem?
– Others?

Discussion
– Ideas about the model?
– Other paradigms to consider?
– How can we incorporate content?
– Suggestions?

Thank You.