Chapter 12: Web Usage Mining - An introduction Chapter written by Bamshad Mobasher Many slides are from a tutorial given by B. Berendt, B. Mobasher, M.

Slides:



Advertisements
Similar presentations
Experience Guided Shopping & Search Guiders ® Deliver Measurable ROI Through Reports Metric Reports Deliver Unique Customer Decision Insights Guiders offer.
Advertisements

Web Mining.
Web Usage Mining Web Usage Mining (Clickstream Analysis) Mark Levene (Follow the links to learn more!)
Data Mining for Web Personalization
WEB USAGE MINING FRAMEWORK FOR MINING EVOLVING USER PROFILES IN DYNAMIC WEBSITE DONE BY: AYESHA NUSRATH 07L51A0517 FIRDOUSE AFREEN 07L51A0522.
Mining Frequent Patterns II: Mining Sequential & Navigational Patterns Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
Interception of User’s Interests on the Web Michal Barla Supervisor: prof. Mária Bieliková.
Optimizing your business online Web Analytics How to raise effectiveness of websites and advertising campaigns Andrew Yunisov Managing Partner.
Back to Table of Contents
Web Mining Research: A Survey Authors: Raymond Kosala & Hendrik Blockeel Presenter: Ryan Patterson April 23rd 2014 CS332 Data Mining pg 01.
Chapter 12: Web Usage Mining - An introduction
Copyright © 2004 Pearson Education, Inc. Slide 7-1 E-commerce Kenneth C. Laudon Carol Guercio Traver business. technology. society. Second Edition.
Web Mining Research: A Survey
Web Usage Mining: Processes and Applications
Software Testing and Quality Assurance Testing Web Applications.
The Web is perhaps the single largest data source in the world. Due to the heterogeneity and lack of structure, mining and integration are challenging.
12/11/01 Matt Bridges Advisor: Ralph Morelli. What is Web Analytics? In traditional commerce, store owners can observe their customers habits: What time.
Discovery of Aggregate Usage Profiles for Web Personalization
Web Usage Mining - W hat, W hy, ho W Presented by:Roopa Datla Jinguang Liu.
© Copyright , Blue Martini Software. San Mateo California, USA 1 1 Integrating E-Commerce and Data Mining: Architecture and Challenges Llew Mason.
Electronic Commerce Systems
Overview of Web Data Mining and Applications Part I
_______________________________________________________________________________________________________________ E-Commerce: Fundamentals and Applications1.
Alexander Hartmann.  Free service offered by Google that generates detailed statistics about the visitors to a website. A premium version is also available.
WEB ANALYTICS Prof Sunil Wattal. Business questions How are people finding your website? What pages are the customers most interested in? Is your website.
Prof. Vishnuprasad Nagadevara Indian Institute of Management Bangalore
Section 13.1 Add a hit counter to a Web page Identify the limitations of hit counters Describe the information gathered by tracking systems Create a guest.
Overview of Web Data Mining and Applications Part II
1.Understand the decision-making process of consumer purchasing online. 2.Describe how companies are building one-to-one relationships with customers.
Web Mining ( 網路探勘 ) WM12 TLMXM1A Wed 8,9 (15:10-17:00) U705 Web Usage Mining ( 網路使用挖掘 ) Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of Information.
Discovering Web Access Patterns and Trends by Applying OLAP and Data Mining Technology on Web logs Data Engineering Lab 성 유 진.
FALL 2012 DSCI5240 Graduate Presentation By Xxxxxxx.
CS 401 Paper Presentation Praveen Inuganti
Consumer Behavior, Market Research
Fall 2006 Davison/LinCSE 197/BIS 197: Search Engine Strategies 6-1 Module II Overview PLANNING: Things to Know BEFORE You Start… Why SEM? Goal Analysis.
Web mining Web mining deals with mining of patterns from web and e-commerce data. Web data –Web pages –Web structures –Web logs –E-commerce sites – .
Aurora: A Conceptual Model for Web-content Adaptation to Support the Universal Accessibility of Web-based Services Anita W. Huang, Neel Sundaresan Presented.
Research paper: Web Mining Research: A survey SIGKDD Explorations, June Volume 2, Issue 1 Author: R. Kosala and H. Blockeel.
About Dynamic Sites (Front End / Back End Implementations) by Janssen & Associates Affordable Website Solutions for Individuals and Small Businesses.
Page 1 WEB MINING by NINI P SURESH PROJECT CO-ORDINATOR Kavitha Murugeshan.
Copyright © 2009 Pearson Education, Inc. Slide 6-1 Chapter 6 E-commerce Marketing Concepts.
© 2008 Thomson, a part of the Thomson Corporation. Thomson, the Star logo, and Atomic Dog are trademarks used herein under license. All rights reserved.
10 Adding Interactivity to a Web Site Section 10.1 Define scripting Summarize interactivity design guidelines Identify scripting languages Compare common.
Generating Intelligent Links to Web Pages by Mining Access Patterns of Individuals and the Community Benjamin Lambert Omid Fatemieh CS598CXZ Spring 2005.
Google Confidential and Proprietary 1 Google University Google Analytics and Website Optimiser Dyana Najdi, Customer Analytics Manager, EMEA Lee Hunter,
©2010 John Wiley and Sons Chapter 12 Research Methods in Human-Computer Interaction Chapter 12- Automated Data Collection.
WebMining Web Mining By- Pawan Singh Piyush Arora Pooja Mansharamani Pramod Singh Praveen Kumar 1.
Sustainability: Web Site Statistics Marieke Napier UKOLN University of Bath Bath, BA2 7AY UKOLN is supported by: URL
Data Mining By Dave Maung.
Discovery of Aggregate Usage Profiles for Web Personalization Bamshad Mobasher, Honghua Dai, Tao Luo, Miki Nakagawa, Yuqing Sun, Jim Wiltshire WebKDD 2000.
Log files presented to : Sir Adnan presented by: SHAH RUKH.
Srivastava J., Cooley R., Deshpande M, Tan P.N.
Analysing Clickstream Data: From Anomaly Detection to Visitor Profiling Peter I. Hofgesang Wojtek Kowalczyk ECML/PKDD Discovery.
Web Mining Issues Size Size –>350 million pages –Grows at about 1 million pages a day Diverse types of data Diverse types of data.
Technology for E-commerce Helena Ahonen-Myka. In this part... n search tools n metadata n personalization n collaborative filtering n data mining.
Search Engine using Web Mining COMS E Web Enhanced Information Mgmt Prof. Gail Kaiser Presented By: Rupal Shah (UNI: rrs2146)
 Shopping Basket  Stages to maintain shopping basket in framework  Viewing Shopping Basket.
Glossary of Terms Sessions - (old name: Visits) Users - (old name: Unique Visitors) Pageviews Pages/Session Avg. Session Duration Bounce Rate %New Sessions.
Information Design Trends Unit Five: Delivery Channels Lecture 2: Portals and Personalization Part 2.
Ecommerce Basics Standard 2 Objective 1. Ecommerce Business conducted on the internet.
Introduction Web analysis includes the study of users’ behavior on the web Traffic analysis – Usage analysis Behavior at particular website or across.
Web Analytics and Reporting Michal Neuwirth Product Manager – Kentico Software.
Science data sharing user behavior mining: an approach combining Web Usage Mining and GIS Mo Wang, Juanle Wang, Yongqing Bai Institute of Geographic Sciences.
Chapter 8: Web Analytics, Web Mining, and Social Analytics
Profiling: What is it? Notes and reflections on profiling and how it could be used in process mining.
WebMiningResearchASurvey Web Mining Research: A Survey Authors: Raymond Kosala and Hendrik Blockeel ACM SIGKDD, July 2000 Computer Science Department University.
Data mining in web applications
What is Google Analytics?
Chapter 12: Automated data collection methods
Presentation transcript:

Chapter 12: Web Usage Mining - An introduction Chapter written by Bamshad Mobasher Many slides are from a tutorial given by B. Berendt, B. Mobasher, M. Spiliopoulou

Introduction Web usage mining: automatic discovery of patterns in clickstreams and associated data collected or generated as a result of user interactions with one or more Web sites. Goal: analyze the behavioral patterns and profiles of users interacting with a Web site. The discovered patterns are usually represented as collections of pages, objects, or resources that are frequently accessed by groups of users with common interests.

Bing Liu 3 Introduction Data in Web Usage Mining:  Web server logs  Site contents  Data about the visitors, gathered from external channels  Further application data Not all these data are always available. When they are, they must be integrated. A large part of Web usage mining is about processing usage/ clickstream data.  After that various data mining algorithm can be applied.

Web server logs Bing Liu 4

Web usage mining process Bing Liu 5

6 Data preparation

Pre-processing of web usage data Bing Liu 7

Concepts Pageview  Most basic level of data abstraction  An aggregate representation of a collection of Web objects contributing to the display on a user’s browser resulting from a single user action (clickthrough). Session(visit)  Most basic level of behavior abstraction  A sequence of pageviews by a single user during a single visit Episode  transaction  A subset of pageviews in the session that are significant for the analysis tasks. Bing Liu 8

Data cleaning  remove irrelevant references and fields in server logs  remove references due to spider navigation  remove erroneous references  add missing references due to caching (done after sessionization) Bing Liu 9

Identify sessions (sessionization) In Web usage analysis, these data are the sessions of the site visitors: the activities performed by a user from the moment she enters the site until the moment she leaves it. Difficult to obtain reliable usage data due to proxy servers and anonymizers, dynamic IP addresses, missing references due to caching, and the inability of servers to distinguish among different visits. Bing Liu 10

Sessionization strategies Bing Liu 11

Sessionization heuristics Bing Liu 12

Sessionization example Bing Liu 13

Sessionization example (cont’d) Bing Liu 14

User identification Bing Liu 15

User identification: an example Bing Liu 16

Pageview A pageview is an aggregate representation of a collection of Web objects contributing to the display on a user’s browser resulting from a single user action (such as a click-through). Conceptually, each pageview can be viewed as a collection of Web objects or resources representing a specific “user event,” e.g., reading an article, viewing a product page, or adding a product to the shopping cart. Bing Liu 17

Path completion Client- or proxy-side caching can often result in missing access references to those pages or objects that have been cached. For instance,  if a user returns to a page A during the same session, the second access to A will likely result in viewing the previously downloaded version of A that was cached on the client-side, and therefore, no request is made to the server.  This results in the second reference to A not being recorded on the server logs. Bing Liu 18

Missing references due to caching Bing Liu 19

Path completion The problem of inferring missing user references due to caching. Effective path completion requires extensive knowledge of the link structure within the site Referrer information in server logs can also be used in disambiguating the inferred paths. Problem gets much more complicated in frame-based sites. Bing Liu 20

Integrating with e-commerce events Either product oriented or visit oriented Used to track and analyze conversion of browsers to buyers.  Major difficulty for E-commerce events is defining and implementing the events for a site, however, in contrast to clickstream data, getting reliable preprocessed data is not a problem. Another major challenge is the successful integration with clickstream data Bing Liu 21

Product-Oriented Events Product View  Occurs every time a product is displayed on a page view  Typical Types: Image, Link, Text Product Click-through  Occurs every time a user “clicks” on a product to get more information Bing Liu 22

Product-Oriented Events Shopping Cart Changes  Shopping Cart Add or Remove  Shopping Cart Change - quantity or other feature (e.g. size) is changed Product Buy or Bid  Separate buy event occurs for each product in the shopping cart  Auction sites can track bid events in addition to the product purchases Bing Liu 23

Web usage mining process Bing Liu 24

Integration with page content Bing Liu 25

Data modeling for web usage mining A set of n pageviews, P={p1,p2,…,pn) A set of m user transactions, T={t1,t2,…,tm}  Each ti is a subset of P (potentially with order and weight) Bing Liu 26

User-pageview matrix (without order) Bing Liu 27

Bing Liu 28

Integration with link structure Bing Liu 29

E-commerce data analysis Bing Liu 30

Session analysis Simplest form of analysis: examine individual or groups of server sessions and e- commerce data. Advantages:  Gain insight into typical customer behaviors.  Trace specific problems with the site. Drawbacks:  LOTS of data.  Difficult to generalize. Bing Liu 31

Session analysis: aggregate reports Bing Liu 32

OLAP Bing Liu 33

Data mining Bing Liu 34

Data mining (cont.) Bing Liu 35

Some usage mining applications Bing Liu 36

Bing Liu 37

Personalization application Bing Liu 38

Standard approaches Bing Liu 39

Summary Web usage mining has emerged as the essential tool for realizing more personalized, user-friendly and business-optimal Web services. The key is to use the user-clickstream data for many mining purposes. Traditionally, Web usage mining is used by e- commerce sites to organize their sites and to increase profits. It is now also used by search engines to improve search quality and to evaluate search results, etc, and by many other applications. Bing Liu 40