Web Usage Mining (Clickstream Analysis)
Mark Levene
(Follow the links to learn more!)


Reminder - W3C Extended Log File Format
cs = client-to-server actions
s = server actions
c = client actions
sc = server-to-client actions
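
A minimal sketch of reading a log in this format, assuming whitespace-separated fields and a "#Fields:" directive that names the columns. The sample lines, the IP address and the function name parse_w3c_extended are made up for illustration; quoted fields that contain spaces are not handled.

```python
from typing import Dict, Iterable, Iterator, List


def parse_w3c_extended(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per log entry, keyed by the names in the #Fields directive."""
    fields: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive line, e.g. "#Version: 1.0" or "#Fields: date time c-ip ..."
            name, _, value = line[1:].partition(":")
            if name.strip().lower() == "fields":
                fields = value.split()
            continue
        values = line.split()
        # Skip entries that do not match the declared field list.
        if fields and len(values) == len(fields):
            yield dict(zip(fields, values))


# Hypothetical sample log, not taken from the slides.
sample = """#Version: 1.0
#Fields: date time c-ip cs-method cs-uri-stem sc-status
2005-03-01 10:15:02 192.0.2.7 GET /index.html 200
2005-03-01 10:15:40 192.0.2.7 GET /books/list.html 200
""".splitlines()

entries = list(parse_w3c_extended(sample))
print(entries[0]["cs-uri-stem"], entries[0]["sc-status"])
```

The prefixes show up in the field names: c-ip is a client property, cs-method travels from client to server, and sc-status from server to client.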

Analog – Web Log File Analyser
Gives basic statistics such as
– number of hits
– average hits per time period
– what are the popular pages in your site
– who is visiting your site
– what keywords are users searching for to get to you
– what is being downloaded
Log data does not disclose the visitor's identity
What do Analog's reports mean?
Report for
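
To make the first few statistics concrete, here is a small sketch that computes them from already-parsed log entries. It is not how Analog itself works; the (hour, page) pairs are hypothetical data, and the average is taken over hours that had at least one hit.

```python
from collections import Counter

# Hypothetical parsed entries as (hour of day, requested page); not real log data.
hits = [
    (10, "/index.html"),
    (10, "/books/list.html"),
    (11, "/index.html"),
    (11, "/index.html"),
    (13, "/checkout.html"),
]

total_hits = len(hits)                                  # number of hits
hits_per_page = Counter(page for _, page in hits)       # popularity of each page
hits_per_hour = Counter(hour for hour, _ in hits)       # hits per time period
average_hits_per_active_hour = total_hits / len(hits_per_hour)

print(total_hits)
print(hits_per_page.most_common(1))
print(average_hits_per_active_hour)
```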

Applications of Usage Mining
Pre-fetching and caching web pages
eCommerce and clickstream analysis
Web site reorganisation
Personalisation
Recommendation of links and products

Identification of User
By IP address
– Not so reliable as IP can be dynamic
– Different users may use the same IP
Through cookies
– Reliable but user may remove cookies
– Security and privacy issues
Through login
– Users have to register
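
One possible way to combine the three methods is to fall back from the most to the least reliable identifier that is available for a request. The ordering (login, then cookie, then IP address) and the function name user_key are assumptions of this sketch, not something the slides prescribe.

```python
from typing import Optional


def user_key(login: Optional[str], cookie_id: Optional[str], ip: str) -> str:
    """Pick the most reliable identifier present on a request (assumed ordering)."""
    if login:
        return "login:" + login        # registered user, most reliable
    if cookie_id:
        return "cookie:" + cookie_id   # reliable until the user removes the cookie
    return "ip:" + ip                  # least reliable: dynamic or shared addresses


print(user_key(None, "c42", "192.0.2.7"))
```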

Sessionising
Time oriented (robust)
– By total duration of session: not more than 30 minutes
– By page stay times (good for short sessions): not more than 10 minutes per page
Navigation oriented (good for short sessions and when timestamps unreliable)
– Referrer is previous page in session, or
– Referrer is undefined but request within 10 secs, or
– Link from previous to current page in web site
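
The time-oriented rules can be sketched as follows for the requests of a single user. Applying both limits at once is a choice made here for brevity; the timestamps and page names are hypothetical, and timestamps are assumed to be in seconds.

```python
from typing import List, Tuple

Request = Tuple[float, str]  # (timestamp in seconds, page)


def sessionise(requests: List[Request],
               max_session: float = 30 * 60,
               max_stay: float = 10 * 60) -> List[List[Request]]:
    """Split one user's requests into sessions using the two time limits."""
    sessions: List[List[Request]] = []
    for ts, page in sorted(requests):
        current = sessions[-1] if sessions else None
        if (current is not None
                and ts - current[0][0] <= max_session    # total session duration
                and ts - current[-1][0] <= max_stay):    # stay time on previous page
            current.append((ts, page))
        else:
            sessions.append([(ts, page)])                # start a new session
    return sessions


# Hypothetical clicks: the 1100-second gap before A5 exceeds the 10-minute stay limit.
clicks = [(0, "A1"), (120, "A2"), (400, "A3"), (1500, "A5"), (1560, "A2")]
sessions = sessionise(clicks)
print([[page for _, page in s] for s in sessions])
```

Each resulting session, reduced to its page sequence, is one trail.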

Mining Navigation Patterns
Each session induces a user trail through the site.
A trail is a sequence of web pages followed by a user during a session, ordered by time of access.
A pattern in this context is a frequent trail.
Co-occurrence of web pages is important, e.g. shopping-basket and checkout.
Use a Markov chain model.

Trails inferred from Log data
(Each session results in a trail)

ID   Trail
1    A1 > A2 > A3
2
3    A1 > A2 > A3 > A4
4    A5 > A2 > A4
5    A5 > A2 > A4 > A6
6    A5 > A2 > A3 > A6

Construct Markov Chain from Data
Add a unique start state.
– The start state has a transition to all visited web pages in the site.
Add a unique final state.
– The last page in each trail has a transition to the final state.
The transition probabilities are obtained from counting click-throughs.
The Markov chain built is called absorbing since we always end up in the final state.
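
The construction can be sketched in a few lines. Two things here are assumptions rather than facts from the slides: trail 2 is missing from the transcript and is taken to be A1 > A2 > A3, and the start state's transitions are weighted by how often each page was visited. The state names "S" and "F" are also made up.

```python
from collections import defaultdict
from typing import Dict, List

START, FINAL = "S", "F"  # hypothetical names for the two artificial states


def build_chain(trails: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Estimate transition probabilities by counting click-throughs."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for trail in trails:
        if not trail:
            continue
        for page in trail:
            counts[START][page] += 1          # ASSUMPTION: start weight = page visits
        for src, dst in zip(trail, trail[1:]):
            counts[src][dst] += 1             # one click-through from src to dst
        counts[trail[-1]][FINAL] += 1         # last page leads to the final state
    return {
        src: {dst: n / sum(outs.values()) for dst, n in outs.items()}
        for src, outs in counts.items()
    }


trails = [
    ["A1", "A2", "A3"],
    ["A1", "A2", "A3"],        # ASSUMED: trail 2 is blank in the transcript
    ["A1", "A2", "A3", "A4"],
    ["A5", "A2", "A4"],
    ["A5", "A2", "A4", "A6"],
    ["A5", "A2", "A3", "A6"],
]
chain = build_chain(trails)
print(round(chain["A2"]["A3"], 2), round(chain["A4"]["A6"], 2))  # 0.67 0.33
```

Under these assumptions the link probabilities agree with the values 0.67 and 0.33 that appear in the frequent-trail tables.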

The Markov Chain from the Data

Support and Confidence
Support s in [0,1) – accept only trails whose initial probability is above s.
– Setting support to be above the average click-through is reasonable.
Confidence c in [0,1) – accept only trails whose probability is above c.
– The probability of a trail is obtained by multiplying the transition probabilities of the links in the trail.
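
As a worked example of the multiplication rule, the sketch below uses link probabilities that can be read off the frequent-trail tables (A2 > A3 is 0.67, A2 > A4 and A4 > A6 are 0.33, and A1 > A2 must be 1.0 for A1 > A2 > A3 to come out at 0.67). The dictionary layout and the function name are illustrative only.

```python
from math import prod
from typing import Dict, List, Tuple

# Link probabilities implied by the frequent-trail tables.
transition: Dict[Tuple[str, str], float] = {
    ("A1", "A2"): 1.0,
    ("A2", "A3"): 2 / 3,
    ("A2", "A4"): 1 / 3,
    ("A4", "A6"): 1 / 3,
}


def trail_probability(trail: List[str]) -> float:
    """Multiply the transition probabilities of consecutive links in the trail."""
    return prod(transition.get(link, 0.0) for link in zip(trail, trail[1:]))


print(round(trail_probability(["A1", "A2", "A3"]), 2))  # 1.0 * 0.67
print(round(trail_probability(["A2", "A4", "A6"]), 2))  # 0.33 * 0.33
```

With confidence c = 0.3 the first trail is accepted, while A2 > A4 > A6 (about 0.11) is rejected.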

Mining Frequent Trails
Find all trails whose initial probability is higher than s, and whose trail probability is above c.
Use depth-first search on the Markov chain to compute the trails.
The average time needed to find the frequent trails is proportional to the number of web pages in the site.
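
A minimal sketch of the depth-first search, reporting only trails that cannot be extended any further without dropping to c or below. The probabilities are hard-coded under two assumptions: the trail missing from the log table is A1 > A2 > A3, and a page's initial probability is its share of all page visits. The max_length guard is an addition of this sketch to keep cyclic chains from recursing forever.

```python
from typing import Dict, List, Tuple

# ASSUMED values, derived from the six trails with trail 2 taken as A1 > A2 > A3.
initial: Dict[str, float] = {
    "A1": 3 / 21, "A2": 6 / 21, "A3": 4 / 21,
    "A4": 3 / 21, "A5": 3 / 21, "A6": 2 / 21,
}
# Transitions between pages only; links to the final state are left out.
chain: Dict[str, Dict[str, float]] = {
    "A1": {"A2": 1.0},
    "A2": {"A3": 2 / 3, "A4": 1 / 3},
    "A3": {"A4": 1 / 4, "A6": 1 / 4},
    "A4": {"A6": 1 / 3},
    "A5": {"A2": 1.0},
}


def frequent_trails(initial: Dict[str, float],
                    chain: Dict[str, Dict[str, float]],
                    support: float,
                    confidence: float,
                    max_length: int = 20) -> List[Tuple[List[str], float]]:
    """Depth-first search for maximal trails above both thresholds."""
    results: List[Tuple[List[str], float]] = []

    def extend(trail: List[str], prob: float) -> None:
        extended = False
        if len(trail) < max_length:
            for nxt, p in chain.get(trail[-1], {}).items():
                if prob * p > confidence:
                    extended = True
                    extend(trail + [nxt], prob * p)
        # Report a trail only if it has at least one link and cannot grow further.
        if not extended and len(trail) > 1:
            results.append((trail, prob))

    for page, p0 in initial.items():
        if p0 > support:
            extend([page], 1.0)
    return results


for trail, prob in frequent_trails(initial, chain, support=0.1, confidence=0.3):
    print(" > ".join(trail), round(prob, 2))
```

Under these assumptions the search returns seven trails for confidence 0.3 and three for confidence 0.5.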

Frequent Trails
Support = 0.1 and Confidence = 0.3

Trail            Probability
A1 > A2 > A3     0.67
A5 > A2 > A3     0.67
A2 > A3          0.67
A1 > A2 > A4     0.33
A5 > A2 > A4     0.33
A2 > A4          0.33
A4 > A6          0.33

Frequent Trails
Support = 0.1 and Confidence = 0.5

Trail            Probability
A1 > A2 > A3     0.67
A5 > A2 > A3     0.67
A2 > A3          0.67

Content Mining
Incorporate the categories that users are navigating through so we may better understand their activities.
– E.g. what type of book is the user interested in; this may be used for recommendation.
Classify users according to behaviour.
– Is the user's intent to browse, search or buy?
Cluster users with common interests.

Pre-fetching and Caching Pages
Learn access patterns to predict future accesses.
Pre-fetch predicted pages to reduce latency.
Can use Markov model and base the prediction on history of access.
Also cache results of popular search engine queries.
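
A first-order sketch of Markov-based prediction: given the current page, pre-fetch the most likely next page if its probability is high enough. The threshold of 0.5, the state name "F" for leaving the site, and the function name are assumptions; the probabilities follow the example trails used earlier.

```python
from typing import Dict, Optional

# Example transition probabilities; "F" stands for leaving the site.
transitions: Dict[str, Dict[str, float]] = {
    "A2": {"A3": 2 / 3, "A4": 1 / 3},
    "A4": {"A6": 1 / 3, "F": 2 / 3},
}


def page_to_prefetch(current: str,
                     threshold: float = 0.5,
                     final_state: str = "F") -> Optional[str]:
    """Return the most probable next page, or None if no page is likely enough."""
    options = {page: p for page, p in transitions.get(current, {}).items()
               if page != final_state}
    if not options:
        return None
    page, prob = max(options.items(), key=lambda item: item[1])
    return page if prob >= threshold else None


print(page_to_prefetch("A2"))  # A3
print(page_to_prefetch("A4"))  # None: the visitor most likely leaves
```

A longer access history could be used by conditioning on the last two or more pages instead of only the current one.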

eCommerce Clickstream Analysis
What is the user's intention: browse, search or buy?
Measure time spent on site (site stickiness).
Repeat visits – it has been shown that repeat visitors spend less time on the site; this can be explained by learning.
Measure visit-to-purchase conversion ratio, and predict purchase likelihood.
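
The two measurements can be illustrated with a toy calculation. The session records and field names below are hypothetical, and the conversion ratio is taken here as purchases divided by visits.

```python
# Hypothetical session summaries; not data from the slides.
sessions = [
    {"visitor": "u1", "seconds_on_site": 300, "purchased": False},
    {"visitor": "u2", "seconds_on_site": 620, "purchased": True},
    {"visitor": "u1", "seconds_on_site": 180, "purchased": True},
    {"visitor": "u3", "seconds_on_site": 95, "purchased": False},
]

# Visit-to-purchase conversion ratio: share of visits that ended in a purchase.
conversion_ratio = sum(s["purchased"] for s in sessions) / len(sessions)

# Stickiness: average time spent on the site per visit.
average_seconds = sum(s["seconds_on_site"] for s in sessions) / len(sessions)

print(conversion_ratio, average_seconds)  # 0.5 298.75
```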

Supplementary Analyses to Improve eCommerce Web Sites
Detecting visits from crawlers as opposed to human visitors.
Form error analysis, e.g. login errors, mandatory fields not filled, incorrect format.
When and why do people exit the site, e.g. visitor puts item in cart but exits before reaching the checkout.
Analysis of local search engine logs – correlate with site behaviour.
Product recommendations based on association rules (people who bought x also bought y).
Geographic analysis – where are the customers?
Demographic analysis – who are the customers?
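
For the association-rule item, a rule "x also bought y" is usually scored by its confidence, the fraction of baskets containing x that also contain y. Note that this is a different notion from the confidence threshold used for trails. The baskets below are hypothetical.

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase baskets.
baskets = [
    {"book", "pen"},
    {"book", "lamp"},
    {"book", "pen", "lamp"},
    {"pen"},
]

item_count = Counter(item for basket in baskets for item in basket)
# Ordered pairs (x, y), so both directions of each rule are counted.
pair_count = Counter(pair for basket in baskets
                     for pair in permutations(sorted(basket), 2))


def rule_confidence(x: str, y: str) -> float:
    """Fraction of baskets containing x that also contain y."""
    return pair_count[(x, y)] / item_count[x]


print(round(rule_confidence("book", "pen"), 2))  # 2 of 3 book baskets contain a pen
print(rule_confidence("lamp", "book"))           # both lamp baskets contain a book
```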

Adaptive web sites
Modify the web site according to user access.
– Automatic synthesis of index pages (hubs that contain links on a specific topic)
– Based on a clustering algorithm that uses the co-occurrence frequencies of pages from the log data.
– Finds a concept that best describes each cluster.
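
The slide does not spell out the clustering algorithm, so the sketch below uses a simple stand-in: count how often two pages occur in the same session, keep pairs that co-occur at least min_count times, and treat connected groups of pages as clusters. The sessions, page names and threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def cooccurrence(sessions: List[List[str]]) -> Counter:
    """Count, for each unordered page pair, the sessions containing both pages."""
    counts: Counter = Counter()
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            counts[(a, b)] += 1
    return counts


def clusters(sessions: List[List[str]], min_count: int = 2) -> List[List[str]]:
    """Connected components of the thresholded co-occurrence graph (a stand-in)."""
    neighbours: Dict[str, Set[str]] = {}
    for (a, b), n in cooccurrence(sessions).items():
        if n >= min_count:
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)
    seen: Set[str] = set()
    result: List[List[str]] = []
    for page in sorted(neighbours):
        if page in seen:
            continue
        stack, component = [page], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbours[current] - component)
        seen |= component
        result.append(sorted(component))
    return result


# Hypothetical sessions.
visits = [
    ["java", "python", "index"],
    ["java", "python"],
    ["cooking", "baking"],
    ["cooking", "baking", "index"],
]
print(clusters(visits))
```

Each cluster is a candidate for a synthesised index page; finding a concept that describes the cluster is a separate step not covered here.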