Overview of Web Data Mining and Applications Part II


Overview of Web Data Mining and Applications Part II Bamshad Mobasher DePaul University

What is Web Mining
Definition: the application of data mining and machine learning techniques to extract useful knowledge from the content, structure, and usage of Web resources.

Types of Web Mining
Web Mining:
- Web Content Mining
- Web Usage Mining
- Web Structure Mining
Let’s get back to our discussion of Web mining and its applications. Web mining can be categorized into three separate areas based on the type of data that is being mined or analyzed.

Types of Web Mining
Web Usage Mining: extracting interesting patterns from user interactions with resources on one or more Web sites

Types of Web Mining
Web Mining: Web Content Mining, Web Usage Mining, Web Structure Mining
Applications:
- user and customer behavior modeling
- Web site optimization
- e-customer relationship management
- Web marketing
- targeted advertising
- personalization

Data Mining and Personalization
Personalization: the “killer app” for big data analytics
Tangible successes in both research and industrial applications:
- recommender systems
- personalized Web agents
- user adaptive systems
- Web marketing & targeted advertising
- personalized search
Sophisticated modeling approaches based on both predictive and unsupervised DM techniques

Web Usage Mining :: data sources
Typical sources of data:
- automatically generated Web/application server access logs
- e-commerce and product-oriented user events (e.g., shopping cart changes, product clickthroughs)
- user profiles and/or user ratings
- meta-data, page content, site structure
User transactions:
- sets or sequences of pageviews, possibly with associated weights
- a pageview is a set of page files and associated objects that contribute to a single display in a Web browser

What’s in a Typical Server Log?

Typical Fields in a Log File Entry
client IP address: 1.2.3.4
base url: maya.cs.depaul.edu
date/time: 2006-02-01 00:08:43
http method: GET
file accessed: /classes/cs589/papers.html
protocol version: HTTP/1.1
status code: 200 (successful access)
bytes transferred: 9221
referrer page: http://dataminingresources.blogspot.com/
user agent: Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+2.0.50727)
In addition, there may be fields corresponding to:
- login information
- client-side cookies (unique keys issued to clients in order to identify a repeat visitor)
- session ids issued by the Web or application servers
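The fields listed above can be pulled out of a raw log line with a few lines of code. The following is a minimal sketch, not the format of any particular server: it assumes a hypothetical space-separated layout in which the fields appear in the order given by `FIELDS`, and in which spaces inside the user agent are written as `+` (as in the example value above). Real servers let the administrator choose which fields are logged and in what order.

```python
from datetime import datetime

# Assumed field order for this sketch (a hypothetical layout, not from the slides).
FIELDS = ["date", "time", "client_ip", "host", "method", "uri",
          "protocol", "status", "bytes", "referrer", "user_agent"]

def parse_log_line(line: str) -> dict:
    """Split one space-separated log line into a dict of named fields."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    entry = dict(zip(FIELDS, parts))
    # Merge the separate date and time fields into one timestamp.
    entry["timestamp"] = datetime.strptime(
        entry.pop("date") + " " + entry.pop("time"), "%Y-%m-%d %H:%M:%S")
    entry["status"] = int(entry["status"])
    entry["bytes"] = int(entry["bytes"])
    # Spaces inside the user agent are assumed to be encoded as '+'.
    entry["user_agent"] = entry["user_agent"].replace("+", " ")
    return entry

# Sample line assembled from the example values on the slide.
sample = ("2006-02-01 00:08:43 1.2.3.4 maya.cs.depaul.edu GET "
          "/classes/cs589/papers.html HTTP/1.1 200 9221 "
          "http://dataminingresources.blogspot.com/ "
          "Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;"
          "+SV1;+.NET+CLR+2.0.50727)")
entry = parse_log_line(sample)
```

Each parsed entry becomes a plain dictionary, which is a convenient input for the cleaning and sessionization steps that follow in the preprocessing pipeline.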

Basic Entities in Web Usage Mining
- User (Visitor): single individual that is accessing files from one or more Web servers through a browser
- Page File: file that is served through the HTTP protocol
- Pageview: set of Page Files that contribute to a single display in a Web browser
- User Session: set of Pageviews served due to a series of HTTP requests from a single User across the entire Web
- Server Session: set of Pageviews served due to a series of HTTP requests from a single User to a single site
- Transaction (Episode): subset of Pageviews from a single User or Server Session

Main Challenges in Data Collection and Preprocessing
Main questions:
- what data to collect and how to collect it; what to exclude
- how to identify requests associated with a unique user session (HTTP is “stateless”)
- how to identify/define user transactions
- how to identify the basic unit of analysis (e.g., pageviews, items purchased, user ratings, etc.)
- how to integrate data across channels: e-commerce data, clickstream data, user profiles, social media data, product meta data, etc.

Usage Data Preparation Tasks
Data cleaning
- remove irrelevant references and fields in server logs
- remove references due to spider navigation
- add missing references due to client-side caching
Data integration
- synchronize data from multiple server logs
- integrate e-commerce and application server data
- integrate meta-data
Data transformation
- pageview identification
- identification of product-oriented events
- identification of unique users
- sessionization: partitioning each user’s record into multiple sessions or transactions (usually representing different visits)
- integrating meta-data and user profile data with user sessions
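Two of the tasks above, removing irrelevant and spider references and sessionization, can be sketched as follows. This is one common heuristic rather than the method prescribed by the slides. It assumes that a user is identified by the pair (client IP, user agent), that a gap of more than 30 minutes starts a new session, and that spiders can be recognized by substrings in the user agent. The constants, the record layout, and the sample data are all illustrative assumptions.

```python
from datetime import datetime, timedelta

IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")  # assumed embedded objects
BOT_MARKERS = ("bot", "spider", "crawler")                  # assumed spider markers
SESSION_TIMEOUT = timedelta(minutes=30)                     # assumed inactivity limit

def clean(requests):
    """Drop requests for embedded objects and requests made by spiders."""
    kept = []
    for r in requests:
        if r["uri"].lower().endswith(IGNORED_SUFFIXES):
            continue
        if any(m in r["user_agent"].lower() for m in BOT_MARKERS):
            continue
        kept.append(r)
    return kept

def sessionize(requests):
    """Group requests per user, starting a new session after a long gap."""
    sessions = []
    last_seen = {}  # user key -> (time of last request, that user's open session)
    for r in sorted(requests, key=lambda r: r["timestamp"]):
        key = (r["client_ip"], r["user_agent"])
        prev = last_seen.get(key)
        if prev is None or r["timestamp"] - prev[0] > SESSION_TIMEOUT:
            current = []
            sessions.append(current)
        else:
            current = prev[1]
        current.append(r["uri"])
        last_seen[key] = (r["timestamp"], current)
    return sessions

# Made-up sample records for illustration.
t0 = datetime(2006, 2, 1, 0, 8, 43)
human = {"client_ip": "1.2.3.4", "user_agent": "Mozilla/4.0"}
log = [
    {**human, "uri": "/index.html", "timestamp": t0},
    {**human, "uri": "/logo.gif", "timestamp": t0 + timedelta(seconds=1)},
    {**human, "uri": "/papers.html", "timestamp": t0 + timedelta(minutes=5)},
    {**human, "uri": "/index.html", "timestamp": t0 + timedelta(hours=2)},
    {"client_ip": "5.6.7.8", "user_agent": "ExampleBot/1.0",
     "uri": "/index.html", "timestamp": t0},
]
sessions = sessionize(clean(log))
```

In this sample the image request and the spider request are discarded, and the two-hour gap splits the remaining requests into two sessions. Identification by IP and user agent is only a fallback; cookies or login information, when available, identify users more reliably.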

Conceptual Representation of User Transactions or Sessions
[Figure: matrix of sessions/user transactions by pageviews/objects]
This is the typical representation of the data, after preprocessing, that is used as input to data mining algorithms. Raw weights may be binary, based on time spent on a page, or based on other measures of user interest in an item. In practice, this data needs to be normalized or standardized.
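A small sketch of this representation, assuming the weights are seconds spent on each pageview and that normalization means scaling each session's weights to sum to 1. Both choices are illustrative; the slides do not fix a particular weighting or normalization scheme, and the page names and numbers are made up.

```python
def build_matrix(sessions, pageviews):
    """One row per session, one column per pageview; 0.0 if not visited."""
    return [[float(s.get(p, 0.0)) for p in pageviews] for s in sessions]

def normalize_rows(matrix):
    """Scale each row to sum to 1 so long and short sessions are comparable."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([w / total if total else 0.0 for w in row])
    return out

# Made-up sessions: pageview -> seconds spent on it.
pageviews = ["A.html", "B.html", "C.html"]
sessions = [
    {"A.html": 30, "B.html": 90},
    {"B.html": 10, "C.html": 10},
]
raw = build_matrix(sessions, pageviews)
matrix = normalize_rows(raw)
```

With these numbers the first session becomes the row 0.25, 0.75, 0.0 and the second becomes 0.0, 0.5, 0.5. Binary weights would simply replace every nonzero entry of `raw` with 1.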

Web Usage Mining as a Process

E-Commerce Data
Integrating E-Commerce and Usage Data
- Needed for analyzing relationships between navigational patterns of visitors and business questions such as profitability, customer value, product placement, etc.
- E-business / Web analytics, e.g., tracking and analyzing conversion of browsers to buyers
E-Commerce vs. Simple Usage Data
- E-commerce data is product oriented while usage data is pageview oriented
- Usage events (pageviews) are well defined and have consistent meaning across all Web sites
- E-commerce events are often only applicable to specific domains, and the definition of certain events can vary from site to site
- Major difficulty for usage events is getting accurate preprocessed data
- Major difficulty for e-commerce events is defining and implementing the events for a particular site

Why We Need Web Analytics
- Are we attracting new people to our site?
- Is our site “sticky”? Which regions in it are not?
- What is the health of our lead qualification process?
- How adept is our conversion of browsers to buyers?
- What behavior indicates purchase propensity?
- What site navigation do we wish to encourage?
- How can profiling help us cross-sell and up-sell?
- How do customer segments differ?
- What attributes describe our best customers? Can we target other prospects like them?
- What makes customers loyal? How do we measure loyalty?

Three Skill Sets Required
- Technology: How do we get the data? Are we collecting the right data?
- Analytics: How do we turn the data into insightful information?
- Business Management: What action do we take? How do we measure the impact of that action?
Corresponding areas: Data Collection / Preprocessing / Integration; Analysis Tools, OLAP, Data Mining; E-Metrics

Using Analytics for E-Business Management
- Navigation Calibration
- Calculating Content Popularity
- Freshness (refresh rate / visit frequency < 1?)
- Stickiness / Slipperiness / Leakage
- Stimulus - Inducement
- Conversion Quotient
- Interaction Computation
- Customer Service Assessment
- Customer Experience Evaluation
- Branding

Web Usage and E-Business Analytics
Different levels of analysis:
- Session Analysis
- Static Aggregation and Statistics
- OLAP
- Data Mining

Session Analysis
Simplest form of analysis: examine individual or groups of server sessions and e-commerce data.
Advantages:
- Gain insight into typical customer behaviors.
- Trace specific problems with the site.
Drawbacks:
- LOTS of data.
- Difficult to generalize.

Static Aggregation (Reports)
Most common form of analysis. Data is aggregated by predetermined units such as days or sessions. Generally gives most “bang for the buck.”
Advantages:
- Gives quick overview of how a site is being used.
- Minimal disk space or processing power required.
Drawbacks:
- No ability to “dig deeper” into the data.
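A static report of this kind amounts to counting over a predetermined unit. The sketch below is a minimal illustration, assuming each log entry is a dictionary with hypothetical `date` and `uri` keys; the sample entries are made up.

```python
from collections import Counter

def pageviews_per_day(entries):
    """Count requests per calendar day, returned in date order."""
    counts = Counter(e["date"] for e in entries)
    return dict(sorted(counts.items()))

def top_pages(entries, n=3):
    """Return the n most requested pages with their request counts."""
    return Counter(e["uri"] for e in entries).most_common(n)

# Made-up entries for illustration.
entries = [
    {"date": "2006-02-01", "uri": "/index.html"},
    {"date": "2006-02-01", "uri": "/papers.html"},
    {"date": "2006-02-02", "uri": "/index.html"},
]
daily = pageviews_per_day(entries)
top = top_pages(entries, n=1)
```

The aggregation unit is fixed in the code, which mirrors the drawback noted above: answering a new question (for example, pageviews per day per page) means writing a new report.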

Online Analytical Processing (OLAP)
Allows changes to aggregation level for multiple dimensions. Generally associated with a data warehouse.
Advantage: very flexible.
Drawback: requires significantly more resources than static reporting.
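The idea of changing the aggregation level across dimensions can be illustrated with a toy roll-up. This is only a sketch of the concept, not an OLAP engine: it assumes a small in-memory fact table with hypothetical `month`, `day`, and `section` dimensions and a `pageviews` measure, all made up for the example.

```python
from collections import defaultdict

def roll_up(facts, dimensions, measure="pageviews"):
    """Sum the measure over every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for f in facts:
        key = tuple(f[d] for d in dimensions)
        totals[key] += f[measure]
    return dict(totals)

# Made-up fact table for illustration.
facts = [
    {"month": "2006-01", "day": "2006-01-30", "section": "courses", "pageviews": 120},
    {"month": "2006-01", "day": "2006-01-31", "section": "people", "pageviews": 40},
    {"month": "2006-02", "day": "2006-02-01", "section": "courses", "pageviews": 200},
    {"month": "2006-02", "day": "2006-02-01", "section": "people", "pageviews": 60},
]
by_month = roll_up(facts, ["month"])
by_month_section = roll_up(facts, ["month", "section"])
```

Passing a different list of dimensions changes the level of detail without changing the code, which is the flexibility the slide refers to. A real OLAP system typically precomputes or indexes such aggregates, which is where the extra resource cost comes from.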

Data Mining: Going Deeper
Frequent Itemsets and Association Rules
- The “Donkey Kong Video Game” and “Stainless Steel Flatware Set” product pages are accessed together in 1.2% of the sessions.
- When the “Shopping Cart Page” is accessed in a session, “Home Page” is also accessed 90% of the time.
- When the “Stainless Steel Flatware Set” product page is accessed in a session, the “Donkey Kong Video” page is also accessed 5% of the time.
- 30% of clients who accessed /special-offer.html placed an online order in /products/software/
Sequential Patterns
- Add an extra dimension to frequent itemsets and association rules: time (“x% of the time, when AB appears in a transaction, C appears within z transactions”)
- 40% of people who bought the book “How to cheat IRS” booked a flight to South America 6 months later
- The “Video Game Caddy” page view is accessed after the “Donkey Kong Video Game” page view 50% of the time. This occurs in 1% of the sessions.
- 15% of visitors followed the path home > * > software > * > shopping cart > checkout
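The two kinds of percentages in the association examples above correspond to support (the share of sessions containing a set of pages) and confidence (the share of sessions containing one page that also contain another). A minimal sketch of both measures follows; the session data and page names are made up for illustration.

```python
def support(sessions, items):
    """Fraction of sessions that contain every page in `items`."""
    items = set(items)
    return sum(1 for s in sessions if items <= set(s)) / len(sessions)

def confidence(sessions, antecedent, consequent):
    """Among sessions containing the antecedent, fraction that also contain the consequent."""
    both = support(sessions, set(antecedent) | set(consequent))
    ante = support(sessions, antecedent)
    return both / ante if ante else 0.0

# Made-up sessions, each reduced to the set of pages accessed.
sessions = [
    {"home", "cart"},
    {"home", "cart"},
    {"home"},
    {"product"},
]
s_home_cart = support(sessions, ["home", "cart"])
c_cart_home = confidence(sessions, ["cart"], ["home"])
c_home_cart = confidence(sessions, ["home"], ["cart"])
```

Here the pair (home, cart) has support 0.5, every session with the cart page also has the home page (confidence 1.0), but only two of the three sessions with the home page reach the cart. The asymmetry is the same one seen in the slide's 90% and 5% examples.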

Data Mining: Going Deeper
Clustering: Content-Based or Usage-Based
- Customer/visitor segmentation
- Categorization of pages and products
Classification
- Classifying users into behavioral groups (browser, likely to purchase, loyal customer, etc.)
Examples:
- Customers who access Video Game Product pages, have income of 50K+, and have 1 or more children should get a banner ad for Xbox in their next visit.
- Customers who make at least 4 purchases in one year should be categorized as “loyal”.
- Loan applicants in the 45K-60K income range, with low debt and good-excellent credit, should be approved for a new mortgage.

Example: Path Analysis for Ecommerce
Visit
- No Search (90%): avg sale per visit $X
- Search (10%; 64% successful): avg sale per visit 2.2X
  - Last Search Failed (30%): avg sale per visit 0.9X
  - Last Search Succeeded (70%): avg sale per visit 2.8X
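The figures in this example can be checked for consistency with a short calculation. The sketch assumes the following reading of the flattened diagram: 10% of visits use search and 90% do not, the non-search average sale is X, and among search visits 30% end with a failed search (0.9X) and 70% end with a successful one (2.8X). Under that reading, the weighted average of the two search branches should reproduce the 2.2X reported for search visits.

```python
# All sale values are expressed as multiples of X, the non-search average sale.
p_failed, p_succeeded = 0.30, 0.70
sale_failed, sale_succeeded = 0.9, 2.8

# Weighted average over the two search outcomes.
sale_search = p_failed * sale_failed + p_succeeded * sale_succeeded

# Overall average sale per visit across search and non-search visits.
p_search, p_no_search = 0.10, 0.90
sale_no_search = 1.0
sale_overall = p_search * sale_search + p_no_search * sale_no_search
```

The search average works out to about 2.23X, which matches the 2.2X on the slide after rounding, and the overall average per visit is about 1.12X.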

Example: Association Analysis for Ecommerce
Product | Association | Lift | Confidence | Website Recommended Products
J Jasper Towels | Fully Reversible Mats | 456 | 41% | Egyptian Cotton Towels (confidence 1.4%)
White Cotton T-Shirt Bra | Plunge | 246 | 25% | Black embroidered underwired bra (confidence 1%)
Confidence: 41% of those who purchased Fully Reversible Mats also purchased Egyptian Cotton Towels
Lift: people who purchased Fully Reversible Mats were 456 times more likely to purchase the Egyptian Cotton Towels compared to the general population
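Lift, as used above, compares the confidence of a rule with the rate at which the second product is bought by customers in general. A minimal sketch of the standard definition follows. The purchase counts are hypothetical and are not the data behind the slide's figures.

```python
def rule_confidence(n_a, n_both):
    """Share of buyers of product A who also bought product B."""
    return n_both / n_a

def lift(n_total, n_a, n_b, n_both):
    """Confidence of A -> B divided by the overall purchase rate of B."""
    baseline = n_b / n_total
    return rule_confidence(n_a, n_both) / baseline

# Hypothetical counts: 10,000 customers, 100 bought A, 50 bought B, 40 bought both.
conf_ab = rule_confidence(n_a=100, n_both=40)
lift_ab = lift(n_total=10_000, n_a=100, n_b=50, n_both=40)
```

With these counts the confidence is 40% while only 0.5% of all customers bought B, so buyers of A are roughly 80 times more likely to buy B than the general population. A very high lift usually reflects a consequent that is rarely bought overall, which is why lift and confidence are reported together.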

Web Usage Mining: clustering example
Transaction Clusters: clustering similar user transactions and using the centroid of each cluster as a usage profile (representative for a user segment)
Sample cluster centroid from dept. Web site (cluster size = 330)
Support | URL | Pageview Description
1.00 | /courses/syllabus.asp?course=450-96-303&q=3&y=2002&id=290 | SE 450 Object-Oriented Development class syllabus
0.97 | /people/facultyinfo.asp?id=290 | Web page of a lecturer who taught the above course
0.88 | /programs/ | Current Degree Descriptions 2002
0.85 | /programs/courses.asp?depcode=96&deptmne=se&courseid=450 | SE 450 course description in SE program
0.82 | /programs/2002/gradds2002.asp | M.S. in Distributed Systems program description
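The support values in a centroid like the one above can be read as the fraction of the cluster's transactions that contain each pageview. The sketch below computes such a centroid for a cluster that has already been formed, assuming binary weights and a hypothetical `min_support` threshold for deciding which pageviews appear in the profile. The clustering step itself is not shown, and the shortened page names are made up.

```python
def centroid(cluster, pageviews):
    """Fraction of transactions in the cluster that contain each pageview."""
    n = len(cluster)
    return {p: sum(1 for t in cluster if p in t) / n for p in pageviews}

def usage_profile(cluster, pageviews, min_support=0.5):
    """Pageviews whose centroid weight meets the threshold, highest first."""
    weights = centroid(cluster, pageviews)
    ranked = sorted(((w, p) for p, w in weights.items() if w >= min_support),
                    reverse=True)
    return [(p, w) for w, p in ranked]

# Made-up cluster of four transactions, each a set of pageviews.
pageviews = ["/syllabus", "/faculty", "/programs"]
cluster = [
    {"/syllabus", "/faculty"},
    {"/syllabus", "/programs"},
    {"/syllabus", "/faculty"},
    {"/syllabus"},
]
weights = centroid(cluster, pageviews)
profile = usage_profile(cluster, pageviews)
```

In this sample every transaction contains the syllabus page (weight 1.0), half contain the faculty page, and the programs page falls below the threshold and is left out of the profile.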

Basic Framework for E-Commerce Data Analysis
[Figure: architecture diagram. Labels: Operational Database (customers, orders, products); Content Analysis Module; Web/Application Server Logs; Data Cleaning / Sessionization; Site Map; Site Dictionary; Integrated Sessionized Data; Integration; E-Commerce Data Mart; Data Mining Engine; OLAP Tools; Usage; Pattern; Data Cube]