Clickstream analysis - data collection, preprocessing and mining using the LISp-Miner system. Effective placement of on-line advertisements. Tomáš Kliegr.

Presentation transcript:

Clickstream analysis - data collection, preprocessing and mining using the LISp-Miner system. Effective placement of on-line advertisements. Tomáš Kliegr. A case study approach.

2 Methodology – CRISP-DM

3 I. Data collection: Data are collected on the server application layer. No demands on the tracked website.

4 Comparison with log-file based approaches: works with all browsers with cookies enabled; automatic robot filtering; storage efficiency; easy to integrate and safe to operate.

5 II. Data preprocessing. Problem: collected clickstreams have varying lengths. Goal: create a higher-level abstraction of the visitor. This phase creates a fixed-length visitor profile in a two-step process. Segment procedure: classifies pages into a domain-specific taxonomy on several levels of granularity. Merge procedure: extracts important and characteristic information from the visitor's clickstream.

6 Assigning pages to categories. Inputs: visited pages (URL addresses stored in a database) and a prespecified taxonomy (tuples ProductID – category, tuples URL pattern – category). Processing: SQL Server stored procedure (SP) Segment. Output: pages classified on several levels of granularity.
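
The classification step on this slide can be sketched in a few lines. This is only an illustration of matching pages against a prespecified taxonomy, not the actual Segment stored procedure: the URL patterns, product IDs and the `classify` function are hypothetical, and only the example values (Search, Catalogue, Alps) come from the presentation.

```python
import re

# Hypothetical taxonomy, URL pattern -> (Cat, ECat). Patterns are invented for illustration.
URL_TAXONOMY = [
    (re.compile(r"^/search/catalogue"), ("Search", "Catalogue")),
    (re.compile(r"^/search/fulltext"), ("Search", "Fulltext")),
    (re.compile(r"^/info/discounts"), ("General information", "Discounts")),
]

# Hypothetical taxonomy, ProductID -> Topic.
PRODUCT_TAXONOMY = {"1001": "Alps"}


def classify(url, product_id=None):
    """Assign a page to categories on several levels of granularity."""
    cat, ecat = "Other", "Other"
    for pattern, (pattern_cat, pattern_ecat) in URL_TAXONOMY:
        if pattern.search(url):
            cat, ecat = pattern_cat, pattern_ecat
            break  # first matching pattern wins (an assumption)
    topic = PRODUCT_TAXONOMY.get(product_id)
    return {"Cat": cat, "ECat": ecat, "Topic": topic}
```

Pages that match no pattern fall into a catch-all "Other" category here; how the real procedure treats unmatched pages is not stated in the slides.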

7 Segment procedure: Classifies pages into a domain-specific taxonomy on several levels of granularity. Assigns Time on page and Score to each page in the visitor's clickstream. Score expresses the absolute weight of a particular page in the user's clickstream: S = (ln(o) + 1) * t, where o is the order of the page in the user's clickstream and t is the time on page.
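
The score formula can be checked with a short sketch. It assumes that the page order o starts at 1 (so the first page gets S = t) and that t is measured in seconds; the slides state neither, and the function names are illustrative.

```python
import math


def page_score(order, time_on_page):
    """Score S = (ln(o) + 1) * t for the page at 1-based position `order`."""
    if order < 1:
        raise ValueError("order must be >= 1 (assumed 1-based)")
    return (math.log(order) + 1.0) * time_on_page


def score_clickstream(times_on_page):
    """Score every page of a clickstream given the per-page times in visit order."""
    return [page_score(o, t) for o, t in enumerate(times_on_page, start=1)]
```

Under this reading, pages visited later in the session weigh more for the same time on page, but the weight grows only logarithmically with position.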

8 Segment – Example output for a page: General category (Cat): Search; Extended category (ECat): Catalogue; Topic: Alps.

9 Merge procedure: This procedure creates the visitor profile. Basic attributes (6): Total time on web, Number of displayed pages, Day of week, Hour of day, Referring domain (constituted by URL and Cat attributes). Important points on the path (12): Entry page, Exit page, Conversion page (each described by Page name, Cat, ECat and S). Attributes conceptualizing the path (11): Range of interest, Most favourite topic (Topic, S), Search total (S) and Search analytically (Fulltext (S), Extended search (S), Catalogue search (S)), General information pages total (S) and analytically (Discounts (S), Insurance (S), About (S)).
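
A reduced sketch of the merge idea follows: a variable-length clickstream goes in, a fixed-length profile comes out. It covers only a handful of the attributes listed on this slide. The field names and the rule for the most favourite topic (the topic with the highest summed score) are assumptions, not the heuristic used in the case study.

```python
from collections import defaultdict


def merge(clickstream):
    """Collapse a list of classified pages into a fixed-length visitor profile.

    Each page is a dict with keys: name, cat, topic, time, score (assumed layout).
    """
    if not clickstream:
        raise ValueError("clickstream must contain at least one page")

    # Sum scores per topic; pages without a topic are ignored.
    topic_scores = defaultdict(float)
    for page in clickstream:
        if page.get("topic"):
            topic_scores[page["topic"]] += page["score"]
    favourite = max(topic_scores, key=topic_scores.get) if topic_scores else None

    return {
        "total_time": sum(p["time"] for p in clickstream),
        "page_count": len(clickstream),
        "entry_page": clickstream[0]["name"],
        "exit_page": clickstream[-1]["name"],
        "favourite_topic": favourite,
        "favourite_topic_score": topic_scores.get(favourite, 0.0),
        "search_total": sum(p["score"] for p in clickstream if p["cat"] == "Search"),
    }
```

Whatever the length of the input, the result always has the same keys, which is what lets every visitor become one row of the data matrix.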

10 Merge – example output

11 III. Data mining: Association rules are the most frequently used approach [Facci, Lanza]. LISp-Miner system: 4ft-Miner, SD4ft-Miner. Sample task: From which referring class of websites do most converted visitors come?

12 Choosing the right quantifier: LISp-Miner offers a range of quantifiers. Founded implication – Support: a, or a/(a+b+c+d) – Confidence: a/(a+b) – Problem: tight dependencies are rarely found and rarely required in clickstream data. Above average quantifier: "Among objects satisfying Ant there are at least 100*p per cent more objects satisfying Suc than there are objects satisfying Suc in the whole data matrix." (LISp-Miner Help)

13 Illustration
Ant/Suc          Conversion   Not(Conversion)
Partner webs     7            63
Not(PW)          7            693
Confidence threshold max. <= 7/(63+7) <= 0.1
AAI threshold <= 0.1/0.018
[% of objects satisfying Suc and Ant] = 7/70 = 0.1
[% of objects satisfying Suc in the entire data matrix] = 14/770 = 0.018
LISp-Miner demonstration
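
The arithmetic on this slide can be reproduced with a small sketch. It reads the four-fold table as a = 7, b = 63 (partner webs) and c = 7, d = 693 (other referrers), which totals 770 visitors. The function names are illustrative and this is not LISp-Miner's own implementation. The parameter p follows from the quoted definition of the above average quantifier, a/(a+b) >= (1+p) * (a+c)/(a+b+c+d).

```python
def support(a, b, c, d):
    """Relative support: share of all objects satisfying both Ant and Suc."""
    return a / (a + b + c + d)


def confidence(a, b):
    """Confidence: share of objects satisfying Ant that also satisfy Suc."""
    return a / (a + b)


def above_average_ratio(a, b, c, d):
    """Confidence divided by the share of Suc in the whole data matrix."""
    base_rate = (a + c) / (a + b + c + d)
    return confidence(a, b) / base_rate


# Four-fold table as read from the slide.
a, b, c, d = 7, 63, 7, 693
conf = confidence(a, b)                   # 7/70
ratio = above_average_ratio(a, b, c, d)   # (7/70) / (14/770)
max_p = ratio - 1.0                       # largest p for which the quantifier holds
```

With these counts the confidence is only 0.1, so a founded-implication task with a high confidence threshold would miss the rule, while the ratio to the base rate still marks partner webs as clearly above average.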

14 SD4ft-Miner: Mines for patterns of the form φ ≈ ψ / (α, β, γ). This SD4ft-pattern means that the subsets given by Boolean attributes α, β differ in what concerns the relation of Boolean attributes φ, ψ when condition γ is satisfied. Question: which groups of customers α, β (i.e. depending on where they come from), and under what condition γ, differ remarkably in the probability of conversion? We express "the conversion condition" by setting only the succedent (ψ) and we leave the antecedent unset.
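
The comparison behind such a pattern can be sketched as follows: with the antecedent left unset, the confidence in each subset is simply its conversion rate under the condition, and the two subsets are compared by the difference of those rates. The row layout (`referrer`, `converted`) and the function names are hypothetical, and the sketch does not reproduce SD4ft-Miner's search over candidate subsets and conditions.

```python
def conversion_rate(rows, subset, condition=lambda row: True):
    """Share of converted visitors among rows in `subset` that satisfy `condition`."""
    selected = [row for row in rows if subset(row) and condition(row)]
    if not selected:
        return None  # the subset is empty under this condition
    return sum(1 for row in selected if row["converted"]) / len(selected)


def confidence_difference(rows, first, second, condition=lambda row: True):
    """Difference of conversion rates between two subsets of visitors."""
    rate_first = conversion_rate(rows, first, condition)
    rate_second = conversion_rate(rows, second, condition)
    if rate_first is None or rate_second is None:
        return None
    return rate_first - rate_second
```

A caller would pass two predicates, for example one selecting visitors referred by partner websites and one selecting all other visitors, and rank candidate pairs by the returned difference.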

15

16 4ft-Miner vs SD4ft-Miner: 4ft-Miner, above average quantifier; SD4ft-Miner (neg. gace type for 2nd subset). The value of the increase in the conversion rate is more suitable for our purposes, as the 2nd set is disjoint with the 1st set. The conversion rate for partner webs is 78% higher than the average for other referrers: Conf1/Conf2 = 0.132/0.074 = 1.784.

17 Solution to Task 1 From which referring class of websites do most converted visitors come?

18 SD4ft – cont. If the output is sorted according to the difference of confidence values, the first rule says: the conversion rate for visitors coming from partner websites is 13.2%, while the conversion rate for visitors coming from the company's own websites is only 4.9%.

19 Review: The goal of the second run of the CRISP-DM cycle is to: extend available info - log user actions; improve the heuristics for the Most favourite topic; involve page texts; new development platform – Ferda boxes.

20

21 References: Rauch, J., Šimůnek, M.: An Alternative Approach to Mining Association Rules. In: Foundations of Data Mining and Knowledge Discovery. Berlin, 2005. Rauch, J., et al.: Mining for Patterns Based on Contingency Tables by KL-Miner - First Experience. In: Foundations and Novel Approaches in Data Mining. Berlin: Springer, 2005. Strossa, P., et al.: Reporting Data Mining Results in a Natural Language. In: ibid. Kováč, M., et al.: Ferda, New Visual Environment for Data Mining. Znalosti 2006. LM Report Asistent. Znalosti 2007. Lispminer.vse.cz, ferda.sourceforge.net/