A Proposed Methodology for E-Business Intelligence Measurement Using Data Mining Techniques Stavros Valsamidis, Ioannis Kazanidis, Sotirios Kontogiannis.

A Proposed Methodology for E-Business Intelligence Measurement Using Data Mining Techniques
Stavros Valsamidis, Ioannis Kazanidis, Sotirios Kontogiannis, Alexandros Karakos
PCI 2014

Outline: Introduction, Method, Results, Discussion, Limitations, Conclusions

Introduction (1/7)
E-business • Business Intelligence • Knowledge Discovery in Databases • Data Mining

Introduction (2/7)
E-business: E-business refers to any business that uses the Internet and related technologies. It is the conducting of business on the Internet: not only buying and selling, but also servicing customers and collaborating with business partners.
Intelligence: Luhn defined intelligence as "the ability to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal".

Introduction (3/7)
Business Intelligence: Business Intelligence (BI) is the emerging discipline that aims at combining corporate data with textual user-generated content (UGC), letting decision-makers analyze their business based on the trends perceived from the environment.

Introduction (4/7)
Knowledge Discovery in Databases: The term Knowledge Discovery in Databases (KDD) was coined in 1989 to refer to the broad process of finding knowledge in data, and to emphasize the "high-level" application of particular data mining (DM) methods.

Introduction (5/7)
Data Mining: The main goal of data mining is to search for relationships and distinct patterns that exist in datasets but are "hidden" among the vast amount of data.

Introduction (6/7)
Indexes and metrics for measuring the usage of web applications have been proposed by the authors. However, there are no metrics specifically for measuring e-business usage in terms of BI. This study contributes to the area of web usage analysis for e-business intelligence by "marrying" e-business with data mining. Four metrics are applied, for the first time, in the field of e-business.

Introduction (7/7)
This paper proposes an iterative method for designing and maintaining BI applications that reorganizes the activities and tasks normally carried out by practitioners. The method is accompanied by a case study in the consumer goods area, aimed at demonstrating that the adoption of a structured methodology positively impacts project success.

Method

Method – Steps (1/5)
• Logging data: specific data are logged from the e-business system, namely twelve (12) fields (request_time_event, remote_host, request_uri, remote_logname, remote_user, request_method, request_time, request_protocol, status, bytes_sent, referer, agent), together with the user requests for the different products.
• Pre-processing: the data contain noise, such as URLs, emoticons and symbols like asterisks and hashes, which has to be cleaned.
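
A minimal Python sketch of the logging and pre-processing steps follows. It assumes, as a simplification, that the log uses Apache's standard combined format (the custom fields listed above, such as request_time_event, are not reproduced) and that a product is identified by a hypothetical `pid` query parameter in the request URI. The noise patterns are illustrative assumptions, not the authors' cleaning rules.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Apache "combined" log format (assumption: the case-study system logged custom fields).
COMBINED = re.compile(
    r'(?P<remote_host>\S+) (?P<remote_logname>\S+) (?P<remote_user>\S+) '
    r'\[(?P<request_time>[^\]]+)\] '
    r'"(?P<request_method>\S+) (?P<request_uri>\S+) (?P<request_protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes_sent>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Illustrative noise patterns: URLs, simple emoticons, runs of asterisks or hashes.
NOISE = re.compile(r'https?://\S+|[:;]-?[()DPp]|[*#]+')


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = COMBINED.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Hypothetical convention: the product is identified by ?pid=... in the URI.
    query = parse_qs(urlsplit(record["request_uri"]).query)
    record["product_id"] = query.get("pid", [None])[0]
    return record


def clean_text(text):
    """Remove noise tokens from a free-text field and collapse whitespace."""
    return " ".join(NOISE.sub(" ", text).split())
```

Lines that do not match the format are returned as None so they can be discarded rather than silently mis-parsed.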

Method – Steps (2/5)
• Indexes, metrics and rates:

Attribute name | Description of the attribute
Sessions | The number of sessions per product viewed by users
Pages | The number of pages per product viewed by users
Unique pages | The number of unique pages per product viewed by users
Unique Pages per ProductID per Session (UPPS) | The number of unique pages per product viewed by users per session
Homogeneity | The homogeneity of products
Enrichment | The enrichment of products
Disappointment | The disappointment of users when they view pages of the products
Interest | The one's complement of Disappointment
Mean rate | The mean rate of usage, combining Enrichment, Homogeneity and Interest
Score | The score of the product usage
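
The four counts above can be aggregated from sessionized page views. The sketch below assumes the log has already been split into sessions (not shown) and turned into hypothetical (session_id, product_id, page) tuples. UPPS is interpreted here, as an assumption, as the sum over sessions of the distinct pages of a product viewed in each session.

```python
from collections import defaultdict


def usage_counts(views):
    """Aggregate (session_id, product_id, page) tuples into per-product counts."""
    pages = defaultdict(int)          # total page views per product
    sessions = defaultdict(set)       # distinct sessions per product
    unique_pages = defaultdict(set)   # distinct pages per product
    per_session = defaultdict(set)    # distinct pages per (product, session)

    for session_id, product_id, page in views:
        pages[product_id] += 1
        sessions[product_id].add(session_id)
        unique_pages[product_id].add(page)
        per_session[(product_id, session_id)].add(page)

    # Assumed UPPS: sum over sessions of the distinct pages seen in that session.
    upps = defaultdict(int)
    for (product_id, _session_id), seen in per_session.items():
        upps[product_id] += len(seen)

    return {
        pid: {
            "Sessions": len(sessions[pid]),
            "Pages": pages[pid],
            "Unique pages": len(unique_pages[pid]),
            "UPPS": upps[pid],
        }
        for pid in pages
    }
```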

Method – Steps (3/5)
• Indexes, metrics and rates:
Enrichment = 1 - (Unique pages / Total pages)
Disappointment = Sessions / Total pages
Interest = 1 - Disappointment
Homogeneity = Unique pages / Total sessions
Mean rate = (Enrichment + Homogeneity + Interest) / 3
Score = Mean rate × UPPS
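
The formulas above translate directly into code. This is a sketch of the arithmetic only; the function name and the guard against zero denominators are additions for illustration.

```python
def usage_metrics(sessions, pages, unique_pages, upps):
    """Compute the slide's rates for one product from its raw counts.

    Raises ValueError when a denominator (pages or sessions) is not positive.
    """
    if pages <= 0 or sessions <= 0:
        raise ValueError("pages and sessions must be positive")
    enrichment = 1 - unique_pages / pages        # share of page views that are repeats
    disappointment = sessions / pages            # sessions per page view
    interest = 1 - disappointment                # one's complement of Disappointment
    homogeneity = unique_pages / sessions        # distinct pages divided by sessions
    mean_rate = (enrichment + homogeneity + interest) / 3
    score = mean_rate * upps
    return {
        "Enrichment": enrichment,
        "Disappointment": disappointment,
        "Interest": interest,
        "Homogeneity": homogeneity,
        "Mean rate": mean_rate,
        "Score": score,
    }
```

For example, hypothetical counts of 314 sessions, 1000 pages and 40 unique pages give Enrichment 0.96, Disappointment 0.314, Interest 0.686, Homogeneity ≈ 0.127 and Mean rate ≈ 0.591; Score then scales the Mean rate by UPPS.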

Method – Steps (4/5)
• Data mining techniques: data mining techniques are applied so that the relevant data can be analyzed. Classification, clustering and association rule mining are used, based on the metrics of the third step.
• For classification, the 1R algorithm may be applied.
• The clustering step includes product clustering, based on the Purchases attribute.
• Clustering of user visits is performed with the k-means algorithm.

Method – Steps (5/5)
• Data mining techniques: association rule mining finds relationships among attributes in databases, revealing if-then statements about attribute values.
• An association rule X ⇒ Y expresses a close correlation among items in a database: in the transactions in which X occurs, there is also a high probability that Y occurs.
• In an association rule, X and Y are called the antecedent and the consequent of the rule, respectively.
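
The "high probability of Y given X" is usually quantified by support and confidence. A minimal sketch, with transactions represented as Python sets of items (a representation chosen for illustration):

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= frozenset(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent => consequent.

    Returns 0.0 when the antecedent never occurs.
    """
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base
```

With transactions {A, B}, {A, B}, {A}, {B}, the rule A ⇒ B has support 0.5 and confidence 0.5 / 0.75 ≈ 0.67.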

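Results (1/6)
Study population and context
The data of the 40 products are ranked in descending order of Score; the first ten rows are:

Product ID | Sessions | Pages | Unique pages | UPPS | Enrichment | Homogeneity | Disappointment | Interest | Mean rate | Score | Purchases
PID… | … | … | … | … | 0.960 | 0.128 | 0.314 | 0.686 | 0.591 | 128.… | …
PID… | … | … | … | … | 0.973 | 0.101 | 0.263 | 0.737 | 0.604 | 109.… | …
PID… | … | … | … | … | 0.966 | 0.051 | 0.672 | 0.328 | 0.448 | 88.… | …
PID… | … | … | … | … | 0.963 | 0.105 | 0.347 | 0.653 | 0.574 | 76.… | …
PID… | … | … | … | … | 0.967 | 0.090 | 0.370 | 0.630 | 0.562 | 74.… | …
PID… | … | … | … | … | 0.946 | 0.094 | 0.578 | 0.422 | 0.487 | 66.… | …
PID… | … | … | … | … | 0.952 | 0.089 | 0.537 | 0.463 | 0.501 | 66.… | …
PID… | … | … | … | … | 0.939 | 0.153 | 0.399 | 0.601 | 0.564 | 61.… | …
PID… | … | … | … | … | 0.946 | 0.218 | 0.249 | 0.751 | 0.638 | 58.… | …
PID… | … | … | … | … | 0.938 | 0.257 | 0.243 | 0.757 | 0.651 | 52.… | …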
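
As a sanity check on the metric definitions, the sketch below re-derives Interest and Mean rate from the Enrichment, Homogeneity and Disappointment values of the ten rows shown, and confirms that the rows are ordered by Score. Only the integer part of each Score is used, and a tolerance of 0.001 allows for the three-decimal rounding in the table.

```python
# (enrichment, homogeneity, disappointment, interest, mean_rate, score_integer_part)
ROWS = [
    (0.960, 0.128, 0.314, 0.686, 0.591, 128),
    (0.973, 0.101, 0.263, 0.737, 0.604, 109),
    (0.966, 0.051, 0.672, 0.328, 0.448, 88),
    (0.963, 0.105, 0.347, 0.653, 0.574, 76),
    (0.967, 0.090, 0.370, 0.630, 0.562, 74),
    (0.946, 0.094, 0.578, 0.422, 0.487, 66),
    (0.952, 0.089, 0.537, 0.463, 0.501, 66),
    (0.939, 0.153, 0.399, 0.601, 0.564, 61),
    (0.946, 0.218, 0.249, 0.751, 0.638, 58),
    (0.938, 0.257, 0.243, 0.757, 0.651, 52),
]


def inconsistent_rows(rows, tol=0.001):
    """Indices of rows whose Interest or Mean rate disagree with the formulas."""
    bad = []
    for i, (e, h, d, interest, mean_rate, _score) in enumerate(rows):
        if abs(interest - (1 - d)) > tol or abs(mean_rate - (e + h + interest) / 3) > tol:
            bad.append(i)
    return bad


def sorted_by_score(rows):
    """True if the Score column is non-increasing."""
    scores = [row[5] for row in rows]
    return all(a >= b for a, b in zip(scores, scores[1:]))
```

Both checks pass for all ten rows, which is consistent with Interest = 1 - Disappointment and Mean rate = (Enrichment + Homogeneity + Interest) / 3.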

Results (2/6)
Data pre-processing and calculation of the metrics and rates
The data are in ASCII form and were obtained from the Apache server log file.
Application of data mining techniques
• The attributes of the table were imported into Weka in .csv format.
• The attributes Product ID and Disappointment were removed: Product ID is different for each instance, and Disappointment is the complement of the Interest attribute.
• All the remaining attributes were discretized.
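
The slide does not state which discretization was used. As an assumption, the sketch below uses equal-width binning, which is what Weka's unsupervised Discretize filter does by default (with 10 bins); the function names and the `k` parameter are illustrative.

```python
DROP = {"Product ID", "Disappointment"}


def equal_width_bins(values, k=10):
    """Map each value to a bin index 0..k-1 of k equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: everything falls into one bin
        return [0] * len(values)
    width = (hi - lo) / k
    # min(..., k - 1) puts the maximum value into the last bin instead of bin k.
    return [min(int((v - lo) / width), k - 1) for v in values]


def prepare(rows, k=10, drop=DROP):
    """Drop the identifier and redundant attributes, then discretize the rest."""
    kept = [{a: v for a, v in row.items() if a not in drop} for row in rows]
    for column in list(kept[0]):
        bins = equal_width_bins([row[column] for row in kept], k)
        for row, b in zip(kept, bins):
            row[column] = b
    return kept
```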

Results (3/6)
Classification
• In the classification step, the 1R algorithm is applied.
• The attribute Purchases is used as the class.
• The attribute that best describes the classification, i.e. the one selected by 1R, is Score.
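
1R builds one rule per attribute (each attribute value predicts its most frequent class) and keeps the attribute whose rule makes the fewest errors. A minimal sketch for already-discretized (nominal) data follows; Weka's OneR also handles numeric attributes and missing values, which this sketch does not.

```python
from collections import Counter, defaultdict


def one_r(rows, class_attr):
    """1R on nominal data: return (attribute, rule, training_errors).

    Ties between attributes go to the attribute seen first.
    """
    best = None
    for attr in rows[0]:
        if attr == class_attr:
            continue
        # Class frequencies for each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attr]][row[class_attr]] += 1
        # Each value predicts its majority class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c[rule[value]] for value, c in counts.items())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best
```

On hypothetical discretized rows where high Score always coincides with many Purchases, `one_r(rows, "Purchases")` selects Score with zero training errors.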

Results (4/6)
Clustering
The clustering step groups the products based on the Purchases attribute, using the SimpleKMeans algorithm.
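
Weka's SimpleKMeans is a k-means implementation. As a hedged illustration, the sketch below runs plain k-means (Lloyd's algorithm) on a single numeric attribute such as Purchases. The deterministic initialization from evenly spaced sorted values is an assumption made for reproducibility; SimpleKMeans seeds its centroids differently.

```python
def kmeans_1d(values, k, iterations=100):
    """Cluster numbers into k groups; return (centroids, labels)."""
    if not 0 < k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    ordered = sorted(values)
    # Assumed initialization: k evenly spaced values from the sorted data.
    centroids = [ordered[(i * (len(ordered) - 1)) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Update step: each centroid moves to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = []
        for j in range(k):
            members = [v for v, label in zip(values, labels) if label == j]
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids, labels
```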

Results (5/6)
Association rule mining
• The Apriori algorithm was used to find association rules over the discretized data.
• Because of the obvious dependencies between the attributes Sessions, Pages and Unique pages and the attributes Enrichment, Interest and Homogeneity, the latter group of attributes was removed from the data table.
• Weka returned a list of 6 rules, using a minimum support (computed over antecedent and consequent together) of 0.1 and a minimum confidence of 0.9.
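
A minimal Apriori sketch using the thresholds above (minimum support 0.1, minimum confidence 0.9) follows. Each discretized row becomes a transaction of "attribute=value" items, a representation chosen for illustration. Weka's Apriori additionally lowers the support step by step until it finds a requested number of rules; that search is not reproduced here.

```python
from itertools import combinations


def apriori_rules(rows, min_support=0.1, min_confidence=0.9):
    """Return (antecedent, consequent, support, confidence) tuples."""
    transactions = [frozenset(f"{a}={v}" for a, v in row.items()) for row in rows]
    n = len(transactions)
    frequent = {}  # frequent itemset -> support

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    def keep_frequent(candidates):
        kept = []
        for c in candidates:
            s = support(c)
            if s >= min_support:
                frequent[c] = s
                kept.append(c)
        return kept

    # Level 1: frequent single items.
    level = keep_frequent({frozenset([i]) for t in transactions for i in t})
    k = 2
    while level:
        # Join: unions of two frequent (k-1)-itemsets with exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset must be frequent (Apriori property).
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        level = keep_frequent(candidates)
        k += 1

    rules = []
    for itemset, supp in frequent.items():
        # Split each frequent itemset into antecedent => consequent.
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                conf = supp / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, supp, conf))
    return rules
```

Because every subset of a frequent itemset is itself frequent, the antecedent's support is always available in `frequent` when confidence is computed.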

Results (6/6)
Association rule mining
• Some rules are uninteresting, such as rule 1.
• Some rules are similar: they contain the same elements with antecedent and consequent interchanged, such as the pairs of rules 3, 4 and 5, 6.
• The rules indicate that purchases of the products depend on their scores.

Discussion (1/2)
That many pages within useful paths contribute to increased usage is fairly obvious: the more and better the content on a site, the more a user may visit it. Administrators should therefore add useful and helpful pages to the site. If a site is essentially blank but customers are required to visit it every day and contribute a comment, usage will necessarily be high. Conversely, if a web site is very elaborate, with rich content, but reading it is not required, limited usage would be expected.

Discussion (2/2)
Rule 2 is highly actionable for administrators, since it lets them pay more attention to the products with low values of Score and Sessions. An increase in sessions means that more users (customers) are using the e-business system. Of course, some customers only read the product information just before making their purchases.

Limitations
A limitation of the study is that only 40 products in one e-business system were investigated; this matters especially for the data mining techniques, which demand large datasets. However, this was unavoidable, since the e-business system of the case study had only this number of active online products.

Conclusions (1/3)
The proposed iterative method uses existing tools and techniques in a novel way to analyze the usage of e-business systems. It uses the metrics Enrichment, Homogeneity, Disappointment and Interest, and incorporates clustering, classification and association rule mining.

Conclusions (2/3)
Advantages
I. It is independent of any specific e-business system, since it is based on the Apache log files rather than on the e-business system itself. Thus, it can easily be implemented for any e-business system.
II. It uses indexes and metrics to facilitate the evaluation of each product.
III. It offers a company useful information for determining which parts of its web site to improve.

Conclusions (3/3)
I. This approach may be applied after a long period of data tracking.
II. The proposed approach may also be applied to other web applications, such as e-government, e-learning, e-banking, blogs and social networks.

Thank you!
Stavros Valsamidis, Ioannis Kazanidis, Sotirios Kontogiannis, Alexandros Karakos
TEI of Kavala, Kavala, Greece