
Web Usage Mining
Sara Vahid

Agenda
- Introduction
- Web Usage Mining Procedure
- Preprocessing Stage
- Pattern Discovery Stage
- Data Mining Approaches
- Sample Methods
- Conclusions
- References

Introduction
The World Wide Web is growing rapidly, and the number of users increases every day. Web search engines need to extract accurate information from this growing body of data. Web Usage Mining is the application of data mining techniques to discover interesting usage patterns from Web data.

Web Usage Mining Procedure

Preprocessing Stage

Raw Data (Transaction Logs)
Transaction logs record the communication between users and the system. (The W3C is an organization that defines transaction log formats.)
Preprocessing of transaction logs includes:
- Data Cleaning
- User Identification (an identifier can be assigned by the search engine)
- Session Identification (the set of pages visited by a user within the duration of a particular visit)
- Transaction Construction (subsets of a user session consisting of homogeneous pages)
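As an illustration of the raw data and the cleaning step, the Python sketch below parses lines in the Common Log Format (host, identity, user, timestamp, request line, status, size) and keeps only successful page requests. The sample lines, the `STATIC_SUFFIXES` list, and the filtering rules are hypothetical choices for illustration; real logs often use an extended format with more fields.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

# Hypothetical cleaning rule: these suffixes are treated as embedded
# resources (images, style sheets, scripts) rather than page views.
STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico")

def parse_line(line):
    """Return a dict of fields for one CLF line, or None if it does not match."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    record = m.groupdict()
    parts = record["request"].split()
    record["method"] = parts[0] if len(parts) > 0 else ""
    record["url"] = parts[1] if len(parts) > 1 else ""
    record["status"] = int(record["status"])
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    return record

def clean(lines):
    """Keep successful GET requests for pages; drop malformed lines and static files."""
    kept = []
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        if rec["method"] != "GET" or not 200 <= rec["status"] < 300:
            continue
        if rec["url"].lower().endswith(STATIC_SUFFIXES):
            continue
        kept.append(rec)
    return kept

sample = [
    '192.0.2.1 - - [10/Oct/2013:13:55:36 +0000] "GET /products/games.html HTTP/1.1" 200 2326',
    '192.0.2.1 - - [10/Oct/2013:13:55:37 +0000] "GET /img/logo.png HTTP/1.1" 200 512',
    '192.0.2.7 - - [10/Oct/2013:13:56:02 +0000] "GET /missing.html HTTP/1.1" 404 -',
]
print([r["url"] for r in clean(sample)])  # ['/products/games.html']
```

The image request and the failed (404) request are removed, so only one page view survives cleaning.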

Transaction Log Sample

Data Preparation
- Cleaning the data
- Session identification
- User identification
- Importing the transaction log data into a database and normalizing it
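A minimal sketch of user identification, session identification, and import into a database follows. It assumes users are identified by IP address and that a gap of more than 30 minutes between two requests from the same user starts a new session. Both are simplifying assumptions (the 30-minute timeout is a common heuristic, not a value from the slides), and the SQLite schema with hypothetical `page` and `request` tables is only one possible normalization.

```python
import sqlite3
from datetime import datetime, timedelta

# Assumed heuristic: idle time above this threshold starts a new session.
SESSION_TIMEOUT = timedelta(minutes=30)

def identify_sessions(requests):
    """requests: iterable of (user_key, timestamp, url) tuples, where user_key
    (e.g. an IP address) stands in for user identification.
    Returns a list of (session_id, user_key, timestamp, url) rows."""
    rows = []
    last_seen = {}  # user_key -> timestamp of that user's previous request
    current = {}    # user_key -> id of that user's current session
    next_id = 1
    for user, ts, url in sorted(requests, key=lambda r: (r[0], r[1])):
        prev = last_seen.get(user)
        if prev is None or ts - prev > SESSION_TIMEOUT:
            current[user] = next_id
            next_id += 1
        last_seen[user] = ts
        rows.append((current[user], user, ts, url))
    return rows

def load_into_db(rows):
    """Import sessionized rows into normalized SQLite tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, url TEXT UNIQUE);
        CREATE TABLE request (
            session_id INTEGER, user_key TEXT, ts TEXT,
            page_id INTEGER REFERENCES page(page_id));
    """)
    for session_id, user, ts, url in rows:
        # Each URL is stored once and referenced by its page_id.
        conn.execute("INSERT OR IGNORE INTO page (url) VALUES (?)", (url,))
        (page_id,) = conn.execute(
            "SELECT page_id FROM page WHERE url = ?", (url,)).fetchone()
        conn.execute("INSERT INTO request VALUES (?, ?, ?, ?)",
                     (session_id, user, ts.isoformat(), page_id))
    return conn

t0 = datetime(2013, 10, 10, 13, 0)
log = [
    ("192.0.2.1", t0, "/home"),
    ("192.0.2.1", t0 + timedelta(minutes=5), "/products/games"),
    ("192.0.2.1", t0 + timedelta(minutes=50), "/home"),  # idle 45 min: new session
    ("192.0.2.7", t0 + timedelta(minutes=1), "/home"),
]
conn = load_into_db(identify_sessions(log))
print(conn.execute("SELECT COUNT(DISTINCT session_id) FROM request").fetchone()[0])  # 3
```

Identifying users by IP alone merges different people behind one proxy; adding the user agent or a cookie to `user_key` is a common refinement.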

Data Preparation Sample

Pattern Discovery Stage

Data Mining Approaches
According to Bari and Chawan (2013), classification and clustering are currently among the most effective methods in web usage mining.
- Clustering: categorization of pages and products.
- Classification: for example, the “The Fool and his Money Video Game”, “Pokemon Video Game”, and “Kinect Party Video Game” product pages all belong to the Video Games product group.
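To make the classification example concrete, here is a minimal sketch that assigns a product page to a predefined product group by comparing its title with token profiles built from labelled pages. The training titles, group names, and token-overlap scoring are hypothetical and are not the method of any of the cited papers; a practical system would more likely use a standard classifier.

```python
import re
from collections import Counter

def tokens(title):
    """Lower-case alphanumeric tokens of a page title."""
    return re.findall(r"[a-z0-9]+", title.lower())

def train(labelled_pages):
    """Build a token-frequency profile for each product group from (title, group) pairs."""
    profiles = {}
    for title, group in labelled_pages:
        profiles.setdefault(group, Counter()).update(tokens(title))
    return profiles

def classify(title, profiles):
    """Return the group whose profile shares the most token occurrences with the title."""
    words = tokens(title)
    return max(profiles, key=lambda group: sum(profiles[group][w] for w in words))

# Hypothetical labelled product pages
training = [
    ("Pokemon Video Game", "Video Games"),
    ("Kinect Party Video Game", "Video Games"),
    ("Wireless Headphones", "Audio"),
    ("Bluetooth Speaker", "Audio"),
]
profiles = train(training)
print(classify("The Fool and his Money Video Game", profiles))  # Video Games
```

The new page shares the tokens "video" and "game" with the Video Games profile and none with the Audio profile, so it is placed in the Video Games group.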

Sample Methods
Poongothai et al. (2011) used an enhanced fuzzy C-means clustering algorithm.
Chitraa and Thanamani (2012) used an enhanced clustering algorithm. The k-means algorithm suffers from two serious drawbacks: first, the number of clusters must be chosen in advance even though it is usually unknown; second, the result depends on the initial seeds. Their solution: first, the dataset is divided into subsets and the initial cluster points are calculated; second, the k-means algorithm is applied to find the clusters. The City Block (Manhattan) distance is used as the similarity measure.
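The two-step procedure described for Chitraa and Thanamani (2012) can be sketched as follows. The slide does not say how the dataset is divided, so this sketch assumes the data is split into k contiguous subsets whose means become the initial centroids; k is still passed in as a parameter. Points are assigned with the City Block distance and centroids are updated with the mean, as in standard k-means. The session vectors are hypothetical.

```python
def manhattan(a, b):
    """City Block (Manhattan) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]

def initial_centroids(data, k):
    """Assumed seeding: split the data into k contiguous subsets and use each subset's mean."""
    size = len(data) // k
    subsets = [data[i * size:(i + 1) * size] for i in range(k - 1)]
    subsets.append(data[(k - 1) * size:])
    return [mean(s) for s in subsets]

def kmeans_city_block(data, k, max_iter=100):
    """k-means with subset-based seeds and City Block distance for assignment."""
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and the number of data points")
    centroids = initial_centroids(data, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: manhattan(p, centroids[j]))
                      for p in data]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = mean(members)
    return labels, centroids

# Hypothetical session vectors: visit counts for pages A, B, C
sessions = [
    [3, 0, 0], [2, 1, 0], [4, 0, 1],
    [0, 0, 5], [0, 1, 4], [1, 0, 6],
]
labels, centroids = kmeans_city_block(sessions, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Replacing the mean with the coordinate-wise median in the update step would make it consistent with the City Block distance (k-medians); the mean is kept here to stay close to the k-means description on the slide.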

Sample Methods (Cont.)
Langhnoja et al. (2013) used association rule mining on clustered data.
Kansara and Patel (2013) used a combination of classification and clustering: a classification process identifies potential users from web log data, and a clustering process groups potential users with similar interests.
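Association rule mining on clustered data can be sketched as mining rules separately within each cluster of sessions. The sketch below uses an Apriori-style search for frequent page sets followed by rule generation with support and confidence thresholds. The slides do not say which rule-mining algorithm Langhnoja et al. used, and the sessions, page names, and thresholds here are hypothetical; the sessions are assumed to come from one cluster produced by an earlier clustering step.

```python
from itertools import combinations

def frequent_itemsets(sessions, min_support):
    """Apriori-style search. Returns {frozenset(pages): support}, where support
    is the fraction of sessions that contain every page in the set."""
    sessions = [frozenset(s) for s in sessions]
    n = len(sessions)

    def support(itemset):
        return sum(1 for s in sessions if itemset <= s) / n

    candidates = {frozenset([page]) for s in sessions for page in s}
    result = {}
    size = 1
    while candidates:
        level = {c: support(c) for c in candidates}
        level = {c: v for c, v in level.items() if v >= min_support}
        result.update(level)
        size += 1
        # Join frequent itemsets into larger candidates, then prune any candidate
        # that has an infrequent subset (the Apriori property).
        joined = {a | b for a in level for b in level if len(a | b) == size}
        candidates = {c for c in joined
                      if all(frozenset(sub) in level for sub in combinations(c, size - 1))}
    return result

def association_rules(itemsets, min_confidence):
    """Rules X -> Y with confidence = support(X and Y together) / support(X)."""
    rules = []
    for itemset, sup in itemsets.items():
        for r in range(1, len(itemset)):
            for combo in combinations(itemset, r):
                lhs = frozenset(combo)
                confidence = sup / itemsets[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, sup, confidence))
    return rules

# Hypothetical sessions belonging to one cluster
cluster_sessions = [
    {"/home", "/games", "/games/pokemon"},
    {"/home", "/games"},
    {"/games", "/games/pokemon"},
    {"/home", "/games", "/games/pokemon"},
]
itemsets = frequent_itemsets(cluster_sessions, min_support=0.5)
for lhs, rhs, sup, conf in association_rules(itemsets, min_confidence=0.9):
    print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f}")
```

With these four sessions, the rules that reach 0.9 confidence are {/home} -> {/games}, {/games/pokemon} -> {/games}, and {/home, /games/pokemon} -> {/games}; their print order may vary between runs because sets of strings have no fixed iteration order.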

Conclusions
Web Usage Mining approaches try to find useful patterns in server log data, mostly using clustering techniques. In the works reviewed here, the authors focused mainly on enhancing existing algorithms. However, the preprocessing step is one of the most significant parts of discovering better patterns, and it should be discussed more in future work.

References
Ajiferuke, I., Wolfram, D., and Famoye, F. 2006, ‘Sample size and informetric model goodness-of-fit outcomes: A search engine log case study’, Journal of Information Science, vol. 32, no. 3, pp. 212–222.
Bari, P., and Chawan, P. M. 2013, ‘Web Usage Mining’, Journal of Engineering, Computers and Applied Sciences, vol. 2, no. 6, pp.
Chau, M., Fang, X., and Sheng, O. R. L. 2005, ‘Analysis of the query logs of a Website search engine’, Journal of the American Society for Information Science and Technology, pp. 1363–1376.
Chitraa, V., and Thanamani, A. S. 2012, ‘An Enhanced Clustering Method for Web Usage Mining’, International Journal of Engineering Research and Technology, vol. 1, no. 4, pp.
Jansen, B. J. 2006, ‘Search log analysis: What it is, what's been done, how to do it’, Elsevier, vol. 28, pp.
Jansen, B. J., Booth, D. L., and Spink, A. 2008, ‘Determining the informational, navigational, and transactional intent of Web queries’, Elsevier, vol. 44, pp.
Kansara, A., and Patel, S. 2013, ‘Improved Approach to Predict user Future Sessions using Classification and Clustering’, International Journal of Science and Research, vol. 2, no. 5, pp.
Langhnoja, S. G., Barot, M. P., and Mehta, D. B. 2013, ‘Web Usage Mining Using Association Rule Mining on Clustered Data for Pattern Discovery’, International Journal of Data Mining Techniques and Applications, vol. 2, no. 1, pp.
Poongothai, K., Parimala, M., and Sathiyabama, S. 2011, ‘Efficient Web Usage Mining with Clustering’, IJCSI International Journal of Computer Science Issues, vol. 8, no. 3, pp.

Thank You

Q & A