Science data sharing user behavior mining: an approach combining Web Usage Mining and GIS
Mo Wang, Juanle Wang, Yongqing Bai
Institute of Geographic Sciences and Natural Resources Research, CAS

I. Introduction
Data is the basic infrastructure of science, and data sharing boosts scientific research. Web Usage Mining is the process of applying data mining techniques to web server logs and other user activity records. This work applies Web Usage Mining to science data sharing user behavior mining, with the National Data Sharing Platform of Earth System Science (Geodata.cn) as the study platform.

II. Data and method
Data: web server logs
The major data source for web usage mining. Each entry contains the user's IP address, visit time, request method, URL visited, status code, referrer, and client details. Year …; 11,062,608 entries.
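The slides list the fields of each log entry but not the log format. As a sketch, assuming the server wrote the common Apache/Nginx "combined" log format, one entry can be split into those fields like this (the sample line, IP address, and URLs are made up):

```python
import re
from datetime import datetime

# Assumption: Apache/Nginx "combined" log format. The slides name the fields
# (IP, time, method, URL, status, referrer, client) but not the format.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    return entry

# Hypothetical sample line, not taken from the study's logs.
sample = ('203.0.113.7 - - [10/Oct/2014:13:55:36 +0800] '
          '"GET /data/list.html HTTP/1.1" 200 2326 '
          '"http://www.example.org/index.html" "Mozilla/5.0 (Windows NT 6.1)"')
entry = parse_line(sample)
```

Lines that fail to match return `None`, which gives the cleaning step an easy way to discard malformed records.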

Data: data service logs
User registration information, online data downloads, and offline data applications (for datasets restricted to offline application) of registered users. Year …; …,809 records.

Data: user registration information
Anonymized user registration information serves as auxiliary data for determining where users come from. The fields used are occupation, organization, and education.

Preprocessing
Data cleaning: remove records irrelevant to the data mining task at hand, e.g. requests for graphical page content, the style.css file, and audio files, as well as requests from web crawlers.
User identification:
Step 1. Each new IP address is assumed to represent a new user.
Step 2. Among log entries that share the same IP address, entries with a different browser or operating system belong to different users.
Step 3. For the users identified by the two steps above, if a requested URL cannot be reached through any hyperlink on the pages that user has already visited, the request belongs to a new user.
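A minimal sketch of the cleaning filter and the three user identification steps follows. The suffix list, the crawler keywords, and the `links` site map are hypothetical, and step 2 is approximated by comparing the full user-agent string rather than parsing browser and OS separately.

```python
from collections import defaultdict

# Hypothetical lists; the slides give only examples (images, style.css, audio, crawlers).
STATIC_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".mp3", ".wav")
CRAWLER_MARKERS = ("bot", "spider", "crawler")

def is_relevant(entry):
    """Data cleaning: drop static resources and crawler traffic."""
    url = entry["url"].lower().split("?")[0]
    agent = entry["agent"].lower()
    return not url.endswith(STATIC_SUFFIXES) and not any(m in agent for m in CRAWLER_MARKERS)

def identify_users(entries, links):
    """Assign a user id to each entry; entries must already be sorted by time.

    links: hypothetical site map, page URL -> set of URLs that page links to.
    """
    groups = defaultdict(list)  # (ip, agent) -> list of [user_id, visited pages]
    next_id = 0
    labels = []
    for e in entries:
        # Steps 1 and 2: a new IP, or the same IP with a different client, is a new user.
        key = (e["ip"], e["agent"])
        owner = None
        for user in groups[key]:
            visited = user[1]
            # Step 3: the URL must be reachable from a page this user visited.
            # Accepting revisits of an already visited page is an added assumption.
            if e["url"] in visited or any(e["url"] in links.get(p, ()) for p in visited):
                owner = user
                break
        if owner is None:
            owner = [next_id, set()]
            next_id += 1
            groups[key].append(owner)
        owner[1].add(e["url"])
        labels.append(owner[0])
    return labels
```

In a full pipeline, `is_relevant` would be applied first and `identify_users` run on the surviving, time-sorted entries.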

Preprocessing (continued)
User location identification: user IP addresses were geolocated with the geo-IP lookup service provided by ipinfo.io.
Session identification: a referrer-based heuristic algorithm was adopted.
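The slides do not describe the referrer-based heuristic in detail. One common form of it, sketched below for a single identified user's time-ordered requests, appends a request to the most recent session that already contains its referring page and otherwise opens a new session. The `site_host` parameter and the entry fields are assumptions; some implementations also add a time-out.

```python
from urllib.parse import urlsplit

def referrer_sessions(entries, site_host):
    """Split one user's time-ordered requests into sessions by referrer."""
    sessions = []  # each session: {"pages": set of paths, "requests": list of URLs}
    for e in entries:
        ref = None if e["referrer"] in ("", "-") else urlsplit(e["referrer"])
        target = None
        if ref is not None and ref.netloc == site_host:
            # Attach to the most recent session that contains the referring page.
            for s in reversed(sessions):
                if ref.path in s["pages"]:
                    target = s
                    break
        if target is None:
            # Empty, external, or unmatched referrer: start a new session.
            target = {"pages": set(), "requests": []}
            sessions.append(target)
        target["pages"].add(urlsplit(e["url"]).path)
        target["requests"].append(e["url"])
    return [s["requests"] for s in sessions]
```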

Preprocessing: spatial data modeling
Figure: an example of a user-pageview (transaction) matrix.
Figure: an example of a georeferenced user transaction data model; the blue line represents the transaction vector of a user located at 30°E, 45°N.
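The georeferenced model can be read as the ordinary user-pageview matrix with each row tagged by the user's coordinates. A minimal sketch, assuming cells hold pageview counts (the slides do not say whether counts, binary flags, or weights were used) and that users without a geo-IP result are skipped:

```python
def build_geo_matrix(transactions, locations):
    """Build a georeferenced user-pageview matrix.

    transactions: dict user_id -> list of visited page URLs
    locations:    dict user_id -> (longitude, latitude), e.g. from a geo-IP lookup
    Returns (pages, rows); each row is (user_id, lon, lat, pageview counts).
    """
    pages = sorted({p for visits in transactions.values() for p in visits})
    index = {p: j for j, p in enumerate(pages)}
    rows = []
    for user, visits in sorted(transactions.items()):
        if user not in locations:
            continue  # user could not be located; left out of the spatial model
        counts = [0] * len(pages)
        for p in visits:
            counts[index[p]] += 1
        lon, lat = locations[user]
        rows.append((user, lon, lat, counts))
    return pages, rows

# Toy data; "u1" sits at 30°E, 45°N like the user in the slide's example.
pages, rows = build_geo_matrix(
    {"u1": ["/a", "/b", "/a"], "u2": ["/b"], "u3": ["/c"]},
    {"u1": (30.0, 45.0), "u2": (116.4, 39.9)},
)
```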

III. Results
Preprocessing summary
Raw log entries: 11,062,608
After cleaning: 2,292,697
Users: 76,111
Sessions: 448,495
Locations: 76,069

Spatial distribution
With China's overall university population: Pearson correlation r value and p value … (no evident correlation).
With China's top universities: Pearson correlation coefficient r = 0.792, p < 0.01. The platform tends to attract more users from research-oriented universities than from teaching-oriented universities.
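For reference, the Pearson coefficient and the t statistic used to test it can be computed as below. The four-region counts are made-up toy numbers, not the study's data. In practice `scipy.stats.pearsonr` returns r together with its p-value, and Python 3.10+ also offers `statistics.correlation` for r alone.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), compared against a t distribution with n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical per-region counts: research-oriented universities vs. platform users.
research_universities = [1, 2, 3, 4]
users = [1, 3, 2, 4]
r = pearson_r(research_universities, users)  # r == 0.8 for these toy numbers
t = t_statistic(r, len(users))
```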

Hot spot analysis: pageviews
Hotspots of per-user pageview numbers were largely clustered in Beijing, Tianjin, and the northern part of Hebei Province, with a few in Sichuan Province.
Figure: hotspot analysis for pageviews.
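The slides do not name the hot spot statistic. GIS hot spot analysis is commonly the Getis-Ord Gi* statistic (the one computed by ArcGIS's Hot Spot Analysis tool), so the sketch below assumes Gi* with binary fixed-distance-band weights on planar coordinates; the band width and toy points are illustrative only, and real longitude/latitude data would need projecting first.

```python
import math

def getis_ord_gi_star(coords, values, band):
    """Return Gi* z-scores using binary distance-band weights (w_ii = 1)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for xi, yi in coords:
        # Weight 1 for every feature within `band` of i, including i itself.
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= band else 0.0 for xj, yj in coords]
        sw = sum(w)
        sw2 = sum(wj * wj for wj in w)
        num = sum(wj * vj for wj, vj in zip(w, values)) - mean * sw
        den = s * math.sqrt((n * sw2 - sw ** 2) / (n - 1))
        scores.append(num / den if den > 0 else 0.0)
    return scores

# Toy example: a high-value cluster near (0, 0) and a low-value cluster near (10, 10).
coords = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
values = [10, 12, 11, 1, 2, 1]
z = getis_ord_gi_star(coords, values, band=1.5)
```

Large positive z-scores mark hot spots and large negative ones cold spots; |z| > 1.96 corresponds to roughly 95% confidence in a two-tailed test.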

Hot spot analysis: sessions
Hotspots of session numbers were clustered in Beijing, the northern part of Hebei, Jiangsu, and Shanghai. Cold spots were concentrated in the northern part of Henan, the eastern part of Shanxi, and Taiwan.
Figure: hotspot analysis for pageview sessions.

Hot spot analysis: data downloads
Data download hotspots were clustered in Beijing, Tianjin, the northern part of Hebei, Shaanxi, Jiangsu, and Shanghai, a pattern similar to that of user sessions.
Figure: hotspot analysis for dataset downloads.

IV. Conclusions
1. There is no evident correlation between the overall university population and user numbers; however, user numbers are strongly correlated with the population of research-oriented universities.
2. Hot spot analysis of user pageviews, user sessions, and data downloads revealed different spatial patterns. These findings can support informed decision making on data sharing strategy and regional advertising.
3. Combining Web Usage Mining with GIS is a feasible method for mapping many types of user behavior.

Thank you!