Science data sharing user behavior mining: an approach combining Web Usage Mining and GIS Mo Wang, Juanle Wang, Yongqing Bai Institute of Geographic Sciences.


1 Science data sharing user behavior mining: an approach combining Web Usage Mining and GIS Mo Wang, Juanle Wang, Yongqing Bai Institute of Geographic Sciences and Natural Resources Research, CAS

2 I. Introduction
Data is the basic infrastructure of science, and data sharing boosts scientific research.
Web Usage Mining is the process of applying data mining techniques to web server logs and other user activity records.
This work applies Web Usage Mining to science data sharing user behavior mining, using the National Data Sharing Platform of Earth System Science (Geodata.cn).

3

4 II. Data and method: Data
Web server logs: the major data source for web usage mining. Each entry contains the user's IP, visiting time, method, URL visited, status, referrer, and client details.
Year 2014: 11,062,608 entries.
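As a minimal sketch of how such entries could be read, assuming the server writes the common "combined" log format (the slides do not state the actual format), the fields listed above can be pulled out with a regular expression. The sample line and the name `parse_log_line` are illustrative only.

```python
import re
from datetime import datetime

# Assumed "combined" log layout; the real Geodata.cn log format is not given in the slides.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)(?: \S+)?" '
    r'(?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    return entry

# Made-up sample line using documentation-reserved addresses.
sample = ('203.0.113.7 - - [12/Mar/2014:08:15:30 +0800] '
          '"GET /data/index.html HTTP/1.1" 200 5120 '
          '"http://example.org/home" "Mozilla/5.0 (Windows NT 6.1)"')
entry = parse_log_line(sample)
```

Lines that do not match the assumed layout return `None`, so they can be counted and inspected rather than silently dropped.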

5 II. Data and method: Data
Data service logs: user registration information, online data downloads, and offline data applications (for datasets restricted to offline application) of registered users.
Year 2014: 170,809 records.

6 II. Data and method: Data
User registration information: anonymous user registration information was used as auxiliary data to determine the sources of the users. The fields used include the user's occupation, organization, and education.

7 II. Data and method: Preprocessing
Data cleaning: remove records that are irrelevant to the data mining task at hand, e.g. requests for graphical page content, style.css files, and voice files, as well as requests from web crawlers.
User identification:
Step 1: assume that a new IP address represents a new user.
Step 2: for multiple log entries that share the same IP, a different Internet browser or operating system indicates a different user.
Step 3: for the users identified by the two steps above, if a requested URL cannot be reached through any hyperlink on the pages that user has already visited, a new user is assumed.
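The cleaning rule and the first two identification steps can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the extension list and crawler markers are hypothetical, the user-agent string stands in for "browser or operating system", and Step 3 is omitted because it needs the site's hyperlink structure.

```python
from urllib.parse import urlparse

# Hypothetical filter lists; the slides name only the categories
# (graphics, style sheets, voice files, web crawlers).
IRRELEVANT_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".ico", ".css", ".mp3", ".wav")
CRAWLER_MARKERS = ("bot", "crawler", "spider")

def is_relevant(entry):
    """Keep page requests made by non-crawler clients."""
    path = urlparse(entry["url"]).path.lower()
    agent = entry["agent"].lower()
    if path.endswith(IRRELEVANT_EXTENSIONS):
        return False
    return not any(marker in agent for marker in CRAWLER_MARKERS)

def identify_users(entries):
    """Steps 1 and 2: one user per distinct (IP, client details) pair."""
    user_ids = {}
    labelled = []
    for entry in entries:
        if not is_relevant(entry):
            continue
        key = (entry["ip"], entry["agent"])
        user_id = user_ids.setdefault(key, len(user_ids) + 1)
        labelled.append((user_id, entry))
    return labelled

# Made-up entries for illustration.
FIREFOX = "Mozilla/5.0 (Windows NT 6.1) Firefox/30.0"
SAFARI = "Mozilla/5.0 (Macintosh) Safari/537"
entries = [
    {"ip": "203.0.113.7", "url": "/data/list.html", "agent": FIREFOX},
    {"ip": "203.0.113.7", "url": "/style.css", "agent": FIREFOX},
    {"ip": "203.0.113.7", "url": "/data/item.html", "agent": SAFARI},
    {"ip": "198.51.100.2", "url": "/data/list.html", "agent": "Googlebot/2.1"},
    {"ip": "203.0.113.7", "url": "/data/item.html", "agent": FIREFOX},
]
labelled = identify_users(entries)
user_sequence = [user_id for user_id, _ in labelled]
```

In this toy input the style sheet request and the crawler request are dropped, and the two clients sharing one IP are treated as two users.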

8 II. Data and method: Preprocessing
User location identification: the geo-IP lookup service provided by ipinfo.io was applied.
Session identification: a referrer-based heuristic algorithm was adopted.
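The slides do not spell out the referrer-based heuristic. A common form of it, assumed here, places a request in an existing session when its referrer has already been visited in that session and opens a new session otherwise. The function name and the sample requests are hypothetical.

```python
def split_sessions(requests):
    """Group one user's time-ordered requests into sessions.

    Each request is a (url, referrer) pair. A request joins the most recent
    session that already contains its referrer; otherwise it opens a new one.
    """
    sessions = []
    for url, referrer in requests:
        for session in reversed(sessions):
            if referrer in session:
                session.append(url)
                break
        else:
            # No session contains the referrer: start a new session.
            sessions.append([url])
    return sessions

# Made-up request sequence for a single user; "-" marks an empty referrer.
requests = [
    ("/index.html", "-"),
    ("/data/list.html", "/index.html"),
    ("/data/item42.html", "/data/list.html"),
    ("/about.html", "-"),
    ("/contact.html", "/about.html"),
]
sessions = split_sessions(requests)
```

Here the first three requests form one session and the last two form another, because `/about.html` arrives without a referrer that was seen before.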

9 II. Data and method: Preprocessing
Spatial data modeling
[Figure: an example of a user-pageview (transaction) matrix.]
[Figure: an example of a georeferenced user transaction data model. The blue line represents the transaction vector of a user located at 30°E, 45°N.]
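A small sketch of the two structures named on this slide: a user-pageview matrix, and the same rows tagged with each user's coordinates. It assumes the matrix cells hold pageview counts (the slides do not say whether counts, binary flags, or weights were used), and all users, pages, and coordinates are made up.

```python
def build_transaction_matrix(pageviews):
    """Return (users, pages, matrix); matrix[i][j] counts views of pages[j] by users[i]."""
    users = sorted({user for user, _ in pageviews})
    pages = sorted({page for _, page in pageviews})
    row = {user: i for i, user in enumerate(users)}
    col = {page: j for j, page in enumerate(pages)}
    matrix = [[0] * len(pages) for _ in users]
    for user, page in pageviews:
        matrix[row[user]][col[page]] += 1
    return users, pages, matrix

# Hypothetical pageviews and user locations as (longitude, latitude).
pageviews = [("u1", "/a"), ("u1", "/b"), ("u1", "/a"), ("u2", "/b"), ("u2", "/c")]
locations = {"u1": (30.0, 45.0), "u2": (116.4, 39.9)}

users, pages, matrix = build_transaction_matrix(pageviews)

# Georeferenced model: each transaction vector is paired with its user's location.
georeferenced = [(locations[user], matrix[i]) for i, user in enumerate(users)]
```

Keeping the location next to each row, rather than using it as a key, allows several users at the same coordinates.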

10 II. Data and method

11 III. Results
Raw log entries: 11,062,608
After cleaning: 2,292,697
Users: 76,111
Sessions: 448,495
Locations: 76,069

12 III. Results

13 III. Results: Spatial distribution
Correlation of user numbers with China's university population: Pearson correlation r = 0.324, p = 0.075.
Correlation of user numbers with China's top universities: Pearson correlation coefficient r = 0.792, p < 0.01.
The platform tends to attract more users from research-oriented universities than from teaching-oriented universities.
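For reference, the Pearson coefficient reported here can be computed as in the sketch below. The per-region numbers are invented for illustration and are not data from the study; the p value additionally requires a t distribution and is not computed here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)

# Made-up values: universities per region vs. platform users per region.
universities = [10, 25, 40, 55, 80]
platform_users = [120, 300, 390, 610, 790]
r = pearson_r(universities, platform_users)
```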

14 III. Results: Hot spot analysis
Pageview numbers for individual users were largely clustered in Beijing, Tianjin, and the northern part of Hebei Province, with a few in Sichuan Province.
[Figure: hotspot analysis for pageviews.]

15 III. Results: Hot spot analysis
Hotspots of pageview session numbers were clustered in Beijing, the northern part of Hebei, Jiangsu, and Shanghai. Cold spots were concentrated in the northern part of Henan, the eastern part of Shanxi, and Taiwan.
[Figure: hotspot analysis for pageview sessions.]

16 III. Results: Hot spot analysis
Hotspots of data downloads were clustered in Beijing, Tianjin, the northern part of Hebei, Shaanxi, Jiangsu, and Shanghai, showing a pattern similar to that of user sessions.
[Figure: hotspot analysis for dataset downloads.]
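The slides do not name the statistic behind the hot spot maps. One widely used choice for hot and cold spot mapping is the Getis-Ord Gi* statistic; the sketch below assumes it purely for illustration, with made-up values and a hypothetical binary neighbor matrix in which each region counts as its own neighbor.

```python
import math

def getis_ord_gi_star(values, weights):
    """Return a Gi* z-score per region; weights[i][j] is the spatial weight of j for i."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        w = weights[i]
        sum_w = sum(w)
        sum_w2 = sum(wij * wij for wij in w)
        numerator = sum(wij * x for wij, x in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores

# Five hypothetical regions along a line; neighbors are adjacent regions plus the region itself.
values = [90, 80, 10, 12, 8]
weights = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]
scores = getis_ord_gi_star(values, weights)
```

Large positive scores mark clusters of high values (hot spots) and large negative scores mark clusters of low values (cold spots). The sketch does not handle the degenerate cases where all values are equal or a region neighbors every region.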

17 IV. Conclusions
1. There is no evident correlation between the overall university population and user numbers. However, the user number is strongly correlated with the research-oriented university population.
2. Hot spot analysis of user pageviews, user sessions, and data downloads showed different patterns. The findings can support informed decision making in data sharing strategy and regional advertising.
3. The method combining Web Usage Mining with GIS is feasible for mapping user behaviors of many types.

18 Thank you!

