Web Mining ( 網路探勘 ) 1 1011WM12 TLMXM1A Wed 8,9 (15:10-17:00) U705 Web Usage Mining ( 網路使用挖掘 ) Min-Yuh Day 戴敏育 Assistant Professor 專任助理教授 Dept. of Information.


Web Mining (網路探勘) WM12 TLMXM1A Wed 8,9 (15:10-17:00) U705
Web Usage Mining (網路使用挖掘)
Min-Yuh Day 戴敏育, Assistant Professor 專任助理教授
Dept. of Information Management, Tamkang University 淡江大學 資訊管理學系
tku.edu.tw/myday/

Syllabus (課程大綱)
Week  Date  Subject/Topics
1  101/09/12  Introduction to Web Mining (網路探勘導論)
2  101/09/19  Association Rules and Sequential Patterns (關聯規則和序列模式)
3  101/09/26  Supervised Learning (監督式學習)
4  101/10/03  Unsupervised Learning (非監督式學習)
5  101/10/10  National Day, one-day holiday (國慶紀念日, 放假一天)
6  101/10/17  Paper Reading and Discussion (論文研讀與討論)
7  101/10/24  Partially Supervised Learning (部分監督式學習)
8  101/10/31  Information Retrieval and Web Search (資訊檢索與網路搜尋)
9  101/11/07  Social Network Analysis (社會網路分析)

Syllabus (課程大綱)
Week  Date  Subject/Topics
10  101/11/14  Midterm Presentation (期中報告)
11  101/11/21  Web Crawling (網路爬行)
12  101/11/28  Structured Data Extraction (結構化資料擷取)
13  101/12/05  Information Integration (資訊整合)
14  101/12/12  Opinion Mining and Sentiment Analysis (意見探勘與情感分析)
15  101/12/19  Paper Reading and Discussion (論文研讀與討論)
16  101/12/26  Web Usage Mining (網路使用挖掘)
17  102/01/02  Project Presentation 1 (期末報告 1)
18  102/01/09  Project Presentation 2 (期末報告 2)

Web Mining
Web mining (or Web data mining) is the process of discovering intrinsic relationships from Web data (textual, linkage, or usage).
Source: Turban et al. (2011), Decision Support and Business Intelligence Systems

Web Content/Structure Mining
Mining of the textual content on the Web
Data collection via Web crawlers
Web pages include hyperlinks
– Authoritative pages
– Hubs
– Hyperlink-induced topic search (HITS) algorithm
Source: Turban et al. (2011), Decision Support and Business Intelligence Systems
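The hub and authority idea behind HITS can be illustrated with a small iterative sketch. This is a minimal version under stated assumptions: the function name `hits`, the fixed iteration count, and the toy link graph are hypothetical and do not come from the slides. A page's authority score is the sum of the hub scores of pages linking to it, and its hub score is the sum of the authority scores of the pages it links to.

```python
def hits(links, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: sum the hub scores of all pages that link in.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        # Hub update: sum the authority scores of all pages linked to.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        # Normalize (Euclidean norm) so the scores stay bounded.
        a_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        h_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth


# Hypothetical four-page site: A and B both point to C and D; C points to D.
toy_links = {"A": ["C", "D"], "B": ["C", "D"], "C": ["D"], "D": []}
hub, auth = hits(toy_links)
```

In this toy graph, D collects links from every other page and so ends up with the highest authority score, while A and B act as the strongest hubs.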

Web Usage Mining
Extraction of information from data generated through Web page visits and transactions…
– Data stored in server access logs, referrer logs, agent logs, and client-side cookies
– User characteristics and usage profiles
– Metadata, such as page attributes, content attributes, and usage data
Clickstream data
Clickstream analysis
Source: Turban et al. (2011), Decision Support and Business Intelligence Systems

Web Usage Mining
Web usage mining applications
– Determine the lifetime value of clients
– Design cross-marketing strategies across products
– Evaluate promotional campaigns
– Target electronic ads and coupons at user groups based on user access patterns
– Predict user behavior based on previously learned rules and users' profiles
– Present dynamic information to users based on their interests and profiles…
Source: Turban et al. (2011), Decision Support and Business Intelligence Systems

Web Usage Mining (clickstream analysis)
Source: Turban et al. (2011), Decision Support and Business Intelligence Systems

Web Mining Success Stories
Amazon.com, Ask.com, Scholastic.com, …
Website Optimization Ecosystem
Source: Turban et al. (2011), Decision Support and Business Intelligence Systems

Chapter 12: Web Usage Mining
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.

Introduction
Web usage mining: automatic discovery of patterns in clickstreams and associated data collected or generated as a result of user interactions with one or more Web sites.
Goal: analyze the behavioral patterns and profiles of users interacting with a Web site.
The discovered patterns are usually represented as collections of pages, objects, or resources that are frequently accessed by groups of users with common interests.
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.
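To make "collections of pages that are frequently accessed together" concrete, the sketch below counts page pairs that co-occur in sessions and keeps those above a minimum support. It is only an illustration: the function name `frequent_page_pairs`, the support threshold, and the toy sessions are assumptions, and a real analysis would typically use a full frequent-itemset or clustering algorithm rather than pairs alone.

```python
from collections import Counter
from itertools import combinations


def frequent_page_pairs(sessions, min_support):
    """Return {(page_a, page_b): support} for pairs seen in enough sessions."""
    counts = Counter()
    for session in sessions:
        # Count each unordered pair at most once per session.
        for pair in combinations(sorted(set(session)), 2):
            counts[pair] += 1
    total = len(sessions)
    return {pair: c / total for pair, c in counts.items() if c / total >= min_support}


# Hypothetical sessions, each a list of visited pages.
sessions = [["A", "B", "C"], ["A", "B"], ["B", "C"], ["A", "B", "D"]]
pairs = frequent_page_pairs(sessions, min_support=0.5)
```

Here pages A and B appear together in three of the four sessions (support 0.75), and B and C in two (support 0.5); the other pairs fall below the threshold.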

Introduction
Data in Web usage mining:
– Web server logs
– Site contents
– Data about the visitors, gathered from external channels
– Further application data
Not all of these data are always available. When they are, they must be integrated.
A large part of Web usage mining is about processing usage/clickstream data.
– After that, various data mining algorithms can be applied.
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.

Web server logs
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.
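As a rough illustration of what a server log entry contains, the sketch below parses one line into named fields. It assumes the widely used Apache "combined" log format (client IP, timestamp, request line, status, size, referrer, user agent); the slides do not state which format is used, and the sample line, the name `parse_log_line`, and the field names are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout: Apache-style "combined" log format.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # A size of "-" means no body was returned.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Made-up example entry.
sample = ('203.0.113.7 - - [26/Dec/2012:15:10:05 +0800] '
          '"GET /products/item42.html HTTP/1.1" 200 5120 '
          '"http://example.com/index.html" "Mozilla/5.0"')
record = parse_log_line(sample)
```

The referrer and user-agent fields matter later in preprocessing: the user agent helps separate users and spot crawlers, and the referrer helps reconstruct navigation paths.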

Web usage mining process
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.

Data preparation
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.

Pre-processing of web usage data
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.

Data cleaning
– Remove irrelevant references and fields in server logs
– Remove references due to spider navigation
– Remove erroneous references
– Add missing references due to caching (done after sessionization)
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.
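The three removal steps above can be sketched as a simple filter over parsed log records. This is a minimal sketch, not a complete cleaner: the suffix list, the bot markers, the "status 400 or above is erroneous" rule, and the record layout (dicts with `url`, `status`, and `agent` keys) are all assumptions made for illustration.

```python
# Assumed heuristics; real sites tune these lists.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
BOT_MARKERS = ("bot", "crawler", "spider")


def clean(records):
    """Keep only records that look like genuine user pageview requests."""
    kept = []
    for r in records:
        path = r["url"].split("?", 1)[0].lower()
        if path.endswith(IGNORED_SUFFIXES):
            continue  # irrelevant reference: embedded image, style, or script
        agent = r["agent"].lower()
        if path == "/robots.txt" or any(m in agent for m in BOT_MARKERS):
            continue  # reference due to spider navigation
        if r["status"] >= 400:
            continue  # erroneous reference
        kept.append(r)
    return kept


# Made-up records for illustration.
raw = [
    {"url": "/index.html", "status": 200, "agent": "Mozilla/5.0"},
    {"url": "/img/logo.gif", "status": 200, "agent": "Mozilla/5.0"},
    {"url": "/index.html", "status": 200, "agent": "Googlebot/2.1"},
    {"url": "/missing.html", "status": 404, "agent": "Mozilla/5.0"},
    {"url": "/cart.html?id=7", "status": 200, "agent": "Mozilla/5.0"},
]
cleaned = clean(raw)
```

Only the first and last records survive; the image request, the crawler request, and the 404 are dropped.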

Identify sessions (sessionization)
In Web usage analysis, the data to be analyzed are the sessions of the site visitors: the activities performed by a user from the moment she enters the site until the moment she leaves it.
Reliable usage data are difficult to obtain because of proxy servers and anonymizers, dynamic IP addresses, missing references due to caching, and the inability of servers to distinguish among different visits.
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.

Sessionization strategies
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.

Sessionization heuristics
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.
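Two common time-oriented sessionization heuristics can be sketched as follows: one caps the total duration of a session, the other caps the time between consecutive requests. The function name `sessionize`, the thresholds of 30 and 10 minutes, and the toy timestamps are illustrative assumptions; the slide content itself was not preserved in this transcript.

```python
def sessionize(times, max_session=None, max_gap=None):
    """Split one user's sorted request times into sessions.

    max_session: longest allowed span from a session's first request.
    max_gap: longest allowed pause between consecutive requests.
    Either limit may be None to disable it.
    """
    sessions = []
    for t in times:
        start_new = not sessions
        if sessions:
            current = sessions[-1]
            if max_session is not None and t - current[0] > max_session:
                start_new = True  # total-duration limit exceeded
            if max_gap is not None and t - current[-1] > max_gap:
                start_new = True  # page-stay limit exceeded
        if start_new:
            sessions.append([t])
        else:
            sessions[-1].append(t)
    return sessions


# Hypothetical request times for one user, in minutes.
minutes = [0, 8, 16, 24, 32, 50]
by_duration = sessionize(minutes, max_session=30)
by_gap = sessionize(minutes, max_gap=10)
```

The two heuristics split the same clickstream differently: the duration limit gives `[[0, 8, 16, 24], [32, 50]]`, while the gap limit gives `[[0, 8, 16, 24, 32], [50]]`.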

Sessionization example
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.

User identification
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.

User identification: an example
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.
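When no login or cookie is available, one simple heuristic is to treat each distinct combination of IP address and user agent as a separate user. The sketch below assumes that heuristic; the name `identify_users`, the record layout, and the sample requests are hypothetical, and the approach can still merge different people behind one proxy or split one person across devices.

```python
from collections import defaultdict


def identify_users(records):
    """Group requested URLs by the (IP address, user agent) pair."""
    users = defaultdict(list)
    for r in records:
        users[(r["ip"], r["agent"])].append(r["url"])
    return dict(users)


# Made-up requests: two agents share one IP, so they count as two users.
log = [
    {"ip": "192.0.2.1", "agent": "Firefox", "url": "A"},
    {"ip": "192.0.2.1", "agent": "Firefox", "url": "B"},
    {"ip": "192.0.2.1", "agent": "Chrome", "url": "C"},
    {"ip": "198.51.100.2", "agent": "Firefox", "url": "A"},
]
users = identify_users(log)
```

The four requests are attributed to three users, with the first user's clickstream being A followed by B.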

Pageview
A pageview is an aggregate representation of a collection of Web objects contributing to the display on a user’s browser resulting from a single user action (such as a click-through).
Conceptually, each pageview can be viewed as a collection of Web objects or resources representing a specific “user event,” e.g., reading an article, viewing a product page, or adding a product to the shopping cart.
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.

Path completion
Client- or proxy-side caching can often result in missing access references to those pages or objects that have been cached. For instance:
– If a user returns to a page A during the same session, the second access to A will likely result in viewing the previously downloaded version of A that was cached on the client side, and therefore no request is made to the server.
– As a result, the second reference to A is not recorded in the server logs.
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.

Missing references due to caching
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.

Path completion
The problem of inferring missing user references due to caching.
Effective path completion requires extensive knowledge of the link structure within the site.
Referrer information in server logs can also be used to disambiguate the inferred paths.
The problem gets much more complicated in frame-based sites.
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.
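One simple way to use referrer information for path completion is sketched below. It assumes the user returned to the referring page by pressing the back button, so the cached pages are re-inserted in reverse order. That assumption, the name `complete_path`, and the sample session are illustrative only; a fuller approach would consult the site's link structure to choose among several candidate paths.

```python
def complete_path(requests):
    """requests: list of (page, referrer) pairs in log order for one session."""
    path = []
    for page, referrer in requests:
        if path and referrer is not None and path[-1] != referrer and referrer in path:
            # The referrer is not the last logged page, so assume the user
            # backtracked through cached pages to its most recent occurrence.
            idx = len(path) - 1 - path[::-1].index(referrer)
            path.extend(reversed(path[idx:-1]))
        path.append(page)
    return path


# Hypothetical log: the request for C names B as referrer, although E was
# the last page the server saw.
logged = [("A", None), ("B", "A"), ("D", "B"), ("E", "D"), ("C", "B")]
completed = complete_path(logged)
```

For this session the server saw A, B, D, E, C, and the sketch infers the fuller path A, B, D, E, D, B, C.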

Integrating with e-commerce events
Either product oriented or visit oriented
Used to track and analyze conversion of browsers to buyers.
– The major difficulty with e-commerce events is defining and implementing the events for a site. In contrast to clickstream data, however, getting reliable preprocessed data is not a problem.
– Another major challenge is the successful integration with clickstream data.
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.

Product-Oriented Events
Product View
– Occurs every time a product is displayed on a page view
– Typical types: image, link, text
Product Click-through
– Occurs every time a user “clicks” on a product to get more information
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.

Product-Oriented Events
Shopping Cart Changes
– Shopping Cart Add or Remove
– Shopping Cart Change: quantity or other feature (e.g., size) is changed
Product Buy or Bid
– A separate buy event occurs for each product in the shopping cart
– Auction sites can track bid events in addition to the product purchases
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.

Web usage mining process
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.

Integration with page content
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.

Integration with link structure
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.

E-commerce data analysis
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.

Session analysis
Simplest form of analysis: examine individual or groups of server sessions and e-commerce data.
Advantages:
– Gain insight into typical customer behaviors.
– Trace specific problems with the site.
Drawbacks:
– LOTS of data.
– Difficult to generalize.
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.

Session analysis: aggregate reports
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.
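As a rough idea of what an aggregate report over sessions might compute, the sketch below derives a few summary figures from a list of sessions. The chosen statistics (pages per session, most-visited pages, common entry and exit pages), the name `aggregate_report`, and the toy sessions are assumptions for illustration; the original slide content was not preserved in this transcript.

```python
from collections import Counter


def aggregate_report(sessions):
    """Summarize a non-empty list of sessions (each a list of pages)."""
    page_hits = Counter(page for s in sessions for page in s)
    return {
        "sessions": len(sessions),
        "avg_pages_per_session": sum(len(s) for s in sessions) / len(sessions),
        "top_pages": page_hits.most_common(2),
        "entry_pages": Counter(s[0] for s in sessions).most_common(1),
        "exit_pages": Counter(s[-1] for s in sessions).most_common(1),
    }


# Hypothetical sessions.
sessions = [
    ["home", "products", "cart"],
    ["home", "products"],
    ["products", "cart"],
]
report = aggregate_report(sessions)
```

Figures like these are cheap to compute and give a first overview of site usage, but they say little about individual navigation patterns.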

OLAP
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.

Data mining
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.

Data mining (cont.)
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.

Some usage mining applications
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.

Personalization application
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.

Standard approaches
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.

Summary
Web usage mining has emerged as the essential tool for realizing more personalized, user-friendly, and business-optimal Web services.
The key is to use the user clickstream data for many mining purposes.
Traditionally, Web usage mining has been used by e-commerce sites to organize their sites and to increase profits.
It is now also used by search engines to improve search quality and to evaluate search results, and by many other applications.
Source: Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.

References
Bing Liu (2011), “Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data,” 2nd Edition, Springer.
Efraim Turban, Ramesh Sharda, Dursun Delen (2011), “Decision Support and Business Intelligence Systems,” Ninth Edition, Pearson.