Knowledge Discovery in Database (KDD). The whole process of extraction of implicit, previously unknown and potentially useful knowledge from a large database.


1 Knowledge Discovery in Database (KDD)

2 Knowledge Discovery Process
The whole process of extraction of implicit, previously unknown and potentially useful knowledge from a large database
–It includes data selection, cleaning, enrichment, coding, data mining, and reporting
–Data mining is the key stage of the knowledge discovery process: the process of finding the desired information in a large database

3 Apply knowledge discovery process
Example: the database of a magazine publisher that sells five types of magazines: cars, houses, sports, music and comics
–Data mining: find interesting customer properties
What is the profile of a reader of a car magazine?
Is there any correlation between an interest in cars and an interest in comics?
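One possible way to put a number on the cars/comics question is the phi coefficient, a correlation measure for two yes/no variables. The sketch below is only an illustration: the slides do not say which measure is used, and the function name `phi_coefficient` and the reader flags are made up for the example.

```python
from math import sqrt


def phi_coefficient(xs, ys):
    """Phi coefficient between two equally long lists of 0/1 flags.

    Returns None when the value is undefined (one flag never varies).
    """
    if len(xs) != len(ys):
        raise ValueError("both lists must have the same length")
    pairs = list(zip(xs, ys))
    n11 = sum(1 for x, y in pairs if x == 1 and y == 1)
    n10 = sum(1 for x, y in pairs if x == 1 and y == 0)
    n01 = sum(1 for x, y in pairs if x == 0 and y == 1)
    n00 = sum(1 for x, y in pairs if x == 0 and y == 0)
    denominator = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denominator == 0:
        return None
    return (n11 * n00 - n10 * n01) / denominator


# Made-up flags for six readers: 1 = subscribes, 0 = does not.
cars = [1, 1, 1, 0, 0, 0]
comics = [1, 1, 0, 0, 0, 1]
phi = phi_coefficient(cars, comics)
```

A value near +1 or -1 suggests a strong relationship, while a value near 0 suggests none; with real data a significance test would also be needed before drawing conclusions.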

4 Data Selection
Select the information about people who have subscribed to a magazine

5 Cleaning
Pollution: typing errors, people moving from one place to another without notifying a change of address, people giving incorrect information about themselves
–Pattern recognition algorithms
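A minimal sketch of how such pollution might be detected, assuming that two records with the same address and nearly identical names probably describe one client. The field names, the sample records, and the 0.85 similarity threshold are assumptions for illustration; the slides do not name a specific algorithm, and `difflib.SequenceMatcher` is used here only as a simple stand-in for a pattern recognition technique.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.85):
    """True when two strings are nearly identical, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def find_probable_duplicates(records, threshold=0.85):
    """Return pairs of client ids that share an address and a similar name."""
    pairs = []
    for i, first in enumerate(records):
        for second in records[i + 1:]:
            same_address = (
                first["address"].strip().lower()
                == second["address"].strip().lower()
            )
            if same_address and similar(first["name"], second["name"], threshold):
                pairs.append((first["client_id"], second["client_id"]))
    return pairs


# Made-up client records; the second one contains a typing error in the name.
records = [
    {"client_id": 1, "name": "Johnson", "address": "1 Downing Street"},
    {"client_id": 2, "name": "Jonson", "address": "1 Downing Street"},
    {"client_id": 3, "name": "Clinton", "address": "2 Boulevard"},
]
duplicates = find_probable_duplicates(records)
```

The pairwise comparison is quadratic in the number of records, so on a large database the candidates would first have to be grouped, for example by address or postcode.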

6 Cleaning
Lack of domain consistency
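Domain consistency can be checked by testing each field against the values it is allowed to take. The sketch below assumes two rules that are not stated in the slides: a purchase date must be present and must not lie in the future, and the magazine must be one of the five types the publisher sells. The field names and the function `domain_violations` are hypothetical.

```python
from datetime import date

# The five magazine types named in the example.
MAGAZINES = {"cars", "houses", "sports", "music", "comics"}


def domain_violations(record, today):
    """List the assumed domain rules that a single record breaks."""
    problems = []
    purchase_date = record.get("purchase_date")
    if purchase_date is None:
        problems.append("purchase_date is missing")
    elif purchase_date > today:
        problems.append("purchase_date lies in the future")
    if record.get("magazine") not in MAGAZINES:
        problems.append("magazine is not one of the five known types")
    return problems


# Made-up records; the reference date is fixed so the result is reproducible.
records = [
    {"client_id": 1, "purchase_date": date(1994, 4, 15), "magazine": "cars"},
    {"client_id": 2, "purchase_date": date(2099, 1, 1), "magazine": "music"},
    {"client_id": 3, "purchase_date": None, "magazine": "gardening"},
]
today = date(1996, 1, 1)
report = {r["client_id"]: domain_violations(r, today) for r in records}
```

Records with a non-empty list can then be corrected, set to unknown, or dropped, depending on how much information they still carry.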

7 Enrichment
Need extra information about the clients consisting of date of birth, income, amount of credit, and whether or not an individual owns a car or a house

8 Enrichment
The new information needs to be easily joined to the existing client records
–Extract more knowledge
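Joining the extra information to the client records amounts to a left join on a shared key. The following sketch assumes that both sources share a `client_id` field, which the slides do not state; the field names and sample values are made up. Clients without a match keep their record, with the new fields set to `None`.

```python
from datetime import date

# Extra fields listed in the enrichment step (names are assumptions).
EXTRA_FIELDS = ("birth_date", "income", "credit", "car_owner", "house_owner")


def enrich(clients, extra_by_client_id):
    """Left-join extra information onto client records by client_id."""
    enriched = []
    for client in clients:
        extra = extra_by_client_id.get(client["client_id"], {})
        row = dict(client)  # copy so the original record is not modified
        for field in EXTRA_FIELDS:
            row[field] = extra.get(field)  # None when nothing is known
        enriched.append(row)
    return enriched


clients = [
    {"client_id": 1, "name": "Johnson"},
    {"client_id": 2, "name": "Clinton"},
]
extra_by_client_id = {
    1: {
        "birth_date": date(1966, 4, 13),
        "income": 18500,
        "credit": 17800,
        "car_owner": "no",
        "house_owner": "yes",
    },
}
enriched = enrich(clients, extra_by_client_id)
```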

9 Coding
We select only those records that have enough information to be of value (rows)
Project the fields in which we are interested (columns)
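Selecting rows and projecting columns can be sketched as a single filter. What counts as "enough information" is a judgment call; here it is assumed to mean that a chosen set of required fields is not empty. The function name and field names are hypothetical.

```python
def select_and_project(records, required, keep):
    """Keep rows where every required field is present, then keep only some fields."""
    selected = []
    for record in records:
        if all(record.get(field) is not None for field in required):
            selected.append({field: record.get(field) for field in keep})
    return selected


# Made-up records; client 2 lacks income and is therefore dropped.
records = [
    {"client_id": 1, "name": "Johnson", "income": 18500, "magazine": "cars"},
    {"client_id": 2, "name": "Clinton", "income": None, "magazine": "music"},
]
table = select_and_project(
    records,
    required=("income", "magazine"),
    keep=("client_id", "income", "magazine"),
)
```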

10

11 Coding
Code the information which is too detailed
–Address to region
–Birth date to age
–Divide income by 1000
–Divide credit by 1000
–Convert cars yes-no to 1-0
–Convert purchase date to month numbers starting from 1990
The way in which we code the information will determine the type of patterns we find
Coding has to be performed repeatedly in order to get the best results
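The coding rules above could be applied to one record as in the sketch below. Several details are assumptions rather than part of the slides: the address-to-region lookup table is invented, age is computed against a fixed reference date, and month numbering is taken to mean that January 1990 is month 1.

```python
from datetime import date

# Hypothetical lookup from address to region code.
REGION_BY_ADDRESS = {"1 Downing Street": 1, "2 Boulevard": 2}


def age_on(birth_date, reference):
    """Age in whole years on the reference date."""
    had_birthday = (reference.month, reference.day) >= (birth_date.month, birth_date.day)
    return reference.year - birth_date.year - (0 if had_birthday else 1)


def month_number(purchase_date, base_year=1990):
    """Months counted from the base year, with January of that year as 1."""
    return (purchase_date.year - base_year) * 12 + purchase_date.month


def code_record(record, reference):
    """Apply the coding rules to one enriched client record."""
    return {
        "client_id": record["client_id"],
        "region": REGION_BY_ADDRESS.get(record["address"]),
        "age": age_on(record["birth_date"], reference),
        "income_k": record["income"] / 1000,
        "credit_k": record["credit"] / 1000,
        "car_owner": 1 if record["car_owner"] == "yes" else 0,
        "purchase_month": month_number(record["purchase_date"]),
        "magazine": record["magazine"],
    }


# Made-up record and reference date.
record = {
    "client_id": 1,
    "address": "1 Downing Street",
    "birth_date": date(1966, 4, 13),
    "income": 18500,
    "credit": 17800,
    "car_owner": "no",
    "purchase_date": date(1994, 4, 15),
    "magazine": "cars",
}
coded = code_record(record, reference=date(1996, 1, 1))
```

Because the choice of coding shapes which patterns can be found, functions like these are usually adjusted and rerun several times, for example with coarser or finer regions.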

12 Coding
We are interested in the relationships between readers of different magazines
–Perform a flattening operation
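Flattening here is taken to mean turning one row per subscription into one row per client, with a 1/0 flag for each magazine type. That reading is an assumption, as are the function name and sample data in this sketch.

```python
# The five magazine types named in the example, in a fixed column order.
MAGAZINES = ("cars", "houses", "sports", "music", "comics")


def flatten(subscriptions):
    """Turn (client_id, magazine) pairs into one 0/1 row per client."""
    rows = {}
    for client_id, magazine in subscriptions:
        row = rows.setdefault(client_id, {m: 0 for m in MAGAZINES})
        row[magazine] = 1
    return rows


# Made-up subscriptions: client 1 reads two magazines, client 2 reads one.
subscriptions = [
    (1, "cars"),
    (1, "comics"),
    (2, "music"),
]
flat = flatten(subscriptions)
```

With one row per client, questions about readers of several magazines reduce to comparing columns of the same table.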

13 Knowledge Discovery Process

14 Business-Question-Driven Process

15 Steps of a KDD Process
Learning the application domain
–Relevant prior knowledge and goals of the application
Creating a target data set
–Data selection
Data cleaning and enrichment
–May take 60% of the effort
Data reduction and transformation (coding)
–Find useful features, dimensionality reduction
Choosing the functions of data mining
–Summarization, association, classification, regression, clustering, …
Choosing the mining algorithm(s)
Data mining
–Search for patterns of interest
Pattern evaluation and knowledge presentation
–Visualization, transformation, removing redundant patterns, etc.
Use of discovered knowledge

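16 Exercises 1
1. What is the RFM metric, and what is it used for?
2. What is data mining, and what is its goal?
3. Why do small companies not need data mining?
4. How can large companies get to know their customers?
5. Describe a real application of data mining (it must not be the same as the examples in the slides).
6. List and explain the steps of the Knowledge Discovery in Database (KDD) process.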

