『 Data Mining 』 By Jung, hae-sun.
1. Introduction 2. Definition 3. Data Mining Applications 4. Data Mining Tasks 5. Overview of the System 6. Data Mining Analysis 7. Application 8. References
1. Introduction Data mining is related to - Data warehousing - Online analytical processing (OLAP) - Data visualization. Data mining needs a data warehouse for effective mining. The aims of OLAP and data mining are similar, but only data mining involves looking for unknown patterns. Finally, data mining requires data visualization for the presentation of results.
2. Definition A technique using software tools geared toward the user who typically does not know exactly what he or she is searching for, but is looking for particular patterns or trends. Data mining is the process of sifting through large amounts of data to discover relationships within the data content. This is also known as data surfing.
3. Data Mining Applications Applications in financial, telecom, insurance and retail companies for - market segmentation - fraud detection - better marketing - trend analysis - market basket analysis - customer churn
4. Data Mining Tasks - Class description - Association - Sequential patterns - Time-series analysis - Prediction - Classification - Clustering
5. Overview of the System - Recommender System [Figure: system diagram. Data sources: Product Database, Customer Purchase Database. Processing steps: Data Mining Clustering (grouping between customers and products), Data Mining Associations (grouping between products), Matching Algorithm. Data passed between steps: normalized customer vectors, cluster assignments, cluster-specific product lists, product affinities, products eligible for recommendation, vector for target customer, product list for target customer's cluster. Output: Personalized Recommendation List for the target customer.]
6. Data Mining Analysis (1) ▶ Clustering - Neural Clustering Algorithm - Demographic Clustering Algorithm ▶ Association Rule - Apriori Algorithm - AprioriAll Algorithm - AprioriTid Algorithm - DynamicSome Algorithm - FP-Growth ▶ Matching Algorithm (key points in this paper)
6. Data Mining Analysis (2) ▶ Association Rule - Concept - Search for interesting relationships among items in a given data set. ▶ Association Rule - Procedure 1. Find all frequent itemsets: each of these itemsets must occur at least as frequently as a predetermined minimum support. 2. Generate strong association rules from the frequent itemsets: these rules must satisfy both minimum support and minimum confidence.
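6. Data Mining Analysis (3) ▶ Association Rule - Measure
- Support: $\mathrm{Support}(A \Rightarrow B) = \dfrac{\text{number of transactions containing both } A \text{ and } B}{\text{total number of transactions}} = P(A \cap B)$
- Confidence: $\mathrm{Confidence}(A \Rightarrow B) = \dfrac{\text{number of transactions containing both } A \text{ and } B}{\text{number of transactions containing } A} = \dfrac{P(A \cap B)}{P(A)} = P(B \mid A)$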
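The two measures above can be sketched in Python as follows. This is a minimal illustration, assuming each transaction is stored as a set of item labels; the function names `support` and `confidence` and the toy baskets are hypothetical and do not come from the presentation.

```python
from typing import Iterable, List, Set


def support(transactions: List[Set[str]], items: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    wanted = set(items)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def confidence(transactions: List[Set[str]],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Estimate P(consequent | antecedent) as support(A and B) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # rule is undefined when the antecedent never occurs
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Hypothetical toy data: four baskets over the items x, y, z.
baskets = [{"x", "y"}, {"x"}, {"x", "y", "z"}, {"z"}]
print(support(baskets, {"x", "y"}))       # share of baskets with both x and y
print(confidence(baskets, {"x"}, {"y"}))  # share of x-baskets that also hold y
```

Returning 0.0 for an antecedent that never occurs is a design choice made here for simplicity; raising an error would be an equally reasonable convention.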
6. Data Mining Analysis (4) ▶ Association Rule - Example
Step 1: Find all frequent itemsets. Minimum support = 60%
Purchased products   A  B  C  D  E  F
Customer 1           1  0  0  0  0  1
Customer 2           1  1  0  1  0  1
Customer 3           1  0  1  1  0  1
Customer 4           1  0  0  1  0  1
Customer 5           1  1  0  0  1  0
Support of A & D = 3/5 = 0.6
Support of A & F = 4/5 = 0.8
Support of A & E = 1/5 = 0.2
Large itemset   # of transactions   Support (%)
A               5                   100
D               3                   60
F               4                   80
A,D             3                   60
A,F             4                   80
D,F             3                   60
A,D,F           3                   60
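Step 1 can be reproduced with a short level-wise search in the spirit of Apriori. This is a simplified sketch rather than the algorithm as published: the transactions are transcribed from the example table, while the function name `frequent_itemsets` and the candidate-generation shortcut (combining all items that are still frequent, then pruning) are assumptions made for illustration.

```python
from itertools import combinations

# Transactions transcribed from the example table (one set per customer).
TRANSACTIONS = [
    {"A", "F"},
    {"A", "B", "D", "F"},
    {"A", "C", "D", "F"},
    {"A", "D", "F"},
    {"A", "B", "E"},
]


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    candidates = [frozenset([i]) for i in items]  # level 1: single items
    size = 1
    while candidates:
        level = {}
        for cand in candidates:
            count = sum(1 for t in transactions if cand <= t)
            if count / n >= min_support:
                level[cand] = count / n
        result.update(level)
        # Next level: combine items that are still frequent, and keep a
        # candidate only if all of its (size)-item subsets were frequent.
        size += 1
        alive = sorted(set().union(*level))
        candidates = [
            frozenset(c)
            for c in combinations(alive, size)
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return result


found = frequent_itemsets(TRANSACTIONS, 0.6)
for itemset in sorted(found, key=lambda s: (len(s), sorted(s))):
    print(",".join(sorted(itemset)), f"{found[itemset]:.0%}")
```

With a minimum support of 60%, the search yields the same seven large itemsets as the table: A, D, F, (A,D), (A,F), (D,F) and (A,D,F).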
6. Data Mining Analysis (5) Step 2: Generate strong association rules from the frequent itemsets. Minimum confidence = 90%
Rule       Support P(A ∩ B)   Prob. of condition   Confidence
A ⇒ F      80%                100%                 0.8
A ⇒ D      60%                100%                 0.6
D ⇒ F      60%                60%                  1
D,F ⇒ A    60%                60%                  1
A ⇒ D: confidence = 60% / 100% = 0.6
D ⇒ F: confidence = 60% / 60% = 1
Strong association rules: D ⇒ F, etc.
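Step 2 can be sketched the same way: for each frequent itemset, try every non-empty proper subset as the condition and keep the rules whose confidence reaches the threshold. The transactions and the list of frequent itemsets are taken from the example; the names `count` and `strong_rules` are hypothetical helpers introduced only for this illustration.

```python
from itertools import combinations

# Transactions and frequent itemsets transcribed from the example.
TRANSACTIONS = [
    {"A", "F"},
    {"A", "B", "D", "F"},
    {"A", "C", "D", "F"},
    {"A", "D", "F"},
    {"A", "B", "E"},
]
FREQUENT = [frozenset(s) for s in ("A", "D", "F", "AD", "AF", "DF", "ADF")]


def count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def strong_rules(frequent, min_confidence):
    """Return {(condition, result): confidence} for rules above the threshold."""
    rules = {}
    for itemset in frequent:
        # Single-item sets produce no rules because range(1, 1) is empty.
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                condition = frozenset(lhs)
                conf = count(itemset) / count(condition)
                if conf >= min_confidence:
                    rules[(condition, itemset - condition)] = conf
    return rules


rules = strong_rules(FREQUENT, 0.9)
for (lhs, rhs), conf in rules.items():
    print(",".join(sorted(lhs)), "=>", ",".join(sorted(rhs)), conf)
```

Because every itemset passed in already meets the minimum support, only the confidence check is needed here. Rules such as A ⇒ F (confidence 0.8) are rejected at the 90% threshold, while D ⇒ F and D,F ⇒ A are kept.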
7. Application (1) - Safeway Stores ▶ Data Collection - Duration: 7 months - Number of customers: 200 - Recommended products per customer: 10 to 20
7. Application (3) - Safeway Stores ▶ Results - 1957 products were recommended. Of these, 120 (6.1%) were chosen. (It is important to recall that the recommendation list contains no products previously purchased by the customer.) This system can be used as a reasonable tool for recommending new products in supermarkets.
8. References
- Agrawal, R. and Srikant, R., "Fast Algorithms for Mining Association Rules", in Proc. of the VLDB Conf., 1994
- http://www.twocrows.com/glossary.htm, "Two Crows, Data Mining Glossary"
- http://www.mis.postech.ac.kr/topic/dm_e.html, "Data Mining"
- http://wwwmaths.anu.edu.au/~steve/pdcn.pdf