Data Mining Using IBM Intelligent Miner. Presented by: Qiyan (Jennifer) Huang.


1 Data Mining Using IBM Intelligent Miner
Presented by: Qiyan (Jennifer) Huang

2 Outline
Introduction
Mining Process
Main Functionalities of Intelligent Miner
Other Data Mining Products
Data Mining and Privacy
Summary
References

3 What is Data Mining
Data mining: discovering interesting patterns from large amounts of data
–Knowledge discovery (mining) in databases (KDD), data/pattern analysis, information harvesting, business intelligence, etc.

4 Evolution of Database Technology
1960s:
–Data collection, database creation
1970s:
–Relational data model, relational DBMS implementation
1980s ~ present:
–RDBMS, advanced data models
1990s to 2000s:
–Data mining and data warehousing, multimedia databases, and Web databases

5 Data Mining vs. Database Query
Database query
–Find all customers who have purchased milk.
–Identify customers who have purchased more than $10,000 in the last month.
Data mining
–Find all items which are frequently purchased with milk (association rules).
–Identify customers with similar buying habits (clustering).
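The contrast on this slide can be sketched with a few lines of Python. This is only an illustration: the transaction data, the function names, and the `min_count` threshold are hypothetical and do not come from the presentation. A query answers a condition that is fully stated in advance, while the mining-style question counts co-occurrences to discover what tends to accompany an item.

```python
from collections import Counter

# Hypothetical toy transactions (customer, items bought); not from the slides.
TRANSACTIONS = [
    ("ann", {"milk", "bread"}),
    ("bob", {"milk", "bread", "eggs"}),
    ("cat", {"beer", "chips"}),
    ("dan", {"milk", "eggs"}),
]


def customers_who_bought(item):
    """Database-style query: the condition is fully specified up front."""
    return sorted(name for name, items in TRANSACTIONS if item in items)


def items_bought_with(item, min_count=2):
    """Mining-style question: discover items that co-occur with `item`."""
    counts = Counter()
    for _, items in TRANSACTIONS:
        if item in items:
            counts.update(items - {item})  # count every other item in the basket
    return sorted(other for other, n in counts.items() if n >= min_count)


print(customers_who_bought("milk"))
print(items_bought_with("milk"))
```

The second function is a crude stand-in for association analysis; a real tool would also report how strong each relationship is.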

6 Data Mining Process (KDD)
Databases -> Data Cleaning -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation
Source: J. Han and M. Kamber. Data Mining: Concepts and Techniques, 2001

7 About DB2 Intelligent Miner
DB2 Intelligent Miner for Data “focused on the large-scale mining, such as large volumes of data, parallel data mining on Windows NT, Sun Solaris, and OS/390” – IBM

8 Main Functionalities
Cluster analysis
–Group the data that share similar trends and patterns
Classification
–Predict the outcome based on historical data
Association analysis
–Find frequent patterns

18 Classification
This follows an example from Quinlan’s ID3

20 Classification

21 Classification

22 Association
–Association rule: identifies relationships between items that occur together in transactions
–Example: “30% of customers buy shirts across all transactions, and 60% of these customers will also buy a tie”
Confidence: 60%, the share of shirt buyers who also buy a tie
Support: the share of all transactions that contain both a shirt and a tie; with the figures above this is 30% × 60% = 18%
Lift: confidence divided by the support of the rule head (the tie); 60% / 30% = 2 if ties appear in 30% of all transactions
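The three measures can be computed directly from transaction counts. The sketch below is a minimal illustration, not Intelligent Miner's implementation: the function name and the toy data are hypothetical, and lift is taken to be confidence divided by the support of the rule head, which is the usual definition. The toy data is built so that 30% of transactions contain a shirt, 60% of those also contain a tie, and 30% of all transactions contain a tie.

```python
def rule_metrics(transactions, body, head):
    """Return (support, confidence, lift) for the rule body => head.

    `transactions` is a list of item sets; `body` and `head` are item sets.
    Assumes at least one transaction contains the body and one the head.
    """
    n = len(transactions)
    body_count = sum(1 for t in transactions if body <= t)
    head_count = sum(1 for t in transactions if head <= t)
    both_count = sum(1 for t in transactions if (body | head) <= t)

    support = both_count / n               # share of all transactions with body and head
    confidence = both_count / body_count   # share of body transactions that also have head
    lift = confidence / (head_count / n)   # confidence relative to the head's base rate
    return support, confidence, lift


# Hypothetical data: 100 transactions in total.
TRANSACTIONS = (
    [{"shirt", "tie"}] * 18
    + [{"shirt"}] * 12
    + [{"tie"}] * 12
    + [{"socks"}] * 58
)

support, confidence, lift = rule_metrics(TRANSACTIONS, {"shirt"}, {"tie"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 means the head is more likely when the body is present than it is in general.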

23 Association
Support (%)  Confidence (%)  Type  Lift    Rule Body          Rule Head
5.5286       34.0800         +     2.7300  [203] + [1207]  => [1716]
7.0388       34.1300         +     2.7400  [203] + [1719]  => [1716]
5.4662       34.1700         +     2.7400  [202] + [802]   => [1716]
5.8805       34.3400         +     2.7500  [203] + [802]   => [1716]
5.0163       34.4900         +     2.7600  [203] + [705]   => [1716]
7.1279       34.7400         +     2.7800  [202] + [1718]  => [1716]
5.8226       34.7600         +     3.3900  [711] + [203]   => [710]
5.0697       34.8300         +     2.7400  [202] + [1702]  => [1703]
5.2836       34.8300         +     2.7400  [202] + [1207]  => [1703]
5.4350       34.9400         +     3.4100  [201] + [711]   => [710]
5.3459       35.0200         +     2.7600  [201] + [1702]  => [1703]
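One way to read this table is to check how the Lift column relates to the Confidence column. Assuming lift is defined as confidence divided by the support of the rule head (the usual definition; the slides do not state which one the tool uses), dividing confidence by lift should give roughly the same value for every rule that shares a head. The sketch below does that with the figures from the table; the function name is hypothetical.

```python
# (confidence %, lift, rule head) copied from the table on this slide.
ROWS = [
    (34.08, 2.73, "1716"),
    (34.13, 2.74, "1716"),
    (34.17, 2.74, "1716"),
    (34.34, 2.75, "1716"),
    (34.49, 2.76, "1716"),
    (34.74, 2.78, "1716"),
    (34.76, 3.39, "710"),
    (34.83, 2.74, "1703"),
    (34.83, 2.74, "1703"),
    (34.94, 3.41, "710"),
    (35.02, 2.76, "1703"),
]


def implied_head_support(rows):
    """Group confidence / lift (in percent) by rule head."""
    by_head = {}
    for confidence, lift, head in rows:
        by_head.setdefault(head, []).append(confidence / lift)
    return by_head


for head, values in implied_head_support(ROWS).items():
    print(head, [round(v, 2) for v in values])
```

If the values within each group agree closely, the table is consistent with that definition of lift.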

24 Data Mining Products
More than 50 commercial data mining tools
Wide range of pricing
–SAS Institute’s Enterprise Miner ~ $80k
–SPSS Inc. Clementine ~ $75k
–IBM Intelligent Miner ~ $60k
–Desktop products start at a few hundred dollars

25 Data Mining Products
Data Mining Product Comparison on Algorithm (vendors compared: IBM, SAS, SPSS)
Neural Network: √√√
Decision Tree: √√√
Clustering: √√
Association: √√
Nearest Neighbour: √
Kohonen Self-Organizing Map: √√

26 Data Mining & Privacy
Release a limited subset of data
–Hide attributes that are potentially related to personal information
Release encrypted data
Audit to detect misuse of data
Set up a data mining controller

27 Summary
Introduction to Data Mining
A KDD Data Mining Process
Functionalities of Intelligent Miner
Commercial Data Mining Tools
Data Mining & Privacy

28 References
Angoss Whitepaper: http://www.angoss.com/ProdServ/AnalyticalTools/kseeker/whitepaper.html. Retrieved on Oct 26th, 2003
C. Clifton & D. Marks. Security and Privacy Implications of Data Mining. 1996
D. W. Abbott, I. P. Matkovsky & J. F. Elder IV. An Evaluation of High-end Data Mining Tools. Elder Research. http://www.rgrossman.com/faq/dm-02.htm. Retrieved on Oct 28th, 2003
IBM. DB2 Intelligent Miner. http://www-3.ibm.com/software/data/iminer/. Retrieved on Oct 26th, 2003
J. F. Elder & D. W. Abbott. A Comparison of Leading Data Mining Tools. August, 1998
J. Han and M. Kamber. Data Mining: Concepts and Techniques, 2000
http://www.cald.cs.cmu.edu/summerschool03/PrivacyPreservingDM.ppt. Retrieved on Nov 10th, 2003
Robert Grossman. http://www.datamininglab.com/toolcomp.html#comparison. Retrieved on Oct 20th, 2003
SPSS. http://www.spss.com/. Retrieved on Nov 12th, 2003

30 Evolution of Database Technology
1960s:
–Data collection, database creation, and network DBMS
1970s:
–Relational data model, relational DBMS implementation
1980s:
–RDBMS, advanced data models
1990s to 2000s:
–Data mining and data warehousing, multimedia databases, and Web databases

31 Data Mining: On What Kind of Data?
Data sources
–Relational databases
–Data warehouses
–Transactional databases
–WWW
Data types
–Audio
–Image
–Text

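32 Output: A Decision Tree for “buys_computer”
[Figure: decision tree. The root node tests age? with branches <=30, 30..40 (node labeled “overcast” on the slide), and >40. The <=30 branch tests student? (no / yes); the >40 branch tests credit rating? (fair / excellent). Leaves are no / yes.]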
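A decision tree like this one reads naturally as nested conditionals. The sketch below is a minimal Python rendering under stated assumptions: the transcript does not preserve which leaf hangs off which branch, so the leaf values follow the commonly cited textbook version of the buys_computer example, and the function name and the encoding of the attributes (integer age, boolean student flag, string credit rating) are hypothetical.

```python
def buys_computer(age, student, credit_rating):
    """Classify one customer by walking the tree from the root.

    Leaf values are an assumption taken from the textbook version of
    this example, not from the slide transcript.
    """
    if age <= 30:
        # Youngest group: the decision depends on student status.
        return "yes" if student else "no"
    if age <= 40:
        # Middle group: a single leaf.
        return "yes"
    # Oldest group: the decision depends on credit rating.
    return "yes" if credit_rating == "fair" else "no"


print(buys_computer(25, True, "fair"))
print(buys_computer(45, False, "excellent"))
```

Each path from the root to a leaf corresponds to one classification rule, for example "age <= 30 and student => buys a computer".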

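33 Neural network
[Figure: a single neuron. Input vector x = (x0, x1, ..., xn) and weight vector w = (w0, w1, ..., wn) feed a weighted sum; a bias term (subscript k) is subtracted, and the activation function f produces the output y.]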
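The computation of a single neuron can be sketched in a few lines. This is an illustration under assumptions: the figure appears to subtract a bias term from the weighted sum before applying the activation function f, and the slide does not say which activation is used, so `tanh` and a simple step function are shown as hypothetical choices. All names are illustrative.

```python
import math


def step(s):
    """Threshold activation: fire (1) when the net input is positive."""
    return 1 if s > 0 else 0


def neuron_output(x, w, bias, activation=math.tanh):
    """Compute y = f(sum_i w_i * x_i - bias) for one neuron."""
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x))
    return activation(weighted_sum - bias)


x = [1.0, 2.0]   # input vector
w = [0.5, 0.25]  # weight vector
print(neuron_output(x, w, bias=1.0))
print(neuron_output(x, w, bias=0.5, activation=step))
```

Training a network amounts to adjusting the weights and bias so that the outputs match known examples; that step is not shown here.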

34 Neural network
[Figure: network diagram labeled with the values 0.15, 0.29, 0.11, 0.25, 0.09, 0.23, 0.32, 0.27]

36 Applications of Clustering
Pattern Recognition
Image Processing
Economic Science (especially market research)
WWW
–Document classification
–Cluster Weblog data to discover groups of similar access patterns

37 Data Mining & Privacy
[Figure: diagram with three components: Data Mining Tool, Mining Controller, Data warehouse]

38 Examples of Clustering Applications
Marketing: Help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs
Insurance: Identify groups of motor insurance policy holders with a high average claim cost
City planning: Identify groups of houses according to their house type, value, and geographical location
Earthquake studies: Observed earthquake epicenters should be clustered along continental faults
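To make the insurance example concrete, the sketch below groups policy holders by average claim cost with a very small one-dimensional k-means loop. k-means is used here only as a familiar stand-in; the slides do not say it is the algorithm Intelligent Miner applies, and the claim figures, starting centers, and function name are all hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Cluster numbers around the given starting centers (1-D k-means)."""
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the group of its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


# Hypothetical average claim costs per policy holder.
claims = [100, 120, 110, 900, 950, 1000]
centers, groups = kmeans_1d(claims, centers=[0.0, 500.0])
print(centers)
print(groups)
```

The group with the higher center would be the "high average claim cost" segment from the slide.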

39 Association
Association and pattern analysis
–Applications: basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.
–Examples (rule [support, confidence]):
buys(x, “diapers”) => buys(x, “beers”) [0.5%, 60%]
major(x, “CS”) ^ takes(x, “DB”) => grade(x, “A”) [1%, 75%]

40 Data Mining: On What Kind of Data?
Relational databases
Data warehouses
Transactional databases
Advanced DB and information repositories
–Object-oriented and object-relational databases
–Text databases and multimedia databases
–Heterogeneous and legacy databases
–WWW

41 Steps of a KDD Process
Learning the application domain:
–relevant prior knowledge and goals of application
Creating a target data set: data selection
Data cleaning and preprocessing (may take 60% of effort!)
Data reduction and transformation:
–Find useful features, dimensionality/variable reduction, invariant representation
Choosing functions of data mining:
–summarization, classification, regression, association, clustering
Choosing the mining algorithm(s)
Data mining: search for patterns of interest
Pattern evaluation and knowledge presentation:
–visualization, transformation, removing redundant patterns, etc.
Use of discovered knowledge

42 Strengths and Weaknesses
Strengths
–Algorithm breadth
–Graphical output
–Available for PC and mainframe environments
Weaknesses
–No automation
–Data has to reside in IBM’s database system

