Data Mining: Confluence of Multiple Disciplines Data Mining Database Systems Statistics Other Disciplines Algorithm Machine Learning Visualization.

2 Data Mining: Confluence of Multiple Disciplines Data Mining Database Systems Statistics Other Disciplines Algorithm Machine Learning Visualization

3 Data Mining Outline Introduction Classification Clustering Association Rules

5 Introduction Data is growing at a phenomenal rate Users expect more sophisticated information How? UNCOVER HIDDEN INFORMATION DATA MINING

6 Data Mining Definition Finding hidden information in a database Fit data to a model: descriptive or predictive Similar terms –Exploratory data analysis –Data driven discovery –Deductive learning

7 But it isn’t Magic You must know what you are looking for You must know how to look for it Suppose you knew that a specific cave had gold: What would you look for? How would you look for it? Might need an expert miner

8 “If it looks like a duck, walks like a duck, and quacks like a duck, then it’s a duck.” Description, Behavior, Associations. Classification, Clustering, Link Analysis. “If it looks like a terrorist, walks like a terrorist, and quacks like a terrorist, then it’s a terrorist.”

9 Query Examples
Database:
– Find all customers who have purchased milk.
– Find all credit applicants with last name of Smith.
– Identify customers who have purchased more than $10,000 in the last month.
Data Mining:
– Find all items which are frequently purchased with milk. (association rules)
– Find all credit applicants who are poor credit risks. (classification)
– Identify customers with similar buying habits. (clustering)

10 KDD Process
– Selection: Obtain data from various sources.
– Preprocessing: Cleanse data.
– Transformation: Convert to common format. Transform to new format.
– Data Mining: Obtain desired results.
– Interpretation/Evaluation: Present results to user in meaningful manner.
© Prentice Hall

11 Data Mining Outline Introduction Classification – Assign data to a predefined class –Decision Trees –Neural Networks –Distance Based Clustering Association Rules

12 Training database:

Insect ID   Abdomen Length   Antennae Length   Insect Class
1           2.7              5.5               Grasshopper
2           8.0              9.1               Katydid
3           0.9              4.7               Grasshopper
4           1.1              3.1               Grasshopper
5           5.4              8.5               Katydid
6           2.9              1.9               Grasshopper
7           6.1              6.6               Katydid
8           0.5              1.0               Grasshopper
9           8.3              6.6               Katydid
10          8.1              4.7               Katydid
11          5.1              7.0               ???????

The classification problem can now be expressed as: given a training database, predict the class label of a previously unseen instance. Here the previously unseen instance is insect 11.
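As a rough illustration of how the unseen insect could be labeled, here is a minimal k-nearest-neighbor sketch, one example of the distance-based approach named in the outline. The decimal values are read off the flattened insect data above, and the function name `knn_predict` and the choice k = 3 are assumptions made for this example, not something the slides specify.

```python
from collections import Counter
from math import dist

# (abdomen length, antennae length, class) for insects 1-10,
# as read from the slide's insect data (decimal placement is an assumption).
TRAINING = [
    (2.7, 5.5, "Grasshopper"),
    (8.0, 9.1, "Katydid"),
    (0.9, 4.7, "Grasshopper"),
    (1.1, 3.1, "Grasshopper"),
    (5.4, 8.5, "Katydid"),
    (2.9, 1.9, "Grasshopper"),
    (6.1, 6.6, "Katydid"),
    (0.5, 1.0, "Grasshopper"),
    (8.3, 6.6, "Katydid"),
    (8.1, 4.7, "Katydid"),
]


def knn_predict(training, point, k=3):
    """Return the majority class among the k training rows closest to point."""
    # Sort by Euclidean distance on the two numeric features and keep k rows.
    nearest = sorted(training, key=lambda row: dist(row[:2], point))[:k]
    votes = Counter(label for _, _, label in nearest)
    return votes.most_common(1)[0][0]


# Insect 11: abdomen length 5.1, antennae length 7.0
print(knn_predict(TRAINING, (5.1, 7.0)))
```

Under these assumptions the three closest training insects are 7, 5, and 1, so the majority vote labels insect 11 a Katydid. A different k or distance measure could give a different answer on other data.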

13 Classification Process (1): Model Construction Training Data Classification Algorithms IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’ Classifier (Model)

14 Classification Process (2): Use the Model in Prediction Classifier Testing Data Unseen Data (Jeff, Professor, 4) Tenured?
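The two steps above can be sketched in a few lines: the rule produced during model construction becomes a function, and prediction is simply calling it on unseen data. The function name `predict_tenured` is hypothetical, and returning 'no' when the rule does not fire is an assumption, since the slide only states the 'yes' case.

```python
def predict_tenured(rank, years):
    """Apply the rule: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    # Assumption: any tuple that does not satisfy the rule is classified 'no'.
    if rank.lower() == "professor" or years > 6:
        return "yes"
    return "no"


# Unseen tuple from the slide: (Jeff, Professor, 4)
print(predict_tenured("Professor", 4))
```

For Jeff the rank condition holds even though years is only 4, so the model answers 'yes'.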

15 Training Dataset This follows an example from Quinlan’s ID3

16 Output: A Decision Tree for “buys_computer”
age?
– <=30: student?
   – no: no
   – yes: yes
– 30..40: yes
– >40: credit rating?
   – excellent: no
   – fair: yes

17 Neural Network Example Tuple Input Output

18 Data Mining Outline Introduction Classification Clustering – Place data into groups –Hierarchical –K-Means –Partitional Association Rules

19 Clustering Examples Segment customer database based on similar buying patterns. Group houses in a town into neighborhoods based on similar features. Identify new plant species Identify similar Web usage patterns

20 Clustering vs. Classification No prior knowledge –Number of clusters –Meaning of clusters Unsupervised learning
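To make the unsupervised setting concrete, here is a minimal sketch of K-Means (Lloyd's algorithm), one of the methods listed in the outline. The function name `kmeans`, the sample points, and the starting centroids are all made up for illustration. Note that K-Means still needs the number of clusters chosen up front (here, by how many starting centroids are passed in); what it does not need is class labels.

```python
from math import dist


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: a centroid with no points keeps its old position.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D data with two visually obvious groups.
POINTS = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = kmeans(POINTS, [(1, 1), (8, 8)])
print(centroids)
print(clusters)
```

The algorithm returns groups without saying what they mean; interpreting each cluster is left to the analyst, which is the contrast with classification drawn above.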

21 Data Mining Outline Introduction Classification Clustering Association Rules – Find relationships between data –Apriori

22 Association Rules Example I = { Beer, Bread, Jelly, Milk, PeanutButter} Support of {Bread,PeanutButter} is 60%
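The support figure can be checked with a short sketch. The transaction table is not preserved in this transcript, so the five baskets below are an illustrative assumption, chosen to use the item set I and to be consistent with the stated 60% support. The helper names `count`, `support`, and `confidence` are likewise hypothetical.

```python
# Assumed market-basket data (not recovered from the slide).
TRANSACTIONS = [
    {"Bread", "Jelly", "PeanutButter"},
    {"Bread", "PeanutButter"},
    {"Bread", "Milk", "PeanutButter"},
    {"Beer", "Bread"},
    {"Beer", "Milk"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing antecedent, the fraction that also
    contain consequent."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)


print(support({"Bread", "PeanutButter"}, TRANSACTIONS))
print(confidence({"Bread"}, {"PeanutButter"}, TRANSACTIONS))
```

With these assumed baskets, {Bread, PeanutButter} appears in 3 of 5 transactions (60% support), and the rule Bread => PeanutButter holds in 3 of the 4 baskets that contain Bread (75% confidence).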

23 Association Rules Ex (cont’d)
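The outline names Apriori as the association-rule algorithm. A minimal sketch of its frequent-itemset phase follows. The baskets are assumed for illustration (they are not recovered from the slides), and the 30% threshold and the function name `apriori` are also assumptions.

```python
from itertools import combinations

# Assumed market-basket data over I = {Beer, Bread, Jelly, Milk, PeanutButter}.
BASKETS = [
    {"Bread", "Jelly", "PeanutButter"},
    {"Bread", "PeanutButter"},
    {"Bread", "Milk", "PeanutButter"},
    {"Beer", "Bread"},
    {"Beer", "Milk"},
]


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1 candidates: every single item that occurs in the data.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Keep only the candidates that meet the support threshold.
        level = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-item subset of a candidate must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent


frequent = apriori(BASKETS, min_support=0.3)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The pruning relies on the property that no itemset can be frequent unless all of its subsets are, which is what keeps the candidate sets small. On the assumed baskets, Jelly falls below the threshold at level 1 and only {Bread, PeanutButter} survives at level 2.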

24 AR & Market Baskets Determine items often purchased together (market basket data). Determine optimal placement of items on store floor. Determine items for sales and/or specials. Increase sales of items. www.amazon.com

25 Summary Data Mining is a fast growing area with many applications. Data Mining algorithms are usually computationally expensive. Data Mining tools may be difficult to use effectively.

