1
DATA MINING TECHNIQUES
Introductory and Advanced Topics
Eamonn Keogh (some slides adapted from Margaret Dunham)
Dr. M. H. Dunham, Data Mining, Introductory and Advanced Topics, Prentice Hall, 2002.
http://iubio.bio.indiana.edu/treeapp/treeprint-sample1.html
© Prentice Hall

2
Data Mining Outline
–Introduction
–Related Concepts
–Data Mining Techniques

3
Introduction Outline
Define data mining
Data mining vs. databases
Basic data mining tasks
Data mining issues
Goal: Provide an overview of data mining.

4
Introduction
Data is growing at a phenomenal rate (read “How Much Information Is There In the World?” by Michael Lesk)
Users expect more sophisticated information
How?
UNCOVER HIDDEN INFORMATION
DATA MINING

5
Data Mining Definition
Finding hidden information in a database
Data mining has been defined as “The nontrivial extraction of implicit, previously unknown, and potentially useful information from data”.
Similar terms
–Exploratory data analysis
–Data driven discovery
–Deductive learning
–Discovery Science
–Knowledge Discovery

6
Database Processing vs. Data Mining Processing
Database Processing
Query
–Well defined
–SQL
Output
–Subset of database
Data Mining Processing
Query
–Poorly defined
–No precise query language
Output
–Not a subset of database

7
Query Examples
Database
–Find all credit applicants with last name of Smith.
–Identify customers who have purchased more than $10,000 in the last month.
–Find all customers who have purchased milk.
Data Mining
–Find all credit applicants who are poor credit risks. (classification)
–Identify customers with similar buying habits. (clustering)
–Find all items which are frequently purchased with milk. (association rules)
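The contrast between the two kinds of question can be sketched in a few lines of Python. This is only an illustration: the `baskets` data, the function names, and the `min_count` threshold are hypothetical and do not come from the slides. The first function answers the database-style question and returns a subset of the stored records; the second answers the mining-style question and returns derived counts that are not stored anywhere in the data.

```python
from collections import Counter

# Hypothetical market-basket data: customer -> set of purchased items.
baskets = {
    "ann": {"milk", "bread", "eggs"},
    "bob": {"milk", "bread"},
    "cho": {"beer", "chips"},
    "dee": {"milk", "cereal", "bread"},
}

def customers_who_bought(item, baskets):
    """Database-style query: the answer is a subset of the stored records."""
    return sorted(name for name, items in baskets.items() if item in items)

def items_bought_with(item, baskets, min_count=2):
    """Mining-style question: count co-purchases and keep the frequent ones.

    The result is derived information (item, count), not a subset of the rows.
    min_count is an arbitrary threshold chosen for this example.
    """
    counts = Counter()
    for items in baskets.values():
        if item in items:
            counts.update(items - {item})  # every other item in the same basket
    return [(other, n) for other, n in counts.most_common() if n >= min_count]

milk_buyers = customers_who_bought("milk", baskets)
milk_partners = items_bought_with("milk", baskets)
print(milk_buyers)
print(milk_partners)
```

With this toy data the first call lists the three customers whose baskets contain milk, while the second reports that bread appears alongside milk in all three of those baskets.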

8
Data Mining Models and Tasks

9
Basic Data Mining Tasks I
Classification maps data into predefined groups or classes.
–Supervised learning
–Pattern recognition
–Prediction
Regression is used to map a data item to a real-valued prediction variable.
–Example: H = 1.31 (Fem + Fib) + 63.05
Clustering groups similar data together into clusters.
–Unsupervised learning
–Segmentation
–Partitioning
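As a rough illustration of the difference between regression and classification, the sketch below applies the slide's formula H = 1.31 (Fem + Fib) + 63.05 and then maps the result into predefined classes. Several things here are assumptions rather than content from the slides: that Fem and Fib are femur and fibula lengths, that all quantities are in centimeters, and the class names and cut-offs in `height_class`, which are purely hypothetical.

```python
def predict_height(fem, fib):
    """Regression: map a data item to a real-valued prediction.

    Uses the slide's line H = 1.31 (Fem + Fib) + 63.05.
    Assumption: fem and fib are bone lengths in centimeters.
    """
    return 1.31 * (fem + fib) + 63.05

def height_class(height_cm):
    """Classification: map a data item to one of a few predefined classes.

    The class names and cut-offs are hypothetical, chosen for illustration.
    """
    if height_cm < 160:
        return "short"
    if height_cm < 180:
        return "medium"
    return "tall"

# Hypothetical measurements.
height = predict_height(45.0, 37.0)   # roughly 170.47
label = height_class(height)
print(round(height, 2), label)
```

The regression output is a number on a continuous scale, whereas the classification output is always one of the labels fixed in advance.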

10
Basic Data Mining Tasks II
Summarization maps data into subsets with associated simple descriptions.
–Characterization
–Generalization
Link Analysis uncovers relationships among data.
–Affinity Analysis
–Association Rules
–Sequential Analysis determines sequential patterns.
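Summarization can be sketched as grouping records into subsets and attaching a simple description to each one. The example below is a minimal illustration under stated assumptions: the `purchases` records, the segment names, and the choice of count, mean, and maximum as the "simple description" are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (customer segment, purchase amount).
purchases = [
    ("student", 12.0),
    ("student", 8.0),
    ("retiree", 30.0),
    ("retiree", 50.0),
    ("retiree", 40.0),
]

def summarize(records):
    """Map data into subsets (one per segment) with a simple description each."""
    groups = defaultdict(list)
    for segment, amount in records:
        groups[segment].append(amount)
    return {
        segment: {"count": len(amounts), "mean": mean(amounts), "max": max(amounts)}
        for segment, amounts in groups.items()
    }

summary = summarize(purchases)
for segment, description in summary.items():
    print(segment, description)
```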

11
KDD Process
Selection: Obtain data from various sources.
Preprocessing: Cleanse data.
Transformation: Convert to common format. Transform to new format.
Data Mining: Obtain desired results.
Interpretation/Evaluation: Present results to user in meaningful manner.
Modified from [FPSS96C]
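The five steps above can be sketched as a chain of small functions, one per step. Everything concrete here is an assumption made for illustration: the temperature readings, the two units, the use of `None` for missing values, and "most frequent value" as a stand-in for the data mining step. A real KDD pipeline would be far more involved.

```python
from collections import Counter

# Hypothetical raw sources. None marks a missing reading, and the two
# sources use different units (an assumption made for this example).
source_celsius = [21.0, None, 22.5, 21.0]
source_fahrenheit = [69.8, 72.5]

def select(celsius, fahrenheit):
    """Selection: obtain data from various sources as (value, unit) pairs."""
    return [(v, "C") for v in celsius] + [(v, "F") for v in fahrenheit]

def preprocess(records):
    """Preprocessing: cleanse data by dropping missing readings."""
    return [(v, u) for v, u in records if v is not None]

def transform(records):
    """Transformation: convert to a common format (Celsius, one decimal)."""
    return [round(v if u == "C" else (v - 32) * 5 / 9, 1) for v, u in records]

def mine(values):
    """Data mining stand-in: find the most frequent reading and its count."""
    return Counter(values).most_common(1)[0]

def interpret(pattern):
    """Interpretation/Evaluation: present the result in a readable form."""
    value, count = pattern
    return f"Most common reading: {value} C ({count} times)"

values = transform(preprocess(select(source_celsius, source_fahrenheit)))
pattern = mine(values)
report = interpret(pattern)
print(report)
```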

12
KDD Process Ex: Shuttle Data
Selection:
–Select data (which missions, etc.) to use
Preprocessing:
–Remove spikes
Transformation:
–DFT, DWT, PAA, etc.
Data Mining:
–Look for rules…
Interpretation/Evaluation:
–Show rules to domain experts
Potential User Applications:
–Prediction of failures
[Figure: two plots with x-axis ticks 0 to 1000 and y-axis ticks 0 to 1]
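The preprocessing and transformation steps for a time series can be sketched as follows. PAA is taken here to mean Piecewise Aggregate Approximation, which replaces each equal-width frame of the series with its mean. The slides do not say how spikes are removed, so the sliding median filter below is an assumption, as are the function names, the window size, and the toy `raw` series.

```python
from statistics import mean, median

def remove_spikes(series, window=3):
    """Preprocessing: sliding median filter (an assumed spike-removal method).

    Each point is replaced by the median of its neighborhood; the window
    is truncated at the ends of the series.
    """
    half = window // 2
    return [
        median(series[max(0, i - half): i + half + 1])
        for i in range(len(series))
    ]

def paa(series, segments):
    """Transformation: Piecewise Aggregate Approximation.

    Splits the series into `segments` equal-width frames and keeps the
    mean of each frame, reducing the series to `segments` values.
    """
    if len(series) % segments != 0:
        raise ValueError("this sketch assumes len(series) is divisible by segments")
    width = len(series) // segments
    return [mean(series[i:i + width]) for i in range(0, len(series), width)]

# Hypothetical sensor readings with a single spike at index 2.
raw = [1.0, 1.0, 9.0, 1.0, 2.0, 2.0, 2.0, 2.0]
clean = remove_spikes(raw)
reduced = paa(clean, 4)
print(clean)
print(reduced)
```

On this toy series the median filter removes the isolated value 9.0, and PAA then reduces the eight cleaned points to four frame means.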

13
Data Mining Development
Similarity Measures, Hierarchical Clustering, IR Systems, Imprecise Queries, Textual Data, Web Search Engines
Bayes Theorem, Regression Analysis, EM Algorithm, K-Means Clustering, Time Series Analysis
Neural Networks, Decision Tree Algorithms
Algorithm Design Techniques, Algorithm Analysis, Data Structures
Relational Data Model, SQL, Association Rule Algorithms, Data Warehousing, Scalability Techniques

14
KDD Issues
Human Interaction
Overfitting
Outliers
Interpretation
Visualization
Large Datasets
High Dimensionality

15
KDD Issues (cont’d)
Multimedia Data
Missing Data
Irrelevant Data
Noisy Data
Changing Data (streams)
Integration
Application

16
Social Implications of DM
Privacy
Profiling
Unauthorized use

17
Data Mining Metrics
Usefulness
Return on Investment (ROI)
Accuracy
Space/Time Complexity
