© Prentice Hall
DATA MINING TECHNIQUES: Introductory and Advanced Topics
Eamonn Keogh (some slides adapted from Margaret Dunham)
Dr. M.H. Dunham, Data Mining, Introductory and Advanced Topics, Prentice Hall, 2002.
http://iubio.bio.indiana.edu/treeapp/treeprint-sample1.html
Data Mining Outline
– Introduction
– Related Concepts
– Data Mining Techniques
Introduction Outline
– Define data mining
– Data mining vs. databases
– Basic data mining tasks
– Data mining issues
Goal: Provide an overview of data mining.
Introduction
– Data is growing at a phenomenal rate (read "How Much Information Is There In the World?" by Michael Lesk).
– Users expect more sophisticated information.
– How? UNCOVER HIDDEN INFORMATION: DATA MINING
Data Mining Definition
– Finding hidden information in a database.
– Data mining has been defined as "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data".
– Similar terms:
  – Exploratory data analysis
  – Data driven discovery
  – Deductive learning
  – Discovery Science
  – Knowledge Discovery
Database Processing vs. Data Mining Processing
Database processing:
– Query: well defined (SQL)
– Output: a subset of the database
Data mining:
– Query: poorly defined; no precise query language
– Output: not a subset of the database
Query Examples
Database:
– Find all credit applicants with last name of Smith.
– Identify customers who have purchased more than $10,000 in the last month.
– Find all customers who have purchased milk.
Data mining:
– Find all credit applicants who are poor credit risks. (classification)
– Identify customers with similar buying habits. (clustering)
– Find all items which are frequently purchased with milk. (association rules)
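To make the milk example concrete, here is a minimal Python sketch over a hypothetical set of purchase transactions (the data and the function names `customers_who_bought` and `items_bought_with` are illustrative, not from the slides). The first function is a well-defined filter whose answer is a subset of the stored rows. The second counts which items co-occur with milk, a simple stand-in for the frequent-itemset step behind association rules, and its answer is derived information rather than a subset of the data.

```python
from collections import Counter

# Hypothetical toy transactions: (customer, set of items bought).
transactions = [
    ("Ann", {"milk", "bread", "eggs"}),
    ("Bob", {"bread", "butter"}),
    ("Cai", {"milk", "bread"}),
    ("Dee", {"milk", "cereal"}),
]

def customers_who_bought(item, txns):
    """Database-style query: a precise filter that returns a subset of the rows."""
    return [cust for cust, items in txns if item in items]

def items_bought_with(item, txns, min_count=2):
    """Mining-style query: count how often other items co-occur with `item`
    and keep those seen at least `min_count` times."""
    counts = Counter()
    for _, items in txns:
        if item in items:
            counts.update(items - {item})
    return {other: n for other, n in counts.items() if n >= min_count}

print(customers_who_bought("milk", transactions))  # ['Ann', 'Cai', 'Dee']
print(items_bought_with("milk", transactions))     # {'bread': 2}
```

The `min_count` threshold plays the role of a minimum support; real association-rule algorithms such as Apriori prune candidate itemsets systematically instead of counting every pair.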
Data Mining Models and Tasks
Basic Data Mining Tasks I
Classification maps data into predefined groups or classes.
– Supervised learning
– Pattern recognition
– Prediction
Regression is used to map a data item to a real-valued prediction variable.
– Example: H = 1.31 (Fem + Fib) + 63.05
Clustering groups similar data together into clusters.
– Unsupervised learning
– Segmentation
– Partitioning
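The regression example H = 1.31 (Fem + Fib) + 63.05 is a linear model with a single predictor x = Fem + Fib. A minimal sketch of how such coefficients can be estimated, assuming ordinary least squares (the slide does not say how its coefficients were obtained) and synthetic data generated from the formula itself, so the fit should recover the slide's values:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept passes through the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Hypothetical inputs: x stands for (Fem + Fib), y for H.
xs = [70.0, 80.0, 90.0, 100.0]
ys = [1.31 * x + 63.05 for x in xs]
a, b = fit_line(xs, ys)
print(round(a, 2), round(b, 2))  # 1.31 63.05
```

With real, noisy measurements the recovered coefficients would only approximate the underlying relationship.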
Basic Data Mining Tasks II
Summarization maps data into subsets with associated simple descriptions.
– Characterization
– Generalization
Link Analysis uncovers relationships among data.
– Affinity Analysis
– Association Rules
– Sequential Analysis determines sequential patterns.
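Association rules are commonly scored by support (how often the items appear together) and confidence (how often the right-hand side appears given the left-hand side). The sketch below assumes those standard definitions; the transactions and the rule {bread} → {milk} are hypothetical.

```python
def support(itemset, txns):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in txns if itemset <= t) / len(txns)

def confidence(lhs, rhs, txns):
    """Estimated P(rhs | lhs) = support(lhs | rhs) / support(lhs)."""
    return support(lhs | rhs, txns) / support(lhs, txns)

txns = [
    {"milk", "bread", "eggs"},
    {"bread", "butter"},
    {"milk", "bread"},
    {"milk", "cereal"},
]
print(support({"bread", "milk"}, txns))       # 0.5
print(confidence({"bread"}, {"milk"}, txns))  # 0.666...
```

A rule is usually reported only if both values clear user-chosen minimum thresholds.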
KDD Process
– Selection: Obtain data from various sources.
– Preprocessing: Cleanse data.
– Transformation: Convert to a common format; transform to a new format.
– Data Mining: Obtain the desired results.
– Interpretation/Evaluation: Present results to the user in a meaningful manner.
Modified from [FPSS96C]
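The five KDD steps can be sketched as a chain of functions, each feeding the next. Every stage body here is a hypothetical placeholder on made-up records (for instance, the "mining" step merely flags z-scores above 1.0); the sketch only illustrates the order of the steps, not a real KDD system.

```python
import statistics

# Hypothetical raw records from two sources; None marks a missing reading.
raw = [
    {"source": "A", "value": 3.0},
    {"source": "A", "value": None},
    {"source": "B", "value": 7.0},
    {"source": "A", "value": 5.0},
    {"source": "A", "value": 13.0},
]

def select(records, source):
    """Selection: keep only the records from the source we care about."""
    return [r for r in records if r["source"] == source]

def preprocess(records):
    """Preprocessing: cleanse by dropping records with missing values."""
    return [r["value"] for r in records if r["value"] is not None]

def transform(values):
    """Transformation: z-normalize so values share a common scale."""
    mu, sd = statistics.mean(values), statistics.pstdev(values)
    return [(v - mu) / sd for v in values]

def mine(zs, threshold=1.0):
    """Data mining (placeholder): indices of unusually large values."""
    return [i for i, z in enumerate(zs) if z > threshold]

def interpret(indices, values):
    """Interpretation/Evaluation: report findings in the original units."""
    return [f"value {values[i]} is unusually high" for i in indices]

values = preprocess(select(raw, "A"))
report = interpret(mine(transform(values)), values)
print(report)  # ['value 13.0 is unusually high']
```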
KDD Process Ex: Shuttle Data
Selection:
– Select data (which missions, etc.) to use
Preprocessing:
– Remove spikes
Transformation:
– DFT, DWT, PAA, etc.
Data Mining:
– Look for rules…
Interpretation/Evaluation:
– Show rules to domain experts
Potential User Applications:
– Prediction of failures
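Two of the shuttle-data steps can be sketched in a few lines. The slide does not say how spikes are removed, so a 3-point median filter is assumed here; of the listed transformations, only PAA (Piecewise Aggregate Approximation, which replaces each of w equal-length frames of a series with the frame's mean) is shown. The signal values are hypothetical.

```python
import statistics

def remove_spikes(series, window=3):
    """Median filter: replace each point with the median of its neighborhood.
    Windows are truncated at the ends of the series."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(statistics.median(series[lo:hi]))
    return out

def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each of `segments`
    equal-length frames. Assumes len(series) is divisible by `segments`."""
    n = len(series)
    if n % segments:
        raise ValueError("series length must be divisible by segments")
    size = n // segments
    return [sum(series[i:i + size]) / size for i in range(0, n, size)]

signal = [0.1, 0.1, 0.9, 0.1, 0.2, 0.2, 0.3, 0.3]  # hypothetical; 0.9 is a spike
clean = remove_spikes(signal)
print(clean)           # [0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.3, 0.3]
print(paa(clean, 4))   # roughly [0.1, 0.15, 0.2, 0.3], up to float rounding
```

PAA reduces an n-point series to w numbers, which makes later rule search cheaper; DFT or DWT coefficients could be substituted at the same step.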
Data Mining Development
Similarity Measures, Hierarchical Clustering, IR Systems, Imprecise Queries, Textual Data, Web Search Engines, Bayes Theorem, Regression Analysis, EM Algorithm, K-Means Clustering, Time Series Analysis, Neural Networks, Decision Tree Algorithms, Algorithm Design Techniques, Algorithm Analysis, Data Structures, Relational Data Model, SQL, Association Rule Algorithms, Data Warehousing, Scalability Techniques
KDD Issues
– Human Interaction
– Overfitting
– Outliers
– Interpretation
– Visualization
– Large Datasets
– High Dimensionality
KDD Issues (cont'd)
– Multimedia Data
– Missing Data
– Irrelevant Data
– Noisy Data
– Changing Data (streams)
– Integration
– Application
Social Implications of DM
– Privacy
– Profiling
– Unauthorized use
Data Mining Metrics
– Usefulness
– Return on Investment (ROI)
– Accuracy
– Space/Time Complexity