Introduction to KDD.

Slides:



Advertisements
Similar presentations
An Introduction to Data Mining
Advertisements

Overview of Data Mining and the KDD Process Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
Data Mining Glen Shih CS157B Section 1 Dr. Sin-Min Lee April 4, 2006.
Data Mining Sangeeta Devadiga CS 157B, Spring 2007.
Overview of Data Mining & The Knowledge Discovery Process Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
1 ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis) Introduction to Data Mining Olivia R. Liu Sheng, Ph.D. Emma Eccles Jones Presidential.
© Prentice Hall1 DATA MINING TECHNIQUES Introductory and Advanced Topics Eamonn Keogh (some slides adapted from) Margaret Dunham Dr. M.H.Dunham, Data Mining,
Dr. Tahar Kechadi Dr. Joe Carthy
Data Mining By Archana Ketkar.
Data Mining – Intro.
Advanced Database Applications Database Indexing and Data Mining CS591-G1 -- Fall 2001 George Kollios Boston University.
Data Mining.
Business Intelligence
CIT 858: Data Mining and Data Warehousing Course Instructor: Bajuna Salehe Web:
Data Mining: Concepts & Techniques. Motivation: Necessity is the Mother of Invention Data explosion problem –Automated data collection tools and mature.
OLAM and Data Mining: Concepts and Techniques. Introduction Data explosion problem: –Automated data collection tools and mature database technology lead.
Data Mining : Introduction Chapter 1. 2 Index 1. What is Data Mining? 2. Data Mining Functionalities 1. Characterization and Discrimination 2. MIning.
Data Warehouse Fundamentals Rabie A. Ramadan, PhD 2.
Chapter 5: Data Mining for Business Intelligence
Shilpa Seth.  What is Data Mining What is Data Mining  Applications of Data Mining Applications of Data Mining  KDD Process KDD Process  Architecture.
Chapter 1. Introduction Motivation: Why data mining?
Data Mining Techniques As Tools for Analysis of Customer Behavior
Data Mining Chun-Hung Chou
Data Mining: Introduction. Why Data Mining? l The Explosive Growth of Data: from terabytes to petabytes –Data collection and data availability  Automated.
Data Mining Techniques As Tools for Analysis of Customer Behavior Lecture 2:
Chapter 1 Introduction to Data Mining
Knowledge Discovery and Data Mining Evgueni Smirnov.
Knowledge Discovery and Data Mining Evgueni Smirnov.
2015年10月18日星期日 2015年10月18日星期日 2015年10月18日星期日 Introduction to Data Mining 1 Chapter 1 Introduction to Data Mining Chen. Chun-Hsien Department of Information.
October 18, 2015 Data Mining: Concepts and Techniques 1 DATA MINING Motivation: Why data mining? What is data mining? Data Mining: On what kind of data?
CS690L - Lecture 6 1 CS690L Data Mining and Knowledge Discovery Overview Yugi Lee STB #555 (816) This.
Data Mining – Intro. Course Overview Spatial Databases Temporal and Spatio-Temporal Databases Multimedia Databases Data Mining.
Advanced Database Course (ESED5204) Eng. Hanan Alyazji University of Palestine Software Engineering Department.
3-1 Data Mining Kelby Lee. 3-2 Overview ¨ Transaction Database ¨ What is Data Mining ¨ Data Mining Primitives ¨ Data Mining Objectives ¨ Predictive Modeling.
CRM - Data mining Perspective. Predicting Who will Buy Here are five primary issues that organizations need to address to satisfy demanding consumers:
Chapter 5: Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization DECISION SUPPORT SYSTEMS AND BUSINESS.
MIS2502: Data Analytics Advanced Analytics - Introduction.
Conclusions. Why Data Mining? -- Potential Applications Database analysis and decision support – Market analysis and management target marketing, customer.
Academic Year 2014 Spring Academic Year 2014 Spring.
Waqas Haider Bangyal. 2 Source Materials “ Data Mining: Concepts and Techniques” by Jiawei Han & Micheline Kamber, Second Edition, Morgan Kaufmann, 2006.
DATA MINING It is a process of extracting interesting(non trivial, implicit, previously, unknown and useful ) information from any data repository. The.
Data Warehousing/Mining 1. 2 Chapter 1. Introduction v Motivation: Why data mining? v What is data mining? v Data Mining: On what kind of data? v Data.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 28 Data Mining Concepts.
2016年6月12日星期日 2016年6月12日星期日 2016年6月12日星期日 Introduction to Data Mining 1 Chapter 1 Introduction to Data Mining Chen. Chun-Hsien Department of Information.
Lecture-2 Bscshelp.com.  Why Data Mining and What Kinds of Data Can Be Mined?  Potential Applications 2.
Chapter 3 Building Business Intelligence Chapter 3 DATABASES AND DATA WAREHOUSES Building Business Intelligence 6/22/2016 1Management Information Systems.
Data Mining – Introduction (contd…) Compiled By: Umair Yaqub Lecturer Govt. Murray College Sialkot.
CS570: Data Mining Spring 2010, TT 1 – 2:15pm Li Xiong.
Data Mining - Introduction Compiled By: Umair Yaqub Lecturer Govt. Murray College Sialkot.
Data Mining Functionalities
Data Mining.
Data Mining – Intro.
Data Mining Motivation: “Necessity is the Mother of Invention”
MIS2502: Data Analytics Advanced Analytics - Introduction
DATA MINING © Prentice Hall.
MIS 451 Building Business Intelligence Systems
Data warehouse & Data Mining: Concepts and Techniques
Datamining : Refers to extracting or mining knowledge from large amounts of data Applications : Market Analysis Fraud Detection Customer Retention Production.
Introduction to Data Mining
Data Mining: Concepts and Techniques
Sangeeta Devadiga CS 157B, Spring 2007
Data Warehousing and Data Mining
Data Mining Introduction
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Data Mining Concepts and Techniques
Data Mining Techniques As Tools for Analysis of Customer Behavior
Data Warehousing Data Mining Privacy
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Presentation transcript:

Introduction to KDD

Evolution of Database Technology 1950s: First computers, use of computers for census. 1960s: Data collection, database creation (hierarchical and network models) 1970s: Relational data model, relational DBMS implementation 1980s: Ubiquitous RDBMS, advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc) 1990s: Data mining and data warehousing, massive media digitization, multimedia databases and Web technology

Data Rich but Information Poor Databases are too big. Terrorbytes Data Mining can help discover knowledge Terrorbytes

What should we do? We are not trying to find the needle in the hay stock because DBMSs know how to do that. We are merely trying to understand the consequences of the presence of the needle if it exists.

What Led Us To This? Necessity is the Mother of Invention Technology is available to help us collect data. Barcode, scanners, satellites, cameras, etc. Technology is available to help us store data. Database, data warehouses, variety of responses We are starving for knowledge (competitive edge, research, etc). We are swamped by data that continually pour on us. We do not know what to do with this data. We need to interpret this data in search for new knowledge.

What is our Need? Knowledge Data Extract interesting knowledge (rules, regularities, patterns, constraints) from data in large collections. Knowledge Data

Introduction: Outline What kind of information are we collecting? What are Data Mining and Knowledge Discovery? What kind of data can be mined? What can be discovered? Is all that is discovered interesting and useful? Are there application examples?

Introduction: Outline What kind of information are we collecting? What are Data Mining and Knowledge Discovery? What kind of data can be mined? What can be discovered? Is all that is discovered interesting and useful? Are there application examples?

Data Collected (SOURCE) Business transactions Scientific data Medical and personal data Surveillance video and pictures Satellite sensing Games

Data Collected (Format) Digital Media CAD and Software Engineering Virtual Worlds Text Reports and Memos The World Wide Web

Introduction: Outline What kind of information are we collecting? What are Data Mining and Knowledge Discovery? What kind of data can be mined? What can be discovered? Is all that is discovered interesting and useful? Are there application examples?

Knowledge Discovery Process of non-trivial extraction of implicit, previously unknown and potentially useful information from large collections of data.

Many Steps in KDD Process Gathering the data together. Cleanse the data and fit it together. Select the necessary data. Crunch and squeeze the data to extract the essence of it. Evaluate the output and use it.

So What is Data Mining? In theory, Data Mining is a STEP in the knowledge discovery process. It is the extraction of implicit information from a large data set. In practice, data mining and knowledge discovery are becoming synonymous.

So What is Data Mining? There are other equivalent terms: KDD, knowledge extraction, discovery of regularities, patterns discovery, data archeology, data dredging, business intelligence, information harvesting Shouldn’t it be called knowledge mining instead of data mining?

Data Mining: A KDD Process Knowledge Pattern Evaluation Data mining: the core of knowledge discovery process. Data Mining Task-relevant Data Selection and transformation Data Warehouse Data Cleaning Data Integration Databases

Data Mining: A KDD Process Knowledge Pattern Evaluation Data Mining Task-relevant Data Selection and transformation Data Warehouse Data Cleaning Data Integration Databases

Data Mining: A KDD Process Knowledge Multiple data sources, often heterogeneous, may be combined in a common source Pattern Evaluation Data Mining Task-relevant Data Data Selection and transformation Data Warehouse Data Cleaning Data Integration Databases

Data Mining: A KDD Process Knowledge Noise data and irrelevant data are removed from the collections Pattern Evaluation Data Mining Task-relevant Data Selection and transformation Data Warehouse Data Cleaning Data Integration Databases

Data Mining: A KDD Process Data relevant to the analysis is decided on and retrieved from the data collection Knowledge Pattern Evaluation Data Mining Task-relevant Data Data Selection and transformation Data Warehouse Data Cleaning Data Integration Databases

Data Mining: A KDD Process Selected data is transformed into forms appropriate for the mining procedure Knowledge Pattern Evaluation Data Mining Task-relevant Data Data Selection and Data transformation Data Warehouse Data Cleaning Data Integration Databases

Data Mining: A KDD Process Clever techniques are applied to extract patterns potentially useful Knowledge Pattern Evaluation Data Mining Task-relevant Data Selection and transformation Data Warehouse Data Cleaning Data Integration Databases

Data Mining: A KDD Process Strictly interesting patterns representing knowledge are identified based on given measures Knowledge Pattern Evaluation Data Mining Task-relevant Data Selection and transformation Data Warehouse Data Cleaning Data Integration Databases

Data Mining: A KDD Process Knowledge Knowledge representation: Uses visualization techniques to help users understand and interpret the data mining results Pattern Evaluation Data Mining Task-relevant Data Selection and transformation Data Warehouse Data Cleaning Data Integration Databases

KDD is an Iterative Process

Introduction: Outline What kind of information are we collecting? What are Data Mining and Knowledge Discovery? What kind of data can be mined? What can be discovered? Is all that is discovered interesting and useful? Are there application examples?

What Kind of Data Can be Mined? Flat Files Heterogeneous and legacy databases Relational databases And other DB: Object-oriented and object-relational data bases Transactional databases Transaction(TID, Timestamp, {item1, items,..})

What Kind of Data Can be Mined? Data warehouses

What Kind of Data Can be Mined? Multimedia Databases

What Kind of Data Can be Mined? Spatial Databases

What Kind of Data Can be Mined? Time-Series Databases

What Kind of Data Can be Mined? Text Documents

What Kind of Data Can be Mined? World Wide Web Components of WWW: Content of the web Structure of the web Usage of the web

Introduction: Outline What kind of information are we collecting? What are Data Mining and Knowledge Discovery? What kind of data can be mined? What can be discovered? Is all that is discovered interesting and useful? Are there application examples?

What can be discovered? What can be discovered depends on the data mining task employed. Descriptive DM Describe general properties Predictive DM Tasks Infer on available data

Data Mining Functionality Characterization Summarization of general features of objects in a target class (concept description) Ex: Characterize graduate students in Science Discrimination Comparison of general features of objects between a target class and a contrasting class (concept description) Ex: Compare students is Science and students in Arts

Data Mining Functionality Association Studies the frequency of items occurring together in transactional databases Ex: buys(x, bread)  buys(x, milk) Prediction Predicts some unknown or missing attribute values based on other information Ex: Forecast the sale value for next week based on available data.

Data Mining Functionality Classification Organizes data in given classes based on attribute values (supervised classification) Ex: classify students based on final results Clustering Organizes data in classes based on attribute values (unsupervised classification) Ex: group crime locations to find distribution patterns.

Data Mining Functionality Outlier analysis Identifies and explains exceptions (surprises) Time-series analysis: Analyzes trends and deviations, regression, sequential pattern, similar sequences

Introduction: Outline What kind of information are we collecting? What are Data Mining and Knowledge Discovery? What kind of data can be mined? What can be discovered? Is all that is discovered interesting and useful? Are there application examples?

Is all that is Discovered Interesting? A data mining operation may generate thousands of patterns, not all of them are interesting. Data Mining results are sometimes so large that we may need to mine it too (Meta-Mining?) How to measure? INTERESTINGNESS

Interestingness Objective interestingness measures: Based on statistics and structure of patterns, e.g., support, confidence, etc. Subjective interestingness measures: Based on user’s beliefs in data, e.g., unexpectedness, novelty

Interestingness Measures: A pattern is interesting if it is: Easily understood by humans Valid on new or test data with some degree of certainty Potentially useful Novel, or validates some hypothesis that a user seeks to confirm

Introduction: Outline What kind of information are we collecting? What are Data Mining and Knowledge Discovery? What kind of data can be mined? What can be discovered? Is all that is discovered interesting and useful? Are there application examples?

Potential and/or Successful Applications Business data analysis and decision support Marketing focalization Recognizing specific market segments that respond to particular characteristics Return on mailing campaign (target marketing) Customer profiling Segmentation of customer for marketing strategies and/or product offerings Customer behavior understanding Customer retention and loyalty

Potential and/or Successful Applications Business data analysis and decision support Market analysis and management Provide summary information for decision-making Market basket analysis, cross selling, market segmentation Risk analysis and management What if analysis Forecasting Pricing analysis, competitive analysis Time-series analysis (ex. Stock market)

Potential and/or Successful Applications Fraud detection Detection of telephone fraud -Telephone call model: destination of the call, duration, time of the day or week. Analyze patterns that deviate from an expected norm. Detecting automotive and health insurance fraud Detection of credit-card fraud Detecting suspicious money transactions (money laundering)

Potential and/or Successful Applications Text Mining Message filtering (e-mail, newsgroups, etc) Newspaper article analysis Medicine Association pathology – symptoms DNA

Potential and/or Successful Applications Sports IBM Advanced Scout analyzed NBA game statistics (shots, blocks, fouls, assists) to gain competitive edge. Astronomy JPL and Palomar Observatory discovered 22 quasars with the help of data mining Identifying volcanoes on Jupiter

Potential and/or Successful Applications Surveillance cameras Use of stereo cameras and outlier analysis to detect suspicious activities or individuals Web surfing and mining IBM Surf-Aid applies data mining algorithms to Web access logs for market-related pages to discover customer preference and behavior pages (e-commerce) Adaptive web sites / improving Web site organization, etc.

Warning: Data Mining Should Not be Used Blindly! Data mining find regularities from history, but history is not the same as the future Association does not dictate trend or causality Drink diet drinks lead to obesity

DBMS Transaction Subsystem Transaction manager coordinates transactions on behalf of application programs It communicates with the scheduler Transaction manager Scheduler Recovery Buffer Systems Access File The scheduler handles concurrency control. Its objective is to maximize concurrency without allowing simultaneous transactions to interfere with one another The recovery manager ensures that the database is restored to the right state before a failure occurred. The buffer manager is responsible for the transfer of data between disk storage and main memory.

Thank You