Data Mining: Concepts & Techniques

modified by Marius Bulacu
Motivation: Necessity is the Mother of Invention
Data explosion problem
– Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories
We are drowning in data, but starving for knowledge!
Solution: Data warehousing and data mining
– Data warehousing and on-line analytical processing
– Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases

Evolution of Database Technology

What Is Data Mining?
Data mining (knowledge discovery in databases):
– Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information or patterns from data in large databases
Alternative names and their "inside stories":
– Data mining: a misnomer?
– Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
What is not data mining?
– (Deductive) query processing
– Expert systems or small ML/statistical programs

Data Mining: A KDD Process
Data mining: the core of the knowledge discovery process

Steps of a KDD Process
Learning the application domain:
– Relevant prior knowledge and goals of the application
Creating a target data set: data selection
Data cleaning and preprocessing (may take 60% of the effort!)
Data reduction and transformation:
– Find useful features, dimensionality/variable reduction, invariant representation
Choosing the functions of data mining:
– Summarization, classification, regression, association, clustering
Choosing the mining algorithm(s)
Data mining: search for patterns of interest
Pattern evaluation and knowledge presentation:
– Visualization, transformation, removing redundant patterns, etc.
Use of discovered knowledge
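
The steps above can be read as a chain of stages, each consuming the output of the previous one. The following is only a toy sketch under stated assumptions: the stage functions, the `items` field and the sample baskets are hypothetical, and counting co-occurring pairs stands in for whatever mining algorithm is actually chosen.

```python
from collections import Counter
from itertools import combinations

def select(rows):
    """Target data set: keep only rows that contain any items."""
    return [r for r in rows if r.get("items")]

def clean(rows):
    """Cleaning: normalise spelling and drop repeated items within a row."""
    return [{"items": sorted({i.strip().lower() for i in r["items"]})} for r in rows]

def mine(rows):
    """Mining (illustrative stand-in): count how often each pair of items co-occurs."""
    counts = Counter()
    for r in rows:
        counts.update(combinations(r["items"], 2))
    return counts

def evaluate(patterns, min_count=2):
    """Pattern evaluation: keep pairs seen at least min_count times."""
    return {pair: n for pair, n in patterns.items() if n >= min_count}

def run_kdd(rows):
    """Run the stages in order, feeding each result to the next stage."""
    result = rows
    for stage in (select, clean, mine, evaluate):
        result = stage(result)
    return result

raw = [
    {"items": ["Cars", "comics "]},
    {"items": ["comics", "cars", "music"]},
    {"items": []},
    {"items": ["music"]},
]
patterns = run_kdd(raw)
print(patterns)
```

Keeping each stage as a separate function makes it easy to rerun or replace a single step, which matters because real KDD projects loop back through earlier stages.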

Knowledge Discovery Process
The whole process of extracting implicit, previously unknown and potentially useful knowledge from a large database
– It includes data selection, cleaning, enrichment, coding, data mining, and reporting
– Data mining is the key stage of the knowledge discovery process
The process of finding the desired information in a large database

Example: the database of a magazine publisher that sells five types of magazines: on cars, houses, sports, music and comics
– Data mining: find interesting categorical properties
– Questions: What is the profile of a reader of a car magazine? Is there any correlation between an interest in cars and an interest in comics?
The knowledge discovery process consists of six stages

Data Selection
Select the information about people who have subscribed to a magazine
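
A minimal sketch of the selection step, assuming a hypothetical in-memory list of client records with a `subscriptions` field. The field names and sample data are illustrative and do not come from the slides.

```python
# Hypothetical operational records; names and fields are illustrative only.
clients = [
    {"client_id": 1, "name": "Johnson", "subscriptions": ["car", "music"]},
    {"client_id": 2, "name": "Clinton", "subscriptions": []},
    {"client_id": 3, "name": "King", "subscriptions": ["comics"]},
]

def select_subscribers(records):
    """Keep only clients who have subscribed to at least one magazine."""
    return [r for r in records if r["subscriptions"]]

subscribers = select_subscribers(clients)
print([r["client_id"] for r in subscribers])
```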

Cleaning
Pollution: typing errors, clients moving from one place to another without notifying a change of address, people giving incorrect information about themselves
– Pattern recognition algorithms

Cleaning
Lack of domain consistency
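
The two cleaning problems can be sketched as follows. This is an assumption-laden illustration: string similarity from Python's `difflib` stands in for the unspecified pattern recognition algorithms, and the similarity threshold, the plausible birth-year range and the sample records are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical records: clients 1 and 2 are probably the same person (typing error).
records = [
    {"client_id": 1, "name": "Johnson", "address": "1 Downing Street", "birth_year": 1965},
    {"client_id": 2, "name": "Jonson", "address": "1 Downing Street", "birth_year": 1965},
    {"client_id": 3, "name": "King", "address": "3 Church Lane", "birth_year": 1901},
]

def likely_duplicates(rows, threshold=0.85):
    """Pair up records that share an address and have very similar names."""
    pairs = []
    for i, a in enumerate(rows):
        for b in rows[i + 1:]:
            score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
            if a["address"] == b["address"] and score >= threshold:
                pairs.append((a["client_id"], b["client_id"]))
    return pairs

def out_of_domain(rows, min_year=1910, max_year=2010):
    """Flag records whose birth year falls outside an assumed plausible range."""
    return [r["client_id"] for r in rows if not min_year <= r["birth_year"] <= max_year]

print(likely_duplicates(records))
print(out_of_domain(records))
```

Flagged records are only candidates: whether two entries really describe one client, or whether an implausible value is a placeholder, still needs a decision by someone who knows the domain.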

Enrichment
Need extra information about the clients: date of birth, income, amount of credit, and whether or not an individual owns a car or a house

Enrichment
The new information needs to be easily joined to the existing client records
– Extract more knowledge
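
Joining the extra information to the client records might look like the sketch below. It assumes both sources share a `client_id` key; that key, the field names and the values are hypothetical.

```python
# Existing client records (hypothetical).
clients = [
    {"client_id": 1, "name": "Johnson"},
    {"client_id": 2, "name": "King"},
]

# Extra information obtained from elsewhere, keyed by client_id (hypothetical).
extra_info = {
    1: {"birth_date": "1965-04-13", "income": 18500, "credit": 17800,
        "car_owner": False, "house_owner": True},
}

def enrich(client_rows, extra):
    """Left join: every client is kept; extra fields are added where available."""
    enriched_rows = []
    for client in client_rows:
        merged = dict(client)                           # copy, so the input is not modified
        merged.update(extra.get(client["client_id"], {}))
        enriched_rows.append(merged)
    return enriched_rows

enriched = enrich(clients, extra_info)
print(enriched)
```

A left join is used so that clients without extra information are not silently lost; they can be filtered out later if their records turn out to be too sparse.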

Coding
We select only those records that have enough information to be of value (rows)
Project the fields in which we are interested (columns)
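
Row selection and column projection can be sketched as two small functions. Which fields count as "enough information" and which fields are "of interest" are assumptions made here for illustration.

```python
# Hypothetical enriched records; None marks information that is missing.
records = [
    {"client_id": 1, "name": "Johnson", "birth_date": "1965-04-13",
     "income": 18500, "credit": 17800, "phone": "555-0101"},
    {"client_id": 2, "name": "King", "birth_date": None,
     "income": None, "credit": 2000, "phone": "555-0102"},
]

REQUIRED_FIELDS = ("birth_date", "income", "credit")               # assumed
FIELDS_OF_INTEREST = ("client_id", "birth_date", "income", "credit")  # assumed

def has_enough_information(record):
    """Row selection: every required field must be present."""
    return all(record.get(field) is not None for field in REQUIRED_FIELDS)

def project(record):
    """Column projection: keep only the fields of interest."""
    return {field: record[field] for field in FIELDS_OF_INTEREST}

usable = [project(r) for r in records if has_enough_information(r)]
print(usable)
```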

Coding
Code the information that is too detailed
– Address to region
– Birth date to age
– Divide income by 1000
– Divide credit by 1000
– Convert cars yes-no to 1-0
– Convert purchase date to month numbers starting from 1990
The way in which we code the information will determine the type of patterns we find
Coding has to be performed repeatedly in order to get the best results
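
The coding rules listed above could be applied to a single record as in this sketch. The city-to-region table, the field names, the reference date used for the age, and the convention that January 1990 is month 1 are assumptions, not details given in the slides.

```python
from datetime import date

REGION_BY_CITY = {"London": 1, "Leeds": 2}  # hypothetical address-to-region table

def month_number(d, base_year=1990):
    """Months counted from the base year, with January of the base year as month 1."""
    return (d.year - base_year) * 12 + d.month

def age_on(birth, today):
    """Age in whole years on the given reference date."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)

def code_record(r, today):
    """Apply each coding rule to one enriched record."""
    return {
        "client_id": r["client_id"],
        "region": REGION_BY_CITY[r["city"]],
        "age": age_on(r["birth_date"], today),
        "income": r["income"] / 1000,
        "credit": r["credit"] / 1000,
        "car_owner": 1 if r["car_owner"] == "yes" else 0,
        "purchase_month": month_number(r["purchase_date"]),
    }

raw = {
    "client_id": 1, "city": "London", "birth_date": date(1965, 4, 13),
    "income": 18500, "credit": 17800, "car_owner": "no",
    "purchase_date": date(1994, 4, 15),
}
coded = code_record(raw, today=date(1996, 1, 1))
print(coded)
```

Because coding is repeated until the patterns become useful, keeping each rule in its own small function makes it cheap to try a coarser or finer coding on the next pass.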

Coding
We are interested in the relationships between readers of different magazines
– Perform a flattening operation
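
One way to read the flattening operation is: turn one row per subscription into one row per client, with a 0/1 column for each magazine. The sketch below assumes that interpretation and uses hypothetical (client, magazine) pairs.

```python
MAGAZINES = ("car", "house", "sports", "music", "comics")

# One row per subscription (hypothetical data).
subscriptions = [(1, "car"), (1, "comics"), (2, "sports")]

def flatten(rows):
    """Build one record per client with a 0/1 flag for every magazine type."""
    table = {}
    for client_id, magazine in rows:
        flags = table.setdefault(client_id, {m: 0 for m in MAGAZINES})
        flags[magazine] = 1
    return table

flat = flatten(subscriptions)
print(flat)
```

With every client on a single row, questions such as "do comics readers also read car magazines?" become simple comparisons between columns.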

Data mining
We may find the following rules
– A customer with credit > and aged between 22 and 31 who has subscribed to a comics magazine at time T will very likely subscribe to a car magazine five years later
– The number of house magazines sold to customers with credit between and living in region 4 is increasing
– A customer with credit between 5000 and who reads a comics magazine will very likely become a customer with credit between and who reads a sports and a house magazine after 12 years
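
How strongly the data backs a candidate rule can be checked with the standard support and confidence measures for association rules. The slides do not name these measures, so this is a swapped-in technique, and the rows below are hypothetical. The credit condition is left out because its threshold is not given.

```python
# Hypothetical flattened and coded records.
rows = [
    {"age": 25, "comics": 1, "car": 1},
    {"age": 28, "comics": 1, "car": 1},
    {"age": 30, "comics": 1, "car": 0},
    {"age": 45, "comics": 0, "car": 1},
]

def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) for the rule 'antecedent implies consequent'."""
    matching = [r for r in records if antecedent(r)]
    hits = [r for r in matching if consequent(r)]
    support = len(hits) / len(records) if records else 0.0
    confidence = len(hits) / len(matching) if matching else 0.0
    return support, confidence

# Candidate rule: comics readers aged 22 to 31 also read a car magazine.
support, confidence = rule_stats(
    rows,
    antecedent=lambda r: r["comics"] == 1 and 22 <= r["age"] <= 31,
    consequent=lambda r: r["car"] == 1,
)
print(support, confidence)
```

Support says how much of the whole database the rule covers; confidence says how often the rule holds when its condition is met. A rule described as "very likely" should score high on confidence.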

Knowledge Discovery Process

Business-Question-Driven Process

Data Mining and Business Intelligence
Layers, from the top (greatest potential to support business decisions) down:
– Making Decisions
– Data Presentation: Visualization Techniques
– Data Mining: Information Discovery
– Data Exploration: Statistical Analysis, Querying and Reporting
– Data Warehouses / Data Marts: OLAP, MDA
– Data Sources: Paper, Files, Information Providers, Database Systems, OLTP
Roles, from top to bottom: End User, Business Analyst, Data Analyst, DBA

Architecture of a Typical Data Mining System

Data Mining: On What Kind of Data?
Relational databases
Data warehouses
Transactional databases
Advanced DB and information repositories
– Object-oriented and object-relational databases
– Spatial databases
– Time-series data and temporal data
– Text databases and multimedia databases
– Heterogeneous databases
– WWW