Introduction to Data Mining: Chapter 1

Presentation transcript:

1 Introduction to Data Mining, Chapter 1

2 Chapter 1 Outline
– Background
– Information is Power
– Knowledge is Power
– Data Mining

3 Introduction

4

5 Information is Power
– Relevant
– Right information
– Globalised world
– Vast amount of information available

6 What is Information?
– A collection of data
– The act of human analysis and interpretation of activities
– Decomposing it into various components and tackling them

7 What is Knowledge?
– The act of human synthesis and evaluation of information
– Integrating the relevant components to form a coherent whole system

8 Data Mining Definition I
– The nontrivial extraction of hidden, previously unidentified, and potentially valuable knowledge from data
– The use of a variety of techniques, such as neural networks, decision trees, or standard statistical techniques, to identify nuggets of information or decision-making knowledge in bodies of data, and to extract these in such a way that they can be put to use in areas such as decision support, prediction, forecasting, and estimation

9 Data Mining Definition II
– Finding hidden information in a database

10 Hidden Information
– Number of years of experience
– Great secret recipes
– Success factors

11 Database Processing vs. Data Mining Processing
Database processing:
– Query: well defined; expressed in SQL
– Data: operational data
– Output: precise; a subset of the database
Data mining processing:
– Query: poorly defined; no precise query language
– Data: not operational data
– Output: fuzzy; not a subset of the database

12 Query Examples
Database:
– Find all customers who have purchased bread.
– Find all credit applicants with the surname Lee.
– Identify customers who have purchased more than $100,000 in the last year.
Data mining:
– Find all items which are frequently purchased with bread. (association rules)
– Find all credit applicants who are good credit risks. (classification)
– Identify customers with similar eating habits. (clustering)
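
The contrast between the two kinds of question can be made concrete with a small sketch. The basket data, the function names, and the 0.6 confidence cut-off below are all illustrative assumptions, not part of the slides. The first function answers the database-style question exactly. The second answers the mining-style question with a simple co-occurrence measure, which is only one simplified way to look for association rules.

```python
from collections import Counter

# Hypothetical market-basket data: (customer, set of purchased items).
TRANSACTIONS = [
    ("ann", {"bread", "butter", "milk"}),
    ("bob", {"bread", "butter"}),
    ("cat", {"milk", "eggs"}),
    ("dan", {"bread", "jam", "butter"}),
    ("eve", {"bread", "milk"}),
]


def customers_who_bought(item):
    """Database-style query: well defined, and the answer is a subset of the data."""
    return sorted(name for name, basket in TRANSACTIONS if item in basket)


def frequently_bought_with(item, min_confidence=0.6):
    """Mining-style question: which items co-occur with `item` often enough?

    Confidence of (item -> other) is the share of baskets containing `item`
    that also contain `other`. The threshold is an assumed tuning choice.
    """
    baskets = [basket for _, basket in TRANSACTIONS if item in basket]
    if not baskets:
        return {}
    counts = Counter(other for basket in baskets for other in basket if other != item)
    return {
        other: n / len(baskets)
        for other, n in counts.items()
        if n / len(baskets) >= min_confidence
    }


if __name__ == "__main__":
    print(customers_who_bought("bread"))
    print(frequently_bought_with("bread"))
```

Note that the second answer depends on the chosen threshold, which reflects the "fuzzy" output and "poorly defined" query described for data mining.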

13 Data Mining Models and Tasks

14 Data Mining vs. KDD
– Knowledge Discovery in Databases (KDD): process of finding useful information and patterns in data.
– Data Mining: use of algorithms to extract the information and patterns derived by the KDD process.

15 KDD Process
– Selection (Pre-Mining 1): Obtain data from various sources.
– Preprocessing (Pre-Mining 2): Cleanse data.
– Transformation (Pre-Mining 3): Convert to common format. Transform to new format.
– Data Mining: Obtain desired results.
– Interpretation/Evaluation (Post-Mining): Present results to user in meaningful manner.
Modified from [FPSS96C]
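
The five stages can be sketched as a chain of small functions. Everything concrete here is an assumption made for illustration: the two sensor sources, their field names, the Fahrenheit-to-Celsius conversion, and the use of a z-score outlier check as the mining step. The slides do not prescribe any of these, and a real mining step would normally be far more involved.

```python
from statistics import mean, pstdev

# Hypothetical raw sources in different formats (illustrative data only).
SOURCE_A = [
    {"sensor": "a1", "temp_f": 68.0},
    {"sensor": "a2", "temp_f": None},  # missing reading
    {"sensor": "a3", "temp_f": 71.6},
]
SOURCE_B = [("b1", 21.0), ("b2", 45.0), ("b3", 20.5), ("b4", 21.5)]  # Celsius


def select():
    """Selection: obtain data from the various sources."""
    return list(SOURCE_A), list(SOURCE_B)


def preprocess(rows_a, rows_b):
    """Preprocessing: cleanse data by dropping records with missing values."""
    clean_a = [row for row in rows_a if row["temp_f"] is not None]
    clean_b = [row for row in rows_b if row[1] is not None]
    return clean_a, clean_b


def transform(rows_a, rows_b):
    """Transformation: convert both sources to a common (name, Celsius) format."""
    common = [(row["sensor"], (row["temp_f"] - 32.0) * 5.0 / 9.0) for row in rows_a]
    common += [(name, float(temp_c)) for name, temp_c in rows_b]
    return common


def mine(records, threshold=2.0):
    """Data mining (assumed technique): flag readings whose z-score exceeds `threshold`."""
    values = [value for _, value in records]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [(name, value) for name, value in records if abs(value - mu) / sigma > threshold]


def interpret(outliers):
    """Interpretation/evaluation: present results in a readable form."""
    return [f"{name}: {value:.1f} C looks unusual" for name, value in outliers]


def run_pipeline():
    rows_a, rows_b = select()
    rows_a, rows_b = preprocess(rows_a, rows_b)
    return interpret(mine(transform(rows_a, rows_b)))


if __name__ == "__main__":
    for line in run_pipeline():
        print(line)
```

Keeping each stage as a separate function mirrors the slide's point that mining is only one step of the wider KDD process.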

16 KDD Process Ex: Web Log
Selection:
– Select log data (dates and locations) to use
Preprocessing:
– Remove identifying URLs
– Remove error logs
Transformation:
– Sessionize (sort and group)
Data Mining:
– Identify and count patterns
– Construct data structure
Interpretation/Evaluation:
– Identify and display frequently accessed sequences
Potential User Applications:
– Cache prediction
– Personalisation
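
A minimal sketch of the web log example follows, under several assumptions that are not in the slides: the log record layout, the 30-minute inactivity gap used to split sessions, and the choice of consecutive page pairs as the "patterns" to count. The selection step and the removal of identifying URLs are omitted to keep the sketch short.

```python
from collections import Counter
from itertools import groupby

# Hypothetical log records: (visitor, seconds since start, path, HTTP status).
LOG = [
    ("u1", 0, "/home", 200),
    ("u1", 40, "/products", 200),
    ("u1", 95, "/cart", 200),
    ("u2", 10, "/home", 200),
    ("u2", 30, "/missing", 404),
    ("u2", 60, "/products", 200),
    ("u1", 5000, "/home", 200),
    ("u1", 5030, "/products", 200),
]

SESSION_GAP = 1800  # assumed inactivity timeout in seconds


def preprocess(log):
    """Preprocessing: remove error entries (status 400 and above)."""
    return [(visitor, t, path) for visitor, t, path, status in log if status < 400]


def sessionize(entries, gap=SESSION_GAP):
    """Transformation: sort by visitor and time, then group into sessions."""
    sessions = []
    ordered = sorted(entries, key=lambda e: (e[0], e[1]))
    for _, visits in groupby(ordered, key=lambda e: e[0]):
        current, last_time = [], None
        for _, t, path in visits:
            if last_time is not None and t - last_time > gap:
                sessions.append(current)  # long pause: close the session
                current = []
            current.append(path)
            last_time = t
        sessions.append(current)
    return sessions


def count_sequences(sessions, length=2):
    """Data mining: count every run of `length` consecutive pages."""
    counts = Counter()
    for pages in sessions:
        for i in range(len(pages) - length + 1):
            counts[tuple(pages[i:i + length])] += 1
    return counts


def frequent_sequences(log, min_count=2):
    """Interpretation: keep only sequences seen at least `min_count` times."""
    counts = count_sequences(sessionize(preprocess(log)))
    return {seq: n for seq, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    print(frequent_sequences(LOG))
```

A frequent pair such as home followed by products is the kind of result that could feed the cache prediction or personalisation uses listed on the slide.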

17 Data Mining Development
– Similarity Measures
– Hierarchical Clustering
– IR Systems
– Imprecise Queries
– Textual Data
– Web Search Engines
– Bayes Theorem
– Regression Analysis
– EM Algorithm
– K-Means Clustering
– Time Series Analysis
– Neural Networks
– Decision Tree Algorithms
– Algorithm Design Techniques
– Algorithm Analysis
– Data Structures
– Relational Data Model
– SQL
– Association Rule Algorithms
– Data Warehousing
– Scalability Techniques