Data Mining – Intro.

Slides:



Advertisements
Similar presentations
Overview of Data Mining & The Knowledge Discovery Process Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
Advertisements

modified by Marius Bulacu
Week 9 Data Mining System (Knowledge Data Discovery)
Data Mining: Concepts and Techniques
© Prentice Hall1 DATA MINING TECHNIQUES Introductory and Advanced Topics Eamonn Keogh (some slides adapted from) Margaret Dunham Dr. M.H.Dunham, Data Mining,
Data Mining By Archana Ketkar.
Data Mining Course Overview. About the course – Administrivia Instructor: George Kollios, MCS 288, Mon 2:30-4:00PM.
CS157A Spring 05 Data Mining Professor Sin-Min Lee.
Advanced Database Applications Database Indexing and Data Mining CS591-G1 -- Fall 2001 George Kollios Boston University.
Data Mining.
Geographic Data Mining Marc van Kreveld Seminar for GIVE Block 1, 2003/2004.
Data Mining: Concepts & Techniques. Motivation: Necessity is the Mother of Invention Data explosion problem –Automated data collection tools and mature.
LÊ QU Ố C HUY ID: QLU OUTLINE  What is data mining ?  Major issues in data mining 2.
OLAM and Data Mining: Concepts and Techniques. Introduction Data explosion problem: –Automated data collection tools and mature database technology lead.
Data Mining : Introduction Chapter 1. 2 Index 1. What is Data Mining? 2. Data Mining Functionalities 1. Characterization and Discrimination 2. MIning.
Lingma Acheson Department of Computer and Information Science, IUPUI
Chapter 5: Data Mining for Business Intelligence
Data Mining Techniques
Chapter 1. Introduction Motivation: Why data mining?
Kansas State University Department of Computing and Information Sciences CIS 830: Advanced Topics in Artificial Intelligence From Data Mining To Knowledge.
Data Mining Chun-Hung Chou
Data Mining: Concepts and Techniques
Data Mining Techniques As Tools for Analysis of Customer Behavior Lecture 2:
3 Objects (Views Synonyms Sequences) 4 PL/SQL blocks 5 Procedures Triggers 6 Enhanced SQL programming 7 SQL &.NET applications 8 OEM DB structure 9 DB.
Data Warehousing/Mining 1 Data Warehousing/Mining Comp 150 DW Chapter 1. Introduction Instructor: Dan Hebert.
Knowledge Discovery and Data Mining Evgueni Smirnov.
Data Mining Chapter 1 Introduction -- Basic Data Mining Tasks -- Related Concepts -- Data Mining Techniques.
Knowledge Discovery and Data Mining Evgueni Smirnov.
DATA MINING 1. 2 Data Mining Extracting or “mining” knowledge from large amounts of data Data mining is the process of autonomously retrieving useful.
2015年10月18日星期日 2015年10月18日星期日 2015年10月18日星期日 Introduction to Data Mining 1 Chapter 1 Introduction to Data Mining Chen. Chun-Hsien Department of Information.
October 18, 2015 Data Mining: Concepts and Techniques 1 DATA MINING Motivation: Why data mining? What is data mining? Data Mining: On what kind of data?
2015年10月22日星期四 2015年10月22日星期四 2015年10月22日星期四 Introduction to Data Mining 1 Chapter 1 Introduction to Data Mining Chen. Chun-Hsien Department of Information.
CS690L - Lecture 6 1 CS690L Data Mining and Knowledge Discovery Overview Yugi Lee STB #555 (816) This.
Data Mining – Intro. Course Overview Spatial Databases Temporal and Spatio-Temporal Databases Multimedia Databases Data Mining.
CS157B Fall 04 Introduction to Data Mining Chapter 22.3 Professor Lee Yu, Jianji (Joseph)
Advanced Database Course (ESED5204) Eng. Hanan Alyazji University of Palestine Software Engineering Department.
Introduction to Data-Mining Marko Grobelnik Institut Jozef Stefan.
MIS2502: Data Analytics Advanced Analytics - Introduction.
January 17, 2016Data Mining: Concepts and Techniques 1 What Is Data Mining? Data mining (knowledge discovery from data) Extraction of interesting ( non-trivial,
Conclusions. Why Data Mining? -- Potential Applications Database analysis and decision support – Market analysis and management target marketing, customer.
Data Mining and Decision Support
Academic Year 2014 Spring Academic Year 2014 Spring.
February 13, 2016 Data Mining: Concepts and Techniques 1 1 Data Mining: Concepts and Techniques These slides have been adapted from Han, J., Kamber, M.,
Data Mining Concepts and Techniques Course Presentation by Ali A. Ali Department of Information Technology Institute of Graduate Studies and Research Alexandria.
Data Warehousing/Mining 1. 2 Chapter 1. Introduction v Motivation: Why data mining? v What is data mining? v Data Mining: On what kind of data? v Data.
2016年6月12日星期日 2016年6月12日星期日 2016年6月12日星期日 Introduction to Data Mining 1 Chapter 1 Introduction to Data Mining Chen. Chun-Hsien Department of Information.
Lecture-2 Bscshelp.com.  Why Data Mining and What Kinds of Data Can Be Mined?  Potential Applications 2.
Data Mining – Introduction (contd…) Compiled By: Umair Yaqub Lecturer Govt. Murray College Sialkot.
CS570: Data Mining Spring 2010, TT 1 – 2:15pm Li Xiong.
July 7, 2016 Data Mining: Concepts and Techniques 1 1.
The KDD Process for Extracting Useful Knowledge from Volumes of Data Fayyad, Piatetsky-Shapiro, and Smyth Ian Kim SWHIG Seminar.
Data Mining Functionalities
Data Mining: Concepts and Techniques (3rd ed.) — Chapter 1 —
Data Mining – Intro.
MIS2502: Data Analytics Advanced Analytics - Introduction
DATA MINING © Prentice Hall.
Data Mining.
Data warehouse & Data Mining: Concepts and Techniques
Introduction C.Eng 714 Spring 2010.
Datamining : Refers to extracting or mining knowledge from large amounts of data Applications : Market Analysis Fraud Detection Customer Retention Production.
Data Mining – Intro.
Data Warehousing and Data Mining
I don’t need a title slide for a lecture
Data Mining: Concepts and Techniques
Supporting End-User Access
Data Mining: Concepts and Techniques
Data Mining Concepts and Techniques
Data Mining Techniques As Tools for Analysis of Customer Behavior
Data Warehousing Data Mining Privacy
Data Mining: Concepts and Techniques
Presentation transcript:

Data Mining – Intro

Course Overview Spatial Databases Temporal and Spatio-Temporal Databases Multimedia Databases Data Mining

Data Mining Overview Data Mining Data warehouses and OLAP (On Line Analytical Processing.) Association Rules Mining Clustering: Hierarchical and Partitional approaches Classification: Decision Trees and Bayesian classifiers Sequential Patterns Mining Advanced topics: outlier detection, web mining

What is Data Mining? Data Mining is: (1) The efficient discovery of previously unknown, valid, potentially useful, understandable patterns in large datasets (2) The analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner

What is Data Mining? Very little functionality in database systems to support mining applications Beyond SQL Querying: SQL (OLAP) Query: - How many widgets did we sell in the 1st Qtr of 1999 in California vs New York? Data Mining Queries: - Which sales region had anomalous sales in the 1st Qtr of 1999 - How do the buyers of widgets in California and New York differ? - What else do the buyers of widgets in Cal buy along with widgets

Overview of terms Data: a set of facts (items) D, usually stored in a database Pattern: an expression E in a language L, that describes a subset of facts Attribute: a field in an item i in D. Interestingness: a function ID,L that maps an expression E in L into a measure space M

Overview of terms The Data Mining Task: For a given dataset D, language of facts L, interestingness function ID,L and threshold c, find the expression E such that ID,L(E) > c efficiently.

Examples of Large Datasets Government: IRS, … Large corporations WALMART: 20M transactions per day MOBIL: 100 TB geological databases AT&T 300 M calls per day Scientific NASA, EOS project: 50 GB per hour Environmental datasets

Examples of Data mining Applications 1. Fraud detection: credit cards, phone cards 2. Marketing: customer targeting 3. Data Warehousing: Walmart 4. Astronomy 5. Molecular biology

How Data Mining is used 1. Identify the problem 2. Use data mining techniques to transform the data into information 3. Act on the information 4. Measure the results

The Data Mining Process 1. Understand the domain 2. Create a dataset: Select the interesting attributes Data cleaning and preprocessing 3. Choose the data mining task and the specific algorithm 4. Interpret the results, and possibly return to 2

Data Mining Tasks 1. Classification: learning a function that maps an item into one of a set of predefined classes 2. Regression: learning a function that maps an item to a real value 3. Clustering: identify a set of groups of similar items

Data Mining Tasks 4. Dependencies and associations: identify significant dependencies between data attributes 5. Summarization: find a compact description of the dataset or a subset of the dataset

Data Mining Methods 1. Decision Tree Classifiers: Used for modeling, classification 2. Association Rules: Used to find associations between sets of attributes 3. Sequential patterns: Used to find temporal associations in time series 4. Hierarchical clustering: used to group customers, web users, etc

Are All the “Discovered” Patterns Interesting? Interestingness measures: A pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm Objective vs. subjective interestingness measures: Objective: based on statistics and structures of patterns, e.g., support, confidence, etc. Subjective: based on user’s belief in the data, e.g., unexpectedness, novelty, actionability, etc. A data mining system/query may generate thousands of patterns, not all of them are interesting. Suggested approach: Human-centered, query-based, focused mining

Can We Find All and Only Interesting Patterns? Find all the interesting patterns: Completeness Can a data mining system find all the interesting patterns? Association vs. classification vs. clustering Search for only interesting patterns: Optimization Can a data mining system find only the interesting patterns? Approaches First general all the patterns and then filter out the uninteresting ones. Generate only the interesting patterns—mining query optimization

Data Mining: Classification Schemes General functionality Descriptive data mining Predictive data mining Different views, different classifications Kinds of databases to be mined Kinds of knowledge to be discovered Kinds of techniques utilized Kinds of applications adapted

A Multi-Dimensional View of Data Mining Classification Databases to be mined Relational, transactional, object-oriented, object-relational, active, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW, etc. Knowledge to be mined Characterization, discrimination, association, classification, clustering, trend, deviation and outlier analysis, etc. Multiple/integrated functions and mining at multiple levels Techniques utilized Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, neural network, etc. Applications adapted Retail, telecommunication, banking, fraud analysis, DNA mining, stock market analysis, Web mining, Weblog analysis, etc.

Other issues Numerical or quantitative attributes (real values) vs categorical attributes: ordinal (education) vs nominal (male or female) Classification is used for categorical attributes, Regression for numerical Overfitting (statistics vs data mining)

Major Issues in Data Mining (1) Mining methodology and user interaction Mining different kinds of knowledge in databases Interactive mining of knowledge at multiple levels of abstraction Incorporation of background knowledge Data mining query languages and ad-hoc data mining Expression and visualization of data mining results Handling noise and incomplete data Pattern evaluation: the interestingness problem

Major Issues in Data Mining (2) Performance and scalability Efficiency and scalability of data mining algorithms Parallel, distributed and incremental mining methods Issues relating to the diversity of data types Handling relational and complex types of data Mining information from heterogeneous databases and global information systems (WWW)