Data Mining Chapter 1 Introduction -- Basic Data Mining Tasks -- Related Concepts -- Data Mining Techniques.


Definition: Data mining is defined as finding hidden information in a database. A general database is accessed as follows:
[Figure: PC, SQL, DBMS, Database, Results]

Data mining involves a number of algorithms to accomplish its tasks. The algorithms examine the data and determine a model that is closest to the characteristics of the data being examined. Data mining algorithms can be characterized by three parts:
1) Model: the purpose of the algorithm is to fit a model to the data.
2) Preference: some criterion must be used to prefer one model over another.
3) Search: all algorithms require some technique to search the data.
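The three parts above can be sketched with a toy example. This is only an illustration, not an algorithm from the slides: the data points, the line-through-the-origin model, the squared-error criterion and the grid of candidate slopes are all assumptions made for the sketch.

```python
# Toy illustration of the three parts: model, preference, search.
# The data and the candidate grid are made up for this sketch.

data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]  # (x, y) pairs


def model(slope, x):
    """Model: a line through the origin, y = slope * x."""
    return slope * x


def squared_error(slope, points):
    """Preference: a lower total squared error means a better fit."""
    return sum((y - model(slope, x)) ** 2 for x, y in points)


def search(points, candidates):
    """Search: try every candidate slope and keep the preferred one."""
    return min(candidates, key=lambda s: squared_error(s, points))


candidates = [i / 10 for i in range(0, 51)]  # slopes 0.0, 0.1, ..., 5.0
best_slope = search(data, candidates)
print(best_slope)
```

An exhaustive search over a fixed grid is the simplest possible search strategy; practical algorithms usually search the space of models more cleverly.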

Data Mining Models and Tasks
Data mining models are either predictive or descriptive.
- Predictive: Classification, Regression, Time series analysis, Prediction
- Descriptive: Clustering, Summarization, Association rules, Sequence discovery

A predictive model makes predictions based on previous results; it uses historical data. For example, a credit card use might be refused not because of the user's own credit history, but because the current purchase is similar to earlier purchases that were subsequently found to have been made with stolen cards. Here the predictive model is used to predict the credit risk. A descriptive model identifies patterns or relationships in data.

Classification:
- Maps data into predefined groups or classes.
- It is often referred to as supervised learning because the classes are defined before the data are examined.
- E.g. deciding whether to make a bank loan, or identifying credit risks.
- Pattern recognition is a type of classification.

In pattern recognition, an input pattern is classified into one of several classes based on its similarity to these predefined classes. Example: an airport security screening station is used to determine whether passengers are potential terrorists or criminals.
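A minimal sketch of classification by similarity to predefined classes is given below. The slides do not name an algorithm, so a nearest-centroid rule is assumed here; the attributes (income and debt, in thousands), the class names and the training records are hypothetical.

```python
import math

# Labeled training examples: (income, debt) -> predefined class.
# All values are made up for illustration.
training = [
    ((60, 5), "low risk"),
    ((75, 10), "low risk"),
    ((20, 30), "high risk"),
    ((25, 40), "high risk"),
]


def centroids(examples):
    """Average the training points of each class."""
    sums = {}
    for (a, b), label in examples:
        sa, sb, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sa + a, sb + b, n + 1)
    return {label: (sa / n, sb / n) for label, (sa, sb, n) in sums.items()}


def classify(point, class_centroids):
    """Assign the point to the class whose centroid is closest."""
    return min(
        class_centroids,
        key=lambda label: math.dist(point, class_centroids[label]),
    )


class_centroids = centroids(training)
print(classify((70, 8), class_centroids))
print(classify((22, 33), class_centroids))
```

The classes exist before any new record is examined, which is what makes this supervised learning.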

Regression: It is used to map a data item to a real-valued prediction variable. Regression involves learning the function that does this mapping. It assumes that the target data fit some known type of function (e.g. linear, logistic, etc.). For example, a professor wants to reach a certain level of savings and uses past savings values to predict future ones.
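As a sketch of the linear case, the function below fits a straight line by ordinary least squares. The yearly savings figures are hypothetical and chosen to lie exactly on a line so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical savings (in thousands) at the end of years 1 to 4.
years = [1, 2, 3, 4]
savings = [10.0, 12.0, 14.0, 16.0]

slope, intercept = fit_line(years, savings)
predicted_year_6 = slope * 6 + intercept
print(slope, intercept, predicted_year_6)
```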

Time Series Analysis: The value of an attribute is examined as it varies over time. The values are usually obtained at evenly spaced time points (hourly, daily, weekly, etc.). A time series plot is used to visualize the time series.

Prediction: Prediction can be viewed as a type of classification. The difference is that prediction predicts a future state rather than a current state. E.g. predicting flooding.

Clustering: Clustering is alternatively referred to as unsupervised learning or segmentation. Unlike classification, the groups are not predefined but are derived from the data. Clustering is usually accomplished by determining the similarity among data on predefined attributes. E.g. catalogs targeted at demographic groups.
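A small sketch of grouping by similarity on one predefined attribute follows. The slides do not name a clustering algorithm; k-means is assumed here as one common choice, and the customer ages and starting centers are made up.

```python
def kmeans_1d(values, centers, iterations=10):
    """Basic k-means on one numeric attribute with given starting centers."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: each center moves to the mean of its cluster.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


ages = [18, 20, 22, 45, 47, 50]  # hypothetical customer ages
centers, clusters = kmeans_1d(ages, centers=[20, 40])
print(centers)
print(clusters)
```

No class labels are supplied; the two age groups emerge from the data, which is why clustering is called unsupervised learning.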

Summarization: It maps data into subsets with associated simple descriptions. Summarization is also called characterization or generalization. It extracts or derives representative information about the database. For example, one of the many criteria used by U.S. News and World Report to compare universities is the average SAT or ACT score.
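In code, this kind of summarization amounts to reducing each subset to a simple descriptive value. The sketch below computes an average score per university; the university names and scores are invented for illustration.

```python
from statistics import mean

# Hypothetical test scores grouped by university.
scores = {
    "University A": [1200, 1300, 1250],
    "University B": [1100, 1150, 1050],
}

# Each subset of the data is mapped to one simple description: its average.
summary = {name: mean(values) for name, values in scores.items()}
print(summary)
```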

Association Rules: An association rule is a model that identifies specific types of data associations.

Sequence Discovery: Sequential analysis is used to determine sequential patterns in data. These patterns are based on a time sequence of actions. They are similar to associations in that data are found to be related, but the relationship is based on time.
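Association rules are commonly evaluated with two measures, support and confidence. These measures are not defined in the slides, so the sketch below is an assumption about how a rule such as "bread implies milk" would be scored; the transactions are made up.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, baskets):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Among transactions with the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence)
```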

Data Mining versus Knowledge Discovery in Databases: Knowledge discovery in databases (KDD) is the process of finding useful information and patterns in data. Data mining is the use of algorithms to extract the information and patterns derived by the KDD process.

KDD is a process that takes data as its input and produces useful information as its output.
[Figure: SQL statement, Database, Result]

The KDD process consists of the following five steps:
1) Selection: initial data to target data
2) Preprocessing: target data to preprocessed data
3) Transformation: preprocessed data to transformed data
4) Data mining: applied to the transformed data
5) Interpretation: yields knowledge
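The five KDD steps can be read as a chain of functions, each consuming the output of the previous one. The sketch below is only a toy: the record layout, the filtering rules and the use of an average as the mined "model" are all hypothetical stand-ins.

```python
def selection(initial_data):
    """Selection: keep only the records relevant to the task (target data)."""
    return [r for r in initial_data if r.get("dept") == "sales"]


def preprocessing(target_data):
    """Preprocessing: drop records with missing values."""
    return [r for r in target_data if r.get("amount") is not None]


def transformation(preprocessed):
    """Transformation: convert to the form the mining step expects."""
    return [float(r["amount"]) for r in preprocessed]


def data_mining(transformed):
    """Data mining: a trivial stand-in model, the average amount."""
    return sum(transformed) / len(transformed)


def interpretation(model):
    """Interpretation: present the result as usable knowledge."""
    return f"Average sales amount is {model:.2f}"


# Hypothetical initial data.
initial_data = [
    {"dept": "sales", "amount": 100},
    {"dept": "sales", "amount": None},
    {"dept": "hr", "amount": 50},
    {"dept": "sales", "amount": 300},
]

knowledge = interpretation(
    data_mining(transformation(preprocessing(selection(initial_data))))
)
print(knowledge)
```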

Some Related Concepts
- Database / OLTP
- Fuzzy sets and fuzzy logic
- Information retrieval
- Decision support systems
- Dimensional modeling
- Data warehousing
- OLAP
- Web search engines
- Statistics
- Machine learning
- Pattern matching

Database/OLTP Systems
- A database contains the data of an organization or enterprise.
- A database system manages all of this data according to its data model and the relationships among its entities.
- To describe the data, a data model is designed.

ER Model Example
[Figure: ER diagram relating the entities Employee and Job through the relationship HasJob; attribute labels: ID, Name, Address, Salary, Basic, Job No, Job Desc]

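Fuzzy Sets
Fuzzy logic means reasoning with uncertainty. A fuzzy set is a set of fuzzy values.
- Fuzzy values are approximate values.
Consider the set F = { x | x ∈ Z+ and x <= 5 }. This is a conventional (crisp) set: each x either belongs to F or does not. In a fuzzy set, membership is instead a degree between 0 and 1.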
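The difference between all-or-nothing membership and graded membership can be sketched as follows. The set F is taken from the slide; the fuzzy set "small" and its linear membership function are hypothetical and chosen only for illustration.

```python
def crisp_member(x):
    """Membership in F = {x | x in Z+ and x <= 5}: either 0 or 1."""
    return 1 if isinstance(x, int) and 1 <= x <= 5 else 0


def fuzzy_small(x):
    """Hypothetical fuzzy set 'small': degree falls linearly
    from 1.0 at x <= 1 down to 0.0 at x >= 10."""
    if x <= 1:
        return 1.0
    if x >= 10:
        return 0.0
    return (10 - x) / 9


print(crisp_member(3), crisp_member(6))
print(fuzzy_small(1), fuzzy_small(5.5), fuzzy_small(12))
```

With the crisp set, 5 is in and 6 is out; with the fuzzy set, values near the boundary belong to the set to a partial degree.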

Information Retrieval
[Figure: users, computer, IRS (information retrieval system), keywords]

IR query result measures
An IR system consists of a set of documents D = { D1, D2, ..., Dn }. The input to the system is a query q, which contains the keywords. The similarity between the query and each document is then calculated as sim(q, Di). The effectiveness of the system in processing the query is measured by precision and recall.

IR query result measures
Precision = |Relevant and Retrieved| / |Retrieved|
Recall = |Relevant and Retrieved| / |Relevant|
Precision answers the question: "Are all retrieved documents relevant ones?" Recall answers the question: "Have all relevant documents been retrieved?"
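The two measures follow directly from set operations. In the sketch below the document identifiers, and which of them count as relevant or retrieved, are made up; returning 0.0 for an empty set is an assumption of this sketch rather than part of the definitions.

```python
def precision_recall(relevant, retrieved):
    """Return (precision, recall) for sets of document identifiers."""
    hits = relevant & retrieved  # documents that are relevant and retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result.
relevant = {"D1", "D2", "D3", "D4"}
retrieved = {"D2", "D3", "D7"}

precision, recall = precision_recall(relevant, retrieved)
print(precision, recall)
```

Here two of the three retrieved documents are relevant, and two of the four relevant documents were retrieved.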

Decision Support System: Dimensional Modeling
A dimension is a collection of logically related attributes and is viewed as an axis for modeling the data. For example, the time dimension includes attributes such as century, decade, year, month, and time.

Web Search Engine
Web search engines are treated as IR systems.
[Figure: keywords, search process, servers]

Search Engine Limitations
Search engines face several problems:
- Abundance: a single query cannot retrieve all the data on the Web.
- Limited coverage: although search engines are available, each one searches only a limited portion of the data.
- Limited query: search engines support only restricted forms of query.
- Limited customization: the search engine lacks knowledge about the user.

Machine Learning
Machine learning is the area of AI that examines how to write programs that can learn. In data mining, machine learning is used for prediction or classification. In data mining applications, learning is based on a model of the data. The two types of machine learning are:
- Supervised learning
- Unsupervised learning

Pattern Matching