Advanced Database Course (ESED5204) Eng. Hanan Alyazji University of Palestine Software Engineering Department.



Data Mining

What is Data Mining?
- Data mining is the exploration and analysis of large quantities of data in order to discover valid, novel, potentially useful, and ultimately understandable patterns in data.
- It is the discovery of knowledge (in the form of rules, trees, frequent patterns, etc.) from large volumes of data.
- It is the automated process of finding relationships and patterns in stored data.
- It is different from the use of SQL queries and other business intelligence tools.

Data Mining – Why is it important?
- The explosive growth in data collection.
- Data are being generated in enormous quantities.
- Data are being collected over long periods of time.
- Data are being kept for long periods of time.
- Computing power is formidable and cheap.
- A variety of data mining software is available.

Data Mining: On What Kind of Data?
- Relational databases.
- Data warehouses.
- Transactional databases.
- Advanced DB and information repositories:
  - Object-oriented and object-relational databases.
  - Spatial databases.
  - Time-series data and temporal data.
  - Text databases and multimedia databases.
  - The World Wide Web (WWW).

Knowledge discovery in databases (KDD)
- Knowledge discovery in databases (KDD) is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
- KDD is the automatic extraction of non-obvious, hidden knowledge from large volumes of data.
- KDD is the process of extracting previously unknown, valid, and actionable (understandable) information from large databases.
- Data mining is one step in the KDD process: the step that applies data analysis and discovery algorithms.

The Knowledge Discovery Process
- The non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
- Key terms in this definition:
  - non-trivial process: involves multiple steps
  - valid: justified patterns/models
  - novel: previously unknown
  - useful: can be used
  - understandable: by humans and machines

Data Mining & KDD
Data Mining:
- A step in the KDD process.
- Consists of particular data mining algorithms.
- Operates under specified computational efficiency limitations.
- Produces a specific enumeration of patterns.
KDD process:
- The process of using data mining methods (algorithms) to extract knowledge according to the specifications of measures, using the database along with any required preprocessing and transformations of that database.

What are the basic steps of data mining for knowledge discovery?
1. Define the business problem.
2. Build the data mining database (not easy).
3. Explore the data.
4. Prepare the data for modeling (select variables, select rows, construct new variables, transform variables).
5. Build the model.
6. Evaluate the model.
7. Deploy the model and results.

Knowledge Discovery Process
- The whole process of extracting implicit, previously unknown, and potentially useful knowledge from a large database.
- It includes data selection, cleaning, coding, data mining, and reporting.
- Data mining is the key stage of the knowledge discovery process.
- Data mining is the process of finding the desired information in a large database.

Stages of KDD

The Knowledge Discovery Process
KDD is inherently interactive and iterative. Its steps are:
1. Understand the domain and define problems.
2. Collect and preprocess data.
3. Data mining: extract patterns/models. This is the step in the KDD process consisting of methods that produce useful patterns or models from the data, under some acceptable computational efficiency limitations.
4. Interpret and evaluate the discovered knowledge.
5. Put the results to practical use.

Knowledge Discovery in Databases Process

Data Cleaning and Integration:
- Integration of data from different sources:
  - Mapping of attribute names.
  - Joining different tables.
- Elimination of inconsistencies.
- Imputation of missing values (if necessary and possible):
  - Fill in missing values by some strategy (e.g. default value, average value).
- Normalization.
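The cleaning and integration steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the two tables, their attribute names (`cust_id`, `CustomerID`, `age`, `total`), and the choice of mean imputation and min-max normalization are hypothetical and do not come from the slides.

```python
from statistics import mean

# Hypothetical source tables; the key attribute is named differently in each.
customers = [
    {"cust_id": 1, "age": 34},
    {"cust_id": 2, "age": None},  # missing value to be imputed
    {"cust_id": 3, "age": 50},
]
orders = [
    {"CustomerID": 1, "total": 120.0},
    {"CustomerID": 2, "total": 80.0},
    {"CustomerID": 3, "total": 200.0},
]

def integrate(customers, orders):
    """Map CustomerID onto cust_id and join the two tables on that key."""
    totals = {o["CustomerID"]: o["total"] for o in orders}
    return [
        {**c, "total": totals[c["cust_id"]]}
        for c in customers
        if c["cust_id"] in totals
    ]

def impute_mean(rows, attr):
    """Replace missing values of attr with the average of the known values."""
    known = [r[attr] for r in rows if r[attr] is not None]
    fill = mean(known)
    return [{**r, attr: fill if r[attr] is None else r[attr]} for r in rows]

def min_max(rows, attr):
    """Rescale attr to the range [0, 1]; constant columns are mapped to 0."""
    values = [r[attr] for r in rows]
    lo, hi = min(values), max(values)
    span = hi - lo
    return [
        {**r, attr: (r[attr] - lo) / span if span else 0.0}
        for r in rows
    ]

cleaned = min_max(impute_mean(integrate(customers, orders), "age"), "total")
```

Each function returns new rows rather than modifying its input, so the steps can be reordered or rerun while exploring the data.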

Focusing on task-relevant data:
- Selections:
  - Select the relevant rows from the database tables.
- Projections:
  - Select the relevant attributes/columns from the database tables.
- Transformations:
  - Computation of numerical attributes.
  - Computation of derived rows and derived attributes/columns.
  - New attributes.
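A minimal sketch of selection, projection, and transformation on an in-memory table follows. The table, its column names, and the derived `revenue` attribute are assumptions made for illustration; in practice these operations would usually be expressed in SQL against the database tables.

```python
# Hypothetical sales table.
rows = [
    {"id": 1, "region": "north", "price": 10.0, "quantity": 3},
    {"id": 2, "region": "south", "price": 4.0, "quantity": 5},
    {"id": 3, "region": "north", "price": 2.5, "quantity": 4},
]

def select(rows, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]

def project(rows, attrs):
    """Projection: keep only the named attributes/columns."""
    return [{a: r[a] for a in attrs} for r in rows]

def derive(rows, name, fn):
    """Transformation: add a derived attribute computed from each row."""
    return [{**r, name: fn(r)} for r in rows]

north = select(rows, lambda r: r["region"] == "north")
with_revenue = derive(north, "revenue", lambda r: r["price"] * r["quantity"])
result = project(with_revenue, ["id", "revenue"])
```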

Basic Data Mining Tasks:
- Clustering
- Classification
- Association Rules
- Concept Characterization and Discrimination
- Other methods

Evaluation of patterns:
- Interestingness of patterns: a pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm.

Visualization:
- Visual data mining: present the data in some visual form, allowing the human to gain insight into the data, draw conclusions, and directly interact with the data.

Common Types of Information from Data Mining
- Associations: identify occurrences that are linked to a single event.
- Sequences: identify events that are linked over time.
- Classification: recognizes patterns that describe the group to which an item belongs.
- Clustering: discovers different groupings within the data.
- Forecasting: estimates future values.
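As an illustration of the "associations" item, the sketch below scores one candidate rule on a handful of market baskets. The slides do not name any measures; support and confidence are the measures commonly used for association rules and are brought in here as an assumption, as are the example baskets.

```python
# Hypothetical transactions: each basket is the set of items bought together.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also
    contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)

rule_support = support(baskets, {"bread", "milk"})
rule_confidence = confidence(baskets, {"bread"}, {"milk"})
```

Here the rule "bread implies milk" holds in two of the four baskets, and in two of the three baskets that contain bread.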

The Data Mining Process
- Requires personnel with domain, data warehousing, and data mining expertise.
- Requires data selection, data extraction, data cleaning, and data transformation.
- Is an iterative and interactive process.

The Data Mining Process
Based on the questions being asked and the required "form" of the output:
1. Select the data mining mechanisms to use.
2. Make sure the data is properly coded for the selected mechanisms.
   - Example: a tool may accept numeric input only.
3. Perform a rough analysis using traditional tools.
   - Create a simple prediction using statistics.
   - The data mining tools must do better than this prediction.
4. Run the tool and examine the results.
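Step 3 can be made concrete with a simple statistical baseline. In the sketch below the baseline always predicts the mean of the training values, and a mining tool is considered worthwhile only if its error is lower. The data, the `candidate_model` function (a stand-in for whatever the mining tool produces), and the use of mean absolute error are all assumptions for illustration.

```python
from statistics import mean

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean([abs(a - p) for a, p in zip(actual, predicted)])

# Hypothetical data: x is an input attribute, y the value to predict.
train_y = [10, 12, 14, 16]
test_x = [5, 6]
test_y = [18, 20]

# Baseline: a "simple prediction using statistics" (always the training mean).
baseline = mean(train_y)
baseline_error = mean_absolute_error(test_y, [baseline] * len(test_y))

def candidate_model(x):
    """Hypothetical stand-in for the output of a data mining tool."""
    return 2 * x + 8

model_error = mean_absolute_error(test_y, [candidate_model(x) for x in test_x])

# The tool is only worth keeping if it beats the baseline.
beats_baseline = model_error < baseline_error
```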

Data Mining Tasks
Data mining is generally divided into two kinds of task:
1. Predictive tasks:
   - Predict the value of a specific attribute based on the values of other attributes.
   - A prediction method uses some variables to predict unknown or future values of other variables.
2. Descriptive tasks:
   - Derive patterns that summarize the underlying relationships in the data.
   - A description method finds human-interpretable patterns that describe the data.

Data Mining Tasks
- Classification [Predictive]
- Clustering [Descriptive]
- Association Rule Discovery [Descriptive]
- Sequential Pattern Discovery [Descriptive]
- Regression [Predictive]
- Deviation Detection [Predictive]

Classification
- Data are defined in terms of attributes, one of which is the class.
- Find a model for the class attribute as a function of the values of the other attributes.
- The given data set is usually divided into training and test sets:
  - Training data: used to build the model.
  - Test data: used to validate the model (determine the accuracy of the model).
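The training/test workflow can be sketched as follows. The slides do not name a particular classifier, so a 1-nearest-neighbour rule is used here purely as a simple stand-in; the records, the class labels, and the fixed split are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two attribute vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(training, features):
    """1-nearest-neighbour: return the class of the closest training record."""
    _, label = min(training, key=lambda rec: squared_distance(rec[0], features))
    return label

def accuracy(training, test):
    """Fraction of test records whose predicted class matches the true class."""
    correct = sum(predict(training, feats) == label for feats, label in test)
    return correct / len(test)

# Hypothetical records: (attribute vector, class attribute).
data = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((5.0, 5.2), "high"),
    ((4.8, 5.1), "high"),
    ((0.9, 1.1), "low"),
    ((5.1, 4.9), "high"),
]

# Fixed split for reproducibility; a random split is more usual in practice.
training, test = data[:4], data[4:]
test_accuracy = accuracy(training, test)
```

The model is built from `training` only, and `test` is held back so that the reported accuracy reflects records the model has not seen.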

Classification Example
[Figure: a data table with categorical and continuous attributes and a class column. Figure labels: Training Set, Learn Classifier, Model, Test Set.]

Classification: Application
- Fraud detection
  - Goal: predict fraudulent cases in credit card transactions.
  - Approach:
    - Use credit card transactions and the information on the account holder as attributes: when the customer buys, what they buy, how often they pay on time, etc.
    - Label past transactions as fraud or fair transactions. This forms the class attribute.
    - Derive a model for the class of the transactions.
    - Use this model to detect fraud by observing credit card transactions on an account.

Clustering
- Clustering: partition a data set into clusters.
- Cluster: a collection of data objects.
  - Objects within the same cluster are similar to one another.
  - Dissimilar objects are in different clusters.
- Example: given a set of data points, each having a set of attributes, and a similarity measure among them, find clusters such that:
  - data points in one cluster are more similar to one another.
  - data points in separate clusters are less similar to one another.
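One common way to find such clusters is k-means, sketched below. The slides do not prescribe an algorithm, so k-means, the use of squared Euclidean distance as the (dis)similarity measure, the deterministic choice of the first k points as initial centroids, and the sample points are all assumptions for illustration.

```python
from statistics import mean

def squared_distance(a, b):
    """Squared Euclidean distance; smaller means more similar."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, iterations=10):
    """Partition points into k clusters by alternating assignment and update."""
    centroids = list(points[:k])  # assumption: first k points as initial centroids
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(mean(dim) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters

# Hypothetical data points with two attributes each.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.9), (5.1, 4.8), (0.8, 1.1), (4.9, 5.2)]
clusters = k_means(points, k=2)
```

With these points the two clusters separate the values near (1, 1) from those near (5, 5), matching the goal that points in one cluster are more similar to one another than to points in the other.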

Data Warehouse
- A warehouse: a storage place for data awaiting use.
- Data warehousing is a process for assembling and managing data from various sources for the purpose of gaining a single detailed view of part or all of a business.
- Integrates diverse data sources.
- Provides support for decision-making operations.
- Usually based on a relational database and DBMS.

Data Warehouse – why?
- For organisational learning to take place, data from many sources must be gathered together over time and organised in a consistent and useful way.
- Data warehousing allows an organisation to remember its data and what it has learned about its data.
- Data mining techniques make use of the data in a data warehouse and subsequently add their results to it.

Data Warehouse - Contents
- A data warehouse is a copy of transaction data specifically structured for querying, analysis, and reporting.
- The data will normally have been transformed when it was copied into the data warehouse.
- The contents of a data warehouse, once acquired, are fixed and cannot be updated or changed later by the transaction system, although new data can of course be added.

Questions?