An Introduction to Data Mining

Definition
- Data mining refers to the mining or discovery of new information, in the form of patterns or rules, from vast amounts of data.
- It is the process used to find new, hidden, or unexpected patterns in data in order to predict the future of the business.
- It is the exploration and analysis of large quantities of data in order to discover valid, novel, potentially useful, and ultimately understandable patterns in data.

Data mining
- Process of semi-automatically analyzing large databases to find patterns that are:
  - valid: hold on new data with some certainty
  - novel: non-obvious to the system
  - useful: it should be possible to act on the pattern
  - understandable: humans should be able to interpret the pattern
- Also known as Knowledge Discovery in Databases (KDD)

- Data mining is an analytic process designed to explore data (usually large amounts of data, typically business or market related) in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data.
- The ultimate goal of data mining is prediction. Predictive data mining is the most common type of data mining and the one with the most direct business applications.
- The process of data mining consists of three stages:
  - (1) the initial exploration,
  - (2) model building or pattern identification with validation/verification, and
  - (3) deployment (i.e., the application of the model to new data in order to generate predictions).
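The three stages above can be sketched on a toy example. This is only an illustration under stated assumptions: the data, the `monthly_spend` feature, and the midpoint-threshold "model" are hypothetical and stand in for whatever exploration and modelling technique is actually used.

```python
from statistics import mean

# Hypothetical toy data: (monthly_spend, responded_to_promotion)
history = [(20, 0), (35, 0), (50, 0), (65, 1), (80, 1), (95, 1),
           (30, 0), (45, 0), (70, 1), (90, 1)]


def explore(rows):
    """Stage 1, initial exploration: average spend of responders vs. non-responders."""
    responders = [spend for spend, label in rows if label == 1]
    others = [spend for spend, label in rows if label == 0]
    return mean(responders), mean(others)


def fit_threshold(rows):
    """Stage 2, model building: a deliberately simple rule (midpoint of the two means)."""
    mean_yes, mean_no = explore(rows)
    return (mean_yes + mean_no) / 2


def predict(threshold, spend):
    return 1 if spend >= threshold else 0


def accuracy(threshold, rows):
    """Stage 2, validation: check the rule on rows it was not fitted on."""
    hits = sum(predict(threshold, spend) == label for spend, label in rows)
    return hits / len(rows)


train, holdout = history[:6], history[6:]
threshold = fit_threshold(train)
holdout_accuracy = accuracy(threshold, holdout)

# Stage 3, deployment: apply the validated rule to new, unlabeled customers.
new_customers = [25, 60, 85]
predictions = [predict(threshold, spend) for spend in new_customers]

print(threshold, holdout_accuracy, predictions)
```

Validating on the held-out rows is what ties the sketch to the "valid" criterion: a pattern only counts if it still holds on data that was not used to find it.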

DATA MINING
- Data mining refers to extracting or "mining" knowledge from large amounts of data.
- The term "mining" characterizes the process of extracting precious material from a set of raw materials.

The KDD process
- Problem formulation
- Data collection
  - subset data: sampling might hurt if the data is highly skewed
  - feature selection: principal component analysis, heuristic search
- Pre-processing: cleaning
  - name/address cleaning, different meanings (annual, yearly), duplicate removal, supplying missing values
- Transformation:
  - map complex objects (e.g. time series data) to features (e.g. frequency)
- Choosing mining task and mining method
- Result evaluation and visualization
Knowledge discovery is an iterative process.
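The pre-processing step above (name cleaning, terms with the same meaning, duplicate removal, supplying missing values) might look like the following minimal sketch. The records, field names, and the choice of the median as the fill value are assumptions made for illustration.

```python
from statistics import median

# Hypothetical raw records with inconsistent names, synonyms, and a missing value.
raw = [
    {"name": " Alice Smith ", "billing": "annual", "income": 52000},
    {"name": "alice smith", "billing": "yearly", "income": 52000},
    {"name": "Bob Jones", "billing": "monthly", "income": None},
    {"name": "Cara Lee", "billing": "Yearly", "income": 61000},
]

# Different words with the same meaning are mapped to one canonical term.
SYNONYMS = {"yearly": "annual"}


def clean(records):
    # Name cleaning and synonym resolution.
    normalized = []
    for r in records:
        name = " ".join(r["name"].split()).title()
        billing = r["billing"].strip().lower()
        billing = SYNONYMS.get(billing, billing)
        normalized.append({"name": name, "billing": billing, "income": r["income"]})

    # Duplicate removal: keep the first record for each (name, billing) pair.
    seen, unique = set(), []
    for r in normalized:
        key = (r["name"], r["billing"])
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # Supplying missing values: fill with the median of the known incomes.
    known = [r["income"] for r in unique if r["income"] is not None]
    fill = median(known)
    for r in unique:
        if r["income"] is None:
            r["income"] = fill
    return unique


cleaned = clean(raw)
for row in cleaned:
    print(row)
```

The two Alice Smith rows collapse into one only because the name and the billing term are normalized first, so the order of the cleaning steps matters.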

Knowledge Discovery Process
Phases:
1. Data Selection
2. Data Integration
3. Data Cleaning
4. Enrichment
5. Data Transformation or Encoding
6. Data Mining

Knowledge Discovery Process
- Data selection: data about specific items or categories of items, or from stores in a specific region or area of the country, may be selected.
- Data integration: multiple data sources are integrated.
- Data cleaning: the cleaning process may then correct invalid zip codes or eliminate records with incorrect phone prefixes.
- Enrichment: typically enhances the data with additional sources of information.
- Data transformation and encoding: may be done to reduce the amount of data.
- Data mining: techniques are used to mine different rules and patterns.
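As a rough sketch, the phases before mining can be chained as small functions. Everything here is hypothetical: the tables, the field names, the five-digit zip rule (which assumes US-style codes), and the choice to drop rather than correct invalid records.

```python
import re

# Hypothetical source 1: sales transactions from stores.
store_sales = [
    {"customer_id": 1, "region": "West", "item": "milk", "amount": 3.50},
    {"customer_id": 2, "region": "East", "item": "bread", "amount": 2.25},
    {"customer_id": 3, "region": "West", "item": "milk", "amount": 3.50},
    {"customer_id": 4, "region": "West", "item": "eggs", "amount": 4.00},
]
# Hypothetical source 2: customer master data (customer 3 has an invalid zip).
customers = {1: {"zip": "94105"}, 2: {"zip": "10001"},
             3: {"zip": "9410"}, 4: {"zip": "98101"}}
# Hypothetical external source used for enrichment.
zip_income_band = {"94105": "high", "98101": "medium"}


def select(sales, region):
    """Data selection: keep only sales from one region."""
    return [dict(s) for s in sales if s["region"] == region]


def integrate(sales, customer_table):
    """Data integration: join sales with customer data on customer_id."""
    return [{**s, **customer_table.get(s["customer_id"], {})} for s in sales]


def clean(rows):
    """Data cleaning: eliminate records whose zip is not exactly five digits."""
    return [r for r in rows if re.fullmatch(r"\d{5}", r.get("zip", ""))]


def enrich(rows, bands):
    """Enrichment: attach an income band from the additional source."""
    return [{**r, "income_band": bands.get(r["zip"], "unknown")} for r in rows]


def encode(rows):
    """Transformation/encoding: reduce rows to totals per (income_band, item)."""
    totals = {}
    for r in rows:
        key = (r["income_band"], r["item"])
        totals[key] = totals.get(key, 0.0) + r["amount"]
    return totals


mining_input = encode(
    enrich(clean(integrate(select(store_sales, "West"), customers)), zip_income_band)
)
print(mining_input)
```

The resulting summary is what a mining technique would then consume; the mining step itself is left out of the sketch.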

[Figure: Knowledge Discovery Process. Data labels: Data Warehouse, Raw Data, Target Data, Transformed Data, Patterns and Rules, Knowledge. Step labels: Integration, Selection & Cleaning, Transformation, Data Mining, Interpretation & Evaluation, Understanding.]

Why Use Data Mining Today?
- Human analysis skills are inadequate:
  - Volume and dimensionality of the data
  - High data growth rate
- Availability of:
  - Data
  - Storage
  - Computational power
  - Off-the-shelf software
  - Expertise

Why Data Mining
- Credit ratings/targeted marketing:
  - Given a database of 100,000 names, which persons are the least likely to default on their credit cards?
  - Identify likely responders to sales promotions
- Fraud detection:
  - Which types of transactions are likely to be fraudulent, given the demographics and transactional history of a particular customer?
- Customer relationship management:
  - Which of my customers are likely to be the most loyal, and which are most likely to leave for a competitor?
Data mining helps extract such information.

Data Mining Step in Detail
2.1 Data preprocessing
  - Data selection: identify target datasets and relevant fields
  - Data cleaning
    - Remove noise and outliers
  - Data transformation
    - Create common units
    - Generate new fields
2.2 Data mining model construction
2.3 Model evaluation
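A minimal sketch of the preprocessing operations in step 2.1 follows. The records, the cents/dollars units, the derived `amount_per_visit` field, and the outlier rule (more than 5 median absolute deviations from the median) are all assumptions chosen for illustration. Units are converted before outliers are removed here, because amounts in different units cannot be compared.

```python
from statistics import median

# Hypothetical purchase records with amounts in mixed units.
records = [
    {"id": 1, "amount": 1200, "unit": "cents", "visits": 4},
    {"id": 2, "amount": 15.0, "unit": "dollars", "visits": 5},
    {"id": 3, "amount": 9.0, "unit": "dollars", "visits": 3},
    {"id": 4, "amount": 14.0, "unit": "dollars", "visits": 2},
    {"id": 5, "amount": 950.0, "unit": "dollars", "visits": 1},
    {"id": 6, "amount": 1000, "unit": "cents", "visits": 2},
]


def to_common_units(rows):
    """Create common units: express every amount in dollars."""
    out = []
    for r in rows:
        dollars = r["amount"] / 100 if r["unit"] == "cents" else r["amount"]
        out.append({"id": r["id"], "amount": dollars, "visits": r["visits"]})
    return out


def remove_outliers(rows, k=5.0):
    """Remove outliers: drop amounts more than k MADs away from the median."""
    amounts = [r["amount"] for r in rows]
    center = median(amounts)
    mad = median(abs(a - center) for a in amounts)
    return [r for r in rows if abs(r["amount"] - center) <= k * mad]


def add_fields(rows):
    """Generate new fields: average amount spent per visit."""
    return [{**r, "amount_per_visit": r["amount"] / r["visits"]} for r in rows]


prepared = add_fields(remove_outliers(to_common_units(records)))
for row in prepared:
    print(row)
```

With this data the 950-dollar record is dropped and the remaining five records each gain an `amount_per_visit` field.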

[Figure: Preprocessing and Mining. Data labels: Original Data, Target Data, Preprocessed Data, Patterns, Knowledge. Step labels: Data Integration and Selection, Preprocessing, Model Construction, Interpretation.]

Applications
- Banking: loan/credit card approval
  - predict good customers based on old customers
- Customer relationship management:
  - identify those who are likely to leave for a competitor
- Targeted marketing:
  - identify likely responders to promotions
- Fraud detection: telecommunications, financial transactions
  - identify fraudulent events from an online stream of events
- Manufacturing and production:
  - automatically adjust knobs when process parameters change

Applications
- Medicine: disease outcome, effectiveness of treatments
  - analyze patient disease history: find relationships between diseases
- Molecular/Pharmaceutical: identify new drugs
- Scientific data analysis:
  - identify new galaxies by searching for sub-clusters
- Web site/store design and promotion:
  - find affinity of visitors to pages and modify layout

Application Areas

Industry                 Application
Finance                  Credit card analysis
Insurance                Claims, fraud analysis
Telecommunication        Call record analysis
Transport                Logistics management
Consumer goods           Promotion analysis
Data service providers   Value-added data
Utilities                Power usage analysis

Relationship of Data Mining with other fields
- Overlaps with machine learning, statistics, artificial intelligence, databases, and visualization, but with more stress on:
  - scalability in the number of features and instances
  - algorithms and architectures, whereas the foundations of methods and formulations are provided by statistics and machine learning
  - automation for handling large, heterogeneous data

Data Mining in Use
- The US Government uses data mining to track fraud
- A supermarket becomes an information broker
- Basketball teams use it to track game strategy
- Cross-selling
- Target marketing
- Holding on to good customers
- Weeding out bad customers