Data Mining Knowledge Discovery in Databases Data 31.

Presentation transcript:


Data Mining
Data mining is a capability to support the recognition of previously unknown but potentially useful relationships within large databases/data warehouses.
Aim: find useful patterns in the data.
Uses statistical, mathematical, artificial intelligence, and machine-learning techniques.

Data Mining Tools
Data mining tools use statistical or rules-based methods to identify patterns and create predictive models.
Tools look for patterns using a variety of models:
– Statistical methods, e.g. correlation
– Decision trees
– Case-based reasoning
– Neural computing
– Intelligent agents
– Genetic algorithms

Text Mining
– Analyse text documents
– Find hidden content
– Group by themes
– Determine relationships between documents

Process of Data Mining/Knowledge Discovery
(Diagram) Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation

What does it let you do?
Data mining automates the process of sifting through historical data in order to discover new information.
Data mining techniques enable users to identify patterns and correlations within a set of data.
These can then be used as predictive models that anticipate behaviour or events based on trends in the data.

Correlation versus Causation
Correlation
– A statistical relation between two or more variables such that changes in the value of one variable are accompanied by changes in the value of the other.
Causation
– Changes in one variable cause changes in another.
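As a rough illustration of the correlation half of this distinction, the sketch below computes a Pearson correlation coefficient in plain Python. The function name `pearson` and the two data series are hypothetical and not taken from the slides.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances (unnormalised; the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical monthly figures: ice cream sales and sunburn cases.
ice_cream = [10, 20, 30, 40, 50]
sunburn = [1, 2, 3, 4, 5]
r = pearson(ice_cream, sunburn)
print(r)
```

A coefficient close to 1 only says the two series move together. In this made-up case both would plausibly be driven by a third factor (sunny weather), so the correlation alone does not show that one causes the other.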

What do you need for Data Mining?
– Massive data collection
– Powerful computers
– Data mining algorithms

Five Basic Operations
Clustering – identifies groups of items that share a particular characteristic
Classification – infers the defining characteristics of a certain group
Association – identifies relationships between events that occur at the same time
Sequencing – identifies relationships over time
Forecasting – estimates future values based on patterns within large sets of data

Clustering
The process of identifying relationships between similar records without any preconceived notion of what that similarity might involve.
Examples:
– Disease clusters
– Similarities in customers' telephone usage
Often used as an exploratory exercise before further data mining using a classification technique.
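The slides do not name a clustering algorithm. As one possible illustration, here is a minimal one-dimensional k-means sketch, assuming the starting centres are supplied by hand; `kmeans_1d` and the call-minutes data are hypothetical.

```python
def kmeans_1d(values, centres, iterations=10):
    """Very small k-means for one-dimensional data with given starting centres."""
    centres = list(centres)
    for _ in range(iterations):
        # Assign each value to its nearest centre.
        groups = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # Move each centre to the mean of its group (keep it if the group is empty).
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres, groups

# Hypothetical monthly call minutes for eight customers.
minutes = [12, 15, 18, 20, 300, 320, 310, 295]
centres, groups = kmeans_1d(minutes, centres=[0, 100])
print(centres)
print(groups)
```

No labels such as "light user" or "heavy user" are given in advance; the two groups emerge from the data alone, which is what makes this an exploratory step.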

Classification
The DM system learns from examples of the data how to partition or classify the data, i.e. it formulates classification rules which can be used for prediction.
– Example: a bank classifies customers and may offer them differing levels of service, different offers, different charges. It can also build loan approval models.
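To make "learning a classification rule from examples" concrete, the sketch below learns a single income threshold from past loan outcomes. This is a deliberately tiny stand-in for a real loan approval model; the function `learn_threshold`, the `approve` helper and the loan history are all hypothetical.

```python
def learn_threshold(examples):
    """Pick the income threshold that classifies the most training examples correctly.

    examples: list of (income, repaid) pairs, where repaid is True or False.
    The learned rule is: predict 'repaid' when income >= threshold.
    """
    best_threshold, best_correct = None, -1
    for threshold in sorted({income for income, _ in examples}):
        correct = sum((income >= threshold) == repaid for income, repaid in examples)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold

# Hypothetical past loans: (annual income in thousands, loan repaid?).
history = [(18, False), (22, False), (25, False), (31, True), (40, True), (55, True)]
threshold = learn_threshold(history)

def approve(income):
    """Apply the learned rule to a new applicant."""
    return income >= threshold

print(threshold, approve(45), approve(20))
```

Unlike clustering, the examples here carry a known outcome (repaid or not), and the rule learned from them is then used to predict the outcome for new customers.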

Association
Looks for links between records in a data set, e.g. items purchased at the same time.
Patterns can be identified to indicate probabilities, e.g. out of 500,000 transactions:
  20,000 include nappies
  30,000 include beer
  10,000 include both nappies and beer
– Beer and nappies occur together in 2% of transactions.
– “When people buy beer they buy nappies 1/3 of the time.”
– “When people buy nappies they buy beer 50% of the time.”
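The three percentages on this slide are simple ratios of the transaction counts. A minimal sketch using the slide's own numbers follows; the names `support` and `confidence_*` are the usual association-rule terms, not wording from the slides.

```python
# Counts taken from the slide.
transactions = 500_000
nappies = 20_000   # transactions containing nappies
beer = 30_000      # transactions containing beer
both = 10_000      # transactions containing both

# Share of all transactions that contain both items.
support = both / transactions

# Of the transactions containing beer, the share that also contain nappies.
confidence_beer_to_nappies = both / beer

# Of the transactions containing nappies, the share that also contain beer.
confidence_nappies_to_beer = both / nappies

print(f"{support:.0%} {confidence_beer_to_nappies:.0%} {confidence_nappies_to_beer:.0%}")
```

The two confidence figures differ because they divide the same joint count by different totals, which is why the rule reads differently in each direction.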

Sequential Analysis
A form of association used to track relationships over time.
– E.g. health insurance claims.
– E.g. 10% of customers who bought a tent bought a backpack within one month.
– Weather patterns, e.g. tidal wave in Hawaii follows hurricane in N. Atlantic x% of the time.
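A statement such as "x% of customers who bought a tent bought a backpack within one month" can be computed by a direct count over dated purchases. The sketch below assumes a hypothetical purchase log keyed by customer; `followed_within` and the sample data are illustrative only.

```python
from datetime import date, timedelta

def followed_within(purchases, first, second, window):
    """Fraction of customers who bought `first` and then `second` within `window`."""
    bought_first = 0
    bought_both = 0
    for items in purchases.values():
        first_dates = [d for item, d in items if item == first]
        if not first_dates:
            continue  # this customer never bought the first item
        bought_first += 1
        start = min(first_dates)
        # Count the customer if the second item was bought on or after the
        # first purchase and no later than `window` afterwards.
        if any(item == second and start <= d <= start + window for item, d in items):
            bought_both += 1
    return bought_both / bought_first if bought_first else 0.0

# Hypothetical purchase log: customer -> list of (item, purchase date).
purchases = {
    "alice": [("tent", date(2024, 5, 1)), ("backpack", date(2024, 5, 20))],
    "bob": [("tent", date(2024, 5, 3)), ("backpack", date(2024, 8, 1))],
    "carol": [("tent", date(2024, 6, 10))],
    "dave": [("backpack", date(2024, 6, 1))],
}
rate = followed_within(purchases, "tent", "backpack", timedelta(days=30))
print(rate)
```

The order and timing of the purchases matter here, which is what separates sequential analysis from plain association on a single basket.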

Forecasting
Concerns the prediction of continuous variables, e.g. sales, share values, stock market levels, oil prices etc.
Often done with regression functions: statistical methods for examining the relationship between variables in order to predict a future value.
Two types:
– Forecasting a single continuous value based on unordered examples, e.g. predict income based on personal details.
– Predicting one or more values based on a sequential pattern: time series forecasting.
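As one simple instance of a regression function, the sketch below fits a straight line by ordinary least squares and extends it one step ahead. The function `fit_line` and the sales figures are hypothetical; real forecasting would normally use more data and a statistics library.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical sales for months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [100, 110, 120, 130, 140]

slope, intercept = fit_line(months, sales)
forecast_month_6 = slope * 6 + intercept
print(slope, intercept, forecast_month_6)
```

Because the input is ordered by month, this corresponds to the second type above (time series forecasting); the same fit applied to unordered records, such as income against age, would correspond to the first.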

Data Mining Tools in more detail
Case-based reasoning
– Use historical cases to identify patterns.
Neural computing
– Examine historical data for pattern recognition, e.g. identify potential customers for a new product.
Intelligent agents
– Retrieve information from large databases.
Other tools, e.g. decision trees, rule induction, data visualisation.

Some Key Application Areas
Data mining is used in many different areas. Two big areas are:
– Market analysis and management
  Initial data gathered from: credit card transactions, loyalty cards, discount coupons, customer complaint calls, lifestyle studies, focus groups
– Fraud detection and management

Examples: Market analysis and management
Target marketing
– Find clusters of “model” customers who share the same characteristics, e.g. interests, income.
– Determine customer purchasing patterns over time.
Cross-market analysis
– Uses associations/correlations between product sales and predicts based on the association information.
Customer profiling
– What types of customers buy what products.
Identifying customer requirements
– Identify the best products for different customers; use prediction to find what factors will attract new customers.

Fraud detection and management
Used in health care, retail, credit card services, telecommunications (phone card fraud), etc.
Use historical data to build models of fraudulent behavior and use data mining to help identify similar instances.
Examples:
– Auto insurance: detect a group of people who stage accidents to collect on insurance
– Money laundering: detect suspicious money transactions
– Medical insurance: detect professional patients, rings of doctors and rings of references

Text Mining
Application of data mining to unstructured or less structured files.
Text mining operates with less structured information and helps organisations to:
– Find hidden content of documents, including useful relationships.
– Relate documents across unnoticed divisions, e.g. customers in two product divisions have the same characteristics.
– Group documents by themes, e.g. all customers who have similar complaints.
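Grouping documents by theme needs some measure of how alike two texts are. The slides do not specify one; the sketch below uses word-set overlap (Jaccard similarity) as a simple assumed choice. The function `jaccard` and the complaint texts are hypothetical.

```python
def jaccard(text_a, text_b):
    """Word-set overlap between two texts: shared words / all distinct words."""
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    return len(words_a & words_b) / len(words_a | words_b)

# Hypothetical customer complaints.
complaint_1 = "delivery arrived late again"
complaint_2 = "late delivery of my order"
complaint_3 = "invoice shows wrong amount"

print(jaccard(complaint_1, complaint_2))  # two shared words out of seven
print(jaccard(complaint_1, complaint_3))  # no shared words
```

Complaints with a high score against each other would be placed in the same theme. Practical text mining usually adds steps such as removing common words and reducing words to their stems before comparing.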

Some more example applications by area
Marketing: predicting which customers will respond to internet banners or buy a product; segmenting customer demographics.
Banking: forecasting bad loans and fraudulent credit card usage, credit card spending by new customers, and which customers will respond best to new loan offers.
Retailing and Sales: predicting sales, correct stock levels, distribution schedules.
Manufacturing and Production: predicting when to expect machinery failures, finding key factors that control the optimisation of manufacturing capacity.

Brokerage and Securities Trading: predicting when bond prices will change, forecasting range of stock fluctuation for particular issues, determining when to trade stock.
Insurance: forecasting claim amounts, medical coverage costs, classifying the most important elements that affect medical coverage, predicting which customers will buy new policies.
Computer Hardware and Software: predicting drive failure, forecasting creation time for new chips, predicting potential security violations.
Government and Defence: forecasting cost of moving military equipment, testing strategies for potential military engagements, predicting resource consumption.

Airlines: capturing data on where customers are flying and the destinations of those who change carriers mid-journey.
Healthcare: correlating demographics of patients with critical illnesses.
Broadcasting: predicting which programs are best shown in prime time and how to maximize returns by inserting advertisements.
Police: tracking crime patterns, locations, criminal behaviour and attributes to help crack criminal cases.

Problems with data mining
Need clear business objectives and access to the appropriate data.
Need the right data.
– Bad data quality can lead to spurious results.
Models are not fail-safe.
Privacy, property and other legal and ethical issues.
Companies must change mode of operation and maintain the effort (e.g. loyalty programs such as air miles).

Conclusion
Data mining is an attractive-sounding technology which is still evolving.
The key is that the algorithms discover useful relationships.
– Unlike standard research, where researchers hypothesise correlations and then search for them.
There are ethical issues:
– E.g. criminal profiling.