Data Mining and Data Visualization


Data Mining and Data Visualization SOM 485 Fall 2007

Getting Started. Topics: What Is Data Mining?; Online Analytical Processing; Data Mining Techniques; Market Basket Analysis; Limitations and Challenges to Data Mining; Data Visualization; Siftware Technologies.

What is Data Mining (DM)? A set of activities used to find new, hidden, or unexpected patterns in data. The data are supplied by a data warehouse, the main source where all data is stored (for example, a database). The patterns provide valuable information for different types of research, such as marketing or Customer Relationship Management.

Applications of DM. Customer Relationship Management (CRM) software is an application that can benefit from DM. Activities of CRM: One-to-One Marketing; Sales Force Automation; Sales Campaign Management; Marketing Encyclopedia; Call Center Automation. Source: Concepts in Enterprise Resource Planning by Brady, Monk, and Wagner.

Verification of DM. Verification requires a great deal of a priori knowledge on the decision maker's part, because a suspected relationship is verified through a query. It is used mainly in casinos; for example, it can determine whether a new customer is a high roller, a souvenir buyer, a ticket purchaser, etc. The ability to categorize a new customer from the database has proven highly profitable, because it allows effective targeting of a specific group of customers. Siftware, software specifically designed to find new and previously unclassified patterns in data, helps discover new patterns in customer spending habits.

Online Analytical Processing. Online Analytical Processing (OLAP) was introduced by E. F. Codd in 1993. Codd proposed that the standard relational database used for transaction processing had reached its limit. OLAP is a computer process that allows a user to extract and view data from different points of view. Example: a user can request data to be analyzed to display a spreadsheet showing all of a company's beach ball products sold in Florida in the month of July, compare revenue figures with those for the same products in September, and then see a comparison of other product sales in Florida in the same time period. Scientific and academic organizations store about 1 terabyte (1 trillion bytes) of new data each day, according to a 2000 report from a GTE research center.
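The beach ball example above can be sketched as a small aggregation over a table of sales facts. This is only an illustration of viewing the same data from different points of view, not how an OLAP server is implemented; the sales figures, the Georgia row, the sunscreen product, and the function name rollup are all invented for the example.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, state, month, revenue). All values are invented.
SALES = [
    ("beach ball", "Florida", "July", 1200.0),
    ("beach ball", "Florida", "September", 300.0),
    ("sunscreen", "Florida", "July", 900.0),
    ("sunscreen", "Florida", "September", 450.0),
    ("beach ball", "Georgia", "July", 400.0),
]

DIMENSIONS = ("product", "state", "month")


def rollup(rows, group_by, **filters):
    """Sum revenue grouped by the chosen dimensions, after filtering on others."""
    totals = defaultdict(float)
    for row in rows:
        record = dict(zip(DIMENSIONS, row[:3]))
        # Skip rows that do not match every requested filter (the "slice").
        if any(record[dim] != value for dim, value in filters.items()):
            continue
        key = tuple(record[dim] for dim in group_by)
        totals[key] += row[3]
    return dict(totals)


# View 1: beach ball revenue in Florida, July versus September.
print(rollup(SALES, ("month",), product="beach ball", state="Florida"))
# View 2: every product sold in Florida, by month.
print(rollup(SALES, ("product", "month"), state="Florida"))
```

Each call answers a different question from the same rows, which is the sense in which OLAP lets a user change viewpoint without changing the underlying data.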

OLAP continued. Codd's 12 Rules for OLAP: 1) Multidimensional View 2) Transparent to the User 3) Accessible 4) Consistent Reporting 5) Client-Server Architecture 6) Generic Dimensionality 7) Dynamic Sparse Matrix Handling 8) Multi-user Support 9) Cross-Dimensional Operations 10) Intuitive Data Manipulation 11) Flexible Reporting 12) Infinite Levels of Dimension and Aggregation. To this day, no implementation exists in which all 12 rules are strictly obeyed; it may even be impossible.

OLAP: MOLAP & ROLAP. OLAP data is stored in a Multidimensional Database (MDB). MOLAP: an OLAP application that accesses data from a multidimensional database. MDBs are frequently created using input from an existing relational database. ROLAP: an OLAP application that works against a relational database server through SQL, for portability and scalability. MOLAP stores data in multidimensional arrays (cubes), whereas ROLAP stores it in two-dimensional relational tables. Source: www.searchdatabase.com

DATA MINING TECHNIQUES. The popularity of data mining is growing at an astounding rate, and new and innovative techniques for mining the warehouse are emerging just as quickly. Data mining techniques are implemented as sophisticated statistical and modeling software.

FOUR MAJOR CATEGORIES. What techniques are used to mine the data? Classification; Association; Sequence; Cluster. Data mining methods may be classified by the function they perform or by their class of application.

CLASSIFICATION. Mining processes intended to discover rules that define whether an item belongs to a particular class of data. Two sub-processes: 1) Building a Model 2) Predicting Classifications. Suppose we want to look for previously unidentified buying patterns among customers. A classification model can be constructed that maps customer attributes such as age, gender, and income to product purchases such as automobiles, clothing, and books. Then, given a set of predicting attributes, the model can be run against a list of customers to determine those most likely to make a particular purchase.
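The two sub-processes can be sketched with a deliberately simple one-rule classifier: it picks the single attribute whose values best predict the class, then looks new customers up against that rule. This is a swapped-in teaching technique, not the method used by any particular product, and the training records and the names build_model and predict are invented for the example.

```python
from collections import Counter, defaultdict

# Invented training data: customer attributes and the product category bought.
TRAINING = [
    ({"age": "under 30", "income": "low"}, "books"),
    ({"age": "under 30", "income": "high"}, "clothing"),
    ({"age": "30 and over", "income": "high"}, "automobile"),
    ({"age": "30 and over", "income": "high"}, "automobile"),
    ({"age": "30 and over", "income": "low"}, "books"),
    ({"age": "under 30", "income": "low"}, "books"),
]


def build_model(examples):
    """Sub-process 1: keep the attribute whose value-to-class rule makes the fewest errors."""
    best = None
    for attribute in examples[0][0]:
        by_value = defaultdict(Counter)
        for attributes, label in examples:
            by_value[attributes[attribute]][label] += 1
        # For each attribute value, predict the class seen most often with it.
        rule = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
        errors = sum(rule[attributes[attribute]] != label for attributes, label in examples)
        if best is None or errors < best[0]:
            best = (errors, attribute, rule)
    # Fallback class for attribute values never seen in training.
    default = Counter(label for _, label in examples).most_common(1)[0][0]
    return {"attribute": best[1], "rule": best[2], "default": default}


def predict(model, customer):
    """Sub-process 2: classify a customer using the stored rule."""
    return model["rule"].get(customer[model["attribute"]], model["default"])


model = build_model(TRAINING)
print(model["attribute"])
print(predict(model, {"age": "under 30", "income": "high"}))
```

Decision trees, mentioned later in the deck, extend the same idea by splitting on several attributes in turn rather than just one.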

ASSOCIATION. Techniques that employ association search all details from operational systems for patterns with a high probability of repetition. Example: Market Basket Analysis. Using a linkage approach, a retailer can mine data generated by a point-of-sale system, such as the price scanner at the grocery store. The retailer can find less obvious associations, for example that sixty-eight percent of the time that a customer buys beverages, he or she also buys pretzels. This type of information can be used to determine the location and content of promotional or end-of-aisle displays.

SEQUENCE. Time series analysis methods relate events in time based on a series of preceding events. Through analysis, various hidden trends, often highly predictive of future events, can be discovered. Example: the direct mail industry. Using a customer's information, a catalog containing specific product types can be target-mailed to a customer associated with a known sequence of purchases.
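A minimal sketch of the direct mail idea: count which product tends to follow which in past purchase histories, then use the most frequent follower to choose what to mail next. This first-order count is a simplification of time series analysis, and the purchase histories and function names are invented for the example.

```python
from collections import Counter, defaultdict

# Invented purchase histories, each listed in time order.
HISTORIES = [
    ["tent", "sleeping bag", "camp stove"],
    ["tent", "sleeping bag", "lantern"],
    ["tent", "camp stove"],
    ["boots", "tent", "sleeping bag"],
]


def next_purchase_counts(histories):
    """Count how often each item is bought immediately after another."""
    follows = defaultdict(Counter)
    for history in histories:
        for current, following in zip(history, history[1:]):
            follows[current][following] += 1
    return follows


def most_likely_next(follows, item):
    """Return the item most often bought right after `item`, or None if unseen."""
    if item not in follows:
        return None
    return follows[item].most_common(1)[0][0]


follows = next_purchase_counts(HISTORIES)
# A customer who just bought a tent would be mailed the catalog for this item.
print(most_likely_next(follows, "tent"))
```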

CLUSTER. Clustering creates partitions so that all members of each set are similar according to some metric. A cluster is simply a set of objects grouped together by virtue of their similarity or proximity to each other. Example: credit card transactions. This approach might be used to mine credit card purchase data to discover that meals charged on a business-issued gold card are typically purchased on weekdays and have an average value greater than $250, whereas meals purchased using a personal platinum card occur mostly on weekends and have an average value of $175.
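Partitioning by similarity can be sketched with a one-dimensional k-means (Lloyd's algorithm) over meal amounts. The slide does not say which clustering method is meant, so k-means is an assumption here, and the meal amounts are invented so that the two groups resemble the card example above.

```python
def k_means_1d(values, k=2, iterations=20):
    """Partition numbers into k groups around their means (Lloyd's algorithm)."""
    if k < 2 or k > len(values):
        raise ValueError("k must be between 2 and the number of values")
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted values.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for value in values:
            # Assign each value to the nearest center (the similarity metric).
            nearest = min(range(k), key=lambda i: abs(value - centers[i]))
            clusters[nearest].append(value)
        # Move each center to the mean of its members; keep it if the group is empty.
        new_centers = [
            sum(members) / len(members) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Invented meal charges in dollars.
MEALS = [260.0, 275.0, 290.0, 255.0, 170.0, 180.0, 165.0, 185.0]
centers, clusters = k_means_1d(MEALS)
print(centers)
```

A real analysis would cluster on several attributes at once (card type, day of week, amount); a single dimension keeps the sketch short.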

DATA MINING TECHNOLOGIES. Providing new answers to old questions; developing new knowledge and understanding through discovery. Numerous techniques are available to assist in mining the data. Statistical Analysis – statistically evaluating products and making a decision based on logical reasoning. Neural Networks – attempt to mirror the way the human brain works in recognizing patterns by developing mathematical structures with the ability to learn.

DATA MINING TECHNOLOGIES CONT'D. Genetic Algorithms and Fuzzy Logic – machine learning techniques that derive meaning from complicated and imprecise data and can extract patterns and detect trends that are far too complex to be noticed by humans. Decision Trees – assist in data mining applications by classifying items or events contained within the warehouse.

NEW APPLICATIONS FOR DATA MINING. Two new categories of applications: 1) Text Mining – summarizes, navigates, and clusters documents contained in a database. 2) Web Mining – integrates data and text mining within a Web site; enhances the Web site with intelligent behavior, such as suggesting related links or recommending new products to the consumer.

Market Basket Analysis

Market Basket Analysis. Market Basket Analysis is an algorithm that examines a long list of transactions in order to determine which items are most frequently purchased together. It takes its name from the idea of a person in a supermarket throwing all of their items into a shopping cart (a "market basket").
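The core of this description, finding which items are most often bought together, can be sketched by counting item pairs across transactions. The baskets below are invented (the item names are borrowed from a later slide), and production tools use more scalable algorithms than this brute-force count.

```python
from collections import Counter
from itertools import combinations

# Invented transactions; each set is one customer's basket.
TRANSACTIONS = [
    {"frozen pizza", "cola", "milk"},
    {"cola", "potato chips"},
    {"frozen pizza", "cola"},
    {"milk", "pretzels"},
    {"frozen pizza", "cola", "potato chips"},
]


def count_pairs(transactions):
    """Count how many baskets contain each unordered pair of items."""
    pairs = Counter()
    for basket in transactions:
        # Sorting gives every pair one canonical order, e.g. ("cola", "milk").
        for pair in combinations(sorted(basket), 2):
            pairs[pair] += 1
    return pairs


pair_counts = count_pairs(TRANSACTIONS)
# The pair bought together most often, with its count.
print(pair_counts.most_common(1))
```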

Market basket analysis is one of the most common and useful types of data analysis for marketing. With the data gathered from MBA, marketers can identify products that customers like and group them together. Market basket analysis can improve the effectiveness of marketing and sales tactics.

Benefits of Market Basket Analysis: gives a good indication of consumer behavior; increases sales; improves customer satisfaction; tracks which types of products interest consumers and finds related alternatives to introduce to them.

ASSOCIATION RULES for MBA. Support; Confidence; Lift; Method. Association rules are a common undirected data mining technique and complement market basket analysis. These rules are unidirectional: the left-hand side IMPLIES the right-hand side. Ex.: Pasta IMPLIES Wine, but Wine IMPLIES Pasta may not hold.

40% of transactions that contain Pasta also contain Wine. 4% of transactions contain both of these items. Support: the percentage of baskets in which the association rule holds, that is, baskets containing both the left-hand side and the right-hand side. Ex.: 4% of transactions contain both. Confidence: the probability that the right-hand-side item is present given that the left-hand-side item is present. Ex.: 40% of transactions that contain Pasta also contain Wine, so p = .40. Lift: compares the confidence of the rule with the likelihood of finding the right-hand-side item in any random basket. It measures how well an association rule performs by comparing it with how well the item sells without the other item (the improvement).
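The three measures can be computed directly from basket counts. The sketch below reproduces the slide's numbers (support 4%, confidence 40%) with 100 invented baskets; the share of baskets containing wine (20%) is not given in the slide and is assumed here so that lift can be shown.

```python
def rule_metrics(transactions, lhs, rhs):
    """Return (support, confidence, lift) for the rule lhs IMPLIES rhs."""
    total = len(transactions)
    with_lhs = sum(1 for basket in transactions if lhs in basket)
    with_rhs = sum(1 for basket in transactions if rhs in basket)
    with_both = sum(1 for basket in transactions if lhs in basket and rhs in basket)
    if with_lhs == 0 or with_rhs == 0:
        raise ValueError("both items must appear in at least one basket")
    support = with_both / total            # share of baskets containing both items
    confidence = with_both / with_lhs      # P(rhs present | lhs present)
    lift = confidence / (with_rhs / total) # confidence relative to a random basket
    return support, confidence, lift


# 100 invented baskets: 4 with both items, 6 pasta only,
# 16 wine only (assumed), 74 with neither.
BASKETS = [{"pasta", "wine"}] * 4 + [{"pasta"}] * 6 + [{"wine"}] * 16 + [{"bread"}] * 74

print(rule_metrics(BASKETS, "pasta", "wine"))
# The reverse rule has the same support but a different confidence,
# which is why association rules are unidirectional.
print(rule_metrics(BASKETS, "wine", "pasta"))
```

A lift above 1 means the left-hand-side item makes the right-hand-side item more likely than it is in a random basket; a lift near 1 means the rule adds little.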

Method. [Table: co-occurrence of Frozen Pizza, Milk, Cola, Potato Chips, and Pretzels; cell values not recoverable from the transcript.]

Market Basket Analysis: determines which products customers purchase together.

Limits to Market Basket Analysis. A large amount of data is required to obtain meaningful results, but accuracy is compromised if the products do not all occur with similar frequency. Ex.: milk sells in almost every transaction, but Elmer's glue sells sporadically, so it is not effective to put them in the same basket analysis. The analysis sometimes presents results that are actually due to the success of previous marketing campaigns. Ex.: a discounted price on cola with the purchase of pizza.

Using Data from MBA. Once information has been gathered about different items and how they sell with respect to other items, a store may want to change its layout to improve profits. Ex.: lunchboxes and school supplies. Businesses without an actual storefront may want to offer promotions for products that sell together, increasing sales.

MARKET BASKET ANALYSIS: In a Nutshell

Current Limitations and Challenges to Data Mining

Current Limitations & Challenges to Data Mining. Data mining is a new and underdeveloped field. Identification of missing information: most companies run legacy systems, which are not DW (data warehouse) friendly, so DW designers have to convert existing ODSs (operational data stores) to the homogeneous form of the DW.

Current Limitations & Challenges to Data Mining. Not all knowledge about an application domain is present in the data. The data in ODSs are normally limited to those needed by the operational application associated with that DB. Data warehouse designers need to include mechanisms for "inventorying" data.

Data noise & missing values. Most operational databases contain data errors in their values and/or classification. Errors lead to misclassification. Future data mining systems must incorporate more sophisticated mechanisms for treating "noisy data". Bayesian technique – a statistical technique.

Large Databases & high dimensionality. Databases are large and dynamic: their contents are always changing, so data patterns must be constantly updated. New discovery applications have to partition problems into smaller, manageable chunks of data without losing any essential attributes of the data.

Data Visualization. The process by which numerical data are converted into meaningful 3-D images. It is intended for analyzing complex data, such as data from satellite photos, sonar measurements, surveys, or computer simulations.

History of Data Visualization. Originated from statistics and science. [Figure: example of a 2-D visualization.] Advancement is credited to the National Center for Supercomputing Applications (NCSA). The newest developments are by Xerox PARC, in virtual reality.

Human Visual Perception. The human visual cortex dominates our perception. Visualization accelerates the identification of hidden patterns in data. "A picture is worth a thousand words."

Geographical Information Systems (GIS). A special-purpose DB in which a common spatial coordinate system is the primary means of reference. Requires: data input; data storage, retrieval, and query; data transformation, analysis, and modeling; data reporting. Integrates information and aids in decision making.

GIS continued. Spatial Data – elements stored in map form. They contain three basic components: points, lines, and polygons. Attribute Data – describes the spatial data. [Figure: example of a GIS.]
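The split between spatial data and attribute data can be sketched as a small record type: coordinates in a common reference system, plus a dictionary of descriptive attributes. The class name Feature, the sample parcel, and the area helper (the standard shoelace formula) are illustrative assumptions, not the data model of any specific GIS product.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    kind: str                 # "point", "line", or "polygon"
    coordinates: list         # list of (x, y) tuples in one shared coordinate system
    attributes: dict = field(default_factory=dict)  # attribute data describing the shape


def polygon_area(feature):
    """Area of a simple polygon from its vertices (shoelace formula)."""
    if feature.kind != "polygon":
        raise ValueError("area is only defined for polygons")
    points = feature.coordinates
    # Pair each vertex with the next one, wrapping around to the first.
    twice_area = sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1])
    )
    return abs(twice_area) / 2


# Invented examples of the three spatial components.
well = Feature("point", [(2.0, 1.0)], {"name": "Well A"})
road = Feature("line", [(0.0, 0.0), (4.0, 3.0)], {"surface": "gravel"})
parcel = Feature("polygon", [(0, 0), (4, 0), (4, 3), (0, 3)], {"owner": "unknown", "zoning": "retail"})

print(polygon_area(parcel), parcel.attributes["zoning"])
```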

Applications of Data Visualization Techniques: Retail Banking; Government; Insurance; Health Care and Medicine; Telecommunications; Transportation; Capital Markets; Asset Management.

Siftware Technologies

Siftware Technologies: IBM; Informix; Red Brick; DB2; Oracle; Silicon Graphics; Sybase.

Offers several data mining solutions, depending on the user's needs: IBM Information Warehouse Solutions; IBM Visualizer; Red Brick.

Informix. Three-tier model. Tier 1: "Client" presentation layer. Tier 2: Hewlett-Packard hardware. Tier 3: Data layer (INFORMIX-OnLine database).

Sybase Warehouse WORKS. Assemble data from many sources. Transform data for a consistent and understandable view. Distribute data where needed. Provide high-speed access to the data.

Leading company for large-scale data mining. Data is spread across multiple databases. Data is spread across processors for faster queries.

Discover new patterns and trends that may not be found using traditional SQL. Three-dimensional visualization. Visual models can save days and even months in the review process.

Review. Data mining (DM); techniques used to mine data; Market Basket Analysis: the king of DM algorithms.

Review continued. Current Limitations and Challenges to Data Mining; Data Visualization; Siftware Technologies.