
Outline Knowledge discovery in databases. Data warehousing. Data mining. Different types of data mining. The Apriori algorithm for generating association rules.

Knowledge Discovery in Databases What is it? Why do we need it? How does data mining fit in?

Steps of a KDD Process Learn the application domain. Create a target dataset. Data cleaning and preprocessing. Choose the type of data mining to perform. Pick an algorithm. Data mining! Interpretation.

Data Warehousing How does it differ from a database? – Databases provide support for: queries over current data; persistent storage; atomic updates. – Data warehouses provide support for: storage of all data, current or not; details or summaries; metadata; integrated data (to reduce the time spent cleaning data).

Types of Data Mining Classification Rules Clustering Sequence similarity Sampling/summarizing Association Rules

Classification Rules Rules that partition data into separate groups. Used to classify people as good/bad credit risks, etc. A variation is the BestN problem: decide the best N of a set for a given problem (for example, find the best N people to mail ski vacation packages to).

Clustering The goal is to put data tuples into a class. Map each data tuple to a point in n-dimensional space, and identify clusters based on spatial proximity. Differs from classification rules because here the idea is to group tuples based on overall similarity, not to find the tuples that lead to a similar outcome.

Sequence Similarity Used where there are time series or ordered data. The goal is generally to give an example and look for “similar” patterns. Example: “When AT&T stock goes up on 2 consecutive days and DEC stock does not fall during this period, IBM stock goes up the next day 75% of the time.”

Sampling/Summarization Sampling: finding samples of the data to carry out more detailed analysis on. Goal is to get the best sample. Summarization: finding summaries of all of the data. Goal is to help people figure out what to do with their data, or to prepare reports.

Association Rules Rules that express when two or more items are found in the same “basket” of data. Used to find when the presence of certain items in a basket goes along with the presence of others: for example, people who buy diapers tend to buy beer.

Support and Confidence An association rule “a implies b” is measured in terms of – support: a and b occur together in at least s% of the n baskets – confidence: of all of the baskets containing a, at least c% also contain b.
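The two measures above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function names `support` and `confidence` and the four baskets are assumptions made up for the example, and values are returned as fractions rather than percentages.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    wanted = set(items)
    return sum(wanted <= basket for basket in baskets) / len(baskets)


def confidence(baskets, a, b):
    """Of the baskets containing `a`, the fraction that also contain `b`.

    Assumes at least one basket contains `a`; otherwise this divides by zero.
    """
    return support(baskets, set(a) | set(b)) / support(baskets, a)


# Hypothetical baskets, invented for illustration only.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"milk", "bread"},
]

print(support(baskets, {"diapers", "beer"}))       # 2 of 4 baskets
print(confidence(baskets, {"diapers"}, {"beer"}))  # 2 of the 3 diaper baskets
```

A rule is then kept only if both values meet the chosen thresholds s and c.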

Association Rules Algorithms These focus on measuring support for “itemsets” (the number of transactions that contain the itemset). Confidence is an easier problem and is figured out later. The naïve method: Start with all the itemsets of size 1. Find the “large” itemsets. Combine to find itemsets of size 2, etc. Apriori algorithm: Ensures not only that each step combines only large itemsets, but also that every subset of a candidate is itself a large itemset.
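The subset test is safe because adding items to an itemset can never increase its support: any basket that contains the whole itemset also contains each of its subsets. The sketch below checks this property on a small set of baskets. The helper names and the baskets are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def count_support(baskets, itemset):
    """Number of baskets that contain every item in `itemset`."""
    wanted = frozenset(itemset)
    return sum(wanted <= basket for basket in baskets)


def subsets_at_least_as_frequent(baskets, itemset):
    """Check that no proper, non-empty subset has lower support than `itemset`."""
    items = sorted(itemset)
    whole = count_support(baskets, items)
    return all(
        count_support(baskets, subset) >= whole
        for size in range(1, len(items))
        for subset in combinations(items, size)
    )


# Hypothetical baskets, invented for illustration only.
baskets = [
    frozenset(b)
    for b in (["A", "C", "D"], ["B", "C", "E"], ["A", "B", "C", "E"], ["B", "E"])
]

print(subsets_at_least_as_frequent(baskets, {"B", "C", "E"}))
```

It follows that if any subset of a candidate is not large, the candidate cannot be large either, so it can be dropped without counting it.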

Apriori Algorithm In general, the set of large itemsets of size k is referred to as L_k, and C_k is the candidate set of size k. For each pair of itemsets in L_(k-1), if all of the items in the two sets are the same except the last item, then the two itemsets are combined and the result is put into the set C_k, which is the list of candidates for large itemsets of size k.
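The combining step can be sketched as follows. This is one possible reading of the slide, under the assumption that each itemset is stored as a sorted tuple so that "the last item" is well defined; the function name `join` and the sample L_2 are hypothetical.

```python
def join(large_prev):
    """Build the candidate set C_k from L_(k-1).

    Two itemsets are merged when they agree on every item except the last.
    Itemsets are handled as sorted tuples; items must be mutually comparable.
    """
    ordered = sorted({tuple(sorted(itemset)) for itemset in large_prev})
    candidates = set()
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if first[:-1] == second[:-1]:
                # `second` sorts after `first`, so appending its last item
                # keeps the merged tuple in sorted order.
                candidates.add(first + (second[-1],))
    return candidates


# Hypothetical L_2, invented for illustration only.
large_2 = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]
print(sorted(join(large_2)))
```

Here ("A", "B") and ("A", "C") share the prefix "A" and merge into ("A", "B", "C"), while ("B", "C") and ("B", "D") merge into ("B", "C", "D"). Keeping the items sorted means each candidate is produced by exactly one pair.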

Check, for each itemset in C_k, that all of its subsets of size k-1 are in L_(k-1). This lets you discard some itemsets without having to count them in the dataset. Count the support of the remaining itemsets in C_k and remove those without enough support. This is now the set L_k. Increment k and repeat steps 2 through 4 until at most one itemset remains in L_k. At that point no further candidates can be formed, so you have found all of the large itemsets.
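Putting the steps together, a compact sketch of the whole loop might look like the code below. It is an illustration under stated assumptions rather than the presenter's implementation: itemsets are sorted tuples, support is an absolute count (`min_count`), the function name `apriori` is chosen for the example, and the baskets are made up and are not the data from the slides' example.

```python
from itertools import combinations


def apriori(baskets, min_count):
    """Return {itemset (sorted tuple): support count} for all large itemsets."""
    baskets = [frozenset(b) for b in baskets]

    # L_1: count single items and keep those with enough support.
    counts = {}
    for basket in baskets:
        for item in basket:
            counts[(item,)] = counts.get((item,), 0) + 1
    large = {s: n for s, n in counts.items() if n >= min_count}
    result = dict(large)

    # With fewer than two itemsets in L_(k-1), no candidate can be formed.
    while len(large) > 1:
        prev = sorted(large)
        k = len(prev[0]) + 1
        # Join: merge pairs that agree on everything except the last item.
        candidates = [
            a + (b[-1],)
            for i, a in enumerate(prev)
            for b in prev[i + 1:]
            if a[:-1] == b[:-1]
        ]
        # Prune: every subset of size k-1 must itself be large.
        candidates = [
            c for c in candidates
            if all(s in large for s in combinations(c, k - 1))
        ]
        # Count: scan the baskets only for the surviving candidates.
        large = {}
        for c in candidates:
            n = sum(frozenset(c) <= basket for basket in baskets)
            if n >= min_count:
                large[c] = n
        result.update(large)
    return result


# Hypothetical baskets, invented for illustration only.
baskets = [["A", "C", "D"], ["B", "C", "E"], ["A", "B", "C", "E"], ["B", "E"]]
for itemset, n in sorted(apriori(baskets, 2).items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(itemset, n)
```

With a minimum support count of 2 on these baskets, the only large itemset of size 3 is ("B", "C", "E"), and the loop stops there because a single itemset cannot be joined with anything.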

Example Find all itemsets with a minimum support of 2

Itemsets of size 1

C_2

Finding C_3