
Data Mining By Fu-Chun (Tracy) Juang

What is Data Mining?
► The process of analyzing LARGE databases to find useful patterns.
► Attempts to discover rules and patterns from data.
► Similar to knowledge discovery (in artificial intelligence) and to statistical analysis.
► Often referred to as knowledge discovery in databases (KDD).

Types of Knowledge Discovered
► Classification
► Association Rules
► Clustering
► Others:
  -- Sequential patterns
  -- Patterns within time series

Classification
► Deals with prediction.
► Works from an existing set of events to create a hierarchy of classes.
  -- Uses this classification hierarchy to predict which "class" a new item belongs to.

Classification (cont.)
► Example:
  -- A credit-card company classifies its population into 4 ranges of creditworthiness (bad, average, good and excellent) based on the payment history of existing customers.
  -- The company finds rules relating creditworthiness to other information about the customers, such as their educational history, age and salary.
  -- It uses these classification rules to determine (predict) the creditworthiness of a new applicant.

Classification: Rules
► Some of the rules look like:
  ∀ person P, P.degree = masters and P.income > 75,000 => P.credit = excellent
  ∀ person P, P.degree = bachelors or (P.income ≥ 25,000 and P.income ≤ 75,000) => P.credit = good
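The two example rules above can be written as an ordinary function. This is a minimal sketch, assuming a hypothetical `Person` record with `degree` (a lowercase string such as "masters") and `income` fields, and assuming the rules are checked in the order listed, first match wins. The slide only shows "some" of the rules, so anyone matching neither rule is left unclassified (`None`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    # Hypothetical record; field names mirror P.degree and P.income on the slide.
    degree: str
    income: float


def classify_credit(p: Person) -> Optional[str]:
    """Apply the example rules in order (assumption: first match wins)."""
    # Rule 1: P.degree = masters and P.income > 75,000 => P.credit = excellent
    if p.degree == "masters" and p.income > 75_000:
        return "excellent"
    # Rule 2: P.degree = bachelors or (25,000 <= P.income <= 75,000) => P.credit = good
    if p.degree == "bachelors" or 25_000 <= p.income <= 75_000:
        return "good"
    # Only some rules are given, so other people remain unclassified here.
    return None


print(classify_credit(Person("masters", 90_000)))    # excellent
print(classify_credit(Person("bachelors", 20_000)))  # good
print(classify_credit(Person("phd", 10_000)))        # None
```

Note that the rules can overlap: a masters holder earning 50,000 fails rule 1 but matches rule 2 through the income range, so the order in which rules are tried matters.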

Classification: Decision-Tree
► A popular technique for classification.
► Each leaf node of the tree represents a class (e.g. good credit, bad credit).
► Each internal node has a function associated with it that determines which child to go to for a new item (e.g. married, salary range).
► To place a new item in a class, we traverse the decision tree from the root until we reach a leaf node.

[Figure: Decision-Tree]
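The traversal described for decision trees can be sketched as follows: internal nodes hold a test function that picks a child, leaves hold a class label, and classification walks down until it reaches a leaf. The `Leaf`/`Node` types, the `salary_at_least` helper, and the thresholds in the example tree are all hypothetical, chosen only to echo the slide's "married" and "salary range" tests; the slide does not give an actual tree.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass
class Leaf:
    label: str  # the class, e.g. "good credit"


@dataclass
class Node:
    test: Callable[[dict], str]  # maps an item to the key of the child to visit
    children: Dict[str, "Tree"]


Tree = Union[Leaf, Node]


def classify(tree: Tree, item: dict) -> str:
    # Walk from the root until a leaf is reached; the leaf's label is the class.
    while isinstance(tree, Node):
        tree = tree.children[tree.test(item)]
    return tree.label


def salary_at_least(threshold: float) -> Callable[[dict], str]:
    # Hypothetical salary-range test used by the internal nodes below.
    return lambda item: "high" if item["salary"] >= threshold else "low"


# Hypothetical tree: split on marital status, then on a salary range.
tree = Node(
    test=lambda item: "married" if item["married"] else "single",
    children={
        "married": Node(salary_at_least(50_000),
                        {"high": Leaf("good credit"), "low": Leaf("bad credit")}),
        "single": Node(salary_at_least(75_000),
                       {"high": Leaf("good credit"), "low": Leaf("bad credit")}),
    },
)

print(classify(tree, {"married": True, "salary": 80_000}))   # good credit
print(classify(tree, {"married": False, "salary": 60_000}))  # bad credit
```

Building such a tree from training data (choosing which attribute to split on at each node) is a separate step that this sketch does not cover.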

Classification: Regression
► A special application of classification rules.
► Regression deals with the prediction of a value, rather than a class.
► e.g. Given a series of test results for a patient, use a regression rule to predict the patient's probability of survival.
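To make "predicting a value rather than a class" concrete, here is a minimal sketch using simple linear least squares (y = a + b·x). The slides do not say which regression technique is meant; linear least squares is just one common choice, and the data points below are made up for illustration.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


def predict(a: float, b: float, x: float) -> float:
    # The output is a number, not a class label.
    return a + b * x


# Hypothetical data: y rises by 2 for each unit of x, starting from 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
a, b = fit_line(xs, ys)
print(a, b)               # 1.0 2.0
print(predict(a, b, 10))  # 21.0
```

For a probability such as the survival example, a model whose output is bounded between 0 and 1 (for instance logistic regression) would usually be preferred; the linear fit here only illustrates the idea of predicting a numeric value.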

Association Rules
► Retail shops are often interested in associations between different items that people buy.
► X => Y: if a customer buys X, he or she is likely to buy Y.
► e.g. If a female shopper buys a handbag, she is likely to buy shoes.
  -- Association rule: Handbag => Shoes
► e.g. A person who bought the book Database System Concepts is likely to buy Operating System Concepts.
  -- Association rule: DBS Concepts => OS Concepts

Association Rules: Support & Confidence
► Each association rule has an associated degree of support and of confidence.
► Data miners use the support and confidence of an association rule to determine whether that rule is significant.

Association Rule: Support
► Support is a measure of what fraction of the population satisfies both the LHS and the RHS of the rule.
► That is, how frequently the itemset (LHS + RHS) occurs in the database.
► If only 0.001% of all purchases in a store include both milk and screwdrivers, the support of the rule milk => screwdriver is low.
► If 50% of purchases include both milk and juice, the support of the rule milk => juice is high.
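Support can be computed directly from its definition: count the transactions that contain every item of LHS ∪ RHS and divide by the total number of transactions. This sketch assumes each purchase is represented as a Python set of item names; the four baskets are made up to mirror the milk/juice and milk/screwdriver examples.

```python
from typing import Iterable, List, Set


def support(transactions: List[Set[str]], lhs: Iterable[str], rhs: Iterable[str]) -> float:
    """Fraction of transactions that contain every item of both LHS and RHS."""
    if not transactions:
        return 0.0
    itemset = set(lhs) | set(rhs)  # the itemset (LHS + RHS)
    hits = sum(1 for t in transactions if itemset <= t)  # subset test
    return hits / len(transactions)


# Hypothetical purchase data.
baskets = [
    {"milk", "juice"},
    {"milk", "juice", "bread"},
    {"milk", "bread"},
    {"screwdriver", "bread"},
]
print(support(baskets, {"milk"}, {"juice"}))        # 0.5
print(support(baskets, {"milk"}, {"screwdriver"}))  # 0.0
```

Because support depends only on the combined itemset, the rules milk => juice and juice => milk always have the same support.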

Association Rule: Confidence
► Confidence is a measure of how often the RHS (consequent) is true when the LHS (antecedent) is true.
► e.g. the rule bread => milk has a confidence of 80% if 80% of the purchases that include bread also include milk.
► A rule with low confidence is not meaningful.
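Confidence is the number of transactions containing LHS ∪ RHS divided by the number containing the LHS. Since data miners judge a rule by both measures, the sketch also keeps only the single-item rules X => Y that meet a minimum support and a minimum confidence. The set-of-items representation, the `strong_rules` helper, the threshold values and the baskets are all assumptions for illustration; a real miner (e.g. Apriori) would also consider larger itemsets and prune candidates more cleverly.

```python
from itertools import permutations
from typing import List, Set, Tuple


def confidence(transactions: List[Set[str]], lhs: Set[str], rhs: Set[str]) -> float:
    """Fraction of transactions containing LHS that also contain RHS."""
    lhs_count = sum(1 for t in transactions if lhs <= t)
    if lhs_count == 0:
        return 0.0
    both = sum(1 for t in transactions if (lhs | rhs) <= t)
    return both / lhs_count


def strong_rules(transactions: List[Set[str]], min_support: float,
                 min_confidence: float) -> List[Tuple[str, str, float, float]]:
    """Hypothetical helper: single-item rules X => Y meeting both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for x, y in permutations(items, 2):  # every ordered pair X => Y
        sup = sum(1 for t in transactions if {x, y} <= t) / len(transactions)
        conf = confidence(transactions, {x}, {y})
        if sup >= min_support and conf >= min_confidence:
            rules.append((x, y, sup, conf))
    return rules


# Hypothetical data: 4 of the 5 purchases that include bread also include milk.
baskets = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"bread", "eggs"},
]
print(confidence(baskets, {"bread"}, {"milk"}))  # 0.8
print(strong_rules(baskets, 0.5, 0.8))
# [('bread', 'milk', 0.8, 0.8), ('milk', 'bread', 0.8, 1.0)]
```

Unlike support, confidence is not symmetric: here bread => milk has confidence 0.8, while milk => bread has confidence 1.0 because every purchase with milk also includes bread.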

Clustering
► Clustering groups similar points together into a single set (a cluster).
► In business: groups of customers who have similar buying patterns.
► In medicine: groups of patients who show similar reactions to prescribed drugs.
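The slides do not name a clustering algorithm; k-means is one common choice and is used here purely as an example. The sketch alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points, stopping when assignments no longer change. Initialising the centroids with the first k points is a simplifying assumption (real implementations usually pick random or k-means++ starts), and the customer data is made up.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iterations: int = 100) -> List[int]:
    """Return a cluster index for each point (k-means, first-k initialisation)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable, so stop
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels


# Hypothetical customers: (store visits per month, average spend in hundreds).
customers = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
print(kmeans(customers, k=2))  # [0, 0, 0, 1, 1, 1]
```

The result depends on the starting centroids and on the distance measure (Euclidean here, via `math.dist`), so in practice the features are usually scaled first and several starts are tried.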
