
Data Mining By: Thai Hoa Nguyen Pham

Data Mining
- Define Data Mining
- Classification
- Association
- Clustering

Define Data Mining
- Also known as KDD (Knowledge Discovery in Databases).
- Data mining is the semiautomatic process of analyzing data to find useful patterns.
- Why semiautomatic? Because the data must be preprocessed manually before mining, and the results must be postprocessed manually afterward.

Examples of Data Mining
- A simple example is a clothing retail store: a data mining system could be used to list the customers who often buy t-shirts during the summer season.
- Another example is the often-repeated story, usually described as an urban legend, that Walmart used data mining to find a correlation between customers buying beer and customers buying baby diapers, and then placed the two aisles close together to increase profits.
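The clothing-store example can be sketched as a simple counting query over purchase records. This is only an illustration under stated assumptions, not a full mining algorithm: the purchase records and customer names are hypothetical, "summer" is assumed to mean June through August, and "often" is assumed to mean at least two summer purchases (`min_count=2`).

```python
from collections import Counter
from datetime import date

# Hypothetical purchase records: (customer, item, purchase date).
purchases = [
    ("alice", "t-shirt", date(2006, 6, 12)),
    ("alice", "t-shirt", date(2006, 7, 3)),
    ("bob", "t-shirt", date(2006, 12, 1)),
    ("bob", "t-shirt", date(2006, 7, 20)),
    ("carol", "t-shirt", date(2006, 8, 5)),
    ("carol", "t-shirt", date(2006, 8, 19)),
    ("carol", "jeans", date(2006, 6, 1)),
]

# Assumption: "summer" means June through August.
SUMMER_MONTHS = {6, 7, 8}

def frequent_summer_buyers(records, item, min_count=2):
    """Return customers who bought `item` at least `min_count` times in summer."""
    counts = Counter(
        customer
        for customer, bought, when in records
        if bought == item and when.month in SUMMER_MONTHS
    )
    return sorted(name for name, n in counts.items() if n >= min_count)

print(frequent_summer_buyers(purchases, "t-shirt"))  # ['alice', 'carol']
```

Bob is excluded because only one of his t-shirt purchases falls in the summer months.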

Classification
- Suppose the items in a database are already divided into classes. A problem arises when a new item is added to the database: its class is unknown, so some method is needed to find the right class for it.
- Classification rules solve this problem by predicting the class of a new item from its attribute values.

Example of a rule
- For every person P: P.degree = masters and P.income > 75,000 => P.credit = excellent
- For every person P: P.degree = bachelors and P.income < … => P.credit = bad
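Rules like these can be applied in order until one fires. The sketch below is a minimal rule-based classifier under an explicit assumption: the slide does not preserve the exact income condition of the second rule, so `HYPOTHETICAL_BACHELORS_LIMIT` (25,000, compared with "less than") is an illustrative value only. Person records are hypothetical dictionaries with `degree` and `income` keys.

```python
# Hypothetical cut-off: the slide does not preserve the income condition of the
# second rule, so this "less than" threshold is an illustrative assumption.
HYPOTHETICAL_BACHELORS_LIMIT = 25_000

# Each rule pairs a condition on a person record with the credit class it implies.
RULES = [
    (lambda p: p["degree"] == "masters" and p["income"] > 75_000, "excellent"),
    (lambda p: p["degree"] == "bachelors" and p["income"] < HYPOTHETICAL_BACHELORS_LIMIT, "bad"),
]

def classify(person):
    """Return the class from the first rule whose condition holds, else None."""
    for condition, credit in RULES:
        if condition(person):
            return credit
    return None  # no rule applies, so the class stays unknown

print(classify({"degree": "masters", "income": 80_000}))    # excellent
print(classify({"degree": "bachelors", "income": 20_000}))  # bad
```

A real rule set would cover every combination of attributes; with only these two rules, many people (for example, a master's graduate earning 50,000) remain unclassified.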

Decision Tree Classifiers
- A widely used technique for classification.
- Each internal node is labeled with a function or predicate (a test on the item's attributes).
- Each leaf node is associated with a class.

Example of Decision Tree Classifiers
[Figure: decision tree with the root, the function nodes, and the class nodes labeled]

Example of Decision Tree Classifiers
- The internal nodes (functions) are shown in boxes: degree (the root) and income.
- The leaf nodes (associated classes) are shown as circles for the four classes: bad, average, good, and excellent.
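Classifying with such a tree means starting at the root, applying each internal node's function to pick a branch, and stopping at a leaf. The sketch below follows the slide's shape (degree at the root, income below it, the classes bad, average, good, and excellent at the leaves), but the income cut-offs and the set of degree branches are hypothetical, since the figure's exact values are not recoverable here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union

@dataclass
class Leaf:
    label: str  # class associated with this leaf

@dataclass
class Internal:
    test: Callable[[dict], str]  # function applied to a record; returns a branch key
    children: Dict[str, Union["Internal", Leaf]]

def by_degree(person):
    return person["degree"]

def income_band(low, high):
    """Build a test mapping income to 'low', 'mid', or 'high' (hypothetical cut-offs)."""
    def test(person):
        if person["income"] < low:
            return "low"
        if person["income"] <= high:
            return "mid"
        return "high"
    return test

# Hypothetical tree: degree at the root, income below it, classes at the leaves.
TREE = Internal(by_degree, {
    "bachelors": Internal(income_band(50_000, 100_000), {
        "low": Leaf("bad"), "mid": Leaf("average"), "high": Leaf("good"),
    }),
    "masters": Internal(income_band(25_000, 75_000), {
        "low": Leaf("bad"), "mid": Leaf("good"), "high": Leaf("excellent"),
    }),
})

def classify(node, person):
    """Follow the branch chosen by each internal node's function until a leaf is reached."""
    while isinstance(node, Internal):
        node = node.children[node.test(person)]
    return node.label

print(classify(TREE, {"degree": "masters", "income": 80_000}))    # excellent
print(classify(TREE, {"degree": "bachelors", "income": 60_000}))  # average
```

In this sketch a degree with no branch (for example, a doctorate) raises `KeyError`; a complete tree would have a branch for every possible value.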

Association
- An example of an association for beer and diapers would be: Beer => Diapers
- As mentioned earlier, this association means that customers who buy beer often buy diapers, too.

Association Rules
- Support: a measure of what fraction of the population satisfies both the antecedent and the consequent of the rule. For the association milk => screwdrivers, the support is the fraction of all purchases that include both milk and screwdrivers.
- An association with higher support applies to more of the population and is worth more attention than one with lower support.
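Support can be computed by counting the transactions that contain every item on both sides of the rule and dividing by the total number of transactions. This is a minimal sketch; the market-basket data below is hypothetical, and returning 0.0 for an empty transaction list is a convention chosen here.

```python
def support(transactions, antecedent, consequent):
    """Fraction of all transactions that contain every item of the antecedent and consequent."""
    if not transactions:
        return 0.0
    needed = set(antecedent) | set(consequent)
    hits = sum(1 for t in transactions if needed <= set(t))
    return hits / len(transactions)

# Hypothetical market-basket data: each transaction is a set of purchased items.
transactions = [
    {"milk", "bread"},
    {"milk", "screwdrivers"},
    {"bread"},
    {"milk", "bread", "eggs"},
]

print(support(transactions, {"milk"}, {"screwdrivers"}))  # 0.25
print(support(transactions, {"bread"}, {"milk"}))         # 0.5
```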

Association Rule 2
- Confidence: a measure of how often the consequent is true when the antecedent is true. For example, if the association bread => milk has a confidence of 50 percent, then 50 percent of the purchases that include bread also include milk; the remaining purchases that include bread do not include milk.
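Confidence differs from support in its denominator: it counts only the transactions that contain the antecedent. The sketch below uses hypothetical transactions chosen so that bread => milk has the 50 percent confidence from the example; returning 0.0 when no transaction contains the antecedent is a convention assumed here.

```python
def confidence(transactions, antecedent, consequent):
    """Among transactions containing the antecedent, the fraction that also contain the consequent."""
    lhs = set(antecedent)
    both = lhs | set(consequent)
    with_lhs = [set(t) for t in transactions if lhs <= set(t)]
    if not with_lhs:
        return 0.0  # rule never applies; treated as 0 by convention in this sketch
    return sum(1 for t in with_lhs if both <= t) / len(with_lhs)

# Hypothetical transactions: four contain bread, and two of those also contain milk.
transactions = [
    {"bread", "milk"},
    {"bread"},
    {"bread", "milk", "eggs"},
    {"bread", "jam"},
    {"milk"},
]

print(confidence(transactions, {"bread"}, {"milk"}))  # 0.5
```

The last transaction contains milk without bread, so it affects the support of bread => milk but not its confidence.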

Clustering
- Clustering refers to finding clusters of points in given data, that is, grouping the points into subsets so that points in the same subset are close to one another.
- Widely used clustering techniques: hierarchical clustering, in either its agglomerative or its divisive form.

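Types of Clustering
- Hierarchical: builds a hierarchy of nested clusters, in which clusters at one level are grouped into larger clusters at the level above (much like the biological classification of species).
- Agglomerative: starts by building small clusters, then progressively merges them into larger clusters.
- Divisive: begins with the whole set and successively divides it into smaller clusters.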

Example of agglomerative hierarchical clustering
An example of agglomerative clustering: the separate elements of the set (a through f) are merged step by step, one merge per internal node of the tree, until the last merge produces the single cluster “abcdef”.
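The merging process can be sketched in a few lines. This sketch assumes single linkage (the distance between two clusters is the distance between their closest members), which the slide does not specify, and places the elements a to f at hypothetical 1-D coordinates. It repeatedly merges the closest pair of clusters and records each merge, ending with "abcdef".

```python
def agglomerative(points):
    """Single-linkage agglomerative clustering of named 1-D points.

    Returns the sequence of merged cluster names, one per internal node of the tree.
    """
    clusters = {name: [x] for name, x in points.items()}
    merges = []
    while len(clusters) > 1:
        names = sorted(clusters)
        best = None
        # Find the pair of clusters whose closest members are nearest (single linkage).
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                dist = min(abs(x - y) for x in clusters[a] for y in clusters[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        merged = "".join(sorted(a + b))
        clusters[merged] = clusters.pop(a) + clusters.pop(b)
        merges.append(merged)
    return merges

# Hypothetical coordinates for the elements a to f.
points = {"a": 1.0, "b": 1.5, "c": 5.0, "d": 5.4, "e": 9.0, "f": 9.3}
print(agglomerative(points))  # ['ef', 'cd', 'ab', 'abcd', 'abcdef']
```

The pairwise search makes this sketch roughly cubic in the number of points, which is fine for illustration; library implementations use more efficient update schemes.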

Other types of mining
- Text mining: applying data mining techniques to textual documents. For example, there are tools that form clusters of the pages a user has visited. If a user supplies a site and asks for sites containing the keyword “Japan”, the tool lists the sites that use the keyword “Japan” the most.
- Data visualization: helps users examine large volumes of data and detect patterns visually. Instead of reading about problems as text, users can look at maps and charts that use a color-coding scheme to pinpoint where a problem is.

Example of Text Mining
This example shows what happens when a user searches for “Japan”. The points closer to the center of the circle have more information on Japan. We can think of the points as websites or research articles.
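The "sites that use the keyword the most" idea can be sketched as a keyword-frequency ranking. Note that this is a simpler technique than the clustering tool the slide describes: it only counts whole-word, case-insensitive occurrences of the keyword in each page and sorts by that count. The page names and texts are hypothetical.

```python
import re

def rank_by_keyword(pages, keyword, top=3):
    """Rank pages by how often `keyword` appears as a whole word (case-insensitive)."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    counts = {url: len(pattern.findall(text)) for url, text in pages.items()}
    # Keep pages that mention the keyword; most mentions first, ties broken by name.
    ranked = sorted((url for url, n in counts.items() if n > 0),
                    key=lambda url: (-counts[url], url))
    return ranked[:top]

# Hypothetical pages: name -> page text.
pages = {
    "site-a": "Japan travel guide. Japan rail passes. Tokyo and Kyoto in Japan.",
    "site-b": "Cooking noodles at home. Ramen is popular in Japan.",
    "site-c": "Gardening tips for spring.",
    "site-d": "Japanese art and Japan history.",
}

print(rank_by_keyword(pages, "Japan"))  # ['site-a', 'site-b', 'site-d']
```

Because of the word-boundary match, "Japanese" on site-d does not count as a mention of "Japan"; only "Japan history" does.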

Example of Data Visualization
The map in this example could represent many things; for instance, it could depict poverty levels by state or which states grow the most apples.
