Data Mining Brandon Leonardo CS157B (Spring 2006).


What Is Data Mining? A way to discover knowledge: “semiautomatically analyzing large databases to find useful patterns.” Notable characteristics: large amounts of data; data stored on disk.

What Are We Looking For? Rules: use sets of rules to predict or classify objects. Ex. “Students with annual income less than $20,000 a year are most likely to get a student loan.” Patterns: there are different kinds of patterns, and one data set can contain multiple patterns.

What Can Data Mining Do? Applications. Prediction: which class an item will belong in, or what a value will be, based on its attributes. What kind of animal will this be, considering that it has stripes, 4 legs, and talks? Which customers are likely to switch to a competitor?

What Can Data Mining Do? More applications. Association: data that goes together in a class (Amazon: books that are bought together). Causality: whether riding a motorcycle increases your chances of dying in an accident. Descriptive patterns: clusters.

Classification. Take a new item and, given past instances (the training instances), figure out which class the new item belongs in. How? Rules, decision trees, Bayesian classifiers.

Rule Classifiers. Determine which class an item belongs in based on rules. Ex. If a new customer signs up for a credit card and makes less than $30,000 a year, then place them in a high-risk category.
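The credit card rule can be sketched as a small function. This is only an illustration, not the presenter's code: the name `classify_applicant`, the rule-list layout, and the `"standard"` fallback class are assumptions made for the example.

```python
# Hypothetical rule classifier for the credit card example.
# The rule list layout and the "standard" default class are assumptions.

def classify_applicant(annual_income, rules, default="standard"):
    """Return the class of the first rule whose condition matches."""
    for condition, label in rules:
        if condition(annual_income):
            return label
    return default

# The one rule from the example: under $30,000 a year means high risk.
RULES = [
    (lambda income: income < 30_000, "high risk"),
]

print(classify_applicant(25_000, RULES))  # high risk
print(classify_applicant(60_000, RULES))  # standard
```

Keeping the rules in a list means more rules can be added without touching the matching logic; the first matching rule wins.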

Decision Tree Classifiers. Traverse the tree based on attributes, making a decision at each node until a leaf is reached. Ex. Being hired at Google: the tree tests Degree (PhD or Bachelors) and then School (Stanford or Not Stanford), and its leaves are Hired and Not Hired.
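A tree traversal of this kind might look like the sketch below. The tree shape (Degree, then School) follows the example, but which leaf hangs under which branch is not recoverable from the transcript, so the leaf assignments here are assumptions, as are the names `TREE` and `classify`.

```python
# Hypothetical decision tree for the hiring example.
# Internal nodes are (attribute, {value: subtree}); leaves are class labels.
# ASSUMPTION: the leaf under each branch is invented for illustration;
# the transcript does not preserve that mapping.
TREE = ("degree", {
    "PhD": ("school", {"Stanford": "Hired", "Not Stanford": "Not Hired"}),
    "Bachelors": ("school", {"Stanford": "Hired", "Not Stanford": "Not Hired"}),
})

def classify(tree, item):
    """Walk from the root to a leaf, following one branch per attribute."""
    node = tree
    while not isinstance(node, str):
        attribute, branches = node
        node = branches[item[attribute]]
    return node

print(classify(TREE, {"degree": "PhD", "school": "Stanford"}))
```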

Bayesian Classifiers. Predict, for every class, the probability that an item is in that class; the class with the largest probability “wins.” P(cj|d) = P(d|cj) P(cj) / P(d), where P(d|cj) is the probability of generating instance d given class cj, P(cj) is the probability of getting class cj, and P(d) is the probability of d occurring. If an attribute value isn’t present in an instance, it isn’t included in the probability computation.
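As a rough sketch of the formula, the snippet below turns priors P(cj) and likelihoods P(d|cj) into posteriors P(cj|d) and picks the winning class. The function names and the numbers are made up for illustration, and P(d) is computed by summing P(d|cj) P(cj) over the classes, which assumes the classes are mutually exclusive and cover every case.

```python
def posterior(priors, likelihoods):
    """Return P(c | d) for every class c.

    priors[c] is P(c); likelihoods[c] is P(d | c) for the observed instance d.
    ASSUMPTION: classes are exclusive and exhaustive, so P(d) is the sum of
    P(d | c) * P(c) over all classes.
    """
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(joint.values())  # P(d)
    return {c: joint[c] / evidence for c in joint}

def predict(priors, likelihoods):
    """Return the class with the largest posterior probability."""
    post = posterior(priors, likelihoods)
    return max(post, key=post.get)

# Made-up numbers, for illustration only.
priors = {"high risk": 0.2, "low risk": 0.8}
likelihoods = {"high risk": 0.6, "low risk": 0.1}

print(predict(priors, likelihoods))  # high risk
```

Because P(d) is the same for every class, the winner could also be found from the numerators alone; dividing by P(d) only matters when the actual probabilities are needed.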

Regression. Linear regression/curve fitting: Y = a0 + a1*X1 + a2*X2 + … + an*Xn. You find the coefficients a0, a1, a2, …, an that give the best fit. The fit is not always exact: there may be noise in the data, or the relationship may not be polynomial.
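For the single-variable case Y = a0 + a1*X1, finding the coefficients can be sketched as follows. The example assumes "best fit" means ordinary least squares, which the transcript does not state; the function name `fit_line` and the data points are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a0 + a1 * x; returns (a0, a1).

    ASSUMPTION: "best fit" is taken to mean minimizing squared error.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx              # slope
    a0 = mean_y - a1 * mean_x   # intercept
    return a0, a1

# Made-up noisy data lying roughly on a straight line.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
a0, a1 = fit_line(xs, ys)
print(a0, a1)
```

The fitted line does not pass through every point, which is the "not always exact" case caused by noise in the data.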

Association Rules. Rules are denoted by ‘=>’. Support: what fraction of the population has both the antecedent and the consequent of the rule. Confidence: how often the consequent is true when the antecedent is true. Ex. Owning car => Buying gas. Support: 99.9%. Confidence: 99.9%. Probably true.
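Support and confidence can be computed directly from their definitions. The sketch below uses a tiny made-up list of transactions (sets of items); the function names and the data are assumptions for illustration and do not reproduce the 99.9% figures in the example.

```python
def support(transactions, antecedent, consequent):
    """Fraction of all transactions containing antecedent and consequent."""
    both = antecedent | consequent
    return sum(1 for t in transactions if both <= t) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Fraction of transactions with the antecedent that also have the consequent."""
    has_antecedent = [t for t in transactions if antecedent <= t]
    if not has_antecedent:
        return 0.0  # rule never applies; treated as zero confidence here
    both = antecedent | consequent
    return sum(1 for t in has_antecedent if both <= t) / len(has_antecedent)

# Made-up data: each set is one member of the population.
transactions = [
    {"car", "gas"},
    {"car", "gas", "snacks"},
    {"car"},
    {"snacks"},
]

print(support(transactions, {"car"}, {"gas"}))     # 0.5
print(confidence(transactions, {"car"}, {"gas"}))  # 2 of the 3 car owners
```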

Association Rules: Shortcomings. Sometimes two things occur together without one causing the other. Ex. haircuts and grocery shopping: 99% of the population gets haircuts and 100% of the population goes grocery shopping. Everybody who gets a haircut goes grocery shopping, but does that mean the two are actually related? Related ideas: deviation from existing patterns; correlation (positive and negative).
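The haircut example can be checked with a little arithmetic. The sketch below builds a made-up population of 100 people matching the percentages given and compares the rule's confidence with how often the consequent occurs anyway; the helper names are assumptions for illustration.

```python
# Made-up population matching the example: 99 of 100 people get haircuts,
# and all 100 go grocery shopping.
population = [{"haircut", "groceries"}] * 99 + [{"groceries"}]

def fraction(people, items):
    """Fraction of people whose item set contains all of `items`."""
    return sum(1 for p in people if items <= p) / len(people)

def confidence(people, antecedent, consequent):
    """How often the consequent holds among people with the antecedent."""
    with_antecedent = [p for p in people if antecedent <= p]
    return fraction(with_antecedent, consequent)

rule_confidence = confidence(population, {"haircut"}, {"groceries"})
baseline = fraction(population, {"groceries"})
print(rule_confidence, baseline)  # 1.0 1.0
```

The rule "haircut => groceries" has perfect confidence, yet grocery shopping is just as common among everyone else, so the rule says nothing about a real connection between the two.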

Clustering. Find clusters of points in a data set; break the set down into subsets. Types: Hierarchical clustering: based on different levels, break things down as you go deeper. Agglomerative clustering: start small, then create higher levels. Divisive clustering: start big, then create lower levels.
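Agglomerative clustering ("start small, then create higher levels") can be sketched on one-dimensional points as below. The choice of single linkage (distance between the closest members of two clusters), the stopping rule of `k` clusters, and the function name are assumptions for illustration; the transcript does not specify them.

```python
def agglomerative(points, k):
    """Merge the two closest clusters until only k clusters remain.

    ASSUMPTION: single linkage on 1-D points, with k >= 1.
    """
    clusters = [[p] for p in points]  # start small: one cluster per point
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # create a higher level
        del clusters[j]
    return [sorted(c) for c in clusters]

print(agglomerative([1.0, 1.5, 5.0, 5.2, 9.0], 3))
# [[1.0, 1.5], [5.0, 5.2], [9.0]]
```

Divisive clustering would run the other way: start with one cluster holding every point and split it repeatedly.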

Other Types of Mining. Text mining: mining text documents. Data visualization: maps, charts, and other graphics. Visualization tools don’t analyze the data; they just present it for users (humans are good at seeing patterns).

References Database System Concepts