Data Mining Adrian Tuhtan 004757481 CS157A Section1.

Overview
- Introduction
- Explanation of Data Mining
- Techniques
- Advantages
- Applications
- Privacy

Data Mining
What is data mining?
- “The process of semiautomatically analyzing large databases to find useful patterns” (Silberschatz)
- KDD: “Knowledge Discovery in Databases” (3)
- “Attempts to discover rules and patterns from data”
- Discover rules → make predictions
Areas of use
- Internet: discover the needs of customers
- Economics: predict stock prices
- Science: predict environmental change
- Medicine: match patients with similar problems → cure

Example of Data Mining
A credit card company wants to discover information about its clients from its databases. It wants to find:
- Clients who respond to promotions sent as “junk mail”
- Clients who are likely to switch to a competitor
- Clients who are likely not to pay
- Services that clients use, so that services affiliated with the credit card company can be promoted to them
- Anything else that may help the company provide and promote services to its clients and, ultimately, make more money

Data Mining & Data Warehousing
- Data warehouse: “is a repository (or archive) of information gathered from multiple sources, stored under a unified schema, at a single site.” (Silberschatz)
- Collect data → store it in a single repository
- Query development becomes easier, since only a single repository has to be queried.
- Data mining: analyzing databases or data warehouses to discover patterns in the data and so gain knowledge. Knowledge is power.

Discovery of Knowledge

Data Mining Techniques
- Classification
- Clustering
- Regression
- Association Rules

Classification
Classification: given a set of items that belong to several classes, and given past instances (training instances) with their associated classes, classification is the process of predicting the class of a new item, that is, identifying which class the new item belongs to.
Example: A bank wants to classify its home loan customers into groups according to their response to bank advertisements. The bank might use the classes “Responds Rarely”, “Responds Sometimes” and “Responds Frequently”. The bank will then attempt to find rules describing the customers who respond frequently and sometimes. These rules could be used to predict the needs of potential customers.

Technique for Classification: Decision-Tree Classifiers
[Figure: a decision tree for predicting the credit risk (Good/Bad) of a person with the jobs specified. Its nodes test Job (Carpenter, Engineer, Doctor) and Income (thresholds <30K, <40K, <50K, >50K, >90K, >100K).]
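
A minimal Python sketch of how a decision-tree classifier uses such a tree once it has been built: each internal node tests one attribute, and the walk from the root ends at a leaf holding the predicted class. The node classes, the `classify` function and the income threshold assigned to each job are hypothetical, chosen only for illustration, because the exact job-to-threshold mapping in the original figure is not recoverable. Learning the tree from training instances is not shown.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    label: str  # predicted class, e.g. "Good" or "Bad" credit risk


@dataclass
class IncomeSplit:
    threshold: int  # incomes at or above this value take the "high" branch
    low: "Node"
    high: "Node"


@dataclass
class JobSplit:
    branches: Dict[str, "Node"]  # one subtree per job title


Node = Union[Leaf, IncomeSplit, JobSplit]


def classify(node: Node, person: dict) -> str:
    """Walk from the root to a leaf, testing one attribute per internal node."""
    while not isinstance(node, Leaf):
        if isinstance(node, JobSplit):
            node = node.branches[person["job"]]
        else:
            node = node.high if person["income"] >= node.threshold else node.low
    return node.label


# Hypothetical thresholds (assumption): not taken from the original figure.
credit_tree = JobSplit({
    "Carpenter": IncomeSplit(30_000, Leaf("Bad"), Leaf("Good")),
    "Engineer": IncomeSplit(50_000, Leaf("Bad"), Leaf("Good")),
    "Doctor": IncomeSplit(100_000, Leaf("Bad"), Leaf("Good")),
})

print(classify(credit_tree, {"job": "Engineer", "income": 65_000}))  # Good
print(classify(credit_tree, {"job": "Doctor", "income": 90_000}))    # Bad
```

Representing the tree as plain data keeps classification separate from construction; an algorithm that learns the tree would produce the same kind of structure.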

Clustering
“Clustering algorithms find groups of items that are similar. … It divides a data set so that records with similar content are in the same group, and groups are as different as possible from each other.” (2)
Example: An insurance company could use clustering to group clients by their age, location and types of insurance purchased. The categories are not specified in advance, which is why clustering is referred to as ‘unsupervised learning’.
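
The slides do not name a specific clustering algorithm at this point; k-means, one widely used choice, is swapped in here as an assumption. The sketch groups hypothetical insurance clients described by (age, number of policies) into k clusters by alternating two steps: assign each client to the nearest centroid, then move each centroid to the mean of its members. The data and the `kmeans` function are illustrative only.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def kmeans(points: List[Point], k: int, iterations: int = 100) -> List[int]:
    """Return a cluster index (0..k-1) for each point."""
    # Deterministic start for the sketch: the first k points serve as centroids.
    centroids = [list(p) for p in points[:k]]
    assignment: List[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment


# Hypothetical clients: (age, number of insurance policies).
clients = [(23, 1), (25, 2), (27, 1), (61, 4), (65, 3), (59, 4)]
print(kmeans(clients, 2))  # [0, 0, 0, 1, 1, 1]
```

Age spans a much larger range than policy count, so it dominates the Euclidean distance here; in practice features are usually scaled first. Typical implementations also pick random or k-means++ starting centroids rather than the first k points.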

Clustering
- Group data into clusters
- Similar data is grouped in the same cluster
- Dissimilar data is grouped in different clusters
How is this achieved?
- K-nearest neighbor (strictly a classification method rather than a clustering method): classifies a point by calculating the distances between the point and the points in the training data set, then assigns the point to the class that is most common among its k nearest neighbors (where k is an integer). (2)
- Hierarchical: group data into a tree (hierarchy) of clusters
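
A minimal k-nearest-neighbor sketch that follows the description above: compute the distance from the new point to every training point, keep the k nearest, and return the most common class among them. Euclidean distance, k = 3, the `knn_classify` name and the (age, income in $K) training data labeled with the bank example's response classes are all assumptions for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_classify(training: List[Tuple[Sequence[float], str]],
                 point: Sequence[float], k: int = 3) -> str:
    """Predict the class of `point` by majority vote of its k nearest training points."""
    # Sort the labelled training points by Euclidean distance to the new point.
    nearest = sorted(training, key=lambda item: math.dist(item[0], point))[:k]
    # Count the labels of the k nearest neighbours and return the most common one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (age, income in $K) -> response class.
training = [
    ((25, 40), "Responds Rarely"),
    ((30, 45), "Responds Rarely"),
    ((28, 38), "Responds Rarely"),
    ((50, 90), "Responds Frequently"),
    ((55, 95), "Responds Frequently"),
    ((48, 85), "Responds Frequently"),
]

print(knn_classify(training, (52, 88)))  # Responds Frequently
print(knn_classify(training, (27, 41)))  # Responds Rarely
```

An odd k avoids most two-way ties; with an even k, `Counter.most_common` breaks ties in favor of the label that was counted first, i.e. the one held by the closer neighbor.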

Regression
“Regression deals with the prediction of a value, rather than a class.” (1, p. 747)
Example: find out whether there is a relationship between smoking and cancer-related illness in patients.
- Given values: $X_1, X_2, \dots, X_n$
- Objective: predict the variable $Y$
- One way is to find coefficients $a_0, a_1, \dots, a_n$ such that $Y = a_0 + a_1 X_1 + a_2 X_2 + \dots + a_n X_n$
- This is linear regression.
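
The slide does not say how the coefficients are chosen. A standard approach, assumed here, is ordinary least squares: given $m$ training records, where record $i$ has attribute values $x_{i1}, \dots, x_{in}$ and observed value $y_i$, choose the coefficients that minimize the sum of squared prediction errors. Stacking the records into a matrix $X$ whose first column is all ones (for $a_0$) gives the closed-form solution below, valid when $X^\top X$ is invertible.

```latex
\min_{a_0,\dots,a_n} \; \sum_{i=1}^{m} \bigl( y_i - a_0 - a_1 x_{i1} - a_2 x_{i2} - \dots - a_n x_{in} \bigr)^2

X^\top X \, \mathbf{a} = X^\top \mathbf{y}
\quad\Longrightarrow\quad
\mathbf{a} = (X^\top X)^{-1} X^\top \mathbf{y},
\qquad \mathbf{a} = (a_0, a_1, \dots, a_n)^\top
```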

Regression
[Example graph: line of best fit (curve fitting)]
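
For a single attribute ($n = 1$), the least-squares line of best fit has a closed form: the slope $a_1$ is the covariance of $X$ and $Y$ divided by the variance of $X$, and the intercept $a_0$ makes the line pass through the point of means. The function name and sample points below are hypothetical.

```python
from typing import Sequence, Tuple


def line_of_best_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of Y = a0 + a1 * X; returns (a0, a1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared X deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx              # slope
    a0 = mean_y - a1 * mean_x   # intercept: line passes through (mean_x, mean_y)
    return a0, a1


print(line_of_best_fit([1, 2, 3, 4], [3, 5, 7, 9]))  # (1.0, 2.0)
```

The sample points lie exactly on Y = 1 + 2X, so the fit recovers those coefficients; with noisy data the result is the line that minimizes the squared vertical errors.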

Association Rules “An association algorithm creates rules that describe how often events have occurred together.” (2) Example: When a customer buys a hammer, then 90% of the time they will buy nails.

Association Rules
Support: “is a measure of what fraction of the population satisfies both the antecedent and the consequent of the rule” (1, p. 748)
Example:
- Hotdog buns and hotdog sausages are bought together in 99% of all purchases = high support
- Hotdog buns and hangers are bought together in 0.005% of all purchases = low support
Rules with high support are worth careful attention. E.g., hotdog sausages should be placed near hotdog buns in supermarkets, provided the rule also has high confidence.
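
Following the definition above, the support of a rule is the fraction of all purchases (transactions) that contain every item of both the antecedent and the consequent. The sketch computes it over a small set of hypothetical shopping baskets; the basket contents and the `support` function are illustrative assumptions, not data from the slides.

```python
from typing import Iterable, List, Set


def support(transactions: List[Set[str]], antecedent: Iterable[str],
            consequent: Iterable[str]) -> float:
    """Fraction of transactions containing every antecedent and consequent item."""
    items = set(antecedent) | set(consequent)
    matching = sum(1 for basket in transactions if items <= basket)
    return matching / len(transactions)


# Hypothetical baskets, one set of items per purchase.
baskets = [
    {"hotdog buns", "hotdog sausages"},
    {"hotdog buns", "hotdog sausages", "mustard"},
    {"hotdog buns", "hotdog sausages"},
    {"hotdog sausages", "mustard"},
    {"hotdog buns", "hangers"},
]

print(support(baskets, {"hotdog buns"}, {"hotdog sausages"}))  # 0.6
print(support(baskets, {"hotdog buns"}, {"hangers"}))          # 0.2
```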

Association Rules
Confidence: “is a measure of how often the consequent is true when the antecedent is true.” (1, p. 748)
Example: 90% of hotdog bun purchases are accompanied by hotdog sausages.
High confidence is meaningful because rules can be derived from it: hotdog bun ⇒ hotdog sausage.
Two rules may have different confidence levels and yet the same support. E.g., hotdog sausage ⇒ hotdog bun may have a much lower confidence than hotdog bun ⇒ hotdog sausage, yet both rules have the same support, since both count exactly the purchases that contain both items.
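
Confidence can be computed from counts: among the purchases that contain the antecedent, the fraction that also contain the consequent. The hypothetical baskets below illustrate the point made above: the rules hotdog bun ⇒ hotdog sausage and hotdog sausage ⇒ hotdog bun share the same support but have different confidence. The helper names and data are assumptions for illustration.

```python
from typing import Iterable, List, Set


def count(transactions: List[Set[str]], items: Iterable[str]) -> int:
    """Number of transactions that contain all of the given items."""
    wanted = set(items)
    return sum(1 for basket in transactions if wanted <= basket)


def support(transactions: List[Set[str]], antecedent, consequent) -> float:
    """Fraction of all transactions containing both sides of the rule."""
    return count(transactions, set(antecedent) | set(consequent)) / len(transactions)


def confidence(transactions: List[Set[str]], antecedent, consequent) -> float:
    """Fraction of transactions with the antecedent that also contain the consequent."""
    with_antecedent = count(transactions, antecedent)
    if with_antecedent == 0:
        return 0.0  # the rule never fires, so report no confidence
    return count(transactions, set(antecedent) | set(consequent)) / with_antecedent


# Hypothetical baskets: every bun purchase includes sausages, but not vice versa.
baskets = [
    {"hotdog buns", "hotdog sausages"},
    {"hotdog buns", "hotdog sausages", "mustard"},
    {"hotdog buns", "hotdog sausages"},
    {"hotdog sausages", "mustard"},
    {"hotdog sausages"},
]

bun, sausage = {"hotdog buns"}, {"hotdog sausages"}
print(support(baskets, bun, sausage), support(baskets, sausage, bun))        # 0.6 0.6
print(confidence(baskets, bun, sausage), confidence(baskets, sausage, bun))  # 1.0 0.6
```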

Advantages of Data Mining
- Provides new knowledge from existing data:
  - Public databases
  - Government sources
  - Company databases
- Old data can be used to develop new knowledge
- New knowledge can be used to improve services or products
- Improvements lead to:
  - Bigger profits
  - More efficient service

Uses of Data Mining
- Sales/marketing
  - Diversify the target market
  - Identify clients' needs to increase response rates
- Risk assessment
  - Identify customers who pose a high credit risk
- Fraud detection
  - Identify people misusing the system, e.g., people who have two Social Security Numbers
- Customer care
  - Identify customers likely to change providers
  - Identify customer needs

Applications of Data Mining (4)
[Chart; source: IDC 1998]

Privacy Concerns
- Effective data mining requires large sources of data.
- To achieve a wide spectrum of data, multiple data sources are linked.
- Linking sources can be problematic for privacy. If the following histories of a customer were linked:
  - Shopping history
  - Credit history
  - Bank history
  - Employment history
  then the user's life story could be painted from the collected data.

References
1. Silberschatz, Korth, Sudarshan, “Database System Concepts”, 5th Edition, McGraw-Hill.
2. Two Crows, “Data Mining Glossary”.
3. “Wikipedia”.
on.html on.html