DATA MINING Prof. Sin-Min Lee Surya Bhagvat CS 157B – Spring 2006.

Making sense out of data
As hard drive prices have fallen, the amount of data that corporations store in their databases has increased dramatically. Raw data sitting in a database is of no use unless someone makes sense of it. For example, a company could store a decade of customer data, but for that data to become useful, someone needs to find the patterns in it that reveal customer behavior. Would SQL solve this problem?

Traditional SQL and analytics
Traditional SQL is useful for performing very large queries, and one could argue that SQL is all that is needed to get the information. That argument holds for small data sets, but when a query runs against a huge database holding terabytes of data, SQL performance degrades. Also, identifying patterns in the data is not always feasible with traditional SQL queries. This is where the field of analytics comes into play.

Analytics
Analytics is essentially identifying patterns in data in order to make better decisions. For example, if you run a commercial e-commerce web site, one thing you would want to know is your visitors' behavior patterns: which search engine they came from, how they go about searching for items on your site, and so on. The goal is to identify patterns of customer behavior that will be useful later for targeting particular customers with promotional offers.
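To make one of these patterns concrete, here is a minimal Python sketch that counts which search engine visitors arrived from. The visit log, its (visitor_id, referrer URL) format and the `SEARCH_ENGINES` host mapping are all hypothetical; a real site would take this data from its web server logs or an analytics service.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical visit log: (visitor_id, referrer URL). Not a real analytics export format.
VISITS = [
    ("v1", "https://www.google.com/search?q=ipod"),
    ("v2", "https://search.yahoo.com/search?p=ipod+dock"),
    ("v3", "https://www.google.com/search?q=ipod+nano"),
    ("v4", ""),  # direct visit: no referrer
]

# Hypothetical mapping from referrer host to search engine name.
SEARCH_ENGINES = {"www.google.com": "Google", "search.yahoo.com": "Yahoo"}


def referrer_counts(visits):
    """Count visits per referring search engine; unknown or empty referrers go to 'other/direct'."""
    counts = Counter()
    for _visitor, referrer in visits:
        host = urlparse(referrer).netloc
        counts[SEARCH_ENGINES.get(host, "other/direct")] += 1
    return counts
```

With this log, `referrer_counts(VISITS)` reports 2 visits from Google, 1 from Yahoo and 1 other/direct visit.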

Analytics (continued)
Google recently released Google Analytics as a free service. The URL for this site is … Right now, one needs to sign up for an invitation; once Google accepts it, all one needs to do is include the Google Analytics tracking code in the web site, and then one can start monitoring customer behavior.

Transactional systems
Transactional systems store information about day-to-day transactions. For example, a retail store such as Safeway records each transaction at the moment the purchase is made. Identifying patterns in transactional systems is relatively hard because the data they store often runs to terabytes, and a SQL query run across such a huge database may bring the whole system down. So what is the alternative?

Decision support systems
For decision-making activities such as finding patterns or running complex SQL queries, a separate database or system is usually maintained; such systems are known as decision support systems. High-level data is pulled out of the transactional systems and stored in these databases, where analytics or data mining techniques are applied to it. The downside is that the data may not be real time. However, a service running in the background could keep the decision support system updated in real time.
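One way such a background service could work is an incremental refresh: remember the highest transaction id already copied and pull only newer rows. The sketch below assumes a hypothetical `sales(txn_id, store, amount)` table in both databases and uses SQLite purely for illustration; a production service would run the refresh on a schedule and would usually summarize the rows rather than copy them one for one.

```python
import sqlite3


def create_sales_table(conn: sqlite3.Connection) -> None:
    """Hypothetical schema, used for both the transactional DB and the DSS."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (txn_id INTEGER PRIMARY KEY, store TEXT, amount REAL)"
    )


def refresh_dss(oltp: sqlite3.Connection, dss: sqlite3.Connection) -> int:
    """Copy transactions newer than the DSS watermark; return how many rows were copied."""
    # Watermark: the highest transaction id already present in the DSS (0 if empty).
    (last_id,) = dss.execute("SELECT COALESCE(MAX(txn_id), 0) FROM sales").fetchone()
    # Read only the new rows, so the transactional system is not scanned in full.
    rows = oltp.execute(
        "SELECT txn_id, store, amount FROM sales WHERE txn_id > ? ORDER BY txn_id",
        (last_id,),
    ).fetchall()
    dss.executemany("INSERT INTO sales (txn_id, store, amount) VALUES (?, ?, ?)", rows)
    dss.commit()
    return len(rows)
```

Calling `refresh_dss` twice in a row copies nothing the second time, because the watermark has already caught up.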

Decision support systems (continued)
Decision support systems can be classified into three kinds: statistical analysis, OLAP (online analytical processing) and data warehouses. When detailed statistical analysis of the data is needed, SQL is very limited, and one needs to turn to a commercial package such as SAS. Further information can be found at …

Decision support systems (continued)
OLAP provides very fast access to data. Data from an RDBMS is gathered and placed into multidimensional cubes, which are then made available to users. Cognos PowerPlay is the best-selling OLAP product. The link to this product is …
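The idea behind a cube can be sketched in a few lines of Python: precompute totals for every combination of dimension values, with `ALL` standing for "summed over this dimension", so that later questions become simple lookups. The region/product/quarter dimensions and the sales figures are hypothetical, and this sketch says nothing about how Cognos PowerPlay or any other product actually stores its cubes.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows pulled from an RDBMS: (region, product, quarter, sales).
FACTS = [
    ("West", "iPod", "Q1", 100),
    ("West", "Dock", "Q1", 20),
    ("East", "iPod", "Q1", 80),
    ("East", "iPod", "Q2", 90),
]

ALL = "ALL"  # marker meaning "aggregated over this dimension"


def build_cube(facts):
    """Precompute a total for every (region, product, quarter) cell, including ALL roll-ups."""
    cube = defaultdict(int)
    for region, prod, quarter, sales in facts:
        # Each fact contributes to 2**3 = 8 cells: its exact cell plus every roll-up.
        for key in product((region, ALL), (prod, ALL), (quarter, ALL)):
            cube[key] += sales
    return dict(cube)
```

For example, `build_cube(FACTS)[("West", ALL, ALL)]` is 120 (all West sales) and `[(ALL, ALL, ALL)]` is the grand total of 290; answering either question is a dictionary lookup rather than a scan.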

Data warehousing
The third kind of decision support system is the data warehouse, and data mining is usually performed on data warehouses. An enterprise's data is usually spread across various transactional systems or databases. For example, some data might be stored in an Oracle database, other data in DB2 or Teradata, and some might simply be kept in text files or Excel files. Combining all of this data to look for patterns is very difficult, so the disparate data from these various sources is pulled together to form a data warehouse.

Data warehousing (continued)
The steps involved in building a data warehouse include:
1) Extracting the raw data from the different sources and storing it as-is in a temporary staging area. ETL (extract, transform, load) tools are typically used for this process.
2) Cleansing the data in the staging area and applying various business rules to it while loading it into the actual data warehouse tables.
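The two steps above can be sketched in Python as follows. The source data, the `fact_sales` table and the business rules (trim names, upper-case state codes, reject rows with no customer) are all hypothetical, and an in-memory SQLite database stands in for the warehouse; a real ETL tool does the same kind of work at far larger scale.

```python
import csv
import io
import sqlite3

# Hypothetical raw extracts: rows from a relational database and a CSV text export.
DB_ROWS = [{"customer": " Alice ", "amount": "120.50", "state": "ca"}]
CSV_TEXT = "customer,amount,state\nBob,80,NY\n,15,TX\n"


def extract_to_staging():
    """Step 1: copy raw rows as-is into a staging list, tagged with their source."""
    staging = [dict(row, source="db") for row in DB_ROWS]
    staging += [dict(row, source="csv") for row in csv.DictReader(io.StringIO(CSV_TEXT))]
    return staging


def transform_and_load(staging, warehouse: sqlite3.Connection) -> int:
    """Step 2: cleanse staged rows, apply hypothetical business rules, load the survivors."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales (customer TEXT, amount REAL, state TEXT, source TEXT)"
    )
    loaded = 0
    for row in staging:
        customer = row["customer"].strip()
        if not customer:  # business rule: a sale without a customer is rejected
            continue
        warehouse.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
            (customer, float(row["amount"]), row["state"].strip().upper(), row["source"]),
        )
        loaded += 1
    warehouse.commit()
    return loaded
```

Keeping the staging step separate means the raw extracts are preserved unchanged, so the cleansing rules can be corrected and rerun without going back to the source systems.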

Predictive analytics and data mining
Data mining is about finding patterns in data, and it is mainly used for predicting customer behavior. For example, data mining could be used to predict, based on a customer's complaints, whether that customer is going to switch to a competitor. The applications of data mining are varied, ranging from CRM (customer relationship management) to earthquake prediction.

Predictive analytics and data mining (continued)
Predictive analytics is based on a predictor, which is a single value. Predictive analytics is used extensively in CRM applications. For a customer, a predictor could be their most recent purchase. For example, if you are calling customers about promotions, then based on this predictor you would call the customers who purchased most recently first, followed by the customers who purchased something a month ago.
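Using a single recency predictor to decide whom to call first amounts to a sort. In this minimal Python sketch the customer names and purchase dates are hypothetical.

```python
from datetime import date

# Hypothetical predictor values: each customer's most recent purchase date.
LAST_PURCHASE = {
    "Alice": date(2006, 3, 28),
    "Bob": date(2006, 2, 27),
    "Carol": date(2006, 3, 30),
}


def call_order(last_purchase):
    """Return customer names ordered from most recent to least recent purchase."""
    return sorted(last_purchase, key=last_purchase.get, reverse=True)
```

Here `call_order(LAST_PURCHASE)` returns `["Carol", "Alice", "Bob"]`: the customers who bought in late March come first, and Bob, who last bought about a month earlier, comes last.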

Procedures in data mining
The key procedures used in data mining include:
1) Association rules
2) Classification
3) Clustering

Association rules
An association rule has an associated population, which consists of a set of instances. For example, if someone buys an iPod from Amazon.com, the products associated with it are the iPod accessories that Amazon displays alongside it, such as the Apple iPod Nano Armband Grey, the Apple iPod Nano Dock and the Apple iPod Nano Lanyard Headphones. The two measures of an association rule are support and confidence.

Association rules (continued)
Support: a measure of what fraction of the population satisfies both the antecedent and the consequent of the rule. For example, the support for iPod => DVD player is … percent, which means the support is very low.
Confidence: a measure of how often the consequent is true when the antecedent is true. For example, the confidence of the rule iPod => Apple iPod Nano Armband Grey might be, say, 80 percent.
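A small Python sketch of the two measures, using made-up baskets rather than real Amazon data. `support` counts how often the antecedent and consequent appear together across the whole population; `confidence` divides that same count by how often the antecedent appears. Returning 0.0 confidence for a rule whose antecedent never occurs is a convention chosen for this sketch.

```python
from typing import List, Set

# Hypothetical baskets (not data from the slides): each set is one purchase,
# i.e. one instance in the population the rules are measured against.
TRANSACTIONS: List[Set[str]] = [
    {"iPod", "Armband"},
    {"iPod", "Armband", "Dock"},
    {"iPod", "Dock"},
    {"iPod", "Armband"},
    {"DVD player"},
]


def count_containing(items: Set[str], transactions: List[Set[str]]) -> int:
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if items <= t)


def support(antecedent: Set[str], consequent: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of the population satisfying both the antecedent and the consequent."""
    return count_containing(antecedent | consequent, transactions) / len(transactions)


def confidence(antecedent: Set[str], consequent: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of antecedent-satisfying transactions that also satisfy the consequent."""
    base = count_containing(antecedent, transactions)
    if base == 0:
        return 0.0  # the rule never fires; treated as 0 by convention in this sketch
    return count_containing(antecedent | consequent, transactions) / base
```

With these baskets, iPod => Armband has support 0.6 (3 of 5 baskets) and confidence 0.75 (3 of the 4 iPod baskets), while iPod => DVD player has support 0.0.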

Support and Confidence examples

Classification
The most popular way to classify items is with decision tree classifiers. In the example, the person's degree is masters and their income is 40K. Starting from the root, we follow the edge labeled masters and then the edge labeled 25K to 75K to reach a leaf. The class at that leaf is "good", so we predict that the person's credit risk is good.
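A minimal Python sketch of how such a tree is walked. Only the path from the example (degree masters, income 25K to 75K, class "good") comes from the slide; the other branches, their class labels, and the choice to treat each income range as including its lower bound and excluding its upper bound are hypothetical.

```python
# Hypothetical tree: only masters -> 25K to 75K -> "good" is taken from the example.
# Internal nodes test one attribute; leaves are {"class": ...}.
TREE = {
    "attribute": "degree",
    "branches": {  # categorical test: one edge per degree value
        "masters": {
            "attribute": "income",
            "ranges": [  # numeric test: (low inclusive, high exclusive, subtree)
                (0, 25_000, {"class": "bad"}),
                (25_000, 75_000, {"class": "good"}),
                (75_000, float("inf"), {"class": "excellent"}),
            ],
        },
        "bachelors": {"class": "average"},
    },
}


def classify(node, person):
    """Start at the root and follow the edge matching the person until a leaf is reached."""
    while "class" not in node:
        value = person[node["attribute"]]
        if "branches" in node:
            node = node["branches"][value]  # raises KeyError for a degree the tree lacks
        else:
            node = next(sub for low, high, sub in node["ranges"] if low <= value < high)
    return node["class"]
```

For the person in the example, `classify(TREE, {"degree": "masters", "income": 40_000})` follows the masters edge, then the 25K to 75K edge, and returns `"good"`.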

Clustering
Clustering is about grouping similar data into clusters. The degree of association is strong between items in the same cluster and weak between items in different clusters. Clustering is based on distance measures such as Euclidean distance, probabilistic measures, etc. K-means is one of the best-known clustering algorithms.
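A compact sketch of k-means (Lloyd's algorithm) in Python, using Euclidean distance via `math.dist` (Python 3.8+). To keep it deterministic, the first k points serve as the initial centroids; that is an assumption made for illustration, and practical implementations usually pick starting centroids randomly or with k-means++ and often run several times.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iterations: int = 100) -> Tuple[List[Point], List[int]]:
    """Cluster points into k groups; returns (centroids, cluster label for each point)."""
    centroids = [tuple(p) for p in points[:k]]  # assumption: first k points as seeds
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its old centroid
        if new_centroids == centroids:  # converged: assignments can no longer change
            break
        centroids = new_centroids
    return centroids, labels
```

On two well-separated groups, such as points near (1, 1) and points near (10, 10), the assignment step puts each point with its own group, and the centroids settle on the two group means after a couple of iterations.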

Resources
A. Silberschatz, H. F. Korth, and S. Sudarshan, Database System Concepts, 5th ed., McGraw-Hill.