Data Mining.


1 Data Mining

2 What Is Data Mining? Data mining is the process of extracting information from large amounts of data. In other words, data mining (knowledge discovery from data) is the extraction of interesting patterns or knowledge from huge amounts of data. Other names: knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, etc.

3 Data Mining Data mining is the process of extracting information from large amounts of data. In other words, data mining is mining knowledge from data.

4 Need for Data Mining There is a huge amount of data available in the information industry. Analysing this data and extracting useful information from it is necessary. In other words, in the field of information technology, the huge amount of available data needs to be turned into useful information.

5 Knowledge Discovery/Data Mining Process
Here is the list of steps involved in the knowledge discovery process: Data Cleaning - In this step, noise and inconsistent data are removed. Data Integration - In this step, multiple data sources are combined. (It merges the data from multiple heterogeneous data sources into a coherent data store.) Data Selection - In this step, data relevant to the analysis task are retrieved from the database.

6 Data Transformation - In this step, data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations. Data Mining - In this step, intelligent methods are applied in order to extract data patterns. Pattern Evaluation - In this step, data patterns are evaluated to identify the interesting ones. Knowledge Presentation - In this step, the discovered knowledge is presented to the user.
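The steps listed above can be sketched as a chain of small functions. This is only a minimal illustration in Python: the purchase records, the function names and the "above-average spender" rule are all hypothetical and stand in for real data sources and real mining methods.

```python
from statistics import mean

# Hypothetical raw records from two sources; None marks a missing value.
store_a = [{"customer": "c1", "amount": 120.0}, {"customer": "c2", "amount": None}]
store_b = [{"customer": "c1", "amount": 80.0}, {"customer": "c3", "amount": 40.0}]

def clean(records):
    """Data cleaning: drop records whose amount is missing."""
    return [r for r in records if r["amount"] is not None]

def integrate(*sources):
    """Data integration: merge several sources into one list."""
    return [r for source in sources for r in source]

def select(records, min_amount):
    """Data selection: keep only records relevant to the task."""
    return [r for r in records if r["amount"] >= min_amount]

def transform(records):
    """Data transformation: aggregate the amounts per customer."""
    totals = {}
    for r in records:
        totals[r["customer"]] = totals.get(r["customer"], 0.0) + r["amount"]
    return totals

def mine(totals):
    """Data mining (toy rule): keep customers who spend above the average."""
    average = mean(totals.values())
    return {c: t for c, t in totals.items() if t > average}

integrated = integrate(clean(store_a), clean(store_b))
totals = transform(select(integrated, min_amount=0.0))
big_spenders = mine(totals)
print(big_spenders)
```

Pattern evaluation and knowledge presentation are reduced here to the final print; in practice they would involve interestingness measures and reports or visualizations.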

7 Knowledge Discovery (KDD) Process
Data mining is the core of the knowledge discovery process. [Figure: KDD pipeline, from bottom to top: Databases; Data Cleaning and Data Integration; Data Warehouse; Selection; Task-relevant Data; Data Mining; Pattern Evaluation.]


9 Introduction Data mining is the process of analyzing large databases to find useful patterns (data or information).

10 Data Mining Tasks (Techniques)
Data mining tasks specify what kind of patterns are to be mined. There are two kinds of functions involved in data mining: Descriptive, and Classification and Prediction.

11 Data Mining Models and Tasks

12 Descriptive The descriptive function deals with the general properties of data in the database. Here is the list of descriptive functions: Class/Concept Description; Mining of Frequent Patterns; Mining of Associations; Mining of Correlations; Mining of Clusters.

13 Class/Concept Description
Data can be associated with classes or concepts. For example, in a company, classes of items for sale include computers and printers, and concepts of customers include big spenders and budget spenders. Descriptions of such a class or concept are called class/concept descriptions.

14 Ways of Class/Concept Description
Characterization: provides a concise summarization of the given collection of data. Example: the characteristics of customers who spend more than $1000 a year at All Electronics. The result can be a general profile of these customers, such as age, employment status or credit rating.

15 Ways of Class/Concept Description
Characterization: provides a concise summarization of the given collection of data. Comparison: provides descriptions comparing two or more collections of data. Example: the user may like to compare the general features of software products whose sales increased by 10% in the last year with those whose sales decreased by about 30% over the same period.
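Characterization and comparison can be illustrated with a short Python sketch. The customer records, the field names and the `summarize` helper below are made up for illustration; only the $1000 threshold comes from the example above.

```python
from statistics import mean

# Hypothetical customer records (not real data).
customers = [
    {"age": 30, "annual_spend": 1500, "employed": True},
    {"age": 40, "annual_spend": 2500, "employed": True},
    {"age": 50, "annual_spend": 1200, "employed": False},
    {"age": 22, "annual_spend": 300, "employed": False},
    {"age": 28, "annual_spend": 700, "employed": True},
]

def summarize(group):
    """Characterization: a concise profile of one collection of records."""
    return {
        "count": len(group),
        "mean_age": mean(c["age"] for c in group),
        "employed_share": sum(c["employed"] for c in group) / len(group),
    }

# Characterize the class "spends more than $1000 a year" ...
big = summarize([c for c in customers if c["annual_spend"] > 1000])
# ... and compare it with the contrasting class.
budget = summarize([c for c in customers if c["annual_spend"] <= 1000])

for feature in big:
    print(feature, big[feature], budget[feature])
```

Printing the two profiles side by side is the simplest form of comparison; a real study would choose the summarized attributes to suit the analysis task.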

16 Mining of Frequent Patterns
As the name suggests, frequent patterns are patterns that occur frequently in data. Mining them describes the specific recurring patterns within the data.
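As a rough illustration, the sketch below counts how often each pair of items appears together and keeps the pairs that reach a minimum count. The baskets and the `frequent_pairs` helper are hypothetical, and real frequent-pattern miners use more scalable algorithms than this brute-force count.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_count):
    """Return item pairs that occur together in at least min_count transactions."""
    counts = Counter()
    for items in transactions:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Hypothetical shopping baskets.
baskets = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer", "printer"},
    {"software"},
]

result = frequent_pairs(baskets, min_count=2)
print(result)
```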

17 Mining of Associations/Correlations
Association Analysis: from a marketing perspective, determining which items are frequently purchased together within the same transaction. Example: a rule mined from the All Electronics transactional database: buys(X, “computer”) ⇒ buys(X, “software”) [support = 1%, confidence = 90%], where X represents a customer. Confidence = 90% means that if a customer buys a computer, there is a 90% chance that he/she will buy software as well. Support = 1% means that 1% of all the transactions under analysis showed that computer and software were purchased together.

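18 Cont. Confidence indicates how often the if/then statement has been found to be true: of the transactions that contain the "if" part of the rule, it is the fraction that also contain the "then" part. Support indicates how frequently the items appear together in the database: it is the fraction of all transactions that contain every item in the rule.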
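Support and confidence can be computed directly from their definitions. The sketch below is a minimal Python illustration; the five transactions are invented, so the resulting numbers differ from the 1% and 90% of the slide example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0

# Hypothetical transactions.
transactions = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer"},
    {"printer"},
    {"software"},
]

s = support(transactions, {"computer", "software"})       # 2 of 5 transactions
c = confidence(transactions, {"computer"}, {"software"})  # 2 of the 3 computer buyers
print(f"support={s:.2f} confidence={c:.2f}")
```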

19 Association rules Association is a data mining function that discovers the probability of the co-occurrence of items in a collection. The relationships between co-occurring items are expressed as association rules or correlations. Note: In data mining, association rules are useful for analysing and predicting customer behavior.

20 Mining of Clusters A cluster is a group of similar objects. Cluster analysis refers to forming groups of objects that are very similar to each other but highly different from the objects in other clusters. The goal is to place records into groups where the records in a group are highly similar to each other and dissimilar to records in other groups.

21 Cont. Example: A bank may cluster its customers into several groups based on the similarities of their age, income, residence, etc., and the common characteristics of the customers in a group can be used to describe that group of customers. The clusters will help the bank to understand its customers better and thus provide more suitable products and customized services.
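The bank example can be sketched with k-means, one common clustering method (the slides do not name a specific algorithm, so this choice is an assumption). The customer points and the starting centroids below are invented for illustration.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Squared Euclidean distance to every centroid.
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # New centroid = mean of the cluster; keep the old one if the cluster is empty.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical customers as (age, income in thousands).
customers = [(25, 30), (27, 32), (30, 35), (55, 90), (58, 95), (60, 100)]
centroids, clusters = kmeans(customers, centroids=[(25, 30), (60, 100)])

for centre, members in zip(centroids, clusters):
    print(centre, members)
```

Each resulting centroid acts as the "common characteristics" of its group. In practice the features would be scaled first, since income would otherwise dominate the distance.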

22 Cluster Analysis

23 Predictive functions:
Classification; Regression; Outlier Analysis (Deviation Detection); Evolution Analysis.

24 Classification Classification is the process of learning a model that is able to describe different classes of data. A classification model can be represented in various forms, such as a decision tree.

25 Tree Structure:

26 Cont. [Figure: decision tree. Test nodes: "Customer renting property > 2 years?" and "Customer age > 25 years?". Leaves: Rent property, Rent property, Buy property.]
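A decision tree like the one on this slide can be read as nested if/then rules. The Python sketch below is an assumption about how the branches are arranged, because the figure itself did not survive in the transcript: it guesses that the rental-duration test is the root and that only customers who pass both tests are classified as "Buy property".

```python
def classify_customer(years_renting, age):
    """Toy decision tree; the branch layout is assumed, not taken from the figure."""
    if years_renting > 2:          # root test: renting property > 2 years?
        if age > 25:               # second test: customer age > 25 years?
            return "Buy property"
        return "Rent property"
    return "Rent property"

print(classify_customer(years_renting=3, age=30))
print(classify_customer(years_renting=3, age=20))
print(classify_customer(years_renting=1, age=40))
```

In a real system the tests and thresholds would be learned from labelled training data rather than written by hand.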


28 [Figure: decision tree with test nodes "Age?" and "Income?" and leaves Class A, Class B and Class C.]

29 Regression (Prediction)
Regression is a data mining function that predicts a number. Age, weight, distance, temperature, income, or sales could all be predicted using regression techniques. For example, a regression model could be used to predict children's height, given their age, weight, and other factors.

30 Cont. Regression modeling has many applications, including business planning, financial forecasting, time series prediction and environmental modeling.
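As a minimal sketch of regression, the Python code below fits a straight line by ordinary least squares and uses it to predict a number. Simple linear regression is just one regression technique, chosen here for brevity, and the age/height values are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: children's age in years and height in cm.
ages = [2, 4, 6, 8, 10]
heights = [86, 102, 115, 128, 138]

slope, intercept = fit_line(ages, heights)
predicted = slope * 7 + intercept  # predicted height for a 7-year-old
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```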

31 Outlier Analysis Outliers may be defined as data objects that do not comply with the general behaviour or model of the available data. Such data objects, which are grossly different from or inconsistent with the remaining set of data, are called outliers. Outliers may be of particular interest, such as in fraud detection, where they may indicate fraudulent activity. Thus, outlier detection and analysis is an interesting data mining task, referred to as outlier mining or outlier analysis.

32 Deviation detection Discovering the most significant changes in data.
An outlier is a data object that deviates significantly from the normal objects, as if it were generated by a different mechanism. Outliers are different from noise: noise is random error or variance in a measured variable, and it should be removed before outlier detection. Applications: credit card fraud detection.
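One simple way to flag values that deviate significantly from the rest is a z-score test, sketched below in Python. The z-score rule, the threshold of 2.0 and the transaction amounts are all assumptions made for illustration; the slides do not prescribe a particular detection method.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical card transaction amounts; one is far larger than the rest.
amounts = [20, 25, 22, 19, 24, 21, 23, 500]
suspicious = find_outliers(amounts)
print(suspicious)
```

A flagged amount is only a candidate for fraud; it still has to be examined, since an unusual value can also be legitimate.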

33 Data visualization Data visualization: using graphical methods to show patterns in data. Data visualization systems help users examine large volumes of data and detect patterns visually. They can visually encode large amounts of information on a single screen.

34 Evolution Analysis Evolution analysis describes and models regularities or trends for objects whose behaviour changes over time. Example: stock market prediction (future stock prices).

35 Cont. Example: Time-series data. Suppose stock market data (time series) for the last several years is available from the New York Stock Exchange and one would like to invest in shares of high-tech industrial companies. A data mining study of stock exchange data may identify stock evolution regularities for overall stocks and for the stocks of particular companies. Such regularities may help predict future trends in stock market prices, contributing to one’s decision making regarding stock investments.

36 Ex: Time Series Analysis
Example: stock market. Predict future values; determine similar patterns over time; classify behavior. © Prentice Hall
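A first step in looking for trends in a time series is to smooth it, for instance with a moving average. The Python sketch below uses made-up daily prices and a hypothetical `moving_average` helper; it illustrates trend detection only and is not a method for predicting real stock prices.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive values."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]

# Hypothetical closing prices for seven days.
prices = [10, 11, 12, 11, 13, 14, 15]
smoothed = moving_average(prices, window=3)

# A crude trend signal: is the smoothed series higher at the end than at the start?
rising = smoothed[-1] > smoothed[0]
print(smoothed, rising)
```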

37 Applications The knowledge extracted by data mining can be used for various applications, such as market analysis, fraud detection, customer retention, production control, science exploration, etc.

38 Market Analysis and Management
Customer Profiling - Data mining helps to determine what kind of people buy what kind of products. Identifying Customer Requirements - Data mining helps in identifying the best products for different customers. It uses prediction to find the factors that may attract new customers. Cross Market Analysis - Data mining finds associations/correlations between product sales.

39 Target Marketing - Data mining helps to find clusters of model customers who share the same characteristics, such as interests, spending habits, income, etc. Determining Customer Purchasing Patterns - Data mining helps in determining customer purchasing patterns. Providing Summary Information - Data mining provides various multidimensional summary reports.

40 Fraud Detection Data mining is also used in the fields of credit card services and telecommunications to detect fraud. For fraudulent telephone calls, it helps to analyse the destination of the call, the duration of the call, and the time of day or week.

41 Corporate Analysis & Risk Management
Finance Planning and Asset Evaluation - It involves cash flow analysis and prediction, and contingent claim analysis to evaluate assets. Resource Planning - It involves summarizing and comparing resources and spending. Competition - It involves monitoring competitors and market directions.

42 ADVANTAGES OF DATA MINING
Marketing/Retailing: Data mining can aid direct marketers by providing them with useful and accurate trends about their customers’ purchasing behavior. Banking/Crediting: Data mining can assist financial institutions in areas such as credit reporting and loan information. Researchers: Data mining can assist researchers by speeding up their data analysis, thus allowing them more time to work on other projects.

43 DISADVANTAGES OF DATA MINING
Security issues: Although companies have a lot of personal information about us available online, they do not always have sufficient security systems in place to protect that information. Misuse of information: Some companies will prioritize answering your phone call based on your purchase history. If you have spent a lot of money or bought a lot of products from one company, your call will be answered very soon. So you should not assume that your call is being answered in the order in which it was received.

44 Thanks……

