Data Mining Concept Submitted TO: Mrs. MONIKA SUBMITTED BY: SHALU 4717.


1 Data Mining Concept Submitted TO: Mrs. MONIKA SUBMITTED BY: SHALU 4717

2 Content
o Data, Information & Knowledge
o What is data mining?
o Need of data mining
o On What Kind Of Data?
o Data Mining Vs Data Warehouse
o Knowledge Discovery In Databases
o Data Mining Vs KDD
o Data Mining Tasks
o Applications of Data Mining
o Data Mining Tools

3 Data, Information& Knowledge

4 What is data mining?  Data mining is a process that uses a variety of data analysis tools to discover patterns and relationships in data that may be used to make valid and accurate predictions.

5 Need of data mining  Data, Data everywhere … We are drowning in data but starving for knowledge !!

6 On What Kind Of Data? Data mining is not specific to one type of data. It can be applied to:
o Flat files
o Relational databases
o Data warehouses
o Multimedia databases
o Spatial databases
o Time-series databases

7 DATA MINING VS DATA WAREHOUSE
o Data mining is the process of extracting meaningful patterns from the data held in a database. Example: Credit Card
o Data warehousing is the process of centralizing or aggregating data from multiple sources into one common repository. Example: Facebook
So, the data mining process relies on the data compiled in the data warehousing phase in order to detect meaningful patterns.

8 Knowledge Discovery In Databases  Data mining is actually one step of a larger process known as Knowledge Discovery in Databases.  The iterative process consists of the following steps:

9 DATA MINING VS KDD

10 DATA MINING TASKS: CLUSTERING Clustering is the process of partitioning a set of data (or objects) into a set of meaningful sub-classes, called clusters.

11 CLUSTERING ALGORITHM
o K-means clustering algorithm
o Input: a database D of m records r1, …, rm and a desired number of clusters k
o Output: a set of k clusters
Begin
  Randomly choose k records as the centroids for the k clusters;
  Repeat
    Assign each record ri to the cluster whose centroid (mean) is the nearest to ri among the k clusters;
    Recalculate the centroid (mean) of each cluster based on the records assigned to that cluster;
  Until no change;
End;
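The pseudocode above can be sketched in Python roughly as follows. This is a minimal illustration using only the standard library, not a production implementation. The function name `k_means`, the `seed` and `max_iter` parameters, and the choice to keep the old centroid when a cluster becomes empty are assumptions added here, not part of the slide.

```python
import math
import random


def k_means(records, k, seed=0, max_iter=100):
    """Partition `records` (equal-length numeric tuples) into k clusters.

    Returns (assignment, centroids), where assignment[i] is the cluster
    index of records[i].
    """
    rng = random.Random(seed)
    # Begin: randomly choose k records as the initial centroids.
    centroids = rng.sample(records, k)
    assignment = None
    for _ in range(max_iter):  # max_iter is a safety cap (assumption)
        # Assign each record to the cluster with the nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(r, centroids[c]))
            for r in records
        ]
        # Until no change: stop once the assignments are stable.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recalculate each centroid as the mean of its assigned records.
        for c in range(k):
            members = [r for r, a in zip(records, assignment) if a == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centroids


# Hypothetical data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centers = k_means(points, k=2)
```

With these points, the first three records should end up in one cluster and the last three in the other; which cluster gets index 0 depends on the random initial centroids.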

12 EXAMPLE

13 CLASSIFICATION AND PREDICTION
Classification
o Predicts categorical class labels (discrete or nominal).
o Uses the labels of the training data to classify new data.
o Example: A marketing manager at a company needs to analyse whether a customer with a given profile will buy a new computer.
Prediction
o Models continuous-valued functions, i.e., predicts unknown or missing values.
o "Prediction" is short for numeric prediction.
o Example: A marketing manager would like to predict how much a given customer will spend during a sale.

14 CLASSIFICATION STEPS
Step 1: Model Construction. A classification algorithm is applied to the training data to produce a classifier.
Training data:
Age   Income   Class
27    28K      Budget-Spender
35    36K      Big-Spender
38    28K      Budget-Spender
65    45K      Budget-Spender
20    18K      Budget-Spender
75    40K      Budget-Spender
28    50K      Big-Spender
40    60K      Big-Spender
60    65K      Big-Spender
Classifier (rules):
If age < 30 and income < 30K, then Budget-Spender
If age < 30 and income > 30K, then Big-Spender
If 30 < age < 60 and income > 30K, then Big-Spender
If 30 < age < 60 and income < 30K, then Budget-Spender
If age > 60, then Budget-Spender
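As a rough sketch, the classifier on this slide can be written as a small Python function and checked against the training table. The rules are read here as: under 60, income above 30K means Big-Spender and income below 30K means Budget-Spender; over 60 always means Budget-Spender. The slide does not say what happens at exactly age 30, age 60, or income 30K, so the boundary handling below (age 60 and over counts as Budget-Spender, income of exactly 30K counts as Budget-Spender) is an assumption, as are the names `classify` and `TRAINING`.

```python
def classify(age, income_k):
    """Rule-based classifier; income_k is income in thousands (28 means 28K)."""
    # Assumption: age >= 60 is treated like the slide's "age > 60" rule.
    if age >= 60:
        return "Budget-Spender"
    # Ages below 60 (both the "< 30" and "30 to 60" bands) split on income.
    # Assumption: an income of exactly 30K falls on the Budget side.
    if income_k > 30:
        return "Big-Spender"
    return "Budget-Spender"


# Training table from the slide: (age, income in K, class).
TRAINING = [
    (27, 28, "Budget-Spender"),
    (35, 36, "Big-Spender"),
    (38, 28, "Budget-Spender"),
    (65, 45, "Budget-Spender"),
    (20, 18, "Budget-Spender"),
    (75, 40, "Budget-Spender"),
    (28, 50, "Big-Spender"),
    (40, 60, "Big-Spender"),
    (60, 65, "Big-Spender"),
]

# Records whose training label the rules do not reproduce.
mismatches = [
    (age, income, label)
    for age, income, label in TRAINING
    if classify(age, income) != label
]
```

Under this boundary assumption, the only training record the rules do not reproduce is the one at exactly age 60 (60, 65K), which sits on the edge the rules leave open.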

15 Step 2: Model Usage
1 - Test the classifier on labelled test data and measure its accuracy.
Test data:
Age   Income   Class label
27    28K      Budget-Spender
25    36K      Big-Spender
70    45K      Budget-Spender
40    35K      Big-Spender
2 - If the accuracy is acceptable, use the classifier to label unlabelled data.
Unlabelled data:
Age   Income
18    28K
37    40K
60    45K
40    36K
Classified data:
Age   Income   Class label
18    28K      Budget-Spender
37    40K      Big-Spender
60    45K      Budget-Spender
40    36K      Big-Spender
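The two usage steps can be sketched as follows: measure accuracy on the labelled test data, and only label new records if that accuracy is acceptable. The helper names `accuracy` and `classify_if_acceptable`, the 0.75 threshold, and the boundary handling inside `classify` (age 60 and over counts as Budget-Spender, income of exactly 30K counts as Budget-Spender) are assumptions for illustration; the slide does not define what counts as acceptable accuracy.

```python
def classify(age, income_k):
    """Rule-based classifier; income_k is income in thousands."""
    if age >= 60:  # assumption: age exactly 60 is treated as "over 60"
        return "Budget-Spender"
    return "Big-Spender" if income_k > 30 else "Budget-Spender"


def accuracy(classifier, labelled):
    """Fraction of (age, income, label) records the classifier gets right."""
    hits = sum(1 for age, income, label in labelled
               if classifier(age, income) == label)
    return hits / len(labelled)


def classify_if_acceptable(classifier, test_set, unlabelled, threshold=0.75):
    """Label the new records only if test accuracy reaches the threshold."""
    if accuracy(classifier, test_set) < threshold:
        return None  # the model is rejected; go back to model construction
    return [(age, income, classifier(age, income)) for age, income in unlabelled]


# Test data and unlabelled data from the slide (income in K).
TEST_SET = [
    (27, 28, "Budget-Spender"),
    (25, 36, "Big-Spender"),
    (70, 45, "Budget-Spender"),
    (40, 35, "Big-Spender"),
]
UNLABELLED = [(18, 28), (37, 40), (60, 45), (40, 36)]

test_accuracy = accuracy(classify, TEST_SET)
classified = classify_if_acceptable(classify, TEST_SET, UNLABELLED)
```

Keeping the test records separate from the training records matters here: accuracy measured on the training data alone would say little about how the rules behave on new customers.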

16 ASSOCIATION AND SUMMARIZATION
o Association: Association is the discovery of togetherness or connection between objects. Such a connection is expressed as an association rule.
o Summarization: Summarization is the process of representing the collected data in an accurate and compact way without losing essential information. It also involves deriving information from the collected data. Example: Long-distance calls of a customer
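To make the idea of an association rule more concrete, here is a small sketch that measures how often items occur together. Support and confidence are standard measures for association rules, but they are not named on the slide, and the shopping baskets below are made-up data for illustration only.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also
    contain `consequent` (the strength of the rule antecedent -> consequent)."""
    ante = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    return both / ante if ante else 0.0


# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rule_support = support(baskets, {"bread", "milk"})          # 2 of 4 baskets
rule_confidence = confidence(baskets, {"bread"}, {"milk"})  # 2 of the 3 bread baskets
```

A rule such as "bread -> milk" is usually reported only when both its support and its confidence exceed thresholds chosen by the analyst.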

17 Applications of Data Mining Data mining is widely used in diverse areas. A number of commercial data mining systems are available today, yet many challenges remain in this field. Areas where data mining is widely used:
o Retail industry
o Telecommunication industry
o Biomedical and DNA data analysis
o Financial data analysis

18 DATA MINING TOOLS
o Oracle Data Miner: http://www.oracle.com/technology/products/bi/odm/odminer.html
o Data To Knowledge: http://alg.ncsa.uiuc.edu/do/tools/d2k
o SAS: http://www.sas.com/
o Clementine: http://spss.com/clemetine/
o Intelligent Miner: http://www-306.ibm.com/software/data/iminer/
