
1 An Introduction to Data Mining

2 Definition
- Data mining refers to the mining, or discovery, of new information in the form of patterns or rules from vast amounts of data.
- It is the process used to find new, hidden, or unexpected patterns in data in order to predict future business outcomes.
- It is the exploration and analysis of large quantities of data in order to discover valid, novel, potentially useful, and ultimately understandable patterns.

3 Data mining
- The process of semi-automatically analyzing large databases to find patterns that are:
  - valid: they hold on new data with some certainty
  - novel: they are non-obvious to the system
  - useful: it should be possible to act on them
  - understandable: humans should be able to interpret them
- Also known as Knowledge Discovery in Databases (KDD)

4
- Data mining is an analytic process designed to explore data (usually large amounts of data, typically business or market related) in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data.
- The ultimate goal of data mining is prediction. Predictive data mining is the most common type of data mining and the one with the most direct business applications.
- The process of data mining consists of three stages:
  (1) initial exploration,
  (2) model building or pattern identification with validation/verification, and
  (3) deployment (i.e., applying the model to new data in order to generate predictions).
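The three stages, and the idea of validating patterns on a new subset of the data, can be illustrated with a minimal Python sketch. It assumes a toy dataset of (feature, label) pairs and uses a one-feature threshold rule (a decision stump) as a stand-in model. The function names (explore, build_stump, run_pipeline) and the 30% hold-out split are hypothetical choices for illustration, not a prescribed method.

```python
import random
from statistics import mean

def predict(threshold, x):
    """Apply the learned rule: label 1 if x is at or above the threshold."""
    return 1 if x >= threshold else 0

def accuracy(threshold, rows):
    """Fraction of (x, label) pairs that the rule classifies correctly."""
    return sum(predict(threshold, x) == y for x, y in rows) / len(rows)

def explore(rows):
    """Stage 1, exploration: mean feature value for each class label."""
    by_label = {}
    for x, y in rows:
        by_label.setdefault(y, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}

def build_stump(rows):
    """Stage 2, model building: pick the threshold with the best training accuracy."""
    best_t, best_acc = None, -1.0
    for t in sorted({x for x, _ in rows}):
        acc = accuracy(t, rows)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def run_pipeline(rows, new_xs, holdout_fraction=0.3, seed=0):
    """Explore, build, validate on held-out data, then deploy on new records."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    train, holdout = shuffled[:cut], shuffled[cut:]
    summary = explore(train)                                # stage 1
    threshold = build_stump(train)                          # stage 2: build
    validation_accuracy = accuracy(threshold, holdout)      # stage 2: validate on unseen data
    predictions = [predict(threshold, x) for x in new_xs]   # stage 3: deployment
    return summary, threshold, validation_accuracy, predictions

# Hypothetical toy data: the true label is 1 exactly when x >= 50.
rows = [(x, int(x >= 50)) for x in range(100)]
print(run_pipeline(rows, new_xs=[10, 90]))
```

The hold-out accuracy is the "validity" check from slide 3. A pattern that scores well only on the data it was built from has not yet shown that it holds on new data.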

5 DATA MINING
- Data mining refers to extracting, or "mining", knowledge from large amounts of data.
- The term is borrowed from mining, the process of extracting precious material from a large amount of raw material.

6 The KDD process
- Problem formulation
- Data collection
  - subset data: sampling might hurt if the data are highly skewed
  - feature selection: principal component analysis, heuristic search
- Pre-processing: cleaning
  - name/address cleaning, reconciling different terms with the same meaning (annual, yearly), duplicate removal, supplying missing values
- Transformation
  - map complex objects (e.g., time series data) to features (e.g., frequency)
- Choosing the mining task and mining method
- Result evaluation and visualization
Knowledge discovery is an iterative process.
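The transformation step, which maps a time series to frequency features, can be sketched in Python with a naive discrete Fourier transform from the standard library. This is an illustrative assumption, not the presentation's own method. The feature names (mean, dominant_cycles, dominant_strength) and the weekly-sales example are hypothetical, and for long series an FFT library would be the practical choice.

```python
import cmath
import math

def dft_magnitudes(series):
    """Naive O(n^2) discrete Fourier transform; returns |X_k| for k = 0..n//2."""
    n = len(series)
    mags = []
    for k in range(n // 2 + 1):
        total = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(series))
        mags.append(abs(total))
    return mags

def frequency_features(series):
    """Map a series of any length to a small fixed-size feature dictionary."""
    n = len(series)
    mags = dft_magnitudes(series)
    # Skip k = 0, which only reflects the overall level (mean) of the series.
    k_peak = max(range(1, len(mags)), key=lambda k: mags[k])
    return {
        "mean": sum(series) / n,
        "dominant_cycles": k_peak,              # cycles per series length
        "dominant_strength": mags[k_peak] / n,  # half the amplitude of a pure sinusoid
    }

# Hypothetical daily sales over four weeks with a weekly cycle.
sales = [10 + 3 * math.sin(2 * math.pi * day / 7) for day in range(28)]
print(frequency_features(sales))
```

For this series the strongest non-zero frequency is 4 cycles per 28 days (one cycle per week), so a whole series collapses into a few numbers that a mining algorithm can compare across customers or stores.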

7 Knowledge Discovery Process
Phases:
1. Data Selection
2. Data Integration
3. Data Cleaning
4. Enrichment
5. Data Transformation or encoding
6. Data Mining

8 Knowledge Discovery Process
- Data selection: specific items, or categories of items, may be selected, for example from stores in a specific region or area of the country.
- Data integration: multiple data sources are integrated.
- Data cleaning: the cleaning process may then correct invalid zip codes or eliminate records with incorrect phone prefixes.
- Enrichment: the data are typically enhanced with additional sources of information.
- Data transformation and encoding: may be done to reduce the amount of data.
- Data mining: techniques are used to mine different rules and patterns.
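A minimal Python sketch of the selection, integration, cleaning, enrichment, and transformation phases, chained as plain functions. All records, field names, and the zip-to-city lookup are hypothetical. The cleaning rule assumes five-digit US zip codes, and ZIP+4 codes and the phone-prefix check are not handled.

```python
import re

# Hypothetical sales records (first source).
SALES = [
    {"store": "S1", "region": "West", "customer": 1, "amount": 120.0},
    {"store": "S2", "region": "East", "customer": 2, "amount": 80.0},
    {"store": "S1", "region": "West", "customer": 3, "amount": 45.5},
    {"store": "S3", "region": "West", "customer": 4, "amount": 60.0},
]
# Hypothetical customer records (second source), keyed by customer id.
CUSTOMERS = {
    1: {"zip": "94105"},
    3: {"zip": "9410"},   # invalid: only four digits
    4: {"zip": "98101"},
}
# Hypothetical external lookup used for enrichment.
ZIP_TO_CITY = {"94105": "San Francisco", "98101": "Seattle"}

ZIP_RE = re.compile(r"\d{5}")  # assumes five-digit US zip codes

def select(rows, region):
    """Data selection: keep only records from one region."""
    return [r for r in rows if r["region"] == region]

def integrate(rows, customers):
    """Data integration: join sales with customer records on the customer id."""
    return [{**r, **customers[r["customer"]]} for r in rows if r["customer"] in customers]

def clean(rows):
    """Data cleaning: drop records whose zip code is not exactly five digits."""
    return [r for r in rows if ZIP_RE.fullmatch(r["zip"])]

def enrich(rows, zip_to_city):
    """Enrichment: add a city field from an additional information source."""
    return [{**r, "city": zip_to_city.get(r["zip"], "unknown")} for r in rows]

def transform(rows):
    """Transformation: reduce the data to total sales per city."""
    totals = {}
    for r in rows:
        totals[r["city"]] = totals.get(r["city"], 0.0) + r["amount"]
    return totals

result = transform(enrich(clean(integrate(select(SALES, "West"), CUSTOMERS)), ZIP_TO_CITY))
print(result)
```

The reduced per-city totals are what a later mining step would search for rules and patterns.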

9 Knowledge Discovery Process
[Figure: Knowledge Discovery Process. Stages: Selection & Cleaning, Integration, Transformation, Data Mining, Interpretation & Evaluation, Understanding. Data shown: Raw Data, Data Warehouse, Target Data, Transformed Data, Patterns and Rules, Knowledge.]

10 Why Use Data Mining Today?
- Human analysis skills are inadequate because of:
  - the volume and dimensionality of the data
  - the high data growth rate
- Availability of:
  - data
  - storage
  - computational power
  - off-the-shelf software
  - expertise

11 Why Data Mining?
- Credit ratings/targeted marketing:
  - Given a database of 100,000 names, which persons are the least likely to default on their credit cards?
  - Identify likely responders to sales promotions.
- Fraud detection:
  - Which types of transactions are likely to be fraudulent, given the demographics and transactional history of a particular customer?
- Customer relationship management:
  - Which of my customers are likely to be the most loyal, and which are most likely to leave for a competitor?
Data mining helps extract such information.
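One way to approach the credit question "who is least likely to default?" is a k-nearest-neighbour estimate: score each applicant by the share of defaulters among the most similar past customers, then sort. The Python sketch below is an illustration under stated assumptions. The two features (income in $1000s and debt-to-income ratio), the historical records, the scaling factor of 100, and k = 3 are all hypothetical, and a production credit model would need far more care.

```python
import math

# Hypothetical history: (income in $1000s, debt-to-income ratio, defaulted?)
HISTORY = [
    (85, 0.10, False), (92, 0.15, False), (78, 0.20, False), (60, 0.25, False),
    (45, 0.55, True),  (38, 0.60, True),  (52, 0.45, True),  (70, 0.30, False),
    (40, 0.50, True),  (95, 0.05, False),
]

def default_risk(applicant, history, k=3):
    """Estimated default probability: fraction of defaulters among the
    k most similar past customers (k-nearest neighbours)."""
    income, dti = applicant

    def distance(row):
        # Assumed scaling: divide income by 100 so both features have similar ranges.
        return math.hypot((row[0] - income) / 100, row[1] - dti)

    nearest = sorted(history, key=distance)[:k]
    return sum(1 for row in nearest if row[2]) / k

def rank_least_likely_to_default(applicants, history, k=3):
    """Applicant names ordered from lowest to highest estimated risk."""
    return sorted(applicants, key=lambda name: default_risk(applicants[name], history, k))

# Hypothetical new applicants.
applicants = {"A": (90, 0.08), "B": (42, 0.58), "C": (56, 0.40)}
print(rank_least_likely_to_default(applicants, HISTORY))
```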

12 Data Mining Steps in Detail
2.1 Data preprocessing
  - Data selection: identify target datasets and relevant fields
  - Data cleaning
    - remove noise and outliers
  - Data transformation
    - create common units
    - generate new fields
2.2 Data mining model construction
2.3 Model evaluation
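The preprocessing sub-steps (remove outliers, create common units, generate new fields) can be sketched in Python as follows. The product records are hypothetical. The outlier rule is Tukey's fences (values beyond 1.5 times the interquartile range from the quartiles), which is one common choice among several rather than the only option.

```python
from statistics import quantiles

LB_TO_KG = 0.45359237  # exact, by definition of the international pound

def remove_outliers(records, field, k=1.5):
    """Cleaning: drop records outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles([r[field] for r in records], n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in records if low <= r[field] <= high]

def to_common_units(records):
    """Transformation: express every weight in kilograms."""
    out = []
    for r in records:
        weight_kg = r["weight"] * LB_TO_KG if r["unit"] == "lb" else r["weight"]
        out.append({"price": r["price"], "weight_kg": weight_kg})
    return out

def add_price_per_kg(records):
    """Transformation: generate a new derived field."""
    return [{**r, "price_per_kg": r["price"] / r["weight_kg"]} for r in records]

# Hypothetical raw records with mixed units and one suspicious price.
RAW = [
    {"price": 10.0, "weight": 2.0, "unit": "kg"},
    {"price": 12.0, "weight": 5.0, "unit": "lb"},
    {"price": 11.0, "weight": 2.2, "unit": "kg"},
    {"price": 13.0, "weight": 6.0, "unit": "lb"},
    {"price": 12.0, "weight": 2.5, "unit": "kg"},
    {"price": 500.0, "weight": 2.0, "unit": "kg"},  # likely a data-entry error
]
prepared = add_price_per_kg(to_common_units(remove_outliers(RAW, "price")))
print(prepared)
```

Converting to common units before deriving price_per_kg matters. Dividing a price by a weight in pounds for some rows and kilograms for others would produce a field that looks valid but cannot be compared across records.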

13 Preprocessing and Mining
[Figure: Original Data → (Data Integration and Selection) → Target Data → (Preprocessing) → Preprocessed Data → (Model Construction) → Patterns → (Interpretation) → Knowledge]

14 Applications
- Banking: loan/credit card approval
  - predict good customers based on the records of past customers
- Customer relationship management:
  - identify customers who are likely to leave for a competitor
- Targeted marketing:
  - identify likely responders to promotions
- Fraud detection: telecommunications, financial transactions
  - identify fraudulent events in an online stream of events
- Manufacturing and production:
  - automatically adjust control settings ("knobs") when process parameters change
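Flagging suspicious events in an online stream can be sketched with running statistics that are updated one event at a time (Welford's algorithm), so the full history never has to be stored. This Python sketch is one simple technique, not a complete fraud system. The 3-standard-deviation threshold, the 5-event warm-up, and the transaction amounts are hypothetical. Keeping flagged events out of the baseline is a design choice, so that a fraudulent spike does not make later fraud look normal.

```python
import math

class StreamingAnomalyDetector:
    """Flag an event whose amount is more than `threshold` standard deviations
    away from the running mean of the normal events seen so far."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold
        self.warmup = max(2, warmup)  # need at least two events for a sample std
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford)

    def observe(self, amount):
        """Return True if `amount` looks anomalous; otherwise fold it into the statistics."""
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(amount - self.mean) > self.threshold * std:
                return True  # flagged events are kept out of the baseline
        # Welford's online update of the mean and variance.
        self.n += 1
        delta = amount - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (amount - self.mean)
        return False

# Hypothetical transaction amounts arriving one at a time.
detector = StreamingAnomalyDetector()
stream = [20, 25, 22, 19, 24, 21, 23, 950, 22, 20]
flags = [detector.observe(amount) for amount in stream]
print([amount for amount, flagged in zip(stream, flags) if flagged])
```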

15 Applications
- Medicine: disease outcome, effectiveness of treatments
  - analyze patient disease histories to find relationships between diseases
- Molecular/pharmaceutical: identify new drugs
- Scientific data analysis:
  - identify new galaxies by searching for subclusters
- Web site/store design and promotion:
  - find the affinity of visitors to pages and modify the layout accordingly
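Searching for subclusters is a clustering task. The Python sketch below uses plain k-means (Lloyd's algorithm), one common clustering method, on hypothetical 2-D coordinates. The choice of k-means, the fixed k, and the sample points are assumptions for illustration. Real astronomical data would need domain-specific features and a method for choosing the number of clusters.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels

# Hypothetical 2-D positions containing two tight groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```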

16 Application Areas
Industry                 Application
Finance                  Credit card analysis
Insurance                Claims, fraud analysis
Telecommunication        Call record analysis
Transport                Logistics management
Consumer goods           Promotion analysis
Data service providers   Value-added data
Utilities                Power usage analysis

17 Relationship of Data Mining with Other Fields
- Overlaps with machine learning, statistics, artificial intelligence, databases, and visualization, but places more stress on:
  - scalability in the number of features and instances
  - algorithms and architectures, whereas the foundations of methods and formulations are provided by statistics and machine learning
  - automation for handling large, heterogeneous data

18 Data Mining in Use
- The US government uses data mining to track fraud
- A supermarket becomes an information broker
- Basketball teams use it to track game strategy
- Cross selling
- Targeted marketing
- Holding on to good customers
- Weeding out bad customers
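Cross selling is often driven by association rules such as "customers who buy A also buy B". The Python sketch below finds single-item rules A -> B using two standard measures. Support is the fraction of baskets containing both items. Confidence is support(A and B) divided by support(A). The baskets and the 0.3/0.6 thresholds are hypothetical. Full algorithms such as Apriori extend the same counting idea to larger itemsets.

```python
from itertools import combinations

def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Return rules (lhs, rhs, support, confidence) between single items,
    sorted by descending confidence."""
    n = len(transactions)
    item_count, pair_count = {}, {}
    for basket in (set(t) for t in transactions):
        for item in basket:
            item_count[item] = item_count.get(item, 0) + 1
        for a, b in combinations(sorted(basket), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue  # the pair is too rare to act on
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

# Hypothetical shopping baskets.
BASKETS = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["milk", "cereal"],
    ["bread", "butter", "jam"],
]
for lhs, rhs, support, confidence in pair_rules(BASKETS):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule with high confidence but low support may be reliable yet affect too few customers to matter. This is why both thresholds are applied.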

