2 What is Data Mining/KDD
Data mining (knowledge discovery from data)
  Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
3 What is Data Mining
By definition, data mining is the process of extracting previously unknown information from large databases and using it to make organisational decisions.
It is concerned with the discovery of hidden knowledge.
It usually works on large volumes of data.
It is useful in making critical organisational decisions, particularly those of a strategic nature.
4 Data Mining
Data mining has been referred to by a number of names:
  Data Fishing, Data Dredging (1960…): used by statisticians (as a pejorative name)
  Knowledge Discovery in Databases (1989…): used by the AI and machine learning community
  Business Intelligence (1990…): business management term
  Also data archaeology, information harvesting, information discovery, knowledge extraction, data/pattern analysis, etc.
5 Data Mining: On What Kinds Of Data?
Relational database
Data warehouse
Transactional database
Advanced database and information repository
  Object-relational database
  Spatial and temporal data
  Time-series data
  Stream data
  Multimedia database
  Text databases & WWW
6 Data Mining Functionalities
Concept description
  Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
Association (correlation and causality)
  Nappies & Beer
Classification and Prediction
  Construct models that describe and distinguish classes or concepts for future prediction
  Predict some unknown or missing numerical values
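The "Nappies & Beer" association can be made concrete with the two standard rule measures, support and confidence. The sketch below is illustrative only: the baskets are invented, and the helper names `support` and `confidence` are hypothetical, not taken from the slides.

```python
# Hypothetical market baskets; the items and counts are invented for illustration.
transactions = [
    {"nappies", "beer", "crisps"},
    {"nappies", "beer"},
    {"nappies", "milk"},
    {"beer", "crisps"},
    {"nappies", "beer", "milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent) over the baskets."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


# Rule under test: nappies -> beer
rule_support = support({"nappies", "beer"}, transactions)
rule_confidence = confidence({"nappies"}, {"beer"}, transactions)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

A rule is usually reported only when both measures exceed user-chosen thresholds. High confidence alone says nothing about causality, which is why the slide lists correlation and causality separately.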
7 Data Mining Functionalities
Cluster analysis
  Class label is unknown: group data to form new classes, e.g., cluster houses to find distribution patterns
Outlier analysis
  Outlier: a data object that does not comply with the general behavior of the data
  Noise or exception? No! Outliers are useful in fraud detection and rare event analysis
Other pattern-directed or statistical analyses
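To illustrate grouping data when no class label is available, here is a minimal k-means sketch in plain Python. K-means is only one of many clustering methods and is an assumption here, since the slide does not name an algorithm. The house data and the choice of initial centroids are likewise invented.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical houses as (distance to city centre in km, price in 100k units).
houses = [(1.0, 9.0), (1.5, 8.5), (2.0, 9.5),
          (10.0, 3.0), (11.0, 2.5), (12.0, 3.5)]

# Initial centroids are picked by hand so the example is deterministic.
centroids, clusters = kmeans(houses, centroids=[houses[0], houses[3]])
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

The two groups that emerge (central, expensive houses and remote, cheaper ones) were never labelled in the input, which is the point of the "class label is unknown" bullet.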
8 Data Mining is Multidisciplinary
[Diagram: data mining draws on statistics, pattern recognition, neurocomputing, machine learning, AI, databases, and KDD]
9 Why We Need Data Mining
Data explosion problem
  Automated data collection tools and mature database technology lead to huge amounts of data being accumulated
  We are drowning in data, but starving for knowledge!
Solution: data warehousing and data mining
  Data warehousing and on-line analytical processing
  Mining interesting knowledge (rules, regularities, patterns, constraints) from data in large databases
10 Potential Applications
Data analysis and decision support
  Market analysis and management
  Risk analysis and management
  Fraud detection and detection of unusual patterns
Other applications
  Text mining (e.g., documents) and Web mining
  Stream data mining
  DNA and bio-data analysis
11 Stages of KDD
[Diagram: stages of the KDD process, from bottom to top: Databases, Cleaning & Integration, Data Warehouse, Selection & Transformation, Data Mining, Evaluation & Presentation, Knowledge]
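The stages named on this slide can be read as a pipeline in which each step consumes the output of the previous one. The sketch below is a toy rendering under that assumption. The records, the function names, and the use of a frequency count as the "mining" step are all hypothetical stand-ins.

```python
from collections import Counter

# Hypothetical raw records from two source databases; None marks a missing value.
store_a = [
    {"customer": "c1", "item": "beer", "amount": 4.0},
    {"customer": "c2", "item": "nappies", "amount": None},
]
store_b = [
    {"customer": "c1", "item": "nappies", "amount": 9.5},
    {"customer": "c3", "item": "beer", "amount": 4.0},
]


def clean_and_integrate(*sources):
    """Cleaning & integration: merge the sources, dropping incomplete records."""
    return [r for src in sources for r in src
            if all(v is not None for v in r.values())]


def select_and_transform(warehouse):
    """Selection & transformation: keep only the task-relevant attribute."""
    return [r["item"] for r in warehouse]


def mine(items):
    """Data mining: a frequency count stands in for a real mining algorithm."""
    return Counter(items)


def evaluate_and_present(patterns, min_count=2):
    """Evaluation & presentation: keep patterns above an interestingness threshold."""
    return {item: n for item, n in patterns.items() if n >= min_count}


warehouse = clean_and_integrate(store_a, store_b)
knowledge = evaluate_and_present(mine(select_and_transform(warehouse)))
print(knowledge)
```

In practice the process is iterative rather than strictly linear: an unsatisfying evaluation usually sends the analyst back to selection or cleaning.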
12 Issues and Challenges of Data Mining
Data mining methodology
  Mining different kinds of knowledge from diverse data types, e.g., bio, stream, Web
  Performance: efficiency, effectiveness, and scalability
  Pattern evaluation: the interestingness problem
  Incorporation of background knowledge
  Handling noise and incomplete data
  Parallel, distributed and incremental mining methods
  Integration of the discovered knowledge with existing knowledge: knowledge fusion
13 Issues and Challenges of Data Mining
User interaction
  Data mining query languages and ad-hoc mining
  Expression and visualization of resultant knowledge
  Interactive mining of knowledge at multiple levels of abstraction
Applications and social impacts
  Domain-specific data mining & invisible data mining
  Protection of data security, integrity, and privacy
14 Market Analysis And Management
Where does the data come from?
  Credit card transactions, loyalty cards, discount coupons, customer complaint calls, etc.
Target marketing
  Find clusters of “model” customers who share the same characteristics
  Determine customer purchasing patterns over time
Cross-market analysis
  Associations/correlations between product sales, and prediction based on such associations
15 Market Analysis And Management (cont…)
Customer profiling
  What types of customers buy what products (clustering or classification)
Customer requirement analysis
  Identifying the best products for different customers
  Predict what factors will attract new customers
Provision of summary information
  Multidimensional summary reports
  Statistical summary information (data central tendency and variation)
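As a small illustration of statistical summary information, the sketch below groups invented sales records by one dimension (region) and reports central tendency (mean, median) and variation (sample standard deviation) for each group. The data and field layout are assumptions made for the example.

```python
import statistics
from collections import defaultdict

# Hypothetical sales records: (region, product, amount).
sales = [
    ("north", "beer", 10.0),
    ("north", "nappies", 12.0),
    ("north", "beer", 14.0),
    ("south", "beer", 5.0),
    ("south", "nappies", 5.0),
    ("south", "beer", 20.0),
]

# Group the amounts along one dimension of the data.
by_region = defaultdict(list)
for region, _product, amount in sales:
    by_region[region].append(amount)

summary = {
    region: {
        "mean": statistics.mean(amounts),      # central tendency
        "median": statistics.median(amounts),  # central tendency, robust to extremes
        "stdev": statistics.stdev(amounts),    # variation (sample standard deviation)
    }
    for region, amounts in by_region.items()
}

for region, stats in summary.items():
    print(region, stats)
```

Grouping by product instead of region, or by both, gives the other cells of a multidimensional summary report. In the invented "south" data the mean and median differ noticeably, a reminder that a single large sale can pull the mean.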
16 Corporate Analysis & Risk Management
Finance planning and asset evaluation
  Cash flow analysis and prediction
  Contingent claim analysis to evaluate assets
  Cross-sectional and time series analysis (financial-ratio, trend analysis, etc.)
Resource planning
  Summarize and compare the resources and spending
Competition
  Monitor competitors and market directions
  Group customers into classes and apply a class-based pricing procedure
  Set pricing strategy in a highly competitive market
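One simple form of trend analysis for cash flow prediction is to fit a straight line to past values and extrapolate one period ahead. The sketch below uses ordinary least squares. The method, the function name `linear_trend`, and the figures are assumptions for illustration, not something the slides prescribe.

```python
def linear_trend(values):
    """Least-squares slope and intercept for values observed at t = 0, 1, 2, ..."""
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    numerator = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    denominator = sum((t - t_mean) ** 2 for t in range(n))
    slope = numerator / denominator
    intercept = v_mean - slope * t_mean
    return slope, intercept


# Hypothetical quarterly cash flow figures.
cash_flow = [100.0, 110.0, 120.0, 130.0]

slope, intercept = linear_trend(cash_flow)
# Extrapolate the fitted line to the next (unobserved) period.
forecast = intercept + slope * len(cash_flow)
print(f"trend per period={slope:.1f}, next period forecast={forecast:.1f}")
```

A straight line ignores seasonality and shocks, so real cash flow models are usually richer. The sketch only shows how a trend turns historical data into a prediction.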
17 Fraud Detection & Mining Unusual Patterns
Applications: health care, retail, credit card service, telecommunications
  Auto insurance: ring of collisions
  Money laundering: suspicious monetary transactions
  Medical insurance
    Professional patients, ring of doctors, and ring of references
    Unnecessary or correlated screening tests
  Telecommunications: phone-call fraud
    Phone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm
  Retail industry
    Analysts estimate that 38% of retail shrink is due to dishonest employees
  Anti-terrorism
Approaches: clustering, model construction, outlier analysis, etc.
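The idea of flagging calls that deviate from an expected norm can be sketched with a robust outlier score. The example below models only call duration, ignoring destination and time of day. It uses the modified z-score based on the median absolute deviation (MAD) with the commonly quoted cutoff of 3.5. The technique, the function name `flag_unusual`, and the call data are assumptions for illustration.

```python
import statistics


def flag_unusual(history, new_calls, threshold=3.5):
    """Return the new call durations that deviate strongly from the caller's norm.

    The norm is the median of past durations; deviation is measured with the
    modified z-score 0.6745 * (x - median) / MAD.
    """
    median = statistics.median(history)
    mad = statistics.median([abs(x - median) for x in history])
    if mad == 0:
        # All past calls are (nearly) identical: flag anything that differs.
        return [x for x in new_calls if x != median]
    return [x for x in new_calls
            if abs(0.6745 * (x - median) / mad) > threshold]


# Hypothetical call durations in minutes for one subscriber.
history = [3.0, 4.0, 5.0, 4.5, 3.5, 4.0, 5.5]
new_calls = [4.2, 55.0, 3.8]

suspicious = flag_unusual(history, new_calls)
print(suspicious)
```

A flagged call is a candidate for review, not proof of fraud. A fuller phone call model would score destination and time of day in the same way and combine the signals.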