Lecture-2 Bscshelp.com


- Why Data Mining and What Kinds of Data Can Be Mined?
- Potential Applications

- Huge volumes of data are available: from terabytes to petabytes
  - Data collection and data availability
    - Automated data collection tools, database systems, the Web, a computerized society
  - Major sources of abundant data
    - Business: Web, e-commerce, transactions, stocks, …
    - Science: remote sensing, bioinformatics, scientific simulation, …
    - Society and everyone: news, digital cameras, YouTube
    - Medical data, demographic data, financial data, and marketing data
- We are drowning in data, but starving for knowledge!
- “Necessity is the mother of invention”: data mining is the automated analysis of massive data sets

- Data analysis and decision support
  - Market analysis and management
    - Target marketing, customer relationship management (CRM), market basket analysis, cross-selling, market segmentation
  - Risk analysis and management
    - Forecasting, customer retention, quality control, competitive analysis
  - Fraud detection and detection of unusual patterns (outliers)
- Other applications
  - Text mining (newsgroups, email, documents) and Web mining
  - Stream data mining
  - Bioinformatics and bio-data analysis

- Where does the data come from? Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies
- Target marketing
  - Find clusters of “model” customers who share the same characteristics: interests, income level, spending habits, etc.
  - Determine customer purchasing patterns over time
- Cross-market analysis: find associations/correlations between product sales, and predict based on such associations
- Customer profiling: what types of customers buy what products (clustering or classification)
- Customer requirement analysis
  - Identify the best products for different groups of customers
  - Predict what factors will attract new customers
- Provision of summary information
  - Multidimensional summary reports
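The cross-market analysis bullet can be illustrated with a minimal sketch that counts how often two products appear in the same purchase. The transactions and the function name `pair_counts` are hypothetical and chosen only for illustration; real market basket analysis would typically use a dedicated association-mining algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set holds the products of one purchase.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "chips", "milk"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair one canonical (alphabetical) form.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(transactions)
# Pairs that are bought together most often come first.
print(counts.most_common(3))
```

Pairs with high counts are candidates for cross-selling; whether a pair is actually interesting still depends on how common each product is on its own.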

- Finance planning and asset evaluation
  - Cash flow analysis and prediction
- Resource planning
  - Summarize and compare resources and spending
- Competition
  - Monitor competitors and market directions
  - Group customers into classes and apply a class-based pricing procedure
  - Set pricing strategy in a highly competitive market

- Approaches: clustering and model construction for frauds, outlier analysis
- Applications: health care, retail, credit card services, telecommunications
  - Money laundering: suspicious monetary transactions
  - Telecommunications: phone-call fraud
    - Phone call model: destination of the call, duration, time of day or week. Analyze patterns that deviate from an expected norm
  - Retail industry
    - Analysts estimate that 38% of retail shrink is due to dishonest employees
  - Anti-terrorism
- Anti-terrorism
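As a rough illustration of "patterns that deviate from an expected norm", the sketch below flags call durations whose z-score is large. The data, the function name `flag_unusual`, and the threshold of 2.0 are assumptions made for this example; a production fraud model would use more features (destination, time of day) and a more robust method.

```python
from statistics import mean, stdev

def flag_unusual(durations, threshold=2.0):
    """Return values whose absolute z-score exceeds the threshold."""
    mu = mean(durations)
    sigma = stdev(durations)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [d for d in durations if abs(d - mu) / sigma > threshold]

# Hypothetical call durations (minutes) for one subscriber.
calls = [3, 4, 5, 4, 3, 5, 4, 4, 3, 60]
unusual = flag_unusual(calls)
print(unusual)
```

Note that a single extreme value inflates both the mean and the standard deviation, so z-scores on small samples are only a first-pass filter.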

- Approaches: clustering and classification
- Applications:
  - Automated diagnosis
  - Discovery of disease trends
  - Prediction of epidemics
  - Discovering causes for certain conditions
  - Patient data retrieval

- Data mining is a multidisciplinary field that borrows from areas including:
  - database technology
  - machine learning
  - statistics
  - pattern recognition
  - information retrieval
  - neural networks
  - knowledge-based systems
  - artificial intelligence
  - high-performance computing
  - data visualization

[Figure: data mining at the confluence of multiple disciplines: database technology, statistics, machine learning, pattern recognition, algorithms, visualization, and other disciplines.]

- Database systems research focuses on the creation, maintenance, and use of databases for organizations and end users.
- Database systems are well known for their high scalability in processing very large, relatively structured data sets.
- Many data mining tasks need to handle large data sets or even real-time, fast streaming data.
- Data mining can therefore make good use of scalable database technologies to achieve high efficiency and scalability on large data sets.
- A data warehouse integrates data originating from multiple sources and various time frames.

- Machine learning investigates how computers can learn (or improve their performance) based on data.
- A main research area is for computer programs to automatically learn to recognize complex patterns and make intelligent decisions based on data.
- Supervised learning, unsupervised learning, semi-supervised learning, and active learning are classic problems in machine learning that are highly related to data mining.

Supervised learning:
- Essentially a synonym for classification
- The supervision in the learning comes from the labeled examples in the training data set
- Example: the postal code recognition problem, where handwritten postal codes are recognized using labeled training images
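To make the role of labeled examples concrete, here is a minimal sketch of one supervised method, a 1-nearest-neighbor classifier. The choice of method, the toy data, and the name `nearest_neighbor_predict` are assumptions for illustration; the slides do not prescribe a particular classifier.

```python
import math

def nearest_neighbor_predict(training, query):
    """Return the label of the training example closest to the query point."""
    best_label, best_dist = None, math.inf
    for features, label in training:
        dist = math.dist(features, query)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Hypothetical labeled examples: (feature vector, class label).
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((8.0, 8.0), "high"),
    ((9.0, 7.5), "high"),
]

print(nearest_neighbor_predict(training, (2.0, 1.0)))
print(nearest_neighbor_predict(training, (7.0, 9.0)))
```

The labels in `training` are the "supervision": without them the function would have nothing to return.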

Unsupervised learning:
- Essentially a synonym for clustering
- The learning process is unsupervised because the input examples are not class labeled.
- We may use clustering to discover classes within the data.
- Example: an unsupervised learning method can take, as input, a set of images of handwritten digits. Suppose that it finds 10 clusters of data. These clusters may correspond to the 10 distinct digits 0 to 9, respectively. However, because the training data are not labeled, the learned model cannot tell us the semantic meaning of the clusters found.
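A small sketch of clustering without labels, using Lloyd's k-means algorithm on one-dimensional data. The data, the starting centers, and the name `kmeans_1d` are assumptions for illustration; the slides do not name a specific clustering algorithm.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd's algorithm on 1-D data, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centers, clusters = kmeans_1d(values, centers=[0.0, 6.0])
print(centers, clusters)
```

The result is two groups identified only by index 0 and 1. As the last bullet notes, nothing in the output says what either group means.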

Semi-supervised learning:
- A class of machine learning techniques that make use of both labeled and unlabeled examples when learning a model
- Labeled examples are used to learn class models, and unlabeled examples are used to refine the boundaries between classes.
- For a two-class problem, we can think of the set of examples belonging to one class as the positive examples and those belonging to the other class as the negative examples.
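One way to picture "unlabeled examples refine the boundary" is self-training with a nearest-centroid classifier, sketched below on one-dimensional data. Self-training is only one of several semi-supervised techniques and is swapped in here as an assumption; the data and the name `self_train` are hypothetical.

```python
def centroid(points):
    """Mean of a non-empty list of numbers."""
    return sum(points) / len(points)

def self_train(labeled, unlabeled, rounds=5):
    """Refine two class centroids using pseudo-labeled unlabeled points.

    labeled: dict with keys "pos" and "neg", each a non-empty list of values.
    Returns the final centroids and the decision boundary between them.
    """
    # Class models learned from the labeled examples only.
    centers = {c: centroid(pts) for c, pts in labeled.items()}
    for _ in range(rounds):
        assigned = {c: list(pts) for c, pts in labeled.items()}
        # Pseudo-label every unlabeled point with its nearest class centroid.
        for x in unlabeled:
            nearest = min(centers, key=lambda c: abs(x - centers[c]))
            assigned[nearest].append(x)
        centers = {c: centroid(pts) for c, pts in assigned.items()}
    boundary = (centers["pos"] + centers["neg"]) / 2
    return centers, boundary

labeled = {"neg": [1.0], "pos": [9.0]}
unlabeled = [2.0, 3.0, 3.5, 7.0, 8.0]
centers, boundary = self_train(labeled, unlabeled)
print(centers, boundary)
```

With the two labeled points alone, the boundary would sit at 5.0; the unlabeled points pull the centroids toward where the data actually lies and shift the boundary accordingly.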


Active learning:
- It lets users play an active role in the learning process.
- An active learning approach can ask a user (e.g., a domain expert) to label an example, which may be from a set of unlabeled examples or synthesized by the learning program.
- The goal is to optimize the model quality by actively acquiring knowledge from human users, given a constraint on how many examples they can be asked to label.
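The labeling budget can be sketched with a deliberately simple learner: a one-dimensional threshold classifier that always queries the example it is least sure about. The function name `active_learn`, the simulated expert `oracle`, and the assumption that the two classes are perfectly separable by a threshold are all hypothetical simplifications.

```python
def active_learn(pool, oracle, budget):
    """Learn a 1-D threshold with at most `budget` label queries.

    pool: sorted list of unlabeled values.
    oracle(x): stands in for the human expert, returns True for positives.
    Assumes every positive value is larger than every negative value.
    Returns (lo, hi, queries): the first positive index lies in [lo, hi].
    """
    lo, hi = 0, len(pool)
    queries = 0
    while lo < hi and queries < budget:
        mid = (lo + hi) // 2  # most uncertain example: middle of the open region
        queries += 1
        if oracle(pool[mid]):
            hi = mid          # threshold is at or before mid
        else:
            lo = mid + 1      # threshold is after mid
    return lo, hi, queries

pool = list(range(100))
expert = lambda x: x >= 37    # hypothetical ground truth held by the expert

print(active_learn(pool, expert, budget=10))
print(active_learn(pool, expert, budget=3))
```

With a generous budget the threshold is pinned down after a handful of questions instead of 100 labels; with a tight budget the learner returns the narrowest interval it could establish.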

- Statistics studies the collection, analysis, interpretation or explanation, and presentation of data.
- Data mining has an inherent connection with statistics.
- A statistical model is a set of mathematical functions that describe the behavior of the objects in a target class in terms of random variables and their associated probability distributions.
- Statistical models are widely used to model data and data classes.
- For example, in data mining tasks such as data characterization and classification, statistical models of target classes can be built.
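A statistical model of a target class can be as simple as one Gaussian per class. The sketch below fits a mean and standard deviation to each class and classifies a new value by the higher density. The data, the function names, and the assumptions of normality and equal class priors are illustrative only.

```python
import math
from statistics import mean, stdev

def fit_gaussian(samples):
    """Estimate the parameters (mean, standard deviation) of one class."""
    return mean(samples), stdev(samples)

def log_density(x, mu, sigma):
    """Log of the normal probability density at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def classify(x, models):
    """Pick the class whose model gives x the highest density (equal priors)."""
    return max(models, key=lambda label: log_density(x, *models[label]))

# Hypothetical yearly spending for two customer classes.
data = {
    "budget": [200, 220, 250, 230, 210],
    "premium": [900, 950, 1000, 980, 920],
}
models = {label: fit_gaussian(values) for label, values in data.items()}

print(classify(240, models))
print(classify(940, models))
```

The fitted `(mean, stdev)` pairs double as a characterization of each class, and the same models drive classification.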

- Information retrieval (IR) is the science of searching for documents or for information in documents.
- Documents can be text or multimedia, and may reside on the Web.
- Traditional information retrieval differs from database systems in two ways. Information retrieval assumes that:
  1. the data under search are unstructured;
  2. the queries are formed mainly by keywords, which do not have complex structures (unlike SQL queries in database systems).
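Keyword queries over unstructured text are commonly served by an inverted index. The sketch below is a minimal version with whitespace tokenization and AND semantics; the documents and the function names `build_index` and `search` are hypothetical, and real IR systems add ranking, stemming, and stop-word handling.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every keyword in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

documents = {
    1: "data mining finds patterns in large data sets",
    2: "information retrieval searches documents by keywords",
    3: "text mining combines retrieval and mining of documents",
}
index = build_index(documents)

print(search(index, "mining documents"))
print(search(index, "retrieval documents"))
```

Unlike an SQL query, the query string carries no structure beyond a bag of keywords, which is the second assumption listed above.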

- Increasingly large amounts of text and multimedia data have been accumulated and made available online due to the fast growth of the Web and applications such as digital libraries, digital governments, and health care information systems.
- Their effective search and analysis have raised many challenging issues in data mining.
- Therefore, text mining and multimedia data mining, integrated with information retrieval methods, have become increasingly important.

- Pattern recognition is the study of methods and algorithms for putting data objects into categories.
- Pattern recognition is an application of machine learning.
- Pattern recognition systems are in many cases trained from labeled "training" data (supervised learning), but when no labeled data are available, other algorithms can be used to discover previously unknown patterns (unsupervised learning).

- An artificial neural network (ANN), often just called a "neural network" (NN), is a mathematical or computational model based on biological neural networks; in other words, it is an emulation of a biological neural system.
- It consists of an interconnected group of artificial neurons and processes information using a connectionist approach to computation.
- In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase.
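The adaptive behavior can be seen in the smallest possible network, a single artificial neuron trained with the perceptron rule. This is a sketch under stated assumptions: the task (logical AND), the learning rate, and the function names are chosen for illustration, and here "changing structure" means adjusting connection weights rather than adding or removing neurons.

```python
def predict(weights, bias, x1, x2):
    """Fire (1) if the weighted input sum exceeds zero, else stay silent (0)."""
    return 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0

def train_perceptron(samples, epochs=20, rate=1.0):
    """Perceptron learning rule for one neuron with two inputs."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            error = target - predict(weights, bias, x1, x2)
            # Adapt the connection strengths in proportion to the error.
            weights[0] += rate * error * x1
            weights[1] += rate * error * x2
            bias += rate * error
    return weights, bias

# Truth table of logical AND as training data.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)

for (x1, x2), target in and_samples:
    print((x1, x2), predict(weights, bias, x1, x2), target)
```

A single neuron can only learn linearly separable functions such as AND; larger networks connect many such units.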

- Data mining is the core of the knowledge discovery process.

[Figure: the knowledge discovery process: databases, data cleaning and data integration, data warehouse, selection of task-relevant data, data mining, pattern evaluation.]

- Learning the application domain
  - Relevant prior knowledge and goals of the application
- Creating a target data set: data selection
- Data cleaning and preprocessing (may take 60% of the effort!)
- Data reduction and transformation
  - Find useful features, dimensionality/variable reduction
- Choosing functions of data mining
  - Summarization, classification, regression, association, clustering
- Choosing the mining algorithm(s)
- Data mining: search for patterns of interest
- Pattern evaluation and knowledge presentation
  - Visualization, transformation, removing redundant patterns, etc.
- Use of discovered knowledge
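The steps above can be sketched as a small pipeline of functions, one per stage. Everything here is a toy assumption: the records, the field names, the age split, and the spending threshold are invented to show how the stages hand data to one another, not how a real KDD system is built.

```python
# Hypothetical raw records with one missing value and one duplicate.
raw_records = [
    {"customer": "a", "age": 25, "spend": 120.0},
    {"customer": "b", "age": None, "spend": 80.0},
    {"customer": "c", "age": 41, "spend": 300.0},
    {"customer": "c", "age": 41, "spend": 300.0},
    {"customer": "d", "age": 38, "spend": 260.0},
]

def clean(records):
    """Data cleaning: drop records with missing values and exact duplicates."""
    seen, result = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if None in r.values() or key in seen:
            continue
        seen.add(key)
        result.append(r)
    return result

def select(records, fields):
    """Data selection and reduction: keep only task-relevant attributes."""
    return [{f: r[f] for f in fields} for r in records]

def mine(records, age_split=30):
    """Data mining (summarization): average spend per age group."""
    groups = {}
    for r in records:
        group = "under_30" if r["age"] < age_split else "30_plus"
        groups.setdefault(group, []).append(r["spend"])
    return {g: sum(v) / len(v) for g, v in groups.items()}

def evaluate(patterns, min_spend=200.0):
    """Pattern evaluation: keep groups above an assumed interest threshold."""
    return {g: avg for g, avg in patterns.items() if avg >= min_spend}

cleaned = clean(raw_records)
target = select(cleaned, ["age", "spend"])
patterns = mine(target)
interesting = evaluate(patterns)
print(interesting)
```

Even in this toy version, most of the code deals with preparing the data rather than mining it, which matches the effort estimate in the list.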

- Data mining may generate thousands of patterns; not all of them are interesting
- Interestingness measures
  - A pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm
- Objective vs. subjective interestingness measures
  - Objective (data driven): based on statistics and structures of patterns, e.g., support, confidence, etc.
  - Subjective (user driven): based on the user's belief in the data, e.g., unexpectedness, novelty, actionability, etc.
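Support and confidence, the two objective measures named above, can be computed directly from transaction counts. In this sketch the transactions and function names are hypothetical; support is the fraction of transactions containing an itemset, and confidence of a rule A => B is the fraction of transactions containing A that also contain B.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(transactions, itemset):
    """Fraction of all transactions that contain the itemset."""
    return count(transactions, itemset) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    base = count(transactions, antecedent)
    if base == 0:
        return 0.0  # rule never applies, so report zero confidence
    both = count(transactions, set(antecedent) | set(consequent))
    return both / base

transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

print(support(transactions, {"bread", "milk"}))
print(confidence(transactions, {"bread"}, {"milk"}))
```

A mining system would typically discard rules whose support or confidence falls below user-chosen minimums, which is how the thousands of candidate patterns get pruned.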

- What Kinds of Patterns Can Be Mined?