CS690L - Lecture 6 1 CS690L Data Mining and Knowledge Discovery Overview Yugi Lee STB #555 (816) 235-5932 This.

Slides:



Advertisements
Similar presentations
An Introduction to Data Mining
Advertisements

Introduction to KDD.
Overview of Data Mining & The Knowledge Discovery Process Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
1 Introduction and Review CS 636 – Adv. Data Mining.
Week 9 Data Mining System (Knowledge Data Discovery)
Data Mining: Concepts and Techniques
© Prentice Hall1 DATA MINING TECHNIQUES Introductory and Advanced Topics Eamonn Keogh (some slides adapted from) Margaret Dunham Dr. M.H.Dunham, Data Mining,
Data Mining By Archana Ketkar.
Data Mining – Intro.
Advanced Database Applications Database Indexing and Data Mining CS591-G1 -- Fall 2001 George Kollios Boston University.
Data Mining.
Business Intelligence
CIT 858: Data Mining and Data Warehousing Course Instructor: Bajuna Salehe Web:
Data Mining: Concepts & Techniques. Motivation: Necessity is the Mother of Invention Data explosion problem –Automated data collection tools and mature.
LÊ QU Ố C HUY ID: QLU OUTLINE  What is data mining ?  Major issues in data mining 2.
OLAM and Data Mining: Concepts and Techniques. Introduction Data explosion problem: –Automated data collection tools and mature database technology lead.
Data Mining : Introduction Chapter 1. 2 Index 1. What is Data Mining? 2. Data Mining Functionalities 1. Characterization and Discrimination 2. MIning.
Lingma Acheson Department of Computer and Information Science, IUPUI
Data Warehouse Fundamentals Rabie A. Ramadan, PhD 2.
Shilpa Seth.  What is Data Mining What is Data Mining  Applications of Data Mining Applications of Data Mining  KDD Process KDD Process  Architecture.
Chapter 1. Introduction Motivation: Why data mining?
Kansas State University Department of Computing and Information Sciences CIS 830: Advanced Topics in Artificial Intelligence From Data Mining To Knowledge.
Data Mining Chun-Hung Chou
Data Management Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition.
Data Mining: Concepts and Techniques
Spatial Statistics and Spatial Knowledge Discovery First law of geography [Tobler]: Everything is related to everything, but nearby things are more related.
Data Mining Techniques As Tools for Analysis of Customer Behavior Lecture 2:
Chapter 1 Introduction to Data Mining
Knowledge Discovery and Data Mining Evgueni Smirnov.
Data Mining Chapter 1 Introduction -- Basic Data Mining Tasks -- Related Concepts -- Data Mining Techniques.
DATA MINING 1. 2 Data Mining Extracting or “mining” knowledge from large amounts of data Data mining is the process of autonomously retrieving useful.
2015年10月18日星期日 2015年10月18日星期日 2015年10月18日星期日 Introduction to Data Mining 1 Chapter 1 Introduction to Data Mining Chen. Chun-Hsien Department of Information.
October 18, 2015 Data Mining: Concepts and Techniques 1 DATA MINING Motivation: Why data mining? What is data mining? Data Mining: On what kind of data?
Data Mining – Intro. Course Overview Spatial Databases Temporal and Spatio-Temporal Databases Multimedia Databases Data Mining.
Advanced Database Course (ESED5204) Eng. Hanan Alyazji University of Palestine Software Engineering Department.
Chapter 5: Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization DECISION SUPPORT SYSTEMS AND BUSINESS.
Data Mining: Concepts and Techniques. Overview 1.Introduction 2.Data Preprocessing 3.Data Warehouse and OLAP Technology: An Introduction 4.Advanced Data.
1 Introduction to Data Mining C hapter 1. 2 Chapter 1 Outline Chapter 1 Outline – Background –Information is Power –Knowledge is Power –Data Mining.
January 17, 2016Data Mining: Concepts and Techniques 1 What Is Data Mining? Data mining (knowledge discovery from data) Extraction of interesting ( non-trivial,
Conclusions. Why Data Mining? -- Potential Applications Database analysis and decision support – Market analysis and management target marketing, customer.
Data Mining and Decision Support
February 13, 2016 Data Mining: Concepts and Techniques 1 1 Data Mining: Concepts and Techniques These slides have been adapted from Han, J., Kamber, M.,
Waqas Haider Bangyal. 2 Source Materials “ Data Mining: Concepts and Techniques” by Jiawei Han & Micheline Kamber, Second Edition, Morgan Kaufmann, 2006.
Data Warehousing/Mining 1. 2 Chapter 1. Introduction v Motivation: Why data mining? v What is data mining? v Data Mining: On what kind of data? v Data.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 28 Data Mining Concepts.
2016年6月12日星期日 2016年6月12日星期日 2016年6月12日星期日 Introduction to Data Mining 1 Chapter 1 Introduction to Data Mining Chen. Chun-Hsien Department of Information.
Lecture-2 Bscshelp.com.  Why Data Mining and What Kinds of Data Can Be Mined?  Potential Applications 2.
Data Mining – Introduction (contd…) Compiled By: Umair Yaqub Lecturer Govt. Murray College Sialkot.
CS570: Data Mining Spring 2010, TT 1 – 2:15pm Li Xiong.
July 7, 2016 Data Mining: Concepts and Techniques 1 1.
Data Mining - Introduction Compiled By: Umair Yaqub Lecturer Govt. Murray College Sialkot.
The KDD Process for Extracting Useful Knowledge from Volumes of Data Fayyad, Piatetsky-Shapiro, and Smyth Ian Kim SWHIG Seminar.
There is an inherent meaning in everything. “Signs for people who can see.”
Data Mining Functionalities
Data Mining: Concepts and Techniques (3rd ed.) — Chapter 1 —
Data Mining – Intro.
Data Mining.
Data warehouse & Data Mining: Concepts and Techniques
Introduction C.Eng 714 Spring 2010.
Datamining : Refers to extracting or mining knowledge from large amounts of data Applications : Market Analysis Fraud Detection Customer Retention Production.
Chapter 3 Introduction to Data Mining
Introduction to Data Mining
Data Warehousing and Data Mining
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Data Mining Concepts and Techniques
Data Warehousing Data Mining Privacy
Data Mining: Concepts and Techniques
CSE591: Data Mining by H. Liu
Presentation transcript:

CS690L - Lecture 6 1 CS690L Data Mining and Knowledge Discovery Overview Yugi Lee STB #555 (816) This lecture was designed based on [Zaïane, 1999]

2 CS690L - Lecture 6 Data Rich and Information Poor Swamped by data that continuously pours on us. –Technology is available to help us collect data (e.g., Bar code, scanners, satellites, cameras, etc.) –Technology is available to help us store data (e.g., Databases, data warehouses, variety of repositorie, etc) Starving for knowledge (competitive edge, research, etc.) –We do not know what to do with this data –We need to interpret this data in search for new knowledge

3 CS690L - Lecture 6 Evolution of Database Technology 1950s: First computers, use of computers for census 1960s: Data collection, database creation (hierarchical and network models) 1970s: Relational data model, relational DBMS implementation. 1980s: Ubiquitous RDBMS, advanced data models (extendedrelational, OO, deductive, etc.) and application- oriented DBMS (spatial, scientific, engineering, etc.). 1990s: Data mining and data warehousing, massive media digitization, multimedia databases, and Web technology. 2000s: Web mining, Semi-structure data mining (XML) and Semantic data mining (RDF)

4 CS690L - Lecture 6 Knowledge Discovery Process of non trivial extraction of implicit, previously unknown and potentially useful information from large collections of data U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, 1996

5 CS690L - Lecture 6 So What Is Data Mining? In theory, Data Mining is a step in the knowledge discovery process. It is the extraction of implicit information from a large dataset. In practice, data mining and knowledge discovery are becoming synonyms. There are other equivalent terms: KDD, knowledge extraction, discovery of regularities, patterns discovery, data archeology, data dredging, business intelligence, information harvesting…

6 CS690L - Lecture 6 Many Steps in KD Process Gathering the data together Cleanse the data and fit it in together Select the necessary data Crunch and squeeze the data to extract the essence of it Evaluate the output and use it

7 CS690L - Lecture 6 Steps of a KDD Process Learning the application domain (relevant prior knowledge and goals of application) Gathering and integrating of data Cleaning and preprocessing data (may take 60% of effort!) Reducing and projecting data (Find useful features, dimensionality/variable reduction,…) Choosing functions of data mining (summarization, classification, regression, association, clustering,…) Choosing the mining algorithm(s) Data mining: search for patterns of interest Evaluating results Interpretation: analysis of results. (visualization, alteration, removing redundant patterns, …) Use of discovered knowledge

8 CS690L - Lecture 6

9 Data Collected Business transactions Scientific data Medical and personal data Surveillance video and pictures Satellite sensing Games Digital media CAD and Software engineering Virtual worlds Text reports and memos The World Wide Web (The content of the Web, The structure of the Web, The usage of the Web) Multimedia and Spatial databases Time Series Data and Temporal Data

10 CS690L - Lecture 6 Data Mining: On What Kind of Data? Flat Files Heterogeneous and legacy databases Relational databases and other DB: Object- oriented and object-relational databases Transactional databases Transaction(TID, Timestamp, UID, {item1, item2,…}) Data Warehouses HTML, XML, RDF files

11 CS690L - Lecture 6 What Can Be Discovered? What can be discovered depends upon the data mining task employed. –Descriptive DM tasks: Describe general properties –Predictive DM tasks: Infer on available data

12 CS690L - Lecture 6 Data Mining Functionality Characterization: Summarization of general features of objects in a target class. (Concept description) Ex: Characterize grad students in Science Discrimination: Comparison of general features of objects between a target class and a contrasting class. (Concept comparison) Ex: Compare students in Science and students in Arts Association: Studies the frequency of items occurring together in transactional databases. Ex: buys(x, bread) Æ buys(x, milk). Prediction: Predicts some unknown or missing attribute values based on other information. Ex: Forecast the sale value for next week based on available data.

13 CS690L - Lecture 6 Data Mining Functionality Classification: Organizes data in given classes based on attribute values. (supervised classification) Ex: classify students based on final result. Clustering: Organizes data in classes based on attribute values. (unsupervised classification) Ex: group crime locations to find distribution patterns. Minimize inter- class similarity and maximize intra-class similarity Outlier analysis: Identifies and explains exceptions (surprises) Time-series analysis: Analyzes trends and deviations; regression, sequential pattern, similar sequences…

14 CS690L - Lecture 6 Is all that is Discovered Interesting? A data mining operation may generate thousands of patterns, not all of them are interesting. – Suggested approach: Human-centered, query-based, focused mining Data Mining results are sometimes so large that we may need to mine it too (Meta-Mining?) How to measure? Interestingness

15 CS690L - Lecture 6 Interestingness Objective vs. subjective interestingness measures: – Objective: based on statistics and structures of patterns, e.g., support, confidence, etc. –Subjective: based on user’s beliefs in the data, e.g., unexpectedness, novelty, etc. Interestingness measures: A pattern is interesting if it is –easily understood by humans –valid on new or test data with some degree of certainty. –potentially useful –novel, or validates some hypothesis that a user seeks to confirm

16 CS690L - Lecture 6 Can we Find All and Only the Interesting Patterns? Find all the interesting patterns: Completeness. –Can a data mining system find all the interesting patterns? Search for only interesting patterns: Optimization. –Can a data mining system find only the interesting patterns? –Approaches First find all the patterns and then filter out the uninteresting ones. Generate only the interesting patterns --- mining query optimization Like the concept of precision and recall in information retrieval

17 CS690L - Lecture 6 Data Mining: Classification Schemes Different views, different classifications: –Kinds of knowledge to be discovered Different mining approaches: Summarization, comparison, association, classification, clustering, etc Mining knowledge at different abstraction levels: primitive level, high level, multiple-level, etc. –Kinds of databases to be mined, and– Transaction data, multimedia data, text data, World Wide Web, etc. –Kinds of techniques adopted : Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, neural network, etc. –Kinds of Data model on which the data to be mined: Relational database, extended/object-relational database, object-oriented database, deductive database, data warehouse, flat files, etc.

18 CS690L - Lecture 6 Requirements/Challenges in Data Mining Security and social issues: –Social impact Private and sensitive data is gathered and mined without individual’s knowledge and/or consent. New implicit knowledge is disclosed (confidentiality, integrity) Appropriate use and distribution of discovered knowledge (sharing) –Regulations Need for privacy and DM policies User Interface Issues: –Data visualization. Understandability and interpretation of results Information representation and rendering Screen real-estate –Interactivity Manipulation of mined knowledge Focus and refine mining tasks Focus and refine mining results

19 CS690L - Lecture 6 Requirements/Challenges in Data Mining Mining methodology issues: –Mining different kinds of knowledge in databases. –Interactive mining of knowledge at multiple levels of abstraction. –Incorporation of background knowledge –Data mining query languages and ad-hoc data mining. –Expression and visualization of data mining results. –Handling noise and incomplete data –Pattern evaluation: the interestingness problem. Performance issues: – Efficiency and scalability of data mining algorithms. Linear algorithms are needed: no medium-order polynomial complexity, and certainly no exponential algorithms. Sampling – Parallel and distributed methods Incremental mining Can we divide and conquer?

20 CS690L - Lecture 6 Requirements/Challenges in Data Mining Data source issues: –Diversity of data types Handling complex types of data Mining information from heterogeneous databases and global information systems. Is it possible to expect a DM system to perform well on all kinds of data? (distinct algorithms for distinct data sources) –Data glut Are we collecting the right data with the right amount? Distinguish between the data that is important and the data that is not. Other issues –Integration of the discovered knowledge with existing knowledge: A knowledge fusion problem.

21 CS690L - Lecture 6 Data Mining Should Not be Used Blindly! Data mining approaches find regularities from history, but history is not the same as the future. Context should be considered. –Location dependency –Time dependency –Target dependency –Task dependency –Constraints

22 CS690L - Lecture 6 References Osmar R. Zaïane, University of Alberta, Lecture on Principles of Knowledge Discovery in Databases es/ch1s.pdf