We think you have liked this presentation. If you wish to download it, please recommend it to your friends in any social system. Share buttons are a little bit lower. Thank you!
Presentation is loading. Please wait.
Published byDarryl Oviatt
Modified about 1 year ago
Chapter 1 INTRODUCTION Cios / Pedrycz / Swiniarski / Kurgan
© 2007 Cios / Pedrycz / Swiniarski / Kurgan2 Outline What is Data Mining How does Data Mining differ from other approaches? -How to use this book?
© 2007 Cios / Pedrycz / Swiniarski / Kurgan3 Introduction The aim of data mining is 1) to make sense of 2) large amounts of 3) mostly unsupervised data, in some 4) domain
© 2007 Cios / Pedrycz / Swiniarski / Kurgan4 1) to make sense we envision that new knowledge should exhibit a series of essential attributes: be understandable valid novel useful Introduction
© 2007 Cios / Pedrycz / Swiniarski / Kurgan5 Introduction 1) to make sense the most important requirement is that the discovered new knowledge needs to be understandable to data owners who want to use it to some advantage
© 2007 Cios / Pedrycz / Swiniarski / Kurgan6 Introduction 1) to make sense a model that can be described in easy-to-understand terms, like production rules such as: IF abnormality (obstruction) in coronary arteries THEN coronary artery disease
© 2007 Cios / Pedrycz / Swiniarski / Kurgan7 Introduction 1) to make sense the second most important requirement is that the discovered new knowledge should be valid If all the generated rules were already known the rules would be considered trivial and of no interest (although the generation of the already-known rules validates the generated model)
© 2007 Cios / Pedrycz / Swiniarski / Kurgan8 Introduction 1) to make sense the third requirement is that the discovered new knowledge must be novel If the knowledge about how to diagnose a patient had been discovered not in terms of but, say, a neural network then this knowledge may or may not be acceptable, since a neural network is a “black box” model. A trained neural network might still be acceptable if it were proven to work well on hundreds of new cases.
© 2007 Cios / Pedrycz / Swiniarski / Kurgan9 Introduction 1) to make sense the fourth requirement is that the discovered new knowledge must be useful Usefulness must hold true regardless of the type of model used.
© 2007 Cios / Pedrycz / Swiniarski / Kurgan10 Introduction 2) large amounts DM is about analyzing large amounts of data that cannot be dealt with by analyzing them manually.
© 2007 Cios / Pedrycz / Swiniarski / Kurgan11 Introduction 2) large amounts AT&T handles over 300 million calls daily to serve about 100 million customers and stores the information in a multiterabyte database Wal-Mart, in its stores handles about 21 million transactions a day, and stores the information in a database of about a dozen terabytes NASA generates several gigabytes of data per hour through its Earth Observing System
© 2007 Cios / Pedrycz / Swiniarski / Kurgan12 Introduction 2) large amounts Oil companies like Mobil Oil store hundreds of terabytes of data about different aspects of oil exploration The Sloan Digital Sky Survey project will collect observational data of about 40 terabytes Modern biology creates, in projects such as human genome/proteome, data measured in terabytes and petabytes Homeland Security is collecting petabytes of data on its own and other countries’ citizens.
© 2007 Cios / Pedrycz / Swiniarski / Kurgan13 Introduction 3) mostly unsupervised data It is much easier, and far less expensive, to collect unsupervised data than supervised data. For supervised data we must have known inputs that correspond to known outputs, as determined by domain experts.
© 2007 Cios / Pedrycz / Swiniarski / Kurgan14 Introduction 3) mostly unsupervised data What can we do if only unsupervised data are collected? Use algorithms that are able to find “natural” groupings/clusters or relationships/associations in the data. If clusters are found they can be possibly labeled by domain experts. If we are able to do it unsupervised data becomes supervised and the problem becomes much easier.
© 2007 Cios / Pedrycz / Swiniarski / Kurgan15 Introduction 3) mostly unsupervised data What to do when the data are semisupervised (meaning that there are a few known training data pairs along with thousands of unsupervised data points)? Can these few data points help in the process of making sense of the entire data set? There exist techniques, called semi-supervised learning, which take advantage of these few training data points, for instance partially supervised clustering
© 2007 Cios / Pedrycz / Swiniarski / Kurgan16 Introduction 3) mostly unsupervised data A DM algorithm that works well on both small and large data is called scalable unfortunately, few are.
© 2007 Cios / Pedrycz / Swiniarski / Kurgan17 Introduction 4) domain The success of DM projects depends heavily on access to domain knowledge. Discovering new knowledge from data is a highly interactive (with domain experts) and iterative (within knowledge discovery) process. We cannot take a successful DM system, built for some domain, and apply it to another domain and expect good results.
© 2007 Cios / Pedrycz / Swiniarski / Kurgan18 Introduction This book is about making sense of data. Its ultimate goal is to provide readers with the fundamentals of frequently used DM methods and to guide readers in their DM projects: from understanding the problem and the data, through preprocessing the data, to building models of the data and validating these to putting the newly discovered knowledge to use.
© 2007 Cios / Pedrycz / Swiniarski / Kurgan19 Introduction This web site is by far the best source of information about all aspects of DM.
© 2007 Cios / Pedrycz / Swiniarski / Kurgan20 How does Data Mining Differ from Other Approaches? Data mining came into existence in response to technological advances in many disciplines: Computer Engineering contributed significantly to the development of more powerful computers in terms of both speed and memory; Computer Science and Mathematics continued to develop more and more efficient database architectures and search algorithms; and the combination of these disciplines helped to develop the World Wide Web (WWW).
© 2007 Cios / Pedrycz / Swiniarski / Kurgan21 How does Data Mining Differ from Other Approaches? Along with dramatic increase in the amount of stored data came demands for better, faster, cheaper ways to deal with those data. In other words, all the data in the world are of no value without mechanisms to efficiently and effectively extract information/knowledge from them.
© 2007 Cios / Pedrycz / Swiniarski / Kurgan22 How does Data Mining Differ from Other Approaches? Early DM pioneers: U. Fayyad H. Mannila G. Piatetsky-Shapiro G. Djorgovski W. Frawley P. Smith many others…
© 2007 Cios / Pedrycz / Swiniarski / Kurgan23 How does Data Mining Differ from Other Approaches? Data mining is not just an “umbrella” term coined for the purpose of making sense of data. The major distinguishing characteristic of DM is that it is data driven, as opposed to other methods that are often model driven. In statistics, researchers frequently deal with the problem of finding the smallest data size that gives sufficiently confident estimates. In DM, we deal with the opposite problem, namely, data size is large and we are interested in building a data model that is small (not too complex) but still describes the data well.
© 2007 Cios / Pedrycz / Swiniarski / Kurgan24 How does Data Mining Differ from Other Approaches? Finding a good model of the data, which at the same time is easy to understand, is at the heart of DM. We need to keep in mind that none of the generated models will be complete (using all the relevant variables/attributes of the data), and that almost always we will look for a compromise between model completeness and model complexity.
© 2007 Cios / Pedrycz / Swiniarski / Kurgan25 How does Data Mining Differ from Other Approaches? Word of caution: although many commercial as well as open-source DM tools exist they do not by any means produce automatic results in spite of the vendors hype about them. The users should understand that the application of even a very good tool to one’s data will most often not result in the generation of valuable knowledge for the data owner after simply clicking “run”.
© 2007 Cios / Pedrycz / Swiniarski / Kurgan26 How to Use this Book for a Course on Data Mining? The indispensable core elements of the book, which we will cover in depth, are: data preprocessing methods, described in Part III, model building, described in Part IV and model assessment, covered in Part V. For hands-on experience you will be given a real data set and asked to follow the knowledge discovery process for performing your DM project.
© 2007 Cios / Pedrycz / Swiniarski / Kurgan27 References Cios, K.J., Pedrycz, W., and Swiniarski, R Data Mining Methods for Knowledge Discovery. Kluwer Han, J., and Kamber, M Data Mining: Concepts and Techniques. Morgan Kaufmann Hand, D., Mannila, H., and Smyth, P Principles of Data Mining. MIT Press Hastie, T., Tibshirani, R., and Friedman, J The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer Kecman, V Learning and Soft Computing. MIT Press Witten, H., and Frank, E Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann
Intelligence Through Learning from Data Monash University Semester 1, March 2006.
Marakas: Decision Support Systems, 2nd Edition © 2003, Prentice-Hall Chapter Chapter 11: Data Mining and Data Visualization Decision Support Systems.
Chapter 1 Introduction 1. The Evolution of Data Analysis To Support Business Intelligence 2 Evolutionary Step Business Question Enabling Technologies.
UNIT V: LEARNING. LEARNING Learning from Observation Inductive Learning Decision Trees Explanation based Learning Statistical Learning methods Reinforcement.
Information Systems Management CIS 318
Imagine It! Inquiry. Why Use the Inquiry Process? Instruction in reading, writing, speaking, and listening is often fragmented and lacking in a coherent.
Of An Expert System. Introduction What is AI? Intelligent in Human & Machine? What is Expert System? How are Expert System used? Elements of ES Who are.
Nica Valentin–Danut SEM 2012 Service system fundamentals: Work system, value chain, and life cycle.
Introduction to knowledge management. What is knowledge management Knowledge management can be difficult to define, because it encompasses a wide range.
Artificial Intelligence and the Internet Edward Brent University of Missouri – Columbia and Idea Works, Inc. Theodore Carnahan Idea Works, Inc.
BECOMING A POSTGRADUATE / GRADUATE. “ UNDER YOUR OWN MANAGEMENT” is the key nature of postgraduate (especially the PhD / doctoral) education. In undergraduate.
ICT619 Intelligent Systems Unit Coordinator: Graham Mann Room ECL Building Phone:
A Primer in Theory Construction By Paul Davidson Reynolds.
Classification and Prediction Monash University Semester 1, March 2006.
An Introduction to Data Mining By Rand Ali Computer Engineering & Information Technology Department.
1 APO KM Tools and Techniques. 2 Objectives To present and discuss some of the key KM methods, tools, technologies and techniques to be considered for.
Software Gets Smarter Artificial Intelligence in the Enterprise Dr Kaustubh Chokshi CEO, Intelligent Business Systems.
Chapter 4 Slide 1 Original 33 slides by Prof. Anita Beecroft, Kwantlen Polytechnic University 16 slides added Feb 2011 by Prof. Tim Richardson, University.
Chapter 13 Planning for Electronic Commerce. Learning Objectives In this chapter, you will learn about: Planning electronic commerce initiatives Strategies.
INTERMEDIATE 1 PHYSICAL EDUCATION STRUCTURES AND STRATEGIES INFORMATION PACK Name : _____________________________________ Class : _________ Year : ______.
Distributed Computing Dr. Eng. Ahmed Moustafa Elmahalawy Computer Science and Engineering Department.
Knowledge Acquisition and modelling Introduction to Knowledge Acquisition and Elicitation.
Software Development QA Best Practices May 20, 2010 Suzette Hackl, CSM Senior Project Manager Skyline Technologies, Inc.
Chapter 1: Introduction to Expert Systems. 2 Objectives Learn the meaning of an expert system Understand the problem domain and knowledge domain Learn.
© Negnevitsky, Pearson Education, Lecture 1 Introduction to knowledge-base intelligent systems Intelligent machines, or what machines can do Intelligent.
Tools and techniques that support user interface development Design support is needed because designing software is typically very complex and requires.
Artificial Intelligence. Outline of Lecture What is Artificial Intelligence? – Definition and Aims – AI in Fiction, Fact, and History – Common Themes.
DCS209: Introduction to Database Management (I) prepared and delivered by Iya Abubakar Computer Centre (IACC)Ahmadu Bello University, Zaria M-Auwal Gene.
1 Ontologizing the Ontolog Content Protégé Workshop Denise A. D. Bedford, Ph.d. July 23, 2006.
© 2016 SlidePlayer.com Inc. All rights reserved.