William M. Pottenger, Ph.D. Computing the Future of Data Mining: An Introduction to Data Mining. Visit to Messiah College, September 4, 2006

Presentation transcript:

William M. Pottenger, Ph.D.
Computing the Future of Data Mining
An Introduction to Data Mining
Visit to Messiah College
September 4, 2006
William M. Pottenger, Ph.D.
Computer Science & Engineering Department

Knowledge Workers are Overwhelmed
The users of software tools and computers are domain experts, NOT computer science professionals
–Too much data
–Too much technology
–Not enough useful information

Data Mining Roots: A Confluence of Multiple Disciplines
Database Systems, Data Warehouses, and OLAP
Machine Learning
Information Theory & Statistics
Mathematical Programming
Visualization
High Performance Computing
…
Algorithms have been known for a while… Google™

Data Mining: On What Kind of Data?
Relational Databases
Data Warehouses
Transactional Databases
Advanced Database Systems
–Object-Relational
–Text
–Heterogeneous: Legacy, Distributed, …
–WWW
… the Bible!

Why Do We Need Data Mining?
Leverage the organization’s data assets
–Only a small portion (typically 5%-10%) of the collected data is ever analyzed
–Data that may never be analyzed continues to be collected, at great expense, out of concern that something that may prove important in the future will be missed
–Growth rates of data preclude the traditional “manual intensive” approach: automated data fusion techniques based on data mining are needed

Why Do We Need Data Mining?
As databases and problems grow, supporting the decision support process with traditional query languages becomes infeasible
–Many queries of interest are difficult to state in a query language (query formulation problem)
–“find all cases of fraud”
–“find all individuals likely to buy a FORD Expedition”
–“find all documents that are similar to this customer’s problem”
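
The last query on the slide above has no natural exact-match form: "similar" has to be defined by a scoring function rather than a predicate. As a minimal sketch only, the Python below ranks documents by cosine similarity over word counts. The sample documents, the query text, and the function names are hypothetical illustrations, not material from the talk, and real systems use far richer text representations.

```python
from collections import Counter
from math import sqrt

def vectorize(text):
    """Turn text into a bag-of-words count vector (lowercased, whitespace split)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two count vectors; 0.0 if either is empty."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_similar(query, documents):
    """Return (score, document) pairs, most similar to the query first."""
    q = vectorize(query)
    scored = [(cosine(q, vectorize(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Hypothetical support tickets standing in for "documents".
DOCS = [
    "printer will not connect to wireless network",
    "invoice shows wrong billing address",
    "scanner driver crashes on startup",
]

ranked = rank_similar("printer cannot connect to network", DOCS)
for score, doc in ranked:
    print(f"{score:.2f}  {doc}")
```

A query language can filter on exact words, but the ranking here comes from a numeric score over all terms at once, which is the kind of question the slide says is hard to state as a query.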

What (exactly) is Data Mining?
Let’s take a few moments and consider this question. Is it:
–Knowledge Discovery?
–Knowledge Management?
–Information Retrieval?
–On-line Analytic Processing (OLAP)?
–Machine Learning?
–Decision Support?
–Process Modeling/Control?
–…

Definitions
Data mining is the application of computer technology and machine learning algorithms to discover patterns, anomalies, trends, and knowledge from data.
–SGI Mineset Product Description
Data mining is the extraction of implicit, previously unknown, and potentially useful information from data.
–Data Mining by Witten and Frank
Data mining, also popularly referred to as knowledge discovery in databases (KDD), is the automated or convenient extraction of patterns representing knowledge implicitly stored in large databases, data warehouses, and other massive information repositories.
–Data Mining: Concepts and Techniques by Han and Kamber

What is Text Mining?
Swanson (‘91) posed problem: Migraine headaches (M)
–stress associated with M
–stress leads to loss of magnesium
–calcium channel blockers prevent some M
–magnesium is a natural calcium channel blocker
–spreading cortical depression (SCD) implicated in M
–high levels of magnesium inhibit SCD
–M patients have high platelet aggregability
–magnesium can suppress platelet aggregability
All extracted from medical journal titles
Slide reused with permission of Marti UCB

Gathering Evidence
[Diagram labels: migraine; stress, CCB, PA, SCD; magnesium]
Slide reused with permission of Marti UCB

Novel Discovery: Magnesium & Migraines!
[Diagram labels: migraine, magnesium; stress, CCB, PA, SCD]
No single author knew/wrote about this connection… this distinguishes Text Mining from Information Retrieval.
Slide reused with permission of Marti UCB
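
The linking step in the migraine example can be sketched in a few lines of Python: find terms that never appear in the same title as the starting term but share intermediate terms with it. This is only an illustrative sketch. The toy "titles" below are hypothetical term sets that mirror the links listed on the slides (they are not real journal data), and the function names are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy "titles", each reduced to the key terms it mentions.
TITLES = [
    {"stress", "migraine"},
    {"stress", "magnesium"},
    {"calcium channel blocker", "migraine"},
    {"magnesium", "calcium channel blocker"},
    {"spreading cortical depression", "migraine"},
    {"magnesium", "spreading cortical depression"},
    {"platelet aggregability", "migraine"},
    {"magnesium", "platelet aggregability"},
]

def build_cooccurrence(titles):
    """Map each term to the set of terms it shares at least one title with."""
    neighbors = defaultdict(set)
    for terms in titles:
        for a, b in combinations(sorted(terms), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors

def indirect_links(titles, start):
    """Return {candidate: bridging terms} for terms never seen directly with start."""
    neighbors = build_cooccurrence(titles)
    direct = neighbors[start]
    candidates = defaultdict(set)
    for bridge in direct:
        for other in neighbors[bridge]:
            # Keep only terms that are linked through a bridge, not directly.
            if other != start and other not in direct:
                candidates[other].add(bridge)
    return {term: sorted(bridges) for term, bridges in candidates.items()}

links = indirect_links(TITLES, "migraine")
for term, bridges in links.items():
    print(term, "via", ", ".join(bridges))
```

In this toy data no single title mentions both migraine and magnesium, yet magnesium surfaces through four bridging terms, which is the pattern the slides describe.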

Why Use Data Mining?
Data mining will become much more important, and companies will throw away nothing about their customers because it will be so valuable. If you’re not doing this, you’re out of business.
–Arno Penzias, Chief Bell Labs
We are deluged by data – scientific data, medical data, demographic data, financial data, and marketing data. People have no time to look at this data. Human attention has become a precious resource.
–Jim Gray, Microsoft Research in preface to Data Mining by Han and Kamber
Necessity is the mother of invention
–Unknown

How is Data Mining Used?
Direct Marketing
Customer Acquisition
Customer Retention
Cross-selling
Trend Analysis
Fraud Detection
Forecasting in Financial Markets
Process Modeling
Process Control
…

But What is Data Mining (Really)?
Data Mining: A Process
Copyright © 1997 Stiftelsen Østfoldforskning: Used with permission

An Example of Data Mining in Process Modeling and Control at HP
Quality Assurance troubleshooting
–KnowledgeSeeker Decision Tree Data Mining Tool identified critical factors impacting production of HP IIc Color Scanner
Process control
–KnowledgeSeeker Decision Tree Data Mining Tool derived rules necessary to identify situations where process was about to go out of control.

How Do Decision Trees Work?
Decision trees predict results but also tell about structure.
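
One common way a decision tree chooses its top-level question is to pick the attribute whose values best separate the outcomes, for example by information gain. The Python below is a minimal sketch of that single step under stated assumptions: the production records, the attribute names, and the functions are hypothetical, and the slides do not say which splitting criterion KnowledgeSEEKER uses, so this should not be read as that tool's method.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Reduction in target entropy obtained by splitting rows on attribute."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def best_split(rows, attributes, target):
    """Attribute with the highest information gain: the root test of the tree."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))

# Hypothetical production records; "defect" is the outcome to explain.
ROWS = [
    {"supplier": "A", "shift": "day", "defect": "no"},
    {"supplier": "A", "shift": "night", "defect": "no"},
    {"supplier": "B", "shift": "day", "defect": "yes"},
    {"supplier": "B", "shift": "night", "defect": "yes"},
    {"supplier": "A", "shift": "day", "defect": "no"},
    {"supplier": "B", "shift": "night", "defect": "yes"},
]

root = best_split(ROWS, ["supplier", "shift"], "defect")
print("split first on:", root)
```

The chosen attribute is both a predictor and a statement about structure: in this toy data the supplier, not the shift, is the factor that separates defective from non-defective units.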

Be right back …
A Demonstration of Data Mining Featuring KnowledgeSEEKER by Angoss Knowledge Engineering

Examples of Commercial Data Mining Systems
IBM’s DB2 Intelligent Miner
SAS Institute’s Enterprise Miner
SPSS’s Clementine
Angoss’ KnowledgeSeeker
Plus many more …

Asymptopia
We are always given finite amounts of data … and rarely do we reach asymptopia.
Asymptopia is the mythical land, the data miners 'utopia', where the amount of data is infinite and all algorithms converge and all users are satisfied... Naturally, asymptopia can be reached only in the limit.
Ron Kohavi, Nuggets 96:21