Kansas State University Department of Computing and Information Sciences CIS 798: Intelligent Systems and Machine Learning Tuesday, December 7, 1999 William.


Lecture 28: Knowledge Discovery in Databases (KDD) and Data Mining
Kansas State University, Department of Computing and Information Sciences
CIS 798: Intelligent Systems and Machine Learning
Tuesday, December 7, 1999
William H. Hsu, Department of Computing and Information Sciences, KSU
Readings: Handout, “Data Mining with MLC++”, Kohavi et al.

Lecture Outline
Readings: “Data Mining with MLC++”, Kohavi et al.
Final Exam
–Format
    Open book
    110 minutes
    10 questions (see format online)
–Sample questions online
Knowledge Discovery in Databases (KDD) and Data Mining
–Problem framework (stages)
–Design and implementation issues
Role of Machine Learning and Inference in Data Mining
–Unsupervised learning
–Supervised learning
–Decision support (information retrieval, prediction, policy optimization)
Next Lecture: Final Review Session

What Is Data Mining?
Two Definitions (FAQ List)
–The process of automatically extracting valid, useful, previously unknown, and ultimately comprehensible information from large databases and using it to make crucial business decisions
–“Torturing the data until they confess”
Data Mining: An Application of Machine Learning
–Guides and integrates learning (model-building) processes
    Learning methodologies: supervised, unsupervised, reinforcement
    Includes preprocessing (data cleansing) tasks
    Extends to pattern recognition (inference or automated reasoning) tasks
–Geared toward such applications as:
    Anomaly detection (fraud, inappropriate practices, intrusions)
    Crisis monitoring (drought, fire, resource demand)
    Decision support
What Data Mining Is Not
–Data Base Management Systems: related but not identical field
–“Discovering objectives”: still need to understand performance element

KDD and Software Engineering
[Figure: Rapid KDD Development Environment]

Stages of Data Mining
[Figure: Stages of Data Mining]

Databases and Data Mining
Database Engineering ≠ Data Mining!
–Database design and engineering
    Data Base Management System (DBMS): computational system that supports efficient organization, retrieval, and processing of data
    Data warehouse: repository of integrated information for queries, analysis
–Data mining
    Often an application of DBMS and data warehousing systems
    Includes inductive model building (learning), pattern recognition, inference
Selection
–Guides and integrates learning (model-building) processes
–Learning methodologies: supervised, unsupervised, reinforcement
–Includes preprocessing (data cleansing), pattern recognition and inference
Online Analytical Processing (OLAP)
–Efficient collection, storage, manipulation, reproduction of multidimensional data
–Objective: analysis (e.g., for decision support)
–See:
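
As a rough illustration of OLAP-style manipulation of multidimensional data, the sketch below sums a measure over the dimensions that are not kept (a "roll-up"). The fact table, the dimension names, and the function `roll_up` are all hypothetical and chosen only for this example; real OLAP systems use far more efficient storage and indexing.

```python
from collections import defaultdict

# Hypothetical dimensions and fact table: each row is
# (region, quarter, product, sales). None of this comes from the lecture.
DIMENSIONS = ("region", "quarter", "product")
FACTS = [
    ("east", "Q1", "widget", 100.0),
    ("east", "Q2", "widget", 150.0),
    ("west", "Q1", "widget", 80.0),
    ("west", "Q1", "gadget", 40.0),
]


def roll_up(facts, keep):
    """Sum the measure (last column) over every dimension not in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]
    return dict(totals)


if __name__ == "__main__":
    print(roll_up(FACTS, ["region"]))             # totals per region
    print(roll_up(FACTS, ["region", "quarter"]))  # totals per region and quarter
```

Keeping fewer dimensions gives a coarser summary; keeping none gives the grand total.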

Data Integrity and Data Modeling: Ontologies
[Figure: attribute groups: Caution/Warning; Fuel Systems; Spatial/GPS/Navigation; Data Bus/Control/Diagnostics; Electrical; Profilometer; Timing; Hydraulics; Ballistics; Unused]

Data Aggregation and Sampling
[Figure: Data Aggregation and Sampling]

Unsupervised Learning
Unsupervised Learning in Support of Supervised Learning
–Given: D, a set of labeled vectors (x, y)
–Return: D’, a set of new training examples (x’, y’)
–Constructive induction: transformation step in KDD
    Feature “construction”: generic term
    Cluster definition
Feature Construction: Front End
–Synthesizing new attributes
    Logical: x1 ∧ ¬x2, arithmetic: x1 + x5 / x2
    Other synthetic attributes: f(x1, x2, …, xn), etc.
–Dimensionality-reducing projection, feature extraction
–Subset selection: finding relevant attributes for a given target y
–Partitioning: finding relevant attributes for given targets y1, y2, …, yp
Cluster Definition: Back End
–Form, segment, and label clusters to get intermediate targets y’
–Change of representation: find good (x’, y’) for learning target y
[Figure: Constructive Induction: (x, y) → Feature (Attribute) Construction and Partitioning → x’ / (x1’, …, xp’) → Cluster Definition → (x’, y’) or ((x1’, y1’), …, (xp’, yp’))]
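
The front-end step above (synthesizing new attributes, then selecting a subset) can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the helper names, the particular logical combination, the treatment of nonzero values as "true", and the handling of division by zero are all assumptions made for the example.

```python
def logical_feature(x1, x2):
    """Illustrative logical attribute: x1 AND NOT x2 (nonzero means true)."""
    return int(bool(x1) and not bool(x2))


def arithmetic_feature(x1, x2, x5):
    """Arithmetic attribute x1 + x5 / x2; None when x2 is zero (assumption)."""
    if x2 == 0:
        return None
    return x1 + x5 / x2


def construct(D):
    """Map labeled vectors (x, y) to (x', y), appending two synthetic attributes."""
    D_prime = []
    for x, y in D:
        x1, x2, x5 = x[0], x[1], x[4]
        x_new = tuple(x) + (logical_feature(x1, x2), arithmetic_feature(x1, x2, x5))
        D_prime.append((x_new, y))
    return D_prime


def select_subset(x, indices):
    """Subset selection: keep only the attributes at the given positions."""
    return tuple(x[i] for i in indices)


if __name__ == "__main__":
    D = [((1, 2, 0, 0, 4), "pos"), ((1, 0, 7, 7, 4), "neg")]
    for x_new, y in construct(D):
        print(x_new, y, select_subset(x_new, [5, 6]))
```

Which synthetic attributes are worth keeping is decided later, for example by the relevance determination and wrapper methods on the following slides.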

Relevance Determination
Subset Inclusion State Space
Poset Relation: Set Inclusion, where A and B are related when “B is a subset of A”
“Up” operator: DELETE
“Down” operator: ADD
[Figure: lattice of attribute subsets, shown as bit vectors and as sets, one level per line]
0,0,0,0
1,0,0,0  0,1,0,0  0,0,1,0  0,0,0,1
1,0,1,0  0,1,1,0  1,0,0,1  0,1,0,1  1,1,0,0  0,0,1,1
1,1,1,0  1,1,0,1  1,0,1,1  0,1,1,1
1,1,1,1
{}
{1}  {2}  {3}  {4}
{1,3}  {2,3}  {1,4}  {2,4}  {1,2}  {3,4}
{1,2,3}  {1,2,4}  {1,3,4}  {2,3,4}
{1,2,3,4}
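
The subset-inclusion state space can be generated mechanically. The sketch below enumerates the lattice level by level and implements the ADD and DELETE operators as neighbor functions; the function names are hypothetical, and attributes are numbered 1..n as on the slide.

```python
from itertools import combinations


def lattice_levels(n):
    """All subsets of attributes 1..n, grouped by size (level 0 is the empty set)."""
    attrs = range(1, n + 1)
    return [[frozenset(c) for c in combinations(attrs, k)] for k in range(n + 1)]


def add_neighbors(subset, n):
    """ADD operator: every state reachable by adding one attribute."""
    return [subset | {a} for a in range(1, n + 1) if a not in subset]


def delete_neighbors(subset):
    """DELETE operator: every state reachable by removing one attribute."""
    return [subset - {a} for a in sorted(subset)]


def to_bits(subset, n):
    """Bit-vector encoding: position i is 1 when attribute i is in the subset."""
    return tuple(1 if a in subset else 0 for a in range(1, n + 1))


if __name__ == "__main__":
    for level in lattice_levels(4):
        print("  ".join(",".join(map(str, to_bits(s, 4))) for s in level))
```

For n attributes the space has 2^n states, which is why the search over it is usually heuristic rather than exhaustive.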

Wrappers for Performance Enhancement
Wrappers
–“Outer loops” for improving inducers
–Use inducer performance to optimize
Applications of Wrappers
–Combining knowledge sources
    Committee machines (static): bagging, stacking, boosting
    Other sensor and data fusion
–Tuning hyperparameters
    Number of ANN hidden units
    GA control parameters
    Priors in Bayesian learning
–Constructive induction
    Attribute (feature) subset selection
    Feature construction
Implementing Wrappers
–Search [Kohavi, 1995]
–Genetic algorithm
[Figure labels: Relevant Inputs (Single Objective); Decomposition Methods; Heterogeneous Data (Multiple Sources); Relevant Inputs (Multiple Objectives); Decision Support System; Single-Task Model Selection; Task-Specific Model Selection; Definition of New Learning Problem(s); Supervised; Reduction of Inputs; Subdivision of Inputs; Unsupervised]
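
A wrapper for attribute subset selection can be sketched as an outer loop that scores candidate subsets by the inducer's own estimated accuracy. The example below is only an illustration under stated assumptions: it uses greedy forward selection (a simpler stand-in for the search of [Kohavi, 1995]), a 1-nearest-neighbor inducer, leave-one-out accuracy as the score, and a toy data set; all names are hypothetical, and the empty subset is scored 0.0 by convention.

```python
def loo_accuracy(data, features):
    """Leave-one-out accuracy of a 1-nearest-neighbor inducer on `features`."""
    if not features:
        return 0.0  # convention for this sketch: no attributes, no credit
    correct = 0
    for i, (xi, yi) in enumerate(data):
        best_dist, best_label = None, None
        for j, (xj, yj) in enumerate(data):
            if i == j:
                continue
            dist = sum((xi[f] - xj[f]) ** 2 for f in features)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, yj
        correct += best_label == yi
    return correct / len(data)


def forward_selection(data, n_features, score=loo_accuracy):
    """Greedy ADD-only search: keep adding the attribute that most improves the score."""
    selected = []
    best = score(data, selected)
    while True:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        scored = [(score(data, selected + [f]), f) for f in candidates]
        top_score, top_f = max(scored, key=lambda t: (t[0], -t[1]))
        if top_score <= best:
            break  # no candidate improves the inducer's estimated accuracy
        selected.append(top_f)
        best = top_score
    return selected, best


# Toy data (assumption): attribute 0 separates the classes, attribute 1 is noise.
DATA = [
    ((0.0, 5.0), "a"), ((0.1, 1.0), "a"), ((0.2, 9.0), "a"),
    ((1.0, 5.1), "b"), ((1.1, 1.1), "b"), ((1.2, 9.1), "b"),
]

if __name__ == "__main__":
    print(forward_selection(DATA, 2))
```

The same outer loop applies to the other wrapper uses listed above, such as tuning hyperparameters, by changing what a "state" is and keeping the inducer's performance as the objective.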

Supervised Learning Framework
[Figure labels: Multiattribute Data Set; Attribute Selection and Partitioning; Subproblem Definition; Partition Evaluator; Metric-Based Model Selection; Learning Architecture; Learning Method; Learning Specification; Subproblem (Architecture, Method); Data Fusion; Overall Prediction]

Performance Element: Decision Support Systems (DSS)
Model Identification (Relational Database)
–Specify data model
–Group attributes by type (dimension)
–Define queries
Prediction Objective Identification
–Identify target function
–Define hypothesis space
Transformation of Data
–Reduce data: e.g., decrease frequency
–Select relevant data channels (given prediction objective)
–Integrate models, sources of data (e.g., interactively elicited rules)
[Figure labels: Environment (Data Model); Learning Element; Knowledge Base; Performance Element; Supervised Learning; Analysis and Assimilation: Performance Evaluation using DSS]

Case Study: Automobile Insurance Risk Analysis
[Figure: Automobile Insurance Risk Analysis]

Case Study: Fraud Detection
NCSA D2K -

Case Study: Prognostic Monitoring
Control Interfaces
–Actuators: fire/smoke suppression, electrical isolation, counterflooding
–Intelligent sensors
Simulation Module
–Process/agent simulation
–Automation simulation
–Predictive validation for sensors
Learning Modules
–Time series learning
–Control knowledge acquisition
Intelligent Reasoning Modules
–Crisis recognition
–Casualty response
Intelligent Displays Module
–Interactive design and visualization
–Supervisory interface
[Figure labels: Simulator; Processes; Sensors; Agents; Predictive Validation; Intelligent Sensors; Supervisory Interface; Actuators; Learning and Data Mining; Inference and Decision Support; Intelligent Displays; Multimedia Visualization; Automatic Speech Recognition; Filtered Views]

Terminology
Data Mining
–Operational definition: automatically extracting valid, useful, novel, comprehensible information from large databases and using it to make decisions
–Constructive definition: expressed in stages of data mining
Databases and Data Mining
–Data Base Management System (DBMS): data organization, retrieval, processing
–Data warehouse: repository of integrated information for queries, analysis
–Online Analytical Processing (OLAP): storage/CPU-efficient manipulation of data for summarization (descriptive statistics), inductive learning and inference
Stages of Data Mining
–Data selection (aka filtering): sampling original (raw) data
–Data preprocessing: sorting, segmenting, aggregating
–Data transformation: change of representation; feature construction, selection, extraction; quantization (scalar, e.g., histogramming; vector, aka clustering)
–Machine learning: unsupervised, supervised, reinforcement for model building
–Inference: application of performance element (pattern recognition, etc.); evaluation, assimilation of results
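
The five stages can be strung together as a small pipeline. The sketch below is a toy illustration only: the record format `(value, label)`, the function names, the choice of dropping records with missing values as the cleansing step, histogram binning as the transformation, and a majority-label-per-bin model as the learner are all assumptions made for the example.

```python
import random
from collections import Counter, defaultdict


def select(raw, k, seed=0):
    """Data selection (filtering): sample k records from the raw data."""
    return random.Random(seed).sample(raw, k)


def preprocess(records):
    """Preprocessing: drop records with a missing value, then sort by value."""
    return sorted(r for r in records if r[0] is not None)


def transform(records, bin_width):
    """Transformation: scalar quantization (histogram binning) of the value."""
    return [(int(v // bin_width), label) for v, label in records]


def learn(examples):
    """Learning: a toy supervised model, the majority label within each bin."""
    by_bin = defaultdict(Counter)
    for b, label in examples:
        by_bin[b][label] += 1
    return {b: counts.most_common(1)[0][0] for b, counts in by_bin.items()}


def infer(model, value, bin_width, default=None):
    """Inference: apply the learned model to a new value."""
    return model.get(int(value // bin_width), default)


if __name__ == "__main__":
    raw = [(1.0, "low"), (2.0, "low"), (None, "low"), (11.0, "high"), (12.0, "high")]
    sample = select(raw, len(raw))  # keep everything in this tiny example
    model = learn(transform(preprocess(sample), bin_width=10.0))
    print(model, infer(model, 15.0, bin_width=10.0))
```

Each stage consumes the previous stage's output, so any one of them can be replaced (for instance, clustering instead of histogramming) without changing the others.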

Summary Points
Knowledge Discovery in Databases (KDD) and Data Mining
–Stages: selection (filtering), processing, transformation, learning, inference
–Design and implementation issues
Role of Machine Learning and Inference in Data Mining
–Roles of unsupervised, supervised learning in KDD
–Decision support (information retrieval, prediction, policy optimization)
Case Studies
–Risk analysis, transaction monitoring (filtering), prognostic monitoring
–Applications: business decision support (pricing, fraud detection), automation
Resources Online
–Microsoft DMX Group (Fayyad):
–KSU KDD Lab (Hsu):
–CMU KDD Lab (Mitchell):
–KD Nuggets (Piatetsky-Shapiro):
–NCSA Automated Learning Group (Welge)
    ALG home page:
    NCSA D2K: