Overview of Distributed Data Mining
Xiaoling Wang
March 11, 2003

2 Data Mining
“We are drowning in information, but starving for knowledge.” - John Naisbitt
What is data mining?
– Closely related to knowledge discovery
– Discovering useful, usually unknown patterns from data
– Data: a set of facts F (e.g., cases in a database)
– Pattern: an expression E describing the facts in a subset F_E of F

3 Goals of Data Mining
Goals
– Prediction
– Description
Domains
– Induction, Compression, Querying, Approximation, Search

4 Basic Techniques of Data Mining
Basic techniques
– Clustering
– Association rule discovery
– Classification
– Sequential pattern discovery
– Outlier detection

5 Data Warehouse Architecture
[Diagram: data sources, extractor, data transformation & integration, data warehouse, data mining algorithm]

6 Distributed Data Mining Framework
[Diagram: data sources, a data mining algorithm per source, the resulting local models, local model aggregation, final model]

7 Distributed Data Source Definitions
Homogeneous
– Contain the same set of attributes across distributed data sites
Heterogeneous
– Define different sets of attributes across distributed data sites
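The two definitions can be illustrated with a small sketch. The toy table, the function names, and the shared `id` key used to re-associate rows across heterogeneous sites are all assumptions made for illustration, not part of the slides.

```python
# Hypothetical toy table: each row is one record, keyed by attribute name.
records = [
    {"id": 1, "age": 34, "income": 52000, "label": "low"},
    {"id": 2, "age": 51, "income": 87000, "label": "high"},
    {"id": 3, "age": 29, "income": 43000, "label": "low"},
    {"id": 4, "age": 46, "income": 91000, "label": "high"},
]

def horizontal_partition(rows, n_sites):
    """Homogeneous sites: every site holds all attributes for a subset of rows."""
    return [rows[i::n_sites] for i in range(n_sites)]

def vertical_partition(rows, attribute_groups, key="id"):
    """Heterogeneous sites: every site holds a subset of attributes for all rows.
    The shared key (an assumption here) lets rows be matched across sites."""
    return [[{a: r[a] for a in (key, *group)} for r in rows]
            for group in attribute_groups]

homogeneous_sites = horizontal_partition(records, 2)
heterogeneous_sites = vertical_partition(records, [("age",), ("income", "label")])
```

Each homogeneous site ends up with two complete rows, while each heterogeneous site sees all four rows but only some of the columns.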

8 Distributed Data Mining Techniques
Distributed classifier learning
– Meta-learning framework
– Distributed learning with knowledge probing
Collective data mining
Distributed clustering
Distributed association rule mining
Others

9 Meta-learning
Chan, Florida Institute of Technology & Stolfo, Columbia University
“Base classifiers” and “meta-classifier”
Meta-learning rules: voting, arbitrating, and combining
Scalability, efficiency, portability, compatibility, adaptivity, extensibility, and effectiveness
For heterogeneous data sites, apply bridging methods
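A minimal sketch of the voting rule, assuming homogeneous sites and a nearest-centroid base learner chosen only for brevity (the slides do not name a base learner). Arbitrating and combining are not shown; all names and data below are hypothetical.

```python
from collections import Counter

def train_centroid_classifier(samples):
    """Base learner (an illustrative choice): nearest class centroid.
    samples is a list of (feature_vector, label) pairs held at one site."""
    sums, counts = {}, Counter()
    for x, y in samples:
        counts[y] += 1
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

    def classify(x):
        def dist2(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(centroids, key=lambda y: dist2(centroids[y]))

    return classify

def vote(base_classifiers, x):
    """Voting rule: return the label predicted by most base classifiers."""
    predictions = [clf(x) for clf in base_classifiers]
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical sites, each training its own base classifier locally.
sites = [
    [((1.0, 1.0), "a"), ((5.0, 5.0), "b")],
    [((0.0, 2.0), "a"), ((6.0, 4.0), "b")],
    [((2.0, 0.0), "a"), ((4.0, 6.0), "b")],
]
base_classifiers = [train_centroid_classifier(s) for s in sites]

print(vote(base_classifiers, (1.5, 0.5)))  # prints "a"
print(vote(base_classifiers, (5.0, 5.0)))  # prints "b"
```

Only the trained classifiers need to leave each site; the training data stays where it is. A combining rule would instead train a meta-classifier on the base predictions made over a validation set.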

10 Meta-learning Framework
[Diagram: training data, learning algorithms, classifiers, predictions, validation data, meta-level training data, meta-learning (arbitration and combining), final classifier system]

11 Distributed Learning with Knowledge Probing
Guo & Sutiwaraphun, Imperial College
Objective: distributed classification
Meta-learning based technique
Applied to homogeneous data sites
Knowledge probing: extracting descriptive knowledge from a black-box model by learning from a new data set whose classes are assigned by that model
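Knowledge probing can be sketched as follows. The black-box function, the probing set, and the one-rule "stump" learner are all hypothetical stand-ins; the slides do not say which descriptive learner is used.

```python
def black_box(x):
    """Stand-in for an opaque model (hypothetical): callers see only its outputs."""
    return "high" if 0.3 * x[0] + 0.7 * x[1] > 5.0 else "low"

def learn_stump(labelled):
    """Descriptive learner (illustrative choice): the single rule
    'feature > threshold' that agrees with the most labels."""
    best = None
    n_features = len(labelled[0][0])
    labels = sorted({y for _, y in labelled})
    for f in range(n_features):
        for t in sorted({x[f] for x, _ in labelled}):
            for above in labels:
                for below in labels:
                    hits = sum((above if x[f] > t else below) == y
                               for x, y in labelled)
                    if best is None or hits > best[0]:
                        best = (hits, f, t, above, below)
    return best

# Probing set: unlabelled points whose classes are assigned by the black box.
probing_set = [(float(a), float(b)) for a in range(0, 10, 3) for b in range(0, 10, 3)]
labelled = [(x, black_box(x)) for x in probing_set]

hits, feature, threshold, above, below = learn_stump(labelled)
print(f"if x[{feature}] > {threshold}: {above} else: {below} "
      f"({hits}/{len(labelled)} agree)")
```

The learned rule is a readable approximation of the black box: on this probing set it agrees on 15 of 16 points.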

12 DLKP (Cont.)
[Diagram: data sources 1 through k, local model derivation at each source, local models, probing set, probing strategy, prediction scheme, final model]

13 Collective Data Mining (CDM)
Kargupta, University of Maryland & Park, Washington State University
Objective: predictive data modeling
Applied to heterogeneous (vertically partitioned) data sites
Foundation: any function can be represented in a distributed fashion using an appropriate set of orthonormal basis functions
Example: Collective Principal Component Analysis (CPCA)

14 CDM Framework
Step 1: Generate approximate orthonormal basis coefficients at each local site
Step 2: Move a chosen sample of the data sets from each site to a single site; generate approximate basis coefficients corresponding to non-linear cross terms
Step 3: Combine the local models; transform the result into the user-described representation; output the model
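The three steps can be traced on a toy problem. This sketch assumes two features encoded as -1/+1, so that 1, x1, x2, and x1*x2 form an orthonormal basis under the uniform distribution, and it assumes the output value is visible at both sites. The target function and all names are hypothetical.

```python
from itertools import product

def target(x1, x2):
    """Hypothetical function to be recovered; unknown to the sites."""
    return 2.0 + 1.0 * x1 - 3.0 * x2 + 0.5 * x1 * x2

rows = [(x1, x2, target(x1, x2)) for x1, x2 in product((-1, 1), repeat=2)]

# Vertical partition: site A sees x1, site B sees x2.
# Both see the output value (an assumption of this sketch).
site_a = [(x1, y) for x1, _, y in rows]
site_b = [(x2, y) for _, x2, y in rows]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Step 1: each site estimates the coefficients of its own basis functions.
w0 = mean(y for _, y in site_a)
w1 = mean(x * y for x, y in site_a)
w2 = mean(x * y for x, y in site_b)

# Step 2: move a sample of complete rows to one site for the cross term.
sample = rows  # the whole toy table here; a real run would send a small subset
w12 = mean(x1 * x2 * y for x1, x2, y in sample)

# Step 3: combine the local coefficients into one model.
def model(x1, x2):
    return w0 + w1 * x1 + w2 * x2 + w12 * x1 * x2

print(w0, w1, w2, w12)  # 2.0 1.0 -3.0 0.5
```

Because the toy table covers every input combination, the estimates are exact; with a genuine sample in Step 2, the cross-term coefficient would only be approximate.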

15 Distributed Clustering
Derived from parallel center-based clustering algorithms such as k-means
Applied to homogeneous scenarios
Two basic approaches
– Approximate the underlying distance measure by aggregation
– Provide the exact measure by data broadcasting
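One common way to parallelize k-means over homogeneous sites is for each site to send per-cluster sums and counts rather than raw points. The sketch below follows that idea; it is an assumption about how aggregation might look, not necessarily the scheme the slides have in mind, and all names and data are hypothetical.

```python
def local_statistics(points, centroids):
    """At one site: assign each local point to its nearest centroid and
    return per-cluster coordinate sums and counts (no raw points leave)."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        k = min(range(len(centroids)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
        counts[k] += 1
        for i, v in enumerate(p):
            sums[k][i] += v
    return sums, counts

def aggregate(stats, centroids):
    """At the coordinator: merge site statistics into new centroids."""
    new_centroids = []
    for k, old in enumerate(centroids):
        total = sum(counts[k] for _, counts in stats)
        if total == 0:
            new_centroids.append(old)  # leave an empty cluster where it was
            continue
        new_centroids.append(tuple(
            sum(sums[k][i] for sums, _ in stats) / total
            for i in range(len(old))))
    return new_centroids

def distributed_kmeans(sites, centroids, rounds=10):
    """Repeat: sites compute local statistics, coordinator updates centroids."""
    for _ in range(rounds):
        stats = [local_statistics(points, centroids) for points in sites]
        centroids = aggregate(stats, centroids)
    return centroids

sites = [
    [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0)],
    [(2.0, 0.0), (2.0, 2.0), (10.0, 12.0), (12.0, 10.0), (12.0, 12.0)],
]
final = distributed_kmeans(sites, [(0.0, 0.0), (12.0, 12.0)])
print(final)  # [(1.0, 1.0), (11.0, 11.0)]
```

Each round costs one small message per site (k sums and k counts), independent of how many points the site holds.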

16 Distributed Association Rule Mining
Two main approaches
– Count Distribution (CD): data is partitioned homogeneously across several data sites
– Data Distribution (DD): maximizes parallelism
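The counting step of Count Distribution can be sketched as below: each site counts itemsets in its own transactions and only the counts are exchanged. For brevity this counts every itemset of a given size instead of generating Apriori candidates, which is a simplification. Function names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def local_counts(transactions, size):
    """At one site: count every itemset of the given size in local transactions."""
    counts = Counter()
    for t in transactions:
        for itemset in combinations(sorted(t), size):
            counts[itemset] += 1
    return counts

def global_frequent(sites, size, min_support):
    """Sum the per-site counts and keep itemsets meeting the global threshold."""
    total = Counter()
    for transactions in sites:
        total.update(local_counts(transactions, size))
    return {itemset: n for itemset, n in total.items() if n >= min_support}

# Two hypothetical sites holding transactions over the same items.
sites = [
    [{"bread", "milk"}, {"bread", "beer"}],
    [{"bread", "milk", "beer"}, {"milk"}],
]
frequent_pairs = global_frequent(sites, size=2, min_support=2)
print(frequent_pairs)
```

Here `("bread", "milk")` and `("beer", "bread")` each reach a global count of 2 even though neither site alone meets the threshold for both, which is why the counts must be summed before filtering.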

17 Applications of Distributed Data Mining
Credit card fraud detection
Intrusion detection
Information retrieval from the Internet
Ad hoc sensor networks

18 Challenges of Distributed Data Mining
Real-time distributed data mining
Adapting to changing environments, new data, and new patterns