Distributed Data Classification in Sensor Networks

Presentation transcript:

Distributed Data Classification in Sensor Networks
Ittay Eyal, Idit Keidar, Raphi Rom
Technion, Israel
PODC, Zurich, July 2010

Sensor Networks Today
- Temperature, humidity, seismic activity, etc.
- Data collection and analysis is easy – small networks (tens of motes).

Sensor Networks Tomorrow
- Scale out: thousands of lightweight sensors (e.g., fire detection).
- Lots of data to be analyzed – too much for the motes.
- A centralized solution is not feasible.
- And also: wide area and limited battery → non-trivial topology; failures.

The Goal
Model:
- A large number of sensors
- Connected topology
Problem:
- Each sensor takes a sample.
- All sensors must learn the same classification of all the sampled data.

Classification
A classification consists of:
1. Partition
2. Summarization
A classification algorithm finds an optimal classification; centralized solutions such as k-means and EM proceed in iterations.
Example – k-means: minimize the sum of distances between the samples and the average of their component.
[R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley-Interscience, 2nd edition.]
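To make the k-means objective concrete (a standard formulation, not copied from the slides): with components C_1, …, C_k and component averages μ_j, k-means minimizes

```latex
\min_{C_1,\dots,C_k} \; \sum_{j=1}^{k} \sum_{x \in C_j} \lVert x - \mu_j \rVert^2,
\qquad \mu_j = \frac{1}{|C_j|} \sum_{x \in C_j} x .
```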

The Distributed Challenge
Example sensor readings: −4°, −10°, −5°, −6°, 120°, −12°, −11°, 98°.
Each node should learn the same two components, with averages 109 and −8.
Related work:
- D. Kempe, A. Dobra, and J. Gehrke. Gossip-based computation of aggregate information. In FOCS, 2003.
- S. Nath, P. B. Gibbons, S. Seshan, and Z. Anderson. Synopsis diffusion for robust aggregation in sensor networks. In SenSys, 2004.
- S. Datta, C. Giannella, and H. Kargupta. K-means clustering over a large, dynamic network. In SDM, 2006.
- W. Kowalczyk and N. A. Vlassis. Newscast EM. In NIPS, 2004.
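A quick check of the slide's arithmetic (the grouping of readings into a hot component and an ambient component is read off the slide):

```python
# Readings from the slide: two anomalously hot sensors and six ambient ones.
hot = [120, 98]
ambient = [-4, -10, -5, -6, -12, -11]

mean = lambda xs: sum(xs) / len(xs)
print(mean(hot))      # 109.0 -- average of the hot component
print(mean(ambient))  # -8.0  -- average of the ambient component
```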

Our Contributions
- A generic distributed classification algorithm:
  - Multidimensional information, e.g., temperature, humidity, location
  - Any classification representation & strategy, e.g., k-means, GM/EM
- A convergence proof of this algorithm: all nodes learn the same classification.

The Algorithm – K-means example
- Each node maintains a classification – a weighted set of averages.
- Gossip – fast propagation, low bandwidth.
- The closest averages get merged.

The Algorithm – K-means example
[Figure: the original samples and the classification computed from them.]

The Algorithm – K-means example
- Initially: each node's classification is based on its own input.
- Occasionally, nodes communicate and perform a smart merge, limiting the number of averages to k.
[Figure: averages a and b before, during, and after a merge.]
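A minimal sketch of the merge step just described, assuming one-dimensional samples and a weight-halving exchange; the function names (merge_closest, gossip_step) are hypothetical, and the paper's actual protocol may differ in details:

```python
def merge_closest(summaries, k):
    """Greedily merge the two closest weighted averages until at most k remain.

    `summaries` is a list of (average, weight) pairs; merging two of them
    yields their weighted mean, with the weights added.
    """
    summaries = list(summaries)
    while len(summaries) > k:
        i, j = min(
            ((i, j) for i in range(len(summaries))
                    for j in range(i + 1, len(summaries))),
            key=lambda p: abs(summaries[p[0]][0] - summaries[p[1]][0]),
        )
        (a, wa), (b, wb) = summaries[i], summaries[j]
        rest = [s for idx, s in enumerate(summaries) if idx not in (i, j)]
        summaries = rest + [((a * wa + b * wb) / (wa + wb), wa + wb)]
    return summaries


def gossip_step(node_a, node_b, k):
    """One gossip exchange: pool both nodes' summaries with halved weights
    (so total sample mass across the network is conserved), then merge
    back down to k summaries."""
    pooled = [(avg, w / 2) for (avg, w) in node_a + node_b]
    merged = merge_closest(pooled, k)
    return merged, merged  # both nodes adopt the same merged classification
```

With k = 2 and the temperature readings from the challenge slide, repeated exchanges of this kind drive every node's summaries toward weighted averages near 109 and −8.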

But what does the mean mean?
- A new sample may lie closer to mean A yet be better explained by Gaussian B: the variance must be taken into account.
[Figure: a new sample between Gaussian A and Gaussian B.]
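A small numeric illustration of the slide's point, with made-up parameters: the sample x = 3 is closer to mean A in distance, but the wider Gaussian B assigns it a higher likelihood.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

x = 3.0
print(gaussian_pdf(x, mu=0.0, sigma=1.0))  # ~0.0044  (Gaussian A: closer mean, lower likelihood)
print(gaussian_pdf(x, mu=8.0, sigma=4.0))  # ~0.0457  (Gaussian B: farther mean, higher likelihood)
```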

The Algorithm – GM/EM example
[Figure: Gaussian summaries a and b merged using EM.]

The Generic Algorithm
- A classification is a weighted set of summaries.
- Summaries and merges respect axioms (see the paper).
- The merge rule is application dependent.
- Asynchronous; any connected topology; any gossip variant (weakly fair gossip).
- Quantization – no infinitesimal weight.
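The genericity the slide describes can be pictured as an abstract summary type; the interface below is a hypothetical sketch (the names are not from the paper), and the axioms themselves are stated in the paper.

```python
from abc import ABC, abstractmethod

class Summary(ABC):
    """An application-defined summary of a weighted set of samples.

    Concrete variants -- a plain centroid for k-means, or a mean/covariance
    pair for a Gaussian mixture -- must implement merge in a way that
    satisfies the paper's axioms for the convergence proof to apply.
    """

    @abstractmethod
    def weight(self) -> float:
        """Total sample mass this summary represents."""

    @abstractmethod
    def merge(self, other: "Summary") -> "Summary":
        """Combine two summaries; weights add."""
```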

Convergence?
Challenge: a non-deterministic distributed algorithm –
- Asynchronous gossip among arbitrary pairs
- Application-defined merges
- Different nodes can have different rules
Proof ingredients: the R^n mixture space, some trigonometry, some calculus, and some distributed-systems arguments.

Summary
- A distributed classification algorithm for sensor networks:
  - Generic in the summary representation and the classification strategy
  - Asynchronous; works on any connected topology
- Implementations: k-means and Gaussian mixture.
- Convergence proof for the generic algorithm: all nodes reach the same classification of the sampled values.
Ittay Eyal, Idit Keidar, Raphael Rom. Distributed Data Classification in Sensor Networks. PODC 2010.

Convergence Proof
- System-wide collection pool.
- Collection genealogy: collections are the descendants of the collections they were formed from.
- The samples' mass is mixed on every merge and split on every split operation.
- Mixture space: a dimension for every sample; each collection is a vector.
- The vectors (i.e., collections) are eventually partitioned.
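One way to read the mixture-space bullets (an interpretation of the slide, not the paper's exact definitions): with n samples in the system, each collection c maps to a vector whose i-th coordinate is the fraction of sample i's mass held by c,

```latex
v(c) \in \mathbb{R}^n, \qquad v(c)_i = \text{the fraction of sample } i\text{'s mass held by } c .
```

Merges mix these vectors, and convergence corresponds to the vectors eventually settling into a fixed partition.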

It works where it matters
[Figure: problem regimes labeled "Not interesting" and "Easy".]

It works where it matters
[Figure: classification error with and without outlier detection.]