Published by Elisa Lucken. Modified over 3 years ago.

1
Local L2-Thresholding Based Data Mining in Peer-to-Peer Systems
Ran Wolff, Kanishka Bhaduri, Hillol Kargupta
CSEE Dept, UMBC
Presented by: Kanishka Bhaduri
Wednesday, April 15, 2015

2
Roadmap
Motivation
Algorithms
– Local L2
– K-means
Results
Related Work
Conclusions

3
P2P Network
Networks connect millions of individuals
Economical
No structural bias – ad hoc connections
Nodes equivalent in functionality
Volatile network structure
Motivation Algorithms Results Related Work Conclusions

4
P2P Setup
Millions of peers (Skype: ~50 million)
Dynamic topology and data
Communication – reliable, bandwidth-limited, asynchronous, asymmetric
Impracticalities / impossibilities
– global communication
– global synchronization

5
P2P Applications
P2P file sharing – audio, video (eMule, Kazaa, BitTorrent)
P2P sensor network applications
Grid computing

6
P2P Data Monitoring
Models (or predicates), e.g. k-means, eigenstates, of current data
Data and topology change rapidly
Does the current model still represent the data?

7
Our Work
Developed a monitoring algorithm
Monitors the quality of data mining results
Can be deployed in large peer-to-peer networks with very low resource consumption

8
Local Algorithms
Property:
– There exists k such that for any N there are instances (graph, inputs) with runtime / messaging / memory below k
– Eventual correctness guaranteed
– Local stopping rule

9
Local L2 Algorithm
Initial setup: each peer has
– A data vector
– Some global pattern vector
Monitoring problem:
– Is the L2 norm of the difference between the average data vector and the pattern vector greater than a given constant?
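The monitoring question above can be stated as a small centralized reference check. This is only a sketch of the predicate being monitored, not of the distributed algorithm: it assumes all data vectors are available in one place, and the names `l2_exceeds` and `epsilon` (the "given constant") are hypothetical.

```python
import math

def l2_exceeds(data_vectors, pattern, epsilon):
    """Return True if ||mean(data_vectors) - pattern||_2 > epsilon."""
    n = len(data_vectors)
    dim = len(pattern)
    # Component-wise average of all peers' data vectors.
    mean = [sum(v[d] for v in data_vectors) / n for d in range(dim)]
    # Euclidean (L2) distance between the average and the pattern vector.
    dist = math.sqrt(sum((m - p) ** 2 for m, p in zip(mean, pattern)))
    return dist > epsilon

# One data vector per peer (illustrative values only).
peers = [(1.0, 0.0), (3.0, 0.0), (2.0, 3.0)]
pattern = (2.0, 0.0)

# The average is (2, 1), which is at distance 1.0 from the pattern.
print(l2_exceeds(peers, pattern, 0.5))  # True
print(l2_exceeds(peers, pattern, 1.5))  # False
```

A local algorithm aims to answer the same yes/no question without ever collecting `peers` in one place.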

10
K-means Monitoring
Centroids of the data are the pattern vector
Monitoring problem: monitor the distance between the current centroids and the global average – raise a flag if the error exceeds C
Computing centroids: expensive, non-local, best-effort sampling
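One plausible reading of this monitoring problem, sketched centrally: assign each point to its nearest current centroid, average the points of each cluster, and raise a flag when any cluster average has drifted more than C from its centroid. The per-cluster interpretation, the function names, and the handling of empty clusters are assumptions, not details taken from the slide.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))

def kmeans_flag(points, centroids, C):
    """Return True if any cluster average is farther than C from its centroid."""
    sums = [[0.0] * len(c) for c in centroids]
    counts = [0] * len(centroids)
    for p in points:
        k = nearest(p, centroids)
        counts[k] += 1
        for d, value in enumerate(p):
            sums[k][d] += value
    for k, c in enumerate(centroids):
        if counts[k] == 0:
            continue  # assumption: an empty cluster raises no flag
        mean = [s / counts[k] for s in sums[k]]
        if math.dist(mean, c) > C:
            return True
    return False

centroids = [(0.0, 0.0), (10.0, 10.0)]
stable = [(0.0, 1.0), (1.0, 0.0), (10.0, 9.0), (9.0, 10.0)]
drifted = [(3.0, 3.0), (4.0, 2.0), (10.0, 9.0), (9.0, 10.0)]

print(kmeans_flag(stable, centroids, C=1.0))   # False
print(kmeans_flag(drifted, centroids, C=1.0))  # True
```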

11
Local Vectors
For peer P_i:
– Own estimate of global average (X)
– Agreement with neighbor P_j (Y)
– Withheld knowledge w.r.t. neighbor P_j (Z = X - Y)
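The three vectors a peer keeps per neighbor can be sketched as a small record. The class and field names are hypothetical, and the slide's shorthand Z = X - Y is taken literally as a component-wise difference; whether the underlying definition weights the vectors (for example by the number of data points behind each) is not stated on the slide.

```python
from dataclasses import dataclass

@dataclass
class LocalVectors:
    knowledge: tuple   # X: the peer's own estimate of the global average
    agreement: tuple   # Y: what the peer and its neighbor have agreed on

    @property
    def withheld(self):
        # Z = X - Y, read literally as a component-wise difference (assumption).
        return tuple(x - y for x, y in zip(self.knowledge, self.agreement))

lv = LocalVectors(knowledge=(2.0, 1.0), agreement=(1.5, 0.5))
print(lv.withheld)  # (0.5, 0.5)
```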

12
Possibilities
1. All 3 vectors inside circle
2. All 3 vectors outside circle
3. Some are inside, some are outside
[Figure: Case 1 and Case 3]
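The three possibilities amount to a simple classification of a peer's vectors against the circle. A minimal sketch, assuming the circle is centred on the origin (that is, vectors are measured relative to the pattern vector), that points on the boundary count as inside, and using hypothetical names `classify` and `epsilon`:

```python
import math

def classify(vectors, epsilon):
    """Classify vectors against a circle of radius epsilon around the origin."""
    inside = [math.hypot(*v) <= epsilon for v in vectors]
    if all(inside):
        return "all inside"    # case 1
    if not any(inside):
        return "all outside"   # case 2
    return "mixed"             # case 3

print(classify([(0.1, 0.0), (0.0, 0.2), (0.1, 0.1)], epsilon=1.0))  # all inside
print(classify([(2.0, 0.0), (0.0, 3.0), (2.0, 2.0)], epsilon=1.0))  # all outside
print(classify([(0.1, 0.0), (0.0, 3.0), (0.1, 0.1)], epsilon=1.0))  # mixed
```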

13
Theorem
If, for every peer and each of its neighbours, both the agreement and the withheld knowledge lie in a convex shape (here a circle), then so does the global average.
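The convexity step behind the theorem can be written out for the circle. The symbols below are introduced only for illustration: \epsilon is the circle's radius, x_k are vectors inside it, and w_k are non-negative weights summing to one. The sketch assumes the global average can be expressed as such a weighted combination of the per-peer vectors, which the slide does not spell out.

```latex
\[
\Bigl\| \sum_k w_k x_k \Bigr\|_2
  \;\le\; \sum_k w_k \,\| x_k \|_2
  \;\le\; \epsilon \sum_k w_k
  \;=\; \epsilon,
\qquad
w_k \ge 0,\quad \sum_k w_k = 1,\quad \| x_k \|_2 \le \epsilon .
\]
```

The first inequality is the triangle inequality; the same argument works for any convex region, since a convex set contains every weighted average of its points.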

14
Case 1: All Inside Circle
No more communication

15
Case 2: All Outside Circle
Two peers independently estimate that the global average vector is outside
The combined average can still be inside!
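A two-peer numerical example makes the problem concrete. The values are illustrative, the circle is assumed to be centred on the origin with radius `EPSILON`, and the two estimates are combined with equal weights.

```python
import math

EPSILON = 1.0  # assumed circle radius, centred on the origin

def norm(v):
    """Euclidean length of a vector."""
    return math.hypot(*v)

peer_a = (2.0, 0.0)    # outside the circle
peer_b = (-2.0, 0.0)   # also outside, on the opposite side
combined = tuple((a + b) / 2 for a, b in zip(peer_a, peer_b))

print(norm(peer_a) > EPSILON, norm(peer_b) > EPSILON)  # True True
print(combined, norm(combined) <= EPSILON)             # (0.0, 0.0) True
```

The outside of a circle is not convex, so agreeing that "everything is outside" does not by itself settle where the average lies.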

16
Case 2: All Outside Circle
Solution – use tangent lines to bound the circle
A tangent line or half-space is itself an unbounded convex region
The theorem holds in this case as well
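The half-space idea can be checked numerically: if all vectors lie in the same half-space cut off by a tangent line, so does their average, and that half-space is outside the circle. A minimal sketch, assuming a circle of radius `EPSILON` centred on the origin and a unit direction `u`; the names are hypothetical.

```python
def dot(u, x):
    return sum(a * b for a, b in zip(u, x))

def in_half_space(u, x, epsilon):
    """True if x lies in the half-space {x : u . x >= epsilon}, u a unit vector."""
    return dot(u, x) >= epsilon

EPSILON = 1.0
u = (1.0, 0.0)  # the tangent line is x = 1, touching the circle at (1, 0)

# Two vectors in the same half-space: their average stays in it.
same_side = [(2.0, 1.0), (2.0, -1.0)]
avg = tuple(sum(c) / len(same_side) for c in zip(*same_side))
print(all(in_half_space(u, x, EPSILON) for x in same_side))  # True
print(avg, in_half_space(u, avg, EPSILON))                   # (2.0, 0.0) True

# Two vectors outside the circle but on opposite sides share no such half-space for u.
opposite = [(2.0, 0.0), (-2.0, 0.0)]
print(all(in_half_space(u, x, EPSILON) for x in opposite))   # False
```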

17
Case 3: Inside & Outside
Needs communication

18
Overall Algorithm
(A) Area inside circle.
(B) Seven evenly spaced vectors.
(C) Borders of seven half-spaces u_i·x ≥ define a polygon.
(D) Area between circle and union of half-spaces.
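The regions (A) to (D) can be sketched as a point-location routine in two dimensions. Several things here are assumptions: the circle is centred on the origin with radius `epsilon`, the half-space threshold is taken to be that same `epsilon` (the right-hand side of the inequality is missing from the slide text), the seven unit vectors are spaced evenly by angle, and the function name and return labels are hypothetical.

```python
import math

def region(x, epsilon, k=7):
    """Locate a 2-D point: inside the circle, in a tangent half-space, or in the gap."""
    # (A) inside the circle (boundary counted as inside).
    if math.hypot(*x) <= epsilon:
        return "circle"
    # (B), (C) k evenly spaced unit vectors u_i and their half-spaces u_i . x >= epsilon.
    for i in range(k):
        angle = 2 * math.pi * i / k
        u = (math.cos(angle), math.sin(angle))
        if u[0] * x[0] + u[1] * x[1] >= epsilon:
            return "half-space"
    # (D) outside the circle but in none of the half-spaces.
    return "gap"

print(region((0.5, 0.0), 1.0))  # circle
print(region((2.0, 0.0), 1.0))  # half-space

# A point just outside the circle, midway in angle between two of the vectors.
theta = math.pi / 7
print(region((1.05 * math.cos(theta), 1.05 * math.sin(theta)), 1.0))  # gap
```

Using more vectors shrinks the gap region, at the cost of checking more half-spaces.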

19
Results: L2
[Figure panels: Scalability, Quality, Messages]

20
Results: k-means
[Figure panels: Quality, Messages]

21
Related Work
Flooding / limited-depth flooding [Bawa et al. ’04]
– Unacceptable resource requirement
Best-effort sampling [Bandhopadhaya et al. ’05]
– Inaccurate for long periods, expensive
Gossip-based sampling and aggregation [Kempe et al. ’03]
– No answer for dynamic data
Local algorithms [Peleg; Kutten; Patt-Shamir; Wolff ’03]

22
Conclusion
Presents a general framework for bounding the L2 norm of the average vector within any convex shape (also non-convex shapes)
L2 algorithm is local and hence highly scalable
k-means algorithm shows excellent accuracy
