
Slide 1: Local L2-Thresholding Based Data Mining in Peer-to-Peer Systems
Ran Wolff, Kanishka Bhaduri, Hillol Kargupta
CSEE Dept., UMBC
Presented by: Kanishka Bhaduri
Wednesday, April 15, 2015

Slide 2: Roadmap
- Motivation
- Algorithms
  - Local L2
  - K-means
- Results
- Related Work
- Conclusions

Slide 3: P2P Network
- Networks connect millions of individuals
- Economical
- No structural bias: ad hoc connections
- Nodes equivalent in functionality
- Volatile network structure

Slide 4: P2P Setup
- Millions of peers (Skype: ~50 million)
- Dynamic topology and data
- Communication: reliable, bandwidth-limited, asynchronous, asymmetric
- Impracticalities/impossibilities: global communication, global synchronization

Slide 5: P2P Applications
- P2P file sharing: audio, video (eMule, Kazaa, BitTorrent)
- P2P sensor network applications
- Grid computing

Slide 6: P2P Data Monitoring
- Models (or predicates) of the current data, e.g. k-means centroids, eigenstates
- Data and topology change rapidly
- Does the current model still represent the data?

Slide 7: Our Work
- Developed a monitoring algorithm
- Monitors the quality of data mining results
- Can be deployed in large peer-to-peer networks with very low resource consumption

Slide 8: Local Algorithms
Properties:
- There exists a constant k such that for any network size N there are instances (graph, inputs) with runtime/messaging/memory below k
- Eventual correctness guaranteed
- Local stopping rule

Slide 9: Local L2 Algorithm
Initial setup: each peer has
- a data vector
- some global pattern vector
Monitoring problem:
- Is the L2 norm of the difference between the average data vector and the pattern vector greater than a given constant?
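The monitoring predicate can be sketched in a few lines. Note this is a hypothetical global view for illustration only: in the actual P2P setting no peer sees all data vectors, which is exactly what makes the problem hard. The function name and the example numbers are assumptions, not part of the talk.

```python
import math

def l2_exceeds_threshold(data_vectors, pattern, eps):
    """Global view of the monitoring predicate: is the L2 norm of
    (average data vector - pattern vector) greater than eps?
    No single peer can compute this directly in a P2P network."""
    dim = len(pattern)
    n = len(data_vectors)
    avg = [sum(v[i] for v in data_vectors) / n for i in range(dim)]
    dist = math.sqrt(sum((avg[i] - pattern[i]) ** 2 for i in range(dim)))
    return dist > eps

# Two peers holding (1,0) and (3,0), pattern (0,0): the average is
# (2,0), at distance 2 from the pattern, so a threshold of 1.5 trips.
flag = l2_exceeds_threshold([[1.0, 0.0], [3.0, 0.0]], [0.0, 0.0], 1.5)
```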

Slide 10: K-means Monitoring
- The centroids of the data are the pattern vector
- Monitoring problem: monitor the distance between the current centroids and the global average; raise a flag if the error exceeds C
- Computing centroids: expensive, non-local, best-effort sampling
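A minimal sketch of the flag-raising rule, assuming the error is measured as the largest L2 distance between a current centroid and its recomputed counterpart (the slide does not fix the exact error measure, so this measure is an assumption):

```python
import math

def kmeans_drift_exceeds(current_centroids, recomputed_centroids, C):
    """Raise the flag when any current centroid has drifted more
    than C (in L2 distance) from its recomputed counterpart."""
    return any(
        math.dist(cur, rec) > C
        for cur, rec in zip(current_centroids, recomputed_centroids)
    )
```

For example, a centroid moving from (0, 0) to (3, 4) drifts by 5 and trips any threshold C below 5.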

Slide 11: Local Vectors
For peer P_i:
- own estimate of the global average (X)
- agreement with neighbor P_j (Y)
- withheld knowledge w.r.t. neighbor P_j (Z = X - Y)
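The three per-peer vectors fit in a small record. The `Peer` class below is an illustrative sketch: only X, Y, and Z = X - Y come from the slide; the class name and the dictionary layout are assumptions, and the message-passing protocol that updates these vectors is omitted.

```python
class Peer:
    def __init__(self, local_data):
        # X: this peer's own estimate of the global average.
        self.X = list(local_data)
        # Y[j]: agreement vector with neighbor j (information both
        # sides have already exchanged and agree on).
        self.Y = {}

    def withheld(self, j):
        # Z = X - Y: knowledge not yet communicated to neighbor j.
        return [x - y for x, y in zip(self.X, self.Y[j])]
```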

Slide 12: Possibilities
1. All three vectors inside the circle
2. All three vectors outside the circle
3. Some inside, some outside
(Figures illustrate Case 1 and Case 3.)

Slide 13: Theorem
If, for every peer and each of its neighbours, both the agreement and the withheld knowledge lie in a convex shape (here, a circle), then so does the global average.

Slide 14: Case 1 - All Inside Circle
No more communication is needed.

Slide 15: Case 2 - All Outside Circle
- Two peers independently estimate that the global average vector is outside
- The combined average can still be inside!
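The pitfall on this slide is easy to reproduce numerically; the vectors below are made-up examples:

```python
import math

def inside_circle(v, eps):
    """Membership in the eps-ball around the origin (the circle
    used as the convex region in this talk)."""
    return math.hypot(*v) < eps

# Both peers' estimates lie outside the unit circle...
a, b = [2.0, 0.0], [-2.0, 0.0]
# ...yet their average is the origin, which lies inside. So two
# "outside" votes alone do not decide the global predicate.
avg = [(x + y) / 2 for x, y in zip(a, b)]
```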

Slide 16: Case 2 - All Outside Circle
- Solution: use tangent lines to bound the circle
- A tangent half-space is itself an (unbounded) convex region
- The theorem holds in this case as well
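Because a half-space is convex, the theorem's guarantee carries over: if two estimates lie in the same tangent half-space, so does their average. A sketch with made-up numbers:

```python
def in_halfspace(v, u, eps):
    """Membership in the half-space {x : u . x >= eps}; for a unit
    vector u, its boundary line is tangent to the eps-circle."""
    return sum(ui * vi for ui, vi in zip(u, v)) >= eps

u = (1.0, 0.0)                 # unit normal of one tangent half-space
a, b = (2.0, 0.5), (1.5, -1.0)  # two peers' estimates, both in it
avg = tuple((x + y) / 2 for x, y in zip(a, b))  # (1.75, -0.25)
# By convexity, avg must also satisfy u . avg >= eps.
```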

Slide 17: Case 3 - Inside & Outside
Communication is needed.

Slide 18: Overall Algorithm
(A) Area inside the circle. (B) Seven evenly spaced unit vectors u_i. (C) Borders of the seven half-spaces u_i · x ≥ r (r the circle's radius) define a polygon. (D) Area between the circle and the union of half-spaces.
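The decomposition in panels (A)-(D) amounts to classifying a vector into one of three regions. The sketch below assumes the circle's radius r is the monitoring threshold and uses seven evenly spaced unit directions, as on the slide; the function names are illustrative.

```python
import math

def unit_vectors(k=7):
    """k evenly spaced unit vectors u_i around the origin."""
    return [(math.cos(2 * math.pi * i / k), math.sin(2 * math.pi * i / k))
            for i in range(k)]

def region_of(v, r, units):
    """Classify v as: inside the circle of radius r, inside one of
    the tangent half-spaces u_i . x >= r, or in the leftover 'tie'
    area between the circle and the polygon, where a peer cannot
    decide locally and must communicate."""
    if math.hypot(*v) < r:
        return "circle"
    for i, u in enumerate(units):
        if u[0] * v[0] + u[1] * v[1] >= r:
            return "halfspace %d" % i
    return "tie area"
```

A peer whose vectors all fall in the circle, or all in one half-space, can stop; only the tie area forces messages.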

Slide 19: Results - L2
(Figures: scalability, quality, messages.)

Slide 20: Results - k-means
(Figures: quality, messages.)

Slide 21: Related Work
- Flooding / limited-depth flooding [Bawa et al. '04]: unacceptable resource requirements
- Best-effort sampling [Bandhopadhaya et al. '05]: inaccurate for long periods, expensive
- Gossip-based sampling and aggregation [Kempe et al. '03]: no answer for dynamic data
- Local algorithms [Peleg; Kutten; Patt-Shamir; Wolff '03]

Slide 22: Conclusion
- Presents a general framework for bounding the L2 norm of the average vector within any convex shape (and, via unions of convex regions such as half-spaces, some non-convex shapes)
- The L2 algorithm is local and hence highly scalable
- The k-means algorithm shows excellent accuracy
