Local L2-Thresholding Based Data Mining in Peer-to-Peer Systems
Ran Wolff, Kanishka Bhaduri, Hillol Kargupta
CSEE Dept, UMBC
Presented by: Kanishka Bhaduri

1 Local L2-Thresholding Based Data Mining in Peer-to-Peer Systems
Ran Wolff, Kanishka Bhaduri, Hillol Kargupta
CSEE Dept, UMBC
Presented by: Kanishka Bhaduri
Wednesday, April 15, 2015

2 Roadmap
Motivation
Algorithms
– Local L2
– K-means
Results
Related Work
Conclusions

3 P2P Network
Networks connect millions of individuals
Economical
No structural bias – ad hoc connections
Nodes equivalent in functionality
Volatile network structure
Motivation Algorithms Results Related Work Conclusions

4 P2P Setup
Millions of peers (Skype: ~50 million)
Dynamic topology and data
Communication – reliable, bandwidth-limited, asynchronous, asymmetric
Impracticalities / impossibilities
– global communication
– global synchronization

5 P2P Applications
P2P file sharing – audio, video (eMule, Kazaa, BitTorrent)
P2P sensor network applications
Grid computing

6 P2P Data Monitoring
Models (or predicates), e.g. k-means, eigenstates, of current data
Data and topology change rapidly
Does the current model still represent the data?

7 Our Work
Developed a monitoring algorithm
Monitors the quality of data mining results
Can be deployed in large peer-to-peer networks with very low resource consumption

8 Local Algorithms
Property:
– There exists k such that, for any N, there are instances (graph, inputs) with runtime / messaging / memory below k
– Eventual correctness guaranteed
– Local stopping rule

9 Local L2 Algorithm
Initial setup: each peer has
– A data vector
– Some global pattern vector
Monitoring problem:
– Is the L2 norm of the difference between the average data vector and the pattern vector greater than a given constant?
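As a point of reference, the predicate being monitored can be written down directly if all data vectors are gathered in one place. The sketch below is a centralized check only, not the local algorithm from the talk; the function names, the equal weighting of peers, and the sample data are assumptions made for illustration.

```python
import math

def average_vector(vectors):
    """Component-wise mean of equally weighted vectors (assumed weighting)."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / n for d in range(dim)]

def l2_distance(a, b):
    """Euclidean (L2) distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exceeds_threshold(peer_vectors, pattern, threshold):
    """True if ||mean(peer_vectors) - pattern||_2 > threshold."""
    return l2_distance(average_vector(peer_vectors), pattern) > threshold

# Hypothetical data: one vector per peer, plus a shared pattern vector.
peers = [[1.0, 2.0], [3.0, 2.0], [2.0, 5.0]]
pattern = [2.0, 2.0]

# The mean is [2.0, 3.0], which lies at distance 1.0 from the pattern.
print(exceeds_threshold(peers, pattern, 0.5))  # True
print(exceeds_threshold(peers, pattern, 1.5))  # False
```

The local algorithm aims to answer the same question without collecting every vector at a single site.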

10 K-means Monitoring
Centroids of data are the pattern vector
Monitoring problem: monitor the distance between current centroids and global average – raise flag if error is more than C
Computing centroids: expensive, non-local, best-effort sampling
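One possible reading of this monitoring problem is: assign each point to its nearest centroid, average the points in each cluster, and raise a flag for any centroid that lies farther than C from its cluster average. The sketch below follows that reading in a centralized setting; the interpretation, the function names, and the data are assumptions, and the talk's algorithm performs the check locally rather than at one site.

```python
import math

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))

def stale_centroids(points, centroids, c):
    """Indices of centroids farther than c from the mean of their cluster."""
    groups = {i: [] for i in range(len(centroids))}
    for p in points:
        groups[nearest(p, centroids)].append(p)
    flagged = []
    for i, members in groups.items():
        if not members:
            continue  # an empty cluster gives no average to compare against
        avg = [sum(col) / len(members) for col in zip(*members)]
        if dist(avg, centroids[i]) > c:
            flagged.append(i)
    return flagged

# Hypothetical data: cluster 0 has drifted away from its centroid.
centroids = [[0.0, 0.0], [10.0, 10.0]]
points = [[1.0, 1.0], [3.0, 3.0], [10.0, 10.0], [10.0, 12.0]]
print(stale_centroids(points, centroids, 1.5))  # [0]
```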

11 Local Vectors
For peer P_i:
– Own estimate of global average (X)
– Agreement with neighbor P_j (Y)
– Withheld knowledge w.r.t. neighbor P_j (Z = X - Y)

12 Possibilities
1. All 3 vectors inside circle
2. All 3 vectors outside circle
3. Some are inside, some are outside
Figure panels: Case 1, Case 3
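The three possibilities can be illustrated with a small classifier over the vectors X, Y and Z = X - Y. This is a minimal sketch that assumes the circle is centered at the origin, that the vectors are unweighted, and that a point on the boundary counts as inside; `classify` and its arguments are hypothetical names.

```python
import math

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def withheld(x, y):
    """Withheld knowledge Z = X - Y, as defined on the Local Vectors slide."""
    return [a - b for a, b in zip(x, y)]

def classify(x, y, radius):
    """Report which of the three possibilities holds for X, Y and Z."""
    z = withheld(x, y)
    inside = [norm(v) <= radius for v in (x, y, z)]
    if all(inside):
        return "case 1: all inside"
    if not any(inside):
        return "case 2: all outside"
    return "case 3: some inside, some outside"

print(classify([0.5, 0.0], [0.2, 0.0], 1.0))  # case 1: all inside
print(classify([5.0, 0.0], [2.0, 0.0], 1.0))  # case 2: all outside
print(classify([1.5, 0.0], [1.2, 0.0], 1.0))  # case 3: some inside, some outside
```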

13 Theorem
If, for every peer and each of its neighbours, both the agreement and the withheld knowledge are in a convex shape (here a circle), then so is the global average
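The theorem rests on a basic property of convex sets: an average of points that all lie in a convex region also lies in that region. The sketch below checks only this underlying property for a circle, not the full theorem about agreements and withheld knowledge; the sample size, the seed and the helper names are arbitrary choices.

```python
import math
import random

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def mean(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

random.seed(0)  # fixed seed so the run is repeatable
radius = 1.0

# Rejection-sample 100 points that lie inside the circle.
points = []
while len(points) < 100:
    p = [random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)]
    if norm(p) <= radius:
        points.append(p)

avg = mean(points)
print(norm(avg) <= radius)  # True: the average stays inside the circle
```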

14 Case 1: All Inside Circle
No more communication

15 Case 2: All Outside Circle
Two peers independently estimate that the global average vector is outside
Combined average can still be inside!
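A two-point example shows why the region outside a circle cannot be treated like the inside: it is not convex. The values below are hypothetical, and the circle is assumed to be centered at the origin.

```python
import math

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

radius = 1.0
a = [2.0, 0.0]    # first peer's estimate, outside the circle
b = [-2.0, 0.0]   # second peer's estimate, also outside
avg = [(p + q) / 2 for p, q in zip(a, b)]

print(norm(a) > radius, norm(b) > radius)  # True True
print(norm(avg) <= radius)                 # True: the average is the origin
```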

16 Case 2: All Outside Circle
Solution – use tangent lines to bound circle
A half-space bounded by a tangent line is itself an unbounded convex region
The theorem holds in this case as well
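The half-space argument can be checked numerically: if two vectors lie in the same half-space bounded by a tangent line, so does their average, and every point of that half-space is on or outside the circle. In this sketch the unit direction `u`, the radius and the sample vectors are assumptions, and the half-space is written as the set of x with u · x ≥ radius.

```python
import math

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def dot(u, x):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, x))

def in_half_space(u, x, radius):
    """True if x is on the far side of the line tangent to the circle at radius * u."""
    return dot(u, x) >= radius

radius = 1.0
u = [1.0, 0.0]      # unit vector pointing at the tangent point
a = [2.0, 3.0]
b = [1.5, -4.0]
avg = [(p + q) / 2 for p, q in zip(a, b)]   # [1.75, -0.5]

print(in_half_space(u, a, radius), in_half_space(u, b, radius))  # True True
print(in_half_space(u, avg, radius))                             # True
print(norm(avg) >= radius)                                       # True
```

Since u has unit length, u · x ≥ radius implies that the length of x is at least radius, which is why membership in the half-space is enough to rule out the inside of the circle.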

17 Case 3: Inside & Outside
Needs communication

18 Overall Algorithm
(A) Area inside the circle.
(B) Seven evenly spaced vectors.
(C) Borders of the seven half-spaces u_i · x ≥ (circle radius) define a polygon.
(D) Area between the circle and the union of the half-spaces
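The regions (A) to (D) can be sketched as a point-location routine in two dimensions. This is an illustration under assumptions: the circle is centered at the origin, the seven directions are unit vectors at equal angles, each half-space is the set of x with u_i · x ≥ radius, and the function names are hypothetical.

```python
import math

def unit_vectors(k=7):
    """k evenly spaced unit vectors in the plane."""
    return [(math.cos(2 * math.pi * i / k), math.sin(2 * math.pi * i / k))
            for i in range(k)]

def region(x, radius, directions):
    """Locate x: inside the circle, in some tangent half-space, or in the gap."""
    if math.hypot(x[0], x[1]) <= radius:
        return "inside circle"
    for i, u in enumerate(directions):
        if u[0] * x[0] + u[1] * x[1] >= radius:
            return f"half-space {i}"   # first matching half-space is reported
    return "gap"                        # outside the circle, in no half-space

dirs = unit_vectors(7)
radius = 1.0

# A point just outside the circle, halfway between directions 0 and 1.
angle = math.pi / 7
gap_point = (1.05 * math.cos(angle), 1.05 * math.sin(angle))

print(region((0.0, 0.0), radius, dirs))  # inside circle
print(region((5.0, 0.0), radius, dirs))  # half-space 0
print(region(gap_point, radius, dirs))   # gap
```

A point may belong to more than one half-space; this sketch returns the first one it finds.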

19 Results: L2 Scalability Quality Messages

20 Results: k-means Quality Messages

21 Related Work
Flooding / limited-depth flooding [Bawa et al. '04]
– Unacceptable resource requirement
Best-effort sampling [Bandyopadhyay et al. '05]
– Inaccurate for long periods, expensive
Gossip-based sampling and aggregation [Kempe et al. '03]
– No answer for dynamic data
Local algorithms [Peleg; Kutten; Patt-Shamir; Wolff '03]

22 Conclusion
Presents a general framework for bounding the L2 norm of the average vector within any convex shape (and also non-convex shapes)
L2 algorithm is local and hence highly scalable
k-means algorithm shows excellent accuracy

