Geometric Approach (10/5/2015). Geometric Interpretation: Each node holds a statistics vector.

Presentation transcript:

1 Geometric Approach (10/5/2015). Geometric Interpretation: Each node holds a statistics vector. Coloring the vector space: Grey: function > threshold; White: function <= threshold. Goal: determine the color of the global data vector (the average).
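A minimal sketch of the coloring idea on this slide. The monitored function `squared_norm`, the threshold value, and the sample vectors are hypothetical choices made only for illustration; the slides do not name a specific function here.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def color(f: Callable[[Vector], float], v: Vector, threshold: float) -> str:
    """Grey if f(v) > threshold, white if f(v) <= threshold."""
    return "grey" if f(v) > threshold else "white"


def global_average(local_vectors: List[Vector]) -> List[float]:
    """Coordinate-wise average of the nodes' local statistics vectors."""
    n = len(local_vectors)
    dim = len(local_vectors[0])
    return [sum(v[k] for v in local_vectors) / n for k in range(dim)]


def squared_norm(v: Vector) -> float:
    """Hypothetical monitored function (an assumption, not from the slides)."""
    return sum(x * x for x in v)


# Three nodes, each holding a 2-D statistics vector (made-up values).
local_vectors = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0]]
avg = global_average(local_vectors)
result = color(squared_norm, avg, threshold=5.0)
print(avg, result)
```

Computing `avg` this way needs every local vector in one place; the rest of the deck is about deciding the color without doing that at every update.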

2 Bounding the Convex Hull. Observation: the average is in the convex hull, so if the convex hull is monochromatic then the average is too. But the convex hull may become large.

3 Drift Vectors. Periodically calculate an estimate vector - the current global. Each node maintains a drift vector: the change in the local statistics vector since the last time the estimate vector was calculated. The global average statistics vector is also the average of the drift vectors.
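The last claim on this slide can be checked numerically. The sketch below assumes the common formulation in which each drift vector is the estimate vector plus the local change since the last synchronization; the slide itself only says "the change", so treat that offset as an assumption. All data values are made up.

```python
from typing import List

Vector = List[float]


def average(vectors: List[Vector]) -> Vector:
    """Coordinate-wise average of a list of vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]


def drift_vectors(estimate: Vector, at_sync: List[Vector],
                  current: List[Vector]) -> List[Vector]:
    """Assumed form: drift_i = estimate + (current_i - at_sync_i)."""
    return [
        [e + (c - s) for e, s, c in zip(estimate, v_sync, v_now)]
        for v_sync, v_now in zip(at_sync, current)
    ]


# Local vectors when the estimate was last computed, and their values now.
at_sync = [[1.0, 1.0], [3.0, 1.0], [2.0, 4.0]]
current = [[2.0, 1.0], [3.0, 3.0], [1.0, 5.0]]

estimate = average(at_sync)
drifts = drift_vectors(estimate, at_sync, current)

# The average of the drift vectors equals the current global average.
print(average(drifts), average(current))
```

The identity holds because the local changes average to the change of the global average, and adding the estimate to each of them restores the current value.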

4 The Bounding Theorem [SIGMOD’06]. A reference point is known to all nodes. Each vertex constructs a sphere. Theorem: the convex hull is bounded by the union of the spheres. → Local constraints!
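The theorem can be spot-checked empirically. This sketch assumes each sphere has the segment from the shared reference point to one vertex as its diameter, which matches the later slide's description of drift spheres. The points and the random sampling are illustrative only, and a passing check is not a proof.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Ball = Tuple[Vector, float]


def ball(reference: Vector, vertex: Vector) -> Ball:
    """Sphere whose diameter is the segment from reference to vertex."""
    center = [(r + v) / 2 for r, v in zip(reference, vertex)]
    return center, math.dist(reference, vertex) / 2


def in_union(point: Vector, balls: List[Ball], eps: float = 1e-9) -> bool:
    """True if the point lies in at least one ball (small float tolerance)."""
    return any(math.dist(point, c) <= r + eps for c, r in balls)


def random_convex_combination(points: List[Vector],
                              rng: random.Random) -> Vector:
    """Random point of the convex hull via normalized non-negative weights."""
    weights = [rng.random() for _ in points]
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[k] for w, p in zip(weights, points)) / total
            for k in range(dim)]


rng = random.Random(0)                      # fixed seed for repeatability
reference = [2.0, 2.0]                      # point known to all nodes
vertices = [[3.0, 2.0], [2.0, 4.0], [1.0, 3.0]]
balls = [ball(reference, v) for v in vertices]

hull_points = [reference] + vertices
all_covered = all(
    in_union(random_convex_combination(hull_points, rng), balls)
    for _ in range(1000)
)
print(all_covered)
```

Each ball depends only on the reference point and one node's vertex, which is why the theorem yields constraints that nodes can check locally.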

5 Basic Algorithm. An initial estimate vector is calculated. Nodes check the color of their drift spheres. The drift vector is the diameter of the drift sphere. If any sphere is non-monochromatic, the node triggers re-calculation of the estimate vector.
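A rough sketch of the local check, under two assumptions that are not in the slides: the monitored function is the Euclidean norm (chosen because its extremes over a ball have a closed form), and the drift sphere's diameter runs from the estimate vector to the drift vector. The function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def sphere_is_monochromatic(estimate: Vector, drift: Vector,
                            threshold: float) -> bool:
    """Check one drift sphere for f(x) = ||x|| (assumed function)."""
    center = [(e + d) / 2 for e, d in zip(estimate, drift)]
    radius = math.dist(estimate, drift) / 2
    norm_center = math.sqrt(sum(x * x for x in center))
    lowest = max(0.0, norm_center - radius)   # min of ||x|| over the ball
    highest = norm_center + radius            # max of ||x|| over the ball
    # All white (f <= threshold everywhere) or all grey (f > threshold).
    return highest <= threshold or lowest > threshold


def needs_resync(estimate: Vector, drifts: List[Vector],
                 threshold: float) -> bool:
    """A single non-monochromatic sphere triggers re-calculation."""
    return any(not sphere_is_monochromatic(estimate, d, threshold)
               for d in drifts)


estimate = [2.0, 2.0]
quiet = [[3.0, 2.0], [2.0, 4.0], [1.0, 3.0]]
print(needs_resync(estimate, quiet, threshold=5.0))

# One node drifts far enough for its sphere to cross the threshold surface.
moved = quiet[:2] + [[2.0, 6.0]]
print(needs_resync(estimate, moved, threshold=5.0))
```

For a general function the monochromaticity test is an optimization over the ball; the closed form above exists only because of the norm assumption.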

6 Reuters Corpus (RCV1-v2). 800,000+ news stories. Aug 20 1996 -- Aug 19 1997. Corporate/Industrial tagging. n=10 nodes, random data distribution.

7 Trade-off: Accuracy vs. Performance. Inefficiency: the value of the function on the average is close to the threshold. Performance can be enhanced at the cost of a less accurate result: set an error margin around the threshold value.

8 Performance Analysis

9 Performance Analysis (cont.)

10 Balancing. Globally calculating the average is costly. It is often possible to average only some of the data vectors.

11 Shape Sensitivity [PODS’08] (SRDC 2013). Fitting cover to data. Fitting cover to threshold surface. Specific function classes.

12 Fitting Cover to Data (using the covariance matrix)

13 Fitting Cover to Threshold Surface: Reference Vector Selection

14 Distance Fields: Skeleton, Medial Axis

15 Results: Shape Sensitivity

16 Prediction-Based Geometric Monitoring [SIGMOD’12]. Keeping up with v(t) movement. Stricter local constraints if local predictions remain accurate. [Figure: estimate vector e with drift vectors ΔV1 to ΔV5, predicted estimate e_p with drift vectors ΔVp1 to ΔVp5, the global vector v(t), and the threshold surface of f(v(t)) against T.]

17 Local Constraints. Let the nodes communicate only when “something happens”. “Tell me only if your measurement is larger than 50!” rather than “Send me your current measurements!” Safe Zones!
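The contrast between the two requests on this slide can be illustrated with a toy filter. The readings are invented; only the limit of 50 comes from the slide.

```python
from typing import List, Tuple


def poll_everything(readings: List[float]) -> List[Tuple[int, float]]:
    """'Send me your current measurements!': one message per reading."""
    return list(enumerate(readings))


def report_violations(readings: List[float],
                      limit: float = 50) -> List[Tuple[int, float]]:
    """'Tell me only if your measurement is larger than 50!'"""
    return [(t, x) for t, x in enumerate(readings) if x > limit]


readings = [42, 47, 51, 49, 55, 30]   # made-up sensor values
polled = poll_everything(readings)
sent = report_violations(readings)
print(len(polled), len(sent), sent)
```

Here the interval of values up to 50 plays the role of a one-dimensional safe zone: as long as a node stays inside it, the node stays silent.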

18 Local Distributions. Reasonable to assume future data will behave similarly… These Safe Zones save more communication!

19 Optimal Safe Zones. 1. Legal / Safe. 2. Large: minimize communication.

20 Example: Air quality monitoring. What are the optimal Safe Zones…?

21 The Optimization Problem. Is this convex? Is this linear? How many constraints are these? BAD NEWS: This problem is NP-hard.

22 The Optimization Problem. Step 3: Use non-convex optimization toolboxes (e.g. Matlab’s “fmincon”). These toolboxes use sophisticated gradient descent algorithms and return close-to-optimal results.

23 Data Set: what the data looks like

24 Ratio Queries. Example of triangular Safe Zones.

25 Improvement over convex-hull cover method. Why do we improve so much? Up to 200 nodes were involved in the experiment. The average improvement was by a factor of 17.5. 5,000 hours.

26 Higher Dimensions

27 Chi-Square Monitoring (5D). Examples of axis-aligned boxes as Safe Zones.

28 Improvement over GM. The improvement over the Geometric Method gets more substantial in higher dimensions. 1,000 hours, 90 nodes.

29 Safe Zones: Example

30 Biclique: Non-Convex Safe Zones. Safe Zone algorithm (for 2 nodes): take the data points, build a bipartite graph (how?), find the maximal biclique; these are your Safe Zones!
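The slide leaves the graph construction open ("how?"), so the following is only one plausible reading, not the authors' method. It assumes an edge joins a point of node 1 to a point of node 2 when the function of their average stays at or below the threshold, and it takes "maximal" to mean covering the most data points. The search is brute force and only suitable for tiny inputs. All names and values are hypothetical.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def safe_pair(a: Vector, b: Vector, f: Callable[[Vector], float],
              threshold: float) -> bool:
    """Assumed edge rule: the average of the two points is on the safe side."""
    avg = [(x + y) / 2 for x, y in zip(a, b)]
    return f(avg) <= threshold


def best_biclique(left: List[Vector], right: List[Vector],
                  f: Callable[[Vector], float],
                  threshold: float) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Brute-force biclique covering the most points (exponential in |left|)."""
    best: Tuple[Tuple[int, ...], Tuple[int, ...]] = ((), ())
    for size in range(1, len(left) + 1):
        for subset in combinations(range(len(left)), size):
            # Right-side points compatible with every chosen left-side point.
            common = tuple(
                j for j in range(len(right))
                if all(safe_pair(left[i], right[j], f, threshold)
                       for i in subset)
            )
            if common and len(subset) + len(common) > len(best[0]) + len(best[1]):
                best = (subset, common)
    return best


def first_coordinate(v: Vector) -> float:
    """Hypothetical monitored function for 1-D data."""
    return v[0]


node1 = [[40.0], [60.0], [45.0]]
node2 = [[50.0], [70.0], [35.0]]
zone1, zone2 = best_biclique(node1, node2, first_coordinate, threshold=50.0)
print([node1[i] for i in zone1], [node2[j] for j in zone2])
```

Every pair drawn from the two returned sets averages to a safe value, so each set can serve as a safe zone for its node without being convex.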

31 Conclusions. Local filtering for large-scale distributed data systems. Saving in communication is unlimited, bounded only by the aggregate over system lifetime. Saving bandwidth, central resources, power. Not necessary to sacrifice precision and latency. Less communication → more privacy.
