
1 Privacy and Fault-Tolerance in Distributed Optimization Nitin Vaidya University of Illinois at Urbana-Champaign

2 Acknowledgements Shripad Gade Lili Su

3 Problem: minimize Σi fi(x)

4 Applications
fi(x) = cost for robot i to go to location x. Minimize the total cost of rendezvous, Σi fi(x), over the location x. Other examples: resource allocation (similar in structure to the rendezvous problem) and large-scale distributed machine learning, where data are generated at different locations.

5 Applications
Learning: minimize the total cost Σi fi(x), with each fi held by a different agent.

6 Outline
Distributed Optimization, Privacy, Fault-Tolerance

7 Distributed Optimization
(Figure: a peer-to-peer network of agents f1–f5, and a client-server setup with clients f1, f2, f3.)

8 Client-Server Architecture
(Figure: clients with local costs f1, f2, f3 connected to a central server.)

9 Client-Server Architecture
Server maintains estimate x_k. Client i knows fi(x).

10 Client-Server Architecture
Server maintains estimate x_k. Client i knows fi(x). In iteration k+1, client i downloads x_k from the server and uploads its gradient ∇fi(x_k).

11 Client-Server Architecture
Server maintains estimate x_k. Client i knows fi(x). In iteration k+1, client i downloads x_k from the server and uploads its gradient ∇fi(x_k). The server then updates x_{k+1} ⟵ x_k − α_k Σi ∇fi(x_k).
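
A minimal sketch of this client-server loop (illustrative Python; the quadratic client costs, step-size schedule, and client count are assumptions, not taken from the slides):

```python
import numpy as np

# Toy setup: client i holds f_i(x) = ||x - c_i||^2, so grad f_i(x) = 2*(x - c_i).
client_targets = [np.array([1.0, 0.0]), np.array([0.0, 2.0]), np.array([3.0, 1.0])]

def client_gradient(i, x):
    """Client i downloads x_k and uploads its local gradient grad f_i(x_k)."""
    return 2.0 * (x - client_targets[i])

x = np.zeros(2)                                  # server's estimate x_k
for k in range(1, 201):
    alpha_k = 1.0 / (k + 6)                      # diminishing step size (offset keeps early steps tame)
    grads = [client_gradient(i, x) for i in range(len(client_targets))]
    x = x - alpha_k * np.sum(grads, axis=0)      # x_{k+1} <- x_k - alpha_k * sum_i grad f_i(x_k)

print(x)   # approaches the minimizer of sum_i f_i(x), i.e. the mean of the c_i
```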

12 Variations: stochastic, asynchronous.

13 Peer-to-Peer Architecture
(Figure: agents f1–f5 connected in a network, each with a local cost fi(x).)

14 Peer-to-Peer Architecture
Each agent maintains a local estimate x. In each iteration: consensus step with neighbors, then apply own gradient to own estimate, x_{k+1} ⟵ x_k − α_k ∇fi(x_k).

15 Outline
Distributed Optimization, Privacy, Fault-Tolerance

16 (Figure: clients f1, f2, f3 upload gradients ∇fi(x_k) to the server.)

17 Server observes the gradients ⇒ privacy compromised.

18 Achieve privacy and yet collaboratively optimize
Server observes the gradients ⇒ privacy compromised; the goal is to achieve privacy and yet collaboratively optimize.

19 Related Work
Cryptographic methods (homomorphic encryption), function transformation, differential privacy.

20 Differential Privacy
Each client uploads a perturbed gradient ∇fi(x_k) + ε_k to the server.

21 Differential Privacy
Each client uploads a perturbed gradient ∇fi(x_k) + ε_k; this trades off privacy with accuracy.
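
A sketch of this differentially private upload (illustrative Python; the Gaussian noise model and the scale sigma are assumptions, not prescribed by the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.5   # noise scale (assumed value): larger sigma -> more privacy, less accuracy

def noisy_gradient(grad_fn, i, x):
    """Client i uploads grad f_i(x_k) + eps_k instead of the exact gradient."""
    return grad_fn(i, x) + rng.normal(scale=sigma, size=x.shape)

# Example: perturbed gradient of a toy cost f_0(x) = ||x||^2 at x = (1, 1).
print(noisy_gradient(lambda i, x: 2.0 * x, i=0, x=np.ones(2)))
```

The server-side update is unchanged; only the uploaded gradients are perturbed, which is where the privacy-accuracy trade-off enters.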

22 Proposed Approach
Motivated by secret sharing. Exploit diversity: multiple servers / neighbors.

23 Proposed Approach
Privacy if a subset of the servers is adversarial. (Figure: clients f1, f2, f3 connected to Server 1 and Server 2.)

24 Proposed Approach
Privacy if a subset of the neighbors is adversarial. (Figure: peer-to-peer network of agents f1–f5.)

25 Proposed Approach Structured noise that “cancels” over servers/neighbors

26 Intuition
(Figure: clients f1, f2, f3 connected to Server 1, with estimate x1, and Server 2, with estimate x2.)

27 Intuition
Each client simulates multiple clients: client i splits its cost fi into fi1 (for Server 1) and fi2 (for Server 2).

28 Intuition
Client i splits its cost so that fi1(x) + fi2(x) = fi(x); the pieces fij(x) are not necessarily convex.

29 Algorithm
Each server maintains an estimate. In each iteration, client i downloads the estimates from the corresponding servers and uploads the gradient of its share fij to server j; each server updates its estimate using the received gradients.

30 Algorithm
Each server maintains an estimate. In each iteration, client i downloads the estimates from the corresponding servers and uploads the gradient of its share fij to server j; each server updates its estimate using the received gradients. The servers periodically exchange estimates to perform a consensus step.
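
A rough sketch combining slides 27–30: client i splits fi into fi1 + fi2 (here by adding and subtracting a random linear term, an illustrative choice), uploads the gradient of fi1 to Server 1 and of fi2 to Server 2, and the servers periodically average their estimates. The costs, step sizes, and consensus period are all assumptions made for the demo.

```python
import numpy as np

rng = np.random.default_rng(1)
targets = [np.array([1.0, 0.0]), np.array([0.0, 2.0]), np.array([3.0, 1.0])]
n, dim = len(targets), 2

# Client i: f_i(x) = ||x - c_i||^2, split as f_i1(x) = f_i(x)/2 + r_i.x and
# f_i2(x) = f_i(x)/2 - r_i.x; the random linear terms cancel in f_i1 + f_i2,
# and neither piece alone reveals f_i.
r = [rng.normal(size=dim) for _ in range(n)]

def grad_share(i, x, server):
    base = x - targets[i]                        # = grad f_i(x) / 2
    return base + r[i] if server == 0 else base - r[i]

x_srv = [np.zeros(dim), np.zeros(dim)]           # one estimate per server
for k in range(1, 301):
    alpha_k = 1.0 / (k + 6)
    for s in range(2):                           # each server applies the gradients it received
        g = sum(grad_share(i, x_srv[s], s) for i in range(n))
        x_srv[s] = x_srv[s] - alpha_k * g
    if k % 10 == 0:                              # periodic consensus step between the servers
        avg = (x_srv[0] + x_srv[1]) / 2.0
        x_srv = [avg.copy(), avg.copy()]

print(x_srv[0])   # both servers approach the minimizer of sum_i f_i(x)
```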

31 Claim
Under suitable assumptions, the server estimates eventually reach consensus on a minimizer of Σi fi(x).

32 Privacy
Server 1 aggregates f11 + f21 + f31; Server 2 aggregates f12 + f22 + f32.

33 Privacy
Server 1 aggregates f11 + f21 + f31; Server 2 aggregates f12 + f22 + f32. Server 1 may learn f11, f21, f31, and the sum f12 + f22 + f32; this is not sufficient to learn the individual fi.

34 Function splitting (fi1(x) + fi2(x) = fi(x)) is not necessarily practical; structured randomization is an alternative.

35 Structured Randomization
Multiplicative or additive noise in the gradients; the noise cancels over the servers.

36 Multiplicative Noise
(Figure: clients f1, f2, f3 send gradients to Server 1, with estimate x1, and Server 2, with estimate x2.)

38 Multiplicative Noise
Client 1 sends α∇f1(x1) to Server 1 and β∇f1(x2) to Server 2, with α + β = 1.

39 Multiplicative Noise
Client 1 sends α∇f1(x1) to Server 1 and β∇f1(x2) to Server 2, with α + β = 1. It suffices for this invariant to hold over a larger number of iterations.

40 Multiplicative Noise
Client 1 sends α∇f1(x1) to Server 1 and β∇f1(x2) to Server 2, with α + β = 1. The noise from client i to server j is not zero-mean.
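
A sketch of the multiplicative-noise share (illustrative Python; drawing the weights uniformly, and the particular range, are assumptions, only α + β = 1 comes from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)

def split_gradients(grad_fn, i, x1, x2):
    """Client i sends alpha*grad f_i(x1) to Server 1 and beta*grad f_i(x2) to Server 2,
    with alpha + beta = 1; each share alone looks like a noisy gradient."""
    alpha = rng.uniform(-1.0, 2.0)   # shares need not lie in [0, 1] and are not zero-mean
    beta = 1.0 - alpha
    return alpha * grad_fn(i, x1), beta * grad_fn(i, x2)

# When the two server estimates agree, the shares add up to the true gradient.
g1, g2 = split_gradients(lambda i, x: 2.0 * x, i=0, x1=np.ones(2), x2=np.ones(2))
print(g1 + g2)   # equals grad f_0 at (1, 1), i.e. (2, 2)
```

Plugged into the two-server loop sketched earlier in place of the function-splitting shares, the same consensus-plus-diminishing-step-size argument applies.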

41 Claim
Under suitable assumptions, the server estimates eventually reach consensus on a minimizer of Σi fi(x).

42 Peer-to-Peer Architecture
(Figure: agents f1–f5 connected in a network.)

43 Reminder
Each agent maintains a local estimate x. In each iteration: consensus step with neighbors, then apply own gradient to own estimate, x_{k+1} ⟵ x_k − α_k ∇fi(x_k).

44 Proposed Approach
Each agent shares a noisy estimate with its neighbors. Scheme 1: noise cancels over neighbors. Scheme 2: noise cancels network-wide.

45 Proposed Approach
Each agent shares a noisy estimate with its neighbors, e.g. x + ε1 to one neighbor and x + ε2 to another. Scheme 1: noise cancels over neighbors, ε1 + ε2 = 0 (over iterations). Scheme 2: noise cancels network-wide.

46 Peer-to-Peer Architecture
Poster today: Shripad Gade.

47 Outline
Distributed Optimization, Privacy, Fault-Tolerance

48 Fault-Tolerance Some agents may be faulty
Need to produce “correct” output despite the faults

49 Byzantine Fault Model No constraint on misbehavior of a faulty agent
May send bogus messages Faulty agents can collude

50 Peer-to-Peer Architecture
fi(x) = cost for robot i to go to location x; a faulty agent may choose an arbitrary cost function.

51 Peer-to-Peer Architecture

52 Client-Server Architecture

53 Fault-Tolerant Optimization
The original problem, minimize Σi fi(x), is not meaningful when some agents are faulty.

54 Fault-Tolerant Optimization
The original problem is not meaningful. Instead, optimize the cost over only the non-faulty agents: minimize Σ_{i ∈ good} fi(x).

55 Fault-Tolerant Optimization
The original problem is not meaningful. Optimize the cost over only the non-faulty agents, Σ_{i ∈ good} fi(x)? Impossible!

56 Fault-Tolerant Optimization
Optimize a weighted cost over only the non-faulty agents, Σ_{i ∈ good} αi fi(x), with each αi as close to 1/|good| as possible.

57 Fault-Tolerant Optimization
Optimize a weighted cost Σ_{i ∈ good} αi fi(x) over only the non-faulty agents. With t Byzantine faulty agents, t of the weights may be 0.

58 Fault-Tolerant Optimization
Optimize a weighted cost Σ_{i ∈ good} αi fi(x) over only the non-faulty agents. With t Byzantine agents out of n total, at least n − 2t weights are guaranteed to be > 1/(2(n − t)).

59 Centralized Algorithm
Of the n agents, any t may be faulty. How to filter out the cost functions of faulty agents?

60 Centralized Algorithm: Scalar argument x
Define a virtual function G(x) whose gradient is obtained as follows

61 Centralized Algorithm: Scalar argument x
Define a virtual function G(x) whose gradient is obtained as follows. At a given x: sort the gradients of the n local cost functions.

62 Centralized Algorithm: Scalar argument x
Define a virtual function G(x) whose gradient is obtained as follows. At a given x: sort the gradients of the n local cost functions; discard the smallest t and the largest t gradients.

63 Centralized Algorithm: Scalar argument x
Define a virtual function G(x) whose gradient is obtained as follows. At a given x: sort the gradients of the n local cost functions; discard the smallest t and the largest t gradients; the mean of the remaining gradients is the gradient of G at x.

64 Centralized Algorithm: Scalar argument x
Define a virtual function G(x) whose gradient is obtained as follows. At a given x: sort the gradients of the n local cost functions; discard the smallest t and the largest t gradients; the mean of the remaining gradients is the gradient of G at x. The virtual function G(x) is convex.

65 Centralized Algorithm: Scalar argument x
Define a virtual function G(x) whose gradient is obtained as follows. At a given x: sort the gradients of the n local cost functions; discard the smallest t and the largest t gradients; the mean of the remaining gradients is the gradient of G at x. The virtual function G(x) is convex ⇒ it can be optimized easily.
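
A sketch of the gradient of the virtual function G at a scalar x (illustrative Python; the example gradient values are made up):

```python
def trimmed_gradient(gradients, t):
    """Gradient of G at x: given the n scalar gradients reported at x (up to t of them
    from Byzantine agents), drop the t smallest and t largest, and average the rest."""
    assert len(gradients) > 2 * t, "need n > 2t so something remains after trimming"
    kept = sorted(gradients)[t:len(gradients) - t]
    return sum(kept) / len(kept)

# Example: n = 5 agents, at most t = 1 faulty; the bogus gradient 100.0 is discarded.
print(trimmed_gradient([0.2, -0.1, 0.3, 100.0, 0.0], t=1))   # -> 0.1666...
```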

66 Peer-to-Peer Fault-Tolerant Optimization
Gradient filtering similar to the centralized algorithm; requires "rich enough" connectivity, and correlation between the functions helps. The vector case is harder; redundancy between the functions helps.

67 Summary
Distributed Optimization, Privacy, Fault-Tolerance

68 Thanks! disc.ece.illinois.edu


71 Distributed Peer-to-Peer Optimization
Each agent maintains a local estimate x. In each iteration, compute a weighted average with the neighbors' estimates.

72 Distributed Peer-to-Peer Optimization
Each agent maintains a local estimate x. In each iteration, compute a weighted average with the neighbors' estimates, then apply own gradient to own estimate: x_{k+1} ⟵ x_k − α_k ∇fi(x_k).

73 Distributed Peer-to-Peer Optimization
Each agent maintains a local estimate x. In each iteration, compute a weighted average with the neighbors' estimates, then apply own gradient to own estimate: x_{k+1} ⟵ x_k − α_k ∇fi(x_k). The local estimates converge to a minimizer of Σi fi(x).
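
A sketch of this peer-to-peer loop on a small ring of 5 agents (illustrative Python; the ring topology, uniform weights, quadratic costs, and step sizes are assumptions):

```python
import numpy as np

targets = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # agent i: f_i(x) = (x - targets[i])^2
n = len(targets)

# Doubly stochastic consensus weights on a ring: each agent averages itself and its two neighbors.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0

x = np.zeros(n)                                 # x[i] = agent i's local estimate
for k in range(1, 501):
    alpha_k = 1.0 / (k + 1)                     # diminishing step size
    x = W @ x                                   # weighted average with neighbors' estimates
    x = x - alpha_k * 2.0 * (x - targets)       # x_i <- x_i - alpha_k * grad f_i(x_i)

print(x)   # all local estimates approach argmin sum_i f_i(x) = mean(targets) = 2.0
```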

74 RSS – Locally Balanced Perturbations
Perturbations add to zero (locally, per node) and are bounded (≤ Δ). Algorithm: node j selects d_k^{j,i} such that Σi d_k^{j,i} = 0 and |d_k^{j,i}| ≤ Δ; shares w_k^{j,i} = x_k^j + d_k^{j,i} with node i; then consensus and (stochastic) gradient descent.
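
A sketch of the locally balanced perturbation step (illustrative Python; the helper name and the uniform draw are assumptions, only the zero-sum and bounded properties come from the slide):

```python
import numpy as np

rng = np.random.default_rng(3)

def locally_balanced_perturbations(num_neighbors, delta):
    """Draw d_k^{j,i} for i = 1..num_neighbors with sum_i d_k^{j,i} = 0 and |d_k^{j,i}| <= delta."""
    d = rng.uniform(-delta / 2, delta / 2, size=num_neighbors)
    return d - d.mean()            # zero-sum by construction; entries stay within [-delta, delta]

# Node j with estimate x_j sends w_k^{j,i} = x_j + d[i] to neighbor i,
# then proceeds with the usual consensus and (stochastic) gradient descent steps.
x_j = 1.7
d = locally_balanced_perturbations(num_neighbors=3, delta=0.5)
print(d.sum(), x_j + d)            # the perturbations sum to (numerically) zero
```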

75 RSS – Network Balanced Perturbations
Perturbations add to zero (over the network) and are bounded (≤ Δ). Algorithm: node j computes its perturbation d_k^j by sending a share s_{j,i} to each neighbor i, adding the received s_{i,j} and subtracting the sent s_{j,i}, so d_k^j = received − sent; the obfuscated state w_k^j = x_k^j + d_k^j is shared with neighbors; then consensus and (stochastic) gradient descent.
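
A sketch of the network-balanced variant (illustrative Python; the 4-node ring and the share distribution are assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]     # assumed 4-node ring topology

# s[j][i] = share that node j sends to neighbor i (bounded random values).
s = {j: {} for j in range(n)}
for a, b in edges:
    s[a][b] = rng.uniform(-0.25, 0.25)
    s[b][a] = rng.uniform(-0.25, 0.25)

# d_j = (sum of received shares) - (sum of sent shares); summed over all nodes this is zero,
# because every share appears exactly once with + and once with -.
d = np.array([sum(s[i][j] for i in range(n) if j in s[i]) - sum(s[j].values())
              for j in range(n)])
print(d.sum())                               # ~0 up to floating-point rounding

x = rng.normal(size=n)                       # local estimates x_k^j
w = x + d                                    # obfuscated states w_k^j shared with neighbors
```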

76 Convergence
Let x̄_j^T = (Σ_{k ≤ T} α_k x_k^j) / (Σ_{k ≤ T} α_k) and α_k = 1/√k. Then f(x̄_j^T) − f(x*) ≤ O(log T / √T) + O(Δ² log T / √T). Asymptotic convergence of the iterates to the optimum; privacy-convergence trade-off; stochastic gradient updates work too.

77 Function Sharing
Let the fi(x) be bounded-degree polynomials. Algorithm: node j shares a polynomial s_{j,i}(x) with node i; node j obfuscates using p_j(x) = Σ s_{i,j}(x) − Σ s_{j,i}(x); use f̂_j(x) = f_j(x) + p_j(x) and run distributed gradient descent.
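
A sketch of function sharing with quadratic polynomial costs on a 3-node clique (illustrative Python; the specific coefficients, topology, and share distribution are assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3
# True local costs f_j(x) as coefficient arrays [c0, c1, c2] for c0 + c1*x + c2*x^2.
f = [np.array([1.0, -2.0, 1.0]), np.array([0.0, 0.0, 2.0]), np.array([4.0, 1.0, 0.5])]

# s[j][i]: random bounded-degree polynomial share that node j sends to node i.
s = [[rng.uniform(-1.0, 1.0, size=3) for _ in range(n)] for _ in range(n)]

f_hat = []
for j in range(n):
    received = sum(s[i][j] for i in range(n) if i != j)
    sent = sum(s[j][i] for i in range(n) if i != j)
    p_j = received - sent            # the p_j sum to zero over the whole network
    f_hat.append(f[j] + p_j)         # obfuscated cost f_hat_j actually used by node j

# The global objective is preserved: sum_j f_hat_j(x) = sum_j f_j(x) for every x.
print(np.allclose(sum(f_hat), sum(f)))       # True
```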

78 Function Sharing – Convergence
The function-sharing iterates converge to the correct optimum (since Σi f̂_i(x) = f(x)). Privacy: if the vertex connectivity of the graph is ≥ f, then no group of f nodes can estimate the true functions fi (or any good subset of them). If p_j(x) is also similar in form to f_j(x), then it can hide f_j(x) well.

