




1 Ph.D. Thesis Proposal: Data Caching in Ad Hoc and Sensor Networks. Bin Tang, Computer Science Department, Stony Brook University

2 Summary of My Work: Data Caching
- Update cost constraint: optimal algorithm for trees; approximation algorithm for general graphs.
- Memory constraint with multiple data items: approximation algorithm for general graphs.
- Number constraint with read/write/storage costs: optimal algorithm for trees.
- Localized distributed implementations; comparison with existing work.

3 Motivation
- Ad hoc and sensor networks are resource constrained: limited bandwidth, battery energy, and memory.
- Caching can save access (communication) cost, and thus bandwidth and energy.
- We study caching under update cost, memory, and number constraints.

4 Rooted in…
- Facility location problem: place facilities in a network to minimize total access cost plus setup cost.
- K-median problem: place k facilities to minimize total access cost.

5 1. Cache Placement in Sensor Networks Under an Update Cost Constraint

6 Problem Statement
Sensor network model:
- A data item is stored at a server node and updated at a certain frequency.
- Other nodes access the data item at certain frequencies.
Problem: select nodes to cache the data item.
- Goal: minimize the total access cost.
- Constraint: bound the total update cost.

7 Why an update cost constraint? Nodes close to the server bear most of the update cost.

8 Problem Formulation
Given:
- Network graph G(V,E).
- A data item stored at a server node.
- An update frequency, and an access frequency for each other node.
- An update cost constraint Δ.
Goal: select cache nodes to minimize the total access cost, such that the total update cost is at most Δ.

9 Total Access/Update Cost
Total access cost = ∑_{i ∈ V} (hop distance between i and its nearest cache) × (access frequency of i)
Total update cost = cost of the optimal Steiner tree over the server and all caches
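The access-cost formula above can be sketched as a multi-source BFS. A minimal illustration assuming unit-weight edges and hop-count distances; the toy graph, frequencies, and function names are invented for the example:

```python
from collections import deque

def hop_distances(adj, sources):
    """Multi-source BFS: hop distance from every node to its nearest source."""
    dist = {s: 0 for s in sources}
    q = deque(sources)
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def total_access_cost(adj, caches, access_freq):
    """Sum over all nodes of (hops to nearest cache) x (access frequency)."""
    d = hop_distances(adj, caches)
    return sum(d[i] * f for i, f in access_freq.items())

# Toy 4-node path 0 - 1 - 2 - 3; the server (node 0) always holds the item.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
freq = {0: 0, 1: 2, 2: 1, 3: 5}
print(total_access_cost(adj, {0}, freq))     # 1*2 + 2*1 + 3*5 = 19
print(total_access_cost(adj, {0, 3}, freq))  # 1*2 + 1*1 = 3
```

Caching at the heavy reader (node 3) cuts the cost from 19 to 3; the placement algorithms that follow try to maximize exactly this kind of saving per unit of update cost.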

10 Algorithm Design Outline
- Tree networks: optimal dynamic programming algorithm.
- General networks:
  - Multiple-unicast update model: approximation algorithm.
  - Steiner-tree update model: heuristic and distributed algorithms.

11 Tree Networks

12 Subtree Notation
- Server: r. Consider a subtree Tv.
- Let every node on path(v,x) along its leftmost branch be a cache.
- Let C_v be the optimal access cost in Tv using additional update cost δ.
- Next: a recursive equation for C_v.
(Figure: tree Tr rooted at r; subtree Tv with the cache path from v to x on its leftmost branch.)

13 Dynamic Programming Algorithm for Tv under update cost constraint δ
- Let u be the leftmost deepest node in the optimal set of caches in Tv.
- Every node on path(v,u) can be a cache (the update cost does not increase).
- For a fixed u: C_v = constant + optimal access cost in Rv,u under constraint (δ − δ_u), where δ_u is the cost of updating u (via path(v,x)).
- Decomposition: Tv = Lv,u + Tu + Rv,u.

14 DP Recursive Equation for Tv
C_v = min_{u ∈ Tv} ( access cost in Lv,u using path(v,x) or path(v,u)
                   + access cost in Tu using u
                   + optimal cost in Rv,u with constraint δ − δ_u )
where δ_u is the cost of updating u (via path(v,x)). Note that Rv,u has a path (v, parent(u)) of caches on its leftmost branch.

15 Time Complexity
Time complexity: O(n^4 + n^3 Δ)
- Precomputation takes O(n^4):
  - Lv,u with cache path(v,x): O(n^4), over all v, u, x.
  - Tu: O(n^2), over all u.
- The recursive equation takes O(n^3 Δ):
  - n^2 Δ entries: one for each pair (v,x) and each constraint value up to Δ.
  - Each entry takes O(n): n possible choices of u.

16 General Graph Networks
Two update cost models:
- Multiple-unicast
- Optimal Steiner tree

17 Multiple-Unicast Update Model
- Update cost: sum of shortest-path lengths from the server to each cache node.
- Benefit of node A: decrease in total access cost due to selecting A as a cache.
- Greedy metric: benefit per unit update cost.

18 Greedy Algorithm
Iteratively select the node with the highest benefit per unit update cost, until the update cost budget is exhausted.
Theorem: the greedy solution's benefit is at least (1 − 1/e) ≈ 63% of the optimal benefit.
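The greedy rule above can be sketched as a budgeted selection loop. This is a hedged sketch, not the thesis code: the benefit and update-cost callbacks and the toy numbers are invented. In the multiple-unicast model the cost of a node would be its shortest-path distance from the server, and its benefit the marginal drop in total access cost.

```python
def greedy_caches(candidates, benefit, update_cost, budget):
    """Budgeted greedy: repeatedly add the candidate with the highest
    benefit per unit update cost that still fits in the remaining budget."""
    chosen, spent = set(), 0
    while True:
        best, best_cost, best_ratio = None, 0, 0.0
        for v in candidates - chosen:
            c = update_cost(v, chosen)  # e.g. shortest-path length server -> v
            b = benefit(v, chosen)      # e.g. drop in total access cost
            if c > 0 and spent + c <= budget and b / c > best_ratio:
                best, best_cost, best_ratio = v, c, b / c
        if best is None:
            return chosen
        chosen.add(best)
        spent += best_cost

# Invented per-node update costs and benefits, with a budget of 3.
costs = {"a": 1, "b": 2, "c": 4}
gains = {"a": 3, "b": 8, "c": 5}
picked = greedy_caches(set(costs),
                       lambda v, S: gains[v],
                       lambda v, S: costs[v],
                       budget=3)
print(picked)  # "b" (ratio 4) is taken first, then "a" (ratio 3); "c" never fits
```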

19 Steiner-Tree Update Cost Model
- Steiner-tree update cost: cost of a 2-approximate Steiner tree over the cache nodes.
- Incremental Steiner update cost of node A: increase in the Steiner-tree update cost due to A becoming a cache.
- Greedy-Steiner algorithm: iteratively select the node with the highest benefit per unit incremental update cost.

20 Distributed Greedy-Steiner Algorithm
- Each non-cache node estimates its benefit per unit update cost.
- If its estimate is the maximum among all its non-cache neighbors, it decides to cache.
- In each round: each node decides whether to cache as above; the server gathers the new cache node information and computes the total update cost; the remaining update cost budget is broadcast to the network, and a new round begins.
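The local decision rule of a single round might look like the following sketch. The estimates, neighbor lists, and the strict-maximum tie rule are assumptions made for illustration:

```python
def round_decisions(estimates, neighbors, caches):
    """One round of the local rule: a non-cache node decides to cache iff its
    benefit-per-unit-update-cost estimate strictly exceeds every estimate
    among its non-cache neighbors."""
    new_caches = set()
    for v, e_v in estimates.items():
        if v in caches:
            continue
        rivals = [estimates[u] for u in neighbors[v] if u not in caches]
        if all(e_v > r for r in rivals):
            new_caches.add(v)
    return new_caches

# Toy line network 0 - 1 - 2 with invented benefit estimates.
est = {0: 5.0, 1: 3.0, 2: 4.0}
neigh = {0: [1], 1: [0, 2], 2: [1]}
winners = round_decisions(est, neigh, caches=set())
print(winners)  # nodes 0 and 2 are local maxima, so both decide to cache
```

Because each node compares only against its neighbors, several nodes can decide to cache in the same round; the server-side budget check then bounds the total update cost between rounds.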

21 Performance Evaluation
Parameters varied:
(i) network-related: number of nodes and transmission radius;
(ii) application-related: number of clients.
Random networks of 2,000 to 5,000 nodes in a 30 × 30 region.

22 Compared Caching Schemes
- Centralized Greedy
- Centralized Greedy-Steiner
- Distributed Greedy-Steiner
- Dynamic programming on the shortest-path tree of clients
- Dynamic programming on the Steiner tree over clients and the server

23 Varying Network Size. Transmission radius = 2, percentage of clients = 50%, update cost budget = 25% of the Steiner tree cost.

24 Varying Transmission Radius. Network size = 4,000, percentage of clients = 50%, update cost budget = 25% of the Steiner tree cost.

25 Varying Number of Clients. Transmission radius = 2, update cost budget = 50% of the Steiner tree cost, network size = 3,000.

26 To Recap
- Data caching problem under an update cost constraint.
- Optimal algorithm for trees; an approximation algorithm for general graphs.
- Efficient distributed implementations.
- Next: a more general cache placement problem, (a) under a memory constraint, (b) with multiple data items.

27 2. Data Caching Under a Memory Constraint

28 Problem Addressed
In a general ad hoc network with limited memory at each node, where should data items be cached so that the total access (communication) cost is minimized?

29 Problem Formulation
Given:
- Network graph G(V,E).
- Multiple data items.
- Access frequencies (for each node and data item).
- A memory constraint at each node.
Select data items to cache at each node, under the memory constraints, to minimize
total access cost = ∑_nodes ∑_data items (distance from the node to the nearest cache of that item) × (access frequency).

30 Related Work
- Related to the facility location and k-median problems, which have no memory constraint.
- Baev and Rajaraman: a 20.5-approximation algorithm for uniform-size data items.
- For non-uniform sizes, there is no polynomial-time approximation unless P = NP.
- We circumvent this intractability by approximating the "benefit" instead of the access cost.

31 Related Work (continued)
Two major empirical works on distributed caching:
- Hara [Infocom '99]
- Yin and Cao [Infocom '04] (we compare our work with theirs)
Our work is the first to present a distributed caching scheme based on an approximation algorithm.

32 Algorithms
- Centralized Greedy Algorithm (CGA): delivers a solution whose benefit is at least 1/2 of the optimal benefit.
- Distributed Greedy Algorithm (DGA): purely localized.

33 Centralized Greedy Algorithm (CGA)
Benefit of caching a data item at a node = the reduction in total access cost, i.e., (total access cost before caching) − (total access cost after caching).

34 Centralized Greedy Algorithm (CGA)
CGA iteratively selects the most beneficial (data item, node) pair, i.e., at each stage it picks the pair with the maximum benefit.
Theorem: CGA is 1/2-approximate for uniform-size data items and 1/4-approximate for non-uniform-size data items.
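The CGA selection loop can be sketched as follows, assuming uniform-size data items so memory reduces to a slot count. The benefit callback here is a stand-in: a fixed table for the toy run, whereas the real algorithm recomputes benefits from access costs after every selection (which the `cached` argument allows).

```python
def cga(nodes, items, memory, benefit):
    """Centralized greedy: repeatedly cache the (item, node) pair with the
    largest positive benefit while the node has a free memory slot."""
    cached = {v: set() for v in nodes}
    while True:
        best, best_b = None, 0
        for v in nodes:
            if len(cached[v]) >= memory[v]:
                continue  # node v's memory is full
            for d in items - cached[v]:
                b = benefit(d, v, cached)
                if b > best_b:
                    best, best_b = (d, v), b
        if best is None:
            return cached
        d, v = best
        cached[v].add(d)

# Invented benefits for two items and two one-slot nodes.
vals = {("d1", "x"): 5, ("d2", "x"): 3, ("d1", "y"): 4, ("d2", "y"): 2}
placement = cga({"x", "y"}, {"d1", "d2"},
                memory={"x": 1, "y": 1},
                benefit=lambda d, v, cached: vals[(d, v)])
print(placement)  # both one-slot nodes end up caching d1
```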

35 CGA Approximation: Proof Sketch
Let G′ be a modified G in which each node has twice the memory of its counterpart in G, and caches the data items selected by both CGA and the optimal solution. Then:
B(Optimal in G) ≤ B(Greedy + Optimal in G′)
= B(Greedy) + B(Optimal w.r.t. Greedy)
≤ B(Greedy) + B(Greedy)   [by the greedy choice]
= 2 × B(Greedy)

36 Distributed Greedy Algorithm (DGA)
Each node caches the most beneficial data items, where benefit is based on local traffic only. Local traffic includes:
- the node's own data requests,
- data requests for the items it caches,
- data requests it forwards to others.

37 DGA: Nearest-Cache Table
Why is it needed?
- To forward requests to the nearest cache.
- For local benefit calculation.
What is it?
- Each node keeps the ID of the nearest cache for each data item, in entries of the form (data item, nearest cache).
- It is maintained on top of the routing table. Maintenance: next slides.

38 Maintenance of the Nearest-Cache Table
When node i caches data item Dj:
- i broadcasts (i, Dj) to its neighbors and notifies the server, which keeps a list of caches.
- On receiving (i, Dj), a node checks whether i is nearer than its current nearest cache for Dj; if so, it updates its table and forwards the message.

39 Maintenance of the Nearest-Cache Table (II)
When node i deletes Dj:
- i gets the list of caches Cj from the server of Dj, and broadcasts (i, Dj, Cj) to its neighbors.
- On receiving (i, Dj, Cj), a node checks whether i is its current nearest cache for Dj; if so, it updates its table using Cj and forwards the message.
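A sketch of the announcement handler behind the two maintenance rules above. The table layout, function names, and the 1-D hop metric are invented for illustration; a real node would use its routing table for distances:

```python
def on_cache_announce(table, routing_dist, node, i, d):
    """Process an announcement that node i now caches item d.  table[d]
    holds (nearest_cache, hops).  Returns True when the entry changed,
    i.e. when this node should re-broadcast the announcement."""
    new_hops = routing_dist(node, i)
    cur = table.get(d)
    if cur is None or new_hops < cur[1]:
        table[d] = (i, new_hops)
        return True
    return False

dist = lambda a, b: abs(a - b)  # hypothetical 1-D hop metric
table = {"D1": (0, 5)}          # node 3 thought node 0 (5 hops away) was nearest
changed = on_cache_announce(table, dist, node=3, i=4, d="D1")
print(changed, table["D1"])     # node 4 is only 1 hop away, so the entry updates
```

Returning a changed/unchanged flag is what bounds the flooding: a node re-forwards an announcement only when it improved its own entry.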

40 Maintenance of the Nearest-Cache Table (III)
Further details pertain to:
- mobility,
- second-nearest-cache entries (needed for benefit calculation on cache deletions),
- benefit thresholds.

41 Performance Evaluation
- CGA vs. DGA comparison
- DGA vs. HybridCache comparison

42 CGA vs. DGA
Summary of simulation results: DGA performs quite close to CGA over a wide range of parameter values.

43 Varying Number of Data Items and Memory Capacity. Transmission radius = 5, number of nodes = 500.

44 DGA vs. Yin and Cao's Work [Infocom '04]
- CacheData: caches passing-by data items.
- CachePath: caches the path to the nearest cache.
- HybridCache: caches the data item if it is small enough, otherwise caches the path to it.
- This is the only other work offering a purely distributed cache placement algorithm under a memory constraint.

45 DGA vs. HybridCache
Simulation setup:
- ns-2, with DSDV as the routing protocol.
- Random waypoint model: 100 nodes moving at speeds within (0, 20 m/s) in a 2000 m × 500 m area.
- Transmission radius = 250 m, bandwidth = 2 Mbps.
Performance metrics:
- Average query delay
- Query success ratio
- Total number of messages

46 Server model: 1,000 data items, divided between two servers. Data item sizes: [100, 1500] bytes.
Data access models:
- Random: each node accesses 200 data items chosen randomly from the 1,000.
- Spatial: (details skipped)
Naive caching algorithm: caches any passing-by data item and uses LRU for cache replacement.

47 Varying query generation time under the random access pattern.

48 Summary of Simulation Results
- Both HybridCache and DGA outperform the naive approach.
- DGA outperforms HybridCache in all metrics, especially for frequent queries and small cache sizes.
- Under high mobility, DGA has slightly worse average delay but a much better query success ratio.

49 To Recap
- Data caching problem for multiple items under a memory constraint.
- Centralized approximation algorithm, with a localized distributed implementation.
- No update or storage costs are considered (otherwise, there is no performance guarantee).
- Can we instead minimize the total read/write/storage cost?

50 3. Data Caching Under a Number Constraint

51 Problem Formulation
Given:
- Network graph G(V,E).
- A data item to be stored in the network.
- Access (read) frequency for each node.
- Write frequency for each node.
- Caching (storage) cost for each node.
- Number of allowable cache nodes: P.
Goal: select at most P cache nodes to minimize the total cost.

52 Total Cost
Total cost = total read cost + total write cost + total storage cost
= ∑_{i ∈ V} (hop distance between i and its nearest cache) × (read frequency of i)
+ ∑_{i ∈ V} (cost of the optimal Steiner tree over i and all caches) × (write frequency of i)
+ ∑_{i ∈ cache nodes} (storage cost at i)
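The three-term cost above can be written down directly once hop distances and Steiner-tree costs are supplied. A sketch with invented helper functions and numbers; on a path network the Steiner tree over a node set is just the spanning sub-path, which makes the toy check easy:

```python
def total_cost(nodes, caches, hops, read_freq, write_freq, storage, steiner_cost):
    """Total cost = read + write + storage, per the formula above.
    hops(i, S): hop distance from i to the nearest node of S.
    steiner_cost(i, S): cost of a Steiner tree over {i} union S
    (a 2-approximation would be used on general graphs)."""
    read = sum(hops(i, caches) * read_freq[i] for i in nodes)
    write = sum(steiner_cost(i, caches) * write_freq[i] for i in nodes)
    store = sum(storage[i] for i in caches)
    return read + write + store

# Path network 0 - 1 - 2 - 3 with caches at 1 and 3 (all numbers invented).
nodes = [0, 1, 2, 3]
caches = {1, 3}
hops = lambda i, S: min(abs(i - c) for c in S)
span = lambda i, S: max({i} | S) - min({i} | S)  # Steiner tree cost on a path
rf = {0: 2, 1: 0, 2: 1, 3: 0}
wf = {0: 1, 1: 0, 2: 0, 3: 0}
st = {0: 0, 1: 3, 2: 0, 3: 3}
cost = total_cost(nodes, caches, hops, rf, wf, st, span)
print(cost)  # read 3 + write 3 + storage 6 = 12
```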

53 Related Work
- K-median problem (access and storage costs): Tamir attains the best known time complexity on trees.
- We generalize it with write cost, on both trees (O(n^2 P^3)) and general graphs.
- Kalpakis et al. solve the same problem with time complexity O(n^6 P^3).

54 Tree Topology

55 55 Tamir’s DP Algorithm on tree Tr Transform arbitrary tree into full binary tree Each non-leaf node v has two children: v1, v2 For each v in binary tree, compute and sort the distance from v to all nodes “leaves to root” dynamic programming algorithm

56 Our DP Algorithm
Idea: for each node v in Tr, the cost of subtree Tv = access cost of nodes in Tv + storage cost of cache nodes in Tv + write cost of all writer nodes in Tr due to edges in Tv.

57 DP Algorithm: Definitions
- G(v, q, r): optimal cost for subtree Tv with exactly q caches in Tv, the closest of which is at most r hops from v.
- F(v, q, r): optimal cost for Tv with exactly q caches in Tv and some cache nodes outside Tv, the closest of which is r hops from v.
- F′(v, r): optimal cost for Tv with no cache in Tv and some cache nodes outside Tv, the closest of which is r hops from v.

58 Recursive DP Equations (with P cache nodes allowed)
1. G(v, q, 0) — v itself is a cache node:
   = storage cost at v + the costs of Tv1 and Tv2 + the write costs on edges (v,v1) and (v,v2).
2. G(v, q<P, r>0) — some cache node may lie outside Tv:
   = min{ G(v, q, r−1),   // a cache in Tv within r−1 hops of v
          cost when the closest cache to v is exactly r hops away }

59 Recursive DP Equations (continued)
3. G(v, q=P, r>0) — no cache node outside Tv:
   = min{ G(v, q, r−1), cost when the closest cache is exactly r hops away }
4. F(v, q, r) — there is a cache node outside Tv:
   = min{ G(v, q, r−1), cost when the closest cache to v is exactly r hops away }

60 Minimum Total Cost and Complexity
- Minimum total cost of the original tree Tr = min_{1≤p≤P} G(r, p, L), where L is the hop distance from r to the farthest node in Tr.
- Time complexity: O(n^2 P^3)
  - For each p, q varies from 1 to p.
  - For each (v, q), vary the closest cache node to v (n possibilities) and split q between Tv1 and Tv2 (q possibilities).

61 Conclusion
- We design optimal algorithms, near-optimal approximation algorithms, and heuristics for data caching under different constraints in ad hoc and sensor networks.
- We show that our algorithms can be implemented in a distributed way.

62 Questions?




