Xiao Zhang 1, Wang-Chien Lee 1, Prasenjit Mitra 1, 2, Baihua Zheng 3 1 Department of Computer Science and Engineering 2 College of Information Science.


1 Xiao Zhang¹, Wang-Chien Lee¹, Prasenjit Mitra¹,², Baihua Zheng³. ¹ Department of Computer Science and Engineering and ² College of Information Science and Technology, The Pennsylvania State University; ³ School of Information Systems, Singapore Management University. EDBT, Nantes, France, 03/28/2008

2  Background  Problem Analysis  New TNN Algorithms  Optimization  Experiments  Conclusions & Future Work

3  What is TNN? ◦ S is a set of banks ◦ R is a set of restaurants ◦ TNN distance = 5+1 = 6

4  What is TNN?  Given a query point p and two datasets S and R, TNN returns a pair of objects (s, r) ∈ S×R such that ∀(s’, r’) ∈ S×R, dis(p, s) + dis(s, r) ≤ dis(p, s’) + dis(s’, r’), where dis(a, b) is the Euclidean distance between a and b.  First proposed by Zheng, Lee and Lee [1]. [1] B. Zheng, K. C. Lee, and W.-C. Lee. Transitive nearest neighbor search in mobile environments. SUTC 2006
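
The definition above can be checked against a brute-force baseline. The following is a minimal sketch, assuming small in-memory lists of 2-D points in place of broadcast data; the function names and the sample coordinates are illustrative and not from the paper.

```python
import math
from itertools import product


def trans_dist(p, s, r):
    """Transitive distance: from p to s, then from s to r (Euclidean)."""
    return math.dist(p, s) + math.dist(s, r)


def tnn_brute_force(p, S, R):
    """Return the pair (s, r) in S x R with the minimum transitive distance."""
    return min(product(S, R), key=lambda sr: trans_dist(p, sr[0], sr[1]))


# Illustrative data: S could be banks, R restaurants.
p = (0, 0)
S = [(3, 4), (1, 0)]
R = [(3, 5), (10, 0)]
best = tnn_brute_force(p, S, R)
print(best, trans_dist(p, *best))  # ((3, 4), (3, 5)) 6.0
```

Note that the answer does not use p's nearest neighbor in S, which is (1, 0): the TNN pair minimizes the sum of both legs, not the first leg alone.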

5  The server holds all the data and broadcasts it as radio signals over channels.  Mobile clients (cell phones and PDAs) tune in to the broadcast channels, download the necessary data, and process queries.  Broadcast vs. on-demand: ◦ Supports simultaneous access by an arbitrary number of mobile devices ◦ Efficient use of limited bandwidth ◦ Light workload on the server side

6  Assumptions: ◦ Zheng, Lee and Lee assumed a single broadcast channel. ◦ Based on existing technology (dual-mode, dual-standby cell phones), we assume multiple channels. ◦ A mobile client can access information in multiple channels simultaneously.  Challenges: ◦ How to utilize the parallel processing ability of mobile clients to facilitate query processing? ◦ How to reduce access time? ◦ How to reduce energy consumption?

7  1. We developed two new algorithms for TNN queries in multi-channel access environments.  2. We proposed two new distance metrics (MinTransDist and MinMaxTransDist) that let our new algorithms reduce the search cost efficiently.  3. We proposed an optimization technique to reduce energy consumption.

8  1. Two broadcast channels, one for S and one for R  2. 2-dimensional points  3. Air indexing: R-tree [2]  4. Broadcast in depth-first order to avoid back-tracking  5. (1, m) interleaving [3]  6. Performance metrics (in number of pages): ◦ Access time ◦ Tune-in time [2] A. Guttman. R-trees: a dynamic index structure for spatial searching. SIGMOD 1984 [3] T. Imielinski, S. Viswanathan, and B. Badrinath. Data on air: organization and access. TKDE 1997
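
To make the (1, m) interleaving setting concrete, here is a minimal sketch of how one broadcast cycle could be laid out: the complete index is repeated in front of each of m data segments, so a client that tunes in at an arbitrary moment never waits long for the next index copy. The page labels and the function name are illustrative assumptions, not the paper's implementation.

```python
def one_m_interleave(index_pages, data_pages, m):
    """Build one broadcast cycle with the full index before each of m data segments."""
    if m < 1:
        raise ValueError("m must be at least 1")
    seg_len = -(-len(data_pages) // m)  # ceiling division: pages per data segment
    cycle = []
    for i in range(m):
        segment = data_pages[i * seg_len:(i + 1) * seg_len]
        if not segment:  # fewer data pages than segments
            break
        cycle.extend(index_pages)  # a complete copy of the index
        cycle.extend(segment)      # followed by the next slice of the data
    return cycle


print(one_m_interleave(["I1", "I2"], ["D1", "D2", "D3", "D4"], m=2))
# ['I1', 'I2', 'D1', 'D2', 'I1', 'I2', 'D3', 'D4']
```

A larger m shortens the wait for the next index copy but lengthens the cycle, which is the usual trade-off behind choosing m.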

9  Choose ANY pair of objects (s’, r’) and use its transitive distance as the search range  The range is guaranteed to enclose the answer pair (s, r)

10  Theorem [1]: ◦ The transitive distance determined by any pair of objects (s, r) is an upper bound on the TNN distance.  General idea of answering TNN queries: ◦ Estimate: find a search range around the query point p by searching the index ◦ Filter: discard unqualified data objects in the search range determined earlier to find the pair of objects with the minimum transitive distance.
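
A short derivation of why the estimated range is safe, written as a sketch in this transcript's notation: (s’, r’) is the arbitrary pair used for estimation, (s, r) is the true answer, and d is the resulting search radius. It relies only on the optimality of (s, r) and the triangle inequality.

```latex
\begin{align*}
d &= \mathrm{dis}(p, s') + \mathrm{dis}(s', r')
  && \text{any pair } (s', r') \in S \times R \\
\mathrm{dis}(p, s) &\le \mathrm{dis}(p, s) + \mathrm{dis}(s, r) \le d
  && \text{optimality of } (s, r) \\
\mathrm{dis}(p, r) &\le \mathrm{dis}(p, s) + \mathrm{dis}(s, r) \le d
  && \text{triangle inequality, then optimality}
\end{align*}
```

Both s and r therefore lie inside the circle of radius d centered at p, so the Filter step only needs to examine objects within that circle.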

11  Deficiencies of existing algorithms: ◦ Approximate-TNN-Search:  Uses an equation to estimate the search range in the first step  Search range may be too large or too small ◦ Window-Based-TNN-Search:  Two sequential NN searches in estimation step  Search range estimation is done in sequential order  Large access time

12  Algo 1: Double-NN-Search ◦ Issue two NN queries in the estimation step ◦ p’s NN in S, and p’s NN in R ◦ (s_1, r_2)
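
The estimate-then-filter flow of Double-NN-Search can be sketched as follows, assuming flat in-memory lists stand in for the two R-tree-indexed channels (the real algorithm runs the two NN searches in parallel over broadcast index pages). The function names and sample points are illustrative.

```python
import math


def nearest(q, points):
    """Exact nearest neighbor of q by linear scan (stand-in for an R-tree NN search)."""
    return min(points, key=lambda x: math.dist(q, x))


def double_nn_search(p, S, R, eps=1e-9):
    # Estimate step: one NN query per channel, both issued from p.
    s_nn = nearest(p, S)
    r_nn = nearest(p, R)
    d = math.dist(p, s_nn) + math.dist(s_nn, r_nn)  # search radius

    # Filter step: only objects within distance d of p can belong to the answer.
    cand_s = [s for s in S if math.dist(p, s) <= d + eps]
    cand_r = [r for r in R if math.dist(p, r) <= d + eps]
    best = min(
        ((s, r) for s in cand_s for r in cand_r),
        key=lambda sr: math.dist(p, sr[0]) + math.dist(sr[0], sr[1]),
    )
    return best, d


pair, radius = double_nn_search((0, 0), [(3, 4), (1, 0)], [(3, 5), (10, 0)])
print(pair, round(radius, 3))  # ((3, 4), (3, 5)) 6.385
```

In this example the radius from the estimate step (about 6.385) is slightly larger than the final TNN distance (6.0), and the filter step discards (10, 0) without joining it against S.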

13  Hybrid-NN-Search ◦ Increases interaction between the two channels ◦ Uses the result of the finished NN search to guide the unfinished one in order to reduce the search range ◦ Uses new distance metrics to perform branch-and-bound ◦ Treats the TNN distance as a whole

14  NN in Channel 1 finishes first  Already found s=p.NN(S)  Looking for r_2 instead of r_1

15  NN in Channel 2 finishes first  Already found r=p.NN(R)  Looking for s_2 instead of s_1  Use new criteria when searching the index  Need new distance metrics for branch-and-bound

16  MinTransDist: ◦ Lower bound on the transitive distance from p, via any point in an MBR, to r.  MinMaxTransDist: ◦ Upper bound on the smallest transitive distance from p, via an object enclosed by an MBR, to r.  Details given in the paper.

17  Algorithm description: ◦ Case 1: while neither NN search has finished, follow the Double-NN algorithm. ◦ Case 2: if the NN search in Channel 1 (dataset S) finishes first, let s=p.NN(S), use s as the new query point, and perform the NN search on the remaining portion of the R-tree for dataset R. ◦ Case 3: if the NN search in Channel 2 (dataset R) finishes first, switch distance metrics and use MinTransDist and MinMaxTransDist to perform branch-and-bound, finding an s that minimizes the transitive distance.
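
The two non-trivial cases of the description above can be sketched on flat lists. This is a simplification under stated assumptions: it ignores the broadcast order and the "remaining portion" of the R-tree, and simply scans the full lists; the `finished_first` flag and function names are hypothetical.

```python
import math


def trans_dist(p, s, r):
    """Transitive distance from p via s to r."""
    return math.dist(p, s) + math.dist(s, r)


def hybrid_estimate(p, S, R, finished_first):
    """Estimate a pair and search radius once one channel's NN search is done."""
    if finished_first == "S":
        # NN search on S finished first: s is fixed, so re-target the
        # search on R to the nearest neighbor of s instead of p.
        s = min(S, key=lambda x: math.dist(p, x))
        r = min(R, key=lambda x: math.dist(s, x))
    elif finished_first == "R":
        # NN search on R finished first: r is fixed, so look for the s
        # that minimizes the whole transitive distance p -> s -> r.
        r = min(R, key=lambda x: math.dist(p, x))
        s = min(S, key=lambda x: trans_dist(p, x, r))
    else:
        raise ValueError("finished_first must be 'S' or 'R'")
    return (s, r), trans_dist(p, s, r)


p, S, R = (0, 0), [(3, 4), (1, 0)], [(3, 5), (10, 0)]
print(hybrid_estimate(p, S, R, "S"))  # pair ((1, 0), (3, 5)), radius about 6.385
print(hybrid_estimate(p, S, R, "R"))  # pair ((3, 4), (3, 5)), radius 6.0
```

On full lists, either case yields a radius no larger than the one obtained from the pair (p.NN(S), p.NN(R)), because the second object is chosen to minimize the remaining part of the transitive distance rather than the distance to p.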

18  Updating and pruning strategy ◦ Use a queue to keep potential MBRs, sorted by their arrival time ◦ Case 2 (s=p.NN(S) finishes first):  Switch the NN query point to s  Initial upper bound update  If there is an intermediate result r’, update the upper bound with dis(p, s)+dis(s, r’)  Scan the queue of MBRs and use the distance metrics of traditional NN queries.

19  Updating and pruning strategy (cont.) ◦ Case 3 (r=p.NN(R) finishes first):  If there is an intermediate result s’, use dis(p, s’)+dis(s’, r) as the new upper bound  Then scan all the MBRs in the queue and use z = min_{M_i ∈ MBR_queue} MinMaxTransDist(p, M_i, r) to update the upper bound.  In traversal, use MinMaxTransDist to update the upper bound; use MinTransDist for pruning
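
The Case 3 queue scan can be sketched as below, assuming the two bounds have already been computed for every queued MBR. The tuple layout, function name, and the numbers in the example are hypothetical and only illustrate the update-then-prune order.

```python
import math


def prune_mbr_queue(bounds, upper_bound=math.inf):
    """Tighten the upper bound, then drop MBRs that cannot beat it.

    bounds: list of (mbr_id, min_trans, min_max_trans) tuples, where
    min_trans is the MBR's lower bound and min_max_trans its upper bound.
    """
    # z is the smallest MinMaxTransDist over the queue.
    z = min((hi for _, _, hi in bounds), default=math.inf)
    upper_bound = min(upper_bound, z)
    # An MBR whose lower bound exceeds the upper bound cannot hold the answer.
    survivors = [mbr_id for mbr_id, lo, _ in bounds if lo <= upper_bound]
    return upper_bound, survivors


queue = [("M1", 4.0, 9.0), ("M2", 6.5, 7.0), ("M3", 8.0, 12.0)]
print(prune_mbr_queue(queue, upper_bound=10.0))  # (7.0, ['M1', 'M2'])
```

Here M2 tightens the bound from 10.0 to 7.0, which in turn prunes M3 because its lower bound of 8.0 exceeds the new upper bound.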

20  Example for pruning:

21  Goal: reduce energy consumption  Analysis: ◦ Previous algorithms minimize the search range in the Estimate Step by issuing “exact” search ◦ Energy consumption in Filter Step is low ◦ Energy consumption in Estimate Step is high  Approach: ◦ use “approximate” search in Estimate Step to save energy in this step

22  Approximate Search: ◦ Relax the pruning condition ◦ Use ratio of overlapping area to estimate the probability ◦ Compare the ratio with a threshold α
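
The slide does not spell out which areas are compared, so the following is only one plausible reading, stated as an assumption: the ratio is the part of an MBR's area that overlaps the current search range, with the circular range approximated by its bounding square, and an MBR is skipped when that ratio falls below α. The MBR layout `(x0, y0, x1, y1)` and all names are hypothetical.

```python
def overlap_ratio(mbr, p, radius):
    """Fraction of the MBR's area covered by the bounding square of the search circle."""
    x0, y0, x1, y1 = mbr
    qx0, qy0, qx1, qy1 = p[0] - radius, p[1] - radius, p[0] + radius, p[1] + radius
    w = min(x1, qx1) - max(x0, qx0)  # overlap width (negative if disjoint)
    h = min(y1, qy1) - max(y0, qy0)  # overlap height (negative if disjoint)
    if w < 0 or h < 0:
        return 0.0
    area = (x1 - x0) * (y1 - y0)
    if area == 0:
        return 1.0  # degenerate MBR that touches the range: treat as fully covered
    return (w * h) / area


def should_visit(mbr, p, radius, alpha):
    """Relaxed pruning: visit only if the overlap ratio is positive and at least alpha."""
    ratio = overlap_ratio(mbr, p, radius)
    return ratio > 0.0 and ratio >= alpha


mbr = (0, 0, 10, 10)
print(overlap_ratio(mbr, (0, 0), 5))       # 0.25
print(should_visit(mbr, (0, 0), 5, 0.0))   # True: alpha = 0 keeps every overlapping MBR
print(should_visit(mbr, (0, 0), 5, 0.5))   # False: pruned under the relaxed condition
```

With α = 0 the relaxed test adds no pruning beyond the overlap check, and raising α trades accuracy of the estimated range for fewer downloaded index pages.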

23  How to determine α?  Factors: ◦ R-tree height and node depth  Use small α on the root and large α on leaves ◦ Difference in densities of the two datasets involved  Small α or 0 on the dataset with smaller density  Range of α: 0 (exact search) to 1 (approximate search)

24  Dataset 1: ◦ 39,000 × 39,000 square region ◦ Densities: 10^-7.0, 10^-6.6, 10^-6.2, 10^-5.8, 10^-5.4, 10^-5.0, 10^-4.6, 10^-4.2 ◦ # of points: 152, 382, 960, 2411, 6055, 15210, 38206, 95969  Dataset 2: ◦ 39,000 × 39,000 square region ◦ # of points: 2,000 to 30,000 in increments of 2,000
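
The point counts of Dataset 1 follow from the listed densities: each count is the region's area multiplied by the density, rounded to the nearest integer. A small check, with illustrative variable names:

```python
SIDE = 39_000  # side length of the square region
EXPONENTS = [-7.0, -6.6, -6.2, -5.8, -5.4, -5.0, -4.6, -4.2]


def points_for_density(side, exponent):
    """Number of points = area of the square region x density (10 ** exponent)."""
    return round(side * side * 10 ** exponent)


counts = [points_for_density(SIDE, e) for e in EXPONENTS]
print(counts)  # [152, 382, 960, 2411, 6055, 15210, 38206, 95969]
```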

25  R-tree as air index  Broadcast in depth-first order  STR packing algorithm [4]  (1, m) interleaving [3]  1,000 query points generated for each of the experiments  Parameter sizes: ◦ Index pointer: 2 bytes ◦ Coordinate: 4 bytes ◦ Data content: 1k bytes ◦ Page capacity: 64 – 512 bytes [3] T. Imielinski, S. Viswanathan, and B. Badrinath. Data on air: organization and access. TKDE 1997 [4] S. Leutenegger, M. Lopez, and J. Edgington. STR: a simple and efficient algorithm for R-tree packing. ICDE 1997

26  Algorithms with exact search: ◦ Access time: Double-NN and Hybrid-NN have the same access time, which is smaller than that of Window-Based ◦ 1/40 ≤ size(S)/size(R) ≤ 1.8

27  Algorithms with exact search: ◦ Tune-in time: when 0.01 ≤ size(S)/size(R) ≤ 0.4, Hybrid-NN gives the best tune-in time

28  ANN (approximate NN) vs. eNN (exact NN) ◦ Tune-in time improves by 11% to 20%

29  Hybrid algorithm with ANN:

30  Double-NN and Hybrid-NN effectively reduce access time  Cases in which our algorithms reduce tune-in time are stated and discussed  The optimization technique effectively reduces the tune-in time of all three algorithms

31  Generalized TNN queries in broadcast environments: ◦ More than 2 datasets are involved ◦ Visiting order not specified ◦ Complete route query  Using the new distance metrics in disk-based environments

32  Any questions?

33  Def 1: (MinTransDist) ◦ Given two points p and r, and an MBR M_S, MinTransDist(p, M_S, r) = dis(p, s) + dis(s, r), where s is a point on M_S such that for any point s’ ∈ M_S with s’ ≠ s, dis(p, s’) + dis(s’, r) ≥ MinTransDist(p, M_S, r)
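
One way to evaluate Def 1 is sketched below. It is a geometric construction of my own, not taken from the paper: if p or r lies inside the MBR the minimum is simply dis(p, r); otherwise the minimizing point lies on one of the four sides, where the best point is found by the reflection argument and clamped to the side (valid because the transitive distance is convex along a line). The MBR layout `(x0, y0, x1, y1)` and all names are assumptions.

```python
import math


def _best_on_side(p, r, a, b):
    """Minimum of dis(p, s) + dis(s, r) over points s on the axis-aligned side a-b."""
    axis = 0 if a[1] == b[1] else 1   # coordinate that varies along the side
    fixed = 1 - axis                  # coordinate that is constant on the side
    c = a[fixed]
    dp, dr = abs(p[fixed] - c), abs(r[fixed] - c)
    # Reflection argument: unconstrained optimum along the side's supporting line.
    if dp + dr == 0:
        t = p[axis]
    else:
        t = p[axis] + (r[axis] - p[axis]) * dp / (dp + dr)
    lo, hi = sorted((a[axis], b[axis]))
    t = min(max(t, lo), hi)           # clamp to the side (convexity makes this optimal)
    s = (t, c) if axis == 0 else (c, t)
    return math.dist(p, s) + math.dist(s, r)


def min_trans_dist(p, mbr, r):
    """Lower bound on dis(p, s) + dis(s, r) for any point s in the MBR."""
    x0, y0, x1, y1 = mbr

    def inside(q):
        return x0 <= q[0] <= x1 and y0 <= q[1] <= y1

    if inside(p) or inside(r):
        return math.dist(p, r)        # s can coincide with p or r
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    sides = zip(corners, corners[1:] + corners[:1])
    return min(_best_on_side(p, r, a, b) for a, b in sides)


print(round(min_trans_dist((0, 0), (4, 2, 6, 5), (10, 0)), 3))  # 10.77
print(min_trans_dist((0, 0), (4, -1, 6, 1), (10, 0)))           # 10.0
```

In the second call the segment from p to r passes through the MBR, so the bound collapses to dis(p, r) and the MBR can never be pruned on this metric alone.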

34  Def 2: (MaxDist) ◦ Given two points p and r, and a line segment ℓ, MaxDist(p, ℓ, r) = max_{i=1,2} {dis(p, v_i) + dis(v_i, r)}, where v_i (i = 1, 2) are the two endpoints of ℓ ◦ MaxDist(p, ℓ, r) gives a tight upper bound for all the transitive distances from p to any point on ℓ, to r.

35  Def 3: (MinMaxTransDist) ◦ Given two points p and r, and an MBR M_S, MinMaxTransDist(p, M_S, r) = min_{1≤i≤4} {MaxDist(p, ℓ_i, r)}, where ℓ_i (1≤i≤4) are the four sides of MBR M_S  Lemma: ◦ Given a starting point p, an ending point r, and an MBR M_S enclosing a point dataset S, ∃s ∈ S such that dis(p, s) + dis(s, r) ≤ MinMaxTransDist(p, M_S, r)
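
MaxDist and MinMaxTransDist translate directly into code. The sketch below assumes an MBR is given as `(x0, y0, x1, y1)` and a segment as a pair of endpoints; the sample dataset is illustrative and chosen so that every side of its MBR touches a point, as a minimum bounding rectangle requires.

```python
import math


def trans_dist(p, s, r):
    """Transitive distance from p via s to r."""
    return math.dist(p, s) + math.dist(s, r)


def max_dist(p, segment, r):
    """MaxDist: the larger transitive distance via the two endpoints of the segment."""
    v1, v2 = segment
    return max(trans_dist(p, v1, r), trans_dist(p, v2, r))


def min_max_trans_dist(p, mbr, r):
    """MinMaxTransDist: the smallest MaxDist over the four sides of the MBR."""
    x0, y0, x1, y1 = mbr
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    sides = zip(corners, corners[1:] + corners[:1])
    return min(max_dist(p, side, r) for side in sides)


p, r = (0, 0), (10, 0)
S = [(4, 3), (5, 2), (6, 4), (5, 5)]   # points whose MBR is (4, 2, 6, 5)
bound = min_max_trans_dist(p, (4, 2, 6, 5), r)
best = min(trans_dist(p, s, r) for s in S)
print(round(bound, 3), round(best, 3))  # 10.797 10.77
```

The lemma holds in this example because (5, 2) lies on the bottom side, whose endpoints give the smallest MaxDist; since the transitive distance is convex along a segment, no point on that side can exceed the value at the worse endpoint.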

