Searching for Extremes Among Distributed Data Sources with Optimal Probing
Zhenyu (Victor) Liu, Computer Science Department, UCLA


Why Extremes?
- Sensors: the central server queries Sensor 1, Sensor 2, ..., Sensor n for the "highest raindrop" and returns Sensor i (the highest one) plus its value. Use: identifying severe weather conditions (flood / drought).
- Network links: for a network path from L.A. to N.Y., the central server queries link 1, link 2, ..., link n for the "slowest link" and returns link i (the slowest one) plus its transfer speed. Use: identifying the network bottleneck.
- Web databases: the central server queries Amazon, Barnes & Noble, and CampusI.com for the best Web site for "Computer Algorithms" and returns Website i (the best one) plus the matching Web pages. Use: identifying the best Web database for a user's query.

What Is the Challenge?
- Constant communication between the sensors and the central server is too expensive.
- Can the central server contact only a few sensors (i.e., use probing) to find the maximum?
[Figure: the central server sends the query "highest raindrop" to Sensor 1, Sensor 2, ..., Sensor n and returns Sensor i (the highest one) plus its value.]

A Motivating Example
a) The central server without the latest sensor updates: communication with Sensor 1, Sensor 2, ..., Sensor n is expensive, so the server knows only the possible value range of each sensor, not its actual (unknown) value.
b) Probing sensors' readings to reduce uncertainty: a probe returns the probed sensor's actual value (1000 in the figure).

Data Model
- The reading of each source is a random variable: X_1, ..., X_n.
- [l_i, u_i] is X_i's value range.
  - Bounded model: l_i and u_i are real numbers.
  - Unbounded model: [-∞, u_i], [l_i, +∞], or [-∞, +∞].
- X_i's probability distribution on [l_i, u_i] is given: f_i(x), F_i(x).
- X_1, ..., X_n are independent.
- Probing X_i returns its value x_i and costs c_i.
  - Uniform-cost model: c_1 = c_2 = ... = 1.
  - Non-uniform-cost model.

Uncertainty in the Answer
Two variables X_1 and X_2, each uniformly distributed.
- Probing cost 1: uncertainty U = 0.12.
- Probing cost 2: uncertainty U = 0.
[Figure: the densities f_1(x) and f_2(x) of X_1 and X_2.]

Uncertainty / Probing Cost Tradeoff
- Less probing: high uncertainty.
- More probing: low uncertainty.
- The tradeoff point is set by the user-specified uncertainty threshold ε.
[Figure: uncertainty in the answer plotted against probing cost.]

The Problem
Given the uncertainty data model, design a probing policy P: X_1^P → X_2^P → ... → X_n^P that
- incurs the least probing cost, and
- finds the maximum variable with an uncertainty lower than ε.
Brute-force search has to consider n! candidate orderings.

Optimal Probing under Zero-Uncertainty
ε = 0, i.e., return an absolutely correct answer.
Two policies:
- P_1: X_1 → X_2
- P_2: X_2 → X_1
[Figure: the densities f_1(x) and f_2(x) of X_1 and X_2.]

Optimal Probing under Zero-Uncertainty
- Theorem 1: If X_1, ..., X_n are ranked in descending order of their upper bounds, i.e., u_1 > ... > u_n, then P: X_1 → X_2 → ... → X_n is optimal in the zero-uncertainty case.
- The upper bound u_i serves as a "representative point" for X_i.
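The ordering in Theorem 1 can be illustrated with a minimal Python sketch. The `Source` class, the `probe_for_max` function, the example ranges, and the stopping rule (stop once the best value seen is at least the upper bound of every unprobed source) are assumptions made for illustration; the slides state only the probing order.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Source:
    """Hypothetical container for one data source X_i."""
    name: str
    lower: float        # l_i
    upper: float        # u_i
    cost: float = 1.0   # c_i (uniform-cost model by default)


def probe_for_max(
    sources: List[Source], probe: Callable[[Source], float]
) -> Tuple[Optional[str], float, float]:
    """Probe in descending order of upper bound (the Theorem 1 ordering).

    Assumed stopping rule: stop as soon as the best value seen is at least
    the upper bound of the next source, because every remaining source has
    an upper bound that is no larger and so cannot strictly exceed it.
    """
    order = sorted(sources, key=lambda s: s.upper, reverse=True)
    best_name, best_value, total_cost = None, float("-inf"), 0.0
    for s in order:
        if best_value >= s.upper:
            break
        value = probe(s)
        total_cost += s.cost
        if value > best_value:
            best_name, best_value = s.name, value
    return best_name, best_value, total_cost


# Hypothetical ranges and readings, not taken from the slides.
sources = [Source("A", 0, 1000), Source("B", 200, 900), Source("C", 0, 600)]
readings = {"A": 950.0, "B": 400.0, "C": 300.0}
result = probe_for_max(sources, lambda s: readings[s.name])
print(result)  # ('A', 950.0, 1.0): one probe is enough here
```

In this example the first probe already exceeds the upper bounds of B and C, so the answer is certain after a single probe.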

Optimal Probing under Non-Zero-Uncertainty
ε = 0.15
Two policies:
- P_1: X_1 → X_2, saves the 2nd probe if X_1 > 885
- P_2: X_2 → X_1, saves the 2nd probe if X_2 > 850
[Figure: the densities f_1(x) and f_2(x) of X_1 and X_2.]

Critical Point
- Critical point: τ_i ∈ [l_i, u_i] such that P(X_i > τ_i) = ε, i.e., F_i(τ_i) = 1 - ε.
- Lemma 1: With two variables X_1 and X_2, the optimal policy always probes the one with the larger critical point first.
[Figure: F_1(x) and F_2(x) plotted against x; each curve reaches the level 1 - ε at its critical point, τ_1 or τ_2.]
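As a worked example of the critical point (written `tau` in the code), the sketch below computes it for a uniform distribution in closed form and, for a general CDF, by bisection on F(tau) = 1 - eps. The function names and the range [0, 1000] are hypothetical, and the bisection assumes a continuous, nondecreasing CDF on [lower, upper].

```python
from typing import Callable


def uniform_cdf(x: float, lower: float, upper: float) -> float:
    """CDF of Uniform[lower, upper]."""
    if x <= lower:
        return 0.0
    if x >= upper:
        return 1.0
    return (x - lower) / (upper - lower)


def uniform_critical_point(lower: float, upper: float, eps: float) -> float:
    """Closed form: P(X > tau) = (upper - tau) / (upper - lower) = eps."""
    return upper - eps * (upper - lower)


def critical_point(
    cdf: Callable[[float], float],
    lower: float,
    upper: float,
    eps: float,
    tol: float = 1e-9,
) -> float:
    """Bisection for tau in [lower, upper] with cdf(tau) = 1 - eps."""
    lo, hi = lower, upper
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < 1 - eps:
            lo = mid   # too little mass below mid: tau lies to the right
        else:
            hi = mid   # enough mass below mid: tau lies at or left of mid
    return (lo + hi) / 2


# Hypothetical example: X ~ Uniform[0, 1000], eps = 0.15.
tau_closed = uniform_critical_point(0.0, 1000.0, 0.15)
tau_numeric = critical_point(lambda x: uniform_cdf(x, 0.0, 1000.0), 0.0, 1000.0, 0.15)
print(tau_closed, tau_numeric)  # both are approximately 850
```

The closed form is handy for the uniform case used in the slides' examples; the bisection variant applies to any distribution for which F_i(x) can be evaluated.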

Deriving the Optimal Policy from the Critical Points?
Theorem 2: The optimal policy should always place X_i before X_j if:
- Cond 1: τ_i > τ_j
- Cond 2: for all x > τ_j, F_i(x) < F_j(x)
[Figure: F_i(x) and F_j(x) plotted against x, with the level 1 - ε and the critical points τ_j and τ_i marked.]
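The two conditions of Theorem 2 can be checked numerically for a pair of uniform variables, as in the rough sketch below. Several choices here are assumptions rather than part of the slides: the conditions are read as jointly required, Cond 2 is tested only on a finite grid over the open interval between the critical point of X_j and the larger of the two upper bounds (beyond that both CDFs equal 1), and all names and ranges are hypothetical.

```python
from typing import Tuple

Range = Tuple[float, float]  # (l, u) of a uniformly distributed variable


def uniform_cdf(x: float, lower: float, upper: float) -> float:
    """CDF of Uniform[lower, upper]."""
    if x <= lower:
        return 0.0
    if x >= upper:
        return 1.0
    return (x - lower) / (upper - lower)


def uniform_critical_point(lower: float, upper: float, eps: float) -> float:
    """tau with P(X > tau) = eps for X ~ Uniform[lower, upper]."""
    return upper - eps * (upper - lower)


def must_precede(xi: Range, xj: Range, eps: float, grid: int = 1000) -> bool:
    """True if both Theorem 2 conditions hold for 'X_i before X_j'.

    Cond 2 is only approximated: it is checked on `grid` interior points.
    """
    (li, ui), (lj, uj) = xi, xj
    tau_i = uniform_critical_point(li, ui, eps)
    tau_j = uniform_critical_point(lj, uj, eps)
    if not tau_i > tau_j:                      # Cond 1
        return False
    hi = max(ui, uj)
    for k in range(1, grid + 1):               # Cond 2 on a grid
        x = tau_j + (hi - tau_j) * k / (grid + 1)
        if not uniform_cdf(x, li, ui) < uniform_cdf(x, lj, uj):
            return False
    return True


# Hypothetical ranges, eps = 0.15.
a, b, c = (0.0, 1000.0), (100.0, 800.0), (700.0, 900.0)
print(must_precede(a, b, 0.15))  # True: a has the larger critical point and its CDF lies below b's
print(must_precede(c, a, 0.15))  # False: Cond 1 holds, Cond 2 fails near x = 900
print(must_precede(a, c, 0.15))  # False: Cond 1 fails
```

For the pair (a, c) neither direction satisfies both conditions, so under this reading the theorem gives no preference between them and both orderings remain candidates.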

Applying Theorem 2 to Derive the Optimal Policy
Case 1: Optimal policy: P: X_1 → X_2 → ... → X_n
[Figure: F_1(x), F_2(x), ..., F_n(x) plotted against x, with the level 1 - ε and the critical points τ_1, τ_2, ..., τ_n marked.]

Applying Theorem 2 to Derive the Optimal Policy
Case 2: Possible candidate policies have the form {X_1, X_2, X_3} → {X_4, X_5}, and X_1 must come before X_2:
- X_1 → X_2 → X_3 → X_4 → X_5
- X_1 → X_2 → X_3 → X_5 → X_4
- X_1 → X_3 → X_2 → X_4 → X_5
- X_1 → X_3 → X_2 → X_5 → X_4
- X_3 → X_1 → X_2 → X_4 → X_5
- X_3 → X_1 → X_2 → X_5 → X_4
[Figure: F_1(x), ..., F_5(x) plotted against x, with the level 1 - ε marked.]
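The reduction in Case 2 can be reproduced by filtering all 5! = 120 orderings against the pairwise "must come before" constraints. The sketch below is a plain brute-force filter, not the search procedure from the talk; the function name and the string labels are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def candidate_policies(
    variables: Sequence[str], before: Sequence[Tuple[str, str]]
) -> List[Tuple[str, ...]]:
    """Return every ordering in which a precedes b for each (a, b) in `before`."""
    result = []
    for order in permutations(variables):
        position = {v: k for k, v in enumerate(order)}
        if all(position[a] < position[b] for a, b in before):
            result.append(order)
    return result


variables = ["X1", "X2", "X3", "X4", "X5"]
# {X1, X2, X3} all come before {X4, X5}, and X1 comes before X2.
before = [(a, b) for a in ("X1", "X2", "X3") for b in ("X4", "X5")]
before.append(("X1", "X2"))

policies = candidate_policies(variables, before)
print(len(policies))  # 6 of the 120 orderings survive
for p in policies:
    print(" -> ".join(p))
```

Enumerating all permutations is only practical for small n; it is used here to confirm that the constraints leave exactly the six candidate policies listed above.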

Experimental Set-up
- 166 rainfall sensors across Washington State.
- The rainfall at each sensor location was recorded every day over the past 46 years.

Probability Distribution
- From the historical data, generate one distribution per sensor per day.
- Distinguish two kinds of historical data:
  - Yesterday was dry.
  - Yesterday was rainy.

Preliminary Results
Complexity of optimal-policy searching

Future Experimental Study
- The behavior of the optimal policy on the rainfall sensor data:
  - Uncertainty threshold ε vs. number of sensor probes.
- The behavior of the optimal policy on synthetic datasets:
  - Reduction in the search space.
  - ε vs. number of sensor probes.

Summary
- Under the proposed data model, find the maximum variable with uncertainty less than ε.
- Optimal probing policy:
  - ε = 0: sort the variables according to their upper bounds.
  - ε > 0: derive probing preferences (X_i before X_j) and reduce the search space.