A Generic Framework for Handling Uncertain Data with Local Correlations Xiang Lian and Lei Chen Department of Computer Science and Engineering The Hong.

Presentation transcript:

A Generic Framework for Handling Uncertain Data with Local Correlations Xiang Lian and Lei Chen Department of Computer Science and Engineering The Hong Kong University of Science and Technology Clear Water Bay, Kowloon Hong Kong, China {xlian, VLDB Seattle

Motivation Example: a forest monitoring application. [Figure: a forest and its sensory data]

Motivation Example (cont'd): samples s_i are collected from sensor node n_i.

Motivation Example (cont'd): sensory data are uncertain and imprecise. [Figure: uncertainty regions]

Motivation Example (cont'd): 3 monitoring areas. [Figure: forest]

Motivation Example (cont'd): 3 monitoring areas. [Figure: forest, with spatially close sensors and sensors far away]

Locally Correlated Sensory Data: efficient query answering on locally correlated uncertain data. [Figure: Area 1, Area 2, Area 3]

Nearest Neighbor Queries on Locally Correlated Uncertain Data

Outline
- Introduction
- Model for Locally Correlated Uncertain Data
- Problem Definition
- Query Answering on Uncertain Data With Local Correlations
- Experimental Evaluation
- Conclusions

Introduction
- Uncertain data are pervasive in real applications
  - Sensor networks
  - RFID networks
  - Location-based services
  - Data integration
- Existing works often assume independence among uncertain objects, yet uncertain objects exhibit correlations, in particular local correlations

Data Model for Local Correlations
- The uncertain objects form several locally correlated partitions (LCPs)
  - Uncertain objects within each LCP are correlated with each other
  - Uncertain objects from distinct LCPs are independent of each other

Data Model for Local Correlations (cont'd)
- Bayesian network
  - Each vertex corresponds to a random variable
  - Each vertex is associated with a conditional probability table (CPT)

Data Model for Local Correlations (cont'd)
- The joint probability of variables
  - Join tuples in CPTs and multiply conditional probabilities
  - Variable elimination
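
The join-and-multiply step can be illustrated with a minimal sketch. The two-variable network, the value names, and the probabilities below are all hypothetical (they do not come from the slides), and the elimination is done by brute-force summation rather than an optimized elimination order.

```python
# Hypothetical LCP with two sensor readings: s1 has no parent, s2 depends on s1.
# Values and probabilities are made up for illustration.
cpt_s1 = {"low": 0.6, "high": 0.4}            # P(s1)
cpt_s2 = {                                    # P(s2 | s1), keyed by (s1, s2)
    ("low", "low"): 0.9, ("low", "high"): 0.1,
    ("high", "low"): 0.2, ("high", "high"): 0.8,
}

def joint(s1, s2):
    """P(s1, s2) = P(s1) * P(s2 | s1): join the matching CPT rows and multiply."""
    return cpt_s1[s1] * cpt_s2[(s1, s2)]

def marginal_s2(s2):
    """Eliminate s1 by summing the joint probability over all of its values."""
    return sum(joint(s1, s2) for s1 in cpt_s1)
```

With these numbers, `joint("low", "high")` multiplies 0.6 by 0.1, and `marginal_s2("high")` adds the two joint entries in which s2 is "high".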

Definition of LC-PNN Query: Probabilistic Nearest Neighbor Query on Uncertain and Locally Correlated Data (LC-PNN)
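
The formal definition did not survive in this transcript, so the following is only a sketch of one plausible reading: the probability that an object is the nearest neighbor of q, computed by enumerating possible worlds. It assumes each LCP is given as a discrete joint distribution over the sample positions of its objects and that distinct LCPs are independent. The names `lcp_a`, `lcp_b`, and `nn_probabilities` and all numbers are hypothetical.

```python
import math
from itertools import product

# Each LCP is a list of joint outcomes: (probability, {object: sample position}).
# Objects inside one LCP are correlated because they are drawn together.
lcp_a = [
    (0.7, {"o1": (1.0, 0.0), "o2": (2.0, 0.0)}),
    (0.3, {"o1": (3.0, 0.0), "o2": (4.0, 0.0)}),
]
lcp_b = [
    (0.5, {"o3": (2.5, 0.0)}),
    (0.5, {"o3": (0.5, 0.0)}),
]

def nn_probabilities(q, lcps):
    """Probability that each object is the nearest neighbor of q."""
    probs = {}
    # A possible world picks one joint outcome per LCP; since LCPs are
    # independent, the probability of the world is the product of the picks.
    for world in product(*lcps):
        p = math.prod(outcome[0] for outcome in world)
        positions = {}
        for _, placed in world:
            positions.update(placed)
        nearest = min(positions, key=lambda o: math.dist(q, positions[o]))
        probs[nearest] = probs.get(nearest, 0.0) + p
    return probs
```

In this toy input o1 is always closer to the origin than o2 within their shared LCP, so o2 never becomes the nearest neighbor. If o1 and o2 were instead treated as independent, the combination of o1 at (3, 0) and o2 at (2, 0) would have non-zero probability, which shows how ignoring correlations can change the answer.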

Challenges & Solutions
- Challenges
  - The straightforward linear-scan method is costly
  - Integration is computationally expensive
  - Data correlations must be handled
- Filtering methods
  - Index pruning
  - Candidate filtering with pre-computations

Index Pruning
- Basic idea
  - Let best_so_far be the smallest maximum distance from query point q to any uncertain object seen so far
  - Then any object/node e with mindist(q, e) > best_so_far can be safely pruned
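
The pruning rule can be sketched as follows. This is a simplified illustration, not the indexed traversal: it assumes uncertainty regions are axis-aligned rectangles given as ((x1, y1), (x2, y2)) and scans a flat dictionary in two passes instead of walking an R-tree. The function names are hypothetical.

```python
import math

def mindist(q, rect):
    """Smallest distance from point q to an axis-aligned rectangle."""
    (x1, y1), (x2, y2) = rect
    dx = max(x1 - q[0], 0.0, q[0] - x2)
    dy = max(y1 - q[1], 0.0, q[1] - y2)
    return math.hypot(dx, dy)

def maxdist(q, rect):
    """Largest distance from point q to any point of the rectangle."""
    (x1, y1), (x2, y2) = rect
    dx = max(abs(q[0] - x1), abs(q[0] - x2))
    dy = max(abs(q[1] - y1), abs(q[1] - y2))
    return math.hypot(dx, dy)

def index_prune(q, rects):
    """Return the names of objects that survive the best_so_far test."""
    # Some object is certainly within best_so_far of q, so any object whose
    # closest possible position is farther away cannot be the nearest neighbor.
    best_so_far = min(maxdist(q, r) for r in rects.values())
    return {name for name, r in rects.items() if mindist(q, r) <= best_so_far}
```

For example, with q at the origin and rectangles a = ((1, 1), (2, 2)), b = ((2, 0), (3, 1)), and c = ((5, 5), (6, 6)), best_so_far is the maximum distance to a, and only c has a minimum distance beyond it.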

Candidate Filtering with Pre-Computations
- Basic idea
  - Obtain an upper bound, UB_Pr_LC-PNN(q, o_i), of the LC-PNN probability
  - Object o_i can be safely pruned if UB_Pr_LC-PNN(q, o_i) is below the query's probability threshold
- How is the probability upper bound obtained? It is derived from the formula of the LC-PNN probability, with the upper bound computed via pivots.

Derivation of Probability Upper Bound [Figure: pivot piv_s5]

Range [min_, max_ ] of  Let min_ = and max_  = If online  is smaller than min_, then JP_o(s_5) = 1. If online  is greater than max_ , then JP_o(s_5) = 0. Thus, we do not need to store pre-computations with  outside the range [min_, max_ ].

Candidate Positions of Pivots [Figure: sample s_5 and its pivot piv_s5]

Selection of Pivot Positions: we provide a cost model to formalize the filtering and refinement costs, and obtain a good value of parameter  to achieve low query cost.

LC-PNN Query Procedure
- Index uncertain objects containing LCPs in an R-tree based index
- For an LC-PNN query
  - When traversing the index, apply the index pruning method and candidate filtering to remove false alarms
  - Refine candidates and return true query answers
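
The filter-and-refine flow above can be sketched as a three-step pipeline. This is a sketch under stated assumptions: it works on a flat list instead of an R-tree, it takes the distance bounds, the probability upper bound, and the exact probability as caller-supplied functions, and it keeps objects whose probability is at least the threshold (the exact comparison used by the authors is not shown in the slides). All names and numbers are hypothetical.

```python
def lc_pnn_query(objects, threshold, mindist, maxdist, upper_bound, exact_prob):
    """Filter-and-refine over a flat list of object identifiers."""
    # Step 1: index pruning with the smallest maximum distance.
    best_so_far = min(maxdist(o) for o in objects)
    candidates = [o for o in objects if mindist(o) <= best_so_far]
    # Step 2: candidate filtering with a cheap probability upper bound.
    candidates = [o for o in candidates if upper_bound(o) >= threshold]
    # Step 3: refinement with the exact (expensive) probability.
    return [o for o in candidates if exact_prob(o) >= threshold]

# Toy per-object statistics, made up for illustration.
stats = {
    "o1": {"min": 1.0, "max": 2.0, "ub": 0.9, "p": 0.60},
    "o2": {"min": 1.5, "max": 3.0, "ub": 0.3, "p": 0.25},
    "o3": {"min": 2.5, "max": 4.0, "ub": 0.0, "p": 0.00},
    "o4": {"min": 1.8, "max": 2.5, "ub": 0.5, "p": 0.15},
}
answers = lc_pnn_query(
    list(stats), 0.4,
    mindist=lambda o: stats[o]["min"],
    maxdist=lambda o: stats[o]["max"],
    upper_bound=lambda o: stats[o]["ub"],
    exact_prob=lambda o: stats[o]["p"],
)
```

In the toy run, o3 is removed by index pruning, o2 by the upper bound, and o4 passes both filters but fails refinement, which is the kind of false alarm that the refinement step exists to catch.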

Experimental Evaluation
- Data sets
  - Real data: California road network
  - Synthetic data: lUeU, lUeG, lSeU, and lSeG
    - Center locations of LCPs are generated with a Uniform or Skew distribution
    - Extent lengths of LCPs are produced with a Uniform or Gaussian distribution
    - Within LCPs, locally correlated uncertain objects are randomly generated with Bayesian networks
- Competitor
  - Basic method [Cheng et al., SIGMOD 2003], which assumes uncertain objects are independent
- Measures
  - Wall clock time
  - Speed-up ratio
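
As a rough illustration of the synthetic setup, the sketch below generates LCP centers and extent lengths only. It reads the data set names as l (location: U for Uniform, S for Skew) and e (extent: U for Uniform, G for Gaussian), which is an interpretation. The skew transform, the Gaussian parameters, the clamping, and the size of the data space are all assumptions, and the Bayesian-network step that generates objects inside each LCP is omitted.

```python
import random

def make_lcps(n, location="U", extent="U", lo=1.0, hi=3.0, space=100.0, seed=0):
    """Return n (center, extent_length) pairs for hypothetical LCPs."""
    rng = random.Random(seed)
    lcps = []
    for _ in range(n):
        if location == "U":
            center = (rng.uniform(0.0, space), rng.uniform(0.0, space))
        else:
            # Assumed skew: cubing a uniform draw concentrates centers near 0.
            center = (space * rng.random() ** 3, space * rng.random() ** 3)
        if extent == "U":
            length = rng.uniform(lo, hi)
        else:
            # Assumed Gaussian: centered on the range, clamped to [lo, hi].
            mean, sd = (lo + hi) / 2.0, (hi - lo) / 6.0
            length = min(hi, max(lo, rng.gauss(mean, sd)))
        lcps.append((center, length))
    return lcps
```

Under this reading, `make_lcps(1000, location="S", extent="G")` would correspond to an lSeG-style data set with extent lengths in [1, 3].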

LC-PNN Performance vs. 
Settings: extent length of LCP = [1, 3], data size N = 150K, average number of uncertain objects in an LCP = 5

Conclusions
- We formulated the problem of queries over locally correlated uncertain data, in particular the LC-PNN query, which is important in real applications
- We designed the index pruning method and, based on a proposed cost model, presented the candidate filtering method via offline pre-computations w.r.t. pivots
- We provided efficient query processing techniques to answer LC-PNN queries on locally correlated uncertain data, and discussed applying the same framework to answer other types of queries

Thank you! Q/A