Spatial and Spatio-temporal Data Uncertainty: Modeling and Querying
Mohamed F. Mokbel
Department of Computer Science and Engineering, University of Minnesota

QUeST 2009 November
Talk Outline: Introduction to Uncertain Data; Reasons for Uncertain Data; Representation of Uncertain Data; Querying Uncertain Data; Summary

Certain Data: The Good Days
You could trust whatever was stored in a database: employee salaries, banking information, flight reservations. Fuzzy information? Yes, it was there, but not in a database. The scale of uncertain data did not call for dedicated data management techniques.

Data Uncertainty: Different Kinds of Uncertainty
Defective data: completely erroneous data. Incomplete data: some data is missing. Probabilistic data: a value is known to be true (or defective) with a certain probability. Range data: the reading lies somewhere in a given range (e.g., under a uniform or normal distribution).

Data Uncertainty: Friend or Foe
Foe: inaccurate device readings (e.g., temperature readings), object movement, and network delay. Friend: privacy, less storage, and the ability to express a range of values (e.g., a price range on a menu).

Talk Outline: Reasons for Uncertain Data

Sources of Uncertainty: Inaccurate Reading
Sensor temperature readings, GPS readings, and cell phone locations are all imprecise. Affected queries: Which sensor gives the highest temperature? Which sensors give a temperature between 30 and 40? How many sensors give a temperature over 40?
(Figure: readings of Sensor X and Sensor Y.)
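
To make the three affected queries concrete, here is a minimal Python sketch. It assumes (an assumption of this sketch, not of the talk) that each sensor reports a value with a known symmetric error bound, so the true temperature lies in [value - error, value + error]; the `Reading` class, the sensor names, and the numbers are hypothetical. With intervals, each query returns the sensors that could satisfy it, or lower and upper bounds on a count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Hypothetical reading model: the true value lies in [value - error, value + error]."""
    sensor: str
    value: float
    error: float

    @property
    def low(self) -> float:
        return self.value - self.error

    @property
    def high(self) -> float:
        return self.value + self.error


def possibly_in_range(readings, lo, hi):
    """Sensors whose uncertainty interval overlaps [lo, hi] (possible answers)."""
    return [r.sensor for r in readings if r.high >= lo and r.low <= hi]


def possibly_highest(readings):
    """Sensors that could hold the highest temperature.

    A sensor is ruled out only if its upper bound is below the largest
    lower bound, i.e. some other sensor is certainly warmer.
    """
    best_low = max(r.low for r in readings)
    return [r.sensor for r in readings if r.high >= best_low]


def count_over(readings, threshold):
    """(minimum, maximum) number of sensors whose true value exceeds threshold."""
    certain = sum(1 for r in readings if r.low > threshold)
    possible = sum(1 for r in readings if r.high > threshold)
    return certain, possible


readings = [Reading("X", 38.0, 3.0), Reading("Y", 41.0, 2.0), Reading("Z", 25.0, 1.0)]
print(possibly_in_range(readings, 30, 40))  # ['X', 'Y']
print(possibly_highest(readings))           # ['X', 'Y']
print(count_over(readings, 40))             # (0, 2)
```

Returning "possible" answers keeps the results inclusive; a probabilistic variant would additionally need a distribution inside each interval.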

Sources of Uncertainty: Sampling
Historical data (trajectories) and current data are known only at sampled times T0, T0+ε0, T0+ε1, T0+ε2, and T1; between samples, an object's location is uncertain. Affected queries: range queries and nearest-neighbor queries.

Sources of Uncertainty: Privacy
Example: What is my nearest gas station?
(Figure: service versus privacy, each on a scale from 0% to 100%.)

Talk Outline: Representation of Uncertain Data

Uncertainty Representation: Ellipse
Given: a start point, an end point, the maximum possible speed, and the resulting maximum traveling distance S. If S is greater than the distance between the two end points, the moving object may have deviated from the straight route: every location it could have visited lies inside the ellipse whose foci are the two end points and whose major axis has length S.
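
The ellipse condition can be checked directly: the object could have been at p only if the detour start -> p -> end fits within the travel budget S, i.e. |start - p| + |p - end| <= S. A minimal sketch, assuming Euclidean distances in the plane and S = maximum speed × elapsed time; the function names are illustrative.

```python
import math

Point = tuple[float, float]


def max_travel_distance(max_speed: float, t_start: float, t_end: float) -> float:
    """S: the farthest the object can travel between the two reported points."""
    return max_speed * (t_end - t_start)


def in_uncertainty_ellipse(p: Point, start: Point, end: Point, s: float) -> bool:
    """True if the object could have passed through p.

    Going start -> p -> end needs |start - p| + |p - end| <= S, which is
    exactly the ellipse with foci start and end and major-axis length S.
    """
    return math.dist(start, p) + math.dist(p, end) <= s


s = max_travel_distance(max_speed=2.0, t_start=0.0, t_end=6.0)  # S = 12
print(in_uncertainty_ellipse((5, 3), (0, 0), (10, 0), s))  # True  (detour about 11.66)
print(in_uncertainty_ellipse((5, 4), (0, 0), (10, 0), s))  # False (detour about 12.81)
```

When S equals the distance between the end points, the ellipse collapses to the straight segment, matching the case where no deviation is possible.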

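Uncertainty Representation: Cylinders
Given: start and end points. Constraint: an object reports its location only if it deviates by a certain distance r from its predicted trajectory. Between reports, the object is therefore within distance r of the predicted trajectory, which traces a cylinder of radius r around the trajectory in space-time.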
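
A minimal sketch of the cylinder test at a single time instant. It assumes (hypothetically) that the predicted trajectory is a list of (time, x, y) samples with linear motion between them; at time t the cylinder's cross-section is the disk of radius r around the predicted position.

```python
import math
from bisect import bisect_right


def predicted_position(trajectory, t):
    """Linearly interpolate the predicted (x, y) at time t.

    trajectory: list of (time, x, y) tuples with strictly increasing times.
    """
    times = [sample[0] for sample in trajectory]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t is outside the predicted trajectory's time span")
    i = bisect_right(times, t) - 1
    if i == len(trajectory) - 1:  # t is exactly the last timestamp
        return trajectory[-1][1], trajectory[-1][2]
    t0, x0, y0 = trajectory[i]
    t1, x1, y1 = trajectory[i + 1]
    a = (t - t0) / (t1 - t0)
    return x0 + a * (x1 - x0), y0 + a * (y1 - y0)


def possibly_at(trajectory, r, p, t):
    """True if the object may be at p at time t: within r of the predicted position."""
    return math.dist(p, predicted_position(trajectory, t)) <= r


route = [(0.0, 0.0, 0.0), (10.0, 10.0, 0.0)]  # hypothetical predicted trajectory
print(possibly_at(route, 1.0, (5.0, 0.8), 5.0))  # True
print(possibly_at(route, 1.0, (5.0, 1.5), 5.0))  # False
```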

Uncertainty Representation: Polygons
Given: start and end points. Constraints: a deviation threshold r and a speed threshold v.

Talk Outline: Querying Uncertain Data

Uncertainty-aware Query Processor
A new uncertainty-aware query processor is needed to deal with uncertain data rather than exact data.
Traditional query: What is my nearest gas station, given that I am at this location?
New query: What is my nearest gas station, given that I am somewhere in this uncertainty region?

Data Uncertainty: Queries
Two types of data: certain data (gas stations, restaurants, police cars) and uncertain data (measurements, personal data records).
Three types of queries:
Uncertain queries over certain data: What is my nearest gas station?
Certain queries over uncertain data: How many cars are in the downtown area?
Uncertain queries over uncertain data: Where is my nearest friend?

Talk Outline: Querying Uncertain Data (Range Queries)

Range Queries: Uncertain Queries over Certain Data
Range query example: Find all gas stations within x miles of my location, where my location is somewhere in the uncertainty region. The basic idea is to extend the uncertainty region by distance x in all directions; every gas station in the extended region is a candidate answer.
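
A minimal sketch of the extend-and-filter idea, assuming the uncertainty region is an axis-aligned rectangle and distances are Euclidean (both assumptions of this sketch). Extending a rectangle by x in all directions yields a rectangle with rounded corners, so the test is simply whether a station's distance to the rectangle is at most x. The names `Rect`, `min_dist`, and `range_candidates` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned uncertainty region (assumed shape)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def min_dist(region: Rect, p) -> float:
    """Distance from point p to the closest point of region (0 if p is inside)."""
    dx = max(region.xmin - p[0], 0.0, p[0] - region.xmax)
    dy = max(region.ymin - p[1], 0.0, p[1] - region.ymax)
    return math.hypot(dx, dy)


def range_candidates(region: Rect, stations: dict, x: float) -> list:
    """Stations inside the region extended by x in all directions.

    The extended region has rounded corners (radius x), so a point near a
    corner must pass this Euclidean test, not just a bounding-box test.
    """
    return [name for name, p in stations.items() if min_dist(region, p) <= x]


me = Rect(0, 0, 2, 2)
stations = {"A": (1, 1), "B": (3, 1), "C": (3, 3), "D": (5, 0)}
print(range_candidates(me, stations, 1.0))  # ['A', 'B']
```

Station C lies inside the extended bounding box but outside the rounded corner, which is why the distance test is used instead of a larger rectangle.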

Range Queries: Uncertain Queries over Certain Data
Extend the uncertainty area in all directions by the required distance. There are three ways to represent the answer: all possible answers, a probabilistic answer, or an answer per area.

Range Queries: Certain Queries over Uncertain Data
Range query example: Find all cars within a certain area. Objects of interest are represented as uncertainty regions, within which each object can be anywhere. Any uncertainty region that overlaps the query region is a candidate answer.

Range Queries: Certain Queries over Uncertain Data
Range queries: Which objects are within the area of interest? Any object whose uncertainty region overlaps the area of interest: C, D, E, F, H.
Probabilistic range queries: With each object, report the probability of it being part of the answer: (C, 0.3), (D, 0.2), (E, 1), (F, 0.6), (H, 0.4). Under a uniform distribution, this probability is the area where the object's uncertainty (cloaked) region overlaps the query region, divided by the area of the uncertainty region. It is easy to compute for uniform distributions and challenging for non-uniform ones.
(Figure: uncertainty regions of objects A to J and the area of interest.)

Range Queries: Certain Queries over Uncertain Data
Threshold probabilistic range queries: Which objects are within the area of interest with at least 50% probability? E, F. This version is more practical and much easier to compute: the threshold value is used to prune answers and avoid the extensive computation of exact probabilities.
(Figure: uncertainty regions of objects A to J and the area of interest.)
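
The probabilistic and threshold variants can be sketched for the easy case the slides mention: axis-aligned rectangular uncertainty regions with a uniform distribution (assumptions of this sketch; all names and rectangles are hypothetical). The probability is the overlap area divided by the region's area. This sketch computes every probability and then filters; it does not implement threshold-based pruning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two rectangles (0 if they do not overlap)."""
    w = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    h = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    return max(w, 0.0) * max(h, 0.0)


def probabilistic_range(query: Rect, objects: dict) -> dict:
    """Probability that each object lies in query, assuming a uniform
    location inside its (positive-area) uncertainty region."""
    answer = {}
    for name, region in objects.items():
        p = overlap_area(query, region) / region.area()
        if p > 0:  # objects with no overlap are not candidates
            answer[name] = p
    return answer


def threshold_range(query: Rect, objects: dict, threshold: float) -> list:
    """Objects inside query with probability at least threshold."""
    return [n for n, p in probabilistic_range(query, objects).items() if p >= threshold]


query = Rect(0, 0, 10, 10)
objects = {"E": Rect(2, 2, 4, 4), "F": Rect(8, 0, 12, 5),
           "C": Rect(9, 9, 13, 13), "I": Rect(11, 11, 12, 12)}
print(probabilistic_range(query, objects))   # {'E': 1.0, 'F': 0.5, 'C': 0.0625}
print(threshold_range(query, objects, 0.5))  # ['E', 'F']
```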

Range Queries: Uncertain Queries over Uncertain Data
Range query example: Find my friends within x miles of my location, where my location is somewhere within the uncertainty region. Both the querying user and the objects of interest are represented as uncertainty regions. Solution approaches are a mix of the previous two cases.
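
One way to mix the two previous cases, sketched under the assumption that both uncertainty regions are axis-aligned rectangles: a friend is a possible answer if the minimum distance between the two rectangles is at most x, and a certain answer if even the maximum distance is at most x. The function names and the "certain"/"possible" labels are illustrative, not from the talk.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def _gap(lo1, hi1, lo2, hi2):
    """Gap between intervals [lo1, hi1] and [lo2, hi2] (0 if they overlap)."""
    return max(lo2 - hi1, 0.0, lo1 - hi2)


def min_dist(a: Rect, b: Rect) -> float:
    """Smallest distance between any point of a and any point of b."""
    return math.hypot(_gap(a.xmin, a.xmax, b.xmin, b.xmax),
                      _gap(a.ymin, a.ymax, b.ymin, b.ymax))


def max_dist(a: Rect, b: Rect) -> float:
    """Largest distance between any point of a and any point of b."""
    dx = max(a.xmax - b.xmin, b.xmax - a.xmin)
    dy = max(a.ymax - b.ymin, b.ymax - a.ymin)
    return math.hypot(dx, dy)


def friends_within(me: Rect, friends: dict, x: float) -> dict:
    """'certain' if every pair of possible locations is within x,
    'possible' if at least one pair is; other friends are pruned."""
    answer = {}
    for name, region in friends.items():
        if max_dist(me, region) <= x:
            answer[name] = "certain"
        elif min_dist(me, region) <= x:
            answer[name] = "possible"
    return answer


me = Rect(0, 0, 1, 1)
friends = {"A": Rect(1, 1, 2, 2), "B": Rect(3, 0, 4, 1), "C": Rect(5, 5, 6, 6)}
print(friends_within(me, friends, 3.0))  # {'A': 'certain', 'B': 'possible'}
```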

Talk Outline: Querying Uncertain Data (Aggregate Queries)

Aggregate Queries: Uncertain Queries over Certain Data
How many gas stations are within x miles of my location?
Answer per area: Minimum = 0, Maximum = 2; Prob(0) = 0.2, Prob(1) = 0.5, Prob(2) = 0.3; Average = 1.1. Alternatively, each area can be represented by its own answer.
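
The average on this slide is the expected value of the count distribution: 0(0.2) + 1(0.5) + 2(0.3) = 1.1. A small sketch that derives the minimum, maximum, and average from such a distribution (the function name is hypothetical):

```python
import math


def summarize(dist: dict) -> tuple:
    """Min, max and expected count of a discrete distribution {count: probability}."""
    if not math.isclose(sum(dist.values()), 1.0):
        raise ValueError("probabilities must sum to 1")
    support = [k for k, p in dist.items() if p > 0]
    expected = sum(k * p for k, p in dist.items())
    return min(support), max(support), expected


lo, hi, avg = summarize({0: 0.2, 1: 0.5, 2: 0.3})
print(lo, hi, round(avg, 10))  # 0 2 1.1
```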

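Aggregate Queries: Certain Queries over Uncertain Data
Aggregate queries: How many objects are within the area of interest? Minimum: 1, Maximum: 5, Average: 2.5 (the sum of the objects' membership probabilities).
Probabilistic aggregate queries: How many objects (with probabilities) are within the area of interest? E is inside with probability 1, so Prob(1) is the probability that C, D, F, and H are all outside: Prob(1) = (0.7)(0.8)(0.4)(0.6) = …. [1, ], [2, ], [3, 0.3464], [4, ], [5, 0.0144]. More statistics can be computed.
(Figure: uncertainty regions of objects A to J and the area of interest.)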
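
Assuming the objects are independent (which the product formula for Prob(1) implies), the full count distribution is a Poisson-binomial distribution and can be built with a short dynamic program, one object at a time. The sketch uses the membership probabilities from the probabilistic range query; `count_distribution` is a hypothetical name.

```python
def count_distribution(probs):
    """Distribution of the number of objects inside the query area.

    probs[i] is the probability that object i is inside; objects are assumed
    independent. Returns a list whose entry k is Prob(count = k).
    """
    dist = [1.0]  # with no objects processed, Prob(count = 0) = 1
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, q in enumerate(dist):
            nxt[k] += q * (1 - p)  # object outside: count unchanged
            nxt[k + 1] += q * p    # object inside: count + 1
        dist = nxt
    return dist


# Membership probabilities of C, D, E, F, H from the probabilistic range query
probs = {"C": 0.3, "D": 0.2, "E": 1.0, "F": 0.6, "H": 0.4}
dist = count_distribution(probs.values())
for k, p in enumerate(dist):
    if p > 0:
        print(k, round(p, 4))  # 1 0.1344, 2 0.3824, 3 0.3464, 4 0.1224, 5 0.0144
print("average", round(sum(probs.values()), 4))  # average 2.5
```

The run reproduces the slide's Prob(3) = 0.3464 and Prob(5) = 0.0144, and the expected count equals the sum of the membership probabilities.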

Aggregate Queries: Uncertain Queries over Uncertain Data
To compute the aggregates, we would have to go through the same procedure as for range queries: either compute the probability of each object, or divide the query region into partial regions with an answer for each region.
(Figure: uncertainty regions of objects A to J.)

Talk Outline: Querying Uncertain Data (Nearest-Neighbor Queries)

Nearest-Neighbor Queries: Uncertain Queries over Certain Data
NN query example: Find my nearest gas station, given that I am somewhere in the cloaked spatial region. The basic idea is to find all candidate answers.

Nearest-Neighbor Queries: Uncertain Queries over Certain Data: Optimal Answer
The optimal answer contains only exact candidates, i.e., each returned candidate is the nearest neighbor of at least one possible location in the region. It is too cumbersome to compute. A heuristic approximation is to find the minimum possible range that includes all potential candidate answers; false positives will then occur.

Nearest-Neighbor Queries: Uncertain Queries over Certain Data: Optimal Answer (1-D)
Given a one-dimensional line segment L = [start, end] and a set of objects O = {o1, o2, …, on}, find the answer as tuples <oi, T>, where oi ∈ O and T ⊆ L, such that oi is the nearest object to every point in T. This was developed for continuous nearest-neighbor queries. The answer is optimal in that it provides all possible answers and no redundant ones. It can be represented as all objects, as probabilities, or by area.
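
The tuples can also be computed by brute force, which is handy as a reference when checking a faster plane-sweep. This is not the vicinity-circle algorithm on the slides: for each object, it intersects the bisector half-planes against every other object along the segment, in O(n²) time. Points are 2-D tuples, the segment is parameterized by t in [0, 1], and all names are hypothetical.

```python
Point = tuple[float, float]


def nn_intervals(start: Point, end: Point, objects: list[Point]) -> list[tuple[int, float, float]]:
    """Brute-force continuous NN on the segment p(t) = start + t * (end - start), t in [0, 1].

    For object a and any other object b, "a is at least as close as b" is the
    half-plane 2 p.(b - a) <= |b|^2 - |a|^2, which becomes a linear constraint
    c * t <= rhs on t. Intersecting these constraints gives the (possibly empty)
    interval of t where a is a nearest neighbour. Zero-length intervals (single
    tie points) are dropped so every returned tuple covers part of the segment.
    """
    dx, dy = end[0] - start[0], end[1] - start[1]
    answer = []
    for i, a in enumerate(objects):
        lo, hi = 0.0, 1.0
        for j, b in enumerate(objects):
            if i == j:
                continue
            bx, by = b[0] - a[0], b[1] - a[1]
            c = 2 * (dx * bx + dy * by)
            rhs = (b[0] ** 2 + b[1] ** 2) - (a[0] ** 2 + a[1] ** 2) - 2 * (start[0] * bx + start[1] * by)
            if c > 0:
                hi = min(hi, rhs / c)
            elif c < 0:
                lo = max(lo, rhs / c)
            elif rhs < 0:  # constraint never satisfied anywhere on the segment
                hi = -1.0
        if hi - lo > 1e-12:
            answer.append((i, lo, hi))
    return answer


print(nn_intervals((0, 0), (10, 0), [(2, 1), (8, 1), (5, 10)]))
# [(0, 0.0, 0.5), (1, 0.5, 1.0)]
```

Object 2 is never returned because it is farther than one of the other two objects from every point of the segment; the split at t = 0.5 is where the bisector of objects 0 and 1 crosses the segment.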

Nearest-Neighbor Queries: Uncertain Queries over Certain Data: Optimal Answer (1-D)
Scan the objects in plane-sweep fashion, maintaining two vicinity circles centered at the start and end points. If an object lies within both vicinity circles, remove the previous object. If an object lies within only one vicinity circle, the previous object is part of the answer: draw a bisector to get that part of the answer, then update the start point. Ignore objects that lie outside the vicinity circles.
(Figure: objects A to G around the query segment from s to e.)

Nearest-Neighbor Queries: Uncertain Queries over Certain Data: Optimal Answer (2-D)
For each edge of the cloaked region, scan the objects with a plane sweep. For each two consecutive points, get the intersection between their bisector and the current edge. Based on the set of bisectors, decide which points could be the nearest neighbor of some point on that edge. All objects of interest that lie within the query range are also returned in the answer.
(Figure: objects p1 to p8 and an edge of the cloaked region with points s, e, s1, s2.)
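
A brute-force sketch of the 2-D rule, assuming an axis-aligned rectangular cloaked region: the candidates are the objects inside the rectangle plus every object that is the nearest neighbor of some point on one of its four edges. This suffices because if a point q inside has its nearest neighbor o outside, o stays the nearest neighbor all along the segment from q to o, including where that segment crosses an edge. Per-edge candidates are found by bisector half-plane intersection, not by the plane sweep; all names are hypothetical.

```python
Point = tuple[float, float]


def edge_candidates(start: Point, end: Point, objects: list[Point]) -> set[int]:
    """Indices of objects that are a nearest neighbour of at least one point on the edge."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    found = set()
    for i, a in enumerate(objects):
        lo, hi = 0.0, 1.0
        for j, b in enumerate(objects):
            if i == j:
                continue
            bx, by = b[0] - a[0], b[1] - a[1]
            # a is at least as close as b where c * t <= rhs (bisector half-plane)
            c = 2 * (dx * bx + dy * by)
            rhs = (b[0] ** 2 + b[1] ** 2) - (a[0] ** 2 + a[1] ** 2) - 2 * (start[0] * bx + start[1] * by)
            if c > 0:
                hi = min(hi, rhs / c)
            elif c < 0:
                lo = max(lo, rhs / c)
            elif rhs < 0:
                hi = -1.0
        if lo <= hi:  # keep tie points too, to stay on the inclusive side
            found.add(i)
    return found


def range_nn_candidates(xmin, ymin, xmax, ymax, objects):
    """Every object that is the NN of some point in the rectangle:
    objects inside it, plus NN candidates along each of its four edges."""
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    result = {i for i, (x, y) in enumerate(objects)
              if xmin <= x <= xmax and ymin <= y <= ymax}
    for k in range(4):
        result |= edge_candidates(corners[k], corners[(k + 1) % 4], objects)
    return sorted(result)


objects = [(2, 2), (7, 2), (2, 20), (-1, 2)]
print(range_nn_candidates(0, 0, 4, 4, objects))  # [0, 3]
```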

Nearest-Neighbor Queries: Uncertain Queries over Certain Data: Finding a Range
Step 1: Locate four filters: the NN target object of each vertex.
Step 2: Find the middle points: on each edge, the point that is furthest from the nearer of the edge's two filters.
Step 3: Extend the query range.
Step 4: Return the candidate answer.
(Figure: cloaked region with vertices v1 to v4, filters T1 to T4, and middle points m12, m13, m24, m34.)
This method is proved to be inclusive (the exact answer is included in the candidate answer) and minimal (the range query is minimal given an initial set of filters).
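
The four steps can be sketched as follows, assuming an axis-aligned rectangle, Euclidean distance, and brute-force nearest-neighbor search for the filters. Along an edge, the distance to the nearer of its two filters is largest either at an endpoint or where the filters' bisector crosses the edge, so only those points are evaluated. Step 3 is read here as extending each edge by its own middle-point distance: every boundary point has a neighbor at least that close, so the candidate set stays inclusive. This is one interpretation, not necessarily the exact construction whose minimality the slide states. Vertices are numbered in order around the rectangle, which may differ from the slide's v1 to v4; all names are hypothetical.

```python
import math

Point = tuple[float, float]


def nearest(p: Point, objects: list[Point]) -> Point:
    """Brute-force nearest object to p."""
    return min(objects, key=lambda o: math.dist(p, o))


def middle_point_distance(v1: Point, v2: Point, f1: Point, f2: Point) -> float:
    """Largest distance from a point on edge v1-v2 to the nearer of filters f1, f2.

    The maximum lies at an endpoint or where the f1/f2 bisector crosses the
    edge (the middle point), so only those parameters t are evaluated.
    """
    dx, dy = v2[0] - v1[0], v2[1] - v1[1]
    ts = [0.0, 1.0]
    bx, by = f2[0] - f1[0], f2[1] - f1[1]
    c = 2 * (dx * bx + dy * by)
    if c != 0:
        t_mid = ((f2[0] ** 2 + f2[1] ** 2) - (f1[0] ** 2 + f1[1] ** 2)
                 - 2 * (v1[0] * bx + v1[1] * by)) / c
        if 0.0 < t_mid < 1.0:
            ts.append(t_mid)

    def reach(t: float) -> float:
        p = (v1[0] + t * dx, v1[1] + t * dy)
        return min(math.dist(p, f1), math.dist(p, f2))

    return max(reach(t) for t in ts)


def seg_dist(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the segment a-b (a != b)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def range_nn_filter(xmin, ymin, xmax, ymax, objects):
    v = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    edges = [(v[k], v[(k + 1) % 4]) for k in range(4)]
    filters = [nearest(p, objects) for p in v]                       # Step 1
    reaches = [middle_point_distance(a, b, filters[k], filters[(k + 1) % 4])
               for k, (a, b) in enumerate(edges)]                    # Step 2
    candidates = [o for o in objects                                 # Steps 3 and 4
                  if (xmin <= o[0] <= xmax and ymin <= o[1] <= ymax)
                  or any(seg_dist(o, a, b) <= r for (a, b), r in zip(edges, reaches))]
    return reaches, candidates


objects = [(2, 2), (7, 2), (2, 20), (-1, 2)]
reaches, candidates = range_nn_filter(0, 0, 4, 4, objects)
print(candidates)  # [(2, 2), (-1, 2)]
```

On this input the filter returns the same two objects as an exhaustive check, while (7, 2) is pruned because it lies 3 units from the right edge, beyond that edge's middle-point distance of about 2.83.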

Nearest-Neighbor Queries: Uncertain Queries over Certain Data: Answer Representation
Regardless of the underlying method used to compute candidate answers, there are three alternatives: return the list of candidate answers to the user; employ a Voronoi diagram of all the objects in the candidate list to determine the probability that each object is the answer; or use the Voronoi diagram to provide the answer in terms of areas.
(Figure: cloaked region with vertices v1 to v4.)
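
Under the added assumption that the user is uniformly distributed in a rectangular cloaked region, the probability of each candidate equals the area of its Voronoi cell inside the region divided by the region's area. Rather than building a Voronoi diagram, this sketch estimates those areas by Monte Carlo sampling, a swapped-in technique that is simpler but only approximate. Names, sample count, and seed are illustrative.

```python
import math
import random
from collections import Counter

Point = tuple[float, float]


def nn_probabilities(xmin, ymin, xmax, ymax, objects: list[Point], samples=100_000, seed=0):
    """Estimate, for each object index, the probability that it is the NN of a
    user located uniformly at random in the cloaked rectangle."""
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(samples):
        q = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        wins[min(range(len(objects)), key=lambda i: math.dist(q, objects[i]))] += 1
    return {i: wins[i] / samples for i in sorted(wins)}


probs = nn_probabilities(0, 0, 4, 4, [(1, 2), (3, 2), (2, 30)], samples=20_000)
print(probs)  # roughly 0.5 for objects 0 and 1; object 2 never wins
```

Objects that never win a sample are absent from the result, which corresponds to returning only true candidates.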

Nearest-Neighbor Queries: Certain Queries over Uncertain Data
NN query example: Find my nearest car. Several objects may be candidates for my nearest neighbor. The accuracy of the query depends heavily on the size of the cloaked regions. This is very challenging to generalize to k-nearest-neighbor queries.

Nearest-Neighbor Queries: Certain Queries over Uncertain Data
Nearest-neighbor query: Where is my nearest friend?
Filter step: compute the maximum distance to each object; MinMax is the minimum of these maximum distances; filter out objects that lie entirely outside the circle of radius MinMax. Then compute the minimum distance to each remaining object for further analysis.
(Figure: query point and uncertainty regions of objects A to I.)
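
A sketch of the MinMax filter step, assuming axis-aligned rectangular uncertainty regions and a point query. The object with the smallest maximum distance is guaranteed to be within MinMax, so any object whose minimum distance exceeds MinMax can never be the nearest neighbor. The rectangles and names are hypothetical and do not reproduce the slide's figure.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def min_dist(q, r: Rect) -> float:
    """Distance from q to the closest point of r."""
    dx = max(r.xmin - q[0], 0.0, q[0] - r.xmax)
    dy = max(r.ymin - q[1], 0.0, q[1] - r.ymax)
    return math.hypot(dx, dy)


def max_dist(q, r: Rect) -> float:
    """Distance from q to the farthest corner of r."""
    dx = max(abs(q[0] - r.xmin), abs(q[0] - r.xmax))
    dy = max(abs(q[1] - r.ymin), abs(q[1] - r.ymax))
    return math.hypot(dx, dy)


def nn_candidates(q, objects: dict):
    """Return MinMax and the surviving objects ordered by MinDist."""
    minmax = min(max_dist(q, r) for r in objects.values())
    kept = sorted((min_dist(q, r), name) for name, r in objects.items()
                  if min_dist(q, r) <= minmax)
    return minmax, [name for _, name in kept]


friends = {"D": Rect(1, -1, 2, 1), "G": Rect(1, 1, 3, 3),
           "H": Rect(-3, -1, -2, 1), "I": Rect(5, 5, 6, 6)}
minmax, order = nn_candidates((0, 0), friends)
print(round(minmax, 3), order)  # 2.236 ['D', 'G', 'H']
```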

Nearest-Neighbor Queries: Certain Queries over Uncertain Data
All possible answers (ordered by MinDist): D, H, F, C, B, G.
Probabilistic answer: compute the exact probability of each object being the nearest neighbor. The probability distribution of an object within its range is NOT uniform. A much easier (and more practical) version is to find the objects that can be the nearest neighbor with at least a certain probability.
(Figure: candidate objects D, C, B, G, F, H.)
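
The probabilistic and threshold answers can be approximated by sampling, assuming each object is independently and uniformly located in a rectangular uncertainty region (assumptions of this sketch, not of the talk). Monte Carlo sampling is used here in place of an exact computation, and the threshold version simply filters the estimates rather than pruning; all names are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def nn_probabilities(q, objects: dict, samples=50_000, seed=0) -> dict:
    """Monte Carlo estimate of Prob(object is the NN of q), assuming each object
    is uniformly and independently located inside its uncertainty rectangle."""
    rng = random.Random(seed)
    wins = dict.fromkeys(objects, 0)
    for _ in range(samples):
        best, best_d = None, math.inf
        for name, r in objects.items():
            p = (rng.uniform(r.xmin, r.xmax), rng.uniform(r.ymin, r.ymax))
            d = math.dist(q, p)
            if d < best_d:
                best, best_d = name, d
        wins[best] += 1
    return {name: w / samples for name, w in wins.items()}


def threshold_nn(q, objects: dict, threshold: float, **kwargs) -> list:
    """Objects that are the NN with estimated probability at least threshold."""
    probs = nn_probabilities(q, objects, **kwargs)
    return [name for name, p in probs.items() if p >= threshold]


left_right = {"L": Rect(-2, -1, -1, 1), "R": Rect(1, -1, 2, 1), "far": Rect(10, 10, 11, 11)}
probs = nn_probabilities((0, 0), left_right, samples=20_000)
print(probs)  # L and R each close to 0.5; far is 0.0
print(threshold_nn((0, 0), left_right, 0.4, samples=20_000))
```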

Nearest-Neighbor Queries: Uncertain Queries over Uncertain Data
NN query

Nearest-Neighbor Queries: Uncertain Queries over Certain Data
Step 1: Locate four filters: the NN target object of each vertex.
Step 2: Find the middle points: the furthest point on each edge from its two filters.
Step 3: Extend the query range.
Step 4: Return the candidate answer.
(Figure: vertices v1 to v4 and middle points m12, m13, m24, m34.)

Talk Outline: Summary

Summary
Uncertain data is ubiquitous, and data uncertainty may be desired in many cases. There are various representations of uncertain data: circle, ellipse, cylinder, polygon. Uncertain data calls for new types of queries: range queries, aggregate queries, and nearest-neighbor queries.

Thank You