Probabilistic Similarity Queries in Uncertain Databases
Matthias Renz
Database Systems Group, Institute for Informatics, Ludwig-Maximilians-Universität München, Munich, Germany
Dagstuhl Seminar 2008: Uncertainty Management in Information Systems

DATABASE SYSTEMS GROUP
M. Renz: Probabilistic Similarity Queries in Uncertain Databases, Seminar 08421, Schloss Dagstuhl

Outline
- Introduction
- Probabilistic Similarity Queries
  - multi-step query processing
  - probabilistic ε-range/k-NN queries
- Probabilistic Similarity Ranking
  - probabilistic ranking models
  - efficient computation of probabilistic ranking queries
- Summary

Introduction
- modern database applications involve spatial, temporal and multimedia data, often with vague and imprecise attributes:
  - sensor data, e.g. traffic monitoring
  - feature extraction, e.g. person identification
- ⇒ probabilistic databases

Introduction
Types of probabilistic databases:
- relational uncertainty representation: tuples with confidence, e.g. the x-relation model (Trio system)

  ID   NAME   CONF
  p1   john   0.6
  p2   fred   0.3
  p3   mary   0.7
  ...  ...    ...

- uncertainty in feature spaces: uncertain vectors, with continuous or discrete (point object) representations
- spatial uncertainty representation: uncertain spatially extended objects
[Figure: uncertain objects in an (x, y) feature space]


Introduction: Probabilistic Similarity Queries
- given: a database with uncertain vectors and an (uncertain) query object Q
- queries: ε-range query, k-NN query, ranking query

Introduction: Probabilistic Similarity Queries
- given: two databases DB_A and DB_B with uncertain vectors
- queries: join query
- challenges: uncertain similarity distances, uncertain query results


Modelling Uncertainty in Feature Spaces
Uncertain vector data:
- vector data in the d-dimensional space ℝ^d
- objects are represented by multiple d-dimensional vectors that are mutually exclusive; a confidence value is assigned to each vector
- types of uncertain object representations: pdf (continuous) and vector samples (discrete)
[Figure: a continuous pdf and a set of discrete vector samples in an (x, y) feature space]
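A minimal sketch of the discrete representation, assuming the confidences of an object's mutually exclusive vectors sum to 1. The class `UncertainObject` and the isotropic Gaussian used to turn a continuous pdf into equally weighted samples are illustrative assumptions, not part of the talk.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

Vector = Tuple[float, ...]


@dataclass
class UncertainObject:
    """Discrete uncertain object: mutually exclusive alternative vectors,
    each carrying a confidence value (assumed here to sum to 1)."""
    samples: List[Vector]
    confidences: List[float]

    def __post_init__(self) -> None:
        if len(self.samples) != len(self.confidences):
            raise ValueError("exactly one confidence per sample is required")
        if abs(sum(self.confidences) - 1.0) > 1e-9:
            raise ValueError("confidences of mutually exclusive samples must sum to 1")


def discretize_gaussian(mean: Vector, sigma: float, n: int, seed: int = 0) -> UncertainObject:
    """Approximate a continuous (here: isotropic Gaussian) pdf by n equally weighted samples."""
    rng = random.Random(seed)
    samples = [tuple(rng.gauss(m, sigma) for m in mean) for _ in range(n)]
    return UncertainObject(samples, [1.0 / n] * n)


o = discretize_gaussian((2.0, 3.0), sigma=0.5, n=10)
print(len(o.samples), len(o.samples[0]))  # 10 2
```

More samples approximate the pdf more closely but increase the cost of every distance computation that follows.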

Probabilistic Similarity Queries
Example: probabilistic ε-range query
- query object and set of uncertain objects (discrete): q = {q_1, …, q_M} and o_i = {o_{i,1}, …, o_{i,N}}
- distance between q and o_i: d(q, o_i), a random variable over the sample pairs (q_j, o_{i,l})
- probability that the distance between q and o_i does not exceed ε ∈ ℝ₀⁺: P(d(q, o_i) ≤ ε)
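One way to evaluate this probability exactly, sketched under stated assumptions: q and o_i are independent, P(q_j) and P(o_{i,l}) are the sample confidences, and d is the Euclidean distance (the slide does not fix the metric). Then P(d(q, o_i) ≤ ε) = Σ_j Σ_l P(q_j) · P(o_{i,l}) · [d(q_j, o_{i,l}) ≤ ε]. The function name `prob_within_eps` is hypothetical.

```python
import math
from typing import Sequence, Tuple

Vector = Tuple[float, ...]


def prob_within_eps(q_samples: Sequence[Vector], q_conf: Sequence[float],
                    o_samples: Sequence[Vector], o_conf: Sequence[float],
                    eps: float) -> float:
    """P(d(q, o) <= eps) for two independent discrete uncertain objects.

    Every pair (q_j, o_l) is one possible world with probability
    P(q_j) * P(o_l); the worlds whose Euclidean distance is <= eps are summed.
    """
    total = 0.0
    for qj, wq in zip(q_samples, q_conf):
        for ol, wo in zip(o_samples, o_conf):
            if math.dist(qj, ol) <= eps:
                total += wq * wo
    return total


# certain query point (M = 1) and an object with N = 4 equally likely samples
p = prob_within_eps([(0.0, 0.0)], [1.0],
                    [(1.0, 0.0), (3.0, 0.0), (0.0, 2.0), (5.0, 5.0)], [0.25] * 4,
                    eps=2.0)
print(p)  # 0.5
```

The cost is O(M · N) distance computations per object, which is what the filter steps on the following slides try to avoid.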

Probabilistic Similarity Queries
Clustered object representation:
- build approximations by grouping the vector points of an object into clusters
- object o = {o_1, .., o_s}
  - simple object approximation: MBR(o)
  - clustered object approximation: MBR(C_1(o)), .., MBR(C_k(o))
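A small sketch of both approximations. How the clusters are formed is not specified on the slide, so the code assumes precomputed cluster labels (for example from k-means) and stores, next to each cluster MBR, the cluster's probability mass; `mbr` and `clustered_approximation` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Box = Tuple[Vector, Vector]  # (lower corner, upper corner)


def mbr(points: Sequence[Vector]) -> Box:
    """Minimum bounding rectangle of a non-empty set of d-dimensional points."""
    dims = range(len(points[0]))
    low = tuple(min(p[i] for p in points) for i in dims)
    high = tuple(max(p[i] for p in points) for i in dims)
    return low, high


def clustered_approximation(samples: Sequence[Vector], confidences: Sequence[float],
                            labels: Sequence[int]) -> List[Tuple[Box, float]]:
    """One MBR per cluster C_1(o), ..., C_k(o), paired with the cluster's
    probability mass (the summed confidences of its samples).
    labels[i] is the precomputed cluster id of samples[i]."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for i, c in enumerate(labels):
        groups[c].append(i)
    return [(mbr([samples[i] for i in idx]), sum(confidences[i] for i in idx))
            for _, idx in sorted(groups.items())]


o = [(0.0, 0.0), (1.0, 1.0), (10.0, 10.0), (11.0, 12.0)]
print(mbr(o))  # ((0.0, 0.0), (11.0, 12.0))
print(clustered_approximation(o, [0.25] * 4, [0, 0, 1, 1]))
# [(((0.0, 0.0), (1.0, 1.0)), 0.5), (((10.0, 10.0), (11.0, 12.0)), 0.5)]
```

The single MBR covers a lot of empty space between the two groups of samples, while the two cluster MBRs stay tight; this is what makes the probability bounds of the filter step sharper.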

Probabilistic Similarity Queries
Advantages of the clustered object approximation:
- efficiently managed by spatial access methods, e.g. R-tree, X-tree
- supports multi-step query processing
  - true hits can be reported very early
  - reduced refinement cost
- efficient computation of approximate answers
  - PTSQ and PTopkSQ are efficiently supported

Probabilistic Similarity Queries
Multi-step query processing with a probabilistic filter:
- estimation of the probability p = P(d(o,q) ≤ ε) for a query point q and an uncertain object o in clustered object representation:
  lower bounding probability estimation ≤ P(d(o,q) ≤ ε) ≤ upper bounding probability estimation
  (in the example, the upper bounding estimation is 0.6)
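A sketch of how such bounds can be derived from cluster MBRs, assuming Euclidean distance and a certain query point. A cluster whose MBR lies entirely inside the ε-sphere (MaxDist ≤ ε) contributes its whole probability mass to the lower bound; a cluster that at least touches the sphere (MinDist ≤ ε) contributes to the upper bound. The helper names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vector = Tuple[float, ...]
Box = Tuple[Vector, Vector]  # (lower corner, upper corner)


def min_dist(q: Vector, box: Box) -> float:
    """Smallest Euclidean distance from q to any point of the box."""
    low, high = box
    return math.sqrt(sum(max(lo - x, 0.0, x - hi) ** 2 for x, lo, hi in zip(q, low, high)))


def max_dist(q: Vector, box: Box) -> float:
    """Largest Euclidean distance from q to any point of the box."""
    low, high = box
    return math.sqrt(sum(max(abs(x - lo), abs(x - hi)) ** 2 for x, lo, hi in zip(q, low, high)))


def probability_bounds(q: Vector, clusters: Sequence[Tuple[Box, float]],
                       eps: float) -> Tuple[float, float]:
    """(P_low, P_upper) for P(d(o, q) <= eps) from (MBR, probability mass) pairs."""
    p_low = sum(w for box, w in clusters if max_dist(q, box) <= eps)
    p_upper = sum(w for box, w in clusters if min_dist(q, box) <= eps)
    return p_low, p_upper


clusters = [(((0.0, 0.0), (1.0, 1.0)), 0.5),   # completely inside the sphere
            (((1.0, 1.0), (3.0, 3.0)), 0.25),  # intersects the sphere
            (((5.0, 5.0), (6.0, 6.0)), 0.25)]  # completely outside
print(probability_bounds((0.0, 0.0), clusters, eps=2.0))  # (0.5, 0.75)
```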

Probabilistic Similarity Queries
Filter step for PSQs:
- probabilistic ε-range queries (PTSQ type); for each uncertain object o:
  - compute lower and upper bounding probabilities based on the cluster representations
  - if the lower bounding probability P_low exceeds the probability threshold, report o
  - if the upper bounding probability P_upper is below the probability threshold, prune o
  - otherwise, refine o (partial refinement)
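A sketch of this filter with partial refinement, under several assumptions: each cluster comes with precomputed MinDist/MaxDist to q (e.g. from the index), its samples are loaded only when the cluster is refined, clusters are refined one at a time in list order, and the query asks for P(d(o,q) ≤ ε) ≥ threshold. `ptsq_filter` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]
# One cluster C_i(o): (MinDist to q, MaxDist to q, samples, confidences)
Cluster = Tuple[float, float, List[Vector], List[float]]


def ptsq_filter(q: Vector, clusters: Sequence[Cluster], eps: float,
                threshold: float) -> Tuple[bool, int]:
    """Decide whether P(d(o, q) <= eps) >= threshold for one uncertain object o.

    Returns (is_hit, number_of_refined_clusters).
    """
    mass = [sum(conf) for _, _, _, conf in clusters]
    # filter step: bounds from the cluster approximations only
    p_low = sum(m for (_, max_d, _, _), m in zip(clusters, mass) if max_d <= eps)
    p_upper = sum(m for (min_d, _, _, _), m in zip(clusters, mass) if min_d <= eps)
    refined = 0
    for (min_d, max_d, samples, conf), m in zip(clusters, mass):
        if p_low > threshold:
            return True, refined        # report o without further refinement
        if p_upper < threshold:
            return False, refined       # prune o
        if min_d <= eps < max_d:        # cluster straddles the eps-sphere: refine it
            inside = sum(w for s, w in zip(samples, conf) if math.dist(q, s) <= eps)
            p_low += inside             # this part is now known to be inside
            p_upper -= m - inside       # this part is now known to be outside
            refined += 1
    # every straddling cluster has been refined, so p_low is the exact probability
    return p_low >= threshold, refined


q = (0.0, 0.0)
clusters = [
    (0.0, 1.5, [(1.0, 0.0), (0.0, 1.0)], [0.25, 0.25]),   # inside eps = 2
    (1.0, 3.0, [(1.5, 0.0), (2.5, 0.0)], [0.25, 0.125]),  # straddles the sphere
    (5.0, 6.0, [(5.5, 0.0)], [0.125]),                    # outside
]
print(ptsq_filter(q, clusters, 2.0, 0.4))  # (True, 0): P_low = 0.5 > 0.4
print(ptsq_filter(q, clusters, 2.0, 0.7))  # (True, 1): refinement gives 0.75
print(ptsq_filter(q, clusters, 2.0, 0.9))  # (False, 0): P_upper = 0.875 < 0.9
```

Only the clusters that intersect the ε-sphere boundary ever need their samples, which is where the reduced refinement cost comes from.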

Probabilistic Similarity Queries
Filter step for PSQs:
- probabilistic k-NN queries (PTSQ type)
- example with query point q and uncertain objects o and p: the upper bounding probability that p is the NN of q is P_upper = 0.7
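The filter only bounds the NN probability. For reference, a sketch of the exact value for k = 1, assuming a certain query point q, mutually independent discrete objects and Euclidean distance: P(o is NN of q) = Σ_s P(s) · Π_{p ≠ o} P(d(q, p) > d(q, s)). Samples at exactly the same distance count as "not nearer", so with ties the probabilities can sum to less than 1. `nn_probabilities` is a hypothetical name.

```python
import math
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]
Samples = List[Tuple[Vector, float]]  # (sample vector, confidence)


def nn_probabilities(q: Vector, objects: Dict[str, Samples]) -> Dict[str, float]:
    """Exact probability that each object is the nearest neighbour of q."""
    def p_farther(samples: Samples, r: float) -> float:
        # probability mass of an object lying strictly farther from q than r
        return sum(w for v, w in samples if math.dist(q, v) > r)

    result = {}
    for name, samples in objects.items():
        total = 0.0
        for v, w in samples:
            r = math.dist(q, v)
            prob = w
            for other, other_samples in objects.items():
                if other != name:
                    prob *= p_farther(other_samples, r)  # independence assumption
            total += prob
        result[name] = total
    return result


objects = {"o": [((1.0, 0.0), 0.5), ((4.0, 0.0), 0.5)],
           "p": [((2.0, 0.0), 0.5), ((3.0, 0.0), 0.5)]}
print(nn_probabilities((0.0, 0.0), objects))  # {'o': 0.5, 'p': 0.5}
```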


Probabilistic Similarity Ranking
Ranking queries:
- are very important for similarity search applications
- give the most relevant answers first
- are more flexible than ε-range and NN queries
Probabilistic ranking queries:
- results are associated with confidence values
- in contrast to ε-range / NN queries, there is no unique query predicate

Probabilistic Similarity Ranking
Output of probabilistic ranking:
- for each object, a discrete pdf over the ranking positions: prob_ranked_q : D × {1,..,N} → [0,1]
- prob_ranked_q(o, k) is the probability that object o is exactly the k-th nearest neighbor of the query object q
[Figure: probability of an object plotted over the ranking position k]
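A brute-force sketch of prob_ranked_q that enumerates every possible world (one sample per object), assuming a certain query point, mutually independent discrete objects and Euclidean distance; ties are broken by object order. It is exponential in the number of objects and only meant for tiny examples or for checking faster methods. `prob_ranked` is a hypothetical name.

```python
import itertools
import math
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]
Samples = List[Tuple[Vector, float]]  # (sample vector, confidence)


def prob_ranked(q: Vector, objects: Dict[str, Samples]) -> Dict[str, List[float]]:
    """table[o][k - 1] = P(o is exactly the k-th nearest neighbour of q)."""
    names = list(objects)
    table = {n: [0.0] * len(names) for n in names}
    # a possible world picks one sample per object; its probability is the product
    for world in itertools.product(*(objects[n] for n in names)):
        p = math.prod(w for _, w in world)
        order = sorted(range(len(names)), key=lambda i: math.dist(q, world[i][0]))
        for rank, i in enumerate(order):
            table[names[i]][rank] += p
    return table


objects = {"o": [((1.0, 0.0), 0.5), ((4.0, 0.0), 0.5)],
           "p": [((2.0, 0.0), 0.5), ((3.0, 0.0), 0.5)]}
print(prob_ranked((0.0, 0.0), objects))  # {'o': [0.5, 0.5], 'p': [0.5, 0.5]}
```

Every row and every column of the resulting table sums to 1, since each world assigns each object exactly one position and each position exactly one object.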

Probabilistic Similarity Ranking
Example: probabilistic ranking output
[Figure: left, objects A to T and the query q in the vector space; right, the probabilistic ranking output as a probability table over the objects and the ranking positions k]


Probabilistic Similarity Ranking
- the probabilistic ranking output is inconvenient for most users
- coping with probabilistic ranking:
  - a ranking with a unique order and confidences
  - aggregate confidence values into deterministic results

  Rank  OID  Conf.
  1     A    …
  2     B    …
  3     E    …
  4     C    …
  …

- open questions: How should the confidences be extracted from the probabilistic ranking output? Which ranking order should be used?

Probabilistic Similarity Ranking
Approaches for ranking objects:
- Approach 1: highest confidence [Soliman ICDE'07, Yi ICDE'08]; each ranking position k is assigned the object with the highest probability of being ranked exactly at k
  - problem: duplicates and neglected objects
- example result: 1. (A, 0.45) | 2. (C, 0.40) | 3. (C, 0.45)
  or, with duplicate elimination: 1. (A, 0.45) | 2. (C, 0.40) | 3. (B, 0.35)
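A sketch of Approach 1 on a made-up probability table (not the slide's data; dyadic values keep the arithmetic exact), with an optional duplicate-elimination flag. `rank_by_highest_confidence` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# Hypothetical table: TABLE[o][k - 1] = prob_ranked_q(o, k)
TABLE: Dict[str, List[float]] = {
    "A": [0.5, 0.25, 0.25],
    "B": [0.375, 0.3125, 0.3125],
    "C": [0.125, 0.4375, 0.4375],
}


def rank_by_highest_confidence(table: Dict[str, List[float]],
                               eliminate_duplicates: bool = False) -> List[Tuple[str, float]]:
    """Approach 1: position k gets the object most likely to be ranked exactly at k."""
    n = len(next(iter(table.values())))
    result: List[Tuple[str, float]] = []
    used = set()
    for k in range(n):
        candidates = [o for o in table if not (eliminate_duplicates and o in used)]
        best = max(candidates, key=lambda o: table[o][k])
        used.add(best)
        result.append((best, table[best][k]))
    return result


print(rank_by_highest_confidence(TABLE))
# [('A', 0.5), ('C', 0.4375), ('C', 0.4375)]  -> C reported twice, B never
print(rank_by_highest_confidence(TABLE, eliminate_duplicates=True))
# [('A', 0.5), ('C', 0.4375), ('B', 0.3125)]
```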

Probabilistic Similarity Ranking
Approaches for ranking objects:
- Approach 2: highest aggregated confidence
  - the object with the highest probability of being one of the first k objects is assigned to ranking position k
  - sensible with duplicate elimination
- example result: 1. (A, 0.45) | 2. (B, 0.35) | 3. (C, 0.45)
  or, reporting the aggregated confidences: 1. (A, 0.45) | 2. (B, 0.75) | 3. (C, 1.00)
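A sketch of Approach 2 with duplicate elimination, on the same kind of made-up table (not the slide's data). The aggregated confidence of o at position k is taken to be the cumulative probability P(rank(o) ≤ k); `rank_by_aggregated_confidence` is a hypothetical name.

```python
from itertools import accumulate
from typing import Dict, List, Tuple

# Hypothetical table: TABLE[o][k - 1] = prob_ranked_q(o, k)
TABLE: Dict[str, List[float]] = {
    "A": [0.5, 0.25, 0.25],
    "B": [0.375, 0.3125, 0.3125],
    "C": [0.125, 0.4375, 0.4375],
}


def rank_by_aggregated_confidence(table: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Position k gets the not yet ranked object with the highest
    probability of being among the first k objects."""
    cumulative = {o: list(accumulate(ps)) for o, ps in table.items()}
    n = len(next(iter(table.values())))
    result: List[Tuple[str, float]] = []
    remaining = set(table)
    for k in range(n):
        # sorted() makes tie-breaking deterministic (alphabetical)
        best = max(sorted(remaining), key=lambda o: cumulative[o][k])
        remaining.remove(best)
        result.append((best, cumulative[best][k]))
    return result


print(rank_by_aggregated_confidence(TABLE))
# [('A', 0.5), ('B', 0.6875), ('C', 1.0)]
```

Unlike Approach 1, B is no longer neglected: it is never the most likely object at any single position, but it is the most likely of the remaining objects to be among the first two.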

Probabilistic Similarity Ranking
- further approaches to determine the ranking order, e.g.
  - expected ranking position
  - etc.
- most intuitive and robust: Approach 2
- problem: the full probabilistic ranking information is required
- required: efficient computation of the probabilistic ranking output
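For comparison, the expected ranking position is E[rank(o)] = Σ_k k · prob_ranked_q(o, k). A short sketch on a made-up table (not the slide's data); `expected_rank` is a hypothetical name.

```python
from typing import Dict, List

# Hypothetical table: TABLE[o][k - 1] = prob_ranked_q(o, k)
TABLE: Dict[str, List[float]] = {
    "A": [0.5, 0.25, 0.25],
    "B": [0.375, 0.3125, 0.3125],
    "C": [0.125, 0.4375, 0.4375],
}


def expected_rank(probs: List[float]) -> float:
    """E[rank] = sum over k of k * prob_ranked_q(o, k), with k starting at 1."""
    return sum(k * p for k, p in enumerate(probs, start=1))


ranking = sorted(TABLE, key=lambda o: expected_rank(TABLE[o]))
print([(o, expected_rank(TABLE[o])) for o in ranking])
# [('A', 1.75), ('B', 1.9375), ('C', 2.3125)]
```

Like Approach 2, this needs the complete rank distribution of every object.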

Probabilistic Similarity Ranking
Iterative probability computation:
- the ranking is applied to the object vectors (samples)
- during the radial sweep, maintain for each object o the probability P_o that o lies within the current sweep range
- for each accessed sample o_{i,j}, compute the probability P(o_{i,j}, k) that exactly k−1 objects o ≠ o_i are within the sweep range ε, for k = 1..N
[Figure: radial sweep around q with increasing range ε over objects A, B, C, D and their probabilities P_o]
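A sketch of the radial sweep under these assumptions: a certain query point, mutually independent discrete objects, Euclidean distance, ties broken by sort order, and prob_ranked_q(o_i, k) accumulated as Σ_j conf(o_{i,j}) · P(o_{i,j}, k). Here the probability of "exactly k−1 other objects inside" is counted naively over all subsets, which is the expensive part discussed on the following slides. All names are hypothetical.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Samples = List[Tuple[Vector, float]]  # (sample vector, confidence)


def exactly_m_naive(probs: Sequence[float], m: int) -> float:
    """P(exactly m of the independent events occur), by enumerating every
    subset of size m; exponential in len(probs)."""
    total = 0.0
    for chosen in itertools.combinations(range(len(probs)), m):
        chosen_set = set(chosen)
        term = 1.0
        for i, p in enumerate(probs):
            term *= p if i in chosen_set else 1.0 - p
        total += term
    return total


def sweep_ranking(q: Vector, objects: Dict[str, Samples]) -> Dict[str, List[float]]:
    names = list(objects)
    n = len(names)
    # radial sweep: access all samples in order of increasing distance to q
    events = sorted((math.dist(q, v), name, w)
                    for name, samples in objects.items() for v, w in samples)
    p_in = {name: 0.0 for name in names}  # P_o: mass of o already inside the sweep range
    table = {name: [0.0] * n for name in names}
    for _, name, w in events:
        others = [p_in[o] for o in names if o != name]
        for k in range(1, n + 1):
            # P(o_ij, k): exactly k - 1 other objects lie inside the current range
            table[name][k - 1] += w * exactly_m_naive(others, k - 1)
        p_in[name] += w
    return table


objects = {"o": [((1.0, 0.0), 0.5), ((4.0, 0.0), 0.5)],
           "p": [((2.0, 0.0), 0.5), ((3.0, 0.0), 0.5)]}
print(sweep_ranking((0.0, 0.0), objects))  # {'o': [0.5, 0.5], 'p': [0.5, 0.5]}
```

Each sample is accessed once, so all rank distributions come out of a single pass instead of an enumeration of possible worlds.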

Probabilistic Similarity Ranking
Computation of P(o_{i,j}, k):
- problem: the computation is very expensive, since a lot of possibilities must be reconsidered
- 1st approach: prune objects that lie beyond ε, reducing DB to DB' (|DB'| << |DB|)

Probabilistic Similarity Ranking
- apply only the relevant objects; probability P_o of each object lying within the ε-range around q defined by the sample o_{i,j}:
  A (1.0), B (1.0), F (0.8), D (0.6), H (0.2), C (0.1), E (0.0), G (0.0)
[Figure: objects A to I around q and the sample o_{i,j}, with the ε-range and the counts N, N' and N'']
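A small sketch of this reduction using the probabilities listed above, assuming independent objects: objects that are certainly inside the range (P_o = 1) only shift every rank of o_{i,j} by one, objects certainly outside (P_o = 0) cannot affect it, and only objects with 0 < P_o < 1 enter the probability computation. The function name and the tolerance are hypothetical.

```python
from typing import Dict, Tuple


def split_relevant(p_in: Dict[str, float], tol: float = 1e-12) -> Tuple[int, Dict[str, float]]:
    """Return (number of objects certainly inside, relevant objects with 0 < P_o < 1)."""
    n_certain = sum(1 for p in p_in.values() if p >= 1.0 - tol)
    relevant = {o: p for o, p in p_in.items() if tol < p < 1.0 - tol}
    return n_certain, relevant


# probabilities P_o from the example
p_in = {"A": 1.0, "B": 1.0, "F": 0.8, "D": 0.6, "H": 0.2, "C": 0.1, "E": 0.0, "G": 0.0}
n_certain, relevant = split_relevant(p_in)
print(n_certain, relevant)  # 2 {'F': 0.8, 'D': 0.6, 'H': 0.2, 'C': 0.1}
# the sample o_ij can then only reach the ranks n_certain + 1 .. n_certain + len(relevant) + 1
print(list(range(n_certain + 1, n_certain + len(relevant) + 2)))  # [3, 4, 5, 6, 7]
```

Instead of eight objects, only the four uncertain ones F, D, H, C have to be combined.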

Probabilistic Similarity Ranking
- problem: the computation is still exponential
- 2nd approach: the problem can be solved in polynomial time by means of a dynamic programming technique
- example with the relevant objects F, D, H, C around the sample o_{i,j} of query q: P(2 of 4 in ε-range) is obtained from P(1 of 3 in ε-range), assuming C is in the ε-range, and P(2 of 3 in ε-range), assuming C is not in the ε-range

- recursive function: with P_{k,j} the probability that exactly k of the first j relevant objects lie within the ε-range and P_{o_j} the probability that the j-th relevant object does, P_{k,j} = P_{k-1,j-1} · P_{o_j} + P_{k,j-1} · (1 − P_{o_j}), where P_{0,0} = 1 and P_{k,j} = 0 for k < 0 or k > j
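A sketch of this dynamic program, assuming independent objects: it builds the distribution of "exactly k objects inside" one object at a time, in O(n²) instead of O(2ⁿ). With the example values for F, D, H, C, P(2 of 4 in ε-range) = 0.368 · 0.1 + 0.472 · 0.9 ≈ 0.4616, where 0.368 and 0.472 are P(1 of 3) and P(2 of 3) for F, D, H. `exactly_k_dp` is a hypothetical name.

```python
from typing import List, Sequence


def exactly_k_dp(probs: Sequence[float]) -> List[float]:
    """dist[k] = P(exactly k of the independent objects lie inside the eps-range).

    Applies P_{k,j} = P_{k-1,j-1} * P_j + P_{k,j-1} * (1 - P_j) one object j
    at a time, starting from P_{0,0} = 1.
    """
    dist = [1.0]  # no object processed yet
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for k, v in enumerate(dist):
            new[k] += v * (1.0 - p)  # j-th object outside the range
            new[k + 1] += v * p      # j-th object inside the range
        dist = new
    return dist


# relevant objects F, D, H, C of the example
dist = exactly_k_dp([0.8, 0.6, 0.2, 0.1])
print(round(dist[2], 4))  # 0.4616 = P(2 of 4 in the eps-range)
```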


Summary
- approaches to accelerate probabilistic similarity queries in vector spaces
- assumptions:
  - objects are mutually independent
  - discrete uncertainty representations
- support by
  - traditional access methods
  - multi-step query processing techniques
- very high speed-up factor using dynamic programming (polynomial instead of exponential computation of the ranking probabilities)

Discussion
Any questions? Thank you for your attention.