San Diego, 06/12/03. Martin Pfeifle, Database Group, University of Munich. Using Sets of Feature Vectors for Similarity Search on Voxelized CAD Objects.

Presentation transcript:

Using Sets of Feature Vectors for Similarity Search on Voxelized CAD Objects
Hans-Peter Kriegel, Stefan Brecheisen, Peer Kröger, Martin Pfeifle, Matthias Schubert
ACM SIGMOD 2003, San Diego, California, June 9-12, 2003
Database Group, Institute for Computer Science, University of Munich, Germany

Outline of the Talk
- Introduction
- Space Partitioning Models
- Data Partitioning Models (including the new Vector Set Model)
- Evaluation
- Conclusion

Introduction
System requirements:
- The system should help to reduce the cost of developing new parts
- Avoid "reinventing the wheel"
- Reuse existing parts
[Figure: a similarity query on a CAD database of complex spatial objects either times out or returns unsuitable results; the goal is a similarity query that returns meaningful results in a comparatively short time]
Solution:
- Efficient similarity search
- Effective similarity search
Both are addressed by a similarity model based on sets of feature vectors.

Outline of the Talk (next section: Space Partitioning Models)

Space Partitioning Models: Feature Transformation
- In the CAD system, a 3D CAD object is represented by a mesh of triangles
- Voxelization of the triangle meshes and object normalization yield a normalized, voxelized object
- Partitioning of the data space into disjoint, enumerated cells
- Extraction of k spatial features for each cell, which gives the feature vector
- Similarity of objects = vicinity of the corresponding feature vectors

Space Partitioning Models: Notation
- The data space is partitioned into $p$ axis-parallel grid cells in each dimension
- $r$ = the raster (voxel) resolution
- $V^o$ = set of voxels representing object $o \in O$
- $V_i^o$ = set of voxels covered by $o$ in cell $i$
- $f_o^{(i)}$ = $i$-th value of the feature vector of $o$
[2D example: a CAD object and its voxel set $V^o$ with $r = 9$ and $p = 3$]

Space Partitioning Models: The Volume Model
- Count the number of object voxels $|V_i^o|$ in each cell $i$
- Normalize by the voxel capacity $K$ of each cell
- Feature value for cell $i$: $f_o^{(i)} = \frac{|V_i^o|}{K}$, where $K = \left(\frac{r}{p}\right)^3$ in the 3D case
[2D example: grid cells annotated with their feature values]
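The volume model can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the talk: the object is given as a set of integer voxel coordinates in $[0, r)^3$, $p$ divides $r$, and the cells are enumerated in lexicographic order. The name `volume_features` is hypothetical.

```python
from itertools import product


def volume_features(voxels, r, p):
    """Return the p**3-dimensional volume-model feature vector.

    voxels: iterable of (x, y, z) integer coordinates in [0, r)
    r: raster (voxel) resolution per dimension
    p: number of grid cells per dimension (assumed to divide r)
    """
    if r % p != 0:
        raise ValueError("this sketch assumes that p divides r")
    side = r // p            # number of voxels along one cell edge
    capacity = side ** 3     # K = (r / p)^3, the voxel capacity of a cell
    counts = {cell: 0 for cell in product(range(p), repeat=3)}
    for x, y, z in set(voxels):
        counts[(x // side, y // side, z // side)] += 1
    # Lexicographic cell order is an arbitrary but fixed enumeration (assumption).
    return [counts[cell] / capacity for cell in sorted(counts)]
```

For example, with `r=4` and `p=2` every cell holds 8 voxels, so a cell containing two object voxels gets the feature value 0.25.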

Space Partitioning Models: The Solid Angle Model
- The solid angle model measures the concavity and convexity of surfaces
- Compute the SA-value $SA(v)$ for each surface voxel $v$ of object $o$: $SA(v) = \frac{|S_v \cap V^o|}{|S_v|}$, where $S_v$ is a voxelized reference sphere around $v$
- Each cell is represented by one dimension in the feature vector:
  - $f_o^{(i)} = 0$ if cell $i$ contains no voxel of $o$
  - $f_o^{(i)} = 1$ if cell $i$ contains only inside voxels of $o$
  - $f_o^{(i)} = \frac{1}{m} \sum_{j=1}^{m} SA(v_j)$ otherwise, where $v_1, \ldots, v_m$ are the surface voxels of $o$ in cell $i$
[2D example: reference spheres $S_x$ and $S_y$ around two surface voxels $x$ and $y$; SA-values range from 0 to 1]
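A rough Python sketch of the solid angle features follows. Several details are assumptions, since the slide does not fix them: a surface voxel is taken to be an object voxel with at least one of its six face neighbours outside the object, the reference sphere is the set of integer offsets within a given Euclidean radius, $p$ divides $r$, and all function names are hypothetical.

```python
from itertools import product


def sphere_offsets(radius):
    """Integer offsets of a voxelized reference sphere with the given radius."""
    rng = range(-radius, radius + 1)
    return [(dx, dy, dz) for dx, dy, dz in product(rng, repeat=3)
            if dx * dx + dy * dy + dz * dz <= radius * radius]


def is_surface(v, voxels):
    """Assumption: a surface voxel has at least one 6-neighbour outside the object."""
    x, y, z = v
    neighbours = [(x + 1, y, z), (x - 1, y, z), (x, y + 1, z),
                  (x, y - 1, z), (x, y, z + 1), (x, y, z - 1)]
    return any(n not in voxels for n in neighbours)


def solid_angle(v, voxels, offsets):
    """SA(v) = |S_v intersected with V^o| / |S_v|."""
    x, y, z = v
    inside = sum((x + dx, y + dy, z + dz) in voxels for dx, dy, dz in offsets)
    return inside / len(offsets)


def solid_angle_features(voxels, r, p, radius=2):
    """One feature per cell: 0 (empty), 1 (only inside voxels) or the mean SA-value."""
    if r % p != 0:
        raise ValueError("this sketch assumes that p divides r")
    voxels = set(voxels)
    side = r // p
    offsets = sphere_offsets(radius)
    sa_values = {cell: [] for cell in product(range(p), repeat=3)}
    occupied = set()
    for v in voxels:
        cell = tuple(c // side for c in v)
        occupied.add(cell)
        if is_surface(v, voxels):
            sa_values[cell].append(solid_angle(v, voxels, offsets))
    features = []
    for cell in sorted(sa_values):          # fixed lexicographic cell order
        if cell not in occupied:
            features.append(0.0)            # no voxel of o in this cell
        elif not sa_values[cell]:
            features.append(1.0)            # only inside voxels
        else:
            features.append(sum(sa_values[cell]) / len(sa_values[cell]))
    return features
```

Parts of a reference sphere that fall outside the voxel grid simply count as "not in the object" here, which is one possible boundary treatment among several.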

Outline of the Talk (next section: Data Partitioning Models)

Data Partitioning Models: Cover Sequence Model
- Approximation of the object by means of a cover sequence (Jagadish 91)
- Cover sequence: $S_k = (((C_0 \, \sigma_1 \, C_1) \, \sigma_2 \, C_2) \ldots \sigma_k \, C_k)$, where $\sigma_i \in \{+, -\}$, $k$ is the number of covers, and the $C_i$ are axis-parallel (hyper-)rectangles
- Approximation quality: symmetric volume difference $Err_k = |o \ \mathrm{XOR} \ S_k|$
- Computation of $S_k$ by means of a greedy algorithm
- The object is represented by a $6 \cdot k$-dimensional feature vector (3D case)
2D feature vector $f_o$:
- $f_o^{4i+1}$ = x-position of $C_i$
- $f_o^{4i+2}$ = y-position of $C_i$
- $f_o^{4i+3}$ = x-extension of $C_i$
- $f_o^{4i+4}$ = y-extension of $C_i$
[2D example] Cover sequences and their errors:
- $S_1 = (C_0 + C_1)$, $Err_1 = 14$
- $S_2 = ((C_0 + C_1) + C_2)$, $Err_2$ = [illegible]
- $S_3 = (((C_0 + C_1) + C_2) - C_3)$, $Err_3 = 7$
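To make the cover sequence notation concrete, the following Python sketch evaluates a given sequence of covers on a voxel grid, computes the symmetric volume difference, and concatenates the covers into one feature vector. It does not implement the greedy search for good covers, which the slide only names. The representation of a cover as `(sign, position, extension)` starting from an empty shape, and all function names, are assumptions made for this illustration.

```python
from itertools import product


def rect_voxels(pos, ext):
    """Voxels of an axis-parallel box with lower corner pos and extension ext."""
    return set(product(*(range(p, p + e) for p, e in zip(pos, ext))))


def evaluate_cover_sequence(covers):
    """Apply the covers in order; sign '+' adds a box, '-' subtracts it."""
    shape = set()
    for sign, pos, ext in covers:
        box = rect_voxels(pos, ext)
        shape = shape | box if sign == '+' else shape - box
    return shape


def approximation_error(obj_voxels, covers):
    """Err_k = |o XOR S_k|, the symmetric volume difference counted in voxels."""
    return len(set(obj_voxels) ^ evaluate_cover_sequence(covers))


def feature_vector(covers):
    """Concatenate position and extension of each cover (6 values per cover in 3D)."""
    return [value for _, pos, ext in covers for value in (*pos, *ext)]
```

For an L-shaped plate, a single large box leaves a nonzero error, and subtracting the missing corner as a second cover brings the error down to zero. Note that `feature_vector` depends on the order of the covers.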

Data Partitioning Models: Vector Set Model
Motivation [2D example]: the Euclidean distance between two cover sequence feature vectors depends on the order in which the covers appear.
- Query object, original order: $S_4^{query} = ((((C_0 + C_1) - C_2) - C_3) - C_4)$
- Query object, optimal order with respect to the database object: $S_4^{query} = ((((C_0 + C_1) - C_3) - C_4) - C_2)$
- Let $q_i = (q_{i,px}, q_{i,py}, q_{i,ex}, q_{i,ey})$ be the position and extension of the $i$-th query cover, and $db_i$ likewise for the database object. Then
  $d_{euclid}((q_1, q_2, q_3, q_4), (db_1, db_2, db_3, db_4)) \gg d_{euclid}((q_1, q_3, q_4, q_2), (db_1, db_2, db_3, db_4))$

Data Partitioning Models: Vector Set Model
- The cover sequence $S_k = (((C_0 \, \sigma_1 \, C_1) \, \sigma_2 \, C_2) \ldots \sigma_k \, C_k)$ is represented by a set of vectors $X \subset \mathbb{R}^6$, $|X| \le k$ (in the 3D case)
- Distance measure between two vector sets $X$ and $Y$: the minimum weight perfect matching
  - create a complete bipartite graph $G = (X \cup Y, X \times Y)$
  - the weight of each edge $(x, y) \in X \times Y$ is $d_{euclid}(x, y)$
  - weight function for unmatched nodes if $|X| \ne |Y|$: distance to a dummy cover
  - computed by the Kuhn-Munkres algorithm in $O(k^3)$
[2D example: query object and database object, each cover described by position X, position Y, extension X and extension Y]
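The matching distance can be illustrated with a short Python sketch. To stay within the standard library it tries all permutations instead of running the Kuhn-Munkres algorithm, so it costs $O(k!)$ rather than $O(k^3)$ and is only reasonable for small $k$ (such as the 7 covers used later in the evaluation). The dummy cover is assumed here to be the zero vector; the slide does not say which dummy cover is used. The name `matching_distance` is hypothetical.

```python
from itertools import permutations
from math import dist


def matching_distance(X, Y, dummy=None):
    """Minimum weight perfect matching distance between two vector sets.

    Brute-force stand-in for Kuhn-Munkres: every pairing of X with Y is tried.
    The smaller set is padded with a dummy cover, so an unmatched vector
    contributes its distance to that dummy cover.
    """
    X, Y = list(X), list(Y)
    if not X and not Y:
        return 0.0
    if dummy is None:
        dummy = (0.0,) * len((X or Y)[0])   # assumption: zero vector as dummy cover
    k = max(len(X), len(Y))
    X = X + [dummy] * (k - len(X))
    Y = Y + [dummy] * (k - len(Y))
    return min(sum(dist(x, y) for x, y in zip(X, perm))
               for perm in permutations(Y))
```

Unlike the Euclidean distance on concatenated vectors, the result does not change when the vectors of one set are listed in a different order: `matching_distance([(0, 0), (3, 4)], [(3, 4), (0, 0)])` is 0.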

Data Partitioning Models: Vector Set Model
- Efficient similarity queries based on multi-step query processing:
  - range queries (Faloutsos et al. 94)
  - k-nearest neighbor queries (Korn et al. 96)
  - optimal multi-step k-nearest neighbor search (Seidl, Kriegel 98)
- Processing pipeline: filter step (index-based) -> candidates -> refinement step (exact evaluation) -> results
- The lower bounding property guarantees no false drops, that is, the filter never discards an object that belongs to the result: $\forall o_1, o_2 \in O: d_o(o_1, o_2) \ge d_f(o_1, o_2)$
- Filter distance: $k$ (= cardinality of the two vector sets) times the distance between the centroids of the two vector sets lower-bounds the minimum weight perfect matching distance
[2D example: query object and database object with their covers (position X, position Y, extension X, extension Y), the query centroid and the database centroid]
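The filter and refinement steps can be sketched as follows in Python. The sketch assumes that both vector sets already have the same cardinality $k$ (for instance after padding with dummy covers), scans the database linearly instead of using an index such as the X-tree, and uses a brute-force matching in place of Kuhn-Munkres. All function names and the dictionary-based database are hypothetical.

```python
from itertools import permutations
from math import dist


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(component) / n for component in zip(*vectors))


def filter_distance(X, Y):
    """d_f: k times the centroid distance; X and Y must both have cardinality k."""
    return len(X) * dist(centroid(X), centroid(Y))


def exact_distance(X, Y):
    """d_o: minimum weight perfect matching, brute force over all pairings."""
    return min(sum(dist(x, y) for x, y in zip(X, perm))
               for perm in permutations(Y))


def range_query(query, database, eps):
    """Multi-step range query: cheap lower-bound filter, then exact refinement.

    Returns the names of all objects within distance eps of the query and the
    number of candidates that reached the refinement step.
    """
    candidates = [(name, Y) for name, Y in database.items()
                  if filter_distance(query, Y) <= eps]
    results = [name for name, Y in candidates
               if exact_distance(query, Y) <= eps]
    return results, len(candidates)
```

The lower bound follows from the triangle inequality: for any matching, the sum of the edge lengths is at least the length of the sum of the difference vectors, which equals $k$ times the centroid distance. Objects rejected by the filter can therefore never be part of the result.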

Outline of the Talk (next section: Evaluation)

Evaluation: k-nn Queries
- Evaluation of similarity models by means of k-nn queries: report the k objects having the smallest distance to a query object q
- [Figure: k-nn result lists with distances for the volume model and the solid angle model; legible distance values include 0,0 / 0,04 / 0,07 / 0,12 and 0,0 / 0,0176 / 0,0178. Which is the "good" and which the "bad" similarity model?]
- Problem: evaluation using k-nn queries is subjective, because the quality measure of a model depends on the choice of the query objects

Evaluation: Hierarchical Clustering
- Hierarchical clustering is more objective, since each object of the database is taken into account to measure the quality of a similarity model
- OPTICS (Kriegel et al. 99):
  - yields a density-based hierarchical clustering
  - is insensitive to input parameters
  - the result (the so-called reachability plot) can be easily visualized and is suitable for interactive exploration
[Figure: a data space with clusters A (containing sub-clusters A1 and A2) and B, next to the corresponding reachability plot]

Evaluation: Space Partitioning Similarity Models
Car dataset, approx. 200 parts, r = 30, p = 3
[Figure: reachability plots for the volume model and the solid angle model; labels in the figure: Class A, Class B, Class C, and "no classes found"]

Evaluation: Data Partitioning Similarity Models
Car dataset, approx. 200 parts, r = 15, 7 covers
[Figure: reachability plots for the cover sequence model and the vector set model; labels in the figure: classes X, A, C, E, G in one plot and classes A1, A2, B, C, D, E, F, G1, G2 in the other]

Evaluation: Efficiency of the Vector Set Model
- Efficiency evaluation: nn-queries on the plane database, cover sequence with 7 covers
- [Figure: CPU time, I/O time and total runtime in seconds for three variants: vector set without filter, vector set with filter (X-tree), and cover sequence (X-tree)]
- Vector set model vs. cover sequence model: the vector set model outperforms the cover sequence model
- Vector set model without filter vs. with filter:
  - the filter step leads to a speed-up factor of approximately 2
  - the filter step has a selectivity of approximately 20%

Outline of the Talk (next section: Conclusion)

Conclusion: Contribution
- Sets of feature vectors: a new way of representing objects in similarity search, somewhere between feature vectors and graphs
- An effective and efficient similarity model for CAD data based on sets of feature vectors
- Evaluation of similarity models based on hierarchical clustering

Conclusion: Future Work
- BOSS (Browsing OPTICS-Plots for Similarity Search)
- Interactive data browsing tool based on reachability plots
- User-friendly method to support the time-consuming task of finding similar parts:
  - revealing the hierarchical clustering structure of the dataset at a glance
  - displaying suitable representatives for large clusters

Thank you for your attention. Any questions?